Skip to main content

Executing MapReduce Applications on Hadoop (Single-node Cluster) - Part 3

In our previous experiment, we ran source code of Word count MapReduce application eclipse. This time, we are going to write our own piece of code.

Remember Permutations and Combinations you studied in College? We will write a fresh approach to compute combinations of all strings in a file. You'll have to make a very few changes to the existing code.

First, you need to create a text file with some words separated by spaces:
- Create a new text file named words.txt in /home/hadoop/Documents/combinations/
- Enter some text like:
Astronomystar sun earth moon milkyway asteroid pulsar nebula mars venus jupiter neptune saturn blackhole galaxy cygnus cosmic comet solar eclipse globular panorama apollo discovery seti aurora dwarf halebopp plasmasphere supernova cluster europa juno keplar helios indego genamede neutrinos callisto messier nashville sagittarius corona circinus hydra whirlpool rosette tucanaeAndroidcupcake donut eclair froyo gingerbread honeycomb icecreamsandwich jellybean kitkat lemonadeUbuntuwarty warthog haory hedgehog breezy badger dapper drake edgy eft feisty fawn gutsy gibbon herdy heron intrepid ibex jaunty jackalope karmic koala lucid lynx meverick meerkat natty narwhal oneiric ocelot raring ringtail

- Save and exit

Now open your Eclipse. In your existing Combinatorics Project, add a new Class:
- Right click on src > New > Class
- Name it Combinations
- Replace the code with the following:

import java.io.IOException;
import java.util.Arrays;
import java.util.Date;
import java.util.SortedSet;
import java.util.StringTokenizer;
import java.util.TreeSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**

 * MapReduce Application to discover combinations of all words in a text file
 * 
 * @author hadoop
 * 
 */
public class Combinations {

// Inherit Mapper class, our input and output will be all text

public static class MyMap extends Mapper<LongWritable, Text, Text, Text> {
// Empty string to write as values against keys by default
private Text term = new Text();

// Mapping function to map all words to key-value pairs

protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// Convert all text to a string
String str = value.toString();
// Tokenizer to break text into words, separated by space
StringTokenizer tokenizer = new StringTokenizer(str, " ");
// Write a key-value pair against each term to context (job)
while (tokenizer.hasMoreTokens()) {
term.set(tokenizer.nextToken());
// If length of the term exceeds 20 characters, skip it, because
// processing strings of greater lengths may not be possible
if (term.getLength() > 20)
continue;
// Initially, pass term as both key and value
context.write(term, term);
}
};
}

// Inherit Reducer class, this will be executed on multiple nodes and write

// output to text file
public static class MyReduce extends Reducer<Text, Text, Text, Text> {
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
// Sorted collection to store set of combinations
SortedSet<String> list = new TreeSet<String>();
// Iterate for each key
for (Text text : values) {
// Find out all combinations of a string
String str = text.toString();
int length = str.length();
// The number of combinations is 2^(n-1)
int total = ((Double) Math.pow(2, length)).intValue() - 1;
for (int i = 0; i < total; i++) {
String tmp = "";
char[] charArray = new StringBuilder(
Integer.toBinaryString(i)).reverse().toString()
.toCharArray();
for (int j = 0; j < charArray.length; j++) {
if (charArray[j] == '1') {
tmp += str.charAt(j);
}
}
list.add(tmp);
}
list.add(str);
}
// Write term as key and its combinations to output
context.write(key, new Text(Arrays.toString(list.toArray())));
};
}

public static void main(String[] args) throws Exception {

Configuration conf = new Configuration();
// Initiate MapReduce job, named combinations
Job job = new Job(conf, "combinations");
// Our keys in output will be text
job.setOutputKeyClass(Text.class);
// Our values in output will be text
job.setOutputValueClass(Text.class);
job.setMapperClass(MyMap.class);
job.setReducerClass(MyReduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
// Fetch input/output paths from arguments
FileInputFormat.addInputPath(job, new Path(args[0]));
// Create a time stamped directory inside input directory for output
FileOutputFormat.setOutputPath(job,
new Path(args[0] + String.valueOf(new Date().getTime())));
job.waitForCompletion(true);
}
}

Execute the code:
- Right click > Run As > Run Configurations
- Right click on Java Application (from left pane) > New
- On top, name it Combinations
- In Arguments tab, write /home/hadoop/Documents/combinations/
- Apply and Run

The MapReduce application should run and you should be able to collect the output in a time stamped folder in /home/hadoop/Documents/combinations/ directory

This algorithm is explained in detail here. You should also read the comments, the rest of the code is self-explanatory.

With this experiment, we'll wrap up our experiments on Single-node Hadoop clusters. I will encourage you to do some more experiments and make yourself comfortable before you try out something Big.

Comments

Popular posts from this blog

Executing MapReduce Applications on Hadoop (Single-node Cluster) - Part 1

Okay. You just set up Hadoop on a single node on a VM and now wondering what comes next. Of course, you’ll run something on it, and what could be better than your own piece of code? But before we move to that, let’s first try to run an existing program to make sure things are well set on our Hadoop cluster. Power up your Ubuntu with Hadoop on it and on Terminal (Ctrl+Alt+T) run the following command: $ start-all.sh Provide the password whenever asked and when all the jobs have started, execute the following command to make sure all the jobs are running: $ jps Note: The “jps” utility is available only in Oracle JDK, not Open JDK. See, there are reasons it was recommended in the first place. You should be able to see the following services: NameNode SecondaryNameNode DataNode JobTracker TaskTracker Jps We'll take a minute to very briefly define these services first. NameNode : a component of HDFS (Hadoop File System) that manages all the f...

How to detach from Facebook... properly

Yesterday, I deactivated my Facebook account after using it for 10 years. Of course there had to be a very solid reason; there was, indeed... their privacy policy . If you go through this page, you might consider pulling off as well. Anyways, that's not what this blog post is about. What I learned from yesterday is that the so-called "deactivate" option on Facebook is nothing more than logging out. You can log in again without any additional step and resume from where you last left. Since I really wanted to remove myself from Facebook as much as I can, I investigated ways to actually delete a Facebook account. There's a plethora of blogs on the internet, which will tell you how you can simply remove Facebook account. But almost all of them will either tell you to use "deactivate" and "request delete" options. The problem with that is that Facebook still has a last reusable copy of your data. If you really want to be as safe from its s...

5 things to do when you cannot trace a bug

Programming today, is more about fixing existing code than writing new. In most of the cases, bugs are easy to trace, especially when you are using a modern IDEs like Eclipse or Visual Studio. However, it is very likely that you get trapped in situations like a specific button not doing anything, application crashing randomly,  or a record not updating for a specific ID. Here are some tips you may find lifesavers if you get jammed too often when debugging your code: Catching random errors Remember, there are no random errors unless you are calling a random function. The code is always consistent, if a function calculates compound interest of an amount over a certain period of time -- within an allowed range -- correctly, it will never do it wrong as long as the parameter values are in range. So, the code is consistent. Data, however, may not be. Here is an example: public boolean saveRecordInDB (int id, String name, float height, float weight) { // Do something } Test: ...