Posts

Showing posts from March, 2014

Step-by-step guide to set up Multi-node Hadoop Cluster on Ubuntu Virtual Machines

If you've landed on this page, I know your feelings. Wanna know how it feels when it's done? Ride on, it's like a roller coaster...

You have successfully configured a single-node cluster in 7 easy steps. Good! But you are yet to taste the real essence of Hadoop. Recall that the primary purpose of Hadoop is to distribute a very lengthy task across more than one machine. This is exactly what we are going to do; the only difference is that we will be doing so in virtual machines.

Step 1: Networking

We have several things to do first with the existing VM, beginning with disabling IPv6. This is recommended because Hadoop currently does not support IPv6, according to its official Wiki. To do so, you will have to modify a file named /etc/sysctl.conf:

- Launch your Virtual Machine from VirtualBox
- Open your terminal and run: $ sudo nano /etc/sysctl.conf
- Add the following lines at the end of the file: # Disable ipv6 net.ipv6.conf.all.disable_ipv6 =...
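Once the sysctl change has taken effect (after reloading the settings or rebooting), it can be handy to confirm it from the same JVM that Hadoop will use. The following is only a minimal sketch, not part of the original guide: the class name Ipv6Check is hypothetical, and it assumes a Linux guest where the kernel exposes the live setting under /proc/sys.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not from the original post): reports whether IPv6 is disabled.
class Ipv6Check {

    // On Linux, the live value of net.ipv6.conf.all.disable_ipv6 is exposed at this path.
    static final Path FLAG = Paths.get("/proc/sys/net/ipv6/conf/all/disable_ipv6");

    // The kernel reports "1" when IPv6 is disabled and "0" when it is enabled.
    static boolean isDisabled(String flagContents) {
        return "1".equals(flagContents.trim());
    }

    public static void main(String[] args) throws IOException {
        if (!Files.exists(FLAG)) {
            // The file is absent when the kernel has no IPv6 support loaded at all.
            System.out.println("No IPv6 sysctl entry found at " + FLAG);
            return;
        }
        String contents = new String(Files.readAllBytes(FLAG), StandardCharsets.US_ASCII);
        System.out.println(isDisabled(contents) ? "IPv6 is disabled" : "IPv6 is still enabled");
    }
}
```

Keeping the parsing in isDisabled separate from the file read makes the check easy to try on any machine, even one without the /proc entry.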

Executing MapReduce Applications on Hadoop (Single-node Cluster) - Part 3

In our previous experiment, we ran the source code of the Word Count MapReduce application in Eclipse. This time, we are going to write our own piece of code. Remember the Permutations and Combinations you studied in college? We will write a fresh approach to compute combinations of all strings in a file. You'll only have to make a few changes to the existing code. First, you need to create a text file with some words separated by spaces: - Create a new text file named words.txt in /home/hadoop/Documents/combinations/ - Enter some text like: Astronomy star sun earth moon milkyway asteroid pulsar nebula mars venus jupiter neptune saturn blackhole galaxy cygnus cosmic comet solar eclipse globular panorama apollo discovery seti aurora dwarf halebopp plasmasphere supernova cluster europa juno keplar helios indego genamede neutrinos callisto messier nashville sagittarius corona circinus hydra whirlpool rosette tucanae Android cupcake donut eclair froyo gingerbread honeycomb icecreamsandwich...

A faster, Non-recursive Algorithm to compute all Combinations of a String

Imagine you're me: you studied Permutations and Combinations in your high school maths, and after so many years you realize that to solve a certain problem, you need to apply Combinations. You do your revision and confidently open your favourite IDE to code; after typing some usual lines, you pause and think, then you do the next best thing: search the Internet. You find a nice recursive solution, which does the job well. Like the following:

import java.util.ArrayList;
import java.util.Date;

public class Combination {
   public ArrayList<ArrayList<String>> compute (ArrayList<String> restOfVals) {
      if (restOfVals.size () < 2) {
         ArrayList<ArrayList<String>> c = new ArrayList<ArrayList<String>> ();
         c.add (restOfVals);
         return c;
      }
      else {  ...
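To give a feel for what a non-recursive alternative can look like, here is a minimal sketch using bitmask enumeration, a common textbook technique. It is not necessarily the algorithm the full post goes on to describe; the class name IterativeCombinations and the choice to return only non-empty combinations are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: bitmask enumeration, not necessarily the post's own algorithm.
class IterativeCombinations {

    // Returns every non-empty combination (subset) of the input values, without recursion.
    static List<List<String>> compute(List<String> values) {
        int n = values.size();
        if (n > 30) {
            // An int mask cannot safely index more than 30 elements here.
            throw new IllegalArgumentException("too many values for an int bitmask");
        }
        List<List<String>> result = new ArrayList<>();
        // Each mask from 1 to 2^n - 1 selects exactly one distinct subset.
        for (int mask = 1; mask < (1 << n); mask++) {
            List<String> combination = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // Bit i set means the i-th value belongs to this combination.
                if ((mask & (1 << i)) != 0) {
                    combination.add(values.get(i));
                }
            }
            result.add(combination);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [[sun], [moon], [sun, moon], [mars], [sun, mars], [moon, mars], [sun, moon, mars]]
        System.out.println(compute(Arrays.asList("sun", "moon", "mars")));
    }
}
```

The number of combinations still doubles with every extra word (2^n - 1 in total), so a loop like this avoids recursion overhead but does not change the overall growth.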

Executing MapReduce Applications on Hadoop (Single-node Cluster) - Part 2

Previously, we saw how to execute the built-in Word Count example on Hadoop. In this part, we will build the same application in Eclipse from the Word Count source code and run it. First, you need to install Eclipse on your Hadoop-ready Virtual Machine (assuming that the JDK was already installed when you set up Hadoop). This can be done from the Ubuntu Software Center, but my recommendation is that you download it and extract it to your Home directory. Any version of Eclipse should work; I have done the experiments on version 4.3 (Kepler). After installation, launch Eclipse. The first thing to do is to make Oracle JDK your default Java Runtime:

- Go to Window > Preferences > Java > Installed JREs
- If the default JRE does not point to Oracle JRE, then edit and set the directory to /usr/lib/jvm/java-7-oracle/
- Press OK to finish

Now we will create a Java Application Project:

- Go to New > Java Project
- Name the pro...