Hadoop Bullet: a simple script to deploy Hadoop on a fresh machine in an automated fashion

Installing Hadoop is a hassle; it involves a variety of steps, some proficiency in Linux commands, and edits to various configuration files. If you have tried manual installation, you know what I'm talking about.

So, here is a simple Linux shell script. Save the following script as bulletinstall.sh and on your Ubuntu-ready machine, run it using:

$ sudo bash bulletinstall.sh

This script has been tested on Ubuntu 14.04 LTS; if you experience any issues, feel free to drop a comment. Here is the script:


#!/bin/bash

# This document is free to share and/or modify, and comes with ABSOLUTELY NO WARRANTIES. I will not be responsible for any damage or corruption caused to your computer. Know your stuff before you run this, and back up your important files before trying it out.

# Author: owaishussain@outlook.com

# LINUX SCRIPT TO INSTALL HADOOP 1.2.1 ON A MACHINE

# If you already have this file, then put it in /tmp directory and comment out "wget"
HADOOP_URL=http://www.us.apache.org/dist/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz

echo "************************************************************"
cd /tmp
echo "Downloading Apache Hadoop from $HADOOP_URL (you may change the version to any other, but this one has been tested)"
wget "$HADOOP_URL"
echo "STEP 1/10 COMPLETE..."
echo "************************************************************"

# Hadoop will be deployed in the "/usr/local" directory. If you want to change it, modify the paths under this section and everywhere else they appear in the script
echo "************************************************************"
echo "Extracting files..."
tar -xzf hadoop-1.2.1.tar.gz
echo "Copying to \"/usr/local/\""
cp -R hadoop-1.2.1 /usr/local/
echo "STEP 2/10 COMPLETE..."
echo "************************************************************"

# Oracle JDK7 will be installed from a third-party repository (ppa:webupd8team/java; credit to them). If it is unavailable, you may use an alternative repository in the first command under this section
echo "************************************************************"
echo "Installing Oracle JDK7. Please accept the Oracle license when asked"
add-apt-repository ppa:webupd8team/java
apt-get update
# If you are skeptical about JDK7 (and I won't blame you for that), swap the comments on the two lines below to install JDK6 instead
apt-get install oracle-java7-installer
#apt-get install sun-java6-jdk
echo "STEP 3/10 COMPLETE..."
echo "************************************************************"

# This script assumes that you will run Hadoop as a new user "hadoop". If you wish to use a different user, you're most welcome to; just comment out the whole section below, but be sure to replace the username everywhere in the script with the one you want
echo "************************************************************"
echo "Creating user and group named \"hadoop\""
adduser hadoop
adduser hadoop sudo
echo "Creating Hadoop's temporary directory in the user's home"
mkdir -p /home/hadoop/tmp
echo "Assigning rights on the temporary directory"
chown -R hadoop:hadoop /home/hadoop/tmp
chmod 755 /home/hadoop/tmp
echo "Changing ownership of Hadoop's installation directory"
chown -R hadoop:hadoop /usr/local/hadoop-1.2.1
echo "STEP 4/10 COMPLETE..."
echo "************************************************************"

# This is to test and initialize the newly created user
echo "************************************************************"
echo "Logging in as hadoop. ***** PROVIDE THE PASSWORD IF ASKED, THEN RUN 'exit' (without quotes) TO LOG OUT AND CONTINUE *****"
su - hadoop
echo "STEP 5/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Setting environment variables"
cd /usr/local/hadoop-1.2.1/conf
echo "export JAVA_HOME=/usr/lib/jvm/java-7-oracle" >> hadoop-env.sh
echo "STEP 6/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Configuring Hadoop properties in *site.xml"
mv core-site.xml core-site.xml.bck
touch core-site.xml
echo "<?xml version=\"1.0\"?>
<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
</configuration>" > core-site.xml
mv mapred-site.xml mapred-site.xml.bck
touch mapred-site.xml
echo "<?xml version=\"1.0\"?>
<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>
<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
</configuration>" > mapred-site.xml
mv hdfs-site.xml hdfs-site.xml.bck
touch hdfs-site.xml
echo "<?xml version=\"1.0\"?>
<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>
<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
</configuration>" > hdfs-site.xml
chown hadoop:hadoop /usr/local/hadoop-1.2.1/conf/*-site.xml
echo "STEP 7/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "export JAVA_HOME=/usr/lib/jvm/java-7-oracle" >> /home/hadoop/.bashrc
echo "export HADOOP_HOME=/usr/local/hadoop-1.2.1" >> /home/hadoop/.bashrc
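# Single quotes keep $PATH literal so that it is expanded when the hadoop user logs in, not while this script runs
echo 'export PATH=$PATH:/usr/local/hadoop-1.2.1/bin' >> /home/hadoop/.bashrc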
echo "STEP 8/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Installing OpenSSH Server"
apt-get install openssh-server
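# Generate the key pair as the hadoop user so that it lands in /home/hadoop/.ssh and not in the home of the user running this script
su - hadoop -c 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa'
su - hadoop -c 'cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys'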
echo "STEP 9/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Removing temporary files"
# If you want to preserve the downloaded Hadoop application, you may comment out the first command under this section
rm /tmp/hadoop-1.2.1.tar.gz
rm -R /tmp/hadoop-1.2.1
echo "STEP 10/10 COMPLETE..."
echo "************************************************************"
echo "Congratulations! Your Hadoop setup is complete. Please log in as the hadoop user and start the Hadoop services."

echo "************************************************************"
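
Step 7 of the script writes three nearly identical *-site.xml files with repeated echo blocks. As a rough sketch only, the same output could come from one small helper. The function name write_site_xml and the scratch directory are assumptions for illustration and are not part of the original script; the property names and values are the ones used above.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): write a Hadoop
# *-site.xml file from a list of name=value pairs.
# Usage: write_site_xml <file> <name=value> [<name=value> ...]
write_site_xml() {
    local file="$1"
    shift
    {
        echo '<?xml version="1.0"?>'
        echo '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>'
        echo '<configuration>'
        local pair
        for pair in "$@"; do
            echo '<property>'
            # Everything before the first "=" is the name, the rest is the value
            echo "  <name>${pair%%=*}</name>"
            echo "  <value>${pair#*=}</value>"
            echo '</property>'
        done
        echo '</configuration>'
    } > "$file"
}

# Write to a scratch directory here; the real script targets
# /usr/local/hadoop-1.2.1/conf
CONF_DIR="$(mktemp -d)"
write_site_xml "$CONF_DIR/core-site.xml" \
    "hadoop.tmp.dir=/home/hadoop/tmp" \
    "fs.default.name=hdfs://localhost:54310"
write_site_xml "$CONF_DIR/mapred-site.xml" "mapred.job.tracker=localhost:54311"
write_site_xml "$CONF_DIR/hdfs-site.xml" "dfs.replication=1"

cat "$CONF_DIR/core-site.xml"
```

Keeping the XML boilerplate in one place makes it harder for the three files to drift apart, and adding another property becomes a one-line change.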

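Steps 6 and 8 append export lines with >>, so running the script a second time would append the same lines again. A minimal sketch of one way to guard against that follows. The helper name append_once is hypothetical, and a scratch file stands in for /home/hadoop/.bashrc. It also shows why the PATH line is safer in single quotes: $PATH then stays literal in the file and is expanded at login.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): append a line to a file
# only if that exact line is not already present.
append_once() {
    local line="$1"
    local file="$2"
    touch "$file"
    # -x matches whole lines only, -F treats the line as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Scratch file standing in for /home/hadoop/.bashrc
RC_FILE="$(mktemp)"

append_once 'export HADOOP_HOME=/usr/local/hadoop-1.2.1' "$RC_FILE"
# Single quotes: $PATH is written literally and expanded later, at login
append_once 'export PATH=$PATH:/usr/local/hadoop-1.2.1/bin' "$RC_FILE"
# Repeating the call changes nothing, because the line already exists
append_once 'export PATH=$PATH:/usr/local/hadoop-1.2.1/bin' "$RC_FILE"

cat "$RC_FILE"
```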