
Hadoop Bullet: a simple script to deploy Hadoop on a fresh machine in an automated fashion

Installing Hadoop is a hassle; it involves a variety of steps, some proficiency with Linux commands, and edits to various files. If you have tried a manual installation, you know what I'm talking about.

So, here is a simple Linux shell script. Save the following script to a file and, on your Ubuntu-ready machine, run it using:

$ sudo sh

This script has been tested on Ubuntu 14.04 LTS; if you experience any issues, feel free to drop a comment. Here is the script:


# This document is free to share and/or modify, and comes with ABSOLUTELY NO WARRANTIES. I will not be responsible for any damage or corruption caused to your computer. Do know your stuff before you run this, and back up your important files before trying it out.

# Author:


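# HADOOP_URL must point to the download location of hadoop-1.2.1.tar.gz before this step runs.
# If you already have the Hadoop archive, put it in the /tmp directory and comment out the "wget" line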

echo "************************************************************"
cd /tmp
echo "Downloading Apache Hadoop from $HADOOP_URL (you may change the version to any other, but this one has been tested)"
wget "$HADOOP_URL"
echo "STEP 1/10 COMPLETE..."
echo "************************************************************"

# Hadoop will be deployed to the "/usr/local" directory; if you want to change it, modify the paths in this section
echo "************************************************************"
echo "Extracting files..."
tar -xzf hadoop-1.2.1.tar.gz
echo "Copying to \"/usr/local/\""
cp -R hadoop-1.2.1 /usr/local/
echo "STEP 2/10 COMPLETE..."
echo "************************************************************"

# Oracle JDK7 will be downloaded from a third-party repository (ppa:webupd8team/java; credit to them). If it is unavailable, you may use an alternative in the first command of this section
echo "************************************************************"
echo "Installing Oracle JDK7. Please accept the Oracle license when asked"
add-apt-repository ppa:webupd8team/java
apt-get update
# If you are skeptical about JDK7 (and I won't blame you for that), you may swap the comments on the two lines below to install JDK6 instead
apt-get install oracle-java7-installer
#apt-get install sun-java6-jdk
echo "STEP 3/10 COMPLETE..."
echo "************************************************************"

# This script assumes that you will run Hadoop as a new user "hadoop". If you wish to use a different user, you're most welcome to; just comment out the whole section below, but be sure to replace the username everywhere in the script with the one you want
echo "************************************************************"
echo "Creating user and group named \"hadoop\""
adduser hadoop
adduser hadoop sudo
echo "Creating home directory for user"
mkdir -p /home/hadoop/tmp
echo "Assigning rights on home directory"
chown -R hadoop:hadoop /home/hadoop/tmp
chmod 755 /home/hadoop/tmp
echo "Changing ownership of Hadoop's installation directory"
chown -R hadoop:hadoop /usr/local/hadoop-1.2.1
echo "STEP 4/10 COMPLETE..."
echo "************************************************************"

# This is to test and initialize the newly created user
echo "************************************************************"
echo "Logging in. ***** PLEASE PROVIDE PASSWORD AND RUN 'exit' (without quotes) TO LOG OUT *****"
su - hadoop
echo "STEP 5/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Setting environment variables"
cd /usr/local/hadoop-1.2.1/conf
echo "export JAVA_HOME=/usr/lib/jvm/java-7-oracle" >>
echo "STEP 6/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Configuring Hadoop properties in *site.xml"
mv core-site.xml core-site.xml.bck
touch core-site.xml
echo "<?xml version=\"1.0\"?>
<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>
<configuration>
</configuration>" > core-site.xml
mv mapred-site.xml mapred-site.xml.bck
touch mapred-site.xml
echo "<?xml version=\"1.0\"?>
<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>
<configuration>
</configuration>" > mapred-site.xml
mv hdfs-site.xml hdfs-site.xml.bck
touch hdfs-site.xml
echo "<?xml version=\"1.0\"?>
<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>
<configuration>
</configuration>" > hdfs-site.xml
chown hadoop:hadoop /usr/local/hadoop-1.2.1/conf/*-site.xml
echo "STEP 7/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "export JAVA_HOME=/usr/lib/jvm/java-7-oracle" >> /home/hadoop/.bashrc
echo "export HADOOP_HOME=/usr/local/hadoop-1.2.1" >> /home/hadoop/.bashrc
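# Single quotes keep $PATH literal, so it is expanded when the hadoop user logs in, not while this script runs as root
echo 'export PATH=$PATH:/usr/local/hadoop-1.2.1/bin' >> /home/hadoop/.bashrc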
echo "STEP 8/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Installing OpenSSH Server"
apt-get install openssh-server
ssh-keygen -t rsa -P ""
cat /home/hadoop/.ssh/ >> /home/hadoop/.ssh/authorized_keys
echo "STEP 9/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Removing temporary files"
# If you want to preserve the downloaded Hadoop application, you may comment out the first command under this section
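# The working directory is now Hadoop's conf directory, so the files in /tmp are removed by absolute path
rm /tmp/hadoop-1.2.1.tar.gz
rm -R /tmp/hadoop-1.2.1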
echo "STEP 10/10 COMPLETE..."
echo "************************************************************"
echo "Congratulations! Your Hadoop setup is complete. Please log in as the hadoop user and start the Hadoop services."

echo "************************************************************"
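
The first step asks you to comment out "wget" by hand if the archive is already in /tmp. That check can be automated. Here is a minimal sketch, not part of the tested script: fetch_if_missing is a hypothetical helper name, and it assumes you pass the download URL (for example the value of HADOOP_URL) and the target file yourself.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): download a file only
# when it is not already present at the destination path.
fetch_if_missing() {
    url=$1
    dest=$2
    if [ -f "$dest" ]; then
        # The archive is already there, so the download is skipped
        echo "Found $dest, skipping download"
        return 0
    fi
    if [ -z "$url" ]; then
        # Nothing on disk and nothing to download from: report and fail
        echo "No download URL set and $dest is missing" >&2
        return 1
    fi
    wget -O "$dest" "$url"
}

# Possible usage inside the script (assumes HADOOP_URL is set beforehand):
# fetch_if_missing "$HADOOP_URL" /tmp/hadoop-1.2.1.tar.gz || exit 1
```

With this in place, the same script works whether or not the archive has been copied to /tmp in advance, and it stops early when there is neither a file nor a URL.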

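Step 7 writes each *-site.xml file with a multi-line echo and escaped quotes. A here-document is one alternative that avoids the escaping and makes it easier to see that the configuration element is opened and closed. The sketch below is only an illustration: write_site_xml is a hypothetical helper, and the property name and value in the usage comment (fs.default.name, hdfs://localhost:9000) are assumptions typical of a single-node Hadoop 1.x setup, not values taken from this script.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): write a Hadoop
# *-site.xml file containing a single property.
#   $1 = output file, $2 = property name, $3 = property value
write_site_xml() {
    file=$1
    name=$2
    value=$3
    # Unquoted EOF lets $name and $value expand inside the here-document
    cat > "$file" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# Possible usage (property name and value are assumptions, adjust to your setup):
# write_site_xml core-site.xml fs.default.name hdfs://localhost:9000
```

The helper overwrites the target file, so keep the "mv ... .bck" backup lines from the script if you want to preserve the originals.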

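Step 8 appends export lines to /home/hadoop/.bashrc. If the script is run more than once, those lines get appended again each time. Here is a small sketch of an idempotent append, assuming a hypothetical helper named append_line_once; passing the line in single quotes also keeps $PATH literal, so it is expanded at login rather than while the script runs.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): append a line to a
# file only if that exact line is not already in it.
#   $1 = line to append, $2 = target file
append_line_once() {
    line=$1
    file=$2
    # Make sure the file exists so grep does not fail on a missing file
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Possible usage inside the script:
# append_line_once 'export PATH=$PATH:/usr/local/hadoop-1.2.1/bin' /home/hadoop/.bashrc
```

Calling it twice with the same arguments leaves a single copy of the line in the file.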