Sunday, April 27, 2014

Hadoop Bullet: a simple script to deploy Hadoop on fresh machine in automated fashion

Installing Hadoop is a hassle; it involves a variety of steps, some proficiency on Linux commands and writing to various files. If you have tried manual installation, you know what I'm talking about.

So, here is a simple Linux shell script. Save the following script as bulletinstall.sh and on your Ubuntu-ready machine, run it using:

$ sudo sh bulletinstall.sh

This script has been tested on Ubuntu 14.04 LTS; if you experience any issues, feel free to drop a comment. Here is the script:


#!/bin/bash

# This document is free to share and/or modify, and comes with ABSOLUTELY NO WARRANTIES. I will not be responsible for any damage or corruption caused to your Computer. Do know your stuff before you run this and backup your important files before trying out.

# Author: owaishussain@outlook.com

# LINUX SCRIPT TO INSTALL HADOOP 1.2.1 ON A MACHINE

# If you already have this file, then put it in /tmp directory and comment out "wget"
HADOOP_URL=http://www.us.apache.org/dist/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz

echo "************************************************************"
cd /tmp
echo "Downloading Apache Hadoop from $HADOOP_URL (you may change the version to any other, but this one has been tested)"
wget "$HADOOP_URL"
echo "STEP 1/10 COMPLETE..."
echo "************************************************************"

# Hadoop will be deployed on "/usr/local" directory, if you want to change it, then modify the path under this section
echo "************************************************************"
echo "Extracting files..."
tar -xzf hadoop-1.2.1.tar.gz
echo "Copying to \"/usr/local/\""
cp -R hadoop-1.2.1 /usr/local/
echo "STEP 2/10 COMPLETE..."
echo "************************************************************"

# Oracle JDK7 will be downloaded from a 3rd party repository (ppa:webupd8team/java. Credit to them). In case, it is unavailable, you may use an alternative in the first command under this section
echo "************************************************************"
echo "Installing Oracle JDK7. Please accept the Oracle license when asked"
add-apt-repository ppa:webupd8team/java
apt-get update
# If you are skeptik about JDK7 (and I won't blame you for that), you may switch comments respectively, in the two lines below to install JDK6
apt-get install oracle-java7-installer
#apt-get install sun-java6-jdk
echo "STEP 3/10 COMPLETE..."
echo "************************************************************"

# This script assumes that you will run Hadoop on new user \"hadoop\". If you wish to choose different user, you're most welcome to; just comment whole section below, but be sure to replace the username everywhere in the script with what you want
echo "************************************************************"
echo "Creating user and group named\" hadoop\""
adduser hadoop
adduser hadoop sudo
echo "Creating home directory for user"
mkdir -p /home/hadoop/tmp
echo "Assigning rights on home directory"
chown -R hadoop:hadoop /home/hadoop/tmp
chmod 755 /home/hadoop/tmp
echo "Changing ownership of Hadoop's installation directory"
chown -R hadoop:hadoop /usr/local/hadoop-1.2.1
echo "STEP 4/10 COMPLETE..."
echo "************************************************************"

# This is to test and initialize the newly created user
echo "************************************************************"
echo "Logging in. ***** PLEASE PROVIDE PASSWORD AND RUN 'exit' (without quotes) TO LOG OUT *****"
su - hadoop
echo "STEP 5/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Setting environment variables"
cd /usr/local/hadoop-1.2.1/conf
echo "export JAVA_HOME=/usr/lib/jvm/java-7-oracle" >> hadoop-env.sh
echo "STEP 6/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Configuring Hadoop properties in *site.xml"
mv core-site.xml core-site.xml.bck
touch core-site.xml
echo "<?xml version=\"1.0\"?>
<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
</configuration>" > core-site.xml
mv mapred-site.xml mapred-site.xml.bck
touch mapred-site.xml
echo "<?xml version=\"1.0\"?>
<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>
<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
</configuration>" > mapred-site.xml
mv hdfs-site.xml hdfs-site.xml.bck
touch hdfs-site.xml
echo "<?xml version=\"1.0\"?>
<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>
<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
</configuration>" > hdfs-site.xml
chown hadoop:hadoop /usr/local/hadoop-1.2.1/conf/*-site.xml
echo "STEP 7/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "export JAVA_HOME=/usr/lib/jvm/java-7-oracle" >> /home/hadoop/.bashrc
echo "export HADOOP_HOME=/usr/local/hadoop-1.2.1" >> /home/hadoop/.bashrc
echo "export PATH=$PATH:/usr/local/hadoop-1.2.1/bin" >> /home/hadoop/.bashrc
echo "STEP 8/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Installing OpenSSH Server"
apt-get install openssh-server
ssh-keygen -t rsa -P ""
cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
echo "STEP 9/10 COMPLETE..."
echo "************************************************************"

echo "************************************************************"
echo "Removing temporary files"
# If you want to preserve the downloaded Hadoop application, you may comment out the first command under this section
rm hadoop-1.2.1.tar.gz
rm -R hadoop-1.2.1
echo "STEP 10/10 COMPLETE..."
echo "************************************************************"
echo "Congratulations! Your Hadoop setup is complete. Please log into hadoop user and start hadoop services."

echo "************************************************************"