Showing posts from January, 2014

7-step guide to set up Hadoop (single-node Cluster) on Ubuntu Virtual Machine

Apache Hadoop is a framework for distributed computing built around the Hadoop Distributed File System (HDFS). It offers a solution for large-scale data storage and processing by distributing the storage, retrieval and processing of data across inexpensive hardware. Attractive as that sounds, deploying and configuring Hadoop is unfortunately not a trivial process; it takes considerable effort, time and expertise. In this post, we will try to demonstrate how to set up a single-node Hadoop cluster on an Ubuntu Linux virtual machine, purely for experimental purposes. If done correctly, the experience might also be helpful for large-scale deployments. Let's get started.

Step 1: Getting Ubuntu VM ready

The first step is to install Oracle VirtualBox on your machine and create an Ubuntu (13.04) virtual machine. After successfully creating the virtual machine (VM), launch it and update its OS using the following combination of commands in the Terminal (Ctrl+Alt+T):

# Update your OS usin