Skip to main content

Posts

Showing posts from 2014

A step-by-step guide to query data on Hadoop using Hive

Hadoop empowers us to solve problems that require intense processing and storage on commodity hardware harnessing the power of distributed computing, while ensuring reliability. When it comes to applicability beyond experimental purposes, the industry welcomes Hadoop with warm heart, as it can query their databases in realistic time regardless of the volume of data. In this post, we will try to run some experiments to see how this can be done.

Before you start, make sure you have set up a Hadoop cluster. We will use Hive, a data warehouse to query large data sets and a adequate-sized sample data set, along with an imaginary database of a travelling agency on MySQL; the DB consisting of details about their clients, including Flight bookings, details of bookings and hotel reservations. Their data model is as below:



The number of records in the database tables are as:
- booking: 2.1M
- booking_detail: 2.1M
- booking_hotel: 1.48M
- city: 2.2K

We will write a query that retrieves total country-wi…

Finally, a way to speed up Android emulator

The Android emulator's speed is killer. Literally. Android developers know this; irrespective of platform you are using, Windows, Linux or Macintosh, and your hardware specification, the emulator provided with the Android's SDK crawls like a snail.

Thankfully, the developers at ‎Intel® have come up with a hardware-assisted virtualization engine that uses the power of hardware virtualization to boost the performance of Android emulators. Here is how you can configure it:

1. First, you need to enable Hardware Virtualization (VTx) technology from the BIOS settings on your computer. Since different vendors have different settings, you may have to search on the web on how to do so.

2. Next, you need to download and install Android HAX Manager from Intel. During installation, you will be asked to reserve the amount of memory for HAX, keep it default (1024MB).

3. Now go to your Android SDK directory and launch "SDK Manager.exe".

4. Under Extras, check "Intel x86 Emulator Ac…

Hadoop Bullet: a simple script to deploy Hadoop on fresh machine in automated fashion

Installing Hadoop is a hassle; it involves a variety of steps, some proficiency on Linux commands and writing to various files. If you have tried manual installation, you know what I'm talking about.

So, here is a simple Linux shell script. Save the following script as bulletinstall.sh and on your Ubuntu-ready machine, run it using:

$ sudo sh bulletinstall.sh

This script has been tested on Ubuntu 14.04 LTS; if you experience any issues, feel free to drop a comment. Here is the script:


#!/bin/bash

# This document is free to share and/or modify, and comes with ABSOLUTELY NO WARRANTIES. I will not be responsible for any damage or corruption caused to your Computer. Do know your stuff before you run this and backup your important files before trying out.

# Author: owaishussain@outlook.com

# LINUX SCRIPT TO INSTALL HADOOP 1.2.1 ON A MACHINE

# If you already have this file, then put it in /tmp directory and comment out "wget"
HADOOP_URL=http://www.us.apache.org/dist/hadoop/common/hadoo…

Step-by-step guide to set up Multi-node Hadoop Cluster on Ubuntu Virtual Machines

If you've landed on this page, I know your feelings. Wanna know how it feels when it's done? Ride on, it's like Roller Coaster...

You have successfully configured a single-node cluster in 7 easy steps. Good! But you are yet to taste the real essence of Hadoop. Recall that the primary purpose of Hadoop is to distribute a very lengthy task to more than one machines. This is exactly what we are going to do, but the only difference is that we will be doing so in Virtual machines.

Step 1: NetworkingWe have several things to do first with the existing VM, beginning with disabling IPv6. This is a recommendation because Hadoop currently does not support IPv6 according to their official Wiki. In order to do so, you will have to modify a fine named /etc/sysctl.conf:
- Launch your Virtual Machine from Virtualbox
- Open your terminal and run:
$ sudo nano /etc/sysctl.conf

- Add the following lines at the end of the file:
# Disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.di…

Executing MapReduce Applications on Hadoop (Single-node Cluster) - Part 3

In our previous experiment, we ran source code of Word count MapReduce application eclipse. This time, we are going to write our own piece of code.

Remember Permutations and Combinations you studied in College? We will write a fresh approach to compute combinations of all strings in a file. You'll have to make a very few changes to the existing code.

First, you need to create a text file with some words separated by spaces:
- Create a new text file named words.txt in /home/hadoop/Documents/combinations/
- Enter some text like:
Astronomystar sun earth moon milkyway asteroid pulsar nebula mars venus jupiter neptune saturn blackhole galaxy cygnus cosmic comet solar eclipse globular panorama apollo discovery seti aurora dwarf halebopp plasmasphere supernova cluster europa juno keplar helios indego genamede neutrinos callisto messier nashville sagittarius corona circinus hydra whirlpool rosette tucanaeAndroidcupcake donut eclair froyo gingerbread honeycomb icecreamsandwich jellybean kitkat …