July 28, 2015

Hadoop installation on Ubuntu – Hadoop 2.7.1

Prerequisites

  • VirtualBox
  • Ubuntu 10.04 LTS or higher

Note – In this tutorial series I will explain the installation and configuration of every component manually, and we will cover all Hadoop admin-related activities. For further tutorials, use the Installation button in the menu bar. If you need help with VirtualBox and Ubuntu configuration, drop a comment in the comment box and I will upload a new post on VirtualBox + Ubuntu installation. I use gedit instead of the vi editor in this tutorial; if you are familiar with vi, you can use that instead.

Step 1 – Download the JDK 7 tar.gz from the official Oracle site and extract it

tar -zxvf jdk-7u79-linux-i586.tar.gz

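Set the Java environment path

gedit ~/.bashrc

Add the lines below at the end of the file and save it

JAVA_HOME=/root/software/jdk1.7.0_79
export JAVA_HOME
PATH=$PATH:$JAVA_HOME/bin
export PATH

Reload the file so the new variables take effect

source ~/.bashrc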

Check that Java is working.

java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) Client VM (build 24.79-b02, mixed mode)
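
If java -version does not print something like the output above, the usual cause is a JAVA_HOME that points at the wrong directory. Here is a minimal sketch of a check you could run first. The function name check_java_home is my own invention, not part of the JDK, and it only tests that bin/java exists and is executable.

```shell
# Hypothetical helper (not part of the JDK or of this tutorial's original
# steps): checks that a directory looks like a JDK install before it is
# used as JAVA_HOME.
check_java_home() {
  dir="$1"
  if [ -z "$dir" ]; then
    echo "JAVA_HOME is empty"
    return 1
  fi
  if [ ! -x "$dir/bin/java" ]; then
    echo "no executable found at $dir/bin/java"
    return 1
  fi
  echo "ok: $dir"
}
```

After sourcing ~/.bashrc you could call it as check_java_home "$JAVA_HOME".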

Step 2 – Disable IPv6 (Hadoop does not support IPv6)

sudo gedit /etc/sysctl.conf

Add the lines below at the end of the file and save it

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Reload the settings

sudo sysctl -p
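
Once the settings have been reloaded (for example with sudo sysctl -p, or after a reboot), the kernel exposes the current value under /proc. The sketch below reads that value. The function name ipv6_disabled is hypothetical, and the optional argument exists only so the check can be pointed at another file.

```shell
# Hypothetical helper: reports whether a disable_ipv6 flag file contains 1.
# The default path is the standard Linux procfs location for the "all" key.
ipv6_disabled() {
  flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
  [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}
```

A possible use: ipv6_disabled && echo "IPv6 is off"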

Step 3 – Install openssh-server

sudo apt-get install openssh-server

Step 4 – Create the Hadoop user

sudo addgroup hadoop

sudo adduser --ingroup hadoop hduser

My Console logs -
root@guest:~# sudo addgroup hadoop
Adding group `hadoop' (GID 1001) ...
Done.
root@guest:~# sudo adduser --ingroup hadoop hduser
Adding user `hduser' ...
Adding new user `hduser' (1001) with group `hadoop' ...
Creating home directory `/home/hduser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully
Changing the user information for hduser
Enter the new value, or press ENTER for the default
    Full Name []: 
    Room Number []: 
    Work Phone []: 
    Home Phone []: 
    Other []: 
Is the information correct? [Y/n] Y
root@guest:~#

Add hduser to the sudo group

sudo adduser hduser sudo

root@guest:~# sudo adduser hduser sudo
Adding user `hduser' to group `sudo' ...
Adding user hduser to group sudo
Done.
root@guest:~#
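
To confirm that the group changes took effect, you can look at the output of id -nG for the user. The sketch below wraps that in a small function. The name in_group is my own, and it assumes group names contain no spaces.

```shell
# Hypothetical helper: succeeds if the given user belongs to the given group.
in_group() {
  user="$1"
  group="$2"
  # id -nG prints the user's group names separated by spaces
  for g in $(id -nG "$user" 2>/dev/null); do
    [ "$g" = "$group" ] && return 0
  done
  return 1
}
```

For this tutorial the calls would be in_group hduser hadoop and in_group hduser sudo.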

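Step 5 – Switch to hduser and generate an RSA key pair

The Hadoop start scripts launch the daemons over SSH, so hduser needs password-less SSH access to localhost.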

su - hduser

ssh-keygen -t rsa -P ""

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

ssh localhost

root@guest:~# su - hduser
hduser@guest:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa): 
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
3a:59:39:c7:2a:14:5a:f8:28:34:8e:0d:b9:e1:39:d1 hduser@guest
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
| ..  .           |
|+.oE. o          |
|.Oo. = . o       |
|o++ o o S o      |
|  .. . + +       |
|      = .        |
|       o         |
|                 |
+-----------------+
hduser@guest:~$ cat /home/hduser/.ssh/id_rsa.pub >> /home/hduser/.ssh/authorized_keys
hduser@guest:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is aa:e1:84:60:16:49:5b:0a:c4:02:ff:b2:62:16:89:c2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux guest 2.6.32-38-generic #83-Ubuntu SMP Wed Jan 4 11:13:04 UTC 2012 i686 GNU/Linux
Ubuntu 10.04.4 LTS

Welcome to Ubuntu!
 * Documentation:  https://help.ubuntu.com/

271 packages can be updated.
244 updates are security updates.


The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
hduser@guest:~$ 
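
One thing to watch with the cat ... >> authorized_keys step: running it twice appends the key twice, and sshd in its default strict mode can reject keys when ~/.ssh or authorized_keys is writable by others. A minimal sketch of a safer variant follows. The function name authorize_key is hypothetical, and it assumes the public key file holds a single line.

```shell
# Hypothetical helper: appends a public key to an authorized_keys file only
# if that exact line is not already present, then tightens permissions.
authorize_key() {
  pub_file="$1"
  auth_file="$2"
  key_line=$(cat "$pub_file") || return 1
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # -x: match the whole line, -F: treat the key as a fixed string
  if ! grep -qxF -e "$key_line" "$auth_file"; then
    printf '%s\n' "$key_line" >> "$auth_file"
  fi
  chmod 700 "$(dirname "$auth_file")"
  chmod 600 "$auth_file"
}
```

As hduser you could run authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys" and then try ssh localhost again.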

Step 6 – Hadoop Installation

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

tar -xvf hadoop-2.7.1.tar.gz

Make a directory for Hadoop and move the extracted folder into it.

sudo mkdir -p /usr/hadoop/

sudo mv hadoop-2.7.1 /usr/hadoop/

Give ownership of this folder to hduser

sudo chown -R hduser:hadoop /usr/hadoop/hadoop-2.7.1
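
The daemons run as hduser and write their logs under the Hadoop folder, so it is worth checking that the chown really covered everything. This is a minimal sketch using find. The function name list_wrong_owner is my own, and no output means every path has the expected owner and group.

```shell
# Hypothetical helper: lists every path under a directory that is NOT owned
# by the expected user and group. No output means the chown took effect.
list_wrong_owner() {
  root="$1"
  user="$2"
  group="$3"
  find "$root" \( ! -user "$user" -o ! -group "$group" \) -print
}
```

For this step the call would be list_wrong_owner /usr/hadoop/hadoop-2.7.1 hduser hadoop.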

Step 7 – Environment configuration

gedit ~/.bashrc

Paste the content below at the end of the file, save it, and then run source ~/.bashrc to apply it

export HADOOP_HOME=/usr/hadoop/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
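
A typo in one of these export lines usually only shows up later as a confusing startup error. The sketch below is one way to check, in the current shell, that every variable is set and non-empty. The function name require_vars is hypothetical and it uses eval to look up each variable by name, so only pass it plain variable names.

```shell
# Hypothetical helper: prints the name of every listed variable that is
# unset or empty in the current shell, and fails if any were found.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup: copy the value of the variable called $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

After reloading ~/.bashrc you could run require_vars JAVA_HOME HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME and expect no output.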

Hadoop configuration files

cd /usr/hadoop/hadoop-2.7.1/etc/hadoop

gedit core-site.xml

Add the content below to the file

<configuration>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/hadoop/tmp</value>
 </property>
 <property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
 </property>
</configuration>

Save the file, then create the tmp directory it refers to

sudo mkdir /usr/hadoop/tmp

Make hduser the owner of this folder

sudo chown hduser:hadoop /usr/hadoop/tmp

gedit hdfs-site.xml

Add the content below to this file

<configuration>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/usr/hadoop_data/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/usr/hadoop_data/datanode</value>
 </property>
</configuration>

Save the file, then create the data directory it refers to

sudo mkdir /usr/hadoop_data

sudo chown -R hduser:hadoop /usr/hadoop_data

Next, create mapred-site.xml from mapred-site.xml.template by copying the template and dropping .template from the file name

cp /usr/hadoop/hadoop-2.7.1/etc/hadoop/mapred-site.xml.template /usr/hadoop/hadoop-2.7.1/etc/hadoop/mapred-site.xml

gedit mapred-site.xml

Add the lines below

<configuration>
 <property>
 <name>mapred.job.tracker</name>
 <value>localhost:9001</value>
 </property>
 <property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
 </property>
</configuration>

gedit yarn-site.xml

Add the lines below and save the file

<configuration>
 <property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
 </property>
 <property>
 <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
 <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
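</configuration>

gedit hadoop-env.sh

Set JAVA_HOME to the same JDK path used in Step 1. hduser must be able to read this directory, and /root is normally accessible to root only, so adjust the path if your JDK lives elsewhere.

export JAVA_HOME=/root/software/jdk1.7.0_79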
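
Before starting anything it helps to read a few values back out of the *-site.xml files, since a mistyped property name is silently ignored. The sketch below is a deliberately naive reader built on awk. The function name get_property is hypothetical, and it assumes each name and value tag sits on a single line with the value following the name, as in the files above. It is not a real XML parser.

```shell
# Hypothetical helper: prints the value of one property from a Hadoop
# *-site.xml file. Naive line-based matching, not a real XML parser.
get_property() {
  file="$1"
  key="$2"
  awk -v key="$key" '
    /<name>/ {
      # Remember the most recent property name seen
      name = $0
      sub(/.*<name>/, "", name)
      sub(/<\/name>.*/, "", name)
    }
    /<value>/ && name == key {
      # First value after the matching name: print it and stop
      value = $0
      sub(/.*<value>/, "", value)
      sub(/<\/value>.*/, "", value)
      print value
      exit
    }
  ' "$file"
}
```

For example, get_property hdfs-site.xml dfs.replication should print the replication factor you configured. Once Hadoop is on the PATH, hdfs getconf -confKey dfs.replication asks Hadoop itself for the effective value.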

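Step 8 – Format the NameNode and start Hadoop

Format the NameNode once, before the first start. Formatting it again later erases the HDFS metadata.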

hdfs namenode -format

hduser@guest:/usr/hadoop/hadoop-2.7.1$ bin/hdfs namenode -format
.. 
..
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at guest/127.0.1.1
************************************************************/
hduser@guest:/usr/hadoop/hadoop-2.7.1$

Start all the Hadoop daemons

sbin/start-all.sh

hduser@guest:/usr/hadoop/hadoop-2.7.1$ sbin/start-all.sh

Check which Java processes are running

jps
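
On a single-node setup like this one, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. Here is a minimal sketch that reads jps output and reports which of those are absent. The function name missing_daemons is my own, and it assumes the default jps output format of a process ID followed by the class name.

```shell
# Hypothetical helper: reads jps output on stdin and prints each expected
# Hadoop daemon that does not appear in it. Fails if any are missing.
missing_daemons() {
  jps_output=$(cat)
  status=0
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <name>"; compare the second field exactly so that
    # SecondaryNameNode is not mistaken for NameNode
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
      status=1
    fi
  done
  return "$status"
}
```

You would run it as jps | missing_daemons. If a daemon is reported, its log file under the logs folder of the Hadoop install is the first place to look.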
YARN (ResourceManager) web UI – http://localhost:8088
NameNode web UI – http://localhost:50070

In the next tutorial we will see how to install and configure the latest Hive.
