Hadoop Installation and Basic MapReduce Program Examples

Hadoop 2.9.0 Single Node Cluster Configuration on Ubuntu

Run the following commands in an Ubuntu terminal. An internet connection is required.

Installing Java

1. sudo apt-get update

2. sudo apt install openjdk-8-jdk

3. java -version

4. sudo update-alternatives --config java

Adding a dedicated Hadoop user

5. sudo addgroup hadoop

6. sudo adduser --ingroup hadoop hduser

7. groups hduser

8. sudo adduser hduser sudo

Installing SSH

9. sudo apt-get install ssh

10. which ssh

11. which sshd

12. su hduser

13. ssh-keygen

(Press Enter at each prompt and leave the passphrase empty, so the key can be used for passwordless logins.)

14. cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

15. ssh localhost

Download Hadoop

Download Hadoop 2.9.0

16. wget https://archive.apache.org/dist/hadoop/core/hadoop-2.9.0/hadoop-2.9.0.tar.gz

Extract the Hadoop tarball

17. tar xvzf hadoop-2.9.0.tar.gz

cd hadoop-2.9.0

18. sudo mkdir -p /usr/local/hadoop

19. sudo mv * /usr/local/hadoop

20. sudo chown -R hduser:hadoop /usr/local/hadoop

Setup Configuration Files

The following files need to be configured one by one

I. ~/.bashrc

II. /usr/local/hadoop/etc/hadoop/hadoop-env.sh

III. /usr/local/hadoop/etc/hadoop/core-site.xml

IV. /usr/local/hadoop/etc/hadoop/mapred-site.xml (created from mapred-site.xml.template)

V. /usr/local/hadoop/etc/hadoop/hdfs-site.xml

VI. /usr/local/hadoop/etc/hadoop/yarn-site.xml

Install the vim editor

sudo apt install vim-gtk3

21. vim ~/.bashrc

Append the following lines to ~/.bashrc

# Hadoop environment variables (Hadoop 2.9.0 single-node setup)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#export HADOOP_HOME=/usr/local/hadoop

22. source ~/.bashrc

23. javac -version

24. which javac

25. readlink -f /usr/bin/javac

hadoop-env.sh

26. vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Add the following lines to hadoop-env.sh

# The java implementation to use.

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

core-site.xml

27. sudo mkdir -p /app/hadoop/tmp

28. sudo chown hduser:hadoop /app/hadoop/tmp

29. vim /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following lines to core-site.xml

<configuration>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
 </property>
<property>
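  <!-- fs.default.name is the old (deprecated) name of fs.defaultFS; Hadoop 2.x accepts both -->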
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
 </property>
</configuration>
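
The scheme of this URI is what selects the FileSystem implementation at run time. Below is a minimal sketch of a client that picks the setting up (the class name FsCheck is made up for illustration; it assumes the Hadoop client libraries are on the classpath and the daemons are running):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCheck {
  public static void main(String[] args) throws Exception {
    // new Configuration() loads core-site.xml from the classpath, so
    // fs.default.name = hdfs://localhost:54310 selects HDFS here.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    System.out.println("default filesystem: " + fs.getUri());
    System.out.println("/ exists: " + fs.exists(new Path("/")));
  }
}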


mapred-site.xml

30. cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

31. vim /usr/local/hadoop/etc/hadoop/mapred-site.xml

Add the following lines to mapred-site.xml

<configuration>
 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs.
  Can be one of local, classic or yarn. Without this set to yarn,
  jobs run in the local runner instead of on the cluster.
  </description>
 </property>
 <property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task. (Legacy Hadoop 1.x setting kept for reference;
  Hadoop 2.x has no JobTracker, so it is ignored when YARN is used.)
  </description>
 </property>
</configuration>

Create the NameNode and DataNode directories

32. sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode

33. sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode

34. sudo chown -R hduser:hadoop /usr/local/hadoop_store

hdfs-site.xml

35. vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Add the following lines to hdfs-site.xml

<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
 </property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
 </property>
</configuration>

yarn-site.xml

36. vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

Add the following lines to yarn-site.xml

<configuration>
 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>The auxiliary shuffle service run by the NodeManagers,
  required for MapReduce jobs to move map output to the reducers.
  </description>
 </property>
</configuration>

Format the New Hadoop Filesystem

37. hdfs namenode -format

(hadoop namenode -format still works but is deprecated in Hadoop 2.x. Formatting erases everything in HDFS, so run it only once, when the cluster is first set up.)

Start the HDFS and YARN daemons

38. start-dfs.sh

39. start-yarn.sh

40. jps

or

start-all.sh (deprecated; prefer start-dfs.sh and start-yarn.sh)

netstat -plten | grep java

jps

jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager once everything is up.

Stop the HDFS and YARN daemons

stop-dfs.sh

stop-yarn.sh

jps

or

stop-all.sh (deprecated; prefer stop-dfs.sh and stop-yarn.sh)

netstat -plten | grep java

Web UIs of the Hadoop daemons

NameNode - http://localhost:50070/

SecondaryNameNode - http://localhost:50090/status.jsp

ResourceManager - http://localhost:8088/
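
MapReduce Basic Program Example: WordCount

The canonical first MapReduce program is WordCount, reproduced below from the official Apache Hadoop MapReduce tutorial with explanatory comments added. It counts how often each word occurs across all files in an input directory: the mapper emits a (word, 1) pair for every token, and the reducer sums the counts per word.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for each input line, emit (word, 1) for every token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner, since addition is associative):
  // sums all counts received for a word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Compile, package and run it as hduser while the daemons are running. The /user/hduser paths below are placeholders; any HDFS directory works.

export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar

hadoop com.sun.tools.javac.Main WordCount.java

jar cf wc.jar WordCount*.class

hdfs dfs -mkdir -p /user/hduser/input

hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml /user/hduser/input

hadoop jar wc.jar WordCount /user/hduser/input /user/hduser/output

hdfs dfs -cat /user/hduser/output/part-r-00000

The release also ships precompiled examples, so the same job can be run without compiling anything:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar wordcount /user/hduser/input /user/hduser/output2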