Hadoop Installation and MapReduce Basic Program Examples
Hadoop 2.9.0 Single Node Cluster Configuration on Ubuntu
Run the following commands in an Ubuntu terminal. An internet connection is required.
Installing Java
1. sudo apt-get update
2. sudo apt install openjdk-8-jdk
3. java -version
4. update-alternatives --config java
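The JDK path found here is needed again later (in ~/.bashrc and hadoop-env.sh). As a sketch, the path can be derived automatically from the active javac binary rather than typed by hand; the fallback path /usr/bin/javac is an assumption for systems where `command -v` finds nothing:

```shell
# Resolve the active javac through any symlink chain, then strip
# the trailing /bin/javac to get the JDK root (the JAVA_HOME value).
JAVAC_PATH=$(readlink -f "$(command -v javac || echo /usr/bin/javac)")
JAVA_HOME_GUESS=${JAVAC_PATH%/bin/javac}
echo "JAVA_HOME should be: $JAVA_HOME_GUESS"
```

With openjdk-8-jdk installed as above, this should print /usr/lib/jvm/java-8-openjdk-amd64.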
Adding a dedicated Hadoop user
5. sudo addgroup hadoop
6. sudo adduser --ingroup hadoop hduser
7. groups hduser
8. sudo adduser hduser sudo
Installing SSH
9. sudo apt-get install ssh
10. which ssh
11. which sshd
12. su hduser
13. ssh-keygen
14. cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
15. ssh localhost
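If `ssh localhost` still prompts for a password, the usual cause is over-permissive key files: sshd silently ignores them unless ~/.ssh is 700 and authorized_keys is 600. A small sketch that repairs and reports those permissions (SSH_DIR defaults to ~/.ssh but can be pointed elsewhere):

```shell
# Enforce the permissions sshd requires for passwordless login.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
touch "$SSH_DIR/authorized_keys"
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"
# stat -c %a prints the octal permission bits (GNU coreutils).
echo "dir perms:  $(stat -c %a "$SSH_DIR")"
echo "file perms: $(stat -c %a "$SSH_DIR/authorized_keys")"
```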
Download Hadoop
Download Hadoop 2.9.0
16. wget https://archive.apache.org/dist/hadoop/core/hadoop-2.9.0/hadoop-2.9.0.tar.gz
Extract the Hadoop tar.gz file
17. tar xvzf hadoop-2.9.0.tar.gz
cd hadoop-2.9.0
18. sudo mkdir -p /usr/local/hadoop
19. sudo mv * /usr/local/hadoop
20. sudo chown -R hduser:hadoop /usr/local/hadoop
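To confirm the chown in step 20 took effect, ownership can be checked with stat; this helper is only a convenience sketch:

```shell
# Print owner:group of a path (GNU stat). After step 20 this should
# report hduser:hadoop for /usr/local/hadoop.
owner_of() { stat -c '%U:%G' "$1"; }
owner_of /usr/local/hadoop 2>/dev/null || echo "/usr/local/hadoop not found"
```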
Setup Configuration Files
The following files need to be configured one by one
I. ~/.bashrc
II. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
III. /usr/local/hadoop/etc/hadoop/core-site.xml
IV. /usr/local/hadoop/etc/hadoop/mapred-site.xml
V. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
VI. /usr/local/hadoop/etc/hadoop/yarn-site.xml
Install vim editor
sudo apt install vim-gtk3
21. vim ~/.bashrc
Append the following lines to the ~/.bashrc file
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#export HADOOP_HOME=/usr/local/hadoop/sbin
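After running `source ~/.bashrc` (step 22 below), it is worth checking that PATH actually picked up the Hadoop bin directory. A small sketch; the `path_contains` helper is not part of Hadoop, just a portable way to test membership in a colon-separated path list:

```shell
# Return success if directory $1 appears in the colon-separated
# path list $2 (defaults to the current PATH).
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

HADOOP_INSTALL=${HADOOP_INSTALL:-/usr/local/hadoop}
if path_contains "$HADOOP_INSTALL/bin"; then
    echo "PATH contains $HADOOP_INSTALL/bin"
else
    echo "PATH is missing $HADOOP_INSTALL/bin (re-run: source ~/.bashrc)"
fi
```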
hadoop-env.sh
22. source ~/.bashrc
23. javac -version
24. which javac
25. readlink -f /usr/bin/javac
26. vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add the following lines to hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
core-site.xml
27. sudo mkdir -p /app/hadoop/tmp
28. sudo chown hduser:hadoop /app/hadoop/tmp
29. vim /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following lines to core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
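Typos in these XML files are a common source of startup failures. As a sketch, a property's value can be pulled back out of any *-site.xml for double-checking; this relies on the one-tag-per-line layout used throughout this guide:

```shell
# Extract the <value> that follows a given <name> in a Hadoop
# *-site.xml file. Usage: get_prop <property-name> <file>
get_prop() {
    grep -A1 "<name>$1</name>" "$2" \
        | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

For example, `get_prop fs.default.name /usr/local/hadoop/etc/hadoop/core-site.xml` should print hdfs://localhost:54310 after the edit above.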
mapred-site.xml
30. cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
31. vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
Add the following lines to mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
Create the NameNode and DataNode directories
32. sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
33. sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
34. sudo chown -R hduser:hadoop /usr/local/hadoop_store
hdfs-site.xml
35. vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add the following lines to hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>
yarn-site.xml
36. vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
Add the following lines to yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Format the New Hadoop Filesystem
37. hdfs namenode -format
(hadoop namenode -format still works in 2.x but prints a deprecation warning.)
Start NameNode daemon and DataNode daemon
38. start-dfs.sh
39. start-yarn.sh
40. jps
or
start-all.sh
netstat
jps
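After starting the daemons, `jps` should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (plus Jps itself). A sketch that greps a captured jps listing for each expected daemon; `check_daemons` is just an illustrative helper, not a Hadoop command:

```shell
# Report which of the five expected daemons appear in a jps listing.
# Usage: check_daemons "<jps output>"
check_daemons() {
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        if printf '%s\n' "$1" | grep -qw "$d"; then
            echo "running: $d"
        else
            echo "MISSING: $d"
        fi
    done
}
check_daemons "$(jps 2>/dev/null || true)"
```

Note that `grep -w` matches whole words, so searching for NameNode does not falsely match SecondaryNameNode.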
Stop NameNode daemon and DataNode daemon
stop-dfs.sh
stop-yarn.sh
jps
or
stop-all.sh
netstat
Web UIs of the Hadoop daemons
NameNode - http://localhost:50070/
SecondaryNameNode
http://localhost:50090/status.jsp
Resource Manager:
http://localhost:8088/
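The three UIs above can also be probed from the terminal. A sketch using curl: an HTTP status of 200 means the daemon's UI is up; 000 means no connection (daemon not running, or curl unavailable). The ports are the defaults listed above:

```shell
# Print "<status> <url>" for each daemon web UI.
for url in http://localhost:50070/ http://localhost:50090/status.jsp http://localhost:8088/; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" 2>/dev/null) || true
    echo "${code:-000} $url"
done
```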