Modified from https://tecadmin.net/setup-hadoop-on-ubuntu/
IPv6 disable: https://itsfoss.com/disable-ipv6-ubuntu-linux/
Apache Hadoop 3.2 has noticeable improvements and many bug fixes over the previous stable 3.x releases, including many improvements in HDFS and MapReduce. This tutorial will help you install and configure a Hadoop 3.2.1 single-node cluster on Ubuntu 18.04, 16.04 LTS and Linux Mint systems. This article has been tested with Ubuntu 18.04 LTS.
Java is the primary requirement for running Hadoop on any system, so make sure Java is installed using the following command. If you don't have Java installed on your system, install Java 1.8 (OpenJDK) first.
On the command line, type:
$ sudo apt-get install openjdk-8-jdk
The openjdk-8-jre package contains just the Java Runtime Environment. If you want to develop Java programs then please install the openjdk-8-jdk package.
java -version
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)
$ jrunscript -e 'java.lang.System.out.println(java.lang.System.getProperty("java.home"));'
/usr/lib/jvm/java-8-openjdk-amd64/jre
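Optionally, if more than one Java version is installed, you can switch the system default using Ubuntu's alternatives system:
sudo update-alternatives --config java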
We recommend creating a normal (not root) account for Hadoop to work under. Create the account using the following command.
adduser hadoop
After creating the account, you also need to set up key-based SSH to its own account. To do this, execute the following commands.
su - hadoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Now SSH to localhost as the hadoop user. This should not ask for a password, but the first time it will prompt you to add the host's key to the list of known hosts.
ssh localhost
exit
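To confirm that key-based login really works without a prompt, you can also run a non-interactive test; with BatchMode enabled, ssh fails instead of asking for a password:
ssh -o BatchMode=yes localhost 'echo key-based SSH OK'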
Optionally, for convenience, you may want to make the hadoop user a sudoer. You can follow this guide:
https://www.digitalocean.com/community/tutorials/how-to-create-a-sudo-user-on-ubuntu-quickstart
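If you prefer a one-liner over the full guide, on Ubuntu adding the user to the sudo group is enough (run this as a user that already has sudo rights):
sudo usermod -aG sudo hadoop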
In this step, download the Hadoop 3.2.1 binary archive using the command below. You can also select an alternate download mirror to increase download speed.
cd ~
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar xzf hadoop-3.2.1.tar.gz
mv hadoop-3.2.1 hadoop
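Optionally, verify the download. Apache publishes a SHA-512 checksum next to each release; compute the digest locally and compare it with the published one (the exact layout of the .sha512 file varies between releases, so comparing by eye is the safest route):
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz.sha512
sha512sum hadoop-3.2.1.tar.gz
# compare the printed digest with the contents of hadoop-3.2.1.tar.gz.sha512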
Set up the environment variables used by Hadoop. Edit the ~/.bashrc file and append the following values at the end of the file.
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
# strip the trailing /jre (if present) so that $JAVA_HOME/lib/tools.jar resolves on JDK 8
export JAVA_HOME="$(jrunscript -e 'java.lang.System.out.println(java.lang.System.getProperty("java.home"));' | sed 's:/jre$::')"
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$(hadoop classpath):.
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
Then apply the changes to the current running environment:
source ~/.bashrc
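As a quick sanity check that the variables took effect, the hadoop command should now be on your PATH:
echo $HADOOP_HOME
hadoop version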
Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable. Change the Java path to match the installation on your system; this path may vary by operating system version and installation source, so make sure you use the correct path.
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Update the entry below (the same command used in ~/.bashrc works here too):
export JAVA_HOME="$(jrunscript -e 'java.lang.System.out.println(java.lang.System.getProperty("java.home"));' | sed 's:/jre$::')"
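If jrunscript is not convenient here, hard-coding the path found earlier works just as well; the path below is the one reported on this tutorial's test machine and may differ on yours:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64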
Hadoop has many configuration files, which need to be configured per the requirements of your Hadoop infrastructure. Let's start with the configuration for a basic Hadoop single-node cluster. First, navigate to the location below:
cd $HADOOP_HOME/etc/hadoop
Edit core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
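Note that fs.defaultFS is the current name for the deprecated fs.default.name key. Once this file is saved, you can confirm that Hadoop picks the value up with the getconf tool (it only reads the local configuration, so nothing needs to be running yet):
hdfs getconf -confKey fs.defaultFS
It should print hdfs://localhost:9000.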
Edit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>
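Here dfs.namenode.name.dir and dfs.datanode.data.dir are the current names for the deprecated dfs.name.dir and dfs.data.dir keys. The two directories they point at are not necessarily created for you, so it does no harm to create them up front:
mkdir -p ~/hadoopdata/hdfs/namenode ~/hadoopdata/hdfs/datanode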
Edit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
</configuration>
Edit yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Now format the NameNode using the following command, and make sure the output reports that the storage directory has been successfully formatted.
hdfs namenode -format
Sample output:
WARNING: /home/hadoop/hadoop/logs does not exist. Creating.
2018-05-02 17:52:09,678 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = tecadmin/127.0.1.1
STARTUP_MSG: args = [-format]
...
...
...
2018-05-02 17:52:13,717 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
2018-05-02 17:52:13,806 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
2018-05-02 17:52:14,161 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds .
2018-05-02 17:52:14,224 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2018-05-02 17:52:14,282 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at tecadmin/127.0.1.1
************************************************************/
Let's start your Hadoop cluster using the scripts provided by Hadoop. Just navigate to your $HADOOP_HOME/sbin directory and execute the scripts one by one.
cd $HADOOP_HOME/sbin/
Now execute the start-dfs.sh script.
./start-dfs.sh
Then execute the start-yarn.sh script.
./start-yarn.sh
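Verify that all five daemons came up using the JDK's jps tool:
jps
The list should include NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps itself). If one is missing, check the logs under $HADOOP_HOME/logs.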
The Hadoop NameNode web interface listens on port 9870 by default. Access your server on port 9870 in your favorite web browser (the YARN ResourceManager UI is served on port 8088 by default):
http://your-server-ip:9870/
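As a final smoke test, you can run one of the MapReduce examples bundled with the release against HDFS; this is a minimal sketch following the standard Hadoop single-node walkthrough, and the jar version must match the release you installed:
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/hadoop/input/
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /user/hadoop/input /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-r-00000 | head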