
Setting up an Apache Hadoop 2.7 single node on Ubuntu 14.04

I recently set up an Apache Hadoop 2.7 single-node “cluster” for testing purposes, and thought it might be useful to write the process down and share it. Here is everything you need to get it running on Ubuntu 14.04.

  1. Prerequisites:
    • Java 7 or later
      To check the installed version, run:
      java -version
      

      To install Java 8, run:

      sudo add-apt-repository ppa:webupd8team/java
      sudo apt-get update
      sudo apt-get install oracle-java8-installer
      sudo apt-get install oracle-java8-set-default
      
    • An SSH server that accepts password authentication (at least for the duration of the setup).
      To install, run:
      sudo apt-get install openssh-server
      

      To enable password authentication (if you have just installed the SSH server, you probably do not need to do this), run:

      sudo sed -i -e 's/PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config
      sudo service ssh restart
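
      To confirm the change took effect, you can ask sshd to dump its effective configuration (sshd -T prints the settings the server is actually using):

      sudo sshd -T | grep -i passwordauthentication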
      
  2. Create hadoop group and user:
    sudo addgroup hadoop
    sudo adduser --ingroup hadoop hadoop
    

    Provide the information you are asked for.
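
    If you are scripting this setup, a non-interactive variant of the adduser call should also work (it skips the questions and creates the account without a password, so set one afterwards with sudo passwd hadoop):

    sudo adduser --ingroup hadoop --disabled-password --gecos "" hadoop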

  3. Download and unpack Apache Hadoop 2.7 (pick the closest mirror from the Apache mirrors list):

    wget http://apache.rediris.es/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz
    sudo tar -xzvf hadoop-2.7.0.tar.gz -C /usr/local/lib/
    sudo chown -R hadoop:hadoop /usr/local/lib/hadoop-2.7.0
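
    As a quick sanity check that the archive unpacked correctly, list the bin directory; it should contain the hadoop, hdfs and yarn launchers:

    ls /usr/local/lib/hadoop-2.7.0/bin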
    
  4. Create HDFS directories:
    sudo mkdir -p /var/lib/hadoop/hdfs/namenode
    sudo mkdir -p /var/lib/hadoop/hdfs/datanode
    sudo chown -R hadoop /var/lib/hadoop
    

    If you want to use different paths, change the values in step 7.9 accordingly.

  5. Log in as the hadoop user:

    sudo su - hadoop
    
  6. Create SSH key and add it to authorized keys:
    ssh-keygen -t rsa -P ""
    ssh-copy-id -i ~/.ssh/id_rsa localhost
    

    You might be asked to accept the machine key and to provide the hadoop user’s password.
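
    Afterwards, logging in without a password should work. BatchMode=yes makes ssh fail instead of prompting, so it is a convenient test:

    ssh -o BatchMode=yes localhost echo "passwordless SSH OK"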

  7. Configure Hadoop:

    1. Check where your Java is installed:
      readlink -f /usr/bin/java
      

      If you get something like /usr/lib/jvm/java-8-oracle/jre/bin/java, then /usr/lib/jvm/java-8-oracle is what you should use for JAVA_HOME.

    2. Add the following to your ~/.bashrc file:

      export JAVA_HOME=/usr/lib/jvm/java-8-oracle
      export HADOOP_INSTALL=/usr/local/lib/hadoop-2.7.0
      export PATH=$PATH:$HADOOP_INSTALL/bin
      export PATH=$PATH:$HADOOP_INSTALL/sbin
      export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
      export HADOOP_COMMON_HOME=$HADOOP_INSTALL
      export HADOOP_HDFS_HOME=$HADOOP_INSTALL
      export YARN_HOME=$HADOOP_INSTALL
      export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
      export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
      
    3. Reload the ~/.bashrc file:
      source ~/.bashrc
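
      To confirm the variables took effect, check that the Hadoop launcher is now on your PATH:

      which hadoop
      hadoop version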
      
    4. Modify JAVA_HOME in /usr/local/lib/hadoop-2.7.0/etc/hadoop/hadoop-env.sh:
      export JAVA_HOME=/usr/lib/jvm/java-8-oracle
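
      If you prefer to make the change from the shell, a sed one-liner like this should work (it assumes hadoop-env.sh sets JAVA_HOME on a single export line, as the stock file does):

      sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-8-oracle|' /usr/local/lib/hadoop-2.7.0/etc/hadoop/hadoop-env.sh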
      
    5. Modify /usr/local/lib/hadoop-2.7.0/etc/hadoop/core-site.xml to have something like:
      <configuration>
        ...
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:9000</value>
        </property>
        ...
      </configuration>
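
      With the environment from step 7.2 in place, you can check which value Hadoop actually picks up (hdfs getconf only reads the configuration files, so no daemons need to be running yet):

      hdfs getconf -confKey fs.defaultFS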
      
    6. Modify /usr/local/lib/hadoop-2.7.0/etc/hadoop/yarn-site.xml to have something like:
      <configuration>
        ...
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        ...
      </configuration>
      
    7. Create /usr/local/lib/hadoop-2.7.0/etc/hadoop/mapred-site.xml from template:
      cp /usr/local/lib/hadoop-2.7.0/etc/hadoop/mapred-site.xml.template /usr/local/lib/hadoop-2.7.0/etc/hadoop/mapred-site.xml
      
    8. Modify /usr/local/lib/hadoop-2.7.0/etc/hadoop/mapred-site.xml to have something like:
      <configuration>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
      </configuration>
      
    9. Modify /usr/local/lib/hadoop-2.7.0/etc/hadoop/hdfs-site.xml to have something like:
      <configuration>
        ...
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:/var/lib/hadoop/hdfs/namenode</value>
        </property>
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>file:/var/lib/hadoop/hdfs/datanode</value>
        </property>
        ...
      </configuration>
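
      Before moving on, it may be worth verifying that the edited XML files are still well-formed (xmllint ships in the libxml2-utils package; no output means no errors):

      xmllint --noout /usr/local/lib/hadoop-2.7.0/etc/hadoop/*-site.xml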
      
  8. Format the HDFS file system:
    hdfs namenode -format
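
    The command prints a lot of output; near the end you should see a line saying that the namenode storage directory has been successfully formatted.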
    
  9. Start Hadoop:
    start-dfs.sh
    start-yarn.sh
    

    You might be asked to accept the machine’s key.
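
    Once both scripts finish, the web interfaces should be reachable; in Hadoop 2.7 the NameNode UI listens on port 50070 and the ResourceManager UI on port 8088 by default. Both of these should print 200:

    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088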

  10. Check if everything is running:

    jps
    

    You should get something like (each line will be preceded by a process id):

    Jps
    NodeManager
    NameNode
    ResourceManager
    DataNode
    SecondaryNameNode
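
    As a final smoke test, you can run one of the example jobs that ship with the distribution; the pi estimator needs no input data (the jar path below assumes the default 2.7.0 layout):

    hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar pi 2 5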
    
  11. Enjoy Hadoop!