Hadoop Cluster

Hadoop Guide

The HPC cluster currently provides an experimental Apache Hadoop cluster. The software is managed with Cloudera's distribution of Hadoop (CDH), which makes it easier to maintain HDFS, MapReduce, and HBase. For the version, storage space, and number of nodes, see the section "Hadoop Cluster View" below.

Important Notes

Accessing Hadoop Cluster

Hadoop Cluster View

View the cluster overview page (the YARN ResourceManager web UI on port 8088) with the links text browser from one of the cluster nodes:

links hpcdata1.priv.cwru.edu:8088

Content Excerpts:

X Apps

Y GB Memory

160 VCores

To quit links, press ESC, then choose File -> Exit.
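
For scripted checks, the figures shown on the overview page (apps, memory, VCores) can usually be read from the YARN ResourceManager REST API instead of the links browser. The sketch below assumes that the ResourceManager on hpcdata1.priv.cwru.edu:8088 serves the standard /ws/v1/cluster/metrics endpoint without additional authentication and that curl is installed on the node. The function and variable names are illustrative and not part of the cluster setup.

```shell
# Host and port are taken from the links command above; override via the
# environment if the ResourceManager runs elsewhere.
RM_HOST="${RM_HOST:-hpcdata1.priv.cwru.edu}"
RM_PORT="${RM_PORT:-8088}"

# Build the URL of the ResourceManager cluster-metrics endpoint.
cluster_metrics_url() {
    printf 'http://%s:%s/ws/v1/cluster/metrics\n' "$RM_HOST" "$RM_PORT"
}

# Fetch the metrics as JSON.
# Assumption: curl is available and no Kerberos/SPNEGO login is required.
fetch_cluster_metrics() {
    curl --silent --fail --max-time 10 \
         -H 'Accept: application/json' "$(cluster_metrics_url)"
}
```

Calling fetch_cluster_metrics from a cluster node should return a JSON document whose clusterMetrics object contains fields such as appsSubmitted, totalMB, and totalVirtualCores. If the call fails, the endpoint may be protected or disabled on this cluster.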

Some important ports (more at https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_ports.html):

8088  - Cluster Overview

19888 - MapReduce JobHistory Server

8888  - Hue

11000 - Oozie
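
The port list above can be wrapped in a small helper so that the right URL does not have to be typed by hand. This is only a sketch: the service names (overview, jobs, hue, oozie) are hypothetical labels chosen for this example, and the host is passed as an argument because the services do not necessarily all run on the same node.

```shell
# Map an illustrative service name to the web port listed above.
hadoop_ui_port() {
    case "$1" in
        overview) echo 8088 ;;
        jobs)     echo 19888 ;;
        hue)      echo 8888 ;;
        oozie)    echo 11000 ;;
        *)        echo "unknown service: $1" >&2; return 1 ;;
    esac
}

# Print the URL of a service: hadoop_ui_url HOST SERVICE
hadoop_ui_url() {
    port=$(hadoop_ui_port "$2") || return 1
    printf 'http://%s:%s\n' "$1" "$port"
}
```

For example, links "$(hadoop_ui_url hpcdata1.priv.cwru.edu overview)" opens the same page as the links command shown earlier.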

Packages in CDH

You can also check the CDH version (visible in the parcel path) and the available native libraries with:

hadoop checknative -a

Output:

21/04/15 10:23:28 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native

21/04/15 10:23:28 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library

Native library checking:

hadoop:  true /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native/libhadoop.so.1.0.0

zlib:    true /lib64/libz.so.1

zstd  :  true /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native/libzstd.so.1

snappy:  true /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native/libsnappy.so.1

lz4:     true revision:10301

bzip2:   true /lib64/libbz2.so.1

openssl: true /lib64/libcrypto.so

ISA-L:   true /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native/libisal.so.2
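
When this check is run from a script, it can be convenient to list only the libraries that failed to load. The sketch below filters the output format shown above; missing_native_libs is a hypothetical helper name, and the pattern assumes each result line has the form "name: true|false ...", as in the sample output.

```shell
# Read `hadoop checknative -a` output on stdin and print the name of every
# library reported as "false". Log lines and the header line are ignored
# because they do not match the "name: false" pattern.
missing_native_libs() {
    awk '/^[A-Za-z0-9_-]+[ ]*:[ ]+false/ {
        name = $0
        sub(/[ ]*:.*/, "", name)   # keep only the text before the colon
        print name
    }'
}
```

A typical call would be hadoop checknative -a 2>&1 | missing_native_libs. With the output shown above it prints nothing, since every library is reported as true.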


Hadoop Tutorials:

Hadoop Java: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-java

Hadoop Streaming: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-streaming

Hadoop Pipes: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-pipes 

Hadoop HBase: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-hbase

Apache Spark: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/apache-spark 

Hadoop Pig: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-pig

Hadoop Hive: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-hive 


References for the Hadoop Examples:

[1] Tutorial sample: http://www.youtube.com/watch?v=1ArXR5cl9fk

[2] Hadoop Tutorial: https://ccp.cloudera.com/display/DOC/Hadoop+Tutorial

[3] Hadoop HBase Tutorial: http://hadoopinterviews.com/data-improt-hbase-map-reduce/

[4] Apache Spark: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/spark.html

[5] Stanford Workshop: http://stanford.edu/~rezab/sparkclass/slides/itas_workshop.pdf

[6] Apache Spark (batch): http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/

[7] Apache Pig: http://hortonworks.com/hadoop/pig/

[8] Cloudera Pig Scripts: http://archive.cloudera.com/cdh4/cdh/4/pig/start.html#pig-scripts 

[9] Pig Tutorial: https://github.com/rohitsden/pig-tutorial

[10] Apache Hive: https://hive.apache.org/

[11] Apache Hive Tutorial: https://cwiki.apache.org/confluence/display/Hive/GettingStarted