Hadoop Cluster
Hadoop Guide
The HPC Cluster currently provides an experimental Apache Hadoop cluster. The software is managed with Cloudera's Distribution including Apache Hadoop (CDH), which makes it easier to maintain HDFS, MapReduce, and HBase. For the version, storage space, and number of nodes, see the section "Hadoop Cluster View" below.
Important Notes
(Very important) Make sure to use the latest versions of the Hadoop header files, libraries, and JAR files, because they may differ from those used in the sample examples below. For the installed version (e.g. CDH-<version>), see the section "Hadoop Cluster View" below.
Also, the paths (both HOME and HDFS) used in the sample examples may not match yours exactly; change them as required.
Accessing Hadoop Cluster
If you have never used our Hadoop cluster, your CaseID needs to be added to the Hadoop cluster user list. To request an account, email us at hpc-supportATcase.edu.
Log in to hpcdata1 and enter your Case password when prompted:
ssh -X <caseID>@hpcdata1.case.edu
Create a new directory, "hadoop-projects", in your /home/<CaseID> directory and cd into it:
mkdir /home/<CaseID>/hadoop-projects
cd /home/<CaseID>/hadoop-projects
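The mkdir/cd steps above, together with the earlier note that HOME and HDFS paths must be adapted, can be sketched in Python so that nothing is hard-coded to one CaseID. This is a minimal sketch under stated assumptions: `project_paths` and `make_local_project_dir` are hypothetical helper names, the local directory is assumed to be `$HOME/hadoop-projects`, and the HDFS path assumes Hadoop's default per-user home directory, `/user/<username>`, with a hypothetical `hadoop-projects` sub-directory.

```python
import getpass
from pathlib import Path


def project_paths(user=None, home=None, name="hadoop-projects"):
    """Return (local_dir, hdfs_dir) for a user's Hadoop project.

    user defaults to the login name (your CaseID on hpcdata1); home defaults
    to the current user's home directory. The HDFS path assumes Hadoop's
    default home-directory convention, /user/<user>; the sub-directory name
    there is a hypothetical choice, not a cluster rule.
    """
    user = user or getpass.getuser()
    local_home = Path(home) if home is not None else Path.home()
    local_dir = local_home / name
    hdfs_dir = f"/user/{user}/{name}"
    return local_dir, hdfs_dir


def make_local_project_dir(local_dir):
    """Create the local directory like `mkdir -p` (no error if it exists)."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    return local_dir
```

Creating the HDFS side is a separate step; on the cluster, `hdfs dfs -mkdir -p` followed by the returned HDFS path should do it, provided you have write access to your HDFS home directory.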
Hadoop Cluster View
View the cluster overview (the YARN ResourceManager web UI) with the text-mode browser links from one of the cluster nodes:
links hpcdata1.priv.cwru.edu:8088
Content excerpt:
X Apps
Y GB Memory
160 VCores
To quit links, press ESC, then choose File -> Exit.
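Instead of reading the overview page in links, the same figures can be read from the YARN ResourceManager REST API on port 8088 (`/ws/v1/cluster/metrics`, which returns a `clusterMetrics` object). The sketch below assumes the endpoint is reachable from the cluster nodes without Kerberos/SPNEGO authentication; if it is secured, a plain `urlopen` call will fail with an HTTP 401 error. `fetch_cluster_metrics` and `summarize` are hypothetical helper names, and mapping "Apps" on the page to `appsSubmitted` is an assumption.

```python
import json
from urllib.request import Request, urlopen

# Assumed base URL: the host and port used in the links example above.
RM_URL = "http://hpcdata1.priv.cwru.edu:8088"


def fetch_cluster_metrics(base_url=RM_URL, timeout=10):
    """GET /ws/v1/cluster/metrics and return the clusterMetrics object."""
    req = Request(base_url + "/ws/v1/cluster/metrics",
                  headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["clusterMetrics"]


def summarize(metrics):
    """Reduce the metrics to the figures shown on the overview page."""
    return {
        "apps_submitted": metrics["appsSubmitted"],  # assumed to be "Apps"
        "apps_running": metrics["appsRunning"],
        "memory_gb": metrics["totalMB"] / 1024,      # MB -> GB
        "vcores": metrics["totalVirtualCores"],
        "active_nodes": metrics["activeNodes"],
    }
```

Calling `summarize(fetch_cluster_metrics())` on a cluster node would return a dictionary with the application count, total memory in GB, total VCores, and active node count.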
Some important ports (more at https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_ports.html):
8088 - YARN ResourceManager (Cluster Overview)
19888 - MapReduce JobHistory Server
8888 - Hue
11000 - Oozie
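To check which of these web UIs answer from the node you are logged in to, a plain TCP connection test is enough. The sketch below only tests whether each port accepts connections, not whether the service behind it is healthy; `reachable`, `check_services`, and the `PORTS` table are hypothetical names, and the port-to-service mapping mirrors the list above (19888 being the MapReduce JobHistory Server web UI).

```python
import socket

# Port-to-service table mirroring the list above.
PORTS = {
    8088: "YARN ResourceManager (cluster overview)",
    19888: "MapReduce JobHistory Server",
    8888: "Hue",
    11000: "Oozie",
}


def reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def check_services(host, ports=PORTS, timeout=2.0):
    """Map each service name to whether its port accepts TCP connections."""
    return {name: reachable(host, port, timeout) for port, name in ports.items()}
```

For example, `check_services("hpcdata1.priv.cwru.edu")` returns one True/False entry per service in the table.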
Packages in CDH
You can also check the CDH version and the available native libraries:
hadoop checknative -a
output:
21/04/15 10:23:28 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
21/04/15 10:23:28 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
zstd : true /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native/libzstd.so.1
snappy: true /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native/libsnappy.so.1
lz4: true revision:10301
bzip2: true /lib64/libbz2.so.1
openssl: true /lib64/libcrypto.so
ISA-L: true /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native/libisal.so.2
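The checknative output can also be checked programmatically, for example to extract the CDH version from the parcel path and to list any native library reported as `false`. This is a minimal sketch that assumes the `NAME: true|false DETAIL` line format shown above; `parse_checknative` and `run_checknative` are hypothetical helper names, and `hadoop` must be on your `PATH` (as in the example above).

```python
import re
import subprocess

# Matches result lines such as "zstd  :  true /opt/.../libzstd.so.1";
# the INFO log lines and the "Native library checking:" header do not match.
LIB_LINE = re.compile(r"^\s*([\w-]+)\s*:\s*(true|false)\b\s*(.*)$")
# Extracts e.g. "6.3.2" from ".../parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/...".
CDH_VERSION = re.compile(r"/CDH-(\d+(?:\.\d+)+)")


def parse_checknative(text):
    """Return ({library: (available, detail)}, cdh_version or None)."""
    libs = {}
    version = None
    for line in text.splitlines():
        m = LIB_LINE.match(line)
        if m:
            name, flag, detail = m.groups()
            libs[name] = (flag == "true", detail.strip())
        v = CDH_VERSION.search(line)
        if v and version is None:
            version = v.group(1)
    return libs, version


def run_checknative():
    """Run `hadoop checknative -a` and parse its combined stdout/stderr."""
    proc = subprocess.run(
        ["hadoop", "checknative", "-a"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the INFO log lines go to stderr
        universal_newlines=True,
    )
    return parse_checknative(proc.stdout)
```

For example, `[name for name, (ok, _) in run_checknative()[0].items() if not ok]` lists the libraries that failed to load; with the output shown above it would be empty.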
Hadoop Tutorials:
Hadoop Java: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-java
Hadoop Streaming: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-streaming
Hadoop Pipes: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-pipes
Hadoop HBase: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-hbase
Apache Spark: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/apache-spark
Hadoop Pig: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-pig
Hadoop Hive: https://sites.google.com/a/case.edu/hpcc/servers-and-storage/hadoop-guide-1/hadoop-hive