Hadoop Java
Hadoop Examples with Jar Files
Using existing Hadoop Examples
The simplest way to get started with Hadoop is to run the Hadoop examples that Cloudera provides.
Estimate Pi (the two arguments are the number of map tasks and the number of samples per map):
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
Run a wordcount MapReduce example. First create the input directory in HDFS under /user/<CaseID> and put the input files in it (see the HDFS Filesystem section below). The output directory must not already exist; the job creates it.
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /user/<CaseID>/wordcount/input /user/<CaseID>/wordcount/output
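A rerun of the wordcount example fails if the output directory from an earlier run is still present. The sketch below wraps the command above in a small shell function that removes a leftover output directory first. The function name run_wordcount_example and the variable EXAMPLES_JAR are made up for illustration, and the function assumes the input and output directories sit side by side under one base directory, as in the command above.

```shell
# Path to the examples jar, copied from the commands above.
EXAMPLES_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

# run_wordcount_example BASE_DIR   (hypothetical helper, not part of Hadoop)
# BASE_DIR is an HDFS directory such as /user/<CaseID>/wordcount
run_wordcount_example() {
    base_dir=$1
    # MapReduce will not start if the output directory already exists,
    # so delete one left over from a previous run (old results are lost).
    if hadoop fs -test -d "$base_dir/output"; then
        hadoop fs -rm -r "$base_dir/output" || return 1
    fi
    hadoop jar "$EXAMPLES_JAR" wordcount "$base_dir/input" "$base_dir/output"
}
```

After sourcing the function, it would be called as: run_wordcount_example /user/<CaseID>/wordcount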
Compiling Java and Running the Jar file
Another way is to write your own Java code, compile it into Java classes, and package the classes in a Jar file.
Compile the code into Java classes with the javac command
javac -cp $(hadoop classpath) -d wordcount_classes/ WordCount.java
Create the Jar file with the jar command
jar -cvf wordcount.jar -C wordcount_classes/ .
Run the Jar file with the hadoop jar command, similar to the examples above
hadoop jar wordcount.jar org.myorg.WordCount /user/<CaseID>/wordcount/input /user/<CaseID>/wordcount/output
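The three steps above can be chained so that a failed compile never reaches the cluster. This is only a sketch: build_and_run is a hypothetical function name, and it assumes WordCount.java is in the current directory and declares the class org.myorg.WordCount as in the command above. The mkdir -p line is a precaution, since some javac versions do not create the -d directory themselves.

```shell
# build_and_run INPUT_DIR OUTPUT_DIR   (hypothetical helper, not part of Hadoop)
# INPUT_DIR and OUTPUT_DIR are HDFS paths.
build_and_run() {
    input_dir=$1
    output_dir=$2
    # Make sure the class output directory exists before compiling.
    mkdir -p wordcount_classes || return 1
    # Compile against the Hadoop libraries; the quotes keep the shell from
    # expanding any wildcards in the classpath.
    javac -cp "$(hadoop classpath)" -d wordcount_classes/ WordCount.java || return 1
    # Package the compiled classes.
    jar -cvf wordcount.jar -C wordcount_classes/ . || return 1
    # Submit the job.
    hadoop jar wordcount.jar org.myorg.WordCount "$input_dir" "$output_dir"
}
```

It would be called as: build_and_run /user/<CaseID>/wordcount/input /user/<CaseID>/wordcount/output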
HDFS Filesystem
HDFS is a file system layered on top of the native Linux file system. Files stored in HDFS are therefore not visible to the usual OS commands and shortcuts (such as ls or ~). Whenever you run hadoop or yarn commands that rely on HDFS, you need to reference HDFS paths explicitly.
Your userspace directory is /user/<CaseID>
If this directory is missing, please contact us and we will create it for you.
hadoop fs -ls /user/<CaseID>
drwxr-xr-x - sxg125 hpcadmin 0 2015-01-28 10:28 /user/sxg125/hadoop-streaming
drwxr-xr-x - sxg125 hpcadmin 0 2015-01-27 15:29 /user/sxg125/wordcount
Creating your own directories in HDFS
hadoop fs -mkdir /user/<CaseID>/wordcount
hadoop fs -mkdir /user/<CaseID>/wordcount/input
Putting a file or files into an HDFS directory
hadoop fs -put /home/<CaseID>/hadoop-projects/file* /user/<CaseID>/wordcount/input
Listing an HDFS directory
$ hadoop fs -ls /user/<CaseID>/wordcount/input
-rw-r--r-- 3 <CaseID> supergroup 22 2013-01-23 15:51 /user/<CaseID>/wordcount/input/file01
-rw-r--r-- 3 <CaseID> supergroup 28 2013-01-23 15:51 /user/<CaseID>/wordcount/input/file02
Listing what is inside an HDFS file:
$ hadoop fs -cat /user/<CaseID>/wordcount/output/part*
Hello World Bye World
Goodbye 1
Hello 2
World 2
Bye 1
Hadoop 2
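The HDFS commands above can be grouped into two small helpers, one for staging input and one for copying results back to the local file system. This is a sketch, not a site-provided tool: stage_input and fetch_output are hypothetical names. It relies on the -p option of hadoop fs -mkdir, which creates missing parent directories, and on hadoop fs -getmerge, which concatenates the files of an HDFS directory into one local file.

```shell
# stage_input HDFS_DIR LOCAL_FILE...   (hypothetical helper)
# Creates HDFS_DIR, including missing parents, and uploads the local files.
stage_input() {
    hdfs_dir=$1
    shift
    hadoop fs -mkdir -p "$hdfs_dir" || return 1
    hadoop fs -put "$@" "$hdfs_dir"
}

# fetch_output HDFS_OUTPUT_DIR LOCAL_FILE   (hypothetical helper)
# Merges all part files of a job's output directory into one local file.
fetch_output() {
    hadoop fs -getmerge "$1" "$2"
}
```

For example: stage_input /user/<CaseID>/wordcount/input /home/<CaseID>/hadoop-projects/file* and, once the job has finished, fetch_output /user/<CaseID>/wordcount/output wordcount_result.txt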
Job Tracker Location:
Track your jobs (Yarn Resource Manager) from a terminal with the links text browser:
links hpcdata1.priv.cwru.edu:19888
or,
Open Firefox on the master node and go to: hpcdata1.priv.cwru.edu:19888
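Jobs can also be checked from the command line with the yarn client, which may be convenient when no browser is at hand. The sketch below assumes the yarn command is available on the node where you run hadoop; list_running_jobs and job_status are hypothetical helper names.

```shell
# list_running_jobs   (hypothetical helper)
# Lists the applications that YARN currently reports as running.
list_running_jobs() {
    yarn application -list -appStates RUNNING
}

# job_status APPLICATION_ID   (hypothetical helper)
# Prints the status report for one application; the ID is the
# application_... value shown in the listing.
job_status() {
    yarn application -status "$1"
}
```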