The simplest way to start with Hadoop is to use the example programs that Cloudera provides with the CDH distribution.
How to estimate Pi:
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
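The pi example estimates Pi by sampling points and counting how many fall inside a quarter circle (the two arguments are the number of map tasks and the number of samples per map). As an illustration only, here is a minimal local sketch of that idea in plain Python; note the real Hadoop example uses a quasi-random Halton sequence rather than the simple pseudo-random sampling shown here:

```python
import random

def estimate_pi(num_samples, seed=0):
    """Estimate Pi by sampling random points in the unit square and
    counting the fraction that land inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))  # close to 3.14159
```

On the cluster, each map task performs a share of the sampling and a single reduce step combines the counts, which is why more maps and more samples give a better estimate.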
Run the WordCount MapReduce example. You first need to create an input directory in HDFS under /user/<CaseID> and copy the input files into it (see the HDFS Filesystem section below). The output directory must not already exist; the job creates it.
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /user/<CaseID>/wordcount/input /user/<CaseID>/wordcount/output
Alternatively, you can write your own Java code, compile it into Java classes, and package them into a jar file.
Compile the code into Java classes with the javac command:
javac -cp $(hadoop classpath) -d wordcount_classes/ WordCount.java
Create the jar file with the jar command:
jar -cvf wordcount.jar -C wordcount_classes/ .
Run the jar with the hadoop jar command, specifying the main class, as in the examples above:
hadoop jar wordcount.jar org.myorg.WordCount /user/<CaseID>/wordcount/input /user/<CaseID>/wordcount/output
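The WordCount.java source itself is not reproduced here, but the map/reduce logic it implements is short: the map step emits a (word, 1) pair for every word in the input, and the reduce step sums the counts per word. A minimal Python sketch of that logic (illustration only, not the Java code the command above runs):

```python
from collections import Counter

def map_phase(lines):
    # Map: split each input line into words and emit (word, 1) pairs.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Reduce: sum the counts for each word.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# The classic two-line WordCount tutorial input:
lines = ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]
print(reduce_phase(map_phase(lines)))
# {'Hello': 2, 'World': 2, 'Bye': 1, 'Hadoop': 2, 'Goodbye': 1}
```

On the cluster, Hadoop runs the map step in parallel over input splits, shuffles the pairs so all counts for a word reach the same reducer, and writes the summed counts to the output directory.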
HDFS is a file system overlay on top of the native Linux file system. Therefore, whenever you run hadoop or yarn commands that rely on HDFS, you need to reference HDFS paths explicitly; the usual OS shortcuts (such as relative paths in your Linux home directory) do not apply.
Your userspace directory is /user/<CaseID>.
If this directory is missing, please contact us so we can create it for you.
$ hadoop fs -ls /user/<CaseID>
drwxr-xr-x - sxg125 hpcadmin 0 2015-01-28 10:28 /user/sxg125/hadoop-streaming
drwxr-xr-x - sxg125 hpcadmin 0 2015-01-27 15:29 /user/sxg125/wordcount
Creating your own directories in HDFS:
hadoop fs -mkdir /user/<CaseID>/wordcount
hadoop fs -mkdir /user/<CaseID>/wordcount/input
Putting a file or files into an HDFS directory:
hadoop fs -put /home/<CaseID>/hadoop-projects/file* /user/<CaseID>/wordcount/input
Listing an HDFS directory:
$ hadoop fs -ls /user/<CaseID>/wordcount/input
-rw-r--r-- 3 <CaseID> supergroup 22 2013-01-23 15:51 /user/<CaseID>/wordcount/input/file01
-rw-r--r-- 3 <CaseID> supergroup 28 2013-01-23 15:51 /user/<CaseID>/wordcount/input/file02
Displaying the contents of an HDFS file, e.g. one of the input files:
$ hadoop fs -cat /user/<CaseID>/wordcount/input/file01
Hello World Bye World
Displaying the job output:
$ hadoop fs -cat /user/<CaseID>/wordcount/output/part*
Goodbye 1
Hello 2
World 2
Bye 1
Hadoop 2
Track your jobs through the job-tracking web UI (MapReduce Job History Server) at hpcdata1.priv.cwru.edu:19888,
or open Firefox from the master node and browse to hpcdata1.priv.cwru.edu:19888.