Hadoop Pipes
Hadoop Pipes is the C++ interface to Hadoop MapReduce. Hadoop Pipes uses sockets as the channel over which the task trackers communicate with the processes running the C++ map or reduce functions.
Copy the directory containing the C++ file wordcount.cpp from /home/sxg125/hadoop-projects/hadoop-streaming/pipes, then cd into the copy:
cp -r /home/sxg125/hadoop-projects/hadoop-streaming/pipes .
cd pipes
Native Libraries
First, check the paths of the native libraries; the native library path is needed when compiling:
hadoop checknative -a
output:
15/05/06 16:45:52 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
15/05/06 16:45:52 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop/lib/native/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
Compilation
Compile the code:
g++ -m64 wordcount.cpp \
    -I /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-0.20-mapreduce/include/hadoop \
    -L /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/lib/native \
    -lhadooppipes -lhadooputils -lpthread -lssl \
    -Wall -g -O2 -o wordcount
You should now have an executable named "wordcount".
Copy the executable to the HDFS directory /user/<user>/hadoop-pipes/input. You can reuse the same input directory for wordcount, but create a new output directory, /user/<user>/hadoop-pipes/output.
hadoop fs -put wordcount /user/<user>/hadoop-pipes/input
Running
Next, run the job:
hadoop pipes \
    -D mapred.job.name=wordcount \
    -D hadoop.pipes.java.recordreader=true \
    -D hadoop.pipes.java.recordwriter=true \
    -input /user/<user>/hadoop-pipes/input/* \
    -output /user/<user>/hadoop-pipes/output \
    -program /user/<user>/hadoop-pipes/input/wordcount
Check the output:
hadoop fs -cat /user/<user>/hadoop-pipes/output/part*
Since file01 and file02 are used as the input files, the output is similar to what you got with the Java version:
Hadoop 2
Hello 2
Bye 1
World 2
Goodbye 1