Hadoop Pipes

Hadoop Pipes is the C++ interface to Hadoop MapReduce. Hadoop Pipes uses sockets to enable the tasktrackers to communicate with the processes running the C++ map or reduce functions.

Copy the pipes directory, which contains the C++ file wordcount.cpp, from /home/sxg125/hadoop-projects/hadoop-streaming/pipes and cd into the copied directory.

cp -r /home/sxg125/hadoop-projects/hadoop-streaming/pipes .

cd pipes
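
wordcount.cpp follows the usual Hadoop Pipes word-count pattern: a Mapper that emits each word with a count of "1", a Reducer that sums the counts for each word, and a main() that hands both classes to the Pipes runtime. The following is a minimal sketch based on the standard Hadoop Pipes example; class names are illustrative and the copied file may differ in details.

#include "Pipes.hh"
#include "TemplateFactory.hh"
#include "StringUtils.hh"

#include <string>
#include <vector>

// Mapper: split each input line into words and emit (word, "1").
class WordCountMapper : public HadoopPipes::Mapper {
public:
  WordCountMapper(HadoopPipes::TaskContext& context) {}
  void map(HadoopPipes::MapContext& context) {
    std::vector<std::string> words =
        HadoopUtils::splitString(context.getInputValue(), " ");
    for (size_t i = 0; i < words.size(); ++i) {
      context.emit(words[i], "1");
    }
  }
};

// Reducer: sum the counts for each word and emit (word, total).
class WordCountReducer : public HadoopPipes::Reducer {
public:
  WordCountReducer(HadoopPipes::TaskContext& context) {}
  void reduce(HadoopPipes::ReduceContext& context) {
    int sum = 0;
    while (context.nextValue()) {
      sum += HadoopUtils::toInt(context.getInputValue());
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(sum));
  }
};

int main(int argc, char* argv[]) {
  // runTask connects to the Java tasktracker over a socket and runs the
  // map or reduce code, depending on which task this process was launched for.
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<WordCountMapper, WordCountReducer>());
}

Because the job is later submitted with hadoop.pipes.java.recordreader=true and hadoop.pipes.java.recordwriter=true, the built-in Java record reader and writer are used, so the C++ program only needs to implement the map and reduce logic.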

Native Libraries

First, check the path of the native libraries; you will need this path when compiling.

hadoop checknative -a

output: 

15/05/06 16:45:52 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native

15/05/06 16:45:52 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library

Native library checking:

hadoop: true /opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop/lib/native/libhadoop.so.1.0.0

zlib: true /lib64/libz.so.1

snappy: true /opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop/lib/native/libsnappy.so.1

lz4: true revision:99

bzip2: true /lib64/libbz2.so.1

openssl: true /usr/lib64/libcrypto.so


Compilation

Compile the code:

g++ -m64 wordcount.cpp -I /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop-0.20-mapreduce/include/hadoop -Wall -L /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/lib/native -lhadooppipes -lhadooputils -lpthread -lssl -g -O2 -o wordcount

You should now have the executable "wordcount".

Copy the executable to HDFS under /user/<user>/wordcount/input so that the tasks can access it. You can reuse the same input files (file01 and file02) from the Java word-count example, but use a new output directory, /user/<user>/wordcount/output.

hadoop fs -put wordcount /user/<user>/wordcount/input


Running 

Next, we can run the job:

hadoop pipes -D mapred.job.name=wordcount -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input /user/<user>/wordcount/input/* -output /user/<user>/wordcount/output -program /user/<user>/wordcount/input/wordcount

Check the output.

hadoop fs -cat /user/<user>/wordcount/output/part*

Since file01 and file02 are used as input files, the output is similar to the one you got with the Java version.

Hadoop  2

Hello   2

Bye     1

World   2

Goodbye 1