Hadoop supports programming in other languages (Python, R, etc.) via a generic API called Hadoop Streaming.
It is similar to the pipes concept in Unix.
Format is "Input | Mapper | Reducer"
Ex: echo "this sentence has five words" | cat | wc
Default record format: "key \t value \n" (key and value separated by a tab, records separated by newlines)
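Streaming passes records to the mapper on stdin and reads key/value pairs back on stdout, splitting each output line at the first tab. A minimal sketch of that parsing convention (the function name parse_kv is illustrative, not part of any Hadoop API):

```python
def parse_kv(line):
    # Streaming splits each mapper/reducer output line at the first tab:
    # everything before the tab is the key, the rest is the value.
    key, _, value = line.rstrip("\n").partition("\t")
    # If there is no tab, the whole line is the key and the value is empty.
    return key, value

print(parse_kv("hello\t1"))
```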
Running a normal hadoop job
hadoop jar wordcount.jar inputfile outputfile
Running a hadoop streaming job
hadoop jar hadoop-streaming.jar -input inputfile -output outputfile -file filename -mapper path -reducer path
(-file ships a local script to every node; -mapper and -reducer give the commands to run for each phase)
Ex: hadoop streaming job using Linux commands (this one counts lines rather than words)
hadoop jar hadoop-streaming.jar -input hello.txt -output /output -mapper cat -reducer "wc -l" # to count the number of lines in the input file
hadoop fs -ls -R /output
Two files will be listed, e.g. /output/_SUCCESS and /output/part-00000
hadoop fs -cat /output/part* # shows the content of the output file(s)
Ex: word count hadoop streaming job using python
Mapper.py - mapper code in Python
Reducer.py - reducer code in Python
hadoop jar hadoop-streaming.jar -input inputfile -output outputfile -file mappercodefile -mapper mappername -file reducercodefile -reducer reducername
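The word-count mapper and reducer can be sketched in Python as below. This is one common convention, not the only way to write them: the mapper emits "word \t 1" per word, and since streaming sorts mapper output by key before the reduce phase, the reducer can sum adjacent equal keys. Both functions are written over line iterables so the same logic can back Mapper.py and Reducer.py.

```python
import sys
from itertools import groupby

def mapper(lines):
    # Emit "word \t 1" for every word on every input line.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # Streaming delivers mapper output sorted by key, so equal words
    # are adjacent; group them and sum their counts.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__":
    # Run as "python thisfile.py map" or "python thisfile.py reduce";
    # both phases read stdin and write stdout, as streaming expects.
    stage = mapper if sys.argv[1:] == ["map"] else reducer
    for out in stage(sys.stdin):
        print(out)
```

Locally you can test the whole pipeline with plain pipes, mirroring the streaming job: cat hello.txt | python thisfile.py map | sort | python thisfile.py reduce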