Combination of R and Hadoop
R packages for Hadoop - RHadoop, RHIPE
R packages for Spark - SparkR
R packages for H2O - h2o
R packages for MongoDB - rmongodb, RMongo
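The major advantage of using R with Hadoop is an elastic data analytics platform: Hadoop provides storage and processing that scale out across a cluster, while R provides the statistical analysis.
Big data platform (Hadoop) + data analytics (R) = big data analytics (RHadoop)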
RHadoop is a collection of R packages:
rhdfs - R and HDFS connector
rmr (now rmr2) - R interface for writing and running Hadoop MapReduce jobs
rhbase - R and HBase connector
The packages are available on GitHub rather than CRAN.
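Steps to install RHadoop on your PC:
Install R
Install Hadoop
Install the R packages RHadoop depends on (e.g. rJava for rhdfs; Rcpp and RJSONIO for rmr2)
Configure the environment (HADOOP_CMD, HADOOP_STREAMING)
Install RHadoop (rmr2, rhdfs, rhbase)
Load the packages in R:
library('rhdfs')
library('rmr2')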
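The "configure the environment" step usually amounts to telling the packages where Hadoop is installed: rhdfs looks for HADOOP_CMD and rmr2 looks for HADOOP_STREAMING, and both should be set before the library() calls. A minimal sketch; the two paths below are assumptions for a typical /usr/local/hadoop install and must be replaced with the locations on your machine.

```r
# Assumed locations for illustration only; point these at your own Hadoop install.
Sys.setenv(
  HADOOP_CMD = "/usr/local/hadoop/bin/hadoop",
  # The real jar file name normally includes the Hadoop version number.
  HADOOP_STREAMING = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming.jar"
)

# Show the values that rhdfs and rmr2 will pick up from the environment.
print(Sys.getenv(c("HADOOP_CMD", "HADOOP_STREAMING")))
```

The same two variables can instead be placed in ~/.Renviron so that every R session picks them up automatically.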
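HDFS operations (rhdfs)
Initialization
hdfs.init() - initialize the connection between R and HDFS
hdfs.defaults() - retrieve the default rhdfs settings
File manipulation
hdfs.put() - copy a local file to HDFS
hdfs.copy() - copy a file within HDFS
hdfs.write() - write an R object to an open HDFS file
hdfs.read() - read from an open HDFS file
hdfs.rm() - delete a file or directory in HDFS
hdfs.ls() - list the contents of an HDFS directory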
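With rhdfs loaded, the usual pattern is to call hdfs.init() once and then use the file functions. The sketch below wraps two of them in a hypothetical helper, upload_and_list(); it assumes rhdfs is installed, HADOOP_CMD is set, and that hdfs.put() takes the local source path followed by the HDFS destination. Defining the function does not touch HDFS; only calling it does.

```r
# Hypothetical helper (not part of rhdfs): copy one local file into an HDFS
# directory, then return the listing of that directory.
# Assumes rhdfs is installed and rhdfs::hdfs.init() has already been called.
upload_and_list <- function(local_path, hdfs_dir) {
  rhdfs::hdfs.put(local_path, hdfs_dir)  # local file -> HDFS directory
  rhdfs::hdfs.ls(hdfs_dir)               # listing of the target directory
}

# Example call (both paths are hypothetical):
# rhdfs::hdfs.init()
# upload_and_list("data/words.txt", "/user/analyst/input")
```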
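RMR operations (rmr2)
to.dfs() - write an R object to the distributed file system
from.dfs() - read data from the distributed file system back into R
mapreduce(input, output, map, reduce, combine, input.format, output.format, verbose) - define and run a MapReduce job (main arguments shown)
keyval(key, val) - create the key-value pairs that map and reduce functions return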
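To see what mapreduce() does with the functions passed as map and reduce, the data flow can be emulated in plain R on a small vector: map emits key-value pairs, the framework groups the values by key (the shuffle), and reduce is called once per distinct key. This is a hypothetical in-memory sketch; keyval_local() and mapreduce_local() are invented names, not part of rmr2, and the sketch ignores chunking, combiners and HDFS.

```r
# Hypothetical stand-in for keyval(): a length-1 val is recycled so that
# every key gets a value.
keyval_local <- function(key, val) {
  list(key = key, val = rep_len(val, length(key)))
}

# Hypothetical in-memory emulation of the map -> shuffle -> reduce flow.
mapreduce_local <- function(input, map, reduce) {
  # Map phase: one call on the whole input (rmr2 calls map once per chunk)
  mapped <- map(NULL, input)
  # Shuffle phase: group emitted values by key
  groups <- split(mapped$val, mapped$key)
  # Reduce phase: one call per distinct key with all of that key's values
  reduced <- lapply(names(groups), function(k) reduce(k, groups[[k]]))
  keyval_local(unlist(lapply(reduced, `[[`, "key")),
               unlist(lapply(reduced, `[[`, "val")))
}

words <- c("to be or", "not to be")
out <- mapreduce_local(
  words,
  map = function(k, lines) keyval_local(unlist(strsplit(lines, " ")), 1),
  reduce = function(word, counts) keyval_local(word, sum(counts))
)
print(setNames(out$val, out$key))
```

For testing real rmr2 code without a cluster, rmr2 also provides a local backend, selected with rmr.options(backend = "local").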
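Example: Word count script in R (Wordcount.R)
library(rmr2)

# pattern is the delimiter used to split each line into words
wordcount <- function(input, output = NULL, pattern = " ") {
  # Mapper: emit (word, 1) for every word in the input lines
  wcmap <- function(k, lines) {
    keyval(unlist(strsplit(x = lines, split = pattern)), 1)
  }
  # Reducer: sum the counts collected for each word
  wcreduce <- function(word, counts) {
    keyval(word, sum(counts))
  }
  # combine = TRUE reuses the reducer as a combiner (valid because sum is associative)
  mapreduce(input = input, output = output, input.format = "text",
            map = wcmap, reduce = wcreduce, combine = TRUE)
}

Running the script:
source("./Wordcount.R")
result <- wordcount("<input_file_path>")
from.dfs(result)  # check the word count output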
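Before running the job on a large file, it helps to know what output to expect. The base-R helper below (count_words_local(), a hypothetical name) applies the same split-on-pattern logic as wcmap to a small sample, so its counts should match the from.dfs() output for that sample. Like wcmap, it also counts the empty strings produced by consecutive delimiters.

```r
# Hypothetical helper: count words in a small local sample with base R,
# using the same splitting rule as the wcmap mapper.
count_words_local <- function(lines, pattern = " ") {
  words <- unlist(strsplit(lines, split = pattern))
  counts <- table(words)                      # tally of each distinct word
  setNames(as.integer(counts), names(counts)) # plain named integer vector
}

sample_lines <- c("to be or", "not to be")
print(count_words_local(sample_lines))
```

To compare, reshape the rmr2 result the same way, for example res <- from.dfs(result) followed by setNames(values(res), keys(res)), and sort both vectors by name before comparing.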