Hadoop-HDFS



To install Hadoop:

Define at least JAVA_HOME in conf/hadoop-env.sh to be the root of your Java installation:
$ vim conf/hadoop-env.sh
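
For example, hadoop-env.sh could contain a line like the following (the exact path depends on where your JDK is installed; /usr/lib/jvm/java-6-sun is just one common location):
export JAVA_HOME=/usr/lib/jvm/java-6-sun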

$ bin/hadoop    # run with no arguments; it should print its usage

Run an example (by default Hadoop runs in standalone, non-distributed mode, so this executes as a single local Java process against the local filesystem):
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop*examples*.jar grep input output 'dfs[a-z.]+'
$ cat output/*

Install ssh and rsync, which the distributed setup needs:
$ sudo apt-get install ssh
$ sudo apt-get install rsync

Set up passphraseless ssh
Check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
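
If ssh still prompts for a password, file permissions are a likely culprit: on many systems sshd ignores an authorized_keys file that is group- or world-writable, and tightening it usually helps:
$ chmod 0600 ~/.ssh/authorized_keys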

Format a new distributed filesystem (this writes the initial image under hadoop.tmp.dir, which defaults to /tmp/hadoop-<username>):
$ bin/hadoop namenode -format

Use the following conf/hadoop-site.xml (a pseudo-distributed, single-node configuration; dfs.replication is 1 because there is only one DataNode):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Start the Hadoop daemons:
$ bin/start-all.sh
When you're done, stop the daemons with:
$ bin/stop-all.sh
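
To check that the daemons came up, you can list the running Java processes with jps (shipped with the JDK); in this pseudo-distributed setup you should see NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker:
$ jps
The NameNode also serves a status page at http://localhost:50070/ and the JobTracker at http://localhost:50030/ (the default ports in this Hadoop version).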

Run the example on HDFS

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Initially HDFS contains only the following, but the above command creates /user/morteza/input and copies everything from the local conf directory into it:
/tmp/hadoop-morteza/mapred/system/jobtracker.info
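
To verify the upload, list the new directory (relative paths such as input resolve against your HDFS home directory, /user/morteza here):
$ bin/hadoop fs -ls input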

Run some of the examples provided:
$ bin/hadoop jar hadoop*examples*.jar grep input output 'dfs[a-z.]+'

Examine the output files: 
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
OR
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
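
The output directory holds one part file per reducer; the grep example's final sort job runs a single reducer, so there is typically just one file, conventionally named part-00000:
$ bin/hadoop fs -cat output/part-00000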

Some commands

$ bin/hadoop dfs -ls /    # list the root of HDFS

$ bin/hadoop dfsadmin -report   # cluster capacity, used/free space, and DataNode status
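
Note that a job fails if its output directory already exists, so to re-run the grep example you first have to remove the old output (-rmr is this Hadoop version's recursive remove):
$ bin/hadoop fs -rmr output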


