Hadoop Java Program 1: Copying a Local File to HDFS

Note: this example comes from "Hadoop: The Definitive Guide", but the approach in "Hadoop Java Program 2: Copying Local Files (Including Directories) to HDFS" is simpler and just as adequate.
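
For reference, the simpler approach that article presumably builds on is FileSystem.copyFromLocalFile; a minimal sketch, assuming the same dependencies listed below, with SimpleCopy as a placeholder class name:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SimpleCopy {
    public static void main(String[] args) throws Exception {
        // args[0]: local path, args[1]: HDFS destination path
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.150:9000"), conf);
        // copyFromLocalFile copies files, and directories recursively, in one call
        fs.copyFromLocalFile(new Path(args[0]), new Path(args[1]));
        fs.close();
    }
}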

pom.xml dependencies:

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.0</version>
        </dependency>
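
For completeness, these snippets belong inside the <dependencies> element of the project's pom.xml; a minimal skeleton, with placeholder coordinates, might look like:

<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId><!-- placeholder -->
    <artifactId>hdfs-copy-demo</artifactId><!-- placeholder -->
    <version>1.0</version>
    <dependencies>
        <!-- the two hadoop dependencies above go here -->
    </dependencies>
</project>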

The Java code follows. The import lines are pasted in full because the Hadoop packages and the JDK share so many class names that sorting the imports out by hand is tedious and error prone.

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class FileCopyWithProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0]; // local source file
        String dst = args[1];      // destination, e.g. an hdfs:// URI

        // Buffer reads from the local file.
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

        // Pick the FileSystem implementation that matches the destination URI.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);

        // Hadoop calls progress() periodically as data is written
        // to the pipeline, so a dot is printed for each chunk.
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.print(".");
            }
        });

        // Copy with a 4 KB buffer; the final 'true' closes both streams.
        IOUtils.copyBytes(in, out, 4096, true);
    }
}
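
To produce the class files used below, compile the source against the Hadoop jars. One way, assuming the hadoop command is on the PATH (hadoop classpath prints the cluster's full dependency classpath):

$ javac -cp $(hadoop classpath) FileCopyWithProgress.java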

Set the classpath:

export HADOOP_CLASSPATH=/home/hadoop/sandbox

Copy FileCopyWithProgress.class and FileCopyWithProgress$1.class (the compiled anonymous Progressable class) to the /home/hadoop/sandbox directory on the namenode, then run:

$ hadoop FileCopyWithProgress testCopyLocalFileToHdfs.txt hdfs://192.168.1.150:9000/testCopyLocalFileToHdfsHello.txt

This command can also be shortened to:

$ hadoop FileCopyWithProgress testCopyLocalFileToHdfs.txt /testCopyLocalFileToHdfsHello.txt
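The short form works because the omitted scheme and authority are filled in from the default filesystem, assuming the cluster's core-site.xml contains something like:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.150:9000</value>
</property>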

Prepare the test file testCopyLocalFileToHdfs.txt used above beforehand; 192.168.1.150 is the namenode's IP.
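
For example, a throwaway test file can be created like this (contents are arbitrary):

$ echo "hello hdfs" > testCopyLocalFileToHdfs.txt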

To verify, open http://192.168.1.150:50070/explorer.html; the file should be listed there.
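
Alternatively, the upload can be confirmed from the shell with the standard HDFS listing command:

$ hadoop fs -ls /testCopyLocalFileToHdfsHello.txt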