Upload your data to a TACC computer

  • Two storage locations are currently available to TACC Hadoop users:
    • Your home directory.
    • Scratch (available @ $SCRATCH)
      • You can find your scratch directory by typing cds and then pwd.
  • You can pull files into your scratch directory with wget. For example, to download the Open American National Corpus (319 MB):
$ cd /your/scratch/dir
$ wget http://americannationalcorpus.org/OANC/OANC-1.0.1-UTF8.zip
  • You can use scp to transfer files from your local computer to your Longhorn scratch directory:
$ scp myfile.tgz username@longhorn.tacc.utexas.edu:/your/scratch/dir
  • You can also use FileZilla to connect to the Longhorn server over SFTP:
    • Host: longhorn.tacc.utexas.edu
    • Server Type: SFTP
    • Logon Type: Normal
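After a wget download or an scp/SFTP transfer, it can be worth checking that the file arrived intact. The following is a minimal sketch, not part of the original instructions: it assumes the coreutils sha256sum tool is installed on both machines, and file_sha256 is a hypothetical helper name.

```shell
# Print only the SHA-256 digest of a file.
# Assumption: the coreutils `sha256sum` tool is available.
file_sha256() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Typical use (run the helper on each machine, then compare the two digests):
#   on your local computer:  file_sha256 myfile.tgz
#   on Longhorn:             file_sha256 /your/scratch/dir/myfile.tgz
```

If the two digests differ, the transfer was incomplete or corrupted and the file should be copied again.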

  • To upload big files:
    • Upload your data to the directory named after your username under /scratch.
    • You can change to your scratch directory with either of the following commands:
login1:~$ cd $SCRATCH


login1:~$ cds
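For large files, an interrupted transfer is costly, so a resumable copy may help. The sketch below is one possible approach under stated assumptions: it assumes rsync is installed locally and permitted on the Longhorn login node, which the original text does not confirm. The function name upload_big_file and the DRY_RUN variable are hypothetical.

```shell
# Copy a large file to a Longhorn directory with rsync so that an interrupted
# transfer can be resumed by re-running the same command.
# Assumption: rsync is available on both ends (not stated in the original).
# Arguments: local file, remote username, remote directory.
upload_big_file() {
    src=$1
    user=$2
    dest_dir=$3
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    # --partial keeps a partially transferred file; --progress shows progress.
    set -- rsync -av --partial --progress "$src" \
        "$user@longhorn.tacc.utexas.edu:$dest_dir/"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"   # only print the command that would run
    else
        "$@"
    fi
}

# Example: DRY_RUN=1 upload_big_file myfile.tgz username /your/scratch/dir
```

Setting DRY_RUN=1 prints the command without contacting the server, which is a cheap way to check the destination path first.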

  • Upload times (hh:mm:ss) from local storage to the Hadoop DFS at 1way 128 and 1way 256. The two settings differ little, since I/O takes most of the time.

16 (1way 128)   00:01:55   00:09:33   00:19:33   00:48:17   01:34:50
32 (1way 256)   00:01:53   00:09:38   00:18:51   00:47:37   01:35:04
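To put a number on how close the two settings are, the times above can be converted to seconds and compared. This is a small illustrative sketch; to_seconds and percent_diff are hypothetical helper names, and the only inputs used are the times listed above.

```shell
# Convert an hh:mm:ss duration to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Relative difference of b against a, in percent, to one decimal place.
percent_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) * 100 / a }'
}

# Last column: 01:34:50 (1way 128) versus 01:35:04 (1way 256).
a=$(to_seconds 01:34:50)   # 5690
b=$(to_seconds 01:35:04)   # 5704
percent_diff "$a" "$b"     # prints 0.2
```

For the largest upload the two runs differ by 14 seconds out of roughly 95 minutes, which is consistent with I/O rather than the job setting dominating the upload time.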