Upload your data to the TACC computer

    • Two data storage areas are currently available to TACC Hadoop users:
      • Your home directory.
      • Scratch (available at $SCRATCH)
        • To find your scratch directory path, type cds and then pwd.
    • You can pull files into your scratch directory with wget. For example, to get the Open American National Corpus (319 MB), run:

$ cd /your/scratch/dir

$ wget http://americannationalcorpus.org/OANC/OANC-1.0.1-UTF8.zip
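The same download can be wrapped up so that it always lands in scratch and gets unpacked. This is a minimal sketch, assuming $SCRATCH is set (as it is on TACC login nodes) and that wget and unzip are on your PATH; the function name fetch_oanc is hypothetical.

```shell
# Hypothetical helper: download the OANC archive into your scratch directory
# and unpack it there. The ( ... ) body runs in a subshell, so the cd does not
# change the directory of your login shell.
fetch_oanc() (
  url="http://americannationalcorpus.org/OANC/OANC-1.0.1-UTF8.zip"
  # ${SCRATCH:?...} aborts with an error message if SCRATCH is unset or empty
  cd "${SCRATCH:?SCRATCH is not set}" || exit 1
  # -c continues a partial download instead of starting over
  wget -c "$url" || exit 1
  # -q unpacks quietly into the current (scratch) directory
  unzip -q "$(basename "$url")"
)
```

Run it with fetch_oanc from a login node. The -c flag matters for a file this size: if the connection drops, rerunning the function resumes rather than restarts the transfer.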

    • You can use scp to transfer files from your local computer to your Longhorn scratch directory:

$ scp myfile.tgz username@longhorn.tacc.utexas.edu:/your/scratch/dir
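If your data is a directory rather than a single .tgz, one option is to bundle it first and then scp the archive, as in the example above. This is a sketch only: upload_dir is a hypothetical name, and the username and remote path are placeholders you fill in yourself.

```shell
# Hypothetical helper: bundle a local directory into a .tgz and copy it to
# your Longhorn scratch directory with scp.
upload_dir() (
  src=$1          # local directory to send, e.g. ./mydata
  user=$2         # your TACC username
  remote_dir=$3   # your Longhorn scratch path (find it with: cds, then pwd)
  name=$(basename "$src")
  # One archive usually copies faster than many small files over scp
  tar czf "$name.tgz" -C "$(dirname "$src")" "$name" || exit 1
  scp "$name.tgz" "$user@longhorn.tacc.utexas.edu:$remote_dir"
)
```

For example, upload_dir ./mydata username /your/scratch/dir writes mydata.tgz in the current directory and copies it over. On Longhorn, unpack it with tar xzf mydata.tgz.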

    • You can also use FileZilla to connect to the Longhorn server over SFTP, with these settings:
      • Host: longhorn.tacc.utexas.edu
      • Server Type: SFTP
      • Logon Type: Normal
    • To upload big files:
      • Upload them to your own directory under /scratch.
      • You can change to your scratch directory by typing:

login1:~$ cd $SCRATCH

or

login1:~$ cds
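As a command-line alternative to the FileZilla SFTP settings above, OpenSSH's sftp can run the same upload non-interactively. This is a sketch under stated assumptions: sftp_upload is a hypothetical name, and because batch mode (-b) cannot prompt for a password, it assumes you have key-based SSH login to Longhorn set up.

```shell
# Hypothetical helper: upload one file to a Longhorn directory with sftp.
# sftp -b reads commands from a batch file; "-" means standard input.
# Batch mode cannot answer password prompts, so key-based login is assumed.
sftp_upload() (
  file=$1; user=$2; remote_dir=$3
  # Quote the paths so sftp treats each as a single argument
  printf 'cd "%s"\nput "%s"\n' "$remote_dir" "$file" |
    sftp -b - "$user@longhorn.tacc.utexas.edu"
)
```

For example: sftp_upload myfile.tgz username /your/scratch/dir. If key-based login is not available, run sftp username@longhorn.tacc.utexas.edu interactively and type the same cd and put commands at the sftp> prompt.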

    • Upload time from local storage to the Hadoop DFS was measured with the 1way 128 and 1way 256 configurations; there is not much difference between them, because I/O takes most of the time.
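To time a copy from scratch into the Hadoop DFS yourself, one approach is to wrap hadoop fs -put with timestamps. This is a minimal sketch, assuming the hadoop command is on your PATH where you run it and that the target HDFS directory already exists; hdfs_put_timed and the /user/$USER/input path are hypothetical.

```shell
# Hypothetical helper: copy a local file into an existing HDFS directory and
# report the elapsed wall-clock time in whole seconds.
hdfs_put_timed() (
  local_file=$1
  hdfs_dir=$2      # e.g. /user/$USER/input (hypothetical path)
  start=$(date +%s)
  hadoop fs -put "$local_file" "$hdfs_dir/" || exit 1
  end=$(date +%s)
  echo "uploaded $(basename "$local_file") in $((end - start)) s"
)
```

For example, hdfs_put_timed "$SCRATCH/OANC-1.0.1-UTF8.zip" /user/$USER/input, followed by hadoop fs -ls /user/$USER/input to confirm the file arrived. Since the copy is dominated by I/O, expect similar times regardless of how many cores the job was allocated.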