Upload your data to the TACC Longhorn system
- Two data storage areas are currently available to TACC Hadoop users:
- Your home directory ($HOME)
- Scratch space (available at $SCRATCH)
- To find the path of your scratch directory, type cds (to change into it) and then pwd.
- You can download files directly to your scratch directory with wget. For example, to get the Open American National Corpus (319 MB), run:
$ cd /your/scratch/dir
$ wget http://americannationalcorpus.org/OANC/OANC-1.0.1-UTF8.zip
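The download can also be wrapped in a small helper so files always land in scratch and an interrupted transfer can be resumed, which helps with a file the size of the OANC. This is a minimal sketch: the function name fetch_to_scratch is hypothetical, and it relies only on the standard wget options -c (continue a partial download) and -P (directory to save into).

```shell
#!/usr/bin/env bash
# Hypothetical helper: download a URL into your scratch directory,
# or into the directory given as the second argument.
fetch_to_scratch() {
  local url=$1
  local dest=${2:-${SCRATCH:-}}
  if [ -z "$dest" ]; then
    echo "fetch_to_scratch: no destination given and SCRATCH is not set" >&2
    return 1
  fi
  # Make sure the target directory exists before downloading.
  mkdir -p "$dest" || return 1
  # -c resumes a partially downloaded file; -P sets the directory to save into.
  wget -c -P "$dest" "$url"
}
```

For example, `fetch_to_scratch http://americannationalcorpus.org/OANC/OANC-1.0.1-UTF8.zip` saves the corpus into $SCRATCH, and rerunning the same command after a dropped connection picks up where it stopped.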
- You can use scp to copy files from your local computer to your Longhorn scratch directory:
$ scp myfile.tgz username@longhorn.tacc.utexas.edu:/your/scratch/dir
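If your data is a directory of many files, it usually transfers faster as one compressed archive (like myfile.tgz above) than file by file. A minimal sketch, assuming bash, tar, and scp on your local machine; pack_and_send and its argument names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bundle a local directory into a .tgz and copy it
# to a directory on Longhorn with scp.
pack_and_send() {
  local src_dir=$1      # local directory to upload
  local user=$2         # your TACC username
  local remote_dir=$3   # destination path on Longhorn, e.g. your scratch directory
  local archive
  archive="$(basename "$src_dir").tgz"

  # Create the archive in the current local directory, storing paths
  # relative to the parent of src_dir so it unpacks into a single folder.
  tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1

  scp "$archive" "$user@longhorn.tacc.utexas.edu:$remote_dir"
}
```

On Longhorn, `tar -xzf` on the uploaded archive inside your scratch directory restores the original folder.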
- You can also use FileZilla to connect to the Longhorn server over SFTP, with these settings:
- Host: longhorn.tacc.utexas.edu
- Server Type: SFTP
- Logon Type: Normal
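If you prefer the command line to FileZilla, the OpenSSH sftp client can upload to the same host non-interactively from a batch of commands; `-b -` reads that batch from standard input. A minimal sketch: sftp_upload is a hypothetical name, and it assumes key-based login is set up, because sftp's batch mode does not prompt for a password.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload one or more local files to a remote
# directory on Longhorn using sftp in batch mode.
sftp_upload() {
  local user=$1 remote_dir=$2
  local f
  shift 2
  # Build the batch: change to the remote directory, then put each file.
  # Quotes keep paths containing spaces intact.
  {
    printf 'cd "%s"\n' "$remote_dir"
    for f in "$@"; do
      printf 'put "%s"\n' "$f"
    done
  } | sftp -b - "$user@longhorn.tacc.utexas.edu"
}
```

In batch mode sftp stops at the first command that fails, so a mistyped remote directory aborts the upload instead of scattering files elsewhere.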
- To upload large files:
- Put your data in your own directory under /scratch.
- To change to your scratch directory, type:
login1:~$ cd $SCRATCH
or
login1:~$ cds
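In a job script the cds shortcut may not work: it is typically an interactive alias for `cd $SCRATCH` (an assumption about TACC's login environment), and non-interactive bash does not expand aliases. A minimal sketch that relies on $SCRATCH instead; goto_scratch is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: change to the scratch directory using $SCRATCH,
# which works in scripts where an interactive alias such as cds may not.
goto_scratch() {
  if [ -z "${SCRATCH:-}" ]; then
    echo "goto_scratch: SCRATCH is not set (are you on a TACC login node?)" >&2
    return 1
  fi
  cd "$SCRATCH" || return 1
  pwd   # same result as typing cds and then pwd
}
```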
- The time to upload data from local storage to the Hadoop DFS (HDFS) was about the same with the 1way 128 and 1way 256 settings: there is not much difference because I/O takes most of the time.
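To repeat this comparison in your own runs, the copy from scratch into HDFS can be timed directly. A minimal sketch, assuming the hadoop command is on your PATH and the HDFS destination directory already exists; `hadoop fs -put` copies a local file into HDFS, and timed_put is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a local file into HDFS and report the
# wall-clock time the copy took, in whole seconds.
timed_put() {
  local src=$1 dest=$2
  local start end
  start=$(date +%s)
  hadoop fs -put "$src" "$dest" || return 1
  end=$(date +%s)
  echo "Copied $src to $dest in $((end - start)) s"
}
```

For example, `timed_put $SCRATCH/OANC-1.0.1-UTF8.zip /user/username/` (a hypothetical HDFS path) prints the elapsed time, which can then be compared across job configurations.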