Globus 

Large Scale Data Transfer (Utility Software)

Why use Globus?

The best way to move large amounts of data onto or off of Research Computing (RC) servers is to use Globus together with the data transfer node.

While Secure File Transfer Protocol (sftp) is a universal standard, it has disadvantages: it is relatively slow, and if the connection is disrupted the entire file transfer has to be restarted.

Globus automatically pauses a transfer if your machine loses its internet connection and resumes it once you reconnect. GridFTP, the protocol Globus is built on, supports parallel TCP streams and multi-node transfers to achieve high performance.

"Globus lets you use a web browser or command line interface to submit transfer and synchronization requests, optionally choosing encryption. Globus takes it from there. With this 'fire and forget' model you can concentrate on your research while Globus handles the mundane (but important) details of successful large-scale data transfer" (Globus site).

How do I log in to the data transfer node?

If you are on Mac or Linux, you can ssh to the data transfer node using the command shown below, replacing 'user' with your user name. If you are on Windows, you will need an ssh client such as PuTTY. More information can be found here.

ssh user@dtn.coeus.rc.pdx.edu

If you are unable to connect, you may need to request access to this node from help-rc@pdx.edu.


Transferring Data with Globus using the Data Transfer Node

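Basic Steps:

1. Set up Globus Connect Personal on your machine and sign in to the Globus web app.
2. Create a directory for yourself under /vol/globus on the data transfer node.
3. Transfer data between your machine and that directory with the Globus web app.
4. Move the data between /vol/globus and its final location with rsync.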


Globus Setup

/vol/globus is NOT a volume to store your data. Data here will be periodically removed. Move your data to an appropriate location after transferring it, such as your home directory or research share.

Globus Web App and Setting up the Globus Client 

To begin, you need to set up a Globus client on the machine you would like to transfer data to or from, and create an account on the web app. The client runs in the background on your machine and makes it possible for Globus to access your filesystem. The web app is what you will use to transfer data. Be sure to choose Portland State University as your organization when setting up the web app.

Detailed instructions for downloading Globus Connect Personal and setting up an endpoint can be found on the Globus site. Follow the instructions for your operating system. If you choose not to run the Globus client automatically on startup (an installation option), you will need to start it manually before transferring any data to or from your local endpoint.


Transferring data

Personal Endpoints

At this point you have configured a single endpoint. You can practice transferring toy data with this tutorial from Globus. You can also set up another endpoint on a different machine and use Globus to transfer data between the two, as long as Globus Connect Personal is running on both.


Data Transfer Node

The data transfer node is set up specifically for high speed data transfer. It can be used to transfer data on to or off of PSU's research computing systems at higher speeds than possible with two personal endpoints. To transfer data to or from the data transfer node, you will first need to create a directory to store your data. Log in to the data transfer node then run the following commands:

cd /vol/globus

mkdir your_username
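The placeholder your_username has to be replaced with your own account name. A minimal sketch, assuming the shell on the data transfer node reports your login name in $USER, wraps the two commands above in one function. The name make_staging_dir is hypothetical, and its optional argument exists only so the function can be tried against a scratch directory.

```shell
# Hypothetical helper: create a per-user staging directory under a base path.
# The base defaults to /vol/globus, the staging volume named above.
make_staging_dir() (
    base="${1:-/vol/globus}"
    # Assumption: $USER holds your login name on the data transfer node;
    # fall back to `id -un` if the variable is unset.
    name="${USER:-$(id -un)}"
    # -p makes the call safe to repeat if the directory already exists.
    mkdir -p "$base/$name" && printf '%s\n' "$base/$name"
)
```

After logging in to the data transfer node, running make_staging_dir with no arguments creates the directory if needed and prints its path.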

Now, in the Globus web app, navigate to the File Manager, click 'transfer or sync', and select the two 'Collections' you want to transfer data between. If you search for 'coeus', the option 'portlandstate#coeus' will appear in the drop-down. This collection is how you access the directory you created, '/vol/globus/your_username', on the data transfer node.


Select all the data you wish to transfer, and click the 'Start' button.
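The same kind of transfer can also be submitted from the command line. The following is a rough sketch, assuming the separately installed Globus CLI (the globus command) is available and you have already run globus login. The function submit_globus_transfer is a hypothetical wrapper, and the collection IDs passed to it are placeholders you would look up yourself, for example with globus endpoint search coeus.

```shell
# Hypothetical wrapper around `globus transfer` (Globus CLI, assumed installed).
# Arguments: source collection ID, source path, destination collection ID,
# destination path. All IDs and paths are supplied by the caller.
submit_globus_transfer() (
    src_id="$1"; src_path="$2"; dst_id="$3"; dst_path="$4"
    # --recursive copies a whole directory tree; --label names the task so
    # it is easier to recognize later.
    globus transfer --recursive --label "staging transfer" \
        "${src_id}:${src_path}" "${dst_id}:${dst_path}"
)
```

A call might look like submit_globus_transfer "$LOCAL_ID" /~/Example_Directory/ "$COEUS_ID" /vol/globus/user/Example_Directory/, where both variables are placeholders for IDs found with the search command.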


Transferring to or from Standard Compute Servers

You can use the rsync command to transfer data to or from the data transfer node. Log in to the data transfer node and navigate to your directory, '/vol/globus/your_username'. Because rsync may take a long time, it is a good idea to run it inside a screen or tmux session in case you lose your connection to the server.

The following is an example transferring 'Example_Directory' from the home directory of user 'user' on Circe to the data transfer node:

ssh user@dtn.coeus.rc.pdx.edu

cd /vol/globus/user

screen

rsync -avz user@circe.rc.pdx.edu:/home/user/Example_Directory ./

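About this rsync command: the -a (archive) option copies directories recursively and preserves permissions, timestamps, and symbolic links; -v (verbose) lists each file as it is transferred; and -z compresses the data in transit. Because the source path has no trailing slash, Example_Directory itself is created inside the current directory.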
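Before starting a long copy, it can help to preview what rsync would do. The sketch below assumes the same kind of host and path as the example above. The name preview_pull is hypothetical; --dry-run is a standard rsync option that lists what would be transferred without copying anything.

```shell
# Hypothetical helper: preview a pull from a remote host into the current
# directory. Arguments: remote login (user@host) and remote path.
preview_pull() (
    remote="$1"; path="$2"
    # -a  archive mode: recurse and preserve permissions, times, and symlinks
    # -v  verbose: list each file
    # -z  compress data in transit
    # --dry-run  report what would be copied without transferring anything
    rsync -avz --dry-run "${remote}:${path}" ./
)
```

For example, preview_pull user@circe.rc.pdx.edu /home/user/Example_Directory prints the file list; running the same rsync command without --dry-run performs the actual copy.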


If you are using screen, detach from the session by pressing <ctrl><a> together, releasing, and then pressing <d>. You will get output similar to:

[detached from #####.pts-0.dtn]

Refer to the bottom of this page for more resources on screen and tmux.

After rsync has completed, reattach to the screen session and end it. To reattach, run the command 'screen -r'. End the session by pressing <ctrl><d> at the same time, which exits the shell running inside it. You will get the output:

[screen is terminating]
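As an alternative to attaching and detaching by hand, screen can start a command in a session that is detached from the beginning. This is a minimal sketch, assuming GNU screen is installed on the data transfer node; the function name run_detached and the session name used with it are hypothetical.

```shell
# Hypothetical helper: run a command inside a new, already-detached screen
# session. Arguments: session name, then the command and its arguments.
run_detached() (
    session="$1"; shift
    # -d -m starts the session detached; -S gives it a name to reattach to.
    # The session ends on its own when the command exits.
    screen -dmS "$session" "$@"
)
```

For example, run_detached transfer rsync -avz user@circe.rc.pdx.edu:/home/user/Example_Directory ./ starts the copy in a session named 'transfer'. Use 'screen -ls' to list sessions and 'screen -r transfer' to reattach, which is also where you would type a password if rsync prompts for one.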

More information on basic screen commands


The same process can be used to transfer files off of the data transfer node after they have arrived through Globus.
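For that direction, the source and destination of the rsync command simply swap. The sketch below assumes it is run on the data transfer node from your directory under /vol/globus; push_to_server is a hypothetical helper name, and the host and paths in the usage note mirror the earlier example.

```shell
# Hypothetical helper: copy a local directory to a remote destination.
# Arguments: local directory, remote login (user@host), remote parent path.
push_to_server() (
    # Strip one trailing slash so the directory itself, not only its
    # contents, is created under the remote parent path.
    dir="${1%/}"; remote="$2"; dest="$3"
    rsync -avz "$dir" "${remote}:${dest}/"
)
```

For example, push_to_server ./Example_Directory user@circe.rc.pdx.edu /home/user would create /home/user/Example_Directory on Circe. Remember that /vol/globus is cleared periodically, so confirm the copy arrived before relying on it.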