Transferring Cloud Storage Data with rclone

Transferring files to and from Google Drive with RCLONE

Having access to Google Drive from the HPC environment provides an option to archive data and even share data with collaborators who have no access to the NYU HPC environment. Other options to archiving data include the HPC Archive file system and using Globus to share data with collaborators.

Access to Google Drive is provided by rclone - rsync for cloud storage - a command line program to sync files and directories to and from cloud storage systems such as Google Drive, Amazon Drive, S3, B2 etc. rclone is available on Greene cluster as a module, the module currently (November 2022) is rclone/1.60.1 

For more details on how to use rclone to sync files to Google Drive, please see: https://rclone.org/drive/

rclone can be invoked in one of the three modes:

Please try with these options: 

rclone --transfers=32 --checkers=16 --drive-chunk-size=16384k --drive-upload-cutoff=16384k copy source:sourcepath dest:destpath

This option works great for file sizes 1Gb+ to 250GB. Keep in mind that there is a rate limiting of 2 files/sec for upload into Google Drive.  Small file transfers don’t work that well. If you have many small jobs, please tar the parent directory of such folders and splits the tar file into 100GB chunks and uploads then into Google Drive.

rclone Configuration

You need to configure rclone before you will be able to move files between the HPC Environment and Google Drive

There are specific instruction on the rclone web site: https://rclone.org/drive/

Step 1:  Login to Greene:

Follow instructions to log into the Greene HPC cluster.

Step 2:  Load the rclone module

$ module load rclone/1.60.1

Step 3: Configure rclone

Configuring rclone and setting up remote access to your Google Drive, using the command:

$ rclone config

This will try to open the config files and you will see the below content:

You can select one of the options (here we show how to set up a new remote)

2021/03/23 18:10:29 NOTICE: Config file "/home/netid/.config/rclone/rclone.conf" not found - using defaultsNo remotes found - make a new onen) New remotes) Set configuration passwordq) Quit confign/s/q> nname> remote1Type of storage to configure.Enter a string value. Press Enter for the default ("").Choose a number from below, or type in your own value 1 / 1Fichier   \ "fichier" 2 / Alias for an existing remote   \ "alias" 3 / Amazon Drive   \ "amazon cloud drive" 4 / Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, Tencent COS, etc)   \ "s3" 5 / Backblaze B2   \ "b2" 6 / Box   \ "box" 7 / Cache a remote   \ "cache" 8 / Citrix Sharefile    \ "sharefile" 9 / Dropbox   \ "dropbox"10 / Encrypt/Decrypt a remote    \ "crypt"11 / FTP Connection   \ "ftp" 12 / Google Cloud Storage (this is not Google Drive)   \ "google cloud storage"13 / Google Drive   \ "drive" 14 / Google Photos   \ "google photos"............37 / premiumize.me   \ "premiumizeme"38 / seafile   \ "seafile"Storage> 13** See help for drive backend at: https://rclone.org/drive/ **Google Application Client IdSetting your own is recommended.See https://rclone.org/drive/#making-your-own-client-id for how to create your own.If you leave this blank, it will use an internal key which is low performance.Enter a string value. Press Enter for the default ("").client_id> Just Hit EnterOAuth Client SecretLeave blank normally.Enter a string value. Press Enter for the default ("").client_secret> Just Hit EnterScope that rclone should use when requesting access from drive.Enter a string value. Press Enter for the default ("").Choose a number from below, or type in your own value 1 / Full access all files, excluding Application Data Folder.   \ "drive" 2 / Read-only access to file metadata and file contents.   \ "drive.readonly"   / Access to files created by rclone only. 3 | These are visible in the drive website.   | File authorization is revoked when the user deauthorizes the app.   \ "drive.file"   / Allows read and write access to the Application Data folder. 4 | This is not visible in the drive website.   \ "drive.appfolder"   / Allows read-only access to file metadata but 5 | does not allow any access to read or download file content.   \ "drive.metadata.readonly"scope> 1ID of the root folderLeave blank normally.Fill in to access "Computers" folders (see docs), or for rclone to usea non root folder as its starting point.Enter a string value. Press Enter for the default ("").root_folder_id> Just Hit EnterService Account Credentials JSON file pathLeave blank normally.Needed only if you want use SA instead of interactive login.Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.Enter a string value. Press Enter for the default ("").service_account_file> Just Hit EnterEdit advanced config? (y/n)y) Yesn) No (default)y/n> nRemote configUse auto config? * Say Y if not sure * Say N if you are working on a remote or headless machiney) Yes (default)n) Noy/n> nPlease go to the following link: https://accounts.google.com/o/oauth2/auth?access_type=offline&client_id= CUT AND PASTE The URL ABOVE INTO A BROWSER ON YOUR LAPTOP/DESKTOP Log in and authorize rclone for accessEnter verification code> ENTER VERIFICATION CODE HEREConfigure this as a team drive?y) Yesn) No (default)y/n> n--------------------[remote1]type = drivescope = drivetoken = {"access_token":", removed "}--------------------y) Yes this is OK (default)e) Edit this remoted) Delete this remotey/e/d> yCurrent remotes:Name                 Type====                 ====remote1              drivee) Edit existing remoten) New remoted) Delete remoter) Rename remotec) Copy remotes) Set configuration passwordq) Quit confige/n/d/r/c/s/q> q

Step 4:

Sample commands:

$ rclone lsd remote1:

Transferring files to Google Drive, using the command below:

$ rclone copy <source_folder> <remote_name>:<name_of_folder_on_gdrive>

It looks something like below:

$ rclone copy /home/user1 remote1:backup_home_user1

Step 5:

The files are transferred and you can find the files on your Google Drive.

Note: Rclone only copies new files or files different from the already existing files on Google Drive.