Archivetar
Archivetar (V2) is a collection of tools intended to make archiving and using big data easier. Although targeted mostly at the research / HPC use case, it is also useful in other cases where having fewer files, but not one gigantic file, is beneficial.
Fig. Create an archive of files smaller than 100M and send it via Globus, since Globus performs poorly with small files.
Archivetar is available on the Rider (RHEL 7) cluster as a container managed through the singularity module. The variable $ARCHIVETAR is set by the singularity module. Reference that variable explicitly as shown below. To see other system-provided containers available through the singularity module, use 'module display singularity'.
Load Singularity Module
module load singularity
Execute
singularity exec $ARCHIVETAR archivetar --help | more
output:
usage: archivetar [-h] [--dryrun] -p PREFIX [-s SIZE] [-t TAR_SIZE]
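Because every command on this page depends on $ARCHIVETAR, a job script may want to confirm that the variable is set before going further. The following is a minimal sketch; the function name check_archivetar_env and its messages are hypothetical and not part of Archivetar or the singularity module.

```shell
# Hypothetical helper: fail early if the singularity module has not
# exported ARCHIVETAR into the current shell.
check_archivetar_env() {
    if [ -z "${ARCHIVETAR:-}" ]; then
        echo "ARCHIVETAR is not set; run 'module load singularity' first" >&2
        return 1
    fi
    echo "Using container: $ARCHIVETAR"
}
```

A script could call `check_archivetar_env || exit 1` right after `module load singularity`.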
Check Archivetar commands:
singularity exec $ARCHIVETAR ls /archivetar/dist
archivepurge archivescan archivetar unarchivetar
Check the usage for each command:
singularity exec $ARCHIVETAR archivetar --help
Prepare a directory for archive
options:
-h, --help show this help message and exit
--dryrun Print what would do but dont do it, aditional --dryrun increases how far the script runs 1 = Walk Filesystem and stop, 2 = Filter and create sublists
-p PREFIX, --prefix PREFIX
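The --dryrun option in the help text above can be used to preview a run before anything is archived. The following is a minimal sketch that reuses the size, tar-size, and prefix values that appear elsewhere on this page; the wrapper name preview_archive is hypothetical.

```shell
# Hypothetical wrapper around a dry run of archivetar.
# Per the help text, one --dryrun walks the filesystem and stops;
# passing --dryrun a second time also filters and creates the sublists.
preview_archive() {
    prefix="$1"
    singularity exec "$ARCHIVETAR" archivetar --dryrun \
        --size 100MB --tar-size 1G --prefix "$prefix"
}
```

Calling `preview_archive my-archive` from the directory to be archived should print what would be done without doing it.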
Go to the location from which you are transferring files/directories and issue the following command to transfer data from the source Globus endpoint (e.g. cwru#dtn2) to the destination Globus endpoint UUID (e.g. 0409ae6e-a356-xxxx). You can get the UUID from Globus.org -> Collections -> click on the collection name.
singularity exec $ARCHIVETAR archivetar --size 100MB --tar-size 1G --prefix my-archive --remove-files --source cwru#dtn2 --destination 0409ae6e-a356-11e9-a379-0a2653bc2660 --destination-dir /~/
Here, the cutoff size is 100MB: smaller files are archived into tars of 1G each (i.e. my-archive-1.tar, my-archive-2.tar, etc., where my-archive is the prefix) and then deleted (flag --remove-files). The bigger files are transferred as is. The first time, you may be asked to validate the account with a code, as shown:
Please go to this URL and login:
https://auth.globus.org/v2/oauth2/authorize?xxx
Please enter the code you get after login here:
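To get a rough idea of how many files fall under the size cutoff before archiving, the standard find utility can be used independently of Archivetar. The following is a sketch only: the function name count_small_files is hypothetical, and it assumes 100MB means 100,000,000 bytes, which may differ from how Archivetar itself parses the size.

```shell
# Hypothetical helper: count regular files smaller than a byte limit.
# Assumption: the default limit of 100000000 bytes stands in for "100MB";
# archivetar's own unit handling may differ.
count_small_files() {
    dir="$1"
    limit_bytes="${2:-100000000}"
    # -size -Nc matches files strictly smaller than N bytes.
    # File names containing newlines would be counted more than once.
    find "$dir" -type f -size "-${limit_bytes}c" -print | wc -l | tr -d ' '
}
```

For example, `count_small_files .` reports the number of files under the assumed cutoff in the current directory tree.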
Clean up Tars:
rm -f my-archive*
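Note that the pattern my-archive* matches every file that starts with the prefix, including a purge list such as my-archive-*.cache. If such files should be kept, a narrower cleanup could look like the sketch below. The helper name cleanup_tars is hypothetical, and it assumes the archives follow the my-archive-1.tar naming described above.

```shell
# Hypothetical helper: remove only <prefix>-*.tar files in the current
# directory, leaving other files that share the prefix untouched.
cleanup_tars() {
    prefix="$1"
    for f in "$prefix"-*.tar; do
        # Skip the literal pattern when nothing matches.
        [ -e "$f" ] || continue
        rm -f -- "$f"
    done
}
```

Running `cleanup_tars my-archive` would then delete the tars while leaving the .cache file in place.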
Purge small files from a directory prepped by archivetar, if archivetar was used with the option --save-purge-list:
singularity exec $ARCHIVETAR archivepurge --purge-list my-archive-*.cache
Untar/expand the archived directory, using the prefix that you chose:
singularity exec $ARCHIVETAR unarchivetar --prefix my-archive
For more usage details, see the Archivetar Usage GitHub page.
Review CWRU HPC singularity usage notes