Archivetar

Archivetar (V2) is a collection of tools intended to make archiving and using big data easier. It is targeted mostly at the research/HPC use case, but it is also useful in any case where having fewer files, without packing everything into one gigantic file, is beneficial.

Fig. Create an archive of files smaller than 100M and send it via Globus, since Globus performs poorly with small files.


Archivetar is available on the Rider (RHEL 7) cluster as a container managed through the singularity module. The singularity module sets the variable $ARCHIVETAR; reference that variable explicitly, as shown below. To see other system-provided containers available through the singularity app, use 'module display singularity'.

Load Singularity Module

module load singularity

Execute

singularity exec $ARCHIVETAR archivetar --help | more

output:

usage: archivetar [-h] [--dryrun] -p PREFIX [-s SIZE] [-t TAR_SIZE]  

Check Archivetar commands:

singularity exec $ARCHIVETAR ls /archivetar/dist
archivepurge  archivescan  archivetar  unarchivetar

Check the usage for each command:

singularity exec $ARCHIVETAR archivetar --help

Prepare a directory for archive

options:
  -h, --help            show this help message and exit
  --dryrun              Print what would do but dont do it, aditional --dryrun increases how far the script runs 1 = Walk Filesystem and stop, 2 = Filter and create sublists
  -p PREFIX, --prefix PREFIX
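
The --dryrun flag listed in the help output above can be used to preview a run before anything is tarred or transferred. The following is a minimal sketch, not a site-provided tool: the function name archivetar_dryrun is hypothetical, and the 100MB and 1G values are only the example sizes used on this page. It also checks that $ARCHIVETAR is set, since that variable exists only after the singularity module is loaded.

```shell
# Hypothetical helper: preview an archivetar run without creating any tars.
archivetar_dryrun() {
    prefix="$1"
    # $ARCHIVETAR is set by 'module load singularity'; stop early if it is missing.
    if [ -z "${ARCHIVETAR:-}" ]; then
        echo "ARCHIVETAR is not set; run 'module load singularity' first" >&2
        return 1
    fi
    # A single --dryrun walks the filesystem and stops (per the help text).
    singularity exec "$ARCHIVETAR" archivetar --dryrun --size 100MB --tar-size 1G --prefix "$prefix"
}
```

After loading the module, it could be called as: archivetar_dryrun my-archive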


Go to the location from which you are transferring files/directories and issue the following command to transfer data from the source Globus endpoint (e.g. cwru#dtn2) to the destination Globus endpoint UUID (e.g. 0409ae6e-a356-xxxx). You can get the UUID from Globus.org -> Collections -> click on the collection name.

singularity exec $ARCHIVETAR archivetar --size 100MB --tar-size 1G --prefix my-archive --remove-files --source cwru#dtn2 --destination 0409ae6e-a356-11e9-a379-0a2653bc2660 --destination-dir /~/

Here, the cutoff size is 100MB: files smaller than that are archived into tars of 1G each (i.e. my-archive-1.tar, my-archive-2.tar etc., where my-archive is the prefix), and the small files are then deleted (flag --remove-files). Bigger files are transferred as is. The first time, you may be asked to validate your account with a code, as shown:

Please go to this URL and login:
https://auth.globus.org/v2/oauth2/authorize?xxx

Please enter the code you get after login here:  
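
Because --remove-files deletes the small files once they are tarred, it can help to see beforehand which files fall under the cutoff. The following is a rough sketch using plain find rather than archivetar itself. The function name list_small_files is hypothetical, and treating 100MB as 100000000 bytes is an assumption; archivetar may interpret the unit differently, so the list is only an approximation of what it would select.

```shell
# Hypothetical helper: list regular files under DIR that are smaller than CUTOFF bytes.
list_small_files() {
    dir="$1"
    cutoff_bytes="$2"
    # The 'c' suffix makes find compare exact byte counts (strictly less than the cutoff).
    find "$dir" -type f -size "-${cutoff_bytes}c"
}
```

For example, to count candidate files in the current directory: list_small_files . 100000000 | wc -l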

Clean up Tars:

rm -f my-archive*  

Purge small files, i.e. delete the small files from a directory prepped by archivetar (requires that archivetar was run with the option --save-purge-list):

singularity exec $ARCHIVETAR archivepurge --purge-list my-archive-*.cache

Untar/expand the archived directory, using the prefix that you chose:

singularity exec $ARCHIVETAR unarchivetar --prefix my-archive
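
Before expanding, the contents of the tars can be inspected with standard tar. This is a minimal sketch: the function name list_archive_members is hypothetical, and the PREFIX-*.tar pattern assumes uncompressed tars named as described above (my-archive-1.tar, my-archive-2.tar etc.); adjust it if your tars are named or compressed differently.

```shell
# Hypothetical helper: print the member names of every tar matching PREFIX-*.tar.
list_archive_members() {
    prefix="$1"
    for t in "$prefix"-*.tar; do
        # Skip the unexpanded pattern when no tar matches.
        [ -e "$t" ] || continue
        tar -tf "$t"
    done
}
```

For example: list_archive_members my-archive | head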

For more usage details, see the Archivetar Usage GitHub page.

Review CWRU HPC singularity usage notes