Rclone

The following directions are heavily borrowed from MIT Libraries and Digital Archivist Joe Carrano. You can read more of their digital preservation workflows here.


Rclone is a tool for managing material in cloud services. For BHL’s purposes it is used as a way of transferring content out of a donor's cloud storage into BHL's digital storage.



Setting up Rclone

Rclone is already installed on the Removable Media Workstations, but if you want to transfer content using your own computer you’ll need to install and set up Rclone yourself. Set-up (creating a "remote" for the service) has to be repeated for each cloud storage service, and a remote may need to be reconnected if some time has passed since you last used Rclone. Please contact the Archivist for Digital Curation for help with installation and set-up.
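If you are setting up your own computer, a quick check like the following can confirm that Rclone is installed and that a remote exists before you start. This is a minimal sketch: the function name remote_is_configured is hypothetical, and the remote names (dropbox, googledrive) are only examples. It relies on rclone listremotes, which prints each configured remote followed by a colon.

```shell
# Hypothetical helper: succeeds if rclone is installed and the named remote exists.
# Usage: remote_is_configured REMOTE_NAME   (name without the trailing colon)
remote_is_configured() {
  local remote="$1"
  # Stop early if the rclone executable cannot be found.
  if ! command -v rclone >/dev/null 2>&1; then
    echo "rclone is not installed" >&2
    return 2
  fi
  # `rclone listremotes` prints one remote per line, e.g. "dropbox:".
  # -x matches the whole line, -F treats the name literally.
  rclone listremotes | grep -qxF "${remote}:"
}
```

For example, remote_is_configured dropbox || rclone config opens Rclone's interactive set-up when the remote is missing, and rclone config reconnect dropbox: renews the login of a remote that has expired.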

Using Rclone

In general we use Rclone for transferring files from cloud services. When possible we also use it to confirm the fixity of the files downloaded. 

Appraisal or preparing for a transfer 

When transferring content from a cloud service, the extent and contents of the storage may not be apparent. Rclone has some features that allow for basic analysis of the contents of cloud storage.

Creating a list of files and basic metadata 

The lsf command lists information about files in a machine-readable way. The --format option selects which fields to include; particularly useful ones are p for path, t for modification time, m for MIME type (i.e. the file type), and s for size. Adding -R lists folders recursively, and --files-only leaves the folders themselves out of the list. With the --csv flag, this produces a CSV that can be used to appraise the content's preservation needs, to analyze the content based on file names and modification dates, and to sum the size column to determine how much storage a transfer will require. Here is an example:



rclone lsf --csv --format ptms -R --files-only [remote name]:[folder or file name] > path/to/csv/filename.csv

Use the name of the remote as it was set up for the service. If the folder or file name contains spaces, put quotation marks around it after the colon.


Here is an example for Dropbox:



rclone lsf --csv --format ptms -R --files-only dropbox:"UMSI" > /Home/Desktop/UMSI/file_list.csv
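Once you have the CSV, the size column can be totaled to estimate the storage a transfer needs. The sketch below (the function name sum_csv_sizes is hypothetical) assumes the CSV was made with --format ptms as above, so the size is the last field; reading the last field keeps the total correct even when a quoted path contains commas. Sizes of -1, which Rclone reports when it cannot determine a size (as for Google objects), are skipped.

```shell
# Hypothetical helper: total the size column (in bytes) of a CSV made with
# `rclone lsf --csv --format ptms`. With "ptms", size is the last field.
# Usage: sum_csv_sizes path/to/file_list.csv
sum_csv_sizes() {
  # Skip unknown sizes (-1); %.0f avoids integer overflow in some awk versions.
  awk -F',' '$NF >= 0 { total += $NF } END { printf "%.0f\n", total }' "$1"
}
```

For example, sum_csv_sizes /Home/Desktop/UMSI/file_list.csv prints the total in bytes. Rclone's own rclone size command can also report a total directly from the remote, without the CSV.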



Finding if a transfer contains Google Drive formats

When transferring content from Google Drive, there may be Google objects (Docs, Sheets, Slides, etc.) that Rclone will not export in their native Drive "format" but in an equivalent format, such as Microsoft Word, instead. Additionally, some object types (such as Forms) cannot be exported by Rclone at all. To prepare for how best to export these files, it is useful to make a list of all the files in the transfer that are Google objects. This list should be included in the submission documentation of the transfer to record the original format of these files.


To find this information, we use the lsf command as above but add --drive-show-all-gdocs to show Google objects (even those that can't be exported) and --metadata-include to filter for only Google object MIME types. We also leave size out of the format, since Rclone cannot measure it for Google objects.


Here is an example for Google Drive:


rclone lsf --csv --format ptm -R --files-only --drive-show-all-gdocs --metadata-include "vnd.google-apps.*" googledrive:"UMSI Web Archiving" > /Home/Desktop/UMSI_Web_Archiving/orig_in_gdrive_formats.csv
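To summarize this list for the submission documentation, a short sketch like the one below counts the Google objects by MIME type and flags any Google Forms, which Rclone cannot export. It assumes the CSV was made with --format ptm as in the example above, so the MIME type is the last field. The function names are hypothetical; application/vnd.google-apps.form is the MIME type Google uses for Forms.

```shell
# Hypothetical helper: count entries in a `rclone lsf --csv --format ptm` list
# by MIME type, most common first. MIME type is the last field with "ptm".
count_by_mimetype() {
  awk -F',' '{ print $NF }' "$1" | sort | uniq -c | sort -rn
}

# Hypothetical helper: succeeds if the list contains any Google Forms.
has_google_forms() {
  awk -F',' '$NF == "application/vnd.google-apps.form" { found = 1 }
             END { exit !found }' "$1"
}
```

For example, has_google_forms orig_in_gdrive_formats.csv && echo "Forms present" can serve as a reminder that those objects need a different export route.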



Copying files 

Basic copying

The command to copy files is fairly simple: you specify that you want to copy, enter the location of the files, and then enter their destination. For instance:


rclone copy [remote name]:[folder or file name] [/path/to/destination/folder]

The destination is usually the processing folder. Rclone copies the contents of the source folder into the destination, so to retain the original folder name, add it to the end of the destination path. As with lsf, put quotation marks around any folder or file name that contains spaces.


Here are examples:


rclone copy dropbox:"UMSI Web Archiving" "/Home/Desktop/UMSI Web Archiving"


rclone copy gdrive:"Planet Blue" /Home/Desktop/UM-Planet_Blue
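Keeping a log of each copy can be useful for documenting a transfer, and a dry run shows what would be copied before anything is downloaded. The sketch below wraps rclone copy with Rclone's --log-file and --log-level options; the function name copy_transfer and the log paths are hypothetical, and any extra arguments (for example --dry-run or a shared-folder flag) are passed through unchanged. Quoting each argument keeps folder names with spaces intact.

```shell
# Hypothetical helper: copy from a remote to a local folder, logging the transfer.
# Usage: copy_transfer SOURCE DESTINATION LOGFILE [extra rclone flags...]
copy_transfer() {
  local src="$1" dest="$2" log="$3"
  shift 3
  # INFO-level logging records each file that is copied.
  # "$@" forwards any extra flags, such as --dry-run, as-is.
  rclone copy "$src" "$dest" --log-file "$log" --log-level INFO "$@"
}
```

For example, run copy_transfer dropbox:"UMSI Web Archiving" "/Home/Desktop/UMSI Web Archiving" /Home/Desktop/UMSI_copy_log.txt --dry-run first, then run it again without --dry-run once the preview looks right. Keeping the log outside the destination folder avoids mixing it in with the transferred files.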


Shared folder copying

Things shared with you in cloud services often appear in a separate section from your personal storage area. To access them with Rclone, you will often need to add a service-specific flag to the copy command detailed above. Here are the flags for Google Drive and Dropbox:

Google Drive: --drive-shared-with-me
Dropbox: --dropbox-shared-folders


Google Drive copying

Google Drive has some unique features that sometimes allow for, or require, alternative steps.

You may need to check whether Google objects exist in your transfer; see the section on finding Google Drive formats above. We export the most common of these, Docs, Sheets, and Slides, in the equivalent OpenDocument formats. You can do this by setting the Google Drive export formats with --drive-export-formats (the default is Microsoft Office formats). Other options, such as PDF, are described in Rclone's Google Drive documentation.


Here is an example for a folder that was shared with you:


rclone copy googledrive:"UMSI Web Archiving" --drive-shared-with-me "/Home/Desktop/UMSI_Web_Archiving" --drive-export-formats ods,odt,odp


Extracting checksums

Some cloud providers store checksums in their systems that you can extract to facilitate fixity checking. Some use checksum types unique to the provider (such as Dropbox), while others use more standard types (such as MD5 in Google Drive). Here is the general layout of the command to extract the checksums into a text file:


rclone hashsum [checksum type] [remote name]:[folder or file name] --output-file /path/to/output/file.txt

Use the same remote and folder or file name as when copying.


Here is an example for Dropbox, whose checksum type is named dropbox in Rclone:

rclone hashsum dropbox dropbox:"UMSI Web Archiving" --output-file /Home/Desktop/UMSI_Web_Archiving/dropbox_checksums.txt


Here is an example for Google Drive:

rclone hashsum md5 googledrive:"UMSI Web Archiving" --output-file /Home/Desktop/UMSI_Web_Archiving/checksum.md5


NOTE: Google objects, such as Docs, Sheets, and Slides, do not have checksums stored in Google Drive that can be extracted. If you have any of these in the content you're transferring, they will be downloaded as regular files, but they will not have checksums in the checksum file extracted from Google Drive.
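Because the extracted checksum file becomes part of the transfer's documentation, it may help to guard against overwriting one by accident. The sketch below (the function name extract_checksums is hypothetical) refuses to run if the output file already exists and otherwise calls rclone hashsum exactly as in the examples above.

```shell
# Hypothetical helper: extract provider checksums without overwriting an earlier file.
# Usage: extract_checksums CHECKSUM_TYPE REMOTE:PATH OUTPUT_FILE
extract_checksums() {
  local hash_type="$1" src="$2" out="$3"
  # Keep any checksum file that was already extracted for this transfer.
  if [ -e "$out" ]; then
    echo "refusing to overwrite existing file: $out" >&2
    return 1
  fi
  rclone hashsum "$hash_type" "$src" --output-file "$out"
}
```

For example, extract_checksums md5 googledrive:"UMSI Web Archiving" /Home/Desktop/UMSI_Web_Archiving/checksum.md5 behaves like the Google Drive example above the first time and stops with an error if run again.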



Confirming fixity

To confirm fixity, there are a number of options:


Confirm fixity using the checksums you extracted in the steps above:

rclone checksum [checksum type] /path/to/checksum/file.txt /path/to/local_directory/of/copied_files


Here is an example for Dropbox:

rclone checksum dropbox /Home/Desktop/UMSI_Web_Archiving/dropbox_checksums.txt "/Home/Desktop/UMSI_Web_Archiving"


Confirm without an extracted checksum file, letting Rclone compare the source and the local copy itself:

rclone check [remote name]:[source folder] /path/to/local_copy/of/source_folder
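rclone check compares file sizes and, where both sides share a hash type, checksums; if they share none, it compares only sizes unless you add --download. In the examples above the checksum file is saved inside the copied folder, so the local copy contains a file the source does not. Rclone's --one-way flag checks only that every source file exists and matches in the destination, ignoring such extras, and --combined writes a report of matching, differing, and missing files. This is a sketch; the function name check_transfer and the report path are hypothetical.

```shell
# Hypothetical helper: verify a local copy against its cloud source.
# Usage: check_transfer REMOTE:PATH LOCAL_PATH REPORT_FILE [extra rclone flags...]
check_transfer() {
  local src="$1" dest="$2" report="$3"
  shift 3
  # --one-way: ignore extra local files, such as an extracted checksum file.
  # --combined: write a per-file report of matches, differences, and missing files.
  # rclone check exits non-zero if any file differs or is missing.
  rclone check "$src" "$dest" --one-way --combined "$report" "$@"
}
```

For example, check_transfer dropbox:"UMSI Web Archiving" "/Home/Desktop/UMSI_Web_Archiving" /Home/Desktop/UMSI_check_report.txt leaves a report that can be saved with the transfer's documentation.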