GHPCC Storage

To keep our cluster storage optimized, we follow the storage policies below. All PaiLab cluster users have access to the following locations:

HOME DIRECTORY (/home/[username]/)

    • 50G of storage reserved for your personal use, not accessible to any other user

    • only directory that is backed up

    • store all SCRIPTS/FINALIZED FILES here, so that analyses can be re-created if necessary
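
To see how close a home directory is to the 50G limit, something like the following Python sketch could be used. It is only an illustration: the names `HOME_QUOTA_BYTES`, `directory_usage_bytes`, and `format_usage` are made up for this example, it assumes "50G" means 50 GiB, and it sums file sizes rather than disk blocks, so the number may differ somewhat from what the cluster's own quota accounting reports.

```python
import os

# Assumption: the "50G" home quota is interpreted as 50 GiB.
HOME_QUOTA_BYTES = 50 * 1024**3


def directory_usage_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def format_usage(used_bytes, quota_bytes=HOME_QUOTA_BYTES):
    """Return a short human-readable usage summary."""
    gib = 1024**3
    return f"{used_bytes / gib:.1f} GiB of {quota_bytes / gib:.0f} GiB used"
```

For example, `print(format_usage(directory_usage_bytes(os.path.expanduser("~"))))` would report usage for the current user's home directory.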

PROJECT FOLDER (/project/umw_athma_pai/)

    • 1.5T of storage accessible to all group members

    • when saving something here, please set the permissions so that all group members have read, write, and execute (rwx) access

    • Store the following:

      • genome information (/project/umw_athma_pai/genomes/)

        • common-use genome fasta files, mapping indexes, gtfs, etc., as needed

        • organized by: /project/umw_athma_pai/genomes/[species]/[genome build]/

      • raw data (/project/umw_athma_pai/raw/)

        • common-use raw data files, primarily fastq files for Illumina data and both signal and fastq files for MinION data

        • includes: (1) data generated by our lab and (2) data downloaded from SRA/GEO

        • stay tuned for an organization system to keep track of and search all data stored here
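
The group-permission rule for the project folder can be applied to a newly saved directory with a short Python sketch like the one below. The function names `grant_group_rwx` and `grant_group_rwx_recursive` are hypothetical. One design choice to note: directories always get group execute (so members can enter them), while regular files only get group execute if the owner can already execute them, roughly in the spirit of `chmod -R g+rwX`. Adjust this if every file should literally be group-executable.

```python
import os
import stat


def grant_group_rwx(path):
    """Add group read/write (and execute where sensible) to one path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode | stat.S_IRGRP | stat.S_IWGRP
    # Directories need execute so group members can enter them; regular
    # files only get group execute if the owner can already execute them.
    if os.path.isdir(path) or mode & stat.S_IXUSR:
        new_mode |= stat.S_IXGRP
    os.chmod(path, new_mode)
    return new_mode


def grant_group_rwx_recursive(root):
    """Apply grant_group_rwx to root and everything below it (skipping symlinks)."""
    grant_group_rwx(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                grant_group_rwx(path)
```

A call such as `grant_group_rwx_recursive("/project/umw_athma_pai/genomes/[species]/[genome build]")`, with the bracketed parts filled in, would open up one genome build directory to the group. It only works on files you own.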

NEARLINE FOLDER (/nl/umw_athma_pai/)

    • currently 2T of storage accessible to all group members, with options to increase storage as needed

    • when saving something here, please save to a user-specific folder or a common folder whose name clearly indicates its purpose

    • Store the following:

      • files that are being actively worked on, including PROCESSED FILES / ANALYSES, etc.

GENERAL RULES/POLICIES:

    • Every user may use as much storage space as their projects require, but please be reasonable and conscientious about usage

      • avoid saving multiple versions of the same file

      • gzip files when possible

      • regularly clean up temporary files, including test files, intermediate files, error/output files, etc

    • Thanks to an automated script, we will send regular updates regarding cluster usage, including information about:

      • project/nl usage per folder/user

      • instances of the following files:

        • unzipped fastq files --> all fastq files should be gzipped

        • sam files --> all sam files should be converted to bam files

        • [suggest other checks here]
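
To check your own folders before the usage report goes out, a Python sketch along these lines could help. It is not the lab's automated script, only an illustration of the two file checks listed above. The names `find_flagged_files` and `gzip_in_place`, and the assumption that uncompressed fastq files end in `.fastq` or `.fq`, are choices made for this example. Converting sam to bam is not covered here, since it requires an external tool.

```python
import gzip
import os
import shutil

# Assumption: uncompressed fastq files use one of these extensions.
FASTQ_SUFFIXES = (".fastq", ".fq")
SAM_SUFFIX = ".sam"


def find_flagged_files(root):
    """Return paths of uncompressed fastq files and sam files under root."""
    flagged = {"unzipped_fastq": [], "sam": []}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            lower = name.lower()
            path = os.path.join(dirpath, name)
            if lower.endswith(FASTQ_SUFFIXES):
                flagged["unzipped_fastq"].append(path)
            elif lower.endswith(SAM_SUFFIX):
                flagged["sam"].append(path)
    return flagged


def gzip_in_place(path):
    """Compress path to path + '.gz' and remove the original file."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Only delete the original after the compressed copy is fully written.
    os.remove(path)
    return gz_path
```

For example, `find_flagged_files("/nl/umw_athma_pai/")` lists candidates, and each entry under `"unzipped_fastq"` can then be passed to `gzip_in_place`. Files that already end in `.fastq.gz` or `.bam` are not flagged.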