Our research space is:
/mnt/research/mendozacortes_group
See the wiki page for how to use research space: https://wiki.hpcc.msu.edu/display/ITH/Research+Space
Our scratch space is
/mnt/scratch/jmendoza
See the wiki page for more information about the scratch space: https://wiki.hpcc.msu.edu/display/ITH/Scratch+File+Systems
Update (June 07, 2022): new scratch system: https://icer.msu.edu/about/announcements/new-scratch-system-gs21-retirement-ls15-gs18
We are excited to announce the general release of our new gs21 scratch system, now available at /mnt/gs21/scratch on all user systems, including gateways, development nodes, and the compute cluster. The new scratch system provides 3 PB of space for researchers and allows us to continue to maintain 50 TB quotas for our growing community. The new system also includes 200 TB of high-speed flash. You may begin to utilize the new scratch system immediately, and as always, please open a ticket at contact.icer.msu.edu should you have any questions or require any assistance.
On July 1st, /mnt/scratch and the $SCRATCH variable will be updated to point to the new gs21 space. On July 15th, ls15 and gs18 will be marked read-only, and will be removed from service on July 31st. Please update your scripts or programs that explicitly use gs18 and ls15 to use gs21. If you have a large quantity of data (>10 TB) to move from the old systems to the new, please contact us and we can help you use tools to transfer the data quickly.
Due to the high utilization of gs18 and the age of ls15 and gs18, we are encouraging users to begin migrating to gs21 at their earliest convenience.
Nicholas Rahme
HPC Administrator
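If you would rather migrate your own data, a plain rsync from the old scratch to gs21 is usually enough. A minimal sketch, assuming your old files live under /mnt/ls15/scratch/users/$USER (adjust the source path to wherever your data actually is):
# source path assumes the old ls15 scratch layout; adjust it if yours differs
rsync -av /mnt/ls15/scratch/users/$USER/ /mnt/gs21/scratch/users/$USER/
# a second run should transfer nothing new, which confirms the copy is complete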
HPCC users can find their scratch space through either of these two paths:
/mnt/gs21/scratch/users/$USER
$SCRATCH (environment variable that points to the gs21 scratch path above)
"Any file not modified in 45 days may be deleted to keep the system stable and available to all users. The quota of the space for each user is limited to total 50TB and less than 1 Million files "
Users can run the "quota" command to check the usage and limits of both spaces.
For the HPCC/ICER, you can start with:
https://wiki.hpcc.msu.edu/display/TEAC/Introduction+to+HPCC
https://icer.msu.edu/sites/default/files/Introductory%20Supercomputing.pdf
Partition names:
https://wiki.hpcc.msu.edu/pages/viewpage.action?pageId=18972892
How to submit jobs (SLURM scheduler):
https://wiki.hpcc.msu.edu/display/ITH/Job+Management+by+SLURM
https://wiki.hpcc.msu.edu/display/TEAC/List+of+Job+Specifications
The following is a list of basic #SBATCH specifications. To see the complete set of sbatch options, please refer to the SLURM sbatch command page.
https://wiki.hpcc.msu.edu/pages/viewpage.action?pageId=20119995
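For quick reference, here is a minimal sketch of a submit script header using some of the most common specifications (the job name, wall time, resources, and final command are placeholders to adjust for your own job):
#!/bin/bash
#SBATCH --job-name=my_job        # name shown in the queue
#SBATCH --time=04:00:00          # wall time limit (hh:mm:ss)
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks=8               # total number of MPI tasks
#SBATCH --cpus-per-task=1        # threads per task
#SBATCH --mem-per-cpu=2G         # memory per CPU
#SBATCH --output=%x-%j.out       # output file (%x = job name, %j = job ID)

srun hostname                    # replace with your actual program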
The wiki page linked above also mentions useful commands for managing job submissions. For example, the following can be used to cancel all of your pending submissions:
scancel --state="PENDING" --user=YOURUSERNAME
The following code may also be useful, where the number of nodes, CPUs per task, and memory allocation can be updated for a single job submission:
scontrol update JobId=JOB_NUMBER NumNodes=2-2 CPUsPerTask=8 MinMemoryCPU=5G
#SBATCH -A mendoza_q
#SBATCH --exclude=amr-163,amr-178,amr-179,acm-028,acm-034
This should submit to our queue, exclude our buy-in nodes, and let the jobs run in the general partition without incurring CPU hours.
#SBATCH -w acm-028
This will submit specifically to node acm-028.
https://docs.icer.msu.edu/job_policies/
Users with general accounts can use the powertools command SLURMUsage to check their used CPU and GPU time (in minutes) and remaining CPU and GPU time (in hours):
$ ml powertools # run this command if powertools not loaded
$ SLURMUsage
You can also check your GPU usage with:
https://docs.icer.msu.edu/Frequently_Asked_Questions_FAQ_/#how-do-i-check-my-cpu-or-gpu-time-usage
https://wiki.hpcc.msu.edu/display/ITH/File+Permissions+on+HPCC
https://wiki.hpcc.msu.edu/display/ITH/Change+Primary+Group
Execute the command below to replace or add lines in many files at once in the current directory. For example, CRYSTAL17 input .d12 files can be modified to add the keyword RESTART by setting newtext to oldtext\nRESTART (i.e., the matched text followed by RESTART on a new line).
sed -i -r 's/oldtext/newtext/g' *.fileextension
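As a concrete (hypothetical) illustration, to insert RESTART on a new line after every OPTGEOM keyword in all .d12 files of the current directory:
# OPTGEOM is only an example keyword; -i.bak keeps a backup of each original file, and \n in the replacement starts a new line (GNU sed)
sed -i.bak 's/OPTGEOM/OPTGEOM\nRESTART/g' *.d12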
Execute the command below.
chgrp -R mendozacortes_group ~
where ~ means your entire home directory (or a specific path if desired); this will give you and your group access. You and the other group members would further need to run
newgrp mendozacortes_group
to ensure that new files are created with that group.
Additionally, each user can request that their default group be changed to mendozacortes_group, but every user would need to do that individually. We will be requesting this change for all our users, but it might take some time.
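A quick sketch to check your groups and (re)apply the group to an existing directory; the path below is only an example, so use whichever directory you actually want to share:
id -gn        # your current primary group
groups        # all groups you belong to
# example path below; replace it with the directory you want to share
chgrp -R mendozacortes_group /mnt/research/mendozacortes_group/$USER
chmod -R g+rX /mnt/research/mendozacortes_group/$USER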
The main resources for successfully allocating resources in the HPCC are CPUs and memory. Users are often more aware of CPU usage than memory usage. However, both are equally important when requesting/allocating resources. Requesting too much memory can result in jobs pending for a long time, and HPCC resources not being used efficiently.
Memory usage can be modified in your sbatch script through the line:
#SBATCH --mem-per-cpu=200M
Up to 5GB can be allocated per CPU, depending on the nodes. However, this amount of memory is generally only required for very resource-intensive calculations.
To check how much memory a job used, you can run the PowerTools command js -j <jobID> after the job has completed. Then compare the lines "MaxRSS" (maximum memory footprint) and "ReqMem" (memory the job requested) to figure out the usage efficiency. This is a good way to gauge how much memory you should allocate for future jobs.
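If you prefer to query the accounting database directly, a minimal sketch using sacct reports the requested versus used memory for a finished job:
# replace JOB_NUMBER with the completed job's ID; compare MaxRSS (used) against ReqMem (requested)
sacct -j JOB_NUMBER --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State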
To submit a job
sbatch job.sh
To cancel a job in the queue
scancel jobID
To cancel many jobs in a range of job IDs, where STATUS is PENDING or RUNNING:
scancel -u $USER -t "STATUS" {jobid1..jobid2}
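Note that the brace expansion only works with literal numbers; for example, with hypothetical job IDs 1000000 through 1000050:
# 1000000..1000050 are hypothetical job IDs; the shell expands the braces into the full list
scancel -u $USER -t PENDING {1000000..1000050}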
To list all of your job submissions since a specified date, showing the job ID, job name, and working directory:
sacct --start=2020-09-01 --format="jobid,jobname%20,workdir%70"
To see more in-depth details of a job, use the command below. MaxRSS reports the maximum amount of RAM the job actually used, which can help you decide how much RAM to request for similar jobs. Additionally, ExitCode reports the exact reason a job may have failed or been canceled.
js -j jobid
If you want to submit to general hardware but not incur CPU hours (as happens when jobs don't go through the buy-in account), you can try submitting with the following options in your submit scripts:
#SBATCH -A mendoza_q
#SBATCH --exclude=amr-163,amr-178,amr-179
This should submit to our queue, exclude our buy-in nodes, and let the jobs run in the general partition without incurring CPU hours.
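Putting this together, here is a minimal sketch of a full submit script using the buy-in account while excluding the buy-in nodes (the resource requests and the final command are placeholders):
#!/bin/bash
#SBATCH --job-name=buyin_test
#SBATCH -A mendoza_q                        # submit through the buy-in account
#SBATCH --exclude=amr-163,amr-178,amr-179   # skip the buy-in nodes so the job runs on general hardware
#SBATCH --time=04:00:00
#SBATCH --ntasks=8
#SBATCH --mem-per-cpu=2G

srun hostname                               # replace with your actual program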
With few exceptions, each researcher using the HPCC is limited to running up to 520 jobs or 1040 cores at one time. Annually, non-buyin users are limited to a total of 500,000 CPU hours and 10,000 GPU hours. These limits do not apply to jobs submitted to the scavenger queue.
Similar to jobs submitted to the general-long queue, these jobs can request up to a 7-day wall time; however, jobs in the scavenger queue may be interrupted if resources are required for other non-scavenger jobs. The default behavior for interrupted jobs is to be re-queued, but users can opt for cancellation if that suits their workflow better. We recommend this queue only for users whose workflows can checkpoint and restart, or otherwise handle jobs being canceled or requeued.
About the 500,000-hour limit in the general-long queue:
If you are worried that your allocation might be reaching the 500,000-hour limit, remember that it is renewed every January 1st:
https://wiki.hpcc.msu.edu/plugins/servlet/mobile?contentId=20121162#content/view/20121162
500,000 CPU hours and 10,000 GPU hours (600,000 minutes) every year (from January 1st to December 31st) starting from 2021
If you go over the limit before December 31st, we can submit a form to request extra hours before the end of the year.
Connecting to the supercomputer/cluster for the first time:
https://rcc.fsu.edu/manage/login
Just follow these instructions and, when asked, say that you will work with me. After you sign up, I will receive a notification and approve your account.
Connecting to the supercomputer/cluster from outside the college campus:
If you are outside the FSU campus and want to connect to the cluster, see:
https://rcc.fsu.edu/doc/off-campus-vpn-access
Accessing our Archival Storage (for large files, long term): ~50 TB available.
https://rcc.fsu.edu/doc/globus
We have access to many supercomputing clusters (both local and external), with conventional and hybrid architectures. These include (but are not limited to) the supercomputer at NVIDIA and computing time at other centers. Here are some tips for our supercomputers.
All the information is in: https://rcc.fsu.edu/docs
We name our nodes based on their physical location in the data center, and we've recently moved some around in an effort to upgrade our network layout and overall sustainability.
You can always see what nodes are in your partition by running rcctool my:partitions mendoza_q.
This will list the nodes in the partition along with their CPU/memory specs.
As of September 2020:
>> rcctool my:partitions mendoza_q
+-----------+----------------------+---------------------+----------------------+---------------+-----------+---------------+
| Name | Display | Default Job Time | Max Job Time | Max Jobs/user | CPU Cores | Owning Groups |
+-----------+----------------------+---------------------+----------------------+---------------+-----------+---------------+
| mendoza_q | Mendoza-Cortes queue | 336:00:00 (14 days) | 2160:00:00 (90 days) | 600 | 352 | mendozagroup |
+-----------+----------------------+---------------------+----------------------+---------------+-----------+---------------+
15 Nodes / 360 cores (use sinfo -p mendoza_q to get node status)
The mendoza_q partition has access to 352 cores on these nodes
Note: Node names are subject to change periodically
+--------------+------+-------------------------------------------+-------+--------+----------+
| Name | Year | Platform | Cores | Memory | Features |
+--------------+------+-------------------------------------------+-------+--------+----------+
| hpc-m36-1-4 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | | # 5.3 GB/core
| hpc-m36-2-6 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-2-7 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-2-8 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-2-9 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-4-10 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-5-1 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-5-11 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-5-12 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-5-2 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-5-3 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-5-8 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-6-1 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-6-3 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
| hpc-m36-6-4 | 2018 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | |
+--------------+------+-------------------------------------------+-------+--------+----------+
If you want to know the memory available for the other nodes, you can use:
>> rcctool my:partitions engineering_long
43 Nodes / 1,204 cores (use sinfo -p engineering_long to get node status)
Note: Node names are subject to change periodically
+--------------+------+-------------------------------------------+-------+--------+----------+
| Name | Year | Platform | Cores | Memory | Features |
+--------------+------+-------------------------------------------+-------+--------+----------+
| hpc-m35-2-10 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | | #4.6GB/core
| hpc-m35-2-11 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | | #4.6GB/core
| hpc-m35-2-12 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | | #4.6GB/core
| hpc-m35-2-4 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | | #4.6GB/core
| hpc-m35-2-5 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | | #4.6GB/core
| hpc-m35-2-6 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | | #4.6GB/core
| hpc-m35-2-7 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | | #4.6GB/core
| hpc-m35-2-8 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | | #4.6GB/core
| hpc-m35-2-9 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | | #...
| hpc-m35-3-1 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-3-10 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-3-11 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-3-12 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-3-2 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-3-3 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-3-4 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-3-5 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-3-6 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-3-7 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-3-8 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-3-9 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-1 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-10 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-11 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-12 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-2 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-3 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-4 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-5 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-6 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-7 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-8 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-4-9 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-5-1 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-5-10 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-5-11 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-5-12 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-5-2 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-5-5 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-5-6 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-5-7 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-5-8 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
| hpc-m35-5-9 | 2017 | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 28 | 128GB | |
+--------------+------+-------------------------------------------+-------+--------+----------+
>> rcctool my:partitions genacc_q
208 Nodes / 3,372 cores (use sinfo -p genacc_q to get node status)
Note: Node names are subject to change periodically
+--------------+------+---------------------------------------------+-------+--------+----------+
| Name | Year | Platform | Cores | Memory | Features |
+--------------+------+---------------------------------------------+-------+--------+----------+
| hpc-d36-3-1 | 2019 | AMD Opteron(TM) Processor 6220 | 32 | 128GB | | #4GB/core
| hpc-d36-3-2 | 2019 | AMD Opteron(TM) Processor 6220 | 32 | 128GB | | #4GB/core
| hpc-d36-5-1 | 2019 | AMD Opteron(tm) Processor 4184 | 12 | 96GB | | #8GB/core
| hpc-d36-5-2 | 2019 | AMD Opteron(tm) Processor 4184 | 12 | 96GB | | #8GB/core
| hpc-d36-5-3 | 2019 | AMD Opteron(tm) Processor 4184 | 12 | 96GB | | #8GB/core
| hpc-d36-5-4 | 2019 | AMD Opteron(tm) Processor 4184 | 12 | 96GB | | #8GB/core
| hpc-d36-7-1 | 2019 | AMD Opteron(tm) Processor 4184 | 12 | 96GB | | #8GB/core
| hpc-d36-7-2 | 2019 | AMD Opteron(tm) Processor 4184 | 12 | 96GB | | #8GB/core
| hpc-d36-7-3 | 2019 | AMD Opteron(tm) Processor 4184 | 12 | 96GB | | #8GB/core
| hpc-d36-7-4 | 2019 | AMD Opteron(tm) Processor 4184 | 12 | 96GB | | #8GB/core
| hpc-d36-9-1 | 2019 | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 16 | 64GB | | #4GB/core
| hpc-d36-9-2 | 2019 | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 16 | 64GB | | #4GB/core
| hpc-d36-9-3 | 2019 | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 16 | 64GB | | #4GB/core
| hpc-d36-9-4 | 2019 | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 16 | 64GB | | #4GB/core
| hpc-m32-10-1 | 2019 | Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz | 40 | 191GB | | #4.8GB/core
| hpc-m32-10-2 | 2018 | Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz | 40 | 191GB | | #4.8GB/core
| hpc-i36-1 | 2017 | Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz | 28 | 257GB | | #9.2GB/core
| hpc-m36-2-12 | 2015 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | | #5.3GB/core
| hpc-m36-4-5 | 2015 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 24 | 128GB | | #5.3GB/core
| hpc-i29-10 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #2.7GB/core
| hpc-i29-11 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #2.7GB/core
| hpc-i29-12 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #2.7GB/core
| hpc-i29-13 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #2.7GB/core
| hpc-i29-14 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #2.7GB/core
| hpc-i29-15 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #2.7GB/core
| hpc-i29-5 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #2.7GB/core
| hpc-i29-6 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #2.7GB/core
| hpc-i29-7 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #2.7GB/core
| hpc-i29-8 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #2.7GB/core
| hpc-i29-9 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #2.7GB/core
| hpc-i30-10 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | | #...
| hpc-i30-11 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i30-12 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i30-13 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i30-14 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i30-7 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i30-8 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i30-9 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i31-1 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i31-10 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i31-2 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i31-4 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i31-5 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i31-6 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i31-7 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i31-8 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i31-9 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i32-1 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i32-2 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i32-3 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i32-4 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i32-5 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i32-6 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i32-7 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i32-8 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-i32-9 | 2010 | AMD Opteron(tm) Processor 6174 | 48 | 128GB | |
| hpc-r29-1 | 2008 | Quad-Core AMD Opteron(tm) Processor 2382 | 8 | 32GB | |
Example starting with: https://rcc.fsu.edu/submit-script-generator
This will list the available node features, which include the year of each node:
>> sinfo -o %f
AVAIL_FEATURES
YEAR2012,amd
YEAR2013,intel
YEAR2012,intel
YEAR2019,intel
YEAR2014,intel
YEAR2010,amd
YEAR2017,intel
YEAR2018,amd
YEAR2008,amd
YEAR2015,intel
YEAR2018,intel
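To request nodes from particular years in a job script, pass those features to the -C/--constraint option. Note that multiple years must be joined with | (OR); joining them with & would require a single node to carry every year feature, which no node does. A minimal sketch:
#SBATCH -C YEAR2019                 # only nodes tagged YEAR2019
##SBATCH -C "YEAR2018|YEAR2019"     # or accept either year (remove one # to enable)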
Let's say that you want a node of a certain year for a VASP calculation:
#!/bin/sh
#SBATCH -J pHGF_Li3Stage1
##SBATCH -N 1
#SBATCH -t 336:00:00
#SBATCH -n 16
#SBATCH -p genacc_q
##SBATCH -C "YEAR2019&YEAR2018&YEAR2017&YEAR2015&YEAR2014&YEAR2013&YEAR2012"
##SBATCH --nodelist=hpc-d36-3-1
##SBATCH --output=job.out
export JOB=pHGF_Li3Stage1
### The lines below request the latest version of VASP (vasp.5.4.4). To use version 5.4.1, refer to the script given below.
module purge
#module load gnu
#module load gnu-openmpi
module load vasp
which vasp_std
mpirun -np 16 vasp_std > vasp_new.out
Let's say that you want two specific nodes for a CRYSTAL calculation:
#!/bin/bash
#SBATCH -J graph_Ac
##SBATCH -N 1
#SBATCH -n 16
#SBATCH -p genacc_q
#SBATCH -t 336:00:00
##SBATCH -C "YEAR2019&YEAR2018&YEAR2017&YEAR2015&YEAR2014&YEAR2013&YEAR2012"
echo $SLURM_JOB_NODELIST   # SLURM_JOB_NODELIST is a variable (not a file), so print it with echo
export JOB=graph_Ac
export DIR=$SLURM_SUBMIT_DIR
export PATH=/gpfs/research/mendozagroup/crystal/2017/v1_0_1b/ifort14/bin/Linux-ifort14_XE_emt64/v1.0.1:$PATH
# export PATH=/gpfs/research/mendozagroup/crystal/2017/v1_0_2/ifort14/bin/Linux-ifort14_XE_emt64/v1.0.2:$PATH
export scratch=/gpfs/research/mendozagroup/scratch/crystal/${USER}/crys17
echo "submit directory: "
echo $SLURM_SUBMIT_DIR
module purge
module load intel/16
module load intel-openmpi
rm -fr $scratch/$JOB
mkdir -p $scratch/$JOB
# the following line needs to be modified according to where your input is located
cp $DIR/${JOB}.d12 $scratch/$JOB/INPUT
#cp $DIR/${JOB}.f9 $scratch/$JOB/fort.9
cd $scratch/$JOB
touch hostfile
rm hostfile
for i in `scontrol show hostnames $SLURM_JOB_NODELIST`
do
echo "$i slots=16" >> hostfile
done
# in the following, -np parameters should be equal to those specified above.
#mpirun -np 8 -machinefile hostfile Pcrystal >& $DIR/${JOB}.out
#mpirun -np 4 -machinefile hostfile Pproperties >& $DIR/${JOB}.DOSS.out
#srun -n 8 Pcrystal >& $DIR/${JOB}.out
mpirun -np 16 -machinefile hostfile Pcrystal >& $DIR/${JOB}.out
#mpirun -np 16 -machinefile hostfile MPPcrystal >& $DIR/${JOB}.out
#cp fort.9 ${DIR}/${JOB}.f9
#cp fort.25 ${DIR}/${JOB}.f25
#cp BAND.DAT ${DIR}/${JOB}.BAND
#cp DOSS.DAT ${DIR}/${JOB}.DOSS
# uncomment the next 5 lines if you want to remove the scratch directory after a successful run
#if [ $? -eq 0 ]
#then
#cd ${DIR}
#rm -rf $scratch/${JOB}
#fi
Example starting with: https://rcc.fsu.edu/submit-script-generator
Let's say that you want a specific node for a VASP calculation:
#!/bin/sh
#SBATCH -J pHGF_Li3Stage1
#SBATCH -N 1
#SBATCH -t 336:00:00
#SBATCH -n 16
#SBATCH -p genacc_q
#SBATCH -w hpc-d36-3-1
##SBATCH --nodelist=hpc-d36-3-1
##SBATCH --output=job.out
export JOB=pHGF_Li3Stage1
### The lines below request the latest version of VASP (vasp.5.4.4). To use version 5.4.1, refer to the script given below.
module purge
#module load gnu
#module load gnu-openmpi
module load vasp
which vasp_std
mpirun -np 16 vasp_std > vasp_new.out
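After saving the script (here assumed to be named vasp_node.sh), submit it and confirm which node it landed on:
sbatch vasp_node.sh                # vasp_node.sh is whatever you named the script above
squeue -u $USER -o "%i %j %T %N"   # job ID, name, state, and assigned node list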
Let's say that you want two specific nodes for a CRYSTAL calculation:
#!/bin/bash
#SBATCH -J graph_Ac
##SBATCH -N 1
#SBATCH -n 16
#SBATCH -p genacc_q
#SBATCH -t 48:00:00
#SBATCH -w hpc-d36-3-1,hpc-d36-3-2
##SBATCH --nodelist=hpc-d36-3-1,hpc-d36-3-2
echo $SLURM_JOB_NODELIST   # SLURM_JOB_NODELIST is a variable (not a file), so print it with echo
export JOB=graph_Ac
export DIR=$SLURM_SUBMIT_DIR
export PATH=/gpfs/research/mendozagroup/crystal/2017/v1_0_1b/ifort14/bin/Linux-ifort14_XE_emt64/v1.0.1:$PATH
# export PATH=/gpfs/research/mendozagroup/crystal/2017/v1_0_2/ifort14/bin/Linux-ifort14_XE_emt64/v1.0.2:$PATH
export scratch=/gpfs/research/mendozagroup/scratch/crystal/${USER}/crys17
echo "submit directory: "
echo $SLURM_SUBMIT_DIR
module purge
module load intel/16
module load intel-openmpi
rm -fr $scratch/$JOB
mkdir -p $scratch/$JOB
# the following line needs to be modified according to where your input is located
cp $DIR/${JOB}.d12 $scratch/$JOB/INPUT
#cp $DIR/${JOB}.f9 $scratch/$JOB/fort.9
cd $scratch/$JOB
touch hostfile
rm hostfile
for i in `scontrol show hostnames $SLURM_JOB_NODELIST`
do
echo "$i slots=16" >> hostfile
done
# in the following, -np parameters should be equal to those specified above.
#mpirun -np 8 -machinefile hostfile Pcrystal >& $DIR/${JOB}.out
#mpirun -np 4 -machinefile hostfile Pproperties >& $DIR/${JOB}.DOSS.out
#srun -n 8 Pcrystal >& $DIR/${JOB}.out
mpirun -np 16 -machinefile hostfile Pcrystal >& $DIR/${JOB}.out
#mpirun -np 16 -machinefile hostfile MPPcrystal >& $DIR/${JOB}.out
#cp fort.9 ${DIR}/${JOB}.f9
#cp fort.25 ${DIR}/${JOB}.f25
#cp BAND.DAT ${DIR}/${JOB}.BAND
#cp DOSS.DAT ${DIR}/${JOB}.DOSS
# uncomment the next 5 lines if you want to remove the scratch directory after a successful run
#if [ $? -eq 0 ]
#then
#cd ${DIR}
#rm -rf $scratch/${JOB}
#fi
https://rcc.fsu.edu/doc/storage-quota-management
As part of our ongoing migration to GPFS, we have set up a new scratch volume. We encourage all general-access scratch users to move to this new volume as soon as possible.
We have set up a new general-access scratch space, located at /gpfs/research/scratch/[USERNAME].
Our new Group Scratch is in: /gpfs/research/mendozagroup/scratch/
To check how much is available you can type:
[jmendozacortes@hpc-login-vm3 ~]$ gpfs_quota mendozagroup
Showing quota information for research fileset: mendozagroup
Block Limits | File Limits
Filesystem type blocks quota limit in_doubt grace | files quota limit in_doubt grace Remarks
research FILESET 5.13T 15.52T 15.52T 403.5G none | 1883051 0 0 35241 none DSS01.local
To check how much is available in your personal account, you can type:
[jmendozacortes@hpc-login-vm3 ~]$ gpfs_quota
Our Scratch Space (Size 2.438 TB)
There is a total of 10,652GB available in the volume.
You can see how much space is used and available in that volume by running:
pan_df -h /panfs/storage.local/engineering/mendozagroup/scratch/
As of 2018/01/29 we have 15.5TB of space available.
However, most of the scratch space is being occupied by old jobs from VASP and g09, which we should clean soon.
There is a total of 7.3TB available in the volume.
You can determine how much space is available at any given time by running the following command:
pan_df -h /panfs/storage.local/scratch/
Please note that this scratch space is shared by all HPC users, so we are more aggressive about deleting old files from it. Please remove any I/O data as soon as your job is finished using it.
You can request space on this general scratch space by submitting a ticket to RCC and it will be created under:
/panfs/storage.local/scratch/[YOUR-USERNAME]
Please remember to delete data once you are done with it, and review our scratch policy: https://rcc.fsu.edu/doc/scratch-space
The GPU is now in our partition and you should be able to access it using the --gres=gpu:1 flag for srun/sbatch. See https://rcc.fsu.edu/doc/gpus for more information. Keep in mind that the documentation is for the generic GPU nodes, which have 4 cards instead of 1.
The K40 GPU card is in node hpc-m36-1-4, which is in the mendoza_q partition.
The gpu card was added to a node in your partition, so you and your group will have high priority access to it.
srun -p mendoza_q --gres=gpu:1 --pty bash
Keep in mind that these nodes are also part of the engineering partition, so you might not get immediate access. I believe the walltime for that partition is 6 hours.
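For a batch job instead of an interactive session, the same GPU request goes into the submit script; a minimal sketch (the wall time and the final command are placeholders):
#!/bin/bash
#SBATCH -J gpu_test
#SBATCH -p mendoza_q
#SBATCH --gres=gpu:1   # request the single K40 card in our partition
#SBATCH -n 4
#SBATCH -t 04:00:00

nvidia-smi             # replace with your GPU program; this only confirms the card is visible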
Ask about them if you need more supercomputing power.
Besides putting some files in our Panasas space (/panfs/storage.local/engineering/mendozagroup/scratch/), you can put large files and backup files in our archival storage, where we have another >10 TB. This is also useful for transferring big files.
The easiest and most reliable method to transfer data to and from the archival storage is through Globus. See https://rcc.fsu.edu/doc/globus for more information about this service. It allows you to schedule transfers from our Panasas and Lustre storage to the archival space, or, if you install the Globus software on your laptop/desktop, to transfer from there to one of our storage systems. Use fsurcc#archival as the endpoint. Our volume can be found under /mnt/archival/engineering/mendozacortes/
Let me know if you run into any issues with the storage.
It was set up with group ownership for mendozagroup, so students should also be able to write to it.
Please go to your GPFS directory and change the permissions of all of your directories/files so that members of the group can read and execute them:
e.g., execute the command below.
chmod -R g+rX ~
where ~ means your entire home directory (or a specific path if desired). This recursively adds group read permission, plus execute permission on directories (the capital X), so members of the group can read your files.
Alternatively, you can change the directories to be part of the group:
This will make your files readable to the members of our group:
chgrp -R mendozacortes_group ~
where ~ means your entire home directory (or a specific path if desired).
When working with computational simulations, most programs generate temporary files that can be deleted once papers are published and relevant data is stored. This can be done easily using simple bash commands.
For example, if you want to erase all files with a certain extension .ext in all subdirectories of a folder, you can run the following commands:
First, check which files you will be erasing with:
find . -name "*.ext"
Then erase those files using:
find . -name "*.ext" -exec rm {} \;