Check Job Performance 

Analyzing compute job performance

Initially, your goal is just to get your process running. Once it runs, it is important to know how well it is running. Is your job using system resources effectively? How do you check which system resources your application is using? Are you making full use of the resources available to you? Ideally, your process will use most of the available CPU and RAM on a compute node or server. There are some basic steps you can take to get an idea of how your job is running.

First, read the documentation (all of the pages for the OIT-RC hardware specifications can be found here). It should identify system requirements, and will often give you important clues as to whether your software can use multiple threads, run in parallel (e.g. with MPI), and so on. If you're writing your own application, you can also use the following steps to track how well your software is using system resources.

Determining your Hardware Usage

Not sure about some of the terminology used in this FAQ? You can find answers in the Glossary and commonly used terms.

Observing jobs in progress

If you want to know exactly how much RAM and CPU your job is using, there are several ways to do so.
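One way to look at a running job is sketched below. It assumes the standard SLURM client tools (squeue and sstat) are available on the login node; the function names my_jobs, job_live_usage and to_mib are hypothetical helpers written for this example, not part of SLURM.

```shell
#!/bin/bash
# Sketch only: assumes the standard SLURM client tools (squeue, sstat) are on PATH.

# List your own jobs: job ID, name, state, elapsed time, and node list or pending reason.
my_jobs() {
    squeue -u "$USER" --format="%.10i %.20j %.10T %.10M %R"
}

# Per-step CPU time and memory for a job that is still running.
# $1 = job ID as shown by squeue.
job_live_usage() {
    sstat --allsteps -j "$1" --format=JobID,NTasks,AveCPU,AveRSS,MaxRSS
}

# Hypothetical helper: convert a value such as 2097152K, 512M or 1.5G to MiB
# so it can be compared with the memory you requested.
to_mib() {
    awk -v v="$1" 'BEGIN {
        unit = substr(v, length(v), 1)   # trailing unit letter
        num = v + 0                      # awk keeps the leading numeric part
        if (unit == "K") num /= 1024
        else if (unit == "G") num *= 1024
        else if (unit != "M") exit 1     # unknown or missing unit
        printf "%.1f\n", num
    }'
}
```

After sourcing the file, you might run my_jobs to find the job ID, then job_live_usage with that ID, and pass a reported MaxRSS value to to_mib. If MaxRSS stays far below the memory you requested, the request can probably be reduced.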

Adapting your SBATCH Script

Here is a collection of possible limitations and how to specify each one in an SBATCH directive. For more on these, refer to SLURM Parallelism.
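As an illustration, a batch script that sets the most common limits might look like the sketch below. All values are placeholders rather than recommendations for OIT-RC hardware, and the job name and application name are hypothetical.

```shell
#!/bin/bash
#SBATCH --job-name=perf-check     # hypothetical job name
#SBATCH --nodes=1                 # number of nodes
#SBATCH --ntasks=1                # number of tasks (e.g. MPI ranks)
#SBATCH --cpus-per-task=4         # cores per task, for multithreaded code
#SBATCH --mem=8G                  # memory per node
#SBATCH --time=01:00:00           # wall-clock limit (HH:MM:SS)

# Match the thread count to the cores SLURM allocated; fall back to 1
# when the script is run outside of a job.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${OMP_NUM_THREADS} thread(s)"

# Launch your application here; ./my_app is a placeholder name.
# srun ./my_app
```

Setting OMP_NUM_THREADS only helps applications that use OpenMP; other threading libraries usually have their own setting, so check your software's documentation.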

Scaling Up

As you scale your job across more nodes, it's a good idea to verify system usage at each step.
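One simple check at each step is to compare wall-clock times between a baseline run and a larger run. The sketch below assumes sacct is available for looking up finished jobs; job_elapsed and scaling_report are hypothetical helper names, and the timings in the usage note are made-up numbers.

```shell
#!/bin/bash
# Sketch only: job_elapsed assumes SLURM's sacct is on PATH.

# Print wall-clock seconds and node count for a finished job, e.g. "300|4".
# $1 = job ID.
job_elapsed() {
    sacct -X -n -P -j "$1" --format=ElapsedRaw,NNodes
}

# scaling_report BASE_SECONDS BASE_NODES NEW_SECONDS NEW_NODES
# Prints the speedup of the larger run and its parallel efficiency.
scaling_report() {
    awk -v t1="$1" -v n1="$2" -v t2="$3" -v n2="$4" 'BEGIN {
        speedup = t1 / t2          # how much faster the larger run finished
        ideal   = n2 / n1          # speedup if scaling were perfect
        printf "speedup=%.2f efficiency=%.0f%%\n", speedup, 100 * speedup / ideal
    }'
}
```

For example, scaling_report 1000 1 300 4 prints speedup=3.33 efficiency=83%. When efficiency drops sharply from one step to the next, adding more nodes is unlikely to pay off.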


How to verify that all of the selected cores are being used