Compute Canada

General Information:

Batch Orientation: Compute Canada clusters are batch systems: you submit your work as jobs to a scheduler rather than running it interactively. This allows for efficient use of resources and helps to prevent unnecessary delays.

Job Queuing: When you submit a job, it is added to a queue and waits until the resources it requested become available. The wait time depends on the number of jobs already queued and the resources available at that time.

Priority and Resource Allocation: The priority of your job depends on your allocation and past usage. Request only the resources your job needs so that other users have access to resources as well. Because priority takes past usage into account, using resources efficiently can help you receive faster access to resources in the future.

Connecting to Compute Canada:

Connecting from Your Terminal: To access Compute Canada, run the following command in your terminal, replacing username with your own user name:

ssh -Y username@cedar.computecanada.ca

The -Y option enables trusted X11 forwarding, which is needed only for graphical applications. Note that all commands in the terminal are case sensitive.

Creating Your Own Project Directory: After logging in, create your own project directory where you can store and access your work. This will ensure that your files are organized and easily accessible.

First, change to your own directory inside your group's project directory.

cd projects/<groupname>/<username>

Then make your project directory.

mkdir <projectname>
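As a sketch of the two steps above, the hypothetical helper below (make_project_dir is an illustrative name, not a Compute Canada command) creates the project directory only if the group directory already exists, so a mistyped group name is reported instead of silently creating a new directory tree.

```shell
# Create <project> inside an existing group directory and print its path.
# make_project_dir is a hypothetical helper written for this guide.
make_project_dir() {
    group_dir=$1
    project=$2
    if [ ! -d "$group_dir" ]; then
        echo "Group directory not found: $group_dir" >&2
        return 1
    fi
    # -p makes the call safe to repeat if the project directory already exists.
    mkdir -p "$group_dir/$project" && printf '%s\n' "$group_dir/$project"
}

# Example call, using the placeholders from the text:
# make_project_dir "$HOME/projects/<groupname>/<username>" "<projectname>"
```

The function prints the new path, so it can be combined with cd, for example cd "$(make_project_dir ...)".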

Transferring Files:

Run the following command on your local machine, not on the cluster, to copy a file into your project directory. The path after the colon is relative to your home directory on the cluster.

scp ~/Desktop/example.txt username@cedar.computecanada.ca:projects/<groupname>/<username>/<projectname>
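scp works in both directions and can copy whole directories with -r. The sketch below builds the user@host:path target and prints two example commands rather than running them, so they can be checked first. The names ./data and results.txt are hypothetical, and the angle-bracket placeholders must be replaced with your own values. In the download command, the final dot refers to the current local directory.

```shell
# Placeholder values; replace them with your own.
REMOTE_USER="username"
REMOTE_HOST="cedar.computecanada.ca"
REMOTE_DIR="projects/<groupname>/<username>/<projectname>"

# Build the "user@host:path" target that scp expects.
remote_target() {
    printf '%s@%s:%s\n' "$REMOTE_USER" "$REMOTE_HOST" "$1"
}

# Upload a whole directory (hypothetical ./data) to the project directory.
echo scp -r ./data "$(remote_target "$REMOTE_DIR")"

# Download a file (hypothetical results.txt) into the current local directory.
echo scp "$(remote_target "$REMOTE_DIR/results.txt")" .
```

Remove the leading echo from a line to actually run that transfer.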

Environment:

A virtual environment is a self-contained environment in which you can install and run specific versions of software packages and dependencies for your project, independent of the system-wide software installed on the cluster. This is useful when you require versions of packages that are not compatible with the system-wide software. Virtual environments also help avoid conflicts between different versions of the same package, making it easier to manage dependencies and keep projects consistent.

For instance, if you plan to work with Python, you can check the available Python versions using the following command:

module avail python 

Once you have identified the version you need, load it, for example:

module load python/3.7

Here are the steps to create and activate a virtual environment.

Install virtualenv:

pip install virtualenv

Create the environment, where <env_name> is the name of your environment:

virtualenv <env_name>

Activate the environment:

source <env_name>/bin/activate

This switches your shell to the virtual environment you created. The name of the environment will appear in your command prompt.

Install the packages you need:

pip install <package_name>

Any packages you install will only be available within the virtual environment you created.

By following these steps, you can create and activate a virtual environment on Compute Canada and install any package you need for your project.
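Besides looking at the prompt, you can check whether an environment is active from the shell itself. The sketch below relies on the VIRTUAL_ENV variable, which the activate script sets. The helper name in_venv is hypothetical.

```shell
# Succeed if a virtual environment is active in the current shell.
# The activate script exports VIRTUAL_ENV; deactivate unsets it.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "Active environment: $VIRTUAL_ENV"
else
    echo "No virtual environment is active"
fi
```

A check like this can be placed before pip install so that packages are not installed outside the environment by mistake.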


CUDA:


CUDA is a programming model and parallel computing platform developed by NVIDIA for general-purpose computing on GPUs. Compute Canada provides access to clusters with CUDA-enabled GPUs, enabling users to run GPU-accelerated applications.

CUDA-enabled GPUs offer high-performance computing capabilities for applications such as molecular dynamics, computational fluid dynamics, and machine learning. Running GPU-accelerated code on these clusters can shorten computation times and let researchers tackle larger and more complex problems.

To use CUDA on Compute Canada clusters, users must load the necessary CUDA modules and compile their code using the NVIDIA CUDA compiler (nvcc). Users can also use CUDA-aware libraries that have been optimized for GPU computing to further accelerate their applications.
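To illustrate the compile step, here is a minimal sketch that writes a small CUDA program to a scratch directory and compiles it with nvcc if the compiler is on the PATH. The file name hello_gpu.cu is hypothetical and is unrelated to the hello.sh script used in the sample job.

```shell
# Work in a scratch directory so nothing in the project is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write a minimal CUDA program (hypothetical file name).
cat > hello_gpu.cu <<'EOF'
#include <cstdio>

// Each GPU thread prints its own index.
__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();          // launch 1 block of 4 threads
    cudaDeviceSynchronize();    // wait for the kernel to finish
    return 0;
}
EOF

# Compile only if nvcc is available, that is, after a CUDA module is loaded.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -o hello_gpu hello_gpu.cu
    echo "Built $workdir/hello_gpu; run it inside a GPU job"
else
    echo "nvcc not found; load a CUDA module first"
fi
```

The resulting binary needs a GPU to run, so it should be executed inside a job that requests one.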

Sample:

Create a file named running_job.sh. 

This is a bash script with Slurm directives that specifies the settings for a job to be run on a Compute Canada cluster.

#!/bin/bash

This line specifies the interpreter that will be used to execute the script, in this case, Bash.

#SBATCH --job-name=summ

This line specifies a name for the job.

#SBATCH --nodes=1

This line specifies the number of nodes to be used for the job. In this case, only one node will be used.


#SBATCH --gpus-per-node=v100l:1

This line specifies the type and number of GPUs per node to be used for the job. In this case, one GPU of type "v100l" (a V100 model) will be used.


#SBATCH --ntasks=1

This line specifies the number of tasks to be executed in parallel.

#SBATCH --mem=32G

This line specifies the amount of memory required per node. In this case, 32 gigabytes.

#SBATCH --cpus-per-task=20

This line specifies the number of CPUs per task.

#SBATCH --time=00:00:10

This line specifies the maximum time allowed for the job to run, in hours:minutes:seconds format. In this case, the limit is 10 seconds.
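As an aside (this is not a line of running_job.sh), a run-time estimate in seconds can be converted to the hours:minutes:seconds form with a small shell function. The name to_slurm_time is hypothetical.

```shell
# Convert a number of seconds into the hours:minutes:seconds form used by --time.
to_slurm_time() {
    total=$1
    printf '%02d:%02d:%02d\n' $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

to_slurm_time 10      # prints 00:00:10
to_slurm_time 5400    # prints 01:30:00
```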

#SBATCH --account=<group_name>

This line specifies the account to which the job belongs.

#SBATCH --output=<log_name>.log

This line specifies the name of the file to which the standard output will be written.

#SBATCH --error=<error_name>.err

This line specifies the name of the file to which the standard error output will be written.

#SBATCH --mail-user=<email_address>

This line specifies the email address to which notifications will be sent.

The following three lines specify the types of email notification to be sent, in this case, at the beginning, end, or failure of the job.


#SBATCH --mail-type=BEGIN

#SBATCH --mail-type=END

#SBATCH --mail-type=FAIL

This line loads a standard environment module.

module load StdEnv/2020

This line loads a GCC compiler module.

module load gcc/9.3.0

This line loads a CUDA module.

module load cuda/11.1.1

This line prints the version of the NVIDIA CUDA compiler.

nvcc -V

This line activates your virtual environment by sourcing its activate script.

source <env_dir>/bin/activate

This line runs an executable script named "hello.sh".

./hello.sh
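Put together, the lines described above form the script below. As a sketch, it is written to running_job.sh with a here-document, which is an alternative to pasting the lines into an editor. Values in angle brackets are the same placeholders as above and must be replaced before the job is submitted.

```shell
# Write the job script described above to running_job.sh.
# The quoted 'EOF' keeps the contents literal (no variable expansion).
cat > running_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=summ
#SBATCH --nodes=1
#SBATCH --gpus-per-node=v100l:1
#SBATCH --ntasks=1
#SBATCH --mem=32G
#SBATCH --cpus-per-task=20
#SBATCH --time=00:00:10
#SBATCH --account=<group_name>
#SBATCH --output=<log_name>.log
#SBATCH --error=<error_name>.err
#SBATCH --mail-user=<email_address>
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL

module load StdEnv/2020
module load gcc/9.3.0
module load cuda/11.1.1
nvcc -V
source <env_dir>/bin/activate
./hello.sh
EOF

# Quick sanity check: count the Slurm directives that were written.
grep -c '^#SBATCH' running_job.sh    # prints 14
```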

Now copy each line into running_job.sh, then save and close the file. You can then submit your job with this command:

sbatch running_job.sh

You can check the status of your job with this command:

sq

You can cancel your job with this command:

scancel <JOBID>

Note that you can get the JOBID from the output of sq.
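The job ID can also be captured at submission time. The sketch below assumes the --parsable option of sbatch, which prints only the job ID, optionally followed by a semicolon and the cluster name. The helper name parse_job_id is hypothetical.

```shell
# Extract the job ID from `sbatch --parsable` output,
# which has the form "<jobid>" or "<jobid>;<cluster>".
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
    job_id=$(parse_job_id "$(sbatch --parsable running_job.sh)")
    echo "Submitted job $job_id"
    # Later, cancel it with: scancel "$job_id"
else
    echo "sbatch not found; run this on a cluster login node"
fi
```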

To check the log or error file, you can use the following command:

tail -f <file_name>
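tail -f keeps following the file until you press Ctrl+C. For a one-off look at the end of a file that may not exist yet, something like the hypothetical show_log helper below can be used. It is a sketch, and the default of 20 lines is an arbitrary choice.

```shell
# Print the last N lines of a log file (default 20), or a short message
# if the file has not been created yet.
show_log() {
    file=$1
    lines=${2:-20}
    if [ -f "$file" ]; then
        tail -n "$lines" "$file"
    else
        echo "$file does not exist yet; the job may still be queued"
    fi
}

# Example call with the placeholder log name from the job script:
show_log "<log_name>.log" 20
```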