Fluid Numerics Cloud

Getting Started

Getting Help

At any time, you can get help by sending an e-mail to support@fluidnumerics.com.

Additionally, you can reach us on Slack. We have integrated a support feature into Slack just for this hackathon! If you need support related to the Slurm cluster, you can reach us using the Slack command `/supportmyfluid`. For example:

/supportmyfluid I need help submitting a job to the cluster.

will submit a ticket listing this description:

I need help submitting a job to the cluster.

and an automated message will tell you a ticket has been generated.

To help us resolve your issue quickly, describe it as clearly as possible. If you are reporting a problem with the cluster, list all of the steps needed to reproduce it.

Logging in

You can access the login node using the username and password that were sent to you before the hackathon. If you did not receive a username and password, reach out to support@fluidnumerics.com to request an account.

Once you have a username and password, connect to the login node with

$ ssh username@gpucloud.fluidnumerics.com

On your first login, you will be required to create a new, secure password that is at least 8 characters long.
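
If you connect often, you may find it convenient to add an entry to the ~/.ssh/config file on your local machine so that a short alias expands to the full hostname. This is an optional sketch; the username jdoe is a placeholder for the account you were issued.

# ~/.ssh/config on your local machine (jdoe is a placeholder username)
Host fluid
    HostName gpucloud.fluidnumerics.com
    User jdoe

With this entry in place, `ssh fluid` is equivalent to the full command above.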

Compiling your code

We have provided the gcc/6.4.0, gcc/8.2.0, and pgi/18.10 compilers on the fluid-cloud cluster. Additionally, we have installed the software dependencies for your application that you listed in the GCP User Registration form.

To see which compilers and packages are available, you can run `module avail`

$ module avail
--------------------------------- /apps/modules/modulefiles ---------------------------------
dot  module-git  module-info  modules  null  use.own  

-------------------------------- /apps/packages/modulefiles ---------------------------------
cuda/10.0.130                                  netcdf/4.6.1/serial/gcc/6.4.0  
gcc/6.4.0                                      netcdf/4.6.1/serial/gcc/8.2.0  
gcc/8.2.0                                      netcdf/4.6.1/serial/pgi/18.10  
hdf5/1.10.3/parallel/openmpi/3.1.2/gcc/6.4.0   openmpi/3.1.2/gcc/6.4.0        
hdf5/1.10.3/parallel/openmpi/3.1.2/gcc/8.2.0   openmpi/3.1.2/gcc/8.2.0        
hdf5/1.10.3/parallel/openmpi/3.1.2/pgi/18.10   openmpi/3.1.2/pgi/18.10        
hdf5/1.10.3/serial/gcc/6.4.0                   pgi/18.10                      
hdf5/1.10.3/serial/gcc/8.2.0                   
hdf5/1.10.3/serial/pgi/18.10                   
netcdf/4.6.1/parallel/openmpi/3.1.2/gcc/6.4.0  
netcdf/4.6.1/parallel/openmpi/3.1.2/gcc/8.2.0  
netcdf/4.6.1/parallel/openmpi/3.1.2/pgi/18.10  

To load a compiler to your path, use `module load`

$ module load gcc/8.2.0
gcc version 8.2.0 loaded.

$ gcc --version
gcc (GCC) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

To use, for example, parallel netcdf with gcc/8.2.0 and openmpi/3.1.2,

$ module load gcc/8.2.0 openmpi/3.1.2/gcc/8.2.0 netcdf/4.6.1/parallel/openmpi/3.1.2/gcc/8.2.0

With the desired compiler and packages loaded to your path, you can now build your application.
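
For example, with the modules above loaded, a minimal compile sketch for a C program that uses parallel netcdf might look like the following. The source file my_app.c is a placeholder, and this assumes the netcdf module places the nc-config utility on your PATH; adapt the flags to your own build system.

$ mpicc my_app.c -o my_app $(nc-config --cflags) $(nc-config --libs)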

Submitting jobs

Job scheduling is handled with SchedMD's Slurm Job Scheduler.

To view which partitions are available, use `sinfo`

$ sinfo
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
octo-v100*       up   infinite     20  idle~ fluid-cloud-compute[00000-00019]
quad-p100        up   infinite     20  idle~ fluid-cloud-compute[01000-01019]
quad-p4          up   infinite     20  idle~ fluid-cloud-compute[02000-02019]
quad-k80         up   infinite     20  idle~ fluid-cloud-compute[03000-03019]
single-k80       up   infinite     40  idle~ fluid-cloud-compute[04000-04039]
single-p100      up   infinite     40  idle~ fluid-cloud-compute[05000-05039]
standard-32      up   infinite     40  idle~ fluid-cloud-compute[06000-06039]
small-utility    up   infinite      5  idle~ fluid-cloud-compute[07000-07004]

You can submit batch jobs (recommended) using the `sbatch` command and a suitable batch submission file. The batch submission file sets information such as the name of the job, how much wall-time is needed, and which partition your job will execute on.

#!/bin/bash
#SBATCH --job-name=my_hpc_app       # Job name
#SBATCH --output=my_hpc_app_%j.log  # Standard output and error log
#SBATCH --ntasks=1                  # Run a single task
#SBATCH --ntasks-per-node=1         # with 1 task per node
#SBATCH --partition=small-utility   # name of the partition you will run on 
#SBATCH --time=00:05:00             # Time limit hrs:min:sec
#SBATCH --exclusive                 # Request exclusive access to a node

echo "HOST : " $(hostname)
date
....

The above example shows the top portion of a batch submission file. On GPU nodes, we recommend using the `--exclusive` flag to ensure you are the only user of the GPUs on a compute node. At the bottom of the file, add the commands necessary to run your application.
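
As a sketch, a complete submission file for a small MPI application might look like the example below. The module selections and the executable ./my_app are placeholders; load whichever modules you used when compiling.

#!/bin/bash
#SBATCH --job-name=my_hpc_app       # Job name
#SBATCH --output=my_hpc_app_%j.log  # Standard output and error log
#SBATCH --ntasks=1                  # Run a single task
#SBATCH --ntasks-per-node=1         # with 1 task per node
#SBATCH --partition=quad-p100       # Name of the partition you will run on
#SBATCH --time=00:05:00             # Time limit hrs:min:sec
#SBATCH --exclusive                 # Request exclusive access to a node

echo "HOST : " $(hostname)
date

# Load the same modules used to build the application (placeholders)
module load gcc/8.2.0 openmpi/3.1.2/gcc/8.2.0

# Launch the application with Slurm (./my_app is a placeholder)
srun ./my_app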

To submit the job, use `sbatch` (here, my_app.slurm is your batch submission file)

$ sbatch my_app.slurm
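
After submission, you can monitor or cancel your job with the standard Slurm utilities, for example (the job ID 12345 is illustrative):

$ squeue -u $USER   # list your queued and running jobs
$ scancel 12345     # cancel a job using the job ID reported by sbatch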

You can also get interactive sessions on the cluster using `srun`. For example,

$ srun --partition=small-utility --exclusive --pty /bin/bash

If you use interactive sessions, we ask that you actively conduct your work while connected and release the node when you are finished.
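
A typical interactive workflow might look like the sketch below; ./my_app is a placeholder for your own executable, and typing exit returns the node to the pool.

$ srun --partition=small-utility --exclusive --pty /bin/bash
$ module load gcc/8.2.0 openmpi/3.1.2/gcc/8.2.0   # load the modules you need
$ ./my_app                                        # run your work
$ exit                                            # release the node when finished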

About the Cluster

The Fluid Numerics Cloud cluster is an elastic High Performance Computing Cluster powered by Google Cloud Platform.

We have arranged a number of partitions so that hackathon attendees can experiment with a variety of GPUs for accelerating their applications (see the example after this list).

  • 64-octo-v100 - highmem-64 (64 CPU + 416 GB RAM) + 8 Nvidia® Tesla® V100 GPUs
  • 32-quad-p100 - highmem-32 (32 CPU + 208 GB RAM) + 4 Nvidia® Tesla® P100 GPUs
  • 16-quad-p4 - highmem-16 (16 CPU + 104 GB RAM) + 4 Nvidia® Tesla® P4 GPUs
  • 16-quad-k80 - highmem-16 (16 CPU + 104 GB RAM) + 4 Nvidia® Tesla® K80 GPUs
  • 8-single-k80 - highmem-8 (8 CPU + 52 GB RAM) + 1 Nvidia® Tesla® K80 GPU
  • 8-single-p100 - highmem-8 (8 CPU + 52 GB RAM) + 1 Nvidia® Tesla® P100 GPU
  • standard-32 - standard-32 (32 CPU + 120 GB RAM)
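
For instance, to check which GPUs are visible on a particular partition, you could run nvidia-smi through srun (this assumes the NVIDIA driver utilities are installed on the GPU compute images):

$ srun --partition=quad-p100 --exclusive nvidia-smi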

Users can access the cluster via ssh and can schedule jobs to run on compute nodes with SchedMD's Slurm job scheduler.

Compute nodes are provisioned on-the-fly and removed when they become idle. This elasticity keeps compute costs low by providing compute resources only when they are needed.

HPC packages are made available through environment modules. Each stack is built with gcc/6.4.0, gcc/8.2.0, and pgi/18.10.