RUNNING JOBS

Overview

Cluster353 is designed for Massively Parallel Processing (MPP): the parallel execution of a program as multiple tasks and/or threads on multiple processors. Processors on Cluster353 are grouped into "nodes", each with 4 GB of memory and 4 processor cores. Tasks typically exchange data with tasks outside their own memory space through a message-passing API such as the Message Passing Interface (MPI). Other parallel execution and communication models, such as OpenMP, are also possible. Cluster353 has no resource manager or batch scheduler, so users must make sure themselves that the nodes reserved for their job provide enough resources to run it.
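Because no scheduler checks resources for you, it can help to work out the footprint of a job before starting it. The following shell sketch uses the node size given above (4 cores and 4 GB per node). The function names and the one-task-per-core assumption are illustrative and not part of the Cluster353 tooling.

```shell
#!/bin/sh
# Sketch: estimate the footprint of a job on Cluster353-sized nodes.
# Assumption: one MPI task per core; node size is taken from the text above.
CORES_PER_NODE=4
MEM_PER_NODE_MB=4096   # 4 GB, treating 1 GB as 1024 MB

# nodes_needed <tasks>: smallest number of nodes that offers <tasks> cores
nodes_needed() {
    echo $(( ($1 + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

# mem_per_task_mb <tasks_per_node>: memory share per task on one node
mem_per_task_mb() {
    echo $(( MEM_PER_NODE_MB / $1 ))
}

nodes_needed 10      # prints 3
mem_per_task_mb 4    # prints 1024
```

The memory figure is an upper bound, since the operating system on each node also needs part of the 4 GB.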

MPI Job Execution

Running MPI jobs:

user@headnode:~$ mpirun -n <number of workers> --hostfile ./<hostfile> ./<binary-to-execute>

Here, <hostfile> is a file that lists the nodes to use for your job. A sample hostfile for Cluster353 can be found here.
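As an illustration of what such a file can look like, the sketch below writes a hostfile and derives a worker count from it. It assumes Open MPI's hostfile syntax (one "hostname slots=N" entry per line), which this page does not confirm. The node names node01 to node04 and the file name my_hostfile are hypothetical, and the sample hostfile mentioned above remains the authoritative reference.

```shell
#!/bin/sh
# Sketch: write a hostfile for four nodes and count the available slots.
# Assumptions: Open MPI hostfile syntax; node01..node04 are hypothetical names.
HOSTFILE=./my_hostfile

: > "$HOSTFILE"                       # start with an empty file
for i in 1 2 3 4; do
    # 4 slots per node, matching the 4 cores per node
    printf 'node%02d slots=4\n' "$i" >> "$HOSTFILE"
done

# total_slots <hostfile>: sum the slots=N values over all non-comment lines
total_slots() {
    awk '!/^[[:space:]]*#/ {
             for (f = 2; f <= NF; f++)
                 if ($f ~ /^slots=/) { sub(/^slots=/, "", $f); n += $f }
         }
         END { print n + 0 }' "$1"
}

WORKERS=$(total_slots "$HOSTFILE")
# Print the resulting command instead of running it
echo "mpirun -n $WORKERS --hostfile $HOSTFILE ./<binary-to-execute>"
```

Requesting as many workers as there are slots keeps one task per core. Asking for more would oversubscribe the nodes.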

Monitoring Jobs

To check the load of the current node:

user@headnode:~$ top

To check the load of a specific node:

user@headnode:~$ ssh -t <node-hostname> top
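To repeat this check for every node in a job's hostfile, a small loop can help. The sketch below assumes that the node name is the first field on each hostfile line and that password-less SSH to the nodes is set up. It uses uptime rather than top because uptime prints the load averages once and exits. The function names are illustrative.

```shell
#!/bin/sh
# hostfile_nodes <hostfile>: print the node name (first field) of each entry,
# skipping blank lines and lines starting with '#'
hostfile_nodes() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# cluster_load <hostfile>: print the load averages of every listed node
cluster_load() {
    for node in $(hostfile_nodes "$1"); do
        printf '%s: ' "$node"
        # BatchMode avoids password prompts; ConnectTimeout skips dead nodes
        ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" uptime \
            || echo "unreachable"
    done
}

# Usage (hypothetical file name): cluster_load ./my_hostfile
```

On a 4-core node, a 1-minute load average near 4 suggests that all cores are already busy.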

To see the load of the whole cluster on the command line, use the tool clustertop:

user@headnode:~$ clustertop

To see the load of the whole cluster visually, go to http://cluster353.ddns.net/ganglia and enter your user name and the password given to you upon signing up.