The astro cluster consists of a login node, a set of compute nodes, and two NFS file servers.
astro.rcf.bnl.gov: This is the login node. Always log into astro. This machine is the only machine from which you can submit wq jobs.
astro has 32 GB of RAM and 8 cores, so it should support a number of concurrent users and jobs.
Interactive jobs can be run directly on astro, or, if you require a lot of resources, on a compute node via "wq -c bash" or "wq -c tcsh", possibly with -r "mode: bynode" if you need a lot of memory. Add -r "X:1" to get X window forwarding.
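For example, combining the options described above (the exact flags may differ between wq versions, so check the wq documentation if these do not work as shown):

    # interactive bash shell on a compute node, requesting a whole node to yourself
    wq -r "mode: bynode" -c bash

    # interactive shell with X window forwarding enabled
    wq -r "X:1" -c bash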
The home areas are hosted on a separate system and there is a quota of 10 GB per user.
The home areas are backed up regularly.
astroXXXX.rcf.bnl.gov: These are the compute nodes. wq jobs are farmed out to these nodes. Please don't run jobs on these machines unless you have submitted them through the queue; doing so will impact queued jobs. An exception is logging into a node to run "top" or some other monitoring tool. In other words, use discretion.
Each machine has a globally writable scratch area under /data, but do not store data there permanently. DO clean up after your job finishes, regardless of whether it succeeded.
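A reasonable pattern, sketched below with illustrative paths and file names, is to stage work in a per-user scratch directory under /data, copy the results you want to keep back to NFS (or HDFS), and then remove the scratch directory however the job ended:

    # inside your job script; the /data layout and file names are examples only
    SCRATCH=/data/$USER/job_$$
    mkdir -p "$SCRATCH"
    cd "$SCRATCH"
    # ... run the actual work here, writing temporary files to $SCRATCH ...
    # copy final output somewhere permanent
    cp results.dat /astro/astronfs01/workarea/$USER/
    # clean up the scratch area regardless of how the job ended
    cd /
    rm -rf "$SCRATCH"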
nfs servers:
Make yourself a directory under /astro/astronfs01/workarea/ and put data there, not in your home area (see the example below).
The NFS servers are not backed up.
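For example, to create a workarea named after your login account (naming it after your account is a convention, not a requirement):

    mkdir -p /astro/astronfs01/workarea/$USER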
HDFS is the Hadoop Distributed File System. It is large: all the disk on the compute nodes is available transparently as one big file system, and it is high performance and reliable. If you are overloading the NFS servers, or are just interested in trying it, ask us for an "account" in Hadoop.
Data is stored redundantly and is very safe, so store data you want to keep long term in your home area or in HDFS.
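The standard hadoop command line can be used to move data in and out. The sketch below assumes your HDFS home directory is /user/$USER, which is the Hadoop default but may be set up differently here:

    # copy a file into HDFS
    hadoop fs -put bigfile.dat /user/$USER/

    # list it and copy it back out later
    hadoop fs -ls /user/$USER/
    hadoop fs -get /user/$USER/bigfile.dat .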
The home areas are backed up fairly regularly, but you should never rely on the backups for your code. Put your code in a revision control system like git, hg, or svn and regularly push changes to an external repository; some free hosts are github.com and Google Code.
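For example, with git (the remote URL is a placeholder; substitute your own repository):

    cd mycode
    git init
    git add .
    git commit -m "initial import"
    git remote add origin git@github.com:yourname/mycode.git
    git push -u origin master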
If you do have data that is important to you or will take a long time to regenerate, consider putting it into the Hadoop distributed file system.