The CBA Computation Cluster

Cluster Details: The cluster currently has 25 Sun Fire x2200 blades, each with eight physical cores and eight gigabytes of RAM. One blade is configured as the head node, which leaves 24 compute nodes with a total of 192 physical cores and 192 gigabytes of RAM available for computation.

The cluster runs a recent release of the Rocks operating system, which is built on and mostly compatible with Red Hat. The cluster is entirely on the 192.168.1.0/24 subnet, and the Rocks DHCP server assigns each node its IP address based on the node's MAC address. The head node, "cluster.cba.mit.edu" (public IP 18.85.8.70, private IP 192.168.1.3), hosts this DHCP server as well as the job scheduler.

Getting Started: After you log in, create a directory for yourself inside the home directory using

cd ~/
mkdir yourname

This is your personal workspace. Be careful not to modify environment variables or anything outside of this directory; this account has sudo privileges!

Scheduling Jobs: The cluster uses the TORQUE job scheduler, which distributes work among the available cores.

Some helpful cluster commands:

qsub -t [$num] [$jobscript]   Submits $num processes of $jobscript, one per core
qstat                         Lists currently running/queued jobs
qdel [$jobnumber | all]       Deletes the job with ID $jobnumber, or all of your jobs if "all" is given
xpbs&                         A GUI for monitoring batch jobs submitted through TORQUE
xpbsmon&                      A GUI for real-time resource monitoring of the entire cluster
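
As an illustration of what a $jobscript for qsub -t might look like, here is a minimal sketch. It assumes the standard TORQUE behavior of setting PBS_ARRAYID to the index of each array task and PBS_O_WORKDIR to the directory the job was submitted from. The job name, the run_task function, and the input_N.dat file naming are hypothetical placeholders, not anything that exists on the cluster.

```shell
#!/bin/bash
#PBS -N example_array
#PBS -l nodes=1:ppn=1
#PBS -j oe

# TORQUE sets PBS_ARRAYID for each task of an array job (qsub -t).
# Fall back to 0 so the script can also be tried outside the scheduler.
task_id="${PBS_ARRAYID:-0}"

# Start in the directory the job was submitted from, if TORQUE provides it.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Hypothetical unit of work: each task handles its own input file.
# Replace the echo with the real command for your computation.
run_task() {
    echo "task $1: processing input_$1.dat"
}

run_task "$task_id"
```

Saved under a name of your choosing, such a script would be passed to qsub -t as the $jobscript argument. Each task then sees a different PBS_ARRAYID, which is what lets one script divide work across cores.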

Useful Software:

If you wish to install new software, untar it to the ~/programs directory, and add a short description of it to the README file.
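
The install step above could be wrapped in a small helper along the following lines. This is only a sketch: the function name install_program and its argument order are made up for illustration, it assumes a gzip-compressed tarball, and it assumes the README file lives inside the programs directory, which the text above does not state explicitly.

```shell
#!/bin/bash
# Sketch: unpack a tarball into a programs directory and note it in a README.
# install_program, its arguments, and the README location are assumptions.
install_program() {
    tarball="$1"                  # path to a .tar.gz archive
    description="$2"              # one-line description for the README
    dest="${3:-$HOME/programs}"   # target directory, defaults to ~/programs

    mkdir -p "$dest" || return 1
    # -x extract, -z gunzip, -f archive file, -C unpack inside $dest
    tar -xzf "$tarball" -C "$dest" || return 1
    # Append, so existing README entries are preserved
    printf '%s: %s\n' "$(basename "$tarball")" "$description" >> "$dest/README"
}
```

For example, install_program mytool.tar.gz "short description of mytool" would unpack the (hypothetical) archive into ~/programs and append one line to ~/programs/README.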