1. BMC Beowulf Supercomputer: What's New

1.1. Current Status:

Number of Nodes online: 47

Nodes offline: 35

Last checked date: Nov 7, 2006

1.2. Schedule

User usage requests

   User             Start Date/Time   Stop Date/Time   Nodes needed (by number, or "all")   Notes
   Doug Blank
   Tom Carroll      9/29              10/06            all                                  I can use fewer nodes, if needed. Please let me know. - tcarroll
   Helen Grundman
   Mike Noel

1.3. Detailed Status

1.3.1. New dispatch command

The new dispatch replaces the old dispatch and pdispatch commands. It works as follows.

   dispatch [[flags]* [includes]* [excludes]*]* [command]+

The flags:

   -p run in parallel
   -d debug mode. don't actually do anything, just print out what it would do
   -r report format (output all on one line per node)
   -c add color to output
   -u allow duplicate node numbers

The [includes] and [excludes] are either a single node number, a range, or a list:

   singles: dispatch 2 4 6 7 uptime
   range  : dispatch 1-48 uptime
   list   : dispatch 2,3,4 uptime

Includes are given with a positive prefix or no prefix at all; excludes are given with a negative prefix. The set of nodes is built up in left-to-right order.

   dispatch 1-48 -10-20 15 uptime

This example will add all of the nodes, remove 10-20, and then add back in 15.
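
The left-to-right rule can be sketched in a few lines of Python. This is only an illustration of the set-building logic, not the actual dispatch source. The function names expand and build_node_set are hypothetical, the sketch treats the selection as a plain set (so it ignores the -u flag), and it assumes that a "positive prefix" means a leading "+".

```python
def expand(spec):
    """Expand one spec body such as '7', '10-20', or '2,3,4' into node numbers."""
    nodes = []
    for part in spec.split(","):
        if "-" in part:
            # A range such as '10-20' includes both end points.
            low, high = part.split("-", 1)
            nodes.extend(range(int(low), int(high) + 1))
        else:
            nodes.append(int(part))
    return nodes


def build_node_set(specs):
    """Apply includes and excludes in left-to-right order (hypothetical helper)."""
    selected = set()
    for spec in specs:
        if spec.startswith("-"):
            # Negative prefix: remove these nodes from the selection so far.
            selected.difference_update(expand(spec[1:]))
        else:
            # Assumed '+' prefix, or no prefix: add these nodes.
            selected.update(expand(spec.lstrip("+")))
    return sorted(selected)
```

Under these assumptions, build_node_set(["1-48", "-10-20", "15"]) yields nodes 1-9, 15, and 21-48, matching the description of the example. Because order matters, putting 15 before -10-20 would leave 15 excluded.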

The old version:

   dispatch 1-48 -i 29,35,48 ls -al

The new version:

   dispatch 1-48 -29,35,48 ls -al

Other examples:

   dispatch -p -r 1 2 3 10-20 -15 30-40 -25 uname -a

1.3.2. System

The upgrade of the Beowulf cluster from Red Hat 9 to Fedora Core 4 is finished. What does this mean for you? As far as running your code is concerned, not much. Some programs (such as Java) are still the same version and have not changed in functionality at all. Other items (such as MPI) have changed quite a bit, because the rest of the system now runs up-to-date versions of almost everything.

There are three machines that seem to be flaky, and I have left them offline: bw29, bw35, and bw48. If someone would like to look at them and determine whether the problem is hardware, please do. In addition, bw25 needs to have its power light cable checked. Can someone take bw25 down, open it up, and check the wires?

For those of you that care, here is a list of what changed:

If you find anything that doesn't work as it did, or as you would like, please let us all know. If you or your students would like to know more about the hardware (and help maintain it!), please let us know. I hope that we can all get together soon to discuss more effective uses of the cluster.

1.4. Maintenance

Hardware upgrades needed:

Software upgrades needed:

Previous upgrades:

   dispatch 1 48 -i 36,39 "cat /proc/meminfo" | grep MemTotal | cat -n

Here's a way to get all of the machines that are free, and put them into a lam-bhost.def file:

   dispatch 1-48 -r "uptime" | cut -d":" -f1,6 | cut -f1 -d"," | grep 0.00 | cut -f1 -d":" > lam-bhost.def

This uses a new -r flag on dispatch that formats the output in a report form, showing machine and results.
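
The cut pipeline above counts colons, so it depends on the exact shape of the uptime output (for example, a machine that has been up for less than an hour prints fewer colons). As a hedged alternative, the same filtering can be sketched in Python. The report format assumed here, one line per node of the form "<node>: <uptime output>", is inferred from the pipeline and is not confirmed. The function name idle_nodes and the sample node names are hypothetical.

```python
import re

# Matches the first (1-minute) load average in standard uptime output.
LOAD_RE = re.compile(r"load average:\s*(\d+\.\d+)")


def idle_nodes(report_lines, threshold=0.0):
    """Return node names whose 1-minute load average is at or below threshold.

    Assumes each line looks like '<node>: <uptime output>' (an assumption
    about the -r report format, not taken from the dispatch source).
    """
    idle = []
    for line in report_lines:
        node, sep, rest = line.partition(":")
        match = LOAD_RE.search(rest)
        if not sep or match is None:
            continue  # Skip lines that do not look like a report line.
        if float(match.group(1)) <= threshold:
            idle.append(node.strip())
    return idle


# Made-up sample lines, for illustration only.
sample = [
    "bw01: 10:32:01 up 5 days,  3:12,  2 users,  load average: 0.00, 0.01, 0.05",
    "bw02: 10:32:01 up 12 min,  1 user,  load average: 1.50, 0.90, 0.40",
    "bw03: 10:32:02 up 3 days, 22 min,  0 users,  load average: 0.00, 0.00, 0.00",
]
```

With the sample lines above, idle_nodes(sample) returns bw01 and bw03. Writing one name per line to lam-bhost.def would then reproduce the effect of the shell pipeline.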

1.5. Meetings:

Planning meeting: BMC Beowulf Supercomputer Project, Phase 2

Jan 7, 2005, 10am Park Science 230

We currently have 24 nodes, each with 512 MB of RAM, standard Ethernet networking, and a 10 GB hard drive. We need to decide which of the following have highest priority for this year:

  1. Add more nodes (i.e., computers): there is no limit to the number of nodes we can add, and each node can increase performance of the cluster.

  2. Add more memory: some programs require more RAM than we currently have available. This is especially true if we begin to utilize programs such as Mathematica, Matlab, Gaussian, or Maya (3D rendering).

  3. Add faster networking: currently, we have the slowest networking money can buy. Some processes require faster communication between processors in order to be effective.

  4. Add software: currently, we have only utilized free, open source software, and Mathematica. In the future, other scientists may wish to use other commercial packages.

We will start the meeting off with an overview of the Beowulf, and a brief introduction to using it.