From dblank at mainline.brynmawr.edu Mon Sep 19 17:25:45 2005
From: dblank at mainline.brynmawr.edu (Douglas S. Blank)
Date: Mon Sep 19 17:26:40 2005
Subject: [Beowulf] update
Message-ID: <432F2CD9.80302@cs.brynmawr.edu>

You are getting this announcement because you have used BMC's Beowulf
computer cluster in the past. If you do not want to receive these
messages, you can remove yourself from the mailing list here:

http://emergent.brynmawr.edu/mailman/listinfo/beowulf

If you wish to send messages to all of the Beowulf users, you can also
find information at the above link.

As some of you may know, the power to Park Science was interrupted
sometime last week. We have the main harddisk on the Beowulf set up to
be able to handle such interrupts (via RAID); however, it did not
recover gracefully. The good news is that no data was lost. The bad
news is that neither Matt nor I could figure out how to get the main
disk on-line quickly.

So, I'm taking a quick poll to see which of the following actions would
be appropriate at the current time. First, some background. The Beowulf
operating system (Red Hat 9) has not been updated nor patched since the
day we installed it back in the Fall of 2003. This means that there are
many security issues that need to be fixed, else someone can hack in
and cause havoc (delete data, steal data, create spam, etc). We need to
update the software at some point.

Options:

1. Just get the system back into the state that it was in last week. We
   can do this fairly quickly, but it won't be up-to-date, and we'll
   have to take it down sometime in the near future to upgrade it.

2. Take a few more days, and update the system such that it will be
   up-to-date, and will be useable like it was last week.

Opinions?

-Doug

--
Douglas S. Blank, Assistant Professor
dblank@brynmawr.edu, (610)526-6501
Bryn Mawr College, Computer Science Program
101 North Merion Ave, Park Science Bld.
Bryn Mawr, PA 19010
dangermouse.brynmawr.edu

From dblank at brynmawr.edu Wed Sep 28 14:32:56 2005
From: dblank at brynmawr.edu (Douglas S. Blank)
Date: Wed Sep 28 14:33:02 2005
Subject: [Beowulf] status update
Message-ID: <433AE1D8.6030505@brynmawr.edu>

Beowulf users,

I have finished the upgrade of the Beowulf cluster from Red Hat 9 to
Fedora Core 4. What does this mean to you? As far as running your code
is concerned, not much. Some programs (such as Java) are still the same
version, and haven't changed in their functionality at all. Other items
(such as MPI) have changed quite a bit, as the entire system is running
up-to-date versions of most everything else.

There are three machines that seem to be flaky that I have left
offline: bw29, bw35, and bw48. If someone wants to look at those to
determine whether it is a hardware problem, please do. In addition,
bw25 needs to have its power light cable checked. Can someone take bw25
down, open it up, and check the wires?

For those of you that care, here is a list of what changed:

- the head node (bw01) is now a dual-processor Xeon computer. It has a
  SCSI disk, but only 256 MB of memory. We need to look into upgrading
  the memory. (Mike, do we have money left for that?)

- the old head node is currently not being used. It could be upgraded
  and used as a replacement for 29, 35, or 48. It has two new 40 GB
  hard drives. (Tom, can you look into that?)

- there are currently 45 nodes up and running.

- MPI works the same way as before: lamboot, lamnodes, mpirun, lamhalt

- dispatch and pdispatch work the same way as before, e.g.:

    dispatch 01 48 -r -i 25,35,48 "uptime"

- all of the machines can now access the outside world (all IP
  numbers). However, you still can only access a particular node by
  going through the root node (beowulf.brynmawr.edu). rsh is set up so
  that you do not need a password to run programs on various nodes.

- the OS can now be updated automatically.
  We will need to make sure that if something critical changes on the
  root, then it will be updated on the rest of the nodes.

- all nodes (except the root) can be rebuilt using Matt's kickstart
  rebuilding tools. We need to back up at least /home and /var/www/ on
  beowulf.brynmawr.edu. Matt?

- we believe that we can install and use Mathematica in parallel. Matt,
  is this possible?

If you find anything that doesn't work as it did or as you would like,
please let us all know.

If you or your students would like to know more about the hardware (and
help maintain it!), please let me know.

I hope that we can all get together soon to discuss more effective uses
of the cluster.

Thanks!

-Doug

--
Douglas S. Blank, Assistant Professor
dblank@brynmawr.edu, (610)526-6501
Bryn Mawr College, Computer Science Program
101 North Merion Ave, Park Science Bld.
Bryn Mawr, PA 19010
dangermouse.brynmawr.edu