1. BMC Beowulf Supercomputer: Programming Notes
1.1. Running many programs in parallel
You can run any command on a node with:
rsh MACHINENAME COMMAND
where MACHINENAME is one of bw01 through bw24 (not all of those machines are up and running yet, however), and COMMAND is any command you could type at the shell prompt. For example, try:
rsh bw04 ls /home/
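You may want to drive rsh from a Python script instead of typing it at the prompt. The sketch below is one minimal way to do that. It assumes rsh is on your PATH and that nodes follow the bw01 to bw24 naming above. The names node_name, rsh_argv and run_on_node are hypothetical helpers, not tools installed on the cluster.

```python
import subprocess

def node_name(n):
    """Return the hostname for node n, e.g. 4 -> 'bw04' (assumes bw01..bw24)."""
    if not 1 <= n <= 24:
        raise ValueError(f"no such node: {n}")
    return f"bw{n:02d}"

def rsh_argv(n, command):
    """Build the argument list for: rsh MACHINENAME COMMAND."""
    return ["rsh", node_name(n), command]

def run_on_node(n, command):
    """Run COMMAND on node n and return its standard output as text.
    check=True raises CalledProcessError if rsh exits with an error."""
    result = subprocess.run(rsh_argv(n, command),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_on_node(4, "ls /home/") would return the same listing as the rsh example above.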
You can redirect the output of an rsh command, but you must decide which machine should write the output file. If you want the output to come back to the server node, put the redirect outside the rsh command:
rsh bw16 COMMAND > serverfile.out    # the server creates/overwrites a file in the current dir
rsh bw02 COMMAND >> serverfile.out   # the server creates/appends to a file in the current dir
You can instead have the node create the file by quoting ("escaping") the redirect, so that your local shell passes it through to the remote shell:
rsh bw16 COMMAND ">" /home/dblank/remotefile.out   # the node creates/overwrites the file
rsh bw02 COMMAND ">>" remotefile.out               # the node creates/appends to the file (relative to your home dir)
In the first form, all of the output is sent back over the network to the server, which writes it to the file. In the second, the remote node writes the file itself, and nothing is sent back to the server.
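The difference between the two redirect styles can be made concrete in Python. This sketch assumes rsh is on your PATH, and both helper names are hypothetical. The server-side helper opens the file locally and lets rsh stream the output into it. The node-side helper puts the redirect inside the command string, so the remote shell applies it.

```python
import shlex
import subprocess

def rsh_to_server_file(host, command, path, append=False):
    """Server-side redirect, like: rsh HOST COMMAND > path (or >>).
    The output travels back over the network and this machine writes the file."""
    mode = "a" if append else "w"
    with open(path, mode) as out:
        subprocess.run(["rsh", host, command], stdout=out, check=True)

def remote_redirect_argv(host, command, path, append=False):
    """Node-side redirect, like: rsh HOST COMMAND ">" path (or ">>").
    The redirect is part of the command string, so the remote shell applies it
    and the file is written on the node."""
    op = ">>" if append else ">"
    return ["rsh", host, f"{command} {op} {shlex.quote(path)}"]
```

shlex.quote protects file names containing spaces or other shell metacharacters when they are handed to the remote shell.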
To run a process on a remote node in the background, you need to do two things. First, use the -n flag, which makes rsh read its input from /dev/null instead of your terminal. Second, end the command with an ampersand:
rsh -n bw10 somereallylongprogram &
If you are running Java programs, you will need to change to the program's directory first. Here is a Java program sent to node bw09 to run in the background:
rsh -n bw09 "cd /home/mydir/java; java SomeProgram" &
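The sketch below shows roughly what a Python equivalent of rsh -n HOST "cd DIR; COMMAND" & might look like. It assumes rsh is on your PATH, and background_argv and start_in_background are hypothetical names. subprocess.Popen returns without waiting for the command to finish, which plays the role of the trailing ampersand.

```python
import shlex
import subprocess

def background_argv(host, command, workdir=None):
    """Build argv for: rsh -n HOST "cd WORKDIR; COMMAND".
    -n makes rsh read stdin from /dev/null, so it can run in the background."""
    if workdir is not None:
        command = f"cd {shlex.quote(workdir)}; {command}"
    return ["rsh", "-n", host, command]

def start_in_background(host, command, workdir=None):
    """Start the remote command and return immediately (like a trailing &).
    Call .wait() on the returned Popen object to collect the exit status."""
    return subprocess.Popen(background_argv(host, command, workdir))
```

For example, start_in_background("bw09", "java SomeProgram", "/home/mydir/java") mirrors the rsh line above.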
1.2. Dispatching many copies to the nodes
Because the rsh method above is very useful for running non-parallel programs, I have written a small Python script to automate dispatching a command to many nodes. It takes the form:
dispatch STARTNUM STOPNUM COMMAND
Examples:
dispatch 2 24 ls
dispatch 2 15 "ls -al"
dispatch 10 24 "cd /home/dblank; java myjavaprogram"
dispatch runs the command on one node at a time, waiting for each to finish before starting the next. The output therefore arrives on the host in node order and is easy to read.
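The author's dispatch script is not reproduced here. The following is only a guess at how a sequential dispatcher could be written, assuming rsh is on your PATH and zero-padded bwNN hostnames. The names node_names and dispatch are used for illustration. The run parameter defaults to subprocess.run, so the loop can be exercised without a cluster.

```python
import subprocess

def node_names(start, stop):
    """Hostnames bwSTART..bwSTOP inclusive, zero-padded to two digits."""
    return [f"bw{n:02d}" for n in range(start, stop + 1)]

def dispatch(start, stop, command, run=subprocess.run):
    """Run COMMAND on each node in turn, waiting for one to finish before
    starting the next, so the output comes back in node order."""
    for host in node_names(start, stop):
        print(f"--- {host} ---", flush=True)  # label each node's output
        # rsh hands its command string to the remote shell, so a quoted
        # command such as "ls -al" can be passed as a single argument.
        run(["rsh", host, command])
```

A command-line wrapper would call dispatch(int(sys.argv[1]), int(sys.argv[2]), " ".join(sys.argv[3:])).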
To run them in parallel, use pdispatch instead. It starts the command on all of the nodes at the same time:
pdispatch 2 24 ls
pdispatch 2 15 "ls -al"
pdispatch 10 24 somereallylongrunningprogram
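As a rough illustration, not the actual pdispatch source, the parallel version can be sketched in Python. It starts every rsh with subprocess.Popen before waiting on any of them. This assumes rsh is on your PATH, and node_names and pdispatch are illustrative names. Because all copies run at once, their output may be interleaved on the terminal.

```python
import subprocess

def node_names(start, stop):
    """Hostnames bwSTART..bwSTOP inclusive, zero-padded to two digits."""
    return [f"bw{n:02d}" for n in range(start, stop + 1)]

def pdispatch(start, stop, command, popen=subprocess.Popen):
    """Start COMMAND on every node at once, then wait for all of them.
    Returns the exit codes in node order."""
    # -n keeps each rsh from reading the terminal, as in the
    # background recipe in section 1.1.
    procs = [popen(["rsh", "-n", host, command])
             for host in node_names(start, stop)]
    # Every copy is already running; now block until each one exits.
    return [p.wait() for p in procs]
```

The popen parameter exists only so the function can be tested with a stand-in. In normal use you would call pdispatch(2, 24, "ls").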
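When running a program that lives in a particular directory, you will have to refer to it by its full path. Java is also particular about how it runs a .class file. java takes a class name (without the .class extension), not a file path, and when no CLASSPATH is set it looks for that class in the current directory. You will probably have to do something like: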
pdispatch 1 24 "cd /home/dblank/java; java ThreeDoorProblem"
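That is, you will have to change to the directory before running java. The semicolon, which separates the two commands, is necessary. The quotes keep both commands together as one argument so that both run on the node.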
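java needs a directory to change into plus a bare class name. A small helper can turn a .class file path into the command string to hand to dispatch or pdispatch. The helper java_command_for below is hypothetical. It assumes the class is in the default package; a class inside a Java package would need the package root as the directory and a dotted class name.

```python
import os.path
import shlex

def java_command_for(class_file):
    """Turn a path to a .class file into the command java actually needs, e.g.
    '/home/dblank/java/ThreeDoorProblem.class' ->
    'cd /home/dblank/java; java ThreeDoorProblem'."""
    directory, filename = os.path.split(class_file)
    class_name, ext = os.path.splitext(filename)
    if ext != ".class":
        raise ValueError(f"not a .class file: {class_file}")
    directory = directory or "."  # a bare file name means the current dir
    return f"cd {shlex.quote(directory)}; java {shlex.quote(class_name)}"
```

The resulting string is what goes inside the quotes on the pdispatch command line.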
1.3. Remote logging into nodes
You can log into a machine with:
rlogin MACHINENAME
and that should not ask for a password. You probably won't need to do that, however.
1.4. MPI
LAM-MPI is up and running on the nodes listed in /etc/lam/lam-bhost.def. If you start with:
lamboot
it will automatically use that file.
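When you are done with MPI, you must run lamhalt before you exit; it shuts down the LAM daemons that lamboot started on the nodes. If you forget, you will have to log in again from a different terminal window and run lamhalt from there.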
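If you boot LAM from a Python driver script, a context manager is one way to make sure lamhalt runs even when the job raises an error. This sketch assumes lamboot and lamhalt are on your PATH, and lam_session is a hypothetical name. It cannot help if the driver script itself is killed; in that case you still need to run lamhalt by hand.

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def lam_session(run=subprocess.run):
    """Boot LAM on entry and always halt it on exit, even after an error.
    lamboot with no arguments uses the default boot schema file."""
    run(["lamboot"], check=True)
    try:
        yield
    finally:
        # Runs whether the body finished normally or raised.
        run(["lamhalt"], check=True)
```

Inside a with lam_session(): block you would then call mpirun as usual. The run parameter is only there so the function can be tested without LAM installed.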
1.5. Useful tidbits
- the script /home/setup/bashrc runs each time someone logs into one of the nodes.
- I've made a script called 'mpython' that is really just a wrapper for "mpirun -np $1 pyMPI ...". That way you can run a pyMPI script with:
mpython 11 script.py
so it looks much like running a regular Python script:
python script.py
(where 11 is the number of processes mpirun starts, its -np value; in practice, the number of nodes you want to recruit).
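The mpython wrapper itself is presumably a short shell script. For illustration only, here is the same idea expressed in Python. mpython_argv and mpython are hypothetical names. The sketch assumes mpirun and pyMPI are on your PATH and that LAM has already been booted with lamboot.

```python
import subprocess

def mpython_argv(nprocs, script, *args):
    """Build: mpirun -np NPROCS pyMPI SCRIPT [ARGS...]."""
    if nprocs < 1:
        raise ValueError("need at least one process")
    return ["mpirun", "-np", str(nprocs), "pyMPI", script, *args]

def mpython(nprocs, script, *args):
    """Run a pyMPI script on nprocs processes and return mpirun's exit code."""
    return subprocess.run(mpython_argv(nprocs, script, *args)).returncode
```

mpython(11, "script.py") corresponds to typing mpython 11 script.py at the prompt.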
