BMC Beowulf Supercomputer:Programming Demo
For this demonstration, we will be using the Python programming language, and MPI, the Message Passing Interface. We will use LAM/MPI from http://www.lam-mpi.org/ .
To just run a bunch of copies of a program, and some additional notes, see: BMCBeowulfSuperComputer:ProgrammingNotes
Step 1: Start LAM
First, we need to make the nodes ready for parallel processing. We do this with:
lamboot
You should get a message from Indiana University, and then get your terminal prompt back. To see what nodes you have available, try:
lamnodes
lamboot uses a default listing of files. The list is stored at /etc/lam/lam-bhost.def. You can also make and use your own list of machines to use. Lines beginning with hash marks are comments and are ignored. Here is a sample bhost file:
bw01 bw02 bw03 bw04 bw05 bw06 bw07 #bw08 #bw09 #bw10 #bw11
Now, we can start LAM with your own bhost file:
lamboot bhost
where bhost is the name of your bhost file.
Now we are ready to run an MPI-compiled program. This could be C, C++, or Fortran. In our case, I have compiled Python so that it will run as a MPI program.
You run any MPI program with:
mpirun -np NUM PROGRAM [ARGS]
where NUM is replaced with the number of instances of a computer node. Usually, you want this to match the number of computers in your bhost file. So, in our second example above, NUM would be 11. You should also replace PROGRAM with the name of the MPI-compiled program. ARGS are optional. For Python, PROGRAM will be pyMPI. Because we will be using pyMPI so often, we have made a short, easy to remember alias:
mpython NUM PROGRAM [ARGS]
Let's see a specific example. Consider the following Python program called sum.py:
# sum 0+1+...+(N - 1)
import mpi
N = 7000000
mystart = (N / mpi.size) * mpi.rank
myend = (N / mpi.size) * (mpi.rank + 1)
# last processor gets to go the full length...
if mpi.rank == mpi.size - 1:
myend=N
sum = 0L
for i in range(mystart, myend):
sum += i
totalSum = mpi.reduce(sum, mpi.SUM)
if mpi.rank == 0:
print totalSum
Notice that it imports a library named mpi and refers to:
| mpi.rank | ID of the current node |
| mpi.size | total number of computers |
| mpi.reduce | collects data from each of the computers |
We run this program with:
mpython 11 sum.py
or:
mpirun -np 11 pyMPI sum.py
Try making stop much, much larger. How much more time does it take?
Python and MPI
The allocation of tasks to nodes can be tricky. If you know your tasks each take about the same amount of time, then you could do something like:
tasks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sublen = len(tasks) / (mpi.size - 1)
start = sublen * mpi.rank
for t in tasks[start:start+sublen]:
Experiment(t)
if mpi.rank == (mpi.size - 1):
for t in tasks[sublen * mpi.size:]:
Experiment(t)
and that would divy the experiments as equally as possible over the available nodes.
If you want to create a series of experiments that explored the interaction of three parameters (for example), in Python you could create a list of tuples like:
x = [1, 2, 3] y = [.4, .5, .6] z = [10, 20, 30] crossproduct = [(a, b, c) for a in x for b in y for c in z] print crossproduct
would print:
[(1, 0.40000000000000002, 10), (1, 0.40000000000000002, 20), (1, 0.40000000000000002, 30), (1, 0.5, 10), (1, 0.5, 20), (1, 0.5, 30), (1, 0.59999999999999998, 10), (1, 0.59999999999999998, 20), (1, 0.59999999999999998, 30), (2, 0.40000000000000002, 10), (2, 0.40000000000000002, 20), (2, 0.40000000000000002, 30), (2, 0.5, 10), (2, 0.5, 20), (2, 0.5, 30), (2, 0.59999999999999998, 10), (2, 0.59999999999999998, 20), (2, 0.59999999999999998, 30), (3, 0.40000000000000002, 10), (3, 0.40000000000000002, 20), (3, 0.40000000000000002, 30), (3, 0.5, 10), (3, 0.5, 20), (3, 0.5, 30), (3, 0.59999999999999998, 10), (3, 0.59999999999999998, 20), (3, 0.59999999999999998, 30) ]
And then one could:
tasks = crossproduct
sublen = len(tasks) / (mpi.size - 1)
start = sublen * mpi.rank
for xyztuple in tasks[start:start+sublen]:
Experiment(xyztuple)
if mpi.rank == (mpi.size - 1):
for xyztuple in tasks[sublen * mpi.size:]:
Experiment(xyztuple)
On the other hand, if you know that your tasks take very different amounts of time, you can use a simple interface written by Doug Blank and Jim Marshall:
from pyrobot.mpi import *
if mpi.rank == 0:
allocator = TaskAllocator(TASKS)
allocator.run()
else:
handler = TaskHandler()
handler.run()
This assumes that TASKS is a sequence of objects that have a run method. The tasks get passed over the network (using the pickle operation). In this manner, you should try to run the longer tasks first. Here is a simple task object class that works with the above code:
class TaskObject:
def __init__(self, i):
self.i = i
def run(self):
return Experiment(self.i)
TASKS = map( lambda i: TaskObject(i), range(10) )
Of course, TASKS could be any kind of sequence and of any type, as long as the object has a run() method. The code is very short, and can be found in /home/pyrobot/mpi/__init__.py. You can also combine the crossproduct method above with the task allocator:
from pyrobot.mpi import *
x = [1, 2, 3]
y = [.4, .5, .6]
z = [10, 20, 30]
crossproduct = [(a, b, c) for a in x for b in y for c in z]
class TaskObject:
def __init__(self, xyztuple):
self.data = xyztuple
def run(self):
# do some calcs on self.data
return results
TASKS = map(lambda a: TaskObject(a), crossproduct)
if mpi.rank == 0:
allocator = TaskAllocator(TASKS)
allocator.run()
else:
handler = TaskHandler()
handler.run()
Extended MPI Example
Here is another example that creates a fractal image:
import mpi
import sys
from array import *
from struct import *
#Function to return BMP header
def makeBMPHeader( width, height ):
#Set up the bytes in the BMP header
headerBytes = [ 66, 77, 28, 88, 0, 0, 0, 0, 0, 0, 54, 0, 0, 0]
headerBytes += [40] + 3*[0] + [100] + 3*[0] + [ 75,0,0,0,1,0,24] + [0]*9 + [18,11,0,0,18,11]
headerBytes += [0]*10
#Pack this data sa a string
data =""
for x in range(54):
data += pack( 'B', headerBytes[x] )
#Create a string to overwrite the width and height in the BMP header
replaceString = pack( 'I', width )
replaceString += pack( 'I', height)
#Return a 54-byte string that will be the new BMP header
newString = data[0:18] + replaceString + data[26:54]
return newString
#Define our fractal parameters
c = 0.4 + 0.3j
maxIterationsPerPoint = 64
distanceWhenUnbounded = 3.0
#define our function to iterate
def f( x ):
return x*x + c
#Define the bounds of the xy plane we will work in
globalBounds = (-0.6, -0.6, 0.4, 0.4 ) #x1, y1, x2, y2
#define the size of the BMP to output
#For now this must be divisible by the # of processes
bmpSize = (400,400)
#Define the range of y-pixels in the BMP this process works on
myYPixelRange = [ 0,0]
myYPixelRange[0] = mpi.rank * bmpSize[1]/mpi.procs
myYPixelRange[1] = (mpi.rank+1) * bmpSize[1]/mpi.procs
if mpi.rank == 0:
print "Starting computation (groan)\n"
#Now we can start to iterate over pixels!!
myString = ""
myArray = array('B')
for y in range( myYPixelRange[0], myYPixelRange[1]):
for x in range( 0, bmpSize[0] ):
#Calculate the (x,y) in the plane from the (x,y) in the BMP
thisX = globalBounds[0] + (float(x)/bmpSize[0])*(globalBounds[2]-globalBounds[0])
thisY = (float(y)/bmpSize[1])*(globalBounds[3] - globalBounds[1])
thisY += globalBounds[1]
#Create a complex # representation of this point
thisPoint = complex(thisX, thisY)
#Iterate the function f until it grows unbounded
nxt = f( thisPoint )
numIters = 0
while 1:
dif = nxt-thisPoint
if abs(nxt - thisPoint) > distanceWhenUnbounded:
break;
if numIters >= maxIterationsPerPoint:
break;
nxt = f(nxt)
numIters = numIters+1
#Convert the number of iterations to a color value
colorFac = 255.0*float(numIters)/float(maxIterationsPerPoint)
myRGB = ( colorFac*0.8 + 32, 24+0.1*colorFac, 0.5*colorFac )
#append this color value to a running list
myArray.append( int(myRGB[2]) ) #blue first
myArray.append( int(myRGB[1]) ) #The green
myArray.append( int(myRGB[0]) ) #Red is last
#Now I reduce the lists to process 0!!
masterString = mpi.reduce( myArray.tostring(), mpi.SUM, 0 )
#Tell user that we're done
message = "process " + str(mpi.rank) + " done with computation!!"
print message
#Process zero does the file writing
if mpi.rank == 0:
masterArray = array('B')
masterArray.fromstring(masterString)
#Write a BMP header
myBMPHeader = makeBMPHeader( bmpSize[0], bmpSize[1] )
print "Header length is ", len(myBMPHeader)
print "BMP size is ", bmpSize
print "Data length is ", len(masterString)
#Open the output file and write to the BMP
outFile = open( 'output.bmp', 'w' )
outFile.write( myBMPHeader )
outFile.write( masterString )
outFile.close()
That produces the following image:
Quit
Finally, you need to quit LAM/MPI:
lamhalt
Try http://vortex.brynmawr.edu/beowulf for a page with some more detailed instructions on getting a cluster up and running (as a test bed for your mpi enabled code, before taking it to the 24 node machine!)
