mpichdemo.html

There is a lovely MPI demo that can be done using the Mandelbrot set.

The demo's files are found in /home/pjordan/mandel/. I will package them up shortly. I had to make some edits to them to get them to work.

Starting from a base Debian install on the cluster, install mpich on each node using apt-get. rshd and rsh need to be correctly installed and configured for this to work properly.

I have provided a secondary install script to do all these tasks for you at http://g4cluster.blackwire.com/scripts/secondaries/mpich.config.

Download it, distribute it, and execute it on each node according to the instructions in secondary.html.

Now this 'mandel' demo uses X windows to display its image. You can use any X server you like to view this demo. You will have to use ssh to access one of the cluster nodes; I don't recommend you run the demo from master. That is, do not include master in your "machinefile" (man mpirun), and do not start mpirun from master either.

Make sure to use the X11 forwarding option of ssh. This is "-X" for OpenSSH (v1.3).

Follow the instructions at hostsup.html to generate a hostsup file, and then convert it into the format that mpirun expects, like so:

cat hostsup | cut -d' ' -f 2 > mymachines
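To make the conversion concrete, here is a minimal sketch run against a made-up hostsup file. The sample contents (a status word followed by a hostname) and the hostname "master" are assumptions for illustration only; the real format is whatever hostsup.html produces. The sketch also filters master out of the result, since master should not appear in the machinefile.

```shell
# Hypothetical hostsup contents: some status in field 1, hostname in field 2.
# The real file comes from the hostsup.html instructions and may differ.
workdir=$(mktemp -d)
printf '%s\n' 'up slave001' 'up slave002' 'up master' > "$workdir/hostsup"

# Keep only the hostname column, then drop the line that is exactly "master"
# (assumed hostname) so mpirun never starts a process there.
cut -d' ' -f2 "$workdir/hostsup" | grep -v -x 'master' > "$workdir/mymachines"

# Show the resulting machinefile: one hostname per line.
cat "$workdir/mymachines"
```

The result is a file with one hostname per line, which is the shape the -machinefile option is given in the command further down.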

Once you have this file, and you are logged in as user to a cluster node using ssh:

ssh -X pjordan@slave014 # example

you can start the demo with the following command:

mpirun -np 20 -machinefile /home/pjordan/mymachines /home/pjordan/mandel/mandel

See the mpirun man page for an explanation of the options. I will only say that the "20" given to -np is the number of processes to start, which in this setup means how many nodes of the cluster you wish to use. If you don't have that many nodes, then obviously you need to use a lower number. Also, the full path MUST be given for any files named on the mpirun command line. I have lost a lot of time myself trying to figure out why it wouldn't work, only to realize much later that when it is spawning processes on other nodes it doesn't have any idea where a file is unless it was given with the full path.
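The two rules above (full paths only, and no more processes than you have nodes) are easy to forget, so here is a rough sketch of a wrapper that enforces them before anything is launched. The function name build_mpirun_cmd is made up for this example and is not part of mpich; it only prints the command line so you can look at it first.

```shell
# Hypothetical helper: print an mpirun command line for the given process
# count, machinefile and program. It refuses relative paths and lowers -np
# to the number of hosts listed in the machinefile.
build_mpirun_cmd() {
    np=$1
    machinefile=$2
    program=$3

    # Processes spawned on other nodes cannot resolve relative paths.
    case $machinefile in
        /*) ;;
        *) echo "machinefile must be given with its full path" >&2; return 1 ;;
    esac
    case $program in
        /*) ;;
        *) echo "program must be given with its full path" >&2; return 1 ;;
    esac

    if [ ! -r "$machinefile" ]; then
        echo "cannot read $machinefile" >&2
        return 1
    fi

    # Count non-empty lines; the machinefile has one hostname per line.
    hosts=$(grep -c . "$machinefile" || true)
    if [ "$np" -gt "$hosts" ]; then
        np=$hosts
    fi

    echo "mpirun -np $np -machinefile $machinefile $program"
}
```

For example, `build_mpirun_cmd 20 /home/pjordan/mymachines /home/pjordan/mandel/mandel` prints the command from above, with 20 reduced if the machinefile lists fewer hosts. Passing plain `mymachines` instead of the full path makes it fail with a message rather than leaving you to puzzle over a silent mpirun failure.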

The above example assumes you are using NFS and that master's /home directory is also mounted as /home on each node of the cluster. Otherwise it won't be able to find the files I named on the command line. You can work around this of course by copying the files to each node, but why would you want to?

One final note: if you are getting strange error messages of the form "Cannot connect to X display" or something like that, check that you used ssh with the -X option. If you didn't use ssh, then you didn't follow the instructions in this document and you are on your own. Sucker.
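A quick way to check this before starting mpirun is to look at the DISPLAY variable, which ssh normally sets on the remote side when X11 forwarding is active. The sketch below is only a sanity check under that assumption; the function name check_x_display is made up for this example, and a set DISPLAY does not guarantee the forwarding actually works.

```shell
# Hypothetical helper: succeed if this shell has a DISPLAY to draw on.
# With "ssh -X" the remote shell usually gets a value like localhost:10.0.
check_x_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: log in again with ssh -X" >&2
        return 1
    fi
    echo "DISPLAY is $DISPLAY"
}
```

Run `check_x_display` on the cluster node right after logging in. If it complains, log out and reconnect with the -X option before trying the demo again.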