CPSC 441, Fall 2014
Lab 10, Part 1: Introduction to MPI
This lab will get you started working with MPI. You will write a couple of simple MPI programs and do some testing. In the second part of the MPI lab, next Monday, you will work on a more substantial program. Work from both parts is due on December 12, the last day of class.
You should have read the MPI handout before starting the lab! However, after that handout was printed, the MPI installation on the lab computers was changed from MPICH to OpenMPI. The only change that this requires is a slightly different procedure for running an MPI program on multiple hosts. The correct instructions for OpenMPI are given below.
As I mentioned in class, I have made new accounts for people in the course to simplify running MPI programs on multiple computers. Your user name and password will be available at the lab. (Note: It's no use changing your password, since you actually have a different account on each of the 20 computers, and changing your password will only affect one machine.)
Before starting work on MPI, you should log in to your new account and set it up for passwordless access between lab computers. Note that this only works because the home directory for the new account uses the NFS file system instead of the AFS file system used for your regular account. Here's how it's done...
Your ssh configuration is stored in a hidden directory named ".ssh", which must be accessible only to you. Usually, it is created automatically when you use ssh, but you can create it with the following commands in your home directory:
mkdir .ssh
chmod 700 .ssh
Usually, the first time you ssh to a computer, you are asked whether you really want to connect. To avoid this, you can copy a "known_hosts" file into your .ssh directory that already lists all of the lab computers. The file you need is in the directory /nfshome/MPI. Copy it into your .ssh directory with the command:
cp /nfshome/MPI/known_hosts .ssh
Once that's done, you can ssh to one of the lab computers without verifying that you really want to connect. But you still need to enter your password. You can avoid that by using public-key authentication instead of password authentication. To set this up, first change into the .ssh directory and use the ssh-keygen command to generate "RSA keys" as follows:
cd .ssh
ssh-keygen -t rsa
Press return three times to accept all of the keygen program's defaults. This will create files named id_rsa and id_rsa.pub. These are your private and public keys, respectively. To enable passwordless ssh, the private key must be in the .ssh directory on the computer from which you are ssh'ing, and the public key must be in a file named authorized_keys in the .ssh directory on the computer to which you are trying to connect. But in this case, since your networked home directory is the same on all the computers, it's the same .ssh directory in both cases. The upshot is that you need to copy id_rsa.pub to a file named authorized_keys. Still working in the .ssh directory, use the command
cp id_rsa.pub authorized_keys
That should do it! Test your configuration by trying to ssh to a couple of the lab computers. You should be connected without having to answer any questions and without having to give your password. For example,
ssh cslab7
or
ssh csfac3
(Don't forget to log out!)
The folder /nfshome/MPI contains files that you will need for this lab. You might want to copy the entire folder into your home directory, and use the copied MPI folder as your working directory for the rest of this lab. In any case, you will need to copy all the files from /nfshome/MPI into your working directory (except for known_hosts). The rest of the lab assumes that you have done that.
To make sure that you can compile and run MPI programs, you should try compiling and running the sample program hello_mpi.c. This is the same sample program that I handed out in class. To compile it, use the command
mpicc -o hello_mpi hello_mpi.c
Once it has compiled, you can use
mpirun -n 10 ./hello_mpi
to run it. This command runs the program in 10 processes on the computer that you are using. The values that you use for the name of the executable ("-o") and for the number of processes ("-n") are up to you.
You should also try running the program on several computers. There are two ways to do this. One way is to use a "hosts file" that contains the names of the computers on which you want to run the program. The second way is to list the host names as an option in the command. (Note: This is a little different from the handout, since the handout described MPICH but we are actually using OpenMPI.)
A hosts file simply lists the names of the computers, one per line. The file allhosts, which you should have copied from /nfshome/MPI, lists all 20 available computers. To run hello_mpi on all 20 computers, use the command
mpirun -hostfile allhosts -n 20 ./hello_mpi
If you make your own host file, with a subset of the computers, just substitute the name of your file for "allhosts". To list the hosts on the command line, use the "-H" option, followed by a comma-separated list of host names (with no spaces in the list). For example,
mpirun -H cslab3,cslab8,csfac2,csfac6 -n 8 ./hello_mpi
An MPI Program: Estimating PI (Badly)
The mathematical constant PI is equal to the area of a circle of radius one. Consider the circle of radius one defined by x*x + y*y < 1. Now consider the quarter-slice of the circle that satisfies x >= 0 and y >= 0. This quarter circle has area PI/4, and it lies inside the square 0 <= x < 1, 0 <= y < 1, which has area 1. Suppose that you pick a large number of random points in the square. Then you can expect the fraction of the random points that lie inside the circle to be approximately equal to PI/4. Multiplying this fraction by 4 will give an approximation for PI. The more points you use, the better the approximation you can expect to get (although the approximation turns out to be pretty poor, even using a lot of points, since the expected error shrinks only in proportion to one over the square root of the number of points).
The program estimate_pi_uniprocessor.c implements this algorithm, with no parallelism. Of course, if you could use MPI to spread out the calculations onto a lot of computers, you could get the answer faster. That's the assignment in exercises 1 and 2. Note that you might find it useful to look at the sample MPI programs primes1.c and primes2.c. These sample programs have a lot in common with what you need to do for the two exercises.
Exercise 1: Write an MPI version of estimate_pi that uses all available processes to do the work. Each process will perform the task of selecting many random points and counting how many of the points satisfy x*x + y*y < 1. Each process except process 0 should send its count back to process 0 using MPI_Send. Process 0 should use MPI_Recv to receive the messages from the other processes and it should print out the estimate of PI given by the combined results. The number of trials to be performed by each process can be given by a constant in the program, as is done in the "uniprocessor" version. Only process 0 should do any output.
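For reference, the combined estimate that process 0 prints can be written as a formula. The symbols are introduced here for illustration and do not come from the handout: P is the number of processes, N is the number of trials performed by each process, and c_r is the count reported by process r. This assumes that every process, including process 0, performs the same number of trials.

```latex
\[
  \pi \;\approx\; 4 \cdot \frac{\sum_{r=0}^{P-1} c_r}{P \cdot N}
\]
```

Note that the denominator is the total number of trials over all processes, not the number of trials in a single process.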
Exercise 2: Your program for Exercise 1 uses MPI_Send and MPI_Recv for communication. In fact, it is simpler to use the collective communication function MPI_Reduce to get the data from all of the processes to process 0. Write a second version of the pi-estimating program, using a collective communication function instead of MPI_Send and MPI_Recv.
A Little Empirical Speedup Test
Note: Everyone in the class should write their own programs for Exercise 1 and Exercise 2. Starting with Exercise 3, and continuing with all of Lab 10b, you have the option of working with a partner. If you work with a partner, be sure that both names are on your work.
Reminder: Programs such as primes1.c that use functions such as sqrt from the math library must be linked to that library. The math library is actually named just "m", and you can link to it by adding the option "-lm" to the end of the compilation command. For example,
mpicc -o primes1 primes1.c -lm
Exercise 3: For this exercise, use the sample MPI program primes2.c. You should measure the speedup that you get by using various numbers of processes on one machine and on several machines. Compile the program. Run it with just one process, and see how long it takes. (Process 0 will report the elapsed time to standard output.) You can do this with the command:
mpirun -n 1 ./primes2
Run the program with 4 processes on one computer. Run it with 4 processes on 4 computers. Run it with 16 processes on 4 computers. Run it with 20 processes on all 20 computers, and maybe with 80 processes on all 20 computers. You might try each experiment several times, and take the average run time. You can try other numbers of processes and computers if you want. You might consider increasing the constant DEFAULT_RANGE in the program, so that it will take a larger total processing time. Note that we have quad-core computers, so each computer can run 4 processes in parallel. (One thing to keep in mind: Starting up and tearing down the MPI virtual machine takes time, and the more processes and computers you use, the more significant that overhead will be.) As your response to this exercise, report and comment on the speedups that you obtain.