CPSC 441, Networking, Fall 2004
Lab 11: Programming MPI


THIS LAB is really a set of three programming exercises. You will start work on the programs in lab and complete the rest for homework. The first two exercises are just a warm-up to get you started on programming with MPI. The third exercise is more substantial and is somewhat open-ended. Ideally, you will have time to finish one or two of the exercises in lab.

This lab will be due, along with work from the previous lab, on Monday, December 6 (the beginning of the last week of classes).


Estimating PI (Badly)

The mathematical constant PI is equal to the area of a circle of radius one. Consider the circle of radius one defined by x*x + y*y < 1. Now consider the quarter-slice of the circle that satisfies x >= 0 and y >= 0. This quarter circle has area PI/4, and it lies inside the square 0 <= x < 1, 0 <= y < 1, which has area 1. Suppose that you pick a large number of radom points in the square. Then you can expect the fraction of the random points that lie inside the circle to be approximately equal to PI/4. Multiplying this fraction by 4 will give an approximation for PI. The more points you use, the better the approximation you can expect to get (although the approximation turns out to be pretty poor, even using a lot of points).

The program estimate_pi_uniprocessor.cc in the directory /classes/f04/cs441/MPI implements this algorithm. Of course, if you could use MPI to spread out the calculations onto a lot of computers, you could get the answer faster. That's the assignment in exercises 1 and 2.

Exercise 1:  Write an MPI version of estimate_pi that uses all the available processors in the MPI virtual machine to do the work. Each process will perform the task of selecting many random points and counting how many of the points satisfy x*x y*y < 1. All the processes except process 0 should send their counts back to process 0 using MPI_Send. Process 0 should use MPI_Recv to receive the messages from the other processes and it should print out the estimate of PI given by the combined results. Your program should work for any number of processes -- don't assume that the number of processes is 12. It should take the total number of trials to do from a command-line argument, just as the uniprocessor version of the program does. In fact, your program can be modeled pretty closely on the sample prime-counting program, primes1.cc.

(Here is one little problem that you will have to deal with: The number of trials that you want is probably not divisible by the number of processes, so if you have each process do exactly the same number of trials, you won't get the correct total number. Perhaps the easiest way to deal with this is to have process 0 do a few extra trials to make up the difference.)

Exercise 2:  Your program for Exercise 1 uses MPI_Send and MPI_Recv for communication. In fact, it is simpler to use one of the collective communication functions, such as MPI_Reduce or MPI_Gather, to get the data from all the other processes to process 0. Write a second version of the pi-estimating version of the program, using a collective communication function instead of MPI_Send and MPI_Recv. You can look at the program primes2.cc for an example of this.


Counting Primes (Nicely)

For the main programming exercise on MPI, you will write a more complicated program that use a master/slave structure to implement load balancing. The program is yet another version of counting prime numbers. The problem is to count all the prime numbers in the range from 2 up to some specified upper limit.

As you have seen, if you simply divide the range into sections and let each process work on one section, then the load is not balanced and some processors will be idle for a large part of the run. One way to alleviate the problem is to break up the task into smaller-size tasks. A task consists of counting the prime numbers in some range between some specified minimum and maximum values. Process 0, acting as a "master" sends one task to each of the other "slave" processes. When a slave process finishes with its assigned task, it sends the result back to the master process. If all the tasks have not yet been assigned, the master will send a new task to the slave. The master process combines all the results that it receives for each task and prints out the final total.

Note that the size of the tasks has to be chosen carefully. If the tasks are too large, the load won't be balanced. If the tasks are too small, the communication overhead will be be so large that the gains you make by balancing the load will be more than wiped out by the communication time.

The prime-counting program primes_uniprocessor.cc uses the fact that an integer is prime if it is not evenly divisible by any number between 2 and its square root. However, a more efficient test says that an integer is prime is not evenly divisible by any prime number between 2 and its square root. So, if we keep a list of (small) prime numbers, we can use it to test other (bigger) numbers for primality. This is what is done by primes_smarter_uniprocessor.cc. this program starts by making a list of all primes between 2 and the square root of 231-1. Since 231-1 is the largest value of type int, the list contains enough primes to test any int for primality. You will want to do something similar in your program to make it as efficient as possible.

It is possible that your program might run faster if there are two or more processes running on each physical computer, since some processes can still be computing while others are waiting to be assigned new tasks. Try it. Maybe you can come up with other ways to speed up the program, for example a more clever way to create the original list of small primes.

Exercise 3:  Write a load-balancing master/slave version of the primes counting program. The program should count prime numbers between 2 and some upper limit, which can be specified as a command-line argument. You can assume that the upper limit is fairly large. You should try to write a program that will use as little elapsed time as possible. On Monday, December 6, we will run everyone's program to see whose is fastest at counting the primes between 2 and 100,000,000 when run on our network of 12 computers. (Note for overly clever people: Your program must actually compute and count the primes -- it shouldn't just print a pre-computed answer!)


David Eck