CPSC 441, Fall 2018
Lab 8: Distributed Processing in Java
For this lab, you will implement a search for big prime numbers using a distributed application written in Java. The numbers that you find will actually be only "probably prime," but the probability that a number identified as probably prime is actually not prime can be made arbitrarily small. You will use Java's BigInteger class to represent the possible prime numbers.
To write the distributed application, you will need to write at least two programs: a "worker" program that will run on several computers and a "supervisor" program that runs on only one computer. The supervisor will communicate with the workers using Java's Socket API. This is similar to what was done with the distributed Mandelbrot computation in Lab 7.
You are advised, but not required, to work with a partner on this assignment.
This lab should be turned in before you leave for Thanksgiving break, no later than 8:00 AM on Saturday, November 17. It will be printed and collected at that time. All of your work should be in a folder named lab8 inside your homework folder. I will print out all Java files in that folder, so please do not include extra source files. If you work with a partner on the lab, make sure both names are in the source code files; only one person should turn in the assignment. The comments in your program should include instructions for running it!
There is no lab next week, because of the exam that will be given next Wednesday.
About Prime Numbers and BigInteger
Large prime numbers are used in public key encryption, Generating a public/private key pair involves finding two large primes. The more bits in the prime numbers, the stronger the encryption. Although it is difficult to prove that a big number is prime, there is a probabilistic test that can be used. The test has a "certainty factor." If N is the certainty factor and if the test says that a number is prime, then the probability that it is actually not prime is less than 2-N.
The class java.util.BigInteger in Java can represent integers with any number of bits (within the limits of available memory, of course.) It has a method n.isProbablyPrime(certainty) for testing a BigInteger n for primality. The test is relatively quick for numbers of 1000 or even 2000 bits, but it slows down as the number of bits grows.
To search for (probably) prime numbers, pick a random BigInteger, n, with the desired number of bits. Make sure it is odd, since an even number is certainly not prime. Use isProbablePrime() to test n for primality. If that doesn't work, try n + 2, and so on. This works because primes are actually relatively common. (There are, however, long stretches of consecutive integers that are not prime, so to be safe, you might want to choose a new random starting point from time to time.)
Sample Programs
There are two sample programs in the folder /classes/cs441/bigprimes. SimpleBigPrimes.java is a single-threaded program that looks for probable primes with a given number of bits and prints out the ones that it finds. It runs forever, until it is stopped. The number of bits is a constant in the program.
The program ThreadBigPrimes uses four threads to search for prime numbers (and it demonstrates some techniques for working with threads). The program will run for a given amount of time, printing out any threads that it finds, and then stops. There is a supervisor thread that generates "candidate" numbers to be tested and puts them into a queue. The four worker threads take numbers from the queue and test them. The sturcture is similar to the threaded web server from Lab 3.
I set ThreadedBigPrimes to run for 5 minutes looking for 8192-bit primes. In five test runs on a computer in the Lansing computer lab, it found 1, 2, 1 0, and 1 primes — a total of 5 primes in 100 minutes of compute time (four threads times five minutes times five test runs).
Note that the source code for the Mandelbrot program is available on line at http://math.hws.edu/eck/js/mandelbrot/java/xMandelbrotSource-1-2/, if you want to look at it. A simpler version, with no GUI, is discussed in Javanotes, Section 12.4.5.
A Distributed Prime Search
Your assignment is to get more computers involved in the search for big prime numbers. You can write a program that simply finds prime numbers and lists them, like the sample programs do. Or you can choose a different goal. For example, you might decide to count the number of prime numbers in a given range of BigIntegers. You might search for a single very large prime number and stop when one is found. You might search for twin primes (a prime p such that p+2 is also prime, like 17 and 19), but for that, the number of bits will have to be smaller, since twin primes are much less common than primes.
You will write a worker program that will be run on multiple computers. When the worker is started, it will need to create a ServerSocket to listen for a connection from the supervisor program. Once the connection is established, the worker can receive "tasks" from the supervisor. The task might be checking one number for primality, or it could be a range of numbers (to avoid having individual tasks that are too small). After completing the task, the worker can return some sort of result to the supervisor, and it can then be assigned a new task. Part of your assignment is to design the protocol that will be used for communication between the supervisor and the workers. A worker can be either single-threaded or multi-threaded. That is up to you.
(For testing, you might want to log into a computer and start the worker there, instead of starting it remotely. That will give you a chance to see any debugging information output by the worker. It would be nice to have some way for the workers to terminate automatically. For example, they could simply terminate when the connection from the supervisor is closed. Another option is to use a timer to terminate the program if there has been no connection for some period of time.)
You will also write the supervisor program, which only runs on one computer (the one that you are sitting in front of). The supervisor will have to be told where to contact the workers; that is, it will need an IP address and port number for each worker. (Or, the port number could simply be a constant in the programs.) The supervisor will need one thread for communicating with each worker. A communication thread will open a connection to its worker, and then run in a loop in which it sends tasks to the worker and gets back the result.
The supervisor program might need another thread to generate tasks and feed then, through a blocking queue, to the communication threads. The structure would be similar to ThreadedBigPrimes, except that the communication threads are not actually carrying out tasks; the are simply handing the tasks off to other computers. Alternatively, depending on what you are trying to accomplish, it might be sufficient for each communication thread to generate tasks on its own.