CPSC 441, Fall 2014
Lab 9: ARP / xMandelbrot
This lab has two unrelated parts. In the first part you will investigate the ARP protocol. The second part introduces the idea of distributed processing, using computation of the famous Mandelbrot Set as an example. You should spend some class time on each part. (There are a lot of exercises in the lab; most of them are short.)
I encourage you to work with a partner on this lab.
The lab is due next Friday, in class. There is no lab next week.
Part 1: ARP
Your computer has an "ARP cache" in which it stores information about Ethernet addresses associated with different IP addresses. To view the contents of the ARP cache, use the command
arp -n
The "-n" stops the command from trying to do DNS lookups to find host names for IP addresses.
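If you would rather inspect the cache from a program, the following is a minimal sketch in Python. It assumes a Linux machine, where the kernel exposes the same table as text in /proc/net/arp. The sample addresses and the function name parse_arp_table are made up for illustration; they are not part of the lab setup.

```python
# Sketch: parse ARP table text in the Linux /proc/net/arp layout.
# The sample rows below are invented (documentation-range IP addresses).
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.0.2.10       0x1         0x2         02:00:00:00:00:01     *        eth0
192.0.2.20       0x1         0x0         00:00:00:00:00:00     *        eth0
"""

def parse_arp_table(text):
    """Return a list of (ip, mac, device, complete) tuples."""
    entries = []
    for line in text.splitlines()[1:]:          # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue                            # ignore blank or short lines
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        complete = bool(int(flags, 16) & 0x2)   # 0x2 = ATF_COM, entry resolved
        entries.append((ip, mac, device, complete))
    return entries

for entry in parse_arp_table(SAMPLE):
    print(entry)
```

To try it on a real cache, read /proc/net/arp into a string and pass that string to parse_arp_table in place of SAMPLE.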
Exercise 1: Find the MAC address of the Ethernet card in the computer csfac0 (IP address 172.21.7.90). This machine will not be in your ARP cache unless you have tried to access it (for example by pinging it).
Exercise 2: What happens to the ARP cache when you try to access a nonexistent IP address on the local network, such as 172.21.7.125? How about if you try to access an address outside the local network, such as 123.45.67.89? Why the difference?
Exercise 3: The arp command can be used to make entries into the ARP cache (although only if you are running as root). Read the man page for the arp command and look up the meaning of the "-s" option. Why might you want to use this option? What would happen if you used this option to add an incorrect Ethernet address to the ARP cache?
Start up Wireshark. Start a packet capture on eth0. Set the filter to arp, and click "Apply". While the packet capture is running, access a computer that is not already in your ARP cache (maybe csfac1, 172.21.7.91). This will generate an ARP request and response.
Exercise 4: Discuss the number of ARP requests that you see. Why are there so many? Where do they all come from? Why don't you see the responses to all these requests?
Exercise 5: Find the reply to the ARP request that originated from your computer. (Hint: Use the filter arp.opcode == reply.) What is your computer's Ethernet address? How can you tell just by inspecting the ARP reply packet?
Exercise 6: There is such a thing as a "gratuitous" ARP request or ARP reply. Do you see any in the packet capture? (Hint: Use the filter arp.isgratuitous.) Do a web search for "gratuitous ARP", and do some reading. You will probably find the Wireshark Wiki page. What are gratuitous ARP requests? Why are they used? (Do you have any interesting observations about the gratuitous ARP requests in your packet capture?)
Exercise 7: Look at how the source and destination Ethernet addresses are listed in the top third of the Wireshark window. They are not shown in purely hexadecimal form. Explain the format.
Part 2: A Distributed Processing Example
In distributed processing, a computation is broken up among a number of computers that communicate over a network. It is a type of parallel processing, but the relatively slow speed of network communication makes it different from parallel processing using threads inside a single computer. In this part of the lab, you will work with a fairly simple example of distributed processing. The computation in this case produces an image of the Mandelbrot set.
This example uses a master/slave model of distributed computation. The master program breaks up the computation of a complete image into a number of smaller jobs and distributes those jobs to slave processes running on other computers. The master receives the results of the jobs and combines them to create the complete image. (In fact, the program also uses some slave processes running as threads on the same computer as the master, so it can work even when there are no networked slaves.)
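The master/slave structure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how xMandelbrot works: the job shape (bands of rows), the image size, and all of the names are assumptions, and the workers here are local threads rather than processes on other machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical image size, iteration limit, and region of the complex plane.
WIDTH, HEIGHT, MAX_ITER = 60, 24, 50
XMIN, XMAX, YMIN, YMAX = -2.0, 1.0, -1.0, 1.0

def escape_time(c, max_iter):
    """Count iterations of z -> z*z + c until |z| exceeds 2 (or give up)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter          # never escaped: treated as inside the set

def compute_rows(job):
    """Worker: compute iteration counts for the rows in [start, stop)."""
    start, stop = job
    rows = []
    for row in range(start, stop):
        y = YMIN + (YMAX - YMIN) * row / (HEIGHT - 1)
        rows.append([
            escape_time(complex(XMIN + (XMAX - XMIN) * col / (WIDTH - 1), y), MAX_ITER)
            for col in range(WIDTH)
        ])
    return start, rows

def master(num_workers=3, rows_per_job=4):
    """Master: split the image into jobs, farm them out, combine the results."""
    jobs = [(s, min(s + rows_per_job, HEIGHT)) for s in range(0, HEIGHT, rows_per_job)]
    image = [None] * HEIGHT
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for start, rows in pool.map(compute_rows, jobs):
            image[start:start + len(rows)] = rows
    return image

if __name__ == "__main__":
    for row in master():
        print("".join("#" if n == MAX_ITER else "." for n in row))
```

Python threads will not actually speed up this computation; the point is the shape of the program. In a distributed version, compute_rows would run on another computer, and the job and its result would travel over a network connection.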
The master program is /classes/cs441/xMandelbrot.jar. It is written in Java. You can also find it on the web at http://math.hws.edu/xJava/MB, where you can find more information about the Mandelbrot set and the program. You should start up the program and try it out. You can zoom in by dragging a box around part of the image. The interesting parts are along the border of the black area. As you zoom in, you might need to increase the value in the "MaxIterations" menu — increasing the value might fill in some black areas with color. If you zoom in too far, you will exceed the precision of the Java double type. Try it with the last example in the "Examples" menu, which is just on the verge of doing so; zoom in to see the result. However, the program is capable of using arbitrary precision arithmetic. Just check the option "Enable High Precision" in the "Control" menu. Try it, and note that the high precision computation takes much longer than computing with the built-in double type. By distributing this time-consuming computation, you can get the image a lot faster.
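To see why zooming in eventually exceeds the precision of double, consider this small sketch. It uses Python rather than Java, but a Python float is the same 64-bit IEEE 754 type as a Java double. The center coordinate and pixel spacing are made-up values chosen to show the effect, and Python's decimal module stands in for whatever arbitrary precision arithmetic the program really uses.

```python
from decimal import Decimal, getcontext

# Hypothetical view: centered at x = -0.75, with pixels 1e-17 apart.
center = -0.75
pixel_step = 1e-17

# With 64-bit floating point, the neighboring pixel gets the same coordinate,
# because the step is smaller than the spacing between doubles near 0.75.
lost = (center + pixel_step == center)

# With 50 significant decimal digits, the two pixels remain distinct.
getcontext().prec = 50
d_center = Decimal("-0.75")
d_step = Decimal("1e-17")
kept = (d_center + d_step != d_center)

print("double collapses neighboring pixels:", lost)
print("50-digit decimal keeps them apart:", kept)
```

Once neighboring pixels share a coordinate, they necessarily get the same color, which is why the image turns blocky at extreme zoom levels. High precision arithmetic avoids this, at the cost of doing every multiplication and addition in software.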
Load the last example from the "Examples" menu, then check "Enable High Precision" in the "Control" menu. You should work with the same example for the rest of the lab.
To use a distributed computation, you need to start slave programs on several computers. The slave program is /classes/cs441/MBNetServe.jar. By default, the program communicates on port 17071, but we might have several people trying to run the program on the same computer. So, you should select a different port number. I will assume that you are using port 12321 in the following discussion. (We might try to organize this better during lab, so that we don't have multiple users on the same machine; in that case you can just leave the port number out, and the default port will be used.) Start the slave on several computers using commands of the form
ssh -f hostname java -jar /classes/cs441/MBNetServe.jar -quiet -port 12321 -processcount 3
For the hostname, use one of cslab0, cslab1, ..., cslab11, or csfac0, csfac1, ..., csfac7. Select at random, if we haven't made better arrangements! Substitute your selected port number for 12321, or leave out the port number if you are using the default port.
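If you are starting slaves on many hosts, it can help to generate the commands rather than type each one. Here is a minimal sketch that builds the command line shown above for a list of hosts and prints it, without running anything. The function name slave_command and the host list are illustrative choices only.

```python
import shlex

def slave_command(host, port=12321, process_count=3):
    """Build the ssh command that starts one slave; port=None omits -port."""
    cmd = ["ssh", "-f", host,
           "java", "-jar", "/classes/cs441/MBNetServe.jar", "-quiet"]
    if port is not None:
        cmd += ["-port", str(port)]
    cmd += ["-processcount", str(process_count)]
    return cmd

# Example selection of hosts; substitute the ones you are actually using.
hosts = ["cslab0", "cslab1", "csfac0"]

for host in hosts:
    print(shlex.join(slave_command(host)))   # dry run: print, do not execute
```

To actually launch the slaves from the script, pass each list to subprocess.run instead of printing it.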
Next, you have to tell the master about the slaves. In the xMandelbrot program, select the command "Configure Multiprocessing" from the "Control" menu. This will show the following dialog box, configured here with 5 slave computers:
In the dialog box, click "Enable Networking". Then click "Add Network Host", enter the name and port for one of your slave computers, and click OK. The slave will be added to the list of computers in the Multiprocessing Dialog. Repeat for each slave. Finally, click "Apply Config Now". At that point, the master will try to establish connections with all the slaves. You should see their status change from INACTIVE to CONNECTED. (Remember that you always have to click "Apply Config Now" to make a change in the configuration effective.)
You should also uncheck "Use One Process for Each Processor" at the top of the dialog box, and set the Number of Processes to 3, as shown in the picture above. You should leave the dialog box open while working with the program in the rest of the lab.
Finally, you are ready to try some high precision computations with distributed processing.
Exercise 8: You should be using the last example, with high-precision processing enabled. Run a computation using distributed processing, and time how long it takes. Then turn off "Enable Networking" in the Multiprocessing dialog, and click "Apply Config Now", so that the slave status changes back to INACTIVE. Run the same computation, without the slaves, and time it again. How much speedup did you see? Was it what you expected? Ideally, you should try this with different numbers of slaves. (Note: The program doesn't really make it possible to run exactly the same computation. To run one that is essentially the same, use the "Custom" command in the "MaxIterations" menu, and change the number of iterations by one.)
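For reference, speedup is usually computed as the time without the slaves divided by the time with them. The sketch below shows the arithmetic. The timings are invented, and the "ideal" figure rests on a strong assumption (every process on every machine runs at the same speed, with no network or coordination cost), so treat it as a rough baseline and not as the expected answer.

```python
def speedup(t_local, t_distributed):
    """Observed speedup: time without slaves divided by time with slaves."""
    return t_local / t_distributed

def ideal_speedup(local_processes, slave_count, processes_per_slave):
    """Baseline assuming all processes are equally fast and overhead is zero."""
    total = local_processes + slave_count * processes_per_slave
    return total / local_processes

# Hypothetical measurements, in seconds.
t_local, t_distributed = 120.0, 30.0

print("observed speedup:", speedup(t_local, t_distributed))
print("idealized baseline:", ideal_speedup(3, 5, 3))
```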
Exercise 9: Start up Wireshark again. Shut down the networking in xMandelbrot, then start a packet capture in Wireshark. With the packet capture running, re-enable xMandelbrot networking and do a distributed computation with the program. The goal of this exercise is to figure out as much as you can, just from looking at the packet trace, about the protocol that is used by the program. Set the Wireshark filter to tcp.port == 12321, replacing 12321 with the port that is being used by the Mandelbrot program. The main Wireshark window doesn't show the packet data in text form, but you can see the content by selecting one of the packets and using the "Follow TCP Stream" command from Wireshark's "Analyze" menu. Outgoing packets are shown in red and incoming packets in blue. (It might also be interesting to look at the "Time Sequence Graph (Stevens)" command from the "TCP Stream Graph" sub-menu of the "Statistics" menu.) In addition, watching how the picture is composed in the master program while the computation is in progress might give you some hints about how the computation is broken up into subproblems. Write a few paragraphs about your observations, conclusions, and speculations.