In this lab, you will write a program that uses linked lists. A typical operation on linked lists is to search the list to find some specified item. The search algorithm for linked lists uses linear search. In analyzing this algorithm, we can ask how many nodes are visited before the item is found (or found to be not in the list). Of course, in the worst case, the answer is n, where n is the length of the list, and the worst case run time is Θ(n). But we can also ask about the average number of nodes that are visited during a search.
If we assume that all the items that we look for are in the list and that each item in the list is equally likely to be the target of a search, then the average number of items that are visited during a search is about n/2. However, in many realistic applications, some items are much more likely to be the target of a search than others. The average search time can be reduced by putting the more likely items at the front of the list. (In fact, the minimum average search time will be achieved by storing the items in the list in order of their probability of being searched for, starting with the most probable item at the head of the list.
Unfortunately, we are not likely to know the probabilities in advance, and even if we did know them, it is likely that the probabilities will change over time. One approach to dealing with this fact is to use a self-modifying linked list that is designed to automatically keep items arranged at least approximately according to their probability of being searched for. The idea is that when an item is searched for, it should be moved up in the list. The items that are searched for most often -- that is, the items that have a high probability of being searched for -- will tend to spend most of the time near the front of the list, while items that are rarely searched for will tend to spend most of the time near the back of the list.
Now, the question is, how should the list modify itself when an item is searched for? There are at least two possible policies: Move the selected item all the way to the head of the list, or move the item up one position by swapping it with the item that precedes it in the list. In this lab, you will implement both possible policies, and you will run some empirical tests to determine whether doing so really does decrease the average search time significantly.
Note on Lab 4: Lab 4 is due, but I will not check your work until the morning of Wednesday, September 26. Before then, you should make sure that your project for Lab 4 is checked into your CVS repository and is up-to-date.
You can begin by starting a new Eclipse project. (You can consider creating a named package inside that project to contain your work, but that is not required for this lab.) In that project, you will write a class to implement the self-modifying linked list, and a second class that contains a main program to test your list class.
Inside the list class, define a nested class to represent a node on the list, and use an instance variable to store a pointer to the head of the list. For the purposes of gathering statistics, you will also need instance variables to keep track of the total number of searches performed and the total number of items that were inspected during those searches:
private static class ListNode { int id; // An ID number that identifies this item. String data; // The data that is stored in this item, along with the ID. ListNode next; } private ListNode head; private int numberOfSearches, numberOfItemsVisited;
(The data strings are irrelevant as far as today's lab is concerned, and you can use arbitrary strings in the list that you create.)
The class should contain a method
public void addItem(int id, String data)
that adds a new list node to the list, where the new node contains the given id and data. The node can be added at the front of the list (which is very easy).
The class also needs a method
public String retrieve(int id)
that searches the list for a node that has the specified id number, and returns the data from that node. (You can return null if the item is not found.) The retrieve() method is also responsible for keeping track of the number of searches and the total number of items that were inspected during all the searches. In fact, you really only want to keep track of this data for successful searches.
Finally, you should provide some method for retrieving information about the statistics that were gathered during the searches.
Back in the main program (in a separate class), you should create a list object and fill it with 100 items. Use the integers from 0 to 99 as IDs for the items. You can use any data strings you want; they are irrelevant for today's lab.
It would be nice if the items in the list were randomly ordered by ID. To do this, put the numbers 0 through 99 in an array of length 100. Shuffle the array into a random order. Then use the numbers from the array as IDs when you add the items to the list. (Once you have added the items to the list, you don't care about the array anymore.) Here is some code that you can use to create the array:
int[] numbers = new int[100]; for (int i = 0; i < 100; i++) numbers[i] = i; for (int i = 0; i < 100; i++) { int r = (int)(100*Math.random()); // random position in the array int temp = numbers[r]; // swap numbers[i] and numbers[r] numbers[r] = numbers[i]; numbers[i] = temp; }
Now, using the list object's retrieve() method, run a large number of searches for randomly selected ID numbers in the range 0 to 100. Compute the average number of steps in all the searches, using the data collected inside the list object. Since each item in the list has an equal probability of being searched for, the answer should be close to 50.5, (This is (1 + 2 + ... + 100)/100.)
You want to make sure that your list is working correctly before you move on to the next part of the lab.
For the second part of the lab, you want to run searches in which certain ID numbers have a significantly higher probability of being selected than others. Here is a method that selects an ID number in the range 0 to 99, where the probability of choosing a number less than 20 is higher than the probability of choosing a number greater than 20:
static int generateRequest() { if (Math.random() > 0.3) return (int)(20*Math.random()); else return 20 + (int)(80*Math.random()); }
Use this method or something similar in your main program to select ID numbers to search for.
The main point of this lab is to make the list self-modifying, so that it can adjust itself to reflect the varying probabilities of different items. You should first implement the idea of moving a selected item to the front of the list.
The self-modification is done in the retrieve() method of your list class. In that method, if you find an item in the first node of the list, the list should not be modified. If you find it later in the list, the list should be modified. In order to make the modification, you will need two pointers of type ListNode, one to the node that contains the item and one to the preceding node in the list. This is similar to an example that we did in class, when we deleted an item from a linked list. Typically, the two ListNode variables would be called runner and previous. To modify the list, delete the node that runner points to from its current position in the list, then tack that same node onto the front of the list. This is very easy; it can be done in three lines of code, once you have the variables runner and previous. The point of this is to move the item that you have just searched for to the front of the list.
After making these modifications, run your program again. You should see a significant reduction in the average number of steps per search.
There is a second form of self-modifying list whose approach is less extreme than moving a selected item all the way to the front of the list. In the second version, the selected item is simply swapped with the preceding item in the list. This allows items that are selected a lot to slowly move towards the front of the list, while items that are more rarely selected will slowly move towards the back.
Make a second version of your self-modifying list class (by copying the original and pasting it back into the project with a new name). Change the new class so that it implements the second version of self-modification. The only change will be in the retrieve() method. You will still need the runner and previous pointers, but now you should simply swap the contents of the node that is pointed to by runner with the contents of the node that is pointed to by previous There is no need to move the actual nodes themselves; it's easier just to move their contents.
You should be able to substitute the second list class for the first list class in your main program, to see whether the second version is more or less effective than the first at decreasing the average number of steps per search.
Do some testing with both versions of the self-modifying list. Try changing the probabilities that are used in the generateRequest() method. You are welcome to do any other experimentation that interests you. Write up your observations. You can put them in a short text file in your Eclipse project. (To add a plain file to an Eclipse project, you can use the "New/File" command.)