CPSC 225, Spring 2012
Lab 2: Search/Sort Experiments

We have been talking about the analysis of algorithms in class, and we have looked at some algorithms for searching and sorting arrays as examples. In this lab, you will write programs to do some experiments with searching and sorting algorithms. In the experiments, you will measure the run times of the algorithms for various inputs. You'll also do a couple of calculations based on your results.

Start the lab by creating a new project named lab2 (or something like lab2-cs225 if you prefer). Add copies of the files unsorted_words.txt, WordList.java, and SortedWordList.java to the src folder in your project. You can find copies of these files in /classes/cs225. WordList.java and unsorted_words.txt are the same files that you used in Lab 1. SortedWordList.java is identical to WordList.java, except that it sorts the list of words, and it uses binary search to search the list for a given word. (In fact, it would make more sense to store the list of words in a file that is already sorted, but for the purposes of this lab, we are using an unsorted file.)

The lab asks you to write some programs and a written lab report. Turn in your programs by copying your lab2 folder into your homework folder in /classes/cs225/homework. You can either turn in the written part in class or include it as a clearly named file in your lab2 folder.

Some Techniques

This section goes over a few techniques that you will need to do the lab, and it contains some methods that you can use in your programs. Note that if you open this page in a web browser, you can copy-and-paste the methods from the web browser into your program. You don't have to retype the code!

First: Both parts of the lab ask you to measure the time it takes for your program to perform some task. Recall that this can be done using the standard function System.currentTimeMillis(). This function returns a value of type long that represents the number of milliseconds since a certain time in the past (January 1, 1970). Call this function once before starting the task, saving the value in a variable of type long. Then call this function again after the task, and subtract the two values to get the time, in milliseconds, that it took to perform the task.

Note that there is always some variation in run time, mostly due to the fact that the computer is always doing more than just working on your program. In particular, measurements that aren't at least a few milliseconds are pretty much useless. In general, you want to the task that you are measuring to take at least 10 milliseconds, preferably longer.

Second: Remember how to use random numbers in Java. The function Math.random() returns a randomly selected double in the range from 0.0 to 1.0 (not including 1.0). To get a random integer in the range 0 to N, inclusive, you can use (int)((N+1)*Math.random(). To get a random lower-case letter, you would use:

char ch = (char)( 'a' + (int)(26*Math.random()) );

This leads to the following method for creating random word of a given length where "word" in this case just means a sequence of lower-case letters:

/**
 * Returns a random string of length n made up of randomly
 * selected lower case letters.
 */
static String randomWord( int n ) {
   String word = "";
   for (int i = 0; i < n; i++) {
      char ch = (char)( 'a' + (int)(26*Math.random()) );
      word = word + ch;
   }
   return word;
}

Third: You will need the selection sort and insertion sort algorithms that we looked at in class. Here are methods that implement those algorithms for an array of type double[]:

/**
* Sort an entire array of doubles, using selection sort.
*/
static void selectionSort(double[] A) {
  for (int top = A.length-1; top > 0; top--) {
     int maxLoc = 0;
     for (int i = 1; i <= top; i++) {
        if (A[i] > A[maxLoc])
           maxLoc = i;
     }
     double temp = A[maxLoc];
     A[maxLoc] = A[top];
     A[top] = temp;
  }
}


/**
* Sort an entire array of doubles, using insertion sort.
*/
static void insertionSort(double[] A) {
  for (int top = 1; top < A.length; top++) {
     int pos = top;
     double temp = A[top];
     while (pos > 0 && A[pos-1] > temp) {
        A[pos] = A[pos-1];
        pos--;
     }
     A[pos] = temp;
  }
}

Part 1: Linear versus Binary Search

The WordList class looks up words in an array using linear search. The SortedWordList class does the same thing using binary search, but it has to sort the array before it can do this.

For Part 1 of the lab, you should devise experiments to determine, numerically, two things: First, How much faster is binary search than linear search in this case? And second, How long does it take to sort the array?

For the first question, it won't be sufficient to measure the time that it takes to search for a single word, since that time is too short to measure with reasonable accuracy. So you will want to search for a bunch of words and measure the total search time for all of them, using each of the two algorithms. Consider making an array of random words, using techniques discussed in the previous section.

For the second question, note that constructing an object of type WordList involves loading the array from a file, while constructing an object of type SortedWordList involves both loading the array from a file and then sorting that array.

You should write a program to do the experiments. It's possible to do both experiments in the same program. You will want to run the program several times, since there will be some variation in the run times. To get the best possible answers, you should take the average of the times from several runs.

(By the way, searching and sorting in SortedWordList are done using the built-in functions Arrays.sort and Arrays.binarySearch from the class java.util.Arrays.)

Part 2: Three Sorting Algorithms

For the second part of the lab, you want to write a program that compares the time it takes to sort an array using three different sorting algorithms: Selection Sort, Insertion Sort, and the algorithm that is used by the built-in function Arrays.sort(). You will apply these algorithms to arrays of type double[].

Methods for selection sort and insertion sort are given above. You can use them in your program. Arrays.sort() is available as long as you import it from the package java.util.

Your program should sort arrays of various sizes using all three algorithms. You should do this at least for arrays of length 1000, 10000, and 100000. For Arrays.sort(), you can also do arrays with a million elements or more. (You might have trouble measuring the sort time for Arrays.sort() on short arrays, so use some longer ones.)

You should fill your arrays with random real numbers. It's a good idea to make several identical arrays, so that you can apply the algorithms to the same input.

You don't necessarily have to write a program that will do all the experiments in one run. You can use a program in which the length of the arrays is given by a constant. Then run the program several times, with different values for that constant.

Part 3:Results and Analysis

For the written lab report, you should present the results of your experiments. That is, report all the run times that you measured (and say exactly what it was that you measured in each case).

In addition to just reporting the results, you should answer the following questions.

About Part 1:

  1. How many times faster was binary search than linear search in this application?
  2. How long did it take to sort the array?
  3. Suppose that you have the unsorted list of words, and you need to search for some words in the list. Is it worth sorting the list first? The answer depends on how many words you want to look up.

About Part 2:

First, Write a paragraph discussing how the run times of the three algorithms compare.

Finally, I would like you to do some math. Saying that the run time of an algorithm is Θ(n2) means that the run time is approximately proportional to n2. So, we can expect that there is some constant K such that the run time is approximately equal to K*n2. If you have measured the run time, it is possible to solve for K and say that K is approximately equal to the run time divided by n2 (and we can expect the approximation to be better for larger values of n). Once you have K, you can predict the run time, at least approximately, for larger values of n by plugging n into the formula K*n2.

Similarly, saying that the run time of an algorithm is Θ(n*log(n)) means that the run time is approximately equal to K*n*log(n), for some constant K. If you've measured the run time (for an appropriately large value of n), you can get an approximate value for K by dividing the run time by n*log(n). (Note that any logarithm function will work here, as long as you use the same one consistently.)

Using these ideas, the fact that the run time for insertion sort is Θ(n2), and the fact that the run time for Arrays.sort is Θ(n*log(n)), you should calculate answers to the following questions:

  1. About how long would it take to sort an array of one million doubles using insertion sort?
  2. About how long would it take to sort an array of one billion doubles using insertion sort?
  3. About how long would it take to sort an array of one billion doubles using Arrays.sort?

Present your calculations and your actual numerical predictions. Please convert large numbers of milliseconds to a more natural measure of time.