CPSC 225, Spring 2019
Lab 4: Timing Exercises

The amount of programming for this lab is relatively small. You will be working with binary search and quicksort, but you can mostly copy the code for those algorithms from the textbook. In addition to the programming, the main lab work is to do some timing experiments to see how much speedup you get by replacing linear search with binary search. At the bottom of this web page are some written exercises, including reporting on the results of your timing experiments and some run-time analysis problems.

You will need a copy of /classes/cs225/lab4-files/TimedSort.java. You might also need copies of the other files in /classes/cs225/lab4-files. (See below.)

The work for this lab includes some programming and some written responses to exercises. The exercises should be turned in on paper at lab next week. For the programming, you should turn in at least your files Spellcheck.java and SortedWordList.java. (Other files are optional.) You should place those files in a folder named lab4 in your homework folder in the directory /classes/cs225/homework. This work is due at next week's lab.

Timed Spellchecks

For this semester's first lab, you wrote a spell check program. That program used a WordList class that represents a list of legal words. The actual list of words was stored in an array of strings, and that array was unsorted. The method for testing whether a word is in the list uses simple linear search of the unsorted list. We have seen that binary search is a much faster sorting method, provided that the list of words is sorted. For your first exercise, you will implement this idea to speed up the spell check program, and you will make some timing measurements to see how much speed-up you get. (You'll also get a bit more experience with subclassing.)

You will need a copy of WordList.java that has been modified in one crucial way: The instance variable named words has to be changed from private to protected. You can make the change yourself, or you can get a new WordList.java from the directory /classes/cs225/lab4-files.

You will also need a working Spellcheck program. (It should be one that only makes one WordList object and uses it throughout the program.) You can use your own program, or you can use mine. You can get a copy of my working Spellcheck.java from /classes/cs225/lab4-files.

And of course, you will need a copy of unsorted_words.txt, the data file that contains the list of legal words.


You should modify the program Spellcheck.java to find and output the time that it takes the computer to do two operations. In particular, you want to time how long it takes to create the WordList object, and you want to time how long it takes to run all of the subroutines that check for words that are similar to the user's input. (Not the times to run each of the subroutines individually, but the total time to run all the subroutines.) You can time an operation by recording the time before and the time after the operation. The built-in function System.nanoTime() returns the current time, measured in nanoseconds, where one billion nanoseconds are equal to one second. Here, for example, is the beginning of my Spellcheck program, modified to report the time that it takes to create the list of words. (The format specifier "%,d" outputs an integer value with commas, such as 258,178,568.)

public class Spellcheck {
	
	private static WordList listOfWords;  // The list of correctly spelled words.
	
	public static void main(String[] args) {
		long start = System.nanoTime();
		listOfWords = new WordList();
		long end = System.nanoTime();
		System.out.printf("Time to create word list: %,d nanoseconds.%n", 
		                                                      (end - start));

Next, you want to make a new class, SortedWordList, that will use binary search to search for words in the list. You must create SortedWordList as a subclass of WordList. In the subclass, the constructor must sort the array, words, that contains the list of words, and the contains() method must be overridden to use binary search. Furthermore, you should implement these operations as follows:

To sort the array, you should use quicksort, and you should implement quicksort using subroutines that you add to the SortedWordList class. To implement binary search, you should use a recursive binary search subroutine that you add to the SortedWordList class.

I strongly suggest that you start by copy-and-pasting the code for binarySearch(), quicksortStep(), and quicksort() from Section 9.1 in the textbook directly into your file. However, those methods are written to work with arrays of ints, and you need to apply the algorithms to an array of Strings. This means that you will have to modify the subroutines to work with stings instead of ints. (And hopefully, that will encourage you to pay attention to what they actually do.)

Note: Recall that a string, str, has a method str.compartTo(s), where the parameter s is also a string. This method returns an integer that is negative if str is less than s, is zero if str is equal to s and is positive if str is greater than s. You will need to use this compartTo method to compare strings.


You are then ready to do some timing experiments and collect some data. Run the original SpellCheck, with added timing code, and record some of the times. It is best to run the program twice and use the second run. (This is because creating the word list is likely to be faster the second time you do it; this happens because the file contents are likely to be "cached"—that is, saved—after it is read the first time, making subsequent reads faster.) It is also better to run a few spell checks before recording the measurements. This is because a Java program actually speeds up when it is run. (Java uses a "just-in-time compiler" that optimizes code that is executed several times; if you input the same word multiple times just after starting the program, you should notice the compute times decrease after the first one or two times.)

There will be some variation in the times simply because of random factors such as the computer doing other things besides running your program. To get a good idea of the actual time required by the algorithms, you will need to run each experiment several times.

Next, you want to try using the sorted list instead of the original unsorted list. Since SortedWordList is a subclass of WordList, the only change that you need to make to the Spellcheck program is to create the word list as a SortedWordList object instead of as a WordList. It should only be necessary to change one line. Change

      listOfWords = new WordList();
to
      listOfWords = new SortedWordList();

You don't even have to change the type for the variable listOfWords! An object of type SortedWordList can be assigned to a variable of type WordList.

You should then run the modified program and collect timing data. You will need to report your data in Exercises 1 and 2, below.

(Note: Yes, I do know that storing the words in the file in sorted order in the first place would make more sense than sorting them every time you load the file. Just go with it.)

TimedSorts

The file TimedSort.java runs a timing experiment to test how long it takes to sort the same array using insertionSort() and the built-in sorting method, Arrays.sort(). (Arrays.sort() uses Quicksort to sort an array of numbers.) As it is written, TimedSort.java sorts an array of 10000 random integers. You should collect timing data from this program sorting arrays of various sizes, including at the very least arrays of size 1000, 10000, and 100000. Modify the program as necessary to change the array size. As usual, it's best to run the program several times for each array size to get a better of idea of the true running times. You will need the data that you collect for Exercises 3 and 4.

Exercises

Your responses to these exercises are to be written (or typed) on paper. Your responses should be turned in at the lab next Tuesday. The first four exercises ask about the timing experiments that you did for the lab. The remaining problems ask you to analyze the run time of several code segments. For all exercises, show your work and explain your reasoning.


Exercise 1:  Report your run time measurements for creating a WordList object to the run time for creating a SortedWordList object, and discuss the results. (Compare the run times. How much time does sorting the array add to the process? Is the difference really significant? Is the extra work worth it? How many words do you have to spell check before sorting the array actually saves time?)


Exercise 2:  Report on the time measurements that you made for spell-checking the user's words, and discuss the results. (How much of a speedup do you see? How do the timing results depend on the length of the word that is being spellchecked? Is the extra complexity of the binary search algorithm justified in this application.)


Exercise 3:  Report on the time measurements that you made for sorting arrays of various sizes using insertionSort() and Arrays.sort(). Also, to help compare the times, for each array size, you should compute and report the ratio of "time to sort with insertionSort" divided by "time to sort with Arrays.sort". You should see that this ratio increases as the array size increases. Explain why that behavior is expected.


Exercise 4:  How long would it take for insertionSort() to sort an array of one billion random numbers? How long would it take for Arrays.sort()? Compute the best estimates that you can, based on your timing data from TimedSort.java.

Remember that insertionSort() has average run time Θ(n2). This means that the run time should be approximately some constant times n2. You can use the timing data from the experiment to find an approximate value for the constant, and then use the formula to estimate the time to sort one billion items. Similarly, you can use the fact that Arrays.sort() has run time Θ(n*log(n)) for the second part of the exercise.

(Report your answers in more appropriate units than nanoseconds, such as years or days or hours or minutes.)


Exercise 5:  Suppose that N is the length of the array A. Analyze the run time of the following code segment, in terms of N. That is, give a "Big-Theta" estimate of the run time, such as Θ(N2) or Θ(log(N)) or Θ(sqrt(N)) or Θ(N) or .... Don't just state the run time! Justify your answer by explaining why your algorithm represents the run time of the code segment. The explanation does not have to be very long, just a short paragraph.

sum = 0;
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        sum = sum + A[i]*A[j];
    }
}

Exercise 6:  Suppose that N is the length of the arrays A and B. Analyze the run time of the following code segment, in terms of N. (Justify your answer.)

for (int i = 0; i < N; i++) {
    B[i] = A[i];
}
for (int i = 1; i < N; i++) {
    B[i] = B[i] + B[i-1];
}

Exercise 7:  Suppose that N is the length of the array A. Analyze the run time of the following code segment, in terms of N. (Justify your answer.)

int K = N-1;
int sum = 0;
while (K > 0) {
    sum = sum + A[K];
    K = K/2;
}

Exercise 8:  Consider the following subroutine, where N gives the length of the array A:

static boolean test( int[] A, int N ) {
    for (int i = 0; i < N; i++) {
        for (int j = i+1; j < N; j++) {
            if (A[i] == A[j]) {
                return true;
            }
        }
    }
    return false;
}

This subroutine tests whether there is a duplicate element in the array. Analyze the best case and the worst case run times of this subroutine, in terms of N. What kind of array will have the best-case run time? What kind of array will have the worst case run time? (Why?)