## CPSC 225, Spring 2012

Lab 2: Search/Sort Experiments

We have been talking about the analysis of algorithms in class, and we have looked at some algorithms for searching and sorting arrays as examples. In this lab, you will write programs to do some experiments with searching and sorting algorithms. In the experiments, you will measure the run times of the algorithms for various inputs. You'll also do a couple of calculations based on your results.

Start the lab by creating a new project named *lab2* (or something
like *lab2-cs225* if you prefer). Add copies of the files
unsorted_words.txt,
WordList.java, and SortedWordList.java
to the *src* folder in your project. You can find copies of these files in
*/classes/cs225*. *WordList.java* and *unsorted_words.txt* are the same files that
you used in Lab 1. *SortedWordList.java* is
identical to *WordList.java*, except that it sorts the list of words, and it uses
binary search to search the list for a given word. (In fact, it would make more sense
to store the list of words in a file that is already sorted, but for the purposes of
this lab, we are using an unsorted file.)

The lab asks you to write some programs and a written lab report. Turn in
your programs by copying your lab2 folder into your homework folder in
*/classes/cs225/homework*. You can either turn in the written part in class
or include it as a clearly named file in your lab2 folder.

### Some Techniques

This section goes over a few techniques that you will need to do the lab,
and it contains some methods that you can use in your programs. **Note**
that if you open this page in a web browser, you can copy-and-paste the
methods from the web browser into your program. You don't have to
retype the code!

**First**: Both parts of the lab ask you to measure the time it takes
for your program to perform some task. Recall that this can be done
using the standard function *System.currentTimeMillis*().
This function returns a value of type *long* that represents the
number of milliseconds since a certain time in the past (January 1, 1970).
Call this function once before starting the task, saving the value in a
variable of type *long*. Then call this function again after the task, and subtract the
two values to get the time, in milliseconds, that it took to perform the
task.

Note that there is always some variation in run time, mostly due to the fact that the computer is always doing more than just working on your program. In particular, measurements that aren't at least a few milliseconds are pretty much useless. In general, you want to the task that you are measuring to take at least 10 milliseconds, preferably longer.

**Second**: Remember how to use random numbers in Java. The
function *Math.random*() returns a randomly selected *double*
in the range from 0.0 to 1.0 (not including 1.0). To get a random
integer in the range 0 to *N*, inclusive, you can use
`(int)((N+1)*Math.random()`. To get a random
lower-case letter, you would use:

char ch = (char)( 'a' + (int)(26*Math.random()) );

This leads to the following method for creating random word of a given length where "word" in this case just means a sequence of lower-case letters:

/** * Returns a random string of length n made up of randomly * selected lower case letters. */ static String randomWord( int n ) { String word = ""; for (int i = 0; i < n; i++) { char ch = (char)( 'a' + (int)(26*Math.random()) ); word = word + ch; } return word; }

**Third:** You will need the selection sort and
insertion sort algorithms that we looked at in class.
Here are methods that implement those algorithms
for an array of type *double*`[]`:

/** * Sort an entire array of doubles, using selection sort. */ static void selectionSort(double[] A) { for (int top = A.length-1; top > 0; top--) { int maxLoc = 0; for (int i = 1; i <= top; i++) { if (A[i] > A[maxLoc]) maxLoc = i; } double temp = A[maxLoc]; A[maxLoc] = A[top]; A[top] = temp; } } /** * Sort an entire array of doubles, using insertion sort. */ static void insertionSort(double[] A) { for (int top = 1; top < A.length; top++) { int pos = top; double temp = A[top]; while (pos > 0 && A[pos-1] > temp) { A[pos] = A[pos-1]; pos--; } A[pos] = temp; } }

### Part 1: Linear versus Binary Search

The *WordList* class looks up words in an array using linear search.
The *SortedWordList* class does the same thing using binary search, but
it has to sort the array before it can do this.

For Part 1 of the lab, you should devise experiments to determine, numerically, two things: First, How much faster is binary search than linear search in this case? And second, How long does it take to sort the array?

For the first question, it won't be sufficient to measure the time that it takes to search for a single word, since that time is too short to measure with reasonable accuracy. So you will want to search for a bunch of words and measure the total search time for all of them, using each of the two algorithms. Consider making an array of random words, using techniques discussed in the previous section.

For the second question, note that constructing an object of type
*WordList* involves loading the array from a file, while
constructing an object of type *SortedWordList* involves
both loading the array from a file and then sorting that array.

You should write a program to do the experiments. It's possible to do both experiments in the same program. You will want to run the program several times, since there will be some variation in the run times. To get the best possible answers, you should take the average of the times from several runs.

(By the way, searching and sorting in *SortedWordList* are done
using the built-in functions *Arrays.sort* and *Arrays.binarySearch*
from the class *java.util.Arrays*.)

### Part 2: Three Sorting Algorithms

For the second part of the lab, you want to write a program that
compares the time it takes to sort an array using three different
sorting algorithms: Selection Sort, Insertion Sort, and
the algorithm that is used by the built-in function
*Arrays.sort*(). You will apply these algorithms
to arrays of type *double*`[]`.

Methods for selection sort and insertion sort are given above.
You can use them in your program. *Arrays.sort*() is available
as long as you import it from the package *java.util*.

Your program should sort arrays of various sizes using all three
algorithms. You should do this at least for arrays of length
1000, 10000, and 100000. For *Arrays.sort*(), you can also
do arrays with a million elements or more. (You might
have trouble measuring the sort time for *Arrays.sort*()
on short arrays, so use some longer ones.)

You should fill your arrays with random real numbers. It's a good idea to make several identical arrays, so that you can apply the algorithms to the same input.

You don't necessarily have to write a program that will do all the experiments in one run. You can use a program in which the length of the arrays is given by a constant. Then run the program several times, with different values for that constant.

### Part 3:Results and Analysis

For the written lab report, you should present the results of your experiments. That is, report all the run times that you measured (and say exactly what it was that you measured in each case).

In addition to just reporting the results, you should answer the following questions.

**About Part 1:**

- How many times faster was binary search than linear search in this application?
- How long did it take to sort the array?
- Suppose that you have the unsorted list of words, and you need to search for some words in the list. Is it worth sorting the list first? The answer depends on how many words you want to look up.

**About Part 2:**

**First**, Write a paragraph discussing how the run times of the three algorithms compare.

**Finally**, I would like you to do some math. Saying that the run time of an algorithm is
Θ(n^{2}) means that the run time is approximately proportional to n^{2}.
So, we can expect that there is some constant K such that the run time is approximately
equal to K*n^{2}. If you have measured the run time, it is possible to solve for K and
say that K is approximately equal to the run time divided by n^{2} (and we can expect the
approximation to be better for larger values of n). Once you have K, you can predict
the run time, at least approximately, for larger values of n by plugging n into
the formula K*n^{2}.

Similarly, saying that the run time of an algorithm is Θ(n*log(n)) means that the run time is approximately equal to K*n*log(n), for some constant K. If you've measured the run time (for an appropriately large value of n), you can get an approximate value for K by dividing the run time by n*log(n). (Note that any logarithm function will work here, as long as you use the same one consistently.)

Using these ideas, the fact that the run time for insertion sort is Θ(n^{2}), and
the fact that the run time for *Arrays.sort* is Θ(n*log(n)), you should calculate answers to the following questions:

- About how long would it take to sort an array of one million
`doubles`using insertion sort? - About how long would it take to sort an array of one billion
`doubles`using insertion sort? - About how long would it take to sort an array of one billion
`doubles`using`Arrays.sort`?

Present your calculations and your actual numerical predictions. Please convert large numbers of milliseconds to a more natural measure of time.