Lab 2: Sorting Benchmarks

The point of this lab is to run some experiments to compare the run time of some sorting algorithms. The programming for this lab is not particularly difficult, but you will have to think about the design of the experiments. This lab, unlike most labs, has a written component. In addition to programming the experiments, you will have to write up a short lab report to present your results.

**Both the programming and the written parts
of the lab are due at the beginning of lab next week. The program should be
copied into your homework folder in /classes/s11/cs225/homework.
For the written part, you can either
turn in your report on paper or you can include it in a file along with your program.
**

You will need a copy of the file *SlowSort.java*, which you can find in
the directory */classes/s11/cs225*. `SlowSort` contains several static methods for
sorting arrays. **You should start a new Eclipse project and copy SlowStart.java into that project.**

The `SlowSort` class defines the following methods for sorting arrays of various types:

public static void selectionSort(double[] A) // sort the entire array public static void insertionSort(double[] A) // sort the entire array public static void selectionSort(double[] A, int count) // sort A[0]..A[count-1] public static void insertionSort(double[] A, int count) // sort A[0]..A[count-1]

The class also includes the same set of four methods for sorting arrays of type `int[]`
and arrays of type `String[]`.
All of these methods have average case run time Θ(N^{2}), where N is the number of items
that is being sorted. (When sorting random arrays, the average run time is what you expect to see.)

Now, there is a standard class in Java, `java.util.Arrays`, that includes some built-in
methods for sorting arrays. `Arrays.sort(A)` can be used to sort an entire array `A` of numbers or
strings. `Arrays.sort(A,0,N)` can be used to sort the first `N` items in an array `A` of numbers
or strings. The sorting methods used by the `Arrays` class have run time Θ(N*log(N)),
at least in the average case. (In fact, the sorting methods used by this class uses are
versions of MergeSort for sorting Objects and of QuickSort for sorting numbers.)

The goal of this lab is to investigate the run times of the various sorting algorithms and to
get a feel for the real difference between Θ(n^{2}) and Θ(n*log(n)). To do this,
you will construct arrays filled with random values, use various methods to sort them, and measure
the run times.

The basic idea is to make an array filled with random numbers or random strings. (You will have
to give some thought to the question of what it means to generate a "random string." Very short
strings are not a good idea, since you will get too many duplicate strings. Insertion sort, for example,
will perform much better than average on an array that contains many duplicates.)
It's a good idea to make multiple copies of each array, so that you can try several
sorting algorithms on the same data. Remember that just saying "`B=A`" does not
make a copy of the array `A`; it just makes `B` point to the same array as `A`.

To measure the time that it takes to sort an array, you can use the following pattern:

long startTime = System.currentTimeMillis(); ... // Sort the array long runTime = System.currenTimeMillis() - startTime();

This measures the run time in milliseconds, where 1 millisecond equals 1/1000 second. For a short array, the time might be measured as 1 or 2 or even 0 milliseconds, which is only a very crude measurement of the actual run time. You really want to use arrays that will give a run time of at least, say, a dozen milliseconds.

Make sure that you only measure the time that it takes to sort the array. Don't include the time that it takes to fill the array with random values.

**You should apply SlowSort.insertionSort(), SlowSort.selectionSort(), and Arrays.sort()
to arrays of type double[] and to arrays of type String[]. You can do int[] too, if you want,
but it's not required. You should sort arrays of size
1000, 10000, and 100000, at least. For Arrays.sort(), you can also sort arrays of several million
items -- or even larger if you ask how to increase the amount of memory that is made available to your
Java program.**

You can either write one program that does all the work, or you can write several very similar programs. You should output the results in a clear format, and you should follow the usual style rules for programming, including the use of comments.

For the written lab report, you should **first** present the results. Tables might be a good
format for this.

**Second**, you should have a brief discussion of the comparisons between the run times of different algorithms.
How does InsertionSort compare to SelectionSort? How do they compare to the algorithms used by `Arrays.sort()`?
Is there any difference between the sort times for `double[]` and `String[]`?

**Third**, you should try to come up with some explanations for the observations that you have made.

**Fourth** and finally, I would like you to do a bit of math. Saying that the run time of an algorithm is
Θ(n^{2}) means that the run time is approximately proportional to n^{2}.
So, we can expect that there is some constant K such that the run time is approximately
equal to K*n^{2}. If you have measured the run time, it is possible to get an estimate for K by
saying that K is approximately equal to the run time divided by n^{2} (and we can expect the
approximation to be better for larger values of n).

This means that you can use your run-time measurements for a particular algorithm to find
an approximate value for the constant K for that algorithm. Once you have K, you can use the formula
K*n^{2} to **predict** the run time for array sizes n that you have not measured.
Similar computations apply, of course, to an algorithm with run time Θ(n*log(n)).

**I would like you to do calculations of this type to make predictions about the following:**

- How long would it take to sort an array of one million
`doubles`using insertion sort? - How long would it take to sort an array of one billion
`doubles`using insertion sort? - How long would it take to sort an array of one billion
`doubles`using`Arrays.sort`?

Present your actual numerical predictions. It would be nice to convert from milliseconds to a more convenient measure of time.