Data Structures and Algorithms
Lab 1, January 26, 2004
This lab has two unrelated sections. In the first, you will look at a program that lets you both visualize and time sorting algorithms. In the second section, you will learn a bit about CVS, the "Concurrent Versioning System." There are several homework exercises associated with the lab; these are due next Monday, February 2.
Part 1: xSortLab
In the first part of the lab you will use a program called xSortLab. This program is meant for learning about and experimenting with five different sorting algorithms: Selection Sort, Insertion Sort, Bubble Sort, Merge Sort, and QuickSort. (The version of Bubble Sort in this program is slightly different from the one you saw in last week's homework.) We already know that Selection, Insertion, and Bubble Sort have average-case run time Big-Theta(n*n). QuickSort and Merge Sort, it turns out, have average-case run time Big-Theta(n*log(n)).
The program is written in Java. To run it, you can use the command:java -jar /home/cs327/xSortLab.jar
The program has three screens, which can be selected using the pop-up menu at the top of the window. In the Visual Sort screen, you can watch as the computer sorts an array of 16 bars into order of increasing height, using any one of the five sorting algorithms. You might be interested in looking at Selection Sort, Insertion Sort, and Bubble Sort, if you want to get a better handle on how they work. We will cover QuickSort in class soon. However, the topic of the first exercise is Merge Sort, which we will not cover in class.
Homework Exercise 1: Use the "Visual Sort" in xSortLab to learn about Merge Sort, and figure out how it works. As the sort runs, you will see a description of what is happening at the bottom of the window. Try running the sort with the "Fast" box checked to get an overview of the algorithms. Try it with the "Fast" box unchecked, or by clicking the "Step" button over and over, to get a more detailed view. For your homework, turn in an English or pseudocode description of the Merge Sort algorithm, and try to explain why Merge Sort is a Big-Theta(n*log(n)) algorithm. (Hint: How long does each "phase" take for an array of size n? How many phases are there for an array of size n? Why?)
For your second homework exercise, you will use the "Timed Sort" screen. Select it using the pop-up menu at the top of the window. This screen allows you to collect empirical data about the run time of the five sorting algorithms on arrays of various sizes. (If you don't want to write down the data as you collect it, you can use the "Log" screen. This screen contains the data from all the experiments that you do. After doing the experiments, you can go to the "Log" screen and save the log in a file.)
In the "Timed Sort" screen, you can specify both the array size and the number of arrays. The run time is reported as the "Approximate Compute Time." You might also be interested in checking out the number of comparisons and copies that are done during the sorting. Run times can only be measured approximately. If you sort a single small array, the run time is likely to be reported as a very small number, even zero. To get an accurate run time, you can sort more arrays. For example, you can set the number of arrays to 1000 and then divide the reported run time by 1000 to get the average time it took to sort each of the arrays.
Homework Exercise 2: Use xSortLab to rank the five sorting algorithms from slowest to fastest for arrays of size 5. (You can sort two million arrays of this size -- there is a limit of 10,000,000 for the total size of all the arrays. You will need to use 2000000 as the number of arrays to get a reasonable compute time.) Do the same thing for arrays of size 10000. Report your results. Is the ranking the same in both cases? Would you expect it to be the same for both array size? Explain.
Just for fun, you might want to try Merge Sort and Quick Sort on an array of size one million or even ten million. Don't try this for the other three methods.
For Quick Sort and Merge Sort, we expect that for large values of n, the average run time on an array of size n will be approximately c*n*log(n), where c is some constant. (Note that you can use any log function, not just the base-2 log, since using another log function only changes the constant c.) You can get an approximate value of c by measuring the run time and dividing the answer by n*log(n). You can get a better idea of c by repeating this calculation for several values of n.
Similarly, for Selection Sort, Insertion Sort, and Bubble Sort, the average run time for large n will be approximately c*n*n for some constant c. You can find an approximate value of c by measuring the run time and dividing the answer by n*n.
Homework Exercise 3: Pick one of the algorithms QuickSort or Merge Sort. Estimate the value of c, as discussed above. Then use the formula c*n*log(n) to estimate how long the algorithm will take to sort an array of size one billion (1,000,000,000). Do the same thing for one of the algorithms Selection Sort, Insertion Sort, or Bubble Sort, using the formula c*n*n. Report your data and your answers.
Part 2: Introduction to CVS
CVS is a system for managing revisions of a project, usually a programming project. It can store multiple versions of a project and keeps a history of all the revisions. It can be used over the Internet and makes it easy for several people to work on the same project. It is commonly used for Open Source projects, which can have contributors from all over the world. It can also be useful to help organize an individual project. It can allow you to experiment with the code, but with the option to revert easily to a previous version. Although CVS is a complex system, for an individual project you only need a few of its features.
This part of the lab will lead you through the creation of a project that you will use in a future homework assignment. The exercise for this week is just to work through the lab.
Homework Exercise 4: Work through steps 0 through 5, below. There is nothing to turn in for this exercise. Your account will contain two new directories. I will check that these directories exist and contain what they should.
Step 0: Define Environment Variables: It will be convenient to define some environment variables to make it easier to work with CVS. Edit your .bashrc file and add the following two lines:export CVSROOT="/home/username/.CVS" export CVSEDITOR=nedit
where "username" is replaced by your own user name and the value of CVSEDITOR should be the name of your favorite text editor. (If you don't define CVSROOT, you would have to say "cvs -d /home/username/.CVS" instead of just "cvs" in some of the commands that come up in the rest of the lab. If you don't define CVSEDITOR, you might find yourself using the vi editor.)
These changes will be effective only in a newly opened console window, so open a new window to do the rest of the lab.
Step 1: Make a CVS Repository: A CVS repository is a directory where CVS keeps the official copies of projects, including all the revisions that have been made to them. You can have a personal repository for your individual projects in your home directory. You will use this directory only through CVS, so you might as well make it a hidden directory (with a name that starts with a "."). You will use the directory named .CVS in your home directory. This name is already defined in the CVSROOT environment variable. All you have to do to create the directory and initialize the repository is give the commandcvs init
After you do this, the .CVS directory will already contain some administrative files that are used by CVS.
Step 2: Start a Project: Although you can start with an empty project, usually you start a project from some existing files (after you have them in a state where you think they are worth saving). In this case, you will use a project that I have already started. The files for the project are in the directory /home/cs327/randlist. To start a project based on these files, use the commands:cd /home/cs327/randlist cvs import -m "random lists" randlist me start
The "cvs import" command creates a project made up of all the files and directories in the current directory. You must be in the right directory when you give this command. Here, '-m "random lists"' is an initial log message for the project. CVS records this message in the project's history. (An initial log message is mandatory. If you leave it out, CVS will open the CVSEDITOR so that you can type the message in an editing window; the cvs command completes when you save and exit the editor.) The "randlist" parameter becomes the name of the project in CVS; it does not necessarily have to be the same as the name of the current directory. The "me" identifies the originator of the project or something like that; no one seems to care about it and it can have any value. The "start" could also have any value; it is the name of the initial version of the project. (It's possible to assign a name to any version, and its easy to recover any named version later.)
After you give this command, CVS has made copies of the files in the /home/cs327/randlist directory, and you have no further need for that directory. If you were starting a project from files of your own, you could now archive the original files -- you will be working on the versions controlled by CVS.
Step 3: Work on the Project: To work on a CVS project, you need a working copy of the project files; you never work on the versions stored in the CVS repository. To get a working copy of a project, use the "cvs checkout" command, giving the project name as the parameter:cvs checkout randlist
This creates a directory named "randlist" that contains all the project files plus an extra directory named CVS that contains CVS administrative information. You will not do anything with the CVS directory. Change into the randlist directory:cd randlist
The directory contains a "Makefile." This file is used with the "make" command to compile the project. To do the compilation, just use the command:make
You can then run the compiled program, which is named "randlist". It just creates and prints a list containing the 10 integers from 1 to 10 in a random order.
As a little exercise, edit the file randlist.cc and modify it so that the list that it creates contains the numbers from 1 to 100 instead of from 1 to 10. Note that you are making the changes only on your local copy of the file.
Step 4: Save The changes: When you've made some successful changes to the working copy of the project, you will want to store the changes to the official version in the CVS repository. You can do this with the "cvs commit" command. You must give this command in the directory that contains the working version of the project. The command must include a message, which is recorded in the project log. For example:
cvs commit -m "increase list size to 100"
Step 5: Add a New File: If you add a new file to the working directory, it is not automatically added to the project. For example, the compiled program created by the "make" command is not added to the repository when you commit the changes (and you don't want it to be). If you do want a new file to become part of the project, you have to use the "cvs add" command.
Create a new file named README in the randlist directory. (It can contain anything you like.) To add this file to the project, give the command:cvs add README
The file is not actually copied into the CVS repository until you do a "cvs commit." If you add the file, do a commit, and then checkout another version of the project, you'll see that the newly created version includes the README file -- as well as the changed version of randlist.cc
To learn more about CVS, you can look at its man page. You can do this with the command "man cvs". However, it's much nicer to look at man pages in konqueror. Just open a konqueror window and enter "man:cvs" in its Location box. If you ever use CVS to work on a project with other people, you will at least want to know about the "cvs update" command.