This course ended on May 14, 2013

## CPSC 327: Data Structures and Algorithms

```   Department of Mathematics and Computer Science
Hobart and William Smith Colleges

Spring 2013.

Instructor:  David J. Eck  (eck@hws.edu)

Course Handout:  http://math.hws.edu/eck/courses/cpsc327_s13.html

Monday, Wednesday, Friday, 3:00 -- 3:55 PM
Room Lansing 301.

```

### First Week: January 23 and 25

The first week is an introduction to the course. You should read Chapter 1, especially Sections 1.1, 1.2, and 1.3. Note that Section 1.4 is there mostly for later reference. The textbook assumes that you already know something about "graphs" (where a graph is a set of "vertices" and a set of "edges" between vertices). That's probably not true, and I will cover graphs and their representations in class when when we need them. Also, the only type of tree that you are likely to have seen is binary trees, and you can skip over the stuff about more general trees for now.

For your first homework assignment, you should do the following exercises from the textbook:

```   From Section 1.1:  5, 6
From Section 1.2:  9
From Section 1.3:  9
From Section 1.4:  2, 3, 7
```

You can discuss the exercises with other people in the course, but in the end, you should write up your own solutions in your words.

This homework is due in class next Wednesday, January 30.

It's not too early to start thinking about doing your first short presentation! Some of the exercises in Chapter 1 might make good topics.

### Second Week: January 28 and 30; February 1

This week, we'll be looking at the mathematical analysis of algorithm efficiency. We'll look at the classes Ω(f(n)), O(f(n)), and Θ(f(n)). And we'll compute the run-time efficiency of simple algorithms. The reading for the week is Sections 2.1, 2.2, and 2.3. The homework, which is due next Wednesday, is

```     Section 2.1, # 7, 9
Section 2.2, # 9, 10
Section 2.3, # 3, 5, 6, 13
```

### Third Week: February 4, 6, and 8

We are going to jump over the rest of Chapter 2 for now and get started on Chapter 3. We'll get back to Section 2.4 later.

The reading for the week is Chapter 3, Sections 1, 2, 3, and 4. This chapter covers "brute force" and "exhaustive search" algorithms, often the simplest but least efficient way of solving a problem. The first three algorithms in the chapter are selection sort, bubble sort, and sequential search. These are mostly left for reading. (The idea in Section 3.2 of adding the search term to the end of the list that you are searching is something that you should be aware of.) We will spend most of our time on string matching, the convex hull problem, and various examples of exhaustive search in Section 3.4. By the end of the week, we should start talking about graphs and how a graph can be represented as an adjacency matrix and as adjacency lists.

Here is the homework for the week, including a programming assignment:

```      Section 3.1, # 10
Section 3.2, # 5
Section 3.3, # 5
Section 3.4, # 6, 8
```

Programming Assignment: Write a program to implement the brute force convex hull algorithm described on page 112 and 113. The input is a list of points in the plane. Every line of the input file contains two numbers, separated by a space, that give the x and y coordinates of one of the points. The goal is to output a list of the line segments that make up the boundary of the convex hull of the given set of points. You can assume that no three input points lie on a line. I will put some sample input files in the directory /classes/cs327.

All the homework is due next Wednesday, February 13. The written homework should be turned in in class. Your program should be copied to your homework folder, inside the directory /classes/cs327/homework.

### Fourth Week: February 11, 13, and 15

The topic for the week is graphs. We will define related concepts such as graph, directed graph, weighted graph, paths in graphs, acyclic graph, connected graph, and tree. We'll look at the two basic graph traversal algorithms, depth-first search and breadth-first search. The reading is the material in Section 1.4 on graphs and trees (pages 28--35), and the section on graph traversal, Section 3.5. On Friday, we will cover topological sorting of directed acyclic graphs, which can be found in Section 4.2 in the textbook.

The following homework is due next Wednesday, February 20:

```     Section 3.5, #1, #4, #6, and either #8a or #8b (you choose a or b).
```

There is also a programming assignment that is due on Monday, February 25: You can do this assignment either working alone or with a partner, but if you are working with a partner, you have to do a little more programming.

For the assignment, you will write two programs. Both programs will work with basic (undirected, unweighted) graphs. One program will represent the graph using an adjacency matrix. The other program will represent the graph using adjacency lists. For the adjacency lists, you can make your own linked lists, or you can use Java's LinkedList class.

Each program should be able to read a description of the graph from a file. The file format is as follows: The first line contains two integers, separated by a space, where the first integer is the number of vertices in the graph and the second integer is the number of edges. After that, there is one line for each edge. A line specifies an edge as two numbers, separated by a space; the numbers give the two vertices that are joined by the edge. (Vertices are numbered from zero to N-1, where N is the number of vertices in the graph.)

After reading the file and building a representation of the graph, the programs should find all the connected components of the graph. One of the programs should use breadth first search to do this, and the other program should use depth first search. The output from the program should list the size of each connected component. For example:

```     Connected component 1:  17 vertices
Connected component 2:  5 vertices
Connected component 3:  67 vertices
```

If you are working alone, that is all you need to do. If you are working with a partner, then your programs must also determine whether or not the graph is acyclic. If the graph is acyclic, the program should simply say that the graph is acyclic. However, if the graph contains a cycle, then the program should print out a cycle by listing the vertices that occur in the cycle. A graph can, in general, contain many cycles; you only need to list one of them.

Sample data files for your programs can be found in /classes/cs327/graph-files.

### Fifth Week: February 18, 20, and 22

We will be going back to cover recurrence relations, Section 2.4, which we omitted the first time through Chapter 2. The main point to think about is the run-time analysis of recursive algorithms, where you can't simply count the number of iterations of loops.

The programming assignment on graph representations, depth-first search, and breadth-first search is due next Monday, February 25. The following written homework is also due on that day:

```         Section 2.4, # 1b and 1c (by backward chaining)
# 4a and 4b (by forward chaining)

Section 4.2, # 1a and 5b
```

Also, remember that there is a test coming up next Wednesday, March 27. An information sheet is available.

### Sixth Week: February 25 and 27; March 1

Posted late... Much of the week was taken up by the test on Wednesday, prep for the test on Friday, and going over the test on Friday. Aside from that, we made some progress in Chapter 4. There was no new reading for the week.

Here is a link to a copy of the take-home part of the test: cs327-takehome1.pdf

### Seventh Week: March 4, 6, and 8

This week, we will be looking in detail at three algorithms from Chapters 4 and 5: the k-th order statistic algorithm, QuickSelect, pages 158-161; multiplication of large integers, pages 187-189; and the efficient closest-pair algorithm, pages 192-195. We will also look at other examples of "divide and conquer" that you have already seen, such as merge sort and QuickSort. As part of the order statistic algorithm, we will revisit the partitioning method that is used in QuickSort.

You should also read the introduction to Chapter 5 and go through Sections 5.1 and 5.2 to be sure that you understand Merge Sort and QuickSort.

### Eighth Week: March 11, 13, and 15

We will cover selected topics from Sections 6.1, 6.4, and 6.3, including: "presorting" as a technique for transforming one instance of a problem into another, more tractable instance; heaps, Heapsort, and using a heap to implement a priority queue; and a brief look at balanced binary trees and 2-3 trees.

Homework #6 and Programming Assignment #3 are due on Friday, March 15.

Next week is Spring Break. Have fun!

### Ninth Week: March 25, 27, and 29

We begin the week with some basic examples of "problem reduction" from Section 6.6. Then we move on to Chapter 7, "Space and Time Tradeoffs." We will cover the Boyer-Moore algorithm for string matching, hashing, and (probably taking us into next week) B-Trees.

Homework #7 is due on Friday, March 29. The assignment sheet also contains some information about the second round of student presentations.

Note: In the end, we did not cover the Boyer-Moore algorithm, and we cut short Chapter 7 at the end of the week.

### Tenth Week: April 1, 3, and 5

The reading for the week is Chapter 8, which covers Dynamic Programming.

There is a test coming up next week, on Friday, April 12.

The next programming assignment is due at 4:00 PM on Tuesday, April 9. It has two parts, the first on closed hashing and the second on dynamic programming.

Programming Assignment #4, Part 1: The assignment is to write a program that implements closed hashing, using linear probing, and use it to test the formulas for the expected number of probes. You can use int values as keys, and use a hash table of type int[]. You will need to implement insert and find operations. You do not need a delete operation. (The find operation should just check whether the key is in the table.) Your basic experiment should be able to easily vary the size of the table and the load factor. You should run trials for various table sizes and for various load factors, including some that are close to one. You should run a large number of successful searches and a large number of unsuccessful searches, and you should compute the average number of probes. According to the book, the average number of probes for a successful search and unsuccessful searches should be

```         S  =  ( 1 + 1/(1-α) ) / 2

u  =  ( 1 + 1/((1-α)*(1-α)) ) / 2
```

where α is the load factor, and the approximation should improve as the size of the table increases. (You might want to run your experiments using several different hash functions. The formulas are for an ideal hash function and randomly distributed keys.)

You should write up a report that explains your procedures and presents your results and conclusions.

Programming Assignment #4, Part 2: Pick one of the dynamic programming applications that uses a two-dimensional table, other than the "robot coin" problem. Write a program that implements the dynamic programming approach to the problem and prints out the solution. The program should print out the full solution, not just the minimum or maximum value. You should make your program easy to test, without having to recompile it.

Candidate problems include: edit distance, longest common subsequence, knapsack problem, and optimal binary sort tree. The first two are not in the book, but were covered in class. The last two are in the book, but were not covered in class.

### Eleventh Week: April 8, 10, and 12

There is a test on Friday. An information sheet is available

Aside from the test, we will be finishing up Dynamic Programming (Chapter 8) and starting on Greedy Techniques (Chapter 9). We should cover Floyd's all-pairs shortest path algorithm from Chapter 8. Hopefully, we will have time to do at least one of the algorithms from Chapter 9.

Programming assignment #4 is due on Tuesday, at 4:00 PM.

### Twelfth Week: April 15, 17, and 19

We are working on "Greedy algorithms." We will finish up Dijkstra's single-source shortest path algorithm and move on to Prim's and Kruskal's minimal spanning tree algorithms. The reading is Chapter 9, Sections 1 to 3. We might move on to start Chapter 10 before the end of the week.

The take-home part of the second test is due on Wednesday

We are starting a second round of presentations. Tom leads off on Monday, with a presentation about hash functions. Jimmy is up on Friday to talk about data compression algorithms.

### Thirteenth Week: April 22, 24, and 26

We will finish Chapter 9 by briefly covering Kruskal's minimal spanning tree algorithm. I will discuss Critical Path Analysis, which is not covered in the book but is based on the "longest path in a dag" problem from the recent take-home test. After that, we will do the Stable Marriage Problem, Section 10.4, the only thing that we will do in Chapter 10. If we have time this week, we will begin a discussion of P and NP.

Here is the final programming assignment for the semester, which will be due sometime next week:

Write a program to implement Prim's algorithm. Use the final version that we looked at in class. The graphs to which you will apply the algorithm are sparse, and so you should use the adjacency list representation for the graphs.

The program will work with input files of the following form: The first line contains two integers giving the number of vertices in the graph and the number of edges in the graph. Following that, there is one line for each edge, which contains two integers and one real number giving the vertex numbers of the two endpoints of the edge and the weight of the edge. (Remember that for an undirected graph, each edge appears twice in the adjacency list representation.)

The program should compute and output the total weight of a minimal spanning tree for the graph. However, if the graph described by the input file is not connected, the program should say so instead of outputting the total weight. (My program also outputs each edge that is added to the tree, but you are not required to do that.)

Sample input files for the program can be found in /classes/cs327/prim-files. The program should take only a few seconds, even for the largest graph. Here are the weights found by my program (which I do not guarantee to be correct):

```        prim5.dat:       5.3
prim10.dat:      45
prim25.dat:      56
prim50.dat:      graph is not connected
prim100.dat:     171.088
prim1000.dat:    3937.391
prim10000.dat:   17558.412
```

### Fourteenth Week: April 29; May 1 and 3

The final programming assignment is due on Wednesday.

We will spend the week on P, NP, and NP completeness. In the book, this material is in Section 11.3, but I will be going a little more in-depth than the textbook. We might also do a bit from Section 12.3, on approximation algorithms for NP-complete problems.

### Fifteenth Week and Final Exam: May 6 and 14

Monday, May 6 is the last day of classes, and will be review for the exam. The final exam takes place in our regular classroom at 8:30 PM on Tuesday, May 14. Here is the information sheet.

Here are my office hours for the end of the semester (you might be able to find me in my office at other times too):

```       Monday, May 6:    1:30 to 3:00
Tuesday, May 7:   1:30 to 3:00
Thursday, May 9:  2:45 to 4:00
Friday, May 10:   1:00 to 4:30
Monday, May 13:  11:00 to 3:00
Tuesday, May 14: 12:00 to 1:15
```