CPSC 327: Hash Table Assignment

CPSC 327, Spring 2004
Assignment: Hash Tables

This assignment is due on Monday, March 1. You should turn in copies of all the files that you write, and you should make sure that your program is available for testing in your account on the department server. This is not an assignment that you will be able to do overnight! I remind you that this is an individual assignment. You should not work on this assignment with other people in the class.

In class, we are looking at the Set Abstract Data Type and at how to use Hash Tables to implement the Set ADT. One natural use of a set is to hold the words in a spelling dictionary. This makes looking up the words very efficient. In this assignment, you will implement and use such a dictionary. You can write the program for this assignment in either Java or C++. No matter which language you use, you are responsible for following the rules of good programming style.

For the first part of the assignment, you should write a class that implements an Open Hash Table of strings. That is, the table is stored as an array of simple linked lists of strings (or, more precisely, an array of pointers to such lists). You might need to look up the implementation of simple linked list operations in your Java or C++ textbook. Your class should be an implementation of the ADT Set of Strings. The class must include:

A function add(s) that adds the string s to the table, if it is not already there.
A function remove(s) that removes the string s from the table, if it's there.
A function contains(s) that returns a boolean value that checks whether the string s is in the table.
A function size() that returns the number of strings in the table.
A hash function to compute the hash code of a string. This function can be private.
A constructor that accepts the size of the table as a parameter.

The second part of the assignment is to write a main program that uses your hash table class. The main program will behave like the standard Linux utility program ispell, when that program is used interactively to check individual words. To try it, enter the command ispell on the command line. You will be prompted to enter words. When you type a word, there are three possible responses: An output of "ok" means that the word is in the dictionary. An output of "not found" means that the word is not in the dictionary and neither are any near misses. Otherwise, the output is a list of near misses, that is, words that are in the dictionary and are similar to what you typed. The ispell program recognizes several types of near-misses: single-letter omissions, single-letter additions, transpositions of two neighboring letters, substitution of one letter for another, and two words run together. Your program should do the same.

The file /usr/share/dict/words contains 45378 words, with one word per line. (This is not the same dictionary used by ispell.) Your program will use this file for its dictionary. It should create a hash table of an appropriate size for the number of words. It should read the words from the file and add them all to the hash table. Then it should prompt the user to enter words. For each word, it should check whether the word is in the dictionary. If so, it should say "ok". If not, it should look for all possible near misses. If the program finds any near misses in the dictionary, it should print them. If not, it should say "not found".

One issue that you will have to settle is what to do with upper-case letters. Note that both the words in the dictionary and the user's input can contain upper case letters. I suggest that you simply convert all letters to lower case.

Searching for near misses will require a significant amount of programming. You will have to generate every possible near miss and look for it in the dictionary. Specifically:

Construct every string that can be made by deleting one letter from the word. (n possibilities, where n is the length of the word)
Construct every string that can be made by inserting any letter of the alphabet at any position in the string. (26*(n+1) possibilities)
Construct every string that can be made by swapping two neighboring characters in the string. (n-1 possibilities)
Construct every string that can be made by replacing each letter in the word with some letter of the alphabet. (26*n possibilities (including the original word n times, which is probably easier than avoiding it))
Construct every pair of strings that can be made by inserting a space into the word. (n-1 pairs of words; you have to check separately in the dictionary for each word in the pair)

Ideally, you should list the near misses in alphabetical order, as ispell does.

If you would like to do some extra credit work on this assignment, talk to be about writing a program that can spellcheck files instead of individual words.

David Eck

CPSC 327, Spring 2004 Assignment: Hash Tables

CPSC 327, Spring 2004
Assignment: Hash Tables