Programming Assignment 7: Binary Sort Trees

ASSIGNMENT SEVEN asks you to do some work with
Binary Sort Trees. **It is due next Wednesday, April 3.**

We discussed Binary Sort Trees (BST's) in class and looked at
recursive subroutines for traversing a BST and for inserting an
item into a BST. If items are inserted into a BST in a random order,
then the process of searching for an item in the tree is very efficient --
much more than searching a linked list. Searching for an item in a linked list
with N items takes N/2 steps on average, even if the list is sorted.
For a linked list, the number of steps is closer to log_{2}(N),
which is much smaller. For this assignment, you will verify this fact
experimentally by constructing and analyzing random BST trees. The
trees will contain integers as items, so the node type for the trees is:

struct TreeNode { int item; TreeNode *left; TreeNode *right; };

You will need the *insertBST* method that we looked at in class.
To make a random BST tree with a given number, N, of nodes, start with
an empty tree. Then generate N random ints and insert them into the tree
one by one. When you are working on this part of the program, you might
want to use the *printBST* method to make sure that the items
in the tree are in fact in the correct order, but the final version of
the program will not print the trees.

Once you have a random BST, you can ask how many steps it takes to
find an item in the tree. There are two different questions you can
ask: What is the *maximum* number of steps, and what is the
*average* number of steps. Two terms can be helpful here.

- The
**depth**of a node in a binary tree is the number of nodes on a path from the root node to the given node. The root is at depth 1. The children of the root are at depth 2, the children of the children of the root are at depth 3, and so on. - The
**height**of a binary tree is the maximum depth of any node in the tree. It is the number of nodes on the longest path starting from the root.

The height of a tree is the maximum number of steps it will take
to find an item in the tree. The average number of steps is equal to the
average depth of all the nodes in the tree; it can be found by adding
up the depths of all the nodes and dividing by the number of nodes.
To find these values, you will need two recursive functions:
a function to find the height of a tree and a function to add up the
depths of all the nodes in the tree. To write the height function, you
should think about how the height of a non-empty tree can be computed
from the heights of its sub-trees. In the sum-of-depths function,
you can keep track of the depths of the nodes that the function visits
by using an int parameter (named *depth* perhaps) that goes up
by one every time the function calls itself recursively.

You should also have some way of deleting all the nodes of a tree.
Since your program will be building a lot of trees, you should reclaim the
memory used for each tree after you are done with it. You can write
a recursive function to delete the nodes of a tree. (It is not sufficient
just to apply the **delete** operator to the root node, since that
will not delete the nodes in the subtrees.)

The object of the program is to run experiments in which you create random BST's and determine their heights and average depths. You should write a function

void experiment(int numberOfNodes, int &height, double &averageDepth);

to run one experiment. The first parameter gives the number of nodes in the tree. The second parameter is used to return the height of the tree that is created. The third parameter is used to return the average depth of all the nodes in the tree.

To review, here are the functions you will need:

- find the height of a binary tree
- find the sum of the depths of the nodes in a binary tree
- delete all the nodes of a binary tree
- Create a random BST with N nodes, and find its height and average depth

Your main program should create 1000 random trees containing a given number of items. Do this by calling the last of the above functions 1000 times. The program should report the average height of the 1000 trees and the average average depth of the 1000 trees. You might also want to print out the maximum height that was observed for all the trees.

You should do this for a variety of sizes (either by having the program do it automatically or by running the program several times for given sizes). Write a paragraph describing how the answers depend on the number of items in the tree, and turn in this paragraph along with a printout of your program. You should place a copy of the program in your homework directory.

As I mentioned in class, the final assignment for this class is to come up with an idea for a programming project and to do it. It will only count for 30 points, and it does not have to be much longer than the rest of the assignments for the class. The main point is for you to design a project that you want to do. I might throw out ideas for projects in class, but I will not specify any details. The project can be either a program or a small library of classes that could be useful in a variety of programs. You are encouraged to discuss your ideas with me. The project can be turned in any time up to the day after the final exam. (The exam is on Saturday, May 4.)

David Eck, 25 March 2002