Lab 6: Random Expressions

This week is the first of two labs that deal with Expression Trees. You will want to be sure to finish this week's lab before next Tuesday, but your work for the two labs will be turned in together, two weeks from the date of this lab.

The basic idea here is that we imagine that we have a set of sample data, and we would like to find an expression that matches that data as well as possible. The data is in the form of input/output pairs. That is, we have a set of input values to some function or physical process, and we have the output that is produced for each input value. We would like to find an expression that will give approximately the same output for each of the sample input values.

In order to do this, we can try generating random expressions. The expressions can include the variable "x" to represent the value of an input. Each expression that we generate can be tested on all the sample inputs, and we can keep track of which expression gives the best overall approximation to the sample outputs. After a certain amount of time, we simply take the best expression that we have found so far.

For this lab, you will implement the procedure described in the preceding
paragraph. Next week, we will try to apply *genetic programming* to the
problem and see how much better it does (if at all).

The file */classes/s09/cs225/Expressions.java*
contains several static
nested classes that define expression nodes, as we discussed in class. **You
should get a copy of this file and add it to your Eclipse project.** To keep
things simple, you can add put the code that you write for the two
labs into this file.

The file contains a recursive *copy()* method that makes a full copy of
an expression tree. You will need this method next week, but for now it just
serves as a model of recursive tree processing. It might help you to think about
the *randomExpression()* method that you will have to write.
The file also contains a
*makeTestData()* method that is meant to produce the sample input/output
data that you want to match. Finally, there is a *main()* method whose
purpose is to test the other stuff in the file; you will have to replace this
method.

Your first task is to write a recursive method that creates a random expression. You don't want to make a tremendously complex expression (and certainly not one that is infinitely long...), so the method should have a parameter that limits the "height" of the expression tree. The height is the greatest number of steps from the "root" of the tree to one of the "leaves" at the bottom of the tree. In the recursive method, the height of the tree is the same as the number of levels in the recursion. The method should have a parameter should represent the maximum height:

static ExpNode randomExpression(int maxHeight) {...

The method can randomly decide what type of node to create. If *maxHeight* is
zero, the only possibilities should be a constant node or a variable node.
If the maxHeight is greater than zero, any type of node is possible (even
constant or variable nodes; maxHeight gives the maximum allowable height, not the
actual height). To make a binary operator node, you have to create random
expressions for the left and right operands of the binary operator; you can do
this recursively, passing maxHeight-1 as the value of the parameter. (Note: For
constant nodes, I suggest that you limit yourself to fairly small numerical values.)

Add some code to the main method to test your function. Print out a bunch of randomly created expressions and their values.

Once you have a random expression, you have to test it against the sample
input/output data created by the *makeTestData()* method. When you do this,
you will have two outputs for each input -- the one in the sample data and the
one produced by your expression. The expression's outputs are supposed to be
an approximation for the sample outputs. The question is, how to measure the closeness
of the approximation?

A common measurement for such situations is the *Root Mean Square Error* (RMSE).
This is given by the square root of the average of the squares of the errors for
all the individual inputs. Mathematically, the RMSE is defined as

where *x _{i}* are the sample inputs,

static RMSError( ExpNode f, double[][] sampleData ) {...

to compute the Root Mean Square Error. Once you have this method, you
can write a *main* method that generates a bunch of random expressions,
finds the RMSE for each of them on the sample data, and keeps track of the one
that has the smallest RMSE. Do this in a while loop that runs for, say,
10 seconds. After the loop ends, print out the best expression found and its RMS Error.

(**Note:** There is one problem that you might want to think about. It is
possible for the value of an expression to be infinite (if it divides a non-zero
number by zero) or NaN (if it divides zero by zero). Such things can happen
easily in randomly generated expressions. If you simply use the formula given
above for the RMS Error, the answer will be infinite or NaN. This is not a big
problem for today's lab, but you might want to think about what would be the right value for
the RMSE when it happens. It will be more of an issue next week.)

The expression node classes that I have given you include only a constant node
class, a variable node class, and a binary operator class. Add one more type
of expression node, *UnaryOpNode*. Use it to represent (at least) the mathematical
functions *sin*, *cos*, *exp*, and *abs*. Each of these functions
can be considered to be an operator that applies to just one operand. (The "unary minus"
operator discussed in the book is another example, and you can add that to the class if
you want.)

I suggest that you use integers to code the possible functions (for example, 0 for *sin*,
1 for *cos*,
and so on). You might want to define constants such as "`final static int COS = 1`"
to give names to the constants.

Besides defining the *UnaryOpNode* class, you will have to take account of the
existence of this class in the *copy()* method and in the *randomExpression()*
method. Don't forget to test your work.