Programming Assignment 8

THIS IS THE NEXT-TO-TO LAST programming assignment for this course. It concerns the genetic algorithm, and it asks you to write a template class. The template class will implement the genetic algorithm. You will use it in two different programs. One will be equivalent to the example program, ga_text.cc. The other will be a completely different application in which you evolve finite automata.

This assignment is due on Tuesday, April 23. If you turn it in later than 4:00 PM on that day, there will be a 20% penalty. The absolute deadline for turning in the project with a penalty is 4:00 PM on the day after the final exam. (The exam is at 7:00 PM on Saturday, May 4.) This assignment will be worth 30 points.

(**About Assignment 9.**
As you know, the final assignment, Assignment 9, is to write a program
of your own choice. Assignment 9 will be graded for 25 points.
For full credit, it should be turned in by 4:00 PM on the first reading day,
Wednesday May 1. If you turn it in later than that, there will be
a 20% penalty. The absolute deadline for turning in the project with
a penalty is 4:00 PM on the day after the final exam. It is your
responsibility to choose a project of reasonable size and difficulty
(similar to the other assignments in the course). Although you are, of course,
welcome to learn new programming techniques for this project, I would
encourage you to stick to things that we have covered. I also encourage
you to talk to me about your ideas before you start the project.
If you want to work with someone on this project, please get my permission
in advance to do so. Here are a few general ideas: Something using
the ncurses library, such as a game or simple text editor (see the
programs *hangman,* and *worm,* for example). A useful class
library, such as one for working with ncurses. A program based on one
of the other chapters in the book *Artificial Life* from which the
Genetic Algorithms reading was taken. A Repeated Prisoner's Dilemma tournament,
like the one described in the Genetic Algorithms reading. Some kind of
data analysis program that would be useful in another course.)

The Genetic Algorithm has many applications in widely differing fields, but the same basic procedure applies in all these cases: An initial "population" of random "individuals" is generated. Somehow, a fitness is assigned to each individual. The individuals reproduce to make a new generation, with fitter individuals having a greater change of producing more offspring. In the course of reproduction, "genetic operators" such as mutation and crossover are applied. (For the sake of simplicity, we will consider only these two operators. Also, we assume that reproduction occurs all at once, to produce an entirely new generation from the current generation. Some versions of the Genetic Algorithm use a continuous reproduction model, where individuals can reproduce at any time.)

This is a natural application for templates. By creating a Population class template, you can code the mechanism of the Genetic Algorithm once, and then apply it easily to many different types of individuals. You should write a template class

template<class Individual> class Population { ...

The parameter class, *Individual,* can be assumed to have the
following features:

- A default constructor (with no parameters) that constructs a random individual.
- A copy constructor and an assignment operator with the correct semantics. (In many cases, the ones that are provided automatically have the right semantics.)
- A function
(with no parameter) that can be called during the reproduction process to determine the fitness of the individual. The value returned by this function should always be non-negative.`double getFitness()` - A function
(with no parameters) that mutates the individual by making some small change in its data.`void mutate()` - A function
that performs a crossover operation on this individual and the specified partner individual. Both individuals are changed by this operation.`void crossover(Individual &partner)`

These constructors and member functions must be present in any
class that is used as a parameter to the *Population* template.
(The class can, of course, have other members if it needs them. In particular,
it should have a destructor if one is needed to release memory allocated
on the heap by the constructors.)

Here are some detailed specifications for the *Population* template class.
All data members of the class must be private.
The class must define at least the following public members:

- Either a constructor that allows the population size to be set as a parameter, or a method that allows the population size to be set later. (In the later case, you will have to use dynamically allocated arrays.)
- An overloaded
that makes it possible to get a reference to any one of the individuals in the population.`operator[]` - A function
that does all the actual work of creating a new population and applying mutation and crossover operations to it. (This function will contain most of the code in the class.)`void breed()` - Functions for setting the mutation probability and the crossover probability that are to be used during reproduction.
- Functions for retrieving the average fitness of the population, the maximum fitness among all the individuals, and an individual of maximal fitness. (These values can be computed and saved in member variables when the population is created, rather than recomputing them every time a function is called.)

The *Population* template must be defined in a header file named
`Population.h`.

As an example, rewrite the ga_text example using
your template. That is, write an *Individual* class in which
the data is a string and the fitness of the individual is one plus the
number of positions where the string matches some "goal" string.
(This goal string should either be a global variable or a static member
variable of the Individual class. This is kind of ugly, but it's necessary
to fit the example into the framework of the template.)

Some of the code for your program can be copied from
ga_text.cc. In particular, you will
want to use the code for producing the new generation, based on the
fitness values of individuals in the current generation.
**Note** that as a practical matter, you should first write
*Population* as a regular **non-template** class.
Then just add "template<class Individual>" at the beginning
when you have it working. Generally, because of the difficulty that
the compiler has in reporting errors in templates,
it is **much easier** to write
templates in this way, rather than trying to define them as templates
from the beginning.

The second part of the assignment is to apply the *Population*
template to an entirely different type of individual. In this case,
the individuals will be finite automata. A finite automaton is a
simple kind of abstract computing device. It is similar to a Turing
machine, except that it doesn't have a tape. Instead, it simply reads
a string of inputs, and then produces some sort of output value.
It has an internal "state," which can change in response to the
input values that it reads. For this assignment, the internal state
is simply one of the N numbers between 0 and N-1, where N is
some specified positive integer.

The state of the automaton changes as it reads a string of input values. We will assume that the only possible input values are 0 and 1, and that the finite automaton outputs a single value --- also 0 or 1 --- at the end of its computation. The computation itself is specified by a set of rules. It is these rules that actually define the automaton. For each of its states, the automaton has three rules that tell it (1) what state to change to if it is in that state and reads a 1, (2) what state to change to if it is in that state and reads a 0, and (3) what value to output if it is in that state when it reaches the end of the input string. Items (1) and (2) are really just numbers between 0 and N-1 specifying the new state. Item (3) is just a 0 or 1 specifying the output value. So, the automaton is entirely specified entirely by 3 numbers for each state, a total of 3*N numbers.

When an automaton is run with a given input string of 0's and 1's, its computation is given by the following simple algorithm:

Set the current state to 0 For each item in the input string Change to the state specified by the current state and the input Output the output value specified for the current state

Now, what can we do with finite automata? If the output can only
be zero or one, we might try to use them for *pattern recognition*
or *classification.* Suppose that each string can be classified
into one of two cases: strings that display some particular pattern
and strings that don't. Possible patterns might be: starts with
a 0, has no 0's in even numbered positions, contains no substrings
of three consecutive 1's, every occurances of 0110 is followed by 1,
consists entirely of repetitions of the substring 001, contains the
substring 1111. All these patterns can be recognized by finite
automata. That is, there is a finite automaton that will read a
string and then output a 1 if the input string displays the pattern
and will output a 0 if the input string does not display the pattern.
(There are other patterns that *cannot* be recognized by
any finite automaton, such as: contains more 0's than 1's or
reads the same forwards as backwards.) The question is, instead of
trying to *program* a finite automaton to recognize a given
pattern, can we *evolve* one to do so? That's what you
are going to try to do.

Write a *Automaton* class that can be used in the
*Population* template class. In addition the the methods required
by the template, I suggest that you include a function such as
`int run(string input)`
to run the automaton with a given input string and return the value that
is output by the autmaton. Use 16 states (or make the
number of states a constant in the program). A random automaton
can be created simply by choosing random values for all the numbers that
define the automaton's rules. A mutation can be accomplished by changing
one of the numbers to a random value. A crossover will just swap some
of the rules between the two automata. The only really hard part is coming
up with a fitness value. (**Warning: The getFitness() function
should not actually compute the fitness. It should simply return
a fitness value that has already been pre-computed. You should compute
the fitness value for each individual before calling the population's
breed() method. Since this method might call getFitness() many times
for each individual, it would be extremely inefficient to compute the
fitness each time it is called.**)

The goal of the automaton is to classify strings. A fitness value can be computed by giving the automaton a bunch of strings to classify and grading it on how many strings it classifies correctly. The more strings it classifies correctly, the higher its fitness. There are various ways to apply this idea. One is to have a "training set" of input strings whose classification is known. Each automaton is tested on these strings. If you manage to evolve an automaton that gets a perfect score on the training set, you can hope that it will also be able to correctly classify strings that were not in the training set.

I will leave the details up to you. You might read the training set from a file, so that you can easily try your program on various patterns. Be sure to document what you are doing in your comments.

Here is an alternative view of the classification problem that
you might find more interesting. Imagine a long sequence of 0's and
1's. Given a series of, say, ten numbers from the sequence, an automaton
will try to predict, with better than 50% accuracy, the next number
in the series. This is really the same as the classification problem,
except that the "training set" consists of all the length-10 substrings of
a long string of 0's and 1's, and the classification (0 or 1) of each
substring is specified by the following number in the string. Since
the string might or might not show a pattern, you might or might not
be able to evolve an automaton that can predict with 100% accuracy.
But since any finite string has *some* pattern, you can probably
evolve one that can do somewhat better than 50%.

This could even be useful, if the input string represents some sort of experimental data. After being trained on some data, the automata could be asked to make actual predictions about the next experiment. Is is possible that you can evolve automata to recognize some subtle pattern than might not be apparent to a human observer? What if the experiment is "Did the stock market go up or down today"? Could your automata make a killing on the market?

More realistically, consider the game where two players each choose "heads" or "tails". If the players' choices match, the first player wins. If the players' choices are different, the second player wins. The game is repeated over and over. Is there any way to make money at this game? Let's say that the first player is a computer. Can the computer play in such a way that it will probably come out ahead in the long run? If the computer makes its choice at random, it can expect to break even in the long run, no matter what the opponent does. Can it do better? That is, can it predict its opponent's next choice, with better than 50% accuracy, based on the opponent's previous choices? Sound familiar? Just think 0/1 instead of head/tail. Since humans have a hard time making truly random choices, there might be some sort of pattern in the opponent's choices. If the computer can recognize the pattern, it can win more games than it loses. Suppose the computer has an evolving population of automata to make the predictions...

If you want to experiment with this idea, you could write your program as follows: Ask the user to enter a long random sequence of 0's and 1's (about 100). Use the first half of the sequence to train your sequence of automata. Then test them on the second half. Will they do better than 50% accuracy on the test? They shouldn't, if the numbers are really random. They will, if there is some sort of hidden pattern.

David
Eck, April 2002