THIS IS THE NEXT-TO-TO LAST programming assignment for this course. It concerns the genetic algorithm, and it asks you to write a template class. The template class will implement the genetic algorithm. You will use it in two different programs. One will be equivalent to the example program, ga_text.cc. The other will be a completely different application in which you evolve finite automata.
This assignment is due on Tuesday, April 23. If you turn it in later than 4:00 PM on that day, there will be a 20% penalty. The absolute deadline for turning in the project with a penalty is 4:00 PM on the day after the final exam. (The exam is at 7:00 PM on Saturday, May 4.) This assignment will be worth 30 points.
(About Assignment 9. As you know, the final assignment, Assignment 9, is to write a program of your own choice. Assignment 9 will be graded for 25 points. For full credit, it should be turned in by 4:00 PM on the first reading day, Wednesday May 1. If you turn it in later than that, there will be a 20% penalty. The absolute deadline for turning in the project with a penalty is 4:00 PM on the day after the final exam. It is your responsibility to choose a project of reasonable size and difficulty (similar to the other assignments in the course). Although you are, of course, welcome to learn new programming techniques for this project, I would encourage you to stick to things that we have covered. I also encourage you to talk to me about your ideas before you start the project. If you want to work with someone on this project, please get my permission in advance to do so. Here are a few general ideas: Something using the ncurses library, such as a game or simple text editor (see the programs hangman, and worm, for example). A useful class library, such as one for working with ncurses. A program based on one of the other chapters in the book Artificial Life from which the Genetic Algorithms reading was taken. A Repeated Prisoner's Dilemma tournament, like the one described in the Genetic Algorithms reading. Some kind of data analysis program that would be useful in another course.)
The Genetic Algorithm has many applications in widely differing fields, but the same basic procedure applies in all these cases: An initial "population" of random "individuals" is generated. Somehow, a fitness is assigned to each individual. The individuals reproduce to make a new generation, with fitter individuals having a greater change of producing more offspring. In the course of reproduction, "genetic operators" such as mutation and crossover are applied. (For the sake of simplicity, we will consider only these two operators. Also, we assume that reproduction occurs all at once, to produce an entirely new generation from the current generation. Some versions of the Genetic Algorithm use a continuous reproduction model, where individuals can reproduce at any time.)
This is a natural application for templates. By creating a Population class template, you can code the mechanism of the Genetic Algorithm once, and then apply it easily to many different types of individuals. You should write a template class
template<class Individual> class Population { ...
The parameter class, Individual, can be assumed to have the following features:
These constructors and member functions must be present in any class that is used as a parameter to the Population template. (The class can, of course, have other members if it needs them. In particular, it should have a destructor if one is needed to release memory allocated on the heap by the constructors.)
Here are some detailed specifications for the Population template class. All data members of the class must be private. The class must define at least the following public members:
The Population template must be defined in a header file named Population.h.
As an example, rewrite the ga_text example using your template. That is, write an Individual class in which the data is a string and the fitness of the individual is one plus the number of positions where the string matches some "goal" string. (This goal string should either be a global variable or a static member variable of the Individual class. This is kind of ugly, but it's necessary to fit the example into the framework of the template.)
Some of the code for your program can be copied from ga_text.cc. In particular, you will want to use the code for producing the new generation, based on the fitness values of individuals in the current generation. Note that as a practical matter, you should first write Population as a regular non-template class. Then just add "template<class Individual>" at the beginning when you have it working. Generally, because of the difficulty that the compiler has in reporting errors in templates, it is much easier to write templates in this way, rather than trying to define them as templates from the beginning.
The second part of the assignment is to apply the Population template to an entirely different type of individual. In this case, the individuals will be finite automata. A finite automaton is a simple kind of abstract computing device. It is similar to a Turing machine, except that it doesn't have a tape. Instead, it simply reads a string of inputs, and then produces some sort of output value. It has an internal "state," which can change in response to the input values that it reads. For this assignment, the internal state is simply one of the N numbers between 0 and N-1, where N is some specified positive integer.
The state of the automaton changes as it reads a string of input values. We will assume that the only possible input values are 0 and 1, and that the finite automaton outputs a single value --- also 0 or 1 --- at the end of its computation. The computation itself is specified by a set of rules. It is these rules that actually define the automaton. For each of its states, the automaton has three rules that tell it (1) what state to change to if it is in that state and reads a 1, (2) what state to change to if it is in that state and reads a 0, and (3) what value to output if it is in that state when it reaches the end of the input string. Items (1) and (2) are really just numbers between 0 and N-1 specifying the new state. Item (3) is just a 0 or 1 specifying the output value. So, the automaton is entirely specified entirely by 3 numbers for each state, a total of 3*N numbers.
When an automaton is run with a given input string of 0's and 1's, its computation is given by the following simple algorithm:
Set the current state to 0 For each item in the input string Change to the state specified by the current state and the input Output the output value specified for the current state
Now, what can we do with finite automata? If the output can only be zero or one, we might try to use them for pattern recognition or classification. Suppose that each string can be classified into one of two cases: strings that display some particular pattern and strings that don't. Possible patterns might be: starts with a 0, has no 0's in even numbered positions, contains no substrings of three consecutive 1's, every occurances of 0110 is followed by 1, consists entirely of repetitions of the substring 001, contains the substring 1111. All these patterns can be recognized by finite automata. That is, there is a finite automaton that will read a string and then output a 1 if the input string displays the pattern and will output a 0 if the input string does not display the pattern. (There are other patterns that cannot be recognized by any finite automaton, such as: contains more 0's than 1's or reads the same forwards as backwards.) The question is, instead of trying to program a finite automaton to recognize a given pattern, can we evolve one to do so? That's what you are going to try to do.
Write a Automaton class that can be used in the Population template class. In addition the the methods required by the template, I suggest that you include a function such as int run(string input) to run the automaton with a given input string and return the value that is output by the autmaton. Use 16 states (or make the number of states a constant in the program). A random automaton can be created simply by choosing random values for all the numbers that define the automaton's rules. A mutation can be accomplished by changing one of the numbers to a random value. A crossover will just swap some of the rules between the two automata. The only really hard part is coming up with a fitness value. (Warning: The getFitness() function should not actually compute the fitness. It should simply return a fitness value that has already been pre-computed. You should compute the fitness value for each individual before calling the population's breed() method. Since this method might call getFitness() many times for each individual, it would be extremely inefficient to compute the fitness each time it is called.)
The goal of the automaton is to classify strings. A fitness value can be computed by giving the automaton a bunch of strings to classify and grading it on how many strings it classifies correctly. The more strings it classifies correctly, the higher its fitness. There are various ways to apply this idea. One is to have a "training set" of input strings whose classification is known. Each automaton is tested on these strings. If you manage to evolve an automaton that gets a perfect score on the training set, you can hope that it will also be able to correctly classify strings that were not in the training set.
I will leave the details up to you. You might read the training set from a file, so that you can easily try your program on various patterns. Be sure to document what you are doing in your comments.
Here is an alternative view of the classification problem that you might find more interesting. Imagine a long sequence of 0's and 1's. Given a series of, say, ten numbers from the sequence, an automaton will try to predict, with better than 50% accuracy, the next number in the series. This is really the same as the classification problem, except that the "training set" consists of all the length-10 substrings of a long string of 0's and 1's, and the classification (0 or 1) of each substring is specified by the following number in the string. Since the string might or might not show a pattern, you might or might not be able to evolve an automaton that can predict with 100% accuracy. But since any finite string has some pattern, you can probably evolve one that can do somewhat better than 50%.
This could even be useful, if the input string represents some sort of experimental data. After being trained on some data, the automata could be asked to make actual predictions about the next experiment. Is is possible that you can evolve automata to recognize some subtle pattern than might not be apparent to a human observer? What if the experiment is "Did the stock market go up or down today"? Could your automata make a killing on the market?
More realistically, consider the game where two players each choose "heads" or "tails". If the players' choices match, the first player wins. If the players' choices are different, the second player wins. The game is repeated over and over. Is there any way to make money at this game? Let's say that the first player is a computer. Can the computer play in such a way that it will probably come out ahead in the long run? If the computer makes its choice at random, it can expect to break even in the long run, no matter what the opponent does. Can it do better? That is, can it predict its opponent's next choice, with better than 50% accuracy, based on the opponent's previous choices? Sound familiar? Just think 0/1 instead of head/tail. Since humans have a hard time making truly random choices, there might be some sort of pattern in the opponent's choices. If the computer can recognize the pattern, it can win more games than it loses. Suppose the computer has an evolving population of automata to make the predictions...
If you want to experiment with this idea, you could write your program as follows: Ask the user to enter a long random sequence of 0's and 1's (about 100). Use the first half of the sequence to train your sequence of automata. Then test them on the second half. Will they do better than 50% accuracy on the test? They shouldn't, if the numbers are really random. They will, if there is some sort of hidden pattern.