CS 327, Homework 7

CPSC 327, Spring 2019
Homework 7: Heuristic Search

In class, we looked at two techniques for random heuristic search of a large space of "configurations" or "states": Simulated Annealing and the Genetic Algorithm. These techniques apply to optimization problems where a "fitness function" is defined for configuations, and the goal is to find a configuration that minimizes or maximizes the fitness. There must also be a way to randomly modify or "mutate" a configuration to give a new configuration. And for the genetic algorithm, there must be a way to combine or "crossover" two configurations to give a new configuration that combines traits from its two "parent" configurations.

In class, we looked at the idea of applying the two techniques to the k-coloring problem: Given an undirected graph, G, and a positive integer, k, determine whether it is possible to assign colors, chosen from a set of k colors, to G so that for each edge (v,w) in G, the colors assigned to v and w are different. In this context,

A configuration is simply an array of integers whose length is equal to the number of vertices in G and whose elements are integers in the range 1 through k, representing the k possible colors.
The fitness function for a configuration is the number of edges whose endpoints have the same color. A configuration is a solution to the k-color problem if and only if its fitness value is zero. So here, the goal is to minimize the fitness function.
A possible mutation operation is simply to choose a vertex at random and change its color. Another option is to choose an edge whose vertices have the same color and to change the color of one of its endpoints. To get potentially larger jumps, you might apply one or both of these operations several times to the same configuration. (The possibility of larger jumps in fitness is useful in simulated annealing.)
To do a crossover, choose a division point, N, at random between 0 and the length of the array. Make a new configuration array by copying elements up to index N from the first array and copying elements after index N from the second array.

Note that if a configuration is found that has fitness zero, we can be sure that the graph is k-colorable. If we never find a configuration with fitness zero, however, we can't say for sure that the graph is not k-colorable; it might just be that we have failed to find the coloring. (But in a practical application, we might be satisfied with having a very small number of bad edges.)

We looked at examples where the genetic algorithm found a 2-coloring of a 900-vertex grid in a few thousand generations with a population size of 1000. In that case, the algorithm created and evaluated a few million configurations. The size of the configuration space that was being searched was 2⁹⁰⁰, which is a 1 followed by 270 zeros, an almost unimaginably large number. There are only two possible ways of 2-coloring a grid, and the algorithm was able to find one by looking at only a microscopically small fraction of the possible configurations. (But note that the genetic algorithm does not always work so well.) Simulated annealing didn't handle that large a grid in the time I was willing to give it, but it was able to find solutions in extremely large configuration spaces.

The Assignment

The assignment will be due next Monday, April 15, unless we agree in class to a later due date next Friday, April 19.

You should write a Java program that implements either the genetic algorithm or simulated annealing. The program should search for solutions to k-coloring problems, as described above. Alternatively, you can pick a different problem to work on, but you should discuss it with me first. This is meant to be an individual assignment, but if you prefer to work with someone, you can work with one other person and turn in two programs, one for the genetic algorithm and one for the simulated annealing.

This should be a relatively short assignment, given the amount of information you have. I am not giving any specific requirements for the user interface, and it even OK if the program source has to be modified to apply it to a different graph. However, you should make it clear how to use the program (in the comments on the program). Also, you should include a comment stating a few example problems on which you have run the program, and the results that you got.

You are welcome to use the following functions to make sample graphs on which to try your algorithm (assuming that you are using my Graph class to represent the graphs):

    /**
     * Make a graph that is a rectangular grid with the specified
     * number of rows and columns.  A vertex is connected to its
     * neighbors to the left, to the right, above, and below.
     */
    static Graph makeGrid(int rows, int cols) {
        Graph g = new Graph(rows*cols);
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows-1; r++) {
                int a = r*cols + c;
                int b = a + cols;
                g.addEdge(a,b);
            }
        }
        for (int c = 0; c < cols-1; c++) {
            for (int r = 0; r < rows; r++) {
                int a = r*cols + c;
                int b = a + 1;
                g.addEdge(a,b);
            }
        }
        return g;
    }


    /**
     * Creates a random undirected graph with specified numbers of vertices
     * and edges, using a pseudo-random number generator initialized with a
     * given seed.
     */
    private static Graph randomGraph( int vertexCt, int edgeCt, int seed ) {
        Random rand = new Random(seed);
        Graph grph;
        grph = new Graph(vertexCt);
        while (grph.getEdgeCount() < edgeCt) {
            int a = rand.nextInt(vertexCt);
            int b = rand.nextInt(vertexCt);
            if ( a != b && ! grph.edgeExists(a, b) ) {
                grph.addEdge(a, b);
            }
        }
        return grph;
    }

About the Genetic Algorithm

As mentioned in class, the genetic algorithm is more of an idea than a specific algorithm. The general outline of the algorithm is always something like this:

Create a population of randomly generated configurations
bestEver = any individual
Repeat until a solution is found (or time runs out) {
    Sort the population by fitness (if that's useful for selection)
    best = fittest individual in population
    if (best is fitter than bestEver)
        bestEver = best
        if (bestEver is a solution)
            break
    Make a new population array, and for each spot in the array:
        select two configurations from current population by fitness
        apply a crossover operation on them to make a new configuration
        mutate the configuration (possibly only with some probability)
        add the new configuration to the new population array
    Maybe output some progress report
}

There are many ways to "tune" the algorithm, and a lot of tuning might be necessary to get it to work well on a particular problem (if that is even possible). Things that can be adjusted include population size, mutation probability, the nature of the mutation and crossover operations, whether some configurations are copied from the old to the new population unchanged, and how selection by fitness is done. Selection by fitness can be done in many ways, but it doesn't usually have to be very complex to be effective. As I mentioned in class, one idea is to choose one parent from the population in general and one parent from the more fit part of the population.

About Simulated Annealing

Similarly, simulated annealing is more of an idea than a specific algorithm. Simulated annealing looks at only one configuration at at time. At the core of simulated annealing is a probability function Change(c_old,c_new,temperature. After a configuration c_old is mutated to give a new configuration, c_new, the Change() function gives the probability that the algorithm will jump to c_new; that is, the probability that it will replace the current configuration, c_old, with c_new. The probability depends on "temperature" where temperature is a number that decreases over time. Usually, if the new configuration is fitter than the old configuration, the algorithm always takes the jump. If the new configuration is less fit, the probability of jumping will be less than one. At higher temperatures, the algorithm should be more likely to accept a jump that decreases the fitness. The decision can also depend on the amount by which the fitness decreases, with smaller jumps being more likely than larger jumps.

In general outline, a simulated annealing algorithm is something like this:

bestEver = some configuration
temperature = a high value
repeat until a solution is found or time runs out {
    current = a randomly generated configuration
    repeat for some large set number of times {
        newConfig = mutate(current)
        if Math.random() < Change(current,newConfig,temperature)
            current = newConfig
            if newConfig is fitter than bestEver
                bestEver = newConfig
                terminate if bestEver is a solution
         Maybe break if stuck for a while
    }
    Maybe output a progress report
}

CPSC 327, Spring 2019 Homework 7: Heuristic Search

The Assignment

About the Genetic Algorithm

About Simulated Annealing

CPSC 327, Spring 2019
Homework 7: Heuristic Search