Lecture 36

Log into Windows/ACENET.
Download and extract TravelingSalesman.zip.
Double-click through the project folders to find the solution file.
Double-click on the solution file to start MS VS.
Final programming project vs. final programming exam?
Questions?

Monday, April 11  CS 205 Programming for the Sciences - Lecture 36

Outline

Evolutionary computation, aka "genetic algorithms"
Terminology
Basic algorithm
Traveling salesman problem

Evolutionary Computation

Evolutionary computation is an area of computing devoted to techniques that mimic biological evolution.
This type of computation was first conceived in the 1950s and 1960s, but executing these techniques requires a large amount of computing resources, so until the 1990s it was mostly a research topic.
Now that CPUs are very fast and memory is very cheap, anyone can explore this area.

Some terminology

Fitness – a measure of the quality of an individual item in a population compared to an ideal. Only the fittest individuals survive to the next generation.
Crossover – similar to biological reproduction. Characteristics of one or more parent individuals are combined by some algorithm to form a new individual.
Mutation – a random change in some portion of an individual.
Growth – a small change in an individual directed toward a better fit to a goal.

Evolutionary algorithm (EA) is a generic term that encompasses all programming methods that rely on evolution to achieve a stated goal.
There are four recognized subareas of evolutionary computing:

Evolutionary programming – EA that relies primarily on mutation.
Evolutionary strategy – EA that combines mutation and crossover.
Genetic algorithm – often used interchangeably with EA, but also used to mean EA that relies primarily on crossover.
Genetic programming – evolutionary techniques applied to creating a program to solve a problem. Typically the output is a LISP program.

Basic Evolutionary Algorithm

An original population is selected. Typically this is done in a random fashion so that each member of the population represents a solution to the problem. This is the first generation.
The individuals are assessed to determine their fitness. Typically, they are sorted with the fittest on top.
After sorting, parents are selected from the population. They are the individuals that are the most fit, e.g., the top 30%.
Crossover is performed using the parents to generate children. E.g., two randomly selected parents may be chosen, with parts of each going together to form a child. This process continues until the full population is produced. E.g., if 30% of the population are parents, then the other 70% may be replaced by children.
Mutation is performed, making small random changes to certain elements of an individual without regard to whether they are beneficial. Usually this is applied to a small percentage of the population.
Growth operations may be performed. Growth allows some small group of the population to change in a positive fashion. It is similar to mutation, except that individuals are assessed immediately and only positive changes are retained.
The population of survivors (typically parents) and the new children become the next generation.
This process is repeated many times until some time limit expires or until some member of the population achieves a desired assessed value.
Research in this area primarily focuses on determining appropriate fitness functions, methods of crossover, and methods of mutation as applied to application areas.

Traveling Salesman Problem

One such application area is the Traveling Salesman Problem (TSP). The problem is stated as follows:

A salesman is assigned a region of n cities as his sales territory. Periodically, he is required to tour his region. That is, he is to visit each city exactly once before returning to his home city. Assuming that the cost of traveling between two cities is proportional to the distance between the two cities, what is the least-cost route the salesman can take?

The TSP has been posed since at least the 1800's with much work done by Harvard mathematicians in the 1930's on a general form of the problem.
A brute force solution would be to determine the total distance of all possible routes. Unfortunately, for n cities, there are (n-1)! possible routes, making this solution infeasible for all but the smallest problems, even with the fastest computers. E.g., 99! ≈ 9.332 × 10^155

We can arrive at a reasonable approximation to the optimal solution using evolutionary programming techniques as follows:

Generate a large number, N, of random paths and find the shortest one. This is generation 0. For example, for n = 100, to compute the total distance of a path we computed 99 distances.
Then for N = 1000, we do this 1000 times.
The fitness function is the length of each path, so for n cities, we sum the (n-1) distances between consecutive cities in the path. The shortest path is the most fit.
To create the next generation, we can do crossover by using the shortest path to generate (N-1) new children paths by taking a random subsection of the shortest path and adding the remaining cities at random. E.g., if n = 5, a shortest path may be {3,0,2,4,1}. A random subsection might be {2,4}, and randomly selecting the remaining cities could result in a new child path of {2,4,0,3,1}.
After all the children paths are constructed, mutation may be applied, e.g., exchanging the positions of two random cities.
The generation is sorted, and then this process is repeated for k generations.

Traveling Salesman Project

The Traveling Salesman project is an application that implements this evolutionary algorithm and uses graphics to display the shortest path. A typical run is shown on the next slide.

The user can enter the following options:

number of cities
number of generations
number of paths per generation
mutation percentage
genes passed on – can be a percentage or random. If it is random, then a path of random length is chosen from the parent path and copied to each child path. If it is not random, then the user can choose 0% to 99% of the parent path to be passed to each child.

The program has two classes.

City class – represents the location of a city as an (x,y) screen coordinate and stores a boolean flag visited. It supports properties to get and set these values.
Path class – contains a path through the cities, represented as an array of integers that are indexes into the application array of cities, and the length of the path. It supports properties to get and set these values.

The main application data structures are:

cities: an array of City objects that are created when the program starts up and are recreated whenever the user changes the number of cities. (See the constructor and the nudNumCities_ValueChanged handler. Both call the CreateCities method.)
paths: an array of Path objects that are the individuals of a generation.
shortestPath: a Path object that contains the shortest length Path generated so far.

In-class Exercise

The first part of the exercise is to complete the FindShortestPath method. This method does the following:

Determine the shortest path of the current generation.
Determine whether this shortest path is the shortest path seen overall. This part has been provided.

The basic idea is to keep track of the shortest path as each individual is assessed. Start by saying the first path is the shortest so far.

The algorithm for this is:

1. Initialize shortestIndex to 0 and shortestLength to paths[0].PathLength
2. For indexes from 1 to numPaths-1
   2.1 If paths[i].PathLength is shorter than shortestLength, then i becomes the new shortestIndex and paths[i].PathLength becomes the new shortestLength

Write the code for this. Run and test the program using small numbers.

We can improve the evolution of the shortest path in two ways:

In the Mutation method, instead of exchanging a city with the next city in the path, we can exchange a city with a random other city.
Be sure that the second city chosen is not the same as the first.
In the Crossover method, instead of always copying the first x cities from the shortest path, we can copy a random subset of x cities from the shortest path. This requires choosing a random starting index and "incrementing" it as the cities are copied.

We can improve the user interface by having the starting city rendered as a solid green circle to distinguish it from the other cities in the path.
This can be done using the FillEllipse method that receives a Brush (rather than a Pen). Since the filled portion is inside the drawn outline, a filled ellipse needs to be slightly larger than a drawn one to look the same size.

    Brush greenBrush = new SolidBrush(Color.Green);
    g.FillEllipse(greenBrush, cities[0].X - 4, cities[0].Y - 4, 9, 9);

GUI Notes

The user input is done using NumericUpDown objects (prefix nud). This GUI element is like a textbox, but also has the up-down arrows to click on. The amount that is added/subtracted is the Increment property. It also prevents non-numeric characters from being typed in.
The progress bar is a ProgressBar object (prefix pgb). To use it, the Maximum property is set. The Value property starts at 0 and as it is set, the GUI element displays the Value/Maximum ratio.
The Random button has a CheckedChanged event handler that is called whenever the button is clicked. When it is checked, it sets the randomGenes flag to true and disables the Percent box. Vice versa when it is unchecked.
The Number of Cities box has a ValueChanged event handler that is called whenever the user changes the value in that box (either by clicking an arrow or by typing in the box). This handler recreates the city list to have the appropriate number of cities.
The new state is not displayed until the next Start button click.
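
Code Sketch

The steps from the Traveling Salesman Problem and In-class Exercise slides (path length as fitness, the shortest-path scan, random-swap mutation, and random-subsection crossover) might be sketched in C# roughly as follows. This is a stand-alone illustration, not the project's actual code: the class TspSketch and its method signatures are hypothetical, cities are passed as plain coordinate arrays rather than City objects, and the crossover is assumed to wrap around to the start of the parent path when the random subsection runs past the end.

```csharp
using System;
using System.Linq;

// Hypothetical stand-alone sketch; names are illustrative and are NOT
// taken from the TravelingSalesman project source.
public static class TspSketch
{
    // Fitness: sum of the (n-1) distances between consecutive cities.
    public static double PathLength(int[] path, double[] xs, double[] ys)
    {
        double total = 0.0;
        for (int i = 1; i < path.Length; i++)
        {
            double dx = xs[path[i]] - xs[path[i - 1]];
            double dy = ys[path[i]] - ys[path[i - 1]];
            total += Math.Sqrt(dx * dx + dy * dy);
        }
        return total;
    }

    // Start by saying the first path is the shortest, then scan the rest.
    public static int FindShortestIndex(double[] lengths)
    {
        int shortestIndex = 0;
        double shortestLength = lengths[0];
        for (int i = 1; i < lengths.Length; i++)
        {
            if (lengths[i] < shortestLength)
            {
                shortestIndex = i;
                shortestLength = lengths[i];
            }
        }
        return shortestIndex;
    }

    // Mutation: exchange two distinct, randomly chosen positions.
    public static void Mutate(int[] path, Random rng)
    {
        if (path.Length < 2) return;
        int first = rng.Next(path.Length);
        int second;
        do { second = rng.Next(path.Length); } while (second == first);
        int temp = path[first];
        path[first] = path[second];
        path[second] = temp;
    }

    // Crossover: copy `count` consecutive cities from the parent, starting
    // at a random index (wrapping around is an assumption), then append
    // the remaining cities in random order. Requires 0 <= count <= n and
    // a parent that is a permutation of 0..n-1.
    public static int[] Crossover(int[] parent, int count, Random rng)
    {
        int n = parent.Length;
        int[] child = new int[n];
        bool[] used = new bool[n];
        int start = rng.Next(n);
        for (int k = 0; k < count; k++)
        {
            int city = parent[(start + k) % n];
            child[k] = city;
            used[city] = true;
        }
        int[] rest = Enumerable.Range(0, n).Where(c => !used[c]).ToArray();
        // Fisher-Yates shuffle of the cities not yet in the child.
        for (int i = rest.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int temp = rest[i];
            rest[i] = rest[j];
            rest[j] = temp;
        }
        Array.Copy(rest, 0, child, count, rest.Length);
        return child;
    }

    public static void Main()
    {
        double[] xs = { 0, 3, 3 };
        double[] ys = { 0, 0, 4 };
        // Two legs of length 3 and 4.
        Console.WriteLine(PathLength(new[] { 0, 1, 2 }, xs, ys));

        Random rng = new Random();
        int[] child = Crossover(new[] { 3, 0, 2, 4, 1 }, 2, rng);
        Mutate(child, rng);
        Console.WriteLine(string.Join(",", child)); // varies from run to run
    }
}
```

The do-while loop in Mutate is one way to make sure the second city is not the same as the first, and the modulo in Crossover is one reading of "incrementing" the starting index as cities are copied.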