CASC: A Case Study in Coevolution

Coevolutionary Automated Software Correction
Josh Wilkerson
PhD Candidate in Computer Science
Missouri S&T

High Level View of CASC

CASC Evolutionary Model

Reproduction Phase: Programs
• Randomly select a genetic operation to perform
  – Probability of operation selection is configurable and/or adaptive
• Select individual(s) to use
  – First select a subset of individuals (i.e., tournament)
  – Then perform fitness-proportional selection within the subset (i.e., roulette)
  – Reselection allowed
• Perform the operation, generating new program(s)
• Add the new individuals to the population
• Repeat until the specified number of individuals has been created

Reproduction Phase: Programs
• Genetic Operations
  – Reset
  – Copy
  – Crossover
    • Two individuals are randomly selected based on fitness
    • Randomly select and exchange compatible sub-trees
    • Generates two new programs
  – Mutation
    • Off-by-one mutation bias
    • Randomly select an individual based on fitness
    • Randomly select and change a mutable node
    • Generate a new sub-tree (if necessary)
  – Architecture-Altering Operations
    • Delete a line, add assignment, add flow control

Reproduction Phase: Test Cases
• Reproduction employs uniform crossover
• Same selection method as programs
• Each offspring has a chance to mutate
• Genes to mutate are selected randomly
• Mutated gene is randomly adjusted
  – The adjustment amount is drawn from a Gaussian distribution

Evaluation Phase
• All programs run against all test cases
  – Full population exposure vs. population sampling
  – Hash table used to avoid repeat evaluations
• Executions scored based on input and output of the program
  – Black box style
  – Run-time exceptions and time-outs monitored
• Fitness for a program is the average of all its execution scores
  – Test case scores are directly related to this value

CASC Implementation Details
• Adaptive parameter control
  – EAs typically have many control parameters
  – Difficult to find optimal settings for these parameters
  – In CASC, genetic operator probabilities are adaptive parameters
  – Rewarded/punished based on performance
    • If one operator is generating improved individuals more often than the others, make it more likely to be used
  – Allows the system to adapt to the different phases of the search

CASC Implementation Details
• Parallel Computation
  – Computational complexity is generally a problem for EAs
  – CASC typically writes and compiles thousands of programs on a given run
    • Typically executes millions of evaluations (literally)
  – To reduce run times, executions are done in parallel (NIC cluster)
    • All other evolutionary phases are done in serial
  – Main node: responsible for generating and writing programs
  – Worker nodes: responsible for compiling and executing programs
  – Dramatically speeds up execution

CASC Criticisms
• Scalability
  – The problem space is infinite for even simple programs
  – Must correct software in reasonable time, regardless of program size
• Fitness Function Design
  – Each new problem for CASC requires a new fitness function
  – Infinite possible fitness functions
  – Limited number of high quality fitness functions
  – Design of high quality fitness functions is extremely difficult

Scalability: ARCD
• Automated Relevant Code Discovery (ARCD) System
  – Preprocessor for CASC
  – Uses bug localization techniques to remove irrelevant lines of code from consideration
  – Ensemble of analysis methods
    • Each method generates a set of suspect lines of code
    • Results are combined and a relevant code set is generated
      – Voting system
      – Confidence levels
    • Employ state of the art bug localization techniques
    • Exploit the availability of the fitness function
  – Prototype is under development
  – Three techniques currently implemented
    • Positive/negative trace comparison
    • Line suspicion based on fitness
    • Fitness run-time plot

ARCD: Pos./Neg. Trace Comparison
[Figure: comparison matrix aligning the positive trace (lines 1 2 3 4 5 4 5 4 5 10 11 12) against the negative trace (lines 1 2 3 4 5 6 7 4 5 6 7 8 9 4 5 8 9 10 11 12); matrix cells not reproduced]

ARCD: Fitness Plots
[Figure: Fitness Plots. x-axis: Lines Executed (0 to 70); y-axis: Fitness (0 to 1.2); series: Incorrect Program, Correct Program]

Scalability: CC-CoEA
• Cooperative-Competitive Coevolution (CC-CoEA)
  – Multiple program populations
  – Cooperative coevolution of program components
  – Each sub-population is focused on a specific portion of the program
  – Components are selected from each population and a program is assembled
  – Fitness indicates how well each component operated
  – Divide the problem space into smaller, more manageable pieces
  – Allow CASC to "freeze" sub-populations that are suspected to have converged

Fitness Function Design
• Current approach: guide for fitness function generation
  – Formalize the thought process for fitness function design
  – Incorporate quality measures to assure quality fitness functions
  – Incorporate advanced fitness function techniques, mapped to problem characteristics (indicate when techniques will be useful)
  – Extend to be useful for black box search algorithms that use fitness functions
  – Implement as a semi-automated tool for fitness function design
• Alternative approach
  – Exploit formal specifications
    • Information about expected program operation
    • Possibly generate new, correct code from scratch
  – No evidence this approach will be superior
    • Many open problems
    • One-to-many relationships

Questions?
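
Supplementary sketch: program selection
The two-stage selection described under "Reproduction Phase: Programs" (a random tournament subset, then roulette selection within it, with reselection allowed) could look roughly like the Python below. This is a minimal sketch and not CASC's actual code: the function name `select_individual`, the `tournament_size` parameter, the uniform fallback for an all-zero subset, and the assumption that fitness values are non-negative are all illustrative choices not stated in the slides.

```python
import random


def select_individual(population, fitness, tournament_size, rng):
    """Pick one individual: random tournament subset, then roulette within it.

    Assumes fitness values are non-negative (an assumption of this sketch).
    """
    # Stage 1 (tournament): draw a random subset of distinct indices.
    subset = rng.sample(range(len(population)), tournament_size)

    # Stage 2 (roulette): choose within the subset, weighted by fitness.
    weights = [fitness[i] for i in subset]
    if sum(weights) <= 0:
        # Hypothetical fallback: uniform choice when the whole subset scores zero.
        chosen = rng.choice(subset)
    else:
        chosen = rng.choices(subset, weights=weights, k=1)[0]
    return population[chosen]


# Usage: each call is independent, so the same individual may be picked
# again on a later call ("reselection allowed").
rng = random.Random(0)
programs = ["p0", "p1", "p2", "p3", "p4"]
scores = [0.1, 0.9, 0.4, 0.0, 0.6]
parents = [select_individual(programs, scores, 3, rng) for _ in range(4)]
```

Restricting roulette selection to a small random subset limits how strongly a single very fit individual can dominate reproduction, while still favoring fitter individuals within each draw.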
