Delaney Software Services Inc.

The Genetic Model in Business Application Development
Artificial Intelligence in Business Series

Thomas E. Delaney
Delaney Software Services Inc.
281-550-7752
tom@delaneyware.com
January 20, 2001

Table of Contents

OVERVIEW
THE NEED FOR A NEW APPROACH
USING GENETIC ALGORITHMS IN BUSINESS APPLICATIONS
AN EXAMPLE
THE NEXT STEP – GENETIC PROGRAMMING
MORE GENETIC PROGRAMMING
THE EMERGENCE OF AN ADAPTIVE SYSTEM
FINAL THOUGHTS
SOURCES

Table of Figures

FIGURE 1: A GENETIC ALGORITHM FLOWCHART
FIGURE 2: ALGORITHM EVOLUTION WITH GENETIC PROGRAMMING
FIGURE 3: THE ADAPTIVE SYSTEM

______________________________________________________________________________________
Delaney Software Services Page 2 2/17/2016

Overview

Many modern software applications include workflow or project planning tools. Frequently, these tools consist of a large number of variables and an even larger number of database records where these variables are expressed in many different combinations. These records represent entities in the business process, such as work orders, purchase orders, and employees.
Frequently, there are subtle interdependencies among these entities that are not expressed in the application or the underlying data model. Customers want to use the data not only for archiving purposes, but also to assist in managing the business workflow, maximizing customer satisfaction, and improving corporate profitability. The question for the application designer is how best to achieve these objectives.

With complex business systems, there are too many interdependencies among business entities for the programmer to calculate the best answer to a specific question. For instance, a typical rotating equipment repair facility may at any time be working on 100 projects concurrently. Each job has a specific set of tasks designed to execute a specific workscope. Each task has one-to-many resource requirements, including asset requirements, such as a lathe or boring mill, as well as skill level requirements. These resources are usually limited and therefore have to be carefully scheduled. Furthermore, the job’s timing, the profitability of the work order, and the employee satisfaction associated with the work all depend on the scheduling.

Frequently, these interdependencies work against one another. For instance, it may be more profitable to work employees 12 hours a day, seven days a week to meet a customer’s short deadline, but employee satisfaction would then seriously deteriorate, likely to the point where skilled labor is lost and the shop’s ability to provide its traditional goods and services is seriously compromised.

Even with sophisticated ERP business systems, many of these difficult-to-model business processes cannot be optimized in any way. Management-on-the-fly, or the biggest-fire-is-extinguished-first approach, remains the management style most commonly in use today.
This is because the entire business process is not sufficiently represented in the software, or the tools are not available to make the most use of the available information, or both. Another approach is required.

Genetic algorithms provide the most viable solution. A recent article in Scientific American references the work of John H. Holland, a renowned computer scientist who holds appointments at the University of Michigan and the Santa Fe Institute, and who states, “Genetic algorithms have proved important in generating new solutions across a number of areas. There is not any counterpart to this type of crossbreeding in traditional optimization analyses.” This paper explains in detail how these algorithms may be applied to a company’s business processes to provide best-practice solutions for managing complex operational interdependencies.

The Need for a New Approach

The example described in the overview is not what we would call a simple optimization problem; it is a problem that does not have a concrete, easily identifiable solution. Even if we could infer a solution, a precise method for proving its veracity does not exist.

Let’s extend our simple example to better illustrate our dilemma. Let us say that our shop has two employees, two machines, and two projects currently in-house, both of which are emergencies. Project A is for our best customer, but has the lower expected gross margin. Project B has better margin prospects, but is for a customer we only hear from once a year. Both projects require the use of Machine A. As it turns out, the same tasks could be performed on Machine B, but they would take 25% longer to perform. Also, Employee A is very efficient on both machines, but Employee B has limited experience on Machine A and takes about 25% longer than Employee A.
However, Employee B wants very much to get more work on Machine A. In fact, Employee B has told us that another shop has promised him that if he were to come and work for them, he would be able to use such a machine more often and increase his skill level.

To satisfy Employee B, and hopefully prevent him from leaving, we want to give him the work on Machine A. The problem is that this will adversely affect our profit margin and extend the time to complete the jobs. So perhaps we could let Employee B work Project B after Employee A has worked Project A on this machine. This sounds great, but won’t we be tempted to make the same decision next time when presented with a similar conflict? If so, we can bet that Employee B will be leaving. Furthermore, no one within our organization will ever be as proficient as Employee A on this machine, which means we are going to reduce future margins by consistently making such short-term scheduling decisions. If Employee B leaves for a job at another company, in the future we will be able to process slightly more than half the work we can currently process under these circumstances, which is obviously undesirable.

If we are strictly driven by lead-time requirements, then we schedule Employee A on Machine A, but then we will run into a labor problem, which in turn will lead to an inability to meet future customer requirements. Thus the short-term solution in one specific area can actually lead to a long-term problem in the same area. In other words, we end up solving a problem today by creating an even bigger problem in the future. If we are driven strictly by profit considerations, then we probably would make the same choice in the short term, but our ability to produce profit will be compromised in the future by the labor problem.
But maybe we can have the best of both worlds: we buy an exact replica of Machine A, and work both jobs concurrently with both employees. This sounds great, but it obviously increases the shop burden pool and decreases profit, and how will we know the expense was worth it?

As a company adds more and more jobs, with more and more tasks and resource requirements, and with more and more assets available as resources, the problem will continue to escalate. Typical workflow software tools can provide the shop with the ability to schedule and predict asset and labor utilization, and may allow the user to tweak some variables and adjust the scheduling. But how do we know, with all the various, and sometimes competing, measures of performance, that we aren’t wasting opportunities? What is needed is an alternative software tool that can demonstrably improve resource allocation as measured against performance indexes that truly indicate overall facility performance.

Using Genetic Algorithms in Business Applications

The alternative that we propose is based on what are known as genetic algorithms. The components of a genetic algorithm are the following:

1. A genetic code that succinctly represents the entire business process. This code consists of one-to-many genes, each of which can take on one-to-many predetermined values. All of the genes in one code are said to represent one individual in the population. What a gene represents has to be determined by the designer, but it must have the following characteristics:
   a. The value of each gene is completely independent of the value of any other gene. In other words, if Gene X changes value, this does not affect the value of any other gene.
   b. What the genes represent affects every aspect of the business process, so that changing the value of the genes affects all business processes that contribute to the performance indexes. Algorithms are designed that demonstrate this dependence, and these algorithms are said to express the genes.
2. A fitness function, which is an algorithm that takes a genetic code and in effect scores it against all the performance factors we want to include.
3. A procedure for replication, known as crossover and mutation. Crossover takes two current codes and combines them to create a third code; mutation introduces random changes to gene values.
4. Population control rules that determine which codes participate in future crossover and mutation operations, and which codes are discarded.
5. An appropriate control procedure that determines how many generations to create, how many individuals per generation, etc.

The concept behind genetic algorithms is that through a series of iterations, the algorithm will “evolve” toward a “fit” solution. In other words, those genetic codes that fare well against the fitness function will be allowed to survive and multiply; those that do not will be discarded. In the end we will have one or a set of possible “solutions” to our optimization problem.

To further ensure our success, and to kick off the evolutionary process, we will need to “seed” the algorithm with an initial population set. The beauty of this approach is that we can virtually guarantee a result at least as good as our current practice by seeding the population with a code that in effect reproduces a simplified version of our best current business practices. Along with this solution we may seed the algorithm with “extreme” solutions, such as those that employ the maximum amount of overtime to increase facility throughput, or eliminate overtime to maximize immediate profitability.
With our best business practice solution and the extreme solutions, the algorithm over time will settle on solutions that fare better against our performance indexes than our best business practices alone.

An Example

Let’s return to our example. Design question one: what will a gene represent in our model? This is the big question. Here the art of creating genetic algorithms plays a part in the design process. How can we possibly come up with a single parameter, and the set of values for that parameter, that in the end represents everything that we do? It is important to remember that we will also be designing the algorithms that express our genes, so this is the key.

Let us start with what it is we want to measure. In other words, if we could measure everything that we need to measure to determine how well we are performing, then what would those measurements be?

In our simple example, we have already stated indirectly that we are interested in three things: gross margin, customer satisfaction, and employee satisfaction. We will stick to these three for our example, but the list could easily be expanded, and doing so further demonstrates the need for an alternative computational approach. Additional measurements could include vendor performance, the sales process, asset utilization, other financial measurements, etc.

Given our list of measurements, we then ask: is there a parameter that already exists, or is there one we could create in a genetic code, that would serve as a gene? This means that we could design expression algorithms that use this specific gene to address all aspects of our business.

In our example, we can see that the scheduling of labor and resources really is the key. So perhaps a gene could represent a resource time slot, with the value representing what resource is allocated for that time slot.
An individual would then consist of all the individual gene values for all the time slots for, say, an entire month. We could vary these values and work toward optimization.

Unfortunately, this choice violates our rule that the genes be independent. If we change the resource allocation in one time slot, then we are going to have to fill that slot with another resource, which may already be allocated, which would mean that the gene for that resource would have to change also. This domino effect could continue until we find that we are spending much of our critical computational time resolving these conflicts.

There is a better choice. Say we invent a new parameter, called the scheduling urgency, and we assign a value of this parameter to every job in house. Then we express the gene by saying that the highest value means that every available resource is applied to every available task at any given time in the workscope execution. We could further say that the lowest value would mean scheduling the tasks in this job only when every other job with a higher urgency is completed. Then we would design simple rules to resolve conflicts, such as when there are multiple jobs all with the same urgency: which job is executed first, are they executed in concert, and if so, how.

This gene has the advantage of inter-gene independence that we require. If job X is changed from a high to a low urgency, it then means that when the entire genome is expressed, this job will be scheduled later; it does not force us to change the urgency of any other job.

We can now see how our algorithms will work together. Our seeding algorithm will create a genetic code that reflects the urgency of all jobs. This is information we should be able to easily record.
Our best business practice genetic code, which is part of the initial seeding, will be represented by those urgency values and thus the schedule that reflects how we would do things the old-fashioned way. The initial codes are scored against the fitness function, then they are subjected to the crossover and mutation algorithms, and then we score the next population against the fitness function, and so on. Only those solutions that fare well compared to other solutions in the population are allowed to “survive”, so we are virtually guaranteed to evolve toward better solutions than would have previously been created in a typical reactionary mode. The flowchart below illustrates the various computational steps in our algorithm.

Figure 1: A Genetic Algorithm Flowchart

The Design Steps

To create such a system, we will need to perform the following steps:

1. Thoroughly examine the current business process. Determine all the required measurement points, and suggest many more.
2. Create an object and/or data model that will accommodate the entire business process with all its interdependencies.
3. Create the business system needed to record all the necessary data, or modify the one that is already in place.
4. Determine how best to represent the process with a genetic code that we invent.
5. Create the algorithm that is the fitness function.
6. Create the algorithms to express the genes.
7. Create the population evolution code.
8. Determine how many populations (iterations) will be created, how many individuals per population, how much raw computing power will be required, what hardware to recommend, whether the algorithm’s execution can be distributed among many machines in the organization, etc.
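
The pieces described in this section (urgency genes, gene expression, a fitness function, crossover, mutation, seeding, and population control) can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the system proposed in this paper: the job data, the single shared machine, the tie-breaking rule, the margin-minus-lateness score, and all function names are hypothetical and chosen only to keep the example small.

```python
import random

# Hypothetical job data: (name, hours of work, due hour, margin, late penalty per hour).
JOBS = [
    ("A", 40, 50, 9000.0, 100.0),
    ("B", 30, 40, 12000.0, 50.0),
    ("C", 20, 90, 5000.0, 20.0),
    ("D", 10, 15, 3000.0, 200.0),
]
URGENCY_LEVELS = range(1, 6)  # each gene takes one of five discrete urgency values


def express(code):
    """Express the genes: run jobs on one shared machine, highest urgency first.

    Ties are broken by job order (one possible conflict-resolution rule).
    Returns the completion hour of each job, indexed like JOBS.
    """
    order = sorted(range(len(JOBS)), key=lambda i: (-code[i], i))
    finish, clock = [0] * len(JOBS), 0
    for i in order:
        clock += JOBS[i][1]
        finish[i] = clock
    return finish


def fitness(code):
    """Score a code: total margin minus lateness penalties (a stand-in index)."""
    score = 0.0
    for (_, _, due, margin, penalty), done in zip(JOBS, express(code)):
        score += margin - penalty * max(0, done - due)
    return score


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(code, rng, rate=0.1):
    """Randomly reassign each gene with probability `rate`."""
    return [rng.choice(URGENCY_LEVELS) if rng.random() < rate else g for g in code]


def evolve(seeds, generations=30, population_size=20, rng=None):
    """Evolve from seed codes; the better half of each generation survives."""
    rng = rng or random.Random(0)
    population = [list(s) for s in seeds]
    while len(population) < population_size:
        population.append([rng.choice(URGENCY_LEVELS) for _ in JOBS])
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = [
            mutate(crossover(*rng.sample(survivors, 2), rng), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)


current_practice = [3, 3, 3, 3]  # seed: every job equally urgent
extreme = [5, 1, 1, 1]           # seed: one job always first
best = evolve([current_practice, extreme])
```

Because the better half of every generation is carried forward unchanged, the best code returned can never score lower than any seed, which is the sense in which seeding with current practice protects the result. Changing the urgency of one job never forces a change to another gene; only the expressed schedule shifts.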
The Next Step – Genetic Programming

In our example above, we had to create an algorithm for the fitness function. We can imagine this algorithm consisting of three elements in our example: a profitability calculation, a customer satisfaction index, and an employee satisfaction index. The profitability calculation is straightforward: for each job, we simply calculate the total costs for labor and materials, deduct this from the sales price, and arrive at a gross margin for each job. The customer satisfaction and employee satisfaction indexes, however, are not as straightforward to calculate. How exactly does one calculate indexes like these? Is there a better algorithm to calculate them? To answer these questions and to help us generate better algorithms, we employ genetic programming.

To calculate these indexes, we have to provide at the very least a parameter or set of parameters that we are confident are measures of our actual performance in these areas. We may not actually have a procedure in place to determine the values of these parameters, so such a procedure would need to be created, and then this process would need to be added to our general business process, and thus in some way it would have to be incorporated into our gene expression algorithms as well.

Once we have all the parameters and the procedures we need to determine their values, we have to create an algorithm that takes these parameters as inputs and produces an output that is an accurate measure of the performance in these areas. Although we may have some ideas on where to start, for many measures we cannot be confident there is a satisfactory correlation between the values of our measures or parameters (inputs) and the value of our index (output).
We need a tool that determines the fitness of our fitness algorithm, and a means to modify this algorithm and then test the fitness of the new algorithm. This is where we introduce genetic programming to our computational arsenal.

Genetic programming is distinct from genetic algorithms in that with genetic programming we alter the content of a program, and therefore how a calculation is executed. Here an individual in the population is a distinct program or procedure. Thus we are creating new procedures or software from other procedures or software. Otherwise, the process is similar to that with a genetic algorithm. At a high level, the components of a genetic program include:

a. Functions and terminals. The parameters we include in our genetic program are known as terminals. The functions are the procedures we introduce. Examples would be mathematical procedures, such as addition, subtraction, and more advanced functions.
b. Fitness function. This is a procedure that actually measures how well our composite algorithm performs. In our example, we would measure how well the customer satisfaction index actually measures customer satisfaction.
c. Crossover and mutation operations. We need to modify the actual computational steps in our algorithm, and we do so in a manner similar to genetic algorithms. But here individual discrete computation steps, or groups of these steps, are the individual genes that we combine and mutate.
d. Population control procedures. We want to control the size of our population, as well as which individuals participate in reproduction and to what extent they do so.

The fitness function in this case has to measure how much statistical correlation there is between what we are using for parameters (measures) and the actual performance (index).
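
As a rough illustration of these components, the sketch below evolves small arithmetic programs built from functions and terminals and scores each one by how strongly its output correlates with observed performance. Everything specific here is an assumption made for the example: the three measures (`on_time`, `rework`, `response_days`), the history data, the nested-tuple program representation, the depth limit, and the use of the absolute Pearson correlation as the measure of "statistical correlation".

```python
import math
import operator
import random

FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["on_time", "rework", "response_days"]  # hypothetical measures
MAX_DEPTH = 4  # assumed cap that keeps programs small

# Hypothetical history: measured parameters and the satisfaction actually observed.
HISTORY = [
    ({"on_time": 0.9, "rework": 0.1, "response_days": 1.0}, 8.5),
    ({"on_time": 0.6, "rework": 0.3, "response_days": 4.0}, 5.0),
    ({"on_time": 0.8, "rework": 0.2, "response_days": 2.0}, 7.0),
    ({"on_time": 0.4, "rework": 0.5, "response_days": 6.0}, 3.0),
    ({"on_time": 0.95, "rework": 0.05, "response_days": 1.5}, 9.0),
]


def random_tree(rng, depth=2):
    """Build a random program: a terminal name, or (function, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def depth(tree):
    return 0 if isinstance(tree, str) else 1 + max(depth(tree[1]), depth(tree[2]))


def subtrees(tree):
    yield tree
    if not isinstance(tree, str):
        yield from subtrees(tree[1])
        yield from subtrees(tree[2])


def evaluate(tree, params):
    """Run a program on one set of measured parameters."""
    if isinstance(tree, str):
        return params[tree]
    name, left, right = tree
    return FUNCTIONS[name](evaluate(left, params), evaluate(right, params))


def fitness(tree):
    """Absolute Pearson correlation between the program's index and observations."""
    xs = [evaluate(tree, params) for params, _ in HISTORY]
    ys = [observed for _, observed in HISTORY]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx < 1e-12 or syy < 1e-12:
        return 0.0  # a constant output carries no information
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(sxy) / math.sqrt(sxx * syy)


def mutate(tree, rng):
    """Replace a randomly chosen subtree with a fresh random one."""
    if isinstance(tree, str) or rng.random() < 0.3:
        return random_tree(rng)
    name, left, right = tree
    if rng.random() < 0.5:
        return (name, mutate(left, rng), right)
    return (name, left, mutate(right, rng))


def crossover(a, b, rng):
    """Graft a random subtree of b into a random position of a."""
    if isinstance(a, str) or rng.random() < 0.3:
        return rng.choice(list(subtrees(b)))
    name, left, right = a
    if rng.random() < 0.5:
        return (name, crossover(left, b, rng), right)
    return (name, left, crossover(right, b, rng))


def evolve(generations=20, population_size=30, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 3]  # population control
        children = []
        while len(parents) + len(children) < population_size:
            child = mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            children.append(child if depth(child) <= MAX_DEPTH else random_tree(rng))
        population = parents + children
    return max(population, key=fitness)


best_index_program = evolve()
```

A fitness near 1 means the evolved index tracks the observed values closely on this history; a population whose best fitness stays low across generations would suggest that the chosen measures are not sufficient inputs.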
Thus, with the genetic program, our fitness algorithms can “evolve” to a more “fit” set of computation steps that produce an index that is tightly correlated to actual performance. Furthermore, with this procedure we can determine whether we in fact have sufficient inputs to determine an output. Failure to demonstrate convergence toward a solution would indicate that the inputs are insufficient. The illustration below summarizes the process.

Figure 2: Algorithm Evolution with Genetic Programming

More Genetic Programming

Next we may observe that there is another step in our original genetic algorithm that we want to examine. When we settled on a gene parameter, we had to create algorithms that express those genes. In order to initiate the process, we had to settle on a finite set of values for these genes. We also had to settle on specific rules for the expression of each gene. We will need to know if there is a potential opportunity for improvement in this rule set as well, and we can find out by subjecting our gene expression algorithms to the genetic programming process.

This will be more challenging to execute, however. For starters, there are constraints in this step that we will be forced to include that were not present with the fitness functions in the genetic algorithms. We cannot arbitrarily combine terminals and functions in various ways as we could with these fitness functions. Furthermore, the gene expression algorithms are much more complicated than the fitness functions, so there will be many more potential variants.

The Emergence of an Adaptive System

The genetic program here can be designed so that our entire process is adaptive; in other words, the program will evolve. In our original genetic algorithm, the individual genes could only assume discrete values.
This value would be input to the expression algorithm, which would then express the gene in one and only one way. We can allow our genetic program to examine subtle differences between gene values. Also, we may find through this process that there are more degrees of freedom that we want to include in our genetic code. We can therefore design this genetic program so that it tests some of these gray areas. By so doing, we may find that one or more parameters need to be added to our original genes as additional degrees of freedom.

Here we would want to include actual historical data as a part of the fitness function. In other words, we want to examine how well our gene values and expression algorithms predict what actually happened.

The fitness function here, however, is going to be a challenge to design. Keep in mind that our original algorithm creates “more fit” solutions from “less fit” ones. It does not settle on some specific answer. However, with each solution comes a prediction of actual performance. We can examine actual performance versus predicted, as well as to what extent the recommendations provided by the solution or solution set were actually executed. Obviously, if the solutions are virtually ignored, then we cannot know how well our solution predicted actual performance.

So the question here becomes: is there an algorithm that through its execution in the expression process more accurately predicts actual performance? In other words, if a facility faithfully executed the solutions produced by the original genetic algorithm, and after the fact we determine absolutely how the facility performed, can we create an expression algorithm that better predicts that actual performance?
Better in this case would mean more quickly (fewer computation steps or rounds), or more accurately (predicted performance, such as gross margin, closer to actual performance).

Our genetic program here will create and destroy variations on our expression algorithm by the combination and mutation of the computational steps within the individuals in the population. The solutions that emerge from this process will more accurately encode the actual behavior of the facility itself. Therefore, the software will in effect begin to “learn” new computational techniques based on what it “observes” from the behavior of the facility. In the end, we may have a set of procedures in place that are very different from those with which we started, that have been modified to account for the actual human behavior in a facility, and that were created entirely by software. This is a completely adaptive system. The next diagram illustrates the entire adaptive system.

Figure 3: The Adaptive System

Final Thoughts

As IT has advanced, so has the complexity of the problems that can be addressed with IT solutions. There comes a time when old tools are no longer appropriate, because with the raw processing power now available, we can afford to look at old problems in new ways. This is entirely possible with genetic algorithms augmented with genetic programming to produce a completely adaptive software system.

Genetic algorithms are computationally intensive, and until recently could not reasonably be expected to fully represent complex processes and run on anything but a supercomputer.
Now, with desktop CPUs executing in excess of a billion operations per second, we can afford to take our computational tools to the next level. Genetic programs and genetic algorithms are new tools in the business application designer’s arsenal that allow the programmer to design solutions to problems that were computationally intractable in the past.

Sources

Julie Wakefield, Complexity’s Business Model, Scientific American, January 2001, http://www.sciam.com/2001/0101issue/0101techbus1.html
Jaime Fernandez, The Genetic Programming Notebook, http://www.geneticprogramming.com/
Melanie Mitchell, An Introduction to Genetic Algorithms, MIT Press, 1996