Delaney Software Services Inc.
The Genetic Model in Business Application Development
Artificial Intelligence in Business Series
Thomas E. Delaney
Delaney Software Services Inc.
281-550-7752
tom@delaneyware.com
January 20, 2001
Table of Contents
OVERVIEW
THE NEED FOR A NEW APPROACH
USING GENETIC ALGORITHMS IN BUSINESS APPLICATIONS
AN EXAMPLE
THE NEXT STEP – GENETIC PROGRAMMING
MORE GENETIC PROGRAMMING
THE EMERGENCE OF AN ADAPTIVE SYSTEM
FINAL THOUGHTS
SOURCES
Table of Figures
FIGURE 1: A GENETIC ALGORITHM FLOWCHART
FIGURE 2: ALGORITHM EVOLUTION WITH GENETIC PROGRAMMING
FIGURE 3: THE ADAPTIVE SYSTEM
______________________________________________________________________________________
Delaney Software Services
Page 2
2/17/2016
Overview
Many modern software applications include workflow or project planning tools.
Frequently, these tools consist of a large number of variables and an even larger number
of database records where these variables are expressed in many different combinations.
These records represent entities in the business process, such as work orders, purchase
orders, and employees.
Often there are subtle interdependencies among these entities that are not expressed in
either the application or the underlying data model.
Customers want to use the data not only for archiving purposes, but also to assist in
managing the business workflow, maximizing customer satisfaction, and improving
corporate profitability. The question for the application designer is how best to achieve
these objectives. With complex business systems, there are too many interdependencies
among business entities for the programmer to calculate the best answer to a specific
question. For instance, a typical rotating equipment repair facility may at any given time be
working on 100 projects concurrently. Each job has a specific set of tasks designed to
execute a specific workscope.
Each task has one-to-many resource requirements,
including asset requirements, such as a lathe or boring mill, as well as skill level
requirements. These resources are usually limited and therefore have to be carefully
scheduled. Furthermore, the job’s timing, the profitability of the work order and the
employee satisfaction associated with the work are dependent on the scheduling.
Frequently, these interdependencies work against one another. For instance, it may be
more profitable to work employees 12 hours a day, seven days a week to meet a
customer’s short deadline, but then the employee satisfaction level would seriously
deteriorate, likely to the point where skilled labor is lost and then the shop’s ability to
provide its traditional goods and services is seriously compromised.
Even with sophisticated ERP business systems, many of these difficult-to-model business
processes cannot be systematically optimized. The age-old approaches of management-on-the-fly
or, better yet, extinguishing the biggest fire first are the management styles
most commonly in use today. This is because the entire business process is
not sufficiently represented in the software, or the tools needed to make full use of the
available information do not exist. Another approach is required.
Genetic algorithms provide the most viable solution. A recent article in Scientific
American cites the work of John H. Holland, a renowned computer scientist who
holds appointments at the University of Michigan and the Santa Fe Institute. Holland states,
“Genetic algorithms have proved important in generating new solutions across a number of
areas. There is not any counterpart to this type of crossbreeding in traditional optimization
analyses.” This paper explains in detail how these algorithms may be applied to a
company’s business processes to provide best-practice solutions for managing complex
operational interdependencies.
The Need for a New Approach
The example described in the overview is not what we would call a simple optimization
problem; it is a problem that does not have a concrete, easily identifiable solution.
Even if we could infer a solution, no precise method exists for proving that it is correct.
Let’s extend our simple example to better illustrate our dilemma. Say that our
shop has two employees, two machines, and two projects currently in-house, both of
which are emergencies. Project A is for our best customer, but has the lower expected
gross margin. Project B has better margin prospects, but is for a customer we only hear
from once a year. Both projects require the use of Machine A; as it turns out, the
same tasks could be performed on Machine B, but they would take 25% longer
to perform. Also, Employee A is very efficient on both machines, but Employee B has
limited experience on Machine A and takes about 25% longer than Employee A.
However, Employee B very much wants more work on Machine A. In fact,
Employee B has mentioned that another shop has promised him that if he came
to work for them, he would be able to use such a machine more often and increase his
skill level.
To satisfy Employee B, and hopefully prevent him from leaving, we want to give him the
work on Machine A. The problem is that this will adversely affect our profit margin and
extend the time to complete the jobs. So perhaps we let Employee B work Project B on
this machine after Employee A has worked Project A on it. This sounds great, but won’t
we be tempted to make the same decision the next time we are presented with a similar
conflict? If so, we can bet that Employee B will be leaving. Furthermore, no one within
our organization will ever be as proficient as Employee A on this machine, which means
we are going to reduce future margins by consistently making such short-term scheduling
decisions. If Employee B leaves for a job at another company, in the future we will be able
to process only slightly more than half the work we can currently process under these
circumstances, which is obviously undesirable.
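As a rough check on the “slightly more than half” figure, assume that shop capacity is simply the sum of the two employees’ work rates, that Employee A works at 1 job-unit per hour, and that Employee B, taking 25% longer, works at 1/1.25 of that rate. Both assumptions are ours, not data given in the scenario:

```latex
r_A = 1, \qquad r_B = \frac{1}{1.25} = 0.8, \qquad
\frac{\text{capacity without B}}{\text{capacity with B}}
  = \frac{r_A}{r_A + r_B} = \frac{1}{1.8} \approx 0.56
```

Under these assumptions, losing Employee B leaves the shop with about 56% of its current capacity.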
If we are strictly driven by lead-time requirements, then we schedule Employee A on
Machine A, but then we will run into a labor problem, which in turn will lead to an
inability to meet future customer requirements. Thus the short-term solution in one
specific area can actually lead to a long-term problem in the same area. In other words,
we end up solving a problem today by creating an even bigger problem in the future. If
we are driven strictly by profit considerations, then we would probably make the same
choice in the short term, but the labor problem will compromise our ability to produce
profit in the future. But maybe we can have the best of both worlds: we buy an exact
replica of Machine A and work both jobs concurrently with both employees. This
sounds great, but it obviously increases the shop burden pool and decreases profit, so how
will we know the expense was worth it? As a company adds more and more jobs, with
more and more tasks and resource requirements and more and more assets available as
resources, the problem will continue to escalate.
Typical workflow software tools can provide the shop with the ability to schedule and
predict asset and labor utilization, and may allow the user to tweak some variables and
adjust the scheduling, but with so many different (and sometimes competing) measures of
performance, how do we know that we aren’t wasting opportunities? What is
needed is an alternative software tool that can demonstrably improve resource allocation
as measured against performance indexes that truly indicate overall facility performance.
Using Genetic Algorithms in Business Applications
The alternative that we propose is based on what are known as genetic algorithms. The
components of a genetic algorithm are the following:
1. A genetic code that succinctly represents the entire business process. This code
consists of one-to-many genes, each of which can take on one-to-many
predetermined values. All of the genes in one code are said to represent one
individual in the population. What a gene represents has to be determined by the
designer, but it must have the following characteristics:
a. The value of each gene is completely independent of the value of any
other gene. In other words, if Gene X changes value, this does not affect
the value of any other gene.
b. What the genes represent affects every aspect of the business process, so
that changing the value of the genes affects all business processes which
contribute to the performance indexes. Algorithms are designed which
demonstrate this dependence, and these algorithms are said to express the
genes.
2. A fitness function, which is an algorithm that takes a genetic code and in effect
scores it against all the performance factors we want to include.
3. A procedure for replication, known as crossover and mutation. Crossover takes
two current codes and combines them to create a third code; mutation is included
for random gene value changes.
4. Population control rules that determine which codes participate in future
crossover and mutation operations, and which codes are discarded.
5. An appropriate control procedure that determines how many generations to
create, how many individuals per generation, etc.
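To make components 1 and 3 concrete, here is a minimal Python sketch. It assumes that a genetic code is a fixed-length list of integer genes, each taking one of a few predetermined values (1 through 5 here, a hypothetical choice). Single-point crossover and per-gene random mutation are common textbook operators. They are not necessarily the ones a production system would use.

```python
import random
from typing import List

# Hypothetical encoding: every gene takes one of these predetermined values.
GENE_VALUES = [1, 2, 3, 4, 5]
Genome = List[int]


def random_genome(length: int, rng: random.Random) -> Genome:
    """One individual: each gene value is chosen independently of the others."""
    return [rng.choice(GENE_VALUES) for _ in range(length)]


def crossover(parent_a: Genome, parent_b: Genome, rng: random.Random) -> Genome:
    """Single-point crossover: a prefix of one parent joined to the suffix of the other.

    Requires genomes of equal length, at least 2 genes long.
    """
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(genome: Genome, rate: float, rng: random.Random) -> Genome:
    """With probability `rate`, replace each gene with a random permitted value."""
    return [rng.choice(GENE_VALUES) if rng.random() < rate else g for g in genome]


rng = random.Random(42)
mother, father = random_genome(6, rng), random_genome(6, rng)
child = mutate(crossover(mother, father, rng), rate=0.1, rng=rng)
print(mother, father, child)
```

Because every gene is drawn independently, crossover and mutation can never produce an invalid individual. This is the practical payoff of rule 1a.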
The concept behind genetic algorithms is that through a series of iterations, the
algorithm will “evolve” toward a “fit” solution. In other words, those genetic codes
that fare well against the fitness function will be allowed to survive and multiply;
those that do not will be discarded. In the end we will have one solution or a set of possible
“solutions” to our optimization problem.
To further ensure our success, and to kick off the evolutionary process, we will need
to “seed” the algorithm with an initial population set. The beauty of this approach is
that we can virtually guarantee our success by creating an algorithm that will in effect
create a simplistic version of our best current business practices. Along with this
solution we may seed the algorithm with “extreme” solutions, such as those that
employ the maximum amount of overtime to increase facility throughput, or eliminate
overtime to maximize immediate profitability. Starting from our best business practice
solution and the extreme solutions, the algorithm will over time settle on solutions
that fare better against our performance indexes than our best business practices alone.
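The seeding step can be sketched as follows. The sketch assumes the same kind of integer genome as before. Mapping the “extreme” solutions to genomes with every gene at its highest or lowest value is an illustrative assumption; the real mapping depends on what the genes mean.

```python
import random
from typing import List

Genome = List[int]


def seed_population(best_practice: Genome, low: int, high: int,
                    size: int, rng: random.Random) -> List[Genome]:
    """Initial population: current best practice, two extremes, then random fill.

    Mapping the extremes to all-`high` and all-`low` genomes is an
    illustrative assumption (e.g. "maximum overtime" vs. "no overtime").
    """
    length = len(best_practice)
    seeds = [
        list(best_practice),   # how we schedule today
        [high] * length,       # extreme solution at one end of the range
        [low] * length,        # extreme solution at the other end
    ]
    while len(seeds) < size:   # fill the rest with random individuals for diversity
        seeds.append([rng.randint(low, high) for _ in range(length)])
    return seeds[:size]


population = seed_population([3, 2, 4], low=1, high=5, size=6, rng=random.Random(7))
print(population)
```

Suppose the population-control rule always carries the best individual forward unchanged (so-called elitism). Then the final answer can never score worse against the fitness function than the best-practice seed. This is one concrete way to read the claim that success is virtually guaranteed.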
An Example
Let’s return to our example. Design question one: what will a gene represent in our
model?
This is the big question. Here the art of creating genetic algorithms plays a part in the
design process. How can we possibly come up with a single parameter and the set of
values for that parameter that in the end represents everything that we do? It is important
to remember that we will also be designing the algorithms that express our genes; that
freedom is the key.
Let us start with what it is we want to measure. In other words, if we could measure
everything that we need to measure to determine how well we are performing, then what
would those measurements be?
In our simple example, we have already stated indirectly that we are interested in three
things: gross margin, customer satisfaction, and employee satisfaction. We will stick to
these three for our example, but the list could easily be expanded, and doing so further
demonstrates the need for an alternative computational approach.
Additional measurements could include vendor performance, the sales process, asset
utilization, other financial measurements, and so on.
Given our list of measurements, we then ask: is there a parameter that already exists, or
is there one we could create in a genetic code that would serve as a gene? This means that
we could design expression algorithms that use this specific gene to address all aspects of
our business.
In our example, we can see that the scheduling of labor and resources really is the key.
So perhaps a gene could represent a resource time-slot, with the value representing what
resource is allocated for that time slot. An individual would then consist of all the
individual gene values for all the time slots for, say, an entire month. We could vary these
values and work toward optimization.
Unfortunately, this choice violates our rule that the genes be independent. If we change
the resource allocation in one time slot, then we are going to have to fill that slot with
another resource, which may already be allocated, which would mean that the gene for
that resource would have to change also. This domino effect could continue until we find
that we are spending much of our critical computational time resolving these conflicts.
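A tiny example shows why the time-slot encoding breaks gene independence. In the sketch below, a dictionary keyed by (job, hour) stands in for the time-slot genome; this structure is hypothetical. Changing a single gene immediately double-books an employee, so some other gene would have to change to repair the schedule.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical time-slot encoding: the gene for (job, hour) holds the employee
# assigned to that job during that hour.
Schedule = Dict[Tuple[str, int], str]


def conflicts(schedule: Schedule) -> List[Tuple[int, str]]:
    """(hour, employee) pairs where one employee is booked on two jobs at once."""
    load = Counter((hour, employee) for (_job, hour), employee in schedule.items())
    return sorted(slot for slot, count in load.items() if count > 1)


schedule: Schedule = {
    ("Project A", 8): "Employee A",
    ("Project B", 8): "Employee B",
}
# Mutate a single gene: give Project B's 8 o'clock slot to Employee A.
mutated = dict(schedule)
mutated[("Project B", 8)] = "Employee A"
print(conflicts(schedule))  # no conflicts
print(conflicts(mutated))   # Employee A is now double-booked at hour 8
```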
There is a better choice. Say we invent a new parameter, called the scheduling urgency,
and we assign a value of this parameter to every job in-house. We then define the gene’s
expression so that the highest value means that every available resource is applied to every
available task at any given time in the workscope execution. Likewise, the lowest value
would mean scheduling the tasks in this job only when every other job with a higher
urgency is completed. Then we would design simple rules to resolve
conflicts, such as when there are multiple jobs all with the same urgency, which job is
executed first, are they executed in concert, and if so, how.
This gene has the inter-gene independence that we require. If job X is
changed from a high to a low urgency, it means that when the entire genome is
expressed, this job will be scheduled later; it does not force us to change the urgency
of any other job.
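One possible expression algorithm for the urgency gene is sketched below. The job data layout and the rule set are hypothetical illustrations of the “simple rules to resolve conflicts” described above. Jobs are taken in descending urgency, and ties are broken alphabetically. Each task starts once its job’s previous task is done and its machine has finished everything already booked on it.

```python
from typing import Dict, List, Tuple

# Hypothetical job data: each job is an ordered list of (machine, hours) tasks.
Jobs = Dict[str, List[Tuple[str, float]]]


def express(urgency: Dict[str, int], jobs: Jobs) -> Dict[str, float]:
    """Turn an urgency genome into a schedule; return each job's finish time in hours.

    Illustrative rule set: jobs are taken in descending urgency, ties broken
    alphabetically; each task starts once the job's previous task is done and
    its machine has finished everything already booked on it.
    """
    machine_free: Dict[str, float] = {}
    finish: Dict[str, float] = {}
    for job in sorted(jobs, key=lambda j: (-urgency[j], j)):
        t = 0.0
        for machine, hours in jobs[job]:
            start = max(t, machine_free.get(machine, 0.0))
            t = start + hours
            machine_free[machine] = t
        finish[job] = t
    return finish


jobs: Jobs = {
    "Project A": [("Machine A", 4.0)],
    "Project B": [("Machine A", 3.0)],
}
print(express({"Project A": 5, "Project B": 1}, jobs))
# Lowering only Project A's urgency reorders the schedule; no other gene changes.
print(express({"Project A": 0, "Project B": 1}, jobs))
```

The second call changes only Project A’s gene, yet the whole schedule adapts through the expression rules. This is the independence property in action.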
We can now see how our algorithms will work together. Our seeding algorithm will
create genetic codes that assign an urgency to every job. This is information we should
be able to record easily. Our best business practice genetic code, which is part of the
initial seeding, will be represented by those urgency values, and thus by the schedule that
reflects how we would do things the old-fashioned way. The initial genetic codes are scored
against the fitness function; then they are subjected to the crossover and mutation
algorithms; then we score the next population against the fitness function, and so on.
Only those solutions that fare well compared to other solutions in the population are
allowed to “survive”, so we are virtually guaranteed to evolve toward better solutions
than would previously have been created in a typical reactionary mode. The flowchart
below illustrates the various computational steps in our algorithm.
Figure 1: A Genetic Algorithm Flowchart
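The loop described above can be sketched generically: score, keep the fittest, breed, and repeat. The population-control rule here is elitist truncation, which is an assumption. The top `survivors` individuals pass unchanged into the next generation, and the rest of the population is refilled with mutated offspring of random survivor pairs. The bit-counting fitness in the usage lines is a placeholder, not a business measure.

```python
import random
from typing import Callable, List, TypeVar

G = TypeVar("G")


def evolve(initial: List[G],
           fitness: Callable[[G], float],
           crossover: Callable[[G, G, random.Random], G],
           mutate: Callable[[G, random.Random], G],
           generations: int,
           survivors: int,
           rng: random.Random) -> G:
    """Score, select, breed, repeat; return the fittest individual found.

    Population control here is elitist truncation (an assumption): the top
    `survivors` individuals (at least 2) pass unchanged into the next
    generation, so the best score never decreases.
    """
    size = len(initial)
    population = list(initial)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:survivors]          # the "fit" survive ...
        children: List[G] = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)      # ... and multiply
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children        # everyone else is discarded
    return max(population, key=fitness)


def one_point(a, b, rng):
    p = rng.randint(1, len(a) - 1)
    return a[:p] + b[p:]


def flip(g, rng):
    return [1 - x if rng.random() < 0.05 else x for x in g]


rng = random.Random(0)
start = [[rng.randint(0, 1) for _ in range(12)] for _ in range(20)]
best = evolve(start, sum, one_point, flip, generations=40, survivors=6, rng=rng)
print(sum(best), best)
```

In a scheduling system, the genome would be the list of job urgencies. The fitness function would express that genome into a schedule and score the schedule against the performance indexes.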
The Design Steps
To create such a system, we will need to perform the following steps:
1. Thoroughly examine the current business process. Determine all the required
measurement points, and propose additional ones that could be useful.
2. Create an object and/or data model that will accommodate the entire business
process with all its interdependencies.
3. Create the business system needed to record all the necessary data, or modify the
one that is already in place.
4. Determine how best to represent the process with a genetic code that we invent.
5. Create the algorithm that is the fitness function.
6. Create the algorithms to express the genes.
7. Create the population evolution code.
8. Determine how many populations (iterations) will be created, how many
individuals per population, how much raw computing power will be required,
what hardware to recommend, whether the algorithm’s execution can be distributed
among many machines in the organization, etc.
The Next Step – Genetic Programming
In our example above, we had to create an algorithm for the fitness function. We can
imagine this algorithm consisting of three elements: a profitability calculation, a customer
satisfaction index, and an employee satisfaction index. The profitability calculation is
straightforward: for each job, we simply calculate the total costs for labor and materials
and deduct them from the sales price to arrive at a gross margin for each job. The
customer satisfaction and employee satisfaction indexes, however, are not as
straightforward to calculate. How exactly does one calculate indexes like these? Is there
a better algorithm to calculate them? To answer these questions and to help us generate
better algorithms, we employ genetic programming.
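One simple way to combine the three elements is a weighted sum. This combination is assumed here rather than prescribed by the approach. The sketch computes gross margin exactly as described (sales price less labor and material costs). It treats the two satisfaction indexes as numbers supplied from elsewhere, since how to compute them is the open question raised above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JobOutcome:
    sales_price: float
    labor_cost: float
    material_cost: float


def gross_margin(job: JobOutcome) -> float:
    """Sales price minus total labor and material costs."""
    return job.sales_price - (job.labor_cost + job.material_cost)


def fitness(jobs: List[JobOutcome], customer_index: float, employee_index: float,
            weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of total gross margin and the two satisfaction indexes.

    The weights are hypothetical; in practice each term would first be
    normalized so that dollars and index points are comparable.
    """
    w_margin, w_customer, w_employee = weights
    total_margin = sum(gross_margin(job) for job in jobs)
    return w_margin * total_margin + w_customer * customer_index + w_employee * employee_index


jobs = [JobOutcome(1000.0, 300.0, 200.0), JobOutcome(500.0, 100.0, 100.0)]
print(gross_margin(jobs[0]))
print(fitness(jobs, customer_index=0.8, employee_index=0.6))
```

With equal weights, the dollar-denominated margin swamps the two indexes. This illustrates why choosing weights and normalization is itself a design decision.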
To calculate these indexes, we have to provide at the very least a parameter or set of
parameters that we are confident are measures of our actual performance in these areas.
We may not have a procedure in place to determine the values of these parameters; if not,
one would need to be created and added to our general business process, which means it
would also have to be incorporated in some way into our gene expression algorithms.
Once we have all the parameters and procedures we need to determine values for these
parameters, we have to create an algorithm that takes these parameters as inputs and
produces an output that is an accurate measure of the performance in these areas.
Although we may have some ideas on where to start, for many measures we cannot be
confident there is a satisfactory correlation between the values of our measures or
parameters (inputs) and the value of our index (output). We need a tool that determines
the fitness of our fitness algorithm, and a means to modify this algorithm and then test the
fitness of the new algorithm. This is where we introduce genetic programming to our
computational arsenal.
Genetic programming is distinct from genetic algorithms in that with genetic
programming we alter the content of a program, and therefore how a calculation is
executed. Here an individual in the population is a distinct program or procedure. Thus
we are creating new procedures or software from other procedures or software.
Otherwise, the process is similar to that with a genetic algorithm. At a high level, the
components of a genetic program are:
a. Functions and terminals. The parameters we include in our genetic program are
known as terminals. The functions are the procedures we introduce. Examples
would be mathematical procedures, such as addition, subtraction, and more
advanced functions.
b. Fitness function. This is a procedure that actually measures how well our
composite algorithm performs. In our example, we would measure how well the
customer satisfaction index actually measures customer satisfaction.
c. Crossover and mutation operations. We need to modify the actual computational
steps in our algorithm, and we do so in a manner similar to genetic algorithms.
But here individual discrete computation steps or groups of these steps are the
individual genes that we combine and mutate.
d. Population control procedures. We want to control the size of our population,
as well as which individuals participate in reproduction and to what extent they do
so.
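A minimal sketch of components (a) and (c) follows, using the common representation of a program as an expression tree. The function set (add, subtract, multiply) and the terminal names are hypothetical stand-ins for the measures a real customer satisfaction index might use. Subtree crossover grafts a random branch of one program onto a random point of another.

```python
import operator
import random
from typing import Callable, Dict, Iterator, Tuple, Union

# Function set: binary arithmetic procedures the program may combine.
FUNCTIONS: Dict[str, Callable[[float, float], float]] = {
    "add": operator.add, "sub": operator.sub, "mul": operator.mul,
}
# Terminal set: hypothetical measured parameters feeding a satisfaction index.
TERMINALS = ["on_time_ratio", "rework_count", "response_hours"]

# A tree is a terminal name, a numeric constant, or (function, left, right).
Tree = Union[str, float, tuple]


def evaluate(tree: Tree, inputs: Dict[str, float]) -> float:
    """Compute the index a program produces for one set of measured inputs."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, inputs), evaluate(right, inputs))
    if isinstance(tree, str):
        return inputs[tree]
    return float(tree)


def random_tree(depth: int, rng: random.Random) -> Tree:
    """'Grow' initialization: stop early at a terminal or constant, else add a function node."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS) if rng.random() < 0.8 else round(rng.uniform(-1, 1), 2)
    name = rng.choice(list(FUNCTIONS))
    return (name, random_tree(depth - 1, rng), random_tree(depth - 1, rng))


def subtrees(tree: Tree, path: Tuple[int, ...] = ()) -> Iterator[Tuple[Tuple[int, ...], Tree]]:
    """Yield (path, subtree) for every node; a path is a sequence of child indices 1 or 2."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree: Tree, path: Tuple[int, ...], new: Tree) -> Tree:
    """Return a copy of `tree` with the node at `path` swapped for `new`."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_at(node[path[0]], path[1:], new)
    return tuple(node)


def crossover(a: Tree, b: Tree, rng: random.Random) -> Tree:
    """Subtree crossover: graft a random subtree of `b` onto a random point of `a`."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace_at(a, path, donor)
```

Mutation can reuse the same machinery. Pick a random path in a program and call `replace_at` with a freshly grown `random_tree`.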
The fitness function in this case has to measure how much statistical correlation there is
between what we are using for parameters (measures) and the actual performance
(index). Thus with the genetic program, our fitness algorithms can “evolve” to a more
“fit” set of computation steps that produce an index that is tightly correlated to actual
performance. Furthermore, with this procedure we can determine whether we in fact have
sufficient inputs to determine an output: failure to demonstrate convergence toward a
solution would indicate that our inputs are insufficient.
The illustration below summarizes the process.
Figure 2: Algorithm Evolution with Genetic Programming
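The correlation-based fitness measure for a candidate index can be sketched as follows. The sketch assumes that a candidate index is any function from a period’s measured parameters to a number. It also assumes that an independent measure of actual performance, such as a customer survey score, exists for the same periods. Pearson correlation is one standard choice; rank correlation would serve as well. The sign is kept so that formulas that move opposite to actual satisfaction score poorly.

```python
from math import sqrt
from typing import Callable, Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant index (or constant outcome) carries no information
    return cov / sqrt(var_x * var_y)


def index_fitness(candidate: Callable[[Dict[str, float]], float],
                  history: List[Dict[str, float]],
                  measured: List[float]) -> float:
    """Fitness of a candidate index formula: correlation of its outputs with measured performance."""
    return pearson([candidate(record) for record in history], measured)


# Hypothetical history: measured inputs per period and a survey-based satisfaction score.
history = [{"on_time_ratio": 0.5}, {"on_time_ratio": 0.7}, {"on_time_ratio": 0.9}]
survey_scores = [2.0, 3.0, 4.0]
print(index_fitness(lambda r: r["on_time_ratio"], history, survey_scores))
```

Applied to the evolved expression trees, a `candidate` would simply wrap the tree evaluator. If the best correlation found stays low over many generations, the inputs are probably insufficient.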
More Genetic Programming
Next we may observe that there is another step in our original genetic algorithm that we
want to examine. When we settled on a gene parameter, we had to create algorithms that
express those genes. In order to initiate the process, we had to settle on a finite set of
values for these genes. We also had to settle on specific rules for the expression of each
gene. We will need to know if there is a potential opportunity for improvement in this
rule set as well, and we can do so by subjecting our gene expression algorithms to the
genetic programming process.
This will be more challenging to execute, however. For starters, there are constraints in
this step that we will be forced to include that were not present with the fitness functions
in the genetic algorithms. We cannot arbitrarily combine terminals and functions in
various ways as we could with these fitness functions. Furthermore, the gene expression
algorithms are much more complicated than the fitness functions, so there will be many
more potential variants.
The Emergence of an Adaptive System
The genetic program here can be designed so that our entire process is adaptive; in other
words, the program will evolve. In our original genetic algorithm, the individual genes
could only assume discrete values.
This value would be input to the expression
algorithm, which would then express the gene in one and only one way. We can allow
our genetic program to examine subtle differences between gene values. Also, we may
find through this process that there are more degrees of freedom that we want to include
in our genetic code. We can therefore design this genetic program so that it tests some of
these gray areas. By so doing, we may find one or more parameters need to be added to
our original genes as additional degrees of freedom.
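Moving from discrete to continuous gene values, and adding a new degree of freedom, might look like the sketch below. The `overtime_cap` field is a hypothetical example of a parameter the adaptive process might suggest adding; it is not part of the original design. Small Gaussian perturbations are one common way to probe the gray areas between the old discrete levels.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JobGene:
    urgency: float       # continuous in [0, 10] rather than a few discrete levels
    overtime_cap: float  # hypothetical added degree of freedom: weekly overtime hours, [0, 20]


def clamp(x: float, lo: float, hi: float) -> float:
    """Keep a value inside its permitted range."""
    return max(lo, min(hi, x))


def mutate(genome: List[JobGene], sigma: float, rng: random.Random) -> List[JobGene]:
    """Gaussian mutation: nudge each field slightly so nearby 'gray area' values get tested."""
    return [
        JobGene(
            urgency=clamp(g.urgency + rng.gauss(0.0, sigma), 0.0, 10.0),
            overtime_cap=clamp(g.overtime_cap + rng.gauss(0.0, sigma), 0.0, 20.0),
        )
        for g in genome
    ]


genome = [JobGene(urgency=7.0, overtime_cap=4.0), JobGene(urgency=3.5, overtime_cap=0.0)]
print(mutate(genome, sigma=0.5, rng=random.Random(1)))
```

Each field still varies independently of every other gene, so the independence rule from the original design is preserved.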
Here we would want to include actual historical data as a part of the fitness function. In
other words, we want to examine how well our gene values and expression algorithms
predict what actually happened. The fitness function here, however, is going to be a
challenge to design. Keep in mind that our original algorithm creates “more fit” solutions
from “less fit” ones. It does not settle on some specific answer. However, with each
solution comes a prediction of actual performance. We can examine actual versus
predicted performance, as well as the extent to which the recommendations provided by
the solution or solution set were actually executed. Obviously, if the solutions are virtually
ignored, then we cannot know how well our solution predicted actual performance.
So the question here becomes: is there an algorithm that through its execution in the
expression process more accurately predicts actual performance? In other words, if a
facility faithfully executed the solutions produced by the original genetic algorithm, and
after the fact we determine exactly how the facility performed, can we create an
expression algorithm that better predicts that actual performance? Better in this case
would mean more quickly (fewer computation steps or rounds), or more accurately
(predicted performance closer to actual performance, such as gross margin, etc.).
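A fitness measure for competing expression algorithms could reward accurate prediction, scoring only the periods in which the facility actually carried out the recommended solution. The sketch below uses gross margin as the predicted quantity and mean absolute error as the accuracy measure. The `compliance` field and the 0.8 threshold are hypothetical stand-ins for however execution would really be tracked. Speed (fewer computation steps) would need a separate term.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PeriodRecord:
    predicted_margin: float  # what the expressed solution predicted
    actual_margin: float     # what the facility actually achieved
    compliance: float        # hypothetical: fraction of recommendations actually executed, 0..1


def prediction_error(records: List[PeriodRecord], min_compliance: float = 0.8) -> Optional[float]:
    """Mean absolute prediction error over periods where the solution was largely followed.

    Lower is fitter. Returns None when no period qualifies, because if the
    recommendations were ignored we cannot judge the prediction.
    """
    usable = [r for r in records if r.compliance >= min_compliance]
    if not usable:
        return None
    return sum(abs(r.predicted_margin - r.actual_margin) for r in usable) / len(usable)


records = [
    PeriodRecord(predicted_margin=100.0, actual_margin=90.0, compliance=0.90),
    PeriodRecord(predicted_margin=200.0, actual_margin=230.0, compliance=0.95),
    PeriodRecord(predicted_margin=150.0, actual_margin=20.0, compliance=0.10),  # ignored
]
print(prediction_error(records))
```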
Our genetic program here will create and destroy variations on our expression algorithm
by the combination and mutation of the computational steps within the individuals in the
population. The solutions that emerge from this process will more accurately encode the
actual behavior of the facility itself. Therefore, the software will in effect begin to
“learn” new computational techniques based on what it “observes” from the behavior of
the facility. In the end, we may have a set of procedures in place that are very different
than those with which we started, that have been modified to account for the actual
human behavior in a facility, and that were created entirely by software. This is a
completely adaptive system.
The next diagram illustrates the entire adaptive system.
Figure 3: The Adaptive System
Final Thoughts
As IT has advanced, so has the complexity of the problems that can be addressed with IT
solutions. There comes a time when old tools are no longer appropriate, because with the
raw processing power now available, we can afford to look at old problems in new ways.
This is entirely possible with genetic algorithms augmented with genetic programming to
produce a completely adaptive software system. Genetic algorithms are computationally
intensive, and until recently they could not reasonably be expected to fully represent
complex processes while running on anything but a supercomputer. Now, with desktop
CPUs executing in excess of a billion operations per second, we can afford to take our
computational tools to the next level. Genetic programs and genetic algorithms are new
tools in the business application designer’s arsenal, allowing the programmer to design
solutions to problems that were computationally intractable in the past.
Sources
Julie Wakefield, “Complexity’s Business Model,” Scientific American, January 2001,
http://www.sciam.com/2001/0101issue/0101techbus1.html
Jaime Fernandez, The Genetic Programming Notebook, http://www.geneticprogramming.com/
Melanie Mitchell, An Introduction to Genetic Algorithms, MIT Press, 1996