BOOLEAN FUNCTIONS AND SOME ISSUES IN DISCOVERY AND INVENTION*

David A. Schum
School of Information Technology & Engineering
School of Law
George Mason University
Honorary Professor of Evidence Science
University College London

Revised: 1 April, 2005

* The author is very grateful for the support provided by the National Aeronautics and Space Administration under contracts # 526964 and # 526965 from the NASA Langley Research Center to George Mason University, and for the support provided by the Leverhulme Foundation and the Economic and Social Research Council [UK] to University College London.

1.0 INTRODUCTORY COMMENTS

I was very fortunate to have been asked by Professor Tomasz Arciszewski to participate in his research on a computer-based system called Inventor 2000 [and its successors], which had, as its initial objective, improvements in the design of wind-bracing systems for tall buildings. Equally fortunate has been my association over the past eight years with Col. Carl Hunt. I was privileged to serve as director of Col. Hunt's doctoral dissertation on a computer-based system called ABEM [Agent Based Evidence Marshaling]. At the time, Col. Hunt was Commanding Officer, Computer Crime Investigative Unit, Criminal Investigation Division, United States Army. This system has been designed by Col. Hunt to facilitate discovery-related tasks in criminal investigation and in many other contexts.

These two research activities may seem quite unrelated, but they are not. In a recent paper [Schum, 2001] I illustrated how these two research activities are in fact symbiotic and commensal. They are symbiotic since ideas generated in work on Inventor 2000 have been useful in work related to ABEM; ideas generated in work on ABEM have been useful in work on Inventor 2000. Two major items on the common table at which both research activities feed are the complexity of the processes under investigation in both research ventures and the role of imaginative, creative, or inventive thought that is driven by curiosity. The utility of Inventor 2000 goes far beyond the boundaries of the initial specific context in which it has been applied [wind-bracing systems for tall buildings]. It is in fact an elegant test bed for investigating a variety of issues in the study of discovery and invention. Similarly, work on the ABEM system raises issues that have great importance in any kind of discovery or investigative task, regardless of the specific context in which it is encountered. How we marshal or organize our existing thoughts and evidence greatly influences how successful we will be in generating new thoughts and new evidence.

In my own thinking about Inventor 2000 and ABEM I have, for various purposes, encountered the need for Boolean functions and their alternative means of expression. In my work thus far, I have described these functions and their alternative expressions rather casually. A major purpose of the present paper is to provide a more careful account of these useful formal devices. My apologies go out immediately to readers for whom the developments in this paper are already well known. My second objective, of course, is to try to generate new thoughts about the very difficult matters being investigated in research on Inventor 2000 and ABEM. This paper is being written for those interested in our present work and for those whose own thoughts may be stimulated by the formal developments and discussion to follow.

Boolean functions arise in many contexts, especially in probability theory.
Study of these functions and their alternative means of expression provides very useful methods for dealing with complex events to which probability measures are to be applied. But, as I hope to illustrate, these functions also arise in the study of many other phenomena, including those associated with our work on Inventor 2000 and ABEM. One current activity I will later discuss concerns Stuart Kauffman's work on self-organizing systems and interesting phase transitions that result when such systems are expressed in terms of Boolean functions and are exercised in various ways [Kauffman, 2000]. I will relate this work to our studies of evidence marshaling and discovery.

The search engine that drives Inventor 2000 [and its successor Inventor 2001] is based on the strategy of evolutionary computation as developed for the current projects by Professor Ken De Jong. This search and optimization strategy rests on algorithms that involve the genetically inspired processes of selection, recombination, random variation, and competition of, in our case, alternative wind-bracing and other designs. To be useful in our current work, I must relate the following ideas concerning Boolean functions to these genetically inspired processes. At the very least, the Boolean function ideas I now present give us a different language to employ in our work on the search and inquiry mechanisms necessary during the processes of discovery and invention. My major reference source on evolutionary computation is the recent work of Dumitrescu, et al [2000]. I begin with a definition of a Boolean function and the different ways in which these functions may be expressed and analyzed.

2.0 BOOLEAN FUNCTIONS AND THEIR CANONICAL FORMS

Every new faculty member beginning his/her academic career ought to have a colleague, friend, and mentor such as the one I had when I took my first academic position at Rice University in 1966. Professor Paul E. Pfeiffer, now Emeritus Professor of Mathematical Sciences at Rice University, served all three of these roles for me during my entire 20 years at Rice. We shared a common interest in probability theory. Paul had already written several notable works on probability theory and its applications before our association began. His Concepts of Probability Theory [Pfeiffer, 1965] is a classic and is still available in the Dover series. His earlier Sets, Events, and Switching [Pfeiffer, 1964] is also a classic that has been so helpful to many persons, not just those whose interest is in electrical engineering. How honored I was when Paul asked me to collaborate on a book entitled Introduction to Applied Probability [Pfeiffer & Schum, 1973]. Working with Paul on this book was one of the most enjoyable and profitable educational experiences of my life. After I left Rice for George Mason University in 1985, Paul wrote a much-expanded version of our work entitled Probability for Applications [Pfeiffer, 1990]. One of Paul's abiding concerns has been that students are rarely [except in his works] given very extensive tutoring in strategies for handling complex events, i.e. compound events that involve unions, intersections, and complementations of these events. All of my writings on evidence and probability for the past thirty years or so carry Paul's stamp.
I could not have proceeded in my work on evidence and inference without Paul's wise and patient tutoring on strategies for coping with situations in which we have many events to consider in probabilistic reasoning and in which we seek to relate and combine them in various ways.

By definition, a Boolean function (f) of a finite class A of sets [events] is a rule of combination of these sets based on a finite number of applications of the operations of forming unions, intersections, and complements. It will be convenient in what follows to let this finite class of sets [events] be represented by A = {An-1, An-2, ..., A1, A0}. As you see, A is simply a listing of n sets or events. The particular numbering of them beginning at (n - 1) and ending at zero has useful properties to be mentioned a bit later on. For some purposes, class A may involve listings of events that are not distinguished by subscripts; for example, A = {A, B, C}.

In writing a Boolean function of events there are certain conventions that I will follow that simplify the writing. The intersection symbol [∩] is usually suppressed. Thus, A ∩ B is written as AB. The union symbol [∪] is never suppressed. Another convention I will follow is to use a superscript c to indicate the complement or negation of an event: Ec = E-complement, or not-E. One more convention concerns the union or disjunction of two or more mutually exclusive events. Paul Pfeiffer used the conventional union symbol with a horizontal bar across the arms of the symbol to indicate a disjoint union. I cannot reproduce this disjoint union symbol on my computer and will use instead the symbol ⊕, read "circle-plus". Thus, A ⊕ B is read "A or B, but not both". At least one other work follows this convention [Gregg, 1998].

2.1 The Algebra of Sets: A Brief Review

Just in case you have forgotten the basic rules for combining sets in forming and analyzing Boolean functions, here are some of the basic rules. I will note the ones that are particularly important in the discussion to follow. First, let Ω represent a basic space or universal set of all possible elements or outcomes in some well-defined situation. In what follows I will refer to Ω as the universe of discourse or simply the space of all possibilities. Then let ∅ represent the empty or vacuous set. In probability, ∅ is called the impossible event. Let A, B, and C represent any subsets of Ω.

Complement Rules: A ∪ Ac = Ω; A ∩ Ac = ∅ [i.e. A and Ac form a partition of Ω, since A and Ac are mutually exclusive and exhaustive of Ω]; Ωc = ∅; ∅c = Ω; [Ac]c = A.

Identity Rules: A ∩ Ω = A; A ∪ Ω = Ω; A ∩ ∅ = ∅; A ∪ ∅ = A.

Idempotent Rules: A ∩ A = A; A ∪ A = A.

Commutative Rules: A ∩ B = B ∩ A; A ∪ B = B ∪ A.

De Morgan's Rules: (A ∪ B)c = Ac ∩ Bc; (A ∩ B)c = Ac ∪ Bc. [De Morgan's rules are very important in seeing what happens when we decompose or express a Boolean function in different ways using what I will later term minterms and maxterms.]

Associative Rules: (A ∪ B) ∪ C = A ∪ (B ∪ C); (A ∩ B) ∩ C = A ∩ (B ∩ C).

Distributive Rules: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C); A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C). [These two rules are also very important in decomposing Boolean functions.]

As you see, some of these rules involve the simplest possible Boolean functions; for example, f1(A, B) = A ∪ B; f2(A, B, C) = A ∩ (B ∪ C); or f3(A, B) = (A ∪ B)c. But we wish to have some way of analyzing Boolean functions that are not this simple and whose analysis requires us to express a given Boolean function in different ways. The first method of analysis I will mention involves expressing a Boolean function in what is called its disjunctive canonical form.
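Parenthetically, for readers who like to experiment, the rules just reviewed are easy to spot-check computationally. Here is a minimal sketch in Python; the universe Ω and the events A, B, and C chosen here are illustrative assumptions of mine only, and any other choices would serve equally well.

    # Spot-check several set-algebra rules with Python's built-in sets.
    # Omega and the events A, B, C are arbitrary illustrative choices.
    Omega = set(range(10))                   # the basic space (universe of discourse)
    A, B, C = {1, 2, 3}, {2, 3, 4, 5}, {5, 6}

    def comp(S):                             # complement relative to Omega
        return Omega - S

    # De Morgan's rules
    assert comp(A | B) == comp(A) & comp(B)
    assert comp(A & B) == comp(A) | comp(B)

    # Distributive rules
    assert A & (B | C) == (A & B) | (A & C)
    assert A | (B & C) == (A | B) & (A | C)

    # Complement rules: A and Ac form a partition of Omega
    assert (A | comp(A)) == Omega
    assert (A & comp(A)) == set()

    print("All set-algebra rules verified on this example.")

Changing Omega, A, B, and C to any other sets leaves these assertions intact, which is precisely the point of the rules.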
As you will see, this very useful method of analysis allows us to express any Boolean function in terms of the finest-grain partitioning of any basic space Ω.

2.2 Minterms and the Minterm Expansion Theorem

Since all Boolean functions involve events and their complements, each event A and its complement Ac forms a class of events that partitions a basic space Ω; i.e. the class of events Aj = {Aj, Ajc} partitions Ω. This I noted above in discussing the complementation property in the algebra of sets. But now suppose that we have a class A = {An-1, An-2, ..., A1, A0} of n events. A minterm [or minimal polynomial, as it is also called (see Birkhoff & Mac Lane, 1965, 323-324)] is an intersection set M of the form M = Yn-1Yn-2 ... Y1Y0 [that is, the intersection of the Yj for j = 0, 1, ..., n-1], where each Yj is either Aj or Ajc. The disjoint and exhaustive class of all such minterms is called the partition [of Ω] generated by class A. Thus, A is called the generating class for the partition and the individual Aj are called the generating events.

As an example of minterm generation consider the class of events A = {A2, A1, A0}. We might ordinarily suppose that none of the events in this class is empty, but this is not essential. In addition, it is not necessary to suppose that the events in a generating class are mutually exclusive. In the present example we have three classes of events A2 = {A2, A2c}; A1 = {A1, A1c}; and A0 = {A0, A0c}. Each of these three classes forms a partition of basic space Ω, since the events in each of these three classes are mutually exclusive and exhaustive of Ω. When we consider the joint partitioning of Ω in terms of these three classes of events we observe that we will have eight intersection sets or minterms, listed as follows:

  A2cA1cA0c   A2cA1cA0   A2cA1A0c   A2cA1A0   A2A1cA0c   A2A1cA0   A2A1A0c   A2A1A0

In general, when we have n events in some generating class A, we will generate 2^n minterms. Observe that the minterms we have generated in the above example are indeed mutually exclusive. The reason is that the pattern of complementation in each minterm is different. No element or outcome can reside in more than one of these minterms. This will be true in general for n events in a generating class. The resulting 2^n minterms will be mutually exclusive and also exhaustive of a basic space Ω.

There are different ways in which we can portray the collection of minterms generated by some class A. The first, shown below, is tabular in nature and serves to illustrate one very convenient way of keeping a systematic account of the minterms that are generated. This method leads to what is called the binary designator method for numbering minterms. I will illustrate this method using the case above in which we have n = 3 events in a generating class. First, if an event in a minterm does not carry a complementation we assign it the number 1; if it does appear complemented, we assign it the number 0. This of course preserves the binary nature of events in any class Ai = {Ai, Aic}. Taking minterm A2A1cA0 as an example, we assign it the number 101. This is called the binary designator for the minterm. In more current literature a binary designator is termed a bit string. The decimal equivalent for the bit string 101 is: 1(2^2) + 0(2^1) + 1(2^0) = 5. The following table shows the binary designators assigned to each of the eight minterms when we have n = 3 events, their decimal equivalents, and their minterm symbols.
                Binary Designator     Decimal      Minterm
  Minterm         A2   A1   A0       Equivalent    Symbol
  A2cA1cA0c        0    0    0           0           M0
  A2cA1cA0         0    0    1           1           M1
  A2cA1A0c         0    1    0           2           M2
  A2cA1A0          0    1    1           3           M3
  A2A1cA0c         1    0    0           4           M4
  A2A1cA0          1    0    1           5           M5
  A2A1A0c          1    1    0           6           M6
  A2A1A0           1    1    1           7           M7

This tabular binary designator account of minterms makes use of the particular ordering of events in a generating class A that I mentioned earlier. In determining the decimal equivalent for the numbering of minterms, the subscript on an event indicates the power to which the number 2 is to be raised. The event's binary designator simply indicates whether or not this power of 2 is included in the sum of powers of 2 across the events. Another example, for M2, is: 0(2^2) + 1(2^1) + 0(2^0) = 2. As I noted, binary designators can also be called bit strings.

For a variety of purposes it is useful to portray a collection of generated minterms in a variation of a Venn diagram called a minterm map. The figure below shows the minterm map for the example in which we have n = 3 events in a generating class. The advantage of a minterm map over a conventional Venn diagram is that it illustrates clearly the disjoint nature of the minterms and, as I will illustrate later, it allows us to portray analyses of Boolean functions in very orderly ways.

             A2c          A2
          A1c    A1    A1c    A1
  A0c     M0     M2    M4     M6
  A0      M1     M3    M5     M7

In the following discussion sloth overtakes me and I shall avoid having to write subscripts on events in some generating class A. As I consider larger classes of events and Boolean functions of these events the use of subscripts gets very tedious and is unnecessary. I can still preserve the binary designator ordering and labeling of minterms provided that I order the events in certain ways and preserve this ordering on the minterm maps I will provide. The following table illustrates my method for the case in which generating class A = {A, B, C}.

             Binary Designator    Minterm
  Minterm      A    B    C        Number
  AcBcCc       0    0    0          M0
  AcBcC        0    0    1          M1
  AcBCc        0    1    0          M2
  AcBC         0    1    1          M3
  ABcCc        1    0    0          M4
  ABcC         1    0    1          M5
  ABCc         1    1    0          M6
  ABC          1    1    1          M7

All that really matters here is that I preserve the ordering of the binary designators and assume that A would get subscript 2, B the subscript 1, and C the subscript 0. The minterm map I generate using class A = {A, B, C} is as follows:

             Ac           A
          Bc     B     Bc     B
  Cc      M0     M2    M4     M6
  C       M1     M3    M5     M7

One way of describing a minterm map is to say that it represents the finest-grain partitioning of a basic space Ω that is consistent with the definition of a Boolean function. The reason is that such functions only permit the consideration of binary event classes such as {A, Ac}. It's true of course that the phenomena underlying these events may have many possible states or levels or may even exist on a continuum. In any case, these states or levels, however many there are, can always be partitioned in a binary manner. For example, we can partition the heights of people, in theory a continuum, into the binary class: A = persons less than five feet tall, and Ac = persons five feet tall or over. So, if we could express some arbitrary Boolean function in terms of minterms we could also say that this function has been expressed in terms of the finest-grain elements that such functions allow. A Boolean function expressed in terms of minterms is often said to be expressed in its canonical form. The word "canonical" in mathematics is used to indicate some "standard" form in which a result might be expressed. However, as I will discuss, there is more than one way in which a Boolean function can be expressed in a canonical form.
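The bookkeeping in the tables and maps above is entirely mechanical, and readers may find it useful to see it mechanized. Here is a short Python sketch, offered only as an illustration of mine, that produces the binary designators, decimal equivalents, and minterm labels for any ordered generating class.

    from itertools import product

    # Generate the 2^n minterms of an ordered generating class, together with
    # their binary designators (bit strings) and decimal minterm numbers.
    def minterm_table(events):
        n = len(events)
        for bits in product([0, 1], repeat=n):        # rows in increasing order
            # bit 1 = event present; bit 0 = event complemented (suffix "c")
            term = "".join(e if b else e + "c" for e, b in zip(events, bits))
            number = sum(b * 2 ** (n - 1 - i) for i, b in enumerate(bits))
            yield number, "".join(map(str, bits)), term

    for number, designator, term in minterm_table(["A", "B", "C"]):
        print(f"M{number}  {designator}  {term}")     # M0 000 AcBcCc ... M7 111 ABC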
The first way of expressing a Boolean function in canonical form rests on an important result called the minterm expansion theorem. This theorem is expressed as follows: A Boolean function f of a finite class A = {An-1, An-2, ..., A1, A0} of sets [events] can be expressed in one and only one way as the disjoint union of a subclass of the minterms [M] in the partition generated by the class A. This theorem can be expressed in symbols as:

  F = f(An-1, An-2, ..., A0) = ⊕{Mi : i ∈ JF},

where JF is a uniquely determined subset of the index set J = {0, 1, ..., 2^n - 1}. It is understood here that the union expressed in this theorem is the disjoint union of this unique subset of minterms, since the minterms are by their construction mutually exclusive. The index set J here simply lists the 2^n possible minterms generated by class A. Proof of the minterm expansion theorem is given in Pfeiffer & Schum [1973, 169-170].

I pause here for a moment to note that this theorem informs us about how many different Boolean functions there are when such functions involve n events. If each different Boolean function involves a unique subset of minterms, as this theorem asserts, then all we need to do is to determine the number of unique subsets of minterms. The answer is: 2 raised to the power 2^n possible subsets. Thus, for a generating class involving just five events, there are 2^32 = 4,294,967,296 possible Boolean functions involving these five events. In just a moment I will mention how to determine whether two differently stated Boolean functions are in fact equivalent.

My next task is to discuss how we go about the task of expressing a Boolean function in disjunctive canonical form in terms of minterms. Many years ago I ran across a very good account of the necessary steps in this task in the classic text by Birkhoff and Mac Lane [1965, 3rd ed, 322-324]. There are four steps in this process. Some of the steps need to be applied more than once and, on occasion, some of the steps can be omitted. In addition, the order of the last two steps may be reversed. Here first are the four steps necessary; each step involves rules in the algebra of sets I summarized at the outset [one reason why I provided this summary]. I will then give some examples involving Boolean functions.

Step 1: Use De Morgan's law to move complements from outside to inside any parenthesis; for example, (AB)c = (Ac ∪ Bc); (A ∪ B)c = AcBc.

Step 2: Use the distributive law for intersection to move intersections inside parentheses; for example, A(B ∪ C) = (AB) ∪ (AC).

Step 3: Use the idempotency and complementary rules to omit certain terms; for example, AA = A; A ∪ A = A; AAc = ∅; A ∪ ∅ = A.

Step 4: Write equivalent expressions for any term that does not contain n events in its intersection. For example, for n = 2 and A = {A, B}, write A as A = AB ⊕ ABc. [Remember that ⊕ means the disjoint union.] As another example, when n = 3 and A = {A, B, C}, write ABc as ABcC ⊕ ABcCc.

Example 1: F = f1(A, B, C) = [A ∪ (B ∪ Cc)c]c
  F = Ac(B ∪ Cc)                                 [Step 1]
    = AcB ∪ AcCc                                 [Step 2]
    = (AcBC ⊕ AcBCc) ∪ (AcBCc ⊕ AcBcCc)          [Step 4]
    = AcBC ⊕ AcBCc ⊕ AcBcCc                      [Step 3]
  f1 = M0 ⊕ M2 ⊕ M3.

Example 2: F = f2(A, B, C) = (AB ∪ Cc)(Ac ∪ C)
  F = (AB)(Ac ∪ C) ∪ Cc(Ac ∪ C)
    = ABAc ∪ ABC ∪ AcCc ∪ CCc
    = ABC ∪ AcCc
    = ABC ⊕ AcBCc ⊕ AcBcCc
  f2 = M0 ⊕ M2 ⊕ M7.
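These four steps can also be checked by brute force: evaluate a Boolean function at all 2^n rows of its truth table and collect the rows where it is true. The short Python sketch below [my own illustration, not anything from Birkhoff and Mac Lane] verifies both examples. Comparing expansions obtained this way also gives a mechanical test of whether two differently written Boolean functions are equivalent, a point I return to shortly.

    # Minterm expansion by enumeration: evaluate f at every truth assignment
    # and keep the minterm numbers where f is true.
    def minterm_expansion(f, n):
        expansion = []
        for i in range(2 ** n):
            # bit (n-1-k) of i gives the truth value of the k-th event
            bits = [bool((i >> (n - 1 - k)) & 1) for k in range(n)]
            if f(*bits):
                expansion.append(i)
        return expansion

    # Example 1: f1(A, B, C) = [A ∪ (B ∪ Cc)c]c, written with and/or/not
    f1 = lambda A, B, C: not (A or not (B or not C))
    # Example 2: f2(A, B, C) = (AB ∪ Cc)(Ac ∪ C)
    f2 = lambda A, B, C: ((A and B) or not C) and ((not A) or C)

    print(minterm_expansion(f1, 3))   # -> [0, 2, 3], i.e. M0 ⊕ M2 ⊕ M3
    print(minterm_expansion(f2, 3))   # -> [0, 2, 7], i.e. M0 ⊕ M2 ⊕ M7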
I'll pause here for a moment to mention why it is so often important to be able to decompose a Boolean function into its unique collection of minterms. Later on I will give specific examples of the necessity of performing such decompositions in contexts such as engineering design and criminal investigations. In the two examples I have just provided, the Boolean functions f1 and f2 might represent general conditions, requirements, or statements that must be satisfied. The decomposition of these functions into unique disjoint collections of minterms simply provides a listing of all the specific ways in which these general conditions, requirements, or statements can be satisfied. Being able to provide listings of all the specific ways in which some complex Boolean expression can be satisfied turns out to have very important consequences in discovery and invention, regardless of the context in which these activities occur. Later on I will illustrate how these results from the minterm expansion theorem correspond to the idea of a schema in evolutionary computation [Dumitrescu, et al, 2000, 33-35].

As a final note on formal issues associated with Boolean functions and their minterm expansions, I return to a consideration of the number of Boolean functions that are possible given the n events in a generating class; the number is two raised to the power 2^n. As I noted above, this large number refers to the number of unique subsets of the 2^n possible minterms when we have n events in a generating class. One interesting fact is that two or more apparently different Boolean functions may in fact be equivalent in the sense that they can be expressed by the same unique subset of minterms. Another way of saying this is that two requirements, conditions, statements, or schema, expressed in terms of Boolean functions, may be saying the same thing, even though they are expressed differently. Here is an example:

Example: Let f1(A, B, C) = A ∪ BC, and let f2(A, B, C) = ACc ∪ (A ∪ B)C. Using the minterm expansion process I just discussed, we can easily determine that: f1 = f2 = M3 ⊕ M4 ⊕ M5 ⊕ M6 ⊕ M7. This means that these two, apparently different, Boolean functions are in fact equivalent and are saying the same things when they are decomposed into their specific minterm elements.

2.3 Another Canonical Form: Maxterms

As I have just illustrated, any Boolean function f can be expressed in canonical or standard form in terms of the disjoint union of a unique subset of minterms. But there is another formally equivalent canonical form of a Boolean function that arises from application of De Morgan's laws. We first define a maxterm as the union set Max = Yn-1 ∪ Yn-2 ∪ ... ∪ Y1 ∪ Y0, where each Yj is either Aj or Ajc. It happens that we can express any Boolean function in terms of the intersection of a unique subset of maxterms. This maxterm expansion will be equivalent to a corresponding minterm expansion.

I first encountered the idea of a maxterm in Paul Pfeiffer's Sets, Events, and Switching [1964, p 75]. I also found discussions of the essential ideas of maxterms in a probability book by Edward Thorp [1966, p 25], though he did not label these disjunctive expressions maxterms. All he says is that we can express any Boolean function in two equivalent forms, one of which involves the disjoint union of intersection sets and the other the intersection of disjunctive sets. In the book by Birkhoff and Mac Lane [1965, pp 324-325] the whole idea of maxterm expansions of Boolean functions is left as an exercise for the student. In my work on evidence and probabilistic reasoning these past years, I have never had any occasion to use a maxterm expansion of a Boolean function; but I have had many occasions to use minterm expansions.
Consequently, I have only recently taken an interest in maxterms and expansions of Boolean functions as intersections of unique subsets of maxterms. I was pleased to find one recent source of information about maxterms and their use in determining another canonical form of Boolean functions [Gregg, 1998, pp 117-121].

There are several ways in which we can generate a maxterm expansion of a given Boolean function. The easiest way, it seems, is to begin with a minterm expansion of this function and then apply De Morgan's law [twice] to it. I will illustrate this process using a Boolean function for which we already have a minterm expansion. Consider the function f(A, B, C) = (AB ∪ Cc)(Ac ∪ C), whose minterm expansion, as we saw above, is: ABC ⊕ AcBCc ⊕ AcBcCc = M0 ⊕ M2 ⊕ M7. In this case, in which we have n = 3 events in a generating class, there are eight possible minterms whose disjoint union is Ω. The remaining minterms not in the expansion of f(A, B, C) are: M1, M3, M4, M5, and M6. Let's first express their disjoint union: [M1 ⊕ M3 ⊕ M4 ⊕ M5 ⊕ M6]. Now observe that [M1 ⊕ M3 ⊕ M4 ⊕ M5 ⊕ M6]c = M0 ⊕ M2 ⊕ M7. If we now apply De Morgan's law twice to the left-hand side of this equality, we have:

  [M1 ⊕ M3 ⊕ M4 ⊕ M5 ⊕ M6]c = [M1c ∩ M3c ∩ M4c ∩ M5c ∩ M6c]
    = (AcBcC)c(AcBC)c(ABcCc)c(ABcC)c(ABCc)c
    = (A ∪ B ∪ Cc)(A ∪ Bc ∪ Cc)(Ac ∪ B ∪ C)(Ac ∪ B ∪ Cc)(Ac ∪ Bc ∪ C).

This last expression is the conjunctive maxterm expansion of f(A, B, C) = (AB ∪ Cc)(Ac ∪ C). Call this maxterm expansion EMAX and call the minterm expansion EMIN. From the developments above, it's clear that EMAX = EMIN; they are just different, but formally equivalent, ways of expressing a Boolean function in canonical or standard form. So, in this particular example: ABC ⊕ AcBCc ⊕ AcBcCc = (A ∪ B ∪ Cc)(A ∪ Bc ∪ Cc)(Ac ∪ B ∪ C)(Ac ∪ B ∪ Cc)(Ac ∪ Bc ∪ C).

In his work on Boolean algebra and circuits, Gregg provides a tabular method for generating both minterm and maxterm expansions of a given Boolean function. As far as minterm expansions are concerned, I believe the method I have employed is easier. I also believe the method I have used for maxterm expansions is easier. However, I will provide an example of Gregg's tabular maxterm expansion, since I will use this method in an illustration of some issues associated with Stuart Kauffman's interest in Boolean functions and phase transitions.

For a start, we can think of a table of binary designators, such as the ones shown earlier, as truth tables. Each row in these tables records answers to the questions: Do we have an A, a B, a C? In Row M0, for Ac, Bc, Cc, the answers are: No, No, and No, which we indicate by 0, 0, 0. For Row M5, for A, Bc, C, we have the answers Yes, No, Yes, which we indicate by 1, 0, 1. In short, 0 means no and 1 means yes in any of these rows. Here, first, is Gregg's entire tabular analysis, which I will explain step by step. In this example I will continue to use the function f(A, B, C) = (AB ∪ Cc)(Ac ∪ C).

  Binary Designators      (1)          (2)       (3) [Truth?]             (4)
     A    B    C      (AB ∪ Cc)    (Ac ∪ C)   (AB ∪ Cc)(Ac ∪ C)        Maxterm
     0    0    0          1            1              1                   --
     0    0    1          0            1              0             (A ∪ B ∪ Cc)
     0    1    0          1            1              1                   --
     0    1    1          0            1              0             (A ∪ Bc ∪ Cc)
     1    0    0          1            0              0             (Ac ∪ B ∪ C)
     1    0    1          0            1              0             (Ac ∪ B ∪ Cc)
     1    1    0          1            0              0             (Ac ∪ Bc ∪ C)
     1    1    1          1            1              1                   --

After listing the binary designators, or truth table, the first step in Gregg's method is to break up f(A, B, C) into its major parenthesized elements: (AB ∪ Cc) in Column 1 and (Ac ∪ C) in Column 2.
Then, going through the binary designators or truth table, row by row, we ask whether the combination of yes (1) and no (0) indications is consistent with the terms that head Columns 1 and 2. For example, consider Row 0 and the truth values [0, 0, 0]. This is consistent with (AB ∪ Cc) since we have Cc in this row. It is also consistent with (Ac ∪ C) in Column 2 since we have Ac in this row. So, for Row 0, we record a 1 under the terms shown in Columns 1 and 2, indicating that Row 0 is consistent with both of the terms shown in Columns 1 and 2. As another example, consider Row 2, whose truth values are [0, 1, 0]. This row of truth values for A, B, and C is consistent with the terms in both Columns 1 and 2. We have Cc for the term in Column 1 and Ac for the term in Column 2. We make the same truth determinations for each row of the table of binary designators or truth values for A, B, and C. Columns 1 and 2 show the results.

The next step is to consider the intersection of the two parenthesized terms in Columns 1 and 2; this gives us the Boolean function being analyzed: f(A, B, C) = (AB ∪ Cc)(Ac ∪ C). This entire function is shown in Column 3. The truth value for this entire function will be 1 (yes) if and only if its elements in Columns 1 and 2 are both true (i.e. both take the value 1). Observe that this entire function takes the value 1 only for Rows 0, 2, and 7. [Recall that f(A, B, C) in this case has the minterm expansion: M0 ⊕ M2 ⊕ M7.]

The third step focuses on those instances in which, for f(A, B, C) in Column 3, the truth value is zero. In this final step we take the union of A, B, and C in each such case and then complement any term in these expressions that takes a 1 in its corresponding binary designator. For example, consider Row 1 and its truth values [0, 0, 1]. We form the maxterm for this row by adding a complement to C in this disjunction, since C takes the value 1 in the binary designator. Thus, for Row 1 we have the maxterm (A ∪ B ∪ Cc) shown in Column 4. As another example, for Row 6 and its binary designator [1, 1, 0], the maxterm in Column 4 is: (Ac ∪ Bc ∪ C). If you compare the maxterms in Column 4 generated by this tabular method, you will see that they are the same as those I generated using De Morgan's laws twice over, starting with the minterm expansion for f(A, B, C) = (AB ∪ Cc)(Ac ∪ C).

In some instances, this tabular method might be quicker than the De Morgan laws method. The truth is that neither method is very speedy when we have Boolean functions to consider in which the number n of events in their generating class is very large. In some instances we will get lucky and be able to observe by inspection of a minterm map which minterms appear in an expansion of some Boolean function. When this does not happen, however, we know that there are two formally equivalent ways of expressing a Boolean function in canonical form, as I have just illustrated. As I noted above, and will illustrate further below, the virtue of minterm expansions is that they provide us with an account of all the specific and unique conjunctive combinations of the events in some generating class that satisfy a Boolean function of interest. The first two applications of Boolean functions I will now mention make use of minterm expansions.
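Before turning to applications, the two canonical forms can be tied together computationally. The sketch below, again only an illustration of mine, reuses the minterm_expansion helper given earlier: the maxterms of f are just the complements of the minterms absent from f's minterm expansion, which is the De Morgan argument in executable form.

    # Maxterm expansion via De Morgan: complement every minterm that is NOT in
    # f's minterm expansion, turning each absent minterm into a disjunctive factor.
    def maxterm_expansion(f, n, names="ABC"):
        present = set(minterm_expansion(f, n))
        factors = []
        for i in range(2 ** n):
            if i in present:
                continue
            # a 1-bit in the absent minterm becomes a complemented literal
            lits = [names[k] + "c" if (i >> (n - 1 - k)) & 1 else names[k]
                    for k in range(n)]
            factors.append("(" + " ∪ ".join(lits) + ")")
        return "".join(factors)

    f = lambda A, B, C: ((A and B) or not C) and ((not A) or C)
    print(minterm_expansion(f, 3))   # -> [0, 2, 7], i.e. M0 ⊕ M2 ⊕ M7
    print(maxterm_expansion(f, 3))
    # -> (A ∪ B ∪ Cc)(A ∪ Bc ∪ Cc)(Ac ∪ B ∪ C)(Ac ∪ B ∪ Cc)(Ac ∪ Bc ∪ C)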
3.0 SOME APPLICATIONS OF BOOLEAN FUNCTIONS

Here is a collection of thoughts about how Boolean functions and their canonical forms may be usefully employed in three related areas of ongoing research. All three of these areas involve processes having great complexity in which efforts are being made to discover or generate new ideas or to invent new engineering designs. As I noted earlier, I believe there to be common elements of these activities as they involve discovery and invention.

3.1 Inventor 2000/2001

In 2001 I tried my best to generate some ideas I hoped would be useful in work with Tom Arciszewski, Ken De Jong, and Tim Sauer on Inventor 2000/2001. In another document [Schum, 2001] I have mentioned some thoughts involving Boolean functions and their minterm expansions that may serve to stimulate the process of inquiry regarding the evolutionary mechanism according to which Inventor 2000/2001 generates new wind-bracing designs for tall buildings. The computational engine in this system makes use of the evolutionary processes of mutations and recombinations [e.g. crossovers], both of which can be construed as search mechanisms [Kauffman, 2000, pp 16-20]. This evolutionary process also involves selection, since at each new step of the evolutionary process only the fittest designs are selected and allowed to "mate" to produce new designs at the next iteration. In early studies using Inventor 2000, the fitness criterion was very simple and involved only one measure, namely the physical weight of the design. Multivariate and, presumably, nonlinear fitness functions are being contemplated.

One rather obvious element of the complexity of the evolutionary process in Inventor 2000/2001 concerns the size of the design space to be searched. All engineering designs have attributes, features, or characteristics; wind-bracing designs have many such attributes. The wind-bracing designs of concern in our early studies each had 220 attributes. But any design attribute has a number of possible states or levels. In these early studies, 108 design attributes had 7 possible states, 108 had 4 possible states, and 4 had 2 possible states. This makes the total number of different designs in this space T = [7^108][4^108][2^4] ≈ 29.76(10)^156, a preposterously large number. Inventor 2000 incorporated a feasibility filter that automatically eliminates any combination of attribute settings that would produce an infeasible or foolish design. Even if only one in every million designs were accepted by this filter as feasible/sensible, we would still have T* ≈ 29.76(10)^150 possible designs to search through in hopes of finding the fittest designs. If a computer could generate (10)^6 new designs every second [one every microsecond], it would take this system 9.44(10)^143 years to generate all the possible designs in this space.¹

¹ At (10)^6/sec., this makes 6(10)^7/minute, 3.6(10)^9/hr, 8.64(10)^10/day, and 3.154(10)^13/year. Since there are about 29.76(10)^156 possible designs, generating all of them would take about 29.76(10)^156/3.154(10)^13 ≈ 9.44(10)^143 years.

Looking through everything in the hope of finding something does not make sense, even when we have possibility spaces not nearly as preposterously large as the one in our studies using Inventor 2000/2001. This system makes use of search processes that mimic evolutionary mechanisms [mutations, crossovers, and selections] that one might say are tried and true, since nature has apparently used such mechanisms to produce an enormous diversity of species, including homo sapiens sapiens, many of which have a degree of fitness that has allowed them to survive for very long periods of time in the face of many environmental constraints.
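The arithmetic behind these estimates is easy to reproduce. The sketch below uses the attribute counts quoted above [108 seven-state attributes, 108 four-state attributes, and 4 binary attributes] and the assumed rate of one design per microsecond; it is an order-of-magnitude illustration only.

    # Size of the Inventor 2000 design space and an exhaustive-search estimate.
    T = 7 ** 108 * 4 ** 108 * 2 ** 4            # total designs, on the order of 10^157
    T_star = T // 10 ** 6                       # one-in-a-million feasibility filter
    per_year = 10 ** 6 * 60 * 60 * 24 * 365     # designs generated per year

    print(f"T has {len(str(T))} digits")                    # about 10^157 designs
    print(f"T* has {len(str(T_star))} digits")              # about 10^151 designs
    print(f"years to enumerate: about 10^{len(str(T // per_year)) - 1}")   # ~10^143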
The issue of the fitness of engineering designs raises even more issues of complexity. Clearly, the fitness or suitability of an engineering design is a multiattribute characteristic. At this point it seems an open question just how many attributes ought to be considered in evaluating the fitness of designs for wind-bracing or other engineering systems. Just identifying these individual fitness attributes is not enough; we need to assess their relative importance and, most importantly, to specify how they might combine in determining overall fitness. In most cases we can easily suppose that these fitness attributes interact in very subtle ways. This brings more combinatorics to mind since, if we had some number k of fitness attributes, there are 2^k - (k + 1) possible interaction patterns to consider [the number of subsets of the k attributes that contain at least two attributes]. One point here regarding the existence of complex interaction patterns is that overall fitness functions will almost certainly be nonlinear in nature. So, Inventor 2000/2001 has complexity all over the place.

If we construe the mechanisms of Inventor 2000/2001 as being search processes alone, we realize that a major element of the discovery of new and fitter designs lacks one essential ingredient, namely inquiry, the asking of questions. It's a fair guess that at no point in the last several billion years on earth did nature stop the evolutionary process to see how well it was proceeding and to ask how it might be improved. But we certainly have this capability. Ken De Jong has wisely proposed that the evolutionary computational mechanisms in Inventor 2001 be made "tunable" in the sense that we are allowed to adjust these mechanisms at various points in order for them to operate more effectively and efficiently. Effectiveness here presumably means assured evolutionary convergence toward maximal fitness regions in the preposterously large design space. Efficiency presumably means the speed at which such convergence might take place. In other words, how can we converge to these maximal fitness regions, wherever they may be, in the smallest number of evolutionary steps or stages? Such capabilities would certainly enhance the applicability of Inventor 2001, and its successors, in any area of engineering or in other contexts in which design improvements are continually being sought.

My thoughts now turn to the process of inquiry. In order to decide how best to make Inventor 2001 tunable we have to begin by asking some questions, knowing that not all of these questions will be productive and also knowing that some of these questions may seem quite impertinent. One major element of Carl Hunt's work on the ABEM system is this system's ability to help the user ask better or more strategically important questions during investigative activities, such as the criminal investigations in which he is particularly interested. By strategically important I mean that a question leads an investigator along a new productive line of inquiry. I shy away from saying that a strategically important question is always the "right" question. The reason is that we may have to ask a sequence of questions whose answers may eventually lead us to ask one that is "right" in the sense that its answers allow us to generate a hypothesis that contains some truth. When I began to think about how I could contribute to research on Inventor 2000 I had virtually no idea about what questions I should be asking about this system.
This was due in part to the complexity of this system's activities as well as the complexity of the evolutionary process this system attempts to capture in computational terms. It is also true that I am neither a structural engineer nor a computer scientist. As anyone who has ever studied the process of discovery knows, thought experiments and simulations are very useful. Indeed, they are frequently the only kind of studies one can perform, given the difficulty of performing more conventional empirical studies of the suitability of any methods alleged to enhance discovery-related activities. I had already discovered the difficulty of doing conventional empirical evaluations in my work on the design of computer-based systems to enhance the process of marshaling thoughts and masses of evidence in investigations in many contexts [e.g. Schum, 1999]. In my view, Inventor 2000/2001 might just as well be called Discoverer 2000/2001, because it is attempting to bring to light the fittest designs [there may be many of them] whose attribute combinations already exist among the T ≈ 29.76(10)^156 designs that are possible.

As I thought about how to begin to generate useful questions and thought experiments or simulations, it seemed obvious that I would need to consider much simpler situations. First, notice above that only four of the 220 wind-bracing design attributes are binary in nature. Suppose all of these design attributes were binary. In this case there would then be 2^220 ≈ 1.685(10)^66 possible designs [instead of the T ≈ 29.76(10)^156 designs that actually exist at present]. Not much real simplification here! Another possibility of course is to imagine that there are fewer design attributes, all of which are binary in nature. This is where I began to think about the possibility of Boolean functions, and their minterm expansions, as useful devices for simulating the evolutionary processes captured by Inventor 2000. As we all know, simulations are useful to the extent that they capture faithfully the most critical elements of the phenomena and activities being simulated. This issue is termed simulation fidelity. It seemed that I could capture the essential evolutionary processes involving mutations, crossovers, and selection of the fittest designs to use as parents in the generation of possibly even fitter designs. Here are some connections between Boolean functions, minterms, and these evolutionary processes.

3.1.1 Designs as Chromosomes or Minterms

To begin, suppose that instead of there being 220 wind-bracing design attributes there are just six such attributes, each of which has only binary states. In this highly simplified situation, the generating class A = {A, B, C, D, E, F}, where {A, Ac}, {B, Bc}, {C, Cc}, {D, Dc}, {E, Ec}, and {F, Fc} are the binary states of each individual event [attribute] class. In this case there are 2^6 = 64 possible minterms, each of which constitutes a unique design attribute combination. For example, one design is (ABCDcEFc), whose binary designator, or bit string, is (111010) and whose minterm number is M58. In words, we can say that this design has an A, B, C, and E, but no D and no F.
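The bookkeeping that takes a design to its bit string and minterm number is trivial to mechanize. Here is an illustrative sketch of mine for this six-attribute case; "present" holds the attributes that appear uncomplemented in the design.

    # Map a design to its binary designator and minterm number for the
    # simplified generating class A = {A, B, C, D, E, F}.
    ATTRIBUTES = "ABCDEF"                        # A gets the highest power of 2

    def design_to_minterm(present):
        bits = "".join("1" if a in present else "0" for a in ATTRIBUTES)
        return bits, int(bits, 2)

    bits, number = design_to_minterm({"A", "B", "C", "E"})   # the design ABCDcEFc
    print(bits, f"M{number}")                                # -> 111010 M58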
The following minterm map shows the numbering for the 64 minterms in this special case. The columns are indexed by the states of A, B, and C, and the rows by the states of D, E, and F:

                      Ac                            A
                Bc          B                 Bc          B
              Cc    C     Cc    C           Cc    C     Cc    C
  Dc Ec Fc     0    8     16   24           32   40     48   56
  Dc Ec F      1    9     17   25           33   41     49   57
  Dc E  Fc     2   10     18   26           34   42     50   58
  Dc E  F      3   11     19   27           35   43     51   59
  D  Ec Fc     4   12     20   28           36   44     52   60
  D  Ec F      5   13     21   29           37   45     53   61
  D  E  Fc     6   14     22   30           38   46     54   62
  D  E  F      7   15     23   31           39   47     55   63

                              Figure 1

In the evolutionary computation literature, the minterm map just shown would be called a search or representation space [X], whose individual members [minterms] are x ∈ X. The members x ∈ X are variously called: chromosomes, genotypes, genomes, or individuals [Dumitrescu, et al, 2000, 12]. As I will illustrate later, one advantage of a minterm representation for a design is that it identifies and distinguishes between genes on a chromosome [design] in a way that a bit string cannot do. One related matter concerns the formulation of Boolean functions. Consider again M58 = (ABCDcEFc), whose bit string is (111010). We cannot form any Boolean function using just a bit string; such functions require identification of the events whose possible binary states are indicated by 0 or 1. I do understand, however, that computation involves bit strings.

As I described earlier, a given Boolean function can be decomposed into a unique subset of either minterms or maxterms. There is an interesting relation between minterms and a term employed in evolutionary computation; the term is schema. Consideration of this term lets me introduce some additional ideas related to possible uses of Boolean functions in our work. First, suppose all chromosomes [or designs] of concern have n genes [design attributes] that each have binary states. We know that there are 2^n possible chromosomes, genotypes, or genomes; in other words, X = the complete set of these chromosomes. As just noted, X can be interpreted as a minterm map. But, since each chromosome is assumed to have binary states, we can also represent X, the chromosome space, as an n-dimensional hypercube: X = {0, 1}^n.

A schema of the chromosome space X is a string of length n involving the three possible symbols 0, 1, and *, where the asterisk symbol represents what is termed a "wild card" or a "do not care" entry [i.e. it could be either 0 or 1, we don't care which]. Consider the schema S = 1**0. In bit string terms, there are four chromosomes that are represented by this schema S; they are: 1000, 1010, 1100, and 1110. In event form, using minterms for this four-variable case, the associated minterms are: ABcCcDc, ABcCDc, ABCcDc, and ABCDc. In words, what this schema says is that we have a selection of all possible four-variable minterms having an A but not a D. But what was just said is a simple Boolean function f(A, B, C, D) = (ADc). This schema is just another way of representing all the chromosomes, minterms, or designs that satisfy f(A, B, C, D). The minterm expansion theorem simply allows us to identify all the designs or elements in a schema. A bit later I will illustrate how a variety of useful indices, including fitness criteria and feasibility filters, can be expressed as Boolean functions, any of which can be decomposed into a schema consisting of a unique subset of either minterms or maxterms. Simply stated, minterm expansions of Boolean functions act to identify all the possible instances of any schema that may be of interest.
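Enumerating the chromosomes covered by a schema is again a mechanical matter; here is a small illustrative sketch.

    from itertools import product

    # List every chromosome (bit string) matched by a schema, where "*" is the
    # wild-card ("do not care") symbol.
    def schema_instances(schema):
        options = [("0", "1") if s == "*" else (s,) for s in schema]
        return ["".join(choice) for choice in product(*options)]

    print(schema_instances("1**0"))
    # -> ['1000', '1010', '1100', '1110'], i.e. the minterm expansion of ADc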
3.1.2 Selection Mechanisms

The basic mechanisms employed in evolutionary computation involve selection, mutation, and some form of recombination of genetic elements. Such processes take place over time (t) and have some starting point t0. Let U(t0) represent the initial total universe or population of chromosomes/minterms available before any initial selection is made; i.e. U(t0) = X. Remember that X here represents the entire collection of 2^n chromosomes/minterms of length n. In work on Inventor 2000, U(t0) = T ≈ 29.76(10)^156 designs that are possible.

A digression is necessary at this point because of how chromosomes/minterms are described in the evolutionary computation literature I have read. It is asserted that the genetic algorithms involved in evolutionary computation suppose a population of independent individuals or chromosomes [e.g. Dumitrescu, et al, 2000, 26]. I believe there is a mistake in using the term independence with reference to individuals, designs, chromosomes, or minterms. I have gone to considerable lengths to show that chromosomes, designs, or minterms in U(t0) = X are mutually exclusive. This means that they cannot be independent [in a probabilistic sense]. The terms independence and mutual exclusivity are two of the most frequently misused terms in all of probability theory. Many persons use them interchangeably, when they are not interchangeable; if we have one we cannot have the other. The reason follows from elementary probability. Suppose two events E and F, each with non-zero probability. If these events are independent, then P(E|F) = P(E), which entails that P(EF) = P(E)P(F) > 0, since P(E) > 0 and P(F) > 0. But now suppose that the same events E and F, each having non-zero probability, are mutually exclusive, in which case P(EF) = 0. They cannot be independent, since independence would require P(EF) = P(E)P(F) > 0. What all this says is that any design can have only one unique chromosome or minterm.

Now, there are three instances in which independence does arise in the consideration of chromosomes, designs, or minterms. The concept of Boolean functions helps us to identify the first form of independence. In decomposing a Boolean function into minterms or maxterms, no specification is required regarding whether the events in the Boolean function are either independent or mutually exclusive. For example, suppose f(A, B, C) = ABc. Decomposed into minterms, f(A, B, C) = ABcC ⊕ ABcCc, regardless of how we might believe events A, B, C [and their complements] to be related. Now here's the point: calculating a probability of any minterm does require knowledge about the relationship between events in a minterm. For example, suppose we believe that events A, B, and C are completely independent; then P[f(A, B, C)] = P(ABc) = P(A)P(Bc)P(C) + P(A)P(Bc)P(Cc). Stated in genetic terms, two different chromosomes are by nature mutually exclusive. However, their genetic elements may or may not be independent; perhaps the existence of an allele of one gene makes the existence of an allele in another gene more probable or less probable.

A second independence at issue is of great importance in studying the fitness of chromosomes/designs/minterms. Genes in a given chromosome may interact, or be non-independent, in their influence on the fitness of chromosomes. Here's where nonlinearity and great complexity enter the picture. Trying to discover the nature of these interactions among genes that influence fitness is perhaps the greatest challenge facing discovery in the evolutionary process.
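Returning for a moment to the first form of independence, the minterm-based probability calculation above [P(ABc) when A, B, and C are independent] looks like this computationally. The marginal probabilities used here are illustrative assumptions of mine only.

    from itertools import product

    # P[f] as the sum of minterm probabilities, with the events assumed to be
    # completely independent; the marginals below are arbitrary illustrations.
    marginals = {"A": 0.5, "B": 0.3, "C": 0.8}
    f = lambda A, B, C: A and not B                  # f(A, B, C) = ABc

    total = 0.0
    for bits in product([True, False], repeat=3):
        if f(*bits):
            prob = 1.0
            for name, b in zip("ABC", bits):
                prob *= marginals[name] if b else 1 - marginals[name]
            total += prob

    # P(A)P(Bc)P(C) + P(A)P(Bc)P(Cc) = P(A)P(Bc) = 0.5 x 0.7
    print(total)                                     # -> 0.35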
A final form of independence concerns the selection process itself, to which we now return. The conditions of sampling or selection of chromosomes govern this form of independence. If selection is made with replacement, then the probability of generating [selecting] a chromosome does not change from trial to trial, in which case we say that the trials are independent. But the actual selection of chromosomes in evolutionary computation proceeds along other lines, as I will now describe.

Returning to U(t0) = X, the initial universe or population of individual chromosomes, suppose a selection is made from this initial universe to begin the genetic processes of mutation and/or mating [for purposes of recombination or the transfer of genetic elements among chromosomes]. Let this initial selection of individual chromosomes be represented by M(t0), where M(t0) is a subset of U(t0) = X. We might refer to M(t0) as the initial conditions of the evolutionary process, since it is from this initial selection of chromosomes that the evolutionary process gets started. In many examples of evolutionary computation, random sampling from U(t0), assuming a uniform probability distribution, is used to select members of M(t0). But there are other strategies, one of which is called partially enumerative initiation [Dumitrescu, et al, 2000, 31-32]. This strategy makes use of the schema, discussed above, that can be represented as Boolean functions. Using some fitness criteria, suppose that various interesting schema of a specified size are identified. This form of sampling involves assuring that at least one instance of each identified schema is included in the sample selected. Another strategy is called doping. Using this strategy, M(t0) might be "doped" by the insertion of some very fit chromosomes. This assumes, of course, that fitness is readily recognizable or that it might be easily guessed or inferred. Some knowledge of fitness formed the basis for the feasibility filter in Inventor 2000 that I mentioned earlier.

There's another very interesting matter concerning selection that I have spent some time trying to analyze formally; it concerns variability in the gene pool of selected samples such as M(t0), as well as in other samples selected at later stages in the evolutionary process. As we know, genetic variability among parent chromosomes being mated acts to promote diversity in the genetic characteristics of offspring. Ensuring genetic diversity at each stage of the evolutionary process seems one major objective as the process lurches toward regions of greater fitness. Absent such variability, the process might wallow in some non-ideal region of a fitness landscape. This is one reason why incest is not especially adaptive. Mutations at various stages of the evolutionary process also help to prevent wallowing at a certain fitness level.

Here is a strategy in which an attempt is made to promote genetic variability in the subsequent offspring of some collection of parent chromosomes. We first need to determine how many different chromosome pairs are possible in some universe or population from which some selection is to be made. The following arguments apply whether the population of concern is U(t0) or some population that arises at a later stage in the evolutionary process. The major assumption in what follows is that the genetic elements of any individual chromosome exist in binary states, such as those I have considered in my discussion of minterms. First, we already know that, when chromosomes have length n, there are 2^n possible different chromosomes/minterms. There are 4^n possible pairs of chromosomes in some U(t0) = X.
This arises from the fact that there are four possible pairings of the binary genetic elements of any chromosome. For example, if {A, Ac} form the binary states of gene A, then we can pair these states in two chromosomes in the following four ways: (A, A); (A, Ac); (Ac, A); and (Ac, Ac). Now, analyses of genetic variability or diversity among pairs of chromosomes require that we have a way of determining how many chromosome pairs, among the 4^n that are possible, have exactly k genetic elements in common, where 0 ≤ k ≤ n. As I have shown elsewhere [Arciszewski, Sauer, & Schum, 2002, 55-56], letting Γ(k) be the number of pairs of chromosomes having exactly k genetic elements in common, Γ(k) = C(n, k)2^k·2^(n-k) = C(n, k)2^n. [I use the expression C(n, k) to represent the process of selecting k elements from n distinguishable elements with replacements not allowed. As we know, C(n, k) = n!/k!(n-k)!.] Across values of k, the distribution of Γ(k) is symmetrical, as one would expect given the binary nature of the genetic elements involved in these combinatorics.

Now, what we need to show genetic variability in chromosome pairs is not the number of elements they have in common, but the number of elements that are different. This turns out to be exactly what the Hamming distance shows us. For any two bit strings of equal length, the Hamming distance shows the number of different elements in the strings. For example, suppose the two bit strings (0010) and (0101). Their Hamming distance is 3, since the last three bits in each string are different. Chromosome pairs whose Hamming distance is largest will contribute most to genetic variability.

I cannot easily illustrate the initial or later selection method I have in mind using short chromosomes, or simple designs, such as those shown in the minterm map in Figure 1 above. Suppose instead a situation in which chromosomes/minterms/designs have twenty genes or attributes, each of which exists in binary states. This represents an initial universe U(t0) = X of 2^20 = 1,048,576 possible chromosomes/designs. In this situation there are 4^20 ≈ 1.099512(10)^12 possible different chromosome/design pairs. Table 1 below shows how many of these chromosome pairs have Hamming distances between zero and twenty. The situation shown in Table 1, though analytically tractable, still results in astronomical numbers and begins to resemble the complexity of the actual design situation faced in Inventor 2000. Because of the symmetry of this unimodal distribution, the mean, median, and mode coincide. If a single chromosome pair were picked at random from this distribution, the most likely consequence would be that this pair has Hamming distance 10. This table is helpful in illustrating various strategies that may be employed in determining an initial M(t0) or a similar mutation/mating population at any later stage in evolutionary computation. The purpose of such strategies, again, is to ensure that the gene pool has high variability at each stage of evolutionary computation. In the case shown, involving 20 binary genes [design attributes], we see that there are over a million chromosome pairs that have Hamming distance 20. However, this represents just a vanishingly small proportion of the total number of pairs. If the individuals in M(t0) were selected from among these 1,048,576 pairs, this would ensure that the initial conditions of the evolutionary process favored the most variable gene pool possible in the case in which binary chromosomes are of length twenty.
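The distribution reported in Table 1 below can be generated directly from Γ(k) = C(n, k)2^n; here is a sketch for the n = 20 case.

    from math import comb

    # Gamma(k) = C(n, k) * 2^n counts the chromosome pairs sharing exactly k
    # genetic elements; their Hamming distance is n - k.  Reproduces Table 1.
    n = 20
    total_pairs = 4 ** n
    cumulative = 0
    for k in range(n + 1):
        gamma = comb(n, k) * 2 ** n
        cumulative += gamma
        print(f"common={k:2d}  Hamming={n - k:2d}  Gamma={gamma:>16,d}  "
              f"prop={gamma / total_pairs:.3f}  cum={cumulative / total_pairs:.3f}")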
Less stringent strategies are possible, of course. Observe in Table 1 that just six tenths of one percent of all chromosome pairs have Hamming distance at least 16, but this is still a very large number [6,496,976,896 chromosome pairs]. Another consequence of the distribution in Table 1 is that, if the selection of pairs for M(t0) were done at random, about 73.6% of these pairs would have a Hamming distance between 8 and 12. Knowledge of the proportion of common genetic elements in chromosome pairs, such as Table 1 below provides, is helpful in decisions about what initial conditions might be constructed for any run of evolutionary computations.

  Common     Hamming                           Proportion of    Cumulative
  Elements   Distance         Γ(k)             Total Pairs      Proportion
      0         20            1,048,576          0.000+           0.000+
      1         19           20,971,520          0.000+           0.000+
      2         18          199,229,440          0.000+           0.000+
      3         17        1,195,376,640          0.001            0.001
      4         16        5,080,350,720          0.005            0.006
      5         15        1.62571(10)^10         0.015            0.021
      6         14        4.06428(10)^10         0.037            0.058
      7         13        8.12856(10)^10         0.074            0.132
      8         12        1.32089(10)^11         0.120            0.252
      9         11        1.76118(10)^11         0.160            0.412
     10         10        1.93731(10)^11         0.176            0.588
     11          9        1.76118(10)^11         0.160            0.748
     12          8        1.32089(10)^11         0.120            0.868
     13          7        8.12856(10)^10         0.074            0.942
     14          6        4.06428(10)^10         0.037            0.979
     15          5        1.62571(10)^10         0.015            0.994
     16          4        5,080,350,720          0.005            0.999
     17          3        1,195,376,640          0.001            0.999+
     18          2          199,229,440          0.000+           0.999+
     19          1           20,971,520          0.000+           0.999+
     20          0            1,048,576          0.000+           1.000
  Totals                  1.099512(10)^12        1.000

                               Table 1

I now return to the selection process as it unfolds over time. I hope Figure 2 will help clarify various choices we have in allowing evolutionary processes to unfold. As I noted above, in my present example involving chromosomes having twenty binary genetic elements, there are U(t0) = 2^20 = 1,048,576 different or unique chromosomes. From this number we select M(t0) chromosomes, according to a certain strategy, to represent the initial mating/mutation population that will result in our first generation of "children" chromosomes, obtained from the recombination and/or mutation of the chromosomes in this selection. As Figure 2 illustrates, let C(t1) represent the first generation of "children" produced by this mating/mutation process. I understand that it might not be entirely appropriate to call the new individuals in C(t1) "children" if the first evolutionary process only involved mutations. We could of course call them "single-parent children". In any case, here is where fitness criteria enter the picture, if they have not already done so in the initial selection of M(t0). Suppose some selection of the "fittest" children in C(t1) is made to form some new mating population M(t1). In Figure 2 this selection process is indicated by the bold arrow from C(t1) to M(t1). But in some instances of evolutionary computation I have seen, individuals from the "parent" population M(t0) that produced C(t1) might be included as well in the formation of the new mating population M(t1). This I have indicated by the thin arrow [with a question mark] in Figure 2. The idea seems to be that not all of the new children in the first generation are necessarily more fit than their parents; some may be even less fit than their parents. So, particularly fit parents might be included in the mating population that will produce a second generation [C(t2)].

There are several concepts associated with selection that I should mention here. The first, termed the generation gap, refers to the percentage of a population that is to be replaced in each generation.
As an example, in Figure 2, at t1, we began with a collection of individuals M(t0), those initially selected to form a mating/mutation pool. As a result of this mating/mutation process, we now have our first generation containing new chromosomes C(t1). The generation gap here refers to how many of these individuals will be selected for replacement to form the second generation. Another term is selection pressure, which refers to the degree to which highly fit individuals in one generation are allowed to produce offspring in the next generation. Consider again C(t1) in Figure 2. Various strategies are possible in selecting from among these individuals to form the mating population M(t1) whose offspring will form the new generation C(t2). If selection pressure is low, then each individual in C(t1) has a reasonable chance to be included in M(t1). High values of selection pressure favor the fittest individuals in each generation. Two correlated consequences of selection are termed genetic drift and premature convergence. Genetic drift is usually associated with a loss of genetic diversity and causes an evolving population to cluster around a particular fitness value, even when there are other higher fitness regions. Premature convergence is a way of saying that evolving populations converge to only locally optimal regions in fitness space. This can also happen when a few comparatively fit individuals are allowed to dominate by being selected again and again in successive mating/mutation processes. This acts to decrease genetic diversity. In general, there seem to be two major objectives to be served in the selection process. The first is to ensure high reproductive chances for the fittest individuals. The second is to preserve maximum genetic diversity in order to explore as much of the search space as is possible. Different evolution strategies involve various tradeoffs between these two objectives.

3.1.3 Search Involving Recombinations and Mutations

The next issue to be examined concerns the various forms of mating that have been identified [Dumitrescu, et al, 2000, 120-121]; they identify seven different mating paradigms. I can easily illustrate these seven methods using the minterm map in Figure 1 on page 14 for binary chromosomes/minterms of length six. The first thing I will note is that there are 4^6 = 4096 possible chromosome pairs in this six-variable case. I will also use the notation for collections of chromosomes I introduced in Figure 2 above on page 18. First, suppose that U(t0) = X is the entire minterm map shown in Figure 1 on page 14. From this collection of 64 possible chromosomes/minterms we are to select some number M(t0) ≤ 64 for mating operations and possibly mutations. The first, and apparently the most popular, method involves random mating in which mates [pairs of chromosomes/minterms] are selected at random from M(t0). A second method involves inbreeding in which similar parents are intentionally mated. For example, we might choose to mate minterms M50 and M58 because their Hamming distance is just 1 [M50 has Cc and M58 has C]. The next mating strategy is called line breeding where one unique very highly fit individual is bred with other members of M(t0) and the offspring are selected as parents for the next generation. Suppose M44 is the lucky chromosome that gets chosen to mate with each one of, say, 20 other minterms that are randomly chosen to form M(t0).
There is, of course, an important question gone begging in this strategy; it assumes that we have a well-defined fitness function and could measure the fitness of every chromosome/minterm on the map. If we knew what was the fittest one of the lot, we would have no search problem on our hands. A fourth method is called out-breeding in which very different individual chromosomes/minterms are mated. Extreme examples in the minterm map on page 14 are M0 and M63, which are among the C(6, 0)2^6 = 64 chromosome pairs whose Hamming distance is the maximum 6. Having selected this chromosome pair, we might then select any of the other C(6, 1)2^6 = 384 chromosome/minterm pairs that have only one genetic element in common. The fifth method is called self-fertilization in which an individual is combined with itself. I may be missing something here, but I fail to see what is accomplished by this method. Crossovers would not, by themselves, produce new children; the parent would continue to clone him/herself. Only if mutations were added would any new children eventually result. A sixth method, called cloning, occurs when an individual chromosome is never replaced and is added, without modification, to every new generation. A seventh method is called positive assortative mating and occurs when similar individuals are mated. In the minterm map on page 14, exactly C(6, 6)2^6 = 64 chromosome pairs will have all six genetic elements in common and exactly C(6, 5)2^6 = 384 will have exactly five genetic elements in common. Its counterpart, called negative assortative mating, sounds very much like out-breeding where dissimilar individuals are mated. An eighth method might be added to the list that involves incestual matings of various sorts; parents mating with children, children of the same parents mating together, and even children mating with their grandparents. In simulations I have performed, not all of these incestual matings simply produce copies of already generated chromosomes. However, such matings do severely restrict genetic diversity.

Following are various strategies that have been considered for recombination of chromosome elements, not all of which involve crossovers of the genetic elements of two "parent" chromosomes; and not all of which are observed in nature. I begin with recombinations alone and then later consider them in combination with mutation operations. Such recombination and mutation operations are designed to enhance the exploration of more extensive regions of a fitness landscape. Here first is a selection of crossover strategies that have been employed; there are others as well.

Single Point Crossovers: Suppose two chromosomes having n genes. Crossovers, or the exchange of genetic elements among these two chromosomes, are accomplished by selecting a crossover point for this exchange. If chromosomes have n genes, then there are n - 1 single crossover points. In most cases of evolutionary computation single crossover points are selected at random. Here is a case involving two parent chromosomes P1 and P2 where each parent has six binary genetic elements. Crossover point 2 has been selected and, after the exchange of genetic elements, two children chromosomes C1 and C2 result:

P1:  A  Bc | C  D  Ec Fc   [M44]
P2:  Ac B  | Cc D  E  F    [M23]
C1:  A  Bc | Cc D  E  F    [M39]
C2:  Ac B  | C  D  Ec Fc   [M28]

Referring to the six-variable minterm map on page 14, we mated minterms M44 and M23 and, after crossover, produced M39 and M28 as children.
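Here is a minimal sketch of this single-point crossover in Python [my own illustration; the function name and bit-string representation are not taken from Inventor 2000 or from Dumitrescu et al]:

    def single_point_crossover(p1, p2, point):
        # Exchange everything beyond the crossover point between two bit strings.
        return p1[:point] + p2[point:], p2[:point] + p1[point:]

    p1 = format(44, '06b')          # M44 = 101100 = ABcCDEcFc
    p2 = format(23, '06b')          # M23 = 010111 = AcBCcDEF
    c1, c2 = single_point_crossover(p1, p2, 2)
    print(int(c1, 2), int(c2, 2))   # 39 28, i.e. children M39 and M28

The same two lines of slicing arithmetic reappear, with more cut points, in the two-point and N-point operations below.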
It is apparent that children different from their parents will usually [not always] result from different crossover point settings. Such differences will not happen when genetic elements are the same for each parent below a single crossover point.

Two-Point Crossovers: In the case of two chromosomes each having n genetic elements, there are C(n - 1, 2) possible settings of two crossover points. Following is an example when n = 6 and we have the same two parents [M44 and M23] as in the previous example; the crossover points here fall after the second and fifth genes:

P1:  A  Bc | C  D  Ec | Fc   [M44]
P2:  Ac B  | Cc D  E  | F    [M23]
C1:  A  Bc | Cc D  E  | Fc   [M38]
C2:  Ac B  | C  D  Ec | F    [M29]

As you observe, the children of these same two parents differ from those produced by the single-point crossover.

N-Point Crossovers: The number of possible crossover points is obviously limited by the number of genes on any pair of chromosomes. One final example involves the same two parents M44 and M23 with three crossover points, here set after the first, third, and fifth genes:

P1:  A  | Bc C  | D  Ec | Fc   [M44]
P2:  Ac | B  Cc | D  E  | F    [M23]
C1:  A  | B  Cc | D  Ec | F    [M53]
C2:  Ac | Bc C  | D  E  | Fc   [M14]

As you see, different children are produced here from those produced by either of the two other crossover forms I have just illustrated. A very interesting issue arises here concerning crossovers, schemata, and their associated Boolean functions. The use of just a single crossover point can cause difficulties because it can prevent us from generating new children that are associated with some given schema of interest. Remember that schemata can be expressed as Boolean functions for binary genetic settings. Schemata can represent such things as expected successful or fit genetic combinations. Consider the following situation in which we have chromosome pairs having, say, eleven genes A through K. Here are two schemata that I will express in two ways: S1 = (01*******11) = (AcB*******JK), and S2 = (***101*****) = (***DEcF*****). First consider S1 and the following Boolean function and its interpretation. Let f1(A, B, C, D, E, F, G, H, I, J, K) = AcBJK. This schema says: "Offspring will be fit to degree Y if they have B, J, and K, but not A". Now let f2(A, B, C, D, E, F, G, H, I, J, K) = DEcF. This schema says: "Offspring will be fit to degree Y if they have D and F, but not E". We can, of course, combine these two Boolean functions to read: "Offspring will be successful to degree Y if they have B, J, and K, but not A, and also have D and F, but not E". The Boolean function here will be f1,2 = (AcBJK)(DEcF). Minterms or bit strings associated with f1,2, as just described, will have the following form: S3 = (01*101***11) = (AcB*DEcF***JK). As I mentioned earlier, the minterm expansion theorem assures us that we can identify all the 2^4 = 16 specific chromosomes [minterms] that will be associated with S3. As discussed by Dumitrescu et al [2000, 109], if we mate two chromosomes represented by S1 and S2, and if we adopt a single-point crossover, we will be unable to have, as offspring, children that satisfy S3. But we will have such desirable offspring if we choose a two-point crossover operation instead. It happens that single crossover points can act to prevent the occurrence of offspring that are associated with complex fitness specifications. This is one reason why nature often employs multiple crossover points, and so do many persons engaged in studies employing evolutionary computation.
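The point about single versus two-point crossovers can be checked directly. In the sketch below [the particular parent strings are my own choices; each merely instantiates its schema], no single cut of a chromosome matching S1 with one matching S2 yields a child matching S3, but a pair of cuts does:

    def matches(chrom, schema):
        # True when the bit string is an instance of the schema ('*' = free).
        return all(s == '*' or c == s for c, s in zip(chrom, schema))

    S1, S2 = '01*******11', '***101*****'
    S3 = '01*101***11'
    p1, p2 = '01000000011', '00010100000'   # instances of S1 and S2
    n = len(p1)

    one_cut = {p1[:c] + p2[c:] for c in range(1, n)} | \
              {p2[:c] + p1[c:] for c in range(1, n)}
    two_cut = {p1[:c] + p2[c:d] + p1[d:]
               for c in range(1, n) for d in range(c + 1, n)} | \
              {p2[:c] + p1[c:d] + p2[d:]
               for c in range(1, n) for d in range(c + 1, n)}

    print(any(matches(ch, S3) for ch in one_cut))   # False
    print(any(matches(ch, S3) for ch in two_cut))   # True, e.g. cuts at 3 and 6

The reason is visible in the schemata themselves: S1's defining genes flank S2's on both sides, so one cut always strands part of S1 on the wrong side of the exchange.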
Adaptive [Punctuated] Crossovers: Crossover operations need not be fixed throughout the evolutionary process; indeed these operations can be made to evolve themselves in light of results obtained in preceding generations. One such method, adaptive [punctuated] crossover, is described by Dumitrescu et al [2000, 110-111] as a method involving the self-adjustment of crossovers as evolutionary computation trials proceed. If a selected crossover point produces a good outcome it is retained; if not, it is allowed to die and new ones are introduced. Such a strategy requires recording of crossover points along with the genetic information in any chromosome.

Segmented Crossover: This is a variation of the N-point crossover method in which the number of crossover points is not held constant over trials. Different numbers of crossover points are selected randomly as trials proceed.

Uniform Crossover: In this method there are no pre-defined crossover points. Instead, the state of any gene in an offspring is selected at random, according to various probability distributions, from the gene states of its parents.

I next consider the operation of mutation as an evolutionary operation. The effect of this operation is generally to change the state of an individual genetic element in a chromosome. For example, in a chromosome having gene state A, this state may be mutated to state Ac. It happens of course that the states of more than one gene in a chromosome may be altered by mutation. In evolutionary computation, as in the evolution of living organisms, mutation is a probabilistic [or stochastic] rather than a deterministic process. A mutation probability, or mutation rate, pm is defined. This probability refers to the likelihood that any single genetic element of any chromosome will be altered by mutation. Suppose at time ti there are N chromosomes in some search space, each having n binary genes. This means that there are nN total gene states in this space or population at time ti. Thus, on average, there will be nNpm gene states in this population that will undergo mutation at time ti. As expected in searches involving evolutionary computation, mutation rates can be varied and many studies have been performed to find optimal values of pm in particular situations. Usually, however, mutation rate pm is set at very small values, typically in the range [0.001, 0.01]. It happens that mutation rates can be tuned or varied as the evolutionary search process proceeds. In other words, mutation rates need not be kept stationary as the process moves along. Suppose, as this search process continues, there is convergence to conditions of greater fitness on a fitness landscape. Mutations at this point might be disruptive. Various strategies have been employed to overcome such difficulties [Dumitrescu, et al, 2000, 139-144]. Some involve schemes for making pm time-dependent, such that pm decreases over time during trials. Other methods act to decrease pm as measured fitness of chromosomes in new generations increases. If evolutionary computations relied only on random mutations, convergence to regions of greater fitness might take a long time. Recombinations, in the form of crossovers, can speed up the process. A major role of mutations is to help prevent the loss of genetic diversity and thus to help prevent premature convergence. A result is that larger regions of a fitness landscape can be explored.
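Here is a sketch of the mutation operation and the nNpm bookkeeping [again my own minimal version; the parameter values are arbitrary]:

    import random

    def mutate(chrom, pm, rng=random):
        # Flip each bit of the chromosome independently with probability pm.
        return ''.join(('1' if b == '0' else '0') if rng.random() < pm else b
                       for b in chrom)

    random.seed(1)
    population = ['101100'] * 1000   # N = 1000 chromosomes of n = 6 genes each
    mutated = [mutate(c, 0.01) for c in population]
    flips = sum(a != b for old, new in zip(population, mutated)
                for a, b in zip(old, new))
    print(flips)   # near n*N*pm = 6 x 1000 x 0.01 = 60 altered gene states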
However, Stuart Kauffman [2000, 16-20] has recently commented on the suitability of search procedures involving mutations, recombinations, or both. It might seem that search processes in the absence of metaphoric mating and recombination [i.e. mutation alone] are quite useless. However, Kauffman argues that, as search strategies, recombinations are only useful on smooth, highly correlated fitness landscapes where regions of greatest fitness all cluster together. He further argues that, averaged across all possible fitness landscapes, one search procedure is as good as another. This brings me to my final topic as far as Inventor 2000/2001 is concerned, namely a discussion of how alternative fitness criteria may be described for binary search processes.

3.1.4 Fitness Criteria, Fitness Functions and Boolean Functions

Under the assumption of binary genetic elements of individual chromosomes/minterms/designs in some search space, Boolean functions and their decompositions into minterms or maxterms provide a theoretical basis for capturing many important elements of evolutionary processes. I cite as particularly elegant examples the abundant use of such functions by Stuart Kauffman in his work on self-organization and complex adaptive systems [Kauffman, 1993, 1995, 2000]. I will now argue that fitness criteria, as well as the feasibility filters in Inventor 2000, can be represented in terms of Boolean functions. Having a set of fitness criteria is, of course, not the same as having a definite mathematical function appropriate for grading the fitness of individual chromosomes/designs in some search space. But being able to express fitness criteria is a step toward the development of explicit fitness functions. Using Boolean functions we can identify specific combinations of genetic elements, and their possible interactions, that seem to contribute to fitness [an example follows below]. The decomposition of Boolean functions in identifying fitness criteria does present several problems, most of which concern chromosome length and the incredible sizes of the search spaces that result. I will mention some of these problems a bit later. I was about to lose interest in relating Boolean functions and fitness criteria until I read about the work on schemata in evolutionary computation as described by Dumitrescu et al. In discussing schemata such as those in the examples I mentioned above, Dumitrescu et al connect schemata and fitness in a somewhat curious way. In one place, following Holland [1975], they relate schemata to building blocks having "acceptable solutions" which, when combined, produce larger building blocks having "good solutions" [Dumitrescu et al, 2000, 33]. Another example involves their mating strategy called partially enumerative initiation that I mentioned earlier. This strategy concerns matings involving at least one member of schemata assumed to be "successful". In another place, while discussing N-point crossovers, they apply the term "successful" to chromosomes that represent schemata [2000, 109]. In these situations, the assumption is that there is some way of grading the "acceptability", "goodness", or "success" of particular chromosomes as the evolutionary process proceeds. As I have illustrated above, any schemata, as well as combinations of them, can be represented as Boolean functions.
Any schemata, or combinations of them, represented as Boolean functions, can be decomposed into minterms to reveal the specific chromosomes that correspond to these schemata or combinations of them. Having mentioned the "acceptability", "goodness", or "success" of members of schemata, Dumitrescu et al then say: "No direct computation of the schemata fitness is made by the genetic algorithm" [2000, 37]. As far as I can tell from what they have said, the fuzzy fitness judgments just mentioned are based on knowledge of a problem domain or perhaps just on guesses or hypotheses about fitness. In Inventor 2000, a single criterion, overall design weight, was employed with complete recognition of the fact that other fitness criteria are necessary. Here are some thoughts about fitness criteria, fitness functions, and chromosomes/designs represented as minterms. Consider again the six-variable minterm map in Figure 1 on page 14. Suppose there to be a specific fitness function g that can be applied to any minterm Mi on this map, and further suppose that we have values of g(Mi) for all 64 minterms. If we had such a function we could say that we have described the fitness landscape for this complete collection of minterms, or search space of individual chromosomes/possible designs. There are some troubles here related to the interpretation of such a fitness landscape that concern the properties of minterms and the various numerical ways in which we can identify them. First, on pages 5-7 above I described a binary designator method for keeping track of minterms in an orderly way, and I mentioned how binary designators are also called bit strings. Using the binary designator or bit string for any minterm we can convert this binary designator into a decimal equivalent, and it is this decimal equivalent that we use to identify a minterm and to place it on a minterm map. Thus, minterm ABCcDEFc has an associated bit string 110110, whose decimal equivalent is the number 54; so we say that minterm ABCcDEFc = M54. A major question we have to ask is: What do the decimal equivalents assigned to each minterm tell us about this minterm? The obvious answer seems to be that the decimal equivalents we assign to minterms have only nominal scale properties. That is, all these numbers do is to identify the unique event [genetic element] combination that occurs in each minterm. Here is what these minterm numbers do not tell us. By the way, the following conclusions apply to minterm maps, or binary search spaces, of any size. First, two minterms [chromosomes/designs] having adjacent numbers on a minterm map are not necessarily close in terms of their genetic states. Some are close and some are not; but the relationships are orderly as I will explain. For example, here are two adjacent minterms M0 and M1, where M0 = AcBcCcDcEcFc and M1 = AcBcCcDcEcF; they differ only in terms of the complementation on one attribute [gene] F. But now consider adjacent minterms M31 and M32, where M31 = AcBCDEF and M32 = ABcCcDcEcFc. As you see, they have completely different genetic states; i.e. they differ in all six gene states. So adjacency of minterm numbers does not tell us how similar the minterms are in any straightforward way. However, the relationship between numbering adjacency and difference in genetic states is quite interesting and is shown in Table 2 on the next page.
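Before turning to that table, here is a quick illustration of this bookkeeping [a Python sketch of my own; the dictionary representation is merely one convenient encoding]:

    def minterm_number(states, genes='ABCDEF'):
        # Binary designator: 1 where a gene is uncomplemented, 0 where it is
        # complemented; the bit string read as a decimal number labels the minterm.
        bits = ''.join('1' if states[g] else '0' for g in genes)
        return bits, int(bits, 2)

    # Minterm ABCcDEFc: A, B, D, E occur; C and F are complemented.
    print(minterm_number({'A': True, 'B': True, 'C': False,
                          'D': True, 'E': True, 'F': False}))   # ('110110', 54)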
What the table shows is how many genetic elements a minterm has in common [C] with its immediate predecessor on the numbering scale ranging from 0 to 63. For example, in Table 2 M16 has one element in common with M15 and five elements in common with M17. One interesting fact is that every odd-numbered minterm has exactly five elements in common with its immediate predecessor. Here is a simple accounting of the number of minterms that have exactly k elements in common with their immediate predecessors on a minterm map:

k    Number
5    32   [all the odd-numbered minterms]
4    16
3    8
2    4
1    2
0    1    [only M32]

There's nothing magical about the orderliness of these results; they are a simple consequence of our ordering of the events in a minterm or chromosome so that we can obtain decimal equivalents for the bit strings or binary designators that identify each minterm.

Minterm Number   C      Minterm Number   C
 0               ---     32              0
 1               5       33              5
 2               4       34              4
 3               5       35              5
 4               3       36              3
 5               5       37              5
 6               4       38              4
 7               5       39              5
 8               2       40              2
 9               5       41              5
10               4       42              4
11               5       43              5
12               3       44              3
13               5       45              5
14               4       46              4
15               5       47              5
16               1       48              1
17               5       49              5
18               4       50              4
19               5       51              5
20               3       52              3
21               5       53              5
22               4       54              4
23               5       55              5
24               2       56              2
25               5       57              5
26               4       58              4
27               5       59              5
28               3       60              3
29               5       61              5
30               4       62              4
31               5       63              5

Table 2

What's the essential message conveyed by Table 2? I believe it is that numbering chromosomes/designs/minterms in terms of binary designators or bit strings, however reasonable this seems, makes it quite difficult to obtain easily interpretable and orderly fitness landscapes in which we can talk about various fitness regions indicating many local and, perhaps, one global fitness region. I hope the following figure helps to illustrate my concerns.

Figure 3 [a hypothetical fitness landscape: values of g(Mi) plotted over the minterm map from M0 to M63, with the global maximum at M44, a local maximum at M0, and M40 marked on the map]

The figure above shows a hypothetical fitness landscape associated with chromosomes/designs/minterms that have been numbered in terms of their binary designators or bit strings. The numbering of the minterms on the map above is the same as the numbering shown in the figure on page 14. Associated with each minterm Mi is a numerical value g(Mi) that indicates its fitness according to criteria that have been established. In this landscape I have established one globally maximum value of g(Mi) and have indicated one among several values of g(Mi) that are only local maxima. One trouble with small fitness landscapes such as this one is that the search process would not take very long to find this global maximum, which as indicated is at M44. Depending on how many parents we chose in our initial mating population, M(t0), we could easily exhaust all 64 possible chromosomes/designs/minterms in a very few evolutionary stages involving crossover and mutations. Successive operations of crossovers and mutations producing new children designs would, in short order, keep coming back to the same chromosomes/designs already discovered. This shows the limitations of my simulations based on small numbers of binary variables, which I fully understand. But this simple minterm map does let me illustrate what I take to be some important and interesting characteristics of evolutionary "trajectories" that might be taken on much larger fitness landscapes associated with chromosomes/designs/minterms based on binary variables. The first point I will illustrate using the fitness landscape in Figure 3 concerns fitness discontinuities that can occur as the evolutionary process unfolds.
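The pattern in Table 2 is easy to check mechanically [a sketch of my own; the exclusive-or of two minterm numbers exposes exactly the genes on which they differ]:

    def common_elements(i, j, n=6):
        # Gene states shared by minterms Mi and Mj: n minus their Hamming distance.
        return n - bin(i ^ j).count('1')

    print(common_elements(16, 15))   # 1, as Table 2 shows
    print(common_elements(17, 16))   # 5
    print(common_elements(32, 31))   # 0: M31 and M32 differ in all six genes
    # Every odd-numbered minterm shares five elements with its predecessor:
    print(all(common_elements(i, i - 1) == 5 for i in range(1, 64, 2)))   # True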
We might believe that the evolutionary process always proceeds smoothly toward higher fitness regions and that an orderly convergence to regions of greater fitness is usually observed as a result. To illustrate how this steady convergence may not happen, suppose two parent chromosomes/designs have been selected for mating and that, after crossover, one of the children chromosomes is M40 = ABcCDcEcFc. As the above figure shows, the fitness of M40, g(M40), is quite small. But suppose that, at the next evolutionary stage, the M40 chromosome had its gene Dc mutated to D. The result would be chromosome M44 = ABcCDEcFc whose fitness is the global fitness maximum. Asked to explain this drastic jump in fitness, we decide that A, C, and D interact strongly, in the absence of B, E, and F, to increase fitness. All it took to turn M40 from a loser to a winner was a genetic change from Dc to D. There are, of course, other routes to the chromosome M44, having the global fitness maximum in Figure 3. One such route begins with mating parents M34 = ABcCcDcEFc and M12 = AcBcCDEcFc. If a single crossover point is set between the second and the third genes, one child that will result is M44 = ABcCDEcFc, whose fitness is globally maximum. The fitness of M34 and M12 could be any measured value less than the global maximum; only M44 has this maximum value in the present example. I wish to recall Stuart Kauffman's point, mentioned above on page 23, about crossovers being useful only on "highly correlated fitness landscapes in which regions of greatest fitness cluster together". His comment raises some interesting matters concerning fitness of chromosomes and the genetic similarity of adjacent chromosomes/minterms I have been examining. One issue concerns how closely fitness of chromosomes clusters around chromosomes/minterms that are adjacent on a minterm map. In Figure 3 above I have arbitrarily assigned M44 a globally maximum value of a fitness function. Shown below in Figure 4 is the genetic closeness of each of the chromosomes/minterms adjacent to M44. The bracketed number for each minterm indicates how many genetic elements the minterm has in common with M44. For example, M35 has just two elements in common with M44, while M45 has five elements in common with M44.

M35 [2]   M43 [3]   M51 [1]
M36 [5]   M44       M52 [4]
M37 [4]   M45 [5]   M53 [3]

Figure 4

So, if fitness correlates with genetic similarity, then M36, M37, M45, and M52 should all have fitness close to the fitness maximum at M44. One trouble is that genetic similarity with M44 does not require the adjacency shown in Figure 4. For example M40 = ABcCDcEcFc, one of whose genes I mutated to yield M44 = ABcCDEcFc, has five elements in common with M44 before mutation. As you see, M40 is not adjacent to M44. I know that my fitness assignments in Figure 3 are perfectly arbitrary. All my present argument shows is that, if fitness and genetic similarity are correlated, then we might not have the fitness clustering that Kauffman suggests. Another example involves the two chromosomes, M34 = ABcCcDcEFc and M12 = AcBcCDEcFc, that I mated to produce M44. M34 has just three genetic elements in common with M44, but M12 has five elements in common with M44. Neither M34 nor M12 is adjacent to M44 on the fitness landscape. Here is an example of a fitness criterion and how it could be expressed as a Boolean function in the same way I expressed fitness criteria in terms of schemata above on page 22.
The criterion reads, in the six-binary-variable case: "Chromosomes [designs] are fittest when we have A, B, and C, provided that we do not have both E and F together". In symbols, f(A, B, C, D, E, F) = ABC(EF)c = ABC(Ec ∪ Fc). From the minterm map on page 14 we see that M56, M57, M58, M60, M61, and M62 meet this criterion. Asked to explain this criterion we might observe that the fittest chromosomes/designs are the result of an interaction involving A, B, and C, that occurs when either or both of E and F are absent. Further, this interaction among A, B, and C is not affected by what state D is in. So, this fitness criterion, expressed as a Boolean function, acts to define a fitness region in the fitness landscape shown across a minterm map. There is an interesting relation between fitness functions and composite utility functions encountered in decision theory and analysis. As in decision tasks, evolutionary processes produce consequences that have value or utility. Value or utility here concerns the fitness of the consequences of evolutionary processes in the form of new chromosomes or, in the case of Inventor 2000/2001, new engineering designs. In both situations the consequences are multiattribute in nature. Each attribute specifies a single dimension along which the value of a consequence can be measured. For example, in deciding which one of several houses to purchase we consider all the following attributes and, perhaps, many others as well: A1, cost; A2, location; A3, driving distance to work; A4, floor plan; A5, house size; and so on. For any particular house, we assign a value [V] to each of the observed levels of any of these attributes; in short, we have V(A1), V(A2), V(A3), and so on for each of the attributes we have identified and for which we have measured values. In the case in which houses have n measurable attributes, the composite value V of any house [Hi] we are considering can then be represented by V[Hi] = f[w1V(A1), w2V(A2), ..., wiV(Ai), ..., wnV(An)], where wi is an importance weight assigned to attribute Ai, V(Ai) is the value attached to some level of attribute Ai, and f is some real-valued function that prescribes how the importance-weighted values V(Ai) attached to each attribute are to be aggregated or combined to produce the composite value V[Hi]. Determining sensible identifications of function f in different circumstances is anything but easy in decision theory and analysis. A frequently-employed simplification is to assume that f is linear, in which case we have:

V[Hi] = Σ_{i=1}^{n} wiV(Ai).

Such a linear function of course ignores any of the possible dependencies or interactions that may exist among the n value attributes. Many books and papers have been written on the various forms function f might take that account for various forms of interactions [e.g. Keeney & Raiffa, 1976]. When we have n attributes we have 2^n - {n+1} possible interaction patterns involving two or more attributes. The frequently-made linearity assumption ignores any subtleties or complexities that reside in value attribute interactions. As we know, linear models never expose any surprises that may so often lurk in these interactions. As I will now explain, we have these same difficulties in our attempts to define useful/reasonable fitness functions g[Mi] that grade the overall fitness of any chromosome/design/minterm Mi.
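A small illustration of the linear rule [the house attributes, weights, and values below are invented for the example]:

    def linear_value(weights, values):
        # Composite value V[Hi] = sum of wi * V(Ai); by construction, any
        # interactions among the attributes are ignored.
        return sum(w * v for w, v in zip(weights, values))

    # Hypothetical house Hi: cost, location, driving distance, floor plan, size
    weights = [0.35, 0.25, 0.15, 0.15, 0.10]   # importance weights
    values  = [0.60, 0.80, 0.40, 0.70, 0.90]   # each V(Ai) graded on [0, 1]
    print(linear_value(weights, values))        # 0.665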
Clearly, the appearance of a fitness landscape assigned across some universe or search space, represented by a minterm map, depends crucially on how fitness function g[Mi] is defined. There are some unusually difficult problems associated with grading the fitness of chromosomes/designs when such a process is compared with value assessment in decision analysis. Let's first see what is involved in assessing the value or utility of various levels of an attribute of a choice consequence in decision analysis. A very common method is based on grading the value/utility of an attribute on a [0, 1] scale, where 1 means highest value and 0 means lowest value. For a certain attribute a value function may be increasing if having more of this attribute is better than having less of it. One example would of course involve money. Most people would prefer having more money than less money. In other situations, however, having more of something is worse than having less of it; this leads to a decreasing value function. Driving distance to work is a good example in which we encounter decreasing value functions. Both of these examples involve monotone functions. But there are situations encountered in which value is a concave function involving an intermediate maximum value with a tailing off of value on either side. Grading the overall fitness of a design, represented as a minterm, presents difficulties that are not encountered in grading the composite value/utility of a decision consequence. Consider a minterm or design Mi whose overall fitness g(Mi) is to be established. I must return to the subscript identification of the event classes involved in a minterm that I introduced earlier on page 5. What I need to illustrate fitness function identification is a "generic" minterm whose exact genetic elements [alleles of genes] have not yet been identified. First, suppose A = {An-1, An-2, ..., A1, A0} is the generating class for a minterm map resulting from n binary event classes of the form Aj = {Aj, Ajc}. By definition, minterm Mi is the intersection of n terms, one drawn from each of these classes. The generic minterm Mi, expressed in these terms, is:

Mi = Yn-1 Yn-2 . . . Yj . . . Y1 Y0, where each Yj = either Aj or Ajc
g(Mi) = overall fitness of Mi

Next, consider individual gene Yj whose possible alleles [genetic states] are Aj and Ajc. What we need is a function fj(Yj) that prescribes the contribution of gene Yj to overall fitness g(Mi). We might first suppose that fj(Aj) ≠ fj(Ajc); i.e., that the contribution to overall fitness of genetic element or allele Aj is not the same as the contribution of Ajc. But this might not always be the case because of possible interactions among genetic elements. For example, it might be true that when Yk is in state Ak, then fj(Aj) = fj(Ajc). In other words, Ak renders overall fitness g(Mi) the same whether Yj is in state Aj or Ajc. The next consideration illustrates the basic distinction between grading the overall fitness of a design/chromosome/minterm and the composite value/utility of a multiattribute decision consequence. In decision theory and analysis involving multiattribute choice consequences we assume that the value or utility associated with any attribute of this consequence is graded on the same scale, commonly taken to be the [0, 1] interval. Employment of this grading scale is a consequence of the celebrated preference axioms of von Neumann and Morgenstern [1946].
What they showed was that value/utility of consequences and their attributes could be graded on a conventional probability scale. Of course it is true that the forms of the value functions applied to different consequence attributes may be quite different. Some may be increasing, some decreasing, and some nonmonotonic. In grading the contribution of genetic element Yj to the overall fitness of design or minterm Mi, we will observe that the fitness contributions given by each fj(Yj) are different for each attribute Yj. In other words, fn-1, fn-2, ..., fj, ..., f1, f0 may all be quite different. A basic trouble is that, unlike value/utility gradation, we will probably not be able to put all of these fitness measures on a common scale of measurement. For example, suppose that in grading the fitness of a wind-bracing design Yj = design weight and Yk = positioning of the cross-members of the design. We expect fj(Yj) to be different from fk(Yk) and their fitness contributions to be measured on quite different scales. It might be worth investigating to see whether "fitness" can be graded on a common scale across design attributes in the same way that value/utility can be graded, across attributes, on a common scale. This would simplify matters a bit. We come, finally, to perhaps the greatest difficulty in determining overall fitness function g(Mi) for designs/chromosomes represented as minterms Mi. This difficulty concerns how we are to aggregate or combine our individual genetic fitness contributions fj(Yj). In theory, what we need is given by:

g(Mi) = F[fn-1(Yn-1), fn-2(Yn-2), ..., fj(Yj), ..., f1(Y1), f0(Y0)],

where F is a rule for combining the individual fitness contributions fj(Yj). All I can say at this point is that F is presumably non-linear. We could assume linearity, as so often done in decision analysis, but this would invite all the problems associated with ignoring important interactions or nonindependencies among the fitness contributions of the individual genes in designs represented as chromosomes or minterms. In most cases, just listing all the possible interactions among these genetic elements would be an impossible task, let alone evaluating the fitness consequences of these interactions. Again, if there are n binary genetic elements in a chromosome/design, there are 2^n - {n + 1} possible interactions involving two or more of these genetic elements. Determining F, as well as determining each of the fj(Yj), requires considerable domain knowledge. As I noted earlier, our entire representation of a fitness landscape across some search space depends upon how we define each fj(Yj) and F. There is so much more to be said about the task of defining these crucial features of our work involving evolutionary computation.

3.2 Boolean Functions and Evidence Marshaling in Discovery

My present belief is that Boolean functions and their decompositions can play an important role during the process of discovery, especially when they are combined with the genetically inspired evolutionary computational strategies I have briefly reviewed. I'll begin with the process of evaluating hypotheses.

3.2.1 Hypothesis Evaluation

My first task is simply to illustrate how hypotheses in discovery-related activities can themselves be associated with Boolean functions.
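Here is a schematic rendering of this aggregation problem [everything below, from the per-gene contributions to the two combining rules, is invented for illustration; nothing about Inventor 2000's actual fitness function is implied]:

    from math import prod

    def g(minterm, f, F):
        # g(Mi) = F[fn-1(Yn-1), ..., f0(Y0)] for a minterm given as a bit
        # string, per-gene contribution functions f, and a combining rule F.
        return F([f[j](minterm[j]) for j in range(len(minterm))])

    # Toy contributions: genes A, B, C matter; D, E, F are neutral here.
    f = [lambda y: 0.9 if y == '1' else 0.1] * 3 + [lambda y: 0.5] * 3

    F_linear = lambda cs: sum(cs) / len(cs)   # additive rule, no interactions
    F_product = lambda cs: prod(cs)           # one crude non-linear alternative

    m44 = format(44, '06b')                   # M44 = 101100 = ABcCDEcFc
    print(g(m44, f, F_linear), g(m44, f, F_product))

The two rules grade the same chromosome very differently, which is the point: the shape of the whole fitness landscape turns on the choice of F.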
Following are some abstract assertions similar to those made frequently and easily, in non-abstract circumstances, by scientists, engineers, intelligence analysts, physicians, attorneys, historians, auditors, and others who perform complex discovery-related or fact investigation tasks. Suppose three persons fall to arguing about the conditions under which hypothesis H might be true. Let's leave aside for the moment what led to the generation of hypothesis H in the first place.

1) First Person: "If events A, B, and C occur, or, if D and E occur but F does not, then I would argue that hypothesis H would follow".

2) Second Person [in response to the first]: "I'll go along with your argument, provided that event D does not occur. I don't believe H will explain D occurring in the presence of E but not of F".

3) Third Person [in response to the first two]: "I think you are both wrong. For H to be true, it does not matter whether events A, B, or C occur; all that matters is that D occurs when neither E nor F occurs".

In this simple situation we have three persons each making an assertion about hypothesis H in the form of a Boolean function. For Person 1 the Boolean function is f1 = ABC ∪ DEFc. For Person 2, the function is f2 = ABC ∪ DcEFc. For Person 3 the function is f3 = DEcFc. I'll note here that Person 3's assertion is in fact a schema as defined in evolutionary computation. Another way to write this assertion as a bit string is: (***100). Using the minterm map shown above on page 14 we can list all possible binary event combinations that are consistent with each of these three assertions. For Person 1, minterms M56 through M63 and minterms M6, M14, M22, M30, M38, M46, and M54 are all specific event combinations that are consistent with H. For Person 2, hypothesis H would explain minterms M56 through M63 and minterms M2, M10, M18, M26, M34, M42, and M50. For Person 3, hypothesis H would only explain minterms M4, M12, M20, M28, M36, M44, M52, and M60. Thus, Person 3's assertion is the most restrictive of the three Boolean statements about hypothesis H. For Persons 1 and 2 there are fifteen different event combinations that could be explained by hypothesis H, though seven of the event combinations are different for these two persons. But only eight event combinations are explained by H according to Person 3. A minterm decomposition of each of these three Boolean assertions allows us to help settle arguments about whose specification of H, if any among the three that are offered, agrees with evidence obtained about these six binary classes of events; here are some examples. First, suppose we know now that H is true but have observed M43 = ABcCDcEF. This makes all three persons wrong in their assertions about H, since M43 does not appear in the decomposition of any of their Boolean assertions [i.e. we need an entirely new definition of hypothesis H]. Suppose instead that we have observed M58, when we know that H is true. This agrees with the assertions of both Person 1 and Person 2 since M58 appears in decompositions of both of their Boolean assertions. However, we cannot tell who is generally correct, Person 1 or Person 2; all we have done is to rule out Person 3's assertion regarding hypothesis H. Evidence in the form of M60 = ABCDEcFc would not rule out any of the three assertions made about H, since it appears in the decompositions of all three of them. Finally, some evidence combinations would favor just one hypothesis assertion among the three we have considered.
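A minterm decomposition of this kind is entirely mechanical. Here is a sketch [the predicate encodings below are my own; bit j = '1' means the j-th of events A through F occurred]:

    from itertools import product

    minterms = [''.join(bits) for bits in product('01', repeat=6)]   # M0..M63

    def decompose(assertion):
        # Minterm numbers of all event combinations satisfying a Boolean assertion.
        return [i for i, m in enumerate(minterms) if assertion(m)]

    f1 = lambda m: m[:3] == '111' or m[3:] == '110'   # ABC or DEFc
    f2 = lambda m: m[:3] == '111' or m[3:] == '010'   # ABC or DcEFc
    f3 = lambda m: m[3:] == '100'                     # DEcFc

    print(decompose(f3))        # [4, 12, 20, 28, 36, 44, 52, 60]
    print(len(decompose(f1)))   # 15 minterms: M6, M14, ..., and M56 through M63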
For example, M38 would only favor Person 1's assertion, M26 would only favor Person 2's assertion, and M28 would only favor Person 3's assertion.

3.2.2 Hypothesis Generation

Boolean functions and their decompositions can play other roles besides helping to evaluate the suitability of hypothesis statements. They also appear in the generation or discovery of new hypotheses, which I will now explain. This role brings Boolean functions in contact with the evolutionary computational approach we are using, as well as with the evidence marshaling tasks studied by Carl Hunt. In the examples just provided we were considering some hypothesis H and what it might explain, but we gave no consideration to the manner in which this hypothesis was first generated; perhaps it was just a guess on someone's part. But I now wish to consider the generation of hypotheses from evidence we are gathering and how Boolean functions arise during this process. As I proceed, I will present some thoughts about how evolutionary computational strategies can be employed during these important discovery-related tasks. One place to begin is with Sherlock Holmes and his views about the process of discovery. Along the way I will combine his thoughts with those of John Henry Wigmore, the celebrated American evidence scholar, whose writings on evidence in law I have so often plundered during the past 35 years. In The Boscombe Valley Mystery, Holmes tells his colleague Dr. Watson: "You know my method. It is founded on the observance of trifles". The trifles to which Holmes was referring consist of any form of observation, datum, or detail that may later become evidence in some inference task when its relevance is established. Evidence is relevant if it bears in some way on hypotheses already being entertained or if it allows the generation of a new hypothesis. Trifles may be tangible in nature in the form of objects, passages in documents, details in sensor images, etc.; or they may be obtained from the parsing of the testimony of human sources. In every case, however, evidence reveals the possible occurrence of certain events whose actual occurrence would be important in inductive or probabilistic reasoning tasks. It is here that Wigmore's ideas become very important. In his Science of Judicial Proof [Wigmore, 1937] Wigmore advises us to distinguish between evidence of an event and the event's actually occurring. Evidence of an event and the event itself are not the same. Just because we have evidence that event E occurred does not entail that it did occur. The source of this evidence, including our own observations, might not be perfectly credible. So, we must distinguish between evidence E* and event E itself; from evidence E* we can only infer the occurrence of event E. As we make observations and ask questions [perhaps the most important element of discovery], trifles begin to emerge, often at an astonishing rate. From an emerging base of trifles we begin to generate ideas in the form of hypotheses concerning possible explanations for the trifles we are observing. Here we encounter search problems having the same complexity as the ones encountered in Inventor 2000/2001. Figure 5 illustrates a major problem we face in generating new hypotheses in productive and efficient ways.

Figure 5 [a trifle base: a single trifle in the base points to hypothesis Hi, while a marshaled combination of several trifles points to hypothesis Hj]

On occasion we might get lucky and be able to generate a plausible hypothesis from just a single trifle. In Figure 5, just a single trifle allowed us to generate hypothesis Hi as a possibility.
As an example, the finding of a fingerprint, a DNA trace, or a footprint might allow us to identify a particular suspect in a criminal investigation. More commonly, however, new hypotheses arise as we consider combinations of trifles, details, or data. In Figure 5 a new hypothesis Hj is suggested by bringing together, marshaling, or colligating [Charles S. Peirce's term] several trifles that seem to be related in some way. Here is where the trouble starts. Looking through all possible combinations of trifles, in the hope of finding interesting ones, is as foolish as it is impossible. The number of possible combinations of two or more trifles is exponential with the number of trifles we have. With n trifles, we have 2^n - {n+1} possible trifle combinations. The question is: How do we decide which trifle combinations to examine that might be suggestive of new hypotheses that can be taken seriously? I add here, parenthetically, that the events of September 11, 2001 have occurred since I wrote the first version of this paper. Since these events, we have all heard the phrase: "connecting the dots" and how our intelligence services have not been so good at this task. What Sherlock Holmes referred to as "trifles" are now commonly referred to as "dots". There is nothing simple about the task of "connecting the dots" in any situation whether in intelligence analysis, medicine, history, law, or whatever. In our work extending over nearly ten years, Peter Tillers and I studied a variety of evidence marshaling strategies designed to enhance the process of generating new hypotheses from combinations of trifles [or "dots"]. Different evidence marshaling strategies are necessary at different points during an episode of discovery. We identified fifteen different evidence marshaling strategies and showed how they could be implemented in a prototype computer system we called Marshalplan. A review of this work appears elsewhere [Schum, 1999]. Each marshaling operation we identified plays the role of a metaphoric magnet or attractor for bringing together trifles that, taken together, may suggest new hypotheses or new lines of investigation. Carl Hunt extended this work considerably by showing, in his doctoral dissertation [2001], how trifles [potential items of evidence] could be made to self-organize and to suggest scenarios leading to new hypotheses and new lines of investigation. A few words are necessary about hypotheses and their mutation or revision as discovery proceeds. Hypotheses in many areas are generated from marshaled collections of thoughts and evidence regarding events that happen over time. Taken together and ordered temporally, these collections of thoughts and evidence begin to resemble scenarios, stories, or possible complex explanations. We can in fact represent these scenarios as minterms, or chromosomes, provided that we restrict our attention to binary events. But we need to expand the kinds of binary genetic elements involved in a scenario considered as a minterm or chromosome. There are three classes of binary elements [genetic states] we need to consider in the construction of a scenario, story, or narrative account of some emerging phenomenon for which we are seeking an explanation or hypothesis. In some cases we will have specific evidence A*, that event A occurred. This is called positive evidence; so named because it records the occurrence of an event. But we might instead have received evidence Ac*, that event A did not occur.
Evidence of the nonoccurrence of an event is called negative evidence. So, one class of events we must consider is the binary evidential class {A*, Ac*}. In some rare instances, we might be willing to say that we know for sure that event A occurred. Knowing for sure that event A occurred means that the source of evidence about this event is unimpeachable. We might instead "know" that event A did not occur [Ac]. So, another possible binary event class is {A, Ac}. Finally, in order to fill in gaps left by evidence we do not have, or by lack of any knowledge about events, we often insert hypothetical events that are also called gap-fillers. Usually, these gap-fillers are based on guesses, hunches, or upon past experience. Here is the major heuristic value of constructing scenarios during discovery and investigation. Each gap-filler we identify in order to construct a scenario or story that "hangs together" opens up a new line of investigation. Let a = a gap-filler or hypothetical saying that event A might have occurred. Then, let ac = a gap-filler saying that event A might not have occurred. Thus, the binary class {a, ac} represents gap-fillers or hypotheticals indicating the guessed or inferred occurrence or nonoccurrence of event A. All stories or scenarios are mixtures of fact and fancy. The fanciful elements of our scenarios consist of these gap-fillers or hypotheticals. I add here that, when I speak of marshaling thoughts and evidence, at least some of these thoughts may be the gap-fillers we introduce in scenarios. They do in fact represent potential items of evidence we might collect. Here, in symbolic form, is what a scenario might look like when cast in terms of the three binary classes just identified: {A*, Ac*}, {A, Ac}, and {a, ac}. First, suppose our scenario concerns events A, B, C, D, E, and F. We have evidence A* and C* that events A and C occurred, and we have evidence Ec* that event E did not occur. Having no evidence [yet] about whether event B occurred, we insert gap-filler b to link together evidence items A* and C*. We insert another gap-filler dc as a guess that event D did not occur. Finally, suppose we are willing to believe with perfect confidence that event F occurred. In a homicide investigation, for example, F may represent the event that victim V was killed. We know V was killed because we are presently looking at V's corpse on a slab [V was identified by his wife]. So, in minterm or chromosome form, our scenario can be represented by: (A*bC*dcEc*F). This means that we have the following six generating classes of events: {A*, Ac*}, {b, bc}, {C*, Cc*}, {d, dc}, {E*, Ec*}, and {F, Fc}. We can still employ the binary designator or bit string method to keep track of the 64 possible minterms representing possible variations in the scenario being constructed. In the present example we have the bit string (111001). So, our current minterm (A*bC*dcEc*F) = M57, using the six-variable binary designator system shown in Figure 1 on page 14. Variations in our emerging scenario or story may take place for any number of reasons. Some of these variations will occur that involve the six classes of binary events suggested by our scenario M57. As time passes and we gather new evidence and have new ideas, we will of course need to add new classes of events representing new evidence, new gap-fillers, and possibly new known events. In short, our minterm map will naturally grow larger.
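In this little sketch [my own representation, not ABEM's], the scenario's binary designator, and hence its minterm number, falls directly out of the three kinds of binary classes just described:

    # The six generating classes of the example, in temporal order; the first
    # member of each pair is the one coded '1' in the binary designator.
    classes = [('A*', 'Ac*'), ('b', 'bc'), ('C*', 'Cc*'),
               ('d', 'dc'), ('E*', 'Ec*'), ('F', 'Fc')]

    scenario = ['A*', 'b', 'C*', 'dc', 'Ec*', 'F']   # (A*bC*dcEc*F)
    bits = ''.join('1' if s == first else '0'
                   for s, (first, second) in zip(scenario, classes))
    print(bits, int(bits, 2))   # 111001 57, so the scenario is minterm M57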
In complex situations the number of possible scenarios or stories we can tell will begin to approach the size of the search space in Inventor 2000. Using just the six generating classes shown above, here are some ways in which our scenarios or stories might change. Any changes may well suggest new hypotheses, some of which may be quite interesting and perhaps even more valuable than the story we are currently telling on the basis of (A*bC*dcEc*F) = M57. First, we might be interested in seeing whether a story would make sense, and suggest a new hypothesis, if we changed gap-filler b to bc and/or changed gap-filler dc to d. This brings to mind the mutation operations in evolutionary computation. Other scenario revisions may have a basis in the credibility of the sources of our evidence. For example, we had a source who/that reported the occurrence of event A [this report we labeled A*]. Suppose we now have reason to believe that this source's credibility is suspect and possibly have another source that reports Ac*, that event A did not occur. Or we might instead wish to examine how our story might change if the original source had reported Ac* rather than A*. Finally, we must be prepared to change our minds about whether we really "know" that a certain event occurred. For example, as described above, we let "known" event F = victim V was killed. Victim V was positively identified by a woman who identified herself as V's wife. What we know for sure is that we have a dead person on our hands. However, we might discover that this woman is not V's wife after all. Can we still be sure that this dead person is V and not someone else? Thus, we might consider how our story would change if we changed event F to event Fc. Our scenarios or stories might be revised for other reasons that will involve changes in the ingredients of their minterm representations. For example, (A*bC*dcEc*F) = M57 is based on two gap-fillers b and dc. These gap-fillers, as mentioned, open up new lines of investigation and so we begin the search for evidence about these events that now have just hypothetical status. Suppose we find credible evidence that events B and D did in fact occur [i.e. we have evidence B* and D*]. In our guesses, we were apparently correct to guess b, but incorrect to guess dc. So now our story, in minterm form, looks like: (A*B*C*D*Ec*F). We still have 64 possible alternative scenarios except that we now have five evidence classes and one class representing "known" events. The binary designator, and hence the numbering in this example, will also change because we altered the complementation pattern [i.e. we went from dc to D*]. So, our new scenario (A*B*C*D*Ec*F) = M61 [its new bit string is (111101)]. The point is that we can always keep track of possible scenarios in an orderly way, provided that we make appropriate adjustments in the manner in which we identify the generating classes for a minterm map. I hope one point emerging from the above discussion of minterm representations of scenarios or stories is that we still have a search problem on our hands. It is just possible that the methods of evolutionary computation can assist us in generating "fitter" explanations or hypotheses. Just because we construct a scenario or tell a story does not mean that all of its ingredients are as we suppose them to be. In Carl Hunt's doctoral dissertation work, his ABEM system generated "histories" that could be "reverse parsed" and converted into scenarios [Hunt, 2001].
These scenarios, in turn, allow the user to generate hypotheses to explain the events being observed. Carl's work made no use of evolutionary computation but I am going to suggest that such methods might be very useful in generating alternative scenarios or stories that may suggest alternative hypotheses, some of which may be more interesting and productive than ones we initially entertain. I have already described how we might "mutate" the ingredients of a scenario, expressed as a minterm or chromosome. My next task is to illustrate how we might recombine, via crossovers, the ingredients of two "parent" scenarios to produce, as children, entirely new scenarios or stories. These new scenarios may in turn suggest possibilities or hypotheses that are "fitter" than ones we may earlier have entertained. Discovery or investigation rests on the process of inquiry, the asking of questions. Such questions themselves can become "magnets" or "attractors" for extracting from an emerging trifle base interesting combinations of trifles that may suggest new and more valuable [fitter] hypotheses. We understand that, at present, there are no computers that can, by themselves, generate hypotheses from collections of thoughts and evidence. But they can certainly be made to assist the persons whose intelligence, experience, and awareness does allow them to generate hypotheses or possible explanations. Hypotheses themselves can serve as "magnets" or attractors in bringing together or colligating combinations of trifles. Some of these trifles may become evidence favoring the hypothesis that attracted them; others may become evidence that disfavors this hypothesis. Following is one example of how the recombination process in evolutionary computation might be very useful in generating and exploring new combinations of trifles. The example I have chosen involves two scenarios represented as minterms or chromosomes that may each have been generated by a question. In some cases, of course, the questions we ask may be related in some way and so we might expect them to attract at least some of the same trifles. Suppose the first question [Q1] attracts the evidential trifles A*, C*, Ec*, and F*. The person asking this question believes this combination of trifles suggests a scenario [S1] and inserts gap-fillers b and d to make the emerging scenario tell a more connected story. Arranged in temporal order, the elements of this scenario, expressed in minterm form, are as follows: (A*bC*dEc*F*). A second question [Q2] is asked, possibly though not necessarily by the same person who asked Q1. This second question attracts evidential trifles A*, C*, K*, and L* that together suggest a different scenario [S2]. To make this scenario more coherent the person inserts gap-fillers g and j. Arranged temporally, a minterm representation of this scenario is: (A*gC*jK*L*). As you observe, trifles A* and C* appear in both of these scenarios. Notice, however, that the events recorded in A* and C* are linked in different ways in these two scenarios. In S1 they are linked by gap-filler b and in S2 they are linked by gap-filler g. Also notice that each of these scenarios suggests a different six-variable minterm map, each one showing possible variations of each of these two scenarios. The difference is that S1 concerns events A, B, C, D, E, and F, but S2 concerns events A, G, C, J, K, and L.
The two minterm maps just described show two ways of varying our scenarios or stories; but there is a third way that involves crossovers among the two scenarios. As shown below, suppose we mate our two scenarios and set a single crossover point at the third position. The result is:

        S1     S2          S3     S4
        A*     A*          A*     A*
        b      g           b      g
        C*     C*          C*     C*
        d      j           j      d
        Ec*    K*          K*     Ec*
        F*     L*          L*     F*

By mating S1 and S2 we have produced two new scenarios, as children, S3 and S4. It may happen that either S3 or S4 suggests entirely new hypotheses no one would have thought of from S1, S2, or any of their possible variations. So what the crossover operation has done here is to suggest entirely different stories, all of which will be based on existing thoughts and evidence. A final thought here is that both S3 and S4 generate new and different minterm maps, each of which will provide possible variations in these two new scenarios or stories. Perhaps one of these revisions of either S3 or S4 will be even "fitter" than S3 or S4.

It is probably well past time for me to take on the task of trying to state what fitness means when this term is applied to new hypotheses being generated. As mentioned earlier, in the generation of new engineering designs it is possible, though usually difficult, to develop real-valued multivariate fitness functions that grade the overall fitness of designs being generated by evolutionary computation. The degree of fitness of new hypotheses certainly raises some interesting and important issues, only some of which have, to my present knowledge, been addressed.

The first issue concerns the process of discovery itself and the nature of the hypotheses we generate or discover. Basically, all of my discussion of Boolean functions and minterms in the generation of hypotheses has involved possible adjuncts to the abductive reasoning process by which new hypotheses arise. In particular, the possible applications of evolutionary computation to scenario and hypothesis generation that I just described can be thought of as ways to assist people in performing imaginative or abductive reasoning. This form of reasoning is, according to Charles S. Peirce, associated with the generation of new ideas in the form of hypotheses or possible explanations of phenomena of interest to us. As we know, deductive reasoning shows that something is necessary, and inductive reasoning shows that something is probable; abductive reasoning shows only that something is possible. On most accounts, new ideas are not generated by either deductive or inductive reasoning.

Grading the fitness of new hypotheses in terms of probability, at least ordinary Kolmogorov probabilities, does not seem sensible since, during discovery, we may not have any disjoint and exhaustive hypotheses. In addition, as I will mention a bit later, our hypotheses may easily mutate or change, or be entirely eliminated, as discovery lurches forward. Because abductive reasoning just generates hypotheses that are possible, perhaps we might consider grading them in terms of their possibility. I know of one person who has carefully distinguished between possibility and probability, namely the British economist G. L. S. Shackle. In his work Decision, Order, and Time in Human Affairs [1968], he offers a theory of potential surprise according to which we might grade the possibility of hypotheses. Clearly, possibility and probability are not the same. Some hypothesis, certainly very possible, might have very low probability in light of present evidence.
The distinct possibility that you have a certain disease worries you. But after an extensive series of diagnostic tests your physician tells you to stop worrying, since not one of these tests shows any likelihood of your having this disease. Shackle's theory of potential surprise is quite interesting, but I have not yet examined whether its requirements could be met during discovery, in which our hypotheses may suffer continual mutation, change, or elimination. I now address these matters.

In many situations the hypotheses we generate are initially vague, imprecise, or undifferentiated. For example, in a criminal investigation we may first entertain the hypothesis H0, that the victim's death was the result of a criminal act. Our first evidence is that the killer was male [A*]. So H0 now reads: "The victim was killed by a criminal act committed by a male" [only slightly more specific]. New evidence suggests that the killer was also under the age of 30 [B*]. H0 now is: (A*B*), that the killer was male and under the age of 30. We next guess that the killer was left-handed [c]; so now H0 = (A*B*c). Further evidence suggests that the killer was known to the victim [D*]. Now H0 becomes: (A*B*cD*) and begins to resemble a scenario or story. H0 would now read: "The victim's death was the result of a criminal act performed by a male under the age of 30 who was known to the victim and who was possibly left-handed". In Glenn Shafer's terms [e.g. 1976, 115-121], what we have done here is to refine hypothesis H0 by incorporating new evidence in it to make it more specific and less vague. Later, very credible evidence that the killer was female might cause us to eliminate H0 altogether. We might be well-advised, however, to keep track of all the reasons why we chose to eliminate H0 [at least tentatively]; this gives us some protection against hindsight critics who will chastise us if it turns out that H0 contained truth after all. Shafer's system of belief functions does allow us to assign numbers to hypotheses that mutate or change. It may be useful to examine this system carefully in connection with discovery-related tasks.

Speaking of truth, it might be argued that the obvious way to grade the fitness of generated hypotheses is the degree to which they seem to contain "truth". The global maximum on the fitness landscape across all hypotheses that could be considered would be the hypothesis that is truthful in all respects; i.e. no other hypothesis could possibly offer a better explanation of the phenomena of interest. There are many troubles with this prescription, perhaps the most obvious being that we may never be able to tell the extent to which any hypothesis contains "the whole truth and nothing but the truth". Twelve jurors reached the verdict, beyond reasonable doubt, that Nicola Sacco and Bartolomeo Vanzetti were guilty of first-degree felony murder in the slaying of a payroll guard named Alessandro Berardelli on April 15, 1920. Did the jurors reach "truth" in their verdict? This question still arouses great controversy today, as Jay Kadane and I discovered in our probabilistic analysis of the Sacco and Vanzetti evidence [1996]. The point is that in so many situations there will never be any "gold standard" or ground truth against which to evaluate hypotheses generated during discovery. So what are we left with as possible ways to assess the "fitness" of new hypotheses we generate during discovery?
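The refinement just illustrated has a simple combinatorial reading: each new item of evidence or guess fixes the state of one more binary variable, and so halves the number of minterms still consistent with H0. The following minimal sketch in Python is my own illustration of this narrowing, not Shafer's formal apparatus; the variable names follow the example above.

    from itertools import product

    # Refining H0: each fixed variable halves the set of consistent minterms.
    VARS = ["A", "B", "C", "D"]   # male; under 30; left-handed; known to victim

    def remaining(fixed):
        """Count the 0/1 assignments over VARS that agree with the fixed values."""
        idx = {v: i for i, v in enumerate(VARS)}
        return sum(1 for bits in product([0, 1], repeat=len(VARS))
                   if all(bits[idx[v]] == val for v, val in fixed.items()))

    fixed = {}
    print("undifferentiated H0:", remaining(fixed), "minterms")   # 16
    for v, val, label in [("A", 1, "evidence A*"), ("B", 1, "evidence B*"),
                          ("C", 1, "guess c"), ("D", 1, "evidence D*")]:
        fixed[v] = val
        print("after", label + ":", remaining(fixed), "minterms remain")  # 8, 4, 2, 1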
Two possible suggestions known to me at present come from the works of the logicians Jaakko and Merrill Hintikka [e.g. 1983, 1992, 1991] and the philosopher Isaac Levi [e.g. 1983, 1984, 1991]. The Hintikkas propose an interrogative model for discovery in which a game is being played against nature. At any step in this game the player has a choice about whether to deduce what is or has happened from acquired knowledge or to ask a new question of nature. These new questions are graded by the extent to which they are strategically important in the sense that they open up new productive lines of investigation. This eliminates problems associated with trying to define and identify the "right" questions; at any stage of discovery, only clairvoyance would allow us to determine exactly what questions we should ask at this stage. So one way to assess the fitness of hypotheses seems to be in terms of their judged strategic importance: will this new possibility we are considering allow us to further the investigation in productive ways? In many episodes of discovery this might not be such an easy question to answer. The complex and especially nonlinear nature of the world about us is always full of surprises.

In Levi's works he attends, among other things, to distinctions between the various forms of reasoning I mentioned above. Specifically, he notes current arguments that discovery involves induction rather than abduction. In his own arguments Levi notes that in induction, which involves the justification or testing of hypotheses, we already have existing hypotheses that have been generated somehow and that are taken seriously. In abductive reasoning, however, our only claim is that some hypothesis is possible; i.e., we have not yet accepted this hypothesis into our corpus or body of knowledge. Levi argues that one way of grading the suitability or fitness of a new hypothesis is to test its informational virtue; i.e. to see what new phenomena [potential items of evidence] this new hypothesis suggests. This new hypothesis will be useful to the extent that it allows us to open new lines of evidence and to generate other new possibilities.

In closing here I simply mention that discovery-related processes are not sufficiently well understood for us to be able to describe easily how they may be assisted in various ways. Attempts to contrive a logic of discovery have not met with any discernible success. It has been said that the human brain is the very cathedral of complexity in the known universe [Coveney & Highfield, 1995, 279]. As far as I can tell, the most interesting and complex "services" taking place in this cathedral concern how we generate new ideas in the form of hypotheses or possible explanations for events of interest to us.

3.3 Boolean Functions, Ksat, and Phase Transitions in Discovery

I hope I have given adequate evidence that, for binary events, hypotheses can be expressed as Boolean functions. I have given two examples. The first involved expressing hypotheses about design fitness criteria as Boolean functions; I gave an example above on page 27. My second example [pages 31-32] involved any investigation in which alternative conjunctive event combinations [scenarios] may be associated with some hypothesis. Further, as my comments on the minterm and maxterm representations of any Boolean function illustrate, we can decompose such functions to determine the exact number of specific ways in which any Boolean function can be satisfied.
Thus, in the case of engineering designs, we can in theory at least determine how many specific designs, expressed as minterms, satisfy given fitness criteria. For prodigiously large design search spaces, such as those encountered in Inventor 2000, we will not be able to list all the possible ways in which some fitness criteria can be satisfied. In the case of other investigations, we can use these decomposition methods to help settle arguments about which combinations of evidence [i.e. scenarios] some hypothesis best explains. Since all discovery involves the generation of hypotheses, I have always been interested in ways of determining the alternative specific ways in which some hypothesis might be satisfied. How pleased I was recently to discover that others have been deeply concerned about Boolean functions and the satisfiability of hypotheses in various contexts. While reading Stuart Kauffman's new book Investigations [2000] I was greatly interested in his discussion of Boolean functions and what he has termed the Ksat problem [Kauffman, 2000, 192-194]. The term Ksat is shorthand for K-satisfiability. Associated with this Ksat problem is a very interesting phase transition that, I believe, has very important implications for hypothesis generation and testing in any context. The following comments, if nothing else, supply a different formulation for examining Ksat problems. What I have done is to take an example Boolean function that Kauffman presents and express this same function in two formally equivalent ways using the minterm and maxterm expansions I have discussed. In the process I hope to add a bit to the discussion of the Ksat problem and its consequences.

Kauffman begins by saying [2000, page 192]: "Any Boolean function can be expressed in 'normal disjunctive form', an example is (A1 or A2) and (A3 or A4) and (not A1 or not A4)". Sloth overtakes me just now, and so I will eliminate the subscript terminology here and let A1 = A, A2 = B, A3 = C, and A4 = D; I will also suppress the intersection symbol according to the convention I mentioned above on page 3. With these revisions, Kauffman's Boolean function can be written as: f(A, B, C, D) = (A ∪ B)(C ∪ D)(Ac ∪ Dc).

Before I express this function in two different ways, I need to say a bit about terminology. It seems that there is some disagreement among mathematicians about what to call Boolean statements involving disjunctions of conjunctions or conjunctions of disjunctions. For example, my favorite mathematics dictionary [Borowski & Borwein, 1991] defines a disjunctive normal form as a disjunction of conjunctions and a conjunctive normal form as a conjunction of disjunctions. All Boolean functions have terms in parentheses that are themselves connected by either disjunction or conjunction. The parenthesized terms [Kauffman calls them clauses] are in turn connected disjunctively or conjunctively. The definitions I have just given focus on how clauses are connected and not on how the events in a clause are connected. However, I agree with Kauffman's interpretation, since it corresponds with how I have described my minterm and maxterm expansions of Boolean functions. We both focus on how the events within a parenthesis are connected. Minterms involve events connected conjunctively, and maxterms involve events connected disjunctively. I will now express Kauffman's Boolean function in two different ways, each of which provides additional information about this function and sets the stage for my discussion of the Ksat problem.
3.3.1 Kauffman's Example in Conjunctive Canonical Form [Minterms]

Using the minterm expansion theorem, together with the method I described above on page 8, we can express Kauffman's f(A, B, C, D) = (A ∪ B)(C ∪ D)(Ac ∪ Dc) as (AcBCcD) ⊎ (AcBCDc) ⊎ (AcBCD) ⊎ (ABcCDc) ⊎ (ABCDc). Using the binary designator or bit-string method for numbering minterms, we can also express f(A, B, C, D) here as M5 ⊎ M6 ⊎ M7 ⊎ M10 ⊎ M14 [see Figure 6 below]. Remember that the symbol ⊎ means "disjoint union". So Kauffman's Boolean function f(A, B, C, D) = (A ∪ B)(C ∪ D)(Ac ∪ Dc) can be satisfied in any of five specific ways, as indicated by these five minterms. Remember that minterms represent the finest-grain decomposition of a basic space of outcomes that is allowed by the binary nature of the events in Boolean functions. Thus, if Kauffman's Boolean function were associated with some hypothesis H, the five minterms just identified show the exact number of ways that H could be satisfied. The five minterms listed here might each, with appropriate event labeling, be a possible scenario that satisfies H. In the engineering design context, H might be some statement of fitness, and the five minterms represent the five specific designs that will satisfy H. Now, in Kauffman's example, we have V = four binary variables. As mentioned earlier, we thus have 2^4 = 16 possible minterms, shown in the diagram below. Thus we also have 2^16 = 65,536 possible Boolean functions in this four-variable case. This simply tells us the total number of hypotheses that are possible concerning the four binary variables in this case.

                 AcBc    AcB     ABc     AB
      Cc  Dc     M0      M4      M8      M12
      Cc  D      M1      M5      M9      M13
      C   Dc     M2      M6      M10     M14
      C   D      M3      M7      M11     M15

                       Figure 6

In pictorial form, Figure 6 shows the conjunctive satisfiability of Kauffman's Boolean function. Five of the sixteen possible minterms will each satisfy this Boolean function.

3.3.2 Kauffman's Example in Disjunctive Canonical Form [Maxterms]

As I discussed earlier [pages 9-12], there is a formally equivalent way of expressing Boolean functions, such as the one Kauffman describes, in disjunctive canonical form in terms of maxterms. I will use both the De Morgan and the Gregg methods to produce a canonical decomposition of Kauffman's Boolean function f(A, B, C, D) = (A ∪ B)(C ∪ D)(Ac ∪ Dc). I use both methods for two reasons. First, they are both informative, but in different ways. Second, they provide a check on my Boolean manipulations; I should of course get the same answer using both methods.

I begin with the minterm decomposition of f(A, B, C, D) = (A ∪ B)(C ∪ D)(Ac ∪ Dc), which I claimed was f(A, B, C, D) = M5 ⊎ M6 ⊎ M7 ⊎ M10 ⊎ M14. We first note that [M5 ⊎ M6 ⊎ M7 ⊎ M10 ⊎ M14] = [M0 ⊎ M1 ⊎ M2 ⊎ M3 ⊎ M4 ⊎ M8 ⊎ M9 ⊎ M11 ⊎ M12 ⊎ M13 ⊎ M15]c. If we apply De Morgan's law twice to this complemented term in the right-hand expression, we can express Kauffman's Boolean function in disjunctive canonical form: f(A, B, C, D) = (A ∪ B)(C ∪ D)(Ac ∪ Dc) = (A ∪ B ∪ C ∪ D)(A ∪ B ∪ C ∪ Dc)(A ∪ B ∪ Cc ∪ D)(A ∪ B ∪ Cc ∪ Dc)(A ∪ Bc ∪ C ∪ D)(Ac ∪ B ∪ C ∪ D)(Ac ∪ B ∪ C ∪ Dc)(Ac ∪ B ∪ Cc ∪ Dc)(Ac ∪ Bc ∪ C ∪ D)(Ac ∪ Bc ∪ C ∪ Dc)(Ac ∪ Bc ∪ Cc ∪ Dc). The first thing to note here is that Kauffman's Boolean function, expressed in disjunctive canonical form, involves eleven disjunctive maxterms, all of which are combined conjunctively. In other words, we must have all eleven of these maxterms taken together to satisfy f(A, B, C, D) = (A ∪ B)(C ∪ D)(Ac ∪ Dc). Now I will employ Gregg's method for disjunctive decomposition, which I described above on pages 10-11.
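Readers wishing to check this expansion can do so by brute force. The following minimal sketch in Python, my own check rather than anything Kauffman provides, enumerates all sixteen settings of the four variables and reports the minterm numbers that satisfy f.

    from itertools import product

    # Enumerate all 16 settings of (A, B, C, D) and list the minterm numbers
    # that satisfy f = (A or B) and (C or D) and (not A or not D).

    def f(a, b, c, d):
        return (a or b) and (c or d) and ((not a) or (not d))

    satisfying = []
    for a, b, c, d in product([0, 1], repeat=4):
        if f(a, b, c, d):
            # Minterm number = bit string (A B C D) read as a binary integer.
            satisfying.append(a * 8 + b * 4 + c * 2 + d)

    print(satisfying)   # [5, 6, 7, 10, 14] -> M5, M6, M7, M10, M14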
    Row   A  B  C  D   (A ∪ B)   (C ∪ D)   (Ac ∪ Dc)   f    Maxterm
     0    0  0  0  0      0         0          1       0    (A ∪ B ∪ C ∪ D)
     1    0  0  0  1      0         1          1       0    (A ∪ B ∪ C ∪ Dc)
     2    0  0  1  0      0         1          1       0    (A ∪ B ∪ Cc ∪ D)
     3    0  0  1  1      0         1          1       0    (A ∪ B ∪ Cc ∪ Dc)
     4    0  1  0  0      1         0          1       0    (A ∪ Bc ∪ C ∪ D)
     5    0  1  0  1      1         1          1       1    ----
     6    0  1  1  0      1         1          1       1    ----
     7    0  1  1  1      1         1          1       1    ----
     8    1  0  0  0      1         0          1       0    (Ac ∪ B ∪ C ∪ D)
     9    1  0  0  1      1         1          0       0    (Ac ∪ B ∪ C ∪ Dc)
    10    1  0  1  0      1         1          1       1    ----
    11    1  0  1  1      1         1          0       0    (Ac ∪ B ∪ Cc ∪ Dc)
    12    1  1  0  0      1         0          1       0    (Ac ∪ Bc ∪ C ∪ D)
    13    1  1  0  1      1         1          0       0    (Ac ∪ Bc ∪ C ∪ Dc)
    14    1  1  1  0      1         1          1       1    ----
    15    1  1  1  1      1         1          0       0    (Ac ∪ Bc ∪ Cc ∪ Dc)

    [In this table, f = (A ∪ B)(C ∪ D)(Ac ∪ Dc).]

It appears that I have performed my De Morgan operations appropriately, since the lists of maxterms generated by the De Morgan and the Gregg methods agree. The table above helps me to explain a major distinction between Kauffman's original Boolean function and my maxterm expansion of it. This distinction involves the number of variables in the clauses of our functions. In Kauffman's formulation, all clauses have just two binary variables. But in both my minterm and maxterm expansions, all clauses have all four binary variables. My method requires that this be so. Indeed, the definitions of a minterm [page 5] and a maxterm [page 10] require that both of these terms contain states of all the Boolean variables in the function decomposition mentioned above. This will be an important point to keep in mind as we proceed. My minterms and maxterms, as clauses, will always have a number of variables per clause [K] that is equal to the total number of variables [V].

Now it is time to compare the minterm and maxterm expansions of Kauffman's Boolean function to see what they reveal. First, only the minterm expansion reveals the specific bit strings that satisfy this Boolean function. You can see this in the table above. Only the bit strings in Rows 5, 6, 7, 10, and 14 correspond to the truth of, or satisfy, Kauffman's f(A, B, C, D) = (A ∪ B)(C ∪ D)(Ac ∪ Dc). This is the same thing as saying that the only conjunctive minterms that satisfy this function are M5, M6, M7, M10, and M14. Far less informative is the equivalent statement that this Boolean function is also satisfied by the intersection of all eleven disjunctive maxterms derived from the remaining eleven bit strings. Thus, to satisfy Kauffman's Boolean function we must have one or another of the five minterms I have identified. Or, equivalently, we must have all of the maxterms I have identified. The reason, as you see in the table above, is that no one of the individual bit strings in Rows 0, 1, 2, 3, 4, 8, 9, 11, 12, 13, and 15 corresponds with the truth of, or satisfies, f(A, B, C, D) = (A ∪ B)(C ∪ D)(Ac ∪ Dc). Observe in the last column of the table above that the maxterm associated with any of these eleven rows has a complementation pattern opposite to that of the bit string with which it is associated.

3.3.3 Ksat and Phase Transitions

I can now apply what I have done so far to Kauffman's very interesting discussion of phase transitions in the satisfiability of Boolean functions [Kauffman, 2000, 192-194]. At first I will restrict attention to my maxterm expansion of Kauffman's original Boolean function.
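The table itself is mechanical enough to generate programmatically. The following minimal sketch in Python is my own rendering of this Gregg-style tabulation, not Gregg's notation: it evaluates each clause on every bit string and emits a maxterm for each row that fails to satisfy f.

    from itertools import product

    # Gregg-style tabulation: evaluate each clause on every bit string and
    # emit a maxterm (the disjunction of the complemented states) for each
    # row that does not satisfy the function.

    def maxterm(bits):
        """Complement each state in the row's bit string; join disjunctively."""
        names = "ABCD"
        return "(" + " ∪ ".join(n if b == 0 else n + "c"
                                for n, b in zip(names, bits)) + ")"

    for row, (a, b, c, d) in enumerate(product([0, 1], repeat=4)):
        satisfied = (a or b) and (c or d) and ((not a) or (not d))
        if not satisfied:
            print(row, maxterm((a, b, c, d)))
    # Prints the eleven maxterms from rows 0-4, 8, 9, 11, 12, 13, and 15.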
To set the stage for my comments on Ksat problems, let's review the three essential ingredients of the Kauffman [hereafter Stuart] and Schum [hereafter Dave] formulations of Stuart's Boolean function:

                         Variables [V]   Clauses [C]   Variables in Each Clause [K]
    Stuart:                    4              3                      2
    Dave [Maxterms]:           4             11                      4

Stuart first draws upon the work of a physicist named Scott Kirkpatrick, who has studied Boolean functions and the extent to which they might be satisfied [unfortunately, Stuart does not supply a reference to Kirkpatrick's work]. What Kirkpatrick discovered was the importance of the ratio between the number of clauses [C] in a Boolean function and the number of variables [V] that this function involves. As C gets larger than V, a point is eventually reached at which the probability of satisfying the Boolean function drops suddenly and precipitously, corresponding to a phase transition. What is remarkable about Kirkpatrick's work is that his studies revealed that a specific point on the C/V ratio could be determined where this phase transition will occur, and that this critical ratio C/V = R depends only on K, the number of variables in each clause. He showed that R = (ln 2)(2^K) = 0.6931(2^K). Here is where the term Ksat comes from; this critical ratio R depends only on K. I note here that Kirkpatrick's formulation assumes that there will always be the same number of variables in each of the disjunctive clauses in a Boolean function. In forming any Boolean function there is no requirement that all clauses have the same number of variables. For example, we might be interested in the satisfiability of the Boolean function g(A, B, C, D) = (A ∪ B)(B ∪ Cc ∪ D)(A ∪ Dc). In this case K is not constant over the three disjunctive clauses in the Boolean function. Figure 7 shows the nature of this phase transition.

[Figure 7. The probability of satisfiability, plotted from 0 to 1.0, as a function of the ratio C/V; the probability drops sharply at the critical value C/V = (ln 2)(2^K).]

I believe that this very interesting result can give formal expression to at least one interpretation of Occam's Razor. The number of clauses in a Boolean function basically identifies the number of constraints imposed in satisfying this function. If the number of these constraints becomes many times larger than the number of variables, the chances of finding a result that satisfies this function decrease, and precipitously so; this is what is so interesting about Kirkpatrick's result. Let me return for a moment to hypotheses expressed as Boolean functions. What this phase transition says is that a point will be reached at which the specificity of our hypothesis suddenly outruns our ability to find any scenario that satisfies it. Here is another way, I believe, to interpret Occam's Razor: detailed hypotheses are often necessary in many situations, but when they become too detailed we may never be able to find combinations of evidence that are consistent with them.

I spent a bit of time exploring Stuart's Boolean function and my maxterm representation of it as they relate to this critical ratio R = (ln 2)(2^K). I thought this might be interesting since the value of K is different in our two formally equivalent expressions; so is the ratio C/V. For Stuart, K = 2; for me, K = 4 = V. In any maxterm expansion K must equal V. In Stuart's expression the actual value of C/V = 3/4; in my maxterm expression, the actual C/V = 11/4 = 2.75. I next calculated the critical value of R [for phase transition] in each of our expressions. For Stuart, R = (ln 2)(2^2) = 2.772588722.
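Readers curious about the shape of this transition can explore it empirically. The following minimal simulation sketch in Python is my own illustration, not Kirkpatrick's actual experiments; with so small a number of variables the observed drop is gradual, and its location need not coincide exactly with the (ln 2)(2^K) approximation.

    import math
    import random
    from itertools import product

    # Estimate the probability that a random conjunction of C disjunctive
    # K-variable clauses over V variables is satisfiable, by brute force.

    def random_clause(v, k):
        """K distinct variables, each negated with probability 1/2."""
        chosen = random.sample(range(v), k)
        return [(i, random.random() < 0.5) for i in chosen]   # (index, negated?)

    def satisfiable(clauses, v):
        """Brute-force check over all 2^V assignments."""
        for bits in product([0, 1], repeat=v):
            if all(any(bits[i] != neg for i, neg in clause) for clause in clauses):
                return True
        return False

    V, K, TRIALS = 10, 3, 100
    print("critical C/V approximation:", math.log(2) * 2 ** K)  # about 5.55 for K = 3
    for ratio in [2, 3, 4, 5, 6]:
        C = ratio * V
        sat = sum(satisfiable([random_clause(V, K) for _ in range(C)], V)
                  for _ in range(TRIALS))
        print(f"C/V = {ratio}: estimated P(satisfiable) = {sat / TRIALS:.2f}")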
For my maxterm expression, R = (ln 2)(2^4) = 11.09035489 [I carry all these decimals here for a reason I'll mention in just a minute]. Both of our actual C/V ratios fall short of the critical R required for a phase transition. I think the reasons differ, however, which is another point I'll address a bit later. Next, using the critical value of the ratio C/V = R = (ln 2)(2^K), and knowing that in both Stuart's and my formulations V = 4, I wondered how large Stuart's C could be made before he encounters a phase transition. In Stuart's case, C = (4)R = (4)(2.772588722) = 11.09035488, which is about as close as it can come to the value of the critical ratio in my maxterm formulation of his Boolean function. In this case, our actual C/V ratios would be the same, namely 11/4.

Calculation of C for my maxterm expression produces something of an anomaly, which I believe I can explain. For Dave, C = (4)R = (4)(11.09035489) = 44.36141956. This seems a preposterously large number of clauses I could add to my maxterm expansion before I fall off the phase-transition cliff. What it does say, however, is that I stand virtually no chance of doing so. I believe the reason is that my minterm and maxterm expansions simply illustrate all the ways in which Stuart's original Boolean function can be satisfied. My minterm expansion shows that there are just five specific ways in which four-variable conjunctions will satisfy his function, and my maxterm expansion shows that there are exactly eleven four-variable disjunctions, all of which must be true to satisfy his function. I add here that this conjunction of eleven disjunctions is immediately derivable from the five specific conjunctive minterms.

In closing, I add only that discussions of these very interesting Ksat problems seem incomplete without discussion of the two major ways in which any Boolean function can be expanded in terms of minterms and maxterms. Each of these expansions provides information that lurks within a Boolean function but is not exposed until the expansions are performed. Minterm and maxterm expansions give a complete account of the satisfiability of a Boolean function. It is usually not easy to tell, just by examining an original Boolean function, whether it is satisfied by any particular settings of its event ingredients. As Stuart mentions, some settings of the ingredients of Boolean functions may lead to contradictions. Fortunately, these possibilities are all eliminated in the procedure for generating a minterm expansion of a Boolean function [see Step 3 on page 8]. The minterm and maxterm expansions I have provided for Stuart's Boolean function are simply examples of how we can rather easily [at least for relatively simple functions] determine the specific situations in which a Boolean function can be satisfied. The Ksat problem seems very important, and I do hope that discussion of it will continue.

4.0 A BRIEF SUMMARY

My belief is that the concept of a Boolean function is vital in many studies of discovery and invention. Thanks to those who have studied these functions, we have ways of determining specifically how these functions may be satisfied. I have shown how Boolean functions, and the elements that arise in their decomposition, are useful in capturing attributes of the genetically inspired evolutionary computation approach to search processes in engineering design.
Equally important are their applications in other areas in which many forms of evidence are employed to generate and test hypotheses about events or phenomena in law, intelligence analysis, and other important investigative areas. In addition, such functions arise naturally in abstract studies of how complex situations involving hypothesis generation and evaluation might be profitably investigated. I am certainly not the only person to observe the value of construing many different problems in terms of Boolean functions. But I do hope that my present collection of ideas adds a bit to the discussion of discovery and invention and that it will help generate your own further thoughts about these very interesting and very complex intellectual processes.

REFERENCES

Arciszewski, T., Sauer, T., Schum, D. Conceptual Designing: Chaos-Based Approach. Journal of Intelligent and Fuzzy Systems, Vol. 13, 2002/2003, 45-60
Birkhoff, G., MacLane, S. A Survey of Modern Algebra. Macmillan, NY, 1965
Borowski, E., Borwein, J. Harper Collins Dictionary of Mathematics. Harper Collins, NY, 1991
Coveney, P., Highfield, R. Frontiers of Complexity: The Search for Order in a Chaotic World. Fawcett Columbine, NY, 1995
Dumitrescu, D., Lazzerini, B., Jain, L., Dumitrescu, A. Evolutionary Computation. CRC Press, Boca Raton, FL, 2000
Gregg, J. Ones and Zeros: Understanding Boolean Algebra, Digital Circuits, and the Logic of Sets. IEEE Press, NY, 1998
Hintikka, J. Sherlock Holmes Formalized. In: The Sign of Three: Dupin, Holmes, Peirce, eds. Eco, U., Sebeok, T. Indiana University Press, Bloomington, IN, 1983, pp 170-178
Hintikka, J. The Concept of Induction in the Light of the Interrogative Approach to Inquiry. In: Inference, Explanation, and Other Frustrations: Essays in the Philosophy of Science, ed. Earman, J. University of California Press, Berkeley, 1992
Hintikka, J., Bachman, J. What If?: Toward Excellence in Reasoning. Mayfield, Mountain View, CA, 1991
Hintikka, J., Hintikka, M. Sherlock Holmes Confronts Modern Logic: Toward a Theory of Information-Seeking Through Questioning. In: The Sign of Three: Dupin, Holmes, Peirce, eds. Eco, U., Sebeok, T. Indiana University Press, Bloomington, IN, 1983, pp 154-169
Holland, J. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, 1975
Hunt, C. Agent-Based Evidence Marshaling: Agent-Based Creative Processes for Discovering and Forming Emergent Scenarios and Hypotheses. Doctoral Dissertation, George Mason University, 12 May, 2001
Kadane, J., Schum, D. A Probabilistic Analysis of the Sacco and Vanzetti Evidence. Wiley & Sons, NY, 1996
Kauffman, S. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, NY, 1993
Kauffman, S. At Home in the Universe. Oxford University Press, NY, 1995
Kauffman, S. Investigations. Oxford University Press, NY, 2000
Keeney, R., Raiffa, H. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley & Sons, NY, 1976
Levi, I. The Enterprise of Knowledge: An Essay on Knowledge, Credal Probability, and Chance. MIT Press, 1983
Levi, I. Decisions and Revisions: Philosophical Essays on Knowledge and Value. Cambridge University Press, 1984
Levi, I. The Fixation of Belief and Its Undoing: Changing Beliefs through Inquiry. Cambridge University Press, 1991
Pfeiffer, P. Sets, Events, and Switching. McGraw-Hill, NY, 1964
Pfeiffer, P. Concepts of Probability Theory. McGraw-Hill, NY, 1965
Pfeiffer, P. Probability for Applications. Springer-Verlag, NY, 1990
Pfeiffer, P., Schum, D. Introduction to Applied Probability. Academic Press, NY, 1973
Schum, D. Marshaling Thoughts and Evidence During Fact Investigation. South Texas Law Review, Vol. 40, No. 2, Summer 1999, pp 401-454
Schum, D. Discovery, Invention, and Their Enhancement. First International Conference: Innovation in Architecture, Engineering, and Construction, Loughborough University, Great Britain, 15 January, 2001
Shackle, G. L. S. Decision, Order, and Time in Human Affairs. Cambridge University Press, 1968
Shafer, G. A Mathematical Theory of Evidence. Princeton University Press, 1976
Thorp, E. Elementary Probability. Wiley & Sons, NY, 1966
Von Neumann, J., Morgenstern, O. The Theory of Games and Economic Behavior. Princeton University Press, 1946
Wigmore, J. The Science of Judicial Proof: As Given by Logic, Psychology, and General Experience and Illustrated in Judicial Trials. 3rd ed. Little, Brown & Co., Boston, 1937