0 Boolean algebra, Boolean nodes Encyclopedia of Medical Decision Making, 2009 contribution by J. Hilden, j.hilden@biostat.ku.dk Boolean algebra, Boolean nodes A Boolean, or logical, variable is one that can take the values T (true) or F (false), and the Boolean, or logical, algebra, pioneered by George Boole (1815-1864), holds the formal machinery that allows such truth values to be logically combined. Boolean principles offer a framework for handling questionnaire and symptom data of the common binary kind (Yes vs. No, normal (‘negative’) vs. abnormal (‘positive’), etc.). For clinical decisions, even measurements, probabilities, etc., often have to be dichotomized (binarized). Library searches exploit Boolean AND, OR and NOT; and the digital computer is essentially a huge number of electronic switches (On vs. Off) connected in a Boolean manner, marching to the beat of a clock. Boolean principles also underlie logical checking of rule-based decision support systems for inconsistencies, incompleteness and redundancy. The algebra For example, let A, B, C, … be diagnostic tests or, more precisely, the Boolean variables that hold the answers to ‘Did test A come out positive?’, etc. Boolean negation, alias NOT, swaps T and F, indicating, in our example, whether a test came out negative: ¬A = (not A) = (false if A is true; true if A is false) = (F if A, otherwise T). Note that ¬(¬A) = A. Other basic operations are AND and OR: AND (‘Did both A and B come out positive?’): A B = (A and B) = (T if both A and B, otherwise F). OR (‘Did A or B, or both, come out positive?’): 1 A B = (A or B) = (F if neither A nor B, otherwise T) = ¬(¬A ¬B). The mirror image of the right-most identity also works: A B = ¬(¬A ¬B) = (F if one or both of A and B are false, otherwise T). Set theory involves set intersection (), union () and complementing, which are the precise analogues of , and ¬, respectively. OR and AND are associative operations: more than two terms can be ORed (ANDed), in arbitrary order, to reflect ‘at least one true’ (‘all true’). Distributive properties include: (A B) (A C) = A (B C), (A B) (A C) = A (B C). The former may be read: to be a ‘female diabetic or female with hypertension’ means to be a ‘female with diabetes or hypertension.’ Self-combination: (A A) = (A A) = A . Finally, test A must be either positive or negative, but cannot be both: (A ¬A) = T, (A ¬A) = F . The former expression is a tautology, i.e. a necessarily true proposition. The OR described so far is the inclusive OR, cf. “or both” above. Informatics (checksums, cryptography) makes frequent use of the exclusive OR, abbr. EXOR. Equivalence (≡), in the sense of having the same truth value, and EXOR are each other’s negations: (A ≡ B) = (A and B are both true or both false) , (A EXOR B) = ¬(A ≡ B) = (one of A and B is true, not both) . Now, ((A EXOR B) EXOR C) is true if just one or all three terms are true. Extending this rule to repeated EXORs of multiple Boolean terms, one finds that the result is true if the number of true terms is odd, and false if it is even. 2 The implication symbol – a warning The implication symbol (→) is a treacherous abbreviation: (A → U) = (A ‘implies’ U) = (U ¬A) = (if A, then U; otherwise T) . Spelling it out: if A, then the expression reproduces the truth value of U; if not A, then the result is T – regardless of U! This is different from ‘A causes U,’ however construed. It is also different from the language of decision recommendations and deductions in rule-based decision support systems. When you come across a statement like ‘If symptom A is present, diagnosis U is applicable,’ you take it to tell you nothing about patients without symptom A. If told that the statement is untrue, you take that to mean that A alone should not trigger label U; and this is exactly how a rule-based system would react to cancellation of ‘If A, then U.’ The Boolean negation ¬(A → U) = (¬U A), on the other hand, would claim that, despite symptom A, diagnosis U was not made (in a particular case), or: every patient has symptom A and diagnosis U is never made (when read as a general rule). Physical analogues A and B may be gates. When they must be passed one after the other, entry is possible if and only if both gates are open. Symbolically, E = A B. If, on the other hand, A and B are alternative routes of entry, E = A B (is at least one gate open?). Electrical switches connected ‘in series’ or ‘in parallel’ are analogous. Diagnostic tests in series and in parallel, see below. Boolean data in programming – Boolean nodes Most programming languages make available a Boolean, or Logical, data type. It may look thus: Boolean L; #declares L to be a Boolean variable# L := (n < 20); #L becomes true or false depending on the value of n# if( L )print(“n small”); #to be printed only if n is below 20# Programming languages have different ways of writing AND and OR. Often, however, it is convenient to let T and F be represented by 1 and 0, leading to: ¬A = (1 – A), 3 ABC … = ABC… [an ordinary product of zeros and ones] = min(A, B, C, …) , A B C … = max(A, B, C, …) . Conversely, mathematicians often write and for min and max; so x y = min(x, y), the smaller of x and y. Checksum calculations, involving EXORing of multiple 0-or-1 digits, produce the result 1 precisely when the number of component 1’s is odd, as stated above, i.e. when their ordinary sum is odd: EXOR(A, B, C, …) = (1 if A + B + C + … is odd; otherwise 0) . Boolean nodes A standard decision tree has a stem (clinical problem) and two types of splits: chance nodes and decision nodes. Between the stem and the leaves, each of which represents a possible clinical diary, the tree gets increasingly bushy because there is no way branches can rejoin. In practice, many subtrees are nearly identical and can be drawn, and programmed, just once. They can then be entered from various points as subroutines, with arguments that let them inherit patient features from the calling point. Numerical arguments allow probabilities and utilities to be context-dependent (e.g., age-dependent); Boolean arguments allow the subtree to be structurally modified. They govern special Boolean nodes, which act as switches. The Boolean argument ‘Has the patient already had a hemicolectomy?’ may govern a switch that blocks a decision branch involving hemicolectomy. Likewise, in modeling an annual screening program, the same annual subtree may have to be entered recursively: a Boolean switch may serve to prevent endless recursion. Boolean nodes allow compact, minimally redundant, trees to be drawn – sometimes at the expense of intelligibility. (Some authors have used the term for other kinds of two-way nodes.) Boolean matrices Consider a graph of N interconnected nodes (vertices). An N×N matrix C = {Cij} of Boolean elements may represent the interconnections (arcs, edges), Cij being T if and only if node i is directly connected to node j. Both directed and non-directed graphs may be represented in this way; in the latter case, the matrix is symmetric (Cij = Cji). The diagonal elements, Cii, are set to F 4 unless self-referring nodes make sense, as they do in state-progression models, including Markov chains. Matrix C is called the adjacency matrix of the graph. The matrix product D = CC, with addition and multiplication replaced with OR and AND, now answers the question: can one pass in precisely two steps from node i to node k? Readers familiar with matrices will see that Dik = j=1,…,N{ Cij Cjk } = (Ci1 C1k) (Ci2 C2k) . . . = (is there at least one node j that can be used as a stepping stone?) . This idea can be elaborated to answer many types of connectedness questions. Example: In this directed graph, the arcs are numbered for convenience, and the elements in matrix C that go into the calculation of DA,D are shaded. A 1 B 2 C 3 D 4 E 5 5 Adjacency matrix C (edge absent marked with a dot; edge numbers shown for clarity) To A B C D E From A . T (1) . . . B . . T (2) T (3) . C . . . . . D . . . . T (4) E . . . . T (5) Matrix D = CC (by aligning the ‘From A’ row with the ‘To D’ column in matrix C, both shaded, one sees that the only match is in the second position, implying that node B is the only stepping stone from A to D, i.e. route 1-3) To A B C D E From A . . T (12) T (13) . B . . . . T (34) C . . . . . D . . . . T (45) E . . . . T (55) The Boolean approach to diagnostic tests The 2k possible outcomes of k binary tests, and the k! possible sequences of execution, may render ordinary decision trees unmanageable: one ends up with 2kk! leaves, or 48 when k = 3 (80 when tests may be executed simultaneously). A shortcut is offered by the following, much simpler, 6 Boolean procedure, supplemented, if necessary, by an ad hoc analysis to find the least costly or risky execution scheme (flow-chart). After tabulating the 2k outcomes and the associated clinical actions found by utility maximization, suppose it turns out that intervention U is recommended when, and only when, tests A, B and C have the result patterns: (+ + +), (+ + –), (– + +) or (– – +). The Boolean representation U = (ABC) (AB¬C) (¬ABC) (¬A¬BC) reduces to U = (A B) (¬A C), suggesting that test A should be performed first and, depending on the result, B or C will decide. However, it also reduces to U = (B C) ((B EXOR C)(A ≡ B)), suggesting that one should begin with B and C and, if they disagree, check whether A sides with B. The former execution scheme is attractive when A is inexpensive and risk-free, the latter when A is costly or risky. Unless the choice is obvious, one must go through all contingencies and calculate expected money and utility costs. These concerns, as well as the constraints that arise when tests are technically intertwined, can be handled in a decision tree but the answer found by the Boolean procedure is otherwise exactly the one a decision tree would give. In other words, the two procedures lead to the same test interpretation scheme but the Boolean procedure may require ad hoc supplementary calculations to find the optimal test execution scheme. Warning: Speaking of tests A and B being applied in parallel (in series) may refer to the tests being executed simultaneously (vs. one after the other), but some authors are thinking of the gate analogy above and therefore use the parallel vs. series terminology to characterize two interpretation schemes, viz. U = (A B) vs. U = (A B). As a by-product, the Boolean procedure reveals whether any tests are mandatory (needed in all cases) or redundant (dispensable). In the small artificial example this did not happen. The 1986 paper that popularized these techniques also illustrated some geometric features that the ROC diagram will possess when the lattice of all Boolean combinations of several binary tests is plotted. Jørgen Hilden See also Diagnostic process; Decision support systems; Subtrees (use in constructing decision trees); Causal inferences and diagrams; Markov models; ROC. Further references Boole, G. (1854). An investigation of the laws of thought. London, Macmillan. 7 Garcia-Remesal, M., Maojo, V., Laita, L., Roanes-Lozano, E., & Crespo, J. (2007). An algebraic approach to detect logical inconsistencies in medical appropriateness criteria. Conf Proc IEEE Eng Med Biol Soc, 1, 5148-5151. Glasziou, P., & Hilden, J. (1986). Decision tables and logic in decision analysis. Medical Decision Making, 6, 154-160. Lau, J., Kassirer, J.P., & Pauker, S.G. (1983). Decision Maker 3.0: improved decision analysis by personal computer. Medical Decision Making, 3, 39-43.