Modeling Changing Perspectives — Reconceptualizing Sensorimotor Experiences: Papers from the 2014 AAAI Fall Symposium

Integration of Inference and Machine Learning as a Tool for Creative Reasoning

Bartłomiej Śnieżyński
AGH University of Science and Technology
al. Mickiewicza 30, 30-059 Krakow, Poland
e-mail: bartlomiej.sniezynski@agh.edu.pl

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

In this paper a method to integrate inference and machine learning is proposed. Execution of a learning algorithm is defined as a complex inference rule, which generates intrinsically new knowledge. Such a solution makes the reasoning process more creative and allows the agent's experiences to be re-conceptualized depending on the context. The knowledge representation used in the model is based on the Logic of Plausible Reasoning (LPR). Three groups of knowledge transmutations are defined: search transmutations, which look for information in data; inference transmutations, which are formalized as LPR proof rules; and complex transmutations, which can use machine learning algorithms or knowledge representation change operators. All groups can be used by the inference engine in a uniform manner. An appropriate system model and inference algorithm are proposed, and preliminary experimental results are presented.

Introduction

Traditional reasoning techniques applied in AI offer a convergent interpretation of the stored knowledge, which does not provide new knowledge. Machine learning techniques may be creative and provide diversity, but they are not integrated with the inference process. In this paper a method to integrate these two approaches is proposed. Execution of a learning algorithm is defined as a complex inference rule, which produces new knowledge. Such a solution allows the agent's experiences to be re-conceptualized depending on the context: the perspective in which the stored data is analyzed can be changed, and intrinsically new knowledge is generated.

The proposed solution is formulated as a Multistrategy Inference and Learning System (MILS). The idea is based on the Inferential Theory of Learning (Michalski 1994), in which learning and inference are presented as a goal-guided exploration of the knowledge space using operators called knowledge transmutations. The Logic of Plausible Reasoning (LPR) (Collins and Michalski 1989) is used as the base knowledge representation. However, the approach can also be applied to other logic-based knowledge representation techniques.

MILS combines many knowledge manipulation techniques during reasoning. It is able to use background knowledge, simple proof rules (such as generalization or modus ponens), or complex patterns (machine learning algorithms) to produce information that was not stored explicitly in the knowledge base.

In the following sections related research is discussed, and the MILS model and inference algorithm are presented. Next, preliminary experimental results are described: the knowledge base and three use cases.

Related research

LPR was proposed by Alan Collins and Ryszard Michalski, who in 1989 published the article "The Logic of Plausible Reasoning: A Core Theory" (Collins and Michalski 1989). The aim of this study was to identify patterns of reasoning used by humans and to create a formal system able to represent these patterns. The basic operations performed on knowledge in LPR include: abduction and deduction, used to explain and predict the characteristics of objects based on domain knowledge; generalization and specialization, which generalize or refine information by enlarging or shrinking the set of objects to which the information relates; abstraction and concretization, which change the level of detail in the description of objects; and similarity and contrast, which allow inference by analogy or by lack of similarity between objects. Experimental results confirming that the methods of reasoning used by humans can be represented in LPR are presented in subsequent papers (Boehm-Davis, Dontas, and Michalski 1990a; 1990b).

The objective set by its creators makes LPR significantly different from other knowledge representation methods, such as classical logic, fuzzy logic, multi-valued logic, Dempster-Shafer theory, probabilistic logic, Bayesian networks, semantic networks, ontologies, rough sets, or default logic. Firstly, LPR has many inference rules that are not present in the formalisms mentioned above. Secondly, many parameters are specified for representing the uncertainty of knowledge.

On the basis of LPR, the DIH (Dynamically Interlaced Hierarchies) formalism was developed (Hieb and Michalski 1993b; 1993a). Knowledge consists of a static part, represented by hierarchies, and a dynamic part consisting of traces, which play a role similar to statements in LPR. DIH distinguishes three types of hierarchies: types, components, and priorities. The latter type can be divided into subclasses: hierarchies of measures (used to represent physical quantities), hierarchies of quantification (allowing quantifiers, such as one, most, or all, to be included in traces), and hierarchies of schemes (used to define multi-argument relationships and needed to interpret the traces).

ITL was formulated just after the development of DIH (Michalski 1994). Michalski et al. also developed an ITL implementation, the INTERLACE system (Alkharouf and Michalski 1996). This system is based on DIH and can generate sequences of knowledge operations that enable the derivation of a target trace from the input hierarchies and traces. However, it did not include all kinds of hierarchies, nor probabilities and factors describing the uncertainty of information; rule induction was also not taken into account.

Outline of the logic of plausible reasoning

MILS is based on LPR, formalized as a labeled deductive system (LDS) (Gabbay 1991). If needed, another knowledge representation that can be formulated as an LDS may be used instead of LPR.

The language consists of a finite set of constant symbols C, five relational symbols, and the logical connectives →, ∧. The relational symbols are V, H, B, S, E. They are used to represent statements (V), hierarchy (H, B), similarity (S), and dependency (E).

Statements are represented as object-attribute-value triples V(o, a, v), where o, a, v ∈ C, representing the fact that object o has attribute a equal to v. If object o has several values of a, there should be several corresponding statements in the knowledge base. To represent vagueness of knowledge, this definition can be extended to allow a composite value [v1, v2, ..., vn], a list of elements of C, interpreted as: object o has attribute a equal to v1 or v2, ..., or vn.

The relation H(o1, o, c), where o1, o, c ∈ C, means that o1 is o in context c. The context specifies the range of inheritance: o1 and o have the same values for all attributes which depend on attribute c of object o. To express that one object is below another in any hierarchy, the relation B(o1, o), where o1, o ∈ C, is used. The relation S(o1, o2, c) represents the fact that o1 is similar to o2, where o1, o2, c ∈ C; the context, as above, specifies the range of similarity: only those attributes of o1 and o2 which depend on c have the same values. The dependency relation E(o1, a1, o2, a2), where o1, a1, o2, a2 ∈ C, means that the values of attribute a1 of object o1 depend on attribute a2 of object o2.

In object-attribute-value triples, a value should be placed below its attribute in a hierarchy: if V(o, a, [v1, v2, ..., vn]) is in a knowledge base, there should also be H(vi, a, c) for every 1 ≤ i ≤ n, c ∈ C.
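As an illustration, a fragment of the geography knowledge base used later in the experiments can be written down as labeled formulas. The Python encoding below is an illustrative sketch of ours, not a data structure prescribed by the paper; the formulas and label values are taken from the proof trees in Table 2 (hierarchy formulas carry a pair of values, explained in the experimental section):

    # Illustrative sketch: labeled LPR formulas as tagged Python tuples.
    # Each entry pairs a formula with its label; hierarchy (H) formulas
    # carry two values (one for generalization, one for specialization).
    KB = [
        (("V", "germany", "gnp_chg", "slow_grow"), 0.9),        # statement
        (("V", "europe", "lit_rate", "high"), 0.9),             # statement
        (("H", "germany", "europe", "all"), (1.0, 0.1)),        # hierarchy, context 'all'
        (("H", "slovakia", "europe", "culture"), (1.0, 0.01)),  # hierarchy, context 'culture'
        (("B", "gnp_grow", "gnp_chg"), 1.0),                    # 'below' in a hierarchy
        (("E", "place", "lit_rate", "place", "all"), 1.0),      # dependency
    ]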
Using the relational symbols, formulas of LPR can be defined. If o, o1, o2, ..., on, a, a1, ..., an, v, c ∈ C and v1, ..., vn are lists of elements of C, then V(o, a, v), H(o1, o, c), B(o1, o), S(o1, o2, o, a), E(o1, a1, o2, a2), and V(o1, a1, v1) ∧ ... ∧ V(on, an, vn) → V(o, a, v) are LPR formulas. The LPR language can be extended by adding a countable set of variables, which may be used in place of constant symbols in formulas.

To manage uncertainty the following label algebra is used:

    A = (A, {f_ri}).   (1)

A is a set of labels which estimate the uncertainty of formulas. A labeled formula is a pair f : l, where f is a formula and l ∈ A is a label. A set of labeled formulas can be considered a knowledge base.

LPR inference patterns are defined as proof rules. Every proof rule ri has a sequence of premises (of length p_ri) and a conclusion. {f_ri} is a set of functions used in the proof rules to generate the label of a conclusion: for every proof rule ri an appropriate function f_ri : A^(p_ri) → A should be defined. For a rule ri with premises p1 : l1, ..., pn : ln, the plausible label of its conclusion is calculated as f_ri(l1, ..., ln). Examples of definitions of label algebras can be found in (Śnieżyński 2001; 2002).
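For example, the simple multiplicative label algebra used in the experiments reported below (labels are certainties in [0, 1], and every rule multiplies the labels of its premises) can be expressed as a single label-combination function. This is an illustrative sketch, not the paper's implementation:

    from functools import reduce
    from operator import mul

    # Per the experimental label algebra: a conclusion's label is the
    # product of the premise labels, so longer inference chains yield
    # weaker (lower-label) conclusions.
    def combine(labels):
        return reduce(mul, labels, 1.0)

    # e.g. a conclusion derived from premises labeled 1.0, 0.9 and 0.9:
    print(combine([1.0, 0.9, 0.9]))   # ~0.81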
There are five main types of proof rules: GEN, SPEC, SIM, TRAN and MP. They correspond to the following inference patterns: generalization, specialization, similarity transformation, transitivity of relations, and modus ponens. Some transformations can be applied to different types of formulas; therefore indexes are used to distinguish different versions of the rules. Formal definitions of these rules can be found in (Collins and Michalski 1989; Śnieżyński 2003).

MILS Model

MILS may be used to find an answer to a given hypothesis. The inference algorithm builds a proof, using knowledge transmutations to infer the answer. It may also find substitutions for variables appearing in the hypothesis.

Three types of knowledge transmutations are defined in MILS:
• simple (LPR proof rules),
• complex (using complex computations, e.g. rule induction algorithms or clustering methods),
• search (database or web searching procedures).

A knowledge transmutation can be represented as a triple (p, c, a), where p is a (possibly empty) list of premises or preconditions, c is a consequence (a pattern of the formula(s) that can be generated), and a is an action (empty for simple transmutations) that should be executed to generate the consequence if the premises are true according to the knowledge base.

Every transmutation has an assigned cost, which should represent its computational complexity and/or the other important resources it consumes (e.g. database access or search engine fees). Usually simple transmutations have a low cost, search transmutations a moderate cost, and complex ones a high cost.
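The triple (p, c, a) together with its cost maps naturally onto a small record type. The sketch below is illustrative; run_aq is a hypothetical action wrapping a rule induction algorithm, and the cost value anticipates the experimental settings given later:

    from dataclasses import dataclass
    from typing import Callable, Optional, Sequence

    @dataclass
    class Transmutation:
        premises: Sequence            # p: patterns that must hold first (may be empty)
        consequence: tuple            # c: pattern of the formula(s) produced
        action: Optional[Callable]    # a: None for simple transmutations
        cost: float                   # used by the A* search described below

    # A complex transmutation wrapping a learner (run_aq is hypothetical):
    # aq = Transmutation(premises=(), consequence=("V", "?place", "gov", "?g"),
    #                    action=run_aq, cost=10.0)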
The MILS inference algorithm (Algorithm 1) is an adaptation of the LPR proof algorithm (Śnieżyński 2003), with proof rules replaced by the more general knowledge transmutations. It is based on the AUTOLOGIC system developed by Morgan (1985). To limit the number of nodes expanded and to generate optimal inference chains, the A* algorithm (Hart, Nilsson, and Raphael 1968) is used.

The input is a finite set of labeled formulas KB (a knowledge base) and a hypothesis (question) represented by a formula ϕ, which should be proved from KB. The agent's experience and the context description should also be stored in KB as LPR formulas. If there exists a label l ∈ A such that ϕ : l can be inferred from KB, the corresponding inference chain is returned; otherwise the procedure exits with failure.

Input: ϕ – formula, KB – finite set of labeled formulas
Output: if ∃l ∈ A such that ϕ : l can be inferred from KB: success and P, an inference chain of ϕ : l from KB; else: failure

    T := tree with one node (root) s = [ϕ];
    OPEN := [s];
    while OPEN is not empty do
        n := the first element of OPEN;
        remove n from OPEN;
        if n = [] then
            generate proof P using the path from s to n;
            exit with success;
        end
        if the first formula of n represents an action then
            execute the action;
            if the action was successful then
                add the action's results to KB;
                E := nodes generated by removing the action formula from n;
            end
        else
            K := knowledge transmutations whose consequence can be unified with the first formula of n;
            E := nodes generated by replacing the first formula of n by the premises and action of the transmutations from K, applying the substitutions from the unifiers generated in this step;
            if the first formula of n can be unified with an element of KB then
                add to E the node obtained from n by removing the first formula and applying the substitutions from the unifier;
            end
        end
        remove from E the nodes generating loops;
        append E to T, connecting the nodes to n;
        insert the nodes from E into OPEN;
    end
    exit with failure;

Algorithm 1: MILS inference algorithm

The algorithm generates a tree T whose nodes (N) are labeled by sequences of formulas. Every edge of T is labeled either by a knowledge transmutation whose consequence can be unified with the first formula of the parent node, or by the term kb(l) if the first formula of the parent node can be unified with some ψ : l ∈ KB. The root s of T is labeled by [ϕ]. The goal is to generate a node labeled by the empty sequence of formulas.

As mentioned above, to limit the number of nodes expanded, the A* algorithm is used. Nodes in the OPEN sequence are therefore ordered according to the values of the evaluation function f : N → R, defined as

    f(n) = g(n) + h(n),   (2)

where g : N → R represents the actual cost of the inference chain, computed from the knowledge transmutation costs and the label of ϕ that can be generated, and h : N → R is a heuristic function which estimates the cost of the path from n to the goal node (e.g. the minimal knowledge transmutation cost multiplied by the length of n can be used).
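For concreteness, the following is a simplified, self-contained Python rendering of this search skeleton, under heavy assumptions: ground formulas only (no variables or unification), no action execution or loop detection, and labels omitted. It orders the OPEN list by f(n) = g(n) + h(n), with the heuristic suggested above:

    import heapq

    def prove(goal, kb, rules, min_cost=0.05):
        """Backward proof search over ground formulas.
        kb: set of formulas; rules: list of (premises, consequence, cost).
        Returns the list of proof steps, or None on failure."""
        counter = 0                      # tie-breaker so heapq never compares nodes
        open_list = [(min_cost, 0.0, counter, ((goal,), ()))]
        while open_list:
            _, g, _, (goals, proof) = heapq.heappop(open_list)
            if not goals:                # empty goal sequence: hypothesis proved
                return list(proof)
            first, rest = goals[0], goals[1:]
            if first in kb:              # close the first goal directly from the KB
                counter += 1
                f = g + min_cost * len(rest)       # h(n): min cost times length of n
                heapq.heappush(open_list, (f, g, counter, (rest, proof + (("kb", first),))))
            for premises, consequence, cost in rules:
                if consequence == first:           # replace the goal by the rule's premises
                    counter += 1
                    g2 = g + cost
                    new_goals = tuple(premises) + rest
                    f = g2 + min_cost * len(new_goals)
                    heapq.heappush(open_list, (f, g2, counter, (new_goals, proof + (("rule", first),))))
        return None

    # Example: prove V(germany, gnp_chg, gnp_grow) from a stored statement
    # plus one generalization step (formulas as in the paper's first example):
    kb = {("V", "germany", "gnp_chg", "slow_grow")}
    rules = [([("V", "germany", "gnp_chg", "slow_grow")],
              ("V", "germany", "gnp_chg", "gnp_grow"), 0.1)]
    print(prove(("V", "germany", "gnp_chg", "gnp_grow"), kb, rules))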
All formulas in the proof path can be forgotten when a new task is executed. It is also possible, however, to keep these formulas in a cache knowledge base, together with a counter indicating the number of proofs in which each formula is used. Using a formula in a proof should increase its counter; not using it should decrease the counter. When the counter reaches 0, the formula is removed from this temporary knowledge base.
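This forgetting policy amounts to a reference-counted cache, which can be sketched as follows (an illustration of ours; the names are invented):

    class FormulaCache:
        """Derived formulas survive between tasks while proofs keep using them."""
        def __init__(self):
            self.counters = {}          # formula -> usage counter

        def used_in_proof(self, formula):
            self.counters[formula] = self.counters.get(formula, 0) + 1

        def not_used(self, formula):
            if formula in self.counters:
                self.counters[formula] -= 1
                if self.counters[formula] <= 0:
                    del self.counters[formula]   # counter reached 0: forget it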
Preliminary Experimental Results

In this section the experimental implementation of MILS is described and some examples of inference chains are presented. In the current version of the software, only one complex and several simple knowledge transmutations are implemented: GENo, SPECo, SIMo, GENv, SPECv, SIMv, SPECE, SIME, SPECo→, HB, TRANB, and AQ, where the last is a rule induction transmutation based on Michalski's AQ algorithm (Michalski 1973). Other rule induction algorithms, such as C4.5 (Quinlan 1993), may also be used.

The label algebra is very simple. Every formula is labeled by a single value from the range [0, 1] representing its certainty or strength; only hierarchy formulas are labeled by a pair of such values, one used in generalizations and the other in specializations. To calculate the label of a consequence, the labels of the premises are multiplied. The cost of the MP transmutation is 0.2, the cost of SPECo→ is 0.3, and the costs of HB, TRANB and TRANP are 0.05. The remaining simple transmutations have cost 0.1. The complex transmutation has the highest cost: 10.

The domain on which the system was tested is similar to that used to test LPR (Boehm-Davis, Dontas, and Michalski 1990b). It represents a part of an agent's geography knowledge; hence some facts are uncertain or missing, and some others are not true. The hierarchy of objects used is presented in Figure 1; the statements are presented in Table 1.

[Figure 1: Hierarchy of objects — diagram not reproduced here.]

Table 1: Statements

    Place        | Literacy | GNP change    | GNP per capita | Government type
    Europe       | high     | slow decrease | high           | democracy
    Germany      | high     | slow growth   | medium         | democracy
    Poland       | high     | stable        | medium         | democracy
    Albania      | medium   | fast growth   | low            | communism
    China        | low      | slow decrease | low            | communism
    North Korea  | medium   | slow growth   | low            | -
    x            | medium   |               |                |

The following three questions were asked to show how the system is able to infer plausible knowledge:

1. Is there GNP growth in Germany? – v(germany, gnp_chg, gnp_grow),
2. What is the literacy rate in Slovakia? – v(slovakia, lit_rate, X),
3. Is the government in some unknown country x communistic? – v(x, gov, com).

The answers returned by the system are presented in Table 2. Each answer has two parts: an inferred answer to the question and the inference chain used to derive the answer, in the form of a proof tree. A proof tree has the form p(ϕ, l, r, P), where ϕ : l is the formula proved using rule r, and P is a list of proof trees representing the inference of the premises of r. If ϕ : l was taken from the knowledge base, then P is empty and r = kb(l).

To answer the first question, MILS used the simple knowledge transmutation GENv, which corresponds to abstraction. The knowledge base contains the statement V(germany, gnp_chg, slow_grow); using GENv, slow_grow is replaced by the more general value gnp_grow.

The second question is answered using the SPECo inference rule, which corresponds to specialization. Knowing that the literacy rate in Europe is high (statement V(europe, lit_rate, high)), that Slovakia is a typical European country in the context of culture (H(slovakia, europe, culture)), and that the literacy rate depends on the culture context, we can specialize the object and replace europe with slovakia.

It is not possible to answer the third question using the knowledge base and simple inference rules alone. Therefore the complex AQ transmutation is chosen by the inference algorithm. As a result, it produces implication formulas whose consequence matches the statement V(place, gov, com). The training data was prepared using the statements describing all the countries; every country was a separate example, and the class attribute was gov. As a result, one such formula was generated:

    V(place, lit_rate, [low, medium]) ∧ V(place, gnp_chg, [fast_grow, slow_decr, slow_grow]) → V(place, gov, com)   (3)
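To illustrate how the AQ transmutation consumes the knowledge base, the sketch below (ours; AQ itself is not reproduced, and only a subset of the Table 1 rows is shown for brevity) prepares training examples, one per country with class attribute gov, and indicates the covering condition the learner has to find:

    # Illustrative sketch: Table 1 statements as training examples for rule
    # induction, one example per country; x is excluded (its government is
    # exactly what question 3 asks about).
    examples = [
        {"place": "germany", "lit_rate": "high",   "gnp_chg": "slow_grow", "gov": "democracy"},
        {"place": "poland",  "lit_rate": "high",   "gnp_chg": "stable",    "gov": "democracy"},
        {"place": "albania", "lit_rate": "medium", "gnp_chg": "fast_grow", "gov": "communism"},
        {"place": "china",   "lit_rate": "low",    "gnp_chg": "slow_decr", "gov": "communism"},
    ]
    positive = [e for e in examples if e["gov"] == "communism"]
    negative = [e for e in examples if e["gov"] != "communism"]

    # AQ searches for attribute-value sets that cover every positive example
    # and no negative one; here lit_rate in {low, medium} together with
    # gnp_chg in {fast_grow, slow_decr, slow_grow} suffices, matching the
    # induced implication (3).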
Next, this implication was specialized using SPECo→, which replaced place with x. The result was then derived by modus ponens, using the information about x from the knowledge base. This last example clearly demonstrates the difference between MILS and traditional inference engines: without the learning transmutation, the answer would not be found.

Table 2: Questions and answers returned by MILS

1. v(germany, gnp_chg, gnp_grow)

    p(v(germany, gnp_chg, gnp_grow), vPL(0.9), genv, [
      p(h(slow_grow, gnp_grow, all), hPL(1.0, 0.3), kb, []),
      p(b(gnp_grow, gnp_chg), bPL(1), kb, []),
      p(e(gnp_chg, europe, gnp_chg, all), ePL(1.0), kb, []),
      p(h(germany, europe, all), hPL(1.0, 0.1), kb, []),
      p(v(germany, gnp_chg, slow_grow), vPL(0.9), kb, []) ])

2. v(slovakia, lit_rate, X)

    p(v(slovakia, lit_rate, high), vPL(0.9), speco, [
      p(h(slovakia, europe, culture), hPL(1.0, 0.01), kb, []),
      p(e(europe, lit_rate, europe, culture), ePL(1), spece, [
        p(h(europe, place, all), hPL(1.0, 1.0), kb, []),
        p(e(place, lit_rate, place, all), ePL(1.0), kb, []),
        p(e(place, lit_rate, place, all), ePL(1.0), kb, []) ]),
      p(v(europe, lit_rate, high), vPL(0.9), kb, []) ])

3. v(x, gov, com)

    p(v(x, gov, com), vPL(0.8), aq, [
      p(aq(x, gov, com), vPL(_), kb, []),
      p(v(x, gov, com), vPL(0.8), modusponens, [
        p(impl(v(x, gov, com), [v(x, lit_rate, [low, medium]),
                                v(x, gnp_chg, [fast_grow, slow_decr, slow_grow])]), iPL(1), speci, [
          p(h(x, place, all), hPL(1.0, 0.01), kb, []),
          p(impl(v(place, gov, com), [v(place, lit_rate, [low, medium]),
                                      v(place, gnp_chg, [fast_grow, slow_decr, slow_grow])]), iPL(1), aq, []) ]),
        p(v(x, lit_rate, [low, medium]), vPL(0.8), kb, []),
        p(v(x, gnp_chg, [fast_grow, slow_decr, slow_grow]), vPL(0.9), kb, []) ]) ])

Conclusions and Further Works

The Multistrategy Inference and Learning System (MILS) can be considered an inference system that manages knowledge in a multistrategy fashion, similar to the way human beings do. It combines search, inference and machine learning capabilities, performed in a uniform way. As a result, the reasoning process is more creative than in classical AI models: depending on the agent's experience and the context, different knowledge may be discovered from the stored statements.

Further work will concern enriching the software's capabilities by extending the range of implemented knowledge transmutations; for example, adding a clustering algorithm is planned, which will be used to derive similarity formulas. Testing the system in other, more realistic, domains is also considered: an application in a data analysis system being developed for the Polish Government Protection Bureau, where MILS will be applied in the decision support component. Application of MILS in robotics also seems to be a good research direction.

Acknowledgments

The research reported in the paper was supported by the grant "Information management and decision support system for the Government Protection Bureau" (No. DOBRBIO4/060/13423/2013) from the Polish National Center for Research and Development.

References

Alkharouf, N. W., and Michalski, R. S. 1996. Multistrategy task-adaptive learning using dynamically interlaced hierarchies. In Michalski, R. S., and Wnek, J., eds., Proceedings of the Third International Workshop on Multistrategy Learning.

Boehm-Davis, D.; Dontas, K.; and Michalski, R. S. 1990a. Plausible reasoning: An outline of theory and the validation of its structural properties. In Intelligent Systems: State of the Art and Future Directions. North Holland.

Boehm-Davis, D.; Dontas, K.; and Michalski, R. S. 1990b. A validation and exploration of the Collins-Michalski theory of plausible reasoning. Technical report, George Mason University.

Collins, A., and Michalski, R. S. 1989. The logic of plausible reasoning: A core theory. Cognitive Science 13:1–49.

Gabbay, D. M. 1991. LDS – Labeled Deductive Systems. Oxford University Press.

Hart, P.; Nilsson, N. J.; and Raphael, B. 1968. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Science and Cybernetics 4(2):100–107.

Hieb, M. R., and Michalski, R. S. 1993a. A knowledge representation system based on dynamically interlaced hierarchies: Basic ideas and examples. Technical report, George Mason University.

Hieb, M. R., and Michalski, R. S. 1993b. Multitype inference in multistrategy task-adaptive learning: Dynamic interlaced hierarchies. Technical report, George Mason University.

Michalski, R. S. 1973. AQVAL/1 – computer implementation of a variable-valued logic system VL1 and examples of its application to pattern recognition. In Proc. of the First International Joint Conference on Pattern Recognition.

Michalski, R. S. 1994. Inferential theory of learning: Developing foundations for multistrategy learning. In Michalski, R. S., ed., Machine Learning: A Multistrategy Approach, Volume IV. Morgan Kaufmann Publishers.

Morgan, C. G. 1985. Autologic. Logique et Analyse 28(110–111):257–282.

Quinlan, J. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.

Śnieżyński, B. 2001. Verification of the logic of plausible reasoning. In Kłopotek, M., et al., eds., Intelligent Information Systems 2001, Advances in Soft Computing. Physica-Verlag, Springer.

Śnieżyński, B. 2002. Probabilistic label algebra for the logic of plausible reasoning. In Kłopotek, M., et al., eds., Intelligent Information Systems 2002, Advances in Soft Computing. Physica-Verlag, Springer.

Śnieżyński, B. 2003. Proof searching algorithm for the logic of plausible reasoning. In Kłopotek, M., et al., eds., Intelligent Information Processing and Web Mining, Advances in Soft Computing, 393–398. Springer.