Modeling Changing Perspectives — Reconceptualizing Sensorimotor Experiences: Papers from the 2014 AAAI Fall Symposium
Integration of Inference and Machine Learning as a Tool for Creative Reasoning
Bartłomiej Śnieżyński
AGH University of Science and Technology
al. Mickiewicza 30
30-059 Krakow, Poland
e-mail: bartlomiej.sniezynski@agh.edu.pl
Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract
In this paper a method to integrate inference and machine learning is proposed. Execution of a learning algorithm is defined as a complex inference rule, which generates intrinsically new knowledge. Such a solution makes the reasoning process more creative and allows the agent's experiences to be re-conceptualized depending on the context. Knowledge representation used in the model is based on the Logic of Plausible Reasoning (LPR). Three groups of knowledge transmutations are defined: search transmutations that look for information in data, inference transmutations that are formalized as LPR proof rules, and complex ones that can use machine learning algorithms or knowledge representation change operators. All groups can be used by the inference engine in a similar manner. In the paper an appropriate system model and inference algorithm are proposed. Additionally, preliminary experimental results are presented.
Introduction

Traditional reasoning techniques applied in AI offer a convergent interpretation of the stored knowledge, which does not provide new knowledge. Machine learning techniques may be creative and provide diversity, but they are not integrated with the inference process. In this paper a method to integrate these two approaches is proposed. Execution of a learning algorithm is defined as a complex inference rule, which produces new knowledge. Such a solution makes it possible to re-conceptualize the agent's experiences depending on the context: the perspective in which stored data is analyzed can be changed, and intrinsically new knowledge is generated.

The proposed solution is formulated as a Multistrategy Inference and Learning System (MILS). The idea is based on the Inferential Theory of Learning (Michalski 1994), in which learning and inference can be presented as a goal-guided exploration of the knowledge space using operators called knowledge transmutations. The Logic of Plausible Reasoning (LPR) (Collins and Michalski 1989) is used as a base for knowledge representation. However, it is possible to apply this approach using other knowledge representation techniques that are based on logic.

MILS combines many knowledge manipulation techniques during reasoning. It is able to use background knowledge, simple proof rules (such as generalization or modus ponens) or complex patterns (machine learning algorithms) to produce information that was not stored explicitly in the knowledge base.

In the following sections related research is discussed, and the MILS model and inference algorithm are presented. Next, preliminary results of experiments are described: the knowledge base and three use cases.
Related research
LPR was proposed by Alan Collins and Richard Michalski, who in 1989 published the article "The Logic of Plausible Reasoning: A Core Theory" (Collins and Michalski 1989). The aim of this study was to identify the patterns of reasoning used by humans and to create a formal system able to represent these patterns. The basic operations performed on knowledge defined in LPR include:

• abduction and deduction – used to explain and predict the characteristics of objects based on domain knowledge;
• generalization and specialization – allow information to be generalized or refined by enlarging or shrinking the set of objects to which it relates;
• abstraction and concretization – change the level of detail in the description of objects;
• similarity and contrast – allow inference by analogy or by lack of similarity between objects.

Experimental results confirming that the methods of reasoning used by humans can be represented in LPR are presented in subsequent papers (Boehm-Davis, Dontas, and Michalski 1990a; 1990b). The objective set by its creators makes LPR significantly different from other knowledge representation methods, such as classical logic, fuzzy logic, multi-valued logic, Dempster-Shafer theory, probabilistic logic, Bayesian networks, semantic networks, ontologies, rough sets, or default logic. Firstly, LPR has many inference rules that are not present in the formalisms mentioned above. Secondly, many parameters are specified for representing the uncertainty of knowledge.
On the basis of LPR, the DIH (Dynamically Interlaced Hierarchies) formalism was developed (Hieb and Michalski 1993b; 1993a). In DIH, knowledge consists of a static part, represented by hierarchies, and a dynamic part made up of traces, which play a role similar to statements in LPR. DIH distinguishes three types of hierarchies: types, components and priorities. The latter type can be divided into subclasses: hierarchies of measures (used to represent physical quantities), hierarchies of quantification (allowing quantifiers such as one, most, or all to be included in traces) and hierarchies of schemes (used to define multi-argument relationships and needed to interpret the traces).

ITL was formulated soon after DIH was developed (Michalski 1994). Michalski et al. also developed an ITL implementation, the INTERLACE system (Alkharouf and Michalski 1996). This system is based on DIH and can generate sequences of knowledge operations that enable the derivation of a target trace from the input hierarchies and traces. However, not all kinds of hierarchies were included there, nor were the probabilities and factors describing the uncertainty of the information. Rule induction was also not taken into account.
Outline of the logic of plausible reasoning

MILS is based on LPR, formalized as a labeled deductive system (LDS) (Gabbay 1991). If needed, another knowledge representation that can be formulated using LDS may be used instead of LPR.

The language consists of a finite set of constant symbols C, five relational symbols and the logical connectives →, ∧. The relational symbols are V, H, B, S and E. They are used to represent statements (V), hierarchy (H, B), similarity (S) and dependency (E).

Statements are represented as object-attribute-value triples: V(o, a, v), where o, a, v ∈ C. This represents the fact that object o has an attribute a equal to v. If object o has several values of a, there should be several appropriate statements in the knowledge base. To represent vagueness of knowledge, it is possible to extend this definition and allow a composite value [v1, v2, ..., vn], a list of elements of C. It is interpreted as: object o has an attribute a equal to v1 or v2, ..., or vn.

The relation H(o1, o, c), where o1, o, c ∈ C, means that o1 is o in context c. Context is used to specify the range of inheritance: o1 and o have the same value for all attributes which depend on attribute c of object o.

To show that one object is below another in any hierarchy, the relation B(o1, o), where o1, o ∈ C, should be used.

The relation S(o1, o2, c) represents the fact that o1 is similar to o2, where o1, o2, c ∈ C. Context, as above, specifies the range of similarity: only those attributes of o1 and o2 that depend on c have the same values.

The dependency relation E(o1, a1, o2, a2), where o1, a1, o2, a2 ∈ C, means that the values of attribute a1 of object o1 depend on attribute a2 of the second object, o2.

In object-attribute-value triples, a value should be placed below its attribute in a hierarchy: if V(o, a, [v1, v2, ..., vn]) is in a knowledge base, there should also be H(vi, a, c) for every 1 ≤ i ≤ n, for some c ∈ C.
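To make this representation concrete, here is a minimal sketch (in Python; the class and variable names are illustrative, not taken from the paper) of how the five kinds of formulas and a small knowledge base could be encoded:

    from dataclasses import dataclass
    from typing import Tuple, Union

    Value = Union[str, Tuple[str, ...]]  # a single value or a composite value (v1, ..., vn)

    @dataclass(frozen=True)
    class V:   # statement: object o has attribute a equal to v
        o: str
        a: str
        v: Value

    @dataclass(frozen=True)
    class H:   # hierarchy: o1 is o in context c
        o1: str
        o: str
        c: str

    @dataclass(frozen=True)
    class B:   # o1 lies below o in some hierarchy
        o1: str
        o: str

    @dataclass(frozen=True)
    class S:   # o1 is similar to o2 in context c
        o1: str
        o2: str
        c: str

    @dataclass(frozen=True)
    class E:   # values of attribute a1 of o1 depend on attribute a2 of o2
        o1: str
        a1: str
        o2: str
        a2: str

    # a few facts from the example knowledge base used later in the paper
    kb = {
        V("germany", "gnp_chg", "slow_grow"),
        V("europe", "lit_rate", "high"),
        H("slovakia", "europe", "culture"),
    }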
Using the relational symbols, formulas of LPR can be defined. If o, o1, ..., on, a, a1, ..., an, v, c ∈ C and v1, ..., vn are lists of elements of C, then V(o, a, v), H(o1, o, c), B(o1, o), S(o1, o2, c), E(o1, a1, o2, a2) and V(o1, a1, v1) ∧ ... ∧ V(on, an, vn) → V(o, a, v) are LPR formulas.

The LPR language can be extended by adding a countable set of variables, which may be used in formulas instead of constant symbols.
To manage uncertainty, the following label algebra is used:

    A = (A, {f_ri}).    (1)

A is a set of labels which estimate the uncertainty of formulas. A labeled formula is a pair f : l, where f is a formula and l ∈ A is a label. A set of labeled formulas can be considered a knowledge base.

LPR inference patterns are defined as proof rules. Every proof rule r_i has a sequence of premises (of length p_ri) and a conclusion. {f_ri} is a set of functions used in proof rules to generate the label of a conclusion: for every proof rule r_i, an appropriate function f_ri : A^p_ri → A should be defined. For a rule r_i with premises p1 : l1, ..., pn : ln, the plausible label of its conclusion is calculated as f_ri(l1, ..., ln). Examples of definitions of label algebras can be found in (Śnieżyński 2001; 2002).
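As an illustration, a toy label algebra in this style (a hypothetical certainty-product algebra, not one of those defined in the papers cited above) could look as follows, with every f_ri simply multiplying the premise labels:

    from functools import reduce

    # labels are certainty factors in [0, 1]
    def f_mul(labels):
        """A single label function f_ri shared by every rule in this toy algebra."""
        return reduce(lambda x, y: x * y, labels, 1.0)

    # conclusion label for a rule whose premises carry labels 0.9, 1.0 and 0.8
    assert abs(f_mul([0.9, 1.0, 0.8]) - 0.72) < 1e-9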
There are five main types of proof rules: GEN, SPEC, SIM, TRAN and MP. They correspond to the following inference patterns: generalization, specialization, similarity transformation, transitivity of relations and modus ponens. Some transformations can be applied to different types of formulas; therefore, indexes are used to distinguish different versions of the rules. Formal definitions of these rules can be found in (Collins and Michalski 1989; Śnieżyński 2003).
MILS Model
MILS may be used to find an answer to a given hypothesis. The inference algorithm builds a proof, using knowledge transmutations to infer the answer; it may also find substitutions for variables appearing in the hypothesis. Three types of knowledge transmutations are defined in MILS:
• simple (LPR proof rules),
• complex (using complex computations, e.g. rule induction algorithms or clustering methods),
• search (database or web searching procedures).
A knowledge transmutation can be represented as a triple (p, c, a), where p is a (possibly empty) list of premises or preconditions, c is a consequence (a pattern of the formula(s) that can be generated), and a is an action (empty for simple transmutations) that should be executed to generate the consequence if the premises are true according to the knowledge base.

Every transmutation has an assigned cost, which should reflect its computational complexity and/or the other important resources it consumes (e.g. database access or search engine fees). Usually, simple transmutations have a low cost, search transmutations a moderate cost, and complex ones a high cost.
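Such a triple, extended with the cost discussed above, could be sketched as follows (Python; the field names and the run_aq placeholder are assumptions for illustration):

    from dataclasses import dataclass
    from typing import Callable, List, Optional

    def run_aq(kb):
        """Hypothetical wrapper around an AQ-style rule induction run (placeholder)."""
        return []

    @dataclass
    class Transmutation:
        premises: List[object]             # patterns that must hold in the KB (possibly empty)
        consequence: object                # pattern of the formula(s) that can be generated
        action: Optional[Callable] = None  # executed by complex/search transmutations; None for simple ones
        cost: float = 0.1                  # consulted by the A* search described below

    # a complex transmutation wrapping rule induction, with the high cost used in the experiments
    aq = Transmutation(premises=[], consequence="V(place, gov, ?)",
                       action=run_aq, cost=10.0)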
The input data is a set of labeled formulas KB (a knowledge base) and a hypothesis (question) represented by a formula ϕ, which should be proved from KB. If there exists a label l ∈ A such that ϕ : l can be inferred from KB, the appropriate inference chain is returned; otherwise the procedure exits with failure. The agent's experience and the context description should also be stored in KB as LPR formulas.

The MILS inference algorithm (see Algorithm 1) is an adaptation of the LPR proof algorithm (Śnieżyński 2003), with proof rules replaced by the more general knowledge transmutations. It is based on the AUTOLOGIC system developed by Morgan (Morgan 1985). To limit the number of nodes and to generate optimal inference chains, the A* algorithm (Hart, Nilsson, and Raphael 1968) is used.
The algorithm generates a tree T whose nodes (N) are labeled by sequences of formulas. Every edge of T is labeled either by a knowledge transmutation whose consequence can be unified with the first formula of the parent node, or by the term kb(l) if the first formula of the parent node can be unified with some ψ : l ∈ KB. The root s of T is labeled by [ϕ]. The goal is to generate a node labeled by the empty sequence of formulas.

As mentioned above, the A* algorithm is used to limit the number of nodes expanded. Nodes in the OPEN sequence are therefore ordered according to the values of the evaluation function f : N → R, defined as:

    f(n) = g(n) + h(n),    (2)

where g : N → R represents the actual cost of the inference chain, computed from the costs of the knowledge transmutations used and the label of ϕ that can be generated, and h : N → R is a heuristic function estimating the cost of the path from n to the goal node (e.g. the minimal knowledge transmutation cost multiplied by the length of n can be used).
Input: ϕ – a formula; KB – a finite set of labeled formulas
Output: if there exists l ∈ A such that ϕ : l can be inferred from KB: success and P, the inference chain of ϕ : l from KB; otherwise: failure

    T := tree with one node (root) s = [ϕ];
    OPEN := [s];
    while OPEN is not empty do
        n := the first element of OPEN;
        remove n from OPEN;
        if n = [] then
            generate proof P using the path from s to n;
            exit with success;
        end
        if the first formula of n represents an action then
            execute the action;
            if the action was successful then
                add the action's results to KB;
                E := nodes generated by removing the action formula from n;
            end
        else
            K := knowledge transmutations whose consequence can be unified with the first formula of n;
            E := nodes generated by replacing the first formula of n with the premises and action of the transmutations from K, applying the substitutions from the unifiers generated in the previous step;
            if the first formula of n can be unified with an element of KB then
                add to E the node obtained from n by removing the first formula and applying the substitutions from the unifier;
            end
        end
        remove from E the nodes generating loops;
        append E to T, connecting its nodes to n;
        insert the nodes from E into OPEN;
    end
    exit with failure;

Algorithm 1: MILS inference algorithm
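A minimal sketch of this node ordering (assuming, for illustration, that a node exposes the remaining formulas and the accumulated cost g; the 0.05 constant anticipates the minimal transmutation cost used in the experiments below):

    import heapq, itertools
    from dataclasses import dataclass
    from typing import List

    MIN_COST = 0.05           # minimal transmutation cost in the experimental setup
    _tie = itertools.count()  # tie-breaker so equal-f nodes are never compared directly

    @dataclass
    class Node:
        formulas: List[str]   # formulas still to be proved
        g: float = 0.0        # cost accumulated along the inference chain so far

    def h(node: Node) -> float:
        # heuristic: minimal transmutation cost times the number of remaining formulas
        return MIN_COST * len(node.formulas)

    def push(open_list, node: Node):
        heapq.heappush(open_list, (node.g + h(node), next(_tie), node))

    def pop_best(open_list) -> Node:
        _, _, node = heapq.heappop(open_list)
        return node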
All formulas in the proof path may be forgotten when a new task is executed. It is also possible to keep them in a cache knowledge base, together with a counter indicating the number of proofs in which a given formula is used. Using a formula in a proof increases its counter; not using it decreases the counter. When the counter reaches 0, the formula is removed from this temporary knowledge base.
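The bookkeeping described here could be sketched as follows (hypothetical structure and method names):

    class FormulaCache:
        """Temporary KB keeping inferred formulas with usage counters."""

        def __init__(self):
            self.counters = {}

        def remember(self, formula):
            self.counters.setdefault(formula, 1)

        def used_in_proof(self, formula):
            if formula in self.counters:
                self.counters[formula] += 1

        def not_used(self, formula):
            if formula in self.counters:
                self.counters[formula] -= 1
                if self.counters[formula] <= 0:
                    del self.counters[formula]  # forget formulas no proof relies on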
Preliminary Experimental Results
In this section the experimental implementation of MILS is described and some examples of inference chains are presented.
In the current version of the software, only one complex and several simple knowledge transmutations are implemented: GEN_o, SPEC_o, SIM_o, GEN_v, SPEC_v, SIM_v, SPEC_E, SIM_E, SPEC_o→, H_B, TRAN_B and AQ, where the last is a rule induction transmutation based on Michalski's AQ algorithm (Michalski 1973). Other rule induction algorithms, such as C4.5 (Quinlan 1993), may also be used.
The label algebra used is very simple. Every formula is labeled with a single value from the range [0, 1] representing its certainty or strength. Only hierarchy formulas are labeled with a pair of such values: the first is used in generalizations, the second in specializations. To calculate the label of a conclusion, the labels of the premises are multiplied. The cost of the MP transmutation is 0.2, the cost of SPEC_o→ is 0.3, and the costs of H_B, TRAN_B and TRAN_P are 0.05. The remaining simple transmutations have cost 0.1. The complex transmutation has the highest cost: 10.
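For instance, the label of the first answer in Table 2 below can be reproduced by multiplying the premise labels, reading the first element of each hierarchy-label pair because GEN_v performs a generalization:

    def conclusion_label(premise_labels):
        # in this algebra the conclusion label is the product of the premise labels
        result = 1.0
        for l in premise_labels:
            result *= l
        return result

    # first answer in Table 2: hPL(1.0, 0.3) -> 1.0, bPL(1) -> 1.0, ePL(1.0) -> 1.0,
    # hPL(1.0, 0.1) -> 1.0 and the stored statement vPL(0.9) -> 0.9
    assert conclusion_label([1.0, 1.0, 1.0, 1.0, 0.9]) == 0.9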
The domain on which the system was tested is similar to the one used to test LPR (Boehm-Davis, Dontas, and Michalski 1990b). It represents a part of an agent's geographical knowledge; some facts are uncertain or missing, and some others are not true. The hierarchy of objects used is presented in Figure 1; the statements are presented in Table 1.
Figure 1: Hierarchy of objects
Table 1: Statements

Place        Literacy  GNP change     GNP per capita  Government type
Europe       high      slow decrease  high            democracy
Germany      high      slow growth    medium          democracy
Poland       high      stable         medium          democracy
Albania      medium    fast growth    low             communism
China        low       slow decrease  low             communism
North Korea  medium    slow growth    low             -
x            medium    -              -               -
The following three questions were asked to show how the system is able to infer plausible knowledge:
1. Is GNP growing in Germany? – v(germany, gnp_chg, gnp_grow),
2. What is the literacy rate in Slovakia? – v(slovakia, lit_rate, X),
3. Does some unknown country x have a communist government? – v(x, gov, com).
Answers returned by the system are presented in Table 2. Each answer has two parts: an inferred answer to the question and the inference chain used to derive it, in the form of a proof tree. A proof tree has the following form:

    p(ϕ, l, r, P),    (3)
where ϕ : l is a formula proved using rule r, and P is a list of proof trees representing the inference of the premises of r. If ϕ : l was taken from the knowledge base, then P is empty and r = kb(l).
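Read as a data structure, (3) corresponds to a simple recursive record (a sketch with assumed field names):

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Proof:
        formula: str             # the formula phi proved at this node
        label: float             # its plausible label l
        rule: str                # proof rule r, or "kb" when taken from the knowledge base
        premises: List["Proof"]  # proofs of the premises of r; empty for KB facts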
To answer the first question, MILS used the simple knowledge transmutation GEN_v, which corresponds to abstraction. The knowledge base stores the statement V(germany, gnp_chg, slow_grow); using GEN_v, slow_grow is replaced by the more general value gnp_grow.
The second question is answered using the SPEC_o inference rule, which corresponds to specialization. Knowing that the literacy rate in Europe is high (statement V(europe, lit_rate, high)), that Slovakia is a typical European country in the context of culture (H(slovakia, europe, culture)), and that literacy rate depends on the culture context, we can specialize the object and replace europe with slovakia.

It is not possible to answer the third question using the knowledge base and simple inference rules alone. Therefore the complex AQ transmutation is chosen by the inference algorithm. As a result, it produces implication formulas whose consequences match the statement V(place, gov, com). Training data was prepared using the statements describing all the countries: every country was a separate example, and the class attribute was gov. One such formula was generated:

    V(place, lit_rate, [low, medium]) ∧ V(place, gnp_chg, [fast_grow, slow_decr, slow_grow]) → V(place, gov, com)

Next, this implication was specialized using SPEC_o→, which replaced place with x. The result was derived by modus ponens, using the information about x from the knowledge base.

The last example clearly demonstrates the difference between MILS and traditional inference engines: without the learned inference rule, the answer could not be found.
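To illustrate the flavor of this step, the following rough sketch derives a covering rule with internal disjunctions from examples like those in Table 1. It is an illustration of the idea, not the AQ implementation used in the experiments (AQ additionally generalizes each selector as far as the negative examples allow, which is how slow_grow enters the GNP condition of the rule above):

    def covering_rule(examples, target_attr, target_value, attrs):
        """Build one rule with internal disjunctions covering all positive examples."""
        positives = [e for e in examples if e.get(target_attr) == target_value]
        negatives = [e for e in examples if e.get(target_attr) not in (None, target_value)]
        # start from the attribute values actually seen among the positives
        rule = {a: {e[a] for e in positives} for a in attrs}
        # the rule is acceptable only if no negative example satisfies every selector
        consistent = not any(all(n[a] in rule[a] for a in attrs) for n in negatives)
        return rule if consistent else None

    countries = [
        {"lit_rate": "high",   "gnp_chg": "slow_grow", "gov": "dem"},  # Germany
        {"lit_rate": "high",   "gnp_chg": "stable",    "gov": "dem"},  # Poland
        {"lit_rate": "medium", "gnp_chg": "fast_grow", "gov": "com"},  # Albania
        {"lit_rate": "low",    "gnp_chg": "slow_decr", "gov": "com"},  # China
    ]
    print(covering_rule(countries, "gov", "com", ["lit_rate", "gnp_chg"]))
    # -> {'lit_rate': {'medium', 'low'}, 'gnp_chg': {'fast_grow', 'slow_decr'}} (set order may vary)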
1. v(germany, gnp_chg, gnp_grow)
p(v(germany, gnp_chg, gnp_grow), vPL(0.9), genv, [
p(h(slow_grow, gnp_grow, all), hPL(1.0, 0.3), kb, []),
p(b(gnp_grow, gnp_chg), bPL(1), kb, []),
p(e(gnp_chg, europe, gnp_chg, all), ePL(1.0), kb, []),
p(h(germany, europe, all), hPL(1.0, 0.1), kb, []),
p(v(germany, gnp_chg, slow_grow), vPL(0.9), kb, [])
])
2. v(slovakia, lit_rate, X)
p(v(slovakia, lit_rate, high), vPL(0.9), speco, [
p(h(slovakia, europe, culture), hPL(1.0, 0.01), kb, []),
p(e(europe, lit_rate, europe, culture), ePL(1), spece, [
p(h(europe, place, all), hPL(1.0, 1.0), kb, []),
p(e(place, lit_rate, place, all), ePL(1.0), kb, []),
p(e(place, lit_rate, place, all), ePL(1.0), kb, [])
]),
p(v(europe, lit_rate, high), vPL(0.9), kb, [])
])
3. v(x, gov, com)
p(v(x, gov, com), vPL(0.8), aq, [
p(aq(x, gov, com), vPL(_), kb, []),
p(v(x, gov, com), vPL(0.8), modusponens, [
p(impl(v(x, gov, com), [v(x, lit_rate, [low, medium]),
v(x, gnp_chg, [fast_grow, slow_decr, slow_grow])]),
iPL(1), speci, [
p(h(x, place, all), hPL(1.0, 0.01), kb, []),
p(impl(v(place, gov, com), [v(place, lit_rate, [low, medium]),
v(place, gnp_chg, [fast_grow, slow_decr, slow_grow])]),
iPL(1), aq, [])
]),
p(v(x, lit_rate, [low, medium]), vPL(0.8), kb, []),
p(v(x, gnp_chg, [fast_grow, slow_decr, slow_grow]), vPL(0.9), kb, [])
])
])
Table 2: Questions and answers returned by MILS
Conclusions and Further Works
The Multistrategy Inference and Learning System (MILS) can be considered an inference system that manages knowledge in a multistrategy fashion, similar to the way human beings do. It combines search, inference and machine learning capabilities, all performed in a uniform way. As a result, the reasoning process is more creative than in classical AI models. Depending on the agent's experience and the context, different knowledge may be discovered from the stored statements.
Further work will concern enriching the software's capabilities by extending the range of implemented knowledge transmutations. For example, the addition of a clustering algorithm is planned; it will be used to derive similarity formulas.

Testing the system in other (more realistic) domains is also considered. One candidate is a data analysis system being developed for the Polish Government Protection Bureau, in which MILS would be applied in the decision support component. Application of MILS in robotics also seems to be a good research direction.

Acknowledgments

The research reported in this paper was supported by the grant "Information management and decision support system for the Government Protection Bureau" (No. DOBR-BIO4/060/13423/2013) from the Polish National Center for Research and Development.
References
Alkharouf, N. W., and Michalski, R. S. 1996. Multistrategy
task-adaptive learning using dynamically interlaced hierarchies. In Michalski, R. S., and Wnek, J., eds., Proceedings
of the Third International Workshop on Multistrategy Learning.
Boehm-Davis, D.; Dontas, K.; and Michalski, R. S. 1990a.
Plausible reasoning: An outline of theory and the validation
of its structural properties. In Intelligent Systems: State of
the Art and Future Directions. North Holland.
Boehm-Davis, D.; Dontas, K.; and Michalski, R. S. 1990b.
A validation and exploration of the Collins-Michalski theory of plausible reasoning. Technical report, George Mason
University.
Collins, A., and Michalski, R. S. 1989. The logic of plausible reasoning: A core theory. Cognitive Science 13:1–49.
Gabbay, D. M. 1991. LDS – Labeled Deductive Systems.
Oxford University Press.
Hart, P.; Nilsson, N. J.; and Raphael, B. 1968. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Systems Science and Cybernetics 4(2):100–107.
Hieb, M. R., and Michalski, R. S. 1993a. A knowledge
representation system based on dynamically interlaced hierarchies: Basic ideas and examples. Technical report, George
Mason University.
Hieb, M. R., and Michalski, R. S. 1993b. Multitype inference in multistrategy task-adaptive learning: Dynamic interlaced hierarchies. Technical report, George Mason University.
Michalski, R. S. 1973. AQVAL/1 – computer implementation of a variable valued logic VL1 and examples of its
application to pattern recognition. In Proc. of the First International Joint Conference on Pattern Recognition.
Michalski, R. S. 1994. Inferential theory of learning: Developing foundations for multistrategy learning. In Michalski, R. S., ed., Machine Learning: A Multistrategy Approach,
Volume IV. Morgan Kaufmann Publishers.
Morgan, C. G. 1985. Autologic. Logique et Analyse 28
(110-111):257–282.
Quinlan, J. 1993. C4.5: Programs for Machine Learning.
Morgan Kaufmann.
Śnieżyński, B. 2001. Verification of the logic of plausible reasoning. In Kłopotek, M., et al., eds., Intelligent Information Systems 2001, Advances in Soft Computing. Physica-Verlag, Springer.
Śnieżyński, B. 2002. Probabilistic label algebra for the logic of plausible reasoning. In Kłopotek, M., et al., eds., Intelligent Information Systems 2002, Advances in Soft Computing. Physica-Verlag, Springer.
Śnieżyński, B. 2003. Proof searching algorithm for the logic of plausible reasoning. In Kłopotek, M., et al., eds., Intelligent Information Processing and Web Mining, Advances in Soft Computing, 393–398. Springer.