Abduction using Neural Models

EEL 6876
Intelligent Diagnostics
Abduction using Neural Models
By Madan Bharadwaj
Dr. Avelino Gonzalez
Addressing abductive reasoning effectively remains a formidable challenge to researchers the world
over. Most of the work on abduction has been done using a logical or a probabilistic approach.
In this paper, I review a few papers that discuss abduction in a neural framework. The networks
proposed cover almost the complete range of abduction problems and provide elegant solutions,
which can be interpreted as similar to abduction performed by human beings.
With the rising prominence of abduction as a vital form of reasoning in AI, many researchers
have attempted to address abductive reasoning in a logical or a probabilistic framework. Some,
however, have considered using neural or connectionist models instead. The first works using
this approach date back to the late 1980s [3], when ‘connectionist’ ideas were floated, and were
followed by more developed neural models [4]. The following sections review some of the more
recent ideas, which have been shown experimentally to provide a framework for applying
abduction to real-world problems.
It is rather surprising that abduction was recognized as a prime form of human reasoning only
recently, given that most of what we know can be thought of as knowledge abduced from real-life
observations. To put it in an informal definition, abduction is the process of evolving a
hypothesis from a set of observations. An elementary explanation of abduction can be found in
Neural networks, on the other hand, are a class of algorithms that are taught to differentiate
between similar and dissimilar data using a training data set, and are then used to do the same
on real-world data. Though the connection between the two is not immediately obvious, they can
be interpreted as doing the same thing.
Let us consider an example to illustrate the point made in the previous section. An easily
understood and widely used example for demonstrating the application of neural networks is the
‘character recognition’ task. A neural network is popularly used to recognize
handwritten characters so that it can be fed automatically without having to perform manual
For example, if we want our network to recognize the A’s and B’s of the English alphabet, we
train the network with a collection of handwritten A’s and B’s and ‘teach’ it to recognize them
as two different classes of data. Figure 1 shows a sample training set of 5 A’s and 5 B’s (say,
collected from your rough book) that can be used to train a neural network for this task. Once
trained, the network implicitly creates classes to represent A’s and B’s. It can then be tested
on real data from a real data source (an ‘A’ picked out from your handwritten class notes) to
elicit a response. The neural network has the inherent capability to generalize from specific
instances and hence will be able to recognize your ‘A’ once it sees it.
Figure 1: Handwritten Characters. A’s and B’s
Figure 2: After training the Neural Network classifies data into classes
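The train-then-generalize behaviour described above can be sketched with a single-layer perceptron. This is only a minimal illustration under stated assumptions: each character is reduced to a made-up two-number feature vector rather than a real image, and the function names `train_perceptron` and `classify` are hypothetical, not taken from any of the reviewed papers.

```python
# Toy stand-in for the A/B recognition task. The 2-number feature vectors are
# invented for illustration; real handwriting would need pixel data.

def train_perceptron(samples, labels, epochs=100, lr=0.1):
    """Train a single-layer perceptron; labels are +1 ('A') or -1 ('B')."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            predicted = 1 if activation >= 0 else -1
            if predicted != y:
                # Move the decision boundary towards the misclassified sample.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return w, b


def classify(w, b, x):
    """Return the class label the trained perceptron assigns to x."""
    return "A" if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else "B"


# Five "A" samples and five "B" samples, mirroring the training set of Figure 1.
training_a = [(0.9, 0.2), (0.8, 0.1), (1.0, 0.3), (0.7, 0.2), (0.9, 0.0)]
training_b = [(0.1, 0.9), (0.2, 0.8), (0.0, 1.0), (0.3, 0.7), (0.1, 0.8)]
samples = training_a + training_b
labels = [1] * len(training_a) + [-1] * len(training_b)

w, b = train_perceptron(samples, labels)
```

Calling `classify(w, b, (0.85, 0.15))` on a sample that was not in the training set returns the class of the training samples it most resembles, which is the generalization property the text refers to.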
Returning to our original discussion, abduction can be seen as generalizing from a set of
observations to synthesize a hypothesis that explains the observations. In reality, we have a
collection of hypotheses and our abduction algorithm is expected to pick one among them, the
one which best explains the observed data as it understands it. The observations can be
intuitively linked to the handwritten A’s and B’s, and the collection of hypotheses to the
classes that the neural network creates to partition the data.
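Picking the hypothesis that best explains the observed data can be sketched in a few lines. The sketch below assumes that "best" simply means "explains the largest number of observed facts"; the hypothesis and observation names are invented for illustration.

```python
def best_hypothesis(explains, observed):
    """Return the hypothesis that accounts for the most observed facts.

    explains: dict mapping each hypothesis to the set of observations it explains
    observed: set of observations actually made
    """
    return max(explains, key=lambda h: len(explains[h] & observed))


# Hypothetical medical toy example (not from the reviewed papers).
explains = {
    "flu": {"fever", "cough"},
    "cold": {"cough"},
    "allergy": {"sneezing"},
}
observed = {"fever", "cough"}

chosen = best_hypothesis(explains, observed)
```

Here `chosen` is `"flu"`, since it covers both observations while the other candidates cover one or none. Ties are broken by dictionary order, which a real system would want to replace with a belief measure.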
Through this simple explanation we can make an association in our minds between the properties
of abduction and the functioning of neural networks. With this introduction let’s probe into the
more concrete ideas that discuss implementing the concept.
There are two basic approaches to designing a neural network architecture.
1. Designing neural network architecture to reflect the problem dynamics
2. Designing an Energy function to represent the same
The first paper [1] we are going to discuss follows the first approach and portrays a simple
architecture that covers most domains of Abduction providing an elegant solution to our
problem. The second paper [2] proposes the latter approach and attempts to minimize an error
function to attain the same end.
The authors of [1] call their network UNIFY and, true to its name, it is a unified model for
all classes of abduction problems, as opposed to the model of [2], which addresses only linear
and monotonic abduction problems. The network is a simple 3-layered architecture built
specifically to implement abduction. It hence bears a unique physical structure, unlike most
other neural network models, which are used for more varied purposes. The model works on the
basis of competition between cells (or nodes, as they are more commonly called in the neural
network literature) to encode the observations and maximize the fitting hypotheses. The
algorithm is discussed incrementally: the simple version is introduced first for the sake of
better understanding, and minor modifications are then introduced to accommodate the other
classes of abduction problems. The modifications are general and do not affect the classes of
abduction treated before the modification.
Before any further discussion we need to understand the different types of abduction problems
and what they mean, in order to understand their ramifications for the network architecture. To
define them rather briefly, four major types of abduction problems are encountered in the
literature. They are
1. Independent Abduction Problems
2. Monotonic Abduction Problems
3. Open Abduction Problems
4. Incompatible Abduction Problems
An abduction problem is defined as AP (H, O, R), where H denotes the set of elementary
hypotheses, O denotes the set of observable facts and R denotes a mapping from H to O.
If each causal relationship can be expressed in the form $h \rightarrow o$, where $h \in H$ and
$o \in O$, that is, if one and only one elementary hypothesis is needed to explain one
observation, then the problem is Independent.
If there is at least one causal relationship which can be expressed as
$h_1 \wedge h_2 \wedge \dots \wedge h_n \rightarrow o$ with each $h_i \in H$, meaning that o
needs to be derived from a conjunction of more than one hypothesis, then the problem is
Monotonic.
For an observation uplet¹ (OP, OA, OU), if there is at least one instance from the observation
set that is observed to be absent and one instance that is explicitly stated as unknown, then
the problem is Open.
For an observation uplet¹ (OP, OA, OU), if we have at least one causal relationship which is
expressed as $h_1 \wedge \dots \wedge h_n \rightarrow o$ with each $h_i \in H$ and $o \in OA$,
then the problem is Incompatible. In other words, if at least one observation is observed to be
absent but can be derived from the cardinal set of elementary hypotheses synthesized from the
other observations, then the problem belongs to the Incompatible class of abduction problems.
The explanations provided above are too brief to give the clearest picture of the abduction
classes. I urge the reader to refer to pages 2-3 of [1], where they are explained more clearly
and elaborately with an example.
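As a rough aid to reading the four definitions, the following sketch encodes them as a small Python check. It follows the informal wording of this review rather than the formal definitions in [1], so it should be read as an assumption-laden illustration: the rule representation (a set of hypotheses paired with one observation), the function name `classify_problem` and the label strings are all my own.

```python
def classify_problem(rules, present, absent=frozenset(), unknown=frozenset()):
    """Label an abduction problem using the informal definitions above.

    rules:   list of (frozenset_of_hypotheses, observation) causal relationships
    present, absent, unknown: the OP, OA and OU parts of the observation uplet
    """
    labels = set()

    # Independent: every observation needs exactly one elementary hypothesis.
    # Monotonic: at least one observation needs a conjunction of hypotheses.
    if all(len(causes) == 1 for causes, _ in rules):
        labels.add("independent")
    else:
        labels.add("monotonic")

    # Open: something is observed absent and something is stated as unknown.
    if absent and unknown:
        labels.add("open")

    # Incompatible: an absent observation is derivable from hypotheses that
    # the present observations suggest.
    suggested = set()
    for causes, effect in rules:
        if effect in present:
            suggested |= causes
    if any(effect in absent and causes <= suggested for causes, effect in rules):
        labels.add("incompatible")

    return labels


simple_rules = [(frozenset({"h1"}), "o1"), (frozenset({"h2"}), "o2")]
conjunctive_rules = [(frozenset({"h1", "h2"}), "o1"), (frozenset({"h1"}), "o2")]

simple_labels = classify_problem(simple_rules, present={"o1", "o2"})
conflict_labels = classify_problem(conjunctive_rules, present={"o1"}, absent={"o2"})
```

In this encoding the labels are not mutually exclusive: the second example is both monotonic (o1 needs h1 and h2 together) and incompatible (h1 would also produce o2, which is observed to be absent).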
The paper describes a simplified initial model to guide the reader through the process of building
the comprehensive network from its smaller and simpler blocks. A pictorial representation of the
idea is depicted in Figure 3.
Figure 3: The Initial Model (figure labels: Hypothesis Layer, Observation Layer, Inhibitory
Weights, Excitatory Weights). Reprinted from [1]
The network comprises two layers, one consisting of all the observations and the other of all
the hypotheses. The two are connected by weights as described in the mapping provided
initially. Figure 4 illustrates the weights as described in the mapping rules.
Figure 4: An illustration to show weights in axioms
Intuitively, the weights represent the belief that we have in the presence of the hypothesis
given the observation. By altering the weights we can alter our belief or disbelief in our
hypotheses. Two types of weights are introduced in the architecture: excitatory weights and
inhibitory weights. An excitatory weight lies between a hypothesis and an observation, whereas
an inhibitory weight lies between two hypotheses. Inhibitory weights serve to discredit an
opposing hypothesis, weakening it and pushing it out of the competition process.
The network is initialized to its startup setting, where the observation cells are set to
values based on the belief in each observation: a 0.5 for an observation cell would mean a 50%
belief that the observation was observed. The excitatory weights are initialized to the values
given by the mapping rules, and those that are not mentioned are set to zero. With this
setting, an observation will not affect hypotheses that are not related to it. The
initialization of the inhibitory weights is not discussed; going by common sense, they must be
set to zero. The hypothesis cells take an initial value dictated by the equation…
and A is a small value, say 0.001.
competition between the hypotheses begins.
For every iteration, Stage II of the algorithm
described in Figure 5 is executed until the
termination condition is met or, in neural
network terminology, ‘until the network
When the termination condition is met, one or more hypotheses can be expected to have won the
competition; these comprise the final set of elementary hypotheses that fully explains the
observation set.
Figure 5: Step by Step explanation of the algorithm. [1]
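The startup setting described above can be sketched as follows. The observation beliefs, the mapping-rule weights and the zero inhibitory weights follow the description in this review; the initial value of the hypothesis cells is an assumption, since the initialization equation of [1] is not reproduced here, so the sketch simply starts every hypothesis cell at the small value A. The function name `initialize_network` and the dictionary layout are hypothetical.

```python
A = 0.001  # the "small value" mentioned in the text


def initialize_network(hypotheses, observation_beliefs, mapping_rules):
    """Build the startup state of the two-layer network.

    hypotheses:          list of hypothesis names
    observation_beliefs: dict observation -> belief in [0, 1] (0.5 = 50% belief)
    mapping_rules:       dict (hypothesis, observation) -> excitatory weight
    """
    observation_cells = dict(observation_beliefs)

    # Excitatory weights: taken from the mapping rules, zero when not mentioned,
    # so an observation cannot affect hypotheses unrelated to it.
    excitatory = {(h, o): 0.0 for h in hypotheses for o in observation_cells}
    for (h, o), weight in mapping_rules.items():
        excitatory[(h, o)] = weight

    # Inhibitory weights between pairs of hypotheses: assumed to start at zero.
    inhibitory = {(h1, h2): 0.0
                  for h1 in hypotheses for h2 in hypotheses if h1 != h2}

    # ASSUMPTION: placeholder start value; the paper uses its own equation.
    hypothesis_cells = {h: A for h in hypotheses}

    return observation_cells, excitatory, inhibitory, hypothesis_cells


obs, ex_w, ih_w, hyp = initialize_network(
    hypotheses=["h1", "h2"],
    observation_beliefs={"o1": 0.5, "o2": 1.0},
    mapping_rules={("h1", "o1"): 0.8},
)
```

After this call `ex_w[("h1", "o1")]` holds the rule weight 0.8 while `ex_w[("h2", "o1")]` stays at zero, so o1 cannot excite h2.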
The unique structure of the network can be explained, and its usefulness realized, only when we
analyze the equations that govern the dynamics of the network. Let us review these equations.
There are four major equations that govern the activities of the network.
The hypothesis cell updating equation is given by
where EXi represents the excitatory input to the hypothesis cell and IHi represents the
inhibitory input to the hypothesis cell. Note that the excitatory input is in the numerator of
the equation whereas the inhibitory input is in the denominator. Hence more excitation will
raise the value of the hypothesis cell and more inhibition will lower it, which translates into
better or poorer chances for the hypothesis to win the competition with the other hypotheses.
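The qualitative effect of placing excitation in the numerator and inhibition in the denominator can be seen in a small sketch. To be clear, this is not the update equation of [1]; it is a stand-in ratio chosen only because it has the same two properties (it grows with excitation and shrinks with inhibition). The function names and the weighted-sum form of the inputs are assumptions.

```python
def excitatory_input(weights, observation_cells):
    """Weighted sum of observation cell values (assumed form of EX_i)."""
    return sum(weights[o] * value for o, value in observation_cells.items())


def inhibitory_input(weights, rival_cells):
    """Weighted sum of rival hypothesis cell values (assumed form of IH_i)."""
    return sum(weights[h] * value for h, value in rival_cells.items())


def update_hypothesis(ex, ih):
    """Illustrative ratio update, NOT the rule from [1].

    Excitation sits in the numerator and inhibition in the denominator, so
    the result rises with ex, falls with ih, and stays in [0, 1) for
    non-negative inputs.
    """
    return ex / (1.0 + ex + ih)


ex = excitatory_input({"o1": 0.8, "o2": 0.0}, {"o1": 1.0, "o2": 0.5})
ih = inhibitory_input({"h2": 0.5}, {"h2": 1.0})

uninhibited = update_hypothesis(ex, 0.0)
inhibited = update_hypothesis(ex, ih)
```

With the same excitation, `inhibited` is smaller than `uninhibited`, which is the mechanism by which a strong rival pushes a hypothesis out of the competition.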
The excitatory weight updating equation is given by
Note that the weight is updated depending on the cell values of the hypothesis and observation
concerned and on the weight it presently holds.
The inhibitory weight equation is given by
This equation is interesting since it calculates the inhibitory weight based on the differences
between the previous and present cell values of the two hypothesis cells that it connects. If
one difference is negative and the other is positive, the weight will increase and hence weaken
the other hypothesis, effectively implementing the hypothesis negation implemented in a logical
To summarize the working of the network, one can say that the network structure ensures
real-valued inputs to hypotheses and observations and hence can portray real-life situations more
effectively than other approaches. The competition process may also be intuitively understood as
similar to the one that happens in our minds, where we weigh different theories to explain an
The authors have picked out a neat example to illustrate the concept. They have picked a simple
observation set and a mapping rule set to initialize the network and have come up with answers
that can be verified easily within a logical framework.
Figure 6: A graph showing the hypotheses cell values at various instances. Reprinted from [1]
Figure 6 shows a graph plotting the values of the different competing hypotheses at different
time intervals. One can immediately see how some hypothesis cells decay to zero while two
others survive until termination. In fact, one hypothesis comes out clearly as the winner and
another stabilizes at a value considerably greater than zero. The results show how the process
debunks false or incomplete hypotheses and promotes correct or appropriate ones. Another
immediate consequence is that it gives real-valued outputs, which can be construed as beliefs
in the resulting hypotheses. This is also representative of the results one can expect from
real-world abduction problems.
The initial model introduced in the last section provides the basis for the unified model
presented in this section. The unified model is needed because the initial model cannot explain
the more complicated monotonic, open and incompatible abduction problems. In particular, the
initial model cannot resolve conjunctions of hypotheses, whereas the unified model addresses
all problems, including the simple ones, effectively.
The unified model consists of three layers, in place of the two of the initial model, to
support conjunctions of hypotheses. The intermediate layer comes into play only if an
observation has to be explained by a conjunction of hypotheses. A node on the intermediate
layer, connected to the respective hypotheses, acts as a middleman channelling the flow of
information between the hypotheses and the observation. Using slightly modified equations, the
unified model addresses the problem effectively and elegantly. The nodes on the intermediate
layer act like a composite hypothesis representing the conjunction of all the associated
hypotheses. This way the model can be treated like the initial model in its basic framework,
with the changes needed to accommodate the intermediate layer implemented in the equations.
Figure 7: The Unified Model. Reprinted from [1]
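The structural idea of the intermediate layer can be sketched independently of the network dynamics. The sketch below only builds the wiring: a rule with a single hypothesis connects directly to its observation, while a rule with several hypotheses gets one intermediate node standing for their conjunction. The rule format, the function name `build_layers` and the `"&"` naming of intermediate nodes are assumptions made for illustration, not details from [1].

```python
def build_layers(rules):
    """Derive the wiring of a three-layer network from causal rules.

    rules: list of (tuple_of_hypotheses, observation)
    Returns (direct, intermediate) where
      direct       is a list of (hypothesis, observation) links, and
      intermediate maps a conjunction node name to (hypotheses, observation).
    """
    direct = []
    intermediate = {}
    for causes, observation in rules:
        if len(causes) == 1:
            # A single hypothesis explains the observation on its own.
            direct.append((causes[0], observation))
        else:
            # One intermediate node represents the whole conjunction.
            members = tuple(sorted(causes))
            intermediate["&".join(members)] = (members, observation)
    return direct, intermediate


rules = [(("h1",), "o1"), (("h3", "h2"), "o2")]
direct_links, conjunction_nodes = build_layers(rules)
```

For these rules the result contains one direct link from h1 to o1 and one intermediate node `"h2&h3"` that relays between h2, h3 and o2.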
Apart from this structural addition, there is one more major extension implemented to ensure
maximum coverage of the problem domain. To cover the class of Incompatible abduction problems,
the hypothesis cells that constitute an incompatibility are connected by two connections. These
act much like the inhibitory connections and, in a way, simply add to the inhibition of these
hypotheses to avoid incompatibility in the final solution set.
Let us now have a look at the equations that govern the network. The hypothesis cell equation
is now given by
Here ICi represents the connection weights of the incompatibility connections. Since they
inhibit the hypothesis, they appear in the denominator. Negative influences of absent
observations are realized by using the EX+ and EX- notations, where EX- represents the
observations that are observed to be absent. The cell values of the observation cells for these
observations will be negative, hence inhibiting the hypothesis.
The incompatibility weights are updated according to the equation
where g1(x) = 1 if x > 0 and 0 otherwise, and g2(x) = 1 if x is an exclusive hypothesis, in
other words the only hypothesis that explains a particular observation. The major benefit of
the equation is that it avoids discriminating against exclusive hypotheses but inhibits all
other hypotheses that are connected, hence providing a way out of the incompatibility condition.
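The two indicator functions are simple enough to write out. The sketch assumes that g2 returns 0 for a non-exclusive hypothesis (the text only states when it is 1), and that exclusivity is judged from a table of which hypothesis explains which observations; that table and its contents are hypothetical.

```python
def g1(x):
    """Step function: 1 if x is strictly positive, otherwise 0."""
    return 1 if x > 0 else 0


def g2(hypothesis, explains):
    """1 if `hypothesis` is the only one explaining some observation, else 0.

    explains: dict mapping each hypothesis to the set of observations it explains.
    The 'else 0' branch is an assumption; the text only defines the 1 case.
    """
    for observation in explains.get(hypothesis, set()):
        rivals = [h for h, covered in explains.items()
                  if h != hypothesis and observation in covered]
        if not rivals:
            return 1  # nobody else can explain this observation
    return 0


# Hypothetical mapping: h1 alone explains o1, while o2 is shared with h2.
explains = {"h1": {"o1", "o2"}, "h2": {"o2"}}
```

With this mapping `g2("h1", explains)` is 1 and `g2("h2", explains)` is 0, so an incompatibility between the two would be resolved against h2 rather than against the exclusive hypothesis h1.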
The algorithm was tested on various toy problems, and graphs have been plotted to give the
reader an intuitive understanding of the results. Further, the algorithm was tested on a
problem constructed from a real-world murder case that went to trial, by axiomatizing the
evidence and constructing the possible outcomes of the case as candidate hypotheses. The
example is very interesting and shows the network in a very good light.
Goel and Ramanujam put forward a more conventional approach in [2], one that has been time
tested and has well-known properties. They propose two approaches, one using the Hopfield model
and the other using ART². They have also clearly stated the limitations of their study and the
constraints of their approach.
The authors seem to prefer the energy function approach to the design of a neural network, with
analog internal operating parts, which they claim has inherent advantages over digital
circuitry. The idea is to develop a network that minimizes an energy function, finding the
minima of the function and hence the best hypothesis. To this end they perform some operations
on the data set, partitioning it into domains and using subnets of the proposed neural
architecture for each domain.
The paper starts by stating that the algorithm is implemented for linear and monotonic
abduction problems. The first step of the process is to formulate the abduction task as a
constrained optimization problem. This they achieve by representing the observations and
hypotheses in data sets with certain easily deduced relationships between them. Then they
partition the problem into as many sets as possible, or in other words create a Power Set³ for
the observation set and the hypotheses set, and create a map between the two.
They then define the characteristics on which the classifier chooses a hypothesis as the best
among its competitors. They broadly define these characteristics as 1. Maximal Coverage,
2. Maximum Belief and 3. Minimal Cardinality. They then take into account the connections or
associations that can be allowed between elementary hypotheses.
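The three selection characteristics can be illustrated with a brute-force search over the power set of hypotheses. This is not the neural procedure of [2]; it is a plain enumeration meant only to show what the criteria select. How the criteria are combined is an assumption here: candidates are ranked first by coverage, then by joint belief (taken as the product of individual beliefs), then by smaller cardinality. All names and numbers are invented.

```python
from itertools import combinations
from math import prod


def best_composite(explains, belief, observed):
    """Pick the composite hypothesis that ranks best on the three criteria.

    explains: dict hypothesis -> set of observations it explains
    belief:   dict hypothesis -> belief in [0, 1]
    observed: set of observations to be explained
    """
    best_key, best_subset = None, None
    names = sorted(explains)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            covered = set().union(*(explains[h] for h in subset)) & observed
            # Lexicographic ranking: coverage, then joint belief, then fewer members.
            key = (len(covered), prod(belief[h] for h in subset), -size)
            if best_key is None or key > best_key:
                best_key, best_subset = key, subset
    return best_subset


explains = {"h1": {"o1"}, "h2": {"o2"}, "h3": {"o1", "o2"}}
observed = {"o1", "o2"}

pair_wins = best_composite(explains, {"h1": 0.9, "h2": 0.8, "h3": 0.70}, observed)
single_wins = best_composite(explains, {"h1": 0.9, "h2": 0.8, "h3": 0.75}, observed)
```

Both h3 alone and the pair (h1, h2) cover every observation, so belief decides: the pair wins when its joint belief of 0.72 exceeds that of h3, and h3 wins once its belief rises to 0.75. A different weighting of the criteria would give different answers, which is one reason an energy function that encodes the trade-off explicitly is attractive.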
With this framework in hand, they approach the problem with a Hopfield neural model whose
neurons minimize an energy function given by
where the W’s are the connection weights and Vi and Vj are the outputs of the neurons. By
minimizing this function, the network settles on the best hypotheses.
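For readers unfamiliar with Hopfield networks, the following sketch shows energy minimization in its textbook discrete form, with energy E = -1/2 Σ_i Σ_j W_ij V_i V_j, symmetric weights, a zero diagonal and outputs of +1 or -1. This is the standard formulation and is assumed here for illustration; the energy function actually used in [2] is analog and may contain additional constraint terms. The weight matrix below is arbitrary.

```python
def energy(W, V):
    """Standard Hopfield energy: -1/2 * sum_i sum_j W[i][j] * V[i] * V[j]."""
    n = len(V)
    return -0.5 * sum(W[i][j] * V[i] * V[j] for i in range(n) for j in range(n))


def settle(W, V, max_sweeps=50):
    """Asynchronously update neurons until no output changes.

    With symmetric W and a zero diagonal, each single-neuron update can only
    keep the energy the same or lower it, so the loop ends in a local minimum.
    """
    V = list(V)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(V)):
            net = sum(W[i][j] * V[j] for j in range(len(V)))
            new_value = 1 if net >= 0 else -1
            if new_value != V[i]:
                V[i] = new_value
                changed = True
        if not changed:
            break
    return V


# Arbitrary symmetric weights: neurons 0 and 1 support each other, both oppose 2.
W = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
start = [1, -1, 1]
final = settle(W, start)
```

Starting from `[1, -1, 1]` with energy 1.0, the network settles at `[-1, -1, 1]` with energy -3.0. In the abduction setting the neuron outputs would encode which hypotheses are selected, and the weights would encode the coverage, belief and cardinality criteria.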
The authors have also proposed an ART model for the same purpose, where they split the domain
into an observations domain and a hypotheses domain. A composite hypotheses set is synthesized
from the local decisions taken by the observation neurons and the hypotheses neurons and from
the information exchanged between the two domains.
I have made only a brief review of these models because they do not address all classes of
abduction problem and hence their significance is greatly truncated.
From this essentially incomplete review of the literature and my other readings, I can see that
more and more study is being done on developing models for abductive reasoning. Non-monotonic
reasoning and reasoning under uncertainty are active research areas now. In my understanding,
the framework for this kind of reasoning has to be, in essence, fuzzy. A large part of
real-world explanation in the human reasoning process is not purely axiomatized: evidence for a
particular occurrence is not always 100% true but can be expressed with partial confidence, and
the results of one hypothesis are evidence for other hypotheses in real life. It is therefore
necessary to maintain a fuzzy nature in the framework under which we discuss abduction. This
property is inherently present in neural networks, which can hence prove to be an effective
framework for abductive reasoning.
The downside of the approach is that the functioning of a neural network is still very much
abstract and has not been completely understood by researchers. This promotes a reluctance to
invest heavily in techniques that could backfire later and force researchers to redo and
reprove all their work using a different approach. Hence the credit that neural networks may
deserve is lost to cynicism.
However, recent studies, such as deriving from a neural network a rule base that explains its
functioning [7], can help researchers shed these fears and embrace neural networks as a test
bed for their research.
The UNIFY paper [1] concedes that it still cannot solve the cancellation class of abduction
problems discussed in [8]. The authors have promised more research on that aspect, to develop
or modify UNIFY so that it solves all the classes of abduction problems.
Other neural frameworks can be considered for the purpose. I would think ART could provide a
far more elegant and flexible solution than the one described in [2]. Studying that avenue of
thought would be an exciting experience. Genetic algorithms could also be examined, to see if
they can provide any basis for solving these problems.
This paper reviewed a few interesting neural models proposed to solve abduction problems. The
UNIFY algorithm put forward by Ayeb et al. stands out for its simplicity, applicability and
domain coverage. Other approaches using Hopfield networks and ART were also briefly discussed.
The concluding sections critiqued the idea as a whole and provided some suggestions and
explanations behind the philosophy of using neural networks for abduction.
¹ An observation uplet (OP, OA, OU) is defined primarily for Multiple Observations, which are
applicable only to the Open and Incompatible classes of abduction problems. Here OP represents
the observations that are known to be present, OA represents the observations that are known to
be absent and OU represents the observations that are unknown.
² ART: Adaptive Resonance Theory
³ A Power Set is a set containing all possible subsets of a set
[1]. B. Ayeb, S. Wang and J. Ge, “A Unified Model for Abduction-Based Reasoning”, IEEE
Transactions on Systems, Man and Cybernetics – Part A: Systems and Humans, Vol. 28,
No. 4, July 1998.
[2]. A. K. Goel and J. Ramanujam, “A Neural Architecture for a Class of Abduction
Problems”, IEEE Transactions on Systems, Man and Cybernetics – Part B:
Cybernetics, Vol. 26, No. 6, December 1996.
[3]. _____, “A Connectionist Model for Diagnostic Problem Solving: Part II”, IEEE
Transactions on Systems, Man and Cybernetics, Vol. 19, pp. 285-289, 1989.
[4]. A. Goel, J. Ramanujam and P. Sadayappan, “Towards a ‘neural’ architecture of
abductive reasoning”, in Proc. 2nd Int. Conf. Neural Networks, 1988, pp. I-681-I-688.
[5]. D. Poole, A. Mackworth and R. Goebel, “Computational Intelligence: A Logical
Approach”, pp. 319-343, Oxford University Press, 1998.
[6]. C. Christodoulou and M. Georgiopoulos, “Applications of Neural Networks in
Electromagnetics”, Boston: Artech House, 2001.
[7]. J. L. Castro, C. J. Mantas and J. M. Benitez, “Interpretation of artificial neural networks
by means of fuzzy rules”, IEEE Transactions on Neural Networks, Vol. 13, No. 1,
pp. 101-116, Jan. 2002.
[8]. T. Bylander, D. Allemang, M. C. Tanner and J. R. Josephson, “The computational
complexity of abduction”, Artif. Intell., Vol. 49, pp. 25-60, 1991.