From: AAAI Technical Report FS-97-04. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.
Soft Computing Explains Heuristic Numerical Methods
in Data Processing and in Logic Programming
Hung T. Nguyen
Department of Mathematical Sciences
New Mexico State University
Las Cruces, NM 88003, USA
hunguyen@nmsu.edu
Vladik Kreinovich
Department of Computer Science
University of Texas at El Paso
El Paso, TX 79968, USA
vladik@cs.utep.edu
Bernadette Bouchon-Meunier
LIP6, UPMC, Case 169
4 place Jussieu
75252 Paris Cédex 05, France
bouchon@poleia.lip6.fr
Abstract
We show that fuzzy logic and other soft computing approaches explain and justify heuristic numerical methods used in data processing and in logic programming, in particular, M-methods in robust statistics, regularization techniques, metric fixed point theorems, etc.
Introduction
What is soft computing good for? Traditional viewpoint. Where are soft computing methods (fuzzy, neural, etc.) mostly used now? Let us take, as an example, control, which is one of the major success stories of soft computing (especially of fuzzy methods; see, e.g., (Klir and Yuan 1995)).
• In control, if we know the exact equations that describe the controlled system, and if we know the exact objective function of the control, then we can often apply the optimal control techniques developed in traditional (crisp) control theory and compute the optimal control. Even in these situations, we can, in principle, use soft computing methods instead: e.g., we can use simpler fuzzy control rules instead of (more complicated) traditional techniques. As a result, we may get a control that is much easier to compute but that is somewhat worse in quality.
• However, the major application of soft computing in control is to the situations when we only have partial knowledge about the controlled system and about the objective functions and in which, therefore, traditional optimal control theory is not directly applicable. Here is where all known success stories come from: utilities like washing machines or camcorders, car parking automation, and other applications all share one thing: they all have to operate in a partially known environment.
From this viewpoint, as we gain more and more knowledge about a system, a moment comes when we do not need to use soft computing techniques any longer: when we have accumulated enough knowledge, we will then be able to use traditional (crisp) techniques.
From this viewpoint, soft computing methods look like a (successful but still) intermediate step, "poor man's" data processing techniques, that need to be used only if we cannot apply "more optimal" traditional methods.
Another possible use of soft computing: it has a great potential for justifying heuristic methods. The above viewpoint summarizes the existing usage of soft computing techniques: currently, these methods are, indeed, mainly used only when we lack information. However, as we will try to show in this paper, the potential of soft computing techniques is much broader.
Indeed, let us assume that we have the complete information, and so, we can use some crisp data processing algorithms. Where do these algorithms come from? If several data processing algorithms are applicable, which of these algorithms should we choose? In a few cases, these algorithms have a profound mathematical justification, but in most cases, these methods are heuristic in the sense that their justification comes from informal arguments rather than from formal proofs.
Now, "informal" means formulated in a natural language, not in the language of mathematics. So, to justify these methods, we must formalize this natural-language description in precise terms. This is exactly what soft computing (especially fuzzy logic) is doing.
So, soft computing has a great potential in justifying heuristic methods.
What we are planning to do. In this paper, we will show that soft computing methods can indeed explain heuristic methods, both in more traditional data processing and in intelligent data processing techniques (e.g., related to logic programming).
Data processing methods: a useful classification
Before we explain how soft computing can be used, let us briefly classify the existing methods of data processing by the complexity of the results that we want to achieve:
• In some problems, all we want is one or several numerical values. It may be that we measure some characteristics, or it may be that we know the model, and we want, based on the experimental data, to estimate the parameters of this model. These problems are usually handled by statistical methods.
• In other problems, we want to know a function. We may want to reconstruct an image (brightness as a function of coordinates), we may want to filter a signal (intensity as a function of time), etc. These problems are usually handled by different regularization techniques.
• Finally, there are even more complicated problems, in which we want to reconstruct a model of an analyzed system. Methods that handle such problems are called intelligent data processing methods. Many of these methods are based on logic programming, a formalism that (successfully) describes complicated logical statements algorithmically, in a kind of programming-language terms.
In this paper, we will show that soft computing explains heuristic data processing methods of all three types.

Soft computing explains heuristic methods in statistics
Two types of heuristic methods in statistics. Most statistical methods are well justified. There are only two areas where heuristic methods are still used (see, e.g., (Wadsworth 1990)):
• While we are computing and transforming probabilities, we are on the safe ground of well-justified techniques. Whatever estimate we get, we can always compute the probabilities of different possible errors for this estimate, the probabilities for different errors in estimating the above-mentioned probabilities, etc. However, there comes a time when we need to make precise recommendations (make decisions). And here is when we often have to use heuristic methods.
Let us give a simple example of what we mean. Measurement errors usually have a Gaussian distribution, with some standard deviation σ. We usually know this value σ after testing the measuring instrument. A natural question is: if we have measured the value x̃, what are the possible actual values of the measured quantity x?
The literal answer is "any", because for a Gaussian distribution, arbitrarily large deviations x̃ − x have non-zero probability: with probability 5%, we can have errors greater than 2σ, with probability 0.1%, errors greater than 3σ, etc. Suppose that we bought an instrument, made a measurement, and the error was greater than 2σ. Well, we would conclude that it is possible. But if an error was greater than 6σ (which corresponds to the probability ≈ 10^{-6}%), we would simply conclude that it cannot be a random error: we would ask the manufacturer to replace the faulty instrument, and he will most probably replace it (or at least repair it).
This common sense decision is based on the heuristic idea that if some event has a very small probability (like ≤ 10^{-6}%), then this event cannot occur. If you walk into a casino and a roulette wheel, which is supposed to be random, stops on red 100 times in a row, you would conclude that it is broken.
This idea, however, is very difficult to formalize, because we can always divide the real line into small enough segments so that the probability of being in each of them is ≤ 10^{-6}%. Thus, if we simply postulate that such low-probability events do not happen, we will have to make a nonsensical conclusion that no value of error is possible at all. Thus, we have to use heuristic methods.
• Another situation in which we have to use heuristic methods is robust statistics (see, e.g., (Huber 1981) and (Wadsworth 1990)): If we know the exact values of the probabilities, then we can use traditional well-justified methods. However, often, we do not know these probabilities. What methods should we then use? For example, let us consider the situation in which we want to find the parameters C1, ..., Cp of a model y = f(x1, ..., xn, C1, ..., Cp) based on the results (x1(k), ..., xn(k), y(k)), 1 ≤ k ≤ K, of several (K) measurements in which we measure both the inputs xi and the output y. Ideally, we should find the values of Ci for which the model exactly predicts all measurement results, i.e., for which all prediction errors ek = y(k) − f(x1(k), ..., xn(k), C1, ..., Cp) are equal to 0. This requirement makes perfect sense for an idealized situation in which all the values of y are measured with absolute accuracy. However, since measurements are never 100% accurate, even if the model is precise, the measured values y(k) will still differ from the actual values, and thus, from the model's prediction. Thus, we cannot require that the errors are exactly equal to 0; we can only require that these errors e1, ..., eK are small.
When errors are normally distributed, a natural way to determine the parameters is by using the least squares method e1² + ... + eK² → min. If we do not know the probabilities, we can use a class of heuristic methods called M-methods, in which we find the parameters C1, ..., Cp from the condition

ψ(e1) + ... + ψ(eK) → min     (1)

for some function ψ(x). These methods often work fine, but they are still heuristic. How can we justify them?
The next question is: what function ψ(x) should we choose?
Let us show that both types of heuristic methods can be justified by using soft computing (namely, fuzzy logic).
Formalization of the requirement that events with small probabilities cannot happen. We want to describe the requirement that an event with a sufficiently small probability, i.e., with a probability that does not exceed a certain small number p0 << 1, cannot occur. The problem with a standard probabilistic justification of this requirement is that we may have many events E1, ..., Em, each of which has a very small probability p(Ei) ≤ p0. There is no problem with excluding each of these events. However, when we want to exclude all of them, the excluded event E = E1 ∨ ... ∨ Em can have a very high probability (all the way up to 1).
To avoid this problem, we can, in situations of partial information, use degrees of belief d(Ei) instead of probabilities p(Ei). In this case, the degree of belief d(E) in the entire excluded part E = E1 ∨ ... ∨ Em can be determined from the degrees of belief d(Ei) by using a t-conorm a ∨ b, a fuzzy analogue of "or" (Klir and Yuan 1995), (Nguyen and Walker 1997): d(E) = d(E1) ∨ ... ∨ d(Em). The simplest t-conorm is a ∨ b = max(a, b). For this t-conorm, if d(Ei) ≤ p0 for all i, then d(E) = max(d(E1), ..., d(Em)) is also ≤ p0. Thus, we can safely exclude all these events.
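The contrast between the two treatments can be sketched numerically; the threshold p0 and the number of events below are illustrative choices, not values from the text:

```python
# Sketch: excluding many small-probability events is safe for degrees of
# belief combined via the max t-conorm, but not for probabilities.
# The threshold p0 and the number of events m are illustrative choices.

p0 = 0.01          # exclusion threshold: "sufficiently small"
m = 200            # number of individually negligible events E_1, ..., E_m

# Probabilistic case: for independent events with p(E_i) = p0, the
# probability of the union E = E_1 v ... v E_m can grow close to 1.
p_union = 1 - (1 - p0) ** m

# Fuzzy case: degrees of belief d(E_i) = p0 combined with the simplest
# t-conorm, a v b = max(a, b), never exceed p0.
d_union = max([p0] * m)

print(p_union)   # large: we cannot consistently exclude the union
print(d_union)   # still p0: excluding every E_i remains consistent
assert p_union > 0.8 and d_union <= p0
```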
Justification of M-methods in robust statistics. Informally, the requirement for choosing the parameters of the model is that all the errors ei are small, i.e., that e1 is small, e2 is small, ..., and eK is small. A natural way to formalize this requirement is to use fuzzy logic. Let μ(x) be a membership function that describes the natural-language term "small". Then, our degree of belief that e1 is small is equal to μ(e1), our degree of belief that e2 is small is equal to μ(e2), etc. To get the degree of belief d that all conditions are satisfied, we must use a t-norm (a fuzzy analogue of "and"), i.e., use a formula d = μ(e1) & ... & μ(eK), where & is this t-norm.
In (Nguyen, Kreinovich, and Wojciechowski 1997), we have shown that within an arbitrary accuracy, an arbitrary t-norm can be approximated by a strictly Archimedean t-norm. Therefore, for all practical purposes, we can assume that the t-norm that describes the experts' reasoning is strictly Archimedean and, therefore, has the form a & b = φ⁻¹(φ(a) + φ(b)) for some strictly decreasing function φ (Klir and Yuan 1995), (Nguyen and Walker 1997). Thus, d = φ⁻¹(φ(μ(e1)) + ... + φ(μ(eK))). We want the values of the parameters for which our degree of belief d (that the model is good) is the largest possible. Since the function φ is strictly decreasing, d attains its maximum if and only if the auxiliary characteristic D = φ(d) attains its minimum. From the formula that describes d, we can conclude that D = φ(μ(e1)) + ... + φ(μ(eK)). Thus, the condition D → min takes the form (1), with ψ(x) = φ(μ(x)).
So, we have indeed justified the M-methods. This justification enables us to answer the second question: what function ψ(x) should we choose. We should base this choice on the opinion of the experts. From these experts, we extract the membership function μ(x) that corresponds to "small", and the function φ(x) that best describes the experts' "and".
Comment. In image processing, M-methods are called generalized entropy methods, and the function ψ(x) is called a generalized entropy function. In (Flores, Ugarte, and Kreinovich 1993), (Mora, Flores, and Kreinovich 1994), and (Flores, Kreinovich, and Vasquez 1997), we have successfully used this method for radar imaging (including planetary radar imaging). For such problems, minimization of the function (1) is a difficult task; to compute the minimum, we used another soft computing technique: genetic algorithms.
Soft computing explains heuristic methods in regularization
Heuristics. One of the main problems in reconstructing a signal is that this problem is ill-defined in the following sense: if we know the values x(ti) of the signal x(t) for several consecutive moments of time t1, ..., tn, we can, in principle, get arbitrary values in between. To make meaningful conclusions, we need to restrict the class of signals to signals that smoothly depend on time, i.e., which cannot deviate largely from x(tk) while t goes from tk to the next value tk+1. Thus, we have two problems here:
• first, how can we limit the smoothness of the signal, and
• second, when we have fixed the smoothness, which extrapolation techniques should we choose?
In both cases, we mostly have to use heuristic methods. Let us show that these methods can be justified by using fuzzy logic.

Justification of smoothness. Intuitively, the unknown signal must satisfy the property that its values x(t) and x(s) at two close moments of time should be, themselves, close to each other. In other words, we must have the following implication: if t and s are close, then x(t) and x(s) should be close. How can we formalize this requirement?
In classical (two-valued) logic, the statement "if A then B" is equivalent to the inequality t(A) ≤ t(B), where t(A) and t(B) are the truth values of the statements A and B (t(A) = 1 if A is true, and t(A) = 0 otherwise). Similarly, in fuzzy logic, if we have a 100% belief in the implication "if A then B", it is usually interpreted as d(A) ≤ d(B), where d(A) and d(B) are the degrees of belief in A and B, respectively. So, the above statement can be reformulated as the inequality d(A) ≤ d(B), where d(A) is the degree of belief that t and s are close, and d(B) is the degree of belief that x(t) and x(s) are close.
Intuitively, the degree of closeness between the two moments of time t and s depends only on the interval between them, i.e., on the difference t − s. In other words, t is close to s if and only if the difference t − s is small. Similarly, x(t) is close to x(s) if and only if the difference x(t) − x(s) is small. To describe these degrees of belief in precise terms, we must know the membership functions corresponding to the word "small". Let μ(x) be the membership function that describes "smallness" of time intervals.
The differences x and −x are of the same size, so the degree of belief that x is small must be the same as the degree of belief that −x is small. Hence, μ(x) = μ(−x), i.e., μ(x) is an even function. The larger x > 0, the less reasonable it is to call x small; thus, the membership function μ(x) must be strictly decreasing for x > 0.
It is natural to assume that smallness of signal intervals (of type x(t) − x(s)) is described by a similar membership function, with the only difference that signals may be measured in different units. Therefore, we describe "small" for signals as μ((x(t) − x(s))/k) for some multiplicative constant k that describes the possible difference in units. Thus, the above inequality takes the form μ(t − s) ≤ μ((x(t) − x(s))/k) for all t and s. Since the function μ is even, we have μ(x) = μ(|x|), and the above inequality can be written as μ(|t − s|) ≤ μ(|x(t) − x(s)|/k). Since μ(x) is a strictly decreasing function, this inequality is equivalent to |t − s| ≥ |x(t) − x(s)|/k, i.e., to |x(t) − x(s)| ≤ k·|t − s|, and

|x(t) − x(s)| / |t − s| ≤ k.

Thus, out of a fuzzy informal restriction, we got a crisp restriction on the signal x(t). When s → t, we conclude that the time derivative ẋ(t) of the signal x(t) is limited by k: |ẋ(t)| ≤ k. Moreover, from the experts, we can elicit the membership functions that correspond to "small" for time intervals and for signal values, and therefore, extract the value k that limits the time derivative.
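This chain of equivalences is easy to spot-check numerically for a particular strictly decreasing μ; the choices of μ and k below are illustrative, not from the text:

```python
# Spot-check of the equivalence used above: for a strictly decreasing mu,
# mu(|t - s|) <= mu(|x_t - x_s| / k)  holds exactly when
# |x_t - x_s| <= k * |t - s|.  The choices of mu and k are illustrative.

mu = lambda x: 1.0 / (1.0 + x)   # decreasing for x >= 0 (even in the signed variable)
k = 2.0

def fuzzy_side(dt, dx):          # membership-function form of the restriction
    return mu(abs(dt)) <= mu(abs(dx) / k)

def crisp_side(dt, dx):          # equivalent crisp Lipschitz condition
    return abs(dx) <= k * abs(dt)

for dt in [0.1, 0.5, 1.0, 2.0]:
    for dx in [0.0, 0.3, 1.0, 3.0, 10.0]:
        assert fuzzy_side(dt, dx) == crisp_side(dt, dx)
print("equivalence holds on all sampled pairs")
```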
Justification of traditional regularization methods. In (Kreinovich et al. 1992), we have shown that if we formalize statements like "the derivative ẋ(t) of the signal x(t) should be small for all t" and "the value of the signal itself should not be too large", then we arrive at the extrapolation techniques called regularization, which choose a signal x(t) for which ∫(x(t))² dt + λ·∫(ẋ(t))² dt → min, and that, moreover, this justification enables us to find the value of the regularization parameter λ based on the experts' knowledge.
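A discrete version of this variational problem can be sketched in a few lines; the grid, the measured samples, the value of λ, and the coordinate-descent solver are all illustrative assumptions, not taken from (Kreinovich et al. 1992):

```python
# Minimal discrete sketch of the regularization functional above:
# choose x_0..x_{n-1} to minimize  sum x_i^2 + lam * sum (x_{i+1} - x_i)^2,
# with the measured samples held fixed (a simplifying assumption).

def regularize(n, measured, lam, sweeps=500):
    x = [0.0] * n
    for i, v in measured.items():
        x[i] = v
    for _ in range(sweeps):                 # coordinate descent on a convex objective
        for i in range(n):
            if i in measured:
                continue                    # keep measured points fixed
            # setting d/dx_i of  x_i^2 + lam*(x_i - x_{i-1})^2
            #                    + lam*(x_{i+1} - x_i)^2  to zero gives:
            num, den = 0.0, 1.0
            if i > 0:
                num += lam * x[i - 1]; den += lam
            if i < n - 1:
                num += lam * x[i + 1]; den += lam
            x[i] = num / den
    return x

x = regularize(n=7, measured={0: 0.0, 6: 1.0}, lam=10.0)
print([round(v, 3) for v in x])   # smooth monotone interpolation between the samples
```

Larger λ makes the derivative term dominate, pushing the reconstruction toward a straight line; small λ lets the x² term shrink the interior values toward 0.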
Soft computing explains heuristic methods in logic programming
Heuristic numerical methods in logic programming. Logic programming is based on traditional, two-valued logic, in which every statement is either true or false (inside the computer: 1 or 0). However, in proving results about logic programs, and in computing the answer sets, it is often helpful to use numbers between 0 and 1. These intermediate numbers are usually introduced ad hoc, without any meaningful interpretation, as heuristic tools.
One of the cases when such numbers are used is the use of metric fixed point theorems (Fitting 1994) (see also (Khamsi, Kreinovich, and Misane 1993)). Many methods of logic programming use the idea of a sequential approach to the answer, in which we start with some (possibly incorrect) interpretation s0, and then apply some reasonable correcting procedure s → C(s) until no more corrections are necessary. An interpretation s is usually described by describing, for each of the atomic properties p1, ..., pn, ... (i.e., properties that are either directly used in the logic program, or that can be constructed based on the ones used), whether each particular property pi is true or not. In other words, s = (s1, ..., sn, ...), where si = 1 if pi is true, and si = 0 otherwise. In terms of the correcting procedure, the fact that no more corrections are necessary takes the form C(s) = s. In mathematical terms, this means that the resulting model s is a fixed point of the correcting transformation C.
Fixed points are well analyzed in mathematics. Therefore, to prove the existence of answer sets and the convergence of the above iterative procedure, it is reasonable to use the existing mathematical fixed point theorems. Traditionally, in view of the discrete character of traditional logic programming models, in logic programming, only discrete-valued fixed point theorems were used. Fitting (Fitting 1994) was the first to successfully apply continuous-valued (namely, metric) fixed point theorems in logic programming. The main idea behind these methods is that we introduce a numerical metric to describe the distance ρ(t, s) between two interpretations t and s.
Metric methods are used for logic programs in which the atomic properties are naturally stratified:
• the first layer consists of the properties that can be easily deduced directly from the logic program;
• the second layer contains the ones that can be determined indirectly, i.e., for which, in addition to the rules, we must know the truth values of the atomic statements from the first layer;
• similarly, we can define third, fourth layers, etc.
The distance ρ(t, s) is then defined as 2^{-L}, where L is the first layer on which t and s differ.
Comments.
• From the viewpoint of metric fixed point theory, the choice of 2^{-L} does not really matter; instead, we could use any strictly decreasing sequence dL.
• How can we justify this metric?
Soft computing justification. Experts sometimes err. Hence, when an expert assigns the truth values t1, ..., tn, ... to the properties pi, his assignments may differ from the actual (unknown) truth values T1, ..., Tn, ... of the properties of the described object.
As a result, even when we have two different interpretations t ≠ s, it is still possible that these interpretations describe the same object (T = S), and the difference is simply due to the experts' errors. Intuitively, the "closer" t and s, the larger our degree of belief d(T = S) that the (unknown) actual interpretations T and S coincide. Thus, we can define the metric ρ(t, s) as 1 − d(T = S) = d(T ≠ S). Let us show that this natural definition leads exactly to the desired metric.
Indeed, T ≠ S if and only if ∃n (Tn ≠ Sn). An existential quantifier is, in effect, an infinite sequence of "or" operations; so, to make the conclusion meaningful, we must use max as an "or" operation (see the detailed explanation of this choice in (Nguyen et al. 1997)). Hence, d(T ≠ S) = max d(Tn ≠ Sn).
If for some property pn, we have tn = sn, then for this property, we have no reasons to believe that Tn ≠ Sn. Therefore, for this n, d(Tn ≠ Sn) = 0, and in the maximum, we can only take into consideration the values n for which tn ≠ sn.
How do the values d(n) = d(Tn ≠ Sn) differ?
• If the corresponding property pn can be directly extracted from the rules, then we are the most confident in the expert estimates. So, if an expert does say that tn ≠ sn, we take d(n) to be equal to 1 (or close to 1). We will denote the corresponding degree of belief by d1.
• For statements from the second layer, we need to use additional rules to justify them. Experts may not be 100% sure in these rules. Hence, our degree of belief in whatever conclusions we make by using these rules should be smaller than in conclusions that do not use these rules. As a result, for the properties pn from the second layer, our degree of belief d(n) that the expert is right and that Tn ≠ Sn is smaller than for the first layer. If we denote the degree of belief for properties from the second layer by d2, we thus conclude that d2 < d1.
• Similarly, the degrees of belief d3, d4, ... corresponding to different layers form a strictly decreasing sequence: d1 > d2 > d3 > ...
For each n, the degree of belief d(Tn ≠ Sn) is equal to dL, where L is the layer that contains the property pn. The smaller L, the larger dL. Therefore, the distance ρ(t, s) = d(T ≠ S), which is equal to the maximum of these values d(n) over all n for which tn ≠ sn, is equal to dL for the smallest L that contains a property n for which tn ≠ sn. In other words, the distance ρ(t, s) is equal to dL for the first layer L on which t and s differ. This is exactly the desired metric.
Comment. Similarly to the previous cases, soft computing methods not only justify methods of logic programming, but also help to find appropriate algorithms for solving logic programming problems; see, e.g., (Kreinovich 1997), (Kreinovich and Mints 1997), and references therein.
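The stratified distance and its "first differing layer" characterization can be sketched as follows; the layering, the sequence dL = 2^{-L}, and the sample interpretations are illustrative assumptions, not from the text:

```python
# Sketch of the stratified distance: interpretations are lists of layers,
# each layer a tuple of 0/1 truth values; rho(t, s) = d_L for the first
# layer L on which t and s differ, with d_1 > d_2 > ... strictly
# decreasing (here d_L = 2**-L, an illustrative choice).

def rho(t, s, d=lambda L: 2.0 ** -L):
    for L, (layer_t, layer_s) in enumerate(zip(t, s), start=1):
        if layer_t != layer_s:
            return d(L)       # first differing layer decides the distance
    return 0.0                # identical interpretations

t = [(1, 0), (0, 1), (1, 1)]
s = [(1, 0), (1, 1), (0, 0)]  # first differs from t in layer 2
u = [(1, 0), (0, 1), (0, 0)]  # first differs from t in layer 3

assert rho(t, t) == 0.0
assert rho(t, s) == 0.25      # d_2 = 2^-2
assert rho(t, u) == 0.125     # d_3 = 2^-3
# the strong triangle (ultrametric) inequality, which makes metric
# fixed point theorems applicable here, holds for this distance:
assert rho(t, u) <= max(rho(t, s), rho(s, u))
```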
Acknowledgments
This work was partially supported by NSF Grants No. EEC-9322370 and DUE-9750858, by NASA under cooperative agreement NCCW-0089, and by the Future Aerospace Science and Technology Program (FAST) Center for Structural Integrity of Aerospace Systems, effort sponsored by the Air Force Office of Scientific Research, Air Force Materiel Command, under grant number F49620-95-1-0518.
This work was partially done when V. Kreinovich was an invited professor at Laforia, University Paris VI, Summer 1996.
References
1. Fitting, M. 1994. Metric methods: three examples and a theorem. Journal of Logic Programming 21(3): 113-127.
2. Flores, B. C., Ugarte, A., and Kreinovich, V. 1993. Choice of an entropy-like function for range-Doppler processing. In Proceedings of the SPIE/International Society for Optical Engineering, Vol. 1960, Automatic Object Recognition III, 47-56.
3. Flores, B. C., Kreinovich, V., and Vasquez, R. 1997. Signal design for radar imaging and radar astronomy: genetic optimization. In: Dasgupta, D., and Michalewicz, Z. eds. Evolutionary Algorithms in Engineering Applications, Berlin-Heidelberg: Springer-Verlag, 406-423.
4. Huber, P. J. 1981. Robust statistics. N.Y.: Wiley.
5. Khamsi, M. A., Kreinovich, V., and Misane, D. 1993. A new method of proving the existence of answer sets for disjunctive logic programs: a metric fixed point theorem for multi-valued maps. In Proceedings of the Workshop on Logic Programming with Incomplete Information, Vancouver, B.C., Canada, October 1993, 58-73.
6. Klir, G., and Yuan, B. 1995. Fuzzy sets and fuzzy logic: theory and applications. Upper Saddle River, NJ: Prentice Hall.
7. Kreinovich, V., Chang, C. C., Reznik, L., and Solopchenko, G. N. 1992. Inverse problems: fuzzy representation of uncertainty generates a regularization. In Proceedings of NAFIPS'92: North American Fuzzy Information Processing Society Conference, Puerto Vallarta, Mexico, December 15-17, 1992, 418-426. Houston, TX: NASA Johnson Space Center.
8. Kreinovich, V. 1997. S. Maslov's Iterative Method: 15 Years Later (Freedom of Choice, Neural Networks, Numerical Optimization, Uncertainty Reasoning, and Chemical Computing). In: Kreinovich, V., and Mints, G. eds. 1997. Problems of reducing the exhaustive search. Providence, RI: American Mathematical Society, 175-189.
9. Kreinovich, V., and Mints, G. eds. 1997. Problems of reducing the exhaustive search. Providence, RI: American Mathematical Society (AMS Translations, Series 2, Vol. 178).
10. Mora, J. L., Flores, B. C., and Kreinovich, V. 1994. Suboptimum binary phase code search using a genetic algorithm. In: Udpa, S. D., and Han, H. C. eds. Advanced Microwave and Millimeter-Wave Detectors. Proceedings of the SPIE/International Society for Optical Engineering, Vol. 2275, San Diego, CA, 1994, 168-176.
11. Nguyen, H. T., Kreinovich, V., Nesterov, V., and Nakamura, M. 1997. On hardware support for interval computations and for soft computing: a theorem. IEEE Transactions on Fuzzy Systems 5(1): 108-127.
12. Nguyen, H. T., Kreinovich, V., and Wojciechowski, P. 1997. Strict Archimedean t-Norms and t-Conorms as Universal Approximators. Technical Report No. UTEP-CS-97-3, Department of Computer Science, University of Texas at El Paso; the LaTeX file can be downloaded via a link on the second author's homepage http://cs.utep.edu/csdept/faculty/kreinovich.html.
13. Nguyen, H. T., and Walker, E. A. 1997. A first course in fuzzy logic. Boca Raton, Florida: CRC Press.
14. Wadsworth, H. M. Jr. ed. 1990. Handbook of statistical methods for engineers and scientists. N.Y.: McGraw-Hill Publishing Co.