Computational Intelligence –
a Possible Solution for
Unsolvable Problems
Annamária R. Várkonyi-Kóczy
Dept. of Measurement and Information Systems,
Budapest University of Technology and
Economics
koczy@mit.bme.hu
Contents
• Motivation: Why do we need something ”non-classical”?
• What is Computational Intelligence?
• How does CI work?
• About some of the methods of CI
  – Fuzzy Logic
  – Neural Networks
  – Genetic Algorithms
  – Anytime Techniques
• Engineering view: Practical issues
• Conclusions – Is CI really a solution for unsolvable
problems?
03.10.2006
Tokyo Institute of Technology
2
Motivation: Why do we need
something ”non-classical”?
• Nonlinearity, unprecedented spatial and temporal complexity of
systems and tasks
• Imprecise, uncertain, insufficient, ambiguous, contradictory
information, lack of knowledge
• Finite resources → strict time requirements (real-time
processing)
• Need for optimization
+
• User’s comfort
New challenges/more complex tasks to be solved → more
sophisticated solutions needed
3
Unprecedented spatial and temporal
complexity of systems and tasks
How can we drive in heavy traffic?
Many components, very complex
system. Can classical or even AI
systems solve it?
No, as far as we know.
But WE, humans, can.
And we would like to build
MACHINES able to do the
same.
Our car: save fuel, save time, etc.
4
Unprecedented spatial and temporal
complexity of systems and tasks
Help:
• Increased computer facilities
• Model integrated computing
• New modeling techniques
• Approximative computing
• Hybrid systems
5
Imprecise, uncertain, insufficient,
ambiguous, contradictory information,
lack of knowledge
• How can I get to Shibuya?
(Person 1: Turn right at the lamp, then straight ahead till the 3rd
corner, then right again ... NO: better turn to the left) (Person 2:
Turn right at the lamp, then straight ahead till appr. the 6th
corner ... then I don’t know) (Person 3: It is in this direction →
somewhere ...)
• It is raining
• The traffic light is out of order
• I don’t know in which building we have the special lecture
(in Building III or II or ...)? And at what time? (Does it start
at 3 p.m. or at 2 p.m.? And: on the 3rd or 4th of October?)
• When do I have to start from home and at what time?
Who (a person or computer) can show me an algorithm to find an
OPTIMUM solution?
6
Imprecise, uncertain, insufficient,
ambiguous, contradictory information,
lack of knowledge
Help:
• Intelligent and soft computing techniques
being able to handle the problems
• New data acquisition and representation
techniques
• Adaptivity, robustness, ability to learn
7
Finite resources → strict time
requirements (real-time processing)
• It is 10.15 a.m. My lecture starts at 3 p.m.
(hopefully the information is correct)
• I am still not finished with my homework
• I have run out of fuel and I don’t have enough
money for a taxi
• I am very hungry
• I have promised my Professor to help him
prepare a demo in the Lab this morning
I cannot fulfill everything with maximum
precision
8
Finite resources → strict time
requirements (real-time processing)
Help:
• Low complexity methods
• Flexible systems
• Approximative methods
• Results for qualitative evaluations & for
supporting decisions
• Anytime techniques
9
Need for optimization
• Traditionally:
optimization = precision
• New definition:
optimization = cost optimization
• But what is cost!?
precision and certainty also carry a cost
10
Need for optimization
Let’s look at ”TIME” as a resource:
• The most important thing is to go to the Lab and help
my Professor (he is my Professor and I have
promised it). I will spend as much time there as needed,
min. 3 hours
• I have to submit the homework, but I will work in
the Lab, i.e. today I will prepare an ”average” and
not a ”maximum” level homework (1 hour)
• I don’t have time to eat at home, I will buy a bento
at the station (5 minutes)
• The train is more expensive than the bus but takes
much less time, i.e. I will go by train (40 minutes)
11
User’s comfort
• I have to ask the way to the university but
unfortunately, I don’t speak Japanese
• Next time I also want to find my way
• Today it took one and a half hours to get
here. How about tomorrow?
• It would be good to get more help
• ....
12
User’s comfort
Help:
• Modeling methods and representation
techniques making it possible to
– handle
– interpret
– predict
– improve
– optimise the system and
– give more and more support in the processing
13
User’s comfort
• Human language
• Modularity, simplicity, hierarchical structures
Aims of the processing vs. aims of the preprocessing:
– preprocessing: improving the performance of the algorithms;
giving more support to the processing (new)
In image processing / computer vision:
– preprocessing: noise smoothing, feature extraction (edge, corner detection)
– processing: pattern recognition, etc., 3D modeling, medical diagnostics, etc.
– (new): automatic 3D modeling, automatic ...
14
The most important elements of the
solution
• Low complexity, approximative modeling
• Application of adaptive and robust techniques
• Definition and application of the proper cost function
including the hierarchy and measure of importance of
the elements
• Trade-off between accuracy (granularity) and
complexity (computational time and resource need)
• Giving support for the further processing
Traditional and AI methods cannot cope with these.
But how about the new approaches, about
COMPUTATIONAL INTELLIGENCE?
15
What is Computational Intelligence?
Computer (increased computer facilities)
+
Intelligence (added by the new methods)
L.A. Zadeh, Fuzzy Sets [1965]:
“In traditional – hard – computing, the prime desiderata are
precision, certainty, and rigor. By contrast, the point of departure
of soft computing is the thesis that precision and certainty carry
a cost and that computation, reasoning, and decision making
should exploit – whenever possible – the tolerance for
imprecision and uncertainty.”
16
What is Computational Intelligence?
• CI can be viewed as a consortium of methodologies which play
an important role in the conception, design, and utilization of
information/intelligent systems.
• The principal members of the consortium are: fuzzy logic
(FL), neuro computing (NC), evolutionary computing (EC),
anytime computing (AC), probabilistic computing (PC),
chaotic computing (CC), and (parts of) machine learning
(ML).
• The methodologies are complementary and synergistic, rather
than competitive.
• What is common: Exploit the tolerance for imprecision,
uncertainty, and partial truth to achieve tractability, robustness,
low solution cost and better rapport with reality.
17
Computational Intelligence
fulfills all five requirements:
• Low complexity, approximative modeling
• Application of adaptive and robust techniques
• Definition and application of the proper cost function including the hierarchy
and measure of importance of the elements
• Trade-off between accuracy (granularity) and complexity (computational time
and resource need)
• Giving support for the further processing
18
How does CI work?
1. Knowledge
• Information acquisition (observation)
• Information processing (numeric, symbolic)
• Storage and retrieval of the information
• Search for a ”structure” (algorithm for the non-algorithmizable processing)
• Certain knowledge (can be obtained by formal
methods) (closed, open world: ABSTRACT WORLDS)
• Uncertain knowledge (by cognitive methods)
(ARTIFICIAL and REAL WORLDS)
• Lack of knowledge
• Knowledge representation
19
How does CI work?
1. Knowledge
• In real life nearly everything is optimization
• (Ex. 1. Determination of the velocity = calculation of the optimum
estimation of the velocity from the measured time and distance covered)
• Ex. 2. Determination of the resistance = the optimum estimation of the
resistance with the help of the measured current and
voltage
• Ex. 3. Analysis of a measurement result = the optimum estimation of
the measured quantity in the knowledge of the conditions of the
measurement and the measured data
• Ex. 4. Daily time-table
• Ex. 5. Optimum route between two towns
In Ex. 1-3 the criterion of the optimization is unambiguous and can
easily be given
Ex. 4-5 are also simple tasks but the criterion is not unambiguous
20
Optimum route:
What is optimum? (Subjective, depending on the requirements,
taste, limits of the person)
- We prefer/are able to travel by aeroplane, train, car, ...
Let’s say car is selected:
- the shortest route (min petrol need), the quickest route (motorway),
the most beautiful route with sights (whenever it is possible I never
miss the view of Fuji-san ...), the route where the best restaurants are
located, where I can visit my friends, ...
OK, let’s fix the preferences of a certain person:
-But is it summer or winter, is it sunshine or raining, how about the
road reconstructions, ....
By going into the details we get nearer and nearer to the solution
Knowledge is needed for the determination of a good descriptive
model of the circumstances and goals
21
But do we know what kind of weather there will be in two months?
2. Model
• Known model, e.g. analytic model (given by
differential equations) – too complex to be
handled
• Lack of knowledge – the information about the
system is uncertain or imperfect
We need new, more precise knowledge
The knowledge representation (model) should be
tractable and should tolerate these problems
22
Learning and Modeling
New knowledge by learning:
Unknown, partially unknown, known but too
complex to be handled, ill-defined systems
Model by which we can analyze the system
and can predict its behavior
+
Criteria (quality measure) for the validity of
the model
23
[Figure: the input u drives the unknown system (output d) and the model
(output y); the criterion c measures the quality of the model and drives the
parameter tuning.]
1. Observation (u, d, y), 2. Knowledge representation (model,
formalism), 3. Decision (optimization, c(d, y)), 4. Tuning (of the
parameters), 5. Environmental influence (non-observed input,
noise, etc.), 6. Prediction ability (for the future input)
24
Iterative procedure (a cycle):
We build a system for collecting information →
we collect the information →
we improve the system by building in the knowledge →
we improve the observation and collect more information → ...
25
Problem
Knowledge
representation,
Model
Non-represented part
of the problem
Represented
knowledge
Independent space,
coupled to the problem
by the formalism
26
3. Optimization
• Valid where the model is valid
• Given a system with free parameters
• Given an objective measure
• The task is to set the parameters which minimize
or maximize the qualitative measure
• Systematic and random methods
• Exploitation (of the deterministic knowledge) and
exploration (of new knowledge)
27
Methods of
Computational Intelligence
• fuzzy logic – low complexity, easy building-in of a
priori knowledge into computers, tolerance for
imprecision, interpretability
• neuro computing – learning ability
• evolutionary computing – optimization, optimum
learning
• anytime computing – robustness, flexibility,
adaptivity, coping with the temporal circumstances
• probabilistic reasoning – uncertainty, logic
• chaotic computing – open mind
• machine learning – intelligence
28
Fuzzy Logic
• Lotfi Zadeh, 1965
• Knowledge representation in natural language
• ”Computing with words”
• Perceptions
• Value imprecisiation, meaning precisiation
29
History of fuzzy theory
• Fuzzy sets & logic: Zadeh 1964/1965
• Fuzzy algorithm: Zadeh 1968-(1973)
• Fuzzy control by linguistic rules: Mamdani et al. ~1975
• Industrial applications: Japan 1987- (Fuzzy boom), Korea
– Home electronics
– Vehicle control
– Process control
– Pattern recognition & image processing
– Expert systems
– Military systems (USA ~1990-)
– Space research
• Applications to very complex control problems: Japan 1991-
e.g. helicopter autopilot
30
Areas in which Fuzzy Logic was
successfully used:
• Modeling and control
• Classification and pattern recognition
• Databases
• Expert systems
• (Fuzzy) hardware
• Signal and image processing
• Etc.
31
• Universe of discourse: Cartesian (direct) product
of all the possible values of each of the descriptors
• Linguistic variable (linguistic term) [Zadeh]: ”By
a linguistic variable we mean a variable whose
values are words or sentences in a natural or
artificial language. For example, Age is a
linguistic variable if its values are linguistic rather
than numerical, i.e., young, not young, very young,
quite young, old, not very old and not very young,
etc., rather than 20, 21, 22, 23, ...”
• Fuzzy set: It represents a property of the linguistic
variable. A degree of inclusion is associated with
each of the possible values of the linguistic
variable (characteristic function)
• Membership value: The degree of belonging to
the set.
32
An Example
• A class of students (e.g. M.Sc. students taking
the Spec. Course „Computational Intelligence”)
• The universe of discourse: X
• “Who has a driver’s license?”
• A subset of X = a (crisp) set
• χ(x) = CHARACTERISTIC FUNCTION:
1, 0, 1, 1, 0, 1, 1
• “Who can drive very well?”
μ(x) = MEMBERSHIP FUNCTION:
0.7, 0, 1.0, 0.8, 0, 0.4, 0.2
→ FUZZY SET
33
Definitions
• Crisp set: a ∈ A, b ∈ A
[Figure: a non-convex crisp set A containing the points a, b, c, with d on the
segment between a and c but outside A; a convex set B containing the points x, y.]
• Convex set:
A is not convex, as a ∈ A, c ∈ A, but
d = λa + (1−λ)c ∉ A for some λ ∈ [0, 1].
B is convex, as for every x, y ∈ B and
every λ ∈ [0, 1], z = λx + (1−λ)y ∈ B.
• Subset:
If x ∈ A then
also x ∈ B:
A ⊆ B
34
Definitions
• Relative complement or difference:
A – B = {x | x ∈ A and x ∉ B}
B = {1, 3, 4, 5}, A – B = {2, 6}.
C = {1, 3, 4, 5, 7, 8}, A – C = {2, 6}!
• Complement: Ā = X – A,
where X is the universe.
Complementation is involutive: the complement of Ā is A.
Basic properties: the complement of ∅ is X, the complement of X is ∅.
• Union:
A ∪ B = {x | x ∈ A or x ∈ B}
For {Ai | i ∈ I}: ∪i Ai = {x | x ∈ Ai for some i}
A ∪ X = X
A ∪ ∅ = A
A ∪ Ā = X (Law of excluded middle)
35
Definitions
• Intersection:
A ∩ B = {x | x ∈ A and x ∈ B}.
For {Ai | i ∈ I}: ∩i Ai = {x | x ∈ Ai for all i}
A ∩ ∅ = ∅
A ∩ X = A
A ∩ Ā = ∅ (Law of contradiction)
• More properties:
Commutativity: A ∪ B = B ∪ A, A ∩ B = B ∩ A.
Associativity: A ∪ B ∪ C = (A ∪ B) ∪ C = A ∪ (B ∪ C),
A ∩ B ∩ C = (A ∩ B) ∩ C = A ∩ (B ∩ C).
Idempotence: A ∪ A = A, A ∩ A = A.
Distributivity: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
36
Membership function
Crisp set: characteristic function μA : X → {0, 1}
Fuzzy set: membership function μB : X → [0, 1]

μA(x) = 1 for 5 ≤ x ≤ 10, 0 otherwise
μB(x) = 0 for x < 5 or x > 17
μB(x) = 0.2(x − 5) for 5 ≤ x ≤ 10
μB(x) = (17 − x)/7 for 10 ≤ x ≤ 17
μC = 2A
37
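A minimal sketch of how the piecewise-linear membership function μB above can be evaluated in code (the function below simply encodes the reconstructed formula; it is an illustration, not part of the lecture):

```python
def mu_B(x: float) -> float:
    """Piecewise-linear membership function of the fuzzy set B above:
    0 outside [5, 17], rising on [5, 10], falling on [10, 17]."""
    if x < 5 or x > 17:
        return 0.0
    if x <= 10:
        return 0.2 * (x - 5)       # 0 at x = 5, 1 at x = 10
    return (17 - x) / 7.0          # 1 at x = 10, 0 at x = 17

print([round(mu_B(x), 2) for x in (4, 5, 7.5, 10, 13.5, 17, 20)])
# [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
```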
Some basic concepts of fuzzy sets
Elements | Infant | Adult | Young | Old
    5    |   0    |   0   |   1   |  0
   10    |   0    |   0   |   1   |  0
   20    |   0    |  .8   |  .8   | .1
   30    |   0    |   1   |  .5   | .2
   40    |   0    |   1   |  .2   | .4
   50    |   0    |   1   |  .1   | .6
   60    |   0    |   1   |   0   | .8
   70    |   0    |   1   |   0   |  1
   80    |   0    |   1   |   0   |  1
38
Some basic concepts of fuzzy sets
• Support: supp(A) = {x | μA(x) > 0}.
Infant = ∅, so supp(Infant) = ∅.
If |supp(A)| < ∞, A can be written as
A = μ1/x1 + μ2/x2 + … + μn/xn, i.e.
A = Σi=1..n μi/xi
and, in the continuous case,
A = ∫X μA(x)/x
• Kernel (Nucleus, Core):
Kernel(A) = {x | μA(x) = 1}.
39
Definitions
• Height:
Height(A) = maxx(μA(x)) = supx(μA(x))
– height(Old) = 1, height(Infant) = 0
– If height(A) = 1, A is normal
– If height(A) < 1, A is subnormal
– height(∅) = 0
(if height(A) = 0 then supp(A) = ∅)
• α-cut: Aα = {x | μA(x) ≥ α}; Young0.8 = {5, 10, 20}
Strong cut: Aα+ = {x | μA(x) > α}; Young0.8+ = {5, 10}
– Kernel: A1 = {x | μA(x) = 1}
– Support: A0+ = {x | μA(x) > 0}
• If A is subnormal, Kernel(A) = ∅
– Aβ ⊆ Aα if α ≤ β
40
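A small sketch of support, kernel, height and α-cuts, computed on the discrete fuzzy set Young from the table above (illustration only; the code simply applies the definitions on this slide):

```python
# Fuzzy set "Young" from the table above, as {element: membership value}
young = {5: 1.0, 10: 1.0, 20: 0.8, 30: 0.5, 40: 0.2,
         50: 0.1, 60: 0.0, 70: 0.0, 80: 0.0}

support = {x for x, m in young.items() if m > 0}
kernel  = {x for x, m in young.items() if m == 1.0}
height  = max(young.values())

def alpha_cut(A, alpha, strong=False):
    """A_alpha = {x | mu_A(x) >= alpha}; the strong cut uses '>'."""
    return {x for x, m in A.items() if (m > alpha if strong else m >= alpha)}

print(support)                      # {5, 10, 20, 30, 40, 50}
print(kernel)                       # {5, 10}
print(height)                       # 1.0
print(alpha_cut(young, 0.8))        # {5, 10, 20}
print(alpha_cut(young, 0.8, True))  # {5, 10}
```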
Definitions
• Fuzzy set operations defined by L.A. Zadeh in
1964/1965:
• Complement: μĀ(x) = 1 − μA(x)
• Intersection: μA∩B(x) = min(μA(x), μB(x))
• Union: μA∪B(x) = max(μA(x), μB(x))
For crisp sets the characteristic function takes only the values
χ(x) = 0 (x ∉ A, B) and χ(x) = 1 (x ∈ A, B).
41
Definitions
This is really a generalization of the crisp set operations!

A  B | Ā  A∩B  A∪B | 1−μA  min(μA,μB)  max(μA,μB)
0  0 | 1   0    0  |  1        0           0
0  1 | 1   0    1  |  1        0           1
1  0 | 0   0    1  |  0        0           1
1  1 | 0   1    1  |  0        1           1
42
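A short sketch of Zadeh's operations on discrete fuzzy sets; on crisp (0/1) membership values they reduce to the classical set operations, as the table above shows. The universe and values below are illustrative, not taken from the lecture:

```python
def complement(A):
    return {x: 1.0 - m for x, m in A.items()}

def intersection(A, B):
    return {x: min(A[x], B[x]) for x in A}   # Zadeh: min

def union(A, B):
    return {x: max(A[x], B[x]) for x in A}   # Zadeh: max

A = {1: 0.0, 2: 0.0, 3: 1.0, 4: 1.0}
B = {1: 0.0, 2: 1.0, 3: 0.0, 4: 1.0}
print(intersection(A, B))  # {1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}
print(union(A, B))         # {1: 0.0, 2: 1.0, 3: 1.0, 4: 1.0}
print(complement(A))       # {1: 1.0, 2: 1.0, 3: 0.0, 4: 0.0}
```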
Fuzzy Proposition
• Fuzzy proposition: X is P
‘Tina is young’, where:
‘Tina’: crisp age, ‘young’: fuzzy predicate.
Fuzzy sets expressing
linguistic terms for ages
Truth claims – fuzzy sets
over [0, 1]
• Fuzzy logic based approximate reasoning
is most important for applications!
43
• CRISP RELATION: some interaction or association
between elements of two or more sets.
• FUZZY RELATION: various degrees of association can be
represented.
[Figure: a crisp relation (CR) between the sets A and B connects elements by
plain arcs; a fuzzy relation (FR) attaches degrees such as 0.5, 0.8, 1, 0.9, 0.6
to the arcs.]
• CARTESIAN (DIRECT) PRODUCT of two (or more) sets X, Y:
X × Y = { (x, y) | x ∈ X, y ∈ Y }
X × Y ≠ Y × X if X ≠ Y!
More generally:
×i=1..n Xi = { (x1, x2, …, xn) | xi ∈ Xi, i ∈ Nn }
44
Fuzzy Logic Control
• Fuzzification: converts the
numerical value to a fuzzy one;
determines the degree of matching
• Defuzzification converts the
fuzzy term to a classical numerical
value
• The knowledge base contains the
fuzzy rules
• The inference engine describes the
methodology to compute the output
from the input
45
Fuzzification
[Figure: a singleton membership function of height 1 at x = 8.4 over the universe X]
The measured (crisp) value is converted to a fuzzy set
containing one element with membership value = 1:
μ(x) = 1 if x = 8.4
μ(x) = 0 otherwise
46
Defuzzification
Center of Gravity Method (COG):

yCOG = ∫y∈Y y·μy(y) dy / ∫y∈Y μy(y) dy
47
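A minimal numerical sketch of the COG formula above, with the integrals approximated by sums over a sampled output universe (the sample values are illustrative):

```python
def cog(ys, mu):
    """Centre of gravity of a sampled output fuzzy set:
    ys - sampled output values, mu - membership values mu(y) at those samples."""
    num = sum(y * m for y, m in zip(ys, mu))
    den = sum(mu)
    return num / den if den else 0.0

ys = [0, 10, 20, 30, 40]
mu = [0.0, 0.2, 1.0, 0.2, 0.0]
print(cog(ys, mu))  # 20.0 (symmetric set -> centroid at its peak)
```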
Specificity of fuzzy partitions
Fuzzy Partition A containing three linguistic terms
Fuzzy Partition A* containing seven linguistic terms
48
Fuzzy inference mechanism (Mamdani)
• Rule Ri: If x1 = A1,i and x2 = A2,i and ... and xn = An,i then y = Bi

wj,i = maxxj { min( μX(xj), μAj,i(xj) ) }

The weighting factor wj,i characterizes
how far the input xj corresponds to the
rule antecedent fuzzy set Aj,i in one
dimension.

wi = min{ w1,i, w2,i, …, wn,i }

The weighting factor wi
characterizes how far
the input x fulfils the
antecedents of the rule Ri.
49
Conclusion

μyi(y) = min( wi, μBi(y) )

The conclusion of rule Ri for a given observation x is yi.
50
Fuzzy Inference
• Mamdani Type
51
Fuzzy systems: an example
TEMPERATURE
MOTOR_SPEED
Fuzzy systems operate on fuzzy rules:
IF temperature is COLD THEN motor_speed is LOW
IF temperature is WARM THEN motor_speed is MEDIUM
IF temperature is HOT THEN motor_speed is HIGH
52
Inference mechanism (Mamdani)
[Figure: RULE 1, RULE 2 and RULE 3 evaluated graphically for Temperature = 55;
after aggregation and defuzzification the result is Motor Speed = 43.6]
53
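A minimal sketch of a Mamdani-type controller for the temperature → motor_speed example above. The rule labels (COLD/WARM/HOT, LOW/MEDIUM/HIGH) come from the slides; the triangular membership parameters and the 0–100 ranges below are illustrative assumptions, so the result will not reproduce the slide's 43.6, which depends on its own (unshown) partitions:

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Assumed (illustrative) partitions over 0..100
temp_sets  = {"COLD": (0, 20, 50), "WARM": (30, 55, 80), "HOT": (60, 85, 100)}
speed_sets = {"LOW":  (0, 15, 40), "MEDIUM": (20, 50, 80), "HIGH": (60, 85, 100)}
rules = [("COLD", "LOW"), ("WARM", "MEDIUM"), ("HOT", "HIGH")]

def control(temperature):
    ys = list(range(0, 101))
    agg = [0.0] * len(ys)
    for t_label, s_label in rules:
        w = tri(temperature, *temp_sets[t_label])   # rule firing strength
        for i, y in enumerate(ys):                  # min for implication,
            agg[i] = max(agg[i], min(w, tri(y, *speed_sets[s_label])))  # max to aggregate
    num = sum(y * m for y, m in zip(ys, agg))       # COG defuzzification
    den = sum(agg)
    return num / den if den else 0.0

print(round(control(55), 1))   # a medium motor speed for temperature = 55
```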
Planning of Fuzzy Controllers
Determination of fuzzy controllers = determination
of the antecedents + consequents of the rules
• Antecedents:
– Selection of the input dimensions
– Determination of the fuzzy partitions for the inputs
– Determination of the parameters for the fuzzy variables
• Consequents:
– Determination of the parameters
54
Fuzzy-controlled Washing Machine
(Aptronix Examples)
• Objective
Design a washing machine
controller, which gives the correct
wash time even though a precise
model of the input/output
relationship is not available
• Inputs:
Dirtiness, type of dirt
• Output:
Wash time
55
Fuzzy-controlled Washing Machine
• Rules for our washing machine
controller are derived from
common sense data taken from
typical home use, and
experimentation in a controlled
environment.
A typical intuitive rule is as
follows:
If saturation time is long and
transparency is bad,
then wash time should be long.
56
Air Conditioning Temperature
Control
• There is a sensor in the room to
monitor temperature for feedback
control, and there are two control
elements, cooling valve and heating
valve, to adjust the air supply
temperature to the room.
• Temperature control has several
unfavorable features: non-linearity,
interference, dead time,
external disturbances, etc.
Conventional approaches usually
do not result in satisfactory
temperature control.
Rules for this controller may be
formulated using statements similar
to:
If temperature is low then open
heating valve greatly
57
Air Conditioning Temperature
Control – Modified Model
•
There are two sensors in the modified
system: one to monitor temperature and
one to monitor humidity. There are
three control elements: cooling valve,
heating valve, and humidifying valve,
to adjust temperature and humidity of
the air supply.
Rules for this controller can be formulated by adding rules for humidity control
to the basic model.
If temperature is low then open humidifying valve slightly.
This rule acts as a predictor of humidity (it leads the humidity value) and is
also designed to prevent overshoot in the output humidity curve.
58
Smart Cars 1 - Rules
The number of rules depends on the problem. We
shall consider only two for the simplicity of the
example:
Rule 1: If the distance between two cars is short
and the speed of your car is high(er than the
other one’s), then brake hard.
Rule 2: If the distance between two cars is
moderately long and the speed of your car is
high(er than the other one’s), then brake
moderately hard.
59
Smart Cars 2 – Membership
Functions
– Determine the membership functions for
the antecedent and consequent blocks
– Most frequently 3, 5 or 7 fuzzy sets are
used (3 for crude control, 5 and 7 for
finer control results)
– Typical shapes (triangular – most
frequent)
60
Smart Cars 3 – Simplify Rules using
Codes
– Distance between two cars: X1
Speed: X2
Braking strength: Y
Labels – small, medium, large: S, M, L
PL - Positive Large
PM - Positive Medium
PS - Positive Small
ZR - Approximately Zero
NS - Negative Small
NM - Negative Medium
NL - Negative Large
– In the case of X2 (speed), small,
medium, and large mean the amount
that this car's speed is higher than
the car in front.
– Rule 1:
If X1=S and X2=M, then Y=L
Rule 2:
If X1=M and X2=L, then Y=M
61
Smart Cars 4 - Inference
– Determine the degree of matching
– Adjust the consequent block
– Total evaluation of the conclusions
based on the rules
To determine the control amount at a
certain point, a defuzzifier is used
(e.g. the center of gravity). In this
case the center of gravity is located at
a position somewhat harder than
medium strength, as indicated by the
arrow
62
Advantages of Fuzzy Controllers
• Control design process is simpler
• Design complexity reduced, without need
for complex mathematical analysis
• Code easier to write, allows detailed
simulations
• More robust, as tests with weight changes
demonstrate
• Development period reduced
63
Neural Networks
• McCulloch & Pitts, 1943; Hebb, 1949
• Rosenblatt, 1958 (Perceptron)
• Widrow-Hoff, 1960 (Adaline)
• It mimics the human brain
64
Neural Networks
Neural Nets are parallel, distributed information
processing tools which are
• Highly connected systems composed of identical
or similar operational units evaluating local
processing (processing element, neuron) usually in
a well-ordered topology
• Possessing some kind of learning algorithm which
usually means learning by patterns and also
determines the mode of the information processing
• They also possess an information recall algorithm
making possible the usage of the previously
learned information
65
Application areas where NNs are
successfully used
• One- and multidimensional signal processing
(image processing, speech processing, etc.)
• System identification and control
• Robotics
• Medical diagnostics
• Economic feature estimation
66
Application areas where NNs are
successfully used
• Associative memory = content addressable memory
• Classification system (e.g. pattern recognition, character
recognition)
• Optimization system (the – usually feedback – NN
approximates the cost function) (e.g. radio frequency
distribution, A/D converter, traveling salesman problem)
• Approximation system (any input-output mapping)
• Nonlinear dynamic system model (e.g. solution of partial
differential equation systems, prediction, rule learning)
67
Main features
• Complex, non-linear input-output mapping
• Adaptivity, learning ability
• Distributed architecture
• Fault tolerant property
• Possibility of parallel analog or digital
VLSI implementations
• Analogy with neurobiology
68
The simple neuron
Linear combinator with non-linear activation:
69
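The neuron figure is not reproduced here; in the usual notation (an assumption, not copied from the slide) the output is y = φ(Σi wi·xi + b), with weights wi, bias b and activation φ. A minimal sketch:

```python
import math

def neuron(x, w, b, phi=math.tanh):
    """Linear combination of the inputs followed by a nonlinear activation."""
    return phi(sum(wi * xi for wi, xi in zip(w, x)) + b)

print(neuron([1.0, -2.0, 0.5], [0.3, 0.1, -0.4], b=0.2))  # ~0.0997
```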
Typical activation functions
step, piecewise linear, hyperbolic tangent, sigmoid
70
Classical neural nets
• Static nets (without memory, feedforward networks)
– One layer
– Multi layer
• MLP (Multi Layer Perceptron)
• RBF (Radial Basis Function)
• CMAC (Cerebellar Model Articulation Controller)
• Dynamic nets (with memory or feedback recall networks)
– Feedforward (with memory elements)
– Feedback
• Local feedback
• Global feedback
71
Feedforward architectures
One layer architectures: Rosenblatt perceptron
72
Feedforward architectures
One layer architectures
Input
Output
Tunable parameters (weighting factors)
73
Feedforward architectures
Multilayer network (static MLP net)
74
Approximation property
• universal approximation property for some
kinds of NNs
• Kolmogorov: Any continuous real valued
N-variable function defined over the [0,1]N
compact interval can be represented with
the help of appropriately chosen one-variable
functions and the sum operation.
75
Learning
Learning = parameter estimation
• supervised learning
• unsupervised learning
• analytic learning
76
Supervised learning
Estimation of the model parameters from x, y, d:
[Figure: the input x (with noise n) drives the system d = f(x, n) and the NN
model y = fM(x, w); the criterion C(d, y) = C(ε) drives the parameter tuning.]
77
Supervised learning
• Criteria function
– Quadratic:
– ...
78
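The quadratic criterion referred to above, in its usual form (standard notation, assumed here rather than copied from the slide): C(w) = ½ E{ε²} = ½ E{(d − y)²}, in practice estimated by the average of (dk − yk)² over the training samples.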
• Minimization of the criterion
• Analytic solution (only if it is very simple)
• Iterative techniques
– Gradient methods
– Searching methods
• Exhaustive
• Random
• Genetic search
79
Parameter correction
• Perceptron
• Gradient methods
– LMS (least means square algorithm)
• ...
80
LMS (Iterative solution based on the
temporary error)
• Temporary error:
• Temporary gradient:
• Weight update:
81
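The LMS relations listed above, in their usual form (standard notation, assuming a linear combiner with weight vector w(k) and input vector x(k); these formulas are not taken verbatim from the slide):
• Temporary (instantaneous) error: ε(k) = d(k) − wᵀ(k)·x(k)
• Temporary gradient (of the instantaneous criterion ½ε²(k)): −ε(k)·x(k)
• Weight update: w(k+1) = w(k) + μ·ε(k)·x(k)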
Gradient methods
• The route of the convergence
82
Gradient methods
• Single neuron with nonlinear activation
• Multilayer network: backpropagation (BP)
83
Teaching an MLP network:
The Backpropagation algorithm
84
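A compact sketch of backpropagation for a one-hidden-layer MLP (the network size, the XOR task and all parameters below are illustrative assumptions, not taken from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
d = np.array([[0], [1], [1], [0]], dtype=float)               # desired outputs (XOR)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input  -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
mu = 0.5                                        # learning factor

for _ in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # backward pass: propagate the error through the layers
    delta2 = (y - d) * y * (1 - y)
    delta1 = (delta2 @ W2.T) * h * (1 - h)
    # gradient-descent weight update
    W2 -= mu * h.T @ delta2;  b2 -= mu * delta2.sum(axis=0)
    W1 -= mu * X.T @ delta1;  b1 -= mu * delta1.sum(axis=0)

print(np.round(y.ravel(), 2))   # typically approaches [0, 1, 1, 0]
```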
Design of MLP networks
• Size of the network (number of layers,
number of hidden neurons)
• The value of the learning factor, µ
• Initial values of the parameters
• Validation, learning set, test set
• Teaching method (sequential, batch)
• Stopping criteria (error limit, number of
cycles)
85
Modular networks
• Hierarchical networks
• Linear combination of NNs
• Mixture of experts
• Hybrid networks
86
Linear combination of networks
87
Mixture of experts (MOE)
Gating
network
experts
88
Decomposition of complex tasks
• Decomposition and learning
– Decomposition before learning
– Decomposition during the learning (automatic
task decomposition)
• Problem space decomposition
– Input space decomposition
– Output space decomposition
89
Example: Automatic recognition of
numbers (e.g. postal code)
• Binary pictures with 16x16 pixels
• Preprocessing (idea: the numbers are composed of
edge segments): 4 edge detections
• Normalization → four 8x8 pictures (i.e. 256 input
elements)
• Classification by 45 independent networks, each
classifying only two classes of the ten figures (1 or
2, 1 or 3, ..., 8 or 0, 9 or 0)
• The corresponding network outputs are connected
to an AND gate; if its output equals 1 then the
figure is recognized
90
Example: Automatic recognition of
handwritten figures (e.g. Postal codes)
[Figure: the input is normalized and passed through four edge detection masks –
horizontal, vertical, diagonal \ and diagonal / .]
91
Example: Automatic recognition of
handwritten figures (e.g. Postal codes)
92
Genetic Algorithms
• John Holland, 1975
• Adaptive method for searching and
optimization problems
• Copying the genetic processes of the biological
organisms
• Natural selection (Charles Darwin: The Origin
of Species)
• Multi-point search
93
Successful application areas
• Optimization (circuit design, scheduling)
• Automatic programming
• Machine learning (classification, prediction,
weather forecast, learning of NNs)
• Economic systems
• Immunology
• Ecology
• Modeling of social systems
94
The algorithm
• Initial population → parent selection →
creation of new individuals (crossover,
mutation) → quality measure, reproduction
→ new generation → exit criteria?
• If no: continue with the algorithm
• If yes: selection of the result, decoding
• As in biology in the real world
95
Problem building
• Selection of the most important features,
coding
• Fitness function = quality measure
(optimum criterion)
• Exit criteria
• Selection of the size of the population
• Specification of the genetic operations
96
Simple genetic algorithms
• Representation = features coded in a binary
string (chromosome, string)
• Fitness function = representing the
”viability” (optimality) of the individual
• Selection = selecting the parent individuals
from the generation (e.g. random but
fitness based, i.e. better chance with higher
fitness value)
97
Simple genetic algorithms
• Crossover: from 2 parents two offspring
(one point, two point, N-point, uniform)
98
Simple genetic algorithms
• Mutation (of the bits (genes)) (one or independent)
• Reproduction = who will survive and form the
next (new) generation
– Individuals with the best fitness function
• Exit: after a number of generations or depending on
the fitness function of the best individual or the
average of the generation, ...
99
Example for GAs
Maximize the function f(x) = x² where x can
take values between 0 and 31.
Let’s start with a population containing 4
elements (generated randomly by tossing
a coin). Each element (string) consists of 5
bits (to be able to code numbers between 0
and 31).
100
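A small sketch of the simple GA described above, applied to the same task: 5-bit strings, roulette-wheel selection, one-point crossover and bit-wise mutation. The population size and probabilities below are illustrative, and a random run will not reproduce the exact tables that follow:

```python
import random

def fitness(chrom):
    return int(chrom, 2) ** 2          # f(x) = x^2, x coded as a 5-bit string

def roulette(pop):
    """Roulette-wheel selection: fitter strings get larger sectors."""
    total = sum(fitness(c) for c in pop)
    r = random.uniform(0, total)
    acc = 0.0
    for c in pop:
        acc += fitness(c)
        if acc >= r:
            return c
    return pop[-1]

def crossover(p1, p2):
    point = random.randint(1, len(p1) - 1)       # one-point crossover
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chrom, p=0.001):
    return "".join(b if random.random() > p else str(1 - int(b)) for b in chrom)

random.seed(1)
pop = ["".join(random.choice("01") for _ in range(5)) for _ in range(4)]
for generation in range(10):
    new_pop = []
    while len(new_pop) < len(pop):
        c1, c2 = crossover(roulette(pop), roulette(pop))
        new_pop += [mutate(c1), mutate(c2)]
    pop = new_pop
    print(generation, pop, max(fitness(c) for c in pop))
```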
Example for GAs
No. | Initial population | x value | f(x) | f(xi)/Σf(x) | ranking
 1  |       01101        |   13    |  169 |    0.14     |    1
 2  |       11000        |   24    |  576 |    0.49     |    2
 3  |       01000        |    8    |   64 |    0.06     |    0
 4  |       10011        |   19    |  361 |    0.31     |    1
Sum                                  1170      1.00           4
Average                               293      0.25           1
Maximum                               576      0.49           2
101
Example for GAs
Selected string | Mate (pair) | Crossover position | New population | x value | f(x)
     01101      |      2      |         4          |     01100      |   12    | 144
     11000      |      1      |         4          |     11001      |   25    | 625
     11000      |      4      |         2          |     11011      |   27    | 729
     10011      |      3      |         2          |     10000      |   16    | 256
Sum                                                                            1754
Average                                                                         439
Maximum                                                                         729
102
Conclusions
• The fitness improved significantly in the new
generation (both the average and the maximum)
• Initial population: randomly chosen
• Selection: 4 times by a roulette wheel where
”better” individuals had bigger sectors and thus a
bigger chance (the 3rd (worst) string has died out!)
• Pairs: the 1-2, 3-4 selections
• Position of the crossover: randomly chosen
• Mutation: bit by bit with p=0.001 probability
• (the generation contains 20 bits, on average 0.02
bits will be mutated – in this example none)
103
Anytime Techniques –
Why do we need them?
• Larger scale signal processing (DSP) systems, Artificial
Intelligence
– Limited amount of resources
– Abrupt changes in…
• Environment
• Processing system
• Computational resources (shortage)
• Data flow (loss)
– Processing should be continued
• Low complexity → lower, but possibly sufficient
accuracy or partial results (for qualitative decisions)
⇒ Anytime systems
104
Anytime Systems –
What do they offer?
• To handle abrupt changes due to failures
• To fulfill prescribed response time conditions
(changeable response time)
• Continuous operation in case of serious shortage of
necessary data/processing time (temporary overload of
certain communication channels, sensor failures, etc.)
• To provide appropriate overall performance for the
whole system
• guaranteed response time, known error
• Flexibility: available input data, available time,
computational power, balance between time and quality
(quality: accuracy, resolution, etc…)
105
Anytime systems – How do they work?
• Conditions: on-line computing, guaranteed
response time, limited resources (changing in
time)
• Anytime processing: coping with the
temporarily available resources to maintain the
overall performance
• “correct” models, treatable by the limited resources
during limited time, low and changeable complexity,
possibility of reallocation of the resources, changeable
and guaranteed response time/ computational need,
known error
• tools: iterative algorithms, other types of
methods used in a modular architecture
106
• optimization of the whole system (processing
chain) based on intelligent decisions (expert
system, shortage indicators)
• algorithms and models of simpler complexity
• temporarily lower accuracy
• data for qualitative evaluations & for
supporting decisions
• coping with the temporal conditions
• supporting ‘early’ decision making
• preventing serious alarm situations
107
• Shortage indicators
• Intelligent monitor
• Special compilation methods during
runtime
• Strict time constraints for the monitor
• The number and the complexity of the
executable task can be very high
⇒ add-in + optimization
108
Missing input samples
Temporary overload of certain communication
channels, sensor failures, etc. → the input
samples fail to arrive in time or will be lost
⇒ prediction mechanism (estimations based on
previous data)
example: resonator based filters
109
Temporary shortage of computing
power
Temporary shortage of computing power → the
signal processing cannot be performed in time
⇒ trade-off between the approximation accuracy
and the complexity:
complexity reduction techniques, reduction of
the sampling rate, application of less
accurate evaluations
110
Temporary shortage of computing
power
Examples:
• application of lower order filters or transformers (in
case of recursive discrete transformers: switching
off some of the channels; obvious requirement: maintaining
e.g. the orthogonality of the transformations)
• Singular Value Decomposition applied to fuzzy
models, B-spline neural networks, wavelet
functions, Gabor functions, etc. – fuzzy filters,
human hearing system, generalized NNs
111
Temporary shortage of computing
time
Temporary shortage of computing time → the
signal processing cannot be performed in time
Examples:
• block-recursive filters and filter-banks
• overcomplete signal representations
112
Anytime algorithms – iterative
methods
• Evaluate 734/25! (After 1 second: appr. 30 → after
5 seconds: better, 29.3 → after 8 seconds: exactly
29.36.)
• Iterative procedure (as before):
We build a system for collecting information →
we collect the information →
we improve the system by building in the knowledge →
we improve the observation and collect more information → ...
113
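A toy sketch of the anytime idea behind the 734/25 example: an iterative routine that can be interrupted at any moment and always returns its best estimate so far. The Newton iteration for the reciprocal is just one possible refinement rule, chosen here for illustration, not the lecture's method:

```python
import time

def anytime_divide(a, b, budget_s):
    """Refine a/b iteratively; stop when the time budget is exhausted and
    return the best estimate obtained so far."""
    r = 0.01                                   # deliberately crude reciprocal guess
    estimate = a * r                           # (converges for moderate b with this start)
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        r = r * (2.0 - b * r)                  # Newton step drives r towards 1/b
        estimate = a * r                       # better and better approximation
    return estimate

print(anytime_divide(734, 25, budget_s=0.01))  # approaches 29.36 as the budget grows
```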
Anytime algorithms – modular
architecture
• Units = distinct/different
implementations of a task,
with the same interface
but different performance
characteristics:
– complexity
– accuracy
– error transfer characteristic
• ⇒ selection
[Figure: an Expert System performs the selection – within Module A one of the
units A/1, A/2, A/3, within Module B one of the units B/1, B/2, B/3.]
114
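A toy sketch of the modular anytime idea above: each module offers interchangeable units with the same interface but different cost (run time) and accuracy, and a simple "expert" picks, per module, the most accurate unit that still fits the remaining time budget. All names and numbers below are illustrative, not taken from the lecture:

```python
# (unit name, run-time cost, residual error) per module
modules = {
    "A": [("A/1", 1.0, 0.10), ("A/2", 2.5, 0.03), ("A/3", 6.0, 0.01)],
    "B": [("B/1", 0.5, 0.20), ("B/2", 2.0, 0.05), ("B/3", 5.0, 0.02)],
}

def select_units(time_budget):
    # start from the cheapest unit everywhere, then upgrade greedily
    plan = {name: min(units, key=lambda u: u[1]) for name, units in modules.items()}
    spent = sum(u[1] for u in plan.values())
    improved = True
    while improved:
        improved = False
        for name, units in modules.items():
            for unit in sorted(units, key=lambda u: u[2]):   # most accurate first
                extra = unit[1] - plan[name][1]
                if unit[2] < plan[name][2] and spent + extra <= time_budget:
                    spent += extra
                    plan[name] = unit
                    improved = True
                    break
    return plan, spent

print(select_units(time_budget=5.0))   # upgrades units only where the budget allows
```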
Engineering view: Practical
issues
• Well defined mathematical foundation, but there is a gap
between the theory and the implementation
• When and which method works better? (the theory cannot
give any answer or is lazy to think it over?)
• How to choose the
sizes/parameters/shapes/definitions/etc.?
• What if the axioms are inconsistent/incomplete? (the
practical possibility can be 0)
• Handling of the exceptions, e.g. the rule for very young
overwrites the rule for young
• Good advice: modeling, a priori knowledge, iteration,
hybrid systems, smooth systems/parameters (as near to
the real world as possible)
115
Accuracy problems
• How can we handle accuracy problems if we
e.g. don’t have any input information?
• What if in time critical applications not only
the stationary responses are to be considered?
• How can the different modeling/data
representation methods interpret each other’s
results?
• New (classical+nonclassical) measures are
needed
116
Transients
• Dynamic systems:
change in the system → transients
• Depending on the transfer function and on
the actual implementation of the structure
• Strongly related to the „energy distribution”
of the system
• Affected by the steps and the reconfiguration
„route”
117
Transients
• Must be reduced and treated:
– careful choice of the architecture (orthogonal
structures have better transients)
– multi-step reconfiguration: selection of the
number and location of the intermediate steps
– estimation of the effect of the transients
118
Is CI really a solution for unsolvable
problems?
• Yes: The high number of successful
applications and the new areas where
automation became possible prove that
Computational Intelligence can be a
solution for otherwise unsolvable problems
• Although: With the new methods, new
problems have arisen to be solved by you
Future engineering is unthinkable without
Computational Intelligence
119
Conclusions
• What is Computational Intelligence?
• What is the secret of its success?
• How does it work?
• What kind of approaches/concepts are
attached?
• New problems with open questions
03.10.2006
Tokyo Institute of Technology
120