Computational Intelligence – a Possible Solution for Unsolvable Problems

Annamária R. Várkonyi-Kóczy
Dept. of Measurement and Information Systems, Budapest University of Technology and Economics
koczy@mit.bme.hu

Contents
• Motivation: Why do we need something "non-classical"?
• What is Computational Intelligence?
• How does CI work?
• About some of the methods of CI
  – Fuzzy Logic
  – Neural Networks
  – Genetic Algorithms
  – Anytime Techniques
• Engineering view: practical issues
• Conclusions – Is CI really a solution for unsolvable problems?

03.10.2006 Tokyo Institute of Technology

Motivation: Why do we need something "non-classical"?
• Nonlinearity; unprecedented spatial and temporal complexity of systems and tasks
• Imprecise, uncertain, insufficient, ambiguous, contradictory information; lack of knowledge
• Finite resources; strict time requirements (real-time processing)
• Need for optimization
• User's comfort
New challenges / more complex tasks to be solved → more sophisticated solutions needed

Unprecedented spatial and temporal complexity of systems and tasks
• How can we drive in heavy traffic? Many components, a very complex system.
• Can classical or even AI systems solve it? No, as far as we know. But WE, humans, can.
• And we would like to build MACHINES able to do the same: drive our car, save fuel, save time, etc.

Help:
• Increased computer facilities
• Model-integrated computing
• New modeling techniques
• Approximative computing
• Hybrid systems

Imprecise, uncertain, insufficient, ambiguous, contradictory information, lack of knowledge
• How can I get to Shibuya?
(Person 1: Turn right at the lamp, then straight ahead till the 3rd corner, then right again ... NO: better turn to the left)
(Person 2: Turn right at the lamp, then straight ahead till approx. the 6th corner ... then I don't know)
(Person 3: It is in this direction → somewhere ...)
• It is raining
• The traffic light is out of order
• I don't know in which building we have the special lecture (Building III or II or ...). And at what time? (Does it start at 3 p.m. or at 2 p.m.? And: on the 3rd or the 4th of October?)
• When do I have to start from home, and at what time?
Who (a person or a computer) can show me an algorithm to find an OPTIMUM solution?

Help:
• Intelligent and soft computing techniques able to handle the problems
• New data acquisition and representation techniques
• Adaptivity, robustness, ability to learn

Finite resources, strict time requirements (real-time processing)
• It is 10.15 a.m. My lecture starts at 3 p.m. (hopefully the information is correct)
• I am still not finished with my homework
• I have run out of fuel and I don't have enough money for a taxi
• I am very hungry
• I have promised my Professor to help him prepare a demo in the Lab this morning
I cannot fulfil everything with maximum precision

Help:
• Low-complexity methods
• Flexible systems
• Approximative methods
• Results for qualitative evaluations & for supporting decisions
• Anytime techniques

Need for optimization
• Traditionally: optimization = precision
• New definition: optimization = cost optimization
• But what is cost!? Precision and certainty also carry a cost

Let's look at "TIME" as a resource:
• The most important thing is to go to the Lab and help my Professor (he is my Professor and I have promised it). I will spend as much time there as needed, min. 3 hours
• I have to submit the homework, but I will work in the Lab, i.e. today I will prepare an "average"- and not a "maximum"-level homework (1 hour)
• I don't have time to eat at home, I will buy a bento at the station (5 minutes)
• The train is more expensive than the bus but takes much less time, i.e.
I will go by train (40 minutes)

User's comfort
• I have to ask the way to the university but, unfortunately, I don't speak Japanese
• Next time I also want to find my way
• Today it took one and a half hours to get here. How about tomorrow?
• It would be good to get more help
• ...

Help:
• Modeling methods and representation techniques making it possible to
  – handle
  – interpret
  – predict
  – improve
  – optimise
  the system and give more and more support in the processing
• Human language
• Modularity, simplicity, hierarchical structures
• Aims of preprocessing: improving the performance of the algorithms, giving more support to the further processing
• (New) image processing / computer vision: preprocessing (noise smoothing, feature extraction: edge and corner detection) → processing (pattern recognition, etc.) → 3D modeling, medical diagnostics, etc. → automatic 3D modeling, automatic ...

The most important elements of the solution
• Low-complexity, approximative modeling
• Application of adaptive and robust techniques
• Definition and application of the proper cost function, including the hierarchy and measure of importance of the elements
• Trade-off between accuracy (granularity) and complexity (computational time and resource need)
• Giving support for the further processing
Traditional and AI methods cannot cope with these. But how about the new approaches, about COMPUTATIONAL INTELLIGENCE?

What is Computational Intelligence?
Computer (+ increased computer facilities) + Intelligence (added by the new methods)

L.A. Zadeh, Fuzzy Sets [1965]: "In traditional – hard – computing, the prime desiderata are precision, certainty, and rigor. By contrast, the point of departure of soft computing is the thesis that precision and certainty carry a cost and that computation, reasoning, and decision making should exploit – whenever possible – the tolerance for imprecision and uncertainty."

What is Computational Intelligence?
• CI can be viewed as a consortium of methodologies which play an important role in the conception, design, and utilization of information/intelligent systems.
• The principal members of the consortium are: fuzzy logic (FL), neuro computing (NC), evolutionary computing (EC), anytime computing (AC), probabilistic computing (PC), chaotic computing (CC), and (parts of) machine learning (ML).
• The methodologies are complementary and synergistic, rather than competitive.
• What is common: exploit the tolerance for imprecision, uncertainty, and partial truth to achieve tractability, robustness, low solution cost, and better rapport with reality.

Computational Intelligence fulfills all five requirements:
• low-complexity, approximative modeling
• application of adaptive and robust techniques
• definition and application of the proper cost function, including the hierarchy and measure of importance of the elements
• trade-off between accuracy (granularity) and complexity (computational time and resource need)
• giving support for the further processing

How does CI work?
1. Knowledge
• Information acquisition (observation)
• Information processing (numeric, symbolic)
• Storage and retrieval of the information
• Search for a "structure" (an algorithm for the non-algorithmizable processing)
• Certain knowledge (can be obtained by formal methods; closed and open worlds: ABSTRACT WORLDS)
• Uncertain knowledge (by cognitive methods) (ARTIFICIAL and REAL WORLDS)
• Lack of knowledge
• Knowledge representation

• In real life nearly everything is optimization
• Ex. 1. Determination of the velocity = calculation of the optimum estimate of the velocity from the measured time and distance covered
• Ex. 2. Determination of the resistance = the optimum estimate of the resistance with the help of the measured current and voltage
• Ex. 3.
Analysis of a measurement result = the optimum estimate of the measured quantity given the conditions of the measurement and the measured data
• Ex. 4. Daily timetable
• Ex. 5. Optimum route between two towns
In Ex. 1–3 the criterion of the optimization is unambiguous and can easily be given; Ex. 4–5 are also simple tasks, but the criterion is not unambiguous

Optimum route: What is optimum? (Subjective, depending on the requirements, taste, and limits of the person)
– We prefer/are able to travel by aeroplane, train, car, ...
Let's say the car is selected:
– the shortest route (min. petrol need), the quickest route (motorway), the most beautiful route with sights (whenever it is possible, I never miss the view of Fuji-san ...), where the best restaurants are located, where I can visit my friends, ...
OK, let's fix the preferences of a certain person:
– But is it summer or winter, is it sunny or raining, how about the road reconstructions, ...?
By going into the details we get nearer and nearer to the solution.
Knowledge is needed for the determination of a good descriptive model of the circumstances and goals.
But do we know what kind of weather there will be in two months?

2. Model
• Known model, e.g. an analytic model (given by differential equations) – too complex to be handled
• Lack of knowledge – the information about the system is uncertain or imperfect
We need new, more precise knowledge.
The knowledge representation (model) should be manageable and should tolerate the problems.

Learning and Modeling
New knowledge by learning: unknown, partially unknown, known-but-too-complex-to-be-handled, or ill-defined systems
A model by which we can analyze the system and predict its behavior
+ Criteria (quality measure) for the validity of the model

Input u → Unknown system → d; Model → y; Criterion c = measure of the quality of the model; parameter tuning
1. Observation (u, d, y)
2. Knowledge representation (model, formalism)
3.
Decision (optimization, c(d, y))
4. Tuning (of the parameters)
5. Environmental influence (non-observed input, noise, etc.)
6. Prediction ability (for future input)

Iterative procedure:
We build a system for collecting information → we improve the system by building in the knowledge → we collect the information → we improve the observation and collect more information

Problem → Knowledge representation, Model
Non-represented part of the problem | Represented knowledge
An independent space, coupled to the problem by the formalism

3. Optimization
• Valid where the model is valid
• Given a system with free parameters
• Given an objective measure
• The task is to set the parameters which minimize or maximize the qualitative measure
• Systematic and random methods
• Exploitation (of the deterministic knowledge) and exploration (of new knowledge)

Methods of Computational Intelligence
• fuzzy logic – low complexity, easy building-in of a priori knowledge into computers, tolerance for imprecision, interpretability
• neuro computing – learning ability
• evolutionary computing – optimization, optimum learning
• anytime computing – robustness, flexibility, adaptivity, coping with the temporal circumstances
• probabilistic reasoning – uncertainty, logic
• chaotic computing – open mind
• machine learning – intelligence

Fuzzy Logic
• Lotfi Zadeh, 1965
• Knowledge representation in natural language
• "computing with words"
• Perceptions
• Value imprecisiation ↔ meaning precisiation

History of fuzzy theory
• Fuzzy sets & logic: Zadeh 1964/1965–
• Fuzzy algorithm: Zadeh 1968–(1973)
• Fuzzy control by linguistic rules: Mamdani et al. ~1975–
• Industrial applications: Japan 1987– (fuzzy boom), Korea: home electronics, vehicle control, process control, pattern recognition & image processing, expert systems; military systems (USA ~1990–), space research
• Applications to very complex control problems: Japan 1991–, e.g.
helicopter autopilot

Areas in which fuzzy logic has been successfully used:
• Modeling and control
• Classification and pattern recognition
• Databases
• Expert systems
• (Fuzzy) hardware
• Signal and image processing
• Etc.

• Universe of discourse: the Cartesian (direct) product of all the possible values of each of the descriptors
• Linguistic variable (linguistic term) [Zadeh]: "By a linguistic variable we mean a variable whose values are words or sentences in a natural or artificial language. For example, Age is a linguistic variable if its values are linguistic rather than numerical, i.e., young, not young, very young, quite young, old, not very old and not very young, etc., rather than 20, 21, 22, 23, ..."
• Fuzzy set: represents a property of the linguistic variable. A degree of inclusion is associated with each of the possible values of the linguistic variable (characteristic function)
• Membership value: the degree of belonging to the set.

An Example
• A class of students (e.g. M.Sc. students taking the Spec. Course "Computational Intelligence")
• The universe of discourse: X
• "Who has a driver's license?" A subset of X = a (crisp) set; χ(x) = CHARACTERISTIC FUNCTION: 1 0 1 1 0 1 1
• "Who can drive very well?" μ(x) = MEMBERSHIP FUNCTION: 0.7 0 1.0 0.8 0 0.4 0.2 – a FUZZY SET

Definitions
• Convex set: A is not convex, as there are a∈A, c∈A with d = λa + (1−λ)c ∉ A for some λ∈[0, 1]. B is convex, as for every x, y∈B and λ∈[0, 1], z = λx + (1−λ)y ∈ B.
• Subset: if x∈A then also x∈B. Then A⊆B.
• Relative complement or difference: A−B = {x | x∈A and x∉B}. E.g. (taking A = {1, 2, 3, 4, 5, 6}): B = {1, 3, 4, 5}, A−B = {2, 6}; C = {1, 3, 4, 5, 7, 8}, A−C = {2, 6}!
• Complement: Ā = X−A, where X is the universe. Complementation is involutive: ¬¬A = A. Basic properties: ¬∅ = X, ¬X = ∅.
• Union: A∪B = {x | x∈A or x∈B}. For a family {Ai | i∈I}: ∪i Ai = {x | x∈Ai for some i}. A∪X = X, A∪∅ = A, A∪Ā = X (law of excluded middle).
• Intersection: A∩B = {x | x∈A and x∈B}.
For a family {Ai | i∈I}: ∩i Ai = {x | x∈Ai for all i}. A∩X = A, A∩∅ = ∅, A∩Ā = ∅ (law of contradiction).
• More properties:
  – Commutativity: A∪B = B∪A, A∩B = B∩A.
  – Associativity: A∪B∪C = (A∪B)∪C = A∪(B∪C), A∩B∩C = (A∩B)∩C = A∩(B∩C).
  – Idempotence: A∪A = A, A∩A = A.
  – Distributivity: A∪(B∩C) = (A∪B)∩(A∪C), A∩(B∪C) = (A∩B)∪(A∩C).

Membership function
• Crisp set: characteristic function χA: X → {0, 1}
• Fuzzy set: membership function μA: X → [0, 1]
[Piecewise-linear membership-function examples B and C: the formulas are garbled in the source.]

Some basic concepts of fuzzy sets

Element | Infant | Adult | Young | Old
   5    |   0    |   0   |   1   |  0
  10    |   0    |   0   |   1   |  0
  20    |   0    |  .8   |  .8   |  .1
  30    |   0    |   1   |  .5   |  .2
  40    |   0    |   1   |  .2   |  .4
  50    |   0    |   1   |  .1   |  .6
  60    |   0    |   1   |   0   |  .8
  70    |   0    |   1   |   0   |   1
  80    |   0    |   1   |   0   |   1

• Support: supp(A) = {x | μA(x) > 0}. μInfant ≡ 0, so supp(Infant) = ∅. If |supp(A)| < ∞, A can be written A = μ1/x1 + μ2/x2 + … + μn/xn = Σi μi/xi; in the continuous case A = ∫X μA(x)/x.
• Kernel (nucleus, core): kernel(A) = {x | μA(x) = 1}.

Definitions
• Height: height(A) = max_x μA(x) = sup_x μA(x)
  – height(Old) = 1, height(Infant) = 0
  – If height(A) = 1, A is normal; if height(A) < 1, A is subnormal
  – height(∅) = 0 (if height(A) = 0 then supp(A) = ∅)
• α-cut: Aα = {x | μA(x) ≥ α}; strong α-cut: Aα+ = {x | μA(x) > α}
  – Young0.8+ = {5, 10}; Young0.8 = {5, 10, 20}
  – Kernel: A1 = {x | μA(x) ≥ 1}; support: A0+ = {x | μA(x) > 0}
  – If A is subnormal, kernel(A) = ∅
  – Aα ⊆ Aβ if α ≥ β

• Fuzzy set operations defined by L.A. Zadeh in 1964/1965:
  – Complement: μ¬A(x) = 1 − μA(x)
  – Intersection: μA∩B(x) = min{μA(x), μB(x)}
  – Union: μA∪B(x) = max{μA(x), μB(x)}

This is really a generalization of the crisp set operations (for crisp sets μ(x) is 1 if x belongs to the set and 0 otherwise):

A B | ¬A (1−A) | A∩B (min) | A∪B (max)
0 0 |    1     |     0     |     0
0 1 |    1     |     0     |     1
1 0 |    0     |     0     |     1
1 1 |    0     |     1     |     1

Fuzzy Proposition
• Fuzzy proposition: X is P. 'Tina is young', where 'Tina': crisp age, 'young': fuzzy predicate.
• Fuzzy sets expressing linguistic terms for ages; truth claims – fuzzy sets over [0, 1]
• Fuzzy-logic-based approximate reasoning is most important for applications!

Crisp relation: some interaction or association between elements of two or more sets.
Fuzzy relation: various degrees of association can be represented (e.g. degrees 0.5, 0.8, 1, 0.9, 0.6 on the pairs instead of crisp 0/1 links).
Cartesian (direct) product of two (or more) sets X, Y:
X × Y = {(x, y) | x∈X, y∈Y}; X × Y ≠ Y × X if X ≠ Y!
More generally: ×i=1..n Xi = {(x1, x2, …, xn) | xi∈Xi, i = 1, …, n}

Fuzzy Logic Control
• Fuzzification: converts the numerical value to a fuzzy one; determines the degree of matching
• Defuzzification: converts the fuzzy term to a classical numerical value
• The knowledge base contains the fuzzy rules
• The inference engine describes the methodology to compute the output from the input

Fuzzification
The measured (crisp) value is converted to a fuzzy set containing one element with membership value 1:
μ(x) = 1 if x = 8.4, 0 otherwise

Defuzzification
Center of Gravity method (COG):
y_COG = ∫Y y·μ(y) dy / ∫Y μ(y) dy

Specificity of fuzzy partitions
Fuzzy partition A containing three linguistic terms; fuzzy partition A* containing seven linguistic terms

Fuzzy inference mechanism (Mamdani)
• If x1 = A1,i and x2 = A2,i and ... and xn = An,i then y = Bi
w_{j,i} = max_{xj} min{μX(xj), μ_{Aj,i}(xj)}
The weighting factor w_{j,i} characterizes how well the input xj corresponds to the rule antecedent fuzzy set Aj,i in one dimension.
w_i = min{w_{1,i}, w_{2,i}, …, w_{n,i}}
The weighting factor w_i characterizes how well the input x fulfils the antecedents of the rule Ri.
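The whole Mamdani pipeline – fuzzification of a crisp input, min-based rule firing, clipping of the consequents, max-aggregation, and COG defuzzification – can be sketched in a few lines. This is a minimal illustration using the temperature → motor-speed rules of the lecture; the triangular membership functions and the [0, 100] universes are assumptions for the sketch, not the partitions behind the lecture's numeric example.

```python
# Minimal Mamdani fuzzy controller sketch (singleton fuzzification,
# min firing, clipped consequents, max aggregation, discretized COG).
# Membership-function shapes and universes are illustrative assumptions.

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Antecedents over temperature [0, 100], consequents over motor speed [0, 100]
temp_sets  = {"COLD": (0, 0, 50),  "WARM": (20, 50, 80),   "HOT": (50, 100, 100)}
speed_sets = {"LOW":  (0, 0, 50),  "MEDIUM": (20, 50, 80), "HIGH": (50, 100, 100)}
rules = [("COLD", "LOW"), ("WARM", "MEDIUM"), ("HOT", "HIGH")]

def infer(temperature, n=1001):
    ys = [100 * i / (n - 1) for i in range(n)]
    agg = [0.0] * n
    for ant, cons in rules:
        w = tri(temperature, *temp_sets[ant])            # firing weight w_i
        for i, y in enumerate(ys):
            clipped = min(w, tri(y, *speed_sets[cons]))  # Mamdani min-clipping
            agg[i] = max(agg[i], clipped)                # max-aggregation
    num = sum(y * m for y, m in zip(ys, agg))            # centre of gravity
    den = sum(agg)
    return num / den if den else None
```

With these assumed partitions a cold reading drives the defuzzified speed low and a hot one drives it high; the exact numbers differ from the lecture's Temperature = 55 example because the membership functions here are only illustrative.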
The conclusion of rule Ri for a given observation x is the fuzzy set
μ_{yi}(y) = min(w_i, μ_{Bi}(y))

Fuzzy Inference
• Mamdani type

Fuzzy systems: an example (TEMPERATURE → MOTOR_SPEED)
Fuzzy systems operate on fuzzy rules:
IF temperature is COLD THEN motor_speed is LOW
IF temperature is WARM THEN motor_speed is MEDIUM
IF temperature is HOT THEN motor_speed is HIGH

Inference mechanism (Mamdani)
Temperature = 55 → (Rule 1, Rule 2, Rule 3) → Motor speed = 43.6

Planning of Fuzzy Controllers
Determination of fuzzy controllers = determination of the antecedents + consequents of the rules
• Antecedents:
  – Selection of the input dimensions
  – Determination of the fuzzy partitions for the inputs
  – Determination of the parameters of the fuzzy variables
• Consequents:
  – Determination of the parameters

Fuzzy-controlled Washing Machine (Aptronix examples)
• Objective: design a washing machine controller which gives the correct wash time even though a precise model of the input/output relationship is not available
• Inputs: dirtiness, type of dirt
• Output: wash time
• Rules for our washing machine controller are derived from common-sense data taken from typical home use and from experimentation in a controlled environment. A typical intuitive rule is as follows: If saturation time is long and transparency is bad, then wash time should be long.

Air Conditioning Temperature Control
• There is a sensor in the room to monitor temperature for feedback control, and there are two control elements, a cooling valve and a heating valve, to adjust the air supply temperature to the room.
• Temperature control has several unfavorable features: non-linearity, interference, dead time, external disturbances, etc. Conventional approaches usually do not result in satisfactory temperature control.
Rules for this controller may be formulated using statements similar to:
If temperature is low then open the heating valve greatly.

Air Conditioning Temperature Control – Modified Model
• There are two sensors in the modified system: one to monitor temperature and one to monitor humidity. There are three control elements: a cooling valve, a heating valve, and a humidifying valve, to adjust the temperature and humidity of the air supply. Rules for this controller can be formulated by adding rules for humidity control to the basic model:
If temperature is low then open the humidifying valve slightly.
This rule acts as a predictor of humidity (it leads the humidity value) and is also designed to prevent overshoot in the output humidity curve.

Smart Cars 1 – Rules
The number of rules depends on the problem. We shall consider only two, for the simplicity of the example:
Rule 1: If the distance between the two cars is short and the speed of your car is high(er than the other one's), then brake hard.
Rule 2: If the distance between the two cars is moderately long and the speed of your car is high(er than the other one's), then brake moderately hard.

Smart Cars 2 – Membership Functions
– Determine the membership functions for the antecedent and consequent blocks
– Most frequently 3, 5 or 7 fuzzy sets are used (3 for crude control, 5 and 7 for finer control results)
– Typical shapes (triangular – most frequent)

Smart Cars 3 – Simplify Rules using Codes
– Distance between the two cars: X1, speed: X2, braking strength: Y
– Labels: small, medium, large: S, M, L; PL – Positive Large, PM – Positive Medium, PS – Positive Small, ZR – Approximately Zero, NS – Negative Small, NM – Negative Medium, NL – Negative Large
– In the case of X2 (speed), small, medium, and large mean the amount by which this car's speed is higher than that of the car in front.
– Rule 1: If X1 = S and X2 = M, then Y = L
– Rule 2: If X1 = M and X2 = L, then Y = M

Smart Cars 4 – Inference
– Determine the degree of matching
– Adjust the consequent block
– Total evaluation of the conclusions based on the rules
To determine the control amount at a certain point, a defuzzifier is used (e.g. the center of gravity). In this case the center of gravity is located at a position somewhat harder than medium strength, as indicated by the arrow.

Advantages of Fuzzy Controllers
• The control design process is simpler
• Design complexity is reduced, without the need for complex mathematical analysis
• Code is easier to write and allows detailed simulations
• More robust, as tests with weight changes demonstrate
• The development period is reduced

Neural Networks
• McCulloch & Pitts, 1943; Hebb, 1949
• Rosenblatt, 1958 (Perceptron)
• Widrow-Hoff, 1960 (Adaline)
• They mimic the human brain

Neural nets are parallel, distributed information processing tools which are
• Highly connected systems composed of identical or similar operational units performing local processing (processing elements, neurons), usually in a well-ordered topology
• Possessing some kind of learning algorithm, which usually means learning by patterns and which also determines the mode of the information processing
• They also possess an information recall algorithm, making possible the usage of the previously learned information

Application areas where NNs are successfully used
• One- and multidimensional signal processing (image processing, speech processing, etc.)
• System identification and control
• Robotics
• Medical diagnostics
• Estimation of economic features
• Associative memory = content-addressable memory
• Classification systems (e.g. pattern recognition, character recognition)
• Optimization systems (the usually feedback NN approximates the cost function) (e.g.
radio frequency distribution, A/D converters, the travelling salesman problem)
• Approximation systems (any input-output mapping)
• Nonlinear dynamic system models (e.g. solution of partial differential equation systems, prediction, rule learning)

Main features
• Complex, non-linear input-output mapping
• Adaptivity, learning ability
• Distributed architecture
• Fault-tolerant property
• Possibility of parallel analog or digital VLSI implementations
• Analogy with neurobiology

The simple neuron
A linear combiner with non-linear activation.

Typical activation functions: step, piecewise linear, hyperbolic tangent, sigmoid

Classical neural nets
• Static nets (without memory; feedforward networks)
  – One layer
  – Multilayer
    • MLP (Multi-Layer Perceptron)
    • RBF (Radial Basis Function)
    • CMAC (Cerebellar Model Articulation Controller)
• Dynamic nets (with memory or feedback/recall networks)
  – Feedforward (with memory elements)
  – Feedback
    • Local feedback
    • Global feedback

Feedforward architectures
• One-layer architectures: the Rosenblatt perceptron; input → output through tunable parameters (weighting factors)
• Multilayer network (static MLP net)

Approximation property
• Universal approximation property for some kinds of NNs
• Kolmogorov: any continuous real-valued N-variable function defined over the [0,1]^N compact interval can be represented with the help of appropriately chosen one-variable functions and the sum operation.

Learning
Learning = parameter estimation
• supervised learning
• unsupervised learning
• analytic learning

Supervised learning
Estimation of the model parameters from x, y, d:
Input x → System: d = f(x, n) (n: noise); NN model: y = fM(x, w); criterion C(d, y) = C(ε); parameter tuning

• Criterion function
  – Quadratic, e.g. C(ε) = ½ ε² = ½ (d − y)²
  – ...
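The supervised-learning loop above – compute the model output, form the error against the desired output d, and tune the weights along the negative gradient of the quadratic criterion – is the LMS idea in its simplest form. A minimal sketch for a single linear neuron follows; the training data, learning rate, and epoch count are illustrative assumptions.

```python
# Minimal LMS (least mean squares) sketch: tune the weights w of a single
# linear neuron y = w . x so that the quadratic criterion C = 1/2 (d - y)^2
# shrinks. Data, learning rate mu and epoch count are illustrative.

def lms_train(samples, n_inputs, mu=0.05, epochs=200):
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))   # neuron output
            eps = d - y                                 # temporary error
            # temporary gradient of C w.r.t. w is -eps * x, so step along +eps*x
            w = [wi + mu * eps * xi for wi, xi in zip(w, x)]
    return w

# Learn the mapping d = 2*x1 - x2 from a few consistent samples
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0),
        ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
w = lms_train(data, n_inputs=2)
```

Because the target mapping is exactly realizable by the linear neuron, the weights converge close to (2, −1) for a sufficiently small learning rate; with a nonlinear activation the same update generalizes to the delta rule, and layer by layer to backpropagation.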
• Minimization of the criterion
• Analytic solution (only if it is very simple)
• Iterative techniques
  – Gradient methods
  – Search methods
    • Exhaustive
    • Random
    • Genetic search

Parameter correction
• Perceptron rule
• Gradient methods
  – LMS (least mean squares algorithm)
• ...

LMS (iterative solution based on the temporary error)
• Temporary error: ε(k) = d(k) − y(k)
• Temporary gradient (of ½ε²(k) with respect to w): −ε(k)·x(k)
• Weight update: w(k+1) = w(k) + μ·ε(k)·x(k)

Gradient methods
• The route of the convergence
• Single neuron with nonlinear activation
• Multilayer network: backpropagation (BP)

Teaching an MLP network: the backpropagation algorithm

Design of MLP networks
• Size of the network (number of layers, number of hidden neurons)
• The value of the learning factor, μ
• Initial values of the parameters
• Validation; learning set, test set
• Teaching method (sequential, batch)
• Stopping criteria (error limit, number of cycles)

Modular networks
• Hierarchical networks
• Linear combination of NNs
• Mixture of experts
• Hybrid networks

Linear combination of networks
Mixture of experts (MOE): gating network + experts

Decomposition of complex tasks
• Decomposition and learning
  – Decomposition before learning
  – Decomposition during learning (automatic task decomposition)
• Problem space decomposition
  – Input space decomposition
  – Output space decomposition

Example: automatic recognition of numbers (e.g. postal codes)
• Binary pictures with 16×16 pixels
• Preprocessing (idea: the numbers are composed of edge segments): 4 edge detections + normalization → four 8×8 pictures (i.e. 256 input elements)
• Classification by 45 independent networks, each classifying only two of the ten figures (1 or 2, 1 or 3, ..., 8 or 0, 9 or 0)
• The corresponding network outputs are connected to an AND gate; if its output equals 1, the figure is recognized

Example: automatic recognition of handwritten figures (e.g.
postal codes)
[Figure: edge-detection masks – horizontal, vertical, diagonal \ and diagonal / – applied to the normalized input.]

Genetic Algorithms
• John Holland, 1975
• An adaptive method for search and optimization problems
• Copies the genetic processes of biological organisms
• Natural selection (Charles Darwin: The Origin of Species)
• Multi-point search

Successful application areas
• Optimization (circuit design, scheduling)
• Automatic programming
• Machine learning (classification, prediction, weather forecasting, learning of NNs)
• Economic systems
• Immunology
• Ecology
• Modeling of social systems

The algorithm
• Initial population → parent selection → creation of new individuals (crossover, mutation) → quality measure, reproduction → new generation → exit criteria?
• If no: continue with the algorithm
• If yes: selection of the result, decoding
• Like in biology, in the real world

Problem building
• Selection of the most important features, coding
• Fitness function = quality measure (optimum criterion)
• Exit criteria
• Selection of the size of the population
• Specification of the genetic operations

Simple genetic algorithms
• Representation = features coded in a binary string (chromosome, string)
• Fitness function = represents the "viability" (optimality) of the individual
• Selection = selecting the parent individuals from the generation (e.g. random but fitness-based, i.e. a better chance with a higher fitness value)
• Crossover: from 2 parents, two offspring (one-point, two-point, N-point, uniform)
• Mutation (of the bits (genes)) (one or independent)
• Reproduction = who will survive and form the next (new) generation
  – Individuals with the best fitness function
• Exit: after a number of generations, or depending on the fitness of the best individual or the average of the generation, ...
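A simple GA of this kind fits in a few lines of code. The sketch below uses the same toy task as the lecture's worked example (maximize f(x) = x² for x coded as a 5-bit string), with roulette-wheel selection, one-point crossover, and bitwise mutation; the population size, generation count, and mutation rate are illustrative choices.

```python
import random

# Simple GA sketch: maximize f(x) = x^2 with x coded as a 5-bit string (0..31).
# Population size, generation count and mutation rate are illustrative.

def fitness(bits):
    return int(bits, 2) ** 2

def roulette(pop):
    """Fitness-proportional (roulette-wheel) parent selection."""
    total = sum(fitness(b) for b in pop)
    r = random.uniform(0, total)
    acc = 0.0
    for b in pop:
        acc += fitness(b)
        if acc >= r:
            return b
    return pop[-1]

def evolve(pop_size=4, generations=30, p_mut=0.02):
    pop = ["".join(random.choice("01") for _ in range(5)) for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = roulette(pop), roulette(pop)      # fitness-based selection
            cut = random.randint(1, 4)                  # one-point crossover
            new_pop += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
        pop = ["".join(b if random.random() > p_mut else "10"[int(b)]
                       for b in ind) for ind in new_pop]  # bitwise mutation
    return max(pop, key=fitness)

random.seed(1)
best = evolve()
```

With a tiny population like this the run may converge to a suboptimal string; larger populations and more generations make reaching x = 31 increasingly likely, which mirrors the fitness improvement seen in the worked example.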
Example for GAs
Maximize the f(x) = x² function, where x can take values between 0 and 31.
Let's start with a population containing 4 elements (generated randomly by throwing a coin). Each element (string) consists of 5 bits (to be able to code the numbers between 0 and 31).

No. | Initial population | x value | f(x) | f(xi)/Σf(x) | Selected count
 1  | 01101 | 13 | 169 | 0.14 | 1
 2  | 11000 | 24 | 576 | 0.49 | 2
 3  | 01000 |  8 |  64 | 0.06 | 0
 4  | 10011 | 19 | 361 | 0.31 | 1
Sum: 1170, 1.00, 4; Average: 293, 0.25, 1; Maximum: 576, 0.49, 2

Selected string | Mate | Crossover position | New population | x value | f(x)
01101 | 2 | 4 | 01100 | 12 | 144
11000 | 1 | 4 | 11001 | 25 | 625
11000 | 4 | 2 | 11011 | 27 | 729
10011 | 3 | 2 | 10000 | 16 | 256
Sum: 1754; Average: 439; Maximum: 729

Conclusions
• The fitness improved significantly in the new generation (both the average and the maximum)
• Initial population: randomly chosen
• Selection: 4 spins of a roulette wheel where "better" individuals had bigger sectors, i.e. a bigger chance (the 3rd (worst) string has died out!)
• Pairs: the 1-2 and 3-4 selections
• Position of the crossover: randomly chosen
• Mutation: bit by bit, with p = 0.001 probability (the generation contains 20 bits, so on average 0.02 bits will be mutated – in this example none)

Anytime Techniques – Why do we need them?
• Larger-scale signal processing (DSP) systems, Artificial Intelligence
  – Limited amount of resources
  – Abrupt changes in
    • the environment
    • the processing system
    • the computational resources (shortage)
    • the data flow (loss)
  – Processing should be continued
• Low complexity → lower, but possibly sufficient, accuracy or partial results (for qualitative decisions) → anytime systems

Anytime Systems – What do they offer?
• Handling abrupt changes due to failures
• Fulfilling prescribed response-time conditions (changeable response time)
• Continuous operation in case of a serious shortage of the necessary data (temporary overload of certain communication channels, sensor failures, etc.)
or processing time
• Providing appropriate overall performance for the whole system: guaranteed response time, known error
• Flexibility: available input data, available time, computational power; balance between time and quality (quality: accuracy, resolution, etc.)

Anytime systems – How do they work?
• Conditions: on-line computing, guaranteed response time, limited resources (changing in time)
• Anytime processing: coping with the temporarily available resources to maintain the overall performance
• "Correct" models, treatable by the limited resources during limited time; low and changeable complexity; possibility of reallocation of the resources; changeable and guaranteed response time / computational need; known error
• Tools: iterative algorithms, other types of methods used in a modular architecture
• Optimization of the whole system (processing chain) based on intelligent decisions (expert system, shortage indicators)
• Algorithms and models of simpler complexity; temporarily lower accuracy
• Data for qualitative evaluations & for supporting decisions
• Coping with the temporal conditions
• Supporting 'early' decision making; preventing serious alarm situations

• Shortage indicators
• Intelligent monitor
• Special compilation methods during runtime
• Strict time constraints for the monitor
• The number and the complexity of the executable tasks can be very high → add-in + optimization

Missing input samples
Temporary overload of certain communication channels, sensor failures, etc.
The input samples fail to arrive in time or are lost → prediction mechanism (estimation based on previous data); example: resonator-based filters

Temporal shortage of computing power
A temporary shortage of computing power → the signal processing cannot be performed in time → trade-off between the approximation accuracy and the complexity: complexity-reduction techniques, reduction of the sampling rate, application of less accurate evaluations
Examples:
• Application of lower-order filters or transformers (in the case of recursive discrete transformers: switching off some of the channels; an obvious requirement: to maintain e.g. the orthogonality of the transformations)
• Singular Value Decomposition applied to fuzzy models, B-spline neural networks, wavelet functions, Gabor functions, etc. – fuzzy filters, the human hearing system, generalized NNs

Temporal shortage of computing time
A temporary shortage of computing time → the signal processing cannot be performed in time
Examples: block-recursive filters and filter banks; overcomplete signal representations

Anytime algorithms – iterative methods
• Evaluate 734/25! (After 1 second: approx. 30 → after 5 seconds: better, 29.3 → after 8 seconds: exactly 29.36.)
• We build a system for collecting information → we improve the system by building in the knowledge → we collect the information → we improve the observation and collect more information

Anytime algorithms – modular architecture
• Units = distinct/different implementations of a task with the same interface but different performance characteristics:
  – complexity
  – accuracy
  – error
  – transfer characteristic
• Selection by an expert system (Selection → Units A/1, A/2, A/3 of Module A; Units B/1, B/2, B/3 of Module B)

Engineering view: Practical issues
• A well-defined mathematical foundation, but there is a gap between the theory and the implementation
• When and which is working better?
(The theory cannot give any answer – or is too lazy to think it over?)
• How to choose the sizes/parameters/shapes/definitions/etc.?
• What if the axioms are inconsistent/incomplete? (The practical possibility can be 0.)
• Handling of the exceptions, e.g. the rule for "very young" overwrites the rule for "young"
• Good advice: modeling, a priori knowledge, iteration, hybrid systems, smooth systems/parameters (as near to the real world as possible)

Accuracy problems
• How can we handle accuracy problems if, e.g., we don't have any input information?
• What if, in time-critical applications, not only the stationary responses are to be considered?
• How can the different modeling/data representation methods interpret each other's results?
• New (classical + non-classical) measures are needed

Transients
• Dynamic systems: changes in the system cause transients
• Depending on the transfer function and on the actual implementation of the structure
• Strongly related to the "energy distribution" of the system
• Affected by the steps and the reconfiguration "route"
• Transients must be reduced and treated:
  – careful choice of the architecture (orthogonal structures have better transients)
  – multi-step reconfiguration: selection of the number and location of the intermediate steps
  – estimation of the effect of the transients

Is CI really a solution for unsolvable problems?
• Yes: the high number of successful applications and the new areas where automation became possible prove that Computational Intelligence can be a solution for otherwise unsolvable problems
• Although: with the new methods, new problems have arisen – to be solved by you
Future engineering is unthinkable without Computational Intelligence

Conclusions
• What is Computational Intelligence?
• What is the secret of its success?
• How does it work?
• What kinds of approaches/concepts are attached?
• New problems with open questions