Advanced Topics in Image Analysis and Machine Learning
Image Degradation and Restoration
Week 02
Faculty of Information Science and Engineering, Ritsumeikan University

Today's class outline
– Overview of Soft Computing: definition, applications, techniques
– Genetic Algorithms: introduction to Genetic Algorithms
– Image Restoration Project: introduction

Soft Computing: A Brief Introduction

Soft vs. Hard Computing
Hard computing, the conventional programming that we normally do, requires precise algorithms, known in advance, that generate the exact solution.
Soft computing differs from hard computing in that it is tolerant of imprecision, uncertainty, partial truth, and approximation.
The goal of soft computing is to find approximate solutions to both imprecisely and precisely formulated problems.

Soft Computing: The Idea
Exploit the tolerance for imprecision, uncertainty, and partial truth to achieve tractability, robustness, and low solution cost.
There are many intractable problems in the world for which no polynomial-time algorithm is known (e.g. VLSI layout design, cargo placement / packing optimisation).

Soft Computing Applications
– Handwriting and speech recognition
– Image processing and data compression
– Automotive system control and manufacturing
– Decision-support systems (planning and scheduling, etc.)
– Artificial vision systems
– Consumer electronics control
– …

Techniques of Soft Computing
Soft computing includes three major, complementary groups of techniques:
- Evolutionary Computation: Genetic Algorithms (GA) and Genetic Programming
- Neurocomputing: Artificial Neural Networks (ANN) and Machine Learning
- Fuzzy Sets and Fuzzy Logic, Rough Sets, and Probabilistic Reasoning

Overview of the Techniques
– Evolutionary Computation: algorithms for search and optimisation
– Neurocomputing: algorithms for learning and curve fitting
– Fuzzy Logic and Probabilistic Reasoning: algorithms for dealing with imprecision and uncertainty in decision-making
– Rough Sets: an approach for handling the uncertainty that arises from granularity in the domain of discourse

Evolutionary Computation: Genetic Algorithm
The idea (John Holland, 1970s):
- Evolution works biologically, so perhaps it will also work in simulated environments.
- Each possible solution (whether good or bad) is represented by a chromosome: a pattern of bits that works like DNA. Then:
  • Determine the better solutions
  • Mate these solutions to produce new solutions which are (hopefully) occasionally better than the parents
  • Repeat this for many generations

Artificial Neural Network (ANN)

ANN Training
ANNs are trained in a variety of ways. During the training process, the weights of the links between the different neurons are modified until the desired output is generated for each input case used in training.
[Figure: a small feed-forward network from an INPUT layer through hidden neurons to an OUTPUT; the numbers on the links (e.g. 0.9, 1.1, 1.7, 0.2, …) are the trainable weights.]

Fuzzy Sets and Fuzzy Logic
Sets with fuzzy boundaries: consider a set of tall people.
[Figure: membership functions over heights. Crisp set A jumps from 0 to 1 at 172 cm; fuzzy set B rises gradually, e.g. membership 0.5 at 172 cm and 0.9 at 186 cm.]
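To make the fuzzy-set idea concrete, here is a minimal MATLAB sketch that plots a crisp and a fuzzy membership function for "tall". The 172 cm and 186 cm breakpoints come from the figure above; the linear ramp between them is my own assumption (any gradual, monotonic curve would illustrate the same point).

```matlab
% Crisp vs. fuzzy membership for the set of "tall" people.
% 172 cm and 186 cm follow the figure; the linear ramp is one
% possible (assumed) shape for the fuzzy boundary.
heights = 150:0.5:200;

muA = double(heights >= 172);                        % crisp set A: step at 172 cm
muB = min(max((heights - 172) / (186 - 172), 0), 1); % fuzzy set B: gradual rise

plot(heights, muA, heights, muB);
xlabel('Height (cm)'); ylabel('Membership');
legend('Crisp set A', 'Fuzzy set B', 'Location', 'southeast');
```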
Rough Sets
A rough set is a set with vague boundaries. By comparison, a regular set has sharp boundaries: an element on the boundary either belongs to the set or does not.
The set membership of elements on the boundary of a rough set is undetermined, i.e. it is not known whether they belong to the set or not.
A rough set is characterised by its lower approximation, the collection of elements that belong to the set for sure, and its upper approximation, the collection of elements that may belong to the set. The boundary of a rough set is defined as the difference between the upper approximation and the lower approximation.

Simulated Annealing: The Problem
[Figure: a simple, smooth 2D objective with one clear global maximum, contrasted with a real-world objective with many peaks.]
Optimisation problem: search for the global maximum.

Simulated Annealing: The Idea
(Physical) annealing is the process of slowly cooling down a substance (such as a heated liquid metal).
Simulated annealing is a stochastic optimisation method implemented with the Metropolis algorithm (a Monte Carlo method):
– Current thermodynamic state = current solution (hopefully, the global maximum)
– Energy equation = objective function
– "Ground" (stable) state = global optimum

Genetic Algorithm (GA): Overview
– A class of probabilistic optimisation algorithms
– Inspired by the biological evolution process
– Uses the concepts of "Natural Selection" and "Genetic Inheritance" (Darwin, 1859)
– Originally developed by John Holland (1975)
Special features:
– Traditionally emphasises combining information from good parents (crossover)
– There are many GA variants, e.g. different reproduction models and operators

GA Overview (cont.)
– Particularly well suited to hard problems where little is known about the underlying search space
– Widely used in business, science, and engineering
Holland's original GA is now known as the simple genetic algorithm (SGA). Other GAs use different:
– Representations
– Mutations
– Crossovers
– Selection mechanisms

Function Optimisation
GAs are useful for solving multidimensional problems whose solution space contains many local maxima (or minima).
[Figure: a simple optimisation problem with a single maximum (no need to use a GA to solve this!), next to a real-world problem with one global and several local maxima.]
A standard method of finding maxima (or minima) is gradient ascent (gradient descent). Problem: this method may find only a local maximum!

Genetic Algorithm: The Idea
[Figure: several climbers at different heights (3.6 m, 7.5 m, 10.5 m, 13.2 m) on a multi-peaked landscape.]
The genetic algorithm uses multiple climbers in parallel to find the global optimum. Some iterations later, one climber has approached the global maximum: "I found the top!"

GA Stochastic Operators
– Selection replicates the most successful solutions found in a population, at a rate proportional to their relative quality.
– Crossover takes two distinct solutions and randomly mixes their parts to form novel solutions.
– Mutation randomly perturbs a candidate solution.

The Evolutionary Cycle
[Figure: initiate and evaluate the population → select the fittest parents → modify them to produce offspring → evaluate the offspring → the "strong" offspring join the population, the weakest members are deleted → repeat.]

GA Example: The MAXONE Problem
Suppose we want to maximise the number of ones in a string of 10 binary digits.
A gene can be encoded as a string of 10 binary digits, e.g. 0010110101.
The fitness f of a candidate solution to the MAXONE problem is the number of ones in its genetic code, e.g. f(0010110101) = 5.
We start with a population of n random strings. Suppose that n = 6.

Example (Initialisation)
Our initial population of parent genes is made from random binary data:
s1 = 1111010101, f(s1) = 7
s2 = 0111000101, f(s2) = 5
s3 = 1110110101, f(s3) = 7
s4 = 0100010011, f(s4) = 4
s5 = 1110111101, f(s5) = 8
s6 = 0100110000, f(s6) = 3
The fitness f of a parent gene is simply the sum of its bits.
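A minimal MATLAB sketch of this initialisation step, assuming the population is stored as a matrix with one gene per row (the variable names are my own):

```matlab
% MAXONE: initialise n random 10-bit genes and compute each gene's
% fitness, which is simply the number of ones (the sum of its bits).
n   = 6;                        % population size, as in the example
len = 10;                       % bits per gene

pop     = randi([0 1], n, len); % one gene per row, e.g. 1 1 1 1 0 1 0 1 0 1
fitness = sum(pop, 2);          % f(s) = sum of the bits in each row

disp([pop fitness]);            % show the genes next to their fitness values
```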
Selection
Selection is the operation used to choose the best parent genes from the current population for breeding a new child population.
Purpose: to focus the search on promising regions of the solution space.

Example (Selection)
Next we apply fitness-proportionate selection with the roulette-wheel method: individual i is chosen with probability
p(i) = f(i) / Σ f(j), where the sum runs over all individuals j.
We repeat the extraction as many times as there are individuals, so that the parent population keeps the same size (6 in our case).
[Figure: a roulette wheel with one slice per individual; each slice's area is proportional to that individual's fitness value.]

Example (Selection, continued)
Suppose that, after performing selection, we get the following population (the original parent is shown in parentheses):
s1` = 1111010101 (s1)
s2` = 1110110101 (s3)
s3` = 1110111101 (s5)
s4` = 0111000101 (s2)
s5` = 0100010011 (s4)
s6` = 1110111101 (s5)

Example (Crossover)
• Next we mate parent strings using crossover.
• For each pair of parents we decide, according to a crossover probability (for instance 0.6), whether to actually perform crossover or not.
• Suppose that we decide to perform crossover only for the pairs (s1`, s2`) and (s5`, s6`).
• For each pair, we randomly choose a crossover point: for instance, after bit 2 for the first pair and after bit 5 for the second pair.

Example (Crossover, cont.)
Before crossover:
s1` = 1111010101
s2` = 1110110101
s5` = 0100010011
s6` = 1110111101
After crossover:
s1`` = 1110110101
s2`` = 1111010101
s5`` = 0100011101
s6`` = 1110110011
Note: sometimes crossover results in no changes to the pair!

Example (Mutation)
The final step is to apply random mutation: each bit in the current gene population is flipped with a small probability (for instance 0.05).
Before mutation → after mutation (with fitness):
s1`` = 1110110101 → s1``` = 1110100101, f(s1```) = 6
s2`` = 1111010101 → s2``` = 1111110100, f(s2```) = 7
s3`` = 1110111101 → s3``` = 1110101111, f(s3```) = 8
s4`` = 0111000101 → s4``` = 0111000101, f(s4```) = 5
s5`` = 0100011101 → s5``` = 0100011101, f(s5```) = 5
s6`` = 1110110011 → s6``` = 1110110001, f(s6```) = 6
Purpose: mutation adds new information that may be missing from the current population.

Example: Results
• In one generation, the total population fitness improved from 34 to 37, i.e. by roughly 9%.
• At this point, we go through the same process all over again (repetition) until a stopping criterion is met. A sketch of one full cycle in code follows below.

Another Example: Maximise x²
Simple problem: maximise y = x² over the interval x = 0, 1, …, 31.
GA approach:
• Representation: 5-bit binary code, e.g. 01101 ↔ 13 (decimal)
• Population size: 4 genes (parents)
• Random initialisation
• Roulette-wheel selection
• One-point crossover, bit-wise mutation
We will show one generational cycle as an example.

x² Example: Selection
• Make sure you understand this slide! You will implement something similar during your image restoration coding project!
Probability calculation for gene s1 (fitness 169, i.e. x = 13; the four initial genes' fitness values sum to 1170):
Prob(s1) = 169 / 1170 ≈ 0.144
Expected count(s1) = Prob(s1) × n = 0.144 × 4 ≈ 0.58

x² Example: Crossover
• Each pair of genes may undergo crossover.
• The crossover points are randomly selected.
• Notice that, after crossover, the average population fitness increased from 293 to 439, and the best gene's fitness increased from 576 to 729!

x² Example: Mutation
• Every gene bit may undergo mutation (based on the mutation rate).
• Notice that, after mutation, the average population fitness increased from 439 to 588 (the best gene's fitness did not change, though)!
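Putting the three operators together, here is a hedged MATLAB sketch of one generational cycle for MAXONE, continuing from the initialisation sketch above (roulette-wheel selection via a cumulative sum, one-point crossover with probability 0.6, bit-wise mutation with probability 0.05). The structure and names are my own illustration, not prescribed code for the project.

```matlab
% One GA generation for MAXONE: roulette-wheel selection,
% one-point crossover (p = 0.6), bit-wise mutation (p = 0.05).
pCross = 0.6;  pMut = 0.05;
[n, len] = size(pop);                      % pop from the initialisation sketch

% --- Selection: spin the roulette wheel n times ------------------------
fitness = sum(pop, 2);
wheel   = cumsum(fitness) / sum(fitness);  % cumulative selection probabilities
parents = zeros(n, len);
for i = 1:n
    parents(i, :) = pop(find(rand <= wheel, 1), :);
end

% --- Crossover: mate consecutive pairs ----------------------------------
for i = 1:2:n-1
    if rand < pCross
        cut = randi(len - 1);              % random crossover point
        tail1 = parents(i, cut+1:end);     % swap the two tails
        parents(i,   cut+1:end) = parents(i+1, cut+1:end);
        parents(i+1, cut+1:end) = tail1;
    end
end

% --- Mutation: flip each bit with a small probability -------------------
mask = rand(n, len) < pMut;
pop  = double(xor(parents, mask));         % children replace the population
```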
GA Group Projects
• Today we will form teams of several students.
• Each team will implement a GA in Matlab (or C/Java/VB?) to restore a corrupted image.
• Each team should have one good programmer and access to a notebook computer (preferably with Matlab)!
• You will submit a written report and give a short presentation in week 15.

GA Group Project: Details
The corruption source is additive noise:
N(row, col) = NoiseAmp × sin(2π × NoiseFreqRow × row + 2π × NoiseFreqCol × col)
Teams must code a simple GA that optimises the three unknown constants NoiseAmp, NoiseFreqRow, and NoiseFreqCol so that the restoration error (the difference between the original image and the GA-restored image) is minimised. To make things easy, we will measure the average per-pixel restoration error:
Restoration error = (Ioriginal + NoiseGA) − Icorrupted
where Ioriginal is the original, uncorrupted Lena image, Icorrupted is the corrupted image (which I will give you), and NoiseGA is the corruption noise modelled by the GA using the noise equation above.

GA Group Project: Details (cont.)
Each iteration of your GA will, for each gene in the population:
– Generate new values for NoiseAmp, NoiseFreqRow, and NoiseFreqCol.
– Corrupt the original image using the equation N(row, col) = NoiseAmp × sin(2π × NoiseFreqRow × row + 2π × NoiseFreqCol × col).
– Measure the restoration error (subtract the corrupted image from the original image). This becomes the (inverse of the) gene's fitness.
– Make new child genes using the selection, crossover, and mutation functions.
The search ranges for the three variables are:
– NoiseAmp: 0 to 30.0
– NoiseFreqRow: 0 to 0.01
– NoiseFreqCol: 0 to 0.01
Each gene encodes all three variables. If you use 1 byte per variable, each gene will be 24 bits; if you use 2 bytes per variable, 48 bits:
10110111 01010001 11001010 (24 bits per gene)
NoiseAmp NoiseFreqRow NoiseFreqCol
You need to map the (binary) integer value of each variable to a floating-point value, e.g. for NoiseAmp, 00000000 = 0.0 and 11111111 = 30.0.

Next Lecture
We will learn more about Genetic Algorithms (GAs).
We will discuss the image restoration project.
Read: Gonzalez and Woods.
Course website: http://www.ritsumei.ac.jp/~gulliver/iaml

Homework: Project Preparation
Start coding your GA. The user inputs are: population size (integer, e.g. 50), crossover rate (%, integer, e.g. 60), mutation rate (%, integer, e.g. 5), and total iterations (integer, e.g. 100).
– Make arrays to hold the genes' binary values.
– Fill the arrays with random binary data.
– Map each gene's binary values to the three noise parameters' (floating-point) values.
– Using the equation N(row, col) = NoiseAmp × sin(2π × NoiseFreqRow × row + 2π × NoiseFreqCol × col), calculate the corruption noise for each pixel of the image. Remember, the noise values can be negative, so use signed data types. A sketch of these steps follows below.
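As a starting point, here is a minimal MATLAB sketch of the decode-and-corrupt steps for a single 24-bit gene (1 byte per variable). The 256 × 256 image size, the variable names, and the fitness expression in the final comment are my assumptions; the noise equation and search ranges are from the slides.

```matlab
% Decode one 24-bit gene (1 byte per variable) into the three noise
% parameters, then build the noise image N(row, col).
gene = randi([0 1], 1, 24);                    % stand-in for one gene of the population

b2i = @(bits) sum(bits .* 2.^(numel(bits)-1:-1:0));  % unsigned binary -> integer

noiseAmp     = b2i(gene(1:8))   / 255 * 30.0;  % 00000000 -> 0.0, 11111111 -> 30.0
noiseFreqRow = b2i(gene(9:16))  / 255 * 0.01;  % mapped into 0 .. 0.01
noiseFreqCol = b2i(gene(17:24)) / 255 * 0.01;  % mapped into 0 .. 0.01

% Corruption noise for every pixel of an (assumed) 256x256 image.
% The noise can be negative, so keep it in signed double precision.
[row, col] = ndgrid(1:256, 1:256);
N = noiseAmp * sin(2*pi*noiseFreqRow*row + 2*pi*noiseFreqCol*col);

% A gene's fitness could then be, e.g., the inverse of the mean absolute
% restoration error:  1 / mean(abs(Ioriginal(:) + N(:) - Icorrupted(:)))
```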