Example I: Predicting the Weather

Let us study an interesting neural network application. Its purpose is to predict the local weather based on a set of current weather data:
• temperature (degrees Celsius)
• atmospheric pressure (inches of mercury)
• relative humidity (percentage of saturation)
• wind speed (kilometers per hour)
• wind direction (N, NE, E, SE, S, SW, W, or NW)
• cloud cover (0 = clear … 9 = total overcast)
• weather condition (rain, hail, thunderstorm, …)

We assume that we have access to the same data from several surrounding weather stations. There are eight such stations arranged around our position, roughly one in each of the eight compass directions at a distance of about 100 km.

How should we format the input patterns? We need to represent the current weather conditions by an input vector whose elements range in magnitude between zero and one. When we inspect the raw data, we find that there are two types of data that we have to account for:
• scaled, continuously variable values
• n-ary representations of category values

The following data can be scaled:
• temperature (−10 … 40 degrees Celsius)
• atmospheric pressure (26 … 34 inches of mercury)
• relative humidity (0 … 100 percent)
• wind speed (0 … 250 km/h)
• cloud cover (0 … 9)

We can simply scale each of these values so that its lower limit is mapped to some small value ε and its upper limit is mapped to (1 − ε). These numbers become components of the input vector.

Usually, wind speeds vary between 0 and 40 km/h. If we scaled wind speed over the full range from 0 to 250 km/h, we would account for all possible wind speeds but would normally use only a small fraction of the scale, so only the most extreme wind speeds would exert a substantial effect on the prediction. Consequently, we will use two scaled input values:
• wind speed ranging from 0 to 40 km/h
• wind speed ranging from 40 to 250 km/h

How about the non-scalable weather data?
• Wind direction is represented by an eight-component vector, in which only one element (or possibly two adjacent ones) is active, indicating one out of eight wind directions.
• The subjective weather condition is represented by a nine-component vector with at least one, and possibly more, active elements.

With this scheme, we can encode the current conditions at a given weather station with 23 vector components:
• one for each of the four remaining scaled parameters (temperature, pressure, humidity, cloud cover)
• two for wind speed
• eight for wind direction
• nine for the subjective weather condition

Since the input includes not only our station but also the eight surrounding ones (north, northeast, …, northwest), the input layer consists of 9 × 23 = 207 input neurons, which accept 207-component input vectors.
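To make the encoding concrete, here is a minimal sketch of how a single station's readings could be packed into the 23-component vector described above. It is not from the lecture: the value of ε, the clipping behaviour, the split of the two wind-speed components, and the list of nine weather-condition categories are assumptions for illustration.

```python
EPS = 0.1  # scaled values are mapped into [EPS, 1 - EPS]

DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
CONDITIONS = ["clear", "rain", "hail", "thunderstorm", "snow",
              "fog", "drizzle", "sleet", "windy"]   # nine assumed categories

def scale(value, lo, hi, eps=EPS):
    """Map value from [lo, hi] to [eps, 1 - eps], clipping out-of-range input."""
    value = min(max(value, lo), hi)
    return eps + (value - lo) / (hi - lo) * (1.0 - 2.0 * eps)

def encode_station(temp_c, pressure_inhg, humidity_pct, wind_kmh,
                   wind_dir, cloud_cover, conditions):
    vec = [
        scale(temp_c, -10.0, 40.0),               # temperature
        scale(pressure_inhg, 26.0, 34.0),         # atmospheric pressure
        scale(humidity_pct, 0.0, 100.0),          # relative humidity
        scale(cloud_cover, 0.0, 9.0),             # cloud cover
        scale(min(wind_kmh, 40.0), 0.0, 40.0),    # ordinary wind speeds
        scale(max(wind_kmh, 40.0), 40.0, 250.0),  # extreme wind speeds
    ]
    vec += [1.0 if d == wind_dir else 0.0 for d in DIRECTIONS]    # 8 components
    vec += [1.0 if c in conditions else 0.0 for c in CONDITIONS]  # 9 components
    return vec  # 23 components in total

# Nine stations (ours plus the eight neighbours) give a 9 * 23 = 207-component input.
print(len(encode_station(18.0, 30.1, 65.0, 22.0, "SW", 3, ["rain"])))
```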
What should the output patterns look like? We want the network to produce a set of indicators that we can interpret as a prediction of the weather 24 hours from now. In analogy to the weather forecast on the evening news, we decide to demand the following four indicators:
• a temperature prediction
• a prediction of the chance of precipitation
• an indication of the expected cloud cover
• a storm indicator (extreme-conditions warning)

Each of these four indicators can be represented by one scaled output value:
• temperature (−10 … 40 degrees Celsius)
• chance of precipitation (0% … 100%)
• cloud cover (0 … 9)
• storm warning, with two possible encodings:
  – 0: no storm warning; 1: storm warning
  – probability of a serious storm (0% … 100%)
Of course, the actual network outputs range from ε to (1 − ε), and after their computation they are, if necessary, scaled back to match the ranges specified above.

We decide (or experimentally determine) to use a hidden layer with 42 sigmoidal neurons. In summary, our network has
• 207 input neurons
• 42 hidden neurons
• 4 output neurons
Because the output vectors are small, 42 hidden units may suffice for this application.

The next thing we need to do is collect the training exemplars. First we have to specify what our network is supposed to do: in production mode, the network is fed with the current weather conditions, and its output is interpreted as the weather forecast for tomorrow. Therefore, in training mode, we have to present the network with exemplars that associate the known weather conditions at a time t − 24 hrs (as input) with the known conditions at time t (as desired output). So we have to collect a set of historical exemplars with a known correct output for every input.

Obviously, if such data are unavailable, we have to start collecting them. The selection of exemplars that we need depends, among other factors, on how much the weather changes at our location. For example, in Honolulu, Hawaii, our exemplars may not have to cover all seasons, because there is little variation in the weather. In Boston, however, we would need to include data from every calendar month because of the dramatic changes in weather across seasons. As we know, some winters in Boston are much harder than others, so it might be a good idea to collect data for several years.

And what about the granularity of our exemplar data, i.e., the frequency of measurement? Using one sample per day would be a natural choice, but it would neglect rapid changes in the weather. If we use hourly instantaneous samples, however, we increase the likelihood of conflicts (nearly identical inputs paired with very different outputs). Therefore, we decide to do the following: we will collect input data every hour, but the corresponding output pattern will be the average of the instantaneous patterns over a 12-hour period. This way we reduce the possibility of errors while increasing the amount of training data.
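As a concrete illustration of this sampling scheme, here is a minimal sketch of how the training exemplars might be assembled from hourly records. It is not from the lecture; the function names and the exact placement of the 12-hour averaging window (here starting at the 24-hour prediction time) are assumptions.

```python
def build_exemplars(inputs, outputs, horizon=24, window=12):
    """inputs[t]  : 207-component input vector measured at hour t
       outputs[t] : 4-component instantaneous output pattern at hour t
       Returns (input, target) pairs where the target is the average of the
       instantaneous output patterns over the 12 hours following t + horizon."""
    exemplars = []
    for t in range(len(inputs) - horizon - window + 1):
        window_outputs = outputs[t + horizon : t + horizon + window]
        target = [sum(component) / window for component in zip(*window_outputs)]
        exemplars.append((inputs[t], target))
    return exemplars
```

With hourly data over a full year, this produces roughly the 8,760 exemplars per year used in the next step.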
Now we have to train our network. If we use samples at one-hour intervals for one year, we have 8,760 exemplars. Our network has 207 × 42 + 42 × 4 = 8,862 weights, which means that data from ten years, i.e., 87,600 exemplars, would be desirable.

Rule of thumb: there should be at least 5 to 10 times as many training exemplars as there are weights in the network.

Since the hold-one-out training method is very time-consuming with such a large number of samples, we decide to use partial-set training instead. The best way to do this is to acquire a test set (control set), that is, another set of input–output pairs measured on random days and at random times. After training the network with the 87,600 exemplars, we can then use the test set to evaluate the performance of our network.

Neural network troubleshooting:
• Plot the global error as a function of the training epoch. The error should decrease after every epoch. If it oscillates, do the following tests.
• Try reducing the size of the training set. If the network then converges, a conflict may exist in the exemplars.
• If the network still does not converge, continue pruning the training set until it does converge. Then add exemplars back gradually, thereby detecting the ones that cause conflicts.
• If this still does not work, look for saturated neurons (extreme weights) in the hidden layer. If you find them, add more hidden-layer neurons, possibly an extra 20%.
• If there are no saturated units and the problems persist, try lowering the learning parameter and training longer.
• If the network converges but does not accurately learn the desired function, evaluate the coverage of the training set.
• If the coverage is adequate and the network still does not learn the function precisely, you could refine the pattern representation. For example, you could add a season indicator to the input, helping the network to discriminate between similar inputs that produce very different outputs.

Then you can start predicting the weather!

Further Examples

Online TensorFlow Neural Network Playground: http://bit.ly/1SM2VVh
The OCHRE demo applet for optical character recognition: http://www.sund.de/netze/applets/BPN/bpn2/ochre.html
… and if you are interested in Deep Learning: ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/

Computer Vision

A simple two-stage model of computer vision: a bitmap image is passed to the image processing stage, whose task is to prepare the image for scene analysis; its results are passed to the scene analysis stage, which builds an iconic model of the world and outputs a scene description. Feedback (tuning) flows from scene analysis back to image processing.
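To make the two-stage structure explicit, here is a minimal toy skeleton of such a pipeline. It is not from the lecture; the function names, the thresholding used as a stand-in for image processing, and the feedback rule are illustrative assumptions.

```python
# Toy skeleton of the two-stage model: image processing prepares the bitmap,
# scene analysis builds a problem-specific description, and the result can
# feed back to tune the image-processing parameters.

def image_processing(bitmap, params):
    # Produce a new "image" with the same spatial layout as the input,
    # here a simple binary map of bright pixels.
    t = params["threshold"]
    return [[1 if pixel >= t else 0 for pixel in row] for row in bitmap]

def scene_analysis(feature_map):
    # Interpret the processed image; here the "scene description" is just
    # the number of bright pixels found.
    return sum(sum(row) for row in feature_map)

def vision_system(bitmap, params, passes=2):
    description = None
    for _ in range(passes):
        features = image_processing(bitmap, params)
        description = scene_analysis(features)
        if description == 0 and params["threshold"] > 0:
            params["threshold"] -= 10   # feedback: relax the threshold
    return description

print(vision_system([[12, 200], [35, 180]], {"threshold": 128}))  # -> 2
```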
The image processing stage prepares the input image for the subsequent scene analysis. Usually, image processing results in one or more new images that contain specific information on relevant features of the input image. The information in the output images is arranged in the same way as in the input image: for example, in the upper left corner of the output images we find information about the upper left corner of the input image.

The scene analysis stage interprets the results from the image processing stage. Its output depends entirely on the problem that the computer vision system is supposed to solve. For example, it could be the number of bacteria in a microscopic image, or the identity of a person whose retinal scan was input to the system. In the following lectures we will focus on the lower-level techniques, i.e., image processing. Later we will discuss a variety of scene analysis methods and algorithms.

How can we turn a visual scene into something that can be algorithmically processed? Usually, we map the visual scene onto a two-dimensional array of intensities. In the first step, we have to project the scene onto a plane. This projection is most easily understood by imagining a transparent plane between the observer (camera) and the visual scene: the intensities from the scene are projected onto the plane by moving them along straight lines from their initial positions toward the observer. The result is a two-dimensional projection of the three-dimensional scene as it is seen by the observer.

[Figures: camera geometry, color imaging via a Bayer filter, and magnetic resonance imaging.]

Digitizing Visual Scenes

In this course, we will mostly restrict our concept of images to grayscale. Grayscale values usually have a resolution of 8 bits (256 different values), in medical applications sometimes 12 bits (4,096 values), or in binary images only 1 bit (2 values). We simply choose the available gray level whose intensity is closest to the value we want to convert.

With regard to spatial resolution, we map the intensity in our image onto a two-dimensional finite array of pixels, e.g. for a 3×4 array:

[0, 0] [0, 1] [0, 2] [0, 3]
[1, 0] [1, 1] [1, 2] [1, 3]
[2, 0] [2, 1] [2, 2] [2, 3]

(with the image axis x' running horizontally and y' vertically).

So the result of our digitization is a two-dimensional array of discrete intensity values. Notice that in such a digitized image F[i, j]
• the first coordinate i indicates the row of a pixel, starting with 0,
• the second coordinate j indicates the column of a pixel, starting with 0.
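As a small illustration of this representation, here is a sketch of a digitized grayscale image stored as a 2D array F[i, j] with 8-bit quantization. It is not from the lecture; the array size and the sample intensity values are made up.

```python
import numpy as np

# A digitized grayscale image: a 2D array F[i, j] of 8-bit intensities,
# where i is the row and j the column, both starting at 0.
m, n = 3, 4                          # m rows, n columns
F = np.zeros((m, n), dtype=np.uint8)

def quantize(intensity, levels=256):
    """Map a continuous intensity in [0, 1] to the closest of the available
    gray levels (256 for 8-bit, 4096 for 12-bit, 2 for binary images)."""
    intensity = min(max(intensity, 0.0), 1.0)
    return int(round(intensity * (levels - 1)))

F[0, 0] = quantize(0.03)             # dark pixel in the upper left corner
F[2, 3] = quantize(0.97)             # bright pixel in the lower right corner
print(F)
```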
In an m×n pixel array, the relationship between image coordinates (with the origin at the center of the image) and pixel coordinates is given by the equations

x' = j − (n − 1)/2,    y' = (m − 1)/2 − i.

[Figures: image size and resolution; intensity resolution.]

Intensity Transformation

Sometimes we need to transform the intensities of all image pixels to prepare the image for better visibility of information or for algorithmic processing.

Gamma Transformation

Gamma transformation: with intensities normalized to the range [0, 1], each pixel value I is mapped to I_new = c · I^γ. With c = 1, values of γ < 1 brighten the image (boosting dark regions), while γ > 1 darkens it.

Linear Histogram Scaling

For a desired intensity range [a, b] we can use the following linear transformation:

I_new = a + (I − Imin)(b − a)/(Imax − Imin)

Note that outliers (individual pixels of very low or high intensity) should be disregarded when computing Imin and Imax.
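As a sketch of these two point operations, the following code applies a gamma transformation and a linear histogram scaling to an 8-bit image. It is not from the lecture; the percentile-based handling of outliers and the parameter defaults are assumptions.

```python
import numpy as np

def gamma_transform(image, gamma=0.5, c=1.0):
    """I_new = c * I**gamma on intensities normalized to [0, 1]."""
    img = image.astype(np.float64) / 255.0
    return np.clip(255.0 * c * img ** gamma, 0, 255).astype(np.uint8)

def linear_scale(image, a=0, b=255, clip_percent=1.0):
    """Map [Imin, Imax] linearly onto [a, b]; Imin and Imax are taken as
    percentiles so that outlier pixels are disregarded."""
    img = image.astype(np.float64)
    i_min = np.percentile(img, clip_percent)
    i_max = np.percentile(img, 100.0 - clip_percent)
    scaled = a + (img - i_min) * (b - a) / max(i_max - i_min, 1e-9)
    return np.clip(scaled, a, b).astype(np.uint8)

# Example: stretch a low-contrast 8-bit image to the full range [0, 255].
low_contrast = np.random.randint(100, 140, size=(64, 64), dtype=np.uint8)
stretched = linear_scale(low_contrast)
print(low_contrast.min(), low_contrast.max(), stretched.min(), stretched.max())
```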