Example I: Predicting the Weather

Let us study an interesting neural network application.
Its purpose is to predict the local weather based on
a set of current weather data:
• temperature (degrees Celsius)
• atmospheric pressure (inches of mercury)
• relative humidity (percentage of saturation)
• wind speed (kilometers per hour)
• wind direction (N, NE, E, SE, S, SW, W, or NW)
• cloud cover (0 = clear … 9 = total overcast)
• weather condition (rain, hail, thunderstorm, …)
We assume that we have access to the same data
from several surrounding weather stations.
There are eight such stations that surround our
position in the following way:
[Diagram: the eight surrounding stations, one in each compass direction, about 100 km from our position]
How should we format the input patterns?
We need to represent the current weather conditions
by an input vector whose elements range in
magnitude between zero and one.
When we inspect the raw data, we find that there are
two types of data that we have to account for:
• Scaled, continuously variable values
• n-ary representations of category values
The following data can be scaled:
• temperature (-10… 40 degrees Celsius)
• atmospheric pressure (26… 34 inches of mercury)
• relative humidity (0… 100 percent)
• wind speed (0… 250 km/h)
• cloud cover (0… 9)
We can just scale each of these values so that its
lower limit is mapped to some small ε and its upper
limit is mapped to (1 - ε).
These numbers will be the components of the input
vector.
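For instance, a minimal Python sketch of this scaling (ε = 0.1 and the helper name scale are assumptions made here for illustration):

```python
# A possible scaling sketch; eps = 0.1 is an assumed choice.
EPS = 0.1

def scale(value, lo, hi, eps=EPS):
    """Map value from [lo, hi] to [eps, 1 - eps], clamping out-of-range inputs."""
    value = min(max(value, lo), hi)         # clamp to the expected range
    fraction = (value - lo) / (hi - lo)     # 0 ... 1
    return eps + fraction * (1.0 - 2.0 * eps)

# Example: 22 degrees Celsius on the -10 ... 40 scale
temperature_component = scale(22.0, -10.0, 40.0)   # about 0.61
```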
Usually, wind speeds vary between 0 and 40 km/h.
By scaling wind speed between 0 and 250 km/h, we
can account for all possible wind speeds, but usually
only make use of a small fraction of the scale.
Therefore, only the most extreme wind speeds will
exert a substantial effect on the weather prediction.
Consequently, we will use two scaled input values:
• wind speed ranging from 0 to 40 km/h
• wind speed ranging from 40 to 250 km/h
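A sketch of this two-component encoding, reusing the scale helper above (the exact handling of the 40 km/h boundary is an assumption):

```python
def encode_wind_speed(speed_kmh, eps=EPS):
    """Two components: one for the common 0-40 km/h range,
    one for the extreme 40-250 km/h range."""
    common = scale(min(speed_kmh, 40.0), 0.0, 40.0, eps)
    extreme = scale(max(speed_kmh, 40.0), 40.0, 250.0, eps)
    return [common, extreme]

# A 25 km/h breeze uses much of the first component's range,
# while the second component stays at its minimum (eps).
print(encode_wind_speed(25.0))   # [0.6, 0.1]
```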
How about the non-scalable weather data?
• Wind direction is represented by an eight-component
vector, where only one element (or possibly two
adjacent ones) is active, indicating one out of eight
wind directions.
• The subjective weather condition is represented by a
nine-component vector with at least one, and
possibly more, active elements.
With this scheme, we can encode the current conditions at a
given weather station with 23 vector components:
• one for each of the four directly scaled parameters
(temperature, pressure, humidity, cloud cover)
• two for wind speed
• eight for wind direction
• nine for the subjective weather condition
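A sketch of these categorical encodings (the ordering of the directions and the particular list of nine weather conditions are assumptions):

```python
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
CONDITIONS = ["rain", "hail", "thunderstorm", "snow", "fog",
              "drizzle", "sleet", "clear", "overcast"]   # nine assumed categories

def encode_direction(direction):
    """Eight-component vector with a single active element."""
    return [1.0 if d == direction else 0.0 for d in DIRECTIONS]

def encode_condition(observed):
    """Nine-component vector; several conditions may be active at once."""
    return [1.0 if c in observed else 0.0 for c in CONDITIONS]

print(encode_direction("NE"))                      # second element active
print(encode_condition({"rain", "thunderstorm"}))  # two active elements
```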
Since the input does not only include our station, but
also the eight surrounding ones, the input layer of the
network looks like this:
[Diagram: the input layer consists of nine groups of 23 neurons each, one group per station: our station, north, northeast, ..., northwest]
The network has 207 input neurons, which accept
207-component input vectors.
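Putting the pieces together, a sketch of how one 207-component input vector could be assembled from the helpers above (the observation dictionary keys and the station ordering are assumptions):

```python
def encode_station(obs, eps=EPS):
    """23 components for one station's current observations (obs is a dict)."""
    return ([scale(obs["temperature"], -10.0, 40.0, eps),
             scale(obs["pressure"], 26.0, 34.0, eps),
             scale(obs["humidity"], 0.0, 100.0, eps),
             scale(obs["cloud_cover"], 0.0, 9.0, eps)]
            + encode_wind_speed(obs["wind_speed"], eps)
            + encode_direction(obs["wind_direction"])
            + encode_condition(obs["conditions"]))

def encode_input(stations):
    """stations: observations for our station followed by the eight neighbors."""
    vector = []
    for obs in stations:            # 9 stations x 23 components = 207
        vector += encode_station(obs)
    return vector
```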
What should the output patterns look like?
We want the network to produce a set of indicators
that we can interpret as a prediction of the weather
24 hours from now.
In analogy to the weather forecast on the evening
news, we decide to demand the following four
indicators:
• a temperature prediction
• a prediction of the chance of precipitation occurring
• an indication of the expected cloud cover
• a storm indicator (extreme conditions warning)
Each of these four indicators can be represented by
one scaled output value:
• temperature (-10… 40 degrees Celsius)
• chance of precipitation (0%… 100%)
• cloud cover (0… 9)
• storm warning, for which there are two possible encodings:
– 0: no storm warning; 1: storm warning
– probability of a serious storm (0%… 100%)
Of course, the actual network outputs range from ε to
(1 - ε), and after their computation, if necessary, they
are scaled to match the ranges specified above.
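A sketch of mapping the raw outputs back to the physical ranges, as the inverse of the input scaling (eps and the ranges are the assumptions used above; the probability encoding of the storm warning is used):

```python
def unscale(activation, lo, hi, eps=EPS):
    """Inverse of scale(): map an activation in [eps, 1 - eps] back to [lo, hi]."""
    fraction = (activation - eps) / (1.0 - 2.0 * eps)
    return lo + fraction * (hi - lo)

def decode_output(out):
    """out: the four raw output activations of the network."""
    return {"temperature_C": unscale(out[0], -10.0, 40.0),
            "chance_of_precipitation_pct": unscale(out[1], 0.0, 100.0),
            "cloud_cover": unscale(out[2], 0.0, 9.0),
            "storm_probability_pct": unscale(out[3], 0.0, 100.0)}
```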
We decide (or experimentally determine) to use a
hidden layer with 42 sigmoidal neurons.
In summary, our network has
• 207 input neurons
• 42 hidden neurons
• 4 output neurons
Because of the small output vectors, 42 hidden units
may suffice for this application.
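For illustration, a minimal NumPy sketch of the 207-42-4 feed-forward pass with sigmoidal units (random initialization; biases and the training procedure are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight matrices of the 207-42-4 network.
W_hidden = rng.normal(scale=0.1, size=(42, 207))
W_output = rng.normal(scale=0.1, size=(4, 42))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    """x: 207-component input vector; returns the four output activations."""
    h = sigmoid(W_hidden @ x)      # 42 hidden activations
    return sigmoid(W_output @ h)   # 4 output activations
```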
The next thing we need to do is to collect the
training exemplars.
First we have to specify what our network is
supposed to do:
In production mode, the network is fed with the
current weather conditions, and its output will be
interpreted as the weather forecast for tomorrow.
Therefore, in training mode, we have to present the
network with exemplars that pair the weather conditions
observed at a time (t – 24 hrs), used as input, with the
known conditions at time t, used as the desired output.
So we have to collect a set of historical exemplars
with known correct output for every input.
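A sketch of assembling such exemplars from a chronological hourly record (records, encode_input, and encode_target are placeholders assumed for illustration):

```python
def build_exemplars(records, horizon_hours=24):
    """Pair the encoded conditions at hour k (input) with the encoded
    conditions observed horizon_hours later (desired output)."""
    exemplars = []
    for k in range(len(records) - horizon_hours):
        x = encode_input(records[k])                    # conditions at t - 24 hrs
        y = encode_target(records[k + horizon_hours])   # conditions at t
        exemplars.append((x, y))
    return exemplars
```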
Obviously, if such data are unavailable, we have to start
collecting them.
The selection of exemplars that we need depends, among
other factors, on how variable the weather is at our
location.
For example, in Honolulu, Hawaii, our exemplars may
not have to cover all seasons, because there is little
variation in the weather.
In Boston, however, we would need to include data from
every calendar month because of dramatic changes in
weather across seasons.
As we know, some winters in Boston are much harder
than others, so it might be a good idea to collect data for
several years.
And how about the granularity of our exemplar data,
i.e., the frequency of measurement?
Using one sample per day would be a natural choice,
but it would neglect rapid changes in weather.
If we use hourly instantaneous samples, however,
we increase the likelihood of conflicts.
Therefore, we decide to do the following:
We will collect input data every hour, but the
corresponding output pattern will be the average of
the instantaneous patterns over a 12-hour period.
This way we reduce the possibility of errors while
increasing the amount of training data.
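A sketch of that averaging step (the placement of the 12-hour window around the target hour is an assumption, since the slide does not specify it):

```python
def averaged_target(records, k, horizon_hours=24, window=12):
    """Average the instantaneous target patterns over a 12-hour window
    centered on hour k + horizon_hours."""
    center = k + horizon_hours
    half = window // 2
    samples = [encode_target(records[j])
               for j in range(center - half, center + half)]
    return [sum(values) / len(samples) for values in zip(*samples)]
```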
Now we have to train our network.
If we use samples in one-hour intervals for one year,
we have 8,760 exemplars.
Our network has 207 × 42 + 42 × 4 = 8,862 weights, which
means that data from ten years, i.e., 87,600
exemplars, would be desirable.
Rule of thumb: There should be at least 5 to 10 times
as many training exemplars as there are weights in
the network.
Since the hold-one-out training method is very time-consuming
with such a large number of samples, we decide
to use partial-set training instead.
The best way to do this would be to acquire a test set
(control set), that is, another set of input-output pairs
measured on random days and at random times.
After training the network with the 87,600 exemplars,
we could then use the test set to evaluate the
performance of our network.
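A rough sketch of that evaluation (train_network is an assumed placeholder; forward is the function from the network sketch above):

```python
import numpy as np

def evaluate(test_set):
    """Mean squared error of the trained network over an independent test set."""
    total = 0.0
    for x, target in test_set:
        prediction = forward(np.asarray(x))
        total += float(np.mean((prediction - np.asarray(target)) ** 2))
    return total / len(test_set)

# train_network(training_exemplars)      # assumed training routine (backpropagation)
# print("test-set MSE:", evaluate(control_set))
```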
Neural network troubleshooting:
• Plot the global error as a function of the training
epoch. The error should decrease after every
epoch. If it oscillates, do the following tests.
• Try reducing the size of the training set. If then the
network converges, a conflict may exist in the
exemplars.
• If the network still does not converge, continue
pruning the training set until it does converge. Then
add exemplars back gradually, thereby detecting
the ones that cause conflicts.
• If this still does not work, look for saturated neurons (extreme
weights) in the hidden layer. If you find those, add more
hidden-layer neurons, possibly an extra 20%.
• If there are no saturated units and the problems still exist, try
lowering the learning parameter η and training longer.
• If the network converges but does not accurately learn the
desired function, evaluate the coverage of the training set.
• If the coverage is adequate and the network still does not
learn the function precisely, you could refine the pattern
representation. For example, you could add a season
indicator to the input, helping the network to discriminate
between similar inputs that produce very different outputs.
Then you can start predicting the weather!
Further Examples Online
TensorFlow Neural Network Playground:
http://bit.ly/1SM2VVh
The OCHRE demo applet for optical character recognition:
http://www.sund.de/netze/applets/BPN/bpn2/ochre.html
… and if you are interested in Deep Learning:
ConvNetJS:
http://cs.stanford.edu/people/karpathy/convnetjs/
Computer Vision
A simple two-stage model of computer vision:
[Diagram: Bitmap image → Image processing ("prepare image for scene analysis") → Scene analysis ("build an iconic model of the world") → Scene description, with a feedback (tuning) path from scene analysis back to image processing]
The image processing stage prepares the input
image for the subsequent scene analysis.
Usually, image processing results in one or more new
images that contain specific information on relevant
features of the input image.
The information in the output images is arranged in
the same way as in the input image. For example, in
the upper left corner of the output images we find
information about the upper left corner of the input
image.
The scene analysis stage interprets the results from
the image processing stage.
Its output completely depends on the problem that the
computer vision system is supposed to solve.
For example, it could be the number of bacteria in a
microscopic image, or the identity of a person
whose retinal scan was input to the system.
In the following lectures we will focus on the lower-level techniques, i.e., image processing.
Later we will discuss a variety of scene analysis
methods and algorithms.
How can we turn a visual scene into something that can be
algorithmically processed?
Usually, we map the visual scene onto a two-dimensional
array of intensities.
In the first step, we have to project the scene onto a plane.
This projection can be most easily understood by imagining a
transparent plane between the observer (camera) and the
visual scene.
The intensities from the scene are projected onto the plane by
moving them along a straight line from their initial position to
the observer.
The result will be a two-dimensional projection of the three-dimensional scene as it is seen by the observer.
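This transparent-plane construction corresponds to the standard pinhole (perspective) projection; a brief sketch, assuming the scene point is given in camera coordinates and the plane lies at distance f in front of the observer:

```python
def project(point, f=1.0):
    """Perspective projection of a 3-D point (X, Y, Z) onto the image plane."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("the point must lie in front of the observer")
    return (f * X / Z, f * Y / Z)

print(project((2.0, 1.0, 4.0)))   # (0.5, 0.25)
```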
Camera Geometry
Color Imaging via Bayer Filter
Magnetic Resonance Imaging
Digitizing Visual Scenes
In this course, we will mostly restrict our concept of
images to grayscale.
Grayscale values usually have a resolution of 8 bits
(256 different values); medical applications sometimes
use 12 bits (4,096 values), and binary images use only
1 bit (2 values).
We simply choose the available gray level whose
intensity is closest to the gray value we want to
represent.
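A small sketch of that quantization step, assuming intensities normalized to [0, 1] and an 8-bit gray scale:

```python
def quantize(intensity, bits=8):
    """Map a normalized intensity in [0.0, 1.0] to the nearest of 2**bits gray levels."""
    levels = 2 ** bits
    return int(round(intensity * (levels - 1)))   # nearest available gray level

print(quantize(0.5))           # 128
print(quantize(0.7, bits=1))   # 1 (binary image: closest of the 2 levels)
```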
With regard to spatial resolution, we will map the
intensity in our image onto a two-dimensional finite
array:
[Diagram: a 3 × 4 example of the pixel array F[i, j], with image axes x' (horizontal) and y' (vertical):
[0, 0]  [0, 1]  [0, 2]  [0, 3]
[1, 0]  [1, 1]  [1, 2]  [1, 3]
[2, 0]  [2, 1]  [2, 2]  [2, 3] ]
So the result of our digitization is a two-dimensional
array of discrete intensity values.
Notice that in such a digitized image F[i, j]
• the first coordinate i indicates the row of a pixel,
starting with 0,
• the second coordinate j indicates the column of a
pixel, starting with 0.
In an m×n pixel array, the relationship between image
coordinates (with the origin in the image center) and
pixel coordinates is given by the equations
n 1
x'  j 
2
April 14, 2016
 m 1 
y '   i 

2 

Introduction to Artificial Intelligence
Lecture 20: Image Processing
29
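A quick sketch of this conversion for the 3 × 4 example array shown above:

```python
def pixel_to_image(i, j, m, n):
    """Convert pixel coordinates (row i, column j) of an m x n array into
    image coordinates (x', y') with the origin at the array center."""
    x = j - (n - 1) / 2.0
    y = -(i - (m - 1) / 2.0)
    return (x, y)

print(pixel_to_image(0, 0, 3, 4))   # (-1.5, 1.0): upper-left pixel
print(pixel_to_image(2, 3, 3, 4))   # (1.5, -1.0): lower-right pixel
```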
Image Size and Resolution
Intensity Resolution
Intensity Transformation
Sometimes we need to transform the intensities of all
image pixels to prepare the image for better visibility
of information or for algorithmic processing.
Gamma Transformation
The gamma transformation maps each normalized
intensity r to s = c · r^γ, where c and γ are positive
constants: γ < 1 brightens dark regions, while γ > 1
darkens the image.
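A hedged NumPy sketch of applying it to an 8-bit grayscale image (c = 1 and γ = 0.5 are example choices):

```python
import numpy as np

def gamma_transform(image, gamma=0.5, c=1.0):
    """Apply s = c * r**gamma to an 8-bit grayscale image given as a NumPy array."""
    r = image.astype(np.float64) / 255.0          # normalize intensities to [0, 1]
    s = c * np.power(r, gamma)
    return np.clip(s * 255.0, 0, 255).astype(np.uint8)

# Example: gamma = 0.5 lifts the dark end of a synthetic gradient image.
img = np.tile(np.arange(256, dtype=np.uint8), (16, 1))
print(gamma_transform(img)[0, 64])   # 127: a dark input (64) becomes roughly mid-gray
```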
Linear Histogram Scaling
For a desired intensity range [a, b], we can use the
following linear transformation:
I_new(x, y) = a + (I(x, y) - Imin) · (b - a) / (Imax - Imin)
Note that outliers (individual pixels of very low or high
intensity) should be disregarded when computing Imin
and Imax.
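A NumPy sketch of this scaling, using percentiles as one simple way to disregard such outliers (the 1st/99th-percentile choice is an assumption):

```python
import numpy as np

def linear_scale(image, a=0, b=255, low_pct=1, high_pct=99):
    """Linearly map image intensities to [a, b], ignoring outlier pixels
    below the low_pct or above the high_pct percentile."""
    i_min, i_max = np.percentile(image, [low_pct, high_pct])
    scaled = a + (image.astype(np.float64) - i_min) * (b - a) / (i_max - i_min)
    return np.clip(scaled, a, b).astype(np.uint8)
```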