Image Classification

Chapter 12
Intro

Digital image classification is assigning
pixels to classes (categories)



Each pixel has as many digital values as there are
bands
Compare the values to pixels of known
composition and assign them accordingly
Each class (in theory) is homogeneous
Intro

Direct uses


Produce a map of land-use/land-cover
Indirect use

Classification is an intermediate step, and may
form only one of several data layers in a GIS

Water map vs water quality GIS
Intro

Classifier is a computer program that does
some sort of image classification



Many different types available
No one method is “best”
Simplest is a point or spectral classifier

Considers each pixel individually


Simple and economical
Can’t describe relation to neighboring pixels
Intro

Spatial neighborhood classifiers consider
groups of pixels

More difficult to program and more expensive
Intro




Supervised vs Unsupervised classification
Supervised requires the analyst to identify
known areas
Unsupervised determines a set number of
categories based on a computer algorithm
Hybrid classifiers are a mix of the two
Supervised classification
Unsupervised classification



No previous knowledge assumed about data.
Tries to spectrally separate the pixels.
User has controls over:




Number of classes
Number of iterations
Convergence thresholds
Two main algorithms: ISODATA and k-means
Unsupervised classification
Informational Classes

Informational classes are categories of
interest to the users



Geological units
Forest types
Land use
Spectral classes



Spectral classes are pixels that are of uniform
brightness in each of their several channels
The idea is to link spectral classes to
informational classes
However, there is usually variability that
causes confusion

A forest can have trees of varying age, health,
species composition, density, etc.
Classes

Informational classes then are usually
composed of numerous spectral
subclasses.


These subclasses may be displayed as a single
unit on the final product
We often will want to examine how different
the pixels within a class are

Look at variance and standard deviation
Variance

The variance is defined as:

the average of the squared differences from the mean,
which is the square of the standard deviation, i.e., σ².
Example: the heights (at the shoulders) of five dogs are 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.

Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

Each dog's difference from the mean: 206, 76, −224, 36, −94

To calculate the variance, take each difference, square it, and then average the result:

Variance: σ² = (206² + 76² + (−224)² + 36² + (−94)²) / 5 = 108,520 / 5 = 21,704

The standard deviation is just the square root of the variance:

Standard Deviation: σ = √21,704 ≈ 147

So, using the standard deviation we have a "standard" way of knowing what is normal, and what is extra large or extra small. Rottweilers are tall dogs, and Dachshunds are a bit short ... but don't tell them!
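The same calculation, as a minimal Python/NumPy sketch (the numbers are the dog heights from the example above):

```python
import numpy as np

# Shoulder heights (mm) from the worked example above
heights = np.array([600, 470, 170, 430, 300])

mean = heights.mean()                      # 394.0
variance = ((heights - mean) ** 2).mean()  # population variance: 21704.0
std_dev = np.sqrt(variance)                # ~147

print(mean, variance, std_dev)
```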
Normal Distribution
Differences between means

A crude estimate of separability might be to simply look at the difference in means between two classes.

This is too simplistic since it does not account for differences in variability between the two.
A better way is to look at the normalized difference:

ND = |x̄a − x̄b| / (sa + sb)
NDWI

The Normalized Difference Water Index (NDWI)
(Gao, 1996)



The SWIR reflectance reflects changes in both the
vegetation water content and the spongy mesophyll
structure in vegetation canopies,
The NIR reflectance is affected by leaf internal structure
and leaf dry matter content but not by water content.
The combination of the NIR with the SWIR removes
variations induced by leaf internal structure and leaf dry
matter content, improving the accuracy in retrieving the
vegetation water content (Ceccato et al. 2001).


Normalized Difference Snow Index (NDSI)
Normalized Difference Vegetation Index


Normalized Difference Cloud Index (NDCI) is
defined as a ratio between the difference and the
sum of two zenith radiances measured for two
narrow spectral bands in the visible and near-IR
regions.

It provides extra tools to remove the radiative effects of the
3D cloud structure.
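A minimal NumPy sketch of these band-ratio indices. The (NIR − SWIR)/(NIR + SWIR) form follows Gao (1996) for NDWI and the (NIR − Red)/(NIR + Red) form is the usual NDVI; the array names and random data are purely illustrative:

```python
import numpy as np

def normalized_difference(band_a, band_b):
    """Generic (a - b) / (a + b) index; inputs are reflectance arrays."""
    band_a = band_a.astype(float)
    band_b = band_b.astype(float)
    return (band_a - band_b) / (band_a + band_b + 1e-10)  # avoid divide-by-zero

# Illustrative reflectance arrays (e.g., one image band each)
nir, swir, red = np.random.rand(3, 100, 100)

ndwi = normalized_difference(nir, swir)   # vegetation water content (Gao, 1996)
ndvi = normalized_difference(nir, red)    # vegetation greenness
```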
Unsupervised Classification


RS images are usually composed of several
relatively uniform spectral classes
Unsupervised classification is the
identification, labeling and mapping of such
classes
Unsupervised Classification

Advantages




Requires no prior knowledge of the region
Human error is minimized
Unique classes are recognized as distinct units
Disadvantages



Classes do not necessarily match informational
categories of interest
Limited control of classes and identities
Spectral properties of classes can change with
time
Unsupervised Classification


Distance Measures are used to group or
cluster brightness values together
Euclidean distance between points in space
is a common way to calculate closeness

Euclidean metric is the "ordinary" distance
between two points that one would measure with
a ruler, and is given by the Pythagorean formula.
Euclidean Distance

Example: the (Euclidean) distance between points (2, -1) and (-2, 2)

dist((2, -1), (-2, 2))
= √[(2 − (−2))² + ((−1) − 2)²]
= √[(4)² + (−3)²]
= √(16 + 9)
= √25
= 5
Euclidean Distance

This can be extended to multiple dimensions (bands):
square the difference in each band, sum the squares, and take the square root.

Band   Pixel A   Pixel B   Difference   (Dif)²
1        34        26          8           64
2        28        16         12          144
3        22        52        -30          900
4         6        29        -23          529

Σ (dif)² = 1,637
√1,637 ≈ 40.5
Distances

There are a number of other distances that can be calculated.

The L1 distance is the sum of the absolute differences between bands.

e.g. 8 + 12 + 30 + 23 = 73 for the previous example
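A minimal NumPy sketch of the Euclidean and L1 distances, using the pixel vectors from the table above:

```python
import numpy as np

# Pixel spectral vectors from the worked example above (4 bands)
pixel_a = np.array([34, 28, 22, 6])
pixel_b = np.array([26, 16, 52, 29])

diff = pixel_a - pixel_b
euclidean = np.sqrt(np.sum(diff ** 2))   # ~40.5
l1 = np.sum(np.abs(diff))                # 73

print(euclidean, l1)
```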
Example Landsat bands (figure: near-IR band and red band)

Example spectral plot (band 1 vs band 2)
• Two bands of data.
• Each pixel marks a location in this 2-D spectral space.
• Our eyes can split the data into clusters.
• Some points do not fit clusters.
K-means (unsupervised)

1. A set number of cluster centers are positioned randomly through the spectral space.
2. Pixels are assigned to their nearest cluster.
3. The mean location is re-calculated for each cluster.
4. Repeat 2 and 3 until the movement of cluster centres is below a threshold.
5. Assign class types to spectral clusters.
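A minimal NumPy sketch of steps 1–4 (step 5, attaching class labels to the resulting clusters, is left to the analyst); the names and the stopping tolerance are illustrative:

```python
import numpy as np

def kmeans(pixels, k, max_iter=100, tol=1e-4, seed=0):
    """Minimal k-means: pixels is an (n_pixels, n_bands) array."""
    rng = np.random.default_rng(seed)
    # 1. Position cluster centers randomly in the spectral space
    centers = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(max_iter):
        # 2. Assign each pixel to its nearest cluster (Euclidean distance)
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Re-calculate the mean location of each cluster
        new_centers = np.array([
            pixels[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        # 4. Stop when the movement of the centers is below the threshold
        if np.linalg.norm(new_centers - centers) < tol:
            break
        centers = new_centers
    return labels, centers
```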
Example k-means (figures: band 1 vs band 2 plots)
1. First iteration. The cluster centers are set at random. Pixels will be assigned to the nearest center.
2. Second iteration. The centers move to the mean-center of all pixels in this cluster.
3. N-th iteration. The centers have stabilized.
Key Components

Regardless of the unsupervised algorithm, we need to pay attention to methods for:



Measuring distance
Identifying class centroids
Testing distinctness of classes
Decision Boundary

All classification programs try to determine
classes based on “decision boundaries”


That is, divide feature space into an exhaustive
set of nonoverlapping regions
Begin with a set of prelabeled points for each
class (training samples)


Minimum Distance to Means – determine locus of
points equidistant from class mean
Nearest neighbor – determine locus of points
equidistant from the nearest member of 2 classes
Decision boundary
Decision boundary
Decision Boundaries

Classification usually not so easy



Desired classes have distributions in feature
space that are not obviously separated
Nearly always have to use more than three
features (dimensions)
Wind up having to use discriminant functions
Supervised classification





Start with knowledge of class types.
Classes are chosen at start
Training samples are created for each class
Ground truth used to verify the training
samples.
Quite a few algorithms. Here we will look at:


Parallelepiped
Maximum likelihood
Supervised Classification

Advantages




Analyst has control over the selected classes
tailored to the purpose
Has specific classes of known identity
Does not have to match spectral categories on
the final map with informational categories of
interest
Can detect serious errors in classification if
training areas are misclassified
Supervised Classification

Disadvantages


Analyst imposes a classification (may not be
natural)
Training data are usually tied to informational
categories and not spectral properties




Remember diversity
Training data selected may not be representative
Selection of training data may be time consuming
and expensive
May not be able to recognize special or unique
categories because they are not known or small
Supervised Classification

Training data




Specify corner points of selected areas
Assume that the correct ID is known
Often requires ancillary data (maps, photos, etc.)
Field work often needed to verify
Supervised Classification

Key Characteristics of Training areas

Number of pixels



Number depends on the number of categories,
their diversity, and resources available


Have several training areas for one category
A total of at least 100 pixels per category
More areas also allow discarding ones that have too
high a variance
Shape – not important; usually rectangular for ease
Supervised Classification

More Key Characteristics


Locations must be spread around the image and
be easily transferred from map to image
Size must be large enough to estimate spectral
characteristics and variations



Varies with sensor type and resolution
Varies with heterogeneity of area
Uniformity means that each training set should be as homogeneous as possible
Idealized Sequence






Assemble information
Conduct field studies
Conduct preliminary study of scene to
determine landmarks and assess image
quality
Identify training areas
Evaluate training data
Edit training data if necessary
Feature Selection

Graphic Method – one of the first simple
feature selection aids

Plot ±1σ in a bar graph
Feature Selection

Cospectral parallelepiped plots (ellipse plots) are a visual representation of separability in two-dimensional feature space


Use mean and SD of training class statistics for
each class, c, and band, k
Parallelepipeds represent mean ±1σ of each band
for each class
Feature Selection

Statistical Methods


Statistically try to separate clusters
Results in two types of errors


A pixel is assigned to a class to which it does not
belong (error of commission)
A pixel is not assigned to its appropriate class (error
of omission)
Parallelepiped

Also known as the box decision rule or level-slice procedure

Based on the values of the training data
Parallelepiped (supervised)





For each training region determine the range of
values observed in each band.
These ranges form a spectral box (or parallelepiped)
which is used to classify this class type.
Assign new image pixels to the parallelepiped they fit into best.
Pixels outside all boxes can be unclassified or
assigned to the closest one.
Problems with classes that exhibit high correlation
between bands. This creates long ‘diagonal’ datasets that don’t fit well into a box.
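A minimal sketch of the box decision rule, assuming training pixels are supplied as a (samples × bands) array with class labels; function and variable names are illustrative:

```python
import numpy as np

def fit_boxes(training, labels):
    """For each training class, record the min/max range in every band."""
    boxes = {}
    for c in np.unique(labels):
        samples = training[labels == c]          # (n_samples, n_bands)
        boxes[c] = (samples.min(axis=0), samples.max(axis=0))
    return boxes

def classify_parallelepiped(pixel, boxes, unclassified=-1):
    """Assign a pixel to the first box whose ranges contain it in every band."""
    for c, (low, high) in boxes.items():
        if np.all(pixel >= low) and np.all(pixel <= high):
            return c                              # note: result is order dependent
    return unclassified                           # outside all boxes
```

Note the order dependence: the pixel is assigned to the first box it falls inside, which is the weakness discussed in the following slides and motivates the minimum distance classifier.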
Parallelepiped example
Training classes plotted in spectral
space. In this example using 2 bands.
Parallelepiped example
continued
•Each class type defines a
spectral box
•Note that some boxes overlap
even though the classes are
spatially separable.
•This is due to band correlation in
some classes.
•Can be overcome by
customising boxes.
Parallelepiped example
• The algorithm tests a pixel to see if its spectral
values fall within the bounds of each class.
• Pixels are sequentially tested against the defined
classes (i.e., class 1 is tested first, class 2 is tested
next, etc.).
• As soon as the test is passed, the pixel is classified
and the algorithm moves on to the next pixel.
• This classifier is mathematically simple.
• Problem: We will have ambiguities when working
with classes with overlapping bounds.
Parallelepiped example
• The checking procedure stops once the digital numbers associated with the investigated pixel lie within the bounds of a certain class.
– The classification result is order dependent.
• In other words, the final classification result
depends on how the classes are numbered.
• This is not a desirable feature.
• Solution: Minimum distance classifier.
Minimum Distance Classifier
• Any pixel in the scene is categorized using the
distances between:
– The digital number vector (spectral vector)
associated with that pixel, and
– The means of the information classes derived from
the training sets.
• The pixel is designated to the class with the shortest
distance.
• Some versions of this classifier use the standard
deviation of the classes to determine a minimum
distance threshold.
Minimum Distance Classifier
• If minimum distance is greater than the threshold,
the pixel will be considered unclassified.
– This pixel does not belong to any of the
classes represented by the training set.
• This classifier is slower than the parallelepiped
classifier
• This classifier is mathematically simple.
• Problem: We do not use the standard deviation
derived from the training data.
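A minimal sketch of the minimum distance rule under the same assumptions (a training array plus class labels); the optional threshold mirrors the versions that use a minimum distance threshold:

```python
import numpy as np

def class_means(training, labels):
    """Mean spectral vector of each information class from the training sets."""
    return {c: training[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify_min_distance(pixel, means, threshold=None, unclassified=-1):
    """Assign the pixel to the class with the nearest mean (Euclidean distance)."""
    dists = {c: np.linalg.norm(pixel - m) for c, m in means.items()}
    best = min(dists, key=dists.get)
    if threshold is not None and dists[best] > threshold:
        return unclassified      # minimum distance exceeds the threshold
    return best
```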
Maximum likelihood
(supervised)





For each training class the spectral variance and
covariance is calculated.
The class can then be statistically modelled with
a mean vector and covariance matrix.
This assumes the class is normally distributed.
Which is generally okay for natural surfaces.
Unidentified pixels can then be given a
probability of being in any one class.
Assign the new pixel to the class with the
highest probability – or unclassified if all
probabilities low.
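A minimal sketch of a Gaussian maximum likelihood classifier using scipy.stats.multivariate_normal as the normal model; the probability threshold and names are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussians(training, labels):
    """Mean vector and covariance matrix for each training class."""
    models = {}
    for c in np.unique(labels):
        samples = training[labels == c]
        models[c] = multivariate_normal(samples.mean(axis=0),
                                        np.cov(samples, rowvar=False))
    return models

def classify_max_likelihood(pixel, models, min_prob=None, unclassified=-1):
    """Assign the pixel to the class with the highest probability density."""
    probs = {c: m.pdf(pixel) for c, m in models.items()}
    best = max(probs, key=probs.get)
    if min_prob is not None and probs[best] < min_prob:
        return unclassified      # all probabilities too low
    return best
```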
Maximum likelihood example (figure: training classes with equiprobability contours)
• Normal probability distributions are fitted to each training class.
• The lines in the diagram show regions of equal probability.
• Point 1 would be assigned to the class 'pond culture' as this is the most probable.
• Point 2 would generally be unclassified as the probabilities of fitting into any of the classes would be below threshold.
Maximum likelihood example
• Characteristics:
– Generally produces the most accurate
classification results.
– Assumes normal distribution of the
spectral data within the training classes.
– Mathematically complex.
– Computationally slow.
ISODATA (hybrid)


Extends k-means; also calculates the standard deviation of each cluster.
After mean location is re-calculated for each cluster
we can either:






Combine clusters if centers are close.
Split clusters with large standard deviation in any
dimension.
Delete clusters that are too small.
Then reclassify each pixel and repeat.
Stop on max iterations or convergence limit.
Assign class types to spectral clusters.
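A compact sketch of the ISODATA split/merge/delete step, assuming pixels have already been assigned to clusters with k-means-style nearest-center labels; all thresholds and names are illustrative:

```python
import numpy as np

def isodata_adjust(pixels, labels, centers,
                   min_size=5, max_std=20.0, merge_dist=10.0):
    """One ISODATA adjustment pass: delete small clusters, split spread-out
    clusters, and merge clusters whose centers are close together."""
    new_centers = []
    for c, center in enumerate(centers):
        members = pixels[labels == c]
        if len(members) < min_size:
            continue                                   # delete: too few pixels
        stds = members.std(axis=0)
        if stds.max() > max_std:
            offset = np.zeros(center.shape)
            offset[stds.argmax()] = stds.max()
            new_centers += [center - offset, center + offset]   # split along widest band
        else:
            new_centers.append(center)
    # Merge centers that are closer than merge_dist
    merged = []
    for c in new_centers:
        close = [i for i, m in enumerate(merged) if np.linalg.norm(m - c) < merge_dist]
        if close:
            merged[close[0]] = (merged[close[0]] + c) / 2.0
        else:
            merged.append(c)
    return np.array(merged)   # then reclassify each pixel against these centers and repeat
```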
Example ISODATA (figures: band 1 vs band 2 plots)
1. Data is clustered, but the blue cluster is very stretched in band 1.
2. Cyan and green clusters only have 2 or fewer pixels, so they will be removed.
3. Either assign outliers to the nearest cluster, or mark them as unclassified.
Bayes’s Classification


Bayesian classification is based on the probability of
observing a particular class given a particular pixel
value
Let's play a simple game with two sets of dice.




One normal pair
One augmented pair – 2 extra spots per side
Player 1 selects a pair of dice randomly and rolls them, announcing only the outcome
Player 2 names which type of dice were used
Bayes’s Classification

To determine the decision boundary we can
list all possible outcomes, and how likely
each one is.



For normal dice, outcomes range from 2 to 12
For augmented dice, from 6 to 16
How likely is each?


To get 2, there is only one way to roll
To get 3, there are two ways (1 and 2, or 2 and 1)
Bayes’s Classification

The histograms become discriminant
functions, and the decision boundary can be
set based on the most probable outcome in
any given case


If a 7 is the outcome, it could have come from
either pair of dice, but more likely from the
standard pair
If a 4, then it is most assuredly from the standard
pair
Bayes’s Classification


Now let's assume we are trying to guess the type of groundcover
We could estimate the probabilities from
training areas


We would generate histograms to estimate the
probability function for each class
Use these probabilities to separate pixels
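A minimal sketch of this histogram-based Bayesian rule for a single band; bin edges and names are illustrative, and equal prior probabilities are assumed for the classes:

```python
import numpy as np

def fit_histograms(training_values, labels, bins):
    """Estimate p(value | class) from training pixels with one histogram per class."""
    hists = {}
    for c in np.unique(labels):
        counts, _ = np.histogram(training_values[labels == c], bins=bins)
        hists[c] = counts / counts.sum()          # normalize to probabilities
    return hists

def classify_bayes(value, hists, bins):
    """Assign the value to the class whose histogram makes it most probable."""
    idx = np.clip(np.digitize(value, bins) - 1, 0, len(bins) - 2)
    return max(hists, key=lambda c: hists[c][idx])
```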
K-Nearest Neighbors

k-Nearest Neighbor requires 3 things:
• The set of stored records
• A distance metric to compute distance between records
• The value of k, the number of nearest neighbors to retrieve

To classify an unknown record:
• Compute the distance to the other training records
• Identify the k nearest neighbors
• Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
k-Nearest Neighbor

Compute the distance between two points:
• Euclidean distance: d(p, q) = √Σ(pᵢ − qᵢ)²
• Hamming distance (overlap metric)

Determine the class from the nearest-neighbor list:
• Take the majority vote of class labels among the k nearest neighbors
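A minimal sketch of the k-nearest-neighbor vote using Euclidean distance; names are illustrative:

```python
import numpy as np
from collections import Counter

def classify_knn(unknown, training, labels, k=3):
    """Majority vote among the k training records nearest to the unknown record."""
    labels = np.asarray(labels)
    dists = np.linalg.norm(training - unknown, axis=1)   # Euclidean distance to each record
    nearest = np.argsort(dists)[:k]                      # indices of the k nearest neighbors
    votes = Counter(labels[nearest])
    return votes.most_common(1)[0][0]
```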
k-Nearest Neighbor

Example (figure: an unknown point '?' classified with k = 1, k = 3 and k = 7; the assigned class, triangle or square, changes as the neighborhood grows).

Choosing the value of k:
• If k is too small, the classifier is sensitive to noise points
• If k is too large, the neighborhood may include points from other classes
• Choose an odd value for k to eliminate ties
k-Nearest Neighbor

Accuracy of all NN-based classification, prediction, or recommendation depends solely on the data model, no matter what specific NN algorithm is used.

Scaling issues
• Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
• Examples:
  Height of a person may vary from 4' to 6'
  Weight of a person may vary from 100 lbs to 300 lbs
  Income of a person may vary from $10k to $500k

Nearest Neighbor classifiers are lazy learners
• Models are not built explicitly, unlike eager learners.
k-Nearest Neighbor Advantages

• Simple technique that is easily implemented
• Building the model is cheap
• Extremely flexible classification scheme
• Well suited for
  multi-modal classes
  records with multiple class labels
• Error rate at most twice that of the Bayes error rate
  Cover & Hart paper (1967)
• Can sometimes be the best method

Michihiro Kuramochi and George Karypis, Gene Classification using Expression Profiles: A Feasibility Study, International Journal on Artificial Intelligence Tools, Vol. 14, No. 4, pp. 641-660, 2005:
k nearest neighbor outperformed SVM for protein function prediction using expression profiles.
k-Nearest Neighbor Disadvantages

• Classifying unknown records is relatively expensive
  Requires distance computation to the k nearest neighbors
  Computationally intensive, especially when the size of the training set grows
• Accuracy can be severely degraded by the presence of noisy or irrelevant features
Many Classifiers
Ground truth


Ideally the training regions need to be based
on ground observation.
They should be large enough to capture all
the spectral variability in the class type.


E.g. different types of forest, shallow water and
deep ocean etc.
Do not need to get too detailed otherwise
classes will not be spectrally separable.
Ancillary Data

Acquired by other means



Used to assist in classification or analysis
Maps, reports, other data
Primary requirements



Available digitally
Pertain to the problem
Compatible with the RS data
Ancillary Data



Incompatibility is a serious problem
Physical – digital formats
Logical – data usually collected for another
reason




Scale
Resolution
Date
Accuracy
Ancillary Data

Stratification – subdivide the image into regions that are easy to define using ancillary data


Elevation could be used to look at alpine
vegetation separately from lowland
Postclassification sorting – examine
confusion matrix and look in more detail at
confused classes
Post classification


Can check non-training regions with more
ground truth if available.
Calculate classification statistics.





Confusion Matrix: Columns show ground truth, rows
show how many pixels are assigned to each class.
Overall accuracy: Total correct pixels/total pixels
Commission errors: Incorrect pixels assigned to a
class
Omission errors: Pixels in class that are assigned a
different class
Visually check to see if any major errors or
unwanted features.
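A minimal NumPy sketch of these statistics, assuming every class appears in both the ground truth and the classified result; the row/column convention matches the slide (columns = ground truth, rows = assigned class):

```python
import numpy as np

def confusion_stats(truth, predicted, n_classes):
    """Confusion matrix plus overall accuracy and commission/omission errors."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(truth, predicted):
        cm[p, t] += 1                                        # rows: assigned, columns: truth
    overall = np.trace(cm) / cm.sum()                        # correct pixels / total pixels
    commission = 1 - np.diag(cm) / cm.sum(axis=1)            # incorrect pixels assigned to a class
    omission = 1 - np.diag(cm) / cm.sum(axis=0)              # class pixels assigned a different class
    return cm, overall, commission, omission
```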
Classification and Regression
Tree Analysis



Classification and Regression Tree Analysis
(CART) is a method to incorporate ancillary
data into image classification
Requires accurate training data, but not prior
knowledge of the role of the variables
The advantage is that it identifies the variables that are useful and separates them from those that don't contribute to the classification.
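A minimal sketch using scikit-learn's DecisionTreeClassifier as one CART implementation; the feature stack (two spectral bands plus an ancillary elevation layer) and all data here are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Illustrative training data: spectral bands plus an ancillary elevation layer
n_samples = 200
features = np.column_stack([
    np.random.rand(n_samples),   # red band
    np.random.rand(n_samples),   # near-IR band
    np.random.rand(n_samples),   # elevation (ancillary data)
])
labels = np.random.randint(0, 3, n_samples)   # training class labels

tree = DecisionTreeClassifier(max_depth=5).fit(features, labels)

# feature_importances_ indicates which inputs contribute to the classification
print(tree.feature_importances_)
```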
Fuzzy Clustering

Traditional methods allow a pixel to be
identified only with a single cluster



There are many processes which can make
matching problematic
So, many pixels will be incorrectly labeled
Fuzzy logic allows partial membership

Instead of a water pixel, it could be 0.7 water and
0.3 forest.
Neural Networks

Artificial Neural Networks (ANN) are
computer programs that simulate the brain


Establishment of linkages and then reinforcement
of linkages between input and output.
Generally comprised of three elements



Input layers – source data
Hidden layers – association by weights
Output layer - classes
Neural Networks


There can be forward propagation – the
normal training to classification sequence
Backward propagation is a retrospective
analysis of input and output which allows
adjustment of the weights

This creates a transfer function


Quantitative link between input and output
Weights may show some bands are more effective for
certain classes and other bands for different classes
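A minimal sketch of such a network using scikit-learn's MLPClassifier (one possible implementation); the layer size, band counts, and class labels are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Illustrative training pixels: rows are pixels, columns are spectral bands (input layer)
train_pixels = np.random.rand(300, 6)          # e.g., 6 image bands
train_classes = np.random.randint(0, 4, 300)   # e.g., water, forest, urban, crop

# One hidden layer of 10 nodes; weights are adjusted by backward propagation during training
ann = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
ann.fit(train_pixels, train_classes)

# Forward propagation: classify new pixels (output layer gives the classes)
new_pixels = np.random.rand(5, 6)
print(ann.predict(new_pixels))
```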
Contextual Classification

Context is derived from spatial relationships
within the image

Can operate on either classified or unclassified
scenes


Usually some classification has been done
It reassigns pixels as appropriate based on
location (context)