Postdoctoral Fellow
Department of Computer Science
University of British Columbia
IEEE Haptics Symposium
March 4, 2012
Vancouver, B.C., Canada
[Overview diagram: machine learning, template matching, pattern recognition, statistical pattern recognition, structural pattern recognition, regression, neural networks, supervised methods, unsupervised methods]
What is pattern recognition?
this question even appears as a title in the International Association for Pattern Recognition (IAPR) newsletter
many definitions exist
simply: the process of labeling observations (x) with predefined categories (w)
Various applications of PR
[Jain et al., 2000]
Supervised learning
“tufa”
“tufa”
“tufa”
Can you identify other “tufa”s here?
[figure from lecture notes by Josh Tenenbaum]
Unsupervised learning
How many categories are there?
Which image belongs to which category?
[figure from lecture notes by Josh Tenenbaum]
Pattern recognition in haptics/HCI
[Altun et al., 2010a]
human activity recognition with body-worn inertial sensors (accelerometers and gyroscopes)
daily activities: sitting, standing, walking, stairs, etc.
sports activities: walking/running, cycling, rowing, basketball, etc.
Pattern recognition in haptics/HCI
[Altun et al., 2010a]
[Figure: right-arm and left-arm accelerometer signals during walking and basketball]
Pattern recognition in haptics/HCI
[Flagg et al., 2012]
touch gesture recognition on a conductive fur patch
Pattern recognition in haptics/HCI
[Flagg et al., 2012]
[Figure: sensor signals over 2.5 s for three gestures on the fur patch: stroke, scratch, light touch]
Other haptics/HCI applications?
Pattern recognition example
an excellent example by Duda et al.: classifying incoming fish on a conveyor belt using a camera image
sea bass
salmon
[Duda et al., 2000]
Pattern recognition example
how to classify? what kind of information can distinguish these two species?
length, width, weight, etc.
suppose a fisherman tells us that salmon are usually shorter
so, let's use length as a feature
what to do to classify? capture image – find fish in the image – measure length – make decision
how to make the decision? how to find the threshold?
Pattern recognition example
[Duda et al., 2000]
Pattern recognition example
on average, salmon are usually shorter, but is this a good feature?
let's try classifying according to the lightness of the fish scales
Pattern recognition example
[Duda et al., 2000]
Pattern recognition example
how to choose the threshold?
Pattern recognition example
how to choose the threshold?
minimize the probability of error
sometimes we should consider the costs of different errors
salmon is more expensive
customers who order salmon but get sea bass instead will be angry
customers who order sea bass but occasionally get salmon instead will not be unhappy
Pattern recognition example
we don't have to use just one feature
let's use lightness and width
each point is a feature vector
the 2-D plane is the feature space
[Duda et al., 2000]
Pattern recognition example
we don't have to use just one feature
let's use lightness and width
the curve shown is a decision boundary in the 2-D feature space
[Duda et al., 2000]
Pattern recognition example
should we add as many features as we can?
do not use redundant features
consider noise in the measurements
moreover, avoid adding too many features
more features mean higher-dimensional feature vectors
it is difficult to work in high-dimensional spaces; this is called the curse of dimensionality (more on this later)
Pattern recognition example
how to choose the decision boundary?
is this one better?
[Duda et al., 2000]
Probability theory review
a chance experiment, e.g., tossing a 6-sided die
1, 2, 3, 4, 5, 6 are the possible outcomes
the set of all outcomes Ω = {1, 2, 3, 4, 5, 6} is the sample space
any subset of the sample space is an event
the event that the outcome is odd: A = {1, 3, 5}
each event is assigned a number called the probability of the event: P(A)
the probabilities can be assigned freely, as long as the Kolmogorov axioms are not violated
Probability axioms
for any event A: P(A) ≥ 0
for the sample space: P(Ω) = 1
for disjoint events A1, A2, …: P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …
the third axiom also includes the countably infinite case
die tossing: if all outcomes are equally likely, then for all i = 1…6, the probability of getting outcome i is 1/6
Conditional probability
sometimes events occur and change the probabilities of other events
example: ten coins in a bag
nine of them are fair coins, with heads (H) and tails (T)
one of them is fake, with both sides heads (H)
I randomly draw one coin from the bag, but I don't show it to you
H0: the coin is fake, both sides H
H1: the coin is fair, one side H, the other side T
which of these events would you bet on?
Conditional probability
suppose I flip the coin five times, obtaining the outcome HHHHH (five heads in a row)
call this event F
H0: the coin is fake, both sides H
H1: the coin is fair, one side H, the other side T
which of these events would you bet on now?
Conditional probability
definition: the conditional probability of event A given that event B has occurred:
P(A|B) = P(AB) / P(B)
read as: "probability of A given B"
P(AB) is the probability of events A and B occurring together
Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
Conditional probability
H0: the coin is fake, both sides H
H1: the coin is fair, one side H, the other side T
F: obtaining five heads in a row (HHHHH)
we know that F occurred; we want to find P(H0|F) and P(H1|F)
difficult to compute directly, so use Bayes' theorem:
P(H0|F) = P(F|H0) P(H0) / P(F)
prior probabilities (before the observation F): P(H0) = 1/10, P(H1) = 9/10
probability of observing F if H0 were true: P(F|H0) = 1
probability of observing F if H1 were true: P(F|H1) = (1/2)^5 = 1/32
total probability of observing F: P(F) = P(F|H0) P(H0) + P(F|H1) P(H1) = 1 · (1/10) + (1/32) · (9/10) = 41/320
posterior probability: P(H0|F) = (1 · 1/10) / (41/320) = 32/41 ≈ 0.78
which event would you bet on now?
this is very similar to a pattern recognition problem!
we can put a label on the coin as "fake" based on our observations!
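To make the arithmetic concrete, here is a minimal Python sketch (not part of the original slides) that reproduces the posterior computed above; the variable names are illustrative.

```python
# Posterior probability that the drawn coin is fake, given five heads (event F).
p_h0 = 1 / 10                 # prior: coin is fake (both sides heads)
p_h1 = 9 / 10                 # prior: coin is fair
p_f_given_h0 = 1.0            # a fake coin always shows heads
p_f_given_h1 = (1 / 2) ** 5   # five heads in a row with a fair coin

# total probability of observing F (law of total probability)
p_f = p_f_given_h0 * p_h0 + p_f_given_h1 * p_h1

# Bayes' theorem
p_h0_given_f = p_f_given_h0 * p_h0 / p_f
print(p_h0_given_f)           # 32/41, about 0.78: bet on "fake"
```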
Bayesian inference
w0: the coin belongs to the "fake" class
w1: the coin belongs to the "fair" class
x: observation
decide the class whose posterior probability is higher than the others'
this is called the MAP (maximum a posteriori) decision rule
Random variables
we model the observations with random variables
a random variable is a real number whose value depends on a chance experiment
discrete random variable
the possible values form a discrete set
continuous random variable
the possible values form a continuous set
Random variables
a discrete random variable X is characterized by a probability mass function (pmf): p(x) = P(X = x)
a pmf has two properties:
p(x) ≥ 0 for all possible values x
the values p(x) sum to 1 over all possible x
Random variables
a continuous random variable X is characterized by a probability density function (pdf), denoted p(x), defined for all possible values
probabilities are calculated for intervals: P(a ≤ X ≤ b) = ∫ p(x) dx over [a, b]
Random variables
a pdf also has two properties:
p(x) ≥ 0 for all x
∫ p(x) dx = 1 over the whole real line
Expectation
definition: E[X] = Σ x p(x) (discrete case) or E[X] = ∫ x p(x) dx (continuous case)
the average of the possible values of X, weighted by their probabilities
also called the expected value, or mean
Variance and standard deviation
variance is the expected value of the deviation from the mean: Var(X) = E[(X − μ)²], where μ = E[X]
variance is always positive, or zero, which means X is not random
standard deviation is the square root of the variance
Gaussian (normal) distribution
possibly the most "natural" distribution
encountered frequently in nature
central limit theorem: the sum of many i.i.d. random variables is asymptotically Gaussian
definition: the random variable with pdf p(x) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²))
two parameters: the mean μ and the variance σ²
Gaussian distribution
it can be proved that:
[Figure: areas under the Gaussian pdf within ±1σ, ±2σ, ±3σ of the mean; from http://assets.allbusiness.com]
Random vectors
extension of the scalar case
pdf: p(x), now defined over d-dimensional vectors x
mean: μ = E[x]
covariance matrix: Σ = E[(x − μ)(x − μ)ᵀ]
the covariance matrix is always symmetric and positive semidefinite
Multivariate Gaussian distribution
probability density function: p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp(−(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ))
two parameters: the mean vector μ and the covariance matrix Σ
compare with the univariate case: p(x) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²))
Bivariate Gaussian exercise
The scatter plots show 100 independent samples drawn from zero-mean Gaussian distributions,with different covariance matrices. Match the covariance matrices with the scatter plots, by inspection only.
[Figure: three scatter plots (a, b, c), each showing 100 samples on axes from −4 to 4]
Bayesian decision theory
Bayesian decision theory falls into the subjective interpretation of probability
in the pattern recognition context, a prior belief about the class (category) of an observation is updated using Bayes' rule
Bayesian decision theory
back to the fish example
say we have two classes (states of nature), w1 and w2
let P(w1) be the prior probability that the fish is a sea bass
P(w2) is the prior probability that the fish is a salmon
Bayesian decision theory
prior probabilities reflect our belief about which kind of fish to expect, before we observe it
we can choose them according to the fishing location, time of year, etc.
if we don't have any prior knowledge, we can choose equal priors (or uniform priors)
Bayesian decision theory
let x be the feature vector obtained from our observations
x can include features like lightness, weight, length, etc.
calculate the posterior probabilities P(w1|x) and P(w2|x)
how to calculate them?
Bayesian decision theory
p(x|wi) is called the class-conditional probability density function (CCPDF)
the pdf of the observation x if the true class were wi
the CCPDF is usually not known
e.g., it is impossible to know the pdf of the length of all sea bass in the world
but it can be estimated; more on this later
for now, assume that the CCPDF is known
then, by Bayes' theorem, P(wi|x) = p(x|wi) P(wi) / p(x); just substitute the observation x
Bayesian decision theory
MAP rule (also called the minimum-error rule):
decide w1 if P(w1|x) > P(w2|x)
decide w2 otherwise
do we really have to calculate p(x)? (no: it is the same for both classes, so comparing p(x|wi) P(wi) is enough)
Bayesian decision theory
multiclass problems: the maximum a posteriori (MAP) decision rule decides wi if P(wi|x) ≥ P(wj|x) for all j
the MAP rule minimizes the error probability, and gives the best performance that can be achieved (of course, if the CCPDFs are known)
if the prior probabilities are equal, it reduces to the maximum likelihood (ML) decision rule: decide wi if p(x|wi) ≥ p(x|wj) for all j
Exercise (single feature)
find: the maximum likelihood decision rule
[Duda et al., 2000]
Exercise (single feature)
find: the MAP decision rule
decide w1 if …
decide w2 if …
[Duda et al., 2000]
Discriminant functions
we can generalize this
let gi(x) be the discriminant function for the i-th class
decision rule: assign x to class i if gi(x) > gj(x) for all j ≠ i
for the MAP rule: gi(x) = P(wi|x)
Discriminant functions
the discriminant functions divide the feature space into decision regions that are separated by decision boundaries
consider a multiclass problem (c classes) with Gaussian CCPDFs
discriminant functions: gi(x) = ln p(x|wi) + ln P(wi)
it is easy to show analytically that the decision boundaries are hyperquadrics
conic sections if the feature space is 2-D
hyperplanes (lines in 2-D) if the covariance matrices are the same for all classes (degenerate case)
Examples
[Figures from Duda et al., 2000: equal and spherical covariance matrices, and equal covariance matrices, in 2-D and 3-D]
Examples
[Duda et al., 2000]
Examples
[Duda et al., 2000]
2-D example
artificial data
[Jain et al., 2000]
[Scatter plot of the artificial 2-D data]
Density estimation
but CCPDFs are usually unknown
that's why we need training data
density estimation:
parametric: assume a class of densities (e.g., Gaussian), find the parameters
non-parametric: estimate the pdf directly (and numerically) from the training data
Density estimation
assume we have n samples of training vectors for a class
we assume that these samples are independent and drawn from a certain probability distribution
this is called the generative approach
Parametric methods
we will consider only the Gaussian case
underlying assumption: the samples are actually noise-corrupted versions of a single feature vector
why Gaussian? three important properties:
completely specified by its mean and variance
linear transformations of a Gaussian remain Gaussian
central limit theorem: many phenomena encountered in reality are asymptotically Gaussian
Gaussian case
assume x1, …, xn are drawn from a Gaussian distribution
how to find the pdf?
Gaussian case
assume x1, …, xn are drawn from a Gaussian distribution
how to find the pdf?
finding the mean and covariance is sufficient
sample mean: μ̂ = (1/n) Σ xk
sample covariance: Σ̂ = (1/(n−1)) Σ (xk − μ̂)(xk − μ̂)ᵀ
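As an illustration, here is a minimal Python sketch of this parametric approach (not from the original slides; function names are illustrative). It estimates the sample mean and covariance for each class and applies the MAP rule from the earlier slides:

```python
import numpy as np

def fit_gaussian(samples):
    """Sample mean and sample covariance of an (n, d) array of training vectors."""
    mu = samples.mean(axis=0)
    sigma = np.cov(samples, rowvar=False)  # unbiased estimate, divides by n - 1
    return mu, sigma

def log_gaussian(x, mu, sigma):
    """Log of the multivariate Gaussian pdf evaluated at x."""
    d = mu.size
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(sigma, diff))

def map_classify(x, class_params, priors):
    """MAP rule: pick the class maximizing log p(x|w_i) + log P(w_i)."""
    scores = [log_gaussian(x, mu, sigma) + np.log(p)
              for (mu, sigma), p in zip(class_params, priors)]
    return int(np.argmax(scores))
```

Here class_params would hold one (mean, covariance) pair per class, each estimated with fit_gaussian from that class's training vectors.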
2-D example
back to the 2-D example: calculate the sample mean and covariance of each class, then apply the MAP rule
[Scatter plot of the 2-D data]
2-D example
back to the 2-D example
2-D example
[Figure: decision boundary with the true pdf vs. decision boundary with the estimated pdf]
Haptics example
[Flagg et al., 2012]
[Figure: sensor signals over 2.5 s for the stroke, scratch, and light touch gestures]
which feature to use for discrimination?
Haptics example
[Flagg et al., 2012]
7 participants performed each gesture 10 times
210 samples in total
we should find distinguishing features
let's use one feature at a time
we assume the feature value is normally distributed, and find its mean and variance
Haptics example
[Figure: estimated Gaussian pdfs of the minimum value feature for stroke, scratch, and light touch]
assume equal priors, apply the ML rule
Haptics example
[Figure: estimated Gaussian pdfs of the maximum value feature for stroke, scratch, and light touch]
apply the ML rule
decision boundaries? (decision thresholds for 1-D)
Haptics example
let's plot the 2-D distribution (minimum value vs. maximum value)
[Scatter plot: stroke, scratch, and light touch samples in the 2-D feature space]
clearly this isn't a "good" classifier for this problem
the Gaussian assumption is not valid
Activity recognition example
[Altun et al., 2010a]
4 participants (2 male, 2 female)
activities: standing, ascending stairs, walking
720 samples in total
sensor: accelerometer on the right leg
let's use the same features
minimum and maximum values
Activity recognition example
[Figure: estimated Gaussian pdfs of the minimum value and maximum value features for standing, stairs, and walking]
Activity recognition example
[Scatter plot: standing, stairs, and walking samples in the 2-D feature space]
the Gaussian assumption looks valid
this is a "good" classifier for this problem
Activity recognition example
[Figure: decision boundaries in the 2-D feature space for standing, stairs, and walking]
Haptics example
how to solve the problem?
either change the classifier, or change the features
Non-parametric methods
let's estimate the CCPDF directly from the samples
the simplest method is the histogram
partition the feature space into (equally sized) bins
count the number of samples in each bin: p̂(x) = k / (nV)
k: number of samples in the bin that includes x
n: total number of samples
V: volume of the bin
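A minimal sketch of this estimate in one dimension (not from the slides; the function name and the choice of anchoring bins at zero are illustrative):

```python
import numpy as np

def histogram_density(x, samples, h):
    """Histogram estimate p_hat(x) = k / (n V), equally sized 1-D bins of width h."""
    def bin_of(v):
        return np.floor(v / h)                 # index of the bin containing v
    k = np.sum(bin_of(samples) == bin_of(x))   # samples in the same bin as x
    return k / (len(samples) * h)              # V = h in one dimension
```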
Non-parametric methods
how to choose the bin size?
the number of bins increases exponentially with the dimension of the feature space
we can do better than that!
Non-parametric methods
compare the following density estimates
[Figure: pdf estimates with six samples; image from http://en.wikipedia.org/wiki/Parzen_Windows]
Kernel density estimation
a density estimate can be obtained as
p̂(x) = (1/n) Σ K((x − xk)/hn) / hnᵈ
where the kernel functions are Gaussians centered at the training samples xk
K: Gaussian kernel
hn: width of the Gaussian
Kernel density estimation
three different density estimates with different widths
if the width is large, the pdf will be too smooth
if the width is small, the pdf will be too spiked
as the width approaches zero, the pdf converges to a sum of Dirac delta functions
[Duda et al., 2000]
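A corresponding 1-D Python sketch of the Parzen estimate with a Gaussian kernel (the sample values below are purely illustrative):

```python
import numpy as np

def kde(x, samples, h):
    """Parzen estimate at x with a 1-D Gaussian kernel of width h."""
    u = (x - samples) / h
    kernel_values = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    return kernel_values.mean() / h

samples = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])  # six illustrative samples
print(kde(0.0, samples, h=1.0))  # larger h: smoother estimate; smaller h: spikier
```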
KDE for activity recognition data
[Figure: kernel density estimates of the minimum value and maximum value features for standing, stairs, and walking]
KDE for activity recognition data
[Figure: decision boundaries in the 2-D feature space obtained with the kernel density estimates]
KDE for gesture recognition data
[Figure: kernel density estimates of the minimum value and maximum value features for stroke, scratch, and light touch]
Other density estimation methods
Gaussian mixture models
parametric: model the distribution as a sum of M Gaussians
optimization algorithm: expectation-maximization (EM)
k-nearest neighbor estimation
non-parametric: variable width, fixed k
Another example
[Aksoy, 2011]
Measuring classifier performance
how do we know our classifiers will work?
how do we measure the performance, i.e., decide that one classifier is better than another?
correct recognition rate
confusion matrix
ideally, we should have more data, independent from the training set, and test the classifiers on it
Confusion matrix
[Figure: confusion matrix for an 8-class problem; Tunçel et al., 2009]
Measuring classifier performance
use the training samples to test the classifiers?
this is possible, but not good practice
100% correct classification rate for this example!
because the classifier "memorized" the training samples instead of "learning" them
[Duda et al., 2000]
Cross validation
having a separate test data set might not be possible in some cases
we can use cross validation
use some of the data for training, and the remaining for testing
how to divide the data?
Cross validation methods
repeated random sub-sampling
divide the data into two groups randomly (usually the training set is larger)
train and test, record the correct classification rate
do this repeatedly, and take the average
Cross validation methods
K-fold cross validation
randomly divide the data into K sets
use K−1 sets for training, 1 set for testing
repeat K times, using a different set for testing at each fold
leave-one-out cross validation
use one sample for testing, and all the remaining for training
the same as K-fold cross validation, with K equal to the total number of samples
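A small Python sketch of K-fold cross validation (not from the slides; the train_and_test callback interface is an illustrative assumption). Setting k equal to the number of samples gives leave-one-out:

```python
import numpy as np

def k_fold_rate(X, y, k, train_and_test, seed=0):
    """Average correct classification rate over k folds.

    train_and_test(train_X, train_y, test_X) must return predicted labels.
    """
    idx = np.random.default_rng(seed).permutation(len(y))  # random division
    folds = np.array_split(idx, k)
    rates = []
    for i in range(k):
        test = folds[i]                                    # fold i is for testing
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        predictions = train_and_test(X[train], y[train], X[test])
        rates.append(np.mean(predictions == y[test]))
    return float(np.mean(rates))
```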
Haptics example
[Figure: Gaussian pdfs of the minimum value feature for stroke, scratch, and light touch]
assume equal priors, apply the ML rule
confusion matrix (rows: true class, columns: assigned class):
              stroke  scratch  light touch
stroke          53      16         1
scratch          2      66         2
light touch     35      28         7
correct recognition rate: 60.0%
the decision region for light touch is too small!!
Haptics example
[Figure: Gaussian pdfs of the maximum value feature for stroke, scratch, and light touch]
apply the ML rule
confusion matrix (rows: true class, columns: assigned class):
              stroke  scratch  light touch
stroke          61       0         9
scratch         13      24        33
light touch     18      14        38
correct recognition rate: 58.5%
Haptics example
[Figure: kernel density estimates of the minimum value and maximum value features]
KDE with the minimum value feature (rows: true class, columns: assigned class):
              stroke  scratch  light touch
stroke          48      16         6
scratch          2      67         1
light touch     32      30         8
correct recognition rate: 58.8%
KDE with the maximum value feature:
              stroke  scratch  light touch
stroke          60       0        10
scratch          4      23        43
light touch      9      13        48
correct recognition rate: 62.4%
Activity recognition example
[Figure: Gaussian pdfs of the minimum value and maximum value features for standing, stairs, and walking]
minimum value feature (rows: true class, columns: assigned class):
           standing  stairs  walking
standing      239       1       0
stairs          5     171      64
walking         0     132     108
correct recognition rate: 71.9%
maximum value feature:
           standing  stairs  walking
standing      232       8       0
stairs         41     146      53
walking         0      72     168
correct recognition rate: 75.8%
Activity recognition example
[Scatter plot with decision boundaries in the 2-D feature space]
both features together (rows: true class, columns: assigned class):
           standing  stairs  walking
standing      239       1       0
stairs          0     209      31
walking         0      56     184
correct recognition rate: 87.8%
Another cross-validation method
used in HCI studies with multiple human subjects
subject-based leave-one-out cross validation
number of subjects: S
leave one subject's data out, train with the remaining data
repeat S times, each time testing with a different subject, then average
this gives an estimate of the expected correct recognition rate when a new user is encountered
Activity recognition example
minimum value feature, K-fold (rows: true class, columns: assigned class):
           standing  stairs  walking
standing      239       1       0
stairs          5     171      64
walking         0     132     108
correct recognition rate: 71.9%
minimum value feature, subject-based leave-one-out:
           standing  stairs  walking
standing      180      60       0
stairs         13     150      77
walking         1     125     114
correct recognition rate: 61.6%
maximum value feature, K-fold:
           standing  stairs  walking
standing      232       8       0
stairs         41     146      53
walking         0      72     168
correct recognition rate: 75.8%
maximum value feature, subject-based leave-one-out:
           standing  stairs  walking
standing      134     106       0
stairs         42     135      63
walking         0      71     169
correct recognition rate: 60.8%
Activity recognition example
[Scatter plot with decision boundaries, both features]
K-fold (rows: true class, columns: assigned class):
           standing  stairs  walking
standing      239       1       0
stairs          0     209      31
walking         0      56     184
correct recognition rate: 87.8%
subject-based leave-one-out:
           standing  stairs  walking
standing      206      34       0
stairs          0     182      58
walking         0      39     201
correct recognition rate: 81.8%
Dimensionality reduction
for most problems a few features are not enough
adding features sometimes helps
[Duda et al., 2000]
Dimensionality reduction
should we add as many features as we can?
what does this figure say?
[Jain et al., 2000]
Dimensionality reduction
we should add features up to a certain point
the more training samples we have, the farther away this point is
more features = higher-dimensional spaces
in higher dimensions, we need more samples to estimate the parameters and the densities accurately
the number of necessary training samples grows exponentially with the dimension of the feature space
this is called the curse of dimensionality
Dimensionality reduction
how many features to use?
rule of thumb: use at least ten times as many training samples as the number of features
which features to use?
difficult to know beforehand
one approach: consider many features and select among them
Pen input recognition
[Willems, 2010]
Touch gesture recognition
[Flagg et al., 2012]
Feature reduction and selection
form a set of many features
some of them might be redundant
feature reduction (sometimes called feature extraction):
form linear or nonlinear combinations of features
features in the reduced set usually don't have a physical meaning
feature selection:
select the most discriminative features from the set
Feature reduction
we will only consider Principal Component Analysis (PCA)
an unsupervised method: we don't care about the class labels
consider the distribution of all the feature vectors in the d-dimensional feature space
PCA is the projection onto a lower-dimensional space that "best represents the data"
get rid of unnecessary dimensions
Principal component analysis
how to "best represent the data"?
find the direction(s) in which the variance of the data is the largest
[Scatter plot of 2-D data with its principal directions]
Principal component analysis
find the covariance matrix Σ
spectral decomposition: Σ = V Λ Vᵀ
eigenvalues: on the diagonal of Λ
eigenvectors: the columns of V
the covariance matrix is symmetric and positive semidefinite, so the eigenvalues are nonnegative and the eigenvectors are orthogonal
Principal component analysis
put the eigenvalues in decreasing order
the corresponding eigenvectors show the principal directions, in which the variance of the data is largest
say we want to have m features only
project onto the space spanned by the first m eigenvectors
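A compact numpy sketch of this procedure (not from the slides; the function name is illustrative):

```python
import numpy as np

def pca_project(X, m):
    """Project (n, d) data onto the m principal directions of largest variance."""
    Xc = X - X.mean(axis=0)                   # center the data
    sigma = np.cov(Xc, rowvar=False)          # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(sigma)  # symmetric PSD, so eigh applies
    order = np.argsort(eigvals)[::-1]         # eigenvalues in decreasing order
    V = eigvecs[:, order[:m]]                 # first m eigenvectors as columns
    return Xc @ V                             # (n, m) reduced feature vectors
```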
Activity recognition example
[Altun et al., 2010a]
five sensor units (wrists, legs, chest)
each unit has three accelerometers, three gyroscopes, three magnetometers
45 sensors in total
computed 26 features from each sensor signal: mean, variance, min, max, Fourier transform, etc.
45 × 26 = 1170 features
Activity recognition example
compute the covariance matrix
find its eigenvalues and eigenvectors
plot the first 100 eigenvalues
reduced the number of features to 30
Activity recognition example
[Figure: the first 100 eigenvalues]
Activity recognition example
what does the Bayesian decision making (BDM) result suggest?
Feature reduction
ideally, this should be done for the training set only
estimate the covariance matrix from the training set, find the eigenvalues and eigenvectors, and form the projection
apply the projection to the test vectors
for K-fold cross validation, for example, this should be done K times
computationally expensive
Feature selection
alternatively, we can select from our large feature set
say we have d features and want to reduce the number to m
optimal way: evaluate all possibilities and choose the best one
not feasible except for small values of m and d
suboptimal methods: greedy search
Feature selection
best individual features
evaluate all d features individually, then select the best m features
Feature selection
sequential forward selection
start with the empty set
evaluate all features one by one, select the best one, and add it to the set
form pairs of features with this one and each of the remaining features, and add the best one to the set
form triplets of features with these two and each of the remaining features, and add the best one to the set
… (see the sketch below)
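A sketch of this greedy procedure (not from the slides), assuming an evaluate(subset) callback that scores a candidate feature subset, e.g., by a cross-validated correct recognition rate:

```python
def sequential_forward_selection(features, m, evaluate):
    """Greedy SFS: grow the set one feature at a time, keeping the best addition."""
    selected, remaining = [], list(features)
    while len(selected) < m and remaining:
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```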
Feature selection
sequential backward selection
start with the full feature set
evaluate by removing one feature at a time from the set, then remove the worst feature
continue the previous step with the current feature set
…
Feature selection
plus p, take away r selection
first enlarge the feature set by adding p features using sequential forward selection
then remove r features using sequential backward selection
Activity recognition example
[Figure: correct recognition rates using the first 5 features selected by sequential forward selection vs. the first 5 features from PCA; Altun et al., 2010b]
SFS performs better than PCA when only a few features are used; if 10-15 features are used, their performances become closer
time-domain features and leg features are more discriminative
Activity recognition example
[Altun et al., 2010b]
Discriminative methods
we talked about discriminant functions
for the MAP rule we used gi(x) = P(wi|x)
discriminative methods try to find gi(x) directly from the data
Linear discriminant functions
consider a discriminant function that is a linear combination of the components of x: g(x) = wᵀx + w0
for the two-class case, there is a single decision boundary: g(x) = 0
Linear discriminant functions
for the multiclass case, there are options:
c two-class problems: separate each class from all the others
consider classes pairwise: c(c−1)/2 boundaries
Linear discriminant functions
[Figures from Duda et al., 2000: distinguishing one class from the others; considering classes pairwise]
Linear discriminant functions
or, use the original definition
assign x to class i if gi(x) > gj(x) for all j ≠ i
[Duda et al., 2000]
Nearest mean classifier
find the means of the training vectors for each class
for a test vector y, assign the class of the nearest mean
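A minimal numpy sketch of this classifier (not from the slides; names are illustrative):

```python
import numpy as np

def fit_class_means(X, y):
    """Mean of the training vectors of each class."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def nearest_mean_classify(v, classes, means):
    """Assign the class of the nearest mean (Euclidean distance)."""
    return classes[np.argmin(np.linalg.norm(means - v, axis=1))]
```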
2-D example
artificial data
[Scatter plot of the artificial 2-D data]
2-D example
estimated parameters
[Figure: decision boundary with the true pdf vs. decision boundary with the nearest mean classifier]
Activity recognition example
[Scatter plot: standing, stairs, and walking samples with nearest mean decision boundaries]
k-nearest neighbor (k-NN) classifier
for a test vector y:
find the k closest training vectors
let ki be the number of training vectors belonging to class i among these k vectors; assign the class with the largest ki
simplest case: k = 1
just find the closest training vector and assign its class
decision boundaries: a Voronoi tessellation of the space
decision regions for k = 1:
[Figure from Duda et al., 2000: this is called a Voronoi tessellation]
[Figure: a test sample (circle) among training samples from the square and triangle classes; note how the decision is different for k=3 and k=5; from http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm]
no training is needed
computation time for testing is high
many techniques exist to reduce the computational load
alternatives exist for computing the distance:
Manhattan distance (L1 norm)
chessboard distance (L∞ norm)
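Putting the pieces together, a small numpy sketch of k-NN voting with a selectable norm (not from the slides; names are illustrative):

```python
import numpy as np

def knn_classify(v, train_X, train_y, k=3, order=2):
    """k-NN vote; order=2 Euclidean, order=1 Manhattan, order=np.inf chessboard."""
    dists = np.linalg.norm(train_X - v, ord=order, axis=1)
    nearest = np.argsort(dists)[:k]            # k closest training vectors
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]           # majority class among the k
```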
Haptics example
[Scatter plot of the 2-D feature space]
K-fold (rows: true class, columns: assigned class):
              stroke  scratch  light touch
stroke          52       6        12
scratch          7      40        23
light touch     13      16        41
correct recognition rate: 63.3%
subject-based leave-one-out:
              stroke  scratch  light touch
stroke          50       6        14
scratch          7      41        22
light touch     14      23        33
correct recognition rate: 59.0%
Activity recognition example
[Scatter plot of the 2-D feature space]
K-fold (rows: true class, columns: assigned class):
           standing  stairs  walking
standing      240       0       0
stairs          0     206      34
walking         0      38     202
correct recognition rate: 90.0%
subject-based leave-one-out:
           standing  stairs  walking
standing      240       0       0
stairs          0     202      38
walking         0      40     200
correct recognition rate: 89.2%
Activity recognition example
[Scatter plot: decision boundaries for k = 3]
Feature normalization
especially when computing distances, the scales of the feature axes are important
features with large ranges may be weighted more
feature normalization can be applied so that the ranges are similar
Feature normalization
linear scaling: x̃ = (x − l) / (u − l), where l is the lowest value and u is the largest value of the feature x
normalization to zero mean and unit variance: x̃ = (x − m) / s, where m is the mean value and s is the standard deviation of the feature x
other methods exist
Feature normalization
ideally, the parameters l, u, m, and s should be estimated from the training set only, and then used on the test vectors
for K-fold cross validation, for example, this should be done K times
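A small sketch of this fit-on-training, apply-everywhere pattern for zero mean and unit variance normalization (not from the slides; names are illustrative):

```python
import numpy as np

def zscore_fit(train_X):
    """Estimate m and s from the training set only."""
    return train_X.mean(axis=0), train_X.std(axis=0)

def zscore_apply(X, m, s):
    """Apply the training-set parameters to training or test vectors."""
    return (X - m) / s
```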
Discriminative methods
another popular method is the binary decision tree
start from the root node
proceed in the tree by setting thresholds on the feature values
sequentially answer questions like "is feature j less than threshold value Tk?"
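As a toy illustration only, a hand-built two-level tree in this spirit; the feature names and threshold values below are invented for the example, not taken from the slides:

```python
def classify_touch(features):
    """A hand-built two-level decision tree (illustrative thresholds)."""
    if features["minimum value"] < 1.0:      # "is feature j less than T_k?"
        return "light touch"
    elif features["maximum value"] < 4.2:
        return "stroke"
    else:
        return "scratch"
```

In practice the thresholds and the feature tested at each node are learned from the training data rather than set by hand.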
Activity recognition example
[Scatter plot: decision-tree decision boundaries for standing, stairs, and walking]
Discriminative methods
one very popular method is the support vector machine (SVM)
a linear classifier, applicable to linearly separable data
if the data is not linearly separable, it is mapped to a higher-dimensional space (usually a Hilbert space)
[Aksoy, 2011]
Comparison for activity recognition
1170 features reduced to 30 by PCA
19 activities
8 participants
References
S. Aksoy, Pattern Recognition lecture notes, Bilkent University, Ankara, Turkey, 2011.
A. Moore, Statistical Data Mining tutorials (http://www.autonlab.org/tutorials).
J. Tenenbaum, The Cognitive Science of Intuitive Theories lecture notes, Massachusetts Institute of Technology, MA, USA, 2006. (accessed online: http://www.mit.edu/~jbt/9.iap/9.94.Tenenbaum.ppt)
R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000.
A. K. Jain, R. P. W. Duin, J. Mao, "Statistical pattern recognition: a review," IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4-37, January 2000.
A. R. Webb, Statistical Pattern Recognition, 2nd ed., John Wiley & Sons, West Sussex, England, 2002.
V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer-Verlag New York, Inc., 2000.
K. Altun, B. Barshan, O. Tunçel, (2010a) "Comparative study on classifying human activities with miniature inertial/magnetic sensors," Pattern Recognition, 43(10):3605-3620, October 2010.
K. Altun, B. Barshan, (2010b) "Human activity recognition using inertial/magnetic sensor units," in Human Behavior Understanding, Lecture Notes in Computer Science, A. A. Salah et al. (eds.), vol. 6219, pp. 38-51, Springer, Berlin, Heidelberg, August 2010.
A. Flagg, D. Tam, K. MacLean, R. Flagg, "Conductive fur sensing for a gesture-aware furry robot," Proceedings of IEEE 2012 Haptics Symposium, March 4-7, 2012, Vancouver, B.C., Canada.
O. Tunçel, K. Altun, B. Barshan, "Classifying human leg motions with uniaxial piezoelectric gyroscopes," Sensors, 9(11):8508-8546, November 2009.
D. Willems, Interactive Maps: using the pen in human-computer interaction, PhD thesis, Radboud University Nijmegen, Netherlands, 2010. (accessed online: http://www.donwillems.net/waaaa/InteractiveMaps_PhDThesis_DWillems.pdf)