Training Conditional Random Fields using Virtual Evidence Boosting
Lin Liao, Tanzeem Choudhury†, Dieter Fox, and Henry Kautz
University of Washington; †Intel Research
Introduction
Goal: Develop an efficient feature selection and parameter estimation technique for Conditional Random Fields (CRFs)
Application domain: Learning human activity models from continuous, multi-modal sensory inputs
Approaches to Training Conditional Random Fields (CRFs)

Maximum Likelihood (ML)
• Run numerical optimization to find the optimal weights, which requires inference at each iteration (the ML and MPL objectives are written out after this comparison)
• Inefficient for complex structures
• Inadequate for continuous observations and feature selection

Maximum Pseudo-Likelihood (MPL)
• Convert the CRF into separate patches; each consists of a hidden node and the true values of its neighbors
• Run ML learning on the separate patches
• Efficient, but may over-estimate inter-dependencies
• Inadequate for continuous observations and feature selection

Our Approach: Virtual Evidence Boosting (VEB)
• Convert the CRF into separate patches; each consists of a hidden node and the virtual evidence of its neighbors
• Alternately run boosting (to select features) and belief propagation (to update the virtual evidence)
• An efficient and unified approach to feature selection and parameter estimation
• Suitable for both discrete and continuous observations
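For reference, the two classical objectives can be written out as follows. This is the standard formulation (our addition, not from the poster), with λ the feature weights, Z(x; λ) the partition function, and n(y_i) the neighbors of label y_i:

% Maximum likelihood: the global partition function Z(x; lambda) couples
% all labels, so every gradient step requires inference over the full CRF.
\log p(\mathbf{y} \mid \mathbf{x}; \lambda)
    = \sum_{k} \lambda_k f_k(\mathbf{y}, \mathbf{x}) - \log Z(\mathbf{x}; \lambda)

% Maximum pseudo-likelihood: each label is conditioned on the true values
% of its neighbors, so normalization is local and cheap, but the strength
% of the inter-dependencies can be over-estimated.
\mathrm{PL}(\lambda) = \sum_{i} \log p\big(y_i \mid n(y_i), \mathbf{x}; \lambda\big)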
Algorithms
Extension of LogitBoost with Virtual Evidence
• Traditional boosting algorithms assume feature values to be deterministic
• We extend the LogitBoost algorithm to handle virtual evidence, i.e., a feature can also be a likelihood value or a probability distribution (see the sketch after the pseudocode below)
INPUTS: training samples (ve(x_i), y_i), with y_i ∈ {0,1}, 1 ≤ i ≤ N, and F = 0
OUTPUT: F (linear combination of features)
FOR each iteration
  FOR each sample i
    Compute likelihood p_i = p(y_i | ve(x_i))
    Compute sample weight w_i = p_i (1 − p_i)
    Compute working response z_i = (2y_i − 1) / p_i
  END
  Obtain the best weak learner by solving f* = argmin_f Σ_{i=1}^{N} Σ_{x_i=1}^{X} w_i ve(x_i) (f(x_i) − z_i)²
  Add the weak learner f* to F
END
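To make the update concrete, here is a minimal NumPy sketch of one such iteration. It is our illustration, not the authors' code: it assumes each virtual evidence ve[i] is a distribution over X discrete values of x_i, that the ensemble F is a table of per-value scores with the LogitBoost link p(y=1|x) = 1/(1 + e^(−2F(x))), and that the weak learner is itself a per-value table, for which the weighted least-squares problem has a closed form.

import numpy as np

def logitboost_ve_iteration(ve, y, F):
    """One extended-LogitBoost iteration with virtual evidence.

    ve: (N, X) array; ve[i] is a distribution over the X values of x_i.
    y:  (N,) array of binary labels in {0, 1}.
    F:  (X,) array of current ensemble scores per feature value.
    Returns the new weak learner f as an (X,) table of per-value outputs.
    """
    # Likelihood of the TRUE label, marginalizing x_i over its virtual evidence.
    p1 = ve @ (1.0 / (1.0 + np.exp(-2.0 * F)))   # p(y_i = 1 | ve(x_i))
    p = np.where(y == 1, p1, 1.0 - p1)           # p_i = p(y_i | ve(x_i))

    w = p * (1.0 - p)                            # sample weights w_i
    z = (2.0 * y - 1.0) / p                      # working responses z_i

    # Closed-form weighted least squares: for each value x, the minimizer of
    # sum_i w_i ve_i(x) (f(x) - z_i)^2 is the weighted mean of the z_i.
    num = (w * z) @ ve                           # sum_i w_i ve_i(x) z_i
    den = np.maximum(w @ ve, 1e-12)              # sum_i w_i ve_i(x)
    return num / den

When ve[i] is an indicator vector (deterministic features), the marginalization is trivial and this reduces to the standard LogitBoost update.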
Virtual Evidence Boosting for CRFs

INPUTS: structure of the CRF and training samples (ve(x_i), y_i), with y_i ∈ {0,1}, 1 ≤ i ≤ N, and F = 0
OUTPUT: F (linear combination of features)
FOR each iteration
  Run BP using the current F to get virtual evidence ve(x_i, n(y_i))
  FOR each sample i
    Compute likelihood p_i = p(y_i | ve(x_i))
    Compute sample weight w_i = p_i (1 − p_i)
    Compute working response z_i = (2y_i − 1) / p_i
  END
  Obtain the best weak learner by solving f* = argmin_f Σ_{i=1}^{N} Σ_{x_i=1}^{X} w_i ve(x_i) (f(x_i) − z_i)²
  Add the weak learner f* to F
END
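The outer loop alternates belief propagation with the boosting step above. A minimal sketch under the same assumptions, reusing logitboost_ve_iteration; run_bp is a hypothetical placeholder for a sum-product belief propagation routine that returns, for each node, its virtual evidence ve(x_i, n(y_i)) as a distribution over X local configurations:

import numpy as np

def run_bp(crf_structure, obs_evidence, F):
    """Hypothetical placeholder: a real implementation would run sum-product
    BP on the CRF and fold the neighbors' label messages into each node's
    evidence. This stub returns the observation evidence unchanged, i.e.,
    it ignores the pairwise connections."""
    return obs_evidence

def train_veb(crf_structure, obs_evidence, y, iterations=50):
    N, X = obs_evidence.shape
    F = np.zeros(X)                    # ensemble score per configuration
    for _ in range(iterations):
        # 1. BP turns each hidden node's neighborhood into virtual evidence,
        #    so every node becomes an independent training patch.
        ve = run_bp(crf_structure, obs_evidence, F)
        # 2. One boosting step on the patches selects the next weak learner.
        F = F + logitboost_ve_iteration(ve, y, F)
    return F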
Experiments
Boosted Random Fields versus VEB
• The closest related work to VEB is Boosted Random Fields (BRFs) (Torralba 2004)
• BRFs combine boosting and belief propagation, but assume a dense graph structure and weak pairwise influence
• We compare the two approaches as the pairwise influence changes
• VEB performs significantly better when relations are strong
Feature Selection
VEB can be used to extract sparse structure from complex models. In this experiment it recovers the exact order of a high-order HMM, and thus outperforms the other learning alternatives.
Application: Human Activity Recognition
Model human activities and select discriminative features from multi-modal sensor data. Sensors include an accelerometer, audio, light, temperature, etc.
Indoor Activities
• Activities: computer usage, meal, TV, meeting, and sleeping
• Linear-chain CRF with 315 continuous input features
• 1,100 minutes of data over 12 days
Training Algorithm        Average Accuracy
VEB                       94.1%
BRF                       88.0%
ML + all observations     87.7%
ML + boosting             88.5%
MPL + all observations    87.9%
MPL + boosting            88.5%

Physical Activities and Spatial Contexts
• Contexts: indoors, outdoors, and vehicles
• Activities: stationary, walking, running, driving, and going up/down stairs
• Approximately 650 continuous input features (a stump weak learner for such continuous inputs is sketched after the results table)
• 400 minutes of data over 12 episodes

[Figure: inferred context sequence and activity sequence]

Training Algorithm        Average Accuracy
VEB                       88.8%
MPL + all observations    72.1%
MPL + boosting            70.9%
HMM + AdaBoost            85.8%
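For continuous sensor features like these, a natural weak learner is a decision stump fit to the working responses by the same weighted least-squares criterion. A minimal sketch (our illustration, not the authors' code, assuming a deterministic continuous observation so the virtual-evidence sum drops out; fit_stump and its quantile threshold grid are our choices):

import numpy as np

def fit_stump(x, z, w, n_thresholds=32):
    """x: (N,) continuous feature; z: (N,) working responses; w: (N,) weights.
    Returns (threshold, left_value, right_value) minimizing the weighted
    squared error sum_i w_i (f(x_i) - z_i)^2 over decision stumps f."""
    best_err, best = np.inf, None
    for t in np.quantile(x, np.linspace(0.05, 0.95, n_thresholds)):
        left = x <= t
        if not left.any() or left.all():
            continue
        # Weighted means are the least-squares-optimal outputs on each side.
        a = np.average(z[left], weights=w[left])
        b = np.average(z[~left], weights=w[~left])
        err = np.sum(w * (np.where(left, a, b) - z) ** 2)
        if err < best_err:
            best_err, best = err, (t, a, b)
    return best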