Training Conditional Random Fields using Virtual Evidence Boosting

Lin Liao, Tanzeem Choudhury†, Dieter Fox, and Henry Kautz
University of Washington; †Intel Research

Introduction

Goal: Develop an efficient feature selection and parameter estimation technique for Conditional Random Fields (CRFs).
Application domain: Learning human activity models from continuous, multi-modal sensory inputs.

Approaches to Training Conditional Random Fields (CRFs)

Maximum Likelihood (ML)
• Run numerical optimization to find the optimal weights, which requires inference at each iteration
• Inefficient for complex structures
• Inadequate for continuous observations and feature selection

Maximum Pseudo-Likelihood (MPL)
• Convert the CRF into separate patches, each consisting of a hidden node and the true values of its neighbors
• Run ML learning on the separate patches
• Efficient, but may over-estimate inter-dependencies
• Inadequate for continuous observations and feature selection

Our Approach: Virtual Evidence Boosting (VEB)
• Convert the CRF into separate patches, each consisting of a hidden node and virtual evidence about its neighbors
• Alternately run boosting (to select features) and belief propagation (to update the virtual evidence)
• An efficient, unified approach to feature selection and parameter estimation
• Suitable for both discrete and continuous observations

Algorithms

Extension of LogitBoost with Virtual Evidence
• Traditional boosting algorithms assume feature values are deterministic
• We extend the LogitBoost algorithm to handle virtual evidence, i.e., a feature value may itself be a likelihood or a probability distribution

INPUTS: training samples (ve(x_i), y_i), with y_i ∈ {0, 1} and 1 ≤ i ≤ N; F = 0
OUTPUT: F (a linear combination of features)
FOR each iteration
    FOR each sample i
        Compute the likelihood p_i = p(y_i | ve(x_i))
        Compute the sample weight w_i = p_i (1 − p_i)
        Compute the working response z_i = (y_i − p_i) / (p_i (1 − p_i))
    END
    Obtain the best weak learner by solving
        f* = argmin_f Σ_{i=1}^{N} Σ_{x_i} w_i ve(x_i) (f(x_i) − z_i)²
    Add the weak learner to F: F ← F + f*
END

Virtual Evidence Boosting for CRFs

INPUTS: the structure of the CRF and training samples (ve(x_i), y_i), with y_i ∈ {0, 1} and 1 ≤ i ≤ N; F = 0
OUTPUT: F (a linear combination of features)
FOR each iteration
    Run BP using the current F to get the virtual evidence ve(x_i, n(y_i))
    FOR each sample i
        Compute the likelihood p_i = p(y_i | ve(x_i))
        Compute the sample weight w_i = p_i (1 − p_i)
        Compute the working response z_i = (y_i − p_i) / (p_i (1 − p_i))
    END
    Obtain the best weak learner by solving
        f* = argmin_f Σ_{i=1}^{N} Σ_{x_i} w_i ve(x_i) (f(x_i) − z_i)²
    Add the weak learner to F: F ← F + f*
END
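The pseudocode above compresses several choices. As a concrete illustration, here is a minimal Python sketch of one reading of the extended LogitBoost loop for binary labels; the dict-of-probabilities encoding of ve(x_i), the logistic link, the candidate-pool interface, and the clipping constant are all illustrative assumptions, not the authors' implementation.

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def logitboost_ve(ve, y, candidates, n_iters=50):
    # ve[i]: dict mapping a feature value v -> its probability under the
    #        virtual evidence for sample i (hard evidence = a point mass)
    # y:     labels in {0, 1}
    # candidates: pool of weak learners, each a callable f(v) -> real score
    y = np.asarray(y, dtype=float)
    N = len(y)
    ensemble = []                               # F starts at 0

    def F(v):                                   # current combined score
        return sum(f(v) for f in ensemble)

    for _ in range(n_iters):
        # likelihood p_i = p(y_i | ve(x_i)), averaging over the evidence
        p = np.array([sum(prob * sigmoid(F(v)) for v, prob in ve[i].items())
                      for i in range(N)])
        w = p * (1.0 - p)                       # sample weights
        z = (y - p) / np.clip(w, 1e-10, None)   # working responses

        # weighted least squares with the expectation taken over the
        # virtual evidence: argmin_f sum_i sum_v w_i ve_i(v) (f(v) - z_i)^2
        def wls(f):
            return sum(w[i] * prob * (f(v) - z[i]) ** 2
                       for i in range(N) for v, prob in ve[i].items())

        ensemble.append(min(candidates, key=wls))
    return ensemble

With hard evidence (a point mass on each observed value) the inner sums collapse and this reduces to standard LogitBoost.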
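The VEB loop for CRFs then simply wraps the same inner step with belief propagation. A sketch building on logitboost_ve above; run_bp is a caller-supplied stand-in for BP on the CRF (a hypothetical interface, not from the poster) that returns the per-sample virtual evidence ve(x_i, n(y_i)) in the same dict format.

def veb_train(y, candidates, run_bp, n_iters=50):
    # run_bp(ensemble) -> list of virtual-evidence dicts: it should run
    # belief propagation on the CRF under the current ensemble and return,
    # for each node, its local evidence plus beliefs over its neighbors
    ensemble = []
    for _ in range(n_iters):
        ve = run_bp(ensemble)                    # update virtual evidence
        ensemble += logitboost_ve(ve, y, candidates, n_iters=1)  # one round
    return ensemble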
Experiments

Boosted Random Fields versus VEB
• The closest related work to VEB is Boosted Random Fields (BRFs) (Torralba et al., 2004)
• BRFs also combine boosting and belief propagation, but assume a dense graph structure and weak pairwise influence
• We compare the two approaches as the pairwise influence changes
• VEB performs significantly better when the relations are strong

Feature Selection
VEB can be used to extract sparse structure from complex models. In this experiment it recovers the exact order of a high-order HMM, and thus outperforms the other learning alternatives.

Application: Human Activity Recognition

We model human activities and select discriminatory features from multi-modal sensor data. Sensors include accelerometers, audio, light, temperature, etc.

Indoor Activities
• Activities: computer usage, meal, TV, meeting, and sleeping
• Linear-chain CRF with 315 continuous input features
• 1100 minutes of data over 12 days

Training Algorithm        Average accuracy
VEB                       94.1%
BRF                       88.0%
ML + all observations     87.7%
ML + boosting             88.5%
MPL + all observations    87.9%
MPL + boosting            88.5%

Physical Activities and Spatial Contexts
• Contexts: indoors, outdoors, and vehicles
• Activities: stationary, walking, running, driving, and going up/down stairs
• Approximately 650 continuous input features
• 400 minutes of data over 12 episodes

[Model figure: a CRF coupling a context sequence with an activity sequence]

Training Algorithm        Average accuracy
VEB                       88.8%
MPL + all observations    72.1%
MPL + boosting            70.9%
HMM + AdaBoost            85.8%
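Both applications above feed hundreds of continuous inputs into the learner, so the weak learners must handle real-valued features. Decision stumps are one natural choice, since the weighted least-squares step then has a closed-form fit; the sketch below covers the hard-evidence case, where ve(x_i) is a point mass and drops out of the objective (fit_stump, best_stump, and the threshold grid are illustrative names, not the authors' code).

import numpy as np

def fit_stump(x, z, w, h):
    # closed-form weighted least-squares fit of f(x) = a*[x >= h] + b:
    # b is the weighted mean response below the threshold, a + b above it
    below = x < h
    b = np.average(z[below], weights=w[below]) if w[below].sum() > 0 else 0.0
    a = (np.average(z[~below], weights=w[~below]) - b) if w[~below].sum() > 0 else 0.0
    return a, b

def best_stump(X, z, w, thresholds):
    # scan every feature column and candidate threshold; keep the stump
    # with the smallest weighted least-squares error
    best = None
    for j in range(X.shape[1]):
        for h in thresholds:
            a, b = fit_stump(X[:, j], z, w, h)
            pred = np.where(X[:, j] >= h, a + b, b)
            err = float(np.sum(w * (pred - z) ** 2))
            if best is None or err < best[0]:
                best = (err, j, h, a, b)
    return best  # (error, feature index, threshold, a, b)

Feature selection then falls out of boosting itself: each round's winning stump names the (feature, threshold) pair that currently reduces the loss most, which is how VEB extracts sparse structure in the feature-selection experiment above.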