Processing Sequential Sensor Data
John Krumm
Microsoft Research, Redmond, Washington, USA
jckrumm@microsoft.com

Interpret a Sequential Signal
[Figure: "1-D Signal", signal value vs. time (seconds)]
Signal is
• Often a function of time (as above)
• Often from a sensor

Pervasive/Ubicomp Examples
Signal sources
• Accelerometer
• Light sensor
• Gyro sensor
• Indoor location
• GPS
• Microphone
• …
Interpretations
• Speed
• Mode of transportation
• Location
• Moving vs. not moving
• Proximity to other people
• Emotion
• …

Goals of this Tutorial
• Confidence to add sequential signal processing to your research
• Ability to assess research with simple sequential signal processing
• Know the terminology
• Know the basic techniques
• How to implement
• Where they’re appropriate
• Assess numerical results in an accepted way
• At least give the appearance that you know what you’re talking about

Not Covering
• Regression – fit function to data
• Classification – classify things based on measured features
• Statistical tests – determine if data support a hypothesis

Outline
• Introduction (already done!)
• Signal terminology and assumptions
• Running example
• Filtering
  • Mean and median filters
  • Kalman filter
  • Particle filter
  • Hidden Markov model
• Presenting performance results

Signal Dimensionality
• 1D: $z(t)$
• 2D: $\mathbf{z}(t) = \begin{pmatrix} z_1(t) \\ z_2(t) \end{pmatrix}$ (bold means vector)
[Figures: "1-D Signal", value vs. time (seconds); "2-D Signal", z2 (meters) vs. z1 (meters)]

Sampled Signal
Cannot measure or store a continuous signal, so take samples instead:
$[\, z(0), z(\Delta), z(2\Delta), \ldots, z((n-1)\Delta) \,] = [\, z_1, z_2, z_3, \ldots, z_n \,]$
Δ = sampling interval, e.g.
1 second, 5 minutes, …
[Figure: "1-D Signal" sampled at Δ = 0.1 seconds, value vs. time (seconds)]

Signal + Noise
$z_i = x_i + v_i$
• $z_i$: measurement from noisy sensor
• $x_i$: actual value, but unknown
• $v_i$: random number representing sensor noise
Noise
• Often assumed to be Gaussian
• Often assumed to be zero mean
• Often assumed to be i.i.d. (independent, identically distributed)
• $v_i \sim N(0, \sigma)$ for zero mean, Gaussian, i.i.d.; σ is standard deviation
[Figure: "1-D Signal" with noise, value vs. time (seconds)]

Running Example
Track a moving person in (x,y)
• 1000 (x,y) measurements
• Δ = 1 second
$\mathbf{z}_i = \mathbf{x}_i + \mathbf{v}_i$ (measurement vector = actual location + noise)
$\mathbf{x}_i = \begin{pmatrix} x_i \\ y_i \end{pmatrix} = (x_i, y_i)^T$
$\mathbf{v}_i = \begin{pmatrix} v_i^{(x)} \\ v_i^{(y)} \end{pmatrix} \sim \begin{pmatrix} N(0,3) \\ N(0,3) \end{pmatrix}$ (zero mean, standard deviation = 3 meters)
Also 10 randomly inserted outliers with N(0,15)
[Figure: "Actual Path and Measured Locations", y (meters) vs. x (meters), with the start and an outlier marked]

Mean Filter
• Also called “moving average” and “box car filter”
• Apply to x and y measurements separately
[Figure: $z^{(x)}$ vs. t; the filtered version of a point is the mean of the points in the solid box]
• “Causal” filter because it doesn’t look into future
• Causes lag when values change sharply
• Help fix with decaying weights
• Sensitive to outliers, i.e.
one really bad point can cause mean to take on any value
• Simple and effective (I will not vote to reject your paper if you use this technique)

Mean Filter
[Figures: "Actual Path and Measured Locations" (outlier marked) and "Mean Filter" output, y (meters) vs. x (meters); 10 points in each mean]
• Outlier has noticeable impact
• If only there were some convenient way to fix this …

Median Filter
[Figure: $z^{(x)}$ vs. t; the filtered version of a point is the median of the points in the solid box, which is insensitive to the value of an outlying point]
Median is far less sensitive to outliers than mean:
• median(1, 3, 4, 7, 1 × 10^10) = 4
• mean(1, 3, 4, 7, 1 × 10^10) ≈ 2 × 10^9

Median Filter
[Figures: "Actual Path and Measured Locations" (outlier marked) and "Median Filter" output, y (meters) vs. x (meters); 10 points in each median]
• Outlier has noticeably less impact

Mean and Median Filter
[Figure: "Mean and Median Filter", mean and median outputs overlaid, y (meters) vs. x (meters)]
The median is almost always better to use than the mean.
Editorial: mean vs.
median

Kalman Filter
• Mean and median filters assume smoothness
• Kalman filter adds assumption about trajectory
[Figure: assumed trajectory is parabolic; weight data against assumptions about system’s dynamics]
[Image: “My favorite book on Kalman filtering”]
Big difference #1: Kalman filter includes (helpful) assumptions about behavior of measured process

Kalman Filter
Kalman filter separates measured variables from state variables
• Measure: $\mathbf{z}_i = \begin{pmatrix} z_i^{(x)} \\ z_i^{(y)} \end{pmatrix}$
  Running example: measure (x,y) coordinates (noisy)
• Infer state: $\mathbf{x}_i = \begin{pmatrix} x_i \\ y_i \\ v_i^{(x)} \\ v_i^{(y)} \end{pmatrix}$
  Running example: estimate location and velocity (!)
[Figure: measured locations, y (meters) vs. x (meters)]
Big difference #2: Kalman filter can include state variables that are not measured directly

Kalman Filter Measurements
Measurement vector is related to state vector by a matrix multiplication plus noise.
$\mathbf{z}_i = H_i \mathbf{x}_i + \mathbf{v}_i$
Running example:
$\begin{pmatrix} z_i^{(x)} \\ z_i^{(y)} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ v_i^{(x)} \\ v_i^{(y)} \end{pmatrix} + N(\mathbf{0}, R_i)$
$z_i^{(x)} = x_i + N(0, \sigma_r)$
$z_i^{(y)} = y_i + N(0, \sigma_r)$
Sleepy eyes threat level: orange
• In this case, measurements are just noisy copies of actual location
• Makes sensor noise explicit, e.g.
GPS has σ of around 5 meters

Kalman Filter Dynamics
Insert a bias for how we think system will change through time
$\mathbf{x}_i = \Phi_{i-1} \mathbf{x}_{i-1} + \mathbf{w}_{i-1}$
$\begin{pmatrix} x_i \\ y_i \\ v_i^{(x)} \\ v_i^{(y)} \end{pmatrix} = \begin{pmatrix} 1 & 0 & \Delta t_i & 0 \\ 0 & 1 & 0 & \Delta t_i \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_{i-1} \\ y_{i-1} \\ v_{i-1}^{(x)} \\ v_{i-1}^{(y)} \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ N(0, \sigma_s) \\ N(0, \sigma_s) \end{pmatrix}$
• $x_i = x_{i-1} + \Delta t_i v_{i-1}^{(x)}$: location is standard straight-line motion
• $v_i^{(x)} = v_{i-1}^{(x)} + N(0, \sigma_s)$: velocity changes randomly (because we don’t have any idea what it actually does)

Kalman Filter Ingredients
• H matrix: gives measurements for given state
  $H_i = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$
• Measurement noise: sensor noise
  $N(\mathbf{0}, R_i)$
• Φ matrix: gives time dynamics of state
  $\Phi_{i-1} = \begin{pmatrix} 1 & 0 & \Delta t_i & 0 \\ 0 & 1 & 0 & \Delta t_i \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
• Process noise: uncertainty in dynamics model
  $N(\mathbf{0}, Q_i)$

Kalman Filter Recipe
$\hat{\mathbf{x}}_i^{(-)} = \Phi_{i-1} \hat{\mathbf{x}}_{i-1}^{(+)}$
$P_i^{(-)} = \Phi_{i-1} P_{i-1}^{(+)} \Phi_{i-1}^T + Q_{i-1}$
$K_i = P_i^{(-)} H_i^T \left( H_i P_i^{(-)} H_i^T + R_i \right)^{-1}$
$\hat{\mathbf{x}}_i^{(+)} = \hat{\mathbf{x}}_i^{(-)} + K_i \left( \mathbf{z}_i - H_i \hat{\mathbf{x}}_i^{(-)} \right)$
$P_i^{(+)} = \left( I - K_i H_i \right) P_i^{(-)}$
• Just plug in measurements and go
• Recursive filter – current time step uses state and error estimates from previous time step
Sleepy eyes threat level: red
Big difference #3: Kalman filter gives uncertainty estimate in the form of a Gaussian covariance matrix

Kalman Filter
Velocity model: $v_i^{(x)} = v_{i-1}^{(x)} + N(0, \sigma_s)$
[Figure: "Kalman Filter" output, y (meters) vs. x (meters)]
• Smooth
• Tends to overshoot corners
• Too much dependence on straight line velocity assumption
• Too little dependence on data

Kalman Filter
Velocity model: $v_i^{(x)} = v_{i-1}^{(x)} + N(0, \sigma_s)$
[Figure: "Kalman Filter", untuned and tuned outputs, y (meters) vs. x (meters)]
• Hard to pick process noise $\sigma_s$
• Process noise models our uncertainty in system dynamics
• Here it accounts for fact that motion is not a straight line
“Tuning” $\sigma_s$ (by trying a bunch of values) gives better result

Kalman Filter
The Kalman filter was fine back in the old days.
But I really prefer more modern methods that are not saddled with Kalman’s restrictions on continuous state variables and linearity assumptions.
Editorial: Kalman filter

Particle Filter
[Figure: Dieter Fox et al., WiFi tracking in a multi-floor building]
• Multiple “particles” as hypotheses
• Particles move based on probabilistic motion model
• Particles live or die based on how well they match sensor data

Particle Filter
[Figure: Dieter Fox et al.]
• Allows multi-modal uncertainty (Kalman is unimodal Gaussian)
• Allows continuous and discrete state variables (e.g. 3rd floor)
• Allows rich dynamic model (e.g. must follow floor plan)
• Can be slow, especially if state vector dimension is too large (e.g. (x, y, identity, activity, next activity, emotional state, …))

Particle Filter Ingredients
• z = measurement, x = state, not necessarily same
• Probability distribution of a measurement given actual value: $p(\mathbf{z}_i \mid \mathbf{x}_i)$
• Can be anything, not just Gaussian like Kalman
• But we use Gaussian for running example, just like Kalman
E.g. measured speed (in z) will be slower if emotional state (in x) is “tired”
[Figure: $p(z_i \mid x_i)$ centered on $x_i$; for the running example, the measurement is a noisy version of the actual value]

Particle Filter Ingredients
• Probabilistic dynamics, how state changes through time
• Can be anything, e.g.
  • Tend to go slower up hills
  • Avoid left turns
  • Attracted to Scandinavian people
• Closed form not necessary
• Just need a dynamic simulation with a noise component
• But we use Gaussian for running example, just like Kalman
[Figure: $p(\mathbf{x}_i \mid \mathbf{x}_{i-1})$; $\mathbf{x}_i$ is $\mathbf{x}_{i-1}$ displaced by a random vector]

Home Example
Rich measurement and state dynamics models
• Measurements: $\mathbf{z}$ = ((x,y) location in house from WiFi)^T
• State (what we want to estimate): $\mathbf{x}$ = (room, activity)
Measurement model $p(\mathbf{z}_i \mid \mathbf{x}_i)$
• p((x,y) in kitchen | in bathroom) = 0
Dynamics model $p(\mathbf{x}_i \mid \mathbf{x}_{i-1})$
• p(sleeping now | sleeping previously) = 0.9
• p(cooking now | working previously) = 0.02
• p(watching TV & sleeping | *) = 0
• p(bedroom 4 | master bedroom) = 0

Particle Filter Algorithm
Start with N instances of state vector $\mathbf{x}_i^{(j)}$, i = 0, j = 1 … N
1. i = i+1
2. Take new measurement $\mathbf{z}_i$
3. Propagate particles forward in time with $p(\mathbf{x}_i \mid \mathbf{x}_{i-1})$, i.e. generate new, random hypotheses
4. Compute importance weights $w_i^{(j)} = p(\mathbf{z}_i \mid \mathbf{x}_i^{(j)})$, i.e. how well does measurement support hypothesis?
5. Normalize importance weights so they sum to 1.0
6. Randomly pick new particles based on importance weights
7. Go to 1
Compute state estimate
• Weighted mean (assumes unimodal)
• Median
Sleepy eyes threat level: orange

Particle Filter Running Example
• Measurement model $p(\mathbf{z}_i \mid \mathbf{x}_i)$ reflects true, simulated measurement noise. Same as Kalman in this case.
• Dynamics model $p(\mathbf{x}_i \mid \mathbf{x}_{i-1})$ is straight line motion with random velocity change. Same as Kalman in this case.
  • $x_i = x_{i-1} + \Delta t_i v_{i-1}^{(x)}$: location is standard straight-line motion
  • $v_i^{(x)} = v_{i-1}^{(x)} + N(0, \sigma_s)$: velocity changes randomly (because we don’t have any idea what it actually does)
[Figure: "Particle Filter", actual path and estimates with 1000 and 1,000,000 particles, y (meters) vs. x (meters)]
Sometimes increasing the number of particles helps

Particle Filter Resources
[Images: a UbiComp 2004 publication and a book, “Especially Chapter 1”]

Particle Filter
The particle filter is wonderfully rich and expressive if you can afford the computations. Be careful not to let your state vector get too large.
Editorial: Particle filter

Hidden Markov Model (HMM)
Big difference from previous: states are discrete, e.g.
• Spoken phoneme
• {walking, driving, biking, riding bus}
• {moving, still}
• {cooking, sleeping, watching TV, playing game, … }
[Images: Markov (1856 - 1922); “Hidden Markov”]

(Unhidden) Markov Model
[Figure: state diagram over bus, walk, and drive with a transition probability on each arrow; example inspired by a UbiComp 2003 paper]
• Move to new state (or not)
  • at every time click
  • when finished with current state
• Transition probabilities control state transitions

Hidden Markov Model
[Figure: the same bus, walk, and drive state diagram, observed through an accelerometer]
Can “see” states only via noisy sensor

HMM: Two Parts
Two parts to every HMM:
1) Observation probabilities $P(X_i^{(j)} \mid z_i)$ – probability of state j given measurement at time i
2) Transition probabilities $a_{jk}$ – probability of transition from state j to state k
Chain along a path: initial state probabilities $P(X_0^{(j)})$ → transition $a_{jk}$ → observation $P(X_1^{(j)} \mid z_1)$ → transition $a_{jk}$ → observation $P(X_2^{(j)} \mid z_2)$ → transition $a_{jk}$ → observation $P(X_3^{(j)} \mid z_3)$
• Find path that maximizes product of probabilities (observation & transition)
• Use Viterbi algorithm to find path
efficiently

Smooth Results with HMM
[Figure: "Signal Strength" vs. time (sec.), with still and moving intervals marked; moving vs. still is inferred from noise variance]
• Signal strength has higher variance when moving → observation probabilities
• Transitions between states relatively rare (made-up numbers) → transition probabilities
  • still → still: 0.99989, still → moving: 0.00011
  • moving → moving: 0.99989, moving → still: 0.00011

Smooth Results with HMM
[Figure: trellis of still and moving states over four time steps, with transition probabilities 0.99989 (stay) and 0.00011 (switch) and observation probabilities 0.4, 0.2, 0.9, 0.3 for still and 0.6, 0.8, 0.1, 0.7 for moving]
Viterbi algorithm finds path with maximum product of observation and transition probabilities
[Figure: "Still vs. Moving Estimate" vs. time (seconds): actual, inferred from noise variance, and inferred and smoothed with HMM]
Results in fewer false transitions between states, i.e. smoother and slightly more accurate

Running Example
Hidden Markov Model
• Discrete states are 10,000 1m x 1m squares
• Observation probabilities spread in Gaussian over nearby squares as per measurement noise model
• Transition probabilities go to 8-connected neighbors:
  0.011762  0.136136  0.011762
  0.13964   0.401401  0.13964
  0.011762  0.136136  0.011762
[Figure: HMM output, y (meters) vs. x (meters)]

HMM Reference
[Image: HMM reference publication]
• Good description of Viterbi algorithm
• Also how to learn model from data

Hidden Markov Model
The HMM is great for certain applications when your states are discrete.
Editorial: Hidden Markov Model
Tracking in (x,y,z) with HMM?
• Huge state space (→ slow)
• Long dwells
• Interactions with other airplanes

Presenting Continuous Performance Results
Tracking Error vs.
Filter
[Figures: two bar charts of mean error and median error (meters) per method; the first covers Measured, Mean, Median, Kalman (untuned), Kalman (tuned), Particle, and HMM, the second omits Measured]
Euclidean distance error: $e_i = \lVert \hat{\mathbf{x}}_i - \mathbf{x}_i \rVert$, where $\hat{\mathbf{x}}_i$ is the estimated value and $\mathbf{x}_i$ is the actual value
Plot mean or median of Euclidean distance error
• Median is less sensitive to error outliers
Note: Don’t judge these filtering methods based on these plots. I didn’t spend much time tuning the methods to improve their performance.

Presenting Continuous Performance Results
Cumulative error distribution
• Shows how errors are distributed
• More detailed than just a mean or median error
[Figure: "Cumulative Error Distribution", fraction vs. error (meters) for HMM, Kalman (tuned), Particle, Mean, and Kalman (untuned), with the median and 95th percentile marked]
• 95% of the time, the particle filter gives an error of 6 meters or less (95th percentile error)
• 50% of the time, the particle filter gives an error of 2 meters or less (median error)

Presenting Discrete Performance Results
Techniques like particle filter and HMM can classify sequential data into discrete classes
Confusion matrix (Pervasive 2006); rows are actual activities, columns are inferred activities:
Actual \ Inferred  Sitting  Standing  Walking  Up stairs  Down stairs  Elevator down  Elevator up  Brushing teeth
Sitting            75%      24%       1%       0%         0%           0%             0%           0%
Standing           29%      55%       6%       1%         0%           4%             3%           2%
Walking            4%       7%        79%      3%         4%           1%             1%           1%
Up stairs          0%       1%        4%       95%        0%           0%             1%           0%
Down stairs        0%       1%        7%       0%         89%          2%             0%           0%
Elevator down      0%       2%        1%       0%         8%           87%            1%           0%
Elevator up        0%       2%        2%       6%         0%           3%             87%          0%
Brushing teeth     2%       10%       3%       0%         0%           0%             0%           85%

End
• Introduction
• Signal terminology and assumptions
• Running example
• Filtering
  • Mean and median filters
  • Kalman filter
  • Particle filter
  • Hidden Markov model
• Presenting performance results
[Figures: "Actual Path and Measured Locations", y (meters) vs. x (meters); "Tracking Error vs. Filter", mean error and median error (meters) per method]

Ubiquitous Computing Fundamentals, CRC Press, © 2010
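
Sketch: Mean and Median Filters
The Mean Filter and Median Filter slides describe a causal window: each output uses the current sample and the ones before it. A minimal Python sketch of that idea follows. The function names, the shortened window at the start of the signal, and the sample numbers are assumptions made for illustration, not the tutorial's implementation.

```python
from statistics import mean, median


def causal_filter(values, window, reducer):
    """Apply `reducer` to the current sample and up to window-1 earlier ones.

    Assumption: near the start, where fewer than `window` samples exist,
    the window simply shrinks to whatever is available.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(reducer(values[start:i + 1]))
    return out


def mean_filter(values, window=10):
    """Causal moving average ("box car") filter."""
    return causal_filter(values, window, mean)


def median_filter(values, window=10):
    """Causal moving median filter."""
    return causal_filter(values, window, median)


# Made-up x coordinates with one gross outlier at index 5.
zx = [10.0, 11.0, 12.0, 13.0, 14.0, 500.0, 16.0, 17.0, 18.0, 19.0]
mean_x = mean_filter(zx, window=5)      # the outlier drags the mean upward
median_x = median_filter(zx, window=5)  # the median barely notices it
```

In the running example the same function would be applied to the x measurements and the y measurements separately, as the Mean Filter slide suggests.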
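
Sketch: Kalman Filter Recipe
The Kalman Filter Recipe slide can be turned into code fairly directly. In the running example, H, Φ, and the noise terms never mix x with y, so this sketch filters one coordinate at a time with state (position, velocity), which keeps the matrices 2×2 and avoids a matrix library. That simplification, the function name `kalman_axis`, the initial state, the initial covariance `p0`, and the default σ values are all assumptions for illustration. N(0, σ) is read as standard deviation σ, so variances are σ squared.

```python
def kalman_axis(zs, dt=1.0, sigma_r=3.0, sigma_s=0.5, p0=1000.0):
    """Constant-velocity Kalman filter for one coordinate.

    State is (position, velocity); only position is measured (H = [1, 0]).
    Returns a list of (position, velocity) estimates, one per measurement.
    """
    r = sigma_r ** 2              # measurement noise variance (R)
    q = sigma_s ** 2              # process noise variance on velocity (Q)
    pos, vel = zs[0], 0.0         # assumed start: first measurement, at rest
    P = [[p0, 0.0], [0.0, p0]]    # assumed large initial uncertainty
    estimates = []
    for z in zs:
        # Predict state: x(-) = Phi x(+)
        pos_pred = pos + dt * vel
        vel_pred = vel
        # Predict covariance: P(-) = Phi P(+) Phi^T + Q
        a, b = P[0]
        c, d = P[1]
        p00 = a + dt * (b + c) + dt * dt * d
        p01 = b + dt * d
        p10 = c + dt * d
        p11 = d + q
        # Gain: K = P(-) H^T (H P(-) H^T + R)^-1, a scalar inverse here
        s = p00 + r
        k0, k1 = p00 / s, p10 / s
        # Update state: x(+) = x(-) + K (z - H x(-))
        innovation = z - pos_pred
        pos = pos_pred + k0 * innovation
        vel = vel_pred + k1 * innovation
        # Update covariance: P(+) = (I - K H) P(-)
        P = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        estimates.append((pos, vel))
    return estimates
```

Running `kalman_axis` once on the x measurements and once on the y measurements gives location and velocity estimates for both axes. Trying several `sigma_s` values is the "tuning" mentioned on the Kalman Filter slides.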
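
Sketch: Particle Filter Algorithm
The numbered steps on the Particle Filter Algorithm slide map onto a short loop. The sketch below uses the running example's Gaussian measurement model and straight-line motion with random velocity change. The function name `particle_filter`, the way particles are initialized around the first measurement, the fallback to uniform weights, and the default parameters are assumptions for illustration. The state estimate is the weighted mean, which assumes a unimodal distribution.

```python
import math
import random


def particle_filter(zs, n=500, dt=1.0, sigma_r=3.0, sigma_s=0.5, seed=0):
    """Track (x, y) from noisy measurements zs = [(zx, zy), ...].

    Each particle is a hypothesis (x, y, vx, vy).
    Returns one weighted-mean (x, y) estimate per measurement.
    """
    rng = random.Random(seed)
    x0, y0 = zs[0]
    # Assumed initialization: scatter particles around the first measurement.
    particles = [(rng.gauss(x0, sigma_r), rng.gauss(y0, sigma_r), 0.0, 0.0)
                 for _ in range(n)]
    estimates = []
    for zx, zy in zs:                      # steps 1-2: next measurement
        # Step 3: propagate with p(x_i | x_{i-1}): move, then perturb velocity.
        moved = [(x + dt * vx, y + dt * vy,
                  vx + rng.gauss(0.0, sigma_s), vy + rng.gauss(0.0, sigma_s))
                 for x, y, vx, vy in particles]
        # Step 4: importance weights from the Gaussian p(z_i | x_i).
        weights = [math.exp(-((zx - x) ** 2 + (zy - y) ** 2)
                            / (2.0 * sigma_r ** 2))
                   for x, y, _, _ in moved]
        # Step 5: normalize so the weights sum to 1.0.
        total = sum(weights)
        if total == 0.0:                   # every particle is far away
            weights = [1.0 / n] * n
        else:
            weights = [w / total for w in weights]
        # State estimate: weighted mean of particle locations.
        estimates.append((sum(w * p[0] for w, p in zip(weights, moved)),
                          sum(w * p[1] for w, p in zip(weights, moved))))
        # Step 6: resample particles in proportion to their weights.
        particles = rng.choices(moved, weights=weights, k=n)
    return estimates
```

Raising `n` trades computation for a denser set of hypotheses, which is the effect the running example shows with 1000 versus 1,000,000 particles.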
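
Sketch: Viterbi Algorithm for Still vs. Moving
The HMM slides say to find the path that maximizes the product of observation and transition probabilities. A small Python sketch of the Viterbi algorithm for that job follows. It works with log probabilities so that long products do not underflow. The function name `viterbi`, the equal initial state probabilities, and the four observation rows are assumptions for illustration; the transition numbers are the made-up ones from the Smooth Results with HMM slide.

```python
import math


def _log(p):
    """Log probability, with log(0) treated as minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(initial, transition, observation):
    """Most probable state sequence for times 1..T.

    initial[j]        = P(X_0 = j)
    transition[j][k]  = a_jk, probability of moving from state j to state k
    observation[i][j] = probability of state j given the measurement at step i
    """
    n = len(initial)
    score = [_log(p) for p in initial]
    back = []                                   # best predecessor per step
    for obs in observation:
        new_score, pointers = [], []
        for k in range(n):
            best_j = max(range(n),
                         key=lambda j: score[j] + _log(transition[j][k]))
            pointers.append(best_j)
            new_score.append(score[best_j] + _log(transition[best_j][k])
                             + _log(obs[k]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path[1:]                             # drop the time-0 state


STILL, MOVING = 0, 1
initial = [0.5, 0.5]                            # assumed: no prior preference
transition = [[0.99989, 0.00011],
              [0.00011, 0.99989]]
observation = [[0.4, 0.6], [0.2, 0.8], [0.9, 0.1], [0.3, 0.7]]
smoothed = viterbi(initial, transition, observation)
raw = [max((STILL, MOVING), key=lambda j: row[j]) for row in observation]
```

With these numbers, picking the most likely state at each step on its own flips to still at the third step, while the Viterbi path stays in moving throughout, because a state switch costs far more than one poorly supported observation.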
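
Sketch: Summarizing Tracking Error
The Presenting Continuous Performance Results slides report the mean, the median, and the 95th percentile of the Euclidean distance error. A minimal Python sketch of those summaries follows. The function names and sample points are made up, and the percentile uses the nearest-rank convention, which is one assumption among several common definitions.

```python
import math
from statistics import mean, median


def euclidean_errors(estimates, actual):
    """Distance between each estimated point and the matching actual point."""
    return [math.dist(e, a) for e, a in zip(estimates, actual)]


def percentile(values, fraction):
    """Smallest value v such that at least `fraction` of values are <= v.

    Nearest-rank definition, assumed here for simplicity.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


# Made-up estimates against an actual location fixed at the origin.
actual = [(0.0, 0.0)] * 4
estimates = [(3.0, 4.0), (0.0, 1.0), (0.0, 2.0), (6.0, 8.0)]
errors = euclidean_errors(estimates, actual)   # [5.0, 1.0, 2.0, 10.0]
summary = {
    "mean": mean(errors),
    "median": median(errors),
    "p95": percentile(errors, 0.95),
}
```

Sorting `errors` and plotting the fraction of values at or below each one gives the cumulative error distribution; the median and 95th percentile are two points read off that curve.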