
Selective Perception Policies for Guiding Sensing and Computation in Multimodal Systems

Brief Presentation of the ICMI '03

N. Oliver & E. Horvitz paper

Nikolaos Mavridis, Feb '02

Introduction

The menu for today:

An application that served as testbed & excuse

The architecture of recognition engines used

Two varieties of selective perception

Results

Big Ideas

An intro to RESOLVER

The main big idea:

NO NEED TO NOTICE AND PROCESS

EVERYTHING ALWAYS!

The Application

SEER:

A multimodal system for recognizing office activity

General setting:

A basic requirement for visual surveillance and multimodal HCI is the provision of rich, human-centric notions of context in a tractable manner …

Prior work: mainly particular scenarios (waving the hand etc.), HMMs, dynamic Bayesian networks

Output Categories:

PC=Phone Conversation

FFC=Face2Face Conversation

P=Presentation

O=Other Activity

NP=Nobody Present

DC=Distant Conversation (out of field of view)

Input:

Audio: PCA of LPC coefficients, energy, μ and σ of the fundamental frequency ω0, zero-crossing rate

Audio Localisation: Time Delay of Arrival (TDOA)

Video: skin color, motion, foreground and face densities

Mouse & Keyboard: history of the last 1, 5, and 60 sec of activity
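As a rough illustration of two of the audio features listed above (short-time energy and zero-crossing rate), here is a minimal Python/NumPy sketch; it is not the authors' code, and the frame length and hop size are arbitrary assumptions.

```python
import numpy as np

def frame_audio_features(signal, frame_len=512, hop=256):
    """Per-frame short-time energy and zero-crossing rate for a mono signal.

    Only a sketch of two of the audio features named above; frame_len and hop
    are arbitrary choices, and the real system also uses PCA of LPC
    coefficients and statistics of the fundamental frequency.
    """
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))                        # short-time energy
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)  # crossings per sample
        features.append((energy, zcr))
    return np.array(features)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_signal = rng.standard_normal(16000)   # 1 s of fake audio at 16 kHz
    print(frame_audio_features(fake_signal)[:3])
```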

Recognition Engine

Recognition engine: LHMM (Layered!)

First level:

Parallel discriminative HMM’s for categories:

Audio: human speech, music, silence, noise, ring, keyboard

Video: nobody, static person, moving person, multiperson

Second level:

Input: outputs of the above + derivative of the sound localisation + keyboard histories

Output: PC, FFC, P, O, NP, DC – longer temporal extent!

Selective Perception Strategies usable for both levels!

Selecting which features to use at the input of the HMM’s!

Example:

Motion & skin density for one active person

Skin density & face detection for multiple people

Also for second stage: selecting which first stage HMM’s to run…
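For concreteness, the space of feature subsets a policy can choose among is just a powerset; the sketch below (my illustration, with assumed feature names) shows that 4 features already give 2^4 = 16 candidate subsets, the K = 16 that appears in the EVI formulation later on.

```python
from itertools import combinations

# Assumed feature names for illustration (taken loosely from the slides).
FEATURES = ["motion_density", "skin_density", "face_detection", "sound_localisation"]

# All candidate feature subsets the policy can choose between: with 4 features
# there are 2^4 = 16 subsets (including the empty one).
SUBSETS = [frozenset(c) for r in range(len(FEATURES) + 1)
           for c in combinations(FEATURES, r)]

if __name__ == "__main__":
    print(len(SUBSETS))                      # 16
    print(sorted(SUBSETS, key=len)[:3])      # empty set and two singletons
```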

HMM’s vs LHMM’s

Compared to CP HMM's (Cartesian product, one long feature vector)

Prior knowledge about problem encoded in structure for LHMM’s

I.e. decomposition into smaller subproblems → less training required, more filtered output for the second stage, and only the first level needs retraining!
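A minimal sketch of the layered data flow described above (my own illustration, not the SEER implementation; the per-class "models" here are toy squared-distance scorers standing in for the discriminative HMMs): the second level only ever sees the first-level labels plus the sound-localisation derivative and the keyboard/mouse histories, which is why each layer can be retrained independently.

```python
import numpy as np

# First-level categories from the slides.
AUDIO_CLASSES = ["speech", "music", "silence", "noise", "ring", "keyboard"]
VIDEO_CLASSES = ["nobody", "static person", "moving person", "multiperson"]

def classify(feature_window, class_means):
    """Toy stand-in for a bank of first-level HMMs: score each class with a
    squared distance (a unit-variance Gaussian log-likelihood up to a constant)
    and return the best label.  SEER uses discriminative HMMs instead."""
    scores = {c: -float(np.sum((feature_window - mu) ** 2))
              for c, mu in class_means.items()}
    return max(scores, key=scores.get)

def second_level_input(audio_label, video_label, sound_angle_deriv, keyb_histories):
    """The second level only sees first-level outputs + the derivative of the
    sound localisation + keyboard/mouse histories."""
    return {
        "audio": audio_label,
        "video": video_label,
        "d_sound_angle": sound_angle_deriv,
        "keyboard_histories": keyb_histories,   # e.g. activity over 1, 5, 60 s
    }

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    audio_means = {c: rng.standard_normal(8) for c in AUDIO_CLASSES}
    video_means = {c: rng.standard_normal(4) for c in VIDEO_CLASSES}
    a = classify(rng.standard_normal(8), audio_means)
    v = classify(rng.standard_normal(4), video_means)
    print(second_level_input(a, v, sound_angle_deriv=0.1, keyb_histories=(3, 12, 40)))
```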

Selective Perception Strategies

Why sense everything and compute everything always?!?

Two approaches:

EVI: Expected Value of Information (à la RESOLVER)

Decision theory and uncertainty reduction

EVI computed for different overlapping subsets, in real time, every frame

Greedy, one-step lookahead approach for computing the next best set of observations to evaluate

Rate-based perception (somewhat similar to RIP BEHAVIOR)

Policies defined heuristically for specifying observational frequencies and duty cycles for each computed feature

Two baselines for comparison:

Compute everything!

Randomly select feature subsets

Expected Value of Information

Endowing the perceptual system with knowledge of the value of action in the world …

$$\mathrm{EV}(f_k) = \sum_m P(f_k^m \mid E) \; \max_i \sum_j P(M_j \mid E, f_k^m) \, U(M_i, M_j)$$

$f_k$: the feature subset $k$ ($k = 1 \ldots K$; e.g. 4 features give $K = 16$ subsets)

$f_k^m$: the possible outcomes of evaluating $f_k$ ($m = 1 \ldots M_k$; e.g. for the subset of all four features above, with binary outcomes, $M_k = 16$)

$E$: all previous observational evidence

$P(f_k^m \mid E)$: probability of outcome $f_k^m$ given the evidence

$U(M_i, M_j)$: utility of asserting ground truth $M_i$ as $M_j$

$P(M_j \mid E, f_k^m)$: probability of activity $M_j$ given $E$ and $f_k^m$

$\sum_j P(M_j \mid E, f_k^m) \, U(M_i, M_j)$: expected utility of asserting $M_j$ given ground truth $M_i$

$\max_i \sum_j \ldots$: expected utility for the ground-truth state that maximizes it
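As a sanity check on the formula, here is a toy transcription of EV(f_k) in Python (all numbers are made up; in the real system these arrays would come from the HMM beliefs):

```python
import numpy as np

def expected_value(p_outcome, p_model_given_outcome, utility):
    """EV(f_k) = sum_m P(f_k^m|E) * max_i sum_j P(M_j|E,f_k^m) * U(M_i,M_j).

    p_outcome:             shape (M_k,)               -- P(f_k^m | E)
    p_model_given_outcome: shape (M_k, n_models)      -- P(M_j | E, f_k^m)
    utility:               shape (n_models, n_models) -- U(M_i, M_j)
    """
    # eu[m, i] = sum_j P(M_j | E, f_k^m) * U(M_i, M_j)
    eu = p_model_given_outcome @ utility.T
    # weight the best assertion for each outcome by that outcome's probability
    return float(np.sum(p_outcome * eu.max(axis=1)))

if __name__ == "__main__":
    # 2 possible outcomes of the subset, 3 activities, identity utility matrix
    p_out = np.array([0.7, 0.3])
    p_mod = np.array([[0.8, 0.1, 0.1],
                      [0.2, 0.5, 0.3]])
    print(expected_value(p_out, p_mod, np.eye(3)))  # 0.7*0.8 + 0.3*0.5 = 0.71
```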

Expected Value of Information

But what we are really interested in is what we have to gain! Thus:

$$\mathrm{EVI}(f_k) = \mathrm{EV}(f_k) \;-\; \max_i \sum_j P(M_j \mid E) \, U(M_i, M_j) \;-\; \mathrm{cost}(f_k)$$

Where we also account for:

What we would assert given no sensing at all (the middle term)

The cost of sensing – but we have to map cost and utility to the same currency!

An HMM-ised implementation was used!

Richer cost models:

Non-identity U matrix

Constant vs. activity-dependent costs (what else is running?)

Successful results! (no significant decrease in accuracy ;-))
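Here is a sketch of how the greedy, one-step lookahead selection could look in code (again my own toy illustration with made-up subset names, probabilities and costs, not the paper's implementation): compute EVI for each candidate subset and evaluate the one with the highest positive EVI.

```python
import numpy as np

def baseline_value(p_model, utility):
    """max_i sum_j P(M_j|E) U(M_i,M_j): value of asserting with no new sensing."""
    return float((utility @ p_model).max())

def expected_value(p_outcome, p_model_given_outcome, utility):
    """EV(f_k), transcribed from the formula on the previous slide."""
    eu = p_model_given_outcome @ utility.T   # eu[m, i] = sum_j P(M_j|..) U(M_i, M_j)
    return float(np.sum(p_outcome * eu.max(axis=1)))

def greedy_select(candidates, p_model, utility):
    """One-step lookahead: pick the feature subset with the highest positive EVI.

    `candidates` maps a subset name to (P(f_k^m|E), P(M_j|E,f_k^m), cost);
    in the real system these numbers come from the layered HMM beliefs, and
    cost must already be mapped into the same currency as the utilities.
    """
    base = baseline_value(p_model, utility)
    best_name, best_evi = None, 0.0          # sense nothing if no EVI > 0
    for name, (p_out, p_mod_out, cost) in candidates.items():
        evi = expected_value(p_out, p_mod_out, utility) - base - cost
        if evi > best_evi:
            best_name, best_evi = name, evi
    return best_name, best_evi

if __name__ == "__main__":
    utility = np.eye(3)                      # identity U matrix (the base case)
    p_model = np.array([0.4, 0.35, 0.25])    # current belief over 3 activities
    candidates = {                           # made-up numbers for illustration
        "skin+motion": (np.array([0.6, 0.4]),
                        np.array([[0.7, 0.2, 0.1],
                                  [0.1, 0.6, 0.3]]), 0.05),
        "face+audio":  (np.array([0.5, 0.5]),
                        np.array([[0.9, 0.05, 0.05],
                                  [0.05, 0.05, 0.9]]), 0.20),
    }
    print(greedy_select(candidates, p_model, utility))
```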

Rate-based perception

Simple idea: heuristically fix an observation frequency and duty cycle for each feature.

In this case, no online tuning of the rates …

Doesn't capture sequential prerequisites etc.

Results

EVI: No significant performance decrease with much less computational cost!

Also effective in activity-dependent mode.

And even more to be gained!

Take home message:

Big Ideas

No need to sense & compute everything always!

In essence we have a Planner:

A planner for goal-based sensing and cognition!

Not only useful for AI:

Approach might be useful for computational modeling of human performance, too …

Simple satisficing works:

No need for fully-optimised planning; with some precautions, one-step ahead with many approximations is sufficient –

ALSO more plausible for humans! (ref: Ullman)

Easy co-existence with other goal-based modules:

We just need a method for distributing time-varying costs of sensing and cognitising actions (centralised stockmarket?)

As a future direction, time-decreasing confidence is mentioned
