Recognizing Action at a Distance

A.A. Efros, A.C. Berg, G. Mori, J. Malik

UC Berkeley

Computer Vision Group

ICCV 2003

Looking at People

Near field

• 300-pixel man

• Limb tracking

– e.g. Yacoob & Black, Rao & Shah, etc.

Far field

• 3-pixel man

• Blob tracking

– vast surveillance literature

Medium-field Recognition

The 30-Pixel Man

Appearance vs. Motion

Jackson Pollock

Number 21 (detail)

Goals

• Recognize human actions at a distance

– Low resolution, noisy data

– Moving camera, occlusions

– Wide range of actions (including non-periodic)

Our Approach

• Motion-based approach

– Non-parametric; use large amount of data

– Classify a novel motion by finding the most similar motion from the training set

• Related Work

– Periodicity analysis

• Polana & Nelson; Seitz & Dyer; Bobick et al.; Cutler & Davis; Collins et al.

– Model-free

• Temporal Templates [Bobick & Davis]

• Orientation histograms [Freeman et al.; Zelnik & Irani]

• Using MoCap data [Zhao & Nevatia, Ramanan & Forsyth]
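
The nearest-neighbor idea in the "Motion-based approach" bullets can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `classify_nearest`, the toy 3-dimensional descriptors, and the choice of normalized correlation as the similarity measure are all assumptions made for this example.

```python
import numpy as np

def classify_nearest(query, train_descriptors, train_labels):
    """Return the label of the training descriptor most similar to `query`.

    Similarity is normalized correlation (cosine similarity), which is an
    assumption of this sketch rather than something the slides specify.
    """
    train = np.asarray(train_descriptors, dtype=float)
    q = np.asarray(query, dtype=float)
    # Normalize so the dot product measures direction, not magnitude.
    train_n = train / (np.linalg.norm(train, axis=1, keepdims=True) + 1e-12)
    q_n = q / (np.linalg.norm(q) + 1e-12)
    scores = train_n @ q_n
    best = int(np.argmax(scores))
    return train_labels[best], float(scores[best])

# Toy training set: one hypothetical descriptor per action label.
train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels = ["run", "walk left", "swing"]
label, score = classify_nearest([0.1, 0.9, 0.2], train, labels)
print(label, round(score, 3))
```

Because the method is non-parametric, adding more labeled clips only means appending rows to the training set; nothing is refit.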

Gathering action data

• Tracking

– Simple correlation-based tracker

– User-initialized
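
A correlation-based tracker of the kind named above could look roughly like this. The slides give no details, so the normalized cross-correlation score, the exhaustive local search, and the names `ncc`, `track_step`, and `search_radius` are assumptions for illustration only. The user-initialized part corresponds to supplying the first template and position by hand.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
    return float((a * b).sum() / denom)

def track_step(frame, template, prev_top_left, search_radius=4):
    """Find the window near `prev_top_left` that best matches `template`."""
    h, w = template.shape
    r0, c0 = prev_top_left
    # Clamp the search range so every candidate window lies inside the frame.
    row_lo = max(0, r0 - search_radius)
    row_hi = min(frame.shape[0] - h, r0 + search_radius)
    col_lo = max(0, c0 - search_radius)
    col_hi = min(frame.shape[1] - w, c0 + search_radius)
    best_score, best_pos = -np.inf, prev_top_left
    for r in range(row_lo, row_hi + 1):
        for c in range(col_lo, col_hi + 1):
            score = ncc(frame[r:r + h, c:c + w], template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

# Synthetic check: shift a random image and recover the patch location.
rng = np.random.default_rng(0)
frame0 = rng.random((40, 40))
template = frame0[10:18, 12:20].copy()               # "user-initialized" patch
frame1 = np.roll(frame0, shift=(2, -3), axis=(0, 1))  # patch moves to (12, 9)
pos, score = track_step(frame1, template, (10, 12))
print(pos)
```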

Figure-centric Representation

• Stabilized spatio-temporal volume

– No translation information

– All motion caused by person’s limbs

• Good news: indifferent to camera motion

• Bad news: hard!

• Good test to see if actions, not just translation, are being captured
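
The stabilized spatio-temporal volume described above can be sketched by cropping a fixed window around the tracked position in every frame. The function name `stabilize`, the square window, and the edge padding are assumptions of this sketch; the slides do not say how borders are handled.

```python
import numpy as np

def stabilize(frames, centers, half_size):
    """Crop a fixed-size window around the tracked center in every frame."""
    size = 2 * half_size + 1
    volume = np.zeros((len(frames), size, size), dtype=float)
    for t, (frame, (r, c)) in enumerate(zip(frames, centers)):
        # Pad so windows near the image border keep the same size.
        padded = np.pad(frame, half_size, mode="edge")
        # After padding, original pixel (r, c) sits at (r + half_size, c + half_size),
        # so this slice is centered on the tracked point.
        volume[t] = padded[r:r + size, c:c + size]
    return volume

# A bright dot translating across the image stays fixed at the window center.
frames, centers = [], []
for t in range(5):
    f = np.zeros((20, 20))
    f[5 + t, 3 + 2 * t] = 1.0
    frames.append(f)
    centers.append((5 + t, 3 + 2 * t))
volume = stabilize(frames, centers, half_size=3)
print(volume[:, 3, 3])
```

In the toy data the only motion is translation, so the stabilized volume is identical in every frame. That is the sense in which the representation carries no translation information.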

Remembrance of Things Past

• “Explain” novel motion sequence by matching to previously seen video clips

– For each frame, match based on some temporal extent

[Figure: an input sequence passes through motion analysis and is matched against a database of clips labeled run, walk left, swing, walk right, jog]

Challenge: how to compare motions?

How to describe motion?

• Appearance

– Not preserved across different clothing

• Gradients (spatial, temporal)

– Same problem (e.g. contrast reversal)

• Edges/Silhouettes

– Too unreliable

• Optical flow

– Explicitly encodes motion

– Least affected by appearance

– …but too noisy

Spatial Motion Descriptor

[Figure: processing pipeline]

Image frame → optical flow $F_{x,y}$

→ components $F_x$, $F_y$

→ half-wave rectified $F_x^+$, $F_x^-$, $F_y^+$, $F_y^-$

→ blurred $F_x^+$, $F_x^-$, $F_y^+$, $F_y^-$
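
The rectify-and-blur steps of the spatial motion descriptor can be sketched as below, assuming the optical flow components are already available as two arrays. The function names, the Gaussian width `sigma=1.5`, and the hand-rolled separable blur are assumptions of this sketch. Any normalization step is omitted, and the image is assumed to be larger than the blur kernel.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at three standard deviations."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(channel, sigma):
    """Separable Gaussian blur: filter along rows, then along columns."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, channel)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, rows)

def motion_channels(flow_x, flow_y, sigma=1.5):
    """Split flow into four non-negative channels, then blur each one."""
    channels = [
        np.maximum(flow_x, 0.0),   # F_x^+
        np.maximum(-flow_x, 0.0),  # F_x^-
        np.maximum(flow_y, 0.0),   # F_y^+
        np.maximum(-flow_y, 0.0),  # F_y^-
    ]
    return np.stack([blur(c, sigma) for c in channels])

# Toy flow field: one rightward-moving pixel and one leftward-moving pixel.
flow_x = np.zeros((16, 16))
flow_x[8, 8] = 2.0
flow_x[4, 4] = -1.0
flow_y = np.zeros((16, 16))
ch = motion_channels(flow_x, flow_y)
print(ch.shape)
```

Rectifying before blurring keeps opposite motions in separate channels, so a leftward and a rightward flow vector near each other do not cancel when smoothed.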

Spatio-temporal Motion Descriptor

Temporal extent E

[Figure: Sequence A and Sequence B are compared frame by frame to give the frame-to-frame similarity matrix; aggregating it over temporal extent E with an identity (I) matrix, or a blurry I, gives the motion-to-motion similarity matrix]
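
The step from frame-to-frame to motion-to-motion similarity can be sketched as a sum along the diagonals of the similarity matrix over the temporal extent. This sketch assumes each frame is already summarized as a flat descriptor vector and uses an unweighted identity kernel. A weighted ("blurry") kernel would tolerate speed differences better but is not shown. The function names are hypothetical.

```python
import numpy as np

def frame_similarity(desc_a, desc_b):
    """Frame-to-frame similarity: dot product of per-frame descriptors.

    desc_a has shape (Ta, D) and desc_b has shape (Tb, D).
    """
    return desc_a @ desc_b.T

def motion_similarity(s_ff, extent):
    """Sum frame similarities along the diagonal over `extent` frames."""
    half = extent // 2
    ta, tb = s_ff.shape
    s_mm = np.zeros((ta, tb), dtype=float)
    for offset in range(-half, half + 1):
        # Entry (i, j) accumulates s_ff[i + offset, j + offset] when in range.
        i0, i1 = max(0, -offset), min(ta, ta - offset)
        j0, j1 = max(0, -offset), min(tb, tb - offset)
        s_mm[i0:i1, j0:j1] += s_ff[i0 + offset:i1 + offset,
                                   j0 + offset:j1 + offset]
    return s_mm

# Sequence B is a time-shifted excerpt of sequence A (frames 5..14).
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(20, 6))
desc_a /= np.linalg.norm(desc_a, axis=1, keepdims=True)
desc_b = desc_a[5:15]
s_mm = motion_similarity(frame_similarity(desc_a, desc_b), extent=5)
best = np.argmax(s_mm, axis=0)  # best-matching frame of A for each frame of B
print(best)
```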

Football Actions: matching

[Video: input sequence shown alongside its matched frames]

Football Actions: classification

10 actions; 4500 total frames; 13-frame motion descriptor

Classifying Ballet Actions

16 Actions; 24800 total frames; 51-frame motion descriptor.

Men used to classify women and vice versa.

Classifying Tennis Actions

6 actions; 4600 frames; 7-frame motion descriptor

Woman player used for training, man for testing.

Classifying Tennis

• Red bars show classification results

Querying the Database

[Figure: input sequence queried against a database of clips labeled run, walk left, swing, walk right, jog]

Action Recognition: run, walk left, swing, walk right, jog

Joint Positions: [figure]

2D Skeleton Transfer

• We annotate database with 2D joint positions

• After matching, transfer data to novel sequence

– Adjust the match for best fit
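
The transfer step in the bullets above can be sketched as a lookup of annotated joints through the match indices. The slides do not say how the match is adjusted for best fit, so the optional per-frame shift `offsets` is only a stand-in for that step. The array layout and the name `transfer_joints` are likewise assumptions.

```python
import numpy as np

def transfer_joints(match_index, db_joints, offsets=None):
    """Copy 2D joint annotations from matched database frames.

    match_index[t] is the database frame matched to novel frame t.
    db_joints has shape (N_db, J, 2) in figure-centric coordinates.
    offsets, if given, has shape (T, 2): a hypothetical per-frame shift
    standing in for the "adjust the match" step.
    """
    joints = np.asarray(db_joints, dtype=float)[np.asarray(match_index)]
    if offsets is not None:
        joints = joints + np.asarray(offsets, dtype=float)[:, None, :]
    return joints

# Toy database: 3 annotated frames with 2 joints each.
db_joints = np.arange(12).reshape(3, 2, 2)
result = transfer_joints([2, 0], db_joints, offsets=[[1, 0], [0, 0]])
print(result.shape)
```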

Input sequence:

Transferred 2D skeletons:

3D Skeleton Transfer

• We populate database with rendered stick figures from 3D Motion Capture data

• Matching as before, we get 3D joint positions (kind of)!

Input sequence:

Transferred 3D skeletons:

“Do as I Do” Motion Synthesis

[Figure: input sequence and resulting synthetic sequence]

• Matching two things:

– Motion similarity across sequences

– Appearance similarity within sequence (like VideoTextures)

• Dynamic Programming
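
One way to combine the two matching criteria with dynamic programming is a Viterbi-style search, sketched here. The cost definitions, the weight `alpha`, and the name `best_path` are assumptions of this sketch; the slides state only that motion similarity and appearance similarity are both matched and that dynamic programming is used.

```python
import numpy as np

def best_path(motion_cost, transition_cost, alpha=1.0):
    """Viterbi-style dynamic programming over target frames.

    motion_cost[t, k]: dissimilarity between source frame t and target frame k.
    transition_cost[j, k]: appearance penalty for showing target frame k
    right after target frame j.
    """
    T, K = motion_cost.shape
    total = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    total[0] = motion_cost[0]
    for t in range(1, T):
        # candidates[j, k]: best cost of a path ending at j, then moving to k.
        candidates = total[t - 1][:, None] + alpha * transition_cost
        back[t] = np.argmin(candidates, axis=0)
        total[t] = candidates[back[t], np.arange(K)] + motion_cost[t]
    # Trace the cheapest path backwards from the last frame.
    path = [int(np.argmin(total[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy costs: consecutive target frames are free, any other jump costs 1.
transition_cost = np.ones((4, 4))
transition_cost[np.arange(3), np.arange(1, 4)] = 0.0
motion_cost = np.array([
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.2, 0.0],
    [1.0, 1.0, 1.0, 0.0],
])
smooth_path = best_path(motion_cost, transition_cost, alpha=1.0)
greedy_path = best_path(motion_cost, transition_cost, alpha=0.0)
print(smooth_path, greedy_path)
```

With `alpha=0.0` each frame is matched independently and the output repeats a frame. With `alpha=1.0` the search accepts a slightly worse motion match at the third frame to keep the synthesized sequence smooth.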

“Do as I Do”

Source Motion

Source Appearance

3400 Frames

Result

“Do as I Say” Synthesis

[Figure: a sequence of action labels (run, walk left, swing, walk right, jog) drives the synthetic sequence]

• Synthesize given action labels

– e.g. video game control

“Do as I Say”

• Red box shows when constraint is applied

Actor Replacement

SHOW VIDEO

(GregWorldCup.avi, DivX)

Conclusions

• In the medium field, action is about motion

• What we propose:

– A way of matching motions at coarse scale

• What we get out:

– Action recognition

– Skeleton transfer

– Synthesis: “Do as I Do” & “Do as I Say”

• What we learned:

– A lot to be said for the “little guy”!
