Human Activity Analysis

By: Ryan Wendel


Human activity analysis is an ongoing area of research in which videos are
analyzed frame by frame
Most of the video recognition is pulled from
3-D graphic engines




“HAA” stands for Human Activity Analysis
Surveillance systems
Patient monitoring systems
Human-computer interfaces


We will look at methodologies developed for simple human
actions and for high-level activities.




Gestures
Actions
Interactions
Group activities




Basic movements of a person's body parts.
For example:
Raising an arm
Lifting a leg





A single person's activities, which may consist of
multiple gestures.
For example:
Walking
Waving
Shaking body



Interactions that involve two or more
people or objects.
For example:
Two people fighting





Activities performed by multiple people.
For example:
A group running
A group walking
A group fighting

Approaches can be separated into two groups
◦ Single-layered approaches: recognize human activities
directly from a video feed (a sequence of frames).
◦ Hierarchical approaches: represent high-level
activities in terms of simpler activities
(sub-events).


Main objective is to analyze simple sequences
of human movements
Can be divided into two categories
◦ Space-time approach: takes an input video as a 3D volume
◦ Sequential approach: takes an input video and
interprets it as a sequence of observations
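
The two views of the same input can be sketched as follows. This is a minimal illustration, assuming a hypothetical toy clip stored as nested lists; the function names `as_volume` and `as_observations` are made up for this example and do not come from the reviewed work.

```python
# Hypothetical toy video: 3 frames, each a 2x2 grid of grayscale pixels.
video = [
    [[0, 0], [0, 0]],
    [[0, 9], [0, 0]],
    [[0, 9], [9, 0]],
]

def as_volume(frames):
    """Space-time view: treat the whole clip as one volume[t][y][x]."""
    return frames  # the nested list is already a T x H x W volume

def as_observations(frames):
    """Sequential view: one feature vector per frame.

    Flattened pixels stand in for a real per-frame feature extractor.
    """
    return [[px for row in frame for px in row] for frame in frames]

volume = as_volume(video)
observations = as_observations(video)

# Shape of the 3-D volume: (time, height, width).
volume_shape = (len(volume), len(volume[0]), len(volume[0][0]))
```

The data is identical in both cases; only the way the recognizer indexes it differs.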

Divided into three different subsections
based on features
◦ Space-time volume
◦ Space-time Trajectories
◦ Space-time features


Represents human activities as
volumes of a video (frames stacked over
time.)
Recognizes an activity by measuring the similarity between
two space-time volumes
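
One simple way to compare two volumes is sketched below. It assumes binary (foreground/background) volumes of equal shape and uses Jaccard overlap as the similarity measure; that choice is an assumption made for illustration, not the measure used by any particular method in the review.

```python
def flatten(volume):
    """Flatten a T x H x W nested list into one list of voxels."""
    return [v for frame in volume for row in frame for v in row]

def volume_similarity(a, b):
    """Jaccard overlap of two same-shaped binary space-time volumes."""
    fa, fb = flatten(a), flatten(b)
    if len(fa) != len(fb):
        raise ValueError("volumes must have the same shape")
    intersection = sum(1 for x, y in zip(fa, fb) if x and y)
    union = sum(1 for x, y in zip(fa, fb) if x or y)
    # Two empty volumes are treated as identical.
    return intersection / union if union else 1.0

# Hypothetical 2-frame, 2x2 silhouettes.
wave = [[[0, 1], [0, 0]], [[1, 1], [0, 0]]]
other = [[[0, 1], [0, 0]], [[0, 0], [1, 1]]]

same_score = volume_similarity(wave, wave)
diff_score = volume_similarity(wave, other)
```

A new clip would be assigned the label of the stored volume with the highest score.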

Uses stick-figure modeling to extract the joint
positions of a person at each frame
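
The step from per-frame joint positions to trajectories can be sketched as follows. The joint names and coordinates are hypothetical, and the joint positions are assumed to be already estimated by some upstream tracker.

```python
import math

# Hypothetical per-frame joint positions (x, y, z) from a stick-figure model.
frames = [
    {"hand": (0, 0, 0), "elbow": (0, 1, 0)},
    {"hand": (0, 1, 0), "elbow": (0, 1, 0)},
    {"hand": (0, 2, 0), "elbow": (0, 1, 0)},
]

def joint_trajectories(frames):
    """Group positions by joint, appending the frame index as time."""
    trajectories = {}
    for t, joints in enumerate(frames):
        for name, (x, y, z) in joints.items():
            trajectories.setdefault(name, []).append((x, y, z, t))
    return trajectories

def path_length(trajectory):
    """Total spatial distance travelled along one trajectory."""
    return sum(
        math.dist(p[:3], q[:3]) for p, q in zip(trajectory, trajectory[1:])
    )

trajectories = joint_trajectories(frames)
hand_length = path_length(trajectories["hand"])
elbow_length = path_length(trajectories["elbow"])
```

Each joint becomes a curve in (x, y, z, t) space, and an activity is then described by the set of such curves.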


Does not extract features frame by frame
Extracts features where there is an appearance
or shape change in the 3-D space-time volume
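
A rough sketch of the idea: keep only the points of the volume where something changes. Real local-feature detectors are considerably more elaborate; the plain temporal difference with a threshold used here is a stand-in chosen for illustration, and the toy clip is hypothetical.

```python
def interest_points(volume, threshold):
    """Return (t, y, x) positions where a pixel changes by more than
    `threshold` compared with the previous frame."""
    points = []
    for t in range(1, len(volume)):
        for y, row in enumerate(volume[t]):
            for x, value in enumerate(row):
                if abs(value - volume[t - 1][y][x]) > threshold:
                    points.append((t, y, x))
    return points

# Hypothetical toy clip: 3 frames of 2x2 grayscale pixels.
video = [
    [[0, 0], [0, 0]],
    [[0, 9], [0, 0]],
    [[0, 9], [9, 0]],
]

points = interest_points(video, threshold=5)
```

Only two of the twelve voxels are kept, which shows why the resulting representation is sparse.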

Space-Time Volume
◦ Hard to differentiate between multiple people in the
same scene.

Space-Time Trajectories
◦ 3-D body-part detection and tracking is still an
unsolved problem, and it requires a strong low-level
component that can estimate 3-D joint locations.

Space-Time features
◦ Not suitable for modeling complex activities

Divided into two different subsections based
on features
◦ Exemplar-based
◦ State model-based

Review
◦ Sequential approach: takes an input video and
interprets it as a sequence of observations

Exemplar-based
◦ Represents human activities with a set of sample
sequences of action executions
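
Comparing a new sequence against stored samples can be sketched with dynamic time warping (DTW), which tolerates differences in execution speed. DTW is named here as one common choice, and the one-dimensional "arm height" features and activity labels are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]

def classify(observed, exemplars):
    """Return the label whose closest exemplar has the smallest DTW distance."""
    return min(
        exemplars,
        key=lambda label: min(
            dtw_distance(observed, ex) for ex in exemplars[label]
        ),
    )

# Hypothetical exemplars: arm height per frame.
exemplars = {
    "raise_arm": [[0, 1, 2, 3]],
    "lower_arm": [[3, 2, 1, 0]],
}
observed = [0, 0, 1, 2, 2, 3]  # same motion, executed more slowly

label = classify(observed, exemplars)
```

Because the warping absorbs the repeated frames, the slower execution still matches the "raise_arm" exemplar exactly.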

State model-based
◦ Represents a human activity as a model composed of a set
of states
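
A model composed of states can be sketched as a small hidden Markov model (HMM), scored with the forward algorithm. The HMM is used here as a typical example of a state model; the two states, the observation symbols, and all probabilities are made-up values for a hypothetical "waving" activity.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of an observation sequence under an HMM (forward algorithm)."""
    states = list(start)
    # Initialisation: start in each state and emit the first observation.
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    # Recursion: move between states, then emit the next observation.
    for o in obs[1:]:
        alpha = {
            s: emit[s][o] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical "waving" model: the arm alternates between up and down.
start = {"up": 0.5, "down": 0.5}
trans = {
    "up": {"up": 0.2, "down": 0.8},
    "down": {"up": 0.8, "down": 0.2},
}
emit = {
    "up": {"high": 0.9, "low": 0.1},
    "down": {"high": 0.1, "low": 0.9},
}

waving = forward_likelihood(["high", "low", "high", "low"], start, trans, emit)
holding = forward_likelihood(["high", "high", "high", "high"], start, trans, emit)
```

With one such model per activity, an observed sequence is assigned to the activity whose model gives the highest likelihood; here the alternating sequence scores higher than the constant one.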


Exemplar-based is more flexible in terms of
comparing multiple sample sequences,
whereas state model-based can better handle
probabilistic analysis of an activity.



The sequential approach is able to handle and
detect more complex activities,
whereas the space-time approach handles
simpler activities.
Both methods operate directly on
sequences of images


Allows the recognition of high-level activities
based on the recognition results of other
simpler activities
Advantages of the Hierarchical Approach
◦ Has the ability to recognize high-level activities
with a more complex structure
◦ Amount of data required to recognize an activity is
significantly less than with the single-layered approach
◦ Easier to incorporate human knowledge



Statistical approach
Syntactic approach
Description-based approach


Statistical approaches use state-based
models to recognize activities
With multiple layers of state-based
models, the separate models can be combined to
recognize activities with sequential structures


Human activities are represented as a string of
symbols
Human activities are modeled as a set of
production rules generating a string of
actions
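
Production rules over action symbols can be sketched with a small grammar and a CYK parser. The "Handshake" grammar, its symbol names, and the atomic action labels are all hypothetical, and the rules are written in Chomsky normal form only because CYK requires it.

```python
def cyk_accepts(tokens, unary, binary, start):
    """Return True if `tokens` can be derived from `start` (CYK parsing)."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive tokens[i..j].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i] = {lhs for lhs, term in unary if term == tok}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for lhs, (left, right) in binary:
                    if left in table[i][k] and right in table[k + 1][j]:
                        table[i][j].add(lhs)
    return start in table[0][n - 1]

# Hypothetical rules: terminals are atomic actions from a low-level detector.
unary = [("A", "approach"), ("E", "extend"), ("S", "shake"), ("W", "withdraw")]
binary = [
    ("Handshake", ("Reach", "Finish")),
    ("Reach", ("A", "E")),
    ("Finish", ("S", "W")),
    ("Finish", ("S", "Finish")),  # allows repeated shakes
]

valid = cyk_accepts(
    ["approach", "extend", "shake", "shake", "withdraw"],
    unary, binary, "Handshake",
)
invalid = cyk_accepts(["approach", "shake", "withdraw"], unary, binary, "Handshake")
```

The recursive `Finish` rule is what lets one description cover executions with any number of shakes.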

Recognizes human activities with
complex spatio-temporal structures
◦ An activity is described by its sub-events and the
temporal, spatial, and logical relationships among them
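
A description of this kind can be sketched as a set of conditions on the time intervals of simpler sub-events. The "push" description, the sub-event names, and the interval values below are hypothetical, and sub-event detection is assumed to have happened already.

```python
def before(a, b):
    """Interval a ends strictly before interval b starts."""
    return a[1] < b[0]

def overlaps(a, b):
    """Intervals a and b share at least some time."""
    return a[0] < b[1] and b[0] < a[1]

def matches_push(events):
    """Hypothetical description of 'push': one person stretches an arm
    while touching the other, and the other then moves back."""
    needed = ("stretch_arm", "touching", "move_back")
    if any(name not in events for name in needed):
        return False
    return (
        overlaps(events["stretch_arm"], events["touching"])
        and before(events["touching"], events["move_back"])
    )

# Detected sub-events as (start_frame, end_frame) intervals.
detected = {
    "stretch_arm": (10, 20),
    "touching": (15, 22),
    "move_back": (23, 30),
}
wrong_order = dict(detected, move_back=(5, 9))

is_push = matches_push(detected)
is_push_wrong_order = matches_push(wrong_order)
```

The same sub-events in a different temporal order do not satisfy the description, which is the point of encoding the structure explicitly.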

Uses Context-free grammars (CFGs) to
represent activities
◦ CFGs are used to recognize high-level activities
◦ The detection extracts space-time points and local
periodic motions to obtain a sparse distribution of
interest points in a video



Probability theory
Fuzzy logic
Bayesian network:
◦ Used for recognition of an activity, based on the
activity's temporal structure representation
◦ Uses a large network with over 10,000 nodes

A group of persons marching
◦ The images are recognized as an overall motion of
an entire group

A group of people fighting
◦ Multiple videos are used to recognize the activity
that a “group is fighting”



Recognition of interactions between humans
and objects requires multiple components.
Many approaches to human-object interaction ignore the
interplay between object recognition and
motion estimation
Object dependencies, motions, and human activities
can also be factored in to determine
the activities involved




J.K. Aggarwal and M.S. Ryoo. 2011. Human activity analysis: A review.
ACM Comput. Surv. 43, 3, Article 16 (April 2011), 43 pages.
DOI=10.1145/1922649.1922653
http://doi.acm.org/10.1145/1922649.1922653
Christopher O. Jaynes. 1996. Computer vision and artificial intelligence.
Crossroads 3, 1 (September 1996), 7-10. DOI=10.1145/332148.332152
http://doi.acm.org/10.1145/332148.332152
Zhu Li, Yun Fu, Thomas Huang, and Shuicheng Yan. 2008. Real-time
human action recognition by luminance field trajectory analysis. In
Proceedings of the 16th ACM international conference on Multimedia
(MM '08). ACM, New York, NY, USA, 671-676.
DOI=10.1145/1459359.1459456
http://doi.acm.org/10.1145/1459359.1459456

Paul Scovanner, Saad Ali, and Mubarak Shah. 2007. A 3-dimensional sift
descriptor and its application to action recognition. In Proceedings of the
15th international conference on Multimedia (MULTIMEDIA '07). ACM,
New York, NY, USA, 357-360. DOI=10.1145/1291233.1291311
http://doi.acm.org/10.1145/1291233.1291311