Computer Vision: Gesture Recognition from Images Joshua R. New

Knowledge Systems Laboratory
Jacksonville State University
Knowledge Systems Lab
JN 9/10/2002
Outline
•Terminology
•Current Research and Uses
•Kjeldsen’s PhD Thesis
•Implementation Overview
•Implementation Analysis
•Future Directions
Terminology
Image Processing - Computer manipulation of images. Some of
the many algorithms used in image processing include
convolution (on which many others are based), edge detection,
and contrast enhancement.
Computer Vision - A branch of artificial intelligence and image
processing concerned with computer processing of images from
the real world. Computer vision typically requires a combination
of low-level image processing to enhance the image quality (e.g.,
removing noise, increasing contrast) and higher-level pattern
recognition and image understanding to recognize features
present in the image.
Current Research
•Capture images from a camera
•Process images to extract features
•Use those features to train a learning system to
recognize the gesture
•Use the gesture as a meaningful input into a system
More information located at:
http://www.cybernet.com/~ccohen/
Current Research Example
•Starner and Pentland
•2 hands segmented
•Hand shape from a bounding ellipse
•Eight element feature vector
•Recognition using Hidden Markov Models
Current Uses
•Sign Stream (released demo for MacOS)
•Database tool for analysis of linguistic data captured on video
•Developed at Boston University with funding from ASL Linguistic Research Project and NSF
•http://www.bu.edu/asllrp/SignStream/
Current Uses
•Recursive Models of Human Motion (Smart Desk, MIT)
•Models the constraints by which we move
•Visually-guided gestural interaction, animation, and face recognition
•Stereoscopic vision for 3D modeling
•http://vismod.www.media.mit.edu/vismod/demos/smartdesk/
Kjeldsen’s PhD thesis
•Application
•Gesture recognition as a system interface to augment that of the mouse
•Menu selection, window move, and resize
•Input: 200x300 image
•Calibration of user’s hand
Kjeldsen’s PhD thesis
•Image split into HSI channels (I = Intensity, Lightness, Value)
•Segmentation with largest connected component
•Eroded to get rid of edges
•Gray-scale values sent to learning system
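The erosion step listed above can be sketched as follows. This is a minimal pure-Python illustration, assuming a 3x3 structuring element and a binary mask stored as nested lists; neither detail is stated in the slides, and `erode` is a hypothetical helper name.

```python
def erode(mask):
    """3x3 binary erosion: keep a pixel only if it and all 8 neighbours are set.

    Pixels on the image border are always cleared, which treats everything
    outside the image as background (an assumption of this sketch).
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if all(mask[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


# A 4x4 block of foreground inside a 6x6 image shrinks to its inner 2x2 core.
mask = [[1 if 1 <= x <= 4 and 1 <= y <= 4 else 0 for x in range(6)] for y in range(6)]
eroded = erode(mask)
```

Each pass strips one pixel from the boundary of the segmented region, which is how ragged edge pixels are discarded before the gray-scale values are handed to the learning system.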
Kjeldsen’s PhD thesis
•Learning System – Backprop network
•1014 input nodes (one for each pixel)
•20 hidden nodes
•1 output node for each classification
•40 images of each pose
•Results:
•Correct classification 90-96% of the time on images
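As a rough illustration of the network shape listed above (1014 inputs, 20 hidden nodes, one output per classification), the following sketch runs a single forward pass. The sigmoid activation, the random initial weights, and the class count of 4 are assumptions made for illustration; the slides do not state them, and training by backpropagation is not shown.

```python
import math
import random

N_INPUTS, N_HIDDEN = 1014, 20   # sizes taken from the slide
N_CLASSES = 4                   # assumed; the slide only says one output per classification


def make_layer(n_in, n_out, rng):
    """One row of weights per node; the last entry in each row is the bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, inputs):
    """Sigmoid of the weighted sum (plus bias) for every node in the layer."""
    outputs = []
    for weights in layer:
        total = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs


rng = random.Random(0)
hidden_layer = make_layer(N_INPUTS, N_HIDDEN, rng)
output_layer = make_layer(N_HIDDEN, N_CLASSES, rng)

pixels = [rng.random() for _ in range(N_INPUTS)]   # stand-in for gray-scale values in [0, 1]
scores = forward(output_layer, forward(hidden_layer, pixels))
predicted_class = scores.index(max(scores))        # pose with the highest output
```

Under these assumptions the network has 1014*20 + 20 weights and biases in the hidden layer and 20*4 + 4 in the output layer.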
Implementation Overview
• System:
  • 1.33 GHz AMD Athlon
  • OpenCV and IPL libraries (from Intel)
• Input:
  • Two 640x480 images, saturation channel
  • Max hand size in x and y orientations, in number of pixels
• Output:
  • Rough estimate of movement
  • Refined estimate of movement
  • Number of fingers being held up
  • Rough orientation
Implementation Overview
Chronological order of system:
1) Saturation channel extraction
2) Threshold saturation channel
3) Calculate Center of Mass (CoM)
4) Reduce noise
5) Remove arm from hand
6) Calculate refined-CoM
7) Calculate orientation
8) Count the number of fingers
Implementation Analysis
1. Saturation channel extraction:
Digital camera, saved as JPGs
JPGs converted to 640x480 PPMs
Saturation channels extracted into PGMs
[Figure: original image and its hue, lightness, and saturation channels]
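The saturation extraction above can be sketched with Python's standard `colorsys` module. This is an illustration only: it assumes the hue/lightness/saturation (HLS) decomposition suggested by the figure labels, takes the image as nested lists of (r, g, b) byte triples instead of reading a PPM file, and uses the hypothetical helper name `saturation_channel`.

```python
import colorsys


def saturation_channel(rgb_rows):
    """Map rows of (r, g, b) byte triples to rows of 0-255 saturation values."""
    out = []
    for row in rgb_rows:
        out_row = []
        for r, g, b in row:
            # colorsys works on floats in [0, 1] and returns (hue, lightness, saturation).
            _hue, _lightness, sat = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
            out_row.append(round(sat * 255))
        out.append(out_row)
    return out


image = [
    [(255, 0, 0), (128, 128, 128)],      # pure red, mid gray
    [(255, 255, 255), (200, 150, 120)],  # white, a skin-like tone
]
sat = saturation_channel(image)
```

Gray and white pixels come out with zero saturation while colored pixels do not, which is what makes a fixed threshold on this channel usable for separating a hand from a neutral background.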
Implementation Analysis
2. Threshold Saturation channel:
a) Threshold value – 50 (values range from 0 to 255)
b) Per pixel: PixelValue = (PixelValue ≥ 50) ? 128 : 0
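The thresholding rule above translates directly into code. The sketch below assumes the saturation channel is held as nested lists of 0-255 integers; `threshold` is a hypothetical helper name, while the cutoff of 50 and the output value of 128 come from the slide.

```python
def threshold(image, cutoff=50, on_value=128):
    """Set every pixel at or above the cutoff to on_value and all others to 0."""
    return [[on_value if pixel >= cutoff else 0 for pixel in row] for row in image]


saturation = [
    [0, 49, 50],
    [51, 200, 255],
]
binary = threshold(saturation)
```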
Implementation Analysis
3. Calculate Center of Mass (CoM):
a) Count number of 128-valued pixels
b) Sum x-values and y-values of those pixels
c) Divide each sum by the number of pixels
a) 0th moment of an image:
$M_{00} = \sum_x \sum_y I(x, y)$
b) 1st moment for x and y of an image, respectively:
$M_{10} = \sum_x \sum_y x \, I(x, y)$
$M_{01} = \sum_x \sum_y y \, I(x, y)$
c) Center of Mass (location of centroid):
$(x_c, y_c)$ where $x_c = \frac{M_{10}}{M_{00}}$ and $y_c = \frac{M_{01}}{M_{00}}$
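The center-of-mass computation can be sketched as below. It treats I(x, y) as 1 for every pixel equal to the foreground value (128 after thresholding) and 0 otherwise, which matches the count-and-sum steps listed above; `center_of_mass` is a hypothetical helper name and the nested-list image is an assumption of the sketch.

```python
def center_of_mass(image, value=128):
    """Return (x_c, y_c) of all pixels equal to value, or None if there are none."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel == value:
                m00 += 1    # 0th moment: number of foreground pixels
                m10 += x    # 1st moment in x
                m01 += y    # 1st moment in y
    if m00 == 0:
        return None
    return m10 / m00, m01 / m00


binary = [
    [0, 0, 0, 0],
    [0, 128, 128, 0],
    [0, 128, 128, 0],
    [0, 0, 0, 0],
]
com = center_of_mass(binary)
```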
Implementation Analysis
4. Reduce Noise:
FloodFill at the computed CoM (128-valued pixels become 192)
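The noise-reduction idea is that only the region connected to the CoM is relabeled, so isolated specks keep the old value and can be ignored later. The slide names FloodFill; the sketch below is a plain breadth-first stand-in, not the OpenCV call, and it assumes 4-connectivity, an integer seed, and a nested-list image. `flood_fill` is a hypothetical helper name.

```python
from collections import deque


def flood_fill(image, seed, old=128, new=192):
    """Relabel, in place, the 4-connected region of `old` pixels containing seed.

    Returns the number of pixels relabeled (0 if the seed is not an `old` pixel).
    """
    seed_x, seed_y = seed
    height, width = len(image), len(image[0])
    if old == new or not (0 <= seed_x < width and 0 <= seed_y < height):
        return 0
    if image[seed_y][seed_x] != old:
        return 0
    image[seed_y][seed_x] = new
    queue = deque([(seed_x, seed_y)])
    filled = 1
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and image[ny][nx] == old:
                image[ny][nx] = new
                queue.append((nx, ny))
                filled += 1
    return filled


img = [
    [128, 0, 0, 0, 0],      # isolated noise pixel at the top left
    [0, 0, 128, 128, 0],
    [0, 0, 128, 128, 0],
    [0, 0, 0, 0, 0],
]
count = flood_fill(img, seed=(2, 1))
```

In practice the computed CoM has fractional coordinates and would need rounding to a pixel before being used as the seed; if that pixel is not foreground, this sketch fills nothing.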
Implementation Analysis
5. Remove arm from hand
a) Find top left of bounding box
b) Apply border for bounding box from calibration measure
c) FloodFill, 192 to 254
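One way to read the arm-removal steps is sketched below. It is a simplification under stated assumptions: the bounding box is anchored at the top-left extent of the 192-valued region, its width and height are the calibrated maximum hand size, and pixels inside the box are relabeled directly instead of by a flood fill. The helper name `remove_arm` and its parameters are hypothetical.

```python
def remove_arm(image, hand_w, hand_h, old=192, new=254):
    """Relabel `old` pixels inside a hand-sized box; pixels outside stay `old`.

    Returns the (x, y) top-left corner of the box, or None if no `old` pixels exist.
    """
    coords = [(x, y) for y, row in enumerate(image)
              for x, pixel in enumerate(row) if pixel == old]
    if not coords:
        return None
    x0 = min(x for x, _ in coords)   # left edge of the region
    y0 = min(y for _, y in coords)   # top edge of the region
    for x, y in coords:
        if x < x0 + hand_w and y < y0 + hand_h:
            image[y][x] = new
    return x0, y0


img = [
    [0, 192, 192, 0],   # hand
    [0, 192, 192, 0],   # hand
    [0, 192, 192, 0],   # arm
    [0, 192, 192, 0],   # arm
]
top_left = remove_arm(img, hand_w=2, hand_h=2)
```

The sketch only works when the arm leaves the box through its bottom or right side; other arm directions would need the box anchored differently.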
Implementation Analysis
6. Calculate refined-CoM (rCoM):
a) Threshold, 254 to 255
b) Compute CoM as before
Implementation Analysis
7. Orientation:
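a) 0th moment of an image:
$M_{00} = \sum_x \sum_y I(x, y)$
b) 1st moment for x and y of an image, respectively:
$M_{10} = \sum_x \sum_y x \, I(x, y)$
$M_{01} = \sum_x \sum_y y \, I(x, y)$
c) 2nd moment for x and y of an image, respectively:
$M_{20} = \sum_x \sum_y x^2 \, I(x, y)$
$M_{02} = \sum_x \sum_y y^2 \, I(x, y)$
d) Orientation of image major axis:
$\theta = \frac{1}{2} \arctan\left( \frac{2\left(\frac{M_{11}}{M_{00}} - x_c y_c\right)}{\left(\frac{M_{20}}{M_{00}} - x_c^2\right) - \left(\frac{M_{02}}{M_{00}} - y_c^2\right)} \right)$
where $M_{11} = \sum_x \sum_y x \, y \, I(x, y)$ is the mixed 2nd moment and $(x_c, y_c)$ is the center of mass.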
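The orientation formula can be sketched in code as follows. As in the CoM step, I(x, y) is taken to be 1 for hand pixels (value 255 after the refined threshold) and 0 elsewhere. One deliberate substitution: the sketch uses `math.atan2` in place of a plain arctangent of the quotient, so that a zero denominator does not raise an error and the quadrant is resolved; `orientation` is a hypothetical helper name.

```python
import math


def orientation(image, value=255):
    """Angle (radians) of the major axis of the pixels equal to value, or None."""
    m00 = m10 = m01 = m20 = m02 = m11 = 0
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel == value:
                m00 += 1
                m10 += x
                m01 += y
                m20 += x * x
                m02 += y * y
                m11 += x * y
    if m00 == 0:
        return None
    xc, yc = m10 / m00, m01 / m00
    a = m20 / m00 - xc * xc          # spread along x
    b = 2 * (m11 / m00 - xc * yc)    # twice the x-y covariance
    c = m02 / m00 - yc * yc          # spread along y
    return 0.5 * math.atan2(b, a - c)


diagonal = [
    [255, 0, 0],
    [0, 255, 0],
    [0, 0, 255],
]
theta = orientation(diagonal)
```

For the diagonal example the two spreads are equal and the covariance is positive, so the angle is a quarter of pi in image coordinates (y increasing downward).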
Implementation Analysis
8. Count the number of fingers (via FingerCountGivenX)
Function inputs:
a) Pointer to Image Data
b) rCoM
c) Radius = 0.17*HandSizeX + 0.17*HandSizeY
d) Starting Location (x or y, call appropriate function)
e) Ending Location (x or y, call appropriate function)
f) White Pixel Counter
g) Black Pixel Counter
h) Finger Counter
Implementation Analysis
8. Count the number of fingers:
•Two similar functions – start/end location in x or y
•After all previous steps, the finger-finding function sweeps out an arc, counting the number of white and black pixels as it progresses
•A finger in the current system is defined to be any 10+ white pixels separated by 3+ black pixels (salt/pepper tolerance); the final count is reduced by 1 for the hand itself
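The counting rule can be sketched as a run counter over pixels sampled along an arc. This is one interpretation, not the original FingerCountGivenX code: black gaps shorter than 3 pixels are treated as noise inside a white run, white runs shorter than 10 pixels are ignored, and 1 is subtracted from the number of runs. The helper names, the angular sampling in `sample_arc`, and the white value of 255 are assumptions.

```python
import math


def sample_arc(image, center, radius, start_deg=0.0, end_deg=180.0, step_deg=1.0):
    """Return True (white) / False (black) samples along an arc around center."""
    cx, cy = center
    height, width = len(image), len(image[0])
    samples = []
    angle = start_deg
    while angle <= end_deg:
        x = int(round(cx + radius * math.cos(math.radians(angle))))
        y = int(round(cy - radius * math.sin(math.radians(angle))))  # image y grows downward
        inside = 0 <= x < width and 0 <= y < height
        samples.append(inside and image[y][x] == 255)
        angle += step_deg
    return samples


def count_fingers(samples, min_white=10, min_black=3):
    """Count qualifying white runs along the sweep, minus 1 for the hand itself."""
    runs = 0
    white = 0   # white pixels seen in the current candidate run
    black = 0   # consecutive black pixels since the last white pixel
    for is_white in samples:
        if is_white:
            white += 1
            black = 0
        else:
            black += 1
            if black >= min_black:      # gap is long enough to end the run
                if white >= min_white:
                    runs += 1
                white = 0
    if white >= min_white:              # run still open at the end of the sweep
        runs += 1
    return max(runs - 1, 0)


sweep = ([False] * 5 + [True] * 12 + [False] * 4        # first run of 12
         + [True] * 6 + [False] + [True] * 6            # 1-pixel gap is tolerated
         + [False] * 5 + [True] * 3 + [False] * 5)      # 3-pixel speck is ignored
fingers = count_fingers(sweep)
```

With the radius from the function inputs (0.17*HandSizeX + 0.17*HandSizeY) and the rCoM as center, `count_fingers(sample_arc(image, rcom, radius))` would give the estimate for one frame under these assumptions.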
Implementation Analysis
8. Count the number of fingers:
[Figure: illustration of noise tolerance]
Implementation Analysis
[Figures: two examples of system input with the corresponding system output]
Implementation Analysis
System Runtime:
•Real time – requires 30 fps
•Current time – 16.5 ms for one frame (without reading or writing)
•Current processing capability on 1.33 GHz Athlon – 60 fps

Process Steps              Time (ms), Athlon MP 1500 (1.33 GHz)   Time (ms), Pentium 850 MHz
1) Reading Image           ?                                      ?
2) Reading Image           208                                    340
3) Threshold               0.5                                    6.5
4) Center of Mass          3.5                                    18.5
5) Flood Fill              1.5                                    27
6) Bounding Box Top-Left   3.5                                    5.5
7) Arm Removal             2                                      34.5
8) Refined CoM             4                                      19
9) Finger Counting         0.5                                    1
10) Write Image            233                                    324
Time w/o R&W               16.5                                   112
Time w/o Write             224.5                                  452
Total Time                 457.5                                  776.5
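The frame-rate figures follow from the per-frame times by a simple conversion, sketched below using only the totals reported in the table. `frames_per_second` is a hypothetical helper name.

```python
def frames_per_second(ms_per_frame):
    """Convert a per-frame processing time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


REAL_TIME_FPS = 30                               # real-time target from the slide

athlon_fps = frames_per_second(16.5)             # Athlon, without reading or writing
pentium_fps = frames_per_second(112)             # Pentium, without reading or writing
athlon_with_io_fps = frames_per_second(457.5)    # Athlon, including file read and write

athlon_is_real_time = athlon_fps >= REAL_TIME_FPS
pentium_is_real_time = pentium_fps >= REAL_TIME_FPS
```

At 16.5 ms per frame the Athlon clears the 30 fps target with roughly a factor of two to spare, while the Pentium at 112 ms per frame does not; once file reading and writing are included, both machines fall well below real time.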
Future Directions
•Optimization
•Orientation for Hand Registration
•New Finger Counting Approach
•Learning System
For additional information, please visit http://ksl.jsu.edu.