PPT Slides - Michigan State University

A Non-obtrusive Head Mounted
Face Capture System
Chandan K. Reddy
Master’s Thesis Defense
Thesis Committee:
Dr. George C. Stockman (Main Advisor)
Dr. Frank Biocca (Co-Advisor)
Dr. Charles Owen
Dr. Jannick Rolland (External Faculty)
Modes of Communication
- Text only, e.g. mail, electronic mail
- Voice only, e.g. telephone
- PC-camera-based conferencing, e.g. webcam
- Multi-user teleconferencing
- Teleconferencing through virtual environments
- Augmented-reality-based teleconferencing
Problem Definition
- Face Capture System (FCS)
- Virtual view synthesis
- Depth extraction and 3D face modeling
- Head-mounted projection displays
- 3D tele-immersive environments
- High-bandwidth network connections
Thesis Contributions
- Complete hardware setup for the FCS
- Camera-mirror parameter estimation for the optimal configuration of the FCS
- Generation of quality frontal videos from two side videos
- Reconstruction of a texture-mapped 3D face model from two side views
- Evaluation mechanisms for the generated frontal views
Existing Face Capture Systems
Courtesy: FaceCap3D, a product from Standard Deviation; Optical Face Tracker, a product from Adaptive Optics
Advantages: freedom of head movement
Drawbacks: obstruction of the user's field of view
Main applications: character animation and mobile environments
Existing Face Capture Systems
Courtesy: National Tele-immersion Initiative, "Sea of Cameras" (UNC Chapel Hill)
Advantages: no burden on the user
Drawbacks: highly equipped environments and restricted head motion
Main applications: teleconferencing and collaborative work
Proposed Face Capture System
(F. Biocca and J. P. Rolland, “Teleportal face-to-face system”, Patent Filed, 2000.)
A novel face capture system under development. Two cameras capture the corresponding side views through the mirrors.
Advantages
- The user's field of view is unobstructed
- Portable and easy to use
- Gives very accurate, high-quality face images
- Can process in real time
- Simple and user-friendly system
- Static with respect to the human head
- Flipping the mirrors lets the cameras view the user's viewpoint
Applications
- Mobile environments
- Collaborative work
- Multi-user teleconferencing
- Medical areas
- Distance learning
- Gaming and entertainment industry
- Others
System Design
Equipment Required
Hardware:
- 2 lipstick cameras
- 2 lenses with 12 mm focal length
- 2 mirrors with 1.5 inch diameter
- 2 Matrox Meteor II standard cards
Software:
- MIL-LITE 7.0
- Visual Studio 6.0
- Adobe Premiere 6.0
- Sound Recorder
Other equipment:
- Lighting equipment
- VGA-to-NTSC converter
- A projector
- A microphone
Network:
- Internet2
- NAC 3000 MPEG encoder
- NAC 4000 MPEG decoder
Optical Layout
Three components to be considered:
- Camera
- Mirror
- Human face
Specification Parameters
Camera:
- Sensing area: 3.2 mm × 2.4 mm (1/4")
- Pixel dimensions: the sensed image is 768 × 494 pixels; the digitized image is 320 × 240 due to RAM-size restrictions
- Focal length (Fc): 12 mm (VCL-12UVM)
- Field of view (FOV): 15.2° × 11.4°
- Diameter (Dc): 12 mm
- F-number (Nc): 1, to admit maximum light
- Minimum working distance (MWD): 200 mm
- Depth of field (DOF): to be estimated
Specification Parameters (Contd.)
Mirror:
- Diameter (Dm) / F-number (Nm)
- Focal length (fm)
- Magnification factor (Mm)
- Radius of curvature (Rm)
Human face:
- Height of the face to be captured (H ~ 250 mm)
- Width of the face to be captured (W ~ 175 mm)
Distances:
- Distance between the camera and the mirror (Dcm ~ 150 mm)
- Distance between the mirror and the face (Dmf ~ 200 mm)
Customization of Cameras and Mirrors
Off-the-shelf cameras:
- Customizing a camera lens is a tedious task
- A trade-off has to be made between the field of view and the depth of field
- The Sony DXC-LS1 with a 12 mm lens is suitable for our application
Custom-designed mirrors:
- A plano-convex lens with 40 mm diameter is coated black on the planar side
- The radius of curvature of the convex surface is 155.04 mm
- The thickness at the center of the lens is 5 mm
- The thickness at the edge is 3.7 mm
Block diagram of the system
Experimental setup
Virtual Video Synthesis
Problem Statement
Generating a virtual frontal view from two side views
Data Processing
- Two synchronized videos are captured simultaneously in real time (30 frames/sec)
- For effective capturing and processing, the data is stored in uncompressed format
Machine specifications (Lorelei @ metlab.cse.msu.edu):
- Pentium III processor
- Processor speed: 746 MHz
- RAM size: 384 MB
- Hard disk write speed (practical): 9 MB/s
- MIL-LITE is configured to use 150 MB of RAM
Data Processing (Contd.)
- Size of 1 second of video = 30 × 320 × 240 × 3 bytes ≈ 6.59 MB
- Using 150 MB of RAM, only about 10 seconds of video from the two cameras can be captured
Why does the processing have to be offline?
- The calibration procedure is not automatic
- The disk write speed would have to be at least 14 MB/s, while the practical speed is 9 MB/s
- To capture two videos at 640 × 480 resolution, the disk write speed would have to be at least 54 MB/s
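The storage arithmetic above can be sketched as a quick check (a minimal Python sketch; the 1 MB = 2^20 bytes convention is assumed):

```python
# Back-of-envelope check of the capture bandwidth figures on the slides.
# Assumptions: 24-bit RGB pixels, 30 fps, and 1 MB = 2**20 bytes.

def video_mb_per_sec(width, height, fps=30, bytes_per_pixel=3):
    """Uncompressed video data rate in MB/s."""
    return fps * width * height * bytes_per_pixel / 2**20

rate_320 = video_mb_per_sec(320, 240)            # one camera at 320x240
print(round(rate_320, 2))                        # 6.59 MB/s, matching the slide

# Two cameras must be written to disk together:
print(round(2 * rate_320, 1))                    # 13.2 MB/s -> at least 14 MB/s needed

# Seconds of two-camera video that fit in the 150 MB MIL-LITE buffer
# (about 11 s, consistent with the slide's "about 10 seconds"):
print(int(150 / (2 * rate_320)))

# At 640x480 the two-camera rate roughly quadruples:
print(round(2 * video_mb_per_sec(640, 480), 1))  # ~52.7 MB/s
```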

Structured Light Technique
- A grid is projected onto the frontal view of the face
- A square of the grid in the frontal view appears as a quadrilateral (with curved edges) in the real side view
Color Balancing
Hardware-based approach:
- White balancing of the cameras
Why is this more robust than a software-based approach?
- There is no change to the input camera
- Better handling of varying lighting conditions
- No prior knowledge of the skin color is required
- No additional overhead
- It is enough if the two cameras are color-balanced relative to each other
Off-line Calibration Stage
The left and right calibration face images, together with the projector, produce the transformation tables.
Operational Stage
The left and right face images are warped through the transformation tables into left and right warped face images, which are combined into the mosaiced face image.
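The operational stage can be sketched as a table-driven warp followed by a midline mosaic. This is a minimal illustration, assuming the calibration phase yields per-pixel lookup tables; the table layout and the simple half-and-half join are assumptions for illustration, not the thesis implementation:

```python
# Sketch of the operational stage: warp each side view into the frontal
# frame via precomputed lookup tables, then join the two halves.
import numpy as np

def warp_with_table(side_image, table_rows, table_cols):
    """Warp a side view into the frontal frame via a precomputed lookup table:
    output pixel (i, j) samples side_image[table_rows[i, j], table_cols[i, j]]."""
    return side_image[table_rows, table_cols]

def mosaic(left_warped, right_warped):
    """Join the two warped halves along the vertical midline of the face."""
    h, w = left_warped.shape[:2]
    out = np.empty_like(left_warped)
    out[:, : w // 2] = left_warped[:, : w // 2]
    out[:, w // 2 :] = right_warped[:, w // 2 :]
    return out

# Toy 4x4 grayscale example with identity lookup tables:
rows, cols = np.indices((4, 4))
left = np.full((4, 4), 10, dtype=np.uint8)
right = np.full((4, 4), 20, dtype=np.uint8)
frontal = mosaic(warp_with_table(left, rows, cols),
                 warp_with_table(right, rows, cols))
print(frontal[0, 0], frontal[0, 3])  # 10 20
```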
Virtual video synthesis
(Calibration phase)
Virtual video synthesis (contd.)
Virtual Frontal Video
Comparison of the Frontal Views
First row – Virtual frontal views
Second row – Original frontal views
Video Synchronization (Eye blinking)
First row – Virtual frontal views
Second row – Original frontal views
Face Data through Head Mounted System
3D Face Model
Coordinate Systems
There are five coordinate systems in our application:
- World Coordinate System (WCS)
- Face Coordinate System (FCS)
- Left Camera Coordinate System (LCCS)
- Right Camera Coordinate System (RCCS)
- Projector Coordinate System (PCS)
Camera Calibration
Conversion from 3D world coordinates to 2D camera coordinates: the perspective transformation model

    [ s·Pr ]   [ C11 C12 C13 C14 ] [ Px ]
    [ s·Pc ] = [ C21 C22 C23 C24 ] [ Py ]
    [   s  ]   [ C31 C32 C33  1  ] [ Pz ]
                                   [  1 ]

where (Pr, Pc) are the image (row, column) coordinates of a point and (Px, Py, Pz) are its world coordinates.

Eliminating the scale factor s gives, for each calibration point j with image coordinates (uj, vj) and world coordinates (xj, yj, zj):

    uj = (c11 − c31·uj)·xj + (c12 − c32·uj)·yj + (c13 − c33·uj)·zj + c14
    vj = (c21 − c31·vj)·xj + (c22 − c32·vj)·yj + (c23 − c33·vj)·zj + c24
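Given several calibration points, the two linear equations per point can be stacked and solved for the eleven unknown coefficients. The sketch below is a standard linear least-squares formulation; the ground-truth camera matrix is a synthetic example, not the thesis hardware:

```python
# Linear least-squares estimation of the 11-parameter perspective model,
# with C34 fixed to 1 as in the matrix on the slide.
import numpy as np

def calibrate(world_pts, image_pts):
    """Estimate the 3x4 matrix C from 3D-2D calibration correspondences."""
    A, b = [], []
    for (x, y, z), (u, v) in zip(world_pts, image_pts):
        # u-equation: c11 x + c12 y + c13 z + c14 - u(c31 x + c32 y + c33 z) = u
        A.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z])
        b.append(u)
        # v-equation, analogously with c21..c24
        A.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z])
        b.append(v)
    c, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(c, 1.0).reshape(3, 4)   # re-attach the fixed C34 = 1

def project(C, p):
    u, v, s = C @ np.append(p, 1.0)
    return u / s, v / s

# Synthetic ground-truth camera and 3D points (made-up values):
C_true = np.array([[800, 0, 160, 200], [0, 800, 120, 150], [0, 0, 1, 1]], float)
rng = np.random.default_rng(0)
pts3d = rng.uniform(-1, 1, (20, 3)) + [0, 0, 5]   # points in front of the camera
pts2d = [project(C_true, p) for p in pts3d]
C_est = calibrate(pts3d, pts2d)
print(np.allclose(C_est, C_true, atol=1e-4))  # True
```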
Calibration Sphere
- A sphere can be used for calibration
- Calibration points on the sphere are chosen such that the azimuthal angle varies in steps of 45° and the polar angle in steps of 30°
- The locations of these calibration points are known in the 3D coordinate system with respect to the origin of the sphere
- The origin of the sphere defines the origin of the World Coordinate System
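The calibration-point grid described above can be sketched as follows (the sphere radius is an arbitrary placeholder, and the poles are skipped since azimuth is undefined there):

```python
# Generate calibration points on a sphere centered at the world origin:
# azimuth in 45-degree steps, polar angle in 30-degree steps.
import math

def sphere_points(radius=100.0, az_step=45, pol_step=30):
    pts = []
    for pol in range(pol_step, 180, pol_step):      # 30..150, excluding poles
        for az in range(0, 360, az_step):           # 0..315
            t, p = math.radians(pol), math.radians(az)
            pts.append((radius * math.sin(t) * math.cos(p),
                        radius * math.sin(t) * math.sin(p),
                        radius * math.cos(t)))
    return pts

pts = sphere_points()
print(len(pts))  # 5 polar rings x 8 azimuths = 40 points
```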
Projector Calibration
- Similar to camera calibration
- The 2D image coordinates cannot be obtained directly from a 2D image
- A "blank image" is projected onto the sphere
- The 2D coordinates of the calibration points in the projected image are noted
- More points can be seen from the projector's point of view; some points are common to both camera views
- The results appear to have slightly larger errors than the camera calibration
3D Face Model Construction
Why?
- To obtain different views of the face
- To generate the stereo pair for viewing in the HMPD
Steps required:
- Computation of 3D locations
- Customization of the 3D model
- Texture mapping
Computation of 3D Points
- 3D point estimation uses stereo
- Stereo between the two cameras is not possible because of occlusion by the facial features
- Hence, two stereo pairs are computed:
  - Left camera and projector
  - Right camera and projector
- Using stereo, the 3D locations of prominent facial feature points are computed in the FCS
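Given two calibrated 3×4 matrices (for a camera-projector pair), a 3D point can be recovered from its two image positions by standard linear triangulation; the matrices below are synthetic examples, not the thesis calibration:

```python
# Two-view linear (DLT-style) triangulation from 3x4 perspective matrices.
import numpy as np

def triangulate(C1, uv1, C2, uv2):
    """Recover a 3D point from its projections in two calibrated views."""
    rows = []
    for C, (u, v) in ((C1, uv1), (C2, uv2)):
        rows.append(u * C[2] - C[0])   # u * (row 3) - row 1 = 0
        rows.append(v * C[2] - C[1])   # v * (row 3) - row 2 = 0
    _, _, Vt = np.linalg.svd(np.array(rows))
    X = Vt[-1]                          # null vector = homogeneous 3D point
    return X[:3] / X[3]

def project(C, P):
    u, v, s = C @ np.append(P, 1.0)
    return u / s, v / s

# Two synthetic views of the point (1, 2, 10):
C1 = np.hstack([np.eye(3), np.zeros((3, 1))])             # view at the origin
C2 = np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])]) # translated view
P = np.array([1.0, 2.0, 10.0])
X = triangulate(C1, project(C1, P), C2, project(C2, P))
print(np.allclose(X, P))  # True
```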
3D Generic Face Model
A generic face model with 395 vertices and 818 triangles
Left: front view and Right: side view
Texture Mapped 3D Face
Evaluation
Evaluation Schemes
- Evaluation of facial expressions is not studied extensively in the literature
- Evaluation can be done for facial alignment and face recognition on static images
- Lip and eye movements in a dynamic event
- Perceptual quality: how well are the moods conveyed?
Two types of evaluation:
- Objective evaluation
- Subjective evaluation

Objective Evaluation
- Theoretical evaluation; no human feedback required
- This evaluation can give us a measure of:
  - Face recognition
  - Face alignment
  - Facial movements
- Methods applied:
  - Normalized cross-correlation
  - Euclidean distance measures
Evaluation Images
5 frames were considered for objective evaluation
First row – virtual frontal views
Second row – original frontal views
Normalized Cross-Correlation
Regions considered for normalized cross-correlation
(Left: real image; Right: virtual image)
Normalized Cross-Correlation
- Let V be the virtual image and R be the real image
- Let w be the width and h be the height of the images
- The normalized cross-correlation between V and R is

    NCC(V, R) = Σ (V(i,j) − μV)·(R(i,j) − μR) / sqrt( Σ (V(i,j) − μV)² · Σ (R(i,j) − μR)² )

  where μV and μR are the mean intensities of V and R, and each sum runs over i = 1..h, j = 1..w
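The measure can be implemented directly; a minimal sketch assuming grayscale images as NumPy arrays of equal size:

```python
# Normalized cross-correlation between two equal-sized grayscale images.
import numpy as np

def ncc(V, R):
    v = V.astype(float) - V.mean()   # zero-mean virtual image
    r = R.astype(float) - R.mean()   # zero-mean real image
    return (v * r).sum() / np.sqrt((v * v).sum() * (r * r).sum())

a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(ncc(a, a))          # identical images -> 1.0
print(ncc(a, 2 * a + 5))  # affine intensity change -> still 1.0
print(ncc(a, -a))         # inverted contrast -> -1.0
```

The invariance to affine intensity changes is why NCC suits comparing a virtual view against a real view captured under slightly different exposure.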
Normalized Cross-Correlation
Video Left eye Right
eye
Frames
Frame1 0.988
0.987
Mouth
Eyes + Complete
Mouth
face
0.993
0.989
0.989
Frame2 0.969
0.972
0.985
0.978
0.985
Frame3 0.969
0.967
0.992
0.978
0.986
Frame4 0.991
0.989
0.993
0.990
0.990
Frame5 0.985
0.986
0.992
0.988
0.989
Euclidean Distance Measures
The Euclidean distance between two points i and j is

    d(i, j) = sqrt( (xi − xj)² + (yi − yj)² )

Let Rij be the Euclidean distance between points i and j in the real image, and Vij the corresponding distance in the virtual image. The error measure is

    Dij = | Rij − Vij |
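A minimal sketch of the error measure, with made-up feature-point coordinates for illustration:

```python
# D_ij = | R_ij - V_ij |: compare the distance between the same pair of
# feature points measured in the real and in the virtual image.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d_error(real_i, real_j, virt_i, virt_j):
    """Absolute difference of the point-pair distances, in pixels."""
    return abs(dist(real_i, real_j) - dist(virt_i, virt_j))

# Example: an eye-corner pair in the real vs. virtual frame (made-up pixels)
print(d_error((10, 10), (40, 50), (11, 10), (40, 48)))
```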
Euclidean Distance Measures

Frames  | Daf  | Dbf  | Dcf  | Dcg  | Ddg  | Deg  | Error
Frame 1 | 2.00 | 0.80 | 4.15 | 3.49 | 2.95 | 3.46 | 2.80
Frame 2 | 0.59 | 3.00 | 0.79 | 4.91 | 0.63 | 0.80 | 1.79
Frame 3 | 1.88 | 3.84 | 4.29 | 4.34 | 2.68 | 1.83 | 3.14
Frame 4 | 1.09 | 2.97 | 2.10 | 6.33 | 3.01 | 4.08 | 3.36
Frame 5 | 1.62 | 2.21 | 5.57 | 4.99 | 1.24 | 1.90 | 2.92
Subjective Evaluation
- Evaluates human perception
- Measures the quality of a talking face
- Factors that might affect the result:
  - Quality of the video
  - Facial movements and expressions
  - Synchronization of the two halves of the face
  - Color and texture of the face
  - Quality of the audio
  - Synchronization of the audio
- A preliminary study has been made to assess the quality of the generated videos
Conclusion and Future Work
Conclusion
- Virtual frontal image
- Virtual frontal video
- Texture-mapped 3D face model
- 3D facial animation
Summary
- Design and implementation of a novel face capture system
- Generation of virtual frontal views from two side views in a video sequence
- Extraction of depth information using the stereo method
- Texture-mapped 3D face model generation
- Evaluation of the virtual frontal videos
Future Work
- Online processing in real time
- Automatic calibration
- 3D facial animation
- Subjective evaluation of the virtual frontal videos
- Data compression during processing and transmission
- Customization of camera lenses
- Integration with a head-mounted projection display
Thank You
Doubts, Queries & Suggestions