A Non-obtrusive Head Mounted Face Capture System
Chandan K. Reddy
Master's Thesis Defense

Thesis Committee:
- Dr. George C. Stockman (Main Advisor)
- Dr. Frank Biocca (Co-Advisor)
- Dr. Charles Owen
- Dr. Jannick Rolland (External Faculty)

Modes of Communication
- Text only, e.g. mail, electronic mail
- Voice only, e.g. telephone
- PC-camera-based conferencing, e.g. webcam
- Multi-user teleconferencing
- Teleconferencing through virtual environments
- Augmented-reality-based teleconferencing

Problem Definition
- Face Capture System (FCS)
- Virtual view synthesis
- Depth extraction and 3D face modeling
- Head-mounted projection displays
- 3D tele-immersive environments
- High-bandwidth network connections

Thesis Contributions
- Complete hardware setup for the FCS
- Camera-mirror parameter estimation for the optimal configuration of the FCS
- Generation of quality frontal videos from two side videos
- Reconstruction of a texture-mapped 3D face model from two side views
- Evaluation mechanisms for the generated frontal views

Existing Face Capture Systems
Courtesy: FaceCap3d, a product from Standard Deviation; Optical Face Tracker, a product from Adaptive Optics
- Advantages: freedom for head movements
- Drawbacks: obstruction of the user's field of view
- Main applications: character animation and mobile environments

Existing Face Capture Systems
Courtesy: National Tele-immersion Initiative
Sea of Cameras (UNC Chapel Hill)
- Advantages: no burden on the user
- Drawbacks: highly equipped environments and restricted head motion
- Main applications: teleconferencing and collaborative work

Proposed Face Capture System
(F. Biocca and J. P. Rolland, "Teleportal face-to-face system", patent filed, 2000.)
A novel face capture system that is being developed.
Two cameras capture the corresponding side views of the face through the mirrors.

Advantages
- The user's field of view is unobstructed
- Portable and easy to use
- Gives very accurate, high-quality face images
- Can process in real time
- Simple and user-friendly system
- Static with respect to the human head
- Flipping the mirrors lets the cameras capture the user's own viewpoint

Applications
- Mobile environments
- Collaborative work
- Multi-user teleconferencing
- Medical areas
- Distance learning
- Gaming and entertainment industry
- Others

System Design: Equipment Required
Hardware
- 2 lipstick cameras
- 2 lenses with 12 mm focal length
- 2 mirrors with 1.5 inch diameter
- 2 Matrox Meteor II standard cards
Software
- MIL-LITE 7.0
- Visual Studio 6.0
- Adobe Premiere 6.0
- Sound Recorder
Other equipment
- Lighting equipment
- VGA-to-NTSC converter
- A projector
- A microphone
Network
- Internet 2
- NAC 3000 MPEG encoder
- NAC 4000 MPEG decoder

Optical Layout
Three components to be considered: the camera, the mirror, and the human face.

Specification Parameters
Camera
- Sensing area: 3.2 mm x 2.4 mm (1/4")
- Pixel dimensions: the sensed image is 768 x 494 pixels; the digitized image is 320 x 240 due to RAM-size restrictions
- Focal length (Fc): 12 mm (VCL-12UVM)
- Field of view (FOV): 15.2 deg x 11.4 deg
- Diameter (Dc): 12 mm
- f-number (Nc): 1, to gather the maximum amount of light
- Minimum working distance (MWD): 200 mm
- Depth of field (DOF): to be estimated

Specification Parameters (contd.)
Mirror
- Diameter (Dm) / f-number (Nm)
- Focal length (fm)
- Magnification factor (Mm)
- Radius of curvature (Rm)
Human face
- Height of the face to be captured (H ~ 250 mm)
- Width of the face to be captured (W ~ 175 mm)
Distances
- Distance between the camera and the mirror (Dcm ~ 150 mm)
- Distance between the mirror and the face (Dmf ~ 200 mm)

Customization of Cameras and Mirrors
Off-the-shelf cameras
- Customizing a camera lens is a tedious task
- A trade-off has to be made between the field of view and the depth of field
- The Sony DXC-LS1 with a 12 mm lens is suitable for our application
Custom-designed mirrors
- A plano-convex lens with 40 mm diameter is coated black on the planar side
- The radius of curvature of the convex surface is 155.04 mm
- The thickness at the center of the lens is 5 mm; the thickness at the edge is 3.7 mm

Block Diagram of the System

Experimental Setup

Virtual Video Synthesis: Problem Statement
Generating a virtual frontal view from two side views.

Data Processing
- Two synchronized videos are captured simultaneously in real time (30 frames/sec)
- For effective capturing and processing, the data is stored in uncompressed format
Machine specifications (Lorelei @ metlab.cse.msu.edu):
- Pentium III processor, 746 MHz
- RAM size: 384 MB
- Hard disk write speed (practical): 9 MB/s
- MIL-LITE is configured to use 150 MB of RAM

Data Processing (contd.)
- Size of 1 second of video = 30 x 320 x 240 x 3 bytes = 6.59 MB
- Using 150 MB of RAM, only about 10 seconds of video from the two cameras can be captured
Why does the processing have to be offline?
- The calibration procedure is not automatic
- The disk write speed would have to be at least 14 MB/s, but the practical speed is only 9 MB/s
- To capture two videos at 640 x 480 resolution, the disk write speed would have to be at least 54 MB/s

Structured Light Technique
- A grid is projected onto the frontal view of the face
- A square of the grid in the frontal view appears as a quadrilateral (with curved edges) in the real side view

Color Balancing
Hardware-based approach: white balancing of the cameras.
Why is this more robust than a software-based approach?
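The storage and bandwidth figures on the Data Processing slides can be reproduced with a short script. This is a sketch, assuming "MB" means 2^20 bytes and 3 bytes per pixel (24-bit color), as the slide's 6.59 MB figure implies:

```python
def raw_video_rate(fps, width, height, bytes_per_pixel=3):
    """Bytes per second of uncompressed video for one camera."""
    return fps * width * height * bytes_per_pixel

MB = 2 ** 20  # binary megabytes, matching the 6.59 MB figure on the slide

one_sec = raw_video_rate(30, 320, 240) / MB       # MB per camera per second
capture_sec = 150 / (2 * one_sec)                 # RAM-limited capture time, 2 cameras
two_cam_rate = 2 * one_sec                        # disk rate needed for 2 streams
vga_rate = 2 * raw_video_rate(30, 640, 480) / MB  # 2 streams at 640 x 480

print(f"{one_sec:.2f} MB per camera-second")   # 6.59
print(f"{capture_sec:.1f} s fit in 150 MB")    # 11.4 -> "only about 10 seconds"
print(f"{two_cam_rate:.1f} MB/s to disk")      # 13.2, quoted as "at least 14"
print(f"{vga_rate:.1f} MB/s at 640 x 480")     # 52.7, quoted as "at least 54"
```

Both required rates exceed the measured 9 MB/s disk write speed, which is why capture is RAM-buffered and processing is offline.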
Advantages of the hardware-based approach:
- No change is made to the input camera
- Better handling of varying lighting conditions
- No prior knowledge of the skin color is required
- No additional overhead
- It is enough if the two cameras are color balanced relative to each other

Off-line Calibration Stage
Left calibration face image + right calibration face image + projector -> transformation tables.

Operational Stage
Left face image -> transformation tables -> left warped face image; right face image -> right warped face image; the two halves are combined into the mosaiced face image.

Virtual Video Synthesis (calibration phase)
Virtual Video Synthesis (contd.)
Virtual Frontal Video

Comparison of the Frontal Views
First row: virtual frontal views. Second row: original frontal views.

Video Synchronization (eye blinking)
First row: virtual frontal views. Second row: original frontal views.

Face Data through the Head Mounted System

3D Face Model: Coordinate Systems
There are five coordinate systems in our application:
- World Coordinate System (WCS)
- Face Coordinate System (FCS)
- Left Camera Coordinate System (LCCS)
- Right Camera Coordinate System (RCCS)
- Projector Coordinate System (PCS)

Camera Calibration
Conversion from 3D world coordinates to 2D camera coordinates: the perspective transformation model

    [ s*Pr ]   [ c11 c12 c13 c14 ]   [ Px ]
    [ s*Pc ] = [ c21 c22 c23 c24 ] * [ Py ]
    [  s   ]   [ c31 c32 c33  1  ]   [ Pz ]
                                     [ 1  ]

where (Pr, Pc) are the image row and column, (Px, Py, Pz) are the world coordinates of the point, and s is the scale factor. Eliminating s gives, for calibration point j with image coordinates (uj, vj) and world coordinates (xj, yj, zj):

    uj = (c11 - c31*uj)*xj + (c12 - c32*uj)*yj + (c13 - c33*uj)*zj + c14
    vj = (c21 - c31*vj)*xj + (c22 - c32*vj)*yj + (c23 - c33*vj)*zj + c24

Calibration Sphere
A sphere can be used for calibration. The calibration points on the sphere are chosen so that
- the azimuthal angle is varied in steps of 45 degrees
- the polar angle is varied in steps of 30 degrees
The locations of these calibration points are known in 3D with respect to the origin of the sphere, which defines the origin of the World Coordinate System.

Projector Calibration
Similar to camera calibration, except that the 2D image coordinates cannot be obtained directly from a 2D image.
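The two linear equations per calibration point from the Camera Calibration slide can be stacked over all points and solved by least squares for the eleven unknowns c11..c33 (c34 is fixed to 1). A minimal NumPy sketch of this standard direct-linear-transform solve; the point data and function names are illustrative, not from the thesis:

```python
import numpy as np

def calibrate_dlt(world_pts, image_pts):
    """Solve the 11 unknown c_ij of the perspective model by least squares.

    world_pts: (N, 3) 3D calibration points (e.g. on the sphere), N >= 6,
               not all coplanar
    image_pts: (N, 2) corresponding (u, v) image coordinates
    """
    A, b = [], []
    for (x, y, z), (u, v) in zip(world_pts, image_pts):
        # u = c11 x + c12 y + c13 z + c14 - u (c31 x + c32 y + c33 z)
        A.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z]); b.append(u)
        # v = c21 x + c22 y + c23 z + c24 - v (c31 x + c32 y + c33 z)
        A.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z]); b.append(v)
    c, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(c, 1.0).reshape(3, 4)  # 3x4 matrix with c34 = 1

def project(C, p):
    """Apply the calibrated 3x4 matrix to a 3D point, returning (u, v)."""
    s = C @ np.append(p, 1.0)
    return s[:2] / s[2]
```

With noise-free, non-coplanar calibration points the recovered matrix reprojects the points exactly; with measured points the least-squares fit minimizes the algebraic residual over all 2N equations.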
- A "blank image" is projected onto the sphere
- The 2D coordinates of the calibration points in the projected image are noted
- More points can be seen from the projector's point of view; some points are common to both camera views
- The results show slightly larger errors than the camera calibration

3D Face Model Construction
Why?
- To obtain different views of the face
- To generate the stereo pair for viewing in the HMPD
Steps required
- Computation of 3D locations
- Customization of the 3D model
- Texture mapping

Computation of 3D Points
- 3D point estimation using stereo
- Stereo between the two cameras is not possible because of occlusion by the facial features
- Hence two stereo pairs are computed: left camera and projector, and right camera and projector
- Using stereo, the 3D locations of prominent facial feature points are computed in the FCS

3D Generic Face Model
A generic face model with 395 vertices and 818 triangles. Left: front view; right: side view.

Texture Mapped 3D Face

Evaluation Schemes
- Evaluation of facial expressions is not studied extensively in the literature
- Evaluation can be done for facial alignment and face recognition in static images
- Lip and eye movements in a dynamic event
- Perceptual quality: how are the moods conveyed?
Two types of evaluation
- Objective evaluation
- Subjective evaluation

Objective Evaluation
- Theoretical evaluation; no human feedback required
- Gives a measure of face recognition, face alignment, and facial movements
Methods applied
- Normalized cross-correlation
- Euclidean distance measures

Evaluation Images
Five frames were considered for objective evaluation. First row: virtual frontal views. Second row: original frontal views.

Normalized Cross-Correlation
Regions considered for normalized cross-correlation (left: real image; right: virtual image).

Let V be the virtual image and R be the real image, each of width w and height h. The normalized cross-correlation between V and R is

    NCC(V, R) = sum[ (V(x,y) - Vmean)(R(x,y) - Rmean) ]
                / sqrt( sum[ (V(x,y) - Vmean)^2 ] * sum[ (R(x,y) - Rmean)^2 ] )

where Vmean and Rmean are the mean intensities of V and R, and the sums run over all w x h pixels.

    Frame     Left eye  Right eye  Mouth  Eyes + Mouth  Complete face
    Frame 1    0.988     0.987     0.993     0.989         0.989
    Frame 2    0.969     0.972     0.985     0.978         0.985
    Frame 3    0.969     0.967     0.992     0.978         0.986
    Frame 4    0.991     0.989     0.993     0.990         0.990
    Frame 5    0.985     0.986     0.992     0.988         0.989

Euclidean Distance Measures
The Euclidean distance between two points i and j is

    d(i, j) = sqrt( (xi - xj)^2 + (yi - yj)^2 )

Let Rij be the Euclidean distance between points i and j in the real image, and Vij the corresponding distance in the virtual image. The error for the pair is Dij = | Rij - Vij |.

    Frame     Daf    Dbf    Dcf    Dcg    Ddg    Deg    Error
    Frame 1   2.00   0.80   4.15   3.49   2.95   3.46   2.80
    Frame 2   0.59   3.00   0.79   4.91   0.63   0.80   1.79
    Frame 3   1.88   3.84   4.29   4.34   2.68   1.83   3.14
    Frame 4   1.09   2.97   2.10   6.33   3.01   4.08   3.36
    Frame 5   1.62   2.21   5.57   4.99   1.24   1.90   2.92

Subjective Evaluation
Evaluates human perception: a measurement of the quality of a talking face.
Factors that might affect the rating
- Quality of the video
- Facial movements and expressions
- Synchronization of the two halves of the face
- Color and texture of the face
- Quality of the audio
- Synchronization of the audio
A preliminary study has been made to assess the quality of the generated videos.

Conclusion and Future Work
- Virtual frontal image
- Virtual frontal video
- Texture-mapped 3D face model
- 3D facial animation

Summary
- Design and implementation of a novel Face Capture System
- Generation of a virtual frontal view from two side views in a video sequence
- Extraction of depth information using the stereo method
- Texture-mapped 3D face model generation
- Evaluation of the virtual frontal videos

Future Work
- Online processing in real time
- Automatic calibration
- 3D facial animation
- Subjective evaluation of the virtual frontal videos
- Data compression during processing and transmission
- Customization of camera lenses
- Integration with a Head Mounted Projection Display

Thank You
Doubts, Queries & Suggestions
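Backup: the normalized cross-correlation used in the objective evaluation, as a short sketch. This assumes the standard zero-mean form of the measure; the region coordinates for the eye and mouth crops would come from the detected feature locations and are omitted here:

```python
import numpy as np

def ncc(V, R):
    """Normalized cross-correlation between a virtual image V and a real image R.

    Both arrays must have the same shape. The result is in [-1, 1];
    1.0 means the two regions match up to brightness and contrast.
    """
    v = V - V.mean()
    r = R - R.mean()
    return float((v * r).sum() / np.sqrt((v * v).sum() * (r * r).sum()))
```

For a region score, crop the same rectangle from both frames, e.g. `ncc(V[y0:y1, x0:x1], R[y0:y1, x0:x1])`; the measure is invariant to a linear intensity change, which is why it tolerates residual color differences between the mosaiced and real views.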