Tracking and Data Fusion for 4D Visualization
Ulrich Neumann and Suya You
Computer Science Department, Integrated Media Systems Center
University of Southern California
MURI Review, June 2002

Research Goals
Combine all manner of 3D models, images, video, and data in a coherent dynamic visualization to support spatio-temporal data understanding, information extraction, and dynamic scene change detection.
Needs: LiDAR, laser, stereo, …; satellite, aerial, and ground imagery; text, hyperlinks, communications, …
(Diagram: tracking and data fusion of 3D models and video/image data, with model refinement, feeding information fusion & visualization.)

Research Highlights
We pursue basic algorithm research and testbed implementations that are feasible with current or near-term technology.
• Robust tracking for outdoor, unprepared environments
  – Portable hybrid tracking and data acquisition/fusion system
  – Natural feature tracking using vision sensors
• Fusion of 2D video/images and 3D models
  – LiDAR data tessellation and model reconstruction
  – Real-time video texture projection and visualization
• 6DOF autocalibration technology
  – Detect and calibrate scene features (points and lines) to refine models and aid in tracking

Tracking in Unprepared Environments
• People with sensors (or unmanned sensors) moving through the environment provide textures and data for visualizations… Where are they? Where are they looking?
• Need 6DOF pose tracking over wide areas outdoors
  – Varying sensor data availability and data rates (vision, GPS, inertial sensors)
  – Varying certainty of measurements (spatial and temporal noise and precision)
  – Fusion models and algorithms (the underdetermined system needs constraints; real-time acquisition and execution on portable systems)
• Developed two tracking systems
  – Portable hybrid tracking and real-time data acquisition system
  – Natural feature tracking for computing the motion of the video sensor

Portable Acquisition System
(Diagram: DGPS receiver, 3DOF gyro sensor, and stereo camera head connect via Com1, Com2, and Firewire to the data fusion and storage laptop.)
System configuration:
• RTK differential GPS (Ashtech Z-Sensor base/mobile)
• 3D inertial tracker (InterSense IS300)
• Real-time stereo head (SRI International)
• PIII 866 MHz laptop

Real-Time Data Acquisition
• Hybrid DGPS and inertial sensors provide real-time 6DOF pose tracking
• High-resolution digital camera pairs capture video streams for texture projection and façade reconstruction
• Complete self-contained system in a backpack
• Acquisition in real time (all data streams are timestamped and synchronized)
• Includes data capture and playback tools

Natural Feature Tracking Using Vision Sensors
Problem: most vision tracking methods require a priori knowledge about the environment
• Pre-calibrated landmarks
• Active sensors
• Scene models
Active control or modification of an outdoor environment is unrealistic.
Our approach:
• Detect and use naturally occurring features
• Robust tracking of 1D (point) and 2D (region) features
• SFM (structure from motion) from the tracked features
Neither camera ego-motion nor structure information is known in advance.

Natural Feature Tracking Using Vision Sensors
(Pipeline: video streams → 2D feature detection (new features) → 2D feature tracking → feature verification → feature list → pose & structure estimation → pose and structure.)
The approach provides camera pose and structure estimates.
• Relative pose tracking can be used directly for augmented reality overlays
• Structure estimates allow continual tracking and can also be used to improve/refine existing models
• The framework allows further sensor fusion (GPS, gyroscopes) to recover absolute pose
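The slide above describes a detect/track/verify loop that feeds the pose-and-structure (SFM) estimator. Below is a minimal sketch of such a loop, assuming OpenCV's Shi-Tomasi corner detector and pyramidal Lucas-Kanade optical flow as stand-ins for the point-feature detector and tracker, and a forward-backward consistency check as one plausible form of feature verification; the input path and all parameters are hypothetical, and the project's actual detectors and verification tests may differ.

```python
# Sketch of a 2D feature detection / tracking / verification loop.
# Detector, tracker, and verification test are illustrative stand-ins.
import cv2
import numpy as np

cap = cv2.VideoCapture("campus_walkthrough.avi")   # hypothetical input stream
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# 2D feature detection: seed the feature list with strong corners
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # 2D feature tracking: pyramidal Lucas-Kanade optical flow
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)

    # Feature verification: track backwards and reject inconsistent points
    back, _, _ = cv2.calcOpticalFlowPyrLK(gray, prev_gray, nxt, None)
    consistent = np.linalg.norm((pts - back).reshape(-1, 2), axis=1) < 1.0
    keep = (status.ravel() == 1) & consistent
    pts = nxt[keep].reshape(-1, 1, 2)

    # Detect new features when the list runs low, avoiding existing ones
    if len(pts) < 100:
        mask = np.full(gray.shape, 255, dtype=np.uint8)
        for x, y in pts.reshape(-1, 2):
            cv2.circle(mask, (int(x), int(y)), 8, 0, -1)
        new = cv2.goodFeaturesToTrack(gray, 100, 0.01, 8, mask=mask)
        if new is not None:
            pts = np.vstack([pts, new])

    # The surviving feature list feeds the pose & structure estimator
    prev_gray = gray
```

In this sketch the verified feature list is what would be handed to the pose-and-structure estimator each frame; how that estimator works (and how GPS and gyroscope measurements are fused in) is described in the autocalibration slides later in the deck.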
Natural Feature Tracking: Vision Tracking Used for AR Overlays

Fusion of 2D Video/Images and 3D Models
Imagine dozens of video streams from people, UAVs, and sensors distributed and moving through the scene…
• Use sensor models and 3D models of the scene to integrate video/image data from different sources
• Visualize the imagery in unique, innovative ways that maximize information extraction and comprehension
• Produce dynamic visualizations from arbitrary viewpoints
  – Static textures pre-compute the mapping of images to models
  – Dynamic projections onto models act like "slide projectors"

3D Model: LiDAR from Flyover
• LiDAR provides accurate 3D position samples (sub-meter horizontal, cm-level height accuracy)
  – Used as the base model/context for video visualization
• Raw LiDAR comes as a 3D point cloud
  – Need tools for data resampling, tessellation, and model reconstruction

Data Tessellation and Model Reconstruction
• Data tessellation
  – Data resampling (irregular sample cloud to regular grid), as sketched below
  – Surface interpolation (hole filling)
• 3D models represented as triangle meshes
  – Easily converted to many other geometric representations
  – Support many level-of-detail techniques
  – Easily accept photometric information (texture projections)
  – Hardware acceleration for fast image rendering
• Use VRML as the standard model representation
  – Supports web applications and an open tool base

Data Resampling and Model Reconstruction
(Figures: resampled range image; reconstructed 3D model; LiDAR data acquired for the USC campus.)
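A minimal sketch of the resampling step described above (irregular point cloud to a regular height grid, with simple hole filling), assuming the cloud arrives as a NumPy array of (x, y, z) samples; the 1 m cell size, the max-height rule per cell, and the neighbor-mean hole filling are illustrative assumptions, not the project's actual pipeline.

```python
# Resample an irregular LiDAR point cloud onto a regular height grid,
# then fill empty cells (holes) from neighboring samples.
# Parameters and heuristics are illustrative only.
import numpy as np

def resample_to_grid(points, cell=1.0):
    """points: (N, 3) array of x, y, z samples; cell: grid spacing in meters."""
    xy_min = points[:, :2].min(axis=0)
    ij = np.floor((points[:, :2] - xy_min) / cell).astype(int)   # cell indices
    shape = ij.max(axis=0) + 1

    # Keep the maximum height per cell (an assumed rule: roof returns dominate)
    grid = np.full(shape, np.nan)
    for (i, j), z in zip(ij, points[:, 2]):
        if np.isnan(grid[i, j]) or z > grid[i, j]:
            grid[i, j] = z

    # Hole filling: replace empty cells with the mean of any filled neighbors
    for i, j in np.argwhere(np.isnan(grid)):
        patch = grid[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
        if np.isfinite(patch).any():
            grid[i, j] = np.nanmean(patch)
    return grid

# Example: a synthetic cloud of 10,000 points over a 100 m x 100 m area
cloud = np.random.rand(10000, 3) * [100.0, 100.0, 20.0]
range_image = resample_to_grid(cloud, cell=1.0)
```

Each filled grid cell can then become a mesh vertex, with adjacent cells connected into the triangle mesh that is textured and exported to VRML, as the tessellation slide above describes.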
Image/Video Texture Projection
Texture projection vs. texture mapping:
– dynamic vs. static
– the texture image and its position both change with each video frame
(Figure: a 3D model and an image texture.)

Virtual Texture Projector
• Dynamic view and placement control during visualization
• Update the pose and "paint" the scene each frame
• Hardware texture transformation during rendering (video)
• Real-time visualization with a hardware accelerator

Dynamic Texture Projection
• Putting it all together…
  – Accurate 3D models
  – Accurate 3D sensor models (calibration and 6DOF tracking)
  – A projection transformation computes the texture mapping during rendering
  – Visibility and occlusion processing (a multi-pass algorithm ensures that only visible surfaces are textured)
(Videos: texture projected on the USC model with GPS/inertial tracking; with vision/GPS/inertial tracking; and on the USC LiDAR model with vision/GPS/inertial tracking.)

LiDAR/Projection Rendering System
• Visualizations from arbitrary viewpoints
• User control of viewpoint as well as image inclusion (blending and projection parameters)
• Multiple texture projectors simultaneously visualize many image sources projected onto the same model
• Hardware acceleration with a GeForce graphics card using pixel-shader features
• Visualization wall (8x10-foot tracked stereo)

6DOF Autocalibration
• Autocalibration computes 3D scene information during tracking
  – Allows tracking in regions beyond models or landmarks
  – Provides the scale-factor data needed to create the sixth DOF that is lacking from vision-only tracking
  – Provides absolute pose data for stabilizing multi-sensor data fusion
  – Provides estimated 3D information about structure features (points, lines, and edges) to improve/refine models

6DOF Autocalibration
• Camera pose and features are estimated simultaneously along the motion path
  – Point features (developed over the past 2-3 years)
  – Line features (this past year)
• Pose estimation from 3D calibrated lines or points
• Autocalibration of 3D lines
  – Unique, minimal line representation: four parameters, N1 and N2 of the form (x, y, 1), which for a given pair of poses (R1T1, R2T2) uniquely determine a 3D line L
  – EKF-based estimator
    • Updates per feature or per measurement
    • Adapts to different sample rates

Simulation Results
Simulation with a 100-inch volume and 50 lines.
(Plots: RMS error for camera orientations (degrees) vs. frame ID; RMS error for camera positions (m) vs. frame ID; reprojection errors for test points (pixels) vs. frame ID.)

Autocalibration Results
Tracked line features are marked in green; red lines are the projections of the autocalibrated lines. A virtual lamp and chair are inserted into the real scene based on the estimated camera pose.

Future Plans
• Automate steps in data capture and fusion for vision/GPS/inertial tracking and visualization
  – Perform 8-12 captures around 3-4 buildings on campus
  – Visualize the streams simultaneously
  – Establish correspondences from video to LiDAR/models (edges, windows, …)
• Model refinement (w/ Cal, GT)
  – Constrained autocalibration for estimating building features (simplification and definition of edges, planar faces, windows, …)
• Temporal data management (w/ Cal, GT, UCSC, USyr)
  – Accumulation of persistent textures
  – User management and temporal blending (updates)

Dynamic Texture Projection: Benefits and Capabilities
• Real-time multi-source data fusion
  – Models, imagery, video, maps ...
• Enhanced data understanding
  – Dynamic control of viewpoint as well as image inclusion
  – Rapid updates to reflect the most recent information
  – Highly interactive, real-time perspective-view capability
• 3D data editing using photogrammetric methods
  – Enables reuse of models and incremental refinement

Projection on LiDAR Data
(Figure: sensor, image plane, and view frustum; aerial view of projected image texture on the campus of Purdue University.)
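A minimal sketch of the "slide projector" math behind the projections shown above: a world-space model vertex is mapped through the sensor's pose and intrinsics onto its image plane, and that image position becomes the vertex's texture coordinate. The pinhole intrinsics, matrix layout, and example values are illustrative assumptions; in the real system this transform is loaded into the hardware texture matrix each frame, and the multi-pass visibility test described earlier decides which surfaces actually receive the texture.

```python
# Sketch of projective texturing: map a world-space model vertex into the
# sensor's image plane to obtain (s, t) texture coordinates.
# Intrinsics and poses here are illustrative, not a real calibration.
import numpy as np

def projector_texture_matrix(K, R, t):
    """K: 3x3 camera intrinsics; R, t: world-to-camera rotation / translation."""
    Rt = np.hstack([R, t.reshape(3, 1)])           # 3x4 extrinsics
    P = K @ Rt                                     # 3x4 projection
    return np.vstack([P, [0.0, 0.0, 0.0, 1.0]])    # pad to 4x4 for convenience

def project_vertex(T, v):
    """Return pixel coordinates for world vertex v, or None if behind the sensor."""
    x, y, w, _ = T @ np.append(v, 1.0)
    if w <= 0:                                     # vertex behind the projector
        return None
    return np.array([x / w, y / w])                # position in the video frame

# Hypothetical 640x480 sensor at the origin looking down +Z
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
T = projector_texture_matrix(K, np.eye(3), np.zeros(3))
print(project_vertex(T, np.array([1.0, 0.5, 10.0])))   # ~[370, 265]
```

Dividing the resulting pixel coordinates by the video frame's width and height gives normalized (s, t) texture coordinates for the graphics hardware; because the sensor pose changes every frame, the matrix (and thus the mapping) is recomputed per frame, which is what distinguishes texture projection from a static texture map.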