ECE 598: An Invitation to 3-D Vision: From Images to Geometric Models
Yi Ma
Department of Electrical & Computer Engineering
Course Proposal for Fall 2004
Course Description
This is an advanced graduate-level course on the geometry and dynamics of computer vision. It will cover
both geometric and algorithmic aspects of recovering three-dimensional motion and structure from multiple
(or a sequence of) two-dimensional images. The first part of the course consists of a mathematical introduction to multi-view geometry relying on new matrix rank and subspace techniques from linear
algebra, with a special emphasis on a global approach to multiple-view analysis (as opposed to most extant
methods). New geometry and algebra associated with multiple views of multiple moving objects will be the
focus of this course, with broad applications to motion/image/video segmentation and scene reconstruction
(topics that no other vision course or book has covered so far).
This is an advanced course in machine vision, and it can be considered a follow-up to the computer vision
course ECE449 (CS443) or the image processing course ECE447, with a narrower but much deeper
subject. Special pattern recognition methods that arise from motion analysis, such as generalized
principal component analysis, also complement the general methods introduced in the other ECE497
course, “Pattern Recognition,” offered by Narendra Ahuja last semester. However, this course is entirely
self-contained: necessary background knowledge in rigid-body motions, camera models, feature extraction
and correspondence, and two- and three-view geometry will be briefly covered at the beginning. Since these
topics are not the focus, software and code will be provided so that students can get hands-on experience
with them. The goal, however, is for students to learn the basics of multiple images and multiple motions
so that they can tackle more challenging tasks in their final projects.
The essential prerequisite is linear algebra (Math318, Math381 or ECE415). Some familiarity with geometry (Math423 or Math424), linear systems theory (ECE415), estimation theory (ECE461) and rigid-body
kinematics (ECE389/GE389) would increase your appreciation of the material but is not crucial. The course targets the
following students:
1. Graduate students in ECE/CS in the areas of machine vision, image processing, and computer graphics
interested in a simpler, more unified treatment of reconstruction from multiple images or video
sequences.
2. Graduate students in ECE or ME in the areas of control and robotics interested in vision-based robotic
control.
3. Graduate students in Mathematics interested in applications of modern geometry and multi-linear
algebra to computer vision.
Relation to Previous Courses and Future Plan
A preliminary version of this course has been offered twice before as ECE497, in Spring 2001 and Fall
2002. Both offerings were very well received by the students. There will be several improvements over the old
courses. First, the course notes have been completed and recently published as a textbook by Springer,
which will be used as the primary text for the new course. Second, the new course will no longer be limited
to the conventional topics of multiple-view geometry. New and more diverse topics will be introduced
to update the old course program, such as generalized principal component analysis and its applications
in image/motion/video segmentation, geometry of non-rigid motions and linked rigid-body motions (e.g.,
human movement), and geometry of different camera models including orthographic, affine, and catadioptric
cameras.
In the near future, in collaboration with other computer vision and image processing faculty (including Tom Huang, Narendra Ahuja, Yoshi Shinagawa), I would like to propose a permanent course, “Computer
Vision II,” covering special topics in machine vision such as pattern recognition, machine learning,
video processing, and multiple-view geometry. This course is an integral part of that effort.
General Information
Required textbook:
1. An Invitation to 3-D Vision: From Images to Geometric Models, Yi Ma, S. Soatto, J. Košecká and S.
Sastry, Springer, 2003. (Price: $69.99)
Supplementary books:
2. W. Boothby, An Introduction to Differentiable Manifolds and Riemannian Geometry, Academic Press,
second edition, 1986.
3. O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, 1993.
4. O. Faugeras and Q.-T. Luong, Geometry of Multiple Images, MIT Press, 2001.
5. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2000.
6. B. Horn, Robot Vision, MIT Press, 1986.
7. S. Maybank, Theory of Reconstruction from Image Motion, Springer-Verlag, 1993.
8. R. Murray, Z.-X. Li and S. Sastry, A Mathematical Introduction to Robotic Manipulation, CRC Press
Inc., 1994.
9. J. Weng, N. Ahuja and T. Huang, Motion and Structure From Image Sequences, Springer, 1993.
Recommended readings (samples):
1. A Hierarchical Approach to 2-D and 3-D Motion Segmentation via Polynomial Fitting and Differentiating, Rene Vidal and Yi Ma, in Proceedings of the European Conference on Computer Vision
(ECCV), Prague, 2004.
2. Generalized Principal Component Analysis (GPCA): an analytic solution to segmentation of mixtures
of subspaces, Rene Vidal, Yi Ma and Shankar Sastry, IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), June, 2003.
3. Recognition of Human Gaits, Alessandro Bissacco, Alessandro Chiuso, Yi Ma and Stefano Soatto,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, December, 2001.
More literature will be provided to the students throughout the semester.
Grading policy: Weekly homework (60%) and final project (40%). The final project can be done in
groups of two or three students. The project can be theoretical, experimental or a mixture of both. It consists of a
midterm proposal, a final presentation (in class) and a web-based report.
Tentative Course Syllabus and Schedule
1. Representation of a three-dimensional moving scene (1.5 hours): Rigid-body motion, canonical
exponential coordinates, Rodrigues' formula, Euclidean, affine and projective transformations. (Chapter 2 of Ma et al.)
2. Image formation (1.5 hours): Mathematical model for ideal perspective projection and the pinhole
camera, other geometric projection models. (Chapter 3 of Ma et al.)
3. Image primitives and correspondence (1.5 hours): Photometric features and geometric features,
image correspondences and optical flow, feature selection, matching and tracking. (Chapter 4 of Ma
et al.)
4. Two-view geometry (4.5 hours): Epipolar geometry (discrete and continuous cases), geometric characterization of the essential matrix, and the eight-point, seven-point and six-point algorithms, optimal
estimation and reconstruction from two views. (Chapter 5 of Ma et al.)
5. Camera calibration and self-calibration (4.5 hours): Camera calibration from a rig, uncalibrated
epipolar geometry, the fundamental matrix, camera self-calibration from Kruppa’s equations and from
projective, affine and Euclidean stratification. (Chapter 6 of Ma et al.)
6. Multiple-view geometry of point and line features (3 hours): Two- and three-view geometry review, relation to bifocal and trifocal tensors. Multiple-view constraints in terms of matrix rank,
reconstruction based on pure point and line features. (Chapter 8 of Ma et al.)
7. Batch multiple-view reconstruction (1.5 hours): A universal rank condition for multiple images of
mixtures of points, lines, and planes. (Chapter 9 of Ma et al.)
8. Midterm project proposal (the 7th week): A three-page project proposal from each team.
9. Multiple-view geometry and symmetry (3 hours): Symmetric rank conditions associated with a
single image of a symmetric structure. Reflective, rotational, translational symmetries. Symmetry-based
3-D reconstruction. (Chapter 10 of Ma et al.)
10. Multiple-view geometry of other camera models (3 hours): Multiple-view geometry, rank conditions, and reconstruction techniques associated with orthographic, affine, and catadioptric cameras.
11. Multiple-motion segmentation from two views (3 hours): Multibody epipolar geometry, polynomial factorization, motion segmentation. (Chapter 7 of Ma et al.)
12. Generalized principal component analysis (4.5 hours): Clustering techniques, subspace methods,
and hybrid linear models. Polynomial fitting, differentiation, and division. Recursive and robust
GPCA algorithms. (The CVPR paper with Rene Vidal)
13. Motion/image/video segmentation via GPCA (3 hours): A hierarchical approach to motion segmentation for linear, affine, and projective models. Image and video segmentation and compression
via GPCA. (The ECCV paper with Rene Vidal)
14. Geometry of non-rigid body motions and linked rigid-body motions (3 hours): Rank conditions
for non-rigid body motions; subspace methods for the identification of linked rigid-body motions
(human gait) from images.
15. Selected topics in vision systems and applications (3 hours): Dynamical and active vision, visual
feedback, vision-based control systems. (Chapter 11 of Ma et al.)
16. Final project presentation and report (1.5 hours): A 15-20 minute in-class presentation for each
project. The final report shall be submitted in the form of a project website.