Università del Salento, Faculty of Engineering
Image Processing (Elaborazione delle Immagini), Academic Year 2012/2013
PART II: Two case studies
Dario Cazzato, INO-CNR
dario.cazzato@ino.it

This lesson introduces the use of the OpenCV library on real cases.
Two case studies:
◦ Stereo Correspondence Problem;
◦ Segmentation of Video Sequences.

Why OpenCV?
◦ Free, open source, designed for real-time applications, cross-platform, constantly updated, backed by strong partners and research.
C/C++:
◦ “cv::Mat vs CvMat”.
Basic components:
◦ Matrices, vectors, rectangles, sizes, images, data types…
Some examples.

Humans:
◦ Binocular vision.
◦ Average distance between the eyes: 6 cm.
◦ An object or point seen with both eyes is perceived as one, although there are two images on the retinas.
The combined image is more than the sum of its parts. It is not trivial!

Other configurations:
◦ Animals: predators have binocular sight; prey have lateral eyes (to enlarge the field of view).
◦ Intersecting lines of sight (typical in stereo vision!).

Let’s look at some key concepts from a practical point of view; the full problem is much larger.
◦ You will see more detail about epipolar geometry and matrix computation in class.

The pinhole model is the simplest model of a camera. Only a single ray from any particular point enters through the pinhole aperture, and this point is projected onto the image plane. The focal length is the distance from the pinhole aperture to the image plane.
A real point Q is projected onto the image plane by the ray passing through the center of projection. The intersection of this ray with the image plane gives q.

Calibration Matrix (3×4)
From “Learning OpenCV”, G. Bradski, A. Kaehler, O’Reilly.

Homogeneous coordinate system: if you have N dimensions, use N+1 coordinates.

Homography
Perspective Geometry

One camera is not enough! With two (or more) cameras we can compute depth by triangulation, provided we can find homologous points in the two images.

Epipolar Geometry

Four steps:
1. Undistortion;
2. Rectification;
3. Disparity Map;
4. Triangulation.
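The pinhole projection and the homogeneous coordinates described above can be illustrated with a few lines of Python. This is only a minimal sketch, not OpenCV code: it assumes a single focal length `f` expressed in pixels, a principal point `(cx, cy)` (a name and a parameter introduced here for illustration), and a point Q already given in camera coordinates.

```python
def project_point(Q, f, cx=0.0, cy=0.0):
    """Project a 3D point Q = (X, Y, Z), in camera coordinates, to q = (x, y)."""
    X, Y, Z = Q
    if Z <= 0:
        raise ValueError("the point must lie in front of the camera (Z > 0)")
    # Multiplying Q by the camera matrix [[f, 0, cx], [0, f, cy], [0, 0, 1]]
    # gives homogeneous image coordinates (x*w, y*w, w): three numbers
    # describing a two-dimensional point.
    xw = f * X + cx * Z
    yw = f * Y + cy * Z
    w = Z
    # Dividing by w goes back from homogeneous to ordinary 2D coordinates.
    return (xw / w, yw / w)


# The same lateral offset seen at two different depths (values are made up).
q_near = project_point((0.5, 0.25, 2.0), f=500.0, cx=320.0, cy=240.0)
q_far = project_point((0.5, 0.25, 4.0), f=500.0, cx=320.0, cy=240.0)
```

The farther point lands closer to the principal point. A single image cannot tell a small near object from a large far one, which is why a second camera is needed to recover depth.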
Undistortion (step 1): removal of tangential and radial lens distortion. This problem concerns each camera individually!
Distortion vector (1×5)

Rectification (step 2): output row-aligned images (coplanar image planes, with corresponding points at the same y-coordinate). With rectified images, we can search for a point of one image along the same row (y-coordinate) of the other one!
Of course, a stereo calibration is needed (intrinsic and extrinsic parameters):
◦ Intrinsic: focal length, distortion.
◦ Extrinsic: rotation R and translation T that align the two cameras (Essential Matrix E; you will see more in class).
We can divide the procedure into:
◦ Stereo Calibration: computation of the geometric relation between the two cameras in space;
◦ Stereo Rectification: “correction” of the individual images so that they appear to have been taken with row-aligned image planes and parallel optical axes.
Look at the OpenCV example and its source, stereo_calib.cpp.
Rectification brings the cameras into standard form!
Example 1
From “Learning OpenCV”, G. Bradski, A. Kaehler, O’Reilly.

Disparity map (step 3): disparity is the difference in x-coordinates of the same point viewed in the two cameras. A map is created by computing the disparity for all the points. It is encoded as a grayscale image, where farther points are darker.

Triangulation (step 4): depth is recovered from disparity by similar triangles.
Idea: d : T = f : Z, hence Z = f·T / d (d = disparity, T = distance between the two cameras, f = focal length, Z = depth).

How to find homologous points?
◦ Correlation-based: checking whether a location in one image looks like a location in the other image;
◦ Feature-based: finding features in the images and checking whether the layout of a subset of features is similar in the two images.
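The correlation-based search and the triangulation step can be sketched together in plain Python. This is a hedged illustration rather than the OpenCV implementation: it assumes already rectified grayscale images stored as lists of rows, uses zero-mean normalized cross-correlation as the similarity score, and all function names (`window`, `zncc`, `match_row`, `depth_from_disparity`) and numeric values are chosen here for the example.

```python
import math


def window(img, x, y, r):
    """Return the (2r+1)x(2r+1) patch centered at (x, y) as a flat list."""
    return [img[j][i] for j in range(y - r, y + r + 1)
            for i in range(x - r, x + r + 1)]


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length patches."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:  # uniform patch: the score is undefined, treat as no match
        return 0.0
    return sum(p * q for p, q in zip(da, db)) / denom


def match_row(left, right, x, y, r, max_disp):
    """Winner-take-all search along the same row of the right image."""
    ref = window(left, x, y, r)
    best_d, best_score = None, -2.0
    for d in range(max_disp + 1):
        xr = x - d  # the same point appears shifted to the left in the right image
        if xr - r < 0:
            break
        score = zncc(ref, window(right, xr, y, r))
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


def depth_from_disparity(d, f, T):
    """Triangulation for rectified cameras: Z = f * T / d."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return f * T / d


# Synthetic rectified pair: the right image is the left one shifted by 3 pixels.
W, H, SHIFT = 16, 5, 3
left = [[i * i + 10 * j for i in range(W)] for j in range(H)]
right = [[(i + SHIFT) ** 2 + 10 * j for i in range(W)] for j in range(H)]

d, score = match_row(left, right, x=8, y=2, r=1, max_disp=6)
Z = depth_from_disparity(d, f=500.0, T=0.06)  # f in pixels, T in meters (assumed)
```

On this synthetic pair the search recovers the 3-pixel shift with a score of about 1. Since Z is inversely proportional to d, a one-pixel matching error matters much more for far points (small d) than for near ones.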
Difficulties in finding correspondences:
◦ Occlusions
◦ Photometric transformations
◦ Uniform regions
◦ Noise
◦ Specular surfaces
◦ Perspective views
◦ Repetition
“No results possible” detection:
◦ Sometimes we would just like to say: “There is no corresponding point in the other image for this point”…

Local Algorithm
Winner-take-all strategy
Matching measures:
◦ Sum of Absolute Differences (SAD);
◦ Sum of Squared Differences (SSD);
◦ Zero-mean Normalized Cross-Correlation (ZNCC).

ZNCC:
◦ Not just a window comparison, but also a fast normalization;
◦ ZNCC has range [-1, 1];
◦ We compute the ZNCC for a window centered on each candidate pixel;
◦ We take the maximum value.

What can we do to enhance the model?
◦ Ratio between the first and second maximum. Idea: if the global maximum and another local maximum have similar values, the probability of error increases, so we can reject these matches (by setting a threshold).
◦ Spread. Idea: a flat area means repetitive texture. Just discard maxima on flat peaks.
◦ Multiple windows;
◦ Kernel shape based on segmentation.
Computation time increases!

Two enhancements:
1. Check the epipolar line ± size:
◦ We can cope with noise in the epipolar geometry;
◦ For fast computation, keep size small (1, 2, 3)!
2. Inverse function:
◦ We take the maximum, and we run ZNCC again starting from the right image;
◦ If the new winner is not the starting point (or is farther from it than a threshold), an error occurred, so discard the point.

Demo 1

Background subtraction (BS): one of the Video Sequence Segmentation algorithms.
Good with a fixed camera and a static background.
High-level goal:
◦ People detection.
◦ Surveillance: reactive; proactive.
BS: subtract the current frame from the background model.
Two phases:
◦ Background training;
◦ Foreground detection.

Codebook: improvements to the base version.
◦ A codebook is built for every pixel;
◦ A codebook is composed of codewords: boxes that grow to cover the common values seen over time;
◦ The samples of each pixel are clustered into a set of codewords.
Incoming pixel:
◦ If its brightness is within the brightness range AND its color distortion is less than a threshold: BACKGROUND;
◦ Otherwise: FOREGROUND.
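The background test for an incoming pixel can be sketched as follows. This is a simplified illustration and not the exact formulation of the cited papers: each codeword is assumed to store a mean color and a brightness interval (`mean`, `low`, `high` are hypothetical field names), color distortion is taken as the distance of the pixel from the line through the origin and the codeword color, and the threshold `eps` and all numbers are made up.

```python
import math


def brightness(pixel):
    """Brightness as the Euclidean norm of the (R, G, B) vector."""
    return math.sqrt(sum(p * p for p in pixel))


def color_distortion(pixel, mean):
    """Distance of the pixel from the line through the origin and the codeword color."""
    dot = sum(p * m for p, m in zip(pixel, mean))
    norm_p2 = sum(p * p for p in pixel)
    norm_m2 = sum(m * m for m in mean)
    if norm_m2 == 0:
        return math.sqrt(norm_p2)
    proj2 = dot * dot / norm_m2  # squared length of the projection on the codeword
    return math.sqrt(max(norm_p2 - proj2, 0.0))


def is_background(pixel, codebook, eps):
    """True if at least one codeword accepts the pixel."""
    for cw in codebook:
        in_range = cw["low"] <= brightness(pixel) <= cw["high"]
        if in_range and color_distortion(pixel, cw["mean"]) < eps:
            return True
    return False


# Codebook of one pixel that has always shown a gray wall (illustrative values).
codebook = [{"mean": (100.0, 100.0, 100.0), "low": 140.0, "high": 210.0}]

shadowed_wall = is_background((90, 90, 90), codebook, eps=10.0)  # same color, darker
red_shirt = is_background((200, 30, 30), codebook, eps=10.0)     # different color
overexposed = is_background((250, 250, 250), codebook, eps=10.0) # too bright
```

Separating brightness from color distortion is what lets a slightly darker or brighter version of the same surface stay in the background, while a pixel of similar brightness but different color is marked as foreground.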
MNRL (Maximum Negative Run-Length): lets us learn the background even while objects are moving in the scene. It refines the codebook by separating the codewords that may contain foreground from the true background. MNRL = 50%.
The foreground is simply detected by computing the distance of the sample from the nearest cluster mean.

Left object in the scene
Holes problem
Layered Modeling/Detection: 3 classes of codebook and 3 parameters that control switching between the categories:
◦ Permanent;
◦ Non-permanent;
◦ Training.
Adaptive Codebook Updating:
◦ Retraining is not the solution!
◦ Global status updating at each frame;
◦ Periodic cleaning of the old codebook.

Median filter
Morphological Operators: Opening and Closing

Why object detection?
Not all the white pixels are of real interest (noise, holes not yet updated…). An object detection and labeling algorithm is required.
A: when an external contour point A is encountered for the first time, a complete trace of the contour is made. The procedure stops when A is reached again. All these points receive the same label as A;
B: when A' (an external contour point already labeled) is encountered, the line is scanned, marking all the points encountered with the same label;
C: when an internal contour point B is encountered for the first time, it takes the same label. Then this contour is traced, again giving the same label to all the points met;
D: when an already labeled point such as B' is found, the line is scanned, marking the detected points with the same label.

We go through all the blobs and continue processing only those whose area and ratio fall within a range:
◦ [min Area, max Area], in pixels;
◦ [min Ratio, max Ratio].
Choice of the ranges:
◦ Average height (from 1.60 m to 2 m);
◦ Average width of the box (from 10 cm to 60 cm);
◦ Distance from the camera (from 1 m to 5 m).
A first, necessary loss of generality.
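The blob filtering described above can be sketched as a simple predicate. This is only an illustration: the slides do not say how the ratio is defined, so it is assumed here to be height over width of the bounding box, and the blob fields and all thresholds are hypothetical values rather than the ones used in the demo.

```python
def keep_blob(blob, min_area, max_area, min_ratio, max_ratio):
    """Keep a blob only if its area and bounding-box ratio fall within the ranges."""
    if blob["width"] <= 0:
        return False
    ratio = blob["height"] / blob["width"]  # assumption: ratio = height / width
    return (min_area <= blob["area"] <= max_area
            and min_ratio <= ratio <= max_ratio)


# Labeled blobs with area and bounding-box size in pixels (made-up values).
blobs = [
    {"name": "person", "area": 5200, "width": 50, "height": 150},
    {"name": "noise speck", "area": 12, "width": 4, "height": 3},
    {"name": "car", "area": 9000, "width": 200, "height": 70},
]

kept = [b["name"] for b in blobs
        if keep_blob(b, min_area=1000, max_area=20000, min_ratio=1.5, max_ratio=6.0)]
```

In practice the pixel ranges would be derived from the expected height and width of a person and from the camera distance, which is where the loss of generality comes from: the same thresholds will not suit a different scene or camera placement.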
Demo 2

Technical report (Camera Calibration):
◦ Zhengyou Zhang, A Flexible New Technique for Camera Calibration, 1998.
Papers (Stereo Vision):
◦ Chia-Hung Chen, Han-Pang Huang, Sheng-Yen Lo, Stereo-Based 3D Localization for Grasping Known Objects with a Robotic Arm System, Department of Mechanical Engineering, National Taiwan University, 10647 Taipei, Taiwan.
Papers (Codebook):
◦ Kim, Chalidabhongse, Harwood, Davis, Real-Time foreground-background segmentation using Codebook model, Computer Vision Lab, Department of Computer Science, University of Maryland, College Park, MD 20742, USA; Faculty of Information Technology, King Mongkut’s Institute of Technology, Ladkrabang, Bangkok 10520, Thailand, 2005.
◦ P. Fihl, R. Corlin, S. Park, T.B. Moeslund, M.M. Trivedi, Tracking of Individuals in Very Long Video Sequence, Laboratory of Computer Vision and Media Technology, Aalborg University, Denmark; Computer Vision and Robotics Research Laboratory, The University of California, San Diego, USA, 2006.
Stereo images and disparities (ground truth):
◦ http://vision.middlebury.edu/stereo
Camera Calibration and 3D Reconstruction with OpenCV:
◦ http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
Motion Analysis with OpenCV:
◦ http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html (MOG)
Books:
◦ Codebook: “Learning OpenCV” (O’Reilly), Chapter 9.
◦ Stereo Vision: “Learning OpenCV” (O’Reilly), Chapter 12.
Contact:
◦ dario.cazzato@ino.it