COMPUTER RECOGNITION SYSTEM FOR DETECTING AND TRACKING OBJECTS IN 3D ENVIRONMENT

Sravan Kumar Reddy Mothe
B.Tech., Jawaharlal Nehru Technological University, India, 2007

PROJECT

Submitted in partial satisfaction of the requirements for the degree of

MASTER OF SCIENCE

in

MECHANICAL ENGINEERING

at

CALIFORNIA STATE UNIVERSITY, SACRAMENTO

SPRING 2011

COMPUTER RECOGNITION SYSTEM FOR DETECTING AND TRACKING OBJECTS IN 3D ENVIRONMENT

A Project

by

Sravan Kumar Reddy Mothe

Approved by:

________________________________, Committee Chair
Yong S. Suh, Ph.D.

_________________________
Date

Student: Sravan Kumar Reddy Mothe

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and credit is to be awarded for the Project.

________________________, Graduate Coordinator          _____________________
Kenneth Sprott, Ph.D.                                    Date
Department of Mechanical Engineering

Abstract

of

COMPUTER RECOGNITION SYSTEM FOR DETECTING AND TRACKING OBJECTS IN 3D ENVIRONMENT

by

Sravan Kumar Reddy Mothe

In recent times, computer recognition has become a very powerful tool for many robotic applications. Applications such as inspection, surveillance, industrial automation, and gaming need 3D positional data in order to interact with the external environment, and this data can be obtained through computer recognition. Computer recognition can be implemented with many different tools, such as OpenCV, MATLAB, and OpenGL. OpenCV is an optimized library with over 500 useful functions for detection, tracking, image transformation, 3D vision, and more. The scope of this project is to obtain the 3D position of an object from two sensors. The sensors are two cameras, which need to be calibrated before they can observe the 3D object. Calibration is the process in which the output images from the two cameras are vertically aligned, so that all corresponding pixel points line up. After calibration, the two images from camera 1 and camera 2 are input to the OpenCV 3D functions. This application is primarily designed for a Ping Pong game shooter. The coding part of this project consists of C++ code, written with the OpenCV libraries, for calibrating the cameras and for recognizing and tracking the 3D object. The output of the code is the 3D position of the player's bat in the camera coordinate system. This 3D positional data can be fed to the shooter so that the shooter's joints can move automatically and deliver the ball exactly to the player's bat for training purposes. This 3D vision technology can also be used in many other applications such as industrial robots, unmanned vehicles, intelligent surveillance, medical devices, and gaming.

_______________________________, Committee Chair
Yong S. Suh, Ph.D.

______________________
Date

ACKNOWLEDGMENTS

While working on this project, several people helped me reach where I am today, and I would like to thank them all for their support and patience. First, I would like to thank Professor Dr. Yong S. Suh for giving me the opportunity to do this project. His continuous support helped me develop an immense interest in the project and carry it through to completion. Dr. Yong S. Suh provided many of the sources of information I needed from the beginning of the project until the end. He was always there to meet, talk, and answer the questions that came up during the project.
Special thanks to my advisor Dr Kenneth Sprott for helping me to complete the writing of this dissertation, without his encouragement and constant guidance I could not have finished this report. Finally, I would also like to thank all my family, friends and Mechanical engineering department who helped me to complete this project work successfully. Without any of the above-mentioned people the project would not have come out the way it did. Thank you all. vi TABLE OF CONTENTS Page Acknowledgments vi List of Figures ix Software Specifications x Chapter 1. INTRODUCTION AND BACKGROUND 1 1.1 Introduction to computer vision 2 1.2 Applications of computer vision 2 1.3 Tools available for computer vision 8 1.3.1 OpenCV (Open Source Computer Vision) 8 1.3.2 VXL (Vision-something-Libraries) 9 1.3.3 BLEPO 10 1.3.4 MinGPU (A minimum GPU library for computer vision) 10 1.4 Description of the project 2. EXPERIMENTAL SETUP AND OPENCV LIBRARY LOADING PROCEDURES 2.1 11 13 Steps to calibrate stereo cameras and obtain 3D data 13 2.1.1 Step1 ( Loading OpenCV library) 14 2.1.2 Step2 ( Preparing chessboard and seting up cameras) 17 2.1.3 Step3 (C++ program to capture calibration data) vii 19 2.1.4 Step4 (Code for calibration and for 3D object tracking) 3. IMPORTANT OPENCV FUNCTIONS AND CODE FOR 3D VISION 3.1 Important functions used for this project 25 26 26 3.1.1 cvFindChessboardCorners 26 3.1.2 cvDrawChessboardCorners 28 3.1.3 cvStereoCalibrate 28 3.1.4 cvComputeCorrespondEpilines 32 3.1.5 cvInitUndistortRectifyMap 33 3.1.6 cvStereoBMState 36 3.1.7 cvReprojectImageTo3D 39 3.2 Pseudo code for stereo calibration and 3D vision 4. RESULTS OF CALIBRATION AND 3D VISION 40 73 4.1 Project Application Information 73 4.2 Coordinates of a colored object in front of cameras 75 4.3 76 Results are graphically shown 5. CONCLUSION 81 6. FUTURE WORK 83 Bibliography 84 viii LIST OF FIGURES Page 1. Figure 1.1: Interaction of various fields of study defining interests in computer vision 2 2. Figure 2.1: Setting up OpenCV path for Environmental Variables 14 3. Figure 2.2: Creating of new Win32 Console Application 16 4. Figure 2.3: Loading libraries to Visual Studio 2010 16 5. Figure 2.4: Loading Additional Dependencies to the project 17 6. Figure 2.5: Chessboard used for calibration of stereo cameras 18 7. Figure 2.6: Two USB cameras are fixed to a solid board in front of 3D object 19 8. Figure 2.7: Calibration data from camera 1 and camera 2 24 9. Figure 2.8: Text file from that calibration code reads from 25 10. Figure 4.1: Camera coordinate system 75 11. Figure 4.2: Detected corners on the image taken from left camera 76 12. Figure 4.3: Detected corners on the image taken from right camera 77 13. Figure 4.4: Rectified image pair 78 14. Figure 4.5: Displays average error to its sub-pixel accuracy 78 15. Figure 4.6: Coordinates of an object with respective to left camera 79 16. Figure 4.7: Disparity between left and right image 80 ix SOFTWARE SPECIFICATIONS 1. The initial requirement to run the program is to have C++ compiler, preferable compiler is Visual Studio 2010. 2. Download OpenCV libraries and load them to Visual Studio. 3. Create new project in Visual Studio by opening the folder that is on the disc and open the source file and run it. Note: Two USB cameras should be connected before running the source file. 4. OpenCV loading procedures are clearly illustrated on the report. 5. Operating System: Windows 7 or Windows Vista (preferred). 6. System requirements: 4 GB RAM and 2.53 GHz processor speed (preferred). 
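Since the project source assumes that both USB cameras are already connected (see the note in item 3 above), a quick check can be run first. The short program below is only an assumed example, not part of the project code; it simply verifies that OpenCV can open camera indices 0 and 1, the same indices used by the capture code in Chapter 2.

#include "cv.h"
#include "highgui.h"
#include <stdio.h>

int main()
{
    // The project expects the two USB cameras at indices 0 and 1.
    CvCapture* cam1 = cvCaptureFromCAM(0);
    CvCapture* cam2 = cvCaptureFromCAM(1);

    if (!cam1 || !cam2) {
        fprintf(stderr, "Could not open both cameras (index 0 and 1). "
                        "Connect both USB cameras before running the source file.\n");
    } else {
        printf("Both cameras were opened successfully.\n");
    }

    if (cam1) cvReleaseCapture(&cam1);
    if (cam2) cvReleaseCapture(&cam2);
    return 0;
}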
Chapter 1
INTRODUCTION AND BACKGROUND

1.1) Introduction to computer vision:
Vision is our most powerful sense. It provides us with a remarkable amount of information about our surroundings and enables us to interact intelligently with the environment. Through it we learn the positions and identities of objects and the relations between them. Vision is also our most complicated sense. The knowledge that we have accumulated about how our biological vision system operates is still fragmentary and confined mostly to the processing stages directly concerned with signals from the sensors. Today one can find vision systems that successfully deal with a variable environment as parts of machines.

Computer vision (image understanding) is a technology that studies how to reconstruct and understand a 3D scene from its 2D images in terms of the properties of the structures present in the scene. Computer vision is concerned with modeling and replicating human vision using computer software and hardware. It combines knowledge from computer science, electrical engineering, mathematics, physiology, biology, and cognitive science, and it needs knowledge from all these fields in order to understand and simulate the operation of the human vision system. As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data.

Figure 1.1: Interaction of various fields of study defining interests in computer vision (biological studies, computer science and engineering, artificial intelligence/cognitive studies, electronics engineering, mechanical engineering, and robotics).

1.2) Applications of computer vision:
Much of artificial intelligence deals with autonomous planning or deliberation for robot systems to navigate through an environment. A detailed understanding of these environments is required to navigate through them. Information about the environment could be provided by a computer vision system, acting as a vision sensor and providing high-level information about the environment and the robot. Potential application areas for vision-driven automated systems are many. Each brings its own particular problems, which must be resolved by system designers if successful operation is to be achieved, but, generally speaking, applications can be categorized according to the processing requirements they impose. To illustrate, a number of such areas of application are briefly described below. Examples are categorized under their principal application area. [Ref: 8]

Three-dimensional modeling:
1. Creates 3D models from a set of images. Objects are imaged on a calibration setup.
2. PhotoModeler software allows creation of texture-mapped 3D models from a small number of photographs. Uses some manual user input.
3. Uses projected light to create a full 3D textured model of the human face or body in sub-second times.

Traffic and road management:
1. Created the Autoscope system that uses roadside video cameras for real-time traffic management. Over 100,000 cameras are in use.
2. Imaging and scanning solutions for road network surveying.

Web Applications:
1. Image retrieval based on face recognition.
2. Develops a system for image search on the web. Uses GPUs for increased performance.
3. Image retrieval based on content.
4. Virtual makeover website; TAAZ.com uses computer vision methods to allow users to try on makeup, hair styles, sunglasses, and jewelry.
Security and Biometrics: 1. Systems for intelligent video surveillance. 2. Systems for biometric face recognition. 3. Fingerprint recognition systems with a novel sensor. 4. Systems for behavior recognition in real-time video surveillance. 5. Fingerprint recognition systems. 6. Smart video surveillance systems. 7. Security systems using novel sensors, such as registered visible and thermal infrared images and use of polarized lighting. 8. Security systems for license plate recognition, surveillance, and access control. 9. Image processing and computer vision for image forensics. 10. Automated monitoring systems, including face and object recognition. 11. Detection and identification of computer users. 12. Detection and monitoring of people in video streams. 13. Face verification and other biometrics for passport control. 5 People tracking: 1. Tracking people within stores for sales, marketing, and security. 2. Systems for counting and tracking pedestrians using overhead cameras. 3. Tracking people in stores to improve marketing and service. Object Recognition for Mobile Devices: 1. Visual search for smart phones, photo management, and other applications. 2. Image recognition and product search for camera phones. Industrial automation and inspection: 1. Industrial robots with vision for part placement and inspection. 2. Vision systems for the plastics industry. 3. Inspection systems for optical media, sealants, displays, and other industries. 4. Develops 3D scanners for sawmills and other applications. 5. Vision systems for industrial inspection tasks, including food processing, glassware, medical devices, and the steel industry. 6. Develops 3D vision systems using laser sensors for inspection of wood products, roads, automotive manufacturing, and other areas. 7. Industrial mobile robots that use vision for mapping and navigation. 8. Trainable computer vision systems for inspection and automation. 9. Laser-based inspection and templating systems. 6 10. Vision systems for surface inspection and sports vision applications. 11. Systems to inspect output from high-speed printing presses. 12. Vision systems for textile inspection and other applications. 13. Systems for inspection and process control in semiconductor manufacturing. 14. Automated inspection systems for printed circuit boards and flat panel displays. 15. Creates 3D laser scanning systems for automotive and other applications. 16. Has developed a system for accurate scanning of 3D objects for the automotive and other industries. The system uses a 4-camera head with projection of textured illumination to enable accurate stereo matching. Games and Gesture Recognition: 1. Time-of-flight range sensors and software for gesture recognition. Acquired by Microsoft in 2010. 2. Tracks human gestures for playing games or interacting with computers. 3. Real-time projected infrared depth sensor and software for gesture recognition. Developed the sensing system in Microsoft's Xbox Kinect. 4. Interactive advertising for projected displays that tracks human gestures. 5. Uses computer vision to track the hand and body motions of players to control the Sony Play station. 7 Film and Video: Sports analysis: 1. Uses multiple cameras to provide precise tracking in table tennis, cricket, and other sports for refereeing and commentary. 2. Creates photorealistic 3D visualization of sporting events for sports broadcasting and analysis. 3. Systems for tracking sports action to provide enhanced broadcasts. 4. 
Develops Piero system for sports analysis and augmentation. 5. Systems for tracking sports players and the ball in real time, using some human assistance. (My project can be used for this application.) 6. Vision systems to provide real-time graphics augmentation for sports broadcasts. 7. Provides 3D tracking of points on the human face or other surfaces for character animation. Uses invisible phosphorescent makeup to provide a random texture for stereo matching. 8. Systems for creating virtual television sets, sports analysis, and other applications of real-time augmented reality. 9. Video content management and delivery, including object identification and tracking. 10. Systems for tracking objects in video or film and solving for 3D motion to allow for precise augmentation with 3D computer graphics. 8 1.3) Tools available for computer vision: 1.3.1) OpenCV (Open Source Computer Vision): OpenCV is a library of programming functions for real time computer vision applications. The library is written in C and C++ and runs under different platforms namely Linux, Windows and Mac OS X. OpenCV was designed for strong focus on realtime applications. Further automatic optimization on Intel architectures can be achieved by Intel’s Integrated Performance Primitives (IPP), which consists of low-level optimized routines in many different algorithmic areas. One of OpenCV’s goals is to provide a flexible computer vision infrastructure that helps us build fairly sophisticated vision applications quickly. The OpenCV Library has over 500 functions that can be used for many areas in vision, including factory product inspection, medical imaging, surveillance, user interface, camera calibration, stereo vision, and robotics. OpenCV core main libraries: 1. “CVAUX” for Experimental/Beta. 2. “CXCORE” for Linear Algebra and Raw matrix support, etc. 3. “HIGHGUI” for Media/Window Handling and Read/write AVIs, window displays, etc. 9 OpenCV’s latest version is available from http://SourceForge.net/projects/opencvlibrary. We can be able to download openCV library and build it into Visual Studio 2010 and steps to build library will be illustrated later in this report. 1.3.2) VXL (Vision-something-Libraries): VXL is a collection of C++ libraries designed for computer vision research and implementation. It was created from TargetJr with the aim of making a light, fast and consistent system. VXL is written in ANSI/ISO C++ and is designed to be portable over many platforms. Core libraries in VXL are: 1. VNL (Numeric): Numerical containers and algorithms like matrices, vectors, decompositions, optimizers. 2. VIL (Imaging): Loading, saving and manipulating images in many common file formats, including very large images. 3. VGL (Geometry): Geometry for points, curves and other elementary objects in 1, 2 or 3 dimensions. 4. VSL (Streaming I/O), VBL (Basic templates), VUL (Utilities): Miscellaneous platform-independent functionality. 10 As well as the core libraries, there are libraries covering numerical algorithms, image processing, co-ordinate systems, camera geometry, stereo, video manipulation, and structure recovery from motion, probability modeling, GUI design, classification, robust estimation, feature tracking, topology, structure manipulation, 3D imaging, etc. Each core library is lightweight, and can be used without reference to the other core libraries. 1.3.3) BLEPO: Blepo is an open-source C/C++ library to facilitate computer vision research and education. 
Blepo is designed to be easy to use, efficient, and extensive:
1. It enables researchers to focus on algorithm development rather than low-level details such as memory management, reading/writing files, capturing images, and visualization, without sacrificing efficiency;
2. It enables educators and students to learn image manipulation in a C++ environment that is easy to use; and
3. It captures a repository of the more mature, well-established algorithms so that others, both within and outside the community, can use them without having to reinvent the wheel.

1.3.4) MinGPU (A minimum GPU library for computer vision):
In computer vision it is becoming popular to implement algorithms in whole or in part on a Graphics Processing Unit (GPU), due to the superior speed GPUs can offer compared to CPUs. Two well-known computer vision algorithms have been implemented on the GPU – Lucas-Kanade optical flow and optimized normalized cross-correlation – as well as the homography transformation between two 3D views. MinGPU is a library which contains, in as minimal a form as possible, all of the functions necessary to convert existing CPU code to the GPU. MinGPU provides simple interfaces which can be used to load a 2D array into the GPU and perform operations on it. All GPU- and OpenGL-related code is encapsulated in the library; therefore users of this library need not know any details of how the GPU works. Because GPU programming is currently not that simple for anyone outside the computer graphics community, this library can serve as an introduction to the GPU world for researchers who have never used a GPU before. The library works with both the nVidia and ATI families of graphics cards and is configurable.

1.4) Description of the Project:
The goal of the project is to design a sensor that detects and tracks objects in a 3D environment; in this project the sensor is designed specifically for a Ping Pong game shooter. The sensors used for this project are stereo cameras, and these cameras let the shooter know where the player's bat is. This project is basically about writing code, using the OpenCV library, that allows the cameras to see the player's bat. The output of the code is the exact 3D location of the bat (X, Y and Z coordinates) with respect to the sensor.

The output from the stereo cameras is just a video stream, and this video stream is further analyzed by a computer program written in C++. The program consists of different functions, namely Calibration, Rectification, Disparity, Background Subtraction and Reprojection3D. Calibration removes the distortion in the video streams, which is usually caused by improper alignment and the quality of the lenses. The Rectification function rectifies both images so that all pixels from the two images are vertically aligned. The Disparity function calculates the differences in x-coordinates on the image planes of the same feature viewed by the left and right cameras. The Background Subtraction function removes the background and makes the program see only the colored object (the bat) for tracking. Finally, the Reprojection3D function takes the disparity as input and outputs the coordinates of the object with respect to the sensor. These coordinates can be fed to the shooter so that the shooter knows the bat position and can implement advanced training modes for the player to practice the game. A minimal sketch of this processing pipeline, using the OpenCV functions described in Chapter 3, is shown below.
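The sketch below is not the project's source code (the full source is on the compact disc attached to this report); it only shows, for a single pair of frames, the order of the main OpenCV calls that are documented in Chapter 3. It assumes that calibration has already been carried out, so that the rectification maps mx1, my1, mx2, my2, the reprojection matrix Q and a CvStereoBMState structure are already available. The helper name processFramePair is used only for illustration.

#include "cv.h"
#include "highgui.h"

// One pass of the pipeline: rectify a stereo frame pair, compute the disparity
// and reproject it to 3D (the color-based background subtraction of the bat is omitted here).
void processFramePair(CvCapture* cap1, CvCapture* cap2,
                      CvMat* mx1, CvMat* my1, CvMat* mx2, CvMat* my2,
                      CvMat* Q, CvStereoBMState* bmState)
{
    IplImage* left  = cvQueryFrame(cap1);   // raw frame from camera 1
    IplImage* right = cvQueryFrame(cap2);   // raw frame from camera 2

    IplImage* leftGray  = cvCreateImage(cvGetSize(left),  8, 1);
    IplImage* rightGray = cvCreateImage(cvGetSize(right), 8, 1);
    cvCvtColor(left,  leftGray,  CV_BGR2GRAY);
    cvCvtColor(right, rightGray, CV_BGR2GRAY);

    // Rectification: undistort and row-align both views using the precomputed maps.
    IplImage* leftRect  = cvCreateImage(cvGetSize(left),  8, 1);
    IplImage* rightRect = cvCreateImage(cvGetSize(right), 8, 1);
    cvRemap(leftGray,  leftRect,  mx1, my1);
    cvRemap(rightGray, rightRect, mx2, my2);

    // Disparity: block-matching correspondence between the rectified pair.
    IplImage* disparity = cvCreateImage(cvGetSize(left), IPL_DEPTH_16S, 1);
    cvFindStereoCorrespondenceBM(leftRect, rightRect, disparity, bmState);

    // Reprojection: disparity map -> (X, Y, Z) in the left camera coordinate system.
    IplImage* xyz = cvCreateImage(cvGetSize(left), IPL_DEPTH_32F, 3);
    cvReprojectImageTo3D(disparity, xyz, Q);

    // ...background subtraction / color thresholding would then select the bat
    // pixels and read their (X, Y, Z) values out of 'xyz'...

    cvReleaseImage(&leftGray);  cvReleaseImage(&rightGray);
    cvReleaseImage(&leftRect);  cvReleaseImage(&rightRect);
    cvReleaseImage(&disparity); cvReleaseImage(&xyz);
}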
Chapter 2
EXPERIMENTAL SETUP AND OPENCV LIBRARY LOADING PROCEDURES

2.1) Steps to calibrate stereo cameras and obtain 3D data:
1. Load the OpenCV libraries into Visual Studio 2010 and check with the sample program whether they are working. If step 1 is okay, move to step 2.
2. Make a good-sized chessboard and tape it to a solid piece of wood or plastic (make sure the pattern is not bent or calibration will not work). Focus the cameras so that the text on the chessboard is readable, and do not modify the focus or the distance between the cameras during or after calibration, since these two are important factors in calibration. If the position or focus of the cameras is modified, we need to calibrate once again.
3. Compile a C++ program that captures chessboard pictures from the left and right cameras at the same time. The code takes 16 sets of calibration pictures. The waiting time between sets of pictures is 10 seconds, so that we can move the chessboard into a different position in front of the stereo cameras. All constant values are preprogrammed in the code and can be adjusted according to the requirements.
4. After getting the calibration data from step 3, we can execute the stereo calibration code, which outputs the 3D data of a colored object. It takes the calibration data from step 3 as input. The program calibrates the stereo cameras and locates the 3D object; X, Y and Z are the output values of the colored object with respect to the stereo cameras. Repeat step 3 until we get good results. The average error reported by the code should be as small as possible, usually less than 0.3; the smaller the error, the better the output (3D data).

Each step is clearly illustrated below.

2.1.1) Step 1 (Loading OpenCV libraries):
1. Download the OpenCV source files from http://sourceforge.net/projects/opencvlibrary/. The downloaded file should contain folders named Bin, Src, Include and Lib. The downloaded OpenCV2.1.0 folder should be placed in the Program Files folder of our computer.
2. Go to Start menu -> Computer and right-click on it -> click on Properties -> click on Advanced System Settings -> click on Environment Variables -> add the new OpenCV path in the User Variables box as shown in Figure 2.1.
Figure 2.1: Setting up OpenCV path for Environment Variables.
3. To complete the installation we need to follow these steps, shown below in figures and text.
Go to Start -> All Programs -> Microsoft Visual Studio 2010 (Express) -> Microsoft Visual (Studio / C++ 2010 Express).
File -> New -> Project Name: 'OpenCV_Helloworld', select 'Win32 Console Application', click 'OK' and click 'Finish'.
Go to Project -> OpenCV_Helloworld Properties... -> Configuration Properties -> VC++ Directories.
Go to Executable Directories and add: 'C:\Program Files\OpenCV2.1\bin;'
Go to Include Directories and add: 'C:\Program Files\OpenCV2.1\include\opencv;'
Go to Library Directories and add: 'C:\Program Files\OpenCV2.1\lib;'
Go to Source Directories and add the source directories: 'C:\Program Files\OpenCV2.1\src\cv; C:\Program Files\OpenCV2.1\src\cvaux; C:\Program Files\OpenCV2.1\src\cvaux\vs; C:\Program Files\OpenCV2.1\src\cxcore; C:\Program Files\OpenCV2.1\src\highgui; C:\Program Files\OpenCV2.1\src\ml; C:\Program Files\OpenCV2.1\src1;'
Go to Linker -> Input -> Additional Dependencies and add: 'cvaux210d.lib;cv210d.lib;cxcore210d.lib;highgui210d.lib;ml210d.lib;'
Figure 2.2: Creating a new Win32 Console Application.
Figure 2.3: Loading libraries into Visual Studio 2010.
Figure 2.4: Loading Additional Dependencies to the project.
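Before moving on to the stereo code, a short "hello world" style test can confirm that the include, library and linker settings above are correct. The listing below is only an assumed example of such a test and is not part of the project source; it loads an image from disk (the file name test.jpg is a placeholder) and displays it in a HighGUI window.

#include "cv.h"
#include "highgui.h"
#include <stdio.h>

int main()
{
    // "test.jpg" is only a placeholder; any image file on disk will do.
    IplImage* img = cvLoadImage("test.jpg", CV_LOAD_IMAGE_COLOR);
    if (!img) {
        fprintf(stderr, "Could not load the test image - check the file path.\n");
        return -1;
    }
    cvNamedWindow("OpenCV_Helloworld", CV_WINDOW_AUTOSIZE);
    cvShowImage("OpenCV_Helloworld", img);  // a window showing the image confirms that
    cvWaitKey(0);                           // cv, cxcore and highgui link correctly
    cvReleaseImage(&img);
    cvDestroyWindow("OpenCV_Helloworld");
    return 0;
}

If this program builds and shows the image, the OpenCV installation is ready for the calibration steps that follow.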
2.1.2) Step 2 (Preparing the chessboard and setting up the cameras):
Prepare a chessboard about 90 by 70 centimeters in size for cameras that are 40 centimeters apart. The chessboard should have at least 9 boxes vertically and 6 boxes horizontally, and not fewer than these values. At the same time, the cameras will not be able to recognize the corners if the box size is too small, so for the above-mentioned board size the boxes should be kept large and their number close to this minimum. Figure 2.5 displays the chessboard used for this project.

The cameras are placed approximately 40 centimeters apart. After calibration the cameras should not be moved, because the functions in the code establish the rotational and translational relation between the two cameras, and distance and focus are the major factors in that relation. The larger the distance between the cameras, the larger the distance at which we can track an object from the cameras. The cameras are fixed to a solid board at a constant distance so that we do not have to calibrate every time we need to track or get 3D data of an object. Figure 2.6 shows the camera setup.

Figure 2.5: Chessboard used for calibration of stereo cameras.
Figure 2.6: Two USB cameras are fixed to a solid board in front of the 3D object.

2.1.3) Step 3 (C++ program to capture calibration data):
The program below is used to check whether the OpenCV libraries are working properly and to get the 16 sets of calibration data.

#include "cv.h"
#include "cxmisc.h"
#include "highgui.h"
#include <vector>
#include <string>
#include <algorithm>
#include <stdio.h>
#include <ctype.h>

int main(int argc, char **argv)
{
    // Image pointers and capture handles for the raw and grayscale frames from the two cameras.
    IplImage* frames[2];
    IplImage* framesGray[2];
    CvCapture* captures[2];
    for (int lr = 0; lr < 2; lr++) {
        captures[lr] = 0;
        frames[lr] = 0;
        framesGray[lr] = 0;
    }

    cvWaitKey(1000);                    // wait 1000 milliseconds before capturing chessboard images
    captures[0] = cvCaptureFromCAM(0);  // capture from the first camera
    captures[1] = cvCaptureFromCAM(1);  // capture from the second camera

    const int nPairs = 7;               // number of image pairs saved by this listing;
                                        // increase this value if more calibration sets are needed
    int count = 0;
    while (1)                           // grab frames until the required number of pairs is saved
    {
        frames[0] = cvQueryFrame(captures[0]);   // get a frame from each camera
        frames[1] = cvQueryFrame(captures[1]);
        if (!frames[0] || !frames[1])            // stop if either camera fails to deliver a frame
            break;
        cvShowImage("RawImage1", frames[0]);     // show the raw images
        cvShowImage("RawImage2", frames[1]);

        framesGray[0] = cvCreateImage(cvGetSize(frames[0]), 8, 1);  // images for grayscale conversion
        framesGray[1] = cvCreateImage(cvGetSize(frames[1]), 8, 1);
        cvCvtColor(frames[0], framesGray[0], CV_BGR2GRAY);          // convert BGR to grayscale
        cvCvtColor(frames[1], framesGray[1], CV_BGR2GRAY);
        cvShowImage("frame1", framesGray[0]);    // show the converted gray images
        cvShowImage("frame2", framesGray[1]);
        printf("count=%d\n", count);

        // Save the current pair as calibleft<N>.jpg / calibright<N>.jpg and advance the count.
        char leftName[64], rightName[64];
        sprintf(leftName,  "calibleft%d.jpg",  count + 1);
        sprintf(rightName, "calibright%d.jpg", count + 1);
        cvSaveImage(leftName,  framesGray[0]);   // save the gray images to disk
        cvSaveImage(rightName, framesGray[1]);
        count++;

        cvReleaseImage(&framesGray[0]);          // release the temporary gray images
        cvReleaseImage(&framesGray[1]);

        if (count >= nPairs)                     // all calibration pairs captured; exit the loop
            break;

        cvWaitKey(5000);  // wait 5000 milliseconds before the next pair so the chessboard can be moved
    }

    cvReleaseCapture(&captures[0]);
    cvReleaseCapture(&captures[1]);
    return 0;
}

Figure 2.7: Calibration data from camera 1 and camera 2.

All images are automatically saved into 'C:\SampleProgram\SampleProgram\SampleProgram\'. If the project name is 'SampleProgram' and the C++ file name is also 'SampleProgram', then all image files should be copied to 'C:\SampleProgram\SampleProgram\Debug\StereoData' so that the calibration code can read a text file which holds the addresses of these images. The text file is shown in Figure 2.8.

Figure 2.8: Text file that the calibration code reads from.
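The calibration program in Chapter 3 does not use the saved camera pairs directly; it reads their file names from the text file shown in Figure 2.8. Based on the reading loop described in Section 3.2 (one file name per line, left and right images alternating, and lines beginning with '#' ignored), a file of the following form would be expected. This is only an assumed illustration of the format; the actual file used by the project is the one pictured in Figure 2.8.

# calibration image pairs: left image followed by right image
StereoData\calibleft1.jpg
StereoData\calibright1.jpg
StereoData\calibleft2.jpg
StereoData\calibright2.jpg
StereoData\calibleft3.jpg
StereoData\calibright3.jpg
...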
2.1.4) Step 4 (Code for calibration and for 3D object tracking):
The stereo calibration code and its functions are illustrated clearly in the next section of this report. The outputs and execution are shown in the video named StereoCalibOP.avi. The source code, calibration data (images), and video are on the compact disc attached at the end of the report.

Chapter 3
IMPORTANT OPENCV FUNCTIONS AND CODE FOR 3D VISION

3.1) Important functions used for this project:

3.1.1) cvFindChessboardCorners:
This is an OpenCV built-in function used to find the inner corners of the chessboard. Calibration of the stereo cameras is done by finding the corners of the chessboard to sub-pixel accuracy. Detection of the chessboard corners by the cameras is achieved with the function below.

int cvFindChessboardCorners (
const void* image, //image – Source chessboard view; it must be an 8-bit grayscale or color image//
CvSize patternSize, //patternSize – The number of inner corners per chessboard row and column ( patternSize = cvSize(columns, rows) )//
CvPoint2D32f* corners, //corners – The output array of detected corners//
int* cornerCount=NULL, //cornerCount – The output corner counter. If it is not NULL, it stores the number of corners found//
int flags=CV_CALIB_CB_ADAPTIVE_THRESH ) //flags – Various operation flags, can be 0 or a combination of the following values:
CV_CALIB_CB_ADAPTIVE_THRESH - Use adaptive thresholding to convert the image to black and white, rather than a fixed threshold level (computed from the average image brightness).
CV_CALIB_CB_NORMALIZE_IMAGE - Normalize the image gamma with cvEqualizeHist() before applying fixed or adaptive thresholding.
CV_CALIB_CB_FILTER_QUADS - Use additional criteria (like contour area, perimeter, square-like shape) to filter out false quads that are extracted at the contour retrieval stage.
The function attempts to determine whether the input image is a view of the chessboard pattern and locate the internal chessboard corners.
The function returns a non-zero value if all of the corners have been found and they have been placed in a certain order (row by row, left to right in every row), otherwise, if the function fails to find all the corners or reorder them, it returns 0. For example, a regular chessboard has 9x7 squares has 8x6 internal corners, that is, points, where the black squares touch each other. The coordinates detected are approximate, and to determine their position more accurately, we may use the function cvFindCornerSubPix(). Note: the function requires some white space around the board to make the detection more robust in various environment (otherwise if there is no border and the background is dark, the outer black squares could not be segmented properly and so the square grouping and ordering algorithm will fail).// 28 3.1.2) cvDrawChessboardCorners: The function draws the individual chessboard corners detected as red circles if the board was not found or as colored corners connected with lines if the board was found. cvDrawChessboardCorners( CvArr* image, // image – The destination image; it must be an 8-bit color image// CvSize patternSize, //Same as in function cvFindChessboardCorners()// CvPoint2D32f* corners, //Same as in function cvFindChessboardCorners()// int count, // The number of corners// int patternWasFound, //Indicates whether the complete board was found not or . One may just pass the return value to cvFindChessboardCorners() function.// 3.1.3) cvStereoCalibrate: Stereo calibration is the process of computing the geometrical relationship between the two cameras in space. Stereo calibration depends on finding the rotation matrix R and translation vector T between the two cameras. cvStereoCalibrate( constCvMat* objectPoints, // objectPoints, is an N-by-3 matrix containing the physical coordinates of each of the K points on each of the M images of the 3D object such that N 29 = K × M. When using chessboards as the 3D object, these points are located in the coordinate frame attached to the object (and usually choosing the Z-coordinate of the points on the chessboard plane to be 0), but any known 3D points may be used. // const CvMat* imagePoints1, const CvMat* imagePoints2, \\ imagePoints1 and imagePoints are N-by-2 matrices containing the left and right pixel coordinates (respectively) of all of the object points. If you performed calibration using a chessboard for the two cameras, then imagePoints1 and imagePoints2 are just the respective returned values for the multiple calls to cvFindChessboardCorners() for the left and right camera views. // const CvMat*pointCounts, // Integer 1xM or Mx1 vector (where M is the number of calibration pattern views) containing the number of points in each particular view. Sum of vector elements must match the size of object Points and image Points. CvMat* cameraMatrix1, CvMat* cameraMatrix2, The input/output first and second camera matrix: Where fx and fy are focal lengths of camera, and Cx and Cy are the center of coordinates on the projection screen. CvMat* distCoeffs1, 30 CvMat* distCoeffs2, //The input/output lens distortion coefficients for the first and second camera 5x1 or 1x5 floating-point vectors . 
CvSize imageSize, //Size of the image, used only to initialize the intrinsic camera matrices.//
CvMat* R, //The output rotation matrix between the 1st and the 2nd cameras' coordinate systems.//
CvMat* T, //The output translation vector between the cameras' coordinate systems.//
CvMat* E=0, //The optional output essential matrix.//
CvMat* F=0, //The optional output fundamental matrix.//
CvTermCriteria term_crit = cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 30, 1e-6), //The termination criteria for the iterative optimization algorithm.//
int flags=CV_CALIB_FIX_INTRINSIC ) //Different flags, may be 0 or a combination of the following values:
CV_CALIB_FIX_INTRINSIC - If it is set, the camera matrices as well as the distortion coefficients are fixed, so that only R, T, E and F are estimated.
CV_CALIB_USE_INTRINSIC_GUESS - The flag allows the function to optimize some or all of the intrinsic parameters, depending on the other flags, but the initial values are provided by us.
CV_CALIB_FIX_PRINCIPAL_POINT - The principal points are fixed during the optimization.
CV_CALIB_FIX_FOCAL_LENGTH - The focal lengths fx and fy of both cameras are fixed.
CV_CALIB_FIX_ASPECT_RATIO - fy is optimized, but the ratio fx/fy is fixed.
CV_CALIB_SAME_FOCAL_LENGTH - Enforces the same focal lengths for the two cameras.
CV_CALIB_ZERO_TANGENT_DIST - Tangential distortion coefficients for each camera are set to zeros and fixed there.
CV_CALIB_FIX_K1, CV_CALIB_FIX_K2, CV_CALIB_FIX_K3 - Fixes the corresponding radial distortion coefficient (the coefficient must be passed to the function).//

The function estimates the transformation between the two cameras making a stereo pair. For a stereo camera the relative position and orientation of the two cameras are fixed. If the pose of the object relative to the first camera is (R1, T1) and relative to the second camera is (R2, T2), the two poses are related to each other; we only need to know the position and orientation of the 2nd camera relative to the 1st camera. That is what the function does: it computes (R, T) such that:

R2 = R*R1
T2 = R*T1 + T

Optionally, it computes the essential matrix E:

E = [  0  -Tz   Ty ]
    [  Tz   0  -Tx ] * R
    [ -Ty   Tx   0 ]

where Tx, Ty and Tz are the components of the translation vector T = (Tx, Ty, Tz). The function can also compute the fundamental matrix F:

F = cameraMatrix2^(-T) * E * cameraMatrix1^(-1)

Besides the stereo-related information, the function can also perform a full calibration of each of the two cameras. However, because of the high dimensionality of the parameter space and noise in the input data, the function can diverge from the correct solution.

3.1.4) cvComputeCorrespondEpilines:
The OpenCV function cvComputeCorrespondEpilines() computes, for a list of points in one image, the epipolar lines in the other image. For any given point in one image, there is a different corresponding epipolar line in the other image. Each computed line is encoded in the form of a vector of three numbers (a, b, c) such that the epipolar line is defined by the equation:

ax + by + c = 0

cvComputeCorrespondEpilines(
const CvMat* points, //The input points: a 2xN, Nx2, 3xN or Nx3 array (where N is the number of points). A multi-channel 1xN or Nx1 array is also acceptable//
int whichImage, //Index of the image (1 or 2) that contains the points.//
const CvMat* F, //The fundamental matrix that can be estimated using FindFundamentalMat or StereoRectify.//
CvMat* lines //The output epilines, a 3xN or Nx3 array. Each line is encoded by 3 numbers (a, b, c).// )

For points in one image of a stereo pair, the function computes the corresponding epilines in the other image. From the fundamental matrix definition, the epipolar line l2 in the second image for the point p1 in the first image (i.e. when whichImage=1) is computed as:

l2 = F * p1

and, vice versa, when whichImage=2, the line l1 in the first image is computed from the point p2 in the second image as:

l1 = F^T * p2

Line coefficients are defined up to a scale.
They are normalized, such that 3.1.5) cvInitUndistortRectifyMap: The function cvInitUndistortRectifyMap() outputs mapx and mapy. These maps indicate from where we should interpolate source pixels for each pixel of the destination image; the maps can then be plugged directly into cvRemap(). The function cvInitUndistortRectifyMap() is called separately for the left and the right cameras so that we can obtain their distinct mapx and mapy remapping parameters. The function cvRemap() may then be called, using the left and then the right maps each time we have new left and right stereo images to rectify. 34 cvInitUndistortRectifyMap( const CvMat* cameraMatrix, const CvMat* distCoeffs, const CvMat* R, // The optional rectification transformation in object space (3x3 matrix). R1 or R2, computed by StereoRectify can be passed here. If the matrix is NULL, the identity transformation is assumed.// const CvMat*newCameraMatrix, CvArr* map1, // The first output map of type CV_32FC1 or CV_16SC2 - the second variant is more efficient.// CvArr* map2, // The second output map of type CV_32FC1 or CV_16UC1 - the second variant is more efficient.//) The function computes the joint un-distortion and rectification transformation and represents the result in the form of maps for Remap. The undistorted image will look like the original, as if it matrix =newCameraMatrix and was zero captured with distortion. a In camera the case with of camera stereo camera newCameraMatrix is normally set to P1 orP2 computed by StereoRectify. Also, 35 this new camera will be oriented differently in the coordinate space, according to R. That, for example, helps to align two heads of a stereo camera so that the epipolar lines on both images become horizontal and have the same y- coordinate (in the case of horizontally aligned stereo camera). The function actually builds the maps for the inverse mapping algorithm that is used by Remap. That is, for each pixel in the destination (corrected and rectified) image the function computes the corresponding coordinates in the source image (i.e. in the original image from camera). The process is the following: where are the distortion coefficients. In the case of a stereo camera this function is called twice, once for each camera head, after StereoRectify, which in its turn is called after StereoCalibrate. But if the stereo camera was not calibrated, it is still possible to compute the rectification transformations directly from the fundamental matrix using StereoRectifyUncalibrated. For each camera 36 the function computes homograph H as the rectification transformation in pixel domain, not a rotation matrix R in 3D space. The R can be computed from H as where the camera matrix can be chosen arbitrarily. 3.1.6) cvStereoBMState* cvCreateStereoBMState( int preset=CV_STEREO_BM_BASIC, // Any of the parameters can be overridden after creating the structure.// int numberOfDisparities=0// The number of disparities. If the parameter is 0, it is taken from the preset; otherwise the supplied value overrides the one from preset.//) #define CV_STEREO_BM_NARROW #define CV_STEREO_BM_FISH_EYE 1 #define CV_STEREO_BM_BASIC 0 The function creates the stereo correspondence structure and initializes it. It is possible to override any of the parameters to FindStereoCorrespondenceBM. 
typedef struct CvStereoBMState { //pre filters (normalize input images): at any time between the calls 37 int preFilterType; int preFilterSize;//for 5x5 up to 21x21 int preFilterCap; //correspondence using Sum of Absolute Difference (SAD): int SADWindowSize; // Could be 5x5,7x7, ..., 21x21 int minDisparity; int numberOfDisparities;//Number of pixels to search //post filters (knock out bad matches): int textureThreshold; //minimum allowed float uniquenessRatio;// Filter out if: // [ match_val - min_match <uniqRatio*min_match ] over the corr window area int speckleWindowSize;//Disparity variation window int speckleRange;//Acceptable range of variation in window // temporary buffers CvMat* preFilteredImg0; CvMat* preFilteredImg1; CvMat* slidingSumBuf; } CvStereoBMState; The state structure is allocated and returned by the function cvCreateStereoBMState(). This function takes the parameter preset, which can be set to any one of the following. 38 CV_STEREO_BM_BASIC Sets all parameters to their default values CV_STEREO_BM_FISH_EYE Sets parameters for dealing with wide-angle lenses CV_STEREO_BM_NARROW Sets parameters for stereo cameras with narrow field of view This function also takes the optional parameter numberOfDisparities; if nonzero, it overrides the default value from the preset. Here is the specification: The state structure, CvStereoBMState{}, is released by calling void cvReleaseBMState(CvStereoBMState **BMState); Any stereo correspondence parameters can be adjusted at any time between cvFindStereo CorrespondenceBM calls by directly assigning new values of the state structure fields. The correspondence function will take care of allocating/reallocating the internal buffers as needed. Finally, cvFindStereoCorrespondenceBM() takes in rectified image pairs and outputs a disparity map given its state structure: void cvFindStereoCorrespondenceBM( const CvArr *leftImage, const CvArr *rightImage, CvArr *disparityResult, 39 CvStereoBMState *BMState ); 3.1.7) cvReprojectImageTo3D ( const CvArr* disparity, // The input single-channel 16-bit signed or 32-bit floating-point disparity image.// CvArr* _3dImage, // The output 3-channel floating-point image of the same size as disparity. Each element of_3dImage(x, y, z) will contain the 3D coordinates of the point (x, y), computed from the disparity map.// const CvMat* Q, // The perspective transformation matrix that can be obtained with StereoRectify.// int handleMissingValues=0 //If true, when the pixels with the minimal disparity (that corresponds to the outliers; will be transformed to 3D points with some very large Z value (currently set to 10000). The function transforms 1-channel disparity map to 3channel image representing a 3D surface. That is, for each pixel (x,y) and the corresponding disparity d=disparity(x, y) it computes: The matrix Q can be arbitrary matrix computed by StereoRectify. 40 3.2) Pseudo code for stereo calibration and 3D vision: Source Code for this project is available in the compact disc attached at the end of the report. Initialize libraries which OpenCV has for computer vision and load ‘C++’ libraries too. Below are the libraries #include <cv.h #include <cxmisc.h> #include <highgui.h> #include <cvaux.h> #include <vector> #include <string> #include <algorithm> #include <stdio.h> #include <ctype.h> using namespace std; All the elements of the standard C++ library are declared within what is called a namespace, the namespace with the name ‘std’. 
So in order to access its functionality we declare with this expression that we will be using these entities. This line is very frequent in C++ programs that use the standard library Given a list of chessboard images, the number of corners (nx, ny) on the chessboards, and a flag: useCalibrated for calibrated (0) or (1) for unCalibrated. 41 (1: use cvStereoCalibrate(), 2: compute fundamental matrix separately) stereo. Calibrate the cameras and display the rectified results along with the computed disparity images. static void Creating a function named SteroCalib which takes four parameters from the main program, the first one is name of the text file that has the list of image files for calibration; second and third inputs are number of chessboard inner corners in X direction and Y direction respectively; fourth input is an integer to use another function named ‘useUncailbrated’ in this ‘SteroCalib’ function; if int=0 don’t use function or int=1 use the function. StereoCalib(const char* imageList, int nx, int ny, int useUncalibrated) { Initializing constant values and declaring integers, vectors and float elements that are used by this program. int DisplayCorners = 1; Making ‘DisplayCorners’ as true used in if statement ahead. int showUndistorted = 1; Making ‘showUndistorted’ as true used in if statement ahead. bool isVerticalStereo = false const int maxScale = 1; const float squareSize = 1.f; //Set this to your actual square size. ‘rt’ is the file name which is first input to this function and we open this file by using this function fopen FILE* f = fopen(imageList, "rt"); 42 Declaration of i, j, lr, nframes, n = nx*ny, N = 0 as integers; Declaration of arrays of strings, CvPoints in 3D or 2D, chars. vector<string> imageNames[2]; vector<CvPoint3D32f> objectPoints; vector<CvPoint2D32f> points[2]; vector<int> npoints; vector<uchar> active[2]; vector<CvPoint2D32f> temp(n); CvSize is function to initialize image size. Initially declaring as {0, 0} and storing the values to ‘imageSize’ CvSize imageSize = {0, 0}; Initializing arrays, vectors. Creating multi dimensional arrays for creating CvMat double M1[3][3], M2[3][3], D1[5], D2[5], Q[4][4]; double R[3][3], T[3], E[3][3], F[3][3]; Creating matrices using function CvMat for saving camera matrix, distortion, rotational, translation, fundamental and projection matrices between camera 1 and 2. CvMat _M1 = cvMat(3, 3, CV_64F, M1 ); CameraMatrix for camera 1 CvMat _M2 = cvMat(3, 3, CV_64F, M2 ); CameraMatrix for camera 2 CvMat _D1 = cvMat(1, 5, CV_64F, D1 ); Distortion coefficients for camera 1 CvMat _D2 = cvMat(1, 5, CV_64F, D2 ); Distortion coefficients for camera 2 43 CvMat _R = cvMat(3, 3, CV_64F, R ); Rotational matrix between cam 1 and 2 CvMat _T = cvMat(3, 1, CV_64F, T );Translational matrix between cam 1 and 2 CvMat _E = cvMat(3, 3, CV_64F, E ); Essential matrix between cam 1 and 2 CvMat _F = cvMat(3, 3, CV_64F, F ); Fundamental matrix between 1 and 2 CvMat _Q = cvMat(4, 4, CV_64F, Q); Projection matrix between 1 and 2 If ( displayCorners is true ) then cvNamedWindow( "corners", 1 ); this function creates a new window named corners which we can see the color image as function says ‘1’ at end. // READ IN THE LIST OF CHESSBOARDS: if( !f ) if the file is not opened it will exit the loop below and we did in the earlier part of the program. { fprintf(stderr, "can’t open file %s\n", imageList ); displays error message if the program couldn’t be able to load the image list. 
return; } for(i=0;;i++) { An array that can store up to 1024 elements of type chars. char buf[1024]; 44 int count = 0, result=0; lr = i % 2; Crating a vector array to load multiple points [lr] the remainder of ‘i’ in every iteration. vector<CvPoint2D32f>& pts = points[lr]; if( !fgets( buf, sizeof(buf)-3, f )) if not some file name with some chars then exit the loop. break; size_t len = strlen(buf); while( len > 0 && isspace(buf[len-1])) buf[--len] = '\0'; this loop reads all file names and rejects the files that begin with ‘#’. if( buf[0] == '#') continue; IplImage* img = cvLoadImage( buf, 0 ); loads the file with openCV function cvLoadImage. if( not a image ) break; imageSize = cvGetSize(img); getting the size of image by openCV function. imageNames[lr].push_back(buf); imagename is 0,1,2,3……………….and moves next step for finding corners of chessboard image from cam 1 and cam 2. //FIND CHESSBOARDS AND CORNERS THEREIN: for( int s = 1; s <= maxScale; s++ ) { 45 IplImage* timg = img; Initializing loaded image as some temp image. if( s > 1 ) loop which is grater then one as it is incremented above. { Creating temp image with same properties of image multiplied with‘s’. timg = cvCreateImage(cvSize(img->width*s,img->height*s), img->depth, img->nChannels ); resizing img with tmig using openCV function cvResize cvResize( img, timg, CV_INTER_CUBIC ); } Doing all operations on temp image instead of using main loaded image results from function cvFindChessboardCorners loaded into ‘result’. result = cvFindChessboardCorners( timg, cvSize(nx, ny), &temp[0], &count, CV_CALIB_CB_ADAPTIVE_THRESH | CV_CALIB_CB_NORMALIZE_IMAGE); The above function takes temp image and number of corners ‘nx’ and ‘ny’. if( timg is not equal to img ) then release temp image data as below. cvReleaseImage( &timg ); if( result || s == maxScale ) for( j = 0; j < count; j++ ) 46 { Below functions divided by‘s’ and saves pixel values in temp[j]. temp[j].x /= s; temp[j].y /= s; } if( result ) break; } Below loop displays the corners on the image. if( displayCorners ) { printf("%s\n", buf); displays file which the program is finding corners from. Creating another temp image called cimg to display corners. IplImage* cimg = cvCreateImage( imageSize, 8, 3 ); cvCvtColor( img to cimg with CV_GRAY2BGR ); cvDrawChessboardCorners( cimg, cvSize(nx, ny), &temp[0], count, result ); Draw chessboard corners with colored circles around the found corners with connecting lines between points. cvShowImage( "corners", cimg ); displays image on the screen with corners on chessboard. cvReleaseImage( &cimg ); 47 if( cvWaitKey(0) == 27 ) //Allow ESC to quit exit(-1); } else putchar('.'); N = pts.size(); pts.resize(N + n, cvPoint2D32f(0,0)); active[lr].push_back((uchar)result); //assert( result != 0 ); if( result ) { Calibration will suffer sub-pixel interpolation. 
cvFindCornerSubPix( img, &temp[0], count, cvSize(11, 11), cvSize(-1,-1), cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 30, 0.01)); copy( temp.begin(), temp.end(), pts.begin() + N ); } cvReleaseImage( &img ); } fclose(f); printf("\n"); // HARVEST CHESSBOARD 3D OBJECT POINT LIST: 48 nframes = active[0].size();//Number of good chessboads found objectPoints.resize(nframes*n); for( i = 0; i < ny; i++ ) for( j = 0; j < nx; j++ ) objectPoints[i*nx + j] = cvPoint3D32f(i*squareSize, j*squareSize, 0); for( i = 1; i < nframes; i++ ) copy( objectPoints.begin(), objectPoints.begin() + n, objectPoints.begin() + i*n ); npoints.resize(nframes,n); N = nframes*n; CvMat _objectPoints = cvMat(1, N, CV_32FC3, &objectPoints[0] ); CvMat _imagePoints1 = cvMat(1, N, CV_32FC2, &points[0][0] ); CvMat _imagePoints2 = cvMat(1, N, CV_32FC2, &points[1][0] ); CvMat _npoints = cvMat(1, npoints.size(), CV_32S, &npoints[0] ); cvSetIdentity(&_M1); cvSetIdentity(&_M2); cvZero(&_D1); cvZero(&_D2); // CALIBRATE THE STEREO CAMERAS printf("Running stereo calibration ..."); fflush(stdout); 49 The object points and image points from the above section are needed for stereo camera calibration. cvStereoCalibrate( &_objectPoints, &_imagePoints1, &_imagePoints2, &_npoints, &_M1, &_D1, &_M2, &_D2, imageSize, &_R, &_T, &_E, &_F, cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 100, 1e-5), CV_CALIB_FIX_ASPECT_RATIO + CV_CALIB_ZERO_TANGENT_DIST + CV_CALIB_SAME_FOCAL_LENGTH); printf(" done\n"); CALIBRATION QUALITY CHECK Because the output fundamental matrix implicitly includes all the output information, we can check the quality of calibration using the epipolar geometry constraint: m2^t*F*m1=0 vector<CvPoint3D32f> lines[2]; initializing two matrices for epipolar line. points[0].resize(N); points resize for every image for two cameras. points[1].resize(N); _imagePoints1 = cvMat(1, N, CV_32FC2, &points[0][0] ); pixel points of first camera _imagePoints2 = cvMat(1, N, CV_32FC2, &points[1][0] ); pixel points of second camera lines[0].resize(N); crating line for every corner point in image of cam 1 lines[1].resize(N); crating line for every corner point in image of cam 2 CvMat _L1 = cvMat(1, N, CV_32FC3, &lines[0][0]); 50 CvMat _L2 = cvMat(1, N, CV_32FC3, &lines[1][0]); Always work in undistorted space cvUndistortPoints( &_imagePoints1, &_imagePoints1, &_M1, &_D1, 0, &_M1 ); cvUndistortPoints( &_imagePoints2, &_imagePoints2, &_M2, &_D2, 0, &_M2 ); Function ‘undistortion’ which mathematically removes lens distortion, and rectification, which mathematically aligns the images with respect to each other. cvComputeCorrespondEpilines( &_imagePoints1, 1, &_F, &_L1 ); cvComputeCorrespondEpilines( &_imagePoints2, 2, &_F, &_L2 ); double avgErr = 0; The method (which is avgErr) is the sum of the distances of the points from the corresponding epipolar lines to its subpixel accuracy. for( i = 0; i < N; i++ ) { double err = fabs(points[0][i].x*lines[1][i].x + points[0][i].y*lines[1][i].y + lines[1][i].z) + fabs(points[1][i].x*lines[0][i].x +points[1][i].y*lines[0][i].y + lines[0][i].z);avgErr += err; } printf( "avg err = %g\n", avgErr/(nframes*n) ); COMPUTE AND DISPLAY RECTIFICATION if( showUndistorted is true as the value is 1 ) 51 { Initializing matrices for rectification purposes. 
CvMat* mx1 = cvCreateMat( imageSize.height,imageSize.width, CV_32F ); CvMat* my1 = cvCreateMat( imageSize.height,imageSize.width, CV_32F ); CvMat* mx2 = cvCreateMat( imageSize.height,imageSize.width, CV_32F ); CvMat* my2 = cvCreateMat( imageSize.height,imageSize.width, CV_32F ); CvMat* img1r = cvCreateMat( imageSize.height,imageSize.width, CV_8U ); CvMat* img2r = cvCreateMat( imageSize.height,imageSize.width, CV_8U ); IplImage* disp= cvCreateImage( imageSize, IPL_DEPTH_16S, 1 ); CvMat* vdisp = cvCreateMat( imageSize.height,imageSize.width, CV_8U ); CvMat* pair; double R1[3][3], R2[3][3], P1[3][4], P2[3][4]; CvMat _R1 = cvMat(3, 3, CV_64F, R1); CvMat _R2 = cvMat(3, 3, CV_64F, R2); int c=0; IF BY CALIBRATED (BOUGUET'S METHOD) if( useUncalibrated == 0 ) { CvMat _P1 = cvMat(3, 4, CV_64F, P1); CvMat _P2 = cvMat(3, 4, CV_64F, P2); 52 cvStereoRectify( &_M1, &_M2, &_D1, &_D2, imageSize,&_R, &_T,&_R1, &_R2, &_P1, &_P2, &_Q,0/*CV_CALIB_ZER O_DISPARITY*/ ); Return parameters are Rl and Rr, rectification rotations for the left and right image planes. Similarly, we get back the 3-by-4 left and right projection equations Pl and Pr. An optional return parameter is Q, the 4-by-4 reprojection matrix used in cvReprojectto3D to get 3D coordinates of the object. isVerticalStereo = fabs(P2[1][3]) > fabs(P2[0][3]); Precompute maps for cvRemap() cvInitUndistortRectifyMap(&_M1,&_D1,&_R1,&_P1,mx1,my1); cvInitUndistortRectifyMap(&_M2,&_D2,&_R2,&_P2,mx2,my2); mx and my of two cameras from cvInitUndistortRectifyMap are relocate matrix of pixel points for row aligned left and right images. } OR ELSE HARTLEY'S METHOD else if( useUncalibrated == 1 || useUncalibrated == 2 ) Use intrinsic parameters of each camera, but compute the rectification transformation directly from the fundamental matrix. { double H1[3][3], H2[3][3], iM[3][3]; CvMat _H1 = cvMat(3, 3, CV_64F, H1); CvMat _H2 = cvMat(3, 3, CV_64F, H2); 53 CvMat _iM = cvMat(3, 3, CV_64F, iM); Just to show independently used F if( useUncalibrated == 2 ) The fundamental matrix F is just like the essential matrix E, except that F operates in image pixel coordinates whereas E operates in physical coordinates. The fundamental matrix has seven parameters, two for each epipole and three for the homography that relates the two image planes. cvFindFundamentalMat( &_imagePoints1,&_imagePoints2, &_F); The function cvStereoRectifyUncalibrated computes the rectification transformations without knowing intrinsic parameters of the cameras and their relative position in space, hence the suffix "Uncalibrated". Another related difference from cvStereoRectify is that the function outputs not the rectification transformations in the object (3D) space, but the planar perspective transformations, encoded by the homography matrices H1 and H2. cvStereoRectifyUncalibrated( &_imagePoints1,&_imagePoints2, &_F,imageSize,&_H1, &_H2,3); cvInvert(&_M1, &_iM); cvMatMul(&_H1, &_M1, &_R1); cvMatMul(&_iM, &_R1, &_R1); cvInvert(&_M2, &_iM); cvMatMul(&_H2, &_M2, &_R2); cvMatMul(&_iM, &_R2, &_R2); 54 Precompute map for cvRemap() cvInitUndistortRectifyMap(&_M1,&_D1,&_R1,&_M1,mx1,my1); cvInitUndistortRectifyMap(&_M2,&_D1,&_R2,&_M2,mx2,my2); } else assert(0); cvNamedWindow( "rectified", 1 ); RECTIFY THE IMAGES AND FIND DISPARITY MAPS if( it is not VerticalStereo ) Creating a single image; double the size original image in vertical direction. If it is vertical stereo so that we can see both left and right rectified images in single image. 
RECTIFY THE IMAGES AND FIND DISPARITY MAPS
A single display image "pair" is created that is twice the width of the original image for horizontal stereo (or twice the height for vertical stereo), so that the left and right rectified images can be viewed side by side in one window.
if( !isVerticalStereo )
    pair = cvCreateMat( imageSize.height, imageSize.width*2, CV_8UC3 );
else
    pair = cvCreateMat( imageSize.height*2, imageSize.width, CV_8UC3 );
Setup for finding stereo correspondences. Correspondence is computed with a sliding sum-of-absolute-differences (SAD) window; window sizes such as 5-by-5, 7-by-7, ..., 21-by-21 can be used. For each feature centered in the left image, the corresponding row of the right image is searched for the best match.
CvStereoBMState *BMState = cvCreateStereoBMState();
assert(BMState != 0);
BMState->preFilterSize = 7; // in the pre-filtering step the input images are normalized to reduce lighting differences and enhance image texture
BMState->preFilterCap = 30;
BMState->SADWindowSize = 5; // correspondence is computed by a sliding SAD window of this size
BMState->minDisparity = 0; // minDisparity is where the matching search starts
CvScalar p; // p later holds the reprojected 3D point of a pixel; when c > 0 it is used to adapt the disparity search range to the previously measured depth
if( c > 0 )
{
    if( p.val[2] > -100.00 )
    {
        BMState->numberOfDisparities = 256; // the disparity search is carried out over 'numberOfDisparities' pixels
        printf("nd%d\n", BMState->numberOfDisparities);
    }
    else
    {
        BMState->numberOfDisparities = 128;
        printf("nd%d\n", BMState->numberOfDisparities);
    }
}
else
{
    BMState->numberOfDisparities = 256;
}
Each disparity limit defines a plane at a fixed depth from the cameras.
BMState->textureThreshold = 10; // minimum texture required before a match is accepted
BMState->uniquenessRatio = 15;
BMState->speckleWindowSize = 21;
BMState->speckleRange = 4;
c++;
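A brief restatement of the block-matching criterion used by the settings above (standard SAD matching, not additional project code): for a left-image pixel (x, y) and a candidate disparity d, the matcher sums absolute differences over the SAD window W,
SAD(x, y, d) = sum over (i, j) in W of | I_left(x+i, y+j) - I_right(x+i-d, y+j) |,
and the disparity chosen for (x, y) is the d in [minDisparity, minDisparity + numberOfDisparities) that minimizes this sum, subject to the texture and uniqueness checks configured above.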
CvCapture *capture1 = 0; // capture handles for the two cameras
CvCapture *capture2 = 0;
IplImage *imageBGR1 = 0; // frame images grabbed from the captures, initially null
IplImage *imageBGR2 = 0;
int key = 0;
capture1 = cvCaptureFromCAM( 0 ); // initialize cameras 1 and 2
capture2 = cvCaptureFromCAM( 1 );
Check whether the captures were opened:
if ( !capture1 || !capture2 )
{
    fprintf( stderr, "Cannot initialize webcam!\n" );
}
Create a window for the video:
cvNamedWindow( "result", CV_WINDOW_AUTOSIZE );
while( key != 'q' ) // this loop grabs frames from the captures continuously
{
// get a frame from each camera
imageBGR1 = cvQueryFrame( capture1 );
imageBGR2 = cvQueryFrame( capture2 );
if( !imageBGR1 )
    break; // no frame available, so break out of the loop
Create some GUI windows for output display:
cvShowImage("Input Image1", imageBGR1);
cvShowImage("Input Image2", imageBGR2);
IplImage* imageHSV1 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); // full HSV color image
cvCvtColor(imageBGR1, imageHSV1, CV_BGR2HSV); // convert from BGR to HSV, in which it is easy to separate the color planes
cvShowImage("s1", imageHSV1);
Initializing the planes:
IplImage* planeH1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // hue component
IplImage* planeS1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // saturation component
IplImage* planeV1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // brightness (value) component
cvCvtPixToPlane(imageHSV1, planeH1, planeS1, planeV1, 0); // split the current frame into its three components
IplImage* planeH11 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // hue component
IplImage* planeS11 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // saturation component
IplImage* planeV11 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // brightness component
cvCvtPixToPlane(imageHSV1, planeH11, planeS11, planeV11, 0); // extract the three color components
Setting saturation and brightness to maximum in order to view the hue plane separated from the others:
cvSet(planeS11, CV_RGB(255,255,255));
cvSet(planeV11, CV_RGB(255,255,255));
IplImage* imageHSV11 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); // full HSV color image
IplImage* imageBGR11 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); // full BGR color image
cvCvtPlaneToPix( planeH11, planeS11, planeV11, 0, imageHSV11 ); // combine the separate color components into one image
cvCvtColor(imageHSV11, imageBGR11, CV_HSV2BGR); // convert from HSV back to BGR
cvReleaseImage(&planeH11); cvReleaseImage(&planeS11); cvReleaseImage(&planeV11);
cvReleaseImage(&imageHSV11); cvReleaseImage(&imageBGR11);
The same is done for the other color components, isolating first the saturation plane and then the brightness plane:
IplImage* planeH21 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // hue component
IplImage* planeS21 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // saturation component
IplImage* planeV21 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // brightness component
cvCvtPixToPlane(imageHSV1, planeH21, planeS21, planeV21, 0); // extract the three color components
cvSet(planeS21, CV_RGB(255,255,255));
//cvSet(planeV21, CV_RGB(255,255,255));
IplImage* imageHSV21 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); // HSV color image
IplImage* imageBGR21 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); // BGR color image
cvCvtPlaneToPix( planeH21, planeS21, planeV21, 0, imageHSV21 ); // combine the separate color components
cvCvtColor(imageHSV21, imageBGR21, CV_HSV2BGR); // convert from HSV back to BGR
cvReleaseImage(&planeH21); cvReleaseImage(&planeS21); cvReleaseImage(&planeV21);
cvReleaseImage(&imageHSV21); cvReleaseImage(&imageBGR21);
IplImage* planeH31 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // hue component
IplImage* planeS31 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // saturation component
IplImage* planeV31 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // brightness component
cvCvtPixToPlane(imageHSV1, planeH31, planeS31, planeV31, 0); // extract the three color components
//cvSet(planeS31, CV_RGB(255,255,255));
cvSet(planeV31, CV_RGB(255,255,255));
IplImage* imageHSV31 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); // full HSV color image
IplImage* imageBGR31 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); // full BGR color image
cvCvtPlaneToPix( planeH31, planeS31, planeV31, 0, imageHSV31 ); // combine the separate color components into one image
cvCvtColor(imageHSV31, imageBGR31, CV_HSV2BGR); // convert from HSV back to BGR
cvReleaseImage(&planeH31); cvReleaseImage(&planeS31); cvReleaseImage(&planeV31);
cvReleaseImage(&imageHSV31); cvReleaseImage(&imageBGR31);
Threshold each channel to keep only pixels of the target color:
cvThreshold(planeH1, planeH1, 170, UCHAR_MAX, CV_THRESH_BINARY);
cvThreshold(planeS1, planeS1, 171, UCHAR_MAX, CV_THRESH_BINARY);
cvThreshold(planeV1, planeV1, 136, UCHAR_MAX, CV_THRESH_BINARY);
// Combine the thresholded HSV channels
IplImage* img1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // greyscale output image
cvAnd(planeH1, planeS1, img1); // imageColor = H {BITWISE_AND} S
cvAnd(img1, planeV1, img1);    // imageColor = H {BITWISE_AND} S {BITWISE_AND} V
Show the output image on the screen:
cvNamedWindow("Color Pixels1", CV_WINDOW_AUTOSIZE);
cvShowImage("Color Pixels1", img1);
This is the color mask from camera 1; the same steps are repeated below for the second camera.
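A restatement of what the thresholding above produces (not additional project code): the binary mask for each camera is
img(x, y) = 255 if H(x, y) > 170 and S(x, y) > 171 and V(x, y) > 136, and 0 otherwise,
so only pixels matching the hue, saturation and brightness of the colored object (the player's bat) survive. These threshold values are specific to the bat color used in this project and would need to be re-tuned for a different colored object.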
IplImage* imageHSV2 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // full HSV color image
cvCvtColor(imageBGR2, imageHSV2, CV_BGR2HSV);
//cvShowImage("s", imageHSV2);
IplImage* planeH2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // hue component
IplImage* planeS2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // saturation component
IplImage* planeV2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // brightness component
cvCvtPixToPlane(imageHSV2, planeH2, planeS2, planeV2, 0); // split the current frame into its three components
IplImage* planeH12 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // hue component
IplImage* planeS12 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // saturation component
IplImage* planeV12 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // brightness component
cvCvtPixToPlane(imageHSV2, planeH12, planeS12, planeV12, 0); // extract the three color components
cvSet(planeS12, CV_RGB(255,255,255)); cvSet(planeV12, CV_RGB(255,255,255));
IplImage* imageHSV12 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // full HSV color image
IplImage* imageBGR12 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // full BGR color image
cvCvtPlaneToPix( planeH12, planeS12, planeV12, 0, imageHSV12 ); // combine the separate color components into one image
cvCvtColor(imageHSV12, imageBGR12, CV_HSV2BGR); // convert from HSV back to BGR
cvReleaseImage(&planeH12); cvReleaseImage(&planeS12); cvReleaseImage(&planeV12);
cvReleaseImage(&imageHSV12); cvReleaseImage(&imageBGR12);
IplImage* planeH22 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // hue component
IplImage* planeS22 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // saturation component
IplImage* planeV22 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // brightness component
cvCvtPixToPlane(imageHSV2, planeH22, planeS22, planeV22, 0); // extract the three color components
cvSet(planeS22, CV_RGB(255,255,255)); //cvSet(planeV22, CV_RGB(255,255,255));
IplImage* imageHSV22 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // full HSV color image
IplImage* imageBGR22 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // full BGR color image
cvCvtPlaneToPix( planeH22, planeS22, planeV22, 0, imageHSV22 ); // combine the separate color components into one image
cvCvtColor(imageHSV22, imageBGR22, CV_HSV2BGR); // convert from HSV back to BGR
cvReleaseImage(&planeH22); cvReleaseImage(&planeS22); cvReleaseImage(&planeV22);
cvReleaseImage(&imageHSV22); cvReleaseImage(&imageBGR22);
IplImage* planeH32 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // hue component
IplImage* planeS32 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // saturation component
IplImage* planeV32 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // brightness component
cvCvtPixToPlane(imageHSV2, planeH32, planeS32, planeV32, 0); // extract the three color components
//cvSet(planeS32, CV_RGB(255,255,255));
cvSet(planeV32, CV_RGB(255,255,255));
IplImage* imageHSV32 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // full HSV color image
IplImage* imageBGR32 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // full BGR color image
cvCvtPlaneToPix( planeH32, planeS32, planeV32, 0, imageHSV32 ); // combine the separate color components into one image
cvCvtColor(imageHSV32, imageBGR32, CV_HSV2BGR); // convert from HSV back to BGR
cvReleaseImage(&planeH32); cvReleaseImage(&planeS32); cvReleaseImage(&planeV32);
cvReleaseImage(&imageHSV32); cvReleaseImage(&imageBGR32);
Threshold each channel of the second camera with the same values:
cvThreshold(planeH2, planeH2, 170, UCHAR_MAX, CV_THRESH_BINARY);
cvThreshold(planeS2, planeS2, 171, UCHAR_MAX, CV_THRESH_BINARY);
cvThreshold(planeV2, planeV2, 136, UCHAR_MAX, CV_THRESH_BINARY);
// Combine the thresholded HSV channels
IplImage* img2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // greyscale output image
cvAnd(planeH2, planeS2, img2); // imageColor = H {BITWISE_AND} S
cvAnd(img2, planeV2, img2);    // imageColor = H {BITWISE_AND} S {BITWISE_AND} V
// Show the output image on the screen:
//cvNamedWindow("Color Pixels2", CV_WINDOW_AUTOSIZE);
//cvShowImage("Color Pixels2", img2);
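As a side note, the per-plane splitting and thresholding above could be written more compactly with cvInRangeS, which performs the same three comparisons in one call. This is only a sketch of an equivalent formulation under the same thresholds (H > 170, S > 171, V > 136); it is not code used in the project, and mask2 is a hypothetical name:
IplImage* mask2 = cvCreateImage( cvGetSize(imageHSV2), 8, 1 ); // hypothetical single-channel mask
// keep pixels whose H, S and V values all lie above the thresholds used above
cvInRangeS( imageHSV2, cvScalar(171, 172, 137, 0), cvScalar(256, 256, 256, 0), mask2 );
The result is the same kind of binary object mask as img2.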
These binary color-mask images are then fed to cvRemap to rectify both of them.
if( img1 && img2 )
{
CvMat part;
cvRemap( img1, img1r, mx1, my1 );
cvRemap( img2, img2r, mx2, my2 );
if( !isVerticalStereo || useUncalibrated != 0 )
{
// When the stereo rig is oriented vertically, useUncalibrated == 0 does not transpose the image, so the epipolar lines in the rectified images are vertical; the stereo correspondence function does not support such a case.
Apply block matching to the rectified color masks to obtain the disparity map:
cvFindStereoCorrespondenceBM( img1r, img2r, disp, BMState );
//cvSave("disp.txt", disp); // optionally save the disparity map to a text file
IplImage* real_disparity = cvCreateImage( imageSize, IPL_DEPTH_8U, 1 );
cvConvertScale( disp, real_disparity, 1.0/16, 0 ); // the block matcher returns disparities in fixed point (sixteenths of a pixel)
cvNamedWindow( "disparity" );
cvShowImage( "disparity", real_disparity );
if( useUncalibrated == 0 ) // using Bouguet's method we can calculate the depth
{
Calculate depth with cvReprojectImageTo3D; the input is the disparity computed in the step above.
IplImage* depth = cvCreateImage( imageSize, IPL_DEPTH_32F, 3 );
cvReprojectImageTo3D( real_disparity, depth, &_Q );
Steps to display the depth of the colored object (its X, Y, Z coordinates relative to the camera) in the console window: the code below simply finds the minimum, maximum and average distance of the colored pixels from the camera.
int l = 0;
float r[10000]; float o[10000]; float m[10000]; // arrays holding the Z, X and Y values extracted from the depth image
for( int i = 0; i < imageSize.height; i++ ) {
    for( int j = 0; j < imageSize.width; j++ )
    {
        CvScalar s;
        p = cvGet2D(depth, i, j); // get the (i, j) 3D point from the depth image
        s = cvGet2D(real_disparity, i, j); // get the corresponding disparity
        if( s.val[0] != 0 ) // keep only pixels with a non-zero disparity, i.e. pixels belonging to the colored object
        {
            //printf("X=%f, Y=%f, Z=%f\n", p.val[0], p.val[1], p.val[2]);
            r[l] = p.val[2];
            o[l] = p.val[0];
            m[l] = p.val[1];
            l++;
        }
        //if(l==1)
        //break;
    }
}
float minr = r[0]; float mino = o[0]; float minm = m[0];
float maxr = r[0]; float maxo = o[0]; float maxm = m[0];
float sumr = 0; float sumo = 0; float summ = 0;
float avgr; float avgo; float avgm;
Calculate the sum, minimum and maximum of all X, Y and Z values of the colored pixels, exactly like finding the min, max and average of the elements of an array:
for( int pl = 0; pl < l; pl++ )
{
    sumr = sumr + r[pl]; sumo = sumo + o[pl]; summ = summ + m[pl];
    if (r[pl] < minr) { minr = r[pl]; }
    if (r[pl] > maxr) { maxr = r[pl]; }
    if (o[pl] < mino) { mino = o[pl]; }
    if (o[pl] > maxo) { maxo = o[pl]; }
    if (m[pl] < minm) { minm = m[pl]; }
    if (m[pl] > maxm) { maxm = m[pl]; }
}
avgr = (float)sumr / l;
avgo = (float)sumo / l;
avgm = (float)summ / l;
Output the values to the console window (the labels follow the magnitude of the depth: the coordinates are negative here, so the numerically smallest Z value is reported as the maximum depth):
printf("MAX Z=%f\nMIN Z=%f\n", minr, maxr);
printf("AVG Z=%f\n", avgr);
printf("MAX X=%f\nMIN X=%f\n", mino, maxo);
printf("AVG X=%f\n", avgo);
printf("MAX Y=%f\nMIN Y=%f\n", minm, maxm);
printf("AVG Y=%f\n", avgm);
printf("X=%f, Y=%f, Z=%f\n", avgo, avgm, avgr);
printf("----------\n");
}
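For reference, a sketch of what cvReprojectImageTo3D computes (stated here for clarity; this is the standard behaviour of the function rather than code from the project): for each pixel (x, y) with disparity d it forms
[X Y Z W]^T = Q * [x y d 1]^T
and stores the 3D point (X/W, Y/W, Z/W) in the depth image. With the Q matrix returned by cvStereoRectify, the depth component reduces, up to sign convention and a small principal-point offset term, to Z ≈ f*Tx/d, where f is the rectified focal length and Tx the baseline between the two cameras; the resulting coordinates are expressed in the left camera's coordinate system.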
The commented-out block below draws green lines across the side-by-side image pair so that we can check visually whether the left and right rectified images are aligned.
/* if( !isVerticalStereo )
{
    cvGetCols( pair, &part, 0, imageSize.width ); // copy elements from multiple adjacent columns of the array
    cvCvtColor( img1r, &part, CV_GRAY2BGR );
    cvGetCols( pair, &part, imageSize.width, imageSize.width*2 );
    cvCvtColor( img2r, &part, CV_GRAY2BGR );
    for( j = 0; j < imageSize.height; j += 16 )
        cvLine( pair, cvPoint(0,j), cvPoint(imageSize.width*2,j), CV_RGB(0,255,0));
}
else
{
    cvGetRows( pair, &part, 0, imageSize.height );
    cvCvtColor( img1r, &part, CV_GRAY2BGR );
    cvGetRows( pair, &part, imageSize.height, imageSize.height*2 );
    cvCvtColor( img2r, &part, CV_GRAY2BGR );
    for( j = 0; j < imageSize.width; j += 16 )
        cvLine( pair, cvPoint(j,0), cvPoint(j,imageSize.height*2), CV_RGB(0,255,0));
}
//cvShowImage( "rectified", pair );*/
key = cvWaitKey(1);
}
cvReleaseImage( &img1 );
cvReleaseImage( &img2 );
}
cvReleaseStereoBMState(&BMState);
cvReleaseMat( &mx1 ); cvReleaseMat( &my1 );
cvReleaseMat( &mx2 ); cvReleaseMat( &my2 );
cvReleaseMat( &img1r ); cvReleaseMat( &img2r );
cvReleaseImage( &disp );
}
}
int main(void) // program entry point
{
    StereoCalib("1.txt", 5, 7, 0); // image list file, chessboard corner counts, calibrated (Bouguet) rectification
    return 0;
}
Chapter 4
RESULTS OF CALIBRATION AND 3D VISION
4.1) Project Application Information:
The major purpose of this project is to design a sensor that can detect and track the player's bat for a table tennis shooter. The sensor can be linked to a robotic system that throws the ball to the player's bat for training purposes. The sensor is designed to detect a colored object (the player's bat) at a range of 2 to 3 meters. It consists of stereo cameras placed at a convenient location for tracking; the outputs are the (X, Y, Z) coordinates of the colored object with respect to one of the cameras, in that camera's coordinate system. These coordinates can be fed to the robotic system (the shooter) so that it can move its joints to throw the ball for different kinds of training. The shooter basically has two training modes.
1. Random mode: the shooter can throw the ball anywhere on the table, for a second stage of training.
2. Spot mode: the shooter throws the ball onto the table so that, after bouncing, it hits the player's bat exactly; for spot mode the shooter needs the bat coordinates from the stereo cameras (the sensor).
With the coordinates of the object available, intelligence and machine-learning techniques could also be added to the shooter. For this project we used two USB cameras as the sensor. The cameras are placed approximately 50 cm apart so that they can see the object 2 to 3 meters away. Before the cameras can observe the 3D object, they need to be calibrated so that the images from the two cameras are rectified, with corresponding pixels aligned. A few constraints need to be followed every time we track the object. The constraints are as follows:
1. The object being tracked must always be clearly visible to both cameras; if it is not, the code stops outputting the coordinates of the object and resumes once the cameras see the object again.
2. The camera focus and the distance between the cameras are arranged so that the player's playing area is clearly seen by both cameras; the player's bat should lie in the image plane of both cameras.
3. The cameras must not be moved or refocused after calibration; if they are moved, we need to calibrate once again with new calibration data.
4. To track an object at a larger distance, the camera separation, angle and focus need to be adjusted accordingly. The larger the distance (baseline) between the cameras, the farther away an object can be tracked, and the focus can be adjusted to the size of the object being tracked. The distance between the cameras is the major factor in how far the object can be tracked (see the depth-resolution relation sketched after this list).
5. The outputs are the (X, Y, Z) coordinates of the colored object, reported as minimum, maximum and average values; not every pixel of the object reports the same value, but the average is approximately equal (within about 1 to 3 cm) to the true coordinates of the tracked object.
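A short sketch of the relation behind constraint 4 (standard stereo geometry, not a measurement from this project): for a rectified pair with focal length f and baseline T, depth and disparity are related by Z = f*T/d, so the depth change caused by a one-pixel change in disparity is approximately
delta_Z ≈ (Z^2 / (f*T)) * delta_d.
Depth resolution therefore degrades with the square of the distance and improves as the baseline T grows, which is why the camera separation is the major factor in how far the object can be tracked.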
4.2) Coordinates of a colored object in front of the cameras:
Figure 4.1: Camera coordinate system
The X, Y and Z values are measured from the left camera. The program continuously outputs the coordinates of the colored object in a console window.
Maximum Z = -91.37 cm (the maximum depth value of a pixel in the disparity image).
Minimum Z = -70.63 cm (the minimum depth value of a pixel in the disparity image).
Average Z = -76.74 cm (the average depth value over the object pixels in the disparity image, which closely matches the real depth of the colored object).
Maximum X = -51.85 cm (the maximum X value of a pixel in the disparity image).
Minimum X = -40.46 cm (the minimum X value of a pixel in the disparity image).
Average X = -43.89 cm (the average X value over the object pixels in the disparity image, which closely matches the real X coordinate of the colored object).
Maximum Y = -3.51 cm (the maximum Y value of a pixel in the disparity image).
Minimum Y = -2.08 cm (the minimum Y value of a pixel in the disparity image).
Average Y = -2.83 cm (the average Y value over the object pixels in the disparity image, which closely matches the real Y coordinate of the colored object).
4.3) Results are graphically shown below with the left and right calibration data.
Figure 4.2: Detected corners on the image taken from the left camera.
Figure 4.3: Detected corners on the image taken from the right camera.
Figure 4.4: Rectified image pair; each pixel row in the left image is aligned with the corresponding row in the right image.
Figure 4.5: Average epipolar error, measured to sub-pixel accuracy.
Figure 4.6: Coordinates of an object with respect to the left camera. The colored object is moved towards and away from the cameras to check whether they are tracking it correctly.
Figure 4.7: Disparity between the left and right images. Disparity is the pixel distance between matching pixels in the left and right image; each disparity value corresponds to a plane at a fixed depth from the cameras.
Chapter 5
CONCLUSION
The purpose of this project is to design a sensor that can detect the player's bat for a table tennis shooter. We designed this sensor using two USB cameras and the OpenCV computer vision library, which provides the core functions used throughout the project. The program was written in C++ using the OpenCV library; its job is to calibrate the stereo cameras, remove the distortion from the images, rectify them, and finally output the coordinates of the object. The program reads a text file containing a list of left and right stereo (chessboard) image pairs, which are used to calibrate the cameras and then rectify the images. It reads each left and right image pair, finds the chessboard corners to sub-pixel accuracy, and saves the object and image points for all images. With the lists of image and object points from the good chessboard images, the code calls cvStereoCalibrate() to calibrate the cameras.
The calibration function outputs the camera matrix and distortion vector for the two cameras; it also outputs the rotation matrix, the translation vector, the essential matrix and the fundamental matrix. The accuracy of calibration can be checked by measuring how nearly the points in one image lie on the epipolar lines of the other image; the epipolar lines are computed by cvComputeCorrespondEpilines. The average distance of the points from their corresponding epipolar lines is 0.29 with our calibration data. The code then computes the rectification maps using either the uncalibrated method cvStereoRectifyUncalibrated() or the calibrated (Bouguet) method cvStereoRectify(), giving rectified coordinates for every source pixel in both the left and right images. The distance between the cameras and their focus must not be changed after calibration, because the calibration establishes the rotational and translational relation between the two cameras.
We initialized the two cameras and used them to take images of a colored object for tracking. A color-segmentation step (thresholding in HSV space) is used so that the images from the two cameras show only the colored object. The images from the left and right cameras are rectified using cvRemap(), and lines can be drawn across the image pairs to show how well the rectified images are aligned. The rectified images are passed to the block-matching state created with cvCreateStereoBMState(), and the disparity maps are computed with cvFindStereoCorrespondenceBM(). The disparity is then given to cvReprojectImageTo3D() to obtain the depth image; the (X, Y, Z) coordinates of the colored object are encoded in this depth image. This function also takes the reprojection matrix, which encodes the distance between the two cameras and the focal length of each camera. The coordinates of the colored object (the player's bat) can be linked to the shooter, enabling it to throw the ball exactly to the player's bat for training purposes. Because the shooter can now perceive the 3D scene, machine-learning and intelligent techniques can be applied for advanced training; machine learning would let the shooter assess the player's skill on different shots and throw the ball intelligently for all kinds of training purposes.
Chapter 6
FUTURE WORK
The work can be extended by adding machine-learning techniques for better accuracy of the results.
The work can be extended by designing and implementing a table tennis shooter (a robotic system) that throws a ball to the player's bat for training purposes; this robot can take the 3D coordinates of the bat from the code developed in this project.
The work can be extended for industrial inspection purposes.
The work can be extended to develop an application for part modeling; this application could be linked to different modeling software (for example SolidWorks, CATIA, Pro/E) for faster part modeling.
The work can be extended to detecting an object in front of a vehicle (unmanned vehicle).
BIBLIOGRAPHY
[1] Gary Bradski, Adrian Kaehler, "Learning OpenCV: Computer Vision with the OpenCV Library".
[2] Gary Bradski, "OpenCV Wiki" [Online]. Available: http://opencv.willowgarage.com/wiki
[3] Michael C. Fairhurst, "Computer Vision for Robotic Systems".
[4] David Forsyth, Jean Ponce, "Computer Vision: A Modern Approach".
[5] Shervin Emami, "Learning OpenCV" [Online]. Available: http://www.shervinemami.co.cc/openCV.html
[6] Ken Conley, "ROS Wiki" [Online]. Available: http://www.ros.org/wiki/
[7] Vishvjit S. Nalwa, "A Guided Tour of Computer Vision".
[8] David Lowe, "Computer Vision Industry" [Online].
Available: http://www.cs.ubc.ca/~lowe/vision.html