COMPUTER RECOGNITION SYSTEM FOR DETECTING AND TRACKING
OBJECTS IN 3D ENVIRONMENT
Sravan Kumar Reddy Mothe
B.Tech., Jawaharlal Nehru Technological University, India, 2007
PROJECT
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
MECHANICAL ENGINEERING
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
SPRING
2011
COMPUTER RECOGNITION SYSTEM FOR DETECTING AND TRACKING
OBJECTS IN 3D ENVIRONMENT
A Project
by
Sravan Kumar Reddy Mothe
Approved by:
________________________________, Committee Chair
Yong S.Suh, Ph. D.
_________________________
Date
Student: Sravan Kumar Reddy Mothe
I certify that this student has met the requirements for format contained in the
University format manual, and that this project is suitable for shelving in the Library and
credit is to be awarded for the Project.
________________________, Graduate Coordinator
Kenneth Sprott, Ph. D.
Department of Mechanical Engineering
_____________________
Date
Abstract
of
COMPUTER RECOGNITION SYSTEM FOR DETECTING AND TRACKING
OBJECTS IN 3D ENVIRONMENT
by
Sravan Kumar Reddy Mothe
In recent times, computer recognition has become a powerful tool for many robotic
applications. Applications such as inspection, surveillance, industrial automation and gaming
need 3D positional data in order to interact with the external environment, and this can be
achieved by computer recognition. Computer recognition can be done using many
different tools, such as OpenCV, Matlab, OpenGL, etc. OpenCV has an optimized
library with 500 useful functions for detecting, tracking, image transformation, 3D vision,
etc.
The scope of this project is to get the 3D position of an object from two sensors. The
sensors are two cameras, which need to be calibrated before they see the 3D object.
Calibration is the process in which the output images from the two cameras are vertically
aligned, which means all pixel points are vertically aligned. After calibration these two
images from camera 1 and camera 2 are inputted into the OpenCV 3D function. This
application is mainly designed for a Ping Pong game shooter. The coding part of this
project includes writing code in C++ using OpenCV libraries for calibrating the cameras,
and recognition for tracking the 3D object. The output of the coding part is the 3D position
of the player's bat from the cameras in the camera coordinate system. This 3D positional data can be
inputted into the shooter so that the shooter's joints can move automatically to shoot the ball
exactly to the player's bat for training purposes. This 3D vision technology can be used
in many other applications such as industrial robots, unmanned vehicles, intelligent
surveillance, medical devices, gaming, etc.
_______________________________, Committee Chair
Yong S.Suh, Ph. D.
______________________
Date
ACKNOWLEDGMENTS
While working on this project, some people helped me to reach where I am today, and I
would like to thank them all for their support and patience.
Firstly, I would like to thank Professor Dr. Yong S. Suh for giving me the opportunity to
do this project. His continuous support was the main thing that helped me develop an
immense interest in the project and led me to complete it. Dr. Yong S. Suh helped me by
providing many sources of information that I needed from the beginning of the project till the
end. He was always there to meet, talk and answer the questions that came up during
the project.
Special thanks to my advisor Dr. Kenneth Sprott for helping me complete the writing of
this dissertation; without his encouragement and constant guidance I could not have
finished this report.
Finally, I would also like to thank my family, friends and the Mechanical Engineering
department, who helped me complete this project work successfully. Without any of
the above-mentioned people the project would not have come out the way it did. Thank
you all.
TABLE OF CONTENTS
Acknowledgments
List of Figures
Software Specifications
Chapter
1. INTRODUCTION AND BACKGROUND
   1.1 Introduction to computer vision
   1.2 Applications of computer vision
   1.3 Tools available for computer vision
       1.3.1 OpenCV (Open Source Computer Vision)
       1.3.2 VXL (Vision-something-Libraries)
       1.3.3 BLEPO
       1.3.4 MinGPU (A minimum GPU library for computer vision)
   1.4 Description of the project
2. EXPERIMENTAL SETUP AND OPENCV LIBRARY LOADING PROCEDURES
   2.1 Steps to calibrate stereo cameras and obtain 3D data
       2.1.1 Step1 (Loading OpenCV library)
       2.1.2 Step2 (Preparing chessboard and setting up cameras)
       2.1.3 Step3 (C++ program to capture calibration data)
       2.1.4 Step4 (Code for calibration and for 3D object tracking)
3. IMPORTANT OPENCV FUNCTIONS AND CODE FOR 3D VISION
   3.1 Important functions used for this project
       3.1.1 cvFindChessboardCorners
       3.1.2 cvDrawChessboardCorners
       3.1.3 cvStereoCalibrate
       3.1.4 cvComputeCorrespondEpilines
       3.1.5 cvInitUndistortRectifyMap
       3.1.6 cvStereoBMState
       3.1.7 cvReprojectImageTo3D
   3.2 Pseudo code for stereo calibration and 3D vision
4. RESULTS OF CALIBRATION AND 3D VISION
   4.1 Project Application Information
   4.2 Coordinates of a colored object in front of cameras
   4.3 Results are graphically shown
5. CONCLUSION
6. FUTURE WORK
Bibliography
LIST OF FIGURES
1. Figure 1.1: Interaction of various fields of study defining interests in computer vision
2. Figure 2.1: Setting up OpenCV path for Environmental Variables
3. Figure 2.2: Creating a new Win32 Console Application
4. Figure 2.3: Loading libraries to Visual Studio 2010
5. Figure 2.4: Loading Additional Dependencies to the project
6. Figure 2.5: Chessboard used for calibration of stereo cameras
7. Figure 2.6: Two USB cameras fixed to a solid board in front of the 3D object
8. Figure 2.7: Calibration data from camera 1 and camera 2
9. Figure 2.8: Text file that the calibration code reads from
10. Figure 4.1: Camera coordinate system
11. Figure 4.2: Detected corners on the image taken from the left camera
12. Figure 4.3: Detected corners on the image taken from the right camera
13. Figure 4.4: Rectified image pair
14. Figure 4.5: Displays average error to its sub-pixel accuracy
15. Figure 4.6: Coordinates of an object with respect to the left camera
16. Figure 4.7: Disparity between left and right images
SOFTWARE SPECIFICATIONS
1. The initial requirement to run the program is a C++ compiler; the preferred compiler is Visual Studio 2010.
2. Download the OpenCV libraries and load them into Visual Studio.
3. Create a new project in Visual Studio, open the folder that is on the disc, open the source file, and run it. Note: two USB cameras should be connected before running the source file.
4. The OpenCV loading procedures are clearly illustrated in the report.
5. Operating System: Windows 7 or Windows Vista (preferred).
6. System requirements: 4 GB RAM and 2.53 GHz processor speed (preferred).
Chapter 1
INTRODUCTION AND BACKGROUND
1.1) Introduction to computer vision:
Vision is our most powerful sense. It provides us with a remarkable amount of
information about our surroundings and enables us to interact intelligently with the
environment. Through it we learn the positions and identities of objects and the relations
between them. Vision is also our most complicated sense. The knowledge that we have
accumulated about how our biological vision system operates is still fragmentary and
confined mostly to the processing stages directly concerned with signals from the sensors.
Today one can find vision systems that successfully deal with a variable environment as parts
of machines.
Computer vision (image understanding) is a technology that studies how to
reconstruct and understand a 3D scene from its 2D images in terms of the properties of
the structures present in the scene. Computer vision is concerned with modeling and
replicating human vision using computer software and hardware. It combines knowledge
in computer science, electrical engineering, mathematics, physiology, biology, and
cognitive science, and it needs knowledge from all these fields in order to understand and
simulate the operation of the human vision system. As a scientific discipline, computer
vision is concerned with the theory behind artificial systems that extract information from
images. The image data can take many forms, such as video sequences, views from
multiple cameras, or multi-dimensional data.
[Figure 1.1 is a diagram showing computer vision at the intersection of biological studies, computer science and engineering, artificial intelligence/cognitive studies, electronics engineering, mechanical engineering, and robotics.]
Figure 1.1: Interaction of various fields of study defining interests in computer vision.
1.2) Applications of computer vision:
Much of artificial intelligence deals with autonomous planning or deliberation for robot
systems to navigate through an environment. A detailed understanding of these
environments is required to navigate through them. Information about the environment
could be provided by a computer vision system, acting as a vision sensor and providing
high-level information about the environment and the robot. Potential application areas
for vision-driven automated systems are many. Each brings its own particular problems,
which must be resolved by system designers if successful operation is to be achieved, but,
generally speaking, applications can be categorized according to the processing
requirements they impose. To illustrate, I briefly describe a number of such application
areas.
Examples are categorized under their principal application area. [Ref: 8]
Three-dimensional modeling:
1. Creates 3D models from a set of images. Objects are imaged on a calibration.
2. Photo Modeler software allows creation of texture-mapped 3-D models from a small number of photographs. Uses some manual user input.
3. Uses projected light to create a full 3-D textured model of the human face or body in sub-second times.
Traffic and road management:
1. Created the Auto scope system that uses roadside video cameras for real-time
traffic management. Over 100,000 cameras are in use.
2. Imaging and scanning solutions for road network surveying.
Web Applications:
1. Image retrieval based on face recognition.
2. Develops a system for image search on the web. Uses GPUs for increased
performance.
3. Image retrieval based on content.
4. Virtual makeover website, TAAZ.com uses computer vision methods to allow
users to try on makeup, hair styles, sunglasses, and jewelry.
Security and Biometrics:
1. Systems for intelligent video surveillance.
2. Systems for biometric face recognition.
3. Fingerprint recognition systems with a novel sensor.
4. Systems for behavior recognition in real-time video surveillance.
5. Fingerprint recognition systems.
6. Smart video surveillance systems.
7. Security systems using novel sensors, such as registered visible and thermal
infrared images and use of polarized lighting.
8. Security systems for license plate recognition, surveillance, and access control.
9. Image processing and computer vision for image forensics.
10. Automated monitoring systems, including face and object recognition.
11. Detection and identification of computer users.
12. Detection and monitoring of people in video streams.
13. Face verification and other biometrics for passport control.
People tracking:
1. Tracking people within stores for sales, marketing, and security.
2. Systems for counting and tracking pedestrians using overhead cameras.
3. Tracking people in stores to improve marketing and service.
Object Recognition for Mobile Devices:
1. Visual search for smart phones, photo management, and other applications.
2. Image recognition and product search for camera phones.
Industrial automation and inspection:
1. Industrial robots with vision for part placement and inspection.
2. Vision systems for the plastics industry.
3. Inspection systems for optical media, sealants, displays, and other industries.
4. Develops 3D scanners for sawmills and other applications.
5. Vision systems for industrial inspection tasks, including food processing,
glassware, medical devices, and the steel industry.
6. Develops 3D vision systems using laser sensors for inspection of wood
products, roads, automotive manufacturing, and other areas.
7. Industrial mobile robots that use vision for mapping and navigation.
8. Trainable computer vision systems for inspection and automation.
9. Laser-based inspection and templating systems.
10. Vision systems for surface inspection and sports vision applications.
11. Systems to inspect output from high-speed printing presses.
12. Vision systems for textile inspection and other applications.
13. Systems for inspection and process control in semiconductor manufacturing.
14. Automated inspection systems for printed circuit boards and flat panel displays.
15. Creates 3D laser scanning systems for automotive and other applications.
16. Has developed a system for accurate scanning of 3D objects for the automotive
and other industries. The system uses a 4-camera head with projection of
textured illumination to enable accurate stereo matching.
Games and Gesture Recognition:
1. Time-of-flight range sensors and software for gesture recognition. Acquired by
Microsoft in 2010.
2. Tracks human gestures for playing games or interacting with computers.
3. Real-time projected infrared depth sensor and software for gesture recognition.
Developed the sensing system in Microsoft's Xbox Kinect.
4. Interactive advertising for projected displays that tracks human gestures.
5. Uses computer vision to track the hand and body motions of players to control the Sony PlayStation.
Film and Video: Sports analysis:
1. Uses multiple cameras to provide precise tracking in table tennis, cricket, and
other sports for refereeing and commentary.
2. Creates photorealistic 3D visualization of sporting events for sports
broadcasting and analysis.
3. Systems for tracking sports action to provide enhanced broadcasts.
4. Develops Piero system for sports analysis and augmentation.
5. Systems for tracking sports players and the ball in real time, using some human
assistance. (My project can be used for this application.)
6. Vision systems to provide real-time graphics augmentation for sports
broadcasts.
7. Provides 3D tracking of points on the human face or other surfaces for character
animation. Uses invisible phosphorescent makeup to provide a random texture
for stereo matching.
8. Systems for creating virtual television sets, sports analysis, and other
applications of real-time augmented reality.
9. Video content management and delivery, including object identification and
tracking.
10. Systems for tracking objects in video or film and solving for 3D motion to allow
for precise augmentation with 3D computer graphics.
1.3) Tools available for computer vision:
1.3.1) OpenCV (Open Source Computer Vision):
OpenCV is a library of programming functions for real-time computer vision
applications. The library is written in C and C++ and runs under different platforms,
namely Linux, Windows and Mac OS X. OpenCV was designed with a strong focus on
real-time applications. Further automatic optimization on Intel architectures can be achieved
with Intel's Integrated Performance Primitives (IPP), which consist of low-level optimized
routines in many different algorithmic areas. One of OpenCV's goals is to provide a
flexible computer vision infrastructure that helps us build fairly sophisticated vision
applications quickly. The OpenCV library has over 500 functions that can be used for
many areas in vision, including factory product inspection, medical imaging,
surveillance, user interfaces, camera calibration, stereo vision, and robotics.
OpenCV core main libraries:
1. “CVAUX” for Experimental/Beta.
2. “CXCORE” for Linear Algebra and Raw matrix support, etc.
3. “HIGHGUI” for Media/Window Handling and Read/write AVIs, window displays,
etc.
OpenCV’s latest version is available from http://SourceForge.net/projects/opencvlibrary.
We can download the OpenCV library and build it into Visual Studio 2010; the steps to build the library are illustrated later in this report.
1.3.2) VXL (Vision-something-Libraries):
VXL is a collection of C++ libraries designed for computer vision research and
implementation. It was created from TargetJr with the aim of making a light, fast and
consistent system. VXL is written in ANSI/ISO C++ and is designed to be portable over
many platforms.
Core libraries in VXL are:
1. VNL (Numeric): Numerical containers and algorithms like matrices, vectors,
decompositions, optimizers.
2. VIL (Imaging): Loading, saving and manipulating images in many common file
formats, including very large images.
3. VGL (Geometry): Geometry for points, curves and other elementary objects in 1, 2
or 3 dimensions.
4. VSL (Streaming I/O), VBL (Basic templates), VUL (Utilities): Miscellaneous
platform-independent functionality.
As well as the core libraries, there are libraries covering numerical algorithms, image
processing, co-ordinate systems, camera geometry, stereo, video manipulation, and
structure recovery from motion, probability modeling, GUI design, classification, robust
estimation, feature tracking, topology, structure manipulation, 3D imaging, etc. Each
core library is lightweight, and can be used without reference to the other core libraries.
1.3.3) BLEPO:
Blepo is an open-source C/C++ library to facilitate computer vision research and
education. Blepo is designed to be easy to use, efficient, and extensive.
1. It enables researchers to focus on algorithm development rather than low-level
details such as memory management, reading/writing files, capturing images, and
visualization, without sacrificing efficiency;
2. It enables educators and students to learn image manipulation in a C++
environment that is easy to use; and
3. It captures a repository of the more mature, well-established algorithms to enable
their use by others both within and outside the community, to avoid having to
reinvent the wheel.
1.3.4) MinGPU: A minimum GPU library for computer vision:
In computer vision it is becoming popular to implement algorithms in whole or in part on
a Graphics Processing Unit (GPU), due to the superior speed GPUs can offer compared
to CPUs. MinGPU has been used to implement two well-known computer vision algorithms
– Lucas-Kanade optical flow and optimized normalized cross-correlation – as well as the
homography transformation between two 3D views. MinGPU is a library which contains, as
minimally as possible, all of the necessary functions to convert existing CPU code to the
GPU. MinGPU provides simple interfaces which can be used to load a 2D array into the
GPU and perform operations on it. All GPU- and OpenGL-related code is encapsulated in
the library; therefore users of this library need not know any details of how the GPU
works. Because GPU programming is currently not that simple for anyone outside the
computer graphics community, this library can provide an introduction to the GPU world
for researchers who have never used the GPU before. The library works with both nVidia
and ATI families of graphics cards and is configurable.
1.4) Description of the Project:
The goal of the project is to design a sensor that detects and tracks objects in a 3D
environment; for this project it is specifically designed for a Ping Pong game shooter.
The sensors used for this project are stereo cameras, and these cameras let the shooter
know where the player's bat is. This project is basically about writing code using the
OpenCV library so that the cameras can see the player's bat. The output from the code is the
exact 3D location of the bat (X, Y & Z coordinates) from the sensor.
The output from the stereo cameras is just a video stream, and this video stream is further
analyzed by the computer program written in C++. The program basically consists of
different functions, namely Calibration, Rectification, Disparity, Background Subtraction
and Reprojection3D. Calibration removes the distortion in the
video streams, which is usually caused by improper alignment and the quality of the lenses.
The Rectification function rectifies both images so that all pixels from both images
are vertically aligned. The Disparity function calculates the differences in x-coordinates on
the image planes of the same feature viewed in the left and right cameras. The Background
Subtraction function subtracts the background and lets the program see only the
colored object (the bat) for tracking. Finally, the Reprojection3D function takes the disparity
as an input and outputs the coordinates of the object from the sensors. These coordinates
can be inputted into the shooter so the shooter can understand the bat position and
implement advanced training modes for the player to practice the game.
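A minimal sketch of this per-frame flow is shown below, assuming the calibration step has already produced the rectification maps, the block-matching state, and the reprojection matrix (these are explained in Chapter 3); the variable names are illustrative, not the project's actual identifiers.

#include "cv.h"
#include "highgui.h"

// Per-frame sketch: rectify -> disparity -> reproject to 3D.
// Assumes mx1, my1, mx2, my2 (rectification maps), BMState (block-matching state)
// and Q (4x4 reprojection matrix) were produced by the calibration code.
void processFramePair(IplImage* leftRaw, IplImage* rightRaw,
                      CvMat* mx1, CvMat* my1, CvMat* mx2, CvMat* my2,
                      CvStereoBMState* BMState, CvMat* Q)
{
    CvSize size = cvGetSize(leftRaw);
    IplImage* leftGray  = cvCreateImage(size, 8, 1);
    IplImage* rightGray = cvCreateImage(size, 8, 1);
    IplImage* leftRect  = cvCreateImage(size, 8, 1);
    IplImage* rightRect = cvCreateImage(size, 8, 1);
    CvMat*    disparity = cvCreateMat(size.height, size.width, CV_16S);
    CvMat*    image3D   = cvCreateMat(size.height, size.width, CV_32FC3);

    cvCvtColor(leftRaw,  leftGray,  CV_BGR2GRAY);        // block matching works on gray images
    cvCvtColor(rightRaw, rightGray, CV_BGR2GRAY);

    cvRemap(leftGray,  leftRect,  mx1, my1);             // Rectification: row-align both views
    cvRemap(rightGray, rightRect, mx2, my2);

    cvFindStereoCorrespondenceBM(leftRect, rightRect,    // Disparity: horizontal shift of the
                                 disparity, BMState);    // same feature between the two views

    cvReprojectImageTo3D(disparity, image3D, Q);         // (x, y, disparity) -> (X, Y, Z)

    // Background subtraction / color segmentation of the bat would be applied to the
    // rectified frames before reading the (X, Y, Z) values out of image3D.

    cvReleaseMat(&disparity); cvReleaseMat(&image3D);
    cvReleaseImage(&leftGray); cvReleaseImage(&rightGray);
    cvReleaseImage(&leftRect); cvReleaseImage(&rightRect);
}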
Chapter 2
EXPERIMENTAL SETUP AND OPENCV LIBRARY LOADING PROCEDURES
2.1) Steps to calibrate stereo cameras and obtain 3D data:
1. Load the OpenCV libraries into Visual Studio 2010 and check with the sample
program whether they are working well or not. If step 1 is okay, move to step 2.
2. Make a good-sized chessboard and tape it to a solid piece of wood or plastic (make
sure the image is not bent or calibration will not work). Make sure we focus our
cameras so that the text on the chessboard is readable, and make sure we don't modify
the focus or the distance between the cameras after calibration or during calibration,
since these two are important factors in calibration. If we modify the position or
focus of the cameras we need to calibrate once again.
3. Compile a C++ program that can capture chessboard pictures from the left and right
cameras at the same time. The code takes 16 sets of calibration pictures. The waiting
time for every set of pictures is 10 seconds so that we can move the chessboard
into a different position in front of the stereo cameras. All constant values are
preprogrammed in the code and can be adjusted according to the requirements.
4. After getting the calibration data from step 3 we can execute the stereo calibration
code that outputs the 3D data of a colored object. It takes the calibration data
from step 3 as an input. The program calibrates the stereo cameras
and locates the 3D object. X, Y and Z are the output values of the colored object from the stereo
cameras. Repeat step 3 until we get good results. The average error function in the
code should be minimal, usually less than 0.3. The lower the error, the better the output
(3D data). Each step is clearly illustrated below:
2.1.1) Step1 (Loading OpenCV libraries):
1. Download the OpenCV source files from http://sourceforge.net/projects/opencvlibrary/.
The downloaded file should contain folders named Bin, Src, Include and Lib.
The downloaded OpenCV2.1.0 folder should be in the Program Files folder of our computer.
2. Go to Start menu -> Computer and right click on it -> Click on Properties -> click on
Advanced System Settings -> click on Environmental Variables -> Add the new path in the User
Variable ABC box as shown in figure 2.1.
Figure 2.1: Setting up OpenCV path for Environmental Variables.
3. To complete the installation we need to follow the steps shown below in the figures and text.
Go to Start -> All Programs -> Microsoft Visual Studio 2010 (Express) -> Microsoft Visual (Studio / C++ 2010 Express).
- File -> New -> Project
- Name: 'OpenCV_Helloworld', select 'Win32 Console Application', click 'OK' and click 'Finish'.
- Go to Project -> OpenCV_Helloworld Properties... -> Configuration Properties -> VC++ Directories.
- Go to Executable Directories and add: 'C:\Program Files\OpenCV2.1\bin;'
- Go to Include Directories and add: 'C:\Program Files\OpenCV2.1\include\opencv;'
- Go to Library Directories and add: 'C:\Program Files\OpenCV2.1\lib;'
- Go to Source Directories and add the following source paths: 'C:\Program Files\OpenCV2.1\src\cv;C:\Program Files\OpenCV2.1\src\cvaux;C:\Program Files\OpenCV2.1\src\cvaux\vs;C:\Program Files\OpenCV2.1\src\cxcore;C:\Program Files\OpenCV2.1\src\highgui;C:\Program Files\OpenCV2.1\src\ml;C:\Program Files\OpenCV2.1\src1;'
- Go to Linker -> Input -> Additional Dependencies and add: 'cvaux210d.lib;cv210d.lib;cxcore210d.lib;highgui210d.lib;ml210d.lib;'
Figure 2.2: Creating a new Win32 Console Application.
Figure 2.3: Loading libraries to Visual Studio 2010.
Figure 2.4: Loading Additional Dependencies to the project.
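To verify that the libraries link correctly and that a USB camera can be opened, a small test program like the following can be used; this is only an illustrative sketch, not the project's sample program.

#include "cv.h"
#include "highgui.h"
#include <stdio.h>

// Minimal check that OpenCV links and a USB camera can be opened.
int main()
{
    CvCapture* capture = cvCaptureFromCAM(0);      // first connected camera
    if (!capture) {
        printf("Could not open camera 0\n");
        return -1;
    }
    cvNamedWindow("test", CV_WINDOW_AUTOSIZE);
    while (cvWaitKey(10) != 27) {                   // show frames until ESC is pressed
        IplImage* frame = cvQueryFrame(capture);
        if (!frame) break;
        cvShowImage("test", frame);
    }
    cvReleaseCapture(&capture);
    cvDestroyWindow("test");
    return 0;
}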
2.1.2) Step2 (Preparing chessboard and setting up cameras):
Prepare a chessboard of 90 by 70 centimeters for cameras which are 40 centimeters
apart. The chessboard should have at least 9 boxes vertically and 6 boxes horizontally,
and not fewer than these values. The reason is that the cameras would not be able to
recognize the corners if the box size is too small. So we should make sure the size of each box is
large and the number of boxes is the minimum for the above-mentioned size. Figure 2.5
displays the chessboard used for this project.
The cameras are placed approximately 40 centimeters apart. After calibration the
cameras should not be moved; the reason is that the functions in the code establish the
rotational and translational relation between the two cameras. Distance and focus are the
major factors in this relation. The larger the distance between the cameras, the larger the
distance at which we can track an object from the cameras. The cameras are fixed to a solid board at
a constant distance so that we don't have to calibrate every time we need to
track or get 3D data of an object. Figure 2.6 shows the camera setup.
Figure 2.5: Chessboard used for calibration of stereo cameras.
Figure 2.6: Two USB cameras are fixed to a solid board in front of 3D object.
2.1.3) Step3 (C++ program to capture calibration data):
The program below is used to check whether the OpenCV libraries are working well
and to get 16 sets of calibration data.
#include "cv.h"
#include "cxmisc.h"
#include "highgui.h"
#include <vector>
#include <string>
#include <algorithm>
20
#include <stdio.h>
#include <ctype.h>
int main(int argc, char **argv)
{
//Initializing image arrays and capture arrays for frames, gray frames from two
cameras.
IplImage* frames[2];
IplImage* framesGray[2];
CvCapture* captures[2];
for(int lr=0;lr<2;lr++)
{
captures[lr] = 0;
frames[lr]=0;
framesGray[lr] = 0;
}
int r =cvWaitKey(1000);//setting wait time to 1000 milliseconds before capturing
chessboard images.
captures[0] = cvCaptureFromCAM(0); //capture from first cam
captures[1] = cvCaptureFromCAM(1); //capture from second cam
int count=0;
21
while(1)// loop is used to get every captured frame and if count reaches to the number
of pictures we need for the calibration, loop exits.
{
frames[0] = cvQueryFrame(captures[0]);//getting frame
frames[1] = cvQueryFrame(captures[1]);
cvShowImage("RawImage1",frames[0]);//showing raw image
cvShowImage("RawImage2",frames[1]);
framesGray[0] = cvCreateImage(cvGetSize(frames[0]),8,1);//creating image for gray
image conversion
framesGray[1] = cvCreateImage(cvGetSize(frames[1]),8,1);
cvCvtColor(frames[0], framesGray[0], CV_BGR2GRAY); //converting BGR image
to gray scale image
cvCvtColor(frames[1], framesGray[1], CV_BGR2GRAY);
cvShowImage("frame1",framesGray[0]);//show converted gray image
cvShowImage("frame2",framesGray[1]);
printf("count=%d\n",count);
if(count==0)
{
cvSaveImage("calibleft1.jpg",framesGray[0]); //saving gray image to drive
cvSaveImage("calibright1.jpg",framesGray[1]);
count=count++;
22
//loop exits and count is incremented and count is 1. In the next loop it goes to next
for count==1 and same process repeats for 6 image pairs
}
else if(count==1)
{
cvSaveImage("calibleft2.jpg",framesGray[0]);
cvSaveImage("calibright2.jpg",framesGray[1]);
count=count++;
}
else if(count==2)
{
cvSaveImage("calibleft3.jpg",framesGray[0]);
cvSaveImage("calibright3.jpg",framesGray[1]);
count=count++;
}
else if(count==3)
{
cvSaveImage("calibleft4.jpg",framesGray[0]);
cvSaveImage("calibright4.jpg",framesGray[1]);
count=count++;
}
23
else if(count==4)
{
cvSaveImage("calibleft5.jpg",framesGray[0]);
cvSaveImage("calibright5.jpg",framesGray[1]);
count=count++;
}
else if(count==5)
{
cvSaveImage("calibleft6.jpg",framesGray[0]);
cvSaveImage("calibright6.jpg",framesGray[1]);
count=count++;
}// if we need more image sets then we need to add more else if functions.
else if(count==6)
{
cvSaveImage("calibleft7.jpg",framesGray[0]);
cvSaveImage("calibright7.jpg",framesGray[1]);
count=count++;
}
int c= cvWaitKey(5000)//for every picture it waits for 5000 milliseconds so that we
can change the position of chessboard;
}}
Figure 2.7: Calibration data from camera 1 and camera 2.
All images are automatically saved into
'C:\SampleProgram\SampleProgram\SampleProgram\'. If the project name is
'SampleProgram' and the C++ file name is also 'SampleProgram', then all image files
should be copied to 'C:\SampleProgram\SampleProgram\Debug\StereoData' so that the
calibration code can read a text file which has the addresses of these images.
The text file is shown below in a picture.
Figure 2.8: Text file that the calibration code reads from.
2.1.4) Step4 (Code for calibration and for 3D object tracking):
Stereo calibration code and functions are illustrated clearly in the next section of this
report. The outputs and execution are shown in a video named StereoCalibOP.avi. The
source code, calibration data (images), and video are on the compact disc attached at the
end of the report.
Chapter 3
IMPORTANT OPENCV FUNCTIONS AND CODE FOR 3D VISION
3.1) Important functions used for this project:
3.1.1) cvFindChessboardCorners: This is an OpenCV built-in function used to find the
inner corners of a chessboard. Calibration of the stereo cameras is done by finding the corners of the
chessboard to sub-pixel accuracy. Detecting the chessboard corners with the cameras
can be achieved by the function below.
int cvFindChessboardCorners (
const void* image, //image – Source chessboard view; it must be an 8-bit grayscale or
color image//
CvSize patternSize, //patternSize – The number of inner corners per chessboard row and
column ( patternSize = cvSize(columns, rows) )//
CvPoint2D32f* corners, // corners – The output array of corners detected//
int* cornerCount=NULL, // cornerCount – The output corner counter. If it is not NULL,
it stores the number of corners found//
int flags=CV_CALIB_CB_ADAPTIVE_THRESH)
//flags –Various operation flags, can be 0 or a combination of the following values:
CV_CALIB_CB_ADAPTIVE_THRESH - Adaptive threshold is to convert the image to
black and white, rather than a fixed threshold level (computed from the average image
brightness).
CV_CALIB_CB_NORMALIZE_IMAGE - Normalize the image gamma with the function
cvEqualizeHist() before applying a fixed or adaptive threshold value.
CV_CALIB_CB_FILTER_QUADS - Additional criteria (like contour area, perimeter,
square-like shape) to filter out false quads that are extracted at the contour retrieval stage.
The function attempts to determine whether the input image is a view of the chessboard
pattern and to locate the internal chessboard corners. The function returns a non-zero value
if all of the corners have been found and placed in a certain order (row by
row, left to right in every row); otherwise, if the function fails to find all the corners or to
reorder them, it returns 0. For example, a regular chessboard that has 9x7 squares has 8x6
internal corners, that is, points where the black squares touch each other. The coordinates
detected are approximate, and to determine their positions more accurately we may use
the function cvFindCornerSubPix().
Note: the function requires some white space around the board to make the detection
more robust in various environments (otherwise, if there is no border and the background is
dark, the outer black squares cannot be segmented properly and so the square grouping
and ordering algorithm will fail).//
3.1.2) cvDrawChessboardCorners: The function draws the individual chessboard corners
detected as red circles if the board was not found or as colored corners connected with
lines if the board was found.
cvDrawChessboardCorners(
CvArr* image, // image – The destination image; it must be an 8-bit color image//
CvSize patternSize, //Same as in function cvFindChessboardCorners()//
CvPoint2D32f* corners, //Same as in function cvFindChessboardCorners()//
int count, // The number of corners//
int patternWasFound //Indicates whether the complete board was found or not. One may just pass the return value of the cvFindChessboardCorners() function.// )
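As a hedged illustration of how the two functions above fit together (the file name comes from the capture program in Chapter 2; the 9x6 board size is only an assumed example value, not necessarily the one used in this project):

#include "cv.h"
#include "highgui.h"
#include <stdio.h>

// Sketch: detect, refine and draw the inner corners of one calibration view.
int main()
{
    IplImage* gray  = cvLoadImage("calibleft1.jpg", 0);   // 0 = load as grayscale
    IplImage* color = cvLoadImage("calibleft1.jpg", 1);   // color copy for drawing
    if (!gray || !color) { printf("image not found\n"); return -1; }

    CvSize boardSize = cvSize(9, 6);                      // inner corners per row and column (assumed)
    CvPoint2D32f corners[9 * 6];
    int cornerCount = 0;

    int found = cvFindChessboardCorners(gray, boardSize, corners, &cornerCount,
                                        CV_CALIB_CB_ADAPTIVE_THRESH |
                                        CV_CALIB_CB_NORMALIZE_IMAGE);
    if (found)                                            // refine to sub-pixel accuracy
        cvFindCornerSubPix(gray, corners, cornerCount, cvSize(11, 11), cvSize(-1, -1),
                           cvTermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 30, 0.01));

    cvDrawChessboardCorners(color, boardSize, corners, cornerCount, found);
    cvNamedWindow("corners", 1);
    cvShowImage("corners", color);
    cvWaitKey(0);
    cvReleaseImage(&gray);
    cvReleaseImage(&color);
    return 0;
}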
3.1.3) cvStereoCalibrate: Stereo calibration is the process of computing the geometrical
relationship between the two cameras in space. Stereo calibration depends on finding the
rotation matrix R and translation vector T between the two cameras.
cvStereoCalibrate(
const CvMat* objectPoints, // objectPoints is an N-by-3 matrix containing the physical
coordinates of each of the K points on each of the M images of the 3D object, such that N
= K × M. When using chessboards as the 3D object, these points are located in the
coordinate frame attached to the object (usually choosing the Z-coordinate of the
points on the chessboard plane to be 0), but any known 3D points may be used. //
const CvMat* imagePoints1,
const CvMat* imagePoints2, // imagePoints1 and imagePoints2 are N-by-2 matrices
containing the left and right pixel coordinates (respectively) of all of the object points. If
calibration was performed using a chessboard for the two cameras, then imagePoints1
and imagePoints2 are just the respective returned values of the multiple calls to
cvFindChessboardCorners() for the left and right camera views. //
const CvMat* pointCounts, // Integer 1xM or Mx1 vector (where M is the number of
calibration pattern views) containing the number of points in each particular view. The sum
of the vector elements must match the size of objectPoints and imagePoints. //
CvMat* cameraMatrix1,
CvMat* cameraMatrix2,
// The input/output first and second camera matrices, each of the form
//   [ fx  0  cx ]
//   [  0  fy  cy ]
//   [  0   0   1 ]
// where fx and fy are the focal lengths of the camera, and cx and cy are the center coordinates on the projection screen. //
CvMat* distCoeffs1,
CvMat* distCoeffs2,
// The input/output lens distortion coefficients for the first and second cameras, 5x1 or 1x5
floating-point vectors. //
CvSize imageSize, // Size of the image, used only to initialize intrinsic camera matrix.//
CvMat* R, //The output rotation matrix between the 1st and the 2nd cameras’ coordinate
systems.//
CvMat*T, //The output translation vector between the cameras’ coordinate systems.//
CvMat* E=0, //The optional output essential matrix.//
CvMat* F=0, //The optional output fundamental matrix.//
CvTermCriteria term_crit = cvTermCriteria( CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 30, 1e-6), // The
termination criteria for the iterative optimization algorithm.//
int flags=CV_CALIB_FIX_INTRINSIC) // Different flags, may be 0 or a combination of
the following values:
CV_CALIB_FIX_INTRINSIC - If it is set, cameraMatrix as well as distCoeffs are fixed,
so that only R, T, E and F are estimated.
CV_CALIB_USE_INTRINSIC_GUESS - The flag allows the function to optimize some
or all of the intrinsic parameters, depending on the other flags, but the initial values are
provided by us.
CV_CALIB_FIX_PRINCIPAL_POINT - The principal points are fixed during the
optimization.
CV_CALIB_FIX_FOCAL_LENGTH - The focal lengths fx and fy are fixed.
CV_CALIB_FIX_ASPECT_RATIO - fy is optimized, but the ratio fx/fy is fixed.
CV_CALIB_SAME_FOCAL_LENGTH - Enforces fx1 = fx2 and fy1 = fy2.
CV_CALIB_ZERO_TANGENT_DIST - Tangential distortion coefficients for each
camera are set to zeros and fixed there.
CV_CALIB_FIX_K1, CV_CALIB_FIX_K2, CV_CALIB_FIX_K3 - Fixes the
corresponding radial distortion coefficient (the coefficient must be passed to the function).
The function estimates the transformation between the two cameras making up a stereo pair. For
a stereo camera the relative position and orientation of the two cameras are fixed. If the pose
of the object relative to the first camera is (R1, T1) and relative to the second camera is
(R2, T2), the two poses relate to each other; we only need to know the position and
orientation of the second camera relative to the first camera. That is what the function does:
it computes (R, T) such that

R2 = R * R1
T2 = R * T1 + T

Optionally, it computes the essential matrix E:

E = [T]x * R

where [T]x is the skew-symmetric matrix built from the components of the translation vector T.
The function can also compute the fundamental matrix F:

F = inv(cameraMatrix2)^T * E * inv(cameraMatrix1)

Besides the stereo-related information, the function can also perform full calibration of
each of the two cameras. However, because of the high dimensionality of the parameter
space and noise in the input data, the function can diverge from the correct solution.
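A condensed sketch of a typical call is shown below; it assumes that _objectPoints, _imagePoints1, _imagePoints2, _npoints and imageSize have already been filled as described above (the full version appears in section 3.2).

// Allocate intrinsics and stereo outputs, then calibrate.
double M1[3][3], M2[3][3], D1[5], D2[5], R[3][3], T[3], E[3][3], F[3][3];
CvMat _M1 = cvMat(3, 3, CV_64F, M1), _M2 = cvMat(3, 3, CV_64F, M2);
CvMat _D1 = cvMat(1, 5, CV_64F, D1), _D2 = cvMat(1, 5, CV_64F, D2);
CvMat _R  = cvMat(3, 3, CV_64F, R),  _T  = cvMat(3, 1, CV_64F, T);
CvMat _E  = cvMat(3, 3, CV_64F, E),  _F  = cvMat(3, 3, CV_64F, F);

cvSetIdentity(&_M1); cvSetIdentity(&_M2);   // start the intrinsics from the identity
cvZero(&_D1); cvZero(&_D2);                 // and zero distortion

cvStereoCalibrate(&_objectPoints, &_imagePoints1, &_imagePoints2, &_npoints,
                  &_M1, &_D1, &_M2, &_D2, imageSize, &_R, &_T, &_E, &_F,
                  cvTermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 100, 1e-5),
                  CV_CALIB_FIX_ASPECT_RATIO + CV_CALIB_ZERO_TANGENT_DIST +
                  CV_CALIB_SAME_FOCAL_LENGTH);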
3.1.4) cvComputeCorrespondEpilines:
The OpenCV function cvComputeCorrespondEpilines() computes, for a list of points in
one image, the epipolar lines in the other image. For any given point in one image, there
is a different corresponding epipolar line in the other image. Each computed line is
encoded in the form of a vector of three points (a, b, c) such that the epipolar line is
defined by the equation: ax + by + c = 0
cvComputeCorrespondEpilines(
const CvMat* points, // The input points. 2xN, Nx2, 3xN or Nx3 array (where N number
of points). Multi-channel 1xN or Nx1 array is also acceptable//
int whichImage, // Index of the image (1 or 2) that contains the points.//
const CvMat* F, // The fundamental matrix that can be estimated using FindFundamentalMat or StereoRectify.//
CvMat* lines // The output epilines, a 3xN or Nx3 array. Each line is encoded by 3 numbers (a, b, c). //)
For points in one image of a stereo pair, the function computes the corresponding epilines in the other
image. From the definition of the fundamental matrix, the line l2 in the second image for a
point p1 in the first image (i.e. when whichImage=1) is computed as

l2 = F * p1

and, vice versa, when whichImage=2, the line l1 is computed from a point p2 as

l1 = F^T * p2

Line coefficients are defined up to a scale. They are normalized such that a^2 + b^2 = 1.
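As a condensed illustration (this is essentially the calibration-quality check performed by the code in section 3.2, which also defines points, lines, _L1, _L2 and N), the epilines can be used to measure how far each detected corner lies from its corresponding epipolar line:

// Compute epilines for the undistorted corner points of both images, then sum the
// point-to-line distances |a*x + b*y + c| (valid because the lines are normalized).
cvComputeCorrespondEpilines(&_imagePoints1, 1, &_F, &_L1);
cvComputeCorrespondEpilines(&_imagePoints2, 2, &_F, &_L2);

double avgErr = 0;
for (int i = 0; i < N; i++) {
    double err = fabs(points[0][i].x * lines[1][i].x +
                      points[0][i].y * lines[1][i].y + lines[1][i].z)
               + fabs(points[1][i].x * lines[0][i].x +
                      points[1][i].y * lines[0][i].y + lines[0][i].z);
    avgErr += err;
}
printf("avg epipolar error = %g\n", avgErr / N);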
3.1.5) cvInitUndistortRectifyMap:
The function cvInitUndistortRectifyMap() outputs mapx and mapy. These maps indicate
from where we should interpolate source pixels for each pixel of the destination image;
the maps can then be plugged directly into cvRemap(). The function
cvInitUndistortRectifyMap() is called separately for the left and the right cameras so that
we can obtain their distinct mapx and mapy remapping parameters. The function
cvRemap() may then be called, using the left and then the right maps, each time we have
new left and right stereo images to rectify.
cvInitUndistortRectifyMap(
const CvMat* cameraMatrix,
const CvMat* distCoeffs,
const CvMat* R, // The optional rectification transformation in object space (3x3
matrix). R1 or R2, computed by StereoRectify can be passed here. If the matrix is NULL,
the identity transformation is assumed.//
const CvMat*newCameraMatrix,
CvArr* map1, // The first output map of type CV_32FC1 or CV_16SC2 - the second
variant is more efficient.//
CvArr* map2, // The second output map of type CV_32FC1 or CV_16UC1 - the second
variant is more efficient.//)
The function computes the joint undistortion and rectification transformation and
represents the result in the form of maps for Remap. The undistorted image will look like
the original, as if it was captured with a camera with camera matrix = newCameraMatrix and
zero distortion. In the case of a stereo camera, newCameraMatrix is normally set to P1 or P2
computed by StereoRectify. Also,
this new camera will be oriented differently in the coordinate space, according to R. That,
for example, helps to align the two heads of a stereo camera so that the epipolar lines on both
images become horizontal and have the same y-coordinate (in the case of a horizontally
aligned stereo camera).
The function actually builds the maps for the inverse mapping algorithm that is used
by Remap. That is, for each pixel (u, v) in the destination (corrected and rectified) image
the function computes the corresponding coordinates in the source image (i.e. in the
original image from the camera). The process is the following:

x <- (u - c'x) / f'x,   y <- (v - c'y) / f'y
[X Y W]^T <- R^-1 * [x y 1]^T
x' <- X / W,   y' <- Y / W
x'' <- x' (1 + k1 r^2 + k2 r^4 + k3 r^6) + 2 p1 x' y' + p2 (r^2 + 2 x'^2)
y'' <- y' (1 + k1 r^2 + k2 r^4 + k3 r^6) + p1 (r^2 + 2 y'^2) + 2 p2 x' y'
mapx(u, v) <- x'' fx + cx,   mapy(u, v) <- y'' fy + cy

where r^2 = x'^2 + y'^2, (k1, k2, p1, p2, k3) are the distortion coefficients, (fx, fy, cx, cy) come
from cameraMatrix, and (f'x, f'y, c'x, c'y) come from newCameraMatrix.
In the case of a stereo camera this function is called twice, once for each camera head,
after StereoRectify, which in its turn is called after StereoCalibrate. But if the stereo
camera was not calibrated, it is still possible to compute the rectification transformations
directly from the fundamental matrix using StereoRectifyUncalibrated. For each camera
that function computes a homography H as the rectification transformation in the pixel domain,
not a rotation matrix R in 3D space. The R can be computed from H as

R = cameraMatrix^-1 * H * cameraMatrix

where the camera matrix can be chosen arbitrarily.
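A minimal sketch of how these calls fit together, assuming _M1, _D1, _R1, _P1 (and the corresponding right-camera matrices) and imageSize come from cvStereoCalibrate and cvStereoRectify as in section 3.2:

// Build the rectification maps once, after cvStereoRectify.
CvMat* mx1 = cvCreateMat(imageSize.height, imageSize.width, CV_32F);
CvMat* my1 = cvCreateMat(imageSize.height, imageSize.width, CV_32F);
CvMat* mx2 = cvCreateMat(imageSize.height, imageSize.width, CV_32F);
CvMat* my2 = cvCreateMat(imageSize.height, imageSize.width, CV_32F);

cvInitUndistortRectifyMap(&_M1, &_D1, &_R1, &_P1, mx1, my1);   // left camera
cvInitUndistortRectifyMap(&_M2, &_D2, &_R2, &_P2, mx2, my2);   // right camera

// For every new frame pair, undistort and row-align with cvRemap.
cvRemap(leftGray,  leftRect,  mx1, my1);
cvRemap(rightGray, rightRect, mx2, my2);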
3.1.6) cvStereoBMState* cvCreateStereoBMState(
int preset=CV_STEREO_BM_BASIC, // Any of the parameters can be overridden after
creating the structure.//
int numberOfDisparities=0// The number of disparities. If the parameter is 0, it is taken
from the preset; otherwise the supplied value overrides the one from preset.//)
#define CV_STEREO_BM_BASIC 0
#define CV_STEREO_BM_FISH_EYE 1
#define CV_STEREO_BM_NARROW 2
The function creates the stereo correspondence structure and initializes it. It is possible to
override any of the parameters at any time between the calls to FindStereoCorrespondenceBM.
typedef struct CvStereoBMState {
//pre filters (normalize input images):
int preFilterType;
int preFilterSize;//for 5x5 up to 21x21
int preFilterCap;
//correspondence using Sum of Absolute Difference (SAD):
int SADWindowSize; // Could be 5x5,7x7, ..., 21x21
int minDisparity;
int numberOfDisparities;//Number of pixels to search
//post filters (knock out bad matches):
int textureThreshold; //minimum allowed
float uniquenessRatio;// Filter out if:
// [ match_val - min_match <uniqRatio*min_match ] over the corr window area
int speckleWindowSize;//Disparity variation window
int speckleRange;//Acceptable range of variation in window
// temporary buffers
CvMat* preFilteredImg0;
CvMat* preFilteredImg1;
CvMat* slidingSumBuf;
} CvStereoBMState;
The state structure is allocated and returned by the function cvCreateStereoBMState().
This function takes the parameter preset, which can be set to any one of the following:
CV_STEREO_BM_BASIC - Sets all parameters to their default values.
CV_STEREO_BM_FISH_EYE - Sets parameters for dealing with wide-angle lenses.
CV_STEREO_BM_NARROW - Sets parameters for stereo cameras with a narrow field of view.
This function also takes the optional parameter numberOfDisparities; if nonzero, it
overrides the default value from the preset.
The state structure, CvStereoBMState{}, is released by calling
void cvReleaseStereoBMState(CvStereoBMState** BMState);
Any stereo correspondence parameters can be adjusted at any time between
cvFindStereoCorrespondenceBM calls by directly assigning new values of the state structure fields.
The correspondence function will take care of allocating/reallocating the internal buffers
as needed.
Finally, cvFindStereoCorrespondenceBM() takes in rectified image pairs and outputs a
disparity map given its state structure:
void cvFindStereoCorrespondenceBM(
const CvArr *leftImage,
const CvArr *rightImage,
CvArr *disparityResult,
CvStereoBMState *BMState
);
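Putting these pieces together, a typical usage sequence is to create the state, adjust a few parameters, run the matcher on a rectified pair, and release the state when done; the parameter values shown here are the ones used by the code in section 3.2, and leftRect, rightRect and disparity are assumed to exist already.

CvStereoBMState* BMState = cvCreateStereoBMState();
BMState->preFilterSize       = 7;     // normalize lighting before matching
BMState->preFilterCap        = 30;
BMState->SADWindowSize       = 5;     // size of the sliding SAD window
BMState->minDisparity        = 0;     // where the disparity search starts
BMState->numberOfDisparities = 256;   // search range, in pixels
BMState->textureThreshold    = 10;    // reject low-texture regions
BMState->uniquenessRatio     = 15;    // reject ambiguous matches
BMState->speckleWindowSize   = 21;
BMState->speckleRange        = 4;

cvFindStereoCorrespondenceBM(leftRect, rightRect, disparity, BMState);
cvReleaseStereoBMState(&BMState);     // release when no more pairs are processed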
3.1.7) cvReprojectImageTo3D (
const CvArr* disparity, // The input single-channel 16-bit signed or 32-bit floating-point
disparity image.//
CvArr* _3dImage, // The output 3-channel floating-point image of the same size
as disparity. Each element of_3dImage(x, y, z) will contain the 3D coordinates of the
point (x, y), computed from the disparity map.//
const CvMat* Q, // The
perspective transformation matrix that can be obtained
with StereoRectify.//
int handleMissingValues=0 // If true, the pixels with the minimal disparity (which correspond
to outliers) will be transformed to 3D points with some very large Z value (currently set to
10000). //)
The function transforms a 1-channel disparity map into a 3-channel image representing a 3D
surface. That is, for each pixel (x, y) and the corresponding disparity d = disparity(x, y) it computes:

[X Y Z W]^T = Q * [x y d 1]^T
_3dImage(x, y) = (X/W, Y/W, Z/W)

The matrix Q can be an arbitrary 4x4 matrix, e.g. the one computed by StereoRectify.
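As a short sketch of how the 3D position of a tracked pixel could then be read out (the pixel location (row, col) is assumed to come from the color-segmentation step, and _Q and disparity from the code in section 3.2):

// Convert the disparity map to a 3-channel (X, Y, Z) image and read one pixel.
CvMat* image3D = cvCreateMat(imageSize.height, imageSize.width, CV_32FC3);
cvReprojectImageTo3D(disparity, image3D, &_Q);

// (row, col) is assumed to be the tracked object's pixel in the left rectified image.
CvScalar p = cvGet2D(image3D, row, col);
printf("X = %f  Y = %f  Z = %f\n", p.val[0], p.val[1], p.val[2]);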
3.2) Pseudo code for stereo calibration and 3D vision:
Source Code for this project is available in the compact disc attached at the end of the
report.
Initialize libraries which OpenCV has for computer vision and load ‘C++’ libraries too.
Below are the libraries
#include <cv.h>
#include <cxmisc.h>
#include <highgui.h>
#include <cvaux.h>
#include <vector>
#include <string>
#include <algorithm>
#include <stdio.h>
#include <ctype.h>
using namespace std;
All the elements of the standard C++ library are declared within what is called a
namespace, the namespace with the name ‘std’. So in order to access its functionality we
declare with this expression that we will be using these entities. This line is very frequent
in C++ programs that use the standard library
Given a list of chessboard images, the number of corners (nx, ny) on the chessboards, and
a flag for calibrated (0) or uncalibrated (1: use cvStereoCalibrate(), 2: compute the fundamental
matrix separately) stereo, calibrate the cameras and display the rectified results along with
the computed disparity images.
static void
Creating a function named StereoCalib which takes four parameters from the main
program: the first is the name of the text file that has the list of image files for
calibration; the second and third inputs are the numbers of chessboard inner corners in the X direction
and Y direction respectively; the fourth input is an integer, useUncalibrated, that selects
whether the uncalibrated method is used in this StereoCalib function (0: do not use it, 1: use it).
StereoCalib(const char* imageList, int nx, int ny, int useUncalibrated)
{
Initializing constant values and declaring integers, vectors and float elements that are
used by this program.
int displayCorners = 1; Setting 'displayCorners' to true; it is used in an if statement further below.
int showUndistorted = 1; Setting 'showUndistorted' to true; it is used in an if statement further below.
bool isVerticalStereo = false;
const int maxScale = 1;
const float squareSize = 1.f; //Set this to your actual square size.
The image list file, which is the first input to this function, is opened in read-text mode ("rt")
using fopen: FILE* f = fopen(imageList, "rt");
Declaration of i, j, lr, nframes, n = nx*ny, N = 0 as integers;
Declaration of arrays of strings, CvPoints in 3D or 2D, chars.
vector<string> imageNames[2];
vector<CvPoint3D32f> objectPoints;
vector<CvPoint2D32f> points[2];
vector<int> npoints;
vector<uchar> active[2];
vector<CvPoint2D32f> temp(n);
CvSize is a structure holding the image size. It is initially declared as {0, 0} and the values
are stored in 'imageSize'.
CvSize imageSize = {0, 0};
Initializing arrays, vectors.
Creating multi dimensional arrays for creating CvMat
double M1[3][3], M2[3][3], D1[5], D2[5], Q[4][4];
double R[3][3], T[3], E[3][3], F[3][3];
Creating matrices using function CvMat for saving camera matrix, distortion, rotational,
translation, fundamental and projection matrices between camera 1 and 2.
CvMat _M1 = cvMat(3, 3, CV_64F, M1 ); CameraMatrix for camera 1
CvMat _M2 = cvMat(3, 3, CV_64F, M2 ); CameraMatrix for camera 2
CvMat _D1 = cvMat(1, 5, CV_64F, D1 ); Distortion coefficients for camera 1
CvMat _D2 = cvMat(1, 5, CV_64F, D2 ); Distortion coefficients for camera 2
CvMat _R = cvMat(3, 3, CV_64F, R ); Rotational matrix between cam 1 and 2
CvMat _T = cvMat(3, 1, CV_64F, T );Translational matrix between cam 1 and 2
CvMat _E = cvMat(3, 3, CV_64F, E ); Essential matrix between cam 1 and 2
CvMat _F = cvMat(3, 3, CV_64F, F ); Fundamental matrix between 1 and 2
CvMat _Q = cvMat(4, 4, CV_64F, Q); Projection matrix between 1 and 2
If ( displayCorners is true ) then
cvNamedWindow( "corners", 1 ); this function creates a new window named corners
which we can see the color image as function says ‘1’ at end.
// READ IN THE LIST OF CHESSBOARDS:
if( !f ) if the file could not be opened, the block below prints an error and returns; the file was opened in the earlier part of the program.
the program.
{
fprintf(stderr, "can't open file %s\n", imageList ); displays an error message if the program
could not load the image list.
return;
}
for(i=0;;i++)
{
An array that can store up to 1024 elements of type chars.
char buf[1024];
int count = 0, result=0;
lr = i % 2;
Creating a reference to the vector of points for the current camera; lr is the remainder of 'i' in every iteration (0 for left, 1 for right).
vector<CvPoint2D32f>& pts = points[lr];
if( !fgets( buf, sizeof(buf)-3, f )) if no more file names can be read, exit the loop.
break;
size_t len = strlen(buf);
while( len > 0 && isspace(buf[len-1]))
buf[--len] = '\0'; this loop strips trailing whitespace from the file name; lines that begin with '#' are skipped below.
if( buf[0] == '#')
continue;
IplImage* img = cvLoadImage( buf, 0 ); loads the file with openCV function
cvLoadImage.
if( not a image )
break;
imageSize = cvGetSize(img); getting the size of image by openCV function.
imageNames[lr].push_back(buf); imagename is 0,1,2,3……………….and moves next
step for finding corners of chessboard image from cam 1 and cam 2.
//FIND CHESSBOARDS AND CORNERS THEREIN:
for( int s = 1; s <= maxScale; s++ )
{
IplImage* timg = img; Initializing the loaded image as a temporary image.
if( s > 1 ) this block runs when the scale 's' is greater than one, as it is incremented above.
{
Creating temp image with same properties of image multiplied with‘s’.
timg = cvCreateImage(cvSize(img->width*s,img->height*s),
img->depth, img->nChannels );
resizing img with tmig using openCV function cvResize
cvResize( img, timg, CV_INTER_CUBIC );
}
All operations are done on the temporary image instead of the main loaded image; the result
from the function cvFindChessboardCorners is loaded into 'result'.
result = cvFindChessboardCorners( timg, cvSize(nx, ny),
&temp[0], &count,
CV_CALIB_CB_ADAPTIVE_THRESH |
CV_CALIB_CB_NORMALIZE_IMAGE);
The above function takes temp image and number of corners ‘nx’ and ‘ny’.
if( timg is not equal to img )
then release temp image data as below.
cvReleaseImage( &timg );
if( result || s == maxScale )
for( j = 0; j < count; j++ )
{
The corner coordinates below are divided by 's' and saved back into temp[j].
temp[j].x /= s;
temp[j].y /= s;
}
if( result )
break;
}
Below loop displays the corners on the image.
if( displayCorners )
{
printf("%s\n", buf); displays file which the program is finding corners from.
Creating another temp image called cimg to display corners.
IplImage* cimg = cvCreateImage( imageSize, 8, 3 );
cvCvtColor( img to cimg with CV_GRAY2BGR );
cvDrawChessboardCorners( cimg, cvSize(nx, ny), &temp[0],
count, result ); Draw chessboard corners with colored circles around the found corners
with connecting lines between points.
cvShowImage( "corners", cimg ); displays image on the screen with corners on
chessboard.
cvReleaseImage( &cimg );
if( cvWaitKey(0) == 27 ) //Allow ESC to quit
exit(-1);
}
else
putchar('.');
N = pts.size();
pts.resize(N + n, cvPoint2D32f(0,0));
active[lr].push_back((uchar)result);
//assert( result != 0 );
if( result )
{
Calibration will suffer without sub-pixel interpolation.
cvFindCornerSubPix( img, &temp[0], count, cvSize(11, 11), cvSize(-1,-1),
cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 30, 0.01));
copy( temp.begin(), temp.end(), pts.begin() + N );
}
cvReleaseImage( &img );
}
fclose(f);
printf("\n");
// HARVEST CHESSBOARD 3D OBJECT POINT LIST:
nframes = active[0].size();//Number of good chessboards found
objectPoints.resize(nframes*n);
for( i = 0; i < ny; i++ )
for( j = 0; j < nx; j++ )
objectPoints[i*nx + j] = cvPoint3D32f(i*squareSize, j*squareSize, 0);
for( i = 1; i < nframes; i++ )
copy( objectPoints.begin(), objectPoints.begin() + n, objectPoints.begin() + i*n );
npoints.resize(nframes,n);
N = nframes*n;
CvMat _objectPoints = cvMat(1, N, CV_32FC3, &objectPoints[0] );
CvMat _imagePoints1 = cvMat(1, N, CV_32FC2, &points[0][0] );
CvMat _imagePoints2 = cvMat(1, N, CV_32FC2, &points[1][0] );
CvMat _npoints = cvMat(1, npoints.size(), CV_32S, &npoints[0] );
cvSetIdentity(&_M1);
cvSetIdentity(&_M2);
cvZero(&_D1);
cvZero(&_D2);
// CALIBRATE THE STEREO CAMERAS
printf("Running stereo calibration ...");
fflush(stdout);
The object points and image points from the above section are needed for stereo camera
calibration.
cvStereoCalibrate( &_objectPoints, &_imagePoints1, &_imagePoints2, &_npoints,
&_M1, &_D1, &_M2, &_D2, imageSize, &_R, &_T, &_E, &_F,
cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 100, 1e-5),
CV_CALIB_FIX_ASPECT_RATIO +
CV_CALIB_ZERO_TANGENT_DIST +
CV_CALIB_SAME_FOCAL_LENGTH);
printf(" done\n");
CALIBRATION QUALITY CHECK
Because the output fundamental matrix implicitly includes all the output information, we
can check the quality of calibration using the epipolar geometry constraint:
m2^t*F*m1=0
vector<CvPoint3D32f> lines[2]; initializing two matrices for epipolar line.
points[0].resize(N); points resize for every image for two cameras.
points[1].resize(N);
_imagePoints1 = cvMat(1, N, CV_32FC2, &points[0][0] ); pixel points of first camera
_imagePoints2 = cvMat(1, N, CV_32FC2, &points[1][0] ); pixel points of second camera
lines[0].resize(N); creating a line for every corner point in the image from cam 1
lines[1].resize(N); creating a line for every corner point in the image from cam 2
CvMat _L1 = cvMat(1, N, CV_32FC3, &lines[0][0]);
CvMat _L2 = cvMat(1, N, CV_32FC3, &lines[1][0]);
Always work in undistorted space
cvUndistortPoints( &_imagePoints1, &_imagePoints1, &_M1, &_D1, 0, &_M1 );
cvUndistortPoints( &_imagePoints2, &_imagePoints2, &_M2, &_D2, 0, &_M2 );
The function cvUndistortPoints mathematically removes lens distortion; rectification then
mathematically aligns the images with respect to each other.
cvComputeCorrespondEpilines( &_imagePoints1, 1, &_F, &_L1 );
cvComputeCorrespondEpilines( &_imagePoints2, 2, &_F, &_L2 );
double avgErr = 0;
The measure (avgErr) is the sum of the distances of the points from the
corresponding epipolar lines, to sub-pixel accuracy.
for( i = 0; i < N; i++ )
{
double err = fabs(points[0][i].x*lines[1][i].x + points[0][i].y*lines[1][i].y + lines[1][i].z)
+ fabs(points[1][i].x*lines[0][i].x + points[1][i].y*lines[0][i].y + lines[0][i].z);
avgErr += err;
}
printf( "avg err = %g\n", avgErr/(nframes*n) );
COMPUTE AND DISPLAY RECTIFICATION
if( showUndistorted is true as the value is 1 )
{
Initializing matrices for rectification purposes.
CvMat* mx1 = cvCreateMat( imageSize.height,imageSize.width, CV_32F );
CvMat* my1 = cvCreateMat( imageSize.height,imageSize.width, CV_32F );
CvMat* mx2 = cvCreateMat( imageSize.height,imageSize.width, CV_32F );
CvMat* my2 = cvCreateMat( imageSize.height,imageSize.width, CV_32F );
CvMat* img1r = cvCreateMat( imageSize.height,imageSize.width, CV_8U );
CvMat* img2r = cvCreateMat( imageSize.height,imageSize.width, CV_8U );
IplImage* disp= cvCreateImage( imageSize, IPL_DEPTH_16S, 1 );
CvMat* vdisp = cvCreateMat( imageSize.height,imageSize.width, CV_8U );
CvMat* pair;
double R1[3][3], R2[3][3], P1[3][4], P2[3][4];
CvMat _R1 = cvMat(3, 3, CV_64F, R1);
CvMat _R2 = cvMat(3, 3, CV_64F, R2);
int c=0;
IF BY CALIBRATED (BOUGUET'S METHOD)
if( useUncalibrated == 0 )
{
CvMat _P1 = cvMat(3, 4, CV_64F, P1);
CvMat _P2 = cvMat(3, 4, CV_64F, P2);
cvStereoRectify( &_M1, &_M2, &_D1, &_D2, imageSize,&_R, &_T,&_R1, &_R2,
&_P1, &_P2, &_Q,0/*CV_CALIB_ZER O_DISPARITY*/ );
Return parameters are Rl and Rr, rectification rotations for the left and right image
planes. Similarly, we get back the 3-by-4 left and right projection equations Pl and Pr. An
optional return parameter is Q, the 4-by-4 reprojection matrix used in cvReprojectto3D to
get 3D coordinates of the object.
isVerticalStereo = fabs(P2[1][3]) > fabs(P2[0][3]);
Precompute maps for cvRemap()
cvInitUndistortRectifyMap(&_M1,&_D1,&_R1,&_P1,mx1,my1);
cvInitUndistortRectifyMap(&_M2,&_D2,&_R2,&_P2,mx2,my2);
mx and my of the two cameras from cvInitUndistortRectifyMap are the relocation maps of pixel
points for the row-aligned left and right images.
}
OR ELSE HARTLEY'S METHOD
else if( useUncalibrated == 1 || useUncalibrated == 2 )
Use intrinsic parameters of each camera, but compute the rectification transformation
directly from the fundamental matrix.
{
double H1[3][3], H2[3][3], iM[3][3];
CvMat _H1 = cvMat(3, 3, CV_64F, H1);
CvMat _H2 = cvMat(3, 3, CV_64F, H2);
CvMat _iM = cvMat(3, 3, CV_64F, iM);
Just to show independently used F
if( useUncalibrated == 2 )
The fundamental matrix F is just like the essential matrix E, except that
F operates in image pixel coordinates whereas E operates in physical coordinates. The
fundamental matrix has seven parameters, two for each epipole and three for the
homography that relates the two image planes.
cvFindFundamentalMat( &_imagePoints1,&_imagePoints2, &_F);
The function cvStereoRectifyUncalibrated computes the rectification transformations
without knowing intrinsic parameters of the cameras and their relative position in space,
hence the suffix "Uncalibrated". Another related difference from cvStereoRectify is that
the function outputs not the rectification transformations in the object (3D) space, but the
planar perspective transformations, encoded by the homography matrices H1 and H2.
cvStereoRectifyUncalibrated( &_imagePoints1,&_imagePoints2, &_F,imageSize,&_H1,
&_H2,3);
cvInvert(&_M1, &_iM);
cvMatMul(&_H1, &_M1, &_R1);
cvMatMul(&_iM, &_R1, &_R1);
cvInvert(&_M2, &_iM);
cvMatMul(&_H2, &_M2, &_R2);
cvMatMul(&_iM, &_R2, &_R2);
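Written out, the six calls above compute (a restatement of the code, with M a camera matrix and H its homography): R1 = M1^-1 * H1 * M1 and R2 = M2^-1 * H2 * M2. Each planar homography is thereby expressed in its own camera's coordinates so it can be handed to cvInitUndistortRectifyMap below in place of a true rectification rotation.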
Precompute map for cvRemap()
cvInitUndistortRectifyMap(&_M1,&_D1,&_R1,&_M1,mx1,my1);
cvInitUndistortRectifyMap(&_M2,&_D2,&_R2,&_M2,mx2,my2);
}
else
assert(0);
cvNamedWindow( "rectified", 1 );
RECTIFY THE IMAGES AND FIND DISPARITY MAPS
if( !isVerticalStereo )
Create a single display image with twice the original width so the left and right rectified images sit side by side; for vertical stereo the height is doubled instead so the two images stack in one window.
pair = cvCreateMat( imageSize.height, imageSize.width*2, CV_8UC3 );
else
pair = cvCreateMat( imageSize.height*2, imageSize.width, CV_8UC3 );
Setup for finding stereo correspondences
Correspondence is found by sliding a SAD window of sizes such as 5-by-5, 7-by-7, ..., 21-by-21. For each feature centered in the left image, the corresponding row of the right image is searched for the best match.
CvStereoBMState *BMState = cvCreateStereoBMState();
assert(BMState != 0);
BMState->preFilterSize=7; In the pre-filtering step, the input images are normalized to
reduce lighting differences and to enhance image texture.
BMState->preFilterCap=30;
BMState->SADWindowSize=5; Correspondence is computed by a sliding SAD window
BMState->minDisparity=0; minDisparity is where the matching search should start.
CvScalar p;
if(c>0)
{
if(p.val[2]>-100.00)
{
BMState->numberOfDisparities=256;
The disparity search is then carried out over ‘numberOfDisparities’ counted in pixels.
printf("nd%d\n",BMState->numberOfDisparities);
}
else
{
BMState->numberOfDisparities=128;
printf("nd%d\n",BMState->numberOfDisparities);
}
}
else
{
BMState->numberOfDisparities=256;
}
Each disparity limit defines a plane at a fixed depth from the cameras.
BMState->textureThreshold=10; textureThreshold is a search parameter that rejects low-texture regions.
BMState->uniquenessRatio=15;
BMState->speckleWindowSize=21;
BMState->speckleRange=4;
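The relationship behind the disparity limits noted above (a standard result for a rectified, row-aligned pair, stated here for reference): with focal length f in pixels, baseline T between the cameras, and disparity d in pixels, the depth is Z = f*T/d. A larger numberOfDisparities therefore lets the matcher handle objects closer to the cameras (large d), at the cost of a longer search per pixel.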
c++;
CvCapture *capture1 = 0; Initializing capture for camera parameters.
CvCapture *capture2 = 0;
IplImage *imageBGR1 = 0; Initializing frame images from capture and setting initially
to null.
IplImage *imageBGR2 = 0;
int key = 0;
capture1 = cvCaptureFromCAM( 0 ); The function that initializes cameras 1 and 2.
capture2 = cvCaptureFromCAM( 1 );
Checking if capture is happening.
if ( !capture1 ) {
fprintf( stderr, "Cannot initialize webcam!\n" );
} Create a window for the video
cvNamedWindow( "result", CV_WINDOW_AUTOSIZE );
while( key != 'q' ) This loop grabs frames from both captures continuously.
{ Get a frame from each camera.
imageBGR1 = cvQueryFrame( capture1 );
imageBGR2 = cvQueryFrame( capture2 );
Check the frame:
if( !imageBGR1 ) break; if there is no frame, break out of the loop.
Create some GUI windows for output display.
cvShowImage("Input Image1", imageBGR1);
cvShowImage("Input Image2", imageBGR2);
IplImage* imageHSV1 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); Full HSV color image.
cvCvtColor(imageBGR1, imageHSV1, CV_BGR2HSV); Converting from BGR to HSV, which makes it easy to separate the color planes (hue is relatively insensitive to lighting, so the colored bat can be isolated mainly by a hue range).
cvShowImage("s1",imageHSV1);
initializing planes
IplImage* planeH1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Hue component.
IplImage* planeS1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Saturation
component.
IplImage* planeV1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Brightness
component.
cvCvtPixToPlane(imageHSV1, planeH1, planeS1, planeV1, 0); Split the HSV image into its Hue, Saturation and Value planes.
IplImage* planeH11 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Hue component.
IplImage* planeS11 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Saturation
component.
IplImage* planeV11 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Brightness
component.
Converting pixel to plane
cvCvtPixToPlane(imageHSV1, planeH11, planeS11, planeV11, 0); Extracting the 3 color
components.
Setting up saturation and brightness to maximum in order to separate Hue plane from
others.
cvSet(planeS11, CV_RGB(255,255,255));
cvSet(planeV11, CV_RGB(255,255,255));
IplImage* imageHSV11 = cvCreateImage( cvGetSize(imageBGR1), 8, 3);
Full HSV color image.
IplImage* imageBGR11 = cvCreateImage( cvGetSize(imageBGR1), 8, 3);
Full RGB color image.
cvCvtPlaneToPix( planeH11, planeS11, planeV11, 0, imageHSV11 );
Combine the separate color components into one image.
cvCvtColor(imageHSV11, imageBGR11, CV_HSV2BGR);
Convert from HSV back to BGR.
cvReleaseImage(&planeH11);cvReleaseImage(&planeS11);
cvReleaseImage(&planeV11);
cvReleaseImage(&imageHSV11); cvReleaseImage(&imageBGR11);
The same steps used above for the Hue component are repeated for the other color components.
IplImage* planeH21 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Hue component.
IplImage* planeS21 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Saturation
component.
IplImage* planeV21 = cvCreateImage( cvGetSize(imageBGR1), 8, 1);Brightness
component.
cvCvtPixToPlane(imageHSV1, planeH21, planeS21, planeV21, 0); //Extract the 3 color
components.
cvSet(planeS21, CV_RGB(255,255,255));
//cvSet(planeV21, CV_RGB(255,255,255));
IplImage* imageHSV21 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); HSV color
image.
IplImage* imageBGR21 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); RGB color
image.
cvCvtPlaneToPix( planeH21, planeS21, planeV21, 0, imageHSV21 ); Combine the separate color components.
cvCvtColor(imageHSV21, imageBGR21, CV_HSV2BGR); Convert from HSV back to BGR.
cvReleaseImage(&planeH21);cvReleaseImage(&planeS21);cvReleaseImage(&planeV21
);
cvReleaseImage(&imageHSV21);cvReleaseImage(&imageBGR21);
IplImage* planeH31 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Hue component.
IplImage* planeS31 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Saturation
component.
IplImage* planeV31 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Brightness
component.
cvCvtPixToPlane(imageHSV1, planeH31, planeS31, planeV31, 0); Extract the 3 color
components.
//cvSet(planeS31, CV_RGB(255,255,255));
cvSet(planeV31, CV_RGB(255,255,255));
IplImage* imageHSV31 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); // Full HSV
color image.
IplImage* imageBGR31 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); // Full RGB
color image.
cvCvtPlaneToPix( planeH31, planeS31, planeV31, 0, imageHSV31 ); Combine separate
color components into one.
cvCvtColor(imageHSV31, imageBGR31, CV_HSV2BGR); Convert from HSV back to BGR.
cvReleaseImage(&planeH31);cvReleaseImage(&planeS31);cvReleaseImage(&planeV31
);
cvReleaseImage(&imageHSV31);cvReleaseImage(&imageBGR31);
cvThreshold(planeH1, planeH1, 170, UCHAR_MAX, CV_THRESH_BINARY);
cvThreshold(planeS1, planeS1, 171, UCHAR_MAX, CV_THRESH_BINARY);
cvThreshold(planeV1, planeV1, 136, UCHAR_MAX, CV_THRESH_BINARY);
// Show the thresholded HSV channels
IplImage* img1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // Greyscale output
image.
cvAnd(planeH1, planeS1, img1);imageColor = H {BITWISE_AND} S.
cvAnd(img1, planeV1, img1);
imageColor = H {BITWISE_AND} S {BITWISE_AND} V.
Show the output image on the screen.
cvNamedWindow("Skin Pixels1", CV_WINDOW_AUTOSIZE);
cvShowImage("color Pixels1", img1); form camera 1 and same above steps are repeated
for second camera
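The three cvThreshold calls plus the two cvAnd calls above can equivalently be written as a single range test on the HSV image. A minimal sketch of that alternative (illustrative only; mask1 is a hypothetical name, and the bounds reproduce the strict "greater than threshold" behavior of the code above):
IplImage* mask1 = cvCreateImage( cvGetSize(imageHSV1), 8, 1 ); // single-channel binary mask
cvInRangeS( imageHSV1, cvScalar(171, 172, 137, 0), cvScalar(256, 256, 256, 0), mask1 ); // H > 170, S > 171, V > 136
// mask1 now matches the img1 produced above by the separate thresholds and bitwise ANDs.
Note that for 8-bit images OpenCV stores hue as 0-179, so H > 170 corresponds to a narrow band of reddish hues.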
IplImage* imageHSV2 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full HSV
color image.
cvCvtColor(imageBGR2, imageHSV2, CV_BGR2HSV);
//cvShowImage("s",imageHSV2);
IplImage* planeH2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1);// Hue component.
IplImage* planeS2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1);// Saturation
component.
IplImage* planeV2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1);// Brightness
component.
cvCvtPixToPlane(imageHSV2, planeH2, planeS2, planeV2, 0);
Split the HSV image into its Hue, Saturation and Value planes.
IplImage* planeH12 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Hue component.
IplImage* planeS12 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Saturation
component.
IplImage* planeV12 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Brightness
component.
cvCvtPixToPlane(imageHSV2, planeH12, planeS12, planeV12, 0); Extract the 3 color
components.
cvSet(planeS12, CV_RGB(255,255,255));cvSet(planeV12, CV_RGB(255,255,255));
IplImage* imageHSV12 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full HSV
color image.
IplImage* imageBGR12 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full RGB
color image.
cvCvtPlaneToPix( planeH12, planeS12, planeV12, 0, imageHSV12 );// Combine separate
color components into one.
cvCvtColor(imageHSV12, imageBGR12, CV_HSV2BGR); // Convert from HSV back to BGR.
cvReleaseImage(&planeH12);cvReleaseImage(&planeS12);cvReleaseImage(&planeV12
);
cvReleaseImage(&imageHSV12);cvReleaseImage(&imageBGR12);
IplImage* planeH22 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Hue component.
IplImage* planeS22 = cvCreateImage( cvGetSize(imageBGR2), 8, 1);Saturation
component.
IplImage* planeV22 = cvCreateImage( cvGetSize(imageBGR2), 8, 1);Brightness
component.
cvCvtPixToPlane(imageHSV2, planeH22, planeS22, planeV22, 0); Extract the 3 color
components.
cvSet(planeS22, CV_RGB(255,255,255));//cvSet(planeV2, CV_RGB(255,255,255));
IplImage* imageHSV22 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full HSV
color image.
IplImage* imageBGR22 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full RGB
color image.
cvCvtPlaneToPix( planeH22, planeS22, planeV22, 0, imageHSV22 ); // Combine the separate color components into one.
cvCvtColor(imageHSV22, imageBGR22, CV_HSV2BGR); // Convert from HSV back to BGR.
cvReleaseImage(&planeH22);cvReleaseImage(&planeS22);cvReleaseImage(&planeV22
);
cvReleaseImage(&imageHSV22);cvReleaseImage(&imageBGR22);
IplImage* planeH32 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Hue component.
IplImage* planeS32 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Saturation
component.
IplImage* planeV32 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Brightness
component.
cvCvtPixToPlane(imageHSV2, planeH32, planeS32, planeV32, 0); // Extract the 3 color
components.
//cvSet(planeS3, CV_RGB(255,255,255));cvSet(planeV32, CV_RGB(255,255,255));
IplImage* imageHSV32 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full HSV
color image.
IplImage* imageBGR32 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full RGB
color image.
cvCvtPlaneToPix( planeH32, planeS32, planeV32, 0, imageHSV32 ); Combine separate
color components into one.
cvCvtColor(imageHSV32, imageBGR32, CV_HSV2BGR); Convert from HSV back to BGR.
cvReleaseImage(&planeH32);cvReleaseImage(&planeS32);cvReleaseImage(&planeV32
);cvReleaseImage(&imageHSV32);cvReleaseImage(&imageBGR32);
cvThreshold(planeH2, planeH2, 170, UCHAR_MAX, CV_THRESH_BINARY);
cvThreshold(planeS2, planeS2, 171, UCHAR_MAX, CV_THRESH_BINARY);
cvThreshold(planeV2, planeV2, 136, UCHAR_MAX, CV_THRESH_BINARY);
// Show the thresholded HSV channels
IplImage* img2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // Greyscale output
image.
cvAnd(planeH2, planeS2, img2);// imagecolor = H {BITWISE_AND} S.
cvAnd(img2, planeV2, img2);// imagecolor = H {BITWISE_AND} S {BITWISE_AND}
V.
// Show the output image on the screen.
//cvNamedWindow("color Pixels2", CV_WINDOW_AUTOSIZE);
//cvShowImage("color Pixels2", img2);
These color-pixel images are input to cvRemap() to rectify both color images.
if( img1 && img2 ) Proceed only when both color images are available.
{
CvMat part;
cvRemap( img1, img1r, mx1, my1 );
cvRemap( img2, img2r, mx2, my2 );
if( !isVerticalStereo || useUncalibrated != 0 )
{
// When the stereo camera is oriented vertically, useUncalibrated==0 does not transpose the image, so the epipolar lines in the rectified images are vertical. The stereo correspondence function does not support such a case.
Applying StereoCorrespondenceBM to rectified color images to get disparity map.
cvFindStereoCorrespondenceBM( img1r, img2r, disp, BMState);
Saving the disparity to a text file (optional): //cvSave("disp.txt",disp);
IplImage* real_disparity= cvCreateImage( imageSize, IPL_DEPTH_8U, 1 );
cvConvertScale( disp, real_disparity, 1.0/16, 0 ); cvNamedWindow( "disparity" );
cvShowImage( "disparity",real_disparity );
if( useUncalibrated == 0 )//Using Bouguet, we can calculate the depth
{
Calculating depth with the Bouguet method using the function ‘cvReprojectImageTo3D’; the input is the disparity calculated in the step above.
IplImage* depth = cvCreateImage( imageSize, IPL_DEPTH_32F, 3 );
cvReprojectImageTo3D(real_disparity , depth, &_Q);
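For reference, the relation this call evaluates for every pixel (standard cvReprojectImageTo3D behavior; W and d are introduced here only for the explanation): given a pixel (x, y) with disparity d, it computes [X Y Z W]^T = Q * [x y d 1]^T and stores (X/W, Y/W, Z/W) in the corresponding pixel of the depth image, so the three channels hold camera-frame coordinates in the same units used for the chessboard square size during calibration; the Chapter 4 results are reported in centimeters.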
Steps to display the depth of the colored object (X, Y, Z coordinates from the cameras) in the console window:
The steps below find the minimum, maximum and average distance of the colored pixels from the camera.
int l=0;
float r[10000]; float o[10000]; float m[10000]; Creating arrays for storing the Z, X and Y pixel values extracted from the depth image.
for(int i=0;i<imageSize.height;i++){
for (int j=0;j<imageSize.width;j++)
{
CvScalar s;
p=cvGet2D(depth,i,j); // get the (i,j) pixel value
s=cvGet2D(real_disparity,i,j); Get the disparity at (i, j); only pixels with non-zero disparity (those belonging to the colored object) are used.
if(s.val[0]!=0)
{
//printf("X=%f, Y=%f, Z=%f\n",p.val[0],p.val[1],p.val[2]);
r[l]=p.val[2];
o[l]=p.val[0];
m[l]=p.val[1];
//printf("%f",r[l]);
//printf("------------next->>>>>>>>");
l++;
}
//if(l==1)
//break;
}
}
float minr =r[0];float mino =o[0];float minm =m[0];
float maxr= r[0];float maxo= o[0];float maxm= m[0];
float sumr=0;float sumo=0;float summ=0;
float avgr; float avgo; float avgm;
Calculating the sum of all X, Y and Z values of the colored pixels.
for(int pl=0;pl<l;pl++)
{
sumr=sumr+r[pl];sumo=sumo+o[pl];summ=summ+m[pl];
Calculating the minimum of all X, Y and Z values of the colored pixels; this is the same as finding the min, max and average of the elements of an array.
if (r[pl] < minr)
{
minr = r[pl];
}
if (r[pl] > maxr)
{
maxr = r[pl];
}
if (o[pl] < mino)
{
mino = o[pl];
}
Calculating maximum;
if (o[pl] > maxo)
{
maxo = o[pl];
}
if (m[pl] < minm)
{
minm = m[pl];
}
if (m[pl] > maxm)
{
maxm = m[pl];
}}
avgr=(float)sumr/l;
avgo=(float)sumo/l;
avgm=(float)summ/l;
Outputting value to console window
printf("MAX Z=%f\nMIN Z=%f\n",minr,maxr);
printf("AVG Z=%f\n",avgr);
printf("MAX X=%f\nMIN X=%f\n",mino,maxo);
printf("AVG X=%f\n",avgo);
printf("MAX Y=%f\nMIN Y=%f\n",minm,maxm);
printf("AVG Y=%f\n",avgm);
printf("X=%f, Y=%f, Z=%f\n",avgo,avgm,avgr);
printf("----------\n");
}
The part below is used only to draw green lines on the pair image so that we can check whether the left and right images are rectified. /* if( !isVerticalStereo )
{
cvGetCols( pair, &part, 0, imageSize.width ); Copy elements from multiple adjacent
columns of an array
cvCvtColor( img1r, &part, CV_GRAY2BGR );
cvGetCols( pair, &part, imageSize.width, imageSize.width*2 );
cvCvtColor( img2r, &part, CV_GRAY2BGR );
for( j = 0; j < imageSize.height; j += 16 )
cvLine( pair, cvPoint(0,j),
cvPoint(imageSize.width*2,j),
CV_RGB(0,255,0));
}
else
{
cvGetRows( pair, &part, 0, imageSize.height );
cvCvtColor( img1r, &part, CV_GRAY2BGR );
cvGetRows( pair, &part, imageSize.height,
imageSize.height*2);
cvCvtColor( img2r, &part, CV_GRAY2BGR );
for( j = 0; j < imageSize.width; j += 16 )
cvLine( pair, cvPoint(j,0),
cvPoint(j,imageSize.height*2),
CV_RGB(0,255,0));
}
//cvShowImage( "rectified", pair );*/
key = cvWaitKey(1);
}
cvReleaseImage( &img1 );
cvReleaseImage( &img2 );
}
cvReleaseStereoBMState(&BMState);
cvReleaseMat( &mx1 );
cvReleaseMat( &my1 );
cvReleaseMat( &mx2 );
cvReleaseMat( &my2 );
cvReleaseMat( &img1r );
cvReleaseMat( &img2r );
cvReleaseImage( &disp );
}
}
int main(void) The main function.
{
StereoCalib("1.txt", 5, 7, 0);
return 0;}
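A hedged reading of this call, assuming StereoCalib follows the usual (imageList, nx, ny, useUncalibrated) parameter order of the standard OpenCV stereo calibration sample: "1.txt" is the text file listing the left/right chessboard image pairs, 5 and 7 would be the inner-corner counts of the chessboard in the two directions, and the final 0 selects the calibrated (Bouguet) rectification path, i.e. useUncalibrated == 0, matching the branch used above.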
Chapter 4
RESULTS OF CALIBRATION AND 3D VISION
4.1) Project Application information:
The major purpose of this project is to design a sensor that can detect and track the player's bat for a table tennis shooter. The sensor can be linked to a robotic system that throws the ball to the player's bat for training purposes. The sensor is designed to detect a colored object (the player's bat) at a range of 2 to 3 meters from the sensor. The sensors that see the colored object are stereo cameras placed at a convenient location for tracking; the outputs are the (X, Y, Z) coordinates of the colored object from one of the cameras, in the camera coordinate system. These coordinates can be input into the robotic system (the shooter) so that the shooter can move its joints to throw the ball for different kinds of training. The shooter basically has two training modes. 1. Random mode: the shooter throws anywhere on the table for a second stage of training. 2. Spot mode: the shooter throws the ball onto the table so that, after bouncing, it hits the player's bat exactly; for spot mode the shooter needs the bat coordinates from the stereo cameras (sensors). Intelligence and machine learning techniques could be added to the shooter once the coordinates of the object are available.
For this specific project we used two USB cameras as the sensor to see the object. The cameras are placed approximately 50 cm apart to see an object 2 to 3 meters away from the sensors. Before the cameras see the 3D object, they need to be calibrated so that the images from the two cameras are vertically aligned.
A few constraints need to be followed every time we track the object. The constraints are as follows:
1. The object being tracked should always be clearly seen by both cameras; if it is not, the code stops outputting the coordinates of the object and resumes once the cameras see the object again.
2. The camera focus and the distance between the cameras are arranged so that the player's playing area is clearly seen by both cameras. The player's bat should be on the image plane of both cameras.
3. The cameras must not be moved or refocused after calibration; if they are moved, we need to calibrate once again with new calibration data.
4. For tracking an object at a larger distance, the camera separation, angle and focus need to be adjusted accordingly. The larger the distance between the cameras, the farther away an object can be tracked. The focus can be adjusted according to the size of the object being tracked. The distance between the cameras is the major factor for tracking the object.
The outputs are the (X, Y, Z) coordinates of the colored object. The outputs are categorized as Min, Max and Average because not all pixels show the same value, but the average value is approximately equal (within 1 to 3 cm) to the coordinates of the object being tracked.
4.2) Coordinates of a colored object in front of cameras:
Figure 4.1: Camera coordinate system.
X, Y and Z values are from the left camera. The program continuously outputs the
coordinates of a colored object in a console window.
Maximum Z=-91.37 cm (The maximum depth value of a pixel in the disparity image).
Minimum Z= -70.63 cm (The minimum depth value of a pixel in the disparity image).
Average Z=-76.74 cm (The average depth value of pixels in the disparity image, which closely matches the real depth of the colored object).
Maximum X=-51.85 cm (The maximum X value of a pixel in the disparity image).
Minimum X= -40.46 cm (The minimum X value of a pixel in the disparity image).
Average X=-43.89 cm (The average X value of pixels in the disparity image, which closely matches the real X value of the colored object).
Maximum Y=-3.51 cm (The maximum Y value of a pixel in the disparity image).
Minimum Y= -2.08 cm (The minimum Y value of a pixel in the disparity image).
Average Y=-2.83 cm (The average Y value of pixels in the disparity image, which closely matches the real Y value of the colored object).
4.3) Results are graphically shown below with left and right calibration data.
Figure 4.2: Detected corners on the image taken from left camera.
Figure 4.3: Detected corners on the image taken from right camera.
Figure 4.4: Rectified image pair.
Each pixel in the left image is vertically aligned with the right image.
Figure 4.5: Average error, displayed to sub-pixel accuracy.
Figure 4.6: Coordinates of an object with respect to the left camera.
The colored object is moved towards and away from the camera to check whether the cameras are tracking the object correctly.
Figure 4.7: Disparity between the left and right images.
Disparity is the pixel distance between matching pixels in the left and right images. Each disparity limit defines a plane at a fixed depth from the cameras.
Chapter 5
CONCLUSION
The purpose of this project is to design a sensor that can detect the player's bat for a table tennis shooter. We have designed this sensor using two USB cameras and the OpenCV computer vision library, which provides a set of important functions that do much of the work for us. The program was coded in C++ using the OpenCV library. The program calibrates the stereo cameras, rectifies and undistorts the images, and finally outputs the coordinates of the object. The program was written to read a number of chessboard images from a text file containing a list of left and right stereo (chessboard) image pairs, which are used to calibrate the cameras and then also to rectify the images. The code next reads the left and right image pairs and finds the chessboard corners to sub-pixel accuracy, saving the object and image points for all images. With the list of image and object points from the good chessboard images, the code calls cvStereoCalibrate() to calibrate the cameras. The calibration function outputs the camera matrix and distortion vector for the two cameras; it also outputs the rotation matrix, the translation vector, the essential matrix, and the fundamental matrix. The accuracy of the calibration can be checked by measuring how nearly the points in one image lie on the epipolar lines of the other image; the epipolar lines are computed by the function cvComputeCorrespondEpilines. The average epipolar error is 0.29 with our calibration data. The code then computes the rectification maps using either the uncalibrated method cvStereoRectifyUncalibrated() or the calibrated (Bouguet) method cvStereoRectify(). At this point we have rectification data for each source pixel of both the left and right images. The distance between the cameras and their focus should not be changed after calibration, because the calibration function establishes the rotational and translational relation between the two cameras. We initialized the two cameras and made them take images of the colored object for tracking. HSV color thresholding is used so that the images from the two cameras show only the colored object. The images from the left and right cameras are rectified using cvRemap(), and lines are drawn across the image pair so we can see how well the rectified images are aligned. The block-matching state is created using cvCreateStereoBMState(), and the disparity maps are then computed using cvFindStereoCorrespondenceBM(). The disparity is passed to cvReprojectImageTo3D() to get the depth image. The (X, Y, Z) coordinates of the colored object are encoded in the depth image; this function also takes the reprojection matrix, which encodes the distance between the two cameras and the focal length of each camera.
These coordinates of the colored object (the player's bat) can be fed to the shooter, allowing it to throw the ball exactly to the player's bat for training purposes. The shooter is then able to see the 3D scene, which makes it possible to apply machine learning and intelligent techniques for advanced training. Machine learning would let the shooter understand the player's skill on different shots and throw the ball intelligently for all kinds of training purposes.
Chapter 6
FUTURE WORK
• Work can be extended by adding machine learning techniques for better accuracy of results.
• Work can be extended by designing and implementing a table tennis shooter (a robotic system) that can throw a ball to the player's bat for training purposes. This robot can take the 3D coordinates of the bat from this project's code.
• Work can be extended for industrial inspection purposes.
• Work can be extended to develop an application for part modeling; this application can be linked to different modeling software (for example SolidWorks, CATIA, Pro/E, etc.) for faster part modeling.
• Work can be extended to detect an object in front of a vehicle (unmanned vehicle).