Basic Image Manipulation and Perspective Projection Om Pavithra Bonam and Sridhar Godavarthy University of South Florida {obonam, sgodavar}@cse.usf.edu Computer Vision (CAP 6415 : Project 0) 1 Abstract Perspective projection is the process of rendering an object in 3-dimensions as it would appear on an image screen when captured on a camera. In this project we attempt to obtain the perspective projection of an object -given a camera, the location of the object and the location of the image plane. The projected image is then scaled and saved to an image file. The program was able to perform the conversion accurately and saved the images to a PGM file. The program was also able to gracefully handle any erroneous input. 2 Objectives Write a program to read and write PGM files and perform simple operations on PGM images. To understand perspective projection, the relation between real world, camera world and pixel coordinates and to convert from one to the other and to use this understanding to write a program to project objects in 3 dimensions onto an image screen and also to save the image to a file. 3 Programming and Problem solving The entire program was written in C++. Once we read the input file from the user giving the specs of the ROI and image files, the next part in programming involved reading and writing to a PGM file. The PGM file is a very simple format and does not contain any excess information. The PGM file format is as follows: 3.1 File Format P5 PGM in binary format # place for any comments numrows numcols max_gray_scale_value Pixels… Once the format is known, reading is only a file operation. The trick here was to write a function that can read characters and another function that could read integers. This helped save a lot of programming effort as calling these functions repetitively helped read the entire file without much additional overload and also helped compact the code. By slightly tweaking these functions to handle spaces and other punctuations, we were able to read the file successfully. Memory for the pixel values was allocated dynamically. 3.2 Implementing the ROI The ROI is represented as a pair of x and y coordinates corresponding to the top left and bottom right corners. Whenever an operation is requested, we check to see if an ROI has been specified and if it has, then start from the left corner and work our way towards the right bottom pixel performing the required operations. This way, we avoid testing the other pixels. This is made possible by the fact that ROIs are continuous rectangular regions. 3.3 Image Operations In order to find the minimum and maximum gray scale operations in the ROI, we are left with no option but to sequentially scan every pixel checking for the maximum and minimum at every pixel. once the minimum and maximum values have been obtained, the threshold is calculated. the threshold needs to be applied on the whole image and hence e ignore the ROI and compare each pixel to see if it falls above or below the threshold and replace the pixel. Finally, saving the image to a PGM file is just a file write operation once we have intelligent functions that are capable of writing to the file. 3.4 The im class All the image operations were encapsulated in a class called im. This class provides for almost all of the basic operations needed on an image including reading, writing, accessing and setting individual pixels, initializing an empty image, copying from one image to another, setting and getting ROI, setting all pixels to white or to black etc. 3.5 Aligning the camera axes with the real world axes This was probably the toughest challenge in the entire project. The complications were mainly due to our insufficient understanding of the alignment facts and also due to our virtual zero knowledge of coordinate geometry. We were unable to proceed until we had a complete brush up of the basic concepts in coordinate geometry. Finding pan and tilt required some serious thought. Finally, because the xy plane is parallel to the x axis, the calculation of the theta angle reduced to a simple angle between two points. Calculating alpha on the other hand was slightly more difficult. We arrived at a solution by using the vector dot product. If we consider the Z axis as one vector and the camera axis as the other vector, their dot product is given by 𝐴. 𝐵 = |𝐴|. |𝐵|. 𝑐𝑜𝑠 𝛼 ------(1) from which the angle α can be calculated. Creating vectors given two points is basic. 3.6 Converting from real world to image plane coordinates Once we had the pan and tilt, conversion of any point from the real world coordinate system to the image plane was straight forward by using the equations 2.5-42 and 2.5-43 At the end of this stage, we have converted from a 3-D object to a 2-D image as seen through a camera and have discarded the depth information 3.7 Converting from image plane coordinates to pixel coordinates The coordinates now are in the image plane. However, they are still in the real world coordinates with the origin (0, 0) at the center. In order to display these vertices on a computer screen or in an image, we will need to first convert them to the pixel coordinates. The difference is that in the pixel coordinates system, the origin is at the left top and there are no negative coordinates! This conversion can be achieved by the conversion functions given in the textbook and replicated here for completion: 𝑖= 𝑛−1 − 2 𝑗= 𝑚−1 + 2 𝑦 ----------(2) 𝑥 ----------(3) where, x,y are the coordinates in the image plane coordinates m,n are the dimensions of the image i,j are the pixel coordinates. Although we attempted to plot the original image plane coordinates, they were really small in the order of a hundredth and the plotted images could not really be deciphered without any scaling. So, we went ahead and scaled the image plane coordinates before converting them to the pixel coordinates. The pixel coordinates were, of course, rounded off to the nearest integer value. 3.8 Drawing the edges in the image This was the most tedious task we had to undergo. Trying to explain it in detail would be an injustice, both to the effort we put in and the actual objective of understanding the coordinate systems and their conversions. We will satisfy ourselves and the reader by saying that we adopted a simple slope intercept conversion and used the slope and the intercept at each point to calculate the next point on the line. We had to be careful to see that we did not attempt to draw the line beyond the image boundaries. We were aided by the fact that the image plane was restricted from +1 to -1 and hence we were able to use the scaling itself as a boundary condition. 4 Observations The major observation was the conversion from the real world to perspective images. It was an experience to actually see the images as we had once drawn in our Engineering drawing class. The only difference was that we did not perform hidden line removal. That would have definitely made the final image look much better, but then we can already see the magnitude of the effort we would have had to put in to complete that. The exercise was an eye opener when it came to the relationship between pan, tilt, the real axes, the image plane and the camera location. We had to put in a lot of thought and even ended up making mini models of the three axes to help us in our understanding, but it was all worth the effort as we are now able to visualize, with much less effort, the actual alignment process. Another interesting observation was the actual conversion between the real world and the pixel coordinates. This is probably the first exercise where we have combined these two and converted between them. We have done several image processing operations with a lot of implicit assumption about the origin, rows and columns. Only after attempting to program this did we realize that the rows and columns are actually interchanged between the pixel and real coordinates and so was the origin shift. Definitely worth mentioning was hidden line removal. Our excitement on seeing the final output image was huge, but very short lived. We realized how important it is to perform hidden line removal. It kind of marred the whole image. Line drawing, between two points, is not as easy a task as it seems. What we thought would be the most trivial part of the project turned out to be the most tedious! We spent most of our time on this. Initially, we thought that we would just have to join along the vertical, horizontal or diagonal, but it was when we put our fingers to the keyboard that we realized the intricacies involved. But once we got the idea of using the slope, it was a cake walk. Some effort was saved by the fact that the x-axis was parallel to the x-y plane. Because of this, instead of having to go through the vector cross product to find the pan, we could just project the point onto the x-y plane and calculate the angle between two straight lines. We did observe that the angle was being measured clockwise and hence needed to be negative. This fact, though, was not intuitive and we discovered it only during our sanity check. The restriction of the image plane to +1 , -1 helped reduce a lot of error and bounds checking when displaying the image. 5 Results 5.1 Binarization Following are some of the output image after binarization. The program was tested on several images of varying sizes and ranges. The program performed quite effectively and produced accurate results. Figures 3, 4 and 5 are individually capable of confirming the correct working of the program. The input images used in these cases are uniquely designed to be of block form and the ROI have been selected to isolate the black or white regions. The ROI used, the maximum and minimum gray scale values have been mentioned in the captions of the figures. (a) (b) Fig 1. (a) Input PGM image(64 x 64) and (b) Result of binarization with a threshold based on the specified ROI (10,100 82,200). calculated Lmax:239 and Lmin:0 (a) (b) Fig 2. (a) Input PGM image(512 x 512) and (b) Result of binarization with a threshold based on the specified ROI (10,100 82,200). calculated Lmax:200 and Lmin:86 (a) (b) Fig 3. (a) Input PGM image(400 x 400) and (b) Result of binarization with a threshold based on the specified ROI (150, 200 300, 350). calculated Lmax:255 and Lmin:0 (a) (b) Fig 4. (a) Input PGM image(400 x 400) and (b) Result of binarization with a threshold based on the specified ROI (150, 200 300, 350). calculated Lmax:255 and Lmin:255 (a) (b) Fig 5. (a) Input PGM image(400 x 400) and (b) Result of binarization with a threshold based on the specified ROI (50, 50 80, 60). calculated Lmax:0 and Lmin:0 (a) (b) Fig 6. (a) Input PGM image(800 x 800) and (b) Result of binarization with a threshold based on the specified ROI (150, 200 300, 350). calculated Lmax:227 and Lmin:0 5.2 Perspective Projection The results turned out to be much more beautiful and accurate than we dared to hope for! We tried our program for various objects and sizes. The following figures show these results. The figures each show the image as scaled to a 256x256 image and also the coordinates used for the object in real world coordinates. Figure 9 shows an image where the image plane coordinates were beyond the maximum of (+1 to -1). As can be seen the program clearly displays only the portion of the coordinates that can actually be projected onto the image plane. 10 10 10 0 0 0 3 12 8 0 0 0 0 0 1 0 1 0 1 0 0 1 1 1 1 1 0 1 0 1 0 1 1 1 2 1 3 1 4 5 6 5 7 5 8 2 7 2 8 3 6 3 8 4 6 4 7 Fig 7. A cube in perspective vision on a 256x256 canvas. The real world coordinates are given next to it. 10 10 10 0 0 0 3 12 8 0 0 0 0 0 4 0 4 0 4 0 0 4 4 4 4 4 0 4 0 4 0 4 4 1 2 1 3 1 4 5 6 5 7 5 8 2 7 2 8 3 6 3 8 4 6 4 7 Fig 8. Another cube in perspective vision on a 256x256 canvas. The real world coordinates are given next to it. 10 10 10 0 0 0 3 12 8 0 0 0 0 0 16 0 16 0 16 0 0 16 16 16 16 16 0 16 0 16 0 16 16 1 2 1 3 1 4 5 6 5 7 5 8 2 7 2 8 3 6 3 8 4 6 4 7 Fig 9. Yet another cube in perspective vision on a 256x256 canvas. The real world coordinates are given next to it. Note that the edges of the cube extend beyond the canvas and are being truncated. 10 10 10 0 0 0 3 12 8 0 0 0 0 0 4 0 10 0 5 0 0 5 10 4 5 4 0 5 0 4 0 10 4 1 2 1 3 1 4 5 6 5 7 5 8 2 7 2 8 3 6 3 8 4 6 4 7 Fig 10. A cuboid in perspective vision on a 256x256 canvas. The real world coordinates are given next to it. 10 10 10 0 0 0 3 12 8 0 0 0 0 0 4 0 10 0 4 0 0 4 10 4 4 10 0 4 0 4 0 10 4 1 2 1 3 1 4 5 6 5 7 5 8 2 7 2 8 3 6 3 8 4 6 4 7 Fig 11. A square prism in perspective vision on a 256x256 canvas. The real world coordinates are given next to it. 6 Conclusion We were able to effectively convert real world coordinates into image plane coordinates and eventually into pixel coordinates, scale the resulting image and also save it to a PGM file. We were able to write our own program to read and write PGM files successfully. The program has been tested on several images and also checked for boundary conditions. The program handles all scenarios and exits gracefully in scenarios involving faulty input. We now have an image class that can be used to perform many standard operations on a PGM image. We were able to understand the concept of perspective projection and also the importance of axes alignment and the pan and tilt angles. 7 References [1]Rafael C. Gonzalez , Richard E. Woods, Digital Image Processing, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1992 [2] R. Jain, R. Kasturi, and B. G. Schunck, Machine Vision (McGraw-Hill, 1995) [3] www.about.com