Shape inference for sheet of paper with text via characteristic strips

Project by: Arie Kozak

1. Introduction

In this project I attempt to perform shape inference, mainly using the characteristic strips algorithm. In general the algorithm is not easily applicable, so this is done under a certain set of assumptions (stated below) in a controlled environment. The sheet of paper (excluding text) is, at least approximately, a Lambertian surface with constant albedo, which makes it a good case for applying the algorithm. Given a photograph of a sheet with text, the program outputs a plot of the sheet in 3-dimensional space. The photograph is taken with a single light source from above, and the sheet is curved (generally non-planar), since a planar sheet produces a trivial solution.

2. Approach and method

The following main assumptions have been made:

1. Single point (infinite) light source.
2. The scene is illuminated and photographed from above, meaning L = E = (0, 0, 1).
3. Most of the surface is paper without text.
4. Dark text on bright paper.
5. $\partial H(x_0, y_0)/\partial x = \partial H(x_0, y_0)/\partial y = 0$ for some $(x_0, y_0)$ (a local maximum point).
6. The surface H(x, y) is differentiable (meaning no surface/depth discontinuities) and constant in one direction in some coordinate system whose z axis has the direction of the light source (L or E).

The process is divided into several steps.

1. Separate the sheet from the rest of the image (background).

This step is done mostly manually (there is enough work as it is, and time is limited). The area of interest is marked with a predefined unique color (I used red), drawn as a closed shape around the sheet. Given this, the image can be segmented: all pixels inside the enclosure are separated from those outside. This can be done by finding 2 connected components in the image (one is enough, the 2nd is the rest). In MATLAB terms, it can be done with convolution.
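A minimal Python sketch of this segmentation, assuming the marked contour has already been extracted as a boolean grid. The function name `outside_mask` and the data layout are illustrative and not taken from the project code. The outside region is grown from a padded corner pixel to its four direct neighbours until nothing changes, and it never crosses the contour:

```python
def outside_mask(contour):
    """Return a grid that is True for pixels reachable from outside the contour.

    contour: list of rows of booleans, True where the pixel belongs to the
    marked (red) contour. Contour pixels themselves are reported as False.
    """
    rows, cols = len(contour), len(contour[0])
    # Pad by one pixel on every side so the seed lies outside the original image.
    h, w = rows + 2, cols + 2
    blocked = [[False] * w for _ in range(h)]
    for y in range(rows):
        for x in range(cols):
            blocked[y + 1][x + 1] = contour[y][x]

    outside = [[False] * w for _ in range(h)]
    outside[0][0] = True  # seed pixel, known to be outside

    changed = True
    while changed:
        changed = False
        grown = [row[:] for row in outside]
        for y in range(h):
            for x in range(w):
                if outside[y][x] or blocked[y][x]:
                    continue
                # A pixel joins the region if any of its 4 neighbours is in it.
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and outside[ny][nx]:
                        grown[y][x] = True
                        changed = True
                        break
        outside = grown

    # Drop the padding again.
    return [row[1:-1] for row in outside[1:-1]]
```

Everything that is neither outside nor contour is then the sheet. Each pass grows the region by one pixel, which mirrors the iterative approach; a queue-based flood fill would give the same result faster.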
Given that the pixel (-1, -1) (which lies outside the original image) belongs to the outside, all other pixels of the same segment can be found by repeatedly convolving the image with

$$K = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$

initializing I(1,1) = 1 and setting the red contour to a large negative number (-b). After each iteration, the values are reset back to 1 and -b using logical operations. This creates a "spilling" effect, as if water were spilled outside of the border.

[Figure: segmentation result, with the background marked in red.]

2. Separate the text from the paper.

The paper is white and the text is dark, so thresholding is used. Low illumination can create darker regions as well; the difference is that they become darker "slowly" (smaller gradient) in comparison to text. To improve the thresholding results, a high-pass filter was applied to the image. In total 2 histograms were generated: one for the original image and one for the image with the high-pass filter applied (only the lowest frequencies were removed). This proved to improve the results in practice.

Given a histogram, the hill with the highest point is identified as "paper". The threshold is the closest local minimum to the left of the global maximum. The local minimum can be found by calculating the zero-crossings of the derivative of the histogram. After calculating the 2 thresholds, one for each histogram, text and paper are separated in both images:

Text1 = {<x,y> | I1(x,y) < threshold1}
Text2 = {<x,y> | I2(x,y) < threshold2}

The result is

$$\mathit{Text} = \mathit{Text}_1 \cap \mathit{Text}_2$$

So the image is partitioned into 3 segments: background, paper and text.

3. Getting rid of text.

The characteristic strips algorithm assumes that the intensity depends only on p and q, but the intensity changes rapidly when text is encountered, without any change in p or q whatsoever. To apply the algorithm, the issue of text must be solved somehow. Assuming the ink conforms to the Lambertian property, given p, q and the intensity at a point, the albedo at that point can be calculated.
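Written out, and assuming the intensity is normalized so that a unit-albedo surface facing the light has brightness 1, the relation is the following (the symbol $\rho$ for albedo is introduced only for this note). With light and viewing direction both along z, a Lambertian surface with gradient (p, q) has brightness proportional to the cosine of the angle between its normal and the z axis:

```latex
I(x, y) = \rho \, \frac{1}{\sqrt{p^2 + q^2 + 1}}
\quad\Longrightarrow\quad
\rho = I(x, y)\,\sqrt{p^2 + q^2 + 1}
```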
If the ink were Lambertian, the albedo would be constant for all text pixels, so dividing by it would cancel it out. Unfortunately, this method failed, because the computed albedo appears to vary with the normal vector throughout the image, so the ink is probably not Lambertian.

Another way is to interpolate the values of the "text" pixels from nearby pixels. The result was not very smooth; on further investigation it appears that the contrast is increased, apparently artificially by the camera in question, similarly to the "lateral inhibition" effect we learned in class. To lessen the effect, the image was smoothed with a Gaussian before interpolation. I used a 3x3 kernel with sigma = 0.5 (since only the pixels close to the text are affected). Additional smoothing was applied after interpolation, with a kernel of the same size and sigma = 1.5, for better smoothing around the interpolated areas.

4. Finding starting points.

Characteristic strips requires knowledge of the initial coordinates x, y and of the values H, p, q at that point. By the 5th assumption there are points with p = q = 0, and the reflectance map that follows from the other assumptions,

$$R(p, q) = \frac{1}{\sqrt{p^2 + q^2 + 1}}$$

is maximal (equal to 1) at such a point. Therefore, for each pixel with maximal intensity in the image, it can be concluded that p = q = 0. This point however cannot be used as a starting point, so, as suggested in Horn's chapter 11, I use a parabolic approximation. Let's assume

$$H(x, y) = H_0 + 0.5\,(a x^2 + 2 b x y + c y^2)$$

where, for simplicity, x = y = 0 is the point with p = q = 0, so

$$p = \frac{\partial H}{\partial x} = a x + b y, \qquad q = \frac{\partial H}{\partial y} = c y + b x$$

$$I(x, y) = R(p, q)$$

$$E(x, y) = \frac{1}{2 I(x, y)^2} = 0.5\,(p^2 + q^2 + 1) = 0.5\,(a^2 + b^2)\,x^2 + (a + c)\,b\,x y + 0.5\,(b^2 + c^2)\,y^2 + 0.5$$

$$\begin{cases} E_{xx} = a^2 + b^2 \\ E_{yy} = b^2 + c^2 \\ E_{xy} = (a + c)\,b \end{cases}$$

E(x, y) can be calculated from the image, and so can its second derivatives. Using the solution of these equations, the values of p and q at nearby points can be calculated.
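As a rough illustration, the three equations can be solved in closed form: since a + c = E_xy / b and a - c = (E_xx - E_yy) b / E_xy, substituting into a^2 + b^2 = E_xx leaves a quadratic in b^2. The Python sketch below is only one possible way to do this; the function names and the tolerance are assumptions, and the degenerate case E_xy = 0 is handled only for b = 0.

```python
import math


def solve_parabolic_fit(exx, eyy, exy, tol=1e-12):
    """Return all real (a, b, c) with a^2+b^2=exx, b^2+c^2=eyy, (a+c)*b=exy."""
    if abs(exy) < tol:
        # Assumed simplification: only the b = 0 branch is returned here
        # (the branch a = -c with b != 0 is not handled).
        ra, rc = math.sqrt(max(exx, 0.0)), math.sqrt(max(eyy, 0.0))
        return [(sa * ra, 0.0, sc * rc) for sa in (1, -1) for sc in (1, -1)]

    # Quadratic qa*u^2 + qb*u + qc = 0 in u = b^2.
    qa = (eyy - exx) ** 2 + 4 * exy ** 2
    qb = -2 * exy ** 2 * (exx + eyy)
    qc = exy ** 4
    disc = qb * qb - 4 * qa * qc
    if disc < -1e-9 * qb * qb:
        return []  # no real solution (e.g. noisy derivative estimates)
    root = math.sqrt(max(disc, 0.0))

    solutions = []
    for u in {(-qb + root) / (2 * qa), (-qb - root) / (2 * qa)}:
        if u <= 0:
            continue
        for b in (math.sqrt(u), -math.sqrt(u)):
            s = exy / b                  # a + c
            d = (exx - eyy) * b / exy    # a - c
            solutions.append(((s + d) / 2, b, (s - d) / 2))
    return solutions


def gradient_near(a, b, c, x, y):
    """Gradient (p, q) of the parabolic fit at offset (x, y) from the extremum."""
    return a * x + b * y, b * x + c * y
```

For example, `solve_parabolic_fit(4.25, 1.25, -1.5)` returns four candidates, one of which is (a, b, c) = (-2, 0.5, -1); choosing among the candidates needs extra constraints on the signs of a and c.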
Those points serve as starting points for the algorithm. After some algebraic transformations, the equations can be written as follows:

$$b^4\,(E_{yy}^2 - 2 E_{yy} E_{xx} + E_{xx}^2 + 4 E_{xy}^2) + b^2\,(-2 E_{yy} E_{xy}^2 - 2 E_{xy}^2 E_{xx}) + E_{xy}^4 = 0$$

This is a quadratic in $b^2$ and is solvable using the standard quadratic formula. According to Horn, the equation has up to 4 solutions. The solutions can be categorized into 3 sets:

a >= 0, c >= 0
a <= 0, c <= 0
a and c have opposite signs.

The first type of solution is not suitable, because characteristic strips doesn't work for local minimum points with the current reflectance map, since

$$\delta x = -\frac{p}{\sqrt{p^2 + q^2 + 1}}\,\delta\xi, \qquad \delta y = -\frac{q}{\sqrt{p^2 + q^2 + 1}}\,\delta\xi$$

Therefore every step is taken in the direction opposite to the gradient of H, and hence H decreases in that direction. This makes local minima "inescapable". With local maxima it is the opposite, which makes them the desirable case. Hence the surface should have at least one maximum for the algorithm to generate any result.

The third type of solution is not suitable either, because of assumption number 6. Within the assumed coordinate system, the parabolic approximation has the form

$$H(x, y) = a y^2 + b y + c$$

for some coefficients a, b, c. H does not depend on x, therefore its partial derivative with respect to x is zero. In any coordinate system rotated in the xy plane (the z axis still points up), where v = (vx, vy) and w = (wx, wy) are the directions of the rotated x and y axes respectively (w · v = 0),

$$H_r(r \cdot v,\ r \cdot w) = H(x, y), \qquad r = (x, y)$$

$$H_r(x, y) = H\!\left(\frac{x - v_y y}{v_x},\ \frac{y - w_x x}{w_y}\right) = y^2\,\frac{a}{w_y^2} + x^2\,\frac{a\,w_x^2}{w_y^2} + \cdots$$

Therefore the coefficients of x^2 and y^2 have the same sign, and a solution in which a and c have opposite signs contradicts assumption 6.

In practice, the image does not perfectly describe the light intensity distribution, because of many factors such as camera calibration, a light source that is not really infinite or uniform, paper that is not a 100% Lambertian surface, etc.
So it is good practice, instead of finding only the global maximum, to use all pixels within a certain percentage of the maximum intensity. Of course, only local maximum points are used. It is possible (and expected) to have whole areas of local maxima/minima, so all such points within a certain distance from each other are categorized as a single "cluster". This uses a variation of the same algorithm as in the beginning (finding connected components).

[Figures: the area of a cluster and the cluster points (marked in red, not easily seen); a second image and its points.]

Each of those clusters is assumed to be a local maximum or a local minimum. There are some constraints on those possibilities though. The function cannot have 2 local maxima "in a row": between every two clusters of one type, there must be a cluster of the other type. In general, the second cluster has the same type as the first one if and only if the number of clusters between them is odd. A cluster is considered to be between two others if a line connecting the two clusters intersects it (I connected the two clusters with a line between their centers of mass). In this example there are two possible solutions.

Note: the program always generates one solution only (the first cluster is 'max'); to generate the 2nd solution, set the variable "sol" to 1 (in main2.m).

5. Characteristic strips.

For each maximal point found, given a parabolic fit consistent with a local maximum (a <= 0, c <= 0), the values of p and q at nearby points are calculated in 4 directions: 45°, 135°, 225°, 315°. Each such point serves as a starting point for characteristic strips. The calculation is done for each cluster separately, starting from initial height = 0. The calculation of the relative height between clusters is described below.

6. Calculating relative height between clusters.

Two clusters are chosen for merging when one of them has known height and they are closest to each other.
For each cluster A with unknown height, a cluster B with known height is selected according to the maximal "score":

$$\mathit{score} = \sum_{i \in A} \frac{1}{d_{i,j}^2}$$

where $d_{i,j}$ is the distance (in the XY plane) between point i and the closest point j in B. Calculating closest points in a naïve way can be quite expensive, especially in MATLAB, so I used a Voronoi diagram for this (built into MATLAB). A Voronoi diagram is a data structure that allows the closest point to be queried fairly quickly: each point corresponds to the region of all pixels that are closer to it than to any other point.

Now, for each point in A that is close enough to B, the expected height is calculated by interpolation from points in B. If $c_i$ and $e_i$ are the current and expected heights of point i respectively, find the relative height x such that the error is minimal:

$$f(x) = \sum_{i=1}^{n} (c_i + x - e_i)^2 \rightarrow \min$$

$$f'(x) = 2 \sum_{i=1}^{n} (x + c_i - e_i) = 0$$

$$x = \frac{1}{n} \sum_{i=1}^{n} (e_i - c_i)$$

7. Rebuild the surface of the sheet.

According to assumption 6, H is constant in one direction. That means there is an alternative coordinate system with $\partial H / \partial x = 0$; let v = (vx, vy) be that direction. This means:

$$\forall i:\ g_i \cdot v = 0, \qquad g_i = (p_i, q_i)$$

In other words, the projection of the gradient on v should be 0 for all points. Geometrically this means that all points (p, q) lie on the same line. In practice there is some deviation due to noise, so this line is approximated using the "least squares line" calculation we learned in class. The new y axis is taken along this fitted line (the common gradient direction), and the new x axis, along v, is perpendicular to it.

Now let's project all points onto the YZ plane. After sorting the points by y-coordinate, the polyline approximation algorithm from class can be used. The algorithm requires an error tolerance as an input argument though. The error is not known, but the number of expected edge points is known: the curve should have one point for each minimum/maximum cluster, in addition to the two end points. If the algorithm returns more edge points than expected, the error is increased; otherwise it is decreased.
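The adjustment rule above amounts to a bisection on the tolerance. A minimal Python sketch follows. Since the polyline algorithm from class is not specified here, the Douglas-Peucker simplification is used as a stand-in, and the names `simplify` and `fit_tolerance` are illustrative only.

```python
import math


def _dist_to_chord(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / norm


def simplify(points, eps):
    """Douglas-Peucker (assumed stand-in): keep vertices farther than eps from the chord."""
    if len(points) < 3:
        return list(points)
    dists = [_dist_to_chord(p, points[0], points[-1]) for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__)
    if dists[k] <= eps:
        return [points[0], points[-1]]
    split = k + 1  # index of the farthest vertex in `points`
    left = simplify(points[:split + 1], eps)
    right = simplify(points[split:], eps)
    return left[:-1] + right  # the split vertex appears only once


def fit_tolerance(points, expected, iters=40):
    """Bisect for a tolerance that yields at most `expected` vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    lo = 0.0
    # The bounding-box diagonal always collapses the curve to its two end points.
    hi = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if len(simplify(points, mid)) > expected:
            lo = mid  # too many edge points: increase the error
        else:
            hi = mid  # expected count or fewer: decrease the error
    return hi
```

For the profile `[(0, 0), (1, 1.1), (2, 2), (3, 0.8), (4, 0)]` with 3 expected edge points, the search settles just above the tolerance at which the vertex (3, 0.8) is dropped, leaving the two end points and the peak. The sketch assumes that the vertex count does not increase as the tolerance grows, which holds for well-behaved profiles but is not guaranteed in general.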
In this way the error can be found using binary search. From those edge points, the curve is built with cubic spline interpolation (built into MATLAB). A cubic spline is a piecewise function made of third-degree polynomials. Finally, the sheet surface is calculated.

3. Results

I ran the program on various images and got overall satisfactory results.

[Figures: reconstructed surfaces, including the sample from before.]

4. Conclusions

The program has quite a few parameters that might need adjustment for certain types of images, such as the percentage-of-maximum intensity threshold for finding maximal pixels, and the maximal distance for points to belong to the same cluster. Characteristic strips requires a sufficient number of local maximum areas in the image to produce enough data (with the current reflectance map at least). Generally the result is more than one solution, and it does not include full 3D information about the spatial structure of the surface (heights are only relative, and multiplication by a constant is also a solution). In spite of this, for carefully taken photos the results were pretty good.

Some problems / points for improvement:

• Detect the sheet of paper automatically.
• Relax some assumptions, for example:
  o The requirement that the surface is constant in one direction can be relaxed, while keeping the partial derivative independent of the perpendicular direction. However, calculating the YZ projection might then be more difficult or require more data.
  o Illumination from an arbitrary direction. The parabolic approximation needs to be changed or replaced in this case.
• The search for clusters can/should be improved (as it doesn't work well in some cases).
• The polyline approximation doesn't work well in some cases and should be replaced with a better algorithm.

5. References

• [1] Introduction to Computational and Biological Vision, Prof. Ohad Ben-Shahar. Including B.K.P. Horn's book, chapter 11.
• Voronoi: http://en.wikipedia.org/wiki/Voronoi_diagram
• Spline: http://en.wikipedia.org/wiki/Spline_%28mathematics%29