Shape inference for sheet of paper with text via characteristic strips

Project by: Arie Kozak
1. Introduction
In this project I attempt to perform shape inference, mainly using the characteristic strips
algorithm. The algorithm is not easily applicable in general, so it is applied here under a certain
set of assumptions (stated below) in a controlled environment. The sheet of paper (excluding
the text) is, at least approximately, a Lambertian surface with constant albedo, which makes it
a good case for applying the algorithm.
Given a photograph of a sheet with text, the program outputs a plot of the sheet in 3-dimensional
space. The photograph is taken with a single light source from above, and the sheet is curved
(generally non-planar), since a planar sheet produces a trivial solution.
2. Approach and method
The following main assumptions have been made:
1. Single point (infinite) light source.
2. The scene is illuminated and photographed from above, meaning L = E = (0, 0, 1).
3. Most of the surface is paper without text.
4. Dark text on bright paper.
5. $\frac{\partial H}{\partial x}(x_0, y_0) = \frac{\partial H}{\partial y}(x_0, y_0) = 0$ for some point $(x_0, y_0)$ (a local maximum point).
6. The surface H(x, y) is differentiable (meaning no surface/depth discontinuities) and constant in
one direction, in some coordinate system whose z axis points in the direction of the light
source (L or E).
The process will be divided into several steps.
1. Separate the sheet from the rest of the irrelevant image (background).
This step is done mostly manually (there is enough work as it is, and time is limited). The
area of interest is marked with a predefined unique color (I used red) as a closed contour
around the sheet.
Given this, the image can be segmented: all pixels inside the contour are separated from
those outside. This can be done by finding 2 connected components in the image (one is
enough, the 2nd is the rest). In MatLab terms, it can be done with convolution. Given that
pixel (-1, -1) (meaning a pixel outside of the original image) is outside the sheet, all other
pixels of the same segment can be found by repeatedly convolving the image with
$$K_d = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$
with I(1,1) initialized to 1 and the red contour to a large negative number (-b). After each
iteration, the values are reset back to 1 and -b using logical operations. This creates a
"spilling" effect, as if water were spilled outside of the border and spread until it reached
the contour.
(Figure: the segmented image, with the background marked in red.)
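The "spilling" step can be sketched as follows. This is a minimal illustration in plain Python rather than the project's MatLab code: the function name `flood_background` and the boolean-grid representation are assumptions, and the convolution with K_d is replaced by an explicit check of the same plus-shaped neighbourhood.

```python
def flood_background(contour):
    """Mark pixels reachable from outside the image without crossing the contour.

    contour: list of rows, truthy where a pixel belongs to the red contour.
    Returns a grid of booleans of the same size, True for background pixels.
    (Hypothetical helper, not the original MatLab implementation.)
    """
    rows, cols = len(contour), len(contour[0])
    # Pad by one pixel so that the seed lies outside the original image.
    height, width = rows + 2, cols + 2
    wall = [[False] * width for _ in range(height)]
    for y in range(rows):
        for x in range(cols):
            wall[y + 1][x + 1] = bool(contour[y][x])

    filled = [[False] * width for _ in range(height)]
    filled[0][0] = True  # seed pixel, known to be outside the sheet
    changed = True
    while changed:
        changed = False
        grown = [row[:] for row in filled]
        for y in range(height):
            for x in range(width):
                if filled[y][x] or wall[y][x]:
                    continue  # already marked, or blocked by the contour
                # Same neighbourhood as the plus-shaped kernel K_d.
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and filled[ny][nx]:
                        grown[y][x] = True
                        changed = True
                        break
        filled = grown
    # Drop the padding again.
    return [row[1:-1] for row in filled[1:-1]]
```

Each pass grows the marked region by one pixel in the four axis directions, and the loop stops once a pass changes nothing. Pixels on the contour and inside it are never marked.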
2. Separate the text from the paper.
The paper is white and the text is dark, so thresholding is used. Low illumination can also
create darker regions; the difference is that they become darker "slowly" (smaller gradient)
in comparison to text. To improve the thresholding results, a high-pass filter was applied to
the image. In total 2 histograms were generated: one for the original image and the other for
the image with the high-pass filter applied (only the lowest frequencies were removed). This
method proved to improve the results in practice.
Given a histogram, the hill with the highest point is identified as "paper". The threshold is the
closest local minimum to the left of the global maximum. The local minimum can be found by
calculating the zero-crossings of the derivative of the histogram.
After calculating the threshold for each of the 2 histograms, text and paper are separated in
both images:
Text1 = {<x,y> | I1(x,y) < threshold1}
Text2 = {<x,y> | I2(x,y) < threshold2}
The result is Text = Text1 ∩ Text2.
So the image is partitioned into 3 segments: background, paper and text.
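The threshold rule and the intersection of the two masks can be sketched as below. The names `paper_threshold` and `text_mask` are hypothetical, and the histogram is assumed to be a plain list of counts indexed by intensity level. A real histogram is noisy and would probably need smoothing first, which this sketch does not do.

```python
def paper_threshold(hist):
    """Closest local minimum to the left of the global maximum of a histogram.

    hist: list of pixel counts, indexed by intensity level.
    Returns the intensity level used as threshold (0 if no minimum is found).
    """
    # The global maximum is taken to be the "paper" hill.
    peak = max(range(len(hist)), key=lambda i: hist[i])
    # Walk left from the peak and look for a zero-crossing of the
    # discrete derivative (non-positive slope followed by positive slope).
    for i in range(peak - 1, 0, -1):
        slope_left = hist[i] - hist[i - 1]
        slope_right = hist[i + 1] - hist[i]
        if slope_left <= 0 and slope_right > 0:
            return i
    return 0


def text_mask(image1, threshold1, image2, threshold2):
    """Text = Text1 ∩ Text2: a pixel is text only if it is dark in both images."""
    return [
        [a < threshold1 and b < threshold2 for a, b in zip(row1, row2)]
        for row1, row2 in zip(image1, image2)
    ]
```

Here `image1` would be the original image and `image2` the high-pass filtered one, each with its own threshold.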
3. Getting rid of text.
The characteristic strips algorithm assumes that the intensity depends only on p and q, but
the intensity changes rapidly when text is encountered, without any change in p or q
whatsoever. To apply the algorithm, the issue of text must be solved somehow.
Assuming the ink is Lambertian, the albedo at a point can be calculated from p, q and the
intensity at that point. If the albedo were constant for all text pixels, dividing by it would
cancel it out. Unfortunately, this method failed: the computed albedo appears to vary with the
normal vector throughout the image, so the ink is probably not Lambertian.
Another way is to interpolate the values of "text" pixels from nearby pixels. The result was
not very smooth; on further investigation it appears that the contrast is increased,
apparently artificially by the camera in question, similarly to the "lateral inhibition" effect
we learned in class. To lessen the effect, the image was smoothed with a Gaussian before
interpolation. I used a 3x3 kernel with sigma = 0.5 (since only the pixels close to the text
are affected). Additional smoothing was applied after interpolation, using a kernel of the
same size with sigma = 1.5, for better smoothing around the interpolated areas.
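To illustrate the two smoothing settings, here is a small sketch that builds a normalized Gaussian kernel. The function name `gaussian_kernel` is hypothetical and the construction is the textbook one; the original MatLab code may build its kernels differently.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel whose weights sum to 1."""
    half = size // 2
    raw = [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-half, half + 1)]
        for dy in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


before = gaussian_kernel(3, 0.5)  # mild smoothing before interpolation
after = gaussian_kernel(3, 1.5)   # stronger smoothing after interpolation
```

With sigma = 0.5 most of the weight stays on the center pixel, so only the immediate surroundings of the text are affected, while sigma = 1.5 spreads the weight much more evenly over the 3x3 window.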
4. Finding starting points.
Characteristic strips requires knowledge of the initial coordinates x, y and of the values of
H, p and q at that point. Given the 5th assumption, there is a point with p = q = 0. The
reflectance map that follows from the other assumptions is
$$R(p, q) = \frac{1}{\sqrt{p^2 + q^2 + 1}}$$
R(p, q) is maximal (equal to 1) at such a point. Therefore, for each pixel with maximal
intensity in the image, it can be concluded that p = q = 0. This point itself cannot be used
as a starting point, however (the step of the algorithm vanishes where p = q = 0), so as
suggested in Horn's chapter 11, I use a parabolic approximation. Let's assume
$$H(x, y) = H_0 + 0.5(a x^2 + 2 b x y + c y^2)$$
where, for simplicity, x = y = 0 is the point with p = q = 0, so
$$p = \frac{\partial H}{\partial x} = a x + b y, \qquad q = \frac{\partial H}{\partial y} = c y + b x$$
$$I(x, y) = R(p, q)$$
$$E(x, y) = \frac{1}{2 I(x, y)^2} = 0.5(p^2 + q^2 + 1) = 0.5(a^2 + b^2) x^2 + (a + c) b x y + 0.5(c^2 + b^2) y^2 + 0.5$$
$$\begin{cases} E_{xx} = a^2 + b^2 \\ E_{yy} = c^2 + b^2 \\ E_{xy} = (a + c) b \end{cases}$$
E(x, y) can be calculated from the image, and so can its second derivatives. Using the
solution to the equations, the values of p and q at nearby points can be calculated. Those
points serve as starting points for the algorithm. After some algebraic transformations, the
equations can be written as follows:
$$b^4 (E_{yy}^2 - 2 E_{yy} E_{xx} + E_{xx}^2 + 4 E_{xy}^2) + b^2 (-2 E_{yy} E_{xy}^2 - 2 E_{xy}^2 E_{xx}) + E_{xy}^4 = 0$$
This is a quadratic equation in $b^2$, solvable using the standard quadratic formula.
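A minimal sketch of solving these equations is given below, assuming the three second derivatives have already been estimated from the image. The function name `parabolic_fit_candidates` and the tolerance are hypothetical. The code solves the quadratic in b^2, recovers |a| and |c| from E_xx and E_yy, and keeps only the sign combinations that also satisfy E_xy = (a + c)b.

```python
import math


def parabolic_fit_candidates(exx, eyy, exy, tol=1e-9):
    """Return all (a, b, c) with a^2+b^2 = exx, c^2+b^2 = eyy, (a+c)*b = exy."""
    # Coefficients of the quadratic in u = b^2.
    qa = (exx - eyy) ** 2 + 4.0 * exy ** 2
    qb = -2.0 * exy ** 2 * (exx + eyy)
    qc = exy ** 4
    if qa == 0.0:
        # Degenerate case (exx == eyy and exy == 0): only b = 0 is possible.
        roots_u = [0.0]
    else:
        disc = max(qb * qb - 4.0 * qa * qc, 0.0)
        roots_u = {(-qb + s * math.sqrt(disc)) / (2.0 * qa) for s in (1.0, -1.0)}

    solutions = []
    for u in roots_u:
        if u < -tol or exx - u < -tol or eyy - u < -tol:
            continue  # would need a negative square
        b_abs = math.sqrt(max(u, 0.0))
        a_abs = math.sqrt(max(exx - u, 0.0))
        c_abs = math.sqrt(max(eyy - u, 0.0))
        # Try every sign combination and keep those consistent with exy.
        for sa in (1.0, -1.0):
            for sb in (1.0, -1.0):
                for sc in (1.0, -1.0):
                    cand = (sa * a_abs, sb * b_abs, sc * c_abs)
                    a, b, c = cand
                    ok = abs((a + c) * b - exy) <= tol * max(1.0, abs(exy))
                    if ok and cand not in solutions:
                        solutions.append(cand)
    return solutions


candidates = parabolic_fit_candidates(4.25, 1.25, -1.5)
```

For the sample values above (generated from a = -2, b = 0.5, c = -1), the function returns four candidates, one of which is the generating triple.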
According to Horn, the equations have up to 4 solutions. The solutions can be categorized
into 3 sets:
a>=0, c>=0
a<=0, c<=0
a and c have opposite signs.
The first type of solution is not suitable, because characteristic strips does not work for
local minimum points with the current reflectance map, since:
$$\delta x = -\frac{p}{\sqrt{p^2 + q^2 + 1}}\,\delta s, \qquad \delta y = -\frac{q}{\sqrt{p^2 + q^2 + 1}}\,\delta s$$
Therefore every step is taken in the direction opposite to the gradient of H, and hence H
decreases in that direction, which makes a local minimum "inescapable". With a local
maximum it is the opposite, which makes it the desirable case. Hence the surface should have
at least one maximum for the algorithm to generate any result.
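The sign argument can be written out in one line. This is a sketch that assumes the step equations for δx and δy given above, together with the first-order change in height δH = p δx + q δy (the chain rule, which the text does not spell out):

```latex
\delta H = p\,\delta x + q\,\delta y
         = -\frac{p^2 + q^2}{\sqrt{p^2 + q^2 + 1}}\,\delta s \;\le\; 0
\qquad (\delta s > 0)
```

Equality holds only where p = q = 0, so a strip can move away from a local maximum, where the surface descends in every direction, but not away from a local minimum.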
The third type of solution is not suitable either, because of assumption number 6. Within
the assumed coordinate system, the parabolic approximation is of the form:
$$H(x, y) = a y^2 + b y + c$$
for some coefficients a, b, c. H does not depend on x, therefore its partial derivative with
respect to x is zero. Consider any coordinate system rotated in the xy plane (the z axis still
points up), where v = (vx, vy) and w = (wx, wy) are the directions of the rotated x and y axes
respectively (w · v = 0). Then
$$H_r(\mathbf{p} \cdot v, \mathbf{p} \cdot w) = H(x, y), \qquad \mathbf{p} = (x, y)$$
$$H_r(x, y) = H\left(\frac{x - v_y y}{v_x}, \frac{y - w_x x}{w_y}\right) = y^2 \frac{a}{w_y^2} + x^2 \frac{a w_x^2}{w_y^2} + \cdots$$
Therefore, the coefficients of x^2 and y^2 have the same sign.
In practice, the image does not perfectly describe the light intensity distribution, because
of many factors such as camera calibration, a light source that is not really infinite or
uniform, paper that is not a 100% Lambertian surface, etc. So it is good practice, instead of
finding only the global maximum, to use all pixels within a certain % of the maximum
intensity. Of course, only local maximum points are used.
It is possible (and expected) to have whole areas of local maxima/minima, so all such points
within a certain distance from each other are categorized as a single "cluster". This uses a
variation of the same algorithm as in the beginning (finding connected components).
(Figures: the area of a cluster and the cluster points, marked in red and not easily seen, for two sample images.)
Each of those clusters is assumed to be a local maximum or a local minimum. There are some
constraints on those possibilities though. A function cannot have 2 local maxima "in a row":
between every two clusters of one type there must be a cluster of the other type. In
general, the type of the second cluster is the same as that of the first one if and only if
the number of clusters between them is odd. A cluster is considered to be between two
others if the line connecting them intersects it (I connected two clusters with a line
between their centers of mass).
In this example there are two possible solutions.
Note: the program always generates one solution only (the first cluster is 'max'). To
generate the 2nd solution, set the variable "sol" to 1 (in main2.m).
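The parity rule can be sketched as follows. The function name `assign_cluster_types` and its input format are hypothetical. The only inputs are, for every other cluster, the number of clusters lying between it and the first cluster, plus the "sol" switch. A default of sol = 0 for the first solution is an assumption; the text only states that sol = 1 selects the 2nd one.

```python
def assign_cluster_types(clusters_between, sol=0):
    """Label clusters as 'max' or 'min' using the parity rule.

    clusters_between[k]: number of clusters lying between the first cluster
    and cluster k + 1.
    sol: 0 labels the first cluster 'max', 1 labels it 'min'.
    """
    first = "max" if sol == 0 else "min"
    other = "min" if sol == 0 else "max"
    types = [first]
    for count in clusters_between:
        # Same type as the first cluster iff an odd number lies in between.
        types.append(first if count % 2 == 1 else other)
    return types


labels = assign_cluster_types([0, 1, 2])
```

For four clusters in a row this gives alternating labels starting with 'max', and the mirrored labelling when `sol=1`.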
5. Characteristic strips.
For each maximal point found, given a parabolic fit consistent with a local maximum (a<=0,
c<=0), the p and q values at nearby points are calculated in 4 directions: 45°, 135°, 225°,
315°. Each such point serves as a starting point for characteristic strips.
The calculation is done for each cluster separately, starting from an initial height of 0. The
calculation of the relative height between clusters is described below.
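The four starting points can be sketched as below, assuming the parabolic model H = H0 + 0.5(ax^2 + 2bxy + cy^2), so that p = ax + by and q = bx + cy as in the starting-point section. The function name `starting_points` and the `radius` parameter (the distance from the maximal point) are hypothetical.

```python
import math


def starting_points(a, b, c, radius=1.0, h0=0.0):
    """Return (x, y, H, p, q) at 45, 135, 225 and 315 degrees around a maximum.

    Coordinates are relative to the maximal point, where p = q = 0.
    """
    points = []
    for degrees in (45, 135, 225, 315):
        angle = math.radians(degrees)
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        p = a * x + b * y  # dH/dx of the parabolic fit
        q = b * x + c * y  # dH/dy of the parabolic fit
        h = h0 + 0.5 * (a * x * x + 2.0 * b * x * y + c * y * y)
        points.append((x, y, h, p, q))
    return points
```

For a local maximum (a <= 0, c <= 0) the gradient at each starting point points back towards the maximal point, so the strips move outwards and downhill from it.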
6. Calculating relative height between clusters.
Two clusters are chosen for merging when one of them has a known height and they are
closest to each other. For each cluster A with unknown height, a cluster B with known
height is selected according to the maximal "score":
$$Score = \sum_{i \in A} \frac{1}{d_{i,j}^2}$$
where $d_{i,j}$ is the distance (in the XY plane) between point i and its closest point j in B.
Calculating the closest points in a naive way can be quite expensive, especially in MatLab,
so I used a Voronoi diagram for this (built into MatLab). A Voronoi diagram is a data
structure that allows the closest point to be queried fairly quickly.
(Figure: Voronoi diagram. Each point corresponds to the area of all pixels that are closest to it.)
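For illustration, the score can be computed with the naive closest-point search (the expensive variant mentioned above, not the Voronoi-based one used in the project). The names `merge_score` and `best_known_cluster` are hypothetical, and the sketch assumes that no point of A coincides with a point of B.

```python
def merge_score(points_a, points_b):
    """Sum over points i in A of 1 / d^2 to the closest point j in B."""
    score = 0.0
    for ax, ay in points_a:
        # Squared XY distance to the closest point of B (brute force).
        d2 = min((ax - bx) ** 2 + (ay - by) ** 2 for bx, by in points_b)
        score += 1.0 / d2  # assumes d2 > 0, i.e. no coincident points
    return score


def best_known_cluster(points_a, known_clusters):
    """Pick the name of the known-height cluster with the maximal score."""
    return max(known_clusters,
               key=lambda name: merge_score(points_a, known_clusters[name]))


a_points = [(0.0, 0.0), (1.0, 0.0)]
known = {"B1": [(0.0, 2.0)], "B2": [(0.0, 10.0)]}
chosen = best_known_cluster(a_points, known)
```

The brute-force search costs |A| x |B| distance evaluations per pair of clusters, which is what the Voronoi-based lookup avoids.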
Now, for each point in A that is close enough to B, the expected height is calculated by
interpolation from points in B. If $a_i$ and $b_i$ are the current and the expected height of
point i respectively, find the relative height x such that the error is minimal:
$$e(x) = \sum_{i=1}^{N} (a_i + x - b_i)^2 \to \min$$
$$e'(x) = 2 \sum_{i=1}^{N} (x + a_i - b_i) = 0$$
$$x = \frac{1}{N} \sum_{i=1}^{N} (b_i - a_i)$$
7. Rebuild the surface of the sheet.
According to assumption 6, H is constant in one direction. That means there is an
alternative coordinate system with $\frac{\partial H}{\partial x} = 0$; let v = (vx, vy) be that direction. This means:
$$\forall i: \; g_i \cdot v = 0, \qquad g_i = (p_i, q_i)$$
In other words, the projection of the gradient on v should be 0 for all points. Geometrically
this means all points (p, q) lie on the same line. In practice there is some deviation due to
noise, so this line is approximated using the "least squares line" calculation we learned in
class.
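One possible way to fit such a line is sketched below. It is an assumption, not necessarily the method from class: since g_i · v = 0 forces the line through the origin of the (p, q) plane, the sketch fits a line through the origin by total least squares, using the closed-form principal axis of the 2x2 scatter matrix. The function name `constant_direction` is hypothetical.

```python
import math


def constant_direction(gradients):
    """Fit a line through the origin to (p, q) points.

    Returns (v, w): w is the unit direction of the fitted line (the common
    direction of the gradients) and v is the unit vector perpendicular to it,
    which minimizes sum((g_i . v)^2).
    """
    sxx = sum(p * p for p, q in gradients)
    sxy = sum(p * q for p, q in gradients)
    syy = sum(q * q for p, q in gradients)
    # Orientation of the major axis of the scatter matrix [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    w = (math.cos(theta), math.sin(theta))
    v = (-math.sin(theta), math.cos(theta))
    return v, w


v, w = constant_direction([(-1.0, -2.0), (0.5, 1.0), (2.0, 4.0), (3.0, 6.0)])
```

In this noise-free example all gradients are multiples of (1, 2), so w comes out parallel to (1, 2) and v perpendicular to it.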
The new x axis is taken in the direction of v, and the new y axis is perpendicular to it
(along the fitted line of gradients). Now let's project all points onto the YZ plane.
After sorting the points by y-coordinate, the polyline approximation algorithm from the
class can be used.
The algorithm requires an error tolerance as an input argument though. The error is not
known, but the number of expected edge points is: the curve should have one point for each
minima/maxima cluster, in addition to the two end points. If the algorithm returns more
edge points than expected, the error is increased; otherwise it is decreased. In this way
the error can be found using binary search.
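The binary search over the error can be sketched as follows. The polyline algorithm from class is not specified here, so the sketch swaps in the Ramer-Douglas-Peucker algorithm as a stand-in. The names `simplify` and `simplify_to_count` are hypothetical, and the search assumes that the number of returned points does not grow when the tolerance grows.

```python
import math


def _distance_to_segment(pt, a, b):
    """Distance from point pt to the segment a-b."""
    (x, y), (x1, y1), (x2, y2) = pt, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    t = ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def simplify(points, tol):
    """Ramer-Douglas-Peucker polyline approximation (stand-in algorithm)."""
    if len(points) < 3:
        return list(points)
    dists = [_distance_to_segment(p, points[0], points[-1]) for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__) + 1  # farthest point
    if dists[k - 1] <= tol:
        return [points[0], points[-1]]
    left = simplify(points[:k + 1], tol)
    return left[:-1] + simplify(points[k:], tol)


def simplify_to_count(points, expected, max_iter=60):
    """Binary search for a tolerance that yields `expected` edge points."""
    lo = 0.0
    # With this tolerance only the two end points survive.
    hi = max(math.hypot(x - points[0][0], y - points[0][1]) for x, y in points)
    result = simplify(points, hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        result = simplify(points, mid)
        if len(result) > expected:
            lo = mid   # too many edge points: increase the error
        elif len(result) < expected:
            hi = mid   # too few edge points: decrease the error
        else:
            break
    return result
```

For a profile with a single maximum cluster, `expected` would be 3 (the cluster plus the two end points). The recursion in `simplify` is fine for a sketch but could be made iterative for very long point lists.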
Using those edge points, cubic spline interpolation (built into MatLab) can be applied. A
cubic spline is a piecewise function made of 3rd-degree polynomials.
Finally, the sheet surface is calculated:
3. Results
I ran the program on various images and got overall satisfactory results:
(the sample from before)
4. Conclusions
The program has quite a few parameters that might need adjustment for certain types of
images, such as the max % intensity threshold for finding max pixels and the maximal
distance for points to belong to the same cluster. Characteristic strips requires a
sufficient number of local maxima areas in the image to produce enough data (with the
current reflectance map at least). Generally the result is more than one solution, and it
does not include full 3D information on the spatial structure of the surface (heights are
only relative, and a multiplication by a constant is also a solution).
In spite of this, for carefully taken photos the results were pretty good. Some problems /
points for improvement:
• Detect the sheet of paper automatically.
• Relax some assumptions, for example:
o The assumption that the surface is constant in one direction can be relaxed, while keeping
the partial derivative independent of the perpendicular direction. However, calculating the
YZ projection might then be more difficult / require more data.
o Illumination from an arbitrary direction. The parabolic approximation needs to be changed /
replaced in this case.
• The search for clusters can/should be improved (as it doesn't work well in some cases).
• Polyline approximation doesn't work well in some cases and should be replaced with a
better algorithm.
5. References
• [1] Introduction to Computational and Biological Vision, Prof. Ohad Ben-Shahar. Including
B.K.P. Horn's book, chapter 11.
• Voronoi: http://en.wikipedia.org/wiki/Voronoi_diagram
• Spline: http://en.wikipedia.org/wiki/Spline_%28mathematics%29