Final Report

Abstract
The goal of this project is to design and implement a playing card recognizer. This is
accomplished with a template-matching-based character recognition program running on a host
PC together with a Texas Instruments (TI) Evaluation Module (EVM) board using a TMS320C6701 DSP chip.
The system is able to recognize a constrained set of playing cards with 62% accuracy at a rate of
10 frames per second.
1. Introduction
This project was originally intended for use in an autonomous poker card playing system.
The system would consist of a card recognition portion, employing a video camera interfaced to
a character recognition program running on a DSP chip, and an artificial intelligence program to
make decisions based on the recognition output. Due to time and hardware constraints, we have
limited the scope of this project to implementing a program that takes black-and-white playing
card images stored on a PC and determines the number of the card.
Image recognition is a problem encountered in many applications, and many different
methods have been developed. We use a template-matching-based system in our recognition
program because of its simplicity; it is discussed in the next section. The details of the
implementation are described in Section 3, and the test results are presented in Section 4.
2. Theory of Operation
2.1 Design
This project focuses on a subset of a standard deck of 52 playing cards. The card images
are taken in full color with an HP 315 digital camera at 640-by-480 pixel resolution; the overall
process is summarized in Figure 1. The images, taken against a solid black background, are from a
fixed distance and at a random rotation. Next, each card's grayscale intensity is computed by taking
the scalar product of each RGB pixel with the intensity vector from the YUV representation. The mean
and variance of the pixel values are computed next, and a binary threshold is applied at two standard
deviations above the mean pixel value. The remaining foreground blobs are labeled; the largest
eight-connected blob is taken to be the playing card, and the smaller blobs are flood filled. Using
moment calculations, the major axis of the circumscribing ellipse is calculated and used to rotate
the card to a standard position. During the rotation, a nearest pixel heuristic is used to
interpolate. Once the card is in a standard position, the top-left corner of the card is correlated
with the 180-degree rotation of the bottom-right corner of the card to ensure that the proper
information (the number and suit identifiers) is isolated. Finally, template matching is
implemented through a normalized two dimensional cross correlation procedure to identify the
suit and card number.
Figure 1. Card recognition flow chart: high-resolution color image → down-sample and convert
to binary → label connected objects and flood-fill card area → use major axis to find corners for
scaling and rotation → extract corner for comparison (binary image of card corner) →
cross-correlate with templates → compute scores and compare → card value and suit.
3. Implementation
3.1 Overview
The current implementation is able to recognize the card number from a constrained set
of card images stored on the host. The images must be in black-and-white with the card in a
fixed rotation. The card can be of any size. As illustrated in Section 2.1, several steps are
involved in the recognition. The position of the playing card in the image must first be found
and the card corner containing the card number extracted. This is discussed in Section 3.4. The
corner is then scaled to match the size of the template using the bicubic spline interpolation
described in Section 3.5. The resized corners are compared against the templates using the
normalized cross-correlation detailed in Section 3.6.
3.2 Program Structure
The card recognition program involves two cooperative processes running on separate
hardware. This is illustrated in the diagram below:
Figure 2. Implementation diagram: on the host, the image file and templates feed (1) corner
extraction and (2) scaling, and the host also runs (5) the user interface; the EVM performs
(3) cross-correlation and (4) card identification; the two sides communicate over RTDX.
One process runs on the host, which is a personal computer using a Pentium III microprocessor;
it is responsible for the user interface, corner extraction, and image scaling. The other executes
on a Texas Instruments (TI) Evaluation Module (EVM) board with a TMS320C6701 DSP processor
and is responsible for the correlation calculation and card identification. This chip was chosen
because the program uses floating-point arithmetic. The current implementation does the
processing image-by-image, step-by-step, and no synchronization or buffering mechanism is
used. This is quite inefficient since the program is divided into several steps each requiring
different amounts of computation. Program throughput can be significantly improved by
pipelining the program. This would involve adding buffers and synchronization mechanisms
between each step.
In dividing the program between the host and the DSP chip, we tried to match the
capability of the hardware with the requirement of the program. The corner extraction involves
testing pixel values in images. This is done iteratively with if statements inside for loops. Since
the DSP chip is not suited for this type of control structure, it is done on the host. Furthermore,
having the corner extraction done on the host avoids the large I/O overhead involved in
transferring the whole image to the EVM board in real-time. The scaling is also done on the
host. Although bicubic interpolation is basically a filtering operation suited for the DSP chip, the
data transfer to the EVM can be further simplified if the images sent are scaled to a standard size.
The cross-correlation is computed on the EVM because it involves many multiplications and
additions—both well suited for the DSP chip.
3.3 Data Format
Each image is stored and processed as a two-dimensional array of 8-bit characters, with
each character representing one pixel. There are several reasons for this choice of data unit.
First, a character is the smallest unit of data that can be easily manipulated on both the host and
the DSP chip. Using the smallest unit possible also decreases the amount of data that is to be
transferred between the PC and the EVM board. If 16-bit integers were used, the data size would
be doubled.
3.4 Corner Extraction
Corner extraction is done using the GetCorner function. Its purpose is to extract the
identifying character in the corner of a playing card. The function takes in a black-and-white
image of a playing card on a black background.
Data is read from the image files into memory using the fread function in C. It is stored
in a variable called cardImg. All cards used in the project have a standard size of ? X ?. The
cardImg is then passed to the GetCorner function, and its extracted corner is returned as output.
This is achieved in three steps.
The first step is to find where the card is on the black background. The FindCorner
function does this by starting at the top-left corner of cardImg and searching through the rows for
the first non-zero pixel. This is identified as the top of the image. Starting in the same top-left
corner position but searching through the columns identifies the left edge of the image. The
same process is repeated from the bottom-right corner to identify the bottom and the right edge
of the image.
Now that the card has been located, the corner can be extracted in the second step.
ExtractCorner does this by removing from cardImg a block of pixels that extends 1/6th of the
height of the card from the top and 1/8th of the width of the card from the left.
Because the cardImg was not perfectly aligned in the background, when the corner is
extracted there are still extra zeros around its top and left edges. These are removed in the third
and final step. FillEdges simply goes around the top and left edges of the card and inverts the
zero pixel values to be one.
The final output image is stored in the variable outImg. The data from this image is
dumped to a file using the fwrite function in C.
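The three steps above can be sketched as one C routine. The report does not give the prototypes of GetCorner, FindCorner, ExtractCorner, or FillEdges, so the combined signature, the fixed 480-by-640 size, and the routine name below are assumptions for illustration:

```c
#define H 480   /* image height in rows (illustrative) */
#define W 640   /* image width in columns (illustrative) */

/* Sketch of the corner-extraction steps: locate the card, cut out
 * the corner block, and fill the leftover zero edges. */
void getCornerSketch(unsigned char img[H][W], unsigned char out[H][W],
                     int *cornerH, int *cornerW)
{
    int top = -1, left = -1, bottom = -1, right = -1;

    /* Step 1 (cf. FindCorner): scan for the first and last non-zero
     * row and column to locate the card on the black background. */
    for (int r = 0; r < H && top < 0; r++)
        for (int c = 0; c < W; c++)
            if (img[r][c]) { top = r; break; }
    for (int c = 0; c < W && left < 0; c++)
        for (int r = 0; r < H; r++)
            if (img[r][c]) { left = c; break; }
    for (int r = H - 1; r >= 0 && bottom < 0; r--)
        for (int c = 0; c < W; c++)
            if (img[r][c]) { bottom = r; break; }
    for (int c = W - 1; c >= 0 && right < 0; c--)
        for (int r = 0; r < H; r++)
            if (img[r][c]) { right = c; break; }

    /* Step 2 (cf. ExtractCorner): take 1/6 of the card height and
     * 1/8 of the card width from the card's top-left. */
    *cornerH = (bottom - top + 1) / 6;
    *cornerW = (right - left + 1) / 8;
    for (int r = 0; r < *cornerH; r++)
        for (int c = 0; c < *cornerW; c++)
            out[r][c] = img[top + r][left + c];

    /* Step 3 (cf. FillEdges): invert leftover zero pixels along the
     * top and left edges of the extracted corner to one. */
    for (int c = 0; c < *cornerW; c++)
        if (out[0][c] == 0) out[0][c] = 1;
    for (int r = 0; r < *cornerH; r++)
        if (out[r][0] == 0) out[r][0] = 1;
}
```

Because the search is dominated by pixel tests inside nested loops, this is exactly the kind of branch-heavy control flow the report keeps on the host rather than the DSP.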
3.5 Resizing Images with Bicubic-Spline Interpolation
In order to obtain the most accurate normalized cross-correlation value it is necessary that
the characters of the template and input image have as close to the same height and width as
possible. There are many methods of interpolation for resizing images, but one of the most
powerful is the bicubic spline interpolation method. Essentially, a continuous coordinate system
(x, y) is defined in a region bordered at its corners by four pixel values. These four pixel
values have discrete coordinates (i, j) within the image (Figure 3). A pixel value is calculated
and assigned to the continuous coordinates (x, y) and then mapped to the discrete coordinates
(k, m) of the output image.
Figure 3. Coordinate mapping idea illustration
The bicubic spline algorithm uses the interpolation polynomials of Equation 1 in the
horizontal and vertical directions:

C_0(t) = -a t^3 + a t^2
C_1(t) = -(a + 2) t^3 + (2a + 3) t^2 - a t
C_2(t) = (a + 2) t^3 - (a + 3) t^2 + 1          Eq. 1
C_3(t) = a t^3 - 2a t^2 + a t

These polynomials are used as weights for the input pixel values. The input
to the cubic polynomials is a value derived from the magnification factor and the location of the
continuous plane within the input image. That is, it depends on values of the magnification factor
and the coordinates of the four corner pixels. Let the input image be denoted f_{i,j}, the
intermediate continuous value F(x, y), and the output image value g_{k,m}. For
(x, y) in [i, i+1) x [j, j+1), the horizontal spline is defined as

H_j(x) = f_{i,j} C_3(x - i) + f_{i+1,j} C_2(x - i) + f_{i+2,j} C_1(x - i) + f_{i+3,j} C_0(x - i)    Eq. 2

the intermediate image point is

F(x, y) = H_j C_3(y - j) + H_{j+1} C_2(y - j) + H_{j+2} C_1(y - j) + H_{j+3} C_0(y - j)    Eq. 3

and, finally, the output image is

g_{k,m} = F(x, y)    Eq. 4

where x = k/u and y = m/v. The vertical and horizontal magnification factors are
u = height_out / height_in and v = width_out / width_in, respectively.
An efficient method of performing the above calculation over the entire input image is
described in [2]. The way the coordinates are related from f_{i,j} to F(x, y), from F(x, y) to
g_{k,m}, and ultimately the dependence of the index values of g_{k,m} on those of f_{i,j}, plays
a major role in the overall operation of the algorithm. Within the attached C program, two arrays,
Lx and Ly, are generated that store the mappings of output image indexes to input image indexes.
They are defined as

L[k] = floor(k / r)

where r is the magnification factor and k is iterated from zero to the desired output dimension
(height or width). An intermediate array is defined to temporarily store values from the
evaluation of Equation 2. This is defined as

h_{k,j} = H_j(k / r)    Eq. 5

After creating look-up tables for the values of Equation 1, Equation 5 can be written as

h_{k,j} = f_{Ly[k],j} cy3[k] + f_{Ly[k]+1,j} cy2[k] + f_{Ly[k]+2,j} cy1[k] + f_{Ly[k]+3,j} cy0[k]    Eq. 6

The output image can then be found by

g_{k,m} = h_{k,Lx[m]} cx3[m] + h_{k,Lx[m]+1} cx2[m] + h_{k,Lx[m]+2} cx1[m] + h_{k,Lx[m]+3} cx0[m]    Eq. 7
The function bcsInterp() is a C function based on Equations 6 and 7. The inputs to
the function are an input image, a reference to a memory location for the output image, the input
dimensions, and the output dimensions. From Equations 6 and 7 it can be seen that if there is a main
loop indexed by k and subsidiary loops for j, l, and m, then the output image can be
determined as long as the output image has height greater than or equal to the width. The
function uses dynamic memory allocation to create a space in memory for the output. The
auxiliary function freeMDarrays() in appendix A was written to free the two dimensional array
output when no longer needed. Failure to free the memory may result in a memory leak, which
if unchecked in an iterative process will consume available memory.
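The look-up-table idea can be illustrated with a one-dimensional resampling sketch: for each output index k, L = floor(k/r) locates the input neighbourhood and the cubic polynomials of Equation 1 supply the four weights; applying this along rows and then columns gives the full bicubic resize. The neighbour offsets (L-1 through L+2), the border clamping, and the parameter value a = -0.5 are assumptions of this sketch, not details taken from the report's bcsInterp:

```c
/* One-dimensional sketch of the look-up-table resampling scheme.
 * Neighbour offsets, border clamping, and a = -0.5 are assumptions. */
void resample1D(const double *in, int nIn, double *out, int nOut)
{
    const double a = -0.5;          /* spline parameter (assumed) */
    double r = (double)nOut / nIn;  /* magnification factor */

    for (int k = 0; k < nOut; k++) {
        double x = k / r;
        int L = (int)x;             /* L[k] = floor(k/r) */
        double t = x - L;           /* fractional offset */

        /* The four cubic weights of Eq. 1; they sum to 1 for any t. */
        double w3 = a*t*t*t - 2*a*t*t + a*t;          /* C3(t) */
        double w2 = (a+2)*t*t*t - (a+3)*t*t + 1;      /* C2(t) */
        double w1 = -(a+2)*t*t*t + (2*a+3)*t*t - a*t; /* C1(t) */
        double w0 = -a*t*t*t + a*t*t;                 /* C0(t) */

        /* Clamp the four-sample neighbourhood at the borders. */
        int i0 = L - 1 < 0 ? 0 : L - 1;
        int i1 = L;
        int i2 = L + 1 < nIn ? L + 1 : nIn - 1;
        int i3 = L + 2 < nIn ? L + 2 : nIn - 1;

        out[k] = in[i0]*w3 + in[i1]*w2 + in[i2]*w1 + in[i3]*w0;
    }
}
```

Because the weights form a partition of unity, resizing a constant image leaves its values unchanged, and a magnification factor of 1 reproduces the input exactly.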
3.6 Normalized Cross-Correlation
To determine the correct playing card, some numeric quantifier is needed for any
intelligent front end to make a decision. The idea is to compare the input image with the template
and output a numeric value that serves as an estimate of the degree to which the template
resembles the image. Many correlation schemes would suffice; the most useful is one that
outputs a value between 0 and 1, with 0 being totally uncorrelated and 1 being a perfect match.
Mathematically, the normalized cross-correlation is given as

C(x, y) = [ sum_{y'=0}^{h-1} sum_{x'=0}^{w-1} T(x', y') I(x + x', y + y') ] /
          sqrt( [ sum_{y'=0}^{h-1} sum_{x'=0}^{w-1} T(x', y')^2 ] [ sum_{y'=0}^{h-1} sum_{x'=0}^{w-1} I(x + x', y + y')^2 ] )    Eq. 8

where (x', y') are the template, T, coordinates, (x, y) are the image, I, coordinates, and h and w
are the height and width of the template, respectively (Figure 4).
Figure 4. Coordinates illustration
The code that performs the normalized cross-correlation NcrossCorr( ) is basically the
power behind the entire implementation. Obviously, it will require a two-dimensional loop
structure with as few conditional checks as possible. The approach taken in the implementation
was to define the coordinates origin at the top-left corners of images and templates, while
proceeding to find the most efficient way to calculate the value.
From Equation 8, it can be seen from the summations that the loops iterate according
to the coordinates of T. That is simple enough, but what if the template and image are not of the
same size? An unfortunate limitation of C is that it does not perform bounds checking on mis-indexed
arrays (e.g., an array defined to have 100 elements can still be referenced at the 105th element
without an error being raised). This limitation poses the danger that a mis-indexed array access
will reach some critical point in memory and tamper with its contents, which could belong to another
variable or other program data. Three possible solutions are to make the template and the image
the same size and correlate once; to zero-pad either the template or the image enough that no
out-of-range index occurs when sliding one over the other (changing the values of (x, y)); or to
perform a conditional check on the coordinates at every iteration. For this implementation the first
solution was seen as the best method, provided that the image and template could be scaled to
approximately equal size. So, (x, y) was held static at (0, 0), which matches the top-left corners
of T and I, figure 5.
Figure 5. At (x,y)=(0,0) perfect overlap
This allows us to perform the normalized cross-correlation at one (x, y) location, rather than
multiple, and greatly reduces the number of multiply accumulate (MAC) operations performed
per input image.
The calculation within the body of the function NcrossCorr( ) can be broken up into three
MAC units that fit in succession within the loop structure, one from the numerator, and two from
the denominator. After the final iteration is complete, the square root and division operations
proceed. The output is returned to the calling function as a double between 0 and 1.
4. Test Results
The final program is able to recognize cards with an accuracy of 62% at a rate of around
0.1 seconds per card. This corresponds approximately to a video rate of 10 frames per second if
the program were interfaced to a video camera. The results are summarized in Table 1:
Table 1. Test Results of Card Recognition Program

Card Number    Recognition Result    Execution Time (s)
A              A                     0.11
2              2                     0.09
3              3                     0.10
4              4                     0.08
5              10 *                  0.09
6              6                     0.12
7              Q *                   0.10
8              8                     0.09
9              10 *                  0.10
10             A *                   0.10
J              J                     0.11
Q              A *                   0.09
K              K                     0.08

The recognition results marked with an asterisk (*) are the incorrect ones.
5. Conclusion
The goal of the project was to design and implement character recognition as the first
stage in developing a playing card recognizer. Testing resulted in correct recognition 62% of the
time. We postulate that the incorrect recognition of 5 of the 13 cards is due to distortion
added to the input image during rotation of the card.
Rotation is achieved using MATLAB’s imrotate function, which uses bicubic
interpolation to map the pixels of the input image to their new locations in the output image. The
amount of interpolation required depends on two factors: the quality of the input image and the
amount of shift needed to rotate that image into a standard alignment. The more interpolation
needed, the greater the amount of distortion added to the input image. The amount of distortion
added can be measured by comparing the variation in the size of the output image to the input
image. All input images were of a standard size, 480-by-640. The output images, however, vary
greatly in size due to the amount of interpolation performed on them. The images with the
greatest variance in both height and width dimensions (cards 5, 7, 9, 10, and Q) were the ones
that were incorrectly identified.
We believe this problem can be fixed in later stages by making the corner extraction
function more robust against added noise distortions. As implemented currently, the corner
extraction does a good but not optimal job of isolating the corner character by looking for the
“corner edges” of that character. When there is a lot of noise added to the image, these corner
edges can be incorrectly identified. A better way of isolating the corner character would be to
look for the biggest “blob” in the binary image of the extracted corner. This “blob” would be the
character we are trying to isolate and could be found by looking for the largest 8-connected
segment of pixels valued at 1. With a little more time, this change could be easily added to the
code.
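The proposed improvement could be sketched with an iterative flood fill over an 8-connected neighbourhood. The corner size and routine name below are hypothetical; this is not part of the current code:

```c
#define CH 80   /* corner image height (illustrative) */
#define CW 80   /* corner image width (illustrative)  */

/* Return the pixel count of the largest 8-connected blob of ones in
 * a binary corner image, using an explicit stack instead of
 * recursion. Visited pixels are cleared as they are counted. */
int largestBlob(unsigned char img[CH][CW])
{
    static int stack[CH * CW][2];   /* each pixel is pushed at most once */
    int best = 0;

    for (int r = 0; r < CH; r++) {
        for (int c = 0; c < CW; c++) {
            if (!img[r][c]) continue;
            int top = 0, size = 0;
            stack[top][0] = r; stack[top][1] = c; top++;
            img[r][c] = 0;      /* mark visited */
            while (top > 0) {
                top--;
                int pr = stack[top][0], pc = stack[top][1];
                size++;
                /* push all unvisited 8-connected neighbours */
                for (int dr = -1; dr <= 1; dr++)
                    for (int dc = -1; dc <= 1; dc++) {
                        int nr = pr + dr, nc = pc + dc;
                        if (nr >= 0 && nr < CH && nc >= 0 && nc < CW &&
                            img[nr][nc]) {
                            img[nr][nc] = 0;
                            stack[top][0] = nr; stack[top][1] = nc; top++;
                        }
                    }
            }
            if (size > best) best = size;
        }
    }
    return best;
}
```

Tracking which blob produced the best count (rather than just its size) would let the corner character be isolated directly, as the conclusion suggests.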
Other extensions to the project could be to pipeline the processing of the images – the
card recognition is currently done on an image-by-image basis – and to add an actual video
camera interface rather than doing the processing on stored images.
References
1. Aramini, M. J., "Efficient Image Magnification by Bicubic Spline Interpolation,"
http://www.ultranet.com/~aramini/design.html, 2002.
2. Ritter and Wilson, Computer Vision Algorithms in Image Algebra, CRC Press, NY, 2001.