A Design of Multi-input Associative Memory Array Processor for

advertisement
A Design of Pattern Recognition System based on the Cooperation between an
Associative Memory Array Processor and the Microcomputer
Abstract: This paper deploys the microcomputer with its available facilities as an input,
output, and control unit for an associative memory array processor for pattern recognition.
Today’s technology of microcomputers makes it easy to work with images. The major
difficulty is the time consumed to execute the learning and recognition processes of a
pattern. We will benefit from the Intel-based microcomputer facilities as capturing a live
video image and converting it into a graphics file, providing the array processor with the
necessary data to start either the learning or recognition processes through the
microcomputer output ports, writing control programs to control the array processor
operations, and finally getting the processing results from the array processor to display
them on the output devices of the microcomputer. The associative memory array
processor itself is considered as a type of single instruction stream, multiple data stream
machines. The machine is built around content-addressable associative memory. The
basic building block of the architecture is a one-bit processing element, which can
perform the load and comparison processes. The recognition is based on simple
template-matching algorithm. The bit-parallel organization is utilized to allow the
comparison process to be done in only one clock cycle. Character recognition will be
considered as an example. The system is able to recognize bit-mapped and outline fonts
which are not limited to specific sizes and attributes of a typeface. The architecture is of
interest because it eliminates the need for complex training algorithms that are usually
used by neural networks.
Index terms: Associative memory array processor, Bit-parallel organization, Pattern
recognition, Single-instruction multiple-data machines.
I. Introduction
Problems in computer vision are computationally intensive. A sequence of images at
medium resolution (512x512 pixels) and standard frame rate (30 frames per second) in
color (3 bytes per pixel) represents a rate of almost 24 million bytes of data per second.
A simple algorithm for feature extraction may require thousands of basic operations per
pixel, and a typical vision system requires significantly more complex computations [1].
In fact, the need to speed up image processing computations brought parallel processing
in the computer vision domain. Parallel processing reduces a task’s execution time by
solving multiple parts of the problem concurrently [2].
Image processing and low-level computer vision have several characteristics that make
them suitable for implementation with parallel computers among them: the operations are
regular, the algorithms are not complex and the programming is relatively simple, the
algorithms are stable, and images provide a natural source of parallelism [3,4].
Associative processing systems are parallel content-addressable memories which
compare data on the basis of their contents rather than their location. There are two main
classes of associative memories., the bit-parallel, and the bit-serial. In bit-parallel, the
comparison process is performed in parallel in a single clock cycle while in bit serial, one
bit slice at a time across all the words can be compared causing several clock-cycles to be
utilized [5,6]. The bit-serial organization is slower in speed than the bit-parallel
organization and requires less hardware [4,7,8,9,10].
This paper tries to strengthen and extend the power of traditional Intel-based
microcomputers and allow them to operate much faster in the operations of pattern
recognition and matching. We will introduce a design of an associative memory array
processor (AMAP) that can be used in character recognition as an example. The design
will be based on the bit-parallel organization. The AMAP consists of identical processing
elements (PE’s) that are capable of storing information about a specific position into the
pattern and also contains some logic gates to carry out logical operations. We considered
that the pattern is digitized and normalized to fill a size of ixj bits that represents the
image. Character recognition will be based on simple template-matching algorithm that
compares the pixel positions of the whole character to reference data.
The organization of the associative parallel computer system is described in Section II.
Section III presents the proposed architecture of the AMAP. This section also includes the
development of the logical structure of the processing element. Section IV summarizes
the operation of the system. Section V describes how to handle the overall operation of
the AMAP via the Intel-base microcomputer. It includes a detailed description of
providing the AMAP with the input image, controlling the pattern processing operation,
and getting back the results from the AMAP. Section VI describes a special purpose
language for the architecture. The compiler program of this language is written in C
language. Section VII illustrates a modification for the design of the input array. This
modification concerns with the pattern centralization operation. It is considered as one of
the most important operations in pattern preprocessing. Section VIII is assigned for
some concluding remarks.
Central Control Unit
Control Unit
Program
Memory
Slice Select Register
Input
Device
Input
Array
AMAP
Match
Register
Decoding Circuitry
Output Device
Fig.1 Parallel Organization
II. Organization
The bit parallel organization is the type utilized into this paper. The organization
is shown in Fig. 1. The organization consists of input unit, output unit, associative
memory array processor AMAP, and central control unit. The input unit has an input
device that accepts the input image (ixj pixels) . The input unit then sends the image data
to the input array that consists of ixj flip-flop. The input array holds the input data of the
image in order to pass it to the AMAP in parallel. The AMAP is a three-dimensional
array of identical processing elements PEs. Each PE is capable processing data.
Operations of data processing include loading and complex searches in parallel. The
AMAP can be considered to have n-slices each of ixj identical PEs. Since the system is
designed for pattern recognition then a learning phase must be applied prior to the
recognition process. After the learning operation, each slice holds reference data about a
specific object. During the recognition process, each slice decides whether the input
pattern matches its contents or not. The decision from each slice is passed directly to the
output unit which has the match register MR, decoding circuitry, and an output device.
The n-bit match register holds the decisions from all the slices. The pattern obtained into
the MR can be easily decoded to an equivalent pattern name via the utilization of a
special decoding circuitry that is responsible for converting the output pattern of the
match register to a standard ASCII code. The output of the decoding circuitry is then
forwarded to an output device to display the result. The central control unit consists of
the control unit, the program memory, and slice select register SSR. The control unit
reads the stored program in the memory and then issues control commands that control
the overall system activities. It ,for example, specifies the slice of the AMAP to which an
input pattern -stored in the input array- will be directed. This is done by providing the
SSR with an appropriate pattern. The SSR selects a slice to work with during the load
(learning) operation.
III. Architecture
The proposed AMAP consists of n-slices as shown in Fig. 2. Each slice is a gridlike structure of ixj identical cells. The n-bit SSR is responsible for selecting a specific
slice during the learning process. This can be done by activating the bit into the SSR that
corresponds to the slice (S0 for Slice 0, S1 for slice 1, ...etc.). The n-bit match register
holds the result of the comparison process. If there is a slice that has a pattern matches
with the input pattern then the corresponding bit into the match register will be set to 1.
The unit cell of the AMAP is a one-bit processing element (PE) which can perform the
load function and also contains some logic gates to enable its stored bit to be compared
with the corresponding bit of the input array.
SSR
MR
S0
S1
S2
M0
M1
M2
PE PE PE
1,1 1,2 1,3
... P E1,j
PE
PE
2,1 2,2
PE
3,1
Slice n
Sn
PE
2,j
Slice 2
Slice 1
Slice 0
..
.
Pi,1
E
PE PE
i,2 i,3
...
PE
i,j
Mn
Fig.2 AMAP Architecture
The PE is capable of storing sufficient information about the bit-pattern it corresponds.
When different typefaces of the same character are applied, a specific bit into the pattern
may or may not vary. If it is varied, the corresponding PE should report that “don’t care”
about this bit. If this bit has not changed during all typefaces, the corresponding PE
should know precisely whether it is 0 or 1. This means that each PE should store
information about only one state from the previously described 3 states “1,0, and don’t
care.” Another initial state has been specified to be applied to each PE prior to the load
operation of different typefaces.
START
I
F
PE
LOAD
COMP
[a]
D
A
A
E
Clear
I
COMP
START
LOAD
F
E
D
Clear
Q
Q
[b]
Fig.3 Logical Structure of the PE [a] Block Diagram, and [b] Internal logical structure
The PE should be also capable of performing comparison process therefore, it
should contain some logic gates in addition to some storage elements to support the
process. Two separate signals “LOAD” and “COMP” are generated by the control unit
and are fed to the PE to distinguish between load and compare operations. Considering
the logical operation of the proposed PE, the internal logical structure of it can be easily
constructed as shown in Fig. 3. The proposed PE has 2 flip flops Q and A. Q flip-flop is
set to 1 if a specific position into all typefaces of the same sample is 1 while A flip-flop is
set to 1 if a specific position into all typefaces of the same sample is 0. Any other case
will cause Q and A to be set to 1. The state diagram of the PE is shown in Fig. 4.
I=0
QA
01
I=1
I=0
QA
00
START
QA
11
I=x
I=1
QA
10
I=0
I=1
Fig.4 State Diagram of the PE
IV Operation
The operation of the PE can be divided into two processes: “learning process” and
“recognition process.” During Learning process, different typefaces of the same image or
character are fed sequentially to the AMAP. The control unit executes a learning
operation program. This program down loads the contents of the binary pattern of the
input array. It also sends a pattern to the SSR to select a slice from the AMAP. This slice
will maintain reference data about a specific character. Different typefaces of the same
character are fed to the same slice. The learning process can start for example with
entering typefaces of character A. This means that the first slice in the AMAP will
contain a reference pattern about character A. All the PE’s simultaneously change their
logical contents. Table I illustrates the expected next state of both flip-flops of the PE as
the input pattern from the input array changes. The operation will continue until each
slice of the AMAP has reference data about a specific character. Both flip-flops of the
PE change their logical contents according to the two logical expressions:
Qi,jt+1 = Ii,j + Qi,jt
Ai,jt+1 = I’ i,j + Ai,jt
subject to the logical condition: LOAD = 1
Table I. Learning Process
Qt
At
I
Qt+1
At+1
0
0
0
0
1
1
1
1
0
0
1
1
0
0
1
1
0
1
0
1
0
1
0
1
0
1
0
1
1
1
1
1
1
0
1
1
1
0
1
1
The recognition process can be summarized in the following steps:
1- A character or a typeface is read with a camera or scanner, digitized, and then
normalized in size to fill ixj pixel binary image.
2- This binary image is stored into the input array.
3- COMP signal is set to 1 and the contents of the input array are allowed to be compared
with the contents of the corresponding PE’s.
4- A decision has to be taken whether that input pattern matches with any of the AMAP
patterns or not. This is done according to the logical expression:
F = (A . I’) + (Q . I)
subject to the logical condition: COMP = 1
5- F output of each PE is set to 1 in case of matching or don’t care and 0 otherwise.
6- All F outputs of a slice processing elements are ANDed together and the result is fed
directly to the corresponding bit in the match register. Fig. 5 shows the AND operation
for all the F-outputs of slice n. All the slices provide their corresponding bit to the match
register simultaneously (slice 0 to M0, slice 1 to M1 ...etc.). In this way we can obtain a
specific pattern into the MR. This pattern indicates which slice pattern is matched with
the input pattern. A decoding circuitry is then utilized to convert the pattern of the MR
into an equivalent ASCII code to be displayed on an output device. Table II summarizes
the recognition process.
Table II. Recognition Process
Q
A
I
F
0
0
0
0
1
1
1
1
0
0
1
1
0
0
1
1
0
1
0
1
0
1
0
1
x
x
1
0
0
1
1
1
AND Gate
Slice n
PE
PE PE
1,1 1,2 1,3
...
PE PE
2,1 2,2
PE
3,1
..
.
PE
i,1
PE PE
i,3
i,2
...
PE
1,j
F1,1
F1,2
PE
2,j
.
.
.
.
PE
i,j
Match Register
M0
M1
M2
Fi,j
Mn
Fig.5 And Operation for all the outputs from slice n
Download