A Design of Pattern Recognition System based on the Cooperation between an Associative Memory Array Processor and the Microcomputer Abstract: This paper deploys the microcomputer with its available facilities as an input, output, and control unit for an associative memory array processor for pattern recognition. Today’s technology of microcomputers makes it easy to work with images. The major difficulty is the time consumed to execute the learning and recognition processes of a pattern. We will benefit from the Intel-based microcomputer facilities as capturing a live video image and converting it into a graphics file, providing the array processor with the necessary data to start either the learning or recognition processes through the microcomputer output ports, writing control programs to control the array processor operations, and finally getting the processing results from the array processor to display them on the output devices of the microcomputer. The associative memory array processor itself is considered as a type of single instruction stream, multiple data stream machines. The machine is built around content-addressable associative memory. The basic building block of the architecture is a one-bit processing element, which can perform the load and comparison processes. The recognition is based on simple template-matching algorithm. The bit-parallel organization is utilized to allow the comparison process to be done in only one clock cycle. Character recognition will be considered as an example. The system is able to recognize bit-mapped and outline fonts which are not limited to specific sizes and attributes of a typeface. The architecture is of interest because it eliminates the need for complex training algorithms that are usually used by neural networks. Index terms: Associative memory array processor, Bit-parallel organization, Pattern recognition, Single-instruction multiple-data machines. I. Introduction Problems in computer vision are computationally intensive. A sequence of images at medium resolution (512x512 pixels) and standard frame rate (30 frames per second) in color (3 bytes per pixel) represents a rate of almost 24 million bytes of data per second. A simple algorithm for feature extraction may require thousands of basic operations per pixel, and a typical vision system requires significantly more complex computations [1]. In fact, the need to speed up image processing computations brought parallel processing in the computer vision domain. Parallel processing reduces a task’s execution time by solving multiple parts of the problem concurrently [2]. Image processing and low-level computer vision have several characteristics that make them suitable for implementation with parallel computers among them: the operations are regular, the algorithms are not complex and the programming is relatively simple, the algorithms are stable, and images provide a natural source of parallelism [3,4]. Associative processing systems are parallel content-addressable memories which compare data on the basis of their contents rather than their location. There are two main classes of associative memories., the bit-parallel, and the bit-serial. In bit-parallel, the comparison process is performed in parallel in a single clock cycle while in bit serial, one bit slice at a time across all the words can be compared causing several clock-cycles to be utilized [5,6]. The bit-serial organization is slower in speed than the bit-parallel organization and requires less hardware [4,7,8,9,10]. This paper tries to strengthen and extend the power of traditional Intel-based microcomputers and allow them to operate much faster in the operations of pattern recognition and matching. We will introduce a design of an associative memory array processor (AMAP) that can be used in character recognition as an example. The design will be based on the bit-parallel organization. The AMAP consists of identical processing elements (PE’s) that are capable of storing information about a specific position into the pattern and also contains some logic gates to carry out logical operations. We considered that the pattern is digitized and normalized to fill a size of ixj bits that represents the image. Character recognition will be based on simple template-matching algorithm that compares the pixel positions of the whole character to reference data. The organization of the associative parallel computer system is described in Section II. Section III presents the proposed architecture of the AMAP. This section also includes the development of the logical structure of the processing element. Section IV summarizes the operation of the system. Section V describes how to handle the overall operation of the AMAP via the Intel-base microcomputer. It includes a detailed description of providing the AMAP with the input image, controlling the pattern processing operation, and getting back the results from the AMAP. Section VI describes a special purpose language for the architecture. The compiler program of this language is written in C language. Section VII illustrates a modification for the design of the input array. This modification concerns with the pattern centralization operation. It is considered as one of the most important operations in pattern preprocessing. Section VIII is assigned for some concluding remarks. Central Control Unit Control Unit Program Memory Slice Select Register Input Device Input Array AMAP Match Register Decoding Circuitry Output Device Fig.1 Parallel Organization II. Organization The bit parallel organization is the type utilized into this paper. The organization is shown in Fig. 1. The organization consists of input unit, output unit, associative memory array processor AMAP, and central control unit. The input unit has an input device that accepts the input image (ixj pixels) . The input unit then sends the image data to the input array that consists of ixj flip-flop. The input array holds the input data of the image in order to pass it to the AMAP in parallel. The AMAP is a three-dimensional array of identical processing elements PEs. Each PE is capable processing data. Operations of data processing include loading and complex searches in parallel. The AMAP can be considered to have n-slices each of ixj identical PEs. Since the system is designed for pattern recognition then a learning phase must be applied prior to the recognition process. After the learning operation, each slice holds reference data about a specific object. During the recognition process, each slice decides whether the input pattern matches its contents or not. The decision from each slice is passed directly to the output unit which has the match register MR, decoding circuitry, and an output device. The n-bit match register holds the decisions from all the slices. The pattern obtained into the MR can be easily decoded to an equivalent pattern name via the utilization of a special decoding circuitry that is responsible for converting the output pattern of the match register to a standard ASCII code. The output of the decoding circuitry is then forwarded to an output device to display the result. The central control unit consists of the control unit, the program memory, and slice select register SSR. The control unit reads the stored program in the memory and then issues control commands that control the overall system activities. It ,for example, specifies the slice of the AMAP to which an input pattern -stored in the input array- will be directed. This is done by providing the SSR with an appropriate pattern. The SSR selects a slice to work with during the load (learning) operation. III. Architecture The proposed AMAP consists of n-slices as shown in Fig. 2. Each slice is a gridlike structure of ixj identical cells. The n-bit SSR is responsible for selecting a specific slice during the learning process. This can be done by activating the bit into the SSR that corresponds to the slice (S0 for Slice 0, S1 for slice 1, ...etc.). The n-bit match register holds the result of the comparison process. If there is a slice that has a pattern matches with the input pattern then the corresponding bit into the match register will be set to 1. The unit cell of the AMAP is a one-bit processing element (PE) which can perform the load function and also contains some logic gates to enable its stored bit to be compared with the corresponding bit of the input array. SSR MR S0 S1 S2 M0 M1 M2 PE PE PE 1,1 1,2 1,3 ... P E1,j PE PE 2,1 2,2 PE 3,1 Slice n Sn PE 2,j Slice 2 Slice 1 Slice 0 .. . Pi,1 E PE PE i,2 i,3 ... PE i,j Mn Fig.2 AMAP Architecture The PE is capable of storing sufficient information about the bit-pattern it corresponds. When different typefaces of the same character are applied, a specific bit into the pattern may or may not vary. If it is varied, the corresponding PE should report that “don’t care” about this bit. If this bit has not changed during all typefaces, the corresponding PE should know precisely whether it is 0 or 1. This means that each PE should store information about only one state from the previously described 3 states “1,0, and don’t care.” Another initial state has been specified to be applied to each PE prior to the load operation of different typefaces. START I F PE LOAD COMP [a] D A A E Clear I COMP START LOAD F E D Clear Q Q [b] Fig.3 Logical Structure of the PE [a] Block Diagram, and [b] Internal logical structure The PE should be also capable of performing comparison process therefore, it should contain some logic gates in addition to some storage elements to support the process. Two separate signals “LOAD” and “COMP” are generated by the control unit and are fed to the PE to distinguish between load and compare operations. Considering the logical operation of the proposed PE, the internal logical structure of it can be easily constructed as shown in Fig. 3. The proposed PE has 2 flip flops Q and A. Q flip-flop is set to 1 if a specific position into all typefaces of the same sample is 1 while A flip-flop is set to 1 if a specific position into all typefaces of the same sample is 0. Any other case will cause Q and A to be set to 1. The state diagram of the PE is shown in Fig. 4. I=0 QA 01 I=1 I=0 QA 00 START QA 11 I=x I=1 QA 10 I=0 I=1 Fig.4 State Diagram of the PE IV Operation The operation of the PE can be divided into two processes: “learning process” and “recognition process.” During Learning process, different typefaces of the same image or character are fed sequentially to the AMAP. The control unit executes a learning operation program. This program down loads the contents of the binary pattern of the input array. It also sends a pattern to the SSR to select a slice from the AMAP. This slice will maintain reference data about a specific character. Different typefaces of the same character are fed to the same slice. The learning process can start for example with entering typefaces of character A. This means that the first slice in the AMAP will contain a reference pattern about character A. All the PE’s simultaneously change their logical contents. Table I illustrates the expected next state of both flip-flops of the PE as the input pattern from the input array changes. The operation will continue until each slice of the AMAP has reference data about a specific character. Both flip-flops of the PE change their logical contents according to the two logical expressions: Qi,jt+1 = Ii,j + Qi,jt Ai,jt+1 = I’ i,j + Ai,jt subject to the logical condition: LOAD = 1 Table I. Learning Process Qt At I Qt+1 At+1 0 0 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 The recognition process can be summarized in the following steps: 1- A character or a typeface is read with a camera or scanner, digitized, and then normalized in size to fill ixj pixel binary image. 2- This binary image is stored into the input array. 3- COMP signal is set to 1 and the contents of the input array are allowed to be compared with the contents of the corresponding PE’s. 4- A decision has to be taken whether that input pattern matches with any of the AMAP patterns or not. This is done according to the logical expression: F = (A . I’) + (Q . I) subject to the logical condition: COMP = 1 5- F output of each PE is set to 1 in case of matching or don’t care and 0 otherwise. 6- All F outputs of a slice processing elements are ANDed together and the result is fed directly to the corresponding bit in the match register. Fig. 5 shows the AND operation for all the F-outputs of slice n. All the slices provide their corresponding bit to the match register simultaneously (slice 0 to M0, slice 1 to M1 ...etc.). In this way we can obtain a specific pattern into the MR. This pattern indicates which slice pattern is matched with the input pattern. A decoding circuitry is then utilized to convert the pattern of the MR into an equivalent ASCII code to be displayed on an output device. Table II summarizes the recognition process. Table II. Recognition Process Q A I F 0 0 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 1 0 1 0 1 0 1 x x 1 0 0 1 1 1 AND Gate Slice n PE PE PE 1,1 1,2 1,3 ... PE PE 2,1 2,2 PE 3,1 .. . PE i,1 PE PE i,3 i,2 ... PE 1,j F1,1 F1,2 PE 2,j . . . . PE i,j Match Register M0 M1 M2 Fi,j Mn Fig.5 And Operation for all the outputs from slice n