MODEL FOR COMMUNICATION WITH A COGNITIVE MEMORY

A Project

Presented to the faculty of the Department of Computer Science
California State University, Sacramento

Submitted in partial satisfaction of
the requirements for the degree of

MASTER OF SCIENCE

in

Electrical and Electronic Engineering

by

Avinash Kumar Pandey

FALL 2012

© 2012
Avinash Kumar Pandey
ALL RIGHTS RESERVED

MODEL FOR COMMUNICATION WITH A COGNITIVE MEMORY

A Project

by

Avinash Kumar Pandey

Approved by:

__________________________________, Committee Chair
V. Scott Gordon, Ph.D.

__________________________________, Second Reader
Dennis Dahlquist, P.E.

____________________________
Date

Student: Avinash Kumar Pandey

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and credit is to be awarded for the Project.

__________________________, Graduate Coordinator
Preetham Kumar, Ph.D.
Department of Electrical and Electronic Engineering

________________
Date

Abstract
of
MODEL FOR COMMUNICATION WITH A COGNITIVE MEMORY
by
Avinash Kumar Pandey

This project provides a simplified Verilog model that demonstrates the flow of communication with a neural network chip called the CM1K. In this communication model, the HDL design acts both as a data provider to the chip and as a controller that manages the communication flow. Hardware-level communication with a non-contemporary technology can be challenging, particularly for someone new to the technology. This project covers several topics needed for this kind of work. These include the CM1K technology, an HDL model for the controller, and the communication protocol. They also include the tool knowledge needed to create an HDL model, and simulation results from the ModelSim tool that demonstrate the operation of the HDL logic blocks. The HDL model is ASIC-agnostic and can serve as a reference for further research.

_______________________, Committee Chair
V. Scott Gordon, Ph.D.
_______________________
Date

ACKNOWLEDGEMENTS

I would like to extend my gratitude to my project advisor and committee chair, Dr. V. Scott Gordon, for his valuable help, timely guidance and moral support. I will always remember his positive and considerate attitude. His dedication to making students successful is incomparable and inspiring. I would like to thank him for thoroughly reviewing my work and giving me valuable feedback. I have learned many things from him and will carry them forward into my professional career.

I would also like to thank Dr. Dennis Dahlquist for spending his valuable time giving a second opinion on my project report. I thank him for his cooperation and for reviewing my report in a very timely manner.

I would like to thank Dr. Kumar for his guidance throughout my master's program. He is a great asset to all graduate students in the EEE department.

I would also like to thank the CogniMem team for their support, help and consideration. I would personally like to thank Mr. Bruce McCormick for his help and consideration. I would also like to mention Mr. Bill Nagel and extend my gratitude to him for his guidance and help. I would also like to thank Mr. Sam Miller from Ansync for providing me with the tools necessary to troubleshoot my design.

I would like to thank my friend Jay Panchal for encouraging me during frustrating moments of the project. I would also like to thank my roommate Ayush Chadha and my friend Sarjeet Goswami for their support and motivation throughout my master's program. Lastly, I would like to thank everyone who has helped me, directly or indirectly, in finishing my master's program.

Avinash Kumar Pandey

TABLE OF CONTENTS

Acknowledgement
List of Tables
List of Figures
Chapter
1. INTRODUCTION
   1.1 Overview
   1.2 Purpose of the Project
   1.3 Benefits of MCwCM
   1.4 Applications of the CM1K technology
2. BACKGROUND
   2.1 Overview
   2.2 Definitions
   2.3 CM1K Architecture
   2.4 CM1K Working and Applications
      2.4.1 Learning of a Neuron
      2.4.2 Current use of the Technology
3. INTRODUCTION TO LATTICE FPGA-LFXP2-5E-6TN144C AND CM1K-PGA69 MODULE
   3.1 Introduction
   3.2 Designing with an FPGA
      3.2.1 Lattice-XP2 family
      3.2.2 Programming a Lattice-XP2 FPGA
         3.2.2.1 Diamond Design Software and Design suite
   3.3 Design flow
   3.4 HDL-Verilog®
4. MODEL FOR COMMUNICATION WITH A COGNITIVE MEMORY (MCwCM)
   4.1 Introduction
   4.2 CM1K communication Protocol
   4.3 MCwCM components
   4.4 Signal description of MCwCM
   4.5 Communication protocols
   4.6 MCwCM Design Flow
   4.7 Simulation results
   4.8 MCwCM "write" operation
   4.9 MCwCM "write" operation
6. Conclusion
Appendix A. MCwCM design code in Verilog
Appendix B. MCwCM test-bench code in Verilog
References

LIST OF TABLES

1. Table 2a: Manhattan distance Vs. Norm L-sup
2. Table 2b: Classification Example
3. Table 3a: Basic physical attributes of LFXP2-5E-6TN144C FPGA device
4. Table 3b: Tools and Descriptions of the tools in Diamond Design suite
5. Table 4a: Neuron Registers-1
6. Table 4b: Neuron Registers-2
7. Table 4c: Neuron Registers-3

LIST OF FIGURES

1. Figure 2a: CM1K Introduction
2. Figure 2b: RBF Classifier class relationship
3. Figure 2c: Recognition time vs. Knowledge size graph
4. Figure 2d: CM1K Functional Diagram
5. Figure 2e: Methodology Flow Chart
6. Figure 3a: The Lattice XP2 Brevia Development kit
7. Figure 3b: XP2 on-chip component placement
8. Figure 3c: Typical design flow
9. Figure 3d: Typical design flow for an FPGA
10. Figure 4a: MCwCM Block diagram
11. Figure 4b: MCwCM signal representation
12. Figure 4c: MCwCM state diagram
13. Figure 4d: Waveform output-1
14. Figure 4e: Waveform output-2

Chapter 1
INTRODUCTION

Today's computers are based on the Von Neumann model of computer architecture. Scientists and researchers speculate that contemporary technologies are ultimately limited by the fact that most of their operations are performed serially. Issues with the Von Neumann architecture include scalability, cache coherency, clock synchronization and other problems related to shared memory. In the near future, tackling these challenges on a serial architecture will become increasingly difficult.

1.1 Overview

A parallel architecture can use contemporary technologies and still satisfy the need for speed and performance. A neural network is one example of a technology based on a parallel architecture. However, a software implementation of a neural network typically executes serially. It can therefore perform only as well as a serial system and is not truly parallel.
A hardware implementation of a neural network, however, can be truly parallel. Neural networks have been around for several decades, but hardware implementations of the technology are still an innovative concept. CogniMem Technologies Inc. (CTI) has designed a chip called the CM1K [7], which is based on a purely parallel architecture.

1.2 Purpose of the Project

Currently, the primary mode of communication with the CM1K is through software applications on predefined hardware platforms. The lack of direct hardware-level communication with the chip limits many applications and research opportunities. Hardware-level communication with a non-contemporary technology can be challenging, particularly for someone new to the technology. It requires, first and foremost, a clear understanding of the technology and, second, good knowledge of the communication protocols and specifications.

This project provides an understanding of the technology through a simplified model in Verilog® [24] and demonstrates a flow of communication between a CM1K chip and an FPGA [13]. Detailed documentation is provided so that the model can be reused in future research and projects. This communication model is called MCwCM (Model for Communication with Cognitive Memory). MCwCM consists of an HDL [3] (Hardware Description Language) design. The design acts as an external source of data for the CM1K chip and as the controller that administers the communication flow. The ModelSim® tool is used to demonstrate the logical operation of the design. It can drive a design under test (DUT) with test vectors and display the output as waveforms.

1.3 Benefits of MCwCM

The benefits of the system under discussion are:
1. It provides a deep understanding of the CM1K technology.
2. It serves as a basis for future projects and research on hardware implementations of neural networks.
3. It is a ready-to-use hardware-level communication model for evaluating the CM1K technology.

1.4 Applications of the CM1K technology

The following are a few fields of application in which this technology could prove revolutionary:
1. Data mining.
2. 3D graphic rendering.
3. Machine vision.
4. Pattern recognition.
5. Gesture recognition.
6. Optical character recognition.

Chapter 2
BACKGROUND

2.1 Overview

The human brain is the best processor known to humanity. It processes a huge amount of data from our surroundings and lets us respond based on past learning and experience. This "central processing unit" of the human body makes everyday tasks effortless. When we are conscious, the brain uses the cerebral cortex for its functionality [8]. The cerebral cortex comprises six layers, which work in tandem to maintain wakefulness and consciousness. The functional column of this multilayered organization is made up of a vertical chain of neurons [9]. Each functional column is a functional unit of the cerebral cortex and consists of 4000 neurons [8]. The brain connects thousands of neurons through billions of connections to provide generalization [10] and learning capabilities. It uses these capabilities to learn new models and retain that information. To respond reasonably, the brain needs to observe a model over a certain length of time. Its adaptability to new knowledge lets the brain self-learn and readjust the pre-existing knowledge in its Hierarchical Temporal Memory (HTM) [10].

In 1993, Guy Paillet, a French inventor and researcher, presented the concept of a self-organizing, trainable parallel neural network chip to IBM. He then worked with a team at the IBM lab in Essonnes, France, led by Pascal Tanhoff [1].
The outcome of this collaboration was an ASIC trademarked by IBM as the Zero Instruction Set Computer (ZISC) chip. Two generations of ZISC were released: the ZISC36 with 36 neurons and the ZISC78 with 78 neurons. In 2008, Anne Menendez, a researcher and technologist, and Guy Paillet designed the CM1K™. It is an advanced version of the ZISC78 with 1K (1024) neurons and some additional features.

2.2 Definitions

Every technology has a set of terms and keywords that are commonplace among its practitioners but unfamiliar to a new user. A novice therefore needs to understand these terms to get the most out of the CM1K technology. The most important keywords and terms used in this technology are:

1. Neurons: A neuron is a cognitive and reactive memory that can autonomously evaluate the distance between an incoming pattern and a reference pattern stored in its memory. If this distance falls within a range called the influence field, the neuron returns a positive classification. This classification consists of the distance value and the category of its reference pattern [7].
2. Vectors: Each neuron in a CM1K chip can store up to 256 components, each 8 bits wide, for a total storage capacity of 256 bytes per neuron. An input destined for neuron storage is called an input vector, or simply a vector.
3. KNN: The k-nearest neighbor algorithm (k-NN/KNN) is a method for classifying objects based on the closest training examples in the feature space [7]. It is one of the classifiers in the CM1K technology.
4. RBF: RBF stands for Radial Basis Function, the other classifier used in the CM1K technology. An RBF network is a two-layer neural network in which each hidden unit implements a radial activation function. The output units implement a weighted sum of the hidden-unit outputs [4].
5. ZISC: ZISC stands for "Zero Instruction Set Computer", in contrast to RISC (Reduced Instruction Set Computer).
The concept of ZISC was presented by Guy Paillet in 1993 [1] and is the foundation of the CM1K technology.
6. Classifiers: In neural networks, a classifier is a method for determining which of a finite number of categories to assign to a set of input values. The CM1K technology offers two classifiers: KNN and RBF.
7. Distance: Distance quantifies how far an incoming vector pattern drifts from a vector pattern stored in a neuron.
8. Category: In the CM1K technology, a category is a value used to distinguish the patterns stored in the CM1K. One CM1K chip can differentiate 32768 different category values [31].
9. Learning a vector: A learning vector is an incoming vector sent with the intent of modifying the information in the neurons. A learning vector always follows the same "write" sequence: the COMP [7] register is written first, followed by the LCOMP [7] register, followed by the CAT (category) register.
10. Broadcasting a vector: A broadcast vector follows the same write sequence as a learning vector, except that nothing is written to the CAT register.
11. Committed neuron: In a CM1K system, a committed neuron is a neuron that has data stored in it along with an associated category.
12. Recognition of a vector: This process compares an incoming vector with the patterns stored in the neurons. If the incoming pattern matches a stored pattern, the recognition is successful. Otherwise, the result is either no recognition or uncertain recognition. There is also a false recognition state: the incoming pattern is recognized as one of the stored categories, but not the correct category for that particular pattern. For instance, suppose a CM1K system has been trained to recognize two fruits, an apple and an orange. To test the system, a test vector representing the pattern of an orange is broadcast.
If the system recognizes this incoming pattern as an apple instead of an orange, this recognition state is referred to as a "false recognition".
13. Degenerated neurons: When the Active Influence Field (AIF) of a neuron falls below its Minimum Influence Field (MINIF), the CM1K chip raises a flag for the neuron. A neuron with this flag set is called a degenerated neuron.
14. Minimum area of influence: The minimum area of influence, or Minimum Influence Field, is abbreviated MinIf (MINIF). It sets the lower limit of a neuron's influence field. It is usually set to two, but can go as low as zero.
15. Maximum area of influence: The maximum area of influence, or Maximum Influence Field, is abbreviated MaxIf (MAXIF). It sets the upper limit of a neuron's influence field. 65536 is the maximum value that can be set as MaxIf.
16. Firing state: A neuron is in the firing state when the distance between an input vector and its stored pattern is less than the neuron's influence field.
17. RTL: In the CM1K technology, RTL stands for Ready-To-Learn neuron. In HDL design, RTL stands for Register-Transfer Level.
18. Bi-Directional Bus: "Bus" here means a channel for data exchange. A bi-directional bus is a single channel that can both transmit and receive data on the same physical connection.

2.3 CM1K Architecture

The CM1K has a massively parallel, hardwired architecture with non-linear classifiers. It consists of a chain of identical cells (i.e., neurons). These cells are addressed in parallel and have their own "genetic" material to store, learn and recall patterns [6]. The chip needs no additional code or external control unit to store, learn or recognize a pattern. Each neuron reports to a supervision unit autonomously.
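Definitions 2, 8, 9 and 10 together describe what a controller must send to the chip. The Python sketch below illustrates this by listing the register writes for a learning or broadcast vector and checking the vector limits.

- The register names COMP, LCOMP and CAT come from the text.
- The split is an assumption: every component except the last goes to COMP, and the last one goes to LCOMP.
- The function name and the tuple format are hypothetical.
- Register addresses and bus timing are not modeled.

The constants at the top also work out the storage capacity. 256 one-byte components per neuron, times 1024 neurons, gives 262,144 bytes (256 KB) per chip.

```python
from typing import List, Optional, Tuple

MAX_COMPONENTS = 256   # components per neuron (definition 2)
COMPONENT_MAX = 0xFF   # each component is 8 bits wide
CATEGORY_MAX = 32767   # 32768 distinct category values (definition 8)

BYTES_PER_NEURON = MAX_COMPONENTS       # one byte per 8-bit component
CHIP_BYTES = 1024 * BYTES_PER_NEURON    # 1024 neurons per CM1K chip

Write = Tuple[str, int]  # hypothetical format: (register name, value written)


def vector_writes(vector: List[int], category: Optional[int] = None) -> List[Write]:
    """Return the ordered register writes for one vector.

    With a category this is a learning vector (it ends with a CAT write);
    without one it is a broadcast vector. Assumption: every component
    except the last is written to COMP and the last one to LCOMP.
    """
    if not 1 <= len(vector) <= MAX_COMPONENTS:
        raise ValueError("a vector has 1 to 256 components")
    if any(not 0 <= c <= COMPONENT_MAX for c in vector):
        raise ValueError("each component must fit in 8 bits")
    writes: List[Write] = [("COMP", c) for c in vector[:-1]]
    writes.append(("LCOMP", vector[-1]))
    if category is not None:
        if not 0 <= category <= CATEGORY_MAX:
            raise ValueError("category out of range")
        writes.append(("CAT", category))
    return writes
```

For example, `vector_writes([10, 20, 30], category=2)` yields `[("COMP", 10), ("COMP", 20), ("LCOMP", 30), ("CAT", 2)]`. The same call without a category is a broadcast and omits the final CAT write.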
Figure 2a: CM1K Introduction [6]

Figure 2b: RBF Classifier class relationship

Figure 2b shows the relationship between a two-dimensional non-linear decision space and the RBF classifier. The CM1K's RBF network is highly adaptive and capable of real-time reinforced learning. Its architecture checks each incoming vector for novelty and commits new neurons as new models are needed. The neurons collaborate through a bi-directional parallel bus, which provides accurate recognition in real time and in an adaptive setting [6]. The CM1K neural network is a Radial Basis Function (RBF) classifier with a built-in model generator. This classifier can efficiently cope with disjoint, embedded and overlapping classes.

The CM1K can also work in K-Nearest Neighbor (KNN) mode. KNN is one of the most effective classification algorithms. Used well, it can serve applications such as data mining, computational geometry and human vision with great simplicity and accuracy. Because of its parallel, memory-based architecture, the CM1K does not need the Von Neumann fetch-and-decode model. As a result, its execution time is independent of the number of trained examples. Execution time depends only on the number of input vectors, the length of each input vector and, in KNN mode, the value of K.

Figure 2c: Recognition time vs. Knowledge size graph [6]

The chip embeds a recognition engine that classifies digital signals received directly from a sensor. To enable this feature, the user must enable the RECO_EN [2] signal, which activates the chip's built-in recognition stage. When enabled, the recognition stage acts as a master controller. The V_DATA [2] bus is accessible only when both RECO_EN and VI_EN [2] are enabled.
V_DATA can be used to feed a video input directly to the CM1K chip. The chip then uses its internal recognition logic to process the information and produce an output.

Figure 2d: CM1K Functional Diagram [31]

Figure 2d shows a block diagram of the CM1K architecture. The chip consists of the following modules:
1. Top Control logic (NSR and RSR registers, Ready and Busy control signals)
2. Cluster of 16 Neurons
3. Recognition Stage (optional usage)
4. I2C slave (optional usage)

Top Control logic: This module synchronizes communication among the clusters of neurons, the recognition state machine and the I2C slave. It is also responsible for inter-module and inter-neuron communication. Inter-module communication uses a bi-directional parallel bus of 25 wires. The named signals are Data Strobe (DS), Read/Write (RW_), the 5-bit Register bus (REG), the 16-bit Data bus (DATA) and Ready (RDY). Inter-neuron communication uses two additional lines that indicate the global status of the neural network: Identified recognition (ID) and Uncertain recognition (UNC). External control units can communicate with the chip through the same parallel bus or through the serial I2C bus [2].

Cluster of 16 Neurons: A cluster consists of 16 identical neurons operating in parallel. A neuron's behavior is independent of its parent cluster or chip. All neurons, in the same cluster or in different clusters, behave identically and execute instructions in parallel, with no controller or supervisor required. This module is responsible for selecting one of the two classifiers, KNN or RBF [2].

Recognition stage (optional usage): Enabling the RECO_EN pin on the CM1K chip enables the recognition stage. This can also be done programmatically through a control command. This module recognizes incoming vectors and provides a response [2]. It becomes the master controller of the CM1K chip once RECO_EN is enabled.
I2C slave controller (optional usage): This module is enabled physically with the I2C_EN pin. It receives serial signals on the I2C_clk and I2C_DATA lines and converts them into a combination of DS, RW_, REG and DATA signals compatible with the parallel neuron bus [2].

2.4 CM1K Working and Applications

The CM1K emulates the brain to some extent. It receives information in the form of input vectors. A neuron reacts to a command and either learns the pattern or works in tandem with other neurons to recognize it. A neuron's memory consists of 256 components, each 8 bits wide, so each neuron has 256 bytes of storage. Each CM1K contains 1024 such neurons, giving 1024 × 256 bytes (256 KB) of memory for a system with one CM1K chip. Each neuron has its own processing unit that reacts to an incoming pattern. When a vector is broadcast to the CM1K chip, all neurons receive the information at once. When recognizing a pattern, each neuron can monitor the responses of the other neurons. It withdraws from the race if another neuron reports a smaller distance value [7].

In the CM1K, the Distance Evaluation Unit (DEU) calculates the distance between the incoming vector and the pattern stored in each neuron. The DEU can use either the Manhattan distance (also known as Norm L1) or the Norm L-Sup (also known as Lsup). Both have advantages and disadvantages. The following table states some features of the two methods:

                         Manhattan distance (Norm L1)                Norm L-Sup
Mathematical equation    $D_{L1} = \sum_{i} |V_i - P_i|$             $D_{Lsup} = \max_{i} |V_i - P_i|$
Emphasis                 The sum of the drifts of all components     The largest drift of any single component
                         between the incoming vector V and the       between the input vector V and the
                         stored pattern P.                           stored pattern P.
Table 2a: Manhattan distance Vs. Norm L-sup

In the CM1K, a neuron "fires" if the distance between the input vector and its stored pattern is less than its influence field. This state is called the "firing state". Firing is a neuron's way of signaling that it holds information related to the vector being broadcast on the bus. A firing neuron carries several pieces of information, the most important of which is its associated category. The system multiplexes the responses of the firing neurons to produce a classification. The CM1K reports this classification as either an Identified or an Uncertain recognition.

Consider an example. A system has four trained neurons, each associated with either category 1 or category 2. An input vector similar to the knowledge stored in the neurons is broadcast. Depending on the multiplexed result of the neurons' responses, the system determines whether the input vector was recognized or not. The following table shows the possible outcomes:

Neurons associated with Category 1    Neurons associated with Category 2    Classification
3                                     1                                     Identified as category 1
2                                     2                                     Uncertain recognition
1                                     3                                     Identified as category 2
Table 2b: Classification Example

2.4.1 Learning of a Neuron

Learning in neurons is a multi-step process. First, an input vector is broadcast on the bus, followed by a category value. Second, each neuron responds to the broadcast vector according to its state: either Ready-To-Learn (RTL) or firing. If a firing neuron responds to the learning operation, its influence field is modified with the latest information. If no neuron fires in response to the broadcast vector, a new neuron is committed with the category of the input vector. The influence field of this new neuron is set to the maximum influence field.
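The following Python sketch ties together the distance norms of Table 2a, the firing rule, the outcomes of Table 2b and the learning steps above. It is a software illustration, not the chip's logic. It loops over neurons one after another, whereas the CM1K evaluates all of them at once. All names are hypothetical, and several rules are assumptions:

- The majority vote in `classify` was chosen only because it reproduces the three rows of Table 2b. It is not the chip's documented ID/UNC logic.
- The learning rule follows the common RCE-style scheme rather than a documented CM1K specification. A wrongly firing neuron's field shrinks to its distance. A new neuron's field starts at MAXIF and is capped by the nearest wrong distance.
- The `knn` helper illustrates the KNN mode of Section 2.3 by ranking neurons by distance.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Norm = Callable[[Sequence[int], Sequence[int]], int]


def l1(v: Sequence[int], p: Sequence[int]) -> int:
    """Manhattan distance (Norm L1): sum of |Vi - Pi| (equal lengths assumed)."""
    return sum(abs(a - b) for a, b in zip(v, p))


def lsup(v: Sequence[int], p: Sequence[int]) -> int:
    """Norm L-sup: largest single |Vi - Pi| (equal lengths assumed)."""
    return max(abs(a - b) for a, b in zip(v, p))


@dataclass
class Neuron:
    pattern: List[int]
    category: int
    aif: int                   # active influence field
    degenerated: bool = False  # set when the field would drop below MINIF


def fire(neurons: List[Neuron], v: Sequence[int], norm: Norm) -> List[Tuple[int, Neuron]]:
    """Return (distance, neuron) for every neuron in the firing state."""
    hits: List[Tuple[int, Neuron]] = []
    for n in neurons:            # serial here; parallel on the chip
        d = norm(v, n.pattern)
        if d < n.aif:            # definition 16: distance below influence field
            hits.append((d, n))
    return hits


def classify(categories: List[int]) -> Tuple[str, Optional[int]]:
    """Majority vote over firing categories (assumed; matches Table 2b)."""
    if not categories:
        return ("unknown", None)
    ranked = Counter(categories).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ("uncertain", None)
    return ("identified", ranked[0][0])


def knn(neurons: List[Neuron], v: Sequence[int], k: int, norm: Norm = l1) -> List[Tuple[int, int]]:
    """(distance, category) of the k closest neurons, ignoring influence fields."""
    return sorted((norm(v, n.pattern), n.category) for n in neurons)[:k]


def learn(neurons: List[Neuron], v: Sequence[int], category: int, norm: Norm,
          maxif: int, minif: int = 2) -> None:
    """Assumed RCE-style learning step for one (vector, category) pair."""
    hits = fire(neurons, v, norm)
    wrong = [(d, n) for d, n in hits if n.category != category]
    for d, n in wrong:
        # Shrink the field so n stops firing on v; it cannot go below MINIF,
        # and a neuron that would need to is flagged as degenerated.
        n.aif = max(d, minif)
        n.degenerated = n.degenerated or d < minif
    if not any(n.category == category for _, n in hits):
        # Commit a new neuron: MAXIF when nothing fired, otherwise capped
        # by the distance to the nearest wrongly firing neuron.
        aif = max(minif, min([maxif] + [d for d, _ in wrong]))
        neurons.append(Neuron(list(v), category, aif))
```

For example, start with an empty network and `maxif=100`. Learning `[10, 10]` as category 1 and then `[12, 10]` as category 2 commits two neurons and shrinks the first neuron's field to 2. Broadcasting `[10, 10]` afterwards then makes only the category-1 neuron fire.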
The neurons in the CM1K are connected through a bi-directional parallel bus that prevents redundant learning in the system. This bus is accessible to all neurons. Whenever an input vector is broadcast on the neuron bus, each neuron checks its status and responds to the broadcast accordingly.

2.4.2 Current use of the Technology

The CM1K's pattern recognition capabilities have benefited many fields. One practical implementation is offshore fish recognition and sorting. This application uses the V1KU [1], which integrates a CMOS sensor that provides image/video input vector data and a CM1K chip for pattern recognition. However, the system relies on a software tool for communication with the device. The following block diagram explains the process involved in this application.

Figure 2e: Methodology Flow chart [11]

This system depends on a software interface for its operation. An end user can communicate with it only through Image Knowledge Builder (IKB), a software tool. This rules out direct hardware-level communication with the CM1K chip. IKB provides a model for communicating with the chip at the software level. This model offers good prospects for software applications but only limited access at the hardware level. The software and a USB port handle all the communication protocols in IKB. This makes it difficult to determine the exact functionality of the chip and limits exploration of the CM1K technology. Hence the need for a hardware communication model, which would help an end user understand the technology to the core. Such a model has several benefits. For example, it speeds up data exchange by cutting out the USB overhead. A software communication model always carries the overhead of some standard protocol, USB in the case of IKB.
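A hardware communication model of the kind argued for above drives the chip's parallel bus directly. The Python sketch below illustrates one bus transaction. It packs the transaction into the signal fields named in Section 2.3 (DS, RW_, 5-bit REG, 16-bit DATA) and checks RDY before each strobe against a toy register file.

The following details are assumptions, not taken from the CM1K datasheet [2]:
- the polarity RW_ = 0 for a write and RW_ = 1 for a read;
- the DS pulse;
- the `BusCycle`, `make_cycle` and `ToyChip` names.

The register numbers used in the example are arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, Optional

REG_BITS = 5    # width of the REG bus
DATA_BITS = 16  # width of the DATA bus


@dataclass(frozen=True)
class BusCycle:
    """Values the controller drives for one transaction on the parallel bus."""
    ds: int               # data strobe, pulsed high to start the transaction
    rw_: int              # assumed polarity: 0 = write, 1 = read
    reg: int              # 5-bit register number
    data: Optional[int]   # 16-bit value on a write; None on a read


def make_cycle(reg: int, data: Optional[int] = None) -> BusCycle:
    """Build a write cycle when data is given, otherwise a read cycle."""
    if not 0 <= reg < (1 << REG_BITS):
        raise ValueError("REG is 5 bits wide")
    if data is not None and not 0 <= data < (1 << DATA_BITS):
        raise ValueError("DATA is 16 bits wide")
    return BusCycle(ds=1, rw_=1 if data is None else 0, reg=reg, data=data)


class ToyChip:
    """Hypothetical stand-in for the chip: a register file and an RDY line."""

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {}
        self.rdy = 1

    def strobe(self, cycle: BusCycle) -> Optional[int]:
        """Apply one cycle; returns the register value on a read."""
        if not self.rdy:
            raise RuntimeError("wait for RDY before strobing DS")
        if cycle.rw_ == 0:
            assert cycle.data is not None
            self.regs[cycle.reg] = cycle.data
            return None
        return self.regs.get(cycle.reg, 0)
```

For example, `chip.strobe(make_cycle(3, 0x00AB))` performs a write. A following `chip.strobe(make_cycle(3))` reads the value back. Bus timing, the ID/UNC lines and the real register map are left out on purpose.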
Chapter 3
INTRODUCTION TO LATTICE FPGA-LFXP2-5E-6TN144C AND CM1K-PGA69 MODULE

3.1 Introduction

The two major pieces of hardware in this project are an FPGA from Lattice Semiconductor™ and a CM1K-PGA69 module from CogniMem Technologies Inc. This chapter introduces both hardware modules. It also discusses the other tools needed for this project, along with a brief description of Verilog, a Hardware Description Language (HDL).

3.2 Designing with an FPGA

There are three main categories of Field Programmable Devices (FPDs): Simple Programmable Logic Devices (SPLDs), Complex Programmable Logic Devices (CPLDs) and Field Programmable Gate Arrays (FPGAs) [13]. Each has advantages and disadvantages, and covering all of them is beyond the scope of this report. This section gives a brief overview of FPGAs and then discusses the Lattice-XP2 FPGA architecture in a little more detail.

The first type of user-programmable chip that could implement a logic circuit was the Programmable Read-Only Memory (PROM) [13]. Since then, FPDs have evolved dramatically, and today one can find an FPD suitable for virtually every application.

In 1985, Xilinx introduced the concept of combining user control with the density and cost benefits of gate arrays, and named it the FPGA. An FPGA is a regular structure of Logic cells™ (or modules) and interconnects that is under the designer's complete control [15]. There are two types of FPGAs: Static Random Access Memory (SRAM)-based and one-time programmable (OTP). These types differ in the implementation of the logic cells™ and in the mechanism used to make connections in the device [15]. SRAM-based FPGAs are the most popular among designers because they can be reprogrammed.

FPGAs are among the fastest-growing segments of the semiconductor industry. There are several FPGA vendors in the market, and quality and features are the deciding factors in selecting an FPGA.
Xilinx, Altera, Lattice and Actel are among the top vendors in the industry. For this project, a LatticeXP2 family FPGA was chosen. CogniMem Technologies Inc. provided a Brevia Evaluation board [19] and a CM1K-PGA69 [16] module for this project.
3.2.1 LatticeXP2 family
LatticeXP2™ is an instant-on, secure, small-form-factor FPGA with a versatile development platform for quickly launching design initiatives and achieving rapid time-to-market [17]. These devices are based on the flexiFlash™ [18] architecture. This architecture combines a 4-input Look-up Table (LUT) based FPGA fabric with non-volatile Flash cells for on-chip storage of design data [17].
Figure 3a: The Lattice XP2 Brevia Development kit [19]
The LatticeXP2 Brevia Development kit includes the following [15]:
- an LFXP2-5E-6TN144C FPGA device;
- a 2 Mbit SPI Flash memory;
- a serial RS-232 interface;
- an on-board USB controller for JTAG programming (FTDI FT2232H);
- 2x20 and 2x5 expansion headers;
- a 4-bit DIP switch for user-defined inputs;
- 8 status LEDs for user-defined outputs.
The LFXP2-5E-6TN144C device belongs to the XP2 FPGA family and inherits its benefits. These include instant-on operation, infinite reconfigurability, on-chip storage with FlashBAK embedded block memory, and serial TAG memory with a design security feature [20].
Table 3a: Basic physical attributes of LFXP2-5E-6TN144C FPGA device [20] [19]
The LFXP2-5E-6TN144C FPGA device comes in a 144-pin package, and its core operates at 1.2 V. It provides five thousand (5k) LUTs as programmable resources for a designer. The device contains an array of logic blocks surrounded by Programmable I/O Cells (PICs). Rows of sysMEM™ Embedded Block RAM (EBR) and a row of sysDSP™ Digital Signal Processing blocks are interspersed between the rows of logic blocks, as shown in Figure 3b [20].
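The following is a minimal behavioral sketch of a 4-input LUT in Verilog, which makes the idea of a LUT-based fabric concrete. The module name lut4 and its INIT parameter are illustrative assumptions and are not taken from Lattice's primitive library. The point is only that a LUT is a 16 x 1 memory. Its contents (the truth table) are loaded at configuration time, and the four inputs address it.

```verilog
// Behavioral sketch of a 4-input look-up table (hypothetical module, not a
// Lattice library primitive). INIT holds the 16-entry truth table; the four
// inputs form the address of the bit that appears on the output.
module lut4 #(
    parameter [15:0] INIT = 16'h0000   // bit i is the output for input value i
) (
    input  wire a,                     // address bit 0
    input  wire b,                     // address bit 1
    input  wire c,                     // address bit 2
    input  wire d,                     // address bit 3
    output wire z                      // LUT output
);
    // Configuration contents of the 16x1 memory.
    wire [15:0] truth = INIT;

    // Reading the memory at address {d,c,b,a} implements any 4-input function.
    assign z = truth[{d, c, b, a}];
endmodule
```

For example, INIT = 16'h8000 makes the LUT a 4-input AND, because only address 15 stores a 1. INIT = 16'h6996 makes it a 4-input XOR. In practice the synthesis and mapping tools compute such INIT values from the RTL automatically.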
Figure 3b: XP2 on-chip component placement [20]
3.2.2 Programming a LatticeXP2 FPGA
Lattice Semiconductor™ is one of the most popular FPGA vendors in the semiconductor market. Its innovative architecture and programmable power-management and clock-management solutions make for an easy-to-use yet powerful device. Lattice also provides many free software tools to use with its devices for a complete design solution, and several of these tools are used in this project. The Lattice Diamond design software is one of the most important tools used in this project.
3.2.2.1 Diamond Design Software and Design Suite
Lattice Diamond® design software allows large and complex designs to be implemented efficiently in the LatticeXP2 family of FPGA devices. The Diamond software uses the synthesis tool output, along with the constraints from its floor-planning tools, to place and route the design in the LatticeXP2 device. The timing tool extracts the timing from the routing and back-annotates it into the design for timing verification [20].
Lattice Semiconductor™ provides customers with a free logic design environment called Lattice Diamond, which offers a complete platform for a design. A designer can choose Verilog® [24], VHDL [23] or a mix of both Hardware Description Languages (HDLs) [24] for a design. Diamond also allows EDIF [22] and schematic sources [22] to be mixed in. Writing pin constraints is easy with its Spreadsheet View [22]. A designer can install the latest version of the Diamond tool from the Lattice Semiconductor™ website; at the time of writing, the tool is available for the Linux and Windows platforms. Several other tools come as part of the Diamond suite. Some of these tools are optional and some are part of the Diamond software design environment. A few of them can be downloaded as stand-alone tools and used without installing the Diamond software.
Diamond Programmer [22] is one such tool: it is part of the Diamond design tool, but it can also be used stand-alone. This project uses the Diamond software extensively. The following table gives an overview of the tools that are part of the Diamond software; these tools are used for their respective purposes as needed throughout the project.
Tool: Description
Project Management Tools: Include the Reports view, Run Manager, and the Security Setting tool to enable you to create and maintain the project, keep track of the stages in the design implementation process, review reports, and compare different implementations of the project.
Design Entry Tools: Include Source Editor, Schematic Editor, Symbol Editor, Symbol Library Manager, IPexpress, Memory Generator, LatticeMico System, and HDL Diagram, which offer VHDL, Verilog, EDIF, schematic and mixed-mode design entry support and design structure check.
Design Simulation Tools: Include Simulation Wizard, Active-HDL Lattice Edition, and Waveform Editor for performing functional simulation for the projects and creating the test stimulus files.
Design Constraints Application Tools: Include Spreadsheet View, Package View, Device View, Netlist View, NCD View, Floorplan View and Physical View to enable you to set constraints for implementing the design.
Design Implementation Tools: Include the Process view, Synplify Pro for Lattice, and Clear Tool Memory to ease the design implementation process.
Analyzing Static Timing, Power Consumption, and Signal Integrity Tools: Include Timing Analysis View and Power Calculator to enable you to estimate the design performance, experiment with different configurations, and calculate power consumption.
Programming the FPGA Tool: Includes Programmer to let you program the FPGAs.
Testing and Debugging On-chip Tools: Include Reveal Inserter, Reveal Analyzer, and LatticeMico Debugger to let you complete the final stage of developing a design: testing in the actual FPGA, either on a test board or in your system.
Applying Engineering Change Order Tool: Includes ECO Editor, which supports engineering change orders by editing the output files from the place-and-route stage of the design implementation process.
EPIC Device Editor: Provides device editing capability for engineering change management and detailed manipulation of FPGA implementation.
HTML Help and User Documentation: Includes complete instructions for designing with Diamond design tools and third-party tools. Also provides user manuals, tutorials, example design projects, and access to technical documentation from the Lattice Semiconductor Web site.
Tcl/Tk Scripting Tool: Enables you to automate Diamond design processing.
Table 3b: Tools and Descriptions of the tools in Diamond Design suite [21].
3.3 Design flow
An HDL model typically follows a design flow. In general, industry follows this flow to tape out Application Specific Integrated Circuits (ASICs) or General Purpose Processors (GPPs). The flow starts with a design specification, followed by a behavioral description. Once the behavioral description is complete, the design is described at the Register Transfer Level (RTL) in an HDL. After the HDL code is written, functional verification is performed. If functional verification [27] fails, the RTL description is checked again to make sure that no mistakes have been made so far. If the design passes functional verification against the specification criteria, logic synthesis [25] and timing verification [26] are performed. A gate-level netlist is then generated, followed by logical verification and testing. If every step so far meets the specification criteria, floor planning and automatic place and route are performed. This leads to the physical layout, after which layout verification is done.
If the design passes all industry criteria and design specifications, the design is implemented. If any step fails, the flow returns to the RTL description step. The whole flow is then repeated until layout verification succeeds.
Figure 3c: Typical design flow [3] (Design Specification → Behavioral Description → RTL Description (HDL) → Functional Verification and Testing → Logic Synthesis and Timing Verification → Gate-Level Netlist → Logical Verification and Testing → Floor Planning and Automatic Place & Route → Physical Layout → Layout Verification → Implementation)
FPGA designs follow a similar approach, except that the design is implemented in the FPGA instead of as a physical chip. The result is functionally as good as an ASIC, but the design can be changed at any time for any good reason. Using an FPGA to pre-validate a design can save an innovative venture a great deal of time and money. The following figure shows a typical design flow for an FPGA.
Figure 3d: Typical design flow for an FPGA [29].
Although the design flows for Very Large Scale Integration (VLSI) Integrated Circuits (ICs) and FPGAs differ slightly, both use HDLs to describe digital circuits. Two basic IEEE-standard HDLs can be used to describe any digital circuit. The first is Verilog® [24]. The second is VHDL (VHSIC Hardware Description Language, where VHSIC stands for Very High Speed Integrated Circuit) [23]. This project uses Verilog to represent a digital model of communication between an FPGA and the cognitive memory chip CM1K.
3.4 HDL-Verilog®
In the early 1980s, the digital design market was on the verge of booming. Digital designers felt the need for a language that could translate an HDL description into a netlist [24, 25]. This need for a language to model digital circuits gave birth to Verilog in 1983 at Gateway Design Automation [3].
Verilog was accepted as an IEEE standard in 1995. A later standard, IEEE 1364-2001, introduced many significant changes [3]. Designers can now describe circuits at the Register Transfer Level (RTL) in Verilog. The designer describes only the data flow at RTL in an HDL, and the synthesis tool takes care of the rest [25]. The synthesis tool extracts the required information from the code and gives the designer the resulting gates and their interconnections [3, 25]. An HDL has many benefits over schematic-based digital design. One of the most valuable is that it makes the design independent of the fabrication technology: the same HDL logic design can be used to fabricate a circuit at 35 micrometers or at 32 nanometers. Modeling a design in HDL also saves a great deal of time and money compared with manufacturing a prototype and then troubleshooting it. Additionally, as digital circuits have grown in complexity, it has become virtually impossible for a human to design a circuit physically without an HDL and other associated tools [22].
Once HDL code is written, it must be verified and tested for its functionality. Various methods and tools are available to validate a Verilog design. This project uses ModelSim® to simulate and test the Verilog code. ModelSim is a good tool for students and professionals who need to simulate an HDL design, and it supports both Verilog and VHDL. Its easy-to-use graphical interface makes it one of the better choices for students starting with HDL simulation and verification. The Student Edition of this tool can execute up to 10,000 lines of code [30]. ModelSim helps diagnose a design by displaying its waveforms and data flow.
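The sketch below describes a divide-by-two clock generator at RTL, as a small illustration of RTL description and simulation. The module name clk_div2 and its ports are hypothetical. The designer states only the register and its next value, and synthesis infers a single flip-flop plus a little logic for the inversion and reset. A testbench that drives clk_in at 50 MHz can then be run in ModelSim to confirm a 25 MHz output, the CM1K clock frequency given in Section 4.7. The counter in Appendix A wraps after 3, so as written it divides SysClk by four (12.5 MHz from a 50 MHz input). A single toggling flip-flop like this one is one way to obtain a divide-by-two.

```verilog
// Hypothetical sketch: divide a board clock by two with one toggling flip-flop.
// With a 50 MHz clk_in this yields a 25 MHz, 50% duty-cycle clk_out.
module clk_div2 (
    input  wire clk_in,   // board clock, e.g. 50 MHz
    input  wire rst_n,    // synchronous, active-low reset
    output reg  clk_out   // clk_in / 2
);
    // At RTL only the register and its next value are described;
    // the synthesis tool decides which gates/LUTs implement it.
    always @(posedge clk_in) begin
        if (!rst_n)
            clk_out <= 1'b0;      // known starting level after reset
        else
            clk_out <= ~clk_out;  // toggle on every rising edge of clk_in
    end
endmodule
```

After reset is released, clk_out toggles on every rising edge of a 20 ns clk_in. Its period is therefore 40 ns, which corresponds to 25 MHz.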
Chapter 4
MODEL FOR COMMUNICATION WITH A COGNITIVE MEMORY (MCwCM)
4.1 Introduction
Hardware-level communication requires a good understanding of the communication protocol between two devices. According to the Oxford dictionary, protocol is defined as "The accepted or established code of procedure or behavior in any group, organization, or situation" [32]. The same idea applies to two devices communicating with each other: they must agree on a set of rules for communication. Communication between two devices implies an exchange of information or data. In digital electronics, therefore, a protocol is a set of rules that ensures a smooth flow of information between two devices.
MCwCM demonstrates hardware-level communication between a CM1K PGA-69 [14] and a Lattice FPGA-LFXP2-5E-6TN144C [15]. Previous chapters covered the CM1K, the Lattice FPGA device, and the required tools and technologies. This chapter describes the MCwCM design in detail. In the MCwCM system, information flows between the Lattice FPGA-LFXP2-5E-6TN144C and the CM1K PGA-69 module. From here onwards, this document uses the following short names:
- "PGA" for the CM1K-PGA69 module;
- "FPGA" for the Lattice FPGA-LFXP2-5E-6TN144C;
- "eval board" for the LatticeXP2 Brevia Development kit.
4.2 CM1K communication Protocol
The CM1K can work in two modes: Normal mode, or Save and Restore mode. Normal mode is also sometimes referred to as Learning and Recognition mode. In this mode, a designer can use either the 16-bit parallel bus or the serial bus for an application. MCwCM is designed to work in normal mode and uses the CM1K parallel bus. Fifteen registers control the behavior of all the neurons, and the following tables describe these neuron registers in detail. The tables have six columns. The first column gives the name of the register. The second column describes the functionality of the register.
The third column contains the address of the respective register. A write cycle can update a register, whereas a read cycle lets the user retrieve the contents of a particular register. The fourth column lists the operations possible on a register in normal mode. The fifth column lists the read or write operations possible on a register in save and restore mode. Lastly, the sixth column gives the default value of a register along with its value range.
Table 4a: Neuron Registers-1 [7]
Table 4b: Neuron Registers-2 [7]
Table 4c: Neuron Registers-3 [7]
Apart from the registers, several command and control lines play an important role in information exchange with the CM1K. The following is a brief description of some of the most critical command and control lines.
DS: DS stands for Data Strobe. The device controlling the CM1K (the FPGA in this project) generates this signal to initiate a read command from, or a write command to, the CM1K module.
RW_: This command line tells the CM1K module whether data is to be written into it or read from it. By default RW_ is 1, which indicates a read command.
REG: This is a 5-bit register address bus used to address all 15 neuron registers. The "Addr" column of Tables 4a to 4c gives the hexadecimal address of each neuron register in the CM1K.
DATA: This is a 16-bit bi-directional data bus. It is used to write data into the CM1K in write mode and to read data from the CM1K in read mode. All bits of DATA have pull-up connections, so the bus defaults to all 1's (xFFFF) when it is not driven.
ID: ID is the Identified line, and its value represents the recognition status. If all the neurons recognize the last input vector and return the same category value, then the value of ID_ is zero.
This line is updated each time the last component of a vector is broadcast to the neurons. The broadcast happens either through a write command to the LCOMP register or through the real-time recognition logic of the CM1K [2].
UNC: UNC is the Uncertain line. It is bidirectional in nature and shall never be driven [2]. The value of UNC is zero when there is a conflict among the firing neurons about the category of an input vector.
Apart from the command and control lines above, there are a few more important signals and pins that a designer must be aware of to achieve hardware-level communication:
S_CHIP: This pin is used to enable or disable the parallel bus of a CM1K chip. By default it is pulled down. This setting configures the parallel bus as a bidirectional bus and enables multiple CM1K chips to receive commands synchronously.
DCI: DCI stands for Daisy Chain In and is an input to the first neuron of the system. When DCI is high, it puts the neuron in the ready-to-learn (RTL) mode [7].
DCO: DCO stands for Daisy Chain Out. The DCO of a chip goes high when the last neuron of the chip is committed. In a multichip configuration, this signal connects to the DCI signal of the next chip in the row.
RDY: RDY stands for Ready. The neurons pull RDY down while executing a command and release it when the command terminates [2]. This is a very important signal and deserves close attention when designing with the CM1K.
This project does not use the built-in recognition logic of the CM1K chip, so the recognition logic is disabled by driving the RECO_EN signal low in the FPGA design. Likewise, the built-in I2C master is not used in this project, so it is disabled by driving I2C_EN low in the FPGA design.
4.3 MCwCM components
The MCwCM HDL design consists of several components.
The following is a brief description of the components in MCwCM:
Control State Machine (CSM): This module is the controller of the HDL design and controls the data flow between the FPGA and the PGA. It also generates the data for the PGA module, along with the register address in the PGA module to which the data should be directed.
CM1K Register Address Bus Logic (bus logic): This logic is connected to the register address bus signals of the CSM and provides the register address information to the PGA module.
CM1K Data Bus Logic (data logic): This module connects the data bus signals of the CSM to the data bus of the PGA module, ensuring a dedicated path for data exchange.
Figure 4a: MCwCM Block diagram
4.4 Signal description of MCwCM
As described in the previous sections of this chapter, certain signals are used for talking to a CM1K chip. These signals must be configured with their respective values for hardware-level communication. In the MCwCM design, the FPGA initializes the signals and registers for the CM1K chip in the PGA module.
Figure 4b: MCwCM signal representation diagram
Figure 4b shows the direction of each signal relative to the FPGA. All the FPGA signals are described below.
DCI: DCI represents Daisy Chain In. This signal is set high. This enables the CM1K chip in the PGA module and puts its first neuron in its active (RTL) state. From the FPGA's point of view, this signal is an output.
RDY: This signal represents Ready and is an input to the FPGA. It tells the logic in the FPGA whether the PGA is ready to receive a command. If RDY is low, the CM1K is still processing the previous command and is not ready to receive another.
DS: This signal represents Data Strobe and is connected to the DS pin of the PGA module. It is an output from the FPGA and an input to the PGA module.
Data Bus: This is a 16-bit bus that connects to the DATA lines of the PGA module.
This bus carries data from the FPGA to the PGA and from the PGA back to the FPGA, so it is bi-directional.
Cm1kAddressBus: This is a 5-bit address bus connected to the REG lines of the PGA module. It is used to address the neuron registers in the CM1K chip of the PGA module. This is also a bidirectional bus.
RwEn: This signal represents read/write enable and connects to the RW_ pin of the PGA module. It is a command signal that selects a read or a write cycle. It is an output from the FPGA and an input to the CM1K chip.
UNC: UNC represents Uncertain and is an input to the FPGA. It is connected to the UNCn pin of the PGA module.
ID: ID represents the Identified line and connects to the ID pin of the PGA module. It is an input to the FPGA and an output from the PGA module.
4.5 Communication protocols
The following rules must be followed when designing with a CM1K chip.
1. A low RDY signal means that the CM1K chip is busy processing the previous command, so no new transaction should be initiated.
2. DS must be driven high at the start of every transaction and then pulled low again.
3. DS must never be asserted while RDY is low.
4. Neurons sample signals on the positive edge of the clock, so all signals must be stable before the positive edge of the clock.
5. The setup time must be at least five nanoseconds.
6. The hold time must be at least five nanoseconds.
7. When the DATA lines are not being driven, they must be returned to their default value.
8. DS must be asserted or de-asserted only at the negative edge of the clock.
9. RECO_EN should be driven low if the internal recognition logic is not used in the design.
10. I2C_EN should be driven low if the I2C master is not used in the design.
11. The S_CHIP line should be driven low.
12. The G_STDBY line should also be driven low. This line is responsible for putting the CM1K chip in standby mode.
This project does not use this feature, so G_STDBY is driven low.
13. The DCI line is driven high, which puts the first neuron in the chain into the ready-to-learn state.
14. G_RESET of the PGA module is an active-low reset: when G_RESET is low, the PGA module goes to its reset state.
These protocols are for a design with a single chip, an external master, and the internal recognition logic disabled. Disabling the internal recognition logic also disables the digital input bus [7]; to use this bus, the recognition logic must be enabled.
4.6 MCwCM Design Flow
This design is an implementation of a state machine with four states: IDLE, WRITE, BUS_TURNAROUND and READ.
IDLE state: This state initializes all the registers to their default values as required. It is also the default state of the system.
WRITE state: In this state the FPGA initiates a write cycle. The state places a register address on REG and then provides the data to be written to that register of the CM1K.
BUS_TURNAROUND state: This is an intermediate state between the write and read cycles. It returns the data bus to its default state and drives DS low, as the protocol requires.
READ state: The read happens in this state. It initiates a read cycle and drives the RW_ line high. It also places the address of the CM1K register to be read on REG. After a read cycle is done, the machine goes back to its IDLE state.
In the MCwCM design, the parameter MAX_NUM_TRANS sets the number of read and write cycles performed. It can be changed in the HDL code to increase or decrease that number. Another parameter, BASE_ADDR, defines the register address to write to; for this project it defaults to 0x06, the MinIF register address.
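The four states and the MAX_NUM_TRANS and BASE_ADDR parameters described above can be sketched in Verilog as follows. This is a hypothetical, simplified reconstruction for illustration and is not the Appendix A code. The module name csm_sketch and its ports are assumptions, and the data bus is left out. DS is issued as a one-clock pulse on entry to WRITE and READ; the machine then waits for the strobe to drop and RDY to be high before moving on. The transition conditions follow Figure 4c: (start_op == 1) && (num_trans_completed < MAX_NUM_TRANS) && Rdy to leave IDLE, and Rdy & start_op elsewhere. State and outputs are updated on the negative edge of the clock. This keeps DS, RW_ and REG stable at the next positive edge, as protocol rules 4 and 8 in Section 4.5 require.

```verilog
// Hypothetical, simplified sketch of the MCwCM control state machine.
// Not the Appendix A code: module name, ports and the one-clock DS pulse
// are assumptions made for illustration. The data bus is omitted.
module csm_sketch #(
    parameter       MAX_NUM_TRANS = 10,      // number of write/read pairs to run
    parameter [4:0] BASE_ADDR     = 5'h06    // CM1K register address (MinIF)
) (
    input  wire       clk,                   // CM1K clock
    input  wire       rst_n,                 // active-low reset
    input  wire       start_op,              // start of an operation
    input  wire       rdy,                   // CM1K RDY: low while a command executes
    output reg        ds,                    // DataStrobe
    output reg        rw_n,                  // RW_: 1 = read, 0 = write
    output reg  [4:0] reg_addr,              // REG: CM1K register address
    output reg  [1:0] state,                 // current state, exposed for debugging
    output reg  [3:0] num_trans_completed    // completed write/read pairs
);
    localparam IDLE           = 2'b00;
    localparam WRITE          = 2'b01;
    localparam BUS_TURNAROUND = 2'b10;
    localparam READ           = 2'b11;

    wire go = rdy && start_op;               // the (Rdy & start_op) condition

    // Everything updates on the falling edge so DS, RW_ and REG are stable
    // when the CM1K samples them on the next rising edge (rules 4 and 8).
    always @(negedge clk) begin
        if (!rst_n) begin
            state               <= IDLE;
            ds                  <= 1'b0;
            rw_n                <= 1'b1;     // RW_ idles high (read)
            reg_addr            <= BASE_ADDR;
            num_trans_completed <= 4'd0;
        end else begin
            ds <= 1'b0;                      // DS is a one-clock pulse by default
            case (state)
                IDLE:
                    if (go && num_trans_completed < MAX_NUM_TRANS) begin
                        state    <= WRITE;
                        ds       <= 1'b1;    // strobe the write command
                        rw_n     <= 1'b0;    // RW_ low selects a write
                        reg_addr <= BASE_ADDR;
                    end
                WRITE:                       // strobe dropped and RDY high again?
                    if (go && !ds) begin
                        state <= BUS_TURNAROUND;
                        rw_n  <= 1'b1;       // RW_ back high (read)
                    end
                BUS_TURNAROUND:
                    if (go) begin
                        state    <= READ;
                        ds       <= 1'b1;    // strobe the read command
                        reg_addr <= BASE_ADDR;
                    end
                READ:                        // wait for RDY, then count the pair
                    if (go && !ds) begin
                        state               <= IDLE;
                        num_trans_completed <= num_trans_completed + 4'd1;
                    end
            endcase
        end
    end
endmodule
```

Consider the sketch with Rdy tied high and start_op held high. It cycles through IDLE, WRITE, BUS_TURNAROUND and READ ten times, producing twenty DS pulses, and then remains in IDLE. With Rdy held low it never leaves IDLE.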
The parameter DEFAULT_WR_DATA is the data initially written to the CM1K chip on the PGA module; it is then incremented by one for each subsequent cycle. DEFAULT_DATA is the default value of the data bus in the idle condition. The following is a state diagram representation of the MCwCM design.
Figure 4c: MCwCM state diagram representation diagram (states IDLE, WRITE, BUS_TURNAROUND and READ; the machine leaves IDLE when (start_op == 1) && (num_trans_completed < MAX_NUM_TRANS) && Rdy, the other transitions are labeled (Rdy & start_op), and each state holds while its condition is not met)
In the state diagram of Figure 4c, "start_op" represents the start of an operation. In the IDLE state, all the registers are reset to their default values. This state machine was realized in Verilog; the design code can be found in the Appendix.
4.7 Simulation results
The following figures show the waveform output of the design, with a detailed timing explanation.
Figure 4d: Waveform output-1 (annotations: initialization of the CM1K clock; initialization of all the registers in the system on the positive edge of the clock)
The MCwCM has two clock sources. The first generates a 50 MHz clock, which is then divided down to a 25 MHz clock. This division is done to comply with the CM1K chip's maximum clock frequency of 27 MHz. The waveform above also shows that all the registers in the design are initialized on the positive edge of the clock.
Figure 4e: Waveform output-2
In the waveform above, the following signals are worth noticing, because they show the relationship between the signals in hardware-level communication:
fpga_top_tb/Rdy: This is the signal the CM1K generates when it senses a read or write command. In this waveform, the signal is generated by the testbench.
For more information on how this signal is generated, refer to the testbench code.
fpga_top_tb/DataStrobe: MCwCM generates DataStrobe in response to the ready signal. Observing the waveform closely, you will notice that DataStrobe is asserted while Rdy is high. DataStrobe indicates to the CM1K chip that the communicating device is ready to communicate with it. After sensing DataStrobe, the CM1K chip pulls Rdy down on the positive edge of the clock and keeps it low until the command has finished. Once the current command is finished, the CM1K drives Rdy high to indicate that it is ready for the next command. If there are more read or write commands for the chip, DataStrobe is asserted (driven high) again and the system repeats the sequence above. Importantly, DataStrobe must be stable by the positive edge of the clock so that the CM1K chip can sample it on that edge. Therefore, this signal is generated on the negative edge of the clock.
fpga_top_tb/rdWrEn: This signal is generated by the MCwCM design. It tells the CM1K chip whether the communicating device wants to write into the chip or read from it. It must be stable by the positive edge of the clock so that the CM1K chip can sample it on that edge. Therefore, in this design the signal is updated on the negative edge of the clock, so that it is stable at the next positive edge. This is an important aspect of the design and should be considered when designing with the CM1K.
fpga_top_tb/RegBus: This is a 5-bit register that is part of the MCwCM design. It selects a register of the CM1K chip. In MCwCM it is set to the predefined address 0x06.
This is the register address used to set MinIF (minimum influence field) of the CM1K. It can be changed to any other address, and the system will then write into or read from that location. The value of this register must also be stable before the positive edge of the clock. MCwCM therefore sets it on the negative edge of the clock, so that it is stable by the subsequent positive edge.
fpga_top_tb/DataBus: DataBus carries the data to be written into the CM1K chip or read from it. Data must also be stable before the positive edge of the clock, so it is placed on DataBus on the negative edge of the clock.
Observed closely, the waveform shows a blue line after every read and write cycle. This line represents the "Z" (high-impedance) value. It means that the DataBus lines have been released for other operations, such as bus snooping.
fpga_top_tb/LED: This signal connects to the LEDs on the eval board, which has eight LEDs that can be used in a design. In MCwCM, these LEDs have two functions:
1. To display the reset state: whenever a reset is applied, the 4th and 5th LEDs are lit.
2. To display the read output: when not in the reset state, the LEDs show the data value on DataBus after the last read transaction.
Note: only eight LEDs are available, so they can display at most 256 distinct values (0 to 255).
4.8 MCwCM "write" operation
The MCwCM write operation follows this sequence. Before beginning a write cycle, MCwCM checks the following signals:
- RDY: whether it is high.
- start_op: whether it is high.
- (num_trans_completed < MAX_NUM_TRANS): whether the number of transactions completed is less than the maximum number of transactions.
Note: "MAX_NUM_TRANS" is a parameter that determines how many times the state machine will run.
Its default is 10; however, it can be set to any desired value by updating this field in the code.
If all the conditions above are satisfied, the MCwCM system enters the WRITE state of the state machine. The write state generates a DataStrobe signal, the desired register address, a write command, and the data to be written, each on its respective lines. The PGA module samples these signals on the positive edge of the clock and acknowledges them by pulling the RDY signal low. This initiates a write cycle. The number of clock cycles needed to complete a write command depends on the targeted register of the CM1K chip; it takes one clock cycle to write data to MinIF (register address 0x06). If any of the conditions above are not met, MCwCM stays in the IDLE state. It waits in IDLE until all the conditions are met, and then enters the WRITE state and completes a write command.
4.9 MCwCM "read" operation
The MCwCM read operation follows this sequence. Before beginning a read cycle, MCwCM checks the following signals:
- RDY: whether it is high.
- start_op: whether it is high.
- (num_trans_completed < MAX_NUM_TRANS): whether the number of transactions completed is less than the maximum number of transactions.
Note: "MAX_NUM_TRANS" is a parameter that determines how many times the state machine will run. Its default is 10; however, it can be set to any desired value by updating this field in the code.
If all the conditions above are satisfied, the MCwCM system enters the READ state of the state machine. The read state generates a DataStrobe signal, the address of the register the system intends to read, and a read command. The PGA module samples these signals on the positive edge of the clock and acknowledges them by pulling the RDY signal low. This initiates a read cycle.
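A read only works if the FPGA releases DATA so that the CM1K can drive it, and protocol rule 7 requires the bus to return to its default value whenever it is not driven. The sketch below mirrors the DataBus assignment in Appendix A. The FPGA drives the bus only while DS is high and RW_ is low (a write), and otherwise leaves it at high impedance. The module name cm1k_data_port and its ports are hypothetical. In simulation, declaring the shared bus as a tri1 net is one way to model the board pull-ups, so that an undriven bus reads 16'hFFFF.

```verilog
// Hypothetical sketch of the FPGA side of the shared CM1K DATA bus,
// following the DataBus assignment in Appendix A.
module cm1k_data_port (
    input  wire        ds,        // DataStrobe from the control state machine
    input  wire        rw_n,      // RW_: 1 = read, 0 = write (CM1K convention)
    input  wire [15:0] wr_data,   // data to be written to the CM1K
    inout  wire [15:0] data_bus,  // shared DATA lines (pulled up on the board)
    output wire [15:0] rd_data    // value currently present on the bus
);
    // Drive the bus only while strobing a write. Otherwise release it (high-Z)
    // so the CM1K can drive it during a read and the pull-ups restore 0xFFFF.
    assign data_bus = (ds && !rw_n) ? wr_data : 16'hzzzz;

    // Whatever is on the bus (our write data, CM1K read data, or the
    // pulled-up default) is visible to the rest of the design.
    assign rd_data = data_bus;
endmodule
```

With DS high and RW_ low, the bus carries wr_data. Once RW_ returns high, the FPGA releases the bus. In a testbench that declares the bus as tri1, it then reads 16'hFFFF until a CM1K model drives a value.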
The number of clock cycles needed to complete a read command depends on the targeted register of the CM1K chip; it takes one clock cycle to read data from MinIF (register address 0x06). If any of the conditions above are not met, MCwCM stays in the BUS_TURNAROUND state. It waits there until all the conditions are met, and then enters the READ state and completes a read command.
Chapter 6
CONCLUSION
This project has shown how to create a hardware-level model for communication with the CM1K technology. This technology has many advantages over the conventional von Neumann model of computing. When using the CM1K, attention should be paid to both the hardware and the software levels of communication. Several software options currently exist for interacting with the CM1K chip, and MCwCM is the first hardware model for communication with this technology. The milestones achieved in this project are:
1. Successfully studied and implemented the CM1K communication protocol in Verilog.
2. Successfully implemented the MCwCM model in an FPGA.
3. Built a communication system with an FPGA and the PGA module to demonstrate hardware-level communication.
MCwCM is currently being used at CogniMem Technologies Inc. to test the functionality of newly fabricated PGA69 modules. In the future, this project could be extended to implement the entire functionality of the CM1K chip, creating a standalone system with all the features of the CM1K technology.
APPENDIX A
MCwCM design code in Verilog

module fpga_top (
    // From board to FPGA pins
    input  wire        SysClk,
    input  wire        SysRst_n,
    // From FPGA to CM1K chip
    output reg         cm1kClk,    //output wire cm1kClk,
    output wire        cm1kRst_n,
    output wire        dci,
    output wire        G_STDBY,
    output wire        s_chip,
    output wire        reco_en,
    output wire        i2c_en,
    // Visual output for read and write
    output wire [7:0]  LED,
    // From board to FPGA (push button)
    input  wire        start_op,
    // From CM1K chip to FPGA
    input  wire        Rdy,
    inout  wire        UNC,        // input wire UNC,
    output wire        tempUNC,
    input  wire        ID,
    output wire        tempID,
    output wire        DataStrobe,
    output wire        rdWrEn,
    inout  wire [4:0]  RegBus,     // output wire [4:0] RegBus,
    inout  wire [15:0] DataBus
);

    // Parameters
    parameter IDLE            = 2'b00;
    parameter WRITE           = 2'b01;
    parameter BUS_TURNAROUND  = 2'b10;
    parameter READ            = 2'b11;
    parameter MAX_NUM_TRANS   = 10;
    parameter BASE_ADDR       = 5'b00110;  // 5'h06
    parameter DEFAULT_WR_DATA = 16'h01;    // 16'h5A5A;
    parameter DEFAULT_DATA    = 16'hFFFF;
    parameter DEFAULT_UNC     = 1'bz;

    // Local registers and wires
    reg [1:0]  clk_divider_cntr;
    reg [3:0]  num_trans_completed;
    reg        DataStrobe_d1;

    // Internal registers for the control state machine (csm)
    reg [15:0] csm_data;
    reg [4:0]  csm_reg;
    //reg      csm_UNC;
    reg        csm_ds;
    reg        csm_rdWrEn;
    reg [1:0]  curState;
    reg [1:0]  nxtState;

    // Latched read data to display on the LEDs
    wire [15:0] readData;
    //wire [7:0] LED;

    // **********************
    // Clock generator logic
    // **********************
    always @ (posedge SysClk) begin
        if (!SysRst_n) begin
            clk_divider_cntr <= 0;
        end else if (clk_divider_cntr == 3) begin
            clk_divider_cntr <= 0;
        end else begin
            clk_divider_cntr <= clk_divider_cntr + 1;
        end
    end

    //assign cm1kClk = (clk_divider_cntr > 1) ? 1 : 0;
    always @ (posedge SysClk) begin
        if (!SysRst_n) begin
            cm1kClk <= 0;
        end
        if ((SysRst_n) && (clk_divider_cntr <= 1)) begin
            cm1kClk <= 0;
        end else if ((SysRst_n) && (clk_divider_cntr >= 2)) begin
            cm1kClk <= 1;
        end
    end

    // **********************
    // Standby signal logic
    // **********************
    assign G_STDBY = 0;

    // ******************************
    // Temporary "UNC" and "ID" logic
    // ******************************
    //wire tempUNC;
    //wire tempID;
    assign tempUNC = UNC;
    // assign UNC = cm1kRst_n ? UNC : 1'bZ;
    assign tempID = ID;

    // **********************
    // "s_chip" logic
    // **********************
    assign s_chip     = 0;
    assign reco_en    = 0;
    assign i2c_en     = 0;
    assign DataStrobe = csm_ds;
    assign rdWrEn     = csm_rdWrEn;

    // **********************
    // "dci" logic
    // **********************
    assign dci = 1;

    // **********************
    // Reset logic
    // **********************
    assign cm1kRst_n = SysRst_n;

    // ***************************
    // Control logic for DataBus
    // ***************************
    // Drive DataBus only during a write (rdWrEn low, DataStrobe high)
    assign DataBus = (~rdWrEn & DataStrobe) ? csm_data : 16'hzzzz;

    // ***************************
    // Control logic for RegBus
    // ***************************
    //assign RegBus = csm_reg; // to make RegBus an output only
    assign RegBus = cm1kRst_n ? csm_reg : 5'hzz;

    // ***************************
    // FPGA control state machine
    // ***************************
    always @ (negedge cm1kClk or negedge cm1kRst_n) begin
        if (!cm1kRst_n) begin
            curState <= IDLE;
        end else begin
            curState <= nxtState;
        end
    end

    // Next-state logic
    always @ (*) begin
        case (curState)
            IDLE: begin
                if ((start_op == 1) && (num_trans_completed < MAX_NUM_TRANS) && (Rdy)) begin
                    nxtState = WRITE;
                end else begin
                    nxtState = IDLE;
                end
            end
            WRITE: begin
                if (Rdy & start_op) begin
                    nxtState = WRITE;
                end else begin
                    nxtState = BUS_TURNAROUND;
                end
            end
            BUS_TURNAROUND: begin
                if (Rdy & start_op) begin
                    nxtState = READ;
                end else begin
                    nxtState = BUS_TURNAROUND;
                end
            end
            READ: begin
                if (Rdy & start_op) begin
                    nxtState = READ;
                end else begin
                    nxtState = IDLE;
                end
            end
        endcase
    end

    // Output decode; @(*) keeps num_trans_completed (used in WRITE)
    // in the sensitivity list along with curState
    always @ (*) begin
        case (curState)
            IDLE: begin
                csm_ds     = 0;
                csm_rdWrEn = 1;
                csm_reg    = 5'h06; //5'h00;
                csm_data   = DEFAULT_DATA;
                //csm_UNC  = DEFAULT_UNC;
            end
            WRITE: begin
                csm_ds     = 1;
                csm_rdWrEn = 0;
                csm_reg    = BASE_ADDR; // + num_trans_completed;
                csm_data   = DEFAULT_WR_DATA + num_trans_completed;
                //csm_UNC  = DEFAULT_UNC;
            end
            BUS_TURNAROUND: begin
                csm_ds     = 0;
                csm_rdWrEn = 1;
                csm_reg    = 5'h06; //5'h00;
                csm_data   = DEFAULT_DATA;
                //csm_UNC  = DEFAULT_UNC;
            end
            READ: begin
                csm_ds     = 1;
                csm_rdWrEn = 1;
                csm_reg    = BASE_ADDR; //+ num_trans_completed;
                csm_data   = DEFAULT_DATA;
                //csm_UNC  = UNC;
            end
        endcase
    end

    // ******************************************************
    // Logic to count number of wr/rd transactions completed
    // ******************************************************
    always @ (negedge cm1kClk or negedge cm1kRst_n) begin
        if (!cm1kRst_n) begin
            num_trans_completed <= 0;
        end else if (curState == READ) begin
            num_trans_completed <= num_trans_completed + 1; // ";" removed from original
        end
    end

    // Logic to latch read data
    always @ (negedge cm1kClk or negedge cm1kRst_n) begin
        if (!cm1kRst_n)
            DataStrobe_d1 <= 0;
        else
            DataStrobe_d1 <= DataStrobe & rdWrEn;
    end

    // Feedback assignment: readData follows DataBus while DataStrobe_d1 is high
    // and holds its last value otherwise (behaves as a latch)
    assign readData = DataStrobe_d1 ? DataBus : readData; //16'hzzzz;
    //assign readData = DataStrobe_d1 ? DataBus : 16'hzzzz;

    // LED logic
    // For write data, drive 8'h55 and for read, drive 8'hAA
    //assign LED = readData[15:8];
    assign LED = cm1kRst_n ? ((num_trans_completed == MAX_NUM_TRANS) ? (~readData) : 8'b11111111)
                           : 8'b11100111;
    //assign LED = ((DataBus == csm_data) ? 8'h55 : ((DataBus == readData) ? readData : 8'hzz));
    //assign LED = ((DataBus == csm_data) ? 8'h55 : ((DataBus == readData) ? 8'hAA : 8'hzz));

endmodule

APPENDIX B
MCwCM test-bench code in Verilog

`timescale 1ns/100ps
//`include "fpga_top.v"

module fpga_top_tb ();
    reg         SysClk;
    reg         SysRst_n;
    wire        dci;
    wire        cm1kClk;
    wire        cm1kRst_n;
    wire        s_chip;
    wire        G_STDBY;
    reg         start_op;
    reg         Rdy;
    wire        UNC;
    reg         UNC_tb;
    reg         ID;
    wire        tempID;
    wire        DataStrobe;
    wire        rdWrEn;
    wire [4:0]  RegBus;
    wire [15:0] DataBus;
    wire [7:0]  LED;
    reg  [15:0] write_data;
    reg         DataStrobe_d1;
    wire        ledClk_tb;
    //assign ledClk_tb = fpga_top_inst.ledClk;

    defparam fpga_top_inst.MAX_NUM_TRANS = 10;

    fpga_top fpga_top_inst (
        .SysClk(SysClk),
        .SysRst_n(SysRst_n),
        .cm1kClk(cm1kClk),
        .cm1kRst_n(cm1kRst_n),
        .start_op(start_op),
        .LED(LED),               // added another output
        .Rdy(Rdy),
        .UNC(UNC),
        // .tempUNC(tempUNC),
        .ID(ID),
        .tempID(tempID),
        .DataStrobe(DataStrobe),
        .rdWrEn(rdWrEn),
        // .RegBus(RegBus),
        .DataBus(DataBus),
        .dci(dci),
        .s_chip(s_chip),
        .G_STDBY(G_STDBY)
        // .ledClk(ledClk_tb)    // fpga_top has no ledClk port
    );

    // Reset, then pulse start_op once per write/read pair
    initial begin
        SysClk   = 0;
        SysRst_n = 1;
        Rdy      = 0;
        UNC_tb   = 1;
        ID       = 1;
        start_op = 0;
        #5;
        SysRst_n = 0;
        #10;
        SysRst_n = 1;
        repeat (fpga_top_inst.MAX_NUM_TRANS * 2) begin
            @(posedge cm1kClk);
            @(posedge cm1kClk);
            @(posedge cm1kClk);
            start_op <= 1;
            wait (rdWrEn & DataStrobe_d1 & !Rdy);
            start_op <= 0;
        end
    end

    // Dump all variables
    initial begin
        $dumpfile("Run.dmp");
        $dumpvars;
        #1200;
        $stop;
        // $finish;
    end

    // System clock: 4 ns period
    always begin
        #2 SysClk = ~SysClk;
    end

    // Chip-side Rdy: pulled low (busy) on the clock edge after DataStrobe is seen
    always @ (posedge cm1kClk or negedge cm1kRst_n) begin
        if (!cm1kRst_n)
            Rdy <= 1;
        else if (DataStrobe)
            Rdy <= 0;
        else
            Rdy <= 1;
    end

    always @ (negedge cm1kClk or negedge cm1kRst_n) begin
        if (!cm1kRst_n)
            DataStrobe_d1 <= 0;
        else
            DataStrobe_d1 <= DataStrobe & rdWrEn;
    end

    // Capture the data written by the FPGA
    always @ (*) begin
        if (~rdWrEn & DataStrobe & !Rdy)
            write_data = DataBus;
    end

    // Return the captured data on DataBus during a read
    assign DataBus = (rdWrEn & DataStrobe_d1) ? write_data : 16'hzzzz;
    //assign UNC = rdWrEn ? UNC_tb : 1'bz;
    assign UNC = UNC_tb;

endmodule

REFERENCES

[1] Cognimem Technologies Inc. webpage, accessed on October 1, 2012. www.Cognimem.com
[2] CM1K Hardware User's Manual, Version 2.4.0, Cognimem, Inc., revised 07/10/2012. http://cognimem.com/_docs/TechnicalManuals/TM_CM1K_Hardware_Manual.pdf
[3] Palnitkar, S. and Goel, P., Verilog HDL: A Guide to Digital Design and Synthesis, Prentice Hall, 1996.
[4] Bors, A. (2001), Introduction of the Radial Basis Function (RBF) Networks, Online Symposium for Electronics Engineers (OSEE 2001). http://www-users.cs.york.ac.uk/adrian/Papers/Others/OSEE01.pdf
[5] Cognimem, from Technology to ASIC (Tech. Brief), Cognimem, Inc., accessed 9/2012. http://www.cognimem.com/_docs/TechnicalBriefs/TB_from_Techno_to_ASIC.pdf
[6] Introduction to CM1K (presentation), Cognimem, Inc., revised 07/10/2012. http://www.cognimem.com/_docs/Presentations/PR_CM1K_introduction.pdf
[7] CM1K Reference Guide, Version 2.2.1, Cognimem, Inc., revised 07/10/2012. http://www.cognimem.com/_docs/TechnicalManuals/TM_CogniMem_Technology_Reference_Guide.pdf
[8] Weichang Chen; Ziqiang Wang; Zhihua Chen; Zhiyi Chen, "Working Mechanism of Brain Neural Network," International Conference on Neural Networks and Brain (ICNN&B '05), vol. 3, pp. nil5-1309, 13-15 Oct. 2005. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1614872
[9] Kanold, P. J. (2003), 'Role of Subplate Neurons in Functional Maturation of Visual Cortical Columns', Science, 301(5632), p. 521, MAS Ultra - School Edition, EBSCOhost, viewed 17 September 2012. http://www.ces.clemson.edu/bio/documents/Publications/Kara%20%20Science%202003.pdf
[10] George, D., "How to make computers that work like the brain," Design Automation Conference, 2009.
DAC '09. 46th ACM/IEEE, pp. 420-423, 26-31 July 2009. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5227027&isnumber=5227020
[11] Menendez, Anne; Paillet, Guy, "Fish Inspection System Using a Parallel Neural Network Chip and the Image Knowledge Builder Application," AI Magazine, Vol. 29(1), pp. 21-28. http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/2084
[12] HDL Synthesis Coding Guidelines for Lattice Semiconductor FPGAs, Version 2.0, Lattice Semiconductor, revised September 2012. http://www.latticesemi.com/lit/docs/technotes/tn1008.pdf
[13] Brown, Stephen; Rose, Jonathan, "Architecture of FPGAs and CPLDs: A Tutorial," Department of Electrical and Computer Engineering, University of Toronto, accessed on 12 October 2012. http://www.eecg.toronto.edu/~jayar/pubs/brown/survey.pdf
[14] CM1K-PGA69 Module Hardware Specification, Cognimem, Inc., viewed on 12 October 2012. http://www.cognimem.com/_docs/Technical-Manuals/CTICM1KPGA69%20%20Hardware%20Specification%20final.pdf
[15] Lattice Semiconductor, product page, XP2 Brevia Development Kit description. http://www.latticesemi.com/products/developmenthardware/developmentkits/xp2breviadevelopmentkit.cfm
[16] Parnell, Karen; Mehta, Nick, Programmable Logic Design Quick Start Handbook, Xilinx, January 2002, viewed on 14 October 2012. http://www.ee.ic.ac.uk/pcheung/teaching/ee3_dsd/beginners_bk_4x.pdf
[17] Lattice Semiconductor, Low-Cost, 3rd Generation, Non-Volatile FPGA. http://www.latticesemi.com/documents/33797.pdf
[18] Lattice Semiconductor, FlexiFlash Architecture. http://www.latticesemi.com/products/fpga/xp2/flexiflasharchitecture.cfm
[19] Lattice Semiconductor, LatticeXP2 Brevia User Guide. http://www.latticesemi.com/documents/doc43735x37.pdf
[20] Lattice Semiconductor, XP2 Family Data Sheet handbook. http://www.latticesemi.com/documents/HB1004.pdf
[21] Lattice Diamond Installation Guide. http://www.latticesemi.com/documents/diamond_20_install_pc.pdf
[22] Lattice Diamond 2.0 Tutorial. http://www.latticesemi.com/documents/latticediamondtutorial20.pdf
[23] "IEEE Standard VHDL Language Reference Manual," ANSI/IEEE Std 1076-1993, p. i, 1994. doi: 10.1109/IEEESTD.1994.121433. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=392561&isnumber=8908
[24] "IEEE Standard for Verilog Hardware Description Language," IEEE Std 1364-2005 (Revision of IEEE Std 1364-2001), pp. 0_1-560, 2006. doi: 10.1109/IEEESTD.2006.99495. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1620780&isnumber=33945
[25] Jie-Hong (Roland) Jiang, Srinivas Devadas (2009). "Logic synthesis in a nutshell". In Laung-Terng Wang, Yao-Wen Chang, Kwang-Ting Cheng, Electronic Design Automation: Synthesis, Verification, and Test. Morgan Kaufmann. ISBN 978-0-12-374364-0. Chapter 6.
[26] M. Nomura et al., "Timing Verification System Based on Delay Time Hierarchical Nature," 19th Design Automation Conference, pp. 622-628, 1982.
[27] Functional verification article on EETimes. http://www.eetimes.com/design/eda-design/4004785/Leveraging-system-models-for-RTL-functional-verification
[28] Synopsys floor planning tool. http://www.synopsys.com/tools/implementation/signoff/capsulemodule/design_plan_wp.pdf
[29] Porting designs from Xilinx and Altera to Lattice Semiconductor. http://www.latticesemi.com/lit/docs/manuals/fpga_design_guide.pdf
[30] Mentor Graphics, ModelSim product page. http://www.mentor.com/products/fpga/simulation/modelsim
[31] CM1K data sheet, Cognimem, Inc., accessed on November 4, 2012. http://www.cognimem.com/_docs/Datasheet/DS_CM1K.pdf
[32] Online Oxford dictionary, accessed on Nov 1, 2012. http://oxforddictionaries.com/definition/american_english/protocol?region=us&q=protocol