MODEL FOR COMMUNICATION WITH A COGNITIVE MEMORY
A Project
Presented to the faculty of the Department of Computer Science
California State University, Sacramento
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
Electrical and Electronic Engineering
by
Avinash Kumar Pandey
FALL
2012
© 2012
Avinash Kumar Pandey
ALL RIGHTS RESERVED
MODEL FOR COMMUNICATION WITH A COGNITIVE MEMORY
A Project
by
Avinash Kumar Pandey
Approved by:
__________________________________, Committee Chair
V. Scott Gordon, Ph.D.
__________________________________, Second Reader
Dennis Dahlquist, P.E.
____________________________
Date
Student: Avinash Kumar Pandey
I certify that this student has met the requirements for format contained in the
University format manual, and that this project is suitable for shelving in the Library
and credit is to be awarded for the Project.
__________________________, Graduate Coordinator
Preetham Kumar, Ph.D.
Department of Electrical and Electronic Engineering
________________
Date
Abstract
of
MODEL FOR COMMUNICATION WITH A COGNITIVE MEMORY
by
Avinash Kumar Pandey
This project provides a simplified model in Verilog to demonstrate a flow of
communication for a neural network chip called CM1K. In this communication model,
the HDL acts as a data provider to the chip as well as a controller that directs the
communication flow. Hardware level communication with a non-contemporary
technology can be challenging, particularly for someone new to the technology. This
project provides an understanding of the CM1K technology, an HDL model for the
controller, the communication protocol, the necessary knowledge of the tools for
creating an HDL model, and simulation results from the ModelSim tool that demonstrate
the working of the HDL logical blocks. The HDL model is ASIC agnostic and can be used
for further reference and research purposes.
_______________________, Committee Chair
V. Scott Gordon, Ph.D.
_______________________
Date
ACKNOWLEDGEMENTS
I would like to extend my gratitude to my project advisor and committee chair,
Dr. V. Scott Gordon, for his valuable help, timely guidance and moral support. I will
always remember his positive and considerate attitude. His dedication towards making
a student successful is incomparable and inspiring. I would like to thank him for
thoroughly reviewing my work and giving me his valuable feedback. I have learned many
things from him and will be sure to carry them forward into my professional career.
I would also like to thank Dr. Dennis Dahlquist for spending his valuable time
and giving a second opinion on my project report. I thank him for all his cooperation
and for reviewing my report in a very timely manner.
I would like to thank Dr. Kumar for his guidance throughout my master’s
program. He is a great asset for all graduate students in the EEE department.
I would also like to thank the CogniMem team for their support, help and
consideration. I would personally like to thank Mr. Bruce McCormick for his help and
consideration. I would also like to mention Mr. Bill Nagel, for his guidance and help,
and extend my gratitude towards him as well.
I would also like to thank Mr. Sam Miller from Ansync for providing me with
the tools necessary to troubleshoot my design.
I would like to thank my friend, Jay Panchal, for giving me encouragement
during frustrating moments of the project.
I would also like to thank my roommate Ayush Chadha and friend Sarjeet
Goswami for their support and motivation throughout my master’s program.
Lastly, I would like to thank everyone who has helped me directly or indirectly
in finishing my master’s program.
Avinash Kumar Pandey
TABLE OF CONTENTS
Acknowledgements
List of Tables
List of Figures
Chapter
1. INTRODUCTION
1.1 Overview
1.2 Purpose of the Project
1.3 Benefits of MCwCM
1.4 Applications of the CM1K technology
2. BACKGROUND
2.1 Overview
2.2 Definitions
2.3 CM1K Architecture
2.4 CM1K Working and Applications
2.4.1 Learning of a Neuron
2.4.2 Current use of the Technology
3. INTRODUCTION TO LATTICE FPGA-LFXP2-5E-6TN144C AND CM1K-PGA69 MODULE
3.1 Introduction
3.2 Designing with an FPGA
3.2.1 Lattice-XP2 family
3.2.2 Programming a Lattice-XP2 FPGA
3.2.2.1 Diamond Design Software and Design suite
3.3 Design flow
3.4 HDL-Verilog®
4. MODEL FOR COMMUNICATION WITH A COGNITIVE MEMORY (MCwCM)
4.1 Introduction
4.2 CM1K communication Protocol
4.3 MCwCM components
4.4 Signal description of MCwCM
4.5 Communication protocols
4.6 MCwCM Design Flow
4.7 Simulation results
4.8 MCwCM “write” operation
4.9 MCwCM “write” operation
6. Conclusion
Appendix A. MCwCM design code in Verilog
Appendix B. MCwCM test-bench code in Verilog
References
LIST OF TABLES
1. Table 2a: Manhattan distance vs. Norm L-Sup
2. Table 2b: Classification Example
3. Table 3a: Basic physical attributes of LFXP2-5E-6TN144C FPGA device
4. Table 3b: Tools and descriptions of the tools in Diamond Design suite
5. Table 4a: Neuron Registers-1
6. Table 4b: Neuron Registers-2
7. Table 4c: Neuron Registers-3
LIST OF FIGURES
1. Figure 2a: CM1K Introduction
2. Figure 2b: RBF Classifier class relationship
3. Figure 2c: Recognition time vs. Knowledge size graph
4. Figure 2d: CM1K Functional Diagram
5. Figure 2e: Methodology Flow Chart
6. Figure 3a: The Lattice XP2 Brevia Development kit
7. Figure 3b: XP2 on-chip component placement
8. Figure 3c: Typical design flow
9. Figure 3d: Typical design flow for an FPGA
10. Figure 4a: MCwCM Block diagram
11. Figure 4b: MCwCM signal representation
12. Figure 4c: MCwCM state diagram
13. Figure 4d: Waveform output-1
14. Figure 4e: Waveform output-2
Chapter 1
INTRODUCTION
Today’s computers are based on the Von Neumann model of computer
architecture. Scientists and researchers speculate that contemporary technologies are
ultimately limited by the fact that most of their operations are done serially. Issues
with the Von Neumann architecture include scalability, cache coherency, clock
synchronization and other problems related to shared memory. In the near future, tackling
these challenges on a serial architecture will become increasingly difficult.
1.1 Overview
A parallel architecture can use contemporary technologies and still satisfy
the need for speed and performance. A neural network is one example of a technology
based on a parallel architecture. However, a software implementation of a neural network
is typically executed serially, so it can only perform as well as the serial system
it runs on and is not truly parallel. A hardware implementation of a neural
network, on the other hand, can be a truly parallel architecture.
Neural networks have been around for a few decades, but a hardware
implementation of this technology is still an innovative concept. CogniMem
Technologies Inc. (CTI) has designed a chip called CM1K [7], which is based on a purely
parallel architecture.
1.2 Purpose of the Project
Currently, the primary mode of communication with the CM1K is through
software applications on pre-defined hardware platforms. The lack of direct
communication with the chip at the hardware level hinders many applications and
research opportunities.
Hardware level communication with a non-contemporary technology can be
challenging, particularly for someone new to the technology. Various aspects are
involved at this level of communication. First and foremost, one has to have a clear
understanding of the technology, and second, a good knowledge of the communication
protocols and specifications.
This project provides a good understanding of the technology through a
simplified model in Verilog® [24], and demonstrates a flow of communication between
a CM1K chip and an FPGA [13]. Detailed documentation is provided so that the
model can be used in future research and projects.
This communication model is called MCwCM (Model for Communication with
Cognitive Memory). MCwCM consists of an HDL [3] (Hardware Description Language)
design, which acts as an external source of data for the CM1K chip and is
also the controller that administers the communication flow.
The ModelSim® tool is used to demonstrate the results for the logical working of
the design. This tool can generate test vectors to test a design under test (DUT) and can
provide output in the form of waveforms.
1.3 Benefits of MCwCM
The benefits of the system under discussion are as follows:
1. Provides a deep understanding of CM1K technology.
2. Serves as a basis for future projects and research in the field of hardware
implementation of neural networks.
3. Offers a ready-to-use hardware level communication model for evaluating the
CM1K technology.
1.4 Applications of the CM1K technology
Following is a list of a few fields of application where this technology could
prove revolutionary.
1. Data mining.
2. 3D graphic rendering.
3. Machine vision.
4. Pattern recognition.
5. Gesture recognition.
6. Optical character recognition.
Chapter 2
BACKGROUND
2.1 Overview
The human brain is the best processor known to humanity. Our brain processes a
huge amount of data from our surroundings and gives us the ability to respond based on
our past learning and experiences. This central processing unit (CPU) of the human body
makes the things that we do on a daily basis effortless. When conscious, our brain uses the
cerebral cortex for its functionality [8]. The cerebral cortex comprises six layers, which
work in tandem to maintain wakefulness and consciousness. The functional column of
this multilayered organization is made up of a vertical chain of neurons [9]. Each
functional column is a functional unit of the cerebral cortex and consists of 4000 neurons
[8]. Our brain can connect thousands of neurons with billions of connections to provide
us with generalization [10] and learning capabilities. Our brain uses these learning and
generalization capabilities to learn new models and retain that information. For a
reasonable response, the brain needs to see a model over a period of time and for a certain
length of time. This adaptability to new knowledge gives the brain the ability to
self-learn and readjust the preexisting knowledge in its Hierarchical Temporal Memory
(HTM) [10].
In 1993, Guy Paillet, a French inventor and researcher, presented the concept of
a self-organizing trainable parallel neural network chip to IBM and worked with a team
at the IBM lab in Essonnes, France, led by Pascal Tanhoff [1]. The outcome of this
collaborative effort was an ASIC trademarked by IBM as the Zero Instruction Set
Computer (ZISC) chip. Two generations of ZISC were released: ZISC36 with 36
neurons and ZISC78 with 78 neurons. In 2008, Anne Menendez, a researcher and
technologist, and Guy Paillet designed the CM1K™. This is an advanced version of
the ZISC78 with 1k neurons along with some additional features.
2.2 Definitions
Every technology has a set of terms and keywords that are familiar
to its practitioners but foreign to a new user of the technology.
Therefore, a novice needs to understand these terms and keywords
to get the most out of the CM1K technology. The following are the most important
keywords and terms used in this technology.
1. Neurons: A neuron is a cognitive and reactive memory, which can
autonomously evaluate the distance between an incoming pattern and a
reference pattern stored in its memory. If this distance falls within a range
called the influence field, the neuron returns a positive classification, which
consists of the distance value and the category of its reference pattern [7].
2. Vectors: A neuron in a CM1K chip can store data that is 8 bits wide and
256 components deep, so each neuron has a total storage capacity of 256 bytes.
An input for neuron storage is called an input vector or simply a vector.
3. KNN: The k-nearest neighbor algorithm (k-NN/KNN) is a method for
classifying objects based on the closest training examples in the feature space
[7]. This is one of the classifiers in the CM1K technology.
4. RBF: RBF stands for Radial Basis Function. This is one of the
classifiers used in the CM1K technology. An RBF is embedded in a two-layer
neural network, where each hidden unit implements a radially activated
function. The output units implement a weighted sum of hidden unit outputs
[4].
5. ZISC: ZISC stands for “Zero Instruction Set Computer”, in
contrast to RISC (Reduced Instruction Set Computer). The concept of
ZISC was presented by Guy Paillet in 1993 [1] and is the foundation of
the CM1K technology.
6. Classifiers: In neural networks, a classifier is a method for determining
which of a finite number of categories to assign to a set of input values. In
the case of the CM1K technology, there are two classifiers: KNN and RBF.
7. Distance: Distance is a term to quantify the amount of drift in an incoming
vector pattern with respect to a stored vector pattern in a neuron.
8. Category: In CM1K technology, category is a term used to distinguish
stored patterns in the CM1K. One CM1K chip can differentiate 32768
different category values [31].
9. Learning a vector: A learning vector is an incoming vector sent with the
intent of modifying the information in the neurons. A learning vector
always follows this “write” sequence:
- Write to the COMP [7] register, followed by the LCOMP [7] register,
followed by the CAT (category) register.
10. Broadcasting a vector: A broadcasting vector is similar to a learning
vector. Except that it does not write to the CAT register, it follows the same
write sequence as a learning vector.
11. Committed neuron: In a CM1K system, a committed neuron is a neuron
that has some information/data stored in it along with an associated
category.
12. Recognition of a vector: This process compares an incoming vector with the
values stored in the neurons. If the incoming pattern matches a stored
pattern, the recognition is successful; otherwise, the result is either no
recognition or uncertain.
Apart from the recognition states mentioned above, there is
also a false recognition state, which means that the incoming
pattern is recognized as one of the stored categories, but that category is not the
correct one for that particular incoming pattern.
For instance, suppose you have trained a CM1K system to
recognize two fruits, an apple and an orange. To test the system, a test
vector representing the pattern of an orange is broadcast. If the system
recognizes this incoming pattern as an apple instead of an orange, then this
“recognition state” is referred to as a “false recognition”.
13. Degenerated neurons: When the Active Influence Field (AIF) of a neuron
reduces to a value less than its Minimum Influence Field value (MINIF),
the CM1K chip generates a flag for the neuron. A neuron associated
with such a flag is referred to as a degenerated neuron.
14. Minimum area of influence: The minimum area of influence, or Minimum
Influence Field, is abbreviated as MinIf (MINIF). It sets the lower
limit for the area of influence of a neuron. It is usually set to a value of two,
but can go all the way down to zero.
15. Maximum area of influence: The maximum area of influence, or Maximum
Influence Field, is abbreviated as MaxIf (MAXIF). It sets the upper
limit for the area of influence of a neuron. 65536 is the maximum value that can be
set as a MaxIf value.
16. Firing state: The firing state is a state in which the distance between an input
vector and a stored pattern is less than the neuron’s influence field.
17. RTL: In CM1K technology, RTL stands for Ready-To-Learn neuron. In
HDL design, RTL stands for Register-Transfer-Level.
18. Bi-Directional Bus: “Bus” here represents a channel for data exchange. A
bi-directional bus represents a single channel that is capable of transmitting
data as well as receiving data on the same physical connection.
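The difference between learning a vector (definition 9) and broadcasting a vector (definition 10) can be sketched as an ordered list of register writes. This is only an illustrative sketch in Python, not part of the MCwCM design: the helper name `write_sequence` is hypothetical, register addresses are omitted because this section does not give them, and the split of components between COMP and LCOMP (all but the last to COMP, the last to LCOMP) is an assumption.

```python
from typing import List, Optional, Tuple

# A register write is modelled as (register name, value). The names come from
# the definitions above; addresses are intentionally left out.
Write = Tuple[str, int]


def write_sequence(vector: List[int], category: Optional[int] = None) -> List[Write]:
    """Return the ordered register writes for one vector (hypothetical helper).

    Assumption: every component except the last goes to COMP and the last
    component goes to LCOMP. When a category is given it is written to CAT
    last, which makes the sequence a learning operation instead of a broadcast.
    """
    if not 1 <= len(vector) <= 256:
        raise ValueError("a neuron stores between 1 and 256 components")
    if any(not 0 <= c <= 0xFF for c in vector):
        raise ValueError("each component must fit in 8 bits")
    writes: List[Write] = [("COMP", c) for c in vector[:-1]]
    writes.append(("LCOMP", vector[-1]))
    if category is not None:
        writes.append(("CAT", category))
    return writes


learn = write_sequence([10, 20, 30], category=1)
broadcast = write_sequence([10, 20, 30])
print(learn)
print(broadcast)
```

Under these assumptions the broadcast sequence is simply the learning sequence without its final CAT write.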
2.3 CM1K Architecture
The CM1K is built on a massively parallel, hardwired architecture with non-
linear classifiers. The technology in the CM1K consists of a chain of identical cells (i.e.
neurons) addressed in parallel with their own “genetic” material to store, learn and
recall patterns [6]. It does not need any additional code or external control unit to store,
learn or recognize a pattern. A neuron reports to a supervision unit autonomously.
Figure 2a: CM1K Introduction [6]
Figure 2b: RBF Classifier class relationship
The above figure displays the relation between a 2-dimensional non-linear
decision space and the RBF classifier. The CM1K’s RBF network is highly adaptive
and is capable of real-time reinforced learning. Its architecture enables the tracing of
an incoming vector for novelty and commits new neurons as needed for new
models.
The neurons work in collaboration with each other through a bi‐directional
parallel bus. This provides accuracy with recognition speed in real‐time and adaptive
setting [6]. The CM1K neural network is a Radial Basis Function (RBF) classifier with
built-in model generator. This classifier can efficiently cope with disjoint, embedded
and overlapping classes.
Additionally, the CM1K can also work in K-Nearest Neighbor
(KNN) mode. KNN is one of the most effective algorithms in neural networks.
Used effectively, it suits applications such as data mining, computational
geometry and human vision with simplicity and accuracy. The parallel memory
based architecture of the CM1K does not require Von Neumann’s fetch and decode
model. Due to its parallel architecture, execution time becomes independent of the
number of trained examples. It depends solely on the number of input vectors, the length of
the input vector and the value of K in KNN mode.
Figure 2c: Recognition time vs. Knowledge size graph [6]
This chip embeds a recognition engine to classify digital signals, which can be
received directly from a sensor. To use this feature of the chip, a user must enable
the RECO_EN [2] signal. RECO_EN [2] enables the built-in recognition state of the
chip. When this state is enabled, it acts as a master controller.
A user can access the V_DATA [2] bus by enabling RECO_EN and VI_EN [2]
simultaneously. V_DATA bus is only accessible when both RECO_EN and VI_EN are
enabled. V_DATA can be used to provide a video-input directly to the CM1K chip. The
chip then can use its internal recognition logic to process the information to produce an
output.
Figure 2d: CM1K Functional Diagram [31]
Figure 2d shows a block diagram of the CM1K architecture. This chip consists of
the following modules:
Top Control logic (NSR and RSR registers, Ready and busy Control Signals)
Cluster of 16 Neurons
Recognition Stage (optional Usage)
I2C slave (optional Usage)
Top Control logic: This module synchronizes communication between the clusters of
neurons, the recognition state machine and the I2C slave. In addition, it is responsible
for inter-module and inter-neuron communications. The inter-module communication
is made through a bi-directional parallel bus of 25 wires: Data Strobe (DS), Read/Write (RW_),
5-bit Register (REG), 16-bit Data (DATA) and Ready (RDY). The inter-neuron communication uses
two additional lines indicating the global status of the neural network: Identified
recognition (ID) and Uncertain recognition (UNC). Communication with external
control units can be made through the same parallel bus or by using the serial I2C bus [2].
Cluster of 16 Neurons: A cluster consists of 16 identical neurons operating
in parallel. The execution or behavior of a neuron is independent of its parent cluster
or chip. All neurons, whether from the same cluster or from different clusters, display the
same behavior and execute the instructions in parallel. No controller or supervisor is
required for this. This module is responsible for the selection of one of the two
classifiers: KNN or RBF [2].
Recognition stage (optional usage): Enabling the RECO_EN pin on the CM1K chip
enables the recognition stage. This can be done programmatically via a control command.
This module is used for recognizing the incoming vectors and providing a
response [2], and it becomes the master controller for the CM1K chip once the
RECO_EN pin is enabled.
I2C slave controller (optional usage): This module is enabled physically with the I2C_EN
pin and is used for receiving the serial signal on the I2C_clk and I2C-DATA lines. It also
converts that signal into a combination of DS, RW_, REG and DATA signals compatible with
the parallel neuron bus [2].
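The field widths of the parallel bus described above can be captured in a small software model. The sketch below is an assumption-laden illustration in Python rather than any chip or vendor API: the class `BusCycle` and its field names are hypothetical, and only the 5-bit REG and 16-bit DATA widths are taken from the text.

```python
from dataclasses import dataclass

REG_BITS = 5    # width of the register address field (REG), from the text
DATA_BITS = 16  # width of the data field (DATA), from the text


@dataclass(frozen=True)
class BusCycle:
    """One access on the parallel bus (hypothetical container for illustration)."""
    write: bool    # True for a write access, False for a read access
    reg: int       # register address carried on REG
    data: int = 0  # value carried on DATA

    def __post_init__(self) -> None:
        # Reject values that would not fit on the physical lines.
        if not 0 <= self.reg < (1 << REG_BITS):
            raise ValueError("REG must fit in 5 bits")
        if not 0 <= self.data < (1 << DATA_BITS):
            raise ValueError("DATA must fit in 16 bits")


cycle = BusCycle(write=True, reg=0x1F, data=0xFFFF)  # largest legal values
print(cycle)
```

A range check of this kind is a cheap way to catch test vectors that could never be driven onto the real bus.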
2.4 CM1K’s Working and Applications
The CM1K emulates our brain to some extent. It receives information in the form of
input vectors; a neuron reacts to a command and either learns the pattern or works in
tandem with other neurons to recognize a pattern. A neuron memory consists of 256
components, with each component 8 bits wide. Therefore, each neuron has
256 bytes of total storage capacity. Each CM1K is made up of 1024 such neurons,
which makes 1024 times 256 bytes of memory for a system with one CM1K chip. Each
neuron has its own processing unit that reacts to an incoming pattern. When a vector is
broadcast to the CM1K chip, all of the neurons can access the information at once.
When attempting to recognize a pattern, each neuron can observe the
response of its counterparts and withdraw itself from the race if another neuron
reports a smaller distance value [7].
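The storage arithmetic and the "race" between neurons can be checked with a few lines of Python. This is a minimal sketch for illustration: the function name `closest_neuron` is hypothetical, and the sequential `min` search merely stands in for what the chip resolves in parallel in hardware.

```python
COMPONENTS_PER_NEURON = 256  # components per neuron, each 8 bits wide
NEURONS_PER_CHIP = 1024      # neurons in one CM1K

bytes_per_neuron = COMPONENTS_PER_NEURON * 8 // 8  # 8-bit components -> bytes
bytes_per_chip = NEURONS_PER_CHIP * bytes_per_neuron


def closest_neuron(distances):
    """Index of the neuron that stays in the race: the smallest reported distance."""
    return min(range(len(distances)), key=distances.__getitem__)


print(bytes_per_neuron, bytes_per_chip)  # 256 262144
print(closest_neuron([12, 5, 9]))        # 1
```

One chip therefore holds 262144 bytes (256 KiB) of pattern memory.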
In the CM1K, the Distance Evaluation Unit (DEU) calculates a distance value
between the incoming vector and the pattern stored in a neuron.
The DEU can calculate the distance either as the Manhattan distance (also known as Norm L1)
or as Norm L-Sup (also known as Lsup). Both the Manhattan distance and the Lsup distance
have their advantages and disadvantages. The following table states some of the features
of these methods:
Manhattan Distance
Mathematical equation: D_L1 = Σ |V_i − P_i|
The L1 distance emphasizes the drift of the sum of all the components between
incoming vector V and the stored pattern P.
Norm L-Sup
Mathematical equation: D_Lsup = max |V_i − P_i|
The Lsup distance emphasizes the largest drift of the same component between
input vector V and the stored pattern P.
Table 2a: Manhattan distance vs. Norm L-Sup
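The two norms in Table 2a can be written directly from their equations. The Python sketch below is only a software illustration of the formulas; the function names and the sample vectors are made up for the example and do not describe how the DEU is implemented in hardware.

```python
def l1_distance(v, p):
    """Manhattan distance (Norm L1): sum of the per-component drifts."""
    if len(v) != len(p):
        raise ValueError("vector and pattern must have the same length")
    return sum(abs(a - b) for a, b in zip(v, p))


def lsup_distance(v, p):
    """Norm L-Sup: largest drift over all components."""
    if len(v) != len(p):
        raise ValueError("vector and pattern must have the same length")
    return max(abs(a - b) for a, b in zip(v, p))


incoming = [10, 20, 30, 40]  # illustrative input vector V
stored = [12, 20, 25, 40]    # illustrative stored pattern P

print(l1_distance(incoming, stored))    # 7
print(lsup_distance(incoming, stored))  # 5
```

For these vectors the drifts are 2, 0, 5 and 0, so L1 accumulates them to 7 while Lsup reports only the largest one, 5.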
In the CM1K, a neuron will “fire” if the distance between the input vector and its
stored pattern is less than its influence field. This state is referred to as the “firing state”.
Firing is the neurons’ way of communicating that they hold the information that
is being broadcast on the bus. A firing neuron carries a lot of information; one of the
most important pieces is the category associated with the neuron. The
system multiplexes the responses from the firing neurons to produce a classification. The
CM1K classifies a response as either an Identified or an Uncertain recognition.
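The firing test and the resulting classification can be sketched in Python. This is a simplified reading, not the chip's documented decision logic: it assumes, following the four-neuron example of Table 2b, that the category held by more firing neurons is reported as identified and that a tie is reported as uncertain. The names `fires` and `classify` are hypothetical.

```python
from collections import Counter


def fires(distance, influence_field):
    """A neuron fires when the distance is less than its influence field."""
    return distance < influence_field


def classify(firing_categories):
    """Combine the categories of the firing neurons into one response.

    Assumption (read off Table 2b): the category with more firing neurons
    wins, and a tie between the top categories is uncertain.
    """
    if not firing_categories:
        return ("no recognition", None)
    counts = Counter(firing_categories).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ("uncertain", None)
    return ("identified", counts[0][0])


print(fires(distance=4, influence_field=10))  # True
print(classify([1, 1, 1, 2]))                 # ('identified', 1)
print(classify([1, 1, 2, 2]))                 # ('uncertain', None)
print(classify([1, 2, 2, 2]))                 # ('identified', 2)
```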
Let us take an example to understand the responses. Consider a system that has
four trained neurons. Each neuron is associated with either category 1 or category 2.
Now, let us broadcast an input vector that is similar to the knowledge stored in the
neurons. Depending on the multiplexed result of the neurons’ responses, the system
will determine whether the input vector was recognized or not. The following table shows the
possible outcomes:
Total number of neurons       Total number of neurons       Classification
associated with Category 1    associated with Category 2
3                             1                             Identified as category 1
2                             2                             Uncertain recognition
1                             3                             Identified as category 2
Table 2b: Classification Example
2.4.1 Learning of a Neuron
Learning in neurons is a multi-step process. First, an input vector is broadcast
on the bus, followed by a category value. Second, the neurons that are either in the RTL
(Ready-To-Learn) state or in the “firing state” respond to the broadcast vector. If a
firing neuron responds to the learning operation, its influence field is modified with
the latest information. If no neuron fires in response to the broadcast vector, a new neuron
is committed with the category of the input vector. The influence field of this new
neuron is set to its maximum influence field.
The neurons in the CM1K are connected through a bi-directional parallel bus
that prevents any redundant learning in the system. This bus is accessible to all of the
neurons. Whenever an input vector is broadcast on this neuron bus, each neuron
checks its status and responds to the broadcast accordingly.
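The learning steps described in this section can be sketched as a small software model. The Python code below is a hedged illustration, not the chip's actual algorithm: the `Neuron` class and the `learn` function are hypothetical, the L1 norm is chosen arbitrarily, and the way a firing neuron's influence field is "modified" (shrunk to the measured distance when its category disagrees, clamped at MINIF and flagged as degenerated) is an assumption that the text does not spell out. As stated in the text, a new neuron is committed only when no neuron fires.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Neuron:
    pattern: List[int]
    category: int
    aif: int                   # active influence field
    degenerated: bool = False  # set when the field would drop below MINIF


def l1(v, p):
    return sum(abs(a - b) for a, b in zip(v, p))


def learn(neurons, vector, category, minif, maxif):
    """One learning operation on a list of committed neurons (sketch)."""
    firing = [n for n in neurons if l1(vector, n.pattern) < n.aif]
    for n in firing:
        if n.category != category:
            # Assumed update rule: shrink the field so this neuron no longer
            # fires for the new vector.
            d = l1(vector, n.pattern)
            if d < minif:
                n.aif = minif
                n.degenerated = True
            else:
                n.aif = d
    if not firing:
        # No neuron fired: commit a new neuron with the maximum influence field.
        neurons.append(Neuron(list(vector), category, maxif))
    return neurons


net: List[Neuron] = []
learn(net, [0, 0], category=1, minif=2, maxif=100)   # commits neuron 0
learn(net, [10, 0], category=2, minif=2, maxif=100)  # neuron 0 fires, field shrinks to 10
learn(net, [50, 0], category=2, minif=2, maxif=100)  # nothing fires, commits neuron 1
print(net)
```

In this run the first neuron ends with an influence field of 10 and the second neuron is committed with the maximum field of 100.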
2.4.2 Current use of the Technology
The CM1K’s pattern recognition capabilities have benefited many fields. One of its
practical implementations is offshore fish recognition and sorting. This application
uses the V1KU [1], which integrates a CMOS sensor that provides image/video input vector
data and a CM1K chip for pattern recognition. However, this system is accompanied by
a software tool that is used for communication with the device. The following block
diagram explains the process involved in this application.
Figure 2e: Methodology Flow chart [11]
This system requires a software interface for its operation and is therefore highly
software dependent. An end user can communicate with this system only through the Image
Knowledge Builder (IKB) software, which rules out
direct hardware level communication with the CM1K chip.
IKB provides a model for communication with the chip at the software level. This
model serves software applications well but offers limited access for
hardware level communication. Because the software and a USB port handle all the
communication protocols in IKB, it is difficult to determine the exact functionality of the
chip, which limits exploration of the CM1K technology. Hence the need
for a hardware communication model. Such a model will enable an end user to
understand this technology to the core.
Such a model has various benefits, such as faster data exchange,
since it removes the USB overhead.
With a software communication model there will always be the overhead of some
standard protocol, USB in the case of IKB.
Chapter 3
INTRODUCTION TO LATTICE FPGA-LFXP2-5E-6TN144C
AND CM1K-PGA69 MODULE
3.1 Introduction
The two major pieces of hardware involved in this project are an FPGA from
Lattice Semiconductor™ and a CM1K-PGA69 module from CogniMem Technologies
Inc. This chapter introduces both of these hardware modules. It also
discusses other tools necessary for this project, along with a brief description of
Verilog, a Hardware Description Language (HDL).
3.2 Designing with an FPGA
There are three main categories of Field Programmable Devices (FPDs): Simple
Programmable Logic Devices (SPLDs), Complex Programmable Logic Devices
(CPLDs) and Field Programmable Gate Arrays (FPGAs) [13]. Each has its own
advantages and disadvantages, and covering all of them is outside the scope of this
report. This section provides a brief overview of FPGAs and then discusses
the Lattice-XP2 FPGA architecture in some detail.
The first type of user-programmable chip that could implement a logic circuit
was the Programmable Read-Only Memory (PROM) [13]. Since then, FPDs have evolved
dramatically. Now one can find an FPD suitable for virtually every application.
In 1985, Xilinx introduced a concept of combining the user control, the
densities, cost benefits of gate arrays, and named it as the FPGA. An FPGA is a regular
structure of Logic cells™ or modules and interconnects which is under the designer’s
complete control [15]. There are two type of FPGAs: Static Random Access Memory
(SRAM)-based and One-time programmable (OTP). This types of FPGAs differs in the
implementation of the logic cells™ and the mechanism to use to make connections in
the device [15]. However, SRAM-based FPGAs are most popular among the designer
because of their reusability. FPGA is among the fastest growing segments of the
semiconductor industry.
There are several FPGA vendors in the market, and quality and features are the
deciding factors in selecting an FPGA. Xilinx, Altera, Lattice and Actel are among
the top vendors in the industry.
For this project, a LatticeXP2 family FPGA was chosen. CogniMem
Technologies Inc. provided a Brevia evaluation board [19] and a CM1K-PGA69
[16] module for this project.
3.2.1 LatticeXP2 family
LatticeXP2™ is an instant-on, secure, small-form-factor FPGA with a versatile
development platform for quick launch of design initiatives and rapid time-to-market
[17]. These devices are based on the flexiFlash™ [18] architecture, which combines a
4-input Look-up Table (LUT) based FPGA fabric with non-volatile Flash cells for
on-chip storage of design data [17].
Figure 3a: The Lattice XP2 Brevia Development kit [19]
The LatticeXP2 Brevia Development kit has an LFXP2-5E-6TN144C FPGA
device, a 2 Mbit SPI Flash memory, a serial RS-232 interface, an on-board USB
controller for JTAG programming (FTDI FT2232H), 2x20 and 2x5 expansion headers,
a 4-bit DIP switch for user-defined inputs and 8 status LEDs for user-defined
outputs [15].
The LFXP2-5E-6TN144C FPGA device is part of the XP2 FPGA family and inherits
its benefits, such as instant-on operation, infinite reconfigurability, on-chip storage
with FlashBAK embedded block memory, and serial TAG memory with a design
security feature [20].
Table 3a: Basic physical attributes of LFXP2-5E-6TN144C FPGA device [20] [19]
The LFXP2-5E-6TN144C FPGA device comes in a 144-pin package with its core
operating at 1.2 V. It offers five thousand (5k) LUTs as a programming resource for
the designer. The device contains an array of logic blocks surrounded by
Programmable I/O Cells (PIC). Interspersed between the rows of logic blocks are
rows of sysMEM™ Embedded Block RAM (EBR) and a row of sysDSP™ Digital Signal
Processing blocks, as shown in Figure 3b [20].
Figure 3b: XP2 on-chip component placement [20]
3.2.2 Programming a LatticeXP2 FPGA
Lattice Semiconductor™ is one of the most popular FPGA vendors in the
semiconductor market. Its innovative architecture and its programmable power
management and clock management solutions provide easy-to-use yet powerful
devices. Lattice also provides many free software tools with its devices for a
complete design solution. Several such tools are used in this project; Lattice
Diamond design software is one of the most important of them.
3.2.2.1 Diamond Design Software and Design Suite
Lattice Diamond® design software allows large and complex designs to be
efficiently implemented in the LatticeXP2 family of FPGA devices. The Diamond
software uses the synthesis tool output along with the constraints from its floor
planning tools to place and route the design in the LatticeXP2 device. The timing
tool extracts the timing from the routing and back-annotates it into the design for
timing verification [20].
Lattice Semiconductor™ provides customers with a free logic design
environment called Lattice Diamond. This tool provides a complete platform for a
design. A designer can choose Verilog® [24], VHDL [23] or a mix of both of these
Hardware Description Languages (HDLs) [24] for a design. It also allows mixing of
EDIF [22] and schematic sources [22]. Writing pin constraints is easy with its
Spreadsheet View [22].
A designer can install the latest version of the Diamond tool from the Lattice
Semiconductor™ website. Currently this tool is available for the Linux and Windows
platforms. Several other tools come as part of the tool set along with the Diamond
suite. Some of these tools are optional and some are part of the Diamond software
design environment. A few of them can be downloaded as stand-alone tools and used
without installing the Diamond software. Diamond Programmer [22] is one such
tool: it is part of the Diamond design tool and can also be used stand-alone. This
project uses the Diamond software extensively. The following table gives an
overview of the tools that are part of the Diamond software. These tools are used
for their respective purposes as needed throughout the project.
Tool | Description
Project Management Tools | Include the Reports view, Run Manager, and the Security Setting tool to enable you to create and maintain the project, keep track of the stages in the design implementation process, review reports, and compare different implementations of the project.
Design Entry Tools | Include Source Editor, Schematic Editor, Symbol Editor, Symbol Library Manager, IPexpress, Memory Generator, LatticeMico System, and HDL Diagram, which offer VHDL, Verilog, EDIF, schematic and mixed-mode design entry support and design structure check.
Design Simulation Tools | Include Simulation Wizard, Active-HDL Lattice Edition, and Waveform Editor for performing functional simulation for the projects and creating the test stimulus files.
Design Constraints Application Tools | Include Spreadsheet View, Package View, Device View, Netlist View, NCD View, Floorplan View and Physical View to enable you to set constraints for implementing the design.
Design Implementation Tools | Include the Process view, Synplify Pro for Lattice, and Clear Tool Memory to ease the design implementation process.
Analyzing Static Timing, Power Consumption, and Signal Integrity Tools | Include Timing Analysis View and Power Calculator to enable you to estimate the design performance, experiment with different configurations, and to calculate power consumption.
Programming the FPGA Tool | Includes Programmer to let you program the FPGAs.
Testing and Debugging On-chip Tools | Include Reveal Inserter, Reveal Analyzer, and LatticeMico Debugger to let you complete the final stage of developing a design: testing in the actual FPGA, either on a test board or in your system.
Applying Engineering Change Order Tool | Includes ECO Editor which supports engineering change orders by editing the output files from the place-and-route stage of the design implementation process.
EPIC Device Editor | Provides device editing capability for engineering change management and detailed manipulation of FPGA implementation.
HTML Help and User Documentation | Includes complete instructions for designing with Diamond design tools and third-party tools. Also provides user manuals, tutorials, example design projects, and access to technical documentation from the Lattice Semiconductor Web site.
Tcl/Tk Scripting Tool | Enables you to automate Diamond design processing.
Table 3b: Tools and Descriptions of the tools in Diamond Design suite [21].
3.3 Design flow
A Hardware Description Language (HDL) model typically follows a design flow.
In general, industry follows this flow to tape out Application Specific
Integrated Circuits (ASICs) or General Purpose Processors (GPPs). The typical flow
starts with a design specification, followed by a behavioral description. Once the
behavioral description is complete, a Register Transfer Level (RTL) description is
written in HDL. After the HDL code is written, functional verification [27] is
performed. If functional verification fails, the RTL description is checked again to
make sure that there are no mistakes in the process so far. If functional verification
meets the design specification criteria, then logic synthesis [25] and timing
verification [26] of the design are performed. A gate-level netlist is then generated,
followed by logical verification and testing.
If every step has met the specification criteria so far, floor planning and
automatic place and route are performed. This leads to the physical layout, after
which layout verification is done. If this step meets all the industry criteria and
design specifications, the design is implemented. In case of any failure in this
step of the design flow, the RTL description step is revisited and the whole flow is
followed again until layout verification is accomplished.
1. Design Specification
2. Behavioral Description
3. RTL Description (HDL)
4. Functional Verification and Testing
5. Logic Synthesis and Timing Verification
6. Gate-Level Netlist
7. Logical Verification and Testing
8. Floor Planning and Automatic Place & Route
9. Physical Layout
10. Layout Verification
11. Implementation
Figure 3c: Typical design flow [3]
For FPGAs a similar approach is followed; however, instead of being implemented
as a physical chip, the design is implemented in the FPGA. This implementation
results in a design comparable to an ASIC, with the flexibility to change the design
at any point in time. Using an FPGA for pre-validation of a design can save a great
deal of time and money for any innovative venture. The following figure shows a
typical design flow for an FPGA.
Figure 3d: Typical design flow for an FPGA [29].
Although the design flows for Very Large Scale Integration (VLSI) Integrated
Circuits (ICs) and FPGAs are slightly different, both use HDLs to describe digital
circuits. There are two basic HDLs that can describe any digital circuit and follow
IEEE standards: Verilog® [24] and VHDL (Very high-speed integrated circuit
Hardware Description Language) [23]. This project uses Verilog to represent a
digital model of communication between an FPGA and a cognitive memory chip, the
CM1K.
3.4 HDL-Verilog®
In the early 1980s, the digital design market was on the verge of booming.
Digital designers felt the need for a language from which a netlist [24, 25] could
be generated. This need for a language to model digital circuits gave birth to
Verilog in 1983 at Gateway Design Automation [3]. Verilog was accepted as an IEEE
standard in 1995. A later standard, IEEE 1364-2001, introduced many significant
changes [3].
Designers can now describe circuits at the register transfer level (RTL) using
Verilog. All a designer has to do is describe the data flow in HDL at the RTL level;
the rest is taken care of by the synthesis tool [25]. The synthesis tool extracts the
required information from the code and provides the designer with the details of
the gates and their interconnections [3, 25].
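To make the idea of an RTL description concrete, the following is a minimal sketch of a registered 2-to-1 multiplexer. It is purely illustrative and is not part of the MCwCM design; the module name and all port names are hypothetical.

```verilog
// Illustrative RTL only; not part of the MCwCM design.
// Module and port names are hypothetical.
module rtl_example (
    input  wire       clk,
    input  wire       sel,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [7:0] q
);
    // Data flow: choose a or b, then register the result
    // on each rising clock edge.
    always @(posedge clk) begin
        q <= sel ? b : a;
    end
endmodule
```

The designer states only how data moves between registers; a synthesis tool would typically map this description to multiplexer logic feeding eight flip-flops.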
An HDL has many benefits over schematic-based digital design. One of the
most attractive advantages of using an HDL to design a digital circuit is that it
makes the design independent of fabrication technology. This means that an HDL
design for a piece of logic can be used to fabricate a circuit at 35 micrometers or at
32 nanometers. A lot of money and time can be saved by modeling a design in HDL
rather than actually manufacturing a prototype and then troubleshooting it.
Additionally, with the growth in the complexity of digital circuits, it has become
virtually impossible for a human to physically design a circuit without using an HDL
and other associated tools [22]. Once a designer has written HDL code, it needs to
be verified and tested for its functionality. Various methods and tools are available
in the market to validate a Verilog design.
ModelSim® is used for simulation and testing of the Verilog code in this
project. It is a useful tool for students and professionals to simulate an HDL design,
and it supports both Verilog and VHDL. Its easy-to-use graphical interface makes it
one of the better choices for students starting with HDL simulation and
verification. The Student Edition of this tool can execute up to 10,000 lines of
code [30]. ModelSim helps diagnose a design by providing waveforms and data flow
for it.
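As a rough illustration of what a simulator consumes, the following self-contained test bench sketch generates a clock, releases a reset, and records signal activity for waveform viewing. The 4-bit counter stands in for a real design under test; the module name, signal names and the VCD file name are all hypothetical, and the timing values are arbitrary.

```verilog
`timescale 1ns/1ps
// Test bench sketch; names and timing values are illustrative assumptions.
module tb_sketch;
    reg       clk   = 1'b0;
    reg       rst_n = 1'b0;   // active-low reset, held low at start
    reg [3:0] count;          // stand-in for a design under test

    // 20 ns period, i.e. a 50 MHz clock
    always #10 clk = ~clk;

    // Stand-in logic: a counter with asynchronous active-low reset
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= 4'd0;
        else
            count <= count + 4'd1;
    end

    initial begin
        // Standard Verilog system tasks that record a value-change dump
        $dumpfile("tb_sketch.vcd");
        $dumpvars(0, tb_sketch);
        #25 rst_n = 1'b1;     // release reset between clock edges
        #200;
        $display("count = %0d at t = %0t", count, $time);
        $finish;
    end
endmodule
```

A simulator such as ModelSim can also log waveforms through its own interface, so the dump calls are optional.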
Chapter 4
MODEL FOR COMMUNICATION WITH A COGNITIVE MEMORY (MCwCM)
4.1 Introduction
Hardware-level communication requires a good understanding of the
communication protocol between two devices. The Oxford dictionary defines
protocol as “The accepted or established code of procedure or behavior in any
group, organization, or situation” [32]. This also applies to two devices
communicating with each other: they need to agree upon a set of rules for
communication. Communication between two devices implies an exchange of
information or data. Therefore, in the world of digital electronics, a protocol refers
to a set of rules that ensures a smooth flow of information between two devices.
MCwCM demonstrates hardware-level communication between a CM1K
PGA-69 [14] and a Lattice FPGA-LFXP2-5E-6TN144C [15]. Previous chapters were
dedicated to the CM1K, the Lattice FPGA device and the required tools and
technologies. This chapter provides a detailed description of the MCwCM design.
In the MCwCM system, information flows between a Lattice
FPGA-LFXP2-5E-6TN144C and the CM1K PGA-69 module. From here onwards, this
document uses “PGA” to represent the CM1K-PGA69 module, “FPGA” to represent
the Lattice FPGA-LFXP2-5E-6TN144C and “eval board” to represent the LatticeXP2
Brevia Development kit.
4.2 CM1K communication protocol
The CM1K can work in two modes: Normal mode, or Save and Restore mode.
Normal mode is also sometimes referred to as Learning and Recognition mode. In
this mode, a designer can use either the 16-bit parallel bus or the serial bus for an
application. MCwCM is designed to work in Normal mode and uses the CM1K
parallel bus. Fifteen registers control the behavior of all neurons.
The following tables discuss the neuron registers in detail. Each table has six
columns. The first column gives the name of the register. The second column
describes the functionality of the register. The third column contains the address of
the respective register. A write cycle updates a register, whereas a read cycle lets
the user retrieve the contents of a particular register. The fourth column describes
the possible operations on a register in Normal mode. The fifth column describes
the possible read or write operations on a register in Save and Restore mode.
Lastly, the sixth column gives the default value of a register along with its value
range.
Table 4a: Neuron Registers-1 [7]
Table 4b: Neuron Registers-2 [7]
Table 4c: Neuron Registers-3 [7]
Apart from the registers, there are command and control lines that play an
important role in information exchange in CM1K technology. The following is a
brief description of some of the most critical command and control lines.
DS: DS stands for Data Strobe. This signal is asserted to initiate a read command
from, or a write command to, the CM1K module.
RW_: This is a command line that tells the CM1K module whether data is to be
written into the CM1K module or read from it. By default RW_ is 1, which implies a
read command.
REG: This is a 5-bit bus used to address all 15 neuron registers. The “Addr”
column of Tables 4a to 4c gives the hexadecimal address of each neuron register in
the CM1K.
DATA: This is a 16-bit bidirectional data bus, which carries data into the CM1K
in write mode and returns data from the CM1K in read mode. All the bits of DATA
have pull-up connections, which gives the 16-bit bus a default value of all 1's
(xFFFF).
ID: ID stands for the Identified line. The value of this line represents the
recognition status. If all the neurons recognize the last input vector and return the
same category value, then the value of ID_ is zero. This line is updated each time
the last component of a vector is broadcast to the neurons, either through a write
command to the LCOMP register or through the real-time recognition logic of the
CM1K [2].
UNC: UNC stands for the Uncertain line. It is bidirectional in nature and shall
never be driven [2]. The value of UNC is zero when there is a conflict among the
firing neurons on the category of an input vector.
Apart from the above-mentioned command and control lines, there are a few
more important signals and pins a designer must be aware of to establish
hardware-level communication. These signals are as follows:
S_CHIP: This pin is used to enable or disable the parallel bus of a CM1K chip. By
default it is pulled down; this setting configures the parallel bus as a bidirectional
bus and enables multiple CM1K chips to receive commands synchronously.
DCI: DCI stands for Daisy Chain In and is an input to the first neuron of the system.
When DCI is high, it puts the neuron in ready-to-learn (RTL) mode [7].
DCO: DCO stands for Daisy Chain Out. The DCO of a chip goes high when the last
neuron of the chip is committed. In a multichip configuration, this signal connects
to the DCI signal of the next chip in the chain.
RDY: RDY stands for the Ready line. RDY is pulled down by the neurons during the
execution of a command and released upon its termination [2]. This is a very
important signal and deserves close attention when designing with the CM1K.
Because this project does not use the built-in recognition logic of the CM1K chip,
the recognition logic is disabled by setting the RECO_EN signal low in the FPGA
design. Likewise, the built-in I2C master is not used in this project and is disabled
by setting its enable signal low in the FPGA design.
4.3 MCwCM components
The MCwCM HDL design consists of several components. The following is a brief
description of each:
Control State Machine (CSM): This module is the controller of the HDL design and
controls the data flow between the FPGA and the PGA. It also generates the data for
the PGA module, along with the address to which this data should be directed in
the PGA module.
CM1K Register Address Bus Logic (bus logic): This logic is connected to the
register address bus signals of the CSM and provides the register address
information to the PGA module.
CM1K Data Bus Logic (data logic): This module connects the data bus signals of the
CSM to the data bus of the PGA module. This ensures a dedicated path for data
exchange.
Figure 4a: MCwCM Block diagram
4.4 Signal description of MCwCM
As described in the previous sections of this chapter, certain signals are used to
talk to a CM1K chip. These signals need to be configured with their respective
values for hardware-level communication. In the MCwCM design, the FPGA
initializes the signals and registers for the CM1K chip in the PGA module.
Figure 4b: MCwCM signal representation diagram
Figure 4b shows the direction of all the signals with respect to the FPGA. All the
FPGA signals are described below.
DCI: DCI represents Daisy Chain In. This signal is set high, which enables the CM1K
chip in the PGA module and puts its first neuron in its active state (RTL state). This
signal is an output from the FPGA's point of view.
RDY: This signal represents Ready and is an input to the FPGA. It tells the logic in
the FPGA whether the PGA is ready to receive a command. If the RDY signal is low,
the CM1K is not ready to receive any further command and is still processing the
previous one.
DS: This signal represents Data Strobe and is connected to the DS pin of the PGA
module. It is an output from the FPGA and an input to the PGA module.
Data Bus: This is a 16-bit bus that connects to the DATA lines of the PGA module.
It is responsible for transmitting data from the FPGA to the PGA and receiving data
from the PGA at the FPGA. This is a bidirectional bus.
Cm1kAddressBus: This is a 5-bit address bus connected to the REG lines of the PGA
module. It is used to address the neuron registers in the CM1K chip of the PGA
module. This is also a bidirectional bus.
RwEn: This signal represents read/write enable and connects to the RW_ pin of the
PGA module. It is a command signal that selects a read or a write cycle. It is an
output from the FPGA and an input to the CM1K chip.
UNC: UNC represents Uncertain and is an input to the FPGA. This pin is connected
to the UNCn pin of the PGA module.
ID: ID represents the Identified line and connects to the ID pin of the PGA module.
It is an input to the FPGA and an output from the PGA module.
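The bidirectional Data Bus described above is commonly modeled in Verilog with a tri-state assignment. The following is a minimal sketch under stated assumptions: the module name and the `drive_en`, `wr_data` and `rd_data` names are hypothetical and do not come from the MCwCM code; only `DataBus` matches a port name used in Appendix A.

```verilog
// Tri-state sketch for a 16-bit bidirectional bus.
// drive_en, wr_data and rd_data are hypothetical names.
module data_bus_sketch (
    input  wire        drive_en,  // 1 while the FPGA drives the bus
    input  wire [15:0] wr_data,   // value to place on the bus
    output wire [15:0] rd_data,   // value currently seen on the bus
    inout  wire [15:0] DataBus    // shared lines to the PGA module
);
    // Drive the bus only when enabled; otherwise release it (high impedance)
    assign DataBus = drive_en ? wr_data : 16'hzzzz;

    // The bus can always be observed, whoever is driving it
    assign rd_data = DataBus;
endmodule
```

When the bus is released, the pull-ups on the DATA lines would be expected to hold it at xFFFF, the default value described earlier.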
4.5 Communication protocols
The following rules are to be followed when designing with a CM1K chip.
1. A low RDY signal means that the CM1K chip is busy processing the previous
command, so no new transaction should be initiated.
2. DS should be high before the start of every transaction and should then be
pulled down.
3. DS should never be asserted when the RDY line is low.
4. Neurons sample signals on the positive edge of the clock, so all the signals
should be stable before the positive edge of the clock.
5. The setup time must be at least five nanoseconds.
6. The hold time must be at least five nanoseconds.
7. When the DATA line is not being driven, it must be put to its default value.
8. DS must be asserted or de-asserted only on the negative edge of the clock.
9. RECO_EN should be set low if the internal recognition logic is not used in the
design.
10. I2C_EN should be set low if the I2C master is not used in the design.
11. The S_CHIP line should be set low.
12. The G_STDBY line should also be set low. This line is responsible for putting
the CM1K chip in standby mode. This project does not use this feature, so
G_STDBY is set low.
13. The DCI line is set high, as this puts the first neuron in the chain in the
ready-to-learn state.
14. G_RESET of the PGA module is an active-low reset, which means that if
G_RESET is low then the PGA module goes to its reset state.
These protocols are for a design with a single chip, with an external master and
internal recognition logic disabled. Disabling the internal recognition logic also
disables the digital input bus [7]. To use this bus, the recognition logic should be
enabled.
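Rules 9 to 13 above describe pins that stay at a fixed level for the whole session. A minimal sketch of such tie-offs is shown below. It assumes that "active low" in the rules means driving logic 0 and "active high" means driving logic 1; the module name is hypothetical, while the port names follow those in Appendix A.

```verilog
// Static pin configuration sketch (rules 9 to 13).
// Assumption: "active low" = logic 0, "active high" = logic 1.
module cm1k_static_pins_sketch (
    output wire reco_en,
    output wire i2c_en,
    output wire s_chip,
    output wire G_STDBY,
    output wire dci
);
    assign reco_en = 1'b0;  // rule 9: internal recognition logic not used
    assign i2c_en  = 1'b0;  // rule 10: I2C master not used
    assign s_chip  = 1'b0;  // rule 11
    assign G_STDBY = 1'b0;  // rule 12: standby feature not used
    assign dci     = 1'b1;  // rule 13: first neuron ready to learn
endmodule
```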
4.6 MCwCM Design Flow
This design is an implementation of a state machine with four states: IDLE,
WRITE, BUS_TURNAROUND and READ.
IDLE state: This state initializes all the registers to their default values as required.
It is also the default state of the system.
WRITE state: In this state the FPGA initiates a write cycle. The state is responsible
for designating a REG address and then providing the data to be written to that
register address of the CM1K.
BUS_TURNAROUND state: This is an intermediate state between the write and read
cycles. It returns the data bus to its default state and sets the DS signal low, as per
the protocol.
READ state: The read happens in this state. It initiates a read cycle and sets the
RW_ line high. It also configures the address of the register in the CM1K from
which the data is to be read.
After a read cycle is done, the machine goes back to its IDLE state. In the
MCwCM design, the number of read and write cycles performed depends on the
parameter “MAX_NUM_TRANS”, which can be set in the HDL code to increase or
decrease the number of read and write cycles. Another parameter, BASE_ADDR,
defines the address to write; for this project it defaults to “x06”, which is the MinIF
address. The parameter DEFAULT_WR_DATA is the data initially written to the
CM1K chip on the PGA module; it is then incremented by one for each subsequent
cycle. DEFAULT_DATA is the default value of the data bus in the idle condition. The
following is a state diagram representation of the MCwCM design.
[State diagram: the state machine starts in IDLE; states shown are IDLE, WRITE,
BUS_TURNAROUND and READ. Transition labels shown in the figure:
If ((start_op == 1) && (num_trans_completed < MAX_NUM_TRANS) && (Rdy));
If not ((start_op == 1) && (num_trans_completed < MAX_NUM_TRANS) && (Rdy));
If (Rdy & start_op); If not (Rdy & start_op)]
Figure 4c: MCwCM state diagram representation
In Figure 4c, “start_op” represents the start of an operation. In the IDLE state all
the registers are reset to their default values. The above-mentioned state machine
was realized in Verilog. The design code can be found in Appendix A.
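The four-state flow can be sketched as follows. This is a simplified illustration and not the Appendix A implementation: the one-cycle DataStrobe pulse, the falling-edge update and the 8-bit transaction counter are assumptions made for the sketch, and the `clk` and `rst_n` port names are hypothetical. The state encodings and MAX_NUM_TRANS follow the parameter values listed in Appendix A.

```verilog
// Simplified four-state sketch; timing details are assumptions.
module mcwcm_fsm_sketch (
    input  wire       clk,        // CM1K clock (hypothetical port name)
    input  wire       rst_n,      // active-low reset (hypothetical port name)
    input  wire       start_op,
    input  wire       Rdy,
    output reg  [1:0] state,
    output reg        DataStrobe,
    output reg        rdWrEn      // 1 = read, 0 = write
);
    parameter IDLE           = 2'b00;
    parameter WRITE          = 2'b01;
    parameter BUS_TURNAROUND = 2'b10;
    parameter READ           = 2'b11;
    parameter MAX_NUM_TRANS  = 10;

    reg  [7:0] num_trans_completed;
    wire       go = Rdy & start_op;

    // Update on the falling edge so outputs are stable at the next rising edge
    always @(negedge clk or negedge rst_n) begin
        if (!rst_n) begin
            state               <= IDLE;
            DataStrobe          <= 1'b0;
            rdWrEn              <= 1'b1;   // default is read
            num_trans_completed <= 8'd0;
        end else begin
            DataStrobe <= 1'b0;            // strobe is a one-cycle pulse
            case (state)
                IDLE: if (go && (num_trans_completed < MAX_NUM_TRANS)) begin
                    state      <= WRITE;
                    DataStrobe <= 1'b1;    // start the write cycle
                    rdWrEn     <= 1'b0;
                end
                WRITE: if (go) state <= BUS_TURNAROUND;  // wait for Rdy
                BUS_TURNAROUND: begin
                    rdWrEn <= 1'b1;
                    if (go) begin
                        state      <= READ;
                        DataStrobe <= 1'b1;  // start the read cycle
                    end
                end
                READ: if (go) begin          // read finished, Rdy is back high
                    state               <= IDLE;
                    num_trans_completed <= num_trans_completed + 8'd1;
                end
                default: state <= IDLE;
            endcase
        end
    end
endmodule
```

The sketch leaves out the address and data bus handling so that the state transitions remain easy to follow.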
4.7 Simulation results
The following waveform outputs of the design are presented with a detailed timing
explanation.
[Waveform annotations: initialization of the CM1K clock; initialization of all the
registers in the system on the positive edge of the clock]
Figure 4d: Waveform output-1
The MCwCM has two clock sources. One source generates a 50 MHz clock, which
is then divided down to a 25 MHz clock. This division is done to comply with the
CM1K chip's maximum clock frequency of 27 MHz.
The waveform above also shows that all the registers in the design are initialized
on the positive edge of the clock.
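A divide-by-two clock of this kind can be sketched with a single toggling register. The port names below follow those in Appendix A; the module name and the asynchronous active-low reset behavior are assumptions made for this illustration.

```verilog
// Divide-by-two sketch: 50 MHz in, 25 MHz out.
// Reset behavior is an assumption, not taken from the MCwCM code.
module clk_div2_sketch (
    input  wire SysClk,    // 50 MHz board clock
    input  wire SysRst_n,  // active-low reset
    output reg  cm1kClk    // 25 MHz clock for the CM1K
);
    // Toggling once per input clock halves the frequency
    always @(posedge SysClk or negedge SysRst_n) begin
        if (!SysRst_n)
            cm1kClk <= 1'b0;
        else
            cm1kClk <= ~cm1kClk;
    end
endmodule
```

At 25 MHz the clock period is 40 ns. A signal launched on the falling edge therefore has roughly 20 ns before the next rising edge (ignoring routing and output delays), which is above the five-nanosecond setup time listed in the protocol rules.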
Figure 4e: Waveform output-2
In the above waveform output, the following signals are worth noting. They show
the relationship between the signals in hardware-level communication:
fpga_top_tb/Rdy: This is the signal the CM1K generates when it senses a read or
write command. In this waveform, the signal is generated by the test bench. For
more information on how this signal is generated, please refer to the code of the
test bench.
fpga_top_tb/DataStrobe: DataStrobe is generated by the MCwCM and responds to
the ready signal. Close observation of the waveform shows that DataStrobe is
asserted when “Rdy” is high. DataStrobe indicates to the CM1K chip that the
communicating device is ready to communicate with the chip. After sensing
DataStrobe, the CM1K chip pulls the Ready signal down on the positive edge of the
clock and keeps it low as long as the command is not finished. Once the current
command is finished, the CM1K pulls “Rdy” high to indicate that it is ready for the
next command.
If there are more read or write commands for the chip, DataStrobe is asserted
again (driven high) and the system follows the above-mentioned sequence.
Importantly, DataStrobe should be stable by the positive edge of the clock so that
the CM1K chip can sense it on that edge. Therefore, this signal is generated on the
negative edge of the clock.
fpga_top_tb/rdWrEn: This signal is generated by the MCwCM design. It tells the
CM1K chip whether the communicating device wants to write to the chip or read
from it. The signal should be stable by the positive edge of the clock so that the
CM1K chip can sense it on that edge. Therefore, in this design the signal is changed
on the negative edge of the clock so that it is stable at the next positive edge. This
is an important aspect of the design and should be considered when designing with
the CM1K.
fpga_top_tb/RegBus: This is a 5-bit bus and is part of the MCwCM design. It selects
the registers of the CM1K chip. In MCwCM it is set to a predefined value, address
“0x06”, which is the register address of MinIF (minimum influence field) of the
CM1K. It can be changed to any address, and the system will then write to that
location or read from it. The value on this bus should also be stable before the
positive edge of the clock. Therefore, in MCwCM it is set on the negative edge of
the clock so that it is stable by the subsequent positive edge.
fpga_top_tb/DataBus: DataBus carries the data to be written to the CM1K chip or
read from it. Data should also be stable before the positive edge of the clock and
is therefore made available on the DataBus on the negative edge of the clock.
On close observation, one can see a blue line after every cycle (read and write);
this line represents the “Z” value, which means high impedance. It indicates that
the DataBus line has been released for other operations, such as bus snooping.
fpga_top_tb/LED: This signal connects to the LEDs on the eval board. The eval
board has eight LEDs that can be used in a design. In MCwCM, these LEDs have two
functions, as described below:
1. To display the state of reset: Whenever a reset is applied, the 4th and 5th LEDs
are lit.
2. To display the read output: When not in the reset state, the LEDs show the
data value of the DataBus after the last read transaction.
Note: Only eight LEDs are available, so they can represent a maximum value of 255.
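The two LED functions can be sketched as a simple selection between a reset pattern and the low byte of the last value read. This is an illustration under stated assumptions: `last_rd_data` and the module name are hypothetical, the "4th and 5th LEDs" are assumed to map to bits 3 and 4, and a logic 1 is assumed to light an LED, which may not match the actual board wiring.

```verilog
// LED display sketch.
// Assumptions: bits 3 and 4 are the "4th and 5th" LEDs; 1 = lit.
module led_display_sketch (
    input  wire        SysRst_n,      // active-low reset
    input  wire [15:0] last_rd_data,  // hypothetical: last value read
    output wire [7:0]  LED
);
    // In reset: fixed pattern. Otherwise: low byte of the last read data.
    assign LED = (!SysRst_n) ? 8'b0001_1000 : last_rd_data[7:0];
endmodule
```

Because only the low eight bits are shown, any read value above 255 would appear truncated on the LEDs.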
4.8 MCwCM “write” operation
The MCwCM write operation follows this sequence:
Before beginning a write cycle, MCwCM checks the following signals:
- RDY: whether it is high.
- start_op: whether it is high.
- (num_trans_completed < MAX_NUM_TRANS): whether the number of completed
transactions is less than the maximum number of transactions.
Note: “MAX_NUM_TRANS” is a parameter that determines how many times the
state machine will run. The default is 10; however, it can be set to any desired
value by updating this field in the code.
If all the above-mentioned conditions are satisfied, the MCwCM system enters
the WRITE state of the state machine. The WRITE state drives a DataStrobe signal,
the desired register address, a write command, and the data to be written onto the
respective lines. The PGA module samples these signals on the positive edge of the
clock and acknowledges them by pulling the RDY signal low. This initiates a write
cycle. The number of clock cycles needed to complete a write command depends
on the targeted register of the CM1K chip. It takes one clock cycle to write data to
MinIF (register address “0x06”).
If any of the above-mentioned conditions is not met, the MCwCM stays in the
IDLE state. It waits in the IDLE state until all the conditions are met, then enters
the WRITE state and completes a write command.
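For simulation, the RDY handshake described above can be approximated by a small behavioral model that stands in for the CM1K. The sketch below is not the project's test bench: the module name, the `clk` and `rst_n` ports and the BUSY_CYCLES parameter are hypothetical, and the real chip's busy time depends on the register being accessed.

```verilog
// Behavioral stand-in for the CM1K RDY line, for simulation only.
// BUSY_CYCLES is a hypothetical parameter: clocks for which RDY stays low.
module cm1k_rdy_model_sketch #(
    parameter BUSY_CYCLES = 1
) (
    input  wire clk,
    input  wire rst_n,       // active-low reset
    input  wire DataStrobe,
    output reg  Rdy
);
    integer busy_left;

    // Signals are sampled on the rising edge, as the neurons do
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            Rdy       <= 1'b1;         // ready after reset
            busy_left <= 0;
        end else if (Rdy && DataStrobe) begin
            Rdy       <= 1'b0;         // command accepted: go busy
            busy_left <= BUSY_CYCLES;
        end else if (!Rdy) begin
            if (busy_left <= 1)
                Rdy <= 1'b1;           // command finished: release RDY
            else
                busy_left <= busy_left - 1;
        end
    end
endmodule
```

With BUSY_CYCLES set to 1, the model holds RDY low for one clock period, which corresponds to the one-cycle MinIF access described above.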
4.9 MCwCM “read” operation
The MCwCM read operation follows this sequence:
Before beginning a read cycle, MCwCM checks the following signals:
- RDY: whether it is high.
- start_op: whether it is high.
- (num_trans_completed < MAX_NUM_TRANS): whether the number of completed
transactions is less than the maximum number of transactions.
Note: “MAX_NUM_TRANS” is a parameter that determines how many times the
state machine will run. The default is 10; however, it can be set to any desired
value by updating this field in the code.
If all the above-mentioned conditions are satisfied, the MCwCM system enters
the READ state of the state machine. The READ state drives a DataStrobe signal,
the address of the register that the system intends to read from, and a read
command. The PGA module samples these signals on the positive edge of the clock
and acknowledges them by pulling the RDY signal low. This initiates a read cycle.
The number of clock cycles needed to complete a read command depends on the
targeted register of the CM1K chip. It takes one clock cycle to read data from MinIF
(register address “0x06”).
If any of the above-mentioned conditions is not met, the MCwCM stays in the
BUS_TURNAROUND state. It waits in the BUS_TURNAROUND state until all the
conditions are met, then enters the READ state and completes a read command.
Chapter 6
CONCLUSION
This project has shown how to create a hardware-level model for communication
with CM1K technology. This technology has many advantages over the contemporary
von Neumann model of computing. When using the CM1K, attention should be paid to
both the hardware and the software levels of communication. Currently, several
software options are available on the market for interacting with the CM1K chip.
MCwCM is the first hardware model for communication with this technology.
The milestones achieved in this project are:
1. Studied the CM1K communication protocol and implemented it in Verilog.
2. Implemented the MCwCM model in an FPGA.
3. Built a communication system with an FPGA and a PGA module to demonstrate
hardware-level communication.
The MCwCM is currently being used at Cognimem Technologies Inc. for testing
the functionality of newly fabricated PGA69 modules.
In the future, this project could be extended to implement the entire functionality
of the CM1K chip, creating a standalone system with all the features of CM1K
technology.
APPENDIX A
MCwCM design code in Verilog
module fpga_top (
// From Board to FPGA pins
input wire SysClk,
input wire SysRst_n,
// From FPGA to CM1K chip
output reg cm1kClk,//output wire cm1kClk,
output wire cm1kRst_n,
output wire dci,
output wire G_STDBY,
output wire s_chip,
output wire reco_en,
output wire i2c_en,
//visual output for read and write
output wire [7:0]LED,
// From Board to FPGA like push Button
input wire start_op,
// From CM1K chip to FPGA
input wire Rdy,
inout wire UNC, output wire tempUNC, // input wire UNC,
input wire ID, output wire tempID,
output wire DataStrobe,
output wire rdWrEn,
inout wire [4:0] RegBus, // output wire [4:0] RegBus,
inout wire [15:0] DataBus
);
// Parameters
parameter IDLE = 2'b00;
parameter WRITE = 2'b01;
parameter BUS_TURNAROUND = 2'b10;
parameter READ = 2'b11;
parameter MAX_NUM_TRANS = 10;
parameter BASE_ADDR = 5'b00110; //5'h06
parameter DEFAULT_WR_DATA =16'h01; //16'h5A5A;
parameter DEFAULT_DATA = 16'hFFFF;
parameter DEFAULT_UNC = 1'bz;
// Local registers and wires
reg [1:0] clk_divider_cntr;
reg [3:0] num_trans_completed;
reg DataStrobe_d1;
// Internal registers for Control State Machine (csm)
reg [15:0] csm_data;
reg [4:0] csm_reg;
//reg csm_UNC;
reg csm_ds;
reg csm_rdWrEn;
reg [1:0] curState;
reg [1:0] nxtState;
// Latch readData to display on LED.
wire [15:0] readData;
//wire [7:0] LED;
// **********************
// Clock Generator logic
// **********************
always @ (posedge SysClk) begin
if (!SysRst_n) begin
clk_divider_cntr <= 0;
end
else if (clk_divider_cntr == 3) begin
clk_divider_cntr <= 0;
end
else begin
clk_divider_cntr <= clk_divider_cntr + 1;
end
end
//assign cm1kClk = (clk_divider_cntr > 1) ? 1 : 0;
always @ (posedge SysClk) begin
if (!SysRst_n) begin
cm1kClk <= 0;
end
if ((SysRst_n) &&( clk_divider_cntr <=1 ) ) begin
cm1kClk <= 0;
end
else if ((SysRst_n) &&( clk_divider_cntr >=2)) begin
cm1kClk <= 1;
end
end
// **********************
// Standby signal logic
// **********************
assign G_STDBY= 0;
// ******************************
// Temporary "UNC" and "ID" logic
// ******************************
//wire tempUNC;
//wire tempID;
assign tempUNC = UNC; // assign UNC = cm1kRst_n ? UNC : 1'bZ;
assign tempID = ID;
// **********************
// "s_chip" logic
// **********************
assign s_chip = 0;
assign reco_en= 0;
assign i2c_en = 0;
assign DataStrobe = csm_ds;
assign rdWrEn = csm_rdWrEn;
// **********************
// "dci" logic
// **********************
assign dci = 1;
// **********************
// Reset logic
// **********************
assign cm1kRst_n = SysRst_n;
// ***************************
// Control logic for DataBus
// ***************************
assign DataBus = (~rdWrEn & DataStrobe) ? csm_data : 16'hzzzz;
// ***************************
// Control logic for RegBus
// ***************************
//assign RegBus = csm_reg; // to make regBus as an output only.
assign RegBus = cm1kRst_n ? csm_reg : 5'hzz;
// ***************************
// FPGA Control state machine
// ***************************
always @ (negedge cm1kClk or negedge cm1kRst_n) begin
if (!cm1kRst_n) begin
curState <= IDLE;
end
else begin
curState <= nxtState;
end
end
always @ (*) begin
case(curState)
IDLE:
begin
if ((start_op == 1) && (num_trans_completed < MAX_NUM_TRANS)&& (Rdy))
begin
nxtState = WRITE;
end
else begin
nxtState = IDLE;
end
end
WRITE:
begin
if (Rdy & start_op) begin
nxtState = WRITE;
end
else begin
nxtState = BUS_TURNAROUND;
end
end
BUS_TURNAROUND:
begin
if (Rdy & start_op) begin
nxtState = READ;
end
else begin
nxtState = BUS_TURNAROUND;
end
end
READ:
begin
if (Rdy & start_op) begin
nxtState = READ;
end
else begin
nxtState = IDLE;
end
end
endcase
end
always @ (curState) begin
case(curState)
IDLE:
begin
csm_ds = 0;
csm_rdWrEn = 1;
csm_reg = 5'h06;//5'h00;
csm_data = DEFAULT_DATA;
//csm_UNC = DEFAULT_UNC;
end
WRITE:
begin
csm_ds = 1;
csm_rdWrEn = 0;
csm_reg = BASE_ADDR; // + num_trans_completed;
csm_data = DEFAULT_WR_DATA + num_trans_completed;
// csm_UNC = DEFAULT_UNC;
end
BUS_TURNAROUND:
begin
csm_ds = 0;
csm_rdWrEn = 1;
csm_reg = 5'h06;//5'h00;
csm_data = DEFAULT_DATA;
// csm_UNC = DEFAULT_UNC;
end
READ:
begin
csm_ds = 1;
csm_rdWrEn = 1;
csm_reg = BASE_ADDR; //+ num_trans_completed;
csm_data = DEFAULT_DATA;
// csm_UNC = UNC;
end
endcase
end
// ******************************************************
// Logic to count number of wr/rd transactions completed
// ******************************************************
always @ (negedge cm1kClk or negedge cm1kRst_n) begin
if (!cm1kRst_n) begin
num_trans_completed <= 0;
end
else if (curState == READ) begin
num_trans_completed <= num_trans_completed + 1; // ";" removed from original
end
end
// Logic to latch read data
always @ (negedge cm1kClk or negedge cm1kRst_n) begin
if (!cm1kRst_n)
DataStrobe_d1 <= 0;
else
DataStrobe_d1 <= DataStrobe & rdWrEn;
end
assign readData = DataStrobe_d1 ? DataBus : readData; //16'hzzzz;
//assign readData = DataStrobe_d1 ? DataBus : 16'hzzzz;
// LED Logic
// For write data, drive 8'h55 and for read, drive 8'hAA
//assign LED = readData[15:8];
assign LED = cm1kRst_n ? ((num_trans_completed == MAX_NUM_TRANS) ? (~readData) : 8'b11111111) : 8'b11100111;
//assign LED = ((DataBus == csm_data) ? 8'h55 : ((DataBus == readData) ? readData : 8'hzz));
//assign LED = ((DataBus == csm_data) ? 8'h55 : ((DataBus == readData) ? 8'hAA : 8'hzz));
endmodule
APPENDIX B
MCwCM test-bench code in Verilog
`timescale 1ns/100ps
//`include "fpga_top.v"
module fpga_top_tb ();
reg SysClk;
reg SysRst_n;
wire dci;
wire cm1kClk;
wire cm1kRst_n;
wire s_chip;
wire G_STDBY;
reg start_op;
reg Rdy;
wire UNC; reg UNC_tb;
reg ID; wire tempID;
wire DataStrobe;
wire rdWrEn;
wire [4:0] RegBus;
wire [15:0] DataBus;
wire [7:0]LED;
reg [15:0] write_data;
reg DataStrobe_d1;
wire ledClk_tb; //assign ledClk_tb = fpga_top_inst.ledClk;
defparam fpga_top_inst.MAX_NUM_TRANS = 10;
fpga_top fpga_top_inst (
.SysClk(SysClk),
.SysRst_n(SysRst_n),
.cm1kClk(cm1kClk),
.cm1kRst_n(cm1kRst_n),
.start_op(start_op),
.LED(LED) , // added another output
.Rdy(Rdy),
.UNC(UNC),
// .tempUNC(tempUNC),
.ID(ID),
.tempID(tempID),
.DataStrobe(DataStrobe),
.rdWrEn(rdWrEn),
.RegBus(RegBus),
.DataBus(DataBus),
.dci(dci),
.s_chip(s_chip),
.G_STDBY(G_STDBY)
// .ledClk(ledClk_tb)
);
initial begin
SysClk = 0;
SysRst_n = 1;
Rdy = 0;
UNC_tb = 1;
ID = 1;
start_op = 0;
#5;
SysRst_n = 0;
#10;
SysRst_n = 1;
repeat (fpga_top_inst.MAX_NUM_TRANS * 2) begin
@(posedge cm1kClk);
@(posedge cm1kClk);
@(posedge cm1kClk);
start_op <= 1;
wait (rdWrEn & DataStrobe_d1 & !Rdy);
start_op <= 0;
end
end
// Dump all variables
initial begin
$dumpfile("Run.dmp");
$dumpvars();
#1200;
$stop; // $finish;
end
always
begin
#2 SysClk = ~SysClk;
end
always @ (posedge cm1kClk or negedge cm1kRst_n) begin
if (!cm1kRst_n)
Rdy <= 1;
else if (DataStrobe)
Rdy <= 0;
else
Rdy <= 1;
end
always @ (negedge cm1kClk or negedge cm1kRst_n) begin
if (!cm1kRst_n)
DataStrobe_d1 <= 0;
else
DataStrobe_d1 <= DataStrobe & rdWrEn;
end
always @ (*) begin
if (~rdWrEn & DataStrobe & !Rdy)
write_data = DataBus;
end
assign DataBus = (rdWrEn & DataStrobe_d1) ? write_data : 16'hzzzz;
//assign UNC = rdWrEn ? UNC_tb : 1'bz;
assign UNC = UNC_tb;
endmodule
REFERENCES
[1]
Cognimem Technologies Inc. webpage, accessed on October 1, 2012
www.Cognimem.com
[2]
CM1K Hardware User’s Manual, Version 2.4.0, Cognimem, Inc. Revised
07/10/2012
http://cognimem.com/_docs/TechnicalManuals/TM_CM1K_Hardware_Manu
al.pdf
[3]
Palnitkar, S. and Goel, P., Verilog HDL: A Guide to Digital Design and
Synthesis, Prentice Hall, 1996.
[4]
Bors, A. (2001), Introduction of the Radial Basis Function (RBF) Networks,
Online Symposium for Electronics Engineers (OSEE 2001).
http://www-users.cs.york.ac.uk/adrian/Papers/Others/OSEE01.pdf
[5]
Cognimem, from Technology to ASIC (Tech. Brief), Cognimem, Inc.
(accessed 9/2012).
http://www.cognimem.com/_docs/TechnicalBriefs/TB_from_Techno_to_ASIC.pdf
[6]
CM1K introduction to CM1K presentation, Cognimem, Inc. Revised on
07/10/2012.
http://www.cognimem.com/_docs/Presentations/PR_CM1K_introduction.pdf
[7]
CM1K Reference Guide, Version 2.2.1, Cognimem, Inc. Revised 07/10/2012.
http://www.cognimem.com/_docs/TechnicalManuals/TM_CogniMem_Technology_Reference_Guide.pdf
[8]
Weichang Chen; Ziqiang Wang; Zhihua Chen; Zhiyi Chen; , "Working
Mechanism of Brain Neural Network," Neural Networks and Brain, 2005.
ICNN&B '05. International Conference on , vol.3, no., pp.nil5-1309, 13-15
Oct. 2005
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1614872
[9]
Kanold, PJ 2003, 'Role of Subplate Neurons in Fuctional Maturation of
Visual Cortical Columns', Science, 301, 5632, p. 521, MAS Ultra - School
Edition, EBSCOhost, viewed 17 September 2012.
65
http://www.ces.clemson.edu/bio/documents/Publications/Kara%20%20Science%202003.pdf
[10] George, D.; , "How to make computers that work like the brain," Design
Automation Conference, 2009. DAC '09. 46th ACM/IEEE , vol., no., pp.420423, 26-31 July 2009
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5227027&isnumb
er=5227020
[11] Menendez, Anne; Paillet, Guy;, “Fish Inspection System Using a Parallel
Neural Network Chip and the Image Knowledge Builder Application,” AI
Magazine Vol.29(1):21–28
http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/2084
[12] HDL Synthesis Coding Guidelines for Lattice Semiconductor FPGAs ,
Version 2.0 , Lattice Semiconductor™, Revised, September 2012.
http://www.latticesemi.com/lit/docs/technotes/tn1008.pdf?jsessionid=f03063
791bdddc45308a24453f6966594233
[13] Brown, Stephen; Rose, Jonathan; “Architecture of FPGAs and CPLDs: A
Tutorial”, Department of Electrical and Computer Engineering, University of
Toronto, accessed on 12 October 2012.
http://www.eecg.toronto.edu/~jayar/pubs/brown/survey.pdf
[14] CM1K-PGA69 Module Hardware Specification, Cognimem, Inc. Viewed on
12 October 2012.
http://www.cognimem.com/_docs/Technical-Manuals/CTICM1KPGA69%20%20Hardware%20Specification%20final.pdf
[15] Lattice Semiconductor, product page, Xp2 Brevia Development kit
description.
http://www.latticesemi.com/products/developmenthardware/developmentkits/
xp2breviadevelopmentkit.cfm
[16] Parnell, Karen; Mehta, Nick., alnitkar, S. and Goel, P., Programmable Logic
DesignQuick Start Hand Book, Xilinx®, January 2002, Viewed on 14
October 2012.
http://www.ee.ic.ac.uk/pcheung/teaching/ee3_dsd/beginners_bk_4x.pdf
[17] Lattice Semiconductor, Low –cost, 3rd generation, Non-Volatile FPGA,;
http://www.latticesemi.com/documents/33797.pdf
66
[18] Lattice Semiconductor, FlexiFlash Architecture,
http://www.latticesemi.com/products/fpga/xp2/flexiflasharchitecture.cfm
[19] Lattice Semiconductor, LatticeXp2 Brevia User Guide,
http://www.latticesemi.com/documents/doc43735x37.pdf
[20] Lattice Semiconductor, XP2 family data sheet handbook,
http://www.latticesemi.com/documents/HB1004.pdf
[21] Lattice Diamond Installation guide
http://www.latticesemi.com/documents/diamond_20_install_pc.pdf
[22] Lattice Diamond 2.0 Tutorial
http://www.latticesemi.com/documents/latticediamondtutorial20.pdf
[23]
"IEEE Standard VHDL Language Reference Manual.," ANSI/IEEE
Std 1076-1993 , vol., no., pp.i, 1994
doi: 10.1109/IEEESTD.1994.121433
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=392
561&isnumber=8908
[24] "IEEE Standard for Verilog Hardware Description Language," IEEE Std
1364-2005 (Revision of IEEE Std 1364-2001) , vol., no., pp.0_1-560, 2006
doi: 10.1109/IEEESTD.2006.99495
URL:http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1620780&is
number=33945
[25] Jie-Hong (Roland) Jiang, Srinivas Devadas (2009). "Logic synthesis in a
nutshell". In Laung-Terng Wang, Yao-Wen Chang, Kwang-Ting Cheng.
Electronic design automation: synthesis, verification, and test. Morgan
Kaufmann. ISBN 978-0-12-374364-0. Chapter 6.
[26] M. Nomura et. al., “Timing Verification System Bases on Delay Time
Hierarchical Nature,” 19th Design Automation Conf., pp. 622-628. 1982.
[27] Functional verification article on EETimes.
http://www.eetimes.com/design/eda-design/4004785/Leveraging-systemmodels-for-RTL-functional-verification
67
[28] Synopsys Floor Planning tool
http://www.synopsys.com/tools/implementation/signoff/capsulemodule/desig
n_plan_wp.pdf
[29] Porting designs form Xilinx , Altera in to a Lattice semiconductor
http://www.latticesemi.com/lit/docs/manuals/fpga_design_guide.pdf
[30] Mentor Graphics –ModelSim product page
http://www.mentor.com/products/fpga/simulation/modelsim
[31] CM1K data sheet, Cognimem, Inc. Accessed on November 4, 2012.
http://www.cognimem.com/_docs/Datasheet/DS_CM1K.pdf
[32] Online Oxford dictionary, accessed on Nov 1, 2012.
http://oxforddictionaries.com/definition/american_english/protocol?region=us
&q=protocol