Design and Implementation of a DMA Controller for Digital Signal Processor

Department of Electrical Engineering (Institutionen för systemteknik)
Master's thesis
Design and Implementation of a DMA Controller
for Digital Signal Processor
Master's thesis in Computer Engineering
at Linköping Institute of Technology
by
Guoyou Jiang
LiTH-ISY-EX--10/4244--SE
Linköping 2010
Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping, Sweden
Supervisor: Dake Liu, ISY, Linköpings universitet
Examiner: Dake Liu, ISY, Linköpings universitet
Linköping, 12 August, 2010
Division, Department: Division of Computer Engineering, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden
Date: 2010-08-12
Language: English
Report category: Examensarbete (Master's thesis)
ISRN: LiTH-ISY-EX--10/4244--SE
URL for electronic version: http://www.da.isy.liu.se, http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-58868
Title: Design and Implementation of a DMA Controller for Digital Signal Processor
Author: Guoyou Jiang
Abstract
This thesis work was conducted in the Division of Computer Engineering at the Department of Electrical Engineering, Linköping University. During the thesis work, a configurable Direct Memory Access (DMA) controller was designed and implemented. The DMA controller runs at 200 MHz in a 65 nm digital CMOS technology, and its estimated gate count is 26595.
The DMA controller has two address generators and can provide two clock sources, so it can handle data reads and writes simultaneously. It has 16 built-in channels, and the data width can be 16, 32 or 64 bits. The DMA controller supports 2D data access through its intelligent linking table. It is designed for advanced DSP applications and is not a dedicated cache controller with a fixed priority.
Keywords: direct memory access, DMA, digital signal processing, DSP, linking table, processor, peripherals, scalability, testbench, verification
Acknowledgments
This thesis is the result of master's thesis work carried out at Linköping University from spring 2009 to spring 2010.
First of all, I would like to thank my supervisor and examiner, Professor Dake Liu, who gave me the great opportunity to do this final-year project. The thesis could not have been completed without his experience and support.
Second, I would like to express my gratitude to the Ph.D. students in the Division of Computer Engineering, whose experience in digital signal processor design helped me a lot: Jian Wang, who helped me with some key issues in the design of the behavior model; Di Wu, who introduced me to this topic; and Olof Kraigher, who helped me solve some programming problems in the C++ model. I also want to thank He Zhang, who discussed some example applications of the design with me.
I also want to thank Thomas Österholm, who helped me integrate my design into the complete DSP system, and Andreas Ehliar and Johan Eilert, who gave me a lot of advice while I implemented my design as an ASIC.
Last but not least, I want to express my appreciation to my parents in my hometown Shanghai, whose love and support have been unlimited throughout my entire academic career far away from home.
Contents
1 Introduction
  1.1 Scope
  1.2 Method
  1.3 Thesis Overview
  1.4 Notations
  1.5 Abbreviations
2 Background
  2.1 DMA Basics
  2.2 DMA Operations
    2.2.1 Normal DMA Operation
    2.2.2 Chain Operation
    2.2.3 Linking Table Operation
3 Application Requirements
  3.1 Application Analysis
  3.2 Requirement Specification
4 Interfaces
  4.1 Host Interface
    4.1.1 Main Status Register
    4.1.2 Main Control Register
    4.1.3 Special Memory Control Register
  4.2 Memory Interface
  4.3 Behavior Model of I/O
  4.4 Task Packet Specification
5 DMA Hardware
  5.1 Host Interface
    5.1.1 Block Diagram
    5.1.2 Interface
  5.2 Source Address Generator
    5.2.1 Block Diagram
    5.2.2 Interface
  5.3 Destination Address Generator
    5.3.1 Block Diagram
    5.3.2 Interface
  5.4 Source Decoder
    5.4.1 Block Diagram
    5.4.2 Interface
  5.5 Destination Decoder
    5.5.1 Block Diagram
    5.5.2 Interface
  5.6 Transaction FSM
    5.6.1 Interface
6 Integration
  6.1 Hardware Integration
  6.2 Software Integration
  6.3 DMA Programming
    6.3.1 Initialize the DMA Controller
    6.3.2 Poll the DMA Controller
    6.3.3 Handle the DMA Interrupt
7 Verification
  7.1 Functional Verification
  7.2 Hardware Implementation
8 Conclusion
  8.1 Achieved Results
    8.1.1 DMA Benchmark
    8.1.2 Comparison
    8.1.3 Conclusion
  8.2 Future Work
Bibliography
A DMA Simulator C++ Header
B DMA Simulator C++ Code
List of Figures
1.1 DIT butterfly of Radix-2 FFT
2.1 System overview
2.2 Basic DMA operation to save processor run time
2.3 DMA Chain operation example
2.4 An example of DMA linking table operation
3.1 Matrix Transposition
3.2 Transfer decomposition of Example 3.1
3.3 Transfer decomposition of Example 3.2
3.4 Neighbor Searching in Motion Estimation
3.5 Transfer decomposition of Example 3.3
4.1 DMA configuration
5.1 DMA Hardware architecture
5.2 DMA Controller Block Diagram
5.3 Block diagram of Host Interface Module
5.4 Block diagram of Source address generator
5.5 Block diagram of Destination address generator
5.6 Block diagram of Source decoder
5.7 Block diagram of Destination decoder
5.8 Finite State Machine of the control logic
7.1 DMA Functional Verification Flow
8.1 Timing diagram of basic DMA operation
8.2 Timing diagram of linking table operation
List of Tables
3.1 Preparing DMA for Motion Estimation
3.2 Requirement Specification
4.1 Host Interface
4.2 DMA Registers specification
4.3 Main status register specification
4.4 Main control register specification
4.5 Special memory control register specification
4.6 Memory Interface
4.7 Task packet specification
4.8 Control Vector 1
4.9 Control Vector 2
4.10 Control Vector 3
4.11 Control Vector 4 & 5
4.12 Control Vector 6 & 7 & 8
5.1 Interface of Host Interface Module
5.2 Interface of Source address generator
5.3 Interface of Destination address generator
5.4 Interface of Source decoder
5.5 Interface of Destination decoder
5.6 Interface of Transaction FSM
8.1 Synthesis Result of DMA controller
8.2 Results Comparison with and without DMA
Chapter 1
Introduction
As technology evolves, many new DSP applications are emerging on the horizon. The demand for rich multimedia content such as HDTV and 3D displays is huge. Behind these demands are technologies that push electronic products toward a better user experience, and one of them is digital signal processing. DSP techniques have brought improvements to traditional signal processing applications such as audio, video, radar and communications [9, p.1].
A component that performs digital signal processing is called a digital signal processor. A specially designed peripheral can help the processor access memory; this peripheral is called a DMA controller.
With the help of a DMA controller, the processor can spend more of its time on computation while data transfers are in progress. Since most memory accesses are hidden inside DSP algorithms, it is important to expose these hidden accesses [6]. A DMA controller can then help with both power consumption and performance.
For example, the DIT butterfly, which is the basic building block of the FFT algorithm, can be divided into the following steps, as shown in Figure 1.1:
1. Load two complex operands;
2. Load one complex coefficient and perform one complex multiplication;
3. Perform two complex additions;
4. Store two complex results.
This is a simple example of the memory accesses hidden in basic DSP algorithms. A more detailed discussion is presented in Chapter 3.
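To make the hidden memory traffic in the four steps above explicit, the following C++ sketch performs one in-place radix-2 DIT butterfly and counts its data-memory loads and stores. The names `dit_butterfly` and `AccessCount` are hypothetical illustrations, not part of the Senior toolchain or the thesis's simulator.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Counts the data-memory accesses performed by the butterfly.
struct AccessCount {
    std::size_t loads = 0;
    std::size_t stores = 0;
};

// In-place radix-2 DIT butterfly on data[i] and data[j] with coefficient w[k]:
//   X = a + W*b,  Y = a - W*b
void dit_butterfly(std::vector<Complex>& data, const std::vector<Complex>& w,
                   std::size_t i, std::size_t j, std::size_t k, AccessCount& count) {
    assert(i < data.size() && j < data.size() && k < w.size());
    const Complex a = data[i];   // step 1: load two complex operands
    const Complex b = data[j];
    count.loads += 2;
    const Complex wk = w[k];     // step 2: load one complex coefficient...
    count.loads += 1;
    const Complex t = wk * b;    // ...and perform one complex multiplication
    const Complex x = a + t;     // step 3: two complex additions
    const Complex y = a - t;     //         (the second one is a subtraction)
    data[i] = x;                 // step 4: store two complex results
    data[j] = y;
    count.stores += 2;
}
```

Each butterfly makes three complex loads and two complex stores but only one multiplication and two additions. Data movement is therefore a significant share of the work, and a DMA controller can help keep the operands in place.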
Figure 1.1. DIT butterfly of Radix-2 FFT
1.1 Scope
The scope of this thesis work is to design and implement a DMA peripheral for Senior, a DSP processor developed at the Division of Computer Engineering at Linköping University.
The interface between the DMA controller and the DSP core had already been implemented in another project [7, p.53], so the design work started from the definition of the DMA specification.
Many DSP applications benefit from a technique called a linking table, which accelerates the processing of two-dimensional arrays [6, p.584]. The linking table is therefore supported in the current DMA design.
To make sure the design is correct, a test bench was also developed to verify the functionality of the designed modules. Since the DMA has to work with the Senior DSP, the test bench was written on the basis of the Senior test bench.
1.2 Method
The DMA specification should be derived from the requirements of the target applications. Since the DMA is designed to meet the needs of the Senior DSP, a behavioral model of the DMA module should also be added to the existing Senior instruction set simulator. Developing the behavioral model is important because it can be used not only to obtain performance benchmarks for the hardware, but also as a reference to compare against the actual hardware during verification.
Once the behavioral model is done, the RTL implementation translates it into an RTL language such as Verilog. After the RTL implementation is complete, the behavioral model is used as a golden reference to verify the RTL module: if both produce the same results, the RTL implementation is considered correct.
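The golden-reference check described above amounts to comparing the memory contents produced by the behavioral model with those produced by the RTL simulation. Below is a minimal C++ sketch. It assumes both runs dump memory as arrays of 32-bit words. `MemoryImage` and `first_mismatch` are hypothetical names, not the thesis's actual test bench.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Contents of one memory after a simulation run, indexed by word address.
using MemoryImage = std::vector<std::uint32_t>;

// Returns the first address at which the RTL result differs from the
// behavioral (golden) model, or std::nullopt if the images are identical.
// If one image is shorter, the first missing address is reported.
std::optional<std::size_t> first_mismatch(const MemoryImage& golden,
                                          const MemoryImage& rtl) {
    const std::size_t common = golden.size() < rtl.size() ? golden.size() : rtl.size();
    for (std::size_t addr = 0; addr < common; ++addr) {
        if (golden[addr] != rtl[addr]) {
            return addr;
        }
    }
    if (golden.size() != rtl.size()) {
        return common;
    }
    return std::nullopt;
}
```

Reporting the first differing address, rather than only a pass/fail flag, makes it easier to trace a mismatch back to the transfer that caused it.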
1.3 Thesis Overview
Chapter 1 gives a brief introduction so that the reader knows what this thesis is about. Chapter 2 then discusses background knowledge and the basic operations of DMA.
In Chapter 3, several applications are analyzed first, and the requirement specification is then derived from this analysis.
Since the designed DMA controller works together with the host Senior DSP, Chapter 4 describes the interfaces and registers of the DMA controller along with the DMA task, so that Senior users can see how the DMA works with the Senior DSP.
After the requirement specification and the host interface have been discussed, Chapter 5 describes the detailed hardware architecture of the designed DMA controller, including the micro-architecture of each block.
Once the DMA controller hardware is completed, it must be integrated into the Senior system. Chapter 6 discusses this integration from both the hardware and the software perspective, and also covers the DMA controller behavioral model.
Chapter 7 discusses the verification of the implemented hardware.
Chapter 8 summarizes the results obtained, together with the conclusions and future work.
1.4 Notations
To make the thesis easier to follow, readers should keep the following notations in mind.
A $ or 0x prefix means that the number is hexadecimal. A number without a prefix is decimal. For example, "0x64" means the decimal value 100, while "64" means the decimal value 64.
When discussing specific bits of a word, Verilog syntax is used as far as possible. Three zeros followed by three ones is written as 6'b000111, where 6 denotes the total number of bits and b indicates a binary value. status[10:5] denotes bits 10 down to 5 of register status, and bit 3 alone is written as status[3].
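The bit notation above corresponds directly to shift-and-mask operations. The small C++ sketch below illustrates this. `bit_slice` and `bit` are hypothetical helper names, not functions from the thesis code.

```cpp
#include <cassert>
#include <cstdint>

// Extracts bits [hi:lo] of a register value, like Verilog's status[hi:lo].
// The result is right-aligned, so status[10:5] yields a 6-bit value.
std::uint32_t bit_slice(std::uint32_t value, unsigned hi, unsigned lo) {
    assert(hi >= lo && hi < 32);
    const unsigned width = hi - lo + 1;
    // Avoid the undefined shift 1u << 32 when all 32 bits are selected.
    const std::uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lo) & mask;
}

// A single bit, like Verilog's status[3].
bool bit(std::uint32_t value, unsigned index) {
    return bit_slice(value, index, index) != 0;
}
```

For example, `bit_slice(status, 10, 5)` corresponds to status[10:5], and the Verilog literal 6'b000111 is simply the value 7.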
1.5 Abbreviations
3D     3-Dimensional
AGU    Address Generation Unit
ASIC   Application Specific Integrated Circuit
ASIP   Application Specific Instruction set Processor
DCT    Discrete Cosine Transform
DDR    Double Data Rate
DIT    Decimation In Time
DM     Data Memory
DMA    Direct Memory Access
DRAM   Dynamic Random Access Memory
DSP    Digital Signal Processor
FFT    Fast Fourier Transform
FIFO   First In First Out
FPGA   Field Programmable Gate Array
FSM    Finite State Machine
GIO    General I/O
HDTV   High Definition Television
I/O    Input/Output
ISR    Interrupt Service Routine
IP     Intellectual Property
JPEG   Joint Photographic Experts Group
LSB    Least Significant Bit
MB     Macro Block
MP3    MPEG 1 Layer 3
MSB    Most Significant Bit
MUX    Multiplexer
PC     Program Counter
PM     Program Memory
RTL    Register Transfer Level
SDR    Single Data Rate
Chapter 2
Background
With pipelining, the processor core can execute one operation per cycle, whether it is a calculation, a data load or a data store. In reality, however, optimal application performance is only possible if the processor core does not have to perform the data transfers itself [4, p.75]. This is where a DMA controller can be used to relieve the core of data movement.
2.1 DMA Basics
DMA stands for Direct Memory Access. It is a technique for transferring data blocks directly between memories without using the processor for data access [6, p.535] [5]. Since a DSP is designed for computationally intensive work, in most cases a separate peripheral should access the processor memories on behalf of the processor core. While this peripheral performs memory transactions, the processor can carry out other operations unrelated to those transfers.
A DMA module, or DMA controller, is by definition a peripheral module of a processor core for direct memory access. The basic work flow of a DMA transaction is as follows. When the core or another data unit wants to transfer a large amount of data, it prepares a DMA request and sends it to the DMA controller. The DMA controller then prepares and transfers the data while the core performs other operations. The core can poll the status of the DMA controller to see whether the transfer has completed. Alternatively, the DMA controller can send an interrupt to the core or other data units when the transaction is finished. The processor core can then decide whether to continue processing the data.
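The work flow above (request, background transfer, then completion by polling or interrupt) can be sketched as a small behavioral model in C++. The class `DmaModel`, its `submit`, `step` and `busy` methods and the `on_done` callback are hypothetical. The one-word-per-step timing is a simplifying assumption, not the timing of the designed hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical request describing one block transfer (word addresses).
struct DmaRequest {
    std::size_t src;
    std::size_t dst;
    std::size_t length;  // number of words
};

// Minimal behavioral model: the core submits a request, the controller moves
// one word per step(), and completion is visible both as a pollable busy flag
// and as an optional "interrupt" callback. Overlapping ranges are not handled.
class DmaModel {
public:
    explicit DmaModel(std::vector<std::uint32_t>& memory) : mem_(memory) {}

    void submit(const DmaRequest& req) {
        assert(req.src + req.length <= mem_.size());
        assert(req.dst + req.length <= mem_.size());
        req_ = req;
        done_words_ = 0;
        busy_ = req.length > 0;
    }

    // Advances the transfer by one word; the core may run other code between calls.
    void step() {
        if (!busy_) return;
        mem_[req_.dst + done_words_] = mem_[req_.src + done_words_];
        if (++done_words_ == req_.length) {
            busy_ = false;
            if (on_done) on_done();  // interrupt to the core
        }
    }

    bool busy() const { return busy_; }  // polled by the core
    std::function<void()> on_done;       // interrupt handler (optional)

private:
    std::vector<std::uint32_t>& mem_;
    DmaRequest req_{0, 0, 0};
    std::size_t done_words_ = 0;
    bool busy_ = false;
};
```

A core that prefers polling simply checks `busy()`. A core that prefers interrupts installs `on_done` and never needs to poll.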
A DMA subsystem can consist of a processor core, a DMA module and several memory modules connected to both the processor core and the DMA module.
The DMA module can perform DMA transfers between two memory interfaces, as well as between memories and high-speed I/Os. Figure 2.1 shows a typical DSP subsystem that contains a DMA module.
In this DSP subsystem, the DSP core acts as the system master, and the DMA module is a slave of the DSP core. On the other hand, the DMA module is the master of its connected memory modules, high-speed I/Os, etc. Both the DSP core and the DMA module can access the memory modules, but care must be taken because a memory cannot be accessed by both at the same time.
Figure 2.1. System overview
From the DMA controller's point of view, the master DSP core configures the data format of the transaction and requests the DMA to perform the data transfer. This configuration is called a DMA channel. It consists of the task priority, the source and destination ports of the transfer, the start addresses in both ports, the data packet size, etc.
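The channel parameters listed above can be pictured as a plain record. This C++ sketch is only an illustration, assuming word addressing and sequential access. The struct `DmaChannelConfig`, its field names and widths, and the helpers `src_end` and `dst_end` are hypothetical; the actual register layout is defined by the task packet in Chapter 4.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of one DMA channel configuration.
// Field names and widths are illustrative only.
struct DmaChannelConfig {
    std::uint8_t  priority;     // task priority
    std::uint8_t  src_port;     // source port number
    std::uint8_t  dst_port;     // destination port number
    std::uint32_t src_start;    // start address in the source port
    std::uint32_t dst_start;    // start address in the destination port
    std::uint32_t packet_size;  // number of data words to transfer
};

// Last address touched in each port, assuming word-by-word sequential access.
std::uint32_t src_end(const DmaChannelConfig& c) {
    assert(c.packet_size > 0);
    return c.src_start + c.packet_size - 1;
}

std::uint32_t dst_end(const DmaChannelConfig& c) {
    assert(c.packet_size > 0);
    return c.dst_start + c.packet_size - 1;
}
```

A firmware helper could use such end addresses to check, before starting a channel, that a transfer does not run past the end of a memory.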
2.2 DMA Operations
A DMA controller should usually support more than one kind of operation, since different DSP algorithms have quite different memory access patterns. This section illustrates several transfer options and how they operate.
2.2.1 Normal DMA Operation
This is the simplest DMA operation, in which the DMA controller performs a block copy from one location to another, either on the same interface or between different interfaces. The external software running on the processor core is responsible for limiting the access time. Figure 2.2 shows the basic DMA operation performed by the DMA controller.
Figure 2.2. Basic DMA operation to save processor run time.
As Figure 2.2 shows, the processor core initiates the DMA transaction. When the processor needs data, it prepares a DMA request that specifies the basic parameters of the transfer and sends the request through the general I/O to the DMA controller. The DMA controller then transfers the corresponding data from memory location 1 to memory location 2 according to the request. When the transfer is finished, the processor learns this either by checking the status register of the DMA controller or by receiving an interrupt, and it can then use the data provided by the DMA. Because the processor can do other work while the DMA performs the transfer, instead of transferring the data itself, run time is saved.
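The run-time saving can be illustrated with a back-of-the-envelope estimate. The C++ sketch below rests on assumptions that are not measurements of Senior: copying by the core costs one cycle per word, the DMA transfer overlaps fully with independent computation, and configuring the DMA costs a fixed number of cycles. `estimate` and `CycleEstimate` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

// Cycle estimate for the scenario in Figure 2.2, under the stated assumptions
// (one cycle per word when the core copies, full overlap with DMA, fixed setup).
struct CycleEstimate {
    std::uint64_t without_dma;
    std::uint64_t with_dma;
};

CycleEstimate estimate(std::uint64_t words, std::uint64_t compute_cycles,
                       std::uint64_t dma_setup_cycles) {
    CycleEstimate e{};
    e.without_dma = words + compute_cycles;                          // copy, then compute
    e.with_dma = dma_setup_cycles + std::max(words, compute_cycles); // transfer overlaps compute
    return e;
}
```

For example, `estimate(1024, 2000, 20)` gives 3024 cycles without DMA and 2020 cycles with DMA under these assumptions. Real savings depend on memory access conflicts and on how much independent work the core has.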
2.2.2 Chain Operation
In this operation, a contiguous set of elements can be transferred when a synchronous event occurs [1] [8]. The DMA controller transfers a chain of data elements that are equally spaced. Once the DMA controller receives the task, it sets up the proper parameters and transfers each element in the chain. Figure 2.3 illustrates this operation.
As Figure 2.3 shows, the data elements are separated by a fixed stride. After transferring one data element, the DMA controller can transfer the next one directly, as if the elements were chained together. This saves the extra time that would otherwise be spent configuring a channel for each element.
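The address sequence of a chain operation is determined by a start address, a fixed stride and an element count. This is a small C++ sketch under that reading. `chain_addresses` is a hypothetical name, and allowing a signed stride is an assumption, not part of the specification.

```cpp
#include <cstdint>
#include <vector>

// Address sequence of a chain operation: `count` elements, each `stride`
// words apart, starting at `start`. A stride of 1 gives a contiguous block;
// a negative stride (an assumption here) walks downwards.
std::vector<std::uint32_t> chain_addresses(std::uint32_t start, std::int32_t stride,
                                           std::uint32_t count) {
    std::vector<std::uint32_t> addrs;
    addrs.reserve(count);
    std::uint32_t addr = start;
    for (std::uint32_t i = 0; i < count; ++i) {
        addrs.push_back(addr);
        addr += static_cast<std::uint32_t>(stride);  // unsigned arithmetic wraps modulo 2^32
    }
    return addrs;
}
```

For example, `chain_addresses(0, 4, 4)` yields 0, 4, 8 and 12. A single channel configuration thus covers the whole chain.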
Figure 2.3. DMA Chain operation example.
2.2.3 Linking Table Operation
In this operation, multiple data blocks are merged into one large data block of a single DMA transaction. Some DSP algorithms require data blocks located at different places in main memory, and with the help of a linking table, multiple data blocks can be loaded sequentially by one DMA transaction. For example, in a video CODEC application it is often desirable to compare data from different reference frames [6]. A linking table concatenates several data blocks into one DMA transaction. Figure 2.4 gives an example of a linking table.
Figure 2.4. An example of DMA linking table operation.
The first data block starts at physical address 0x2000, and its length is 256 data words. As soon as the first data block has been loaded, loading of the second data block (block number 2) follows. As Figure 2.4 shows, its start address is 0x4000 and its length is 128. After data block 2 has been loaded, loading of data block 3 starts immediately. Data block 3 starts at 0x8000, and its block length is 512. When link=0 is reached at the end of data block 3, the DMA transaction is finished. Using the linking table, the transfers of three non-contiguous data blocks are merged into one single DMA transaction.
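The linking table traversal described above can be sketched in C++ by expanding a table of (start address, length) entries into the source addresses that one transaction visits. `Link` and `linked_addresses` are hypothetical names. Instead of modeling the link=0 end marker, the sketch simply stops after the last table entry.

```cpp
#include <cstdint>
#include <vector>

// One entry of a linking table: a data block given by its start address
// and its length in words. Field names are hypothetical.
struct Link {
    std::uint32_t start;
    std::uint32_t length;
};

// Expands a linking table into the sequence of source addresses visited by a
// single DMA transaction: the blocks are read one after another, and the
// transaction ends after the last entry.
std::vector<std::uint32_t> linked_addresses(const std::vector<Link>& table) {
    std::vector<std::uint32_t> addrs;
    for (const Link& link : table) {
        for (std::uint32_t i = 0; i < link.length; ++i) {
            addrs.push_back(link.start + i);
        }
    }
    return addrs;
}
```

With the blocks of Figure 2.4, that is {0x2000, 256}, {0x4000, 128} and {0x8000, 512}, the transaction visits 896 addresses. Address 0x4000 is reached at position 256 and address 0x8000 at position 384.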
In fact, the linking table operation is a more flexible form of the chain operation. Since the distance between data elements is not fixed, another parameter is needed to specify the length of each data element. Table 4.7 gives the detailed configuration of the linking table.
Chapter 3
Application Requirements
In this chapter, several application examples are described and analyzed, and a requirement specification is then proposed based on this analysis.
3.1 Application Analysis
First, let us consider several application examples.
Example 3.1: Matrix Transposition
Suppose we want to transpose a 4 × 4 matrix.
Figure 3.1. Matrix Transposition (a 4 × 4 matrix whose elements 0x0 to 0xF are stored at addresses 0 to 15)
The matrix may be stored consecutively in memory, as shown in Figure 3.1. To transpose the matrix, we can simply move each data element from its original address to its desired position. The transposition can therefore be expressed with the chain operation discussed in Section 2.2.
Figure 3.2. Transfer decomposition of Example 3.1
The data transfer is represented in Figure 3.2, which splits the whole transfer into four chained transfers. In this example, the source address is discrete with a stride of 4 data words, while the destination address is continuous. The example is simple because the matrix is small. In more complicated applications the matrix can be very large, but the basic principle still holds.
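The decomposition in Figure 3.2 can be reproduced in C++ as n chained transfers. Transfer k reads column k of a row-major matrix with a source stride of n, then writes the elements to consecutive destination addresses. `transpose_by_chains` is a hypothetical name, and the loop only models the address pattern, not DMA timing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transposes an n x n row-major matrix using n chained transfers:
// transfer k reads column k (source stride n) and writes it as row k
// of the destination (continuous destination addresses).
template <typename T>
std::vector<T> transpose_by_chains(const std::vector<T>& src, std::size_t n) {
    assert(src.size() == n * n);
    std::vector<T> dst(n * n);
    for (std::size_t k = 0; k < n; ++k) {  // one chained transfer per column
        std::size_t s = k;                 // source start address
        std::size_t d = k * n;             // destination start address
        for (std::size_t e = 0; e < n; ++e) {
            dst[d] = src[s];
            s += n;                        // source stride: n words
            d += 1;                        // destination: continuous
        }
    }
    return dst;
}
```

For the 4 × 4 example, the first chained transfer moves the elements at addresses 0, 4, 8 and 12 to destination addresses 0 to 3. The same four-parameter pattern scales to large matrices.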
Example 3.2: Create a Large Matrix
Suppose we want to create a large matrix with 4096 elements in which every element has the same value, 0 or 1. This case is quite common in matrix manipulation in both communication algorithms and video processing algorithms.
Such a matrix can be created by writing zeros or ones to a series of consecutive addresses. However, doing this with the core wastes many precious core cycles and leaves the core unable to do more useful tasks.
In this case, we can instead use the DMA controller to create the matrix. Suppose the matrix is to be created in DM1. The core first writes one element of the matrix to DM0, and the DMA controller then transfers this same content to every position of the matrix in DM1.
As Figure 3.3 shows, the transfer is quite simple: the source address is fixed, while the destination address is continuous. The amount of data to be transferred equals the size of the matrix.
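The transfer in Figure 3.3 is a chain transfer with a source stride of 0. This is a minimal C++ sketch. It models DM0 and DM1 as plain vectors and assumes 16-bit data words. `dma_fill` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Example 3.2 as a transfer with a fixed source address (stride 0) and a
// continuous destination: one word written by the core in DM0 is copied
// `count` times into DM1. 16-bit words are an assumption of this sketch.
void dma_fill(const std::vector<std::uint16_t>& dm0, std::size_t src_addr,
              std::vector<std::uint16_t>& dm1, std::size_t dst_start,
              std::size_t count) {
    assert(src_addr < dm0.size());
    assert(dst_start + count <= dm1.size());
    for (std::size_t i = 0; i < count; ++i) {
        dm1[dst_start + i] = dm0[src_addr];  // the source address never advances
    }
}
```

The core spends one write on the seed element. The remaining 4096 writes are performed by the DMA controller.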
Figure 3.3. Transfer decomposition of Example 3.2
Let us now look at a more complicated and realistic example based on motion estimation algorithms [6, p.585].
Example 3.3: Motion Estimation
In the motion estimation algorithm, each macro block (usually 16 × 16 = 256 pixels) in the current frame is compared with the neighboring area of the reference frame.
Figure 3.4. Neighbor Searching in Motion Estimation
Suppose we divide the picture into 8 × 8 = 64 macro blocks, each containing 256 pixels, and we want to estimate the motion vector of macro block 27 in the current frame. According to the algorithm, we need to search the neighboring macro blocks in the reference frame, so macro blocks 18, 19, 20, 26, 28, 34, 35 and 36 in the reference frame are compared. The data memory of the processor core is usually not large enough to hold the whole picture, so the desired data must first be transferred from main memory to the data memory of the processor core. The processor can then run the algorithm on the data.
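Suppose the current frame starts at address 32768 in main memory and the reference frame starts at 32768 + (8 × 8) × (16 × 16) = 49152. Since the macro blocks are numbered from 1 and each block occupies 256 consecutive addresses, block n of a frame starts at the frame address plus (n - 1) × 256. The data blocks to be transferred can thus be specified as in Table 3.1.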
Specification             | Value                       | Comment
DMA task ID               | 1                           | The identification of transaction
Task priority             | 1                           | The priority of the transaction
Number of links           | 5                           |
Source port               | Main memory                 |
Destination port          | DM0                         |
Destination start address | 0                           |
Link 1 start address      | 32768 + 26 × 256 = 39424    | Block 27 in current frame
Link 1 length             | 256                         |
Link 2 start address      | 49152 + 17 × 256 = 53504    | Block 18 in reference frame
Link 2 length             | 768                         | 3 blocks in row
Link 3 start address      | 49152 + 25 × 256 = 55552    | Block 26 in reference frame
Link 3 length             | 256                         |
Link 4 start address      | 49152 + 27 × 256 = 56064    | Block 28 in reference frame
Link 4 length             | 256                         |
Link 5 start address      | 49152 + 33 × 256 = 57600    | Block 34 in reference frame
Link 5 length             | 768                         | 3 blocks in row
Table 3.1. Preparing DMA for Motion Estimation
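The address arithmetic behind Table 3.1 can be reproduced with a short C++ sketch. It assumes, as in Example 3.3, an 8 × 8 grid of macro blocks numbered from 1 in row-major order, 256 addresses per block with each block stored contiguously, and a reference frame stored directly after the current frame. `block_address` and `motion_estimation_links` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumptions taken from Example 3.3.
constexpr std::uint32_t kBlockWords = 16 * 16;
constexpr std::uint32_t kBlocksPerRow = 8;
constexpr std::uint32_t kCurrentFrame = 32768;
constexpr std::uint32_t kReferenceFrame =
    kCurrentFrame + kBlocksPerRow * kBlocksPerRow * kBlockWords;  // 49152

struct Link {
    std::uint32_t start;
    std::uint32_t length;
};

// Start address of macro block `block` (numbered from 1) in a frame.
std::uint32_t block_address(std::uint32_t frame_base, std::uint32_t block) {
    assert(block >= 1 && block <= kBlocksPerRow * kBlocksPerRow);
    return frame_base + (block - 1) * kBlockWords;
}

// Builds the five links for macro block `mb`, which must not lie on the frame
// border: the block itself from the current frame, then its eight neighbors
// from the reference frame, with the three adjacent blocks above and below
// merged into one link each.
std::vector<Link> motion_estimation_links(std::uint32_t mb) {
    assert(mb > kBlocksPerRow && mb <= kBlocksPerRow * (kBlocksPerRow - 1));
    assert(mb % kBlocksPerRow != 0 && mb % kBlocksPerRow != 1);
    const std::uint32_t up = mb - kBlocksPerRow;
    const std::uint32_t down = mb + kBlocksPerRow;
    return {
        {block_address(kCurrentFrame, mb), kBlockWords},
        {block_address(kReferenceFrame, up - 1), 3 * kBlockWords},
        {block_address(kReferenceFrame, mb - 1), kBlockWords},
        {block_address(kReferenceFrame, mb + 1), kBlockWords},
        {block_address(kReferenceFrame, down - 1), 3 * kBlockWords},
    };
}
```

For macro block 27, this yields the five start addresses and lengths listed in Table 3.1. Merging the three adjacent blocks of a row into one link is what keeps the table at five entries instead of nine.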
Based on the data block specification in Table 3.1, the transfer can be decomposed as shown in Figure 3.5.
Figure 3.5. Transfer decomposition of Example 3.3
3.2 Requirement Specification
As described in Chapter 2, the DSP core is responsible for configuring the DMA controller, so the parameters of the memory transfer must be specified.
As configured by the DSP core, the DMA controller connects a source port and a destination port. Here, a port is either a data source supplying data or a data sink consuming data. In most cases, a port is a memory location or a data buffer. A DMA data transaction moves data from the source port to the destination port as configured by the DMA task from the master DSP core.
To design the DMA controller, the following parameters of the memory transaction must be specified.
• Number of ports supported by the DMA controller
This specifies the number of channels that can be connected by the DMA controller.
• Address Generator Unit (AGU)
The AGU provides the addresses required for memory access. At least two AGUs are needed: one provides the source address and the other provides the destination address of a data transfer.
• Data Width
Since the DMA controller should support different memory modules, the width of the data path should be configurable. We need to specify the data widths supported by the DMA module.
• Memory Organization
There are two different ways to store words in a byte-addressed memory. Storing the least significant byte at the lower address is called “little endian”, while storing it at the higher address is called “big endian” [3]. There is no inherent reason to choose one over the other, but we still need to specify the format supported during the data transfer.
• Linking Table support
As described in Chapter 2, the linking table saves the extra cost of configuring several separate data blocks by concatenating them into one transaction. On the other hand, it also costs extra hardware to keep track of several different data blocks [2]. Thus we need to specify the length of the linking table.
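The memory organization requirement above concerns byte order only. This C++ sketch shows the textbook definitions, namely how a 32-bit word is laid out in byte-addressed memory and how a word is converted between the two orders. It does not describe the DMA controller's actual conversion logic. `to_bytes`, `byte_swap32` and `byte_swap16` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Lays out a 32-bit word in four consecutive bytes (lowest address first).
// Little endian puts the least significant byte at the lowest address.
std::array<std::uint8_t, 4> to_bytes(std::uint32_t w, bool little_endian) {
    std::array<std::uint8_t, 4> b{};
    for (std::size_t i = 0; i < 4; ++i) {
        const std::size_t shift = little_endian ? 8 * i : 8 * (3 - i);
        b[i] = static_cast<std::uint8_t>(w >> shift);
    }
    return b;
}

// Reverses the byte order of a 32-bit word (big endian <-> little endian).
std::uint32_t byte_swap32(std::uint32_t w) {
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) << 8)  |
           ((w & 0x00FF0000u) >> 8)  |
           ((w & 0xFF000000u) >> 24);
}

// Reverses the byte order of a 16-bit word.
std::uint16_t byte_swap16(std::uint16_t w) {
    return static_cast<std::uint16_t>((w << 8) | (w >> 8));
}
```

For the word 0x12345678, the little-endian layout starts with byte 0x78 and the big-endian layout starts with 0x12. A DMA controller that supports both formats must be told which one each port uses.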
Table 3.2 shows the requirement specification of the DMA controller to be designed for the Senior system.
No. | Description
1   | 16 source ports: 8 on-chip memory, 1 off-chip memory, 1 high-speed I/O, others reserved.
2   | 16 destination ports: 8 on-chip memory, 1 off-chip memory, 1 high-speed I/O, others reserved.
3   | Address Generator Unit (AGU): 1 for the source port and 1 for the destination port, each with a 32-bit address space.
4   | Clock generator: supplies the clock signal for memory (I/O). Source:Destination ratios:
    | 1:1, 1:1/2, 1:1/4, 1:1/8, 1:1/16, 1:1/32, 1:1/64;
    | 1:2, 1:4, 1:8, 1:16, 1:32, 1:64.
5   | Data width:
    | Source port: 8 bits, 16 bits, 32 bits. (64 bits not implemented in Senior.)
    | Destination port: 8 bits, 16 bits, 32 bits. (64 bits not implemented in Senior.)
6   | Memory organization: the DMA controller should support both big endian and little endian data.
7   | Linking table supported; the maximum length of the linking table is 64.
Table 3.2. Requirement Specification
Chapter 4
Interfaces
The DMA module is controlled by the Senior core. When configuring it, the Senior core uses its I/O instructions in and out to read and write the registers of the DMA module.
4.1 Host Interface
The host interface of the DMA module conforms to the standard Senior I/O and should be connected through the general I/O of the Senior processor. The interface between the DSP core and the DMA module is shown in Table 4.1. The data buses to and from the DMA module are 32 bits wide, but only the 16 least significant bits are used in the current DMA configuration.
Name         Width  DIR  Description
clk_i        1      In   System clock.
rst_i        1      In   System reset, active low.
addr_i       16     In   Address input (from DSP core).
data_i       32     In   Data input (from DSP core).
rd_strobe_i  1      In   Read strobe signal.
wr_strobe_i  1      In   Write strobe signal.
data_o       32     Out  Data output (to DSP core).
Table 4.1. Host Interface
Table 4.2 gives an overview of the DMA register specification. Reference [7, p.53] gives more detailed information on how to connect a peripheral to the Senior I/O.
4.1.1 Main Status Register
The status register shows the status of DMA transactions. Firmware developers can use this register to monitor and handle DMA transactions.
Name         Addr  Width  Written by  Description
Status       00    16     DMA         Shows the status of the DMA. Further details can be found in Table 4.3.
Control      01    16     Senior      Used for configuring and controlling the DMA; details can be found in Table 4.4.
Output Data  10    16     -           Not used in the current implementation.
Input Data   11    16     Senior      The DSP core writes the task packet to this port to configure the DMA channel.
Table 4.2. DMA Registers specification
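Comparing the 2-bit register addresses in Table 4.2 with the #define values in Example 6.3 (0x05, 0x45, 0x85, 0xC5 for a DMA on I/O 5) suggests that the register address sits in bits [7:6] of the I/O address and the peripheral number in the low bits. This is an inference from those values rather than a documented rule; the C++ sketch below (dmaIoAddress is a hypothetical helper) reproduces the constants under that assumption.

```cpp
#include <cstdint>

// Hypothetical helper: combine the 2-bit DMA register address from Table 4.2
// (0 = Status, 1 = Control, 2 = Output Data, 3 = Input Data) with the GIO
// number of the DMA controller.
// Assumption: register address in bits [7:6], GIO number in bits [5:0],
// inferred from the #define values in Example 6.3.
constexpr uint8_t dmaIoAddress(uint8_t regAddr, uint8_t gioNumber) {
    return static_cast<uint8_t>(((regAddr & 0x3u) << 6) | (gioNumber & 0x3Fu));
}

// The constants of Example 6.3, with the DMA connected to GIO 5.
constexpr uint8_t DMA_STATUS   = dmaIoAddress(0x0, 5);  // 0x05
constexpr uint8_t DMA_CONTRL   = dmaIoAddress(0x1, 5);  // 0x45
constexpr uint8_t DMA_OUT_DATA = dmaIoAddress(0x2, 5);  // 0x85
constexpr uint8_t DMA_IN_DATA  = dmaIoAddress(0x3, 5);  // 0xC5
```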
Bits     Specification
[0]      Idle or busy: idle = 0, busy = 1.
[1]      When 1, a channel can be configured; when 0, no channel is available.
[2]      When 1, the running task is finished.
[3]      When 1, an exception has occurred.
[4]      When 1, the task queue is full.
[15:5]   Reserved.
Table 4.3. Main status register specification
4.1.2 Main Control Register
The control register, as the name suggests, is used to control DMA transactions.
Bits     Specification
[0]=1    Reset DMA, flush the current task.
[1]=1    Shut down DMA.
[2]=1    Data rate: always use the DMA clock.
[9:3]    Reserved.
[10]=1   Activate the task (channel) specified by the task ID.
[14:11]  DMA task ID.
[15]=1   Request a channel configuration.
Table 4.4. Main control register specification
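The control word is built by OR-ing the fields of Table 4.4. The C++ constants and helpers below are a hypothetical sketch of that encoding; they are not names used in the thesis.

```cpp
#include <cstdint>

// Hypothetical constants for the main control register (Table 4.4).
constexpr uint16_t CTRL_RESET      = 1u << 0;   // reset DMA, flush current task
constexpr uint16_t CTRL_SHUTDOWN   = 1u << 1;   // shut down DMA
constexpr uint16_t CTRL_DMA_CLOCK  = 1u << 2;   // data rate: always use DMA clock
constexpr uint16_t CTRL_ACTIVATE   = 1u << 10;  // activate task given by task ID
constexpr uint16_t CTRL_REQ_CONFIG = 1u << 15;  // ask for a channel configuration

// Place a 4-bit task ID in bits [14:11].
constexpr uint16_t ctrlTaskId(uint8_t id) {
    return static_cast<uint16_t>((id & 0xFu) << 11);
}

// Control word that activates the task with the given ID.
constexpr uint16_t ctrlActivate(uint8_t id) {
    return static_cast<uint16_t>(CTRL_ACTIVATE | ctrlTaskId(id));
}
```

The two words written in Example 6.3, 0x8000 and 0x0400, correspond to CTRL_REQ_CONFIG and ctrlActivate(0) in this encoding.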
4.1.3 Special Memory Control Register
This register does not belong to the general I/O of the Senior core. It is a special-purpose register that is written by the DMA controller and read by the Senior core. By setting the corresponding bit in this register, the DMA controller notifies the Senior core which memory it is currently accessing.
Bits     Specification
[0]=1    The DMA controller is accessing DM 0.
[1]=1    The DMA controller is accessing DM 1.
[2]=1    The DMA controller is accessing PM.
[15:3]   Reserved.
Table 4.5. Special memory control register specification
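On the core side, reading this register amounts to one bit test per memory. A minimal C++ sketch, with a hypothetical Mem enumeration and dmaAccessing helper:

```cpp
#include <cstdint>

// Memories reported in the special memory control register (Table 4.5);
// the enumerator value is the bit position.
enum class Mem : uint8_t { DM0 = 0, DM1 = 1, PM = 2 };

// True if the DMA controller reports that it is accessing memory m,
// i.e. the core should avoid that memory for now.
bool dmaAccessing(uint16_t specialReg, Mem m) {
    return ((specialReg >> static_cast<unsigned>(m)) & 1u) != 0;
}
```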
4.2 Memory Interface
The memory interface connects the DMA module to its slaves. Since the DMA module supports 16 source (in) ports and 16 destination (out) ports, 32 ports are needed in total. Table 4.6 shows the details of the memory interface of the DMA module.
Name         Width  DIR  Description
src0_data_i  32     I    Data input for Source Port 0.
src0_addr_o  16     O    Address output for Source Port 0.
src0_csn_o   1      O    Memory chip select enable for Source Port 0, active low.
src0_oe_o    1      O    Memory output enable for Source Port 0, active low.
src1                     Interfaces for Source Port 1.
...
src15                    Interfaces for Source Port 15.
dst0_data_o  32     O    Data output for Destination Port 0.
dst0_addr_o  16     O    Address output for Destination Port 0.
dst0_csn_o   1      O    Memory chip select enable for Destination Port 0, active low.
dst0_we_o    1      O    Memory write enable for Destination Port 0, active low.
dst1                     Interfaces for Destination Port 1.
...
dst15                    Interfaces for Destination Port 15.
Table 4.6. Memory Interface
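Table 5.6 shows that the transaction FSM produces a 5-bit port number and a single chip-select signal per direction, while Table 4.6 lists one active-low chip select per port. One plausible way to connect them, assumed here rather than taken from the design, is a decoder that drives only the selected port low. The hypothetical chipSelectsN function below sketches that decoding in C++.

```cpp
#include <cstdint>

// Sketch (assumption): expand a port number into the 16 active-low
// chip-select lines srcN_csn_o (or dstN_csn_o), bit N = port N.
// Only the selected port is driven low; all others stay high.
// `active` models whether a memory access is in progress at all.
uint16_t chipSelectsN(uint8_t port, bool active) {
    if (!active || port >= 16) {
        return 0xFFFFu;  // no port selected
    }
    return static_cast<uint16_t>(~(1u << port));
}
```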
4.3 Behavior model of I/O
Since a single data I/O is used both for configuring the DMA module and for writing DMA tasks, a protocol is needed to distinguish configuration from task reception. Figure 4.1 illustrates the configuration flow of the DMA module.
Figure 4.1. DMA configuration
Here, PREAMBLE denotes the first control vector sent to the control register of the DMA module. Chapter 6 shows several examples of how to program the DMA controller.
4.4 Task Packet Specification
The task packet is used to set up the DMA transfer channel, both for normal DMA operation and for linking-table multiple transactions. Since the DSP core has a general I/O with a 16-bit data width, each data word of the task packet is also 16 bits wide.
A transaction is specified by configuring a channel, which includes configuring the source, the destination and the transaction itself. In general, a basic channel configuration specifies the following:
• Task priority
• Data size: the length of the data block.
• Data from: the name of the source port.
• Data to: the name of the destination port.
• The physical start address of the source port.
• The physical start address of the destination port.
• The endian behavior of the source port: Big or Little endian.
Besides the software configuration of the DMA transaction, DMA designers and DMA users also need to know the following hardware specifications of transactions:
• The maximum source clock rate.
• The maximum destination clock rate.
• Data width of the source port: 8 bits, 16 bits, 32 bits or 64 bits.
• Data width of the destination port: 8 bits, 16 bits, 32 bits or 64 bits.
• Data protocol of the source port: error check or not.
Table 4.7 shows a task packet consisting of 2 links; Tables 4.8 to 4.12 explain each control vector. The length of the task packet depends on the total number of links in the linking table.
Word  Fields (width)
1     Number of Links (8b) | Task Priority (4b) | Task ID (4b)
2     SRC width (2b) | DST width (2b) | SRC proc (1b) | DST proc (1b) | SRC endian (1b) | DST endian (1b) | SRC rate (4b) | DST rate (4b)
3     Reserved (6b) | Source Port (5b) | Destination Port (5b)
4     Destination Address low part (16b)
5     Destination Address high part (16b)
6     Source Address 1 low part (16b)
7     Source Address 1 high part (16b)
8     Length of Link 1 (16b)
9     Source Address 2 low part (16b)
10    Source Address 2 high part (16b)
11    Length of Link 2 (16b)
...
Table 4.7. Task packet specification
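Counting the words in Table 4.7 gives the packet length directly: five common control vectors (Control Vectors 1 to 5) followed by three words per link. For a task with N links:

$$L_{\text{packet}}(N) = 5 + 3N \ \text{words}, \qquad 1 \le N \le 64.$$

The two-link packet of Table 4.7 therefore has 11 words, and a packet using the maximum of 64 links has 197 words. The preamble written before the packet in Example 6.3 is not included in this count.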
Name             Bits    Description
Number of Links  [15:8]  Specify the total number of links, up to 64.
Task Priority    [7:4]   Specify the priority of the task (not yet implemented).
Task ID          [3:0]   Specify the task ID.
Table 4.8. Control Vector 1
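Packing Control Vector 1 is a matter of shifting each field into place. A C++ sketch with a hypothetical encodeCv1 helper:

```cpp
#include <cstdint>

// Pack Control Vector 1 (Table 4.8):
// [15:8] number of links, [7:4] task priority, [3:0] task ID.
uint16_t encodeCv1(uint8_t numLinks, uint8_t priority, uint8_t taskId) {
    return static_cast<uint16_t>((numLinks << 8) |
                                 ((priority & 0xFu) << 4) |
                                 (taskId & 0xFu));
}
```

With this layout, the value $0301 written in Example 6.3 decodes as 3 links, priority 0 and task ID 1.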
Name        Bits     Description
SRC width   [15:14]  Specify the data width of the source port:
                     2'b00: 8 bits; 2'b01: 16 bits; 2'b10: 32 bits; 2'b11: 64 bits.
DST width   [13:12]  Specify the data width of the destination port:
                     2'b00: 8 bits; 2'b01: 16 bits; 2'b10: 32 bits; 2'b11: 64 bits.
SRC proc    [11]     Specify whether the source port uses parity check:
                     1'b0: not used; 1'b1: used.
DST proc    [10]     Specify whether the destination port uses parity check:
                     1'b0: not used; 1'b1: used.
SRC endian  [9]      Specify the endianness of the source port:
                     1'b0: little endian; 1'b1: big endian.
DST endian  [8]      Specify the endianness of the destination port:
                     1'b0: little endian; 1'b1: big endian.
SRC rate    [7:4]    Clock rate of the source port:
                     4'b0000: clk; 4'b0001: clk/2; 4'b0010: clk/4; 4'b0011: clk/8;
                     4'b0100: clk/16; 4'b0101: clk/32; 4'b0110: clk/64.
DST rate    [3:0]    Clock rate of the destination port:
                     4'b0000: clk; 4'b0001: clk/2; 4'b0010: clk/4; 4'b0011: clk/8;
                     4'b0100: clk/16; 4'b0101: clk/32; 4'b0110: clk/64.
Table 4.9. Control Vector 2
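The following C++ sketch packs Control Vector 2 from the fields of Table 4.9 and maps a rate code to its clock divisor. The Width enumeration and the function names are hypothetical.

```cpp
#include <cstdint>

// Width codes from Table 4.9.
enum Width : uint8_t { W8 = 0, W16 = 1, W32 = 2, W64 = 3 };

// Pack Control Vector 2 (Table 4.9).
uint16_t encodeCv2(Width srcW, Width dstW, bool srcParity, bool dstParity,
                   bool srcBigEndian, bool dstBigEndian,
                   uint8_t srcRate, uint8_t dstRate) {
    return static_cast<uint16_t>(
        (srcW << 14) | (dstW << 12) |
        (srcParity << 11) | (dstParity << 10) |
        (srcBigEndian << 9) | (dstBigEndian << 8) |
        ((srcRate & 0xFu) << 4) | (dstRate & 0xFu));
}

// Rate codes 0..6 select clk / 2^code; other codes are undefined in
// Table 4.9, and 0 is returned to mark them.
unsigned clockDivisor(uint8_t rateCode) {
    return rateCode <= 6 ? 1u << rateCode : 0u;
}
```

The value $5000 used in Example 6.3 corresponds to 16-bit source and destination widths, no parity check, little endian and rate code 0 (clk) on both sides.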
Name              Bits     Description
Reserved          [15:10]  Reserved for future use.
Source Port       [9:5]    Specify the source port number.
Destination Port  [4:0]    Specify the destination port number.
Table 4.10. Control Vector 3
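Control Vector 3 carries two 5-bit port numbers. A C++ sketch of packing and unpacking them (hypothetical helper names):

```cpp
#include <cstdint>

// Pack Control Vector 3 (Table 4.10):
// [15:10] reserved (0), [9:5] source port, [4:0] destination port.
uint16_t encodeCv3(uint8_t srcPort, uint8_t dstPort) {
    return static_cast<uint16_t>(((srcPort & 0x1Fu) << 5) | (dstPort & 0x1Fu));
}

// Extract the port numbers again.
uint8_t cv3SrcPort(uint16_t cv3) { return static_cast<uint8_t>((cv3 >> 5) & 0x1Fu); }
uint8_t cv3DstPort(uint16_t cv3) { return static_cast<uint8_t>(cv3 & 0x1Fu); }
```

The value $0064 used in Example 6.3 gives source port 3 and destination port 4.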
Name                           Bits    Description
Destination Address low part   [15:0]  Low 16-bit part of the destination address.
Destination Address high part  [15:0]  High 16-bit part of the destination address.
Table 4.11. Control Vector 4 & 5
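Because the I/O is 16 bits wide, every 32-bit address is sent as a low word followed by a high word. A C++ sketch of the split and of reassembling the address (hypothetical helper names):

```cpp
#include <cstdint>
#include <utility>

// Split a 32-bit address into (low, high) 16-bit words, as sent in
// Control Vectors 4 & 5 and for each link's source address.
std::pair<uint16_t, uint16_t> splitAddress(uint32_t addr) {
    return std::make_pair(static_cast<uint16_t>(addr & 0xFFFFu),
                          static_cast<uint16_t>(addr >> 16));
}

// Reassemble the 32-bit address from its two halves.
uint32_t joinAddress(uint16_t low, uint16_t high) {
    return (static_cast<uint32_t>(high) << 16) | low;
}
```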
Name                       Bits    Description
Source Address 1 low part  [15:0]  Specify the low 16-bit part of source address 1.
Source Address 1 high part [15:0]  Specify the high 16-bit part of source address 1.
Length of Link 1           [15:0]  Specify the length of Link 1.
Table 4.12. Control Vector 6 & 7 & 8
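Putting Tables 4.7 to 4.12 together, the C++ sketch below assembles the full sequence of 16-bit words written to DMA_IN_DATA for one task, in the order used in Example 6.3 (preamble $FFFF first). It takes already encoded Control Vectors 1 to 3; the Link type and buildTaskPacket function are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one link: 32-bit source address and length.
struct Link {
    uint32_t srcAddr;
    uint16_t length;
};

// Build the words written to DMA_IN_DATA for one task: preamble, CV1..CV3,
// destination address (low, high), then (source low, source high, length)
// for every link.
std::vector<uint16_t> buildTaskPacket(uint16_t cv1, uint16_t cv2, uint16_t cv3,
                                      uint32_t dstAddr,
                                      const std::vector<Link>& links) {
    std::vector<uint16_t> words;
    words.push_back(0xFFFF);  // preamble, as in Example 6.3
    words.push_back(cv1);
    words.push_back(cv2);
    words.push_back(cv3);
    words.push_back(static_cast<uint16_t>(dstAddr & 0xFFFFu));
    words.push_back(static_cast<uint16_t>(dstAddr >> 16));
    for (const Link& l : links) {
        words.push_back(static_cast<uint16_t>(l.srcAddr & 0xFFFFu));
        words.push_back(static_cast<uint16_t>(l.srcAddr >> 16));
        words.push_back(l.length);
    }
    return words;
}
```

Called with the values of Example 6.3 (links of 32, 16 and 64 words at source addresses $0000, $0030 and $0060, destination $0010), this yields the same 15 words that the example writes with out instructions.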
Chapter 5
DMA Hardware
Generally, the DMA controller hardware can be divided into data path and control
path [6, p.572]. Figure 5.1 shows the basic architecture of the DMA module.
Figure 5.1. DMA Hardware architecture
The DMA data path reads data from the source port using the source address generator and writes data to the destination port using the destination address generator. To handle data with different data rates and formats, source and destination decoding modules are also needed.
The DMA control path consists of the channel configuration FSM (Finite State Machine) and the transaction FSM. The DSP core can request the configuration of a channel. When the DMA is idle, the channel configuration FSM issues the channel to the transaction FSM module, which is responsible for controlling the data path. When the whole block has been transferred, the channel configuration FSM generates an interrupt to the DSP core.
The following sections give more detailed information about the sub-blocks of the DMA controller. Figure 5.2 shows the block diagram of the DMA controller with its main inputs and outputs.
Figure 5.2. DMA Controller Block Diagram
5.1 Host Interface
This is the interface between the Senior DSP core and the DMA controller. It stores the control vectors sent by the DSP core in registers inside the DMA controller and updates the status register, which can be read by the Senior DSP core.
5.1.1 Block Diagram
Figure 5.3 shows the block diagram of the Host Interface.
The input MUX selects the input I/O data based on the input I/O address. The task FIFO stores the task packet, which is later used by the transaction FSM. The output MUX selects the output data based on the I/O address.
5.1.2 Interface
Table 5.1 gives the detailed interface description of the Host Interface.
Figure 5.3. Block diagram of Host Interface Module
Name                  Width  DIR  Description
clk_i                 1      I    Clock input.
rst_i                 1      I    Synchronous reset, active low.
io_data_i             16     I    Data input from Host interface.
io_addr_i             16     I    Address input from Host interface.
io_rd_strobe_i        1      I    Read strobe from Host interface.
io_wr_strobe_i        1      I    Write strobe from Host interface.
io_data_o             16     O    Data output to Host interface. (Reserved)
config_reg_addr_i     8      I    Read address for Task queue.
config_reg_addr_en_i  1      I    Read enable signal for Task queue.
config_reg_data_o     16     O    Task queue data output.
contrl_reg_o          16     O    DMA control register, output to transaction FSM.
status_reg_i          16     I    DMA status register, input from transaction FSM.
Table 5.1. Interface of Host Interface Module
5.2 Source Address Generator
This module generates the addresses for the source port and is controlled by the transaction FSM.
5.2.1 Block Diagram
Figure 5.4 shows the block diagram of the source address generator.
Figure 5.4. Block diagram of Source address generator
Once the transaction FSM has decoded the task packet parameters into control signals, it sends these signals to the source address generator. As shown in Figure 5.4, an adder inside the source address generator produces the source port address. Two counters are also implemented to count how many words and how many links have been transferred, so that the end-of-link or end-of-transfer signal is asserted once a link or the whole transfer is finished.
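The behavior of the adder and the two counters can be sketched as a small C++ model. This is an illustration, not the RTL: the SrcAddrGen type is hypothetical, step_i is assumed to encode an increment of 2^step, and the model advances one word per call to step().

```cpp
#include <cstdint>

// Behavioral sketch of the source address generator in Figure 5.4.
// Assumptions (not from the thesis): step_i encodes an increment of 2^step,
// and one word is issued per call to step().
struct SrcAddrGen {
    uint32_t addr = 0;         // src_addr_o
    uint16_t length = 0;       // src_length_i for the current link
    uint16_t wordCount = 0;    // words transferred in the current link
    uint8_t linkNumber = 0;    // src_link_number_i
    uint8_t linkCount = 0;     // links completed so far
    bool endLink = false;      // end_link_o
    bool endTransfer = false;  // end_transfer_o

    // Start a new task with the given number of links.
    void startTask(uint8_t links) {
        linkNumber = links;
        linkCount = 0;
        endTransfer = false;
    }

    // set_addr_i: load the start address and length of the next link.
    void setAddr(uint32_t start, uint16_t len) {
        addr = start;
        length = len;
        wordCount = 0;
        endLink = false;
    }

    // One enabled cycle (enable_i = 1): the adder advances the address and
    // the word counter; the link counter advances at the end of each link.
    void step(uint8_t stepCode) {
        if (endLink || endTransfer) return;
        addr += 1u << stepCode;
        if (++wordCount >= length) {
            endLink = true;
            if (++linkCount >= linkNumber) endTransfer = true;
        }
    }
};
```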
5.2.2 Interface
Table 5.2 gives the interface details of the source address generator.
Name               Width  DIR  Description
clk_i              1      I    Clock input.
step_i             2      I    Address increment step.
enable_i           1      I    Enable address increment.
set_addr_i         1      I    Set start address.
end_link_o         1      O    Indicate the end of one link.
end_transfer_o     1      O    Indicate the end of transfer.
src_addr_i         32     I    Start address of the transfer.
src_length_i       16     I    Transfer length.
src_link_number_i  8      I    Total number of links.
src_addr_o         32     O    Source address output.
Table 5.2. Interface of Source address generator
5.3 Destination Address Generator
This module generates the addresses for the destination port; its control signals are provided by the transaction FSM.
5.3.1 Block Diagram
Figure 5.5 shows the block diagram of the destination address generator.
Figure 5.5. Block diagram of Destination address generator
This module has the same structure as the source address generator; the only difference is that it does not need the counters for transferred words and links.
5.3.2 Interface
Table 5.3 gives the detailed interface description of the destination address generator.
Name       Width  DIR  Description
clk_i      1      I    Clock input.
step_i     2      I    Address increment step.
enable_i   1      I    Enable address increment.
setaddr_i  1      I    Set start address.
addr_i     32     I    Start address of the transfer.
addr_o     32     O    Address output.
Table 5.3. Interface of Destination address generator
5.4 Source Decoder
This module decodes the incoming data based on the task packet information provided by the transaction FSM, converting the data into the internal data format that is transferred through the channel.
5.4.1 Block Diagram
Figure 5.6 shows the block diagram of the source decoder.
Figure 5.6. Block diagram of Source decoder
The source decoder consists of several MUXes that decode the incoming data based on control signals provided by the transaction FSM. First, the 64-bit input data are split into 8 bytes; then the MUXes select the right combination of data bytes to form the internal data format.
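The internal data format is not specified in detail here. As one possible illustration, assuming the internal format is a little-endian 64-bit word, the byte-select MUXes for a big-endian source would reverse the bytes inside each data word of the configured width. The C++ sketch below (decodeSource is a hypothetical name) models one MUX per output byte.

```cpp
#include <array>
#include <cstdint>

// Sketch (assumption): the internal channel format is a little-endian
// 64-bit word. The input is split into 8 bytes; for a big-endian source,
// the bytes inside each data word of the configured width are reversed.
// widthCode uses the Table 4.9 encoding: 0 = 8, 1 = 16, 2 = 32, 3 = 64 bits.
uint64_t decodeSource(uint64_t din, uint8_t widthCode, bool bigEndian) {
    std::array<uint8_t, 8> in{};
    for (int i = 0; i < 8; ++i) in[i] = static_cast<uint8_t>(din >> (8 * i));

    const int bytesPerWord = 1 << (widthCode & 0x3);
    uint64_t out = 0;
    for (int i = 0; i < 8; ++i) {
        int word = i / bytesPerWord;  // data word that output byte i belongs to
        int pos  = i % bytesPerWord;  // byte position inside that word
        int src  = bigEndian ? word * bytesPerWord + (bytesPerWord - 1 - pos) : i;
        out |= static_cast<uint64_t>(in[src]) << (8 * i);  // MUX select for byte i
    }
    return out;
}
```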
5.4.2 Interface
Table 5.4 gives the interface details of the source decoder.
Name          Width  DIR  Description
clk           1      I    Clock input.
rst           1      I    Synchronous reset, active low.
src_width     2      I    Source data width.
src_parity    1      I    Source parity check.
src_endian    1      I    Source endian.
channel_din   64     I    Data input from source port.
channel_dout  64     O    Data output to channel FIFO.
Table 5.4. Interface of Source decoder
5.5 Destination Decoder
This module packs the internal data format into the data format specified by the task packet.
5.5.1 Block Diagram
Figure 5.7 shows the block diagram of the destination decoder.
Figure 5.7. Block diagram of Destination decoder
The destination decoder has a structure similar to that of the source decoder. The output MUX combines the internal data into the desired data format based on the control signals provided by the transaction FSM.
5.5.2 Interface
Table 5.5 gives the detailed interface description of the destination decoder.
Name          Width  DIR  Description
clk           1      I    Clock input.
rst           1      I    Synchronous reset, active low.
dest_width    2      I    Destination data width.
dest_parity   1      I    Destination parity check.
dest_endian   1      I    Destination endian.
channel_din   64     I    Data input from channel FIFO.
channel_dout  64     O    Data output to destination port.
Table 5.5. Interface of Destination decoder
5.6 Transaction FSM
This FSM controls the whole transaction based on the task packet provided by the DSP core. It receives the incoming task packet and saves it into the internal registers of the DMA controller. The transaction FSM then decodes the task packet based on the specification in Table 3.2 and issues the corresponding control signals to the sub-blocks of the DMA controller to complete the DMA transaction. Figure 5.8 shows the finite state machine of the control logic.
Figure 5.8. Finite State Machine of the control logic
There are eight states in the transaction FSM of the current design. IDLE is the default state after the DMA controller is reset. Once the Senior core requests to configure the DMA controller, the FSM enters state CONFIG1 and decodes the incoming common control vectors until the first 5 common control vectors have been received. States CONFIG2_1, CONFIG2_2 and CONFIG2_3 then configure the source address and the link length of the linking table. Once the channel is configured, the FSM enters state TRANS and the DMA controller starts the data transfer. When the FSM receives the "end of link" signal, it enters state WAIT to prepare the configuration of the next transfer in the linking table, and then repeats states CONFIG2_1, CONFIG2_2 and CONFIG2_3 to configure the channel. Once the "end of transfer" signal is detected, the FSM enters state FINISH, the interrupt signal is sent to the Senior core and the status register is updated. The DMA controller then waits for the Senior core to respond, either via the status register or to the interrupt signal.
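The state sequence described above can be written as a next-state function. The C++ sketch below is a simplified model with hypothetical input names. It assumes that "end of transfer" takes priority over "end of link" and that FINISH returns to IDLE once the core has responded; neither rule is stated explicitly in the text.

```cpp
// States of the transaction FSM (Figure 5.8).
enum class State { IDLE, CONFIG1, CONFIG2_1, CONFIG2_2, CONFIG2_3, TRANS, WAIT, FINISH };

// Inputs sampled in one clock cycle (simplified, illustrative names).
struct FsmInputs {
    bool reqConfig;    // Senior core requests a channel configuration
    bool commonDone;   // the 5 common control vectors have been decoded
    bool endLink;      // end_link_i from the source address generator
    bool endTransfer;  // end_transfer_i from the source address generator
    bool coreAck;      // core has responded to the interrupt or status
};

// Next-state function. Assumptions: end of transfer has priority over
// end of link, and FINISH returns to IDLE after the core responds.
State nextState(State s, const FsmInputs& in) {
    switch (s) {
    case State::IDLE:      return in.reqConfig ? State::CONFIG1 : State::IDLE;
    case State::CONFIG1:   return in.commonDone ? State::CONFIG2_1 : State::CONFIG1;
    case State::CONFIG2_1: return State::CONFIG2_2;  // source address low part
    case State::CONFIG2_2: return State::CONFIG2_3;  // source address high part
    case State::CONFIG2_3: return State::TRANS;      // link length
    case State::TRANS:
        if (in.endTransfer) return State::FINISH;
        if (in.endLink)     return State::WAIT;
        return State::TRANS;
    case State::WAIT:      return State::CONFIG2_1;  // prepare the next link
    case State::FINISH:    return in.coreAck ? State::IDLE : State::FINISH;
    }
    return State::IDLE;  // unreachable; keeps compilers quiet
}
```

Under these assumptions, the path WAIT, CONFIG2_1, CONFIG2_2, CONFIG2_3 takes four cycles before TRANS resumes, which is consistent with the four extra cycles between links reported for Figure 8.2.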
5.6.1 Interface
Table 5.6 gives the detailed interface description of the transaction FSM.
Name               Width  DIR  Description
clk_i              1      I    Clock input.
rst_i              1      I    Synchronous reset, active low.
src_port_o         5      O    Source port number.
dst_port_o         5      O    Destination port number.
config_reg_data_i  16     I    Task packet data input.
contrl_reg_i       16     I    Control register data input.
config_reg_addr_o  8      O    Task packet read address.
config_addr_en_o   1      O    Task packet read enable.
status_reg_o       16     O    Status register data output.
src_addr_o         32     O    Start address of source port.
src_addr_en_o      1      O    Enable source port start address.
src_addr_incr      2      O    Increment step of source port.
enable_src_gen_o   1      O    Source address generator enable signal.
link_length_o      16     O    Length of current transfer link.
link_num_o         8      O    Total link number.
end_link_i         1      I    End of current link.
end_transfer_i     1      I    End of current transfer.
dst_addr_o         32     O    Start address of destination port.
dst_addr_en_o      1      O    Enable destination port start address.
dst_addr_incr      2      O    Increment step of destination port.
enable_dst_gen_o   1      O    Destination address generator enable signal.
src_rate_o         4      O    Source port data rate.
src_parity_o       1      O    Source port parity check.
src_endian_o       1      O    Source port endian.
dst_rate_o         4      O    Destination port data rate.
dst_parity_o       1      O    Destination port parity check.
dst_endian_o       1      O    Destination port endian.
src_csn_o          1      O    Source port chip select enable, active low.
src_oe_o           1      O    Source port output enable, active low.
dst_csn_o          1      O    Destination port chip select enable, active low.
dst_we_o           1      O    Destination port write enable, active low.
Table 5.6. Interface of Transaction FSM
Chapter 6
Integration
Since the DMA controller works together with the Senior DSP core, it needs to be integrated into the processor. This chapter introduces the basic integration flow, covering both hardware and software integration.
6.1 Hardware Integration
The DMA controller works as a peripheral of the Senior DSP core. As introduced in Chapter 4 and Reference [7], a peripheral can be connected to any available GIO. In the following piece of code, the DMA controller is connected to I/O number 5. Other peripherals, such as a timer and an interrupt controller, are also connected to the Senior DSP system.
The memory interface of the DMA controller should also be connected to the current Senior memory subsystem. Since the processor needs to know which memory is being accessed by the DMA controller, so that the processor core does not access the same memory module, the Special Memory Control Register of the DMA controller should also be connected to the Senior core.
6.2 Software Integration
To make the verification of the DMA controller easier, a behavioral model of the DMA controller was also developed, and it has to be integrated into the simulator.
The behavioral model is written in C++. Initially, the behavioral model was not cycle accurate. After simulating the hardware implementation, the behavioral model was further tuned to match the timing of the actual hardware.
The behavioral model is compiled together with the Senior simulator. The DMA controller is instantiated in the header file of the simulator, as shown in Example 6.1.
Example 6.1: Create DMA Behavioral Model in simulator
class Senior {
public:
    ...
    // -------------------------------
    // DMA
    // -------------------------------
    DMAController dma_controller;
    ...
};
In the Senior simulator, the DMA controller should be connected to the program memory and data memory in the constructor of Senior. It should also be connected to a specific I/O address, as shown in Example 6.2.
Example 6.2: Connect DMA Controller
Senior::Senior() {
    ...
    dma_controller.cycle = &cycle;
    dma_controller.peripherals = &peripherals;
    dma_controller.pm[0] = &pm[0];
    dma_controller.pm[1] = &pm[1];
    for (int i = 0; i < 4; i++) {
        dma_controller.dm[i] = &dm[i];
    }
    ...
}

int SrSim::srmain(int argc, char** argv) {
    ...
    // Add DMA peripheral at I/O address 5
    fprintf(stdout, "Loading DMA peripheral at address 5.\n");
    addPeripheral(&(dma_controller), 5);
    ...
}
6.3 DMA Programming
This section lists sample code with which the Senior DSP core can program the DMA controller.
6.3.1 Initialize the DMA Controller
In Example 6.3, the Senior core configures the DMA controller through its I/O instructions with a task packet containing 3 links.
Example 6.3: Configure the DMA Controller
;; Define the address of DMA registers
;; DMA is connected to I/O 5
#define DMA_STATUS      0x05
#define DMA_CONTRL      0x45
#define DMA_OUT_DATA    0x85
#define DMA_IN_DATA     0xC5

.code
;;; DMA task 1
    set     r9,$FFFF        ; start symbol, task package preamble
;;; number of links = 3, priority = 0, task ID = 1
    set     r10,$0301
;;; src / dst width = 16 bits, little endian, src / dst rate = clk (1:1)
    set     r11,$5000
;;; src port = 3, dst port = 4
    set     r12,$0064
    out     DMA_IN_DATA,r9
    out     DMA_IN_DATA,r10 ; write config vector to config fifo
    out     DMA_IN_DATA,r11
    out     DMA_IN_DATA,r12
    set     r10,$0010       ; dst addr low part
    set     r11,$0000       ; dst addr high part
    out     DMA_IN_DATA,r10
    out     DMA_IN_DATA,r11
;; link 1
    set     r10,$0000       ; src addr low part
    set     r11,$0000       ; src addr high part
    set     r12,32          ; link length = 32
    out     DMA_IN_DATA,r10
    out     DMA_IN_DATA,r11
    out     DMA_IN_DATA,r12
;; link 2
    set     r10,$0030       ; src addr low part
    set     r11,$0000       ; src addr high part
    set     r12,16          ; link length = 16
;; link 3
    set     r13,$0060       ; src addr low part
    set     r14,$0000       ; src addr high part
    set     r15,$40         ; link length = 64
    out     DMA_IN_DATA,r10
    out     DMA_IN_DATA,r11
    out     DMA_IN_DATA,r12
    out     DMA_IN_DATA,r13
    out     DMA_IN_DATA,r14
    out     DMA_IN_DATA,r15
;;; wait for channel configuration
task1_channel_config
    in      r1,DMA_STATUS
    nop
    and     r1,$0002
    sub     r1,$0002
    jump.ne task1_channel_config
;; start DMA task 1
;; write control register, start config channel
;; and start DMA transfer
    set     r1,0x8000       ; config a channel
    set     r2,0x0400       ; start DMA
    out     DMA_CONTRL,r1
    out     DMA_CONTRL,r2
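6.3.2 Poll the DMA Controller
In Example 6.4, the Senior core polls the status register of the DMA controller to check whether the transfer is completed. The loop masks the status word with $0006, i.e. bit 1 (a channel can be configured) and bit 2 (the running task is finished) of Table 4.3, and repeats until both bits are set. Once the transaction is done, the processor leaves the loop and continues with other work.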
Example 6.4: Poll the status of DMA Controller
;;; wait for DMA task 1 finish
task1_done
    in      r1,DMA_STATUS
    nop
    and     r1,$0006
    sub     r1,$0006
    jump.ne task1_done
;;; Start to do other things
6.3.3 Handle the DMA Interrupt
Example 6.4 reveals a major disadvantage of polling the DMA controller: the processor can do nothing but wait for the DMA controller to complete the transfer. It is therefore necessary to handle the interrupt, so that the processor core can do other work while the DMA controller performs the transfer.
In Example 6.5, the entry for the interrupt service routine (ISR) should be set
correctly according to the actual hardware connection. The flow of the interrupt
can be described as:
[Interrupt Received] → [Push Flags] → [Push PC] → [PC = DM1[SPR(intaddr)]]
→ [Interrupt service routine] → [Instruction = RETI] → [Pop PC and Start Jump]
→ [Pop Flags]
Example 6.5: Handle the DMA Interrupt
.code
    set     sp, 0x7000               ; set the stack pointer
    set     intaddr, 0x0000          ; set interrupt BASE address (DM1)
    set     r0, INTERRUPT_ROUTINE
    nop
    st1     (0x0008), r0             ; store interrupt address 4 at BASE+8
    jump    MAIN_PROGRAM

INTERRUPT_ROUTINE
    ;;; Here is the interrupt service routine
    reti

MAIN_PROGRAM
    ;;; Main Program
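To show the push/pop order of the interrupt flow above, the following C++ sketch models only the PC, the flags and the stack; it is not a model of the real Senior pipeline. The CoreModel type is hypothetical, and the vector offset of 2n for interrupt n is an inference from Example 6.5, where the routine for interrupt 4 is stored at BASE+8.

```cpp
#include <cstdint>
#include <stack>

// Simplified model of the interrupt entry/return flow (not the real
// Senior pipeline). Assumption: the ISR address for interrupt n is stored
// at DM1[intaddr + 2*n], inferred from Example 6.5 (interrupt 4 at BASE+8).
struct CoreModel {
    uint16_t pc = 0;
    uint16_t flags = 0;
    uint16_t intaddr = 0;          // SPR holding the vector BASE address
    uint16_t dm1[256] = {};        // small slice of data memory DM1
    std::stack<uint16_t> hwStack;  // stack used for flags and PC

    void takeInterrupt(unsigned n) {
        hwStack.push(flags);                   // [Push Flags]
        hwStack.push(pc);                      // [Push PC]
        pc = dm1[(intaddr + 2u * n) & 0xFFu];  // [PC = DM1[SPR(intaddr)]] plus vector offset
    }

    // RETI: restore in the reverse order of the pushes.
    void reti() {
        pc = hwStack.top();                    // [Pop PC and Start Jump]
        hwStack.pop();
        flags = hwStack.top();                 // [Pop Flags]
        hwStack.pop();
    }
};
```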
Chapter 7
Verification
After the hardware design is completed, it is always important to verify its correctness. In the semiconductor industry, it is extremely critical to make sure the design is bug-free before tape-out, since the non-recurring engineering (NRE) cost of a tape-out in 0.13 µm technology was more than 1 million USD in 2004 [10]. Modern technologies have even higher NRE costs.
7.1 Functional Verification
The functional verification of the DMA controller is based on the test bench of the Senior processor. The basic principle is to compare the output of the behavioral model of the DMA controller with the output of the RTL simulation. If the results match, the designed hardware is considered correct; otherwise, the design has to be debugged.
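The comparison step can be as simple as checking two memory dumps word by word. The C++ sketch below assumes both the behavioral model and the RTL simulation produce a dump as a vector of 16-bit words; how the dumps are produced is outside the sketch, and firstMismatch is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compare a memory dump from the behavioral model with one from RTL
// simulation. Returns the index of the first mismatching word, or -1 if
// the dumps are identical.
long firstMismatch(const std::vector<uint16_t>& model,
                   const std::vector<uint16_t>& rtl) {
    const std::size_t n = std::min(model.size(), rtl.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (model[i] != rtl[i]) return static_cast<long>(i);
    }
    // A length difference also counts as a mismatch, at the first missing word.
    if (model.size() != rtl.size()) return static_cast<long>(n);
    return -1;
}
```

Reporting the first mismatching address, rather than just pass or fail, gives a starting point for the debugging step.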
Figure 7.1 shows the functional verification flow.
Figure 7.1. DMA Functional Verification Flow
Several test cases have been developed to increase the code coverage of the design. Currently, normal DMA operation, linking-table operation and large block transfers with interrupt have been tested. The code coverage is 91.7%.
7.2 Hardware Implementation
For a hardware design, it is always exciting to implement the design in real hardware, either on an FPGA or as an ASIC. I am honored that Professor Liu offered me the opportunity to turn my design into real hardware.
The FPGA implementation was targeted on Xilinx Virtex 4 FPGA while the
ASIC implementation was targeted on Infineon 65nm CMOS technology.
The implementation was straightforward: the logic synthesizer translates the RTL code into a netlist for the specific technology, either CMOS standard cells or FPGA cells. The back-end tool then produces the layout based on the floorplan and the synthesized netlist. Some optimizations are performed during this process, and the design hierarchy might be broken. Since the implementation covered the whole Senior system, I will only discuss the results of the DMA module in Chapter 8.
Chapter 8
Conclusion
8.1 Achieved Results
8.1.1 DMA Benchmark
From the RTL simulation, we can obtain the timing diagrams of the DMA controller while it performs transactions. The timing diagrams of a basic DMA operation and of a linking-table operation are shown in Figure 8.1 and Figure 8.2, respectively.
Note that the 4 extra cycles between the 2 links in Figure 8.2 are used to configure the transfer parameters of the second link.
Figure 8.1. Timing diagram of basic DMA operation.
Figure 8.2. Timing diagram of linking table operation.
The DMA controller has also been synthesized in 65nm digital CMOS technology and implemented on a Xilinx Virtex 4 FPGA. Table 8.1 shows the results.
Target Technology       65nm CMOS (ST)   65nm CMOS (Infineon)   Xilinx FPGA
                        without mem      with mem               Virtex 4
Working Frequency       200 MHz          200 MHz                88 MHz
Estimated Gate Count    26595            18000                  -
Number of Flip Flops    -                -                      504
Number of 4-input LUTs  -                -                      694
Estimated Power         4.18 mW          2.48 mW                Not Available
Table 8.1. Synthesis Result of DMA controller
From Table 8.1, we can see that the estimated gate count for the 65nm CMOS technology is relatively high. This is because a dual-port RAM with a depth of 256 words and a width of 16 bits is used as the control FIFO in the DMA controller, and this memory was not optimized in this implementation but synthesized directly. If a memory cell is used in the synthesis, the actual gate count is 18000. The synthesis result for the FPGA implementation is quite comparable to the ASIC implementation without memory.
8.1.2 Comparison
In theory, the DMA controller should improve the efficiency of memory transfers, since it can read and write memory in a pipelined fashion, as shown in Figure 8.1. The processor core can of course also read and write memory in a pipelined fashion, but this costs extra register-file usage and programming tricks. Moreover, because the number of available registers is limited, such software pipelining is only partial when the desired transfer is large, for example tens of kilobytes.
To compare the efficiency of memory transfers, Table 8.2 compares the number of clock cycles the Senior core spends transferring a given set of data blocks.
Test case 1 includes three different memory-transfer tasks between different parts of the memory subsystem. Task 1 contains three links with 32, 16 and 64 data words, respectively, and transfers data from memory port 3 to port 4. Task 2 is almost the same as Task 1, except that the destination is memory port 5. Task 3 transfers 32 data words from memory port 4 to memory port 3.
Results            without DMA and   without DMA but       with DMA
                   no optimization   with optimization¹
Clock Cycles       1055              543                   466
Code Size (Bytes)  212               548                   488
Table 8.2. Results Comparison with and without DMA
¹ Here, optimization means software optimization such as software pipelining.
The reader should keep in mind that this benchmark is only a way to estimate the actual performance. Performance benchmarks should always be collected on real-life applications such as FFT or DCT kernels, or on even more complicated applications such as a complete JPEG decoder or MP3 decoder.
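As a quick reading of Table 8.2 (simple arithmetic on the reported numbers for test case 1 only), the reduction in clock cycles obtained with the DMA controller is

$$\frac{1055 - 466}{1055} \approx 55.8\,\% \ \text{versus no optimization}, \qquad \frac{543 - 466}{543} \approx 14.2\,\% \ \text{versus software optimization},$$

while the code size grows from 212 to 488 bytes (about 2.3 times) compared with the unoptimized version.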
8.1.3 Conclusion
The DMA controller can improve memory-transfer efficiency and allows the processor to do other work while the data transfer is being performed. There is no free lunch, however: the improvement is paid for with extra hardware cost and extra code size.
For some timing-critical applications, it is almost impossible for the processor core to handle both data computation and data transfer. In such cases, the DMA technique is preferred.
8.2 Future Work
As discussed in Section 8.1.2, the actual improvement brought by the DMA controller should be measured on more complicated applications, such as baseband or multimedia kernel algorithms. This means that the DMA controller, together with the Senior processor core, should be implemented on an FPGA or as an ASIC, and the whole application should be developed on that platform.
To support off-chip memory modules, an external memory interface should also be developed. This would possibly include the commonly used DDR DRAM interface and the NAND Flash memory interface.
The behavioral model of the DMA controller is currently statically compiled into the Senior simulator. To protect the Intellectual Property (IP) and technical details of the Senior core, it would be better to link it dynamically.
Bibliography
[1] TMS320C6000 DSP Enhanced Direct Memory Access (EDMA) Controller Reference Guide, March 2005. Literature Number: SPRU234B.
[2] Dave Comisky, Sanjive Agarwala, and Charles Fuoco. A Scalable High-Performance DMA Architecture for DSP Applications. In International Conference on Computer Design, pages 414–419, 2000.
[3] Steve Furber. ARM System-on-Chip Architecture. Addison-Wesley Professional, 2nd edition, August 2000.
[4] David J. Katz and Rick Gentile. Embedded Media Processing. Elsevier, September 2005.
[5] Phil Lapsley, Jeff Bier, Amit Shoham, and Edward A. Lee. DSP Processor Fundamentals: Architectures and Features. Wiley-IEEE Press, February 1997.
[6] Dake Liu. Embedded DSP Processor Design, Volume 2: Application Specific Instruction Set Processors (Systems on Silicon). Morgan Kaufmann, June 2008.
[7] Markus Svensson and Thomas Österholm. Optimization and Verification of an Integrated DSP. Master's thesis, Linköping University, December 2008.
[8] Tongtong Wang. Design of High-performance DMA Controller for Multi-core Platform. Master's thesis, Linköping University, May 2006.
[9] Lars Wanhammar. DSP Integrated Circuits. Academic Press, 1st edition, May 1999.
[10] Kun-Cheng Wu and Yu-Wen Tsai. Structured ASIC, evolution or revolution? In Proceedings of the 2004 International Symposium on Physical Design, pages 103–106. ACM, 2004.
Appendix A
DMA Simulator C++ Header
#ifndef DMA_CONTROLLER_HPP
#define DMA_CONTROLLER_HPP
#include "support.hpp"
#include "peripheral.hpp"
#include "memory.hpp"
#include "data_memory.hpp"
#include <map>
#include <queue>
#include <stdlib.h>
#include <stdint.h>
#define DMA_LINK_NUM 64 // DMA linking table number
#define DMA_TASKQ_SIZE 3
#define DMA_PM1   0
#define DMA_PM2   1
#define DMA_DM0_1 2
#define DMA_DM0_2 3
#define DMA_DM1_1 4
#define DMA_DM1_2 5
struct Links_t{
uint16_t srcAddrL;
uint16_t srcAddrH;
uint16_t length;
};
struct DMATask_t{
uint8_t linkNumber;
uint8_t taskPriority;
uint8_t taskID;
uint8_t srcWidth;
uint8_t dstWidth;
bool srcProtocol;
bool dstProtocol;
bool srcEndian;
bool dstEndian;
uint8_t srcRate;
uint8_t dstRate;
uint8_t srcPort;
uint8_t dstPort;
uint16_t dstAddrL;
uint16_t dstAddrH;
Links_t links[DMA_LINK_NUM];
};
struct DMAStatus_t{
bool busy;
bool chReady;
bool finish;
bool exception;
bool queueFull;
};
struct DMAControl_t{
bool reset;
bool shutdown;
bool dmaClock;
bool start;
uint8_t taskID;
bool reqChConf;
};
class DMAController : public Peripheral {
public:
cycle_T* cycle;
std::map<unsigned int, Peripheral*>* peripherals; //connect to peripheral IO
DMAController(void);
~DMAController(void);
long ioCommunicate(unsigned int, unsigned long, unsigned long,
unsigned int, unsigned long);
int GetInterrupt();
int Step();
// Program memory
Memory *pm[2];
// Data memory
DataMemory *dm[4];
unsigned long clockTag;
void start(unsigned long cycle);
void configChannel(unsigned long cycle);
uint16_t getStatusReg(unsigned long cycle);
uint16_t getControlReg();
void setControlReg(uint16_t data);
void putTaskQueue(uint16_t data, unsigned long cycle);
void shutDown();
void reset();
private:
// DMA config
DMAStatus_t _status;
DMAControl_t _control;
DMATask_t _task;
uint16_t _statusReg;
uint16_t _controlReg;
// DMA task queue: each entry holds 5 header words + 3 words per link (5 + 3*64 = 197)
uint16_t _taskQueue[DMA_TASKQ_SIZE][198];
uint16_t _queuePtr;
uint16_t _nextQueuePtr;
uint16_t _taskPtr;
uint32_t _taskRegAddr;
std::queue<uint16_t> _taskQ;
// Task queue function
void _setTaskReg(uint32_t queID, uint32_t addr, uint16_t data);
uint16_t _getTaskReg(uint32_t queID, uint32_t addr);
// DMA data transfer function
void _trans();
uint32_t _transCycle();
void _syncReg();
void _syncTask();
};
#endif
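The DMATask_t fields above are not written by the core one by one: the core streams 16-bit words into the DMA In port, and _syncTask() in Appendix B unpacks them from fixed bit positions. The following is a minimal sketch of a host-side encoder that produces that word sequence, assuming the bit layout used by _syncTask() is authoritative. TaskSpec, LinkSpec and encodeTask are hypothetical names, not part of the simulator.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one link (one source block).
struct LinkSpec {
    uint32_t srcAddr;  // sent as srcAddrL, then srcAddrH
    uint16_t length;   // number of 16-bit words to move
};

// Hypothetical description of one task; widths follow _syncTask().
struct TaskSpec {
    uint8_t taskID = 0, taskPriority = 0;          // 4 bits each
    uint8_t srcWidth = 0, dstWidth = 0;            // 2 bits each
    bool srcProtocol = false, dstProtocol = false;
    bool srcEndian = false, dstEndian = false;
    uint8_t srcRate = 0, dstRate = 0;              // 4 bits each
    uint8_t srcPort = 0, dstPort = 0;              // 5 bits each, e.g. DMA_DM0_1
    uint32_t dstAddr = 0;
    std::vector<LinkSpec> links;                   // at most DMA_LINK_NUM (64)
};

// Packs a task into the In-port word stream, starting with the 0xFFFF marker
// that putTaskQueue() treats as "begin a new task-queue entry".
std::vector<uint16_t> encodeTask(const TaskSpec& t) {
    assert(t.links.size() <= 64);
    std::vector<uint16_t> w;
    w.push_back(0xFFFF);
    // Word 0: link number [15:8], priority [7:4], task ID [3:0]
    w.push_back(static_cast<uint16_t>((t.links.size() << 8) |
                                      ((t.taskPriority & 0xF) << 4) |
                                      (t.taskID & 0xF)));
    // Word 1: widths, protocols, endianness and rates
    w.push_back(static_cast<uint16_t>(((t.srcWidth & 0x3) << 14) |
                                      ((t.dstWidth & 0x3) << 12) |
                                      (t.srcProtocol << 11) | (t.dstProtocol << 10) |
                                      (t.srcEndian << 9) | (t.dstEndian << 8) |
                                      ((t.srcRate & 0xF) << 4) | (t.dstRate & 0xF)));
    // Word 2: source port [9:5], destination port [4:0]
    w.push_back(static_cast<uint16_t>(((t.srcPort & 0x1F) << 5) | (t.dstPort & 0x1F)));
    w.push_back(static_cast<uint16_t>(t.dstAddr & 0xFFFF));  // dstAddrL
    w.push_back(static_cast<uint16_t>(t.dstAddr >> 16));     // dstAddrH
    for (const LinkSpec& l : t.links) {
        w.push_back(static_cast<uint16_t>(l.srcAddr & 0xFFFF));  // srcAddrL
        w.push_back(static_cast<uint16_t>(l.srcAddr >> 16));     // srcAddrH
        w.push_back(l.length);
    }
    return w;
}
```

Because putTaskQueue() interprets every 0xFFFF word as the start of a new task, a packet whose address halves or length happen to equal 0xFFFF would be split in two; a caller would have to avoid such values, or the protocol would need an escape mechanism.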
Appendix B
DMA Simulator C++ Code
#include "dma_controller.hpp"
#include <stdio.h>
#include <stdlib.h>
// Extract a bit field of width `bits` starting at bit position `bitpos`
static inline int gv(unsigned int insn, int bitpos, int bits) {
return ((insn >> bitpos) & ((1<<bits)-1));
}
//----------------------------
// DMA peripheral I/O
//----------------------------
long DMAController::ioCommunicate(unsigned int addr_in,
unsigned long data_in,
unsigned long data_in2,
unsigned int read_write,
unsigned long cycle) {
if (read_write == 1) {
// Reading
switch(gv(addr_in,6,2)) {
case 0: // Status register
return getStatusReg(cycle);
case 1: // Control register
return getControlReg();
case 2: // Out port data to DSP core
fprintf(stderr, "Warning: No data written to DSP core.\n");
return -1;
case 3: // In port from DSP core
fprintf(stderr, "Warning: Can't read In port data.\n");
return -1;
default: // Unknown operation
fprintf(stderr, "Warning: Unknown operation.\n");
return -1;
}
}
else if (read_write == 2) {
// Writing
switch(gv(addr_in,6,2)) {
case 0: // Status register
fprintf(stderr, "Warning: Trying to write read-only status register.\n");
return -1;
case 1: // Control register
setControlReg((uint16_t)data_in);
printf("DMA: Cycle(%lu), write DMA_CONTROL, value = 0x%04x.\n", cycle, (uint16_t)data_in);
// Control bits are serviced in priority order: reset, shutdown, reqChConf, start
if (gv(data_in,0,1)) {
reset(); // Reset
}
else if (gv(data_in,1,1)) {
shutDown(); // Shutdown DMA
}
else if (gv(data_in,15,1)) {
configChannel(cycle); // Request config DMA task
}
else if (gv(data_in,10,1)) {
start(cycle); // Start DMA transaction
}
return 0;
case 2: // Out port data to DSP core
fprintf(stderr, "Warning: Trying to write OUT port of DMA.\n");
return -1;
case 3: // In port data from DSP core
putTaskQueue((uint16_t)data_in, cycle);
return 0;
default: // Unknown operation
fprintf(stderr, "Warning: Unknown operation.\n");
return -1;
}
}
fprintf(stderr, "DMA ERROR: Wrong read_write state, read_write=%u.\n",
read_write);
return -1;
}
int DMAController::GetInterrupt() {
return 0;
}
int DMAController::Step() {
return 0;
}
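//----------------------------
// DMA controller behavior
//----------------------------
DMAController::DMAController() {
// Memory ports are attached by the simulator after construction
cycle = NULL;
peripherals = NULL;
for (int i = 0; i < 2; i++) pm[i] = NULL;
for (int i = 0; i < 4; i++) dm[i] = NULL;
reset();
}
DMAController::~DMAController() {
}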
void DMAController::start(unsigned long cycle) {
printf("DMA: Cycle(%lu), start transfer.\n",cycle);
_status.busy = 1;
_status.finish = 0;
_status.chReady = 0;
_syncReg();
clockTag = cycle + _transCycle();
printf("DMA: update clockTag = %lu.\n",clockTag);
}
uint16_t DMAController::getStatusReg(unsigned long cycle) {
if (_status.busy && clockTag <= cycle) {
_trans();
_status.busy = 0;
_status.finish = 1;
_status.chReady = 1;
printf("DMA: Cycle(%lu), task(%d) finished.\n",cycle,_task.taskID);
}
if (_taskQ.size() == DMA_TASKQ_SIZE) _status.queueFull = 1;
else _status.queueFull = 0;
_syncReg();
return _statusReg;
}
uint16_t DMAController::getControlReg() {
_syncReg();
return _controlReg;
}
void DMAController::setControlReg(uint16_t data) {
_controlReg = data;
}
void DMAController::putTaskQueue(uint16_t data, unsigned long cycle) {
if ((uint16_t)data == 0xFFFF) {
_queuePtr = _nextQueuePtr;
_taskRegAddr = 0;
_taskQ.push(_queuePtr);
if (_queuePtr >= DMA_TASKQ_SIZE-1) _nextQueuePtr = 0;
else _nextQueuePtr = _queuePtr+1;
printf("DMA: Cycle(%lu), Senior config task queue[%d].\n",cycle,_queuePtr);
}
else {
_setTaskReg(_queuePtr, _taskRegAddr, data);
_taskRegAddr++;
}
}
void DMAController::configChannel(unsigned long cycle) {
// Guard against popping an empty queue, which is undefined behavior
if (_taskQ.empty()) {
fprintf(stderr, "DMA ERROR: Cycle(%lu), channel configuration requested but the task queue is empty.\n", cycle);
return;
}
_taskPtr = _taskQ.front();
_taskQ.pop();
printf("DMA: Cycle(%lu), configChannel(); _taskPtr = %d, _taskQ.size = %zu.\n",
cycle, _taskPtr, _taskQ.size());
_syncReg();
_syncTask();
printf("DMA task packet: \n");
printf("  |-Link number %d, Task priority %d, Task ID 0x%04x\n",
_task.linkNumber,_task.taskPriority,_task.taskID);
printf("  |------------\n");
printf("  |-SRC width %d, DST width %d, SRC protocol %d, DST protocol %d\n",
_task.srcWidth,_task.dstWidth,_task.srcProtocol,_task.dstProtocol);
printf("  | SRC endian %d, DST endian %d, SRC rate %d, DST rate %d\n",
_task.srcEndian,_task.dstEndian,_task.srcRate,_task.dstRate);
printf("  |------------\n");
printf("  |-SRC port %d, DST port %d\n", _task.srcPort,_task.dstPort);
printf("  |------------\n");
printf("  |-DST addr low   0x%04x\n", _task.dstAddrL);
printf("  |-DST addr high  0x%04x\n", _task.dstAddrH);
for (int i = 0; i < _task.linkNumber; i++) {
printf("  |-DMA link %d\n", i);
printf("  |  |-SRC addr low   0x%04x\n", _task.links[i].srcAddrL);
printf("  |  |-SRC addr high  0x%04x\n", _task.links[i].srcAddrH);
printf("  |  |-Link length    %d\n", _task.links[i].length);
}
}
void DMAController::shutDown() {
clockTag = 0;
_queuePtr = 0;
_nextQueuePtr = 0;
_taskPtr = 0;
_status.busy = 1;
_status.finish = 0;
_status.chReady = 0;
_status.exception = 0;
_status.queueFull = 1;
setControlReg(0x0000);
_syncReg();
}
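void DMAController::reset() {
clockTag = 0;
_queuePtr = 0;
_nextQueuePtr = 0;
_taskPtr = 0;
_taskRegAddr = 0;
std::queue<uint16_t>().swap(_taskQ); // Drop pending tasks so the queue matches the reset pointers
_status.busy = 0;
_status.finish = 1;
_status.chReady = 1;
_status.exception = 0;
_status.queueFull = 0;
setControlReg(0x0000);
_syncReg();
}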
void DMAController::_trans() {
uint16_t tmpData = 0;
uint32_t srcAddr;
uint32_t dstAddr;
// The destination is one contiguous block; each link restarts the source address
dstAddr = ((uint32_t)_task.dstAddrH << 16) + (uint32_t)_task.dstAddrL;
for (int link = 0; link < _task.linkNumber; link++) {
srcAddr = ((uint32_t)_task.links[link].srcAddrH << 16)
+ (uint32_t)_task.links[link].srcAddrL;
for (int i = 0; i < _task.links[link].length; i++) {
// Read data vector from source
switch(_task.srcPort) {
case(DMA_PM1):
tmpData = pm[0]->Read((uint16_t)srcAddr); break;
case(DMA_PM2):
tmpData = pm[1]->Read((uint16_t)srcAddr); break;
case(DMA_DM0_1):
tmpData = dm[0]->dmaRead((uint16_t)srcAddr); break;
case(DMA_DM0_2):
tmpData = dm[1]->dmaRead((uint16_t)srcAddr); break;
case(DMA_DM1_1):
tmpData = dm[2]->dmaRead((uint16_t)srcAddr); break;
case(DMA_DM1_2):
tmpData = dm[3]->dmaRead((uint16_t)srcAddr); break;
default: break;
}
// Write data vector to destination
switch(_task.dstPort) {
case(DMA_PM1):
pm[0]->Write(dstAddr,tmpData); break;
case(DMA_PM2):
pm[1]->Write(dstAddr,tmpData); break;
case(DMA_DM0_1):
dm[0]->dmaWrite(dstAddr,tmpData); break;
case(DMA_DM0_2):
dm[1]->dmaWrite(dstAddr,tmpData); break;
case(DMA_DM1_1):
dm[2]->dmaWrite(dstAddr,tmpData); break;
case(DMA_DM1_2):
dm[3]->dmaWrite(dstAddr,tmpData); break;
default: break;
}
// Update address pointers (one word per iteration)
srcAddr++;
dstAddr++;
}
}
}
uint32_t DMAController::_transCycle() {
uint32_t transCycle = 0;
for (int link = 0; link < _task.linkNumber; link++) {
transCycle += _task.links[link].length;
}
return transCycle;
}
void DMAController::_syncReg() {
// Status register
_statusReg = (_status.queueFull << 4) | (_status.exception << 3) |
(_status.finish << 2) | (_status.chReady << 1) | (_status.busy);
// Control register
_control.reset = gv(_controlReg,0,1);
_control.shutdown = gv(_controlReg,1,1);
_control.dmaClock = gv(_controlReg,2,1);
_control.start = gv(_controlReg,10,1);
_control.taskID = gv(_controlReg,11,4);
_control.reqChConf = gv(_controlReg,15,1);
}
void DMAController::_syncTask() {
//DMA task register
_task.linkNumber = gv(_taskQueue[_taskPtr][0],8,8);
// The 8-bit field can exceed the link table size; clamp to avoid overrunning links[]
if (_task.linkNumber > DMA_LINK_NUM) {
fprintf(stderr, "Warning: Link number %d exceeds DMA_LINK_NUM, clamped.\n", _task.linkNumber);
_task.linkNumber = DMA_LINK_NUM;
}
_task.taskPriority = gv(_taskQueue[_taskPtr][0],4,4);
_task.taskID = gv(_taskQueue[_taskPtr][0],0,4);
_task.srcWidth = gv(_taskQueue[_taskPtr][1],14,2);
_task.dstWidth = gv(_taskQueue[_taskPtr][1],12,2);
_task.srcProtocol = gv(_taskQueue[_taskPtr][1],11,1);
_task.dstProtocol = gv(_taskQueue[_taskPtr][1],10,1);
_task.srcEndian = gv(_taskQueue[_taskPtr][1], 9,1);
_task.dstEndian = gv(_taskQueue[_taskPtr][1], 8,1);
_task.srcRate = gv(_taskQueue[_taskPtr][1], 4,4);
_task.dstRate = gv(_taskQueue[_taskPtr][1], 0,4);
_task.srcPort = gv(_taskQueue[_taskPtr][2],5,5);
_task.dstPort = gv(_taskQueue[_taskPtr][2],0,5);
_task.dstAddrL = _taskQueue[_taskPtr][3];
_task.dstAddrH = _taskQueue[_taskPtr][4];
for (int i=0; i < DMA_LINK_NUM; i++) {
_task.links[i].srcAddrL = _taskQueue[_taskPtr][5+i*3];
_task.links[i].srcAddrH = _taskQueue[_taskPtr][5+i*3+1];
_task.links[i].length = _taskQueue[_taskPtr][5+i*3+2];
}
}
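uint16_t DMAController::_getTaskReg(uint32_t queID, uint32_t addr) {
return _taskQueue[queID][addr];
}
void DMAController::_setTaskReg(uint32_t queID, uint32_t addr, uint16_t data) {
// Ignore words past the end of a task-queue entry instead of corrupting memory
if (queID >= DMA_TASKQ_SIZE || addr >= sizeof(_taskQueue[0]) / sizeof(_taskQueue[0][0])) {
fprintf(stderr, "Warning: Task packet too long, word 0x%04x ignored.\n", data);
return;
}
_taskQueue[queID][addr] = data;
}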
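Seen from the core, ioCommunicate() defines a small register protocol: address bits [7:6] select the status (0), control (1), Out (2) or In (3) port. A task is run by streaming its packet into the In port, setting reqChConf (control bit 15) so that configChannel() loads the channel, setting start (control bit 10), and polling the finish flag (status bit 2), which getStatusReg() sets once the current cycle reaches clockTag. The sketch below illustrates this sequence under the assumption of a hypothetical DmaBus wrapper around the core's I/O accesses; DmaBus, dmaAddr and runDmaTask are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Register selection decoded by DMAController::ioCommunicate(): address bits [7:6].
enum DmaReg : unsigned { kStatus = 0, kControl = 1, kOutPort = 2, kInPort = 3 };

// Control bits tested by ioCommunicate() and the finish bit packed by _syncReg().
constexpr uint16_t kCtrlStart = 1u << 10;
constexpr uint16_t kCtrlReqChConf = 1u << 15;
constexpr uint16_t kStatFinish = 1u << 2;

// Hypothetical wrapper around the core's I/O reads and writes.
struct DmaBus {
    std::function<void(unsigned addr, uint16_t data)> write;
    std::function<uint16_t(unsigned addr)> read;
};

// Builds a DMA I/O address; `base` must leave bits [7:6] free.
inline unsigned dmaAddr(unsigned base, DmaReg reg) {
    assert((base & (0x3u << 6)) == 0);
    return base | (static_cast<unsigned>(reg) << 6);
}

// Runs one task; returns true if the finish bit was seen within maxPolls reads.
bool runDmaTask(DmaBus& bus, unsigned base,
                const std::vector<uint16_t>& packet, unsigned maxPolls) {
    // 1. Stream the packet (starting with the 0xFFFF marker) into the In port.
    for (uint16_t word : packet) bus.write(dmaAddr(base, kInPort), word);
    // 2. Ask the controller to load the oldest queued task into the channel.
    bus.write(dmaAddr(base, kControl), kCtrlReqChConf);
    // 3. Start the transfer; start() clears the finish flag.
    bus.write(dmaAddr(base, kControl), kCtrlStart);
    // 4. Poll the status register until the finish flag is set again.
    for (unsigned i = 0; i < maxPolls; ++i) {
        if (bus.read(dmaAddr(base, kStatus)) & kStatFinish) return true;
    }
    return false;
}
```

Configuration and start are issued as two separate control writes on purpose: ioCommunicate() tests the control bits in an else-if chain (reset, shutdown, reqChConf, start), so a single write with both bit 15 and bit 10 set would only configure the channel and never start it.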
Copyright
The publishers will keep this document online on the Internet — or its possible replacement — for a period of 25 years from the date of publication barring
exceptional circumstances.
The online availability of the document implies a permanent permission for
anyone to read, to download, to print out single copies for his/her own use and
to use it unchanged for any non-commercial research and educational purpose.
Subsequent transfers of copyright cannot revoke this permission. All other uses of
the document are conditional on the consent of the copyright owner. The publisher
has taken technical and administrative measures to assure authenticity, security
and accessibility.
According to intellectual property law the author has the right to be mentioned
when his/her work is accessed as described above and to be protected against
infringement.
For additional information about the Linköping University Electronic Press
and its procedures for publication and for assurance of document integrity, please
refer to its www home page: http://www.ep.liu.se/
© Guoyou Jiang