Chapter 2 - TMS320C6000 Architectural Overview

advertisement
DSP Lecture 02
Chapter 2
TMS320C6000 Architectural Overview
Learning Objectives
•
•
•
•
•
•
2
Describe C6000 CPU architecture.
Introduce some basic instructions.
Describe the C6000 memory map.
Provide an overview of the peripherals.
Performance
Software Development
General DSP System Block Diagram
Internal Memory
Internal Buses
External
Memory
Central
Processing
Unit
3
P
E
R
I
P
H
E
R
A
L
S
Implementation of Sum of Products (SOP)
It has been shown in
Chapter 1 that SOP is the
key element for most DSP
algorithms.
So let’s write the code for
this algorithm and at the
same time discover the
C6000 architecture.
N
Y = 
an * xn
n = 1
= a1 * x1 + a2 * x2 +... + aN * xN
Two basic
operations are required
for this algorithm.
(1) Multiplication
(2) Addition
Therefore two basic
instructions are required
4
Implementation of Sum of Products (SOP)
So let’s implement the SOP
algorithm!
N
Y = 
an * xn
n = 1
= a1 * x1 + a2 * x2 +... + aN * xN
The implementation in this
module will be done in
assembly.
Two basic
operations are required
for this algorithm.
(1) Multiplication
(2) Addition
Therefore two basic
instructions are required
5
Multiply (MPY)
N
Y = 
an * xn
n = 1
= a1 * x1 + a2 * x2 +... + aN * xN
The multiplication of a1 by x1 is done in
assembly by the following instruction:
MPY
a1, x1, Y
This instruction is performed by a
multiplier unit that is called “.M”
6
Multiply (.M unit)
40
Y = 
an * xn
n = 1
.M
The . M unit performs multiplications in
hardware
MPY
.M
a1, x1, Y
Note: 16-bit by 16-bit multiplier provides a 32-bit result.
32-bit by 32-bit multiplier provides a 64-bit result.
7
Addition (.?)
40
Y = 
an * xn
n = 1
.M
.?
8
MPY
.M
a1, x1, prod
ADD
.?
Y, prod, Y
Add (.L unit)
40
Y = 
an * xn
n = 1
.M
.L
MPY
.M
a1, x1, prod
ADD
.L
Y, prod, Y
RISC processors such as the C6000 use registers to
hold the operands, so lets change this code.
9
Register File - A
40
Register File A
A0
A1
a1
x1
A2
A3
Y = 
an * xn
n = 1
prod
Y
.M
.
.
.
.L
MPY
.M
a1, x1, prod
ADD
.L
Y, prod, Y
A15
32-bits
Let us correct this by replacing a, x, prod and Y by the
registers as shown above.
10
Specifying Register Names
40
Register File A
A0
A1
a1
x1
A2
A3
Y = 
an * xn
n = 1
prod
Y
.M
.
.
.
.L
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
A15
32-bits
The registers A0, A1, A3 and A4 contain the values to be
used by the instructions.
11
Specifying Register Names
40
Register File A
A0
A1
a1
x1
A2
A3
Y = 
an * xn
n = 1
prod
Y
.M
.
.
.
.L
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
A15
32-bits
Register File A contains 16 registers (A0 -A15) which
are 32-bits wide.
12
Data loading
Register File A
A0
A1
a1
x1
A2
A3
prod
Y
.M
.
.
.
.L
A15
32-bits
13
Q: How do we load the
operands into the registers?
Load Unit “.D”
Register File A
A0
A1
Q: How do we load the
operands into the registers?
a1
x1
A2
A3
prod
Y
.M
.
.
.
.L
.D
A15
32-bits
Data Memory
14
A: The operands are loaded
into the registers by loading
them from the memory
using the .D unit.
Load Unit “.D”
Register File A
A0
A1
a1
x1
A2
A3
prod
Y
.M
.
.
.
.L
.D
A15
32-bits
Data Memory
15
It is worth noting at this
stage that the only way to
access memory is through the
.D unit.
Load Instruction
Register File A
A0
A1
a1
x1
A2
A3
prod
Y
.M
.
.
.
.L
.D
A15
32-bits
Data Memory
16
Q: Which instruction(s) can be
used for loading operands
from the memory to the
registers?
Load Instructions (LDB, LDH,LDW,LDDW)
Register File A
A0
A1
a1
x1
A2
A3
prod
Y
.M
.
.
.
.L
A: The load instructions.
.D
A15
32-bits
Data Memory
17
Q: Which instruction(s) can be
used for loading operands
from the memory to the
registers?
Using the Load Instructions
Before using the load unit you
have to be aware that this
processor is byte addressable,
which means that each byte is
represented by a unique
address.
Data
address
00000000
00000002
00000004
00000006
00000008
Also the addresses are 32-bit
wide.
FFFFFFFF
16-bits
18
Using the Load Instructions
The syntax for the load
instruction is:
LD *Rn,Rm
Where:
Rn is a register that contains
the address of the operand to
be loaded
Data
address
a1
x1
00000000
00000002
00000004
00000006
00000008
prod
Y
and
Rm is the destination register.
FFFFFFFF
16-bits
19
Using the Load Instructions
The syntax for the load
instruction is:
LD *Rn,Rm
The question now is how many
bytes are going to be loaded
into the destination register?
Data
address
a1
x1
00000000
00000002
00000004
00000006
00000008
prod
Y
FFFFFFFF
16-bits
20
Using the Load Instructions
The syntax for the load
instruction is:
LD *Rn,Rm
The answer, is that it depends on
the instruction you choose:
Data
address
a1
x1
00000000
00000002
00000004
00000006
00000008
prod
Y
• LDB: loads one byte (8-bit)
• LDH: loads half word (16-bit)
• LDW: loads a word (32-bit)
• LDDW: loads a double word (64-bit)
Note: LD on its own does not
exist.
21
FFFFFFFF
16-bits
Using the Load Instructions
The syntax for the load
instruction is:
Data
1
0
0xA
0xB
0xC
0xD
0x2
0x1
Example:
0x4
0x3
If we assume that A5 = 0x4 then:
0x6
0x5
(1) LDB *A5, A7 ; gives A7 = 0x00000001
0x8
0x7
LD *Rn,Rm
address
00000000
00000002
00000004
00000006
00000008
(2) LDH *A5,A7; gives A7 = 0x00000201
(3) LDW *A5,A7; gives A7 = 0x04030201
(4) LDDW *A5,A7:A6; gives A7:A6 =
0x0807060504030201
FFFFFFFF
16-bits
22
Using the Load Instructions
The syntax for the load
instruction is:
Data
1
0
0xA
0xB
0xC
0xD
0x2
0x1
Example:
0x4
0x3
If we assume that A5 = 0x4 then:
0x6
0x5
(1) LDB *A5, A7 ; gives A7 = 0x00000001
0x8
0x7
LD *Rn,Rm
address
00000000
00000002
00000004
00000006
00000008
(2) LDH *A5,A7; gives A7 = 0x00000201
(3) LDW *A5,A7; gives A7 = 0x04030201
(4) LDDW *A5,A7:A6; gives A7:A6 =
0x0807060504030201
FFFFFFFF
16-bits
23
Using the Load Instructions
The syntax for the load
instruction is:
Data
1
0
0xA
0xB
0xC
0xD
0x2
0x1
Example:
0x4
0x3
If we assume that A5 = 0x4 then:
0x6
0x5
(1) LDB *A5, A7 ; gives A7 = 0x00000001
0x8
0x7
LD *Rn,Rm
address
00000000
00000002
00000004
00000006
00000008
(2) LDH *A5,A7; gives A7 = 0x00000201
(3) LDW *A5,A7; gives A7 = 0x04030201
(4) LDDW *A5,A7:A6; gives A7:A6 =
0x0807060504030201
FFFFFFFF
16-bits
24
Using the Load Instructions
The syntax for the load
instruction is:
Data
1
0
0xA
0xB
0xC
0xD
0x2
0x1
Example:
0x4
0x3
If we assume that A5 = 0x4 then:
0x6
0x5
(1) LDB *A5, A7 ; gives A7 = 0x00000001
0x8
0x7
LD *Rn,Rm
address
00000000
00000002
00000004
00000006
00000008
(2) LDH *A5,A7; gives A7 = 0x00000201
(3) LDW *A5,A7; gives A7 = 0x04030201
(4) LDDW *A5,A7:A6; gives A7:A6 =
0x0807060504030201
FFFFFFFF
16-bits
25
Using the Load Instructions
The syntax for the load
instruction is:
Data
1
0
0xA
0xB
0xC
0xD
0x2
0x1
Example:
0x4
0x3
If we assume that A5 = 0x4 then:
0x6
0x5
(1) LDB *A5, A7 ; gives A7 = 0x00000001
0x8
0x7
LD *Rn,Rm
address
00000000
00000002
00000004
00000006
00000008
(2) LDH *A5,A7; gives A7 = 0x00000201
(3) LDW *A5,A7; gives A7 = 0x04030201
(4) LDDW *A5,A7:A6; gives A7:A6 =
0x0807060504030201
FFFFFFFF
16-bits
26
Using the Load Instructions
The syntax for the load
instruction is:
LD *Rn,Rm
Question:
If data can only be accessed by the
load instruction and the .D unit,
how can we load the register
pointer Rn in the first place?
address
Data
0xA
0xB
0xC
0xD
0x2
0x1
0x4
0x3
0x6
0x5
0x8
0x7
00000000
00000002
00000004
00000006
00000008
FFFFFFFF
16-bits
27
Loading the Pointer Rn
• The instruction MVKL will allow a move of a 16-bit constant into
a register as shown below:
MVKL .?
a, A5
(‘a’ is a constant or label)
• How many bits represent a full address?
32 bits
• So why does the instruction not allow a 32-bit move?
All instructions are 32-bit wide (see instruction opcode).
28
Loading the Pointer Rn
• To solve this problem another instruction is available:
MVKH
eg.
MVKH
.?
a, A5
(‘a’ is a constant or label)
ah
al
a
ah
x
A5
• Finally, to move the 32-bit address to a register we can
use:
29
MVKL
a, A5
MVKH
a, A5
Loading the Pointer Rn
• Always use MVKL then MVKH, look at the following
examples:
Example 1
A5 = 0x87654321
MVKL
0x1234FABC, A5
A5 = 0xFFFFFABC (sign extension)
MVKH
0x1234FABC, A5
A5 = 0x1234FABC ; OK
Example 2
MVKH
A5 = 0x12344321
30
0x1234FABC, A5
MVKL
0x1234FABC, A5
A5 = 0xFFFFFABC ; Wrong
LDH, MVKL and MVKH
Register File A
A0
A1
a
x
A2
A3
A4
prod
Y
.M
.
.
.
.L
.D
A15
MVKL
MVKH
pt1, A5
pt1, A5
MVKL
MVKH
pt2, A6
pt2, A6
LDH
.D
*A5, A0
LDH
.D
*A6, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
32-bits
pt1 and pt2 point to some locations
Data Memory
31
in the data memory.
Creating a loop
So far we have only
implemented the SOP
for one tap only, i.e.
Y= a1 * x1
So let’s create a loop
so that we can
implement the SOP
for N Taps.
32
MVKL
MVKH
pt1, A5
pt1, A5
MVKL
MVKH
pt2, A6
pt2, A6
LDH
.D
*A5, A0
LDH
.D
*A6, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
Creating a loop
So far we have only
implemented the SOP
for one tap only, i.e.
Y= a1 * x1
So let’s create a loop
so that we can
implement the SOP
for N Taps.
33
With the C6000 processors
there are no dedicated
instructions such as block
repeat. The loop is created
using the B instruction.
What are the steps for creating a loop
1. Create a label to branch to.
2. Add a branch instruction, B.
3. Create a loop counter.
4. Add an instruction to decrement the loop counter.
5. Make the branch conditional based on the value in
the loop counter.
34
1. Create a label to branch to
loop
35
MVKL
MVKH
pt1, A5
pt1, A5
MVKL
MVKH
pt2, A6
pt2, A6
LDH
.D
*A5, A0
LDH
.D
*A6, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
2. Add a branch instruction, B.
loop
36
MVKL
MVKH
pt1, A5
pt1, A5
MVKL
MVKH
pt2, A6
pt2, A6
LDH
.D
*A5, A0
LDH
.D
*A6, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
B
.?
loop
Which unit is used by the B instruction?
Register File A
A0
A1
a
x
.S
prod
Y
.M
.M
.
.
.
.L
.L
A2
A3
loop
.D
.D
A15
32-bits
37
Data Memory
MVKL
MVKH
pt1, A5
pt1, A5
MVKL
MVKH
pt2, A6
pt2, A6
LDH
.D
*A5, A0
LDH
.D
*A6, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
B
.?
loop
Which unit is used by the B instruction?
Register File A
A0
A1
a
x
.S
prod
Y
.M
.M
.
.
.
.L
.L
A2
A3
loop
.D
.D
A15
32-bits
38
Data Memory
MVKL .S
MVKH .S
pt1, A5
pt1, A5
MVKL .S
MVKH .S
pt2, A6
pt2, A6
LDH
.D
*A5, A0
LDH
.D
*A6, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
B
.S
loop
3. Create a loop counter.
Register File A
A0
A1
a
x
.S
prod
Y
.M
.M
.
.
.
.L
.L
A2
A3
loop
.D
.D
A15
32-bits
39
Data Memory
MVKL .S
MVKH .S
pt1, A5
pt1, A5
MVKL .S
MVKH .S
MVKL .S
pt2, A6
pt2, A6
count, B0
LDH
.D
*A5, A0
LDH
.D
*A6, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
B
.S
loop
B registers will be introduced later
4. Decrement the loop counter
Register File A
A0
A1
a
x
.S
prod
Y
.M
.M
.
.
.
.L
.L
A2
A3
loop
.D
.D
A15
32-bits
40
Data Memory
MVKL .S
MVKH .S
pt1, A5
pt1, A5
MVKL .S
MVKH .S
MVKL .S
pt2, A6
pt2, A6
count, B0
LDH
.D
*A5, A0
LDH
.D
*A6, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
SUB
.S
B0, 1, B0
B
.S
loop
5. Make the branch conditional based on the value
in the loop counter
• What is the syntax for making instruction conditional?
[condition] Instruction
Label
e.g.
[B1] B
loop
(1) The condition can be one of the following
registers: A1, A2, B0, B1, B2.
(2) Any instruction can be conditional.
41
5. Make the branch conditional based on the value
in the loop counter
• The condition can be inverted by adding the exclamation symbol “!”
as follows:
[!condition] Instruction
Label
e.g.
42
[!B0]
B
loop ;branch if B0 = 0
[B0] B
loop
;branch if B0 != 0
5. Make the branch conditional
MVKL .S2 pt1, A5
MVKH .S2 pt1, A5
Register File A
A0
A1
a
x
.S
prod
Y
.M
.M
.
.
.
.L
.L
A2
A3
loop
.D
.D
A15
32-bits
43
MVKL .S2 pt2, A6
MVKH .S2 pt2, A6
MVKL .S2 count, B0
Data Memory
[B0]
LDH
.D
*A5, A0
LDH
.D
*A6, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
SUB
.S
B0, 1, B0
B
.S
loop
More on the Branch Instruction (1)
 With this processor all the instructions are
encoded in a 32-bit.
 Therefore the label must have a dynamic range
of less than 32-bit as the instruction B has to be
coded.
32-bit
B
 Case 1:
21-bit relative address
B .S1
label
 Relative branch.
 Label limited to +/- 220 offset.
44
More on the Branch Instruction (2)
 By specifying a register as an operand instead of
a label, it is possible to have an absolute branch.
 This will allow a dynamic range of 232.
32-bit
B
 Case 2:
B .S2
 Absolute branch.
 Operates on .S2 ONLY!
45
5-bit register
code
register
Testing the code
MVKL .S2 pt1, A5
MVKH .S2 pt1, A5
MVKL .S2 pt2, A6
MVKH .S2 pt2, A6
MVKL .S2 count, B0
This code performs the following
operations:
loop
LDH
.D
*A5, A0
LDH
.D
*A6, A1
MPY
.M
A0, A1, A3
However, we would like to perform:
ADD
.L
A4, A3, A4
a0*x0 + a1*x1 + a2*x2 + … + aN*xN
SUB
.S
B0, 1, B0
B
.S
loop
a0*x0 + a0*x0 + a0*x0 + … + a0*x0
[B0]
46
Modifying the pointers
MVKL .S2 pt1, A5
MVKH .S2 pt1, A5
MVKL .S2 pt2, A6
MVKH .S2 pt2, A6
MVKL .S2 count, B0
The solution is to modify the pointers
loop
A5 and A6.
[B0]
47
LDH
.D
*A5, A0
LDH
.D
*A6, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
SUB
.S
B0, 1, B0
B
.S
loop
Indexing Pointers
Syntax
Description
*R
Pointer
Pointer
Modified
No
In this case the pointers are used but not modified.
R can be any register
48
Indexing Pointers
Syntax
Description
*R
*+R[disp]
*-R[disp]
Pointer
+ Pre-offset
- Pre-offset
Pointer
Modified
No
No
No
In this case the pointers are modified BEFORE being used
and RESTORED to their previous values.



49
[disp] specifies the number of elements size in DW (64-bit), W
(32-bit), H (16-bit), or B (8-bit).
disp = R or 5-bit constant.
R can be any register.
Indexing Pointers
Syntax
Description
*R
*+R[disp]
*-R[disp]
*++R[disp]
*--R[disp]
Pointer
+ Pre-offset
- Pre-offset
Pre-increment
Pre-decrement
Pointer
Modified
No
No
No
Yes
Yes
In this case the pointers are modified BEFORE being used
and NOT RESTORED to their Previous Values.
50
Indexing Pointers
Syntax
Description
*R
*+R[disp]
*-R[disp]
*++R[disp]
*--R[disp]
*R++[disp]
*R--[disp]
Pointer
+ Pre-offset
- Pre-offset
Pre-increment
Pre-decrement
Post-increment
Post-decrement
Pointer
Modified
No
No
No
Yes
Yes
Yes
Yes
In this case the pointers are modified AFTER being used
and NOT RESTORED to their Previous Values.
51
Indexing Pointers
Syntax
Description
*R
*+R[disp]
*-R[disp]
*++R[disp]
*--R[disp]
*R++[disp]
*R--[disp]
Pointer
+ Pre-offset
- Pre-offset
Pre-increment
Pre-decrement
Post-increment
Post-decrement



52
Pointer
Modified
No
No
No
Yes
Yes
Yes
Yes
[disp] specifies # elements - size in DW, W, H, or B.
disp = R or 5-bit constant.
R can be any register.
Modify and testing the code
MVKL .S2 pt1, A5
MVKH .S2 pt1, A5
MVKL .S2 pt2, A6
MVKH .S2 pt2, A6
MVKL .S2 count, B0
This code now performs the following
loop
operations:
a0*x0 + a1*x1 + a2*x2 + ... + aN*xN
[B0]
53
LDH
.D
*A5++, A0
LDH
.D
*A6++, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
SUB
.S
B0, 1, B0
B
.S
loop
Store the final result
MVKL .S2 pt1, A5
MVKH .S2 pt1, A5
MVKL .S2 pt2, A6
MVKH .S2 pt2, A6
MVKL .S2 count, B0
This code now performs the following
loop
operations:
a0*x0 + a1*x1 + a2*x2 + ... + aN*xN
[B0]
54
LDH
.D
*A5++, A0
LDH
.D
*A6++, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
SUB
.S
B0, 1, B0
B
.S
loop
STH
.D
A4, *A7
Store the final result
MVKL .S2 pt1, A5
MVKH .S2 pt1, A5
MVKL .S2 pt2, A6
MVKH .S2 pt2, A6
MVKL .S2 count, B0
loop
The Pointer A7 has not been initialized.
[B0]
55
LDH
.D
*A5++, A0
LDH
.D
*A6++, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
SUB
.S
B0, 1, B0
B
.S
loop
STH
.D
A4, *A7
Store the final result
MVKL .S2 pt1, A5
MVKH .S2 pt1, A5
MVKL .S2 pt2, A6
MVKH .S2 pt2, A6
MVKL .S2 pt3, A7
MVKH .S2 pt3, A7
MVKL .S2 count, B0
The Pointer A7 is now initialized.
loop
[B0]
56
LDH
.D
*A5++, A0
LDH
.D
*A6++, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
SUB
.S
B0, 1, B0
B
.S
loop
STH
.D
A4, *A7
What is the initial value of A4?
MVKL .S2 pt1, A5
MVKH .S2 pt1, A5
MVKL .S2 pt2, A6
MVKH .S2 pt2, A6
A4 is used as an accumulator,
so it needs to be reset to zero.
loop
[B0]
57
MVKL
MVKH
MVKL
ZERO
LDH
.S2
.S2
.S2
.L
.D
pt3, A7
pt3, A7
count, B0
A4
*A5++, A0
LDH
.D
*A6++, A1
MPY
.M
A0, A1, A3
ADD
.L
A4, A3, A4
SUB
.S
B0, 1, B0
B
.S
loop
STH
.D
A4, *A7
Increasing the processing power!
Register File A
A0
A1
A2
A3
A4
.S1
.M1
.
.
.
How can we add more
processing power to
this processor?
.L1
.D1
A15
32-bits
Data Memory
58
Increasing the processing power!
Register File A
A0
A1
A2
A3
A4
.S1
.M1
.
.
.
(1) Increase the clock
frequency.
(2) Increase the number
of Processing units.
.L1
.D1
A15
32-bits
Data Memory
59
To increase the Processing Power, this processor has two
sides (A and B or 1 and 2)
Register File A
Register File B
A0
B0
.S1
.S2
A1
B1
A2
B2
A3
B3
.M1
.M2
A4
B4
.
.
.
A15
.L1
.L2
.D1
.D2
32-bits
B15
32-bits
Data Memory
60
.
.
.
Can the two sides exchange operands in order to increase
performance?
Register File A
Register File B
A0
B0
.S1
.S2
A1
B1
A2
B2
A3
B3
.M1
.M2
A4
B4
.
.
.
A15
.L1
.L2
.D1
.D2
32-bits
B15
32-bits
Data Memory
61
.
.
.
The answer is YES but there are limitations.

To exchange operands between the two
sides, some cross paths or links are
required.
What is a cross path?
62

A cross path links one side of the CPU to
the other.

There are two types of cross paths:

Data cross paths.

Address cross paths.
Data Cross Paths
63

Data cross paths can also be referred to
as register file cross paths.

These cross paths allow operands from
one side to be used by the other side.

There are only two cross paths:

one path which conveys data from side B
to side A, 1X.

one path which conveys data from side A
to side B, 2X.
TMS320C67x Data-Path
64
Data Cross Paths
65

Data cross paths only apply to the .L, .S
and .M units.

The data cross paths are very useful,
however there are some limitations in
their use.
Data Cross Path Limitations
<dst>
.L1
.M1
.S1
<src>
(1) The destination register must be
on same side as unit.
(2) Source registers - up to one cross
path per execute packet per side.
Execute packet: group of instructions that
execute simultaneously.
66
A
<src>
2x
1x
B
Data Cross Path Limitations
<dst>
eg:
ADD
MPY
SUB
|| ADD
67
.L1
.M1
.S1
.L1x
.M1x
.S1x
.L1x
A
<src>
<src>
A0,A1,B2
A0,B6,A9
A8,B2,A8
A0,B0,A2
|| Means that the SUB and ADD
belong to the same fetch packet,
therefore execute
simultaneously.
2x
1x
B
Data Cross Path Limitations
<dst>
eg:
ADD
MPY
SUB
|| ADD
.L1
.M1
.S1
.L1x
.M1x
.S1x
.L1x
68
<src>
A0,A1,B2
A0,B6,A9
A8,B2,A8
A0,B0,A2
NOT VALID!
A
<src>
2x
1x
B
Data Cross Paths for both sides
<dst>
<dst>
69
.L1
.M1
.S1
.L2
.M2
.S2
A
<src>
<src>
2x
<src>
<src>
1x
B
Address cross paths
Data
Addr
A
.D1
(1) The pointer must be on the same
side of the unit.
LDW.D1T1 *A0,A5
STW.D1T1 A5,*A0
70
Load or store to either side
Data1
DA1 = T1
DA2 = T2
Data2
71
A5
.D1
*A0
LDW.D1T1 *A0,A5
LDW.D1T2 *A0,B5
B5
A
B
Standard Parallel Loads
Data1
DA1 = T1
DA2 = T2
A5
.D1
*A0
.D2
*B0
A
B
LDW.D1T1 *A0,A5
|| LDW.D2T2 *B0,B5
72
B5
Parallel Load/Store using address cross
paths
Data1
DA1 = T1
DA2 = T2
A5
.D1
*A0
.D2
*B0
A
B
LDW.D1T2 *A0,B5
|| STW.D2T1 A5,*B0
73
B5
Fill the blanks ... Does this work?
Data1
DA1 = T1
DA2 = T2
.D1
*A0
.D2
*B0
A
B
LDW.D1__ *A0,B5
|| STW.D2__ B6,*B0
74
Not Allowed!
Parallel accesses: both cross or neither cross
Data1
DA2 = T2
.D1
*A0
.D2
*B0
B5
LDW.D1T2 *A0,B5
B6
|| STW.D2T2 B6,*B0
75
A
B
Conditions Don’t Use Cross Paths
• If a conditional register comes from the opposite side,
it does NOT use a
data or address cross-path.
• Examples:
[B2]
[A1]
76
ADD
LDW
.L1
.D2
A2,A0,A4
*B0,B5
‘C62x Data-Path Summary
CPU
Ref Guide
Full CPU Datapath
(Pg 2-2)
77
‘C67x Data-Path Summary
78
‘C67x
Cross Paths - Summary

•

• Address

79
Data
– Destination register on same side as unit.
– Source registers - up to one cross path per execute
packet per side.
– Use “x” to indicate cross-path.
– Pointer must be on same side as unit.
– Data can be transferred to/from either side.
– Parallel accesses: both cross or neither cross.
Conditionals Don’t Use Cross Paths.
Code Review (using side A only)
40
Y = 
n = 1
MVK
loop: LDH
LDH
MPY
ADD
SUB
[A2] B
STH
.S1
.D1
.D1
.M1
.L1
.L1
.S1
.D1
an * xn
40, A2
*A5++, A0
*A6++, A1
A0, A1, A3
A3, A4, A4
A2, 1, A2
loop
A4, *A7
; A2 = 40, loop count
; A0 = a(n)
; A1 = x(n)
; A3 = a(n) * x(n)
; Y = Y + A3
; decrement loop count
; if A2  0, branch
; *A7 = Y
Note: Assume that A4 was previously cleared and the pointers are initialised.
80
Dual Resources : Twice as Nice
Register File A
A0
A1
A2
A3
A4
A5
A6
A7
..
A15
or
A31
104
cn
xn
cnt
prd
sum
*c
*x
*y
..
32-bits
Register File B
.S1
.S2
.M1
.M2
.L1
.L2
.D1
.D2
..
32-bits
B0
B1
B2
B3
B4
B5
B6
B7
..
B15
or
B31
Our final view of the sum of products example...
Optional - Resource Specific Coding
40
y = 
Register File A
A0
A1
A2
A3
A4
A5
A6
A7
..
A15
or
A31
105
cn
xn
cnt
prd
sum
*c
*x
*y
..
32-bits
cn * xn
n = 1
.S1
loop:
.M1
.L1
MVK
.S1
40, A2
LDH
.D1
*A5++, A0
LDH
.D1
*A6++, A1
MPY
.M1
A0, A1, A3
ADD
.L1
A4, A3, A4
SUB
.S1
A2, 1, A2
.S1
loop
.D1
A4, *A7
[A2] B
.D1
STW
It’s easier to use symbols rather than
register names, but you can use
either method.
TMS320C6000 Instruction Set
106
'C6000 System Block Diagram
External
Memory
Internal Buses
.D1 .D2
.M1 .M2
.L1 .L2
.S1 .S2
Reggister Set B
Register Set A
P
E
R
I
P
H
E
R
A
L
S
On-chip
Memory
CPU
107
To summarize each units’ instructions ...
‘C62x RISC-like instruction set
.S Unit
.S
.L
.D
ADD
ADDK
ADD2
AND
B
CLR
EXT
MV
MVC
MVK
MVKH
NEG
NOT
OR
SET
SHL
SHR
SSHL
SUB
SUB2
XOR
ZERO
108
ABS
ADD
AND
CMPEQ
CMPGT
CMPLT
LMBD
MV
NEG
NORM
NOT
OR
SADD
SAT
SSUB
SUB
SUBC
XOR
ZERO
.M Unit
.D Unit
.M
.L Unit
ADD
NEG
ADDAB (B/H/W) STB
(B/H/W)
LDB
(B/H/W) SUB
SUBAB (B/H/W)
MV
ZERO
MPY
MPYH
MPYLH
MPYHL
SMPY
SMPYH
No Unit Used
NOP
IDLE
'C62x RISC-Like Instruction Set (by category)
Arithmetic
Logical
ABS
ADD
ADDA
ADDK
ADD2
MPY
MPYH
NEG
SMPY
SMPYH
SADD
SAT
SSUB
SUB
SUBA
SUBC
SUB2
ZERO
AND
CMPEQ
CMPGT
CMPLT
NOT
OR
SHL
SHR
SSHL
XOR
Bit Mgmt
CLR
EXT
LMBD
NORM
SET
Data Mgmt
LDB/H/W
MV
MVC
MVK
MVKL
MVKH
MVKLH
STB/H/W
Program Ctrl
B
IDLE
NOP
Note: Refer to the 'C6000 CPU Reference Guide for more details.
109
‘C67x: Superset of Fixed-Point (by units)
.S Unit
.S
.L
.D
ADD
ADDK
ADD2
AND
B
CLR
EXT
MV
MVC
MVK
MVKH
NEG
NOT
OR
SET
SHL
SHR
SSHL
SUB
SUB2
XOR
ZERO
ABSSP
ABSDP
CMPGTSP
CMPEQSP
CMPLTSP
CMPGTDP
CMPEQDP
CMPLTDP
RCPSP
RCPDP
RSQRSP
RSQRDP
SPDP
.D Unit
.M
110
ADD
NEG
ADDAB (B/H/W) STB
(B/H/W)
LDB
(B/H/W) SUB
LDDW
SUBAB (B/H/W)
MV
ZERO
.L Unit
ABS
ADD
AND
CMPEQ
CMPGT
CMPLT
LMBD
MV
NEG
NORM
NOT
OR
SADD
SAT
SSUB
SUB
SUBC
XOR
ZERO
ADDSP
ADDDP
SUBSP
SUBDP
INTSP
INTDP
SPINT
DPINT
SPRTUNC
DPTRUNC
DPSP
.M Unit
MPY
MPYH
MPYLH
MPYHL
SMPY
SMPYH
MPYSP
MPYDP
MPYI
MPYID
No Unit Required
NOP
IDLE
'C64x: Superset of ‘C62x Instruction Set
.S
.D
111
Dual/Quad Arith
SADD2
SADDUS2
SADD4
Data Pack/Un
PACK2
PACKH2
PACKLH2
PACKHL2
Bitwise Logical UNPKHU4
ANDN
UNPKLU4
Shifts & Merge SWAP2
SPACK2
SHR2
SPACKU4
SHRU2
SHLMB
SHRMB
Dual Arithmetic Mem Access
ADD2
LDDW
SUB2
LDNW
LDNDW
Bitwise Logical STDW
AND
STNW
ANDN
STNDW
OR
XOR
Load Constant
MVK (5-bit)
Address Calc.
ADDAD
Compares
CMPEQ2
CMPEQ4
CMPGT2
CMPGT4
.L
Branches/PC
BDEC
BPOS
BNOP
ADDKPC
Dual/Quad Arith
ABS2
ADD2
ADD4
MAX
MIN
SUB2
SUB4
SUBABS4
Bitwise Logical
ANDN
.M
Average
AVG2
AVG4
Shifts
ROTL
SSHVL
SSHVR
Data Pack/Un
PACK2
PACKH2
PACKLH2
PACKHL2
PACKH4
PACKL4
UNPKHU4
UNPKLU4
SWAP2/4
Multiplies
MPYHI
Shift & Merge
MPYLI
SHLMB
MPYHIR
SHRMB
MPYLIR
Load Constant
MPY2
MVK (5-bit)
SMPY2
Bit Operations DOTP2
DOTPN2
BITC4
DOTPRSU2
BITR
DOTPNRSU2
DEAL
DOTPU4
SHFL
DOTPSU4
Move
GMPY4
MVD
XPND2/4
Sample C62x Compiler Benchmarks
Algorithm
Used In
Asm
Cycles
Assembly Time
(s)
C Cycles
(Rel 4.0)
C Time (s)
% Efficiency vs
Hand Coded
Block Mean Square Error
MSE of a 20 column image matrix
For motion
compensation of
image data
348
1.16
402
1.34
87%
Codebook Search
CELP based voice
coders
977
3.26
961
3.20
100%
Vector Max
40 element input vector
Search Algorithms
61
0.20
59
0.20
100%
VSELP based voice
coders
238
0.79
280
0.93
85%
Search Algorithms
1185
3.95
1318
4.39
90%
IIR Filter
16 coefficients
Filter
43
0.14
38
0.13
100%
IIR – cascaded biquads
10 Cascaded biquads (Direct Form
II)
Filter
70
0.23
75
0.25
93%
VSELP based voice
coders
61
0.20
58
0.19
100%
51
0.17
47
0.16
100%
279
0.93
274
0.91
100%
All-zero FIR Filter
40 samples,
10 coefficients
Minimum Error Search
Table Size = 2304
MAC
Two 40 sample vectors
Vector Sum
Two 44 sample vectors
MSE
MSE between two 256 element
vectors


112
Mean Sq. Error
Computation in
Vector Quantizer
Completely natural C code (non ’C6000 specific)
Code available at: http://www.ti.com/sc/c6000compiler
TI C62x™ Compiler Performance Release 4.0: Execution Time in s @ 300 MHz
Versus hand-coded assembly based on cycle count
Sample Imaging & Telecom Benchmarks
DSP & Image Processing
Kernels
Cycle Count
Performance
C62x
C64x
Reed Solomon Decode: Syndrome
Accumulation
(204,188,8) Packet
1680
470
Viterbi Decode (GSM)
(16 states)
38.25
FFT - Radix 4 - Complex
(size = N log (N)) (16-bit)
12.7
Polyphase Filter - Image Scaling
(8-bit)
0.77
Correlation - 3x3
(8-bit)
4.5
Median Filter - 3x3
(8-bit)
9.0
Motion Estimation - 8x8 MAD (8-bit)
cycles/packet
14*
cycles/output
6.0
cycles/data
0.33
cycles/output/filter tap
1.28
cycles/pixel
2.1
cycles/pixel
0.953
0.126
cycles/pixel
*
113
Includes traceback
Cycle Improvement
C64:C62
720MHz C64x vs
300MHz C62x
3.5x
8.4x
2.7x
6.5x
2.1x
5x
2.3x
5.5x
3.5x
8.4x
4.3x
10.3x
7.6x
18.2x
Sample C62x Compiler Benchmarks
Algorithm
Used In
Asm
Cycles
Assembly Time
(s)
C Cycles
(Rel 4.0)
C Time (s)
% Efficiency vs
Hand Coded
Block Mean Square Error
MSE of a 20 column image matrix
For motion
compensation of
image data
348
1.16
402
1.34
87%
Codebook Search
CELP based voice
coders
977
3.26
961
3.20
100%
Vector Max
40 element input vector
Search Algorithms
61
0.20
59
0.20
100%
VSELP based voice
coders
238
0.79
280
0.93
85%
Search Algorithms
1185
3.95
1318
4.39
90%
IIR Filter
16 coefficients
Filter
43
0.14
38
0.13
100%
IIR – cascaded biquads
10 Cascaded biquads (Direct Form
II)
Filter
70
0.23
75
0.25
93%
VSELP based voice
coders
61
0.20
58
0.19
100%
51
0.17
47
0.16
100%
279
0.93
274
0.91
100%
All-zero FIR Filter
40 samples,
10 coefficients
Minimum Error Search
Table Size = 2304
MAC
Two 40 sample vectors
Vector Sum
Two 44 sample vectors
MSE
MSE between two 256 element
vectors


114
Mean Sq. Error
Computation in
Vector Quantizer
Completely natural C code (non ’C6000 specific)
Code available at: http://www.ti.com/sc/c6000compiler
TI C62x™ Compiler Performance Release 4.0: Execution Time in s @ 300 MHz
Versus hand-coded assembly based on cycle count
TMS320C6000 Memory
116
Memory size per device
Devices
C6201,
C6204,
C6701
C6205
EMIFA
EMIFB
P
D
=
=
64 kB
64 kB
C6202
P
D
=
=
256 kB
128 kB
C6203
P
D
=
=
384 kB
512 kB
L1P
L1D
L2
=
=
=
4 kB
4 kB
64 kB
C6713
L1P
L1D
L2
=
=
=
4 kB
4 kB
256 kB
128M Bytes
(32-bits wide)
N/A
C6411
DM642
L1P
L1D
L2
=
=
=
16 kB
16 kB
256 kB
128M Bytes
(32-bits wide)
N/A
C6414
C6415
C6416
L1P
L1D
L2
=
=
=
16 kB
16 kB
1 MB
256M Bytes
(64-bits wide)
C6211
C6711
C6712
117
Internal
52M Bytes
(32-bits wide)
128M Bytes
(32-bits wide)
N/A
N/A
64M Bytes
(16-bits wide)
64M Bytes
(16-bits wide)
Internal Memory Summary
Devices
Internal
(L2)
C6211
C6711
C6713
64 kB
512M
(32-bit wide)
C6712
256 kB
512M
(16-bit wide)
Devices
Internal
(L2)
C6414
C6415
C6416
1 MB
DM642
256 kB
C6411
256 kB
LINK: TMS320C6000 DSP Generation
118
External
External
A: 1GB
B: 256kB
(64-bit)
(16-bit)
1GB (64-bit)
256MB (32-bit)
Performance
Making use of Parallelism
Given this simple loop …
40
y = 
cn * xn
n = 1
c
x
cnt
prod
y
*cp
*xp
*yp
.S1
short mac(short *c, short *x, int count) {
for (i=0; i < count; i++) {
sum += c[i] * x[i]; } …
.M1
.S1
40, cnt
LDH
.D1
*cp++, c
LDH
.D1
*xp++, x
MPY
.M1
c, x, prod
ADD
.L1
y, prod, y
SUB
.L1
cnt, 1, cnt
B
.S1
loop
STW
.D
y, *yp
loop:
.L1
.D1
[cnt]
120
MVK
How many of these instructions can we get in parallel?
C62x Intense Parallelism
short mac(short *c, short *x, int count) {
for (i=0; i < count; i++) {
sum += c[i] * x[i]; } …
MPY
||
MPYH
|| [B0] B
||
LDW
||
LDW
.M2
.M1
.S1
.D1
.D2
B7,A3,B4
B7,A3,A5
L3
*A4++,A3
*B6++,B7
L2: ; PIPED LOOP PROLOG
MPY .M2 B7,A3,B4
||
MPYH .M1 B7,A3,A5
Given this
code
LDW C
.D1
*A4++,A3
|| [B0] B
.S1 L3
||
LDW .D2 *B6++,B7
||
LDW .D1 *A4++,A3
The C62x compiler can achieve||
LDW .D2 *B6++,B7
LDW .D1 *A4++,A3
;** -----------------------*
Two
|| Sum-of-Products
LDW .D2 *B6++,B7 per cycle
L3:
[B0] B
.S1 L3
||
LDW .D1 *A4++,A3
||
LDW .D2 *B6++,B7
[B0] B
.S1 L3
||
LDW .D1 *A4++,A3
||
LDW .D2 *B6++,B7
[B0] B
.S1 L3
||
LDW .D1 *A4++,A3
||
LDW .D2 *B6++,B7
||
||
||
||
||
||
||
; PIPED LOOP KERNEL
ADD .L2 B4,B5,B5
ADD .L1 A5,A0,A0
MPY .M2 B7,A3,B4
MPYH .M1 B7,A3,A5
[B0]B
.S1 L3
[B0]SUB .S2 B0,1,B0
LDW .D1 *A4++,A3
LDW .D2 *B6++,B7
;** -----------------------*
121
What about the ‘C67x?
C67x MAC using Natural C
Memory
The C67x compiler gets two 32-bit
A0
B0
.D1
.D2
floating-point
Sum-of-Products per iteration
.M1
.M2
.L1
.L2
..
A15
..
.S1
.S2
Controller/Decoder
122
B15
float mac(float *c, float *x, int count)
{ int i, float sum = 0;
for (i=0; i < count; i++) {
sum += c[i] * x[i]; } …
;** --------------------------------------------------*
LOOP: ; PIPED LOOP KERNEL
LDDW .D1
A4++,A7:A6
||
LDDW .D2
B4++,B7:B6
||
MPYSP .M1X
A6,B6,A5
||
MPYSP .M2X
A7,B7,B5
||
ADDSP .L1
A5,A8,A8
||
ADDSP .L2
B5,B8,B8
|| [A1] B
.S2
LOOP
|| [A1] SUB
.S1
A1,1,A1
;** --------------------------------------------------*
Can the 'C64x do better?
C64x gets four MAC’s using DOTP2
short mac(short *c, short *x, int count)
{ int i, short sum = 0;
DOTP2
m1
m0
A5
n0
B5
x
n1
=
m1*n1 + m0*n0
A6
+
running sum
123
A7
for (i=0; i < count; i++) {
sum += c[i] * x[i]; } …
;** --------------------------------------------------*
; PIPED LOOP KERNEL
LOOP: ADD
.L2
B8,B6,B6
||
ADD
.L1
A6,A7,A7
||
DOTP2 .M2X B4,A4,B8
||
DOTP2 .M1X B5,A5,A6
|| [ B0] B
.S1
LOOP
|| [ B0] SUB
.S2
B0,-1,B0
||
LDDW .D2T2 *B7++,B5:B4
||
LDDW .D1T1 *A3++,A5:A4
;** --------------------------------------------------*
How many multiplies can the ‘C6x perform?
MMAC’s

How many 16-bit MMACs (millions of MACs per second)
can the 'C6201 perform?
400 MMACs

(two .M units x 200 MHz)
How about 16x16 MMAC’s on the ‘C64x devices?
2 .M units
x
2 16-bit MACs (per .M unit / per cycle)
x 720 MHz
---------------2880 MMACs

How many 8-bit MMACs on the ‘C64x?
5760 MMACs (on 8-bit data)
124
How Do We Get Such High Parallelism?
 Compiler and Assembly Optimizer use a technique called Software Pipelining
 Software pipelining enables high performance
(esp. on DSP-type loops)
 Key point: Tools do all the work!
125
What is software pipelining?
Let's look at a simple example ...
Tools Use Software Pipelining
Here’s a simple example to demonstrate ...
LDH
||
How many cycles would
it take to perform this
loop 5 times?
LDH
MPY
5 x 3 = 15
______________
cycles
ADD
126
Our functional units could be used like ...
Without Software Pipelining
Cycle
.D1
.D2
1
ldh
ldh
2
.M1
ldh
.S1
.S2
ldh
mpy
6
127
.L2
add
5
7
.L1
mpy
3
4
.M2
add
ldh
ldh
In seven cycles, we’re almost half-way done ...
With Software Pipelining
Cycle
.D1
.D2
1
ldh
ldh
2
ldh
ldh
mpy
3
ldh
ldh
mpy
add
4
ldh
ldh
mpy
add
Completes
in only 7 cycles
5
ldh
ldh
mpy
add
mpy
add
6
7
128
.M1
.M2
.L1
.L2
.S1
.S2
add
It takes 1/2 the time! How does this translate to code?
S/W Pipelining Translated to Code
c1:
Cycle
1
.D1
ldh
.D2
ldh
2
ldh
ldh
mpy
3
ldh
ldh
mpy
c2:
add
4
ldh
ldh
mpy
add
5
ldh
ldh
mpy
add
mpy
add
6
7
129
||
add
.S1
LDH
LDH
.S2
||
||
MPY
LDH
LDH
||
||
||
ADD
MPY
LDH
LDH
c3:
DSK
Code Composer Studio
C6416 DSK
131
Diagnostic Utility included with DSK ...
C6416 DSK
132
Diagnostic Utility included with DSK ...
DSK’s Diagnostic Utility
133

Test/Diagnose
DSK hardware

Verify USB
emulation link

Use Advanced
tests to facilitate
debugging

Reset DSK
hardware
CCS Overview ...
Code Composer Studio
Standard
Runtime
Libraries
Compiler
Asm Opto
SIM
DSK
Asm
Edit
Link
.out
Debug
EVM
DSP/BIOS
Config
Tool
DSP/BIOS
Libraries
Third
Party
DSK’s Code Composer Studio Includes:
 Integrated Edit / Debug GUI  Simulator
 Code Generation Tools
 BIOS: Real-time kernel
Real-time analysis
134
XDS
DSP
Board
CCS is Project centric ...
Code Generation
Asm
Optimizer
Link.cmd
.sa
Asm
Editor
.asm
Linker
.obj
.out
.c / .cpp
.map
Compiler
135
What is a Project?
Project (.PJT) file contain:
References to files:



Source
Libraries
Linker, etc …
Project settings:



136
Compiler Options
DSP/BIOS
Linking, etc …
The project menu ...
Project Menu
Hint:
Project Menu
Create
andvia
open
projects
 Access
pull-down
menu
by right-clicking
.pjt file
fromorthe
Project menu,
in project explorer window
not the File menu.
Build Options...
137
Next slide
Build Options
-g -q -fr"c:\modem\Debug" -mv6700

138
Eight Categories of
Compiler options
The most common Compiler Options are ...
Compiler’s Build Options


debug
options

140
Nearly one-hundred compiler options available to
tune your code's performance, size, etc.
Following table lists the most common options:
Options
Description
-mv6700
-mv6400
-fr <dir>
-fs <dir>
-q
-g
-s
Generate ‘C67x code (‘C62x is default)
Generate 'C64x code
Directory for object/output files
Directory for assembly files
Quiet mode (display less info while compiling)
Enables src-level symbolic debugging
Interlist C statements into assembly listing
In Chapter 4 we will examine the options which
enable the compiler’s optimizer
And, the Config Tool ...
DSP/BIOS Configuration Tool
Simplifies system design by:





142
Automatically includes the appropriate
runtime support libraries
Automatically handles interrupt vectors
and system reset
Handles system memory configuration
(builds CMD file)
Generates 5 files when CDB file is saved:
 C file, Asm file, 2 header files and a
linker command (.cmd) file
More to be discussed later …
‘C6000 C Data Types
143
Type
Size
Representation
char, signed char
unsigned char
short
unsigned short
int, signed int
unsigned int
long, signed long
unsigned long
enum
float
double
long double
pointers
8 bits
8 bits
16 bits
16 bits
32 bits
32 bits
40 bits
40 bits
32 bits
32 bits
64 bits
64 bits
32 bits
ASCII
ASCII
2’s complement
binary
2s complement
binary
2’s complement
binary
2’s complement
IEEE 32-bit
IEEE 64-bit
IEEE 64-bit
binary
GEL Scripting
GEL: General Extension
Language
 C style syntax
 Large number of debugger
commands as GEL functions
 Write your own functions
 Create GEL menu items
146
CCS Scripting


147
Debug using VB Script or Perl
Using CCS Scripting, a simple script can:
 Start CCS
 Load a file
 Read/write memory
 Set/clear breakpoints
 Run, and perform other basic testing
functions
TCONF Scripting (CDB)
Tconf Script (.tcf)
hello_dsk62cfg.tcf
utils.loadPlatform("dsk6211");
utils.getProgObjs(prog);
LOG_system.bufLen = 128;
utils.importFile("hello");
prog.gen();
Tconf Include File (.tci)
var trace = LOG.create("trace");
trace.bufLen = 32;
/* load DSK6211 platform into TCOM */
/* make all prog objects JavaScript global vars */
/* set buffer length of LOG_system to 128 */
/* import portable application script */
/* generate cfg files (and CDB file) */
hello.tci
/* create a new user log, named trace */
/* initialize its length to 32 (words) */
Your Application
hello.c
#include <log.h>
extern LOG_Obj trace;
/* created in hello.tci */
int main() {
• A textual way to configure CDB files
LOG_printf(&trace, "Hello World!\n");
• Runs on both PC and Unix
return (0);
}
• Create #include type files (.tci)
• More flexible than Config Tool
148
Chapter 2
TMS320C6000 Architectural Overview
- End -
Download