OneDay8260Rev2_8.ppt

Goals
• Provide an overview of the 8260 device
• Allow a quick start of an 8260 design cycle
• Gain familiarity with debug issues particular to the 8260
• Create the basis to build further experience
[Rev 2.8]
1 of 107
Outline
• 8260 Architecture
• Application examples
• Debug considerations
Outline
• 8260 Architecture
– Device overview
– Core CPU
– SIU
– CPM
[Figure: 8260 block diagram. EC603e PowerPC core with 16 KB I-cache, IMMU, 16 KB D-cache, and DMMU. System Interface Unit: 60x bus interface unit, PowerPC-to-local bridge, local bus interface unit, memory controller, time counter/PIT, bus arbiter, L2 cache controller, and system functions. Communications Processor Module: 32-bit RISC and program ROM, interrupt controller, internal memory space, four timers, baud rate generators, parallel I/O, serial DMAs, and virtual IDMAs; MCC1, MCC2, FCC1, FCC2, FCC3, SCC1, SCC2, SCC3, SCC4, SMC1, SMC2, SPI, and I2C; serial interface and time slot assigner with 8 TDMs, MII, and 2 UTOPIA ports.]
CPU
• Based on the MPC603e core
• Up to two instructions fetched per clock
• Up to three instructions issued and retired per clock
• Up to five instructions in execution per clock
• Most instructions execute in one clock
• Branches can execute in zero clocks
Programming Model
[Figure: register set. General purpose registers GPR0-GPR31 (32 bits); floating point registers FPR0-FPR31 (64 bits); CR, XER, FPSCR, MSR, PVR, CTR, LR, TBU, TBL, SRR0, SRR1, DEC, and other special purpose registers (SPRn).]
MSR (bit 0 is the MSB, bit 31 is the LSB)
Bit(s)  Field  Description
0-12    -      Reserved (0)
13      POW    Power management enabled
14      -      Reserved (0)
15      ILE    Interrupt little endian mode
16      EE     External interrupt enable
17      PR     Privilege level
18      FP     Floating point available
19      ME     Machine check enable
20      FE0    Floating point exception mode 0
21      SE     Single step trace enabled
22      BE     Branch trace enabled
23      FE1    Floating point exception mode 1
24      -      Reserved (0)
25      IP     Exception [interrupt] prefix
26      IR     Instruction address translation enabled
27      DR     Data address translation enabled
28-29   -      Reserved (0)
30      RI     Recoverable exception
31      LE     Little endian mode
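Because PowerPC numbers bits from the MSB, MSR bit n corresponds to the C mask `1 << (31 - n)`, which is easy to get backwards. The sketch below builds masks for the fields above and decodes a raw MSR value on a host; the macro and function names are hypothetical, and on the target the value would come from an `mfmsr` instruction.

```c
#include <stdint.h>
#include <stdio.h>

/* Big-endian bit numbering: bit 0 is the MSB of the 32-bit register. */
#define PPC_BIT32(n) (UINT32_C(1) << (31 - (n)))

/* MSR fields at the bit positions listed above (hypothetical names). */
#define MSR_POW PPC_BIT32(13) /* power management enabled */
#define MSR_ILE PPC_BIT32(15) /* interrupt little endian mode */
#define MSR_EE  PPC_BIT32(16) /* external interrupt enable */
#define MSR_PR  PPC_BIT32(17) /* privilege level */
#define MSR_FP  PPC_BIT32(18) /* floating point available */
#define MSR_ME  PPC_BIT32(19) /* machine check enable */
#define MSR_FE0 PPC_BIT32(20) /* floating point exception mode 0 */
#define MSR_SE  PPC_BIT32(21) /* single step trace */
#define MSR_BE  PPC_BIT32(22) /* branch trace */
#define MSR_FE1 PPC_BIT32(23) /* floating point exception mode 1 */
#define MSR_IP  PPC_BIT32(25) /* exception prefix */
#define MSR_IR  PPC_BIT32(26) /* instruction address translation */
#define MSR_DR  PPC_BIT32(27) /* data address translation */
#define MSR_RI  PPC_BIT32(30) /* recoverable exception */
#define MSR_LE  PPC_BIT32(31) /* little endian mode */

/* Print the names of the MSR fields that are set, MSB first. */
static void msr_print(uint32_t msr)
{
    static const struct { uint32_t mask; const char *name; } fields[] = {
        { MSR_POW, "POW" }, { MSR_ILE, "ILE" }, { MSR_EE, "EE" },
        { MSR_PR, "PR" },   { MSR_FP, "FP" },   { MSR_ME, "ME" },
        { MSR_FE0, "FE0" }, { MSR_SE, "SE" },   { MSR_BE, "BE" },
        { MSR_FE1, "FE1" }, { MSR_IP, "IP" },   { MSR_IR, "IR" },
        { MSR_DR, "DR" },   { MSR_RI, "RI" },   { MSR_LE, "LE" },
    };
    for (size_t i = 0; i < sizeof fields / sizeof fields[0]; i++)
        if (msr & fields[i].mask)
            printf("%s ", fields[i].name);
    printf("\n");
}
```

For example, `msr_print(MSR_ME | MSR_IP | MSR_RI)` prints the names ME, IP, and RI, which is a typical state while running from the boot Flash with exceptions recoverable.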
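CPU Overview
[Figure: CPU block diagram. The sequential fetcher loads the instruction queue from the instruction cache (through the instruction MMU); dispatch sends instructions to the branch processing unit (CTR, CR, LR), the system register unit, the integer unit (GPR file R0-R31, XER, GP rename registers), the floating point unit (FPR file FPR0-FPR31, FP rename registers), and the load/store unit, which accesses the data cache through the data MMU. The completion unit retires results. Both caches connect to main memory.]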
Execution Units
• Execution units operate in parallel
– Fetch / Branch
– Integer
– Floating Point
– Load / Store
– System
– Completion
Fetch / Dispatch
• Instructions are fetched in pairs
• Non-branch instructions enter the instruction queue
• Branch instructions are redirected to the branch unit
• Two instructions can be sent to the execution units and one to the branch unit, for a total of three issued instructions per clock
• All instructions “appear” to execute sequentially
On each CPU clock:
– A 64-bit wide transfer from the instruction cache brings in two instructions
– Instructions fall through to the first open location in the instruction queue
– The branch instruction closest to the bottom of the queue is issued to the branch processing unit (CTR, CR, LR)
– The bottom two non-branch instructions are dispatched to available execution units
Branch
• Branches are pre-executed, giving an effective execution
time of zero clocks
• Instruction queue provides look ahead to determine data
dependencies
• Unresolved conditional branches are statically predicted
under control of the compiler
Subroutine Control Flow
• A branch-and-link instruction (“Branch to sub”) places the return address, the address of the instruction that follows it, into the Link Register (LR)
• The stack is software maintained; GPR1 is the stack pointer
• Instructions in the subroutine save the LR to the stack to allow nested function calls; the LR can then be reused for another call
• The LR is recalled from the stack to allow a return from subroutine
• Branching to the contents of the LR is a return instruction
Integer
• Integer unit directly accesses the GPR file
• Rename registers prevent stalls and allow instructions to be
un-executed
• Most instructions execute in one clock
• Divide latency has been reduced by 50% compared with the 603
Floating Point
• Floating point unit directly accesses the FPR file
• Rename registers prevent stalls and allow instructions to be
un-executed (The same as in the integer GPR file)
• Supports single (32 bit) and double (64 bit) precision
operands
• Three stage pipeline accepts one instruction per clock
• Supports all IEEE 754 floating-point data types
(normalized, denormalized, NaN, zero, and infinity) in
hardware, eliminating the latency incurred by software
exception routines
Load/Store
• Responsible for all transfers between the register files (GPRs and FPRs) and main memory
• Instructions appear to execute in order
• Actual accesses can occur out of order
• Loads from cache execute in one clock with a two clock latency
• Stores to cache execute in one clock with a latency of three
clocks
• Speculative loads are placed in the rename registers
• Speculative stores remain in the store queue
System
• Performs moves to and from SPRs
• Doubles as an auxiliary integer unit
– Executes add / compare instructions
– Executes condition register logical operations
• Instructions that affect processor mode force serialization
of the processor
Completion
• Holds instructions executed in parallel or out of order until
they can be retired in order
• Retiring an instruction commits its results to the processor state
• Simply discarding an instruction from the completion
queue effectively un-executes it
• Two instructions can be retired per clock
Instruction Set
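• 68K instructions were based on an accumulator, direct memory model: an arithmetic instruction can take an operand straight from memory
add (0x00035300).L, D4
[Figure: the long word at address 0x00035300 is added to data register D4 (one of D0-D7)]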
Instruction Set
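• PowerPC instructions are based on a triadic, load/store model
lwz r2,0x00035300
add r6,r2,r4
[Figure: the word at address 0x00035300 is loaded into GPR2; GPR2 and GPR4 are then added and the result is written to GPR6 (registers GPR0-GPR31)]
• The lwz line is shorthand: a load displacement is only 16 bits, so real code first forms the upper address bits in a base register, e.g. lis r3,0x0003 followed by lwz r2,0x5300(r3)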
Exceptions
• All exceptions cause processing to vector to a predetermined memory location
• The base address of the vector table is controlled by the [IP] bit in the MSR (nnn = 000 when MSR[IP] = 0, FFF when MSR[IP] = 1)
• Each vector is placed on a 256-byte (0x100) boundary
– Reset = 0xnnn00100
– Machine Check = 0xnnn00200
– External Interrupt = 0xnnn00500
– Decrementer = 0xnnn00900
– Etc.
• 64 instructions can be placed at a vector before hitting the next vector
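The vector arithmetic can be checked with a short host-side sketch: the table base is 0x00000000 or 0xFFF00000 depending on MSR[IP], vectors sit on 0x100-byte boundaries, and 0x100 bytes hold 64 four-byte instructions. The macro and function names are hypothetical; the offsets are the ones listed on these slides.

```c
#include <assert.h>
#include <stdint.h>

/* Vector offsets from this section (hypothetical macro names). */
#define VEC_RESET         0x0100u
#define VEC_MACHINE_CHECK 0x0200u
#define VEC_DSI           0x0300u
#define VEC_ISI           0x0400u
#define VEC_EXTERNAL      0x0500u
#define VEC_DECREMENTER   0x0900u

#define VEC_SPACING 0x100u /* vectors sit on 256-byte boundaries */
#define INSN_BYTES  4u     /* every PowerPC instruction is 32 bits */

/* Absolute address of a vector: MSR[IP] = 1 selects 0xFFFnnnnn,
   MSR[IP] = 0 selects 0x000nnnnn. */
static uint32_t vector_address(int msr_ip, uint32_t offset)
{
    assert((offset & (VEC_SPACING - 1u)) == 0); /* must be a vector boundary */
    return (msr_ip ? 0xFFF00000u : 0x00000000u) | offset;
}

/* Instructions that fit at one vector before running into the next one. */
static unsigned vector_slots(void)
{
    return VEC_SPACING / INSN_BYTES;
}
```

Because a handler longer than `vector_slots()` instructions would run into the next vector, the code placed at a vector usually just saves a little state and branches to the real handler elsewhere.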
Exceptions
[Figure: exception vector table. With MSR[IP] = 1 the vectors start at 0xFFF00100 (in Flash); with MSR[IP] = 0 they start at 0x00000100 (in RAM). Reset (offset 100), Machine Check (200), DSI (300), ISI (400), and External (500) each have room for 64 instructions.]
Exceptions
• Only the Decrementer and the External Interrupt can be
masked by the [EE] bit in the MSR
• Machine Check exceptions can vector to a routine or force
Checkstop state
• All other exceptions are synchronous (caused by
instruction execution) and are unmaskable
Nesting Exceptions
• When an exception occurs, return state is stored in the processor
– There is no automated stacking of critical registers
– The address of the return instruction is stored in SRR0
– The MSR prior to the exception is in SRR1
– The [EE] bit of the MSR is cleared
• The exception handler must save these registers, and any other GPRs it uses, to a software-maintained stack
• The EABI specifies GPR1 to be the stack pointer
• The [RI] bit in the MSR is set by software when enough
information is saved to allow recovery from a nested
exception
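The save/restore protocol can be modelled on a host to show why the handler has to push SRR0/SRR1 before anything can nest. This is a deliberately simplified sketch with hypothetical names that models only EE and RI; on the real core the hardware changes other MSR bits as well, and `rfi` restores the MSR from SRR1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_EE 0x00008000u /* MSR bit 16 */
#define MSR_RI 0x00000002u /* MSR bit 30 */

/* Only the state an exception touches (simplified model). */
struct cpu {
    uint32_t pc, msr, srr0, srr1;
};

/* One saved context on the software-maintained (GPR1) stack. */
struct frame {
    uint32_t srr0, srr1;
};

/* Hardware: save return state, clear EE and RI, jump to the vector. */
static void take_exception(struct cpu *c, uint32_t vector)
{
    c->srr0 = c->pc;  /* next instruction to execute */
    c->srr1 = c->msr; /* MSR before the exception */
    c->msr &= ~(MSR_EE | MSR_RI);
    c->pc = vector;
}

/* Handler head: push SRR0/SRR1, then mark the state recoverable. */
static void handler_prologue(struct cpu *c, struct frame *stack, int *sp)
{
    stack[*sp].srr0 = c->srr0;
    stack[*sp].srr1 = c->srr1;
    (*sp)++;
    c->msr |= MSR_RI;
}

/* Handler tail: clear RI, pop SRR0/SRR1, then rfi. */
static void handler_epilogue_rfi(struct cpu *c, struct frame *stack, int *sp)
{
    assert(*sp > 0);
    c->msr &= ~MSR_RI;
    (*sp)--;
    c->srr0 = stack[*sp].srr0;
    c->srr1 = stack[*sp].srr1;
    c->pc = c->srr0;  /* rfi: resume at SRR0 ... */
    c->msr = c->srr1; /* ... with the MSR from SRR1 */
}

/* A nested handler can check SRR1[RI] to see whether the code it
   interrupted had already saved its own SRRs. */
static bool interrupted_state_recoverable(const struct cpu *c)
{
    return (c->srr1 & MSR_RI) != 0;
}
```

If a second exception is taken before the prologue runs, SRR0 is overwritten with an address inside the first handler and SRR1[RI] reads 0, so the original return address is lost; that is the window the [RI] bit flags.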
Exception Control Flow
[Figure: exception control flow from the interrupted code through the ISR and back]
• An exception after the completion of an instruction directs flow to the ISR; the hardware places the address of the next instruction into SRR0 and the previous MSR into SRR1
• Instructions at the start of the ISR save the SRRs to the software-maintained stack (GPR1) to allow nested exceptions
• The MSR[RI] bit is cleared by the exception hardware and set by software after the SRRs have been saved
• An exception while MSR[RI] is cleared is not recoverable: it overwrites SRR0 and SRR1 before they have been saved
• It is safe for exceptions to occur in the section of code between setting and clearing MSR[RI]
• The MSR[RI] bit is cleared by software just before the SRRs are restored
• The SRRs are recalled from the stack, and rfi returns from the exception
• Breakpoints are exceptions!
Cache
• Independent instruction and data caches implement an internal Harvard architecture
• Each cache is 16 KB, four-way set associative
• Caching of separate memory areas is controlled by the
MMU
Cache Organization
[Figure: cache organization. Each cache has 128 sets of four ways (blocks 0-511); each block holds an address tag, state bits, and words 0-7 (32 bytes). A 32-bit address (bit 0 = MSB) divides into a 20-bit address tag (bits 0-19), a 7-bit set select (bits 20-26), a word select (bits 27-29), and a byte select (bits 30-31).]
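The address split in the figure reduces to a few shifts and masks. A host-side sketch with hypothetical names; it only computes where an address would live in a 16 KB, four-way cache and does not model replacement or MEI state.

```c
#include <stdint.h>

/* Geometry from the figure: 128 sets x 4 ways x 32-byte blocks = 16 KB. */
#define CACHE_SETS        128u
#define CACHE_WAYS        4u
#define CACHE_BLOCK_BYTES 32u

/* Where an address falls in the cache (bit 0 = MSB of the address). */
struct cache_index {
    uint32_t tag;  /* bits 0-19: 20-bit address tag */
    uint32_t set;  /* bits 20-26: 7-bit set select, 0..127 */
    uint32_t word; /* bits 27-29: word 0..7 within the block */
    uint32_t byte; /* bits 30-31: byte 0..3 within the word */
};

static struct cache_index cache_split(uint32_t addr)
{
    struct cache_index ci;
    ci.tag = addr >> 12;
    ci.set = (addr >> 5) & (CACHE_SETS - 1u);
    ci.word = (addr >> 2) & 7u;
    ci.byte = addr & 3u;
    return ci;
}
```

Addresses 4 KB apart, such as 0x00035304 and 0x00036304, select the same set, so at most four such blocks can be cached at the same time.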
Cache Operation
• Each cache block (or line) can be in one of three states (MEI protocol)
– M = modified (or dirty)
• Resides in cache and is different from memory
– E = exclusive (resident and clean)
• Resides in cache and is identical to memory
– I = invalid (not resident)
• The “shared” state of the full MESI protocol is not supported
– Would allow synchronization of multiply cached blocks
• There is no cache coherency for the instruction cache
Cache control
• Hardware implementation dependent registers (HIDn)
control cache function
– Enabling
– Invalidate
– Locking
• Supervisor instructions provide block level control
– Allocate, flush, invalidate, store, touch, zero
• Whether a given block of memory can be cached is controlled by the MMU
– Each block or page in the MMU has WIMG bits
• (Write-through, caching Inhibited, Memory coherence [global], Guarded)
MMU
• The MMU provides for both memory translation and
access control
• The system boots in Real (un-translated) mode
• To effectively use the caches, the MMU must be used in
block or page mode
– Typically, a null (one-to-one) translation is performed
Protection
• The primary use of the MMU in embedded applications is
for cache control and access protection
• The WIMG bits are set for each page
– W = write-through (applicable only to data cache)
– I = inhibited
– M = memory coherency supported in hardware
– G = guarded (indicates that memory is ill-behaved)
• I/O spaces
• All accesses are forced to be in order
• No speculative reads or pre-fetches
Translation
• Block or page translation allows the full use of a virtual
memory model
• Block translation provides a memory space of 2^32 bytes
• Page translation provides a virtual memory space of 2^52 bytes
• System must be debugged with RTOS tools
– Emulators and hardware debuggers don’t support it
Real mode
[Figure: in real mode the 32-bit logical address is used directly as the 32-bit physical address, with fixed WIMG attributes:]
W = 0: write-back
I = 0: cache enable
M = 1: data is global
G = 1: memory is guarded
BAT mode
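[Figure: BAT translation. The logical address divides into an upper 4 bits, a middle 11 bits, and a low 17 bits. The BAT register supplies BEPI (15 bits), BL (11 bits), BRPN, and WIMG. The upper 4 bits are compared with BEPI. Of the middle 11 bits, those not covered by the block length mask BL are also compared with BEPI, and those covered by BL are ORed with BRPN. The low 17 bits pass through unchanged to form the physical address, and the WIMG attributes come from the BAT register.]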
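The BAT compare-and-combine step can be written as a few masks. This is a simplified sketch assuming hypothetical field names, with BEPI and BRPN kept as address-aligned 32-bit values rather than packed into the real BATU/BATL register layout; the valid and protection bits are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified BAT entry (hypothetical layout, not the packed BATU/BATL). */
struct bat {
    uint32_t bepi; /* logical block start; only bits 0-14 are used */
    uint32_t bl;   /* 11-bit block length: 0x000 = 128 KB ... 0x7FF = 256 MB */
    uint32_t brpn; /* physical block start; only bits 0-14 are used */
    uint8_t wimg;  /* attributes for the whole block */
};

#define BAT_HI15 0xFFFE0000u /* address bits 0-14 */
#define BAT_LO17 0x0001FFFFu /* address bits 15-31 */

/* Translate ea through one BAT; returns false when the BAT does not match. */
static bool bat_translate(const struct bat *b, uint32_t ea, uint32_t *pa)
{
    uint32_t len_mask = (b->bl & 0x7FFu) << 17; /* bits 4-14 covered by BL */
    uint32_t cmp_mask = BAT_HI15 & ~len_mask;   /* bits compared with BEPI */

    if ((ea & cmp_mask) != (b->bepi & BAT_HI15))
        return false;
    /* BRPN supplies the upper bits; bits under BL and the low 17 bits
       come from the logical address. */
    *pa = (b->brpn & BAT_HI15) | (ea & len_mask) | (ea & BAT_LO17);
    return true;
}
```

A 256 MB block (BL = 0x7FF) compares only the top 4 bits, while a 128 KB block (BL = 0) compares all 15; the smallest block is therefore 2^17 bytes and the largest 2^28.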
Page mode
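[Figure: page translation. The 32-bit logical address divides into a 4-bit segment register select, a 16-bit page index, and a 12-bit byte offset. The selected segment register supplies a 24-bit virtual segment ID, forming a 52-bit virtual address (24 + 16 + 12 bits). The upper 40 bits are looked up in the TLB/page table, which returns a 20-bit physical page number and the WIMG bits; the 12-bit offset completes the 32-bit physical address.]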
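Page translation adds one level: the segment register supplies the VSID, and the resulting 52-bit virtual address is looked up. In this sketch a linear search over a small hypothetical table stands in for the TLB and page table; only the address arithmetic follows the figure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified translation entry. */
struct pte {
    uint64_t vpn;  /* 40-bit virtual page number (VSID + page index) */
    uint32_t rpn;  /* 20-bit physical page number */
    uint8_t wimg;
};

/* 52-bit virtual address: 24-bit VSID | 16-bit page index | 12-bit offset. */
static uint64_t virtual_address(const uint32_t sr_vsid[16], uint32_t ea)
{
    uint64_t vsid = sr_vsid[ea >> 28] & 0xFFFFFFu; /* upper 4 bits pick the SR */
    return (vsid << 28) | (ea & 0x0FFFFFFFu);
}

/* Linear search standing in for the TLB / page table lookup. */
static bool page_translate(const uint32_t sr_vsid[16], const struct pte *table,
                           size_t n, uint32_t ea, uint32_t *pa)
{
    uint64_t vpn = virtual_address(sr_vsid, ea) >> 12;
    for (size_t i = 0; i < n; i++) {
        if (table[i].vpn == vpn) {
            *pa = (table[i].rpn << 12) | (ea & 0xFFFu);
            return true;
        }
    }
    return false; /* a real system would take an ISI or DSI exception */
}
```

Because the lookup key is the full 40-bit virtual page number, two tasks with different VSIDs can use the same logical addresses without colliding.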
Reset operation
[Table: reset actions for each reset source]
Reset sources: power-on reset, external hard reset, software watchdog, bus monitor, checkstop, external soft reset
Actions: reset PLL, sample system configuration, reset clock module, drive HRESET, reset other internal logic, drive SRESET, reset core
Reset Types
• Power-on reset is used to align all logic from a chaotic
state after Vcc stabilizes
– The PLL then begins to lock
• Hard reset is analogous to the normal reset on other
processors
– The PLL is not affected
• Soft reset can be used to initiate a warm start
– Not commonly used
– Not driven or monitored by the emulator
– Basically, a non-returnable exception to the reset vector
Reset Sequence
Power-on reset: POR asserted → HRESET & SRESET asserted → PLL locks → RSTCONF sampled → internal logic reset → HRESET & SRESET negated
Hard reset: HRESET asserted → HRESET & SRESET asserted → RSTCONF sampled → internal logic reset → HRESET & SRESET negated
Soft reset: SRESET asserted → internal logic reset → SRESET negated
Memory Map Startup
Boot Map
At boot, CS0 is active for one of two large areas of the address space. All other chip selects are invalid.
[Figure: memory map before the configuration word is loaded (Flash repeated throughout the CS0 area), after the configuration word is loaded (IMMR placed within the map), and the application target map (Flash, IMMR, I/O, and RAM on separate chip selects such as CS0, CSi, and CSx,y,z)]
Memory Map Implications
• Since the Flash memory accessed by CS0 occupies one of two large areas in the address space, boot code can be linked to execute in a number of different locations
• Any branches will change the NIA from the boot location to the linked location
• All other chip selects are off
• IMMR RAM is still available
• CS0 must be reduced in scope before activating other chip selects
• Be careful not to pull the rug out from under the boot code when reducing CS0
• BSP re-entry issues:
– Altering chip select option registers while assuming the value in the Valid bit
– Can the chip selects to the RAM and Flash be altered while running out of either?
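One way to avoid pulling the rug out is to check, before reprogramming CS0, that the addresses being executed are still decoded by the new window. The sketch below models a chip select as a base and an address mask, which is the idea behind the base/option register pair but not their exact bit layout; the function names, the 32 MB boot window at 0xFE000000, and the 8 MB Flash at 0xFF800000 are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Chip select model (hypothetical): an access hits the bank when the
   address bits selected by mask match the base address. */
struct chip_select {
    uint32_t base;
    uint32_t mask; /* contiguous high-order ones, e.g. 0xFF800000 = 8 MB */
    bool valid;
};

static bool cs_decodes(const struct chip_select *cs, uint32_t addr)
{
    return cs->valid && (addr & cs->mask) == (cs->base & cs->mask);
}

/* Bank size implied by a contiguous mask. */
static uint32_t cs_size(uint32_t mask)
{
    return ~mask + 1u;
}

/* Before narrowing CS0, make sure the instruction being executed and the
   next one to be fetched are still inside the new window. */
static bool cs0_change_is_safe(const struct chip_select *new_cs0,
                               uint32_t pc, uint32_t next_pc)
{
    return cs_decodes(new_cs0, pc) && cs_decodes(new_cs0, next_pc);
}
```

If the check fails, branching to the linked address inside the final window before touching CS0 keeps the boot code reachable while the window shrinks.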
Memory Map Init Issues
• Three different factors can enhance (confuse) the boot process:
– The MSR[IP]
• The reset vector can be 0x0000_0100 or 0xfff0_0100
• Determined by the Reset Configuration Word
• Not changed by an SRESET
– CS0 scope
• CS0 responds to either the upper or lower end of the memory map
• It must be changed while it is being used
• It may have already been reduced by a previous pass through the BSP
– Code link results
• Execution can start in code that is linked to a different address than the boot vector
• Only the address lines within the memory device are significant
• PC relative addressing will solve this, right? WRONG!
• The first branch will set the NIA MSBs to the current execution value
RTOS Boot Sequences
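[Figure: two RTOS boot sequences. In the first, boot code decompresses a compressed application image and relocates it from Flash into RAM. In the second, boot code loads an external application image over a communication channel or backplane into RAM. In both, Flash holds the boot code and BSP, RAM holds the uncompressed application image and its data, stack, heap, etc., and the IMMR and I/O are also mapped. Each chip select x is programmed with a base register (base address and valid bit V) and an option register (mask and options).]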
Endian Bus Connections
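[Figure: connecting an 8-bit device on 68K, x86, and PowerPC buses. On the 68K bus (bit 31 is the MSB), the device's bits 7-0 connect to the MS byte lane, bits 31-24. On the x86 bus, they connect to the LS byte lane, bits 7-0. On the PowerPC bus (bit 0 is the MSB), they connect to the MS byte lane, bits 0-7; the LS byte lane is bits 24-31.]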
Big Endian Bus
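[Figure: connecting 8-, 16-, 32-, and 64-bit devices to the 8260's 64-bit big-endian data bus. The bus is numbered from bit 0 (MSB) to bit 63 (LSB) in eight byte lanes: 0-7 (MS byte lane), 8-15, 16-23, 24-31, 32-39, 40-47, 48-55, and 56-63 (LS byte lane). Every device starts at the MS byte lane: an 8-bit device's bits 7-0 connect to bus bits 0-7; a 16-bit device's bits 15-8 and 7-0 to bus bits 0-7 and 8-15; a 32-bit device's bits 31-24, 23-16, 15-8, and 7-0 to bus bits 0-7 through 24-31; and a 64-bit device's bits 63-56 through 7-0 to bus bits 0-7 through 56-63.]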
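The bus figures reduce to one rule: a device of width w connects from the MS byte lane, so its pin D[n] (little-endian numbering, D0 = LSB) lands on 8260 bus bit w - 1 - n. A small sketch with hypothetical function names that can help when checking a schematic:

```c
#include <assert.h>

/* Device pin D[n] (D0 = LSB) of a device 'width' bits wide lands on this
   8260 data bus bit (bit 0 = MSB). */
static unsigned bus_bit(unsigned width, unsigned device_bit)
{
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    assert(device_bit < width);
    return width - 1u - device_bit;
}

/* Byte lane 0 is the MS byte lane (bits 0-7); lane 7 is bits 56-63. */
static unsigned byte_lane(unsigned bus_bit_number)
{
    return bus_bit_number / 8u;
}
```

For example, D0 of a 32-bit Flash goes to bus bit 31, and D7 of a 16-bit device falls in byte lane 1.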
Configuration Word
• Configuration word is latched from Flash memory during
reset cycle
• A 32 bit value is loaded 8 bits at a time from the high order
bits of the data bus
– Immune to boot memory width
• RSTCONF pin allows configuration word to be forced to
all zero
• Multiple 8260s can access the same memory device
Configuration Word Contents
Fields, in order from the MSB: EARB, EXMC, CDIS, EBM, BPS, CIP, ISPS, L2CPC, DPPC, -, ISB, BMS, BBD, MMR, LBPC, APPC, CS10PC, -, MODCK_H
• EARB – External arbitration
• EXMC – External memory controller
• CDIS – Core disable
• EBM – External bus mode
• BPS – Boot port size
• CIP – Core initial prefix
• ISPS – Internal space port size
• L2CPC – L2 cache control pins
• DPPC – Data parity pin configuration
• ISB – Internal space base address
• BMS – Boot memory space
• BBD – Busy bus disable
• MMR – Mask masters request
• LBPC – Local bus pin configuration
• APPC – Address parity pin configuration
• CS10PC – CS10 pin configuration
• MODCK_H – MODCK high order bits
Configuration Word Format
8-bit wide boot device
Address offset from CS0   603 bus MSB byte lane (0-7)
0x00                      Byte 0
0x01                      Ignored
0x08                      Byte 1
0x09                      Ignored
0x10                      Byte 2
0x11                      Ignored
0x18                      Byte 3
0x19                      Ignored

32-bit wide boot device
Address offset from CS0   603 bus MSB byte lane (0-7)   Other byte lanes (8-31)
0x00                      Byte 0                        Ignored
0x04                      Ignored                       Ignored
0x08                      Byte 1                        Ignored
0x0C                      Ignored                       Ignored
0x10                      Byte 2                        Ignored
0x14                      Ignored                       Ignored
0x18                      Byte 3                        Ignored
0x1C                      Ignored                       Ignored
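A sketch of how a Flash programming tool could place, and later verify, the configuration word: byte i sits at offset 8 * i from the start of CS0 as the CPU sees it, for either boot device width. The function names are hypothetical, and the image is assumed to be the CPU's byte-addressed view of the device (for a 32-bit device, offset 0x08 is the MS byte of the third 32-bit word).

```c
#include <stddef.h>
#include <stdint.h>

/* One configuration byte every 8 bytes, always on the MS byte lane. */
#define HRCW_STRIDE 8u

/* Rebuild the 32-bit word from the CPU's view of the boot device at CS0. */
static uint32_t hrcw_read(const uint8_t *image)
{
    uint32_t word = 0;
    for (size_t i = 0; i < 4; i++)
        word = (word << 8) | image[i * HRCW_STRIDE]; /* byte 0 is the MSB */
    return word;
}

/* Place a configuration word into an image before programming Flash. */
static void hrcw_write(uint8_t *image, uint32_t word)
{
    for (size_t i = 0; i < 4; i++)
        image[i * HRCW_STRIDE] = (uint8_t)(word >> (24u - 8u * i));
}
```

Reading the word back with `hrcw_read` after every Flash update is a cheap guard against the boot-code-overwrote-the-configuration-word failure mentioned in the debug section.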
Configuring a single 8260
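[Figure: two ways to configure a single 8260. In one, RSTCONF is tied to Vcc and the configuration word is forced to all zero; in the other, the configuration word is read from the boot Flash on the A and D buses.]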
Configuring multiple 8260s
[Figure: a master 8260 reads its configuration word from the boot Flash on the shared A and D buses; slave 8260s 1 through 7 share the buses, with their RSTCONF inputs driven from address lines A0-A6.]
SIU
• The SIU contains the logic to interface the external system
components to the 8260
• Contains all of the glue logic needed for a typical
embedded application
SIU Overview
SYSTEM INTERFACE UNIT
60x Bus Interface Unit
PowerPC-to-Local Bridge
Local Bus Interface Unit
Memory Controller
Time Counter/PIT
Bus Arbiter
L2 Cache Controller
System Functions
603e Bus
• Very high performance bus
– Separate address and data tenures
– Pipelined
– Bursting
– Multi-master
– Cache snooping
603e bus cycle
[Figure: 603e bus cycle with separate address and data tenures, including an address-only cycle to support cache snooping]
Local Bus
Two buses, one address map:
[Figure: a single memory controller and address map serving both buses: Flash and code/data SDRAM on the 60x bus, CPM buffer SDRAM on the local bus]
Memory Control
• 12 banks of memory
– Each can be configured for any type of device
• Glueless support of SDRAM devices
• Glueless support of SRAM, EPROM, Flash
– Using general purpose chip select machine
• Three user programmable machines
• All memory controllers can be allocated to either the 603
or local bus
System control
• Clock synthesis
• Reset control
• Interrupt control
• Real time clock
• Periodic interrupt timer
• Bus monitor
• Bus arbiter
• Watchdog timer
Interrupt Control
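[Figure: interrupt control. The software watchdog timer or IRQ0 can drive the machine check input (MCP) of the 603 core. IRQ[1-7] (falling edge or level), Port C[0-15] (any edge or falling edge), the CPM channels, and the on-board timers feed the interrupt controller, which drives INT to the 603 core.]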
SIU Interrupt Vectors
• All external interrupts cause processing at 0xnnn00500
– There is space for 64 instructions to save processor state and resolve the SIU vector
• Vectors are six bits
– Shifting with indirect addressing is used to dispatch to the service routines
– A 16-bit load from the long word address of the SIVEC register returns an offset into a 64-entry array of 1 KB (256-instruction) service routines
– An 8-bit load returns an offset into a 64-entry jump table of branch instructions
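The shift trick can be shown with plain integer arithmetic. On the target the value comes from a read of SIVEC at 0xnnn10C04; here it is passed in as a parameter, and the handler table and function names are hypothetical.

```c
#include <stdint.h>

/* SIVEC carries the six-bit interrupt code in bits 0-5 (its MS bits). */
static unsigned sivec_code(uint32_t sivec)
{
    return sivec >> 26;
}

/* An 8-bit read returns bits 0-7: the code times 4 (one branch per entry). */
static uint8_t sivec_read8(uint32_t sivec)
{
    return (uint8_t)(sivec >> 24);
}

/* A 16-bit read returns bits 0-15: the code times 1024 (1 KB per routine). */
static uint16_t sivec_read16(uint32_t sivec)
{
    return (uint16_t)(sivec >> 16);
}

/* C analogue of the 8-bit jump table: 64 handler slots (hypothetical). */
typedef void (*isr_fn)(void);
static isr_fn isr_table[64];

/* Dispatch through the table; returns the table index that was used. */
static unsigned sivec_dispatch(uint32_t sivec)
{
    unsigned index = sivec_read8(sivec) >> 2; /* byte offset / 4 */
    if (isr_table[index] != 0)
        isr_table[index]();
    return index;
}
```

In assembly the same effect needs no shift at all: the 8-bit or 16-bit load result is added directly to the table base, which is why the register is laid out with the code in its most significant bits.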
SIU Interrupt Vector Register
[Figure: SIVEC register. Bits 0-5 hold the six-bit interrupt code and bits 6-31 read as zero. An 8-bit read from address 0xnnn10C04 returns bits 0-7 (the code × 4), a 16-bit read returns bits 0-15 (the code × 1024), and a 32-bit read returns bits 0-31.]
SIU Interrupt Vectors 8 bit Read
[Figure: the 8-bit read returns the six-bit interrupt code followed by two zero bits, so each vector value points to a different branch instruction in a table of branch instructions to ISRs:]
_00  ba routine_a
_04  ba routine_b
_08  ba routine_c
_0c  ba routine_d
_10  ba routine_e
_14  ba routine_f
_18  ba routine_g
SIU Interrupt Vectors 16 bit Read
[Figure: the 16-bit read returns the six-bit interrupt code followed by ten zero bits, so each vector value points to a block of 1K bytes (256 32-bit instructions):]
nnnn0000-nnnn03ff  256 32-bit instructions
nnnn0400-nnnn07ff  256 32-bit instructions
nnnn0800-nnnn0bff  256 32-bit instructions
nnnn0c00-nnnn0fff  256 32-bit instructions
CPM
• Communications processor module
• Direct hardware support for all protocol and application
interfaces
– Ethernet, ATM, HDLC, T1/E1, T3/E3, BiSync, UART, ISDN,
PCM highway
– Parallel I/O
– Full serial and virtual DMA support
IMMR Format
• All on-chip peripherals are accessed through a single 128 KB area of memory
• Within the first 64K of address space, there are three
blocks of dual ported RAM
• The second 64K of address space contains the control
registers of the on-chip peripherals
IMMR Map
Upper 64K: hardware registers (the IMMR area ends at 0x1_FFFF)
0x1_0000-0x1_1BFF  Control registers (7K)
0x1_2000-0x1_3FFF  SI routing RAM (8K)
Lower 64K: dual ported RAM
0x0_0000-0x0_3FFF  Buffer Descriptors / uCode / Data (16K)
0x0_8000-0x0_8FFF  Parameter RAM (4K)
0x0_B000-0x0_BFFF  FCC Data (4K)
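When debugging, it helps to classify an IMMR offset against the map. A small lookup sketch: the region boundaries and names come from the map, while the table layout, function names, and the example IMMR base of 0xF0000000 are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Regions of the 128 KB IMMR block, as offsets; end is exclusive. */
struct immr_region {
    uint32_t start, end;
    const char *name;
};

static const struct immr_region immr_map[] = {
    { 0x00000u, 0x04000u, "Buffer descriptors / uCode / data (16K)" },
    { 0x08000u, 0x09000u, "Parameter RAM (4K)" },
    { 0x0B000u, 0x0C000u, "FCC data (4K)" },
    { 0x10000u, 0x11C00u, "Control registers (7K)" },
    { 0x12000u, 0x14000u, "SI routing RAM (8K)" },
};

/* Name of the region containing offset, or NULL if it is not listed. */
static const char *immr_region_name(uint32_t offset)
{
    for (size_t i = 0; i < sizeof immr_map / sizeof immr_map[0]; i++)
        if (offset >= immr_map[i].start && offset < immr_map[i].end)
            return immr_map[i].name;
    return NULL;
}

/* Absolute address of an IMMR offset for a given IMMR base. */
static uint32_t immr_addr(uint32_t immr_base, uint32_t offset)
{
    return immr_base + offset;
}
```

The SIVEC offset 0x10C04, for instance, classifies as a control register, and with an IMMR base of 0xF0000000 it sits at 0xF0010C04.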
Dual Ported RAM usage
• The layout of the Dual Ported RAM is determined by the
uCode in the CPM
• When the CPM is not in operation, it is nothing more than
internal memory
– During the boot sequence, stack, global data, and heap can reside
in this memory
– Initialization code can be written in C++!
– A multi-layered boot process can be used
• First code resides in flash, uses internal RAM to setup chip selects
• Second code resides in another section of flash and uses external
RAM to load main application over a CPM channel
• Third level is the main application
– Each level has its own crt0.s function and initializes the EABI from scratch
CPM Overview
[Figure: CPM block diagram: 32-bit RISC and program ROM, interrupt controller, internal memory space, four timers, baud rate generators, parallel I/O, serial DMAs, and virtual IDMAs; MCC1, MCC2, FCC1, FCC2, FCC3, SCC1, SCC2, SCC3, SCC4, SMC1, SMC2, SPI, and I2C; time slot assigner and serial interface]
DMAs
• Serial DMAs
– Full bi-directional support of all serial channels
– Can access the 603 or local bus
• Virtual DMA
– 4 channels
– Uses the serial DMA hardware to generate transfers
– Memory to memory or memory to/from I/O
CPM Buffer Structure
[Figure: buffer descriptors BD1, BD2, BD3, ... BD128 reside in the IMMR dual ported RAM and point to data buffers in RAM]
Buffer Descriptor Format
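Four 16-bit fields (8 bytes):
Offset 0  Status and Control
Offset 2  Data Length
Offset 4  High Order Pointer (buffer address bits 0-15)
Offset 6  Low Order Pointer (buffer address bits 16-31)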
From Channel to Buffer
[Figure: from the communication channel hardware, through the parameter RAM and buffer descriptors, to the data buffers]
• Parameter RAM: location fixed by the hardware channel; format fixed by the protocol
• Buffer descriptors (in dual ported RAM): location determined by the parameter RAM value; format of control and status determined by the protocol
• Data buffers: location determined by the value in the buffer descriptor and by the memory controller mapping of the local/603 bus; format determined by the protocol
SCCs
• The SCCs implement the following protocols:
– SDLC/HDLC
– AppleTalk
– UART
– 10-Mbps Ethernet
Ethernet Frame
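Field                   Size
Preamble                7 bytes
Start Frame Delimiter   1 byte
Destination Address     6 bytes
Source Address          6 bytes
Type / Length           2 bytes
Data                    46 - 1500 bytes
Frame Check             4 bytes
• Stored by the CPM in the receive buffer: destination address through frame check
• Stored by the CPU in the transmit buffer: destination address through data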
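The field sizes fix the minimum and maximum frame lengths. A small sketch of the arithmetic with hypothetical names; it assumes short data is padded up to the 46-byte minimum, which on transmit is what the [PAD] bit in the transmit BD requests.

```c
#include <stddef.h>

/* Field sizes in bytes, from the frame layout above. */
enum {
    ETH_PREAMBLE = 7,
    ETH_SFD = 1,
    ETH_DA = 6,
    ETH_SA = 6,
    ETH_TYPE = 2,
    ETH_DATA_MIN = 46,
    ETH_DATA_MAX = 1500,
    ETH_FCS = 4
};

/* Frame length from destination address through frame check, with short
   data padded up to the minimum. data_len is assumed <= ETH_DATA_MAX. */
static size_t eth_frame_len(size_t data_len)
{
    size_t data = data_len < (size_t)ETH_DATA_MIN ? (size_t)ETH_DATA_MIN : data_len;
    return ETH_DA + ETH_SA + ETH_TYPE + data + ETH_FCS;
}

/* Bytes on the wire, adding the preamble and start frame delimiter. */
static size_t eth_wire_len(size_t data_len)
{
    return ETH_PREAMBLE + ETH_SFD + eth_frame_len(data_len);
}
```

This gives the familiar 64-byte minimum and 1518-byte maximum frame, which is a useful sanity bound when sizing receive buffers.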
Ethernet Buffer Descriptor
Receive control & status: E | - | W | I | L | F | - | M | - | LG | NO | SH | CR | OV | CL
Transmit control & status: R | PAD | W | I | L | TC | DEF | HB | LC | RL | RC | UN | CSL
Common for transmit and receive: Data Length, High Order Pointer, Low Order Pointer
Status and Control Definitions
• E / R – Empty / Ready:
0 = This buffer is owned by the CPU
1 = This buffer is owned by the CPM
• W – Wrap: This is the last BD in this set of BDs.
• I – Interrupt: Generate an interrupt after this buffer is used by the CPM.
• L – Last in Frame: Set by the CPM or the CPU to inform the other that this is the last buffer of a frame.
• F – First in Frame (receive): Set by the CPM to inform the CPU that this is the start of a new frame.
• TC – Transmit CRC (transmit): Transmit the CRC after this buffer.
Transmit Frames
[Figure: a transmit BD list; the parameter RAM points to the first BD]
BD 1-4   R=0 W=0 I=0 L=0 TC=1   first frame
BD 5     R=0 W=0 I=1 L=1 TC=1   last buffer of the first frame
BD 6-7   R=0 W=0 I=0 L=0 TC=1   next frame for this channel
BD 8     R=0 W=0 I=1 L=1 TC=1   last buffer of the next frame
BD 9     R=0 W=1 I=1 L=1 TC=1   a single-buffer frame; last BD in the list
After all buffers are filled, “R” is set to “1” in all BDs in this list
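A host-side sketch of queuing one multi-buffer frame the way the transmit list shows: [TC] on every BD, [L] and [I] only on the last buffer of the frame, [W] on the last BD of the list, and ownership handed to the CPM from the last BD back to the first. The struct, masks, and function name are assumptions based on this section (the two pointer half-words are merged into one 32-bit field); on real hardware a memory barrier is also needed before setting [R].

```c
#include <stddef.h>
#include <stdint.h>

/* Buffer descriptor model: status/control, length, buffer pointer. */
struct bd {
    volatile uint16_t sc;
    volatile uint16_t length;
    volatile uint32_t buf;
};

/* Transmit bits, from the field order R PAD W I L TC (bit 0 = MSB). */
#define BD_TX_R  0x8000u /* ready: owned by the CPM */
#define BD_TX_W  0x2000u /* wrap: last BD in the list */
#define BD_TX_I  0x1000u /* interrupt when the CPM is done with this BD */
#define BD_TX_L  0x0800u /* last buffer of the frame */
#define BD_TX_TC 0x0400u /* transmit CRC after this buffer (used with L) */

/* Queue one frame split over n buffers, starting at ring[first]. */
static void tx_queue_frame(struct bd *ring, size_t ring_len, size_t first,
                           const uint32_t *bufs, const uint16_t *lens, size_t n)
{
    /* Fill every BD while the CPU still owns it (R = 0). */
    for (size_t k = 0; k < n; k++) {
        size_t i = (first + k) % ring_len;
        uint16_t sc = BD_TX_TC;          /* TC = 1 everywhere is harmless */
        if (i == ring_len - 1)
            sc |= BD_TX_W;               /* wrap marks the end of the list */
        if (k == n - 1)
            sc |= BD_TX_L | BD_TX_I;     /* end of frame: one interrupt */
        ring[i].buf = bufs[k];
        ring[i].length = lens[k];
        ring[i].sc = sc;
    }
    /* Hand over from last to first so the CPM cannot start the frame
       before every buffer of it is ready. */
    for (size_t k = n; k-- > 0;) {
        size_t i = (first + k) % ring_len;
        ring[i].sc |= BD_TX_R;
    }
}
```

Handing over the first BD last means an interrupt arriving mid-handoff can only delay the frame, not cause an underrun in the middle of it.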
Receive Frames
[Figure: a receive BD list; the parameter RAM points to the first BD]
BD 1     E=1 W=0 I=0 L=0 F=1   first buffer of a frame
BD 2-4   E=1 W=0 I=0 L=0 F=0
BD 5     E=1 W=0 I=0 L=1 F=0   last buffer of the frame
BD 6     E=1 W=0 I=0 L=0 F=1   next frame for this channel
BD 7     E=1 W=0 I=0 L=0 F=0
BD 8     E=1 W=0 I=0 L=1 F=0
BD 9     E=1 W=1 I=0 L=1 F=1   a single-buffer frame; last BD in the list
After the CPU has processed all buffers, “E” is set to “1” in all BDs in this list to hand them back to the CPM
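On the receive side the CPU follows the CPM around the list: it collects the BDs the CPM has handed back ([E] = 0) up to the one with [L] set, then clears the status bits and sets [E] to return them. A minimal host-side sketch with hypothetical names and a merged 32-bit pointer field; barriers and cache handling are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Buffer descriptor model: status/control, length, buffer pointer. */
struct bd {
    volatile uint16_t sc;
    volatile uint16_t length;
    volatile uint32_t buf;
};

/* Receive bits, from the field order E - W I L F (bit 0 = MSB). */
#define BD_RX_E 0x8000u /* empty: owned by the CPM */
#define BD_RX_W 0x2000u /* wrap */
#define BD_RX_I 0x1000u /* interrupt */
#define BD_RX_L 0x0800u /* last buffer of a frame */
#define BD_RX_F 0x0400u /* first buffer of a frame */

/* Take one complete frame starting at *next. Returns the number of BDs in
   the frame, or 0 if the CPM has not finished a frame yet. */
static size_t rx_take_frame(struct bd *ring, size_t ring_len, size_t *next)
{
    size_t i = *next, count = 0;

    for (;;) {
        uint16_t sc = ring[i].sc;
        if (sc & BD_RX_E)
            return 0;               /* CPM still filling: stay behind it */
        count++;
        if (sc & BD_RX_L)
            break;                  /* found the end of the frame */
        i = (i + 1) % ring_len;
        if (count == ring_len)
            return 0;               /* defensive: no L bit anywhere */
    }
    /* (The frame data would be consumed here.) Return the BDs last to
       first, keeping only W and I and clearing L, F, and error bits. */
    for (size_t k = count; k-- > 0;) {
        size_t j = (*next + k) % ring_len;
        ring[j].sc = (uint16_t)((ring[j].sc & (BD_RX_W | BD_RX_I)) | BD_RX_E);
    }
    *next = (i + 1) % ring_len;
    return count;
}
```

Tracking `*next` in software is how the CPU stays ahead of wraps: it never needs to search the list, only to check the [E] bit of the next BD it expects.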
The [E/R] bits
Bit               | Initial value | Operation             | Changed by | Changed to | Operation            | Changed by | Changed to
Transmit [Ready]  | 0             | Fill with data by CPU | CPU        | 1          | CPM transmits buffer | CPM        | 0
Receive [Empty]   | 1             | Fill with data by CPM | CPM        | 0          | CPU reads buffer     | CPU        | 1
Polarity can be confusing because the sense is reversed for
complementary operations. However, the same level always
indicates who [CPU vs. CPM] owns the buffer. This bit is the
same for all protocols on all channels.
The [W] bits
• The Wrap bit is always set to indicate the last buffer
descriptor for the channel
• It does not delineate frames!
• The address of the first buffer descriptor is stored in the channel’s parameter RAM
– The list of BDs is bounded by the parameter RAM and the [W] bit
• Any BD past the BD with the [W] bit set is inaccessible to the CPM, unless the parameter RAM points to it
• This bit is the same for all protocols on all channels.
The [I] Bits
• The Interrupt bits generate an interrupt to the CPU when the CPM
hands the BD to the CPU
– Whenever the CPM flips the [E/R] bit to “0”
• A redundant phrase, the CPM can only flip that bit to “0”, right?
• For transmit, it’s common to only receive an interrupt at the end of
transmission of the last buffer
• For receive, the last buffer is not known, so it’s more common to
receive an interrupt for most buffers on non-frame oriented protocols
– If a buffer is small enough that it can’t contain an entire frame, then this
bit might be cleared
• The CPU has to stay ahead of the CPM to know when a wrap occurred
– On Ethernet, the end of frame interrupt is more efficient
• This bit is the same for all protocols on all channels.
The [L] Bits
• The Last bits indicate the end of a frame within the list of
buffer descriptors
• Set and cleared by the CPU on transmit frames
– The CPM only reads this bit for transmit
• Set by the CPM on receive frames
– Should be cleared by the CPU before the [E] is used to hand the
buffer to the CPM
• This bit is not the same for all protocols on all channels.
The [F] Bits
• The First bit is only present in receive frames
• Set by the CPM to tell the CPU that this buffer starts a
frame
– An underrun, late collision, or aborted frame can cause a new
frame in the next buffer without the [L] bit being set in the
previous BD
• Not needed for transmit
– The CPU will control the state of the CPM with the [L] bit
– An [L] bit set or an underrun will cause the next buffer to be
considered the first buffer of a frame
• This bit is not the same for all protocols on all channels.
The [TC] Bits
• The Transmit CRC bits work in conjunction with the [L]
bit
• The [TC] bit is ignored if the [L] bit is cleared
• Initializing all [TC] bits to “1” is a good precaution
• Only custom protocols that don’t use hardware generated CRCs should have this bit cleared
• This bit is not the same for all protocols on all channels.
Subtle points on BDs
• Frames can span buffers
• Buffers never span frames
– Unless you have all hardware support turned off and are running
transparent
• Be careful with small receive buffers that have the [I] bit set
– You’ll get hammered with interrupts
• Turn buffers over to the CPM from last to first
– If an interrupt interferes with the handoff, an underrun / overflow can
occur
• Hands off a BD with the [E/R] bit set
– Unless you like working weekends
FCCs
• The FCCs support:
– 10/100-Mbps Ethernet through an MII
– Full 155-Mbps ATM SAR through UTOPIA
– 45-Mbps HDLC (DS-3)
– Operation is similar to SCCs
• Block mode allows buffers to be dynamically moved into dual ported RAM
FCC Buffer Descriptors
• Identical in format to the SCCs’ buffer descriptors
• Except:
– Buffer descriptors, as well as buffers, are in main memory
– Pointers to buffer descriptors in the parameter RAM are 32 bits
• Buffer descriptors must still be in consecutive memory
locations
SMCs
• The SMCs perform basic UART as well as transparent mode transmission
• Buffer descriptor operation is identical to that of the SCCs
– The status and control word has different bit fields pertaining to the
protocols
– Bit fields controlling protocol independent operation are
unchanged
Status and Control Definitions
[SMC in UART mode]
Receive control & status: E | - | W | I | - | - | CM | ID | - | BR | FR | PR | - | OV | -
Transmit control & status: R | - | W | I | - | - | CM | P | -
• E / R – Empty / Ready:
0 = This buffer is owned by the CPU
1 = This buffer is owned by the CPM
• W – Wrap: This is the last BD in this set of BDs.
• I – Interrupt: Generate an interrupt after this buffer is used by the CPM.
• CM – Continuous mode: [E] bit isn’t cleared on buffer reception
• ID – Idle: Close buffer on reception of idles
MII
[Figure: MII connection between an FCC of the MPC8260 (PowerQUICC II) and a Fast Ethernet PHY]
• Transmit Error (Tx_ER)
• Transmit Nibble Data (TxD[3:0])
• Transmit Enable (Tx_EN)
• Transmit Clock (Tx_clk)
• Collision Detect (COL)
• Receive Nibble Data (RxD[3:0])
• Receive Error (Rx_ER)
• Receive Clock (Rx_clk)
• Receive Data Valid (Rx_DV)
• Carrier Sense output (CRS)
• Management Data Clock (MDC)
• Management Data I/O (MDIO)
Utopia Interface
[Figure: connecting a PM5350 to the MPC8260]
MPC8260              PM5350
A[24-31]             A[7-0]
D[0-7]               D[7-0]
ATMCS0*              CS*
BCTL0*               RD*
PWE0*/PDQM/PBS0*     WR*
ATMRST*              RST*
                     ALE
DP6/CSE0/IRQ6*       INT*
Applications
• Performance drives the complexity of the 8260 system
– Single processor
• Single 8260
• Multiple 8260s with all but one core turned off
• Multiple 8260s with all cores off, using an external MPC750
– Multiple processor
• Combinations of 8260’s and 750’s
Single 8260
[Figure: single MPC8260 system: SDRAM/SRAM/DRAM/Flash on the 60x bus; SDRAM/SRAM/DRAM holding ATM connection tables on the local bus; PHYs on the communication channels; a 155 Mbps ATM PHY on UTOPIA]
Multiple 8260s
[Figure: two MPC8260s sharing the 60x bus and its SDRAM/SRAM/DRAM/Flash; each MPC8260 has its own local bus with SDRAM/SRAM/DRAM holding ATM connection tables, PHYs on its communication channels, and a 155 Mbps ATM PHY on UTOPIA]
MPC7xx w/ 8260(s)
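[Figure: an MPC7xx (32-Kbyte I cache, 32-Kbyte D cache, backside cache) and an MPC8260 on the 60x bus with SDRAM/SRAM/DRAM/Flash; the MPC8260 local bus holds SDRAM/SRAM/DRAM with ATM connection tables, and the MPC8260 connects to PHYs on its communication channels and a 155 Mbps ATM PHY on UTOPIA]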
Debug Considerations
• What is JTAG
• JTAG Limitations
• Getting out of reset
• The 60x Core and Bus
• The cache is on
• CPM Realities
• Exception Routines
• Tracing at the Bus Cycle Level
What is JTAG?
• JTAG is a SLOW serial connection to the 8260 CPU resources
• The serial data is called the scan chain.
• JTAG provides the ability to modify memory and registers.
• The scan chain for each processor is different.
• JTAG was not created for Debug…
JTAG connection
• JTAG connector allows for full run control of the processor
• The emulator can sync with the processor without disrupting its state
[Figure: JTAG connector signals: TDO, TDI, QREQ*, TCK, TMS, SRESET*, HRESET*, XBR3*, TRST*, 3.3V, GND, GND]
JTAG Limitations
• Slow download of code to RAM.
• JTAG accesses during execution MAY dramatically affect performance.
• All commands through JTAG must be “scanned in”
Getting out of Reset
• The reset configuration word is of vital importance
• TRST must not be permanently asserted
• When flashing your boot code, be careful to replace or keep the configuration word
• What is your Interrupt Prefix?
• Switchable pullup on RSTCONF*?
The 60x Core and Bus
• A STOP instruction must be scanned in (there is no breakpoint pin)
• Only one hardware code breakpoint is available; there are no hardware data breakpoints
• Address and data do not necessarily appear on the bus at the same time
• Predictive fetching means that what you see on the bus may not be executed.
The Caches are On
• Bus Cycles now appear as bursts
• Fetches are determined by the BIU, not related to instruction
execution
• No Cache Visibility pins
• Instrumentation required for accurate debug
• Caution must be exercised when the boot process performs a code relocation
– Contents are cached as data during the move
– Contents are fetched as instructions after the move
– The instruction queue doesn’t snoop the data cache
– The load/store unit doesn’t snoop the instruction cache
– There is no cache coherency for the instruction cache
CPM Realities
• The CPM operates independently of the CPU
• The CPM is not debugged yet. Expect the unexpected
• Early releases of the silicon didn’t propagate watchdog resets to the external reset pin
• The “Last Buffer Interrupt” occurs at the beginning of transmission
Exception Routines
• Exception routines are difficult to debug
• The recoverability of exceptions is an issue
• On-board hardware breakpoints do not work in the head or tail of an exception handler
Tracing at the Bus Cycle Level
• The 8260 comes in a BGA package
• Connecting to an emulator
• Connecting to an analyzer
Connecting to an Emulator
[Figure: emulator connection built from a buffer board, the original 8260, a BGA site to pin socket, a target adaptor, and a pin header on the target board]
Connecting to an Analyzer
[Figure: Mictor connectors on the target board next to the 8260 for the analyzer connection]
Connecting to an Emulator or Analyzer
[Figure: connection to the emulator through either a socket to Mictor adaptor or a buffer board, stacked on the original 8260, a BGA site to pin socket, a target adaptor, and a pin header on the target board]
Summary of debug issues
• Init MMU before turning on caches
• Loads and stores can be re-ordered
• The CPM doesn’t use the MMUs or the caches
• Don’t single step through moves to or from SPRs
• ISRs cannot have breakpoints in the first or last few instructions
• Each processor must have its own JTAG connector
• JTAG lines must be terminated with 1K or 2K values (depending on the signal)
• JTAG connector should be within 2 inches of the processor
• Provide for the ability to pull RSTCONF high
• When using the 750 as the CPU, provide the ability to access the 8260 configuration word in flash
• Don’t place code or program data on the local bus