Goals
• Provide an overview of the 8260 device
• Allow a quick start of an 8260 design cycle
• Gain familiarity with debug issues particular to the 8260
• Create the basis to build further experience
[Rev 2.8] 1 of 107

Outline
• 8260 Architecture
• Application examples
• Debug considerations

Outline
• 8260 Architecture
  – Device overview
  – Core CPU
  – SIU
  – CPM

[Block diagram: 8260 device overview]
• EC603e PowerPC Core: 16 KB I-Cache, IMMU, 16 KB D-Cache, DMMU
• Communications Processor Module: Internal Memory Space, Four Timers, Serial DMAs, Virtual IDMAs, Interrupt Controller, Parallel I/O, Baud Rate Generators, 32-bit RISC and Program ROM
• System Interface Unit: 60x Bus Interface Unit, PowerPC-to-Local Bridge, Local Bus Interface Unit, Memory Controller, Time Counter/PIT, Bus Arbiter, L2 Cache Controller, System Functions
• Serial channels: MCC1, MCC2, FCC1, FCC2, FCC3, SCC1, SCC2, SCC3, SCC4, SMC1, SMC2, SPI, I2C
• Serial Interface / Time Slot Assigner: 8 TDMs, MII, 2 UTOPIA

CPU
• Based on the MPC603e core
• Up to two instructions fetched per clock
• Up to three instructions issued and retired per clock
• Up to five instructions in execution per clock
• Most instructions execute in one clock
• Branches can execute in zero clocks

Programming Model
[Register diagram: GPR0-GPR31 (32 bits), FPR0-FPR31 (64 bits), CR, XER, FPSCR, MSR, PVR, CTR, LR, TBU, TBL, SRR0, SRR1, DEC, SPRn, SPRx]

MSR
Bit 0 is the MSB; bit 31 is the LSB.
Bits 0-12   0       Reserved
Bit 13      POW     Power management enabled
Bit 14      0       Reserved
Bit 15      ILE     Interrupt little endian mode
Bit 16      EE      External interrupt enable
Bit 17      PR      Privilege level
Bit 18      FP      Floating point available
Bit 19      ME      Machine check enable
Bit 20      FE0     Floating point exception mode 0
Bit 21      SE      Single step trace enabled
Bit 22      BE      Branch trace enabled
Bit 23      FE1     Floating point exception mode 1
Bit 24      0       Reserved
Bit 25      IP      Exception [interrupt] prefix
Bit 26      IR      Instruction address translation enabled
Bit 27      DR      Data address translation enabled
Bits 28-29  0       Reserved
Bit 30      RI      Recoverable exception
Bit 31      LE      Little endian mode

CPU Overview
[Block diagram: Instruction Cache, Instruction MMU, Sequential Fetcher, Branch Processing (CTR, CR, LR), Instruction Queue, Dispatch, Instruction Unit, System Register Unit, Integer Unit (/ + *, XER), GPR File (R0-R31) with GP Rename Regs, Floating Point Unit (/ + *), FPR File (FPR0-FPR31) with FP Rename Regs, Load/Store Unit, Data MMU, Data Cache, Completion Unit, Main Memory]

Execution Units
• Execution units operate in parallel
  – Fetch / Branch
  – Integer
  – Floating Point
  – Load / Store
  – System
  – Completion

Fetch / Dispatch
• Instructions are fetched in pairs
• Non-branch instructions enter the instruction queue
• Branch instructions are redirected to the branch unit
• Two instructions can be sent to the execution units and one to the branch unit, for a total of three issued instructions per clock
• All instructions “appear” to execute sequentially

[Diagram: instruction queue between the instruction cache and the execution units]
• On each CPU clock: 64 bit wide transfer (two instructions) from the instruction cache
• Instructions fall through to the first open location in the queue
• The branch instruction closest to the bottom of the queue is issued to the branch unit (CTR, CR, LR) on each clock
• The bottom two non-branch instructions are dispatched to available execution units

Branch
• Branches are pre-executed, giving an effective execution time of zero clocks
• Instruction queue provides look ahead to determine data dependencies
• Unresolved conditional branches are statically predicted under control of the compiler

Subroutine Control Flow
[Diagram: code flow with a software maintained stack addressed by GPR1]
• Branch to sub: the return address (the instruction after the branch) is placed into the Link Register (LR) by the branch
• Instructions save the LR to the stack to allow nested function calls
• Branch to sub: the LR is reused for another call
• The LR is recalled from the stack to allow a return from subroutine
• Branch to LR: branching to the contents of the LR is a return instruction

Integer
• Integer unit directly accesses the GPR file
• Rename registers prevent stalls and allow instructions to be un-executed
• Most instructions execute in one clock
• Divides have been optimized over the 603 to reduce latency by 50%

Floating Point
• Floating point unit directly accesses the FPR file
• Rename registers prevent stalls and allow instructions to be un-executed (the same as in the integer GPR file)
• Supports single (32 bit) and double (64 bit) precision operands
• Three stage pipeline accepts one instruction per clock
• Supports all IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity) in hardware, eliminating the latency incurred by software exception routines

Load/Store
• Responsible for all transfers between the GPR file and main memory
• Instructions appear to execute in order
• Actual accesses can occur out of order
• Loads from cache execute in one clock with a two clock latency
• Stores to cache execute in one clock with a latency of three clocks
• Speculative loads are placed in the rename registers
• Speculative stores remain in the store queue

System
• Performs moves to and from SPR’s
• Doubles as an auxiliary integer unit
  – Executes add / compare instructions
  – Executes condition register logical operations
• Instructions that affect processor mode force serialization of the processor

Completion
• Holds instructions executed in parallel or out of order until they can be retired in order
• Retiring an instruction commits its results to the processor state
• Simply discarding an instruction from the completion queue effectively un-executes it
• Two instructions can be retired per clock

Instruction Set
• 68K instructions were based on an accumulator, direct memory model
    add (0x00035300).L, D4
[Diagram: the contents of memory location 0x00035300 are added directly into D4 of the register set D0-D7]

Instruction Set
• PowerPC instructions are based on a triadic, load/store model
    lwz r2,0x00035300
    add r6,r2,r4
[Diagram: memory location 0x00035300 is first loaded into a register; the add then operates only on registers of the set GPR0-GPR31]

Exceptions
• All exceptions cause processing to vector to a predetermined memory location
• The base address of the vector table is controlled by the [IP] bit in the MSR
• Each vector is placed at a 256-byte page boundary
• 64 instructions can be placed at a vector before hitting the next vector
• Reset = 0xnnn00100
• Machine Check = 0xnnn00200
• External Interrupt = 0xnnn00500
• Decrementer = 0xnnn00900
• Etc.

Exceptions
[Diagram: vector table layout, 64 instructions per vector]
Offset  Vector
0x500   External
0x400   ISI
0x300   DSI
0x200   Machine Check
0x100   Reset
• MSR[IP] = 1: the table is in Flash and the reset vector is at FFF00100
• MSR[IP] = 0: the table is in RAM and the reset vector is at 00000100

Exceptions
• Only the Decrementer and the External Interrupt can be masked by the [EE] bit in the MSR
• Machine Check exceptions can vector to a routine or force Checkstop state
• All other exceptions are synchronous (caused by instruction execution) and are unmaskable

Nesting Exceptions
• When an exception occurs, return state is stored in the processor
  – There is no automated stacking of critical registers
  – The address of the return instruction is stored in SRR0
  – The MSR prior to the exception is in SRR1
  – The [EE] bit of the MSR is cleared
• Software must save these registers and any GPR’s it uses to a software maintained stack
• The EABI specifies GPR1 to be the stack pointer
• The [RI] bit in the MSR is set by software when enough information is saved to allow recovery from a nested exception

Exception Control Flow
[Diagram: interrupted code, the ISR, and a software maintained stack addressed by GPR1 holding SRR0 and SRR1]
• The address of the interrupted instruction is placed into SRR0 by the hardware
• An exception after the completion of this instruction causes flow to be directed to the ISR
• Instructions save the SRR’s to the stack to allow nested exceptions
• The MSR[RI] bit is cleared by the exception hardware and set by software after the SRR’s have been saved
• An exception while MSR[RI] is cleared causes a machine check event
• It is safe for exceptions to occur in the section of code where MSR[RI] is set
• The MSR[RI] bit is cleared by the software just before the SRR’s are restored by the software
• Breakpoints Are Exceptions!
• The SRR’s are recalled from the stack to allow a return from the exception with rfi

Cache
• Independent instruction and data caches implement an internal Harvard Architecture
• Each cache is 16 Kbyte, four way set associative
• Caching of separate memory areas is controlled by the MMU

Cache Organization
[Diagram: address bits 0-31 split into address tag (20 bits, stored in the cache), set select (7 bits, 128 sets), word and byte. Each set holds four ways; each way holds one block made of an address tag, state, and words 0-7. Blocks are numbered 0-3 in the first set through 508-511 in the last set.]

Cache Operation
• Each cache block (or line) can be in one of three states (MEI protocol)
  – M = modified (or dirty)
    • Resides in cache and is different than memory
  – E = exclusive (resident and clean)
    • Resides in cache and is identical to memory
  – I = invalid (not resident)
• The “shared” state of the full MESI protocol is not supported
  – Would allow synchronization of multiply cached blocks
• There is no cache coherency for the instruction cache

Cache control
• Hardware implementation dependent registers (HIDn) control cache function
  – Enabling
  – Invalidate
  – Locking
• Supervisor instructions provide block level control
  – Allocate, flush, invalidate, store, touch, zero
• Ability to store a given block of memory into the cache is controlled by the MMU
  – Each block or page in the MMU has WIMG bits
    • (Write-through, Inhibited, Memory coherency, Guarded)

MMU
• The MMU provides for both memory translation and access control
• The system boots in Real (un-translated) mode
• To effectively use the caches, the MMU must be used in block or page mode
  – Effectively, a null translation is performed

Protection
• The primary use of the MMU in embedded applications is for cache control and access protection
• The WIMG bits are set for each page
  – W = write-through (applicable only to data cache)
  – I = inhibited
  – M = memory coherency supported in hardware
  – G = guarded (indicates that memory is ill-behaved)
    • I/O spaces
    • All accesses are forced to be in order
    • No speculative reads or pre-fetches

Translation
• Block or page translation allows the full use of a virtual memory model
• Block translation provides a memory space of 2^32 bytes
• Page translation provides a virtual memory space of 2^52 bytes
• System must be debugged with RTOS tools
  – Emulators and hardware debuggers don’t support it

Real mode
[Diagram: the 32 bit logical address is used directly as the 32 bit physical address]
WIMG:
• W = 0: write-back
• I = 0: cache enable
• M = 1: data is global
• G = 1: memory is guarded

BAT mode
[Diagram: the logical address is split into fields of 4, 11 and 17 bits. The upper 15 bits are compared with BEPI (15) of BAT Reg n under control of the BL (11) mask; BRPN supplies the translated upper bits and the BAT register supplies WIMG. The physical address has the same 4 / 11 / 17 split.]

Page mode
[Diagram: the logical address is split into fields of 4, 16 and 12 bits. The 4 bit field selects a segment register, which supplies 24 bits to form the virtual address (24 / 16 / 12). The upper 40 bits are looked up in the TLB / page table, which returns a 20 bit physical page number and WIMG; the 12 bit offset passes through unchanged to the physical address.]

Reset operation
Reset source | Reset PLL | System configuration sampled | Clock module reset | HRESET driven | Other internal logic reset | SRESET driven | Core reset
Power-on reset | yes | yes | yes | yes | yes | yes | yes
External hard reset, software watchdog, bus monitor, checkstop | - | yes | yes | yes | yes | yes | yes
External soft reset | - | - | - | - | yes | yes | yes

Reset Types
• Power-on reset is used to align all logic from a chaotic state after Vcc stabilizes
  – The PLL then begins to lock
• Hard reset is analogous to the normal reset on other processors
  – The PLL is not affected
• Soft reset can be used to initiate a warm start
  – Not commonly used
  – Not driven or monitored by the emulator
  – Basically, a non-returnable exception to the reset vector

Reset Sequence
• POR asserted: HRESET & SRESET asserted, PLL locks, RSTCONF sampled, internal logic reset, HRESET & SRESET negated
• HRESET asserted: HRESET & SRESET asserted, RSTCONF sampled, internal logic reset, HRESET & SRESET negated
• SRESET asserted: SRESET asserted, internal logic reset, SRESET negated

Memory Map
[Diagram: the startup boot map (before and after the configuration word is read) next to the application target map]
• Startup boot map: at boot, CS0 is active for one of two large areas of the address space, so the Flash image repeats throughout that area. All other chip selects are invalid.
• Application target map: Flash, IMMR, I/O (CSi) and RAM (CSx,y,z) each occupy their own region

Memory Map Implications
• Since the Flash memory accessed by CS0 occupies one of two large areas in the address space, boot code can be linked to execute in a number of different locations
• Any branches will change the next instruction address (NIA) from the boot location to the linked location
• All other chip selects are off
• IMMR RAM is still available
• CS0 must be reduced in scope before activating other chip selects
• Be careful not to pull the rug out from under the boot code when reducing CS0
• BSP re-entry issues:
  – Altering chip select option registers while assuming the value in the Valid bit
  – Can the chip selects to the RAM and Flash be altered while running out of either?
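
The aliasing behind these points can be modeled on a host PC. The following is a minimal sketch in C, not 8260 register code: it assumes a chip select hits when the masked address equals the masked base, and that the Flash decodes only its own address lines. The names flash_offset and cs_hit, the 1 Mbyte Flash size, and the window addresses in the usage note are all hypothetical values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset seen by the flash device itself: only the address lines wired to
 * the part are significant, so higher address bits are simply dropped.
 * flash_size is assumed to be a power of two (hypothetical helper). */
uint32_t flash_offset(uint32_t addr, uint32_t flash_size)
{
    assert(flash_size != 0 && (flash_size & (flash_size - 1u)) == 0);
    return addr & (flash_size - 1u);
}

/* Simplified chip select match: the address hits when its masked bits
 * equal the masked base address (an assumed model of a base/mask pair). */
bool cs_hit(uint32_t addr, uint32_t base, uint32_t mask)
{
    return (addr & mask) == (base & mask);
}
```

With a wide boot window (base and mask both 0xFE000000 in this sketch), the addresses 0xFFF00100 and 0xFF800100 both hit CS0 and both map to Flash offset 0x100, so code linked at either address runs. Once CS0 is narrowed to base 0xFF800000 with mask 0xFFF00000, only the second address still hits; boot code still fetching from the first one has had the rug pulled out from under it.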

Memory Map Init Issues
• Three different factors can enhance (confuse) the boot process:
  – The MSR[IP]
    • The reset vector can be 0x0000_0100 or 0xfff0_0100
    • Determined by the Reset Configuration Word
    • Not changed by an SRESET
  – CS0 scope
    • CS0 responds to either the upper or the lower end of the memory map
    • It must be changed while it is being used
    • It may have already been reduced by a previous pass through the BSP
  – Code link results
    • Execution can start in code that is linked to a different address than the boot vector
    • Only the address lines within the memory device are significant
    • PC relative addressing will solve this, right? WRONG!
    • The first branch will set the NIA MSB’s to the current execution value

RTOS Boot Sequences
[Diagram: two boot arrangements sharing one memory layout]
• Compressed application image: boot code decompresses and relocates the application from flash
• External application image: boot code loads the application over a communication channel or backplane
• Flash holds the boot code and BSP; IMMR holds data, stack, heap, etc.; RAM receives the uncompressed application image; I/O has its own region
• Chip Select x: Base Register (Base Address, V) and Option Register (Mask, Options)

Endian Bus Connections
[Diagram: bit numbering of the byte lanes for 68K, X86 and PPC, each shown against an 8 bit device numbered 7-0. 68K and X86 number the MS byte lane 31-24 and the LS byte lane 7-0. PPC numbers the MS byte lane 0-7 and the LS byte lane 24-31.]

Big Endian Bus
[Diagram: 8, 16, 32 and 64 bit devices attached to the 8260 data bus. The 8260 numbers its data bits 0-63, with bits 0-7 as the MS byte lane and bits 56-63 as the LS byte lane. Narrower devices attach to the most significant lanes, and device bit numbering runs opposite to the 8260 numbering: an 8 bit device’s 7-0 connects to 0-7, a 16 bit device’s 15-8 and 7-0 connect to 0-7 and 8-15, and a 32 bit device’s 31-24 through 7-0 connect to 0-7 through 24-31.]

Configuration Word
• Configuration word is latched from Flash memory during reset cycle
• A 32 bit value is loaded 8 bits at a time from the high order bits of the data bus
  – Immune to boot memory width
• RSTCONF pin allows configuration word to be forced to all zero
• Multiple 8260’s can access the same memory device

Configuration Word Contents
Field order: EARB, EXMC, CDIS, EBM, BPS, CIP, ISPS, L2CPC, DPPC, -, ISB, BMS, BBD, MMR, LBPC, APPC, CS10PC, -, MODCK_H
• EARB – External arbitration
• EXMC – External memory controller
• CDIS – Core disable
• EBM – External bus mode
• BPS – Boot port size
• CIP – Core initial prefix
• ISPS – Internal space port size
• L2CPC – L2 cache control pins
• DPPC – Data parity pin configuration
• ISB – Internal space base address
• BMS – Boot memory space
• BBD – Bus busy disable
• MMR – Mask masters request
• LBPC – Local bus pin configuration
• APPC – Address parity pin configuration
• CS10PC – CS10 pin configuration
• MODCK_H – MODCK high order bits

Configuration Word Format
8 bit wide boot device
Address offset from CS0 | 603 bus MSB byte lane (0-7)
0x00 | Byte 0
0x01 | Ignored
0x08 | Byte 1
0x09 | Ignored
0x10 | Byte 2
0x11 | Ignored
0x18 | Byte 3
0x19 | Ignored

32 bit wide boot device
Address offset from CS0 | 603 bus MSB byte lane (0-7) | 603 bus byte lanes (8-15), (16-23), (24-31)
0x00 | Byte 0 | Ignored
0x04 | Ignored | Ignored
0x08 | Byte 1 | Ignored
0x0C | Ignored | Ignored
0x10 | Byte 2 | Ignored
0x14 | Ignored | Ignored
0x18 | Byte 3 | Ignored
0x1C | Ignored | Ignored

Configuring a single 8260
[Diagram: two arrangements. In one, the 8260 has RSTCONF tied to Vcc. In the other, the 8260 A bus and D bus connect to the Boot Flash.]

Configuring multiple 8260’s
[Diagram: a master 8260 whose A bus and D bus connect to the Boot Flash. Slave 1 through Slave 7 share the buses, and each slave’s RSTCONF is driven by one of the address lines A0 through A6.]

SIU
• The SIU contains the logic to interface the external system components to the 8260
• Contains all of the glue logic needed for a typical embedded application

SIU Overview
• 60x Bus Interface Unit
• PowerPC-to-Local Bridge
• Local Bus Interface Unit
• Memory Controller
• Time Counter/PIT
• Bus Arbiter
• L2 Cache Controller
• System Functions

603e Bus
• Very high performance bus
  – Separate address and data tenures
  – Pipelined
  – Bursting
  – Multi-master
  – Cache snooping

603e bus cycle
[Timing diagram: address and data tenures, including an address only cycle to support cache snoop]

Local Bus
Two busses, one address map:
[Diagram: the memory controller maps Flash, Code/Data SDRAM and CPM Buffer SDRAM into a single address map]

Memory Control
• 12 banks of memory
  – Each can be configured for any type of device
• Glueless support of SDRAM devices
• Glueless support of SRAM, EPROM, Flash
  – Using general purpose chip select machine
• Three user programmable machines
• All memory controllers can be allocated to either the 603 or local bus

System control
• Clock synthesis
• Reset control
• Interrupt control
• Real time clock
• Periodic interrupt timer
• Bus monitor
• Bus arbiter
• Watchdog timer

Interrupt Control
[Diagram: interrupt sources feeding the 603 core. The Software Watchdog Timer or IRQ0 drives MCP. IRQ[1-7] (fall / level), Port C [0-15] (edge / fall), the CPM channels and the on board timers feed the Interrupt Controller, which drives INT.]

SIU Interrupt Vectors
• All external interrupts cause processing at 0xnnn00500
  – There is space for 64 instructions to save processor state and resolve the SIU vector
• Vectors are six bits
  – Shifting with indirect addressing is used to dispatch to service routines
  – A 16 bit load from the long word address of the SIVEC register will point to a 64 entry array of 1K byte (256 instructions) service routines
  – An 8 bit load will allow a 64 entry jump table of branch instructions

SIU Interrupt Vector Register
Bits 0-5    Six bit interrupt code
Bits 6-7    0
Bits 8-15   0
Bits 16-31  0
• 8 bit read from address 0xnnn10C04 returns bits 0-7
• 16 bit read from address 0xnnn10C04 returns bits 0-15
• 32 bit read from address 0xnnn10C04 returns bits 0-31

SIU Interrupt Vectors
8 bit read: six bit interrupt code followed by two zero bits
• Table of branch instructions to ISRs
• Each vector value points to a different branch instruction in the table
Offset  Instruction
_18     ba routine_g
_14     ba routine_f
_10     ba routine_e
_0c     ba routine_d
_08     ba routine_c
_04     ba routine_b
_00     ba routine_a

SIU Interrupt Vectors
16 bit read: six bit interrupt code followed by ten zero bits
• Each vector value points to a block of 1K bytes / 256 instructions
nnnn0c00 - nnnn0fff   256 32-bit instructions
nnnn0800 - nnnn0bff   256 32-bit instructions
nnnn0400 - nnnn07ff   256 32-bit instructions
nnnn0000 - nnnn03ff   256 32-bit instructions

CPM
• Communications processor module
• Direct hardware support for all protocol and application interfaces
  – Ethernet, ATM, HDLC, T1/E1, T3/E3, BiSync, UART, ISDN, PCM highway
  – Parallel I/O
  – Full serial and virtual DMA support

IMMR Format
• All on-chip peripherals are accessed through a single 128K byte area of memory
• Within the first 64K of address space, there are three blocks of dual ported RAM
• The second 64K of address space contains the control registers of the on-chip peripherals

IMMR Map
Upper 64K: Hardware Registers (up to 0x1_ffff)
0x1_2000 - 0x1_4000   SI routing RAM (8K)
0x1_0000 - 0x1_1c00   Control registers (7K)
Lower 64K: Dual Ported RAM
0x0_b000 - 0x0_c000   FCC Data (4K)
0x0_8000 - 0x0_9000   Parameter RAM (4K)
0x0_0000 - 0x0_4000   Buffer Descriptors / uCode / Data (16K)

Dual Ported RAM usage
• The layout of the Dual Ported RAM is determined by the uCode in the CPM
• When the CPM is not in operation, it is nothing more than internal memory
  – During the boot sequence, stack, global data, and heap can reside in this memory
  – Initialization code can be written in C++!
  – A multi-layered boot process can be used
    • First code resides in flash, uses internal RAM to set up chip selects
    • Second code resides in another section of flash and uses external RAM to load main application over a CPM channel
    • Third level is the main application
  – Each level has its own crt0.s function and initializes the EABI from scratch

CPM Overview
[Block diagram: Communications Processor Module]
• Four Timers, Internal Memory Space, Interrupt Controller, Parallel I/O, Baud Rate Generators, 32-bit RISC and Program ROM, Serial DMAs, Virtual IDMAs
• MCC1, MCC2, FCC1, FCC2, FCC3, SCC1, SCC2, SCC3, SCC4, SMC1, SMC2, SPI, I2C
• Time Slot Assigner, Serial Interface

DMA’s
• Serial DMA’s
  – Full bi-directional support of all serial channels
  – Can access the 603 or local bus
• Virtual DMA
  – 4 channels
  – Uses the serial DMA hardware to generate transfers
  – Memory to memory or memory to/from I/O

CPM Buffer Structure
[Diagram: buffer descriptors BD1, BD2, BD3 ... BD128 in the IMMR, each pointing to a data buffer in RAM]

Buffer Descriptor Format
Each entry is 16 bits wide:
• Status and Control
• Data Length
• High Order Pointer
• Low Order Pointer

From Channel to Buffer
[Diagram: communication channel hardware, then Parameter RAM, then Buffer Descriptors in dual ported RAM, then Data Buffers]
• Parameter RAM
  – Location fixed by: hardware channel
  – Format fixed by: protocol
• Dual ported RAM (Buffer Descriptors)
  – Location determined by: Parameter RAM value
  – Format of control and status determined by protocol
• Data Buffers
  – Location determined by: value in Buffer Descriptor, and memory controller mapping of Local/603 bus
  – Format determined by: protocol

SCC’s
• The SCC’s implement the following protocols:
  – SDLC/HDLC
  – AppleTalk
  – UART
  – 10-Mbps Ethernet

Ethernet Frame
Field | Size
Preamble | 7 bytes
Start Frame | 1 byte
Destination Address | 6 bytes
Source Address | 6 bytes
Type / Length | 2 bytes
Data | 46 - 1500 bytes
Frame Check | 4 bytes
[Diagram brackets mark the portion stored by the CPM in the receive buffer and the portion stored by the CPU in the transmit buffer]

Ethernet Buffer Descriptor
Receive Control & Status:  E - W I L F - M - LG NO SH CR OV CL
Transmit Control & Status: R PAD W I L TC DEF HB RC RL RC UN CSL
Common for Transmit and Receive:
• Data Length
• High Order Pointer
• Low Order Pointer

Status and Control Definitions
Receive Control & Status:  E - W I L F - M - LG NO SH CR OV CL
Transmit Control & Status: R PAD W I L TC DEF HB RC RL RC UN CSL
• Empty / Ready: 0 = This buffer is owned by the CPU, 1 = This buffer is owned by the CPM
• Wrap: This is the last BD in this set of BD’s.
• Interrupt: Generate an interrupt after this buffer is used by the CPM.
• Last in Frame: Set by the CPM or the CPU to inform the other that this is the last buffer of a frame.
• First in Frame: Set by the CPM to inform the CPU that this is the start of a new frame.
• Transmit CRC: Transmit the CRC after this buffer

Transmit Frames
BD | R | W | I | L | TC | Note
1 | 0 | 0 | 0 | 0 | 1 | Parameter RAM points to this BD
2 | 0 | 0 | 0 | 0 | 1 |
3 | 0 | 0 | 0 | 0 | 1 |
4 | 0 | 0 | 0 | 0 | 1 |
5 | 0 | 0 | 1 | 1 | 1 |
6 | 0 | 0 | 0 | 0 | 1 | These BD’s are for the next frame for this channel
7 | 0 | 0 | 0 | 0 | 1 |
8 | 0 | 0 | 1 | 1 | 1 |
9 | 0 | 1 | 1 | 1 | 1 | This BD is for a single buffer frame
• After all buffers are filled, “R” is set to “1” in all BD’s in this list

Receive Frames
BD | E | W | I | L | F | Note
1 | 1 | 0 | 0 | 0 | 1 | Parameter RAM points to this BD
2 | 1 | 0 | 0 | 0 | 0 |
3 | 1 | 0 | 0 | 0 | 0 |
4 | 1 | 0 | 0 | 0 | 0 |
5 | 1 | 0 | 0 | 1 | 0 |
6 | 1 | 0 | 0 | 0 | 1 | These BD’s are for the next frame for this channel
7 | 1 | 0 | 0 | 0 | 0 |
8 | 1 | 0 | 0 | 1 | 0 |
9 | 1 | 1 | 0 | 1 | 1 | This BD is for a single buffer frame
• After all buffers are filled, “E” is set to “1” in all BD’s in this list

The [E/R] bits
Direction | Initial value | Operation | Changed by | Changed to | Operation | Changed by | Changed to
Transmit [Ready] | 0 | Fill with data by CPU | CPU | 1 | CPM transmits buffer | CPM | 0
Receive [Empty] | 1 | Fill with data by CPM | CPM | 0 | CPU reads buffer | CPU | 1
• Polarity can be confusing because the sense is reversed for complementary operations. However, the same level always indicates who [CPU vs. CPM] owns the buffer.
• This bit is the same for all protocols on all channels.

The [W] bits
• The Wrap bit is always set to indicate the last buffer descriptor for the channel
• It does not delineate frames!
• The address of the first buffer descriptor is stored in the channel’s parameter RAM
  – The list of BD’s is bounded by the parameter RAM and the [W] bit
• Any BD past a BD with the [W] bit set that is not pointed to by parameter RAM is inaccessible by the CPM
• This bit is the same for all protocols on all channels.

The [I] Bits
• The Interrupt bits generate an interrupt to the CPU when the CPM hands the BD to the CPU
  – Whenever the CPM flips the [E/R] bit to “0”
    • A redundant phrase, the CPM can only flip that bit to “0”, right?
• For transmit, it’s common to only receive an interrupt at the end of transmission of the last buffer
• For receive, the last buffer is not known, so it’s more common to receive an interrupt for most buffers on non-frame oriented protocols
  – If a buffer is small enough that it can’t contain an entire frame, then this bit might be cleared
    • The CPU has to stay ahead of the CPM to know when a wrap occurred
  – On Ethernet, the end of frame interrupt is more efficient
• This bit is the same for all protocols on all channels.

The [L] Bits
• The Last bits indicate the end of a frame within the list of buffer descriptors
• Set and cleared by the CPU on transmit frames
  – The CPM only reads this bit for transmit
• Set by the CPM on receive frames
  – Should be cleared by the CPU before the [E] bit is used to hand the buffer to the CPM
• This bit is not the same for all protocols on all channels.
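
The ownership, wrap, interrupt and last bits described above can be sketched in C. This is a host-side illustration rather than driver code: the struct layout follows the four 16 bit entries of the buffer descriptor format, the masks assume PowerPC numbering (bit 0 is the MSB of the status word) with E/R, W, I and L in bit positions 0, 2, 3 and 4, and the names bd_t, bd_ring_init, bd_next and bd_submit_tx_frame are hypothetical. Protocol specific bits such as [TC] are deliberately left out, and the frame is assumed not to cross the wrap point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Status word masks; bit 0 is the MSB, so bit 0 = 0x8000 (assumed layout). */
#define BD_ER   0x8000u  /* [E]/[R]: 1 = owned by the CPM, 0 = owned by the CPU */
#define BD_WRAP 0x2000u  /* [W]: last BD of this channel's list */
#define BD_INT  0x1000u  /* [I]: interrupt when the CPM hands the BD back */
#define BD_LAST 0x0800u  /* [L]: last buffer of a frame */

typedef struct {
    volatile uint16_t status;  /* Status and Control */
    volatile uint16_t length;  /* Data Length */
    volatile uint32_t buffer;  /* High and Low Order Pointer */
} bd_t;

/* Put every BD in the CPU's hands and mark only the final one with [W]. */
void bd_ring_init(bd_t *ring, size_t count)
{
    assert(ring != NULL && count > 0);
    for (size_t i = 0; i < count; i++) {
        ring[i].status = 0;
        ring[i].length = 0;
    }
    ring[count - 1].status = BD_WRAP;
}

/* Index of the BD after i: [W] sends the walk back to the first BD. */
size_t bd_next(const bd_t *ring, size_t i)
{
    return (ring[i].status & BD_WRAP) ? 0 : i + 1;
}

/* Hand nbufs contiguous BDs, starting at first, to the CPM as one frame.
 * [L] and [I] go on the final BD only; [R] is set working backwards so the
 * CPM cannot start on a frame whose later BDs are not yet ready. */
void bd_submit_tx_frame(bd_t *ring, size_t first, size_t nbufs)
{
    assert(ring != NULL && nbufs > 0);
    for (size_t i = nbufs; i-- > 0; ) {
        uint16_t s = ring[first + i].status & BD_WRAP;  /* keep only [W] */
        if (i == nbufs - 1)
            s |= BD_LAST | BD_INT;
        ring[first + i].status = (uint16_t)(s | BD_ER);
    }
}
```

On a four entry ring, submitting a three buffer frame at index 0 leaves BD 0 and BD 1 with only the ownership bit set, BD 2 with ownership, [L] and [I], and BD 3 untouched with [W]. On real hardware the buffer data would also have to be visible in memory before the ownership bit is written, which this sketch does not model.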

The [F] Bits
• The First bit is only present in receive frames
• Set by the CPM to tell the CPU that this buffer starts a frame
  – An underrun, late collision, or aborted frame can cause a new frame in the next buffer without the [L] bit being set in the previous BD
• Not needed for transmit
  – The CPU will control the state of the CPM with the [L] bit
  – An [L] bit set or an underrun will cause the next buffer to be considered the first buffer of a frame
• This bit is not the same for all protocols on all channels.

The [TC] Bits
• The Transmit CRC bits work in conjunction with the [L] bit
• The [TC] bit is ignored if the [L] bit is cleared
• Initializing all [TC] bits to “1” is a good precaution
• Only custom protocols that don’t use hardware generated CRC’s should have this bit cleared
• This bit is not the same for all protocols on all channels.

Subtle points on BD’s
• Frames can span buffers
• Buffers never span frames
  – Unless you have all hardware support turned off and are running transparent
• Be careful with small receive buffers that have the [I] bit set
  – You’ll get hammered with interrupts
• Turn buffers over to the CPM from last to first
  – If an interrupt interferes with the handoff, an underrun / overflow can occur
• Keep your hands off a BD with the [E/R] bit set
  – Unless you like working weekends

FCCs
• The FCC’s support:
  – 10/100-Mbps Ethernet through an MII
  – Full 155 Mbps ATM SAR through UTOPIA
  – 45 Mbps HDLC (DS-3)
  – Operation is similar to SCCs
• Block mode allows buffers to be dynamically moved into dual ported RAM

FCC Buffer Descriptors
• Identical in format to the SCC’s buffer descriptors
• Except:
  – Buffer descriptors, as well as buffers, are in main memory
  – Pointers to buffer descriptors in the parameter RAM are 32 bits
• Buffer descriptors must still be in consecutive memory locations

SMC’s
• The SMC’s perform basic UART as well as transparent mode transmission
• Buffer descriptor operation is identical to the SCC’s
  – The status and control word has different bit fields pertaining to the protocols
  – Bit fields controlling protocol independent operation are unchanged

Status and Control Definitions [SMC in UART mode]
Receive Control & Status:  E - W I - - CM ID - BR FR PR - OV -
Transmit Control & Status: R - W I - - CM P - - - - -
• Empty / Ready: 0 = This buffer is owned by the CPU, 1 = This buffer is owned by the CPM
• Wrap: This is the last BD in this set of BD’s.
• Interrupt: Generate an interrupt after this buffer is used by the CPM.
• Continuous mode: [E] bit isn’t cleared on buffer reception
• Idle: Close buffer on reception of idles

MII
[Diagram: signals between the PQ II MPC 8260 FCCn and a Fast Ethernet PHY]
• Transmit Error (Tx_ER)
• Transmit Nibble Data (TxD[3:0])
• Transmit Enable (Tx_EN)
• Transmit Clock (Tx_clk)
• Collision Detect (COL)
• Receive Nibble Data (RxD[3:0])
• Receive Error (Rx_ER)
• Receive Clock (Rx_clk)
• Receive Data Valid (Rx_DV)
• Carrier Sense output (CRS)
• Management Data Clock (MDC)
• Management Data I/O (MDIO)

Utopia Interface
[Diagram: MPC8260 connected to a PM5350]
• MPC8260 signals: A[24-31], D[0-7], ATMCS0*, BCTL0*, PWE0*/PDQM/PBS0*, ATMRST*, DP6/CSE0/IRQ6*
• PM5350 signals: A[7-0], D[7-0], CS*, RD*, WR*, RST*, ALE, INT*

Applications
• Performance drives the complexity of the 8260 system
  – Single processor
    • Single 8260
    • Multiple 8260’s with all but one core turned off
    • Multiple 8260’s with all cores off, using an external MPC750
  – Multiple processor
    • Combinations of 8260’s and 750’s

Single 8260
[Diagram: one MPC8260. The 60x Bus connects SDRAM/SRAM/DRAM/Flash. The Local Bus connects SDRAM/SRAM/DRAM holding the ATM Connection Tables. The communication channels connect to PHYs, and UTOPIA connects to a 155 Mbps ATM PHY.]

Multiple 8260s
[Diagram: two MPC8260 devices sharing SDRAM/SRAM/DRAM/Flash on the 60x Bus. Each device has its own Local Bus with SDRAM/SRAM/DRAM for ATM Connection Tables, its own PHYs on the communication channels, and its own 155 Mbps ATM PHY on UTOPIA.]

MPC7xx w/ 8260(s)
[Diagram: an MPC7xx (32-Kbyte I cache, 32-Kbyte D cache, Backside Cache) and an MPC8260 sharing SDRAM/SRAM/DRAM/Flash on the 60x Bus. The MPC8260 keeps its Local Bus with SDRAM/SRAM/DRAM for ATM Connection Tables, its PHYs on the communication channels, and its 155 Mbps ATM PHY on UTOPIA.]

Debug Considerations
• What is JTAG
• JTAG Limitations
• Getting out of reset
• The 60x Core and Bus
• The cache is on
• CPM Realities
• Exception Routines
• Tracing at the Bus Cycle Level

What is JTAG?
• JTAG is a SLOW serial connection to the 8260 CPU resources
• The serial data is called the scan chain.
• JTAG provides the ability to modify memory and registers.
• The scan chain for each processor is different.
• JTAG was not created for Debug…

JTAG connection
• JTAG connector allows for full run control of the processor
• The emulator can sync with the processor without disrupting its state
• Connector signals: TDO, TDI, QREQ*, TCK, TMS, SRESET*, HRESET*, XBR3*, TRST*, 3.3V, GND, GND

JTAG Limitations
• Slow download of code to RAM.
• JTAG accesses during execution MAY dramatically affect performance.
• All commands through JTAG must be “scanned in”

Getting out of Reset
• The Reset Configuration word is of vital importance
• TRST must not be permanently asserted
• When flashing your boot code, be careful to replace or keep the configuration word
• What is your Interrupt Prefix?
• Switchable pullup on RSTCONF*?

The 60x Core and Bus
• A STOP instruction must be scanned in (no breakpoint pin)
• Only one hardware code breakpoint available; no hardware data breakpoints
• Address and Data do not necessarily appear on the bus at the same time
• Predictive Fetching means what you see on the bus may not be executed.

The Caches are On
• Bus Cycles now appear as bursts
• Fetches are determined by the BIU, not related to instruction execution
• No Cache Visibility pins
• Instrumentation required for accurate debug
• Caution must be exercised when the boot process performs a code relocation
  – Contents are cached as data during the move
  – Contents are fetched as instructions after the move
  – The instruction queue doesn’t snoop the data cache
  – The load/store unit doesn’t snoop the instruction cache
  – There is no cache coherency for the instruction cache

CPM Realities
• The CPM operates independently of the CPU
• The CPM is not debugged yet. Expect the unexpected
• Early releases of the silicon didn’t propagate watchdog resets to the external reset pin
• “Last Buffer Interrupt” occurs at the beginning of transmission

Exception Routines
• Exception Routines are difficult to debug
• The Recoverability of exceptions is an issue
• On board hardware breakpoints do not work in the head or tail of an exception handler

Tracing at the Bus Cycle Level
• The 8260 comes in a BGA package
• Connecting to an emulator
• Connecting to an analyzer

Connecting to an Emulator
[Diagram: stack-up from top to bottom: Connection to Emulator, Buffer Board, Original 8260, BGA site to pin socket, Target Adaptor, Pin header, Target board]

Connecting to an Analyzer
[Diagram: 8260 on the Target board, with Mictor Connectors]

Connecting to an Emulator or Analyzer
[Diagram: stack-up from top to bottom: Connection to Emulator - OR - Socket to Mictor adaptor, Buffer Board, Original 8260, BGA site to pin socket, Target Adaptor, Pin header, Target board]

Summary of debug issues
• Init MMU before turning on caches
• Loads and stores can be re-ordered
• The CPM doesn’t use the MMU’s or the caches
• Don’t single step through moves to or from SPR’s
• ISR’s cannot have breakpoints in the first or last few instructions
• Each processor must have its own JTAG connector
• JTAG lines must be terminated with 1K or 2K values (depending on the signal)
• JTAG connector should be within 2 inches of the processor
• Provide for the ability to pull RSTCONF high
• When using the 750 as the CPU, provide the ability to access the 8260 configuration word in flash
• Don’t place code or program data on the local bus
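
The point that loads and stores can be re-ordered matters most for device register accesses. The following is a minimal sketch in C, assuming a GCC-compatible compiler: it wraps register accesses so that each one is followed by the PowerPC eieio instruction when built for PowerPC, and by a plain compiler barrier on any other host. The names IO_BARRIER, reg_write32 and reg_read32 are hypothetical, and whether eieio alone is sufficient (rather than sync) depends on how the region's WIMG bits are set, which this sketch does not decide.

```c
#include <assert.h>
#include <stdint.h>

/* Ordering barrier: eieio on PowerPC builds, otherwise only a compiler
 * barrier so the sketch still builds and runs on a development host.
 * Uses GCC-style inline assembly (an assumption about the toolchain). */
#if defined(__powerpc__) || defined(__PPC__)
#define IO_BARRIER() __asm__ volatile ("eieio" ::: "memory")
#else
#define IO_BARRIER() __asm__ volatile ("" ::: "memory")
#endif

/* Store to a device register, then fence so later accesses stay behind it. */
void reg_write32(volatile uint32_t *reg, uint32_t value)
{
    assert(reg != 0);
    *reg = value;
    IO_BARRIER();
}

/* Load from a device register, then fence before any following access. */
uint32_t reg_read32(const volatile uint32_t *reg)
{
    assert(reg != 0);
    uint32_t value = *reg;
    IO_BARRIER();
    return value;
}
```

A command write followed by a status read would call reg_write32 and then reg_read32, so the store cannot be overtaken by the load. Marking the I/O region as guarded and cache inhibited in the MMU addresses the same problem from the memory attribute side.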