Architectural Complexity: Opening the Black Box
Methods for Exposing Internal Functionality of Complex Single and Multiple Processor Systems
EECC-756

Modern Design Trends
Larger on-chip caches
Extended levels of cache
System-on-a-chip integration
Overall increasing design complexity
All lead to more complex debugging of designs

The Good News
Automated design tools are minimizing design errors
IP reuse minimizes bugs
Simulation tools discover most logic errors before fabrication
Massive test suites allow comprehensive testing
So what happened to Intel with the FPU flaw?

Past Methods for Debugging
Signal probing
Bus monitoring
Software debugging

Past Methods for Debugging (cont’d)
Signal probing
  – More internal logic per pin = less info on pin
  – Pin inaccessibility due to modern packages (e.g. sockets, BGAs)
Bus monitoring
  – Caches hide data accesses
Software debugging
  – Impractical for real-time applications
  – Little or no hardware support in the past

Solutions
Test Access Port (TAP)
  – Uses the JTAG IEEE 1149.1 specification for boundary scan
Probe Mode
  – Allows step-by-step analysis of code impact on internal registers
In-Circuit Emulation (ICE)
  – Allows execution tracing
  – Real-time applicability

Test Access Port (TAP)
Implementation of the JTAG IEEE 1149.1 boundary scan specification
Allows access to all internal flip-flops in the boundary scan chain
Numerous chains serve different functions (e.g. IO flip-flops)
Allows non-destructive snapshot of internal state at any point in time

Test Access Port (cont’d)
Single instruction register
Multiple data registers (scan chains)

Probe Mode
Special processor mode halts program execution
Uses the TAP interface to receive instructions and output internal data
Allows read/write access to any internal register
Allows memory accesses to test cache functionality

Probe Mode (cont’d)
[Figure]

In-Circuit Emulation (ICE) Support
Special pins provide branching information
Example: Pentium Dual Pipeline – 3 dedicated pins
  IU – Asserted when an instruction completes in the U instruction pipeline
  IV – Asserted when an instruction completes in the V instruction pipeline
  IBT – (Instruction Branch Taken) Asserted when a branch is taken

In-Circuit Emulation (cont’d)
Branch signal information provides real-time code tracing
Branch trace message buffers provide further information
Branch trace message buffers in conjunction with Probe Mode allow detailed real-time code tracing

Branch Trace Message Buffers
FIFO queue
Can be read through TAP during program execution
Circular mode (trace-back from breakpoint) vs. Jump-to-Probe Mode (maintain instruction stream)
Incident counter expands buffer size
Intel automatically generates a special BTM cycle on the local bus to export BTM info

Branch Trace Buffer Logic Implementation
[Figure]

Multiprocessor Issues
Three methods for opening the “black box” on a single processor system
  – TAP (boundary scan)
  – Probe Mode
  – Branch Tracing Methods for ICE
Multiple processor system design also has challenges

Multiprocessor Challenges
Race conditions due to parallel data accesses
Inconsistent and unpredictable network paths
Differing processor behaviors on heterogeneous networks
Communication patterns that restrict performance or scalability

Multiprocessor Solutions: Debugging Code
Create a sequential version of the code
Execute parallel tasks on a single computer as separate processes
Visualization tools that create space-time diagrams or animations to show 2-dimensional changes of state
Unified Trace Environment (IBM)

Multiprocessor Solutions: Debugging Designs
Ability to monitor communication packets circumvents most visibility problems
  – Debug messages can be included in packets
Network protocol simulations
  – Protocol verification programs (e.g. Petri nets)
  – Network communication pattern simulators
However ...

Multiprocessor Design Trends
Currently, uniprocessor designs are hitting roadblocks
  – Large dies: impractical signal transit time
  – Routing increases exponentially with die size
One possible solution: multiple processors on a single die
Re-emergence of visibility problems

Conclusion
Several methods available for internal execution tracing of uniprocessors
  – Test Access Port (JTAG IEEE 1149.1)
  – Probe Mode extension
  – Branch Tracing
Don’t count out TAP, Probe Mode, and ICE for multiprocessors
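
Appendix sketch: circular-mode branch trace buffer
The slides describe the branch trace message buffer as a FIFO queue that, in circular mode, supports trace-back from a breakpoint. The Python model below is only a rough software illustration of that idea, not the hardware implementation. The class names, the (source, target) message layout, the depth of 4, and the addresses are all assumptions made for the example.

```python
from collections import deque
from typing import List, NamedTuple


class BranchMessage(NamedTuple):
    """Hypothetical trace record: where a taken branch came from and went to."""
    source: int
    target: int


class BranchTraceBuffer:
    """Fixed-depth FIFO operating in circular mode (illustrative model only)."""

    def __init__(self, depth: int = 4) -> None:
        # deque(maxlen=...) drops the oldest entry when full, which matches
        # the overwrite behavior of a circular buffer.
        self._fifo = deque(maxlen=depth)
        self.overwritten = 0  # messages lost to wrap-around

    def record(self, source: int, target: int) -> None:
        """Store one taken branch, overwriting the oldest entry if full."""
        if len(self._fifo) == self._fifo.maxlen:
            self.overwritten += 1
        self._fifo.append(BranchMessage(source, target))

    def trace_back(self) -> List[BranchMessage]:
        """Return buffered branches, oldest first, as read at a breakpoint."""
        return list(self._fifo)


def run_demo() -> BranchTraceBuffer:
    buf = BranchTraceBuffer(depth=4)
    # Six taken branches with made-up addresses; only the last four survive.
    for i in range(6):
        buf.record(source=0x1000 + 0x10 * i, target=0x2000 + 0x10 * i)
    return buf


if __name__ == "__main__":
    for msg in run_demo().trace_back():
        print(f"{msg.source:#06x} -> {msg.target:#06x}")
```

Because the buffer overwrites its oldest entries, a reader stopped at a breakpoint sees only the most recent branches. This limitation is presumably what Jump-to-Probe Mode avoids by preserving the full instruction stream.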