Selected Topics in VLSI
Topic #1: IP-based Design and Verification Flow
C.P. Ravikumar
IIT Madras
Texas Instruments, India

Changing face of VLSI Design
- Domains are constantly changing
- Computer Revolution
- Communication Revolution
- Medical Revolution
- Name a company that was once known as a computer maker but is known in the communication space today.

Integration
- Attempt to integrate multiple functionalities – the quest for the Swiss army knife
- Will it be the computer?
- Will it be the cell phone?
- Will it be something else?
- How has the quest for this kind of integration impacted VLSI Design?

Consumer Markets
- Reaching out to larger markets
- Consumer devices (Toys, Cameras, Gadgets)
- Consumers from different countries
- What do these mean from a VLSI Design viewpoint?

Consumers expect connectivity
- Consumers want their gadgets to communicate with one another
- Wired communication
- Wireless communication
- HW#1 - For the communication protocols shown, make a table that shows the type of communication (serial/parallel, wireless/wired, short range/medium range) and 2 typical applications

Consumers expect graphical user interfaces and multimedia
- Digital Signal/Image Processing

Chip Design – Some Metrics
- What are some quality metrics for a design?
- Do all the metrics have the same weight?
- Which one do you think matters most?
- What are some tradeoffs you can foresee?

Time to Market and Design Cost
- Usually, “winner takes all” and late entrants face an uphill task
- What are the ways to reduce the TTM?
- Design Time + Manufacturing Time + Test Time
- Better design flows and tools are needed to reduce design time

IP Reuse based Design Flow - Benefits
- Design Reuse can help reduce design time
- IP Reuse can help reduce design cost
- Vendors provide IP such as processors, memories, connectivity blocks
- Team sizes and team skills
- IP Reuse allows us to separate IP Creation from IP Integration
- Consider the options of (a) having an in-house analog team for designing IP such as ADC/DAC versus (b) purchasing such IP and integrating it. What are the relative benefits/issues?

Design Productivity Gap

Chip Design Flows
- Custom Design Flow
- RTL to GDSII Design Flow (aka ASIC Design Flow)
- IP-based Design Flow
- Platform-based Design Flow

RTL-to-GDSII Design Flow – at 10,000ft
- RTL creation and verification
- Block-level synthesis, timing analysis and ATPG
- Floorplanning
- Top-level RTL synthesis, timing analysis and ATPG
- Physical Design
- Parasitic Extraction
- Top-level timing analysis and ATPG
- Handoff
[Figure labels: constraints, sc.lib, macros, top.v(hd), B1.v(hd), B2.v(hd), top.gdsii, B1, B2, reports]

IP Reuse - Challenges
- IP vendors have multiple clients with different requirements
- One size may not fit all
- Does IP meet all functional requirements?
- Standards compliance
- IP verification becomes a challenge
- Multiple vendors may supply IP – extra effort needed to interface them (“Heterogeneous IP”)
- Does IP meet area and performance constraints?
- Sometimes, there may be surprises after the IP is integrated
- We may need to add extra hardware for protocol translation

IP integration challenges
- Verification in the context of the SoC
- Performance
- Standards Compliance
- Functionality
- Unconnected pins, Unknown values, etc.
- Three students (A, B, C) form a team for writing a report. B and C write sections. A integrates them. What problems might come up during integration?
- Page count exceeds limit, File provided by C not compatible, File provided by B may have a virus, Repetitions, Referencing problems, etc.
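The "Unconnected pins" item on the IP integration challenges slide can be illustrated with a very small connectivity check. This is only a sketch under stated assumptions: the port names, the net names and the function `find_unconnected_pins` are hypothetical and do not come from any real IP block or EDA tool. Real flows run such checks on the netlist with lint or formal connectivity tools.

```python
# Hypothetical example: every port and net name below is made up for illustration.

def find_unconnected_pins(ip_ports, connections):
    """Return the IP ports that have no net attached at the SoC top level."""
    return sorted(p for p in ip_ports if connections.get(p) is None)


# Ports declared by a purchased ADC IP (assumed names).
adc_ports = ["clk", "rst_n", "start", "data_out", "busy"]

# Port-to-net hookup written by the integrator; two ports were forgotten.
top_level_hookup = {
    "clk": "sys_clk",
    "rst_n": "sys_rst_n",
    "data_out": "adc_bus",
}

print(find_unconnected_pins(adc_ports, top_level_hookup))  # ['busy', 'start']
```

An unconnected input such as `start` is also a typical source of the "Unknown values" mentioned on the same slide, since a floating input simulates as X.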
Example
- You are designing a 32-bit two’s complement multiplier and wish to use an adder as an IP
- You only have a 16-bit adder available
- One IP assumes that inputs arrive MSB-first, whereas another IP provides its outputs LSB-first
- One IP assumes Big-Endian storage of data; another IP stores data in Little-Endian format
- Voltage levels of two IP may be different
- Interface wrappers are used to overcome these problems. Wrappers will result in overheads.

Eight elements for judging the quality of Silicon Hardware/Software IP
Courtesy: Synopsys (2006)

IP cores
- Hard core, Soft core
- Hardware IP, Software IP
- Design IP, Test IP, Verification IP
- Processors, Memories, Hardware Accelerators, Peripherals, Analog
- Search for some IP vendors who provide processor cores, memory cores, design libraries, Analog IP, IP for USB 2.0

Hard IP and Soft IP
Hard IP
- Technology Dependent
- Predictable – already proved in silicon
- More protected against illegal usage
- Limitation – cannot be customized
Soft IP
- Technology Independent
- Risk
- Can be modified to suit the needs of the SoC
Application-specific Hard IP – tries to combine the best of both worlds.
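The interface-wrapper idea from the Example slide (a 16-bit adder where 32 bits are needed, MSB-first versus LSB-first streams, Big-Endian versus Little-Endian storage) can be sketched in software. This is a behavioural model only, not RTL. The function names `adder16`, `add32_with_adder16`, `swap_endian32` and `reverse_bit_order` are assumptions made for this illustration.

```python
MASK16 = 0xFFFF


def adder16(a, b, carry_in=0):
    """Stand-in for the 16-bit adder IP: returns (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def add32_with_adder16(a, b):
    """Wrapper: build a 32-bit add from two passes through the 16-bit IP."""
    lo, carry = adder16(a & MASK16, b & MASK16)
    hi, _ = adder16(a >> 16, b >> 16, carry)  # final carry is dropped (mod 2**32)
    return (hi << 16) | lo


def swap_endian32(word):
    """Wrapper: convert a 32-bit word between Big-Endian and Little-Endian."""
    return int.from_bytes(word.to_bytes(4, "big"), "little")


def reverse_bit_order(bits):
    """Wrapper: turn an MSB-first bit sequence into LSB-first, or back."""
    return bits[::-1]


print(hex(add32_with_adder16(0x0000FFFF, 0x00000001)))  # 0x10000
print(hex(swap_endian32(0x12345678)))                   # 0x78563412
print(reverse_bit_order([1, 0, 0]))                     # [0, 0, 1]
```

The 32-bit add needs two passes through the 16-bit adder, a small instance of the overhead that wrappers introduce.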
Hardware/Software Codesign
- Two important steps in IP integration
  - Provide a hardware interface: Pin/Signal mapping, Protocol translation, Buffering
  - Software Driver: Access to IP functionality through the OS
- Implement some functions in hardware and others in software for area/performance/power tradeoff
- Example of JPEG: DCT in hardware, other functions in software

IP Standards
- OCP-IP is a well-known standard to mitigate the problem of interfacing IP from multiple vendors
- Many IP are OCP-compliant

IP-based Design – Decisions impact system cost, power, performance
- Including an IP can increase chip cost, but can bring down system cost
- Examples - DDR2 Interface, Power Management, Security IP
- Knowledge of the target market is essential to make such decisions

Ideal ESL Design Flow
© 2008 Sudeep Pasricha & Nikil Dutt

Platform Based Design
- Most System-on-Chip designs need
  - One or more processors
  - Digital Signal Processors
  - Hardware Accelerators
  - Peripheral IP (touch screen, etc)
  - Connectivity IP
  - Embedded Memories
  - Analog/Mixed-Signal IP
- Custom-designing the SoC for each application has advantages, but is infeasible!
- What are the limitations of the top-down approach?

Even platforms are many
- Choices galore … Selection tools can help the customer
- http://tinyurl.com/ti-mcu-selection-html
- http://tinyurl.com/ti-dsp-selection-html
- Do you need flash memory?
- How much RAM do you need?
- What is the operating clock-speed?
- How many CAN controllers do you need? …

How do you answer these questions?
- Is multimedia performance needed?
- What is the target operating system?

Find out what these mean
- IDM, ODM and OEM
- VSIA and SPIRIT
- OCP/IP
- PSL and OVL
- ESL, UML
- Simulator, Emulator

Selected Topics in VLSI
OCP IP
C.P. Ravikumar

What is OCP?
- Open Core Protocol
- What it helps to solve
  - System Integration challenge
- Areas of focus
  - High-performance multithreading
  - Synchronization Primitives
  - Data transfers
  - Cache Coherence
  - Power Management

Features of OCP-IP
Unique Features of OCP IP
- Openly licensed
- Simple to Complex protocols needed in smart phones
- Supported by many international industrial bodies
- Bus-independent
- Scalable
- IP-XACT format - XML based language defined by the SPIRIT consortium
- Synchronous and unidirectional signaling

OMAP 2420 used OCP for the first time

Cache Coherence
[Figure: CPU 1, CPU 2 and CPU 3, each with its own cache, connected by a shared bus to a shared memory holding X: 24]
CPU1: T = read X;   // T has the value 24 and X is cached
CPU2: L = read X;   // L has the value 24 and X is cached
CPU2: X = 32;       // Local copy of X is updated.
CPU3: M = read X;
- Notice that having write-through caches is not good enough

Snoopy Cache
- Each CPU cache ‘snoops’ for write activity on data addresses which it has cached
- Needs a bus structure which is ‘global’
- All communication can be seen by all

Write-Invalidate Protocol for Snoopy Cache
- A CPU that wishes to write to an address gets a bus cycle and sends a ‘write invalidate’ message
- All snooping caches invalidate their copy of the appropriate cache block
- CPU writes to its cached copy
- Also updates the shared memory if the ‘Write Through’ protocol is followed
- If we use the ‘Write Back’ scheme, things are more complex!
- Any shared read in other CPUs will now miss in cache and re-fetch new data

Write-Update Protocol for implementing Snoopy Cache
- CPU wanting to write gets a bus cycle and broadcasts new data as it updates its own copy
- All snooping caches update their copy

MESI Protocol
- A practical multiprocessor invalidate protocol that attempts to minimize bus usage
- Allows usage of a ‘write back’ scheme - i.e.
main memory is not updated until a ‘dirty’ cache line is displaced
- Uses an extension of usual cache tags
  - ‘invalid block’ tag
  - ‘dirty block’ tag

4 states of a cache block (MESI Protocol)
- 2 bits to represent 4 states
- Invalid – Block is not valid (as in simple cache)
- Exclusive - cache block is the same as main memory and is the only cached copy
- Shared – cache block is same as main memory but copies may exist in other caches
- Modified - cache block has been modified, is different from main memory - is the only cached copy

Illustrating the states of cache blocks
[Figure: four panels, each showing processors P1, P2 and P3, with accesses labeled r (read) and w (write)]

Local Read Hit
- Block must be in one of the states M, E or S
- If in M state, it must have been modified locally
- Simply return value
- No state change

Local Read Miss
Case 1 - No other copy in caches
- Processor makes bus request to memory
- Value read to local cache, marked E
Case 2 - One cache has E copy
- Processor makes bus request to memory
- Snooping cache puts copy value on the bus
- Memory access is abandoned
- Local processor caches value
- Both lines set to S
Case 3 - Several caches have S copy
- Processor makes bus request to memory
- One cache puts copy value on the bus (arbitrated)
- Memory access is abandoned
- Local processor caches value
- Local copy set to S
- Other copies remain S
Case 4 - One cache has M copy
- Processor makes bus request to memory
- Snooping cache puts copy value on the bus
- Memory access is abandoned
- Local processor caches value
- Local copy tagged S
- Source (M) value copied back to memory
- Source tag M -> S

Local Write Hit
- Line must be one of M, E or S
Case M
- Line is exclusive and already ‘dirty’
- Update local cache value
- No state change
Case E
- Update local cache value
- State E -> M
Case S
- Processor broadcasts an invalidate on bus
- Snooping processors with S copy change S -> I
- Local cache value is updated
- Local state change S -> M

Local Write Miss (1)
Case 1 - No other copies
- Value read from memory to local cache (?)
- Value updated
- Local copy state set to M
Case 2 – (One E or multiple S)
- Value read from memory to local cache - bus transaction marked RWITM (read with intent to modify)
- Snooping processors see this and set their copy state to I
- Local copy updated & state set to M

Local Write Miss (2)
Another copy in state M
1. Processor issues bus transaction marked RWITM
2. Snooping processor sees this
   - Blocks RWITM request
   - Takes control of bus
   - Writes back its copy to memory
   - Sets its copy state to I
3. Original local processor reissues RWITM request
4. Is now simple no-copy case
   - Value read from memory to local cache
   - Local copy value updated
   - Local copy state set to M

State Diagram for locally initiated access
[Figure: state diagram over the states Invalid, Shared, Exclusive and Modified. Transition labels: Read Hit, Read Miss (sh), Read Miss (ex), Write Hit, Write Miss. Bus transactions: Mem Read, RWITM, Invalidate.]

State Diagram – Remotely initiated access
[Figure: state diagram over the states Invalid, Shared, Exclusive and Modified. Transition labels: Mem Read, RWITM, Invalidate. Some transitions are marked as copy back.]

Snoopy Cache - comments
- Snooping protocols rely on a broadcast medium (e.g. shared bus) for implementing coherence
- Do not scale well for large number of processors
- Not recommended for 32+ processors

Directory-based protocols
- No need for a shared bus (can use point-to-point connections)
- Therefore, more scalable (100+ processors)
- Each processor can have its own private memory
- Each node maintains a directory storing cache information and memory information
- A processor communicates with the directory to access memory

Communicating with the directory
- If a processor requests a non-local memory page, the directory uses its information to find the page
- Then, it uses messages to retrieve the page and ensure all other processors have consistent info.
- Since the directory tracks which processors are caching the page, it only needs to send messages to those processors

Point-to-Point Communication

Reading
- http://tinyurl.com/y965he6
- S.
Pasricha and N. Dutt, On-Chip Communication Architectures, Morgan Kaufmann, 2008.
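
The MESI rules from the Local Read Hit, Local Read Miss, Local Write Hit and Local Write Miss slides can be replayed on the three-CPU example from the Cache Coherence slide. The sketch below is a simplified model and makes several assumptions: the class names `Cache` and `Bus` are hypothetical, there is a single shared word X, bus arbitration is not modelled, and a copy in state M is written back to memory before the requester reads it.

```python
class Cache:
    """One cache line holding the shared word X."""
    def __init__(self):
        self.state = "I"    # one of "M", "E", "S", "I"
        self.value = None


class Bus:
    """Shared bus with one memory word; every cache snoops every transaction."""
    def __init__(self, n_cpus, initial):
        self.memory = initial
        self.caches = [Cache() for _ in range(n_cpus)]

    def _others(self, cpu):
        return [c for i, c in enumerate(self.caches) if i != cpu]

    def _write_back_modified(self, cpu):
        # A remote copy in state M is the only up-to-date one: copy it back.
        for c in self._others(cpu):
            if c.state == "M":
                self.memory = c.value

    def read(self, cpu):
        local = self.caches[cpu]
        if local.state in ("M", "E", "S"):   # local read hit: no state change
            return local.value
        # Local read miss.
        self._write_back_modified(cpu)
        sharers = [c for c in self._others(cpu) if c.state != "I"]
        for c in sharers:
            c.state = "S"                    # E -> S, M -> S, S stays S
        local.value = self.memory
        local.state = "S" if sharers else "E"
        return local.value

    def write(self, cpu, value):
        local = self.caches[cpu]
        if local.state == "I":               # local write miss: RWITM
            self._write_back_modified(cpu)
        if local.state in ("S", "I"):        # remote copies must be invalidated
            for c in self._others(cpu):
                c.state = "I"
        local.value = value                  # write hit in E or M needs no bus
        local.state = "M"


# Replay of the Cache Coherence slide: X starts at 24 in shared memory.
bus = Bus(n_cpus=3, initial=24)
t_val = bus.read(0)      # CPU1: T = read X
l_val = bus.read(1)      # CPU2: L = read X
bus.write(1, 32)         # CPU2: X = 32
m_val = bus.read(2)      # CPU3: M = read X
print(t_val, l_val, m_val)               # 24 24 32
print([c.state for c in bus.caches])     # ['I', 'S', 'S']
```

In this model CPU3 reads 32, because CPU2's modified copy is written back when CPU3 misses, and CPU1's stale copy has already been invalidated by CPU2's write.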