Topic #1 IP-based Design and Verification Flow

Selected Topics in VLSI
Topic #1
IP-based Design and Verification Flow
C.P. Ravikumar
IIT Madras
Texas Instruments, India
Changing face of VLSI Design

Domains are constantly changing



Computer Revolution
Communication Revolution
Medical Revolution
Name a company that was once known as a computer maker but is known in the communication space today.
Integration

Attempt to integrate multiple functionalities: the quest for the Swiss Army knife



Will it be the computer?
Will it be the cell phone?
Will it be something else?
How has the quest for this kind of integration impacted VLSI Design?
Consumer Markets

Reaching out to larger markets


Consumer devices (Toys, Cameras, Gadgets)
Consumers from different countries
What do these mean from a VLSI design viewpoint?
Consumers expect connectivity

Consumers want their gadgets to communicate with one another


Wired communication
Wireless communication
HW#1 - For the communication protocols shown, make a table that shows the type of communication (serial/parallel, wired/wireless, short/medium range) and two typical applications of each.
Consumers expect graphical user interfaces and multimedia

Digital Signal/Image Processing
Chip Design – Some Metrics

What are some quality metrics for a design?
Do all the metrics have the same weight? Which one do you think matters most? What are some tradeoffs you can foresee?
Time to Market and Design Cost


Usually, the “winner takes all”, and late entrants face an uphill task
What are the ways to reduce time to market (TTM)?


TTM = Design Time + Manufacturing Time + Test Time
Better design flows and tools are needed to reduce design time
IP Reuse based Design Flow - Benefits

Design Reuse can help reduce design time


IP Reuse can help reduce design cost


Vendors provide IP such as processors, memories and connectivity blocks
Team sizes and team skills
IP Reuse allows us to separate IP Creation from IP Integration
Consider two options: (a) having an in-house analog team design IP such as ADCs/DACs, versus (b) purchasing such IP and integrating it. What are the relative benefits and issues?
Design Productivity Gap
Chip Design Flows

Custom Design Flow

RTL to GDSII Design Flow (aka ASIC Design Flow)

IP-based Design Flow

Platform-based Design Flow
RTL-to-GDSII Design Flow – at 10,000 ft








RTL creation and verification
Block-level synthesis, timing analysis and ATPG
Floorplanning
Top-level RTL synthesis, timing analysis and ATPG
Physical Design
Parasitic Extraction
Top-level timing analysis and ATPG
Handoff
[Figure labels: constraints, sc.lib, macros, top.v(hd), B1.v(hd), B2.v(hd), top.gdsii, B1, B2, reports]
IP Reuse - Challenges

IP vendors have multiple clients with different requirements


One size may not fit all
Does IP meet all functional requirements?


Standards compliance
IP verification becomes a challenge

Multiple vendors may supply IP, so extra effort is needed to interface them (“heterogeneous IP”)

Does IP meet area and performance constraints?


Sometimes, there may be surprises after the IP is integrated
We may need to add extra hardware for protocol translation
IP integration challenges

Verification in the context of the SoC




Performance
Standards Compliance
Functionality
Unconnected pins, Unknown values, etc.
Three students (A, B, C) form a team to write a report. B and C write sections; A integrates them. What problems might come up during integration?
Page count exceeds the limit, file provided by C is not compatible, file provided by B may have a virus, repetitions, referencing problems, etc.
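Checks such as “unconnected pins” are usually automated with lint tools during integration. As a minimal sketch, assuming a hypothetical netlist representation (a dict from instance name to a dict of pin name to net name, with None marking an unconnected pin), such a check could look like this. Detecting unknown (X) values needs simulation and is not covered here.

```python
# Sketch of an integration lint check for unconnected pins and dangling nets.
# ASSUMPTION: the netlist format (instance -> {pin: net or None}) is
# hypothetical; a real lint tool would read it from the Verilog netlist.
from typing import Dict, List, Optional, Tuple

Netlist = Dict[str, Dict[str, Optional[str]]]


def find_unconnected_pins(netlist: Netlist) -> List[Tuple[str, str]]:
    """Return (instance, pin) pairs whose pin is not tied to any net."""
    return [
        (inst, pin)
        for inst, pins in sorted(netlist.items())
        for pin, net in sorted(pins.items())
        if net is None
    ]


def find_dangling_nets(netlist: Netlist) -> List[str]:
    """Return nets that touch only one pin, a likely missing connection."""
    fanout: Dict[str, int] = {}
    for pins in netlist.values():
        for net in pins.values():
            if net is not None:
                fanout[net] = fanout.get(net, 0) + 1
    return sorted(net for net, count in fanout.items() if count == 1)


if __name__ == "__main__":
    # Hypothetical SoC fragment: a CPU core wired to a USB IP block.
    soc: Netlist = {
        "cpu0": {"clk": "clk", "irq": "usb_irq", "dbg": None},
        "usb0": {"clk": "clk", "irq": "usb_irq", "vbus": "vbus_sense"},
    }
    print(find_unconnected_pins(soc))  # [('cpu0', 'dbg')]
    print(find_dangling_nets(soc))  # ['vbus_sense']
```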
Example

You are designing a 32-bit two’s complement multiplier and wish to use an adder as IP




You only have a 16-bit adder available
One IP assumes that inputs arrive MSB-first, whereas another IP provides its outputs LSB-first
One IP assumes big-endian storage of data; another stores data in little-endian format
Voltage levels of two IP may be different
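The MSB-first/LSB-first and endianness mismatches above reduce to re-ordering, which an interface wrapper performs. The sketch below models both conversions in software, assuming 32-bit words and serial streams held as Python lists of 0/1 values; the function names are illustrative only. In hardware, the bit-order conversion needs a register that buffers the whole word before the first (MSB) bit can be forwarded, which is one source of wrapper overhead.

```python
# Sketch of two interface-wrapper conversions (function names are illustrative).
# ASSUMPTIONS: data words are 32 bits; a serial stream is a list of 0/1 bits.
from typing import List


def swap_endianness32(word: int) -> int:
    """Convert a 32-bit word between big-endian and little-endian byte order."""
    return int.from_bytes((word & 0xFFFFFFFF).to_bytes(4, "big"), "little")


def lsb_first_to_msb_first(bits: List[int]) -> List[int]:
    """Re-order a stream from an LSB-first producer for an MSB-first consumer.

    The whole word must be buffered before the first (MSB) bit can be sent.
    """
    return list(reversed(bits))


def msb_first_to_int(bits: List[int]) -> int:
    """Interpret an MSB-first bit stream as an unsigned integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


if __name__ == "__main__":
    print(hex(swap_endianness32(0x12345678)))  # 0x78563412
    lsb_first = [1, 1, 0, 0]  # the value 3 (0b0011), sent LSB-first
    print(msb_first_to_int(lsb_first_to_msb_first(lsb_first)))  # 3
```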
Interface wrappers are used to overcome these problems.
Wrappers will result in overheads.
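For the 16-bit adder mismatch above, a wrapper can apply the adder to the low halves first and ripple the carry into a second addition of the high halves. The behavioral sketch below assumes a hypothetical `adder16` model of the IP that takes (a, b, carry_in) and returns (sum, carry_out). It also makes the overhead visible: the wrapper needs either two passes through one adder (extra latency) or two adder instances (extra area).

```python
# Behavioral sketch of a wrapper that builds a 32-bit adder from a 16-bit adder IP.
# ASSUMPTION: adder16 is a hypothetical model of the available IP,
# taking (a, b, carry_in) and returning (sum, carry_out).
from typing import Tuple

MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def adder16(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Model of the 16-bit adder IP."""
    total = (a & MASK16) + (b & MASK16) + cin
    return total & MASK16, total >> 16


def adder32_wrapper(a: int, b: int) -> Tuple[int, int]:
    """Add two 32-bit words using the 16-bit adder twice, rippling the carry.

    The sum is correct modulo 2**32, so it also works for two's complement operands.
    """
    lo, carry = adder16(a & MASK16, b & MASK16, 0)
    hi, carry_out = adder16((a >> 16) & MASK16, (b >> 16) & MASK16, carry)
    return (hi << 16) | lo, carry_out


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


if __name__ == "__main__":
    s, _ = adder32_wrapper((-5) & MASK32, 3)
    print(to_signed32(s))  # -2
```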
Eight elements for judging the quality of Silicon Hardware/Software IP
Courtesy: Synopsys (2006)
IP cores




Hard core, Soft core
Hardware IP, Software IP
Design IP, Test IP, Verification IP
Processors, Memories, Hardware Accelerators,
Peripherals, Analog
Search for some IP vendors who provide processor cores, memory cores, design libraries, analog IP and IP for USB 2.0
Hard IP and Soft IP

Hard IP




Technology dependent
Predictable: already proven in silicon
Better protected against illegal use
Limitation: cannot be customized

Soft IP



Technology independent
Risk: area, timing and power are less predictable until the IP is implemented
Can be modified to suit the needs of the SoC
Application-specific Hard IP – tries to combine the best of both worlds.
Hardware/Software Codesign

Two important steps in IP integration

Provide a hardware interface: pin/signal mapping, protocol translation, buffering
Provide a software driver: access to IP functionality through the OS
Implement some functions in hardware and others in software for an area/performance/power tradeoff

Example of JPEG: DCT in hardware, other functions in software
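One simple way to frame the partitioning decision is as a budgeted selection: move to hardware the functions that save the most cycles per unit of area. The sketch below uses a greedy heuristic (not an optimal method, and not necessarily what any real codesign tool does) on a JPEG-like task list; all cycle counts and areas in `JPEG_TASKS` are made-up illustrative numbers, and tasks are assumed to run one after another.

```python
# Greedy hardware/software partitioning sketch.
# ASSUMPTIONS: tasks run one after another (no overlap), and all cycle counts
# and areas in JPEG_TASKS are made-up illustrative numbers.
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Task:
    name: str
    sw_cycles: int  # cost when executed on the processor
    hw_cycles: int  # cost when executed on a hardware accelerator
    hw_area: float  # accelerator area, arbitrary units


def partition(tasks: List[Task], area_budget: float) -> Tuple[Set[str], int]:
    """Move tasks to hardware in order of cycles saved per unit area while the
    area budget allows. Returns (hardware task names, total cycles)."""
    ranked = sorted(
        tasks, key=lambda t: (t.sw_cycles - t.hw_cycles) / t.hw_area, reverse=True
    )
    in_hw: Set[str] = set()
    used = 0.0
    for t in ranked:
        if t.sw_cycles > t.hw_cycles and used + t.hw_area <= area_budget:
            in_hw.add(t.name)
            used += t.hw_area
    total = sum(t.hw_cycles if t.name in in_hw else t.sw_cycles for t in tasks)
    return in_hw, total


JPEG_TASKS = [
    Task("ColorConvert", sw_cycles=200, hw_cycles=20, hw_area=2),
    Task("DCT", sw_cycles=1000, hw_cycles=50, hw_area=5),
    Task("Quantize", sw_cycles=150, hw_cycles=30, hw_area=1),
    Task("Huffman", sw_cycles=300, hw_cycles=100, hw_area=6),
]

if __name__ == "__main__":
    hw, cycles = partition(JPEG_TASKS, area_budget=6)
    print(sorted(hw), cycles)  # ['DCT', 'Quantize'] 580
```

With these numbers the DCT is the first function moved to hardware, matching the slide's example; a tighter or looser area budget changes the split.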
IP Standards

OCP-IP is a well-known standard for mitigating the problem of interfacing IP from multiple vendors

Many IP cores are OCP-compliant
IP-based Design – Decisions impact system cost, power and performance

Including an IP can increase chip cost, but can bring down system cost

Examples: DDR2 interface, power management, security IP
Knowledge of the target market is essential to make such decisions
Ideal ESL Design Flow
© 2008 Sudeep Pasricha & Nikil Dutt
Platform-Based Design

Most systems-on-chip (SoCs) need








One or more processors
Digital Signal Processors
Hardware Accelerators
Peripheral IP (touch screen, etc)
Connectivity IP
Embedded Memories
Analog/Mixed-Signal IP
Custom-designing the SoC for each application has advantages, but is infeasible!

What are the limitations of the top-down approach?
Even the platforms are many


Choices galore …
Selection tools can help the customer



http://tinyurl.com/ti-mcu-selection-html
http://tinyurl.com/ti-dsp-selection-html
Do you need flash memory?

How much RAM do you need?

What is the operating clock-speed?

How many CAN controllers do you need? …
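A selection tool essentially filters a device catalog against the answers to questions like these. The sketch below assumes a hypothetical catalog (the device names and figures are invented, not taken from the TI tools linked above) and ranks the surviving devices by RAM and then clock speed as a rough stand-in for cost.

```python
# Sketch of a device-selection filter, as a selection tool might apply.
# ASSUMPTION: the catalog entries below are invented for illustration.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    name: str
    flash_kb: int
    ram_kb: int
    max_clock_mhz: int
    can_controllers: int


def select(
    catalog: List[Device],
    need_flash: bool = False,
    min_ram_kb: int = 0,
    min_clock_mhz: int = 0,
    min_can: int = 0,
) -> List[Device]:
    """Return devices meeting every requirement, smallest RAM and clock first
    (a rough stand-in for cost)."""
    matches = [
        d
        for d in catalog
        if (d.flash_kb > 0 or not need_flash)
        and d.ram_kb >= min_ram_kb
        and d.max_clock_mhz >= min_clock_mhz
        and d.can_controllers >= min_can
    ]
    return sorted(matches, key=lambda d: (d.ram_kb, d.max_clock_mhz))


CATALOG = [
    Device("mcu-a", flash_kb=0, ram_kb=16, max_clock_mhz=25, can_controllers=0),
    Device("mcu-b", flash_kb=128, ram_kb=32, max_clock_mhz=80, can_controllers=1),
    Device("mcu-c", flash_kb=256, ram_kb=64, max_clock_mhz=150, can_controllers=2),
    Device("mcu-d", flash_kb=512, ram_kb=128, max_clock_mhz=300, can_controllers=2),
]

if __name__ == "__main__":
    picks = select(CATALOG, need_flash=True, min_ram_kb=32, min_can=2)
    print([d.name for d in picks])  # ['mcu-c', 'mcu-d']
```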
How do you answer these questions?


Is multimedia performance needed?
What is the target operating system?
Find out what these mean






IDM, ODM and OEM
VSIA and SPIRIT
OCP-IP
PSL and OVL
ESL, UML
Simulator, Emulator
Selected Topics in VLSI
OCP IP
C.P. Ravikumar
What is OCP?


Open Core Protocol
What it helps to solve


System Integration challenge
Areas of focus





High-performance multithreading
Synchronization Primitives
Data transfers
Cache Coherence
Power Management
Features of OCP-IP

Unique Features of OCP IP



Openly licensed
Protocols ranging from simple ones to the complex ones needed in smartphones
Supported by many international industrial bodies



Bus-independent
Scalable
IP-XACT format: XML-based language defined by the SPIRIT consortium

Synchronous and unidirectional signaling
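To make “synchronous and unidirectional signaling” concrete, here is a highly simplified, cycle-based sketch of a master/slave exchange. It borrows the names of OCP basic signals (MCmd, MAddr, MData driven only by the master; SCmdAccept, SResp, SData driven only by the slave), but the command subset, the treatment of writes as posted (no response) and the same-cycle read response are simplifying assumptions, not a model of the OCP specification.

```python
# Highly simplified, cycle-based sketch of a master/slave exchange using
# OCP-style signal names. ASSUMPTIONS: only IDLE/WR/RD commands, writes are
# posted (no response), and the read response arrives in the accept cycle.
# This is NOT a model of the full OCP specification.
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

IDLE, WR, RD = "IDLE", "WR", "RD"  # MCmd values used in this sketch
NULL, DVA = "NULL", "DVA"  # SResp values used in this sketch


@dataclass
class Slave:
    wait_states: int = 1  # clocks the slave stalls before asserting SCmdAccept
    mem: Dict[int, Optional[int]] = field(default_factory=dict)
    waited: int = 0

    def clock(
        self, mcmd: str, maddr: int, mdata: Optional[int]
    ) -> Tuple[bool, str, Optional[int]]:
        """Sample master signals at one clock edge; return (SCmdAccept, SResp, SData)."""
        if mcmd == IDLE:
            return False, NULL, None
        if self.waited < self.wait_states:
            self.waited += 1
            return False, NULL, None
        self.waited = 0
        if mcmd == WR:
            self.mem[maddr] = mdata
            return True, NULL, None
        return True, DVA, self.mem.get(maddr, 0)


def transfer(
    slave: Slave, mcmd: str, maddr: int, mdata: Optional[int] = None, max_clocks: int = 16
) -> Tuple[int, Optional[int]]:
    """Master holds MCmd/MAddr/MData until SCmdAccept; returns (clocks, read data)."""
    for n in range(1, max_clocks + 1):
        accept, sresp, sdata = slave.clock(mcmd, maddr, mdata)
        if accept:
            return n, (sdata if sresp == DVA else None)
    raise TimeoutError("slave never accepted the command")


if __name__ == "__main__":
    s = Slave(wait_states=1)
    print(transfer(s, WR, 0x10, 42))  # (2, None)
    print(transfer(s, RD, 0x10))  # (2, 42)
```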
OMAP 2420 used OCP for the first time
Cache Coherence
[Figure: CPUs 1, 2 and 3, each with a private cache, connected by a shared bus to shared memory holding X: 24]
CPU1: T = read X; // T has the value 24 and X is cached
CPU2: L = read X; // L has the value 24 and X is cached
CPU2: X = 32; // Local copy of X is updated
CPU3: M = read X;
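Notice that write-through caches alone are not good enough: even though X = 32 is written through to shared memory, CPU1's cache still holds the stale value 24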
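The scenario above can be replayed in a few lines. This sketch models each private cache as a Python dict in front of a shared memory dict, with write-through and no coherence mechanism; the class name is illustrative. It shows CPU1 continuing to read the stale value 24 after CPU2's write.

```python
# Minimal model of private write-through caches WITHOUT any coherence mechanism.
from typing import Dict


class WriteThroughCache:
    def __init__(self, memory: Dict[str, int]) -> None:
        self.memory = memory  # shared memory: address -> value
        self.lines: Dict[str, int] = {}  # this CPU's cached copies

    def read(self, addr: str) -> int:
        if addr not in self.lines:  # miss: fetch from shared memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]  # hit: returns the cached copy, possibly stale

    def write(self, addr: str, value: int) -> None:
        self.lines[addr] = value  # update the local copy ...
        self.memory[addr] = value  # ... and write through to shared memory


memory = {"X": 24}
cpu1, cpu2, cpu3 = (WriteThroughCache(memory) for _ in range(3))
T = cpu1.read("X")  # 24; X is now cached by CPU1
L = cpu2.read("X")  # 24; X is now cached by CPU2
cpu2.write("X", 32)  # CPU2's copy and shared memory both become 32
M = cpu3.read("X")  # 32; CPU3 misses and fetches from memory
stale = cpu1.read("X")  # 24; CPU1 hits on its out-of-date copy
print(T, L, M, stale)  # 24 24 32 24
```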
Snoopy Cache


Each CPU cache ‘snoops’ for write activity on data addresses that it has cached
Needs a bus structure which is ‘global’

All communication can be seen by all
Write-Invalidate Protocol for Snoopy Cache



A CPU that wishes to write to an address gets a bus cycle and sends a ‘write invalidate’ message
All snooping caches invalidate their copy of the appropriate cache block
CPU writes to its cached copy



Also updates the shared memory if ‘Write Through’ protocol is followed
If we use ‘Write Back’ scheme, things are more complex!
Any shared read in other CPUs will now miss in cache and re-fetch new data
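The write-invalidate steps can be sketched by having every cache register with a shared bus object and having a writer call the other caches' snoop routine before writing its own copy. This is a minimal model with illustrative names (`Bus`, `SnoopyInvalidateCache`), using write-through for simplicity and ignoring bus arbitration and timing.

```python
# Minimal write-invalidate snoopy cache model (write-through for simplicity).
# ASSUMPTIONS: class names are illustrative; bus arbitration and timing are ignored.
from typing import Dict, List


class Bus:
    """Shared bus: every attached cache sees every broadcast."""

    def __init__(self, memory: Dict[str, int]) -> None:
        self.memory = memory
        self.caches: List["SnoopyInvalidateCache"] = []


class SnoopyInvalidateCache:
    def __init__(self, bus: Bus) -> None:
        self.bus = bus
        self.lines: Dict[str, int] = {}
        bus.caches.append(self)

    def snoop_invalidate(self, addr: str) -> None:
        self.lines.pop(addr, None)  # drop our copy if we hold one

    def read(self, addr: str) -> int:
        if addr not in self.lines:  # miss (e.g. after an invalidate): re-fetch
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr: str, value: int) -> None:
        for other in self.bus.caches:  # 'write invalidate' broadcast on the bus
            if other is not self:
                other.snoop_invalidate(addr)
        self.lines[addr] = value  # write the local cached copy
        self.bus.memory[addr] = value  # write through to shared memory


bus = Bus({"X": 24})
cpu1, cpu2, cpu3 = (SnoopyInvalidateCache(bus) for _ in range(3))
cpu1.read("X")
cpu2.read("X")
cpu2.write("X", 32)
print("X" in cpu1.lines, cpu1.read("X"))  # False 32
```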
Write-Update Protocol for implementing Snoopy Cache


A CPU wanting to write gets a bus cycle and broadcasts the new data as it updates its own copy
All snooping caches update their copy
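Under write-update, the writer broadcasts the new value instead of an invalidate, so other holders keep a valid copy. The sketch below is a minimal model with an illustrative class name, assuming write-through to memory and no bus timing; note that a cache that never held the block does not acquire it from the broadcast.

```python
# Minimal write-update snoopy cache model.
# ASSUMPTIONS: write-through to memory, illustrative names, no bus timing.
from typing import Dict, List


class SnoopyUpdateCache:
    def __init__(self, memory: Dict[str, int], bus: List["SnoopyUpdateCache"]) -> None:
        self.memory = memory
        self.bus = bus  # every cache attached to the shared bus
        self.lines: Dict[str, int] = {}
        bus.append(self)

    def snoop_update(self, addr: str, value: int) -> None:
        if addr in self.lines:  # only caches that hold the block update it
            self.lines[addr] = value

    def read(self, addr: str) -> int:
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr: str, value: int) -> None:
        self.lines[addr] = value  # update own copy ...
        for other in self.bus:  # ... and broadcast the new data
            if other is not self:
                other.snoop_update(addr, value)
        self.memory[addr] = value


memory: Dict[str, int] = {"X": 24}
bus: List[SnoopyUpdateCache] = []
cpu1, cpu2, cpu3 = (SnoopyUpdateCache(memory, bus) for _ in range(3))
cpu1.read("X")
cpu2.read("X")
cpu2.write("X", 32)
print(cpu1.lines["X"], "X" in cpu3.lines)  # 32 False
```

Compared with write-invalidate, every write now carries data on the bus, so repeated writes by one CPU to the same block generate more bus traffic; in exchange, other readers keep hitting in their caches.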
MESI Protocol



A practical multiprocessor invalidate protocol that attempts to minimize bus usage
Allows use of a ‘write-back’ scheme, i.e., main memory is not updated until a ‘dirty’ cache line is displaced
Uses an extension of the usual cache tags


‘invalid block’ tag
‘dirty block’ tag
4 states of a cache block (MESI Protocol)
2 bits to represent 4 states
Invalid - block is not valid (as in a simple cache)
Exclusive - cache block is the same as main memory and is the only cached copy
Shared - cache block is the same as main memory, but copies may exist in other caches
Modified - cache block has been modified, is different from main memory, and is the only cached copy
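The 2-bit state field can be sketched as an enumeration. The slides do not say which code represents which state, so the assignment below is an arbitrary assumption; the helper predicates restate the four definitions above.

```python
# One possible 2-bit encoding of the MESI states.
# ASSUMPTION: the code-to-state assignment is arbitrary; the slides do not fix it.
from enum import IntEnum


class MESI(IntEnum):
    INVALID = 0b00
    SHARED = 0b01
    EXCLUSIVE = 0b10
    MODIFIED = 0b11


def is_valid(s: MESI) -> bool:
    return s != MESI.INVALID


def matches_memory(s: MESI) -> bool:
    """Exclusive and Shared blocks hold the same data as main memory."""
    return s in (MESI.EXCLUSIVE, MESI.SHARED)


def is_only_cached_copy(s: MESI) -> bool:
    """Exclusive and Modified blocks are the only cached copy."""
    return s in (MESI.EXCLUSIVE, MESI.MODIFIED)


def needs_write_back(s: MESI) -> bool:
    """Only a Modified ('dirty') block must be written back when displaced."""
    return s == MESI.MODIFIED


if __name__ == "__main__":
    for s in MESI:
        print(f"{s.name:9} {s.value:02b}")
```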
Illustrating the states of cache blocks
[Figure: a sequence of reads (r) and writes (w) by processors P1, P2 and P3, showing the resulting cache block states]
Local Read Hit




Block must be in one of the states M, E or S
If in M state, it must have been modified locally
Simply return the value
No state change
Local Read Miss

Case 1 - No other copy in caches
Processor makes bus request to memory
Value read to local cache, marked E

Case 2 - One cache has E copy
Processor makes bus request to memory
Snooping cache puts copy value on the bus
Memory access is abandoned
Local processor caches value
Both lines set to S

Case 3 - Several caches have S copy
Processor makes bus request to memory
One cache puts copy value on the bus (arbitrated)
Memory access is abandoned
Local processor caches value
Local copy set to S
Other copies remain S

Case 4 - One cache has M copy
Processor makes bus request to memory
Snooping cache puts copy value on the bus
Memory access is abandoned
Local processor caches value
Local copy tagged S
Source (M) value copied back to memory
Source tag M -> S
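The four read-miss cases can be folded into one function. This sketch assumes the remote states passed in form a legal MESI combination (at most one E or M copy, otherwise any number of S copies), and it models the bus arbitration of Case 3 simply as “the lowest-numbered cache supplies the data”, which is an assumption made for illustration.

```python
# Sketch of the four local-read-miss cases of MESI.
# ASSUMPTIONS: `others` is a legal MESI combination (at most one E or M copy,
# otherwise any number of S copies); bus arbitration in Case 3 is modelled as
# "the lowest-numbered cache supplies the data".
from enum import Enum
from typing import List, Tuple


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def local_read_miss(others: List[State]) -> Tuple[State, List[State], str, bool]:
    """Return (new local state, new remote states, data source, copied back to memory?)."""
    holders = [i for i, s in enumerate(others) if s is not State.I]
    if not holders:
        # Case 1: no other copy; memory supplies the data and the line is marked E.
        return State.E, list(others), "memory", False
    # Cases 2-4: a snooping cache supplies the data, the memory access is
    # abandoned, and every valid copy (including the new local one) ends in S.
    new_others = [State.I if s is State.I else State.S for s in others]
    copied_back = State.M in others  # Case 4: the M copy is also written back
    return State.S, new_others, f"cache{holders[0]}", copied_back


if __name__ == "__main__":
    print(local_read_miss([State.I, State.M, State.I]))
```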
Local Write Hit
Line must be one of M, E or S
Case M
Line is exclusive and already ‘dirty’
Update local cache value
No state change
Case E
Update local cache value
State E -> M
Case S
Processor broadcasts an invalidate on bus
Snooping processors with S copy change S->I
Local cache value is updated
Local state change S->M
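The three write-hit cases map directly onto a small function. In this sketch (names are illustrative), only the S case produces a bus transaction; the E case upgrades silently, which is the bus traffic MESI saves by distinguishing E from S.

```python
# Sketch of the local-write-hit cases of MESI.
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def local_write_hit(
    local: State, others: List[State]
) -> Tuple[State, List[State], Optional[str]]:
    """Return (new local state, new remote states, bus transaction or None)."""
    if local is State.M:
        return State.M, list(others), None  # already dirty: no state change
    if local is State.E:
        return State.M, list(others), None  # only copy: upgrade silently, no bus cycle
    if local is State.S:
        # Broadcast an invalidate; snoopers holding S drop to I.
        return State.M, [State.I if s is State.S else s for s in others], "Invalidate"
    raise ValueError("a write hit requires the line to be in M, E or S")
```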
Local Write Miss (1)

Case 1 - No other copies



Value read from memory to local cache (?)
Value updated
Local copy state set to M

Case 2 - One E copy or multiple S copies
Value read from memory to local cache - bus transaction marked RWITM (read with intent to modify)
Snooping processors see this and set their copy state to I
Local copy updated & state set to M
Local Write Miss (2)
Case 3 - Another copy in state M
1. Processor issues bus transaction marked RWITM
2. Snooping processor sees this
   Blocks RWITM request
   Takes control of bus
   Writes back its copy to memory
   Sets its copy state to I
3. Original local processor reissues RWITM request
4. This is now the simple no-copy case
   Value read from memory to local cache
   Local copy value updated
   Local copy state set to M
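The write-miss cases above can be summarized by the sequence of bus transactions they produce. This sketch assumes the requester issues RWITM in every case (it cannot know in advance whether other copies exist), and it records the M owner's blocking, copy back and the retried RWITM of Case 3 as a simple list; the function name is illustrative.

```python
# Sketch of the local-write-miss cases of MESI, recording the bus transactions.
# ASSUMPTION: the requester always issues RWITM first, since it cannot know
# in advance whether other copies exist.
from enum import Enum
from typing import List, Tuple


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def local_write_miss(others: List[State]) -> Tuple[State, List[State], List[str]]:
    """Return (new local state, new remote states, bus transactions in order)."""
    bus = ["RWITM"]
    if State.M in others:
        # Case 3: the owner blocks the request, writes its copy back and
        # invalidates it; the requester then reissues RWITM (now the no-copy case).
        bus += ["copy back", "RWITM"]
    # In every case, snooping caches end in I and the local copy ends in M.
    return State.M, [State.I for _ in others], bus
```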
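State Diagram for locally initiated access
[Figure: MESI state transitions for locally initiated accesses; bus transactions in brackets]
Invalid -> Shared: Read Miss (sh) [Mem Read]
Invalid -> Exclusive: Read Miss (ex) [Mem Read]
Invalid -> Modified: Write Miss [RWITM]
Shared -> Shared: Read Hit
Shared -> Modified: Write Hit [Invalidate]
Exclusive -> Exclusive: Read Hit
Exclusive -> Modified: Write Hit
Modified -> Modified: Read Hit, Write Hit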
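State Diagram – Remotely initiated access
[Figure: MESI state transitions for remotely initiated (snooped) accesses; (copy back) marks transitions that write the block back to memory]
Modified -> Shared: Mem Read (copy back)
Modified -> Invalid: RWITM (copy back)
Exclusive -> Shared: Mem Read
Exclusive -> Invalid: RWITM
Shared -> Shared: Mem Read
Shared -> Invalid: Invalidate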
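The remotely initiated transitions can be captured as a lookup table keyed by (current state, snooped bus transaction). The entries below follow the read-miss and write-miss cases described earlier, including a snooped RWITM on a Shared copy (Case 2 of the write miss); a copy-back flag marks transitions where the block is written back to memory. The table and function names are illustrative.

```python
# Sketch of remotely initiated (snooped) MESI transitions as a lookup table.
from enum import Enum
from typing import Dict, Tuple


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


# (current state, snooped bus transaction) -> (next state, copy back to memory?)
SNOOP: Dict[Tuple[State, str], Tuple[State, bool]] = {
    (State.M, "Mem Read"): (State.S, True),
    (State.M, "RWITM"): (State.I, True),
    (State.E, "Mem Read"): (State.S, False),
    (State.E, "RWITM"): (State.I, False),
    (State.S, "Mem Read"): (State.S, False),
    (State.S, "RWITM"): (State.I, False),
    (State.S, "Invalidate"): (State.I, False),
}


def snoop(state: State, transaction: str) -> Tuple[State, bool]:
    """Next state of a snooping cache's block; unlisted pairs leave it unchanged."""
    return SNOOP.get((state, transaction), (state, False))
```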
Snoopy Cache - comments


Snooping protocols rely on a broadcast medium (e.g., a shared bus) to implement coherence
Do not scale well for a large number of processors

Not recommended for 32+ processors
Directory-based protocols

No need for a shared bus (can use point-to-point connections)




Therefore, more scalable (100+ processors)
Each processor can have its own private memory
Each node maintains a directory storing cache information and memory information
A processor communicates with the directory to access memory
Communicating with the directory



If a processor requests a non-local memory page, the directory uses its information to find the page
Then, it uses messages to retrieve the page and ensure that all other processors have consistent information
Since the directory records which processors are caching the page, it only needs to send messages to those processors
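The key saving of a directory, contacting only the processors that actually cache a page, can be sketched as follows. This is a minimal invalidation-based model with illustrative names: message sending is just logged, a single directory object covers all pages, and the per-page state and ownership tracking of real directory protocols is omitted.

```python
# Minimal sketch of an invalidation-based directory.
# ASSUMPTIONS: one directory object for all pages, message sending is just
# logged, and the per-page state/ownership tracking of real protocols is omitted.
from typing import Dict, List, Set, Tuple


class Directory:
    def __init__(self) -> None:
        self.sharers: Dict[str, Set[int]] = {}  # page -> processors caching it
        self.sent: List[Tuple[int, str]] = []  # (destination processor, message)

    def read(self, proc: int, page: str) -> None:
        """Record that `proc` now caches `page`; no other processor is contacted."""
        self.sharers.setdefault(page, set()).add(proc)

    def write(self, proc: int, page: str) -> None:
        """Invalidate only the recorded sharers, then make `proc` the sole holder."""
        for other in sorted(self.sharers.get(page, set()) - {proc}):
            self.sent.append((other, f"invalidate {page}"))
        self.sharers[page] = {proc}


if __name__ == "__main__":
    d = Directory()
    d.read(0, "A")
    d.read(2, "A")
    d.read(5, "B")
    d.write(0, "A")
    print(d.sent)  # [(2, 'invalidate A')]
```

Unlike a snoopy bus, where every cache sees every write, processor 5 (and any other processor not caching page A) receives no message at all.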
Point-to-Point Communication
Reading


http://tinyurl.com/y965he6
S. Pasricha and N. Dutt, On-Chip Communication Architectures, Morgan Kaufmann, 2008.