On-Chip Communication Architectures Synthesis Techniques ICS 295

Sudeep Pasricha and Nikil Dutt
Slides based on book chapter 6
© 2008 Sudeep Pasricha & Nikil Dutt
Outline

Introduction

Topology Synthesis

Protocol Parameter Synthesis

Topology and Protocol Parameter Synthesis

Physically-aware Synthesis

Co-synthesis with Memory
Introduction

Designing on-chip communication architectures
is becoming more and more challenging
◦ increasing number of components in today's systems
translates into more inter-component communication

Multi-dimensional design constraints
◦ ↑ performance, reliability
◦ ↓ power, cost, area, time-to-market

System designers need techniques that can
◦ optimize for individual design goals
◦ allow design decisions to provide a good balance
between other design goals
Introduction


Exploration and synthesis
techniques can broadly be
classified into 3 categories:
◦ Static, dynamic, hybrid
Commercial toolkits are
available for standard bus
architectures:
◦ AMBA Designer/Design Kit
◦ STBus GenKit
◦ Sonics Studio

However, these toolkits are not
very useful for automating
exploration and synthesizing
communication architectures that
satisfy diverse design constraints
Introduction
Bus Architecture Synthesis:

◦ process of designing a bus architecture topology and/or its
protocol parameters to satisfy application constraints
[Figure: Bus Architecture Synthesis. Inputs shown are the topology space (alternative arrangements of CPU1, M2, M3, slaves S1–S4, memories MEM1, MEM2/MEM2a/MEM2b, MEM3, and bridges on buses such as main1, main2, main3, and periph), the parameter space (arbitration strategy, data bus widths, bus clock frequencies, buffer sizes), and the constraints (performance, power, cost, area, reliability).]
Topology Synthesis

Topology of a bus-based on-chip communication
architecture determines
◦ number of buses in the system
◦ manner in which they are interconnected to each other
◦ how components are allocated to the buses

Early work focused on allocating inter-component comm. to
buses for distributed real-time embedded systems
◦ Yen et al. [ICCAD ‘95] proposed techniques to estimate comm. delay
on a bus using static analysis for a system with periodic tasks
  ▪ assigned a PE to an existing bus, or created a new bus, to meet task deadlines
◦ Ortega et al. [ICCAD ‘98] explored mapping of PEs onto a set of off-chip bus architecture configurations (shared buses or point-to-point)
and protocols (such as CAN or I2C)
  ▪ explored different performance vs. cost tradeoffs
Topology Synthesis

Liveris et al. [DATE ‘04] proposed a bus topology
synthesis technique to reduce bus power consumption
while meeting latency constraints
◦ AMBA AHB bus architecture
◦ Simple FIFO arbitration
◦ Dynamic power reduction
◦ Switching activity α is taken as 0.5 for data bus, and a lower value
for address bus
  ▪ control wire switching is ignored
◦ Each master has a latency constraint that determines number of
cycles available to complete a communication operation
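
Dynamic power on bus wires is commonly modeled with the standard CMOS switching relation P = α·C·Vdd²·f. The sketch below applies that relation to a bus; it is an illustration only, not the formulation used by Liveris et al. The per-wire capacitance, supply voltage, clock frequency, and the address bus activity of 0.2 are assumed values (the slides only state that the address bus value is lower than 0.5).

```python
def dynamic_power(alpha, capacitance_f, vdd, freq_hz):
    """Standard CMOS dynamic power estimate: alpha * C * Vdd^2 * f (watts)."""
    return alpha * capacitance_f * vdd ** 2 * freq_hz


def bus_dynamic_power(data_wires, addr_wires, cap_per_wire_f, vdd, freq_hz,
                      alpha_data=0.5, alpha_addr=0.2):
    """Sum data and address wire power; control wires are ignored.

    alpha_addr=0.2 is an assumed placeholder, not a value from the slides.
    """
    data = data_wires * dynamic_power(alpha_data, cap_per_wire_f, vdd, freq_hz)
    addr = addr_wires * dynamic_power(alpha_addr, cap_per_wire_f, vdd, freq_hz)
    return data + addr


if __name__ == "__main__":
    # Hypothetical 32-bit data and address buses, 1 pF per wire, 1.0 V, 100 MHz.
    print(bus_dynamic_power(32, 32, 1e-12, 1.0, 100e6))
```

Removing a slave from a shared bus lowers the capacitance seen by each driven wire, which is why the fanout reduction in the transformations that follow translates into lower power in this model.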
Topology Synthesis

To improve latency response of communication
architecture and also reduce power consumption on
the bus wires, Liveris et al. proposed using 3 different
topology transformations
Topology Synthesis

Private slave creation
◦ making a slave private to a master is possible if the master is the
only one accessing the slave
◦ removes a slave from the shared bus, which reduces the fanout by
one for all the signals driven by the AMBA logic
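
The eligibility test for private slave creation can be sketched as follows. This is a minimal illustration, assuming the access pattern is available as a mapping from each master to the set of slaves it accesses; the function and component names are hypothetical.

```python
def private_slave_candidates(accesses):
    """Return {slave: master} for slaves accessed by exactly one master.

    accesses: dict mapping a master name to the set of slaves it accesses.
    """
    users = {}
    for master, slaves in accesses.items():
        for slave in slaves:
            users.setdefault(slave, set()).add(master)
    return {slave: next(iter(masters))
            for slave, masters in users.items()
            if len(masters) == 1}


if __name__ == "__main__":
    accesses = {"M1": {"S1", "S2"}, "M2": {"S2", "S3"}}
    # S2 is shared, so only S1 and S3 can be made private.
    print(private_slave_candidates(accesses))
```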
Topology Synthesis

Slave isolation
◦ Moving a slave to another layer
Topology Synthesis

Grouping masters
◦ Moving masters to another layer to reduce arbitration conflict
Topology Synthesis

Synthesis heuristic
◦ initially, all masters and slaves are mapped to a single layer
◦ private slave creation transformation is applied for all eligible slaves
◦ in case a latency violation exists for a master, slave isolation
transformation is applied to the slowest slave
◦ if violation persists, grouping masters transformation is performed
  ▪ by transferring masters with less stringent latency requirements to a new layer
◦ once a solution that satisfies latency constraints is obtained, slave
isolation and grouping masters transformations are performed
to reduce power
◦ at every iteration, power of the current solution is calculated
by using probability-based formulations to estimate switching activity on the wires
◦ transformations are carried out till no more improvement is
obtainable
Topology Synthesis

Heuristic was implemented in C and applied to
◦ Sobel Transform SoC: 29.6% less power
Topology Synthesis

Murali et al. [DATE ‘05] proposed a methodology for STBus
crossbar (matrix) synthesis

Compared to a full crossbar, a partial crossbar has
◦ fewer communication components (buses, arbiters, decoders, etc.),
lower area, reduced power consumption

Goal:
◦ design a minimal cost partial crossbar bus architecture for a given
MPSoC application
◦ average and maximum packet latencies must lie within acceptable
bounds from the latencies obtained for a full crossbar
Topology Synthesis

Phase 1: SystemC simulation
◦ window-based traffic analysis -> window size is parametrizable

Phase 2: Preprocessing to identify
◦ overlapping critical traffic streams to be mapped to separate buses
◦ targets with large traffic overlap in a window to map to separate buses
◦ max. no. of targets to be connected to a bus (to bound max. latency)

Phase 3: MILP based partial crossbar generation
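
The window-based traffic analysis of Phases 1 and 2 can be illustrated with a small sketch. It assumes a simulation trace is available as (target, start cycle, end cycle) tuples and counts, per window, the cycles in which two targets are busy at the same time. The trace format and function name are assumptions for illustration, not the data structures used by Murali et al.

```python
from itertools import combinations


def window_overlap(transactions, window_size, total_cycles):
    """Per-window overlap (in cycles) between every pair of targets.

    transactions: iterable of (target, start, end) with end exclusive.
    Returns {(target_a, target_b): [(window_start, overlap_cycles), ...]}.
    """
    active = {}
    for target, start, end in transactions:
        active.setdefault(target, set()).update(range(start, end))

    result = {}
    for w_start in range(0, total_cycles, window_size):
        w_end = min(w_start + window_size, total_cycles)
        window = set(range(w_start, w_end))
        for a, b in combinations(sorted(active), 2):
            overlap = len(active[a] & active[b] & window)
            if overlap:
                result.setdefault((a, b), []).append((w_start, overlap))
    return result


if __name__ == "__main__":
    trace = [("T1", 0, 6), ("T2", 4, 12), ("T3", 20, 25)]
    print(window_overlap(trace, window_size=10, total_cycles=30))
```

Target pairs with a large overlap in some window are the ones a preprocessing step would want on separate buses; the threshold for "large" is left to the designer here.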
Topology Synthesis

Applied methodology to synthetic MPSoC applications
Topology Synthesis

Thepayasuwan et al. [DATE ‘04] proposed a simulated
annealing (SA)-based approach to synthesize a hierarchical
shared bus architecture topology
◦ cost function accounts for criteria such as number of buses,
communication conflict, and bus utilization
◦ SA based optimization depends on weights in cost function


Yoo et al. [ASPDAC ‘07] presented an SA-based approach
for synthesizing a cascaded crossbar
Topology synthesis for segmented bus was presented by
Guo et al. [ASPDAC ‘06] to
◦ obtain a solution with minimum wire energy
◦ generate a set of solutions to trade-off chip area, energy, delay
Protocol Parameter Synthesis

Bus-based communication architectures are characterized
by several protocol parameters
◦ bus widths, bus clock frequencies, transaction burst sizes,
arbitration schemes, buffer sizes

Protocol parameter synthesis determines values for one
or more parameters for a fixed topology
◦ while satisfying constraints of the application

Early work in protocol parameter synthesis focused on
determining bus width
◦ Narayan et al. [DATE ‘94]
  ▪ for simple shared bus architecture
  ▪ trade-off bus width with system performance
  ▪ no arbitration assumed; traffic conflict on shared bus ignored
Protocol Parameter Synthesis

Lahiri et al. [ICCAD ’00] proposed an approach to
determine bus protocol parameters as well as
component mapping on buses to improve performance
Protocol Parameter Synthesis

Step 1: Co-simulate entire system
◦ assuming completely parallel (conflict-free) comm. between cores
◦ generate execution traces
Step 2: Save traces as a comm. analysis graph (CAG)
Step 3: Performance analysis to generate comm. graph (CG)

◦ Represents statistics gathered by performance analysis
◦ Single weight derived for each edge
Protocol Parameter Synthesis

Step 4: Generate initial component mapping to buses
◦ analyze CG
◦ calculate demand from component on comm. architecture
  ▪ demand of component = sum of weights of outgoing edges
◦ arrange components in a descending order of demand
◦ rank buses in comm. architecture by analyzing topology template
  ▪ higher rank is given to buses that have higher performance and are well
connected to the rest of the buses
◦ Select highest ranked component and map to bus with maximum
interaction level; repeat till no more components left

Step 5: Generate initial protocol parameters
◦ High arbitration priority for higher ranked component
◦ Maximum block transfer size calculated as weighted average of the size
of transactions between components on the bus
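
The demand calculation and component ordering of Step 4 can be sketched as below. The sketch assumes the CG is given as a dictionary of directed edges with one weight each; the component names are hypothetical, and ties are broken alphabetically only to keep the output deterministic.

```python
def rank_components_by_demand(cg_edges):
    """Order components by demand, highest first.

    cg_edges: dict mapping (source, destination) to the edge weight.
    Demand of a component is the sum of weights of its outgoing edges.
    """
    demand = {}
    for (src, dst), weight in cg_edges.items():
        demand[src] = demand.get(src, 0) + weight
        demand.setdefault(dst, 0)  # components with no outgoing edges
    return sorted(demand.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    cg = {("CPU", "MEM"): 5, ("CPU", "DMA"): 3, ("DMA", "MEM"): 4}
    for name, value in rank_components_by_demand(cg):
        print(name, value)
```

The same ordering can then drive Step 5, where higher ranked components receive higher arbitration priority.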
Protocol Parameter Synthesis

Step 7: Generate transformations/moves to improve performance
◦ Create communication conflict graph (CCG) where edges between
components represent communication overlap
◦ Changed congestion levels used to recalculate time taken for transactions
◦ Move with maximum time reduction (potential gain) is selected
◦ Repeat till no more improvement possible
Protocol Parameter Synthesis

Experimental results
◦ ATM: cell forwarding unit of an output queued ATM switch, with
a fixed topology having three buses connected by two bridges
◦ SYS: simple communication system with two buses connected by
a single bridge
Protocol Parameter Synthesis

Shin et al. [DATE ‘04] proposed a methodology to
automatically determine slot schedule for a time division
multiple access (TDMA)-based arbitration scheme
Protocol Parameter Synthesis

Objective function
◦ To meet throughput requirements for masters
Protocol Parameter Synthesis

Objective function
◦ To meet throughput and latency requirements for masters
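
A candidate TDMA slot schedule can be checked against throughput and latency requirements with a sketch like the one below. It only evaluates one schedule and does not search for one. The model is an assumption made for illustration: throughput is taken as a master's share of the slots, and latency as the largest number of slots a master waits between two of its own slots in the cyclic schedule.

```python
def slot_share(schedule, master):
    """Fraction of TDMA slots owned by `master`."""
    return schedule.count(master) / len(schedule)


def worst_case_wait(schedule, master):
    """Largest run of slots owned by others between two slots of `master`.

    The schedule is cyclic. Returns None if the master owns no slot.
    """
    n = len(schedule)
    own = [i for i, owner in enumerate(schedule) if owner == master]
    if not own:
        return None
    gaps = [(own[(k + 1) % len(own)] - slot - 1) % n
            for k, slot in enumerate(own)]
    return max(gaps)


def meets_requirements(schedule, requirements):
    """requirements: {master: (min_slot_share, max_wait_in_slots)}."""
    for master, (min_share, max_wait) in requirements.items():
        wait = worst_case_wait(schedule, master)
        if wait is None or wait > max_wait:
            return False
        if slot_share(schedule, master) < min_share:
            return False
    return True


if __name__ == "__main__":
    schedule = ["M1", "M2", "M1", "M3"]
    print(meets_requirements(schedule, {"M1": (0.5, 1), "M2": (0.25, 3)}))
```

A check of this kind could serve as the fitness test inside a search procedure that proposes schedules.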
Protocol Parameter Synthesis

Experimental results
◦ Best results with following GA parameters: crossover rate of 70%,
mutation rate of 25%, population size of 80
Topology and Protocol Parameter Synthesis

Unlike previous approaches, a few approaches consider both
topology and protocol parameter synthesis simultaneously
◦ more comprehensive synthesis

Pandey et al. [FPLA ‘05] proposed a technique to
simultaneously synthesize hierarchical shared bus topology
and width of data buses
◦ while satisfying the performance constraints
◦ using integer linear programming (ILP) formulation

Pasricha et al. [ASPDAC ‘05] proposed a technique to
automate synthesis of hierarchical bus topology and multiple
protocol parameters
◦ data bus widths, bus clock speeds, OO buffer sizes, DMA burst sizes
◦ using several heuristics
Topology and Protocol Parameter Synthesis


Pasricha et al. [ASPDAC ‘06] proposed automated topology
and parameter synthesis methodology for bus matrix
architectures
Goal: minimal cost partial bus matrix tailored to application
◦ Has fewer buses (consequently fewer arbiters, decoders, buffers)
◦ Maximizes bus utilization
◦ Reduces implementation cost, area and power dissipation
Topology and Protocol Parameter Synthesis



MPSoC designs have performance constraints that can be
represented in terms of Data Throughput Constraints
Communication Throughput Graph, CTG = G(V,A)
incorporates SoC components and throughput constraints
Throughput Constraint Path (TCP) is a CTG sub-graph
Topology and Protocol Parameter Synthesis

Communication Parameter Constraint Set (Ψ)
◦ Used to ensure that approach generates realistic communication
architecture
◦ constraints are in the form of a discrete set of valid values for
protocol parameters to be synthesized
◦ e.g., specifying that bus clock frequency for a bus can only be
multiples of 33 MHz, up to a maximum of 330 MHz

Allows designer to bias synthesis process based on
knowledge of design and technology being targeted
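
A constraint set of this kind can be written down as discrete lists of valid values. The sketch below encodes the 33 MHz example from the slide; the data bus widths and the dictionary keys are hypothetical placeholders added for illustration.

```python
def allowed_frequencies_mhz(step=33, maximum=330):
    """Multiples of `step` MHz up to and including `maximum` MHz."""
    return list(range(step, maximum + 1, step))


# Hypothetical constraint set: the widths are assumed, not from the slides.
PSI = {
    "bus_clock_mhz": allowed_frequencies_mhz(),
    "data_bus_width_bits": [32, 64, 128],
}


def is_valid(config, psi=PSI):
    """True if every parameter in `config` takes a value allowed by psi."""
    return all(value in psi.get(name, ()) for name, value in config.items())


if __name__ == "__main__":
    print(PSI["bus_clock_mhz"])
    print(is_valid({"bus_clock_mhz": 66, "data_bus_width_bits": 64}))
    print(is_valid({"bus_clock_mhz": 100}))
```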
Topology and Protocol Parameter Synthesis


B&B Goal: cluster slave modules to minimize matrix cost
Start by clustering two slave clusters at a time
◦ Initially, each slave cluster has only one slave

However, the total number of clustering configurations possible for n
slaves is $\binom{n}{2} + \binom{n}{2}\binom{n-1}{2} + \binom{n}{2}\binom{n-1}{2}\binom{n-2}{2} + \dots + \frac{n!\,(n-1)!}{2^{n-1}}$
◦ Extremely large number for even medium sized SoCs!
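
To get a feel for how quickly this count grows, the expression above can be evaluated directly. The sketch reads the series as a sum of running products of binomial coefficients C(n,2)·C(n-1,2)·…, whose final term equals n!·(n-1)!/2^(n-1); that reading of the last term is an interpretation of the slide.

```python
from math import comb


def clustering_configurations(n):
    """Sum of running products C(n,2), C(n,2)*C(n-1,2), ... down to C(2,2)."""
    total = 0
    product = 1
    for k in range(n, 1, -1):
        product *= comb(k, 2)
        total += product
    return total


if __name__ == "__main__":
    for n in (3, 4, 8, 16):
        print(n, clustering_configurations(n))
```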


To quickly prune out invalid clustering configurations and converge on
an optimal solution, use a powerful bounding function
Bounding function
◦ Called after every clustering operation
◦ Uses lookup table to discard duplicate clustering ops
◦ Discards all non-beneficial clustering ops (i.e. no savings in no. of buses)
◦ Discards incompatible clustering ops
  ▪ e.g. mergers of buses with conflicting bus speeds
◦ Discards clustering which cannot theoretically support b/w requirements
Topology and Protocol Parameter Synthesis

Experimental results on four MPSoC applications from
the networking domain

Significant matrix component savings
◦ 4.6x to 9x when compared with a full bus matrix
Topology and Protocol Parameter Synthesis

Methodology extended by Pasricha et al. [CODES+ISSS
‘06] to synthesize bus matrix topology and protocol
parameters
◦ with the incorporation of energy estimation models for bus wires
and bus logic components


Goal: generate multiple candidate bus matrix solutions, on
which to perform a power-performance trade-off analysis
Methodology applied to an MPSoC application
Topology and Protocol Parameter Synthesis
Results
◦ Possible trade-off of up to 20% in power and 40% in performance
◦ Possible trade-off of up to 8% in runtime and 15% in energy
[Figure: CTG]
Topology and Protocol Parameter Synthesis


Pasricha et al. [VLSID ‘08] further extended this
synthesis methodology by incorporating a PVT (process,
voltage, temperature) variation aware power estimation
technique
Incorporating PVT variation-awareness in the system
level bus matrix synthesis technique resulted in a set of
curves for power and energy in the trade-off graph
outputs
◦ instead of a single curve for power and energy

Allowed for a more accurate power characterization in
the face of PVT variations early in the design flow
◦ enabling designers to make more informed decisions when
selecting a bus matrix configuration
Physically-aware Synthesis

Most synthesis approaches design the communication
architecture without considering physical implementation
issues that can influence performance
◦ such as the layout of the components on the chip or the lengths
and routing of the bus wires interconnecting components


Physical level information can be extremely important to
guarantee that the synthesis results are reliable
However, such physical level information is typically
available much later in the design flow
◦ challenging to abstract up this information to early in the design
flow during communication architecture design

A few approaches have looked at this problem of
physically aware synthesis
Physically-aware Synthesis

Dick et al. [DATE ‘99] proposed physically aware topology
synthesis technique to ensure hard real-time communication
deadlines between components were satisfied
◦ used a high level floorplanner to create a block placement, and
estimate global wiring delays
◦ genetic algorithm (GA) was used to iterate over
  ▪ different bus topology configurations having low contention
  ▪ task assignments on components

Drinic et al. [ICCAD ‘00] and Meguerdichian et al. [DAC ‘01] used
a high level floorplanner to determine design feasibility during bus
topology synthesis
◦ compared estimates of wire length with upper bound on wire length
◦ does not account for varying capacitive loads of components on a bus
Physically-aware Synthesis

Thepayasuwan et al. [ICCD ‘03] proposed a topology synthesis
framework that used a high level floorplanner to obtain wire
lengths
◦ lengths are incorporated into an SA cost function that is used to synthesize
bus topology
◦ SA minimizes the cost function, and selects a topology solution with low
total wire length

Guo et al. [ASPDAC ‘06] used a high level floorplanner during
segmented bus topology synthesis
◦ floorplanner aims to reduce length of critical wires with high switching
activity to reduce wire energy consumption

Pasricha et al. [CODES+ISSS ‘06] used a high level floorplanner
to obtain wire length for estimating wire energy
◦ during bus matrix topology and parameter synthesis
Physically-aware Synthesis

Pasricha et al. [DAC ‘05] proposed physically aware hierarchical bus
topology and protocol parameter synthesis technique (FABSYN)
◦ detects and eliminates clock cycle timing violations
[Figure: SoC floorplan showing ARM, ITCM, DTCM, MEM1, MEM2, MEM3, MEM4, IP1, IP2, DMAC, ASIC1, ASIC2]
To meet performance constraints, bus clk speed is set to 333 MHz (3 ns cycle time)
After layout, the signal delay is 3.5 ns, which violates the 3 ns clock timing constraint!
◦ adverse effect on cost, complexity, constraint satisfiability
To eliminate such violations, designers use repeaters, pipeline elements
◦ can severely affect performance, power
◦ requires considerable manual RTL re-work, re-verification
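
The clock cycle timing check in this example is simple arithmetic and can be sketched as follows. The list of allowed frequencies (multiples of 33 MHz up to 330 MHz) is borrowed from the constraint set example earlier in the slides and is used here only as an assumed input; the function names are hypothetical.

```python
def cycle_time_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def violates_timing(delay_ns, freq_mhz):
    """True if the signal delay does not fit in one bus clock cycle."""
    return delay_ns > cycle_time_ns(freq_mhz)


def max_safe_frequency(delay_ns, allowed_mhz):
    """Highest allowed frequency whose cycle time covers the delay."""
    safe = [f for f in allowed_mhz if not violates_timing(delay_ns, f)]
    return max(safe) if safe else None


if __name__ == "__main__":
    allowed = list(range(33, 331, 33))
    print(violates_timing(3.5, 333))         # 3.5 ns delay vs. ~3 ns cycle
    print(max_safe_frequency(3.5, allowed))  # fallback clock, in MHz
```

Lowering the clock is only one remedy and costs performance, which is why migrating components to reduce the delay itself is also worth considering.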

Physically-aware Synthesis

Simple bus mapping
Physically-aware Synthesis

Mutate topology
◦ Create new bus and/or migrate IPs
Physically-aware Synthesis

If a timing violation is detected
◦ TCPs that have components on buses with violations flagged
◦ feedback loop is used to go back and attempt to eliminate violations
◦ first the TCP that has components on the violated bus with the largest
load capacitance on its pins is selected from the flagged TCPs
  ▪ since cumulative capacitive load of components directly contributes to
increasing signal propagation delay
◦ the components are iteratively migrated to another existing bus
  ▪ or a new bus if migration to existing buses causes TCP constraint violations
◦ If there is still a violation, another flagged TCP is selected and its
components migrated away from the violated bus
◦ Another way used to eliminate clock cycle violations is to reduce bus
clock frequency
  ▪ increases cycle times
Physically-aware Synthesis

Synthesized hierarchical bus architecture
Parameter      main1   main2   main3   periph
bus width      32      32      32      32
bus speed      133     133     133     66
arb priority   CPU1 > M3 > M2 (static)
Physically-aware Synthesis

Experimental study
[Figures: constraint set and CTG used in the experimental study]
Physically-aware Synthesis

Quality of the FABSYN synthesis solution was compared
with other synthesis approaches
◦ Initial: solution with just 2 buses (initial mapping)
◦ ABS: synthesis approach without integrated floorplanners
◦ Manual: designer driven manual synthesis approach with floorplanner
Co-synthesis with Memory

Memory can take up a large chunk of on-chip area, as much
as 70% in some cases
◦ Estimates indicate that this will go up to 90% in coming years

Variety of different memory types available to satisfy storage
requirements in MPSoC applications
◦ DRAMs, SRAMs, EPROMs, EEPROMs etc.

Typically
◦ DRAMs -> larger memory requirements, slower, cheaper
◦ SRAMs -> smaller memory requirements, faster, expensive
◦ EPROMs and EEPROMs -> read-only data

Several tradeoffs during memory architecture synthesis
◦ SRAM vs. DRAM
  ▪ cost vs. performance vs. area
◦ ports vs. number of memory blocks
Co-synthesis with Memory

Memory architecture synthesis determines the
◦ number, type, size of the memories in the system
◦ application data mapping to memories
Memory architecture significantly contributes to data traffic
on communication architectures

Design of memory architecture has a substantial influence
on communication architecture design

Traditionally, in platform-based design, memory synthesis is
performed before communication architecture synthesis
◦ can lead to inferior design decisions
Co-synthesis with Memory

Motivational study (Pasricha et al. [DATE ‘06])
◦ MPSoC memory and comm. architecture synthesis
◦ Separate synthesis vs. co-synthesis
Co-synthesis with Memory
Shalan et al. [SASIMI ‘03] proposed a tool to automatically
generate a full crossbar and a dynamic memory management unit

Grun et al. [DATE ‘02] considered the system connectivity
topology early in the design flow, in conjunction with memory
exploration, for simple processor–memory systems
◦ most active access patterns extracted from application data structures
◦ different memory architecture configurations that can match needs of access
patterns are obtained, assuming a simple connectivity model
◦ next, different comm. architectures are considered for these memory
architecture configurations, and the most suitable interconnect and memory
architecture is selected from a pareto-optimal curve

Srinivasan et al. [DATE ‘05] presented an approach to
simultaneously consider bus topology splitting and memory bank
partitioning during synthesis
◦ with the goal of reducing system energy
Co-synthesis with Memory

Pasricha et al. [DATE ‘06] proposed the COSMECA methodology
for memory and comm. architecture synthesis
◦ Synthesize bus matrix topology and protocol parameters

Goal: obtain a least cost system, having minimal number of buses
while satisfying performance and memory area constraints

COSMECA selects memory blocks from a library populated by several
types of memories
◦ on-chip SRAMs, DRAMs, EPROMs, EEPROMs, …
Each memory type can have variants in library, having different
◦ capacities, areas, ports, operating frequencies and access times
Memory synthesis in COSMECA
◦ selects appropriate physical memories from library
◦ maps application arrays, scalars to physical memories selected from
library


Co-synthesis with Memory


Application memory requirements are initially
represented by abstract data blocks (DBs) in a CTG
DBs are initially grouped together into virtual memories (VMs)
Co-synthesis with Memory

DBs are merged at this initial step only if they have
◦ similar edges (i.e., edges from the same masters) and
◦ non-overlapping access

Subsequently, the enhanced CTG with VMs is used as an
input to a branch and bound based bus matrix synthesis
framework to generate minimal cost solution
Co-synthesis with Memory

Heuristic used to map VMs to physical memories from library
◦ finds N solutions that satisfy memory area and performance
constraints of design

Generate memory access traces that are used to determine the
extent of access overlap of VMs at each slave access point (SAP)
◦ after simulating best solution

If the overlap is below a user defined overlap threshold T, the VMs
are merged
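
The threshold test for merging VMs can be sketched as below. How COSMECA quantifies the extent of overlap is not stated on the slide, so the sketch assumes one plausible measure: the number of cycles in which both VMs are accessed, divided by the size of the shorter trace. Traces are assumed to be sets of cycle numbers.

```python
def access_overlap(trace_a, trace_b):
    """Assumed overlap measure: shared access cycles / shorter trace length."""
    if not trace_a or not trace_b:
        return 0.0
    shared = len(trace_a & trace_b)
    return shared / min(len(trace_a), len(trace_b))


def should_merge(trace_a, trace_b, threshold):
    """Merge two VMs at a SAP when their overlap is below threshold T."""
    return access_overlap(trace_a, trace_b) < threshold


if __name__ == "__main__":
    vm1 = {1, 2, 3, 4}
    vm2 = {4, 5, 6, 7, 8}
    print(access_overlap(vm1, vm2))
    print(should_merge(vm1, vm2, threshold=0.3))
```

A larger T allows more merging, which reduces the number of memories and buses at the risk of more access conflicts.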
Co-synthesis with Memory



VMs are then mapped to physical memories from library
Initially, best memory from the library is selected for a VM
that fits capacity requirements and has max. port bandwidth
If performance constraints are not met even for the memory
with best performance, the matrix solution is discarded
◦ the next best matrix solution from the set of (ranked) matrix
solutions is selected


If performance constraints and memory area constraints are
met, the solution is added to the final solution database
Next, to lower memory area, VMs at SAPs are randomly
selected and the mapped physical memory replaced with
one that meets capacity requirements and has lower area
◦ If violation detected, then move is reversed, otherwise solution is kept
◦ Procedure repeated iteratively till N solutions obtained
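
The two selection rules used during mapping can be sketched as follows: first pick the memory that fits the VM and has the highest port bandwidth, then list the lower-area replacements that still fit. The `Memory` record, its field names, and the library entries are hypothetical; the sketch leaves out the constraint check that decides whether a replacement is kept or reversed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    name: str
    capacity_kb: int
    port_bandwidth: float  # arbitrary units, assumed for illustration
    area_mm2: float


def best_performing(library, required_kb):
    """Memory that fits the capacity and has the maximum port bandwidth."""
    fits = [m for m in library if m.capacity_kb >= required_kb]
    return max(fits, key=lambda m: m.port_bandwidth) if fits else None


def lower_area_alternatives(library, current, required_kb):
    """Memories that still fit the capacity but occupy less area."""
    return [m for m in library
            if m.capacity_kb >= required_kb and m.area_mm2 < current.area_mm2]


if __name__ == "__main__":
    library = [
        Memory("sram_fast", 64, 8.0, 4.0),
        Memory("sram_small", 32, 8.0, 2.5),
        Memory("dram", 256, 2.0, 3.0),
    ]
    first = best_performing(library, required_kb=48)
    print(first.name)
    print([m.name for m in lower_area_alternatives(library, first, 48)])
```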
Co-synthesis with Memory

Experiments with MPSoC applications
◦ Example: PYTHON application synthesis
Co-synthesis with Memory

Trade-off curve between number of buses and memory area

Impact of threshold value
Co-synthesis with Memory

COSMECA saves 25–40% in the number of buses in the matrix and
17–29% in memory area compared to the traditional approach
Co-synthesis with Memory

Meyer et al. [CODES+ISSS ‘07] attempted to extend
COSMECA by adding layout-awareness during co-synthesis
◦ co-synthesis is performed using an SA-based algorithm

Results indicate 20–27% cost reduction for a synthetic DSP
software pipeline case study by using the approach
◦ compared to an approach that separately allocates memory
and synthesizes buses

A few limitations
◦ Only bus topology synthesis is performed – bus parameter
synthesis is neglected
◦ memory synthesis does not consider different memory types
- only SRAM memories are supported
Summary

Designers need techniques that can efficiently explore the
increasingly intractable comm. architecture design space
◦ to satisfy and optimize constraints during comm. architecture design

Presented research on techniques for efficient bus-based
communication architecture synthesis
◦ Scope to extend synthesis techniques for emerging applications

A lot of open problems still remain to be solved, especially in
the areas of low level physical and circuit level synthesis
approaches (refer to the book chapter for more details)
◦ wire metal layer assignment
◦ wire sizing optimization
◦ inductance estimation
◦ timing-driven floorplanning
◦ shield wire insertion algorithms