forever do - International Symposium on Asynchronous Circuits and

An Automatic Approach to Generate Haste Code from Simulink Specifications
Maurizio Tranchero (1), Leonardo M. Reyneri (1), Arjan Bink (2), and Mark de Wit (2)
(1) Politecnico di Torino, Department of Electronics, Italy
(2) Handshake Solutions, The Netherlands
Outline





Simulink Based Design and CodeSimulink
Haste Coding Choices
Simulink-Specific Issues
Proposed Flow and its Implementation
Case studies and performance
Haste Code from Simulink
Simulink-Based Design
Simulink®: what is it? and why?





General-purpose graphical tool able to describe
and simulate heterogeneous systems
Based on MATLAB®
Widely used in different application and
industrial areas: signal and image processing,
control, aerospace, modeling, etc...
Does not require knowledge of electronic/digital
design; allows interdisciplinary teams
Uses dataflow (DF) computational model
Simulink diagrams



A set of interconnected blocks
Each block performs an
operation (e.g. a multiply and
accumulate model)
Includes stimuli and test points
[Figure: multiply-and-accumulate model with blocks labeled SOURCES, MULTIPLY, ADD, ACCUM, DISPLAY, RESULTS]
Simulink to develop digital systems

Simulink works well for general-purpose modeling, but:
- what are the implications of HW/SW implementations?
- what about the effects of data representation?
- what about the effects of timing, latencies and delays?

Can Simulink models be implemented physically?
Yes, but some external tools are required:
- For SW: Real-Time Workshop (The MathWorks)
- For HW: System Generator (Xilinx), DSP-Builder (Altera), HDL Coder (The MathWorks), CodeSimulink (Politecnico di Torino)
- For mixed HW/SW: CodeSimulink (Politecnico di Torino)
Data Flow vs. Register Transfer







Simulink is natively Data Flow (each block computes only when its data is valid)
Sequential SW is DF (by compilation)
Synchronous HW is natively Register Transfer (each block computes independently of data being valid) → RT has the problem of repipelining and synchronization
Asynchronous HW is natively DF (because of handshake)
Analog systems are time-continuous
True Simulink → HW has to be DF!
Mixed HW/SW systems have to be DF!
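The difference between the two computational models can be sketched in a few lines of Python. This is an illustrative model only, not CodeSimulink code: the class names `DataflowAdder` and `RegisterTransferAdder` are hypothetical, and tokens are modeled as plain integers in queues.

```python
from collections import deque


class DataflowAdder:
    """Dataflow rule: fire only when every input holds a valid token."""

    def __init__(self):
        self.a, self.b, self.out = deque(), deque(), deque()

    def step(self):
        if self.a and self.b:  # both inputs valid
            self.out.append(self.a.popleft() + self.b.popleft())
            return True
        return False  # stall: nothing is consumed or produced


class RegisterTransferAdder:
    """Register-transfer rule: compute on every clock, valid data or not."""

    def __init__(self):
        self.a = self.b = 0  # register contents, possibly stale
        self.out = []

    def clock(self):
        self.out.append(self.a + self.b)


# Dataflow: the second operand stream is one token short.
df = DataflowAdder()
df.a.extend([1, 2, 3])
df.b.extend([10, 20])
while df.step():
    pass
df_results = list(df.out)

# Register transfer: on the third clock, b was not updated and is stale.
rt = RegisterTransferAdder()
for a, b in zip([1, 2, 3], [10, 20, None]):
    rt.a = a
    if b is not None:
        rt.b = b
    rt.clock()
rt_results = rt.out
```

In this sketch the dataflow block simply waits with the unmatched token still queued, while the register-transfer block emits a third result computed from a stale operand, which is the synchronization problem the slide attributes to RT.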
Commercial tools Simulink → HW/SW

Non-Simulink-compliant (they are RT, not DF!):
- System Generator (Xilinx)
- DSP-Builder (Altera)
- HDL Coder (The MathWorks)
They use Simulink ONLY as a graphical interface and do NOT support any Simulink block

Simulink-compatible (fully DF):
- Real-Time Workshop (The MathWorks; only for SW!)
- CodeSimulink/SMT6040 (Politecnico di Torino); also supports mixed HW/SW/analog systems
Both implement Simulink blocksets natively in a transparent manner
CodeSimulink/SMT6040 Tool
Our Tool
CodeSimulink/SMT6040






True DF, Simulink-compatible, model-based, hybrid codesign environment
Co-simulates SW + digital HW + analog HW + the external world (e.g. mechanical), modeling the real behavior of the chosen implementation(s)
Generates SW (C) + digital HW (VHDL) + analog HW (SPICE), or VHDL + Haste code
Digital HW can be either synchronous DF or asynchronous DF
Commercially available (SMT6040)
Student edition available at http://polimage.polito.it/groups/codesimulink.html
A Simple CodeSimulink model
Implementation Parameters

Available parameters






DATAWIDTH (number of bits)
BINARYPOINT (position of fixed point)
REPRESENTATION ((un)signed, sign/modulus,
floating point)
OVERFLOW (saturation/wraparound)
TRUNCATION (floor, ceil, round, etc.)
PIPELINE (latency, speed)
[Figure: example fixed-point word: sign bit (+/-) followed by bits 1 0 1 1 0, representing +5.50]
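How these parameters interact can be illustrated with a small Python sketch. It is an assumption-laden model, not the CodeSimulink implementation: the function `quantize` and its argument names are hypothetical (they only mirror the parameter names on the slide), signed values are assumed to be two's complement, and "round" is taken as round-half-away-from-zero.

```python
import math


def quantize(x, datawidth=6, binarypoint=2, signed=True,
             overflow="saturate", truncation="round"):
    """Return (integer code, represented value) for a fixed-point format.

    datawidth   -- total number of bits (including the sign bit if signed)
    binarypoint -- number of fractional bits
    """
    scale = 1 << binarypoint
    scaled = x * scale

    # TRUNCATION: how the scaled value is mapped to an integer code.
    if truncation == "floor":
        code = math.floor(scaled)
    elif truncation == "ceil":
        code = math.ceil(scaled)
    else:  # "round": half away from zero (assumption)
        code = int(math.floor(abs(scaled) + 0.5))
        if scaled < 0:
            code = -code

    # REPRESENTATION: range of representable codes.
    if signed:
        lo, hi = -(1 << (datawidth - 1)), (1 << (datawidth - 1)) - 1
    else:
        lo, hi = 0, (1 << datawidth) - 1

    # OVERFLOW: saturate at the range limits or wrap around.
    if overflow == "saturate":
        code = max(lo, min(hi, code))
    else:
        code = (code - lo) % (hi - lo + 1) + lo

    return code, code / scale


exact = quantize(5.5)                             # representable exactly
rounded = quantize(5.3)                           # nearest step of 0.25
saturated = quantize(9.0)                         # clipped to largest code
wrapped = quantize(9.0, overflow="wraparound")    # wraps to a negative code
```

With six bits and two fractional bits the value 5.5 maps to the code 22 (binary 010110), which matches the bit pattern 1 0 1 1 0 plus a sign bit shown on the slide.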
From Simulink to CodeSimulink

An automatic process composed of these steps:
- Model simulation
- Model conversion (namely 1-to-1 block substitution)
- HW parameter setting (based on simulation results)

Simulink data: double precision (64 b), floating point
CodeSimulink data:
- Selectable data-width
- integer, fixed point, floating point...
- signed, unsigned, modulus & sign
- wrap around, saturate output
A Semi-Automatic Process



This conversion cannot be completely automated
Input and output blocks must be inserted manually
Some block parameters have to be set manually (overflow, truncation and pipeline)
CodeSimulink Environment
[Figure: CodeSimulink design flow. System Description; Functional + Timing Simulation; HW-SW Partitioning into Digital HW (DigHw Compiler; synchronous or asynchronous), SW (RTW) and Analog HW (AnHw Compiler); followed by P&R, Schematic, PCB Tool and Target Programming]
Advantages of (Code)Simulink





Flexibility: very high (short redesign time); no need to
take care of interfaces and timing; quick system-level
performance optimization
Reusability: may use existing Simulink models
Time-to-market: very short (consequently), although
design is suboptimal (can be optimized later on)
Accessibility: does not require an experienced designer; simplifies integration of work teams with heterogeneous know-how
Academic: Optimal for teaching Electronic Systems
and Asynchronous circuits; student version available
Advantages of CodeSimulink






Allows choosing implementation later in the design flow
Timing analysis and pipeline balancing
Natively handles scalars, vectors, matrices
Supports multi-system (multi-platforms, multi-cores,
multi-SW, multi-FPGA, multi time-domains, GALS, mixed
synch/asynch, hybrid, etc.)
Supports synchronous bit-parallel, bit-serial, bundled-data asynchronous designs
Interfaces to low-level simulators (ModelSim, MaxPlus,
Quartus, ISE, Spice-like)
Limitations of CodeSimulink




Best suited to data-dominated systems
Mostly fixed-rate (does not mean synchronous!)
sampling strategy (including multi-rate)
Library-based (sub-optimal)
Fast timing models (optional) require technology
characterization
Library Blocks

Large library of blocks (blockset) including:
- Low-level Simulink blockset: addition, multiplication, min/max, floating/fixed point converters, etc.
- High-level functions: FIR filters, FFTs, custom transfer functions, etc.
- Special-purpose functions
- Interface blocks:
  - I/Os
  - SW/HW/SW
  - Analog/digital/analog
  - Synchronous/asynchronous
CodeSimulink digital blocks
Each CodeSimulink block is translated into:
- A combinational functional block (VHDL)
- A sequential protocol controller + register
Either:
- VHDL, synchronous
- VHDL, asynchronous
- Haste code
[Figure: block protocol interfaces, with signals labeled CLK, VAL, RDY; REQ, ACK; and CHANNEL]
Asynchronous CodeSimulink







Only the protocol-handling box changes
Supports bundled-data transfers
Analyzes and optimizes timing
Timing analysis identifies bottlenecks and helps to minimize them
Enforces timing constraints during synthesis accordingly
Adds a delay line sized to the required timing
Prevents the delay line from being optimized away
Haste Coding Choices
VHDL Usage within TiDE


CodeSimulink uses a library-based approach: each block is described in VHDL
To reuse this code, it is automatically converted into Verilog (which is fully supported in TiDE) using RTL Compiler
Coding Styles in HASTE


Different coding styles
available: which is the
best for Simulink blocks?
To benchmark, we used a
simple datapath made of:




4 different arithmetic
operations
2 x 16 bit-wide inputs
1 x 3 bit-wide selector
1 x 32 bit-wide output
[Figure: benchmark datapath with four operations labeled *, +, > and x3]
Multiple vs. Single Processes

The same block described as a single process or as an ensemble of concurrent processes produces different implementations

Single process (global forever do):
    forever do
        multiplier(...)
        || adder(...)
        || comparator(...)
        || fixedGain(...)
    od

Multiple processes (one forever do each):
    forever do multiplier(...) od
    || forever do adder(...) od
    || forever do comparator(...) od
    || forever do fixedGain(...) od
Shared Variables vs. Channels


Variables are cheaper than channels
Channels synchronize automatically

Shared variables (functions):
    & C = func( & i0 ? var T
                & i1 ? var T ):T. ( i0 + i1 ) fit T
    & pipeline: main proc(
        & x ? chan T broad pas
        & y ! chan T ).
    begin |
      forever do
        wait( outprobe( x ) )
        ;(
          y!C( .i0(dataprobe(x)),
               .i1(B(.i0(dataprobe(x)),
                     .i1(A(dataprobe(x)))) ) )
          ||
          x?~ )
      od
    end

Channels (procedures):
    & C = proc( & i0 ? chan T
                & i1 ? chan T & o0 ! chan T ).
    begin |
      forever do
        wait( outprobe(i0) * outprobe(i1) )
        ;
        o0!( dataprobe( i0 ) + dataprobe( i1 ) ) fit T
        ;
        ( i0?~ || i1?~ )
      od
    end
    & pipeline: main proc( & x ? chan T broad pas
                           & y ! chan T ).
    begin & c0 : chan T & c1 : chan T
    |
      A(x,c0)||B(c0,x,c1)||C(c1,x,y)
    end
Tupled vs. Separated Channels


Tupled channels (c) are cheaper than separated ones (b)
But they can introduce deadlock in several configurations
Deadlock Exposed
Register Insertion




With Haste and TiDE 5.2, registers are inserted at the inputs
Simulink blocks usually have only one output and one or more inputs
We would like to place the register on the output instead, to reduce area
This is not possible at the moment (TiDE 5.2), but it will be in TiDE 6
Some Figures (htcomp + htmap)
Column labels: System; Datapath; Tupled inputs; Independ. parallel inputs; Multiple forever do; Global forever do; Pipelined version; Fully combinational; Area [mm2]; C-gates
Each row: six option flags (V = selected, - = not selected), then Area [mm2] and C-gates

V - V - V - | 932.2 | 156.7
V - - V V - | 902.0 | 156.3
- V - V V - | 848.7 | 129.3
- V V - V - | 871.7 | 124.7
V - V - - V | 331.0 | 14.3
V - - V - V | 298.0 | 8.3
Conclusions

After this analysis we can decide to:
- Use multiple-process descriptions
- Use channels instead of variables
- Use separated channels
- Not optimize registers by hand, but leave them to compiler optimization
Simulink-Specific Issues
Multidimensional Objects


Simulink models can
easily process scalars,
vectors or matrices
Depending on throughput
constraints we can decide
to process each data
component serially or in
parallel
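[Figure: serial vector: the stream 1,3,5,7 passes through a single gain-of-2 block and yields 2,6,10,14. Parallel vector: components 1, 3, 5, 7 each pass through their own gain-of-2 block and yield 2, 6, 10, 14]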
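The serial versus parallel trade-off can be sketched as follows. This is a toy Python model under stated assumptions: the functions `gain`, `serial` and `parallel` are hypothetical names, and "steps" counts abstract processing slots rather than real time.

```python
def gain(x, k=2):
    """The block's operation: multiply one component by a constant."""
    return k * x


def serial(vector):
    """One gain block; components pass through it one after another."""
    out = []
    steps = 0
    for x in vector:
        out.append(gain(x))
        steps += 1
    return out, {"blocks": 1, "steps": steps}


def parallel(vector):
    """One gain block per component; all components processed at once."""
    out = [gain(x) for x in vector]
    return out, {"blocks": len(vector), "steps": 1}


serial_out, serial_cost = serial([1, 3, 5, 7])
parallel_out, parallel_cost = parallel([1, 3, 5, 7])
```

Both variants produce the same values; they differ only in how many block instances are needed and how many processing slots are spent, which is the throughput-versus-area choice described above.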
Sampling Blocks



Sampling blocks are the ones with special timing
constraints, i.e., they have to guarantee data processing
in a fixed amount of time
They can be used to change input/output data rate
The main blocks in this category are:



unit delay
zero order hold
rate transition
Unit Delay
FSM for scalar data


It introduces one memory stage from input to output
When a “Sampling Time” period has elapsed:
- The old data item (or items, for arrays) is produced on the output
- A new data item (or its components) is sampled
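A behavioral sketch of the unit delay in Python, for scalar data only. The class name `UnitDelay` and the `tick` method are hypothetical; each call to `tick` stands for one elapsed sampling period, and the initial stored value is an assumption.

```python
class UnitDelay:
    """One memory stage between input and output."""

    def __init__(self, initial=0):
        self.state = initial  # value stored during the previous period

    def tick(self, new_input):
        """One sampling period elapses: emit the old value, sample the new one."""
        out = self.state
        self.state = new_input
        return out


ud = UnitDelay(initial=0)
outputs = [ud.tick(x) for x in [5, 6, 7]]
```

Each output is the input from the previous sampling period, with the initial state appearing first.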
Zero Order Hold
FSM for scalar data


It holds the output data until a “Sampling Time” period has elapsed
When the period elapses, newly acquired input data (possibly multiple items) is transferred to the output
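The hold behavior can be sketched in Python as below. This is a simplified scalar model: the class `ZeroOrderHold` is a hypothetical name, time advances in integer base steps, and the sampling period is assumed to be an integer number of those steps with the first sample taken at step 0.

```python
class ZeroOrderHold:
    """Hold the output constant between sampling instants."""

    def __init__(self, period):
        self.period = period   # sampling time, in base time steps
        self.elapsed = 0       # base steps seen so far
        self.output = None     # nothing sampled yet

    def step(self, value):
        """Advance one base time step with the current input value."""
        if self.elapsed % self.period == 0:  # sampling period elapsed
            self.output = value              # transfer new input to output
        self.elapsed += 1
        return self.output


zoh = ZeroOrderHold(period=3)
held = [zoh.step(x) for x in [1, 2, 3, 4, 5, 6, 7]]
```

The input changes at every base step, but the output only follows it once every three steps.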
Rate Transition



It is a superset of the previous blocks: it changes the data rate from input to output, either increasing or decreasing it
Replicates/consumes tokens
It can be described as a cascade of “unit delay” and “zero order hold” blocks
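Token replication and consumption can be sketched with a single Python function. It is a simplified model, not the block's actual implementation: `rate_transition` is a hypothetical name, both periods are assumed to be integer multiples of a common base time step, and each output takes the most recent input token.

```python
def rate_transition(tokens, in_period, out_period):
    """Resample a token stream from one period to another.

    Input token i is assumed to arrive at time i * in_period; an output
    is produced every out_period time units and carries the most recent
    input token available at that time.
    """
    total_time = len(tokens) * in_period
    return [tokens[t // in_period] for t in range(0, total_time, out_period)]


# Slow input, fast output: each token is replicated.
upsampled = rate_transition([1, 2, 3], in_period=2, out_period=1)

# Fast input, slow output: every other token is consumed without output.
downsampled = rate_transition([1, 2, 3, 4], in_period=1, out_period=2)
```

When the two periods are equal the stream passes through unchanged, which is the degenerate case between the replicating and consuming behaviors.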
Sampling Blocks Implementation



All these blocks have to be connected to a
clock/timing (?!?) signal to guarantee timing
To reduce overhead introduced by clock
interaction, it is possible to use a fully
asynchronous version of such blocks, yet
precisely timed
Timing clock interaction is still necessary but it
could be moved to I/Os
Simulink-Haste Flow Implementation
The Flow
Integrates CodeSimulink with the existing TiDE flow
Each block is converted into both Haste and RTL code
[Figure: Simulink Model → CodeSimulink → VHDL Descriptions → RTL Compiler → Verilog Descriptions, and CodeSimulink → Haste Description → htcomp + htmap, followed by the HT Back-end]
Haste File Generated

The generated file is composed of 6 parts:
- Type definitions (used in the file itself)
- Top-level procedure interface definition
- Internal channels
- Internal functions (the interface to RTL code)
- Internal procedures (protocol management and function instances)
- Procedure instances and connections
E.g.: Haste File Generated
// Types Definition
& STD_LOGIC_VECTOR_17 = type [0..2^17-1]
& STD_LOGIC_VECTOR_16 = type [0..2^16-1]
& STD_LOGIC_VECTOR_15 = type [0..2^15-1]
& STD_LOGIC_VECTOR_14 = type [0..2^14-1]
& STD_LOGIC_VECTOR_1 = type [0..2^1-1]
// Top entity instance
& inout1 : main proc(
& DIGINA ? chan STD_LOGIC_VECTOR_15
& DIGINB ? chan STD_LOGIC_VECTOR_14
& DIGOUTA ! chan STD_LOGIC_VECTOR_17
).
begin
...
E.g.: Haste File Generated
// Functions declarations
& sim_sum1_f = func (
    & A1 ? var STD_LOGIC_VECTOR_16
    & A2 ? var STD_LOGIC_VECTOR_1
  ): STD_LOGIC_VECTOR_17. import
// Component declarations
& sim_sum1 = proc (
    & Y1 ! chan STD_LOGIC_VECTOR_17
    & A1 ? chan STD_LOGIC_VECTOR_16
    & A2 ? chan STD_LOGIC_VECTOR_1
  ). begin
    & v_A1 : var STD_LOGIC_VECTOR_16
    & v_A2 : var STD_LOGIC_VECTOR_1
  | forever do
      ( A1 ? v_A1 || A2 ? v_A2 )
      ;
      Y1 ! sim_sum1_f( .A1( v_A1 ), .A2( v_A2 ) )
    od
  end
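The behavior of the `sim_sum1` procedure (read both inputs concurrently, then send the function result) can be approximated in Python with `asyncio`. This is only a rough model under stated assumptions: one-slot queues stand in for handshake channels (they are not true rendezvous), the imported RTL function is assumed to be a plain addition, and a bounded loop replaces `forever do` so that the sketch terminates.

```python
import asyncio


async def sim_sum1(y1, a1, a2, n_tokens):
    """Model of the proc: (A1 ? v_A1 || A2 ? v_A2) ; Y1 ! f(v_A1, v_A2)."""
    for _ in range(n_tokens):  # stands in for "forever do"
        # Receive from both input channels concurrently.
        v_a1, v_a2 = await asyncio.gather(a1.get(), a2.get())
        # Send the function result on the output channel.
        await y1.put(v_a1 + v_a2)


async def main():
    a1, a2, y1 = (asyncio.Queue(maxsize=1) for _ in range(3))
    task = asyncio.create_task(sim_sum1(y1, a1, a2, n_tokens=2))
    results = []
    # Second pair uses the largest 16-bit and 1-bit values.
    for x, c in [(40000, 1), (2**16 - 1, 2**1 - 1)]:
        await a1.put(x)
        await a2.put(c)
        results.append(await y1.get())
    await task
    return results


results = asyncio.run(main())
fits_in_17_bits = all(0 <= r <= 2**17 - 1 for r in results)
```

The second input pair shows why the output type is one bit wider than the larger input: the largest 16-bit value plus the largest 1-bit value no longer fits in 16 bits, but it does fit in the range [0..2^17-1].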
E.g.: Haste File Generated
// Internal signal declarations
& Y1_5 : chan STD_LOGIC_VECTOR_1 broad
& Y1_4 : chan STD_LOGIC_VECTOR_17 broad
& Y1_1 : chan STD_LOGIC_VECTOR_16 broad
...
// Component instantiation
sim_constant ( .Y1( Y1_5 ) )
|| sim_digOut ( .A1( Y1_4 ),
.DIGIO( DIGOUTA ) )
|| sim_sum1 ( .Y1( Y1_4 ),
.A1( Y1_1 ),
.A2( Y1_5 )
)
...
E.g.: VHDL File Generated
-- Top entity instance
ENTITY sim_sum1 IS
  PORT (
    DIGOUTA_i   : IN  STD_LOGIC_VECTOR(15 downto 0);
    DIGIN_VALA0 : IN  STD_LOGIC;
    DIGIN_RDYA  : OUT STD_LOGIC;
    DIGOUTB_i   : IN  STD_LOGIC_VECTOR(0 downto 0);
    DIGIN_VALA0 : IN  STD_LOGIC;
    DIGOUTA_o   : OUT STD_LOGIC_VECTOR(16 downto 0);
    DIGOUT_VALA : OUT SIM_SIGVAL_SYNCHPAR;
    DIGOUT_RDYA : IN  STD_LOGIC;
    nRESET      : IN  STD_LOGIC;
    CLK         : IN  STD_LOGIC
    -- left unconnected in this implementation
  );
END sim_sum1 ;
VHDL is used to describe the block functionality
For each block, an HDL file is generated with the desired parameters (data width, binary point, ...)
Conversion of Simulink models
Simulink model
Compiled Haste program
// Functions declarations
& sim_sum1_f = func (
    & A1 ? var STD_LOGIC_VECTOR_16
    & A2 ? var STD_LOGIC_VECTOR_1
  ): STD_LOGIC_VECTOR_17. import
// Component declarations
& sim_sum1 = proc (
    & Y1 ! chan STD_LOGIC_VECTOR_17
    & A1 ? chan STD_LOGIC_VECTOR_16
    & A2 ? chan STD_LOGIC_VECTOR_1
  ). begin
    & v_A1 : var STD_LOGIC_VECTOR_16
    & v_A2 : var STD_LOGIC_VECTOR_1
  | forever do
      ( A1 ? v_A1 || A2 ? v_A2 )
      ; Y1 ! sim_sum1_f( .A1( v_A1 ), .A2( v_A2 ) )
    od
  end
Case Studies
3-input 32-bit adder
Simple 16-bit ALU (*, +, <, gain)
8th-order, 20-bit-wide IIR filter
Results
Speed comparison by simulation on Cyclone II FPGA

SPEED (Cyclone II)
Design   | Unit  | C.S. Synchronous | C.S. Asynchronous
Adder    | Msa/s | 218              | 112 - 125
Datapath | Msa/s | 132              | 45 - 80
IIR      | Msa/s | 120              | 70 - 98

Area comparison on commercial 90nm ASIC library

AREA (ASIC), each cell given as Logic / Regs / Total
Design   | Unit  | Synchronous        | Asynchronous       | Haste              | CS-Haste
Adder    | gates | 128 / 130 / 258    | 137 / 130 / 267    | 130 / 130 / 260    | 146 / 161 / 307
Datapath | gates | 1181 / 179 / 1360  | 1224 / 186 / 1410  | 1144 / 186 / 1330  | 1230 / 186 / 1416
IIR      | gates | 3654 / 1029 / 4683 | 4752 / 1029 / 5781 | 4528 / 1029 / 5557 | NA / NA / NA
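The area overhead of automatically generated code (CS-Haste) relative to handwritten Haste can be worked out from the total gate counts in the area results. The short Python sketch below does only that arithmetic; the dictionary layout and variable names are illustrative, and the IIR row is left out because its CS-Haste figures are not available.

```python
# Total gate counts for the Haste and CS-Haste implementations.
total_gates = {
    "Adder": {"Haste": 260, "CS-Haste": 307},
    "Datapath": {"Haste": 1330, "CS-Haste": 1416},
}

# Relative overhead of CS-Haste over handwritten Haste, per design.
overhead = {
    design: counts["CS-Haste"] / counts["Haste"] - 1.0
    for design, counts in total_gates.items()
}

for design, value in overhead.items():
    print(f"{design}: CS-Haste uses {value:.1%} more gates than Haste")
```

For these two designs the overhead comes out at roughly 18% for the adder and roughly 6.5% for the datapath.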
Proprietary Audio Test Chip
Area [um2] | Handwritten + TiDE 5.2 (not available) | CodeSimulink + TiDE 5.2 | CodeSimulink + TiDE 6.0
Sequential | 32,018                                 | 89,792                  | 11,632
Logic      | 141,676                                | 357,368                 | 152,468
Total      | 173,694                                | 468,746                 | 164,100

TiDE 5.2 has limitations (e.g. registers placed at the inputs instead of the outputs) which made the Simulink-to-Haste conversion very inefficient.
TiDE 6.0 has overcome these limitations, and the automatically generated ASIC is smaller than the handwritten one.
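The size comparison can be made concrete by normalizing the total areas to the handwritten design. The Python sketch below only restates the totals reported for the audio test chip; the variable names are illustrative.

```python
# Total area in um2 for the three versions of the audio test chip.
total_area = {
    "Handwritten + TiDE 5.2": 173_694,
    "CodeSimulink + TiDE 5.2": 468_746,
    "CodeSimulink + TiDE 6.0": 164_100,
}

baseline = total_area["Handwritten + TiDE 5.2"]

# Area of each version relative to the handwritten design.
relative = {name: area / baseline for name, area in total_area.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x the handwritten area")
```

On these figures the TiDE 5.2 flow yields about 2.7 times the handwritten area, while the TiDE 6.0 flow is about 5.5% smaller than the handwritten design.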
Conclusions






Optimization at system level (CodeSimulink), followed by automatic translation to Haste, can achieve the same quality as manual coding in Haste followed by hand optimization at the Haste level
The major improvement, however, is in productivity, maintainability and reusability of the CodeSimulink model
System-level cosimulation reduces development risks and makes both optimization and interdisciplinary interaction much easier
Time to market is significantly shorter
The performance reduction due to library-based design (about 10-20% on average) is more than compensated by the performance improvement achievable with high-level specification, simulation and optimization
Further manual optimizations are feasible if the economic returns justify them
That’s all, folks!
Thank you for your attention!