Using MATLAB and High-Level Synthesis for DSP Implementation

Increasingly, design teams are looking to hardware to implement DSP algorithms with higher performance and
lower power. Chris Eddington, Product Marketing Director at Synopsys, describes Synphony HLS, a new
high-level synthesis tool that can target ASIC and FPGA for both production designs and virtual prototyping.
There is no question that embedded software has many compelling benefits for the chip industry. It is
possible to adapt software for derivative products and update it to fix bugs. Design teams like software
because it reduces development risk. In fact, the industry has done such a good job of talking up the shift to
embedded software that another important design trend has slipped under the radar: the rapidly increasing
need for dedicated hardware engines in chip design.
Designers know that for many high-speed and compute-intensive DSP applications like video, WiMAX
MIMO technology, OFDM and error correction, they have little choice but to use dedicated hardware to
achieve the performance they require. Some applications don’t necessarily need a dedicated hardware
engine for performance, but designers are nevertheless considering hardware in order to achieve the
lowest-power design. In either case, the design challenge is to map the algorithm to an optimal DSP
architecture both quickly and efficiently.
Traditional Routes to DSP Architectures
For many years, the traditional path from DSP concept to implementation has been for system designers to
model the algorithm using a high-level language and hand it off for the design team to figure out the best
architecture. The design team then verifies its RTL description against the algorithm specification before
implementing the chip using logic synthesis, optimization and layout tools.
There are obvious problems with this approach. For one thing, it requires multiple re-coding and
re-verification steps involving manual effort and the potential introduction of errors in translation. Algorithm
specialists prefer to develop a floating-point model first to explore and validate the basic algorithm in full
precision. Once the algorithm concept is working, they will develop a fixed-point model then choose and
validate word length and precision. Then, the design team will choose the architecture and start RTL coding
with the target technology in mind. Furthermore, prototyping is often required for high-performance
system-level validation of the algorithm implementation. This can mean even more re-coding, re-verification and a
different type of expertise to optimize and map the design into an FPGA. Each of these steps is
time-consuming and error-prone, leading to months of time and effort to get from algorithm concept into prototype
and implementation (Figure 1). This means that verification and validation happen very late in the design
cycle.
Figure 1. Traditional Flow from DSP Concept to Implementation
Higher Abstraction with MATLAB
Increasingly, DSP architects use the MATLAB® environment for early, high-level, floating-point algorithm
exploration, analysis, and specification. The MathWorks MATLAB high-level language and interactive
environment enables engineers to describe complex systems quickly and concisely, then analyze, visualize,
and verify their operation using interactive tools and command-line functions.
When designers use MATLAB with the Simulink® environment, they can perform fast, efficient simulation for
both floating- and fixed-point designs and also handle multi-rate discrete time issues. The sophisticated
visualization and analysis features have made MATLAB and Simulink the tools of choice for an increasing
number of DSP algorithm designers.
Because of MATLAB’s widespread use as a precursor to chip design, some EDA and chip vendors have
made various attempts to automate the creation of RTL from the MATLAB environment. Often, the proposed
solutions have drawbacks, which is why many designers still choose to design the architecture and write the
RTL manually.
One way to get from MATLAB to chip implementation is to use IP instantiation and netlisting. This requires a
chip or FPGA vendor to supply matching libraries of highly parameterized IP models – one for Simulink and
a corresponding library for the target technology. Each model represents a DSP operation, such as an FFT
or FIR function. Once the design team has captured and proven the algorithm in Simulink, it can quickly and
easily write out a netlist for the target technology.
There are drawbacks to this approach. First, the design is far from portable. In fact, the DSP architects have
to work with a relatively low-level library and make decisions that would normally be the remit of the hardware
design team – for example, specifying details like how to build a delay line (RAM or registers), and how
much latency it should have. This goes against one of the principal aims of DSP architects in working with
MATLAB’s ‘M’ language, which is to explore algorithms at a high level of abstraction.
Synphony HLS Key Technologies
The IP instantiation technique described above is really just netlist translation. Synopsys’ Synphony HLS is a
true high-level synthesis solution for MATLAB users working with DSP chip applications. It produces
optimized RTL from a single high-level source that designers can target to multiple ASIC and FPGA
technologies – for production or rapid prototyping and at-speed validation. It lets designers quickly and
easily explore different implementation architectures, synthesize an optimal architecture including control
circuitry, and create the design implementation. Synphony HLS also generates C-models that let designers
quickly validate the overall system, and make an early start on developing software (Figure 2).
Figure 2. Synphony High Level Synthesis Flow
Fixed-Point Representation
Synphony HLS provides a fast and efficient way for designers to derive fixed-point models from
floating-point descriptions in MATLAB. It provides a rule-based, fixed-point propagation flow that allows designers to
generate, explore and integrate M-code functions within Synphony HLS models. Designers can continue to
work at a high level of abstraction and debug the models in the Simulink environment.
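
As a generic illustration of the word-length exploration involved in moving from floating point to fixed
point, the following Python sketch quantizes a few floating-point coefficients to a signed fixed-point
format and reports the worst-case error. It is not the rule-based propagation flow in Synphony HLS: the
names quantize and max_error, the example coefficients, and the round-and-saturate behavior are all
assumptions made for illustration.

```python
def quantize(x, word_length, frac_bits):
    """Round x to a signed fixed-point value, saturating at the format limits.

    word_length is the total number of bits (including sign) and frac_bits
    is the number of bits to the right of the binary point. Both parameter
    names are illustrative assumptions.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (word_length - 1))       # most negative integer code
    highest = (1 << (word_length - 1)) - 1   # most positive integer code
    code = round(x * scale)                  # nearest integer code
    code = max(lowest, min(highest, code))   # saturate instead of wrapping
    return code / scale


# Hypothetical floating-point filter coefficients.
COEFFS = [0.1, 0.25, 0.3, 0.25, 0.1]


def max_error(word_length, frac_bits, coeffs=COEFFS):
    """Worst-case absolute quantization error over the coefficient set."""
    return max(abs(c - quantize(c, word_length, frac_bits)) for c in coeffs)


if __name__ == "__main__":
    # Compare a few candidate word lengths, each with one sign bit
    # and the remaining bits used for the fraction.
    for wl in (8, 12, 16):
        print(f"{wl}-bit: max error = {max_error(wl, wl - 1):.8f}")
```

Sweeping the word length in this way shows the basic trade-off a designer validates at this stage: each
additional fractional bit halves the worst-case rounding error at the cost of wider datapaths.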
Mixed Design Descriptions
Synphony HLS offers a mix of language and model-based design in one environment, which allows
engineers to specify and partition complex behavior with multiple sample rates, interfaces and functional
boundaries.
To support model-based design, the Simulink IP block library within Synphony HLS includes common math
and multi-rate signal processing functions for wireless, telecommunications and multimedia applications.
Synphony HLS automatically selects the parameterized blocks during high level synthesis to produce an
optimized architecture that meets the timing and area constraints.
At the algorithm level, the IP block library requires the DSP engineers to specify only high-level parameters
such as filter coefficients and gain requirements. As such, the Simulink model does not constrain the
implementation, and so provides an appropriate hand-off point to the hardware design team. Debug features
are built into the models, so that verification engineers can easily log, override or clock signals for debugging
and analysis.
Support for Multi-Rate Design
Support for multi-rate design is a common requirement for many high-performance algorithms. Typically the
DSP engineer will analyze the algorithm and decide where it is necessary to change the sample rate. The IP
library includes blocks for sample-rate conversion, which the DSP expert can instantiate and parameterize
so that there is no ambiguity when the hardware team takes the design through to implementation.
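
To make the idea of an explicit sample-rate change concrete, here is a minimal Python sketch of decimation
(filter, then keep every Nth sample) and interpolation (insert zeros, then filter). It is a textbook
illustration only and does not represent the blocks in the Synphony HLS IP library; the function names and
the example coefficients are assumptions.

```python
def fir(coeffs, samples):
    """Direct-form FIR filter; samples before the start are treated as zero."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def decimate(samples, factor, coeffs):
    """Lower the sample rate by `factor`: filter first, then drop samples."""
    filtered = fir(coeffs, samples)
    return filtered[::factor]


def interpolate(samples, factor, coeffs):
    """Raise the sample rate by `factor`: zero-stuff, then filter."""
    stuffed = []
    for x in samples:
        stuffed.append(x)
        stuffed.extend([0] * (factor - 1))
    return fir(coeffs, stuffed)


if __name__ == "__main__":
    # Hypothetical two-tap filter used for both directions.
    taps = [1, 1]
    print(decimate([4] * 8, 2, taps))
    print(interpolate([1, 2, 3], 2, taps))
```

The rate-change factor and the filter are the only parameters in this sketch, which mirrors the point made
above: once they are fixed explicitly, the downstream implementation has no rate ambiguity to resolve.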
The choice of multi-rate clocking strategies has a significant impact on power consumption. Synphony HLS
can auto-generate clock domains to support different clocking strategies, which allows the design team to
explore this area of the design thoroughly, knowing that they can implement the clocking scheme quickly
and without error.
High-Level Synthesis
The hardware design team takes the Simulink model and specifies the target technology, the desired
sample rates and speed requirements. The high-level synthesis tool evaluates a number of different
solutions before creating RTL based on the timing and area constraints.
Synphony HLS uses advanced system-level optimization techniques such as retiming, resource allocation
and sharing, loop unrolling, scheduling (folding), multi-channelization, and architectural selection to produce
an optimal design.
HLS Folding
Folding takes the operations associated with a datapath and maps them onto fewer resources operating at a
higher rate. For example, consider a FIR filter with 100 taps (stages) running at 1 MHz. Each tap has an
associated multiplier and adder function. One approach would be to use 100 multipliers and 100 adders
running at 1 MHz. Alternatively the architecture could comprise one multiplier and one adder running at 100
MHz, with the intermediate results being stored in memory. Synphony HLS will create the option that
minimizes area while meeting the timing constraints.
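
The arithmetic behind this example can be sketched in a few lines of Python. The code below computes how many
multiply-accumulate (MAC) units are needed for a given clock-to-sample-rate ratio, and shows that a
time-multiplexed single-MAC loop produces the same result as the fully parallel form. The function names are
hypothetical, and the sketch models only the arithmetic, not how Synphony HLS schedules or stores
intermediate results.

```python
import math


def folding_factor(clock_hz, sample_rate_hz):
    """Number of clock cycles available per input sample."""
    if clock_hz < sample_rate_hz:
        raise ValueError("clock must be at least as fast as the sample rate")
    return clock_hz // sample_rate_hz


def macs_required(num_taps, clock_hz, sample_rate_hz):
    """Minimum number of MAC units if each unit is reused every clock cycle."""
    return math.ceil(num_taps / folding_factor(clock_hz, sample_rate_hz))


def fir_parallel(coeffs, window):
    """One output sample using a dedicated multiplier and adder per tap."""
    return sum(c * x for c, x in zip(coeffs, window))


def fir_folded(coeffs, window):
    """One output sample using a single MAC reused once per clock cycle."""
    acc = 0  # models the accumulator that holds the intermediate result
    for c, x in zip(coeffs, window):  # each iteration models one clock cycle
        acc += c * x
    return acc


if __name__ == "__main__":
    # The 100-tap, 1 MHz filter from the text at three candidate clock rates.
    for clock in (1_000_000, 25_000_000, 100_000_000):
        print(clock, macs_required(100, clock, 1_000_000))
```

With the figures from the text, a 1 MHz clock needs all 100 MAC units and a 100 MHz clock needs only one;
intermediate clock rates fall between these extremes.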
HLS Multi-Channelization
Consider a video signal in which the same DSP operations are required on the red, green, and blue
channels. In this case, the user need only identify one channel and tell Synphony HLS to use it for multiple
signals if it can. If the sample rate is sufficiently low compared to the system clock, the synthesis engine will
automatically identify the additional channels and apply the multi-channelization technique to them.
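
The following Python sketch illustrates the principle of multi-channelization under simple assumptions: one
shared filter datapath processes interleaved red, green and blue samples, keeping a separate delay line per
channel, and produces the same outputs as three dedicated filters. The function names, the round-robin
interleaving order and the sharing condition are assumptions for illustration and are not taken from
Synphony HLS.

```python
def can_share(num_channels, sample_rate_hz, clock_hz):
    """Assumed sharing condition: one clock cycle per channel per sample period."""
    return clock_hz >= num_channels * sample_rate_hz


def fir_single(coeffs, samples):
    """Reference model: a dedicated filter for one channel."""
    delay = [0] * len(coeffs)
    out = []
    for x in samples:
        delay = [x] + delay[:-1]  # shift the new sample into the delay line
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out


def fir_multichannel(coeffs, channels):
    """One shared datapath serving several channels in round-robin order."""
    delays = [[0] * len(coeffs) for _ in channels]  # per-channel state
    outs = [[] for _ in channels]
    # Interleave the streams: ch0[0], ch1[0], ch2[0], ch0[1], ch1[1], ...
    interleaved = [(ch, x) for frame in zip(*channels) for ch, x in enumerate(frame)]
    for ch, x in interleaved:
        delays[ch] = [x] + delays[ch][:-1]
        # The same multiply-add logic is reused for every channel.
        outs[ch].append(sum(c * d for c, d in zip(coeffs, delays[ch])))
    return outs


if __name__ == "__main__":
    red, green, blue = [1, 2, 3, 4], [10, 20, 30, 40], [5, 5, 5, 5]
    print(fir_multichannel([1, 2, 1], [red, green, blue]))
```

Only the state (the delay lines) is replicated per channel; the arithmetic is shared, which is where the
area saving comes from when the clock is fast enough relative to the sample rate.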
The Synphony HLS engine automatically optimizes the entire design at multiple levels by applying
pipelining, scheduling and binding optimizations across language and model boundaries.
Optimizing for the Target Technology
Synphony HLS uses built-in characterization technologies for fast timing analysis. Fast timing analysis is
good for quickly comparing the performance of a range of different architectures. But to truly optimize a
design, the high-level synthesis engine needs to know the performance of different operators in the target
technology. To do this Synphony HLS uses Synplify Premier (FPGA) and Design Compiler (ASIC) for the
accurate timing estimation needed to make device-specific optimizations for FPGA and ASIC targets. This
methodology enables designers to rapidly explore various architectural tradeoffs from a single model. More
importantly, it increases the reliability of verification through design project phases, whether the target is for
FPGA prototyping, fast architecture exploration, or ASIC implementation.
SoC Integration
Synphony HLS allows users to control and specify the timing of the interfaces to the DSP engine so that it is
easier to integrate the design within a SoC design. It takes the (untimed) model and M-language input and
compiles it into an intermediate format, which is ‘approximately timed’. This representation specifies latency
and has some cycle-accurate timing, but doesn’t yet have full timing information like the RTL description.
The hardware design team can use the approximately timed model to check that buffers are sized
appropriately at their inputs and outputs.
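
One kind of check that latency information makes possible is a simple buffer-occupancy simulation. The
Python sketch below assumes a producer that writes one sample per cycle and an engine that starts reading
after a fixed start-up latency, then reports the peak number of samples the input buffer must hold. The
schedule model and all function names are hypothetical; they do not describe the intermediate format that
Synphony HLS actually produces.

```python
def peak_occupancy(writes, reads):
    """Peak buffer fill level given per-cycle write and read counts.

    The level is sampled after each cycle's writes and before its reads,
    which is a conservative assumption.
    """
    occupancy = 0
    peak = 0
    for cycle, (w, r) in enumerate(zip(writes, reads)):
        occupancy += w
        peak = max(peak, occupancy)
        if r > occupancy:
            raise ValueError(f"buffer underflow at cycle {cycle}")
        occupancy -= r
    return peak


def streaming_schedules(num_samples, latency):
    """Assumed traffic: one write per cycle, reads delayed by `latency` cycles."""
    total = num_samples + latency
    writes = [1 if c < num_samples else 0 for c in range(total)]
    reads = [1 if c >= latency else 0 for c in range(total)]
    return writes, reads


def buffer_is_large_enough(depth, writes, reads):
    """True if a buffer of `depth` entries never overflows for this traffic."""
    return peak_occupancy(writes, reads) <= depth


if __name__ == "__main__":
    w, r = streaming_schedules(num_samples=8, latency=3)
    print("peak occupancy:", peak_occupancy(w, r))
```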
Virtual Prototyping and Verification
As well as producing RTL, Synphony HLS generates flexible, high-performance fixed-point ANSI C-models
that the verification team can use in virtual platforms for early software development and system simulation,
evaluation and analysis. To help verification engineers, Synphony HLS can also auto-generate testbenches.
Summary
DSP design is currently one of the fastest-growing application areas in digital electronics. Both DSP
architects and hardware design engineers have robust and proven design tools: MATLAB and Simulink for
the architects and logic simulation and synthesis for the design engineers. Until now, however, there has
been no efficient, automated methodology to bridge the two domains.
Synphony HLS provides a more automated design and verification flow from high-level MATLAB descriptions. It
enables algorithm and system engineers to prototype, validate, and explore their algorithm concepts much
earlier in the design cycle and it allows them to continue to work at a high level of abstraction and have a
smooth handover to the hardware and verification teams.
Hardware designers have a robust starting point with the mixed M-language and model-based specification.
Synphony HLS allows them to quickly explore different architectures and select the best for their
performance and power goals, and then implement the architecture in their chosen target technology,
whether ASIC, FPGA or prototype, without having to re-code the design.
Chris Eddington
Product Marketing Director for High Level Synthesis and System Level Products
Chris Eddington drives the product and technical marketing for high level synthesis and system level
products in the Synplicity Business Group.
Prior to joining Synopsys, Mr. Eddington was Director of Product Marketing at Synplicity, Inc., which was
acquired by Synopsys in May 2008. Before Synplicity he was at Mellanox Technologies where he led the
strategic and technical marketing for networking ICs in the high performance computing market. While at
8x8 Inc. he developed several DSP microprocessors for video and voice processing applications.
Before that, he worked as a systems analyst at NASA’s Jet Propulsion Laboratory and held several IC
design positions in the wireless communications and networking industry.
Mr. Eddington holds a master’s degree in Signal and Image Processing from the University of Southern
California and an undergraduate degree in Physics and Math from Principia College.