7 Series DSP Resources
Part 1
Objectives
After completing this module, you will be able to:
Describe the primary usage models of DSP slices
Describe the DSP slice in the 7 series FPGAs
Lessons
DSP Overview
7 Series FPGA DSP Slice
Pre-Adder and Dynamic Pipeline
Control Advantages
IP Support and Inference
Summary
Performance requirements
are outpacing traditional
DSP solutions
(Algorithmic & Processor Forecast)
Performance
Growing DSP Performance Gap
4G
LTE
SDR
3G Imaging
Radar
SD/HD Video
DSP/GPP
*
Traditional
Processor
Architectures
1960
* Source: Jan Rabaey, BWRC
1970
1980
1990
2000
2010
Need a
solution
to fill the gap
Typical DSP Operation
Diagram of a typical FIR filter
– Parallel computing process by nature
– N number of taps
– N multiplications should happen in parallel
Viewed as an Equation
i=N-1
Y(n) =

Coefficients
ki.X(n-i)
i=0
Accumulate
N times
Viewed as a Diagram
X(n)
Delay
(Register)
Z-1
Z-1
Z-1
Z-1
k0
k1
k2
k3
kN-1
Coefficient
Multiply
Multiply

Y(n)
Summation
Sequential vs. Parallel DSP Processing
Standard DSP Processor –
Sequential
(Generic DSP)
X
C1
C0
X
X
C2
C3
Reg
+
C0
Reg
3960 clock
cycles
needed
X
Reg
Coefficients
Data In
Reg
Data In
FPGA - Fully Parallel Implementation
(7 Series FPGA)
X …C2015
Single-MAC Unit
Reg
3960 operations
in 1 clock cycle
+
Data Out
Data Out
1.2 GHz
= 303 KSPS
3960 clock cycles
600 MHz
1 clock cycle
= 600 MSPS
X
DSP Slice Features
48
48
A:B
B REG
B
C
D
Input Conditioning
A
CE
D
Q
2-Deep
ALUMode
18
M REG
CE
D
Q
A REG
CE
D
Q
2-Deep
72 36 0
36 0
1
25
X
4
P REG
CE
D
Y
Z
17-bit shift
CarryIn
C or MC
7
OpMode
Q
Z-2
PATTERN
DETECT
=
C REG
Z-1
P
0
17-bit shift
CE
D
Q
48
MULT
Z
-1
OP
CTL
48
ADD
Z-1
FIR Filter Mapped to DSP Slices
This filter structure, while referred to as a
Systolic FIR filter, is really a Direct Form
Type I with one extra stage of pipelining
The input time delay series is created inside
the DSP slice for maximum performance
irrespective of the number of coefficients
x(n)
18
Coefficients are
from left to right,
causing the
latency to be as
large and grow
with the increase
of coefficients
K0
K1
K2
K3
y(n)
38
0
DSP48E1 Slice
DSP48E1 Slice
opmode = 0000101 opmode = 0010101
Dedicated cascade
connections (PCOUT and
PCIN) are exploited to
achieve maximum
performance
Non-DSP Functions (Addition)
START: This is the typical
adder tree found in many
signal processing designs
1
in
Remove all pipelining
from the tree. This makes it
easier to understand and
visualize the changes
2
Rearrange the tree. Notice
that functionally has not
changed. The diagram has
just been redrawn
in
in
out
out
out
3
Pipelining is required for
performance. Adding one in
the chain requires one in the
data path delay as well.
Determining mapping to
DSP48E is easy now
in
out
0
DSP48E Slice
opmode = 0000101
DSP48E Slice
opmode = 0010101
Lessons
DSP Review
7 Series FPGA DSP Slice
Pre-Adder and Dynamic Pipeline
Control Advantages
IP Support and Inference
Summary
Summary
All 7 series FPGAs contain the same DSP48E1 cell
– The DSP48E1 is identical to the one used in the Virtex-6 FPGA
The DSP48E1 cell has the following features
– 25x18 signed multiplier
– 48-bit add/subtract/accumulate
– Pipeline registers for high speed
– Pattern detector
– SIMD operators
– Cascade paths
– 25 bit pre-adder
– Dynamic pipeline control
Where Can I Learn More?
7 Series DSP48E1 Slice User Guide
– Slice description
– Design consideration
• How to design to optimize for power and performance
• How to use advanced design techniques
– Design recommendations for XST
• This guide has example inferences of many architectural resources
– XST User Guide
• Refer to the Coding Techniques chapter
Xilinx Education Services courses
– www.xilinx.com/training
• Xilinx tools and architecture courses and other Free videos!
Trademark Information
Xilinx is disclosing this Document and Intellectual Property (hereinafter “the Design”) to you for use in the development of designs to operate on,
or interface with Xilinx FPGAs. Except as stated herein, none of the Design may be copied, reproduced, distributed, republished, downloaded,
displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or
otherwise, without the prior written consent of Xilinx. Any unauthorized use of the Design may violate copyright laws, trademark laws, the laws of
privacy and publicity, and communications regulations and statutes.
Xilinx does not assume any liability arising out of the application or use of the Design; nor does Xilinx convey any license under its patents,
copyrights, or any rights of others. You are responsible for obtaining any rights you may require for your use or implementation of the Design.
Xilinx reserves the right to make changes, at any time, to the Design as deemed desirable in the sole discretion of Xilinx. Xilinx assumes no
obligation to correct any errors contained herein or to advise you of any correction if such be made. Xilinx will not assume any liability for the
accuracy or correctness of any engineering or technical support or assistance provided to you in connection with the Design.
THE DESIGN IS PROVIDED “AS IS" WITH ALL FAULTS, AND THE ENTIRE RISK AS TO ITS FUNCTION AND IMPLEMENTATION IS WITH
YOU. YOU ACKNOWLEDGE AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN INFORMATION OR ADVICE,
WHETHER GIVEN BY XILINX, OR ITS AGENTS OR EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS,
IMPLIED, OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE, TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS.
IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES,
INCLUDING ANY LOST DATA AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE DESIGN, EVEN IF YOU HAVE
BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH
YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE, WILL IN NO EVENT EXCEED THE AMOUNT OF
FEES PAID BY YOU TO XILINX HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF ANY, REFLECT
THE ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU
WITHOUT THESE LIMITATIONS OF LIABILITY.
The Design is not designed or intended for use in the development of on-line control equipment in hazardous environments requiring fail-safe
controls, such as in the operation of nuclear facilities, aircraft navigation or communications systems, air traffic control, life support, or weapons
systems (“High-Risk Applications”). Xilinx specifically disclaims any express or implied warranties of fitness for such High-Risk Applications. You
represent that use of the Design in such High-Risk Applications is fully at your risk.
© 2012 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All
other trademarks are the property of their respective owners.
7 Series DSP Resources
Part 2
Objectives
After completing this module, you will be able to:
Describe the DSP slice in the 7 series FPGAs
Lessons
DSP Overview
7 Series FPGA DSP Slice
Pre-Adder and Dynamic Pipeline
Control Advantages
IP Support and Inference
Summary
A:B
18
25
C
C
48
Y
P
C’
0
=
>>17
Z
Carry
30
ACIN
BCIN
INMODE
5
 25x18 signed multiplier
 48-bit add/subtract/accumulate
 48-bit logic operations
 Pipeline registers for high speed
 Pattern detector
 SIMD operations (12/24 bit)
 Cascade paths for wide functions
 Pre-adder
7
3
CarryInSel
18
OpMode
>>17
4
48
P
48
P
2
PATTERN_
DETECT
PCIN
D
43
43
0
1
25
CARRY
OUT
PATTERN
30
M
X
C’
A
25 X 18
30
86 0
ALUMode
30
Dual A, D
Register
With
Pre-adder
4
6
MULT
SIGNIN
48
PCOUT
CARRY
CASCOUT
MULT
SIGNOUT
18
Dual B
Register
P
18
CarryIn
B
18
CARRY
CASCIN
ACOUT
BCOUT
7 Series DSP48E1 Slice
X, Y, and Z Multiplexers
Adder/subtractor operates on
X, Y, Z and CIN operands
– Table shows basic operations
X, Y, and Z multiplexers allow
for dynamic OPMODEs
Multiplier output requires both
X and Y multiplexers
Normal or 17-bit right shifted with
MSB fill for multi-precision
arithmetic
ALUMODE
Operation
0000
Z + X + Y + CIN
0001
-Z + (X + Y + CIN) – 1
0010
-Z – X – Y – CIN – 1
0011
Z – (X + Y + CIN)
Others
Logic Operations
Apply Your Knowledge
OPMODE
Controls the behavior of X, Y, and Z multiplexers
1) Given this OPMODE table,
what is the OPMODE for the
following functions?
– C + A:B
– A*B + C
– P + C + PCIN
Z
6 5
x x
x x
x x
x x
Y
4 3 2
x x x
x 0 1
x x x
x x x
1
0
0
1
1
Z
6 5
x x
x x
x x
x x
Y
4 3 2
x 0 0
x 0 1
x 1 0
x 1 1
1
x
0
x
x
Y
3 2
x x
x x
x x
x x
x x
x x
x x
x x
1
x
x
x
x
x
x
x
x
6
0
0
0
0
1
1
1
1
Z
5
0
0
1
1
0
0
1
1
4
0
1
0
1
0
1
0
1
X
0
0
1
0
1
X
0
x
1
x
x
X
0
x
x
x
x
x
x
x
x
Select
Zero
Mulitplier Output (A xB)
P
A:B
Notes
Default
Must select with Y[3:2]=01
48 bits wide
Select
Zero
Mulitplier Output (A x B)
48'hFFFF_FFFF_FFFF
C
Notes
Default
Must select with X[1:0]=01
For Bitwise Logic Ops
Select
Zero
PCIN
P
C
P
Shift(PCIN)
Shift(P)
Notes
Default
For MACC extend
Shifted by 17 bits
Shifted by 17 bits
illegal selection
Dual B Register
B input to multiplier is controlled by
INMODE[4]
– Dynamically selects B1/B2 pipeline level
B input to X MUX and BCOUT cascade
outputs are statically controlled by
bitstream options
B
BCIN
18
B1
Bitstream Controlled
Dynamically Controlled
18
BCOUT
18
X MUX
18
B2
INMODE[4]
B MULT
Dual A, D, Registers and Pre-Adder
Bitstream Controlled
A input to multiplier is controlled by
INMODE[3:0]
Dynamically Controlled
– Dynamically selects A1/A2 pipeline level
– Dynamically selects add/subtract
30
ACOUT
– Dynamically selects Zero for A or D
A
30
ACOUT and X MUX input are statically
controlled
30
ACIN
INMODE[1]
A2
A1
X MUX
25
INMODE[0]
A MULT
D
25
25
D
INMODE[2]
INMODE[3]
AD
25
Two-Input Logic Functions
ALUMODEs
48-bit logic operations
– XOR, XNOR, AND, NAND, OR,
NOR, NOT
ALUMODE[3:0]
0
P
A:B
X
0
1
Y
0
PCIN
P
C
Z
OPMODE[3:0]
P
Logic Unit Mode
OPMODE[3:2]
ALUMODE[3:0]
X XOR Z
00
0100
X XNOR Z
00
0101
X XNOR Z
00
0110
X XOR Z
00
0111
X AND Z
00
1100
X AND (NOT Z)
00
1101
X NAND Z
00
1110
(NOT X) OR Z
00
1111
X XNOR Z
10
0100
X XOR Z
10
0101
X XOR Z
10
0110
X XNOR Z
10
0111
X OR Z
10
1100
X OR (NOT Z)
10
1101
X NOR Z
10
1110
(NOT X) AND Z
10
1111
Pattern Detect and SIMD
Pattern detection
– Pattern and mask operation on output of adder
=
• Pattern can be constant (set by attribute) or C input
– Enables
•
•
•
•
Symmetric rounding for multi-precision operations
Convergent rounding
Saturation
Accumulator terminal count
SIMD operations
– 48-bit adder broken into 2x24 bits or 4x12 bits
• Allows two or four independent additions to be done
– Carry bits brought out independently and
disabled between sections
• Carry bits can be cascaded between DSP48E1
slices
C or MC
P
Cascade Paths
Cascade paths exist from each DSP48E1 slice to the slice above
it
– A input, B input, P output, and carry out
• P cascade path can be shifted by 17 bits by slice above
Enables common functions with little or no additional resources
– Wider accumulators, wider multipliers, complex multipliers, and FIR filters
– Example: 35-bit x 25-bit multiplier with two DSP48E1s
25
DSP48_1
OPMODE 0010101
ALUMODE 0000
B
P
B[34:17]
ACIN
DSP48_0
A[24:0]
OPMODE 0000101
0,B[16:0]
ALUMODE 0000
A
18
P[42:0] = OUT[59:17]
SHIFT 17
25
B
P
18
P[16:0] = OUT[16:0]
Lessons
DSP Review
7 Series FPGA DSP Slice
Pre-Adder and Dynamic Pipeline
Control Advantages
IP Support and Inference
Summary
Summary
All 7 series FPGAs contain the same DSP48E1 cell
– The DSP48E1 is identical to the one used in the Virtex-6 FPGA
The DSP48E1 cell has the following features
– 25x18 signed multiplier
– 48-bit add/subtract/accumulate
– Pipeline registers for high speed
– Pattern detector
– SIMD operators
– Cascade paths
– 25 bit pre-adder
– Dynamic pipeline control
Trademark Information
Xilinx is disclosing this Document and Intellectual Property (hereinafter “the Design”) to you for use in the development of designs to operate on,
or interface with Xilinx FPGAs. Except as stated herein, none of the Design may be copied, reproduced, distributed, republished, downloaded,
displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or
otherwise, without the prior written consent of Xilinx. Any unauthorized use of the Design may violate copyright laws, trademark laws, the laws of
privacy and publicity, and communications regulations and statutes.
Xilinx does not assume any liability arising out of the application or use of the Design; nor does Xilinx convey any license under its patents,
copyrights, or any rights of others. You are responsible for obtaining any rights you may require for your use or implementation of the Design.
Xilinx reserves the right to make changes, at any time, to the Design as deemed desirable in the sole discretion of Xilinx. Xilinx assumes no
obligation to correct any errors contained herein or to advise you of any correction if such be made. Xilinx will not assume any liability for the
accuracy or correctness of any engineering or technical support or assistance provided to you in connection with the Design.
THE DESIGN IS PROVIDED “AS IS" WITH ALL FAULTS, AND THE ENTIRE RISK AS TO ITS FUNCTION AND IMPLEMENTATION IS WITH
YOU. YOU ACKNOWLEDGE AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN INFORMATION OR ADVICE,
WHETHER GIVEN BY XILINX, OR ITS AGENTS OR EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS,
IMPLIED, OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE, TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS.
IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES,
INCLUDING ANY LOST DATA AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE DESIGN, EVEN IF YOU HAVE
BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH
YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE, WILL IN NO EVENT EXCEED THE AMOUNT OF
FEES PAID BY YOU TO XILINX HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF ANY, REFLECT
THE ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU
WITHOUT THESE LIMITATIONS OF LIABILITY.
The Design is not designed or intended for use in the development of on-line control equipment in hazardous environments requiring fail-safe
controls, such as in the operation of nuclear facilities, aircraft navigation or communications systems, air traffic control, life support, or weapons
systems (“High-Risk Applications”). Xilinx specifically disclaims any express or implied warranties of fitness for such High-Risk Applications. You
represent that use of the Design in such High-Risk Applications is fully at your risk.
© 2012 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All
other trademarks are the property of their respective owners.
7 Series DSP Resources
Part 3
Objectives
After completing this module, you will be able to:
Describe the basic usage models of DSP slices
Describe the DSP slice in the 7 series FPGAs
Lessons
DSP Overview
7 Series FPGA DSP Slice
Pre-Adder and Dynamic Pipeline
Control Advantages
IP Support and Inference
Summary
Pre-Adder
The pre-adder can add or subtract the two 25-bit operands on the
A and the D inputs before the result drives the multiplier
Benefits
– Perfect for operations using symmetrical coefficients
– Doubles the efficiency of symmetric FIR and symmetric IIR and transpose
convolution filters
– Half the power consumption compared to architectures without a pre-adder
– Smaller total logic footprint
– A small change with a big benefit
Symmetrical Filters
When the coefficients are symmetrical
– The pre-adders either reduce the number of multiplications by 50% or
double the sample rate
– Factorizing the taps replaces one multiplication by a pre-addition
(or pre-subtraction)
Symmetrical Filter Example
k13
k17
Non symmetrical filter (k13≠k17) :
(tap13×k13) + (tap17×k17)
2 mults and one post-add
Symmetrical filter (k13=k17) :
(tap13+tap17) × k13
Direct benefit: saves 50% of the DSP slices
1 pre-add
1 mult
Six-Tap Transpose FIR Filter Without Pre-Adder
x(n)
z-2
z-2
z-2
z-2
z-2
z-2
x(n-2)
k0
X
k1
x(n-7)
k2
x(n-6)
z-1
0
X
+
k2
x(n-5)
z-1
z-1
X
+
k1
+
k0
z-1
z-1
z-1
X
X
x(n-3)
x(n-4)
z-1
z-1
X
+
z-1
+
z-1
z-1
Uses six legacy DSP slices (without pre-adder)
+
z-1
y(n-4)
Six-Tap Transpose FIR Filter Using the Pre-Adder
x(n)
z-2
z-2
z-2
z-1
+
+
z-1
z-1
z-1
x(n-6)+x(n-3)
X
k1
z-1
0
z-1
+
x(n-5)+x(n-4)
k2
z-1
+
x(n-7)+x(n-2)
X
k0
z-1
z-1
+
X
z-1
z-1
+
z-1
Optimized implementation supported by
y(n-4)
XST
using only three slices instead of six
Dynamic Pipeline Control
The 7 series FPGA DSP slice has dynamic pipeline control on the
A and B registers
– User can select which of the two pipeline registers to use for calculations
on a clock-by-clock basis
Benefits
– Allows an operation to reuse the same operand in subsequent cycles
Application: Sequential Complex Multiply
Complex Multiply
• (A + ai) * (B + bi) = (AB-ab) + (Ab+aB)i
• Use the two AB registers to locally store the real and imaginary parts of the operands
– Read each component of the complex operands out of memory only once
– Fewer memory reads because A, a, B, and b are then stored locally
CEB1
Dynamic routing is controlled by an FSM
updating the INMODE register on the fly
B
B
b
CEB2
INMODE
CEA1
X
+
m
A
USE_DPORT=FALSE
A
a
CEA2
INMODE
Needs only four clocks
for 18-bit data
using a single slice
Application: Sequential Large Multiply
Four-step large multiplication
42 bits * 34 bits = (A:a) * (B:b) =
A*B + sh17(A*0b + B*00000000a + sh17(0b*00000000a)
Needs only four clocks for 18-bit data using a single slice
Lessons
DSP Review
7 Series FPGA DSP Slice
Pre-Adder and Dynamic Pipeline
Control Advantages
IP Support and Inference
Summary
IP Support and Inference
Some basic functions can be inferred
– Example: Multiplier, Multiply-Accumulate, …
Other functions are supported through the CORE Generator™
interface
– Examples: FFT, FIR Compiler, and DDS Compiler
New IP cores become available with each service pack
– Visit the IP Center for information on the newest IP cores
• www.xilinx.com/ipcenter
Inferring a 16 x 16 Multiplier
-------------------------------------------------------------------- Example: 16x16 Multiplier, inputs registered once,
-- outputs twice
-Matches 1 DSP48 slice
-OpMode(Z,Y,X):Subtract
-(xxx,01,01):0
------------------------------------------------------------------p1 <= a1*b1;
///////////////////////////////////////////////////////////////////
// Example: 16x16 Multiplier, inputs registered
// once, outputs twice
//
Matches 1 DSP48 slice
//
OpMode(Z,Y,X):Subtract
//
(xxx,01,01):0
///////////////////////////////////////////////////////////////////
assign p1 = a1*b1;
process (clk) is
begin
if clk'event and clk = '1' then
if rst = '1' then
a1 <= (others => '0');
b1 <= (others => '0');
p <= (others => '0');
elsif ce = '1' then
a1 <= a;
b1 <= b;
p <= p1;
end if;
end if;
end process;
always @(posedge clk)
if (rst == 1'b1)
begin
a1 <= 0;
b1 <= 0;
p <= 0;
end
else if (ce == 1'b1)
begin
a1 <= a;
b1 <= b;
p <= p1;
end
Inferring a Multiply Accumulate (MACC)
------------------------------------------------------------- Example: Multiply add function, single level of register
-Matches 1 DSP48 slice
-OpMode(Z,Y,X):Subtract
-(011,01,01):0
-----------------------------------------------------------p1 <= a*b + c;
process (clk) is
begin
if clk'event and clk = '1' then
if rst = '1' then
p <= (others => '0');
elsif ce = '1' then
p <= p1;
end if;
end if;
end process;
////////////////////////////////////////////////////////////
// Example: Multiply add function, single level of register
//
Matches 1 DSP48 slice
//
OpMode(Z,Y,X):Subtract
//
(011,01,01):0
////////////////////////////////////////////////////////////
assign p1 = a*b + c;
always @(posedge clk)
if (rst == 1'b1)
p <= 0;
else if (ce == 1'b1) begin
p <= p1;
end
Inferring a 2-Input Adder
------------------------------------------------------------------------- Example: 16 bit adder 2 inputs, input and output
-- registered once
-- Mapping to DSP48 should be driven by timing as
-- DSP48 are limited resources. The -use_dsp48 XST
-- switch must be set to YES
-- Matches 1 DSP48 slice
-OpMode(Z,Y,X):Subtract
-(000,11,11):0 or
-(011,00,11):0
-----------------------------------------------------------------------p1 <= a1 + b1;
process (clk) is
begin
if clk'event and clk = '1' then
if rst = '1' then
p <= (others => '0');
a1 <= (others => '0');
b1 <= (others => '0');
elsif ce = '1' then
a1 <= a;
b1 <= b;
p <= p1;
end if;
end if;
end process;
////////////////////////////////////////////////////////////////////////
// Example: 16 bit adder 2 inputs, input and output
// registered once
// Mapping to DSP48 should be driven by timing as
// DSP48 are limited resources. The -use_dsp48 XST
// switch must be set to YES
// Matches 1 DSP48 slice
//
OpMode(Z,Y,X):Subtract
//
(000,11,11):0 or
//
(011,00,11):0
////////////////////////////////////////////////////////////////////////
assign p1 = a1 + b1;
always @(posedge clk)
if (rst == 1'b1)
begin
p <= 0;
a1 <= 0;
b1 <= 0;
end
else if (ce == 1'b1)
begin
a1 <= a;
b1 <= b;
p <= p1;
end
Inferring a Loadable MACC
-------------------------------------------------------------------------------- Example: Loadable Multiply Accumulate with one level
-- of registers
-Map into 1 DSP48 slice
-Funtion: OpMode(Z,Y,X):Subtract
-- load (011,00,00):0
-- mult_acc (010,01,01):0
-- Restriction: Since C input of DSP48 slice is used, then
-- adjacent DSP cannot se a different c input (c input are
-- shared between 2 adjacent DSP48 slices)
-- Expected mapping:
-AREG: no, BREG: no, CREG: no, MREG: no, PREG: yes
------------------------------------------------------------------------------with load select p_tmp <= signed(c) when '1' ,
p_reg + a1*b1 when others;
process(clk)
begin
if clk'event and clk = '1' then
if p_rst = '1' then
p_reg <= (others => '0');
a1 <= (others => '0');
b1 <= (others => '0');
elsif p_ce = '1' then
p_reg <= p_tmp;
a1 <= signed(a);
b1 <= signed(b);
end if;
end if;
end process;
p <= std_logic_vector(p_reg);
///////////////////////////////////////////////////////////////////////////////
// Example: Loadable Multiply Accumulate with one level
// of registers
//
Map into 1 DSP48 slice
//
Funtion: OpMode(Z,Y,X):Subtract
//
- load (011,00,00):0
//
- mult_acc (010,01,01):0
// Restriction: Since C input of DSP48 slice is used, then
// adjacent DSP cannot use a different c input (c input are
// shared between 2 adjacent DSP48 slices)
// Expected mapping:
// AREG: no, BREG: no, CREG: no, MREG: no, PREG: yes
///////////////////////////////////////////////////////////////////////////////
assign p_tmp = load ? c:p + a1*b1;
always @(posedge clk)
if (p_rst == 1'b1) begin
p <= 0;
a1 <=0;
b1 <=0;
end
else if (p_ce == 1'b1) begin
p <= p_tmp;
a1 <=a;
b1 <= b;
end
Lessons
DSP Review
7 Series FPGA DSP Slice
Pre-Adder and Dynamic Pipeline
Control Advantages
IP Support and Inference
Summary
Summary
All 7 series FPGAs contain the same DSP48E1 cell
– The DSP48E1 is identical to the one used in the Virtex-6 FPGA
The DSP48E1 cell has the following features
–
–
–
–
–
–
–
–
25x18 signed multiplier
48-bit add/subtract/accumulate
Pipeline registers for high speed
Pattern detector
SIMD operators
Cascade paths
25 bit pre-adder
Dynamic pipeline control
DSP48E1 slices can be inferred, instantiated or accessed using IP
cores
Where Can I Learn More?
7 Series DSP48E1 Slice User Guide
– Slice description
– Design consideration
• How to design to optimize for power and performance
• How to use advanced design techniques
– Design recommendations for XST
• This guide has example inferences of many architectural resources
– XST User Guide
• Refer to the Coding Techniques chapter
Xilinx Education Services courses
– www.xilinx.com/training
• Xilinx tools and architecture courses and other Free videos!
Trademark Information
Xilinx is disclosing this Document and Intellectual Property (hereinafter “the Design”) to you for use in the development of designs to operate on,
or interface with Xilinx FPGAs. Except as stated herein, none of the Design may be copied, reproduced, distributed, republished, downloaded,
displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or
otherwise, without the prior written consent of Xilinx. Any unauthorized use of the Design may violate copyright laws, trademark laws, the laws of
privacy and publicity, and communications regulations and statutes.
Xilinx does not assume any liability arising out of the application or use of the Design; nor does Xilinx convey any license under its patents,
copyrights, or any rights of others. You are responsible for obtaining any rights you may require for your use or implementation of the Design.
Xilinx reserves the right to make changes, at any time, to the Design as deemed desirable in the sole discretion of Xilinx. Xilinx assumes no
obligation to correct any errors contained herein or to advise you of any correction if such be made. Xilinx will not assume any liability for the
accuracy or correctness of any engineering or technical support or assistance provided to you in connection with the Design.
THE DESIGN IS PROVIDED “AS IS" WITH ALL FAULTS, AND THE ENTIRE RISK AS TO ITS FUNCTION AND IMPLEMENTATION IS WITH
YOU. YOU ACKNOWLEDGE AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN INFORMATION OR ADVICE,
WHETHER GIVEN BY XILINX, OR ITS AGENTS OR EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS,
IMPLIED, OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE, TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS.
IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES,
INCLUDING ANY LOST DATA AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE DESIGN, EVEN IF YOU HAVE
BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH
YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE, WILL IN NO EVENT EXCEED THE AMOUNT OF
FEES PAID BY YOU TO XILINX HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF ANY, REFLECT
THE ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU
WITHOUT THESE LIMITATIONS OF LIABILITY.
The Design is not designed or intended for use in the development of on-line control equipment in hazardous environments requiring fail-safe
controls, such as in the operation of nuclear facilities, aircraft navigation or communications systems, air traffic control, life support, or weapons
systems (“High-Risk Applications”). Xilinx specifically disclaims any express or implied warranties of fitness for such High-Risk Applications. You
represent that use of the Design in such High-Risk Applications is fully at your risk.
© 2012 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All
other trademarks are the property of their respective owners.