ADC Board VHDL Firmware development for Mona Lisa

Roy Wastie
Overview
• Introduction
• ADC Board
• Hardware Blocks
• Basic FPGA Architectures
• Xilinx ISE 10.1 Tool Flow
• USB
• Algorithm
• VHDL
Introduction
• Applications of FPGAs include digital
signal processing, software-defined radio,
aerospace and defense systems, ASIC
prototyping, medical imaging, computer
vision, speech recognition, cryptography,
bioinformatics, computer hardware
emulation & glue logic for PCBs.
ADC Board
Hardware Blocks
[Block diagram: External Clock & Trigger, 16-channel ADC, FIFO, FPGA DAQ, FPGA memory controller, SDRAM memory, USB interface]
Basic FPGA Architectures
Overview
• All Xilinx FPGAs contain the same basic
resources
– Logic Resources
• Slices (grouped into CLBs)
– Contain combinatorial logic and register resources
• Memory
• Multipliers
– Interconnect Resources
• Programmable interconnect
• IOBs
– Interface between the FPGA and the outside world
– Other resources
• Global clock buffers
• Boundary scan logic
Basic Building Block
Configurable Logic block
• Slices contain logic
resources and are
arranged in two columns
• A switch matrix
provides access to
general routing
resources
• Local routing provides
connection between
slices
in the same CLB, and it
provides routing to
neighboring CLBs
[Figure: Virtex-II CLB contains four slices (S0 to S3), with a switch matrix, local routing, two BUFTs, SHIFT, and CIN/COUT carry connections]
Basic Building Blocks
Simplified Slice Structure
• Each slice has four
outputs
– Two registered outputs,
two non-registered outputs
– Two BUFTs associated
with each CLB, accessible
by all 16 CLB outputs
• Carry logic runs
vertically,
up only
– Two independent
carry chains per CLB
[Figure: Slice 0, with two LUT and carry stages, each feeding a register with D, Q, CE, PRE, and CLR pins]
The Slice
Detailed Structure
• The next few slides
discuss the slice
features
– LUTs
– MUXF5, MUXF6,
MUXF7, MUXF8
(only the F5 and
F6 MUX are shown
in this diagram)
– Carry Logic
– MULT_ANDs
– Sequential Elements
Combinatorial logic
Boolean logic is stored in Look-Up Tables (LUTs)
• Also called Function Generators (FGs)
• Capacity is limited by the number of
inputs, not by the complexity
• Delay through the LUT is constant
A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
[Figure: combinatorial logic block with inputs A, B, C, D and output Z]
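A 4-input LUT can be thought of as a 16-entry table indexed by its inputs, which is why any function of four inputs costs the same area and delay. The sketch below models that idea. The entity name lut4_sketch and the INIT generic are hypothetical, and the default INIT value is an arbitrary example (a 4-input AND), not the function from the table above.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical model of a 4-input function generator (LUT).
entity lut4_sketch is
  generic (
    -- INIT(i) is the output for input pattern i, with i = A&B&C&D as an
    -- unsigned number. x"8000" sets only entry 15, i.e. Z = A and B and C and D.
    INIT : std_logic_vector(15 downto 0) := x"8000"
  );
  port (
    A, B, C, D : in  std_logic;
    Z          : out std_logic
  );
end entity lut4_sketch;

architecture rtl of lut4_sketch is
  signal addr : std_logic_vector(3 downto 0);
begin
  -- The four inputs form the table address; A is the most significant bit.
  addr <= A & B & C & D;

  -- The output is a pure table look-up, whatever the Boolean function is.
  Z <= INIT(to_integer(unsigned(addr)));
end architecture rtl;
```

Changing the function only changes INIT; the structure, and therefore the delay, stays the same.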
Storage Elements
Can be implemented as either flip-flops or latches
• Two in each slice; eight in
each CLB
• Inputs come from LUTs or
from an independent CLB
input
• Separate set and reset
controls
– Can be synchronous or
asynchronous
• All controls are shared within
a slice
– Control signals can be inverted
locally within a slice
[Figure: example storage primitives FDRSE_1 (D, S, R, CE, Q), FDCPE (D, PRE, CLR, CE, Q), and LDCPE (D, PRE, CLR, CE, G, Q)]
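As an illustration of how a slice register is normally described in HDL, the following sketch shows a D flip-flop with clock enable and asynchronous clear and preset, similar in behaviour to the FDCPE primitive. The entity name ff_sketch is hypothetical, and giving clear priority over preset is a choice made here for the example.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical D flip-flop with clock enable, asynchronous clear and preset.
entity ff_sketch is
  port (
    C   : in  std_logic;  -- clock
    CE  : in  std_logic;  -- clock enable
    CLR : in  std_logic;  -- asynchronous clear (forces Q to '0')
    PRE : in  std_logic;  -- asynchronous preset (forces Q to '1')
    D   : in  std_logic;
    Q   : out std_logic
  );
end entity ff_sketch;

architecture rtl of ff_sketch is
begin
  process (C, CLR, PRE)
  begin
    if CLR = '1' then
      Q <= '0';              -- clear is checked first, so it wins over preset
    elsif PRE = '1' then
      Q <= '1';
    elsif rising_edge(C) then
      if CE = '1' then
        Q <= D;              -- capture D only when the clock is enabled
      end if;
    end if;
  end process;
end architecture rtl;
```

Moving the CLR and PRE tests inside the rising_edge branch would describe synchronous controls instead.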
Dedicated Logic
FPGAs contain built-in logic for speeding up logic operations and saving resources
• Multiplexer Logic
– Connect Slices and LUTs
• Carry Chains
– Speed up arithmetic operations
• Multiplier AND gate
– Speed up LUT-based multiplication
• Shift Register LUT
– LUT-based shift register
• Embedded Multiplier
– 18x18 Multiplier
Multiplexer Logic
Dedicated MUXes provided to connect slices and LUTs
[Figure: CLB with slices S0 to S3, each containing an F5 MUX plus one of the F6, F7, or F8 MUXes]
MUXF8 combines the two
MUXF7 outputs (from the CLB
above or below)
MUXF6 combines slices S2
and S3
MUXF7 combines the two
MUXF6 outputs
MUXF6 combines slices S0 and S1
MUXF5 combines LUTs in each slice
Carry Chains
Dedicated carry chains speed up arithmetic operations
• Simple, fast, and
complete arithmetic
Logic
– Dedicated XOR gate
for single-level sum
completion
– Uses dedicated
routing resources
– All synthesis tools
can infer carry logic
[Figure: the first and second carry chains in a CLB, running from CIN to COUT through slices S0 to S3 and continuing into the next CLB]
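Because synthesis tools infer carry logic from arithmetic operators, an adder is usually written with a plain "+" rather than by instantiating carry primitives. The sketch below is one way to write an adder with carry in and carry out; the entity name adder_sketch and the WIDTH generic are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical adder; the "+" operator is what lets the tool use the carry chain.
entity adder_sketch is
  generic (WIDTH : positive := 16);
  port (
    a, b : in  unsigned(WIDTH - 1 downto 0);
    cin  : in  std_logic;
    sum  : out unsigned(WIDTH - 1 downto 0);
    cout : out std_logic
  );
end entity adder_sketch;

architecture rtl of adder_sketch is
  signal full  : unsigned(WIDTH downto 0);  -- one extra bit holds the carry out
  signal cin_v : unsigned(0 downto 0);      -- carry in as a 1-bit number
begin
  cin_v(0) <= cin;

  -- Extend both operands by one bit so the carry out is not lost.
  full <= resize(a, WIDTH + 1) + resize(b, WIDTH + 1) + cin_v;

  sum  <= full(WIDTH - 1 downto 0);
  cout <= full(WIDTH);
end architecture rtl;
```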
Multiplier AND Gate
Speed up LUT-based multiplication
• Highly efficient multiply and add implementation
– Earlier FPGA architectures require two LUTs per bit to perform
the multiplication and addition
– The MULT_AND gate enables an area reduction by performing
the
multiply and the add in one LUT per bit
[Figure: LUT, MULT_AND, CY_MUX (S, DI, CI, CO), and CY_XOR forming one bit of A x B]
Shift Register LUT (SRL16CE)
The shift register LUT avoids the need for dedicated registers
• Dynamically addressable serial shift registers
– Maximum delay of 16 clock cycles per LUT (128 per CLB)
– Cascadable to other LUTs or CLBs for longer shift registers
• Dedicated connection from Q15 to D input of the next SRL16CE
– Shift register length can be changed asynchronously by toggling address A
[Figure: SRL16CE built from a LUT, with D, CE, CLK, and A[3:0] inputs, a chain of D/Q stages with clock enable, output Q, and Q15 (cascade out)]
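A behavioural description along the following lines is the kind of code from which a shift register LUT can be inferred. It is a sketch that mirrors the SRL16CE pins; the entity name srl16_sketch is hypothetical. The register has no reset, since the shift register LUT does not provide one.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical 16-stage addressable shift register with clock enable.
entity srl16_sketch is
  port (
    CLK : in  std_logic;
    CE  : in  std_logic;
    D   : in  std_logic;
    A   : in  std_logic_vector(3 downto 0);  -- tap select: delay is A + 1 clocks
    Q   : out std_logic;                     -- selected tap
    Q15 : out std_logic                      -- last stage, for cascading
  );
end entity srl16_sketch;

architecture rtl of srl16_sketch is
  signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if CE = '1' then
        sr <= sr(14 downto 0) & D;  -- shift one place, new bit enters at sr(0)
      end if;
    end if;
  end process;

  -- The tap is read combinatorially, so changing A takes effect without a clock.
  Q   <= sr(to_integer(unsigned(A)));
  Q15 <= sr(15);
end architecture rtl;
```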
Embedded Multiplier Blocks
Avoids using LUTs to implement multiplications and increases performance
• 18-bit twos complement signed operation
• Optimized to implement Multiply and Accumulate
functions
• Multipliers are physically located next to block
SelectRAM™
memory
[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and a 36-bit output; supported signed sizes shown are 4 x 4, 8 x 8, 12 x 12, and 18 x 18]
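A signed multiply written with the numeric_std "*" operator is the usual way to let the synthesis tool map onto an embedded multiplier. The following is a minimal sketch under that assumption; the entity name mult18_sketch is hypothetical, and the output register is added here only as an example of pipelining.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical registered 18 x 18 two's complement multiplier.
entity mult18_sketch is
  port (
    clk : in  std_logic;
    a   : in  signed(17 downto 0);
    b   : in  signed(17 downto 0);
    p   : out signed(35 downto 0)
  );
end entity mult18_sketch;

architecture rtl of mult18_sketch is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- signed * signed returns a result as wide as both operands together:
      -- 18 + 18 = 36 bits, so no truncation is needed.
      p <= a * b;
    end if;
  end process;
end architecture rtl;
```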
IOB Element
Connects the FPGA design to external components
• Input path
– Two DDR registers
• Output path
– Two DDR registers
– Two 3-state enable DDR registers
• Separate clocks and clock enables for I and O
• Set and reset signals are shared
[Figure: IOB with input registers (ICK1, ICK2), output registers and DDR MUX (OCK1, OCK2), 3-state registers and DDR MUX (OCK1, OCK2), and the PAD]
Distributed RAM
Uses a LUT in a slice as memory
• Synchronous write
• Asynchronous read
– Accompanying flip-flops
can be used to create
synchronous read
• RAM and ROM are initialized
during
configuration
– Data can be written to RAM
after configuration
• Emulated dual-port RAM
– One read/write port
– One read-only port
• 1 LUT = 16 RAM bits
[Figure: slice LUTs configured as RAM16X1S (D, WE, WCLK, A0 to A3, O), RAM32X1S (D, WE, WCLK, A0 to A4, O), and RAM16X1D (D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO)]
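The synchronous-write, asynchronous-read behaviour can be described directly in HDL. The sketch below follows the RAM16X1D pin names, with one read/write port and one read-only port; the entity name dist_ram_sketch is hypothetical and the all-zero initial contents are an arbitrary choice.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical 16 x 1 dual-port distributed RAM model.
entity dist_ram_sketch is
  port (
    WCLK : in  std_logic;
    WE   : in  std_logic;
    D    : in  std_logic;
    A    : in  std_logic_vector(3 downto 0);  -- read/write port address
    DPRA : in  std_logic_vector(3 downto 0);  -- read-only port address
    SPO  : out std_logic;                     -- data at address A
    DPO  : out std_logic                      -- data at address DPRA
  );
end entity dist_ram_sketch;

architecture rtl of dist_ram_sketch is
  type ram_t is array (0 to 15) of std_logic;
  signal ram : ram_t := (others => '0');  -- contents loaded at configuration
begin
  -- Writes happen only on the clock edge.
  process (WCLK)
  begin
    if rising_edge(WCLK) then
      if WE = '1' then
        ram(to_integer(unsigned(A))) <= D;
      end if;
    end if;
  end process;

  -- Reads are combinatorial; register SPO or DPO for a synchronous read.
  SPO <= ram(to_integer(unsigned(A)));
  DPO <= ram(to_integer(unsigned(DPRA)));
end architecture rtl;
```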
Block RAM
Embedded blocks of RAM arranged in columns
• Up to 3.5 Mb of RAM in 18-kb
blocks
– Synchronous read and write
• True dual-port memory
– Each port has synchronous read
and write capability
– Different clocks for each port
• Supports initial values
• Synchronous reset on output
latches
• Supports parity bits
– One parity bit per eight data bits
• Situated next to embedded
multiplier
[Figure: 18-kb block SelectRAM memory with port A signals DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, DOPA and port B signals DIB, DIPB, ADDRB, WEB, ENB, SSRB, CLKB, DOB, DOPB]
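The registered read is what distinguishes block RAM from distributed RAM in HDL. The following sketch describes a simpler case than the true dual-port memory above: one write port and one read port, each on its own clock. The entity name bram_sketch, the generics, and the 1024 x 16 default size are assumptions chosen for illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical simple dual-port RAM: port A writes, port B reads.
entity bram_sketch is
  generic (
    ADDR_BITS : positive := 10;
    DATA_BITS : positive := 16
  );
  port (
    clka  : in  std_logic;
    wea   : in  std_logic;
    addra : in  std_logic_vector(ADDR_BITS - 1 downto 0);
    dia   : in  std_logic_vector(DATA_BITS - 1 downto 0);
    clkb  : in  std_logic;
    enb   : in  std_logic;
    addrb : in  std_logic_vector(ADDR_BITS - 1 downto 0);
    dob   : out std_logic_vector(DATA_BITS - 1 downto 0)
  );
end entity bram_sketch;

architecture rtl of bram_sketch is
  type ram_t is array (0 to 2**ADDR_BITS - 1)
    of std_logic_vector(DATA_BITS - 1 downto 0);
  signal ram : ram_t := (others => (others => '0'));  -- initial values
begin
  -- Port A: synchronous write.
  process (clka)
  begin
    if rising_edge(clka) then
      if wea = '1' then
        ram(to_integer(unsigned(addra))) <= dia;
      end if;
    end if;
  end process;

  -- Port B: synchronous read; dob holds its value while enb is low.
  process (clkb)
  begin
    if rising_edge(clkb) then
      if enb = '1' then
        dob <= ram(to_integer(unsigned(addrb)));
      end if;
    end if;
  end process;
end architecture rtl;
```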
Global Routing
• Sixteen dedicated global clock multiplexers
– Eight on the top-center of the die, eight on the bottom-center
– Driven by a clock input pad, a DCM, or local routing
• Global clock multiplexers provide the following:
– Traditional clock buffer (BUFG) function
– Global clock enable capability (BUFGCE)
– Glitch-free switching between clock signals (BUFGMUX)
• Up to eight clock nets can be used in each clock region
of the device
– Each device contains four or more clock regions
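Global clock multiplexers are normally reached by instantiating a primitive from the Xilinx UNISIM library rather than by inference. The sketch below instantiates BUFGMUX to switch between two clocks without glitches; the wrapper entity clk_select_sketch and its port names are hypothetical, and the code needs the Xilinx UNISIM library to compile.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

-- Hypothetical wrapper around the BUFGMUX global clock multiplexer.
entity clk_select_sketch is
  port (
    clk_a   : in  std_logic;
    clk_b   : in  std_logic;
    sel     : in  std_logic;  -- '0' selects clk_a, '1' selects clk_b
    clk_out : out std_logic   -- drives the global clock network
  );
end entity clk_select_sketch;

architecture rtl of clk_select_sketch is
begin
  u_bufgmux : BUFGMUX
    port map (
      I0 => clk_a,
      I1 => clk_b,
      S  => sel,
      O  => clk_out
    );
end architecture rtl;
```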
Digital Clock Manager (DCM)
• Up to twelve DCMs per device
– Located on the top and bottom edges of the die
– Driven by clock input pads
• DCMs provide the following:
– Delay-Locked Loop (DLL)
– Digital Frequency Synthesizer (DFS)
– Digital Phase Shifter (DPS)
• Up to four outputs of each DCM can drive onto global
clock buffers
– All DCM outputs can drive general routing
The Spartan-3 Family
Built for high volume, low-cost applications
• 18x18 bit Embedded Pipelined Multipliers for efficient DSP
• Configurable 18K Block RAMs + Distributed RAM
• Up to eight on-chip Digital Clock Managers to support multiple system clocks
• 4 I/O Banks (Bank 0 to Bank 3), with support for all I/O standards including PCI, DDR333, RSDS, mini-LVDS
Spartan-3 Family
Based upon Virtex-II Architecture – Optimized for Lower Cost
• Smaller process = lower core voltage
– .09 micron versus .15 micron
– Vccint = 1.2V versus 1.5V
• Logic resources
– Only one-half of the slices support RAM or SRL16s (SLICEM)
– Fewer block RAMs and multiplier blocks
• Clock Resources
– Fewer global clock multiplexers and DCM blocks
• I/O Resources
– Fewer pins per package
– No internal 3-state buffers
– Support for different standards
• New standards: 1.2V LVCMOS, 1.8V HSTL, and SSTL
• Default is LVCMOS, versus LVTTL
SLICEM and SLICEL
• Each Spartan™-3 CLB contains four slices
– Similar to the Virtex™-II
• Slices are grouped in pairs
– Left-hand SLICEM (Memory)
• LUTs can be configured as memory or SRL16
– Right-hand SLICEL (Logic)
• LUT can be used as logic only
[Figure: Spartan-3 CLB with left-hand SLICEM slices X0Y0 and X0Y1, right-hand SLICEL slices X1Y0 and X1Y1, a switch matrix, fast connects, SHIFTIN/SHIFTOUT, and CIN/COUT carry connections]
Xilinx Tool Flow
Xilinx Design Flow
[Flow diagram: Plan & Budget; Create Code/Schematic; HDL RTL Simulation; Synthesize to create netlist; Functional Simulation; Implement (Translate, Map, Place & Route); Attain Timing Closure; Timing Simulation; Generate BIT File; Configure FPGA]
Synthesis
Generate a netlist file
• After coding up your HDL code, you will
need a tool to generate a netlist (NGC or
EDIF)
– Xilinx Synthesis Tool (XST) included
– Support for Popular Third Party Synthesis tools:
Synplify, Leonardo Spectrum
Implementation
Process a netlist file
• Consists of three phases
– Translate: Merge multiple design
files into a single netlist
– Map: Group logical symbols from
the netlist (gates) into physical
components (slices and IOBs)
– Place & Route: Place components
onto the chip, connect the
components, and extract timing
data into reports
• Access Xilinx reports and tools
at each phase
– Timing Analyzer, Floorplanner,
FPGA Editor, XPower
[Figure: netlist generated from synthesis feeding the Implement step: Translate, Map, Place & Route]
Configuration
• Once a design is implemented, you must create a file
that the FPGA can understand
– This file is called a bitstream: a BIT file (.bit extension)
• The BIT file can be downloaded
– Directly into the FPGA
• Use a download cable such as Platform USB
– To external memory device such as a Xilinx Platform Flash
PROM
• Must first be converted into a PROM file
ISE Project Navigator
Xilinx ISE Foundation is built around the Xilinx Design Flow
• Enter Designs
• Access to synthesis
tools
– Including third-party
synthesis tools
• Implement your
design with a simple
double-click
– Fine-tune with
easy-to-access
software options
• Download
– Generate a bitstream
– Configure FPGA
Synthesizing Designs
Generate a netlist file using XST (Xilinx Synthesis Technology)
Synthesis Processes and Analysis
• Access report
• View Schematics (RTL or Technology)
• Check syntax
• Generate Post-Synthesis Simulation Model
1. Highlight HDL Sources
2. Double-click to Synthesize
The Design Summary Displays
Design Data
• Quick View of
Reports,
Constraints
• Project Status
• Device
Utilization
• Design
Summary
Options
• Performance
and Constraints
• Reports
Outline
• Overview
• ISE
• Summary
• Lab 1: Xilinx Tool Flow
USB
USB2
• Point-to-point.
• Host computer is master.
• 480 Mbit/s signalling rate; 53.24 MB/s theoretical bulk throughput.
• 30 MB/s readily achievable in Bulk transfer mode.
• Speeds: USB 1.0 Low and Full; USB 2.0 High.
• Hot-pluggable.
• Peripheral electronics can be relatively simple and inexpensive.
• Power: up to 500 mA from the bus.
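The 53.24 figure quoted above is consistent with the USB 2.0 limit on high-speed bulk transfers, expressed in bytes per second: at most 13 bulk data packets of 512 bytes fit in each 125 µs microframe, and there are 8000 microframes per second. As a worked check:

```latex
\[
13\ \tfrac{\text{packets}}{\text{microframe}}
\times 512\ \tfrac{\text{bytes}}{\text{packet}}
\times 8000\ \tfrac{\text{microframes}}{\text{s}}
= 53\,248\,000\ \tfrac{\text{bytes}}{\text{s}}
\approx 53.2\ \text{MB/s}
\]
```

For comparison, the raw 480 Mbit/s signalling rate corresponds to 60 MB/s before any protocol overhead.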
USB Data Travels in Packets
• Identified by “Packet ID” (PID)
• Token packet tells what’s coming
• Data packets deliver bytes
• Handshake packets report success or otherwise
USB Packets
[Figure: A Control Write Transfer]
Setup Stage: Token Packet (SYNC, SETUP, ADDR, ENDP, CRC5); Data Packet (SYNC, DATA0, Data, CRC16); H/S Pkt (SYNC, ACK)
Data Stage: Token Packet (SYNC, OUT, ADDR, ENDP, CRC5); Data Packet (SYNC, DATA1, Data, CRC16); H/S Pkt (SYNC, ACK)
Data Stage (cont'd): Token Packet (SYNC, OUT, ADDR, ENDP, CRC5); Data Packet (SYNC, DATA0, Data, CRC16); H/S Pkt (SYNC, ACK)
Data Stage (cont'd): Token Packet (SYNC, OUT, ADDR, ENDP, CRC5); Data Packet (SYNC, DATA1, Data, CRC16); H/S Pkt (SYNC, ACK)
Status Stage: Token Packet (SYNC, IN, ADDR, ENDP, CRC5); Data Packet (SYNC, DATA1, CRC16); H/S Pkt (SYNC, ACK)
USB2 Controller
• EZ-USB FX2LP(TM) USB Microcontroller
High-Speed USB Peripheral Controller
• Integrated 8051 Microprocessor.
• Code/Data Downloaded via USB, or
EEPROM.
• Many Integrated Peripherals.
Simple Algorithm
• Sample data at the full rate of 2.77 MS/s (16 channels)
• Down-convert the data by a factor of 4
• Write data to the USB interface at 21.19 MB/s
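The output rate can be cross-checked from the sample rate. The slides do not state the sample width or the byte unit, so the check below assumes 2 bytes per sample, a sample rate of about 2.778 MS/s (reading the 2.77 figure as truncated), and MB meaning 2^20 bytes. Under those assumptions the numbers agree:

```latex
\[
\frac{2.778\times10^{6}\ \tfrac{\text{samples}}{\text{s}}
      \times 16\ \text{channels}
      \times 2\ \tfrac{\text{bytes}}{\text{sample}}}{4}
\approx 22.22\times10^{6}\ \tfrac{\text{bytes}}{\text{s}}
\approx \frac{22.22\times10^{6}}{2^{20}}\ \text{MB/s}
\approx 21.19\ \text{MB/s}
\]
```

With exactly 2.77 MS/s the same calculation gives about 21.13 MB/s, so the assumed values are a plausible reading rather than a confirmed one.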
VHDL
VHDL Example
An example of a two-input XNOR gate is shown below.
library IEEE;
use IEEE.std_logic_1164.all;

entity XNOR2 is
  port (A, B: in std_logic;
        Z: out std_logic);
end XNOR2;

architecture behavioral_xnor of XNOR2 is
  -- internal signals: X is high when both inputs are 1, Y when both are 0
  signal X, Y: std_logic;
begin
  X <= A and B;
  Y <= (not A) and (not B);
  Z <= X or Y;  -- high when A and B are equal
end behavioral_xnor;
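A small self-checking testbench is a convenient way to confirm the gate in simulation. The sketch below applies all four input combinations and compares Z with the expected value. It assumes XNOR2 has been compiled into the work library with the IEEE std_logic_1164 package visible; the testbench name xnor2_tb and the 10 ns step are arbitrary choices.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical testbench: no ports, simulation only.
entity xnor2_tb is
end entity xnor2_tb;

architecture sim of xnor2_tb is
  signal A, B : std_logic := '0';
  signal Z    : std_logic;
begin
  -- Device under test: the XNOR2 entity from the example above.
  dut : entity work.XNOR2
    port map (A => A, B => B, Z => Z);

  stimulus : process
    type pattern_t is array (0 to 3) of std_logic_vector(1 downto 0);
    constant patterns : pattern_t := ("00", "01", "10", "11");
  begin
    for i in patterns'range loop
      A <= patterns(i)(1);
      B <= patterns(i)(0);
      wait for 10 ns;  -- let the combinatorial output settle
      assert Z = not (A xor B)
        report "XNOR2 output mismatch" severity error;
    end loop;
    wait;  -- stop the process once all patterns have been applied
  end process stimulus;
end architecture sim;
```

If the simulation finishes without an assertion message, the gate matches the XNOR truth table for every input pair.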