An Asynchronous Self-Routing Adaptive
Reconfigurable FPGA
M. Ferranti and A. Lodi
DEIS, 2, Via Risorgimento, 40136 Bologna, ITALY
{mferranti,anlodi}@deis.unibo.it, http://www.micro.deis.unibo.it
Analysis: Reconfigurable devices consist of logic-blocks interspersed with multi-context memory.
Architecture: We investigate an architecture with all memory information on one side of the array, leaving computation on
the other to achieve speed and compactness. The array consists of a column of CLBs interconnected by a Self-Routing
Network. The network (Omega) routes data packets on the basis of their content. A 32-CLB array is 1.8K x 15K sq. lambda.
With asynchronous signalling, logic-blocks are reconfigured as soon as their operands are ready. The delay across one stage
of the network is 250 ps (2 lambda = 0.25 um), with no need for a clock faster than 1 GHz to synchronize the stages across the system.
Low-power adaptive circuits were designed to maximize wire bandwidth. In this way, both logic-blocks and wiring
resources are re-used over time reaching optimal hardware exploitation and task independence.
Mapping Results: A few benchmark circuits mapped on our architecture show comparable speed even if we use a single-context device of minimal area for that circuit. In an SOC environment with fixed-pin constraints this advantage is
considerably larger.
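To illustrate what routing "on the basis of content" can look like, here is a minimal sketch of destination-tag routing through a generic Omega network. It is not the authors' circuit. The function name, the packet model (a destination address only) and the mapping of 32 ports onto log2(32) = 5 stages are assumptions made for this example.

```python
def omega_route(src: int, dst: int, n_ports: int = 32) -> list[int]:
    """Return the line a packet occupies after each stage of an Omega network.

    Generic destination-tag routing (illustrative assumption): before every
    stage the lines are perfect-shuffled, then the 2x2 switch picks its upper
    or lower output from one bit of the destination, most significant first.
    """
    stages = n_ports.bit_length() - 1            # log2(n_ports)
    assert 1 << stages == n_ports, "n_ports must be a power of two"
    assert 0 <= src < n_ports and 0 <= dst < n_ports
    pos, path = src, []
    for s in range(stages):
        # Perfect shuffle: rotate the line address left by one bit.
        pos = ((pos << 1) | (pos >> (stages - 1))) & (n_ports - 1)
        # The packet steers itself: the low bit becomes the next destination bit.
        bit = (dst >> (stages - 1 - s)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


if __name__ == "__main__":
    print(omega_route(0, 31))   # line occupied after each of the five stages
```

No global controller is involved: each switch only inspects one bit carried by the packet, which is the property that makes such a network self-routing.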
A Comparison of FPGA Implementations of Two’s
Complement Bit-Level and Word-Level Matrix Multipliers
Radhika S. Grover, Weijia Shang, and Qiang Li
Computer Engineering Department, Santa Clara University, Santa Clara, CA.
rgrover@scudc.scu.edu, {wshang,qli}@sunrise.scu.edu
We present a novel bit-level architecture that differs from other architectures in that individual bits of a word do not have to
be processed as a unit. For example, in matrix multiplication, the carry chain of calculating the product of two numbers is
broken by sending the partial sums and carries of the product to the accumulating operation instead of the whole finished
word. In contrast, in a word-level matrix multiplier a product of two words has to be computed completely before it can be
added to the next word-level product. Bit-level architectures for fixed-point matrix multiplication are proven to be O(log p)
times faster than the fastest word-level architecture, where p is the word length. In this paper, we compare our
implementation of a bit-level two’s complement matrix multiplier with a word-level matrix multiplier composed of library
macros. Our results show speedup by a factor of two or more.
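The idea of passing partial sums and carries to the accumulator, rather than a finished product word, can be sketched with carry-save arithmetic. The code below is an illustration under simplifying assumptions, not the authors' architecture: it handles unsigned operands only, and the function names and the `width` parameter are hypothetical.

```python
def carry_save_add(s: int, c: int, x: int) -> tuple[int, int]:
    """3:2 compression: reduce three addends to a sum word and a carry word.

    Every bit position is computed independently, so no carry ripples
    along the word; this is what breaks the carry chain.
    """
    new_s = s ^ c ^ x
    new_c = ((s & c) | (s & x) | (c & x)) << 1
    return new_s, new_c


def dot_product_carry_save(a: list[int], b: list[int], width: int = 8) -> int:
    """Accumulate sum(a[i] * b[i]) without ever finishing a single product.

    Assumes unsigned operands with b[i] < 2**width.
    """
    s, c = 0, 0
    for ai, bi in zip(a, b):
        for k in range(width):
            if (bi >> k) & 1:
                # Each shifted partial product goes straight into the accumulator.
                s, c = carry_save_add(s, c, ai << k)
    return s + c   # one carry-propagate addition at the very end


if __name__ == "__main__":
    print(dot_product_carry_save([3, 5, 7], [2, 4, 6]))
```

A word-level design would instead resolve the carries of every product before adding it to the running total.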
Corner Turning Interconnect for an FPGA:
Motivation and Routing
Nicholas Weaver and John Wawrzynek
http://www.cs.berkeley.edu/~nweaver/sfra/
We believe it is important for computation-oriented FPGAs to include pipelined interconnect structures while maintaining
the well understood Manhattan placement properties. In order to develop such an array, we have devised an alternate switch-box topology: a “corner turn” network. Unlike a conventional switch-box based network, our topology allows for efficiently
pipelinable switches, encoded switch points, and fast, polynomial time routing. We have developed the interconnect
topology and a global routing tool, and have compared its results with those produced by VPR for a comparable conventional
topology. Our router generates nearly optimally short routes, in polynomial time, at the cost of a greater number of wire
channels.
Decoder-Driven Switching Matrices in Multicontext FPGAs:
Area, Routability and Speed
V. Baena-Lecuyer, M. A. Aguirre, A. Torralba and L. G. Franquelo
baena@gte.esi.us.es
Modern FPGAs use SRAM cells to store the programming bits that drive the switching matrices. The area of these SRAM
cells and programmable interconnect can be as large as 90% of the total area. This figure dramatically increases in the case
of multicontext FPGAs, where the programming configuration has to be repeated as many times as contexts. The use of
decoders or multiplexers at the logic block inputs is widely accepted as an attractive way for saving area without reducing
routability. This paper introduces a first order area model in terms of standard FPGA parameters which takes into account
the use of decoders to connect logic block output pins to the routing channels. Experimental results are presented, showing
that the use of decoders at the logic block outputs in some multicontext FPGAs saves more than 15% of area while
preserving routability.
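The reason decoders help more as the number of contexts grows can be seen by counting configuration bits. The sketch below is only a rough count under stated assumptions and is not the paper's area model: it assumes one SRAM bit per track in the undecoded case, a binary code with one extra "not connected" value in the decoded case, and it ignores the area of the decoder itself. The function and parameter names are hypothetical.

```python
def config_bits_per_output(tracks: int, contexts: int, use_decoder: bool) -> int:
    """Configuration bits needed to connect one logic block output pin.

    Without a decoder: one bit per routing track (one-hot or multi-hot).
    With a decoder: a binary code selecting one of `tracks` tracks or
    "not connected", i.e. values 0..tracks.
    Either way the bits are replicated once per context.
    """
    per_context = tracks.bit_length() if use_decoder else tracks
    return per_context * contexts


if __name__ == "__main__":
    for contexts in (1, 4, 8):
        plain = config_bits_per_output(16, contexts, use_decoder=False)
        decoded = config_bits_per_output(16, contexts, use_decoder=True)
        print(contexts, plain, decoded)
```

The price of the decoded form is that an output can drive only one track at a time, which is why the effect on routability has to be checked experimentally.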
Design and Implementation of a Variable Length Packet
Switch Board
Chan Kim, Ji-Myung Rho, Tae-Whan Yoo and Jong-Hyun Lee
ETRI, 161 Kajeong-dong, Yuseong-goo, Taejon, 305-350 Korea
ckim@etri.re.kr, http://home.hanmir.com/~ckim6803
A switch board for ATM or variable length packet switching was implemented using a shared memory switch fabric ASIC
and Xilinx FPGAs. The switch ASIC provides 20 Gbps throughput for 32 ports at 622 Mbps each using a parallel
configuration, and provides multicast, QoS scheduling, sophisticated back-pressure, and channel grouping functions. The
FPGAs constitute the first and second stages of the switching and convert the packets from the back plane into parallel units
and the switched units from the switch ASIC back into packets, with buffering, multiplexing and de-multiplexing, format
conversion and synchronization mechanisms. The board runs at a 100 MHz clock, and line drivers were eliminated by directly
using the FPGAs’ GTLP buffers. The board also has a protection switching mechanism. The architecture was devised to
reduce the latency and buffer space requirement, and with this architecture the number of components was drastically
reduced together with increased performance compared to the former design.
Evaluation of Novel FPGA Features for Automotive
Multimedia Applications
Karl G. Esser, Carsten Oetker, Karlheinz Weiss and Wolfgang Rosenstiel
Forschungszentrum Informatik (FZI) Karlsruhe, Germany
In the past, embedded systems in the automotive area were mainly used for control applications, where extremely high
safety requirements prohibit the use of FPGAs. We show that FPGAs have a high potential for use in future car multimedia
systems, which combine the challenges from the automotive and consumer markets. Especially in the fields of availability
and time to market, we exploit the advantages of FPGA solutions. Furthermore, we show a trend toward System On Chip
(SoC) based on FPGA architectures. New FPGA features, like on-chip processor cores, offer interesting new possibilities in
this direction, but besides the increasing amount of very fast on-chip static RAM, we still notice a lack of sufficient dynamic
on-chip memory. These perceptions were acquired during the development of an FPGA implementation of a graphic
controller. Our own emulation environment, called SPYDER-VIRTEX-X2E was used for development.
Fast Reconfigurable Multiplier for FPGA
Sébastien Favard and Mohamed Shawky
firstname.lastname@utc.fr
In hardwired numerical computations, the multiplier is the most important basic operator in terms of delay and die area. Signal
and image processing use convolution as the core of common operations such as filtering, edge detection, FFT, etc. Most
current implementations of multipliers are very complex and do not take advantage of the main characteristic of multiply
operations in signal processing: multiplying a variable by a constant. In this case, we may use decomposition into adders,
with a maximum of N-1 adders for an N-bit multiplier. The result can be reduced using Booth encoding to save N/2 adders.
In this paper, we propose a multiplier-decomposition method to reduce the global complexity and hence both delay and die
area. Compared to Booth coding decomposition, the resulting mean number of adders decreases by 22% for a 16-bit multiplier
and by 24% for a 20-bit multiplier.
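To make the adder-count argument concrete, the sketch below decomposes a constant multiplication into shifts and additions, first from the plain binary form and then from a signed-digit recoding. Canonical signed-digit (CSD) recoding is used here as a stand-in for the Booth-style baseline. It is not the decomposition method proposed by the authors, and all function names are hypothetical.

```python
def binary_digits(c: int) -> list[int]:
    """Plain binary digits of a non-negative constant, least significant first."""
    return [(c >> k) & 1 for k in range(c.bit_length())]


def csd_digits(c: int) -> list[int]:
    """Canonical signed-digit form, least significant first.

    Digits are -1, 0 or +1 and no two adjacent digits are non-zero.
    """
    digits = []
    while c:
        if c & 1:
            d = 2 - (c & 3)     # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def adders_needed(digits: list[int]) -> int:
    """One adder or subtractor per non-zero digit beyond the first."""
    return max(sum(1 for d in digits if d) - 1, 0)


def multiply_by_constant(x: int, digits: list[int]) -> int:
    """Evaluate x * c using only shifts, additions and subtractions."""
    acc = 0
    for k, d in enumerate(digits):
        if d == 1:
            acc += x << k
        elif d == -1:
            acc -= x << k
    return acc


if __name__ == "__main__":
    c = 0xFF
    print(adders_needed(binary_digits(c)), adders_needed(csd_digits(c)))
```

For the constant 255 the binary form has eight non-zero digits, while the signed-digit form is 256 - 1 and needs a single subtractor.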
Field Programmable Analog Array Modeling Approach
For Fast Prototyping
S. Colancon, G. Cambon, L. Torres and C. Dufaza
Universite Montpellier II, LIRMM, UMR 5506 CNRS /UM II, 161 Rue ADA, 34392 Montpellier cedex 5, France
{colancon, cambon, torres, dufaza}@lirmm.fr
With the recent introduction of FPAA components, the concept of analog prototyping is now affordable for analog design.
We propose the use of FPAAs in conjunction with CAD tools to implement a new analog prototyping environment. This environment
must estimate and validate analog system functionality at different levels using behavioral model simulation
and/or hardware simulation results. In an analog prototyping flow, simulation and accurate models play an essential role. It
is therefore necessary to have languages that are easily readable, fast and accurate to give a first idea of the analog
functional behavior. The emergence of new analog description languages such as Verilog-A provides what we need
to describe a complete library of analog macro-functions. This library offers three levels of modeling that give us some
granularity in analog simulation accuracy. Comparative results between simulation models and a hardware prototyping
board validate the accuracy of our models and allow a complete analog system to be shared between these two
levels in a new analog prototyping methodology.
An FPGA Architecture with Configurable Multiplier and
Carry Units for Improved Arithmetic Performance
Kamal Rajagopalan and Peter Sutton
School of Computer Science and Electrical Engineering, The University of Queensland,
Brisbane QLD 4072 Australia
raja@csee.uq.edu.au, p.sutton@csee.uq.edu.au
FPGAs are increasingly being applied to DSP applications but are often inefficient compared with dedicated DSP chips,
particularly for multiplication-based operations. In this work, an FPGA architecture with Configurable Multiplier and Carry
Units is proposed to improve arithmetic performance. Each logic block of the proposed architecture consists of flexible
multiplier and carry units which can be configured to efficiently support multiplication, addition and multiply-accumulate
operations in serial or parallel form. This is supported by dedicated inter-block interconnect. A programmable multiplexer
allows for the efficient implementation of barrel shifters, thus enhancing the efficiency for floating point arithmetic
operations as well as for fixed-point arithmetic operations. The proposed architecture is suitable for complex DSP
applications, and when compared with the Xilinx XC4000, results in a 75% reduction in logic utilization.
FPGA Hardware Synthesis from MATLAB Utilizing
Optimized IP Cores
Malay Haldar, Anshuman Nayak, Alok Choudhary and Prith Banerjee
http://www.MachDesignSystems.com
We present a MATLAB compiler to compile algorithms described in MATLAB to an architecture comprising a
processor core and a reconfigurable logic component. The processor core represents an embedded or DSP processor. The
reconfigurable logic part represents an FPGA. The compilation process is particularly tuned towards synthesis of
signal/image processing and communication applications as they benefit the most from our target architecture. Design reuse
is particularly important for such applications as some common functions appear very frequently in such applications. Our
compiler can automatically leverage highly optimized intellectual property (IP) cores for the frequently occurring functions.
The IP cores are integrated into the synthesized designs in an automated fashion. Along with the hardware descriptions of
the IP cores, executable code is present in the IP database. The executable code actively works with the compiler to
generate arbitrarily complex interfaces and supply accurate information regarding the area and delay for the various
parameterizable IP cores.
Implementation of a VME Bus to Internal Bus Bridge
FPGA Core
Xavier Revés, Antoni Gelonch, J. L. Garcia and Ferran Casadevall
Universitat Politècnica de Catalunya (UPC), Departament de Teoria del Senyal i Comunicacions,
Jordi Girona 1-3 08034, Barcelona, Spain
{xreves,antoni}@xaloc.upc.es garciam@teleline.es,feranc@tsc.upc.es
We present an FPGA core implementing the required bridging between the standard asynchronous VME bus and a simple
synchronous Internal Bus. This VME-Internal Bus bridge core reduces the complexity of interfacing a complete VME
backplane, supporting all the functionality defined by the standard (data, interrupt and bus management). Its use of relatively few
resources (about 1000 Logic Elements), its high level of performance, its ability to manage up to 80 Mbytes/sec on the VME
side and 200 Mbytes/sec on the Internal Bus side, and its flexibility make this core especially interesting in VME systems
with high transmission rates. Because the VME bus is a mature technology that is constantly updated to follow the state of the art
in digital systems, it has a wide acceptance and functionality in many industry-related environments. The use of an FPGA to
implement this bridge allows the user to customize and release it depending on the application, thus obtaining a more
compact solution.
The Machine CEPRA-S Configured for Stream Processing
Rolf Hoffmann, Bernd Ulmann, Klaus-Peter Völkmann and Stefan Waldschmidt
http://www.informatik.tu-darmstadt.de/MP
Stream processing is a very efficient execution model for processing large amounts of data in similar ways without the
restrictions of traditional vector architectures. The concept of vector processing has been extended to obtain a more flexible
and efficient execution model. In a stream processor, data and instruction streams are associated, allowing individual
processing of data stream elements under control of an instruction stream. A stream processor architecture has been
designed which can be configured on the CEPRA-S, a configurable coprocessor consisting of two FPGAs and 10 memory
banks, thus allowing efficient implementation of different logical architectures requiring a large memory bandwidth. The
stream processing machine is programmed by a special purpose language that is designed with the special requirements of
architectures like vector computers or stream processors in mind. Operators in the arithmetic/logic unit of the stream
processor can be configured in order to meet the requirements of special applications.
Motivation from a Full-Rate Specific Design to a DSP Core
Approach for All GSM Vocoders
Shervin Sheidaei
Iran University of Science and Technology, Computer Engineering Department
sh_sh2@hotmail.com
Hamid Noori
AmirKabir University of Technology, Computer Engineering Department
hamid_noori@yahoo.com
Ahmad Akbari
Iran University of Science and Technology, Computer Engineering Department
akbari@iust.ac.ir
Hosein Pedram
AmirKabir University of Technology, Computer Engineering Department
pedram@ce.aku.ac.ir
GSM is a mobile telephony standard for cellular phones. GSM encodes a speech frame in less than 20 ms and carries it
over a mixture of full-rate, half-rate and enhanced full-rate channels. A specific architecture was designed for the full-rate
vocoder; its implementation on an Altera FLEX10K FPGA uses 7100 logic cells, has a 59 ns clock period and takes
7.22 ms to encode a frame. To extend this architecture to the half-rate and enhanced full-rate vocoders, the best
solution was an application-oriented DSP core with lower power consumption than a general-purpose DSP that still
preserves its flexibility. The core contains 11 functional units that operate in parallel and has a two-stage pipeline.
Implementation on an Altera FLEX10KE FPGA results in 6838 LCs, 49152 memory bits (excluding ROMs) and a 27.5 ns clock
period, and takes 8.1 ms to encode a frame in the half-rate system. There are 129500 multiplication and 106500 addition
operations in this frame.
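The timing figures quoted in this abstract can be turned into clock cycles per frame and a share of the 20 ms frame budget with simple arithmetic. The sketch below only rearranges the numbers given above. The function names are hypothetical, and the assumption that one frame is encoded back to back at the stated clock period is ours.

```python
def cycles_per_frame(encode_time_ms: float, clock_period_ns: float) -> float:
    """Clock cycles spent on one frame: encode time divided by clock period."""
    return encode_time_ms * 1e6 / clock_period_ns


def budget_fraction(encode_time_ms: float, frame_budget_ms: float = 20.0) -> float:
    """Share of the frame budget consumed by encoding."""
    return encode_time_ms / frame_budget_ms


if __name__ == "__main__":
    designs = {
        "full-rate, dedicated design": (7.22, 59.0),
        "half-rate, DSP core": (8.1, 27.5),
    }
    for name, (t_ms, period_ns) in designs.items():
        print(f"{name}: {cycles_per_frame(t_ms, period_ns):,.0f} cycles, "
              f"{budget_fraction(t_ms):.1%} of the frame budget")
```

Both designs stay well inside the 20 ms limit; the DSP core spends more cycles per frame but runs them at a shorter clock period.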
Netlist Partitioning for Accelerated Verification Systems
Joachim Pistorius
Altera Corporation, 101 Innovation Drive - MS 2301, San Jose, CA 95134
joachim_pistorius@altera.com
Michel Minoux
Laboratoire d'Informatique de Paris 6, Université Pierre et Marie Curie, 4 place Jussieu, F-75252 Paris Cedex 05, France
michel.minoux@lip6.fr
Accelerated verification requires the implementation of functional design descriptions on hierarchically built hardware
systems. Netlist partitioning is used with the aim of minimizing the hardware resource requirements and controlling the
computation time while meeting size and pin constraints for each partition at each hierarchical level. Our partitioning
strategy is based on an extensive study of basic algorithms. The best-suited algorithms were improved and, if necessary,
adapted for partitioning with size and pin constraints. The various algorithms were then combined in order to form
composite algorithms dedicated to each hierarchical level. These composite algorithms and their various possible
combinations have been evaluated using very large industrial netlists and randomly generated benchmark netlists. The
experimental results show significant improvements over the global partitioning results of a state-of-the-art industrial logic
emulator in terms of both the number of boards required for the design implementation and the computation time.
Proving Safety Properties of FPGAs
Adrian Hilton
Teleca, 88/89 High Street, Winchester, Hampshire, England
adi@suslik.org, http://www.suslik.org/
Jon Hall
The Open University, Walton Hall, Milton Keynes, England
j.g.hall@open.ac.uk, http://mcs.open.ac.uk/jgh23
FPGAs are increasing in complexity and being used as important components of safety-critical systems. Emerging safety
standards require analytic reasoning to demonstrate the safety of FPGAs in such systems. We describe a method which uses
a synchronous process algebra to produce a formal proof that an FPGA program satisfies safety properties, and demonstrate
its use in the specification of safety functions for a safety-critical system.
PuMA++: A Fully Automatic Path from Specification
to Multi-FPGA-Prototype
Klaus Harbich
Inst. of Microelectronic Systems, University of Hannover, Appelstr. 4, D-30167 Hannover, Germany
harbich@ims.uni-hannover.de
Oliver Bringmann
Dept. of Computer Engineering, University of Tuebingen, Sand 13, D-72076 Tuebingen, Germany
bringmann@fzi.de
Erich Barke
Inst. of Microelectronic Systems, University of Hannover, Appelstr. 4, D-30167 Hannover, Germany
barke@ims.uni-hannover.de
In this poster we present a new design flow for efficient hardware implementation of behavioral system specifications at
algorithmic level into multi-FPGA (Field-Programmable Gate Arrays) rapid prototyping systems. We discuss the benefits of
coupling the high-level synthesis tool CADDY-II and the partitioning and mapping environment PuMA, which is designed
for optimized implementation of RT-level (Register-Transfer) netlists into multi-FPGA architectures. With our new
approach, rapid prototyping and in-circuit verification in earliest design phases are enabled. Due to short implementation
times and precise back annotation accomplished by a close coupling of the tools, more design iterations and thus better
design space exploration are possible.
RCMAT: A Reconfigurable Coprocessor for Matrix
Algorithms
A. Amira, A. Bouridane, and P. Milligan
School of Computer Science, The Queen's University of Belfast, Belfast BT7 1NN, Northern Ireland
{A.Abbes, A.Bouridane, P.Milligan}@qub.ac.uk
Recently, computer architectures which combine a reconfigurable coprocessor with a general-purpose microprocessor have
been proposed as a solution for some computationally intensive tasks. These architectures are designed to exploit the large
amount of fine-grain parallelism in these computationally intensive applications. Currently, work is in progress at the
Queen’s University of Belfast to develop a field programmable gate array based rapid prototyping environment to perform
some matrix algorithms including matrix operations, matrix transforms and matrix decompositions. The aim of this
paper is to describe the environment of the general-purpose RCMAT coprocessor. Key aspects of the architecture, together
with a prototype software environment, are presented. Preliminary performance results and comparisons with similar
algorithms implemented on different platforms have shown better performance for the RCMAT platform.
Reconfigurability in Embedded Microprocessors:
A Prototyping Study
Sergej Sawitzki, Steffen Köhler and Rainer G. Spallek
Inst. of Computer Engineering, Dresden University of Technology, D-01062 Dresden, Germany
{sawitzki,stk,rgs}@ite.inf.tu-dresden.de
http://www.inf.tu-dresden.de/TU/Informatik/TeI/index_e.html
This work introduces a prototyping environment for the design of reconfigurable microprocessors targeting embedded systems. It
differs from previous approaches in that it discusses a systematic way (concerning both the hardware and software sides) to
design, test and debug a class of reconfigurable computing cores instead of one particular application. Both
static and dynamic reconfiguration approaches were tested during the execution of several algorithmic cores from the signal and
image processing, data compression and simulation domains. First experiments with a simple 8-bit prototype using a
reconfigurable ALU extension for implementing application-specific instructions have shown that processor
reconfiguration allows performance gains by a factor of 2 to 28 compared to the non-reconfigurable counterpart for these
applications. The speedup factors depend on the number of executions of the algorithmic core and the reconfiguration
time. The study has revealed some directions for further architectural improvements.
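The dependence of speedup on the number of executions and on the reconfiguration time can be pictured with a simple amortization model. This model is an assumption made for illustration and does not come from the study: it charges one reconfiguration per batch of executions, and every timing value used below is hypothetical.

```python
import math


def effective_speedup(n_exec: int, t_sw: float, t_hw: float, t_reconf: float) -> float:
    """Speedup of n_exec runs of a core when one reconfiguration is amortized.

    t_sw     time per run on the non-reconfigurable processor
    t_hw     time per run using the reconfigured ALU extension
    t_reconf one-off reconfiguration time (same unit as the others)
    """
    return (n_exec * t_sw) / (t_reconf + n_exec * t_hw)


def break_even_executions(t_sw: float, t_hw: float, t_reconf: float) -> int:
    """Smallest number of runs for which reconfiguring pays off (speedup > 1)."""
    if t_sw <= t_hw:
        raise ValueError("the reconfigured core must be faster per run")
    return math.floor(t_reconf / (t_sw - t_hw)) + 1


if __name__ == "__main__":
    # Hypothetical timings in arbitrary units.
    for n in (1, 6, 100, 10_000):
        print(n, round(effective_speedup(n, t_sw=20.0, t_hw=2.0, t_reconf=90.0), 2))
    print(break_even_executions(20.0, 2.0, 90.0))
```

Under this model the speedup approaches the ratio t_sw / t_hw as the core is executed more often, and falls below 1 when it runs only a few times per reconfiguration.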
Systems Prototyping Dedicated to Neural Network
Real-Time Image Processing
Rolf F. Molz, Paulo M. Engel
Inst. de Informática - UFRGS - Caixa Postal 15064 - POA - RS. CEP: 91501-970, Brazil
rolf@dinf.unisc.br, engel@inf.ufrgs.br
Fernando G. Moraes
Fac. de Informática - PUC - Av. Ipiranga, 6681 - Prédio 30 / bloco 4 - POA - RS. CEP: 90619-900, Brazil
moraes@inf.pucrs.br
Lionel Torres, Michel Robert
LIRMM - Université Montpellier II, 161 rue Ada, 34392 Montpellier Cedex 5, France
{torres,robert}@lirmm.fr
The configurable computing research community has been using different examples of image processing operations as
computing exercises. Nevertheless, few research efforts have been carried out in conceiving complete systems aimed at
implementing vision applications, including both the hardware and the software elements required. The main goal of this
work is to show that the techniques and technologies used by the configurable computing community may be used to
make up a low-cost system capable of mapping a wide range of computer vision applications in research and industrial
environments. This work proposes a portable system using reconfigurable devices (FPGA) and a signal processor (DSP)
available in a flexible codesign platform (APTIX) for image processing. This hardware/software implementation is a full
stand-alone system, able to execute all required tasks for shape localization and classification. This system can be
implemented in a dedicated ASIC, characterizing a system-on-a-chip for image processing. To complete our system, we can
connect it to a CMOS Image Sensor circuit in order to include the image acquisition task.
The Systolic Ring: A Reconfigurable Systolic Architecture
Gille Sassatelli, G. Cambon, Jérome Galy and Lionel Torres
Université Montpellier II, LIRMM, UMR 5506 CNRS /UM II, 161 Rue ADA, 34392 Montpellier Cedex 5, France
The Internet is becoming one of the key features of tomorrow’s communication world. The evolution of mobile phone networks,
such as UMTS, will soon allow everyone to be connected, everywhere. These new network technologies bring the ability to
deal not only with classical voice or text messages, but also with enhanced content: multimedia. At the mobile level, this
kind of data-oriented content requires highly efficient architectures, and today’s mobile system-on-chip solutions will no
longer be able to manage critical constraints such as area, power, and data computing efficiency. We show why
classical FPGA architectures are not well suited to solving these arithmetic-oriented application problems, and thus
propose a new coarse-grain dynamically reconfigurable network dedicated to data-oriented applications such as
those targeted at third-generation networks. Principles, realizations and comparative results are presented for some
classical applications targeted at different architectures.
Task Partitioning Between a General Purpose
Microprocessor and Reconfigurable Hardware
Nitij Mangal
nVIDIA Corporation, Inc.
nmangal@nvidia.com
Puneet Gupta
Mindtree Technologies Pvt. Ltd.
puneet_gupta@mindtree.com
C. P. Ravikumar
ravikumar@controlnet.co.in
We describe a hardware-software codesign solution for a target architecture consisting of a general-purpose processor and a
reconfigurable coprocessor. We consider two different working models for the coprocessor, one in which the outputs of
various prospective coprocessor configurations are multiplexed, and a second where the coprocessor supports “need based”
reconfiguration via an on-chip configuration cache. The input to the system is a C program, whose call graph is derived and
modified so that every instance of a function appears as a separate node. Recursive functions are required to be implemented
in software. Similarly, the successor nodes of all hardware-implemented functions are also implemented in hardware so that
hardware-software communication interfaces are unnecessary. We estimate task execution times on software/hardware, as
well as software-hardware communication overhead. We experimented with both simulated annealing and genetic
algorithms for hardware-software partitioning. Hardware implemented functions are pushed up in the reconfiguration order
to minimize the number of reconfigurations. We were able to get speedups of over 1.25 for an example of ADPCM in
significantly less time than an exhaustive search method.
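Two of the preprocessing rules described above can be sketched in a few lines: unfolding the call graph so that each call instance is its own node, and forcing every successor of a hardware node into hardware. This is an illustrative reading of the abstract, not the authors' tool. The data layout, function names and the example call graph are all hypothetical, and the annealing or genetic search itself is not shown.

```python
def expand_call_graph(calls: dict[str, list[str]], root: str = "main"):
    """Unfold a call graph so that every call instance becomes a separate node.

    Returns (nodes, children, recursive): nodes[i] is the function name of
    node i, children[i] lists the node ids it calls, and `recursive` holds
    the functions found on a call cycle, which must stay in software.
    """
    nodes, children, recursive = [], [], set()

    def visit(fn: str, path: list[str]) -> int:
        node_id = len(nodes)
        nodes.append(fn)
        children.append([])
        for callee in calls.get(fn, []):
            if callee in path:
                # Everything on the cycle back to `callee` is recursive.
                recursive.update(path[path.index(callee):])
                continue
            children[node_id].append(visit(callee, path + [callee]))
        return node_id

    visit(root, [root])
    return nodes, children, recursive


def hardware_closure(children: list[list[int]], seeds: list[int]) -> set[int]:
    """Grow a hardware set so that all successors of hardware nodes are hardware."""
    hw, stack = set(), list(seeds)
    while stack:
        n = stack.pop()
        if n not in hw:
            hw.add(n)
            stack.extend(children[n])
    return hw


if __name__ == "__main__":
    calls = {"main": ["f", "g"], "f": ["h"], "g": ["h", "g"], "h": []}
    nodes, children, recursive = expand_call_graph(calls)
    hw = hardware_closure(children, seeds=[1])      # put the node for f in hardware
    print(nodes, children, recursive, sorted(hw))
```

A partitioner built on this would only propose seed sets whose closure contains no recursive function, and would then score each candidate with the estimated execution and communication times.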
Two-Dimensional 8x8 Fast Cosine Transform
Parallel Processor
Dr. Anatoly Melnyk, Yury Ermetov and Bohdan Dunets
Lviv Polytechnic National University , 79013, Bandera str., 12, Lviv, Ukraine,
{aomelnyk,yoerm,dunets}@polynet.lviv.ua, http://www.polynet.lviv.ua
The discrete cosine transform (DCT) algorithm is the basis of such widespread coding standards as JPEG, MPEG, H.261, etc.
Two-dimensional DCT computation uses the row-column approach, comprising an 8x8 data transposition, an 8-point DCT, another 8x8
data transposition and an 8-point DCT. For the 8-point DCT calculation, a new parallel fast cosine transform (FCT) algorithm was
developed, featuring the same number of parallel adding operations in place of the sequential adding operations of the usual FCT
algorithm. For the FCT implementation, a fully parallel 8-channel pipelined FCT unit was designed. Development of specialized
constant multipliers made it possible to obtain maximum performance with a minimum amount of hardware. Data transposition was
implemented with specialized sorting memory using registers instead of RAM to obtain maximum speed. A VHDL model
of the processor was created and implemented on a Xilinx FPGA using Synplify, the VHDL synthesis tool for FPGAs and
CPLDs from Synplicity, Inc. Compared to an existing Xilinx, Inc. parallel 8-channel implementation of the 2D 8x8
DCT, the parallel processor proposed in this paper requires 15% less hardware with a 65% speed enhancement.
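The row-column ordering described in this abstract (transpose, 8-point DCT, transpose, 8-point DCT) can be checked numerically with a short sketch. The 8-point transform below is evaluated directly from the DCT-II definition with orthonormal scaling. It is not the authors' parallel FCT algorithm, and the function names are hypothetical.

```python
import math

N = 8


def dct_8(v: list[float]) -> list[float]:
    """Orthonormal 8-point DCT-II of one row, computed from the definition."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        acc = sum(v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N))
        out.append(scale * acc)
    return out


def transpose(m: list[list[float]]) -> list[list[float]]:
    """8x8 data transposition."""
    return [list(col) for col in zip(*m)]


def dct_2d_row_column(block: list[list[float]]) -> list[list[float]]:
    """2D DCT as: transpose, row-wise DCT, transpose, row-wise DCT."""
    stage = [dct_8(row) for row in transpose(block)]
    return [dct_8(row) for row in transpose(stage)]


if __name__ == "__main__":
    flat = [[1.0] * N for _ in range(N)]
    coeffs = dct_2d_row_column(flat)
    print(round(coeffs[0][0], 6))   # only the DC coefficient is non-zero for a flat block
```

Because the same 8-point unit is used in both passes, a hardware design needs only one kind of transform datapath plus the transposition memory.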
A Universal Fault-Tolerant Methodology in SRAM-Based
FPGA Systems
Yanmei Li, Dongmei Li and Zhihua Wang
Department of Electronic Engineering, Tsinghua University, Beijing, 100084, P. R. China
{liym, lidm, wangzh}@hannah.ee.tsinghua.edu.cn
Compared with antifuse-based FPGAs (Field Programmable Gate Arrays), SRAM-based FPGAs are more attractive because they
offer reprogrammability and more flexibility. However, in some applications the SRAMs face serious
threats; for example, they are susceptible to radiation-induced upsets in aerospace systems. Faults in the configuration SRAMs may cause a functional failure in the FPGA and even in the whole system. As an effective solution, a TMR (Triple
Modular Redundancy) system and a universal fault-tolerant algorithm are presented in this paper. Fault identification,
mitigation and correction are introduced. Moreover, key circuit designs are also described, including two sorts of voter
circuits, the output compaction and the scan chain design. For general SRAM-based FPGAs, this fault-tolerant methodology
can mitigate the effects of SRAM faults and detect these faults without interrupting system operation. Its effectiveness is
demonstrated through simulation and experiments based on Xilinx FPGAs.
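The basic TMR mechanism mentioned above, voting on three redundant outputs and identifying the module that disagrees, can be sketched as follows. This is the textbook bitwise majority voter, not either of the voter circuits designed by the authors. The function names and the example values are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three redundant module outputs."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_modules(a: int, b: int, c: int) -> list[int]:
    """Indices (0, 1, 2) of the modules whose output differs from the voted value.

    An empty list means all three agree; a single index points at the
    module whose configuration should be checked and corrected.
    """
    voted = majority_vote(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]


if __name__ == "__main__":
    good = 0b1010
    upset = good ^ 0b1000            # one bit flipped in the third module
    print(bin(majority_vote(good, good, upset)), disagreeing_modules(good, good, upset))
```

The voted output stays correct while a single module is faulty, which is what allows detection and repair without interrupting system operation.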