7 Series Dedicated Hardware REL, part 1 Script

1—7 Series Dedicated Hardware, part 1
Hello and welcome to this recorded e-Learning on 7 Series Dedicated Hardware, part 1. My name is Frank Nelson, and I will be your instructor for this module.

2—Objectives
<read slide>

3- Outline Slide
<read slide>

4-Need for Higher Bandwidth
<read slide> The demand for speed is exceeding the limits of differential I/O, so we need serial I/O.

5-Next-Generation Serial Connectivity
<read slide> Speed is improving (increase in performance).

6-Multi-Gigabit Transceivers
<read slide>

7-Transceiver Quad Clocking
Transceivers are grouped in quads, or blocks, and each block has four transceivers. There are one or two columns of these transceivers on each device. Each block has five PLLs: one LC PLL (the common PLL, or QPLL) and four ring PLLs (CPLLs). Each PLL type has different characteristics. The LC PLL has the highest performance and is necessary to get the maximum performance from the transceivers. However, the LC PLL does not span the frequency range in small increments the way the ring PLL does. The LC PLL can operate at half, quarter, and one-eighth of the source clock rate. The ring oscillator PLL (CPLL) supports a wider range of frequencies, but does not support the highest performance clocks that the QPLL does. There are four ring PLLs per block, and each can drive only one transceiver pair. The LC PLL can drive every transceiver pair in the block. Note that the LC PLL can drive the ring PLLs as an input. Each transceiver can select between the ring and LC PLL clocks as an independent input. This lets the TX and RX of each transceiver run at entirely different frequencies, whether the clocks are related or not.

8—Gigabit Transceivers Overview
The GTs consist of a parallel-to-serial transmitter and a serial-to-parallel receiver. Each is unidirectional and has its own data path. They are independent of each other and share only fabric connections. The connection to the fabric is a wide parallel interface.
The width of the interface varies by family, protocol, and line rate, and ranges from 8 to 40 bits depending on the combination of features the user selects. The serial interface to the circuit board uses differential current mode logic (CML) signaling, which requires two traces for the transmitter and two traces for the receiver. This helps remove common-mode noise.

9—Transmitter Overview - PCS
Each transmitter and receiver is broken into two parts: the Physical Coding Sublayer, or PCS (the parallel data side), and the Physical Medium Attachment, or PMA (the electrical interface that connects to the outside of the device). The PCS takes data that is wide and running at normal speeds in the FPGA fabric and prepares it for transmission on the serial link at very high speed. (not spoken…Preparation is needed because the data will lose framing, the receiving device may need to use a different clock, the data will need some encoding, and the data may need some phase adjusting. The encoding is accomplished with 8B/10B, but there is also support for 64B/66B and 64B/67B encoding. Other encoding techniques are also supported, but the user has to build those.) The user interface requires an FPGA fabric clock, which must be routed on a BUFG or BUFH primitive. The PMA side uses a very high speed clock that is generated by either a QPLL or a CPLL. (not spoken…So there is also some clock crossing logic at the boundary between the PMA and the PCS. This is needed because the FPGA fabric clock and the divided-down PLL clock have the same frequency, but different phase and jitter characteristics.)

10—Transmitter Overview - PMA
Once the PCS side is finished, the data is ready for serialization on the PMA side.
The data not only has to be serialized, it also has to be conditioned for delivery on the external medium. That is necessary because the initial contact the transceiver makes is with copper, and at these speeds copper is a little challenging to work with. So the PMA includes a programmable output level to compensate for external signal attenuation. It also includes programmable pre-emphasis to compensate for the unavoidable low-pass filter attenuation of the transmission line. There is also some additional circuitry for out-of-band signaling (that is, for PCI Express and Serial ATA).

11—Receiver Overview - PMA
The serial side of the receiver first needs differential receiver inputs that can handle the signal coming off the copper. The first thing this requires is linear equalization to compensate for the board's low-pass loss. Decision Feedback Equalization (DFE) is a mechanism for looking at the effects of the last few bits and using that to adjust the gain of the receiver so it can get the best data eye. Once there is an established data eye, the clock data recovery (CDR) circuit builds a clock signal from the data and processes the data on a divided-down recovered clock. The data can then be de-serialized.

12—Receiver Overview - PCS
The PCS block now takes the parallel data and performs comma detection, alignment, decoding, and channel bonding before sending the data to the FPGA. All of that is built into the dedicated hardware on the receiving end. (not spoken…First the data needs to be transferred to a local clock at the same rate as the derived clock. This requires some synchronization circuitry. The PCS also has a built-in 8B/10B decoder and support for the other decoders seen on this slide. There is also built-in pattern checking logic for testing the link integrity.)
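
(not spoken…The comma detection and alignment step described for the receiver PCS can be illustrated in software. The following is only a minimal Python sketch, not the transceiver's actual logic. It assumes 8B/10B encoding with the K28.5 character used as the comma, and the function names find_comma_offset and align_words are made up for this illustration.)

```python
from typing import List, Optional

# The two 10-bit encodings of the 8B/10B K28.5 comma character,
# written in transmission order (abcdei fghj).
K28_5_NEG = "0011111010"  # used when running disparity is negative
K28_5_POS = "1100000101"  # used when running disparity is positive


def find_comma_offset(bits: str) -> Optional[int]:
    """Return the word-boundary offset (0-9) implied by the first comma.

    `bits` is the received serial stream as a string of '0'/'1'.
    Returns None when no comma is present in the stream.
    """
    for i in range(len(bits) - 9):
        if bits[i:i + 10] in (K28_5_NEG, K28_5_POS):
            return i % 10
    return None


def align_words(bits: str, offset: int) -> List[str]:
    """Split the stream into complete 10-bit words starting at `offset`."""
    return [bits[i:i + 10] for i in range(offset, len(bits) - 9, 10)]


if __name__ == "__main__":
    # Three stray bits, then a comma, then two data characters.
    stream = "101" + K28_5_NEG + "1010101010" + "0101010101"
    offset = find_comma_offset(stream)
    print(offset)
    print(align_words(stream, offset))
```

Once the offset is known, every later 10-bit word can be cut on the same boundary and handed to the decoder, which is the reason the comma is detected first.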
13—7 Series Transceiver Architecture Major Supported Protocols
This slide shows you the major protocols that can be implemented with the 7 Series GTs. There is a wide range of protocols, used in many different industries and applications. Note that most of the protocols are supported by more than one of the GT architectures, which means more than one device family can support most protocols. So make sure you use this information when selecting your FPGA product.

14—Power Reduction Options
There are a number of power saving features built into the GTs. First of all, the clocking architecture for the transceiver columns is designed to provide maximum flexibility while saving power. Since the PLLs use considerable power, we recommend that you use only the PLLs you need for your application and let the unused PLLs be powered down. (not spoken…So if you can build your design such that very few clocks are generated, you will save power. Likewise, the QPLLs use less power than the CPLLs, so their use is recommended. Note that the tools will automatically determine which PLLs are not needed and turn them off.) There is also a low power mode for the receiver circuitry. This uses lower power linear equalization circuitry rather than the higher power DFE. On the transmit side, you can use a lower voltage to power the CML.

15—Transceiver Wizard Overview
This slide shows you how to customize your transceivers with the Transceiver Wizard. Since the GTs are grouped into quads, the customization optimizes an entire quad. The Wizard lets you select from an immense number of attributes. You could set them manually, but using the Wizard is pretty much essential. There are also a number of supported protocols (like XAUI) that, when selected from a drop-down box, automatically set all of the necessary attributes for you. Once these attributes have been set, you can edit them manually without the GUI if you like.
16- Transceiver Wizard
Like the MIG, the Transceiver Wizard generates a lot of output. The Wizard generates RTL, including your HDL wrappers. The RTL has to be instantiated into your design. The Wizard also provides an example design for you to synthesize, implement, and simulate. This enables you to simulate your protocol before integrating it into your FPGA design.

17-ChipScope Pro Tool + IBERT
ChipScope includes an Integrated Bit Error Rate Tester (IBERT). This allows you to figure out how good your link is, and which power saving options you can use on your link while still meeting your performance needs. Once you have an implemented design that contains gigabit transceivers, you can use the ChipScope Pro Tool and the IBERT to perform link characterization tests. With these tools, you can move the sampling point within the data eye, measure jitter margin, sweep through the equalization settings to find the optimal values, and analyze channel margins.

18-Outline Slide
<read slide>

19-Summary
<read slide>

20-Where Can I Learn More
Well, there is a lot of useful information about the 7 Series devices at support.xilinx.com. For this topic, I would recommend that you review the 7 Series GTX Transceivers User Guide. It has more detailed information that we did not have time to cover. You can also review the Transceiver Wizard User Guide. It will help you understand all the selectable attributes you can set for all the transceivers in the 7 Series. If you would like to see what other courses we offer, or what other free RELs are available, go to the Xilinx Education link you see here. But whatever you do, please take a second and let us know what you thought of this REL. Just click on the icon on the next page and tell us what you think. My name is Frank Nelson. You have been listening to the 7 Series Dedicated Hardware REL, part 1. Thanks for listening and thanks for your business.
21—Legal Stuff
<nothing said>

7 Series Dedicated Hardware REL, part 2 Script

1—7 Series Dedicated Hardware, part 2
Hello and welcome to this recorded e-Learning on 7 Series Dedicated Hardware, part 2. My name is Frank Nelson, and I will be your instructor for this module.

2—Objectives
<read slide>

3- Outline Slide
<read slide>

4-PCI Express Technology Success
PCI Express is considered to be the dominant serial I/O standard today. Its broad market adoption may be due to its plug-and-play capability, which removes technical hurdles that make other protocols more difficult to use. The strict compliance requirements mean that customers can enjoy seamless interoperability between ASSPs, ASICs, and FPGAs. More advanced users need customization capabilities to achieve a competitive advantage. There are three generations of PCI Express performance. Gen1 offers 2.5 Gbps per lane, Gen2 offers 5.0 Gbps, and Gen3 offers 8.0 Gbps. Each supports multiple lane configurations.

5- 7 Series FPGA PCI Express Solution
Xilinx offers both a soft core and a dedicated PCI Express Block in its 7 series devices. This offers customers a combination of logic savings, improved speed, reduced cost, and flexibility. All 7 series devices offer a dedicated PCI Express hard IP core. Using the dedicated PCI Express Block enables customers to fit their design in a smaller, less expensive FPGA. A dedicated block also achieves higher performance and consumes less power than a soft implementation, with guaranteed timing. Customers can also realize a faster time to market because the dedicated block is pre-verified, and Xilinx provides a complete solution with a development kit, protocol packs, documentation, and reference designs to get your customers up and running quickly. From this slide, you can see that every device family supports a PCI Express interface, but note that the hard IP core included differs in generation from family to family.
Likewise, not all of the device families support all of the PCI Express generations. Access to the PCI Express Block is included with the ISE software at no additional cost to customers. After installation, users receive an evaluation license from the CORE Generator tool. Customers can then register online for free access to the full license.

6- 7 Series Gen2 Integrated Block
The 7 Series PCI Express Integrated Block supports both endpoint and root port configurations, using the AXI interface. The bandwidth is scalable up to eight lanes, and depending on your device family it will support Gen1, Gen2, or Gen3 data rates. The 7 series devices are compliant with version 2.1 of the PCI Express specification for electrical signaling, protocol, and quality of service, and they are hot pluggable.

7-7 Series Gen3 Solutions
Gen3 of the PCI Express standard is supported by both the Kintex-7 and Virtex-7 families with a soft core. It is supported in both -2 and -3 speed grades. It is designed to be tightly integrated with the GTX transceivers included with those families. There is also data link layer and transaction layer soft IP provided by one of our Alliance core partners. These are available through the CORE Generator. It even inserts pipeline registers if they are needed to meet timing. The Virtex-7 XT family includes a dedicated hard IP core that supports Gen3 performance and up to 8 lanes.

8—7 Series PCIe AXI4 Interfaces
The AXI protocol is designed to provide a soft local link interface between IP and peripherals integrated within Xilinx devices. It is our recommended interface for our IP going forward. It comes in a few different variants that provide benefits in particular applications. The Streaming Basic interface is used for maximum control and speed. The Streaming Enhanced interface is recommended for new designs and minimizes the design work a user would have to do to build common functions. It also enables high speed. The Memory Mapped interface is optimized for designs that use off-chip memories or target embedded applications.
Whichever configuration you choose, AXI is designed to be a point-to-point interface that is simpler and more efficient than shared buses. The PCI Express core has been optimized to use this interface.

9—CORE Generator Interface Simplifies Design Tasks
The CORE Generator creates a wrapper for the PCI Express Block that includes the IP block itself, block RAM, gigabit transceivers, and clock generation. The CORE Generator helps your customers configure the PCI Express Block by selecting lane width, link speed, and other settings. It automatically configures the transceivers to meet specific PCI Express requirements such as electrical idle, out-of-band signaling, output swing, de-emphasis, channel bonding, and the data rate.

10—Outline Slide
<read slide>

11—XADC Block Diagram
The final piece of dedicated hardware we're going to discuss is the 7 Series Analog to Digital Converter. The XADC component is available in all 7 series devices. This block monitors up to 17 analog inputs and can perform digital conversion with up to a 16-bit resolution. The A to D converter supports a conversion rate of 1 mega-sample per second. It also has three on-chip sensors to measure the current consumption of the internal power supplies and device temperature. It can monitor on-chip power supplies within plus or minus 1% and temperature within plus or minus 4%. Alarm levels can be set using the control registers. For example, the alarms include an automatic power-down feature if the temperature exceeds either 125 degrees Celsius or a user specified limit. The alarm levels can be defined during configuration, or they can be inspected and set through the DRP via the JTAG port or the FPGA fabric. The status registers are read-only, but can also be accessed through the DRP. The dynamic reconfiguration port (DRP) provides flexibility by allowing the configuration of the XADC to be changed at runtime. The XADC is easy to add to your design through direct instantiation from the CORE Generator.
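
(not spoken…The over-temperature alarm described above can be sketched in a few lines of Python. This is only an illustration of the decision being made, not the XADC register interface. The 125 degree limit comes from the script; the function name over_temperature and its arguments are hypothetical.)

```python
from typing import Optional

# Fixed over-temperature limit quoted in the script, in degrees Celsius.
MAX_TEMP_C = 125.0


def over_temperature(temp_c: float, user_limit_c: Optional[float] = None) -> bool:
    """Return True when the automatic power-down condition is met.

    The condition is met when the measured temperature exceeds either
    the fixed 125 degree limit or the optional user specified limit.
    """
    if temp_c > MAX_TEMP_C:
        return True
    return user_limit_c is not None and temp_c > user_limit_c


if __name__ == "__main__":
    print(over_temperature(130.0))        # above the fixed limit
    print(over_temperature(100.0, 85.0))  # above the user limit only
    print(over_temperature(80.0, 85.0))   # below both limits
```

A user limit above 125 degrees has no effect in this sketch, because the fixed limit trips first.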
12—High Quality ADCs
<read slide> The ADCs in the XADC block are fast and precise. They can replace external ADCs in many applications, reducing overall system cost.

13—Other Features
<read slide> Some typical applications for the XADC include environmental monitoring for high reliability systems, and debug and diagnostics including a power-on self test. Some customers just use it as a general A to D converter to save on board space and component count. In addition to the JTAG port, the XADC can also be accessed with the ChipScope Pro software.

14— Outline Slide
<read slide>

15—Summary
<read slide>

16-Where Can I Learn More
Well, there is a lot of useful information about the 7 Series devices at support.xilinx.com. For this topic, I would recommend that you review the 7 Series Integrated Block for PCI Express User Guide. It covers the core in great detail. You can also refer to the XADC User Guide for more information on the use of the Analog to Digital Converter. If you would like to see what other courses we offer, or what other free RELs are available, go to the Xilinx Education link you see here. But whatever you do, please take a second and let us know what you thought of this REL. Just click on the icon on the next page and tell us what you think. My name is Frank Nelson. You have been listening to the 7 Series Dedicated Hardware REL. Thanks for listening and thanks for your business.

17—Legal Stuff
<nothing said>