Series 7 Overview, part 1 Transcript

1- Hello and welcome to this recorded e-learning about the 7-Series device architectures. This module is part one of three modules that give you an overview of the 7-Series device family members. Please note that, where applicable, we're going to make comparisons between the 7-Series products and the Virtex-6 and Spartan-6 FPGA families.

2- The course objectives <read slide>… Please note that everything we cover here will be covered in more detail in later chapters.

3- This is a quick snapshot of the 7 Series subfamilies. The 7 Series includes three subfamilies: Artix-7, Kintex-7, and Virtex-7. One of the most exciting things about the 7 Series is that it is a truly unified FPGA architecture that spans multiple FPGA subfamilies. This means that the architectures of the device subfamilies are virtually identical. There are minor customizations between the subfamilies, but the underlying CLB and array architecture is virtually the same. The homogeneous nature of the 7 Series has a lot of advantages for designers, because once a designer understands one subfamily it is easy to use the others. Customers do not have to learn how to optimize their design for different device families, so design migration is easier than it has ever been. The 7 Series families are still designed to cater to different customer needs. The Artix-7 device is designed to be the lowest cost subfamily. Kintex-7 is designed to be the industry's best combination of price and high performance. Virtex-7 is designed to have the highest system performance in the FPGA industry. You can also note from this slide that as you migrate to the higher speed and density devices, you increase the number of I/O pins, the number of transceivers available, transceiver performance, the available block RAM, and the amount of DSP resources available.
As you can see from this chart, each of these subfamilies has a range of densities that overlap somewhat but are otherwise well separated. Also note that the largest Virtex-7 device, the 2000, has over 2 million logic cells. This is significantly larger than the largest Virtex-6 device, which had about 750,000 logic cells; it is almost 3 times the size.

4- Although there are three subfamilies in the 7 Series, each of the subfamilies has some variation. This slide shows you that the Virtex-7 devices have 3 subfamilies. The Virtex-7 is designed for general logic purposes. It is our mainstream device, which means that it has a moderate amount of logic, block RAM, and DSP slice resources. The Virtex-7 XT is designed for DSP applications because it has the largest amount of DSP slice and block RAM resources. The Virtex-7 HT is best for high-speed serial connectivity, because it has the most serial gigabit transceivers. This family also has several very high-speed gigabit transceivers which the other families do not have. Note that each of the subfamilies has serial gigabit transceivers, even though the Virtex-7 subfamily does not have a T in its name.

5- Each of the 7 Series devices has the same features that you see here. So the block RAMs in Artix-7 are the same as they are in Kintex-7 and Virtex-7. All of the slice resources are the same, all of the DSP slice resources are the same, and so on. This is different from the Spartan-6 and Virtex-6 device families, which differ significantly from each other. With this change in strategy, Xilinx has made it easier to migrate a design between subfamilies, which improves time to market for a customer who wishes to do so. You should note that each device may support different I/O standards and that not all of the transceivers are identical. This will depend on the subfamily you choose. We will talk more about this in another recording.
You can also see from this slide that each of these devices places the dedicated hardware in separate columns, with CLB logic dispersed throughout the device. Xilinx has used this approach in other device families for years.

6- With the development of the 7 Series, there has been a very strong focus on reducing power consumption. This slide shows you a number of the power-saving features. First of all, it is important to note that the 7-Series is not just a Virtex-6 device with a die shrink. A considerable number of features have been modified to reduce power consumption. For example, the 7-Series uses a different gate technology that reduces the leakage current of the transistors, which translates to a significant static power reduction. A lot of these features are designed to reduce power consumption while maintaining high performance. Virtex-7 performance is designed to be the same as or better than Virtex-6, while providing a substantial savings in power. Fine-grained clock gating allows you to effectively turn off a clock domain with a clock enable. This helps reduce dynamic power consumption. This feature was already present in the Virtex-6 devices, but now the implementation tools make the most of it. Just like Virtex-6, Virtex-7 also has a lower core voltage option (called -1L), which will save you considerable power at the expense of performance. There are also numerous changes to the I/O structures to reduce your I/O power, which is considerable in most applications. Sorry I don’t have more time to cover all of these features, but the 7 Series data sheet describes all of the device families' power-saving features in more detail. From an engineering standpoint it is quite interesting.

7- All of this means that the 7-Series FPGA supports the goal of 50% lower total power compared to Virtex-6.
A lot of this is accomplished with a 65% lower static power enabled by the 28 nm high-performance process. But overall you should see a 50% reduction in your power consumption. This means you can do the same work in a package with a smaller heat sink, less air flow, or a smaller power supply. Alternatively, you can keep the same package and use a larger FPGA that lets you pack twice as much logic into the device or run at a higher system speed.

8- In summary…<read slide>

9- If you would like to see what other courses we offer, or what other free RELs are available, go to the Xilinx Education link you see here. <read slide> But whatever you do, please take a second and let us know what you thought of this REL. Just click on this icon at the top of this page and tell us what you think. My name is Frank Nelson. You have been listening to an introduction to the 7 Series of Xilinx FPGAs. Thank you for listening and thanks for your business.

10- <nothing said>

Series 7 FPGA Overview, part 2 Transcript

1- Hello and welcome to this recorded e-learning about the 7-Series device architectures. This module is part two of three modules. Please note that, where applicable, we're going to make comparisons between the 7-Series products and the Virtex-6 and Spartan-6 FPGAs.

2- The course objectives <read slide>… Please note that everything we cover here will be covered in more detail in later chapters.

3- Like Virtex-6, the 7-Series is built on the 4th generation of ASMBL architecture (that is, the Advanced Silicon Modular Block architecture). This means the device is made up of separate columns of different dedicated hardware resources, including clocking resources, DSP, block RAM, and I/O resources. This enables the changes in device densities to be homogeneous. So as the density increases, the architecture simply grows vertically, not horizontally.
This means that you will not get additional columns of block RAMs and DSP slices as the density increases, just extra clock regions. The 7-Series is functionally very similar to the Virtex-6 architecture. This was deliberate and intended to facilitate design migration from Virtex-6 to the 7 Series. If you are a Spartan-6 user, this may require a little bit of learning effort. The 7 Series is simply a more advanced architecture than Spartan-6.

4- One of the differences between Virtex-6 and the 7-Series is the way some of the columns of dedicated resources are laid out on the die. For example, in Virtex-6 all the clocking resources and a couple of the I/O columns were placed in the middle of the device. The 7-Series does not keep any I/O columns in the middle of the device; instead they are all placed on the left and right edges. There is still a middle column for access to the clock routing resources, but the CMTs are placed on the left and right edges next to the I/O columns and are tightly bound to the I/Os. You should also note that in some devices the high-speed I/O pins will end up replacing some of the I/O banks you might be used to having available. In this example, the sixth I/O bank we might be expecting in the upper right-hand corner is missing and has been replaced by high-speed I/O pins.

5- 7-Series devices use clock regions, just like the Virtex-6 devices. Each clock region is 50 CLBs tall, which is a little larger than in Virtex-6 devices. Likewise, each clock region has 50 IOBs associated with it, compared to 40 IOBs with Virtex-6. Just as before, clock regions span from the middle of the device to the edge. The BUFH is designed to distribute clock signals into an individual region. In this case, it splits the region horizontally and is represented by the gray line. Regional clock routing resources also run through the middle of the clock region.

6- The CLB structure of the 7-Series is the same as Virtex-6.
There are two slices per CLB. There is slice-M (which includes memory-capable LUTs) and slice-L (which only supports general logic and the carry chain). There is no slice-X, which was in the Spartan-6 devices. The slice-M LUTs can be configured as a 64x1 distributed memory (good for DSP applications) or a 32-bit shift register. There are also two FFs per LUT. This is helpful because each 6-input LUT can be configured as two 5-input LUTs. This lets each 5-input LUT drive its own FF, and the extra FFs can also be used for pipelining an application.

7- In terms of functionality, the 7 Series block RAM is the same as the Virtex-6 block RAM. It features independent read and write port widths; dual-port, single-port, and simple dual-port modes; integrated cascade logic; byte-write enable; optional 64-bit error correction; and integrated FIFO logic. The block RAM has been designed for lower static power. Internally, each 36 Kbit block is divided into 9 Kbit blocks. Unused blocks can be turned off to save even more power for smaller memories. The performance of both the block RAM and FIFO logic is around 600 MHz.

8- The DSP slice resources feature a pre-adder that simplifies the design of symmetric filters. Other DSP features include a 25 x 18 multiplier, an ALU stage that includes dynamic opcodes, single-instruction-multiple-data support, add/subtract and common logic functionality, and a pattern detector on the output. Design enhancements continue to reduce power consumption, and the number of available DSP slices continues to grow. Power consumption is the lowest of any FPGA solution. The 7 Series achieves low power consumption without sacrificing performance. 600 MHz performance for any DSP operation means that you could achieve 1.2 TeraMACs in a single device. The pre-adder can save a significant amount of resources when building symmetric filter functions, which could mean cost savings if your design can fit into a smaller part.
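To make the pre-adder and throughput points concrete, here is a minimal Python sketch. It is not Xilinx code and does not model the DSP slice itself; the function names are hypothetical and chosen only for illustration. It shows why adding mirrored samples before the multiply roughly halves the multiplier count of a symmetric filter, and it reproduces the arithmetic behind the 1.2 TeraMACs figure. The slice count of 2,000 is simply what 1.2 TeraMACs at 600 MHz implies, and is an assumption rather than a quoted device specification.

```python
def direct_fir(samples, coeffs):
    """One FIR output computed directly: one multiply per tap."""
    products = [c * x for c, x in zip(coeffs, samples)]
    return sum(products), len(products)


def preadder_fir(samples, coeffs):
    """One FIR output for symmetric coefficients using a pre-add.

    Mirrored samples share a coefficient, so they are added first and
    multiplied once, which is the role the pre-adder plays.
    """
    n = len(coeffs)
    if coeffs != coeffs[::-1]:
        raise ValueError("coefficients must be symmetric")
    total = 0
    multiplies = 0
    for i in range(n // 2):
        total += coeffs[i] * (samples[i] + samples[n - 1 - i])
        multiplies += 1
    if n % 2:  # an odd tap count leaves one unpaired centre tap
        total += coeffs[n // 2] * samples[n // 2]
        multiplies += 1
    return total, multiplies


def peak_macs_per_second(dsp_slices, clock_hz):
    """Peak rate if every slice completes one multiply-accumulate per clock."""
    return dsp_slices * clock_hz


coeffs = [1, 3, 5, 3, 1]      # symmetric 5-tap example (made-up values)
samples = [2, 4, 6, 8, 10]    # made-up input window

direct_result, direct_mults = direct_fir(samples, coeffs)
paired_result, paired_mults = preadder_fir(samples, coeffs)

# Hypothetical slice count: 2,000 slices at 600 MHz gives 1.2e12 MACs/s.
peak = peak_macs_per_second(2000, 600_000_000)
```

With these example values both forms produce the same output, but the direct form uses 5 multiplies and the pre-add form uses 3. In a real design each multiply would map to a DSP slice, which is where the resource saving comes from.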
Also, the flexible structure of the DSP resources allows other non-DSP logic functions to use them, which saves slice resources and further increases device utilization for designs that do not contain many DSP functions.

9- The latest advancement in clock management comes in the form of the Mixed-Mode Clock Manager, or MMCM. The MMCM is based on a PLL and provides low output jitter, phase shifting, and clock de-skew. Each clock management tile contains two MMCMs with dedicated cascade connections between the MMCMs. The MMCM has been designed for low static power, and it also has excellent jitter performance. Another enhancement is the introduction of new high-performance paths that connect the MMCM outputs directly to the I/O and regional clock networks. These paths provide low skew without using a global clock buffer. Global clocks are distributed up and down from the midpoint of each clock region. This translates to lower skew between columns and increases the maximum global clock frequency to 800 MHz. All of these enhancements provide low-power clock generation and distribution with performance up to 800 MHz for global clocks and 500 MHz for regional clocks. These advanced clocking features remove the need for external clocking components, reducing costs and simplifying board designs.

10- One of the most significant differences between Virtex-6 and 7-Series is that there are two distinct types of I/O pins: high range (also called HR) and high performance (also called HP). High range supports I/O standards up to 3.3 V. This is significantly different from Virtex-6, which did not support any 3.3 V standards and only supported up to 2.5 V. Support for 3.3 V I/O standards is only available for I/O banks that are denoted as high range. High performance banks are limited to 1.8 V, but they have more capabilities and support more advanced I/O standards that allow them to interface at higher system speeds.
These features are designed to support two diverging needs: high range pins support legacy applications, while high performance pins support the newest and most demanding I/O standards. Having an architecture that divides the I/O banks into types allows the device to meet all I/O performance expectations. The SerDes functionality now adds an independent output delay element in some of the I/O pins. There is also some other functionality built into the device that aids its use with high-speed memory controllers.

11- Another interesting thing about the Virtex-7 family is that its largest device is almost 3x as big as the largest Virtex-6 device. This is enabled by stacking multiple silicon dies on an interposer, where the lower die is designed to route signals between super logic regions. For example, the largest device is made up of 4 Super Logic Regions (SLRs), each with 500K logic cells, interconnected by an interposer. This allows the density to increase significantly without a loss in performance.

12- This slide shows a cross-section of the largest 7 Series FPGA. Each die is placed on an interposer (in blue), which carries all four FPGA dies on a single mounting. It is important to note that this connectivity is done at the routing level in order to get optimum performance. The combined dies are then placed on a substrate, which is made into a conventional package that can be attached to a board. This is all hidden from the user, and the tools will still treat this device as a single monolithic FPGA with 2 million logic cells. There are minor routing delays when logic is split over two dies, but the tools will take this into account during implementation. This assures that your performance objectives are met and verified.

13- Summary <read slide>

14- If you would like to see what other courses we offer, or what other free RELs are available, go to the Xilinx Education link you see here.
<read slide> But whatever you do, please take a second and let us know what you thought of this REL. Just click on this icon at the top of this page and tell us what you think. My name is Frank Nelson. You have been listening to an introduction to the 7 Series of Xilinx FPGAs. Thank you for listening and thanks for your business.

15- <nothing said>

Series 7 FPGA Overview, part 3 Transcript

1- Hello and welcome to this recorded e-learning overview of the 7-Series device architectures. This module is part three of three modules. Where applicable, we're going to make comparisons between the 7-Series products and the Virtex-6 and Spartan-6 FPGAs.

2- The course objectives <read slide>… Note that everything we cover here will be covered in more detail in later chapters.

3- The 7 Series has all the IP that exists in the Virtex-6 device family except the Ethernet MAC hard core. You can still instantiate the core as soft IP, however. This was done because of declining demand for Ethernet MAC interconnectivity applications. Instead we are seeing more applications that demand high-speed transceiver links or PCI Express cores. Almost all the devices in the 7 Series have dedicated serial gigabit transceivers. There are four types of transceivers used throughout the families, so it is useful to understand which transceivers are used in which devices. Each type of transceiver has different capabilities and speeds, so you will need to learn more about each transceiver to design with it. Note that the user interface of these transceivers is similar, but not identical. However, with the use of the Core Generator, customization of your transceiver is relatively easy. The GTP transceiver supports speeds up to 3.75 gigabits per second. It is an ultra-high-volume transceiver and is used in the wire bond packaging provided by the Artix-7 family.
That is important because even with a lower-performance transceiver, some customers still wanted a wire bond package, which is relatively inexpensive. The GTX transceiver is the most similar to what was provided in the Virtex-6 device family. It operates at up to 12.5 gigabits per second. This is a bit better than what is available in Virtex-6 devices. The GTH transceivers are similar to what was supported in the latest Virtex-6 devices. They support speeds up to 13.1 gigabits per second and also support 10 gigabit per second protocols with high forward error correction overhead. At the very high end, the new GTZ transceivers support speeds up to 28 gigabits per second. The GTZ transceivers are designed to support the next generation of 100-400 gigabit per second system line cards.

4- Another hard block that almost all 7 Series devices have is a PCI Express block. The capabilities of this block vary by device subfamily. The PCI Express core supports both endpoint and root functionality in all devices. There are a lot of new features embedded in the PCI Express block that we don’t have time to talk about here. I would recommend you refer to the PCI Express User Guide to learn more about its functionality.

5- The Xilinx analog-to-digital converter (XADC) block is something new and exciting in the 7 Series devices. This is kind of like the System Monitor block that is included with the Virtex-6 devices, but it has significantly different features. First of all, it is a dual analog-to-digital converter rather than a single one as it was in Virtex-6. The XADC is also faster than the System Monitor block: it performs up to one megasample per second (versus the 200K samples per second of the System Monitor). In Virtex-6 the System Monitor was used to monitor power supplies and other things on the customer's final board, but with the 7 Series it can be used as a true analog front end for medium-speed applications.
You can actually bring analog signals directly into the FPGA and route them to the XADC component. This is particularly helpful for some of the low-end devices, since it allows you to build a full DSP system on a small Artix-7 device at relatively low cost. So you don't have to purchase a separate analog-to-digital converter just for that type of application.

6- The 7 Series devices are all based on the same architecture, but obviously the intent is to have different device densities and different price points for different types of market applications. This slide was made to give you some ideas of how the devices have been used. <read slide>

7- As mentioned earlier, the I/O banks are not identical in the 7 Series devices. Some of the I/O banks are high range, which supports I/O standards up to 3.3 V. Other I/O banks are high performance, which supports I/O standards up to 1.8 V but at higher speeds. The mixture of I/O banks varies by device family. For example, the Artix-7 family only has high range I/Os and no high-performance I/O pins. This means that all Artix-7 I/O pins are 3.3 V compatible. The Kintex-7 family has mostly high range I/O banks, but it does have some high-performance I/O banks. The Virtex-7 device family has mainly high-performance I/O banks with some that are high range. So even with Virtex-7 you have some ability to work with legacy I/O standards. This is quite different from Virtex-6, where you could only work with I/O standards that could be powered by 2.5 V. In the Virtex-7 XT and HT families all I/O banks are high-performance.

8- This slide shows the performance you can expect for the serial gigabit transceivers across all the 7 Series families. As you can see, it varies significantly from family to family. The Artix-7 device family only supports the GTP multi-gigabit transceivers, which only support up to 3.75 gigabits per second in the fastest speed grade.
This is about as fast as you can go with wire bond technology. In the Kintex-7, Virtex-7, and Virtex-7 XT families there is support for the GTX transceivers. These transceivers vary in performance based on their speed grade and their package. There are a couple of new packages offered in the Kintex-7 device family, and this will impact the performance of some of the GTX transceivers. The Virtex-7 XT provides a mixture of GTX and GTH transceivers, with one column of GTX and a separate column of GTH transceivers. As the chart shows, some speed grades enable the GTX to operate at up to 12.5 gigabits per second. The GTH transceivers can operate at up to 13.1 gigabits per second. The Virtex-7 HT device family has a mixture of GTH and GTZ transceivers. The GTZ transceivers perform at up to 28 gigabits per second and are only available in the highest speed grade members. As you can see from this chart, the performance of the GTZ transceivers is still to be characterized. Note that the GTZ transceivers will not be available in the -1L speed grade (that is, the low-power device offering).

9- There is very distinctive device packaging offered for each of the different family members. Artix-7 uses ultra-low-cost wire bond packaging, which is very inexpensive compared to flip-chip packaging. This was chosen because the price point for Artix-7 needs to be as low as possible. One of the challenges of using wire-bond technology with Artix-7 is how to connect the package to the die, particularly since this limits the performance of the transceivers. Likewise, parallel I/O performance is limited to just over 1 Gb per second.

10- The Kintex-7 devices support the conventional flip-chip packaging and a new low-cost bare die flip-chip package. The bare die packaging is designed to be very low cost, which is compatible with its role in the 7-Series.
The interesting thing about the bare die packaging is that the backside of the die is exposed as part of the package. This allows significant cost savings, but at the expense of some performance. You should also note that the flip-chip package does have some decoupling capacitors on the substrate, but the bare-die package only has the power supply for the transceivers bypassed. This is different from the Kintex-7 and Virtex-7 devices in conventional flip-chip packages, which have more of the necessary power supplies bypassed.

11- The Kintex-7 and all the Virtex-7 device family members use a more conventional flip-chip packaging. This is similar to the packaging used for Virtex-6. Virtex-7 uses a fourth-generation sparse chevron pin pattern and supports speeds up to 1.866 Gb per second for parallel I/O (this is designed to suit the memory controller speeds needed by users) and up to 20 Gb per second for the MGTs. There are also a fair number of substrate decoupling capacitors used around the MGT power supplies, block RAM power supplies, and the I/O pre-driver power supplies. As an FYI, having a separate power supply for the block RAM is a new requirement for this device family.

12- In summary…<read slide>

13- If you would like to see what other courses we offer, or what other free RELs are available, go to the Xilinx Education link you see here. <read slide> But whatever you do, please take a second and let us know what you thought of this REL. Just click on this icon at the top of this page and tell us what you think. My name is Frank Nelson. You have been listening to an introduction to the 7 Series of Xilinx FPGAs. Thank you for listening and thanks for your business.

14- <nothing said>