

Virtex-6 Clocking Resources Script

1- Hello and welcome to this recorded e-Learning module about Virtex-6 clocking resources. My name is Frank Nelson, and I will be your instructor for this module. This module introduces the clock generation capabilities of the Virtex®-6 FPGA, including its clock routing resources and clock management resources, and is designed to help you get the most out of these clocking resources.

2- If you would like to print out this slide presentation and a copy of the script, please feel free to do so now. You can pause the recording and download both by clicking the Attachments link in the upper right-hand corner of the GUI.

3- Virtex-6 is a brand-new, leading-edge FPGA architecture.

In this module…<read slide>

4- Virtex-6 has three different clock routing resources…

<click> global clock routing, low-skew regional clock routing, and I/O clock routing. We will talk about the specifics of each clock network and how it is driven, and explain when you should use each of these resources.

<click> Clock region structures are still important in the Virtex family of devices, unlike Spartan. Note that each clock region now contains 40 CLB rows, not 20 as in the Virtex-5 device family.

<click> Virtex-6 also has clock management tiles (called CMTs), each of which contains two mixed-mode clock managers (MMCMs): PLL-based blocks with some extra features. Unlike Spartan-6, Virtex-6 does NOT contain a digital clock manager (DCM); the DCM functionality is provided by the MMCMs. There are up to 9 CMTs per device, depending on your device density. The CMTs perform frequency synthesis, clock de-skew, and jitter filtering: basically everything the PLLs and DCMs have done in previous device families.

The MMCM accepts a very wide input frequency range, from 10 to 800 MHz, which allows very flexible frequency synthesis. The VCO frequency has also been improved: it can run at up to 1.6 GHz.

<click> As with the DCMs and PLLs in other FPGAs, you customize the MMCM resources with the Clocking Wizard.

5- <click> The MMCM has eight independently programmable counter outputs: counters 0 through 6 (CLKOUT0 through CLKOUT6) plus CLKFBOUT.

Counters 0 through 3 also provide complementary outputs. These give you a 180-degree phase-shifted copy of each clock automatically, without using up additional counters.

<click> Additional MMCM features include clock input switching, phase shifting, fractional clock division, and the Dynamic Reconfiguration Port. Clock switching allows you to seamlessly switch between CLKIN1 and CLKIN2 without resetting the MMCM. We will talk about these features in more detail a little later.

<click> There are two software primitives: MMCM_BASE and MMCM_ADV. The ADV primitive gives you access to the advanced features and allows clock switching between two clocks without using up a BUFGMUX.

6- This slide shows the die view of Virtex-6.

<click 1> The yellow bars are I/O columns.

<click 2> The blue bar is the central clock column; this column contains the MMCMs and the global clock resources (shown in green).

<click 3> The clock regions are organized as flat rectangles, divided down the center of the FPGA by the central column. In this view, the black lines separate the regions so that each region has an equal number of CLBs above and below its horizontal clock routing (commonly called the HROW, shown here as BUFHs). There are between 6 and 18 clock regions in the FPGA, depending on your device density.

<click 4> The global clock buffers (BUFGs) are in the middle of the chip. They drive the vertical spines of the global clock network. The horizontal spines of the global clock network run through the middle of each clock region and are driven by BUFH buffers.

<click 5> There are also regional clock routing resources, called BUFRs. In addition to clocking elements in a single region, a BUFR can also drive clocks into neighboring clock regions (one above and one below), so they are more than just a regional clocking resource.

<click 6> Clocks are driven up and down from the center row (HROW) of each clock region.

The BUFIO I/O clock buffers are placed within the I/O columns (denoted in yellow) and are designed for high performance, using highly optimized routing placed very near the I/O pins. Some BUFIOs are capable of spanning multiple regions. Each device has 2, 3, or 4 I/O columns depending on device density, which provides one or two columns of I/O pins per clock region.

Note that the I/O pins in each I/O column within a single clock region form their own I/O bank.

<click> Note that each MMCM tile is associated with two clock regions, so ideally those MMCMs would be used to generate the clock signals for the neighboring clock regions. This pairing of MMCMs with clock regions, where each region has a fixed number of CLBs, a fixed number of I/O pins, and a single I/O bank per column, reflects the ASMBL (Advanced Silicon Modular Block) architecture. Xilinx also used this approach in Virtex-5, and it means that as the density of the FPGA increases, you basically get more copies of the same clock region.

7- <click> This is a close-up view of the Virtex-6 FPGA clock region. Each clock region is 40 CLBs high and half the width of the device. Each clock region contains 12 BUFH buffers that drive the global clock network. Any 12 of the 32 BUFG clocks can drive the BUFH resources in each region.
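As a rough planning aid, these limits can be checked in software. The following is a minimal sketch, not a Xilinx tool: it assumes a hypothetical clock plan that maps illustrative region names to the global clocks each region needs, and flags any region that needs more than 12 global clocks (its BUFHs) or a design that needs more than 32 global clocks (the BUFGs). Actual clock placement is done by the implementation tools.

```python
# Limits as stated in this module (Virtex-6): 32 BUFGs per device,
# 12 BUFHs (horizontal global clock rows) per clock region.
MAX_BUFG = 32
MAX_BUFH_PER_REGION = 12


def check_global_clock_plan(plan: dict) -> list:
    """Check a hypothetical clock plan against the BUFG and BUFH limits.

    plan maps a clock-region name (e.g. "X0Y0", purely illustrative) to the
    set of global clock net names that must reach that region.
    Returns a list of violation messages; an empty list means the plan fits.
    """
    problems = []
    all_clocks = set().union(*plan.values())
    if len(all_clocks) > MAX_BUFG:
        problems.append(f"{len(all_clocks)} global clocks exceed {MAX_BUFG} BUFGs")
    for region in sorted(plan):
        used = len(plan[region])
        if used > MAX_BUFH_PER_REGION:
            problems.append(f"{region}: {used} clocks exceed {MAX_BUFH_PER_REGION} BUFHs")
    return problems


plan = {
    "X0Y0": {f"clk{i}" for i in range(12)},  # exactly at the per-region limit
    "X1Y0": {f"clk{i}" for i in range(13)},  # one clock too many
}
print(check_global_clock_plan(plan))  # ['X1Y0: 13 clocks exceed 12 BUFHs']
```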

<click> Each clock region has six regional clock networks, driven by the regional clock buffers in the local region or either of the neighboring regions. Each clock region has up to eight I/O clock networks per I/O column in the region (a clock region can have one or two I/O columns). There are four local I/O clock networks per I/O bank, driven by the I/O clock buffers in that bank.

There are two additional I/O clock networks driven from the bank above and two from the bank below (except in clock regions at the top or bottom of the die). This makes a total of 8 I/O clock networks per I/O column, and because a clock region can have two I/O columns, up to 16 I/O clock networks can be used in a region.
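The counting above can be written out as a small function. This is a sketch under stated assumptions: clock regions are numbered by row from the bottom of the die, and a region on the top or bottom edge is assumed to lose only the two networks that would have come from the missing neighboring bank.

```python
LOCAL_PER_BANK = 4  # I/O clock networks driven by BUFIOs in the local bank
FROM_NEIGHBOR = 2   # networks driven from the bank above (or from the bank below)


def io_clock_networks(region_row: int, num_rows: int, io_columns: int) -> int:
    """Count the I/O clock networks available in one clock region.

    region_row: 0 is the bottom row of clock regions, num_rows - 1 the top.
    io_columns: number of I/O columns crossing the region (1 or 2).
    """
    if not 0 <= region_row < num_rows:
        raise ValueError("region_row out of range")
    if io_columns not in (1, 2):
        raise ValueError("a clock region has one or two I/O columns")
    per_column = LOCAL_PER_BANK
    if region_row > 0:             # a bank exists below this region
        per_column += FROM_NEIGHBOR
    if region_row < num_rows - 1:  # a bank exists above this region
        per_column += FROM_NEIGHBOR
    return per_column * io_columns


print(io_clock_networks(region_row=2, num_rows=6, io_columns=2))  # 16
print(io_clock_networks(region_row=0, num_rows=6, io_columns=1))  # 6
```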

8- <click> Virtex-6 has a large global clock network that distributes up to 32 global clock signals across the entire FPGA. The global clock buffers reside in the center of the device, but their location is not something the designer needs to manage, since clock routing is handled by the tools.

<click> To bring these clocks into the device there are 8 global clock input pins, but these are NOT the only pins available for bringing a clock into the FPGA. There are also 4 clock-capable I/O pairs per I/O bank.

<click> The 32 clock buffers are placed on the die with 16 in the top half and 16 in the bottom half, but this only matters for certain connections to the BUFGs. Each of the 32 BUFGs drives one vertical spine of the global clock routing resources. To reduce clock skew, they actually drive to the center of the top half and the center of the bottom half before fanning out to the HCLK rows. But, as I said, the routing of the clocks is handled by the tools, so don't get preoccupied with this.

The global routing resources (BUFGs) can be driven by…<read slide>

Please note that the BUFGs can only be driven by clock-capable I/O (CCIO) pins in the center I/O columns.

<click> Over the years, the global clock buffer has been exposed through several different primitives. These include the BUFGMUX, which is supported in Spartan-6, and the BUFGCTRL, which is supported in Virtex-5. Virtex-6 supports the latest features of the BUFGCTRL, which we will discuss later.

9- <click> In general, it should not be necessary for designers to instantiate a BUFH primitive in their design. The implementation tools will use the BUFH primitive appropriately.

<click> The role of the BUFH primitive is to drive a horizontal global clock row into each clock region; it is effectively an extension of the global routing resources. Each clock region has 12 BUFHs, so up to 12 of the BUFG clocks can enter a region. The BUFH can distribute a clock at up to 800 MHz.

<click> The BUFH is driven by (read slide)

<click> There is also a BUFHCE primitive (a BUFH with a clock enable). This is helpful for turning off the clock to a region to help minimize power consumption.

When using the BUFH or BUFHCE with the ISE® 11.3 software, an area constraint may be required to keep all of the clock loads within a single clock region.

10- Regional clock buffers (BUFRs) serve their own clock region and the vertically adjacent clock regions. They act as additional routing resources for clock signals that do not need to reach many clock regions (at least not beyond the immediately neighboring ones). The regional clock networks reach the clock inputs of all the synchronous elements within the region.
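To make the reach of a regional clock concrete, here is a minimal sketch that lists which clock-region rows a BUFR can serve, given the one-above, one-below rule from earlier in this module. It assumes rows are numbered from the bottom of the die within one half of the device; the function names are hypothetical.

```python
def bufr_reachable_rows(row: int, num_rows: int) -> list:
    """Clock-region rows a BUFR placed in `row` can clock: its own row plus
    the row directly above and the row directly below, clipped at the edges
    of the die. Rows are numbered 0 (bottom) to num_rows - 1 (top)."""
    if not 0 <= row < num_rows:
        raise ValueError("row out of range")
    return [r for r in (row - 1, row, row + 1) if 0 <= r < num_rows]


def bufr_can_serve(row: int, load_rows: set, num_rows: int) -> bool:
    """True if every row containing loads is reachable from a BUFR in `row`."""
    return set(load_rows) <= set(bufr_reachable_rows(row, num_rows))


print(bufr_reachable_rows(0, 6))     # [0, 1]
print(bufr_can_serve(3, {2, 4}, 6))  # True
print(bufr_can_serve(3, {1, 3}, 6))  # False: row 1 is two rows away
```

If the loads span more rows than this, a global clock (BUFG) is the more appropriate resource.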

<read slide>

11- The I/O clock buffers are designed to route clocks within I/O columns.

<read slide>

The four clock-capable I/O pairs in each I/O bank are identified by the letters CC in the pin name.

The pins that connect to the BUFIOs that can drive multiple regions are designated MRCC.

Single-region pins are designated SRCC.

The I/O clock networks can drive only the clock pins of the following IOB resources:

• DDR and SDR flip-flops in the ILOGIC/OLOGIC blocks.

• ISERDES and OSERDES resources.

Because of their low fanout, they have extremely low skew and a very short insertion delay, which makes them ideal for source-synchronous interface applications.

12- Source Synchronous Interfaces can be easily built with Virtex-6.

<read slide>

The BUFIO and BUFR in a region have the same driver: the clock-capable I/O (CCIO) pin. They are intended to be used together to create the two clocks (high speed and low speed) required by the SERDES, which explains why the BUFR can divide the clock. The outputs of the BUFIO and BUFR are guaranteed to be in phase, allowing proper clock crossing between CLK and CLKDIV in the ISERDES/OSERDES.
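The relationship between the two clocks is simple arithmetic. This minimal sketch derives the BUFIO clock (CLK), the BUFR divide value, and the resulting CLKDIV for one serial lane. It assumes SDR captures one bit per CLK cycle and DDR captures two, and that the BUFR divide value is limited to 1 through 8; check the user guide before relying on either assumption. The 800 Mb/s, 8-bit DDR numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SerdesClocks:
    bufio_mhz: float  # high-speed clock (CLK) from the BUFIO
    bufr_divide: int  # divide value programmed into the BUFR
    bufr_mhz: float   # low-speed parallel clock (CLKDIV) from the BUFR


def serdes_clocks(bit_rate_mbps: float, data_width: int, ddr: bool) -> SerdesClocks:
    """Derive BUFIO/BUFR clocks for one serial lane (assumptions in lead-in)."""
    bits_per_cycle = 2 if ddr else 1
    if data_width % bits_per_cycle:
        raise ValueError("DDR data width must be even")
    divide = data_width // bits_per_cycle
    if not 1 <= divide <= 8:
        raise ValueError("required BUFR divide is outside the assumed 1..8 range")
    clk = bit_rate_mbps / bits_per_cycle
    return SerdesClocks(bufio_mhz=clk, bufr_divide=divide, bufr_mhz=clk / divide)


print(serdes_clocks(800.0, 8, ddr=True))
# SerdesClocks(bufio_mhz=400.0, bufr_divide=4, bufr_mhz=100.0)
```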

Please note that all of these resources (the clock buffer, the networks, the regional driver, and the ISERDES) are located in one clock region in one column. This is how extremely high speeds can be achieved with the I/O resources in Virtex-6.

13- The performance path routing enables fast I/O interfaces. This has been optimized in the silicon and is handled automatically by the implementation tools.

<read slide>

Please note that if you connect MMCM outputs other than O0-O3 to the BUFIO, BUFR, or GTX, those connections will use normal routing resources, and your performance will degrade significantly.

14- <read slide>

15- <read slide>

16- <read slide>

Depending on the specific device, a clock region can have one or two I/O columns. This will affect the number of I/O pins, BUFIOs and BUFRs per clock region. It does not affect the number of global (12) or regional (6) clock networks per clock region.

17- The MMCM uses PLLs for all of these features. It can replace external PLLs to lower your system cost. This slide is pretty much review from what we have already learned.

<read slide>

MMCMs are located in the center column of the device. The PLL is designed to filter out input clock jitter. The MMCM_ADV primitive supports the dynamic input selection and dynamic phase shifting features.

18- This diagram shows the internals of the MMCM and helps explain how it works.

The first portion shows how the clock switching functionality is built in to allow synchronous switching of the source clocks.

<click> The selected input clock is then fed to a programmable counter/divider (denoted by the letter D). The divided clock is then fed to a

<click> phase-frequency detector (denoted as PFD), which compares it with the feedback clock (that's the clock that has been selected to route back to the MMCM input for clock input compensation).

The phase-frequency detector accepts an input frequency of up to 650 MHz. Clock-stop detection circuitry is also associated with the phase-frequency detector, so the MMCM can determine whether either the input clock or the feedback clock has stopped.

<click> The charge pump drives its output voltage higher or lower based on the phase-frequency detector's comparison of the input and feedback clocks. The charge pump output controls the VCO frequency, which can be up to 1.6 GHz.

<click> This is then followed by the loop filter…

<click> and the voltage-controlled oscillator. The oscillator has 8 programmable counters associated with it that allow the desired clocks to be made.

<click> Many different output frequencies can be generated from the programmable multiplier (M) and divider (D and O) values that you set for each MMCM; each MMCM can have its own values. M can range from 1 to 64, D from 1 to 80, and O from 1 to 128.
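To see how M, D, and O interact, recall the usual PLL relationships: F_PFD = F_IN / D, F_VCO = F_IN × M / D, and F_OUT = F_VCO / O. The sketch below brute-forces the setting closest to a target frequency, using the M, D, and O ranges, the 650 MHz PFD limit, and the 1.6 GHz VCO limit quoted in this module. The 600 MHz VCO minimum and 10 MHz PFD minimum are assumptions not stated here; confirm all limits against the data sheet for your speed grade. In practice the Clocking Wizard does this for you.

```python
import math
from typing import NamedTuple, Optional

# Ranges as quoted in this module. VCO minimum and PFD minimum are assumptions.
M_RANGE = range(1, 65)
D_RANGE = range(1, 81)
O_MIN, O_MAX = 1, 128


class MmcmSetting(NamedTuple):
    m: int
    d: int
    o: int
    f_vco: float  # MHz
    f_out: float  # MHz


def find_mmcm_setting(f_in: float, f_target: float, *,
                      vco_min: float = 600.0, vco_max: float = 1600.0,
                      pfd_min: float = 10.0, pfd_max: float = 650.0) -> Optional[MmcmSetting]:
    """Brute-force M, D, O so that f_in * M / (D * O) is closest to f_target."""
    if not 10.0 <= f_in <= 800.0:
        raise ValueError("input clock outside the 10-800 MHz range")
    if f_target <= 0:
        raise ValueError("target frequency must be positive")
    best, best_key = None, None
    for d in D_RANGE:
        f_pfd = f_in / d                 # frequency seen by the PFD
        if not pfd_min <= f_pfd <= pfd_max:
            continue
        for m in M_RANGE:
            f_vco = f_pfd * m            # VCO frequency
            if not vco_min <= f_vco <= vco_max:
                continue
            ratio = f_vco / f_target
            # Try the two integer output dividers that bracket the ideal ratio.
            for o in sorted({math.floor(ratio), math.ceil(ratio)}):
                o = min(max(o, O_MIN), O_MAX)
                f_out = f_vco / o
                key = (abs(f_out - f_target), d, m, o)
                if best_key is None or key < best_key:
                    best, best_key = MmcmSetting(m, d, o, f_vco, f_out), key
    return best


print(find_mmcm_setting(100.0, 125.0))
# MmcmSetting(m=10, d=1, o=8, f_vco=1000.0, f_out=125.0)
```

Ties are broken toward the smallest D and then the smallest M; that is an arbitrary choice for this sketch, not a recommendation.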

Note that the Clocking Wizard is used to customize the MMCM resources.

19- <read slide>

Please note that when a division occurs, the duty cycle can vary, but the division will be correct.

Static phase shift amounts can be set independently on each MMCM output. Using the dynamic phase shift port will adjust the phase shift on all MMCM outputs in addition to any static phase shift amount that is configured on each output.
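A phase shift in degrees only becomes a time delay once you know the output frequency, since the delay is (phase / 360) × period. This minimal sketch does that conversion and adds a dynamic shift on top of the static one, as described above. Expressing the dynamic shift in picoseconds is an assumption made for illustration; the real dynamic phase shift port works in hardware-defined steps. The 180-degree case also shows what the complementary outputs of counters 0 through 3 provide: a half-period delay.

```python
def phase_to_ps(phase_deg: float, f_out_mhz: float) -> float:
    """Convert a phase shift in degrees of the output clock to picoseconds."""
    period_ps = 1e6 / f_out_mhz  # period in ps when f is given in MHz
    return (phase_deg % 360.0) / 360.0 * period_ps


def total_shift_ps(static_deg: float, f_out_mhz: float, dynamic_ps: float) -> float:
    """Static per-output shift plus a (hypothetical, in ps) dynamic shift."""
    return phase_to_ps(static_deg, f_out_mhz) + dynamic_ps


print(phase_to_ps(90.0, 250.0))           # 1000.0
print(phase_to_ps(180.0, 100.0))          # 5000.0 (half of a 10 ns period)
print(total_shift_ps(90.0, 250.0, 50.0))  # 1050.0
```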

20- Other additional MMCM outputs include…<read slide>

21- <read slide>

Note that the ability to drive clock-capable I/O to the MMCM and BUFG directly decreases the need for dedicated clock input pins.

22- This slide shows you a simple way to de-skew your distributed clock. Note that this really means compensating for the clock's input delay; there will always be some clock distribution skew due to the clock routing resources.

<describe circuit>

<read slide>

The BUFG in the feedback path ensures that the feedback clock experiences the same delay as the CLKOUT0 clock. This maintains a known phase relationship between the input and output clocks.

Removing the BUFG from the feedback path will save resources, but also introduces a phase difference between the input and output clocks. This situation is acceptable if the MMCM is being used for frequency synthesis or jitter filtering, and an exact phase alignment is not necessary.

23- <read slide>

This configuration is possible because there is a dedicated connection between the MMCMs in the CMT. By design, this saves a global buffer and allows the designer to generate a wider range of custom clock signals.
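When one MMCM drives another, the overall frequency ratio is the product of the two stages: (M1 / (D1 × O1)) × (M2 / (D2 × O2)). This minimal sketch assumes the first MMCM's output feeds the second MMCM's input directly and checks each stage against the 10 to 800 MHz input range and the VCO range (the 600 MHz VCO minimum is an assumption, not stated in this module). The numbers are hypothetical and only illustrate the arithmetic.

```python
def mmcm_out(f_in: float, m: int, d: int, o: int,
             vco_min: float = 600.0, vco_max: float = 1600.0) -> float:
    """Output frequency (MHz) of one MMCM stage, checking input and VCO ranges."""
    if not 10.0 <= f_in <= 800.0:
        raise ValueError(f"input {f_in:.3f} MHz outside 10-800 MHz")
    f_vco = f_in * m / d
    if not vco_min <= f_vco <= vco_max:
        raise ValueError(f"VCO {f_vco:.3f} MHz outside {vco_min}-{vco_max} MHz")
    return f_vco / o


def cascade(f_in: float, first: tuple, second: tuple) -> float:
    """Feed the output of the first MMCM straight into the second one.
    first and second are (M, D, O) tuples."""
    return mmcm_out(mmcm_out(f_in, *first), *second)


# 100 MHz -> (800 MHz VCO / 5) = 160 MHz -> (640 MHz VCO / 5) = 128 MHz
print(cascade(100.0, (8, 1, 5), (4, 1, 5)))  # 128.0
```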

24- <read slide>

This is useful for creating a large number of phase-aligned clock frequencies.

25- This is a typical example of one MMCM making multiple clock sources and demonstrates the power and flexibility of the MMCM.

<describe circuit>

The CLKOUT1 output can also drive a BUFR in parallel with the BUFIO if SERDES is being used in the I/O logic.

26- Now let's wrap this module up with a summary. <read slide>

27- There are some very useful resources available to you on support.xilinx.com. The Virtex-6 User Guide has been mentioned numerous times in this REL. It describes the complete architecture in extreme detail, and most customers find it very helpful, so I strongly encourage you to check this document out.

When you go to support.xilinx.com, you will notice that searching for documentation by device family shows you all of the current documents for the selected family, so you will probably find other documents that interest you.

If you would like to see what other courses we offer, including the Designing with Virtex-6 FPGA Family course, or what other free RELs are available, go to the Xilinx Education link you see here.

I would also like to mention again that there are architecture modules available that discuss the basics of Xilinx's newest devices. You may find these useful, especially if you want to learn more about the differences between devices.

But whatever you do, please take a second and let us know what you thought of this REL. Just click on the icon on the next page and tell us what you think.

My name is Frank Nelson. You have been listening to the Basic FPGA Architecture (Virtex-6) Clocking Resources REL. Thanks for listening.

30- (nothing said)
