>> Ray Bittner: Good morning. My name is Ray Bittner. I have the pleasure of introducing
Don Matson today. He's a local Xilinx field applications engineer. He's offered to do a
series of three talks for us on the newest Xilinx tools, and today we'll be talking about
Virtex-6 and Spartan-6.
Don?
>> Don Matson: Thank you, Ray.
Good morning. And as Ray said, I'm going to be talking about Virtex-6 and Spartan-6.
These are our newest families. I'll start by talking about Spartan-6 and then I'll cover
Virtex-6, and at the end I'll have a little bit of a summary.
If you do have questions, go ahead and ask. We'll try to make this a little bit interactive in case I
miss something.
So with that, let's get started with Spartan-6. And normally when I talk about Spartan-6
people say, What happened to Spartan-4 and what happened to Spartan-5? And you
can see I've listed a few responses: some people think that our marketing department
couldn't count, some think it's just because it's better than Spartan-3, others think it's because
Xilinx wanted to highlight the commonality with Virtex-6, and the final choice is that it is the sixth
generation of Spartan devices.
So if you count generations and go back, this really is the sixth generation of Spartan
devices, and Spartan devices have always been optimized to deliver a balance of cost,
power, and performance. The original ones were probably aimed more at cost and
performance, and as we've gotten into these later architectures, power has become a
bigger issue. So that is Spartan-6.
And as I go through here, you will see, for those who have done FPGA work, that
Spartan-6 and Virtex-6 are really derivatives of Virtex-5. So that's part of the reason for
calling it Spartan-6 as well.
So Spartan-6 is on a 45-nanometer process. It is a low-power process. For those who
have used our FPGAs, we've always used what would be considered either the general
purpose or the high-performance process before. So this is the first FPGA that we've
done using the low-power process.
You'll see that we're going to offer a couple of different platforms, an LX series device and
an LXT series device. The T stands for transceivers, which means our gigabit
transceivers are in there. And you can see we have a rollout in mid-2009 for those
devices. When I get to the family chart, I'll talk a little bit more about our
rollout schedule for hardware, software, and documentation.
So, as I said, it is a 45-nanometer process. And one of the things you will see is that,
as we move to 45 nanometers, there are some significant advantages from the power
standpoint as well. Let's see, we'll just move on. So: Spartan devices as opposed to Virtex
devices.
Spartan devices are really aimed at the high-volume market. That's what we're
targeting. And you can see in this slide that we've had very good growth in our Spartan
families over this last decade. We're also going to talk a little
bit about what we're doing in packaging.
So just to make sure everybody's aware: Spartan is aimed at the
high-volume market, and Virtex is for the customers who are really after significant
performance. That's how we're positioning the two families.
That said, here's what we see as the market needs for that high volume. We
need to minimize application cost. One way we do that is by minimizing the number of power
supplies you need. In the Spartan family you need a VCCINT supply to power the
core, which in Spartan-6 is 1.2 volts, and a VCCO supply for your I/Os, which can be
anything from 3.3 volts all the way down to 1.2 volts.
In addition to that, there's a VCCAUX supply that certain I/O standards and certain
functions use as a dedicated supply. That supply can be either 2.5 volts or 3.3 volts. So if I
need just a 3.3-volt system, I only need two power supplies: one 3.3-volt supply for VCCO and
VCCAUX, and then a supply for VCCINT. That helps with the cost.
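As a rough illustration of the supply-count argument, the sketch below collects the distinct rail voltages a board would need. It assumes, for illustration only, that any two supplies at the same voltage can share one regulator (real boards may still separate them for noise reasons). The voltage ranges are the ones quoted in the talk, and `distinct_rails` is a hypothetical helper.

```python
def distinct_rails(vccint, vccaux, vcco_banks):
    """Return the sorted set of distinct supply voltages (volts) for a board.

    Assumption: supplies at the same voltage share one regulator.
    Ranges follow the talk: VCCAUX is 2.5 V or 3.3 V, VCCO is 1.2 V to 3.3 V.
    """
    if vccaux not in (2.5, 3.3):
        raise ValueError("VCCAUX must be 2.5 V or 3.3 V")
    for v in vcco_banks:
        if not 1.2 <= v <= 3.3:
            raise ValueError(f"VCCO of {v} V is outside 1.2-3.3 V")
    return sorted({vccint, vccaux, *vcco_banks})


# An all-3.3 V I/O design: VCCO and VCCAUX share one rail, VCCINT is the other.
print(distinct_rails(1.2, 3.3, [3.3, 3.3, 3.3, 3.3]))
```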
The other thing we'll talk about is how we get additional bandwidth.
In particular, with the Spartan device it isn't so much how much I can run inside the
device or how fast I can run inside the device; it's more about how I can get data
onto the chip. That's been the main emphasis in improving performance on
Spartan-6.
And then when we get to the slides on power you'll see where we've made significant
improvements there, and then when we look at the device family, you'll see that it's more
than 2x the size of our previous Spartan devices.
So let's talk a little bit about some of the things we're doing to keep cost low. First,
when we look at the packages, you'll see we break these into a few
groups. There's the old TQ package. That's for the guys doing prototypes, or for
applications like a flat panel display, where I've got a lot of board area and I
want to minimize board layers, so having a TQ package makes sense.
Then this is probably where most of our customers are, the 1-millimeter ball pitch
packages. And then we have some chip scale packages here, which are
0.8-millimeter ball pitch. There is some additional packaging coming on Spartan,
particularly to put more logic into a small space, but these are the initial
packages that we'll roll out.
In addition to packaging, some things that we can -- yes?
>>: In the chip scale, is that like a bare die or is there a [inaudible]?
>> Don Matson: The question was, in the chip scale package, is it a bare die? In the chip
scale package, and in fact all of these packages, these are really wire bond packages. This slide
kind of shows it looking more like a flip chip, but really the die is upside down on the
package, there are wire bonds out, and then it's got a lid on top of the package. So
that's all of these devices.
>>: [inaudible] the die is bonded to that metal cover of that [inaudible].
>> Don Matson: Yes.
So the question was about thermal resistance: is there a cover on top of the die? Yes,
there is a cover on top of the die, and you can put a heat sink on top of that, which
does help with the thermal impedance. But if you compare these devices to the Virtex
devices, all the Virtex devices are flip chip packages, where the majority of the heat is
actually conducted out through the leads, and the Virtex devices offer much higher
thermal performance. So hopefully that answers that question.
Also, to help with reducing system costs, we've started implementing a significant number
of hard blocks inside the devices, and this slide shows some of the blocks
that are in the Spartan FPGA. The first one it shows is the SDRAM memory controller. I'll talk
about it a little bit more in a few slides, but the important thing is that we've actually
made a hard controller and put it on the Spartan device, so that saves you logic. It also
saves you development time because it's much easier to use.
We introduced the PCI Express Endpoint block in Virtex-5, where it can be x1, x2, x4,
or x8 Gen 1 PCI Express. In Spartan-6 it is an x1 Gen 1 endpoint, so 2.5 Gb/s.
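To put the "2.5 gig" figure in context, the sketch below converts the Gen 1 line rate into usable data bandwidth. Gen 1 PCI Express uses 8b/10b encoding, so only 8 of every 10 line bits carry data; the numbers are per direction and still ignore packet and protocol overhead, so real throughput is lower. The function name is illustrative.

```python
# Gen 1 PCI Express signals at 2.5 GT/s per lane and uses 8b/10b encoding,
# so 8 of every 10 bits on the wire are data.
GEN1_GTPS_PER_LANE = 2.5
ENCODING_EFFICIENCY = 8 / 10


def gen1_bandwidth_gbps(lanes):
    """Per-direction data rate after 8b/10b, before packet and protocol overhead."""
    return GEN1_GTPS_PER_LANE * ENCODING_EFFICIENCY * lanes


for lanes in (1, 2, 4, 8):
    gbps = gen1_bandwidth_gbps(lanes)
    print(f"x{lanes}: {gbps:.1f} Gb/s = {gbps * 1000 / 8:.0f} MB/s per direction")
```

So the Spartan-6 x1 endpoint works out to roughly 2 Gb/s (about 250 MB/s) each way before overhead.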
In addition, we have put DSP blocks in here. For people doing signal processing,
the original Virtex-II Pro put in hard 18-by-18 multipliers. In Virtex-4 we
introduced a DSP block, what we call the DSP48. It's basically a multiplier followed by an
accumulator.
In Spartan-6 and in Virtex-6 that multiplier-accumulator structure also has a preadder in
front of it. So if you think about classical FIR filters or FFTs, it's often quite nice to have that preadder.
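One common reason a preadder helps FIR filters is coefficient symmetry: a linear-phase filter has h[k] == h[N-1-k], so the two samples that share a coefficient can be added first and multiplied once, roughly halving the multiplies. The sketch below is a generic software illustration of that folding, checked against a direct-form reference; it is not Xilinx's filter IP, and the function names are hypothetical.

```python
def fir_direct(x, h):
    """Reference direct-form FIR: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


def fir_symmetric_preadd(x, h):
    """Same output for symmetric h, using a pre-add before each multiply.

    Uses len(h)//2 + len(h)%2 multiplies per output instead of len(h).
    """
    n_taps = len(h)
    if any(h[k] != h[n_taps - 1 - k] for k in range(n_taps)):
        raise ValueError("coefficients must be symmetric")

    def sample(i):
        return x[i] if i >= 0 else 0  # zero before the first input

    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(n_taps // 2):
            pre = sample(n - k) + sample(n - (n_taps - 1 - k))  # pre-adder
            acc += h[k] * pre                                   # multiply-accumulate
        if n_taps % 2:  # odd length: the centre tap has no partner
            acc += h[n_taps // 2] * sample(n - n_taps // 2)
        y.append(acc)
    return y


x = [1, 0, 0, 2, -1, 4]
print(fir_symmetric_preadd(x, [1, 3, 5, 3, 1]) == fir_direct(x, [1, 3, 5, 3, 1]))
```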
It's a hard silicon block, and the blocks can be cascaded. I'll talk a little bit more about it in the
Virtex-6 section. There are some differences between the two,
but both devices have the multiplier and the accumulator, and they have many of these blocks in
there.
Another thing we've done in Spartan land to make things easier, and we actually do
this in Virtex as well, is make it easier to configure from a commodity SPI PROM or
flash. We started making that switch a few years ago, and we now support SPI PROMs
from multiple vendors. Although you might not want to buy them from Spansion
right now. They're not doing so well.
But the SPI PROMs have added capabilities now. They've gone from x1 to x2 and x4 data widths.
We've also put some intelligence into the Spartan FPGA, so if you tell it to configure from an
SPI PROM, you no longer need to set the variant select or vendor select pins,
because the FPGA interrogates the SPI PROM, determines which part is out there,
and then does the appropriate things to pull the bitstream off.
The other thing we're showing here is that you can put multiple images in
that flash and have the Spartan device boot from one image, do some checking,
and then boot from a second image if you want. Those
things have been worked on for a while now, and we've made some improvements there.
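The multi-image idea can be pictured as a "golden plus update" flash layout with a fallback rule. The sketch below is a hypothetical software model of that decision only: `FlashImage`, the CRC check and the preference order are illustrative assumptions, and the real selection is done by the FPGA's configuration logic, not by user code like this.

```python
import zlib
from dataclasses import dataclass


@dataclass
class FlashImage:
    """One configuration image stored in SPI flash (hypothetical model)."""
    name: str
    payload: bytes
    stored_crc: int


def image_is_valid(img: FlashImage) -> bool:
    """Assumed integrity check: CRC-32 of the payload must match the stored value."""
    return zlib.crc32(img.payload) == img.stored_crc


def choose_boot_image(golden: FlashImage, update: FlashImage) -> str:
    """Prefer the update image; fall back to the golden image if it fails the check."""
    if image_is_valid(update):
        return update.name
    if image_is_valid(golden):
        return golden.name
    raise RuntimeError("no valid configuration image found")


golden = FlashImage("golden", b"known-good bitstream", zlib.crc32(b"known-good bitstream"))
corrupt = FlashImage("update", b"new bitstream", zlib.crc32(b"new bitstream") ^ 0xFFFFFFFF)
print(choose_boot_image(golden, corrupt))
```

Keeping a known-good golden image is what makes a field update safe: a bad update costs a reboot into the old design rather than a dead board.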
The next area I wanted to talk about is the I/Os: what have we done in the Spartan-6 I/Os to
get higher speeds? You'll see we talk about 1.0 Gb/s on the
general purpose LVDS I/Os. Those who have used high-speed LVDS know that the
challenge usually lies not in capturing the signal at the first register, but in getting it
from that first register, at whatever speed it's running, into a parallel form so the rest of
the fabric can handle it. I'll talk a little bit about that.
And then, of course, we've talked about the gigabit transceiver, so I'll talk about that.
Just so you're aware, here are all the memory interfaces that are supported by Spartan.
These last ones here in Spartan-6 are supported in the LXT devices,
where the T stands for transceivers, so those are the ones that are supported using the
transceivers.
The general purpose I/Os look like this, and I want to make sure everybody is aware that
our general purpose I/Os are fully 3.3-volt tolerant. If you've used Spartan before, you
know that Spartan-3A can do hot socketing, or be used in a hot swap application,
and withstand full 3.3-volt I/Os, and we're doing the same thing in Spartan-6. I need to
contrast that a little bit with Virtex-6, because in Virtex-6 the maximum I/O
voltage the I/Os will withstand is 2.5 volts. That was done to improve performance there.
Those who have used either Virtex-4 or Virtex-5 will recognize this block. Off
over here is my output buffer, and over here is the input buffer. And over here I have
the ILOGIC block, which is my deserializer. This is on every I/O pin. The pins are
paired together, with the idea that if I want to bring in LVDS or some other differential
signal, I need to have it come in on a pair.
So these two are paired together: this would be the P input and this would
be the N input. I can make this one the master deserializer and this one the
slave, and with that I can deserialize one to two, one to three, one to four, five, six,
seven, or eight, so any of those ratios. And that's how the deserializer works in combination.
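Functionally, a 1:N deserializer regroups a serial bit stream into N-bit parallel words. The sketch below models that regrouping plus the talk's rule that ratios above four need the master and slave of a differential pair. Bit ordering (first bit received becomes the MSB) and both function names are assumptions for illustration, not the ISERDES primitive's actual port behavior.

```python
def deserialize(bits, ratio):
    """Group a serial bit stream into parallel words of `ratio` bits.

    Illustrative model only: the first bit received becomes the MSB of each
    word (an assumption), and trailing bits that do not fill a word are dropped.
    """
    if not 2 <= ratio <= 8:
        raise ValueError("ratio must be 2..8")
    words = []
    for start in range(0, len(bits) - ratio + 1, ratio):
        word = 0
        for bit in bits[start:start + ratio]:
            word = (word << 1) | bit
        words.append(word)
    return words


def serdes_blocks_needed(ratio, differential):
    """Per the talk: one pin's deserializer handles up to 1:4; ratios of
    5 to 8 cascade the master and slave of a differential pair."""
    if not 2 <= ratio <= 8:
        raise ValueError("ratio must be 2..8")
    if ratio <= 4:
        return 1
    if not differential:
        raise ValueError("ratios above 4 need a differential pin pair")
    return 2


print(deserialize([1, 0, 1, 1, 0, 0, 1, 0], 4))  # two 4-bit words
print(serdes_blocks_needed(8, differential=True))
```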
If I just bring a signal in single-ended, one to four is the maximum. For example, we
talk about DDR3. That's a single-ended standard, and it will come in at 400
megahertz. It's DDR, so deserializing four to one means that coming out of here this
deserializer will give me four bits at 200 megahertz going to the fabric or the hard
endpoint, just so everybody's aware of that.
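The clock-rate arithmetic in that DDR3 example generalizes: the serial bit rate is the I/O clock times two for DDR, and a 1:N deserializer divides that by N on the parallel side. The sketch below is a minimal calculator for it; the second case assumes 1.0 Gb/s LVDS is carried on a 500 MHz DDR clock, which is an illustrative assumption rather than a figure from the talk.

```python
def parallel_clock_mhz(io_clock_mhz, ddr, ratio):
    """Clock rate of the parallel words leaving a 1:ratio deserializer.

    The serial bit rate is the I/O clock times 2 for DDR (both edges) or 1 for SDR.
    """
    bit_rate_mbps = io_clock_mhz * (2 if ddr else 1)
    return bit_rate_mbps / ratio


# DDR3 at 400 MHz: 800 Mb/s per pin, 1:4 -> four bits per 200 MHz cycle.
print(parallel_clock_mhz(400, ddr=True, ratio=4))   # 200.0
# Assumed 1.0 Gb/s LVDS on a 500 MHz DDR clock, 1:8 -> 125 MHz.
print(parallel_clock_mhz(500, ddr=True, ratio=8))   # 125.0
```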
Likewise, that was the input logic. There's also a serializer over here to do the same
thing going out, and this is trying to show that I have a data path, and the tri-state
enable also needs the ability to be serialized as well. So those are those two
blocks.
And then the last block here is the I/O delay. In Virtex-4 we introduced the I/O
delay, and this is a similar function. It allows me to come in through this
dynamic reconfiguration port and programmably adjust where my input delay tap is. That is
probably the biggest change in the I/O for Spartan, and it will give us the ability to do
gigabit LVDS or the high-speed memories.
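A typical use of a programmable input delay is eye centering: sweep the taps while receiving a known training pattern, note which taps sample correctly, and park the delay in the middle of the widest passing window. The sketch below shows that generic calibration idea; `sample_ok` is a hypothetical stand-in for real hardware feedback, and this is not claimed to be the algorithm used by Xilinx's IP.

```python
def center_of_eye(num_taps, sample_ok):
    """Return the tap at the centre of the widest run of passing taps.

    sample_ok(tap) -> bool is a hypothetical callback that reports whether the
    training pattern was captured correctly at that delay setting.
    """
    best_start, best_len = 0, 0
    run_start = None
    for tap in range(num_taps + 1):          # extra pass closes a run ending at the last tap
        passing = tap < num_taps and sample_ok(tap)
        if passing and run_start is None:
            run_start = tap                  # a passing window opens
        elif not passing and run_start is not None:
            if tap - run_start > best_len:   # a window closes; keep the widest
                best_start, best_len = run_start, tap - run_start
            run_start = None
    if best_len == 0:
        raise RuntimeError("no tap setting sampled the pattern correctly")
    return best_start + best_len // 2


# Example with 32 taps where taps 10..25 sample cleanly.
print(center_of_eye(32, lambda tap: 10 <= tap <= 25))
```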
Since we're talking about I/Os, let's look at the transceiver block. This is just
a high-level view of what we call the GTP in Virtex-5 and in Spartan-6. It's not exactly the
same transceiver; it was ported from Virtex-5 to Spartan-6, from a 65-nanometer process to a
45-nanometer process, which is one of the reasons we believe that we will have very
good success. It's something that we've been producing in volume for quite a while,
and we're ready to make it available in our Spartan devices.
There are a couple of differences that are important for the transceivers. I don't know
how many of you have used the transceivers, but those people who have
used the gigabit transceivers in Virtex-5 know that our transceivers are paired in a tile. A tile
contains two transceivers, and in that tile there is one PLL, so both
transceivers can work off of that one PLL if their clock rates are related by
nice integer divider ratios.
In Spartan-6 and in Virtex-6 we've changed the structure and we've added a PLL for the
TX and a PLL for the RX. Now, the PLL on the RX path is optional, so if you're running
PCI Express, you probably don't want to bother to turn it on. It's going to just burn a little
extra power. But it does give me the ability to take PCI Express in on one channel, which
is running at 2.5 gig, and then run the other channel or the other transceiver at something
else, like a video HD-SDI rate. So that's our transceiver.
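The sketch below illustrates why a single shared PLL is restrictive: two line rates can share it only if both are reachable from one PLL rate through the per-channel dividers. The divider set {1, 2, 4} is an assumption for illustration; the real dividers and PLL ranges are device specific. PCI Express Gen 1 (2.5 Gb/s) and HD-SDI (1.485 Gb/s) fail the check, which is the case the separate TX and RX PLLs address.

```python
from fractions import Fraction

# Assumed divider set for illustration only; the real per-channel dividers
# and PLL ranges are device specific.
CHANNEL_DIVIDERS = (1, 2, 4)


def can_share_pll(rate_a_gbps, rate_b_gbps, dividers=CHANNEL_DIVIDERS):
    """True if one PLL rate can produce both line rates via the dividers.

    Rates are given as strings (e.g. "2.5") so Fraction keeps them exact.
    """
    a, b = Fraction(rate_a_gbps), Fraction(rate_b_gbps)
    for div_a in dividers:
        pll_rate = a * div_a            # candidate shared PLL output
        if any(pll_rate / div_b == b for div_b in dividers):
            return True
    return False


print(can_share_pll("2.5", "1.25"))    # True: same PLL, divide by 2
print(can_share_pll("2.5", "1.485"))   # False: PCIe Gen 1 vs HD-SDI
```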
I mentioned before that we've put in a hard memory controller. This hard memory
controller is only in Spartan devices. The controller's interface
to the memory can be 4, 8, or 16 bits wide; if I configure it for a 16-bit memory
interface, that's the maximum I can do. On the user side I get six 32-bit programmable
ports to the fabric. So if I had a MicroBlaze or another processor, I could have the processor
hooked up to one of those ports.
The ports can be write-only, read-only, or read/write, and it's a very simple FIFO-type
interface, so it's relatively easy to use. This will make it easier for our
customers to hook up to these high-performance memories and get the design up and
running quickly.
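To make the multi-port, FIFO-style user interface concrete, here is a toy software model: six ports, each configured as read-only, write-only or read/write, each with its own command FIFO, serviced round-robin. The round-robin policy, the `PortArbiter` class and its methods are all hypothetical; this is not the hard controller's actual arbitration scheme or port protocol.

```python
from collections import deque


class PortArbiter:
    """Toy model of a multi-port memory controller front end.

    Each user port has a direction ("r", "w" or "rw") and a FIFO of pending
    commands; ports are serviced round-robin. Illustrative only: this is not
    the hard controller's actual arbitration scheme.
    """

    def __init__(self, directions=("rw",) * 6):
        self.directions = list(directions)
        self.fifos = [deque() for _ in self.directions]
        self.next_port = 0

    def push(self, port, op, addr):
        """Queue a read ("r") or write ("w") command on a port."""
        if op not in ("r", "w") or op not in self.directions[port]:
            raise ValueError(f"port {port} does not accept {op!r} commands")
        self.fifos[port].append((op, addr))

    def grant(self):
        """Pop the next command round-robin; return (port, (op, addr)) or None."""
        n = len(self.fifos)
        for i in range(n):
            port = (self.next_port + i) % n
            if self.fifos[port]:
                self.next_port = (port + 1) % n
                return port, self.fifos[port].popleft()
        return None


# Port 0 might be a processor (read/write), port 1 a read-only video engine, etc.
arb = PortArbiter(("rw", "r", "w", "rw", "rw", "rw"))
arb.push(0, "r", 0x100)
arb.push(3, "w", 0x200)
print(arb.grant(), arb.grant(), arb.grant())
```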
And then the last area I wanted to talk about is twice the capabilities at half the power. And
I'm going to start here by looking at the fabric, contrasting it with what came before. This is where
Spartan-3 is today: you'll see a 4-input LUT followed by a register, and you'll see that
Virtex-4 also had that same structure. In Virtex-5 we went to a 6-input LUT followed by a flip-flop.
It's actually six inputs and two outputs, and that second output was
just available to the fabric.
And we've done some studies and we said, hey, for a lot of designs where we want to
increase the performance, having that extra flip-flop here is a very easy thing to add, and
so we've added that.
It also gives us an advantage when I make this LUT into distributed memory: I can now
put twice as much distributed memory in the same area. A distributed memory that is 32 deep
and two bits wide fits into that area, whereas with Virtex-5 it's 32
by 1. So that's in both Virtex and Spartan.
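The "six inputs and two outputs" LUT can be modeled as a 64-bit truth table. In the sketch below the O6/O5 naming follows Xilinx's dual-output LUT convention (O6 indexes all 64 bits with I5..I0, O5 indexes the lower 32 bits with I4..I0), and the 32x2 RAM view ties I5 high so the two outputs read the upper and lower halves. It is a behavioral sketch for intuition, not a description of the silicon, and the helper names are hypothetical.

```python
def lut6_2(init, addr):
    """Evaluate a dual-output 6-input LUT.

    init: 64-bit truth table. addr: inputs I5..I0 packed as an integer 0..63.
    O6 looks up all six inputs; O5 looks up only I4..I0 in the lower 32 bits.
    """
    o6 = (init >> (addr & 0x3F)) & 1
    o5 = (init >> (addr & 0x1F)) & 1
    return o6, o5


def ram32x2_read(init, addr5):
    """Read the LUT as a 32x2 RAM: I5 tied high, O6 is bit 1, O5 is bit 0."""
    o6, o5 = lut6_2(init, 0x20 | addr5)
    return (o6 << 1) | o5


def ram32x2_write(init, addr5, value):
    """Return the new truth table after writing a 2-bit value at addr5."""
    for bit, pos in ((value & 1, addr5), ((value >> 1) & 1, 32 + addr5)):
        init = (init & ~(1 << pos)) | (bit << pos)
    return init


# Use the same 64 bits as a 32-deep, 2-bit-wide memory.
mem = ram32x2_write(0, 5, 2)
mem = ram32x2_write(mem, 6, 3)
print(ram32x2_read(mem, 5), ram32x2_read(mem, 6))
```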
We've made multiple changes to the block RAM as well. Probably the biggest one is that in
Spartan devices the ratio of block RAM to logic was relatively low, so we've roughly
doubled the ratio of block RAM to logic in Spartan-6 over Spartan-3. The other thing we've
done is take these 18Kb RAMs and allow them to be fractured into two
independent 9Kb RAMs.
A lot of times people didn't need the full depth of the 18Kb RAM, so now you can
get better utilization out of that memory. It's very similar in Virtex-6, where the
base primitive is still a 36Kb RAM, and it can be fractured into two 18Kb
RAMs. So that's one of the things.
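The utilization benefit of fracturing is easy to see with a quick count. The sketch below estimates how many block RAM primitives a set of small memories needs with and without splitting. It checks only total bits and ignores aspect-ratio limits, which is a simplifying assumption; the 18Kb block size includes parity bits.

```python
import math


def ram_blocks_needed(memories, block_bits, can_split):
    """Count block RAM primitives needed for a list of (depth, width) memories.

    Simplifying assumptions: only total bits are checked (aspect-ratio limits
    are ignored), and when splitting is allowed any memory that fits in half a
    block can pair up with another such memory in one primitive.
    """
    full_blocks, half_blocks = 0, 0
    for depth, width in memories:
        bits = depth * width
        if can_split and bits <= block_bits // 2:
            half_blocks += 1
        else:
            full_blocks += math.ceil(bits / block_bits)
    return full_blocks + math.ceil(half_blocks / 2)


SPARTAN6_BRAM_BITS = 18 * 1024          # 18Kb block, including parity bits
four_small_fifos = [(512, 16)] * 4      # 8,192 bits each
print(ram_blocks_needed(four_small_fifos, SPARTAN6_BRAM_BITS, can_split=False))  # 4
print(ram_blocks_needed(four_small_fifos, SPARTAN6_BRAM_BITS, can_split=True))   # 2
```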
We've also made some enhancements to the block RAMs because we're interested in
reducing power in both Virtex and Spartan. When we were designing them, we found
some ways inside to reduce the power.
And just so you know, in Spartan the block RAMs will run somewhere around 250 to 300
megahertz if you use the full pipeline modes, depending upon your device. In Virtex
that'll be 500 to 600 megahertz if you use the pipelining options.
Okay, clocking. For those of you who have used Virtex-5, this block should look very
familiar. It's a PLL and two DCMs. Those are Spartan-3A DCMs carried over, so if you
have Spartan code, your DCMs will come on over. We've added this PLL. The PLL has
a VCO that runs somewhere between 500 megahertz and the gigahertz range, and
then you have five output dividers that you can set coming off of that, and those go to
your clocks.
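The PLL arithmetic is simple: the VCO runs at the input frequency times a multiplier divided by an input divider, and each output divides the VCO down. The sketch below computes that and rejects out-of-range VCO settings. The 500 to 1000 MHz limits are an assumption based on the rough range quoted in the talk; take the real limits and divider ranges from the data sheet.

```python
# Assumed VCO limits: the talk quotes roughly 500 MHz up to the gigahertz
# range; take the real limits (and divider ranges) from the data sheet.
VCO_MIN_MHZ = 500.0
VCO_MAX_MHZ = 1000.0


def pll_outputs(f_in_mhz, mult, div_in, out_divs):
    """Return (VCO frequency, list of output frequencies) in MHz.

    f_vco = f_in * mult / div_in, and each output is f_vco / its divider.
    """
    f_vco = f_in_mhz * mult / div_in
    if not VCO_MIN_MHZ <= f_vco <= VCO_MAX_MHZ:
        raise ValueError(f"VCO at {f_vco} MHz is outside the assumed range")
    return f_vco, [f_vco / d for d in out_divs]


# 100 MHz reference, x8 -> 800 MHz VCO, five output dividers
vco, outs = pll_outputs(100, 8, 1, [1, 2, 4, 8, 16])
print(vco, outs)   # 800.0 [800.0, 400.0, 200.0, 100.0, 50.0]
```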
We've done a lot of work on the clocking in order to support the high-speed I/Os.
There are some dedicated clocking paths for getting clocks from the PLL out to the I/Os
as well.
On this slide I just wanted to highlight, you know, what kind of performance boost do we
get using Spartan-6 versus Spartan-3. So we just take the MicroBlaze and give you
some relative performance numbers as we switch from a Spartan-3 architecture and just
move it to Spartan-6. You can see we do get a nice little boost in performance.
In general Spartan-6 will be, you know, a little better, a little faster than Spartan-3. I
mean, we're not trying to really push the speed aspect on Spartan-6. However, we are
doing everything we can to reduce the dynamic and static power, and so you can see
we're showing roughly about 50 percent reduction in power.
On this slide, what we did is take the largest Spartan-3A device, the
Spartan-3A DSP device, which has twice the memory of the standard
Spartan-3A devices, so it's roughly comparable to a midrange Spartan-6
device. If you compare this 3A device to Spartan-6, you'll see that just through
process we're getting roughly a 50 percent power savings.
Additional power savings are available if you move from a 1.2-volt core to a 1.0-volt core.
Let me explain what we're doing.
We're designing these devices to do something we call voltage
scaling, so that the device can run at either 1.2 volts or 1.0 volts. On Virtex-6,
voltage scaling allows the core to run at either 1.0 volt or 0.9 volts.
It's just an ordering option, and you can get the device that way.
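A first-order estimate shows why lowering the core voltage is attractive: CMOS dynamic power scales as P = alpha * C * V^2 * f, so at the same activity, capacitance and clock the saving depends only on the voltage ratio squared. This sketch ignores static (leakage) power and assumes the clock is unchanged, although a lower-voltage part may well run at a lower maximum frequency, so treat the results as rough bounds rather than data-sheet figures.

```python
def dynamic_power_ratio(v_new, v_old):
    """First-order CMOS dynamic power, P = alpha * C * V**2 * f.

    With activity (alpha), capacitance (C) and clock (f) unchanged, the ratio
    of new to old dynamic power is just (v_new / v_old) ** 2.
    """
    return (v_new / v_old) ** 2


for v_old, v_new in ((1.2, 1.0), (1.0, 0.9)):
    saving = 1 - dynamic_power_ratio(v_new, v_old)
    print(f"{v_old} V -> {v_new} V: about {saving:.0%} less dynamic power")
```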
Yes?
>>: What are your projections for voltage levels for [inaudible]?
>> Don Matson: Your question was what are our projections for voltage levels?
>>: [inaudible] scale it down to 0.7 or are you plateauing at 0.9 or 0.8?
>> Don Matson: The question was -- and I think what you're really asking is when we roll
out the next generation beyond here at 32 nanometers, will we be scaling voltage. My
expectation is that we would, but I have not -- I have not checked on that. I mean, I could
certainly ask and find out.
So this just shows you where we're at with our power. I also wanted to show you some of the
power-saving features that have been added. These are hardware features that have been
added over the years. You know, the ability to stop clocks: the BUFGs have
had that ability for quite a while, or you could do it another way.
There's hibernate, the idea that, hey, I'm going to just power off the FPGA, and we've added
suspend and voltage scaling. Now when we go into suspend, we've added the ability to
use some of the pins to wake the device up, so not all the pins go to sleep. So I
could have an interrupt pin coming into the FPGA to wake it up. I don't need an
external processor to wake it up.
And it's a lower power platform. One of the things I should mention is that we've done a lot of
work on our software. There are power options in MAP and PAR to reduce power; initially
they simply cleaned up routes, you know, hey, what can I do to clean up this route? And we
didn't get a whole lot of power savings.
We recently ran ISE 11.1 on a design and compared it to ISE 10.1 on a Virtex-5
LX110 device, and what we saw there was a 13 percent power reduction. We expect
that somewhere in that 10 percent range can be achieved through software alone. So I
just wanted to mention that as well.
And then the last thing: we're building all these devices, and
people want to be able to test them. You want to get boards, you want to be able to build
your system. So we're not only building the base platform, the devices; we're also
rolling out boards for targeted areas, putting IP and reference designs on those boards,
and making those available to customers. We're trying to do as many things as we
can to help you do your designs quickly and efficiently.
So that said, here's the family of devices. For the LX16 device, we've got some
devices back. We have an early sample of that, and we took a demo board to the
Embedded Systems Conference back in February. That's the first device, and it will be generally
available in the July timeframe. The next couple of devices are the LX45 and the LX45P,
in the August/September timeframe, and then the LX150. Those three devices will be
the first three devices sampling, and I think all three of them are scheduled to be in
production at the end of the year.
Yes?
>>: [inaudible].
>> Don Matson: Volume prices: the guy in the back right there, that's Michael Pierce
[phonetic]. He's our sales rep, and he can give you some guidelines as to what kind of
pricing to expect with volumes. It's a question of volumes and time. But as an FAE,
they don't like me to talk about that.
So I haven't mentioned anything about pricing, and I don't know if you've looked, you
know, at a Spartan-3. Obviously this Spartan-6 will give us the ability to put a lot more
transistors, a lot more logic cells in the same area. So there will be some price
reductions compared to Spartan-3.
Yes?
>>: Do the blocks save me power on memory [inaudible].
>> Don Matson: So I think there are two questions there from
Sandra [phonetic]. The first one: hard blocks, do they save you power? The answer
is yes, hard blocks always save power over an equivalent soft block, simply
because the routing is dedicated and the block can be built a lot smaller. Typically we only
do hard blocks for pieces of IP where we see significant savings in doing that.
I have numbers on how much power a DSP block saves: I could do a FIR filter with
or without it, and the numbers are fairly dramatic. PCI Express would be an interesting one
to do, since it would be easy to compare hard versus soft.
>>: [inaudible] SDRAM is a lot less hot than [inaudible].
>> Don Matson: On the memory, I think a better way to think about it is that
the memory controller inside there should save
you a fair amount of power. We made sure it addresses three particular standards, and I
mentioned a couple of them: DDR3, DDR2, and the other one that we get a fair amount
of requests for, low-power DDR, simply because it's low power; mobile DDR is the
other name it's called. I think we'll be able to demonstrate significant
savings, and when we've got boards, we'll get you some of those numbers.
The other question was about ICAP, and I'm not exactly sure what the question is, but I can guess,
so I'll take a stab at it.
Obviously there's an ICAP in Spartan devices; there has been for a number of years.
The area where most people are interested in ICAP is reconfiguration, and we've
always said don't bother with reconfiguration in Spartan devices, because there was a
potential memory glitch when rewriting a logic cell back to its same value.
With Spartan-6 there is no memory glitch. The devices are designed for partial
reconfiguration, so that is in Spartan-6.
Yes?
>>: I think you said that there is no hard block for DDR in the Virtex-6, right?
>> Don Matson: Yes.
>>: So you would still be using MIG?
>> Don Matson: You would still be using MIG to do the memory controllers in Virtex.
The reason is that with a Spartan-class device, it's easy to say, hey, if I can put a
16-bit-wide DDR3 interface out there, I can probably cover the memory bandwidth that most
applications need. With Virtex devices, customers need the ability to tailor it a lot more,
so we give the user the ability to customize it, and the fabric is fast enough to allow you
to do that.
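A quick peak-bandwidth calculation shows the gap between the two cases. The Spartan-6 figure uses the 16-bit width and the 400 MHz DDR3 clock (800 MT/s) mentioned earlier; the Virtex-6 case assumes a 64-bit interface at the DDR3-1066 rate mentioned later in the talk, where the 64-bit width is a hypothetical example. Both are peak numbers that ignore refresh, bank conflicts and bus turnaround.

```python
def peak_bandwidth_gbytes(data_width_bits, megatransfers_per_s):
    """Peak DDR bandwidth in GB/s: every data pin moves one bit per transfer.

    Ignores refresh, bank conflicts and bus turnaround, so real throughput is lower.
    """
    return data_width_bits * megatransfers_per_s * 1e6 / 8 / 1e9


# Spartan-6 hard controller: 16-bit DDR3 at 800 MT/s (400 MHz clock)
print(peak_bandwidth_gbytes(16, 800))
# Hypothetical Virtex-6 soft controller: 64-bit DDR3 at 1066 MT/s
print(peak_bandwidth_gbytes(64, 1066))
```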
The last slide here on Spartan -- actually, I've got a couple reference slides I'll throw in as
well.
Really the last slide here is on the timeline, and I don't have an equivalent slide for Virtex,
so I'm going to comment about Virtex because it's almost identical.
So documentation: early access documentation for both Virtex-6 and Spartan-6 was
opened up at the end of last year, so the first of this year. If you go out to the web at
xilinx.com/6, you'll get to the Spartan-6 and Virtex-6 home page, where you'll find an
eight-page overview brief that covers pretty much all the information that I'm going to be
covering today for both Spartan and Virtex.
But in the early access documentation lounges there are well over a thousand
pages of documentation on the major features of Spartan-6 and Virtex-6. If
somebody needs access to it, my contact information was on the first slide.
It's don.matson@xilinx.com, and we can work to get you early access documentation.
Early access software happens at the end of this month, so a limited number
of customers will have that. What I don't have on here: the general release says third
quarter, but just so everybody's aware, it's actually July. So by the 1st
of July you should be able to target the full Spartan-6 and Virtex-6 families.
And as I said before, in the July timeframe the first Spartan device, the LX16, will be out in
general sampling, and it's the May timeframe for the first Virtex-6 device. So that's
the timeline we're on.
I put these next two slides in here. These are just references for people who have used
Spartan-3A just to compare Spartan-3A on logic cells to Spartan-6. And so you can see
that we've got significantly larger devices, and this just gives people a way to compare.
I also broke down the comparison on this slide by number of
block RAM bits, so you can see that there's
a lot more memory in these devices.
Yes?
>>: [inaudible] with an embedded flash? Are you going to do something like that for this
one?
>> Don Matson: Yes. So the question was: we do have some Spartan devices out
there, the Spartan-3AN, that have embedded flash available.
Will we do something like that with Spartan-6? I think the devices have been designed to
allow us to put additional devices in the package if we wanted, so we could put a flash, or we
could put a DDR3 in there.
Our goal right now, though, is getting Spartan-6 and Virtex-6 released on schedule. There's
also the ISE software, which is ongoing, and there's a lot of work being done
there.
So that's our primary focus, and we're investigating whether we should.
So just showing here a road map, you see that we do have our Spartan-3A family out
now. We will be releasing Spartan-6 early -- in the middle part of this year, and then the
devices with the transceivers, and you'll see that there's also something coming in
45 nanometers we call Dragonfire. I think there will be an architectural announcement at
the end of this month or early next month on that.
So with that, I'd like to transition from Spartan to Virtex-6.
Are there any other questions on Spartan before I go to Virtex?
Okay. So in Virtex land the care-abouts from our customers are a little bit different.
Performance is the number one concern, but power is not
too far behind; much like Spartan, our customers care about power. They also
care about cost, so we've done a number of things in Virtex to make it easier for people
to lay out boards. In Spartan devices the packages were wire bond packages; in
Virtex they are all flip chip packages.
We've done an extensive amount of engineering on the package to support the high-speed
transceivers. We've also put the vast majority of the bypass capacitance you need for the
devices onto the flip chip packages in Virtex, and that helps to reduce cost. The other
thing we've done is to do as much as we can to keep the number of power supplies to a
minimum. So those are some of the things that we've been doing there.
When I talk about Virtex, you know that we've had multiple families in Virtex, and in
Virtex-6 we have the LXT family and the SXT family that we've announced, and we will
be announcing something later this year and actually sampling something with the
high-speed transceivers at the end of the year. So that's where we're at.
Let's just talk about performance. Compared to Virtex-5, the fabric should be about one
speed grade faster. The transceivers in Virtex-5 were basically 3.2 Gb/s; now all base
transceivers are 6.5 Gb/s, so roughly twice as fast.
We've done some things on the I/O to allow us to run at higher speeds, and you can
see that we're going to be able to run DDR3 at 1066 Mb/s. We've also done a number of little
things in the clocking.
It says clocking is 10 percent faster overall, but a number of things have been
done in clocking to really help those doing high-performance designs. And for those
people using DSP blocks, you'll see that across all the devices there are a lot
more DSP resources available, and we'll talk about that.
This slide looks familiar so we won't say much about it other than the fabric is the same
as the Spartan-6 and it's a derivative from Virtex-5.
Block RAMs are much the same story. We've made some enhancements to improve
FIFO performance, so there is a dedicated hard FIFO in that 36Kb block RAM.
You can use it as one FIFO, or you can split it and use it as two independent block RAMs.
We've also added error correction. If you use the block RAM as a 72-bit-wide data path,
that is, a 64-bit data path with 8 extra bits for correction, there is an ECC block available
on all of the block RAMs in Virtex.
In addition to that, we've added an extra capability to that ECC logic that allows you
to inject errors to test it and verify that it's working. So it's our second generation of
ECC logic for the block RAMs.
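The 64-data-plus-8-check-bit arrangement matches a classic single-error-correct, double-error-detect (SECDED) Hamming code. The sketch below is a generic (72,64) SECDED encoder and decoder with an error-injection helper to exercise it, in the spirit of the ECC test feature described above. The bit layout is textbook Hamming ordering and is an assumption, not the block RAM's actual arrangement.

```python
# Generic (72,64) SECDED Hamming code: 64 data bits, 7 Hamming check bits at
# power-of-two positions 1..64, and an overall parity bit at position 0.
# The bit layout is illustrative, not the block RAM's actual arrangement.
CHECK_POSITIONS = [1 << i for i in range(7)]                 # 1, 2, 4, ..., 64
DATA_POSITIONS = [p for p in range(1, 72) if p & (p - 1)]    # 64 non-power-of-two slots


def _syndrome(word):
    """XOR of the positions (1..71) of every set bit."""
    s = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            s ^= pos
    return s


def encode(data):
    """Encode a 64-bit integer into a 72-bit codeword."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    s = _syndrome(word)
    for p in CHECK_POSITIONS:           # set check bits so the syndrome becomes 0
        if s & p:
            word |= 1 << p
    if bin(word).count("1") % 2:        # overall even parity over all 72 bits
        word |= 1
    return word


def decode(word):
    """Return (data, status); status is "ok", "corrected" or "double"."""
    s = _syndrome(word)
    if bin(word).count("1") % 2:        # odd parity: single-bit error at position s
        word ^= 1 << s                  # (s == 0 means the parity bit itself)
        status = "corrected"
    elif s:                             # even parity but non-zero syndrome
        return None, "double"
    else:
        status = "ok"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status


def inject(word, *positions):
    """Flip the given codeword bits, like an error-injection test mode."""
    for pos in positions:
        word ^= 1 << pos
    return word


cw = encode(0x0123456789ABCDEF)
print(decode(inject(cw, 37)), decode(inject(cw, 5, 40)))
```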
As I mentioned before, we've put in some new performance paths on the clocks. The
other thing we've done is add midpoint buffering to reduce global skew. If
you've used the regional clock buffers, you know that in Virtex-5 they're limited to 250
megahertz. Those regional clock buffers have now been changed to
differential clock buffers to reduce any jitter or noise that they might see, and their
performance has been increased to the 500 megahertz range.
We have up to 18 mixed-mode clock managers. The clock managers are all PLL based
in here. All pins have the I/O delay, so you still have the ability to do I/O delay. The
refinements in the I/O delay are that we've done a number of things to reduce the power
and also to increase the accuracy of the tap delays, and we have reduced the total
number of taps. So in Virtex-5 you could have 64 taps; in Virtex-6 you're down to 32
taps, the idea being that at higher speeds I don't need that really long tap delay, and
it costs me a lot of power, area, and jitter performance. So that's why we've made that
change.
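The reasoning behind going from 64 to 32 taps can be checked with simple arithmetic: compare the total delay the tap chain spans with one bit period (unit interval) at a given data rate. The sketch below assumes a nominal 78 ps per tap for both families purely for illustration; that figure is an assumption to be checked against the data sheets, not a number from the talk.

```python
"""Compare IODELAY tap-chain span with the bit period at several data rates.
TAP_PS is an assumed, illustrative tap resolution, not a data-sheet value."""

TAP_PS = 78.0  # assumed picoseconds per tap


def delay_span_ps(taps: int, tap_ps: float = TAP_PS) -> float:
    """Total delay covered by a chain of `taps` taps."""
    return taps * tap_ps


def unit_interval_ps(rate_mbps: float) -> float:
    """One bit period in ps for a data rate in Mb/s (1e12 ps/s over rate * 1e6 b/s)."""
    return 1e6 / rate_mbps


if __name__ == "__main__":
    for rate in (200, 533, 1066):
        ui = unit_interval_ps(rate)
        for taps in (64, 32):
            span = delay_span_ps(taps)
            print(f"{rate:5d} Mb/s  UI {ui:7.1f} ps  {taps:2d} taps span "
                  f"{span:6.0f} ps = {span / ui:4.2f} UI")
```

Under this assumption, at 1066 Mb/s a 32-tap chain already spans more than two and a half bit periods, while at 200 Mb/s it covers only about half of one; the long chain mainly pays off at low data rates.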
As we look at the DSP blocks, this is showing the DSP slice. As I mentioned before,
the DSP slice in Spartan-6 is similar to what you see here. So what I'm showing is one
slice. You'll see that I have an A and a B input, and I have a preadder coming in here.
For a FIR filter, I could have my coefficients wrapped around, so I come in here,
add, and then multiply.
And you'll see that the multiplier is 18-by-25 in Virtex-6, much like it
was in Virtex-5. That bigger multiplier allows more dynamic range. The accumulator
is still a 48-bit accumulator in both Spartan and Virtex.
And the other thing I'm showing is that most of our DSP software takes advantage of the
fact that I can run a cascade of these really nicely, which greatly reduces power by using the
direct route and keeps performance up in that high range: 500 megahertz for dash
1, 550 for dash 2, and 600 for dash 3. And then, as I mentioned before, there are roughly twice as
many as there were before.
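As a rough behavioral model of how a preadder folds a symmetric FIR filter, the Python sketch below adds the two samples that share a coefficient before the multiply and then accumulates. The 18-by-25 multiply and 48-bit accumulator follow the talk; treating the preadder output as 25 bits, two's-complement wraparound on overflow, the even tap count, and all function names are assumptions made for the sketch, which is not a cycle-accurate model of the slice.

```python
"""Behavioral sketch of a symmetric FIR using a preadder -> 25x18 multiply -> 48-bit
accumulate chain. Wraparound behavior and names are illustrative assumptions."""


def wrap(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac_step(acc: int, a: int, d: int, b: int) -> int:
    """One slice: preadd a + d (25-bit), multiply by b (18-bit), add to 48-bit acc."""
    pre = wrap(a + d, 25)
    return wrap(acc + pre * wrap(b, 18), 48)


def symmetric_fir(samples, half_coeffs):
    """Filter with h = half_coeffs + reversed(half_coeffs) (even tap count)."""
    taps = 2 * len(half_coeffs)
    history = [0] * taps  # history[k] holds x[n - k]
    out = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0
        for k, h in enumerate(half_coeffs):
            # x[n-k] and x[n-(taps-1-k)] share coefficient h, so add them first.
            acc = mac_step(acc, history[k], history[taps - 1 - k], h)
        out.append(acc)
    return out
```

The fold means a filter with N symmetric taps needs only N/2 multiplies per output sample, which is why a preadder saves DSP slices in this kind of filter.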
Now, all of the Virtex devices have a built-in PCI Express block, and they all have some
built-in Ethernet MAC blocks as well. And those Ethernet MACs can be 10, 100, or 1 gig.
And since the transceivers support up to 6.5 gig, the PCI Express block can be either
Gen 1 or Gen 2. It can support up to four lanes Gen 1 endpoint, and it has the hooks
necessary to act as a root port, so you can connect to an endpoint as a downstream device.
So that capability is new.
As I mentioned before, the memory performance has been boosted. So we now support
1066 for DDR3, and we also support 1.4 gig for the LVDS I/O. And the rest of that should
look very familiar to those who have done Virtex-5.
So, as I mentioned before, our transceivers will be GTX on the base devices, and
then later we'll be introducing devices called HXT devices with the other transceiver type.
So, much like in Spartan, we have a lot of hard IP in Virtex. One of the blocks that I
hadn't talked about before is the system monitor block on all Virtex-5 and
Virtex-6 devices, which allows you to monitor voltages and temperatures internal to the
device and also voltages external to the chip if you want.
From a power standpoint, what you can see is that both the static and the dynamic power are
reduced in Virtex-6 when compared to Virtex-5. Roughly, we say it's going
to be about 30 percent lower power if you're running at a 1.0-volt core. If you use the
voltage scaling in Virtex, you'll get about a 50 percent reduction.
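For intuition on why lowering the core voltage helps so much, the usual first-order CMOS model of dynamic power is shown below. It is a general relation, not Xilinx's own power model, and the 0.9 V scaled core voltage is an assumed value used only to show the arithmetic.

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{\mathrm{core}}^{2} \, f
\qquad
\frac{P_{\mathrm{dyn}}(0.9\ \mathrm{V})}{P_{\mathrm{dyn}}(1.0\ \mathrm{V})}
  = \left(\frac{0.9}{1.0}\right)^{2} = 0.81
```

With activity factor α, switched capacitance C, and clock frequency f held fixed, the voltage drop alone trims dynamic power by about 19 percent; static (leakage) power generally falls at lower voltage too, so this term by itself does not account for the whole figure quoted in the talk.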
Now, one thing you should be aware of: if you look on your charts, you'll see that on the
Spartan devices, the speed grades are dash 2 and 3 by default, and there's a dash 1L
speed grade, and that dash 1L is about 15 percent slower than the regular Spartan
devices. In Virtex you have a dash 1, 2, and 3, and then there is also a dash 1L, and that
dash 1L is roughly the same performance as the dash 1. So it's somewhere
in the performance range of a dash 2 Virtex-5.
So that's our story on power. And we'll just move on to the slide here that shows all
of the I/Os and devices.
So one thing to be aware of, as I mentioned: if you look at the number of DSP slices, it's
much higher. And if you look at the DSP, you'll see that we have a couple of devices that
have a significant amount of DSP elements and memory.
So in the smaller devices, because the DSP block is relatively small to add, we have a high
number of DSP blocks, and as you go up larger we say, hey, if you're really interested
in DSP, you want to go this way as opposed to going up on these devices. Just so you're
aware of what's going on there.
As I said, all devices here are flip chip, and all of them have the majority of the
capacitance already on the chip. We've done some things to improve the signal integrity of
the packages, and we've been designing the packages with the idea that the
transceivers need to be capable of running 10 gig even though we're putting in lower-speed
transceivers, and that's guiding our package development.
I guess the last thing: earlier I said that this device, the 240T, is the first device out, and
you can see that when that device samples, the first package will be the 1156, and the other
two packages will follow shortly after. Once those packages are available, we're able
to put something in any footprint that somebody would want, with the exception of
our friends who like to do big ASIC prototypes. We built a device specially for them, and
that gives you three quarters of a million flip-flops. So that's a big one.
That's what I had on Virtex. What I want to do now is just kind of summarize things by
just showing you the structure and just kind of show what things are common between
Virtex and Spartan and what isn't.
So what's common in there is, you know, the LUT/CLB is the same, and so is the block RAM
structure. While the size of the block RAM isn't identical, the features are pretty much
identical, which makes it easy for us to port.
In the same fashion, the DSP slices are very similar. We have a lot of clocking resources
in these devices. The I/O structures, with the ILOGIC, OLOGIC, and I/O delay, are
very similar. We have transceivers on all of them and we have PCI Express.
As you look at this, though, you should see some things that aren't the same. First
off, you'll see that this is a column-based structure. This slide doesn't show it very well,
but there are actually three columns of I/O and then a fourth column of transceivers over here.
But this is your I/O ring structure in Spartan land, and that's done so that we can do wire
bonding. As I said before, Spartan devices give you full 3.3-volt compatibility if
you need that. And there are just some other minor differences: you have a
system monitor in Virtex but not in Spartan, and you have a tri-mode Ethernet MAC in Virtex
as well.
So those are the differences. Depending on what you're after, for the lowest cost and
moderate performance, Spartan is the right choice. For higher performance, Virtex is the
right choice.
So that's pretty much what I have on the hardware. And like most things, I have to have
a commercial here at the end.
We are releasing the next version of our tools, 11.1, at the end of this month. I
just wanted to mention that.
We've been doing a lot of work on the incremental flows and partition-based flows, with the
idea that people need to be able to compile their designs much faster so they can
get more turns per day. When you look at these devices and scale up what it might take
to compile a large Virtex-5 design today, you realize that we have to do something
different. So there's a lot of work going on there, and, as I mentioned before, on power
and on the memory footprint.
But I just wanted to let you know that the ISE software is now called Integrated Design
Suite, because when we release it, everybody who has an ISE suite will get what we
call PlanAhead as part of it, and ChipScope will be part of it. So your only options
now are: I just want to do logic development, or I want to do logic development and
embedded development, or I want to do logic development and DSP development, or I want
everything. Those are your choices. ChipScope is part of the package, I guess is the way
to say it. ChipScope, PlanAhead, all those tools are part of it.
So those are the changes coming, and I want to say thank you for your time. If you have any
questions, let me know.
[applause]
>> Don Matson: Yes?
>>: [inaudible] improvement, I forgot to ask you what the date was for IDS.
>> Don Matson: So, yeah. IDS is out at the end of this month. I hate to say the first of
next month, because April 1st is kind of an odd day to say, but that's when it
ships, and we have it available internally now.
The big improvement, I think, comes from spending more time using the incremental flows;
they're probably a little better. I don't know if you've tried that.
>>: [inaudible] you mentioned a different word.
>> Don Matson: So there's two different flows, and maybe we need to come in and talk
about that.
Partition-based flows say, hey, I've got a design, and if I look at it at the top level, I've
got multiple blocks in there, and I want to set those as partitions. As a partition, I'm
going to look at the compiled code, and of course I have to compile it. I don't know, is
everybody using Xilinx compilers or Synplicity? If you're using Xilinx compilers, the
compilers will keep the partitions separate; Synplicity can do it through MultiPoint. Then
the tools look and say, oh, that block hasn't changed, so use the old NCD and the information
in it to keep its placement and routing the same, and then look for the areas that have changed.
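The reuse decision in a partition-based flow can be pictured as a per-partition change check. The Python sketch below is hypothetical: it fingerprints each partition's synthesized netlist (represented here as plain text) and marks unchanged partitions to be preserved from the previous run and changed or new ones to be re-implemented. The real tools track this through their own project databases; none of these names are Xilinx APIs.

```python
"""Hypothetical per-partition reuse check for a partition-based flow."""
import hashlib


def fingerprint(netlist_text: str) -> str:
    """Stable digest of one partition's netlist."""
    return hashlib.sha256(netlist_text.encode("utf-8")).hexdigest()


def plan_run(partitions: dict, previous: dict) -> dict:
    """Map each partition name to 'preserve' or 'implement'.

    partitions: name -> current netlist text
    previous:   name -> fingerprint recorded after the last implementation run
    """
    return {
        name: "preserve" if previous.get(name) == fingerprint(text) else "implement"
        for name, text in partitions.items()
    }
```

The appeal of this granularity is that an untouched partition is carried forward whole, placement and routing included, so only the edited blocks go back through implementation.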
There's a little more involved with the partition flow than with the flow that I called
incremental. Incremental says, hey, I'm just doing my design as I've been doing it now,
which is basically compiled flat, meaning all modules are compiled at once. Then I look
at my whole old database from the previous compile and compare it to the database from the
new compile, looking for changes. Where I don't see changes, I at least start with that
placement, and that reduces the compile times pretty significantly.
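By contrast, the incremental flow works on the flat design. A hypothetical sketch of the idea: compare the old and new netlists instance by instance, seed every unchanged instance with its previous placement, and hand only the new or modified instances to the placer. The dictionary representation, instance names, and site names are all invented for illustration.

```python
"""Hypothetical instance-level placement reuse for an incremental (flat) flow."""


def seed_placement(old_netlist: dict, old_placement: dict, new_netlist: dict):
    """Return (seeded placement, sorted list of instances that still need placing).

    *_netlist: instance name -> (cell type, tuple of connected nets)
    old_placement: instance name -> site from the previous compile
    """
    seeded, to_place = {}, []
    for inst, definition in new_netlist.items():
        if old_netlist.get(inst) == definition and inst in old_placement:
            seeded[inst] = old_placement[inst]  # unchanged: start from old site
        else:
            to_place.append(inst)  # new or modified: placer decides
    return seeded, sorted(to_place)
```

Instances that were deleted simply drop out, and the placer starts from a mostly filled-in solution, which is the intuition behind the runtime savings described above.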
>>: [inaudible] in this performance improvement generally, is that equally across
synthesis [inaudible].
>> Don Matson: Yeah, I should go back to that. You notice I didn't say those words. So,
yeah -- so the question has to do with -- let me see if I can go back here -- to the very first
bullet on this slide, average 2x faster runtime compiles.
I don't know what they mean by average 2x faster. I can demonstrate some pretty good
numbers by using incremental compile, but if I don't change my flow and I don't change
my settings, I'm not going to see a 2x improvement. I might see 10
percent or something like that. But using the incremental flow or the partition-based flow,
you can see dramatic speed-ups in your compiles. And that's something I can help with.
Yes?
>>: [inaudible].
>> Don Matson: The question from Sandra [phonetic] was whether we do parallel compiles.
This is another interesting topic. If you look at Xilinx today, the old way of doing
design was you ran through NGDBuild. But today all the synthesis tools
understand the underlying architecture, so NGDBuild is really just a translation
from the synthesis database to the Xilinx database. It hardly does anything.
Then the next step was MAP. And what you've seen over time is that more and more
time is being spent in MAP, because the performance that
you can get in PAR is dramatically impacted by what you do in MAP.
So, a couple of things for you to be aware of. In Virtex-5, MAP -timing is the only way it
runs. Same thing with Virtex-6.
So you don't really have that option; it's going to spend more time there.
Then when you got into PAR, there was this idea that you would place and route there. Well,
actually, if you run MAP -timing, your design has already been placed and all you're
doing is routing.
So there was this old thing called Multipass Place and Route, and some of you guys may
have used it a long time ago. But if you think about it now, if I'm down there in PAR and I
spin off a bunch of Multipass Place and Route runs, I probably don't get much
advantage, because by that point the placement I did earlier already determines much of the
performance I get.
So we now have the ability to do some exploration up front. Instead of
doing Multipass Place and Route, which can use multiple servers or multiple CPUs, I can
use multiple servers at that earlier stage. So that is there.
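The up-front exploration idea, launching several implementation strategies at once and keeping the best, can be sketched with Python's standard `concurrent.futures`. Everything about the strategies here is hypothetical: `run_strategy` is a stand-in that returns a made-up worst-case slack so the sketch runs without any vendor tools; a real version would launch the tool runs (for example as subprocesses) on separate CPUs or servers.

```python
"""Hypothetical parallel strategy exploration; run_strategy fakes a tool run."""
from concurrent.futures import ThreadPoolExecutor


def run_strategy(strategy: dict) -> tuple:
    """Stand-in for one implementation run; returns (name, worst slack in ns)."""
    # Made-up deterministic slack so the example runs anywhere.
    slack = 0.1 * strategy["effort"] - 0.03 * strategy["seed"]
    return strategy["name"], slack


def explore(strategies: list, workers: int = 4) -> tuple:
    """Run all strategies concurrently and return the one with the best slack."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_strategy, strategies))
    return max(results, key=lambda result: result[1])
```

Threads are enough here because, in a real setup, each worker would mostly wait on an external tool process; the selection step (largest worst-case slack) is the part that carries over.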
We are also working on the case where I've just got a single compile, I know all my options,
and I want to take advantage of multiple CPUs. I don't know the performance advantage I'd get
doing that at this point in time, but we are able to do it, and we are putting more and
more resources there, because clearly that's the way to attack the problem.
Any other questions?
>>: [inaudible] begins with F, does that mean Floating Point?
>> Don Matson: What is this?
>>: [inaudible].
>> Don Matson: Three slides back? Okay. Way back. I think I know what you're talking
about. Let me not do it this way.
Let's just say this: your question really was on floating point, I think. We do have
support for floating-point soft cores today. I don't know if there are plans to make a soft
or a hard floating-point block. It wouldn't surprise me. That's something I can take offline
and check on.
Yes?
>>: [inaudible] and such, one of the points was being able to have a shorter lead time in terms of [inaudible].
>> Don Matson: Uh-huh.
>>: Other than the faster IDS tools, what are we doing to spin people
up faster?
>> Don Matson: That's a good question.
So what we're doing to help people get there: I know you guys have used some of our
Xilinx development boards. In the past there was a Virtex board and Spartan boards, and
there are a fair number of development boards. What we're doing now is going to a
development board strategy where the base development board has a couple of sockets,
or connectors, on it, and then we're going to build daughter boards for those connectors.
And you saw some slides that show these different reference designs. Our idea is that
we're doing a lot of verification across different areas. So for video, for instance, we'll
build a video daughter card, and we'll actually do some development.
I have a customer who's doing what I would call video connectivity, and we've got
complete reference designs from Virtex-5. We're continuing with that strategy:
not only are we giving you hardware, but we need to get IP to our customers, either as
reference designs or through CORE Generator, to help you do your designs, because there
are a lot of common things that you might do that aren't critical.
So my customer who's doing video connectivity wants to take an HD-SDI stream coming in
and do some of their own processing and MPEG encoding on it. Well, we're not doing the
MPEG encoding for them, but they don't care to do the work necessary to take the data
from the gigabit transceiver into their FPGA and get it into some data format in
memory. And the same thing on the other side; they don't want to do that either. So we're
providing that design for them. That's what we're talking about.
Any other questions?
Sure.
>>: [inaudible].
>> Don Matson: That announcement is coming.
So the question was whether we've said anything about future embedded hard processors.
What I will say is we are committed to future embedded hard processors. We will be
doing some stuff, and we'll be announcing stuff in the sixth generation architectures later
this year on the embedded hard processors, and we are committed to continuing with
embedded hard processors.
Anything else?
Thanks again.
[applause]