Design Feature: March 30, 1995 http://www.ednmag.com/ednmag/reg/1995/033095/07df4.htm
DESIGNING PCI-COMPLIANT MASTER/SLAVE INTERFACES FOR ADD-ON CARDS
Bernie Rosenthal and Ron Sartore,
Applied Micro Circuits Corp
As the PCI bus becomes the interface of choice for most desktop systems, it's clear the benefits
of high-bandwidth and plug-and-play operation do not come easy. Unlike its ISA and EISA
predecessors, the PCI bus presents a number of electrical, physical, and functional issues you
need to understand. System designers familiar with the earlier buses now need to expend a
significant effort to build a fully compliant PCI adapter card or add-in board.
The PCI bus is a synchronous, processor-independent 32- or 64-bit local bus that operates at
5V, 3.3V, or a combination of both. The bus is forward- and backward-compatible with multiple
32- and 64-bit PCI components and add-in boards, and it currently operates at clock speeds to 33
MHz with a compatible migration to 66 MHz. The PCI bus provides substantial performance gains
over the ISA or EISA buses (see Fig 1).
With a 33-MHz clock rate, PCI data-transfer rates are as high as 132 Mbytes/sec, or, in the case
of a 64-bit bus, 264 Mbytes/sec. Data transfer may be in "bursts" that ensure the bus is always
filled with data. The PCI spec allows burst reads and writes. Both are important in applications
such as high-performance graphics accelerators, where the majority of data transfers are writes
from the CPU to the frame buffer.
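
The peak figures follow directly from the clock rate and the bus width: once a burst is under way, one data transfer can complete on every clock, so the peak rate is simply clock frequency times bus width in bytes. The short C sketch below reproduces that arithmetic for illustration only; real transfers also spend clocks on address phases, wait states, and arbitration.

#include <stdio.h>

/* Peak PCI burst bandwidth: one data transfer per clock once the burst
 * is under way, so peak rate = clock frequency x bus width in bytes.
 * Illustrative arithmetic only; it ignores address phases, wait states,
 * and arbitration overhead. */
int main(void)
{
    const double clock_hz = 33e6;   /* 33-MHz PCI clock     */
    const double bytes_32 = 4.0;    /* 32-bit bus = 4 bytes */
    const double bytes_64 = 8.0;    /* 64-bit bus = 8 bytes */

    printf("32-bit peak: %.0f Mbytes/sec\n", clock_hz * bytes_32 / 1e6);
    printf("64-bit peak: %.0f Mbytes/sec\n", clock_hz * bytes_64 / 1e6);
    return 0;
}
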
Systems that use the PCI bus usually do not have memory on the PCI side, so that transactions
between the CPU and memory do not degrade PCI performance. This arrangement permits
concurrent operations on both the CPU's memory bus and the PCI bus. For example, the CPU
can work on applications in memory while bus transfers simultaneously occur between an image
frame-buffer card and a compression coprocessor.
Perhaps more important than raw bandwidth or concurrency is the feature that allows the host
system to configure adapter cards automatically. Dubbed "plug and play," this feature eliminates
the need to set jumpers and switches for the adapter card to function properly. With a series of
programmable and examinable address-decoder, interrupt, and configuration registers, the
system can treat all PCI plug-and-play add-on peripherals similarly.
The PCI bus is radically different from the 8/16-MHz ISA bus. The ISA bus
became popular because it was relatively easy for any system or board manufacturer to develop
ISA products. That situation isn't true for the PCI local bus. Potential pitfalls involving electrical,
physical, and functional compliance lurk in the PCI spec. In each category, you'll need to contend
with a number of highly detailed issues.
For example, output-buffer drivers for PCI bus signals are specified with both minimum and
maximum ac switching currents. This new method departs from the traditional dc output
description previously used for various TTL and CMOS logic families, FPGAs, and PLDs. For
these logic devices, IOL and IOH (the output low and output high switching currents, respectively)
are dc parameters, usually specified on the data sheet at exact values (6 mA and -2 mA,
respectively). The PCI spec instead defines IOH and IOL as ac parameters over dynamic ranges
of 0 to 1.4V and 1.4 to 2.4V, respectively. That is, the PCI spec defines switching currents in the
transition regions whereas, conventionally, only dc switching currents are specified at a given
logic state.
Although unlikely, some conventionally specified output buffers may be acceptable for driving the
PCI bus. However, it is unclear which ones comply with the specification until manufacturers
properly model, simulate, and characterize them. In effect, you take a big risk when using
conventional logic to drive the PCI bus because your PCI-interface design may turn out to be
noncompliant due to poor choice of interface buffers.
Because the PCI bus is synchronous, virtually all the logic involved in any high-performance data
and control path will require a copy of the PCI clock. This situation is made even more difficult by
the spec requirement that allows only one input load per bus slot. A phase-locked loop (PLL)
could help here by operating as a "zero-delay" buffer. However, PLLs cannot work properly in this
situation because the PCI spec allows the bus clock, sourced by a motherboard, to operate
anywhere from dc to 33 MHz.
To further preclude a PLL solution, the PCI spec allows for instantaneous changes in bus clock
speed as long as the minimum clock-pulse durations are not less than that of a 33-MHz clock.
Implementing certain complex functions on the PCI bus, especially 32-bit burst transfers,
ultimately requires a great many clock loads. This clock-fan-out problem is exacerbated by
another PCI requirement: a tight timing budget of 11 nsec from clock to data out.
A third issue deals with the sheer number of gates necessary to build the basic functions the PCI
spec mandates. You can easily approach 10,000 gates when building the 36-bit parity
generator/checker, programmable address decoders, command/status registers, and various
other required configuration registers. You need those functional elements just to achieve basic
compliance with the PCI spec without implementing any extra frills such as FIFO buffers and user-specific registers.
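
For reference, PCI's PAR signal covers the 36 bits of AD[31::00] and C/BE[3::0]# and is driven so that the total number of 1s across those lines plus PAR is even, which makes PAR the XOR reduction of the 36 covered bits. The C routine below is only a software model of the reduction a hardware parity generator/checker implements.

#include <stdint.h>

/* Even parity over the 36 bits covered by PCI's PAR signal:
 * AD[31::00] plus C/BE[3::0]#.  PAR makes the total number of 1s
 * across AD, C/BE#, and PAR even, so PAR is the XOR reduction of
 * the 36 covered bits.  Software model for illustration only. */
static uint8_t pci_par(uint32_t ad, uint8_t cbe)
{
    uint64_t bits = ((uint64_t)(cbe & 0xF) << 32) | ad;
    uint8_t parity = 0;
    while (bits) {
        parity ^= (uint8_t)(bits & 1);
        bits >>= 1;
    }
    return parity;   /* 1 if an odd number of covered bits are 1 */
}
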
Today's data-intensive applications require you to fully understand the expected performance of
your system bus. For example, you need to understand the transfer characteristics of the bus to
ensure that a full-motion video system isn't going to be jerky or that a data-transmission
broadcast can be received (stored) in its entirety. In short, to determine whether a given
configuration is able to accomplish a specific performance goal, you must know how the bus will
behave.
Many designers perceive that the PCI bus specification, with its 132-Mbyte/sec transfer rate,
alleviates bus-performance concerns. Although the PCI bus offers superior transfer rates over the
ISA and EISA buses, determining the expected performance of a PCI-based system is not a
simple task. System performance depends not only on the performance of the PCI bus interface
devices, but also on the environment in which they operate. The selection of processor, bus-arbitration scheme, memory bus, and motherboard chip set can influence the attainable
bandwidth of an adapter function. PCI bus traffic between devices other than the CPU can reduce
the expected bandwidth even further.
Key implementation issues
The most important PCI bus specifications deal with the configuration-space header, read and
write behaviors, and ac electrical and timing parameters. The PCI local bus specification
emphatically states that "all PCI devices must implement configuration space." PCI's
configuration-register space provides an appropriate set of configuration hooks, satisfying the
needs of current and anticipated system-configuration mechanisms without actually specifying
those mechanisms or otherwise constraining their use.
The PCI spec divides configuration space into a predefined header region and a device-dependent region. Devices can implement only the necessary and relevant registers in each
region. A device's configuration space must be accessible at all times, not just during system boot
initialization. The predefined header region consists of fields that uniquely identify the device and
allow it to be generically controlled.
The predefined header portion of PCI's configuration space divides into two parts. The first 16
bytes are the same for all device types; the remaining bytes can have different layouts
depending on the base function the device supports. The Header Type field (located at offset
0Eh in the configuration-space header) defines what layout is provided. Currently, there are two
defined header types: type 01h, defined for PCI-to-PCI bridges and documented in the PCI-to-PCI bridge architecture specification, and type 00h (Fig 2), which all other types of PCI devices
currently use.
System software may need to scan the PCI bus to determine what devices are actually present.
To do this, configuration software must read the vendor ID field in each possible PCI slot. The
host bus to PCI bridge must unambiguously report attempts to read the vendor ID of nonexistent
devices. Because 0FFFFh is an invalid vendor ID, it is adequate for the host bus to PCI bridge to
return a value of all "1s" on read accesses to configuration-space registers of nonexistent
devices. These accesses ultimately terminate with a master abort.
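
A configuration scan built on that rule is straightforward. In the hedged C sketch below, pci_cfg_read16() is a hypothetical helper standing in for whatever configuration-access mechanism the host chip set provides; a read-back of all 1s (0FFFFh) in the vendor ID field marks an empty slot.

#include <stdint.h>
#include <stdio.h>

/* Sketch of a configuration scan.  pci_cfg_read16() is a hypothetical
 * accessor for the host's configuration mechanism. */
extern uint16_t pci_cfg_read16(uint8_t bus, uint8_t dev, uint8_t func,
                               uint8_t offset);

#define PCI_VENDOR_ID   0x00u
#define PCI_VENDOR_NONE 0xFFFFu   /* all 1s: no device responded */

static void scan_bus(uint8_t bus)
{
    for (uint8_t dev = 0; dev < 32; dev++) {
        uint16_t vid = pci_cfg_read16(bus, dev, 0, PCI_VENDOR_ID);
        if (vid == PCI_VENDOR_NONE)
            continue;   /* access master-aborted: empty slot */
        printf("bus %u dev %u: vendor ID %04Xh\n",
               (unsigned)bus, (unsigned)dev, (unsigned)vid);
    }
}
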
All PCI devices must treat configuration-space write operations to reserved registers as no-ops.
That is, the accesses are normally completed on the bus and the data discarded. Read accesses
to reserved-but-unimplemented registers must be completed normally and return a data value of
0. Fig 2 shows the layout of a Type 00h predefined header portion within the 256-byte
configuration space. PCI devices must place any device-specific registers after the predefined
header in configuration space. All multibyte numeric fields follow little-endian ordering. That is,
lower addresses contain the least significant parts of the field.
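
For orientation, the C structure below sketches the commonly documented layout of the Type 00h header; Fig 2 and the spec revision you target are the authoritative references. It assumes a little-endian host and natural packing, and the region at offsets 28h-2Fh has changed across spec revisions, so treat those fields as illustrative.

#include <stdint.h>

/* Sketch of the Type 00h predefined header (offsets in hex).  The
 * 28h-2Fh region differs between spec revisions; consult Fig 2. */
typedef struct {
    uint16_t vendor_id;            /* 00h */
    uint16_t device_id;            /* 02h */
    uint16_t command;              /* 04h */
    uint16_t status;               /* 06h */
    uint8_t  revision_id;          /* 08h */
    uint8_t  class_code[3];        /* 09h-0Bh */
    uint8_t  cache_line_size;      /* 0Ch */
    uint8_t  latency_timer;        /* 0Dh */
    uint8_t  header_type;          /* 0Eh: 00h here, 01h for PCI-to-PCI bridges */
    uint8_t  bist;                 /* 0Fh */
    uint32_t base_address[6];      /* 10h-27h */
    uint32_t cardbus_cis_ptr;      /* 28h (later revisions; reserved earlier) */
    uint16_t subsystem_vendor_id;  /* 2Ch (later revisions) */
    uint16_t subsystem_id;         /* 2Eh */
    uint32_t expansion_rom_base;   /* 30h */
    uint32_t reserved[2];          /* 34h-3Bh */
    uint8_t  interrupt_line;       /* 3Ch */
    uint8_t  interrupt_pin;        /* 3Dh */
    uint8_t  min_gnt;              /* 3Eh */
    uint8_t  max_lat;              /* 3Fh */
} pci_type00_header;               /* first 64 bytes of the 256-byte space */
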
Software must work correctly with bit-encoded fields that have some bits reserved for future use.
On reads, software must use appropriate masks to extract the defined bits and may not rely on
reserved bits' being of any particular value. On writes, software must ensure that the values of
reserved-bit positions are preserved. As a result, the values of reserved bit positions must first be
read, merged with the new values for other bit positions, and the data then written back. All PCI-compliant devices must support vendor ID, device ID, command, status, revision ID, class-code,
and header-type fields in the header. Implementation of the other registers in a Type 00h header
is optional.
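
The read-merge-write rule for reserved bits translates into a few lines of code. In this hedged sketch, reg_read() and reg_write() are hypothetical register accessors, and DEFINED_MASK is an illustrative mask marking the bits software is allowed to change.

#include <stdint.h>

/* Read-merge-write update of a bit-encoded register with reserved bits:
 * only the defined bits are modified; everything else is read back and
 * preserved.  reg_read()/reg_write() are hypothetical accessors. */
extern uint16_t reg_read(uint8_t offset);
extern void     reg_write(uint8_t offset, uint16_t value);

#define DEFINED_MASK 0x03FFu   /* illustrative: the register's defined bits */

static void update_register(uint8_t offset, uint16_t new_bits)
{
    uint16_t old    = reg_read(offset);
    uint16_t merged = (old & ~DEFINED_MASK) | (new_bits & DEFINED_MASK);
    reg_write(offset, merged);
}
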
From a compatibility standpoint, the configuration space header is important because it will be
manipulated by the host system's BIOS (basic I/O system) program. If you fail to implement the
configuration space correctly, it's likely that your add-in feature won't operate. Worse yet, your
design may operate properly on certain platforms and behave mysteriously on others. When we
first designed our general-purpose PCI-interface chip, we failed to treat unused registers in the
configuration space properly. Rev 2.0 of the PCI Specification did not define the use of an unused
base address register. The resulting ambiguity affected both our device and various BIOS codes
and their treatment of "all 1s" or "all 0s" for the address-register content. This example points to
areas where interpretation of the specification by two independent parties can unknowingly lead
to interoperability problems.
PCI READ AND WRITE TRANSACTIONS
Read and write transactions take place between a bus master and a target. In its simplest form,
a read transaction starts with an address phase occurring when FRAME# asserts for the first
time on clock 2 (Fig 3A). During the address phase, AD[31::00] (the 32 address-data signals)
contain a valid address, and C/BE[3::0]# (the command/byte enable signals) contain a valid bus
command.
The first clock of the first data phase is clock 3. During the data phase, the C/BE# signals indicate
which byte lanes are involved in the current data phase. A data phase consists of some number
of wait cycles and a data-transfer cycle. The C/BE# output buffers must remain enabled for both
reads and writes from the first clock of the data phase through the end of the transaction. This
move ensures that the C/BE# signals do not float for long intervals.
The C/BE# lines contain valid byte-enable information during the entire data phase, independent
of the state of IRDY# (initiator ready). C/BE# lines contain the byte-enable information for data
phase N+1 on the clock following the completion of data phase N. Fig 3A doesn't show this
sequence because a burst-read transaction typically asserts all byte enables. However, Fig 3B
shows this type of transaction. Notice that on clock 5, the bus master inserts a wait state by
negating IRDY#. However, the byte enables for data phase 3 are valid on clock 5 and remain
valid until the data phase completes on clock 8.
BUS TURNAROUNDS
The first data phase on a read transaction requires a turnaround cycle, which the bus target, via
TRDY# (target ready), enforces. During the read cycle, the address is valid on clock 2, and then
the bus master stops driving the AD (address/data) lines. The earliest the bus target can provide
valid data is clock 4. The target must then drive the AD lines following the turnaround cycle when
DEVSEL# (device select) asserts. Once enabled, the target's output buffers must stay enabled to
the end of the transaction.
A data phase can complete when data transfers, that is, when both IRDY# and TRDY# assert on
the same rising clock edge. However, the target cannot assert TRDY# until DEVSEL# asserts.
When either IRDY# or TRDY# is negated, a wait cycle occurs and no data is transferred. As Fig
3A shows, data successfully transfers on clocks 4, 6, and 8, and wait cycles occur on clocks 3, 5,
and 7. The first data phase shown in Fig 3A completes in the minimum time for a read
transaction. The second data phase is extended on clock 5 because TRDY# is negated. The last
data phase is extended because IRDY# is negated on clock 7.
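
The completion rule itself is simple enough to state in code. The C sketch below models one clock sample; true means a signal is asserted (driven low), and the two functions restate the rules that data transfers only when IRDY# and TRDY# are both asserted and that the transaction ends on the transfer for which FRAME# has already been negated.

#include <stdbool.h>

/* Per-clock bookkeeping for the data-phase rules described above.
 * true means "asserted" (electrically low).  Modeling sketch only. */
typedef struct {
    bool irdy_asserted;    /* initiator ready (IRDY# low) */
    bool trdy_asserted;    /* target ready (TRDY# low)    */
    bool frame_asserted;   /* FRAME# low: more data phases follow */
} pci_clock_sample;

/* Data transfers on this clock edge only when both are ready;
 * otherwise the edge is a wait cycle. */
static bool data_phase_completes(const pci_clock_sample *s)
{
    return s->irdy_asserted && s->trdy_asserted;
}

/* The transaction ends on the edge where data transfers while FRAME#
 * is already negated (the master's last-data-phase indication). */
static bool transaction_ends(const pci_clock_sample *s)
{
    return data_phase_completes(s) && !s->frame_asserted;
}
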
The bus master knows, at clock 7, that the next data phase is the last. However, because the
master is not ready to complete the last transfer, it negates IRDY# on clock 7 and FRAME#
remains asserted. Only when IRDY# asserts can FRAME# be negated, which occurs on clock 8.
Fig 3B shows a write transaction. The transaction starts when FRAME# asserts for the first time,
which occurs on clock 2. A write transaction is similar to a read transaction except that no
turnaround cycle is required following the address phase because the master provides both
address and data. Data phases work the same for both read and write transactions.
The first and second data phases complete with zero wait cycles (Fig 3B). However, in this
example, the target inserts three wait cycles in the third data phase. Note that both the master
and the target insert a wait cycle on clock 5. To indicate the last data phase, IRDY# must be
asserted when FRAME# is negated. The master delays the data transfer on clock 5 by negating
IRDY#. The master signals the last data phase on clock 6, but the phase does not complete until
clock 8.
Although implementing the nominal conditions associated with PCI bus transactions is relatively
straightforward, addressing the eventual (but less frequent) exceptional cases can prove difficult.
For example, a burst transaction can burst beyond the allocated region for a given target. In this
case, the active target must disconnect and the bus master must reissue the address phase to
select another target. Naturally, this sequence must occur without losing or mistransferring data.
Both the master and the target must then keep track of the current address during burst transfers,
which increases the complexity of the logic involved.
There are other cases which further complicate master- and target-control-logic design. A master
must accommodate situations such as a target's request to disconnect with (or without) data,
removal of GRANT by the bus arbiter, detection of error conditions (such as parity errors), and
target abort. And, of course, all control-logic designs must handle transfer latencies caused by
wait states that masters or targets introduce.
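
One way to organize a master's recovery logic is to classify how each attempt ended and resume accordingly. The sketch below is illustrative only: the enum names, the start_burst() engine, and the word counting are assumptions rather than anything from the PCI spec or a particular device, but the policy (reissue the address phase at the first untransferred word after a retry or disconnect, report aborts without retrying) follows the behavior described above.

#include <stdint.h>
#include <stdbool.h>

/* Ways a transfer attempt can end, and a recovery policy for the master. */
typedef enum {
    XFER_COMPLETED,          /* every requested data phase finished      */
    XFER_TARGET_RETRY,       /* target terminated before any data moved  */
    XFER_TARGET_DISCONNECT,  /* target stopped the burst partway through */
    XFER_TARGET_ABORT,       /* target signaled a fatal error            */
    XFER_MASTER_ABORT        /* no target claimed the transaction        */
} xfer_result;

/* Hypothetical transfer engine: attempts a burst of 'words' 32-bit words
 * at 'addr', reports how many transferred, and returns how it ended. */
extern xfer_result start_burst(uint32_t addr, uint32_t *buf,
                               unsigned words, unsigned *transferred);

static bool burst_with_recovery(uint32_t addr, uint32_t *buf, unsigned words)
{
    unsigned done = 0;
    while (done < words) {
        unsigned n = 0;
        /* Reissue the address phase at the first untransferred word. */
        xfer_result r = start_burst(addr + 4u * done, buf + done,
                                    words - done, &n);
        done += n;
        switch (r) {
        case XFER_COMPLETED:
            return true;
        case XFER_TARGET_RETRY:
        case XFER_TARGET_DISCONNECT:
            break;             /* resume where the burst stopped  */
        case XFER_TARGET_ABORT:
        case XFER_MASTER_ABORT:
            return false;      /* report the error; do not retry  */
        }
    }
    return true;
}
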
A PCI INTERFACE CHIP FOR ADD-ON CARDS
The AMCC S593X or PCI Matchmaker family interfaces to virtually any major embedded µP,
such as Intel's i960, Motorola's 68000, or Texas Instruments' TMS320, as well as many discrete-logic configurations. At the lowest level, the S593X serves as a PCI bus target with modest data-transfer abilities. At the highest level, PCI Matchmaker can act as a bus master with peak transfer
capabilities of 132 Mbytes/sec with a 32-bit PCI bus.
Address decoding, address sourcing, burst transfers, and all elements necessary to perform
efficient and timely data transfers reside within the device. Also included is a bidirectional, 32-bit-wide FIFO buffer for system-to-system synchronization and data transfers between the PCI local
bus and the add-on product.
One of the S593X's key features is the built-in circuitry that automatically converts "big-endian"
data structures typically used in Motorola-based systems into the "little-endian" format common in
Intel-based systems. Because the PCI bus standard allows both types of endian assignments,
hardware conversion provides the highest performance method to exchange data between the
two formats.
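
The conversion itself is a byte-lane swap. The S593X performs it in hardware; the C routine below is a software equivalent shown only to make the transformation concrete (the swap is its own inverse, so the same routine converts in either direction).

#include <stdint.h>

/* 32-bit byte-lane swap: converts a big-endian word as stored by a
 * 68000-class processor into the little-endian ordering used on the
 * Intel/PCI side, and back.  Software sketch for illustration. */
static uint32_t swap_endian32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}
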
The S593X incorporates three physical bus interfaces: one to the PCI bus, another to the add-on interface bus, and the third to an optional external nonvolatile memory. PCI Matchmaker also
provides designers with connection to an inexpensive serial EPROM that can act as a BIOS ROM
for code generation and storage. You can connect a custom BIOS EPROM to perform any
preboot initialization required of the add-on function; you can connect external ROM, EPROM, or
NVRAM through either byte-wide or serial interfaces. The external nonvolatile memory may serve
as expansion BIOS.
Data can move between the PCI bus and the add-on bus or from the PCI bus and nonvolatile
memory. Transfers between PCI and add-on buses execute through mailbox registers, FIFO
buffers, or a pass-through data path. FIFO-buffer transfers through the PCI bus interface can
occur under software control or through hardware using the S593X as bus master.
PCI AC PARAMETRICS
Table 1 and Table 2 show ac electrical and timing specifications for the PCI bus.
As expected, the performance the PCI bus delivers imposes timing constraints that are not trivial.
In particular, an input set-up time of 7 nsec for an address decoder requires high-speed logic and
an optimized decoding structure. Another challenging requirement is the clock-to-data-valid path
of 11 nsec. Again, high-speed logic is necessary, as are careful layout and ground-bounce
management because of the relatively high number of simultaneously switching outputs.
Compliance with the PCI bus specification involves many facets of product design. Consider not
only the electrical driver parameters already mentioned but also how close the PCI logic is
physically located to an add-in card's edge connector. (All bus signals must be within 1.5 in., and
the clock must be within 2.5 in. (±0.1 in.)).
As obvious as mechanical- and electrical-compliance issues are, issues involving the functional
interface with a platform's BIOS are not. Most of the difficulties a PCI device encounters occur
during host initialization. Many of the problems relate to the interactive process of establishing the
address assignment for PCI devices. This process involves both the host's BIOS and every other
PCI device's configuration space. For example, should a PCI device request a contiguous I/O
space of greater than 256 bytes, then an ISA/PCI system may have a problem granting this
request without causing an address-assignment conflict.
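
A device advertises how much space it wants through its base address registers, which the host sizes by writing all 1s and reading back the result; the bits the device hard-wires to 0 define the size of the request. The hedged sketch below shows that probe for a 32-bit memory base address register; cfg_read32() and cfg_write32() are hypothetical configuration-space accessors, and real configuration code also disables decoding while it probes.

#include <stdint.h>

/* Discover how much memory space a 32-bit BAR requests: write all 1s,
 * read back, decode the size from the writable bits.  Sketch only. */
extern uint32_t cfg_read32(uint8_t bus, uint8_t dev, uint8_t offset);
extern void     cfg_write32(uint8_t bus, uint8_t dev, uint8_t offset,
                            uint32_t value);

static uint32_t memory_bar_size(uint8_t bus, uint8_t dev, uint8_t bar_offset)
{
    uint32_t saved = cfg_read32(bus, dev, bar_offset);

    cfg_write32(bus, dev, bar_offset, 0xFFFFFFFFu);
    uint32_t probe = cfg_read32(bus, dev, bar_offset);
    cfg_write32(bus, dev, bar_offset, saved);   /* restore original */

    if (probe == 0)
        return 0;           /* BAR not implemented */

    probe &= ~0xFu;         /* mask the memory-type bits */
    return ~probe + 1u;     /* size of the requested region */
}
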
Other problems may arise when a PCI add-in device powers up "enabled." Contrary to your initial
instinct, the PCI specification requires that all devices, even the boot source and display adapter, must power up disabled. This scheme allows the host to complete address assignments as part
of its initialization sequence. This procedure makes the add-in device reliant on the host system
to assign its address and become enabled. Otherwise, two identical add-in cards installed in a
system could contend for the same address location on power up.
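
Enabling happens through the Command register at offset 04h, whose low-order bits gate I/O-space decoding, memory-space decoding, and bus mastering. The sketch below shows the host-side step; cfg_read16() and cfg_write16() are hypothetical accessors, and which bits the host sets depends on what the add-in function actually implements.

#include <stdint.h>

/* A device powers up with its decode-enable bits cleared (disabled).
 * After assigning addresses, the host enables it through the Command
 * register: bit 0 = I/O space, bit 1 = memory space, bit 2 = bus master. */
extern uint16_t cfg_read16(uint8_t bus, uint8_t dev, uint8_t offset);
extern void     cfg_write16(uint8_t bus, uint8_t dev, uint8_t offset,
                            uint16_t value);

#define PCI_COMMAND       0x04u
#define CMD_IO_ENABLE     0x0001u
#define CMD_MEM_ENABLE    0x0002u
#define CMD_MASTER_ENABLE 0x0004u

static void enable_device(uint8_t bus, uint8_t dev)
{
    uint16_t cmd = cfg_read16(bus, dev, PCI_COMMAND);
    cmd |= CMD_MEM_ENABLE | CMD_MASTER_ENABLE;   /* enable only what's needed */
    cfg_write16(bus, dev, PCI_COMMAND, cmd);
}
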
Most high-performance applications have a desired minimum bus bandwidth. This goal often
translates as the minimum onboard storage required for an add-in card as well as the overall
performance suitable for proper application execution. Bus-bandwidth limitations are often so
important that a product's success or failure may hinge on enough bandwidth being available to
satisfy application needs. If you can't ship the application data across the bus, you have to store it
locally on the add-in card and then relay it to the final destination. Local storage increases both
the board's cost and the latency of the data transfer. If the increase in cost and decrease in
performance (caused by added latency) become large enough, the product becomes
uncompetitive or simply not feasible.
The PCI bus does not entirely alleviate bandwidth worries. At present, many system
implementations cannot support PCI's often touted 132-Mbyte/sec bandwidth when transfers
occur between an add-in card and main system memory. Actual system measurements for
today's popular PCs reveal that read-access delays are commonly between eight and eleven PCI
bus clocks from the time that the address is initially provided to the return of the first 32-bit data
word. Burst transfers may then follow the first data word, but bursts typically suspend once the
eighth data transfer is performed. Because these access-delay cycles correlate to delays of 240
to 300 nsec, it is apparent that future PCI systems need improvement over today's state of the
art. Until then, for today's machines, optimistically you can expect the PCI bus bandwidth into
main memory (including bus acquisition) to be just a bit over 40 Mbytes/sec. Naturally, you should
expect increased throughput to main memory as the PCI bus matures.
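
Those measurements are enough for a rough bandwidth estimate. The sketch below plugs in a 30-nsec clock, a 10-clock initial read latency, an 8-transfer burst, and an assumed 6 clocks of arbitration and turnaround per burst; the overhead figure is our assumption, not a measurement, but the result lands in the same range as the 40-plus Mbytes/sec cited above.

#include <stdio.h>

/* Back-of-the-envelope estimate of read bandwidth into main memory from
 * the figures quoted above.  The arbitration/turnaround overhead is an
 * assumed value; the rest comes from the access delays and burst length
 * described in the text. */
int main(void)
{
    const double clk_ns       = 30.0;   /* one 33-MHz PCI clock             */
    const double access_clks  = 10.0;   /* initial read latency (8 to 11)   */
    const double burst_words  = 8.0;    /* transfers before the burst stops */
    const double acquire_clks = 6.0;    /* assumed arbitration/turnaround   */

    double ns_per_burst = (acquire_clks + access_clks + burst_words) * clk_ns;
    double bytes        = burst_words * 4.0;
    printf("estimated read bandwidth: %.0f Mbytes/sec\n",
           bytes * 1000.0 / ns_per_burst);
    return 0;
}
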
Early PCI bus interface designs used discrete ICs. However, that approach is costly and doesn't
fully comply with the specification. Programmable logic, such as an FPGA, is a more promising
approach in certain cases. Assuming its I/O drivers are PCI-compliant, an FPGA is acceptable whether
the designer implements a bus master or a bus slave. Altera claims that its MAX7000 family
devices (up to the 7128) are PCI-compatible; the Xilinx XC73XX devices, ranging from the 18-macrocell
7318 to the 108-macrocell 73108, reportedly comply with all points in the PCI checklist.
ASICs offer another alternative. Vendors are providing the necessary ASIC cores for structuring a
PCI bus interface. However, ASICs are expensive and continue to pose a time-to-market risk. In
some cases, core implementations do not guarantee designers PCI compliance. Although the I/O
drivers and transfer state machines for these ASIC cores may comply with the PCI specification,
there are other compliance issues to consider, too. You need to know if the ASIC core handles
PCI subtleties, such as a delayed transaction, and if the core implements all of the PCI
configuration space.
The ASIC alternative
Even when an ASIC core is used to create a PCI bus interface, you must still possess a modest
understanding of the PCI bus spec. For example, you are required to obtain a vendor and device
ID number assignment from the PCI SIG (or borrow one), construct a configuration space region
that properly describes your function, and ensure that your custom function meets the timing and
behavioral requirements of the PCI spec.
Specialized single-chip PCI bus controllers, such as AMCC's PCI Matchmaker IC (Fig 4), do provide PCI compliance. Some PCI-interface devices require a processor to drive the application
provide PCI compliance. Some PCI-interface devices require a processor to drive the application
side of the device interface. Others, such as Matchmaker, do not require a processor on the
application side, a trait that can be advantageous in a design requiring a master and a slave on
the same board.
Once you've prototyped a PCI add-in product, you must subject it to various tests to ensure that it
works properly with other PCI products. You have several ways to accomplish this. Because it is
impractical and expensive to acquire access to every PCI system built, you need alternative
approaches to compliance verification. The PCI SIG (Portland, OR, (503) 797-4297) has
orchestrated one of the better methods, termed "PCI Compliance Workshops." PCI SIG
coordinates a gathering of interested members to verify their products (often prototypes and
preproduction designs) with each other. These compliance workshops are held quarterly, usually
in the San Francisco Bay area, and involve motherboard makers, system vendors, chip-set
manufacturers, BIOS companies, and various add-in-product manufacturers. The PCI SIG also
attends and provides a battery of compliance tests of its own. Test results remain confidential to
participants.
On successful completion of a compliance-workshop session, coupled with the submission of the
PCI-compliance checklist (which is more than 100 pages), your product then can appear on the
PCI-member "Confidential Integrators List." The PCI SIG maintains the integrator's list so that
SIG members can reduce the risk of noncompliance by using only components, BIOS programs,
or entire motherboards known to be compliant. Another way to make the list is to use an
independent testing organization such as National Software Testing Laboratories (NSTL,
Conshohocken, PA, (610) 941-9600) or VeriTest (Santa Monica, CA, (310) 450-0062). For a
modest fee, NSTL will test your product in a number of environments. On successful completion
of the tests and filling out the compliance checklist, your product will appear on the integrator's
list.
Authors' biographies
Bernie Rosenthal, director of the Computer Products Business Unit at AMCC, holds a BSEE and
an MS in Industrial and Systems Engineering as well as an MBA from the University of Southern
California. He has held a number of marketing and sales positions both at AMCC and TRW Inc.
Ron Sartore is the chief architect for PCI interface products at AMCC. He has implemented
several award-winning designs, among them the Cheetah Gold 486 featured in EDN's All-Star
PC series (March 1990). Sartore has a BSEE from Purdue University.