Graphics acceleration
An example of line-drawing by the
ATI Radeon’s 2D graphics engine
Bresenham’s algorithm
• Recall this iterative algorithm for doing a
‘scanline conversion’ for a straight line
• It required five parameters:
– The starting endpoint coordinates: (X0,Y0)
– The ending endpoint coordinates: (X1,Y1)
– The foreground color for the solid-color line
• It begins by initializing a decision-variable
– errorTerm = 2*delY - delX;
– where delX = X1 - X0 and delY = Y1 - Y0, with 0 <= delY <= delX
Algorithm’s main loop
for (int y = Y0, x = X0; x <= X1; x++)
{
    drawPixel( x, y, color );
    if ( errorTerm < 0 ) { errorTerm += 2*delY; }
    else { y += 1; errorTerm += 2*(delY - delX); }
}
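The loop above can be tried out without any graphics hardware. What follows is a minimal sketch, assuming a line in the first octant (X0 <= X1 and 0 <= delY <= delX); the function name `bresenhamLine` is our own, and it returns the pixel coordinates instead of calling drawPixel so that the result can be inspected.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Scan-convert a first-octant line and return its pixels in drawing order.
std::vector<std::pair<int, int>> bresenhamLine(int X0, int Y0, int X1, int Y1)
{
    const int delX = X1 - X0;
    const int delY = Y1 - Y0;
    assert(delX >= 0 && delY >= 0 && delY <= delX);  // first octant only

    std::vector<std::pair<int, int>> pixels;
    int errorTerm = 2 * delY - delX;                 // decision-variable
    for (int y = Y0, x = X0; x <= X1; x++)
    {
        pixels.emplace_back(x, y);
        if (errorTerm < 0) { errorTerm += 2 * delY; }            // stay on this row
        else { y += 1; errorTerm += 2 * (delY - delX); }         // step up one row
    }
    return pixels;
}
```

For example, bresenhamLine(0, 0, 4, 2) yields the five pixels (0,0), (1,1), (2,1), (3,2), (4,2).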
How much work for CPU?
• Example: To draw the longest visible line
(in 1024x768 graphics mode) will require
approximately 10,000 CPU instructions
• The loop gets executed once for each of
the 1024 horizontal pixels, and each pass
through that loop requires about ten CPU
operations: moves, compares, branches,
adds and subtracts, plus the function-calls
Is acceleration possible?
• The IBM 8514/A appeared in the late 1980s
• It could do line-drawing (and some other
common graphics operations) if just a few
parameters were supplied
• So instead of requiring the CPU to do ten
thousand operations, the CPU could do
maybe ten operations, then let the 8514/A
graphics engine do the rest of the work!
8514/A Block Diagram
[Block diagram. Components shown: Graphics processor, Drawing engine, Display processor, CRT controller, VRAM memory, RAMDAC (LUT and DAC), Display Monitor, ROM, PC Bus Interface, PC Bus, CPU]
ATI improved on IBM’s 8514/A
• Various OEM vendors soon introduced
their own graphics accelerator designs
• Because IBM had not released details of
its design, others had to create their own
programming interfaces – all are different
• Early PC graphics software was therefore
NOT portable between hardware platforms
How does X300 draw lines?
• To demonstrate the line-drawing ability of
our classroom’s Radeon X300 graphics
processors, we wrote ‘drawline.cpp’ demo
• We did not have access to ATI’s official
Radeon programming manual, but we had
several such manuals from other vendors,
and we found ‘clues’ in source-code files
for the Linux Radeon device-driver
Programming concepts
• Our demo-program must first verify that it
is running on a Radeon-equipped machine
• It must determine how it can communicate
with the Radeon’s graphics accelerator
• Normal VGA registers are at ‘standard’ I/O
port-addresses, but the graphics engine is
outside the scope of established standards
Peripheral Component Interconnect
• An industry committee (led by Intel) has
established a standard mechanism that
PC device-drivers can use to identify the
peripheral devices that a workstation has,
and their mechanisms for communication
• To simplify the Pre-Boot Execution code,
modern PC’s provide ROM-BIOS routines
that can be called to identify peripherals
PCI Configuration Space
Each peripheral device has a set of nonvolatile memory-locations
which store information about that device using a standard layout
The space is 256 bytes in all: a 64-byte PCI CONFIGURATION HEADER
followed by 192 bytes of ADDITIONAL PCI CONFIGURATION DATA
This device-information is accessed via I/O Port-Addresses 0x0CF8-0x0CFF
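Direct access through those ports follows the standard PCI convention: a 32-bit selector is written to port 0x0CF8, and the selected longword is then read or written at port 0x0CFC. Here is a minimal sketch of how that selector is composed; the function and constant names are our own, and the actual port output is left to the caller because it needs I/O privileges.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t PCI_CONFIG_ADDRESS = 0x0CF8;  // selector port
constexpr std::uint16_t PCI_CONFIG_DATA    = 0x0CFC;  // data port

// Compose the selector for one longword of a device's configuration space.
std::uint32_t pciConfigAddress(unsigned bus, unsigned device,
                               unsigned function, unsigned offset)
{
    assert(bus < 256 && device < 32 && function < 8 && offset < 256);
    return 0x80000000u              // bit 31: enable configuration access
         | (bus << 16)              // bits 23..16: bus number
         | (device << 11)           // bits 15..11: device number
         | (function << 8)          // bits 10..8: function number
         | (offset & 0xFCu);        // bits 7..2: longword-aligned offset
}
```

A driver would output pciConfigAddress(bus, device, function, 0x14) to PCI_CONFIG_ADDRESS and then input a longword from PCI_CONFIG_DATA.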
PCI Configuration Header
Sixteen longword entries (64 bytes)
Longword at offset 0x00: DEVICE ID (upper 16 bits) and VENDOR ID (lower 16 bits)
Longwords at offsets 0x10, 0x14, 0x18, 0x1C: BASE-ADDRESS for RESOURCE 0, 1, 2, 3
VENDOR-ID = 0x1002: ATI Technologies, Incorporated
DEVICE-ID = 0x5B60: ATI Radeon X300 graphics processor
BASE-ADDRESS for RESOURCE 1 is the 2D engine’s I/O port
Our ‘findsvga.cpp’ utility will show you the PCI Configuration Space for any
peripheral devices of Class 0x030000 (i.e., VGA-compatible graphics cards)
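Once the header longwords have been read, decoding them is simple bit-manipulation. The sketch below assumes the standard PCI layout (vendor ID in the low half of longword 0, device ID in the high half, and bit 0 of a base-address entry set when the resource lives in I/O space); the helper names are our own.

```cpp
#include <cassert>
#include <cstdint>

struct PciIds { std::uint16_t vendor; std::uint16_t device; };

// Split configuration longword 0 into its two 16-bit ID fields.
PciIds splitIdLongword(std::uint32_t longword0)
{
    return { static_cast<std::uint16_t>(longword0 & 0xFFFFu),
             static_cast<std::uint16_t>(longword0 >> 16) };
}

// Bit 0 of a base-address entry distinguishes I/O ports from memory.
bool barIsIoSpace(std::uint32_t bar) { return (bar & 1u) != 0; }

// Strip the two low flag bits to obtain the I/O port-address.
std::uint32_t barIoPort(std::uint32_t bar)
{
    assert(barIsIoSpace(bar));
    return bar & ~0x3u;
}
```

For the X300 we would expect splitIdLongword to report vendor 0x1002 and device 0x5B60; the port-address itself varies from machine to machine.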
Interface to PCI BIOS
• Our ‘dosio.c’ device-driver (and ‘int86.cpp’
companion code) allow us access to BIOS
• The PCI BIOS services are accessible (in
the Pentium’s virtual-8086 mode) using
function 0xB1 of software interrupt 0x1A
• There are several subfunctions – you can
find documentation online – for example,
Professor Ralf Brown’s Interrupt List
return_radeon_port_address();
• Our demo invokes these PCI ROM-BIOS
subfunctions to discover which I/O Port
our Radeon’s 2D graphics engine uses
– Subfunction 1: Detect BIOS presence
– Subfunction 3: Find Device in a Class
– Subfunction A: Read Configuration Dword
• Configuration Dword at offset 0x14 holds
I/O Port-Address for 2D graphics engine
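The three BIOS calls can be sketched as follows. This is not the code from our demo: the `BiosRegs` structure and the `Int1A` callback are hypothetical stand-ins for whatever interface ‘int86.cpp’ provides, and the register conventions (function-code in AX, class-code in ECX, bus/device/function in BX, register-offset in DI, result in ECX, carry-flag set on failure) are taken from the public PCI BIOS documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical CPU register image exchanged with the BIOS.
struct BiosRegs {
    std::uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0, esi = 0, edi = 0;
    bool carry = false;
};

// Hypothetical callback that executes software interrupt 0x1A.
using Int1A = std::function<void(BiosRegs&)>;

// Returns the I/O port-address held in resource 1, or 0 on any failure.
std::uint32_t radeonPortAddress(const Int1A& int1a)
{
    assert(int1a);
    BiosRegs r;
    r.eax = 0xB101;                         // Subfunction 1: BIOS present?
    int1a(r);
    if (r.carry || ((r.eax >> 8) & 0xFF) != 0 || r.edx != 0x20494350u)
        return 0;                           // EDX should hold the 'PCI ' signature

    r = BiosRegs{};
    r.eax = 0xB103;                         // Subfunction 3: find device in a class
    r.ecx = 0x030000;                       // class: VGA-compatible display
    r.esi = 0;                              // first such device
    int1a(r);
    if (r.carry) return 0;
    const std::uint32_t busDevFn = r.ebx & 0xFFFFu;

    r = BiosRegs{};
    r.eax = 0xB10A;                         // Subfunction A: read configuration dword
    r.ebx = busDevFn;
    r.edi = 0x14;                           // base-address for resource 1
    int1a(r);
    if (r.carry || (r.ecx & 1u) == 0) return 0;   // must be an I/O-space resource
    return r.ecx & ~0x3u;
}
```

A fuller version would also read the dword at offset 0x00 and confirm the vendor-ID before trusting the port-address.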
The ATI I/O Port Interface
iobase + 0: MM_INDEX
iobase + 4: MM_DATA
You output a register’s index to the iobase + 0 address
Then you have read or write access to that register at the iobase + 4 address
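The index/data protocol amounts to two port operations per register access. The sketch below assumes the caller supplies 32-bit port-I/O primitives; the `PortIo` structure and the function names are hypothetical, since real port access needs I/O privileges that an ordinary program lacks.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical 32-bit port-I/O primitives supplied by the caller.
struct PortIo {
    std::function<void(std::uint16_t port, std::uint32_t value)> out32;
    std::function<std::uint32_t(std::uint16_t port)> in32;
};

constexpr std::uint16_t MM_INDEX = 0;   // offset from iobase
constexpr std::uint16_t MM_DATA  = 4;   // offset from iobase

// Select a register by its index, then read it through the data port.
std::uint32_t readEngineReg(const PortIo& io, std::uint16_t iobase,
                            std::uint32_t index)
{
    assert(io.out32 && io.in32);
    io.out32(static_cast<std::uint16_t>(iobase + MM_INDEX), index);
    return io.in32(static_cast<std::uint16_t>(iobase + MM_DATA));
}

// Select a register by its index, then write it through the data port.
void writeEngineReg(const PortIo& io, std::uint16_t iobase,
                    std::uint32_t index, std::uint32_t value)
{
    assert(io.out32 && io.in32);
    io.out32(static_cast<std::uint16_t>(iobase + MM_INDEX), index);
    io.out32(static_cast<std::uint16_t>(iobase + MM_DATA), value);
}
```

Passing the primitives in this way also lets the same code run against a simulated device when no Radeon is present.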
Many 2D engine registers!
• You can peruse the ‘radeon.h’ header-file
to see names and register-index numbers
for the Radeon 2D graphics accelerator
• You could also write a programming loop
to input the contents from various offsets
and thereby get some idea of which ones
appear to hold ‘live’ values (i.e., hundreds!)
• Only a small number are used in line-drawing
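The register-scanning loop mentioned above might look like this. It is only a sketch: the `readReg` callback is a hypothetical stand-in for an MM_INDEX/MM_DATA read, the offsets are assumed to be longword-aligned, and ‘live’ is taken here to mean simply nonzero.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

struct LiveReg { std::uint32_t index; std::uint32_t value; };

// Read every longword-aligned offset in [first, end) and keep the nonzero ones.
std::vector<LiveReg> scanRegisters(
    const std::function<std::uint32_t(std::uint32_t)>& readReg,
    std::uint32_t first, std::uint32_t end)
{
    assert(readReg && first % 4 == 0 && first <= end);
    std::vector<LiveReg> live;
    for (std::uint32_t index = first; index < end; index += 4)
    {
        const std::uint32_t value = readReg(index);
        if (value != 0) live.push_back({ index, value });
    }
    return live;
}
```

Printing the returned list in hexadecimal gives a quick picture of which offsets are worth looking up in ‘radeon.h’.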
Main Line-Drawing registers
• DP_GUI_MASTER_CNTL
• DP_BRUSH_FRGD_COLOR
• DP_BRUSH_BKGD_COLOR
• DP_WRITE_MSK
• DST_LINE_START
• DST_LINE_END
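To suggest how these registers fit together, here is a sketch that only builds the list of register writes for one solid line, without touching hardware. Two things are assumptions drawn from the open-source driver rather than from an ATI manual: that each endpoint is packed as (y << 16) | x, and that the endpoint registers are written last. DP_GUI_MASTER_CNTL is left out because its bit-fields are not described here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using RegWrite = std::pair<std::string, std::uint32_t>;

// Assumed endpoint format: y in the upper 16 bits, x in the lower 16 bits.
std::uint32_t packXY(unsigned x, unsigned y)
{
    assert(x < 0x10000u && y < 0x10000u);
    return (static_cast<std::uint32_t>(y) << 16) | x;
}

// Ordered register writes for one solid-color line (names as in 'radeon.h').
std::vector<RegWrite> lineCommand(unsigned x0, unsigned y0,
                                  unsigned x1, unsigned y1,
                                  std::uint32_t color)
{
    return {
        { "DP_BRUSH_FRGD_COLOR", color },       // foreground color
        { "DP_WRITE_MSK", 0xFFFFFFFFu },        // allow all bits to be written
        { "DST_LINE_START", packXY(x0, y0) },   // starting endpoint
        { "DST_LINE_END", packXY(x1, y1) },     // ending endpoint
    };
}
```

Each entry would then be sent to the engine through the MM_INDEX and MM_DATA ports, using the numeric index that ‘radeon.h’ gives for that name.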
Others that affect drawing
• RB2D_DSTCACHE_MODE
• MC_FB_LOCATION
• DEFAULT_PITCH_OFFSET
• DST_PITCH_OFFSET
• SRC_PITCH_OFFSET
• DP_DATATYPE
• DEFAULT_SC_TOP_LEFT
• DEFAULT_SC_BOTTOM_RIGHT
CPU/GPU synchronization
[Diagram: Intel Pentium CPU and ATI Radeon GPU]
When the CPU off-loads the work of drawing lines (and doing other common
graphical operations) to the Graphics Processing Unit, this frees up
the CPU to execute other instructions. But it opens up the possibility that
the CPU will send more drawing commands to the GPU before the
GPU has finished its earlier commands. Some mechanism is needed to
prevent the GPU from becoming overwhelmed by work the CPU sends it.
The solution is a FIFO for pending commands, plus a Status Register
Engine has 64 FIFO slots
• Before the CPU initiates a new drawing
command, it checks to see if there are
enough free slots in the command FIFO
for storing that command’s parameters
• The CPU can do ‘busy-waiting’ until the
GPU reports that enough FIFO slots are
ready to accept new command-arguments
• An alternative is ‘interrupt-driven’ drawing
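The ‘busy-waiting’ approach can be sketched as a bounded polling loop. The names here are hypothetical: `readStatus` stands for a read of the engine’s Status Register, and `freeSlotsMask` selects the bit-field that reports free FIFO slots; the real register index and mask should be taken from ‘radeon.h’.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Poll the status register until at least 'needed' FIFO slots are free.
// Returns false if the engine never reports enough slots within maxPolls reads.
bool waitForFifo(const std::function<std::uint32_t()>& readStatus,
                 std::uint32_t freeSlotsMask, unsigned needed,
                 unsigned maxPolls)
{
    assert(readStatus && needed <= freeSlotsMask);
    for (unsigned i = 0; i < maxPolls; i++)
    {
        const std::uint32_t freeSlots = readStatus() & freeSlotsMask;
        if (freeSlots >= needed) return true;   // safe to send the command
    }
    return false;                               // give up instead of hanging
}
```

Bounding the number of polls is a design choice: if the engine is wedged, the program gets a chance to report the problem instead of spinning forever.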
Testing ‘drawline.cpp’
• We developed our ‘drawline.cpp’ demo on
a Radeon 7000 graphics card, then tested
it on a newer and faster Radeon 9250
• Our code worked fine
• Tonight we shall try it on the Radeon X300
• If these various models of the Radeon are
fully compatible with one another, we can
expect our demo to work fine on the X300
Hardware changes?
• But if any significant differences exist in
the various Radeon design-generations,
then we may discover that our ‘drawline’
fails to perform properly on an X300
• We would then have to explore the ways
in which Radeon designs have changed,
and try to devise ‘fixes’ for any flaws that
we have found in our software application
In-class exercises
• Try running the ‘drawline.cpp’ application
on our classroom or CS Lab workstation:
maybe it works fine, maybe it doesn’t
• Look at the source-code files for the Linux
‘open-source’ ATI Radeon device-driver
• If our ‘drawline’ works ok, see if you can
add code that programs the engine to fill
rectangles or copy screen-areas; or, if
‘drawline’ fails, see if you can devise a ‘fix’