AGE_System_Rev_H

The architecture of a graphics accelerator and its SW application driver
Written by: Shmuel Wimer – Bar Ilan University, School of Engineering
Host’s application and graphics accelerator mode of operation
The graphics accelerator (AGE) is used by software implementing a graphics
application involving the rendering and animation of an object. In the following we
define the handshaking between the application and AGE, comprising a few basic
computer graphics operations. These may be enhanced during the project if time permits,
or left for a future project.
The software runs on a host computer equipped with a graphics board. The
application first initializes the system by performing the following operations:
1. Defining the object to be displayed, both its geometry and color.
2. Performing triangulation of its surface, calculating the vertices, edges and triangles
that comprise the object.
3. Assigning colors to every vertex.
4. Sending the following initial data to the USB port:
a. List of vertices comprising the triangulated object's surface. Every vertex will
have (xw, yw, zw) world coordinates and RGB colors in agreed formats. A
special order is imposed on the vertex list to enable pipeline processing of
triangles within any frame.
b. List of edges given by their two end vertices. This list is also ordered such that
the list of vertices implied by traversing the list of edges is contiguous (an
edge's new end vertex doesn't leave a "hole" in the vertex list).
c. List of triangles given by their enclosing edges in cyclic order. The list should
be synchronized with the edge list and hence with the vertex list too, such that
continuous progression along the triangle list imposes continuous progression
along the vertex list.
d. An outward normal vector for every triangle. The order of this list must match
the order of the triangles to enable the hardware pipeline.
e. A box of the real world where the object exists, given by xwmin, xwmax,
ywmin, ywmax, zwmin, zwmax.
f. The screen viewport where the object should be displayed, given by xsmin,
xsmax, ysmin, ysmax.
g. An indicator of which of the projection planes XY, XZ or YZ will be
displayed in the screen viewport.
h. A light unit vector L = (lx, ly, lz).
i. Background RGB colors for the frame buffer.
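As an illustration of items a-d and of the ordering constraint of item b, the payload for a small object can be sketched as follows. The Python structures, field layout, and the contiguity check are illustrative only, not an AGE wire format.

```python
# Hypothetical sketch of the initialization payload (items a-d) for a
# tetrahedron; the field layout is an assumption for illustration.
vertices = [  # (xw, yw, zw, (R, G, B))
    (0.0, 0.0, 0.0, (255, 0, 0)),
    (1.0, 0.0, 0.0, (0, 255, 0)),
    (0.0, 1.0, 0.0, (0, 0, 255)),
    (0.0, 0.0, 1.0, (255, 255, 0)),
]
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]  # vertex-index pairs
triangles = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)]  # edge indices, cyclic

def edge_list_is_contiguous(edges):
    """Traversing the edge list must reveal vertex indices without holes:
    each edge may introduce at most the next unseen vertex index."""
    seen = set()
    for a, b in edges:
        for v in (a, b):
            if v not in seen:
                if v != len(seen):  # a "hole" in the implied vertex order
                    return False
                seen.add(v)
    return True

print(edge_list_is_contiguous(edges))  # True for the ordering above
```

An application would run such a check before streaming the lists, since the hardware pipeline relies on this ordering.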
The application then performs an animation session where the above object with its world
box moves and rotates in space according to some externally defined trajectory.
The animation-related data is transferred to the USB port at a rate of 24 frames/sec, and
it comprises a 4 × 4 matrix representing the object's new position; as explained below,
only 12 entries are involved in the calculation of the object's position.
AGE then performs the animation at a rate of 24 frames/sec by applying hardware
operations whose mathematical definitions are described subsequently and whose hardware
implementation is specified elsewhere. The result of every animation step is the contents of
a frame buffer comprising (xsmax − xsmin) × (ysmax − ysmin) pixels, each associated with
RGB colors. The frame buffer data is addressed directly to the graphics board of the host.
Software and hardware pipeline implications
The so-called "graphics pipeline" described herein lends itself to a pipelined hardware
implementation where the processing of the triangles comprising displayed objects possesses
a great deal of overlap. It is unnecessary to first transform all vertices into new world
coordinates and only then start rasterization of the first triangle. Instead, if the order of
vertices stored in memory is such that "surface continuity" is maintained, where
addressing the next vertex in memory defines a new triangle whose two other vertices have
already been addressed and transformed, one can then start pipelined processing of that
triangle. This requires synchronizing the addresses of vertices and triangles in their
corresponding memories to maintain this continuity of addresses. This synchronization
(the order of streaming vertices, triangles and triangles' normal vectors) is the responsibility
of the application software. The hardware will need to maintain a pointer into vertex
memory, designating up to what address vertices have already been transformed, as the
processing of a triangle cannot start before the world coordinates of its three vertices are updated.
The graphics pipeline should avoid divisions as much as possible, as division is the most
clock-cycle-consuming operation. Therefore, pixel computations involving divisions will
take place in screen coordinates rather than world coordinates. This enables the use of
reciprocals of divisors stored in a permanent lookup table, which replaces division by
multiplication. The word stored at an address is the reciprocal value of that address. The
size of the lookup table must support all possible divisions taken in screen coordinates, as
explained later.
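The lookup-table idea can be sketched in a few lines. The 16.16 fixed-point format and the table size below are illustrative assumptions, not AGE specifications.

```python
# Minimal sketch of a reciprocal lookup table: address i holds 1/i as a
# fixed-point word, so a later division q = a / i becomes a multiplication.
FRAC_BITS = 16           # assumed 16.16 fixed-point format
SCREEN_RANGE = 2048      # max screen-coordinate difference supported

recip_lut = [0] + [round((1 << FRAC_BITS) / i) for i in range(1, SCREEN_RANGE)]

def divide_by_lut(a, i):
    """Replace a / i with a multiplication by the stored reciprocal."""
    return (a * recip_lut[i]) >> FRAC_BITS

print(divide_by_lut(300, 4))  # 75, computed without a divider
```

In hardware, the table would be filled once at initialization and addressed by the coordinate difference serving as the divisor.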
Storing the initial (permanent) and variable data in the memories of AGE
The representation of an object requires a few memories. Some contain permanent data
loaded at initialization, while the data of others changes along the animation
session.
1. Variable world vertex memory is loaded at initialization. It stores the dynamically
changing vertex positions resulting from the animation. Its content is the outcome
of multiplying a 4 × 4 position matrix by a 4-tuple homogeneous coordinate
representation of a vertex, as described below. Vertices are indexed according to the
order in which they are fed to AGE through the USB port, and their index is the address of a
memory word entry comprising the (xw, yw, zw) world coordinates. The vertex world position is
later transformed to a screen viewport position. Let us denote by V the number of
vertices. Since every coordinate is a 32-bit integer (4 bytes), the size of this memory
is V × 12 bytes.
2. Permanent vertex RGB memory is loaded at initialization. It stores the RGB of each
vertex. Vertices are indexed according to the order in which they are fed to AGE through the USB
port, and their index is the address of a memory word entry comprising RGB values.
A vertex requires three bytes for its color; hence the size of the RGB memory is
V × 3 bytes.
3. Permanent edge memory contains the list of all edges comprising the triangles. A
word of this memory stores the two indices of the edge's end vertices. Denote by E the
number of edges. Assuming three bytes for a vertex index, the size of the edge memory is
therefore E × 6 bytes.
4. Permanent triangle memory, a word of which contains the indices of the edges
comprising a triangle in cyclic order. Denote by T the number of triangles. Assuming
3 bytes for an edge index, its implied size is T × 9 bytes. Since all triangles are
embedded on the surface of the body, Euler's formula implies V − E + T = 2. Assuming
1M triangles, there is a total of (3/2) × 1M = 1.5M edges, since each edge is shared by
two triangles. It follows from Euler's formula that there are about 0.5M vertices.
Figure 1 demonstrates the relation among the above memories.
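The counts above can be reproduced directly from Euler's formula and the edge-sharing relation:

```python
# Sanity check of the counts above using Euler's formula V - E + T = 2 and
# the closed-surface relation 3T = 2E (each edge is shared by two triangles).
T = 1_000_000
E = 3 * T // 2          # 1.5M edges
V = E - T + 2           # ~0.5M vertices
print(E, V)  # 1500000 500002
```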
5. Variable triangle outward normal memory. Every triangle implies a
plane Ax + By + Cz + D = 0 whose parameters vary along the animation. This
memory stores the parameters (A, B, C), data used for two purposes. The first is the
decision of whether the triangle is a front or a back one with respect to the viewer's
eye, hence whether it is potentially visible or certainly hidden. The latter case occurs
for about half of the triangles; hence all rasterization calculations can be skipped for
those. The second purpose is deciding whether an individual pixel is potentially
visible or certainly not, a decision that takes place in the Z-buffer described later. The
derivation of the real-world depth of that pixel uses (A, B, C, D). The initial values
of these parameters are loaded at initialization. Every parameter occupies 4 bytes,
resulting in a memory of T × 12 bytes.
6. Variable screen viewport vertex memory stores the coordinates of every
transformed vertex obtained by transformation to screen coordinates (described
below). Every word of this memory stores the (xs, ys) coordinates on the projection plane.
Assuming that a pixel coordinate is stored in two bytes, the size of variable vertex
memory is V × 4 bytes.
7. Variable edge slope memory stores the slopes of the triangle edges. An edge slope is
required for the computations involved in the rasterization of triangles, and it is used
repeatedly. Using a memory, the slopes are calculated only once per animation step
and can then be used repeatedly, saving division operations. Assuming a four-byte slope
representation, the memory size is E × 4 bytes.
8. Permanent screen coordinate reciprocal memory is a lookup table aimed at
saving division operations made in screen coordinates. In particular, the interpolation
of the RGB values of a pixel involves divisions by screen coordinate ranges, which are
integral numbers that do not exceed the screen size. Therefore, the values of these
fractions can be calculated by hardware during initialization for the entire session and
then be used in multiplications rather than divisions. Denoting by
xsmin, xsmax, ysmin, ysmax the minimum and maximum screen coordinates, and using 4
bytes to represent a fixed-point reciprocal value, the size of this memory
is max{xsmax − xsmin, ysmax − ysmin} × 4 bytes.
9. Variable Z-buffer (depth buffer) memory is used for hidden surface removal. Its
word contains two data items. One is the smallest real-world depth of a pixel in the
screen viewport (the nearest to the viewer's eye) and the other is the index of the triangle
which dictates this Z-value. Assuming a depth needs four bytes, the triangle index
three bytes, and the size of the frame buffer is D, the memory size is D × 7 bytes.
10. Variable display (frame buffer) memory stores the final image to be flushed via the
USB port for display on the host's screen. It stores the RGB of every pixel, requiring
D × 3 bytes.
Total memory size requirements
Let us estimate the total area required for the above memories. Collecting all the above
specified sizes yields the following expression:
(12 + 3 + 4)V + (6 + 4)E + (9 + 12)T + (7 + 3)D = 19V + 10E + 21T + 10D,
where V, E, T and D are the numbers of vertices, edges, triangles and pixels. Using the relations
between vertices, edges and triangles discussed before and assuming 1M triangles and 1M
pixels yields 55.5M bytes of memory. Cutting the number of triangles to 0.1M implies
14.55M bytes, reducing the number of displayed pixels to 0.1M (e.g., 400 × 250) yields
5.55M bytes, and a further reduction of the number of triangles to 10K results in 1.455M bytes.
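The budget arithmetic above can be reproduced as follows, using E = 1.5T and V = E − T + 2 from Euler's formula (the +2 is negligible at these scales):

```python
# Reproducing the memory-budget expression 19V + 10E + 21T + 10D bytes.
def total_bytes(T, D):
    E = 3 * T / 2          # edges from the closed-surface relation
    V = E - T + 2          # vertices from Euler's formula
    return 19 * V + 10 * E + 21 * T + 10 * D

M = 1_000_000
print(total_bytes(1.0 * M, 1.0 * M) / M)   # ~55.5 Mbytes
print(total_bytes(0.1 * M, 1.0 * M) / M)   # ~14.55 Mbytes
print(total_bytes(0.1 * M, 0.1 * M) / M)   # ~5.55 Mbytes
print(total_bytes(0.01 * M, 0.1 * M) / M)  # ~1.455 Mbytes
```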
The rendering pipeline and its mathematical calculations
The following elaborates on the computations involved in the rendering pipeline. The
rendering pipeline is divided into two major parts. In the first, all world data is
transformed according to the position matrix of the new animation step. This includes
vertex positions, outward normal vectors, edge slopes, and the world-to-screen coordinate
conversion of vertices. The remaining operations are carried out per triangle, one after the
other. Since some operations involve divisions, special attention is required to avoid
division by zero.
1. Multiplication of real-world vertices by the transformation matrix. This operation
takes place in real-world coordinates. A vertex stored in variable world vertex
memory is first converted into the homogeneous representation (xw, yw, zw, 1) and then
multiplied by the 4 × 4 position matrix to yield its new position in the world. The
result is then stored back in variable world vertex memory, overriding the previous
position. The operation involves 9 multiplications and 9 additions of 4-byte operands,
as explained below. The usage of a single memory for vertex coordinates implies that
the hosting application will send incremental position matrices describing the position
change since the last animation step.
2. Multiplication of triangle outward normal vectors by the transformation matrix. As
the object changes position, the outward normal vectors of its triangles
change correspondingly. This change is obtained by first converting the vector
stored in variable triangle outward normal memory into the homogeneous
representation (A, B, C, 0) and then multiplying it by the 4 × 4 position matrix to yield the
new normal. The result is then stored back in variable triangle outward normal
memory, overriding the previous normal. This operation involves 9 multiplications
and 6 additions of 4-byte operands, as explained below. The value of D in the plane
representation Ax + By + Cz + D = 0 is required later for Z-depth calculations. It is
obtained by D = −(Ax + By + Cz), where the point (x, y, z) is taken as one of the
triangle's vertices. The vector (A, B, C) is initially set to unit length by the application
software. Its length then remains unit since the transformation matrix preserves
vector length (we assume that in this implementation animation excludes scaling of the
world and perspective projections).
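The derivation of D from a transformed normal and one triangle vertex is a single dot product; a minimal sketch, with made-up example values:

```python
# Illustrative computation of the plane offset D = -(A*x + B*y + C*z) from a
# triangle's normal (A, B, C) and one of its vertices (x, y, z).
def plane_offset(normal, vertex):
    A, B, C = normal
    x, y, z = vertex
    return -(A * x + B * y + C * z)

# A unit normal along +z and a vertex lying in the plane z = 2:
print(plane_offset((0.0, 0.0, 1.0), (5.0, -3.0, 2.0)))  # -2.0
```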
3. Calculate the slope of every edge. As described below, the rasterization of triangles for
obtaining their pixel RGB values requires knowledge of the triangle's edge slopes.
These slopes are used for every scan-line raster, so it is efficient to pre-calculate the
slopes of the edges and store them in memory for later use. For the sake of precision, slopes are
derived from world coordinates rather than screen coordinates, since the latter are
obtained after rounding to the nearest integer as explained below. The slope of an edge
defined by the vertices (xw, yw, zw) and (xw′, yw′, zw′) is (xw′ − xw) / (yw′ − yw).
The case of a zero denominator, representing a horizontal edge, needs special
treatment, by assigning the largest or smallest 32-bit integer in 2's complement
representation, 2^31 − 1 or −2^31, respectively.
As mentioned above, this division can be performed in screen rather than world
coordinates, at the expense of precision. The advantage of division in screen
coordinates is the possibility of implementing division as multiplication, avoiding the
need for a hardware divider. Since the range of screen coordinates is limited to 1024 or
2048 at most, all denominator fractions can be pre-calculated and stored in the
appropriate memory prior to starting the animation. In that case the slope calculation
stage is skipped.
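The slope computation with its saturated zero-denominator case can be sketched as follows; the floating-point slope is an illustrative simplification of the fixed-point hardware value:

```python
# Sketch of the edge-slope computation dx/dy, with the zero-denominator case
# saturated to the extreme 32-bit 2's-complement values as described above.
INT32_MAX, INT32_MIN = 2**31 - 1, -(2**31)

def edge_slope(x0, y0, x1, y1):
    dx, dy = x1 - x0, y1 - y0
    if dy == 0:  # degenerate denominator: saturate instead of dividing
        return INT32_MAX if dx >= 0 else INT32_MIN
    return dx / dy

print(edge_slope(0, 0, 4, 2))  # 2.0
print(edge_slope(0, 5, 3, 5))  # 2147483647 (saturated)
```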
4. Projecting the triangle on the viewing plane. Since the 3D object is projected on a 2D
screen plane, the depth coordinate, which is perpendicular to the projection plane, is
dropped. Assume without loss of generality that this is zw.
5. Convert every vertex to pixel coordinates. The projected world coordinate
(xw, yw) is converted into a screen coordinate (xs, ys) by the
transformation xs = round_int(xsmin + ((xsmax − xsmin) / (xwmax − xwmin)) · (xw − xwmin)),
where the rounding is made to the nearest integral number falling within the range of screen
coordinates. An analogous transformation applies for y. The screen coordinates thus
obtained are stored in variable screen viewport vertex memory. The scaling factor
of the above transformations is vertex- and triangle-independent and can therefore be
calculated once per frame computation and stored in a local register.
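A minimal sketch of this conversion for one axis, with the frame-invariant scale factor made explicit:

```python
# Sketch of the world-to-screen conversion of step 5. The scale factor is
# vertex-independent, so a real pipeline computes it once per frame.
def world_to_screen(xw, xw_min, xw_max, xs_min, xs_max):
    scale = (xs_max - xs_min) / (xw_max - xw_min)  # once per frame
    xs = round(xs_min + scale * (xw - xw_min))
    return min(max(xs, xs_min), xs_max)  # keep within the viewport range

print(world_to_screen(0.5, 0.0, 1.0, 0, 400))  # 200
```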
6. Deciding on hidden triangles is done by observing the outward normal vector
(A, B, C) stored in variable triangle outward normal memory. A triangle is certainly
hidden from the observer's eye if C < 0, a case where rasterization of the triangle is ruled
out, thus saving a lot of computation.
7. Scanning a triangle for rasterization. This is the most time-consuming step of the
pipeline. Figure 2 illustrates a triangle whose vertex screen coordinates and edge
slopes have already been calculated and exist in the appropriate memories. The first step
finds the vertex with the smallest ys coordinate. Two edges emanate from this
vertex, one left-upward and one right-upward, with slopes denoted by mleft and mright,
respectively, which have been calculated previously in step 3. If it is decided to give up
divisions, the slopes are derived by looking into permanent screen coordinate reciprocal
memory, where the address is the appropriate difference of screen coordinates. This is
elaborated in the pseudo-code below.
Every horizontal scan line yscan is obtained from the previous one
by yscan = yscan_old + 1. The new yscan is tested against the opposite ends of the
upward-left and upward-right edges to check whether it exceeds either of them, a case where one of
the edges terminates and the triangle's third edge is invoked, or the scan terminates.
Every scan line extends from a leftmost to a rightmost pixel, obtained as follows. Let
xleft_old and xright_old denote the leftmost and rightmost pixels of the scan line yscan_old,
respectively. Initially they are equal to each other, as obtained from the lowest vertex
of the triangle. Let mleft_old and mright_old be the slopes of the corresponding edges,
respectively. Then xleft = round_int(xleft_old + mleft_old), where round_int is a rounding
operation to the nearest integer, and similarly xright = round_int(xright_old + mright_old).
Rasterization pseudo-code is attached below.
8. Decide on pixel visibility. This is accomplished with the aid of the Z-buffer described
before. Initially, the content of the Z-buffer is reset to store the largest integer
(2^31 − 1 in the 2's complement representation of 32-bit fixed-point numbers), so that any
real depth is nearer. Then for every pixel in its turn, the real-world zw coordinate
corresponding to that pixel is found. If its value is smaller than the value found in the
Z-buffer (hence it is closer to the viewer's eye), the color calculation of that pixel
proceeds. In addition, the depth value of that pixel gets updated. Otherwise, the pixel
is ignored and the next pixel is considered. The calculation of the Z-value is made by
first translating the pair (xspixel, yspixel) of the given pixel into (xwpixel, ywpixel) by the
transformation xwpixel = xwmin + ((xwmax − xwmin) / (xsmax − xsmin)) · (xspixel − xsmin),
which is the inverse of the transformation used formerly to convert vertices from world to
screen coordinates. ywpixel is obtained analogously. Notice that the scale factor is
invariant along the entire computation of a frame buffer, hence it can be calculated once
and stored in a register. Once (xwpixel, ywpixel) is known, its depth in the real world
zwpixel is obtained from the plane equation A·xwpixel + B·ywpixel + C·zwpixel + D = 0,
yielding zwpixel = −(1/C)·(A·xwpixel + B·ywpixel + D). The coefficients are stored in variable
triangle outward normal memory. Notice that the coefficients are invariant for the
entire triangle rasterization. Therefore, in order to avoid unnecessary memory
accesses, the coefficients should be stored in registers. The result of this depth test is
either an update of both the nearest zw coordinate and the triangle which implied
it, or ignoring the update if the zw found is deeper than the nearest so far.
9. Setting the pixel's color. This operation is executed only once per pixel, according to the
nearest triangle covering that pixel, whose index is found in the Z-buffer memory.
Notice that this mode of operation excludes setting pixel colors from the hardware
pipeline, since all triangles must be processed first in order to know which of them is
the nearest to the viewer's eye at that pixel. This operation could be added to the pipeline,
but the number of pixel color calculations would then be more than doubled, where more than
half of them are unnecessary. A pixel is assigned nominal RGB values derived
from those existing at the vertices of the triangle it belongs to, by interpolation over its
three vertices. Once the RGB values have been set, further account of the object's surface
curvature takes place by multiplying the RGB by the factor L·N, where
N = (A, B, C) is the triangle's outward unit normal vector and L = (lx, ly, lz) is a unit light
vector pointing to the viewer (perpendicular to the screen). Notice that this dot
product is fixed for the entire triangle, but because the pixel's color setting is excluded
from the hardware pipeline, it is recalculated for every pixel. This is an overhead in
case T ≪ D, but it gets smaller as objects get more and more complex. The overhead
could be avoided by calculating the above lighting coefficient as a part of the hardware
pipeline and storing it in a dedicated memory. A detailed description of the RGB color
interpolation is attached below.
10. Writing a pixel into the frame buffer. The RGB values obtained for the above pixel are
written into a frame buffer that is eventually sent to the USB port for display on the
host's screen. This takes place at the rate of 24 frames/sec. At every animation step
the frame is first filled with a background color as defined by the host application. It is
then filled pixel by pixel as a result of the above color calculations. Once filled, the
frame buffer is flushed out to the USB port.
3D transformations
A point P: (xw, yw, zw) in the 3D world is transformed into a new point P′: (xw′, yw′, zw′) by
applying a series of transformations such as translation, rotation, scaling, perspective
views, and a few others. Though these transformations are not necessarily linear, their
computation can be made linear by converting points into a homogeneous coordinate
representation, where a point is represented by (xw, yw, zw, 1), and the sequence of
transformations can be captured in the following 4 × 4 matrix:
T = | t11  t12  t13  t14 |
    | t21  t22  t23  t24 |
    | t31  t32  t33  t34 |
    |  0    0    0    1  |
It is the responsibility of the software application to generate such matrices to perform the
right object drawing and animation. A new point position P′ is then obtained by P′ = TP,
which takes the following explicit form: xw′ = t11·xw + t12·yw + t13·zw + t14,
yw′ = t21·xw + t22·yw + t23·zw + t24, and zw′ = t31·xw + t32·yw + t33·zw + t34. This transformation
requires 9 multiplications and 9 additions. Obviously, neither the fourth entry of a point,
which equals 1, nor the fourth row of the transformation matrix needs to be explicitly represented.
Hence, the software application will send the hardware at every frame the 12 entries of
the first 3 rows, which are involved in the computation of the new coordinates.
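The 12-entry update can be written out directly; a minimal sketch, with a pure translation as the example matrix:

```python
# The explicit equations above: multiplying the first three rows of T by
# (xw, yw, zw, 1) costs nine multiplications and nine additions per vertex.
def transform_point(T3x4, p):
    xw, yw, zw = p
    return tuple(
        r[0] * xw + r[1] * yw + r[2] * zw + r[3]  # ti1*x + ti2*y + ti3*z + ti4
        for r in T3x4
    )

# Pure translation by (1, 2, 3):
T = [(1, 0, 0, 1),
     (0, 1, 0, 2),
     (0, 0, 1, 3)]
print(transform_point(T, (5, 5, 5)))  # (6, 7, 8)
```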
Triangle rasterization pseudo-code (all are screen coordinates)
1. Get the 3 vertices of the triangle: vA = (xsA, ysA), vB = (xsB, ysB) and vC = (xsC, ysC).
2. Find the vertex with the smallest y; assume it is vA = (xsA, ysA).
3. Get the slopes of edges (A, B) and (A, C). In case of no division, the slopes are obtained
from permanent screen coordinate reciprocal memory by addressing it with
|ysB − ysA| and |ysC − ysA|, and then using the appropriate signs of the differences,
respectively.
4. Find the edge with the smallest slope mleft and the largest slope mright. Assume these are
(A, B) and (A, C), respectively.
ysstop = min(ysB, ysC). Assume it is ysB.
5. xsleft = xsright = xsA.
6. yspixel = ysA.
7. while (yspixel ≤ ysstop) {
   a. for (xspixel = xsleft; xspixel ≤ xsright; xspixel++) {
          Decide on pixel (xspixel, yspixel) visibility;
      }
   b. ++yspixel;
   c. xsleft = round_int(xsA + mleft · (yspixel − ysA));
   d. xsright = round_int(xsA + mright · (yspixel − ysA));
8. }
9. ysstop = max(ysB, ysC). Assume it is ysC.
10. Set mleft to be the slope of edge (B, C).
11. while (yspixel ≤ ysstop) {
   a. for (xspixel = xsleft; xspixel ≤ xsright; xspixel++) {
          Decide on pixel (xspixel, yspixel) visibility;
      }
   b. ++yspixel;
   c. xsleft = round_int(xsB + mleft · (yspixel − ysB));
   d. xsright = round_int(xsA + mright · (yspixel − ysA));
12. }
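The pseudo-code above can be exercised with a runnable Python sketch. For brevity it re-intersects each scan line with the triangle's edges instead of performing the incremental updates of steps 7c-7d, and it only collects the covered pixels; the visibility decision of step 8 is left out.

```python
# Scan-line rasterization sketch: for each ys from the lowest to the highest
# vertex, intersect the scan line with the edges and fill between the
# leftmost and rightmost crossings.
def rasterize(v0, v1, v2):
    ys_lo = min(v[1] for v in (v0, v1, v2))
    ys_hi = max(v[1] for v in (v0, v1, v2))
    edges = [(v0, v1), (v1, v2), (v2, v0)]
    pixels = []
    for y in range(ys_lo, ys_hi + 1):
        xs = []
        for (xa, ya), (xb, yb) in edges:
            if min(ya, yb) <= y <= max(ya, yb):
                if ya == yb:          # horizontal edge: both ends cross
                    xs += [xa, xb]
                else:                  # x = xa + (dx/dy) * (y - ya)
                    xs.append(xa + (xb - xa) / (yb - ya) * (y - ya))
        # fill from the leftmost to the rightmost crossing on this scan line
        pixels += [(x, y) for x in range(round(min(xs)), round(max(xs)) + 1)]
    return pixels

pix = rasterize((0, 0), (4, 0), (0, 4))
print(len(pix))  # 15 pixels: rows of 5, 4, 3, 2, 1
```

The hardware version tracks xsleft and xsright incrementally so each scan line costs two multiply-add operations instead of edge intersections.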
RGB interpolation
This calculation involves a few divisions per pixel. It therefore takes place in screen
coordinates, since all divisions there can use the permanent screen coordinate
reciprocal memory. Let (xs1, ys1), (xs2, ys2) and (xs3, ys3) be the screen coordinates of the
vertices of the triangle whose internal pixel (xspixel, yspixel) is aimed at color setting. The line
passing through the vertex (xs1, ys1) and the internal point (xspixel, yspixel) intersects the opposite
edge (xs2, ys2)-(xs3, ys3) at (xsmid, ysmid), given by
(1) xsmid = [(xs1·yspixel − xspixel·ys1)(xs2 − xs3) − (xs1 − xspixel)(xs2·ys3 − xs3·ys2)] /
            [(xs1 − xspixel)(ys2 − ys3) − (ys1 − yspixel)(xs2 − xs3)].
Calculation of (1) requires one division and eight multiplications. The numerator of (1)
includes cubic powers of screen coordinates, so it is required to check whether a 32-bit
integer can represent it. Moreover, the divisor is an integral number involving squares of
screen coordinates, while permanent screen coordinate reciprocal memory stores reciprocals
of screen coordinates only. Therefore, the divisor will be handled indirectly, by
multiplying 1/(xs1 − xspixel) with 1/(ys2 − ys3) and by multiplying 1/(xs2 − xs3) with
1/(ys1 − yspixel). Notice also that the divisor in (1) may become zero if the pixel involved
is a vertex, a case that should be detected up front; if found true, the color setting is
taken directly from the vertex. A more delicate situation occurs when the projection of the
triangle degenerates into a vertical straight line. This also needs pre-detection, and if found
true, interpolation is made with y coordinates as follows:
(2) ysmid = [(ys1·xspixel − yspixel·xs1)(ys2 − ys3) − (ys1 − yspixel)(ys2·xs3 − ys3·xs2)] /
            [(ys1 − yspixel)(xs2 − xs3) − (xs1 − xspixel)(ys2 − ys3)].
Assuming that the midpoint interpolation was done by (1), a color at (xsmid, ysmid) is
obtained by interpolating between (xs2, ys2) and (xs3, ys3) as follows:
(3) Rmid = R2 + ((R3 − R2) / (xs3 − xs2)) · (xsmid − xs2).
The divisor in (3) is obtained from permanent screen coordinate reciprocal memory.
Then, the value at the internal pixel is found as follows:
(4) Rpixel = R1 + ((Rmid − R1) / (xsmid − xs1)) · (xspixel − xs1).
The divisor in (4) is also obtained from permanent screen coordinate reciprocal
memory. G and B values are obtained similarly. If the interpolation of the midpoint is done
according to (2), the calculations of (3) and (4) use y instead of x.
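A minimal Python sketch of the interpolation chain (1), (3), (4) for one color channel, using plain division instead of the reciprocal memory; the vertex positions and colors in the example are made up for illustration:

```python
def interpolate_channel(c1, c2, c3, v1, v2, v3, p):
    """Intersect the line v1-p with edge v2-v3 (equation (1)), interpolate
    the channel along the edge (3), then along v1-mid toward the pixel (4)."""
    (x1, y1), (x2, y2), (x3, y3), (xp, yp) = v1, v2, v3, p
    den = (x1 - xp) * (y2 - y3) - (y1 - yp) * (x2 - x3)   # divisor of (1)
    x_mid = ((x1 * yp - xp * y1) * (x2 - x3)
             - (x1 - xp) * (x2 * y3 - x3 * y2)) / den      # equation (1)
    c_mid = c2 + (c3 - c2) / (x3 - x2) * (x_mid - x2)      # equation (3)
    return c1 + (c_mid - c1) / (x_mid - x1) * (xp - x1)    # equation (4)

# Centroid of a triangle with R = 0, 90, 180 at its vertices:
R = interpolate_channel(0, 90, 180, (0, 0), (6, 0), (3, 6), (3, 2))
print(round(R))  # 90, the average of the three vertex values
```

The degenerate cases discussed above (pixel at a vertex, triangle projecting to a vertical line) are not handled here and would need the pre-detection described in the text.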