Sort-Last Parallel Rendering for Viewing Extremely
Large Data Sets on Tile Displays
Paper by Kenneth Moreland,
Brian Wylie, and Constantine Pavlakos
Presented by Adam Howard
CS594, Spring 2002, Dr. Jian Huang
Research Focus/Goal
Develop highly scalable rendering techniques.
Drive a multi-tile display at frame rates comparable to those of a single-tile display,
using a sort-last parallel algorithm that scales appropriately with large data sets.
Sort-Last Parallel Rendering:
Combines images after rasterization occurs.
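To make the sort-last idea concrete, here is a minimal sketch of the combining step, assuming each processor has already rasterized its own polygons into an image of (depth, color) pixels. The names `composite`, `FAR`, and `BACKGROUND` are illustrative and do not come from the paper.

```python
FAR = float("inf")            # depth assigned to pixels with no geometry
BACKGROUND = (FAR, "bg")


def composite(image_a, image_b):
    """Merge two partial renderings pixel by pixel, keeping the nearer fragment.

    Each image is a list of (depth, color) pairs; a smaller depth is closer.
    """
    return [a if a[0] <= b[0] else b for a, b in zip(image_a, image_b)]


# Two processors rendered different polygons of the same 4-pixel view.
from_p0 = [(0.5, "red"), BACKGROUND, (0.9, "red"), BACKGROUND]
from_p1 = [(0.7, "blue"), (0.2, "blue"), BACKGROUND, BACKGROUND]

final = composite(from_p0, from_p1)
print([color for _, color in final])  # ['red', 'blue', 'red', 'bg']
```

Because the merge happens per pixel after rendering, it does not matter which processor drew which polygon.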
Overview of Approach
The Situation:
Requirement (Output):
Target resolution of 12 million pixels or more.
• Beyond capabilities of a single commodity computer.
• Tiled display where input comes from graphics engines on different
The Approach:
Rather than render a single high-resolution image, each processor
generates images for the tiles that make up the display.
More precisely, use N processors to render and compose a large data set
with T different projections, one for each display tile.
System Organization
Tile Display
Data Set
System Area
Compaq 750
nVidia Geforce 256 Graphics Card
System Organization
Design Strategy:
• Allow any number N of processors to
contribute to rendering the T tile images of
the display, as long as N >= T.
• Intended to draw polygons that are evenly
distributed amongst all the processors.
Main Point of Paper
Compositing Strategies
Virtual Trees
Tile Split and Delegate
Reduce to Single Tile
Serial strategy: compose the T images for a tile display by running a single-display
composition algorithm T times, one tile after another.
• No advantage from spatial coherence.
• Load balancing.
Virtual Trees
Based on Binary Tree Algorithm.
Each tile image has a tree that
has processors assigned to it.
A processor assigned to more than
one tree starts its next job as
soon as it finishes the current one.
Processor scheduling is very
Weakness: Load balancing.
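A minimal sketch of the binary-tree reduction behind one such tree, assuming every participating processor contributes one (depth, color) image for the tile. It models only the pairwise merging for a single tile, not the scheduling of processors across several trees. `composite` and `tree_compose` are illustrative names.

```python
FAR = float("inf")


def composite(image_a, image_b):
    """Keep the nearer (smaller depth) fragment at each pixel."""
    return [a if a[0] <= b[0] else b for a, b in zip(image_a, image_b)]


def tree_compose(images):
    """Reduce a non-empty list of partial images to one by pairwise merging.

    Returns the final image and the number of merge rounds (tree levels).
    """
    rounds = 0
    while len(images) > 1:
        merged = []
        for i in range(0, len(images) - 1, 2):
            merged.append(composite(images[i], images[i + 1]))
        if len(images) % 2 == 1:
            merged.append(images[-1])  # the odd image advances unchanged
        images = merged
        rounds += 1
    return images[0], rounds


# Four processors hold partial images of the same 2-pixel tile.
partials = [
    [(0.9, "a"), (FAR, "bg")],
    [(0.4, "b"), (0.8, "b")],
    [(FAR, "bg"), (0.3, "c")],
    [(0.6, "d"), (FAR, "bg")],
]
tile_image, rounds = tree_compose(partials)
print(tile_image, rounds)  # [(0.4, 'b'), (0.3, 'c')] 2
```

Each round halves the number of images still in play, so half of the processors in a tree drop out after every level. That idleness is one way to picture the load-balancing weakness.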
Tile Split and Delegate
Attempt to achieve better load
balancing throughout compositing.
Extension of the direct send algorithm.
Load balancing is ensured.
Weakness: Large amount of message
passing. Number of messages is
Reduce to Single Tile
- Attempt to reduce the problem to
that of composing a single
image in the same manner as
traditional sort-last parallel
rendering systems.
Before compositing begins, each
processor holds between zero and
T images for separate tiles. The
goal is for each processor to have
one image for a particular tile.
• Good load balancing.
• Fewer messages- Order of
Reduce number of polygons sent to
graphics hardware.
Active Pixel Encoding
Reduce amount of information
passed over the network
Floating Viewport
Reduce number of times a polygon
is rendered and the number of times
the frame buffer is read back.
Experimental Results
The serial strategy has good results when the data is not spatially coherent.
Tile Split and Delegate and Reduce to One strategies were the best for spatially
coherent data.
Determined that there is a tradeoff between display resolution and rendering
Other parallel cluster systems can render larger data sets faster, but not at this
level of resolution.
Experimental Results
The results support the initial goal of increasing resolution by rendering to a
tiled display with a cluster of commodity computers. They also support the
desire for scalability, such as larger data sets or higher-resolution displays.
Pretty Pictures
Reduce number of polygons sent to the graphics hardware by estimating which
polygons can be ignored.
Before rendering, each processor’s polygons are grouped into several 3D regions
called buckets. This grouping happens once, when the data is loaded during initialization.
Before each tile image is rendered, the buckets are tested to determine which lie in
the tile.
Only the polygons in these buckets are rendered.
Weakness: A large number of buckets reduces rendering time, but increases the overhead of
determining screen projections.
The authors settled on a moderate number of buckets to balance the two.
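The bucket test can be sketched as follows, assuming axis-aligned box buckets and a simple orthographic projection that drops z. The slides do not specify either, so `project`, `screen_bounds`, `overlaps`, and `visible_buckets` are all hypothetical names and the projection is a stand-in.

```python
from itertools import product


def project(point):
    """Hypothetical projection: orthographic, simply drops z."""
    x, y, _z = point
    return x, y


def screen_bounds(bucket_min, bucket_max):
    """Screen-space bounding rectangle of a box's eight projected corners."""
    corners = product(*zip(bucket_min, bucket_max))
    xs, ys = zip(*(project(c) for c in corners))
    return min(xs), min(ys), max(xs), max(ys)


def overlaps(rect_a, rect_b):
    """True if two (x0, y0, x1, y1) rectangles share any area."""
    ax0, ay0, ax1, ay1 = rect_a
    bx0, by0, bx1, by1 = rect_b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


def visible_buckets(buckets, tile_rect):
    """Names of the buckets whose projection touches the given tile."""
    return [name for name, (lo, hi) in buckets.items()
            if overlaps(screen_bounds(lo, hi), tile_rect)]


buckets = {
    "A": ((0, 0, 0), (4, 4, 4)),
    "B": ((10, 0, 0), (14, 4, 4)),
    "C": ((6, 6, 0), (9, 9, 2)),
}
left_tile = (0, 0, 8, 8)
right_tile = (8, 0, 16, 8)
print(visible_buckets(buckets, left_tile))   # ['A', 'C']
print(visible_buckets(buckets, right_tile))  # ['B', 'C']
```

Every bucket is projected once per tile, which is where the overhead of using many buckets comes from.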
Active Pixel Encoding
This method reduces the amount of information sent across the
network by distinguishing active pixels, which contain
geometric information, from inactive pixels, which do not. Inactive pixels take
far less data to encode than active pixels, which reduces the overall cost of
message passing between processors.
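One way to realize this distinction is to collapse runs of inactive pixels into a single count while sending active pixels in full. The run-length scheme below is an assumption for illustration, not necessarily the encoding the paper uses. `encode`, `decode`, and the record tags are hypothetical.

```python
INACTIVE = None  # marker for a pixel that holds no geometry


def encode(pixels):
    """Collapse each run of inactive pixels into a ("skip", count) record."""
    encoded = []
    skip = 0
    for pixel in pixels:
        if pixel is INACTIVE:
            skip += 1
            continue
        if skip:
            encoded.append(("skip", skip))
            skip = 0
        encoded.append(("pixel", pixel))
    if skip:
        encoded.append(("skip", skip))
    return encoded


def decode(encoded):
    """Expand the records back into the original pixel list."""
    pixels = []
    for kind, value in encoded:
        if kind == "skip":
            pixels.extend([INACTIVE] * value)
        else:
            pixels.append(value)
    return pixels


row = [INACTIVE] * 5 + [(0.4, "red"), (0.5, "red")] + [INACTIVE] * 9
packed = encode(row)
print(len(row), len(packed))  # 16 4
```

The saving grows with the share of inactive pixels, which is large when each processor's geometry covers only a small part of a tile.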
Floating Viewport
Here a virtual tile is created that encompasses an entire polygon.
After rendering, the virtual tile is split and each piece is displayed directly on the real
tile it overlaps.
Hence the system does not need to render any polygon more than once, and
the frame buffer is read back one time instead of four.
This is most effective when the ratio of tiles to processors decreases and the
data has good spatial coherency.
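A rough sketch of the geometry involved, assuming a regular grid of equal-sized tiles and a screen-space bounding box for the geometry. The placement rule (anchor the virtual tile at the box's lower-left corner) is an illustrative choice, and `float_viewport` and `split_over_tiles` are hypothetical names.

```python
def float_viewport(bbox, tile_w, tile_h):
    """Place a tile-sized viewport at the bbox's lower-left corner.

    Returns None when the geometry is too large for one tile-sized viewport.
    """
    x0, y0, x1, y1 = bbox
    if x1 - x0 > tile_w or y1 - y0 > tile_h:
        return None
    return (x0, y0, x0 + tile_w, y0 + tile_h)


def split_over_tiles(viewport, tile_w, tile_h, cols, rows):
    """Map each real tile (col, row) to the part of the viewport it overlaps."""
    vx0, vy0, vx1, vy1 = viewport
    pieces = {}
    for row in range(rows):
        for col in range(cols):
            tx0, ty0 = col * tile_w, row * tile_h
            ix0, iy0 = max(vx0, tx0), max(vy0, ty0)
            ix1, iy1 = min(vx1, tx0 + tile_w), min(vy1, ty0 + tile_h)
            if ix0 < ix1 and iy0 < iy1:
                pieces[(col, row)] = (ix0, iy0, ix1, iy1)
    return pieces


# Geometry straddling the corner shared by all four tiles of a 2x2 display.
viewport = float_viewport((80, 70, 130, 120), tile_w=100, tile_h=100)
pieces = split_over_tiles(viewport, 100, 100, cols=2, rows=2)
print(viewport)     # (80, 70, 180, 170)
print(len(pieces))  # 4
```

In this example the geometry is rendered and read back once through the floating viewport, then cut into four pieces, rather than being rendered once for each of the four tiles it touches.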