GPU Particles - MikeCann.co.uk

Real-Time State Preserving Particle Systems Simulated on the GPU
Michael Cann (n0355877)
This report is submitted as part requirement for the BSc Degree in Computer Games Programming at
The University of Huddersfield. It is substantially the result of my own work except where explicitly
indicated in the text. The report may be freely copied and distributed provided the source is explicitly
acknowledged.
1.0 Abstract
Particle effects are everywhere. The movie, TV and games industries make heavy
use of particle effects to add realism or simply to impress the audience with
explosive visuals. Whether it’s to add smoke to the barrel of the hero’s gun in a
movie or have blood spray in every direction from a satisfying kill in a game, particle
effects are extensively used.
Until recently, particle effects have been limited to being computed on the Central
Processing Unit (CPU) of a computer. Computing particle movements on the CPU is a
calculation intensive operation and, as such, often limits the number of particles that
can be calculated at any one time.
With the advent of the programmable pipeline on modern Graphics Processing Units
(GPUs) a whole host of new opportunities for increasing performance opened up. For
the first time particle effects could be totally calculated on the graphics card allowing
the CPU to handle other tasks.
Even more recently the ability to carry out texture processing in the vertex
processing pipeline has expanded the capability of GPU based particle systems.
Texture lookups in the vertex shader allow for a much more streamlined pipeline and
greatly improve the performance of particle systems.
Contents
1.0 Abstract
2.0 Introduction
3.0 Research
3.1 Existing Research on GPU Particles
3.1.1 Building a Million Particle System (Lutz Latta)
3.1.2 GPU Particles (nVidia)
3.1.3 ParticlesGS (Microsoft)
3.2 Specific technological research
3.2.1 Vertex Textures
4.0 Implementation
4.1 Form and Controls
4.1.1 Generic Controls
4.2 Static Particle System
4.3 Dynamic CPU Based Particle System
4.4 Dynamic GPU Based Particle System
4.5 Dynamic GPU Based Particle System with Vertex Textures
4.6 Dynamic GPU with Vertex Textures and Forces
5.0 Analysis
5.1 The testing method
5.2 Data
5.3 Graph
6.0 Conclusion
7.0 Appendix
7.1 Appendix A – Test System Specs
7.2 Appendix B - Speed Comparisons of Java, C# and C++
7.3 Appendix C – Best Fit Graph
8.0 References
Table of Figures
Figure 1 - A Screenshot from Latta's Implementation
Figure 2 - The Alternative Representation of a Texture
Figure 3 - Diagram showing the Update Process (Kruger, Kipfer, Kondratieva, & Westermann, 2005)
Figure 4 - GPU Particles
Figure 5 - Particle GS Screenshot
Figure 6 - A CPU based particle flow
Figure 7 - A GPU Based Particle System with Read-Back
Figure 8 - A GPU Based Particle System Using Vertex Textures
Figure 9 - XNAGPUParticles Application Form
Figure 10 - 1,000,000 Static Particles at 80FPS
Figure 11 - CPU Based Dynamic Particle System Rendering 160,000 Particles at 10FPS
Figure 12 - GPU Based Dynamic Particle System Rendering 360,000 Particles at 10FPS
Figure 13 - GPU Based Dynamic Particle System Rendering 450,000 Particles at 45FPS
Figure 14 - Complex Force Interactions with Hundreds of Thousands of Particles
Figure 15 - Graph Plotting Particle System Performance
2.0 Introduction
This paper is a continuation of the research paper “Stateless Particle Systems in
HLSL with RenderMonkey” that was written during late 2007 by myself (Cann, 2006).
The previous paper solely covered stateless particle systems that could easily be
simulated from within the confines of RenderMonkey. This paper covers the broader
concepts of GPU particle systems, in particular state preserving particle systems.
GPU programming is a modern technology, and simulating particle systems purely on
the GPU is more recent still. This paper will first explore the current
methods that have been employed when simulating and rendering state preserving
particle systems on the GPU.
Once relevant research into existing technologies has been carried out, a method for
the construction of a state preserving particle system will be designed. The design
will then be implemented and a log of the implementation process recorded. Once
the implementation has been completed an analysis of the results will be conducted
and conclusions drawn.
3.0 Research
Before work can be carried out on the implementation of a state preserving GPU
particle system, research must be carried out to determine the correct and most
efficient implementation method to use.
3.1 Existing Research on GPU Particles
As stated previously, programming state preserving particle systems to be simulated
on the GPU is a modern development. This section examines research that has
already been carried out in this field and examines the methods that were used.
3.1.1 Building a Million Particle System (Lutz Latta)
“Building a Million Particle System” is a paper published by Lutz Latta for the
“Game Developers Conference” in 2004 (Latta, Building a Million Particle System,
2004). The article was subsequently presented at the “Graphics Hardware 2004”
conference and was also re-published in a July 2004 Gamasutra article by the same
name (Latta, Building a Million-Particle System, 2004).
Figure 1 - A Screenshot from Latta's Implementation
The paper is one of the first to appear that proved state preserving particle
systems could be implemented purely on the GPU. As the title
suggests the implementation is able to get a very large number of particles to be
simulated dynamically (1 million particles on a 6800GT at 12FPS).
Latta describes a method for preserving the state of the particles during each
iteration of the program loop so that dynamic forces can be applied (see my previous
paper for more information on the differences between stateless and state preserving
particle systems). His method involves storing certain properties of each particle in a
number of textures.
A texture is made up of a two dimensional array of pixels. Each pixel stores a single
colour value, and each colour value comprises three separate channels: red, green
and blue. Latta proposed that instead of a texture representing an array of pixels it
could represent an array of particles, with each pixel corresponding to a single
particle. In that model, rather than a pixel containing three values that represent a
colour, the pixel contains three values that represent the position or velocity of a
particle.
Figure 2 - The Alternative Representation of a Texture
Standard textures use 8 bits per colour channel. This results in 256 possible values
for red, green and blue in each pixel. This is adequate for colours, as combining the
red, green and blue components gives 256*256*256 = 16777216 possible colours,
which is plenty to represent a realistic image. When each colour channel represents
a value in a particular dimension, however, it becomes apparent that 256 possible
positions in the x, y or z directions isn’t enough to create a realistic particle
effect.
To solve this problem Latta exploits a relatively new technology called the floating
point programmable pixel pipeline. The floating point pipeline, as the name suggests,
allows for floating point textures to be used within the programmable pixel pipeline
(pixel shaders).
Floating point textures are able to store a much larger range of possible positions for
each particle as they use 32 bits per colour channel as opposed to the 8 in standard
textures. 32 bits gives 4294967296 possible bit patterns in each colour channel and
roughly 7.9 x 10^28 combinations across the three channels of each particle (slightly
fewer distinct positions in practice, since some floating point bit patterns are
reserved for infinities and NaNs). This results in a much more accurate particle
simulation.
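The channel arithmetic can be checked with a few lines of Python. This is only a sketch of the counting argument: n bits give 2^n bit patterns, and a pixel with three channels gives that number cubed. The function name `patterns` is hypothetical, and for floating point channels the figure is an upper bound, since some bit patterns encode infinities and NaNs rather than usable positions.

```python
def patterns(bits_per_channel, channels=3):
    """Return (patterns per channel, patterns per pixel) for a texture format."""
    per_channel = 2 ** bits_per_channel
    return per_channel, per_channel ** channels

# 8 bits per channel: a standard colour texture.
standard = patterns(8)
# 32 bits per channel: a floating point texture (upper bound on distinct values).
floating = patterns(32)

print(standard)
print(floating)
```

The gap between the two results is what makes a floating point texture usable as a store of positions rather than colours.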
Floating point textures can then be processed as normal from within a pixel shader,
and standard particle calculations such as updating the particle’s position can be
done by doing texture lookups. New positions are then output as a floating point
colour value to a renderable floating point texture.
Figure 3 - Diagram showing the Update Process (Kruger, Kipfer, Kondratieva, &
Westermann, 2005)
Once all the particles have been updated they then need to be drawn to the screen.
The traditional approach would be to take the output texture and then, for each
pixel/particle, render a particle to the screen using either point sprites or screen
aligned quads (see my previous paper on the differences between quads and point
sprites). This is very costly as it requires a great deal of bandwidth between the
graphics card and CPU as textures are swapped back and forth.
Latta proposes a better solution in which the output from the particle update stage
is passed directly to a vertex shader that is able to render a stream of vertices as
particles. Latta describes “Uber-buffers” (also known as “Super-buffers”
(Percy, 2003)) as allowing accelerated asynchronous data copying within graphics
memory. At the time Latta wrote the paper Uber-buffers were only available in the
OpenGL API and not in the DirectX API.
3.1.2 GPU Particles (nVidia)
‘GPU particles’ is a demo written by nVidia that was “inspired by Lutz Latta's talk”
(nVidia, Unknown). nVidia for the most part uses the same techniques as Latta for
their “million particle” demo.
Figure 4 - GPU Particles
One key feature that nVidia add to their demo is the use of motion blurring to create
a more realistic impression of rapidly moving particles. nVidia achieve motion
blurring by rendering each point sprite a number of times, with each render set at a
lowered alpha transparency value. The result is an impression of fast moving
particles. Performance of the system does however take a hit, as each particle is
rendered a number of times.
The nVidia demo also claims to use the Multiple Render Target (MRT) feature to
enhance the performance of the system by condensing the particle updates into a
single shader pass. In theory this should provide a decent improvement in
performance compared to the multiple-pass-update method employed in Latta’s
demo. On closer inspection of the source code, however, it becomes evident that this
facility has been commented out for some undetermined reason.
3.1.3 ParticlesGS (Microsoft)
Microsoft’s implementation of a GPU based particle system is the latest system to be
developed and uses Microsoft’s new DirectX 10 API.
Figure 5 - Particle GS Screenshot
DirectX 10 has many advantages over its earlier counterparts. One new feature, the
geometry shader, is particularly valuable when creating state preserving particle
effects. The geometry shader allows a programmer to create and destroy an arbitrary
number of primitives directly on the GPU. This means that all the updates, births and
deaths involved with particle systems can be carried out on the GPU without CPU
intervention.
ParticlesGS takes advantage of this new feature to create impressive particle
fireworks. Although the demo doesn’t aim to draw millions of particles on the screen,
it is efficient and potentially capable of a much higher number of particles.
Unfortunately without Microsoft’s new operating system Vista and a graphics card
that’s compatible with DirectX 10 the demo cannot be run and hence experimented
with. Despite this, the geometry shader is a powerful new tool for state preserving
particle systems and should be investigated further in the future.
3.2 Specific technological research
3.2.1 Vertex Textures
Latta and nVidia created their particle demos during 2004 just as shader model 3.0
was being released. At that time hardware didn’t support all of the features that
the new shader model brought with it. Since then many advancements have been
made in the direction of ‘a unified shader architecture’ (such as that found in
DirectX10). One of these key advancements was the introduction of vertex textures.
Vertex textures, as the name implies, allow the vertex shader to perform texture
lookups. This is an important technology for particle systems that are simulated and
rendered purely on the GPU as it prevents costly read-back.
A typical particle system where all the simulating is carried out on the CPU and then
the rendering is carried out on the GPU can be seen as:
Figure 6 - A CPU based particle flow
A system where the particle updates are carried out on the GPU can be seen as:
Figure 7 - A GPU Based Particle System with Read-Back
Read-back occurs once the pixel shader has finished calculating the new positions of
the particles and the resulting texture is passed back to the CPU so that a vertex
buffer can be prepared for rendering in a different pass. Read-back is bad because of
the low bandwidth in the direction of GPU to CPU and because of the inefficiencies it
creates.
Vertex textures mean that texture lookups can be carried out in the vertex shader
and hence the GPU doesn’t have to return the outputted texture from the particle
update. Instead the texture can be passed directly to the vertex shader where the
particle positions can be mapped to a vertex buffer.
The new flow looks like:
Figure 8 - A GPU Based Particle System Using Vertex Textures
By preventing read-back the performance of the system should be considerably
increased. To see the results of this please see the analysis section of this document
in section 5.
4.0 Implementation
This section details the method for implementing the GPU particle demos. The
implementation is broken down into 5 steps that progressively increase in technical
complexity. The reason for implementing the demos in this way is so that they can
be compared against one another and the difference in performance analysed (full
analysis can be found in section 5.0 of this document).
The demos were written in C# using the XNA framework (which is built on
DirectX 9). C# is an excellent language for rapidly prototyping new features and
generating demos. Although it is a managed language that is compiled just-in-time
rather than ahead of time, its performance is not too far short of that found in more
traditional languages like C++ (see appendix B for a speed comparison between
C++, C# and Java). This paper focuses on research into techniques involved with
simulating and rendering particle systems on the GPU. As such the performance
decrease involved with using C# rather than C++ should not be an issue, as the
majority of the calculations are to be carried out on the GPU.
4.1 Form and Controls
Before developing particle systems an environment must be constructed so that the
properties of the demos can easily be modified and the results observed. Another
reason for creating the demos in XNA is the fact that it is able to seamlessly
integrate with C# and .Net v2.0 components.
Figure 9 - XNAGPUParticles Application Form
Figure 9 demonstrates the interface created with C# and XNA. Controls on the left
hand side of the window allow the user to modify properties and values and see
them reflected in the render in the centre of the screen.
The controls have been split into two categories; those that are generic to all demos
and ones that are specific to the demo currently active.
4.1.1 Generic Controls
The first three generic controls (shown in figure 9) are fairly self-explanatory.
Several textures have been included with the demos for testing; these can be
accessed from a drop down. The number of particles can be increased or decreased
from a minimum of 1,000 to a maximum of 1,000,000 using a slider. The individual
particle size can also be modified using a slider.
The second three controls can turn properties on or off via checkboxes. The first
checkbox allows the user to set whether distance perspective is taken into
consideration when rendering. If enabled, a different rendering pass is used to
calculate the size of each particle based on its distance from the camera. An inverse
square ratio is used to make the effect more pronounced.
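The inverse square relationship described above can be sketched as follows. This is an illustration only, not the demo's shader code: the function name `point_size`, the `base_size` parameter and the `min_distance` clamp are all assumptions introduced here.

```python
def point_size(base_size, distance, min_distance=1.0):
    """Scale a particle's size by the inverse square of its camera distance.

    min_distance is an assumed clamp that stops particles very close to the
    camera from growing without bound.
    """
    d = max(distance, min_distance)
    return base_size / (d * d)

# Doubling the distance quarters the size.
near = point_size(32.0, 2.0)
far = point_size(32.0, 4.0)
print(near, far)
```

Compared with a simple 1/distance falloff, the squared term shrinks distant particles faster, which is what makes the depth effect more pronounced.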
The second checkbox allows the user to define whether multiple render targets are
used when updating dynamic particle systems. If enabled, the demos use different
techniques when updating the particles on the GPU to take advantage of rendering to
multiple textures simultaneously.
The final checkbox turns on or off alpha blending. When enabled each particle is
blended against the current scene when rendered. This process can be intensive on
the GPU and hence turning it off should increase performance of a particle system.
4.2 Static Particle System
The static particle system is the most basic of particle systems. It involves filling a
large vertex buffer with random positions and then passing that data to the GPU for
rendering as point sprites.
This effect is very simple: it has no update passes and the vertex data never
changes. As such, the vertex buffer only needs to be sent to the graphics card once
in the initialisation phase. The result of this simplicity is that the effect is very quick,
capable of rendering one million particles at 80 frames per second (FPS).
Figure 10 - 1,000,000 Static Particles at 80FPS
4.3 Dynamic CPU Based Particle System
The CPU based dynamic system is the most commonly used particle system in
games and as such should serve as an important benchmark against the more
advanced techniques explored later on.
The CPU based particle system stores individual particle states in a large array, with
updates to their position and other properties performed each frame on the CPU.
Once updated, the properties are used to construct a vertex buffer which is then
passed to the GPU for rendering as point sprites.
The bottleneck for this type of particle system occurs on the CPU where many
calculations have to be carried out to update the particles in a sequential manner.
The result of this is that the demo is only capable of rendering 160,000 particles
before the frame rate drops below 10FPS.
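A minimal sketch of such a sequential CPU update is shown below, in Python rather than the C# of the demos. It assumes a simple fountain update in which gravity is added to the velocity and the velocity is then applied to the position; the names `update_particles` and `build_vertex_data`, and the gravity value, are hypothetical.

```python
def update_particles(positions, velocities, dt, gravity=(0.0, -9.8, 0.0)):
    """Advance every particle by one time step, one particle after another."""
    for i in range(len(positions)):
        vx, vy, vz = velocities[i]
        # Gravity is added to the velocity first...
        vx += gravity[0] * dt
        vy += gravity[1] * dt
        vz += gravity[2] * dt
        velocities[i] = (vx, vy, vz)
        # ...and the new velocity is then applied to the position.
        px, py, pz = positions[i]
        positions[i] = (px + vx * dt, py + vy * dt, pz + vz * dt)


def build_vertex_data(positions):
    """Flatten the positions into the x, y, z stream sent to the GPU each frame."""
    return [component for position in positions for component in position]


positions = [(0.0, 0.0, 0.0)]
velocities = [(1.0, 2.0, 0.0)]
update_particles(positions, velocities, dt=0.5)
vertex_data = build_vertex_data(positions)
```

Both the loop and the rebuilt vertex stream grow linearly with the particle count, which is where the CPU bottleneck comes from.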
Figure 11 - CPU Based Dynamic Particle System Rendering 160,000 Particles at
10FPS
4.4 Dynamic GPU Based Particle System
The third particle system is the first to use the GPU as a general processing unit.
Particle positions and velocities are updated in parallel in pixel shaders on the GPU
rather than sequentially on the CPU.
Three separate textures are used to store the states of the individual particles. Each
texture is a 128-bit floating point texture with 32 bits for each colour channel.
One texture is used to represent the positions of the particles. The red (R), green
(G), blue (B) components directly translate to XYZ components in the particles’
position. The final alpha (A) component of the texture is used as a time step modifier
such as the one described in the Kruger and Westermann paper (Kruger, Kipfer,
Kondratieva, & Westermann, 2005) and is used to control when individual particles
should spawn so that they don’t all spawn at once.
The second texture is used to represent the velocities of the individual particles. As
with the position texture, the RGB components are used to represent the XYZ values
of a particle’s velocity. The alpha component in this texture is unused.
The final texture is used for the starting velocities for the particles. It is needed so
that particle birth and deaths can be calculated purely on the GPU. If this texture
wasn’t used then the birth and death of individual particles would have to be carried
out on the CPU. As mentioned earlier this would result in the velocity texture being
read-back from video memory to standard memory causing performance to suffer.
As with the other textures the RGB component is used to represent the XYZ starting
velocities and the A component is left unused.
During the update phase the textures are passed to a pixel shader and the resultant
position and velocity passed out. As textures cannot be read from and written to at
the same time a double buffered technique is used. The position and velocity
textures have two copies stored in video memory and are arranged so that when one
is being read the other is written to and the following frame they are swapped over.
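The double buffered arrangement can be sketched on the CPU as follows, with plain Python lists standing in for the two copies of a texture. The class name `PingPong` and the `step` helper are hypothetical and only illustrate the read/write swap, not the actual render target code.

```python
class PingPong:
    """Two copies of the same state: one is read while the other is written."""

    def __init__(self, initial):
        self.buffers = [list(initial), list(initial)]
        self.read_index = 0

    @property
    def read(self):
        return self.buffers[self.read_index]

    @property
    def write(self):
        return self.buffers[1 - self.read_index]

    def swap(self):
        # After a frame the roles of the two buffers are exchanged.
        self.read_index = 1 - self.read_index


def step(state, update):
    """Read every element from one buffer, write the result to the other, then swap."""
    source, target = state.read, state.write
    for i, value in enumerate(source):
        target[i] = update(value)
    state.swap()


state = PingPong([1.0, 2.0])
step(state, lambda v: v + 1.0)
```

Because the update never writes to the buffer it is reading from, every element sees the previous frame's values regardless of the order in which elements are processed.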
This demo also features the Multiple Render Target (MRT) feature that was purported
to be in the nVidia demo. As mentioned in section 3.1.2 this feature allows the pixel
shader to output two colours instead of one, allowing the positions and velocities to
be updated in a single pass. Toggling this feature on and off however appears to
have little effect on the frame rate.
Figure 12 - GPU Based Dynamic Particle System Rendering 360,000 Particles at
10FPS
This demo implements the core features found in Latta’s and nVidia’s demos. During
implementation however it was noticed that the demo was only capable of rendering
approximately 360,000 particles before it reached 10 FPS, as opposed to the one
million particles rendered in Latta’s and nVidia’s demos. This performance deficit
can be attributed to the position texture being read back from the graphics card.
Latta and nVidia are able to prevent this read-back by using a feature known as
“Super buffers” or “Uber buffers”.
Super buffers allow the programmer to specifically define an area of video memory
for a given task (Mace, 2003). This means that the pixel shader can write directly to
an area of video memory which can then be cast to a vertex array for direct use
during the rendering step. Super buffers mean that no read-back occurs and hence
Latta and nVidia can update and render roughly three times as many particles as
this demo is able to.
Unfortunately DirectX, and hence XNA, does not support super buffers and as such
this demo is unable to perform at the same levels as Latta’s and nVidia’s demos.
However, an alternative does exist that is also available to DirectX and XNA and is
demonstrated in the next section of this document.
4.5 Dynamic GPU Based Particle System with Vertex Textures
Vertex textures as researched in section 3.2.1 of this document are a new
technology and are perfectly suited to the requirements of GPU based particle
systems. By performing texture lookups directly in the vertex shader the expensive
read-backs can be avoided.
The particle updates are carried out in exactly the same way as the previous demo
except the position texture instead of being read-back is passed directly to a
different rendering shader. The actual texture lookup in the vertex shader is carried
out with the line:
float4 pos = tex2Dlod(PositionsTex, float4(PositionCoord,0,0));
The PositionCoord values are defined per-vertex at the initialisation phase. They are
arranged so that each vertex maps to an individual pixel on the texture. As the
vertex positions are dynamically calculated on the GPU each frame, the vertex buffer
never needs updating and hence never has to be resent to the GPU.
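One way the per-vertex coordinates could be generated at initialisation is sketched below. It assumes one vertex per texel and that lookups should land on texel centres, hence the half-texel offset; whether that offset is required depends on the sampling conventions of the API in use. The function name `position_coords` is hypothetical.

```python
def position_coords(width, height):
    """Return one (u, v) texture coordinate per texel, in row-major order."""
    coords = []
    for y in range(height):
        for x in range(width):
            # Aim at the centre of texel (x, y) in normalised [0, 1] space.
            coords.append(((x + 0.5) / width, (y + 0.5) / height))
    return coords


# A 2x2 texture holds four particles, so four vertices are needed.
coords = position_coords(2, 2)
print(coords)
```

These coordinates are written into the vertex buffer once; each frame the vertex shader uses them to fetch the particle's current position from the position texture.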
Figure 13 - GPU Based Dynamic Particle System Rendering 450,000 Particles at
45FPS
As figure 13 demonstrates, using vertex textures has boosted the frame rate to
45FPS with almost half a million particles being updated and rendered at the same
time. Unfortunately the testing system is unable to achieve any higher number of
particles than that, as the 6800GT graphics card doesn’t support vertex textures large
enough to render over 450,000 particles.
In section 5 of this document a full analysis is conducted and graphs plotted to
predict the FPS the demo should reach running one million particles on the test
system. When tested on other, more recent hardware, however, the demo is able to
render one million particles at high frame rates.
4.6 Dynamic GPU with Vertex Textures and Forces
Up until now the demos have been an investigation into the technology that can
provide increasingly efficient particle systems, with little thought given to their
application in the real world. The final demo attempts to rectify this by introducing
forces.
Until this point each particle system demo has applied only the bare minimum of
calculations in the update passes to create a particle fountain effect: gravity is added
to velocity, which is then applied to the position of each particle. For the final demo,
however, forces were introduced, which created a more complex update procedure.
HLSL supports arrays of non-texture variables, and these can be reassigned every
frame. In this demo the user is able to position the forces in the world, so the force
positions are passed to the shader in an array. Before the velocity is applied to the
position of the particle, forces are applied to the velocity in a loop:
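// Accumulate the effect of every force on this particle's velocity.
for (int i = 0; i < numForces; i++)
{
    // Vector from the particle to the force.
    float3 diff = ForcePositions[i] - Out.Position.xyz;
    // The squared length avoids the square root needed for a true length.
    float lenSqr = vecLenSquare(diff);
    float inv = 1 / lenSqr;
    // Scale by the inverse squared length and the user defined modifier.
    diff *= inv * ForceModifiers[i];
    Out.Velocity.xyz += diff;
}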
First the vector from the current particle to each force is calculated, followed by its
squared length. vecLenSquare is a custom function, created because HLSL doesn’t
provide a dedicated squared length function and calculating the true length of a
vector is expensive as it requires a square root calculation. The squared length is
then inverted and multiplied by the user defined force modifier. The force modifier
is a floating point value that can be either negative or positive. Negative numbers
repel particles whereas positive numbers attract them.
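The same loop can be mirrored on the CPU to see how the numbers behave. The sketch below is a Python transcription of the shader loop rather than the shader itself; the function name `apply_forces` is hypothetical, and the zero-length guard is an addition for the sketch that the shader loop does not have.

```python
def apply_forces(position, velocity, force_positions, force_modifiers):
    """Return the velocity after every force has acted on one particle."""
    vx, vy, vz = velocity
    for (fx, fy, fz), modifier in zip(force_positions, force_modifiers):
        # Vector from the particle to the force.
        dx, dy, dz = fx - position[0], fy - position[1], fz - position[2]
        len_sqr = dx * dx + dy * dy + dz * dz
        if len_sqr == 0.0:
            continue  # Guard added here; a particle exactly on a force is skipped.
        # Divide by the squared length and apply the signed modifier.
        scale = modifier / len_sqr
        vx += dx * scale
        vy += dy * scale
        vz += dz * scale
    return (vx, vy, vz)


origin = (0.0, 0.0, 0.0)
at_rest = (0.0, 0.0, 0.0)
attracted = apply_forces(origin, at_rest, [(2.0, 0.0, 0.0)], [1.0])
repelled = apply_forces(origin, at_rest, [(2.0, 0.0, 0.0)], [-1.0])
print(attracted, repelled)
```

Because the unnormalised difference vector is divided by its squared length, the size of the velocity change falls off with the inverse of the distance: a force two units away with a modifier of 1 changes the velocity by 0.5 units.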
This demo has also added extra rendering calculations to the previous one. Using
controls on the left hand side of the application the user can change the rendering
algorithm and the colours used.
The result of all this added functionality is a rather spectacular display of hundreds of
thousands of particles orbiting or being repelled from forces (see figure 14).
Figure 14 - Complex Force Interactions with Hundreds of Thousands of Particles
5.0 Analysis
Each particle system implemented in section 4 of this document used a different
method. Although a brief comparison of performance was conducted, it wasn’t very
strict and doesn’t give an accurate performance measure.
This section tests each demo in turn and compares it against the others to get
accurate figures of how much benefit one method has over another.
5.1 The testing method
Each method will be tested in turn. The number of particles will be increased in
increments and the average FPS recorded.
Some of the demos implement different particle effects (demo 1 is a cube whereas
demos 2, 3 and 4 are fountains). Because of this, alpha blending will be disabled for
the tests and the distance perspective option will be turned off. This ensures that
each particle drawn takes the same amount of time to render, so that only the update
and the various inefficiencies of the different methods are being tested and the
differences in the shape of the particle system are ignored.
5.2 Data
CPU Static
#Particles    10000   119806   302887   408510   605673   809878  1000000
FPS             875      711      216      152      100       74       60

CPU Dynamic
#Particles    10000    49391    77557   105723   162056   211346   302887
FPS             133       32       20       16       11        8        6

GPU Dynamic
#Particles    10000    49391   126848   169097   246554   316970   394427   507092
FPS             226       64       31        -       15       12       10        8

GPU Dynamic Vertex Textures
#Particles    10000   119806   211346   309928   408510   450759
FPS             760      219      128       68       50       45

GPU Forces (one colour)
#Particles    10000   105723   204305   309928   457801
FPS             615      203      101       55       33

GPU Forces (based on velocity)
#Particles    10000    98682   204305   309928   450759
FPS             615      240       86       74       32

Two readings could not be placed: no FPS value survives for GPU Dynamic at 169097
particles, and a final pair (41551 particles, 4 FPS) is detached from its series.
5.3 Graph
The graph below was constructed from the gathered results. Results with frame rates
higher than 250 FPS and particle counts larger than 600,000 have been omitted
for clarity.
Figure 15 - Graph Plotting Particle System Performance
6.0 Conclusion
The results gathered clearly demonstrate what was suspected. The slowest particle
system is the one in which all the updates are carried out on the CPU; it is only able
to render about 170,000 particles before the frame rate drops below 10 FPS. The
next fastest is the system that updates the particles on the GPU, which is able to render
about 390,000 particles before the frame rate reaches 10 FPS. The next three do not
suffer from any read-back because they are implemented using vertex textures, and as
such reach much higher frame rates of about 40 FPS at 450,000 particles. The final and
fastest particle system is the static one, which reaches 60 FPS with 1,000,000 particles.
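The 10 FPS thresholds quoted above can be estimated from the section 5.2 readings by
linear interpolation between neighbouring samples. The sketch below is an illustration
only: the function name is hypothetical, the pairing of particle counts with FPS values
is as read from the data listing, and the GPU Dynamic reading with no recorded FPS is
left out.

```python
def particles_at_fps(samples, target_fps):
    """Estimate the particle count at which the frame rate falls to target_fps.

    samples is a list of (particle_count, fps) pairs ordered by increasing
    particle count. Returns None if the target is never crossed.
    """
    for (n0, f0), (n1, f1) in zip(samples, samples[1:]):
        if f0 >= target_fps >= f1 and f0 != f1:
            # Linear interpolation between the two neighbouring readings.
            t = (f0 - target_fps) / (f0 - f1)
            return n0 + t * (n1 - n0)
    return None


# Readings taken from section 5.2 (pairing assumed from the listing order).
CPU_DYNAMIC = [(10000, 133), (49391, 32), (77557, 20), (105723, 16),
               (162056, 11), (211346, 8), (302887, 6)]
GPU_DYNAMIC = [(10000, 226), (49391, 64), (126848, 31), (246554, 15),
               (316970, 12), (394427, 10), (507092, 8)]

if __name__ == "__main__":
    print(round(particles_at_fps(CPU_DYNAMIC, 10)))
    print(round(particles_at_fps(GPU_DYNAMIC, 10)))
```

For the CPU system the crossing falls between the 162,056 and 211,346 particle readings,
and for the GPU system it coincides with the 394,427 particle reading.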
As mentioned previously, the testing system cannot render more than about
450,000 particles using vertex textures, so results cannot be taken up to
higher numbers of particles. Despite this, it is possible to draw a line of best fit through
the available results and extrapolate expected figures at higher numbers of particles
(see Appendix C). Doing this suggests that on the test system the particle
system implemented with vertex textures should be able to reach about 28 FPS at
1,000,000 particles.
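One way to carry out such an extrapolation is sketched below. The kind of fit used for
Appendix C is not stated, so this example makes its own assumption: that frame time
(1 / FPS) grows linearly with the number of particles, fitted by ordinary least squares.
The function names are hypothetical and, because the model is an assumption, the
figure it produces should not be expected to match the 28 FPS quoted above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_fps(samples, particle_count):
    """Predict FPS at particle_count from (particle_count, fps) samples.

    Assumption: seconds per frame is a linear function of particle count.
    """
    counts = [n for n, _ in samples]
    frame_times = [1.0 / fps for _, fps in samples]
    slope, intercept = fit_line(counts, frame_times)
    return 1.0 / (slope * particle_count + intercept)


# GPU Dynamic Vertex Textures readings from section 5.2.
VERTEX_TEXTURES = [(10000, 760), (119806, 219), (211346, 128),
                   (309928, 68), (408510, 50), (450759, 45)]

if __name__ == "__main__":
    print(round(extrapolate_fps(VERTEX_TEXTURES, 1000000), 1))
```

Fitting frame time rather than FPS keeps the prediction positive at large particle
counts, whereas a straight line through the FPS values themselves would eventually
cross zero.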
This value is quite surprising considering Latta and nVidia were only able to reach
10 FPS at 1,000,000 dynamic particles using C++ and OpenGL with super buffers. One
reason that may account for this increase is that the implementation
discussed in this paper does not use depth sorting to arrange the particles, unlike
those of nVidia and Latta. Had depth sorting been implemented, however, it is unlikely
that it would have affected the frame rate enough to reduce it to 10 FPS.
7.0 Appendix
7.1 Appendix A – Test System Specs
Operating System
Windows XP Professional Service Pack 2 (build 2600)
Processor
2.27 gigahertz AMD Athlon 64
128 kilobyte primary memory cache
512 kilobyte secondary memory cache
Main Circuit Board
Board: ASUSTeK Computer INC. A8N-SLI DELUXE 1.XX
Serial Number: 123456789000
Bus Clock: 206 megahertz
BIOS: Phoenix Technologies, LTD ASUS A8N-SLI DELUXE
ACPI BIOS Revision 1014 09/27/2005
Memory Modules
2048 Megabytes Installed Memory
Slot 'A0' has 512 MB
Slot 'A1' has 512 MB
Slot 'A2' has 512 MB
Slot 'A3' has 512 MB
7.2 Appendix B - Speed Comparisons of Java, C# and C++
The above graph was taken from the Tommti-Systems (Tommti-Systems, 2006)
website and shows performance in milliseconds.
Maximum memory usage: Java - 163 MB, C# - 111 MB, C++ - 98 MB
To summarize the table, Java gets 5 wins against C# and C# gets 9 wins against
Java. C++ is the fastest overall with a total of 11 wins against C#.
7.3 Appendix C – Best Fit Graph
8.0 References
Dudash, B. (2004). Next Generation Shading and Rendering. Retrieved 02 07, 2007,
from nVidia Developer:
http://download.nvidia.com/developer/presentations/2004/Iron_Developer/English_Advanced_Shading.pdf
Gerasimov, P., Fernando, R., & Green, S. (2004, 06 04). Vertex Textures. Retrieved
02 07, 2007, from nVidia Developer:
ftp://download.nvidia.com/developer/Papers/2004/Vertex_Textures/Vertex_Textures.pdf
Kruger, J., Kipfer, P., Kondratieva, P., & Westermann, R. (2005, 11). A Particle
System for Interactive Visualization of 3D Flows. Retrieved 02 15, 2007, from IEEE
Transactions on Visualization and Computer Graphics:
http://wwwcg.in.tum.de/Research/data/Publications/tvcg05.pdf
Latta, L. (2004). Building a Million Particle System. Retrieved 02 06, 2007, from
2LDigital: http://www.2ld.de/gdc2004/
Latta, L. (2004, 07 28). Building a Million-Particle System. Retrieved 02 06, 2007,
from Gamasutra: http://www.gamasutra.com/features/20040728/latta_01.shtml
Mace, R. (2003). OpenGL ARB Superbuffers. Retrieved 02 15, 2007, from nVidia
Developer:
http://developer.nvidia.com/docs/IO/8230/GDC2003_OGL_ARBSuperbuffers.pdf
nVidia. (Unknown). GPU Particles. Retrieved 02 06, 2007, from nVidia Samples:
http://download.nvidia.com/developer/SDK/Individual_Samples/samples.html#gpu_particles
Percy, J. (2003). OpenGL Extensions. Retrieved 02 06, 2007, from ATI:
http://www.ati.com/developer/SIGGRAPH03/Percy_OpenGL_Extensions_SIG03.pdf
Potiy, O. A. (2005). 3D Flow visualization using GPU-driven particle system.
Retrieved 02 15, 2007, from Graphicon:
http://www.graphicon.ru/proceedings2005/papers/Potiy.pdf
Williams, I., & Heart, E. (2005, 06 01). Efficient rendering of geometric data using
OpenGL VBOs in SPECviewperf. Retrieved 02 07, 2007, from Standard Performance
Evaluation Corporation: http://www.spec.org/gpc/opc.static/vbo_whitepaper.html
Zeller, C. (2005, 06). GPU Cloth. Retrieved 02 07, 2007, from nVidia Developer:
http://download.nvidia.com/developer/SDK/Individual_Samples/featured_samples.html#Cloth