Real-Time State Preserving Particle Systems Simulated on the GPU
Michael Cann (n0355877)

This report is submitted as part requirement for the BSc Degree in Computer Games Programming at The University of Huddersfield. It is substantially the result of my own work except where explicitly indicated in the text. The report may be freely copied and distributed provided the source is explicitly acknowledged.

1.0 Abstract
Particle effects are everywhere. The movie, TV and games industries make heavy use of particle effects, either to add realism or simply to impress the audience with explosive visuals. Particle effects are used extensively, whether to add smoke to the barrel of the hero’s gun in a movie or to spray blood in every direction after a satisfying kill in a game.

Until recently, particle effects have been limited to being computed on the Central Processing Unit (CPU) of a computer. Computing particle movement on the CPU is calculation intensive. As a result, it often limits the number of particles that can be simulated at any one time. The advent of the programmable pipeline on modern Graphics Processing Units (GPUs) opened up a whole host of new opportunities for increasing performance. For the first time, particle effects could be calculated entirely on the graphics card, leaving the CPU free to handle other tasks.

Even more recently, the ability to perform texture lookups in the vertex processing pipeline has expanded the capability of GPU based particle systems. Texture lookups in the vertex shader allow a much more streamlined pipeline and greatly improve the performance of particle systems.

Contents
1.0 Abstract
2.0 Introduction
3.0 Research
3.1 Existing Research on GPU Particles
3.1.1 Building a Million Particle System (Lutz Latta)
3.1.2 GPU Particles (nVidia)
3.1.3 ParticlesGS (Microsoft)
3.2 Specific technological research
3.2.1 Vertex Textures
4.0 Implementation
4.1 Form and Controls
4.1.1 Generic Controls
4.2 Static Particle System
4.3 Dynamic CPU Based Particle System
4.4 Dynamic GPU Based Particle System
4.5 Dynamic GPU Based Particle System with Vertex Textures
4.6 Dynamic GPU with Vertex Textures and Forces
5.0 Analysis
5.1 The testing method
5.2 Data
5.3 Graph
6.0 Conclusion
7.0 Appendix
7.1 Appendix A – Test System Specs
7.2 Appendix B - Speed Comparisons of Java, C# and C++
7.3 Appendix C – Best Fit Graph
8.0 References

Table of Figures
Figure 1 - A Screenshot from Latta's Implementation
Figure 2 - The Alternative Representation of a Texture
Figure 3 - Diagram showing the Update Process (Kruger, Kipfer, Kondratieva, & Westermann, 2005)
Figure 4 - GPU Particles
Figure 5 - Particle GS Screenshot
Figure 6 - A CPU based particle flow
Figure 7 - A GPU Based Particle System with Read-Back
Figure 8 - A GPU Based Particle System Using Vertex Textures
Figure 9 - XNAGPUParticles Application Form
Figure 10 - 1,000,000 Static Particles at 80FPS
Figure 11 - CPU Based Dynamic Particle System Rendering 160,000 Particles at 10FPS
Figure 12 - GPU Based Dynamic Particle System Rendering 360,000 Particles at 10FPS
Figure 13 - GPU Based Dynamic Particle System Rendering 450,000 Particles at 45FPS
Figure 14 - Complex Force Interactions with Hundreds of Thousands of Particles
Figure 15 - Graph Plotting Particle System Performance

2.0 Introduction
This paper is a continuation of the research paper “Stateless Particle Systems in HLSL with RenderMonkey”, which I wrote in late 2006 (Cann, 2006). The previous paper covered only stateless particle systems, which could easily be simulated within the confines of RenderMonkey. This paper covers the broader concepts of GPU particle systems, in particular state preserving particle systems.

GPU programming is a modern technology, and simulating particle systems purely on the GPU is an even more recent one. This paper will first explore the methods currently employed to simulate and render state preserving particle systems on the GPU. Once relevant research into existing technologies has been carried out, a method for constructing a state preserving particle system will be designed. The design will then be implemented and a log of the implementation process recorded.
Once the implementation has been completed, the results will be analysed and conclusions drawn.

3.0 Research
Before work can begin on implementing a state preserving GPU particle system, research must be carried out to determine the most suitable and efficient implementation method.

3.1 Existing Research on GPU Particles
As stated previously, simulating state preserving particle systems on the GPU is a recent development. This section examines the research that has already been carried out in this field and the methods that were used.

3.1.1 Building a Million Particle System (Lutz Latta)
“Building a Million Particle System” is a paper published by Lutz Latta for the “Game Developers Conference” in 2004 (Latta, Building a Million Particle System, 2004). The article was subsequently presented at the “Graphics Hardware 2004” conference. It was also re-published in a July 2004 Gamasutra article of the same name (Latta, Building a Million-Particle System, 2004).

The paper was one of the first to demonstrate that state preserving particle systems could be implemented purely on the GPU. As the title suggests, the implementation is able to simulate a very large number of particles dynamically (1 million particles on a 6800GT at 12FPS).

Figure 1 - A Screenshot from Latta's Implementation

Latta describes a method for preserving the state of the particles during each iteration of the program loop so that dynamic forces can be applied. See my previous paper for more information on the differences between stateless and state preserving particle systems. His method involves storing certain properties of each particle in a number of textures.

A texture is made up of a two dimensional array of pixels.
Each pixel stores a single colour value, and each colour value is made up of three separate channels: red, green and blue. Latta proposed that a texture could represent an array of particles instead of an array of pixels, with each pixel corresponding to a single particle. In that model, a pixel no longer holds three values that represent a colour. Instead it holds three values that represent the position or velocity of a particle.

Figure 2 - The Alternative Representation of a Texture

Standard textures use 8 bits per colour channel, which gives 256 possible values (0 to 255) for each of the red, green and blue channels. This is adequate for colours. When the three components are combined there are 256 × 256 × 256 = 16,777,216 possible colours, which is plenty to represent a realistic image. The situation changes when each channel represents a coordinate along one axis. In that case, 256 possible positions in each of the x, y and z directions is clearly not enough to create a realistic particle effect.

To solve this problem Latta exploits a relatively new technology: the floating point programmable pixel pipeline. As the name suggests, the floating point pipeline allows floating point textures to be used within the programmable pixel pipeline (pixel shaders). Floating point textures use 32 bits per colour channel, as opposed to 8 in standard textures, so they can store a much larger range of possible positions for each particle. 32 bits give 2^32 = 4,294,967,296 distinct bit patterns per channel, or 2^96 (roughly 7.9 × 10^28) combinations across three channels. Because the values are floating point they are not evenly spaced. They still provide far greater range and precision than 8-bit channels, which results in a much more accurate particle simulation.

Floating point textures can then be processed as normal within a pixel shader. Standard particle calculations, such as updating a particle’s position, can be performed using texture lookups.
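The precision argument above can be checked with a small CPU-side experiment. The Python sketch below is illustrative only and is not taken from Latta's work or from this project. It assumes an 8-bit channel mapped linearly onto a coordinate range of [-1, 1]; both the range and the function names quantise_8bit and round_trip_float32 are hypothetical. It uses the standard struct module to round-trip a value through a 32-bit float.

```python
import struct

def quantise_8bit(x, lo=-1.0, hi=1.0):
    """Store x in an 8-bit channel (256 levels spread over [lo, hi]) and read it back."""
    level = round((x - lo) / (hi - lo) * 255)  # nearest of the levels 0..255
    level = max(0, min(255, level))            # clamp values outside [lo, hi]
    return lo + level / 255 * (hi - lo)

def round_trip_float32(x):
    """Store x in a 32-bit float channel and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

if __name__ == "__main__":
    x = 0.123456789  # an arbitrary particle coordinate
    print(f"8-bit error:   {abs(quantise_8bit(x) - x):.1e}")
    print(f"float32 error: {abs(round_trip_float32(x) - x):.1e}")
```

With a [-1, 1] range, an 8-bit channel can be off by up to half a step (1/255, about 0.004 units). The 32-bit float round trip of the same coordinate is accurate to better than 10^-8.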
The new positions are then written out as floating point colour values to a renderable floating point texture.

Figure 3 - Diagram showing the Update Process (Kruger, Kipfer, Kondratieva, & Westermann, 2005)

Once all the particles have been updated they need to be drawn to the screen. The traditional approach would be to take the output texture and render a particle to the screen for each pixel, using either point sprites or screen aligned quads (see my previous paper on the differences between quads and point sprites). This is very costly, as it requires a great deal of bandwidth between the graphics card and the CPU while textures are copied back and forth.

Latta proposes a better solution in which the output of the particle update stage is passed directly to a vertex shader, which renders the resulting stream of vertices as particles. Latta describes “Uber-buffers” (also known as “Super-buffers” (Percy, 2003)) as allowing accelerated asynchronous data copying within graphics memory. At the time Latta wrote the paper, Uber-buffers were available only in the OpenGL API and not in the DirectX API.

3.1.2 GPU Particles (nVidia)
‘GPU Particles’ is a demo written by nVidia that was “inspired by Lutz Latta's talk” (nVidia, Unknown). For the most part nVidia use the same techniques as Latta for their “million particle” demo. One key feature that nVidia add is motion blurring, which creates a more realistic impression of rapidly moving particles.

Figure 4 - GPU Particles

nVidia achieve motion blurring by rendering each point sprite several times, each time at a lower alpha (transparency) value. The result is an impression of fast moving particles. Performance does, however, take a hit, as each particle is rendered several times.
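The multi-render motion blur described above can be sketched as generating several copies of each particle, each of which would be drawn as a separate point sprite. This Python sketch is an assumption rather than nVidia's code. The source only states that each sprite is rendered several times at lowered alpha. The following are therefore all hypothetical choices: spreading the copies back along the velocity over one frame, the linear alpha fall-off, and the name blur_copies.

```python
def blur_copies(position, velocity, copies=4, dt=1.0 / 60.0, base_alpha=1.0):
    """Return (position, alpha) pairs for drawing one particle several times.

    Hypothetical scheme: copy k is placed k/copies of a frame (dt) back along the
    particle's velocity and drawn with a linearly reduced alpha value.
    """
    result = []
    for k in range(copies):
        t = k / copies                         # 0.0 for the current position
        pos = tuple(p - v * dt * t for p, v in zip(position, velocity))
        alpha = base_alpha * (1.0 - t)         # older copies are more transparent
        result.append((pos, alpha))
    return result

if __name__ == "__main__":
    for pos, alpha in blur_copies((0.0, 1.0, 0.0), (60.0, 0.0, 0.0)):
        print(pos, alpha)
```

Every copy is an extra sprite, so the rendering cost grows linearly with copies. This is the performance hit noted above.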
The nVidia demo also claims to use Multiple Render Targets (MRT) to improve performance by condensing the particle updates into a single shader pass. In theory this should provide a worthwhile improvement over the multiple-pass update method employed in Latta’s demo. On closer inspection of the source code, however, it becomes evident that this feature has been commented out for an undetermined reason.

3.1.3 ParticlesGS (Microsoft)
Microsoft’s implementation of a GPU based particle system is the most recent of the three and uses Microsoft’s new DirectX 10 API. DirectX 10 has many advantages over its predecessors. One new feature, the geometry shader, is particularly valuable when creating state preserving particle effects. The geometry shader allows a programmer to create and destroy a variable number of primitives directly on the GPU. This means that all the updates, births and deaths involved in a particle system can be carried out on the GPU without CPU intervention.

Figure 5 - Particle GS Screenshot

ParticlesGS takes advantage of this new feature to create impressive particle fireworks. The demo does not aim to draw millions of particles on the screen. It is, however, efficient and potentially capable of handling a much higher number of particles. Unfortunately, the demo cannot be run, and hence experimented with, without Microsoft’s new Vista operating system and a DirectX 10 compatible graphics card. Despite this, the geometry shader is a powerful new tool for state preserving particle systems and should be investigated further in the future.

3.2 Specific technological research
3.2.1 Vertex Textures
Latta and nVidia created their particle demos during 2004, just as Shader Model 3.0 was being released. At that time hardware did not support all of the features that the new shader model brought with it.
Since then, many advances have been made towards a unified shader architecture (such as that found in DirectX 10). One of the key advances was the introduction of vertex textures. As the name implies, vertex textures allow the vertex shader to perform texture lookups. This is an important technology for particle systems that are simulated and rendered purely on the GPU, because it avoids costly read-back.

A typical particle system, in which all the simulation is carried out on the CPU and the rendering on the GPU, can be seen as:

Figure 6 - A CPU based particle flow

A system where the particle updates are carried out on the GPU can be seen as:

Figure 7 - A GPU Based Particle System with Read-Back

Read-back occurs once the pixel shader has finished calculating the new positions of the particles. At that point the resulting texture is passed back to the CPU so that a vertex buffer can be prepared for rendering in a separate pass. Read-back is undesirable for two reasons: bandwidth in the GPU to CPU direction is low, and the CPU has to wait for the GPU to finish before it can use the data.

With vertex textures, texture lookups can be carried out in the vertex shader, so the GPU does not have to return the output texture from the particle update. Instead, the texture can be passed directly to the vertex shader, where the particle positions are mapped onto a vertex buffer. The new flow looks like:

Figure 8 - A GPU Based Particle System Using Vertex Textures

By avoiding read-back, the performance of the system should increase considerably. The results are presented in the analysis in section 5.
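To give a sense of the data volumes involved in read-back, the following Python sketch estimates the extra traffic per frame. It makes a hypothetical assumption: each frame, the whole position texture is copied to the CPU and then re-uploaded as a vertex buffer of the same size. The texture is taken to hold four 32-bit floats per particle, as in the 128-bit textures of section 4.4. Real vertex formats and driver behaviour will differ, and BYTES_PER_PARTICLE and readback_bytes_per_frame are illustrative names.

```python
BYTES_PER_PARTICLE = 4 * 4  # RGBA texel, 32-bit float per channel (the 128-bit format of section 4.4)

def readback_bytes_per_frame(num_particles):
    """Bytes moved per frame if the position texture is copied GPU -> CPU and the
    resulting vertex data (assumed to be the same size) is copied CPU -> GPU again."""
    one_way = num_particles * BYTES_PER_PARTICLE
    return 2 * one_way

if __name__ == "__main__":
    n, fps = 360_000, 10
    per_frame = readback_bytes_per_frame(n)
    print(f"{per_frame / 1e6:.2f} MB per frame, {per_frame * fps / 1e6:.1f} MB/s at {fps} FPS")
```

For the roughly 360,000 particles reached in section 4.4, this comes to 11.52 MB per frame (5.76 MB in each direction), or about 115 MB/s at 10FPS. With vertex textures, none of this data leaves video memory.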
4.0 Implementation
This section details the implementation of the GPU particle demos. The implementation is broken down into five steps that progressively increase in technical complexity. The demos were implemented this way so that they can be compared against one another and the differences in performance analysed (the full analysis can be found in section 5.0 of this document).

The demos were written in C# using the XNA Framework, which is built on DirectX 9. C# is an excellent language for rapidly prototyping new features and generating demos. C# is compiled to an intermediate language and JIT-compiled at run time rather than compiled directly to native code. Even so, its performance is not far short of that of more traditional languages such as C++ (see appendix B for a speed comparison between C++, C# and Java). This paper focuses on research into techniques for simulating and rendering particle systems on the GPU. As such, the performance cost of using C# rather than C++ should not be an issue, as the majority of the calculations are carried out on the GPU.

4.1 Form and Controls
Before developing the particle systems, an environment must be constructed so that the properties of the demos can easily be modified and the results observed. Another reason for creating the demos in XNA is that it integrates seamlessly with C# and .NET 2.0 components.

Figure 9 - XNAGPUParticles Application Form

Figure 9 shows the interface created with C# and XNA. Controls on the left hand side of the window allow the user to modify properties and values and see the changes reflected in the render in the centre of the screen. The controls have been split into two categories: those that are generic to all demos and those that are specific to the currently active demo.
4.1.1 Generic Controls
The first three generic controls (shown in figure 9) are fairly self explanatory. Several textures have been included with the demos for testing; these can be selected from a drop-down list. A slider increases or decreases the number of particles from a minimum of 1,000 to a maximum of 1,000,000. A second slider sets the size of individual particles.

The remaining three controls are checkboxes that turn properties on or off. The first checkbox sets whether distance perspective is taken into consideration when rendering. If it is enabled, a different rendering pass is used that calculates the size of each particle based on its distance from the camera. An inverse square ratio is used to make the effect more pronounced. The second checkbox defines whether multiple render targets are used when updating dynamic particle systems. If it is enabled, the demos use different techniques when updating the particles on the GPU so that they can render to multiple textures simultaneously. The final checkbox turns alpha blending on or off. When it is enabled, each particle is blended against the current scene as it is rendered. This process can be intensive on the GPU, so turning it off should increase the performance of a particle system.

4.2 Static Particle System
The static particle system is the most basic of the particle systems. It involves filling a large vertex buffer with random positions and then passing that data to the GPU for rendering as point sprites. This effect is very simple: it has no update passes and the vertex data never changes. As such, the vertex buffer only needs to be sent to the graphics card once, in the initialisation phase.
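The one-off initialisation of the static system can be sketched in Python as follows. The cube-shaped volume follows the note in section 5.1 that the static demo is a cube. The function name, the cube size and the use of Python's random module are assumptions for illustration. The real demo copies its data into an XNA vertex buffer rather than a Python list.

```python
import random

def make_static_vertices(num_particles, half_size=1.0, seed=None):
    """Build the static system's vertex data once, at initialisation.

    Each vertex is a random (x, y, z) position inside a cube of side 2 * half_size.
    Nothing modifies the list afterwards, so it only needs to be uploaded once.
    """
    rng = random.Random(seed)
    return [
        (rng.uniform(-half_size, half_size),
         rng.uniform(-half_size, half_size),
         rng.uniform(-half_size, half_size))
        for _ in range(num_particles)
    ]

if __name__ == "__main__":
    vertices = make_static_vertices(10_000, seed=1)  # built once, never updated
    print(len(vertices), vertices[0])
```

Because nothing modifies this data after it is built, each frame costs only the rendering itself. This is why the static system is the fastest of the five demos.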
The result of this simplicity is that the effect is very fast, capable of rendering one million particles at 80 frames per second (FPS).

Figure 10 - 1,000,000 Static Particles at 80FPS

4.3 Dynamic CPU Based Particle System
The CPU based dynamic system is the most commonly used type of particle system in games. As such, it serves as an important benchmark for the more advanced techniques explored later on. In the CPU based particle system, individual particle states are stored in a large array. Each frame, the CPU updates their positions and other properties. Once updated, the properties are used to construct vertices, which are then passed to the GPU for rendering as point sprites.

The bottleneck for this type of particle system is the CPU, where many calculations have to be carried out to update the particles sequentially. As a result, the demo is only capable of rendering 160,000 particles before the frame rate drops below 10FPS.

Figure 11 - CPU Based Dynamic Particle System Rendering 160,000 Particles at 10FPS

4.4 Dynamic GPU Based Particle System
The third particle system is the first to use the GPU as a general purpose processor. Particle positions and velocities are updated in parallel in pixel shaders on the GPU, rather than sequentially on the CPU.

Three separate textures are used to store the states of the individual particles. Each texture is a 128-bit floating point texture with 32 bits for each colour channel. One texture represents the positions of the particles. Its red (R), green (G) and blue (B) components translate directly to the X, Y and Z components of the particle’s position.
The final alpha (A) component of this texture is used as a time step modifier, like the one described by Kruger et al. (Kruger, Kipfer, Kondratieva, & Westermann, 2005). It controls when individual particles spawn so that they do not all spawn at once.

The second texture represents the velocities of the individual particles. As with the position texture, the RGB components hold the XYZ values of a particle’s velocity, and the alpha component of this texture is unused.

The final texture holds the starting velocities of the particles. It is needed so that particle births and deaths can be calculated purely on the GPU. Without this texture, the birth and death of individual particles would have to be handled on the CPU. As mentioned earlier, that would require the velocity texture to be read back from video memory to system memory, and performance would suffer. As with the other textures, the RGB components hold the XYZ starting velocities and the A component is unused.

During the update phase the textures are passed to a pixel shader, which outputs the new position and velocity. A texture cannot be read from and written to at the same time, so a double buffering technique is used. Two copies of the position and velocity textures are stored in video memory. While one copy is being read, the other is written to, and the two are swapped over the following frame.

This demo also implements the Multiple Render Target (MRT) feature that was purported to be in the nVidia demo. As mentioned in section 3.1.2, MRT allows the pixel shader to output two colours instead of one, so the positions and velocities can be updated in a single pass. Toggling this feature on and off, however, appears to have little effect on the frame rate.
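The update pass and double buffering described above can be emulated on the CPU to make the data flow explicit. The following Python sketch is not the project's shader code. It treats each texture as a list of RGBA tuples and updates position and velocity together, as in the MRT variant. It also assumes one possible meaning for the alpha channel: a negative age delays a particle's birth, and an age beyond a fixed lifetime triggers a respawn from the starting-velocity texture. GRAVITY, LIFETIME, EMITTER, update_texel and step are hypothetical names and values.

```python
GRAVITY = (0.0, -9.8, 0.0)  # hypothetical acceleration for the fountain
LIFETIME = 3.0              # hypothetical lifetime (seconds) before a particle respawns
EMITTER = (0.0, 0.0, 0.0)   # hypothetical emitter position

def update_texel(pos, vel, start_vel, dt):
    """Emulate the update pixel shader for one texel, i.e. one particle.

    pos = (x, y, z, age): a negative age means "not born yet", which staggers births.
    vel = (vx, vy, vz, unused); start_vel comes from the starting-velocity texture.
    """
    x, y, z, age = pos
    age += dt
    if age < 0.0:                          # not born yet: wait at the emitter
        return (*EMITTER, age), vel
    if age > LIFETIME:                     # died: respawn with the starting velocity
        return (*EMITTER, 0.0), (*start_vel[:3], 0.0)
    vx, vy, vz = (v + g * dt for v, g in zip(vel[:3], GRAVITY))
    return (x + vx * dt, y + vy * dt, z + vz * dt, age), (vx, vy, vz, 0.0)

def step(buffers, start_vel, dt):
    """One frame: read the front textures, write the back textures, then swap."""
    read_pos, read_vel = buffers["front"]
    write_pos, write_vel = buffers["back"]
    for i in range(len(read_pos)):         # the GPU processes all texels in parallel
        write_pos[i], write_vel[i] = update_texel(read_pos[i], read_vel[i], start_vel[i], dt)
    buffers["front"], buffers["back"] = buffers["back"], buffers["front"]
```

Each iteration of the loop in step reads only from the front textures and writes only its own texel of the back textures, so the iterations are independent. That independence lets the GPU process every texel in parallel. It is also why the two copies must be swapped rather than updated in place.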
Figure 12 - GPU Based Dynamic Particle System Rendering 360,000 Particles at 10FPS

This demo implements the core features found in Latta’s and nVidia’s demos. During implementation, however, it was noticed that the demo could render only approximately 360,000 particles before it dropped to 10FPS. Latta’s and nVidia’s demos, by contrast, render one million particles. This performance deficit can be attributed to the read-back of the position texture from the graphics card. Latta and nVidia are able to avoid read-back by using a feature known as “Super buffers” or “Uber buffers”. Super buffers allow the programmer to explicitly define an area of video memory for a given task (Mace, 2003). This means that the pixel shader can write directly to an area of video memory, which can then be reinterpreted as a vertex array for direct use during the rendering step. With super buffers no read-back occurs. Hence Latta and nVidia can update and render roughly three times as many particles as this demo.

Unfortunately DirectX, and hence XNA, does not support super buffers, and as such this demo cannot perform at the same level as Latta’s and nVidia’s demos. However, an alternative that is available to DirectX and XNA does exist; it is demonstrated in the next section of this document.

4.5 Dynamic GPU Based Particle System with Vertex Textures
Vertex textures, as researched in section 3.2.1 of this document, are a new technology and are well suited to the requirements of GPU based particle systems. By performing texture lookups directly in the vertex shader, the expensive read-backs can be avoided.
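The key to the vertex texture approach is that every vertex carries the texture coordinate of the texel that holds its particle. The Python sketch below emulates that mapping and the lookup on the CPU. The row-by-row layout, the texel-centre convention and the names texel_coord and fetch_point are assumptions for illustration, not the project's C# or HLSL code.

```python
def texel_coord(index, width, height):
    """(u, v) at the centre of the texel holding particle `index` (row-by-row layout)."""
    if not 0 <= index < width * height:
        raise IndexError("particle index outside the texture")
    col, row = index % width, index // width
    return ((col + 0.5) / width, (row + 0.5) / height)

def fetch_point(texture, width, height, u, v):
    """CPU stand-in for a point-sampled lookup at mip level 0."""
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return texture[row * width + col]

if __name__ == "__main__":
    w, h = 4, 3
    positions = [(float(i), 0.0, 0.0, 1.0) for i in range(w * h)]  # a tiny position texture
    coords = [texel_coord(i, w, h) for i in range(w * h)]          # computed once, at initialisation
    print([fetch_point(positions, w, h, u, v)[0] for u, v in coords])
```

The coordinates depend only on the particle's index, so they can be computed once when the vertex buffer is created. The buffer then never changes, while the texture it indexes is rewritten every frame.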
The particle updates are carried out in exactly the same way as in the previous demo, with one difference: instead of being read back, the position texture is passed directly to a separate rendering shader. The texture lookup in the vertex shader is carried out with the line:

    // Fetch this vertex's particle position from the position texture (mip level 0)
    float4 pos = tex2Dlod(PositionsTex, float4(PositionCoord, 0, 0));

The PositionCoord variable is defined per vertex during the initialisation phase. The coordinates are arranged so that each vertex maps to an individual pixel of the texture. Because the vertex positions are calculated dynamically on the GPU each frame, the vertex buffer never needs updating and hence never has to be resent to the GPU.

Figure 13 - GPU Based Dynamic Particle System Rendering 450,000 Particles at 45FPS

As figure 13 demonstrates, using vertex textures has boosted the frame rate to 45FPS with almost half a million particles being updated and rendered at the same time. Unfortunately the test system cannot go beyond this number of particles. Its 6800GT graphics card does not support vertex textures large enough to render more than 450,000 particles. Section 5 of this document conducts a full analysis and plots graphs to predict the frame rate the demo should reach with one million particles on the test system. When tested on other, more recent hardware, however, the demo is able to render one million particles at high frame rates.

4.6 Dynamic GPU with Vertex Textures and Forces
Up to this point the demos have investigated the technology behind increasingly efficient particle systems, with little thought given to their real-world application. The final demo attempts to rectify this by introducing forces. Until now, each particle system demo applied only the bare minimum of calculations in its update passes to create a particle fountain effect.
Gravity is added to the velocity, which is then applied to the position of each particle. For the final demo, however, forces were introduced, which required a more complex update procedure. HLSL supports arrays of (non-texture) variables, and these can be assigned dynamically each frame. In this demo the user is able to position the forces in the world, and such arrays allow the positions of a number of forces to be passed to the shader. Before the velocity is applied to the position of the particle, the forces are applied to the velocity in a loop:

    // Accumulate the pull (or push) of every user-placed force into the velocity
    for (int i = 0; i < numForces; i++)
    {
        float3 diff = ForcePositions[i] - Out.Position.xyz; // particle -> force
        float lenSqr = vecLenSquare(diff);                  // squared distance, no sqrt needed
        float inv = 1 / lenSqr;
        diff *= inv * ForceModifiers[i];                     // > 0 attracts, < 0 repels
        Out.Velocity.xyz += diff;
    }

First, the vector between the current particle and each force is calculated, followed by its squared length. vecLenSquare is a custom function, since HLSL has no dedicated squared-length intrinsic (dot(diff, diff) returns the same value). Calculating the true length of a vector is relatively expensive because it requires a square root. The squared length is then inverted and multiplied by the user defined force modifier. Because diff itself is not normalised, the resulting change in velocity has magnitude ForceModifiers[i] divided by the distance, so the pull falls off with the inverse of the distance. The force modifier is a floating point value that can be either negative or positive: negative values repel particles, whereas positive values attract them.

This demo also adds extra rendering calculations to the previous one. Using controls on the left hand side of the application, the user can change the rendering algorithm and the colours used. The result of all this added functionality is a rather spectacular display of hundreds of thousands of particles orbiting or being repelled by forces (see figure 14).
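For experimenting with force settings away from the GPU, the force loop can be mirrored in Python. This is a sketch rather than the project's code. apply_forces is a hypothetical name. The eps guard is an addition, because the shader as written divides by zero when a particle lands exactly on a force. As in the HLSL loop, no time step is applied.

```python
def apply_forces(position, velocity, force_positions, force_modifiers, eps=1e-6):
    """CPU mirror of the HLSL force loop; returns the updated (vx, vy, vz).

    Each force adds diff * modifier / |diff|^2: a vector towards the force (away
    from it if modifier < 0) whose length is |modifier| / distance.
    `eps` is an added guard, not in the shader, for a particle sitting on a force.
    """
    vx, vy, vz = velocity
    for (fx, fy, fz), modifier in zip(force_positions, force_modifiers):
        dx, dy, dz = fx - position[0], fy - position[1], fz - position[2]
        len_sqr = dx * dx + dy * dy + dz * dz  # what vecLenSquare / dot(diff, diff) computes
        scale = modifier / max(len_sqr, eps)
        vx, vy, vz = vx + dx * scale, vy + dy * scale, vz + dz * scale
    return (vx, vy, vz)

if __name__ == "__main__":
    # One attracting and one repelling force acting on a particle at the origin.
    print(apply_forces((0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                       [(2.0, 0.0, 0.0), (0.0, 4.0, 0.0)], [1.0, -2.0]))
```

For example, a force with modifier 1.0 placed two units away changes the velocity by 0.5 units towards it, which illustrates the inverse-distance fall-off.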
Figure 14 - Complex Force Interactions with Hundreds of Thousands of Particles

5.0 Analysis
Each particle system implemented in section 4 of this document used a different method. A brief comparison of performance was made there, but it was not rigorous and does not give an accurate performance measure. This section tests each demo in turn and compares it against the others to obtain accurate figures for how much benefit one method has over another.

5.1 The testing method
Each method will be tested in turn. The number of particles will be increased in increments and the average FPS recorded. Some of the demos implement different particle effects: demo 1 is a cube, whereas demos 2, 3 and 4 are fountains. Because of this, alpha blending will be disabled for the tests and the distance perspective option will be turned off. This ensures that each particle drawn takes the same amount of time to render. Only the update stage and the various inefficiencies of the different methods are then being tested, and differences in the shape of the particle systems are ignored.
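One way of recording the average FPS for each increment is sketched below in Python. The source does not state how the average was computed, so this is an assumption. average_fps divides the number of frames by their total duration, and measure_frame_times is a hypothetical hook standing in for the demos' own timing code.

```python
def average_fps(frame_times):
    """Average FPS over a run: frames rendered divided by the total time they took.

    Taking the mean of per-frame FPS values (1 / frame_time) instead would
    over-weight the fastest frames.
    """
    if not frame_times:
        raise ValueError("need at least one frame time")
    return len(frame_times) / sum(frame_times)

def sweep(particle_counts, measure_frame_times):
    """Record (particle count, average FPS) for each increment.

    measure_frame_times(count) is a hypothetical hook that renders a fixed number
    of frames with `count` particles and returns their durations in seconds.
    """
    return [(n, average_fps(measure_frame_times(n))) for n in particle_counts]

if __name__ == "__main__":
    def fake_frame_times(n):
        # Stand-in for real timing: frame time grows linearly with particle count.
        return [n / 1_000_000] * 100

    for n, fps in sweep([10_000, 100_000, 200_000], fake_frame_times):
        print(n, round(fps, 1))
```

For example, two frames taking 0.01s and 0.1s average to about 18FPS with this method. Averaging the instantaneous rates (100 and 10) would instead report 55FPS.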
5.2 Data

CPU Static
#Particles    10000   119806   302887   408510   605673   809878    1E+06
FPS             875      711      216      152      100       74       60

CPU Dynamic
#Particles    10000    49391    77557   105723   162056   211346   302887   41551?
FPS             133       32       20       16       11        8        6        4

GPU Dynamic
#Particles    10000    49391   126848   169097   246554   316970   394427   507092
FPS             226       64       31        ?       15       12       10        8

GPU Dynamic Vertex Textures
#Particles    10000   119806   211346   309928   408510   450759
FPS             760      219      128       68       50       45

GPU Forces (one colour)
#Particles    10000   105723   204305   309928   457801
FPS             615      203      101       55       33

GPU Forces (based on velocity)
#Particles    10000    98682   204305   309928   450759
FPS             615      240       86       74       32

5.3 Graph
The graph below was constructed from the gathered results. Results with frame rates above 250FPS or particle counts above 600,000 have been omitted for clarity.

Figure 15 - Graph Plotting Particle System Performance

6.0 Conclusion
The results gathered clearly confirm what was suspected. The slowest particle system is the one in which all the updates are carried out on the CPU. It can render only about 170,000 particles before the frame rate drops below 10FPS. The next fastest is the system that updates the particles on the GPU, which can render about 390,000 particles before the frame rate reaches 10FPS. The next three are implemented using vertex textures, so they do not suffer from any read-back. They reach much higher frame rates of roughly 30 to 45FPS at around 450,000 particles. The final and fastest particle system is the static one, which reaches 60FPS with 1,000,000 particles. As mentioned previously, the test system cannot render more than about 450,000 particles using vertex textures, so results cannot be gathered for higher numbers of particles.
Despite this limitation, it is possible to draw a line of best fit through the available results and extrapolate expected figures at higher particle counts (see Appendix C). Doing this suggests that, on the test system, the particle system implemented with vertex textures should be able to reach about 28FPS at 1,000,000 particles. This value is surprising considering that Latta and nVidia were only able to reach 10FPS at 1,000,000 dynamic particles using C++ and OpenGL with super buffers. One reason that may account for the difference is that the implementation discussed in this paper does not use depth sorting to order the particles, unlike those of nVidia and Latta. However, it seems unlikely that adding depth sorting would lower the frame rate enough to bring it down to 10FPS.

7.0 Appendix

7.1 Appendix A – Test System Specs

Operating System: Windows XP Professional Service Pack 2 (build 2600)
Processor: 2.27 gigahertz AMD Athlon 64, 128 kilobyte primary memory cache, 512 kilobyte secondary memory cache
Main Circuit Board: ASUSTeK Computer INC. A8N-SLI DELUXE 1.XX
    Serial Number: 123456789000
    Bus Clock: 206 megahertz
    BIOS: Phoenix Technologies, LTD ASUS A8N-SLI DELUXE ACPI BIOS Revision 1014 09/27/2005
Memory Modules: 2048 Megabytes Installed
    Slot 'A0' has 512 MB
    Slot 'A1' has 512 MB
    Slot 'A2' has 512 MB
    Slot 'A3' has 512 MB

7.2 Appendix B - Speed Comparisons of Java, C# and C++

The graph above was taken from the Tommti-Systems website (Tommti-Systems, 2006) and shows performance in milliseconds.

Maximum memory usage: Java - 163 MB, C# - 111 MB, C++ - 98 MB

To summarize the results, Java gets 5 wins against C# and C# gets 9 wins against Java.
C++ is the fastest overall, with a total of 11 wins against C#.

7.3 Appendix C – Best Fit Graph

8.0 References

Dudash, B. (2004). Next Generation Shading and Rendering. Retrieved 02 07, 2007, from nVidia Developer: http://download.nvidia.com/developer/presentations/2004/Iron_Developer/English_Advanced_Shading.pdf

Gerasimov, P., Fernando, R., & Green, S. (2004, 06 04). Vertex Textures. Retrieved 02 07, 2007, from nVidia Developer: ftp://download.nvidia.com/developer/Papers/2004/Vertex_Textures/Vertex_Textures.pdf

Kruger, J., Kipfer, P., Kondratieva, P., & Westermann, R. (2005, 11). A Particle System for Interactive Visualization of 3D Flows. Retrieved 02 15, 2007, from IEEE Transactions on Visualization and Computer Graphics: http://wwwcg.in.tum.de/Research/data/Publications/tvcg05.pdf

Latta, L. (2004). Building a Million Particle System. Retrieved 02 06, 2007, from 2LDigital: http://www.2ld.de/gdc2004/

Latta, L. (2004, 07 28). Building a Million-Particle System. Retrieved 02 06, 2007, from Gamasutra: http://www.gamasutra.com/features/20040728/latta_01.shtml

Mace, R. (2003). OpenGL ARB Superbuffers. Retrieved 02 15, 2007, from nVidia Developer: http://developer.nvidia.com/docs/IO/8230/GDC2003_OGL_ARBSuperbuffers.pdf

nVidia. (Unknown). GPU Particles. Retrieved 02 06, 2007, from nVidia Samples: http://download.nvidia.com/developer/SDK/Individual_Samples/samples.html#gpu_particles

Percy, J. (2003). OpenGL Extensions. Retrieved 02 06, 2007, from ATI: http://www.ati.com/developer/SIGGRAPH03/Percy_OpenGL_Extensions_SIG03.pdf

Potiy, O. A. (2005). 3D Flow visualization using GPU-driven particle system. Retrieved 02 15, 2007, from Graphicon: http://www.graphicon.ru/proceedings2005/papers/Potiy.pdf

Williams, I., & Heart, E. (2005, 06 01). Efficient rendering of geometric data using OpenGL VBOs in SPECviewperf. Retrieved 02 07, 2007, from Standard Performance Evaluation Corporation: http://www.spec.org/gpc/opc.static/vbo_whitepaper.html

Zeller, C. (2005, 06). GPU Cloth. Retrieved 02 07, 2007, from nVidia Developer: http://download.nvidia.com/developer/SDK/Individual_Samples/featured_samples.html#Cloth