Youngho Kim CIS665: GPU Programming Building a Million Particle System: Lutz Latta UberFlow - A GPU-based Particle Engine: Peter Kipfer et al. Real-Time Particle Systems on the GPU in Dynamic Environments: Shannon Drone GPU-based Particle Systems for Illustrative Volume Rendering: R.F.P. van Pelt et al. Water Effects Nvdia DirectX10, Soft Particle Building a Million Particle System: SIGGRAPH 2004 Lutz Latta One of the original uses was in the movie Star Trek II William Reeves (implementor) Each Particle Had: • Position • Velocity • Color • Lifetime • Age • Shape • Size • Transparency Particle system from Star Trek II: Wrath of Khan Spacewar: 1962 Uses pixel clouds as Explosions Random motion Asteroids: 1978 Uses short moving vectors for Explosions Probably first “physical” particle simulation in CG/games Stateless Particle System ◦ A particle data is computed from birth to death by a closed form function defined by a set of start values and a current time. ◦ Does not react to dynamic environment Fast State Preserving Particle System ◦ Uses numerical iterative integration methods to compute particle data from previous values and changing environmental descriptions. Movie Clip Stateless Simulation – Computed particle data Particle Birth Rendering by closed form functions ◦ Notime reaction dynamically changing environment! Upload of birth &on initial Set global function parameter as Values to dynamic vertex program constants ◦ No Storage ofbuffer varying data (sim.vertex in vertex prog) Render point sprites/triangles/quads With particle system vertex program Generation – Particles are generated randomly within a predetermined location Particle Dynamics – The attributes of a particle may vary over time. Based upon equations depending on attribute. Extinction Premature Extinction: ◦ Age – Time the particle has been alive ◦ Lifetime – Maximum amount of time the particle can live. ◦ Running out of bounds ◦ Hitting an object (ground) ◦ Attribute reaches a threshold (particle becomes transparent) Rendered as a graphics primitive (blob) Particles that map to the same pixels are additive ◦ Sum the colors together ◦ No hidden surface removal ◦ Motion blur is rendered by streaking based on the particles position and velocity CPU based particle systems are limited to about 10,000 particles per frame in most game applications Limiting Factor is CPU-GPU communication ◦ PCI Express transfer rate is about 3 gigabytes/sec Rendering Passes: Process Birth and Deaths Update Velocities Update Positions Sort Particles (optional, takes multiple passes) Transfer particle positions from pixel to vertex memory Render particles Two Textures (position and velocity) ◦ Each holds an x,y,z component ◦ Conceptually a 1d array ◦ Stored in a 2d texture Use texture pair and double buffering to compute new data from previous data Storage Types ◦ Velocity can be stored using 16bit floats Size, Color, Opacity, etc. ◦ Simple attributes, can be added later, usually computed using the stateless method Birth = allocation of a particle Death = deallocation of a particle ◦ Associate new data with an available index in the attributes textures ◦ Serial process – offloaded to CPU ◦ Initial particle data determined on CPU also ◦ Must be processed on CPU and GPU CPU – frees the index associated with particle GPU – extra pass to move any dead particles to unseen areas (i.e. infinity, or behind the camera) In practice particles fade out or fall out of view (Clean-up rarely needs to be done) Stack 5 2 30 10 15 26 11 19 12 6 33 Easier! Heap 2 10 5 6 12 11 15 26 19 30 33 Optimize heap to always return smallest available index Better! Velocity Operations Global Forces ◦ Wind ◦ Gravity Local Forces Velocity Damping Collision Detection ◦ Attraction ◦ Repulsion F = Σ f0 ... fn F = ma a = F/m If m = 1 F=a Stokes Law of drag force on a sphere Fd = 6Πηr(v-vfl) η = viscosity r = radius of sphere C = 6Πηr (constant) v = particle velocity vfl = flow velocity Sample Flow Field Damping ◦ Imitates viscous materials or air resistance ◦ Implement by downward scaling velocity Un-damping ◦ Self-propelled objects (bee swarms) ◦ Implement by upward scaling velocity Collisions against simple objects ◦ Walls ◦ Bounding Spheres Collision against complex objects ◦ Terrain ◦ Complex objects ◦ Terrain is usually modeled as a texture-based height field vn = (vbc * n)vbc vt = vbc – vn Vn vbc = velocity before collision vn = normal component of velocity vt = tangental component of velocity V = (1-μ)vt – εvn μ = dynamic friction (affects tangent velocity) ε = resilience (affects normal velocity) V Vt Integrate acceleration to velocity: ◦ v = vp + a⋅ ∆ t Integrate velocity to position: ◦ p= pp + v⋅ ∆ t Computationally simple Needs storage of particle position and velocity Integrate acceleration to position: ◦ p = 2pp − ppp a⋅t2 ◦ ppp: position two time steps before Needs no storage of particle velocity Time step needs to be (almost) constant Explicit manipulations of velocity (e.g. for collision) impossible Odd-even merge sort ◦ Good for GPU because it always runs in constant time for a given data size Inefficient to check whether sort is done on GPU ◦ Guarantees that with each iteration sortedness never decreases Full sort can be distributed over 20-50 frames so it doesn’t slow down system too much (acceptable visual errors) UberFlow A GPU-based Particle Engine: SIGGRAPH 2004 Peter Kipfer et al. Major bottleneck ◦ Transfer of geometry to graphics card Process on GPU if transfer is to be avoided ◦ Need to avoid intermediate read-back also Requires dedicated GPU implementations Perform geometry handling for rendering on GPU Particle advection ◦ Motion according to external forces and 3D force field Sorting ◦ Depth-test and transparent rendering ◦ Spatial relations for collision detection Rendering ◦ Individually colored points ◦ Point sprites Simple two-pass method using two vertex arrays in double-buffer mode ◦ Render quad covering entire buffer ◦ Apply forces in fragment shader Render target Bind to texture Buffer 0 Screen Pass 1: integrate Bind to render target Buffer 1 Pass 2: render Bind to vertex array Additional buffers for state-full particles Store velocity per particle (Euler integration) Keep last two positions (Verlet integration) Simple: Collision with height-field stored as 2D texture ◦ ◦ ◦ ◦ RGB = [x,y,z] surface normal A = [w] height Compute reflection vector Force particle to field height Essential for natural behavior ◦ Full search is O(n²), not practicable ◦ Approximate solution by considering only neighbors ◦ Sort particles into spatial structure Staggered grid misses only few combinations Check m neighbors to the left/right ◦ Collision resolution with first collider (time sequential) ◦ Only if velocity is not excessively larger than integration step size Real-Time Particle Systems on the GPU in Dynamic Environments: SIGGRAPH 2007 Shannon Drone Storage requirements Integrating the equations of motion Saving particle states Changing behaviors Need to store immediate particle state (position, velocity, etc) Option 1: Store this data in a vertex buffer ◦ Each vertex represents the immediate state of the particle ◦ Particles are store linearly in the buffer Option 2: Store this data in a floating point texture array ◦ First array slice stores positions for all particles. ◦ Next array slice stores velocities, etc. Accuracy depends on the length of time step Use Euler integration for these samples Runge-Kutta based schemes can be more accurate at the expense of needing to store more particle state Integration on the CPU can use read-modifywrite operations to update the state in-place This is illegal on the GPU (except for nonprogramming blending hardware) Use double buffering Ping-pong between them Particles are no longer affixed to a predestined path of motion as in parametric systems Changing an individual velocity or position will change the behavior of that particle. This is the basis of every technique in this theory N-Body problems Force splatting for N2 interactions Gravity simulation Flocking particles on the GPU Every part of the system has the possibility of affecting every other part of the system Space partitioning can help Treating multiple parts of a system at a distance as one individual part can also help Parallelization can also help (GPU) Our goal is to project the force from one particle onto all other particles in the system Create a texture buffer large enough to hold forces for all particles in the simulation Each texel holds the accumulated forces acting upon a single particle It is O(N2), but it exploits the fast rasterization, blending, and SIMD hardware of the GPU It also avoids the need to create any space partitioning structures on the GPU Uses force splatting to project the gravitational attraction of each particle onto every other particle Handled on the GPU ◦ CPU only sends time-step information to the GPU ◦ CPU also handles highlevel switching of render targets and issuing draw calls Uses basic boids behaviors [Reynolds87,Reynolds99] to simulation thousands of flocking spaceships on the GPU ◦ ◦ ◦ ◦ Avoidance Separation Cohesion Alignment Cohesion and Alignment Unlike N-Body problems, Cohesion and Alignment need the average of all particle states ◦ Cohesion steers ships toward the common center. ◦ Alignment steers ships toward the common velocity We could average all of the positions and velocities by repeatedly down-sampling the particle state buffer / texture However, the graphics system can do this for us Graphics infrastructure can already perform this quick down-sampling by generating mip maps The smallest mip in the chain is the average of all values in the original texture Extend 2D partitioned grid [Lutz04] to 3D by rendering the scene into a volume texture Instance the geometry across all slices For each instance we clip the geometry to the slice For each voxel in the volume texture stores the plane equation and velocity of the scene primitive Once having the position buffer, it can gather paint splotches Use a gather pixel shader to traverse the position buffer For each particle in the buffer, the shader samples its position and determines if it can affect the current position in the position buffer If it can, the color of the particle is sent to the output render target, which doubles as the mesh texture GPU-based Particle Systems for Illustrative Volume Rendering: IEEE/ EG Symposium on Volume and PointBased Graphics 2008 R.F.P. van Pelt et al. Interactive GPU-based illustrative framework, called VolFlies-GPU, for rendering volume data, exploiting parallelism in both graphics hardware and particle systems User-configurable particle systems to produce stylized renderings from the volume data, imitating traditional pen and ink drawings Achieve real-time interaction and prompt parametrization of the illustrative styles, using an intuitive GPGPU paradigm GPGPU paradigm using transform feedback Intensive use of both vertex shaders and geometry shaders, while fragment shaders are completely discarded VolFliesGPU framework comprises an interactive illustrative visualization framework for real-time penand-ink style rendering of volume data Stippling: Surface shading by changing the stipple scale Hatching: Conveying shape by tracing hatch stroke in one or two directions. Contours Curvature-based hatching with contours Direction-based hatching on cone-splatted bone tissue, combined with a scale-based stippled skin iso-surface with contours. Similar styles, with added contours on the bone surface Interactive framerates The particle system is useful for graphic effects for movies, games, and medical visualization The particle system can be effectively operated on GPU processing http://www.seas.upenn.edu/~cis665/LECTURES/Lecture6.ppt http://wwwcg.in.tum.de/Research/data/uf_sig04_slides.pdf http://www.2ld.de/gdc2004/ http://wwwcg.in.tum.de/Research/Publications/UberFLOW http://ati.amd.com/developer/gdc/2007/Drone-RealTime_Particles_Systems_on_the_GPU_in_Dynamic_Environments(Siggraph0 7).pdf http://delivery.acm.org/10.1145/1290000/1281670/p80drone.pdf?key1=1281670&key2=2779414421&coll=GUIDE&dl=GUIDE&CF ID=39073315&CFTOKEN=91418908 http://yp.wtb.tue.nl/pdfs/9587.pdf