Jeff Schmidt
CS 680

Nature scenes are a prevalent topic in computer graphics (for example, in computer games).
In addition to realistic trees and flowing water effects, we want to render a high-quality grass effect in real time that looks realistic from all angles.

In general, grass needs to cover a vast amount of the scene. This makes modeling each individual blade of grass with polygons impractical.
A simple, flat grass texture will only look realistic from certain angles.

We create a “star” arrangement of quads and cover them with grass textures.
The star formation gives us the same effect no matter which side of the grass object we are viewing.

Finally, we want to simulate a wind effect to make the grass feel more realistic.
To simulate the blowing of grass, we shift the upper vertices of our grass object.

I have implemented both a CPU-only and a GPU version of the program.
I use OpenGL/GLUT for rendering.
I used the GPU to speed up various tasks.

I have been running my project on double/float.
CPU: Intel(R) Xeon(R) CPU
GPU: GeForce GTX 580
Timings were taken using cudaEvents.
No compiler flags were used.

For the grass texture, I use .ppm files for their simple r, g, b, r, g, b,… file format.
However, .ppm files do not support an alpha channel.
Solution: Pick a color that does not appear in any texture and fill the background with it. Then parse the texture file and set the alpha value to transparent wherever that “transparent” color appears.

Each (r, g, b) triple is independent, so splitting the work among threads is simple.
We only read each r, g, b value once, and we only write each alpha value once.
Therefore, I used zero-copy host memory to eliminate copying the texture from the CPU to the GPU and back.

The experiment was run with varying texture sizes.
GPU version has 64 blocks, 64 threads.

[Figure: Timings. CPU and GPU time (ms) versus texture dimensions (100 to 500).]
[Figure: GPU speedup versus texture dimensions (100 to 500).]

Again, the grass objects are independent of one another, so each thread can create its own set of grass objects in parallel.
Each grass object has an initial position that is perturbed by some random value, which gives a slightly non-uniform distribution.
PROBLEM: Generating random numbers on the GPU.

To generate random numbers, I create an array on the host filled with random numbers.
Then I place the array in texture memory.
Each block/thread indexes into the texture to retrieve the random numbers it needs.

The experiment was run with varying numbers of grass objects.
GPU version has 64 blocks, 64 threads.

[Figure: Timings. CPU and GPU time (ms) versus number of grass objects (10000 to 50000).]
[Figure: GPU speedup versus number of grass objects (10000 to 50000).]

Create random wind vectors crossing the viewing area.
Calculate the vector by which to shift each grass object by measuring its distance from the wind vector.
After the wind blows, the grass “springs” back towards its resting position.
Include some slight randomness, so that not all of the grass moves at exactly the same speed and in the same direction.

Unfortunately, due to time constraints, I was unable to simulate wind.
Instead, my grass objects wave randomly.
My “wind” simulation actually runs slower in the GPU version, because of the cost of creating the texture of random numbers and copying it to the GPU.
For this reason, I did not include any timings.
For a more in-depth wind simulation, the GPU version would likely outperform the CPU.

Strategies to get random numbers on the GPU:
Store them in a texture generated by the host.
Implement a simple pseudo-random number generator that runs on the GPU (example: linear congruential generators).

I got to implement some new (for me) CUDA features:
Zero-copy host memory: used for updating my textures with transparency.
Texture memory: used for storing random numbers and passing them to the GPU.
Use __host__ __device__ for functions you want on both the CPU and GPU.

I would have liked to use CUDA’s interoperability features with OpenGL to have the GPU render the scene without going through the CPU.
I would have liked to explore CUDA streams for overlapping GPU calculations with memory copying.
Due to time constraints, I was unable to implement wind simulation. I would have liked to use a real simulation formula and better grass textures to make the scene look more realistic.
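
The second random-number strategy mentioned earlier (a linear congruential generator) could look roughly like the sketch below, applied to perturbing grass positions. This is not the project's code: the names `lcg_next`, `lcg_to_unit`, `GrassPos`, `place_grass`, and `place_all` are hypothetical, the grid layout and the seed mixing are assumptions, and the LCG constants are one commonly published pair. The `HD` macro expands to `__host__ __device__` only when compiled with nvcc, so the same helpers can serve a CPU loop and a kernel in which each thread calls `place_grass` with its own index.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Under nvcc, make the helpers callable from host and device code;
// with a plain C++ compiler the macro expands to nothing.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// One step of a 32-bit linear congruential generator.
// Unsigned arithmetic wraps modulo 2^32, which is the LCG modulus here.
HD inline std::uint32_t lcg_next(std::uint32_t state) {
    return state * 1664525u + 1013904223u;
}

// Map a state to a float in [0, 1) using its top 24 bits.
HD inline float lcg_to_unit(std::uint32_t state) {
    return static_cast<float>(state >> 8) * (1.0f / 16777216.0f);
}

// Hypothetical grass position on the ground plane.
struct GrassPos {
    float x;
    float z;
};

// Place object `index` on a grid with `cols` columns (assumed layout),
// then perturb it by a random offset in [-jitter, +jitter) on each axis.
// The state is derived from the index, so objects are independent and
// can be generated one per thread without shared generator state.
HD inline GrassPos place_grass(std::uint32_t index, std::uint32_t cols,
                               float spacing, float jitter,
                               std::uint32_t seed) {
    // Assumption: mix the index into the seed with a multiplicative hash.
    std::uint32_t s = lcg_next(seed ^ (index * 2654435761u));
    float dx = (lcg_to_unit(s) * 2.0f - 1.0f) * jitter;
    s = lcg_next(s);
    float dz = (lcg_to_unit(s) * 2.0f - 1.0f) * jitter;

    GrassPos p;
    p.x = static_cast<float>(index % cols) * spacing + dx;
    p.z = static_cast<float>(index / cols) * spacing + dz;
    return p;
}

// CPU reference loop; a kernel would replace the loop with a
// per-thread index and write into device memory instead.
std::vector<GrassPos> place_all(std::uint32_t count, std::uint32_t cols,
                                float spacing, float jitter,
                                std::uint32_t seed) {
    assert(cols > 0);
    std::vector<GrassPos> out(count);
    for (std::uint32_t i = 0; i < count; ++i) {
        out[i] = place_grass(i, cols, spacing, jitter, seed);
    }
    return out;
}
```

Because nothing is precomputed on the host, this approach would avoid building and copying a texture of random numbers. A plain LCG has weak statistical quality, which is probably acceptable for visual jitter but not for much else.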