Image Synthesis
GP-GPU
computer graphics & visualization
Image Synthesis – WS 07/08 Dr. Jens Krüger – Computer Graphics and Visualization Group

Graphics hardware
• Current performance: PlayStation 3
  – CPU: Cell processor (3.2 GHz)
    • 512 kB L2 cache
    • ~200 GFLOP/s
  – GPU (Graphics Processing Unit): Nvidia RSX Reality Synthesizer (550 MHz, ~300 M transistors)
    • ~1.8 TFLOP/s
    • ~20 GPixels/s
    • ~2 GTriangles/s

Graphics hardware: history
• 1980s: simple rasterization
  – Windows, lines, polygons, text fonts
• 1990-95: "geometry engines" only on high-end workstations
  – e.g. SGI O2 vs. Indigo2
• 1995: new rasterization functionality
  – Realism by texturing, e.g. SGI Infinite Reality
• 1998: geometry processor (T&L) on PC graphics
• 2000: PC graphics achieves performance similar to high-end workstations
  – 3D becomes standard even in the Aldi PC (discount supermarket PC)
• 2001: PC graphics offers new functionality
  – Multitextures, vertex and pixel shaders
• 2002: DirectX level 9.0 hardware
  – High-level shader languages
• 2006: DirectX level 10.0 hardware
  – Geometry shader

Trends in graphics hardware
• Number of transistors doubles every 6 months
• Advances in performance and functionality
[Figure: transistor count (millions) over time (month/year), starting 9/97; labeled chips: Riva 128 (3M), GeForce3 (57M), R200 (60M), GeForceFX / ATI Radeon 9800, ATI R520]

Trends in graphics hardware
• Graphics performance grows faster than Moore's law predicts
[Figure: performance over time for graphics, CPU, and network]

Parallel graphics hardware
• Graphics hardware has always been parallel
  – Internal, on chip or board
    • Multiple rasterizers serve one frame buffer
  – Multi-pipe
    • Multiple graphics cards in one system for one or multiple displays
    • Multiple geometry engines
  – Distributed graphics
    • Multiple nodes in a connected cluster, each with one or multiple cards, serve one or multiple displays driven by one application

Graphics architectures
• State-of-the-art GPUs
  – Highly parallel stream architecture
    • A stream of vertices/fragments is processed
    • Pipelined and SIMD parallel processing
      – SIMD: a single set of instructions is applied to multiple stream elements
  – Specifies a new rendering pipeline
    • Additional stages a vertex or a fragment passes through
  – Specifies new (vendor-specific) OpenGL extensions
  – Allows for new classes of algorithms
  – Eventually makes programs platform dependent

Graphics architectures
[Figure: state-of-the-art GPU (G80)]

Graphics architectures
• State-of-the-art GPUs
  – Multiple (texture) render targets
  – Up to 2 GB video memory
  – Floating-point textures (4 x 32 bit)
  – Internal computations in float/double precision
  – Z-cull: discards fragments that will fail the depth test before they enter the pixel pipelines
  – Dynamic flow control: per-vertex/geometry/fragment specific operations (if-then-else)
  – PCIe: serial, point-to-point protocol, dual channels to allow for bandwidth in both directions (upload/download)
  – Fixed fragment-to-pixel binding, i.e. a fragment (X, Y) cannot be written to a different pixel (X´, Y´)
    • No scattering (at least not in DX/GL), only gathering

Graphics architectures
[Figure: state-of-the-art programmable GPUs]

GP-GPU Water

Programmable graphics hardware
• Displacement mapping: the simulation generates a height field texture
[Figure labels: static grid, water surface, displacer, rendering]

Programmable graphics hardware
• GPU memory objects
  – Semantics can be specified for a chunk of memory
  – A memory object can be a texture, a vertex array, or a frame buffer object
    • What was a texture render target in the current pass becomes a vertex array in the upcoming pass
  – Texture elements can be interpreted as vertex attributes without any copy operations (not in OpenGL)
  – The same effect can be achieved with vertex texture fetch, but this fetch slows down performance

Programmable graphics hardware
• Example
  – Computation of height values u at the vertices of a 2D grid
[Figure: 3 x 3 grid stencil around P_{ij} with neighbors P_{i-1,j-1} ... P_{i+1,j+1}, grid spacing h, axes x and y]

u_{ij}^{t+1} = \frac{c^2 \Delta t^2}{h^2} \left( u_{i+1,j}^{t} + u_{i-1,j}^{t} + u_{i,j+1}^{t} + u_{i,j-1}^{t} \right) + \left( 2 - \frac{4 c^2 \Delta t^2}{h^2} \right) u_{ij}^{t} - u_{ij}^{t-1}

  – Starting with an initial distribution, compute the evolution over time t
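
As a CPU-side reference for the height-field example, the following is a minimal Python sketch. It assumes the standard explicit finite-difference scheme for the 2D wave equation, u_next = k * (sum of the four neighbors) + (2 - 4k) * u_curr - u_prev with k = (c * dt / h)^2. The function names (wave_step, simulate), the buffer names, and the fixed boundary values are illustrative assumptions, not part of the slides.

```python
def wave_step(u_prev, u_curr, c, dt, h):
    """One explicit time step of the 2D wave equation on a regular grid.

    u_prev, u_curr: lists of rows holding the heights at t-1 and t.
    Boundary cells are held fixed (an assumption made for this sketch).
    """
    ny = len(u_curr)
    nx = len(u_curr[0])
    k = (c * dt / h) ** 2
    # Start from a copy so that boundary cells keep their current value.
    u_next = [row[:] for row in u_curr]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            # Gather the four direct neighbors at time t.
            neighbors = (u_curr[j][i - 1] + u_curr[j][i + 1]
                         + u_curr[j - 1][i] + u_curr[j + 1][i])
            u_next[j][i] = (k * neighbors
                            + (2.0 - 4.0 * k) * u_curr[j][i]
                            - u_prev[j][i])
    return u_next


def simulate(u_prev, u_curr, steps, c=1.0, dt=0.5, h=1.0):
    """Advance the height field by `steps` time steps, rotating the buffers."""
    for _ in range(steps):
        u_next = wave_step(u_prev, u_curr, c, dt, h)
        # The current field becomes the previous one, the new one the current.
        u_prev, u_curr = u_curr, u_next
    return u_prev, u_curr
```

Each grid point only reads its neighbors and writes its own value, so the inner loop body is a pure gather and every point can be computed independently, which is what makes the scheme a good fit for one-fragment-per-grid-point execution. In two dimensions this explicit scheme is stable only for c * dt / h <= 1/sqrt(2).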

Programmable graphics hardware
Algorithm:
  – Load initial height values (Nx x Ny) as 2D textures (sGridPrev, sGrid)
  – Upload fragment shader (render to sGridNew):

    void PerPixelSim(float2 fragpos : TEXCOORD0,
                     out float4 height : COLOR0)
    {
        // Height at this grid point from the previous time step
        float4 centerPrev = tex2D(sGridPrev, fragpos);

        // Offset of one texel to the left
        float2 leftIndex = float2(-1.0 / TexSize, 0.0);
        float4 left = tex2D(sGrid, fragpos + leftIndex);
        // same for right, upper, lower, center

        // f evaluates the update rule for the new height
        height = f(left, right, upper, lower, center, centerPrev);
    }

Programmable graphics hardware
Algorithm contd.:
  – Simulation:
    • Render a quad that covers Nx x Ny pixels with appropriate texture coordinates
      – Nx x Ny fragments will be generated
      – Data-parallel execution of fragments
[Figure: quad with texture coordinates (0,0), (1,0), (0,1), (1,1) at its corners]
  – Rotate (swizzle) the texture identifiers
    • tmp = sGridPrev; sGridPrev = sGrid; sGrid = sGridNew; sGridNew = tmp
  – Display height field in texture sGrid

Programmable graphics hardware
Algorithm contd.:
  – Display:
    • Upload fragment shader (render to color buffer):

    void PerPixelRefract(float2 fragpos : TEXCOORD0,
                         out float4 color : COLOR0)
    {
        // Finite differences of the height field give the surface tangents
        float h = tex2D(sGrid, fragpos).r;
        float3 tangent  = float3(1.0, 0.0, tex2D(sGrid, fragpos + rightIndex).r - h);
        float3 binormal = float3(0.0, 1.0, tex2D(sGrid, fragpos + upperIndex).r - h);
        float3 normal = normalize(cross(tangent, binormal));

        // f maps the normal and the refraction index to a texture-coordinate offset
        float2 refractOffset = f(normal, refractionIndex);
        color = tex2D(sBackground, fragpos + refractOffset);
    }

GPGPU Particle Tracing

GPU particle tracing
[Figure: GPU pipeline stages Input Assembler, Vertex Shader, Rasterizer, Pixel Shader, and Output Merger, with an input stream and an output stream]

Programmable graphics hardware
Demonstration