Colt McAnlis
Graphics Programmer – Blizzard
60 minutes (ish)

The problem:
- Texturing data is too large to fit into memory
- Texturing data is unique
- Lots of resolution: down to maybe 1 meter per pixel
- Vertex data
- General terrain texturing issues
- Low-end hardware

Review of technologies:
- Paging & Caches
- DXT++ Compression
- Compositing frameworks
- Editing Issues
- Example-Based Texture Synthesis

Paging & Caches

- Only a subsection is visible at a time
- Non-visible areas remain on disk
- New pages must be streamed in
- Quickly limited by disk I/O
- Fast frustum movements kill perf: new pages occur frequently
- Instead, page in a full radius around the player; only far-away pages need to be streamed in

- Chunks stream in levels of mipmaps
- As distance changes, so does LOD
- New mip levels are brought in from disk

- Textures are typically divided across chunk bounds
- Not ideal for draw call counts
- Each chunk has its own mip chain; difficult to filter across boundaries
- But we don't need full chains at each chunk: radial paging requires less memory
- Would be nice to have easier filtering

What if we had one large mip chain?
- Use one texture per 'distance'
- All textures are the same size; resolution is consistent for a range
- As distance increases, quality decreases
- Can store as a 3D texture / texture array
- Only bind 1 texture to the GPU

The benefit is that we can use 1 texture:
- No more filtering-across-boundary issues
- Texturing is no longer a reason for breaking batches
- 1 sample at 1 level gets proper filtering

Mip mapping still poses a problem, though:
- Since the mips are separated out, each 'distance' only needs 2 mips: the current mip and the next smallest
- At distance boundaries, mip levels should be identical
- The current distance is mipped out to the next distance
- Memory vs. perf vs. quality tradeoff; YMMV

[Diagrams: mip transition, mip chain]

How do we update the texture?
- As a GPU resource, we should use render-to-texture to fill it
- But what about compression? Can't RTT to a compressed target
- GPU compression is limited: not enough cycles for good quality
- Shouldn't you be GPU bound??
- So then use the CPU to fill it? Lock + memcpy

Outline: Paging & Caches | DXT++ Compression | Compositing frameworks | Editing Issues | Example-Based Texture Synthesis

DXT++ Compression

- Goal: fill a large texture on the CPU
- Problem: DXT is good, but other systems are better (JPG)
- id Software: JPEG -> RGBA8 -> DXT
- Re-compressing decompressed streams: second-level quality artifacts can be introduced
- Decompress / recompress speeds?
- We have to end up at a GPU-friendly format

Remove the middle man?
- Sooner or later, we would need to decompress directly to DXT
- That means we need to compress the DXT data MORE

Let's look at the DXT layout:
- DXT1 results in 4bpp
- Per block: a high 565 color, a low 565 color, and 2-bit selectors
- In reality you tend to have a lot of them: a 512x512 texture is 16k blocks
- Really, there are two different types of data per texture: 16-bit block colors and 2-bit selectors
- Each one can be compressed even further

- Input texture: potential for millions of colors
- Input texture: actual used colors
- 16-bit compressed used colors
- Two unique colors per block, but what if that unique color exists in other blocks?
- We're duplicating data; let's focus on trying to remove duplicates

Huffman coding:
- Lossless data compression
- Represents a least-bit dictionary set, i.e. more frequently used values get smaller bit representations
- String: AAAABBBCCD (80 bits)

    Symbol   Used %   Encode
    A        40%      0
    B        30%      10
    C        20%      110
    D        10%      111

- Result: 0000101010110110111 (19 bits)

- More common colors will be given smaller indexes
- 4096 identical 565 colors = 8 KB; Huffman encoded = 514 bytes (4k single-bit codes, plus one 16-bit color)
- Problem: as the number of unique colors increases, Huffman becomes less effective
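Not from the slides: a minimal C++ sketch of the idea above, assuming a plain DXT1 block layout. Dxt1Block and HuffmanCodeLengths are illustrative names. It tallies how often each 565 endpoint appears across a block stream and derives Huffman code lengths from those frequencies, so the most common colors end up with the shortest codes.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// DXT1 block layout as described above: two 16-bit 565 endpoints plus
// sixteen 2-bit selectors -> 8 bytes per 4x4 block (4bpp).
struct Dxt1Block {
    uint16_t color0;     // high 565 endpoint
    uint16_t color1;     // low 565 endpoint
    uint32_t selectors;  // sixteen 2-bit selectors
};

struct Node {
    uint64_t freq;
    int left;        // child indices, -1 for leaves
    int right;
    uint16_t symbol; // 565 color (leaves only)
};

// Returns 565 color -> Huffman code length; frequent colors get shorter codes.
// Actual bit patterns would then be assigned canonically from these lengths.
std::map<uint16_t, int> HuffmanCodeLengths(const std::vector<Dxt1Block>& blocks) {
    std::map<uint16_t, uint64_t> freq;
    for (const Dxt1Block& b : blocks) { ++freq[b.color0]; ++freq[b.color1]; }
    if (freq.empty()) return {};

    // Standard Huffman tree build with a min-heap of (frequency, node index).
    std::vector<Node> nodes;
    using QItem = std::pair<uint64_t, int>;
    std::priority_queue<QItem, std::vector<QItem>, std::greater<QItem>> heap;
    for (const auto& [color, f] : freq) {
        nodes.push_back({f, -1, -1, color});
        heap.push({f, int(nodes.size()) - 1});
    }
    while (heap.size() > 1) {
        auto [fa, a] = heap.top(); heap.pop();
        auto [fb, b] = heap.top(); heap.pop();
        nodes.push_back({fa + fb, a, b, 0});
        heap.push({fa + fb, int(nodes.size()) - 1});
    }

    // Leaf depth == code length for that color.
    std::map<uint16_t, int> lengths;
    std::vector<std::pair<int, int>> stack{{int(nodes.size()) - 1, 0}};
    while (!stack.empty()) {
        auto [idx, depth] = stack.back(); stack.pop_back();
        const Node& n = nodes[idx];
        if (n.left < 0) { lengths[n.symbol] = depth > 0 ? depth : 1; continue; }
        stack.push_back({n.left, depth + 1});
        stack.push_back({n.right, depth + 1});
    }
    return lengths;
}
```

In practice you would assign canonical codes from these lengths and store the table alongside the color-index stream, which is what the pipeline in the next section does.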
Vector Quantization:
- Similar colors can be quantized; the human eye won't notice
- Groups large data sets into correlated groups
- Can replace each group's elements with a single value
- Step #1: vector-quantize the unique input colors
- Step #2: Huffman-encode the quantized colors
- Reduces the number of unique colors
- Per DXT block, store the Huffman index rather than the 565 color. W00t.

Selectors:
- Each selector block is a small number of bits
- Chain 2-bit selectors together to make a larger symbol; can use Huffman on these too!
- A 4x4 array of 2-bit values results in four 8-bit values, or a single 32-bit value
- Might be too small to get good compression results
- Doesn't help much if there are a lot of unique selectors
- Do tests on your data to find the ideal symbol size; 8-16 bits works well in practice

[Diagram: DXT data is separated into block colors and selector bits. Block colors go through vector quantization and Huffman coding, producing a Huffman table and color indexes; selector bits are Huffman coded into their own table and selector indexes. Tables and index streams go to disk; on load they are decoded back into block colors and selector bits to fill the DXT blocks.]

Results:
                 Uncompressed   4bpp DXT   DXT++
    DXT1 data    3 MB           512 KB     91 KB
    DXT3A data   1 MB           512 KB     9 KB

Getting back to texturing:
- Insert the decompressed data into a mip-stack level
- Can lock the mip-stack level and update the sub-region on the CPU
- Decompression isn't the only way...

Outline: Paging & Caches | DXT++ Compression | Compositing frameworks | Editing Issues | Example-Based Texture Synthesis

Compositing frameworks

- Pages for the cache can come from anywhere; they don't have to be compressed unique data
- What about splatting?

Splatting:
- Standard screen-space method; can we use it to fill the cache?
- Splatting is the standard texturing method: re-render the terrain to screen, bind a new texture & alpha each time, and accumulate the results via blending
- The de facto standard for terrain texturing

- The same process can work for our caching scheme
- Don't splat to screen space; composite into a page in the cache
- Get the same memory benefits
- What about compression? Can't composite & compress (alpha blending + DXT compress???)
- Composite -> ARGB8 -> DXT

- Compression is awesome: repeating textures + low-res alphas = large memory wins
- Decouples us from verts and overdraw, which is a great thing!
- But we could get better results
- Quality vs. perf tradeoff: hard to get unique quality at the same perf; more blends = worse perf
- Trade uniqueness for memory; tiled features are very visible
- Effectively wasting cycles re-creating the same asset every frame

Mix of compositing & decompression:
- Fun ideas for foreground / background: switch between them based on distance
- Fun ideas for low-end platforms: high end gets decompression, low end gets compositing
- Fun ideas for doing both!
- A really flexible pipeline (a sketch of the fill-path choice follows below)

[Diagram: a flexible pipeline where disk data is either decompressed, or run through a 2D compositor and compressed (CPU or GPU), to fill the cache.]
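A hypothetical sketch of that fill-path decision, not production code: CachePage, FillPageByDecompression and FillPageByCompositing are made-up names, and the distance-ring test is just one way to pick a path. Near (or high-end) pages get their unique DXT++ data decompressed on the CPU; far (or low-end) pages are re-composited from tiled layers.

```cpp
#include <cstdint>
#include <cstdio>

struct CachePage {
    int      distanceRing;  // which 'distance' level of the mip stack this page feeds
    uint64_t diskOffset;    // where its unique compressed data lives on disk
};

enum class FillPath { CpuDecompress, GpuComposite };

// Assumption: ring 0 is closest to the player; rings <= maxUniqueRing get unique data.
FillPath ChooseFillPath(const CachePage& page, int maxUniqueRing, bool lowEndHardware) {
    if (lowEndHardware) return FillPath::GpuComposite;        // low end: always composite
    return page.distanceRing <= maxUniqueRing ? FillPath::CpuDecompress
                                              : FillPath::GpuComposite;
}

// Stubs standing in for the two real fill paths.
void FillPageByDecompression(const CachePage& page) {
    // DXT++ -> DXT1 on the CPU, then lock the mip-stack level and memcpy the sub-region.
    std::printf("decompress page at disk offset %llu\n", (unsigned long long)page.diskOffset);
}
void FillPageByCompositing(const CachePage& page) {
    // Splat the tiled layers + alphas into the cache page via render-to-texture.
    std::printf("composite page for distance ring %d\n", page.distanceRing);
}

void StreamInPage(const CachePage& page, int maxUniqueRing, bool lowEnd) {
    switch (ChooseFillPath(page, maxUniqueRing, lowEnd)) {
        case FillPath::CpuDecompress: FillPageByDecompression(page); break;
        case FillPath::GpuComposite:  FillPageByCompositing(page);   break;
    }
}
```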
Outline: Paging & Caches | DXT++ Compression | Compositing frameworks | Editing Issues | Example-Based Texture Synthesis

Editing Issues

- Standard pipelines choke on this data
- They're designed for 1-user -> 1-asset work, mostly driven by source control setups
- Need to address massive texturing directly

The problem with allowing multiple artists to texture a planet:
- 1 artist per planet is slow...
- Standard source control concepts fail
- If all texturing is in one file, it can only safely be edited by one person at a time
- Solution: 2 million separate files?

Need a better setup:
- Allow multiple users to edit texturing
- User feedback is highly important: edited areas are highlighted immediately to other users
- Highlighted means 'has been changed'
- Highlighted means 'you can't change it'

[Diagram: Artist A makes a change -> texturing server -> data updated for Artist B]

Custom merge tool required:
- Each machine only checks in its sparse changes
- The server handles merges before submitting to the actual source control
- It acts as a 'man in the middle'

[Diagram: Artist A and Artist B send changes to the texturing server, which sits in front of source control]

What about planet-sized batch operations?
- Could we modify the entire planet at once? Would that ignore affected areas?
- A double-edged sword: it's important to still have batching
- Maybe limit batch operation distances? Flag attempts to modify an already-edited area?

Common texturing concepts:
- Set texture by slope, by height, by area
- Could we extend it further? View 'set' operations as 'masks'
- Set texturing by procedural functions; combine masks in a graph setup
- A common concept: .kkrieger, World Machine, etc.
- Masks can re-generate based upon vertex changes
- Generate multiple masks for other data, as long as you store the graph, not the mask
- Apply trees, objects, etc.
- Cool algorithms here for all

Outline: Paging & Caches | DXT++ Compression | Compositing frameworks | Editing Issues | Example-Based Texture Synthesis

Example-Based Texture Synthesis

- Repeating textures cause problems: it takes more blends to reduce repetition, which increases memory and the perf burden
- Would be nice to fix that automagically

Per-pixel synthesis:
- Generates the output texture per pixel
- Chooses each new pixel based upon its current neighborhood
- Represent an input pixel as a function of its neighbors
- Create a search acceleration structure
- Find the 'neighborhood' most similar to the input
- This is known as 'per-pixel' synthesis

[Figure: texture being synthesized, alongside the exemplar]

- Basically a nearest-neighbor search; doesn't give the best quality
- Only corrects the input pixel based upon the previously corrected neighborhood
- Introduces sequential dependencies
- Need to increase the neighborhood size to get better results, which increases sample time (a brute-force sketch is appended after the credits)

[Figure: exemplar, noisy start, output image]

Hoppe 2006 (Microsoft Research):
- Multi-resolution: fixes pixels at various scales in the output image
- This 'keeps' coarse texture features and reduces image artifacts
- GPU based, highly controllable
- Artist- / mesh-provided vector fields
- Can synthesize large textures

- Use terrain normals as input; allows the texture to 'flow' with the contours
- Allow artists to adjust the vectors rather than have the same repeating texture, so they can paint custom swirls etc.
- Could even use it to synthesize terrain vertex data, but that's another talk ;)

- Still too slow to composite MASSIVE terrain at edit time
- Synthesize the whole planet? Would have to be a render-farm process
- Actually, still too slow to do non-massive terrain...
- Maybe generate custom decals? But what about the CPU? Multicore may shed light on it
- Future research?

Recap:
- Use 1 texture resource for texture data
- Use DXT++ to decrease the footprint
- MipStack structure, without going RGBA -> DXT
- Multi-input cache-filling algorithms: stream + composite
- Use a custom texturing server
- Make texture synthesis faster!! I'm talking to you, Mr. Hoppe ;)

Thanks:
- Andrew Foster
- Rich Geldreich
- Ken Adams
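Appendix: a brute-force C++ sketch of the per-pixel synthesis loop from the last section, not Hoppe's GPU, multi-resolution implementation. All names (Image, SynthesizePerPixel, the neighborhood radius) are illustrative. A real system would replace the linear exemplar scan with an acceleration structure and synthesize coarse-to-fine.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

struct Image {
    int width = 0, height = 0;
    std::vector<uint8_t> rgb;                       // 3 bytes per pixel
    const uint8_t* At(int x, int y) const {         // wrap addressing keeps things tileable
        x = (x % width + width) % width;
        y = (y % height + height) % height;
        return &rgb[(size_t(y) * width + x) * 3];
    }
};

// Sum of squared differences between the causal neighborhoods (pixels above and
// to the left, which are already synthesized) around (ox,oy) and (exx,exy).
long NeighborhoodCost(const Image& out, int ox, int oy,
                      const Image& ex, int exx, int exy, int radius) {
    long cost = 0;
    for (int dy = -radius; dy <= 0; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            if (dy == 0 && dx >= 0) break;          // only previously written pixels
            const uint8_t* a = out.At(ox + dx, oy + dy);
            const uint8_t* b = ex.At(exx + dx, exy + dy);
            for (int c = 0; c < 3; ++c) {
                long d = long(a[c]) - long(b[c]);
                cost += d * d;
            }
        }
    return cost;
}

Image SynthesizePerPixel(const Image& exemplar, int outW, int outH, int radius) {
    Image out{outW, outH, std::vector<uint8_t>(size_t(outW) * outH * 3)};
    for (size_t i = 0; i < out.rgb.size(); ++i)     // seed the output with exemplar noise
        out.rgb[i] = exemplar.rgb[std::rand() % exemplar.rgb.size()];

    for (int y = 0; y < outH; ++y)
        for (int x = 0; x < outW; ++x) {
            long best = std::numeric_limits<long>::max();
            int bestX = 0, bestY = 0;
            for (int ey = 0; ey < exemplar.height; ++ey)       // brute-force NN search
                for (int ex = 0; ex < exemplar.width; ++ex) {
                    long c = NeighborhoodCost(out, x, y, exemplar, ex, ey, radius);
                    if (c < best) { best = c; bestX = ex; bestY = ey; }
                }
            const uint8_t* src = exemplar.At(bestX, bestY);
            uint8_t* dst = &out.rgb[(size_t(y) * outW + x) * 3];
            dst[0] = src[0]; dst[1] = src[1]; dst[2] = src[2];
        }
    return out;
}
```

Even this toy version makes the cost obvious: every output pixel scans the whole exemplar, which is why the slides call for a faster search before this is viable at edit time on massive terrain.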