A Hybrid Computationally Efficient Parallel Algorithm for Best Visual Quality 3D Real-time Graphics

Research Area: Rewriting Algorithms to help Parallel Programming
Authors: Amrit Asrani Atheendra P Tarun Athresh R Shigaval
Faculty Mentor: Mr. Sudhir Shenai
Name of the Institution: Global Academy Of Technology, Bangalore.

Abstract:
In the domain of computer graphics, real-time visualization is achieved by two prominent techniques: the Z-buffer algorithm and the ray tracing algorithm. Each has its own inherent advantages and limitations. Ray tracing is capable of producing a very high degree of photorealism, usually higher than that of typical scanline rendering methods, but at a greater computational cost. The Z-buffer, on the other hand, is computationally fast but does not give the best visual quality. This paper proposes a new hybrid parallel algorithm which exploits the speed of the Z-buffer and the visual quality of ray tracing. The algorithm takes the merged kD-trees proposed in the RAZOR architecture of the Copernicus system and combines them with the parallel Z-buffer algorithm, which uses a hypercube topology. Multithreading is introduced in the construction stage of the merged kD-trees for dynamic scenes; the result serves as input to the Z-buffer, which is itself parallel and inherently fast. Parallelism is thus introduced at several stages of the algorithm, which is the linchpin of high quality 3D real-time graphics. The new hybrid algorithm is presumed to be efficient because the Z-buffer [1] and ray tracing [3] have each been shown experimentally to be efficient on their own grounds, namely computation and visual quality respectively.

Background:
The Z-buffer algorithm ensures that perspective works the same way in the virtual world as in the real world. It is a type of Visible Surface Determination (VSD) algorithm [9].
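
To make the depth comparison behind visible surface determination concrete, here is a minimal Python sketch. It is an illustration only, not the implementation from [1]. It assumes axis-aligned rectangles at constant depth, a smaller z meaning closer to the viewer, and a depth buffer initialized to infinity; the function name `render` and the tuple layout of each rectangle are hypothetical.

```python
import math

def render(width, height, rects):
    """Rasterize axis-aligned rectangles with a z-buffer.

    Each rect is (x0, y0, x1, y1, z, color), covering x0 <= x < x1 and
    y0 <= y < y1. Smaller z is assumed to be closer to the viewer.
    """
    # Depth buffer starts at infinity so the first fragment always wins.
    zbuf = [[math.inf] * width for _ in range(height)]
    image = [[None] * width for _ in range(height)]
    for x0, y0, x1, y1, z, color in rects:
        # Clip the rectangle to the image, then visit it scanline by scanline.
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                # Depth test: keep the fragment only if it is closer.
                if z < zbuf[y][x]:
                    image[y][x] = color
                    zbuf[y][x] = z
    return image

# Two partially overlapping squares; the yellow one is in the foreground.
scene = [
    (0, 0, 4, 4, 2.0, "blue"),
    (2, 2, 6, 6, 1.0, "yellow"),
]
image = render(6, 6, scene)
```

Because the depth test decides visibility per pixel, the result does not depend on the order in which the rectangles are drawn.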
Z-buffering works by testing pixel depth: the z coordinate of the current pixel is compared with the value stored in a buffer (called the Z-buffer) that holds the depth last written for each pixel. The pixel which is closer to the viewer is the one that is displayed. This can be seen when two squares overlap: the square on top is visible, but not the part of the square below it. The Z-buffer algorithm is used to resolve such cases.

The generic sequential approach to the Z-buffer algorithm is [1]:

    For all the objects in the scene
        Project the object into the image coordinate system
        For every scanline of that object
            For all the pixels in the scanline
                If Z coordinate of pixel < Zbuffer[pixel]
                    Write[pixel]
                    Zbuffer[pixel] = Z coordinate of pixel
                End If
            End For
        End For
    End For

Since the image contains several objects, the first step in the sequential algorithm is to project each object onto the image coordinate system. Each object is then scanned row by row. For each pixel in the scanline, its z coordinate is compared with the value in the Z-buffer, which is initialized to infinity. If the z coordinate is smaller than the value in the buffer, the new pixel overwrites the old one and its z coordinate is copied into the Z-buffer. Consequently, the pixels closest to the observer are displayed.

The Parallel Approach:
The sequential algorithm is improved through a parallel approach. To illustrate this we take the example of two overlapping squares. Consider two squares, one partially overlapping the other, as shown in fig. 1. When the Z-buffer algorithm is applied to color the pixels, the blue square is partially obscured by the yellow square, which is in the foreground.

fig. 1

The parallel algorithm for this problem is [1]:

    ParallelZbuffer()
    Begin
        Scatter(Vertices)
        Scatter(Squares)
    >>  For all picture to compute Do
    >>      Project vertices from object to screen coordinate system
            MultiBroadcast(Projected Vertices)
    >>      LocalLoad = Estimation(Local squares)
            GlobalLoad = MultiReduce(LocalLoad)
            MultiScatter(squares)
    >>      Sequential Zbuffer
            Output the picture
        EndDo
    End

The parallel parts of this algorithm are marked with >>. To optimize memory and computation requirements, the scene is represented by a two-level data structure: a set of vertices and a set of squares. A vertex is a set of 6 real numbers which define a point in a coordinate system and a normal at this point. A square is a set of 4 vertex indices (4 integers). The different parts of the algorithm are described below.
Scatter(Vertices): All the vertices of the scene are equally distributed over the parallel computer. The vertices come from a disk or from a previous computation on the parallel computer.
Scatter(Squares): The squares are equally distributed over the parallel computer. Note that the squares held by a given processor may refer to vertices that are not present in the local memory of that processor.
Project vertices from object to screen coordinate system: The projections are done in parallel. Each vertex requires one matrix-vector multiplication. Furthermore, the normal is used to shade the vertex and assign it an RGB color using the Gouraud model.
MultiBroadcast(Projected Vertices): After this step, each processor knows all the projected vertices of the scene, even those it does not use.
LocalLoad = Estimation(Local squares): Each processor computes in parallel an estimate of the load due to its own squares. The load associated with each row of the picture is approximated by the number of squares intersecting that row.
GlobalLoad = MultiReduce(LocalLoad): The global load allows each processor to compute which part of the picture it should process so that the workload is balanced.
MultiScatter(Squares): Given the image partition, the squares required by each processor can be computed and sent to it.
Sequential Zbuffer: Each processor runs, in parallel with the others, a sequential Z-buffer on the part of the image it owns.
Output the picture: When all the sequential Z-buffers have completed, the image is transferred to an output device.

Ray Tracing:
Ray tracing is a technique for generating an image by tracing the path of light through pixels in an image plane. Ray tracing gives the best visual quality but is not fast enough to support real-time computation of graphics. There are several ways to make ray tracing or ray casting faster. One class of approaches employs data structures that speed up the search for the closest intersection along a ray. Data structures which support efficient geometric search make it possible to examine only a small percentage of the scene to determine the closest intersection. Octrees, kD-trees and nested bounding volumes are examples of explicitly hierarchical search structures of this type. A kD-tree (k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space [10].

fig. 2 - kD Tree Structure

buildkd() [4]:
1) Create a root node for the kD-tree with the scene bounding box and the scene graph root node.
2) Set the current node to be the root.
3) Set the current discrete LOD level to be the coarsest supported level.
4) Subdivide the geometry at the current node until it satisfies the current discrete LOD criteria.
5) Build out the kD-tree from this node until the tree termination criteria are satisfied.
6) Retain the current geometry (these nodes are effectively leaves for the current discrete LOD level).
7) Set the current discrete LOD level to the next finer level.
8) Go to step 4.
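
As a simplified illustration of kD-tree construction, the following Python sketch builds a tree over a set of points by median splitting. It is not the Razor build procedure [4]: it omits the scene graph, the discrete LOD levels and lazy construction, and swaps in a plain median-split build with a leaf-size termination criterion. The names `KDNode`, `build_kd`, `leaves` and the `leaf_size` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class KDNode:
    axis: Optional[int] = None            # splitting axis; None for a leaf
    split: Optional[float] = None         # splitting coordinate on that axis
    left: Optional["KDNode"] = None       # points below the split
    right: Optional["KDNode"] = None      # points at or above the split
    points: Optional[List[Point]] = None  # payload, set on leaves only

def build_kd(points, depth=0, leaf_size=2):
    """Recursively split points at the median, cycling through the axes."""
    # Termination criterion (an assumption): stop when few points remain.
    if len(points) <= leaf_size:
        return KDNode(points=list(points))
    k = len(points[0])                    # dimensionality of the space
    axis = depth % k
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(
        axis=axis,
        split=ordered[mid][axis],
        left=build_kd(ordered[:mid], depth + 1, leaf_size),
        right=build_kd(ordered[mid:], depth + 1, leaf_size),
    )

def leaves(node):
    """Collect the point lists stored in the leaves, left to right."""
    if node.points is not None:
        return [node.points]
    return leaves(node.left) + leaves(node.right)

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd(pts, leaf_size=1)
```

The two recursive calls operate on disjoint halves of the data, which is what makes the construction stage a natural candidate for multithreading.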

At the beginning of every frame, kD-tree construction is initialized with a single root kD-tree node containing the bounding box of the entire scene and a single pointer to the root of the scene graph. All further kD-tree building is triggered by traversal operations during ray tracing.

The Problem Statement:
The faster Z-buffer algorithm is not well suited to higher-level visibility and occlusion culling. It is highly resolution dependent and prone to accuracy problems. Ray tracing, on the other hand, supports dynamic scenes, high image quality and execution on programmable multicore architectures [3], but it is considerably slower. This calls for a new algorithm which combines the advantages of both methods.

Our Hybrid Methodology:
Our hybrid approach combines the advantages of the ray tracing algorithm and the conventional Z-buffer algorithm: the kD-tree method provides the input to the Z-buffer computation.

    HybridZbuffer()
    Begin
    >>  If (frame received) do BuildkD()
    >>  For all picture to compute Do
    >>      Project vertices from object to screen coordinate system
            MultiBroadcast(Projected Vertices)
    >>      LocalLoad = Estimation(Local squares)
            GlobalLoad = MultiReduce(LocalLoad)
            MultiScatter(squares)
    >>      Sequential Zbuffer
            Output the picture
        EndDo
    End

Its flow chart is:

In this algorithm we first check whether a frame has been received; if so, the buildkD function is called and the kD-tree is created. Since creating kD-trees is a time-consuming process, it is more effective when parallelized. The tree serves as the input to the Z-buffer algorithm. In each thread the calculations of the normal Z-buffer algorithm are carried out as discussed earlier. We thus obtain a new, improved hybrid algorithm which has the advantages of both the Z-buffer algorithm and the ray tracing algorithm.

Key Results:
Consider the generation of the image of a teapot (fig. 3).

fig. 3

For the hypercube topology [1] proposed by S. Miguet and J.
Li, based on a ring of processors, the variation of execution time with the number of processors is shown in the graph below. The times are given for two picture sizes: 256 by 256 pixels and 512 by 512 pixels. There is a sharp decrease in execution time when switching from a single core to multiple cores, but further increases in the number of cores yield little additional improvement.

Discussion:
The conventional Z-buffer used for 3D graphics does not provide complex illumination effects such as soft shadows, reflections and diffuse lighting interactions. The Copernicus system, which uses ray tracing, has been considered as a substitute because it offers dynamic scenes, high image quality and execution on programmable multicore architectures, but it is considerably slower than the Z-buffer. Our algorithm is designed to retain the advantages of both algorithms. It is presumed to be most effective for computing a set of images when the polygons are already present in local memory and need only a global declaration to be correctly distributed among the processors.

Conclusion and Future Work:
Parallel implementations of various computer graphics algorithms, such as Z-buffer, shadow mapping and ray tracing, achieve good speedup compared to their sequential counterparts. The proposed algorithm, though untested, is expected to deliver satisfactory results and to overcome the inadequacies of the Z-buffer and ray tracing techniques. Buffering the output of the kD-tree makes it possible to incorporate reflections, refractions and transparency while reducing the complexity of the algorithm. This gives scope for achieving previously unattainable image processing capabilities, once the algorithm has undergone extensive testing and analysis.

References:
[1] Henri-Pierre Charles, Laurent Lefèvre and Serge Miguet. An Optimized and Load-Balanced Portable Parallel Zbuffer, 2007.
[2] Paul S. Heckbert and Michael Herf. Simulating Soft Shadows with Graphics Hardware. Carnegie Mellon University, Pittsburgh, January 15, 1997.
[3] Venkatraman Govindaraju, Peter Djeu, Karthikeyan Sankaralingam, Mary Vernon, William R. Mark. Toward A Multicore Architecture for Real-time Raytracing. The University of Texas at Austin, 2008.
[4] Gordon Stoll, William R. Mark, Peter Djeu, Rui Wang, Ikrima Elhassan. Razor: An Architecture for Dynamic Multiresolution Ray Tracing. The University of Texas at Austin, April 26, 2006.
[5] Kenneth I. Joy. The Depth-Buffer Visible Surface Algorithm. University of California, 1996.
[6] Karthik Ramani, Christiaan P. Gribble, Al Davis. StreamRay: A Stream Filtering Architecture for Coherent Ray Tracing. University of Utah, 2009.
[7] Nelson Max, Keiichi Ohsaki. Rendering Trees From Precomputed Z-Buffer Views. University of California, Davis.
[8] Michael Wand, Matthias Fischer, Ingmar Peter, Friedhelm Meyer auf der Heide, Wolfgang Straßer. The Randomized z-Buffer Algorithm: Interactive Rendering of Highly Complex Scenes. Universities of Tübingen and Paderborn, 2001.
[9] www.whatis.com
[10] www.wikipedia.org

Acknowledgements:
We are grateful to our college, the HOD and our faculty mentor for all the support and encouragement we have received from them. We would also like to thank Intel for giving us an opportunity to present this paper.