INDEX

Week 1
1. Introduction to graphics
2. Historical evolution, issues and challenges
3. Basics of a graphics system
4. Introduction to 3D graphics pipeline

Week 2
5. Introduction and overview on object representation techniques
6. Various boundary representation techniques
7. Spline representation – I
8. Spline representation – II

Week 3
9. Space representation methods
10. Introduction to modeling transformations
11. Matrix representation and composition of transformations
12. Transformations in 3D

Week 4
13. Color computation – basic idea
14. Simple lighting model
15. Shading models
16. Intensity mapping

Week 5
17. Color models and texture synthesis
18. View transformation
19. Projection transformation
20. Window-to-viewport transformation

Week 6
21. Clipping introduction and 2D point and line clipping
22. 2D fill-area clipping and 3D clipping
23. Hidden surface removal – I
24. Hidden surface removal – II

Week 7
25. Scan conversion of basic shapes – I
26. Scan conversion of basic shapes – II
27. Fill area and character scan conversion
28. Anti-aliasing techniques

Week 8
29. Graphics I/O devices
30. Introduction to GPU and shaders
31. Programming with OpenGL
32. Concluding remarks
Computer Graphics
Dr. Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 01
Introduction to Graphics
Hello and welcome to the first lecture of the course Computer Graphics. In this lecture we
will try to get an overview of the basic idea of graphics and what it means.
(Refer Slide Time: 00:50)
So, let us begin with a very simple, trivial question: what do we do with computers? I think most of you will be able to tell that we do a lot of things. Let us see some examples of the things that we do with a computer.
(Refer Slide Time: 01:07)
The first example that we will see is related to a document processing task. Essentially, we are interested in creating a document; let us see what we do there and what we get to see on the screen.
(Refer Slide Time: 01:29)
On the screen I have shown one example of document creation in progress; this is essentially the creation of the slides from which I am delivering the lecture. As you can see, there are many things being shown on the screen. So, what are those things, what are the components that we are seeing on the screen?
(Refer Slide Time: 01:56)
In fact, there are a large number of different things. The most important component, of course, since we are talking of document processing activities, is the alphanumeric character. There are many such characters, the alphabets and the numbers. And how do we enter those characters? By using a keyboard, either physical or virtual.
(Refer Slide Time: 02:32)
But apart from that there are other equally important components.
(Refer Slide Time: 02:39)
For example, the menu options that we see here on the top side of the screen, as well as the various icons representing editing tools that we get to see on the top part of the screen. Here, or here, in fact, all these components are essentially editing tools and the icons representing those tools.
(Refer Slide Time: 03:19)
We also have the preview slides on the left part which is another important component.
(Refer Slide Time: 03:28)
So, if you have noted, some of these components are shown as text, like the alphanumeric characters, and others are shown as images, like those icons.
(Refer Slide Time: 03:41)
So, essentially there is a mix of characters and images that constitute the interface of a typical
document processing system.
(Refer Slide Time: 03:52)
Now, let us see another example, which you may or may not have seen but which is also quite common, that is, the CAD or Computer Aided Design interface. So, CAD stands for Computer Aided Design. This is an example of the interface; there are many different systems with different interfaces, and what I have shown here is one such example.
(Refer Slide Time: 04:21)
So, what do these systems do? Essentially, with such a system someone can actually design machinery parts, and there are some control buttons to perform various operations on these parts.
(Refer Slide Time: 04:46)
And as you can see, the overall part, that is, the entire image, is constructed from individual components like these smaller gears, this cylinder, these cubes. And these smaller components have some specified properties, for example, dimensions.
(Refer Slide Time: 05:31)
So, what can we do with this interface? Typically, engineers use such interfaces to create machinery by specifying individual components and their properties and trying to assemble them virtually on the screen, to check if there is any problem in the specifications. Clearly, since everything is done virtually, the engineer does not require any physical development of the machinery, so it saves time, it saves cost and many other things. That is example 2.
(Refer Slide Time: 06:08)
Now let us see one more interesting example of computer graphics. This is related to visualization, or trying to visualize things that are otherwise difficult to visualize. Under visualization we will see a couple of examples; the first one is visualization of a DNA molecule. Now DNA, as you all know, stands for deoxyribonucleic acid; it is essentially a kind of genetic code present in every cell, and it is not possible to see it with our bare eyes, as we all know.
But it would be good if we could see it somehow, in some manner, and an application of computer graphics known as visualization makes it possible, as shown here. This type of visualization is known as scientific visualization, where we try to visualize things that occur in nature but that we cannot see otherwise or find difficult to see.
(Refer Slide Time: 07:23)
There is another type of visualization; let us see one example. Suppose we want to visualize a computer network, how traffic flow happens in the network. Here by traffic I mean packets, the packets that are being moved in the network. In any case, we are not in a position to visualize it with our eyes, but a computer can help us; with a computer we can actually create a visualization of the network traffic flow.
(Refer Slide Time: 08:06)
This type of visualization is known as information visualization. Here we are not dealing with natural objects; instead we are dealing with man-made information, and we are trying to visualize that information. So, we have two types of visualization: scientific and information. And these are applications of computer graphics that help us perceive, that help us understand, things that otherwise we would not be able to perceive.
(Refer Slide Time: 08:55)
So, as I said each of the examples that I have discussed earlier is an example of the use of
computer graphics.
(Refer Slide Time: 08:57)
But these are only three examples. In fact, the spectrum of such applications of computer graphics is huge, and everything that we get to see around us involving computers is basically an application of computer graphics; it is definitely not possible to list all those applications.
(Refer Slide Time: 09:28)
Also, we have to keep in mind that we are not talking only about desktop or laptop screens; we are talking about a plethora of other types of displays as well. That includes mobile phones, information kiosks at popular spots such as airports, ATMs, large displays at open-air music concerts, air traffic control panels, even movie screens in theatres. All these are kinds of displays, and whatever is being shown on these displays is mostly an application of computer graphics. So, we have two things: one is a large number of applications; the second is applications on all possible displays.
(Refer Slide Time: 10:26)
And as I have already mentioned earlier, for those who are not very conversant with the inner workings of a computer, whenever we use the term computer, the thing that comes to the mind of such lay persons is essentially the display, whatever is being shown on the display. So, essentially the display is considered to be the computer by those who are not well accustomed with the inner workings of a computer.
(Refer Slide Time: 11:04)
Now, what is the common thing between all these applications? Instances of images that are displayed. Here by image we are referring to both text, that is, alphanumeric characters, as well as actual images, because text is also considered to be an image, as we shall see in our subsequent lectures.
(Refer Slide Time: 11:19)
And these images are constructed with objects, or components of objects, like we discussed in the CAD application, where there are individual objects as we have seen earlier. Now these objects are essentially geometric shapes. And to these objects we assign some colors, like the yellow color here or the blue color here or the white here. So, colored geometric objects are used to create the overall image.
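As a tiny, hedged illustration of this idea (the types and values below are hypothetical, not from the lecture), an image can be thought of, at its simplest, as a list of geometric shapes with colours attached; later lectures develop much richer representations.

    #include <stdio.h>

    /* Hypothetical types: an RGB colour and one simple geometric object. */
    typedef struct { float r, g, b; } Color;
    typedef struct { float cx, cy, radius; Color fill; } Circle;

    int main(void)
    {
        /* The "image" as a small collection of coloured shapes. */
        Circle scene[] = {
            { 10.0f, 10.0f, 5.0f, { 1.0f, 1.0f, 0.0f } },   /* a yellow circle */
            { 30.0f, 12.0f, 3.0f, { 0.0f, 0.0f, 1.0f } },   /* a blue circle   */
        };
        int n = (int)(sizeof(scene) / sizeof(scene[0]));
        for (int i = 0; i < n; i++)
            printf("circle at (%.1f, %.1f), radius %.1f\n",
                   scene[i].cx, scene[i].cy, scene[i].radius);
        return 0;
    }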
(Refer Slide Time: 12:08)
Along with that, there is one more thing. When we create, edit or view a document, we are dealing with alphanumeric characters, and each of these characters is an object. Again, we shall see in detail why characters are considered to be objects in subsequent lectures. These objects are rendered on the screen with different styles, sizes as well as colors, like the typical objects that we have noted in the previous case.
(Refer Slide Time: 12:42)
Similarly, if we are using some drawing application or drawing package, like MS Paint or the drawing application of MS Word, there we deal with other shapes such as circles, rectangles and curves. These are also objects, and with these objects we create a bigger object or a bigger image.
(Refer Slide Time: 13:12)
Finally, consider the case of animation videos or computer games, which involve animation anyway. In many cases we deal with virtual characters. These are essentially artificially created characters which may or may not be human-like.
(Refer Slide Time: 13:31)
And all these images or their components can be manipulated, because nowadays most graphics systems are interactive. So, the user can interact with the screen content and manipulate it. For that, input devices are there, such as the mouse, keyboard, joystick and so on.
(Refer Slide Time: 14:01)
Now, how can a computer do all these things? What are those things? Let us recap again. Images consist of components, so we need to represent those components; then we need to put them together into the form of an image; we should allow the user to interact with those components or the whole image through input devices; and we should also be able to create the perception of motion by moving those images. How can a computer do all these things?
(Refer Slide Time: 14:42)
You have probably already done some basic courses where you learned that computers understand only binary language, that is, the language of 0s and 1s. On the other hand, in computer graphics what we have are letters, numbers, symbols and characters, but these are not 0s or 1s. These are things that we can perceive and understand. So, what is needed? There are two questions related to that.
(Refer Slide Time: 15:23)
The first question is how we can represent such objects in a language that the computer understands and can process.
(Refer Slide Time: 15:39)
The second question is how we can map from the computer's language to something that we can perceive. Essentially, with the computer output in 0s and 1s, we will not be able to understand what it means; we want it again in the form of those objects that we mentioned earlier. So, one thing is mapping from our understanding to the computer's language, and the other thing is mapping from the computer's understanding to our language.
(Refer Slide Time: 16:06)
In other words, how can we represent, synthesize and render images on a computer display? This is the fundamental question that we try to answer in computer graphics.
(Refer Slide Time: 16:23)
From this fundamental question we can frame four component questions.
(Refer Slide Time: 16:29)
The first one is, as we have already said, that imagery is constructed from constituent parts. So, how can we represent those parts? That is the first basic question.
(Refer Slide Time: 16:46)
The second question is how to synthesize the constituent parts to form complete, realistic imagery. That is our second question.
(Refer Slide Time: 17:01)
Third question is how to allow the users to manipulate the imagery or its constituents on the
screen with the use of input devices. That is our third fundamental question.
(Refer Slide Time: 17:22)
And finally, the fourth question is how to create the impression of motion, that is, animation. So, these are the four questions: first, how to represent; second, how to synthesize; third, how to interact; and fourth, how to create animation.
(Refer Slide Time: 17:43)
Now, in computer graphics we seek answers to these four basic questions.
(Refer Slide Time: 17:47)
Here, a few things need to be noted. First of all, when we are talking of computer screens, we are using the term in a very broad sense, because screens vary greatly, as we are all aware nowadays, from small displays to display walls to large displays, and these variations indicate corresponding variations in the underlying computing platform. However, we will ignore those differences; when we refer to a computer screen, we will assume that we are referring to all sorts of screens.
(Refer Slide Time: 18:33)
Accordingly, whatever we discuss, our objective will be to seek efficient solutions to the four basic questions for all possible platforms. For example, displaying something on a mobile phone requires techniques different from displaying something on your desktop, because the underlying hardware may be different. There are differences in CPU speed, memory capacity, power consumption issues and so on. So, when we are proposing a solution to answer one or all of these questions, we should keep these underlying variations in mind.
(Refer Slide Time: 19:23)
Now, in summary, what we can say about computer graphics is that it is the process of rendering static images or animation, which is a sequence of images, on the computer screen, and that too in an efficient way, where efficiency essentially refers to the efficient utilization of the underlying resources.
(Refer Slide Time: 19:48)
In this course we shall learn this process in detail, particularly the stages of the pipeline, where the pipeline refers to the set of stages which are part of this whole process of rendering, and the pipeline implementation, that is, how we implement the stages. This involves a discussion of the hardware and software basics of a graphics system. However, we will not discuss the process of creation of animation, which is a vast topic in itself and requires a separate course altogether.
(Refer Slide Time: 20:31)
This is just for your information: there is a related term, which some of you may have heard of, called image processing. In image processing we manipulate images, whereas in computer graphics we synthesize images, and we also synthesize them in a way that gives us the perception of motion, which we call animation.
So, computer graphics deals with the synthesis of images as well as animation, whereas image processing deals with the manipulation of already captured images. In many applications these two are linked, but we will not discuss those things in the limited scope of this course.
(Refer Slide Time: 21:17)
So, whatever we have discussed today you can find in detail in this book; more specifically, you should refer to Chapter 1, Section 1 for the topics that we covered today. In the next lecture we will go through the historical evolution of the field, followed by a discussion on the issues and challenges that are faced by workers in this field. Thank you and goodbye.
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 2
Historical Evolution, Issues and Challenges
Hello and welcome to lecture number 2 in the course Computer Graphics. So, before we
start, let us recap what we have learned in the previous lecture.
(Refer Slide Time: 0:40)
In the last lecture, if you recall, we got introduced to the field and talked about the basic idea: what computer graphics is and what it deals with. Today, we will discuss the historical evolution of the field, and we will also discuss the issues and challenges that are faced by researchers in this area.
Knowledge of the historical evolution is always beneficial for a broader understanding of the subject. So, we will go into a bit of detail on the evolution, followed by a discussion on the issues and challenges.
(Refer Slide Time: 1:25)
In the early days, when computers had just started appearing, that is, in the 1940s and 50s of the last century, displays consisted of a terminal unit capable of showing only characters. So, in the earlier days we had displays that used to show only characters; there was no way to show anything other than characters.
(Refer Slide Time: 2:05)
Subsequently, the ability to show complex 2D images was introduced in later developments. Now, with the advancement of technology, other things changed.
(Refer Slide Time: 2:24)
We now have higher memory capacity and increased processor speeds. Along with those changes, display technology also improved significantly. So we had three broad developments: memory capacity enhancement, processor speed increase, as well as improvement in display technology.
(Refer Slide Time: 2:56)
Now, all three together made it possible to display complex 3D animations, which are computationally intensive and which assume that we are capable of performing the computations in real time. How computationally intensive these processes are, we will see in the subsequent lectures. In fact, that is the core content of this course.
(Refer Slide Time: 3:43)
Now, if we look closer at 3D animation, we will see that there are two aspects: one is the synthesis of frames, and the second is combining the frames together and rendering them in a way that generates a perception of motion, or generates the motion effect. Now, synthesis of frames as well as combining them and rendering them on the screen to generate motion are complex processes, and they are also resource intensive; they require lots of hardware resources.
So, these are the main focus areas of present-day computer graphics activities: how to make these processes workable in the modern-day computing environment.
(Refer Slide Time: 4:48)
Now, we are repeatedly using the term computer graphics, but it has an origin. The term was first coined by William Fetter of the Boeing Corporation in 1960, that is, about 60 years ago.
(Refer Slide Time: 5:09)
Subsequently, Sylvan Chasen of the Lockheed Corporation in 1981 proposed four phases of the evolution of the field. What are those phases? The first phase was conception to birth, typically considered to be between 1950 and 1963; this is also known as the gestational period. The second phase is the childhood phase, of short duration, from 1964 to 1970. Then we have adolescence, again a somewhat significant phase, spanning the 1970s to the early 1980s. And then we have adulthood, which is still continuing, starting from the early 1980s.
So, these are the four phases proposed by Sylvan Chasen in 1981: the gestational period, childhood, adolescence and adulthood. Now, let us have a quick look at the major developments that took place in each of these phases.
(Refer Slide Time: 6:31)
Let us start with the first phase, the gestational period, between 1950 and 1963, in the early stages of computers. If you are aware of the evolution of computers in its early phases, then you know that the gestational period also coincides with the early developmental phases of computing technology itself. So, that was the phase when the technology evolved.
Nowadays, we take for granted the availability of interfaces that are popularly known as graphical user interfaces. We get to see them on almost all of our computer screens, whether we are using desktops, laptops or even smartphones. But in that phase, the gestational period, the GUI concept was not there. In fact, nobody was even aware of the possibility of such an interface; it could not even be imagined.
(Refer Slide Time: 7:47)
Now, in that phase there was one system developed called SAGE, which stands for Semi-Automatic Ground Environment. It was developed for the US Air Force as part of a bigger project, the Whirlwind project, which was started in 1945. The SAGE system is an early example from the gestational period demonstrating the use of computer graphics.
(Refer Slide Time: 8:39)
What did this system do? The basic idea of the project was to get the positional information of an aircraft from radar stations, which is typically the job of a radar network. There was an operator, like this operator here, sitting in front of a screen, as you can see, not the traditional screens that we are accustomed to, but an early version of a screen.
(Refer Slide Time: 9:21)
On this screen aircraft were shown, and on the aircraft other data, the data received from the radar, was superimposed. So, essentially what we have is that a geographical region is shown on the screen, and on that region the aircraft information is shown.
(Refer Slide Time: 9:48)
There was one more aspect of the system. It was actually, in a sense, an interactive system: the operator could interact with the system with the use of an input device called a light gun or light pen. If there was an aircraft shown on the screen, the operator could point the pen at that aircraft to get its identification information.
(Refer Slide Time: 10:30)
So, when the gun was pointed at the plane symbol on the screen, an event was sent to the Whirlwind system, which in turn sent the details as text about the plane, its identification information, which was then displayed on the screen of the operator. Something like this: as you can see, this is a light gun or light pen; the operator is pointing the pen at the screen where an aircraft symbol is shown, and once the pointing is done, the system sends a message to the overall Whirlwind system, which had all the information, and this is sent back to the interface to be seen by the operator.
(Refer Slide Time: 11:34)
So, as I said, the SAGE system, which was part of the Whirlwind project, had traces of interactive graphics, where the interaction was done with the light gun or light pen, but it was still not fully interactive in the way we understand interaction in the modern context. The true potential of interactive computer graphics came into the picture after the development of another system called Sketchpad by Ivan Sutherland, way back in 1963.
This Sketchpad system was part of the doctoral thesis of Ivan Sutherland at MIT, and it actually demonstrated the idea as well as the potential of an interactive graphics system.
(Refer Slide Time: 12:46)
Like the SAGE system, in Sketchpad also the interaction was done through a light pen, and it was meant for developing engineering drawings directly on a CRT screen. Here the operator need not be a passive input provider; instead, active input can be given in the form of creating drawings directly on the screen. An example is shown in this figure: as you can see, this is the screen, and the operator is holding a light pen to create a drawing here.
(Refer Slide Time: 13:36)
Now, this Sketchpad system actually contained many firsts. It is widely considered to be the first GUI, although the term GUI was still not popular at that time. It is also credited with pioneering several concepts of graphical computing, namely how to represent data in memory, how to deal with flexible lines, the ability to zoom in and out, and drawing perfectly straight lines, corners and joints.
These are things that nowadays we take for granted, but they were very, very difficult at the time, and Sketchpad actually managed to demonstrate that they were possible. Accordingly, Sutherland is widely acknowledged by many as the grandfather of interactive computer graphics.
(Refer Slide Time: 14:50)
Now, along with SAGE and Sketchpad, this gestational period also saw the development of many other influential systems.
(Refer Slide Time: 15:03)
During this phase, the first computer game, called Spacewar, was developed in 1961 on the PDP-1, an early computing platform.
(Refer Slide Time: 15:25)
IBM also developed the first CAD or Computer Aided Design system; recollect from our previous lecture that these systems are meant for helping engineers create mechanical drawings and test various things without actually having to build the system. In the gestational period, IBM came up with this first CAD system in 1964, although the work had started in 1959.
(Refer Slide Time: 16:02)
Now, the gestational period was followed by the childhood period, a reasonably short period of only 6 to 7 years. In this period not many significantly new things happened; further development took place along the lines of whatever was developed earlier in the gestational period, and consolidation of the earlier ideas took place.
(Refer Slide Time: 16:37)
Then came the adolescence period, mostly confined to the 1970s and the early 1980s. In this phase, again, many new things happened. In 1971 Intel released the first commercial microprocessor, called the 4004. As we all know, with the coming of the microprocessor a paradigm shift took place in the way computers were designed, and that in turn impacted the computer graphics field in a significant way by making computations less costly and more affordable.
(Refer Slide Time: 17:32)
As a result, in this period several interesting things happened. Primarily, two types of developments took place: one is techniques for realistic 3D graphics, and the other is that several applications were developed during this phase, particularly in the entertainment and movie-making fields. As a result of those applications, people started noticing the potential of the field and invested more and more time and money. So, both developments were significant in the context of the overall evolution of the field.
(Refer Slide Time: 18:16)
Now, what work was done for realistic 3D image generation? One important development was the work on lighting models. We will learn about these models later. What these models were meant to do was to assign colors to pixels, and this coloring of pixels, the smallest graphical units on a screen, is very important to give us a perception of realistic images, as we all know. We shall see this in detail in later lectures.
(Refer Slide Time: 19:03)
Apart from that, another development took place, namely texture mapping techniques. Texture is basically the pattern that we get to see on surfaces. So, if we can impose textures on our artificially created object surfaces, then definitely that will lead us to a more realistic image representation, and that development took place in this adolescence period.
The first notable work was done by Catmull in 1974. As you can see, some textures are shown on this object; because of that, we are able to make out that it is a 3D object having certain characteristics. Without texture, it would look dull and unrealistic.
(Refer Slide Time: 20:05)
An advanced form of texture mapping was introduced through bump mapping by Blinn in 1978. As in the example shown here, we can see that on the object surfaces a special type of texture was incorporated to make it look more real, more natural. These are called bumps, hence bump mapping.
(Refer Slide Time: 20:34)
Another development that took place is an advanced technique for creating 3D images called ray tracing; the first notable development took place in the 1980s, in the adolescence period. Using this technique, we can develop realistic 3D images on a 2D screen in a better way than with the other techniques. These are techniques that were developed to improve the quality of the synthesized images, to make them more realistic, more natural.
So, to recap, broadly four approaches were developed in this phase: first, the basic work on lighting models, followed by texture mapping, bump mapping, and finally ray tracing methods. Apart from that, as I mentioned earlier, another strand of development that took place during this phase was the development of several applications of computer graphics; based on whatever was the state of the art at that time, several applications were developed, particularly in entertainment and movie making.
(Refer Slide Time: 22:12)
So, in 1973 came the movie Westworld, which was the first movie to use computer graphics.
(Refer Slide Time: 22:26)
This was followed in 1977 by the movie Star Wars; I think most of you, if not all, may be aware of this movie. The first Star Wars movie came out in 1977 and became hugely popular throughout the world, and as a result people learned about the potential of computer graphics in a more compelling way.
(Refer Slide Time: 23:01)
The adolescence period was followed by the adulthood period, starting from the early 1980s. The field entered adulthood with the release of the IBM PC in 1981. As we all know, after the advent of the PC or personal computer, computers became a mass product. Earlier, they used to be confined to only a few people who were well educated, at an advanced stage of their studies, and primarily doing research or development work; but after the advent of the PC, computers proliferated and became a mass product. And since they had become a mass product, the focus now shifted to the development of applications that were appealing to the masses.
(Refer Slide Time: 24:15)
Using computer graphics, lots of such applications were developed, and the focus shifted from graphics for experts to graphics for laymen.
(Refer Slide Time: 24:32)
And as a result, we got to see several developments including the development of GUIs
and the associated concepts. In fact, so many developments took place that it gave rise to
a new field of study, which is called human-computer interaction or HCI in short.
(Refer Slide Time: 24:52)
One thing that happened during this phase is that a self-sustaining cycle of development emerged. What is that? As more and more user-friendly systems emerge, they create more and more interest among people; in turn, that brings in new enthusiasm and investment in innovative systems. So, it is a self-sustaining cycle of development: more and more applications appeal to more and more people, the people in turn want more and more, so more and more investment came in, and it continued and is still continuing.
(Refer Slide Time: 25:42)
As a result of this self-sustaining cycle of development, other associated developments took place. From the CPU, we migrated to the GPU, or graphics processing unit, dedicated hardware for graphics. Storage capacity improved significantly, to be able to store and process the large amounts of data required for realistic 3D graphics; now we talk in terms of terabytes and petabytes, instead of the kilobytes or megabytes that used to be the case earlier.
Similarly, display technology has seen huge improvement, from the earliest cathode ray tubes to modern-day touchscreens, situated walls, or even better things. All this took place because of this self-sustaining cycle of development.
(Refer Slide Time: 26:42)
So, we can say that these technological developments brought in a paradigm shift in the field, and with the help of the new technology we are now in a position to develop algorithms to generate photorealistic 3D graphics in real time. All these things are important, and they will form the core subject matter of our discussion in subsequent lectures. Note that all of these are computation-intensive processes, and because of the advancement in technology, such computation-intensive processes have become manageable, possible to implement in real time.
(Refer Slide Time: 27:40)
And since we are able to do those things now, the appeal and applications of computer graphics have increased manifold, and the presence of all these factors implies that the field is growing and will continue to grow in the foreseeable future. So, that is, in brief, the evolution of the field: four phases, starting with the gestational period and ending with adulthood, along with the major developments we briefly discussed.
Now, let us shift our focus to another important aspect of the field: what are the issues and challenges that confront workers in this field?
(Refer Slide Time: 28:28)
Now, in the formative stages of the field, the primary concern was, as we all know, the generation of 2D images or 2D scenes.
(Refer Slide Time: 28:39)
But, as we have already discussed, that subsequently changed; 2D graphics is no longer the thrust area, and nowadays we are mostly focused on the generation of 3D graphics and animation.
(Refer Slide Time: 29:02)
In the context of 3D graphics and animation, there are three primary concerns related to software, that is, software development for the system.
(Refer Slide Time: 29:19)
One is modeling, which essentially means creating and representing object geometry in a 3D world. Here we have to keep in mind that we are not only talking about solid geometric objects, but also about phenomena such as billowing smoke, rain, fire, and other natural phenomena. So, how to model both objects and phenomena, that is one concern.
(Refer Slide Time: 29:58)
The second concern is rendering, essentially creating and displaying a 2D image of the 3D objects. Why a 2D image? Because our screen is 2D, so we have to convert the 3D objects into a 2D form. Rendering thus deals with issues related to displaying the modeled objects on the screen, and there are some other related issues involved, namely coloring of the pixels on the screen, color and illumination, which involves simulating the optical processes; then visible surface determination with respect to the viewer position; textured patterns on the surfaces, or texture synthesis, to mimic realism; 3D to 2D transformation; and so on. These are the issues involved in rendering.
(Refer Slide Time: 31:11)
Then the third major issue related to graphics software is animation, that is, describing how the image changes over time. What does it deal with? It deals with imparting motion to the objects to simulate movement, to give us a perception of movement. The key concerns here are modeling of motion and interaction between objects during motion. So, the three major issues related to software are modeling of objects, rendering of objects and creation of animation. Now, there are some hardware-related issues as well.
(Refer Slide Time: 32:06)
Why are those important? Because the quality and cost of the display technology are important concerns; there is always a tradeoff between the two, the quality of the hardware and its cost, so we cannot get high quality at low cost and vice versa. While building a graphics system or application, we need to keep this tradeoff in mind.
(Refer Slide Time: 32:39)
Along with that, we need to keep in mind the selection of an appropriate interaction device, because nowadays we are talking of interactive computer graphics. The interaction component is important, and it is important to choose an appropriate mode of interaction or input device such that the interaction appears intuitive to the user. The user should not be forced to learn complex patterns or complex operations; it should be as natural as possible.
(Refer Slide Time: 33:20)
Finally, the design of specialized graphics devices to speed up the rendering process is also of utmost importance, because graphics algorithms are computation intensive, and if we have dedicated hardware to perform those computations, then we can expect better performance. The issue is how to design such hardware at an affordable cost, and that is the primary concern related to hardware platforms for computer graphics.
So, from the point of view of hardware, we have the quality-versus-cost tradeoff to keep in mind; we also have to keep in mind the type of input device we are using, as well as the dedicated graphics hardware that we can afford.
(Refer Slide Time: 34:31)
Now, one thing we should note here is that in this course we shall learn how these issues are addressed, but we will not discuss issues related to animation; we will restrict our discussion to modeling and rendering of 2D images on the screen.
(Refer Slide Time: 34:57)
So, whatever we have discussed so far can be found in Chapter 1 of the book that we are following. You are advised to go through Section 1.1 and Section 1.2 for more details on the topics that we have covered today. That is all for today; we will meet again in the next lecture. Thank you and goodbye.
Computer Graphics
Professor Doctor Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 3
Basics of a graphics system
Hello and welcome to lecture number 3, in the course Computer Graphics. Before we go into
the topics of today's discussion, let me briefly recap what we have learnt in the previous
lectures.
(Refer Slide Time: 0:45)
So, in the first lecture we got a basic introduction to the field: what graphics is and what the main characteristics of this field are. This was followed by a brief discussion on the historical evolution as well as the issues and challenges that confront the researchers and workers in this area. These three topics we have covered in the previous lectures. Today, we shall introduce a basic graphics system so that in subsequent discussions it will be easier for us to understand the content.
(Refer Slide Time: 1:25)
So, what do we do in computer graphics? The answer is simple: we generate or synthesize a 2D image from some scene and we display it on a screen. Essentially, generation of images and display on the screen. Now, how do we do that? In the previous lectures we went into some detail on this question; now let us try to understand the answer from the perspective of the graphics system.
(Refer Slide Time: 2:00)
So, if we look at a graphics system, the components that are likely to be there look something like this. We have a host computer, where all the processing takes place; then we have a display controller, one component of the graphics system. This display controller takes input from the host computer in the form of display commands, and it also takes input from input devices, the various input devices we mentioned earlier for enabling us to interact with the screen content.
Now, the output of the display controller goes to another component called the video memory. The video memory content goes to a third component called the video controller, which eventually helps to display the image on the display screen. So, there are broadly three components that are unique to a graphics system: the display controller, the video memory and the video controller. We will have a brief discussion on each of these components for better understanding.
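To make the data flow among these three components concrete, here is a minimal sketch in C (all names and sizes are hypothetical, and the analogue stage is reduced to a stub); it illustrates the idea only, not the implementation of any actual graphics system.

    #include <stdio.h>
    #include <string.h>

    #define WIDTH  8
    #define HEIGHT 4

    /* Video memory: one intensity value per pixel (hypothetical layout). */
    static unsigned char video_memory[HEIGHT][WIDTH];

    /* Display controller: synthesizes the image and writes it into video memory. */
    static void display_controller_render(void)
    {
        memset(video_memory, 0, sizeof(video_memory));  /* clear to black          */
        video_memory[1][2] = 255;                       /* "draw" one bright pixel */
    }

    /* Stand-in for the analogue/electromechanical stage that drives one pixel. */
    static void drive_pixel(int x, int y, unsigned char intensity)
    {
        if (intensity > 0)
            printf("pixel (%d, %d) driven with intensity %d\n", x, y, intensity);
    }

    /* Video controller: reads video memory and drives every pixel of the screen. */
    static void video_controller_scanout(void)
    {
        for (int y = 0; y < HEIGHT; y++)
            for (int x = 0; x < WIDTH; x++)
                drive_pixel(x, y, video_memory[y][x]);
    }

    int main(void)
    {
        display_controller_render();   /* display controller fills video memory    */
        video_controller_scanout();    /* video controller shows it on the screen  */
        return 0;
    }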
(Refer Slide Time: 3:40)
Let us start with the display controller. The image generation task is performed by the display controller. So, when we say that in computer graphics our primary objective is to generate an image, that generation task is performed by the display controller, and it takes input from the CPU of the host computer as well as from external input devices such as the mouse, keyboard, joystick, etc.
(Refer Slide Time: 4:12)
Based on these inputs it generates images; these images are generated following a multi-stage process which involves a lot of computation.
(Refer Slide Time: 4:30)
One concern here is that if all these computations are to be carried out by the host CPU, then it may get very little time to perform other computations. A computer is not meant only for display; it is supposed to perform other activities as well. If the CPU is engaged only with the computations relevant for display, then it will not have time to perform other computations, which in effect will affect the throughput of the system. In such a situation the host computer system would not be able to do much except graphics, which is definitely not a desirable situation.
(Refer Slide Time: 5:20)
To avoid such situations and increase the efficiency of the system, the job of rendering or displaying is usually carried out by a dedicated component of the system, which some or all of us have probably heard of, called the graphics card. In this card there is a dedicated processor; like the CPU, we have a dedicated processing unit for graphics computing, which is called the GPU or Graphics Processing Unit. Later on we will have one lecture on the basic idea of the GPU; for the time being we will just mention that there is a unit called the GPU in the graphics card.
(Refer Slide Time: 6:24)
The CPU assigns any graphics rendering task to this separate graphics unit, and we call this graphics unit the display controller, which is of course a generic name; in different systems it is called different things. So, essentially the display controller deals with performing the multi-stage operations required to create or synthesize a 2D image.
(Refer Slide Time: 7:15)
Now, the second component is the video memory. The output of the display controller is some representation of the 2D image, and the video memory, which, if we recollect from the generic architecture, takes the output of the display controller as input, stores this representation.
(Refer Slide Time: 7:29)
The display controller generates the images in digital format, strings of 0s and 1s, which is expected because a computer understands and processes information only in terms of 0s and 1s.
(Refer Slide Time: 7:45)
The place where we store it is simply the video memory, which is a dedicated part of the memory hierarchy. As we all know, in the memory hierarchy of a computing system we have RAM, ROM, secondary storage and cache at different levels; the video memory is also a part of this hierarchy, and typically it is situated in the separate graphics unit or graphics card. It is more popularly called VRAM or video RAM; probably many or all of you have heard of this term. So, the display controller generates the image representation, and that representation is stored in the video memory.
(Refer Slide Time: 8:48)
Then comes the video controller. Again, let us go back to the generic architecture: the video controller is situated here; it takes as input the information stored in the video memory and then does something to display the image on the screen.
(Refer Slide Time: 9:13)
So, what does it do? It essentially converts the digital image, represented in the form of 0s and 1s, to analogue voltages. Why? Because the voltages drive electromechanical arrangements which ultimately render the image on the screen. The screen essentially is an electromechanical mechanism; to run this mechanism we require voltages, and these voltages are generated by the video controller based on the 0s and 1s stored to represent the image.
(Refer Slide Time: 10:05)
In each display screen we have a basic unit of display, typically called a pixel, and the pixels are typically arranged in the form of a grid or matrix. If I draw a screen like this, we will have a pixel grid something like this, where each cell represents a pixel, essentially a matrix of pixels.
(Refer Slide Time: 10:40)
Now, these pixels are excited by electrical means, and when they are excited they emit light with specific intensities. These intensities give us the sensation of coloured images, or the sensation of colours. So, pixels are there on the screen, pixels are excited by electrical means, and after excitation they emit light with the specified intensity, which gives us a sensation of colour. If some portion of an image has the red colour, the corresponding pixels will emit light with the intensity of red so that we get the red colour sensation.
(Refer Slide Time: 11:30)
Now, the mechanism through which these pixels are excited is the job of the video controller. The video controller is essentially tasked with exciting pixels through electrical means by converting the digital input signal, 0s and 1s, into analogue voltage signals, which in turn activate the suitable electromechanical mechanism that is part of the controller. So that, in a very broad sense, is what a graphics system looks like; it has three unique components: the display controller, the video memory and the video controller.
The display controller is responsible for creating a digital representation of the image to be displayed, which is stored in the video memory, and then this image information is used to excite pixels on the screen, to emit light of specific intensity, to give a sensation of coloured images. This job of exciting pixels on the screen is done by the video controller.
In light of this broad description of a graphics system, let us now move to our next topic: types of graphics systems or graphics devices.
(Refer Slide Time: 13:08)
So, there are broadly two types of graphics systems, based on the method used to excite the pixels. What are these two types? One is the vector scan device, the other is the raster scan device.
(Refer Slide Time: 13:33)
Let us start with the vector scan device. This type of graphics device is also known as a random scan, stroke writing or calligraphic device.
(Refer Slide Time: 13:47)
In this type of device, when we are talking of an image, that image is represented, or assumed to be represented, as a composition of continuous geometric primitives such as lines and curves. So, any image is assumed to be composed of lines and curves, and when we render or display these images on the screen we essentially render these basic geometric shapes. We no longer talk about the whole image; instead we talk about the component lines and curves that define the image.
(Refer Slide Time: 14:37)
In other words, a vector scan device excites only those pixels of the pixel grid that are part of these primitives. To a vector scan device there is no such concept as a full image; instead, it only knows about the constituent geometric primitives, and it excites the pixels that are part of those primitives.
(Refer Slide Time: 15:10)
An example is shown here: consider the line in the left figure; in a truncated part of the grid, the corresponding pixels are highlighted in the right figure. To the vector scan device the image is not the line but only this set of pixels; it knows about these pixels rather than about the line itself, and these pixels are excited to generate the line image. Only these pixels are excited; other pixels are not excited. This is important: in the case of a vector scan device we excite only the pixels that are part of the primitives; other pixels are not touched.
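As a rough illustration of this selective excitation (a sketch only; excite_pixel is a hypothetical stand-in, and proper line scan conversion algorithms come up in Week 7), the following C code visits and "excites" only the pixels lying along a line, using simple DDA-style sampling, and never touches any other pixel of the grid.

    #include <math.h>
    #include <stdio.h>

    /* Hypothetical stand-in for exciting one pixel of a vector scan display. */
    static void excite_pixel(int x, int y)
    {
        printf("excite (%d, %d)\n", x, y);
    }

    /* Excite only the pixels on the line from (x0, y0) to (x1, y1). */
    static void scan_line_primitive(float x0, float y0, float x1, float y1)
    {
        float dx = x1 - x0, dy = y1 - y0;
        int steps = (int)fmaxf(fabsf(dx), fabsf(dy));   /* samples along the line */
        float x = x0, y = y0;
        for (int i = 0; i <= steps; i++) {
            excite_pixel((int)(x + 0.5f), (int)(y + 0.5f));  /* round to a pixel  */
            if (steps > 0) { x += dx / steps; y += dy / steps; }
        }
    }

    int main(void)
    {
        scan_line_primitive(1.0f, 1.0f, 8.0f, 4.0f);   /* a single line primitive */
        return 0;
    }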
(Refer Slide Time: 16:04)
As a result, what do we need to do? We need to selectively excite pixels, which is a very tough job requiring high precision, obviously, and complex hardware.
(Refer Slide Time: 16:26)
This in turn makes these devices costly, because it takes money to develop such high-precision hardware. Also, due to this selective exciting, vector scan devices are good for rendering wireframes, which are basically outline images. For complex scenes involving a lot of filled areas, flicker becomes visible because of this mechanism of selective exciting, which is not a good thing.
(Refer Slide Time: 17:18)
The other type of graphics device is the raster scan device. In a raster scan device an image is viewed as represented by the whole pixel grid. Earlier we considered an image to be represented by only a subset of the whole pixel grid, but here we are considering the whole pixel grid and not only the selected pixels representing the primitives. So, when we render an image on a raster scan device, all the pixels are considered; in the case of a vector scan device we considered only a subset and the other pixels were not touched, but here all the pixels are considered. And how do we consider them?
(Refer Slide Time: 18:08)
By considering the pixels in a sequence. What is the typical sequence? It is typically left to right, top to bottom. So, if we have a grid like this, then we typically start from the left, move towards the right end, then go to the next row, move towards the right end, and continue in this way, with this kind of movement, till we reach the lower right end pixel.
(Refer Slide Time: 18:41)
The same thing is mentioned here: the controller starts with the top left pixel and checks if the pixel needs to be excited; that information is stored in the memory. If it needs to be excited, it excites the pixel; otherwise it leaves it unchanged. But note that the pixel is considered for excitation and action is taken accordingly.
(Refer Slide Time: 19:16)
It then moves to the next pixel on the right and repeats the steps till the last pixel in the row is reached.
(Refer Slide Time: 19:29)
Then the controller considers the first pixel in the next row and repeats the steps and in this
manner it continues till the right bottom pixel of the grid.
(Refer Slide Time: 19:43)
Now, this process of considering pixels in sequence, or such sequential consideration of pixels, is known as scanning; this is the more generic term used. In raster scan devices, pixel scanning takes place, and each row of the grid is known as a scan line. So, this sequential consideration is called scanning, and each row in the pixel grid is known as a scan line.
(Refer Slide Time: 20:23)
Let us consider the same example here. Earlier we considered only the pixels that are part of this line, only these pixels; now we are considering all pixels, starting from the top left corner, moving in this direction, then this row, and so on till this point. Each row is a scan line, and as you can see in the right-hand figure, the white pixels mean they need not be excited.
The system considered each such pixel, found that it need not be excited and moved to the next pixel. The filled circles indicate excited pixels, which represent the line; that information was also there in the memory, and the video controller found that these pixels needed to be excited, so it excited them. In the process it considered all the pixels in the grid and excited only those which needed to be excited.
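A minimal sketch of this scanning logic in C is given below (hypothetical names and a tiny grid); each row of the traversal corresponds to one scan line, every pixel is visited, and only the pixels whose stored value says so are excited.

    #include <stdbool.h>
    #include <stdio.h>

    #define COLS 8
    #define ROWS 4

    /* Frame buffer: true means "this pixel should be excited". */
    static bool frame[ROWS][COLS];

    static void excite_pixel(int col, int row)
    {
        printf("excite (%d, %d)\n", col, row);
    }

    /* Raster scan: left to right within a row, rows taken top to bottom. */
    static void raster_scan(void)
    {
        for (int row = 0; row < ROWS; row++)        /* each row is one scan line  */
            for (int col = 0; col < COLS; col++)    /* visit every pixel in order */
                if (frame[row][col])                /* check the stored value     */
                    excite_pixel(col, row);         /* excite it, else move on    */
    }

    int main(void)
    {
        frame[1][1] = frame[1][2] = true;           /* pixels of a rough "line"   */
        frame[2][3] = frame[2][4] = true;
        raster_scan();
        return 0;
    }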
(Refer Slide Time: 21:42)
Now, the video memory of a raster scan system is more commonly known as the frame buffer, where each location corresponds to a pixel. So the size of the frame buffer is equal to the screen resolution, the size of the pixel grid, which is quite obvious of course.
(Refer Slide Time: 22:07)
Now, there is one interesting fact you should be aware of. Display processors are typically very fast; they work at the speed of the CPU, that is, at the nanosecond scale, so any operation is done in very little time, at the nanosecond level. On the other hand, video controllers are typically much, much slower compared to display controllers, because they involve electromechanical arrangements which take time to work.
Their typical speed is at the millisecond level or millisecond scale. Clearly, there is a mismatch between the speed at which the display processor can produce output and the speed at which the video controller can take that output as input.
(Refer Slide Time: 23:15)
Now, assume that there is only one video memory or frame buffer, and the display controller output is fed directly as input to the video controller through that frame buffer. The output is being produced very fast, but it is being consumed at a much lower rate, so the output may get overwritten before the entire output is taken by the video controller as input, which in turn may result in the image getting distorted, because before the current input is processed the next input is ready and has overwritten the current input. To address this concern, we use multiple frame buffers.
(Refer Slide Time: 24:14)
A single buffer is not sufficient; we require at least two buffers, and if two buffers are used it is called double buffering. Of course, there are cases with more than two buffers. In the case of double buffering, one buffer or video memory is called the primary and the other is called the secondary. The video controller takes input from one of the buffers, typically the primary buffer, whereas the display controller fills up the other, the secondary buffer. When the video controller finishes reading input from the primary buffer, the primary becomes the secondary and the secondary becomes the primary; a role reversal takes place and the process repeats. In this way the problem of overwriting the image information can be avoided.
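The following is a minimal sketch of this role reversal in C (hypothetical names; real drivers synchronize the swap with the display refresh): the display controller always writes into the back buffer, the video controller always reads from the front buffer, and swapping is just an exchange of two pointers.

    #include <string.h>

    #define WIDTH  8
    #define HEIGHT 4

    typedef unsigned char Frame[HEIGHT][WIDTH];

    static Frame buffer_a, buffer_b;
    static Frame *front = &buffer_a;   /* read by the video controller (primary)      */
    static Frame *back  = &buffer_b;   /* filled by the display controller (secondary) */

    /* Display controller side: draw the next image into the back buffer. */
    static void render_next_frame(void)
    {
        memset(*back, 0, sizeof(Frame));
        (*back)[1][2] = 255;           /* "draw" something into the back buffer */
    }

    /* Called when the video controller finishes scanning out the front buffer:
       the two buffers simply exchange roles. */
    static void swap_buffers(void)
    {
        Frame *tmp = front;
        front = back;
        back  = tmp;
    }

    int main(void)
    {
        render_next_frame();   /* fill secondary while primary is being displayed */
        swap_buffers();        /* role reversal; the process then repeats         */
        return 0;
    }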
(Refer Slide Time: 25:17)
Another interesting thing to note here is called refreshing. The light emitted from the pixel elements, which gives us the sensation of colour, starts decaying over time. It is not the case that the intensity of the emitted light remains the same throughout the display session; over time it starts decaying, so the intensity changes, which leads to fading of the scene after some time. Moreover, pixels in a scene may get excited at different points of time, thus the pixels may not fade in sync. In an image it is not guaranteed that all pixels fade in sync in a way that is not perceptible to the user, so this may lead to image distortion.
(Refer Slide Time: 26:29)
To avoid that situation, what is done is to keep exciting the pixels periodically, which is known as refreshing. Whatever the excitation value is, with that value there is a periodic excitation of the whole pixel grid; it is not a one-time activity. One important consideration here is the refresh rate, the rate at which we should keep refreshing the screen so that the changes are not perceptible to the human eye. The number of times a scene is refreshed per second is known as the refresh rate, which is expressed in Hz or Hertz, the usual unit of frequency. In the case of displays it is typically taken to be 60 Hertz, that is, the screen should be refreshed 60 times per second.
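As a quick back-of-the-envelope check (just an illustration, not from the lecture slides), the refresh period corresponding to a 60 Hz refresh rate can be computed as follows; it is the time budget within which the whole pixel grid must be re-excited.

    #include <stdio.h>

    int main(void)
    {
        double refresh_rate_hz = 60.0;                /* typical display refresh rate */
        double period_ms = 1000.0 / refresh_rate_hz;  /* time between two refreshes   */
        printf("At %.0f Hz the grid is re-excited every %.2f ms\n",
               refresh_rate_hz, period_ms);           /* prints 16.67 ms              */
        return 0;
    }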
(Refer Slide Time: 27:33)
So, what are the pros and cons of a raster scan device? Clearly, since we are not trying to excite pixels selectively, we do not require very high precision hardware. Scanning is a very straightforward job, so low-precision hardware can do the job. Also, it is good for generating complex images, since we are considering all the pixels anyway, so it will not lead to flicker, unlike a vector scan device.
(Refer Slide Time: 28:10)
Due to these benefits, one being low cost and the other the ability to generate complex images, most of the displays that we see around us are based on the raster scan concept. So, you get to see only, or mostly, raster scan devices around us, because they are low cost and good at generating complex images.
(Refer Slide Time: 28:43)
Now, these two, the vector scan device and the raster scan device, are from the point of view of hardware. There are two closely related terms, which you may have heard of, called vector graphics and raster graphics.
(Refer Slide Time: 28:58)
These two are not related to any hardware characteristics, unlike the previous terms vector scan and raster scan.
(Refer Slide Time: 29:10)
In the case of vector graphics, what we actually refer to is the way the image is represented. When we are talking of a vector graphics image, we are talking of the representation in terms of continuous geometric primitives such as lines and curves. So, if I say that a particular image is a vector graphics image, that means I am representing that image in terms of its constituent geometric primitives, lines and curves.
(Refer Slide Time: 29:50)
In case of raster graphics, the representation is different. As in a raster scan device, what we refer to is representing the image as the whole pixel grid, with the pixels that are supposed to be excited in an on state and the others in an off state. So if we represent an image as a raster graphics image, the image is stored in the form of the whole pixel grid, where it is indicated which pixels should be in the excited, or on, state.
(Refer Slide Time: 30:48)
But again it should be noted that vector graphics and raster graphics are terms that indicate the way images are represented; they have nothing to do with the underlying hardware. So even if I represent an image as vector graphics, I can still use a raster scan device to display that image, and vice versa: if I represent an image as raster graphics, I can still use a vector scan device to render it.
So we should always be clear about the distinction between these terms. Vector scan device and raster scan device are related to the way scanning takes place at the hardware level, whereas vector graphics and raster graphics refer to the way images are represented internally, rather than how they are rendered through the actual display hardware.
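As a small illustration of the difference in representation (the names here are made up for the example), the same straight line could be stored either way:

    /* Vector graphics: the line is stored as a geometric primitive,
       independent of any particular pixel grid. */
    struct line2d {
        float x0, y0;   /* one end point   */
        float x1, y1;   /* other end point */
    };

    /* Raster graphics: the whole pixel grid is stored, with pixels that
       should be excited marked on (1) and the rest off (0). */
    #define GRID_W 8
    #define GRID_H 8
    unsigned char raster_image[GRID_H][GRID_W];   /* 1 = on, 0 = off */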
85
(Refer Slide Time: 32:00)
Now let us discuss another important topic, that is, colour display. So far we have been implicitly assuming that the pixels are monochromatic, but in reality we get to see images that have colours, so how do they work? In a black and white display, each pixel may contain one type of element; for example, if you are aware of CRT or cathode ray tube displays and their internal mechanism, then you may know that each pixel on a CRT display has a single phosphor dot. When we excite it to generate different light intensities, the result is different shades of grey, because there is only a single phosphor dot.
86
(Refer Slide Time: 33:05)
Like the illustration shown here, this is for a CRT or cathode ray tube. Of course, nowadays it is very rare to see such displays, but for pedagogical purposes it is good to demonstrate the idea in terms of a CRT. The left side shows a typical CRT display and on the right side we can see how it works internally.
So it has a tube within which there are certain arrangements; these arrangements together constitute the video controller component of the generic system that we discussed earlier. We have the cathode, heater and anode arrangements, then a grid to control the electron flow, then vertical and horizontal deflection plates for deflecting the electron beam.
Essentially this arrangement generates a stream of electrons which hits a point on the screen, a pixel; on being hit, the phosphor dot generates an intensity which results in a particular shade of grey. That, in brief, is how CRTs work, and other displays work in a broadly similar, though not identical, way.
87
(Refer Slide Time: 34:44)
So what happens in case of a colour image? In that case each pixel contains more than one type of element. For a CRT, instead of having one phosphor dot we can have three types of phosphor dots representing the three primary colours, namely red, green and blue. When excited, each of these phosphor dots generates intensities of its primary colour: the red dot generates red intensities, the green dot generates green intensities and the blue dot generates blue intensities. When these intensities are combined, we get the sensation of the desired colour.
(Refer Slide Time: 35:44)
88
So, as I said, each element is capable of generating different shades of its colour, and when these shades combine they give us the desired sensation of colour. Schematically it looks somewhat like this figure, where we have three electron beams hitting the three elements separately. Special arrangements called masks are there to guide the electron beams so that they hit the specific group of phosphor dots making up a pixel, like the three shown here, and finally we get the combination of different shades as the desired colour.
(Refer Slide Time: 36:48)
Now there are two ways to generate these coloured images. Essentially, we want to have some values to guide the excitation of the individual types of elements in a colour display, and there are two ways to do that. One is direct coding: in this case, the individual colour information for each of the red, green and blue elements of a pixel is stored directly in the corresponding frame buffer location.
So in the frame buffer itself we are storing what the intensities of these individual colours should be. Clearly that requires a larger frame buffer compared to a black and white frame buffer, because now in each location we are storing three values instead of one, and the frame buffer should be capable of storing the entire combination of RGB values, which is also called the colour gamut. Later on we will learn more about colour gamuts, but the point to be noted here is that if we go for direct coding, then we require a large frame buffer.
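To get a rough sense of the storage involved (the numbers here are only an illustration, not figures from the lecture): with 8 bits for each of R, G and B, every pixel needs 3 bytes, so a 1920 x 1080 display needs about 1920 x 1080 x 3 bytes, roughly 6.2 MB, per frame buffer, about three times what a greyscale buffer with one byte per pixel would need, and double that again if double buffering is used.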
(Refer Slide Time: 38:25)
Another way is the colour lookup table, where we use a separate lookup table, which is of course a portion of memory, in which each entry contains a specific RGB combination, and the frame buffer location contains a pointer to the appropriate entry in the table. So the frame buffer does not store the colour values directly; instead it stores a reference to the table entry that holds the actual values, as illustrated in this figure. As you can see, this frame buffer location stores a pointer to this particular table entry, which stores the values of R, G and B, and these are the values used to excite the pixel accordingly.
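A minimal sketch of this indirection in C might look as follows; the table size, resolution and names are assumptions for illustration only.

    /* Colour lookup table scheme (illustrative). */
    struct rgb { unsigned char r, g, b; };

    #define TABLE_SIZE 256
    #define WIDTH      640
    #define HEIGHT     480

    struct rgb    clut[TABLE_SIZE];            /* each entry holds one RGB combination   */
    unsigned char frame_buffer[HEIGHT][WIDTH]; /* each entry is an index into the table  */

    /* The colour of a pixel is fetched indirectly through the table. */
    struct rgb pixel_colour(int x, int y)
    {
        return clut[frame_buffer[y][x]];
    }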
90
(Refer Slide Time: 39:19)
Now if we want the CLT, or colour lookup table, scheme to work, then we have to know the subset of colours that are going to be required in the generation of the images. The table cannot store all possible combinations of R, G and B values; it stores only a subset of those combinations, essentially a subset of the entire set, or colour gamut, and we must know that subset in advance to make the scheme work. If that assumption does not hold, this method is not going to work. Nowadays, however, we do not have any problem with the size of the frame buffer, because memory is cheap.
So nowadays almost all graphics systems go for the direct coding method, but in earlier generations of graphics systems, when memory was a significant factor in the overall cost, the CLT was much in use. In that period, of course, the screens were not equipped to display all sorts of complex images, and mostly wireframes were displayed. At that time CLTs were much more useful, but nowadays we do not need to bother about the CLT much, unless there is some specific application, and we can directly go for the direct coding method.
91
(Refer Slide Time: 40:56)
So let us summarise what we have learnt today. We got introduced to a basic graphics system, which consists of three unique components, namely the display controller, the video memory and the video controller. The display controller is tasked with generating the image, which is stored in the video memory and which is used by the video controller to render it on a computer screen.
We also learnt, in brief, about different types of graphics systems, namely vector scan devices and raster scan devices, and the associated concepts, namely vector graphics, raster graphics, refreshing, frame buffers and so on. We also got some idea of how colour images are generated at the hardware level.
These are basic concepts which will be useful in our subsequent discussions. In the next lecture we will get an introduction to the basic processing that is required to generate a 2D image, which is the job of the display controller. This processing actually consists of a set of stages which are collectively known as the graphics pipeline, so in the next lecture we will have an introduction to the overall pipeline.
92
(Refer Slide Time: 42:41)
The topics that I have covered today can be found in this book chapter 1, section 1.3 and you
are also advised to go through the details on the CRT or the cathode ray tube display that is
mentioned in this section, although I have not covered it here, for better understanding of the
topics. So we will meet again in the next lecture, thank you and goodbye.
93
Computer Graphics
Dr. Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 4
Introduction to 3D Graphics Pipeline
Hello and welcome to lecture number 4 in the course Computer Graphics.
(Refer Slide Time: 00:39)
Before we start, we will briefly recap what we have discussed in the previous lectures. We started with a basic introduction to the field, where we discussed the historical evolution as well as the issues and challenges encountered by workers in this field. This was followed by a basic introduction to the graphics system: whenever we talk about computer graphics, we implicitly refer to some hardware platform on which some software works.
The basic hardware structure, or architecture, of a graphics system was introduced in one of the previous lectures. Today we are going to introduce the other component of the graphics system, namely the graphics software. Of course, at this stage we will restrict ourselves to a basic introduction; the software stages will be discussed in detail in subsequent lectures.
94
(Refer Slide Time: 02:06)
So let us recap what we have learned about a generic architecture of a graphic system. As we
mentioned in one of our earlier lectures, so there are 3 unique components of a graphic system.
One is the display controller, one is the video memory and the 3rd one is a video controller.
What the display controller does? It essentially takes input from the host computer as well as
from some external input devices which are used to perform interactive graphics. And based on
that input, it creates a representation, a digital representation of a 2D image. That is the job of the
display controller.
Now that representation, which the controller generates, is stored in a memory called the video memory. The content of the video memory is given as input to the 3rd component, the video controller, which takes the memory content as input and then generates certain voltage levels to drive the electro-mechanical arrangements that are required to ultimately display the image on a computer screen.
As you may recollect, we also mentioned that most of these things are done separately, without involving the CPU of the host computer. Typically, computers come with a component called a graphics card, which probably all of you have heard of, and which contains the video memory, the video controller and the display controller components. The processing unit that is typically part of the display controller is known as the GPU, or graphics processing unit; this is separate from the CPU, the main processing unit of the host computer, and the GPU is designed to perform graphical operations.
(Refer Slide Time: 04:48)
Now, in this generic architecture, as we said, the display controller generates the representation of an image. What does that representation contain? It contains some color values, or intensity values, in a specific format, which are ultimately used to generate the particular sensation of color on the screen. From where are these color values obtained? Let us go into some details of the process involved in generating these color values.
96
(Refer Slide Time: 05:29)
Now these color values are obtained by the display processor through some computations that
are done in stages. So there are a series of computations and these computations ultimately result
in the generation of the color values.
(Refer Slide Time: 06:00)
Now these stages or the series of steps that are involved in the generation of color values are
together called the graphics pipeline. This is a very important terminology and in our subsequent
lectures we will discuss in details the stages of the pipeline, that actually will be the crux of this
course.
97
(Refer Slide Time: 06:30)
But today we are going to introduce the pipeline for our benefit, so that we can understand the
later discussion better. So let us get some introductory idea on the pipeline and its stages.
(Refer Slide Time: 06:47)
There are several stages, as I mentioned. The first stage is essentially defining the objects. When we talk of creating a scene or an image, it contains objects, and there needs to be some way to represent these objects in the computer. The activity where we define the objects that are going to be part of the image constitutes the first stage of the pipeline, which is called the object representation stage. For example, as you can see in this figure on the screen, we want to generate the image of a cube with color values as shown on the right hand part of the screen.
Now this image contains an object, a cube, and on the left hand side here we have defined this cube. When we talk of defining, what we mean, as we can understand intuitively, is that defining the cube involves specifying its vertices, or its edges as pairs of vertices, with respect to some reference frame. That is the definition in this simple case.
Of course a cube is a very simple object; for more complex objects we may require more complex definitions, more complex ways of representing the objects.
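As a small sketch of what such a definition might look like in code (the coordinates and the vertex numbering are my own choices for illustration, not the figure's), a unit cube can be written down as eight vertices and twelve edges given as pairs of vertex indices:

    /* A unit cube in its local reference frame (illustrative). */
    float cube_vertices[8][3] = {
        {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0},   /* back face  */
        {0, 0, 1}, {1, 0, 1}, {1, 1, 1}, {0, 1, 1}    /* front face */
    };

    int cube_edges[12][2] = {
        {0, 1}, {1, 2}, {2, 3}, {3, 0},   /* back face edges           */
        {4, 5}, {5, 6}, {6, 7}, {7, 4},   /* front face edges          */
        {0, 4}, {1, 5}, {2, 6}, {3, 7}    /* edges joining the faces   */
    };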
(Refer Slide Time: 08:53)
Accordingly, several representation techniques are available for efficient creation and manipulation of images. Note the term efficient: when we use this term, we refer to the fact that displays are different and the underlying hardware platforms are different. The computational resources we have to display something on a desktop or a laptop are likely to be different from those we have on a small mobile device or a wearable device screen.
Accordingly, our representation techniques should be able to utilize the available resources to the extent possible and should allow the users to manipulate images in an interactive setting. So the efficiency is essentially with respect to the available computing resources and the way to make optimum use of those resources.
(Refer Slide Time: 10:32)
Once we define those objects, they are passed through the subsequent pipeline stages to generate and render images on the screen. So the first stage is defining the objects, and in the subsequent stages we take these object definitions as input, generate the image representation and render it on the screen.
100
(Refer Slide Time: 11:00)
What are those subsequent stages? The first one is modeling transformation, which is the 2nd stage of the pipeline. As I said, when we are defining an object we are considering some reference frame with respect to which we define it. For example, take the cube that we have seen earlier: to define the cube, we need to define its coordinates, but coordinates with respect to what? There we assume certain reference frames.
Now those reference frames with respect to which the objects are defined are more popularly
called local coordinate of the object. So the objects are typically defined in their own or local
coordinate system. Now multiple objects are put together to create a scene, so each object is
defined in its own or local coordinate system and when we are combining them we are
essentially trying to combine these different reference frames.
By combining those different objects, we are creating a new assembly of objects in a new reference frame, which is typically called the world coordinate system. Take the example shown in this figure. As you can see, there are many objects: cubes, spheres, cylinders and others. Each of these objects is defined in its own coordinate system.
Now in the whole scene, consisting of all the objects, we have assembled all those objects from their own coordinate systems. But here again we are assuming another coordinate system in terms of which this assembly of objects is defined; the coordinate system in which we have assembled them is called the world coordinate system. So there is a transformation, transforming an object from its own coordinate system to the world coordinate system. That transformation is called modeling transformation, which is the 2nd stage of the graphics pipeline.
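As a sketch of what this stage does computationally, a point given in an object's local coordinates can be carried into world coordinates by multiplying it with a 4 x 4 modeling matrix (using homogeneous coordinates, which the course develops in the transformation lectures); the function below is only an illustrative sketch, not a library routine.

    /* Apply a 4x4 modeling transform M to a local-coordinate point. */
    void local_to_world(const float M[4][4], const float local[3], float world[3])
    {
        float p[4] = { local[0], local[1], local[2], 1.0f };   /* homogeneous point */
        float q[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                q[i] += M[i][j] * p[j];                        /* row times column  */

        world[0] = q[0];   /* assuming the last row of M is (0, 0, 0, 1),  */
        world[1] = q[1];   /* so q[3] stays 1 and no division is needed    */
        world[2] = q[2];
    }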
(Refer Slide Time: 13:58)
So in the first stage we define the objects, in the second stage we bring those objects together in
the world coordinate system through modeling transformation which is also sometime known as
the geometric transformation. So both the terms are used either modeling transformation or
geometric transformation that is the 2nd stage of the graphics pipeline.
102
(Refer Slide Time: 14:14)
Once the scene is constructed, the objects need to be assigned colors, which is done in the 3rd stage of the pipeline, called the lighting or illumination stage. Take for example the images shown here: in the left figure we have simply the object, in the right figure we have applied colors on the object surfaces. As you can see, from the way we have applied colors it becomes clear which surface is closer to the viewer and which surface is further away.
In other words, it gives us a sensation of 3D, whereas without colors, as in the figure shown here, that clarity is not there. So to get a realistic image which gives us a sensation of 3D, we have to assign colors. Assignment of colors is the job of the 3rd stage, which is called the lighting or illumination stage.
103
(Refer Slide Time: 15:36)
As you are probably aware, color is a psychological phenomenon, and it is linked to the way light behaves, in other words, to the laws of optics. What do we do in the 3rd stage? We essentially try to mimic these optical laws, that is, the way we perceive color in the real world, and based on that we assign colors in the synthesized scenes.
(Refer Slide Time: 16:17)
So first we define an object, in the 2nd stage we bring objects together to create a scene, and in the 3rd stage we assign colors to the object surfaces in the scene. Up to this point, everything we were doing was in a 3D setting, in the world coordinate system. But when we get to see an image, the computer screen is 2D, so essentially what we require is a mapping from the 3D world coordinate scene to the 2D computer screen. That mapping is done in the 4th stage, the viewing transformation.
In this stage we perform several activities, which together are similar to taking a photograph. Consider yourself to be a photographer: you have a camera and you are capturing a photo of a scene. What do you do? You place the camera near your eye, focus on some object you want to capture, capture it with the camera, and then see it on the camera display or screen, if you have a digital camera.
(Refer Slide Time: 18:01)
This process of taking a photograph can be mathematically analyzed into several intermediate operations, which in themselves form a pipeline, a pipeline within the broader graphics pipeline. So the 4th stage, viewing transformation, is itself a pipeline which is part of the overall graphics pipeline. This pipeline, where we transform a 3D world coordinate scene to a 2D view plane scene, is called the viewing pipeline.
105
(Refer Slide Time: 18:50)
What do we do in this pipeline? We first set up a camera coordinate system, which is also referred to as the view coordinate system. Then the world coordinate scene is transformed to the view coordinate system; this stage is called viewing transformation. So we have set up a new coordinate system, the camera coordinate system, and then we transform the world coordinate scene to the camera coordinate scene.
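As a rough sketch of what setting up the camera frame involves (following the common look-at construction; the details here are an assumption for illustration, not something derived in this lecture), the camera position, a look-at point and an up direction give three orthonormal axes, and a world point can then be expressed in that frame:

    #include <math.h>

    typedef struct { float x, y, z; } vec3;

    static vec3  sub(vec3 a, vec3 b)   { return (vec3){ a.x - b.x, a.y - b.y, a.z - b.z }; }
    static float dot(vec3 a, vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
    static vec3  cross(vec3 a, vec3 b) {
        return (vec3){ a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    }
    static vec3  norm(vec3 a) { float l = sqrtf(dot(a, a)); return (vec3){ a.x / l, a.y / l, a.z / l }; }

    /* Express a world-coordinate point p in the camera (view) frame. */
    vec3 world_to_view(vec3 p, vec3 eye, vec3 at, vec3 up)
    {
        vec3 n = norm(sub(eye, at));    /* axis pointing away from the scene */
        vec3 u = norm(cross(up, n));    /* camera "right" axis               */
        vec3 v = cross(n, u);           /* camera "up" axis                  */
        vec3 d = sub(p, eye);           /* point relative to camera position */
        return (vec3){ dot(d, u), dot(d, v), dot(d, n) };
    }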
106
(Refer Slide Time: 19:30)
From there we make another transformation, now we transfer the scene to a 2D view plane. Now
this stage is called projection transformation. So we have viewing transformation followed by
projection transformation.
(Refer Slide Time: 19:49)
For projection, we define a region in the view coordinate space which is called the view volume. For example, in the figure shown here, the frustum defines a view volume. We want to capture the objects that are present within this volume; objects outside it we do not want to capture. That is typically what we do when we take a photograph: we select some region of the scene and then capture it. So whichever objects are outside will not be projected, and whichever are inside the volume will be projected.
(Refer Slide Time: 20:48)
So here we require one additional process, a process to remove objects that are outside the view
volume. Now those objects can be fully outside or can be partially outside. So in both the cases
we need to remove them. So when an object is fully outside we completely remove it and when
an object is partially outside we clip the object and keep only the part that is within the view
volume, the outside part we remove. The overall process is called clipping.
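As a minimal illustration of the idea (only for a single point and a box-shaped view volume; clipping lines and polygons against a frustum is more involved and is covered in the clipping lectures later), a point is kept only if it lies within the volume's bounds, which are assumed parameters here:

    #include <stdbool.h>

    /* Keep a point only if it lies inside an axis-aligned view volume. */
    bool inside_view_volume(float x, float y, float z,
                            float xmin, float xmax,
                            float ymin, float ymax,
                            float zmin, float zmax)
    {
        return x >= xmin && x <= xmax &&
               y >= ymin && y <= ymax &&
               z >= zmin && z <= zmax;
    }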
108
(Refer Slide Time: 21:22)
Also, when we are projecting, we consider a viewer position, that is, where the photographer is situated and in which direction he or she is looking. Based on that position, some objects may appear fully visible, some may appear partially visible, whereas other objects will be invisible, even though all of them may be within the same view volume.
For example, with respect to this particular view position, if this object is fully behind that object then it will be invisible, if it is partially behind then it will be partially visible, and if the two are not aligned along the same direction then both of them will be fully visible. We take care of this fact also before projection, which requires some further operations and computations.
109
(Refer Slide Time: 22:32)
So to capture this viewing effect, the operations that we perform are typically called hidden
surface removal operations or similarly visible surface detection operations. So to generate
realistic viewing effect along with clipping what we do is we perform the hidden surface removal
or visible surface detection operations.
(Refer Slide Time: 23:06)
So after the clipping and hidden surface removal operations, we project the scene on the view plane, which is a plane defined in the view coordinate system.
110
(Refer Slide Time: 23:21)
Now, there is one more transformation. Suppose, in the right hand figure, this is the object projected here on the view plane. The object may be displayed on any portion of the computer screen; it need not be at exactly the same position as on the view plane. For example, this object may be displayed in a corner of the display. So we differentiate between two concepts here: one is the region on the view plane, which is typically called the window, and the other is the display region on the actual display screen, which we call the viewport. So one more transformation remains in the viewing pipeline, that is, transferring the content from the window to the viewport. This is called the window-to-viewport transformation.
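The mapping itself is just a scale followed by a translation; here is a sketch for a single point, where the window extents (wxmin..wymax) and viewport extents (vxmin..vymax) are taken as given parameters.

    /* Map a point (xw, yw) from the window to the viewport. */
    void window_to_viewport(float xw, float yw,
                            float wxmin, float wxmax, float wymin, float wymax,
                            float vxmin, float vxmax, float vymin, float vymax,
                            float *xv, float *yv)
    {
        float sx = (vxmax - vxmin) / (wxmax - wxmin);   /* horizontal scale factor */
        float sy = (vymax - vymin) / (wymax - wymin);   /* vertical scale factor   */
        *xv = vxmin + (xw - wxmin) * sx;
        *yv = vymin + (yw - wymin) * sy;
    }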
111
(Refer Slide Time: 24:44)
So, in summary, in the 4th stage there are 3 transformations. What are they? First we transform the world coordinate scene to the camera, or view, coordinate scene. Then, from the camera coordinate scene, we perform the projection transformation to the view plane, and then the view plane window is transformed to the viewport. These are the 3 transformations.
Along with those, there are 2 major operations that we perform here: one is clipping, that is, clipping out the objects that lie outside the view volume, and the other is hidden surface removal, which means creating a realistic viewing effect with respect to the viewer position. That is the 4th stage.
So, in the first stage we defined objects, in the 2nd stage we combined those objects into the world coordinate scene, in the 3rd stage we assigned colors to the object surfaces in the world coordinate scene, and in the 4th stage we transformed the world coordinate scene to the image on the viewport through a series of transformations which form a sub-pipeline within the overall pipeline.
Those sub-pipeline stages are viewing transformation, projection transformation and window-to-viewport transformation. This sub-pipeline is called the viewing pipeline, which is part of the overall graphics pipeline, and in the 4th stage, along with this viewing pipeline, we also have two more operations performed, namely clipping and hidden surface removal.
112
(Refer Slide Time: 27:17)
One more stage remains, the 5th stage, which is called scan conversion or rendering. We mentioned earlier that we transform to a viewport, and the viewport is an abstract representation of the actual display. If you recollect our discussion on raster displays, the actual display contains a pixel grid.
So the display contains locations which are discrete; we cannot assume that any arbitrary point has a corresponding point on the screen. For example, if in our image we have a vertex at location (1.5, 2.5), on the screen we cannot have such a location, because on screen we only have integer coordinate values due to the discrete nature of the grid. We have pixels located at, say, (1, 1), (1, 2), (2, 2) or (3, 3), rather than at real-valued positions like (1.5, 2.5).
So if we get a vertex in our image located at (1.5, 2.5), we must map it to integer coordinates. The stage where we perform this mapping is called the scan conversion stage, which is the 5th and final stage of the pipeline. For example, consider the line shown here, with end points (2, 2) and (7, 5). The intermediate points on the line may not have integer coordinate values, but in the final display we can have pixels (the circles shown) only at integer coordinate values.
So we have to map these non-integer coordinates to integer coordinates. That mapping is the job of the 5th, scan conversion, stage, which is also called rasterization. As you can see, it may lead to some distortion, because due to the mapping we may not get the exact points on the line; instead we have to satisfy ourselves with approximate points that lie close to the actual line. For example, this pixel here, or this one, is not exactly on the line but is the closest possible pixel with respect to the line.
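As a naive sketch of this mapping (a simple DDA-style approach; more efficient algorithms are discussed in the scan conversion lectures later in the course, and set_pixel here is a hypothetical placeholder), the line from (2, 2) to (7, 5) can be approximated by sampling the ideal line and rounding to the nearest pixel:

    #include <math.h>
    #include <stdlib.h>

    static void set_pixel(int x, int y) { (void)x; (void)y; /* hypothetical: excite pixel (x, y) */ }

    /* Approximate the line (x0, y0)-(x1, y1) with pixels by rounding. */
    void scan_convert_line(int x0, int y0, int x1, int y1)
    {
        int steps = abs(x1 - x0) > abs(y1 - y0) ? abs(x1 - x0) : abs(y1 - y0);
        if (steps == 0) { set_pixel(x0, y0); return; }

        float dx = (x1 - x0) / (float)steps;
        float dy = (y1 - y0) / (float)steps;
        float x = (float)x0, y = (float)y0;

        for (int i = 0; i <= steps; ++i) {
            set_pixel((int)lroundf(x), (int)lroundf(y));   /* nearest pixel to the ideal point */
            x += dx;
            y += dy;
        }
    }

    /* scan_convert_line(2, 2, 7, 5) lights pixels close to the ideal line. */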
(Refer Slide Time: 30:19)
So what is the concern? How do we minimize the distortion? This distortion has a technical name, the aliasing effect; where this name originated we will discuss later. Our concern is to eliminate or reduce the aliasing effect to the extent possible, so that we do not get to perceive too much distortion. To address this concern, several techniques are used, which are called anti-aliasing techniques. These are used to make the image look as smooth as possible, that is, to reduce the effect of aliasing.
114
(Refer Slide Time: 31:21)
So let us summarize what we have discussed so far. The graphics pipeline contains 5 stages: the 1st stage is object representation, the 2nd stage is modeling transformation, the 3rd stage is assigning colors, or lighting, the 4th stage is the viewing pipeline, which is itself a sub-pipeline involving viewing transformation, clipping, hidden surface removal, projection transformation and window-to-viewport transformation, and the 5th and final stage is scan conversion. So there are broadly 5 stages involved.
Each of these stages has its own reference frame, its own coordinate system. In stage 1 we deal with the local coordinate systems of the objects; in stage 2 we deal with the world coordinate system, so stage 2 essentially transforms from local to world coordinates. In stage 3 we again deal with world coordinates: when we are assigning color, we are essentially assuming that the objects are defined in the world coordinate system.
In stage 4, again, different coordinate systems are used. The first transformation, viewing transformation, involves a transformation from the world coordinate system to the view, or camera, coordinate system. Clipping is performed in the view coordinate system, and hidden surface removal is also performed in the view coordinate system. Then we perform the projection transformation, which transforms the content of the 3D view coordinate system to a 2D view coordinate system.
115
So from the 3D view coordinate system we transfer the content to a 2D view coordinate system. In the window-to-viewport transformation, we transfer from this 2D view coordinate system to the device coordinate system. And finally, in the 5th stage, we transfer from the device coordinate system to the actual screen coordinate system. Note that the device coordinate system is an abstract, intermediate representation, whereas the screen coordinate system is the actual pixel grid; device coordinates contain continuous values, whereas screen coordinates contain only discrete values in the form of a grid. This, in summary, is the graphics pipeline.
So the display controller actually performs all these stages to finally get the intensity values to be stored in the frame buffer, or video memory. These stages are performed through software, of course with suitable hardware support.
For a programmer of a graphics system, it is not necessary to learn the intricate details of all these stages; they involve lots of theoretical concepts and models. If a graphics programmer gets bogged down with all this theory and these models, then most of the time will be consumed in understanding the theory rather than in actually developing the system. To address this concern, what is done is essentially the development of libraries, graphics libraries.
(Refer Slide Time: 35:17)
So there is this theoretical background involved in generating a 2D image. The programmer need not always implement the stages of the pipeline from this theoretical knowledge; that would of course be too much effort, and a major portion of the development effort would go into understanding and implementing the theoretical stages.
(Refer Slide Time: 35:52)
Instead, the programmer can use what are called application programming interfaces, or APIs, provided by graphics libraries, where these stages are already implemented in the form of various functions; the developer can simply call those functions with arguments in their program to perform certain graphical tasks. There are many such libraries available. Very popular ones are mentioned here: OpenGL, an open source graphics library which is widely used, and DirectX by Microsoft. There are many other commercial libraries available which are proprietary, but OpenGL, being open source, is widely accessible and useful in many situations.
117
(Refer Slide Time: 37:00)
Now what do these libraries contain? They contain predefined sets of functions which, when invoked with appropriate arguments, perform specific tasks. So the programmer need not know every detail of the underlying hardware platform, namely the processor, memory and OS, to build an application.
(Refer Slide Time: 37:29)
For example, suppose we want to assign colors to an object we have modelled. Do we need to actually implement the optical laws to perform the coloring? Note that such an implementation would also require knowledge of the processors available, the memory available and so on. Instead of having that knowledge, we can simply use the function glColor3f with arguments r, g, b. This function is defined in OpenGL, the open graphics library, and assigns a color to a 3D point.
So here we do not need to know details such as how color is defined in the system, how such information is stored in memory and accessed, how the operating system manages the call, which processor (CPU or GPU) handles the task and so on. All these complicated details can be avoided, and the programmer can simply use this function to assign color. We will come back to these OpenGL functions later in the course, where we will introduce OpenGL.
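For instance, a tiny fragment of legacy (immediate-mode) OpenGL using glColor3f might look like the following; the triangle coordinates are placeholders, and the usual window/context setup (for example via GLUT) is assumed and not shown.

    #include <GL/gl.h>

    /* Draw one red triangle using the fixed-function pipeline. */
    void draw_red_triangle(void)
    {
        glColor3f(1.0f, 0.0f, 0.0f);        /* set the current colour to red (r, g, b) */
        glBegin(GL_TRIANGLES);
        glVertex3f(-0.5f, -0.5f, 0.0f);
        glVertex3f( 0.5f, -0.5f, 0.0f);
        glVertex3f( 0.0f,  0.5f, 0.0f);
        glEnd();
    }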
(Refer Slide Time: 38:57)
Graphics applications such as painting systems, which probably all of you are familiar with, CAD tools that we mentioned in our introductory lectures, video games and animations are all developed using these functions. So it is important to have an understanding of these libraries if you want to make your life simpler as a graphics programmer. We will come back to these library functions later and discuss in detail some functions popularly used in OpenGL.
In summary, today we have got some idea of the 3D graphics pipeline and also an introductory idea of graphics libraries. In subsequent portions of the course, we will discuss all the stages in detail, as well as some more details of the graphics libraries and the graphics hardware.
(Refer Slide Time: 40:26)
That is all for today, whatever I have discussed today can be found in chapter 1 of the book
mentioned here. You are advised to refer to section 1.4 and 1.5. Thank you and good bye.
120
Computer Graphics
Doctor Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 5
Introduction and Overview on Object Representation Techniques
Hello and welcome to lecture number five in the course Computer Graphics. In the earlier lectures, we got introduced to the field, where we discussed a few things, namely the historical evolution of the field and the issues, challenges and applications in the field. We also got introduced to the basic idea of graphics software, namely the 3D graphics pipeline and the graphics libraries. Today we will start our discussion on the pipeline stages.
(Refer Slide Time: 1:12)
Let us recap what we have learned about graphics pipeline, as you may recollect there are
broadly, 5 stages of the pipeline. In the first stage we have object representation, in other words,
in this stage, what we do, we essentially try to represent the objects that will constitute the scene.
Now the objects that are represented are defined in their local coordinate system. In the second
stage we combine these objects together to form a scene. So, that stage is called modelling
transformation stage. And here what we do we essentially perform a transformation from the
local or object coordinate system to the world coordinate system.
At the end of it, we get a scene in the world coordinate system. In the third stage, we assign colours to the object surface points; colour assignment takes place in the world coordinate system. In the fourth stage, we make a series of transformations as well as some other operations. We transfer the objects from the world coordinate system to a view coordinate system through a transformation called viewing transformation, which is essentially a transformation from the world to the view coordinate reference.
Now, after doing that, we perform an operation called clipping, which we do in the view
coordinate space. This is followed by another operation called hidden surface removal, which
again takes place in the view coordinate space. After that, we perform a transformation from 3D
view coordinate system to 2D view coordinate system. This transformation is called projection
transformation.
And finally, we perform yet another transformation that is window to view port transformation,
where we transfer the content from 2D view coordinate system to device coordinate system. So,
all these transformations and operations together constitute the fourth stage, which is called
viewing pipeline stage.
And then there is a final stage called scan conversion or rendering, which is the fifth stage. In this stage, we render the scene in the device coordinate system to an image in the screen coordinate system, so that transformation takes place in this last and final phase of the pipeline. Among these five stages, today we will start our discussion on the first stage, that is, object representation.
(Refer Slide Time: 4:42)
122
As we all know, or can probably guess, in a synthesized image, where we perform the synthesis using a computer, we are likely to deal with objects of widely varying shapes and sizes. We may deal with tiny snowflakes to create a scene, or with complex animation characters to create a movie or animation, and with all possible shapes and sizes of objects in between.
As you can understand, a snowflake, for example, is not a simple object; it has an elegant shape, and unless we reproduce that elegant shape we will not be able to produce a realistic image or scene. So ideally a snowflake should not be represented with a simple sphere. Similarly, an animated character needs to be depicted with its associated complexities so that it looks realistic; we should not try to simplify it using, say, simple polygons or simple geometric shapes.
(Refer Slide Time: 6:22)
Now, to generate a scene with all these disparate objects, what we need, we need some way to
represent them so that computers can understand and process those objects. And as I said before,
any representation will not work. So, we cannot represent a snowflake with a sphere that will
reduce the realistic feel of the generated image.
123
(Refer Slide Time: 6:50)
Now, there are two fundamental questions related to object representation; how can we represent
different objects with their characteristic complexities, so that those can be rendered realistically
in a synthesized environment? So, what it tells us that we need to represent different objects,
preserving their inherent complexities so that when they are rendered on the screen, we get the
feeling of realism in the synthesized image.
(Refer Slide Time: 7:34)
The other question is: how can we have a representation that makes the process of rendering efficient? In other words, can we perform the operations of the different stages of the pipeline in ways that optimize space and time complexities? So one question deals with creating realistic effects, and as we discussed in our introductory lectures, creating realistic effects involves lots of computation and lots of data processing, requiring storage.
So, the other fundamental question related to object representation stems from the fact that we have limited computing resources available, involving storage and processors. We may want to use very complex representations to create more realistic effects, but will our available resources support such representations, which will be used in subsequent stages of the pipeline? That also we need to keep in mind while going for a particular representation. So there is a trade-off between realism and available resources, and we have to balance it.
(Refer Slide Time: 9:37)
Now, in order to balance this trade off, a plethora of techniques have been developed to represent
objects and today we will go through some of those techniques in brief and the details will be
discussed in subsequent lectures.
125
(Refer Slide Time: 9:40)
All these techniques we can categorize in broadly four types, first one is point sample
representation. Second one is boundary representation. Third one is space partitioning. And the
fourth one is sweep representation. So we have a large number of techniques and all these
techniques we can categorize into four types; point sample, boundary representation, space
partitioning and sweep representation.
(Refer Slide Time: 10:22)
Let us start with the first category, point sample representation. To create a 3D scene, we can first capture raw data such as the color, surface normal and depth information of different points in the scene. How can we capture those? We can use various devices such as a 3D range scanner or range finder, or use a simple camera together with computer vision techniques. Using those devices and techniques, we can capture raw information about a 3D scene, namely the color, surface normal and depth information at different points in the scene.
(Refer Slide Time: 11:18)
Since we have already got this information, we do not need to compute it and can directly render these points on a screen, processing them subsequently to generate the scene. So here the focus is on capturing the information rather than computing the values. Then what is the representation?
The representation is a set of raw data points; for each data point we have captured some values, like color, depth and the surface normal vector. These are its attributes. So our representation involves the set of data points as well as some attribute values for each, and that is called point sample representation. Essentially we are representing the 3D scene in terms of points sampled at different locations.
127
(Refer Slide Time: 12:22)
The next one is boundary representation. This is a set of techniques that represent an object by representing its individual surfaces, and these surfaces can be polygonal or curved. For example, see the figure here: on the left hand side we see six surfaces, named A, B, C, D, E and F. These surfaces define the cube shown on the right hand side of the image. So we are representing the cube, which is an object, in terms of these surfaces, the interlinked rectangles A to F. That is boundary representation: we are representing the cube in terms of its bounding, or boundary, surfaces.
(Refer Slide Time: 13:34)
128
There are other techniques in which we do not represent objects in terms of boundaries; instead, we use the 3D space occupied by the object to represent it. We divide the space into several disjoint, or non-overlapping, regions, and the division is done in such a way that any point inside the object lies in exactly one of the regions.
(Refer Slide Time: 14:20)
When we represent objects in this manner, we are essentially representing them in terms of the space occupied by the object rather than its bounding surfaces, so the techniques where such approaches are used are called space partitioning methods or representations. Such representations are often created in a hierarchical way: the space occupied by the object is divided into sub-regions, and this division is applied recursively to each sub-region until we arrive at some predefined sub-region size.
129
(Refer Slide Time: 15:10)
Now, these hierarchical representations can be depicted in different ways. A common way is to
form a tree or to show the representation in the form of a tree, which is often called space
partitioning trees. That is one common way of representing an object in terms of the space
occupied by it.
(Refer Slide Time: 15:42)
And finally, we have sweep representation. There are two sweep representation techniques which are widely used in graphics: one is the sweep surface representation and the other is the surface of revolution representation.
130
(Refer Slide Time: 16:02)
Let us try to understand the sweep surface representation. In this type of representation, 3D surfaces are obtained by traversing an entity such as a point, line, polygon or curve along a path in space in a specified manner. For example, look at the figure here: we have this rectangle, and the rectangle is moved along a specific trajectory to create the overall object of interest.
So the object here is the entire thing created by moving the rectangle along the specified path. In this type of representation we are not representing the object in terms of its bounding surfaces or the space occupied by it; rather, we are representing it in terms of a process, where the input is a primitive entity, the path it follows and the way that path is to be traversed. Such representations are called sweep surfaces.
131
(Refer Slide Time: 17:36)
In a similar way there is another representation called surface of revolution. As the name suggests, here we define a 2D entity which rotates around an axis, and we also specify the axis; the resulting object is what we are interested in. For example, if we again consider, say, the rectangle, and this is the path of rotation around the x axis, then we get this overall object, which is our desired object.
So here again we are not specifying the object in terms of its bounding surfaces or the space occupied by it, but in terms of a primitive object, in this case the rectangle, together with the axis and the direction of revolution. Such representations are known as surfaces of revolution.
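A small sketch of how such a representation can be turned into surface points is shown below; the profile format, the choice of the x axis and the sampling counts are all assumptions for illustration.

    #include <math.h>

    /* Sample a surface of revolution about the x axis.
       profile[i] = (x, r): the 2D curve, with r the distance from the axis.
       n points on the profile, m rotation steps; out receives n*m 3D points. */
    void surface_of_revolution(const float profile[][2], int n, int m, float out[][3])
    {
        const float two_pi = 6.28318530718f;
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < m; ++j) {
                float theta = two_pi * (float)j / (float)m;   /* rotation angle */
                out[i * m + j][0] = profile[i][0];            /* x stays fixed  */
                out[i * m + j][1] = profile[i][1] * cosf(theta);
                out[i * m + j][2] = profile[i][1] * sinf(theta);
            }
        }
    }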
132
(Refer Slide Time: 19:02)
Whatever we have discussed so far are the broad categories.
(Refer Slide Time: 19:18)
Now, some of these categories have subcategories as well. For example, boundary representation
techniques have three types of sub techniques, one is mesh representation, which is most
common. We have parametric representation and the third one is implicit representation.
133
(Refer Slide Time: 19:35)
In case of space partitioning methods, there are again several sub-techniques, or subcategories, such as octree methods, BSP trees and constructive solid geometry, or CSG. In subsequent lectures we will go through these subcategories in more detail.
(Refer Slide Time: 20:02)
Now, apart from these broad categories and subcategories, there are some other techniques which do not fall into any of them and are categories in themselves. They are mainly application-specific or complex photorealistic object representations. Complex photorealistic object representation indicates that those techniques are used to represent realistic effects in a very complex way.
(Refer Slide Time: 20:36)
Let us see a few examples. There is one technique called fractal representation. For example, see this figure of a tree: if you look closely at each branch, you will see that the overall structure is replicated in it, and within each branch the sub-branches again replicate the tree structure. So it is a self-repeating structure, which is represented using fractal notation. Fractal representation is thus one useful representation, and in nature we get to see lots of objects that are actually self-repeating, for which fractal representation is very useful.
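A minimal sketch of how such a self-repeating structure can be generated recursively is given below; the branching angle, shrink factor and stopping depth are arbitrary choices, and emit_segment is a hypothetical output routine.

    #include <math.h>

    static void emit_segment(float x0, float y0, float x1, float y1)
    {
        (void)x0; (void)y0; (void)x1; (void)y1;   /* hypothetical: record or draw the segment */
    }

    /* Each branch spawns two smaller copies of the whole structure. */
    void branch(float x, float y, float angle, float length, int depth)
    {
        if (depth == 0)
            return;
        float x2 = x + length * cosf(angle);
        float y2 = y + length * sinf(angle);
        emit_segment(x, y, x2, y2);
        branch(x2, y2, angle + 0.5f, length * 0.7f, depth - 1);
        branch(x2, y2, angle - 0.5f, length * 0.7f, depth - 1);
    }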
Another advanced representation technique is particle system representation, where we try to simulate the actual physics. For example, if we want to create this waterfall in a very realistic way, then particle system representation would be more appropriate than any other representation we have discussed so far: we will be able to actually mimic the way the water flows due to gravity and falls from a higher position to a lower one, the collisions, and how the drops get dispersed. All these things can be captured using particle system representation.
The third technique is skeletal model representation. If we want to create a character like this, we can represent it using a skeletal form, shown in the left side figure. The skeletal form is defined in such a way that whenever the character moves, the movement of the skeleton is proportionate; that is, kinematic considerations are taken into account while defining a skeletal representation. These are only a few of the many possible representations which are actually used in the generation of realistic scenes.
(Refer Slide Time: 23:26)
So, in summary, what we can say is that we have a large number of techniques available to
represent 3D objects. Broadly, there are four techniques. One is point sample rendering where
instead of artificially trying to create objects shapes, we actually capture some values, namely
color, depth, surface normal at different points of a scene and then simply reproduce those on the
screen, that is point sample rendering.
Other technique is boundary representation, which has many subcategories like mesh
representation, parametric representation and implicit surface representations. In boundary
representation techniques we represent objects in terms of its bounding surfaces where these
bounding surfaces can be lines, curves, polygons, anything.
The third technique is the space partitioning method. Here, instead of representing an object in terms of its boundary, we represent the space occupied by the object, and typically we use some hierarchical representation in the form of trees, more popularly called space partitioning trees. There are many subcategories of such representations, namely the octree method, the BSP or binary space partitioning method, and the constructive solid geometry or CSG method.
136
The fourth technique is sweep representation, where we do not represent the whole object.
Instead, we represent the objects in terms of a primitive shape and some movement path or the
trajectory. There are two such sweep representation techniques available; sweep surface and
surface of revolution. One interesting point about this type of representation is that here the
representation itself contains an approach rather than objects.
The approach is how to move the primitive surface along a path or around an axis. Now, apart
from these broad four categories, there are other representations available which are application
specific, sometimes some specific techniques are also possible, namely scene graphs, skeletal
model and advanced modelling, namely fractals or particle systems.
Now, among these categories in subsequent lectures we will discuss in details these two
categories; boundary representation and space partitioning representation. In the boundary
representation techniques, we will discuss all these three subcategories in some details, whereas
in the space partitioning method we will discuss these three subcategories in some detail.
(Refer Slide Time: 27:06)
And in boundary representation we will learn about a specific representation technique, namely
the spline representation, which is very popular in representing complex shapes in graphics.
Whatever I have discussed today is just the introduction to object representation techniques,
various techniques that are available to represent object. Next, few lectures will be devoted to the
details of these techniques.
137
(Refer Slide Time: 27:43)
The content of today's lecture can be found in this book. You can have a look at chapter 2,
section 2.1. For the introduction part, however, as I said, there are some advanced methods
available for representing objects, which you will not find in this section. Instead, you may have
a look at Section 2.5. Although these advanced techniques we will not discuss any further details.
If you are interested, you may have a look at this section. So, that is all for today. Thank you.
And see you in the next lecture. Goodbye.
138
Computer Graphics
Doctor Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 6
Various Boundary Representation Techniques
Hello and welcome to lecture number six in the course, computer graphics.
(Refer Slide Time: 0:38)
So, we started our discussion on 3D object representation, which is the first stage of the graphics
pipeline.
139
(Refer Slide Time: 0:49)
To recap, let us see the pipeline again. There are 5 broad stages, as shown on this screen. The first stage is object representation, which we are currently discussing; the other stages, namely modelling transformation, lighting, the viewing pipeline and scan conversion, we will take up in subsequent lectures.
One point I would like to mention here is that although, in this course I will follow the pipeline
stages in the way shown here, in practice, it is not necessary to have this exact sequence. Some
stages may come after some other stages. For example, lighting may be done after viewing
pipeline or in between some of the transformations of viewing pipeline and so on. So, the
sequence that I am showing here need not be followed exactly during implementation of a
graphics system. This is just for our understanding of the stages involved and the sequence may
vary.
140
(Refer Slide Time: 2:18)
In the previous lecture, we got a general introduction to various object representation techniques.
(Refer Slide Time: 2:27)
What were those techniques that we discussed? One technique is point sample rendering, then
we have boundary representation technique, space partitioning techniques and sweep
representation technique. These are the 4 broad categories we mentioned, each of which has
subcategories boundary representation, has three subcategories; mesh representation, parametric
representation and implicit representation.
141
Space partitioning has three subcategories: octree representation, BSP representation and CSG representation. BSP stands for binary space partitioning, whereas CSG stands for constructive solid geometry. In sweep representation, we have two techniques: sweep surfaces and surfaces of revolution.
Apart from these 4 broad categories, we have other representations as well. Some are application specific, and there are some general advanced representation techniques, namely scene graphs, skeletal models and advanced modelling techniques. Among the advanced modelling techniques we have fractal representation, point sample rendering, particle systems and so on.
(Refer Slide Time: 3:56)
Today we shall discuss in detail one of those techniques, namely the boundary representation techniques. We have already seen that in boundary representation we represent an object in terms of its bounding surfaces, or the surfaces that constitute its boundary. Those surfaces can be simple polygons or complex curved surfaces.
142
(Refer Slide Time: 4:31)
There are several ways to represent these bounding surfaces. We mentioned three subcategories of representation: mesh representation, implicit representation and parametric forms. Today we will get an introductory idea of all three representation techniques.
(Refer Slide Time: 5:00)
Let us start with the mesh representation. This is the most basic technique for representing objects in a scene, where we use polygons to represent the surfaces. The polygons in turn are represented using vertex or edge lists that store information about all the vertices or edges of the surface and their relationships.
143
For example, consider the figure here: we are representing a cube in terms of its vertices v0, v1 and so on up to v7, so there are 8 vertices. And this is the representation, where we store the vertices with their coordinate values and some other values capturing the relationships. For example, the first row tells us that v0 is connected to v1, v3 and v5. Similarly, each vertex stores the other vertices it has a connection to. This is one representation; there can be other ways to represent it.
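In code, such a vertex list with connectivity can be sketched as two arrays: one with the coordinates of each vertex and one with, for each vertex, the indices of the vertices it shares an edge with. The unit-cube coordinates below are my own choice of a labelling consistent with the row described above (v0 connected to v1, v3 and v5), not the exact figure from the slides.

    /* Vertex list for a unit cube, plus per-vertex connectivity (illustrative). */
    float v[8][3] = {
        {0, 0, 0},  /* v0 */   {1, 0, 0},  /* v1 */
        {1, 1, 0},  /* v2 */   {0, 1, 0},  /* v3 */
        {1, 0, 1},  /* v4 */   {0, 0, 1},  /* v5 */
        {0, 1, 1},  /* v6 */   {1, 1, 1}   /* v7 */
    };

    /* For each vertex, the three vertices it is connected to by an edge. */
    int connected_to[8][3] = {
        {1, 3, 5},  /* v0 */   {0, 2, 4},  /* v1 */
        {1, 3, 7},  /* v2 */   {0, 2, 6},  /* v3 */
        {1, 5, 7},  /* v4 */   {0, 4, 6},  /* v5 */
        {3, 5, 7},  /* v6 */   {2, 4, 6}   /* v7 */
    };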
(Refer Slide Time: 6:32)
Sometimes the surfaces are not polygonal, but in mesh representation we can approximate anything with polygonal meshes, like the figure shown here. This hand does not actually contain any polygonal surface, but the hand surface can be approximated with triangular meshes of this type, where lots of triangles are used to approximate it. Again, these meshes are represented using vertex and edge lists.
144
(Refer Slide Time: 7:27)
In fact, the mesh representation is the most basic form of representation: any other representation that
we may use will ultimately be converted to a mesh representation at the end of the pipeline, before
the objects are rendered. So, we have to keep this in mind. Whatever representation we use,
and we will learn about many in subsequent discussions, at the end everything is converted to a mesh
representation.
(Refer Slide Time: 8:02)
Now there is one important issue. That is how many polygons should we use to approximate the
surfaces? That is a very fundamental question.
145
(Refer Slide Time: 8:14)
Because the more polygons we use, the better the approximation is; this is obvious. However,
more subdivision also implies more storage and computation. So, suppose we can use three triangles to
represent a surface, compared to using 30 triangles to represent the same surface; the latter
representation, of course, will give better visual clarity, better visual quality.
However, since we are increasing the number of polygons in the mesh, there will be a
corresponding increase in storage, because we now have to store vertices for 30 triangles
instead of 3, as well as in computation, because we have to perform the recursive
subdivision that creates this mesh a larger number of times compared to when we have fewer
triangles. So, creation of a mesh is computation intensive and storing the mesh
information is storage intensive, and if we increase the number of polygons, then both need to be taken into
account.
146
(Refer Slide Time: 9:44)
So, there is a trade-off, and what we need to do is optimize the space and time complexities while
keeping the quality of the representation acceptable. Now, how do we decide how to balance this
trade-off? The answer depends on the application and the resources available.
Depending on the resources and on what we need to render, we can choose the right
value for the number of subdivisions required, as well as the number of polygons we are
going to use to approximate a surface with a mesh. That is about mesh representation.
(Refer Slide Time: 10:37)
Next let us move to the other two representations, implicit and parametric representations.
147
(Refer Slide Time: 10:46)
Now, although we said that mesh representation is the most fundamental type of representation,
for a developer it is not necessarily a very convenient mode of representation. For
complex surfaces, first of all, it is very difficult to determine how many polygons should be used
to create a mesh. Secondly, it is very cumbersome to enumerate all the vertices of the mesh
if the number of polygons in the mesh, or the number of meshes that we are using, is large,
which is likely to be the case in any practical application. So, what is required is some
compromise, some way to help the developer define objects without bothering too much or
spending too much time on defining the meshes.
148
(Refer Slide Time: 11:52)
So, designers or developers like to use representations that mimic the actual object rather than its
approximation.
(Refer Slide Time: 12:04)
This brings into the picture some high-level representation techniques for curved
surfaces. These techniques represent curved surfaces more accurately and
conveniently for the designer; they are not approximations, but rather much closer to the actual
representations.
149
(Refer Slide Time: 12:32)
So, implicit and parametric representations are essentially those types of representations which
are more convenient and represent objects more accurately rather than approximating
them. Now, let us start with implicit representation. In this case the surfaces are defined in
terms of an implicit functional form, that is, some mathematical equations.
(Refer Slide Time: 13:05)
In case of parametric representation, the surface points are defined in Euclidean space in terms of
some parameters, again in the form of some mathematical equations.
150
(Refer Slide Time: 13:23)
Now, let us see a few examples which are popularly used in graphics. Let us start with quadric
surfaces.
(Refer Slide Time: 13:41)
This is a frequently used class of objects in graphics, which are represented using implicit or
parametric forms. The term quadric surfaces refers to those objects whose surfaces
are described with second degree equations, or quadratic equations.
151
(Refer Slide Time: 14:11)
For example, spheres, these are very commonly used.
(Refer Slide Time: 14:19)
In implicit form, we can represent a spherical surface with radius r, centered at the
origin, as x^2 + y^2 + z^2 = r^2. So, this equation we can use for implicitly representing a
sphere.
152
(Refer Slide Time: 14:42)
The same sphere can also be represented parametrically using this form, where the angles theta
and phi are the parameters representing the latitude and longitude angles, as shown in this
figure here: this is the latitude angle and this is the longitude angle. And p is a point on the
sphere, which is represented using these parameters.
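As a small illustration, the sketch below evaluates a point on the sphere from the two angle parameters. The particular assignment of the latitude and longitude angles to phi and theta follows the common textbook convention and is an assumption here, not read off the slide.

```python
import math

def sphere_point(r, phi, theta):
    """Point on a sphere of radius r centred at the origin.

    phi   : latitude angle,  -pi/2 <= phi <= pi/2  (assumed convention)
    theta : longitude angle,  0 <= theta < 2*pi
    """
    x = r * math.cos(phi) * math.cos(theta)
    y = r * math.cos(phi) * math.sin(theta)
    z = r * math.sin(phi)
    return (x, y, z)

p = sphere_point(1.0, math.radians(30), math.radians(45))
print(p)                              # a point on the unit sphere
print(sum(c * c for c in p))          # ~1.0, i.e. x^2 + y^2 + z^2 = r^2
```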
(Refer Slide Time: 15:25)
Similarly, we can represent an ellipsoid either in implicit form, as shown here, or in parametric
form, as shown here. This is another widely used quadric surface.
153
(Refer Slide Time: 15:44)
There are many other examples like tori, paraboloids and hyperboloids, which are some other widely used
quadric surfaces in graphics applications.
(Refer Slide Time: 15:58)
An interesting class of objects are called blobby objects.
154
(Refer Slide Time: 16:07)
There are some objects whose shapes show a certain degree of fluidity or flexibility, which
means the object shape changes during motion or when it comes close to other objects.
(Refer Slide Time: 16:25)
Typically, these objects have curved surfaces, but we cannot use standard shapes like lines,
polynomials or quadrics to represent them, because these standard equations fail to
represent surface fluidity in a realistic way. So, we have objects which show some fluidity,
whose surfaces are represented using some curves, but
155
those curves we cannot represent using lines, polynomials or quadrics, because then we would lose
the fluid nature.
(Refer Slide Time: 17:11)
Such objects are generally referred to as blobby objects; molecular structures, liquid
and water droplets, melting objects, animal and human muscle shapes and so on are some
examples, and there are many others. There are several methods to represent blobby
objects. In all of them there is one common approach: essentially, to use some distribution function
over a region of space.
(Refer Slide Time: 17:49)
156
One method is to use a combination of Gaussian density functions or sometimes called Gaussian
bumps.
(Refer Slide Time: 17:59)
An example is shown here of a Gaussian density function, it is characterized by two parameters,
height and standard deviation as shown in the figure.
(Refer Slide Time: 18:19)
Now, when we combine many such functions by varying the two parameters, plus some other
parameters, we get a blobby object or we can represent a blobby object.
157
(Refer Slide Time: 18:36)
So, the object can be represented with a function like this, subject to the condition mentioned
here. Now, by varying the parameters a_k and b_k, we can generate the desired amount of blobbiness
or fluidity that we require. When b_k becomes negative, there are dents instead of
bumps, and T is a specified threshold.
(Refer Slide Time: 19:12)
An example is shown here where we have used three Gaussian density functions by varying the
parameters to create an overall shape, something like this as shown in this dotted line.
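A minimal sketch of this idea is given below: the field at a point is a sum of Gaussian density functions, and a point belongs to the blobby object when the field exceeds the threshold T. The centres, heights (b_k) and width parameters (a_k) are made-up illustrative values, and the functional form used is the commonly quoted Gaussian-bump formulation, assumed here.

```python
import math

# Three Gaussian "bumps": (centre, b_k = height, a_k = width parameter).
# The numbers are illustrative only.
bumps = [((0.0, 0.0), 1.0, 4.0),
         ((1.0, 0.0), 0.8, 4.0),
         ((0.5, 0.6), 0.6, 6.0)]

T = 0.5   # iso-value / threshold defining the object surface

def field(x, y):
    """Sum of Gaussian density functions: f = sum_k b_k * exp(-a_k * r_k^2)."""
    total = 0.0
    for (cx, cy), b, a in bumps:
        r2 = (x - cx) ** 2 + (y - cy) ** 2
        total += b * math.exp(-a * r2)
    return total

# A point is inside the blobby object if the field exceeds the threshold.
print(field(0.2, 0.1) > T, field(3.0, 3.0) > T)   # True False
```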
158
(Refer Slide Time: 19:30)
There is another interesting method to represent blobby objects, which is also quite popular, where a
quadratic density function is used instead of Gaussian bumps.
(Refer Slide Time: 19:46)
This looks something like the function shown here, where b is the scaling factor, r is the radius
(the distance from the object's center) and d is the maximum radius, that is, the bound on the spread
of the object around its center. So, how far the object extends around the center is specified by d.
These are the parameters using which we can define a blobby object in this metaball model.
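For illustration, the sketch below uses the commonly quoted piecewise-quadratic metaball density; the exact pieces are an assumption of that textbook form rather than a reproduction of the slide.

```python
def metaball_density(r, b, d):
    """Piecewise-quadratic metaball density (commonly quoted form, assumed here).

    r : distance of the query point from the object's centre
    b : scaling factor
    d : maximum radius -- the density falls to 0 at r = d
    """
    if r <= d / 3.0:
        return b * (1.0 - 3.0 * r * r / (d * d))
    elif r <= d:
        return 1.5 * b * (1.0 - r / d) ** 2
    else:
        return 0.0

# The density is b at the centre and 0 at (and beyond) the maximum radius d.
print(metaball_density(0.0, b=1.0, d=2.0), metaball_density(2.0, b=1.0, d=2.0))
```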
159
(Refer Slide Time: 20:23)
Now, these are some techniques that we have discussed; however, it is very difficult, or even
impossible, to represent an arbitrary surface in either implicit or parametric form. The functions
that we have already seen are quite complex in themselves, but still there are other surfaces which
are indeed very difficult to represent using such equations.
So, in order to represent such surfaces, we use a special type of parametric representation called
spline representation, or splines. These splines we will discuss in more detail in the next
lecture.
So today we have got an introduction to various boundary representation techniques: we
learned about mesh representation, and we learned the basic idea of implicit and parametric
representation techniques, with some detailed discussion on quadric surfaces and blobby objects.
In the next lecture, we will continue our discussion on boundary representation techniques; the
next few lectures will be devoted to a detailed discussion on spline representations, which will be
followed by a discussion on space partitioning methods. That is all for today.
160
(Refer Slide Time: 22:02)
So, whatever I have covered today can be found in this book. You are advised to go through
Chapter 2, Section 2.2 for the topics that are covered today. We will meet again in the next
lecture till then goodbye.
161
Computer Graphics
Professor. Doctor. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 07
Spline representation – I
Hello and welcome to lecture number 7 in the course Computer Graphics. So far we have covered
6 lectures, and we are discussing the graphics pipeline. Before we go into today's topic, let us
recap the pipeline quickly.
(Refer Slide Time: 0:49)
So, as we have already mentioned, there are 5 stages in the pipeline: the first stage is object
representation, the second stage is modelling transformation, and the third stage is lighting or colouring.
The fourth stage is a composition of sub-stages and sub-pipelines; this is called the viewing pipeline,
which consists of a sub-pipeline with 3 stages and 2 operations: viewing transformation,
clipping, hidden surface removal, projection transformation and window-to-viewport
transformation. And the final stage of the pipeline is scan conversion or rendering.
And these stages take place in different coordinate systems, starting from local or object
coordinate system transitioning through world coordinate, view coordinate, device coordinate to
finally the screen coordinate system.
162
(Refer Slide Time: 2:04)
Now, among these stages, we are currently discussing the first stage object representation
techniques.
(Refer Slide Time: 2:15)
As we have already discussed, among the object representation techniques there are broadly 5
categories: one is point sample rendering, the second is boundary representation, then space
partitioning, then sweep representation, and finally some other representation techniques
which are either application specific or refer to some advanced techniques such as scene graphs,
163
skeletal models and other advanced modelling techniques such as fractal representation and
particle systems.
In boundary representation, there are 3 broad groups of techniques: one is mesh representation,
one is parametric representation, and one is implicit representation. Similarly, in space
partitioning representation, there are 3 broad techniques: octree-based representation, BSP or
binary space partitioning trees, and CSG techniques. Now, among all these, we are currently
discussing the boundary representation techniques, and we will continue our discussion on
them.
In the last couple of lectures, we have covered mesh representation and introduced the idea of
parametric as well as implicit representation.
(Refer Slide Time: 3:50)
These are some of the boundary representation techniques that we have introduced in the last
lecture.
164
(Refer Slide Time: 3:58)
Today we will continue our discussion on boundary representation, and we will focus on one
specific and popular boundary representation technique, which is called spline representation.
(Refer Slide Time: 4:14)
So, in order to understand the spline representation technique, we need to understand how we
represent curves. A curve is a very common primitive shape which is required at many places to
represent objects, particularly in the context of complex shapes; we cannot avoid representing
165
curves: only with lines or points it may not be possible to represent complex shapes, and we have
to take curves into account.
To simplify our discussion we will focus here only on the parametric representation of curves, although
earlier we introduced both types, namely parametric representation and implicit
representation.
(Refer Slide Time: 5:10)
How can we represent curves in general using the parametric form? We can use a single
parameter, denoted by u, to represent a curve, or its Cartesian coordinates, using these
equations: one is for representing the x coordinate and the other for representing the y
coordinate, where x is a function of u and y is another function of u, that is, x = x(u) and
y = y(u). Let us try to understand the intuition behind this representation.
166
(Refer Slide Time: 5:47)
We can assume that u denotes time. We can think of it this way: we are drawing the curve
on a 2D Cartesian space over a period of time, and at each instant of time we place a point.
Then we can say that at that instant the point is at that position; in other words, the
point is characterized by the instant of time, which is u. So, essentially u denotes a
specific instant of time, at which we can determine the corresponding coordinate values
using the equations. This is the simple intuition behind the idea of the parametric representation of a
curve.
167
(Refer Slide Time: 6:51)
So, that is about understanding how to represent a curve parametrically. Now, our objective is
to represent the curve easily and efficiently. Let us elaborate on this a little bit more.
(Refer Slide Time: 7:07)
As we all know, we can approximate a curve in terms of a set of small line segments. Of course,
the segments have to be very small to make the curve look smooth, otherwise the curve
168
takes on a jagged appearance. Now, clearly this is easy and intuitive, but may not be efficient:
we may have to provide a large number of points to draw the small line segments.
(Refer Slide Time: 7:49)
There is another alternative: we can work out the curve equation and apply the equation to
find any point on the curve. This is clearly better than manually specifying a large number
of points to approximate the curve with a set of line segments, and it may turn out to be
efficient also. But the problem here is that for many curves we may not be
able to find the equation itself; for an arbitrarily shaped curve it is very difficult to find out the
curve equation.
169
(Refer Slide Time: 8:38)
So, let us try to understand these problems from the point of view of a user: what the user thinks
and what problems the user faces. The user wants to generate a curve of any
arbitrary shape. If we are trying to represent the curve in the form of a large number of small line
segments, then the user has to input a very large number of points through which the line
segments can be generated. Clearly, no user would be interested in inputting such a large number
of points.
On the other hand, for a user it may be difficult or even impossible to find a precise equation
of the curve. Therefore, in both approaches the user is not going to be benefited.
170
(Refer Slide Time: 9:48)
Ideally, what the user wants to do is provide a limited set of
points, and these points define the curve. So, essentially the user is not providing all possible line
segments to approximate the curve, or a precise equation to find points on the
curve. Instead, the user provides a small, limited set of points which defines the curve. In other
words, these points are chosen such that the curve passes through or near them; these
points are also known as control points.
So, the alternative for the user is to provide a small set of control points, instead of providing a large
set of points through which line segments can be drawn or giving a precise curve equation. So, the user
provides a set of control points.
171
(Refer Slide Time: 11:05)
And the user expects the system to draw the curve by interpolation, that is, by interpolating those control
points. So, let us briefly understand the idea of interpolation; many of you, or
maybe all of you, may already know what interpolation is, but there is no harm in refreshing our
knowledge.
(Refer Slide Time: 11:31)
172
So essentially, when we talk of interpolation, what we mean is the fitting of a curve that passes
through or near the set of points provided, that is, the control points.
(Refer Slide Time: 11:49)
One form of interpolation is polynomial interpolation. In this interpolation what we do, we try to
fit a polynomial curve through the given set of control points. Now, polynomial interpolation is
very popular because it is generally considered that such interpolations are simple, efficient and
easy to manipulate. So, we will focus here on polynomial interpolation.
173
(Refer Slide Time: 12:27)
Now, depending on the number of control points, the degree of the interpolating polynomial is
decided. So, when we talk of polynomial interpolation, one concern is what the degree
of the polynomial should be; that can be decided based on the number of control points provided.
(Refer Slide Time: 12:58)
Let us take an example. Suppose we are given 2 control points; in such a situation, it is advisable
to go for linear interpolation rather than any higher form of interpolation, because we have
174
only two control points. Similarly, if there are 3 control points, then we can go for quadratic
polynomials; if there are 4 control points, we choose the degree accordingly, and so on.
(Refer Slide Time: 13:32)
Therefore, we can say that in general, for n+1 control points, we may try to fit a polynomial of
degree n, which is pictorially depicted in this figure: we are given these control points
through which we are trying to fit a curve, and if the number of control points is n+1, then the
polynomial that we work with should ideally have degree n. Note the system of equations
mentioned here.
This is for the x coordinate; similarly, for the y coordinate we can have a similar system. Now,
since n+1 control points are given, we have n+1 x coordinate values. For each of these coordinates
we have one equation of the curve in terms of the parameter, and so for the n+1 control
points we have n+1 equations.
175
(Refer Slide Time: 14:49)
Now, in those equations there are constant terms, the coefficients a0, a1, up to an.
If we determine these coefficients, then we can define the polynomial. So, to get the values of
these coefficients, what we need to do is solve the set of equations, the n+1
equations that we have seen earlier. If we solve them, we will get the values of the
coefficients, which define the polynomial.
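As a small illustration of this fitting step, the sketch below solves for the coefficients of x(u) with numpy; the x coordinate values and the equally spaced parameter values assigned to the control points are assumptions made for the example.

```python
import numpy as np

# n+1 control points => fit x(u) = a0 + a1*u + ... + an*u^n.
# The x coordinates and the equally spaced parameter values are illustrative.
x_coords = np.array([0.0, 2.0, 1.0, 3.0])          # n+1 = 4 control points
u = np.linspace(0.0, 1.0, len(x_coords))           # assumed parameterization

# One equation per control point: rows of powers of u (a Vandermonde matrix).
V = np.vander(u, increasing=True)                  # columns: 1, u, u^2, u^3
a = np.linalg.solve(V, x_coords)                   # coefficients a0..a3

print(a)
print(np.allclose(V @ a, x_coords))                # curve passes through the points
# The same procedure is repeated with the y coordinates to get y(u).
```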
(Refer Slide Time: 15:36)
176
But there is one problem: if we have a very large n, that is, many control points, then we need
to solve a very large number of equations, which is not easy. On top
of that, we need to keep in mind that there are two separate sets of equations, one for x and one for
y. So, we actually need to solve two sets of equations rather than one, and for large n this
becomes very cumbersome.
(Refer Slide Time: 16:24)
Along with that, there is one more problem, which is called the local controllability issue. Suppose
the user wants to change the shape slightly. With the polynomial equation we get a
curve which represents a shape.
Now, I want to change it slightly. Then, ideally, what should I do? I change one or a few of the
control points to denote the small change. But if we go for polynomial interpolation, then to get
the new curve we may have to recalculate the entire thing again; the entire curve may have to be
recalculated. This is, of course, not a good thing, because we have changed only a few points and
ideally we should be able to restrict our recalculation effort to those few points only; instead,
we have to solve the entire set of equations again, which is not an efficient approach.
So, this problem is known as the local controllability problem, where we are unable to control local changes
locally; we have to handle local changes through a global recalculation of the curve. Now, in
order to address these issues, there is another approach, which we will discuss.
177
(Refer Slide Time: 18:27)
Now, what is this alternative approach? Suppose we are again given n+1 control points.
Irrespective of the value of n, we may partition the entire set into subsets with fewer points;
typically, these subsets have 3 points each. So, given a set of n+1 points, we may like to have
subsets where each subset contains three control points.
(Refer Slide Time: 19:00)
Now for each of these subsets we may fit lower degree polynomials. In this case, the degree 2
polynomials for each of the subsets.
178
(Refer Slide Time: 19:17)
And then these individual polynomials of lower degree, which are also called polynomial pieces,
when joined together give the overall curve. So, the idea is very simple: we are given
a large number of control points, but it is not necessary to fit a single polynomial curve using the
entire set. Instead, what we do is divide the entire set of control points into
smaller subsets, each containing very few control points.
A typical value used is three, and for each of these subsets we fit, or interpolate, a smaller degree
polynomial. These polynomials, when joined together, give the overall curve.
These individual polynomials are also known as polynomial pieces, so the entire curve is
represented in terms of polynomial pieces.
179
(Refer Slide Time: 20:23)
Let us take an example; consider this figure here. There are 5 control points p0 to p4, as you can
see: p0, p1, p2, p3 and p4. Now, these 5 points need not be used to draw a single polynomial,
which in this case would be of degree 4. Instead, what we can do is subdivide the curve, or
the set of control points, into subsets, like the two subsets shown here: in one subset we have
three control points p0, p1, p2, and in the other subset we have another 3 control points p2, p3, p4.
For each of these subsets we draw a quadratic, or degree 2, polynomial, and when they join
together we get the overall interpolated curve. That is the basic idea.
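A minimal sketch of this subdivision idea is shown below: five illustrative control points are split into two subsets of three, and a quadratic is fitted to each subset. The particular point coordinates and the parameter values u = 0, 0.5, 1 used within each piece are assumptions made for the example.

```python
import numpy as np

# Five illustrative control points p0..p4 (x, y).
pts = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 1.5], [3.0, 3.0], [4.0, 2.0]])

def fit_quadratic(three_pts):
    """Fit x(u), y(u) of degree 2 through 3 points at u = 0, 0.5, 1 (assumed)."""
    u = np.array([0.0, 0.5, 1.0])
    V = np.vander(u, increasing=True)          # columns: 1, u, u^2
    # Solve separately for the x and y coefficient vectors.
    return np.linalg.solve(V, three_pts[:, 0]), np.linalg.solve(V, three_pts[:, 1])

ax, ay = fit_quadratic(pts[0:3])               # piece 1: through p0, p1, p2
bx, by = fit_quadratic(pts[2:5])               # piece 2: through p2, p3, p4

# The two pieces share p2, so they meet there (C0 continuity at the joint).
print(np.polyval(ax[::-1], 1.0), np.polyval(ay[::-1], 1.0))   # end of piece 1 = p2
print(np.polyval(bx[::-1], 0.0), np.polyval(by[::-1], 0.0))   # start of piece 2 = p2
```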
180
(Refer Slide Time: 21:40)
Now, this idea of fitting a set of control points with several polynomials of lower degree, rather than a
single higher degree polynomial, is known as spline representation. So, when we talk of spline
representation, we are essentially referring to the fact that there is a set of control points, but we
are not interpolating the entire set with a single polynomial curve; instead, we are representing it
in terms of several polynomial pieces. The entire curve is called a spline curve, or simply a spline.
This is a very popular curve representation technique used in computer graphics.
(Refer Slide Time: 22:32)
181
In graphics, it is very common to use splines made of third degree (n = 3) polynomials, also
known as cubic polynomials. In our subsequent discussion, we will concentrate
on these polynomials and the corresponding splines only. There is one important thing in spline
representation that we have to keep in mind, which is called the continuity condition.
(Refer Slide Time: 23:10)
Now, splines, as we have discussed, refer to the joining of several polynomials. So, clearly it is
important to ensure that they join smoothly, to make the resulting curve look smooth.
(Refer Slide Time: 23:32)
182
Now, how do we ensure that? In order for that to happen, splines must conform to what are
known as continuity conditions.
(Refer Slide Time: 23:50)
There are several such conditions; broadly, they are of two types, one is the parametric continuity
conditions and the other is the geometric continuity conditions.
(Refer Slide Time: 24:02)
183
So, in general, the nth order parametric continuity condition, denoted by Cn, states that the adjoining
curves meet and that the first to the nth order parametric derivatives of the adjoining curve functions are
equal at their common boundary; that is the general definition. Now, let us see what these conditions
refer to in simple terms.
(Refer Slide Time: 24:40)
So, the first parametric continuity condition is C0, the zeroth order condition, which simply
states that the adjoining curves meet. It is just that simple a condition.
(Refer Slide Time: 25:00)
184
Now, the first order parametric condition C1 indicates that the first order derivatives of the adjoining
curves at the common boundary are equal. So, essentially it tells us that at the common boundary we
have to ensure that the first order parametric derivatives, that is, the derivatives of the curves with
respect to the parameter u, are equal.
(Refer Slide Time: 25:40)
In a similar way, C2 indicates that both the first and the second order derivatives are equal at the
common boundary. And in this way we can go on. But since in graphics we mostly focus on
third degree polynomials, we are mostly concerned with the continuity conditions up to C2.
185
(Refer Slide Time: 26:09)
Now, these parametric continuity conditions are sufficient, but not necessary, to ensure geometric
smoothness of the spline. For that, what we need is to conform to the other set of continuity
conditions, called geometric continuity conditions. What are those?
(Refer Slide Time: 26:34)
The 0 order condition is denoted by G0. This is the zeroth order condition which is similar to C0,
which simply states that the curves must meet.
186
(Refer Slide Time: 26:50)
Similarly, G1, the first order geometric continuity condition, tells us that the tangent directions at
the common boundary should be equal, although their magnitudes can be different; so the
directions must be equal but the magnitudes can vary at the boundary. That is the G1 or first-order
geometric continuity condition.
(Refer Slide Time: 27:23)
The second-order condition, G2, indicates that both the tangent directions and the curvatures at the common
boundary of the adjoining curves should be equal. Again, we can go on like this up to any order,
187
but since we are mostly concerned with cubic polynomials, up to G2 should be sufficient for our
understanding. So, that is one piece of basic knowledge we should have about splines: if we
want to represent a curve as a spline, that is, in terms of smaller polynomial pieces, we
should ensure that the curve conforms to the continuity conditions, both parametric and geometric.
(Refer Slide Time: 28:27)
Now let us see the different types of spline representations that we can use. There
are broadly two types: one is interpolating splines, and the other is approximating splines.
188
(Refer Slide Time: 28:46)
Now, in the case of interpolating splines, what do we want? We essentially try to fit the curve such that
it passes through all the control points. So, we are given a set of control points and we
represent the curve in the form of a spline, in such a way that the polynomial pieces of the
spline pass through all the control points, as shown in this figure.
The commonly used interpolating splines in computer graphics are natural cubic splines,
Hermite cubic splines and cardinal cubic splines. We will discuss these splines in
detail later.
189
(Refer Slide Time: 29:46)
The other type of spline curves are called approximating splines. Here, the control points are used to
define a boundary, or convex hull; the spline itself does not pass through all the control points.
Instead, it is restricted within the boundary defined by the control points. Take the same example:
here we have 4 control points, but the curve is not passing through the 4 control points,
unlike earlier in the case of interpolating splines. What is happening here is that these control points
define a bounding region, a boundary which is popularly called the convex hull, and the spline
lies within this boundary.
In other words, the spline shape is determined by the convex hull. There are a few common
and popular approximating splines used in applications, namely the cubic Bezier curves
and the cubic B-splines; again, we will discuss those later. So, that is the basic idea of a spline:
what it is and what makes splines good for representing any curve.
What it is: it is essentially a representation of a complex shape in terms of smaller, manageable,
lower degree polynomials or polynomial pieces. And it is able to represent the curves smoothly
because splines are supposed to conform to the continuity conditions. Now, let us try to understand
how we represent a spline; this is the same as knowing how to represent the objects which are
represented by splines.
190
(Refer Slide Time: 32:08)
How can we represent splines? There are broadly two ways: one is the basis or blending function
based representation, and the other is the basis matrix based representation. These two are
equivalent, of course; one can be converted to the other and vice versa.
(Refer Slide Time: 32:34)
Let us take some examples to understand the representation. We will start with the basis matrix
representation of splines, and with a simple example. Consider a polynomial of
degree one, that is, a linear polynomial, which in parametric form we can represent as
191
f(u) = a0 + u·a1. Here a0, a1 are coefficients and u is the parameter. We must keep in mind
that this is a compact representation:
each ai, like a0 and a1, actually represents a vector comprising two components, one for each
coordinate. So, a0 actually has values a0x and a0y, separate for the x and y coordinates.
Similarly, f(u) has corresponding expressions, namely fx(u) and fy(u). However, for simplicity
we will work with this compact form rather than the expanded form.
(Refer Slide Time: 34:00)
Now, this parametric equation we can represent in the matrix form U.A. This is a
product of two matrices, U and A, where U is the parameter matrix and A is the coefficient matrix.
U is the row vector [1 u], and A is the column vector
having the two coefficients a0, a1 in our example.
192
(Refer Slide Time: 34:46)
Now, since this is a polynomial of degree 1, so we need at least two control points to determine f.
Let us denote those two by p0 and p1.
(Refer Slide Time: 35:06)
Now, these points we will use to parameterize the polynomial. In other words, we shall assign
certain parameter values to these control points; for example, we may assume that the
193
control points denote values of the function at the boundary, where we define the boundary as
the points where the parameter takes the values 0 and 1.
(Refer Slide Time: 35:48)
If that is the case, then we can set up our system of equations as shown here: two equations, one
for p0 and one for p1, with the parameter values fixed, that is, p0 = f(0) = a0 and
p1 = f(1) = a0 + a1. Now, by solving these equations we can get the coefficients.
(Refer Slide Time: 36:17)
194
However, if we look closely, we can see that the same system of equations can be represented
in the form of matrices. What is this matrix representation? We can represent it as P = C.A,
where P is defined as a column vector of the control points, C is the constraint matrix and A is
the column vector of coefficients.
(Refer Slide Time: 36:56)
So, how did we construct the C matrix? We took the coefficients of the ai terms, from a0 to an in
that order, in each equation to form the corresponding row of the C matrix. So, the first
equation gives the first row, and so on.
195
(Refer Slide Time: 37:30)
In other words, we imposed certain constraints, the parameterization conditions, to
obtain C. Accordingly, C is called the constraint matrix.
(Refer Slide Time: 37:53)
Now, we know P = C.A, so we can say that A = C^-1.P. This inverse of the
constraint matrix is called the basis matrix.
196
(Refer Slide Time: 38:13)
So, we can represent f as U.A, which can be expanded as U.C^-1.P, or U.B.P. This is the way to
represent f in terms of matrix multiplication.
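The same derivation can be carried out numerically for the degree-1 example, as in the sketch below: build the constraint matrix from the boundary conditions, invert it to get the basis matrix, and evaluate f(u) = U.B.P. The control point values are illustrative.

```python
import numpy as np

# Degree-1 example: f(u) = a0 + a1*u with f(0) = p0 and f(1) = p1.
C = np.array([[1.0, 0.0],     # p0 = a0            (u = 0)
              [1.0, 1.0]])    # p1 = a0 + a1       (u = 1)
B = np.linalg.inv(C)          # basis matrix
print(B)                      # [[ 1. 0.] [-1. 1.]]

P = np.array([2.0, 5.0])      # illustrative control points p0, p1

def f(u):
    U = np.array([1.0, u])    # parameter matrix [1, u]
    return U @ B @ P          # f(u) = U . B . P

print(f(0.0), f(1.0), f(0.5)) # 2.0 5.0 3.5 -- expanding gives (1-u)*p0 + u*p1
```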
(Refer Slide Time: 38:37)
Now, one thing to note here is that the basis matrix for an interpolating polynomial
satisfying the parameterization conditions is fixed. In other words, the basis matrix
uniquely characterizes the polynomial. So, if we use the basis matrix B instead of the polynomial
197
equation, then this is as good as representing the polynomial, because B is fixed for that particular
polynomial.
(Refer Slide Time: 39:19)
Now, we know that a spline is made up of polynomial pieces. If each piece is made of the
same type of polynomial, that is, the degree and the constraints are the same,
then the overall spline can be uniquely characterized by each piece. And since we have already
mentioned that a polynomial piece can be characterized by its basis matrix, the basis matrix
can also be used to uniquely characterize the entire spline. So, when we represent the
spline, we can simply represent it in terms of the basis matrix.
That is the basis matrix representation of a spline. To recap: given a polynomial, we
have a unique basis matrix for that polynomial under certain constraints, so the basis matrix is
suitable to represent the polynomial. The same polynomial pieces are used to represent a
spline, so for each polynomial piece we have the same matrix, and we can use a single basis
matrix to represent the overall spline, because the basis matrix tells us which particular
polynomial pieces are used to represent the spline. This is the basis matrix representation of
splines.
198
(Refer Slide Time: 41:04)
This is the explanation we just mentioned: the basis matrix refers to the polynomial pieces of the spline,
and we are assuming all pieces are made of the same polynomial, so the polynomial's basis matrix represents
the whole spline. Now, let us turn our attention to the other type of spline representation, namely
the blending function representation.
(Refer Slide Time: 41:33)
Now, earlier we have seen that we can represent f in terms of the basis matrix, as U.B.P.
199
(Refer Slide Time: 41:43)
Now, if we expand the right hand side, we get a weighted sum of polynomials, with the control points
being the weights. Let us derive it in our example and see what happens. In our example
we have a polynomial of degree 1, and we have the matrices in this form: U is this one, B is this
matrix and P is the control point matrix. Now, if we expand, we will get this equation in terms
of the control points: f(u) = (1-u)·p0 + u·p1.
(Refer Slide Time: 42:37)
200
Now, the individual polynomials in the weighted sum, such as the terms 1-u and u, are the
blending functions. So, the overall function is represented as a weighted sum of polynomials,
and these individual polynomials are called the basis functions or the blending functions.
(Refer Slide Time: 43:03)
Now, for a given polynomial the blending functions are also fixed, so we can use them to
characterize the polynomial. For a given polynomial with constraints, the functions that can
be used to represent it are fixed, so this set of blending functions can be used to characterize the
polynomial. We can apply the same logic here: a spline made up of several pieces of the same
polynomial type can therefore also be represented in terms of the blending functions, since they
uniquely characterize the constituent polynomial pieces.
201
(Refer Slide Time: 43:52)
So, in compact form we can represent a spline, or the curve f, in this way: f(u) = Σ pi·bi(u), where pi is the i-th
control point and bi is the corresponding blending function. To recap, today we got introduced to the
basic idea of splines, which is essentially representing a curve in terms of constituent lower
degree polynomial pieces. Then we discussed the continuity conditions that ensure splines
give us smooth curves. We also discussed the broad types of splines and the ways splines can
be represented, in the form of basis matrices or blending functions.
In the next lecture we will take up a detailed discussion of the various types of splines that we have
mentioned, namely the interpolating splines and the approximating splines. We will also learn in
the next lecture about the use of splines to represent surfaces in computer graphics.
202
(Refer Slide Time: 45:15)
Whatever I have discussed today can be found in this book. You are advised to go through
Chapter 2, Section 2.3 to learn about these topics in more detail. See you in the next lecture till
then thank you and goodbye.
203
Computer Graphics
Professor. Doctor. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 08
Spline representation – II
Hello and welcome to lecture number 8 in the course Computer Graphics.
(Refer Slide Time: 0:37)
In the previous lecture we got introduced to the basic idea of splines. Splines are one of the
boundary representation techniques, as we have mentioned. In that introductory lecture on splines
we learned the basic idea, what a spline is, how to represent it, how those representations are
created and various such introductory topics, including the types of splines.
Just to recap, let me briefly restate the basic idea of a spline. When we talk of a
spline, it is essentially a representation of a curve, where the curve is represented in the form of a
collection of lower degree polynomials; these polynomials join together to give the overall
curve, and the polynomials are interpolated on the basis of a set of control points that are defined
to represent the curve. So, a spline is essentially a representation where we join together multiple
polynomial curves of lower degree.
204
(Refer Slide Time: 2:11)
Now, that was the basic idea, and during that discussion we also talked about the types of splines;
broadly, there are two types, one is interpolating splines and the other is approximating splines. Today
we are going to discuss these two types of splines in detail. Along with that, we are
going to talk about how these splines can be used to represent curves and surfaces in computer
graphics. Let us start with the idea of interpolating splines.
If you recall, interpolating splines are those where we define a set of control points and the
spline curve passes through those points. We also mentioned 3 popular interpolating splines.
205
(Refer Slide Time: 3:07)
These are namely the natural cubic splines, the Hermite cubic splines and the cardinal cubic splines; all 3
are interpolating splines, meaning that they are defined by control points through which the
spline curve passes. So, let us now go into a more detailed discussion of each of these popular
types. We will start with the natural cubic spline.
(Refer Slide Time: 3:44)
206
Just a little bit of history: this is one of the first splines that was used in computer graphics
applications. As the name suggests, since it is a cubic spline, it is made up of pieces of
third degree polynomials.
(Refer Slide Time: 4:09)
Now, each of the polynomial pieces is defined by 4 control points, which we can denote
as p0, p1, p2, p3. Recollect that when we talked about polynomial interpolation, we
mentioned that if we are using a polynomial of degree n, then we are going to have n+1
control points. Similarly here, since the degree is three, we have 4 control points.
Now, in the case of natural cubic splines, two of the control points, p0 and p3, denote the two boundary
points, that is, the boundary values of the parameter u: p0 refers to the point where u = 0 and
p3 refers to the point where u = 1. The other two control points essentially denote the
first and second derivatives at the starting point, that is, at u = 0. So, two of the control points,
p0 and p3, are points on the curve representing the two boundary points, while the
other two are not points on the curve but rather the first and second order derivatives with
respect to the parameter u at u = 0.
207
(Refer Slide Time: 5:47)
Now let us try to represent this spline using one of the representation techniques. Recollect
again that a spline can be represented in either of two ways, namely the basis matrix
representation and the blending function representation. We also mentioned that they are
equivalent, so any one representation is good enough. Now, in the context of interpolating
splines, we will try to represent them using the basis matrix form.
So, in the case of the natural cubic spline, since we are using cubic polynomials, the general
polynomial piece equation should look like f(u) = a0 + a1·u + a2·u^2 + a3·u^3, where a0, a1, a2 and a3 are
coefficients and u is the parameter. Now, we already know that the control points p0, p1, p2
and p3 have specific meanings:
p0 and p3 represent the points at the boundaries. So, we can set up the equations
like this: p0 is the function value at u = 0, which we can obtain by replacing u
with 0.
So, we get p0 = a0 + 0·a1 + 0^2·a2 + 0^3·a3 = a0; we just replace u with 0. Similarly, p3 is
the function value at u = 1, so in this case we replace u with 1 and get
p3 = a0 + a1 + a2 + a3. These two are the equations corresponding to the control points p0 and p3. p1
and p2, as we said, represent the first order and the second order derivatives at u = 0: p1
is the first order derivative and p2 is the second order derivative with respect to u.
208
Now, if we compute the derivative and then replace u with 0 in the case of p1, what we
get is an equation like p1 = a1. You can try it yourself: first obtain the derivative and
then substitute u = 0. We then compute the derivative again to get the second order
derivative at u = 0 corresponding to p2, and after replacing u with 0 we get
p2 = 2·a2.
So, these 4 equations represent the system of equations that we obtain by utilizing the control
point characteristics. From this set of equations we can derive the
basis matrix. How can we do that? We can first construct the constraint matrix by simply taking
the values attached to the coefficients, and then take its inverse to get the basis
matrix.
(Refer Slide Time: 9:53)
So, what is the constraint matrix? Let us recast the equations here. From them, as
we can see, 1, 0, 0, 0 is the first row; 0, 1, 0, 0 is the second row; 0, 0, 2, 0 is the
third row; and finally 1, 1, 1, 1 is the fourth row. So, just by utilizing these values, we get
the constraint matrix C.
209
(Refer Slide Time: 10:41)
Now, to get the basis matrix, what we need to do is take the inverse of C. How to compute
the inverse we will not discuss here, of course, because that is very basic knowledge;
you can refer to any book on basic matrix algebra, or to the appendix of the reference
material mentioned at the end of this lecture, to learn how to compute the inverse of a
matrix. Assuming we already know that, if this is our constraint matrix C and we take its
inverse, the matrix that we get is this one, and this is the basis matrix for the natural cubic
spline.
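The sketch below simply repeats this computation numerically: it builds the constraint matrix listed above and inverts it to obtain the basis matrix of the natural cubic spline.

```python
import numpy as np

# Constraint matrix for the natural cubic piece f(u) = a0 + a1*u + a2*u^2 + a3*u^3
# with p0 = f(0), p1 = f'(0), p2 = f''(0), p3 = f(1), the rows in that order.
C = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])

B = np.linalg.inv(C)          # basis matrix of the natural cubic spline
print(B)
# [[ 1.   0.   0.   0. ]
#  [ 0.   1.   0.   0. ]
#  [ 0.   0.   0.5  0. ]
#  [-1.  -1.  -0.5  1. ]]
```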
So, let us recollect the series of steps we followed to get the basis matrix: first we defined the
control points; then, using those control points, we set up the system of equations; from the
equations we formulated the constraint matrix C; then we took the inverse of C to get the basis
matrix. Since the basis matrix is the characteristic matrix for these cubic polynomials, and the natural
cubic spline is made up of such cubic polynomials,
we can say that this basis matrix, which is characteristic of the cubic polynomial, is good enough
to represent the natural cubic spline. We follow the same line of argument as discussed in the
previous lecture.
previous lecture.
210
(Refer Slide Time: 12:34)
So, if we simply use this matrix, then we can say that it represents the natural cubic spline, instead of
specifying the equations or anything else. Another important property of these natural cubic
splines is that they support parametric continuity up to C2, which means they support
C0, C1 and C2 continuity.
But the problem is that natural cubic splines do not have local controllability. That means if we have a
spline and want to change its shape slightly by modifying a very few points on the curve,
then we have to re-compute the whole curve rather than re-computing only the localized area,
which is essentially a very inefficient approach. For example, suppose I have a curve like this
and I want to change this part to something like this; then, instead of restricting our
computation to this part, we have to compute the whole curve again. So, it does not
support local controllability.
Now, let us come to the second type of interpolating splines, that is, Hermite cubic splines. Like
natural cubic splines, Hermite cubic splines are also defined by 4 control points.
211
(Refer Slide Time: 14:39)
However, they have one important property: they support local controllability. So, the main problem
with natural cubic splines is alleviated with Hermite cubic splines.
(Refer Slide Time: 15:00)
So, the 4 control points that define a Hermite cubic spline can be denoted by p0, p1, p2 and p3,
where p0 and p2 are the values at the parameter boundaries, that is, at u = 0 and u
= 1, p1 is the first derivative at u = 0 and p3 is the first derivative at u = 1. So,
212
earlier we had p0 and p3 as the points at the boundaries, and p1 and p2 as the first and second
derivatives at the same point, u = 0.
What we have now is p0 and p2 used to represent the points at the parameter boundaries,
namely u = 0 and u = 1, while p1 is the first derivative at u = 0 and p3 is the first
derivative at u = 1. So, at both boundary points we have the first derivative, denoted by the
control points p1 and p3.
(Refer Slide Time: 16:14)
Following the approach we used to derive the basis matrix for natural cubic splines, we can also
derive the basis matrix for the Hermite cubic splines; we will not do that here, and you can try it
yourself. The approach is the same: set up the system of equations, identify the constraint
matrix, take its inverse and get the basis matrix. The matrix looks something like this, which is
the representation for the Hermite cubic splines.
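If you want to verify your derivation numerically, the sketch below sets up the constraint matrix under the parameterization read from this lecture (p0 = f(0), p1 = f'(0), p2 = f(1), p3 = f'(1)) and inverts it; this ordering is an assumption, and if the book orders the control points differently the rows, and hence the resulting matrix, change accordingly.

```python
import numpy as np

# Hermite cubic piece f(u) = a0 + a1*u + a2*u^2 + a3*u^3 with the assumed ordering
#   p0 = f(0), p1 = f'(0), p2 = f(1), p3 = f'(1).
C = np.array([[1.0, 0.0, 0.0, 0.0],    # p0 = a0
              [0.0, 1.0, 0.0, 0.0],    # p1 = a1
              [1.0, 1.0, 1.0, 1.0],    # p2 = a0 + a1 + a2 + a3
              [0.0, 1.0, 2.0, 3.0]])   # p3 = a1 + 2*a2 + 3*a3

B = np.linalg.inv(C)                   # Hermite basis matrix for this ordering
print(np.round(B, 2))
# [[ 1.  0.  0.  0.]
#  [ 0.  1.  0.  0.]
#  [-3. -2.  3. -1.]
#  [ 2.  1. -2.  1.]]
```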
213
(Refer Slide Time: 17:04)
Now, although the Hermite cubic splines support local controllability, they do not support all
the parametric continuity conditions. Unlike the natural cubics, they support only the C0 and C1
continuity conditions and do not support the C2 condition, whereas natural cubics support all three:
C0, C1 and C2. This implies that the curve resulting from the Hermite cubic polynomials is less
smooth compared to a natural cubic polynomial based spline. So, at the expense of smoothness we
are getting the local controllability property; we gain local controllability, but we
lose, to some extent, the degree of smoothness of the curve.
214
(Refer Slide Time: 18:15)
Now, one problem here is that we have to specify the first order derivatives as control points
at both boundaries; clearly, this puts an extra burden on the user.
(Refer Slide Time: 18:47)
To avoid that, we have another spline, the cardinal cubic spline. With this
spline we can resolve the problem faced with Hermite cubic splines, that is, having to
define the first order derivatives at both boundary points.
215
(Refer Slide Time: 19:08)
Like before, since we are again dealing with cubic splines, this spline is also made up of
polynomial pieces, and each piece is defined by 4 control points. Again we denote them
p0 to p3, using the same notation, but here p1 and p2 represent the boundary values,
that is, at u = 0 and u = 1, while p0 and p3 are used to obtain the first order derivatives at
the boundaries. How? Look at this system of equations: this is the control point p1, which is
the function value at u = 0, and this is the control point p2, which is the function value at u = 1.
The first order derivative at u = 0 can be obtained using this equation, and similarly the first order
derivative at u = 1 can be obtained using this equation, where we have used the two outer control
points in this fashion and also used one additional parameter t in both cases.
216
(Refer Slide Time: 20:40)
Now, t is called the tension parameter; essentially, it determines the shape of the curve. When t
= 0, what we get is the Catmull-Rom, or Overhauser, spline, a special type of spline
obtained at t = 0. So, using the value of t we can actually control the shape of the overall spline
curve. Here we do not have to compute the derivatives explicitly; instead, we can derive them using
the control points and the value of t, and that actually makes the life of the user simpler.
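The sketch below computes these boundary derivatives from the control points and the tension parameter. The scaling factor 0.5*(1 - t) is the common textbook form of the cardinal spline tangents and is an assumption here; the control point values are illustrative.

```python
import numpy as np

def cardinal_tangents(p0, p1, p2, p3, t):
    """Boundary derivatives of a cardinal cubic piece running from p1 to p2.

    Common textbook form (assumed here):
        f'(0) = 0.5 * (1 - t) * (p2 - p0)
        f'(1) = 0.5 * (1 - t) * (p3 - p1)
    t is the tension parameter; t = 0 gives the Catmull-Rom (Overhauser) spline.
    """
    s = 0.5 * (1.0 - t)
    return s * (p2 - p0), s * (p3 - p1)

p0, p1, p2, p3 = map(np.array, ([0.0, 0.0], [1.0, 1.0], [2.0, 1.0], [3.0, 0.0]))
print(cardinal_tangents(p0, p1, p2, p3, t=0.0))   # Catmull-Rom tangents
print(cardinal_tangents(p0, p1, p2, p3, t=0.5))   # tighter curve, smaller tangents
```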
But again, the cardinal cubic splines suffer from the same problem: they support only the C0
and C1 parametric continuity conditions and do not support C2. So, they give a less smooth curve
than natural cubics.
217
(Refer Slide Time: 21:48)
So, what is left is the basis matrix formulation. This is slightly complicated, but again you
can try it using the same approach: set up the system of equations, where some
rearrangement may be required, then create the constraint matrix and take its inverse to get the
basis matrix for the cardinal cubic spline, which is represented here in terms of s, where s is this
expression.
So, to summarize, we have learned about 3 interpolating splines. Each of these interpolating
splines is made up of polynomial pieces, and these pieces are cubic, so each piece is defined
by 4 control points. In the case of the natural cubic the control points are defined in a particular way, in
the case of the Hermite cubic they are defined in a different way, and in the case of the cardinal cubic they
are defined in yet another way. The natural cubic is very smooth, but it has some problems.
Those problems are particularly related to local controllability, which is taken care of in the
Hermite cubic, but at the expense of smoothness. Specifying Hermite cubics is slightly
difficult, which is in turn taken care of by cardinal cubics, but the smoothness remains less
than that of natural cubics. Now, let us move to the other category of splines, namely the
approximating splines. What are these splines? Just to recap, here we have a set of control
points, but the spline need not pass through all the points; instead, the control points are used to
define the convex hull, which determines the shape of the overall spline curve.
218
(Refer Slide Time: 24:18)
Now, there are two popular approximating splines used in computer graphics. One is called
cubic Bezier curves, and the other is B-splines. As in the case of interpolating splines, let
us try to understand these types in a little more detail. We will start with cubic Bezier curves.
(Refer Slide Time: 24:47)
Now, these particular curves, the cubic Bezier curves, are among the most widely used representation
techniques in graphics. The name is derived from the French engineer Pierre Bezier, who
first used them to design Renault car bodies.
219
(Refer Slide Time: 25:17)
Since it is a cubic curve, it is defined by 4 control points as before, but the difference here is that the
curve is an approximating spline, which means the polynomial pieces do not pass through all 4
control points. We can denote these points, as before, by p0, p1, p2 and p3. So, there are 4 points,
but it is not necessary that the curve passes through all of them.
(Refer Slide Time: 25:51)
Instead, these points are used to define the convex hull: each piece originates at p0 and ends at
p3, that is, the first and the last control points. The other two points are used to determine the
220
convex hull within which the curve lies. So, 2 points are on the curve and the other 2 points
control the shape of the curve.
(Refer Slide Time: 26:24)
Now, the control points can be used to define the first order derivatives at the boundary values, that is,
at u = 0 and u = 1, using these expressions. This is the first order derivative at u
= 0 and this is the first order derivative at u = 1, and these two derivatives are defined
in terms of the control points as shown in these expressions.
(Refer Slide Time: 27:05)
221
Now we can set up the equations. This is the cubic polynomial equation, as we have seen before,
and we substitute the boundary values and the derivative conditions into it. What we then get is
this system of equations: this is the boundary value at u = 0, this is the boundary
value at u = 1, this is the first order derivative at u = 1 and this is the first order derivative at u
= 0.
So, for each we can replace u with the corresponding value and obtain the set of
equations, as we have seen earlier.
(Refer Slide Time: 28:19)
From this set of equations we can rearrange and get the form shown here for the control points.
222
(Refer Slide Time: 28:39)
Now, from that rearranged form we can get the constraint matrix as shown here. Then we take the
inverse of the constraint matrix, C^-1, to get the basis matrix
representation of the Bezier curve, which is something like this. Note that here we followed the
same approach: first we created the set of equations using the control point characteristics,
then we formulated the constraint matrix, and finally we took the inverse of the constraint matrix
to get the basis matrix.
(Refer Slide Time: 29:32)
223
There is also a blending function representation for Bezier curves. We already said the two
representations are equivalent, but the blending function representation of Bezier curves is also quite
popularly used. It has a general form something like this, where the pk are the control points and BEZ are
the blending functions, and the function is defined within the range of u between 0 and 1.
(Refer Slide Time: 30:12)
Now, these blending functions are actually special functions, known as
Bernstein polynomials. So, Bezier curves we can
represent using blending functions, where the blending functions are the Bernstein
polynomials, which take the form BEZ(k, n) = C(n, k)·u^k·(1-u)^(n-k), shown here in the first line,
where the term C(n, k) = n!/(k!(n-k)!) is shown in the second line.
224
(Refer Slide Time: 31:00)
Now, if we expand the blending functions for, say, a cubic Bezier curve where n = 3, how do they
look? BEZ(0, 3) is (1-u)^3, BEZ(1, 3) is 3u(1-u)^2, BEZ(2, 3) is 3u^2(1-u) and BEZ(3, 3) is u^3.
These are the blending functions for the cubic Bezier curve. The same thing we can derive from the
basis matrix also; as I said in the previous lecture, they are equivalent, one can be derived from the
other, and you can try to do the same.
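As a small illustration, the sketch below evaluates a point on a cubic Bezier curve directly from the Bernstein blending functions just listed; the four control points are illustrative.

```python
import numpy as np
from math import comb

def bernstein(k, n, u):
    """BEZ(k, n)(u) = C(n, k) * u^k * (1 - u)^(n - k)."""
    return comb(n, k) * (u ** k) * ((1.0 - u) ** (n - k))

def bezier_point(control_points, u):
    """Point on a Bezier curve: weighted sum of control points, weights BEZ(k, n)(u)."""
    n = len(control_points) - 1
    return sum(bernstein(k, n, u) * p for k, p in enumerate(control_points))

# Four illustrative control points of a cubic Bezier piece.
P = [np.array(p) for p in ([0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0])]
print(bezier_point(P, 0.0))    # equals p0 -- the curve starts at the first point
print(bezier_point(P, 1.0))    # equals p3 -- and ends at the last point
print(bezier_point(P, 0.5))    # an interior point inside the convex hull
```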
(Refer Slide Time: 31:46)
225
So, one major problem with cubic Bezier curves is that they do not support local controllability,
which is of course a major concern. Now, let us move to the other category of approximating splines,
that is, B-splines.
(Refer Slide Time: 32:14)
Now, here the idea is slightly more complicated. B-splines are approximating splines, as
we have already mentioned; they support parametric continuity conditions up to C2, that is, they
support all three of C0, C1 and C2. In other words, these splines give
us a very smooth curve, and they also support local controllability. So, while Bezier curves do not
have local controllability, B-splines are very smooth as well as locally controllable.
So, all our problems are solved with these splines.
226
(Refer Slide Time: 33:02)
Now, let us try to understand the basic idea; the mathematics behind it is slightly complicated, so we will try to keep it to the minimum for simplicity. What is the idea? The idea is based on representing a polynomial in terms of other polynomials, as we have seen in the blending function representation. So, that is where this idea starts, that a polynomial can be represented in terms of other polynomials.
(Refer Slide Time: 33:40)
227
So, what is the most general form of such a representation? We have encountered this expression in the previous lecture, where the pi are the control points and the bi are the blending functions. So, we are assuming in this representation that the function f is a linear combination of the blending functions, where the control points p serve as the coefficients. Now, in this definition, one thing to be noted is that we are using a parameter t instead of u. This parameter is more general; that means we need not restrict ourselves to the range defined for u, that is, between 0 and 1. So, t can take any value, not necessarily within the range 0 to 1.
(Refer Slide Time: 34:40)
Now, each bi is a polynomial. Let us assume that; then f can be considered as a spline made up of polynomial pieces, which is basically the idea. Now, we can represent each bi as a combination of other functions, like the overall function f. So, f is represented as a linear combination of the bi's, and each bi in turn can be represented as a combination of other functions. Then, conceptually, each bi is also a spline, because it is made up of other polynomials.
So, when we talked about splines, we said that a spline is a curve made up of lower degree polynomials. Now each of these polynomials, again, we are assuming to be made up of even other polynomials. So, these polynomial pieces are themselves splines rather than simple polynomials. And that is the overall idea of B-splines: we are having a spline which is made up of basis splines rather than polynomial pieces.
228
Now, each basis spline is made up of polynomial pieces. So, the definition is one level higher than the basic definition of a spline. In a spline, we have a curve made up of polynomial pieces; in a B-spline, we have a curve made up of basis splines, which in turn are made up of polynomial pieces.
(Refer Slide Time: 36:19)
So, what we get is a spline made up of B-splines and that is the overall idea. So, when we are talking of B-splines, we are actually referring to the constituent splines.
(Refer Slide Time: 36:36)
229
Let us try to understand in terms of an example. Suppose we have 4 control points and we want
to fit a curve through those points.
(Refer Slide Time: 36:51)
So, since we have 4 points, our curve function will look something like this; we expand the general form shown earlier to get the 4 blending functions.
(Refer Slide Time: 37:17)
230
Now, assume that we are using linear B-splines; that means B-spline pieces made up of linear polynomials.
(Refer Slide Time: 37:30)
Then, each B-spline will have two linear pieces as per the following equation. This is the B-spline and these are the linear pieces, each defined within a range of t.
(Refer Slide Time: 37:58)
231
So, as you can see in the previous formulation, each B-spline is defined between sub intervals of the parameter t. Now, within this sub interval, the pieces that are the constituents of the B-spline have their own ranges. So, we have a range for the B-spline and we have sub ranges for the constituent polynomial pieces of the spline.
(Refer Slide Time: 38:27)
Now, those points in the parameter range where a piece starts or ends are called knot points, or simply knots. For the ith B-spline, the knots are i, i+1 and i+2. Just to recollect: we have a spline made up of B-splines. Each B-spline is defined within a range, and that range is subdivided for each constituent piece of the B-spline. The points where a piece starts or ends are called knot points or simply knots, and for the ith B-spline, the knots are i, i+1 and i+2.
232
(Refer Slide Time: 39:21)
For the 4 control points in our example, i takes the values 1 to 4. So, then we can have the knots as 1, 2, 3, 4, 5, 6; these are the 6 knot points, and this set of points is the vector of knots, also known as the knot vector, which is, of course, in increasing order of parameter values.
(Refer Slide Time: 39:57)
So, then to summarize: each B-spline is made up of k polynomial pieces, where k equals d+1 and d is the degree of the polynomial. The parameter t ranges between 1 and n+k, with n+k knots, where n is the number of control points. In our example n is 4 and k is 2 since d is 1, so the range is between 1 and 6, with 6 knots.
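As a small illustration of these ranges (my own sketch in Python, following the linear-piece definition given above), the ith linear B-spline is nonzero only between the knots i and i+2:

def linear_bspline(i, t):
    # ith B-spline for k = 2: two linear pieces between the knots i, i+1 and i+2.
    if i <= t < i + 1:
        return t - i            # rising piece on [i, i+1)
    if i + 1 <= t < i + 2:
        return (i + 2) - t      # falling piece on [i+1, i+2)
    return 0.0                  # zero outside its knot span

# With n = 4 control points and k = 2, the knot vector is (1, 2, 3, 4, 5, 6)
# and the curve uses the four B-splines for i = 1, 2, 3, 4.
for i in range(1, 5):
    print(i, [linear_bspline(i, t) for t in (1.5, 2.0, 2.5, 3.5, 4.5)])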
233
(Refer Slide Time: 40:38)
The ith B-spline is defined between the knots i and i+k. For example, the second B-spline, that means i=2, is defined between 2 and 2+2 or 4, because k is 2. So, between 2 and 4, that means the knot values are 2, 3 and 4.
(Refer Slide Time: 41:10)
So, each B-spline has k+1 knots. In our example, since k equals 2, each B-spline will have 3 knots; for any other value of k the B-spline will have k+1 knots. So, these are the characteristics of the B-spline.
234
(Refer Slide Time: 41:32)
Another characteristic is that k is actually very crucial in determining all these properties. So, this k value is often called the B-spline parameter. You should keep in mind that k is very important; it plays a very crucial role in determining the B-spline characteristics that we have enumerated earlier, and so k is often called the B-spline parameter. Now, let us see the various types of B-splines. So far, we have given a basic introductory idea of B-splines; now let us see the types that are there.
235
(Refer Slide Time: 42:29)
There are 3 types: uniform B-splines, non-uniform B-splines and NURBS or non-uniform rational B-splines. We will briefly introduce each of these types.
(Refer Slide Time: 42:44)
So, when the knots are uniformly spaced in the knot vector, like in the case of the example we have seen before, those B-splines are called uniform B-splines. If the knots are not uniformly spaced, that is, the spacing between consecutive knots is not the same, then we get non-uniform B-splines.
236
(Refer Slide Time: 43:20)
And finally, NURBS refers to B-splines that are defined as the ratio of two quantities, shown here in this expression, where the weights are scalars and the bi's are non-uniform B-splines; it should also be noted that the same B-splines are used in both the numerator and the denominator. So, NURBS essentially refers to this ratio. To summarize: we have uniform B-splines, where the knot points are uniformly spaced; we have non-uniform B-splines, where the knot points are not uniformly spaced; and we have NURBS, where we take the ratio of two quantities, each of which is made up of the same non-uniform B-splines.
237
(Refer Slide Time: 44:33)
Just for your information, although we will not go into the details here, we can derive the piecewise functions for B-splines, but that is complicated; there is one recurrence relation we can use to do that, called the Cox-de Boor recurrence relation. If you are interested, you may refer to the book and the material mentioned at the end of this lecture. So far, we have discussed different types of splines. Now, the basic idea of splines is that using them we should be able to represent some curves and render them on the screen. So, how can we display spline curves?
(Refer Slide Time: 45:27)
238
So, this is where we started: how to fit a curve to a given set of control points. We are given a set of control points and we want to fit a curve, and we discussed how to use the idea of splines to get the curve.
(Refer Slide Time: 45:49)
Now what can we do? So, we got the spline. Then the simplest way is to determine or interpolate new control points using the spline equation and then join these points using line segments. We can also use the blending function representation for interpolating these points.
239
(Refer Slide Time: 46:14)
As an example, suppose we are given two control points, p0 and p1; this is p0 and this is p1. Since two control points are there, we can use a linear interpolating spline, and once we identify the spline representation using the approaches mentioned already, we can evaluate the spline function to create 3 new control points, shown here. We take the parameter values for these new points, defined here, and then we get the actual function values.
So, we are given two control points, we are using a linear interpolating spline and, using that spline, we are creating 3 new control points. Once these points are created, we can join them using line segments to get the curve.
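A minimal sketch of this idea in Python (my own code), assuming the linear interpolating spline p(u) = (1 - u) p0 + u p1 and three intermediate parameter values chosen purely for illustration (the slide's exact values are not reproduced here):

def lerp(p0, p1, u):
    # Linear interpolating spline between two control points, u in [0, 1].
    return ((1 - u) * p0[0] + u * p1[0],
            (1 - u) * p0[1] + u * p1[1])

p0, p1 = (0.0, 0.0), (4.0, 2.0)
# Evaluate the spline at three intermediate parameter values to get 3 new points.
new_points = [lerp(p0, p1, u) for u in (0.25, 0.5, 0.75)]
# The displayed curve is then the polyline p0 -> new points -> p1,
# obtained by joining consecutive points with line segments.
polyline = [p0] + new_points + [p1]
print(polyline)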
240
(Refer Slide Time: 47:35)
And we can keep on adding more and more points to get a final curve.
(Refer Slide Time: 47:42)
In fact, if more control points are there, we can use them to get better splines, higher order splines instead of linear splines, and follow the same approach. So, anything is possible.
241
(Refer Slide Time: 48:04)
But the main problem is that we have to evaluate the blending function again and again. Now,
when we want to display the splines at a very high rate, these computations may be costly and
may slow the rendering process. So, we require some alternative approach.
(Refer Slide Time: 48:22)
One of those approaches is known as the De Casteljau algorithm, and we will have a quick look at it.
242
(Refer Slide Time: 48:36)
So, the algorithm consists of a few steps. There are n control points given. First, we join the consecutive pairs of control points with line segments. Then we divide each line segment in the ratio d : 1-d to get n-1 new points, where d is a real number between 0 and 1. We join these new points with line segments, go back to the first step, and continue till a single point is obtained. So, this is, in short, how it works.
(Refer Slide Time: 49:31)
243
Let us see one example. Suppose here n equals 4 and we have decided on d to be one third. So, p0, p1, p2, p3 are the initial points. Then at one third of the pair p0 and p1 we take one point; similarly, at one third of the pair p1, p2 we take another point, and then another. These new points now play the role of the control points, and we continue in the loop to finally get a point here after the algorithm is executed.
So, this division continues till we get one point, and at that point we stop. That is the output point. This is, in short, how we can use a simpler, iterative procedure to get points on the curve using the algorithm.
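A compact Python sketch of the procedure just described (my own implementation; for a point on the curve, d is taken between 0 and 1, for example one third as in the example above):

def de_casteljau(points, d):
    # Repeatedly divide each line segment between consecutive points in the
    # ratio d : 1-d, until a single point remains; that point lies on the curve.
    pts = list(points)
    while len(pts) > 1:
        pts = [((1 - d) * x0 + d * x1, (1 - d) * y0 + d * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

# n = 4 control points and d = one third, as in the example.
control = [(0, 0), (1, 2), (3, 2), (4, 0)]
print(de_casteljau(control, 1 / 3))

# Running it for m different values of d gives m points on the Bezier curve,
# which can then be joined with line segments to draw the curve.
curve = [de_casteljau(control, i / 20) for i in range(21)]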
(Refer Slide Time: 51:01)
So, what we are doing in each run is getting a new point, and we can execute the algorithm as many times as we want; to get m new points we run it m times. When these points are joined with line segments, we get the Bezier curve. It has to be noted here that we get this particular curve, not all spline types; this algorithm actually refers to the creation of a Bezier curve, which is an approximating spline. We can execute the algorithm m times to get m points on the Bezier curve. So, that is how we can get the points and create the curve.
Now there is another thing in computer graphics: we are not necessarily satisfied with only curves. What we need are surfaces also, curved surfaces. So, using the spline idea, how can we get curved surfaces? It is very simple.
244
(Refer Slide Time: 52:13)
Look at this figure here. We can actually create a grid of control points, as shown in this figure. Each subset of the grid can be used to get a curve in different ways, and when we put these curves together, we get a surface. So, to get the surface, we define a grid, shown by the circles here; using one subset, we get one spline curve, using another subset, we get another spline curve which intersects it, and when we get this grid of curves, we get the surface, the surface is defined by that grid. So, this is the simple way of getting a surface.
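The lecture keeps the surface construction at this conceptual level. As one possible concrete reading (a Python sketch under my own assumptions, not the lecture's formulation), a tensor-product patch blends a grid of control points with one blending function along each grid direction:

from math import comb

def bernstein(k, n, u):
    return comb(n, k) * (u ** k) * ((1 - u) ** (n - k))

def surface_point(grid, u, v):
    # grid is a 2D list of 3D control points; blend the rows with parameter u
    # and the columns with parameter v to get a point on the curved surface.
    m, n = len(grid) - 1, len(grid[0]) - 1
    x = y = z = 0.0
    for i, row in enumerate(grid):
        for j, (px, py, pz) in enumerate(row):
            w = bernstein(i, m, u) * bernstein(j, n, v)
            x += w * px
            y += w * py
            z += w * pz
    return (x, y, z)

# A 3 x 3 grid of control points defines one small curved patch.
grid = [[(i, j, (i - 1) ** 2 + (j - 1) ** 2) for j in range(3)] for i in range(3)]
print(surface_point(grid, 0.5, 0.5))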
So, to recap, today we have learned many new things. First of all, we learned about different types of splines. We talked about 3 interpolating splines: natural cubic, Hermite cubic and Cardinal cubic. We also talked about two approximating splines, namely Bezier curves and B-splines.
The idea of B-splines is slightly complicated, and we explained it in terms of an example. Then we touched upon the idea of how to display a spline curve. The simple approach is: first get the spline equations or spline representation of the curve, use that to interpolate new points on the curve, and join those points using line segments to get the final curve. But here, evaluating the spline equations at each new point may be time consuming, so it may not be an efficient solution.
So, to avoid that, some efficient algorithms are proposed. One of those is the De Casteljau method. In this approach, we can get points on Bezier curves using a simple iterative procedure which does not involve lots of computations. We also touched upon the idea of spline surfaces, that is, essentially creating a set of curves and then using those curves to define the overall surface.
(Refer Slide Time: 55:10)
Whatever we have discussed today can be found in Chapter 2 of this book. So, you can refer to Section 2.3 of the book to get more details, including some topics that we have excluded from our discussion today, namely, the Cox-de Boor equations used to get the B-spline blending functions.
With this, we complete our discussion on splines. One more topic is left in the overall discussion on object representation, which we will cover in the next lecture. Thank you and goodbye.
246
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 9
Space Representation Methods
Hello and welcome to lecture number 9 in the course Computer Graphics. We are discussing
the stages of the graphics pipeline. As you may recollect, there are 5 broad stages, the very
first stage is the object representation, and we are currently discussing the object representation techniques.
(Refer Slide Time: 0:55)
This figure may help you recollect the stages. We have the first stage as object representation, then we have the second stage as modeling transformation, the third stage is lighting or coloring, and the fourth stage is the viewing pipeline, which itself consists of 5 sub stages: the viewing transformation, clipping, hidden surface removal, projection transformation and window to viewport transformation. And finally, we have the scan conversion, which is the fifth and final stage. Now, we are discussing the first stage, object representation.
247
(Refer Slide Time: 1:44)
So far what we have discussed, let us have a relook. There are broadly 4 techniques that are
popularly used for representing objects, one is point sample rendering, then we have
boundary representation. Now, in boundary representation, there are three techniques: mesh
representation, parametric representation and implicit representation. In the previous few
lectures, we talked about point sample rendering, the boundary representation techniques
including mesh representation, parametric representation and implicit representation.
Also, in the previous couple of lectures, we discussed in detail one of the very popular parametric representation techniques, namely the spline representation. The other techniques include the space representation or space partitioning based representation technique, which itself has 3 sub techniques, namely octree, BSP or binary space partitioning, and CSG. Then, we have sweep representation, having further techniques, namely sweep surface representation and surface of revolution representation.
Apart from these four, there are some other representation techniques which are application specific, and some advanced techniques such as scene graphs, skeleton models and advanced modeling techniques including particle systems, fractal systems and so on. We have got some idea of the various representation techniques in our previous lectures. Today we are going to discuss in detail the space partitioning techniques for representing objects, including the sub techniques.
248
(Refer Slide Time: 3:56)
So, that will be the topic of our discussion today, space partitioning methods.
(Refer Slide Time: 4:04)
249
Now, when we talk about space partitioning, what do we refer to? As the name suggests, space partitioning refers to the representation of an object in terms of the 3D space that it occupies. The space is defined by its boundaries, and we represent the space or the volume that is enclosed by the boundaries, rather than the boundaries themselves, which is what we have seen in the earlier lectures.
(Refer Slide Time: 4:43)
Now, there is one important concept in space partitioning techniques, called voxels. You have heard of pixels; we mentioned this term earlier as the smallest display unit, and on a display screen we assume that there is a pixel grid. Similarly, a voxel is basically the 3D counterpart of the concept of a pixel: the voxel is the smallest unit of representation in a 3D space, and any 3D space can be assumed to have a voxel grid.
250
(Refer Slide Time: 5:35)
Like a pixel grid, we can create or assume that a voxel grid exists to represent a 3D space. So,
a pixel grid is for 2D space, a voxel grid is for 3D space. And the voxel, as may be obvious, is the simplest way of representing 3D objects.
(Refer Slide Time: 6:12)
Now, when we talk about voxels, we are essentially referring to typically uniform sized cubes or parallelepipeds. So, essentially, a grid of uniform sized cubes or parallelepipeds, that is our voxel grid. Now, each voxel element of the grid typically carries various information. What is that information? The intensity at that particular 3D region, the temperature of that region and so on. And this information actually helps in uniquely describing the 3D scene. So, essentially voxels have attributes which are characteristic information for the particular object in the scene.
(Refer Slide Time: 7:25)
Using voxels, we can apply various techniques to represent objects. One such technique is the octree method. What is this method? As the name suggests, it is a kind of tree representation; now let us try to understand what this tree is.
(Refer Slide Time: 7:51)
So, essentially it refers to a method to create a voxel grid, but in the form of a tree.
252
(Refer Slide Time: 7:59)
So, in this method, the input is a 3D region. Now, we divide this input space into 8 sub
regions, then each sub region in turn is again divided into 8 sub sub regions. So, essentially
this is a recursive step and this recursion continues till we arrive at some preset uniform size
of the sub regions. Now, this size can be user defined, for example, a unit cube. So, when we
see that our division leads to unit cube sized regions, then we stop there, an example is shown
in this right-hand figure.
So, this is our initial 3D region. Now, as you can see, we have created 8 sub regions from here, this is 1, 2, 3, 4, 5, 6, 7 and 8. Then each of these sub regions is further divided into 8 sub sub regions, like here, as you can see in this division, again there are 8 sub regions 1, 2, 3, 4, 5, 6, 7, 8. Similarly, here also we divided, and all the other sub regions we can divide in the same way.
253
(Refer Slide Time: 9:50)
Now, what is the output of this recursion? It creates a tree. So, we have a root node. For the root node, we have created 8 child nodes, from here to here. Now for each child node, again we are creating 8 child nodes, and this process continues. So, essentially this leads to a tree. Since each node has 8 children, we call it an octree, and the leaf nodes of this tree represent the 3D space. So, all the intermediate nodes are intermediate representations; they do not represent the final object, and at the leaf level the final object is represented.
Remember here that along with the space, the attributes or the characteristic information are also stored at the leaf node level.
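A minimal Python sketch of this recursive construction (my own simplified data structure; a region is taken as an axis-aligned cube given by its minimum corner and edge length, and the attribute information is omitted):

class OctreeNode:
    def __init__(self, corner, size):
        self.corner = corner     # (x, y, z) of the region's minimum corner
        self.size = size         # edge length of the cubic region
        self.children = []       # 8 children for internal nodes, none for leaves

def build_octree(corner, size, min_size=1.0):
    # Recursively divide the region into 8 sub regions until the preset
    # uniform size (for example, a unit cube) is reached.
    node = OctreeNode(corner, size)
    if size <= min_size:
        return node              # leaf: one voxel of the final representation
    half = size / 2.0
    x, y, z = corner
    for dx in (0, half):
        for dy in (0, half):
            for dz in (0, half):
                node.children.append(
                    build_octree((x + dx, y + dy, z + dz), half, min_size))
    return node

root = build_octree((0.0, 0.0, 0.0), 4.0)   # a 4x4x4 region -> 64 unit-cube leaves
print(len(root.children))                    # 8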
(Refer Slide Time: 10:59)
254
So, in order to represent different objects, we can associate unique properties with those leaf nodes or voxels, properties such as color, density, temperature and so on; that is the basic idea. So, we are given a 3D space, the space is divided into 8 regions and we continue this division: each sub region is divided further into 8 sub sub regions and so on. We continue this division in a recursive manner till we reach the voxel level or the uniform sized cubes. This process creates a tree; each node in this tree has 8 children, so the tree is called an octree.
The leaf level contains the voxels, which are the representation of the 3D object, and characteristic information such as color, density, temperature and so on is associated with each voxel in this tree to uniquely identify objects. The octree is one of the various methods where space is used to represent the objects. Another popular method is the BSP method.
(Refer Slide Time: 12:51)
Now, BSP stands for binary space partitioning, and the method is the binary space partitioning method. So, what does it do? In the octree, we have done some recursive partitioning of space. In the BSP tree, we follow a similar recursion, that is, we are given a space, we divide it into subspaces, then divide the subspaces again and continue till some condition is met.
255
(Refer Slide Time: 13:26)
However, instead of dividing a space into 8 subspaces like we have done in the case of the octree, here we divide it into 2 subspaces in each recursive step. So, in the case of the octree we are dividing the region into 8 sub regions; in the case of the BSP tree we are dividing the region into 2 sub regions, and that is why it is called binary space partitioning. Binary, or two.
(Refer Slide Time: 14:01)
Now, how do we divide? We use planes. So, we are given a region, and we assume planes are available to divide the region into sub regions. But these planes need not be parallel to the planes formed by the axes XY, YZ or ZX; they can be planes of any orientation. Of course, if the planes are parallel to the planes formed by the principal axes, then it is easier to implement, otherwise we have to do additional computations.
(Refer Slide Time: 14:49)
Let us see one example. Suppose we are given this 3D region and we are going to represent it in the form of a BSP tree. So, our root node is the whole object. Then we have used a plane, this one, to divide it into two regions, the left region B and the right region C, left with respect to the plane and right with respect to the plane, the left sub region and the right sub region. Then we have used another plane here to divide B into two sub regions, D and E. So, eventually what we got is that the object is represented in terms of three sub regions, D, E and C. These three, at the leaf level, represent the object. And again, like in the case of the octree, we can associate unique properties with each of these sub regions to uniquely identify the object.
Now, here we have used two planes which are orthogonal to each other, they are orthogonal planes, but, as we have already seen, that is not a strict requirement; planes of any orientation can be used to divide a region into two sub regions.
So, here, instead of these orthogonal planes, we could have used any plane of any orientation to divide it into two sub regions; that is the most general formulation for BSP tree creation. But, as I said, if we are using planes that are not parallel to the planes formed by the principal axes, then additional computations may be required to process the representation. Let us see one more example of BSP tree creation, that is, how we can write an algorithm for the creation of such a representation.
257
(Refer Slide Time: 17:29)
Consider this figure; here we want to represent this circle. This is of course a two dimensional figure, it is not a 3D object, but we want to represent this 2D object, the circle having its center at Pm and radius r, using a BSP tree.
(Refer Slide Time: 18:01)
How can we do that? We take as input the surrounding region R, which is the square represented by the four vertices or corner points P1, P2, P3 and P4, and the circle lies within this region. So, when we are representing the circle, we are essentially representing this region with some sub regions that are part of the circle, having the unique characteristics of the circle. How can we do that?
258
(Refer Slide Time: 18:46)
So, we will partition the region into 4 sub regions. That is, first we divide it into 2 sub regions, then each of these 2 sub regions we divide into 2 further sub regions, and so on. Just for simplicity, we are combining these steps together here and stating that we are dividing it into 4 sub regions. And here we are using lines parallel to the axes.
(Refer Slide Time: 19:25)
Using that idea, what can we do? We can write an algorithm for a function, say create-BSP, where R is the input region. Now, we will use R to create a node; then from R we will create 4 regions by joining the midpoints of the sides of the square. This is the same as applying the binary space partitioning technique in multiple steps: like in this case, we first create 2 regions, then for each of these two we create two more, and we are actually combining these steps here in this line.
259
Now, for each of these sub regions, here at the leaf nodes of this figure, we check its size; suppose the region is divided into 4 sub regions and we are considering one of them. If the size of the sub region is a unit square, where we are assuming that a pixel is represented as a unit square, then we go to the next step: if the distance between the centers of the original region R and of the sub region that we are considering currently is less than or equal to the radius of the circle, then this sub region is part of the circle.
So, we add it as a leaf node to the BSP tree and mark it as 'inside'. Otherwise, we mark it as 'outside', although we still add it as part of the BSP tree. Now, if the size is not a unit square, that is, we have not reached the termination condition of the recursion, then for each sub region we call the function again recursively, as mentioned in these 2 steps. So, at the end of it we get a tree, where the leaf nodes represent the bounding region, the original region divided into sub regions.
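The steps above can be written out roughly as follows (a Python sketch of the described procedure with my own function and field names; the region is kept as an axis-aligned square, and the circle center and radius are passed in explicitly):

from math import dist   # Euclidean distance (Python 3.8+)

class BSPNode:
    def __init__(self, corner, size, label=None):
        self.corner, self.size = corner, size   # a square sub region
        self.label = label                      # 'inside' / 'outside' for leaf nodes
        self.children = []

def create_bsp(corner, size, center, radius):
    # Build the tree for the square region that bounds the circle.
    node = BSPNode(corner, size)
    x, y = corner
    sub_center = (x + size / 2.0, y + size / 2.0)
    if size <= 1.0:                             # a pixel-sized (unit) square
        # Leaf: mark it 'inside' if its center lies within the circle's radius.
        node.label = 'inside' if dist(sub_center, center) <= radius else 'outside'
        return node
    half = size / 2.0
    # Joining the midpoints of the sides gives 4 sub regions
    # (two binary partitioning steps combined, as in the lecture).
    for dx in (0, half):
        for dy in (0, half):
            node.children.append(create_bsp((x + dx, y + dy), half, center, radius))
    return node

def count_inside(node):
    # Count the unit squares marked as part of the circle.
    if node.label == 'inside':
        return 1
    return sum(count_inside(c) for c in node.children)

# A circle of radius 3 centered in an 8 x 8 bounding region.
tree = create_bsp((0.0, 0.0), 8.0, (4.0, 4.0), 3.0)
print(count_inside(tree))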
Now, some of these sub regions are marked as inside this particular object, inside the circle, whereas others are marked as outside. So, from that tree, we get a representation of the circle. So far so good. Now, what is the problem? Is there any problem with this space partitioning approach?
(Refer Slide Time: 23:00)
One problem may be very obvious to you by now: we require a large amount of memory to store the voxel grid information. So, we are creating a tree where the leaf nodes represent the grid, and if we are partitioning uniformly, then there will be a large number of voxels and we will require a significant amount of memory to store the voxel information. The problem comes because we are dividing space into uniform sized voxels irrespective of the space properties.
As we have seen in the earlier example, the region R is a big region and within it the circle occupies a small area, but we are dividing the region into uniform sized sub regions and many sub regions may be outside the circle. So, if we want to represent the circle, it is not necessary to represent those regions which are outside the circle; that actually wastes some memory space. If we have some method to avoid this wastage, then of course that would be helpful.
(Refer Slide Time: 24:21)
So, what do we want? If a certain region in space has the same properties everywhere, then ideally we should not divide it further, because even if we divide it, whatever we get will have the same property. Instead, what we currently do is still divide it into voxels, although each voxel contains the same attributes, because the broader region in space has the same property everywhere.
261
(Refer Slide Time: 24:59)
So, we can actually save memory (there is a typo on the slide; the word should be 'save') by using this idea of non-uniform partitioning. Earlier we were talking of partitioning a region into uniform sized voxels; now we are talking of non-uniform sized space partitioning. What is that? Instead of using many voxels to represent a homogeneous region, that is, a region where the property is the same everywhere, we use a single unit. So, we do not divide that region further; instead, the whole region is represented as a single unit.
(Refer Slide Time: 25:47)
How can we do that? One way is to modify the octree method. Earlier we had a fixed 8 children for each node, because we were dividing the region into 8 sub regions irrespective of the property of the region. Now, what we can do is either divide or not divide. So, if one sub region has the same property everywhere, then we do not divide it, and that node will not have any children, or 0 children. But if that is not the case, then we divide it into 8 sub regions, or 8 children.
So now, in the revised octree method, we will have either 0 or 8 children for each node, which was not the case earlier. Earlier we were getting 8 children for each intermediate node; now we will get either 0 or 8 children for each intermediate node, depending on the property of the space represented by the node.
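In code, the only change to the earlier octree sketch is an extra test before subdividing; the homogeneity test itself is application specific, so it appears here as an assumed helper passed in by the caller (a sketch, not a full implementation):

class Node3D:
    def __init__(self, corner, size):
        self.corner, self.size, self.children = corner, size, []

def build_adaptive_octree(corner, size, is_homogeneous, min_size=1.0):
    # is_homogeneous(corner, size) is an assumed, application-specific test that
    # returns True when the region has the same attributes everywhere.
    node = Node3D(corner, size)
    if size <= min_size or is_homogeneous(corner, size):
        return node             # leaf: 0 children, the region is not divided further
    half = size / 2.0
    x, y, z = corner
    for dx in (0, half):
        for dy in (0, half):
            for dz in (0, half):
                node.children.append(build_adaptive_octree(
                    (x + dx, y + dy, z + dz), half, is_homogeneous, min_size))
    return node                 # internal node: exactly 8 children

# Example: only the corner region near the origin is treated as non-homogeneous.
root = build_adaptive_octree(
    (0, 0, 0), 8.0,
    lambda c, s: not (c[0] < 2 and c[1] < 2 and c[2] < 2))
print(len(root.children))       # 8 at the top level, but most branches stop early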
(Refer Slide Time: 26:58)
The recursion procedure, of course, will remain the same. But at each step of the recursion, we will check if the attributes of the space are the same everywhere. If that is the case, then we do not divide it any further; we do not go for recursive division again for that particular region.
263
(Refer Slide Time: 27:35)
BSP representation also suffers from the same problem if we are going for uniform
partitioning.
(Refer Slide Time: 27:54)
And to avoid that, we can go for a revised or refined method where we either divide a region into 2 sub regions, if the region has different properties at different places, or we do not divide it at all. So, the intermediate nodes will have either 0 or 2 children, similar to the revised octree method. What we have learned, then, is that the basic unit of representation is the voxel. Using voxels we can divide a given 3D space, to represent the object contained in that space, in two ways; one is uniform division.
264
Now, when we go for uniform division, we will get one particular type of tree if we are following the octree method and another if we are following the BSP method. In the octree method, with uniform division, we will get 8 children for each node; in the BSP method, we will get two children for each node. Now, if the region which we are dividing has the same attributes or properties everywhere, then it will be a wastage of memory to divide that region into sub regions and store the information.
Instead, we need not divide it any further. So, in the case of octree method, we modify it for
non-uniform partitioning by imposing the fact that a node can have either 0 or 8 children.
Similarly, in a revised BSP method, we can modify by imposing the fact that each
intermediate node can have 0 or 2 children.
(Refer Slide Time: 29:56)
There is another space partitioning method known as CSG. What does it mean?
265
(Refer Slide Time: 30:07)
It stands for Constructive Solid Geometry. This is another method for space representation of objects. Now, in the case of the octree or BSP, what have we seen? We have seen that these representations are based on division of space: we are given a space and we divide it into subspaces. In comparison, what do we have in the case of CSG? It is actually based on joining up spaces, just the opposite of what we do in the BSP or octree methods. So, in the case of CSG we rely on joining of spaces to represent objects. Let us try to understand this with an example.
(Refer Slide Time: 31:00)
Consider this object; it looks pretty complex. However, when we are using the CSG or constructive solid geometry method, we can represent it as a union of some other shapes. So, we start here at this level: here we have this shape, this shape, this shape and this shape, so 4 shapes are there. Now, we combine these 2 shapes to get one shape and we combine these 2 shapes here to get this shape. These 2 shapes are then further combined to get the overall shape.
So, here we start with a set of primitive or basic shapes; then we apply a set of Boolean operators, namely union, intersection, difference, etc., which are defined over this set of primitive shapes, to get new shapes. Like here, on these 2 shapes we applied a Boolean operator to get a new shape, and this process continues till we get the desired shape here. So, the operators can be applied hierarchically: from this level we reach this level, from this level we reach this level, and at each level we apply the Boolean operators.
So that is a hierarchical application of operators. In other words, we have a set of primitive shapes defined, and a set of Boolean operators on those shapes also defined. We apply those operators on the shapes to get new shapes, and we apply them hierarchically till we get the desired shape, till we are able to represent the desired shape. So, what is the representation? The representation is essentially the primitive shapes and the Boolean operators applied in a hierarchical manner.
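One common way to make this concrete (a sketch of the general idea in Python, under my own assumptions rather than the lecture's exact formulation) is to store each primitive as a point-membership test and let the Boolean operators build new shapes hierarchically:

# Each primitive shape is represented by a membership test: is a point inside it?
def sphere(center, r):
    cx, cy, cz = center
    return lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2 <= r * r

def box(lo, hi):
    return lambda p: all(l <= c <= h for c, l, h in zip(p, lo, hi))

# Boolean operators defined over these shapes produce new shapes.
def union(a, b):        return lambda p: a(p) or b(p)
def intersection(a, b): return lambda p: a(p) and b(p)
def difference(a, b):   return lambda p: a(p) and not b(p)

# Hierarchical application of the operators: a box with a spherical hole,
# joined with another sphere.
shape = union(difference(box((0, 0, 0), (2, 2, 2)), sphere((1, 1, 1), 0.8)),
              sphere((3, 1, 1), 1.0))
print(shape((1.0, 1.0, 1.0)))   # False: this point lies in the carved-out hole
print(shape((0.1, 0.1, 0.1)))   # True: inside the box, outside the hole
print(shape((3.5, 1.0, 1.0)))   # True: inside the second sphere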
So, that is in summary, what this constructive solid geometry is all about. Now, let us
summarize what we have learned so far.
(Refer Slide Time: 34:07)
Today and in the previous few lectures we have learned various representation techniques, which constitute the first stage of the pipeline: boundary representation including splines, space representation, and an overview of other representations. Now, there are many techniques, as we have seen so far. One question is which technique to use and when? That is a question that always confronts a graphics system developer. Which technique should be used and in which situation? What guides that decision making process?
(Refer Slide Time: 34:55)
Now, we may wish to have the most realistic representation. Clearly, advanced techniques would be useful, but advanced techniques such as particle systems may not always be possible because they require lots of computational resources, which may not be available in all graphics systems. So, each technique comes with a cost and there is a trade-off regarding which technique to use in which situation.
(Refer Slide Time: 35:26)
268
Now, this cost may be a computational or a storage requirement, or both, and depending on the resources available we have to make the decision. If we have very limited resources, then we cannot think of advanced modeling techniques; we may have to lose some realism and settle for less realistic scenes or images.
(Refer Slide Time: 36:07)
For example, suppose we want to represent coastlines in a scene. What is desirable? Coastlines are typically self-repeating structures, and fractal representation is very good for such shapes. So, ideally we should use a fractal representation to have a realistic coastline displayed. But it may not be possible if we are considering a mobile platform with limited storage, processor and battery, because fractal representation has an additional computational cost which may not be supported on low end mobile devices.
So, in that case, we may have to use some simpler method. We are losing something, the realistic effect, but gaining something, a working approximation. Such trade-offs, and balancing them, are very much part of the decision making process.
269
(Refer Slide Time: 37:18)
Another consideration may be ease of manipulation. When we are creating animation, this consideration becomes important. Now, as we said, each representation has its own methods, its own algorithms, its own process. Subsequent stages of the pipeline need to process these representations to carry out the operations that are part of those stages, and that requires manipulation of the representations.
Now, suppose it requires a lot of time to manipulate the representation, for example, to rotate an object, shift objects horizontally or vertically, scale an object up or down, or clip, project or perform hidden surface removal, etc.; these are part of other stages of the pipeline, which we will discuss in detail later. If a lot of time is required to perform these operations for objects that are represented in a particular way, the quality of animation may be reduced. So, in such cases, we should look for simpler representations.
So, this is another consideration that should be kept in mind while choosing a particular
model of representation, that is ease of manipulation. So, if we are having low end devices,
low end graphics systems, then it is not advisable to go for advanced representation
techniques which require lots of manipulations later, particularly in the context of animation.
270
(Refer Slide Time: 39:18)
A third consideration is ease of acquiring data. For example, a vertex list representation for a curved surface using a mesh requires a large number of vertices to be specified. Now consider a spline representation; in that case, we require fewer control points to represent the same curve. So, clearly, representing a curve using a mesh involves more effort for acquiring data, whereas representing the same curve with a spline involves less effort to acquire data.
So, sometimes this ease of acquiring data can be a deciding factor; the designer of the graphics system may choose the particular method for which the data can be acquired most easily.
(Refer Slide Time: 40:22)
271
So, there are several trade-offs we have to balance. What are those trade-offs? The resources available, in terms of the computing platform; the nature of interaction, in terms of our level of comfort with supplying a large amount of data; and also the effect, whether we want a very realistic effect or we are looking for some approximation considering the lack of resources.
(Refer Slide Time: 41:07)
So, depending on these 3 broad considerations, we have to choose a particular modeling technique.
So, with this, we have come to the end of our discussion on the first stage of the graphics pipeline, that is, object representation. In the next lecture, we will start our discussion on the second stage of the pipeline, namely, modeling or geometric transformations.
(Refer Slide Time: 41:41)
272
Whatever I have discussed today can be found in this book. You are advised to go through Chapter 2, Sections 2.4 and 2.6, to get more details on the topics that I have covered today. Thank you and goodbye.
273
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 10
Introduction to Modeling Transformations
Hello and welcome to lecture number 10 in the course Computer Graphics. We were
discussing the graphics pipeline. To recollect, in computer graphics what we do is we
generate 2D images on a computer screen and the process of generating these 2D images
involves a set of stages, together these stages constitute the graphics pipeline. So, let us have
a relook at the 3D graphics pipeline.
(Refer Slide Time: 1:15)
When we are talking of 3D graphics pipeline, just to refresh our memory, we are talking of
creation of a 2D image from a 3D description of a scene.
274
(Refer Slide Time: 1:25)
So, what are the stages that are part of the pipeline? First stage is object representation,
second stage is modeling transformation, third stage is lighting or coloring, fourth stage is
viewing pipeline. Now, note here that we are calling this stage a pipeline itself; that means it consists of some more sub stages. There are 5 such sub stages: viewing transformation,
clipping, hidden surface removal, projection transformation and window to viewport
transformation. And the final stage is scan conversion, also called rendering.
Among these stages, in the previous few lectures, we have covered the first stage, that is
object representation. So, we now know how we can represent the objects that constitute a
scene. Again, if I may repeat, the idea of computer graphics is to convert this representation
to a sequence of 0s and 1s and that conversion takes place through the execution of these
stages. The very first stage is object representation which we have already discussed.
Today, we are going to start discussion on the second stage, that is modeling, also called
geometric transformation.
275
(Refer Slide Time: 3:10)
So, what is there in this stage?
(Refer Slide Time: 3:14)
Now, when we talked about representing objects in the earlier lectures, what we implicitly
referred to is that the objects were represented individually. Now, when we are defining those
objects individually, we are implicitly using one coordinate system that is typically called
local or object coordinate system. So, for each object definition we have a local or object coordinate system. Now, within this coordinate system we are defining the object.
276
(Refer Slide Time: 3:57)
What does that mean? It means that, at the time of defining objects, we are actually not bothering
too much about the object shape, size and position relative to other objects. So, we have
defined an object, but in essence there may be multiple objects where this particular object
may have a relative size or relative position or a relative orientation. But when we are
defining the object in its own local coordinate, we are not paying too much attention to those
factors.
(Refer Slide Time: 4:42)
But, when we are going to compose a scene, which comprises all the objects that we have defined, the objects need to be assembled together. Now, when we are talking of assembling the objects, what does it mean? It means that the objects should be put together in such a way that the overall scene becomes perceptible. So, the object shape, size and orientation now become very important. When we are defining objects in their own coordinates those things are not important, but when we are assembling them, they become important.
So, when we are trying to compose a scene by taking care of the relative shape, size,
orientation and position of the objects with respect to other objects, we are again implicitly
assuming another coordinate system, that is typically called the scene or more popularly the
world coordinate system. So, earlier we dealt with local or object coordinate system, now, we
have to deal with another coordinate system that is popularly known as the world coordinate
system.
(Refer Slide Time: 6:06)
And as I said, in the world coordinate system, the shapes, sizes, positions and orientations of these individual objects need to be taken care of so that we can construct the scene. So, those become very important now, in the world coordinate system.
278
(Refer Slide Time: 6:29)
How can we do that? Earlier we defined objects in their own coordinate systems without bothering about the relative shape, size, position, orientation, etc. Now, we are going to assemble them together in the world coordinate scene, and we have to think about those shapes, sizes, positions and orientations, so that the scene becomes perceptible. We can do that by applying some operations, by performing some operations.
These operations will transform the objects from their local coordinates to world coordinates. So, in order to create the scene by assembling the objects together, which in turn are defined in their own coordinate systems, what we need to do is perform some operations to transform the objects from their local coordinate description to the world coordinate description.
(Refer Slide Time: 7:41)
279
Now, these operations or transformations take place in the second stage of the graphics pipeline. Let us see one example. Here we have 2 objects, object 1 in the leftmost figure and object 2 in the middle figure. Now, we want to create the object shown in the right hand figure. As you can see, in this overall object we have these cylinders, how many instances? 1, 2, 3, 4; and the other shape, how many instances? 1, 2 and 3.
So, we have 4 instances of this object and 3 instances of the other object. And the way these objects are defined in their own coordinate systems is not the same as the way they appear in this scene, which is on the right hand side. Here, as you can see, the orientation is different, and the size is also different, in all the four instances. The same is true for the instances of the other object.
So, these coordinates where the objects are originally defined are the local or object coordinates. The coordinate system here, represented by the principal axes X, Y, Z in the right hand figure, is the world coordinate system. Here we are assembling multiple instances of the original object definitions to construct the overall object. And as you can see, in order to do that, it is very important that the objects are put into the proper place, in the proper orientation and with the proper size. That is the job of the transformations that take place in the second stage.
(Refer Slide Time: 10:10)
Now, since these transformations change some geometric properties, or operate on the geometry of the object definition, we call them geometric transformations. They are also known as modeling transformations.
280
(Refer Slide Time: 10:38)
Now, modeling transformation implies applying some operations on the object definition in the local coordinate system to transform it into a component of the world coordinate scene. This is how we can more formally describe modeling transformation: applying some operations on the object definition to transform it into a component in the world coordinate scene.
(Refer Slide Time: 11:11)
What are those operations? In fact, there can be many such operations; we will soon see what they are.
281
(Refer Slide Time: 11:23)
But all the operations can be derived from a set of basic operations. So, although we can in principle apply many operations, these can all be thought of as derived from a set of basic operations.
(Refer Slide Time: 11:43)
Now, let us have a look at those basic operations. There are actually 4 such basic operations: translation, rotation, scaling and shearing. Translation all of us know; what it does is translate an object from one position to another.
282
(Refer Slide Time: 12:03)
In rotation, what do we do? We rotate the object by some angle, either in the clockwise or the anticlockwise direction, around some axis.
(Refer Slide Time: 12:22)
With scaling, what can we do? We can reduce or increase the object size.
283
(Refer Slide Time: 12:30)
And finally, with shearing, we can change the shape of the object. It may be noted here that shearing is, in a stricter sense, not a basic transformation; it can be derived as a composition of rotation and scaling. However, for simplicity, we will assume that it is a basic transformation and treat it accordingly. So, then, let us recap. We have 4 basic transformations: translation, rotation, scaling and shearing. Among them, shearing changes the shape of the object, scaling changes the size of the object, and translation and rotation change the position and orientation of the object.
(Refer Slide Time: 13:24)
Now, I am assuming that you have some idea of these operations, and you may know that these operations change the geometric properties of the objects in terms of changing their shape, size and location. Since that is the case, we call these transformations geometric transformations. So we can call the operations performed in the second stage either modeling transformations or geometric transformations. Now, let us go into a little deeper discussion on each of these transformations. Let us start with translation.
(Refer Slide Time: 14:10)
What happens in translation?
(Refer Slide Time: 14:11)
As you can see, suppose we have an original point here in this reference frame, which is denoted by P with coordinates x and y. Through translation we can reposition the point to P' with new coordinates x' and y'. So essentially we are displacing this point to another point by an amount tx and ty to get the new point, and this displacement takes place along the x and y directions.
So using this knowledge we can actually derive the new coordinates with respect to the old coordinates. What will they look like?
(Refer Slide Time: 15:06)
So the new coordinate, the new x coordinate will be x plus the displacement amount along
the x direction and the new y coordinate will be the original y coordinate plus the
displacement amount along the y direction. So these are simple derivations, and simple to
formulate. And these are the relationships between the new and the old x and y coordinates of
the points.
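A small Python sketch of these relationships (my own function names; the displacement amounts and vertex values are illustrative examples only):

def translate(point, tx, ty):
    # x' = x + tx,  y' = y + ty
    x, y = point
    return (x + tx, y + ty)

print(translate((2, 3), 4, -1))              # (6, 2)
# Applying the same displacement to every vertex moves the whole object.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print([translate(v, 4, -1) for v in square])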
286
(Refer Slide Time: 15:41)
Now, these displacements can be thought of in different ways. So if we are moving along
positive x axis or positive y axis, then we call it positive displacement. If we are moving
along negative x axis or negative y axis, we call it negative displacement. So the sign of the
amount tx or ty will be different for positive or negative displacements.
(Refer Slide Time: 16:21)
Now, let us shift our attention to rotation.
287
(Refer Slide Time: 16:25)
Now, in case of rotation, we do not have horizontal or vertical displacements, instead we
have angular displacement. In other words, the point moves from one position to another on a
circular track about some axis. So, here, we are having angular displacement and the point is
moving around some axis.
(Refer Slide Time: 16:57)
We typically follow a convention: if the movement is counterclockwise, then it is a positive angle of rotation. So consider this example. We have the original point here, and now we have the point after rotation, denoted by (x', y'), and the rotation angle is ϕ. Since we are moving in the counterclockwise direction, we call it a positive rotation angle.
288
If we are moving in clockwise direction, then we typically consider that to be a negative
rotation angle. That is one convention typically followed. And in case of 2D rotation, we
typically assume that the rotation takes place around the Z axis. However, later on we will see
for 3D rotation what are the conventions.
(Refer Slide Time: 18:13)
Now, let us try to derive the relationship between the new and old coordinates. The old coordinate is (x, y) and the new coordinate is (x', y'). As you can see, we can represent x as r cos θ, where r is the radius of the circular track and θ is the angle between the x axis and the line to the original point, and y is r sin θ. Now, if we draw a line like this, we can represent x' as r cos(θ + ϕ). If we expand, we get r cos θ cos ϕ - r sin θ sin ϕ. Since r cos θ is x and r sin θ is y, this becomes x cos ϕ - y sin ϕ. Similarly, for y' we can derive the expression x sin ϕ + y cos ϕ. These two are the relationships between the old coordinates of the point and the new coordinates of the point.
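A small Python sketch of these relationships (my own function names; the rotation is counterclockwise about the origin, that is, about the Z axis in the 2D case):

from math import cos, sin, radians

def rotate(point, phi):
    # x' = x cos(phi) - y sin(phi),  y' = x sin(phi) + y cos(phi)
    # phi is the counterclockwise (positive) rotation angle in radians.
    x, y = point
    return (x * cos(phi) - y * sin(phi),
            x * sin(phi) + y * cos(phi))

print(rotate((1.0, 0.0), radians(90)))    # approximately (0.0, 1.0)
print(rotate((1.0, 0.0), radians(-90)))   # clockwise rotation: approximately (0.0, -1.0)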
289
(Refer Slide Time: 19:44)
And as I already mentioned, counterclockwise angular movement is typically considered as
positive, otherwise it is negative. Now, in case of negative angular movement, we change the
sign of the angle of displacement. Instead of ϕ, we will use – ϕ. That is the only change we
will make.
(Refer Slide Time: 20:11)
So we have learned about translation and rotation. How can we apply these transformations to points?
290
(Refer Slide Time: 20:23)
The way we derived them, they apply to a single point.
(Refer Slide Time: 20:33)
For an object we have many points, so a single application to one point will not be sufficient. What we need to do is simply apply the transformation to all the points that make up the surface. Now, you may be thinking that that is impossible because there may be an infinite number of points on a surface. However, we can actually do it by applying the transformations to all the vertices in a vertex list representation, or to all the control points for a spline surface.
So, when we are going to apply a transformation, we essentially think of some representation; it can be a vertex list representation, as in the case of mesh representation, or a set of control points, as in the case of spline representation, and we apply the transformation on each of these points so that the entire object gets transformed.
(Refer Slide Time: 21:43)
And as I have mentioned earlier, by applying the transformations in this way, we can change the object orientation using rotation and its position using translation. So by applying rotation on all the vertices or all the control points, we can change the orientation of the object, and by applying translation on all the vertices or all the control points, we can change the position.
(Refer Slide Time: 22:17)
The third basic transformation is scaling. What happens in scaling?
292
(Refer Slide Time: 22:25)
It changes the size. Now, changes can take place in both ways, either it can decrease or it can
increase. So both increase and decrease of the object size is possible with scaling.
(Refer Slide Time: 22:43)
Now, how is scaling defined mathematically? It is defined as an operation of multiplying the object coordinates by some scalar quantities. These scalar quantities are known as scaling factors. So essentially we are multiplying the coordinate values by scaling factors to scale the objects up or down. Scaling up means increasing the size of the object, scaling down means decreasing the size of the object.
293
(Refer Slide Time: 23:27)
So, at the level of a point, how can we understand it? Given a point P, we simply multiply the x coordinate by a scaling factor along the x direction, and we multiply the y coordinate by a scaling factor along the y direction, to get the new point (x', y'). So, the relationship between the old and new coordinates will look something like this: the new coordinate x' can be represented in terms of x in this way, and the new coordinate y' in terms of y in this way, where sx and sy are the two scaling factors along the corresponding x and y directions.
For example, here we have one object and we are using scaling factor along x direction to be
one third and scaling factor along y direction to be half. Now, if I multiply these scaling
factors to the x and y coordinates of the vertices, I will get 4 new vertices as shown here in
this right-hand figure. Now these vertices together will represent the object. Since the scaling
factors are less than 1, that means we are scaling it down or decreasing the size.
So, like in case of translation or rotation, here also, we did the same thing. That is we applied
the scaling operations to all the points that define the object. Now, if we are using mesh
representation, then those points are essentially the vertices of the vertex list. If we are using
a spline representation, then these points are essentially the set of control points. And we
apply scaling factor to each of these points to get the new objects, the new points that define
the object.
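A small Python sketch (my own function names) that reproduces this example; the vertex coordinates used here are the ones from the slide, which also appear in the repositioning discussion below:

def scale(point, sx, sy):
    # x' = x * sx,  y' = y * sy
    x, y = point
    return (x * sx, y * sy)

# The object from the example, scaled by sx = 1/3 and sy = 1/2.
vertices = [(3, 2), (9, 2), (9, 4), (3, 4)]
print([scale(v, 1 / 3, 1 / 2) for v in vertices])
# [(1.0, 1.0), (3.0, 1.0), (3.0, 2.0), (1.0, 2.0)]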
294
(Refer Slide Time: 26:08)
Here, we should note one thing. If we are using the same scaling factor along both the x and y directions, then this type of scaling is called uniform scaling. Otherwise, what we do is differential scaling. In the example, we have seen that the scaling factor along the x direction is one third and along the y direction is half, so they are different; we actually followed differential scaling in the example.
(Refer Slide Time: 26:41)
Now, when the scaling factor is, when say sx is greater than 1 then along the x direction we
are scaling up, when sy is greater than 1, then along the y direction we are scaling up or
increasing the size. Now, when both are greater than 1, then along both the directions we are
295
increasing the size. Similarly, when sx is less than 1, we are reducing or scaling down the size
along x direction.
Similarly, when sy is less than 1, we are reducing or scaling down the size along y direction,
when both are less than 1 then we are scaling down along both directions simultaneously.
And of course, if sx equals 1 there is no change in size along the x direction, and if sy equals 1 there is no change along the y direction.
(Refer Slide Time: 27:41)
One important point to be noted here is that, during scaling the object may get repositioned,
as we have seen in the example. So original vertex was at (3, 2) here it was at (9, 2) here it
was at (9, 4) and it was at (3, 4). Now after scaling, by applying the scaling factors along x
and y directions, we got a new object defined by the 4 vertices. What are the coordinates? We
have (1, 1) then here we have (3, 1) here we have (3, 2) and here we have (1, 2). Now, as you
can see the vertices got repositioned. So that is one of the effects of scaling.
One effect is changing the size, the other effect is it may lead to repositioning of the objects.
The final basic transformation is shearing.
296
(Refer Slide Time: 29:12)
What happens in shearing? Here we basically change the shape of the object. So far the
transformations that we have learned deal with changing the position, orientation and size.
Now, the final and the fourth basic transformation shearing allows us to change the shape
also.
(Refer Slide Time: 29:40)
As you can see in this example, so we have one object here, which after shearing gets
transformed to this object with a change in shape. Now, like scaling, shearing also involves
shearing factors along the x and y directions that are applied to the original object or to the
original point. So if the original x coordinate is x, then a term involving the shearing factor is
used to obtain the transformed coordinate x'. And the same is true for y also.
297
(Refer Slide Time: 30:37)
But, the relationship is slightly more complicated than scaling. Here, the new coordinate is
obtained by an addition plus a multiplication: the new coordinate is the sum of the old
coordinate and a term which is a multiplication of the other old coordinate with the shearing
factor along that axis, that is, x' = x + shx·y and y' = y + shy·x. Note here that to get x', the new
x coordinate, we use the old x coordinate and also the old y coordinate together with the
shearing factor along the x direction.
Similarly, to get the new y coordinate we use the old y coordinate and also the old x coordinate
together with the shearing factor along the y direction. So that is the difference, slightly more
complicated than scaling. And it allows us to change the shape of the object.
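As a quick sanity check of these relations, here is a minimal Python sketch (the function name shear_point is my own; the value shx = 0.5 is inferred from the numbers in the example discussed a little later, and shy = 0 means no shear along the y direction):

```python
# Shearing a single point: x' = x + shx * y, y' = y + shy * x
def shear_point(x, y, shx, shy):
    return (x + shx * y, y + shy * x)

# shx = 0.5, shy = 0 reproduce the vertex (9, 2) -> (10, 2) seen shortly.
print(shear_point(9, 2, 0.5, 0))   # (10.0, 2)
```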
(Refer Slide Time: 31:57)
298
Now, the relationship is established between the old and new points. Like in the previous
cases, the previous 3 transformations, in case of shearing also we can apply the operation on
all the points on the surface to change the shape of the whole surface. If we are following a
mesh representation, the surface will be represented in terms of its vertices, in the form of a
vertex list. So we apply shear on all the vertices.
If we are using a spline representation, then the surface will be represented in terms of a
control point grid, and we apply shear on all these control points in the grid to get the points
that define the transformed object.
(Refer Slide Time: 32:54)
Like scaling, here also repositioning may take place. Let us see this example again and
consider one vertex, (9, 2): this vertex changes to (10, 2), as you can see here. Another vertex
(9, 4) also changes and becomes (11, 4). Similarly, you can see the other vertices: here it
becomes (4, 2), whereas earlier it was (3, 2). This vertex becomes (5, 4)
from here which was (3, 4). However, you may note that it is not necessary that all vertices
change their position.
So, all vertices may not reposition during a shearing transformation. Also you can see here
that it is not mandatory to perform shear along both axes simultaneously. As you can see here
that along y axis the shearing factor is 0. So, we are not shearing along y axis whereas we are
shearing along the x axis. So, both scaling and shearing have this property that they may
reposition the object, they may reposition all or some of the points that define the object.
299
(Refer Slide Time: 35:39)
So, these are the four basic transformations. And when we are actually trying to transform an
object, we can apply these basic transformations in sequence, multiple times, in different
ways to get the desired transformation. Now, one point should be noted here: in order to
perform the transformation, we have derived some equations, the equations that show the
relationship between the old and the new point. So if we want to perform the transformation,
we have to apply these equations on the old points to get the new points.
However, these equations are not very convenient for building graphics libraries or packages.
If we are trying to design a modular graphics system, then equations may not be a good way
to represent the transformations.
We require alternative representations and those representations are there in the form of
matrices. And we will also have many advantages if we are using matrices, particularly in the
context of building modular systems.
So, in the next lecture, we shall discuss the matrix representation and why it is useful to
represent transformations.
300
(Refer Slide Time: 37:28)
Whatever I have discussed today can be found in Chapter 3, Section 3.1 of the book
Computer Graphics. So, we will meet again in the next lecture. Till then, goodbye.
301
Computer Graphics
Professor. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 11
Matrix representation and composition of transformations
Hello and welcome to lecture number 11 in the course Computer graphics. We are discussing
different stages of the graphics pipeline. Before we go into today's topic, let us again quickly
recap the stages and where we are currently.
(Refer Slide Time: 00:50)
So, there are 5 stages: the first stage is object representation, the second stage is modelling
transformation, the third stage is lighting or colouring, the fourth stage is the viewing pipeline,
which itself consists of 5 sub-stages, namely viewing transformation, clipping, hidden surface
removal, projection transformation and window-to-viewport transformation, and the last
stage is scan conversion.
So, what we do in these stages is: in the first stage we define objects, in the second stage we
put them together to construct a scene in world coordinates, and then in the subsequent stages
we process those till we perform rendering on the actual computer screen. We have already
discussed the first stage, object representation; currently we are discussing the second stage,
that is, modelling transformation.
302
(Refer Slide Time: 1:56)
So, in the last lecture, we got introduced to the basic idea of what we mean by modelling
transformation, and we also introduced 4 basic transformations using which we perform any
type of modelling transformation. Today, we are going to learn about representation. So,
in the previous lecture we talked about how to represent the transformations, today we will
learn about an alternative way of representing those transformations.
(Refer Slide Time: 02:32)
We have seen in the previous lecture how we can represent the transformations. If you
recollect, there are 4 basic transformations: translation, rotation, scaling and shearing.
Using these 4 basic transformations, we can perform any geometric transformation on any
303
object, either applying any one of these 4 transformations or applying these transformations
in sequence one after another multiple times and so on.
And we discussed these transformations in terms of equations. For translation, we discussed
the relationship between the original point and the transformed point, that is, the point after
translation: x' = x + tx and y' = y + ty. For rotation, similarly, we established the relationship
between the original point and the transformed point as x' = x cos ϕ - y sin ϕ and
y' = x sin ϕ + y cos ϕ. The same was the case with scaling, x' = sx·x and y' = sy·y, and with
shear, x' = x + shx·y and y' = y + shy·x. In these equations we used some parameters: tx, ty are
the amounts of translation along the x and y directions, ϕ is the angle of rotation, sx, sy are the
scaling factors along the x and y directions respectively, and shx, shy are the shearing factors
along the x and y directions respectively.
(Refer Slide Time: 04:55)
So, as we have shown we can actually use these equations to represent the transformations.
Now, as we have discussed in introductory lectures, there are graphics packages, there are
graphics libraries that are developed to actually make the life of a developer easier so that a
developer need not always implement the individual components of a graphics pipeline to
develop a product. In order to build a package or to develop library functions, we need
modularity and a standard way of defining inputs and outputs for each function.
Unfortunately, the equation based representations of transformations do not support such
modularity. So, when we are trying to represent transformations using equations, it is difficult
304
to modularize the overall pipeline in terms of standardized input and output and standardized
functions, because in subsequent stages of the pipeline, we will see other transformations and
each of those transformations will have different equations represented in different forms and
formats.
So, then it will be very difficult to actually combine these different stages and implement a
package or a library, where the user will not be bothered about the internal working of the
package.
(Refer Slide Time: 06:50)
To maintain this modularity, equation-based representations are not suitable. We require
some alternative representation, and one such alternative, which supports our need for
modular system development, is the matrix representation. So, we
can actually represent transformations in the form of matrices. And later on, we will see that
other stages of the pipeline can also be implemented by representing basic operations in the
form of matrices.
So, there will be some synergy between different stages and it will be easier to implement
those stages in the form of predefined packages, functions or libraries.
305
(Refer Slide Time: 07:48)
So, how do these matrices look? Let us take, for example, the scaling transformation. If we
want to represent scaling in the form of a matrix, what will we do? We will create a 2×2
matrix with the scaling factors positioned along the diagonal, that is, sx and sy on the diagonal
and zeros elsewhere, as shown here.
(Refer Slide Time: 08:20)
Now, how do we apply this transformation? Suppose we are given a point P(x, y) and we
want to transform it by scaling. What will we do? We represent the point as a column vector,
as shown here, and then multiply the transformation matrix with that column vector, that is,
we compute the matrix product to get the new point. So essentially, we need matrix
306
multiplication. And this form of representing the transformation operations is what makes it
easier to implement in a modular way.
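As a small illustration (a sketch only, using the scaling factors from the earlier example), the matrix product can be computed as follows:

```python
import numpy as np

# P' = S . P with a 2x2 scaling matrix and the point as a column vector.
# The factors 1/3 and 1/2 are the ones used in the earlier scaling example.
S = np.array([[1/3, 0.0],
              [0.0, 0.5]])
P = np.array([[9.0],
              [2.0]])      # the point (9, 2) as a column vector
print(S @ P)               # approximately [[3.], [1.]] -> the scaled point (3, 1)
```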
(Refer Slide Time: 9:20)
So, we have represented scaling in terms of 2×2 matrix. We can do the same with rotation,
we can have a 2×2 matrix for representing rotation transformation, as well as shearing. So
then, we can have 2×2 matrices for the 3 operations; rotation, scaling and shearing.
(Refer Slide Time: 9:50)
Unfortunately, 2×2 matrices will not serve our purpose. No matter how much we try, we will
not be able to represent the translation transformation using a 2×2 matrix, unlike the other 3
basic transformations. This is because translation adds constants tx and ty to the coordinates,
whereas multiplying a point by a 2×2 matrix can only produce terms proportional to the
coordinates themselves.
307
(Refer Slide Time: 10:33)
So, in order to avoid this problem, in order to address this issue, we go for another type of
matrix representation, which is called representation in a homogeneous coordinate system.
Now, what does this homogeneous-coordinate-based matrix representation refer to?
(Refer Slide Time: 10:48)
So, essentially it is an abstract representation technique that means, this coordinate system
actually does not exist in the physical sense, it is purely mathematical, purely abstract. So,
there may be physically a 2 dimensional point which we transform to a 3 dimensional
abstract coordinate system called homogeneous coordinate system. So, each 2D point
represented by these 2 coordinates x and y can be represented with a 3 element vector as
308
shown here, each of these elements correspond to the coordinates in the homogeneous
coordinate system.
So, we are transforming a 2D point into a 3D space in this case, the 3D space is the abstract
homogeneous coordinate space and each point is represented with a 3 element vector.
(Refer Slide Time: 11:56)
So, what is the relationship between these 2 representations? We have a 2D point represented
by its 2 coordinates x and y. And now we are representing the same point in a 3 dimensional
space called the homogeneous coordinate system, where we represent the point with 3
coordinate values xh, yh and h. So, what are the relationships between these quantities?
The original coordinate x equals the x coordinate in the homogeneous coordinate system
divided by h, that is, x = xh/h, and the original coordinate y equals the y coordinate in the
homogeneous coordinate system divided by h, that is, y = yh/h. Here h is called the
homogeneous factor, and it is important to note that it can take any nonzero value; it must be
a nonzero value.
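A tiny sketch of this conversion (the value h = 3 below is an arbitrary nonzero choice, used only to illustrate the relationship):

```python
# Converting between a 2D point and its homogeneous representation.
def to_homogeneous(x, y, h=1.0):
    return (x * h, y * h, h)          # (xh, yh, h), h must be nonzero

def from_homogeneous(xh, yh, h):
    return (xh / h, yh / h)           # x = xh/h, y = yh/h

print(to_homogeneous(2, 3, 3))        # (6, 9, 3)
print(from_homogeneous(6, 9, 3))      # (2.0, 3.0)
```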
309
(Refer Slide Time: 13:05)
There are a few more things we should note here, since we are considering h to be the
homogeneous factor. If h is 0, then we consider that point to be at infinity in the homogeneous
coordinate system. And there is no concept of origin, since recovering a 2D point from the
all-zero vector would require dividing 0 by 0, which is not defined; so we do not allow the
point where everything is 0. These two things we should remember while dealing with the
homogeneous coordinate system: first, if h becomes 0, then we consider that point to be at
infinity, and second, there is no concept of origin in the homogeneous coordinate system.
(Refer Slide Time: 14:05)
Now, let us try to understand how we can convert these geometric transformation matrices into
matrices in the homogeneous coordinate system. So, earlier we had the 2×2 matrices
310
representing 3 of the 4 basic transformations: rotation, scaling and shearing. As we have
already mentioned, these 2×2 matrices become 3×3 matrices in the homogeneous coordinate
system. In fact, in general, an N×N transformation matrix is converted to an (N+1)×(N+1)
matrix.
Now, if we represent a 2D transformation matrix using a 3×3 matrix, then we will be able to
represent translation as well, so our earlier problem will be resolved. Earlier we were unable
to represent translation using a 2×2 matrix, although we were able to represent the other 3
basic transformations. With the homogeneous representation, we avoid that limitation: we can
represent all 4 basic transformations using 3×3 matrices.
(Refer Slide Time: 15:36)
Another thing we should keep in mind is that, when we are talking about geometric
transformations, we always consider h to be 1. So, h value will always be 1. However, there
are other transformations that we will encounter in our subsequent lectures, where h is not
equal to 1.
311
(Refer Slide Time 16:03)
Now, let us see how the basic transformations are represented using homogeneous coordinate
matrices. Translation we can represent using this matrix, rotation we can represent using this
matrix where ϕ is the angle of rotation, scaling can be represented using this matrix and,
finally, shear can be represented using this matrix. In case of scaling, sx, sy represent the
scaling factors along the x and y directions; in case of shearing, shx and shy represent the
shearing factors along the x and y directions respectively.
So, here you can see that we managed to represent all the basic transformations in the form of
matrices, although we have to use 3×3 matrices to represent 2 dimensional transformations.
(Refer Slide Time: 17:12)
312
Since, we are using homogeneous coordinate system, so, our point representation also
changes. So, earlier we had this representation for each point, now we will be representing
each point using a 3 element column vector and the other operations remain the same with
minor modification. So, first we apply the matrix multiplication as before to get the new point
that is P’ = S.P. But, after this, what we need to do is divide whatever we got in P’, the x and
y values by h to get the original value, this is the general rule for getting back the actual
points.
But, in case of geometric transformations, as we have already mentioned, h is always 1, so
this division really does not matter. However, we will see other transformations in subsequent
lectures where it matters very much. So far, we have discussed what the basic transformations
are, how we can represent those transformations, and also how we can use them to transform
a point, which is done by performing a matrix multiplication.
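To summarize what we have so far, here is a small Python sketch of the four 3×3 homogeneous matrices in their standard forms, consistent with the equations recalled earlier (the helper names T, R, S, Sh and transform are my own):

```python
import numpy as np

def T(tx, ty):                       # translation
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def R(phi):                          # rotation by phi (counter-clockwise, radians)
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def S(sx, sy):                       # scaling
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def Sh(shx, shy):                    # shearing
    return np.array([[1, shx, 0], [shy, 1, 0], [0, 0, 1]], dtype=float)

def transform(M, x, y):
    """Apply P' = M.P on the homogeneous column vector, then divide by h."""
    xh, yh, h = M @ np.array([x, y, 1.0])
    return (xh / h, yh / h)

print(transform(S(1/3, 1/2), 9, 2))  # approximately (3.0, 1.0), as in the earlier example
```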
Now, let us try to understand the process of composition of transformations. When do we
require composition? If we have to perform a transformation that involves more than one
basic transformation, then we need to combine them together. Now, the question is how to
combine them, and in which sequence?
(Refer Slide Time: 19:10)
So, when we are performing multiple geometric transformations to construct a world
coordinate scene, we need to address two issues: how we perform these multiple
transformations together, and what sequence of transformations should be followed.
313
(Refer Slide Time: 19:34)
Let us try to understand this in terms of an example. Here, look at the top figure. We see one
object denoted by the vertices ABCD, with its dimensions given. The bottom figure shows a
world coordinate scene in which the same object is placed here, where, let us assume, it is
used to define the chimney of a house. Now, here you can see that the original vertex A got
transformed to A', B got transformed to B', C got transformed to C' and D got transformed to
D'. And also, the dimensions changed.
So, the dimension along the x direction got reduced, although the dimension along the y
direction remained the same. Two things happened here, as you can note in this figure: first,
its dimensions changed, and second, its position changed. Earlier one of its vertices was at the
origin; now it is placed at a different point.
So, two transformations are required; one is scaling and the other one is translation, scaling to
reduce the size, translation to reposition it in the world coordinates scene. This much we can
understand from the figure, but how to actually apply these transformations that is the
question we want to answer so that we get the new vertices.
314
(Refer Slide Time: 21:42)
What do we know? We know that to get the new vertices we need to multiply the current
vertices with a transformation matrix. But here it is not a basic transformation matrix; it is a
composition of two basic transformation matrices. So how do we do that, how do we combine
the two matrices?
(Refer Slide Time: 22:13)
Let us go step by step. In the first step, we need to determine the basic matrices, that means
determine the amount of translation and determine the scaling factors. Note that the object is
halved in length while the height is the same; that means along the x direction it is halved but
along the y direction it remains the same. So, in the scaling matrix, sx should be half and sy
should be 1, as shown in this transformation matrix for scaling.
315
(Refer Slide Time: 22:57)
Now translation, the second basic transformation that we require. Here the vertex D was at
the origin, as you can see here. Where did it get transferred to? To D'. Now, what is the
position of the transformed vertex? It is (5, 5). So, the origin got repositioned to (5, 5), that is,
essentially a 5 unit displacement along both the horizontal and vertical directions. So, tx
equals 5 and ty equals 5, and if we use these values in the transformation matrix for
translation, then we get this matrix in the current case. So, earlier we obtained the scaling
matrix and now we obtained the translation matrix, but our question remains: how to combine
them?
(Refer Slide Time: 24:15)
That is the second step: composition of the matrices, or obtaining the composite matrix. What
we need to do is to multiply the basic matrices in sequence, and this sequencing is very important;
316
we follow the right-to-left rule to form the sequence. Now, what does this rule tell us?
(Refer Slide Time: 24:50)
The first transformation applied on the object is the rightmost in the sequence, the next
transformation is listed on the left of this earlier transformation, and so on, till we reach the
last transformation. So, if we apply the first transformation, say T1, on the object, then it
should be placed at the rightmost position. Now, suppose we require another transformation
T2; then T2 will come on the left side of T1. If one more transformation needs to be applied,
say T3, then it comes to the left of T2, and so on till we reach the final transformation, say Tn.
This is the right to left rule; first transformation applied on the object is on the rightmost side
followed by other transformations in sequence till the leftmost point where we place the last
transformation applied on the object.
317
(Refer Slide Time: 26:04)
So, in our case, we can form it in this way: the first transformation to be applied is scaling,
followed by translation. The right-to-left rule means S comes first at the rightmost position,
and on its left side will be T, so the composite is the product T.S of these 2 matrices, and the
result is this matrix. So, this is our composite matrix for that particular transformation.
(Refer Slide Time: 26:39)
Once we get the composite matrix, we multiply the current vertices with the composite
matrix to get the new points.
318
(Refer Slide Time: 26:55)
So in our case, this step will lead us to the points as shown here, A’ can be derived by
multiplying this composite matrix with the corresponding vertex in homogeneous coordinate
system to get this final vertex in homogeneous coordinate system and that is true for B’, C’
and D’.
(Refer Slide Time: 27:31)
Now, the last stage, of course, is to transform from the homogeneous representation back to
the actual representation, which we do by dividing the x and y values by the homogeneous
factor h. In our case, that is, the case of geometric transformations, h is 1. So, our final
transformed points or vertices are obtained in this way: A' we get by dividing the x and y
values by the homogeneous factor, and similarly for B', C', and D'.
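Putting the whole procedure for this example together, here is a hedged Python sketch (it reuses the matrix helpers from the earlier sketch; only vertex D = (0, 0) is checked, since that is the vertex whose old and new positions are stated explicitly in the text):

```python
import numpy as np

def T(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def S(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

# Right-to-left rule: scaling is applied first, so it is the rightmost matrix.
M = T(5, 5) @ S(0.5, 1)
print(M)    # [[0.5 0.  5. ]
            #  [0.  1.  5. ]
            #  [0.  0.  1. ]]

# Vertex D was at the origin; after the composite transformation it should be D' = (5, 5).
xh, yh, h = M @ np.array([0.0, 0.0, 1.0])
print((xh / h, yh / h))   # (5.0, 5.0)
```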
319
So, what did we do? We first identified the basic transformations. This was followed by
forming the sequence in a right-to-left manner, that is, we put the transformation that is to be
applied on the object first as the rightmost transformation, then the next transformation to be
applied on the object to the left of the earlier transformation, and so on.
Then we multiplied these basic transformation matrices to get the composite transformation
matrix. Then we multiplied the points with this composite transformation matrix to get the
transformed points in the homogeneous coordinate system. Finally, we divided the x and y
values of this homogeneous coordinate representation by the homogeneous factor to get back
the actual transformed points.
(Refer Slide Time: 29:32)
We must remember here that matrix multiplication is not commutative, so the formation of
the sequence is very important. Earlier we computed translation multiplied by scaling,
following the right-to-left rule, which gave us M. If we had done it the other way, that is,
scaling multiplied by translation, it would lead to a different matrix M'. Since matrix
multiplication is not commutative, we cannot say M = M'; in fact M ≠ M'. So, if we do not
create the sequence properly, then our result will be wrong; we may not get the right
transformation matrix.
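This is easy to check numerically with the matrices of our example (again a small sketch reusing the earlier helpers):

```python
import numpy as np

def T(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def S(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

M       = T(5, 5) @ S(0.5, 1)   # scale first, then translate (the correct order here)
M_prime = S(0.5, 1) @ T(5, 5)   # the other order
print(np.allclose(M, M_prime))  # False: the two composites differ
print(M_prime)                  # its translation part is (2.5, 5) instead of (5, 5)
```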
320
(Refer Slide Time: 30:28)
So, how do we decide which sequence to follow? Earlier we simply said that first we will
apply scaling and then translation; on what basis did we make that decision? Let us try to
understand the example again, where we decided that scaling should be followed by
translation. What was there in the example that indicated that this should be the sequence?
(Refer Slide Time: 31:06)
When we discussed scaling, we mentioned one thing: during scaling the position of the
object may change. Now, if we translate first and then scale, then the vertex positions might
change again, because scaling may lead to a change in position. However, if we scale first and
then translate, then anyway we are going to reposition the object at the right place where we want it. So,
321
there is no possibility of further position change. So, clearly in this case we first apply
scaling, accept the associated change in position, and then follow it with translation. If we
apply the transformations in that sequence, then we do not face any problem; that was the
logic behind going for this sequence.
And in general we follow this logic: if we require multiple basic transformations to be
applied, we keep translation at the end as the last transformation, because scaling and
shearing are likely to change the position, and with translation we compensate for that.
Typically we follow this rule of thumb.
(Refer Slide Time: 32:37)
Now, one thing should be noted here, when we applied scaling, we actually applied it with
respect to the origin. So, origin is the fixed point in the example. However, that is not
necessarily true. We can have any fixed point located at any coordinate in a coordinate
system. So, in such cases, what we do? We apply the approach that we have seen earlier in
the example, but with slight modification. So, our approach when we are considering fixed
point which is not the origin is slightly different, let us see how it is different.
322
(Refer Slide Time: 33:33)
Suppose there is a fixed point F and we want to scale with respect to this fixed point. Now,
this is not origin, this is situated at any arbitrary location. Now, to determine the
transformation sequence, we assume a sequence of steps. So, if the scaling was with respect
to origin then we do not require anything else we simply scale, but if it is not with respect to
origin, if it is with respect to some other fixed point which is not the origin then scaling itself
involves a sequence of steps, just to perform scaling.
(Refer Slide Time: 34:15)
What is that sequence? First, we translate the fixed point to the origin, that means we set the
translation amounts tx = -x and ty = -y; that is the first transformation. Then we perform
scaling with respect to the origin; this is important, since our scaling matrix is defined with
323
respect to the origin. So, we first bring the fixed point (conceptually) to the origin, then
perform the scaling, and then the fixed point is translated back to its original place; now tx
becomes x and ty becomes y, the reverse translation.
(Refer Slide Time: 35:16)
So, how do we form the sequence? We will follow the same right-to-left rule: the first
translation, bringing the fixed point to the origin, is the rightmost transformation; this is
followed by scaling, the second transformation; and that is followed by the reverse
translation, bringing the point back to its original location, which is the leftmost
transformation. So, our composite matrix will be the multiplication T2.S.T1, where T1 is the
translation that brings the fixed point to the origin (the rightmost matrix) and T2 is the reverse
translation (the leftmost matrix). We multiply them to get the composite matrix representing
scaling with respect to any point other than the origin.
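A small sketch of this composite (the function name scale_about is my own, and the fixed point (5, 5) with factors (1/2, 1) anticipates the example discussed next):

```python
import numpy as np

def T(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def S(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def scale_about(xf, yf, sx, sy):
    """Scaling about a fixed point (xf, yf): translate the fixed point to the
    origin, scale with respect to the origin, then translate back."""
    return T(xf, yf) @ S(sx, sy) @ T(-xf, -yf)

M = scale_about(5, 5, 0.5, 1)
xh, yh, h = M @ np.array([5.0, 5.0, 1.0])
print((xh / h, yh / h))   # (5.0, 5.0): the fixed point itself does not move
```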
324
(Refer Slide Time: 36:17)
And in the same way we can perform the other basic transformations with respect to any
fixed point other than the origin. Here is one example which shows the procedure we just
mentioned. Suppose the original object is defined not with one vertex at the origin but here,
where we have the vertices shown, and the fixed point with respect to which the scaling takes
place is at (5, 5); the same object is placed here after scaling. So in this case, a separate
translation of the object is not required, because it was already at that position and only
scaling took place.
(Refer Slide Time: 37:25)
So, let us apply the approach that we outlined previously. Here we are performing scaling
with respect to the fixed point D, and the transformation matrix, the composite
325
transformation matrix, can be found by multiplying these 3 matrices. First, we translate the
fixed point to the origin, so tx will be -5 and ty will be -5. Then we perform scaling with
respect to the origin along the x axis only, that is, sx will be 1/2 and sy will be 1. And then we
translate the point back to its original position, that is, tx = 5 and ty = 5. That gives the
composite matrix. So, once we get this composite matrix for scaling, we apply it to the points
to get the transformed points.
(Refer Slide Time: 38:21)
And as I said, we can follow a similar approach for rotation and shearing: first translate the
fixed point, with respect to which the rotation or shearing has to be performed, to the origin,
then perform the corresponding operation, and then translate it back to the original location.
So, for rotation, first we have one translation, followed by rotation with respect to the origin,
followed by translating back to the original fixed point location.
For shearing the approach is the same: translation to the origin, followed by shearing with
respect to the origin, followed by translating back to the original fixed point location. So, this
is the composite matrix form for performing any of the basic operations with respect to a
fixed point that is not the origin.
326
(Refer Slide Time: 39:35)
So, to recap, if we are performing the basic operation with respect to origin, then we do not
require to do anything else, we simply apply the basic transformation matrix. However, if we
are performing the operation with respect to a point which is not the origin, then we perform
a composite transformation which involves 3 basic transformations; first one is translation
translate the fixed point to origin, second one is the actual transformation that is either
scaling, rotation or shearing and the third one is translating back the fixed point to its original
place.
And we form the composite matrix in this right-to-left manner: the first transformation
(translating the fixed point to the origin) is the rightmost, the second transformation (the
actual operation) is on its left, and the third one (translating back) is on the left of the second.
So if we put them in sequence, first comes 1, followed by 2, followed by 3.
327
(Refer Slide Time 41:07)
For a better understanding, let us go through one more example, which will illustrate the idea
further. Now, let us assume we require more than one transformation, so we will apply the
same process which we already outlined.
(Refer Slide Time: 41:24)
Consider this object: what are the transformations required to put it in position as a chimney
here? As you can see, we need to rotate this object; the surface that was here now comes here,
so it is a rotation in the counter-clockwise direction, a positive rotation by 90 degrees, and the
size also reduces by half along the x direction, so sx should be 1/2. But all these basic
operations take place with respect to this fixed point. So, then, how do we get the composite
matrix?
328
So, we first translate the fixed point to the origin, that is T(-5, -5). Then we scale to halve the
length, that is S(1/2, 1); along the y axis there is no change, so we keep that factor 1. Then we
get an object like this, and then we rotate it to get this final one, a rotation by 90 degrees. But
these 2 operations are performed with respect to the origin after translating the fixed point to
the origin, so now we have to translate it back, which is another translation, T(5, 5). These
matrices together, when multiplied, will give us the composite matrix.
(Refer Slide Time: 43:29)
So, it will look something like this. If we replace these notations with the actual matrices,
then we get these four matrices, and when we multiply them we get the composite matrix,
which will look like this. So, this is our way to get a composite matrix when we are trying to
perform multiple basic operations with respect to a point which is not the origin.
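The same composition can be sketched in a few lines (reusing the earlier helpers; the angle is given in radians to the rotation matrix, and only the fixed point is checked here, since the full vertex list is on the slide rather than in the text):

```python
import numpy as np

def T(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def R(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def S(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

# Right-to-left: translate the fixed point (5, 5) to the origin, scale by
# (1/2, 1), rotate by +90 degrees, then translate back.
M = T(5, 5) @ R(np.radians(90)) @ S(0.5, 1) @ T(-5, -5)

xh, yh, h = M @ np.array([5.0, 5.0, 1.0])
print(np.round([xh / h, yh / h], 6))   # [5. 5.]: the fixed point is unchanged
```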
329
(Refer Slide Time: 44:08)
And after getting the composite matrix we follow the same steps: we multiply the surface
points, say these points or any other surface points, with the composite matrix to get the
transformed points. That brings us to the end of this discussion. So, before we end, let us try
to recap what we have learned today.
First, we discussed about an alternative representation for basic transformations that is the
homogeneous coordinate systems where we represent a 2D point using a 3D coordinate
system. And as we have seen, it makes life easier for building modular graphics packages or
libraries. So, using this homogeneous form, we can represent all 4 basic transformations
using 3 by 3 matrices.
Then what we learned is how to form a composite matrix following the right-to-left rule: the
first matrix that we apply on the object should be the rightmost, the next matrix that we apply
should be placed to the left of the rightmost matrix, and so on till the last transformation. And
we multiply all these matrices together to get the composite matrix. Once we get the composite
matrix, we multiply it with the points to get the transformed points in the homogeneous
coordinate system.
Finally, we divide the x and y values in the homogeneous system by the homogeneous factor
to get back the actual points. We also learned how to perform the basic transformations with
respect to any point that is not the origin. The earlier matrices were defined with respect to
the origin, so when we are given a fixed point and we are supposed to perform a basic
transformation with respect to that fixed point, which is not the origin, we follow a composite
matrix approach: we first translate the fixed point to the
330
origin, perform the required basic transformation with respect to the origin, and translate the
point back to its original location.
Following the same right to left rule, we get the composite matrix to represent the basic
transformation with respect to any arbitrary point. So far, whatever we have discussed are
related to 2D transformations. In the next lecture, we will learn about transformations in 3D.
(Refer Slide Time 47:19)
The topic that I covered today can be found in this book, chapter 3, section 3.2 and 3.3. You
may go through these chapters and sections to learn more about these topics. We will meet
again in the next lecture. Till then thank you and goodbye.
331
Computer Graphics
Professor. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 12
Transformations in 3D
Hello and welcome to lecture number 12 in the course Computer Graphics. As you may
recollect, we are discussing the different stages of the graphics pipeline, and as we have been
doing for the last few lectures, we will start by having a relook at the pipeline stages so that
we are able to remember them better.
(Refer Slide Time: 0:58)
So, there are 5 stages in the graphics pipeline: the first stage is object representation, the
second stage is modelling or geometric transformation, the third stage is lighting or assigning
colour to points on the objects, and the fourth stage is the viewing pipeline, where we transfer
a 3D object to a 2D view plane. This transfer takes place through 5 sub-stages: viewing
transformation, clipping, hidden surface removal, projection transformation and
window-to-viewport transformation. The fifth and final stage is scan conversion; here we
actually map the view plane object to the pixel grid on the screen.
And as we have mentioned earlier, each of these stages take place in specific coordinate
systems, object representation is done in local or object coordinate system, modelling
transformation here we actually transfer from local to world coordinate system, lighting takes
place in world coordinate, then viewing pipeline takes place in 3 coordinate systems; world
coordinate, view coordinate and then device coordinate. And finally, scan conversion takes
332
place in screen coordinate system. So, the different coordinates are involved in different
stages of the pipeline.
(Refer Slide Time: 3:00)
Among these stages, so far we have discussed the first stage object representation. Currently
we are discussing the second stage that is modelling or geometric transformation. And in the
last couple of lectures, we have discussed the basic transformation idea including how to
perform complicated transformations in terms of sequence of basic transformation, but all our
discussion were based on 2D transformations.
(Refer Slide Time: 3:32)
333
In other words, we were performing transformations in 2 dimensional reference frame. Now,
let us have a look at 3D transformation. So, 3D transformation will be the topic of discussion
for our lecture today.
(Refer Slide Time: 4:12)
So, when we talk of 3D transformations, essentially we refer to all the basic transformations
that we have already discussed in 2D, but in a modified form. The transformations are
actually the same as in 2D, but their representation is different. In 2D, we discussed 4 basic
transformations, namely translation, rotation, scaling and shearing. These 4 remain the basic
transformations in the 3D world also; however, their representation is different.
(Refer Slide Time: 04:47)
334
So, earlier we had used homogeneous coordinate system to represent the transformation. We
will use the same coordinate system here to represent the 3D transformation as well, but with
the difference. Now, earlier in the matrix representation, we used 3×3 matrices in the
homogeneous coordinate system to represent each of the transformations. In 3D, we use 4×4
matrices to represent each transformation. However, the homogeneous factor h remains the
same that is h=1.
So, essentially we are using instead of 3×3, we are using 4×4 transformation matrices to
represent a transformation in 3D, and the homogeneous factor h remains equal to 1 because
we are dealing with modelling transformation. But there are certain differences and we
should keep in mind these differences, the differences are primarily with respect to the 2
transformations; rotation and shearing.
(Refer Slide Time: 6:10)
For rotation, earlier we assumed that the rotations take place with respect to the z axis or
some axis that is parallel to it. That was our basic assumption in 2D rotations. In 3D this
assumption is no longer valid; here we have 3 basic rotations, one with respect to each
principal axis x, y and z. Earlier we had defined only one rotation, with respect to the z axis;
now, in 3D, we define 3 basic rotations with respect to the 3 principal axes x, y and z, so the
number of basic transformations changes. Earlier we had one matrix for rotation; now we
have 3 for rotation.
335
(Refer Slide Time: 7:05)
Also, previously we did not face this situation when we defined rotation with respect to the z
axis. Here, the transformation matrix that we should use to represent rotation about any
arbitrary axis, that means any axis that is not a principal axis, is more complicated than in 2D.
In 2D we have only z as the principal axis of rotation; in 3D we have 3 principal axes, and we
have to take all 3 into account.
So, when we are trying to define an arbitrary rotation with respect to any arbitrary axis then
deriving the transformation matrix becomes more complicated. And the form of the matrix
also is more complicated than what we have encountered in 2D. We will have a look at this
derivation of rotation matrix with respect to any arbitrary axis later in the lecture that is about
rotation.
336
(Refer Slide Time: 8:33)
Now as I said, shearing is also having some difference with respect to its 2D counterpart. It is
in fact more complicated compared to what we have seen in 2D.
(Refer Slide Time: 8:50)
So, let us start our discussion with shearing in 3D then we will talk about the differences in
rotation and then we will see how to derive a composite transformation matrix for rotation
about any arbitrary axis. Now, when we are talking of shearing, as we have seen earlier we
are trying to basically change the shape of the object. So, essentially to introduce some
deformity in the object shape. Now this distortion or deformation can be defined along 1 or 2
directions at a time while keeping 1 direction fixed; that is one constraint that we follow for
defining shearing in 3D.
337
For example, if we are trying to shear along the x and y directions, then we have to keep the
shearing along the z direction fixed. As a result, the general form is different from that of 2D shearing.
(Refer Slide Time: 10:11)
In fact, we can define 6 shearing factors. Recollect that a shearing factor refers to the amount
of distortion or deformation we want to introduce along a particular axis. So, in case of 3D
shearing we can define 6 shearing factors, and each factor can take any real value, or zero if
no shear takes place along that particular direction; when the shearing factor is 0, there is no
shearing along that direction. With respect to these six factors, the shearing matrix looks
something like this, where shxy, shxz, shyx, shyz, shzx and shzy are the six shearing factors.
(Refer Slide Time: 11:25)
338
Among these factors shxy and shxz are used to shear along y and z directions respectively
leaving the x coordinate value unchanged. We earlier mentioned that while performing
shearing one direction has to be left unchanged. So, in this case, we are performing shearing
along y and z directions whereas, shearing along x direction remains 0.
(Refer Slide Time: 12:09)
Similarly, shyx and shyz refers to the shearing factors along x and z direction when y
coordinate value remains unchanged. And likewise, the other 2 shearing factors can be
defined that is shzx and shzy, these 2 refer to shearing along x and y direction leaving z value
unchanged. So, each pair actually refers to shearing along 2 directions while the third
directions remain unchanged that means shearing along that third direction does not take
place.
So, that is about shearing. As you can see, it is more complicated than the shearing matrix
that we saw for 2D transformations; that is because we now have 6 shearing factors. Now, let
us have a look at the other basic transformation matrices.
339
(Refer Slide Time: 13:31)
Translation is the simplest and the form remains almost the same with the addition of one
more dimension. So, we have tx referring to translation along x direction, ty referring to
translation along y direction and tz referring to translation along z direction.
(Refer Slide Time: 14:01)
As I said before, for rotation we do not have a single matrix. Instead, we have 3 separate
matrices, each corresponding to the rotation about a particular principal axis. Since there are
3 axes, we have 3 rotation matrices.
340
(Refer Slide Time: 14:31)
Rotation about x axis, when the angle of rotation is ϕ looks something like this matrix.
(Refer Slide Time: 14:42)
Rotation about the y axis, again assuming the rotation angle to be ϕ, is shown here.
341
(Refer Slide Time: 15:06)
And finally, rotation about z axis by an angle ϕ is shown here in this matrix. So, we have 3
matrices representing 3 basic rotations; one about x axis, one about y axis and one about z
axis.
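For reference, here is a small Python sketch of the standard 4×4 forms of these three rotation matrices (counter-clockwise rotation by ϕ, in radians, with the homogeneous factor kept at 1); the helper names are my own:

```python
import numpy as np

def Rx(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0,  0, 0],
                     [0, c, -s, 0],
                     [0, s,  c, 0],
                     [0, 0,  0, 1]])

def Ry(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[ c, 0, s, 0],
                     [ 0, 1, 0, 0],
                     [-s, 0, c, 0],
                     [ 0, 0, 0, 1]])

def Rz(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

# Quick check: rotating (1, 0, 0) by 90 degrees about z gives (0, 1, 0).
print(np.round(Rz(np.radians(90)) @ np.array([1, 0, 0, 1]), 6))   # [0. 1. 0. 1.]
```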
(Refer Slide Time: 15:31)
Scaling is also similar to the 2D counterpart, Sx is the scaling factor along x direction, Sy is
the scaling factor along y direction, Sz is the scaling factor along z direction. So, if we do not
want to perform any scaling along a particular direction, we simply set that particular scaling
factor as 1. So, if we do not want scaling along say y direction, then we will set Sy=1. And if
you may recollect scaling factor less than 1 means, in that particular direction we want to
342
reduce the size and scaling factor greater than 1 means in that particular direction we want to
increase the size.
So, scaling is related to size, shearing is related to shape, and translation and rotation are
related to position and orientation. Then in 3D we have 6 basic matrices: one for translation,
one for scaling, one for shearing, and three for rotation, representing 6 basic transformations
in 3D. The other difference that I mentioned with respect to 2D transformations is the rotation
of an object with respect to any arbitrary axis, that means any axis that is not one of the
principal axes x, y and z.
(Refer Slide Time: 17:17)
So, what is the idea? We want to rotate an object by an angle θ counter-clockwise around an
axis of rotation passing through 2 points P1 and P2. We define these two points because with
them we can define a line or line segment that represents the axis of rotation; unless we
mention the points, it is difficult to specify the axis. So, we have an axis defined by the 2
points and an angle of rotation θ, which is counter-clockwise.
Remember that we are using a convention that if angle of rotation is counter clockwise then it
is positive angle, if angle of rotation is clockwise, then we consider it to be negative angle.
So, if we are rotating the object by an angle θ counter clockwise, then it will be simply θ, but
if we are rotating the same object by an angle θ clockwise, then we will replace θ with -θ.
Now, let us see what happens when we are trying to perform this rotation with respect to any
arbitrary axis, how we can derive a composite matrix representing this rotation.
343
(Refer Slide Time: 18:59)
The idea is illustrated in the series of steps. So, this one top left figure shows the initial
situation where P1 and P2 define the axis of rotation represented with the dotted line with
respect to the 3D reference frame or coordinate frame. Now then, in step 1, what we do? We
translate the line to the origin. Remember, earlier in our discussion on composition of
transformation, we discussed how to combine multiple transformations.
So, there what we said that if we are trying to perform some basic operation with respect to
any arbitrary fixed point other than origin, then what we follow? We first translate the point
to origin, perform the basic transformation and then translate it back to its original location.
So, the same basic principle we are following here, we are given the arbitrary axis or arbitrary
fixed line. In the first step, we translate it to the origin that means the axis passes through the
origin.
In step 2, what do we do? Now the axis passes through the origin, but there is no guarantee
that it aligns with any of the principal axes. So, in step 2, we align the line with the z axis in
our particular explanation, though it is not necessary to always use the z axis; you can
always align it with either x or y axis as well. But let us assume that we are aligning it with
the z axis. So, then that involves rotation about x and y axis.
So, now our arbitrary axis is aligned with the z axis. So, rotation will take place around or
about z axis that we do in Step 3, we apply the rotation about the z axis. After the rotation is
done in step 4, what we do is, we rotate the line back to its original orientation. So, when we
brought it or translated it in the step 1 to pass it through origin, it had one orientation. So in
344
step 4, we return it to that orientation and in step 5 or the final step, we translate it back to its
original position.
So in step 4, we are returning it to its original orientation and in step 5 we are translating it
back to its original position. So, these 5 steps are needed to construct the composite matrix
representing rotation of an object with respect to any arbitrary axis. So, let us try to derive it
then.
(Refer Slide Time: 22:39)
As we have seen in the figure, there are 5 steps. So, the composite matrix, the final
transformation matrix, would be a composition of the basic transformations involved in these
5 steps.
345
(Refer Slide Time: 23:03)
So, the first transformation is translation: translating the line so that it passes through the
origin. The translation amounts would be -x, -y, -z, where (x, y, z) are the coordinates of P2,
one of the endpoints, so that this endpoint moves to the origin.
(Refer Slide Time: 23:30)
Then in step 2, we align the line to the z axis, but as I said it need not be always z axis, it can
be x or y axis also. So, in order to do that what we need to do? We need to perform some
rotations about x and y axis. So first, let us assume that first we are rotating the line about x
axis to put the axis on the x-z plane and the angle of rotation is α. Then, we are rotating it
about the y axis to align the axis with the z axis.
346
So, first we rotate it about the x axis to put it on the x-z plane, and then we rotate it about the
y axis to align it with the z axis. In the first case let us denote the angle of rotation by α, and
in the second case let us denote it by β; both are anticlockwise rotations, so both are positive
at this stage.
(Refer Slide Time: 24:59)
Then in stage 3, what we do? Now we have aligned the axis with z axis and then we perform
the rotation about z axis which is our original objective. So then, we use the rotation matrix
with respect to z axis, so here θ is the angle of rotation of the object. Remember that this θ
angle of rotation is with respect to arbitrary axis, now we are using it to rotate about z axis
because we have aligned arbitrary axis with the z axis.
(Refer Slide Time: 25:46)
347
Then, in step 4 and 5, we reverse the operations we performed in step 1 and 2. So first, we
take the line to its original alignment, which involves reverse rotation about y and x axis to
bring the axis of rotation back to its original orientation. While aligning, we rotated with
respect to x first and then y. Since now, we are reversing the operation, so we rotate it with
respect to y first and then x.
(Refer Slide Time: 26:28)
And in Step 5, what do we do? We then translate it back to its original position, which is the
last step. So, then what would be the composite matrix?
(Refer Slide Time: 26:43)
348
We can get it by matrix multiplication, and we will follow the right-to-left rule. First we
perform the translation to make the line pass through the origin; then we perform a rotation
about the x axis by an angle α to bring the line onto the x-z plane; then we perform a rotation
by an angle β around the y axis to align it with the z axis; then we perform the actual rotation
by an angle θ with respect to the z axis. Then we reverse the earlier steps, that is, first a
rotation with respect to y, then a rotation with respect to x, by the same angle amounts as in
the earlier cases, and then the reverse translation.
Now, since we are rotating in the inverse of what we did in step 2, these inverse rotations can
simply be represented by a change of sign of the angle. So, if the angle was β, then it will be
-β here, and if the angle was α, then it will be -α here. So, where we rotated about the x axis
by α, in the reverse rotation we rotate about the x axis by -α; similarly, where we rotated by β,
here we rotate by -β. So, the reverse rotation means changing the sign of the angle of
rotation, because instead of counter-clockwise we are now rotating clockwise.
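The whole composition can be sketched in Python as below. This is only an illustrative sketch: the helper names are mine, the translation uses the coordinates of P2 as in the lecture (any point on the axis would do), and the atan2 formulas used for the alignment angles α and β are one common way of obtaining them; sign conventions can differ between texts.

```python
import numpy as np

def T3(tx, ty, tz):
    M = np.eye(4); M[:3, 3] = [tx, ty, tz]; return M

def Rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]])

def Ry(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]])

def Rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

def rotate_about_axis(p1, p2, theta):
    """Composite matrix for rotating by theta about the axis through p1 and p2."""
    v = np.asarray(p2, float) - np.asarray(p1, float)
    a, b, c = v / np.linalg.norm(v)             # unit direction of the axis
    d = np.hypot(b, c)
    alpha = np.arctan2(b, c) if d > 0 else 0.0  # rotate the axis into the x-z plane
    beta = np.arctan2(-a, d)                    # then align it with the z axis
    x, y, z = np.asarray(p2, float)             # translate this endpoint to the origin
    return (T3(x, y, z) @ Rx(-alpha) @ Ry(-beta) @
            Rz(theta) @ Ry(beta) @ Rx(alpha) @ T3(-x, -y, -z))

# Sanity check: for the z axis itself, this reduces to a plain rotation about z.
M = rotate_about_axis((0, 0, 0), (0, 0, 1), np.radians(90))
print(np.allclose(M, Rz(np.radians(90))))       # True
```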
So, these matrices, multiplied in this particular sequence, will give us the composite matrix
for rotating an object by an angle θ about any arbitrary axis of rotation. So that, in summary,
is what 3D transformation involves. It is mostly the same as 2D transformation, with some
differences. The first difference is that in the homogeneous coordinate system we now require
4×4 matrices instead of 3×3 matrices to represent each transformation.
349
Then earlier we defined 4 basic transformations namely, translation, rotations, scaling and
shearing in the context of 2D transformation. Now we have 6 basic transformations;
translation, rotation about x axis, rotation about y axis, rotation about z axis, scaling and
shearing. Earlier we had defined 2 shearing factors, now there are 6 shearing factors, it is a bit
more complicated than the earlier case.
Now, in shearing, when we perform shearing along 2 principal axes, there is no shearing
along the third principal axis; that is the convention we follow in 3D shearing. Apart from
these differences, there is another major difference in the way we derive the composite
transformation matrix for rotation about any arbitrary axis.
So, in order to do that, we follow a 5-step process: first we translate the line to pass through
the origin, then we align it with one of the principal axes, then we perform the rotation by the
desired angle about that axis, then we bring the line back to its original orientation by
performing the reverse rotations, and then we translate it back to its original position. And we
put the individual basic matrices in a right-to-left manner to get the final composite matrix, as
we have shown in the discussion. Now, let us try to understand 3D transformation with
respect to one illustrative example.
(Refer Slide Time: 32:01)
Let us consider a situation: there is an object defined by the vertices A, B, C, D, shown here
in the top figure; as you can see, it is on the x-y plane. This is the initial situation. Now we
want to use this particular object to construct a partition wall defined
350
by the vertices A’, B’, C’ and D’ in a scene where A’ corresponds to A, B’ corresponds to B,
C’ corresponds to C and D’ corresponds to the vertex D.
So, here, as we can clearly see, some transformation took place. The task is to calculate the
composite transformation matrix that enables this object to be positioned as a partition wall
in this scene. Let us see how we can do this.
351
(Refer Slide Time: 33:34)
So, initially the square is in the x-y plane and each side had 2 units of length and the centre is
given as (2, 2, 0). The final square is on the y-z plane with each side equal to 4 units and the
centre is now at (0, 2, 2). Now, these lengths and centres can be found out by the coordinates
of the vertices.
(Refer Slide Time: 34:28)
So, then, what do we need to do? In this case, we need a rotation from the x-y plane to the
y-z plane, but the axis of rotation is not the z axis; it is parallel to the z axis, so we will follow
the composite matrix approach. First we translate the centre of the original object to the
origin, so the translation amounts will be -2, -2 and 0.
352
(Refer Slide Time: 35:13)
So, if we are translating the centre to origin then the axis of rotation which was parallel to z
axis now will be automatically aligned with the z axis. So, then we perform the rotation by 90
degrees anti clockwise around the z axis. So, we will use the rotation matrix defined for
rotation about z axis with the angle of rotation 90. Since the rotation is anti-clockwise, so it
will be positive angle.
(Refer Slide Time: 35:58)
Then we rotate by 90 degrees anti-clockwise around the y axis. So, again we will use Ry(90),
where Ry(θ) is the basic rotation matrix about the y axis.
353
(Refer Slide Time: 36:28)
Then we perform scaling, because the size increased: we scale up by 2 in the y and z
directions. So the x direction will have a scaling factor of 1, meaning no change, and the y
and z directions will have a scaling factor of 2, so the size doubles.
(Refer Slide Time: 36:54)
And then we translate the centre to the new object centre using the translation matrix.
354
(Refer Slide Time: 37:14)
So, then the composite transformation matrix can be obtained by multiplying these individual
basic transformation matrices together, where we follow the right-to-left rule: first is the
translation to the origin, then the rotation about the z axis, then the rotation about the y axis,
then the scaling up by 2 along the y and z directions, and then the translation to the new
centre. If we multiply them, we get the composite transformation matrix.
After we get this matrix, just to recap the procedure, what do we need to do? We need to multiply each vertex by this composite transformation matrix. So, if a vertex is represented by the column vector P and the composite transformation matrix is M, then we compute M.P for each vertex to get the new vertex position in homogeneous coordinates. Eventually, to get the physical coordinates, we divide the x coordinate by the homogeneous factor, the y coordinate by the homogeneous factor and the z coordinate by the homogeneous factor.
In our case, of course, h=1, so it really does not matter; the x, y and z coordinates remain the same. But later on, as I mentioned earlier, we will see that there are situations where h≠1. In that case this division is very important, as we will see in subsequent lectures.
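As a quick cross-check of this worked example, the following NumPy sketch (again my own illustration, not from the slides) writes the five matrices explicitly, multiplies them right to left, and applies the result to a point in homogeneous coordinates, including the division by the homogeneous factor h (here h = 1). Only the centres given in the lecture are used; the function and variable names are assumptions.

    import numpy as np

    T1 = np.array([[1, 0, 0, -2],      # translate the centre (2, 2, 0) to the origin
                   [0, 1, 0, -2],
                   [0, 0, 1,  0],
                   [0, 0, 0,  1]], dtype=float)

    Rz90 = np.array([[0, -1, 0, 0],    # rotate 90 degrees anti-clockwise about z
                     [1,  0, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]], dtype=float)

    Ry90 = np.array([[ 0, 0, 1, 0],    # rotate 90 degrees anti-clockwise about y
                     [ 0, 1, 0, 0],
                     [-1, 0, 0, 0],
                     [ 0, 0, 0, 1]], dtype=float)

    S = np.diag([1.0, 2.0, 2.0, 1.0])  # scale by 2 along y and z

    T2 = np.array([[1, 0, 0, 0],       # translate to the new centre (0, 2, 2)
                   [0, 1, 0, 2],
                   [0, 0, 1, 2],
                   [0, 0, 0, 1]], dtype=float)

    # Composite matrix, multiplied right to left.
    M = T2 @ S @ Ry90 @ Rz90 @ T1

    def transform(vertex, M):
        """Apply M to a 3D vertex and divide by the homogeneous factor h."""
        x, y, z, h = M @ np.append(np.asarray(vertex, dtype=float), 1.0)
        return np.array([x / h, y / h, z / h])

    # Example: the centre (2, 2, 0) of the original square maps to (0, 2, 2).
    print(transform((2, 2, 0), M))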
So, with that, we come to the conclusion of our discussion on 3D transformations, and also of our discussion on the second stage, that is, modelling transformation. We started our discussion with 2D transformations; there we introduced the basic idea of modelling transformation, that is, to assemble objects that are defined in their own or local coordinate systems into a world coordinate scene. In order to do that, we perform geometric transformations, and any 2D transformation can be considered to be a sequence of basic transformations.
We have discussed 4 basic transformations, those are translation, rotation, scaling and
shearing. We also discussed why it is important to represent transformations in terms of
matrices, because of modularity and compatibility with subsequent stages when we are
implementing a package in the form of library functions or APIs or standard functions. Now,
for matrix representation we discussed the importance and significance of homogeneous
coordinate system and we have seen how to use the homogeneous coordinate system to
represent basic transformations or any composite transformation.
So, in summary, in modelling transformation we perform transformations by considering basic transformations individually or in sequence; these transformations are represented in the form of matrices, where the matrices are themselves representations in the homogeneous coordinate system. In 2D we have 4 basic transformations, and in 3D modelling transformation we have 6 basic transformations. Any transformation with respect to an arbitrary point or axis of rotation can be derived by using a sequence of basic transformations, the way we derived the composite transformations.
(Refer Slide Time: 42:13)
So in the next lecture, we shall start our discussion on the third stage of the graphics pipeline
that is assigning colour or the lighting.
(Refer Slide Time: 42:31)
Whatever I have discussed today can be found in this book. And you may refer to chapter 3,
section 3.4 to know more about the topics that I have covered today. So, we will meet you in
the next lecture. Till then, thank you and goodbye.
Computer Graphics
Professor. Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 13
Color Computation – Basic Idea
Hello and welcome to lecture number 13 in the course Computer Graphics. So, by now we have covered more than one-third of the course. Before we go into the next topic, let us pause for a moment and reflect on what we have learned so far. As you may recollect, we were discussing the process of displaying an image on a computer screen. This is a generic concept; of course, the screen may vary in size.
And it need not always be an image; it can be characters also. But broadly, what we are concerned with in this course is how a screen, a display unit or some output unit generates an image. That process is captured in the form of a set of stages which we call the 3D graphics pipeline, and currently we are discussing this pipeline. Let us have a relook at the stages of the pipeline.
(Refer Slide Time: 01:51)
As you can see here, there are 5 stages. The first stage is object representation: in this stage we define objects in their own or local coordinate systems. Then we have the second stage, that is, modeling or geometric transformation. In this stage, the objects that are defined in the local coordinates of each particular object are transformed to a world coordinate scene. So, here a transformation takes place from local coordinates to world coordinates.
The third stage is lighting. In this stage, we assign color to the points on the surface of the objects; we may consider this to take place in the world coordinate system itself. The fourth stage is actually a collection of sub-stages: it is the viewing pipeline, which consists of 5 sub-stages. The first of these is the viewing transformation, in which a transformation takes place from the world coordinate system to a new coordinate system called the view coordinate system.
Then there is one process called clipping which takes place in the view coordinate system,
then another process hidden surface removal which takes place in view coordinate system
again. After that there is another transformation called projection transformation. In this
stage, another coordinate transformation takes place from a 3D view coordinate system to a
2D view coordinate system.
And then we have another transformation, the window-to-viewport transformation. Here we transform the 2D view coordinate object to a device coordinate system. Together these 5 sub-stages constitute the fourth stage, that is, the viewing pipeline. And finally we have scan conversion, which is the fifth stage. Here also a transformation takes place, from the device coordinate system to a screen coordinate system. So, these are the stages of a 3D graphics pipeline.
Now, we have already discussed some of those stages, and some of them remain to be discussed. What have we discussed?
(Refer Slide Time: 04:30)
We have discussed the first stage, that is, object representation, and we have also finished our discussion on the second stage, that is, modeling transformation. Now we are going to start our discussion on the third stage, that is, lighting, or assigning color to the objects, or rather to the surface points of the objects. We will start with the basic idea of the coloring process: when we talk of coloring, what do we mean, and how can we actually implement the idea of coloring in the context of computer graphics?
(Refer Slide Time: 05:20)
Now, as I have already mentioned, the third stage deals with assigning colors. Why is that important? Let us again look at the example figures shown on the right-hand side of the screen. Here, as you can see, in the top figure we have one object to which we have assigned color, and there is another object at the bottom to which we have again assigned color. Now, what is the difference between the top figure and the bottom figure?
See in the top figure we have assigned color, but in this figure we are unable to perceive the
depth information. Now this depth information is very important to create an impression of
3D. This problem is not there in the lower figure here. In this case, as you can clearly see, by
assigning color in a particular way we manage to create an impression of a 3D object which
was not the case in the first image.
Now, how did we manage to do that? As you can see in the lower figure, the same color has not been applied everywhere. We have applied colors with different intensity values to give us the perception of depth. So, when we talk of assigning color, we are actually referring to this particular way of assigning colors so that we get the impression of depth.
(Refer Slide Time: 07:19)
Now, this appropriate color assignment can be considered to be the same as illuminating the scene with a light. Why do we get to see color? Because there is light. If there is no light, then everything will be dark and we will not be able to see color. So, essentially, when we talk of assigning color, we are referring to the fact that the scene is illuminated with a light, and based on that light we get to see the color.
So, in order to mimic this process of illuminating a scene with a light, in computer graphics we typically take the help of a particular type of model called a lighting model. In this third stage, we will learn about lighting models in detail.
(Refer Slide Time: 08:25)
What do these lighting models do? A lighting model computes the color and outputs a number, a real number, which represents an intensity value, the intensity of the light.
(Refer Slide Time: 08:51)
Now, the way these models are designed, they can only compute colors in terms of continuous real numbers. But, as we all know, computers are digital machines, so they can only process digital, discrete values; they cannot deal with continuous numbers. So, we need some method to map these continuous values to streams of 0s and 1s, that is, to digitize those continuous values, otherwise the computers will not be able to deal with them.
(Refer Slide Time: 09:44)
This mapping process is also very important in our process of assigning colors to objects in computer graphics, and it constitutes a crucial part of the third stage where we assign colors. So, we will also discuss this mapping process. Broadly, we will discuss two things: one is computing intensity values based on lighting models, and the second is mapping the continuous intensity values to discrete streams of 0s and 1s.
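As a small illustration of this mapping step (a sketch of the general idea only; the actual scheme used is discussed later in the course, and the function name and bit depth here are my own choices), a continuous intensity in the range [0, 1] can be quantized to one of a fixed number of discrete levels and stored as an integer, that is, as a string of bits.

    def quantize(intensity, bits=8):
        """Map a continuous intensity in [0.0, 1.0] to one of 2**bits
        discrete levels, returned as an integer the hardware can store."""
        levels = 2 ** bits
        clamped = min(max(intensity, 0.0), 1.0)
        return min(int(clamped * levels), levels - 1)

    # A continuous value produced by a lighting model ...
    print(quantize(0.7367))        # ... becomes the discrete level 188
    print(bin(quantize(0.7367)))   # ... i.e. the bit string 0b10111100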
(Refer Slide Time: 10:33)
Now let us try to understand the basic idea behind the process of illumination.
(Refer Slide Time: 10:41)
How do we get to see color? It is actually the outcome of a process, the process of illumination.
(Refer Slide Time: 10:54)
The process assumes that there is a source, a light source, which emits light. There may be one source, or there may be more than one source, but there has to be some source of light. This emitted light falls upon the point; in this figure, as you can see, we have one source, the light bulb, and it emits light. The light falls on these two object surfaces.
(Refer Slide Time: 11:44)
Now, sometimes the light that comes from the source need not fall directly on the surface point; instead, it can get reflected from another point and then fall upon the surface point. As we have shown here, it first falls on this object, gets reflected from there, and then finally falls on this point. So, at this point we have two lights incident upon it: one comes directly from the light source, that is, the direct light, and the other comes not directly from the light source but after getting reflected from another surface. That can be considered as an indirect source of light.
So, at this point we have light coming from a direct source as well as from an indirect source. This transportation of light energy from the source, direct or indirect, to the point is called illumination. It is the process of transporting light energy from a source to the point, either directly or indirectly.
(Refer Slide Time: 13:33)
Now, this incident light gets reflected from the object surface at this point and then falls upon our eyes, the eyes of the viewer. Once we receive that light, after it has been reflected from the object surface, we can perceive color. So, the intensity of this light incident on the eye is the perceived color, or simply the color, of the point. To recollect: a viewer is looking at this point here.
At this point there are two incident lights: one coming from the direct source through this path, one coming from the indirect source through this path. The process of transporting light energy from the sources to this point is illumination. After the point is illuminated, the light is reflected from it and reaches the eye of the viewer through this path. The intensity of this reflected light that reaches the eye is the perceived color, or simply the color, of the point.
So, essentially, what we perceive as color is the intensity of the light that is reflected from the point we are looking at.
(Refer Slide Time: 15:23)
Now, this process of computing the luminous intensity of the outgoing light at the point is known as lighting, and in computer graphics we are interested in simulating this lighting process. So, there are two processes involved, as we have just discussed: one is illumination, that is, the light from the source falling on the point either directly or indirectly.
And from that point the light gets reflected and reaches the eye of the viewer; that is lighting, and the intensity of this reflected light determines the color at that point. Since we are interested in determining the color at that point, we are interested in mimicking the lighting process.
(Refer Slide Time: 16:34)
Sometimes another term is used, shading, also known as surface rendering. This term refers to the process of assigning colors to pixels, a minor difference: earlier we were talking of assigning color to any surface point, now we are talking of assigning colors to pixels.
(Refer Slide Time: 17:06)
So, technically both are the same; both refer to the process of computing the color at a point. But there is a difference, particularly in the context of the usage of these terms in computer graphics. What is the difference?
(Refer Slide Time: 17:37)
They represent two different ways of computing the color at a point. When we talk of lighting and when we talk of shading, technically both refer to the same thing, that is, computing the color of a point. But, in practice, when we use these terms in graphics, we are referring to slightly different concepts, and these concepts are related to the way the color value is computed at the points.
(Refer Slide Time: 18:20)
In the case of lighting, we take into account the properties of the light source and of the surface. So, essentially, we are trying to simulate the optical phenomenon: the color of a surface point has to take into account the properties of the material of the surface and the properties of the light source. When we compute the color value taking these properties into account, we are talking of lighting. So, essentially, lighting refers to computing color taking into account all the optical properties that are relevant in color computation, in other words, a simulation of the optical phenomena.
(Refer Slide Time: 19:28)
In order to do that, we use lighting models. But these models, as we shall see later, are complex and involve a lot of computation, so essentially they are computation-intensive.
(Refer Slide Time: 19:53)
Now, in graphics, in practical applications, as may be obvious to all of us by now, when we are trying to render a scene there are a large number of points, so we need to compute the color at a large number of points. If we have to apply the lighting model at every point, then, since the model itself is complex and computation-intensive, the total time required to compute and assign colors to those points may lead to a delay in rendering the pixels.
In other words, if we apply the lighting model to compute the color of all the surface points, then we may not get a realistic 3D image in real time. So, it is in fact inappropriate to apply the lighting model to compute the colors at all the surface points. We have a model, we know how the optics works, and we have taken into account the characteristics and properties of both the surface and the source; but it is not advisable to compute the color using the lighting model alone, as that will lead to a delay in rendering the image.
(Refer Slide Time: 22:00)
In order to address this issue, typically an alternative approach is used. What is that approach?
(Refer Slide Time: 22:09)
Instead of computing the color at every point by using the lighting model, what we do is map the surface points to pixels and then use the lighting model to compute colors for only a selected small number of those pixels. So, we do not apply the model to all the pixels that are on the surfaces; instead, we apply it only to a very small subset of those surface pixels.
(Refer Slide Time: 23:02)
Subsequently, we use those values to interpolate the colors of the remaining surface pixels. Suppose we have a surface like this, and it is mapped to, say, this pixel grid, so these are the pixels. There are in total 16 pixels that constitute the surface; among these pixels we may apply the lighting model to compute the color of only one pixel and then use it to interpolate the colors of the remaining pixels. This interpolation is not done by applying the lighting model; instead, it is done with much less computation, mostly some iterative process.
So essentially, interpolating colors rather than computing them with the lighting model saves a lot of time. For example, in this case, earlier we had to use the lighting model for 16 points; now we use the lighting model for, say, one or two points, and the remaining 14 or 15 points are colored using interpolation, which is a simple iteration of simple computation steps. In this way we can save a lot of time.
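A minimal sketch of this interpolation idea (my own illustration; the function name and the example values are assumptions): the lighting model is applied only at the two ends of a scan line, and the colors of the in-between pixels are obtained by a simple incremental linear interpolation.

    def interpolate_scanline(color_left, color_right, num_pixels):
        """Linearly interpolate a color across a scan line, applying the
        (expensive) lighting model only at the two end pixels."""
        if num_pixels == 1:
            return [color_left]
        step = (color_right - color_left) / (num_pixels - 1)
        colors = []
        c = color_left
        for _ in range(num_pixels):
            colors.append(c)
            c = c + step          # one addition per pixel, no lighting model
        return colors

    # Intensities at the two ends, computed with the lighting model.
    print(interpolate_scanline(0.2, 0.8, 4))   # [0.2, 0.4, 0.6, 0.8] (approx.)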
(Refer Slide Time: 25:00)
Now, this interpolation-based process of pixel coloring is generally referred to as shading in computer graphics. So, we have lighting and we have shading, and although technically they are the same, in the context of our discussion of the third stage we will distinguish between the two: lighting refers to the application of a lighting model to compute color, and shading refers to the application of interpolation to compute color. We will learn about shading models in the subsequent lectures.
(Refer Slide Time: 25:47)
Now, let us try to learn, in brief, some of the background information which we will utilize
for our subsequent discussions on lighting model, shading model as well as the mapping from
continuous intensity values to discrete intensity values. First thing is the factors that affect
color. What affects color? So, there are two broad things. One is properties of the light source
and the other one is properties of the surface on which the point lies. So, the surface
properties as well as the light source properties determine the color of a point.
(Refer Slide Time: 26:50)
Now, surface properties include two types of properties. The first type is optical properties such as reflectance and refractance; I hope you are aware of these terms. Reflectance refers to the fact that some portion of the light gets reflected and some gets absorbed, while refractance refers to the fact that light gets refracted while passing through a surface. The amount of reflection or refraction is determined by the corresponding reflectance and refractance properties.
Apart from these optical properties, there are geometric properties or attributes as well, such as the position of the object surface with respect to the light source, its orientation with respect to the light source, and so on. These also determine the particular color that we perceive. That is about surface properties. What about the light source?
(Refer Slide Time: 28:23)
So, in graphics we typically consider 3 types of light sources. Let us have a look at these 3 types.
(Refer Slide Time: 28:29)
The first one is the point light source. Here we assume that such sources emit light equally in all directions from a single point which is dimensionless. How do we characterize this type of light source? Since there is no dimension, we do not need to characterize them by shape or size; instead, we simply characterize them by their position and the intensity value of the emitted light.
(Refer Slide Time: 29:11)
If we are trying to model a light source that is very, very far from the point, typically an infinitely distant source such as the sun, we can use this concept of a point light source. However, in such cases, since it is very far away, its position makes no sense, so we characterize such sources only by the intensity of the emitted light. Only the intensity of the emitted light characterizes light sources that are infinitely distant from the point.
(Refer Slide Time: 30:13)
Then we have the directional source or spotlight. We use this type of light source to simulate a beam-of-light effect. In this case we assume that it consists of a point light source that emits light within an angular limit, characterized by the angle θ. If a point is within this limit, it is illuminated; if it is outside this limit, it will not be illuminated by this particular light source.
(Refer Slide Time: 31:02)
So essentially, spotlight sources can be characterized by three things: the position of the point source, the angular limit characterized by this angle, and the emitted light intensity. Later on, while discussing the lighting model, we will see how this intensity varies from one point to another.
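The three source types can be summarized as simple data records. The sketch below (Python dataclasses with hypothetical field names of my own choosing) also shows a possible test for whether a surface point lies within a spotlight's angular limit.

    import math
    from dataclasses import dataclass

    @dataclass
    class PointLight:            # dimensionless: position + emitted intensity
        position: tuple
        intensity: float

    @dataclass
    class DistantLight:          # infinitely far away: intensity only
        intensity: float

    @dataclass
    class SpotLight:             # point source + angular limit (degrees)
        position: tuple
        cone_axis: tuple         # unit vector along the cone axis
        cone_angle: float
        intensity: float

    def inside_cone(light: SpotLight, point) -> bool:
        """True if the surface point lies within the spotlight's angular limit."""
        to_point = [p - q for p, q in zip(point, light.position)]
        norm = math.sqrt(sum(v * v for v in to_point))
        cos_phi = sum(a * v for a, v in zip(light.cone_axis, to_point)) / norm
        return cos_phi >= math.cos(math.radians(light.cone_angle))

    spot = SpotLight(position=(0, 5, 0), cone_axis=(0, -1, 0),
                     cone_angle=30.0, intensity=1.0)
    print(inside_cone(spot, (0.5, 0, 0.5)))   # True: well inside the cone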
(Refer Slide Time: 31:39)
The third type of light is ambient light, or light that comes from indirect sources. Sometimes there may be objects which are not directly illuminated by a light source, but we are still able to see them. How? Because the light emitted from the light source gets reflected by other objects that surround this particular object of interest, and that reflected light falls upon the object of interest and then comes to our eyes, so we get to see that particular object.
Like the example shown in this figure: even if we assume that the direct light is not available, we will still be able to see this particular point, because the light falls on this other object, gets reflected, falls on the point of interest, and from there gets reflected and comes to our eye. So, we get to see this point because it gets light from an indirect source. This is indirect illumination from surrounding surfaces.
(Refer Slide Time: 33:32)
That also is one type of light source, which we call ambient light. But if we want to model this ambient light effect, that is, how much light is reflected from surrounding surfaces and falls upon the point of interest, then, as you can probably guess, it is going to be quite complex, because there may be a large number of objects at different positions and orientations with different surface properties. If we need to calculate the luminous intensity from each of these surface points that ultimately falls upon the point of interest, that is going to take quite a lot of computation and is likely to be time-consuming. So, typically in graphics, to avoid such complex computations, we assume a simplified model of ambient light, which we will also see in our discussion on the lighting model.
(Refer Slide Time: 34:43)
Such a simplified model is called an ambient light source. We assume that there is an ambient light source which affects every surface point uniformly, which of course is not the case in practice, but we will see that this assumption leads to realistic images in most cases without too much additional computation.
So, to summarize, when we are talking of computing color, we need to take into account two things: one is the surface properties, both optical and geometric, and the other is the light source. We just discussed 3 types of light sources. One is the point light source, characterized by its position and the intensity of the emitted light; but if we are considering a point light source at a very distant location, then the position is not important, and only the emitted light intensity characterizes such sources.
Then we have the spotlight, characterized by the position of the point light source, the extent of the angular spread of the light and the intensity of the emitted light. The third type is the ambient light source, where we assume a simplified model of the ambient light effect, encapsulated in the form of a single light source which affects all the objects in a scene uniformly.
We will learn more about these sources during our discussion on the lighting model, where we will see how these sources and the surface properties affect the computations in a lighting model.
(Refer Slide Time: 37:12)
One more thing about this ambient light: in the simplified model we assume that such light sources do not have any spatial or directional characteristics. As a result, they are assumed to illuminate all surfaces equally and are characterized by only one thing, the ambient light intensity. These are crucial considerations for being able to model the lighting process without imposing too much computational overhead on the system.
(Refer Slide Time: 38:13)
So, with this background knowledge, we will discuss the idea of a lighting model in terms of a simple lighting model, which we will do in the next lecture. You may like to note the term 'simple': although we will see that in practice it is still complex, it is the simplest of all the lighting models that are used in graphics. We will also discuss the computations involved and how to reduce them by making some simplifying assumptions. That lighting model we will discuss in the next lecture.
(Refer Slide Time: 39:01)
Whatever I have discussed today can be found in this book, chapter 4, section 4.1. You may go through this section to learn in more detail about the topics that I discussed in today's lecture. See you in the next lecture. Till then, goodbye and thank you.
Computer Graphics
Professor. Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 14
Simple Lighting Model
Hello and welcome to lecture number 14 in the course Computer Graphics. As usual, we will
start by recollecting the pipeline stages which we are currently discussing.
(Refer Slide Time: 00:44)
So, there are 5 stages in the 3D graphics pipeline. If you may recollect, this pipeline refers to the process of rendering a 2D image on a computer screen, generated from a 3D scene. Among those 5 stages, we have seen the first stage, object representation. We have also discussed modeling transformation, the second stage. Currently we are discussing lighting, the third stage. After that there are two more stages: one is the viewing pipeline, which itself is a series of 5 sub-stages, and then finally scan conversion or rendering, which is the last stage.
(Refer Slide Time: 01:33)
As I just mentioned we are currently discussing the third stage that is lighting. So, the idea is
that we want to assign colors to the points that are on the surface of the objects that are there
in a scene. The assignment of colors to the surface point is the responsibility of the third stage
that is called the lighting stage. If you may recollect in the previous lecture we discussed the
basic concepts that are there behind this coloring of surface point.
The first thing is lighting: the light that comes to our eye after getting reflected from the point of interest determines the perception of color, and this process of perceiving color by receiving the reflected light from the point of interest is called lighting. We discussed that this lighting can be computed with the help of a simple lighting model. Today we are going to talk about that simple lighting model.
When we refer to a lighting model as simple, it means we are trying to simplify certain things. If you may recollect, a lighting model refers to the modeling of the process of lighting, which is clearly an optical process. When we use the term simple for a lighting model, we are essentially referring to the fact that many optical phenomena that happen in practice will be ignored.
Instead, we will make some simplifying assumptions about those phenomena and then
implement the lighting model.
(Refer Slide Time: 03:50)
So, in order to discuss the lighting model, we will start with the basic idea that is we use the
lighting models to compute colors at the surface points. So, essentially the job of the lighting
model is to enable us to compute colors at the points of interest.
(Refer Slide Time: 04:14)
If you may recollect, in the introductory lecture we mentioned that there are broadly two components that determine the color: one is the light source, the other is the surface properties. Now, for simplicity, let us start by assuming that there is a single light source which is monochromatic and a point light source. Monochromatic means it has only one color component.
If you recollect, we discussed point light sources, which are dimensionless and characterized only by their position and the intensity of the emitted light.
(Refer Slide Time: 05:04)
So, when we assume a monochromatic single point light source, what will the model look like? Let us try to derive it.
(Refer Slide Time: 05:17)
In order to do so, let us revisit our idea of perceiving a color, the process involved in perceiving a color. This is a light source in the figure, and this is the point of interest; at this point we want to compute the color. We get the color perception after we receive the light that is reflected from that point to our eye, the viewer's eye. Now, this light is a combination of two incident lights: one comes directly from the light source, this is the direct light; the other comes after getting reflected from a secondary object, this we call the ambient light. So, there are these two components, direct light and ambient light.
(Refer Slide Time: 06:28)
So, we can say that the reflected light intensity can be approximated as the sum of the intensities of the two incident lights, that is, the ambient light and the direct reflection. That is the simplifying assumption we are making.
(Refer Slide Time: 06:57)
Now, this reflection from a point can occur in two ways: one type is called diffuse reflection and the other type is called specular reflection.
(Refer Slide Time: 07:24)
Let us try to understand these different types of reflection with respect to one illustrative example. Look at the figure here: as you can see, different colors appear at different points on this object. This region has a slightly dark color, and this color comes from ambient reflection. Above this region, the whole area excluding the central region has a somewhat brighter color; that is due to diffuse reflection.
Diffuse reflection is defined as given here: when incident light tends to reflect in all directions from a rough or grainy surface, we get to see diffuse reflection. We assume that reflection of both the direct light and the ambient light can result in diffuse reflection. So, ambient and diffuse are technically both diffuse reflection, but we will differentiate between the two: by the term diffuse we mean diffuse reflection due to direct light, and by the term ambient we mean diffuse reflection due to ambient light.
(Refer Slide Time: 09:18)
For a shiny or smooth surface, we see a different sort of reflection: light gets reflected in a specific direction or region, and if the viewer is situated within that region, then the viewer gets to see a bright spot. You can see in this figure that the color in this zone is completely different from the surrounding surface region; this is a bright spot, and it results from another type of reflection called specular reflection.
So, we have this third component due to specular reflection. To summarize: diffuse reflection of ambient light gives us the somewhat dark color, diffuse reflection of direct light gives us the somewhat lighter color, and specular reflection gives us the bright spots.
(Refer Slide Time: 10:31)
So, in light of this knowledge, let us now try to derive the simple model. So, in the simple
model then we have 3 components. One component is due to the diffuse reflection of ambient
light, one component is due to the diffuse reflection of direct light and the third component is
due to the specular reflection of direct light that is incident at that point.
(Refer Slide Time: 10:59)
So, we can model the light intensity that reaches the viewer from the surface point of interest as a sum of 3 intensities.
(Refer Slide Time: 11:23)
What are these 3 intensities? Intensity due to ambient light, intensity due to diffuse light and
intensity due to specular light. Now when I say intensity due to ambient light I mean to say
the diffuse reflection of ambient light when I say intensity due to diffuse light I mean to say
diffuse reflection of direct light and when I say specular light I mean to say specular
reflection due to direct light.
So, the intensity at the point is the sum of these three intensities, which we denote by the terms Iamb, Idiff and Ispec.
(Refer Slide Time: 12:22)
Now, those are the components; how do we get them? One assumption is that the reflected light intensity is a fraction of the incident light intensity. How do we decide on this fraction? It is determined by a surface property known as the reflection coefficient or reflectivity. Recollect that in our earlier lecture we discussed two determinants of color.
One is light source, other one is surface property. Now, we are bringing in the surface
property here. So, we are assuming that the reflected light is a fraction of the incident light
and the fraction is determined by one surface property that is the reflectivity or the reflection
coefficient.
(Refer Slide Time: 13:26)
Now, in order to control the lighting effect in our computation, we define 3 such reflection
coefficients.
(Refer Slide Time: 13:41)
One for each of the 3 types of reflection: one coefficient for diffuse reflection due to direct light, one for diffuse reflection due to ambient light and one for specular reflection due to direct light. The diffuse reflection coefficient due to ambient light is denoted by ka, the diffuse reflection coefficient due to direct light is denoted by kd, and the specular reflection coefficient due to direct light is denoted by ks.
So, we are defining these three coefficients and we are also specifying the values that these
coefficients can take. It is defined as a range.
(Refer Slide Time: 14:34)
These coefficients can take values within the range 0.0 to 1.0. Now when we are specifying
the value to be 0.0, it represents a dull surface with no reflection so everything will be
absorbed. And when you are specifying the value 1.0, it represents the shiniest surface with
full reflection that is whatever gets incident to that point will be fully reflected from that
point. So, it reflects all the incident lights.
By varying these values, we can actually control the amount of dullness or shininess of the
surface of interest.
(Refer Slide Time: 15:26)
Now, as I said, there are three components which determine the color: the ambient light component, the diffuse reflection component due to direct light and the specular reflection component due to direct light. Let us try to model these individual components one by one. We will start with the ambient light component, which is the simplest to model, and here we will make a very simplifying assumption.
(Refer Slide Time: 15:59)
Here we will assume that every surface is fully illuminated by an ambient light with intensity Ia. That is our simplifying assumption: all points are illuminated by the same ambient light intensity Ia, and we will not consider the complex optical behavior of light after it gets reflected from surrounding surfaces. Instead we assume that any point is illuminated by a single intensity Ia representing the ambient light. So, essentially we are modeling ambient light as a single light source with intensity Ia.
(Refer Slide Time: 17:03)
We have already defined the reflectivity or reflection coefficient for ambient light. Now, if the light that is incident at a point is Ia, then based on our assumption the reflected light is the incident light multiplied by the coefficient, that is, Iamb = ka Ia. That gives us the ambient light component of the color. This is our simple model for ambient light: with it we can compute the contribution of ambient light to the overall intensity, which gives us the color.
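In code this is a single multiplication. A one-line sketch (the function and argument names are my own):

    def ambient_component(k_a, I_a):
        """Diffuse reflection of ambient light: a fixed fraction k_a of the
        single ambient intensity I_a, the same for every surface point."""
        return k_a * I_a

    print(ambient_component(k_a=0.2, I_a=1.0))   # 0.2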
(Refer Slide Time: 17:59)
Then we have the second component that is diffuse reflection component due to direct light
source.
(Refer Slide Time: 18:09)
In order to model this component, we make another assumption about how the surface reflects the incident light. We assume that all the surfaces in the scene are ideal diffuse reflectors, more popularly called Lambertian reflectors. These follow Lambert's cosine law, which states that the energy reflected by a small portion of a surface from a light source in a given direction is proportional to the cosine of the angle between that direction and the surface normal. This is Lambert's cosine law. What can we infer from this law?
(Refer Slide Time: 19:15)
The law implies that the amount of incident light from a light source on a Lambertian surface is proportional to the cosine of the angle between the surface normal and the direction of the incident light. If we assume that this is the point of interest in the right-hand figure, then the angle between the surface normal and the incident light direction at this point is called the angle of incidence.
(Refer Slide Time: 20:14)
Based on this, let us assume a direct light source with intensity Is and the angle of incidence
at the point is denoted by θ.
(Refer Slide Time: 20:31)
Then we can say that the amount of light incident at that point, according to Lambert's law, is Is cos θ.
(Refer Slide Time: 20:50)
If that is the incident light, we also know that a fraction of it is reflected and reaches the viewer's eye, and that fraction is determined by the diffuse reflectivity, or diffuse reflection coefficient, for direct light, which we denote by kd. So, the amount of light that gets reflected can be modeled as kd Is cos θ, and this will be the contribution of the diffuse reflection due to direct light to the overall intensity. This is our expression for computing the diffuse reflection component of the overall intensity value.
(Refer Slide Time: 21:59)
We can represent the same expression in a different way. Let L and N denote the unit direction vector to the light source from the point and the unit surface normal vector at that point, respectively.
(Refer Slide Time: 22:35)
Then the same expression can be rewritten, because cos θ can be represented as the dot product of the two unit vectors, N.L. So, if N.L > 0, the diffuse reflection component is Idiff = kd Is (N.L), and if N.L ≤ 0, it is 0. This is another way of writing the same expression, and we will follow this representation.
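A small NumPy sketch of this term (the vector names follow the lecture's symbols; everything else, including the example vectors, is my own):

    import numpy as np

    def diffuse_component(k_d, I_s, N, L):
        """Diffuse reflection of direct light on a Lambertian surface:
        k_d * I_s * (N . L), set to 0 when the light is behind the surface."""
        n_dot_l = float(np.dot(N, L))
        return k_d * I_s * n_dot_l if n_dot_l > 0 else 0.0

    N = np.array([0.0, 0.0, 1.0])                   # unit surface normal
    L = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)    # unit vector to the light
    print(diffuse_component(k_d=0.6, I_s=1.0, N=N, L=L))   # 0.6 * cos 45 deg ~ 0.424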
(Refer Slide Time: 23:32)
So, we have modeled two components. The third and remaining component to model is the specular reflection component.
(Refer Slide Time: 23:43)
We will model this component with an empirically derived formulation, which we will see shortly. This empirically derived model was proposed by Bui Tuong Phong back in 1973, and it is also known as the Phong specular reflection model; we will be using it in our simple lighting model.
(Refer Slide Time: 24:21)
What this model tells us is that the specular reflection intensity is assumed to be proportional to the cosine of the angle between the viewing vector and the specular reflection vector, raised to a power; that is the empirically derived law, so to say. Here V is the viewing vector, R is the specular reflection vector and the angle between them is φ, as shown here.
(Refer Slide Time: 25:39)
According to this empirically derived formula, the specular reflection component is proportional to cos^ns φ, where φ lies within the range 0 to 90 degrees. The power ns is called the specular reflection exponent, and by choosing this exponent judiciously we can generate different effects: if the value is large, say greater than 100, it generates a shiny surface effect, and if the value is close to 1, it generates a rough surface effect.
(Refer Slide Time: 26:34)
As in the case of diffuse reflection, for specular reflection also we can have a vector representation of the same expression. First let us see the actual expression for the specular reflection component. As we have said, the component is a fraction of the incident light, and ks is the specular reflectivity: earlier we said the specular intensity is proportional to cos^ns φ, and the proportionality constant is ks. So the actual component is ks multiplied by that expression. Now, cos φ can be represented by the dot product of V and R, where V and R are the unit vectors along the viewing direction and the specular reflection direction. So, if V.R > 0, we have this component, ks Is (V.R)^ns, for the specular term.
And if V.R ≤ 0, then the component is 0. Also, to make the expression consistent with the previous one, we will replace R in terms of the other vectors L and N, where L is the unit vector towards the light source and N is the surface normal. If we use this expression for R, then both reflection components due to direct light, diffuse as well as specular, can be computed in terms of L and N, rather than L and N in one case and V and R in the other. So, in the case of diffuse reflection due to the direct light source we have L and N, and in the case of specular reflection due to the direct light source we have L and N as well as V, the viewing direction, with R replaced in terms of L and N. These are the 3 components that we use to compute the overall intensity of the reflected light from the point of interest.
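A sketch of the specular term (the symbols follow the lecture; the function name, example vectors and the use of the standard reflection formula R = 2(N.L)N - L to express R in terms of L and N are my own assumptions):

    import numpy as np

    def specular_component(k_s, I_s, N, L, V, n_s):
        """Phong specular reflection: k_s * I_s * (V . R)**n_s, with the
        reflection vector computed from L and N as R = 2(N.L)N - L."""
        R = 2.0 * np.dot(N, L) * N - L
        v_dot_r = float(np.dot(V, R))
        return k_s * I_s * (v_dot_r ** n_s) if v_dot_r > 0 else 0.0

    N = np.array([0.0, 0.0, 1.0])                  # unit surface normal
    L = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)   # unit vector to the light
    V = np.array([-1.0, 0.0, 1.0]) / np.sqrt(2.0)  # unit vector to the viewer

    # Viewer exactly in the mirror direction: full specular contribution 0.8.
    print(specular_component(k_s=0.8, I_s=1.0, N=N, L=L, V=V, n_s=100))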
(Refer Slide Time: 30:05)
There is another interesting issue, called intensity attenuation. In the computations we have discussed so far, we assumed that the light intensity does not change as it travels from the source to the surface point; that is, the intensity emitted at the source and the intensity incident at a point some distance away from the source are assumed to be the same.
(Refer Slide Time: 30:41)
What is the problem with that assumption? Assume there are two surface points, one closer to the source and the other slightly farther away. The intensity of the light received by either of these points will be the same, because we are not assuming any change in the incident intensity with distance, so the color computed using our simple lighting model will also be the same. Nowhere in the computation do we explicitly take into account the distance travelled by the light, so the computed colors will be identical, and as a result we will not be able to perceive the relative difference in distance between the two points.
(Refer Slide Time: 31:41)
So, all surfaces will be illuminated with equal intensities irrespective of their distance which
will lead to indistinguishable overlapping of surfaces when projected on screen. So, we will
not be able to understand the distance between them which will reduce the perception of 3D.
(Refer Slide Time: 32:11)
In order to address this issue, we incorporate intensity attenuation in our model, in the form of attenuation factors. There are two such factors: the radial attenuation factor, denoted by AFrad, and the angular attenuation factor, denoted by AFang.
(Refer Slide Time: 32:42)
The radial factor accounts for the effect of diminishing light intensity over distance, and we model it using the inverse quadratic function AFrad = 1/(a0 + a1 d + a2 d²), where a0, a1, a2 are coefficients that we can vary to produce more realistic effects and d is the distance between the source and the surface point. By using this inverse quadratic function, we can take into account the effect of distance on the intensity.
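As a small sketch of this factor (the coefficient values here are only placeholders, not values from the lecture):

    def radial_attenuation(d, a0=1.0, a1=0.1, a2=0.01):
        """Inverse quadratic attenuation of intensity with distance d
        between the light source and the surface point."""
        return 1.0 / (a0 + a1 * d + a2 * d * d)

    print(radial_attenuation(0.0))   # 1.0  (no attenuation at the source)
    print(radial_attenuation(10.0))  # ~0.33 (dimmer farther away)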
(Refer Slide Time: 33:29)
The other attenuation factor is angular attenuation. We use it primarily to generate the spotlight effect. There are many ways to do this, but one commonly used function is shown here. With this function we can take into account the angular attenuation: the farther a point is from the spotlight cone axis, the more the intensity is reduced, and that reduction with respect to the cone axis can be computed using this expression. It will be 0 if the surface point is outside the angular limit θ: if a point is outside this limit, then of course it is not influenced by the spotlight, so the component is 0; but if it is within the limit, then the attenuation is computed from the angle φ that the point makes with the cone axis.
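A hedged sketch of one such function: a commonly used choice is AFang = (cos φ)^al inside the angular limit and 0 outside it, where al is an attenuation exponent. The exact function on the slide may differ; the exponent, names and values below are my own assumptions.

    import math

    def angular_attenuation(cos_phi, cone_angle_deg, exponent=2.0):
        """Spotlight attenuation: 0 outside the angular limit, otherwise
        (cos phi)**exponent, where phi is the angle from the cone axis."""
        if cos_phi < math.cos(math.radians(cone_angle_deg)):
            return 0.0
        return cos_phi ** exponent

    print(angular_attenuation(1.0, 30.0))    # 1.0  (on the cone axis)
    print(angular_attenuation(0.9, 30.0))    # 0.81 (off-axis, dimmer)
    print(angular_attenuation(0.5, 30.0))    # 0.0  (outside the 30 degree limit)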
(Refer Slide Time: 35:22)
Taking this attenuation into account, our simple lighting model changes. Earlier we had the model as the sum of three components, Iamb, Idiff and Ispec; now, taking the attenuation factors into account, it takes the form Ip = Iamb + AFrad AFang (Idiff + Ispec). Here AFrad denotes the radial attenuation factor and AFang denotes the angular attenuation factor. If these values are set to 1, then, as you can see, we eliminate the attenuation effect, while values other than 1 include the effect. That is all about the monochromatic point light source.
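In code the combined formula is a single line; the sketch below just combines the three components and the two factors, all of which would be computed as in the earlier sketches (function and argument names are my own).

    def point_intensity(I_amb, I_diff, I_spec, af_rad=1.0, af_ang=1.0):
        """Simple lighting model for one monochromatic point source:
        I_p = I_amb + AF_rad * AF_ang * (I_diff + I_spec).
        Setting both factors to 1 switches attenuation off."""
        return I_amb + af_rad * af_ang * (I_diff + I_spec)

    # Using the component values from the earlier sketches:
    print(point_intensity(I_amb=0.2, I_diff=0.424, I_spec=0.8))          # 1.424
    print(point_intensity(0.2, 0.424, 0.8, af_rad=0.33, af_ang=0.81))    # ~0.53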
(Refer Slide Time: 36:36)
Now let us assume a colored source; what happens in that case? A monochromatic light generates only different shades of gray, so if we have to generate color images, then we need to consider colored light sources.
(Refer Slide Time: 36:54)
As we discussed in the introductory lecture, when we talk of color we assume that there are three primary colors, red, green and blue, which together give us the perception of a particular color. Accordingly, we can assume that the source light intensity is a three-element vector: it has three component intensities, one for red, one for green and one for blue. Similarly, the reflection coefficients also have components: each coefficient, ka, kd and ks, is a vector with three values, one for each color. This is the only modification we make in order to take colored light sources into account.
(Refer Slide Time: 38:02)
Then we compute each color component separately using the light model with appropriate
source intensity and coefficient values. So, for computing the component for red we use the
red source intensity as well as the reflective coefficients for red. Similarly, for green and blue.
(Refer Slide Time: 38:37)
That is the modification. Finally, let us assume that there is more than one light source.
(Refer Slide Time: 38:48)
In that case, again, a simple extension is needed: earlier we had the ambient component plus the direct-light components; now we introduce a summation, so for each source we compute the direct-light components and add them up over all the n light sources. Note that the ambient component does not change, because it is the same for all points, with a single ambient light intensity.
The change is only in the components due to direct light, namely the diffuse component and the specular reflection component. This is the overall simple lighting model for multiple sources, and if we want color, then we simply compute IpR, IpG and IpB for red, green and blue, where the ambient and reflection coefficients are chosen according to the specific color component. So, we get 3 separate values, giving us a 3-element output. That, in summary, is our overall simple lighting model.
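Putting everything together, here is a compact sketch of the general model for n colored point sources. The vector math follows the formulas above; the data layout, function name and example values are my own assumptions, and attenuation products are passed in precomputed.

    import numpy as np

    def shade_point(k_a, k_d, k_s, n_s, I_ambient, lights, N, L_list, V,
                    att_list=None):
        """Simple lighting model for n colored sources. All intensities and
        coefficients are 3-element RGB vectors; each channel uses the same
        formula. 'lights' holds source intensities, 'L_list' the unit vectors
        towards each source, 'att_list' the AF_rad * AF_ang products."""
        k_a, k_d, k_s = map(np.asarray, (k_a, k_d, k_s))
        I_p = k_a * np.asarray(I_ambient)                 # ambient term, once
        if att_list is None:
            att_list = [1.0] * len(lights)
        for I_s, L, att in zip(lights, L_list, att_list):
            I_s = np.asarray(I_s)
            n_dot_l = float(np.dot(N, L))
            if n_dot_l <= 0:
                continue                                  # light behind surface
            diff = k_d * I_s * n_dot_l                    # diffuse term
            R = 2.0 * n_dot_l * N - L                     # reflection vector
            v_dot_r = max(float(np.dot(V, R)), 0.0)
            spec = k_s * I_s * (v_dot_r ** n_s)           # specular term
            I_p = I_p + att * (diff + spec)
        return I_p                                        # (I_pR, I_pG, I_pB)

    N = np.array([0.0, 0.0, 1.0])
    V = np.array([0.0, 0.0, 1.0])
    L1 = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
    L2 = np.array([-1.0, 0.0, 1.0]) / np.sqrt(2.0)
    print(shade_point(k_a=(0.1, 0.1, 0.1), k_d=(0.6, 0.4, 0.4), k_s=(0.5,) * 3,
                      n_s=50, I_ambient=(1.0, 1.0, 1.0),
                      lights=[(1.0, 1.0, 1.0), (0.5, 0.0, 0.0)],
                      L_list=[L1, L2], V=V))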
(Refer Slide Time: 40:30)
So, to summarize, we have discussed a simple model assuming that there is one point light source. Initially we assumed a monochromatic source, then colored light sources; initially we assumed a single light source, then multiple light sources. But in all cases we assumed a point light source, that is, a dimensionless light source characterized only by position and intensity.
Another thing you should note here is the simplifying assumptions we made. To compute the ambient light we assumed that there is a single ambient light intensity, which is not true in practice. To compute the diffuse component due to direct light, we assumed Lambertian surfaces, which again need not be true in practice. And to compute the specular component we assumed an empirically derived model, Phong's specular model, which does not reflect the actual optical behavior.
But in spite of these assumptions, what we get is a working solution to our problem of computing colors, and it works in practice. Although it does not reflect the actual optical behavior, it gives us a working solution, and because of these many simplifying assumptions we call it the simple lighting model. In order to discuss this simple lighting model we left out many important topics, which are actually designed to take the actual optical behavior into account. Those, in turn, give much better realistic effects, as is expected, but at the cost of heavily increased computation. To know more about such models, you may refer to the material mentioned in the next slide.
(Refer Slide Time: 42:48)
So, with this, we conclude our discussion on the simple lighting model. As I said, to learn more you may refer to this book, chapter 4, section 4.2, for more details on the topics I have discussed today; you may also refer to the reference material mentioned in that book and chapter for more details on the more realistic lighting models, which are much more complex than the simple model. With this, I conclude today's lecture. Thank you and goodbye.
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No 15
Shading Models
Hello, and welcome to lecture number 15 in the course Computer Graphics. As usual, we will
start with a quick recap of the pipeline stages that we are currently discussing.
(Refer Slide Time: 00:43)
So, as you may recollect, there are five stages in the graphics pipeline. The first stage is Object
Representation; the second stage is Modeling Transformation. The third stage is Lighting or
assigning color to the surface points. The fourth stage is the Viewing pipeline which itself
consists of five sub-stages namely Viewing transformation, Clipping, Hidden surface removal,
Projection transformation and Window to Viewport transformation.
The fifth and final stage of the graphics pipeline is Scan conversion. I would like to emphasize here again that although in this lecture, and in this course, I will be following this sequence of stages, in practice it is not necessary to follow the exact sequence. So, when a graphics package is implemented, you may find that some stages come after other stages even though, in the sequence I have discussed, they are actually before them; for example, Hidden surface removal may come after Scan conversion although we discuss it before Scan conversion. So, this sequence is not a strict requirement; the basic concepts are what matter the most.
So far, we have completed our discussion on the first two stages, namely Object representation and Geometric or Modeling transformation. Currently, we are discussing the third stage, that is, Lighting or assigning color to the surface points. In the Lighting stage, we have introduced the basic issues that are addressed, and in the previous lecture we went through a simple Lighting model. If you may recollect, in the simple lighting model we assume that the color is essentially a composition of three constituent colors or intensities.
Intensity due to ambient light, intensity due to diffuse reflection, and intensity due to specular
reflection. And we have learned models for each of these components and how to combine those
models in the form of a summation of these three individual components.
(Refer Slide Time: 03:23)
Today, we are going to discuss Shading models, which are related to assigning colors to the surface points, but in a slightly different way. As we have seen during the simple lighting model discussion, the model itself is computation-intensive.
(Refer Slide Time: 03:57)
The calculation of color at a surface point in a 3D scene involves lots of operations. As a result, generation of the image, which includes assigning colors, is complex and expensive in terms of computing resources, such as processor and memory, and it also takes time. So, when we talk of assigning or computing colors, which is the job of the third stage, what we are referring to is essentially the utilization of the underlying computing resources. And in the Lighting model we have seen that this utilization is likely to be very high, because the computation involves lots of mathematical operations on real numbers; it is also likely to take time.
(Refer Slide Time: 05:26)
In practice, whenever we use graphics applications, we may have noticed that the screen images change frequently. For example, in computer animation, computer games or any other interactive application, the screen content changes at a very fast rate. So, the requirement is that we should be able to generate new content and render it on the screen very quickly. But if we get bogged down with lots of complex computations for assigning colors, or, as we shall see, for the other pipeline operations, then that requirement may not be fulfilled; we will not be able to generate images quickly.
(Refer Slide Time: 06:32)
That may result in visible flicker and distortion, which in turn may lead to irritation and annoyance for the user, and we certainly do not want such a situation to occur. In order to avoid it by reducing the amount of computation involved in assigning colors to surface points, we make use of Shading models.
The idea of Shading models is that we have Lighting models and can make use of them to determine the color at a given point; however, if we do that for each and every point, it is likely to be computation-intensive and time-consuming. To reduce computation, we use some tricks in the form of Shading models.
(Refer Slide Time: 07:41)
So, what do we do with a Shading model? First, we use the Lighting model to compute the colors of only a few of the points on the surface. Then, using those computed values, we perform interpolation, and through interpolation we assign colors to the other surface points that are mapped to screen pixels. So, Shading models are used when the surface points have already been mapped to screen pixels, that is, after rendering has taken place.
(Refer Slide Time: 08:38)
Now, between the Lighting model and Shading model, there are broadly two differences.
(Refer Slide Time: 08:50)
We have already mentioned that the Lighting model is very expensive because it involves a large number of floating-point operations. In contrast, Shading models are interpolation-based. That
means, we can come up with efficient incremental procedures to perform the computations rather
than going for complex floating-point operations as we shall see in our subsequent discussions.
(Refer Slide Time: 09:28)
The other major difference is that Lighting models are applied on the scene description, that is, in the 3D world coordinate system, whereas, as we have just mentioned, Shading models typically work at the pixel level, after the scene is mapped to the screen, that is, after the rendering of the fifth stage of the pipeline is performed.
So, as I said at the beginning, it is not necessary that everything works as per the sequence we have outlined. In practice, things work with a slightly modified sequence; what is important is to know the basic concepts rather than sticking to the exact sequence of pipeline stages.
(Refer Slide Time: 10:23)
So, that is the idea of the Shading model, and those are the two major differences between Lighting and Shading models. Now, let us try to understand some Shading models briefly. We will start with the simplest of them, that is, Flat Shading.
(Refer Slide Time: 10:51)
So, it involves the least amount of computation. What does it do?
(Refer Slide Time: 11:00)
In the Flat Shading model, what we do first is find out the color of any one point on a surface using the Lighting model. So, we apply the Lighting model and compute the color of a single point on the surface, and then this color is assigned to all other surface points that are mapped to the screen pixels. So, suppose this is a surface, and this is the pixel grid that I am drawing here.
Consider this scan line here; the pixels that are part of the surface are these three. What we do in the Flat Shading model is choose any arbitrary point, apply the Lighting model and compute its color, the color of that particular point in the 3D world coordinate system, because we are required to compute the vectors as well, and then we use that color for all other pixels that are part of the surface. So, suppose we have computed the color at this point, say the color is C; then we use this color to set the color values of all other surface pixels. For example, these three we set to C.
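As a rough sketch of this idea, assuming we already have the list of pixels covered by the projected surface and some lighting function (such as the illustrative one given earlier), flat shading amounts to a single lighting evaluation; the names and data structures here are hypothetical.

```python
def flat_shade(surface_pixels, lighting_fn, sample_point_args):
    """Evaluate the lighting model once and reuse the color for every pixel.

    surface_pixels    : list of (x, y) pixel coordinates covered by the surface
    lighting_fn       : function evaluating the lighting model at a 3D point
    sample_point_args : arguments describing the single chosen surface point
    """
    c = lighting_fn(*sample_point_args)              # one lighting evaluation
    return {pixel: c for pixel in surface_pixels}    # same color C for all pixels
```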
(Refer Slide Time: 13:03)
Clearly, this is a very simple scheme, and it is likely to lead to unrealistic images unless we choose the application scenario properly. So, we must say that Flat Shading works in certain situations but is not, in general, good for coloring any surface. In general, we will not be able to use this particular Shading technique, because it may result in unrealistic images. When will Flat Shading be useful? There are a few conditions. Let us see what they are.
(Refer Slide Time: 13:46)
So, in order to make this particular Shading method work, we have to assume three things. First, the surface should be polygonal. Second, all light sources should be sufficiently far from the surface, so that noticeable shading effects, that is, sets of different intensities or colors across the surface, do not arise. Third, the viewing position is also sufficiently far from the surface. It may be obvious that if we assume the light source is very far away and the viewer is also looking at the scene from a very large distance, then the minute differences between colors at neighboring regions may not be perceivable to the viewer, and accordingly whatever color we assign will look uniform. In that case, Flat Shading may work, and these three conditions restrict the use of the Flat Shading algorithm.
I repeat: in order to make Flat Shading work, three conditions should be satisfied. First, the surface must be polygonal in nature; second, all light sources should be sufficiently far from the surface; and third, the viewing position should be sufficiently far from the surface. If these three conditions are not met, then the resulting colored surface may look unrealistic.
(Refer Slide Time: 15:56)
To avoid the problems associated with Flat Shading, there is an improved Shading model called Gouraud Shading. Let us try to understand Gouraud Shading.
(Refer Slide Time: 16:18)
It gives us a more realistic coloring effect than Flat Shading but, at the same time, it involves more computation. So, the improvement comes at the expense of increased computation.
(Refer Slide Time: 16:37)
In this Shading method, we first determine the average unit normal vector at each vertex of a polygonal surface; we will soon see what we mean by the average unit normal vector. Then, using that vector, we compute a color by applying a Lighting model at each vertex of the surface. Finally, we linearly interpolate the vertex intensities over the projected area of the polygon.
So, there are three steps: in the first step, we compute the average unit normal vectors; in the second step, we compute colors at the vertex positions using those average unit normal vectors; and in the third step, we linearly interpolate the colors computed at the vertices of the surface to assign colors to the other pixels that are part of the surface.
(Refer Slide Time: 17:56)
Now, let us try to understand the steps in detail. In the first step, we compute the average unit normal vector. The need arises because a vertex may be shared by more than one surface. For example, consider this vertex here; it is shared by all four surfaces in this figure. In that case, when we try to compute the color at this vertex, which surface normal should we use?
So, there is a confusion. In order to avoid it, Gouraud Shading tells us to compute the average unit normal vector, which is essentially the average of the unit normals of the surfaces sharing the vertex. In this particular example, the vertex is shared by four surfaces, each with its own normal vector: say N1 for Surface 1, N2 for Surface 2, N3 for Surface 3 and N4 for Surface 4. We take the unit normal vectors and compute the average using the simple formula: a vector addition divided by a scalar quantity, namely the magnitude of the sum of the four unit vectors. So, at that particular shared vertex, we compute and use the average unit normal.
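A minimal sketch of this averaging, assuming the unit normals of the sharing surfaces are already available as numpy vectors:

```python
import numpy as np

def average_unit_normal(unit_normals):
    """Average unit normal at a shared vertex:
    N_avg = (N1 + ... + Nk) / |N1 + ... + Nk|."""
    s = np.sum(unit_normals, axis=0)
    return s / np.linalg.norm(s)

# Hypothetical example: a vertex shared by four surfaces
n_avg = average_unit_normal([np.array([0.0, 0.0, 1.0]),
                             np.array([0.0, 1.0, 0.0]),
                             np.array([1.0, 0.0, 0.0]),
                             np.array([0.0, 0.0, 1.0])])
```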
(Refer Slide Time: 19:43)
Then, in the second step, with the average normal we compute the color at this vertex using the simple Lighting model. As you may recollect from our discussion of the simple Lighting model, to compute the color components for diffuse and specular reflection we had to use surface normals. So, instead of the regular surface normal, we use the average unit normal to compute the color. We do this for all the vertices of the surface: we take one surface at a time and compute colors for all the vertices that define that particular surface.
(Refer Slide Time: 20:39)
In the third step, which is the final step, we use these vertex colors to linearly interpolate the
colors of the pixels that are part of the projected surface. So, we are assuming here that the
surface is already projected on screen through the final stage of rendering and we already know
the pixels that are part of the surface. Since we have computed the vertex colors in the first two
stages, we use these colors to linearly interpolate and assign colors to other pixels that are part of
the surface.
(Refer Slide Time: 21:24)
Let us try to understand in terms of one example. So, in this figure, we have shown a projected
surface defined by three Vertices, Vertex 1, Vertex 3, Vertex 2. So, if we apply Gouraud Shading
after the second step, we have already computed the colors of these three vertices by using the
Simple Lighting model as well as the average unit normal vector at these vertex locations.
Now, we are interested to assign or find out the colors of the pixels that are part of the surface,
but not vertices. For example, there are Pixels 4, 5, 6, 7 these are all part of the surface, also 8
and many more. 4, 5, 6, 7 belong to the same Scan line, 4 and 8 belong to two consecutive Scan
lines.
(Refer Slide Time: 22:47)
So, what we do is perform linear interpolation in terms of the colors already computed for the vertices. We take one scan line at a time; for example, we have taken the (i+1)-th scan line. We compute the colors at 4 and 7, which are the two edge intersection points on the scan line, that is, the points where the edges of the surface intersect the scan line. We then apply interpolation, where I1 and I2 denote the intensity or color values already computed at Vertex 1 and Vertex 2. So, for I4 we require those two values, while for I7 we require I3 and I2, where I3 is the vertex color at Vertex 3; and y4, y2 and so on are the y coordinates of the corresponding points.
So, we first compute the colors I4 and I7 on the same scan line, and then, using I4 and I7, we compute I5, which lies inside the projected surface on the same scan line. The interpolation is shown here: I5 is computed in terms of I4 and I7. Note that to compute I4 and I7 we used y coordinates, whereas to compute I5 we use the x coordinates of the corresponding pixels.
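For reference, the edge-and-span interpolation just described can be sketched as follows; the formulas follow the standard Gouraud scheme (interpolating along the edges by y, then along the span by x), so the exact symbols should be checked against the slide.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + t * (b - a)

def gouraud_scanline(I1, y1, I2, y2, I3, y3, y_scan, x4, x7, x5):
    """Intensities at the edge points 4, 7 and the interior point 5.

    Point 4 lies on edge 1-2 and point 7 on edge 3-2 of the projected
    triangle; both are interpolated by y. The interior point 5 is then
    interpolated between 4 and 7 by x along the scan line.
    """
    I4 = lerp(I2, I1, (y_scan - y2) / (y1 - y2))   # along edge 1-2
    I7 = lerp(I2, I3, (y_scan - y2) / (y3 - y2))   # along edge 3-2
    I5 = lerp(I4, I7, (x5 - x4) / (x7 - x4))       # along the scan line
    return I4, I7, I5
```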
That is for a single scan line. What happens when we want to compute the colors on subsequent scan lines in terms of previously computed colors, say the color of the point 8?
(Refer Slide Time: 25:27)
That is also possible. Actually, the equations or the formula that I have shown in the previous
slide are not what is implemented in practice. There is a more efficient implementation of
Gouraud Shading where we do not necessarily always compute the ratios and multiply it with the
color values as we have seen in the previous slide. Instead, we perform interpolation with only
addition, the multiplication and division are not required.
However, for more details on this incremental approach of interpolation, you may refer to the
reference material mentioned at the end of this lecture. We will quickly have a look at the
corresponding algorithm.
(Refer Slide Time: 26:37)
The incremental approach is encapsulated here. In these two lines, as you can see, the color can be found by simply taking the color already computed plus some predetermined constants. Similarly, in this stage also we can use simple addition to compute the color, where the addition is between the previously computed color and some pre-computed constant, as shown in line 2 here. For more explanation of this algorithm, you may refer to the material mentioned at the end. The basic idea is that this linear interpolation can be computed using only addition, rather than the multiplication and division required if we do it in the classical way. So, this is a more efficient implementation of stage three of Gouraud Shading.
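The transcript does not reproduce the algorithm on the slide, but the incremental idea can be illustrated as below: along a span the intensity changes by a constant increment, so each new pixel needs only one addition. This is an illustrative reconstruction, not the slide's exact algorithm.

```python
def shade_span_incremental(I_left, I_right, x_left, x_right):
    """Fill one scan-line span using addition-only interpolation."""
    n = x_right - x_left
    dI = (I_right - I_left) / n if n > 0 else 0.0    # constant per-pixel increment
    span = []
    I = I_left
    for x in range(x_left, x_right + 1):
        span.append((x, I))
        I += dI                                      # one addition per pixel
    return span
```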
(Refer Slide Time: 27:59)
One more thing we should note here is that this particular Shading technique, Gouraud Shading, is implemented along with a later stage of the pipeline, namely hidden surface removal, which is a sub-stage of the fourth stage; we will discuss it later. So, Gouraud Shading assigns colors, but it is typically implemented together with that later sub-stage of the pipeline.
(Refer Slide Time: 28:41)
There are problems with Gouraud Shading as well. Although it generates more realistic images compared to Flat Shading, there are two major problems. One is that it is still not good at generating a specular effect, that is, the shiny surface or the bright spots that we get to see on a surface. This is primarily because the linear interpolation results in a smooth change of color between neighboring pixels, which is not what happens with specular reflection, where there is a sudden change between neighboring pixels.
Secondly, Gouraud Shading suffers from the occurrence of Mach bands, a kind of psychological phenomenon in which we see bright bands where two blocks of solid color meet. So, if two adjacent surfaces are assigned different colors, then at their junction we may get to see some band-like artifacts; this psychological phenomenon is known as the Mach banding effect, and it may result if we apply Gouraud Shading.
(Refer Slide Time: 30:13)
There is a third Shading method, which is quite advanced and it eliminates all problems that we
have discussed so far with Flat Shading and Gouraud Shading.
(Refer Slide Time: 30:26)
But, it is heavily computation-intensive and requires huge resources as well as time. We will just
learn the basic idea and we will not go into the details. So, this Phong Shading is also known as
Normal vector interpolation rendering.
(Refer Slide Time: 30:51)
Now, in this method, we actually compute the color at each point, and we find the normal vectors in a different way. So, there is no interpolation of colors involved; interpolation is used only for finding the vectors, not for computing the colors. As expected, it takes much more time, and it does not have the advantage of the other Shading models in terms of reduced computation.
So, it gives us a very realistic image because the coloring effect is closer to reality due to the
very sophisticated approach, but for the same reason, it cannot compute colors with reduced
computations, which are the advantages of Shading models. So, it is not having the main
advantage, but it gives us more realistic images. We will not go into the details of it, it is quite
complex. And if you are interested you may refer to the reference material that will be mentioned
at the end of this lecture.
(Refer Slide Time: 32:12)
I will just mention the three steps. In the first stage, we compute the average unit normal vector, as in Gouraud Shading. In stage two, we apply the Lighting model at each vertex to compute color, and in stage three we apply interpolation, but in a different way.
(Refer Slide Time: 32:43)
What is that difference? Instead of interpolating colors, we now interpolate to determine the normal vector at each projected pixel position. Remember that normal vectors assume that we are in the 3D world coordinate system, whereas the projected pixel position assumes that we are already in the device coordinate system, which is 2D. So, we need to calculate normal vectors in order to apply the lighting model, which involves the use of normal vectors.
We do that here in Phong Shading: the interpolation is not used to compute intensity, instead it is used to determine normal vectors. Once that is done, at each projected pixel we know the normal vector through interpolation, and we compute the color using the Lighting model. So, here we compute colors using the Lighting model, not through interpolation; the only difference is that, in order to compute a color with the Lighting model, we need a normal vector, which we find through interpolation.
So, essentially, to summarize: the surface is projected, we identify the set of pixels that constitute the surface, and at each pixel location we apply the Lighting model; before that, we use interpolation to find the normal vector at that pixel location, and then we use the Lighting model. So, we use the Lighting model repeatedly, which increases the computation and time.
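A minimal sketch of this per-pixel step, under the assumption that we interpolate between two known vertex normals with a parameter t and reuse a lighting function like the earlier sketch; all names here are illustrative.

```python
import numpy as np

def interpolate_normal(n_a, n_b, t):
    """Linearly interpolate two unit normals and renormalize the result."""
    n = (1.0 - t) * n_a + t * n_b
    return n / np.linalg.norm(n)

def phong_shade_pixel(n_a, n_b, t, lighting_fn, lighting_args):
    """Phong shading at one projected pixel: interpolate the normal, then
    evaluate the full lighting model at that pixel."""
    n = interpolate_normal(n_a, n_b, t)
    return lighting_fn(N=n, **lighting_args)         # lighting model per pixel
```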
For more details, you may refer to the material mentioned at the end; we will stop here with this outline of Phong Shading. Now, let us try to understand the idea of Shading in terms of one illustrative example.
(Refer Slide Time: 35:01)
Let us consider a cubical object with the given vertices A, B, C, D, E, F, G and H. With this object, we want to create a scene of a room in which the object is treated as a shelf attached to a wall, keeping the relative positions of the corresponding vertices the same. So, the relative positions will be the same, and there is also a specification for the wall: it is parallel to the XZ plane, cutting the positive Y-axis at a distance of 1 unit. The length is reduced by half, and we also mention the corresponding vertices of the shelf with respect to the original vertices. So, after the specified transformation, this figure shows the 3D scene with the shelf attached to the wall as specified in the problem.
(Refer Slide Time: 36:39)
We also have to know its projection in order to be able to apply Shading. That is mentioned here: the shelf looks something like this, as shown, with the vertices specified, each of which corresponds to a vertex in the original scene. So, F'' corresponds to F', E'' corresponds to E', and so on. In the projected scene, we have mentioned one vertex coordinate so that the other coordinates can be derived. For example, here we have mentioned the vertex coordinate (4, 7); from it we can derive E: the x value remains the same, while y is reduced by 5, so y will be 2, and so on for the other vertices. In that way, we can derive the locations.
(Refer Slide Time: 37:57)
Now, assume that the room has a monochromatic point light source at a given location with an intensity of 2 units, and also assume there is ambient light with an intensity of 1 unit. The reflection coefficients, or reflectivities, for the three components are specified: ka for ambient light, kd for diffuse reflection due to the direct light, and ks for specular reflection due to the direct light. The specular exponent is also specified as 10, and the viewer is located at this position.
(Refer Slide Time: 38:48)
Assuming this setting let us try to compute the colors at the pixels P1, P2, and P3 assuming the
simplest of all Flat Shading. So, this is P1, this is P2 and this is P3, how we can do that?
(Refer Slide Time: 39:13)
So, we first determine the coordinates of the projected vertices which should be easy.
(Refer Slide Time: 39:35)
Then, we have to compute the color at any given point on the surface. Note that as per the
problem description light source is above the surface A’, B’, C’, D’, and on the left side of the
plane which contains the surface B’, F’, G’, C’. Thus, it will illuminate this surface, but will not
contribute anything towards the illumination of the other surface. So, this is the first observation
of the problem description.
(Refer Slide Time: 40:17)
Now, in order to compute color, we can calculate color at any point and then use the same value
throughout the surface in Flat Shading. So, let us calculate color at this vertex B’.
(Refer Slide Time: 40:37)
From the scene and the object description, we know the surface normal at B', and the unit surface normal will be this. We also know the light source, so the unit vector towards the light source can be computed in this way, and the unit vector towards the viewer, since we know the viewer location, can be computed in this way.
(Refer Slide Time: 41:17)
Then, with these values, we can get the first dot product as something like this, and also the second dot product, for the specular component, as something like this. With these values and the reflectivity coefficients, we add up the three components to get the overall color value of 0.79 units at B'.
(Refer Slide Time: 42:06)
Now, we know that P1 and P2 are both part of the same surface containing B'. Since we are using Flat Shading and we have already computed the color at B', we simply assign that color to all the surface points, which means to P1 and P2. So, the color values of P1 and P2 will be 0.79 units.
(Refer Slide Time: 42:47)
We have also noted that the light source does not contribute to the illumination of the other surface B'F'G'C'. In that case, there will be no contribution from the direct light source, so the two components due to diffuse reflection and specular reflection from the direct light will be 0, and the surface will be illuminated only by the ambient light, which is computed using the expression ka times Ia, where ka is the coefficient and Ia is the ambient intensity; from that we get this value.
So, these are the values we have computed for P1, P2 and P3 using Flat Shading. Note that we did not use the Lighting model to compute the values at P1 and P2; instead, we computed the value only at B' and used that to assign colors to P1 and P2, and we did the same for P3.
So, here we have reduced the usage of the simple Lighting model and, by that, the amount of computation required. However, as I said before, since we are using Flat Shading, the computed colors may not look realistic when rendered on the screen if the distances of the source and the viewer from the surface are not sufficiently large.
(Refer Slide Time: 44:34)
Now, here also it may be noted that we have done some informal reasoning to come to the
conclusion of the color values. But if we simply apply the algorithms, then also we will get the
same result. We do not need to actually do any informal reasoning but that you can try on your
own. We will not work that out here.
(Refer Slide Time: 45:00)
I would also like to request you to use the Gouraud Shading algorithm to perform the same computations for the three points; I leave that as an exercise for all of you. You can then compare the amount of computation as well as the final values you obtain, and from there you can get some informal idea of the effect of applying these different Shading models.
So, we have come to the end of today's lecture. To quickly recap, we learned about the idea of Shading and its difference from the Lighting model. Then we discussed the Flat Shading and Gouraud Shading models in detail, and just outlined the idea of Phong Shading. With the illustrative example, I hope you got some idea of how Shading models are applied and of their advantage over applying only the Lighting model to compute colors. With that, I would like to end today's lecture.
(Refer Slide Time: 46:38)
For more details, including the ones that are mentioned at different points of the lecture you may
like to refer to this book. Please have a look at Chapter 4, Section 4.3 for the details on all the
topics that I have covered today. Thank you and goodbye.
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No 16
Intensity Mapping
Hello and welcome to lecture number 16 in the course Computer Graphics. We are currently discussing the different pipeline stages; the pipeline describes how the rendering of a 2D image on a computer screen takes place through the process of computer graphics.
(Refer Slide Time: 00:57)
Now, as we know, there are five stages; we have already discussed the first two, namely Object representation and Modeling transformations. Currently, we are discussing Lighting, the third stage, and after this we will be left with two more stages to discuss: the fourth stage, the Viewing pipeline, and the fifth stage, Scan conversion.
(Refer Slide Time: 01:25)
In the third stage, Lighting, we deal with assigning colors to the surface points of an object. In the previous couple of lectures, we learned about the process of coloring: we learned about a simple Lighting model and also about Shading models. To recap, the Lighting model is a complex mathematical expression to compute the color at a given point, and it makes use of the various components of light that are present when we see a colored object.
These components are ambient light, diffuse reflection due to a direct light source, and specular reflection due to a direct light source. For each of these we have learned models, and these models in turn make use of vectors, namely the surface normal, the viewing vector and the vector towards the light source; all of these vectors are used to compute the components. At the end, we sum up the three component contributions to get the overall color value, which is expressed in terms of an intensity value.
Now, this Lighting model is complex and involves lots of operations, so it essentially takes time. In order to reduce the computation time, we learned about Shading models, where we do not compute color values using the Lighting model at each and every point; instead, we compute values at a very small number of points, maybe a single point on a surface, and use interpolation techniques to assign colors to the other points.
This interpolation technique is much simpler compared to the Lighting model computations. These two techniques for assigning colors were discussed in the previous lectures. One more thing remains: how we map the intensity values computed, either with the Lighting model or with the Shading models, to a bit sequence, a sequence of 0's and 1's, that the computer understands. That will be the subject matter of today's discussion, Intensity Mapping. This is the third component of assigning color to an object surface.
(Refer Slide Time: 04:25)
Now, when we talk of Intensity Mapping, what do we refer to? We refer to a mapping process. What does it map? It maps the intensity value that we have computed using the Lighting or the Shading model to a value that a computer understands, that is, a string of 0's and 1's.
(Refer Slide Time: 04:50)
If you may recollect, during the worked-out examples that we have discussed in the previous
lectures, we have seen the computation of intensity values. And those values are real numbers
typically within the range 0 to 1. Now, these values are supposed to be used to drive the
mechanism to draw pictures on the screen.
In the introductory lectures, we have touched upon the basic idea of a graphics system. There we
mentioned that through the pipeline stages we compute intensity values and these values are used
to basically drive some electromechanical arrangement which is responsible for rendering or
displaying a colored object on a computer screen.
As an example, we briefly touched upon the idea of cathode ray tube displays. As you may recollect, we said there that a CRT display consists of an electromechanical arrangement in which electron beams are generated and made to hit specific locations on the screen
representing the pixel grid. Now, this generation of electron beams is done through an
electromechanical arrangement consisting of cathodes and anodes and magnetic fields.
And this electromechanical arrangement is controlled by the values that we compute at the end
of the pipeline stages. So, our ultimate objective is to use the values, intensity values and use
them to drive the mechanism that actually is responsible for drawing colors on the screen or
drawing pictures on the screen.
(Refer Slide Time: 07:18)
As we have already mentioned, in a CRT display, this picture drawing is done by an arrangement
of electron guns, which emits electron beams, and there is a mechanism to deflect those beams to
specific regions on the screen where phosphor dots are present. And when the beam hits the
phosphor dots, the dots emit photons with particular intensity that is light intensity, which gives
us the sensation of a colored image on a screen.
Of course, CRT displays are now obsolete, and you may not be familiar with them nowadays, but there are lessons to learn from them. Towards the end of this course, we will learn about other displays where similar things happen, where we use the computed intensities to generate some effect on the screen that gives us the sensation of color, and where these computed intensity values drive the mechanism that generates those effects. We will talk about such display mechanisms at the end of the course, where we will have dedicated lectures on Graphics Hardware.
(Refer Slide Time: 09:00)
Now, the point is this: we are saying that these intensity values are supposed to drive a mechanism, some arrangement, which in turn is responsible for generating the effect of a colored image. But if the intensity values are computed as real numbers in the range 0 to 1, how do we make the computer understand those values? Computers do not understand such real numbers directly; they only understand digital values, binary strings of 0's and 1's.
The problem here is that an arbitrary intensity value cannot be represented and used for the purpose of driving the arrangement that generates the visual effect of a colored image on the screen, and we need some way to represent the corresponding intensity values in the computer. This representation depends on how we design the frame buffer.
(Refer Slide Time: 10:25)
And that is what we call the Mapping Problem. What is this problem?
(Refer Slide Time: 10:40)
Let us try to understand it in terms of an example. Suppose we have a graphics system with a frame buffer that has 8 bits for each pixel location, that is, 8 bits to store the intensity value of each pixel. Now, with 8 bits, how many colors can we represent? That is 2 to the power 8, or 256 values. It means that for each pixel location we can assign any one of 256 values as the color value. So, for that particular graphics device, we can say that any pixel can take at most 256 color values.
(Refer Slide Time: 11:37)
On the other hand, when we are computing the pixel colors, there is no such restriction; we can compute any value between 0 and 1, which is essentially an infinite range of values. Note that this computation takes place with the help of the Lighting or Shading models. So, on the one hand, we have values that can be anything, real values between 0 and 1 obtained by applying the Lighting or Shading models; and on the other hand, due to the particular hardware design, we can represent at most a restricted number of values for each pixel location, 256 values in our example. Essentially, we need to map these potentially infinite intensity values to the 256 values; this is the problem. Given the set size, that is, the number of values that can be represented in the computer, we have to map the potential range of values to that restricted set.
(Refer Slide Time: 13:03)
This is our mapping problem. We have to keep in mind that we cannot use just any arbitrary mapping, because that may lead to visible distortion; our perception is a very sensitive and complex thing. If we decide the mapping arbitrarily, then we may perceive the image differently from what ideally should have been the case. So, avoiding this distortion is another objective of our mapping.
We need to map, and we need to map in such a way that this distortion is not there. How can we achieve this objective? Let us try to understand the scheme through which we can achieve it.
(Refer Slide Time: 13:58)
So, that is the Mapping scheme.
(Refer Slide Time: 14:01)
The core idea behind the scheme is that we need to distribute the computed values among the
system supported values such that the distribution corresponds to the way our eyes perceive
intensity difference. So, this is a slightly complex idea. Let us try to understand this in terms of
some example.
(Refer Slide Time: 14:37)
Now, this core idea actually relies on our psychological behavior, that is, on how we perceive intensity differences.
(Refer Slide Time: 14:58)
Let us take one example. Suppose there are two sets of intensity values. In the first set there are two intensities, 0.1 and 0.11, so the difference between the two intensities is 10 percent. In the second set there are also two intensities, 0.5 and 0.55; again the difference is 10 percent. But, due to our psychological behavior, we will not be able to perceive the absolute difference between the intensity values; the difference will look the same, although the absolute values are different.
So, in the first case we have two absolute values whose relative difference is 10 percent, and in the second set we have two absolute values, different from the first set, but with the same 10 percent relative difference.
If we are asked to look at those two sets of values, we will not be able to perceive a difference between them, because of our psychological behavior: we do not perceive absolute differences, instead we perceive relative differences. If the relative differences are the same, then we will not perceive any difference, in spite of the absolute differences being there.
(Refer Slide Time: 16:45)
So, that is one crucial behavioral trait of ours: we cannot perceive absolute differences in intensity values, only relative differences matter. Now, if that is the case, then we can utilize this knowledge to distribute the intensity values among the device-supported intensity values. How can we do that?
(Refer Slide Time: 17:17)
It follows from this behavioral trait that if the ratio of two intensities is the same as the ratio of two other intensities, then we perceive the differences as the same. This is an implication of the psychological behavior we just described, and using it we can distribute the intensities. Let us see how.
(Refer Slide Time: 17:56)
Recall that we are given a continuous range of values between 0.0 and 1.0; this is the range of intensity values computed using the Lighting or Shading model. On the other hand, the device supports a set of discrete values, because the frame buffer is designed that way, and we are supposed to map this continuous range to that set of discrete values. The continuous range needs to be distributed over the finite set of discrete values.
We can do that without distorting the image by preserving the ratio between successive intensity values. If we preserve this ratio, then even if we approximate a computed intensity by a device-supported intensity, the resulting image will not appear distorted. This comes from the psychological trait we just discussed: our eyes are not designed to perceive absolute differences in intensities; instead, only relative differences matter.
(Refer Slide Time: 19:35)
So, based on this reasoning, we can come up with a mapping algorithm, a step by step process to
map a computed intensity to one of the device supported intensities.
(Refer Slide Time: 19:52)
Let us assume that the device supports a fixed set of discrete values for each pixel, and let us denote these values by I_0, I_1, up to I_N; counting I_0, there are N+1 discrete values in total.
(Refer Slide Time: 20:23)
Now, we can use a particular device called a photometer to determine the boundary values, that is, I_0 and I_N. It means that we know the range of intensities supported by that particular system; this is called the dynamic range, which is bounded by I_0 and I_N.
(Refer Slide Time: 21:12)
Now, the highest value, that is I_N, is usually taken to be 1.0; that is the convention used. So, the intensities range between I_0 and 1, that is, the range [I_0, 1]. This is the dynamic range, and the value of I_0 we can obtain using the photometer.
(Refer Slide Time: 21:44)
Now, we will apply the knowledge that we have just discussed: to preserve the ratio between successive intensities, we must ensure that I_1/I_0 = I_2/I_1 = ... = I_N/I_(N-1) = r, a common ratio. So, the ratio of consecutive intensity values supported by the device should be the same.
(Refer Slide Time: 22:26)
In other words, we can express all intermediate values in terms of the lowest value. So, I_1 we can write as r I_0, I_2 similarly as r^2 I_0, I_3 as r^3 I_0, and so on.
(Refer Slide Time: 22:52)
So, in general, we can say that the equation I_k = r^k I_0 holds, where I_0 is the minimum intensity and k > 0. Going along this line, we can say I_N = r^N I_0. So, this equation holds for any intensity value supported by the device. Here you can notice that the total number of intensity values supported by the device is N+1, where I_N is the maximum intensity value and I_0 is the minimum intensity value.
So, what do we need to do? As we already discussed, we have determined the minimum value using a photometer, and we assume the maximum value to be 1. Then, using this equation, we can determine the value of r by solving 1 = r^N I_0, where we know the value of I_0 and we know N from the total number of intensity values supported by the device. Using this value of r, which we compute by solving the equation, we can obtain all the intensity values from I_k = r^k I_0 for any particular k.
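As a small illustration of this step, assuming I_0 has been measured and the maximum level is taken as 1.0:

```python
def device_intensity_levels(I0, num_levels):
    """Device-supported intensities I_0 .. I_N with a constant ratio r.

    Solves 1.0 = r**N * I0 for r (with N = num_levels - 1) and returns
    (r, [I0, r*I0, r**2*I0, ..., 1.0])."""
    N = num_levels - 1
    r = (1.0 / I0) ** (1.0 / N)          # from 1 = r^N * I_0
    return r, [I0 * r ** k for k in range(num_levels)]
```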
(Refer Slide Time: 25:43)
Now, let us see what to do next. In the previous step, we computed the value of r knowing the minimum value, the maximum value and the total number of intensity values supported by the device; based on that we can compute any I_k. Now, suppose, using a Lighting model, we compute an intensity value for a pixel; let us denote this intensity value by I_P.
We maintain a table in which we keep the intensity values supported by the device, computed using the earlier equation: I_0, which we get with the photometer, then I_1, I_2, and so on up to I_N. Once we compute I_P, we look in this table for the value that comes closest to I_P, that is, the nearest value; let us call it I_k. For each value in the table we also store a bit pattern: bit pattern 0, bit pattern 1, bit pattern 2, and so on up to bit pattern N for the N+1 intensity values. So, for the k-th intensity value I_k in the table, we know the corresponding bit pattern. We then take that bit pattern and store it in the frame buffer. That is how we map a value computed using a Lighting model to a bit pattern that represents a value supported by the device.
(Refer Slide Time: 28:23)
So, in summary, what we do is the following. We determine the value of N and the minimum value I_0 using a photometer, and assume I_N, the maximum value, to be 1.0. Then, using the equation I_N = r^N I_0, we solve for the value of r. Using this value of r, we calculate the device-supported intensity values: we know I_0, then we calculate I_1 = r I_0, I_2 = r^2 I_0, and so on. For each of these computed values we keep a bit pattern; this is our table, up to the bit pattern for the maximum value. Then, for a pixel, we compute the intensity value using a Lighting model, map it to the nearest device-supported intensity value by looking at the table, use the corresponding bit pattern to represent that computed intensity value and, finally, store that pattern in the frame buffer. That is how we map a computed intensity value to a bit pattern and store it in the frame buffer location.
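Putting the pieces together, the table construction and the nearest-value lookup can be sketched as below, reusing the level-building helper above; the bit-pattern encoding (simply the binary form of the table index) and the function names are illustrative assumptions.

```python
def build_intensity_table(I0, num_levels):
    """Pairs of (device intensity, bit pattern); the bit pattern here is
    simply the binary encoding of the table index."""
    r, levels = device_intensity_levels(I0, num_levels)
    width = len(format(num_levels - 1, 'b'))
    return [(Ik, format(k, '0{}b'.format(width))) for k, Ik in enumerate(levels)]

def map_intensity(Ip, table):
    """Map a computed intensity Ip to the nearest device-supported entry."""
    k = min(range(len(table)), key=lambda i: abs(table[i][0] - Ip))
    Ik, pattern = table[k]
    return k, Ik, pattern   # index, device intensity, bit pattern for the frame buffer
```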
(Refer Slide Time: 30:18)
Let us try to understand this whole process using one example.
(Refer Slide Time: 30:25)
Suppose we have a display device that supports a minimum intensity I_0 of 0.01; this value, of course, as we mentioned earlier, is found with the photometer device. As usual, we assume the maximum intensity value supported by the device to be I_N = 1.0.
(Refer Slide Time: 30:58)
Let us assume that the device supports 8 bits for each pixel location; in other words, it has 8 bits to represent the color of a pixel. Then the total number of intensity values supported by the device for each pixel, which we denote by M = N+1 as discussed earlier, is 2^8 or 256. So, M = 256, which means N is 255, and I_0 to I_255 are the intensity values supported by the device.
(Refer Slide Time: 31:59)
So, these intensity values we can denote by I_0, I_1, I_2, up to I_255. Now, we can set up an equation based on the relationship I_N = r^N I_0; substituting the values of I_N, I_0 and N gives 1.0 = r^255 × 0.01, and we solve this equation to get the value of r.
(Refer Slide Time: 32:38)
Solving this, we get r = 1.0182, and using this value we get the other intensity values: I_1 = r I_0, that is 0.0102; I_2 = r^2 I_0, that is 0.0104; and so on. We then create a table of these values.
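As a quick check, the helper sketched earlier reproduces these numbers (up to rounding):

```python
r, levels = device_intensity_levels(0.01, 256)
print(round(r, 4))           # 1.0182
print(round(levels[1], 4))   # 0.0102  (I_1)
print(round(levels[2], 4))   # 0.0104  (I_2)
```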
(Refer Slide Time: 33:15)
In this table we also assign bit patterns. To I_0 we assign the all-zero pattern, to I_1 this bit pattern, to I_2 this bit pattern, and so on, up to this bit pattern for the last value. This is, of course, one possible mapping; the assignment of bit patterns can be arbitrary, it really does not matter, because it has nothing to do with the preservation of the ratio. What matters is the actual calculation of the intensity values, which is done based on the principle of preserving the ratios of successive intensity values, so that the resulting image is not distorted.
(Refer Slide Time: 34:09)
Now, let us assume that we have computed an intensity value using the Lighting model at a pixel location, and that value is 0.01039. So, this is our table, and we have computed this value.
(Refer Slide Time: 34:33)
So, as per the algorithm, we try to find the nearest intensity value that the device supports. In this case, that is I_2, or 0.0104, and the bit pattern corresponding to I_2 is this one. So, we store this bit pattern at the corresponding frame buffer location.
(Refer Slide Time: 34:58)
Here you may note that the final intensity value that we represent and store in the frame buffer is different from the actual value computed using the Lighting model, because of the mapping. It means that there is some error; there will always be some error. Although, with the preservation of the ratio of successive intensities, we can alleviate the problem of visual distortion in the resulting image, there are still ways to improve the selection of the intensity that represents a computed intensity.
(Refer Slide Time: 35:57)
And there are some techniques to do that, at different level, one is Gamma correction, other one
is Error correction for intensity mapping through halftoning or dithering methods. However, we
will not go into the details of these methods. The basic idea is that using these methods, we can
actually reduce the effect that arises due to the introduction of mapping errors, the difference
between computed intensity, and the intensity that we represent and store in the frame buffer. If
you are interested, you may refer to the material that will be mentioned at the end of this lecture.
(Refer Slide Time: 36:48)
So, in summary, what can we say?
(Refer Slide Time: 36:53)
In stage three, there are three broad concepts that we have covered, what are these concepts.
(Refer Slide Time: 37:09)
The first is the Lighting model. This is the basic method we follow to simulate the optical properties and behavior that give us the sensation of color. Now, the Lighting model is complex, so, in order to avoid the complexity, we take recourse to Shading models; this is the second concept that we have learned. A Shading model is essentially a way to reduce computation while assigning colors to surface points: it makes use of the Lighting model, but in a very limited way, and uses interpolation, which is less computation-intensive, to assign colors to the surface points.
The third concept that we have discussed is Intensity Mapping. With a Lighting or Shading model, we compute color as a real number within the range 0 to 1, so any value can be computed. However, a computer does not support arbitrary values; it is discrete in nature, so only a subset of all possible values is supported. For example, if we have an 8-bit frame buffer, meaning each pixel location is represented by 8 bits, we can support at most 256 intensity values for each pixel. A pixel color can be any one of these 256 values, whereas we compute color as any value between 0 and 1. So, we need to map it; this mapping is complex, and it introduces some amount of error, which may result in distortion.
However, to avoid distortion, we make use of one psychological aspect of our visual perception: we distribute the computed, or potential, intensities among the device-supported intensities in such a way that the ratio of consecutive intensities remains the same. If we do that, then the perceived distortion of the image may be avoided. In spite of that, however, we still introduce some error, which may affect the quality of the image.
Because whatever color we are computing, we are not actually assigning exactly the same color
to the final image, instead, we are mapping it to a nearest color. So, in turn it may affect the
overall quality. To reduce that, few techniques are used like Gamma correction or Error
propagation. And these techniques you can go through on your own in the reference material that
we will mention at the end. In this lecture, we will not go into the details of those techniques, as
those are not relevant for our discussion.
(Refer Slide Time: 40:44)
In the next lecture, we will discuss another important aspect of the third stage, that is, color models. Along with that, we will also learn about texture synthesis; both are part of the third stage, that is, coloring. So far we have learned three concepts, and two more we will learn in the subsequent lectures.
(Refer Slide Time: 40:44)
Whatever we have discussed today can be found in this book. You may refer to Chapter 4,
Section 4.5, to learn about the topics, and also you may find more details on the topics that we
mentioned, but we did not discuss in details, namely the Error propagation techniques and
Gamma correction techniques. That is all for today. Thank you and goodbye.
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 17
Color models and texture synthesis
Hello and welcome to lecture number 17 in the course Computer Graphics. Currently we are
discussing the third stage of the graphics pipeline.
(Refer Slide Time: 0:40)
We have already covered first two stages in our earlier lectures and currently we are
discussing the third stage that is lighting or assigning colour to the surface points of objects in
a scene.
(Refer Slide Time: 0:58)
We have already covered 3 topics in this stage namely lighting, shading and intensity
mapping. Today also we will continue our discussion on the activities that we perform in the
third stage of the pipeline. So, two of the concepts that are part of third stage will be
discussed today, one is colour model, other one is the texture mapping.
(Refer Slide Time: 1:33)
And with the discussion on these two topics, we will conclude our overall discussion on the
third stage of the graphics pipeline. So, before we start our discussion on the topics I would
like you to understand the basic idea behind our perception of color, in order to do that we
need to know the psychology and physiology of vision. How do we perceive?
(Refer Slide Time: 2:11)
476
We mentioned earlier that color is a psychological phenomenon. So, it is essentially our
perception that there is a color. Now, from where this perception comes? That is due to the
physiology of our visual system or the way our eyes are made and the way they work. So,
essentially the physiology determines the psychological effect.
(Refer Slide Time: 2:54)
Let us go through the physiology of vision briefly. Look at the figure here; it shows how our eye is organized. We have the cornea on the outside of the eye, and then the other components: the pupil, iris, lens, retina, optic nerve and the central fovea. Now, when light arrives after getting reflected from a point, suppose this is the light ray; as the figure shows, the light rays incident on the eye pass through the cornea, that is this component here, the pupil, this component here, and the lens, that is this component here, and after passing through these they reach the back side, that is, the retina.
(Refer Slide Time: 4:43)
Now, during its passage through these various components, the light gets refracted by the cornea as well as the lens so that the image is focused on the retina; the lens and the cornea help to focus the image on the retina. Once the light rays fall on the retina, an image is formed and then transmitted to the brain through the optic nerve.
(Refer Slide Time: 5:34)
The amount of light entering the eye is controlled by the iris, this component, and that is done through the dilation or constriction of the pupil, this one; the iris dilates or constricts the pupil to control the amount of light entering the eye.
(Refer Slide Time: 6:12)
Now, I said that once the light rays fall on the retina, an image is formed. How is that done? The retina is composed of optic nerve fibers and photoreceptors; they help in forming the image and transmitting it to the brain.
(Refer Slide Time: 6:42)
Now, there are two types’ photoreceptors; rods and cones. Rods are more in the peripheral
region of the retina, this region whereas, Cones are mainly in a small central region of retina
called the phobia, this component. So, we have two types of photoreceptors rods and cones in
retina which receives the light and help create the image.
(Refer Slide Time: 7:25)
One more thing: more than one rod can share an optic nerve fiber, so there can be more than one rod for each optic nerve fiber in the retina. The rods help in one thing: they aid sensitivity to lower levels of light, so when the light is not very bright we still manage to see things; that is due to the presence of the rods.
(Refer Slide Time: 8:15)
On the other hand, in the case of cones, the other kind of photoreceptor, there is more or less one optic nerve fiber for each cone, unlike the case of rods, and this aids image resolution or acuity.
(Refer Slide Time: 8:39)
Now, when we get to see something with the help of cones that is called photopic vision and
when we get to see something with the help of rods that is called scotopic vision. So, there
are two types of vision, photopic and scotopic.
(Refer Slide Time: 8:58)
And when we say we are seeing a coloured image, the fact is we perceive colors only in case
of photopic vision. In scotopic vision, we do not get to see colors instead, we get to see series
of grays or different gray levels rather than different colors. So, this is very important, we
should remember that when we talk about colored images, that means we are perceiving
colors, so we are talking about photopic vision only.
(Refer Slide Time: 9:41)
Now, there is one more thing we should know that is the idea of visible light. So, when we
see a color, as we have already discussed earlier, that is due to the light. So, light coming
from some source gets reflected from the point and enters our eye and because of that we get
to see colour at that point. Now, this light is called visible light. It allows us to perceive color.
Now, what is this visible light? It refers to a spectrum of frequencies of electromagnetic
waves, which are the light waves. Now, the spectrum means it is a range. At one end of the
spectrum is the red light with the frequency mentioned here and 700-nanometre wavelength.
And at the other end of the spectrum is violet light with a frequency and wavelength
mentioned here.
So, red is the component with the lower frequency and violet the component with the higher frequency, and all frequencies in between are part of the visible light. Likewise, red is the component with the longest wavelength and violet the component with the shortest wavelength, and the in-between wavelengths are also there in the visible light.
(Refer Slide Time: 11:35)
Now, why we are calling it visible light? Because there are light waves with frequencies that
are outside this range also but we are not calling that as part of visible light. That is for one
simple reason. The frequencies that are present in the visible light spectrum are able to excite
the cones in our eye giving photopic vision or the perception of color.
So, these frequencies that are part of the visible light can excite the cones which gives the
perception of photopic vision or coloured images. That is why we are calling this as visible
light. Light waves that fall outside this spectrum do not have this property.
(Refer Slide Time: 12:47)
Now, there are three cone types present in the retina, three types of cone photoreceptors. One type is called L or R; from the name you may guess that this type of cone is most sensitive to red light. Then we have M or G, which is most sensitive to green light; green light has a wavelength of 560 nanometres. And then we have S or B; this type of cone is most sensitive to blue light, with a 430-nanometre wavelength. So, there are three cone types, each sensitive to a particular light wave with a specified frequency; we call these light waves red, green and blue.
(Refer Slide Time: 14:03)
Then how we perceive colour? So, when light comes, it contains all these frequencies.
Accordingly, all the three cone types get stimulated and then as a combined effect of
stimulation of all the three cone types, we get to perceive the colour. Now, this theory which
tells us how we perceive colour is also known as the tristimulus theory of vision because the
basis of it is the idea that there are three cone types and these three gets stimulated to give us
the perception. So, it is called the tristimulus theory of vision. We will come back to this idea
later.
So, that is in a nutshell how we perceive colour. So, we have our eye constructed in a
particular way having cone types in retina, three types of cone, these cone types gets
stimulated with the frequencies present in the visible light, and then as a result, we get to
perceive the colour.
(Refer Slide Time: 15:31)
Now, that is the basic theory of how our eyes work and how we get to perceive colour. With this knowledge, how can we actually build a realistic computer graphics system? Let us try to see how this knowledge helps us in colour generation in computer graphics.
This question brings us to a concept called metamerism, or metamers. What is that?
(Refer Slide Time: 16:10)
So, what do we want in computer graphics? We are primarily interested in synthesizing colours; we are not interested in the actual optical process that gives us the perception of colour. Our sole objective is to be able to synthesize the colour so that the resulting scene or image looks realistic.
485
(Refer Slide Time: 16:41)
We can do that with the idea of metamers and the overall theory of metamerism. How we can
do that?
(Refer Slide Time: 16:54)
Now, let us revisit what we have just discussed. When light is incident on our eye, it is composed of different frequencies of the light spectrum, including the visible light frequencies. These visible light frequencies excite the three cone types, L, M, S (or R, G, B), in different ways, and that in turn gives us the sensation of a particular colour. So, all three cone types get excited due to the presence of the corresponding frequencies; this excitation is different for different incident light, and accordingly we get to see different colours.
486
(Refer Slide Time: 17:53)
But one thing we should keep in mind: when we say that we are getting the perception of a colour, the underlying process need not be unique. In the eye it works in a particular way, because there are three cone types and these three types get excited in a particular way to give us the perception of colour, but there can be other ways to give us the perception of the same colour. In other words, the sensation of a colour C resulting from a spectrum S1 can also result from a different spectrum S2.
That means we have a light source; the light gets reflected from a point and comes to our eye. It has one spectrum, which excites the three cone types in a particular way and gives us the sensation of a colour C. That does not mean that this is the only spectrum that can give us this particular sensation. There can be another spectrum which excites the three cone types in a different way but in the end gives us the same colour perception. So, there are multiple ways to generate a colour perception; it is not unique. And this is a very important piece of knowledge that we exploit in computer graphics.
487
(Refer Slide Time: 19:30)
Now, this possibility that multiple spectra can give us the sensation of the same colour is due to the optical behaviour which we call metamerism. The different spectra that result in the sensation of the same colour are known as metamers. So, metamerism is effectively the idea that different spectra can give us the sensation of the same colour, and these different spectra are known as metamers.
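To make the idea concrete, here is a minimal Python (NumPy) sketch with made-up cone sensitivities and spectra (toy numbers only, not real colorimetric data), showing two different spectra that excite the three cone types identically and are therefore metamers.

import numpy as np

# Toy cone sensitivities sampled at four wavelengths (made-up values).
L_sens = np.array([0.2, 0.3, 0.6, 0.1])
M_sens = np.array([0.1, 0.5, 1.0, 0.2])
S_sens = np.array([0.8, 0.2, 0.4, 0.0])

def cone_response(spectrum):
    # Each cone's response is the incident spectrum weighted by that cone's sensitivity.
    return np.array([np.dot(L_sens, spectrum),
                     np.dot(M_sens, spectrum),
                     np.dot(S_sens, spectrum)])

S1 = np.array([0.5, 0.4, 0.2, 0.3])   # one spectrum
S2 = np.array([0.5, 0.8, 0.0, 0.3])   # a different spectrum, chosen so the responses match

print(cone_response(S1))   # both prints give the same L, M, S responses,
print(cone_response(S2))   # approximately [0.37 0.51 0.56] -> S1 and S2 are metamers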
(Refer Slide Time: 20:11)
So then, what does it imply? Metamerism implies that we do not need to know the exact physical process behind colour perception, because the exact physical process may involve one particular spectrum, say S1. Instead, we can come up with an artificial way to generate the same sensation using another spectrum S2 which is in our control. This S1 and S2 are metamers and give the perception of the same colour.
So, we may not be able to know exactly what the spectrum is when we perceive a particular colour, but we can always recreate that sensation by using another spectrum which is a metamer of the actual spectrum.
(Refer Slide Time: 21:15)
In other words, we can come up with a set of basic or primary colours and then combine or mix these colours in appropriate amounts to synthesize the desired colour. So, a corollary of the metamerism concept is that we need not know the exact spectrum; instead, we can always come up with a set of primary or basic colours and mix them in appropriate amounts to get the sensation of the desired colour.
489
(Refer Slide Time: 22:09)
So, the idea boils down to finding out the set of basic or primary colours. Such sets are called colour models. This brings us to the idea of colour models: ways to represent and manipulate colours. The basic idea is that we have a set of basic colours, using which we can generate different colours by mixing them in an appropriate way and in appropriate amounts. That is the idea of colour models.
So, the idea of metamers brings us to the idea of colour models, which help us to synthesize coloured images. We can create the perception of a colour using a colour model, where the mixture acts as a metamer of the actual spectrum.
(Refer Slide Time: 23:12)
490
Thus, coming back to the question we posed, that is, how we generate colours in CG: one way to do that is with the use of colour models, without bothering about how the colour sensation is actually generated in the eye, or about the actual spectrum that is responsible for giving us the perception.
(Refer Slide Time: 23:38)
Now, there are many colour models. Let us try to understand the idea of colour models in terms of the most basic of them all, namely the RGB or Red-Green-Blue colour model. Remember that we talked about three cone types, L, M and S: L gets excited mostly by the red light, M by the green light and S by the blue light present in a light spectrum. We also mentioned that incident light excites these three cone types in different ways, that is, by different amounts, which results in photopic vision and gives us the perception of colour.
491
(Refer Slide Time: 24:37)
Thus, we can think of colour as a mixture of three primary colours, red, green and blue, and we need to mix these three colours in appropriate amounts to synthesize the desired colour. This idea, that colour is a mixture of the three primary colours red, green and blue, is called the RGB colour model, which is the most basic colour model. The idea of this colour model, as you can guess, comes directly from the way our eye works: there are three cone types, and we excite them differently, by varying the light spectrum incident on the eye, to generate the perception of different colours.
The idea is illustrated in this figure. As you can see, these are the three component colours red, green and blue. When they are mixed in a particular way, for example, here blue and green are mixed along with red, we get one colour; if we mix different amounts, we get another colour, and so on. So, in the RGB model, we use the three primary colours red, green and blue and mix them to get any colour. Now, the idea is to mix them. What do we mean by mixing them?
492
(Refer Slide Time: 26:43)
Here, when we talk of mixing, we mean we add the colors. So, RGB is an additive model.
Here, any color is obtained by adding proper amounts of red, green, and blue colors.
(Refer Slide Time: 27:01)
It is important to remember that this is an additive model. And how can we visualize this model? Remember that there are three primaries; thus we can think of a colour as a point in a 3D colour space, where the three axes correspond to the three primary colours.
493
(Refer Slide Time: 27:29)
Further, if we assume normalized colour values, that is, values within the range 0 to 1, which is typically what we do when we use lighting models, we can visualize the RGB model as a 3D colour cube, as shown here. This cube is also known as the colour gamut, that is, the set of all possible colours that can be generated by the model. So, the cube shown here contains all possible colours that the RGB model can generate.
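As a small illustration, here is a minimal Python (NumPy) sketch that treats a colour as a point in the normalized RGB cube and mixes primaries additively; clamping the sum to [0, 1] is an assumption of this sketch, used only to keep the result inside the gamut.

import numpy as np

def mix_rgb(*colors):
    # Additive mixing: component-wise sum of normalized (r, g, b) triples,
    # clamped to [0, 1] so the result stays inside the RGB colour cube.
    total = np.sum(np.array(colors, dtype=float), axis=0)
    return np.clip(total, 0.0, 1.0)

red   = np.array([1.0, 0.0, 0.0])
green = np.array([0.0, 1.0, 0.0])
blue  = np.array([0.0, 0.0, 1.0])

print(mix_rgb(red, green))             # [1. 1. 0.] -> yellow
print(mix_rgb(red, green, blue))       # [1. 1. 1.] -> white
print(mix_rgb(0.5 * red, 0.5 * blue))  # [0.5 0.  0.5] -> a purple shade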
(Refer Slide Time: 28:22)
Now, we said RGB is a colour model, and an additive one. There are other colour models also, for example, the XYZ model, the CMY model and the HSV model. These models are used for different purposes and in different situations, and not all of them are additive; there are subtractive models as well.
However, in this lecture we will not go into the details of any other model. If you are interested, you may refer to the material mentioned at the end of this lecture for more details on these different colour models. Now, let us move to our other topic, that is, the synthesis of textures.
(Refer Slide Time: 29:21)
Now, earlier we talked about the lighting model to compute colour. One thing we did not explicitly mention is that, when we compute intensity values using the lighting model, the surface coloured with those intensity values appears smooth, which is definitely not realistic. In reality we get to see different types of surfaces, and the majority, in fact almost all, of them are non-smooth.
They have something else apart from a smooth distribution of colour. For example, on the surface of the wooden plank in this figure, you can see various patterns. These patterns cannot be obtained by applying the lighting model we discussed earlier. When I say the lighting model, I also mean the shading models, because shading models are essentially based on the lighting model.
So, the lighting or shading models that we have discussed earlier cannot give us the various patterns and other features that typically appear on real surfaces. In order to achieve those realistic effects, various other techniques are used.
495
(Refer Slide Time: 31:09)
Now, broadly there are three such techniques, and together they are called texture synthesis: we want to synthesize the texture that appears on the surface. The three techniques are projected texture, texture mapping and solid texturing.
(Refer Slide Time: 31:41)
Now, let us start with projected texture. The idea is very simple. When we say we have generated a coloured image, that means we have created a 2D array of pixel values, after, of course, all the pipeline stages are completed, including the fifth stage, that is, scan conversion. These pixel values are essentially values representing intensities or colours. Now, on this surface we want to generate a particular texture, a particular effect or pattern.
What can we do? We can create the texture pattern and paste it on the surface. So, two things are involved here: one is that we already have a pixel grid with values representing the coloured surface without texture; the other is that we separately create a texture pattern and paste it on the pixel grid, that is, on the colour values already computed using the lighting or shading model.
(Refer Slide Time: 33:06)
Now, this projected texture method, that is, the creation and pasting of a texture on the surface, can be done using a simple technique. We create a texture image or texture map from a synthesized or scanned image; either we artificially create an image or we scan one, and we use that as a texture map, which is a 2D array of colour values.
To differentiate it from the computed colour values we talked about earlier, this 2D array is called a texel array, and each array element is a texel. So, we have a pixel grid representing the original surface, and a texel grid representing the artificially created texture pattern, created either synthetically or by scanning an image. There is a one-to-one correspondence between the texel and pixel arrays.
Then what do we do? We replace the pixel colour with the corresponding texel value to mimic the idea that we are pasting the texture on the surface. So, the first stage is creating the texel grid, that is, the creation of the texture pattern; then we paste it by replacing the pixel values with the corresponding texel values, where there is a one-to-one correspondence between the pixel and texel grid elements.
498
(Refer Slide Time: 35:16)
Now, this pasting or replacement can be done in different ways; broadly, there are three. The first is the obvious one: simply replace the pixel value with the corresponding texel value. The second is slightly more involved: here we blend the pixel and texel values using the formula shown here. We use addition for blending, where C_pixel indicates the pixel intensity, C_texel indicates the texel intensity, and k is a constant between 0 and 1.
In the third approach we perform a logical operation, either an AND or an OR, between the pixel and texel values. Remember that we store these values as bit strings in the frame buffer, so we can perform a logical AND or OR between the two bit strings, which gives us a new bit string representing the final colour, that is, the projected texture pattern.
So, this is the projected texture method: we create the texture pattern separately and then paste it. There are three ways to paste: simply replacing the pixel value with the texel value, blending, or a logical AND/OR operation between the two bit strings representing the pixel and texel intensities. Any of the three can be used to paste.
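Here is a minimal Python (NumPy) sketch of the three paste options. The blending formula is assumed here to be the usual linear combination C = k·C_texel + (1 − k)·C_pixel, since the slide's exact formula is not reproduced in the transcript, and the pixel and texel grids are assumed to hold 8-bit integer intensities for the logical operations.

import numpy as np

def paste_replace(pixel, texel):
    # Option 1: simply replace the pixel value with the texel value.
    return texel.copy()

def paste_blend(pixel, texel, k=0.5):
    # Option 2: blend the two values; assumed form of the slide's formula:
    # C = k * C_texel + (1 - k) * C_pixel, with 0 <= k <= 1.
    return k * texel + (1.0 - k) * pixel

def paste_logical(pixel, texel, op="or"):
    # Option 3: bitwise AND / OR between the bit strings stored in the frame buffer
    # (here both grids are assumed to hold 8-bit integer intensities).
    if op == "and":
        return np.bitwise_and(pixel, texel)
    return np.bitwise_or(pixel, texel)

pixels = np.array([[100, 150], [200, 250]], dtype=np.uint8)   # lit surface colours
texels = np.array([[ 30, 180], [ 60,  90]], dtype=np.uint8)   # texture pattern

print(paste_replace(pixels, texels))
print(paste_blend(pixels.astype(float), texels.astype(float), k=0.3))
print(paste_logical(pixels, texels, op="and"))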
499
(Refer Slide Time: 37:28)
There is one more special technique used in the projected texturing method, apart from the ones we just discussed. This is called the MIPMAP technique, where MIPMAP stands for Multum In Parvo map; Multum In Parvo means many things in a small space. What is the idea?
(Refer Slide Time: 38:04)
Earlier, we talked about creating one texture pattern. In the MIPMAP technique, we create more than one: in fact, we create a number of texture maps with decreasing resolutions for the same texture image, and we store them all, as shown in this figure. So, this is one image, this is another, and another, and as you can see the size progressively reduces, although the image content remains the same. So, how do we use this?
(Refer Slide Time: 38:57)
Now, suppose we are asked to generate something like this pattern. As you can see, the region closer to the viewer has bigger patterns, and as the distance from the viewer increases, the pattern sizes become progressively smaller. In the MIPMAP technique, we store these different sizes of the same pattern and simply paste them in the appropriate regions of the image rather than creating a more complicated pattern. That is the idea of MIPMAP, and we use it to generate realistic texture effects.
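A minimal Python (NumPy) sketch of building such a MIPMAP chain follows; the 2x2 averaging used to halve the resolution at each level is an assumption of this sketch (other downsampling filters can also be used).

import numpy as np

def build_mipmaps(texture):
    # texture: 2D array whose sides are powers of two.
    # Each level halves the resolution by averaging 2x2 blocks of the level above.
    levels = [texture.astype(float)]
    current = levels[0]
    while current.shape[0] > 1 and current.shape[1] > 1:
        h, w = current.shape
        current = current.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(current)
    return levels

tex = np.arange(64, dtype=float).reshape(8, 8)   # an 8x8 toy texture
for level in build_mipmaps(tex):
    print(level.shape)    # (8, 8), (4, 4), (2, 2), (1, 1)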
(Refer Slide Time: 39:57)
501
Next is the second technique, that is, texture mapping, which is primarily used for curved surfaces. On curved surfaces it is very difficult to use the previous technique; simple pasting of the texture does not work, so we go for a more general definition of the texture map.
(Refer Slide Time: 40:22)
So, what do we do there? We assume a texture map defined in a 2D texture space whose principal axes are denoted by u and w, and an object surface represented in parametric form, usually by the symbols θ and φ. Of course, this is just one notation; there can be other notations as well.
(Refer Slide Time: 40:52)
Then, we define two mapping functions from the texture space to the object space. These are
the forms of the mapping function shown here. This is the one, and this is the other one.
502
(Refer Slide Time: 41:13)
And in the simplest case, these mapping functions take the form of linear functions, as shown here, with four constants A, B, C and D. Using these mapping functions, we map a texture value defined in the texture space to a value in the object space and then use that value to create the particular pattern.
(Refer Slide Time: 41:48)
Let us try to understand this with an example. Consider the top figure, where we have a texture pattern defined in a texture space. This pattern is to be pasted on the surface here, specifically in the middle of the overall surface. To create that effect, we need to map from this pattern to the object's surface space. What mappings should we use? We will assume here that we are going for the simplest mapping, that is, the linear mapping; let us try to estimate the mapping functions.
(Refer Slide Time: 42:36)
Now, the specification of the surface is already given here in terms of its size. Using that information, we go for a parametric representation of the target surface area, that is, the middle of the square. How do we represent it? With this set of equations, which is easy to obtain; you can try to derive it yourself.
(Refer Slide Time: 43:13)
Then, with this parametric representation, we make use of the relationships between the parameters in the two spaces with respect to the corner points. For example, the point (0, 0) in the texture space is mapped to the point (25, 25) in the object space, and so on for all the corner points listed here.
(Refer Slide Time: 43:48)
So, with this mapping, we can substitute these values into the linear mapping functions we saw earlier to get the constants, which come out as A = 50, B = 25, C = 50 and D = 25. Our mapping functions are therefore θ = 50u + 25 and φ = 50w + 25.
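The same calculation can be written as a small Python (NumPy) sketch. Only the corner (0, 0) → (25, 25) is spelled out above, so the second correspondence (1, 1) → (75, 75) used below is the one implied by the resulting constants A = 50, B = 25, C = 50, D = 25.

import numpy as np

# Corner correspondences (u, w) -> (theta, phi).
(u0, w0), (t0, p0) = (0.0, 0.0), (25.0, 25.0)   # given in the example
(u1, w1), (t1, p1) = (1.0, 1.0), (75.0, 75.0)   # implied by theta = 50u + 25, phi = 50w + 25

# theta = A*u + B  ->  two linear equations in A and B.
A, B = np.linalg.solve([[u0, 1.0], [u1, 1.0]], [t0, t1])
# phi = C*w + D
C, D = np.linalg.solve([[w0, 1.0], [w1, 1.0]], [p0, p1])
print(A, B, C, D)                    # 50.0 25.0 50.0 25.0

def texture_to_object(u, w):
    # Map a point in texture space to (theta, phi) in the object's parametric space.
    return A * u + B, C * w + D

print(texture_to_object(0.5, 0.5))   # (50.0, 50.0) -> centre of the pasted patch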
(Refer Slide Time: 44:21)
So, that is the idea of the second category of texturing. Now, there is a third category of texturing technique, solid texturing. Texture mapping is typically difficult in situations where we have complex surfaces, or where there should be some continuity of the texture between adjacent surfaces. For example, consider this wooden block: as you can see, there is continuity between the textures on its faces, and unless this continuity is maintained we will not be able to create a realistic effect. In such cases, we use solid texturing.
(Refer Slide Time: 45:11)
I will just give you the basic idea without going into the details of this technique, because it is slightly more complicated than the previous techniques. Earlier, we defined the texture in a 2D texture space; now we define it in a 3D texture space whose principal axes are usually denoted by u, v and w.
(Refer Slide Time: 45:38)
506
Then we perform some transformations to place the object in the texture space; that means any point on the object surface is transformed to a point in the texture space, and whatever colour is defined at that point is taken as the colour of the corresponding surface point.
So, here we perform a transformation from object space to texture space, and the colour already defined at that texture space point is used as the colour of the surface point. This transformation is more complicated than the mapping we saw earlier, and we will not go into further details of it.
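Just to give a flavour of the idea in code, here is a minimal Python (NumPy) sketch: a surface point is transformed into the 3D texture space and a procedural colour is looked up there. Both the simple translation-and-scale transformation and the wood-ring style texture function are made-up placeholders for illustration only, since the lecture deliberately does not go into the actual transformation.

import numpy as np

def object_to_texture(point, offset=np.array([0.0, 0.0, 0.0]), scale=1.0):
    # Placeholder transformation into (u, v, w) texture space; a real system
    # would use the more involved transformation mentioned in the lecture.
    return (np.asarray(point, dtype=float) - offset) * scale

def solid_texture_color(uvw):
    # A made-up "wood ring" style procedural texture: the colour depends on the
    # distance from the w axis, so adjacent faces of a block stay continuous.
    u, v, w = uvw
    rings = 0.5 + 0.5 * np.sin(10.0 * np.sqrt(u * u + v * v))
    return np.array([0.6, 0.4, 0.2]) * rings   # brownish shades

surface_point = [0.3, 0.1, 0.7]
print(solid_texture_color(object_to_texture(surface_point)))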
(Refer Slide Time: 46:37)
So, in summary, let us quickly recap what we have learnt so far. With this lecture, we conclude our discussion on stage 3 of the pipeline, that is, colouring. We covered three broad concepts: the lighting model to compute colour, the shading model to interpolate colours, which reduces computation, and intensity mapping to map between the computed intensity and the device-supported intensity.
(Refer Slide Time: 47:14)
Along with that, we understood the basics of colour models and how to create texture patterns. Of course, these are very basic concepts; there are advanced concepts which we did not discuss in this introductory treatment, and for more details you may refer to the material at the end. In our next lecture we will start our discussion on the fourth stage of the pipeline, that is, the viewing pipeline, which itself consists of many sub-stages.
(Refer Slide Time: 47:53)
508
Whatever we have discussed so far can be found in Chapter 5, in particular Sections 5.1, 5.2.1 and 5.3. However, if you are interested in learning about other colour models as well as more details on 3D texturing, you may go through the other sections as well. See you in the next lecture; thank you and goodbye.
509
Computer Graphics
Professor Dr. Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 18
View transformation
Hello, and welcome to lecture number 18 in the course Computer Graphics. We are currently
discussing the 3D Graphics pipeline. That is the set of stages that converts an object
description to a 2D image on a computer screen. What are the stages? Let us quickly recap.
(Refer Slide Time: 00:51)
There are five stages, as shown here: object representation, modelling transformation, lighting, viewing pipeline and scan conversion. It may also be recalled that each of these stages works in a specific coordinate system. For example, object representation works in the local or object coordinate system; modelling transformation works between the local and world coordinate systems, being basically a transformation from the local coordinate system to the world coordinate system.
Then lighting, or assigning colours to objects, happens in the world coordinate system. The viewing pipeline, the fourth stage, actually consists of five sub-stages, and the coordinate systems in which they work vary. For example, the first sub-stage, viewing transformation, is a transformation from the world to a view coordinate system; clipping, the second sub-stage, works in the view coordinate system.
Hidden surface removal, the third sub-stage, works in the view coordinate system; projection transformation is again a transformation, from the 3D view coordinate system to the 2D view coordinate system; and window-to-viewport transformation, the fifth sub-stage of the fourth stage, takes the object description from the view coordinate system to the device coordinate system. The last stage, scan conversion, is essentially a transformation from device to screen or pixel coordinates.
(Refer Slide Time: 03:03)
So far, among all these stages and sub-stages, we have covered the first three stages in our previous lectures: object representation, geometric transformation, and lighting, or assigning colours to the objects.
(Refer Slide Time: 03:20)
Today, we will start our discussion on the fourth stage that is the viewing pipeline. Now let
us start with some basic idea of what we mean by the viewing pipeline.
511
(Refer Slide Time: 03:39)
Let us start with a background knowledge.
(Refer Slide Time: 03:44)
So, with whatever we have discussed in our previous lectures up to this point, we learnt how to synthesize a realistic 3D scene in the world coordinate system. We started with the object definition or object representation stage; then, in the second stage, modelling transformation, we put the objects together to construct a world coordinate scene.
Then, in the third stage, we assigned colours in the world coordinate description of the scene to make it look like a 3D scene. So, the knowledge we have gained so far is good enough to explain how we can create a realistic 3D scene. But that is not the end of it: we need to display this scene on a 2D computer screen. That means, essentially, we need a projection of the scene from the 3D description onto a 2D screen, and this projection need not be of the whole scene. It can be a part of the scene also, which is in fact the most common case; we usually talk about a portion of the overall 3D description being projected on the screen.
(Refer Slide Time: 05:33)
Now, this process of projection is actually similar to taking a photograph. When we take a photo, the photo is basically a projected image of a portion of the 3D world that we live in, and this projected image forms on the photographic plate or camera screen. So, when we talk of displaying a 2D image on a computer screen, essentially we start with a 3D description of the scene and then simulate the process of taking a photograph.
(Refer Slide Time: 06:12)
513
Now, this process of taking a photograph is simulated in computer graphics with a set of stages, and these stages together constitute the fourth stage of the graphics pipeline, that is, the viewing pipeline. What are those stages?
(Refer Slide Time: 06:32)
The very first stage is to transform the 3D world coordinate scene to a 3D view coordinate system or reference frame. This 3D view coordinate system is also known as the eye coordinate system or the camera coordinate system, and the process of transforming from 3D world coordinates to 3D view coordinates is generally called the 3D viewing transformation. So, this is the first stage in the 3D viewing pipeline.
(Refer Slide Time: 07:23)
514
Then we project the transformed scene onto the view plane. So, this is the projection
transformation, so first comes 3D view transformation, then comes projection transformation.
(Refer Slide Time: 07:38)
Now, after projection on the view plane, we perform another transformation. The projection is done within a region on the view plane; that area where the image is projected is called the window.
From this window, we make a further transformation: the object descriptions are mapped onto a viewport, which is defined in the device coordinate system. So, we perform window-to-viewport mapping. This is the third stage of the viewing pipeline that we are constructing within the fourth stage of the graphics pipeline.
Now, these three are the basic stages in the 3D viewing pipeline. Along with that there are a
couple of operations, namely clipping, and hidden surface removal, which together constitute
the entire fourth stage of the 3D graphics pipeline, namely the viewing pipeline. And we will
discuss each of these sub-stages one by one.
515
(Refer Slide Time: 09:22)
So, let us start with the first substage that is 3D viewing transformation.
(Refer Slide Time: 09:27)
Now, in order to understand this transformation, we need to understand how a photograph is taken. There are broadly three stages through which we capture a photo. First, we point the camera in a particular direction with a specific orientation so that we can capture the desired part of the scene; then we set the focus; and finally we click the picture. These are broadly the stages we follow when we capture a photo.
516
(Refer Slide Time: 10:21)
Now, the most important of these is focusing. With focusing, we get to know, or at least can estimate, the quality and the coverage of the picture we are taking. So, focusing constitutes the most important component of the overall process of capturing a photo.
(Refer Slide Time: 10:49)
Now, in order to set the focus, what we do? We essentially look at the scene through a
viewing mechanism that is provided in the camera. Typically while we try to set focus, we do
not look at the scene with our bare eyes. We look at the scene through the viewing
mechanism provided in the camera itself. And accordingly, we set our focus.
517
(Refer Slide Time: 11:19)
Now, this is important. So, we are not looking at the scene with our bare eyes to set focus.
Instead, we are setting focus based on our perception of the scene obtained by looking
through a viewing mechanism provided in the camera. So, we are looking at the scene
through the camera instead of looking at the scene directly. So, this is very important
consideration.
(Refer Slide Time: 11:51)
If we are looking at the scene directly, that means we are looking at the scene in its original
coordinate system. That is what we are calling the world coordinate system. So, when we are
looking at the scene directly, we are looking at it in its world coordinate reference frame,
world coordinate system.
518
(Refer Slide Time: 12:13)
However, if we are looking at the scene through the viewing mechanism of the camera, then we are not looking at the scene in its world coordinate system. Instead, we are looking at a different, changed scene; it is important to note that the change takes place due to the arrangement of lenses in the camera, which is there so that we can estimate the quality and coverage of the photo to be taken.
So, here we are not looking at the world coordinate scene; we are looking at a scene that is changed from its world coordinate description due to the arrangement provided in the camera's viewing mechanism.
(Refer Slide Time: 13:14)
519
So, then what happens? When we are taking a photograph with a camera, we are actually
changing or transforming the 3D world coordinate scene to a description in another
coordinate system. This is the most fundamental concept to be noted to understand how
computer graphics simulates the process of taking a photograph.
So, when we are looking at a scene to set focus through the viewing mechanism provided in a
camera, we are actually transforming the world coordinate scene to a different coordinate
system. And this coordinate system is characterized by the camera parameters, namely, the
position and orientation of the camera; this needs to be carefully noted.
(Refer Slide Time: 14:18)
So, this new coordinate system, we generally call it view coordinate system, and the
transformation between world coordinate system and view coordinate system is the viewing
transformation, which is the first sub-stage of the viewing pipeline. So, essentially we are
trying to simulate the photo-taking process, and the first stage in it is to transform the world
coordinate description of a scene to the view coordinate system, which simulates the process
of looking at the scene through a viewing mechanism provided in the camera.
520
(Refer Slide Time: 15:11)
So, to simulate this viewing transformation or to implement the viewing transformation we
need to do two things. First, we need to define the coordinate system, and second is we
perform the transformation. So, first we define the coordinate system, and second, we
perform the transformation to simulate this effect of looking at the scene through the camera.
(Refer Slide Time: 15:44)
521
Now, let us go one by one. First, we try to understand how we set up the viewing coordinate system. This figure shows the basic setting that we will be considering. On the left side is the actual world coordinate scene; this cylinder is the object in a world coordinate description defined by the three principal axes X, Y and Z.
Then, this is the camera through which we are looking at the scene, and the view coordinate system is characterized by the three principal axes Xview, Yview and Zview. However, the more common notation used in graphics for these three principal axes of the view coordinate system is u, v and n, rather than x, y and z.
So, in the subsequent discussion, we will refer to this view coordinate system in terms of this letter notation, that is, in terms of the principal axes u, v and n. The question is, how do we determine these principal axes, which define the viewing coordinate system? You may note here that n corresponds to z, v corresponds to y, and u corresponds to x.
522
(Refer Slide Time: 17:31)
So, let us try to understand how we can determine the three principal axes to define the view
coordinate system. So, the first thing is to determine the origin, origin of the view coordinate
system where the three axes meet. Now, this is simple; we assume that the camera is
represented as a point and the camera position is the origin denoted by o. So, the position of
the camera is the origin where we are assuming that the camera actually is defined as a point,
a dimensionless entity.
(Refer Slide Time: 18:20)
Now, when we try to focus the camera, we choose a point in the world coordinate system, as we have already mentioned, and we call this the centre of interest or look-at point, as shown here. So, this is our camera position, and this one is the look-at point; you may note that, as of now, both are defined in the world coordinate scene. With these points we can define vectors: this will be the origin vector o, and this will be the look-at point vector p.
(Refer Slide Time: 19:10)
Then, using vector algebra, we can define the vector n, which as you can see is the normal to the view plane. We define n = o − p, where o and p are the vectors just defined; that is simple vector algebra. We then normalize n to get the unit basis vector n̂ = n / |n|. So, we have obtained one basis vector, n̂.
(Refer Slide Time: 20:00)
Next, we specify an arbitrary point, let us denote it by p_up, along the direction of our head while looking through the camera. This we call the view-up point, along the view-up direction. So, the view-up direction is essentially the direction along which our head is oriented while we are looking at the scene through the camera.
(Refer Slide Time: 20:37)
Now, with this point we determine the view-up vector V_up as the difference of the two vectors, that is, V_up = p_up − o, as you can see from the figure here. This is again simple vector algebra. Once we have V_up, we get the unit basis vector v̂ in the same way, by dividing the vector by its length, which is a scalar quantity: v̂ = V_up / |V_up|.
(Refer Slide Time: 21:17)
So, we now have two basis vectors, n̂ and v̂. If we look at the figure, we can see that the remaining vector u is perpendicular to the plane spanned by n and v; if this is the plane, then u is perpendicular to it. We can simply use vector algebra again to define u as the cross product û = v̂ × n̂. Since both n̂ and v̂ are unit vectors, further normalization is not required, and we get the unit basis vector û directly from this cross product.
(Refer Slide Time: 22:24)
So, in summary, what have we done? We assume that three things are given: the camera position, that is, the coordinates from which we can define the origin vector o; the view-up point, from which we can define the corresponding view-up vector; and finally the look-at point p and the corresponding vector.
Based on this information, we perform a three-step process. First, we determine the unit basis vector n̂ using simple vector algebra; then we determine v̂, again using simple vector algebra; and finally we determine û as the cross product of the two vectors defined in the first two steps. Following these steps, we get the three basis vectors that define our viewing coordinate system.
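As a quick illustration, here is a minimal Python (NumPy) sketch of this three-step construction. It follows the formulas as stated above and assumes, as the lecture does, that the view-up direction is perpendicular to n so that no re-orthogonalization is needed; the function and argument names are placeholders for this sketch.

import numpy as np

def view_basis(camera_pos, look_at, up_point):
    # o: camera position (view coordinate origin), p: look-at point,
    # p_up: a point along the view-up direction, all in world coordinates.
    o = np.asarray(camera_pos, dtype=float)
    p = np.asarray(look_at, dtype=float)
    p_up = np.asarray(up_point, dtype=float)

    n = o - p
    n_hat = n / np.linalg.norm(n)          # step 1: n = o - p, normalized

    v_up = p_up - o
    v_hat = v_up / np.linalg.norm(v_up)    # step 2: view-up vector, normalized
                                           # (assumed perpendicular to n, as in the lecture)
    u_hat = np.cross(v_hat, n_hat)         # step 3: u = v x n; the cross-product order
                                           # fixes the handedness convention
    return u_hat, v_hat, n_hat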
Now, once the coordinate system is defined, our next task that is the second part of the
process is to transform the object definition from the world coordinate system to the view
coordinate system. Let us see how we can do that. So, in order to transform, we need to
perform some operations.
526
(Refer Slide Time: 24:07)
To get an idea, let us have a look at the figure here. So, suppose this is an arbitrary point in
the world coordinates scene, and we want to transform it to the view coordinate system
defined by three vectors n, u, and v.
(Refer Slide Time: 24:33)
Now, let us assume that the origin of the view coordinate system is the point with the coordinates shown here, and that the three basis vectors are represented as shown here in terms of their X, Y and Z components. We will follow this representation to formulate our mechanism for transforming any arbitrary point in the world coordinate scene to a point in the view coordinate system.
527
(Refer Slide Time: 25:12)
So, what do we need? We need a transformation matrix M; if you recollect, in the modelling transformation stage we said that any transformation can be represented in the form of a matrix. So, we need to find the matrix that will transform a given point to a point in the view coordinate system.
And how do we do that? Again, if you recollect our discussion from the lectures on modelling transformation, we multiply the point with the transformation matrix to get the transformed point. So, the transformed point is obtained by multiplying the original point with the transformation matrix.
(Refer Slide Time: 26:03)
528
And this transformation is actually a sequence of transformations required to align the view coordinate system with the world coordinate system. In the most general setting they are not aligned, as in the figure shown here; there is a slight difference in orientation between the two coordinate systems. So, we align them and then perform the transformation.
In order to do that, we require two basic transformation operations: translation and rotation. The idea is simple: we translate the view coordinate origin to the world coordinate origin and then rotate the system to align it with the world coordinate system.
(Refer Slide Time: 27:04)
So, this translation and rotation will constitute the sequence of operations we need to
transform between the two coordinate systems.
(Refer Slide Time: 27:19)
529
Now, the first thing is that we translate the view coordinate origin to the world coordinate origin. This is the translation matrix, which is the same as the one we discussed earlier, with the corresponding X, Y, Z values substituted here.
(Refer Slide Time: 27:38)
Next is the rotation. The rotation matrix is shown here; we will skip the derivation, but for the time being let us just note the matrix. If applied, this matrix rotates the viewing coordinate system to align it with the world coordinate system.
(Refer Slide Time: 28:09)
And since we perform the translation first and then the rotation, we follow the right-to-left rule to combine the two transformations into a composite transformation matrix. Thus, we write T first and then R on its left, and take the product of the two matrices to get the composite matrix M = R·T. We then multiply this matrix with the point to get the transformed point coordinates.
(Refer Slide Time: 28:51)
Let us try to understand this process with an example. Consider this setting: there is a square object defined by its vertices A, B, C, D; we have a camera located at the point (1, 2, 2); and the look-at point is the centre of the square object, that is, (2, 2, 2). It is also specified that the up direction is parallel to the positive Z direction.
Given this specification, let us try to calculate the coordinates of the centre of the object after transformation to the view coordinate system. Originally, in the world coordinate system, it is (2, 2, 2). What will the coordinates be after the transformation? Let us follow the steps we have just discussed.
(Refer Slide Time: 30:07)
The first thing is we determine the 3 unit basis vectors for the viewing coordinate system.
(Refer Slide Time: 30:20)
Now, the camera position o is (1, 2, 2), as you can see here, and the look-at point p is the centre of the object, that is, (2, 2, 2). We can then calculate the vector n = o − p, which is (−1, 0, 0). It is already a unit vector, so no further operation is needed; we have the unit basis vector n̂.
532
(Refer Slide Time: 31:00)
Now, it is also mentioned that the up direction is parallel to the positive Z direction. Therefore, we can directly determine that the unit basis vector along the up direction, v̂, is simply the unit basis vector along the Z direction, that is, (0, 0, 1); we do not need any further calculation.
As you can see, this is another way of specifying the up vector: you give the direction in terms of the available basis vectors, or in terms of a line, rather than specifying a point. So, there are different ways of specifying the up direction. Anyway, we have now found two basis vectors, n̂ and v̂.
(Refer Slide Time: 32:05)
533
Then we take the cross product of these two basis vectors to get the third basis vector û, which is (0, 1, 0).
(Refer Slide Time: 32:16)
So, then we found the view coordinate system as defined by the three-unit basis vectors n, u,
and v. Next, we compute the transformation matrix M which is the composition of the
translation and rotation matrices.
(Refer Slide Time: 32:45)
Now, we have already noted that the translation matrix is represented in this way, using the coordinate position of the view coordinate origin; since that is given to be (1, 2, 2), we substitute it here to get the translation matrix in this form. Again, we know the rotation matrix in terms of the vectors that define the coordinate system, and we have already determined these vectors: n̂ is (−1, 0, 0), û is (0, 1, 0) and v̂ is (0, 0, 1). We substitute these values, this row for u, this row for v and this row for n, to get the rotation matrix R.
(Refer Slide Time: 34:14)
So, then we multiply these, R·T, to get the transformation matrix M shown here.
(Refer Slide Time: 34:29)
Now that we have determined M, we multiply M with the original coordinates to get the transformed coordinates; note that this is done in the homogeneous coordinate system, but with a homogeneous factor of 1, so no further change is needed. After the multiplication, we get the coordinates of the transformed point as (0, 0, −1) in the view coordinate system. That is our transformed point in the view coordinate system.
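For completeness, here is a minimal Python (NumPy) sketch of this worked example, composing the translation and rotation into M = R·T and applying it to the homogeneous point (2, 2, 2, 1); the basis vectors are taken exactly as computed above.

import numpy as np

# Basis vectors of the view coordinate system from the example above.
u = np.array([ 0.0, 1.0, 0.0])
v = np.array([ 0.0, 0.0, 1.0])
n = np.array([-1.0, 0.0, 0.0])
o = np.array([ 1.0, 2.0, 2.0])   # camera position = view coordinate origin

# Translation that moves the view coordinate origin to the world origin.
T = np.identity(4)
T[:3, 3] = -o

# Rotation whose rows are the basis vectors u, v, n.
R = np.identity(4)
R[0, :3], R[1, :3], R[2, :3] = u, v, n

M = R @ T                        # right-to-left: translate first, then rotate

p_world = np.array([2.0, 2.0, 2.0, 1.0])   # centre of the square, homogeneous
print(M @ p_world)               # [ 0.  0. -1.  1.] -> (0, 0, -1) in view coordinates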
So, in summary, what did we discuss today? We are discussing the fourth stage, that is, the viewing pipeline, which essentially simulates the process of capturing a photo. This process consists of multiple stages; broadly, there are three. The first is a transformation from the world coordinate description of an object to a view coordinate system; the second is the projection from the view coordinate system onto a view plane; and the third is a transformation from the view plane to the device coordinate system.
Among them, we discussed the first major stage, that is, the transformation from the world coordinate description to a view coordinate description. There we saw how we can define a view coordinate system in terms of its three principal axes u, v and n, and how to determine these three axes given the camera position, the view-up vector and the look-at point.
Once these three are given, we can define the three principal axes of the view coordinate system, which in turn gives us the system itself. Then, once the system is defined, we determine a transformation matrix, a composition of a translation and a rotation, to transform a point from the world coordinate system to the view coordinate system.
We achieve this by multiplying the world coordinate point with the transformation matrix to
get the transformed point in the view coordinate system. We may note here that here also we
are assuming a homogeneous coordinate system. However, the homogeneous factor is still 1,
so we do not need to do any further change in the computed coordinate value.
536
(Refer Slide Time: 37:35)
In the next lecture, we will talk about the second major stage in this viewing pipeline,
namely, the projection transformation.
(Refer Slide Time: 37:47)
Whatever we have discussed today can be found in this book. And you are advised to refer to
chapter 6, section 6.1, to learn in more detail the topics that we have covered today. Thank
you, and goodbye.
537
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 19
Projection Transformation
Hello and welcome to lecture number 19 in the course Computer Graphics. We are continuing our discussion on the 3-D Graphics Pipeline. To recollect, the graphics pipeline is the set of stages that are used to convert a 3-D scene description to a 2-D image on the computer screen. There are five stages in the pipeline; what are those five stages?
(Refer Slide Time: 1:00)
We have object representation as the first stage, modeling transformation as the second stage, lighting or assigning colors to objects as the third stage, viewing pipeline as the fourth stage, and scan conversion as the fifth stage.
Among them, we have already discussed the first three stages: object representation, modeling transformation, and lighting. Currently, we are discussing the fourth stage, that is, the viewing pipeline.
538
(Refer Slide Time: 1:42)
Now, as you may recollect, the fourth stage, that is, the viewing pipeline, consists of a set of sub-stages. What are those sub-stages? The first sub-stage is a transformation from the 3-D world coordinate scene to a view coordinate description.
This view coordinate system is also known as the eye or camera coordinate system, and we have already discussed this transformation, called the 3-D viewing transformation, in the previous lecture.
539
(Refer Slide Time: 2:32)
Next comes projection: after the viewing transformation, we project the transformed scene onto the view plane. This is the projection transformation, and it is what we are going to discuss today.
(Refer Slide Time: 2:58)
There is one more sub-stage, the third one, in which we perform another transformation: from the view plane where the scene is projected, we transform the description to the device coordinate system, into a region called the viewport. This is called window-to-viewport mapping, where the window refers to a region on the view plane. That will be the subject matter of our next lecture; today we are going to discuss the second sub-stage, that is, projection.
(Refer Slide Time: 3:47)
Let us try to understand the basic idea behind projection before we discuss the projection
transformation.
(Refer Slide Time: 3:58)
So, why do we require projection? We all know that when we see an image on a screen, it is a 2-D image; the image is displayed on a 2-D computer screen.
541
(Refer Slide Time: 4:14)
However, when we discussed transforming the world coordinate scene to a view coordinate scene, that was still a 3-D description; the scene transformed to the view coordinate system was still a three-dimensional scene.
(Refer Slide Time: 4:43)
Then what is required? We need a way to transform a 3-D scene into a 2-D image, and this technique of transforming a 3-D description into a 2-D image description is called projection.
So, the idea is simple: what we see on the screen is 2-D, whereas our definitions as well as representations are in 3-D, so we require some way to go from the 3-D description to a 2-D description, and that is projection.
(Refer Slide Time: 5:22)
In general, projection transforms objects from n dimensions to (n − 1) dimensions; it reduces the dimension by 1. Of course, we will not go into the general treatment of projection; instead, we will restrict our discussion to projection from 3-D to 2-D, which serves the purpose of this course.
(Refer Slide Time: 5:51)
543
So, let us try to understand the basic setting. We have the world coordinate system in which the scene is described. We are looking at the scene through the viewing mechanism provided in the camera, and then we take a snapshot. This is the look-at point, and the film or screen on which the snapshot is taken is called the view plane; this is the camera position, this is the view-up point, and along with it the view-up vector; all these concepts we already discussed in the previous lecture.
The view coordinate system is defined by the three principal axes n, u and v. So, essentially, what are we doing? We are projecting the 3-D objects onto the 2-D view plane.
(Refer Slide Time: 7:24)
But the entire view plane is not utilized; we define an area on this plane that contains the projected objects, and this area is typically called the clipping window.
So, in graphics, we will assume that whatever we want to project is projected onto a particular region of the view plane called the clipping window; later on, we will see why it is called a clipping window.
544
(Refer Slide Time: 7:57)
There is a third component also: we define a 3-D volume, or region of space, in the scene. So, there is a scene, and within that scene we define a 3-D volume, called the view volume. This view volume is our way of specifying which objects we need to project onto the clipping window. Whichever objects lie inside the view volume are projected onto the view plane, more specifically onto the clipping window, and the objects that are outside are discarded.
This discarding takes place through a process called clipping, which we will discuss in subsequent lectures. So, essentially, we have a view plane; within it we define a clipping window; and in the 3-D scene description we define a view volume. Whatever objects are inside the view volume are projected onto the clipping window on the view plane, and whatever lies outside the volume is discarded through the process called clipping.
545
(Refer Slide Time: 9:27)
So, the point to be noted here is that the entire scene is not projected; only the portion enclosed by the view volume is projected. In this way, by controlling the view volume, we can control the amount of the scene that we want to display on the screen. This gives us some flexibility in synthesizing the images: by increasing the volume we can show a larger part of the scene, and by reducing the volume we can show a smaller part.
(Refer Slide Time: 10:12)
It may also be noted that larger and smaller here refer to the amount of the 3-D scene description that we want to display. By a larger image, I mean that we can show a larger amount of the scene on the screen if we increase the view volume size; similarly, if we reduce the view volume size, we can show a smaller region of the scene on the display. This brings us to the question of how to choose an appropriate view volume so that we can control the content of the image, which requires some understanding of the different projection types.
(Refer Slide Time: 11:23)
So, let us try to understand the different projection types. In order to do that, we have to understand the basic idea of projection at another level. What do we want? We want to project a 3-D object onto a 2-D view plane; so, from each surface point of the objects present in the 3-D scene, we generate straight lines towards the view plane in order to create the projected image.
These straight lines are known as projectors, and they intersect the view plane; when we put together these intersection points, we get the projected image. So, essentially, how do we generate the projection? We generate projectors, straight lines originating from the surface points in the original scene; these lines go towards the view plane and intersect it, and the intersection points taken together give us the projected image.
547
(Refer Slide Time: 12:49)
Now, depending on the nature of the projectors, we can have two broad types of projection: parallel projection and perspective projection. There are many sub-types also, but here we will not go into the finer details of the different types and sub-types; instead, we will restrict our discussion to these two broad types.
(Refer Slide Time: 13:20)
So, what happens in the case of perspective projection? The projectors are not parallel to each other; they converge to a center of projection.
Here, as you can see, this is the object and this is the view plane. From the object we create projectors, like here, here and here. These projectors are not parallel to each other; they are directed towards a common center of projection where they meet, and in the process they intersect, or pass through, the view plane. The points of intersection, like here, here and here, taken together give us the projected image.
So, this object is projected in this way, and this type of projection, where the projectors are not parallel to each other but meet at a center of projection, is called perspective projection.
(Refer Slide Time: 14:31)
Now, there is another type of projection, that is, parallel projection. In this case the projectors are parallel to each other; they do not meet at a common point, or, as it is typically put, they meet at infinity, which in simple terms means they do not meet. When these projectors intersect the view plane, they generate a set of intersection points, and these points taken together give us the projected image, as shown in this figure. Here, as you can see, this entire thing is the object, and we get this projection on the view plane.
549
(Refer Slide Time: 15:39)
Now, in the case of perspective projection, certain things happen that we should be aware of; in fact, it is because of these things that we get the perception of reality. Let us see what they are; together, they are known as anomalies.
These anomalies happen because the projectors converge at a point, and they indicate that the appearance of the object, in terms of shape and size, gets changed in perspective projection.
(Refer Slide Time: 16:06)
550
Now, one anomaly is perspective foreshortening: if two objects of the same size are placed at different distances from the view plane, the more distant object appears smaller after projection.
So, if this is the view plane, this is one object and this is another object; as you can see, the projected size of object AB is smaller than that of object CD, although they are actually of the same size. This happens because the projectors meet at a common center of projection, and, as you can see from here, because of this we get the sense of reality that distant objects appear smaller.
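To see foreshortening numerically, here is a tiny Python sketch. It assumes the standard perspective relation in which a projected coordinate scales as d/z (the actual projection equations are derived later in the course), so the numbers below are only illustrative.

def perspective_project_x(x, z, d):
    # Assumed standard perspective relation: the projected x coordinate shrinks
    # in proportion to the distance z from the center of projection, with the
    # view plane placed at distance d.
    return x * d / z

# Two objects of the same width (2 units), one at z = 10 and one at z = 40.
near_width = perspective_project_x(2.0, 10.0, 5.0)   # 1.0
far_width  = perspective_project_x(2.0, 40.0, 5.0)   # 0.25
print(near_width, far_width)   # the more distant object projects smaller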
(Refer Slide Time: 17:09)
Another anomaly is called vanishing points: lines that are not parallel to the view plane appear to meet at some point on the view plane after projection. The point where such lines appear to meet is called a vanishing point. For example, if we have an object like this, as you can see this side and this side appear to meet at these points; these are vanishing points. The object shape is projected accordingly, and this again gives us a sense of 3-D reality, due to the occurrence of the vanishing points.
551
(Refer Slide Time: 18:15)
There is another anomaly called view confusion. If the view plane is behind the center of projection, that is, the point where the projectors meet, then objects that are in front of the center of projection appear upside down on the view plane after projection. This is simple to understand: this is the center of projection and the view plane is behind it, so this object appears upside down as shown here; this point is projected to this point, and this point is projected to that point. This is view confusion.
So, we have perspective foreshortening, where distant objects appear smaller, then vanishing points, and then view confusion. Together, these anomalies make us perceive an object with changed shape and size, which in turn reinforces our perception of 3-D reality. So, how can we use the projections?
552
(Refer Slide Time: 19:42)
As I said, the perspective anomalies actually help in generating realistic images, since this is the way we perceive objects in the real world. So, perspective projection can be used for realistic computer graphics; for example, in games or animations, wherever we require realistic 3-D scenes to be generated, we should use perspective projection.
(Refer Slide Time: 20:24)
On the other hand, in the case of parallel projection, the shape and size of the objects are preserved; they do not change, unlike in perspective projection. As a result, parallel projection is not suitable for giving us realistic images.
So, if we use parallel projection to generate a scene, it would not look realistic. It is not used for realistic 3-D; where is it used, then? In graphics systems that typically deal with engineering drawings, such as the CAD (computer-aided design) packages we discussed at the beginning of the course. There, realism is not important; other things matter more, so parallel projection may be useful.
So, with that, I hope you got some basic idea of projections, so to recap, we use projections to
map 3-D scene to a 2-D image on the view plane, and we do this on a clipping window,
which is a region on the view plane that is based on the view volume that we define in the 3D space, and then we perform projection transformation.
So, let us now shift our focus to the idea of projection transformation what it is and how we
can do this.
(Refer Slide Time: 22:17)
As the name suggests, it is a transformation. So, it is similar to all other transformations that
we have already encountered, namely, the modeling transformation and the view transformation. In which way is it similar? We can perform these transformations with matrix multiplication between the point vectors and the projective transformation matrices.
So, essentially our objective is to have all sorts of transformations represented as a matrix
multiplication, and we have already seen how to do it with modeling transformation; we have
seen how to do it with view transformation. Now, let us try to derive projective
transformation matrices so that we can do it with projection transformation as well.
(Refer Slide Time: 23:21)
Now, in order to understand those matrices, we require some understanding of the view
volume because the projections are dependent on the view volumes, and the shape of the
view volumes actually depends on the type of projection we are interested in. We have already mentioned that there are two types, one is the perspective projection and one is the parallel projection, and their corresponding view volumes are different.
(Refer Slide Time: 23:52)
Now, in the case of parallel projection, the view volume takes the shape of a rectangular parallelepiped, as shown in this figure. Here, there are six faces, marked as the near plane (towards the view plane), then the right, the bottom, then the far plane, the top and the left plane. Now, this near plane is also the clipping window.
So, essentially it is the view plane containing the clipping window. So, in the case of parallel
projection we define the view volume as a rectangular parallelepiped defined by six planes, and
the near plane is the view plane containing the clipping window.
(Refer Slide Time: 25:00)
What happens in the case of perspective projection? In this case, what we use is a frustum, as shown in this figure. Like in parallel projection, here also the frustum is defined in terms of its bounding planes, so we have the near plane, far plane, right plane, left plane, top plane and bottom plane, and the near plane contains the clipping window. So, essentially the idea is the same as in parallel projection, in the sense that in both cases the near plane is the plane where the clipping window is situated.
(Refer Slide Time: 26:02)
So, with this knowledge of the view volume, let us try to understand the projection
transformation matrices for the two types of projection. Let us start with the parallel
projection. So, in this case, let us consider a point P which is in the view volume with
coordinates x, y, z. Now, we want to project this point on the view plane as a projected point
P’ with the new coordinates x’, y’, z’. And this projection takes place on the clipping
window, as we have already mentioned, and our objective is to relate these two points.
(Refer Slide Time: 26:53)
Let us assume that the near plane is at a distance d along the -z direction. Then it is quite obvious that, since it is a parallel projection, the new x coordinate will be the same as the original one and the y coordinate will also be the same; only the z coordinate will change, and the new z coordinate will be -d.
(Refer Slide Time: 27:29)
Then, we can actually represent this information in the form of a transformation matrix for
parallel projection, as shown here.
So, when we multiply this transformation matrix with the point vector P, then we will get the
new point vector P’.
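The matrix itself is on the slide and not reproduced in this transcript; as a rough sketch (with NumPy used purely for illustration, and d assumed to be the near-plane distance), the standard parallel projection matrix and its use look like this:

    import numpy as np

    d = 0.5  # assumed distance of the view (near) plane along -z

    # Parallel projection onto the plane z = -d: x and y are unchanged, z is replaced by -d
    M_parallel = np.array([
        [1, 0, 0,  0],
        [0, 1, 0,  0],
        [0, 0, 0, -d],
        [0, 0, 0,  1],
    ], dtype=float)

    P = np.array([2.0, 3.0, -4.0, 1.0])   # an arbitrary point in homogeneous coordinates
    P_projected = M_parallel @ P           # gives [2, 3, -0.5, 1]; here w is already 1

This is only one common way of writing the matrix; the exact form on the slide may differ slightly.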
(Refer Slide Time: 27:59)
However, we have to keep in mind that this multiplication that we are performing, that is, obtaining the transformed point by multiplying the transformation matrix with the original point in this way, takes place in a homogeneous coordinate system.
(Refer Slide Time: 28:27)
So, the multiplication is performed in this homogeneous coordinate system, and the real coordinates of P’ should be obtained by dividing the result by the homogeneous factor w. Now, we shall see later that in the case of projection transformations, w need not be 1, so we require this division. We will see some examples later where w is not 1, unlike in the previous transformations where the homogeneous factor was 1.
(Refer Slide Time: 29:03)
Now, let us see the perspective projection. This is more complex than parallel projection; in parallel projection we simply drop or change one coordinate, but here we require changes to all the coordinates because the projectors meet at a point. Now, to derive these changes, let us try to understand the projection with this figure; the figure shows the side view along the -x direction.
So, what we need? We need to derive the transformation matrix that projects P. This is the
point P to the point on the view plane or clipping window P’.
(Refer Slide Time: 30:05)
Now, from the original and projected points, you can see that they are part of two similar triangles, and from these triangles we can form some relationships, for example, between the y coordinates and between the x coordinates, in terms of d, that is, the distance of the view plane or the near plane from the origin, and the original z coordinate value.
(Refer Slide Time: 30:52)
From there, we can reorganize to get the transformation matrix in this form, where d is the
distance shown here between the origin and the projected point on the near plane or between
the origin and the near plane.
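The matrix on the slide is not reproduced here; as a sketch under the same setup (view plane at z = -d, center of projection at the origin), one common form of the perspective projection matrix and the division by w look like this, again using NumPy only for illustration:

    import numpy as np

    d = 0.5  # assumed view-plane distance along -z

    # Perspective projection for a view plane at z = -d.
    # The last row makes w = -z/d, so dividing by w scales x and y by d/(-z).
    M_persp = np.array([
        [1, 0,      0, 0],
        [0, 1,      0, 0],
        [0, 0,      1, 0],
        [0, 0, -1.0/d, 0],
    ], dtype=float)

    P = np.array([2.0, 3.0, -4.0, 1.0])   # a point inside the view volume (z negative)
    Ph = M_persp @ P                       # homogeneous result: [2, 3, -4, 8]
    P_projected = Ph / Ph[3]               # divide by w: [0.25, 0.375, -0.5, 1]

The exact matrix used in the lecture slides may be organised slightly differently, but the role of the division by the homogeneous factor is the same.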
(Refer Slide Time: 31:17)
Now, like in the case of parallel projection, here also, what can we do? We can multiply this perspective projection matrix with the original point vector to get the projected point in the homogeneous coordinate system, and to get the actual projected point, what we require is to divide it by this homogeneous factor w; we will see that here again, w will not be equal to 1.
So, we require some divisions, unlike in the cases of other transformations that we have seen before. That is how we can derive the projection matrices. Now, a few things we should note here: one is that the derivations shown are based on very simplified situations; in reality, the actual projection matrices are slightly more complicated. Also, there is some other information that is stored along with the projection; those things we will discuss later briefly, although we will not go into the minute details of those concepts.
(Refer Slide Time: 32:51)
So, to summarize, today what we have learned is the idea of projection. Why do we require projection? To transform from a 3-D scene description to a description on a 2-D plane, which we are calling the view plane. On the view plane, we define a region called the clipping window, on which this projection takes place, and for the purpose of projection we define a 3-D region in the 3-D space that is called the view volume.
Now, there are two types of view volumes defined for the two types of projections, a rectangular parallelepiped for parallel projection and a frustum for perspective projection. In case of perspective projection, we get to see several anomalies that change the shape and size of the objects after projection, and that gives us the perception of 3-D reality. Accordingly, for
applications of computer graphics where we require to generate 3-D realistic scenes, we use
perspective projection, whereas parallel projection can be used in situations where 3-D
realism is not required.
And we have also shown how to derive the projection transformation matrices for the basic
two types of projections, namely parallel projection and perspective projection. The
transformation idea is the same. Essentially, we have a transformation matrix; this matrix we multiply with the point vector to get a new point vector, the transformed point vector. However, we have to keep in mind that this transformed point vector is defined in the homogeneous coordinate system. So, to get the actual transformed point vector, we need to divide the obtained coordinate values by the homogeneous factor w, and in the case of transformations where projection is involved, w is not 1, unlike the other transformations that we have seen before, namely modeling transformation and view transformation.
In the next lecture, we will talk about the other sub-stage of the view pipeline that is the
window to viewport mapping.
(Refer Slide Time: 35:39)
Whatever I have discussed today can be found in this book; you are advised to refer to chapter 6, the whole of section 6.2, excluding section 6.2.3. That section we will discuss in the next
lecture. Now, in this section, you will find more details about projections and types of
projections. You may go through those details if you are interested. That is all for today,
thank you and goodbye.
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 20
Windows to Viewport Transformation
Hello and welcome to lecture number 20 in the course Computer Graphics. We are currently
discussing the graphics pipeline, that is, the series of stages or steps that have to be performed to convert a 3D description of a scene to a 2D image on a computer screen, or on the display that we get to see.
(Refer Slide Time: 0:58)
So, there are five stages, as we have already mentioned object representation first stage,
modelling transformations second stage, lighting third stage, these three stages we have already
discussed completely. Currently, we are in the fourth stage viewing pipeline, and there will be
one more stage fifth stage that is scan conversion.
(Refer Slide Time: 1:24)
Now, the fourth stage, the viewing pipeline, contains a set of sub-stages. The first sub-stage is a
transformation from a 3D world coordinate scene description to a 3D view coordinate scene
description. Now, this view coordinate is also called an eye or camera coordinate system. And
this transformation is generally called 3D viewing transformation, which we have already
discussed in our earlier lectures.
(Refer Slide Time: 2:11)
The second stage is projection, so we project the 3D view coordinate description onto the view
plane.
And this projection is performed through a transformation which is generally called projection
transformation. This also we have discussed in our earlier lectures.
(Refer Slide Time: 2:43)
There is a third stage in which we perform a mapping or transformation from the view plane to a viewport, which is defined in the device coordinate system. This is called the window-to-viewport mapping, where the window is on the view plane and the viewport is in the device coordinate system. And this third stage we are going to discuss today.
Now, before we discuss the mapping, we would discuss one important aspect of projection
transformation that we did not discuss in our last lecture, that is the idea of the canonical view
volume. Let us see what this volume means.
(Refer Slide Time: 3:42)
As we mentioned earlier, there is an important stage in the graphics pipeline. In fact, this is part
of the fourth stage that we are currently discussing that is viewing pipeline. Here, what we do is
whatever objects are outside the view volume are clipped out. Now, that stage is called clipping.
And we do it to remove all the objects that are outside the view volume.
We already know what a view volume is, that is a region in the 3D space which we want to
project. Now, if it involves lots of objects which are partly outside of view volume and partly
inside or lots of objects that are outside the view volume, then we require a large number of
calculations to determine what to clip out.
And this involves object surface and view volume boundary intersection point calculations, that is, finding where the object surfaces intersect the view volume boundaries. So, if the number of such objects is large, then the number of such boundary calculations will be large, and these boundary calculations are not easy. They involve a lot of floating-point operations, and accordingly, the complexity is high.
(Refer Slide Time: 5:21)
Now, if we have to perform such intersection calculations with respect to an arbitrary view volume, where we have no control over the boundary planes of the view volume, then this complexity only increases. So, we can expect a large number of computations, which is likely to take a large amount of time, reducing the quality of the image, as we may get to see flicker.
(Refer Slide Time: 6:05)
In order to avoid that, we can use one simple idea, that is we can come up with a standardized
definition of view volume irrespective of how the actual view volume looks. We can always
convert it to a standardized view volume or a standard view volume. This is called a canonical
view volume or CVV in short.
Essentially it is a standard representation of view volume irrespective of the actual nature of the
volume. Remember that there are two types of view volume. One is for parallel projection, that is
a rectangular parallelepiped, and the other one is for perspective projection, that is a frustum. Now
both these can be converted to a standard form which we call canonical view volume, which
makes the intersection calculations standard and easier to implement.
(Refer Slide Time: 7:10)
So, for both parallel and perspective projection, the standardized view volume looks the same.
However, the way to arrive at the standardized or canonical volume is different for the two types of projections. So, let us start with the parallel projection. For parallel projection, the canonical view volume that we define is a cube within a specified range, which is -1 to 1 along all the three axes X, Y and Z. And as I already mentioned, any
arbitrary view volume can be transformed to the CVV simply by the scaling operation.
So, suppose this is an arbitrary view volume defined in terms of its bounding planes, six planes,
so we can always map it by scaling within this range along the X Y Z direction and
correspondingly, we get the canonical view volume.
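As a rough sketch of this step (the slide's exact matrix is not reproduced here), the normalisation for the parallel-projection case can be written as a single matrix that combines translating the volume's centre to the origin with scaling into the -1 to 1 range; the bounding values below are hypothetical parameters, and the sign convention for the near/far pair depends on the handedness being used:

    import numpy as np

    def parallel_cvv_matrix(left, right, bottom, top, near, far):
        # Map the axis-aligned view volume [left,right] x [bottom,top] x [near,far]
        # onto the canonical cube [-1,1]^3 (centre translated to the origin, then scaled).
        return np.array([
            [2.0/(right-left), 0.0, 0.0, -(right+left)/(right-left)],
            [0.0, 2.0/(top-bottom), 0.0, -(top+bottom)/(top-bottom)],
            [0.0, 0.0, 2.0/(far-near),   -(far+near)/(far-near)],
            [0.0, 0.0, 0.0, 1.0],
        ])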
(Refer Slide Time: 8:28)
In case of perspective projection, this transformation is slightly more complicated because here we are dealing with a view frustum, and we need to convert it to the canonical view volume for parallel projection, that is, the rectangular parallelepiped where the X, Y and Z extents of the bounding planes are within a specified range.
So, what we can do here, we will just talk about the idea rather than going into the details, we
can convert this arbitrary view frustum to the canonical view volume here, by applying shearing
and scaling in sequence. As we can guess from the figures, that shearing is required to change
the shape and scaling is required to change the size. So, when we apply these two transformations on the original view frustum, we get the canonical view volume; of course, here we will not go into any further details beyond this basic idea.
So, what we have learned? That we define a view volume and this view volume, we transform to
a canonical view volume so that in later stages when we perform clipping, the calculations are
easier to implement because we are dealing with a standardized definition of the view volume.
(Refer Slide Time: 10:14)
Let us revisit the sequence of transformations that we perform to project a point p in the world coordinate scene to a point on the view plane. We mentioned that this is akin to taking a photograph, that is, we transfer it to the view coordinate system and then take a projection. However, earlier we mentioned only these two steps.
Now a third step is added. So first, we transform the world coordinate point to the view coordinate system, as we have discussed in an earlier lecture. And the next step is not projection. Instead, what we do is, in this view coordinate description, we define a view volume, and this
view volume is transformed to a canonical view volume. And accordingly, the point is also
transformed by applying the same set of transformations. So, the next stage is to transform the
point in the view volume to a point in the canonical view volume; then the final stage is to
perform the projection transformation that is, project the point in the canonical view volume on
the view plane.
So, these three steps, a transformation to view coordinate then transformation to canonical view
volume and then projection transformation constitute the sequence through which we project a
point in the world coordinate scene into a point on the view plane. Mathematically or in matrix
notation that we are following, we can write this series of steps as shown here in this expression
where this transformation represents the transformation to the view coordinate system.
This one represents the transformation to canonical view volume, and this one represents the
projection transformation. Since we are applying them in sequence, we are following the right-to-left rule. So, first the transformation to the view coordinate system, then the transformation to
canonical view volume and then transformation to view plane through projection.
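In the matrix notation we have been following, and with hypothetical names for the three matrices, this sequence can be sketched as P' = Mprojection · Mcvv · Mview · P, applied right to left, where Mview is the view transformation, Mcvv is the transformation to the canonical view volume, and Mprojection is the projection transformation.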
(Refer Slide Time: 13:11)
So, that is the idea of performing projection on the view plane. There is one more point to be
noted. So far, we mentioned that in projection, 3D points are mapped to 2D. The implication is that we are removing the Z or depth component. However, it may be noted at this point that when we implement the pipeline, this depth component is actually not removed. Why is that so?
(Refer Slide Time: 13:53)
One operation that we perform in this fourth stage is called hidden surface removal. We will talk
about this operation in detail in a later lecture. The point to be noted here is that this operation
requires depth information. So, the depth information after projection is actually not removed.
Instead, this original depth information is stored in separate storage, which is called the Z-buffer
or the depth buffer.
So, we are actually not removing the depth information, although we are performing a projection;
instead, we are keeping it stored separately in the Z-buffer or the depth buffer. And this
information is required to perform a later operation called hidden surface removal, which gives
us a realistic effect in an image.
(Refer Slide Time: 14:56)
So, that is, in short, what we do during projection and how we project from a world coordinate
scene to a view plane. Now there is one more stage. Let us go to that stage that is mapping from
this view plane to a viewport on the device space.
(Refer Slide Time: 15:21)
So far, what have we discussed? We discussed the steps to transform a point in world coordinates to a
clipping window on the view plane. That means a region on the view plane on which we are
projecting the objects that are part of the view volume.
Now, we have also shown that this is typically the near plane of the canonical view volume. So,
this is our window or clipping window.
(Refer Slide Time: 15:59)
We can assume, just for simplicity, that the window is at 0 depth, or Z equal to 0, although in general that is not an absolute requirement.
(Refer Slide Time: 16:17)
It may also be noted that we are talking of the canonical view volume, that is, the X and Y extents must be within a fixed range irrespective of their actual position in the world coordinate scene. Because of this reason, that we are restricting everything within a fixed range, these canonical view volumes are standardized, and the clipping window that is defined on the near plane of the canonical view volume is often called a normalized window.
So, here we are dealing with a normalized window where the extent of values are to be within a
predefined range.
(Refer Slide Time: 17:19)
Now, this view plane is actually an abstract concept, so accordingly, the clipping window is also
an abstract and intermediate concept. We cannot see it; what we get to see on the screen is
something different. The points that are there on the clipping window are to be shown on the
screen. But the scene that is there in the window need not occupy the whole screen, for example,
here.
Suppose this outer rectangle defines a whole scene, out of which we have projected this part defined within the clipping window. Now, this part can be displayed on any region of the screen and can be of any size. The region on which this part is displayed on the screen is called the viewport.
(Refer Slide Time: 18:41)
So, we have two concepts here: the window, which is the same as the clipping window and is
normalized. And objects are projected on this window, and we have the other concept viewport,
which is defined in the device space with respect to the screen origin and dimensions. So, this
viewport refers to a region on the device space where this projected image needs to be shown.
Now, this region can be at any location in the device space and can be of any size,
irrespective of the size of the clipping window. So, what we need? We need to map from this
window to the viewport.
So, it requires one more transformation to transfer the points from the window to the viewport.
(Refer Slide Time: 19:59)
So, let us see how we can perform this transformation. What do we want? Suppose this is our window and this is our viewport; note that here we are not using the normalized range. We are formulating the problem in a very generic setting where Wx and Wy can take any value. So, (Wx, Wy) is a point on the window, and we want to map it to a point on the viewport, (Vx, Vy).
(Refer Slide Time: 20:44)
So, how can we do that? The first thing is that we have to maintain the relative position of this point with respect to the window boundaries; the same relative position has to be maintained in the viewport. If we want to maintain that, then we get relationships like the one shown here between the window dimensions and the viewport dimensions.
(Refer Slide Time: 21:17)
Now, this expression can be simplified in this form. So, we can represent the X coordinate of the
point in the viewport in terms of the X coordinate of the point in window and these two
constants, which are defined here in terms of the window and viewport sizes.
(Refer Slide Time: 22:04)
Similar relationship we can form between the Y coordinate of the point in the viewport and the Y
coordinate of the same point in the window, by first forming the relationship between the
y coordinates and then simplifying and rearranging to get this relationship where Vy is the Y
coordinate of the point in the viewport, Wy is the Y coordinate of the point in the window.
And these two are constants defined here in terms of, again, the window and viewport sizes.
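As a sketch of these relations (using hypothetical variable names; the constants correspond to the sx, sy, tx, ty used shortly in the examples), the mapping can be written as a small function:

    def window_to_viewport(wx, wy, win, vp):
        # win and vp are (xmin, xmax, ymin, ymax) for the window and the viewport.
        # Preserving the relative position of the point gives a scale plus a translation.
        wxmin, wxmax, wymin, wymax = win
        vxmin, vxmax, vymin, vymax = vp
        sx = (vxmax - vxmin) / (wxmax - wxmin)
        sy = (vymax - vymin) / (wymax - wymin)
        tx = vxmin - sx * wxmin
        ty = vymin - sy * wymin
        return sx * wx + tx, sy * wy + ty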
(Refer Slide Time: 23:05)
So, using those expressions, we can actually form the transformation matrix as shown here. So, this is the matrix to transform the window point to the viewport point.
(Refer Slide Time: 23:23)
And we will follow the same rule, that is, to get the transformed point we will multiply the original point with the transformation matrix as shown here. Note that here again we are dealing with the homogeneous coordinate system; since these are two-dimensional points, we have three-element vectors and three-by-three matrices.
And at the end, we need to divide the obtained coordinates by the homogeneous factor, as shown here, to get the transformed points. The approach is similar to what we have seen earlier.
So, that is the basic idea of how to transform from a point in the window or the clipping window
to a point in the viewport, which can be anywhere on the device space.
Now, let us try to understand the concepts that we have gone through so far in terms of
illustrative examples.
(Refer Slide Time: 24:38)
So, in our earlier lecture, we have come across this example where we mentioned one object,
shown here, and a camera position and view-up direction; everything has been mentioned, and we computed the transformed centre point of the object in the view coordinate system.
(Refer Slide Time: 25:11)
So, we will not go into the details of how we calculated that again; let us just mention the transformed point, that is, (0, 0, -1), which we got after applying the viewing transformation.
(Refer Slide Time: 25:29)
Now let us assume that the view plane is located at Z equal to - 0.5. And we want parallel
projection. So, what would be the coordinate of the object centre after the projection? Assuming
that the view volume is sufficiently large to encompass the whole transformed object.
(Refer Slide Time: 26:08)
So, our parallel projection transformation matrix is given here, and we know d is 0.5. So, this is
our parallel projection matrix.
(Refer Slide Time: 26:27)
So, if we use this matrix and perform the matrix multiplication shown here, between the projection matrix and the point vector, then we get this point as the projected point on the view plane. Now here, since the homogeneous factor is 1, our point is directly obtained.
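For reference, since parallel projection keeps x and y unchanged and sets z to -d, the transformed centre (0, 0, -1) projects to (0, 0, -0.5) on this view plane; the slide's intermediate numbers are not reproduced in this transcript.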
(Refer Slide Time: 26:56)
Now, let us consider perspective projection. Earlier, we considered parallel projection, what will
happen if we now consider the perspective projection with the same view plane? So, what would
be the new point after projection?
(Refer Slide Time: 27:29)
So, the transformation matrix for perspective projection is shown here. We know the value of d; replacing d in it, we get our projection matrix. And with this matrix, what do we do?
(Refer Slide Time: 27:58)
We multiply it with the point vector as before, as shown here, and after the multiplication we get this transformed point in the homogeneous coordinate system. Now note here that the homogeneous factor is not 1; earlier I mentioned that in projection, particularly perspective projection, we get homogeneous factors that are not 1. So there we need to be careful: to obtain the final transformed point, we have to divide whatever we got by the homogeneous factor. After the division, we will get this point, and this is the final point that we get after the perspective projection is applied to the central point.
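For reference, the geometry alone fixes this result: the transformed centre (0, 0, -1) lies on the -z axis, so its perspective projection onto the view plane at z = -0.5 is (0, 0, -0.5). The homogeneous factor obtained before the division depends on the exact form of the matrix (it is 2 under the form sketched earlier, with last row [0, 0, -1/d, 0]).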
(Refer Slide Time: 29:01)
So, we have performed projection. Now let us try to see what happens if we want to perform this
window to viewport transformation. Now, let us assume that we projected the point on a
normalized clipping window. And this projected point is at the centre of the normalized window.
Now we are defining a viewport with a lower-left corner at (4, 4) and the top right corner at (6,
8).
So, that means if this is our viewport, then the lower-left corner is this one. This is (4, 4) and the top
right corner is (6, 8). So, if we perform a window to viewport transformation, then what would
be the position of the point, the same central point in the viewport? Let us try to derive that.
(Refer Slide Time: 30:18)
Now, we already mentioned that the clipping window is normalized, so the values or the extents of the window are fixed, and we get these values: this is between -1 and 1, and again, this is also between -1 and 1. From the viewport specification, we can see that this point is (4, 4), so this is 4 and this is 4; and since this point is (6, 8), this must be 6 and this must be 8. So, we get these values. Next, we simply substitute these values in the transformation matrix that we have seen earlier.
(Refer Slide Time: 31:27)
We first compute the constant values sx, sy, tx, ty by using those values, as we have seen earlier, to get these results: sx is 1, sy is 2, tx is 5 and ty is 6.
(Refer Slide Time: 31:53)
So, the transformation matrix can be obtained by replacing the sx, sy, tx, ty values in this original
transformation matrix which gives us this matrix. So, this will be our window to viewport
transformation matrix.
(Refer Slide Time: 32:16)
Now, once we obtain the transformation matrix, it is easy to obtain the transformed point in the viewport: we multiply the transformation matrix with the point vector to obtain the transformed point in homogeneous coordinates. Here again, the homogeneous factor is not 1. So, we have to divide these values by the homogeneous factor, as shown here and here, which eventually gives us the point (5, 6).
So, this will be our transformed point after we apply the window to viewport transformation. So,
this is how we get a transformed point in the viewport. Now in this example, you probably have
noted that we have defined viewport irrespective of the window description, we can define it
anywhere in the device space.
What we need is basically a transformation. Also, you probably have noted that the viewport size
has nothing to do with the window size, the window is normalized, whereas the viewport is not
normalized. So, we can define any size by specifying its coordinate extents, and through
mapping, we get that transformed point. So, this gives us the flexibility of placing the projected
image anywhere on the screen with any size.
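Plugging the example's numbers into the window_to_viewport sketch given earlier reproduces the same result: window_to_viewport(0, 0, (-1, 1, -1, 1), (4, 6, 4, 8)) returns (5.0, 6.0), that is, the centre of the normalized window mapped to the point (5, 6) in the viewport.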
(Refer Slide Time: 34:18)
So, in summary, what we have discussed so far are three sub-stages of the viewing pipeline
stage. So, these three sub-stages are view transformation, projection transformation and viewport
transformation. Just to recap, these three sub-stages are used to simulate the effect of taking a photograph. When we take a photograph, we look at the scene through the viewing mechanism provided in the camera; that we mimic by performing the viewing transformation, in which we transform the world coordinate scene to a 3D view coordinate system, which is actually equivalent to watching the scene through the viewing mechanism of the camera. Then, we take a photo, that means we project it on the view plane; that is done through the projection transformation.
And finally, we display it on the screen, which is of course not part of the photograph analogy,
but we do it in computer graphics; that stage is mimicked with the use of the window-to-viewport transformation. This transformation is required to have the flexibility of displaying the projected
image anywhere on the screen and with any size, irrespective of the size of the clipping window.
In the fourth stage, apart from these three sub-stages, which are related to three types of
transformations, there are two more operations that are done.
We have already mentioned them in this lecture: one is clipping and the other is hidden surface removal. So, these
two operations we will discuss in our subsequent lectures.
(Refer Slide Time: 36:11)
Whatever I have discussed today can be found in this book, Computer Graphics. You can go through Chapter 6, sections 6.2.3 and 6.3; the first section is on the topic of the canonical view volume, and the second discusses in detail the window-to-viewport transformation. That is all
for today. Thank you and goodbye.
Computer Graphics
Professor. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 21
Clipping Introduction and 2D Point and Line Clipping
Hello and welcome to lecture number 21 in the course, Computer Graphics. We are currently
discussing the graphics pipeline, that is how the 3D scene gets converted to a 2D image on the
computer screen, what are the stages there to perform this task. Together these stages are known
as pipeline, as we have already discussed. So, let us just quickly have a relook at the pipeline
stages and then we will start our discussion on today's topic.
(Refer Slide Time: 01:09)
What are the pipeline stages? We have object representation as the first stage, then modeling
transformation as the second stage, lighting or assigning color to objects as the third stage,
viewing pipeline as the fourth stage, and scan conversion as the fifth stage. So, here I would like
to reemphasize the point that the sequence shown here need not be exactly followed during implementation of the pipeline; there, the stages may be in a slightly different sequence.
Now, among these stages, we have already discussed first stage, second stage, third stage, and
currently we are discussing the fourth stage that is viewing pipeline. As you can see here, in the
viewing pipeline there are sub stages. So, we have viewing transformation, clipping, hidden
surface removal, projection transformation and window-to-viewport transformation.
Among them, we have already discussed in the previous lectures the viewing transformation, the projection transformation, and the window-to-viewport transformation. Two more operations in the fourth stage are remaining, namely clipping and hidden surface removal. So, these operations we are going to discuss in today's and subsequent lectures.
(Refer Slide Time: 02:56)
So, in the viewing pipeline stage, we have already covered these three transformations: view transformation, projection transformation, and viewport transformation.
(Refer Slide Time: 03:07)
And there are two more operations: clipping and hidden surface removal which are part of the
fourth stage.
(Refer Slide Time: 03:18)
So, these operations we are going to cover in the lectures that we are going to have this week.
(Refer Slide Time: 03:29)
Let us start with clipping: what is this operation and how is it performed?
(Refer Slide Time: 03:37)
If you recollect, earlier we talked about a concept called the view volume. Essentially, what we discussed is that when we are trying to generate a 2D image, this is analogous to taking a photo of a 3D scene. So, we first perform the view transformation, to transfer the content
from world coordinate system to view coordinate system, then we perform projection
transformation to project the 3D view coordinate system description of the scene to a 2D view
plane description. And finally, we perform window-to-viewport mapping.
However, in this process, what we project on the view plane are only the objects that are present in the view volume, that is, a region in the 3D view coordinate space that we have decided to project. So, whatever objects are within the view volume should be projected. Thus, we need to define this view volume, this 3D region, in the view coordinate system before projection.
(Refer Slide Time: 05:19)
So whichever objects lie within this volume are projected, whereas objects that are outside the volume are discarded. For example, let us have a look at this figure. Here the view volume is this rectangular parallelepiped. Now, as you can see, this object is entirely within the view volume,
so it will be projected on the view plane. This object is partially within the view volume, so
whichever part is within the view volume will be projected and the outside part, this one will be
discarded. Whereas in this case, the entire object is outside the view volume so it will not be
projected.
So, we have three situations. In one case, the entire object is within the volume, so the entire object is projected. In another case, the object is partially within the volume; the part that is within the volume will be projected, whereas the outside part will be discarded. And in the third case, we
have the entire object outside the volume which will be discarded. Now, the process of
discarding objects is called clipping. So, before projection we would perform clipping and then
whatever objects remain should be projected.
(Refer Slide Time: 06:46)
Now, the question is how a computer can discard or clip an object? That is done through some
programs or algorithms, which collectively are known as clipping algorithms. So, we perform
clipping with the help of some clipping algorithms. In this lecture and subsequent lectures, we will go through a few of those algorithms which are easier to understand.
(Refer Slide Time: 07:22)
Before we start our discussion on the clipping algorithms, we should keep in mind two points.
First thing is, the algorithms will be assuming that the clipping is done against canonical view
volume. To recollect, a canonical view volume is a standardized view volume where the shape of
the view volume is a rectangular parallelepiped and its bounding planes are within a fixed range. So,
it is a standardized view volume.
So, whenever we are talking of clipping, we assume that the scene is already transferred to the
canonical view volume and then we are performing clipping. Secondly, we will first discuss 2D
clipping algorithms for simplicity. It will be easier to understand clipping when we are dealing
with 2D objects and then we will extend our discussion to 3D clipping.
(Refer Slide Time: 08:35)
So, let us see what are the algorithms that we can use to perform 2D clipping.
(Refer Slide Time: 08:44)
Now since we are discussing 2D clipping, we have to restrict ourselves to the 2D concepts. Now
view volume that we mentioned earlier is a 3D concept but that is not relevant in 2D. So instead
of view volume we now use the concept of view window which is a square region on the view
plane. In earlier lectures, we were already introduced to this idea: we mentioned that on the view plane there is a clipping window on which objects are projected; the concept is the same as the view window.
So it is equivalent, actually, to assume that the view volume and all objects are already projected on the view plane; that is another way of looking at 2D clipping. So the projection is already done, we have the 2D clipping window, and now we want to perform clipping. That is, after projection we want to perform clipping.
(Refer Slide Time: 10:00)
So, this view volume is projected to form the window and other objects are projected to form
points, lines and fill areas, such as a polygon. So, we have a window which is formed by
projecting the view volume on the view plane and then other objects are projected to form points,
lines as well as fill areas such as polygons.
(Refer Slide Time: 10:34)
So, then our objective boils down to performing the clipping operation for points, lines, and fill
areas with respect to the window. So, the scenario is that we have a clipping window or view
window and we have objects that are projected on the view plane and we want to clip the
projected objects against this window where the projection takes place in the form of points,
lines or fill areas.
(Refer Slide Time: 11:18)
Let us start with the simplest of all clipping, that is point clipping. How to clip a point against the
view window.
(Refer Slide Time: 11:31)
Let us assume that we are given a point with coordinates (x, y). Now the clipping idea is simple: we have to check whether the point is within the window or outside. So, what we have to do is simply check if the coordinate values lie within the window boundary. So, we need to perform these checks: here we are checking the x value against the window boundaries, and here we are checking the y value against the window boundaries, to determine whether they are inside the boundary or outside. If the point is inside the boundary, we keep it; otherwise we clip it out. That is a very simple idea.
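A minimal sketch of this check (with the window boundaries passed as hypothetical parameters):

    def clip_point(x, y, xmin, xmax, ymin, ymax):
        # Keep the point only if it lies inside the clipping window (boundaries included).
        return xmin <= x <= xmax and ymin <= y <= ymax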
(Refer Slide Time: 12:35)
More complicated is line clipping. Here we do not have a single point, we have a large number
of points to consider. So, the corresponding algorithm would be more complicated than what we
have seen for point clipping.
(Refer Slide Time: 13:00)
Let us first talk about a very simple intuitive approach. What we can do, we can represent any
line segment with its end points and then we check the end point positions to decide whether to
clip or not. So, we are representing a line with its end points. Then for each end point, we are
applying this point clipping approach to check whether the endpoint is within the boundary or
not. And after checking both the end points, we can come to a conclusion whether the line is
within the window or not.
(Refer Slide Time: 13:49)
But there is a problem. If we follow this approach, there can be any one of three scenarios. In
the first case, both end points may lie within the window boundary. Consider this line segment as
L1. Here, both the end points are within the boundary. So we do not clip. In the second case, one
end point is inside and the other end point is outside. Consider L2, here this endpoint is inside
and this endpoint is outside. So, in that case it has to be clipped. And then there is a third
scenario where both the endpoints are outside.
However, here we have a problem. Consider the line L4. In this case, both the end points are
outside and the entire line is outside, so we can simply discard it. But in case of L3, here also
both the end points are outside. However, a portion of it defined between these two intersection
points is actually inside the view window. So, we cannot discard the line entirely. Instead, we
have to clip out these outside parts and discard them whereas this inside part must be kept. So, in
this case, then we have to check for line boundary intersections to determine whether some
portion is inside the window or not.
(Refer Slide Time: 15:45)
Now, these intersection checks are computationally expensive because they involve floating-point operations. And in practical applications, every change of screen involves a large number of such
intersection checks, which may slow down the overall rendering process and it may even turn out
to be impractical. So, we require some alternative, some better solution.
(Refer Slide Time: 16:25)
One such solution is provided by an algorithm called the Cohen-Sutherland algorithm. Let us go
through the steps of the algorithm.
(Refer Slide Time: 16:38)
So, in this algorithm, we assume a representation of the view plane. So here we assume that the
window and its surrounding is divided into nine regions as shown here. So, this is the window,
the central part, and then we have above left, above, above right with respect to the window, left,
right, with respect to the window again, and below left, below, and below right with respect to
the window position again. So, these nine regions are assumed to represent the view plane, and we get the nine regions by extending the window boundaries as shown here. So, this is the first assumption.
(Refer Slide Time: 17:41)
Then what we do, we assign a code to each of these regions. Now, there are nine regions. So, we
require four bits to represent all the nine regions, which is very obvious, and each region is given a unique four-bit code. Now, each bit indicates the position of the region with respect to the window. So, this is the nine-region representation and these are the codes assigned to
these nine regions. Note here that each code is unique.
So above left is represented by 1001, above is represented by 1000, above right is represented by
1010, left represented by 0001, right represented by 0010, below left represented by 0101, below
represented by 0100, and below right represented by 0110. The central region or the window is
represented by all zeros.
And the organization of the code looks something like this, where the leftmost bit indicates the above location, the next bit indicates the below location, the next bit indicates right, and the rightmost bit indicates the left location. Each bit can take either 1 or 0; 1 means the region lies in that direction with respect to the window, and 0 means it does not. For example, consider 1001. We have above 1, below 0, right 0, and left 1; that means the region is above left, as shown here, because these two bits are 1, whereas the below and right bits
are 0. So that is how the codes are assigned to each region.
(Refer Slide Time: 20:26)
Now that is the assumption and assignment of code. Then the algorithm starts working once
those things are done. So, in step one of the algorithm we assign region codes to the end points
of the line segment. So given a line segment, we first have to assign the corresponding region codes to its endpoints. How can we do that? Assume that one end point is denoted by P(x, y) and the window is specified by these boundary values Xmin, Xmax, Ymin, Ymax. So, these are the boundary values which specify the window.
(Refer Slide Time: 21:30)
Then what we do is perform a few simple checks to determine the region code of the endpoint P. What are those checks? We check the sign of the difference between y and Ymax. P has coordinates x and y, and Xmax, Ymax, Xmin, Ymin are the window boundaries. So, we take the difference of y and Ymax and check its sign. The sign of this quantity will give us the bit value 1 if the quantity is positive, and 0 if it is negative. We do that for each bit position. So, for bit 3, we check the difference between y and Ymax. For bit 2, we check the difference between Ymin and y. For bit 1 we check the difference between x and Xmax, and for bit 0 we check the difference between Xmin and x; we take their signs and then apply this rule to get the actual bit values.
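A sketch of this region-code assignment (the bit layout follows the lecture: bit 3 above, bit 2 below, bit 1 right, bit 0 left):

    ABOVE, BELOW, RIGHT, LEFT = 8, 4, 2, 1   # bit 3, bit 2, bit 1, bit 0

    def region_code(x, y, xmin, xmax, ymin, ymax):
        # Four-bit Cohen-Sutherland region code of the point (x, y) with respect to the window.
        code = 0
        if y > ymax: code |= ABOVE   # sign of (y - ymax) positive
        if y < ymin: code |= BELOW   # sign of (ymin - y) positive
        if x > xmax: code |= RIGHT   # sign of (x - xmax) positive
        if x < xmin: code |= LEFT    # sign of (xmin - x) positive
        return code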
(Refer Slide Time 22:54)
So, the first step is assignment of region codes to the end points. In step 2, we perform more
checks on the region codes that are assigned to the endpoints and then we take action. So, if both
endpoint codes turn out to be 0000, that means the line is completely inside. So, we retain the
line. Now, if the logical AND or bitwise AND operation on the endpoint codes turns out to be not
equal to 0000, then the line is completely outside and we discard the entire line. Note that, here
we do not need to check for intersection.
(Refer Slide Time: 23:48)
In the next step, when none of the cases that we just discussed in step 2 occur, we know that the
line is partially inside the window and we need to clip it. So, in step 2 we perform checks to
decide whether the line is completely inside or completely outside and accordingly we take action. Then, if none of these conditions is satisfied, that means the line is partially inside and we need to clip
it, that we do in step 3.
(Refer Slide Time: 24:26)
So for clipping, we need to calculate the boundary and line intersection points. Here we cannot avoid calculation of the intersection points. That we can do in different ways; one possible way is to take the boundaries in a particular order, for example, first the above boundary, then the below boundary, then the right boundary, then the left boundary, and so on; any order is fine. And for each boundary, we compare the corresponding bit values of the end point region codes to decide whether the line is crossing that boundary or not.
(Refer Slide Time: 25:25)
If the corresponding bits are not the same, then the line intersects that particular boundary; we then form the line equation from the end points and determine the intersection point by solving the
equations. And then we assign region code to the intersection point, as we have done earlier. And
we do it for all boundaries and in this process, we discard the line segment that lies outside the
window.
(Refer Slide Time: 26:06)
Now, with respect to a particular boundary, we have a new line segment, that is, the intersection point and the other end point. So, we compare these two new end points to see if the segment is completely inside the window. If it is, then of course we keep it; if not, then we take the other end point and repeat the process. So, what do we do in step 3? In step 3, we go for clipping the line. We start with one end point and its region code, which is already there; we check the region code with respect to all the boundaries following a particular sequence, and based on that check we determine if the line is intersecting a particular boundary.
If that is so, then we use the line equation and the boundary equation to solve for the intersection point. Then we assign a new code, the way we did before, to this intersection point; the part between the original end point and the intersection point that clearly lies outside the clipping window is discarded. Then we have two new end points, that is, the intersection point and the original remaining endpoint. We check if the segment is completely inside the window; if it is, then we keep it, otherwise we repeat the process again for the other end point.
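Putting steps 2 and 3 together, a compact sketch of the whole procedure could look like the following (it reuses the region_code function and the bit constants sketched earlier; this is an illustration of the idea, not the exact pseudocode from the lecture):

    def cohen_sutherland_clip(x1, y1, x2, y2, xmin, xmax, ymin, ymax):
        # Returns the clipped end points, or None if the whole segment is discarded.
        c1 = region_code(x1, y1, xmin, xmax, ymin, ymax)
        c2 = region_code(x2, y2, xmin, xmax, ymin, ymax)
        while True:
            if c1 == 0 and c2 == 0:            # both end points inside: trivially accept
                return (x1, y1), (x2, y2)
            if (c1 & c2) != 0:                 # both outside the same boundary: trivially reject
                return None
            # Pick an end point that is outside and intersect the line with the crossed boundary.
            c_out = c1 if c1 != 0 else c2
            if c_out & ABOVE:
                x = x1 + (x2 - x1) * (ymax - y1) / (y2 - y1); y = ymax
            elif c_out & BELOW:
                x = x1 + (x2 - x1) * (ymin - y1) / (y2 - y1); y = ymin
            elif c_out & RIGHT:
                y = y1 + (y2 - y1) * (xmax - x1) / (x2 - x1); x = xmax
            else:                              # LEFT
                y = y1 + (y2 - y1) * (xmin - x1) / (x2 - x1); x = xmin
            # Replace the outside end point with the intersection point and recompute its code,
            # which discards the part of the line lying outside that boundary.
            if c_out == c1:
                x1, y1, c1 = x, y, region_code(x, y, xmin, xmax, ymin, ymax)
            else:
                x2, y2, c2 = x, y, region_code(x, y, xmin, xmax, ymin, ymax)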
(Refer Slide Time: 27:55)
Let us try to understand whatever we have discussed so far in terms of some examples.
(Refer Slide Time: 28:03)
So earlier we have shown how to assign the region codes, and based on that how to determine
whether to clip a line or not. Let us consider this line segment defined between the end points A and B, with the window extents provided here: Xmin is 2, Xmax is 4, Ymin is 2, and Ymax is 4.
We check the sign and accordingly we assign region code to the end points. For example, for A,
let us try to do this. Let us start with bit 3, here the corresponding expression is sign of 3 minus
4, which is sign of minus 1, since it is a negative quantity so bit 3 will be 0.
Similarly, bit 2 again negative quantity so it will be 0. Bit 1, this is a positive quantity 1 so it will
yield 1. And bit 0 which is again a negative quantity and we are taking the sign of it, so it will
result in 0. So, our region code will be given by these bits.
(Refer Slide Time: 29:35)
Or 0010. In a similar way, we can check for B, which is also 0010; it will be the same.
(Refer Slide Time: 29:57)
Now we go to step 2, the series of checks. The first check is whether both end point codes are equal to 0000. They are not 0000, so the first check fails. The second check is whether their logical AND is not equal to 0000, which happens to be true in this case. So, we can be sure that this line is completely outside the window boundary, no further check is required, and we can simply discard it. So, this particular line will be completely discarded: just by checking the region codes, we managed to determine that it is completely outside.
(Refer Slide Time: 30:51)
Now let us consider another line segment, that is given by the end points P and Q. As usual we
will first determine the end point region codes. The P code will turn out to be 0000, as you can
see here for bit 3 sign of a negative quantity 0. Again, bit 2 negative quantity 0, bit 1 negative
quantity 0 and bit 0 sign of a negative quantity again 0. Similarly, we can determine the region
code of Q, which turns out to be 0010. You can try it yourself.
(Refer Slide Time: 31:43)
Then once the region codes are decided, we go for the series of checks. The first check fails: both end point codes are not 0000. The second check also fails: the logical AND turns out to be 0000.
So, it is not completely outside. So, we need to go for the third step and need to determine the
intersections.
(Refer Slide Time: 32:15)
From the endpoints we can derive the line equation, as shown here. Then we check for intersection of this line with the boundaries in the following order: the above boundary first, then the below boundary, then the right boundary, and then the left boundary. For the above boundary, you can see that bit 3, which represents the above boundary, is the same for both the end points P and Q. So, the line does not cross the above boundary. Similarly, you can see that the line does not cross the below boundary. However, there is a difference in the bit values in the case of bit 1, so we can conclude that the line crosses the right boundary.
(Refer Slide Time: 33:30)
So, we have to find out the intersection point with the right boundary. Now, right boundary line
equation is x equal to 4, as you can see here. And we put this value in the line equation to get the
intersection point at (4, 5/2). This is the intersection point; let us call it Q’. Since the line crosses the right boundary and Q is outside, the segment Q’Q is outside the boundary. So, this segment is discarded and the new line segment becomes PQ’. These are the two new end
points.
(Refer Slide Time: 34:26)
Next, we determine the region code of Q’ as we have seen earlier which turns out to be 0000.
Now, as you can see both P and Q’ have the same region code 0000. So, this segment is entirely
within the window. And it is retained by changing Q to Q’.
(Refer Slide Time: 35:00)
One thing was left: earlier we checked the above, below and right boundaries, but we did not check the left boundary. However, as we can see from the region codes of P and Q’ (that is, the new line segment), bit 0 is the same for both end points, so there is no intersection. And so, at the end of the algorithm, we get PQ’ as the clipped line. That is how we can perform clipping following this algorithm.
(Refer Slide Time: 35:54)
Let us see one more example. Now let us consider a slightly more complicated line segment
given by M and N. As before we can determine the region code as shown here, for both the end
points M and N, which turn out to be 0001 for M and 0010 for N.
(Refer Slide Time: 36:21)
Then, in the next step, we go for the series of checks. Here also the first check fails: both end point codes are not equal to 0000. The second check also fails: the logical AND results in 0000, so the line is not completely outside. Thus, we cannot completely keep it or completely discard it. So,
there must be some intersection and we have to determine that intersection and clip the line. So,
we need to determine line boundary intersection points and clip it.
(Refer Slide Time: 37:07)
For that we need to derive the line equation from its end points, which is easy, as shown here. Then we check for the line intersection with the boundaries following this order: the above, below, right, and left boundaries. Bit 3 of M and N is the same, which means the line does not cross the above boundary. Similarly, it does not cross the below boundary since bit 2 is the same. However, the bit 1 values are different, and the line crosses the right boundary.
(Refer Slide Time: 37:55)
Then we check for intersection points, so the right boundary equation is x equal to 4. So, using
this equation and the line equation, we get the intersection point to be N’ at (4, 9/4), that is, here.
Now this point is the intersection point between the right boundary and the line and since N is
outside the right boundary, so as before we discard this segment N’N and the new line segment
becomes MN’, so this part. Then we decide the region code of the new end point N’ which is
0000.
(Refer Slide Time: 39:05)
Thus, we now have two new endpoints M and N’. M has code 0001 and N’ has code 0000. Now
earlier we checked the above, below, and right boundaries; the left boundary check was pending, and we will do that next. Here, if we check, we can see that for the left boundary the bit values are not the same, which means there is again an intersection with the left boundary, and we check for the intersection point between the left boundary and the new line segment given by the end points
M and N’.
(Refer Slide Time: 39:59)
Now, we know left boundary equation is x=2. We use this equation and the line equation to get
the intersection point M’, which is 2 and 11/4, here. Now the point M is outside the left boundary
that we already know, that means on the left side, so we discard this segment MM’. And new line
segment becomes M’N’ between these two points, this is M’ and this one is N’. So, we now
decide or determine the region code of this new endpoint M’ which is 0000.
(Refer Slide Time: 40:52)
Then we go for checking the end point region codes again, step 2 of the algorithm, and we find that M’ and N’ have the same region code 0000; that means the entire line segment M’N’ is within the
window. So, the algorithm resets the line segment to M’N’ and we have already checked for all
the boundaries, so no more boundary remains to be checked. And the algorithm returns this line
segment as the clipped line segment and stops. That is how the algorithm works.
(Refer Slide Time: 41:44)
So to summarize, the Cohen-Sutherland algorithm is designed to reduce intersection calculations. If we go for the intuitive method, it is difficult to tell when the line is completely inside or completely outside, so for every line we have to go for intersection checks; that is not required in the case of the Cohen-Sutherland method.
Here, we assign some region codes to the line endpoints and based on the region codes we can
decide whether the line is completely inside or completely outside. In those cases, we do not
need to go for intersection checking. However, if the line is not completely inside or completely
outside, we know that there is some intersection and we need to clip, there we have to go for
some intersection checks and find out the intersection points. So, some amount of intersection
calculation still remains.
(Refer Slide Time: 42:48)
But it reduces the calculations to a great extent, which helps render complex scenes faster. But as I said, it still retains some intersection calculations.
625
(Refer Slide Time: 43:12)
So, this particular algorithm works well when the number of lines which can be clipped without
further processing is large compared to the size of the input set of lines. So, when the number of
lines that can be clipped without further processing is large, then clearly Cohen Sutherland works
much better because no intersection calculation is involved. But if that is not so, then it still has
some problem.
(Refer Slide Time: 43:52)
626
There are in fact other, faster methods developed to reduce intersection calculations further, and those methods are based on more efficient tests that do not require intersection calculations or complex floating-point operations.
(Refer Slide Time: 44:15)
There was one algorithm proposed by Cyrus and Beck, which was among the earliest attempts in
this direction and which relied on parametric line equations. Later, a more efficient version was
proposed by Liang and Barsky.
(Refer Slide Time: 44:44)
627
However, in this course we will not discuss those algorithms any further. If you are interested
then you may refer to the reading material that will be mentioned next. So to summarize, today
we have learned about clipping in 2D, clipping means discarding objects that are outside the
view volume. In case of 2D clipping we want to discard lines or points that are outside the clipping window. Later on, we will see how to discard fill areas that are outside the clipping window as well. And in order to do that, we rely on algorithms that reduce extensive intersection point calculations.
One such algorithm we have learned today, that is the Cohen-Sutherland algorithm. It is quite efficient; however, it still retains some amount of complex calculations, which can be avoided with other, more efficient algorithms, namely those by Cyrus and Beck and by Liang and Barsky. Those algorithms we have not discussed.
(Refer Slide Time: 46:07)
If you want to learn about those you may refer to the reading material that is mentioned in this
slide. So, you may refer to this book. Go through chapter 7, section 7.1, today we discussed
section 7.1.1. However, if you are interested to learn about Liang Barsky algorithm, then you can
go through the next section as well, section 7.1.2 also. We will not discuss that algorithm but you
may go through it, if you want more information. That is all for today. Thank you and goodbye.
628
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 22
2D Fill-area Clipping and 3D Clipping
(Refer Slide Time: 00:46)
Hello and welcome to lecture number 22 in the course Computer Graphics. We are currently
discussing the 3D graphics pipeline. And the pipeline has got 5 stages. We have already
discussed object representation that is the first stage. Then modelling transformations - second
stage. Lighting or assigning colour - third stage. Currently, we are in the fourth stage that is
viewing pipeline. As you can see, it consists of 5 sub-stages. We have already discussed a few of those and are continuing our discussion on the remaining ones.
629
(Refer Slide Time: 01:20)
So, among those sub-stages, we have already discussed view transformation, projection transformation, and viewport transformation.
(Refer Slide Time: 01:34)
Two more operations are there, as we have seen in the pipeline; clipping and hidden surface
removal. Among them currently we are discussing clipping.
630
(Refer Slide Time: 01:48)
So, in the last lecture, we introduced the basic idea of clipping and also discussed 2D line clipping. So, we will continue our discussion on clipping. Today, we are going to discuss fill area clipping as well as 3D clipping.
(Refer Slide Time: 02:12)
631
So, what is this fill area clipping? As we mentioned, when we talk of clipping there is a clipping window, and earlier we discussed how to clip points and lines against this window. However, when we project objects, the projection may be in the form of a fill area, such as a polygon with a boundary.
Now clipping a fill area is different from clipping a point or a line, as we shall see in today's lecture. In fact, such situations are quite frequent in practice, where we have to clip polygons against the clipping window. So, it requires some mechanism to do that.
632
(Refer Slide Time: 03:12)
Now, what can be a very obvious and straightforward approach? Let us try to understand the situation. Suppose this is our clipping window and we are given a polygon after projection, something like this, say this triangle. We have to keep the part which is inside the clipping window, which I am showing shaded, and we have to clip out the outside part. How can we do that?
One way can be to use the line clippers that we discussed in the earlier lecture on each edge of the fill area, like here is one edge, one edge, one edge. We then perform clipping on the edges and decide on the clipped region. However, as you can see from this example, that is not necessarily an easy, efficient or good approach. Sometimes it is even difficult to understand how it works.
633
(Refer Slide Time: 04:45)
Instead, we require better approaches. There are in fact many efficient algorithms proposed for the purpose. In this lecture we are going to discuss two of those approaches. One is the Sutherland-Hodgeman algorithm and the other is the Weiler-Atherton algorithm. Let us try to understand these algorithms.
(Refer Slide Time: 05:13)
634
We will start with the Sutherland-Hodgeman algorithm. What does this algorithm do? Here we start with 4 clippers. Now, these clippers are essentially the lines that define the window boundary. For example, if this is my window boundary, then each of the lines defining the boundary is a clipper.
So, there are 4 clippers in 2D clipping, that is, right, left, above and below. Now, each clipper takes as input a list of ordered pairs of vertices, which essentially indicate the edges; each pair of vertices indicates an edge. From that input list it produces another list of output vertices; that is the basic idea. So, there are 4 clippers, each clipper takes as input a list of ordered pairs of vertices where each pair of vertices represents an edge, and then it performs some operations to produce an output list of vertices.
635
(Refer Slide Time: 06:55)
Now, when we perform these operations, we impose some order of checking against each clipper; that can be any order. Here in this discussion we will assume the order: left clipper first, then right clipper, then bottom clipper, and at the end the top or above clipper.
(Refer Slide Time: 07:20)
Now, as we said, we start with the left clipper. Its input set is the original polygon vertices or, in other words, the original polygon edges represented by pairs of vertices; that is the input set to the first or left clipper.
636
(Refer Slide Time: 07:44)
Now, to create a vertex list as output or also to provide the input vertex list, we need to follow a
naming convention, whether to name the vertices in a clockwise manner or anticlockwise
manner. Here again, we will assume that we will follow an anticlockwise naming of vertices.
With these conventions, let us denote input vertex list to a clipper by the set V having these
vertices.
637
(Refer Slide Time: 08:31)
Now, for each edge, that is, each pair of vertices in the list denoted by (vi, vj), we perform some checks, and based on the check results we take some action. So, what are those checks?
638
(Refer Slide Time: 08:52)
If vi is inside and vj is outside of the clipper then we return the intersection point of the clipper
with the edge represented by the vertex pair vi, vj. If both vertices are inside the clipper, then we
return vj.
639
(Refer Slide Time: 09:25)
If vi is outside and vj is inside of the clipper, then we return two things. One is the intersection
point of the clipper with the edge represented by the pair vi, vj and also vj. Both the things we
return intersection point and vj. And finally, if both vertices are outside the clipper then we do
not return anything, we return null.
640
(Refer Slide Time: 10:05)
Now, here we have used the terms inside and outside. How are they defined? In fact, these terms are to be interpreted differently for different clippers. There is not a single meaning to these terms; we define them based on the clipper.
(Refer Slide Time: 10:27)
Let us now go through this definition for each of the 4 clippers. For the left clipper, when we talk of inside we mean that the vertex is on the right side of the clipper, and when we talk of outside we mean that it is on the left side of the clipper. For the right clipper it is just the opposite: when the vertex is on the left side, we call it inside; otherwise it is outside. For the top clipper, if a vertex is below the clipper it is inside; otherwise it is outside. And for the bottom clipper it is again just the opposite of the top clipper: an inside vertex is above the clipper, whereas outside means it is below. And how do we determine whether a vertex is on the right side or left side, or above or below? Just by comparing the coordinate values of the vertex with respect to the particular clipper.
For example, suppose this is the top clipper and its equation is given by y = 4. Now suppose a point is denoted by (3, 5). We check the y-value of the point, that is 5; clearly 5 is greater than 4, which is the y-value of the top boundary. Then we can say that this point is outside because it is above the clipper. Similarly, we can determine inside and outside by comparing the x or y coordinate value of the vertex with the clipper value.
(Refer Slide Time: 12:55)
If the vertex is on the clipper then it is considered inside in all the cases. So, for the left clipper, inside means either it is on the right side or on the clipper; otherwise it is outside. And the same is true for all the other clippers.
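Before moving to the example, here is a small Python sketch of one Sutherland-Hodgeman pass, following the four per-edge rules and the inside conventions above. Representing each clipper as a pair of small functions (an inside test and an intersection routine) is an illustrative choice for this sketch, not the notation of the lecture; the clipper order follows the left, right, bottom, top order mentioned earlier.

def clip_against(vertices, inside, intersect):
    out = []
    n = len(vertices)
    for k in range(n):
        vi, vj = vertices[k], vertices[(k + 1) % n]   # edge (vi, vj), anticlockwise order
        if inside(vi) and inside(vj):
            out.append(vj)                            # both inside: return vj
        elif inside(vi) and not inside(vj):
            out.append(intersect(vi, vj))             # inside to outside: intersection only
        elif not inside(vi) and inside(vj):
            out.append(intersect(vi, vj))             # outside to inside: intersection and vj
            out.append(vj)
        # both outside: return nothing
    return out

def clip_polygon(vertices, xmin, xmax, ymin, ymax):
    # a vertex on the clipper counts as inside, hence the non-strict comparisons below
    def x_cut(p, q, x):   # intersection of edge pq with a vertical clipper x = constant
        t = (x - p[0]) / (q[0] - p[0])
        return (x, p[1] + t * (q[1] - p[1]))
    def y_cut(p, q, y):   # intersection of edge pq with a horizontal clipper y = constant
        t = (y - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y)
    clippers = [
        (lambda v: v[0] >= xmin, lambda p, q: x_cut(p, q, xmin)),   # left clipper
        (lambda v: v[0] <= xmax, lambda p, q: x_cut(p, q, xmax)),   # right clipper
        (lambda v: v[1] >= ymin, lambda p, q: y_cut(p, q, ymin)),   # bottom clipper
        (lambda v: v[1] <= ymax, lambda p, q: y_cut(p, q, ymax)),   # top clipper
    ]
    for inside, intersect in clippers:
        vertices = clip_against(vertices, inside, intersect)
        if not vertices:
            break
    return vertices

Keeping the per-edge rules in one routine and only swapping the inside test and the intersection computation per clipper is the main structural idea of the algorithm.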
642
(Refer Slide Time: 13:21)
Now, let us try to understand this algorithm in terms of an illustrative example. Let us consider this situation: here we have defined one clipping window and we have a fill area. This fill area is defined by the vertices {1, 2, 3}; as you can see here, we followed a counter-clockwise or anticlockwise naming convention to list the vertices.
Our objective is to determine the clipped polygon, that is, the polygon denoted by the vertices {2’, 3’, 3’’, 1’, 2}. And we do that by following the Sutherland-Hodgeman algorithm.
643
(Refer Slide Time: 14:44)
So, at the beginning we start with the left clipper. Then we check against right clipper, then top
clipper and then bottom clipper.
(Refer Slide Time: 15:06)
Let us see what happens after checking against the left clipper. Here the input vertex list is the original vertices, that is {1, 2, 3}, which indicates the three edges represented by the vertex pairs {1, 2}, {2, 3} and {3, 1}. For each pair we perform the check. For the pair {1, 2} we can see that both vertices are on the right side of the left clipper, which means both are inside. So, Vout is {2} as per the algorithm.
Similarly, after checking {2, 3} against the left clipper, Vout becomes {2, 3}. And after checking {3, 1} the final list becomes {2, 3, 1}. Against the left clipper, all the vertices are inside in all the cases.
(Refer Slide Time: 16:29)
Now, let us check against the right clipper. The input vertex list is the same, {1, 2, 3}, and initially Vout is NULL. The pairs of vertices to be checked are {1, 2}, {2, 3} and {3, 1}; all three edges we need to check. For {1, 2} both vertices are inside the right clipper, which we can check by comparing their coordinate values, because both of them are on the left side of the right clipper. So, Vout is now {2}. Then we check the pair {2, 3}; here we can see that 2 is inside whereas 3 is outside. In that case we compute the intersection point 2’, this point, and then set Vout to be {2, 2’}.
Then we check {3, 1}; here vertex 3 is outside because it is on the right side of the clipper, whereas 1 is inside because it is on the left side. So, here we calculate the intersection point 3’ and also return 1, and the output vertex list becomes {2, 2’, 3’, 1}. So, after checking against the right clipper, we get this output list: {2, 2’, 3’ (that means this point), 1}.
645
(Refer Slide Time: 18:40)
Then we check against the top clipper. In this case, Vin, the input vertex list, is the output vertex list after checking against the right clipper, that is {2, 2’, 3’, 1}. Initially Vout is NULL. The pairs of vertices we need to check are four: {2, 2’}, {2’, 3’}, {3’, 1} and {1, 2}.
So, first we check {2, 2’} against the top clipper and find that both 2 and 2’ are inside, because both of them are below the clipper, so the output list becomes {2’}. Then we check the next vertex pair {2’, 3’}; again both 2’ and 3’ are below, so inside, and Vout becomes {2’, 3’}.
Then we check {3’, 1}; in this case we see that 3’ is inside whereas 1 is outside. We calculate the intersection point 3’’ here and modify our output list to be {2’, 3’, 3’’}. Finally, we check {1, 2}. Here we see that 1 is outside whereas 2 is inside. Then again we calculate the intersection point 1’ and modify Vout to be {2’, 3’, 3’’, 1’, 2}. So, this is our output list after checking against the top clipper, and it serves as the input list to the remaining clipper to be checked, that is the bottom clipper.
646
(Refer Slide Time: 20:43)
This is the input list for the bottom clipper, and the output list is initially NULL. As we can see, all these vertices 2’, 3’, 3’’, 1’ and 2 are inside because they are above the bottom clipper. So, the output list becomes the same, that is {2’, 3’, 3’’, 1’, 2}. This is also the output of the algorithm, because there are no more clippers to check against and the algorithm stops. At the end of the algorithm, we get this vertex list, which represents the clipped region. That is how the algorithm works.
647
(Refer Slide Time: 21:43)
Now let us move to our next algorithm, that is the Weiler-Atherton algorithm. The Sutherland-Hodgeman algorithm that we just discussed works well when the fill area is a convex polygon and it is to be clipped against a rectangular clipping window. So, if these conditions are satisfied, then the Sutherland-Hodgeman algorithm works well.
648
(Refer Slide Time: 22:16)
However, that need not always be the case, and the Weiler-Atherton algorithm provides a more general solution. This algorithm can be used for any polygon, either concave or convex, against any polygonal clipping window; it need not be only a rectangle. Let us see how it works.
(Refer Slide Time: 22:44)
So, we will try to understand the algorithm in terms of an example rather than formal steps. Let us consider this scenario: here we have a rectangular clipping window and a fill area. We will try to understand how the algorithm helps us identify the part to be discarded, that is this region, and the parts to be kept after clipping, that is these two regions, this one and this one. We start by processing the fill area edges in a particular order, which is typically the anticlockwise order.
(Refer Slide Time: 24:00)
So, what do we do in the processing? We check the edges one by one and continue along the edges till we encounter an edge that crosses to the outside of the clip window boundary. Let us start with this edge (1, 2), this edge. We check whether it crosses the window boundary or not; that is our processing. It does not cross, so we continue to the next edge, that is the edge represented by the vertex pair {2, 3}.
Now this edge crosses to the outside of the window boundary. Note that here we are following the anticlockwise order. If the edge does not cross to the outside, that is, if the edge is crossing into the inside of the window, then we just record the intersection point, whereas if the edge is crossing to the outside, then we stop and perform a different action. What do we do in that case?
650
(Refer Slide Time: 25:24)
At the intersection point, we make a detour. Here the intersection point is 2’, this point, so we make a detour there. We no longer continue along this direction; instead, we now follow the edge of the clip window along the same direction, maintaining the traversal order.
So, in this example we follow the anticlockwise direction and make a detour from here along the window boundary, so here we follow this order. Essentially, we initially traversed this way, then while traversing found that this edge crosses to the outside, so we traverse this way along the window boundary instead of continuing along the polygon edge.
651
(Refer Slide Time: 26:25)
Now, this along-the-boundary traversal we continue till we encounter another fill area edge that crosses to the inside of the clip window. Here, as you can see, if we follow an anticlockwise traversal, then this edge actually crosses to the inside. The edge is the one denoted by the vertex pair (6, 1), which crosses to the inside of the window, and we encounter it while traversing along the window boundary. At that point, what do we do?
652
(Refer Slide Time: 27:17)
At that point, we resume the polygon edge traversal, again along the same direction. So, we stop here and then continue along the same direction till we encounter a previously processed intersection point. Here we continue up to point 1, because point 1 has already been processed, and we stop there.
(Refer Slide Time: 27:53)
So, after this part, we see that we started from here, traversed up to this point, determined this intersection point, then traversed along this line up to this intersection point, and traversed back up to the originating point. There are two rules of traversal: from an intersection point due to an outside-to-inside fill area edge we should follow the polygon edges, and from an intersection point due to an inside-to-outside fill area edge we should follow the window boundaries.
These are the rules we applied while performing the traversal. This gives us one part of the clipped area, that is this part, and apparently the traversal stopped here. So, how do we get the other part? Actually, the algorithm does not stop here. What happens next?
(Refer Slide Time: 29:00)
Before we go into that, you should also remember that whenever we traverse, the traversal direction remains the same, irrespective of whether we are traversing along an edge or along the window boundary. So, if we are following an anticlockwise direction, it should be anticlockwise always.
654
(Refer Slide Time: 29:21)
And after this traversal ends, the output is the vertex list representing a clipped area, as we have seen. In this case the traversal ended at 1, so we get the vertex list {1, 2, 2’, 1’}, which gives us this clipped area.
(Refer Slide Time: 29:46)
But clearly, the whole fill area is not yet covered; some of the vertices are still not processed, so what do we do? If all the vertices are not processed, we resume the traversal along the polygon edges, in the same direction, from the last intersection point of an inside-to-outside polygon edge. Our last intersection point of an inside-to-outside polygon edge is 2’ here. Remember that 1’ is on an outside-to-inside edge, so it is not applicable; what is applicable is 2’. From there we resume our traversal till we cover the remaining vertices.
This traversal is done in a similar way as before. What we do here is traverse along the anticlockwise direction to the vertex here; we traverse this edge, then this edge. Here, as you can see, there is an outside-to-inside crossing, so we do not do anything special and keep on traversing this way, this way. Now at this point we can see that there is an inside-to-outside crossing; in the earlier case it was outside to inside.
Here it is inside to outside, at 6’. So, now we traverse along the clip window edge. Then we encounter this intersection point again; this is from outside to inside, so now we resume our traversal along the polygon edge. Finally, what we did is traverse this direction, this direction, this direction, then this direction, then this direction, this direction. Since we have already encountered 4 before, we stop our traversal here when we encounter 4. Then we get the remaining portion of the clipped area as well, just like the way we got the earlier one. That is how Weiler-Atherton works.
(Refer Slide Time: 32:19)
So, we discussed two algorithms: one is Sutherland-Hodgeman, the other is Weiler-Atherton. Sutherland-Hodgeman is simpler but has restricted use; it is applicable when we have a convex polygon clipped against a rectangular window. Weiler-Atherton is more generic: it is applicable for any polygonal fill area, either concave or convex, against any polygonal clipping window.
So far we have discussed clipping in 2D; we have learned how to clip a point, a line and a fill area. Now let us try to understand clipping in 3D, because here our main focus is the 3D graphics pipeline. Clipping in 3D is essentially an extension of the ideas that we have already discussed for 2D. Let us see how these extensions are done.
(Refer Slide Time: 33:42)
The only thing we have to keep in mind is that here we are talking about clipping against the normalized view volume, which is usually a symmetric cube with each coordinate in the range -1 to 1 in the 3 directions. That is the normalized view volume we assume while performing the clipping. Now, for Cohen-Sutherland, we can extend the basic 2D version to 3D with some modification.
657
(Refer Slide Time: 34:22)
Point clipping also we can extend, so let us first talk about point clipping. Here we check x, y and z (earlier we were checking only x and y) to see whether these values are within the range of the canonical volume. If so, then the point is kept; otherwise it is clipped out.
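As a sketch, this is a single range comparison; the function name is illustrative only.

def point_inside_view_volume(x, y, z):
    # keep the point only if all three coordinates lie within the canonical range [-1, 1]
    return -1 <= x <= 1 and -1 <= y <= 1 and -1 <= z <= 1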
(Refer Slide Time: 34:53)
The Cohen-Sutherland line clipping algorithm can also be easily extended to 3D clipping with some modifications; the core idea remains the same. That is, we divide the view coordinate space into regions. Earlier we had 9 regions; since we are now dealing with 3D, we have 27 regions, 3 times as many. And since we have 27 regions, each region needs to be represented with 6 bits, one bit for each of the 6 planes that define the canonical view volume: far, near, top, bottom, right, left. This is in contrast with the 4 bits earlier used to denote the 4 sides of the window.
For each plane we have 9 regions defined: there are 9 regions behind the far plane, 9 regions between the near and far planes, and 9 regions in front of the near plane. Together there are 27 regions, and each region is represented with a 6-bit code, where bit 6 represents the far region, bit 5 the near region, bit 4 the top region, bit 3 the bottom region, bit 2 the right region and bit 1 the left region. The idea remains the same as in 2D; only the size changes because we are now dealing with 3D. The other steps remain the same.
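A small Python sketch of this 6-bit code, using the bit assignment just described (bit 1 = left up to bit 6 = far). Treating z < -1 as beyond the far plane and z > 1 as in front of the near plane is an assumption of this sketch, since that depends on the orientation convention fixed earlier.

def region_code_3d(x, y, z):
    code = 0
    if x < -1: code |= 1 << 0   # bit 1: left of the volume
    if x >  1: code |= 1 << 1   # bit 2: right of the volume
    if y < -1: code |= 1 << 2   # bit 3: below the volume
    if y >  1: code |= 1 << 3   # bit 4: above the volume
    if z >  1: code |= 1 << 4   # bit 5: in front of the near plane (assumed orientation)
    if z < -1: code |= 1 << 5   # bit 6: behind the far plane (assumed orientation)
    return code

The trivial accept and trivial reject tests then read exactly as in 2D: accept when both endpoint codes are zero, reject when their logical AND is non-zero.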
659
(Refer Slide Time: 37:21)
Now let us try to understand the extension of the algorithms for fill area clipping. Here, we first check if the bounding volume of the polyhedron (that is, the fill area) is outside the view volume, simply by comparing their maximum and minimum coordinate values in each of the x, y and z directions. If the bounding volume is outside, then we clip the object out entirely. Otherwise, we apply the 3D extension of the Sutherland-Hodgeman algorithm for clipping.
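A sketch of this bounding-volume rejection test; representing the bounding box by its minimum and maximum corner points is an illustrative choice.

def bounding_volume_outside(min_pt, max_pt):
    # min_pt and max_pt are the componentwise minimum and maximum (x, y, z) of the polyhedron
    for lo, hi in zip(min_pt, max_pt):
        if hi < -1 or lo > 1:     # entirely outside the canonical range in this direction
            return True
    return False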
660
(Refer Slide Time: 38:16)
Here also the core idea of the 3D Sutherland-Hodgeman algorithm remains the same as the 2D version, with two main differences. What are those differences?
(Refer Slide Time: 38:31)
A polyhedron is made up of polygonal surfaces, so here we take one surface at a time to perform clipping. Earlier we took one line at a time; here we take one surface at a time. Usually the polygons are divided into triangular meshes, and there are algorithms to do so, which you can refer to in the reference material at the end of this lecture. Using those algorithms, we can divide a polygon into a triangular mesh and then process one triangle at a time.
(Refer Slide Time: 39:17)
And the second difference is that instead of the 4 clippers we had earlier, we now have 6 clippers, which correspond to the 6 bounding surfaces of the normalized view volume, which is a cube. So, these are the differences between the 2D version and the 3D version of the algorithm: earlier we were considering one line at a time for clipping, now we consider one surface at a time. These surfaces are polygonal surfaces, and we can convert them into triangular meshes and then perform clipping one triangle at a time; that is one difference. The other difference is that earlier we were dealing with 4 clippers, and now we have 6 clippers representing the 6 bounding planes of the view volume, which is a cube. That is, in summary, the major difference between 2D clipping and 3D clipping; the core ideas remain the same, with some minor changes. With that, we come to the end of our discussion on clipping.
662
(Refer Slide Time: 40:43)
Our next topic will be hidden surface removal. A few things were omitted during this discussion, for example the creation of a triangular mesh from a given polygon. For those details you may refer to the material mentioned in the next slide.
(Refer Slide Time: 41:16)
So, whatever I have covered today can be found in this book. You can go through chapter 7, sections 7.1.3, 7.1.4 and section 7.2 for the topics I have covered. Outside these topics there are also a few interesting things that I did not discuss, but you can find them in the book, so you may like to go through that material as well. That is all for today. Thank you and goodbye.
664
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture - 23
Hidden Surface Removal - 1
Hello and welcome to lecture number 23 in the course Computer Graphics. So we are
currently discussing the graphics pipeline and there are five stages in the pipeline as we all
know.
(Refer Slide Time: 00:46)
The first stage is object representation, second stage is modeling transformation, third stage is
lighting, fourth stage is viewing pipeline, and the fifth stage is scan conversion. Among them,
we are currently in the fourth stage that is viewing transformation. The previous three stages
we have already discussed.
665
(Refer Slide Time: 01:16)
Now, in this fourth stage, as you may recollect there are many sub-stages. So there are 3
transformations and 2 sub-stages related to some other operations. So the three
transformations are view transformation, projection transformation, and viewport
transformation.
(Refer Slide Time: 01:47)
Then we have two operations, clipping and hidden surface removal. Now, among all these
transformations and operations, we have already covered the three transformations and
clipping.
666
(Refer Slide Time: 02:02)
Today, we are going to discuss the remaining operation that is hidden surface removal. Let us
see what is hidden surface removal, what is the basic idea, and how we can do this.
(Refer Slide Time: 02:22)
667
Earlier, during our discussion on clipping, we learned how to remove objects, fully or partially, that are outside the view volume; that we did using the clipping algorithms. Note that clipping was done on objects that are partially or fully outside the view volume. Sometimes we actually need to remove, again fully or partially, objects that are inside the view volume.
So in case of clipping, we are dealing with objects that are outside the view volume, whereas
in case of hidden surface removal, we deal with objects that are inside the volume. Now when
the objects are inside the volume, clearly, we cannot apply the clipping algorithm because
clipping algorithms are designed to detect the objects that are outside, either fully or partially.
(Refer Slide Time: 03:40)
668
Let us see one example. Consider this image. Here, there are two objects; this is the one and
this cylinder is the other one. Now ideally, there is one surface here. For realistic image
generation, if we are looking at this object from this direction then ideally, we should not be
able to see this surface represented by the dotted boundary line, this surface. So if the viewer
is located at this point here then this object A surface, which is behind this object B should
not be visible to the viewer.
So before we render the image to have the realistic effect, we should be able to eliminate this
surface from the rendered image.
(Refer Slide Time: 05:11)
Here in this case we cannot use clipping, because here we are assuming that both the objects
are within the view volume. So clipping algorithms are not applicable. What we require is a
different algorithm or a different set of algorithms. Now, these algorithms are collectively
known as hidden surface removal methods or alternatively, visible surface detection methods.
To note: with clipping, what do we do? We try to remove objects that are partially or fully outside the view volume. With hidden surface removal, we try to remove objects or object surfaces which are inside the view volume but which are blocked from view, due to the presence of other objects or surfaces, with respect to a particular viewing position.
669
(Refer Slide Time: 06:17)
So in case of hidden surface removal, we assume a specific viewing direction, because a surface hidden from a particular viewing position may not be hidden if we look at it from another direction. So only with respect to a viewing position can we determine whether a surface or an object is hidden or not.
(Refer Slide Time: 06:46)
Now, before we go into the details of the methods for hidden surface removal, we should keep in mind two assumptions that we will be making. First, we will use a right-handed coordinate system and assume that the viewer is looking at the scene along the negative Z direction. Secondly, the objects in the scene have polygonal surfaces, so all the object surfaces are polygonal. These two assumptions we make in order to explain the hidden surface removal methods.
(Refer Slide Time: 07:48)
Now let us go into the details of the methods that are there to detect and eliminate hidden
surfaces.
(Refer Slide Time: 08:00)
There are many methods, and all of them can be broadly divided into two types: one is the object space method, the second is the image space method. What is the idea behind these methods? Let us try to understand.
671
(Refer Slide Time: 08:24)
In case of object space method what we do, we compare objects or parts of the objects to
each other to determine the visible surfaces. So here we are dealing with objects at the level
of 3D.
(Refer Slide Time: 08:53)
And the general approach followed to perform hidden surface removal with an object space method broadly consists of two stages. For each object in the scene, we first determine those parts of the object whose view is unobstructed by other parts or by any other object with respect to the viewing specification.
So the first stage is to determine the parts that are not hidden with respect to the viewing position. Then, in the second stage, we render the parts that are not hidden, essentially those parts that are not obstructed, with the color of the object. These are the two general steps performed in any object space method: the first stage is to determine the surfaces that are not hidden, and in the second stage we render those surfaces or parts of the objects with the particular color.
(Refer Slide Time: 10:12)
Since here we are dealing with objects, so essentially these methods work before projection,
at the 3D object level. Remember that once we perform projection, the objects are
transformed to a 2D description, so we no longer have these 3D characteristics.
673
(Refer Slide Time: 10:41)
So what are the advantages? There is one advantage: object space methods are device-independent and work for any resolution of the screen. But they also have some drawbacks. First, determination of the surfaces that are hidden or not hidden is computation intensive. Secondly, depending on the complexity of the scene and the resources available, these methods can even become infeasible: because they are computation intensive, if the resources are not sufficient we may not be able to implement them at all.
(Refer Slide Time: 11:43)
674
Usually, such methods are suitable for simple scenes with a small number of objects. So object space methods are best applicable when the scene is simple and has a small number of objects.
(Refer Slide Time: 12:00)
In case of the image space method, what happens? As the name suggests, the detection and rendering take place at the level of the image, that means after projection. Here, visibility is decided point-by-point at each pixel position on the projection plane. So we are no longer dealing in 3D space; we are dealing on the 2D projection plane at the level of pixels.
(Refer Slide Time: 12:41)
675
Again, there are two steps in the general approach. For each pixel on the screen, we first determine the object that is closest to the viewer and is pierced by the projector through the pixel, essentially the closest object that is projected to that point. The second step is to draw the pixel with that object's color.
So in the first stage, we determine the closest object that is projected on the pixel, and in the second stage we assign the pixel color as the object color, and this we do for each pixel on the screen. To compare with what we were doing earlier: earlier we were doing it for each surface, here we are doing it for each pixel.
(Refer Slide Time: 13:58)
Clearly, the methods work after surfaces are projected and rasterized that means mapped to
pixel grid, unlike the previous case where we were in the 3D domain.
676
(Refer Slide Time: 14:20)
Here the computations are usually less compared to object space methods. However, the
method depends on display resolution because we are doing the computations for each pixel.
So if there is a change in resolution then we require re-computation of pixel colors. So that is
the overhead.
(Refer Slide Time: 14:52)
So broadly, there are these two methods, object space methods, and image space methods.
Later on, we will see examples of each of these methods which are very popular. But before
going into that let us try to talk about some properties that actually are utilized to come up
with efficient methods for hidden surface detection and removal.
677
(Refer Slide Time: 15:25)
So there are many such properties. Collectively, these properties are called coherence
properties, which are used to reduce the computations in hidden surface removal methods. As
we already talked about, these methods are computationally intensive. So if we use these coherence properties then some amount of computation can be reduced, as we shall see later.
Now these properties are essentially related to some similarities between images or parts of
the images and if we perform computation for one part then due to these properties, we can
apply the results on other parts and that is how we reduce computation.
So essentially, we exploit local similarities that means making use of results that we have
calculated for one part of a scene or an image for the other nearby parts. So we perform
computation for one part and use the result for other part without repeating the same
computation and in that way, we reduce some amount of computation.
678
(Refer Slide Time: 16:54)
Now, there are many such coherence properties; broadly, of six types, object coherence, face
coherence, edge coherence, scan line coherence, depth coherence, and frame coherence. Let
us quickly have a discussion on each of these for better understanding, although we will not go into detailed discussions of how they are related to the different methods.
(Refer Slide Time: 17:25)
First is object coherence. What does it tell us? Here, we check for visibility of an object with respect to another object by comparing their circumscribing solids, which in many cases are of simple forms such as spheres or cubes. Only if the solids overlap do we go for further processing. If there is no overlap, there are no hidden surfaces, so we do not need to do any further processing. This is a simple way of eliminating lots of computations due to the object coherence property.
(Refer Slide Time: 18:23)
Next comes face coherence. Here, surface properties computed for one part of a surface can be applied to adjacent parts of the same surface; that is the implication of the face coherence property. For example, if the surface is small, then we can assume that the surface is invisible to a viewer if one part of it is invisible. So we do not need to check invisibility for each and every part; we check it for one part and then simply say that the other parts will also be invisible if that part is invisible.
(Refer Slide Time: 19:10)
680
Then third is edge coherence. Here, this property indicates visibility of an edge changes only
when it crosses another edge. If one segment of a non-intersecting edge is visible, we can determine without further calculation that the entire edge is also visible.
us that there will be a change in visibility only if the edge intersects another edge. In other
words, if one segment of an edge is visible and the edge is not intersecting with any other
edge that means we can say that entire edge is also visible.
(Refer Slide Time: 20:11)
Then comes scan line coherence. What does it tell us? It implies that a line or surface segment visible in one scan line is also likely to be visible in the adjacent scan lines, so we do not need to perform the visibility computations for every scan line; we do it for one scan line and apply the result to adjacent scan lines.
681
(Refer Slide Time: 20:48)
Next is depth coherence, which tells us that the depths of adjacent parts of the same surface are similar; there is not much change in depth at adjacent parts of a surface. This information, in turn, helps us determine the visibility of adjacent parts of a surface without too much computation.
(Refer Slide Time: 21:26)
Then there is frame coherence, which tells us that pictures of the same scene at successive points in time are likely to be similar, despite small changes in objects and viewpoint, except near the edges of moving objects. That means visibility computations need not be performed for every scene rendered on the screen. So frame coherence is related to scene change: the earlier coherence properties were related to static images, whereas here we are talking of dynamic changes in images, and based on this coherence property we can conclude that visibility can be determined without computing it again and again for every scene.
So that is, in short, the six coherence properties. The first five properties are related to static images; the last property can be used for rendering animations, which anyway is not part of our lectures here. The hidden surface removal methods make use of these properties to reduce computations. Now let us go into the details of such methods.
(Refer Slide Time: 23:10)
So we start with a simple method that is called back face elimination method.
(Refer Slide Time: 23:19)
683
What is this method? This is actually the simplest way of removing a large number of hidden
surfaces for a scene consisting of polyhedrons. So here, we are assuming each object is a
polyhedron, and using back face elimination, we can remove a large number of hidden
surfaces. The objective is to detect and eliminate surfaces that are on the backside of objects with respect to the viewer.
When a surface is on the backside of an object with respect to a particular viewer, clearly that back surface should not be shown during rendering. With the back face elimination method, we can detect those back surfaces and then remove them from further consideration during rendering.
(Refer Slide Time: 24:18)
The steps are very simple for this particular method; there are three steps. In the first step, we determine a normal vector N for each surface, represented in terms of its scalar components a, b, c along the three axes. I am assuming here that you all know how to calculate the normal vector for a given surface; if not, you may refer to any basic book on vector algebra. It is a very simple process and I will not discuss the details here.
Once the normal is determined, we check its Z component. If this Z component, the scalar component c, is less than or equal to 0, then we eliminate that particular surface, because when the Z component is less than 0 the surface is a back face, whereas when it is equal to 0 the viewing vector grazes the surface; in that case also we consider it to be a back face.
If c is greater than 0, then we retain the surface; it is not a back face. We perform steps one and two for all the surfaces in a loop. So as you can see, it is a very simple method: we take one surface at a time, compute its surface normal and check the Z component, the scalar component c. If it is less than or equal to 0 then we eliminate the surface, otherwise we retain it, and we do this for all the surfaces.
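A sketch of this test in Python, assuming each surface is a triangle whose vertices are listed so that the usual cross product gives the outward normal; only the z component c of the normal is needed for the test, and it depends only on the x and y coordinates of the vertices.

def is_back_face(v0, v1, v2):
    # z component of the normal, i.e. of the cross product (v1 - v0) x (v2 - v0)
    c = (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v1[1] - v0[1]) * (v2[0] - v0[0])
    return c <= 0      # c <= 0: back face (or grazing), eliminate; c > 0: retain

# usage sketch: visible = [tri for tri in triangles if not is_back_face(*tri)]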
(Refer Slide Time: 26:24)
Let us consider one example. Suppose this is our object; it contains four surfaces, ACB, ADB, DCB, and ADC. For each of these surfaces, we perform the back face elimination method: we calculate the Z component of the surface normal as mentioned in the previous steps.
685
(Refer Slide Time: 27:05)
Let us do it for the surfaces. For ACB, the z component of the normal is -12, which is less than or equal to 0. So ACB is not visible; as you can see, from this side ACB is on the backside of the object.
(Refer Slide Time: 27:28)
For ADB, DCB, and ADC the z components of normal are -4, 4, and 2, respectively. So we
can see that for DCB and ADC the z component is greater than 0, so these are visible
surfaces. But for ADB, it is less than 0 again so it is not a visible surface. So that is the simple
way of doing it.
686
(Refer Slide Time: 28:05)
And you should note here that we are dealing with 3D description of the objects in the view
coordinate system, so it works on surfaces, therefore it is an object space method. In practice,
using this very simple method, we can eliminate about half of all the surfaces in a scene
without any further complicated calculations.
(Refer Slide Time: 28:45)
However, there is a problem with this method: it does not consider the obscuring of a surface by other objects in the scene. What we did was essentially eliminate the back faces of an object; a back face is obscured by surfaces of the same object. If a surface is not a back face but is obscured by a surface of some other object, then such surfaces cannot be detected using the back face elimination method and we require some other algorithm.
Those other algorithms are useful for detecting a surface that is obscured by other object surfaces, and we can use them in conjunction with this method.
(Refer Slide Time: 29:48)
Let us discuss one of those methods, the depth buffer algorithm, also known as the Z-buffer algorithm.
(Refer Slide Time: 29:58)
688
Now, this algorithm is an image space method. That means here we perform comparisons at
the pixel level. So we assume here that already the surfaces are projected on the pixel grid
and then we are comparing the distance of the surface from the viewer position.
(Refer Slide Time: 30:24)
Earlier, we mentioned that after projection the depth information is lost. However, we require
the depth information here to compare the distance of the surface from a viewer. So we store
that depth information even after projection and we assume an extra storage, which has this
depth information; this is called the depth buffer or Z-buffer. The size of this buffer is the same as that of the frame buffer, that is, there is one storage location for each pixel in the buffer.
(Refer Slide Time: 31:12)
689
Another assumption we make is that we are dealing with canonical volumes, so the depth of any point cannot exceed the normalized range. We already have a normalized range for the volume, and we assume the depth cannot exceed that range. If we assume that, then we can fix the depth buffer size, or the number of bits per pixel. Otherwise, if we allow unrestricted depth, we do not know how many bits to keep and that may create implementation issues. So we go for this standardized consideration.
(Refer Slide Time: 32:02)
Now, the depth buffer algorithm is shown here. The input is the depth buffer, initialized to 1; the frame buffer, initialized to the background color; the list of surfaces; and the list of projected points for each surface. The output is the depth buffer and frame buffer with appropriate values; that means the depth buffer values will keep on changing, and the frame buffer will contain the final values at the end of the algorithm.
So what do we do? For each surface in the surface list, we perform some steps. For each surface, we have the projected pixel positions. For each projected pixel position (i, j) of the surface, starting from the top-left-most projected pixel position, we calculate the depth d of the projected point on the surface and then compare it with the already stored depth at that point.
If d is less than what is already stored in the depth buffer at the corresponding location, then we update the depth buffer entry and update the frame buffer entry with the color of the particular surface. This we continue for all the pixels of that projected surface, and we do it for all the surfaces. Now the crucial step here is the calculation of the depth; how do we do that?
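A sketch of this loop in Python. The data layout, in which each surface carries its projected pixel positions together with the depth at each of them and a single color, is an assumption made to keep the sketch self-contained; computing those depths is exactly the iterative procedure discussed next.

def z_buffer(surfaces, width, height, background):
    depth = [[1.0] * width for _ in range(height)]          # depth buffer, initialised to 1
    frame = [[background] * width for _ in range(height)]   # frame buffer, background colour
    for s in surfaces:
        # s["pixels"] is assumed to be a list of (i, j, d): column i, row j, depth d
        for i, j, d in s["pixels"]:
            if d < depth[j][i]:                             # closer than what is stored so far
                depth[j][i] = d
                frame[j][i] = s["color"]
    return frame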
(Refer Slide Time: 34:06)
We can do that iteratively. Let us see how.
(Refer Slide Time: 34:14)
Consider this scenario here; this is an illustrative way to understand the iterative method. We are considering this triangular surface, which after projection looks something like this. The surface equation we know, which we can represent as ax + by + cz + d = 0. Again, if you are not familiar with this, you may refer to basic textbooks on vectors and planes.
Given this surface equation, we can find the z value in terms of a, b, c as shown here, and the z value is the depth of that particular point, so this gives the depth of any point on the surface. Now, we are assuming the canonical view volume, which means all projections are parallel projections. So the projection is a simple one: if a point is (x, y, z), then after projection it becomes (x, y); we drop the z component. That is our assumption.
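To state the depth relation explicitly (it is shown only on the slide, but it follows directly from the plane equation written above): since ax + by + cz + d = 0, the depth of the surface point that projects to (x, y) is z = -(ax + by + d)/c, assuming c is not zero.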
(Refer Slide Time: 35:53)
Now let us consider one projected pixel (i, j) of the particular surface, so x is i and y is j. The depth z of the original surface point is then given by this expression, where we replace x and y with i and j; a, b, c, and d are constants.
(Refer Slide Time: 36:23)
692
Now, as we progress along the same scan line, say from this point to this point, the next pixel is at (i+1, j). The depth of the corresponding surface point at the next pixel location is given by this expression, where we replace i with i+1. After expanding and rearranging, we get this form.
This part we already know to be the depth of the point (i, j). So we can simply say the new depth is z’ = z - a/c, and as you can see, a/c is a constant term. So for successive points along a scan line, we can compute the depth by simply subtracting a constant term from the previous depth; that is the iterative method.
(Refer Slide Time: 37:39)
So that is along a scan line; what happens across scan lines? A similar iterative method can be formulated. Let us take a point (x, y) on an edge of a projected surface, say here. In the next scan line, x becomes (x - 1/m) and the y value becomes (y - 1), where m is the slope of the edge line.
693
(Refer Slide Time: 38:22)
Then we can compute the new depth at this new point as shown here, and if we expand and rearrange, what do we get? The new depth in terms of the previous depth and a constant term: z’ = z + (a/m + b)/c. So again we see that across scan lines we can compute the depth at the edges by adding a constant term to the previous depth, and then along a scan line we can continue by subtracting a constant term from the previous depth.
So this is the iterative way of computing depth and this method we follow in the Z-buffer
algorithm to compute depth at a given point. Let us try to understand this algorithm with an
illustrative example.
(Refer Slide Time: 39:40)
694
Let us assume there are two triangular surfaces s1 and s2; clearly, they are in the view volume. The vertices of s1 are given as these three vertices, and those of s2 as these three vertices. As before, we are assuming parallel projection due to the canonical view volume transformation, and we can derive the projected vertices of s1 and s2 on the view plane, denoted by the vertices shown here, which we obtain by simply dropping the z component. This is the situation shown in this figure.
(Refer Slide Time: 40:48)
Now, we are given a point (3, 1) and we want to determine the color at this point. Note that
this point is part of both the surfaces, so which surface color it should get, we can determine
using the algorithm. Now let us assume that cl1 and cl2 are the colors of s1 and s2 and bg is the
background color.
Initially, the depth buffer values are set to a very high value and the frame buffer values are set to the background color. Then we follow the algorithm steps and process the surfaces one at a time, in the order s1 followed by s2.
695
(Refer Slide Time: 41:40)
Now let us start with s1. What happens? From the given vertices, we can determine the s1 surface equation to be x + y + z - 6 = 0. Then we determine the depth of the left-most projected surface pixel on the topmost scan line, that is pixel (0, 6) here, and the depth z comes out to be 0.
(Refer Slide Time: 42:14)
This is the only point on the topmost scan line, as you can see in the figure. Then we move to the next scan line below, that is y = 5. Using the iterative method, and using this expression, we determine the depth of the left-most projected pixel on this scan line to be 1, because here m is very high, infinity. The surface equation is as shown.
(Refer Slide Time: 43:00)
696
Then the algorithm proceeds to the computation of depth and color determination along y= 5
till the right edge. At that point, it goes to the next scan line down that is y=4 here. Now, we
can skip all these steps and we can go directly to y=1, this line on which the point of interest
lies.
(Refer Slide Time: 43:34)
Now, following the iterative procedure that we outlined earlier for moving across scan lines, we first compute the depth of the left-most point here as z = 5. We skip those steps; you can do the calculations on your own and verify. Then we move along this scan line in this direction: we go to the next point here, then here, and so on up to the point (3, 1), and calculate that at this point z is 2.
This depth value is less than the already stored value, which is a very high value. So we set this value at the corresponding depth buffer location and then reset the frame buffer value from the background color to the color of surface s1.
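As a quick numerical check of these iterative steps (this snippet is only a verification aid, not part of the lecture): the plane of s1 is x + y + z - 6 = 0, so a = b = c = 1, and the left edge of the projected triangle is vertical, so the a/m term vanishes.

a, b, c = 1, 1, 1
z = 0.0                      # depth at the top-most projected pixel (0, 6)
for _ in range(5):           # move down the left edge from y = 6 to y = 1
    z = z + (0 + b) / c      # vertical edge: 1/m is 0
print(z)                     # 5.0, depth at the left-most pixel of scan line y = 1
for _ in range(3):           # move along y = 1 from x = 0 to x = 3
    z = z - a / c
print(z)                     # 2.0, depth at pixel (3, 1), matching the value above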
(Refer Slide Time: 44:51)
Then our processing continues for all points, but those are not of much relevance here because we are concerned only with this point, so we will skip that processing. Once the processing completes for all the projected points of s1, we go to s2 and perform similar iterative steps. We then find the depth at that particular point for s2, perform the comparison and assign the color accordingly.
We skip all the other calculations here; it is left as an exercise for you to complete them. That is the idea behind the depth buffer algorithm, or the Z-buffer algorithm.
698
(Refer Slide Time: 45:54)
Now there is one point to note. With this particular algorithm, a pixel can have only one surface color: given multiple surfaces, a pixel can take the color of only one of them at a time. That means from any given viewing position only one surface is visible at that pixel. This situation is acceptable if we are dealing with opaque surfaces.
(Refer Slide Time: 46:28)
If the surfaces are not opaque, that is, if they are transparent, then we definitely get to see multiple surfaces, which is not possible with this particular depth buffer algorithm. In case of transparent surfaces, the pixel color is a combination of the surface color plus contributions from the surfaces behind, and our depth buffer will not work in that case because we have only one location to store the depth value for each pixel. So we cannot store all the surface contributions to the color value.
(Refer Slide Time: 47:08)
There is another method, called the A-buffer method, which can be used to overcome this particular limitation. We will not go into the details of this method; you may refer to the reading material. That is, in short, what we can do with the depth buffer method.
So to recap, today we learned about the basic idea of hidden surface removal. We learned about different properties that can be utilized to reduce computations. Then we learned about the two broad classes of hidden surface removal algorithms: the object space methods and the image space methods. We learned about one object space method, the back face elimination method, and one image space method, the depth buffer algorithm or Z-buffer algorithm.
700
(Refer Slide Time: 48:14)
Whatever we have discussed today can be found in this book, you may refer to Chapter 8,
sections 8.1 to 8.4. And if you want to learn more about A-buffer algorithm then you may
also check section 8.5. That is all for today. Thank you and goodbye.
701
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture - 24
Hidden Surface Removal - 2
Hello and welcome to lecture number 24 in the course Computer Graphics. We are in the
process of learning about the 3D graphics pipeline, which has five stages.
(Refer Slide Time: 00:46)
What are those stages, let us recap. Object representation, modeling transformation, lighting,
viewing pipeline, and scan conversion. So we are currently in this fourth stage of discussion
that is the viewing pipeline.
702
(Refer Slide Time: 01:12)
There we covered three transformations that are sub stages of the fourth stage namely, view
transformation, projection transformation, viewport transformation.
(Refer Slide Time: 01:28)
Then there are two more operations. These are also sub stages of this fourth stage, clipping
and hidden surface removal.
703
(Refer Slide Time: 01:40)
Among them, we have already discussed clipping and we started our discussion on HSR or
hidden surface removal. So we will continue our discussion on HSR and conclude that
discussion today.
(Refer Slide Time: 01:54)
So in the last lecture, we talked about two hidden surface removal methods, namely back face elimination and the depth buffer algorithm. Today, we are going to talk about a few more hidden surface removal methods. We will start our discussion with another method, called the depth sorting algorithm.
704
(Refer Slide Time: 02:21)
Now, this depth sorting algorithm is also known by another popular name, the painter’s algorithm, and it works in both image and object space; that is, it works at the pixel level as well as at the surface level. And why is it called the painter’s algorithm? Because it tries to simulate the way a painter draws a scene.
(Refer Slide Time: 02:56)
Now this algorithm consists of two basic steps.
705
(Refer Slide Time: 03:06)
What is the first step? In the first step, it sorts the surfaces based on their depth with respect to the viewing position. To do that, we need to determine the maximum and minimum depth of each surface, and then we create a sorted surface list based on the maximum depth.
So we can denote this list in terms of the surfaces si, arranged in ascending order of depth; in this notation, the depth of si is less than the depth of si+1. That is the first stage of the algorithm.
(Refer Slide Time: 04:21)
706
In the next stage, we render the surfaces on the screen one at a time, starting with the surface having the maximum depth, that is, the nth surface in the list, down to the surface with the least or lowest depth.
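A minimal sketch of these two basic steps in Python is shown below; it ignores for the moment the overlap checks discussed next, and the surface attribute z_max and the draw_surface callback are assumed, illustrative names rather than anything from the lecture.

```python
# Sketch of depth sorting (painter's algorithm), without the overlap checks.
def depth_sort_and_render(surfaces, draw_surface):
    # Step 1: sort the surfaces in ascending order of maximum depth,
    # so that depth(s_i) < depth(s_{i+1}).
    ordered = sorted(surfaces, key=lambda s: s.z_max)
    # Step 2: render one surface at a time, starting with the surface
    # having maximum depth (the last one in the list) down to the nearest.
    for s in reversed(ordered):
        draw_surface(s)
```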
(Refer Slide Time: 04:45)
Now, during rendering, that is, during the second stage, we perform some comparisons. So when we are rendering a surface si, we compare it with all other surfaces in the list S to check for depth overlap. There is no depth overlap when the minimum depth of one surface is greater than the maximum depth of the other, as illustrated here: for these two surfaces there is no depth overlap because the minimum is greater than the maximum. However, here the minimum is not greater than the maximum, so there is a depth overlap.
707
(Refer Slide Time: 05:52)
If there is no overlap then render the surface and remove it from the list S.
(Refer Slide Time: 06:03)
In case there is overlap, we perform more checks. The first check is whether the bounding rectangles of the two surfaces do not overlap. The second check is whether surface si is completely behind the overlapping surface relative to the viewing position. The third check is whether the overlapping surface is completely in front of si relative to the viewing position.
708
And finally, the fourth check is whether the boundary edge projections of the two surfaces onto the view plane do not overlap. So there is a series of checks that we perform in case there is a depth overlap.
(Refer Slide Time: 07:06)
Let us try to understand these checks. The first check is that the bounding rectangles do not overlap. Now, how do we check for it? The rectangles do not overlap if there is no overlap in the x and y coordinate extents of the two surfaces.
Consider the situation here, with one surface and another surface. Here, the Xmin of one surface is less than the Xmax of the other, so there is overlap in x. If there were no overlap, Xmin would be higher than the Xmax of the other surface. If both the x and y extents overlap, then the condition fails, that means the bounding rectangles overlap. So we check the x and y coordinate extents and then decide whether the bounding rectangles overlap or not.
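As a rough illustration, the extent test might look like the following sketch, where the x_min, x_max, y_min, y_max attributes are assumed names for the projected coordinate extents of a surface.

```python
# Sketch of the first check: do the bounding rectangles of two surfaces overlap?
def bounding_rects_overlap(s1, s2):
    x_overlap = s1.x_min <= s2.x_max and s2.x_min <= s1.x_max
    y_overlap = s1.y_min <= s2.y_max and s2.y_min <= s1.y_max
    # The rectangles overlap only if both the x and y extents overlap;
    # the "no overlap" condition of the first check is the negation of this.
    return x_overlap and y_overlap
```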
709
(Refer Slide Time: 08:22)
The next check is whether surface si is completely behind the overlapping surface relative to the viewing position. Now, how do we determine this? We determine the plane equation of the overlapping surface, where the normal points towards the viewer.
Next, we check all vertices of si against the plane equation of that overlapping surface. If, for all vertices of si, the plane equation of the overlapping surface returns a value less than 0, then si is behind the overlapping surface; otherwise, it is not behind and this condition fails. The situation is depicted in this diagram.
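A possible sketch of this test is given below, assuming the plane of the overlapping surface is available as coefficients (A, B, C, D) with the normal pointing towards the viewer, and that each surface stores its vertices as (x, y, z) tuples; these names are illustrative, not from the lecture.

```python
# Sketch of the second check: is 'surface' completely behind the given plane?
def completely_behind(surface, plane):
    A, B, C, D = plane
    # The surface is behind the plane if the plane equation is negative
    # at every one of its vertices.
    return all(A * x + B * y + C * z + D < 0 for (x, y, z) in surface.vertices)
```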
(Refer Slide Time: 09:22)
710
The third condition that we check is whether the overlapping surface is completely in front of the surface of interest si, again relative to the viewing position. This we can check with plane equations, similarly to what we have done earlier.
This time we use the plane equation of si rather than that of the overlapping surface, and we use the vertices of the overlapping surface in the plane equation. If, for all vertices, the equation returns a positive value, then the overlapping surface is completely in front; otherwise, the condition fails. The situation is shown in this figure.
(Refer Slide Time: 10:19)
And finally, we check that the boundary edge projections of the two surfaces onto the view plane do not overlap; this is the final check. In order to check for this, we need the set of projected pixels for each surface and then check whether there are any common pixels in the two sets. The idea is illustrated here: if there are common pixels in the two sets, then there is definitely an overlap; otherwise, there is no overlap.
711
(Refer Slide Time: 11:00)
Now, as you can see here, this algorithm incorporates elements of both object space and image space methods. The first and the last checks are performed at the pixel level, so that is the image space part, whereas the other two, the second and the third, are performed at the object level. So the element of the object space method is also present.
(Refer Slide Time: 11:39)
Now, when we perform the tests, we follow the ascending order maintained in S and also the order of the checks that we have mentioned. As soon as any one of the checks is true, we move on to the check for overlap with the next surface in the list.
712
So essentially, what are we doing? Initially, we check for depth (Z) overlap of one surface with all the other surfaces; if there is no overlap, we simply render the surface. Otherwise, we perform the checks in the order given, and as soon as any check is true we move on to the next surface rather than continuing with the remaining checks.
(Refer Slide Time: 12:43)
Now, if all the tests fail, what happens in that case? We swap the order of the two surfaces in the list. This is called reordering, and then we restart the whole process from the beginning. So if all checks fail, we need to reorder the surface list and start from the beginning again.
(Refer Slide Time: 13:12)
713
Now, sometimes there are issues. Sometimes we may get surfaces that intersect each other. For example, see these surfaces: this is one surface, this is another surface, and they intersect each other. So in this example, one part of surface 1 is at a depth larger than that of surface 2, whereas the other part is at a depth smaller than that of surface 2, as you can see in this figure.
(Refer Slide Time: 13:48)
Now, in such situations we may face problem, we may initially keep surface 1 and surface 2
in a particular way that is surface 1 after surface 2 in the sorted list.
(Refer Slide Time: 14:10)
714
However, if you run the algorithm, you will see that for these surfaces all conditions fail, so we have to reorder. But that will not serve our purpose: even if we reorder, the conditions will fail again and we will have to reorder again.
So initially we have S1 followed by S2; next we will have S2 followed by S1; then we will have to reorder again to S1 followed by S2, and this will go on. We may end up in an indefinite loop because the surfaces intersect and the relative ordering of the two is difficult to determine.
(Refer Slide Time: 15:05)
In order to avoid such situations, what we can do is use an extra flag, a Boolean flag, for each surface. If a surface is reordered, then the corresponding flag is set ON, which indicates that the surface has already been reordered once.
715
(Refer Slide Time: 15:29)
Now, if the surface needs to be reordered again the next time, we do the following. We divide the surface along the intersection line and then add the two new surfaces to the sorted list at the appropriate positions. So when a surface needs to be reordered again, we know that there is an intersection; we then divide the surface along the intersection line and add two new surfaces instead of one to the list, in sorted order.
Of course, these steps are easy to state but require a lot of computation; however, we will not go into the details, we just give the idea rather than the details of how to do it. So that is the basic idea of the painter's algorithm.
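The following rough sketch shows how the depth-overlap test, the four checks, and the reordering flag might fit together for one surface against the remaining surfaces; all helper names (depth_overlap, the entries of checks, split_surface, the reordered attribute) are hypothetical placeholders for the steps described above, not part of any standard library.

```python
# Rough sketch of the per-surface decision logic in the painter's algorithm.
def resolve_surface(s, others, depth_overlap, checks, split_surface):
    for other in others:
        if not depth_overlap(s, other):
            continue                        # no depth overlap: nothing to resolve
        if any(check(s, other) for check in checks):
            continue                        # one of the four checks succeeded
        if other.reordered:                 # already swapped once: surfaces intersect
            return ("split", split_surface(other))
        other.reordered = True
        return ("swap", other)              # reorder and restart from the beginning
    return ("render", s)                    # safe to render s now
```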
(Refer Slide Time: 16:15)
716
We will discuss one more algorithm Warnock’s algorithm.
(Refer Slide Time: 16:29)
This is actually part of a group of methods for hidden surface removal which are collectively known as area subdivision methods, and they all work on the same general idea. And what is that idea?
(Refer Slide Time: 16:56)
So we first consider an area of the projected image.
717
(Refer Slide Time: 17:05)
Then if we can determine which polygonal surfaces are visible in the area then we assign
those surface colors to the area. Of course, if we can determine then our problem is solved.
So that determination is the key issue here.
(Refer Slide Time: 17:28)
Now, if we cannot determine that, we recursively subdivide the area into smaller regions and apply the same decision logic on the sub regions. So it is a recursive process.
718
(Refer Slide Time: 17:44)
Warnock's algorithm is one of the earliest subdivision methods developed.
(Refer Slide Time: 18:02)
In this algorithm, we subdivide a screen area into four equal squares. As you can see this is
the region which we divide into four equal squares P1, P2, P3 and P4 then we perform
recursion.
719
(Refer Slide Time: 18:28)
We check for visibility in each square to determine pixel colors in the square region. So we
process each square at a time.
(Refer Slide Time: 18:40)
And in this processing, there are three cases to check. Case 1 is that the current square region being checked does not contain any surface. In that case, we do not subdivide the region any further, because it does not contain any surface there is no point in further checking, and we simply assign the background color to the pixels contained in this sub region.
720
(Refer Slide Time: 19:13)
In case 2, the nearest surface completely overlaps the region under consideration; that means the region is completely covered by the surface that is closest to the viewer. In this case also, we do not subdivide the square further; instead, we simply assign the surface color to the region, because it is completely covered by the surface. Note that here we need to determine the nearest surface and then determine the extent of this surface after projection, so that we can check whether it completely covers the sub region.
(Refer Slide Time: 20:05)
And there is case 3, where neither case 1 nor case 2 holds. In this case, we perform recursion: we recursively divide the region into four sub regions and then repeat the checks. Recursion
721
stops when either of the cases is met or the region size becomes equal to the pixel size. For example, here, as you can see, we subdivided into four more sub regions P31, P32, P33, and P34. Then we performed another recursion, again dividing a sub region into four sub regions.
And we continue till either condition 1 or 2 is met, or the sub region size becomes equal to the pixel size, that is, the smallest size possible. So this is the idea of the algorithm: we assume that we have the projected image, and then we divide it into four sub regions at a time and perform the recursive steps.
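A simplified, runnable sketch of this recursive subdivision is given below; for illustration only, each surface is reduced to a dict holding a projected bounding box, a depth, and a color, whereas a real implementation would test actual polygon coverage. The screen region is assumed to be a power-of-two square.

```python
# Simplified sketch of Warnock-style area subdivision.
BACKGROUND = (0, 0, 0)

def overlaps(s, x, y, size):
    return not (s["x_max"] < x or s["x_min"] > x + size - 1 or
                s["y_max"] < y or s["y_min"] > y + size - 1)

def covers(s, x, y, size):
    return (s["x_min"] <= x and s["x_max"] >= x + size - 1 and
            s["y_min"] <= y and s["y_max"] >= y + size - 1)

def warnock(x, y, size, surfaces, out):
    visible = [s for s in surfaces if overlaps(s, x, y, size)]
    if not visible:                               # case 1: no surface in region
        out[(x, y, size)] = BACKGROUND
        return
    nearest = min(visible, key=lambda s: s["depth"])
    if covers(nearest, x, y, size) or size == 1:  # case 2 (or pixel-sized region)
        out[(x, y, size)] = nearest["color"]
        return
    half = size // 2                              # case 3: recurse on four squares
    for qx, qy in [(x, y), (x + half, y), (x, y + half), (x + half, y + half)]:
        warnock(qx, qy, half, surfaces, out)
```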
(Refer Slide Time: 21:27)
So with that, we have come to the conclusion of our discussion on hidden surface removal.
(Refer Slide Time: 21:37)
722
Now, before we conclude, a few things should be noted here. Hidden surface removal is an important operation in the fourth stage, but it involves lots of complex operations, and we exploit the coherence properties to reduce these complexities. These are the things that we should remember.
(Refer Slide Time: 22:05)
Also, we should remember that there are many methods for hidden surface removal and
broadly, they are of two types, object space method and image space method.
(Refer Slide Time: 22:16)
Among these methods, we covered four: back face elimination, which is an object space method; the Z-buffer algorithm, an image space method; the painter's algorithm, a
723
mix of image space and object space methods; and Warnock's algorithm, which is an image space method. There are other approaches, of course.
(Refer Slide Time: 22:42)
One popular approach, which is an object space method, is the octree method, which we will not discuss in detail; you may refer to the learning material. So we have covered the fourth stage and all its sub stages, namely the three transformations, that is, view transformation, projection transformation, and viewport transformation, and also the two operations, clipping and hidden surface removal.
(Refer Slide Time: 23:19)
724
Whatever we have discussed so far can be found in this book. You may refer to chapter 8,
sections 8.6 and 8.7. Also, if you are interested to learn more about another object space
method that is the Octree method you may check section 8.8 as well.
So that is all for today. In the next lecture, we will start our discussion on the next stage of the
pipeline that is scan conversion. Till then, thank you and good bye.
725
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture - 25
Scan Conversion of Basic Shapes - 1
Hello and welcome to lecture number 25 in the course Computer Graphics. We are currently
discussing the 3D graphics pipeline which consists of five stages. Let us quickly recap the stages.
(Refer Slide Time: 00:52)
As you can see in this figure, first stage is object representation, then we have modeling
transformation, then lighting or the coloring of objects, then viewing pipeline, and the fifth stage
is scan conversion. Among them, we have already discussed the first four stages namely, object
representation, modeling transformation, lighting, and the viewing pipeline.
726
(Refer Slide Time: 01:23)
Now we are going to discuss the fifth stage, that is, rendering, also known as scan conversion. So what is this stage all about?
(Refer Slide Time: 01:40)
Let us have a look at the very basic problem that we try to address in this fifth stage.
727
(Refer Slide Time: 01:48)
So far, whatever we have learned that gives us some idea of one particular thing. Through these
four stages that we have discussed so far, we can transform a 3D scene to a 2D viewport
description, which is in the device coordinate system. Just quickly have a relook at how it is done
as we have learned in our previous discussions.
(Refer Slide Time: 02:19)
So first, we have a 3D scene say, for example, this cube. Now, this cube is defined in its own
coordinate system or local coordinate, which we do in the first stage that is object representation.
Then what we do, we transfer it to a world coordinate system through modeling transformation.
728
So this is stage one: the object is defined in its local coordinate system. In the second stage, through modeling transformation, we transfer it to the world coordinate description, so the object is now in world coordinates.
Then we assign colors by using the lighting and shading models; so we color the object in stage three. After that, we perform the viewing pipeline, which is stage four, in which we transfer it to a viewport description. This involves first transferring the world coordinates to a view coordinate system; then, from the view coordinates, we perform projection transformation and transfer it to the view plane; then, from the view plane, we perform a window-to-viewport mapping to transfer it to the viewport, which is in the device coordinate system.
So these three transformations take place in stage four along with, of course, clipping and hidden
surface removal. And after that what we get is a 2D representation of the object or scene on a
viewport, which is in the device coordinate system. This is how things get transformed from
object definition to viewport description.
(Refer Slide Time: 05:10)
However, the device coordinate system that we are talking about is a continuous system that
means the coordinate values of any point can be any real number. So we can have coordinate like
2, 3, which are integers, whereas, we can also have a coordinate like 2.5, 3.1, which are real
numbers. So all sorts of coordinates are allowed in device coordinate system.
729
(Refer Slide Time: 05:47)
In contrast, when we are actually trying to display it on a screen, we have a pixel grid that means
it is a discrete coordinate system. So all possible coordinates are not allowed, instead, we must
have something where only integer coordinates are defined. So whatever we want to display on
the pixel grid must be displayed in terms of integer coordinates, we cannot have real coordinate
values.
(Refer Slide Time: 06:35)
Thus what we need? We need to map from the viewport description, which is a continuous
coordinate space to a pixel grid, which is a discrete coordinate space. So this is the final mapping
730
that we need to do before a scene is rendered on a physical display screen. Now, these mapping
algorithms or the techniques that we use for mapping are collectively known as rendering or
more popularly, they are called scan conversion or sometime rasterization as we are mostly
dealing with raster scan devices. So all these three terms are used, rendering, scan conversion, or
rasterization.
(Refer Slide Time: 07:34)
So what can be the straightforward approach to do this? You may think it is pretty simple. What
we can do is simply round off the real coordinates to the nearest integer coordinates. For
example, if we have a coordinate value like (2.3, 2.6), we can round it up to (2, 3) that is the
nearest integer coordinate values.
However, this may be good for converting points or mapping points from continuous coordinate
space to discrete coordinate space, however, same scheme may not be good for lines, circles, or
other primitive shapes that are required for rendering a scene. Now, let us try to understand how
then we can take care of scan conversion of lines, circles, or other primitive shapes.
731
(Refer Slide Time: 08:53)
Let us start with line scan conversion, how we can basically map a line defined in a continuous
coordinate space to a line defined in a discrete coordinate space.
(Refer Slide Time: 09:13)
We will start with a very simple and intuitive approach and then, we will try to understand the
problem with the intuitive approach, and then, we will introduce better and better approaches.
Now, we all know that we can define a line segment in terms of its endpoints. So to scan convert it, what do we need? We need to first map the endpoints of the line to the appropriate pixels, and also the other points that lie on the line to the appropriate pixels.
732
(Refer Slide Time: 10:06)
Now, let us go through a very simple approach, how we can map the points that are on the line to
the nearest pixels in the pixel grid.
(Refer Slide Time: 10:24)
So we can follow a four-step approach. In the first step, we map the end points to pixels simply
by rounding off to the nearest integer. In that way, we get the starting and ending pixels for the
line segment. So we now know which pixels are defining the line. Then in the second step, we
take one endpoint having the lower x and y values as the starting point. In the third step, we work
out the y value for successive x values.
733
Now, since we are dealing with a pixel grid, we know that pixels are separated by unit distances. So the successive x values will differ by 1. In the fourth step, these computed y values are mapped to the nearest integers, giving us the pixel coordinates of those points.
So we first convert the end points to the nearest pixels, then we choose the end point having
lower x and y values as the starting point, and starting with that point, we compute y value taking
as input the x value, where the successive x values differ by 1 and we continue this till the other
endpoint. So we compute all the y values between the starting and ending pixels. And these
computed y values are then mapped to the nearest integer values, giving us the pixel coordinates
for those points. Let us try to understand this in terms of one example.
(Refer Slide Time: 13:07)
Suppose this is our line segment, as shown in this figure. So this is one endpoint, this one is
another endpoint. Initially, the line was defined by these two endpoints, A and B. As you can see,
both are real numbers. So we have to map it to the nearest pixel.
734
(Refer Slide Time: 13:35)
For A, if we do the rounding off, we will get this pixel as the nearest pixel, and for B will get this
one as the nearest pixel if we perform the rounding off. Now, we can see that coordinates of A’ is
less than B’. So we start with A’, we choose it as our starting pixel. So we start with this pixel
and continue finding out the y values till we reach this other pixel, other endpoint.
(Refer Slide Time: 14:21)
Now, our objective is to compute the y values for successive x values. For that, we require the
line equation which involves computation of the slope m, which in our case turns out to be this.
735
Because we know the two endpoints we can compute the slope and also, the y-intercept value.
We are assuming here that the line is expressed in terms of this equation, y = mx + b, where m is
the slope, and b is the y-intercept. Given the two endpoints, we can solve for m and b and find
that m is 3 by 5 and b is 4 by 5.
(Refer Slide Time: 15:16)
Then what we do? For each x separated by unit distance, starting from the lower end pixel that is
x value is 2, we compute the y values using the line equation till we reach the other endpoint.
And the line equation is given here. So in this equation, we use the x values to get the y values.
(Refer Slide Time: 15:58)
736
If we do so, what we will find? So for x =2, we have y=2; when x is 3, so the next pixel, x
coordinate, we get y is 2.6; when x is 4, then we get y to be 3.2; x=5, y=3.8; x=6, y=4.4. So we
get the four intermediate pixels and corresponding y values computed using the line equations.
(Refer Slide Time: 16:45)
So the four points are (3, 2.6), (4, 3.2), (5, 3.8), and (6, 4.4). These are the four values that we
compute using the line equation. Now, we map this y values as shown here to the nearest integer
to get the pixel locations. So if we do the rounding off, we will get the four pixels to be (3, 3)
which is here; then (4, 3), which is here; (5, 4) here, and (6, 4) here. So these are our four
intermediate pixels corresponding to the points on the line.
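The simple approach illustrated above can be written directly as the following sketch, assuming integer endpoint pixels and a slope of magnitude at most 1, as in this example.

```python
# Straightforward line scan conversion using the line equation y = mx + b.
def simple_line(x0, y0, x1, y1):
    m = (y1 - y0) / (x1 - x0)          # slope: a floating-point value
    b = y0 - m * x0                    # y-intercept
    pixels = []
    for x in range(x0, x1 + 1):        # successive x values differ by 1
        y = m * x + b                  # floating-point multiplication
        pixels.append((x, round(y)))   # rounding off: another float operation
    return pixels

# simple_line(2, 2, 7, 5) returns
# [(2, 2), (3, 3), (4, 3), (5, 4), (6, 4), (7, 5)], matching the example.
```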
737
(Refer Slide Time: 17:45)
Now, with this approach, as you can see, the way we have computed the values and ultimately
found out the pixels, there are two problems broadly. First problem is we need to perform
multiplication of m and x. Now, m is likely to be a real value, so that is a floating-point
operation. Secondly, we need to round off y coordinate values that is also floating-point
operation. Together these floating-point operations are computation intensive.
(Refer Slide Time: 18:40)
738
So we have a computationally expensive approach to convert a line to corresponding pixels. In
reality, we need to scan convert very large number of lines within a very small time. Now, if we
have floating-point operations involved, then this process will become slow and we will perceive
flickers, which is of course something we do not want. So what do we need? We need some better solutions. Let us have a look at a slightly better approach.
(Refer Slide Time: 19:29)
But before that, I would like to point your attention towards another important topic that is
consideration of slope of a line when we are performing this scan conversion.
(Refer Slide Time: 19:47)
739
In our example, what we did, we calculated the y coordinate values for each x coordinate value.
So we increased x by 1 and corresponding y values, we calculated using the line equation. You
may wonder why we did so, we could have similarly done it the other way around. We could
have increased y and calculated the x values. Let us try to see what happens if we do so if we
calculate x values by increasing y values.
(Refer Slide Time: 20:26)
Now, we have these two endpoint pixels and the lower pixel is the starting point denoted by A’.
740
(Refer Slide Time: 20:41)
This time we are increasing y by 1. That means we are moving this way. Earlier, we were
moving along this direction, now we are moving this way from one scan line to the next. And
then, we calculate the x based on the equation. Now, we need a modified equation which is given
here and we already know b and m, the y-intercept and slope respectively. So we simply replace
the y value to get the x value.
(Refer Slide Time: 21:24)
Now, if we do so, we will see that we have only two successive y values, this one and this one, between y=2 and y=5. So earlier we computed four intermediate y values; this time we are required to compute
741
only two x values, because there are only two increments of y between the endpoints. So when y=3, x turns out to be 3.7 using the equation, and when y=4, x is 5.3.
(Refer Slide Time: 22:19)
Then what we have computed between the two end points? Two new points, (3.7, 3) and (5.3, 4).
If we round it off to the nearest integers, then we get the pixels (4, 3) and (5, 4); let us place it
here. So (4, 3) is this point and (5, 4) is this point. Note that earlier, we got two additional points
when we moved along x-direction, now we are getting only two points, these two, when we are
moving along y-direction and computing x.
742
(Refer Slide Time: 23:11)
Clearly, the first set that is these 4 plus the 2 endpoints, total 6, pixels will give us a better
approximation to the line compared to the second set consisting of total 4 pixels, the two
endpoints, and the two newly computed pixels. So you have the first set, which is better than
second set because the approximation is better due to the larger number of pixels.
(Refer Slide Time: 23:55)
Now, that is the issue here. How do we decide which coordinate to calculate and when? Should
we start with x and calculate y, or should we increase y and calculate x? Now, this decision is
taken based on the slope of the line, depending on the slope we take a call.
743
(Refer Slide Time: 24:30)
If the slope is within the range −1 ≤ m ≤ 1, then we work out or calculate the y values based on the x coordinates of the pixels. So we increase x by 1 and compute the y values when m is within this range. If m is not within this range, then we compute x by increasing the y coordinates. So that is our rule: when m is within the range given here, we compute y based on x, where x takes the pixel coordinates, that is, integer values; otherwise, that is, when m is not within this range, we compute x based on y, where y takes the pixel coordinates as integers. That is how we make the decision.
(Refer Slide Time: 25:48)
744
Now, let us go back to a better line scan conversion algorithm compared to the simple approach
that we have learned earlier. So this approach is called DDA or digital differential analyzer.
(Refer Slide Time: 26:06)
DDA stands for digital differential analyzer, and this is an incremental approach which is
developed to reduce floating-point operations. That means to increase the computation speed,
speed up the scan conversion process.
(Refer Slide Time: 26:32)
745
Let us try to first understand the idea. We will use the same example that we have seen earlier
but this time, we will note a few more points. So earlier, we computed 4 points between the two
end points (2, 2) and (7, 5) by increasing x and computing y values. Now, these 4 points are (3,
2.6), (4, 3.2), (5, 3.8) and (6, 4.4). Now, we will have a closer look at these points, what they tell
us.
(Refer Slide Time: 27:17)
We computed that slope is 3/5 or 0.6. Now, the successive y values are actually addition of this
slope value to the current value. The first value that we got is 2.6. Second value that we got is
3.2, which we can get by adding 0.6 that is the slope value to the earlier value that is 2.6.
Next value we got is 3.8, which is again the earlier value plus the slope. Finally, we got 4.4,
which is again the earlier value plus slope. So there is a pattern. We add the slope value to the
earlier value to get the new value. This idea is exploited in this DDA algorithm.
746
(Refer Slide Time: 28:31)
So instead of computing y with the line equation every time, we can simply add m to the current
y value. That means the new y we can get by adding m to the current value. So we do not need to
go for solving the line equation every time we want to compute the y value.
(Refer Slide Time: 28:59)
What is the advantage of that? It eliminates floating-point multiplication which is involved in
this computation that is m into x. So we can eliminate these calculations which in turn is going to
reduce the computational complexities.
747
(Refer Slide Time: 29:28)
Now, as I said earlier, slope is an important consideration here. So when the slope is not within
the range that means the slope is greater than 1 or less than minus 1, then we do not compute
successive y values, instead we compute x values. Again, in a similar way that is new value is
the old value plus a constant term, which in this case is 1/m, earlier it was only m. So we can
obtain the new x value by adding this constant 1/m to the current value. And here also, by this,
we are eliminating floating-point operations.
(Refer Slide Time: 30:19)
748
So the complete algorithm is shown here. The input is the endpoint, the two endpoints are the
input. And the output is the set of all pixels that are part of the line. So we compute m. Now,
when m is within this range, we compute successive y values as shown here, by adding m to the
current y value, round it off to get the pixel, and add the pixel to the set. And when m is not
within this range, we compute successive x values by adding 1/m to the current value and
perform the same steps again.
So we continue, in both cases, till the other endpoint, as you can see in the loop termination conditions. So that is how we improve on the simple line scan conversion approach, by exploiting one particular property: we can compute the successive x or y values by simply adding a constant term.
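A sketch of the DDA approach is shown below, assuming the endpoints are integer pixels and that, when stepping along x, the line is given from the lower-x endpoint; successive values are obtained by adding a constant (m or 1/m) instead of multiplying.

```python
# DDA line scan conversion: incremental additions instead of multiplications.
def dda_line(x0, y0, x1, y1):
    if x1 == x0:                          # vertical line: slope undefined
        return [(x0, y) for y in range(min(y0, y1), max(y0, y1) + 1)]
    m = (y1 - y0) / (x1 - x0)
    pixels = [(x0, y0)]
    if -1 <= m <= 1:                      # slope in range: step along x
        y = float(y0)
        for x in range(x0 + 1, x1 + 1):   # assumes x0 < x1
            y += m                        # add slope to current y (no multiply)
            pixels.append((x, round(y)))
    else:                                 # otherwise: step along y
        x = float(x0)
        step = 1 if y1 > y0 else -1
        for y in range(y0 + step, y1 + step, step):
            x += step / m                 # add 1/m (with direction) to current x
            pixels.append((round(x), y))
    return pixels

# dda_line(2, 2, 7, 5) again gives [(2, 2), (3, 3), (4, 3), (5, 4), (6, 4), (7, 5)].
```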
(Refer Slide Time: 31:58)
This is clearly some improvement over the simple approach. However, there are still issues.
749
(Refer Slide Time: 32:07)
With the DDA algorithm, as we have noted, we can reduce floating-point operations, but only some of them. We cannot remove all of them; we can only eliminate the multiplications.
(Refer Slide Time: 32:30)
That still leaves us with other floating-point operations, which are addition and rounding off.
Now, any floating-point operation is computationally expensive and it involves additional
resources. So when we, in reality, require to generate large number of line segments in a very
short span of time, our ideal objective should be to eliminate all floating-point operations
750
altogether, rather than eliminating few. Eliminating few, of course, improves the overall
rendering rate, but eliminating all should be our ultimate objective.
(Refer Slide Time: 33:22)
That is so since, for large line segments or a large number of line segments, these floating-point operations may create problems. Particularly when we are dealing with a very large line segment, the rounding off may result in pixels far away from the actual line. For example, consider a very long line like this: if we perform rounding off, then we may keep on getting pixels something like this, which actually looks like a distorted line. For small segments this may not be the case, but for large line segments there is a possibility of visible distortion, which of course we do not want.
751
(Refer Slide Time: 34:28)
That is one problem, of course; plus, our ultimate objective is to remove all floating-point operations because, along with this distortion, they also increase the time to render a line segment and also require resources. So we need a better solution than what is provided by DDA. One such approach we will discuss in the next lecture.
(Refer Slide Time: 35:06)
So whatever we have discussed today can be found in this book, Computer Graphics. You are
advised to go through Chapter 9 up to Section 9.1.1 to know in more details whatever we have
752
discussed. So the improved line drawing algorithm will be taken up in the next lecture. Till then,
thank you and goodbye.
753
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture - 26
Scan Conversion of Basic Shapes - 2
Hello and welcome to lecture number 26 in the course Computer Graphics, we will continue our
discussion on the graphics pipeline. For a quick recap, let us just go through the stages.
(Refer Slide Time: 00:47)
So, we have already discussed the first stage that is object representation, second stage modeling
transformation, third stage lighting, fourth stage viewing pipeline, and the only stage that is
remaining is the fifth stage scan conversion. We are currently discussing the fifth stage.
754
(Refer Slide Time: 01:14)
In the last lecture, we talked about rendering of lines, which is part of the fifth stage. And there
we talked about a very intuitive approach as well as a slightly improved approach that is the
DDA methods. Today, we will continue our discussion on line rendering, where we will talk
about even better approach. And also we will discuss rendering of circles.
Now, before we go into the discussion on a better line rendering approach, let us quickly recap
what we have seen in the previous lecture on line rendering.
(Refer Slide Time: 02:02)
755
So the idea was to map a description from viewport to a pixel grid. That is, of course, the
objective of the fifth stage.
(Refer Slide Time: 02:20)
In order to do that, simplest approach is just to round off the real coordinates to the nearest
integers, which are pixels, for example, from (2.3, 2.6) to (2, 3). Now, this is good for points but
for mapping lines or circles or other primitive shapes, this may not be good.
756
(Refer Slide Time: 02:49)
And for line what we did? So we first assume that a line segment is defined by the endpoints.
And our objective is to map all the points that are on the line to the appropriate pixels.
(Refer Slide Time: 03:08)
The straightforward approach that we discussed is that first we map the end points to pixels, then
we start with one endpoint which is having the lower x and y coordinate values, then we work
out y-coordinate values for successive x-coordinates, where the x-coordinates differ by 1 because
we are talking pixel grids. And then, this y values that we computed are mapped to the nearest
integer thereby getting the pixels.
757
(Refer Slide Time: 03:55)
Now, this approach has two problems. First, we require multiplication which is a floating-point
operation. And secondly, we require rounding off which is also a floating-point operation. Now,
these floating-point operations are computationally expensive and may result in slower rendering
of lines.
(Refer Slide Time: 04:20)
To improve, we discussed one incremental approach. There we did not go for multiplication,
instead we used addition. So to compute y, we simply added this m value to the current value, or
to compute x, new x, we simply added this 1 by m value to the current x value. Now when to
758
choose whether to compute x or y, that depends on the slope. So if the m value is within this
range, then we compute y given x, otherwise, we compute x given y using the line equation.
(Refer Slide Time: 05:18)
Now, the DDA can reduce some floating-point operations as we have discussed, particularly
multiplications. However, it still requires other floating-point operations, namely additions and
rounding off. So it is still not completely efficient so to speak, and we require a better approach.
One such approach is given by Bresenham’s algorithm.
(Refer Slide Time: 06:03)
759
Now, this is an efficient way to scan convert line segments and we will discuss the algorithm
assuming m to be within these ranges. That means we will concentrate on computing y value
given the x value.
(Refer Slide Time: 06:26)
Now, let us try to understand the situation. Suppose, this is the actual point on the line and we are
moving along the x-direction, the current position is given by this point (xk, yk). Now, the actual
point on the line is a floating-point number real number, so we need to map it to the nearest pixel
grid point.
Now, there are two potential candidates for that, one is this pixel or the upper candidate pixel
that is (xk+1, yk+1), and the other one is the lower candidate pixel that is (xk+1, yk), and we have
to choose 1 of those. How to choose that?
760
(Refer Slide Time: 07:24)
Our objective is to choose a pixel that is closer with respect to the other pixel to the original line.
So between these two pixels, we have to decide which one is closer to the original line and
choose that pixel.
(Refer Slide Time: 07:51)
Let us denote by dupper the distance of the upper candidate pixel (xk + 1, yk + 1) from the line, as shown here. Similarly, dlower indicates the distance of the lower candidate pixel from the line.
761
(Refer Slide Time: 08:28)
Now, at x = xk + 1, the y value on the line is given by the line equation, y = m(xk + 1) + b, where m is the slope. Then we can say that dupper is given by (yk + 1) − y, that is, the y coordinate of the upper candidate pixel minus the y value on the line.
Similarly, dlower can be given as y − yk, the y value on the line minus the y coordinate of the lower candidate pixel. Replacing y from the line equation gives the corresponding expressions. Now, let us do some mathematical manipulation on these expressions.
(Refer Slide Time: 09:49)
762
But before that, we should note that if the difference dlower − dupper is less than 0, then the lower pixel is closer to the line and we choose it; otherwise, we choose the upper pixel.
(Refer Slide Time: 10:41)
Now, let us substitute m with the ratio Δy/Δx, where Δy is the y coordinate difference between the endpoints and Δx is the x coordinate difference between the endpoints. We then multiply both sides by Δx, rearrange and expand, and collect all the constant terms into a single constant c = 2Δy + Δx(2b − 1).
What we get is Δx(dlower − dupper) = 2Δy xk − 2Δx yk + c. This is a simple manipulation of the terms.
763
(Refer Slide Time: 11:55)
Now, let us denote the left-hand side by pk and call it the decision parameter for the kth step. This parameter is used to decide the closeness of a pixel to the line. Since Δx is positive, its sign will be the same as the sign of the difference dlower − dupper.
(Refer Slide Time: 12:28)
Thus, if pk˂0, then this lower pixel is closer to the line and we choose it, otherwise, we choose
the upper pixel.
764
(Refer Slide Time: 12:47)
So that is at step k. Now, at step k+1, that is, the next step, we get pk+1, which is essentially given by the same expression with xk replaced by xk+1 and yk replaced by yk+1. Then we take the difference between the two, which gives us pk+1 − pk = 2Δy(xk+1 − xk) − 2Δx(yk+1 − yk).
(Refer Slide Time: 13:26)
Now, we know, because we are dealing with a pixel grid, that xk+1 is essentially xk + 1. So we can rearrange and rewrite the expression as pk+1 = pk + 2Δy − 2Δx(yk+1 − yk). That means the decision variable at the (k+1)th step is given by the decision variable at the kth step plus an additional term.
765
Now, if pk < 0, that is, the lower pixel is closer, then we set yk+1 = yk; otherwise, we set yk+1 = yk + 1. Thus, based on the sign of pk, the term (yk+1 − yk) becomes either 0 or 1. You can see the physical significance from this figure. If pk < 0, the lower pixel is closer in the current step, that means we have chosen this one, and in the next step we choose yk+1 = yk, the lower pixel.
If that is not the case, then we choose yk+1 = yk + 1, that is, the upper pixel. So depending on the sign of pk, the term yk+1 − yk turns out to be either 0 or 1: if pk < 0 it is 0; if pk ≥ 0 it is 1.
(Refer Slide Time: 15:50)
So where do we start? We have to decide on the first decision parameter, which we call p0 and which is given by p0 = 2Δy − Δx. We calculate this value and then we continue.
766
(Refer Slide Time: 16:13)
So the overall algorithm is given here. We first compute these differences between the endpoints
and the first decision parameter. Then we go inside a loop till we reach the end point. We start
with one end point and till we reach the other end point, we continue in the loop.
Now, if p < 0, we keep the y value unchanged and update p as p + 2Δy. If p ≥ 0, we increment y by 1 and update p as p + 2Δy − 2Δx. In either case, we add the corresponding (x, y) value to the set of pixels that is the output of the algorithm. So, depending on the decision value, we choose a pixel and add it to the set of pixels.
767
(Refer Slide Time: 17:31)
Now, here we assume that m is within this range. When m is outside this range, we have to
modify this algorithm but that is a minor modification. So you may try it yourself.
(Refer Slide Time: 17:50)
So what is the advantage of this algorithm? Here if you note, we are choosing the pixels at each
step depending on the sign of decision parameter, and the decision parameter is computed
entirely with integer operations so there is no floating-point operation.
768
Thus, we have eliminated all floating-point operations (additions, rounding off, as well as multiplications), which is a huge improvement because, in reality, we need to render a large number of lines in a very short span of time, so this saving is substantial. There are even better approaches, but we will not discuss those any further.
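A sketch of Bresenham's algorithm for the case discussed here (slope between 0 and 1, integer endpoints given left to right) is shown below; note that only integer additions and comparisons are involved.

```python
# Bresenham line scan conversion for 0 <= m <= 1 (integer arithmetic only).
def bresenham_line(x0, y0, x1, y1):
    dx, dy = x1 - x0, y1 - y0
    p = 2 * dy - dx                    # initial decision parameter p0
    x, y = x0, y0
    pixels = [(x, y)]
    while x < x1:
        x += 1
        if p < 0:                      # lower candidate pixel is closer
            p += 2 * dy
        else:                          # upper candidate pixel is closer
            y += 1
            p += 2 * dy - 2 * dx
        pixels.append((x, y))
    return pixels

# bresenham_line(2, 2, 7, 5) produces
# [(2, 2), (3, 3), (4, 3), (5, 4), (6, 4), (7, 5)],
# matching the worked example discussed next.
```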
(Refer Slide Time: 18:48)
Now, let us try to understand the algorithm in terms of one example.
(Refer Slide Time: 18:54)
769
We will continue with our example that we have introduced in the previous lecture. So this is the
line segment given, these are the endpoints already mapped and our job is to find out the
intermediate pixels that correspond to the points on the line.
(Refer Slide Time: 19:17)
So we will start with computing Δx, Δy, and initial p. Then we start with one endpoint and add it
to the list of pixels, the endpoint.
(Refer Slide Time: 19:38)
770
Now, we have computed p to be 1, which is ≥0. So here, the upper pixel is closer that means we
choose this one. We add this and update p with the expression to be -3, and (3, 3) is added to the
grid. Now, this is not the end point, we have not yet reached the end point so we will continue.
(Refer Slide Time: 20:13)
In the second execution, we check that p is -3, which ≤0. So in the second case, the lower pixel is
chosen and we update p again, to be 3, add this lower pixel to the output list and check whether
we have reached the end point. Since we have not yet reached we continue the loop. And in this
way, we continue to get other points.
771
(Refer Slide Time: 20:43)
So in the next stage, p=3 > 0. So we choose the upper pixel, add this one to the output pixel list,
continue the loop since we are yet to reach the end point.
(Refer Slide Time: 21:05)
772
Then we find p to be -1, less than 0. So we choose the lower pixel, add the pixel to the output
list, and now, we see that we have reached the other end point. So we stopped the loop and add
the other endpoint into the list. That is our last step.
(Refer Slide Time: 21:46)
So finally, what are the points that we get? These are the pixels that we get following the steps of
the Bresenham’s algorithm. Now, you can compare it with the previous methods that we used.
However, while comparing, you should keep in mind the number of floating-point operations
that we avoided because that is the advantage. So if you find that both the sets or all the sets that
we have found earlier are same that is not a problem because we saved in terms of computation.
773
So, with that, we end our discussion on line scan conversion. We learned three things: first, we started with a simple approach and found its problems; then we discussed one improved approach, the DDA approach; and finally we discussed an even better approach, Bresenham's line drawing algorithm, which eliminates all the floating-point operations. Now we will move on to scan conversion of another primitive shape, that is, the circle.
(Refer Slide Time: 23:19)
Initially, we will assume that the circle is centered at the origin with radius r, so its equation is given by x² + y² = r². We all know this equation. Now, in the simple approach, the most intuitive and straightforward approach, what do we do? We solve for y after every unit increment of x in the pixel grid by using the equation.
774
(Refer Slide Time: 23:54)
Clearly, here we have lots of floating-point computations, which involve square roots and multiplications because r need not be an integer. So this is inefficient. We may also need to round off the computed values, which adds further floating-point operations, and the pixels that we obtain may not generate a smooth circle, because there may be gaps between the actual points and the chosen pixels after rounding off.
(Refer Slide Time: 24:44)
So it suffers from many problems and we require a better solution.
775
(Refer Slide Time: 24:52)
Let us try to go through one such solution, which is called the Midpoint algorithm.
(Refer Slide Time: 25:05)
Now, this algorithm exploits an interesting property of circle that is called eight-way symmetry.
Now, what is this property?
776
(Refer Slide Time: 25:19)
If we look at this figure, we will see that this is the origin and the circle is centered at the origin. We can divide the circle into eight octants, as shown. If we determine one point on any octant, say this point, then we can determine seven other points on the circle belonging to the other seven octants without much computation.
So if this point is (x, y), then this point will be (y, x), this one will be (y, -x), this one will be (x, -y), this one will be (-x, -y), this one will be (-y, -x), this one will be (-y, x), and this one will be (-x, y). These we can determine straight away without any further computation.
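A small helper that lists these eight symmetric points for a computed point (x, y) on a circle centred at the origin might look like this:

```python
# Eight-way symmetry of a circle centred at the origin.
def eight_way_points(x, y):
    return [( x,  y), ( y,  x), ( y, -x), ( x, -y),
            (-x, -y), (-y, -x), (-y,  x), (-x,  y)]
```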
777
(Refer Slide Time: 26:28)
We can exploit this property in circle scan conversion by computing one point in an octant and then using that point to derive the other seven points on the circle. That means we determine one pixel and, from there, we determine the other seven pixels. So instead of determining the pixels through computation eight times, we do the computation once and save the other seven.
(Refer Slide Time: 27:07)
Now, let us see how to determine a pixel for a given quadrant. Suppose, we have determined a
pixel (xk, yk); this is one quadrant of the circle given. Now, next pixel should be either this one or
778
this one. Again, we can call them upper candidate pixel and lower candidate pixel. And in this
case, note that we are going down along this direction, down the scan lines, and our objective is
to choose a pixel that is closer to the circle. Now, how do we decide which of these two
candidate pixels is closer to the circle?
(Refer Slide Time: 28:11)
Again, we go for some mathematical trick.
(Refer Slide Time: 28:18)
779
Now, the circle equation we can reorganize in this way: f(x, y) = x² + y² − r². This is the circle equation restated in this form.
(Refer Slide Time: 28:33)
Now, this function can be evaluated as shown here: f(x, y) < 0 if the point (x, y) is inside the circle; f(x, y) = 0 if the point is on the circle; and f(x, y) > 0 if the point is outside the circle. This we know from geometry.
(Refer Slide Time: 29:05)
780
Then, we can evaluate the function at the midpoint of the two candidate pixels. That means at
(xk+1, yk – ½). Note that this is yk, so midpoint will be this point that is yk – ½ and it will be xk+1
that will be the new x coordinate. Now, this will be our decision variable pk after k steps. So let
us try to see what this variable looks like.
(Refer Slide Time: 29:48)
So essentially, we compute this function at the point (xk + 1, yk − ½), which is the midpoint between the two candidate pixels; since the two candidate pixels are separated by a unit distance, the midpoint lies at half of this unit distance below yk.
781
(Refer Slide Time: 30:16)
So pk will be the function value at this point. Now, if we expand the function at these coordinates, we get the expression for pk: pk = f(xk + 1, yk − ½) = (xk + 1)² + (yk − ½)² − r².
(Refer Slide Time: 30:42)
Now, if pk<0 that means, the function evaluates to be less than 0. Then we know from the
geometric properties that midpoint is inside the circle. That means the upper candidate pixel will
be closer to the circle boundary. So we choose (xk+1, yk). If that is not the case then we choose
the other candidate pixel that is (xk+1, yk-1).
782
Note that we are going down the scan lines, so next y coordinate will be yk-1. Because in that
case, midpoint is outside and this lower candidate pixel is closer to the boundary.
(Refer Slide Time: 31:33)
Now, let us see the expression for the decision variable at the next step, that is, pk+1. Here, our new point will be (xk+1 + 1, yk+1 − ½), that is, the x coordinate incremented by 1, so pk+1 = f(xk+1 + 1, yk+1 − ½), which after expansion will look like this.
(Refer Slide Time: 31:55)
783
Now, we may expand it further and then rearrange to get a simplified expression: pk+1 = pk + 2(xk + 1) + (yk+1² − yk²) − (yk+1 − yk) + 1, that is, the current decision value plus an additional term.
(Refer Slide Time: 32:14)
Now, yk+1 is yk if pk is less than 0, as we have already seen. So in that case, pk+1 = pk + 2xk+1 + 1, where xk+1 = xk + 1. Now, if pk is not less than 0, then yk+1 = yk − 1, as we have also seen. Then the pk+1 term becomes pk+1 = pk + 2xk+1 + 1 − 2yk+1.
(Refer Slide Time: 33:01)
784
As you can see these are all integer operations and we choose the pixels based on an incremental
approach that is computing the next decision parameter from the current value and that too by
avoiding floating-point operations.
(Refer Slide Time: 33:22)
However, that is not entirely the case, because here the initial decision parameter or decision variable involves a floating-point operation. We have to keep in mind that, unlike Bresenham's algorithm, although the expression for computing the next decision variable apparently does not involve any floating-point operation, we may start with a floating-point value and that will remain. So here we are not completely eliminating floating-point operations, but we are reducing them significantly.
785
(Refer Slide Time: 34:11)
So what is the complete algorithm? We first compute the first decision variable and choose the
first or the starting point. Now, one point we have chosen, then using symmetry we can add four
other points or pixels.
Now, when the decision parameter is less than 0, we update the parameter in the first way and get the pixel to add to the set of pixels. When the decision parameter is not less than 0, we update the decision parameter in the second way and get this point as the new pixel. Then we add the new pixel to the list of pixels, plus the seven symmetric points using the symmetry property, and we continue until we reach the end of the octant, that is, until x ≥ y.
So that is how the midpoint algorithm works. As you have noted, if we go for the simple approach, we require a lot of floating-point operations, multiplications and square roots, which we avoid with the midpoint algorithm.
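A sketch of the midpoint algorithm for a circle of radius r centred at the origin is given below, reusing the eight_way_points() helper shown earlier; as noted in the example that follows, the output may contain duplicate pixels, which are removed afterwards.

```python
# Midpoint circle scan conversion for a circle of radius r centred at the origin.
def midpoint_circle(r):
    x, y = 0, round(r)
    p = 5 / 4 - r                        # initial decision parameter (may be a float)
    pixels = list(dict.fromkeys(eight_way_points(x, y)))  # 4 distinct start pixels
    while x < y:
        x += 1
        if p < 0:                        # midpoint inside: keep the same y
            p += 2 * x + 1
        else:                            # midpoint outside: go down one scan line
            y -= 1
            p += 2 * x + 1 - 2 * y
        pixels.extend(eight_way_points(x, y))
    return pixels

# midpoint_circle(2.7) yields 20 entries, with repetitions such as (2, 2),
# matching the illustrative example discussed next.
```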
786
(Refer Slide Time: 35:58)
Now here, of course, it may be noted that the algorithm assumes circle is centered at origin and
we require some modification when we are assuming circles which has its center at any arbitrary
location. But that minor modification we will not go into the details, you may try it yourself.
(Refer Slide Time: 36:23)
Now, let us try to understand this algorithm better in terms of one illustrative example.
787
(Refer Slide Time: 36:34)
Let us start with the assumption that we have a circle with radius r to be 2.7 that means a real
number. Now, let us execute the algorithm to find out the pixels that we should choose to
represent the circle.
(Refer Slide Time: 36:59)
The first stage is to compute p, which is 5/4 − r, giving −1.45 in this case. And we start by rounding off r to 3, so that (0, 3) is the first point, the first pixel, in our list. Based on this first pixel, we add its symmetric pixels as well, giving four pixels in the output list. Then we enter the main loop.
788
(Refer Slide Time: 37:34)
So we have p<0, then we update p as per the expression for p<0 and then, get this new pixel
value. With that, we add eight pixels to the output list (1, 3), (3, 1) (3, -1) (1, -3), and so on.
Since we have not yet reached the end of the loop, we continue with the loop.
(Refer Slide Time: 38:11)
In the second loop run, we have p>0. So we use the expression to update p and we get the new
value and we decide on this new pixel, based on that we choose the eight pixels as shown here.
Now, we have arrived at the end of the loop, the termination condition is reached so we stop. So
then at the end what we get.
789
(Refer Slide Time: 38:44)
We get 20 pixels, shown here. One thing you may note is that some pixels are repeated. For example, (2, 2) occurs twice, (-2, -2) occurs twice, and (2, -2) occurs twice. So this repetition is there at the end of the execution of the algorithm.
(Refer Slide Time: 39:27)
These duplicate entries, we need to remove. So before rendering, we perform further checks and
processing on the output list to remove such duplicate entries. So the algorithm may give us a list
having duplicate entries, we need to perform some checks before we use those pixels to render
the circle to avoid duplications. So that is what we do to render circle.
790
So we have learned how to render a line, and we have learned how to render a circle. In both cases, our objective was to map from real coordinate values to the pixel grid, and to do so without involving floating-point operations to the extent possible, because in practical applications we need to render these shapes very frequently. If too many floating-point operations are involved, the speed at which we can render may slow down, giving us the perception of a distorted image or flickers, which are unwelcome.
In the next class, we will discuss more on rendering other things. Whatever we have discussed
today, can be found in this book.
(Refer Slide Time: 41:15)
You may go through chapter 9, section 9.1.2 and 9.2 to get the details on the topics that we
covered today. So we will meet in the next lecture. Till then, thank you and goodbye.
791
Computer Graphics
Professor Doctor Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 27
Fill Area and Character Scan Conversion
Hello and welcome to lecture number 27 in the course computer graphics. We are discussing the
3D graphics pipeline, as you may recollect it has 5 stages and we have already discussed 4 stages
in details and currently we are in the fifth stage. So, let us just have a quick relook at the 5 stages.
(Refer Slide Time: 00:55)
As you can see in this figure. We have already discussed first stage in details, object
representation, then second stage modelling transformation, third stage lighting or assigning
colour, fourth stage viewing pipeline and currently we are at the fifth stage scan conversion or
rendering.
792
(Refer Slide Time: 01:23)
Now, in scan conversion, what we do is essentially try to map a description of an image given in the device coordinate system to a description on the pixel grid, that means a set of pixels. So, in the earlier lectures we have covered the methods that are followed for such mapping, for points, lines and circles.
And we have seen how we can improve efficiency of these methods by introducing better
approaches such as the Bresenham's line drawing algorithm used for line scan conversion,
midpoint algorithm for circle scan conversion and so on. Today we are going to discuss another
scan conversion technique related to fill areas, along with that we will also discuss how we
display characters that means the letters, numbers etcetera on the screen. We will try to get a
broad idea on character rendering.
793
(Refer Slide Time: 02:56)
Let us start with fill area rendering. First, let us try to understand what a fill area is.
(Refer Slide Time: 03:05)
So far, we have discussed how to determine the pixels that define a line or a circle boundary.
794
(Refer Slide Time: 03:24)
Sometimes that may not be the case; sometimes we may know the pixels that are part of a region, and we may want to apply a specific colour to that whole region. So, earlier we determined pixels that are part of a single line or a circle boundary, but there may be situations where we want to assign colours to a region rather than to a line or a boundary.
(Refer Slide Time: 04:09)
Now, that is the same as saying that we want to fill a region with a specified colour. So, that is fill area rendering, one of the topics of our discussion today. When we talk about fill area rendering, we refer to a region, and our objective is to fill that entire region, that is, the pixels that are part of that region, with a specified colour. This is in contrast to what we learned earlier, where our objective was to find out the pixels, and of course assign colours to them, that are part of a line or the boundary of a circle.
(Refer Slide Time: 05:02)
Let us try to understand this concept with an example. Consider an interactive painting system. In such a system we may draw any arbitrary shape and then wish to assign some colours to that shape, that is, assign colours inside the boundary of that shape. We may also want to change the colour. So the first thing is that we may want to colour the arbitrary shape that we have drawn.
Now, when we say we are trying to colour some shape, that means we want to colour the boundary as well as the interior. We may also want to change the colour, and that too interactively: select some colour from a menu and click in the interior of the shape to indicate that the new colour is to be applied to that shape. If you have used an interactive painting system, then you may already be familiar with these things.
For example, suppose this is our canvas and here we have drawn a shape something like this. Then there may be a menu of colours, say a colour palette; we may choose a colour from this menu, say this one, click our mouse pointer or touch some point inside the shape, and then the chosen colour is applied to the interior of the shape. So, that is interactive colouring of a shape. And here, as you can see, we are concerned with colouring a region rather than only the boundary, unlike what we did when we were trying to determine pixels, as well as their colours, for lines or circle boundaries.
(Refer Slide Time: 07:26)
The question is, how can we perform such colouring or region filling? That depends on how the regions are defined; there can be different ways to define a region, and depending on that definition we have different region filling approaches.
(Refer Slide Time: 07:50)
Broadly, there are two ways of defining a region: the pixel-level definition and the geometric definition.
797
(Refer Slide Time: 08:02)
In the case of a pixel-level definition, we define a region in terms of pixels. That means we may define the region in terms of its boundary pixels, or we may define it in terms of the pixels within a boundary. In the first case, when we define a region in terms of boundary pixels, that is, the set of pixels that define the boundary, such a definition is called boundary defined. In the other case we do not explicitly define a boundary but give the set of pixels that defines the whole region; in that case we call it interior defined. Such pixel-level definitions are useful when we are dealing with regions having complex boundaries, or, as we have just seen, in applications such as interactive painting systems. For complex shapes it is difficult to deal with the boundary, so a pixel-level definition may be useful there; also, in interactive systems, pixel-level definitions are very useful.
798
(Refer Slide Time: 09:28)
The other type of fill area definition is the geometric definition. Here we define a region in terms of geometric primitives such as edges and vertices; this we have already seen during our discussion of object representation techniques. This particular approach is primarily meant for polygonal regions, and such definitions are commonly used in general graphics packages, which we have already mentioned earlier.
So, essentially, a geometric definition means defining a region in terms of geometric primitives such as edges and vertices; if you recollect, we discussed such things during object representation, where we used vertex lists and edge lists to define objects or regions. Geometric definitions are primarily meant to define regions that are polygonal in shape.
799
(Refer Slide Time: 10:52)
Now, with this knowledge of the two broad definitions of regions, let us try to understand the different region filling scan conversion algorithms. We will start with one simple approach called the seed fill algorithm. Let us try to understand what it does.
(Refer Slide Time: 11:16)
So, the idea is very simple for a seed fill algorithm, we start with one interior pixel and colour
the region progressively, that is the simple idea.
800
(Refer Slide Time: 11:33)
Clearly, here we are assuming a pixel-level definition, particularly a boundary definition of a region, where the boundary pixels are specified. We also assume that we know at least one interior pixel; that pixel is called the seed pixel, and if we know the boundary pixels it is easy to decide on a seed pixel. Because we are dealing with a seed pixel, the algorithm is named the seed fill algorithm. So, we have a seed pixel and we have a boundary definition of the region in terms of pixels.
(Refer Slide Time: 12:26)
801
Next, in this algorithm it is also assumed that interior pixels are connected to other pixels in either of two ways: they can be connected to 4 pixels, which is called 4-connected, or they can be connected to 8 pixels, which is called 8-connected. These are the neighbouring pixels; for example, suppose this is a seed pixel and there are pixels around it, if this is the grid where the circles show the pixels, then each pixel can be assumed to be connected either to 4 neighbouring pixels or to all 8 neighbouring pixels.
Accordingly, the nature of the connection is called 4-connected or 8-connected. When we talk of 4-connected, let us redraw the figure: suppose these intersection points of the grid are the pixels and this is one pixel; in the 4-connected case the 4 neighbouring pixels are defined as top, bottom, left and right, that means this is the top, this is the bottom, this is the right and this is the left.
Whereas when we are dealing with 8-connected pixels, we deal with the 8 neighbours: top, top left (this is the top left), top right (here), then left, right, bottom, bottom left (this is here) and bottom right (here). Either of these connections we can assume, and accordingly the algorithm is executed.
(Refer Slide Time: 15:05)
So, the basic idea is simple: we maintain a stack, the seed pixel is first pushed onto the stack, and then a loop is executed as long as the stack is not empty. In each step, we pop the top pixel from the stack and assign the desired colour to that pixel.
802
(Refer Slide Time: 15:37)
The algorithm is shown here. What is our input? The boundary pixel colour, the specified colour which we want to assign to the region, and the seed or interior pixel (any one seed pixel); the output is the interior pixels with the specified colour, which is our objective. We start by pushing the seed pixel onto a stack, and then we enter a loop where we set the current pixel to the stack-top pixel by popping it from the stack and apply the specified colour to that pixel; then we make use of the connectedness property.
So, if we are assuming 4-connectedness, then for each of the 4 connected pixels, or if we are assuming 8-connectedness, then for each of the 8 connected pixels of the current pixel, we check whether the connected pixel's colour is neither the boundary colour (that means we have not reached the boundary) nor the specified colour (that means we are yet to assign it any colour); if so, we push it onto the stack.
So, for each pixel we push its 4-connected or 8-connected neighbours onto the stack, depending on the nature of connectedness that we are assuming. Then we come back here, and the loop continues till the stack is empty, that is, till we have reached the boundary everywhere and have assigned colours to all the interior pixels. That is the simple idea of the seed fill algorithm; a small code sketch is given below. After that, we will discuss another approach called flood fill, whose idea is almost similar with some minor variations. Let us see how it works.
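As a rough sketch, the boundary-defined seed fill with 4-connectedness might look like the following. Here the frame buffer is assumed to be a simple 2D list of colours indexed as grid[y][x]; the function and variable names are our own illustration, not the lecture's pseudocode.

def seed_fill(grid, seed, boundary_colour, fill_colour):
    # seed: (x, y) of one known interior pixel
    stack = [seed]
    while stack:                                   # loop till the stack is empty
        x, y = stack.pop()                         # current pixel = stack top
        grid[y][x] = fill_colour                   # apply the specified colour
        # 4-connected neighbours: right, left, top, bottom
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]):
                c = grid[ny][nx]
                # push only if it is neither the boundary colour nor already filled
                if c != boundary_colour and c != fill_colour:
                    stack.append((nx, ny))
    return grid

For 8-connectedness, the tuple of neighbour offsets would simply be extended with the four diagonal neighbours.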
803
(Refer Slide Time: 18:03)
Now, in the case of the flood fill algorithm, we assume a different definition, the interior definition, which means the interior pixels are known. Earlier we assumed a boundary definition with only one known interior pixel, the seed pixel. Here we assume an interior definition, that is, all the interior pixels are known, and our objective is to colour or recolour the region with a specified colour.
(Refer Slide Time: 18:38)
The idea is similar to seed fill, with some differences. In this case the decisions are taken based on the original interior colour of the pixels instead of the boundary pixel colour. Other things remain the same, that is, using a stack and utilizing the stack elements, colouring them in a particular way.
(Refer Slide Time: 19:17)
The algorithm is shown here. Again the input is the interior pixel colour, the specified colour and one interior or seed pixel; it is even easier here because we already know the interior pixels and can randomly pick one of them. The output is the set of interior pixels with the specified colour assigned.
Now, we push the seed pixel onto the stack and, as before, enter a loop: first we pop the stack, set the popped pixel as the current pixel and apply the specified colour; then, assuming connectedness as we did before, we deal with either the 4 or the 8 connected pixels, and for each of them we perform a check. Here the check is slightly different from what we did earlier: we check if the colour of the connected pixel is the interior colour.
Only in that case do we push the connected pixel, because here we cannot check for a boundary colour; there is no boundary specified. Then we continue, as before, till the stack is empty. So, in both cases we start with a seed pixel, but in one case we are dealing with a boundary definition in terms of pixels and in the other with an interior definition of the region in terms of pixels, and accordingly our algorithm changes slightly; otherwise the broad idea remains the same.
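For comparison, a sketch of the flood fill variant under the same assumptions (again with our own names) is shown below; the only real change is the test, which now compares against the original interior colour.

def flood_fill(grid, seed, interior_colour, fill_colour):
    if interior_colour == fill_colour:
        return grid                                # nothing to do; avoids endless re-pushing
    stack = [seed]
    while stack:
        x, y = stack.pop()
        grid[y][x] = fill_colour
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]):
                # push only pixels that still carry the original interior colour
                if grid[ny][nx] == interior_colour:
                    stack.append((nx, ny))
    return grid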
805
(Refer Slide Time: 21:29)
We will now discuss a third approach, which relies on a geometric definition; this is called the scan line polygon fill algorithm. The earlier approaches, seed fill and flood fill, depend on pixel-level definitions, whereas the scan line polygon fill algorithm depends on a geometric definition.
(Refer Slide Time: 21:56)
So, here we assume that the region is defined in terms of its vertices and edges, of course here
the implicit assumption is that the region is polygonal and the vertices are rounded off to the
nearest pixels. These are the things that we assume.
806
(Refer Slide Time: 22:23)
We will first discuss the algorithm and then try to understand it with an illustrative example. Here the input is the set of vertices and the output is the interior pixels with the specified colour. From the vertices, we determine the maximum and minimum scan lines, that is, the maximum and minimum y values of the polygon.
For example, suppose this is our pixel grid and we have a shape like this; we need to know the minimum y, which is here, ymin, and the maximum y, which is here, ymax. So first we determine this maximum and minimum, then we start from the minimum scan line, that is, the lowermost one here.
Then we enter a loop and continue until we reach the maximum scan line, as shown in this loop condition. In the loop, for each edge, or pair of vertices, of the polygon, we check whether the scan line is within the range defined by the y coordinates of the edge; if it is, we determine the edge-scan line intersection point.
After these steps, we sort the intersection points in increasing order of x coordinates; that means we first determine the intersection points and then sort them in increasing order. Then we apply the specified colour to the pixels that lie between the intersection points, that is, all the intermediate pixels, and then we go to the next scan line. That is the broad idea.
807
So, first we determine the minimum and maximum; we start with the minimum and continue the processing till the maximum scan line is reached. In each loop execution we determine the intersection points of the edges with the scan line to get the two extremes on a single scan line, and then assign the specified colour to all the pixels that lie within these extremes. That is the simple idea. A compact code sketch of this procedure is given below; after that, let us try to understand it with an example.
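The sketch below assumes a convex polygon given as a list of (x, y) vertices already rounded to pixel coordinates; set_pixel stands for whatever routine writes a colour into the frame buffer, and all names are our own.

import math

def scanline_fill(vertices, set_pixel, colour):
    n = len(vertices)
    y_min = min(y for _, y in vertices)
    y_max = max(y for _, y in vertices)
    for scan_y in range(y_min, y_max + 1):             # one scan line per iteration
        xs = []
        for i in range(n):                             # each edge = consecutive vertex pair
            (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
            if y1 == y2:
                continue                               # skip horizontal edges
            if min(y1, y2) <= scan_y <= max(y1, y2):
                # edge / scan-line intersection from the line equation
                xs.append(x1 + (scan_y - y1) * (x2 - x1) / (y2 - y1))
        if xs:
            xs.sort()                                  # increasing order of x
            left, right = xs[0], xs[-1]                # two extreme intersections
            for x in range(math.ceil(left), math.floor(right) + 1):
                set_pixel(x, scan_y, colour)           # colour all in-between pixels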
(Refer Slide Time: 25:58)
We will go through one illustrative example to get more clarity on the algorithm.
(Refer Slide Time: 26:07)
808
Let us consider this figure. Here there is a polygon or fill area specified with 4 vertices A, B, C
and D as shown here. Now, we followed an anti-clockwise vertex naming convention, so there
are 4 edges AB, BC, CD and DA.
(Refer Slide Time: 26:46)
Now first, we determine the minimum and maximum extent of the scan lines. Here 1 is the minimum, as you can see, and 6 is the maximum scan line; this we determine as the first step.
(Refer Slide Time: 27:10)
Then we start the loop. We start from 1 and continue till 6, and in each execution of the loop we process one scan line. So, when we start with scan line 1, our objective is to determine the intersection points of the scan line y = 1 with all 4 edges in the inner loop of the algorithm, that is, lines 6 to 10.
(Refer Slide Time: 27:44)
If you execute the lines, you will find that for the edge AB the if condition is satisfied and the
intersection point is A, for BC and CD edges the condition is not satisfied, again for DA the
condition is satisfied and we get A again, so the intersection point is only A.
(Refer Slide Time: 28:15)
Since it is already a vertex, there cannot be any intermediate pixels. So, we get 2 intersection points, which are the same vertex A; thus it is the only pixel and we apply the specified colour to it. Then we go to the next iteration by setting the scan line to 2 and checking that 2 does not exceed the maximum scan line, which is 6, so we execute the loop again.
(Refer Slide Time: 28:57)
In the second iteration of the loop, what we do? We check for intersection points as before with
the edges and the scan line y equal to 2, that is this scan line.
(Refer Slide Time: 29:14)
Now, for y = 2, if we check the edges, we see that for AB the if condition is satisfied, that means there is an intersection point; using the edge's line equation and the scan line equation we can get the intersection point, this one. For BC and CD the if condition is not satisfied, so there is no intersection.
For DA the condition is satisfied again. So, this is the AB intersection point and this is the DA intersection point, which we can find to be (3, 2) by using the line equation of the edge as well as the scan line equation. So, this point is one intersection point and this point is the other.
(Refer Slide Time: 30:41)
Then we perform a sort, as mentioned in the algorithm, and get these two intersection points in sorted order, like this. So, this is one intersection point and this is the other. In-between pixels are there, as you can see: this itself is an in-between pixel, then we have this pixel, which is (4, 2), and then we have (5, 2).
Note that the other intersection point is not a pixel in itself because it involves a real number as a coordinate. So, we found the 3 pixels that lie between the two intersection points, and we apply the specified colour to them. Then we set the scan line to 3 and check whether we have crossed the maximum scan line; we have not, so we re-enter the loop.
812
(Refer Slide Time: 32:15)
In a similar way we process the scan lines y = 3, y = 4, y = 5, up to y = 6. That is the idea of the algorithm. To summarize, in the scan line polygon fill algorithm we assume a geometric representation of a polygonal region in terms of edges and vertices. Then, for each scan line between the minimum and maximum scan lines, we determine the intersection points of the edges with that scan line, sort them to get the two extreme intersection points, identify the in-between pixels and colour those pixels. And this we do for all the scan lines between the minimum and the maximum.
813
(Refer Slide Time: 33:29)
Now, there are two things that require some elaboration in the algorithm.
(Refer Slide Time: 33:38)
First, how do we determine the edge-scan line intersection point? I think all of you know that we can use the line equation, which we can determine from the two endpoints, and evaluate it with the scan line value to get the intersection point. This is a very simple approach and I think all of you may already know how to do it.
814
(Refer Slide Time: 34:20)
Secondly, how do we determine the pixels between two intersection points? This is again very simple: we start from the leftmost pixel, which is either the intersection point itself or just next to it, and continue along the scan line as long as the pixel's x coordinate does not exceed the right intersection point; in all these checks we compare the pixel's x coordinate. So, both steps are simple, but the second one, how to determine the pixels between two intersection points, is not as simple as it appears. Why is that so?
(Refer Slide Time: 35:27)
815
If we have a concave polygon, then there is a problem. So far, whatever explanation was given was based on the assumption that we are dealing with convex polygons. For concave polygons it is not so easy to determine the intermediate pixels; additional issues are there which need to be solved before we can determine the interior pixels.
(Refer Slide Time: 36:16)
Let us take an example. Here in this figure, as you can see, we have a concave polygon. So, when we are dealing with these two extreme intersection points, some pixels between them are outside the polygon. If we follow the approach outlined earlier, that is, simply move along the scan line to collect all the intermediate pixels, then the outside pixels will also be treated as interior pixels, which we definitely do not want. So, we need a different approach when dealing with concave polygons. What we need to do is explicitly determine the inside pixels, the pixels that are inside the polygon; as you can see from the figure, that is not so obvious for concave polygons.
(Refer Slide Time: 37:24)
So, in this case we need to perform an inside outside test for each pixel which is of course an
additional overhead.
(Refer Slide Time: 37:39)
817
And how can we do that? For each pixel p, we first determine the bounding box of the polygon, that is, the maximum and minimum x and y values of the polygon's vertices. This is the first step. In the second step we choose an arbitrary point, let us denote it by p0, outside the bounding box.
In this case, this can be our bounding box; as you can see, it covers all vertices of the polygon. Then we choose one point outside this bounding box, somewhere, say here, that is, a point (x, y) which is outside the min and max range of the polygon coordinates. In the third stage we create a line segment by joining p and p0.
So, we create a line between the pixel of concern and the point that is outside the bounding box. In the final stage we check: if the line intersects the polygon edges an even number of times, then p is outside; otherwise it is inside. So, as you can see, suppose we have a pixel here; if we join these two points, the line intersects the boundary once, that is, an odd number of times, so this pixel is inside.
Whereas if we are dealing with a pixel here and we join it to the outside point, the line intersects the boundary twice, that is, an even number of times, so this pixel is outside. Similarly, for this pixel, if we join the two points we see that the line does not intersect the polygon edges at all, that is, zero times, so in that case also the pixel is outside.
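A sketch of this even-odd test is given below. It assumes the polygon is a list of (x, y) vertices and picks the outside point far to the right of the bounding box, at the same height as the pixel being tested; the names are our own.

def is_inside(p, vertices):
    # Join p to a point outside the bounding box (far to the right, same y)
    # and count how many polygon edges the joining line crosses.
    # Odd number of crossings => inside; even (including zero) => outside.
    px, py = p
    inside = False
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        if (y1 > py) != (y2 > py):                 # edge straddles the horizontal line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:                       # crossing lies between p and the outside point
                inside = not inside                # toggle on every crossing
    return inside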
But of course all these checks take time, so there is an additional overhead when we are dealing with concave polygons; otherwise, the scan line polygon fill algorithm for convex polygons is quite simple. So, we have discussed different region fill algorithms, both for pixel-level definitions and for geometric definitions.
For pixel-level definitions we learned about the seed fill algorithm and the flood fill algorithm, which are similar with minor variations. For geometric definitions we learned about the scan line polygon fill algorithm; this algorithm is quite simple and straightforward when we are dealing with convex polygons, but requires additional checks for concave polygons. Now, let us try to understand how characters are displayed on the computer screen. How do we render characters?
818
(Refer Slide Time: 41:28)
Here, character means alphanumeric characters.
(Refer Slide Time: 41:35)
Now, character rendering, as we all know, is an important issue. For example, consider this slide; here we see lots of alphanumeric characters displayed. So, clearly it is an important issue, and characters are the building blocks of any textual content, so any application that deals with displaying text must have support for character rendering. And how is it done?
819
(Refer Slide Time: 42:15)
As we all know, when we are dealing with a text processing application, typically a large amount of text needs to be displayed in a short time span. For example, consider scrolling: with each scroll action the whole set of characters is redrawn, and that has to be done very quickly. So, efficient rendering of characters is a very important issue in computer graphics.
(Refer Slide Time: 43:05)
Now, before we try to understand character rendering, let us have a quick look at the idea of a font. We have probably all heard of this term, so let us see what a font is and how it is dealt with in computer graphics.
820
(Refer Slide Time: 43:22)
When we talk of a font, a font or typeface denotes the overall design style of the characters, and there are many such fonts that probably all of us use every day, such as Times New Roman, Courier, Arial and so on. So, these fonts or typefaces indicate the design style of the characters, how they look.
(Refer Slide Time: 43:58)
Now, each font can be rendered with varying appearance. So, appearance may be different, it
may be bold, it may be italicised or it may be both bold and italicised.
821
(Refer Slide Time: 44:17)
Along with that there is another concept called size: how big or small the character appears on the screen. That is denoted in points, for example a 10-point font, a 12-point font and so on, which is a measure of character height in inches. This term is borrowed from typography, but in computer graphics we do not use the original measure.
Instead, we assume that a point is equivalent to 1/72 of an inch, or approximately 0.0139 inch; this is also known as the DTP (desktop publishing) or PostScript point. So, when we talk of a point, we assume that it is 1/72 of an inch, which indicates the height of the character; a 12-point character, for instance, is nominally 12/72 = 1/6 inch tall.
822
(Refer Slide Time: 45:33)
Now, with that basic knowledge, let us try to understand how characters are rendered.
(Refer Slide Time: 45:41)
So, there are broadly two ways of rendering characters: one is bitmapped and the other is outline.
823
(Refer Slide Time: 45:53)
In the case of a bitmapped font, we define a pixel grid for each character. For example, consider this 8 by 8 pixel grid; we can define the grid for the capital character B, where the pixels that are part of the character are marked ON and the others are OFF. So, when the grid for B is rendered, only those pixels will be illuminated and the other pixels will not be. The black circles here indicate the ON pixels, as you can see, and the white boxes indicate the OFF pixels. We can have this type of grid for each character when we are dealing with a bitmapped font; a small illustration is given below.
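As a small sketch, assume an 8×8 grid stored as strings in which '1' marks an ON pixel; the particular bit pattern below is only indicative and not the exact grid shown on the slide, and set_pixel is an assumed frame-buffer routine.

# A rough 8x8 bitmap for a capital 'B' (illustrative pattern only).
GLYPH_B = [
    "11111100",
    "11000110",
    "11000110",
    "11111100",
    "11000110",
    "11000110",
    "11111100",
    "00000000",
]

def draw_bitmap_char(glyph, x0, y0, set_pixel, colour):
    # Illuminate only the ON pixels; OFF pixels are left untouched.
    for row, bits in enumerate(glyph):
        for col, bit in enumerate(bits):
            if bit == "1":
                set_pixel(x0 + col, y0 + row, colour)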
(Refer Slide Time: 46:59)
824
In contrast, when we are dealing with an outline font, the approach is totally different: here characters are defined using geometric primitives such as points and lines. A few pixels may be provided, and the other pixels are determined by using the scan conversion techniques for points, lines and circles.
So, essentially, a few pixels are provided and, using those, the computer draws the primitives such as lines or circles to construct the character, like creating an image. In the case of a bitmapped font we already specify all the pixels, whereas in the case of an outline font we do not specify all the pixels; a few pixels are specified and from those the overall shape is computed by following the scan conversion techniques.
(Refer Slide Time: 48:18)
Clearly, bitmapped fonts are simple to define and fast to render, because no computation is involved; we do not need to compute any pixels, they are already specified. But they have some problems.
825
(Refer Slide Time: 48:37)
Obviously, they require additional storage, a large amount of it, because for each character we are storing the pixel grid information. Also, if we want to resize or reshape the characters to generate different stylistic effects, that is not easily possible with bitmapped definitions, and the resulting font may look bad.
(Refer Slide Time: 49:22)
The third concern is that the font size depends on the screen resolution, because we are fixing the pixel grid; if the resolution changes, the rendering may not look good. For example, suppose we have defined a bitmap that is 12 pixels high. It will produce a 12-point character at a resolution of 72 pixels per inch, since 12 pixels there span 12/72 inch, which is 12 points. If we change the resolution to 96 pixels per inch, the same bitmap produces only a 9-point character (72 × 12/96 = 9), which we may not want. So, depending on the resolution, the outcome may change.
(Refer Slide Time: 50:18)
On the other hand, outline fonts compute the intermediate pixels rather than storing everything, so they require less storage; they can undergo geometric transformations with satisfactory results for reshaping and resizing, so the distortion is less; and they are not resolution dependent.
(Refer Slide Time: 50:50)
827
But, on the other hand, rendering such fonts is slow, which is quite obvious because computations are involved: we need to create the shape and scan convert it before rendering. Due to such computations, rendering is slower compared to bitmapped fonts. So, both approaches have their positive and negative sides, and depending on the resources available and the outcome desired we can choose one particular approach.
(Refer Slide Time: 51:34)
Whatever I have discussed today can be found in this book; you may go through Chapter 9, Sections 9.3 and 9.4 for more details on the topics that we have covered today. In the next lecture, we will discuss an interesting issue in scan conversion called the aliasing effect. Till then, thank you and goodbye.
828
Computer Graphics
Professor Doctor Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 28
Anti-Aliasing Techniques
Hello and welcome to lecture number 28 in the course Computer Graphics. We are currently in
our last leg of discussion on the 3D graphics pipeline. Let us quickly recap the pipeline and then
we will continue our discussion today.
(Refer Slide Time: 00:47)
So, as we have learned, there are five stages; let us quickly go through them once again. The first stage is object representation, the second stage is modelling transformation, the third stage is lighting, the fourth stage is the viewing pipeline, and the fifth stage is scan conversion. Just to recap, among these stages object representation deals with the representation of the objects that constitute a scene, and there the objects are defined in their local coordinate systems.
The second stage is modelling transformation, where we combine different objects to construct a scene; there we perform a transformation, the local-to-world coordinate transformation, and then the scene is defined in the world coordinate system. In this world coordinate system we assign colours to the objects, which is the lighting stage. Then, in the fourth stage, the viewing pipeline, we perform a series of transformations: the view transformation, where we transform from the world to a view coordinate system; the projection transformation, where we transform from the 3D view coordinate system to the 2D view coordinate system; and thirdly the window-to-viewport transformation, where we transform from the 2D view coordinate system to a device coordinate system.
In this fourth stage we also perform two operations, namely clipping and hidden surface removal; both of these operations are done in the view coordinate system. The fifth stage, scan conversion, is also related to transformation: there we transform the device coordinate description of a scene to a screen coordinate description, or pixel grid. Currently we are discussing this fifth stage, scan conversion.
(Refer Slide Time: 03:01)
In this stage we have already covered a few topics, namely how to scan convert points, lines, circles, fill areas and characters. Today we will discuss an important concept related to scan conversion, called anti-aliasing techniques; these are required to smoothen the scan converted or rendered shapes.
830
(Refer Slide Time: 03:39)
Let us try to understand what anti-aliasing is, and then we will discuss a few anti-aliasing techniques with examples.
(Refer Slide Time: 03:48)
Let us start with the basic idea: what is anti-aliasing? Consider this figure. As you can see, the dotted line indicates the original line that we want to scan convert, and this is the pixel grid; on this pixel grid we want to scan convert the dotted line. However, as you can see, not all the points on the line pass through pixels, so we need to map those points to the nearest pixels, as we have seen in our earlier discussion.
831
As a result, what we get is this scan converted line, which looks like a stair-step pattern, represented by the thick black line. This is definitely not exactly the original line; it is an approximation, as we have already said, and due to this approximation some distortion occurs. In the case of lines these distortions are called jaggies, the stair-step-like patterns.
(Refer Slide Time: 05:13)
In general, there will be distortion of the original shape after scan conversion, for any shape, not only lines. This phenomenon, where the rendered shape is distorted due to the algorithms that we follow for scan conversion, is called aliasing. Now, why is it called aliasing? What is the significance of this term?
832
(Refer Slide Time: 05:52)
Before going into that, we should note that aliasing is an undesirable side effect of scan conversion, and we want to avoid it to the extent possible. So, additional operations are performed to remove such distortions, and these techniques together are known as anti-aliasing techniques. In other words, when we apply techniques to remove aliasing effects, we call them anti-aliasing techniques.
(Refer Slide Time: 06:33)
So, why do we call it aliasing? What is the significance of this term?
833
(Refer Slide Time: 06:41)
In fact, the term aliasing is related to signal processing; we can explain the idea in terms of concepts borrowed from the signal processing domain.
(Refer Slide Time: 07:05)
In computer graphics, what we want is to synthesize images. In other words, we want to render a true image, and here we have to think of it as rendering the image on the window in the view coordinate system, or the 2D view coordinate system.
834
(Refer Slide Time: 07:41)
Now, how do we define this image? In terms of intensity values, which can be any real number. Remember that we are still in the view coordinate system, where we are free to use any real number to represent intensity. Those intensity values, if we think of them in a different way, represent some distribution of continuous values.
So, if we plot those values in a graph, we will get some curve which represents the distribution, and here the values are continuous; they can take any real number. Essentially, we can think of a true image as a continuous signal; that is how we map the idea of image rendering to signal representation.
835
(Refer Slide Time: 08:44)
Now, when we perform rendering, what do we do? The process of rendering can be viewed, in a very broad sense, as a two-stage process. In the first stage we sample the signal, that is, the intensity values; in other words, we sample those values at the pixel locations. So we try to get the pixel intensities; this is what we have discussed so far, how to obtain the pixel intensities from the actual intensities.
But there is also a second stage: from the sampled intensities we want to reconstruct the original signal as a set of coloured pixels on the display. So, essentially, we are given an image which is a continuous signal of intensity values (we can think of it in that way), and we want to render this image on the pixel grid; that is the purpose of scan conversion. How do we do that? We follow two stages, broadly: in the first stage we sample the pixel values, and in the second stage we reconstruct the image from those samples to get the rendered image.
836
(Refer Slide Time: 10:20)
Since we are reconstructing the original signal from sampled values, clearly it is not the exact signal; we are dealing with a false representation of the original signal. Now, in English a person using a false name is known as an alias, and the same idea is adopted here: we are trying to represent an original signal in terms of a false, reconstructed signal, hence this reconstructed signal is called an alias and the effect we get is known as aliasing.
(Refer Slide Time: 11:19)
Now, since we are reconstructing, there will be some change from the original. Usually it results in visually distracting images or artefacts, and to reduce or eliminate this effect we use anti-aliasing techniques; that is, the techniques that reduce aliasing effects are called anti-aliasing techniques.
(Refer Slide Time: 11:48)
Now, how do we do that? Again we can borrow terms from signal processing. The continuous intensity signal, that is, the true image, can be viewed as a composition of various frequency components, in other words, primary signals of varied frequencies. That is one way of viewing the intensity values.
838
(Refer Slide Time: 12:21)
Now, there are two components in those signals. Uniform regions with constant intensity values
may be viewed as corresponding to low frequency components whereas there are values that
change abruptly and these values correspond to sharp edges at the high end of frequency
spectrum. In other words, wherever there are abrupt changes in values we can think of those
regions as representing the high frequency components.
(Refer Slide Time: 13:10)
Now, because of those high frequency components such abrupt changes we get aliasing effects.
So, we need to smoothen out those aliasing effects, and how do we do that?
839
(Refer Slide Time: 13:30)
By removing, or filtering out, the high frequency components from the reconstructed intensity signal. If we do not perform any additional operations, the reconstructed signal will contain both high and low frequency components, which will result in distortion. So, our objective is to eliminate, or filter out, the high frequency components in the reconstructed signal.
(Refer Slide Time: 14:12)
There are broadly two ways to do that; we apply broadly two groups of filtering techniques. One set of techniques comes under pre-filtering and the other set comes under post-filtering.
840
(Refer Slide Time: 14:34)
In pre-filtering, we perform filtering before sampling; that means we work on the true signal, which is in the continuous space, and try to derive proper values for individual pixels from the true signal. This group of techniques is also called area sampling.
(Refer Slide Time: 15:07)
In contrast, in post-filtering, as the name suggests, we perform filtering after sampling, that is, we filter the high frequency components from the sampled data. In other words, we compute the pixel values and then, using post-filtering techniques, we modify those values. This group of techniques is also known as super sampling. So, we have pre-filtering or area sampling techniques and post-filtering or super sampling techniques. Now, let us try to learn about a few of these techniques.
(Refer Slide Time: 15:53)
We will start with area sampling: what are the techniques and how do they work?
(Refer Slide Time: 16:01)
Now, in the case of area sampling, we assume that a pixel has an area: a pixel is not a dimensionless point but has an area, usually considered to be a square, or a circle with unit radius, and the lines passing through those pixels have a finite width. So, each line also has an area.
842
(Refer Slide Time: 16:35)
To compute the pixel intensity, we determine the fraction of the pixel area occupied by the line; let us denote it by p. Then the pixel intensity I is computed as a weighted sum of the line colour cl and the background colour cb, as shown in this expression: I = p·cl + (1 − p)·cb. So, to compute the intensity of a pixel, say for example this pixel 01 here, you can see that the line covers 50 percent of the pixel area, so the intensity is 0.5 times the line colour cl, whatever that is, plus (1 − 0.5) times the background colour; that will be the colour of this particular pixel 01 in the figure. Note that earlier we simply assigned the line colour to such a pixel, but here we consider how much of the pixel's area is occupied by the line and change the colour accordingly. This is one approach.
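A minimal sketch of this unweighted area-sampling rule is shown below; the coverage fraction p is assumed to be known already (estimating it is a separate problem), and the colours are (R, G, B) tuples with assumed example values.

def area_sample(p, line_colour, background_colour):
    # p: fraction of the pixel's area covered by the thick line, 0 <= p <= 1
    return tuple(p * cl + (1 - p) * cb
                 for cl, cb in zip(line_colour, background_colour))

# Example from the discussion: the line covers half of the pixel area
# (assumed red line on an assumed white background).
print(area_sample(0.5, (1.0, 0.0, 0.0), (1.0, 1.0, 1.0)))   # -> (1.0, 0.5, 0.5)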
843
(Refer Slide Time: 18:16)
Let us have a look at a slightly more sophisticated approach, involving more computation, called the Gupta-Sproull algorithm.
(Refer Slide Time: 18:27)
This is a pre-filtering technique used for anti-aliased line drawing. Here the pixel intensity is set based on the distance of the line (or the line centre, since here we assume the line has a finite width) from the pixel centre. The idea of this particular algorithm is based on the midpoint line drawing algorithm.
844
So, earlier we have talked about DDA algorithm, Bresenham’s line drawing algorithm. Now,
there is another algorithm that is midpoint line drawing algorithm, this algorithm is used in the
Gupta-Sproull pre-filtering technique. So, we will first try to understand this algorithm then we
will talk about the actual pre filtering technique.
(Refer Slide Time: 19:30)
So, in the midpoint algorithm, let us assume that we have just determined the current pixel here, and for the next pixel we have two candidate pixels: the upper candidate NE and the lower candidate E, as shown. Up to this point it is similar to Bresenham's line drawing algorithm that we saw earlier; what changes is the way we decide which of these candidate pixels to choose.
845
(Refer Slide Time: 20:15)
Earlier we made the decision based on the distance of the line from the candidate pixels. In the midpoint algorithm, we instead consider the midpoint between the two candidate pixels. The midpoint is shown here between these two candidate pixels, and it can be represented by this expression, as you can see.
(Refer Slide Time: 20:53)
Now, we can represent a line as shown here with an expression of the form ax + by + c = 0, where a, b, c are integer constants. We can restate this equation by multiplying it by 2 and get this expression without affecting anything in the equation; that is just a trick, which avoids fractions later when the expression is evaluated at a midpoint whose y coordinate involves 1/2.
846
(Refer Slide Time: 21:28)
Then we set the decision variable dk to be the function evaluated at the midpoint M, that is, at the point (xk + 1, yk + 1/2). If we use the modified equation, after multiplication by 2, and expand, it looks something like this; that is the decision variable at step k.
(Refer Slide Time: 22:00)
Now, if dk is greater than 0, the midpoint is below the line; in other words, the line passes above the midpoint, so the pixel NE is closer and we should choose NE, otherwise we should choose E. When NE is chosen, that is, when dk is greater than 0, the next decision variable dk+1 corresponds to the next midpoint, and if we expand and rearrange we can express it in terms of the previous decision variable: dk+1 = dk plus the constant 2(a + b).
(Refer Slide Time: 23:02)
When dk ≤ 0, we choose E instead, because then the midpoint is not below the line and E is the closer pixel; in that case the next decision variable, if we rearrange and reorganize the expression, works out to dk+1 = dk + 2a.
(Refer Slide Time: 23:41)
848
What is the initial decision variable? It is given by this expression, and after expanding, knowing that the function value at the starting endpoint is 0 (as you can see in this derivation), we get the initial value to be 2a + b.
(Refer Slide Time: 24:04)
Here we have summarized the steps of the algorithm. The input is the two line endpoints and the output is the set of pixels to render the line. The first task is to find the values of the constants a, b and c from the endpoints, and then the initial decision value d; then we start from one endpoint and continue till the other endpoint, as before. If d > 0, we update x and y in this way and update the decision parameter in this way; otherwise, we update only x and update the decision parameter in the other way. In each iteration we add the chosen pixel to P, and we continue till we reach the other endpoint. That is the midpoint algorithm; a brief sketch is given below. After that, let us see how the midpoint algorithm is used in the Gupta-Sproull algorithm.
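The sketch below handles a line with slope between 0 and 1 and integer endpoints; the coefficients and update constants follow the derivation above, and the names are our own.

def midpoint_line(x0, y0, x1, y1):
    dx, dy = x1 - x0, y1 - y0
    a, b = dy, -dx                   # line: a*x + b*y + c = 0
    d = 2 * a + b                    # initial decision value at the first midpoint
    x, y = x0, y0
    pixels = [(x, y)]
    while x < x1:
        if d > 0:                    # midpoint below the line: NE is closer
            x, y = x + 1, y + 1
            d += 2 * (a + b)
        else:                        # otherwise E is closer
            x += 1
            d += 2 * a
        pixels.append((x, y))
    return pixels

# Example: midpoint_line(0, 0, 5, 2) -> [(0,0), (1,0), (2,1), (3,1), (4,2), (5,2)]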
849
(Refer Slide Time: 25:15)
Now in Gupta Sproull algorithm there is some modification to the basic midpoint algorithm.
Consider this figure here, here suppose we have chosen this pixel at present xk, yk and based on
midpoint let us assume that we have chosen E this pixel in the next step. Later on, we will see
what will happen if we choose NE instead of E.
(Refer Slide Time: 25:53)
Now, D is the perpendicular distance from the pixel centre to the line, and we can compute D using geometry as shown here, where Δx and Δy are the differences in the x and y coordinates of the line endpoints; they are constants, so the denominator here is a constant.
850
(Refer Slide Time: 26:35)
Now, what should be the value of the intensity here? It will be a fraction of the original line colour. We will not assign the full line colour, to avoid the aliasing effect; instead we will assign a fraction of it, and this fraction is chosen based on the distance D. Note that this is in contrast to the earlier approaches, like Bresenham's algorithm or the DDA algorithm, where the line colour is simply assigned to the chosen pixel.
(Refer Slide Time: 27:14)
851
How does this distance determine the colour? Typically, a cone filter function is used, which means the more distant the line is from the chosen pixel centre, the lesser the intensity: the pixel becomes dimmer.
(Refer Slide Time: 27:39)
This distance-to-intensity mapping is implemented as a table: we maintain a table in which, for a given distance, a particular intensity fraction is stored, and depending on the computed distance we simply take the value from the table and apply it to the chosen pixel. Each entry in the table represents a fraction corresponding to a given D; some precomputed D values and their corresponding intensity fractions are kept in the table.
852
(Refer Slide Time: 28:22)
That is not all; along with that, to increase the smoothness of the line, the intensities of the neighbours are also changed. Here E is the chosen pixel and its neighbours are this pixel and this pixel. Their intensity is also modified, again according to their distances Dupper and Dlower from the line. So, here, as you can see, this is Dlower and this is Dupper, and depending on these values the neighbouring pixel intensities are set.
(Refer Slide Time: 29:02)
853
This Dupper and Dlower can be again obtained using geometry as shown in these expressions where
v is this distance and Δx Δy are as before difference between the x and y coordinates of the
endpoints.
(Refer Slide Time: 29:33)
Depending on those distances, tables are again used to set the values of the neighbouring pixels. Now, instead of E, suppose we have chosen NE; then of course these distances are computed differently. Again we can use geometry to check that the distances are given by these expressions; here Dupper and Dlower are different from the previous case.
So, this is the perpendicular distance of the chosen pixel from the line, and these are Dupper and Dlower; for each we maintain tables that implement the cone filter function. In the table, just to recap, we maintain distances and the corresponding fractions of the line colour to be applied, and based on that we choose the colour.
854
(Refer Slide Time: 30:43)
So, there are a few additional steps performed in each iteration of the midpoint algorithm. In the regular algorithm we do not modify the pixel colour: whatever the line colour is, we assign it to the chosen pixel. In the Gupta-Sproull algorithm those steps are modified. After choosing a candidate pixel, we determine the pixel colour by first computing the distance D (if E is chosen, the distance is given by one expression; if NE is chosen, by a separate expression).
Then we update the decision value as in the regular algorithm, set the intensity of the chosen pixel according to D, compute Dupper and Dlower, and set the intensities of the two vertical neighbours. These are the additional steps performed in the Gupta-Sproull algorithm compared to the original midpoint line drawing algorithm; a rough sketch of this colouring step is given below.
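This is not the lecture's exact incremental formulation: it computes the perpendicular distances directly and replaces the precomputed table with a stand-in cone filter function, so the helper names, the filter radius and blend_pixel (which mixes a fraction of the line colour into a pixel) are all our own assumptions.

import math

def perpendicular_distance(px, py, x0, y0, x1, y1):
    # Distance from the pixel centre (px, py) to the line through (x0, y0)-(x1, y1).
    dx, dy = x1 - x0, y1 - y0
    return abs(dy * (px - x0) - dx * (py - y0)) / math.hypot(dx, dy)

def cone_filter(d, radius=1.0):
    # Stand-in for the table lookup: the fraction falls off linearly with distance.
    return max(0.0, 1.0 - d / radius)

def shade_chosen_and_neighbours(px, py, x0, y0, x1, y1, line_colour, blend_pixel):
    # The chosen pixel and its two vertical neighbours each get a fraction of the line colour.
    for y in (py, py + 1, py - 1):                  # pixel, upper neighbour, lower neighbour
        d = perpendicular_distance(px, y, x0, y0, x1, y1)
        blend_pixel(px, y, line_colour, cone_filter(d))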
855
(Refer Slide Time: 32:00)
Let us try to understand the algorithm with one example.
(Refer Slide Time: 32:06)
Suppose this is our line, shown here with its endpoints, and we want to choose the pixels and colour them. So we need to do both things: choose the pixels and also choose appropriate intensity values for them.
856
(Refer Slide Time: 32:35)
So, first we would go for choosing the pixel and then we will decide on its colour. So, the line
equation can be given in this way or we get a equal to this, b equal to this, and c equal to this,
and initial decision value d is given as 1.
(Refer Slide Time: 33:02)
So, in the first iteration we need to choose between NE and E; d is greater than 0, as we can see, so we choose NE, that is, this pixel, and then reset d.
857
(Refer Slide Time: 33:23)
Now, after choosing NE the next iteration we choose this pixel E'(3, 2) depending on the value of
d and then again we reset d.
(Refer Slide Time: 33:51)
Now, after choosing these two pixels, we find that we have reached the other endpoint, so we stop. At the end the algorithm returns these four pixels, which are to be rendered. While choosing the pixels, we also use the modifications proposed in the Gupta-Sproull algorithm to assign colours.
858
(Refer Slide Time: 34:21)
So, for example when we are talking about NE this is one chosen pixel, so you have to choose its
colour as well as the colour of its two neighbours. Similarly, when we have chosen E we have to
choose its colour as well as the colour of its two neighbours.
(Refer Slide Time: 34:46)
For that we determine Δx and Δy. Then we compute D, the distance of the line from NE, which is given here, and the intensity is chosen based on this distance.
859
(Refer Slide Time: 35:06)
Also, we compute Dupper and Dlower, by first computing v.
(Refer Slide Time: 35:25)
So, Dupper and Dlower are these two values for this particular pixel NE. So, you have Dupper, Dlower.
Now, we use the table to determine the fraction of the original line colour to be applied to the
three pixels based on the three computed distances.
860
(Refer Slide Time: 35:56)
That was for NE; now for E' we have these two neighbouring pixels and the chosen pixel E'. Here, similarly, we can compute D to be this value, 1/√13.
(Refer Slide Time: 36:17)
And we compute v here to get the Dupper and Dlower values.
861
(Refer Slide Time: 36:33)
So, using v we get Dupper to be this value and Dlower to be this value. Now we know the distance of this pixel from the line as well as Dupper and Dlower, so again we go back to the table and perform the table lookup to determine the fraction of the line colour to be assigned to these 3 pixels.
That is how we choose the pixel colours. In summary, in the Gupta-Sproull algorithm we do not simply assign the line colour to the pixels chosen to represent the line. Instead, we choose the pixels following a particular line drawing algorithm and then compute three distances: the distance of the chosen pixel from the line and the distances of the neighbouring pixels from the line (here we consider only the vertical neighbours).
Based on these distances we find the fraction of the line colour to be applied to the chosen pixel as well as to the neighbouring pixels. Here we apply a cone filter function in the form of a table, in which fractions are listed against distances. So, those were the area sampling techniques.
862
(Refer Slide Time: 38:15)
Now, let us try to understand the other broad class of anti-aliasing techniques known as super
sampling.
(Refer Slide Time: 38:23)
Now, what do we do in super sampling? Here each pixel is assumed to consist of a grid of subpixels; in other words, we are effectively increasing the resolution of the display. To draw anti-aliased lines, we count the number of subpixels through which the line passes, and this number determines the intensity to be applied. For example, here the whole square is a pixel, which is represented with a 2×2 subpixel grid.
863
(Refer Slide Time: 39:12)
That is one approach. There are other approaches also. For example, we can use a finite line width and determine the inside subpixels, that is, the subpixels that are inside the finite width of the line. There can be a simple check for that: we may consider a subpixel to be inside only if its lower left corner is inside the line. Then the pixel intensity is a weighted average of the subpixel intensities, where the weights are the fractions of subpixels inside and outside; that is another simple approach.
(Refer Slide Time: 40:25)
864
For example, in this figure, as you can see, the line has some width and each pixel has a 2×2 subpixel grid. This is a pixel with a 2×2 subpixel grid, the pixel (0, 2); similarly, this is the pixel (1, 1), which also has a 2×2 subpixel grid. Now let us assume that the original line colour is given by these R, G, B values and the background colour is yellow, again with these R, G, B values. Then how do we choose the actual colour after rendering?
(Refer Slide Time: 41:30)
Now, in the figure, as you can see, these three subpixels may be considered inside the line. Thus the fraction of subpixels that is inside is 3/4 and the outside fraction is 1/4.
(Refer Slide Time: 42:07)
865
Now if we take a weighted average for individual intensity components, then we get the R value
for this particular pixel as this or this value, G value will be this value and B value will be 0. And
this R value or this G value and this B value together will give us the colour of that particular
pixel.
(Refer Slide Time: 42:45)
So, the intensity will be set with R equal to this, G equal to this and B equal to this value; we got this by considering the fraction of subpixels that are inside the line and the fraction that are outside. A worked sketch with assumed numbers is given below.
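The slide's actual R, G, B values are not reproduced here, so assume the line colour is pure red (1, 0, 0), the yellow background is (1, 1, 0), and 3 of the 4 subpixels of a pixel fall inside the line.

def supersample_colour(inside, total, line_colour, background_colour):
    # Weighted average of the two colours; weights are the subpixel fractions.
    w = inside / total
    return tuple(w * cl + (1 - w) * cb
                 for cl, cb in zip(line_colour, background_colour))

print(supersample_colour(3, 4, (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)))
# -> (1.0, 0.25, 0.0): R = 1, G = 0.25, B = 0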
(Refer Slide Time: 43:07)
866
Sometimes we also use weighting masks to control the amount of contribution of the various subpixels to the overall intensity. The mask size depends on the subpixel grid size, which is obvious because we want to control the contribution of each and every subpixel; so if we have a 3×3 subpixel grid, the mask size will also be 3×3, they should be the same.
(Refer Slide Time: 43:48)
For example, consider a 3×3 subpixel grid and suppose we are given this mask. Given this mask and the 3×3 subpixel grid, how do we choose the colour, or intensity, of a pixel?
(Refer Slide Time: 44:09)
867
We can use the rule that the intensity contribution of a subpixel is its corresponding mask value divided by 16, which is the sum of all the mask values. So for the subpixel (0, 0) the contribution will be 1/16; for the subpixel (1, 1) the contribution will be 4/16, because the corresponding mask value is 4, and so on. Whatever the subpixel intensity value is, we multiply it by this fraction to get its fractional contribution, and then we add those up to get the overall pixel intensity.
(Refer Slide Time: 45:07)
Now suppose a line passes through, or encloses, the top, centre, bottom-left and bottom subpixels of a pixel. Assume the same mask as shown here and a 3×3 subpixel grid.
(Refer Slide Time: 45:32)
868
Then, to get the pixel intensity for the line, where the line colour is cl (the original colour) and the background colour is cb, we can use this simple formulation: pixel intensity = (total contribution of the covered subpixels) × cl + (1 − total contribution) × cb, applied to each of the R, G, B components. A small sketch of this computation is given below.
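The mask values quoted above (1 at a corner, 4 at the centre, summing to 16) are consistent with the common 1-2-1 / 2-4-2 / 1-2-1 pattern, which we assume here; the covered subpixels are the top, centre, bottom-left and bottom ones from the example, indexed as (row, column), and the colours are assumed values.

MASK = [[1, 2, 1],
        [2, 4, 2],
        [1, 2, 1]]        # assumed 3x3 weighting mask, sums to 16

def masked_intensity(covered, line_colour, background_colour):
    # covered: list of (row, col) subpixels the line passes through or encloses
    w = sum(MASK[r][c] for r, c in covered) / 16.0      # total contribution
    return tuple(w * cl + (1 - w) * cb
                 for cl, cb in zip(line_colour, background_colour))

# Top, centre, bottom-left and bottom subpixels: weight = (2 + 4 + 1 + 2) / 16.
print(masked_intensity([(0, 1), (1, 1), (2, 0), (2, 1)], (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)))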
(Refer Slide Time: 46:11)
So, that is, in effect, what we can do in super sampling. Let us now summarize what we have learned so far.
(Refer Slide Time: 46:25)
869
With this discussion we have come to the end of our coverage of the 3D pipeline; in this series of lectures we have covered the five stages, as I mentioned at the beginning. Today we learned about one important issue, aliasing, and how to address it. It happens because in the fifth stage, when we convert a shape or an image to a pixel grid, we get distortions, and anti-aliasing techniques are useful for avoiding those distortions.
To do that, we follow broadly two groups of techniques: area sampling or pre-filtering, and super sampling or post-filtering. Under pre-filtering we learned about the Gupta-Sproull algorithm along with another simple approach; similarly, under post-filtering we learned about three approaches, and there are many others, of course.
These are meant to give you some idea of the issues and how they are addressed, but clearly, as you can see, if we go for anti-aliasing techniques, it entails additional computation and additional hardware support, which of course has its own cost. With that we end our discussion today, and that is also the end of our discussion on the 3D graphics pipeline.
In the next lecture, we shall learn about pipeline implementation, that is, how the pipeline stages that we have learned so far are implemented; in particular, we will learn about graphics hardware and software. In the introductory lectures we learnt about graphics hardware and software only briefly; here we will go into more detail.
(Refer Slide Time: 48:53)
Whatever I have discussed today can be found in this book. You are advised to go through
chapter 9 section 9.5 for more details on the topics. So see you in the next lecture, till then, thank
you and goodbye.
Computer Graphics
Professor Doctor Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 29
Graphics I/O Devices
Hello, and welcome to lecture number 29 in the course Computer Graphics. So, before we go
into today's topic, we will quickly recap what we have learned so far.
(Refer Slide Time: 00:44)
Now, till today, we have covered the stages of the 3D Graphics Pipeline. We completed our
discussions on the pipeline stages. Today and in the next few lectures, we are going to look
into its implementation that means, how the pipeline stages are implemented.
(Refer Slide Time: 01:15)
So, in these lectures on the pipeline, as well as the lectures that preceded the pipeline discussion,
what have we learned?
(Refer Slide Time: 01:30)
We can summarize the learning as the fundamental process involved in synthesizing or
depicting an image on a computer screen; that is what we have learned so far.
(Refer Slide Time: 01:54)
Now, in this process there are several stages. The process starts with an abstract
representation of objects, which involves representing points or vertices, lines or edges and
other such geometric primitives; that is the first thing we do in executing the process. Next,
the subsequent stages of the pipeline are applied to convert this representation to a bit
sequence, a sequence of 0s and 1s.
And then this sequence is stored in the frame buffer, and the content of the frame
buffer is used by the video controller to activate appropriate pixels, so that we perceive the
image; that is the whole process. We first define some objects, or in other words we define a
scene, then we apply the pipeline stages on this definition to convert it to 0s and 1s, and
these 0s and 1s are stored in the frame buffer. The frame buffer values are used by the video
controller to activate appropriate pixels on the screen to give us the perception of the desired
image.
(Refer Slide Time: 03:43)
So far, we have discussed only the theoretical aspects of this process, that is, how it works
conceptually. But we did not discuss how these concepts are implemented. Today and in the
next few lectures, we will do that; that will be our primary focus: how the concepts that we
have discussed to understand the process are implemented in practice.
(Refer Slide Time: 04:23)
So, what we will learn? We will learn the overall architecture of a graphics system, how it
looks. Then, we will have discussion on display device technology. We will also learn about
graphics processing unit or GPU in brief. Then, we will mention how the 3D pipeline is
implemented on graphics hardware.
And finally, we will learn about OpenGL, which is a library provided to ease graphics
software implementation. So, we will start with how a graphic system architecture looks.
Remember that we have already introduced a generic system architecture in our introductory
lectures. We will quickly recap and then try to understand it with the new knowledge that we
have gained in our previous discussions.
(Refer Slide Time: 05:37)
So, if you recollect, in the generic system architecture we have
several components, as shown in this figure. We have the host computer, which issues
commands and accepts interaction data. Then we have the display controller, which is a
dedicated graphics processing unit and which may take input from input devices. The output
of this display controller is stored in video memory, and this video memory content is used
by the video controller to render the image on the screen; that is what we have briefly learned
about earlier.
(Refer Slide Time: 06:41)
But as may be obvious, the terms that we used were very broad, they give some generic idea
without any details.
(Refer Slide Time: 06:55)
In the last few lectures, we have learned about new things: how the pipeline is organized,
what the algorithms are and what they do. So, in light of that new knowledge, let us try to
understand the relationship between these hardware components and the pipeline stages. Let
us assume that we have written a program to display two objects on the screen.
It is a very simple image having only a ball and a cube, something like this. So, this is the
screen; here we will show a ball, maybe with some lines, and a cube. So, we want to show
these two objects as an image on the screen, and we have written a program to do that. Then
let us try to understand with respect to the generic architecture, what happens.
(Refer Slide Time: 08:25)
Once the CPU detects that the process involves graphics operations, because here display is
involved, it transfers the control to display controller. In other words, it frees itself from
doing graphics related activities, so that it can perform other activities. Now, the controller
has its own processing unit separate from CPU, which is called GPU or graphics processing
unit. We will learn in more details about the GPU in a later lecture.
Now, these processing units can perform the pipeline stages in a better way. So, there are
specialized instructions using which the stages can be performed on object definition by the
GPU to get the sequence of bits. So essentially, conversion of the object definition to
sequence of bits is performed by GPU with the use of specialized instructions.
(Refer Slide Time: 09:58)
Now, this bit sequence is stored in frame buffer, which we have already mentioned before. In
case of interactive systems, where user can provide input, frame buffer content may change
depending on the input that is coming from the input devices.
(Refer Slide Time: 10:21)
But we must keep in mind that frame buffer is only a part of the video memory. It is not the
entire video memory. We also require other memory to store object definitions as well as to
store instructions for graphics operations, that means the code and data part. So, that is what
constitute video memory, we have frame buffer as well as other memory to store various
things.
(Refer Slide Time: 11:04)
Now, how do we organize this memory? There are two ways. We can integrate the memory into the
generic architecture as shared system memory, that is, a single memory shared by both
CPU and GPU. Clearly, to access this memory we need to use the common system bus, as
shown in this figure, so execution may be slower.
So, in this figure, as you can see, we have the CPU, and the GPU here as part of the display controller,
and we have a common system memory which both of them access through this bus. If the GPU
wants to access it, it has to use the system bus; if the CPU wants to access it, it also has to use the
system bus, and accordingly it may be slow.
(Refer Slide Time: 12:13)
Otherwise, we can have dedicated graphics memory, which can be part of the graphics
controller organization. As shown here, the display controller has exclusive access to this
dedicated graphics memory or video memory. This memory has two
components: one is the memory containing other things and the other is the memory called the frame
buffer. And here there is no need to access shared memory through the common
system bus, so it is faster compared to the previous scheme.
(Refer Slide Time: 13:09)
Now, once the data is available in the frame buffer, the video controller acts on the frame buffer
content. Acting here means that the frame buffer content is mapped by the video controller to the
activation of corresponding pixels on the screen. For example, in case of a CRT, activation refers to
excitation, as we have seen earlier, by an appropriate amount of the corresponding phosphor dots
that are there on the screen.
Now, how to choose the appropriate amount? This amount of excitation is determined by
electron beam intensity, which in turn is determined by voltage applied on electron gun,
which in turn is determined by the frame buffer value. So, this is how this frame buffer value
affects the amount of excitation in case of CRT, and similar thing happens with respect to
other devices as well.
So, that is in summary, how we can understand the generic system architecture in light of the
stages that we have learned. So, we can relate the stages to the ultimate generation of image
on the screen at a very broad level as we have just discussed. Now, let us try to have a more
detailed understanding of different graphics hardware and software. So, we will start with
graphics input and output devices.
(Refer Slide Time: 15:18)
Let us start with the output devices. Now, as we all know, whenever we talk of a graphics
output device, what immediately comes to mind is the video monitor or the so-called
computer screen. But there are other output devices as well. For example, output also means
projectors, where we project the content. Of course, as we all know, both can be present together in
a graphics system, both the monitor as well as a projector.
In addition, there is a third mode of output, that is, hardcopy output. We are already
familiar with these devices: one is the printer, the other is the plotter. Also, nowadays we have wearable
displays such as head mounted displays or HMDs, which are not traditional computer screens
but also provide a way to display output. So, outputs are available in different
ways.
(Refer Slide Time: 16:38)
In this lecture, we will talk about video monitors and hardcopy outputs, namely printers and
plotters in brief. We will start with video monitor.
(Refer Slide Time: 17:00)
Now, whatever screens we see nowadays are all called flat panel displays. This is a generic
term used to represent displays that are flat, as compared to earlier CRTs, which used to be
bulky. They are thinner and lighter compared to CRTs, of course, and useful for both
non-portable and portable systems. And they are almost everywhere: desktops, laptops, palmtops,
calculators, advertising boards, video-game consoles, wristwatches and so on. Everywhere, we
get to see flat panel displays. Now, there is a wide variation in these displays.
(Refer Slide Time: 17:54)
Flat panel effectively is a generic term, which indicates a display monitor having a much
reduced volume, weight and power consumption compared to CRT. So, whenever we are
talking of flat, it has to be understood in the context of CRT.
(Refer Slide Time: 18:22)
Now there are broadly two types of flat panel displays, one is emissive display, other one is
non-emissive displays.
(Refer Slide Time: 18:38)
In case of emissive displays, which are often known as emitters, what happens is that these
displays convert electrical energy into light on the screen. Examples are plasma panels, thin-film
electroluminescent displays and light emitting diodes or LEDs; these are all emissive
displays.
(Refer Slide Time: 19:15)
In case of non-emissive display, what happens is that such displays convert light which may
be natural or may come from other sources to graphics pattern on screen through some
optical effects, this is important. Example is LCD or liquid crystal displays.
(Refer Slide Time: 19:49)
Let us go into a bit more details of these types of displays. We will start with emissive
display.
(Refer Slide Time: 19:57)
As we mentioned, one example of emissive displays is the plasma panel display. In such
displays, we have two glass panels or plates placed parallel to each other, as shown in this figure.
And the region in between is filled with a mixture of gases: xenon, neon and
helium. So, this is the inside region between the two parallel glass plates, which is filled
with gases.
(Refer Slide Time: 20:45)
Now, the inner walls of each plate contain set of parallel conductors. And these conductors
are very thin and ribbon shaped. As shown here, these are set of parallel conductors, these are
also sets of parallel conductors. The conductors are placed on the inner side of the plate.
(Refer Slide Time: 21:22)
And as shown in this figure, one plate has set of vertical conductors, whereas the other
contains a set of horizontal conductors. The region between each corresponding pair of
conductors that means horizontal and vertical conductors is defined as a pixel. So, the region
in between these parallel conductors is called a pixel as shown here.
(Refer Slide Time: 22:06)
Now, the screen side wall of the pixel is coated with phosphors. For RGB or colour displays,
we have 3 phosphors corresponding to RGB values.
(Refer Slide Time: 22:32)
Now, what happens? The effect of the image displayed on the screen happens due to ions that
rush towards the electrodes and collide with the phosphor coating. When they collide, they emit
light, and this light gives us, as in the case of a CRT, the perception of the image. The
separation between pixels is achieved by the electric fields of the conductors. That is how
plasma panels work.
(Refer Slide Time: 23:22)
Then we have LEDs or light emitting diodes, which are another type of emissive device. In this
case, each pixel position is represented by an LED, so the overall
display is a grid of LEDs corresponding to the pixel grid. Now, this is different from the plasma
panel, as you can see, where we did not have such grids; instead, ions collide with phosphors
and produce light.
Now, based on the frame buffer content, a suitable voltage is applied to each diode in the grid
to emit an appropriate amount of light. This is again similar to the CRT, where we use the frame buffer
content to produce an electron beam of suitable intensity, which in turn produces a suitable intensity
from the phosphors.
(Refer Slide Time: 24:46)
Let us now try to understand non-emissive displays. An example is the LCD or liquid crystal
display. Here, like the plasma panel, we have two parallel glass plates, each having a
light-polarizing material, with the two polarizers aligned perpendicular to each other. Rows of horizontal
transparent conductors are placed on the inside surface of one plate, which has the vertical polarizer,
and columns of vertical transparent conductors on the other plate, which has the horizontal
polarizer.
(Refer Slide Time: 25:45)
Now, between the plates we have a liquid crystal material. This material refers to special
types of materials that have a crystalline molecular arrangement even though they flow like
liquids. LCDs typically contain threadlike, or nematic, crystalline molecules, which tend to
align along their long axes.
(Refer Slide Time: 26:27)
The intersection points of each pair of mutually perpendicular conductors define the pixel
positions. When a pixel position is active, molecules are aligned.
(Refer Slide Time: 26:52)
Now, this LCD can be of 2 types, reflective and transmissive.
(Refer Slide Time: 27:01)
In case of a reflective display, external light enters through one polarizer and gets
polarized. Then the molecular arrangement ensures that the polarized light gets twisted, so
that it can pass through the opposite polarizer. And behind that polarizer, a reflective surface
reflects the light back to the viewer. So here, it depends on external light.
(Refer Slide Time: 27:42)
In case of a transmissive display, we have a light source present on the backside of the screen,
unlike reflective displays where there is no light source. Light from the source
gets polarized after passing through the polarizer, is then twisted by the liquid crystal molecules,
and passes through the screen-side polarizer to the viewer. Here, to deactivate a pixel, a voltage
is applied to the intersecting pair of conductors, which causes the molecules in the pixel region
to get rearranged.
This arrangement prevents the polarized light from getting twisted and passing through the opposite
polarizer, effectively blocking the light. So, we do not get to see any colour at
those pixel locations. So, the basic idea in liquid crystal displays is that we have liquid
crystal in between the pixel positions; due to the molecular arrangement, light passes through or
gets blocked, and we get the image on the screen accordingly.
(Refer Slide Time: 29:20)
Another thing to note here is that both reflective and transmissive LCDs of this kind are also known
as passive matrix LCD technology.
(Refer Slide Time: 29:34)
In contrast, we also have active matrix LCD technology, which is another method of
constructing LCDs. In this case, thin film transistors or TFTs are placed at each pixel location
to have more control over the voltage at those locations, so they are more sophisticated. These
transistors also help prevent charge from gradually leaking out of the liquid crystal cells. So,
essentially, in case of passive matrix LCDs we do not have explicit control at the pixel locations,
whereas in case of active matrix LCDs we have transistors placed at those locations to have
more control over the way light passes.
(Refer Slide Time: 30:31)
Now, let us try to understand output devices, graphic output devices.
(Refer Slide Time: 30:41)
So, as we said, when we talk of output devices, one type is the display screen and the other is
hardcopy devices; display screens we have already discussed. Among hardcopy output devices we
have printers and plotters. In case of printers, there are broadly two types: impact printers and
non-impact printers.
(Refer Slide Time: 31:14)
Now, in case of impact printers, pre-formed character faces are pressed against an inked
ribbon on the paper. An example is the line printer, where typefaces mounted on a band, chain,
drum or wheel are used, and these typefaces are pressed against an ink ribbon on the paper.
So, in case of a line printer, a whole line gets printed at a time.
(Refer Slide Time: 32:00)
There is also the character printer, in which case one character is printed at a time; an example is the dot
matrix printer. Although they are no longer very widespread nowadays, in a few cases
they are still used. In such printers, the print head contains a rectangular array or matrix of
protruding wire pins or dots. The number of pins determines the print quality; a higher number
means better quality. Now, this matrix represents characters, and each pin can be retracted
inwards.
(Refer Slide Time: 33:28)
During printing, some pins are retracted, whereas the remaining pins press against the ribbon
on the paper, giving the impression of a particular character or pattern. So here, the objective is
to control the pins or dots: which pins to let impact the ribbon and which pins to pull
back inwards. Those are impact printers. More popular nowadays are non-impact printers,
with which we are all familiar. These include laser printers, inkjet printers, electrostatic methods and
electrothermal printing methods.
(Refer Slide Time: 33:51)
In case of a laser printer, what happens? A laser beam is applied to a rotating drum coated with a
photoelectric material such as selenium. Consequently, a charge
distribution is created on the drum due to the application of the laser beam. Toner is then
applied to the drum and gets transferred to the paper. So, due to the charge distribution,
the toner forms a pattern, the pattern of what we wanted to print, and that gets transferred to the
paper.
(Refer Slide Time: 34:41)
That was the laser printing technology. In case of inkjet printers, what happens? An electrically
charged ink stream is sprayed in horizontal rows across a paper wrapped around a
drum. Electrical fields deflect the charged ink stream, so that dot matrix patterns of ink are
created on the paper. So essentially, there is an ink stream which gets deflected by the electrical
field and then creates the desired pattern on the paper, which is wrapped around a drum.
(Refer Slide Time: 35:40)
Then we have electrostatic printer. In this case, a negative charge is placed on paper at
selected dot positions one row at a time. Now, the paper is then exposed to positively charged
toner, which gets attracted to the negatively charged areas, producing the desired output.
(Refer Slide Time: 36:14)
And finally, we also have electrothermal methods of printing. In this case, heat is applied to
selected pins of a dot matrix print head, and the print head is used to put patterns on heat-sensitive
paper. Of course, these two types are not as common as laser and inkjet printers,
but they are still used. That is about how printers work.
(Refer Slide Time: 36:50)
So far, we have not mentioned anything about colour printing. We will quickly try to
understand how colour printing works. In case of impact printers, different coloured ribbons are
used to produce colour printing, but the range of colours and the quality are usually
limited; they are much better in case of non-impact printers.
Here, colour is produced by combining three colour pigments: cyan, magenta and yellow. In
case of laser and electrostatic devices, these three pigments are deposited on separate passes. In
case of inkjet printers, these colours are shot together in a single pass along each line. So,
colour printing works differently for different printers.
(Refer Slide Time: 37:58)
Apart from printers, we also have plotters as another graphics output device. They are
hardcopy outputs. And typically, they are used to generate drafting layouts and other
drawings.
(Refer Slide Time: 38:19)
This figure shows one example plotter. Typically, in pen plotters, one or more pens
are mounted on a carriage or crossbar which spans a sheet of paper. This paper can lie
flat or be rolled onto a drum or belt, and it is held in place with clamps.
It can also be held in place with a vacuum or an electrostatic charge. As shown here, there is
a pen, a pen carriage and a moving arm, and there are spare pens as well, for different
colours. So, the pen can move along the arm, and the arm can move across the page.
(Refer Slide Time: 39:36)
To generate shading or styles, different pens can be used with varying colours and widths as
shown here.
(Refer Slide Time: 39:55)
And as I already mentioned, the pen-holding carriage can move; it can also be stationary,
depending on the nature of the plotter.
(Refer Slide Time: 40:20)
Sometimes, instead of a pen, inkjet technology is also used; that means instead of a pen, ink
sprays are used to create the drawing.
(Refer Slide Time: 40:36)
And how is this movement controlled? Again, it depends on the content of the frame buffer.
So, depending on the frame buffer values, the movement of the pens or the ink spray is
determined, just like in the case of video monitors. So, we have learned, in brief, about two types of
graphics output devices, namely video monitors and hardcopy outputs. Let us now try to
quickly understand the input devices: what kinds of inputs are there and how they affect the
frame buffer.
(Refer Slide Time: 41:22)
Most of the graphics systems that we typically see nowadays provide data input facilities,
which means the users can manipulate screen images. These facilities are provided in terms of
input devices. The most well-known such input devices are the keyboard and the mouse,
but there are many other devices and methods available. Let us have a quick look at all those
different devices and methods.
(Refer Slide Time: 42:04)
So, in the modern-day computing environment, as we know, we are surrounded by
various computing devices. We have laptops, desktops, tablets, smartphones, smart TVs,
microwaves, washing machines, pedometers and many more such devices that we interact with
every day, each of which can be termed a computer by the classical definition of a computer.
And accordingly, we make use of various input devices to provide input to these computers.
(Refer Slide Time: 43:02)
So, all these input devices or input methods can be divided into broad categories. The
objective of such devices is to provide means for natural interaction. They include
speech-based interaction, which means the computers are equipped with speech recognition and
synthesis facilities.
In case of speech recognition, the computer can understand what we say: we provide
input through our voice, and a speech recognition system understands what we
say. The computer can also produce output in terms of speech, human-understandable
speech, through the synthesis method.
Note that this is different from the input and output we have mentioned earlier. Then we
have eye-gaze interaction, where we use our eye gaze to provide input, and haptic or touch
interaction; one example is the touchscreen, which we use heavily nowadays because of
smartphones and tablets.
There are alternative output mechanisms also, exploiting the sensation of touch. These are
called tactile interfaces. Here, we do not rely on a visual display; instead we go for
tactile interfaces. These are primarily useful for people who have difficulty seeing.
(Refer Slide Time: 44:57)
We can also have "in air" gestures to provide input. These gestures can be made
using any of our body parts, like hands, fingers or even the head, and there is no need to touch
any surface, unlike smartphones or touchscreen devices, where we provide gestures
by touching the surface.
We can also provide input through our head or body
movements. So, all these are input mechanisms that are heavily used nowadays. Traditional
input mechanisms like the keyboard, mouse, joystick and stylus are no longer very popular; instead
we mostly interact with the computers that we see around us through touch, through gestures,
through speech and so on. All these devices are equipped to recognize such
input mechanisms.
And also, as I said, output need not always be visible; sometimes it can be different, as
in the case of tactile output, where we perceive the output through touch rather than by seeing
anything. There also, the frame buffer content can be utilized to create the particular
sensation of touch to give us a specific output. We can also provide output through speech,
speech synthesis to be more precise, and so on.
(Refer Slide Time: 46:58)
Now, these inputs can be used to alter the frame buffer content. For example, suppose I have created
an image of a cube and a ball, as in the example we started with. Now I give a voice command
to place the ball on the left side of the cube; the computer will understand this
command and accordingly modify the frame buffer values, so that the ball is now placed on
the left side of the cube.
Similarly, I can also give a command to place the ball on the right side of the cube. Again,
the frame buffer values will change, so that the display we get is an image showing
the ball on the right side of the cube, and so on. So, with these inputs we can change the
output. That is, in brief, how we can provide input and how it affects the frame buffer
content to get different outputs.
(Refer Slide Time: 48:25)
Whatever I have discussed today can be found in this book; you may refer to chapter 10,
sections 10.1 and 10.2. So today, we briefly discussed the different
technologies that are used in computer graphics systems, namely the display technologies,
the hardcopy output technologies and the input technologies.
In the next lecture, we are going to go deeper into the graphics hardware and learn
more about how the controller works, how the GPUs in the controller are organized and how they
help implement the pipeline stages. See you in the next lecture. Thank you, and goodbye.
Computer Graphics
Professor Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 30
Introduction to GPU and Shaders
Hello and welcome to lecture number 30 in the course, computer graphics. We have reached
almost the end of the course.
(Refer Slide Time: 00:45)
In the previous lecture we learned about basic graphics hardware: the graphics input and output
devices and the general architecture of a graphics system. Now, we will continue our discussion
on graphics hardware, and today we will cover the basics of the GPU and GPU programming,
which are part of the graphics hardware.
(Refer Slide Time: 01:21)
So, let us start with GPU. What it is and how it is used to implement the pipeline.
(Refer Slide Time: 01:40)
One thing we should note is that graphics operations are highly parallel in
nature. That is a very crucial characteristic of graphics operations. Consequently, there is a
need to go for parallel processing of these operations.
(Refer Slide Time: 01:56)
For example, consider the modeling transformation stage. Remember what we do in this stage?
We convert or transform objects defined in their own or local coordinate system to a world
coordinate scene. How do we do that? We apply transformations, for example rotations, to
the vertices that define the objects.
(Refer Slide Time: 02:34)
And what is a transformation? If you recollect from our earlier discussions, we define a
transformation as a multiplication of two things: one is a transformation matrix and the other
is a vertex vector.
The thing to note here is that the same matrix-vector multiplication is done for all the vertices
that we want to transform. That means we are essentially performing the same operation,
a multiplication, for all the vertices.
(Refer Slide Time: 03:26)
Now, we are given a set of vertices that define the objects. We can go for serial
multiplication, where we perform one matrix-vector multiplication at a time. However, that
is not going to be very efficient.
Because we are essentially performing the same operation, instead of going for serial
multiplication, if we can perform the same operation on all the vectors at the same time, that is,
in parallel, then we are going to have a significant gain in performance. And this is very
important in real-time rendering of scenes, because typically we need to process millions of
vertices per second. Therefore, if we can process all these
millions of vertices in parallel, then we are going to have a huge gain in performance.
(Refer Slide Time: 04:40)
If we perform these operations using our CPU, then we cannot take advantage of this
inherent parallel nature of graphics operations, because CPUs are not designed for that. In
order to address this issue, that is, to take advantage of this inherent parallelism,
special purpose hardware comes with our systems.
Almost all graphics systems come with a separate graphics card containing its own
processing unit and memory elements. This separate specialized hardware is
called the graphics processing unit or GPU. So, essentially, a GPU is specialized hardware
used to perform graphics operations by exploiting the inherent parallelism of those
operations.
(Refer Slide Time: 06:08)
(Refer Slide Time: 06:24)
Let us go into the working of the GPU. We have to note that a GPU is a multicore system,
which means it contains a large number of cores or unit processing elements. Each of
these cores or unit processing elements is called a stream processor, because it works
on data streams, streams of input data.
(Refer Slide Time: 06:55)
Now, these cores are simple hardware capable of performing only simple integer and
floating-point arithmetic operations. So, each core can perform arithmetic operations
only, either integer or floating-point. And multiple cores are grouped
together to form another unit called a streaming multiprocessor or SM. So, each core is called a
stream processor, and many such cores are grouped together to form streaming
multiprocessors.
(Refer Slide Time: 07:43)
Now, this brings us to the idea of SIMD; note the term. To understand it, let us consider one
example, the geometric transformation of vertices that we were discussing earlier. Here our
instruction is the same, a multiplication. The data on which this instruction operates
varies, because the vertex vectors vary, although the transformation matrix remains the
same. So, what we are doing here is having a single instruction working on multiple
data.
This is the idea of SIMD or Single Instruction Multiple Data, and the GPU streaming
multiprocessors are essentially examples of SIMD. How does it work? As you can see, we
have the same instruction given to all the cores, and the cores take data which may be different,
but the instruction is the same and operates on the different data
streams. Another illustration of this idea is given here: if we do not consider SIMD, then
what happens? We have two addition operations on two data streams,
A0, B0 and A1, B1, so there will be two separate instructions for performing these two separate
additions, giving the outputs C0 and C1; this is normal operation. In case of SIMD,
we have these as data streams and a single instruction. So, in the first case we have two
instructions working on two separate data streams, whereas with SIMD we have a single
instruction working on both data streams to give us the desired output. That is the idea of
SIMD. So, you now know that GPUs contain SMs or streaming multiprocessors, which work
based on the idea of SIMD.
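As an aside, the same SIMD idea can be illustrated on an ordinary CPU using SSE intrinsics, where one add instruction operates on four pairs of values at once. This is only an analogy for how a streaming multiprocessor applies a single instruction to multiple data streams, not actual GPU code, and the specific values are assumptions for illustration.

    #include <stdio.h>
    #include <xmmintrin.h>      /* x86 SSE intrinsics */

    int main(void)
    {
        __m128 A = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);     /* A0..A3 */
        __m128 B = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f); /* B0..B3 */
        float C[4];

        /* a single add instruction produces all four results C0..C3 */
        _mm_storeu_ps(C, _mm_add_ps(A, B));

        printf("%g %g %g %g\n", C[0], C[1], C[2], C[3]);
        return 0;
    }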
(Refer Slide Time: 10:25)
Then let us have a look at how GPUs are organized. As I said, each streaming
multiprocessor is designed to perform SIMD operations, and we have many such
streaming multiprocessors, as shown here. Then we have some specialized memory, the purpose
of which will be explained shortly, and other components to manage this parallel
processing. Each streaming multiprocessor contains multiple streaming processors or cores,
as shown here, and each core is capable of performing only simple integer or floating-point
arithmetic operations.
(Refer Slide Time: 11:33)
So, that is broadly what is there in a GPU: streaming multiprocessors and dedicated memory
units, plus additional components to manage the parallel processing. Now, let us try to
understand how the graphics operations work in the GPU. The first thing we should note is that
most real-time graphics systems assume that the scene is made of triangles. So, we actually
convert any surface into triangles or triangular meshes; this point we have already discussed
earlier when we were talking about object representation.
(Refer Slide Time: 12:34)
Now, given that triangular mesh information, what happens is that, through dedicated APIs
provided by graphics libraries such as OpenGL or Direct3D, these triangles are
sent to the GPU one vertex at a time, serially, and the GPU assembles them into triangles.
(Refer Slide Time: 13:03)
Also, we should note here that the vertices are represented in the homogeneous coordinate
system. And since we are dealing here with the very first stage, that is object definition, these
objects are defined in their local or modeling coordinate systems. Then the GPU performs all the
stages: first it performs the modeling transformation on the vertices, which is the first stage of
processing.
(Refer Slide Time: 13:54)
And as we have explained earlier, this transformation is achieved with a single transformation
matrix and vertex vector multiplication operation.
(Refer Slide Time: 14:15)
As we have noted earlier, the multicore GPU performs such operations simultaneously, in
parallel, so multiple vertices are transformed at the same time; it is not the case that we perform
the multiplications one after another. What we get after the multiplications is a stream of
triangles, but this time they are defined in the world coordinate
system, which is the purpose of the modeling transformation stage. It is also assumed that the
viewer is located at the origin of the world coordinate system and the view direction is aligned
with the z axis; this is the assumption with which the hardware is designed.
(Refer Slide Time: 15:17)
So, after the modeling transformation, the GPU computes the vertex colours, that is, the lighting
stage is performed. This is done based on the light defined for the scene: some light
source is assumed, and based on that light source the colouring is done. Why is the GPU
suitable for computing colours? Because, if you recollect our discussion on lighting, we
noted that colour can be computed with vector dot products and a series of addition
and multiplication operations. These operations are performed simultaneously for
multiple vertices by the GPU, because it is designed that way; so again we are
exploiting the inherent parallelism of graphics operations.
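As a reminder of why lighting maps so well to this hardware, here is a small C sketch of a per-vertex diffuse (Lambertian) term, just a dot product followed by a multiplication. The function names and the assumption that the vectors are already normalized are illustrative; this is only one piece of the lighting model discussed earlier, not the lecture's code.

    typedef struct { float x, y, z; } Vec3;

    static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    /* diffuse intensity = kd * max(0, N . L); the same arithmetic is
     * repeated for every vertex, so the GPU can do many at once */
    float diffuse(Vec3 normal, Vec3 to_light, float kd)
    {
        float d = dot(normal, to_light);
        return (d > 0.0f) ? kd * d : 0.0f;
    }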
(Refer Slide Time: 16:38)
After colouring, each coloured 3D vertex is projected onto the view plane. That again is
done using a matrix-vector multiplication, as we have already noted during our
discussion on projection transformation, and the output we get is a stream of triangles in the
screen or device coordinates, ready to be converted to pixels. Note here that this
projection actually involves the view transformation as well, which we have not explicitly
mentioned here, and also the window-to-viewport transformation. All these transformations
can be clubbed together by multiplying the corresponding transformation matrices to get a single
transformation matrix.
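Clubbing transformations together is itself just a matrix-matrix multiplication. A small C sketch is shown below (illustrative, not lecture code); the combined matrix would then be applied to every vertex with a single matrix-vector product, as in the earlier sketch.

    /* C = A * B for 4x4 matrices; e.g. combined = Projection * View * Model,
     * with the matrices applied to a vertex from right to left. */
    void mat4_mul(const float A[4][4], const float B[4][4], float C[4][4])
    {
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++) {
                C[i][j] = 0.0f;
                for (int k = 0; k < 4; k++)
                    C[i][j] += A[i][k] * B[k][j];
            }
    }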
(Refer Slide Time: 17:46)
So, after that stage we get the device-space triangles, and we now go for rasterization or scan
conversion. Here it may be noted that each device-space triangle overlaps some pixels on the
screen, meaning those pixels are part of the triangle. In the rasterization stage these
pixels are determined.
(Refer Slide Time: 18:20)
Now, the GPU designers who developed GPUs over the years incorporated many such
rasterization algorithms; we have already discussed a few in our lectures on rasterization.
These algorithms exploit one crucial observation: each pixel can be treated
independently of all other pixels. So, it is not necessary to treat the pixels as dependent on
each other; they can be treated independently.
(Refer Slide Time: 19:01)
Accordingly, the pixels can be rasterized in parallel; we can use this inherent parallelism to
rasterize all the pixels simultaneously. And that is one big advantage of having a GPU: we do
not have to process one pixel at a time; instead we can process all the pixels together to get the
output quickly.
(Refer Slide Time: 19:36)
Now, if you recollect the 3D graphics pipeline, during the pixel processing stage there are
two more activities: one is surface texturing, or assigning patterns to the surface colours, and
the second is hidden surface removal or HSR.
(Refer Slide Time: 20:03)
The surface texturing idea is very simple: there is a texture image which is imposed on the
surface to give us the illusion of detail. Note that it only creates an illusion
rather than actually computing a texture pattern; we simply replace pixel colours
with texture colours. That is the simplest idea, which we discussed earlier.
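A minimal C sketch of this "replace the pixel colour with the texture colour" idea is shown below; the nearest-neighbour lookup, the structure names and the row-major texture layout are assumptions made for illustration only.

    typedef struct { unsigned char r, g, b; } Texel;

    /* Return the texel nearest to texture coordinates (u, v) in [0, 1];
     * this colour then simply replaces the pixel colour. */
    Texel texture_lookup(const Texel *tex, int width, int height,
                         float u, float v)
    {
        int x = (int)(u * (width  - 1) + 0.5f);   /* nearest texel column */
        int y = (int)(v * (height - 1) + 0.5f);   /* nearest texel row    */
        return tex[y * width + x];                /* row-major storage    */
    }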
(Refer Slide Time: 20:40)
Now, in order to do that, we need to store these texture images or texture maps. And since we need
to access the texture images frequently, ideally they should be stored in high-speed
memory so that the access time is low. This is because, as we said earlier, pixel calculations
are very frequent, and each pixel calculation may need to access the texture images. Secondly, the
access is usually very regular in nature: nearby pixels tend to access nearby
texture image locations. So, to reduce the access time, a specialized memory
cache is used to store the texture images, as shown here in this figure. These are specialized
memory locations in the GPU for storing texture images.
(Refer Slide Time: 22:05)
Also, we discussed earlier, during our discussion on hidden surface removal, the idea of the
Z-buffer or depth buffer algorithm. That is implemented in GPUs, and for that also GPUs
are typically equipped with a specialized memory element, the depth buffer. It stores the
distance of each pixel from the viewer. So, that is typically part of the GPU.
(Refer Slide Time: 22:48)
Now, if you recollect how the Z-buffer works, here also the GPU compares the new pixel's
distance with the distance of the pixel already present, that is, it simply executes the algorithm, and the
display memory is updated only if the new pixel is closer. So, it implements the Z-buffer
algorithm; for more details you may refer to lecture 23. So, you have the streaming
multiprocessors, each containing cores, then the various data items for performing these
simultaneous operations, plus the specialized texture storage; these together form the GPU.
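The per-pixel test the GPU performs might look like the following C sketch; the buffer layout and the names are assumptions for illustration, and the real hardware implements this in fixed-function or highly optimized form.

    /* Depth test for one pixel: the new fragment replaces what is stored
     * only if it is closer to the viewer (smaller depth value). */
    void depth_test(float *depth_buf, unsigned int *color_buf, int index,
                    float new_depth, unsigned int new_color)
    {
        if (new_depth < depth_buf[index]) {   /* closer than stored pixel */
            depth_buf[index] = new_depth;     /* update depth buffer      */
            color_buf[index] = new_color;     /* update display memory    */
        }
    }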
(Refer Slide Time: 24:07)
Now, there is one more concept, the idea of shaders and shader programming. Let us
try to understand this programming concept in brief, at a very introductory level. In our earlier
discussion on the GPU, we discussed how GPUs implement the pipeline stages.
In that discussion, as you may have noted, there are two broad groups of activities: one is the
processing of vertices, or vertex processing, also called geometry processing;
the other is the processing of pixels. These two broad groups of activities were discussed to
explain the working of the GPU.
(Refer Slide Time: 24:54)
Now, during the early years of GPUs, they used to come with a fixed-function hardware
pipeline, meaning that all the stages that implement the pipeline were
pre-programmed and embedded into the hardware. The GPU contained dedicated components for
specific tasks, and the user had no control over how a task should be performed or which
processing unit performs which stage of the pipeline.
So, earlier GPUs used to come with fixed-function hardware: everything was
predetermined, including which component of the GPU would deal with which part of the pipeline, and
the user had no control over it. The flow was typically like this: from the user program the
primitives were sent, then there were components for geometry processing whose output is 2D
screen coordinates, from there pixel processing starts, and those components were again fixed.
(Refer Slide Time: 26:11)
But then people realized that this was actually reducing flexibility, and because of that the power of the
GPU was not fully utilized. To leverage the GPU power better, modern GPUs are designed to
be programmable, that means we can program them. Fixed-function units are replaced by a
unified grid of processors known as shaders. So, where earlier there were fixed-function units, now
there is a unified grid of processors, which are called shaders.
(Refer Slide Time: 27:05)
Any processing unit can now be used for performing any pipeline stage calculation, and the
GPU elements, that is, the processing units and memory, can be reused through user programs.
So, earlier we had fixed units for performing different stages; now we have common facilities
which are reused to perform different stages, and it is determined through programming
which portion of the GPU elements, namely the processing units and memory, is used
for performing the operations related to a particular stage, and how. The idea is shown here: once
the primitives are sent to the GPU, the GPU has common elements, and subsets of these
common elements are used for different purposes; as you can see, the memory is also
shared and reused.
(Refer Slide Time: 28:35)
Now, the idea is that we write programs to use GPU elements, these programs are called
shader programs and the corresponding approach is called shader programming. Let us
briefly go through the basics of shader programming.
(Refer Slide Time: 29:01)
Now, with the programmable GPUs that we have just introduced, it is possible for a programmer
to modify how the GPU hardware processes vertices and shades pixels; shading means assigning
colour to the pixels.
This is possible by writing vertex shaders and fragment shaders, also called vertex
programs and fragment programs. These are terms that you have probably come across;
they are used to specify to the GPU how to use its hardware for a specific purpose. And this
approach, as I said, is known as shader programming; it has other names also, such as GPU
programming, graphics hardware programming and so on.
(Refer Slide Time: 30:12)
In case of vertex shaders, these programs are used to process vertices, or the geometry.
Essentially, these programs are used to perform the modeling transformations,
lighting, and projection to screen coordinates, which involves the intermediate
view transformation and, conceptually, the window-to-viewport transformation
as well.
(Refer Slide Time: 30:52)
A fragment shader does a different job: these are programs that perform the
computations required for pixel processing. What are those computations? They are related
to how each pixel is rendered, how texture is applied (texture mapping), and whether to draw
a pixel or not (hidden surface removal). So, these three are the tasks done by fragment
shaders; note that all three are related to the processing of pixels. Vertex shaders deal with the
processing of vertices, mostly the transformations from modeling coordinates to device
coordinates and all the transformations in between,
whereas fragment shaders deal with pixel processing, that is, rendering of pixels, applying
textures, and performing hidden surface removal at the pixel level.
(Refer Slide Time: 32:14)
Now, why are the pixel processing units called fragment shaders? The name implies that the GPU at any
instant processes a subset, or fragment, of all the screen pixels. So, at a time
a subset of the screen pixels is processed; hence it is called a fragment shader.
(Refer Slide Time: 32:43)
Now, these shader programs are small pieces of code which are embedded in user programs,
sent to the graphics hardware by the user programs, essentially by calling some APIs, and
executed on the graphics hardware. So, we should keep this in mind: they are small pieces of
code that are executed on the graphics hardware.
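To make this concrete, here is a minimal, hedged sketch in C of how such shader code is typically handed to the GPU through the OpenGL 2.x API; glCreateShader, glShaderSource, glCompileShader, glCreateProgram, glAttachShader and glLinkProgram are standard OpenGL calls, while the two one-line shaders, the use of GLEW to obtain the entry points, and the omission of error checking are assumptions made for illustration, not code from the lecture.

    #include <GL/glew.h>   /* assumed: an extension loader provides GL 2.x entry points */

    /* Tiny vertex and fragment shaders written as C strings (legacy GLSL):
     * the vertex shader transforms the vertex, the fragment shader paints red. */
    static const char *vsrc =
        "void main() { gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex; }";
    static const char *fsrc =
        "void main() { gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); }";

    /* Compile both shaders and link them into one program object. */
    GLuint build_program(void)
    {
        GLuint vs = glCreateShader(GL_VERTEX_SHADER);
        GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
        glShaderSource(vs, 1, &vsrc, NULL);
        glShaderSource(fs, 1, &fsrc, NULL);
        glCompileShader(vs);
        glCompileShader(fs);

        GLuint prog = glCreateProgram();
        glAttachShader(prog, vs);
        glAttachShader(prog, fs);
        glLinkProgram(prog);
        return prog;           /* activate later with glUseProgram(prog) */
    }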
(Refer Slide Time: 33:18)
In fact, this ability to program GPUs gave rise to a new idea, that of the general
purpose GPU or GPGPU. These are common terms nowadays and you have probably
come across them; the idea is that we can use a GPU for any purpose, not necessarily only
to perform graphics-related operations. So, with GPGPU we can perform tasks
that are not related to graphics at all. However, these are very involved subjects and we will
not go any further into these concepts.
(Refer Slide Time: 34:06)
So, in summary what we have learnt today, let us try to recap quickly.
(Refer Slide Time: 34:20)
We learned how the hardware works, that is, the graphics processing units
which are part of the computer systems that deal with graphics operations. We also
learned how the pipeline stages are implemented in the GPU and got introduced to the idea of
shaders and shader programs.
(Refer Slide Time: 35:02)
Now, that is about hardware. So, in the previous lecture and today's lecture we learned about
graphics hardware.
We started with a discussion of the general architecture of a graphics system, a very generic
architecture, then explained different terms, and then in some detail learned how the GPU works.
One component remains: how, as a programmer, we can write a program to perform
graphics operations. That aspect of the course, writing programs to
perform graphics operations or create a scene on the screen, we will learn in the next
lecture, where we will learn about writing programs using OpenGL, which is a graphics
library.
(Refer Slide Time: 36:10)
Whatever we have discussed today you can find in this book, specifically Chapter 10, Section
10.3. That is all for today; see you in the next lecture. Thank you and goodbye.
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 31
Programming with OpenGL
Hello and welcome to lecture number 31 in the course Computer Graphics. This is going to be
our final lecture on the topic. So far, what have we discussed?
(Refer Slide Time: 00:44)
We discussed the pipeline, and currently we are discussing pipeline implementation. In the
earlier lectures we learned about the basic graphics hardware, including the graphics input and
output devices, the GPU or Graphics Processing Unit, and the basic idea of GPU programming.
Today, in this last topic, we are going to learn about programming, that is, how to write graphics
programs; this is essentially the software aspect of computer graphics.
Now, before we learn to program, we will first start with a basic introduction to graphics
software. If you recollect, during our introductory lectures we had a preliminary
introduction, but today we are going to recap as well as expand those discussions.
(Refer Slide Time: 01:55)
As we have mentioned earlier, graphics software is broadly of two types: one is the special
purpose packages and the other is the general programming packages.
(Refer Slide Time: 02:13)
What do we have in the special purpose packages? These are essentially complete software systems
with their own GUIs or user interfaces. For example, a painting system has its own user
interface through which an artist can select objects, select colours, place the objects at the desired
position on the canvas or screen, and change the size, shape and orientation of the objects,
and so on.
And all this the artist can do by interacting with the user interface; the artist need not
know anything about the graphics pipeline or how it is implemented. These are examples of
complete software systems or packages. Another example is the CAD package that we
learned about in the introductory lectures; CAD or Computer Aided Design packages are
primarily used in architecture, medicine, business, engineering and such domains.
(Refer Slide Time: 03:46)
The other type of software is the general programming package. Here we have libraries of
graphics functions that are provided and can be used with a programming language
such as C, C++ or Java, and these functions are meant to help a programmer perform the
pipeline tasks. In other words, they help the programmer implement the pipeline.
An example is OpenGL, which stands for Open Graphics Library. There are also VRML
(Virtual Reality Modeling Language), Java 3D and so on. So, there are many such libraries
provided to implement graphics functions.
(Refer Slide Time: 04:55)
Now, these functions are also known as a computer graphics application programming interface or
CG API. They are essentially a software interface between the programming language and the
hardware. For example, when we want to write an application program in a language, say C,
these library functions allow us to construct and display pictures on the output device;
without these functions we would not be able to do so.
(Refer Slide Time: 05:39)
But one thing we should keep in mind is that the graphics functions are typically defined
independently of the programming language, and access from a particular language is achieved
through a concept called language binding. A language binding is defined for a particular high-level
programming language.
Through such a binding we get the particular syntax to be used for accessing the various graphics
functions from that language. So, essentially, a language binding allows us to use these library
functions from inside a program written in a particular language.
(Refer Slide Time: 06:33)
Now, each language binding is designed to make the best use of the capabilities of a
particular language and to handle various syntax issues such as data types,
parameter passing and error handling. These specifications or language bindings are set by
the ISO or International Organization for Standardization, so we need to know about these standards. We
will have a brief introduction to the different standards used for computer graphics.
(Refer Slide Time: 07:28)
So, what are those standards, software standards that are used in computer graphics?
(Refer Slide Time: 07:32)
Now, why do we need standards? Let us try to understand. When we write a program with
graphics functions, it may be the case that the program is moved from one hardware platform
to another. How will the computer then understand the program if the platform is changed?
That is where we require a standard. Without some standard, which is essentially a commonly agreed
syntax, this movement between platforms will not be possible and we would need to rewrite the whole
program, essentially starting from scratch. So, a standard helps us avoid such a
situation.
(Refer Slide Time: 08:33)
The first graphics standard came in 1984, long ago, and was known as the Graphics Kernel System or,
in short, GKS. It was adopted by ISO as well as many other national standards bodies.
(Refer Slide Time: 08:57)
Then came a second standard, developed by extending GKS, called PHIGS,
which stands for Programmer's Hierarchical Interactive Graphics System. PHIGS again was
adopted by the standards organizations worldwide.
(Refer Slide Time: 09:36)
Now, around the same time that the other standards were being developed, Silicon Graphics Inc.
or SGI started to ship their graphics workstations with a set of routines or library
functions; together these are called the Graphics Library or GL.
(Refer Slide Time: 10:10)
Subsequently this set of functions, GL, became very popular and eventually evolved into
OpenGL in the early 1990s, which has become a de facto graphics standard. This standard is
now maintained by the OpenGL Architecture Review Board, which is a consortium of
representatives from many graphics companies and organizations.
(Refer Slide Time: 10:45)
Now, let us try to understand what is there in OpenGL, what functions it provide and how to use
those functions.
(Refer Slide Time: 10:58)
Let us try to understand OpenGL with respect to one example program, shown here. This program
is meant to display a straight line on the screen. It has been
written by calling OpenGL library functions from C, the C language. Let us try to
understand the syntax of the program.
(Refer Slide Time: 11:30)
So, in order to make use of the library functions, the first thing we should do is include a
header file that contains the library functions; here we have included it with
the statement #include <GL/glut.h>. Now, what does this library name mean?
(Refer Slide Time: 12:04)
The core library of OpenGL actually does not support input and output operations, because those
functions were designed to be device independent, whereas support for I/O must be device
dependent. So, we need to do something about it, because we have to display the line on the
output device, which is essentially a device-dependent operation.
(Refer Slide Time: 12:40)
So, to display, we require auxiliary libraries on top of the core library; this is provided by the
library GLUT, the OpenGL Utility Toolkit, which is what is mentioned in this
include statement.
(Refer Slide Time: 13:14)
Now, GLUT provides a library of functions for interacting with any screen windowing system,
essentially any display device. It allows us to set up a display window on our screen; in this
window we are going to show the image, or whatever we want to display. This display window
is essentially a rectangular area which contains the image, and we can set it up with the help of
functions provided in the GLUT library.
(Refer Slide Time: 14:01)
Now, whichever library functions we use that are part of GLUT, they come with the prefix 'glut'.
(Refer Slide Time: 14:16)
So, essentially these functions provide an interface to the device-specific window systems that we
have already mentioned. We can write device-independent programs using these GLUT
functions, and the functions themselves are used to link our program to the particular device.
(Refer Slide Time: 14:45)
Also, we should note here that the GLUT library is suitable for graphics operations only; for
any other operation we may need to include other header files such as stdio.h or stdlib.h, as
we do in our regular programs.
(Refer Slide Time: 15:12)
Now, let us start with the main function, which is shown here, and let us try to
understand its body. As we said, GLUT allows us to create and manage a display
window, the screen region on which we want to display the line. So, the first thing that is
required is to initialize GLUT with the statement glutInit as shown here; this is the initialization
function required at the beginning.
(Refer Slide Time: 16:00)
After initialization, we can set various options for the display window using the function
glutInitDisplayMode as shown in the second statement. So, what are these options?
(Refer Slide Time: 16:27)
Now, these options are provided as arguments in the form of symbolic GLUT constants, as shown here:
GLUT_SINGLE and GLUT_RGB.
(Refer Slide Time: 16:44)
Now, in this particular call we have used these two arguments,
GLUT_SINGLE and GLUT_RGB. They indicate that a single refresh buffer is to
be used for the display window and that the RGB color mode is to be used for selecting color values.
GLUT_SINGLE is for the first task, the single refresh buffer, and GLUT_RGB indicates that the RGB
color mode is to be used.
(Refer Slide Time: 17:23)
Now, here we should look at the syntax, that is, how this glutInitDisplayMode function is used. In the
constant names which provide the options, we have GLUT as a prefix in all capitals, followed by
an underscore symbol and then the constant name, again in all capitals, as shown here. This is the
particular syntax used to provide the arguments. To combine multiple options we use the
logical OR operation, to indicate that we want both; that is the syntax used to provide the options.
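For illustration, a minimal sketch of these two calls as they might appear at the start of main is given below; argc and argv are simply the arguments of main passed through to glutInit.

glutInit(&argc, argv);                        /* initialize GLUT */
glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB);  /* single refresh buffer, RGB color mode */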
(Refer Slide Time: 18:32)
Then we use the two functions glutInitWindowPosition and glutInitWindowSize. These are used to
provide values different from the default window position and size built into the library. So, if
we want to change the values, we use glutInitWindowPosition, where we specify the position, and
glutInitWindowSize, where we specify the size.
(Refer Slide Time: 19:16)
Now, which position does this window position specify? It specifies the top-left corner of the
window, assuming an integer screen coordinate system with its origin at the top-left corner of the
screen. These are the assumptions when we specify these values.
(Refer Slide Time: 19:45)
Then, in the case of glutInitWindowSize, where we specify the size, the first argument specifies
the width, that is 800, and the second argument specifies the height, that is 600. Both values are in
pixels, so 800 pixels by 600 pixels. So, we have understood these four functions: glutInit,
glutInitDisplayMode, glutInitWindowPosition and glutInitWindowSize.
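A sketch of these two calls is shown below; the 800-by-600 size matches the values mentioned above, while the position values (50, 50) are only illustrative assumptions.

glutInitWindowPosition(50, 50);   /* top-left corner of the display window on the screen */
glutInitWindowSize(800, 600);     /* width = 800 pixels, height = 600 pixels */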
(Refer Slide Time: 20:26)
Next, we create the window and set an optional caption using the function glutCreateWindow;
the caption is provided within the parentheses.
(Refer Slide Time: 20:52)
The next thing we do is specify the picture that is to be displayed in the window, that is, the
line. We have to create the line before we can display it in the window; this creation is
done by a separate, user-defined function which we are calling createLine.
(Refer Slide Time: 21:23)
Now, this createLine function is passed as an argument to another GLUT library function,
glutDisplayFunc, which is shown here. This indicates that the line is to be displayed in the
window. So, with this function we indicate that we are creating a line, which is our image,
using the createLine function, and that this line is to be displayed in the window created
through the earlier statements. But before we do that, certain initializations are required.
(Refer Slide Time: 22:05)
These initializations are performed in the init function shown here. The init function is
used mainly to keep our code clean; we could have organized it differently. We
will come back to this function later.
(Refer Slide Time: 22:37)
So, in order to keep the code clean and to indicate that we want to display a line in the window,
we add these two lines, init and glutDisplayFunc, as shown here.
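These steps might look as follows in the code; the caption string is an illustrative assumption, and createLine is the user-defined drawing routine discussed above.

glutCreateWindow("OpenGL line example");  /* create the display window with an optional caption */
init();                                   /* one-time OpenGL parameter settings (defined later) */
glutDisplayFunc(createLine);              /* register createLine as the routine that draws the window content */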
(Refer Slide Time: 22:53)
Now, all that is done, but the window is still not on the screen; we need to activate it once the
window content is decided. That we do with the function glutMainLoop, which activates all the
display windows created, along with their graphic contents. So, this function glutMainLoop
actually puts the window with its content on the screen.
(Refer Slide Time: 23:29)
This function must be the last one in our program. It puts the program into an infinite loop
because we want the display to persist. In this loop the program waits for inputs from input
devices such as the mouse or keyboard; even if there is no input, the loop ensures that the
picture stays displayed till the window is closed. So, since we want the picture to remain on the
screen until there is some input or the window is closed, we use the loop, and this must be the
last statement of the code in main, after we create the image and put it in the window.
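Putting all of this together, a minimal sketch of the complete main function consistent with the description above could be the following; the caption and the window position values are illustrative assumptions, not taken from the slide.

void init(void);        /* defined later: one-time OpenGL parameter settings */
void createLine(void);  /* defined later: creates and draws the line */

int main(int argc, char **argv)
{
    glutInit(&argc, argv);                        /* initialize GLUT */
    glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB);  /* single refresh buffer, RGB color mode */
    glutInitWindowPosition(50, 50);               /* illustrative window position */
    glutInitWindowSize(800, 600);                 /* 800 x 600 pixel window */
    glutCreateWindow("OpenGL line example");      /* create the window with an optional caption */
    init();                                       /* one-time parameter settings */
    glutDisplayFunc(createLine);                  /* register the drawing routine */
    glutMainLoop();                               /* must be last: enter the infinite display loop */
    return 0;                                     /* never reached in practice */
}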
(Refer Slide Time: 24:23)
Now, as we have noted, we have explained all the functions that are there in main. All of them
start with glut, indicating that they are GLUT library functions, except the two functions init and
createLine. In these two functions we use OpenGL library functions rather than GLUT library
functions, and accordingly their syntax is different.
(Refer Slide Time: 24:57)
Each OpenGL function is prefixed with gl, as we can see in the init function as well as in the
createLine function. Each function starting with the prefix gl indicates that it is an OpenGL
function. Each component word within the function name has its first letter capitalized; for
example, the C in Color is capitalized, the M in Matrix is capitalized, and
so on. So, that is the syntax of an OpenGL library function: it starts with gl, and each component
word within the function name has its first letter capitalized.
(Refer Slide Time: 25:57)
Sometimes a function may require one or more arguments that are assigned symbolic
constants, for example a parameter name, a parameter value or a particular mode. These
constants are also part of the OpenGL library syntax.
(Refer Slide Time: 26:21)
Now, all these constants begin with GL in capital letters. Each component of the name is
written in capital letters and separated by an underscore symbol, as we have seen in the case of the
GLUT constants as well, for example GL_RGB or GL_AMBIENT_AND_DIFFUSE,
where everything is in capitals separated by underscores.
(Refer Slide Time: 26:57)
Also, the OpenGL functions expect specific data types, for example a 32-bit integer as a
parameter value, and the library provides built-in data type names for this purpose.
(Refer Slide Time: 27:19)
Each of these names begins with GL followed by the data type name, for example GLbyte or
GLdouble, where the data type part is in lowercase.
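For illustration, a few of these built-in type names in use; the variable names are arbitrary.

GLint    count  = 10;    /* 32-bit signed integer */
GLfloat  shade  = 0.5f;  /* single-precision floating point */
GLdouble length = 2.0;   /* double-precision floating point */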
(Refer Slide Time: 27:42)
So, those are the syntax rules for using OpenGL library functions. Now, let us try to
understand the two functions that we have defined using OpenGL library functions: init and
createLine. Let us start with init. This is essentially meant to initialize and perform
one-time parameter settings. In our function we have used three OpenGL library routines or
library functions. What do they do?
(Refer Slide Time: 28:25)
Now, the first one is glClearColor, which takes four arguments. It is
used to set a background color for our display window, and this color is specified with RGB
components.
(Refer Slide Time: 28:48)
Now, these RGB components are specified in the first three arguments, in that order: R, G, then B.
With this particular set of values, as we all know, we get white as
the background color; we can set any background color. For example, if we set all three to 0, we will get
black.
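For instance, a hypothetical call for a black background would simply pass zeros for the three color components:

glClearColor(0.0, 0.0, 0.0, 0.0);  /* black background (R = G = B = 0) */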
(Refer Slide Time: 29:21)
Now, there is also a fourth parameter, which we have set to 0.0. This is called the alpha value
of the specified color and is used as a blending parameter. In other words, it specifies the
transparency of the color: the value 0.0 means the color is totally transparent and
1.0 means totally opaque. So, it indicates transparency.
(Refer Slide Time: 30:08)
Now, here we are displaying a line, which is a 2D object. However, OpenGL does not treat 2D
objects separately; it treats 2D pictures as a special case of 3D viewing. So, essentially
all the 3D pipeline stages are performed.
(Refer Slide Time: 30:34)
So, we need to specify the projection type and other viewing parameters. That is done with these
two functions: glMatrixMode with the argument GL_PROJECTION, and gluOrtho2D with some
arguments.
(Refer Slide Time: 30:58)
Now, this function gluOrtho2D is prefixed with glu rather than gl. This indicates that the
function belongs to GLU, the OpenGL Utility library, another auxiliary library. Earlier
we saw GLUT, the OpenGL Utility Toolkit; now we are seeing the OpenGL Utility library, another
auxiliary library.
This library provides routines for complex tasks such as setting up viewing and projection
matrices, describing complex objects with line and polygon approximations, processing
surface-rendering operations and displaying splines with linear approximations. These are some
examples of the complex pipeline tasks implemented in this
OpenGL Utility auxiliary library.
(Refer Slide Time: 32:00)
Now, together these two functions, glMatrixMode and gluOrtho2D, specify an orthographic
projection to be used to map the line from the view plane to the screen. The view-plane window
is specified in terms of the lower-left and top-right corners of the window; these arguments specify
those corners. During this projection, anything outside the window is clipped out, as we
discussed during our pipeline discussion.
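A minimal sketch of the init routine described above is given below; the white background and the projection setup follow the discussion, while the view-window extents of 0 to 800 and 0 to 600 are an assumption chosen to match the display window size used earlier.

void init(void)
{
    glClearColor(1.0, 1.0, 1.0, 0.0);    /* background color: white, alpha = 0.0 */
    glMatrixMode(GL_PROJECTION);         /* subsequent calls set up the projection matrix */
    gluOrtho2D(0.0, 800.0, 0.0, 600.0);  /* 2D orthographic view window (left, right, bottom, top) */
}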
(Refer Slide Time: 32:58)
Now, let us move to our second function, createLine. This function actually creates the
line which we want to display. Its first line is glClear with an argument. This function
is used to display the window with the specified background color. The argument, as you can
see, is an OpenGL symbolic constant indicating the bit values in the color or refresh buffer that are to be
set to the background color values specified in the glClearColor function. So, essentially
this function applies the background color to the display window.
(Refer Slide Time: 34:10)
Now, OpenGL also allows us to set the object color, with the function glColor3f. There
are three arguments, which again specify the RGB components. So, these two functions
are used to set the color values of the background as well as of the object.
(Refer Slide Time: 34:37)
Now, in the second function, the suffix 3f indicates that the three components are specified using
floating-point values; in other words, the values can range from 0.0 to 1.0. In this
particular case, the three values denote the color green.
(Refer Slide Time: 35:09)
Next, we have a piece of code between the two functions glBegin and glEnd. This indicates the
line segment to be drawn between the endpoints provided as arguments. So, this block
essentially creates the line between the two endpoints specified in the two calls to the
function glVertex2i.
(Refer Slide Time: 35:50)
Now, the suffix 2i in the function name, as you can guess, indicates that each vertex is specified by two
integer values denoting the X and Y coordinates; this is quite straightforward.
(Refer Slide Time: 36:10)
Now, the first and second endpoints are determined by their ordering in the code. The vertex that
appears first will always be treated as the first point, and the other as the second point. So,
the way the code is written determines the first and second points.
(Refer Slide Time: 36:35)
And the function glBegin with the constant GL_LINES, together with the function glEnd, indicates
that the vertices are line endpoints.
(Refer Slide Time: 36:56)
Now, with all these functions, our basic line-creation program is ready. One point to note
here is that these functions may be stored at different locations in memory, depending on
how OpenGL is implemented.
(Refer Slide Time: 37:19)
And we need to force the system to process all these functions. This we do with another
function, glFlush, as shown here. This should be the last line of our picture-generation
procedure; it indicates that all the functions we have used must be processed one after
another.
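A minimal sketch of the createLine routine described above is given below; the green drawing color follows the discussion, while the two endpoint coordinates are illustrative assumptions.

void createLine(void)
{
    glClear(GL_COLOR_BUFFER_BIT);  /* fill the window with the background color set in init */
    glColor3f(0.0, 1.0, 0.0);      /* object color: green, given as floating-point R, G, B */
    glBegin(GL_LINES);             /* the vertices that follow are line endpoints */
        glVertex2i(100, 100);      /* first endpoint (x, y), integer coordinates */
        glVertex2i(500, 400);      /* second endpoint (x, y) */
    glEnd();
    glFlush();                     /* force processing of all the OpenGL calls above */
}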
(Refer Slide Time: 37:47)
So, that is how we can create a program using OpenGL. In our example we have used the
OpenGL library in the setting of the C language, and we have also seen that the OpenGL library alone is
not sufficient; we need some auxiliary libraries. Here we have used the GLUT and GLU
auxiliary libraries: GLUT, the OpenGL Utility Toolkit, allows us to create the window,
which is a display-dependent operation, and GLU allows us to perform other complex tasks
that are not in the core OpenGL library.
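As a usage note, on a typical Linux system with the freeglut and GLU development packages installed, such a program could be compiled by linking against the GL, GLU and glut libraries, for example with gcc line.c -o line -lglut -lGLU -lGL, and then run as ./line; the file name line.c is an assumption for illustration only.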
(Refer Slide Time: 38:44)
So, with this we have come to the end of the topic. We have learned various things: we started
with a generic architecture of a graphics system, then learned about the graphics hardware,
including the input/output devices and the GPU, and today we learned about graphics software,
how such software is created, the different standards, and an example program using OpenGL,
which can be used to write any graphics program. With this lecture we have almost come to the
end of the course; in the next lecture we will summarize what we have learnt so far.
(Refer Slide Time: 39:44)
Whatever I discussed today can be found in this book; you can go through Chapter 10, Section
10.4, to learn about graphics software, including the OpenGL example. In the last lecture we will
summarize our learning so far, so we will see you in the concluding lecture. Till then, thank you
and goodbye.
Computer Graphics
Professor Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 32
Concluding Remarks
Hello and welcome to the last lecture in the course Computer Graphics. So we have reached the
end of the course. Let us reflect on what we have learnt so far and how to use the knowledge.
(Refer Slide Time: 00:45)
(Refer Slide Time: 00:48)
So, we started with some objectives. What are the objectives?
(Refer Slide Time: 00:54)
Our broad objective was to learn about the process of computer graphics, in particular the
rendering of static images on a screen. This has broadly two components: one is the idea of the
pipeline and the other is the implementation of the pipeline. So essentially what we tried to learn
is how an image is displayed on the screen, starting from object definition to final image
synthesis and display.
And it has two components. The first is how to create or synthesize the image; as we
have discussed, that is done through the 3D pipeline. The second is how the image is actually
physically rendered on the screen; that is done with hardware and software support together,
which is the implementation of the pipeline.
(Refer Slide Time: 02:14)
Now that was the broad objective.
In order to achieve that broad objective we divided our learning into smaller specific
objectives. There are broadly three specific objectives: learning about object representation,
which is the very first stage of image synthesis; then the pipeline stages, which convert object
definitions into a representation on the pixel grid; and finally the implementation of that
representation on the physical screen, where we have the objective of learning about the basic
hardware as well as the software.
(Refer Slide Time: 03:00)
Now let us see how we covered this broad idea and what are the things that we have
learned.
(Refer Slide Time: 03:09)
To start with, we learnt about a very generic graphics architecture to understand the image
synthesis process. This graphics system architecture consists of 3 major components: the
display controller, as shown here, then the video memory, and finally the video
controller. These 3 components are used to synthesize an image. The display controller is
essentially the graphics card with the GPU, or graphics processing unit.
This is the component responsible for implementing the pipeline stages in hardware.
Recollect that the idea is to exploit the inherent parallelism in graphics processing operations,
and that is achieved with the use of the GPU. The video memory is also on the graphics card; it is
the separate memory component of the card, although it may also be part of main memory, which is
typically not the case.
And finally, the video controller is used to convert whatever is in the memory, digital data,
into analog signals for controlling the electromechanical arrangements that are ultimately responsible for
exciting the pixels on the screen. Along with that, there may be input devices attached to the
graphics system which allow the user to change the synthesized image. That is the broad idea of
the graphics system and the components involved in processing the graphics operations.
(Refer Slide Time: 05:34)
Then we learned about the pipeline, that is, the conceptual stages involved in converting a
scene described in the form of component objects into the final synthesized image. We learnt the
stages in a particular sequence, starting with object representation, the first stage, where the
objects are defined in their local coordinate systems. Then we have the second stage, modeling
transformation. Here a transformation takes place which constructs a scene
in the world coordinate system by combining the objects that are defined in their local
coordinate systems.
So essentially, here a transformation from local to world coordinates takes place. In the third stage we
assign colors to the objects, where we assume that the objects are defined in world coordinates.
In the fourth stage a series of transformations takes place; this fourth stage is called the viewing
pipeline, and it is itself a pipeline of sub-stages. There are 5 sub-stages. The first sub-stage is the viewing
transformation, where the world coordinate scene is transformed to a view
coordinate system, so essentially a world-to-view coordinate transformation takes place here.
Then, in this view coordinate system, we perform clipping: we define a view volume, and
whatever objects are outside that volume are clipped out. This takes place in the view
coordinate system. Then whatever is inside the view volume is further processed to
remove hidden surfaces with respect to a particular viewer position; conceptually, this also takes place in the
view coordinate system.
After that we project the view coordinate scene to a 2D view coordinate system, that is, from the 3D
view coordinate system to the 2D view coordinate system; this transformation is the projection
transformation. Finally, from the 2D view coordinate system we transform the image description
to a device coordinate system; that is the final sub-stage of the fourth stage. After this viewing
pipeline stage is over, in the fifth stage we convert the resulting image description in the device coordinate
system, that is, from continuous device coordinates we map it to the discrete pixel grid or the
screen coordinate system.
So, these are the 5 stages of the pipeline that we have learnt. I would like to emphasize here again
that these stages need not occur in the exact sequence in which we learnt them. In an implementation,
this sequence may be different, so the exact sequence need not be followed when implementing the
pipeline. I have used this sequence just to explain the concepts rather than to explain how they are
actually implemented.
(Refer Slide Time: 09:21)
So, to achieve this broader learning objective, we covered several topics; let us go through an
overview of those topics.
(Refer Slide Time: 09:27)
So, there were a total of 31 lectures covering this broad idea.
(Refer Slide Time: 09:37)
These lectures were divided into groups. The first 3 lectures were devoted to an introduction to
the field.
(Refer Slide Time: 09:50)
Then an introduction to the 3D graphics pipeline was covered in lecture 4.
(Refer Slide Time: 10:00)
Lectures 5 to 9 were devoted to discussions of various object representation techniques.
(Refer Slide Time: 10:13)
Then the other pipeline stages were covered in lectures 10 to 28. Lectures 10 to 12 covered
geometric modeling, the second stage. Lectures 13 to 17 covered lighting. Lectures 18 to 24
covered the viewing pipeline, and lectures 25 to 28 covered the final stage, that is, rendering or scan
conversion.
(Refer Slide Time: 10:54)
The pipeline implementation, that is, how the pipeline is implemented in hardware as well as in
software, was covered in the remaining lectures. Lectures 29 and 30 were devoted to the
explanation of graphics hardware, and the final lecture, lecture 31, was used to discuss
graphics software.
(Refer Slide Time: 11:20)
So, I hope that you have enjoyed the course and the lectures and have learned the concepts. With
this learning, I hope you will be able to understand how graphics systems work and how
your program can create an image on the screen of your computer. Maybe with this
knowledge you can even think of developing a library of your own, a general-purpose graphics
library which others can use to create their own programs. You can also think of developing
special graphics applications using such library functions, like the ones we discussed earlier:
painting packages, CAD packages and so on.
(Refer Slide Time: 12:24)
So, I hope that you have learned all these concepts and that the lectures were interesting and
understandable. Of course, in the lectures I could not cover everything, so for more details you
may always refer to the main learning material as well as the reference materials that I
have mentioned throughout the lectures. That is all. Wish you all the best, thank you.
(Refer Slide Time: 12:54)