INDEX

Week 1
1. Introduction to graphics
2. Historical evolution, issues and challenges
3. Basics of a graphics system
4. Introduction to 3D graphics pipeline

Week 2
5. Introduction and overview on object representation techniques
6. Various boundary representation techniques
7. Spline representation – I
8. Spline representation – II

Week 3
9. Space representation methods
10. Introduction to modeling transformations
11. Matrix representation and composition of transformations
12. Transformations in 3D

Week 4
13. Color computation – basic idea
14. Simple lighting model
15. Shading models
16. Intensity mapping

Week 5
17. Color models and texture synthesis
18. View transformation
19. Projection transformation
20. Window-to-viewport transformation

Week 6
21. Clipping introduction and 2D point and line clipping
22. 2D fill-area clipping and 3D clipping
23. Hidden surface removal – I
24. Hidden surface removal – II

Week 7
25. Scan conversion of basic shapes – I
26. Scan conversion of basic shapes – II
27. Fill area and character scan conversion
28. Anti-aliasing techniques

Week 8
29. Graphics I/O devices
30. Introduction to GPU and shaders
31. Programming with OpenGL
32. Concluding remarks
Computer Graphics
Dr. Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 01
Introduction to Graphics
Hello and welcome to the first lecture of the course Computer Graphics. In this lecture we
will try to get an overview of the basic idea of graphics and what it means.
(Refer Slide Time: 00:50)
So, let us begin with a very simple, trivial question: what do we do with computers? I think most of you will be able to tell that we do a lot of things. Let us see some examples of the things that we do with a computer.
(Refer Slide Time: 01:07)
The first example that we will see is related to a document processing task. Essentially, we are interested in creating a document; let us see what we do there and what we get to see on the screen.
(Refer Slide Time: 01:29)
On the screen I have shown one example of document creation in progress; this is essentially the creation of the slides from which I am delivering the lecture. As you can see, there are many things being shown on the screen. So, what are those things, what are the components that we are seeing on the screen?
(Refer Slide Time: 01:56)
In fact, there are a large number of different things. The most important component, of course, since we are talking of document processing activities, is the alphanumeric character. There are many such characters, the alphabets and the numbers. And how do we enter those characters? By using a keyboard, either physical or virtual.
(Refer Slide Time: 02:32)
But apart from that there are other equally important components.
(Refer Slide Time: 02:39)
For example, the menu options that we see here on the top side of the screen, as well as the various icons representing editing tools that we get to see on the top part of the screen. Here, or here, in fact, all these components are essentially editing tools and the icons representing those tools.
(Refer Slide Time: 03:19)
We also have the preview slides on the left part which is another important component.
(Refer Slide Time: 03:28)
So, if you have noted, some of these components are shown as text, like the alphanumeric characters, and others are shown as images, like those icons.
(Refer Slide Time: 03:41)
So, essentially there is a mix of characters and images that constitute the interface of a typical
document processing system.
(Refer Slide Time: 03:52)
Now, let us see another example, which you may or may not have seen but which is also quite common, that is, the CAD or Computer Aided Design interface. So, CAD stands for Computer Aided Design. This is an example of the interface; there are many different systems with different interfaces, and what I have shown here is one such example.
(Refer Slide Time: 04:21)
So, what do these systems do? Essentially, with such a system someone can actually design machinery parts, and there are some control buttons to perform various operations on these parts.
(Refer Slide Time: 04:46)
And as you can see, the overall part, that is, the entire image, is constructed from individual components like these smaller gears, this cylinder, these cubes. And these smaller components have some specified properties, for example, dimensions.
(Refer Slide Time: 05:31)
So, what can we do with this interface? Typically, engineers use such interfaces to create machinery by specifying individual components and their properties and trying to assemble them virtually on the screen, to check if there is any problem in the specifications. Clearly, since everything is done virtually, the engineer does not require any physical development of the machinery, so it saves time, it saves cost and many other things. That is example 2.
(Refer Slide Time: 06:08)
Now let us see one more interesting example of computer graphics. This is related to visualization, or trying to visualize things that are otherwise difficult to visualize. Under visualization we will see a couple of examples; the first one is visualization of a DNA molecule. Now DNA, as you all know, stands for deoxyribonucleic acid; it is essentially a kind of genetic code present in every cell, and it is not possible to see it with our bare eyes, as we all know.
But it would be good if we could see it somehow, in some manner, and an application of computer graphics known as visualization makes it possible, as shown here. This type of visualization is known as scientific visualization, where we try to visualize things that occur in nature but that we cannot see otherwise or find difficult to see.
(Refer Slide Time: 07:23)
There is another type of visualization; let us see one example. Suppose we want to visualize a computer network, how traffic flow happens in the network. Here by traffic I mean packets, the packets that are being moved in the network. In any case, we are not in a position to visualize it with our eyes, but a computer can help us; with a computer we can actually create a visualization of the network traffic flow.
(Refer Slide Time: 08:06)
This type of visualization is known as information visualization. Here we are not dealing with natural objects; instead we are dealing with man-made information, and we are trying to visualize that information. So, we have two types of visualization: scientific and information. And these are applications of computer graphics that help us perceive, that help us understand, things that otherwise we would not be able to perceive.
(Refer Slide Time: 08:55)
So, as I said each of the examples that I have discussed earlier is an example of the use of
computer graphics.
(Refer Slide Time: 08:57)
But these are only three examples. In fact, the spectrum of such applications of computer graphics is huge, and everything that we get to see around us involving computers is basically an application of computer graphics; it is definitely not possible to list all those applications.
(Refer Slide Time: 09:28)
Also, we have to keep in mind that we are not talking only about desktop or laptop screens; we are talking about a plethora of other types of displays as well. That includes mobile phones, information kiosks at popular spots such as airports, ATMs, large displays at open-air music concerts, air traffic control panels, even movie screens in theatres. All these are kinds of displays, and whatever is being shown on these displays is mostly an application of computer graphics. So, we have two things: one is a large number of applications; the second is applications on all possible displays.
(Refer Slide Time: 10:26)
And as I have already mentioned earlier, for those who are not very conversant with the inner workings of a computer, whenever we use the term computer, the thing that comes to the mind of such lay persons is essentially the display, whatever is being shown on the display. So, essentially the display is considered to be the computer by those who are not well accustomed with the inner workings of a computer.
(Refer Slide Time: 11:04)
Now, what is the common thing between all these applications? Instances of images that are displayed. Here by image we are referring to both text, that is, alphanumeric characters, as well as actual images, because text is also considered to be an image, as we shall see in our subsequent lectures.
(Refer Slide Time: 11:19)
And these images are constructed with objects, or components of objects, like we discussed in the CAD application, where there are individual objects as we have seen earlier. Now these objects are essentially geometric shapes. And to these objects we assign some colors, like the yellow color here or the blue color here or the white here. So, colored geometric objects are used to create the overall image.
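As a tiny, hedged illustration of this idea (the types and values below are hypothetical, not from the lecture), an image can be thought of, at its simplest, as a list of geometric shapes with colours attached; later lectures develop much richer representations.

    #include <stdio.h>

    /* Hypothetical types: an RGB colour and one simple geometric object. */
    typedef struct { float r, g, b; } Color;
    typedef struct { float cx, cy, radius; Color fill; } Circle;

    int main(void)
    {
        /* The "image" as a small collection of coloured shapes. */
        Circle scene[] = {
            { 10.0f, 10.0f, 5.0f, { 1.0f, 1.0f, 0.0f } },   /* a yellow circle */
            { 30.0f, 12.0f, 3.0f, { 0.0f, 0.0f, 1.0f } },   /* a blue circle   */
        };
        int n = (int)(sizeof(scene) / sizeof(scene[0]));
        for (int i = 0; i < n; i++)
            printf("circle at (%.1f, %.1f), radius %.1f\n",
                   scene[i].cx, scene[i].cy, scene[i].radius);
        return 0;
    }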
(Refer Slide Time: 12:08)
Along with that, there is one more thing. When we create, edit or view a document, we are dealing with alphanumeric characters, and each of these characters is an object. Again, we shall see in detail why characters are considered to be objects in subsequent lectures. These objects are rendered on the screen with different styles, sizes as well as colors, like the typical objects that we have noted in the previous case.
(Refer Slide Time: 12:42)
Similarly, if we are using some drawing application or drawing package, like MS Paint or the drawing application of MS Word, there we deal with other shapes such as circles, rectangles and curves. These are also objects, and with these objects we create a bigger object or a bigger image.
(Refer Slide Time: 13:12)
Finally, consider the case of animation videos or computer games, which involve animation anyway. In many cases we deal with virtual characters. These are essentially artificially created characters which may or may not be human-like.
(Refer Slide Time: 13:31)
And all these images or their components can be manipulated, because nowadays most graphics systems are interactive. So, the user can interact with the screen content and manipulate it. For that, input devices are there, such as the mouse, keyboard, joystick and so on.
(Refer Slide Time: 14:01)
Now, how can a computer do all these things? What are those things? Let us recap again. Images consist of components, so we need to represent those components; then we need to put them together into the form of an image; we should allow the user to interact with those components or the whole image through input devices; and we should also be able to create the perception of motion by moving those images. How can a computer do all these things?
(Refer Slide Time: 14:42)
You have probably already done some basic courses where you learned that computers understand only binary language, that is, the language of 0s and 1s. On the other hand, in computer graphics what we have are letters, numbers, symbols and characters, but these are not 0s or 1s. These are things that we can perceive and understand. So, what is needed? There are two questions related to that.
(Refer Slide Time: 15:23)
The first question is how we can represent such objects in a language that the computer understands and can process.
(Refer Slide Time: 15:39)
The second question is how we can map from the computer's language to something that we can perceive. Essentially, with the computer output in 0s and 1s, we will not be able to understand what it means; we want it again in the form of those objects that we mentioned earlier. So, one thing is mapping from our understanding to the computer's language, and the other thing is mapping from the computer's understanding to our language.
(Refer Slide Time: 16:06)
In other words, how can we represent, synthesize and render images on a computer display? This is the fundamental question that we try to answer in computer graphics.
(Refer Slide Time: 16:23)
From this fundamental question we can frame four component questions.
(Refer Slide Time: 16:29)
The first one is, as we have already said, that imagery is constructed from constituent parts. So, how can we represent those parts? That is the first basic question.
(Refer Slide Time: 16:46)
The second question is how to synthesize the constituent parts to form complete, realistic imagery. That is our second question.
(Refer Slide Time: 17:01)
Third question is how to allow the users to manipulate the imagery or its constituents on the
screen with the use of input devices. That is our third fundamental question.
(Refer Slide Time: 17:22)
And finally, the fourth question is how to create the impression of motion, that is, animation. So, these are the four questions: first, how to represent; second, how to synthesize; third, how to interact; and fourth, how to create animation.
(Refer Slide Time: 17:43)
Now, in computer graphics we seek answers to these four basic questions.
(Refer Slide Time: 17:47)
Here, a few things need to be noted. First of all, when we are talking of computer screens, we are using the term in a very broad sense, because screens vary greatly, as we are all aware nowadays, from small displays to display walls to large displays, and these variations indicate corresponding variations in the underlying computing platform. However, we will ignore those differences; when we refer to a computer screen, we will assume that we are referring to all sorts of screens.
(Refer Slide Time: 18:33)
Accordingly, whatever we discuss, our objective will be to seek efficient solutions to the four basic questions for all possible platforms. For example, displaying something on a mobile phone requires techniques different from displaying something on your desktop, because the underlying hardware may be different. There are differences in CPU speed, memory capacity, power consumption issues and so on. So, when we are proposing a solution to answer one or all of these questions, we should keep these underlying variations in mind.
(Refer Slide Time: 19:23)
Now, in summary, what we can say about computer graphics is that it is the process of rendering static images or animation, which is a sequence of images, on the computer screen, and that too in an efficient way, where efficiency essentially refers to the efficient utilization of the underlying resources.
(Refer Slide Time: 19:48)
In this course we shall learn this process in detail, particularly the stages of the pipeline, where the pipeline refers to the set of stages which are part of this whole process of rendering, and the pipeline implementation, that is, how we implement the stages. This involves a discussion of the hardware and software basics of a graphics system. However, we will not discuss the process of creation of animation, which is a vast topic in itself and requires a separate course altogether.
(Refer Slide Time: 20:31)
This is just for your information: there is a related term, which some of you may have heard of, called image processing. In image processing we manipulate images, whereas in computer graphics we synthesize images, and we also synthesize them in a way that gives us the perception of motion, which we call animation.
So, computer graphics deals with the synthesis of images as well as animation, whereas image processing deals with the manipulation of already captured images. In many applications these two are linked, but we will not discuss those things in the limited scope of this course.
(Refer Slide Time: 21:17)
So, whatever we have discussed today you can find in detail in this book; more specifically, you should refer to Chapter 1, Section 1 for the topics that we covered today. In the next lecture we will go through the historical evolution of the field, followed by a discussion on the issues and challenges that are faced by workers in this field. Thank you and goodbye.
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 2
Historical Evolution, Issues and Challenges
Hello and welcome to lecture number 2 in the course Computer Graphics. So, before we
start, let us recap what we have learned in the previous lecture.
(Refer Slide Time: 0:40)
In the last lecture, if you recall, we got introduced to the field and talked about the basic idea: what computer graphics is and what it deals with. Today, we will discuss the historical evolution of the field, and we will also discuss the issues and challenges that are faced by researchers in this area.
Knowledge of the historical evolution is always beneficial for a broader understanding of the subject. So, we will go into a bit of detail on the evolution, followed by a discussion on the issues and challenges.
(Refer Slide Time: 1:25)
In the early days, when computers had just started appearing, that is, in the 1940s and 50s of the last century, displays consisted of a terminal unit capable of showing only characters. So, in the earlier days we had displays that used to show only characters; there was no way to show anything other than characters.
(Refer Slide Time: 2:05)
Subsequently, the ability to show complex 2D images was introduced in later developments. Now, with the advancement of technology, other things changed.
(Refer Slide Time: 2:24)
We now have higher memory capacity and increased processor speeds. Along with those changes, display technology also improved significantly. So we had three broad developments: memory capacity enhancement, processor speed increase, as well as improvement in display technology.
(Refer Slide Time: 2:56)
Now, all three together made it possible to display complex 3D animations, which are computationally intensive and which assume that we are capable of performing the computations in real time. How computationally intensive these processes are, we will see in the subsequent lectures. In fact, that is the core content of this course.
(Refer Slide Time: 3:43)
Now, if we look closer at 3D animation, we will see that there are two aspects: one is the synthesis of frames, and the second is combining the frames together and rendering them in a way that generates a perception of motion, or generates the motion effect. Now, synthesis of frames as well as combining them and rendering them on the screen to generate motion are complex processes, and they are also resource intensive; they require lots of hardware resources.
So, these are the main focus areas of present-day computer graphics activities: how to make these processes workable in the modern-day computing environment.
(Refer Slide Time: 4:48)
Now, we are repeatedly using the term computer graphics, but it has an origin. The term was first coined by William Fetter of the Boeing Corporation in 1960, that is, about 60 years ago.
(Refer Slide Time: 5:09)
Subsequently, Sylvan Chasen of the Lockheed Corporation in 1981 proposed four phases of the evolution of the field. What are those phases? The first phase was conception to birth, typically considered to be between 1950 and 1963; this is also known as the gestational period. The second phase is the childhood phase, of short duration, from 1964 to 1970. Then we have adolescence, again a somewhat significant phase, spanning the 1970s to the early 1980s. And then we have adulthood, which is still continuing, starting from the early 1980s.
So, these are the four phases proposed by Sylvan Chasen in 1981: the gestational period, childhood, adolescence and adulthood. Now, let us have a quick look at the major developments that took place in each of these phases.
(Refer Slide Time: 6:31)
Let us start with the first phase, the gestational period, between 1950 and 1963, in the early stages of computers. If you are aware of the evolution of computers in its early phases, then you know that the gestational period also coincides with the early developmental phases of computing technology itself. So, that was the phase when the technology evolved.
Nowadays, we take for granted the availability of interfaces that are popularly known as graphical user interfaces. We get to see them on almost all of our computer screens, whether we are using desktops, laptops or even smartphones. But in that phase, the gestational period, the GUI concept was not there. In fact, nobody was even aware of the possibility of such an interface; it could not even be imagined.
(Refer Slide Time: 7:47)
Now, in that phase there was one system developed called SAGE, which stands for Semi-Automatic Ground Environment. It was developed for the US Air Force as part of a bigger project, the Whirlwind project, which was started in 1945. The SAGE system is an early example from the gestational period demonstrating the use of computer graphics.
(Refer Slide Time: 8:39)
What did this system do? The basic idea of the project was to get the positional information of an aircraft from radar stations, which is typically the job of a radar network. There was an operator, like this operator here, sitting in front of a screen, as you can see, not the traditional screens that we are accustomed to, but an early version of a screen.
(Refer Slide Time: 9:21)
On this screen aircraft were shown, and on the aircraft other data, the data received from the radar, was superimposed. So, essentially what we have is that a geographical region is shown on the screen, and on that region the aircraft information is shown.
(Refer Slide Time: 9:48)
There was one more aspect of the system. It was actually, in a sense, an interactive system: the operator could interact with the system with the use of an input device called a light gun or light pen. If there was an aircraft shown on the screen, the operator could point the pen at that aircraft to get its identification information.
(Refer Slide Time: 10:30)
So, when the gun was pointed at the plane symbol on the screen, an event was sent to the Whirlwind system, which in turn sent the details as text about the plane, its identification information, which was then displayed on the screen of the operator. Something like this: as you can see, this is a light gun or light pen; the operator is pointing the pen at the screen where an aircraft symbol is shown, and once the pointing is done, the system sends a message to the overall Whirlwind system, which had all the information, and this is sent back to the interface to be seen by the operator.
(Refer Slide Time: 11:34)
So, as I said, the SAGE system, which was part of the Whirlwind project, had traces of interactive graphics, where the interaction was done with the light gun or light pen, but it was still not fully interactive in the way we understand interaction in the modern context. The true potential of interactive computer graphics came into the picture after the development of another system called Sketchpad by Ivan Sutherland, way back in 1963.
This Sketchpad system was part of the doctoral thesis of Ivan Sutherland at MIT, and it actually demonstrated the idea as well as the potential of an interactive graphics system.
(Refer Slide Time: 12:46)
Like the SAGE system, in Sketchpad also the interaction was done through a light pen, and it was meant for developing engineering drawings directly on a CRT screen. Here the operator need not be a passive input provider; instead, active input can be given in the form of creating drawings directly on the screen. An example is shown in this figure: as you can see, this is the screen, and the operator is holding a light pen to create a drawing here.
(Refer Slide Time: 13:36)
Now, this Sketchpad system actually contained many firsts. It is widely considered to be the first GUI, although the term GUI was still not popular at that time. It is also credited with pioneering several concepts of graphical computing, namely how to represent data in memory, how to deal with flexible lines, the ability to zoom in and out, and drawing perfectly straight lines, corners and joints.
These are things that nowadays we take for granted, but they were very, very difficult at the time, and Sketchpad actually managed to demonstrate that they were possible. Accordingly, Sutherland is widely acknowledged by many as the grandfather of interactive computer graphics.
(Refer Slide Time: 14:50)
Now, along with SAGE and Sketchpad, this gestational period also saw the development of many other influential systems.
(Refer Slide Time: 15:03)
During this phase, the first computer game, called Spacewar, was developed in 1961 on the PDP-1, an early computing platform.
(Refer Slide Time: 15:25)
IBM also developed the first CAD or Computer Aided Design system; recollect from our previous lecture that these systems are meant for helping engineers create mechanical drawings and test various things without actually having to build the system. In the gestational period, IBM came up with this first CAD system in 1964, although the work had started in 1959.
(Refer Slide Time: 16:02)
Now, the gestational period was followed by the childhood period, a reasonably short period of only 6 to 7 years. In this period not many significantly new things happened; further development took place along the lines of whatever was developed earlier in the gestational period, and consolidation of the earlier ideas took place.
(Refer Slide Time: 16:37)
Then came the adolescence period, mostly confined to the 1970s and the early 1980s. In this phase, again, many new things happened. In 1971 Intel released the first commercial microprocessor, called the 4004. As we all know, with the coming of the microprocessor a paradigm shift took place in the way computers were designed, and that in turn impacted the computer graphics field in a significant way by making computations less costly and more affordable.
(Refer Slide Time: 17:32)
As a result, in this period several interesting things happened. Primarily, two types of developments took place: one is techniques for realistic 3D graphics, and the other is that several applications were developed during this phase, particularly in the entertainment and movie-making fields. As a result of those applications, people started noticing the potential of the field and invested more and more time and money. So, both developments were significant in the context of the overall evolution of the field.
(Refer Slide Time: 18:16)
Now, what work was done for realistic 3D image generation? One important development was the work on lighting models. We will learn about these models later. What these models were meant to do was to assign colors to pixels, and this coloring of pixels, the smallest graphical units on a screen, is very important to give us a perception of realistic images, as we all know. We shall see this in detail in later lectures.
(Refer Slide Time: 19:03)
Apart from that, another development took place, namely texture mapping techniques. Texture is basically the pattern that we get to see on surfaces. So, if we can impose textures on our artificially created object surfaces, then definitely that will lead us to a more realistic image representation, and that development took place in this adolescence period.
The first notable work was done by Catmull in 1974. As you can see, some textures are shown on this object; because of that, we are able to make out that it is a 3D object having certain characteristics. Without texture, it would look dull and unrealistic.
(Refer Slide Time: 20:05)
An advanced form of texture mapping was introduced through bump mapping by Blinn in 1978. As in the example shown here, we can see that on the object surfaces a special type of texture was incorporated to make it look more real, more natural. These are called bumps, hence bump mapping.
(Refer Slide Time: 20:34)
Another development that took place is an advanced technique for creating 3D images called ray tracing; the first notable development took place in the 1980s, in the adolescence period. Using this technique, we can develop realistic 3D images on a 2D screen in a better way than with the other techniques. These are techniques that were developed to improve the quality of the synthesized images, to make them more realistic, more natural.
So, to recap, broadly four approaches were developed in this phase: first, the basic work on lighting models, followed by texture mapping, bump mapping, and finally ray tracing methods. Apart from that, as I mentioned earlier, another strand of development that took place during this phase was the development of several applications of computer graphics; based on whatever was the state of the art at that time, several applications were developed, particularly in entertainment and movie making.
(Refer Slide Time: 22:12)
So, in 1973 came the movie Westworld, which was the first movie to use computer graphics.
(Refer Slide Time: 22:26)
This was followed in 1977 by the movie Star Wars; I think most of you, if not all, may be aware of this movie. The first Star Wars movie came out in 1977 and became hugely popular throughout the world, and as a result people learned about the potential of computer graphics in a more compelling way.
(Refer Slide Time: 23:01)
The adolescence period was followed by the adulthood period, starting from the early 1980s. The field entered adulthood with the release of the IBM PC in 1981. As we all know, after the advent of the PC or personal computer, computers became a mass product. Earlier, they used to be confined to only a few people who were well educated, at an advanced stage of their studies, and primarily doing research or development work; but after the advent of the PC, computers proliferated and became a mass product. And since they had become a mass product, the focus now shifted to the development of applications that were appealing to the masses.
(Refer Slide Time: 24:15)
Using computer graphics, lots of such applications were developed, and the focus shifted from graphics for experts to graphics for laymen.
(Refer Slide Time: 24:32)
And as a result, we got to see several developments including the development of GUIs
and the associated concepts. In fact, so many developments took place that it gave rise to
a new field of study, which is called human-computer interaction or HCI in short.
(Refer Slide Time: 24:52)
One thing that happened during this phase is that a self-sustaining cycle of development emerged. What is that? As more and more user-friendly systems emerge, they create more and more interest among people; in turn, that brings in new enthusiasm and investment in innovative systems. So, it is a self-sustaining cycle of development: more and more applications appeal to more and more people, the people in turn want more and more, so more and more investment came in, and it continued and is still continuing.
(Refer Slide Time: 25:42)
As a result of this self-sustaining cycle of development, other associated developments took place. From the CPU, we migrated to the GPU, or graphics processing unit, dedicated hardware for graphics. Storage capacity improved significantly, to be able to store and process the large amounts of data required for realistic 3D graphics; now we talk in terms of terabytes and petabytes, instead of the kilobytes or megabytes that used to be the case earlier.
Similarly, display technology has seen huge improvement, from the earliest cathode ray tubes to modern-day touchscreens, situated walls, or even better things. All this took place because of this self-sustaining cycle of development.
(Refer Slide Time: 26:42)
So, we can say that these technological developments brought in a paradigm shift in the field, and with the help of the new technology we are now in a position to develop algorithms to generate photorealistic 3D graphics in real time. All these things are important, and they will form the core subject matter of our discussion in subsequent lectures. Note that all of these are computation-intensive processes, and because of the advancement in technology, such computation-intensive processes have become manageable, possible to implement in real time.
(Refer Slide Time: 27:40)
And since we are able to do those things now, the appeal and applications of computer graphics have increased manifold, and the presence of all these factors implies that the field is growing and will continue to grow in the foreseeable future. So, that is, in brief, the evolution of the field: four phases, starting with the gestational period and ending with adulthood, along with the major developments we briefly discussed.
Now, let us shift our focus to another important aspect of the field: what are the issues and challenges that confront workers in this field?
(Refer Slide Time: 28:28)
Now, in the formative stages of the field, the primary concern was, as we all know, the generation of 2D images or 2D scenes.
(Refer Slide Time: 28:39)
But, as we have already discussed, that subsequently changed; 2D graphics is no longer the thrust area, and nowadays we are mostly focused on the generation of 3D graphics and animation.
(Refer Slide Time: 29:02)
In the context of 3D graphics and animation, there are three primary concerns related to software, that is, software development for the system.
(Refer Slide Time: 29:19)
One is modeling, which essentially means creating and representing object geometry in a 3D world. Here we have to keep in mind that we are not only talking about solid geometric objects, but also about phenomena such as billowing smoke, rain, fire, and other natural phenomena. So, how to model both objects and phenomena, that is one concern.
(Refer Slide Time: 29:58)
The second concern is rendering, essentially creating and displaying a 2D image of the 3D objects. Why a 2D image? Because our screen is 2D, so we have to convert the 3D objects into a 2D form. Rendering thus deals with issues related to displaying the modeled objects on the screen, and there are some other related issues involved, namely coloring of the pixels on the screen, color and illumination, which involves simulating the optical processes; then visible surface determination with respect to the viewer position; textured patterns on the surfaces, or texture synthesis, to mimic realism; 3D to 2D transformation; and so on. These are the issues involved in rendering.
(Refer Slide Time: 31:11)
Then the third major issue related to graphics software is animation, that is, describing how the image changes over time. What does it deal with? It deals with imparting motion to the objects to simulate movement, to give us a perception of movement. The key concerns here are modeling of motion and interaction between objects during motion. So, the three major issues related to software are modeling of objects, rendering of objects and creation of animation. Now, there are some hardware-related issues as well.
(Refer Slide Time: 32:06)
Why are those important? Because the quality and cost of the display technology are important concerns; there is always a tradeoff between the two, the quality of the hardware and its cost, so we cannot get high quality at low cost and vice versa. While building a graphics system or application, we need to keep this tradeoff in mind.
(Refer Slide Time: 32:39)
Along with that, we need to keep in mind the selection of an appropriate interaction device, because nowadays we are talking of interactive computer graphics. The interaction component is important, and it is important to choose an appropriate mode of interaction or input device such that the interaction appears intuitive to the user. The user should not be forced to learn complex patterns or complex operations; it should be as natural as possible.
(Refer Slide Time: 33:20)
Finally, the design of specialized graphics devices to speed up the rendering process is also of utmost importance, because graphics algorithms are computation intensive, and if we have dedicated hardware to perform those computations, then we can expect better performance. The issue is how to design such hardware at an affordable cost, and that is the primary concern related to hardware platforms for computer graphics.
So, from the point of view of hardware, we have the quality-versus-cost tradeoff to keep in mind; we also have to keep in mind the type of input device we are using, as well as the dedicated graphics hardware that we can afford.
(Refer Slide Time: 34:31)
Now, one thing we should note here is that in this course we shall learn how these issues are addressed, but we will not discuss issues related to animation; we will restrict our discussion to modeling and rendering of 2D images on the screen.
(Refer Slide Time: 34:57)
So, whatever we have discussed so far can be found in Chapter 1 of the book that we are following. You are advised to go through Section 1.1 and Section 1.2 for more details on the topics that we have covered today. That is all for today; we will meet again in the next lecture. Thank you and goodbye.
Computer Graphics
Professor Doctor Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 3
Basics of a graphics system
Hello and welcome to lecture number 3, in the course Computer Graphics. Before we go into
the topics of today's discussion, let me briefly recap what we have learnt in the previous
lectures.
(Refer Slide Time: 0:45)
So, in the first lecture we got a basic introduction to the field: what graphics is and what the main characteristics of this field are. This was followed by a brief discussion on the historical evolution as well as the issues and challenges that confront the researchers and workers in this area. These three topics we have covered in the previous lectures. Today, we shall introduce a basic graphics system so that in subsequent discussions it will be easier for us to understand the content.
(Refer Slide Time: 1:25)
So, what do we do in computer graphics? The answer is simple: we generate or synthesize a 2D image from some scene and we display it on a screen. Essentially, generation of images and display on the screen. Now, how do we do that? In the previous lectures we went into some detail on this question; now let us try to understand the answer from the perspective of the graphics system.
(Refer Slide Time: 2:00)
So, if we look at a graphics system, the components that are likely to be there look something like this. We have a host computer, where all the processing takes place; then we have a display controller, one component of the graphics system. This display controller takes input from the host computer in the form of display commands, and it also takes input from input devices, the various input devices we mentioned earlier for enabling us to interact with the screen content.
Now, the output of the display controller goes to another component called the video memory. The video memory content goes to a third component called the video controller, which eventually helps to display the image on the display screen. So, there are broadly three components that are unique to a graphics system: the display controller, the video memory and the video controller. We will have a brief discussion on each of these components for better understanding.
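To make the data flow among these three components concrete, here is a minimal sketch in C (all names and sizes are hypothetical, and the analogue stage is reduced to a stub); it illustrates the idea only, not the implementation of any actual graphics system.

    #include <stdio.h>
    #include <string.h>

    #define WIDTH  8
    #define HEIGHT 4

    /* Video memory: one intensity value per pixel (hypothetical layout). */
    static unsigned char video_memory[HEIGHT][WIDTH];

    /* Display controller: synthesizes the image and writes it into video memory. */
    static void display_controller_render(void)
    {
        memset(video_memory, 0, sizeof(video_memory));  /* clear to black          */
        video_memory[1][2] = 255;                       /* "draw" one bright pixel */
    }

    /* Stand-in for the analogue/electromechanical stage that drives one pixel. */
    static void drive_pixel(int x, int y, unsigned char intensity)
    {
        if (intensity > 0)
            printf("pixel (%d, %d) driven with intensity %d\n", x, y, intensity);
    }

    /* Video controller: reads video memory and drives every pixel of the screen. */
    static void video_controller_scanout(void)
    {
        for (int y = 0; y < HEIGHT; y++)
            for (int x = 0; x < WIDTH; x++)
                drive_pixel(x, y, video_memory[y][x]);
    }

    int main(void)
    {
        display_controller_render();   /* display controller fills video memory    */
        video_controller_scanout();    /* video controller shows it on the screen  */
        return 0;
    }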
(Refer Slide Time: 3:40)
Let us start with the display controller. The image generation task is performed by the display controller. So, when we say that in computer graphics our primary objective is to generate an image, that generation task is performed by the display controller, and it takes input from the CPU of the host computer as well as from external input devices such as the mouse, keyboard, joystick, etc.
(Refer Slide Time: 4:12)
Based on these inputs it generates images; these images are generated following a multi-stage process which involves a lot of computation.
(Refer Slide Time: 4:30)
One concern here is that if all these computations are to be carried out by the host CPU, then it may get very little time to perform other computations. A computer is not meant only for display; it is supposed to perform other activities as well. If the CPU is engaged only with the computations relevant for display, then it will not have time to perform other computations, which in effect will affect the throughput of the system. In such a situation the host computer system would not be able to do much except graphics, which is definitely not a desirable situation.
(Refer Slide Time: 5:20)
To avoid such situations and increase the efficiency of the system, the job of rendering or displaying is usually carried out by a dedicated component of the system, which some or all of us have probably heard of, called the graphics card. In this card there is a dedicated processor; like the CPU, we have a dedicated processing unit for graphics computing, which is called the GPU or Graphics Processing Unit. Later on we will have one lecture on the basic idea of the GPU; for the time being we will just mention that there is a unit called the GPU in the graphics card.
(Refer Slide Time: 6:24)
The CPU assigns any graphics rendering task to this separate graphics unit, and we call this graphics unit the display controller, which is of course a generic name; in different systems it is called different things. So, essentially the display controller deals with performing the multi-stage operations required to create or synthesize a 2D image.
(Refer Slide Time: 7:15)
Now, the second component is the video memory. The output of the display controller is some representation of the 2D image, and the video memory, which, if we recollect from the generic architecture, takes the output of the display controller as input, stores this representation.
(Refer Slide Time: 7:29)
The display controller generates the images in digital format, strings of 0s and 1s, which is expected because a computer understands and processes information only in terms of 0s and 1s.
(Refer Slide Time: 7:45)
The place where we store it is simply the video memory, which is a dedicated part of the memory hierarchy. As we all know, in the memory hierarchy of a computing system we have RAM, ROM, secondary storage and cache at different levels; the video memory is also a part of this hierarchy, and typically it is situated in the separate graphics unit or graphics card. It is more popularly called VRAM or video RAM; probably many or all of you have heard of this term. So, the display controller generates the image representation, and that representation is stored in the video memory.
(Refer Slide Time: 8:48)
Then comes the video controller. Again, let us go back to the generic architecture: the video controller is situated here; it takes as input the information stored in the video memory and then does something to display the image on the screen.
(Refer Slide Time: 9:13)
So, what does it do? It essentially converts the digital image, represented in the form of 0s and 1s, to analogue voltages. Why? Because the voltages drive electromechanical arrangements which ultimately render the image on the screen. The screen essentially is an electromechanical mechanism; to run this mechanism we require voltages, and these voltages are generated by the video controller based on the 0s and 1s stored to represent the image.
(Refer Slide Time: 10:05)
In each display screen we have a basic unit of display, typically called a pixel, and the pixels are typically arranged in the form of a grid or matrix. If I draw a screen like this, we will have a pixel grid something like this, where each cell represents a pixel, essentially a matrix of pixels.
(Refer Slide Time: 10:40)
Now, these pixels are excited by electrical means, and when they are excited they emit light with specific intensities. These intensities give us the sensation of coloured images, or the sensation of colours. So, pixels are there on the screen, pixels are excited by electrical means, and after excitation they emit light with the specified intensity, which gives us a sensation of colour. If some portion of an image has the red colour, the corresponding pixels will emit light with the intensity of red so that we get the red colour sensation.
(Refer Slide Time: 11:30)
Now, the mechanism through which these pixels are excited is the job of the video controller. The video controller is essentially tasked with exciting pixels through electrical means by converting the digital input signal, 0s and 1s, into analogue voltage signals, which in turn activate the suitable electromechanical mechanism that is part of the controller. So that, in a very broad sense, is what a graphics system looks like; it has three unique components: the display controller, the video memory and the video controller.
The display controller is responsible for creating a digital representation of the image to be displayed, which is stored in the video memory, and then this image information is used to excite pixels on the screen, to emit light of specific intensity, to give a sensation of coloured images. This job of exciting pixels on the screen is done by the video controller.
In light of this broad description of a graphics system, let us now move to our next topic: types of graphics systems or graphics devices.
(Refer Slide Time: 13:08)
So, there are broadly two types of graphics systems, based on the method used to excite the pixels. What are these two types? One is the vector scan device, the other is the raster scan device.
(Refer Slide Time: 13:33)
Let us start with the vector scan device. This type of graphics device is also known as a random scan, stroke writing or calligraphic device.
(Refer Slide Time: 13:47)
In this type of device, when we are talking of an image, that image is represented, or assumed to be represented, as a composition of continuous geometric primitives such as lines and curves. So, any image is assumed to be composed of lines and curves, and when we render or display these images on the screen we essentially render these basic geometric shapes. We no longer talk about the whole image; instead we talk about the component lines and curves that define the image.
(Refer Slide Time: 14:37)
In other words, a vector scan device excites only those pixels of the pixel grid that are part of these primitives. To a vector scan device there is no such concept as a full image; instead, it only knows about the constituent geometric primitives, and it excites the pixels that are part of those primitives.
(Refer Slide Time: 15:10)
An example is shown here: consider the line in the left figure; in a truncated part of the grid, the corresponding pixels are highlighted in the right figure. To the vector scan device the image is not the line but only this set of pixels; it knows about these pixels rather than about the line itself, and these pixels are excited to generate the line image. Only these pixels are excited; other pixels are not excited. This is important: in the case of a vector scan device we excite only the pixels that are part of the primitives; other pixels are not touched.
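As a rough illustration of this selective excitation (a sketch only; excite_pixel is a hypothetical stand-in, and proper line scan conversion algorithms come up in Week 7), the following C code visits and "excites" only the pixels lying along a line, using simple DDA-style sampling, and never touches any other pixel of the grid.

    #include <math.h>
    #include <stdio.h>

    /* Hypothetical stand-in for exciting one pixel of a vector scan display. */
    static void excite_pixel(int x, int y)
    {
        printf("excite (%d, %d)\n", x, y);
    }

    /* Excite only the pixels on the line from (x0, y0) to (x1, y1). */
    static void scan_line_primitive(float x0, float y0, float x1, float y1)
    {
        float dx = x1 - x0, dy = y1 - y0;
        int steps = (int)fmaxf(fabsf(dx), fabsf(dy));   /* samples along the line */
        float x = x0, y = y0;
        for (int i = 0; i <= steps; i++) {
            excite_pixel((int)(x + 0.5f), (int)(y + 0.5f));  /* round to a pixel  */
            if (steps > 0) { x += dx / steps; y += dy / steps; }
        }
    }

    int main(void)
    {
        scan_line_primitive(1.0f, 1.0f, 8.0f, 4.0f);   /* a single line primitive */
        return 0;
    }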
(Refer Slide Time: 16:04)
As a result, what do we need to do? We need to selectively excite pixels, which is a very tough job requiring high precision, obviously, and complex hardware.
(Refer Slide Time: 16:26)
This in turn makes these devices costly, because it takes money to develop such high-precision hardware. Also, due to this selective exciting, vector scan devices are good for rendering wireframes, which are basically outline images. For complex scenes involving a lot of filled areas, flicker becomes visible because of this mechanism of selective exciting, which is not a good thing.
(Refer Slide Time: 17:18)
The other type of graphics device is the raster scan device. In a raster scan device an image is viewed as represented by the whole pixel grid. Earlier we considered an image to be represented by only a subset of the whole pixel grid, but here we are considering the whole pixel grid and not only the selected pixels representing the primitives. So, when we render an image on a raster scan device, all the pixels are considered; in the case of a vector scan device we considered only a subset and the other pixels were not touched, but here all the pixels are considered. And how do we consider them?
(Refer Slide Time: 18:08)
By considering the pixels in a sequence. What is the typical sequence? It is typically left to right, top to bottom. So, if we have a grid like this, then we typically start from the left, move towards the right end, then go to the next row, move towards the right end, and continue in this way, with this kind of movement, till we reach the lower right end pixel.
(Refer Slide Time: 18:41)
The same thing is mentioned here: the controller starts with the top left pixel and checks if the pixel needs to be excited; that information is stored in the memory. If it needs to be excited, it excites the pixel; otherwise it leaves it unchanged. But note that the pixel is considered for excitation and action is taken accordingly.
(Refer Slide Time: 19:16)
It then moves to the next pixel on the right and repeats the steps till the last pixel in the row is reached.
(Refer Slide Time: 19:29)
Then the controller considers the first pixel in the next row and repeats the steps and in this
manner it continues till the right bottom pixel of the grid.
(Refer Slide Time: 19:43)
Now, this process of considering pixels in sequence, or such sequential consideration of pixels, is known as scanning; this is the more generic term used. In raster scan devices, pixel scanning takes place, and each row of the grid is known as a scan line. So, this sequential consideration is called scanning, and each row in the pixel grid is known as a scan line.
(Refer Slide Time: 20:23)
Let us consider the same example here. Earlier we considered only the pixels that are part of this line, only these pixels; now we are considering all pixels, starting from the top left corner, moving in this direction, then this row, and so on till this point. Each row is a scan line, and as you can see in the right-hand figure, the white pixels mean they need not be excited.
The system considered each such pixel, found that it need not be excited and moved to the next pixel. The filled circles indicate excited pixels, which represent the line; that information was also there in the memory, and the video controller found that these pixels needed to be excited, so it excited them. In the process it considered all the pixels in the grid and excited only those which needed to be excited.
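A minimal sketch of this scanning logic in C is given below (hypothetical names and a tiny grid); each row of the traversal corresponds to one scan line, every pixel is visited, and only the pixels whose stored value says so are excited.

    #include <stdbool.h>
    #include <stdio.h>

    #define COLS 8
    #define ROWS 4

    /* Frame buffer: true means "this pixel should be excited". */
    static bool frame[ROWS][COLS];

    static void excite_pixel(int col, int row)
    {
        printf("excite (%d, %d)\n", col, row);
    }

    /* Raster scan: left to right within a row, rows taken top to bottom. */
    static void raster_scan(void)
    {
        for (int row = 0; row < ROWS; row++)        /* each row is one scan line  */
            for (int col = 0; col < COLS; col++)    /* visit every pixel in order */
                if (frame[row][col])                /* check the stored value     */
                    excite_pixel(col, row);         /* excite it, else move on    */
    }

    int main(void)
    {
        frame[1][1] = frame[1][2] = true;           /* pixels of a rough "line"   */
        frame[2][3] = frame[2][4] = true;
        raster_scan();
        return 0;
    }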
(Refer Slide Time: 21:42)
Now, the video memory of a raster scan system is more commonly known as the frame buffer, where each location corresponds to a pixel. So the size of the frame buffer is equal to the screen resolution, the size of the pixel grid, which is quite obvious of course.
(Refer Slide Time: 22:07)
Now, there is one interesting fact you should be aware of. Display processors are typically very fast; they work at the speed of the CPU, that is, at the nanosecond scale, so any operation is done in very little time, at the nanosecond level. On the other hand, video controllers are typically much, much slower compared to display controllers, because they involve electromechanical arrangements which take time to work.
Their typical speed is at the millisecond level or millisecond scale. Clearly, there is a mismatch between the speed at which the display processor can produce output and the speed at which the video controller can take that output as input.
(Refer Slide Time: 23:15)
Now, assume that there is only one video memory or frame buffer, and the display controller output is fed directly as input to the video controller through that frame buffer. The output is being produced very fast, but it is being consumed at a much lower rate, so the output may get overwritten before the entire output is taken by the video controller as input, which in turn may result in the image getting distorted, because before the current input is processed the next input is ready and has overwritten the current input. To address this concern, we use multiple frame buffers.
(Refer Slide Time: 24:14)
A single buffer is not sufficient; we require at least two buffers, and if two buffers are used it is called double buffering. Of course, there are cases with more than two buffers. In the case of double buffering, one buffer or video memory is called the primary and the other is called the secondary. The video controller takes input from one of the buffers, typically the primary buffer, whereas the display controller fills up the other, the secondary buffer. When the video controller finishes reading input from the primary buffer, the primary becomes the secondary and the secondary becomes the primary; a role reversal takes place and the process repeats. In this way the problem of overwriting the image information can be avoided.
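The following is a minimal sketch of this role reversal in C (hypothetical names; real drivers synchronize the swap with the display refresh): the display controller always writes into the back buffer, the video controller always reads from the front buffer, and swapping is just an exchange of two pointers.

    #include <string.h>

    #define WIDTH  8
    #define HEIGHT 4

    typedef unsigned char Frame[HEIGHT][WIDTH];

    static Frame buffer_a, buffer_b;
    static Frame *front = &buffer_a;   /* read by the video controller (primary)      */
    static Frame *back  = &buffer_b;   /* filled by the display controller (secondary) */

    /* Display controller side: draw the next image into the back buffer. */
    static void render_next_frame(void)
    {
        memset(*back, 0, sizeof(Frame));
        (*back)[1][2] = 255;           /* "draw" something into the back buffer */
    }

    /* Called when the video controller finishes scanning out the front buffer:
       the two buffers simply exchange roles. */
    static void swap_buffers(void)
    {
        Frame *tmp = front;
        front = back;
        back  = tmp;
    }

    int main(void)
    {
        render_next_frame();   /* fill secondary while primary is being displayed */
        swap_buffers();        /* role reversal; the process then repeats         */
        return 0;
    }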
(Refer Slide Time: 25:17)
Another interesting thing to note here is called refreshing. The light emitted from the pixel elements, which gives us the sensation of colour, starts decaying over time. It is not the case that the intensity of the emitted light remains the same throughout the display session; over time it starts decaying, so the intensity changes, which leads to fading of the scene after some time. Moreover, pixels in a scene may get excited at different points of time, thus the pixels may not fade in sync. In an image it is not guaranteed that all pixels fade in sync in a way that is not perceptible to the user, so this may lead to image distortion.
(Refer Slide Time: 26:29)
To avoid that situation, what is done is to keep exciting the pixels periodically, which is known as refreshing. Whatever the excitation value is, with that value there is a periodic excitation of the whole pixel grid; it is not a one-time activity. One important consideration here is the refresh rate, the rate at which we should keep refreshing the screen so that the changes are not perceptible to the human eye. The number of times a scene is refreshed per second is known as the refresh rate, which is expressed in Hz or Hertz, the usual unit of frequency. In the case of displays it is typically taken to be 60 Hertz, that is, the screen should be refreshed 60 times per second.
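As a quick back-of-the-envelope check (just an illustration, not from the lecture slides), the refresh period corresponding to a 60 Hz refresh rate can be computed as follows; it is the time budget within which the whole pixel grid must be re-excited.

    #include <stdio.h>

    int main(void)
    {
        double refresh_rate_hz = 60.0;                /* typical display refresh rate */
        double period_ms = 1000.0 / refresh_rate_hz;  /* time between two refreshes   */
        printf("At %.0f Hz the grid is re-excited every %.2f ms\n",
               refresh_rate_hz, period_ms);           /* prints 16.67 ms              */
        return 0;
    }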
(Refer Slide Time: 27:33)
So, what are the pros and cons of a raster scan device? Clearly, since we are not trying to excite pixels selectively, we do not require very high precision hardware. Scanning is a very straightforward job, so low-precision hardware can do the job. Also, it is good for generating complex images, since we are considering all the pixels anyway, so it will not lead to flicker, unlike a vector scan device.
(Refer Slide Time: 28:10)
Due to these benefits, one being low cost and the other the ability to generate complex images, most of the displays that we see around us are based on the raster scan concept. So, you get to see only, or mostly, raster scan devices around us, because they are low cost and good at generating complex images.
(Refer Slide Time: 28:43)
Now, these two, the vector scan device and the raster scan device, are from the point of view of hardware. There are two closely related terms, which you may have heard of, called vector graphics and raster graphics.
(Refer Slide Time: 28:58)
These two are not related to any hardware characteristics, unlike the previous terms vector scan and raster scan.
(Refer Slide Time: 29:10)
In the case of vector graphics, what we actually refer to is the way the image is represented. When we are talking of a vector graphics image, we are talking of the representation in terms of continuous geometric primitives such as lines and curves. So, if I say that a particular image is a vector graphics image, that means I am representing that image in terms of its constituent geometric primitives, lines and curves.
(Refer Slide Time: 29:50)
In case of raster graphics, the representation is different. As in a raster scan device, what we refer to is representing the image as the whole pixel grid, with the pixels that are supposed to be excited in an on state and the others in an off state. So if we represent an image as a raster graphics image, the image is stored in the form of the whole pixel grid, where it is indicated which pixels should be in the excited, or on, state.
(Refer Slide Time: 30:48)
But again it should be noted that vector graphics and raster graphics are terms that indicate the way images are represented; they have nothing to do with the underlying hardware. So even if I represent an image as vector graphics, I can still use a raster scan device to display that image, and vice versa: if I represent an image as raster graphics, I can still use a vector scan device to render it.
So we should always be clear about the distinction between these terms. Vector scan device and raster scan device are related to the way scanning takes place at the hardware level, whereas vector graphics and raster graphics refer to the way images are represented internally, rather than how they are rendered through the actual display hardware.
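As a small illustration of the difference in representation (the names here are made up for the example), the same straight line could be stored either way:

    /* Vector graphics: the line is stored as a geometric primitive,
       independent of any particular pixel grid. */
    struct line2d {
        float x0, y0;   /* one end point   */
        float x1, y1;   /* other end point */
    };

    /* Raster graphics: the whole pixel grid is stored, with pixels that
       should be excited marked on (1) and the rest off (0). */
    #define GRID_W 8
    #define GRID_H 8
    unsigned char raster_image[GRID_H][GRID_W];   /* 1 = on, 0 = off */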
85
(Refer Slide Time: 32:00)
Now let us discuss another important topic, that is, colour display. So far we have been implicitly assuming that the pixels are monochromatic, but in reality we get to see images that have colours, so how do they work? In a black and white display, each pixel may contain one type of element; for example, if you are aware of CRT or cathode ray tube displays and their internal mechanism, then you may know that each pixel on a CRT display has a single phosphor dot. When we excite it to generate different light intensities, the result is different shades of grey, because there is only a single phosphor dot.
86
(Refer Slide Time: 33:05)
Like the illustration shown here, this is for a CRT or cathode ray tube. Of course, nowadays it is very rare to see such displays, but for pedagogical purposes it is good to demonstrate the idea in terms of a CRT. The left side shows a typical CRT display and on the right side we can see how it works internally.
So it has a tube within which there are certain arrangements; these arrangements together constitute the video controller component of the generic system that we discussed earlier. We have the cathode, heater and anode arrangements, then a grid to control the electron flow, then vertical and horizontal deflection plates for deflecting the electron beam.
Essentially this arrangement generates a stream of electrons which hits a point on the screen, a pixel; on being hit, the phosphor dot generates an intensity which results in a particular shade of grey. That, in brief, is how CRTs work, and other displays work in a broadly similar, though not identical, way.
87
(Refer Slide Time: 34:44)
So what happens in case of a colour image? In that case each pixel contains more than one type of element. For a CRT, instead of having one phosphor dot we can have three types of phosphor dots representing the three primary colours, namely red, green and blue. When excited, each of these phosphor dots generates intensities of its primary colour: the red dot generates red intensities, the green dot generates green intensities and the blue dot generates blue intensities. When these intensities are combined, we get the sensation of the desired colour.
(Refer Slide Time: 35:44)
88
So, as I said, each element is capable of generating different shades of its colour, and when these shades combine they give us the desired sensation of colour. Schematically it looks somewhat like this figure, where we have three electron beams hitting the three elements separately. Special arrangements called masks are there to guide the electron beams so that they hit the specific group of phosphor dots making up a pixel, like the three shown here, and finally we get the combination of different shades as the desired colour.
(Refer Slide Time: 36:48)
Now there are two ways to generate these coloured images. Essentially, we want to have some values to guide the excitation of the individual types of elements in a colour display, and there are two ways to do that. One is direct coding: in this case, the individual colour information for each of the red, green and blue elements of a pixel is stored directly in the corresponding frame buffer location.
So in the frame buffer itself we are storing what the intensities of these individual colours should be. Clearly that requires a larger frame buffer compared to a black and white frame buffer, because now in each location we are storing three values instead of one, and the frame buffer should be capable of storing the entire combination of RGB values, which is also called the colour gamut. Later on we will learn more about colour gamuts, but the point to be noted here is that if we go for direct coding, then we require a large frame buffer.
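To get a rough sense of the storage involved (the numbers here are only an illustration, not figures from the lecture): with 8 bits for each of R, G and B, every pixel needs 3 bytes, so a 1920 x 1080 display needs about 1920 x 1080 x 3 bytes, roughly 6.2 MB, per frame buffer, about three times what a greyscale buffer with one byte per pixel would need, and double that again if double buffering is used.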
(Refer Slide Time: 38:25)
Another way is the colour lookup table, where we use a separate lookup table, which is of course a portion of memory, in which each entry contains a specific RGB combination, and the frame buffer location contains a pointer to the appropriate entry in the table. So the frame buffer does not store the colour values directly; instead it stores a reference to the table entry that holds the actual values, as illustrated in this figure. As you can see, this frame buffer location stores a pointer to this particular table entry, which stores the values of R, G and B, and these are the values used to excite the pixel accordingly.
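A minimal sketch of this indirection in C might look as follows; the table size, resolution and names are assumptions for illustration only.

    /* Colour lookup table scheme (illustrative). */
    struct rgb { unsigned char r, g, b; };

    #define TABLE_SIZE 256
    #define WIDTH      640
    #define HEIGHT     480

    struct rgb    clut[TABLE_SIZE];            /* each entry holds one RGB combination   */
    unsigned char frame_buffer[HEIGHT][WIDTH]; /* each entry is an index into the table  */

    /* The colour of a pixel is fetched indirectly through the table. */
    struct rgb pixel_colour(int x, int y)
    {
        return clut[frame_buffer[y][x]];
    }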
90
(Refer Slide Time: 39:19)
Now if we want the CLT, or colour lookup table, scheme to work, then we have to know the subset of colours that are going to be required in the generation of the images. The table cannot store all possible combinations of R, G and B values; it stores only a subset of those combinations, essentially a subset of the entire set, or colour gamut, and we must know that subset in advance to make the scheme work. If that assumption does not hold, this method is not going to work. Nowadays, however, we do not have any problem with the size of the frame buffer, because memory is cheap.
So nowadays almost all graphics systems go for the direct coding method, but in earlier generations of graphics systems, when memory was a significant factor in the overall cost, the CLT was much in use. In that period, of course, the screens were not equipped to display all sorts of complex images, and mostly wireframes were displayed. At that time CLTs were much more useful, but nowadays we do not need to bother about the CLT much, unless there is some specific application, and we can directly go for the direct coding method.
91
(Refer Slide Time: 40:56)
So let us summarise what we have learnt today. We got introduced to a basic graphics system, which consists of three unique components, namely the display controller, the video memory and the video controller. The display controller is tasked with generating the image, which is stored in the video memory and which is used by the video controller to render it on a computer screen.
We also learnt, in brief, about different types of graphics systems, namely vector scan devices and raster scan devices, and the associated concepts, namely vector graphics, raster graphics, refreshing, frame buffers and so on. We also got some idea of how colour images are generated at the hardware level.
These are basic concepts which will be useful in our subsequent discussions. In the next lecture we will get an introduction to the basic processing that is required to generate a 2D image, which is the job of the display controller. This processing actually consists of a set of stages which are collectively known as the graphics pipeline, so in the next lecture we will have an introduction to the overall pipeline.
92
(Refer Slide Time: 42:41)
The topics that I have covered today can be found in this book chapter 1, section 1.3 and you
are also advised to go through the details on the CRT or the cathode ray tube display that is
mentioned in this section, although I have not covered it here, for better understanding of the
topics. So we will meet again in the next lecture, thank you and goodbye.
93
Computer Graphics
Dr. Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 4
Introduction to 3D Graphics Pipeline
Hello and welcome to lecture number 4 in the course Computer Graphics.
(Refer Slide Time: 00:39)
Before we start, we will briefly recap what we have discussed in the previous lectures. We started with a basic introduction to the field, where we discussed the historical evolution as well as the issues and challenges encountered by workers in this field. This was followed by a basic introduction to the graphics system: whenever we talk about computer graphics, we implicitly refer to some hardware platform on which some software works.
The basic hardware structure, or architecture, of a graphics system was introduced in one of the previous lectures. Today we are going to introduce the other component of the graphics system, namely the graphics software. Of course, at this stage we will restrict ourselves to a basic introduction; the software stages will be discussed in detail in subsequent lectures.
94
(Refer Slide Time: 02:06)
So let us recap what we have learned about a generic architecture of a graphic system. As we
mentioned in one of our earlier lectures, so there are 3 unique components of a graphic system.
One is the display controller, one is the video memory and the 3rd one is a video controller.
What the display controller does? It essentially takes input from the host computer as well as
from some external input devices which are used to perform interactive graphics. And based on
that input, it creates a representation, a digital representation of a 2D image. That is the job of the
display controller.
Now that representation, which the controller generates, is stored in a memory called the video memory. The content of the video memory is given as input to the 3rd component, the video controller, which takes the memory content as input and then generates certain voltage levels to drive the electro-mechanical arrangements that are required to ultimately display the image on a computer screen.
As you may recollect, we also mentioned that most of these things are done separately, without involving the CPU of the host computer. Typically, computers come with a component called a graphics card, which probably all of you have heard of, and which contains the video memory, the video controller and the display controller components. The processing unit that is typically part of the display controller is known as the GPU, or graphics processing unit; this is separate from the CPU, the main processing unit of the host computer, and the GPU is designed to perform graphical operations.
(Refer Slide Time: 04:48)
Now, in this generic architecture, as we said, the display controller generates the representation of an image. What does that representation contain? It contains some color values, or intensity values, in a specific format, which are ultimately used to generate the particular sensation of color on the screen. From where are these color values obtained? Let us go into some details of the process involved in generating these color values.
96
(Refer Slide Time: 05:29)
Now these color values are obtained by the display processor through some computations that
are done in stages. So there are a series of computations and these computations ultimately result
in the generation of the color values.
(Refer Slide Time: 06:00)
Now these stages or the series of steps that are involved in the generation of color values are
together called the graphics pipeline. This is a very important terminology and in our subsequent
lectures we will discuss in details the stages of the pipeline, that actually will be the crux of this
course.
97
(Refer Slide Time: 06:30)
But today we are going to introduce the pipeline for our benefit, so that we can understand the
later discussion better. So let us get some introductory idea on the pipeline and its stages.
(Refer Slide Time: 06:47)
There are several stages, as I mentioned. The first stage is essentially defining the objects. When we talk of creating a scene or an image, it contains objects, and there needs to be some way to represent these objects in the computer. The activity where we define the objects that are going to be part of the image constitutes the first stage of the pipeline, which is called the object representation stage. For example, as you can see in this figure on the screen, we want to generate the image of a cube with color values as shown on the right hand part of the screen.
Now this image contains an object, a cube, and on the left hand side here we have defined this cube. When we talk of defining, what we mean, as we can understand intuitively, is that defining the cube involves specifying its vertices, or its edges as pairs of vertices, with respect to some reference frame. That is the definition in this simple case.
Of course a cube is a very simple object; for more complex objects we may require more complex definitions, more complex ways of representing the objects.
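As a small sketch of what such a definition might look like in code (the coordinates and the vertex numbering are my own choices for illustration, not the figure's), a unit cube can be written down as eight vertices and twelve edges given as pairs of vertex indices:

    /* A unit cube in its local reference frame (illustrative). */
    float cube_vertices[8][3] = {
        {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0},   /* back face  */
        {0, 0, 1}, {1, 0, 1}, {1, 1, 1}, {0, 1, 1}    /* front face */
    };

    int cube_edges[12][2] = {
        {0, 1}, {1, 2}, {2, 3}, {3, 0},   /* back face edges           */
        {4, 5}, {5, 6}, {6, 7}, {7, 4},   /* front face edges          */
        {0, 4}, {1, 5}, {2, 6}, {3, 7}    /* edges joining the faces   */
    };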
(Refer Slide Time: 08:53)
Accordingly, several representation techniques are available for efficient creation and manipulation of images. Note the term efficient: when we use this term, we refer to the fact that displays are different and the underlying hardware platforms are different. The computational resources we have to display something on a desktop or a laptop are likely to be different from those we have on a small mobile device or a wearable device screen.
Accordingly, our representation techniques should be able to utilize the available resources to the extent possible and should allow the users to manipulate images in an interactive setting. So the efficiency is essentially with respect to the available computing resources and the way to make optimum use of those resources.
(Refer Slide Time: 10:32)
Once we define those objects, they are passed through the subsequent pipeline stages to generate and render images on the screen. So the first stage is defining the objects, and in the subsequent stages we take these object definitions as input, generate the image representation and render it on the screen.
100
(Refer Slide Time: 11:00)
What are those subsequent stages? The first one is modeling transformation, which is the 2nd stage of the pipeline. As I said, when we are defining an object we are considering some reference frame with respect to which we define it. For example, take the cube that we have seen earlier: to define the cube, we need to define its coordinates, but coordinates with respect to what? There we assume certain reference frames.
Now those reference frames with respect to which the objects are defined are more popularly
called local coordinate of the object. So the objects are typically defined in their own or local
coordinate system. Now multiple objects are put together to create a scene, so each object is
defined in its own or local coordinate system and when we are combining them we are
essentially trying to combine these different reference frames.
By combining those different objects, we are creating a new assembly of objects in a new reference frame, which is typically called the world coordinate system. Take the example shown in this figure. As you can see, there are many objects: cubes, spheres, cylinders and others. Each of these objects is defined in its own coordinate system.
Now in the whole scene, consisting of all the objects, we have assembled all those objects from their own coordinate systems. But here again we are assuming another coordinate system in terms of which this assembly of objects is defined; the coordinate system in which we have assembled them is called the world coordinate system. So there is a transformation, transforming an object from its own coordinate system to the world coordinate system. That transformation is called modeling transformation, which is the 2nd stage of the graphics pipeline.
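As a sketch of what this stage does computationally, a point given in an object's local coordinates can be carried into world coordinates by multiplying it with a 4 x 4 modeling matrix (using homogeneous coordinates, which the course develops in the transformation lectures); the function below is only an illustrative sketch, not a library routine.

    /* Apply a 4x4 modeling transform M to a local-coordinate point. */
    void local_to_world(const float M[4][4], const float local[3], float world[3])
    {
        float p[4] = { local[0], local[1], local[2], 1.0f };   /* homogeneous point */
        float q[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                q[i] += M[i][j] * p[j];                        /* row times column  */

        world[0] = q[0];   /* assuming the last row of M is (0, 0, 0, 1),  */
        world[1] = q[1];   /* so q[3] stays 1 and no division is needed    */
        world[2] = q[2];
    }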
(Refer Slide Time: 13:58)
So in the first stage we define the objects, in the second stage we bring those objects together in
the world coordinate system through modeling transformation which is also sometime known as
the geometric transformation. So both the terms are used either modeling transformation or
geometric transformation that is the 2nd stage of the graphics pipeline.
102
(Refer Slide Time: 14:14)
Once the scene is constructed, the objects need to be assigned colors, which is done in the 3rd stage of the pipeline, called the lighting or illumination stage. Take for example the images shown here: in the left figure we have simply the object, in the right figure we have applied colors on the object surfaces. As you can see, from the way we have applied colors it becomes clear which surface is closer to the viewer and which surface is further away.
In other words, it gives us a sensation of 3D, whereas without colors, as in the figure shown here, that clarity is not there. So to get a realistic image which gives us a sensation of 3D, we have to assign colors. Assignment of colors is the job of the 3rd stage, which is called the lighting or illumination stage.
103
(Refer Slide Time: 15:36)
As you are probably aware, color is a psychological phenomenon, and it is linked to the way light behaves, in other words, to the laws of optics. What do we do in the 3rd stage? We essentially try to mimic these optical laws, that is, the way we perceive color in the real world, and based on that we assign colors in the synthesized scenes.
(Refer Slide Time: 16:17)
So first we define an object, in the 2nd stage we bring objects together to create a scene, and in the 3rd stage we assign colors to the object surfaces in the scene. Up to this point, everything we were doing was in a 3D setting, in the world coordinate system. But when we get to see an image, the computer screen is 2D, so essentially what we require is a mapping from the 3D world coordinate scene to the 2D computer screen. That mapping is done in the 4th stage, the viewing transformation.
In this stage we perform several activities, which together are similar to taking a photograph. Consider yourself to be a photographer: you have a camera and you are capturing a photo of a scene. What do you do? You place the camera near your eye, focus on some object you want to capture, capture it with the camera, and then see it on the camera display or screen, if you have a digital camera.
(Refer Slide Time: 18:01)
This process of taking a photograph can be mathematically analyzed into several intermediate operations, which in themselves form a pipeline, a pipeline within the broader graphics pipeline. So the 4th stage, viewing transformation, is itself a pipeline which is part of the overall graphics pipeline. This pipeline, where we transform a 3D world coordinate scene to a 2D view plane scene, is called the viewing pipeline.
105
(Refer Slide Time: 18:50)
What do we do in this pipeline? We first set up a camera coordinate system, which is also referred to as the view coordinate system. Then the world coordinate scene is transformed to the view coordinate system; this stage is called viewing transformation. So we have set up a new coordinate system, the camera coordinate system, and then we transform the world coordinate scene to the camera coordinate scene.
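As a rough sketch of what setting up the camera frame involves (following the common look-at construction; the details here are an assumption for illustration, not something derived in this lecture), the camera position, a look-at point and an up direction give three orthonormal axes, and a world point can then be expressed in that frame:

    #include <math.h>

    typedef struct { float x, y, z; } vec3;

    static vec3  sub(vec3 a, vec3 b)   { return (vec3){ a.x - b.x, a.y - b.y, a.z - b.z }; }
    static float dot(vec3 a, vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
    static vec3  cross(vec3 a, vec3 b) {
        return (vec3){ a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    }
    static vec3  norm(vec3 a) { float l = sqrtf(dot(a, a)); return (vec3){ a.x / l, a.y / l, a.z / l }; }

    /* Express a world-coordinate point p in the camera (view) frame. */
    vec3 world_to_view(vec3 p, vec3 eye, vec3 at, vec3 up)
    {
        vec3 n = norm(sub(eye, at));    /* axis pointing away from the scene */
        vec3 u = norm(cross(up, n));    /* camera "right" axis               */
        vec3 v = cross(n, u);           /* camera "up" axis                  */
        vec3 d = sub(p, eye);           /* point relative to camera position */
        return (vec3){ dot(d, u), dot(d, v), dot(d, n) };
    }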
106
(Refer Slide Time: 19:30)
From there we make another transformation, now we transfer the scene to a 2D view plane. Now
this stage is called projection transformation. So we have viewing transformation followed by
projection transformation.
(Refer Slide Time: 19:49)
For projection, we define a region in the view coordinate space which is called the view volume. For example, in the figure shown here, the frustum defines a view volume. We want to capture the objects that are present within this volume; objects outside it we do not want to capture. That is typically what we do when we take a photograph: we select some region of the scene and then capture it. So whichever objects are outside will not be projected, and whichever are inside the volume will be projected.
(Refer Slide Time: 20:48)
So here we require one additional process, a process to remove objects that are outside the view
volume. Now those objects can be fully outside or can be partially outside. So in both the cases
we need to remove them. So when an object is fully outside we completely remove it and when
an object is partially outside we clip the object and keep only the part that is within the view
volume, the outside part we remove. The overall process is called clipping.
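As a minimal illustration of the idea (only for a single point and a box-shaped view volume; clipping lines and polygons against a frustum is more involved and is covered in the clipping lectures later), a point is kept only if it lies within the volume's bounds, which are assumed parameters here:

    #include <stdbool.h>

    /* Keep a point only if it lies inside an axis-aligned view volume. */
    bool inside_view_volume(float x, float y, float z,
                            float xmin, float xmax,
                            float ymin, float ymax,
                            float zmin, float zmax)
    {
        return x >= xmin && x <= xmax &&
               y >= ymin && y <= ymax &&
               z >= zmin && z <= zmax;
    }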
108
(Refer Slide Time: 21:22)
Also, when we are projecting, we consider a viewer position, that is, where the photographer is situated and in which direction he or she is looking. Based on that position, some objects may appear fully visible, some may appear partially visible, whereas other objects will be invisible, even though all of them may be within the same view volume.
For example, with respect to this particular view position, if this object is fully behind that object then it will be invisible, if it is partially behind then it will be partially visible, and if the two are not aligned along the same direction then both of them will be fully visible. We take care of this fact also before projection, which requires some further operations and computations.
109
(Refer Slide Time: 22:32)
So to capture this viewing effect, the operations that we perform are typically called hidden
surface removal operations or similarly visible surface detection operations. So to generate
realistic viewing effect along with clipping what we do is we perform the hidden surface removal
or visible surface detection operations.
(Refer Slide Time: 23:06)
So after the clipping and hidden surface removal operations, we project the scene on the view plane, which is a plane defined in the view coordinate system.
110
(Refer Slide Time: 23:21)
Now, there is one more transformation. Suppose, in the right hand figure, this is the object projected here on the view plane. The object may be displayed on any portion of the computer screen; it need not be at exactly the same position as on the view plane. For example, this object may be displayed in a corner of the display. So we differentiate between two concepts here: one is the region on the view plane, which is typically called the window, and the other is the display region on the actual display screen, which we call the viewport. So one more transformation remains in the viewing pipeline, that is, transferring the content from the window to the viewport. This is called the window-to-viewport transformation.
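The mapping itself is just a scale followed by a translation; here is a sketch for a single point, where the window extents (wxmin..wymax) and viewport extents (vxmin..vymax) are taken as given parameters.

    /* Map a point (xw, yw) from the window to the viewport. */
    void window_to_viewport(float xw, float yw,
                            float wxmin, float wxmax, float wymin, float wymax,
                            float vxmin, float vxmax, float vymin, float vymax,
                            float *xv, float *yv)
    {
        float sx = (vxmax - vxmin) / (wxmax - wxmin);   /* horizontal scale factor */
        float sy = (vymax - vymin) / (wymax - wymin);   /* vertical scale factor   */
        *xv = vxmin + (xw - wxmin) * sx;
        *yv = vymin + (yw - wymin) * sy;
    }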
111
(Refer Slide Time: 24:44)
So, in summary, in the 4th stage there are 3 transformations. What are they? First we transform the world coordinate scene to the camera, or view, coordinate scene. Then, from the camera coordinate scene, we perform the projection transformation to the view plane, and then the view plane window is transformed to the viewport. These are the 3 transformations.
Along with those, there are 2 major operations that we perform here: one is clipping, that is, clipping out the objects that lie outside the view volume, and the other is hidden surface removal, which means creating a realistic viewing effect with respect to the viewer position. That is the 4th stage.
So, in the first stage we defined objects, in the 2nd stage we combined those objects into the world coordinate scene, in the 3rd stage we assigned colors to the object surfaces in the world coordinate scene, and in the 4th stage we transformed the world coordinate scene to the image on the viewport through a series of transformations which form a sub-pipeline within the overall pipeline.
Those sub-pipeline stages are viewing transformation, projection transformation and window-to-viewport transformation. This sub-pipeline is called the viewing pipeline, which is part of the overall graphics pipeline, and in the 4th stage, along with this viewing pipeline, we also have two more operations performed, namely clipping and hidden surface removal.
112
(Refer Slide Time: 27:17)
One more stage remains, the 5th stage, which is called scan conversion or rendering. We mentioned earlier that we transform to a viewport, and the viewport is an abstract representation of the actual display. If you recollect our discussion on raster displays, the actual display contains a pixel grid.
So the display contains locations which are discrete; we cannot assume that any arbitrary point has a corresponding point on the screen. For example, if in our image we have a vertex at location (1.5, 2.5), on the screen we cannot have such a location, because on screen we only have integer coordinate values due to the discrete nature of the grid. We have pixels located at, say, (1, 1), (1, 2), (2, 2) or (3, 3), rather than at real-valued positions like (1.5, 2.5).
So if we get a vertex in our image located at (1.5, 2.5), we must map it to integer coordinates. The stage where we perform this mapping is called the scan conversion stage, which is the 5th and final stage of the pipeline. For example, consider the line shown here, with end points (2, 2) and (7, 5). The intermediate points on the line may not have integer coordinate values, but in the final display we can have pixels (the circles shown) only at integer coordinate values.
So we have to map these non-integer coordinates to integer coordinates. That mapping is the job of the 5th, scan conversion, stage, which is also called rasterization. As you can see, it may lead to some distortion, because due to the mapping we may not get the exact points on the line; instead we have to satisfy ourselves with approximate points that lie close to the actual line. For example, this pixel here, or this one, is not exactly on the line but is the closest possible pixel with respect to the line.
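As a naive sketch of this mapping (a simple DDA-style approach; more efficient algorithms are discussed in the scan conversion lectures later in the course, and set_pixel here is a hypothetical placeholder), the line from (2, 2) to (7, 5) can be approximated by sampling the ideal line and rounding to the nearest pixel:

    #include <math.h>
    #include <stdlib.h>

    static void set_pixel(int x, int y) { (void)x; (void)y; /* hypothetical: excite pixel (x, y) */ }

    /* Approximate the line (x0, y0)-(x1, y1) with pixels by rounding. */
    void scan_convert_line(int x0, int y0, int x1, int y1)
    {
        int steps = abs(x1 - x0) > abs(y1 - y0) ? abs(x1 - x0) : abs(y1 - y0);
        if (steps == 0) { set_pixel(x0, y0); return; }

        float dx = (x1 - x0) / (float)steps;
        float dy = (y1 - y0) / (float)steps;
        float x = (float)x0, y = (float)y0;

        for (int i = 0; i <= steps; ++i) {
            set_pixel((int)lroundf(x), (int)lroundf(y));   /* nearest pixel to the ideal point */
            x += dx;
            y += dy;
        }
    }

    /* scan_convert_line(2, 2, 7, 5) lights pixels close to the ideal line. */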
(Refer Slide Time: 30:19)
So what is the concern? How do we minimize the distortion? This distortion has a technical name, the aliasing effect; where this name originated we will discuss later. Our concern is to eliminate or reduce the aliasing effect to the extent possible, so that we do not get to perceive too much distortion. To address this concern, several techniques are used, which are called anti-aliasing techniques. These are used to make the image look as smooth as possible, that is, to reduce the effect of aliasing.
114
(Refer Slide Time: 31:21)
So let us summarize what we have discussed so far. The graphics pipeline contains 5 stages: the 1st stage is object representation, the 2nd stage is modeling transformation, the 3rd stage is assigning colors, or lighting, the 4th stage is the viewing pipeline, which is itself a sub-pipeline involving viewing transformation, clipping, hidden surface removal, projection transformation and window-to-viewport transformation, and the 5th and final stage is scan conversion. So there are broadly 5 stages involved.
Each of these stages has its own reference frame, its own coordinate system. In stage 1 we deal with the local coordinate systems of the objects; in stage 2 we deal with the world coordinate system, so stage 2 essentially transforms from local to world coordinates. In stage 3 we again deal with world coordinates: when we are assigning color, we are essentially assuming that the objects are defined in the world coordinate system.
In stage 4, again, different coordinate systems are used. The first transformation, viewing transformation, involves a transformation from the world coordinate system to the view, or camera, coordinate system. Clipping is performed in the view coordinate system, and hidden surface removal is also performed in the view coordinate system. Then we perform the projection transformation, which transforms the content of the 3D view coordinate system to a 2D view coordinate system.
115
So from the 3D view coordinate system we transfer the content to a 2D view coordinate system. In the window-to-viewport transformation, we transfer from this 2D view coordinate system to the device coordinate system. And finally, in the 5th stage, we transfer from the device coordinate system to the actual screen coordinate system. Note that the device coordinate system is an abstract, intermediate representation, whereas the screen coordinate system is the actual pixel grid; device coordinates contain continuous values, whereas screen coordinates contain only discrete values in the form of a grid. This, in summary, is the graphics pipeline.
So the display controller actually performs all these stages to finally get the intensity values to be stored in the frame buffer, or video memory. These stages are performed through software, of course with suitable hardware support.
For a programmer of a graphics system, it is not necessary to learn the intricate details of all these stages; they involve lots of theoretical concepts and models. If a graphics programmer gets bogged down with all this theory and these models, then most of the time will be consumed in understanding the theory rather than in actually developing the system. To address this concern, what is done is essentially the development of libraries, graphics libraries.
(Refer Slide Time: 35:17)
So there is this theoretical background involved in generating a 2D image. The programmer need not always implement the stages of the pipeline from this theoretical knowledge; that would of course be too much effort, and a major portion of the development effort would go into understanding and implementing the theoretical stages.
(Refer Slide Time: 35:52)
Instead, the programmer can use what are called application programming interfaces, or APIs, provided by graphics libraries, where these stages are already implemented in the form of various functions; the developer can simply call those functions with arguments in their program to perform certain graphical tasks. There are many such libraries available. Very popular ones are mentioned here: OpenGL, an open source graphics library which is widely used, and DirectX by Microsoft. There are many other commercial libraries available which are proprietary, but OpenGL, being open source, is widely accessible and useful in many situations.
117
(Refer Slide Time: 37:00)
Now what do these libraries contain? They contain predefined sets of functions which, when invoked with appropriate arguments, perform specific tasks. So the programmer need not know every detail of the underlying hardware platform, namely the processor, memory and OS, to build an application.
(Refer Slide Time: 37:29)
For example, suppose we want to assign colors to an object we have modelled. Do we need to actually implement the optical laws to perform the coloring? Note that such an implementation would also require knowledge of the processors available, the memory available and so on. Instead of having that knowledge, we can simply use the function glColor3f with arguments r, g, b. This function is defined in OpenGL, the open graphics library, and assigns a color to a 3D point.
So here we do not need to know details such as how color is defined in the system, how such information is stored in memory and accessed, how the operating system manages the call, which processor (CPU or GPU) handles the task and so on. All these complicated details can be avoided, and the programmer can simply use this function to assign color. We will come back to these OpenGL functions later in the course, where we will introduce OpenGL.
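For instance, a tiny fragment of legacy (immediate-mode) OpenGL using glColor3f might look like the following; the triangle coordinates are placeholders, and the usual window/context setup (for example via GLUT) is assumed and not shown.

    #include <GL/gl.h>

    /* Draw one red triangle using the fixed-function pipeline. */
    void draw_red_triangle(void)
    {
        glColor3f(1.0f, 0.0f, 0.0f);        /* set the current colour to red (r, g, b) */
        glBegin(GL_TRIANGLES);
        glVertex3f(-0.5f, -0.5f, 0.0f);
        glVertex3f( 0.5f, -0.5f, 0.0f);
        glVertex3f( 0.0f,  0.5f, 0.0f);
        glEnd();
    }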
(Refer Slide Time: 38:57)
Graphics applications such as painting systems, which probably all of you are familiar with, CAD tools that we mentioned in our introductory lectures, video games and animations are all developed using these functions. So it is important to have an understanding of these libraries if you want to make your life simpler as a graphics programmer. We will come back to these library functions later and discuss in detail some functions popularly used in OpenGL.
In summary, today we have got some idea of the 3D graphics pipeline and also an introductory idea of graphics libraries. In subsequent portions of the course, we will discuss all the stages in detail, as well as some more details of the graphics libraries and the graphics hardware.
(Refer Slide Time: 40:26)
That is all for today, whatever I have discussed today can be found in chapter 1 of the book
mentioned here. You are advised to refer to section 1.4 and 1.5. Thank you and good bye.
120
Computer Graphics
Doctor Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 5
Introduction and Overview on Object Representation Techniques
Hello and welcome to lecture number five in the course Computer Graphics. In the earlier lectures, we got introduced to the field, where we discussed a few things, namely the historical evolution of the field and the issues, challenges and applications in the field. We also got introduced to the basic idea of graphics software, namely the 3D graphics pipeline and the graphics libraries. Today we will start our discussion on the pipeline stages.
(Refer Slide Time: 1:12)
Let us recap what we have learned about graphics pipeline, as you may recollect there are
broadly, 5 stages of the pipeline. In the first stage we have object representation, in other words,
in this stage, what we do, we essentially try to represent the objects that will constitute the scene.
Now the objects that are represented are defined in their local coordinate system. In the second
stage we combine these objects together to form a scene. So, that stage is called modelling
transformation stage. And here what we do we essentially perform a transformation from the
local or object coordinate system to the world coordinate system.
At the end of it, we get a scene in the world coordinate system. In the third stage, we assign colours to the object surface points; colour assignment takes place in the world coordinate system. In the fourth stage, we make a series of transformations as well as some other operations. We transfer the objects from the world coordinate system to a view coordinate system through a transformation called viewing transformation, which is essentially a transformation from the world to the view coordinate reference.
Now, after doing that, we perform an operation called clipping, which we do in the view
coordinate space. This is followed by another operation called hidden surface removal, which
again takes place in the view coordinate space. After that, we perform a transformation from 3D
view coordinate system to 2D view coordinate system. This transformation is called projection
transformation.
And finally, we perform yet another transformation that is window to view port transformation,
where we transfer the content from 2D view coordinate system to device coordinate system. So,
all these transformations and operations together constitute the fourth stage, which is called
viewing pipeline stage.
And then there is a final stage called scan conversion or rendering, which is the fifth stage. In this stage, we render the scene in the device coordinate system to an image in the screen coordinate system, so that transformation takes place in this last and final phase of the pipeline. Among these five stages, today we will start our discussion on the first stage, that is, object representation.
(Refer Slide Time: 4:42)
122
As we all know, or can probably guess, in a synthesized image, where we perform the synthesis using a computer, we are likely to deal with objects of widely varying shapes and sizes. We may deal with tiny snowflakes to create a scene, or with complex animation characters to create a movie or animation, and with all possible shapes and sizes of objects in between.
As you can understand, a snowflake, for example, is not a simple object; it has an elegant shape, and unless we reproduce that elegant shape we will not be able to produce a realistic image or scene. So ideally a snowflake should not be represented with a simple sphere. Similarly, an animated character needs to be depicted with its associated complexities so that it looks realistic; we should not try to simplify it using, say, simple polygons or simple geometric shapes.
(Refer Slide Time: 6:22)
Now, to generate a scene with all these disparate objects, what we need, we need some way to
represent them so that computers can understand and process those objects. And as I said before,
any representation will not work. So, we cannot represent a snowflake with a sphere that will
reduce the realistic feel of the generated image.
123
(Refer Slide Time: 6:50)
Now, there are two fundamental questions related to object representation; how can we represent
different objects with their characteristic complexities, so that those can be rendered realistically
in a synthesized environment? So, what it tells us that we need to represent different objects,
preserving their inherent complexities so that when they are rendered on the screen, we get the
feeling of realism in the synthesized image.
(Refer Slide Time: 7:34)
The other question is: how can we have a representation that makes the process of rendering efficient? In other words, can we perform the operations of the different stages of the pipeline in ways that optimize space and time complexities? So one question deals with creating realistic effects, and as we discussed in our introductory lectures, creating realistic effects involves lots of computation and lots of data processing, requiring storage.
So, the other fundamental question related to object representation stems from the fact that we have limited computing resources available, involving storage and processors. We may want to use very complex representations to create more realistic effects, but will our available resources support such representations, which will be used in subsequent stages of the pipeline? That also we need to keep in mind while going for a particular representation. So there is a trade-off between realism and available resources, and we have to balance it.
(Refer Slide Time: 9:37)
Now, in order to balance this trade off, a plethora of techniques have been developed to represent
objects and today we will go through some of those techniques in brief and the details will be
discussed in subsequent lectures.
125
(Refer Slide Time: 9:40)
All these techniques we can categorize in broadly four types, first one is point sample
representation. Second one is boundary representation. Third one is space partitioning. And the
fourth one is sweep representation. So we have a large number of techniques and all these
techniques we can categorize into four types; point sample, boundary representation, space
partitioning and sweep representation.
(Refer Slide Time: 10:22)
Let us start with the first category, point sample representation. To create a 3D scene, we can first capture raw data such as the color, surface normal and depth information of different points in the scene. How can we capture those? We can use various devices such as a 3D range scanner or range finder, or use a simple camera together with computer vision techniques. Using those devices and techniques, we can capture raw information about a 3D scene, namely the color, surface normal and depth information at different points in the scene.
(Refer Slide Time: 11:18)
Since we have already got this information, we do not need to compute it and can directly render these points on a screen, processing them subsequently to generate the scene. So here the focus is on capturing the information rather than computing the values. Then what is the representation?
The representation is a set of raw data points; for each data point we have captured some values, like color, depth and the surface normal vector. These are its attributes. So our representation involves the set of data points as well as some attribute values for each, and that is called point sample representation. Essentially we are representing the 3D scene in terms of points sampled at different locations.
127
(Refer Slide Time: 12:22)
The next one is boundary representation. This is a set of techniques that represent an object by representing its individual surfaces, and these surfaces can be polygonal or curved. For example, see the figure here: on the left hand side we see six surfaces, named A, B, C, D, E and F. These surfaces define the cube shown on the right hand side of the image. So we are representing the cube, which is an object, in terms of these surfaces, the interlinked rectangles A to F. That is boundary representation: we are representing the cube in terms of its bounding, or boundary, surfaces.
(Refer Slide Time: 13:34)
128
There are other techniques in which we do not represent objects in terms of boundaries; instead, we use the 3D space occupied by the object to represent it. We divide the space into several disjoint, or non-overlapping, regions, and the division is done in such a way that any point inside the object lies in exactly one of the regions.
(Refer Slide Time: 14:20)
When we represent objects in this manner, we are essentially representing them in terms of the space occupied by the object rather than its bounding surfaces, so the techniques where such approaches are used are called space partitioning methods or representations. Such representations are often created in a hierarchical way: the space occupied by the object is divided into sub-regions, and this division is applied recursively to each sub-region until we arrive at some predefined sub-region size.
129
(Refer Slide Time: 15:10)
Now, these hierarchical representations can be depicted in different ways. A common way is to
form a tree or to show the representation in the form of a tree, which is often called space
partitioning trees. That is one common way of representing an object in terms of the space
occupied by it.
(Refer Slide Time: 15:42)
And finally, we have sweep representation. There are two sweep representation techniques which are widely used in graphics: one is the sweep surface representation and the other is the surface of revolution representation.
130
(Refer Slide Time: 16:02)
Let us try to understand the sweep surface representation. In this type of representation, 3D surfaces are obtained by traversing an entity such as a point, line, polygon or curve along a path in space in a specified manner. For example, look at the figure here: we have this rectangle, and the rectangle is moved along a specific trajectory to create the overall object of interest.
So the object here is the entire thing created by moving the rectangle along the specified path. In this type of representation we are not representing the object in terms of its bounding surfaces or the space occupied by it; rather, we are representing it in terms of a process, where the input is a primitive entity, the path it follows and the way that path is to be traversed. Such representations are called sweep surfaces.
131
(Refer Slide Time: 17:36)
In a similar way there is another representation called surface of revolution. As the name suggests, here we define a 2D entity which rotates around an axis, and we also specify the axis; the resulting object is what we are interested in. For example, if we again consider, say, the rectangle, and this is the path of rotation around the x axis, then we get this overall object, which is our desired object.
So here again we are not specifying the object in terms of its bounding surfaces or the space occupied by it, but in terms of a primitive object, in this case the rectangle, together with the axis and the direction of revolution. Such representations are known as surfaces of revolution.
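A small sketch of how such a representation can be turned into surface points is shown below; the profile format, the choice of the x axis and the sampling counts are all assumptions for illustration.

    #include <math.h>

    /* Sample a surface of revolution about the x axis.
       profile[i] = (x, r): the 2D curve, with r the distance from the axis.
       n points on the profile, m rotation steps; out receives n*m 3D points. */
    void surface_of_revolution(const float profile[][2], int n, int m, float out[][3])
    {
        const float two_pi = 6.28318530718f;
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < m; ++j) {
                float theta = two_pi * (float)j / (float)m;   /* rotation angle */
                out[i * m + j][0] = profile[i][0];            /* x stays fixed  */
                out[i * m + j][1] = profile[i][1] * cosf(theta);
                out[i * m + j][2] = profile[i][1] * sinf(theta);
            }
        }
    }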
132
(Refer Slide Time: 19:02)
Whatever we have discussed so far are the broad categories.
(Refer Slide Time: 19:18)
Now, some of these categories have subcategories as well. For example, boundary representation
techniques have three types of sub techniques, one is mesh representation, which is most
common. We have parametric representation and the third one is implicit representation.
133
(Refer Slide Time: 19:35)
In case of space partitioning methods, there are again several sub-techniques, or subcategories, such as octree methods, BSP trees and constructive solid geometry, or CSG. In subsequent lectures we will go through these subcategories in more detail.
(Refer Slide Time: 20:02)
Now, apart from these broad categories and subcategories, there are some other techniques which do not fall into any of them and are categories in themselves. They are mainly application-specific or complex photorealistic object representations. Complex photorealistic object representation indicates that those techniques are used to represent realistic effects in a very complex way.
(Refer Slide Time: 20:36)
Let us see a few examples. There is one technique called fractal representation. For example, see this figure of a tree: if you look closely at each branch, you will see that the overall structure is replicated in it, and within each branch the sub-branches again replicate the tree structure. So it is a self-repeating structure, which is represented using fractal notation. Fractal representation is thus one useful representation, and in nature we get to see lots of objects that are actually self-repeating, for which fractal representation is very useful.
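A minimal sketch of how such a self-repeating structure can be generated recursively is given below; the branching angle, shrink factor and stopping depth are arbitrary choices, and emit_segment is a hypothetical output routine.

    #include <math.h>

    static void emit_segment(float x0, float y0, float x1, float y1)
    {
        (void)x0; (void)y0; (void)x1; (void)y1;   /* hypothetical: record or draw the segment */
    }

    /* Each branch spawns two smaller copies of the whole structure. */
    void branch(float x, float y, float angle, float length, int depth)
    {
        if (depth == 0)
            return;
        float x2 = x + length * cosf(angle);
        float y2 = y + length * sinf(angle);
        emit_segment(x, y, x2, y2);
        branch(x2, y2, angle + 0.5f, length * 0.7f, depth - 1);
        branch(x2, y2, angle - 0.5f, length * 0.7f, depth - 1);
    }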
Another advanced representation technique is particle system representation, where we try to simulate the actual physics. For example, if we want to create this waterfall in a very realistic way, then particle system representation would be more appropriate than any other representation we have discussed so far: we will be able to actually mimic the way the water flows due to gravity and falls from a higher position to a lower one, the collisions, and how the drops get dispersed. All these things can be captured using particle system representation.
The third technique is skeletal model representation. If we want to create a character like this, we can represent it using a skeletal form, shown in the left side figure. The skeletal form is defined in such a way that whenever the character moves, the movement of the skeleton is proportionate; that is, kinematic considerations are taken into account while defining a skeletal representation. These are only a few of the many possible representations which are actually used in the generation of realistic scenes.
(Refer Slide Time: 23:26)
So, in summary, what we can say is that we have a large number of techniques available to
represent 3D objects. Broadly, there are four techniques. One is point sample rendering where
instead of artificially trying to create objects shapes, we actually capture some values, namely
color, depth, surface normal at different points of a scene and then simply reproduce those on the
screen, that is point sample rendering.
Other technique is boundary representation, which has many subcategories like mesh
representation, parametric representation and implicit surface representations. In boundary
representation techniques we represent objects in terms of its bounding surfaces where these
bounding surfaces can be lines, curves, polygons, anything.
The third technique is the space partitioning method. Here, instead of representing an object in terms of its boundary, we represent the space occupied by the object, and typically we use some hierarchical representation in the form of trees, more popularly called space partitioning trees. There are many subcategories of such representations, namely the octree method, the BSP or binary space partitioning method, and the constructive solid geometry or CSG method.
136
The fourth technique is sweep representation, where we do not represent the whole object.
Instead, we represent the objects in terms of a primitive shape and some movement path or the
trajectory. There are two such sweep representation techniques available; sweep surface and
surface of revolution. One interesting point about this type of representation is that here the
representation itself contains an approach rather than objects.
The approach is how to move the primitive surface along a path or around an axis. Now, apart
from these broad four categories, there are other representations available which are application
specific, sometimes some specific techniques are also possible, namely scene graphs, skeletal
model and advanced modelling, namely fractals or particle systems.
Now, among these categories in subsequent lectures we will discuss in details these two
categories; boundary representation and space partitioning representation. In the boundary
representation techniques, we will discuss all these three subcategories in some details, whereas
in the space partitioning method we will discuss these three subcategories in some detail.
(Refer Slide Time: 27:06)
And in boundary representation we will learn about a specific representation technique, namely
the spline representation, which is very popular in representing complex shapes in graphics.
Whatever I have discussed today is just the introduction to object representation techniques,
various techniques that are available to represent object. Next, few lectures will be devoted to the
details of these techniques.
137
(Refer Slide Time: 27:43)
The content of today's lecture can be found in this book. You can have a look at chapter 2,
section 2.1. For the introduction part, however, as I said, there are some advanced methods
available for representing objects, which you will not find in this section. Instead, you may have
a look at Section 2.5. Although these advanced techniques we will not discuss any further details.
If you are interested, you may have a look at this section. So, that is all for today. Thank you.
And see you in the next lecture. Goodbye.
138
Computer Graphics
Doctor Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 6
Various Boundary Representation Techniques
Hello and welcome to lecture number six in the course, computer graphics.
(Refer Slide Time: 0:38)
So, we started our discussion on 3D object representation, which is the first stage of the graphics
pipeline.
139
(Refer Slide Time: 0:49)
To recap, let us see the pipeline again. There are 5 broad stages, as shown on this screen. The first stage is object representation, which we are currently discussing; the other stages, namely modelling transformation, lighting, the viewing pipeline and scan conversion, we will take up in subsequent lectures.
One point I would like to mention here is that although, in this course I will follow the pipeline
stages in the way shown here, in practice, it is not necessary to have this exact sequence. Some
stages may come after some other stages. For example, lighting may be done after viewing
pipeline or in between some of the transformations of viewing pipeline and so on. So, the
sequence that I am showing here need not be followed exactly during implementation of a
graphics system. This is just for our understanding of the stages involved and the sequence may
vary.
140
(Refer Slide Time: 2:18)
In the previous lecture, we got a general introduction to various object representation techniques.
(Refer Slide Time: 2:27)
What were those techniques that we discussed? One technique is point sample rendering, then
we have boundary representation technique, space partitioning techniques and sweep
representation technique. These are the 4 broad categories we mentioned, each of which has
subcategories boundary representation, has three subcategories; mesh representation, parametric
representation and implicit representation.
141
Space partitioning has three subcategories: octree representation, BSP representation and CSG representation. BSP stands for binary space partitioning, whereas CSG stands for constructive solid geometry. In sweep representation, we have two techniques: sweep surfaces and surfaces of revolution.
Apart from these 4 broad categories, we have other representations as well. Some are application specific, and there are some general advanced representation techniques, namely scene graphs, skeletal models and advanced modelling techniques. Among the advanced modelling techniques we have fractal representation, point sample rendering, particle systems and so on.
(Refer Slide Time: 3:56)
Today we shall discuss in detail one of those techniques, namely the boundary representation techniques. We have already seen that in boundary representation we represent an object in terms of its bounding surfaces, or the surfaces that constitute its boundary. Those surfaces can be simple polygons or complex curved surfaces.
142
(Refer Slide Time: 4:31)
There are several ways to represent these bounding surfaces. We mentioned three subcategories of representation: mesh representation, implicit representation and parametric forms. Today we will get an introductory idea of all three representation techniques.
(Refer Slide Time: 5:00)
Let us start with the mesh representation. This is the most basic technique for representing objects in a scene, where we use polygons to represent the surfaces. The polygons in turn are represented using vertex or edge lists that store information about all the vertices or edges of the surface and their relationships.
143
For example, consider the figure here: we are representing a cube in terms of its vertices v0, v1 and so on up to v7, so there are 8 vertices. And this is the representation, where we store the vertices with their coordinate values and some other values capturing the relationships. For example, the first row tells us that v0 is connected to v1, v3 and v5. Similarly, each vertex stores the other vertices it has a connection to. This is one representation; there can be other ways to represent it.
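In code, such a vertex list with connectivity can be sketched as two arrays: one with the coordinates of each vertex and one with, for each vertex, the indices of the vertices it shares an edge with. The unit-cube coordinates below are my own choice of a labelling consistent with the row described above (v0 connected to v1, v3 and v5), not the exact figure from the slides.

    /* Vertex list for a unit cube, plus per-vertex connectivity (illustrative). */
    float v[8][3] = {
        {0, 0, 0},  /* v0 */   {1, 0, 0},  /* v1 */
        {1, 1, 0},  /* v2 */   {0, 1, 0},  /* v3 */
        {1, 0, 1},  /* v4 */   {0, 0, 1},  /* v5 */
        {0, 1, 1},  /* v6 */   {1, 1, 1}   /* v7 */
    };

    /* For each vertex, the three vertices it is connected to by an edge. */
    int connected_to[8][3] = {
        {1, 3, 5},  /* v0 */   {0, 2, 4},  /* v1 */
        {1, 3, 7},  /* v2 */   {0, 2, 6},  /* v3 */
        {1, 5, 7},  /* v4 */   {0, 4, 6},  /* v5 */
        {3, 5, 7},  /* v6 */   {2, 4, 6}   /* v7 */
    };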
(Refer Slide Time: 6:32)
Sometimes the surfaces are not polygonal, but in mesh representation we can approximate anything with polygonal meshes, like the figure shown here. This hand does not actually contain any polygonal surface, but the hand surface can be approximated with triangular meshes of this type, where lots of triangles are used to approximate it. Again, these meshes are represented using vertex and edge lists.
144
(Refer Slide Time: 7:27)
In fact, the mesh representation is the most basic form of representation: any other representation that
we may use will ultimately be converted to a mesh representation at the end of the pipeline, before
the objects are rendered. So, we have to keep this in mind. Whatever representation we use,
and we will learn about many in subsequent discussions, at the end everything is converted to a mesh
representation.
(Refer Slide Time: 8:02)
Now there is one important issue. That is how many polygons should we use to approximate the
surfaces? That is a very fundamental question.
145
(Refer Slide Time: 8:14)
Because the more polygons we use, the better the approximation is; this is obvious. However,
more subdivision also implies more storage and computation. So, suppose we can use three triangles to
represent a surface, compared to using 30 triangles to represent the same surface; the latter
representation, of course, will give better visual clarity, better visual quality.
However, since we are increasing the number of polygons in the mesh, there will be a
corresponding increase in storage, because we now have to store vertices for 30 triangles
instead of 3, as well as in computation, because we have to perform the recursive
subdivision that creates this mesh a larger number of times compared to when we have fewer
triangles. So, creation of a mesh is computation intensive and storing the mesh
information is storage intensive, and if we increase the number of polygons, then both need to be taken into
account.
146
(Refer Slide Time: 9:44)
So, there is a trade-off, and what we need to do is optimize the space and time complexities while
keeping the quality of the representation acceptable. Now, how do we decide how to balance this
trade-off? The answer depends on the application and the resources available.
Depending on the resources and on what we need to render, we can choose the right
value for the number of subdivisions required, as well as the number of polygons we are
going to use to approximate a surface with a mesh. That is about mesh representation.
(Refer Slide Time: 10:37)
Next let us move to the other two representations, implicit and parametric representations.
147
(Refer Slide Time: 10:46)
Now, although we said that mesh representation is the most fundamental type of representation,
for a developer it is not necessarily a very convenient mode of representation. For
complex surfaces, first of all, it is very difficult to determine how many polygons should be used
to create a mesh. Secondly, it is very cumbersome to enumerate all the vertices of the mesh
if the number of polygons in the mesh, or the number of meshes that we are using, is large,
which is likely to be the case in any practical application. So, what is required is some
compromise, some way to help the developer define objects without bothering too much or
spending too much time on defining the meshes.
148
(Refer Slide Time: 11:52)
So, designers or developers like to use representations that mimic the actual object rather than its
approximation.
(Refer Slide Time: 12:04)
This brings into the picture some high-level representation techniques for curved
surfaces. These techniques represent curved surfaces more accurately and
conveniently for the designer; they are not approximations, but rather much closer to the actual
representations.
149
(Refer Slide Time: 12:32)
So, implicit and parametric representations are essentially those types of representations which
are more convenient and represent objects more accurately rather than approximating
them. Now, let us start with implicit representation. In this case the surfaces are defined in
terms of an implicit functional form, that is, some mathematical equations.
(Refer Slide Time: 13:05)
In case of parametric representation, the surface points are defined in Euclidean space in terms of
some parameters, again in the form of some mathematical equations.
150
(Refer Slide Time: 13:23)
Now, let us see a few examples which are popularly used in graphics. Let us start with quadric
surfaces.
(Refer Slide Time: 13:41)
This is a frequently used class of objects in graphics, which are represented using implicit or
parametric forms. The term quadric surfaces refers to those objects whose surfaces
are described with second degree equations, or quadratic equations.
151
(Refer Slide Time: 14:11)
For example, spheres, these are very commonly used.
(Refer Slide Time: 14:19)
In implicit form, we can represent a spherical surface with radius r, centered at the
origin, as x^2 + y^2 + z^2 = r^2. So, this equation we can use for implicitly representing a
sphere.
152
(Refer Slide Time: 14:42)
The same sphere can also be represented parametrically using this form, where the angles theta
and phi are the parameters representing the latitude and longitude angles, as shown in this
figure here: this is the latitude angle and this is the longitude angle. And p is a point on the
sphere, which is represented using these parameters.
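As a small illustration, the sketch below evaluates a point on the sphere from the two angle parameters. The particular assignment of the latitude and longitude angles to phi and theta follows the common textbook convention and is an assumption here, not read off the slide.

```python
import math

def sphere_point(r, phi, theta):
    """Point on a sphere of radius r centred at the origin.

    phi   : latitude angle,  -pi/2 <= phi <= pi/2  (assumed convention)
    theta : longitude angle,  0 <= theta < 2*pi
    """
    x = r * math.cos(phi) * math.cos(theta)
    y = r * math.cos(phi) * math.sin(theta)
    z = r * math.sin(phi)
    return (x, y, z)

p = sphere_point(1.0, math.radians(30), math.radians(45))
print(p)                              # a point on the unit sphere
print(sum(c * c for c in p))          # ~1.0, i.e. x^2 + y^2 + z^2 = r^2
```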
(Refer Slide Time: 15:25)
Similarly, we can represent an ellipsoid either in implicit form, as shown here, or in parametric
form, as shown here. This is another widely used quadric surface.
153
(Refer Slide Time: 15:44)
There are many other examples like tori, paraboloids and hyperboloids, which are some other widely used
quadric surfaces in graphics applications.
(Refer Slide Time: 15:58)
An interesting class of objects are called blobby objects.
154
(Refer Slide Time: 16:07)
There are some objects whose shapes show a certain degree of fluidity or flexibility, which
means the object shape changes during motion or when it comes close to other objects.
(Refer Slide Time: 16:25)
Typically, these objects have curved surfaces, but we cannot use standard shapes like lines,
polynomials or quadrics to represent them, because these standard equations fail to
represent surface fluidity in a realistic way. So, we have objects which show some fluidity,
whose surfaces are represented using some curves, but
155
those curves we cannot represent using lines, polynomials or quadrics, because then we would lose
the fluid nature.
(Refer Slide Time: 17:11)
Such objects are generally referred to as blobby objects; molecular structures, liquid
and water droplets, melting objects, animal and human muscle shapes and so on are some
examples, and there are many others. There are several methods to represent blobby
objects. In all of them there is one common approach: essentially, to use some distribution function
over a region of space.
(Refer Slide Time: 17:49)
156
One method is to use a combination of Gaussian density functions or sometimes called Gaussian
bumps.
(Refer Slide Time: 17:59)
An example is shown here of a Gaussian density function, it is characterized by two parameters,
height and standard deviation as shown in the figure.
(Refer Slide Time: 18:19)
Now, when we combine many such functions by varying the two parameters, plus some other
parameters, we get a blobby object or we can represent a blobby object.
157
(Refer Slide Time: 18:36)
So, the object can be represented with a function like this, subject to the condition mentioned
here. Now, by varying the parameters a_k and b_k, we can generate the desired amount of blobbiness
or fluidity that we require. When b_k becomes negative, there are dents instead of
bumps, and T is a specified threshold.
(Refer Slide Time: 19:12)
An example is shown here where we have used three Gaussian density functions by varying the
parameters to create an overall shape, something like this as shown in this dotted line.
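A minimal sketch of this idea is given below: the field at a point is a sum of Gaussian density functions, and a point belongs to the blobby object when the field exceeds the threshold T. The centres, heights (b_k) and width parameters (a_k) are made-up illustrative values, and the functional form used is the commonly quoted Gaussian-bump formulation, assumed here.

```python
import math

# Three Gaussian "bumps": (centre, b_k = height, a_k = width parameter).
# The numbers are illustrative only.
bumps = [((0.0, 0.0), 1.0, 4.0),
         ((1.0, 0.0), 0.8, 4.0),
         ((0.5, 0.6), 0.6, 6.0)]

T = 0.5   # iso-value / threshold defining the object surface

def field(x, y):
    """Sum of Gaussian density functions: f = sum_k b_k * exp(-a_k * r_k^2)."""
    total = 0.0
    for (cx, cy), b, a in bumps:
        r2 = (x - cx) ** 2 + (y - cy) ** 2
        total += b * math.exp(-a * r2)
    return total

# A point is inside the blobby object if the field exceeds the threshold.
print(field(0.2, 0.1) > T, field(3.0, 3.0) > T)   # True False
```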
158
(Refer Slide Time: 19:30)
There is another interesting method to represent blobby objects, which is also quite popular, where a
quadratic density function is used instead of Gaussian bumps.
(Refer Slide Time: 19:46)
This looks something like the function shown here, where b is the scaling factor, r is the radius
(the distance from the object's center) and d is the maximum radius, that is, the bound on the spread
of the object around its center. So, how far the object extends around the center is specified by d.
These are the parameters using which we can define a blobby object in this metaball model.
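For illustration, the sketch below uses the commonly quoted piecewise-quadratic metaball density; the exact pieces are an assumption of that textbook form rather than a reproduction of the slide.

```python
def metaball_density(r, b, d):
    """Piecewise-quadratic metaball density (commonly quoted form, assumed here).

    r : distance of the query point from the object's centre
    b : scaling factor
    d : maximum radius -- the density falls to 0 at r = d
    """
    if r <= d / 3.0:
        return b * (1.0 - 3.0 * r * r / (d * d))
    elif r <= d:
        return 1.5 * b * (1.0 - r / d) ** 2
    else:
        return 0.0

# The density is b at the centre and 0 at (and beyond) the maximum radius d.
print(metaball_density(0.0, b=1.0, d=2.0), metaball_density(2.0, b=1.0, d=2.0))
```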
159
(Refer Slide Time: 20:23)
Now, these are some techniques that we have discussed; however, it is very difficult, or even
impossible, to represent an arbitrary surface in either implicit or parametric form. The functions
that we have already seen are quite complex in themselves, but still there are other surfaces which
are indeed very difficult to represent using such equations.
So, in order to represent such surfaces, we use a special type of parametric representation called
spline representation, or splines. These splines we will discuss in more detail in the next
lecture.
So today we have got an introduction to various boundary representation techniques: we
learned about mesh representation, and we learned the basic idea of implicit and parametric
representation techniques, with some detailed discussion on quadric surfaces and blobby objects.
In the next lecture, we will continue our discussion on boundary representation techniques; the
next few lectures will be devoted to a detailed discussion on spline representations, which will be
followed by a discussion on space partitioning methods. That is all for today.
160
(Refer Slide Time: 22:02)
So, whatever I have covered today can be found in this book. You are advised to go through
Chapter 2, Section 2.2 for the topics that are covered today. We will meet again in the next
lecture till then goodbye.
161
Computer Graphics
Professor. Doctor. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 07
Spline representation – I
Hello and welcome to lecture number 7 in the course Computer Graphics. So far we have covered
6 lectures, and we are discussing the graphics pipeline. Before we go into today's topic, let us
recap the pipeline quickly.
(Refer Slide Time: 0:49)
So, as we have already mentioned, there are 5 stages in the pipeline: the first stage is object
representation, the second stage is modelling transformation, and the third stage is lighting or colouring.
The fourth stage is a composition of sub-stages and sub-pipelines; this is called the viewing pipeline,
which consists of a sub-pipeline with 3 stages and 2 operations: viewing transformation,
clipping, hidden surface removal, projection transformation and window-to-viewport
transformation. And the final stage of the pipeline is scan conversion or rendering.
And these stages take place in different coordinate systems, starting from local or object
coordinate system transitioning through world coordinate, view coordinate, device coordinate to
finally the screen coordinate system.
162
(Refer Slide Time: 2:04)
Now, among these stages, we are currently discussing the first stage object representation
techniques.
(Refer Slide Time: 2:15)
As we have already discussed, among the object representation techniques there are broadly 5
categories: one is point sample rendering, the second is boundary representation, then space
partitioning, then sweep representation, and finally some other representation techniques
which are either application specific or refer to some advanced techniques such as scene graphs,
163
skeletal models and other advanced modelling techniques such as fractal representation and
particle systems.
In boundary representation, there are 3 broad groups of techniques: one is mesh representation,
one is parametric representation, and one is implicit representation. Similarly, in space
partitioning representation, there are 3 broad techniques: octree-based representation, BSP or
binary space partitioning trees, and CSG techniques. Now, among all these, we are currently
discussing the boundary representation techniques, and we will continue our discussion on
them.
In the last couple of lectures, we have covered mesh representation and introduced the idea of
parametric as well as implicit representation.
(Refer Slide Time: 3:50)
These are some of the boundary representation techniques that we have introduced in the last
lecture.
164
(Refer Slide Time: 3:58)
Today we will continue our discussion on boundary representation, and we will focus on one
specific and popular boundary representation technique, which is called spline representation.
(Refer Slide Time: 4:14)
So, in order to understand the spline representation technique, we need to understand how we
represent curves. A curve is a very common primitive shape which is required at many places to
represent objects, particularly in the context of complex shapes; we cannot avoid representing
165
curves: only with lines or points it may not be possible to represent complex shapes, and we have
to take curves into account.
To simplify our discussion we will focus here only on the parametric representation of curves, although
earlier we introduced both types, namely parametric representation and implicit
representation.
(Refer Slide Time: 5:10)
How can we represent curves in general using the parametric form? We can use a single
parameter, denoted by u, to represent a curve, or its Cartesian coordinates, using these
equations: one is for representing the x coordinate and the other for representing the y
coordinate, where x is a function of u and y is another function of u, that is, x = x(u) and
y = y(u). Let us try to understand the intuition behind this representation.
166
(Refer Slide Time: 5:47)
We can assume that u denotes time. We can think of it this way: we are drawing the curve
on a 2D Cartesian space over a period of time, and at each instant of time we place a point.
Then we can say that at that instant the point is at that position; in other words, the
point is characterized by the instant of time, which is u. So, essentially u denotes a
specific instant of time, at which we can determine the corresponding coordinate values
using the equations. This is the simple intuition behind the idea of the parametric representation of a
curve.
167
(Refer Slide Time: 6:51)
So, that is about understanding how to represent a curve parametrically. Now, our objective is
to represent the curve easily and efficiently. Let us elaborate on this a little bit more.
(Refer Slide Time: 7:07)
As we all know, we can approximate a curve in terms of a set of small line segments. Of course,
the segments have to be very small to make the curve look smooth, otherwise the curve
168
takes on a jagged appearance. Now, clearly this is easy and intuitive, but may not be efficient:
we may have to provide a large number of points to draw the small line segments.
(Refer Slide Time: 7:49)
There is another alternative: we can work out the curve equation and apply the equation to
find any point on the curve. This is clearly better than manually specifying a large number
of points to approximate the curve with a set of line segments, and it may turn out to be
efficient also. But the problem here is that for many curves we may not be
able to find the equation itself; for an arbitrarily shaped curve it is very difficult to find out the
curve equation.
169
(Refer Slide Time: 8:38)
So, let us try to understand these problems from the point of view of a user: what the user thinks
and what problems the user faces. The user wants to generate a curve of any
arbitrary shape. If we are trying to represent the curve in the form of a large number of small line
segments, then the user has to input a very large number of points through which the line
segments can be generated. Clearly, no user would be interested in inputting such a large number
of points.
On the other hand, for a user it may be difficult or even impossible to find a precise equation
of the curve. Therefore, in both approaches the user is not going to be benefited.
170
(Refer Slide Time: 9:48)
Ideally, what the user wants to do is provide a limited set of
points, and these points define the curve. So, essentially the user is not providing all possible line
segments to approximate the curve, or a precise equation to find points on the
curve. Instead, the user provides a small, limited set of points which defines the curve. In other
words, these points are chosen such that the curve passes through or near them; these
points are also known as control points.
So, the alternative for the user is to provide a small set of control points, instead of providing a large
set of points through which line segments can be drawn or giving a precise curve equation. So, the user
provides a set of control points.
171
(Refer Slide Time: 11:05)
And the user expects the system to draw the curve by interpolation, that is, by interpolating those control
points. So, let us briefly understand the idea of interpolation; many of you, or
maybe all of you, may already know what interpolation is, but there is no harm in refreshing our
knowledge.
(Refer Slide Time: 11:31)
172
So essentially, when we talk of interpolation, what we mean is the fitting of a curve that passes
through or near the set of points provided, that is, the control points.
(Refer Slide Time: 11:49)
One form of interpolation is polynomial interpolation. In this interpolation what we do, we try to
fit a polynomial curve through the given set of control points. Now, polynomial interpolation is
very popular because it is generally considered that such interpolations are simple, efficient and
easy to manipulate. So, we will focus here on polynomial interpolation.
173
(Refer Slide Time: 12:27)
Now, depending on the number of control points, the degree of the interpolating polynomial is
decided. So, when we talk of polynomial interpolation, one concern is what the degree
of the polynomial should be; that can be decided based on the number of control points provided.
(Refer Slide Time: 12:58)
Let us take an example. Suppose we are given 2 control points; in such a situation, it is advisable
to go for linear interpolation rather than any higher form of interpolation, because we have
174
only two control points. Similarly, if there are 3 control points, then we can go for quadratic
polynomials; if there are 4 control points, we choose the degree accordingly, and so on.
(Refer Slide Time: 13:32)
Therefore, we can say that in general, for n+1 control points, we may try to fit a polynomial of
degree n, which is pictorially depicted in this figure: we are given these control points
through which we are trying to fit a curve, and if the number of control points is n+1, then the
polynomial that we work with should ideally have degree n. Note the system of equations
mentioned here.
This is for the x coordinate; similarly, for the y coordinate we can have a similar system. Now,
since n+1 control points are given, we have n+1 x coordinate values. For each of these coordinates
we have one equation of the curve in terms of the parameter, and so for the n+1 control
points we have n+1 equations.
175
(Refer Slide Time: 14:49)
Now, in those equations there are constant terms, the coefficients a0, a1, up to an.
If we determine these coefficients, then we can define the polynomial. So, to get the values of
these coefficients, what we need to do is solve the set of equations, the n+1
equations that we have seen earlier. If we solve them, we will get the values of the
coefficients, which define the polynomial.
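As a small illustration of this fitting step, the sketch below solves for the coefficients of x(u) with numpy; the x coordinate values and the equally spaced parameter values assigned to the control points are assumptions made for the example.

```python
import numpy as np

# n+1 control points => fit x(u) = a0 + a1*u + ... + an*u^n.
# The x coordinates and the equally spaced parameter values are illustrative.
x_coords = np.array([0.0, 2.0, 1.0, 3.0])          # n+1 = 4 control points
u = np.linspace(0.0, 1.0, len(x_coords))           # assumed parameterization

# One equation per control point: rows of powers of u (a Vandermonde matrix).
V = np.vander(u, increasing=True)                  # columns: 1, u, u^2, u^3
a = np.linalg.solve(V, x_coords)                   # coefficients a0..a3

print(a)
print(np.allclose(V @ a, x_coords))                # curve passes through the points
# The same procedure is repeated with the y coordinates to get y(u).
```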
(Refer Slide Time: 15:36)
176
But there is one problem: if we have a very large n, that is, many control points, then we need
to solve a very large number of equations, which is not easy. On top
of that, we need to keep in mind that there are two separate sets of equations, one for x and one for
y. So, we actually need to solve two sets of equations rather than one, and for large n this
becomes very cumbersome.
(Refer Slide Time: 16:24)
Along with that, there is one more problem, which is called the local controllability issue. Suppose
the user wants to change the shape slightly. With the polynomial equation we get a
curve which represents a shape.
Now, I want to change it slightly. Then, ideally, what should I do? I change one or a few of the
control points to denote the small change. But if we go for polynomial interpolation, then to get
the new curve we may have to recalculate the entire thing again; the entire curve may have to be
recalculated. This is, of course, not a good thing, because we have changed only a few points and
ideally we should be able to restrict our recalculation effort to those few points only; instead,
we have to solve the entire set of equations again, which is not an efficient approach.
So, this problem is known as the local controllability problem, where we are unable to control local changes
locally; we have to handle local changes through a global recalculation of the curve. Now, in
order to address these issues, there is another approach, which we will discuss.
177
(Refer Slide Time: 18:27)
Now, what is this alternative approach? Suppose we are again given n+1 control points.
Irrespective of the value of n, we may partition the entire set into subsets with fewer points;
typically, these subsets have 3 points each. So, given a set of n+1 points, we may like to have
subsets where each subset contains three control points.
(Refer Slide Time: 19:00)
Now for each of these subsets we may fit lower degree polynomials. In this case, the degree 2
polynomials for each of the subsets.
178
(Refer Slide Time: 19:17)
And then these individual polynomials of lower degree, which are also called polynomial pieces,
when joined together give the overall curve. So, the idea is very simple: we are given
a large number of control points, but it is not necessary to fit a single polynomial curve using the
entire set. Instead, what we do is divide the entire set of control points into
smaller subsets, each containing very few control points.
A typical value used is three, and for each of these subsets we fit, or interpolate, a smaller degree
polynomial. These polynomials, when joined together, give the overall curve.
These individual polynomials are also known as polynomial pieces, so the entire curve is
represented in terms of polynomial pieces.
179
(Refer Slide Time: 20:23)
Let us take an example; consider this figure here. There are 5 control points p0 to p4, as you can
see: p0, p1, p2, p3 and p4. Now, these 5 points need not be used to draw a single polynomial,
which in this case would be of degree 4. Instead, what we can do is subdivide the curve, or
the set of control points, into subsets, like the two subsets shown here: in one subset we have
three control points p0, p1, p2, and in the other subset we have another 3 control points p2, p3, p4.
For each of these subsets we draw a quadratic, or degree 2, polynomial, and when they join
together we get the overall interpolated curve. That is the basic idea.
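A minimal sketch of this subdivision idea is shown below: five illustrative control points are split into two subsets of three, and a quadratic is fitted to each subset. The particular point coordinates and the parameter values u = 0, 0.5, 1 used within each piece are assumptions made for the example.

```python
import numpy as np

# Five illustrative control points p0..p4 (x, y).
pts = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 1.5], [3.0, 3.0], [4.0, 2.0]])

def fit_quadratic(three_pts):
    """Fit x(u), y(u) of degree 2 through 3 points at u = 0, 0.5, 1 (assumed)."""
    u = np.array([0.0, 0.5, 1.0])
    V = np.vander(u, increasing=True)          # columns: 1, u, u^2
    # Solve separately for the x and y coefficient vectors.
    return np.linalg.solve(V, three_pts[:, 0]), np.linalg.solve(V, three_pts[:, 1])

ax, ay = fit_quadratic(pts[0:3])               # piece 1: through p0, p1, p2
bx, by = fit_quadratic(pts[2:5])               # piece 2: through p2, p3, p4

# The two pieces share p2, so they meet there (C0 continuity at the joint).
print(np.polyval(ax[::-1], 1.0), np.polyval(ay[::-1], 1.0))   # end of piece 1 = p2
print(np.polyval(bx[::-1], 0.0), np.polyval(by[::-1], 0.0))   # start of piece 2 = p2
```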
180
(Refer Slide Time: 21:40)
Now, this idea of fitting a set of control points with several polynomials of lower degree, rather than a
single higher degree polynomial, is known as spline representation. So, when we talk of spline
representation, we are essentially referring to the fact that there is a set of control points, but we
are not interpolating the entire set with a single polynomial curve; instead, we are representing it
in terms of several polynomial pieces. The entire curve is called a spline curve, or simply a spline.
This is a very popular curve representation technique used in computer graphics.
(Refer Slide Time: 22:32)
181
In graphics, it is very common to use splines made of third degree (n = 3) polynomials, also
known as cubic polynomials. In our subsequent discussion, we will concentrate
on these polynomials and the corresponding splines only. There is one important thing in spline
representation that we have to keep in mind, which is called the continuity condition.
(Refer Slide Time: 23:10)
Now, splines, as we have discussed, refer to the joining of several polynomials. So, clearly it is
important to ensure that they join smoothly, to make the resulting curve look smooth.
(Refer Slide Time: 23:32)
182
Now, how do we ensure that? In order for that to happen, splines must conform to what are
known as continuity conditions.
(Refer Slide Time: 23:50)
There are several such conditions; broadly, they are of two types, one is the parametric continuity
conditions and the other is the geometric continuity conditions.
(Refer Slide Time: 24:02)
183
So, in general, the nth order parametric continuity condition, denoted by Cn, states that the adjoining
curves meet and that the first to the nth order parametric derivatives of the adjoining curve functions are
equal at their common boundary; that is the general definition. Now, let us see what these conditions
refer to in simple terms.
(Refer Slide Time: 24:40)
So, the first parametric continuity condition is C0, the zeroth order condition, which simply
states that the adjoining curves meet. It is just that simple a condition.
(Refer Slide Time: 25:00)
184
Now, the first order parametric condition C1 indicates that the first order derivatives of the adjoining
curves at the common boundary are equal. So, essentially it tells us that at the common boundary we
have to ensure that the first order parametric derivatives, that is, the derivatives of the curves with
respect to the parameter u, are equal.
(Refer Slide Time: 25:40)
In a similar way, C2 indicates that both the first and the second order derivatives are equal at the
common boundary. And in this way we can go on. But since in graphics we mostly focus on
third degree polynomials, we are mostly concerned with the continuity conditions up to C2.
185
(Refer Slide Time: 26:09)
Now, these parametric continuity conditions are sufficient, but not necessary, to ensure geometric
smoothness of the spline. For that, what we need is to conform to the other set of continuity
conditions, called geometric continuity conditions. What are those?
(Refer Slide Time: 26:34)
The 0 order condition is denoted by G0. This is the zeroth order condition which is similar to C0,
which simply states that the curves must meet.
186
(Refer Slide Time: 26:50)
Similarly, G1, the first order geometric continuity condition, tells us that the tangent directions at
the common boundary should be equal, although their magnitudes can be different; so the
directions must be equal but the magnitudes can vary at the boundary. That is the G1 or first-order
geometric continuity condition.
(Refer Slide Time: 27:23)
The second-order condition, G2, indicates that both the tangent directions and the curvatures at the common
boundary of the adjoining curves should be equal. Again, we can go on like this up to any order,
187
but since we are mostly concerned with cubic polynomials, up to G2 should be sufficient for our
understanding. So, that is one piece of basic knowledge we should have about splines: if we
want to represent a curve as a spline, that is, in terms of smaller polynomial pieces, we
should ensure that the curve conforms to the continuity conditions, both parametric and geometric.
(Refer Slide Time: 28:27)
Now let us see the different types of spline representations that we can use. There
are broadly two types: one is interpolating splines, and the other is approximating splines.
188
(Refer Slide Time: 28:46)
Now, in the case of interpolating splines, what do we want? We essentially try to fit the curve such that
it passes through all the control points. So, we are given a set of control points and we
represent the curve in the form of a spline, in such a way that the polynomial pieces of the
spline pass through all the control points, as shown in this figure.
The commonly used interpolating splines in computer graphics are natural cubic splines,
Hermite cubic splines and cardinal cubic splines. We will discuss these splines in
detail later.
189
(Refer Slide Time: 29:46)
The other type of spline curves are called approximating splines. Here, the control points are used to
define a boundary, or convex hull; the spline itself does not pass through all the control points.
Instead, it is restricted within the boundary defined by the control points. Take the same example:
here we have 4 control points, but the curve is not passing through the 4 control points,
unlike earlier in the case of interpolating splines. What is happening here is that these control points
define a bounding region, a boundary which is popularly called the convex hull, and the spline
lies within this boundary.
In other words, the spline shape is determined by the convex hull. There are a few common
and popular approximating splines used in applications, namely the cubic Bezier curves
and the cubic B-splines; again, we will discuss those later. So, that is the basic idea of a spline:
what it is and what makes splines good for representing any curve.
What it is: it is essentially a representation of a complex shape in terms of smaller, manageable,
lower degree polynomials or polynomial pieces. And it is able to represent the curves smoothly
because splines are supposed to conform to the continuity conditions. Now, let us try to understand
how we represent a spline; this is the same as knowing how to represent the objects which are
represented by splines.
190
(Refer Slide Time: 32:08)
How can we represent splines? There are broadly two ways: one is the basis or blending function
based representation, and the other is the basis matrix based representation. These two are
equivalent, of course; one can be converted to the other and vice versa.
(Refer Slide Time: 32:34)
Let us take some examples to understand the representation. We will start with the basis matrix
representation of splines, and with a simple example. Consider a polynomial of
degree one, that is, a linear polynomial, which in parametric form we can represent as
191
f(u) = a0 + u·a1. Here a0, a1 are coefficients and u is the parameter. We must keep in mind
that this is a compact representation:
each ai, like a0 and a1, actually represents a vector comprising two components, one for each
coordinate. So, a0 actually has values a0x and a0y, separate for the x and y coordinates.
Similarly, f(u) has corresponding expressions, namely fx(u) and fy(u). However, for simplicity
we will work with this compact form rather than the expanded form.
(Refer Slide Time: 34:00)
Now, this parametric equation we can represent in the matrix form U.A. This is a
product of two matrices, U and A, where U is the parameter matrix and A is the coefficient matrix.
U is the row vector [1 u], and A is the column vector
having the two coefficients a0, a1 in our example.
192
(Refer Slide Time: 34:46)
Now, since this is a polynomial of degree 1, so we need at least two control points to determine f.
Let us denote those two by p0 and p1.
(Refer Slide Time: 35:06)
Now, these points we will use to parameterize the polynomial. In other words, we shall assign
certain parameter values to these control points; for example, we may assume that the
193
control points denote values of the function at the boundary, where we define the boundary as
the points where the parameter takes the values 0 and 1.
(Refer Slide Time: 35:48)
If that is the case, then we can set up our system of equations as shown here: two equations, one
for p0 and one for p1, with the parameter values fixed, that is, p0 = f(0) = a0 and
p1 = f(1) = a0 + a1. Now, by solving these equations we can get the coefficients.
(Refer Slide Time: 36:17)
194
However, if we look closely, we can see that the same system of equations can be represented
in the form of matrices. What is this matrix representation? We can represent it as P = C.A,
where P is defined as a column vector of the control points, C is the constraint matrix and A is
the column vector of coefficients.
(Refer Slide Time: 36:56)
So, how did we construct the C matrix? We took the coefficients of the ai terms, from a0 to an in
that order, in each equation to form the corresponding row of the C matrix. So, the first
equation gives the first row, and so on.
195
(Refer Slide Time: 37:30)
In other words, we imposed certain constraints, the parameterization conditions, to
obtain C. Accordingly, C is called the constraint matrix.
(Refer Slide Time: 37:53)
Now, we know P = C.A, so we can say that A = C^-1.P. This inverse of the
constraint matrix is called the basis matrix.
196
(Refer Slide Time: 38:13)
So, we can represent f as U.A, which can be expanded as U.C^-1.P, or U.B.P. This is the way to
represent f in terms of matrix multiplication.
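The same derivation can be carried out numerically for the degree-1 example, as in the sketch below: build the constraint matrix from the boundary conditions, invert it to get the basis matrix, and evaluate f(u) = U.B.P. The control point values are illustrative.

```python
import numpy as np

# Degree-1 example: f(u) = a0 + a1*u with f(0) = p0 and f(1) = p1.
C = np.array([[1.0, 0.0],     # p0 = a0            (u = 0)
              [1.0, 1.0]])    # p1 = a0 + a1       (u = 1)
B = np.linalg.inv(C)          # basis matrix
print(B)                      # [[ 1. 0.] [-1. 1.]]

P = np.array([2.0, 5.0])      # illustrative control points p0, p1

def f(u):
    U = np.array([1.0, u])    # parameter matrix [1, u]
    return U @ B @ P          # f(u) = U . B . P

print(f(0.0), f(1.0), f(0.5)) # 2.0 5.0 3.5 -- expanding gives (1-u)*p0 + u*p1
```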
(Refer Slide Time: 38:37)
Now, one thing to note here is that the basis matrix for an interpolating polynomial
satisfying the parameterization conditions is fixed. In other words, the basis matrix
uniquely characterizes the polynomial. So, if we use the basis matrix B instead of the polynomial
197
equation, then this is as good as representing the polynomial, because B is fixed for that particular
polynomial.
(Refer Slide Time: 39:19)
Now, we know that a spline is made up of polynomial pieces. If each piece is made of the
same type of polynomial, that is, the degree and the constraints are the same,
then the overall spline can be uniquely characterized by each piece. And since we have already
mentioned that a polynomial piece can be characterized by its basis matrix, the basis matrix
can also be used to uniquely characterize the entire spline. So, when we represent the
spline, we can simply represent it in terms of the basis matrix.
That is the basis matrix representation of a spline. To recap: given a polynomial, we
have a unique basis matrix for that polynomial under certain constraints, so the basis matrix is
suitable to represent the polynomial. The same polynomial pieces are used to represent a
spline, so for each polynomial piece we have the same matrix, and we can use a single basis
matrix to represent the overall spline, because the basis matrix tells us which particular
polynomial pieces are used to represent the spline. This is the basis matrix representation of
splines.
198
(Refer Slide Time: 41:04)
This is the explanation we just mentioned: the basis matrix refers to the polynomial pieces of the spline,
and we are assuming all pieces are made of the same polynomial, so the polynomial's basis matrix represents
the whole spline. Now, let us turn our attention to the other type of spline representation, namely
the blending function representation.
(Refer Slide Time: 41:33)
Now, earlier we have seen that we can represent f in terms of the basis matrix, as U.B.P.
199
(Refer Slide Time: 41:43)
Now, if we expand the right hand side, we get a weighted sum of polynomials, with the control points
being the weights. Let us derive it in our example and see what happens. In our example
we have a polynomial of degree 1, and we have the matrices in this form: U is this one, B is this
matrix and P is the control point matrix. Now, if we expand, we will get this equation in terms
of the control points: f(u) = (1-u)·p0 + u·p1.
(Refer Slide Time: 42:37)
200
Now, the individual polynomials in the weighted sum, such as the terms 1-u and u, are the
blending functions. So, the overall function is represented as a weighted sum of polynomials,
and these individual polynomials are called the basis functions or the blending functions.
(Refer Slide Time: 43:03)
Now, for a given polynomial the blending functions are also fixed, so we can use them to
characterize the polynomial. For a given polynomial with constraints, the functions that can
be used to represent it are fixed, so this set of blending functions can be used to characterize the
polynomial. We can apply the same logic here: a spline made up of several pieces of the same
polynomial type can therefore also be represented in terms of the blending functions, since they
uniquely characterize the constituent polynomial pieces.
201
(Refer Slide Time: 43:52)
So, in compact form we can represent a spline, or the curve f, in this way: f(u) = Σ pi·bi(u), where pi is the i-th
control point and bi is the corresponding blending function. To recap, today we got introduced to the
basic idea of splines, which is essentially representing a curve in terms of constituent lower
degree polynomial pieces. Then we discussed the continuity conditions that ensure splines
give us smooth curves. We also discussed the broad types of splines and the ways splines can
be represented, in the form of basis matrices or blending functions.
In the next lecture we will take up a detailed discussion of the various types of splines that we have
mentioned, namely the interpolating splines and the approximating splines. We will also learn in
the next lecture about the use of splines to represent surfaces in computer graphics.
202
(Refer Slide Time: 45:15)
Whatever I have discussed today can be found in this book. You are advised to go through
Chapter 2, Section 2.3 to learn about these topics in more detail. See you in the next lecture till
then thank you and goodbye.
203
Computer Graphics
Professor. Doctor. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 08
Spline representation – II
Hello and welcome to lecture number 8 in the course Computer Graphics.
(Refer Slide Time: 0:37)
In the previous lecture we got introduced to the basic idea of splines. Splines are one of the
boundary representation techniques, as we have mentioned. In that introductory lecture on splines
we learned the basic idea, what a spline is, how to represent it, how those representations are
created and various such introductory topics, including the types of splines.
Just to recap, let me briefly restate the basic idea of a spline. When we talk of a
spline, it is essentially a representation of a curve, where the curve is represented in the form of a
collection of lower degree polynomials; these polynomials join together to give the overall
curve, and the polynomials are interpolated on the basis of a set of control points that are defined
to represent the curve. So, a spline is essentially a representation where we join together multiple
polynomial curves of lower degree.
204
(Refer Slide Time: 2:11)
Now, that was the basic idea, and during that discussion we also talked about the types of splines;
broadly, there are two types, one is interpolating splines and the other is approximating splines. Today
we are going to discuss these two types of splines in detail. Along with that, we are
going to talk about how these splines can be used to represent curves and surfaces in computer
graphics. Let us start with the idea of interpolating splines.
If you recall, interpolating splines are those where we define a set of control points and the
spline curve passes through those points. We also mentioned 3 popular interpolating splines.
205
(Refer Slide Time: 3:07)
These are namely the natural cubic splines, the Hermite cubic splines and the cardinal cubic splines; all 3
are interpolating splines, meaning that they are defined by control points through which the
spline curve passes. So, let us now go into a more detailed discussion of each of these popular
types. We will start with the natural cubic spline.
(Refer Slide Time: 3:44)
206
Just a little bit of history: this is one of the first splines that was used in computer graphics
applications. As the name suggests, since it is a cubic spline, it is made up of pieces of
third degree polynomials.
(Refer Slide Time: 4:09)
Now, each of the polynomial pieces is defined by 4 control points, which we can denote
as p0, p1, p2, p3. Recollect that when we talked about polynomial interpolation, we
mentioned that if we are using a polynomial of degree n, then we are going to have n+1
control points. Similarly here, since the degree is three, we have 4 control points.
Now, in the case of natural cubic splines, two of the control points, p0 and p3, denote the two boundary
points, that is, the boundary values of the parameter u: p0 refers to the point where u = 0 and
p3 refers to the point where u = 1. The other two control points essentially denote the
first and second derivatives at the starting point, that is, at u = 0. So, two of the control points,
p0 and p3, are points on the curve representing the two boundary points, while the
other two are not points on the curve but rather the first and second order derivatives with
respect to the parameter u at u = 0.
207
(Refer Slide Time: 5:47)
Now let us try to represent this spline using one of the representation techniques. Recollect
again that a spline can be represented in either of two ways, namely the basis matrix
representation and the blending function representation. We also mentioned that they are
equivalent, so any one representation is good enough. Now, in the context of interpolating
splines, we will try to represent them using the basis matrix form.
So, in the case of the natural cubic spline, since we are using cubic polynomials, the general
polynomial piece equation should look like f(u) = a0 + a1·u + a2·u^2 + a3·u^3, where a0, a1, a2 and a3 are
coefficients and u is the parameter. Now, we already know that the control points p0, p1, p2
and p3 have specific meanings:
p0 and p3 represent the points at the boundaries. So, we can set up the equations
like this: p0 is the function value at u = 0, which we can obtain by replacing u
with 0.
So, we get p0 = a0 + 0·a1 + 0^2·a2 + 0^3·a3 = a0; we just replace u with 0. Similarly, p3 is
the function value at u = 1, so in this case we replace u with 1 and get
p3 = a0 + a1 + a2 + a3. These two are the equations corresponding to the control points p0 and p3. p1
and p2, as we said, represent the first order and the second order derivatives at u = 0: p1
is the first order derivative and p2 is the second order derivative with respect to u.
208
Now, if we compute the derivative and then replace u with 0 in the case of p1, what we
get is an equation like p1 = a1. You can try it yourself: first obtain the derivative and
then substitute u = 0. We then compute the derivative again to get the second order
derivative at u = 0 corresponding to p2, and after replacing u with 0 we get
p2 = 2·a2.
So, these 4 equations represent the system of equations that we obtain by utilizing the control
point characteristics. From this set of equations we can derive the
basis matrix. How can we do that? We can first construct the constraint matrix by simply taking
the values attached to the coefficients, and then take its inverse to get the basis
matrix.
(Refer Slide Time: 9:53)
So, what is the constraint matrix? Let us recast the equations here. From them, as
we can see, 1, 0, 0, 0 is the first row; 0, 1, 0, 0 is the second row; 0, 0, 2, 0 is the
third row; and finally 1, 1, 1, 1 is the fourth row. So, just by utilizing these values, we get
the constraint matrix C.
209
(Refer Slide Time: 10:41)
Now, to get the basis matrix, what we need to do is take the inverse of C. How to compute
the inverse we will not discuss here, of course, because that is very basic knowledge;
you can refer to any book on basic matrix algebra, or to the appendix of the reference
material mentioned at the end of this lecture, to learn how to compute the inverse of a
matrix. Assuming we already know that, if this is our constraint matrix C and we take its
inverse, the matrix that we get is this one, and this is the basis matrix for the natural cubic
spline.
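The sketch below simply repeats this computation numerically: it builds the constraint matrix listed above and inverts it to obtain the basis matrix of the natural cubic spline.

```python
import numpy as np

# Constraint matrix for the natural cubic piece f(u) = a0 + a1*u + a2*u^2 + a3*u^3
# with p0 = f(0), p1 = f'(0), p2 = f''(0), p3 = f(1), the rows in that order.
C = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])

B = np.linalg.inv(C)          # basis matrix of the natural cubic spline
print(B)
# [[ 1.   0.   0.   0. ]
#  [ 0.   1.   0.   0. ]
#  [ 0.   0.   0.5  0. ]
#  [-1.  -1.  -0.5  1. ]]
```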
So, let us recollect the series of steps we followed to get the basis matrix: first we defined the
control points; then, using those control points, we set up the system of equations; from the
equations we formulated the constraint matrix C; then we took the inverse of C to get the basis
matrix. Since the basis matrix is the characteristic matrix for these cubic polynomials, and the natural
cubic spline is made up of such cubic polynomials,
we can say that this basis matrix, which is characteristic of the cubic polynomial, is good enough
to represent the natural cubic spline. We follow the same line of argument as discussed in the
previous lecture.
previous lecture.
210
(Refer Slide Time: 12:34)
So, if we simply use this matrix, then we can say that it represents the natural cubic spline, instead of
specifying the equations or anything else. Another important property of these natural cubic
splines is that they support parametric continuity up to C2, which means they support
C0, C1 and C2 continuity.
But the problem is that natural cubic splines do not have local controllability. That means if we have a
spline and want to change its shape slightly by modifying a very few points on the curve,
then we have to re-compute the whole curve rather than re-computing only the localized area,
which is essentially a very inefficient approach. For example, suppose I have a curve like this
and I want to change this part to something like this; then, instead of restricting our
computation to this part, we have to compute the whole curve again. So, it does not
support local controllability.
Now, let us come to the second type of interpolating splines, that is, Hermite cubic splines. Like
natural cubic splines, Hermite cubic splines are also defined by 4 control points.
211
(Refer Slide Time: 14:39)
However, they have one important property: they support local controllability. So, the main problem
with natural cubic splines is alleviated with Hermite cubic splines.
(Refer Slide Time: 15:00)
So, the 4 control points that define a Hermite cubic spline can be denoted by p0, p1, p2 and p3,
where p0 and p2 are the values at the parameter boundaries, that is, at u = 0 and u
= 1, p1 is the first derivative at u = 0 and p3 is the first derivative at u = 1. So,
212
earlier we had p0 and p3 as the points at the boundaries, and p1 and p2 as the first and second
derivatives at the same point, u = 0.
What we have now is p0 and p2 used to represent the points at the parameter boundaries,
namely u = 0 and u = 1, while p1 is the first derivative at u = 0 and p3 is the first
derivative at u = 1. So, at both boundary points we have the first derivative, denoted by the
control points p1 and p3.
(Refer Slide Time: 16:14)
Following the approach we used to derive the basis matrix for natural cubic splines, we can also
derive the basis matrix for the Hermite cubic splines; we will not do that here, and you can try it
yourself. The approach is the same: set up the system of equations, identify the constraint
matrix, take its inverse and get the basis matrix. The matrix looks something like this, which is
the representation for the Hermite cubic splines.
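If you want to verify your derivation numerically, the sketch below sets up the constraint matrix under the parameterization read from this lecture (p0 = f(0), p1 = f'(0), p2 = f(1), p3 = f'(1)) and inverts it; this ordering is an assumption, and if the book orders the control points differently the rows, and hence the resulting matrix, change accordingly.

```python
import numpy as np

# Hermite cubic piece f(u) = a0 + a1*u + a2*u^2 + a3*u^3 with the assumed ordering
#   p0 = f(0), p1 = f'(0), p2 = f(1), p3 = f'(1).
C = np.array([[1.0, 0.0, 0.0, 0.0],    # p0 = a0
              [0.0, 1.0, 0.0, 0.0],    # p1 = a1
              [1.0, 1.0, 1.0, 1.0],    # p2 = a0 + a1 + a2 + a3
              [0.0, 1.0, 2.0, 3.0]])   # p3 = a1 + 2*a2 + 3*a3

B = np.linalg.inv(C)                   # Hermite basis matrix for this ordering
print(np.round(B, 2))
# [[ 1.  0.  0.  0.]
#  [ 0.  1.  0.  0.]
#  [-3. -2.  3. -1.]
#  [ 2.  1. -2.  1.]]
```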
213
(Refer Slide Time: 17:04)
Now, although the Hermite cubic splines support local controllability, they do not support all
the parametric continuity conditions. Unlike the natural cubics, they support only the C0 and C1
continuity conditions and do not support the C2 condition, whereas natural cubics support all three:
C0, C1 and C2. This implies that the curve resulting from the Hermite cubic polynomials is less
smooth compared to a natural cubic polynomial based spline. So, at the expense of smoothness we
are getting the local controllability property; we gain local controllability, but we
lose, to some extent, the degree of smoothness of the curve.
214
(Refer Slide Time: 18:15)
Now, one problem here is that we have to specify the first order derivatives as control points
at both boundaries; clearly, this puts an extra burden on the user.
(Refer Slide Time: 18:47)
To avoid that, we have another spline, the cardinal cubic spline. With this
spline we can resolve the problem faced with Hermite cubic splines, that is, having to
define the first order derivatives at both boundary points.
215
(Refer Slide Time: 19:08)
Like before, since we are again dealing with cubic splines, this spline is also made up of
polynomial pieces, and each piece is defined by 4 control points. Again we denote them
p0 to p3, using the same notation, but here p1 and p2 represent the boundary values,
that is, at u = 0 and u = 1, while p0 and p3 are used to obtain the first order derivatives at
the boundaries. How? Look at this system of equations: this is the control point p1, which is
the function value at u = 0, and this is the control point p2, which is the function value at u = 1.
The first order derivative at u = 0 can be obtained using this equation, and similarly the first order
derivative at u = 1 can be obtained using this equation, where we have used the two outer control
points in this fashion and also used one additional parameter t in both cases.
216
(Refer Slide Time: 20:40)
Now, t is called the tension parameter; essentially, it determines the shape of the curve. When t
= 0, what we get is the Catmull-Rom, or Overhauser, spline, a special type of spline
obtained at t = 0. So, using the value of t we can actually control the shape of the overall spline
curve. Here we do not have to compute the derivatives explicitly; instead, we can derive them using
the control points and the value of t, and that actually makes the life of the user simpler.
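The sketch below computes these boundary derivatives from the control points and the tension parameter. The scaling factor 0.5*(1 - t) is the common textbook form of the cardinal spline tangents and is an assumption here; the control point values are illustrative.

```python
import numpy as np

def cardinal_tangents(p0, p1, p2, p3, t):
    """Boundary derivatives of a cardinal cubic piece running from p1 to p2.

    Common textbook form (assumed here):
        f'(0) = 0.5 * (1 - t) * (p2 - p0)
        f'(1) = 0.5 * (1 - t) * (p3 - p1)
    t is the tension parameter; t = 0 gives the Catmull-Rom (Overhauser) spline.
    """
    s = 0.5 * (1.0 - t)
    return s * (p2 - p0), s * (p3 - p1)

p0, p1, p2, p3 = map(np.array, ([0.0, 0.0], [1.0, 1.0], [2.0, 1.0], [3.0, 0.0]))
print(cardinal_tangents(p0, p1, p2, p3, t=0.0))   # Catmull-Rom tangents
print(cardinal_tangents(p0, p1, p2, p3, t=0.5))   # tighter curve, smaller tangents
```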
But again, the cardinal cubic splines suffer from the same problem: they support only the C0
and C1 parametric continuity conditions and do not support C2. So, they give a less smooth curve
than natural cubics.
217
(Refer Slide Time: 21:48)
So, what is left is the basis matrix formulation. This is slightly complicated, but again you
can try it using the same approach: set up the system of equations, where some
rearrangement may be required, then create the constraint matrix and take its inverse to get the
basis matrix for the cardinal cubic spline, which is represented here in terms of s, where s is this
expression.
So, to summarize, we have learned about 3 interpolating splines. Each of these interpolating
splines is made up of polynomial pieces, and these pieces are cubic, so each piece is defined
by 4 control points. In the case of the natural cubic the control points are defined in a particular way, in
the case of the Hermite cubic they are defined in a different way, and in the case of the cardinal cubic they
are defined in yet another way. The natural cubic is very smooth, but it has some problems.
Those problems are particularly related to local controllability, which is taken care of in the
Hermite cubic, but at the expense of smoothness. Specifying Hermite cubics is slightly
difficult, which is in turn taken care of by cardinal cubics, but the smoothness remains less
than that of natural cubics. Now, let us move to the other category of splines, namely the
approximating splines. What are these splines? Just to recap, here we have a set of control
points, but the spline need not pass through all the points; instead, the control points are used to
define the convex hull, which determines the shape of the overall spline curve.
218
(Refer Slide Time: 24:18)
Now, there are two popular approximating splines used in computer graphics. One is called
cubic Bezier curves, and the other is B-splines. As in the case of interpolating splines, let
us try to understand these types in a little more detail. We will start with cubic Bezier curves.
(Refer Slide Time: 24:47)
Now, these particular curves, the cubic Bezier curves, are among the most widely used representation
techniques in graphics. The name is derived from the French engineer Pierre Bezier, who
first used them to design Renault car bodies.
219
(Refer Slide Time: 25:17)
Since it is a cubic curve, it is defined by 4 control points as before, but the difference here is that the
curve is an approximating spline, which means the polynomial pieces do not pass through all 4
control points. We can denote these points, as before, by p0, p1, p2 and p3. So, there are 4 points,
but it is not necessary that the curve passes through all of them.
(Refer Slide Time: 25:51)
Instead, these points are used to define the convex hull: each piece originates at p0 and ends at
p3, that is, the first and the last control points. The other two points are used to determine the
220
convex hull within which the curve lies. So, 2 points are on the curve and the other 2 points
control the shape of the curve.
(Refer Slide Time: 26:24)
Now, the control points can be used to define the first order derivatives at the boundary values, that is,
at u = 0 and u = 1, using these expressions. This is the first order derivative at u
= 0 and this is the first order derivative at u = 1, and these two derivatives are defined
in terms of the control points as shown in these expressions.
(Refer Slide Time: 27:05)
221
Now we can set up the equations. This is the cubic polynomial equation, as we have seen before,
and we substitute the boundary values and the derivative conditions into it. What we then get is
this system of equations: this is the boundary value at u = 0, this is the boundary
value at u = 1, this is the first order derivative at u = 1 and this is the first order derivative at u
= 0.
So, for each we can replace u with the corresponding value and obtain the set of
equations, as we have seen earlier.
(Refer Slide Time: 28:19)
From this set of equations we can rearrange and get the form shown here for the control points.
222
(Refer Slide Time: 28:39)
Now, from that rearranged form we can get the constraint matrix as shown here. Then we take the
inverse of the constraint matrix, C^-1, to get the basis matrix
representation of the Bezier curve, which is something like this. Note that here we followed the
same approach: first we created the set of equations using the control point characteristics,
then we formulated the constraint matrix, and finally we took the inverse of the constraint matrix
to get the basis matrix.
(Refer Slide Time: 29:32)
223
There is also a blending function representation for Bezier curves. We already said the two
representations are equivalent, but the blending function representation of Bezier curves is also quite
popularly used. It has a general form something like this, where the pk are the control points and BEZ are
the blending functions, and the function is defined within the range of u between 0 and 1.
(Refer Slide Time: 30:12)
Now, these blending functions are actually special functions, known as
Bernstein polynomials. So, Bezier curves we can
represent using blending functions, where the blending functions are the Bernstein
polynomials, which take the form BEZ(k, n) = C(n, k)·u^k·(1-u)^(n-k), shown here in the first line,
where the term C(n, k) = n!/(k!(n-k)!) is shown in the second line.
224
(Refer Slide Time: 31:00)
Now, if we expand the blending functions for, say, a cubic Bezier curve where n = 3, how do they
look? BEZ(0, 3) is (1-u)^3, BEZ(1, 3) is 3u(1-u)^2, BEZ(2, 3) is 3u^2(1-u) and BEZ(3, 3) is u^3.
These are the blending functions for the cubic Bezier curve. The same thing we can derive from the
basis matrix also; as I said in the previous lecture, they are equivalent, one can be derived from the
other, and you can try to do the same.
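As a small illustration, the sketch below evaluates a point on a cubic Bezier curve directly from the Bernstein blending functions just listed; the four control points are illustrative.

```python
import numpy as np
from math import comb

def bernstein(k, n, u):
    """BEZ(k, n)(u) = C(n, k) * u^k * (1 - u)^(n - k)."""
    return comb(n, k) * (u ** k) * ((1.0 - u) ** (n - k))

def bezier_point(control_points, u):
    """Point on a Bezier curve: weighted sum of control points, weights BEZ(k, n)(u)."""
    n = len(control_points) - 1
    return sum(bernstein(k, n, u) * p for k, p in enumerate(control_points))

# Four illustrative control points of a cubic Bezier piece.
P = [np.array(p) for p in ([0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0])]
print(bezier_point(P, 0.0))    # equals p0 -- the curve starts at the first point
print(bezier_point(P, 1.0))    # equals p3 -- and ends at the last point
print(bezier_point(P, 0.5))    # an interior point inside the convex hull
```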
(Refer Slide Time: 31:46)
225
So, one major problem with cubic Bezier curves is that they do not support local controllability,
which is of course a major concern. Now, let us move to the other category of approximating splines,
that is, B-splines.
(Refer Slide Time: 32:14)
Now, here the idea is slightly more complicated. B-splines are approximating splines, as
we have already mentioned; they support parametric continuity conditions up to C2, that is, they
support all three of C0, C1 and C2. In other words, these splines give
us a very smooth curve, and they also support local controllability. So, while Bezier curves do not
have local controllability, B-splines are very smooth as well as locally controllable.
So, all our problems are solved with these splines.
226
(Refer Slide Time: 33:02)
Now, let us try to understand the basic idea; the mathematics behind it is slightly complicated, so we will try to keep it to the minimum for simplicity. What is the idea? The idea is based on representing a polynomial in terms of other polynomials, as we have seen in the blending function representation. So, that is where this idea starts, that a polynomial can be represented in terms of other polynomials.
(Refer Slide Time: 33:40)
227
So, what is the most general form of such a representation? We have encountered this expression in the previous lecture, where the pi are the control points and the bi are the blending functions. So, we are assuming in this representation that the function f is a linear combination of the blending functions, where the control points p serve as the coefficients. Now, in this definition, one thing to be noted is that we are using a parameter t instead of u. This parameter is more general; that means we need not restrict ourselves to the range defined for u, that is, between 0 and 1. So, t can take any value, not necessarily within the range 0 to 1.
(Refer Slide Time: 34:40)
Now, each bi is a polynomial. Let us assume that; then f can be considered as a spline made up of polynomial pieces, which is basically the idea. Now, we can represent each bi as a combination of other functions, like the overall function f. So, f is represented as a linear combination of the bi's, and each bi in turn can be represented as a combination of other functions. Then, conceptually, each bi is also a spline, because it is made up of other polynomials.
So, when we talked about splines, we said that a spline is a curve made up of lower degree polynomials. Now each of these polynomials, again, we are assuming to be made up of even other polynomials. So, these polynomial pieces are themselves splines rather than simple polynomials. And that is the overall idea of B-splines: we are having a spline which is made up of basis splines rather than polynomial pieces.
228
Now, each basis spline is made up of polynomial pieces. So, the definition is one level higher than the basic definition of a spline. In a spline, we have a curve made up of polynomial pieces; in a B-spline, we have a curve made up of basis splines, which in turn are made up of polynomial pieces.
(Refer Slide Time: 36:19)
So, what we get is a spline made up of B-splines and that is the overall idea. So, when we are talking of B-splines, we are actually referring to the constituent splines.
(Refer Slide Time: 36:36)
229
Let us try to understand in terms of an example. Suppose we have 4 control points and we want
to fit a curve through those points.
(Refer Slide Time: 36:51)
So, since we have 4 points, our curve function will look something like this; we expand the general form shown earlier to get the 4 blending functions.
(Refer Slide Time: 37:17)
230
Now, assume that we are using linear B-splines; that means B-spline pieces made up of linear polynomials.
(Refer Slide Time: 37:30)
Then, each B-spline will have two linear pieces as per the following equation. This is the B-spline and these are the linear pieces, each defined within a range of t.
(Refer Slide Time: 37:58)
231
So, as you can see in the previous formulation, each B-spline is defined between sub intervals of the parameter t. Now, within this sub interval, the pieces that are the constituents of the B-spline have their own ranges. So, we have a range for the B-spline and we have sub ranges for the constituent polynomial pieces of the spline.
(Refer Slide Time: 38:27)
Now, those points in the parameter range where a piece starts or ends are called knot points, or simply knots. For the ith B-spline, the knots are i, i+1 and i+2. Just to recollect: we have a spline made up of B-splines. Each B-spline is defined within a range, and that range is subdivided for each constituent piece of the B-spline. The points where a piece starts or ends are called knot points or simply knots, and for the ith B-spline, the knots are i, i+1 and i+2.
232
(Refer Slide Time: 39:21)
For the 4 control points in our example, i takes the values 1 to 4. So, then we can have the knots as 1, 2, 3, 4, 5, 6; these are the 6 knot points, and this set of points is the vector of knots, also known as the knot vector, which is, of course, in increasing order of parameter values.
(Refer Slide Time: 39:57)
So, then to summarize: each B-spline is made up of k polynomial pieces, where k equals d+1 and d is the degree of the polynomial. The parameter t ranges between 1 and n+k, with n+k knots, where n is the number of control points. In our example n is 4 and k is 2 since d is 1, so the range is between 1 and 6, with 6 knots.
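As a small illustration of these ranges (my own sketch in Python, following the linear-piece definition given above), the ith linear B-spline is nonzero only between the knots i and i+2:

def linear_bspline(i, t):
    # ith B-spline for k = 2: two linear pieces between the knots i, i+1 and i+2.
    if i <= t < i + 1:
        return t - i            # rising piece on [i, i+1)
    if i + 1 <= t < i + 2:
        return (i + 2) - t      # falling piece on [i+1, i+2)
    return 0.0                  # zero outside its knot span

# With n = 4 control points and k = 2, the knot vector is (1, 2, 3, 4, 5, 6)
# and the curve uses the four B-splines for i = 1, 2, 3, 4.
for i in range(1, 5):
    print(i, [linear_bspline(i, t) for t in (1.5, 2.0, 2.5, 3.5, 4.5)])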
233
(Refer Slide Time: 40:38)
The ith B-spline is defined between the knots i and i+k. For example, the second B-spline, that means i=2, is defined between 2 and 2+2 or 4, because k is 2. So, between 2 and 4, that means the knot values are 2, 3 and 4.
(Refer Slide Time: 41:10)
So, each B-spline has k+1 knots. In our example, since k equals 2, each B-spline will have 3 knots; for any other value of k the B-spline will have k+1 knots. So, these are the characteristics of the B-spline.
234
(Refer Slide Time: 41:32)
Another characteristic is that k is actually very crucial in determining all these properties. So, this k value is often called the B-spline parameter. You should keep in mind that k is very important; it plays a very crucial role in determining the B-spline characteristics that we have enumerated earlier, and so k is often called the B-spline parameter. Now, let us see the various types of B-splines. So far, we have given a basic introductory idea of B-splines; now let us see the types that are there.
235
(Refer Slide Time: 42:29)
There are 3 types: uniform B-splines, non-uniform B-splines and NURBS or non-uniform rational B-splines. We will briefly introduce each of these types.
(Refer Slide Time: 42:44)
So, when the knots are uniformly spaced in the knot vector, like in the case of the example we have seen before, those B-splines are called uniform B-splines. If the knots are not uniformly spaced, that is, the spacing between consecutive knots is not the same, then we get non-uniform B-splines.
236
(Refer Slide Time: 43:20)
And finally, NURBS refers to B-splines that are defined as the ratio of two quantities, shown here in this expression, where the weights are scalars and the bi's are non-uniform B-splines; it should also be noted that the same B-splines are used in both the numerator and the denominator. So, NURBS essentially refers to this ratio. To summarize: we have uniform B-splines, where the knot points are uniformly spaced; we have non-uniform B-splines, where the knot points are not uniformly spaced; and we have NURBS, where we take the ratio of two quantities, each of which is made up of the same non-uniform B-splines.
237
(Refer Slide Time: 44:33)
Just for your information, although we will not go into the details here, we can derive the piecewise functions for B-splines, but that is complicated; there is one recurrence relation we can use to do that, called the Cox-de Boor recurrence relation. If you are interested, you may refer to the book and the material mentioned at the end of this lecture. So far, we have discussed different types of splines. Now, the basic idea of splines is that using them we should be able to represent some curves and render them on the screen. So, how can we display spline curves?
(Refer Slide Time: 45:27)
238
So, this is where we started: how to fit a curve to a given set of control points. We are given a set of control points and we want to fit a curve, and we discussed how to use the idea of splines to get the curve.
(Refer Slide Time: 45:49)
Now what can we do? So, we got the spline. Then the simplest way is to determine or interpolate new control points using the spline equation and then join these points using line segments. We can also use the blending function representation for interpolating these points.
239
(Refer Slide Time: 46:14)
As an example, suppose we are given two control points, p0 and p1; this is p0 and this is p1. Since two control points are there, we can use a linear interpolating spline, and once we identify the spline representation using the approaches mentioned already, we can evaluate the spline function to create 3 new control points, shown here. We take the parameter values for these new points, defined here, and then we get the actual function values.
So, we are given two control points, we are using a linear interpolating spline and, using that spline, we are creating 3 new control points. Once these points are created, we can join them using line segments to get the curve.
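A minimal sketch of this idea in Python (my own code), assuming the linear interpolating spline p(u) = (1 - u) p0 + u p1 and three intermediate parameter values chosen purely for illustration (the slide's exact values are not reproduced here):

def lerp(p0, p1, u):
    # Linear interpolating spline between two control points, u in [0, 1].
    return ((1 - u) * p0[0] + u * p1[0],
            (1 - u) * p0[1] + u * p1[1])

p0, p1 = (0.0, 0.0), (4.0, 2.0)
# Evaluate the spline at three intermediate parameter values to get 3 new points.
new_points = [lerp(p0, p1, u) for u in (0.25, 0.5, 0.75)]
# The displayed curve is then the polyline p0 -> new points -> p1,
# obtained by joining consecutive points with line segments.
polyline = [p0] + new_points + [p1]
print(polyline)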
240
(Refer Slide Time: 47:35)
And we can keep on adding more and more points to get a final curve.
(Refer Slide Time: 47:42)
In fact, if more control points are there, we can use them to get better splines, higher order splines instead of linear splines, and follow the same approach. So, anything is possible.
241
(Refer Slide Time: 48:04)
But the main problem is that we have to evaluate the blending function again and again. Now,
when we want to display the splines at a very high rate, these computations may be costly and
may slow the rendering process. So, we require some alternative approach.
(Refer Slide Time: 48:22)
One of those approaches is known as the De Casteljau algorithm, and we will have a quick look at it.
242
(Refer Slide Time: 48:36)
So, the algorithm consists of a few steps. There are n control points given. First, we join the consecutive pairs of control points with line segments. Then we divide each line segment in the ratio d : 1-d to get n-1 new points, where d is a real number between 0 and 1. We join these new points with line segments, go back to the first step, and continue till a single point is obtained. So, this is, in short, how it works.
(Refer Slide Time: 49:31)
243
Let us see one example. Suppose here n equals 4 and we have decided on d to be one third. So, p0, p1, p2, p3 are the initial points. Then at one third of the pair p0 and p1 we take one point; similarly, at one third of the pair p1, p2 we take another point, and then another. These new points now play the role of the control points, and we continue in the loop to finally get a point here after the algorithm is executed.
So, this division continues till we get one point, and at that point we stop. That is the output point. This is, in short, how we can use a simpler, iterative procedure to get points on the curve using the algorithm.
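A compact Python sketch of the procedure just described (my own implementation; for a point on the curve, d is taken between 0 and 1, for example one third as in the example above):

def de_casteljau(points, d):
    # Repeatedly divide each line segment between consecutive points in the
    # ratio d : 1-d, until a single point remains; that point lies on the curve.
    pts = list(points)
    while len(pts) > 1:
        pts = [((1 - d) * x0 + d * x1, (1 - d) * y0 + d * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

# n = 4 control points and d = one third, as in the example.
control = [(0, 0), (1, 2), (3, 2), (4, 0)]
print(de_casteljau(control, 1 / 3))

# Running it for m different values of d gives m points on the Bezier curve,
# which can then be joined with line segments to draw the curve.
curve = [de_casteljau(control, i / 20) for i in range(21)]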
(Refer Slide Time: 51:01)
So, what we are doing in each run is getting a new point, and we can execute the algorithm as many times as we want; to get m new points we run it m times. When these points are joined with line segments, we get the Bezier curve. It has to be noted here that we get this particular curve, not all spline types; this algorithm actually refers to the creation of a Bezier curve, which is an approximating spline. We can execute the algorithm m times to get m points on the Bezier curve. So, that is how we can get the points and create the curve.
Now there is another thing in computer graphics: we are not necessarily satisfied with only curves. What we need are surfaces also, curved surfaces. So, using the spline idea, how can we get curved surfaces? It is very simple.
244
(Refer Slide Time: 52:13)
Look at this figure here. We can actually create a grid of control points, as shown in this figure. Each subset of the grid can be used to get a curve in different ways, and when we put these curves together, we get a surface. So, to get the surface, we define a grid, shown by the circles here; using one subset, we get one spline curve, using another subset, we get another spline curve which intersects it, and when we get this grid of curves, we get the surface, the surface is defined by that grid. So, this is the simple way of getting a surface.
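The lecture keeps the surface construction at this conceptual level. As one possible concrete reading (a Python sketch under my own assumptions, not the lecture's formulation), a tensor-product patch blends a grid of control points with one blending function along each grid direction:

from math import comb

def bernstein(k, n, u):
    return comb(n, k) * (u ** k) * ((1 - u) ** (n - k))

def surface_point(grid, u, v):
    # grid is a 2D list of 3D control points; blend the rows with parameter u
    # and the columns with parameter v to get a point on the curved surface.
    m, n = len(grid) - 1, len(grid[0]) - 1
    x = y = z = 0.0
    for i, row in enumerate(grid):
        for j, (px, py, pz) in enumerate(row):
            w = bernstein(i, m, u) * bernstein(j, n, v)
            x += w * px
            y += w * py
            z += w * pz
    return (x, y, z)

# A 3 x 3 grid of control points defines one small curved patch.
grid = [[(i, j, (i - 1) ** 2 + (j - 1) ** 2) for j in range(3)] for i in range(3)]
print(surface_point(grid, 0.5, 0.5))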
So, to recap, today we have learned many new things. First of all, we learned about different types of splines. We talked about 3 interpolating splines: natural cubic, Hermite cubic and Cardinal cubic. We also talked about two approximating splines, namely Bezier curves and B-splines.
The idea of B-splines is slightly complicated, and we explained it in terms of an example. Then we touched upon the idea of how to display a spline curve. The simple approach is: first get the spline equations or spline representation of the curve, use that to interpolate new points on the curve, and join those points using line segments to get the final curve. But here, evaluating the spline equations at each new point may be time consuming, so it may not be an efficient solution.
So, to avoid that, some efficient algorithms are proposed. One of those is the De Casteljau method. In this approach, we can get points on Bezier curves using a simple iterative procedure which does not involve lots of computations. We also touched upon the idea of spline surfaces, that is, essentially creating a set of curves and then using those curves to define the overall surface.
(Refer Slide Time: 55:10)
Whatever we have discussed today can be found in Chapter 2 of this book. So, you can refer to Section 2.3 of the book to get more details, including some topics that we have excluded from our discussion today, namely, the Cox-de Boor equations used to get the B-spline blending functions.
With this, we complete our discussion on splines. One more topic is left in the overall discussion on object representation, which we will cover in the next lecture. Thank you and goodbye.
246
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 9
Space Representation Methods
Hello and welcome to lecture number 9 in the course Computer Graphics. We are discussing
the stages of the graphics pipeline. As you may recollect, there are 5 broad stages, the very
first stage is the object representation, and we are currently discussing the object representation techniques.
(Refer Slide Time: 0:55)
This figure may help you recollect the stages. We have the first stage as object representation, then we have the second stage as modeling transformation, the third stage is lighting or coloring, and the fourth stage is the viewing pipeline, which itself consists of 5 sub stages: the viewing transformation, clipping, hidden surface removal, projection transformation and window to viewport transformation. And finally, we have the scan conversion, which is the fifth and final stage. Now, we are discussing the first stage, object representation.
247
(Refer Slide Time: 1:44)
So far what we have discussed, let us have a relook. There are broadly 4 techniques that are
popularly used for representing objects, one is point sample rendering, then we have
boundary representation. Now, in boundary representation, there are three techniques: mesh
representation, parametric representation and implicit representation. In the previous few
lectures, we talked about point sample rendering, the boundary representation techniques
including mesh representation, parametric representation and implicit representation.
Also, in the previous couple of lectures, we discussed in detail one of the very popular parametric representation techniques, namely the spline representation. The other techniques include the space representation or space partitioning based representation technique, which itself has 3 sub techniques, namely octree, BSP or binary space partitioning, and CSG. Then, we have sweep representation, having further techniques, namely sweep surface representation and surface of revolution representation.
Apart from these four, there are some other representation techniques which are application specific, and some advanced techniques such as scene graphs, skeleton models and advanced modeling techniques including particle systems, fractal systems and so on. We have got some idea of the various representation techniques in our previous lectures. Today we are going to discuss in detail the space partitioning techniques for representing objects, including the sub techniques.
248
(Refer Slide Time: 3:56)
So, that will be the topic of our discussion today, space partitioning methods.
(Refer Slide Time: 4:04)
249
Now, when we talk about space partitioning, what do we refer to? As the name suggests, space partitioning refers to the representation of an object in terms of the 3D space that it occupies. The space is defined by its boundaries, and we represent the space or the volume that is enclosed by the boundaries, rather than the boundaries themselves, which is what we have seen in the earlier lectures.
(Refer Slide Time: 4:43)
Now, there is one important concept in space partitioning techniques, called voxels. You have heard of pixels; we mentioned this term earlier as the smallest display unit, and on a display screen we assume that there is a pixel grid. Similarly, a voxel is basically the 3D counterpart of the concept of a pixel: the voxel is the smallest unit of representation in a 3D space, and any 3D space can be assumed to have a voxel grid.
250
(Refer Slide Time: 5:35)
Like a pixel grid, we can create or assume that a voxel grid exists to represent a 3D space. So,
a pixel grid is for 2D space, a voxel grid is for 3D space. And the voxel, as may be obvious, is the simplest way of representing 3D objects.
(Refer Slide Time: 6:12)
Now, when we talk about voxels, we are essentially referring to typically uniform sized cubes or parallelepipeds. So, essentially, a grid of uniform sized cubes or parallelepipeds, that is our voxel grid. Now, each voxel element of the grid typically carries various information. What is that information? The intensity at that particular 3D region, the temperature of that region and so on. And this information actually helps in uniquely describing the 3D scene. So, essentially voxels have attributes which are characteristic information for the particular object in the scene.
(Refer Slide Time: 7:25)
Using voxels, we can apply various techniques to represent objects. One such technique is the octree method. What is this method? As the name suggests, it is a kind of tree representation; now let us try to understand what this tree is.
(Refer Slide Time: 7:51)
So, essentially it refers to a method to create a voxel grid, but in the form of a tree.
252
(Refer Slide Time: 7:59)
So, in this method, the input is a 3D region. Now, we divide this input space into 8 sub
regions, then each sub region in turn is again divided into 8 sub sub regions. So, essentially
this is a recursive step and this recursion continues till we arrive at some preset uniform size
of the sub regions. Now, this size can be user defined, for example, a unit cube. So, when we
see that our division leads to unit cube sized regions, then we stop there, an example is shown
in this right-hand figure.
So, this is our initial 3D region. Now, as you can see, we have created 8 sub regions from here, this is 1, 2, 3, 4, 5, 6, 7 and 8. Then each of these sub regions is further divided into 8 sub sub regions, like here, as you can see in this division, again there are 8 sub regions 1, 2, 3, 4, 5, 6, 7, 8. Similarly, here also we divided, and all the other sub regions we can divide in the same way.
253
(Refer Slide Time: 9:50)
Now, what is the output of this recursion? It creates a tree. So, we have a root node. For the root node, we have created 8 child nodes, from here to here. Now for each child node, again we are creating 8 child nodes, and this process continues. So, essentially this leads to a tree. Since each node has 8 children, we call it an octree, and the leaf nodes of this tree represent the 3D space. So, all the intermediate nodes are intermediate representations; they do not represent the final object, and at the leaf level the final object is represented.
Remember here that along with the space, the attributes or the characteristic information are also stored at the leaf node level.
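A minimal Python sketch of this recursive construction (my own simplified data structure; a region is taken as an axis-aligned cube given by its minimum corner and edge length, and the attribute information is omitted):

class OctreeNode:
    def __init__(self, corner, size):
        self.corner = corner     # (x, y, z) of the region's minimum corner
        self.size = size         # edge length of the cubic region
        self.children = []       # 8 children for internal nodes, none for leaves

def build_octree(corner, size, min_size=1.0):
    # Recursively divide the region into 8 sub regions until the preset
    # uniform size (for example, a unit cube) is reached.
    node = OctreeNode(corner, size)
    if size <= min_size:
        return node              # leaf: one voxel of the final representation
    half = size / 2.0
    x, y, z = corner
    for dx in (0, half):
        for dy in (0, half):
            for dz in (0, half):
                node.children.append(
                    build_octree((x + dx, y + dy, z + dz), half, min_size))
    return node

root = build_octree((0.0, 0.0, 0.0), 4.0)   # a 4x4x4 region -> 64 unit-cube leaves
print(len(root.children))                    # 8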
(Refer Slide Time: 10:59)
254
So, in order to represent different objects, we can associate unique properties with those leaf nodes or voxels, properties such as color, density, temperature and so on; that is the basic idea. So, we are given a 3D space, the space is divided into 8 regions and we continue this division: each sub region is divided further into 8 sub sub regions and so on. We continue this division in a recursive manner till we reach the voxel level or the uniform sized cubes. This process creates a tree; each node in this tree has 8 children, so the tree is called an octree.
The leaf level contains the voxels, which are the representation of the 3D object, and characteristic information such as color, density, temperature and so on is associated with each voxel in this tree to uniquely identify objects. The octree is one of the various methods where space is used to represent the objects. Another popular method is the BSP method.
(Refer Slide Time: 12:51)
Now, BSP stands for binary space partitioning, and the method is the binary space partitioning method. So, what does it do? In the octree, we have done some recursive partitioning of space. In the BSP tree, we follow a similar recursion, that is, we are given a space, we divide it into subspaces, then divide the subspaces again and continue till some condition is met.
255
(Refer Slide Time: 13:26)
However, instead of dividing a space into 8 subspaces like we have done in the case of the octree, here we divide it into 2 subspaces in each recursive step. So, in the case of the octree we are dividing the region into 8 sub regions; in the case of the BSP tree we are dividing the region into 2 sub regions, and that is why it is called binary space partitioning. Binary, or two.
(Refer Slide Time: 14:01)
Now, how do we divide? We use planes. So, we are given a region, and we assume planes are available to divide the region into sub regions. But these planes need not be parallel to the planes formed by the axes XY, YZ or ZX; they can be planes of any orientation. Of course, if the planes are parallel to the planes formed by the principal axes, then it is easier to implement, otherwise we have to do additional computations.
(Refer Slide Time: 14:49)
Let us see one example. Suppose we are given this 3D region and we are going to represent it in the form of a BSP tree. So, our root node is the whole object. Then we have used a plane, this one, to divide it into two regions, the left region B and the right region C, left with respect to the plane and right with respect to the plane, the left sub region and the right sub region. Then we have used another plane here to divide B into two sub regions, D and E. So, eventually what we got is that the object is represented in terms of three sub regions, D, E and C. These three, at the leaf level, represent the object. And again, like in the case of the octree, we can associate unique properties with each of these sub regions to uniquely identify the object.
Now, here we have used two planes which are orthogonal to each other, they are orthogonal planes, but, as we have already seen, that is not a strict requirement; planes of any orientation can be used to divide a region into two sub regions.
So, here, instead of these orthogonal planes, we could have used any plane of any orientation to divide it into two sub regions; that is the most general formulation for BSP tree creation. But, as I said, if we are using planes that are not parallel to the planes formed by the principal axes, then additional computations may be required to process the representation. Let us see one more example of BSP tree creation, that is, how we can write an algorithm for the creation of such a representation.
257
(Refer Slide Time: 17:29)
Consider this figure; here we want to represent this circle. This is of course a two dimensional figure, it is not a 3D object, but we want to represent this 2D object, the circle having its center at Pm and radius r, using a BSP tree.
(Refer Slide Time: 18:01)
How can we do that? We take as input the surrounding region R, which is the square represented by the four vertices or corner points P1, P2, P3 and P4, and the circle lies within this region. So, when we are representing the circle, we are essentially representing this region with some sub regions that are part of the circle, having the unique characteristics of the circle. How can we do that?
258
(Refer Slide Time: 18:46)
So, we will partition the region into 4 sub regions. That is, first we divide it into 2 sub regions, then each of these 2 sub regions we divide into 2 further sub regions, and so on. Just for simplicity, we are combining these steps together here and stating that we are dividing it into 4 sub regions. And here we are using lines parallel to the axes.
(Refer Slide Time: 19:25)
Using that idea, what can we do? We can write an algorithm for a function, say create-BSP, where R is the input region. Now, we will use R to create a node; then from R we will create 4 regions by joining the midpoints of the sides of the square. This is the same as applying the binary space partitioning technique in multiple steps: like in this case, we first create 2 regions, then for each of these two we create two more, and we are actually combining these steps here in this line.
259
Now, for each of these sub regions, here at the leaf nodes of this figure, we check its size; suppose the region is divided into 4 sub regions and we are considering one of them. If the size of the sub region is a unit square, where we are assuming that a pixel is represented as a unit square, then we go to the next step: if the distance between the centers of the original region R and of the sub region that we are considering currently is less than or equal to the radius of the circle, then this sub region is part of the circle.
So, we add it as a leaf node to the BSP tree and mark it as 'inside'. Otherwise, we mark it as 'outside', although we still add it as part of the BSP tree. Now, if the size is not a unit square, that is, we have not reached the termination condition of the recursion, then for each sub region we call the function again recursively, as mentioned in these 2 steps. So, at the end of it we get a tree, where the leaf nodes represent the bounding region, the original region divided into sub regions.
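The steps above can be written out roughly as follows (a Python sketch of the described procedure with my own function and field names; the region is kept as an axis-aligned square, and the circle center and radius are passed in explicitly):

from math import dist   # Euclidean distance (Python 3.8+)

class BSPNode:
    def __init__(self, corner, size, label=None):
        self.corner, self.size = corner, size   # a square sub region
        self.label = label                      # 'inside' / 'outside' for leaf nodes
        self.children = []

def create_bsp(corner, size, center, radius):
    # Build the tree for the square region that bounds the circle.
    node = BSPNode(corner, size)
    x, y = corner
    sub_center = (x + size / 2.0, y + size / 2.0)
    if size <= 1.0:                             # a pixel-sized (unit) square
        # Leaf: mark it 'inside' if its center lies within the circle's radius.
        node.label = 'inside' if dist(sub_center, center) <= radius else 'outside'
        return node
    half = size / 2.0
    # Joining the midpoints of the sides gives 4 sub regions
    # (two binary partitioning steps combined, as in the lecture).
    for dx in (0, half):
        for dy in (0, half):
            node.children.append(create_bsp((x + dx, y + dy), half, center, radius))
    return node

def count_inside(node):
    # Count the unit squares marked as part of the circle.
    if node.label == 'inside':
        return 1
    return sum(count_inside(c) for c in node.children)

# A circle of radius 3 centered in an 8 x 8 bounding region.
tree = create_bsp((0.0, 0.0), 8.0, (4.0, 4.0), 3.0)
print(count_inside(tree))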
Now, some of these sub regions are marked as inside this particular object, inside the circle, whereas others are marked as outside. So, from that tree, we get a representation of the circle. So far so good. Now, what is the problem? Is there any problem with this space partitioning approach?
(Refer Slide Time: 23:00)
One problem may be very obvious to you by now: we require a large amount of memory to store the voxel grid information. So, we are creating a tree where the leaf nodes represent the grid, and if we are partitioning uniformly, then there will be a large number of voxels and we will require a significant amount of memory to store the voxel information. The problem comes because we are dividing space into uniform sized voxels irrespective of the space properties.
As we have seen in the earlier example, the region R is a big region and within it the circle occupies a small area, but we are dividing the region into uniform sized sub regions and many sub regions may be outside the circle. So, if we want to represent the circle, it is not necessary to represent those regions which are outside the circle; that actually wastes some memory space. If we have some method to avoid this wastage, then of course that would be helpful.
(Refer Slide Time: 24:21)
So, what do we want? If a certain region in space has the same properties everywhere, then ideally we should not divide it further, because even if we divide it, whatever we get will have the same property. Instead, what we currently do is still divide it into voxels, although each voxel contains the same attributes, because the broader region in space has the same property everywhere.
261
(Refer Slide Time: 24:59)
So, we can actually save memory (there is a typo on the slide; the word should be 'save') by using this idea of non-uniform partitioning. Earlier we were talking of partitioning a region into uniform sized voxels; now we are talking of non-uniform sized space partitioning. What is that? Instead of using many voxels to represent a homogeneous region, that is, a region where the property is the same everywhere, we use a single unit. So, we do not divide that region further; instead, the whole region is represented as a single unit.
(Refer Slide Time: 25:47)
How can we do that? One way is to modify the octree method. Earlier we had a fixed 8 children for each node, because we were dividing the region into 8 sub regions irrespective of the property of the region. Now, what we can do is either divide or not divide. So, if one sub region has the same property everywhere, then we do not divide it, and that node will not have any children, or 0 children. But if that is not the case, then we divide it into 8 sub regions, or 8 children.
So now, in the revised octree method, we will have either 0 or 8 children for each node, which was not the case earlier. Earlier we were getting 8 children for each intermediate node; now we will get either 0 or 8 children for each intermediate node, depending on the property of the space represented by the node.
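In code, the only change to the earlier octree sketch is an extra test before subdividing; the homogeneity test itself is application specific, so it appears here as an assumed helper passed in by the caller (a sketch, not a full implementation):

class Node3D:
    def __init__(self, corner, size):
        self.corner, self.size, self.children = corner, size, []

def build_adaptive_octree(corner, size, is_homogeneous, min_size=1.0):
    # is_homogeneous(corner, size) is an assumed, application-specific test that
    # returns True when the region has the same attributes everywhere.
    node = Node3D(corner, size)
    if size <= min_size or is_homogeneous(corner, size):
        return node             # leaf: 0 children, the region is not divided further
    half = size / 2.0
    x, y, z = corner
    for dx in (0, half):
        for dy in (0, half):
            for dz in (0, half):
                node.children.append(build_adaptive_octree(
                    (x + dx, y + dy, z + dz), half, is_homogeneous, min_size))
    return node                 # internal node: exactly 8 children

# Example: only the corner region near the origin is treated as non-homogeneous.
root = build_adaptive_octree(
    (0, 0, 0), 8.0,
    lambda c, s: not (c[0] < 2 and c[1] < 2 and c[2] < 2))
print(len(root.children))       # 8 at the top level, but most branches stop early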
(Refer Slide Time: 26:58)
The recursion procedure, of course, will remain the same. But at each step of the recursion, we will check if the attributes of the space are the same everywhere. If that is the case, then we do not divide it any further; we do not go for recursive division again for that particular region.
263
(Refer Slide Time: 27:35)
BSP representation also suffers from the same problem if we are going for uniform
partitioning.
(Refer Slide Time: 27:54)
And to avoid that, we can go for a revised or refined method where we either divide a region into 2 sub regions, if the region has different properties at different places, or we do not divide it at all. So, the intermediate nodes will have either 0 or 2 children, similar to the revised octree method. What we have learned, then, is that the basic unit of representation is the voxel. Using voxels we can divide a given 3D space, to represent the object contained in that space, in two ways; one is uniform division.
264
Now, when we go for uniform division, we will get one particular type of tree if we are following the octree method and another if we are following the BSP method. In the octree method, with uniform division, we will get 8 children for each node; in the BSP method, we will get two children for each node. Now, if the region which we are dividing has the same attributes or properties everywhere, then it will be a wastage of memory to divide that region into sub regions and store the information.
Instead, we need not divide it any further. So, in the case of octree method, we modify it for
non-uniform partitioning by imposing the fact that a node can have either 0 or 8 children.
Similarly, in a revised BSP method, we can modify by imposing the fact that each
intermediate node can have 0 or 2 children.
(Refer Slide Time: 29:56)
There is another space partitioning method known as CSG. What does it mean?
265
(Refer Slide Time: 30:07)
It stands for Constructive Solid Geometry. This is another method for space representation of objects. Now, in the case of the octree or BSP, what have we seen? We have seen that these representations are based on division of space: we are given a space and we divide it into subspaces. In comparison, what do we have in the case of CSG? It is actually based on joining up spaces, just the opposite of what we do in the BSP or octree methods. So, in the case of CSG we rely on joining of spaces to represent objects. Let us try to understand this with an example.
(Refer Slide Time: 31:00)
Consider this object; it looks pretty complex. However, when we are using the CSG or constructive solid geometry method, we can represent it as a union of some other shapes. So, we start here at this level: here we have this shape, this shape, this shape and this shape, so 4 shapes are there. Now, we combine these 2 shapes to get one shape and we combine these 2 shapes here to get this shape. These 2 shapes are then further combined to get the overall shape.
So, here we start with a set of primitive or basic shapes; then we apply a set of Boolean operators, namely union, intersection, difference, etc., which are defined over this set of primitive shapes, to get new shapes. Like here, on these 2 shapes we applied a Boolean operator to get a new shape, and this process continues till we get the desired shape here. So, the operators can be applied hierarchically: from this level we reach this level, from this level we reach this level, and at each level we apply the Boolean operators.
So that is a hierarchical application of operators. In other words, we have a set of primitive shapes defined, and a set of Boolean operators on those shapes also defined. We apply those operators on the shapes to get new shapes, and we apply them hierarchically till we get the desired shape, till we are able to represent the desired shape. So, what is the representation? The representation is essentially the primitive shapes and the Boolean operators applied in a hierarchical manner.
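One common way to make this concrete (a sketch of the general idea in Python, under my own assumptions rather than the lecture's exact formulation) is to store each primitive as a point-membership test and let the Boolean operators build new shapes hierarchically:

# Each primitive shape is represented by a membership test: is a point inside it?
def sphere(center, r):
    cx, cy, cz = center
    return lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2 <= r * r

def box(lo, hi):
    return lambda p: all(l <= c <= h for c, l, h in zip(p, lo, hi))

# Boolean operators defined over these shapes produce new shapes.
def union(a, b):        return lambda p: a(p) or b(p)
def intersection(a, b): return lambda p: a(p) and b(p)
def difference(a, b):   return lambda p: a(p) and not b(p)

# Hierarchical application of the operators: a box with a spherical hole,
# joined with another sphere.
shape = union(difference(box((0, 0, 0), (2, 2, 2)), sphere((1, 1, 1), 0.8)),
              sphere((3, 1, 1), 1.0))
print(shape((1.0, 1.0, 1.0)))   # False: this point lies in the carved-out hole
print(shape((0.1, 0.1, 0.1)))   # True: inside the box, outside the hole
print(shape((3.5, 1.0, 1.0)))   # True: inside the second sphere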
So, that is in summary, what this constructive solid geometry is all about. Now, let us
summarize what we have learned so far.
(Refer Slide Time: 34:07)
Today and in the previous few lectures we have learned various representation techniques, which constitute the first stage of the pipeline: boundary representation including splines, space representation, and an overview of other representations. Now, there are many techniques, as we have seen so far. One question is which technique to use and when? That is a question that always confronts a graphics system developer. Which technique should be used and in which situation? What guides that decision making process?
(Refer Slide Time: 34:55)
Now, we may wish to have the most realistic representation. Clearly, advanced techniques would be useful, but advanced techniques such as particle systems may not always be possible because they require lots of computational resources, which may not be available in all graphics systems. So, each technique comes with a cost and there is a trade-off regarding which technique to use in which situation.
(Refer Slide Time: 35:26)
268
Now, this cost may be a computational or a storage requirement, or both, and depending on the resources available we have to make the decision. If we have very limited resources, then we cannot think of advanced modeling techniques; we may have to lose some realism and settle for less realistic scenes or images.
(Refer Slide Time: 36:07)
For example, suppose we want to represent coastlines in a scene. What is desirable? Coastlines are typically self-repeating structures, and fractal representation is very good for such shapes. So, ideally we should use a fractal representation to have a realistic coastline displayed. But it may not be possible if we are considering a mobile platform with limited storage, processor and battery, because fractal representation has an additional computational cost which may not be supported on low end mobile devices.
So, in that case, we may have to use some simpler method. We are losing something, the realistic effect, but gaining something, a working approximation. Such trade-offs, and balancing them, are very much part of the decision making process.
269
(Refer Slide Time: 37:18)
Another consideration may be ease of manipulation. When we are creating animation, this consideration becomes important. Now, as we said, each representation has its own methods, its own algorithms, its own process. Subsequent stages of the pipeline need to process these representations to carry out the operations that are part of those stages, and that requires manipulation of the representations.
Now, suppose it requires a lot of time to manipulate the representation, for example, to rotate an object, shift objects horizontally or vertically, scale an object up or down, or clip, project or perform hidden surface removal, etc.; these are part of other stages of the pipeline, which we will discuss in detail later. If a lot of time is required to perform these operations for objects that are represented in a particular way, the quality of animation may be reduced. So, in such cases, we should look for simpler representations.
So, this is another consideration that should be kept in mind while choosing a particular
model of representation, that is ease of manipulation. So, if we are having low end devices,
low end graphics systems, then it is not advisable to go for advanced representation
techniques which require lots of manipulations later, particularly in the context of animation.
270
(Refer Slide Time: 39:18)
A third consideration is ease of acquiring data. For example, a vertex list representation for a curved surface using a mesh requires a large number of vertices to be specified. Now consider a spline representation; in that case, we require fewer control points to represent the same curve. So, clearly, representing a curve using a mesh involves more effort for acquiring data, whereas representing the same curve with a spline involves less effort to acquire data.
So, sometimes this ease of acquiring data can be a deciding factor; the designer of the graphics system may choose the particular method for which the data can be acquired most easily.
(Refer Slide Time: 40:22)
271
So, there are several trade-offs we have to balance. What are those trade-offs? The resources available, in terms of the computing platform; the nature of interaction, in terms of our level of comfort with supplying a large amount of data; and also the effect, whether we want a very realistic effect or we are looking for some approximation considering the lack of resources.
(Refer Slide Time: 41:07)
So, depending on these 3 broad considerations, we have to choose a particular modeling technique.
So, with this, we have come to the end of our discussion on the first stage of the graphics pipeline, that is, object representation. In the next lecture, we will start our discussion on the second stage of the pipeline, namely, modeling or geometric transformations.
(Refer Slide Time: 41:41)
272
Whatever I have discussed today can be found in this book. You are advised to go through Chapter 2, Sections 2.4 and 2.6, to get more details on the topics that I have covered today. Thank you and goodbye.
273
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 10
Introduction to Modeling Transformations
Hello and welcome to lecture number 10 in the course Computer Graphics. We were
discussing the graphics pipeline. To recollect, in computer graphics what we do is we
generate 2D images on a computer screen and the process of generating these 2D images
involves a set of stages, together these stages constitute the graphics pipeline. So, let us have
a relook at the 3D graphics pipeline.
(Refer Slide Time: 1:15)
When we are talking of 3D graphics pipeline, just to refresh our memory, we are talking of
creation of a 2D image from a 3D description of a scene.
274
(Refer Slide Time: 1:25)
So, what are the stages that are part of the pipeline? First stage is object representation,
second stage is modeling transformation, third stage is lighting or coloring, fourth stage is
viewing pipeline. Now, note here that we are calling this stage a pipeline itself; that means it consists of some more sub stages. There are 5 such sub stages: viewing transformation,
clipping, hidden surface removal, projection transformation and window to viewport
transformation. And the final stage is scan conversion, also called rendering.
Among these stages, in the previous few lectures, we have covered the first stage, that is
object representation. So, we now know how we can represent the objects that constitute a
scene. Again, if I may repeat, the idea of computer graphics is to convert this representation
to a sequence of 0s and 1s and that conversion takes place through the execution of these
stages. The very first stage is object representation which we have already discussed.
Today, we are going to start discussion on the second stage, that is modeling, also called
geometric transformation.
275
(Refer Slide Time: 3:10)
So, what is there in this stage?
(Refer Slide Time: 3:14)
Now, when we talked about representing objects in the earlier lectures, what we implicitly
referred to is that the objects were represented individually. Now, when we are defining those
objects individually, we are implicitly using one coordinate system that is typically called
local or object coordinate system. So, for each object definition we have a local or object coordinate system. Now, within this coordinate system we are defining the object.
276
(Refer Slide Time: 3:57)
What does that mean? It means that, at the time of defining objects, we are actually not bothering
too much about the object shape, size and position relative to other objects. So, we have
defined an object, but in essence there may be multiple objects where this particular object
may have a relative size or relative position or a relative orientation. But when we are
defining the object in its own local coordinate, we are not paying too much attention to those
factors.
(Refer Slide Time: 4:42)
But, when we are going to compose a scene, which comprises all the objects that we have defined, the objects need to be assembled together. Now, when we are talking of assembling the objects, what does it mean? It means that the objects should be put together in such a way that the overall scene becomes perceptible. So, the object shape, size and orientation now become very important. When we are defining objects in their own coordinates those things are not important, but when we are assembling them, they become important.
So, when we are trying to compose a scene by taking care of the relative shape, size,
orientation and position of the objects with respect to other objects, we are again implicitly
assuming another coordinate system, that is typically called the scene or more popularly the
world coordinate system. So, earlier we dealt with local or object coordinate system, now, we
have to deal with another coordinate system that is popularly known as the world coordinate
system.
(Refer Slide Time: 6:06)
And as I said, in the world coordinate system, the shapes, sizes, positions and orientations of these individual objects need to be taken care of so that we can construct the scene. So, those become very important now, in the world coordinate system.
278
(Refer Slide Time: 6:29)
How can we do that? Earlier we defined objects in their own coordinate systems without bothering about the relative shape, size, position, orientation, etc. Now, we are going to assemble them together in the world coordinate scene, and we have to think about those shapes, sizes, positions and orientations, so that the scene becomes perceptible. We can do that by applying some operations, by performing some operations.
These operations will transform the objects from their local coordinates to world coordinates. So, in order to create the scene by assembling the objects together, which in turn are defined in their own coordinate systems, what we need to do is perform some operations to transform the objects from their local coordinate description to the world coordinate description.
(Refer Slide Time: 7:41)
279
Now, these operations or transformations take place in the second stage of the graphics pipeline. Let us see one example. Here we have 2 objects, object 1 in the leftmost figure and object 2 in the middle figure. Now, we want to create the object shown in the right hand figure. As you can see, in this overall object we have these cylinders, how many instances? 1, 2, 3, 4; and the other shape, how many instances? 1, 2 and 3.
So, we have 4 instances of this object and 3 instances of the other object. And the way these objects are defined in their own coordinate systems is not the same as the way they appear in this scene, which is on the right hand side. Here, as you can see, the orientation is different, and the size is also different, in all the four instances. The same is true for the instances of the other object.
So, these coordinates where the objects are originally defined are the local or object coordinates. The coordinate system here, represented by the principal axes X, Y, Z in the right hand figure, is the world coordinate system. Here we are assembling multiple instances of the original object definitions to construct the overall object. And as you can see, in order to do that, it is very important that the objects are put into the proper place, in the proper orientation and with the proper size. That is the job of the transformations that take place in the second stage.
(Refer Slide Time: 10:10)
Now, since these transformations change some geometric properties, or operate on the geometry of the object definition, we call them geometric transformations. They are also known as modeling transformations.
280
(Refer Slide Time: 10:38)
Now, modeling transformation implies applying some operations on the object definition in the local coordinate system to transform it into a component of the world coordinate scene. This is how we can more formally describe modeling transformation: applying some operations on the object definition to transform it into a component in the world coordinate scene.
(Refer Slide Time: 11:11)
What are those operations? In fact, there can be many such operations; we will soon see what they are.
281
(Refer Slide Time: 11:23)
But all the operations can be derived from a set of basic operations. So, although we can in principle apply many operations, these can all be thought of as derived from a set of basic operations.
(Refer Slide Time: 11:43)
Now, let us have a look at those basic operations. There are actually 4 such basic operations: translation, rotation, scaling and shearing. Translation all of us know; what it does is translate an object from one position to another.
282
(Refer Slide Time: 12:03)
In rotation, what do we do? We rotate the object by some angle, either in the clockwise or the anticlockwise direction, around some axis.
(Refer Slide Time: 12:22)
With scaling, what can we do? We can reduce or increase the object size.
283
(Refer Slide Time: 12:30)
And finally, with shearing, we can change the shape of the object. It may be noted here that shearing is, in a stricter sense, not a basic transformation; it can be derived as a composition of rotation and scaling. However, for simplicity, we will assume that it is a basic transformation and treat it accordingly. So, then, let us recap. We have 4 basic transformations: translation, rotation, scaling and shearing. Among them, shearing changes the shape of the object, scaling changes the size of the object, and translation and rotation change the position and orientation of the object.
(Refer Slide Time: 13:24)
Now, I am assuming that you have some idea of these operations, and you may know that these operations change the geometric properties of the objects in terms of changing their shape, size and location. Since that is the case, we call these transformations geometric transformations. So we can call the operations performed in the second stage either modeling transformations or geometric transformations. Now, let us go into a little deeper discussion on each of these transformations. Let us start with translation.
(Refer Slide Time: 14:10)
What happens in translation?
(Refer Slide Time: 14:11)
As you can see, suppose we have an original point here in this reference frame, which is denoted by P with coordinates x and y. Through translation we can reposition the point to P' with new coordinates x' and y'. So essentially we are displacing this point to another point by an amount tx and ty to get the new point, and this displacement takes place along the x and y directions.
So using this knowledge we can actually derive the new coordinates with respect to the old coordinates. What will they look like?
(Refer Slide Time: 15:06)
So the new coordinate, the new x coordinate will be x plus the displacement amount along
the x direction and the new y coordinate will be the original y coordinate plus the
displacement amount along the y direction. So these are simple derivations, and simple to
formulate. And these are the relationships between the new and the old x and y coordinates of
the points.
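A small Python sketch of these relationships (my own function names; the displacement amounts and vertex values are illustrative examples only):

def translate(point, tx, ty):
    # x' = x + tx,  y' = y + ty
    x, y = point
    return (x + tx, y + ty)

print(translate((2, 3), 4, -1))              # (6, 2)
# Applying the same displacement to every vertex moves the whole object.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print([translate(v, 4, -1) for v in square])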
286
(Refer Slide Time: 15:41)
Now, these displacements can be thought of in different ways. So if we are moving along
positive x axis or positive y axis, then we call it positive displacement. If we are moving
along negative x axis or negative y axis, we call it negative displacement. So the sign of the
amount tx or ty will be different for positive or negative displacements.
(Refer Slide Time: 16:21)
Now, let us shift our attention to rotation.
287
(Refer Slide Time: 16:25)
Now, in case of rotation, we do not have horizontal or vertical displacements, instead we
have angular displacement. In other words, the point moves from one position to another on a
circular track about some axis. So, here, we are having angular displacement and the point is
moving around some axis.
(Refer Slide Time: 16:57)
We typically follow a convention: if the movement is counterclockwise, then it is a positive angle of rotation. So consider this example. We have the original point here, and now we have the point after rotation, denoted by (x', y'), and the rotation angle is ϕ. Since we are moving in the counterclockwise direction, we call it a positive rotation angle.
288
If we are moving in clockwise direction, then we typically consider that to be a negative
rotation angle. That is one convention typically followed. And in case of 2D rotation, we
typically assume that the rotation takes place around the Z axis. However, later on we will see
for 3D rotation what are the conventions.
(Refer Slide Time: 18:13)
Now, let us try to derive the relationship between the new and old coordinates. The old coordinate is (x, y) and the new coordinate is (x', y'). As you can see, we can represent x as r cos θ, where r is the radius of the circular track and θ is the angle between the x axis and the line to the original point, and y is r sin θ. Now, if we draw a line like this, we can represent x' as r cos(θ + ϕ). If we expand, we get r cos θ cos ϕ - r sin θ sin ϕ. Since r cos θ is x and r sin θ is y, this becomes x cos ϕ - y sin ϕ. Similarly, for y' we can derive the expression x sin ϕ + y cos ϕ. These two are the relationships between the old coordinates of the point and the new coordinates of the point.
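A small Python sketch of these relationships (my own function names; the rotation is counterclockwise about the origin, that is, about the Z axis in the 2D case):

from math import cos, sin, radians

def rotate(point, phi):
    # x' = x cos(phi) - y sin(phi),  y' = x sin(phi) + y cos(phi)
    # phi is the counterclockwise (positive) rotation angle in radians.
    x, y = point
    return (x * cos(phi) - y * sin(phi),
            x * sin(phi) + y * cos(phi))

print(rotate((1.0, 0.0), radians(90)))    # approximately (0.0, 1.0)
print(rotate((1.0, 0.0), radians(-90)))   # clockwise rotation: approximately (0.0, -1.0)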
289
(Refer Slide Time: 19:44)
And as I already mentioned, counterclockwise angular movement is typically considered as
positive, otherwise it is negative. Now, in case of negative angular movement, we change the
sign of the angle of displacement. Instead of ϕ, we will use – ϕ. That is the only change we
will make.
(Refer Slide Time: 20:11)
So we have learned about translation and rotation. How can we apply these transformations to points?
290
(Refer Slide Time: 20:23)
The way we derived them, they apply to a single point.
(Refer Slide Time: 20:33)
For an object we have many points, so a single application to one point will not be sufficient. What we need to do is simply apply the transformation to all the points that make up the surface. Now, you may be thinking that that is impossible because there may be an infinite number of points on a surface. However, we can actually do it by applying the transformations to all the vertices in a vertex list representation, or to all the control points for a spline surface.
So, when we are going to apply a transformation, we essentially think of some representation; it can be a vertex list representation, as in the case of mesh representation, or a set of control points, as in the case of spline representation, and we apply the transformation on each of these points so that the entire object gets transformed.
(Refer Slide Time: 21:43)
And as I have mentioned earlier, by applying the transformations in this way, we can change the object orientation using rotation and its position using translation. So by applying rotation on all the vertices or all the control points, we can change the orientation of the object, and by applying translation on all the vertices or all the control points, we can change the position.
(Refer Slide Time: 22:17)
The third basic transformation is scaling. What happens in scaling?
292
(Refer Slide Time: 22:25)
It changes the size. Now, changes can take place in both ways, either it can decrease or it can
increase. So both increase and decrease of the object size is possible with scaling.
(Refer Slide Time: 22:43)
Now, how is scaling defined mathematically? It is defined as an operation of multiplying the object coordinates by some scalar quantities. These scalar quantities are known as scaling factors. So essentially we are multiplying the coordinate values by scaling factors to scale the objects up or down. Scaling up means increasing the size of the object, scaling down means decreasing the size of the object.
293
(Refer Slide Time: 23:27)
So, at the level of a point, how can we understand it? Given a point P, we simply multiply the x coordinate by a scaling factor along the x direction, and we multiply the y coordinate by a scaling factor along the y direction, to get the new point (x', y'). So, the relationship between the old and new coordinates will look something like this: the new coordinate x' can be represented in terms of x in this way, and the new coordinate y' in terms of y in this way, where sx and sy are the two scaling factors along the corresponding x and y directions.
For example, here we have one object and we are using scaling factor along x direction to be
one third and scaling factor along y direction to be half. Now, if I multiply these scaling
factors to the x and y coordinates of the vertices, I will get 4 new vertices as shown here in
this right-hand figure. Now these vertices together will represent the object. Since the scaling
factors are less than 1, that means we are scaling it down or decreasing the size.
So, like in case of translation or rotation, here also, we did the same thing. That is we applied
the scaling operations to all the points that define the object. Now, if we are using mesh
representation, then those points are essentially the vertices of the vertex list. If we are using
a spline representation, then these points are essentially the set of control points. And we
apply scaling factor to each of these points to get the new objects, the new points that define
the object.
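A small Python sketch (my own function names) that reproduces this example; the vertex coordinates used here are the ones from the slide, which also appear in the repositioning discussion below:

def scale(point, sx, sy):
    # x' = x * sx,  y' = y * sy
    x, y = point
    return (x * sx, y * sy)

# The object from the example, scaled by sx = 1/3 and sy = 1/2.
vertices = [(3, 2), (9, 2), (9, 4), (3, 4)]
print([scale(v, 1 / 3, 1 / 2) for v in vertices])
# [(1.0, 1.0), (3.0, 1.0), (3.0, 2.0), (1.0, 2.0)]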
294
(Refer Slide Time: 26:08)
Here, we should note one thing. If we are using the same scaling factor along both the x and y directions, then this type of scaling is called uniform scaling. Otherwise, what we do is differential scaling. In the example, we have seen that the scaling factor along the x direction is one third and along the y direction is half, so they are different; we actually followed differential scaling in the example.
(Refer Slide Time: 26:41)
Now, when the scaling factor is, when say sx is greater than 1 then along the x direction we
are scaling up, when sy is greater than 1, then along the y direction we are scaling up or
increasing the size. Now, when both are greater than 1, then along both the directions we are
295
increasing the size. Similarly, when sx is less than 1, we are reducing or scaling down the size
along x direction.
Similarly, when sy is less than 1, we are reducing or scaling down the size along y direction,
when both are less than 1 then we are scaling down along both directions simultaneously.
And of course, if sx equals 1 there is no change in size along the x direction, and if sy equals 1 there is no change along the y direction.
(Refer Slide Time: 27:41)
One important point to be noted here is that, during scaling the object may get repositioned,
as we have seen in the example. So original vertex was at (3, 2) here it was at (9, 2) here it
was at (9, 4) and it was at (3, 4). Now after scaling, by applying the scaling factors along x
and y directions, we got a new object defined by the 4 vertices. What are the coordinates? We
have (1, 1) then here we have (3, 1) here we have (3, 2) and here we have (1, 2). Now, as you
can see the vertices got repositioned. So that is one of the effects of scaling.
One effect is changing the size, the other effect is it may lead to repositioning of the objects.
The final basic transformation is shearing.
296
(Refer Slide Time: 29:12)
What happens in shearing? Here we basically change the shape of the object. So far the
transformations that we have learned deal with changing the position, orientation and size.
Now, the final and the fourth basic transformation shearing allows us to change the shape
also.
(Refer Slide Time: 29:40)
As you can see in this example, so we have one object here, which after shearing gets
transformed to this object with a change in shape. Now, like scaling, shearing also involves
shearing factors along the x and y directions that are applied to the original object or to the
original point. So if the original x coordinate is x, then a term involving the shearing factor is
used to obtain the transformed coordinate x'. And the same is true for y also.
297
(Refer Slide Time: 30:37)
But, the relationship is slightly more complicated than scaling. Here, the new coordinate is
obtained by an addition plus a multiplication: the new coordinate is the sum of the old
coordinate and a term which is a multiplication of the other old coordinate with the shearing
factor along that axis, that is, x' = x + shx·y and y' = y + shy·x. Note here that to get x', the new
x coordinate, we use the old x coordinate and also the old y coordinate together with the
shearing factor along the x direction.
Similarly, to get the new y coordinate we use the old y coordinate and also the old x coordinate
together with the shearing factor along the y direction. So that is the difference, slightly more
complicated than scaling. And it allows us to change the shape of the object.
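As a quick sanity check of these relations, here is a minimal Python sketch (the function name shear_point is my own; the value shx = 0.5 is inferred from the numbers in the example discussed a little later, and shy = 0 means no shear along the y direction):

```python
# Shearing a single point: x' = x + shx * y, y' = y + shy * x
def shear_point(x, y, shx, shy):
    return (x + shx * y, y + shy * x)

# shx = 0.5, shy = 0 reproduce the vertex (9, 2) -> (10, 2) seen shortly.
print(shear_point(9, 2, 0.5, 0))   # (10.0, 2)
```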
(Refer Slide Time: 31:57)
298
Now, the relationship is established between the old and new points. Like in the previous
cases, the previous 3 transformations, in case of shearing also we can apply the operation on
all the points on the surface to change the shape of the whole surface. If we are following a
mesh representation, the surface will be represented in terms of its vertices, in the form of a
vertex list. So we apply shear on all the vertices.
If we are using a spline representation, then the surface will be represented in terms of a
control point grid, and we apply shear on all these control points in the grid to get the points
that define the transformed object.
(Refer Slide Time: 32:54)
Like scaling, here also repositioning may take place. Let us see this example again and
consider one vertex, (9, 2): this vertex changes to (10, 2), as you can see here. Another vertex
(9, 4) also changes and becomes (11, 4). Similarly, you can see the other vertices: here it
becomes (4, 2), whereas earlier it was (3, 2). This vertex becomes (5, 4)
from here which was (3, 4). However, you may note that it is not necessary that all vertices
change their position.
So, all vertices may not reposition during a shearing transformation. Also you can see here
that it is not mandatory to perform shear along both axes simultaneously. As you can see here
that along y axis the shearing factor is 0. So, we are not shearing along y axis whereas we are
shearing along the x axis. So, both scaling and shearing have this property that they may
reposition the object, they may reposition all or some of the points that define the object.
299
(Refer Slide Time: 35:39)
So, these are the four basic transformations. And when we are actually trying to transform an
object, we can apply these basic transformations in sequence, multiple times, in different
ways to get the desired transformation. Now, one point should be noted here: in order to
perform the transformation, we have derived some equations, the equations that show the
relationship between the old and the new point. So if we want to perform the transformation,
we have to apply these equations on the old points to get the new points.
However, these equations are not very convenient for building graphics libraries or packages.
If we are trying to design a modular graphics system, then equations may not be a good way
to represent the transformations.
We require alternative representations and those representations are there in the form of
matrices. And we will also have many advantages if we are using matrices, particularly in the
context of building modular systems.
So, in the next lecture, we shall discuss the matrix representation and why it is useful to
represent transformations.
300
(Refer Slide Time: 37:28)
Whatever I have discussed today can be found in Chapter 3, Section 3.1 of the book
Computer Graphics. So, we will meet again in the next lecture. Till then, goodbye.
301
Computer Graphics
Professor. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 11
Matrix representation and composition of transformations
Hello and welcome to lecture number 11 in the course Computer graphics. We are discussing
different stages of the graphics pipeline. Before we go into today's topic, let us again quickly
recap the stages and where we are currently.
(Refer Slide Time: 00:50)
So, there are 5 stages: the first stage is object representation, the second stage is modelling
transformation, the third stage is lighting or colouring, the fourth stage is the viewing pipeline,
which itself consists of 5 sub-stages, namely viewing transformation, clipping, hidden surface
removal, projection transformation and window-to-viewport transformation, and the last
stage is scan conversion.
So, what we do in these stages is: in the first stage we define objects, in the second stage we
put them together to construct a scene in world coordinates, and then in the subsequent stages
we process those till we perform rendering on the actual computer screen. We have already
discussed the first stage, object representation; currently we are discussing the second stage,
that is, modelling transformation.
302
(Refer Slide Time: 1:56)
So, in the last lecture, we got introduced to the basic idea of what we mean by modelling
transformation, and we also introduced 4 basic transformations using which we perform any
type of modelling transformation. Today, we are going to learn about representation. So,
in the previous lecture we talked about how to represent the transformations, today we will
learn about an alternative way of representing those transformations.
(Refer Slide Time: 02:32)
We have seen in the previous lecture how we can represent the transformations. If you
recollect, there are 4 basic transformations: translation, rotation, scaling and shearing.
Using these 4 basic transformations, we can perform any geometric transformation on any
303
object, either applying any one of these 4 transformations or applying these transformations
in sequence one after another multiple times and so on.
And we discussed these transformations in terms of equations. For translation, we discussed
the relationship between the original point and the transformed point, that is, the point after
translation: x' = x + tx and y' = y + ty. For rotation, similarly, we established the relationship
between the original point and the transformed point as x' = x cos ϕ - y sin ϕ and
y' = x sin ϕ + y cos ϕ. The same was the case with scaling, x' = sx·x and y' = sy·y, and with
shear, x' = x + shx·y and y' = y + shy·x. In these equations we used some parameters: tx, ty are
the amounts of translation along the x and y directions, ϕ is the angle of rotation, sx, sy are the
scaling factors along the x and y directions respectively, and shx, shy are the shearing factors
along the x and y directions respectively.
(Refer Slide Time: 04:55)
So, as we have shown we can actually use these equations to represent the transformations.
Now, as we have discussed in introductory lectures, there are graphics packages, there are
graphics libraries that are developed to actually make the life of a developer easier so that a
developer need not always implement the individual components of a graphics pipeline to
develop a product. In order to build a package or to develop library functions, we need
modularity and a standard way of defining inputs and outputs for each function.
Unfortunately, the equation based representations of transformations do not support such
modularity. So, when we are trying to represent transformations using equations, it is difficult
304
to modularize the overall pipeline in terms of standardized input and output and standardized
functions, because in subsequent stages of the pipeline, we will see other transformations and
each of those transformations will have different equations represented in different forms and
formats.
So, then it will be very difficult to actually combine these different stages and implement a
package or a library, where the user will not be bothered about the internal working of the
package.
(Refer Slide Time: 06:50)
To maintain this modularity, equation-based representations are not suitable. We require
some alternative representation, and one such alternative, which supports our need for
modular system development, is the matrix representation. So, we
can actually represent transformations in the form of matrices. And later on, we will see that
other stages of the pipeline can also be implemented by representing basic operations in the
form of matrices.
So, there will be some synergy between different stages and it will be easier to implement
those stages in the form of predefined packages, functions or libraries.
305
(Refer Slide Time: 07:48)
So, how do these matrices look? Let us take, for example, the scaling transformation. If we
want to represent scaling in the form of a matrix, what will we do? We will create a 2×2
matrix with the scaling factors positioned along the diagonal, that is, sx and sy on the diagonal
and zeros elsewhere, as shown here.
(Refer Slide Time: 08:20)
Now, how do we apply this transformation? Suppose we are given a point P(x, y) and we
want to transform it by scaling. What will we do? We represent the point as a column vector,
as shown here, and then multiply the transformation matrix with that column vector, that is,
we compute the matrix product to get the new point. So essentially, we need matrix
306
multiplication. And this form of representing the transformation operations is what makes it
easier to implement in a modular way.
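As a small illustration (a sketch only, using the scaling factors from the earlier example), the matrix product can be computed as follows:

```python
import numpy as np

# P' = S . P with a 2x2 scaling matrix and the point as a column vector.
# The factors 1/3 and 1/2 are the ones used in the earlier scaling example.
S = np.array([[1/3, 0.0],
              [0.0, 0.5]])
P = np.array([[9.0],
              [2.0]])      # the point (9, 2) as a column vector
print(S @ P)               # approximately [[3.], [1.]] -> the scaled point (3, 1)
```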
(Refer Slide Time: 9:20)
So, we have represented scaling in terms of 2×2 matrix. We can do the same with rotation,
we can have a 2×2 matrix for representing rotation transformation, as well as shearing. So
then, we can have 2×2 matrices for the 3 operations; rotation, scaling and shearing.
(Refer Slide Time: 9:50)
Unfortunately, 2×2 matrices will not serve our purpose. No matter how much we try, we will
not be able to represent the translation transformation using a 2×2 matrix, unlike the other 3
basic transformations. This is because translation adds constants tx and ty to the coordinates,
whereas multiplying a point by a 2×2 matrix can only produce terms proportional to the
coordinates themselves.
307
(Refer Slide Time: 10:33)
So, in order to avoid this problem, in order to address this issue, we go for another type of
matrix representation, which is called representation in a homogeneous coordinate system.
Now, what does this homogeneous-coordinate-based matrix representation refer to?
(Refer Slide Time: 10:48)
So, essentially it is an abstract representation technique that means, this coordinate system
actually does not exist in the physical sense, it is purely mathematical, purely abstract. So,
there may be physically a 2 dimensional point which we transform to a 3 dimensional
abstract coordinate system called homogeneous coordinate system. So, each 2D point
represented by these 2 coordinates x and y can be represented with a 3 element vector as
308
shown here, each of these elements correspond to the coordinates in the homogeneous
coordinate system.
So, we are transforming a 2D point into a 3D space in this case, the 3D space is the abstract
homogeneous coordinate space and each point is represented with a 3 element vector.
(Refer Slide Time: 11:56)
So, what is the relationship between these 2 representations? We have a 2D point represented
by its 2 coordinates x and y. And now we are representing the same point in a 3 dimensional
space called the homogeneous coordinate system, where we represent the point with 3
coordinate values xh, yh and h. So, what are the relationships between these quantities?
The original coordinate x equals the x coordinate in the homogeneous coordinate system
divided by h, that is, x = xh/h, and the original coordinate y equals the y coordinate in the
homogeneous coordinate system divided by h, that is, y = yh/h. Here h is called the
homogeneous factor, and it is important to note that it can take any nonzero value; it must be
a nonzero value.
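A tiny sketch of this conversion (the value h = 3 below is an arbitrary nonzero choice, used only to illustrate the relationship):

```python
# Converting between a 2D point and its homogeneous representation.
def to_homogeneous(x, y, h=1.0):
    return (x * h, y * h, h)          # (xh, yh, h), h must be nonzero

def from_homogeneous(xh, yh, h):
    return (xh / h, yh / h)           # x = xh/h, y = yh/h

print(to_homogeneous(2, 3, 3))        # (6, 9, 3)
print(from_homogeneous(6, 9, 3))      # (2.0, 3.0)
```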
309
(Refer Slide Time: 13:05)
There are a few more things we should note here, since we are considering h to be the
homogeneous factor. If h is 0, then we consider that point to be at infinity in the homogeneous
coordinate system. And there is no concept of origin, since recovering a 2D point from the
all-zero vector would require dividing 0 by 0, which is not defined; so we do not allow the
point where everything is 0. These two things we should remember while dealing with the
homogeneous coordinate system: first, if h becomes 0, then we consider that point to be at
infinity, and second, there is no concept of origin in the homogeneous coordinate system.
(Refer Slide Time: 14:05)
Now, let us try to understand how we can convert these geometric transformation matrices into
matrices in the homogeneous coordinate system. So, earlier we had the 2×2 matrices
310
representing 3 of the 4 basic transformations: rotation, scaling and shearing. As we have
already mentioned, these 2×2 matrices become 3×3 matrices in the homogeneous coordinate
system. In fact, in general, an N×N transformation matrix is converted to an (N+1)×(N+1)
matrix.
Now, if we represent a 2D transformation matrix using a 3×3 matrix, then we will be able to
represent translation as well, so our earlier problem will be resolved. Earlier we were unable
to represent translation using a 2×2 matrix, although we were able to represent the other 3
basic transformations. With the homogeneous representation, we avoid that limitation: we can
represent all 4 basic transformations using 3×3 matrices.
(Refer Slide Time: 15:36)
Another thing we should keep in mind is that, when we are talking about geometric
transformations, we always consider h to be 1. So, h value will always be 1. However, there
are other transformations that we will encounter in our subsequent lectures, where h is not
equal to 1.
311
(Refer Slide Time 16:03)
Now, let us see how the basic transformations are represented using homogeneous coordinate
matrices. Translation we can represent using this matrix, rotation we can represent using this
matrix where ϕ is the angle of rotation, scaling can be represented using this matrix and,
finally, shear can be represented using this matrix. In case of scaling, sx, sy represent the
scaling factors along the x and y directions; in case of shearing, shx and shy represent the
shearing factors along the x and y directions respectively.
So, here you can see that we managed to represent all the basic transformations in the form of
matrices, although we have to use 3×3 matrices to represent 2 dimensional transformations.
(Refer Slide Time: 17:12)
312
Since, we are using homogeneous coordinate system, so, our point representation also
changes. So, earlier we had this representation for each point, now we will be representing
each point using a 3 element column vector and the other operations remain the same with
minor modification. So, first we apply the matrix multiplication as before to get the new point
that is P’ = S.P. But, after this, what we need to do is divide whatever we got in P’, the x and
y values by h to get the original value, this is the general rule for getting back the actual
points.
But, in case of geometric transformations, as we have already mentioned, h is always 1, so
this division really does not matter. However, we will see other transformations in subsequent
lectures where it matters very much. So far, we have discussed what the basic transformations
are, how we can represent those transformations, and also how we can use them to transform
a point, which is done by performing a matrix multiplication.
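To summarize what we have so far, here is a small Python sketch of the four 3×3 homogeneous matrices in their standard forms, consistent with the equations recalled earlier (the helper names T, R, S, Sh and transform are my own):

```python
import numpy as np

def T(tx, ty):                       # translation
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def R(phi):                          # rotation by phi (counter-clockwise, radians)
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def S(sx, sy):                       # scaling
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def Sh(shx, shy):                    # shearing
    return np.array([[1, shx, 0], [shy, 1, 0], [0, 0, 1]], dtype=float)

def transform(M, x, y):
    """Apply P' = M.P on the homogeneous column vector, then divide by h."""
    xh, yh, h = M @ np.array([x, y, 1.0])
    return (xh / h, yh / h)

print(transform(S(1/3, 1/2), 9, 2))  # approximately (3.0, 1.0), as in the earlier example
```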
Now, let us try to understand the process of composition of transformations. When do we
require composition? If we have to perform a transformation that involves more than one
basic transformation, then we need to combine them together. Now, the question is how to
combine them, and in which sequence?
(Refer Slide Time: 19:10)
So, when we are performing multiple geometric transformations to construct a world
coordinate scene, we need to address two issues: how we perform these multiple
transformations together, and what sequence of transformations should be followed.
313
(Refer Slide Time: 19:34)
Let us try to understand this in terms of an example. Here, look at the top figure. We see one
object denoted by the vertices ABCD, with its dimensions given. The bottom figure shows a
world coordinate scene in which the same object is placed here, where, let us assume, it is
used to define the chimney of a house. Now, here you can see that the original vertex A got
transformed to A', B got transformed to B', C got transformed to C' and D got transformed to
D'. And also, the dimensions changed.
So, the dimension along the x direction got reduced, although the dimension along the y
direction remained the same. Two things happened here, as you can note in this figure: first,
its dimensions changed, and second, its position changed. Earlier one of its vertices was at the
origin; now it is placed at a different point.
So, two transformations are required; one is scaling and the other one is translation, scaling to
reduce the size, translation to reposition it in the world coordinates scene. This much we can
understand from the figure, but how to actually apply these transformations that is the
question we want to answer so that we get the new vertices.
314
(Refer Slide Time: 21:42)
What do we know? We know that to get the new vertices we need to multiply the current
vertices with a transformation matrix. But here it is not a basic transformation matrix; it is a
composition of two basic transformation matrices. So how do we do that, how do we combine
the two matrices?
(Refer Slide Time: 22:13)
Let us go step by step. In the first step, we need to determine the basic matrices, that means
determine the amount of translation and determine the scaling factors. Note that the object is
halved in length while the height is the same; that means along the x direction it is halved but
along the y direction it remains the same. So, in the scaling matrix, sx should be half and sy
should be 1, as shown in this transformation matrix for scaling.
315
(Refer Slide Time: 22:57)
Now translation, the second basic transformation that we require. Here the vertex D was at
the origin, as you can see here. Where did it get transferred to? To D'. Now, what is the
position of the transformed vertex? It is (5, 5). So, the origin got repositioned to (5, 5), that is,
essentially a 5 unit displacement along both the horizontal and vertical directions. So, tx
equals 5 and ty equals 5, and if we use these values in the transformation matrix for
translation, then we get this matrix in the current case. So, earlier we obtained the scaling
matrix and now we obtained the translation matrix, but our question remains: how to combine
them?
(Refer Slide Time: 24:15)
That is the second step: composition of the matrices, or obtaining the composite matrix. What
we need to do is to multiply the basic matrices in sequence, and this sequencing is very important;
316
we follow the right-to-left rule to form the sequence. Now, what does this rule tell us?
(Refer Slide Time: 24:50)
The first transformation applied on the object is the rightmost in the sequence, the next
transformation is listed on the left of this earlier transformation, and so on, till we reach the
last transformation. So, if we apply the first transformation, say T1, on the object, then it
should be placed at the rightmost position. Now, suppose we require another transformation
T2; then T2 will come on the left side of T1. If one more transformation needs to be applied,
say T3, then it comes to the left of T2, and so on till we reach the final transformation, say Tn.
This is the right to left rule; first transformation applied on the object is on the rightmost side
followed by other transformations in sequence till the leftmost point where we place the last
transformation applied on the object.
317
(Refer Slide Time: 26:04)
So, in our case, we can form it in this way: the first transformation to be applied is scaling,
followed by translation. The right-to-left rule means S comes first at the rightmost position,
and on its left side will be T, so the composite is the product T.S of these 2 matrices, and the
result is this matrix. So, this is our composite matrix for that particular transformation.
(Refer Slide Time: 26:39)
Once we get the composite matrix, we multiply the current vertices with the composite
matrix to get the new points.
318
(Refer Slide Time: 26:55)
So in our case, this step will lead us to the points as shown here, A’ can be derived by
multiplying this composite matrix with the corresponding vertex in homogeneous coordinate
system to get this final vertex in homogeneous coordinate system and that is true for B’, C’
and D’.
(Refer Slide Time: 27:31)
Now, the last stage, of course, is to transform from the homogeneous representation back to
the actual representation, which we do by dividing the x and y values by the homogeneous
factor h. In our case, that is, the case of geometric transformations, h is 1. So, our final
transformed points or vertices are obtained in this way: A' we get by dividing the x and y
values by the homogeneous factor, and similarly for B', C', and D'.
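Putting the whole procedure for this example together, here is a hedged Python sketch (it reuses the matrix helpers from the earlier sketch; only vertex D = (0, 0) is checked, since that is the vertex whose old and new positions are stated explicitly in the text):

```python
import numpy as np

def T(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def S(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

# Right-to-left rule: scaling is applied first, so it is the rightmost matrix.
M = T(5, 5) @ S(0.5, 1)
print(M)    # [[0.5 0.  5. ]
            #  [0.  1.  5. ]
            #  [0.  0.  1. ]]

# Vertex D was at the origin; after the composite transformation it should be D' = (5, 5).
xh, yh, h = M @ np.array([0.0, 0.0, 1.0])
print((xh / h, yh / h))   # (5.0, 5.0)
```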
319
So, what did we do? We first identified the basic transformations. This was followed by
forming the sequence in a right-to-left manner, that is, we put the transformation that is to be
applied on the object first as the rightmost transformation, then the next transformation to be
applied on the object to the left of the earlier transformation, and so on.
Then we multiplied these basic transformation matrices to get the composite transformation
matrix. Then we multiplied the points with this composite transformation matrix to get the
transformed points in the homogeneous coordinate system. Finally, we divided the x and y
values of this homogeneous coordinate representation by the homogeneous factor to get back
the actual transformed points.
(Refer Slide Time: 29:32)
We must remember here that matrix multiplication is not commutative, so the formation of
the sequence is very important. Earlier we computed translation multiplied by scaling,
following the right-to-left rule, which gave us M. If we had done it the other way, that is,
scaling multiplied by translation, it would lead to a different matrix M'. Since matrix
multiplication is not commutative, we cannot say M = M'; in fact M ≠ M'. So, if we do not
create the sequence properly, then our result will be wrong; we may not get the right
transformation matrix.
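This is easy to check numerically with the matrices of our example (again a small sketch reusing the earlier helpers):

```python
import numpy as np

def T(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def S(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

M       = T(5, 5) @ S(0.5, 1)   # scale first, then translate (the correct order here)
M_prime = S(0.5, 1) @ T(5, 5)   # the other order
print(np.allclose(M, M_prime))  # False: the two composites differ
print(M_prime)                  # its translation part is (2.5, 5) instead of (5, 5)
```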
320
(Refer Slide Time: 30:28)
So, how do we decide which sequence to follow? Earlier we simply said that first we will
apply scaling and then translation; on what basis did we make that decision? Let us try to
understand the example again, where we decided that scaling should be followed by
translation. What was there in the example that indicated that this should be the sequence?
(Refer Slide Time: 31:06)
When we discussed scaling, we mentioned one thing: during scaling the position of the
object may change. Now, if we translate first and then scale, then the vertex positions might
change again, because scaling may lead to a change in position. However, if we scale first and
then translate, then anyway we are going to reposition the object at the right place where we want it. So,
321
there is no possibility of further position change. So, clearly in this case we first apply
scaling, accept the associated change in position, and then follow it with translation. If we
apply the transformations in that sequence, then we do not face any problem; that was the
logic behind going for this sequence.
And in general we follow this logic: if we require multiple basic transformations to be
applied, we keep translation at the end as the last transformation, because scaling and
shearing are likely to change the position, and with translation we compensate for that.
Typically we follow this rule of thumb.
(Refer Slide Time: 32:37)
Now, one thing should be noted here, when we applied scaling, we actually applied it with
respect to the origin. So, origin is the fixed point in the example. However, that is not
necessarily true. We can have any fixed point located at any coordinate in a coordinate
system. So, in such cases, what we do? We apply the approach that we have seen earlier in
the example, but with slight modification. So, our approach when we are considering fixed
point which is not the origin is slightly different, let us see how it is different.
322
(Refer Slide Time: 33:33)
Suppose there is a fixed point F and we want to scale with respect to this fixed point. Now,
this is not origin, this is situated at any arbitrary location. Now, to determine the
transformation sequence, we assume a sequence of steps. So, if the scaling was with respect
to origin then we do not require anything else we simply scale, but if it is not with respect to
origin, if it is with respect to some other fixed point which is not the origin then scaling itself
involves a sequence of steps, just to perform scaling.
(Refer Slide Time: 34:15)
What is that sequence? First, we translate the fixed point to the origin, that means we set the
translation amounts tx = -x and ty = -y; that is the first transformation. Then we perform
scaling with respect to the origin; this is important, since our scaling matrix is defined with
323
respect to the origin. So, we first bring the fixed point (conceptually) to the origin, then
perform the scaling, and then the fixed point is translated back to its original place; now tx
becomes x and ty becomes y, the reverse translation.
(Refer Slide Time: 35:16)
So, how do we form the sequence? We will follow the same right-to-left rule: the first
translation, bringing the fixed point to the origin, is the rightmost transformation; this is
followed by scaling, the second transformation; and that is followed by the reverse
translation, bringing the point back to its original location, which is the leftmost
transformation. So, our composite matrix will be the multiplication T2.S.T1, where T1 is the
translation that brings the fixed point to the origin (the rightmost matrix) and T2 is the reverse
translation (the leftmost matrix). We multiply them to get the composite matrix representing
scaling with respect to any point other than the origin.
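A small sketch of this composite (the function name scale_about is my own, and the fixed point (5, 5) with factors (1/2, 1) anticipates the example discussed next):

```python
import numpy as np

def T(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def S(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def scale_about(xf, yf, sx, sy):
    """Scaling about a fixed point (xf, yf): translate the fixed point to the
    origin, scale with respect to the origin, then translate back."""
    return T(xf, yf) @ S(sx, sy) @ T(-xf, -yf)

M = scale_about(5, 5, 0.5, 1)
xh, yh, h = M @ np.array([5.0, 5.0, 1.0])
print((xh / h, yh / h))   # (5.0, 5.0): the fixed point itself does not move
```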
324
(Refer Slide Time: 36:17)
And in the same way we can perform the other basic transformations with respect to any
fixed point other than the origin. Here is one example which shows the procedure we just
mentioned. Suppose the original object is defined not with one vertex at the origin but here,
where we have the vertices shown, and the fixed point with respect to which the scaling takes
place is at (5, 5); the same object is placed here after scaling. So in this case, a separate
translation of the object is not required, because it was already at that position and only
scaling took place.
(Refer Slide Time: 37:25)
So, let us apply the approach that we outlined previously. Here we are performing scaling
with respect to the fixed point D, and the transformation matrix, the composite
325
transformation matrix, can be found by multiplying these 3 matrices. First, we translate the
fixed point to the origin, so tx will be -5 and ty will be -5. Then we perform scaling with
respect to the origin along the x axis only, that is, sx will be 1/2 and sy will be 1. And then we
translate the point back to its original position, that is, tx = 5 and ty = 5. That gives the
composite matrix. So, once we get this composite matrix for scaling, we apply it to the points
to get the transformed points.
(Refer Slide Time: 38:21)
And as I said, we can follow a similar approach for rotation and shearing: first translate the
fixed point, with respect to which the rotation or shearing has to be performed, to the origin,
then perform the corresponding operation, and then translate it back to the original location.
So, for rotation, first we have one translation, followed by rotation with respect to the origin,
followed by translating back to the original fixed point location.
For shearing the approach is the same: translation to the origin, followed by shearing with
respect to the origin, followed by translating back to the original fixed point location. So, this
is the composite matrix form for performing any of the basic operations with respect to a
fixed point that is not the origin.
326
(Refer Slide Time: 39:35)
So, to recap, if we are performing the basic operation with respect to origin, then we do not
require to do anything else, we simply apply the basic transformation matrix. However, if we
are performing the operation with respect to a point which is not the origin, then we perform
a composite transformation which involves 3 basic transformations; first one is translation
translate the fixed point to origin, second one is the actual transformation that is either
scaling, rotation or shearing and the third one is translating back the fixed point to its original
place.
And we form the composite matrix in this right-to-left manner: the first transformation
(translating the fixed point to the origin) is the rightmost, the second transformation (the
actual operation) is on its left, and the third one (translating back) is on the left of the second.
So if we put them in sequence, first comes 1, followed by 2, followed by 3.
327
(Refer Slide Time 41:07)
For a better understanding, let us go through one more example, which will illustrate the idea
further. Now, let us assume we require more than one transformation, so we will apply the
same process which we already outlined.
(Refer Slide Time: 41:24)
Consider this object: what are the transformations required to put it in position as a chimney
here? As you can see, we need to rotate this object; the surface that was here now comes here,
so it is a rotation in the counter-clockwise direction, a positive rotation by 90 degrees, and the
size also reduces by half along the x direction, so sx should be 1/2. But all these basic
operations take place with respect to this fixed point. So, then, how do we get the composite
matrix?
328
So, we first translate the fixed point to the origin, that is T(-5, -5). Then we scale to halve the
length, that is S(1/2, 1); along the y axis there is no change, so we keep that factor 1. Then we
get an object like this, and then we rotate it to get this final one, a rotation by 90 degrees. But
these 2 operations are performed with respect to the origin after translating the fixed point to
the origin, so now we have to translate it back, which is another translation, T(5, 5). These
matrices together, when multiplied, will give us the composite matrix.
(Refer Slide Time: 43:29)
So, it will look something like this. If we replace these notations with the actual matrices,
then we get these four matrices, and when we multiply them we get the composite matrix,
which will look like this. So, this is our way to get a composite matrix when we are trying to
perform multiple basic operations with respect to a point which is not the origin.
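The same composition can be sketched in a few lines (reusing the earlier helpers; the angle is given in radians to the rotation matrix, and only the fixed point is checked here, since the full vertex list is on the slide rather than in the text):

```python
import numpy as np

def T(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def R(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def S(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

# Right-to-left: translate the fixed point (5, 5) to the origin, scale by
# (1/2, 1), rotate by +90 degrees, then translate back.
M = T(5, 5) @ R(np.radians(90)) @ S(0.5, 1) @ T(-5, -5)

xh, yh, h = M @ np.array([5.0, 5.0, 1.0])
print(np.round([xh / h, yh / h], 6))   # [5. 5.]: the fixed point is unchanged
```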
329
(Refer Slide Time: 44:08)
And after getting the composite matrix we follow the same steps: we multiply the surface
points, say these points or any other surface points, with the composite matrix to get the
transformed points. That brings us to the end of this discussion. So, before we end, let us try
to recap what we have learned today.
First, we discussed about an alternative representation for basic transformations that is the
homogeneous coordinate systems where we represent a 2D point using a 3D coordinate
system. And as we have seen, it makes life easier for building modular graphics packages or
libraries. So, using this homogeneous form, we can represent all 4 basic transformations
using 3 by 3 matrices.
Then what we learned is how to form a composite matrix following the right-to-left rule: the
first matrix that we apply on the object should be the rightmost, the next matrix that we apply
should be placed to the left of the rightmost matrix, and so on till the last transformation. And
we multiply all these matrices together to get the composite matrix. Once we get the composite
matrix, we multiply it with the points to get the transformed points in the homogeneous
coordinate system.
Finally, we divide the x and y values in the homogeneous system by the homogeneous factor
to get back the actual points. We also learned how to perform the basic transformations with
respect to any point that is not the origin. The earlier matrices were defined with respect to
the origin, so when we are given a fixed point and we are supposed to perform a basic
transformation with respect to that fixed point, which is not the origin, we follow a composite
matrix approach: we first translate the fixed point to the
330
origin, perform the required basic transformation with respect to the origin, and translate the
point back to its original location.
Following the same right to left rule, we get the composite matrix to represent the basic
transformation with respect to any arbitrary point. So far, whatever we have discussed are
related to 2D transformations. In the next lecture, we will learn about transformations in 3D.
(Refer Slide Time 47:19)
The topic that I covered today can be found in this book, chapter 3, section 3.2 and 3.3. You
may go through these chapters and sections to learn more about these topics. We will meet
again in the next lecture. Till then thank you and goodbye.
331
Computer Graphics
Professor. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 12
Transformations in 3D
Hello and welcome to lecture number 12 in the course Computer Graphics. As you may
recollect, we are discussing the different stages of the graphics pipeline, and as we have been
doing for the last few lectures, we will start by having a relook at the pipeline stages so that
we are able to remember them better.
(Refer Slide Time: 0:58)
So, there are 5 stages in the graphics pipeline: the first stage is object representation, the
second stage is modelling or geometric transformation, the third stage is lighting or assigning
colour to points on the objects, and the fourth stage is the viewing pipeline, where we transfer
a 3D object to a 2D view plane. This transfer takes place through 5 sub-stages: viewing
transformation, clipping, hidden surface removal, projection transformation and
window-to-viewport transformation. The fifth and final stage is scan conversion; here we
actually map the view plane object to the pixel grid on the screen.
And as we have mentioned earlier, each of these stages take place in specific coordinate
systems, object representation is done in local or object coordinate system, modelling
transformation here we actually transfer from local to world coordinate system, lighting takes
place in world coordinate, then viewing pipeline takes place in 3 coordinate systems; world
coordinate, view coordinate and then device coordinate. And finally, scan conversion takes
332
place in screen coordinate system. So, the different coordinates are involved in different
stages of the pipeline.
(Refer Slide Time: 3:00)
Among these stages, so far we have discussed the first stage object representation. Currently
we are discussing the second stage that is modelling or geometric transformation. And in the
last couple of lectures, we have discussed the basic transformation idea including how to
perform complicated transformations in terms of sequence of basic transformation, but all our
discussion were based on 2D transformations.
(Refer Slide Time: 3:32)
333
In other words, we were performing transformations in 2 dimensional reference frame. Now,
let us have a look at 3D transformation. So, 3D transformation will be the topic of discussion
for our lecture today.
(Refer Slide Time: 4:12)
So, when we talk of 3D transformations, essentially we refer to all the basic transformations
that we have already discussed in 2D, but in a modified form. The transformations are
actually the same as in 2D, but their representation is different. In 2D, we discussed 4 basic
transformations, namely translation, rotation, scaling and shearing. These 4 remain the basic
transformations in the 3D world also; however, their representation is different.
(Refer Slide Time: 04:47)
334
So, earlier we had used homogeneous coordinate system to represent the transformation. We
will use the same coordinate system here to represent the 3D transformation as well, but with
the difference. Now, earlier in the matrix representation, we used 3×3 matrices in the
homogeneous coordinate system to represent each of the transformations. In 3D, we use 4×4
matrices to represent each transformation. However, the homogeneous factor h remains the
same that is h=1.
So, essentially we are using instead of 3×3, we are using 4×4 transformation matrices to
represent a transformation in 3D, and the homogeneous factor h remains equal to 1 because
we are dealing with modelling transformation. But there are certain differences and we
should keep in mind these differences, the differences are primarily with respect to the 2
transformations; rotation and shearing.
(Refer Slide Time: 6:10)
For rotation, earlier we assumed that the rotations take place with respect to the z axis or
some axis that is parallel to it. That was our basic assumption in 2D rotations. In 3D this
assumption is no longer valid; here we have 3 basic rotations, one with respect to each
principal axis x, y and z. Earlier we had defined only one rotation, with respect to the z axis;
now, in 3D, we define 3 basic rotations with respect to the 3 principal axes x, y and z, so the
number of basic transformations changes. Earlier we had one matrix for rotation; now we
have 3 for rotation.
335
(Refer Slide Time: 7:05)
Also, previously we did not face this situation when we defined rotation with respect to the z
axis. Here, the transformation matrix that we should use to represent rotation about any
arbitrary axis, that means any axis that is not a principal axis, is more complicated than in 2D.
In 2D we have only z as the principal axis of rotation; in 3D we have 3 principal axes, and we
have to take all 3 into account.
So, when we are trying to define an arbitrary rotation with respect to any arbitrary axis then
deriving the transformation matrix becomes more complicated. And the form of the matrix
also is more complicated than what we have encountered in 2D. We will have a look at this
derivation of rotation matrix with respect to any arbitrary axis later in the lecture that is about
rotation.
336
(Refer Slide Time: 8:33)
Now as I said, shearing is also having some difference with respect to its 2D counterpart. It is
in fact more complicated compared to what we have seen in 2D.
(Refer Slide Time: 8:50)
So, let us start our discussion with shearing in 3D then we will talk about the differences in
rotation and then we will see how to derive a composite transformation matrix for rotation
about any arbitrary axis. Now, when we are talking of shearing, as we have seen earlier we
are trying to basically change the shape of the object. So, essentially to introduce some
deformity in the object shape. Now this distortion or deformation can be defined along 1 or 2
directions at a time while keeping 1 direction fixed; that is one constraint that we follow for
defining shearing in 3D.
337
For example, if we are trying to shear along the x and y directions, then we have to keep the
shearing along the z direction fixed. As a result, the general form is different from that of 2D shearing.
(Refer Slide Time: 10:11)
In fact, we can define 6 shearing factors. Recollect that a shearing factor refers to the amount
of distortion or deformation we want to introduce along a particular axis. So, in case of 3D
shearing we can define 6 shearing factors, and each factor can take any real value, or zero if
no shear takes place along that particular direction; when the shearing factor is 0, there is no
shearing along that direction. With respect to these six factors, the shearing matrix looks
something like this, where shxy, shxz, shyx, shyz, shzx and shzy are the six shearing factors.
(Refer Slide Time: 11:25)
338
Among these factors shxy and shxz are used to shear along y and z directions respectively
leaving the x coordinate value unchanged. We earlier mentioned that while performing
shearing one direction has to be left unchanged. So, in this case, we are performing shearing
along y and z directions whereas, shearing along x direction remains 0.
(Refer Slide Time: 12:09)
Similarly, shyx and shyz refers to the shearing factors along x and z direction when y
coordinate value remains unchanged. And likewise, the other 2 shearing factors can be
defined that is shzx and shzy, these 2 refer to shearing along x and y direction leaving z value
unchanged. So, each pair actually refers to shearing along 2 directions while the third
directions remain unchanged that means shearing along that third direction does not take
place.
So, that is about shearing. As you can see, it is more complicated than the shearing matrix
that we saw for 2D transformations; that is because we now have 6 shearing factors. Now, let
us have a look at the other basic transformation matrices.
339
(Refer Slide Time: 13:31)
Translation is the simplest and the form remains almost the same with the addition of one
more dimension. So, we have tx referring to translation along x direction, ty referring to
translation along y direction and tz referring to translation along z direction.
(Refer Slide Time: 14:01)
As I said before, for rotation we do not have a single matrix. Instead, we have 3 separate
matrices, each corresponding to the rotation about a particular principal axis. Since there are
3 axes, we have 3 rotation matrices.
340
(Refer Slide Time: 14:31)
Rotation about x axis, when the angle of rotation is ϕ looks something like this matrix.
(Refer Slide Time: 14:42)
Rotation about the y axis, again assuming the rotation angle to be ϕ, is shown here.
341
(Refer Slide Time: 15:06)
And finally, rotation about z axis by an angle ϕ is shown here in this matrix. So, we have 3
matrices representing 3 basic rotations; one about x axis, one about y axis and one about z
axis.
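For reference, here is a small Python sketch of the standard 4×4 forms of these three rotation matrices (counter-clockwise rotation by ϕ, in radians, with the homogeneous factor kept at 1); the helper names are my own:

```python
import numpy as np

def Rx(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0,  0, 0],
                     [0, c, -s, 0],
                     [0, s,  c, 0],
                     [0, 0,  0, 1]])

def Ry(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[ c, 0, s, 0],
                     [ 0, 1, 0, 0],
                     [-s, 0, c, 0],
                     [ 0, 0, 0, 1]])

def Rz(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

# Quick check: rotating (1, 0, 0) by 90 degrees about z gives (0, 1, 0).
print(np.round(Rz(np.radians(90)) @ np.array([1, 0, 0, 1]), 6))   # [0. 1. 0. 1.]
```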
(Refer Slide Time: 15:31)
Scaling is also similar to the 2D counterpart, Sx is the scaling factor along x direction, Sy is
the scaling factor along y direction, Sz is the scaling factor along z direction. So, if we do not
want to perform any scaling along a particular direction, we simply set that particular scaling
factor as 1. So, if we do not want scaling along say y direction, then we will set Sy=1. And if
you may recollect scaling factor less than 1 means, in that particular direction we want to
342
reduce the size and scaling factor greater than 1 means in that particular direction we want to
increase the size.
So, scaling is related to size, shearing is related to shape, and translation and rotation are
related to position and orientation. Then in 3D we have 6 basic matrices: one for translation,
one for scaling, one for shearing, and three for rotation, representing 6 basic transformations
in 3D. The other difference that I mentioned with respect to 2D transformations is the rotation
of an object with respect to any arbitrary axis, that means any axis that is not one of the
principal axes x, y and z.
(Refer Slide Time: 17:17)
So, what is the idea? We want to rotate an object by an angle θ counter-clockwise around an
axis of rotation passing through 2 points P1 and P2. We define these two points because with
them we can define a line or line segment that represents the axis of rotation; unless we
mention the points, it is difficult to specify the axis. So, we have an axis defined by the 2
points and an angle of rotation θ, which is counter-clockwise.
Remember that we are using a convention that if angle of rotation is counter clockwise then it
is positive angle, if angle of rotation is clockwise, then we consider it to be negative angle.
So, if we are rotating the object by an angle θ counter clockwise, then it will be simply θ, but
if we are rotating the same object by an angle θ clockwise, then we will replace θ with -θ.
Now, let us see what happens when we are trying to perform this rotation with respect to any
arbitrary axis, how we can derive a composite matrix representing this rotation.
343
(Refer Slide Time: 18:59)
The idea is illustrated in the series of steps. So, this one top left figure shows the initial
situation where P1 and P2 define the axis of rotation represented with the dotted line with
respect to the 3D reference frame or coordinate frame. Now then, in step 1, what we do? We
translate the line to the origin. Remember, earlier in our discussion on composition of
transformation, we discussed how to combine multiple transformations.
So, there what we said that if we are trying to perform some basic operation with respect to
any arbitrary fixed point other than origin, then what we follow? We first translate the point
to origin, perform the basic transformation and then translate it back to its original location.
So, the same basic principle we are following here, we are given the arbitrary axis or arbitrary
fixed line. In the first step, we translate it to the origin that means the axis passes through the
origin.
In step 2, what do we do? Now the axis passes through the origin, but there is no guarantee
that it aligns with any of the principal axes. So, in step 2, we align the line with the z axis in
our particular explanation, though it is not necessary to always use the z axis; you can
always align it with either x or y axis as well. But let us assume that we are aligning it with
the z axis. So, then that involves rotation about x and y axis.
So, now our arbitrary axis is aligned with the z axis. So, rotation will take place around or
about z axis that we do in Step 3, we apply the rotation about the z axis. After the rotation is
done in step 4, what we do is, we rotate the line back to its original orientation. So, when we
brought it or translated it in the step 1 to pass it through origin, it had one orientation. So in
344
step 4, we return it to that orientation and in step 5 or the final step, we translate it back to its
original position.
So in step 4, we are returning it to its original orientation and in step 5 we are translating it
back to its original position. So, these 5 steps are needed to construct the composite matrix
representing rotation of an object with respect to any arbitrary axis. So, let us try to derive it
then.
(Refer Slide Time: 22:39)
As we have seen in the figure, there are 5 steps. So, the composite matrix, the final
transformation matrix, would be a composition of the basic transformations involved in these
5 steps.
345
(Refer Slide Time: 23:03)
So, the first transformation is translation: translating the line so that it passes through the
origin. The translation amounts would be -x, -y, -z, where (x, y, z) are the coordinates of P2,
one of the endpoints, so that this endpoint moves to the origin.
(Refer Slide Time: 23:30)
Then in step 2, we align the line to the z axis, but as I said it need not be always z axis, it can
be x or y axis also. So, in order to do that what we need to do? We need to perform some
rotations about x and y axis. So first, let us assume that first we are rotating the line about x
axis to put the axis on the x-z plane and the angle of rotation is α. Then, we are rotating it
about the y axis to align the axis with the z axis.
346
So, first we rotate it about the x axis to put it on the x-z plane, and then we rotate it about the
y axis to align it with the z axis. In the first case let us denote the angle of rotation by α, and
in the second case let us denote it by β; both are anticlockwise rotations, so both are positive
at this stage.
(Refer Slide Time: 24:59)
Then in stage 3, what we do? Now we have aligned the axis with z axis and then we perform
the rotation about z axis which is our original objective. So then, we use the rotation matrix
with respect to z axis, so here θ is the angle of rotation of the object. Remember that this θ
angle of rotation is with respect to arbitrary axis, now we are using it to rotate about z axis
because we have aligned arbitrary axis with the z axis.
(Refer Slide Time: 25:46)
347
Then, in step 4 and 5, we reverse the operations we performed in step 1 and 2. So first, we
take the line to its original alignment, which involves reverse rotation about y and x axis to
bring the axis of rotation back to its original orientation. While aligning, we rotated with
respect to x first and then y. Since now, we are reversing the operation, so we rotate it with
respect to y first and then x.
(Refer Slide Time: 26:28)
And in Step 5, what do we do? We then translate it back to its original position, which is the
last step. So, then what would be the composite matrix?
(Refer Slide Time: 26:43)
348
We can get it by matrix multiplication, and we will follow the right-to-left rule. First we
perform the translation to make the line pass through the origin; then we perform a rotation
about the x axis by an angle α to bring the line onto the x-z plane; then we perform a rotation
by an angle β around the y axis to align it with the z axis; then we perform the actual rotation
by an angle θ with respect to the z axis. Then we reverse the earlier steps, that is, first a
rotation with respect to y, then a rotation with respect to x, by the same angle amounts as in
the earlier cases, and then the reverse translation.
Now, since we are rotating in the inverse of what we did in step 2, these inverse rotations can
simply be represented by a change of sign of the angle. So, if the angle was β, then it will be
-β here, and if the angle was α, then it will be -α here. So, where we rotated about the x axis
by α, in the reverse rotation we rotate about the x axis by -α; similarly, where we rotated by β,
here we rotate by -β. So, the reverse rotation means changing the sign of the angle of
rotation, because instead of counter-clockwise we are now rotating clockwise.
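The whole composition can be sketched in Python as below. This is only an illustrative sketch: the helper names are mine, the translation uses the coordinates of P2 as in the lecture (any point on the axis would do), and the atan2 formulas used for the alignment angles α and β are one common way of obtaining them; sign conventions can differ between texts.

```python
import numpy as np

def T3(tx, ty, tz):
    M = np.eye(4); M[:3, 3] = [tx, ty, tz]; return M

def Rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]])

def Ry(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]])

def Rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

def rotate_about_axis(p1, p2, theta):
    """Composite matrix for rotating by theta about the axis through p1 and p2."""
    v = np.asarray(p2, float) - np.asarray(p1, float)
    a, b, c = v / np.linalg.norm(v)             # unit direction of the axis
    d = np.hypot(b, c)
    alpha = np.arctan2(b, c) if d > 0 else 0.0  # rotate the axis into the x-z plane
    beta = np.arctan2(-a, d)                    # then align it with the z axis
    x, y, z = np.asarray(p2, float)             # translate this endpoint to the origin
    return (T3(x, y, z) @ Rx(-alpha) @ Ry(-beta) @
            Rz(theta) @ Ry(beta) @ Rx(alpha) @ T3(-x, -y, -z))

# Sanity check: for the z axis itself, this reduces to a plain rotation about z.
M = rotate_about_axis((0, 0, 0), (0, 0, 1), np.radians(90))
print(np.allclose(M, Rz(np.radians(90))))       # True
```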
So, these matrices, multiplied in this particular sequence, will give us the composite matrix
for rotating an object by an angle θ about any arbitrary axis of rotation. So that, in summary,
is what 3D transformation involves. It is mostly the same as 2D transformation, with some
differences. The first difference is that in the homogeneous coordinate system we now require
4×4 matrices instead of 3×3 matrices to represent each transformation.
349
Then earlier we defined 4 basic transformations namely, translation, rotations, scaling and
shearing in the context of 2D transformation. Now we have 6 basic transformations;
translation, rotation about x axis, rotation about y axis, rotation about z axis, scaling and
shearing. Earlier we had defined 2 shearing factors, now there are 6 shearing factors, it is a bit
more complicated than the earlier case.
Now, in shearing, when we perform shearing along 2 principal axes, there is no shearing
along the third principal axis; that is the convention we follow in 3D shearing. Apart from
these differences, there is another major difference in the way we derive the composite
transformation matrix for rotation about any arbitrary axis.
So, in order to do that, we follow a 5-step process: first we translate the line to pass through
the origin, then we align it with one of the principal axes, then we perform the rotation by the
desired angle about that axis, then we bring the line back to its original orientation by
performing the reverse rotations, and then we translate it back to its original position. And we
put the individual basic matrices in a right-to-left manner to get the final composite matrix, as
we have shown in the discussion. Now, let us try to understand 3D transformation with
respect to one illustrative example.
(Refer Slide Time: 32:01)
Let us consider a situation: there is an object defined by the vertices A, B, C, D, shown here
in the top figure; as you can see, it is on the x-y plane. This is the initial situation. Now we
want to use this particular object to construct a partition wall defined
350
by the vertices A’, B’, C’ and D’ in a scene where A’ corresponds to A, B’ corresponds to B,
C’ corresponds to C and D’ corresponds to the vertex D.
So, here, as we can clearly see, some transformation took place. The task is to calculate the
composite transformation matrix that enables this object to be positioned as a partition wall
in this scene. Let us see how we can do this.
351
(Refer Slide Time: 33:34)
So, initially the square is in the x-y plane and each side had 2 units of length and the centre is
given as (2, 2, 0). The final square is on the y-z plane with each side equal to 4 units and the
centre is now at (0, 2, 2). Now, these lengths and centres can be found out by the coordinates
of the vertices.
(Refer Slide Time: 34:28)
So, then, what do we need to do? In this case, we need a rotation from the x-y plane to the
y-z plane, but the axis of rotation is not the z axis; it is parallel to the z axis, so we will follow
the composite matrix approach. First we translate the centre of the original object to the
origin, so the translation amounts will be -2, -2 and 0.
352
(Refer Slide Time: 35:13)
So, if we are translating the centre to origin then the axis of rotation which was parallel to z
axis now will be automatically aligned with the z axis. So, then we perform the rotation by 90
degrees anti clockwise around the z axis. So, we will use the rotation matrix defined for
rotation about z axis with the angle of rotation 90. Since the rotation is anti-clockwise, so it
will be positive angle.
(Refer Slide Time: 35:58)
Then we rotate by 90 degrees anti-clockwise around the y axis. So, again we will use Ry(90),
where Ry(θ) is the basic rotation matrix about the y axis.
353
(Refer Slide Time: 36:28)
Then we perform scaling, because the size increased: we scale up by 2 in the y and z
directions. So the x direction will have a scaling factor of 1, meaning no change, and the y
and z directions will have a scaling factor of 2, so the size doubles.
(Refer Slide Time: 36:54)
And then we translate the centre to the new object centre using the translation matrix.
354
(Refer Slide Time: 37:14)
So, then the composite transformation matrix can be obtained by multiplying these individual
basic transformation matrices together, where we follow the right-to-left rule: first is the
translation to the origin, then the rotation about the z axis, then the rotation about the y axis,
then the scaling up by 2 along the y and z directions, and then the translation to the new
centre. If we multiply them, we get the composite transformation matrix.
After we get this matrix, just to recap the procedure, what do we need to do? We need to multiply each vertex by this composite transformation matrix. So, if a vertex is represented by the column vector P and the composite transformation matrix is M, then we compute M.P for each vertex to get the new vertex position in homogeneous coordinates. Eventually, to get the physical coordinates, we divide the x coordinate by the homogeneous factor, the y coordinate by the homogeneous factor and the z coordinate by the homogeneous factor.
In our case, of course, h=1, so it really does not matter; the x, y and z coordinates remain the same. But later on, as I mentioned earlier, we will see that there are situations where h≠1. In that case this division is very important, as we will see in subsequent lectures.
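As a quick cross-check of this worked example, the following NumPy sketch (again my own illustration, not from the slides) writes the five matrices explicitly, multiplies them right to left, and applies the result to a point in homogeneous coordinates, including the division by the homogeneous factor h (here h = 1). Only the centres given in the lecture are used; the function and variable names are assumptions.

    import numpy as np

    T1 = np.array([[1, 0, 0, -2],      # translate the centre (2, 2, 0) to the origin
                   [0, 1, 0, -2],
                   [0, 0, 1,  0],
                   [0, 0, 0,  1]], dtype=float)

    Rz90 = np.array([[0, -1, 0, 0],    # rotate 90 degrees anti-clockwise about z
                     [1,  0, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]], dtype=float)

    Ry90 = np.array([[ 0, 0, 1, 0],    # rotate 90 degrees anti-clockwise about y
                     [ 0, 1, 0, 0],
                     [-1, 0, 0, 0],
                     [ 0, 0, 0, 1]], dtype=float)

    S = np.diag([1.0, 2.0, 2.0, 1.0])  # scale by 2 along y and z

    T2 = np.array([[1, 0, 0, 0],       # translate to the new centre (0, 2, 2)
                   [0, 1, 0, 2],
                   [0, 0, 1, 2],
                   [0, 0, 0, 1]], dtype=float)

    # Composite matrix, multiplied right to left.
    M = T2 @ S @ Ry90 @ Rz90 @ T1

    def transform(vertex, M):
        """Apply M to a 3D vertex and divide by the homogeneous factor h."""
        x, y, z, h = M @ np.append(np.asarray(vertex, dtype=float), 1.0)
        return np.array([x / h, y / h, z / h])

    # Example: the centre (2, 2, 0) of the original square maps to (0, 2, 2).
    print(transform((2, 2, 0), M))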
So, with that, we come to the conclusion of our discussion on 3D transformations, and also of our discussion on the second stage, that is, modelling transformation. We started our discussion with 2D transformations; there we introduced the basic idea of modelling transformation, that is, to assemble objects that are defined in their own or local coordinate systems into a world coordinate scene. In order to do that, we perform geometric transformations, and any 2D transformation can be considered to be a sequence of basic transformations.
We have discussed 4 basic transformations, those are translation, rotation, scaling and
shearing. We also discussed why it is important to represent transformations in terms of
matrices, because of modularity and compatibility with subsequent stages when we are
implementing a package in the form of library functions or APIs or standard functions. Now,
for matrix representation we discussed the importance and significance of homogeneous
coordinate system and we have seen how to use the homogeneous coordinate system to
represent basic transformations or any composite transformation.
So, in summary, in modelling transformation we perform transformations by considering basic transformations individually or in sequence; these transformations are represented in the form of matrices, where the matrices are themselves representations in the homogeneous coordinate system. In 2D we have 4 basic transformations, and in 3D modelling transformation we have 6 basic transformations. Any transformation with respect to an arbitrary point or axis of rotation can be derived by using a sequence of basic transformations, the way we derived the composite transformations.
(Refer Slide Time: 42:13)
So in the next lecture, we shall start our discussion on the third stage of the graphics pipeline
that is assigning colour or the lighting.
(Refer Slide Time: 42:31)
Whatever I have discussed today can be found in this book. And you may refer to chapter 3,
section 3.4 to know more about the topics that I have covered today. So, we will meet you in
the next lecture. Till then, thank you and goodbye.
Computer Graphics
Professor. Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 13
Color Computation – Basic Idea
Hello and welcome to lecture number 13 in the course Computer Graphics. So, by now we have covered more than one-third of the course. Before we go into the next topic, let us pause for a moment and reflect on what we have learned so far. As you may recollect, we were discussing the process of displaying an image on a computer screen. This is a generic concept; of course, the screen may vary in size.
And it need not always be an image; it can be characters also. But broadly, what we are concerned with in this course is how a screen, a display unit or some output unit generates an image. That process is captured in the form of a set of stages which we call the 3D graphics pipeline, and currently we are discussing this pipeline. Let us have a relook at the stages of the pipeline.
(Refer Slide Time: 01:51)
As you can see here, there are 5 stages. The first stage is object representation: in this stage we define objects in their own or local coordinate systems. Then we have the second stage, that is, modeling or geometric transformation. In this stage, the objects that are defined in the local coordinates of each particular object are transformed to a world coordinate scene. So, here a transformation takes place from local coordinates to world coordinates.
The third stage is lighting. In this stage, we assign color to the points on the surface of the objects; we may consider this to take place in the world coordinate system itself. The fourth stage is actually a collection of sub-stages: it is the viewing pipeline, which consists of 5 sub-stages. The first of these is the viewing transformation, in which a transformation takes place from the world coordinate system to a new coordinate system called the view coordinate system.
Then there is one process called clipping which takes place in the view coordinate system,
then another process hidden surface removal which takes place in view coordinate system
again. After that there is another transformation called projection transformation. In this
stage, another coordinate transformation takes place from a 3D view coordinate system to a
2D view coordinate system.
And then we have another transformation, the window-to-viewport transformation. Here we transform the 2D view coordinate object to a device coordinate system. Together these 5 sub-stages constitute the fourth stage, that is, the viewing pipeline. And finally we have scan conversion, which is the fifth stage. Here also a transformation takes place, from the device coordinate system to a screen coordinate system. So, these are the stages of a 3D graphics pipeline.
Now, we have already discussed some of those stages, and some of them remain to be discussed. What have we discussed?
(Refer Slide Time: 04:30)
We have discussed the first stage, that is, object representation, and we have also finished our discussion on the second stage, that is, modeling transformation. Now we are going to start our discussion on the third stage, that is, lighting, or assigning color to the objects, or rather to the surface points of the objects. We will start with the basic idea of the coloring process: when we talk of coloring, what do we mean, and how can we actually implement the idea of coloring in the context of computer graphics?
(Refer Slide Time: 05:20)
Now, as I have already mentioned, the third stage deals with assigning colors. Why is that important? Let us again look at the example figures shown on the right-hand side of the screen. Here, as you can see, in the top figure we have one object to which we have assigned color, and there is another object at the bottom to which we have again assigned color. Now, what is the difference between the top figure and the bottom figure?
See in the top figure we have assigned color, but in this figure we are unable to perceive the
depth information. Now this depth information is very important to create an impression of
3D. This problem is not there in the lower figure here. In this case, as you can clearly see, by
assigning color in a particular way we manage to create an impression of a 3D object which
was not the case in the first image.
Now, how did we manage to do that? As you can see in the lower figure, the same color has not been applied everywhere. We have applied colors with different intensity values to give us the perception of depth. So, when we talk of assigning color, we are actually referring to this particular way of assigning colors so that we get the impression of depth.
(Refer Slide Time: 07:19)
Now, this appropriate color assignment can be considered to be the same as illuminating the scene with a light. Why do we get to see color? Because there is light. If there is no light, then everything will be dark and we will not be able to see color. So, essentially, when we talk of assigning color, we are referring to the fact that the scene is illuminated with a light, and based on that light we get to see the color.
So, in order to mimic this process of illuminating a scene with a light, in computer graphics we typically take the help of a particular type of model called a lighting model. In this third stage, we will learn about lighting models in detail.
(Refer Slide Time: 08:25)
What do these lighting models do? A lighting model computes the color and outputs a number, a real number, which represents an intensity value, the intensity of the light.
(Refer Slide Time: 08:51)
Now, the way these models are designed, they can only compute colors in terms of continuous real numbers. But, as we all know, computers are digital machines, so they can only process digital, discrete values; they cannot deal with continuous numbers. So, we need some method to map these continuous values to streams of 0s and 1s, that is, to digitize those continuous values, otherwise the computers will not be able to deal with them.
(Refer Slide Time: 09:44)
This mapping process is also very important in our process of assigning colors to objects in computer graphics, and it constitutes a crucial part of the third stage where we assign colors. So, we will also discuss this mapping process. Broadly, we will discuss two things: one is computing intensity values based on lighting models, and the second is mapping the continuous intensity values to discrete streams of 0s and 1s.
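As a small illustration of this mapping step (a sketch of the general idea only; the actual scheme used is discussed later in the course, and the function name and bit depth here are my own choices), a continuous intensity in the range [0, 1] can be quantized to one of a fixed number of discrete levels and stored as an integer, that is, as a string of bits.

    def quantize(intensity, bits=8):
        """Map a continuous intensity in [0.0, 1.0] to one of 2**bits
        discrete levels, returned as an integer the hardware can store."""
        levels = 2 ** bits
        clamped = min(max(intensity, 0.0), 1.0)
        return min(int(clamped * levels), levels - 1)

    # A continuous value produced by a lighting model ...
    print(quantize(0.7367))        # ... becomes the discrete level 188
    print(bin(quantize(0.7367)))   # ... i.e. the bit string 0b10111100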
(Refer Slide Time: 10:33)
Now let us try to understand the basic idea behind the process of illumination.
(Refer Slide Time: 10:41)
How do we get to see color? It is actually the outcome of a process, the process of illumination.
(Refer Slide Time: 10:54)
The process assumes that there is a source, a light source, which emits light. There may be one source, or there may be more than one source, but there has to be some source of light. This emitted light falls upon the point; in this figure, as you can see, we have one source, the light bulb, and it emits light. The light falls on these two object surfaces.
(Refer Slide Time: 11:44)
Now, sometimes the light that comes from the source need not fall directly on the surface point; instead, it can get reflected from another point and then fall upon the surface point. As we have shown here, it first falls on this object, gets reflected from there, and then finally falls on this point. So, at this point we have two lights incident upon it: one comes directly from the light source, that is, the direct light, and the other comes not directly from the light source but after getting reflected from another surface. That can be considered as an indirect source of light.
So, at this point we have light coming from a direct source as well as from an indirect source. This transportation of light energy from the source, direct or indirect, to the point is called illumination. It is the process of transporting light energy from a source to the point, either directly or indirectly.
(Refer Slide Time: 13:33)
Now, this incident light gets reflected from the object surface at this point and then falls upon our eyes, the eyes of the viewer. Once we receive that light, after it has been reflected from the object surface, we can perceive color. So, the intensity of this light incident on the eye is the perceived color, or simply the color, of the point. To recollect: a viewer is looking at this point here.
At this point there are two incident lights: one coming from the direct source through this path, one coming from the indirect source through this path. The process of transporting light energy from the sources to this point is illumination. After the point is illuminated, the light is reflected from it and reaches the eye of the viewer through this path. The intensity of this reflected light that reaches the eye is the perceived color, or simply the color, of the point.
So, essentially, what we perceive as color is the intensity of the light that is reflected from the point we are looking at.
(Refer Slide Time: 15:23)
Now, this process of computing the luminous intensity of the outgoing light at the point is known as lighting, and in computer graphics we are interested in simulating this lighting process. So, there are two processes involved, as we have just discussed: one is illumination, that is, the light from the source falling on the point either directly or indirectly.
And from that point the light gets reflected and reaches the eye of the viewer; that is lighting, and the intensity of this reflected light determines the color at that point. Since we are interested in determining the color at that point, we are interested in mimicking the lighting process.
(Refer Slide Time: 16:34)
Sometimes another term is used, shading, also known as surface rendering. This term refers to the process of assigning colors to pixels, a minor difference: earlier we were talking of assigning color to any surface point, now we are talking of assigning colors to pixels.
(Refer Slide Time: 17:06)
So, technically both are the same; both refer to the process of computing the color at a point. But there is a difference, particularly in the context of the usage of these terms in computer graphics. What is the difference?
(Refer Slide Time: 17:37)
They represent two different ways of computing the color at a point. When we talk of lighting and when we talk of shading, technically both refer to the same thing, that is, computing the color of a point. But, in practice, when we use these terms in graphics, we are referring to slightly different concepts, and these concepts are related to the way the color value is computed at the points.
(Refer Slide Time: 18:20)
In the case of lighting, we take into account the properties of the light source and of the surface. So, essentially, we are trying to simulate the optical phenomenon: the color of a surface point has to take into account the properties of the material of the surface and the properties of the light source. When we compute the color value taking these properties into account, we are talking of lighting. So, essentially, lighting refers to computing color taking into account all the optical properties that are relevant in color computation, in other words, a simulation of the optical phenomena.
(Refer Slide Time: 19:28)
In order to do that, we use lighting models. But these models, as we shall see later, are complex and involve a lot of computation, so essentially they are computation-intensive.
(Refer Slide Time: 19:53)
Now, in graphics, in practical applications, as may be obvious to all of us by now, when we are trying to render a scene there are a large number of points, so we need to compute the color at a large number of points. If we have to apply the lighting model at every point, then, since the model itself is complex and computation-intensive, the total time required to compute and assign colors to those points may lead to a delay in rendering the pixels.
In other words, if we apply the lighting model to compute the color of all the surface points, then we may not get a realistic 3D image in real time. So, it is in fact inappropriate to apply the lighting model to compute the colors at all the surface points. We have a model, we know how the optics works, and we have taken into account the characteristics and properties of both the surface and the source; but it is not advisable to compute the color using the lighting model alone, as that will lead to a delay in rendering the image.
(Refer Slide Time: 22:00)
In order to address this issue, typically an alternative approach is used. What is that approach?
(Refer Slide Time: 22:09)
Instead of computing the color at every point by using the lighting model, what we do is map the surface points to pixels and then use the lighting model to compute colors for only a selected small number of those pixels. So, we do not apply the model to all the pixels that are on the surfaces; instead, we apply it only to a very small subset of those surface pixels.
(Refer Slide Time: 23:02)
Subsequently, we use those values to interpolate the colors of the remaining surface pixels. Suppose we have a surface like this, and it is mapped to, say, this pixel grid, so these are the pixels. There are in total 16 pixels that constitute the surface; among these pixels we may apply the lighting model to compute the color of only one pixel and then use it to interpolate the colors of the remaining pixels. This interpolation is not done by applying the lighting model; instead, it is done with much less computation, mostly some iterative process.
So essentially, interpolating colors rather than computing them with the lighting model saves a lot of time. For example, in this case, earlier we had to use the lighting model for 16 points; now we use the lighting model for, say, one or two points, and the remaining 14 or 15 points are colored using interpolation, which is a simple iteration of simple computation steps. In this way we can save a lot of time.
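A minimal sketch of this interpolation idea (my own illustration; the function name and the example values are assumptions): the lighting model is applied only at the two ends of a scan line, and the colors of the in-between pixels are obtained by a simple incremental linear interpolation.

    def interpolate_scanline(color_left, color_right, num_pixels):
        """Linearly interpolate a color across a scan line, applying the
        (expensive) lighting model only at the two end pixels."""
        if num_pixels == 1:
            return [color_left]
        step = (color_right - color_left) / (num_pixels - 1)
        colors = []
        c = color_left
        for _ in range(num_pixels):
            colors.append(c)
            c = c + step          # one addition per pixel, no lighting model
        return colors

    # Intensities at the two ends, computed with the lighting model.
    print(interpolate_scanline(0.2, 0.8, 4))   # [0.2, 0.4, 0.6, 0.8] (approx.)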
(Refer Slide Time: 25:00)
Now, this interpolation-based process of pixel coloring is generally referred to as shading in computer graphics. So, we have lighting and we have shading, and although technically they are the same, in the context of our discussion of the third stage we will distinguish between the two: lighting refers to the application of a lighting model to compute color, and shading refers to the application of interpolation to compute color. We will learn about shading models in the subsequent lectures.
(Refer Slide Time: 25:47)
Now, let us try to learn, in brief, some of the background information which we will utilize
for our subsequent discussions on lighting model, shading model as well as the mapping from
continuous intensity values to discrete intensity values. First thing is the factors that affect
color. What affects color? So, there are two broad things. One is properties of the light source
and the other one is properties of the surface on which the point lies. So, the surface
properties as well as the light source properties determine the color of a point.
(Refer Slide Time: 26:50)
Now, surface properties include two types of properties. The first type is optical properties such as reflectance and refractance; I hope you are aware of these terms. Reflectance refers to the fact that some portion of the light gets reflected and some gets absorbed, while refractance refers to the fact that light gets refracted while passing through a surface. The amount of reflection or refraction is determined by the corresponding reflectance and refractance properties.
Apart from these optical properties, there are geometric properties or attributes as well, such as the position of the object surface with respect to the light source, its orientation with respect to the light source, and so on. These also determine the particular color that we perceive. That is about surface properties. What about the light source?
(Refer Slide Time: 28:23)
So, in graphics we typically consider 3 types of light sources. Let us have a look at these 3 types.
(Refer Slide Time: 28:29)
The first one is the point light source. Here we assume that such sources emit light equally in all directions from a single point which is dimensionless. How do we characterize this type of light source? Since there is no dimension, we do not need to characterize them by shape or size; instead, we simply characterize them by their position and the intensity value of the emitted light.
(Refer Slide Time: 29:11)
If we are trying to model a light source that is very, very far from the point, typically an infinitely distant source such as the sun, we can use this concept of a point light source. However, in such cases, since it is very far away, its position makes no sense, so we characterize such sources only by the intensity of the emitted light. Only the intensity of the emitted light characterizes light sources that are infinitely distant from the point.
(Refer Slide Time: 30:13)
Then we have the directional source or spotlight. We use this type of light source to simulate a beam-of-light effect. In this case we assume that it consists of a point light source that emits light within an angular limit, characterized by the angle θ. If a point is within this limit, it is illuminated; if it is outside this limit, it will not be illuminated by this particular light source.
(Refer Slide Time: 31:02)
So essentially, spotlight sources can be characterized by three things: the position of the point source, the angular limit characterized by this angle, and the emitted light intensity. Later on, while discussing the lighting model, we will see how this intensity varies from one point to another.
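The three source types can be summarized as simple data records. The sketch below (Python dataclasses with hypothetical field names of my own choosing) also shows a possible test for whether a surface point lies within a spotlight's angular limit.

    import math
    from dataclasses import dataclass

    @dataclass
    class PointLight:            # dimensionless: position + emitted intensity
        position: tuple
        intensity: float

    @dataclass
    class DistantLight:          # infinitely far away: intensity only
        intensity: float

    @dataclass
    class SpotLight:             # point source + angular limit (degrees)
        position: tuple
        cone_axis: tuple         # unit vector along the cone axis
        cone_angle: float
        intensity: float

    def inside_cone(light: SpotLight, point) -> bool:
        """True if the surface point lies within the spotlight's angular limit."""
        to_point = [p - q for p, q in zip(point, light.position)]
        norm = math.sqrt(sum(v * v for v in to_point))
        cos_phi = sum(a * v for a, v in zip(light.cone_axis, to_point)) / norm
        return cos_phi >= math.cos(math.radians(light.cone_angle))

    spot = SpotLight(position=(0, 5, 0), cone_axis=(0, -1, 0),
                     cone_angle=30.0, intensity=1.0)
    print(inside_cone(spot, (0.5, 0, 0.5)))   # True: well inside the cone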
(Refer Slide Time: 31:39)
The third type of light is ambient light, or light that comes from indirect sources. Sometimes there may be objects which are not directly illuminated by a light source, but we are still able to see them. How? Because the light emitted from the light source gets reflected by other objects that surround this particular object of interest, and that reflected light falls upon the object of interest and then comes to our eyes, so we get to see that particular object.
Like the example shown in this figure: even if we assume that the direct light is not available, we will still be able to see this particular point, because the light falls on this other object, gets reflected, falls on the point of interest, and from there gets reflected and comes to our eye. So, we get to see this point because it gets light from an indirect source. This is indirect illumination from surrounding surfaces.
(Refer Slide Time: 33:32)
That also is one type of light source, which we call ambient light. But if we want to model this ambient light effect, that is, how much light is reflected from surrounding surfaces and falls upon the point of interest, then, as you can probably guess, it is going to be quite complex, because there may be a large number of objects at different positions and orientations with different surface properties. If we need to calculate the luminous intensity from each of these surface points that ultimately falls upon the point of interest, that is going to take quite a lot of computation and is likely to be time-consuming. So, typically in graphics, to avoid such complex computations, we assume a simplified model of ambient light, which we will also see in our discussion on the lighting model.
(Refer Slide Time: 34:43)
Such a simplified model is called an ambient light source. We assume that there is an ambient light source which affects every surface point uniformly, which of course is not the case in practice, but we will see that this assumption leads to realistic images in most cases without too much additional computation.
So, to summarize, when we are talking of computing color, we need to take into account two things: one is the surface properties, both optical and geometric, and the other is the light source. We just discussed 3 types of light sources. One is the point light source, characterized by its position and the intensity of the emitted light; but if we are considering a point light source at a very distant location, then the position is not important, and only the emitted light intensity characterizes such sources.
Then we have the spotlight, characterized by the position of the point light source, the extent of the angular spread of the light and the intensity of the emitted light. The third type is the ambient light source, where we assume a simplified model of the ambient light effect, encapsulated in the form of a single light source which affects all the objects in a scene uniformly.
We will learn more about these sources during our discussion on the lighting model, where we will see how these sources and the surface properties affect the computations in a lighting model.
(Refer Slide Time: 37:12)
One more thing about this ambient light: in the simplified model we assume that such light sources do not have any spatial or directional characteristics. As a result, they are assumed to illuminate all surfaces equally and are characterized by only one thing, the ambient light intensity. These are crucial considerations for being able to model the lighting process without imposing too much computational overhead on the system.
(Refer Slide Time: 38:13)
So, with this background knowledge, we will discuss the idea of a lighting model in terms of a simple lighting model, which we will do in the next lecture. You may like to note the term 'simple': although we will see that in practice it is still complex, it is the simplest of all the lighting models that are used in graphics. We will also discuss the computations involved and how to reduce them by making some simplifying assumptions. That lighting model we will discuss in the next lecture.
(Refer Slide Time: 39:01)
Whatever I have discussed today can be found in this book, chapter 4, section 4.1. You may go through this section to learn in more detail about the topics that I discussed in today's lecture. See you in the next lecture. Till then, goodbye and thank you.
Computer Graphics
Professor. Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 14
Simple Lighting Model
Hello and welcome to lecture number 14 in the course Computer Graphics. As usual, we will
start by recollecting the pipeline stages which we are currently discussing.
(Refer Slide Time: 00:44)
So, there are 5 stages in the 3D graphics pipeline. If you may recollect, this pipeline refers to the process of rendering a 2D image on a computer screen, generated from a 3D scene. Among those 5 stages, we have seen the first stage, object representation. We have also discussed modeling transformation, the second stage. Currently we are discussing lighting, the third stage. After that there are two more stages: one is the viewing pipeline, which itself is a series of 5 sub-stages, and then finally scan conversion or rendering, which is the last stage.
(Refer Slide Time: 01:33)
As I just mentioned we are currently discussing the third stage that is lighting. So, the idea is
that we want to assign colors to the points that are on the surface of the objects that are there
in a scene. The assignment of colors to the surface point is the responsibility of the third stage
that is called the lighting stage. If you may recollect in the previous lecture we discussed the
basic concepts that are there behind this coloring of surface point.
The first thing is lighting: the light that comes to our eye after getting reflected from the point of interest determines the perception of color, and this process of perceiving color by receiving the reflected light from the point of interest is called lighting. We discussed that this lighting can be computed with the help of a simple lighting model. Today we are going to talk about that simple lighting model.
When we refer to a lighting model as simple, it means we are trying to simplify certain things. If you may recollect, a lighting model refers to the modeling of the process of lighting, which is clearly an optical process. When we use the term simple for a lighting model, we are essentially referring to the fact that many optical phenomena that happen in practice will be ignored.
Instead, we will make some simplifying assumptions about those phenomena and then
implement the lighting model.
(Refer Slide Time: 03:50)
So, in order to discuss the lighting model, we will start with the basic idea that is we use the
lighting models to compute colors at the surface points. So, essentially the job of the lighting
model is to enable us to compute colors at the points of interest.
(Refer Slide Time: 04:14)
If you may recollect, in the introductory lecture we mentioned that there are broadly two components that determine the color: one is the light source, the other is the surface properties. Now, for simplicity, let us start by assuming that there is a single light source which is monochromatic and a point light source. Monochromatic means it has only one color component.
If you recollect, we discussed point light sources, which are dimensionless and characterized only by their position and the intensity of the emitted light.
(Refer Slide Time: 05:04)
So, when we assume a monochromatic single point light source, what will the model look like? Let us try to derive it.
(Refer Slide Time: 05:17)
In order to do so, let us revisit our idea of perceiving a color, the process involved in perceiving a color. This is a light source in the figure, and this is the point of interest; at this point we want to compute the color. We get the color perception after we receive the light that is reflected from that point to our eye, the viewer's eye. Now, this light is a combination of two incident lights: one comes directly from the light source, this is the direct light; the other comes after getting reflected from a secondary object, this we call the ambient light. So, there are these two components, direct light and ambient light.
(Refer Slide Time: 06:28)
So, we can say that the reflected light intensity can be approximated as the sum of the intensities of the two incident lights, that is, the ambient light and the direct reflection. That is the simplifying assumption we are making.
(Refer Slide Time: 06:57)
Now, this reflection from a point can occur in two ways: one type is called diffuse reflection and the other type is called specular reflection.
(Refer Slide Time: 07:24)
Let us try to understand these different types of reflection with respect to one illustrative example. Look at the figure here: as you can see, different colors appear at different points on this object. This region has a slightly dark color, and this color comes from ambient reflection. Above this region, the whole area excluding the central region has a somewhat brighter color; that is due to diffuse reflection.
Diffuse reflection is defined as given here: when incident light tends to reflect in all directions from a rough or grainy surface, we get to see diffuse reflection. We assume that reflection of both the direct light and the ambient light can result in diffuse reflection. So, ambient and diffuse are technically both diffuse reflection, but we will differentiate between the two: by the term diffuse we mean diffuse reflection due to direct light, and by the term ambient we mean diffuse reflection due to ambient light.
(Refer Slide Time: 09:18)
For a shiny or smooth surface, we see a different sort of reflection: light gets reflected in a specific direction or region, and if the viewer is situated within that region, then the viewer gets to see a bright spot. You can see in this figure that the color in this zone is completely different from the surrounding surface region; this is a bright spot, and it results from another type of reflection called specular reflection.
So, we have this third component due to specular reflection. To summarize: diffuse reflection of ambient light gives us the somewhat dark color, diffuse reflection of direct light gives us the somewhat lighter color, and specular reflection gives us the bright spots.
(Refer Slide Time: 10:31)
So, in light of this knowledge, let us now try to derive the simple model. So, in the simple
model then we have 3 components. One component is due to the diffuse reflection of ambient
light, one component is due to the diffuse reflection of direct light and the third component is
due to the specular reflection of direct light that is incident at that point.
(Refer Slide Time: 10:59)
So, we can model the light intensity that reaches the viewer from the surface point of interest as a sum of 3 intensities.
(Refer Slide Time: 11:23)
What are these 3 intensities? Intensity due to ambient light, intensity due to diffuse light and
intensity due to specular light. Now when I say intensity due to ambient light I mean to say
the diffuse reflection of ambient light when I say intensity due to diffuse light I mean to say
diffuse reflection of direct light and when I say specular light I mean to say specular
reflection due to direct light.
So, the intensity at the point is the sum of these three intensities, which we denote by the terms Iamb, Idiff and Ispec.
(Refer Slide Time: 12:22)
Now, those are the components; how do we get them? One assumption is that the reflected light intensity is a fraction of the incident light intensity. How do we decide on this fraction? It is determined by a surface property known as the reflection coefficient or reflectivity. Recollect that in our earlier lecture we discussed two determinants of color.
One is light source, other one is surface property. Now, we are bringing in the surface
property here. So, we are assuming that the reflected light is a fraction of the incident light
and the fraction is determined by one surface property that is the reflectivity or the reflection
coefficient.
(Refer Slide Time: 13:26)
Now, in order to control the lighting effect in our computation, we define 3 such reflection
coefficients.
(Refer Slide Time: 13:41)
One for each of the 3 types of reflection: one coefficient for diffuse reflection due to direct light, one for diffuse reflection due to ambient light and one for specular reflection due to direct light. The diffuse reflection coefficient due to ambient light is denoted by ka, the diffuse reflection coefficient due to direct light is denoted by kd, and the specular reflection coefficient due to direct light is denoted by ks.
So, we are defining these three coefficients and we are also specifying the values that these
coefficients can take. It is defined as a range.
(Refer Slide Time: 14:34)
These coefficients can take values within the range 0.0 to 1.0. Now when we are specifying
the value to be 0.0, it represents a dull surface with no reflection so everything will be
absorbed. And when you are specifying the value 1.0, it represents the shiniest surface with
full reflection that is whatever gets incident to that point will be fully reflected from that
point. So, it reflects all the incident lights.
By varying these values, we can actually control the amount of dullness or shininess of the
surface of interest.
(Refer Slide Time: 15:26)
Now, as I said, there are three components which determine the color: the ambient light component, the diffuse reflection component due to direct light and the specular reflection component due to direct light. Let us try to model these individual components one by one. We will start with the ambient light component, which is the simplest to model, and here we will make a very simplifying assumption.
(Refer Slide Time: 15:59)
Here we will assume that every surface is fully illuminated by an ambient light with intensity Ia. That is our simplifying assumption: all points are illuminated by the same ambient light intensity Ia, and we will not consider the complex optical behavior of light after it gets reflected from surrounding surfaces. Instead we assume that any point is illuminated by a single intensity Ia representing the ambient light. So, essentially we are modeling ambient light as a single light source with intensity Ia.
(Refer Slide Time: 17:03)
We have already defined the reflectivity or reflection coefficient for ambient light. Now, if the light that is incident at a point is Ia, then based on our assumption the reflected light is the incident light multiplied by the coefficient, that is, Iamb = ka Ia. That gives us the ambient light component of the color. This is our simple model for ambient light: with it we can compute the contribution of ambient light to the overall intensity, which gives us the color.
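In code this is a single multiplication. A one-line sketch (the function and argument names are my own):

    def ambient_component(k_a, I_a):
        """Diffuse reflection of ambient light: a fixed fraction k_a of the
        single ambient intensity I_a, the same for every surface point."""
        return k_a * I_a

    print(ambient_component(k_a=0.2, I_a=1.0))   # 0.2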
(Refer Slide Time: 17:59)
Then we have the second component that is diffuse reflection component due to direct light
source.
(Refer Slide Time: 18:09)
In order to model this component, we make another assumption about how the surface reflects the incident light. We assume that all the surfaces in the scene are ideal diffuse reflectors, more popularly called Lambertian reflectors. These follow Lambert's cosine law, which states that the energy reflected by a small portion of a surface from a light source in a given direction is proportional to the cosine of the angle between that direction and the surface normal. This is Lambert's cosine law. What can we infer from this law?
(Refer Slide Time: 19:15)
The law implies that the amount of incident light from a light source on a Lambertian surface is proportional to the cosine of the angle between the surface normal and the direction of the incident light. If we assume that this is the point of interest in the right-hand figure, then the angle between the surface normal and the incident light direction at this point is called the angle of incidence.
(Refer Slide Time: 20:14)
Based on this, let us assume a direct light source with intensity Is and the angle of incidence
at the point is denoted by θ.
(Refer Slide Time: 20:31)
Then we can say that the amount of light incident at that point, according to Lambert's law, is Is cos θ.
(Refer Slide Time: 20:50)
If that is the incident light, we also know that a fraction of it is reflected and reaches the viewer's eye, and that fraction is determined by the diffuse reflectivity, or diffuse reflection coefficient, for direct light, which we denote by kd. So, the amount of light that gets reflected can be modeled as kd Is cos θ, and this will be the contribution of the diffuse reflection due to direct light to the overall intensity. This is our expression for computing the diffuse reflection component of the overall intensity value.
(Refer Slide Time: 21:59)
We can represent the same expression in a different way. Let L and N denote the unit direction vector to the light source from the point and the unit surface normal vector at that point, respectively.
(Refer Slide Time: 22:35)
Then the same expression can be rewritten, because cos θ can be represented as the dot product of the two unit vectors, N.L. So, if N.L > 0, the diffuse reflection component is Idiff = kd Is (N.L), and if N.L ≤ 0, it is 0. This is another way of writing the same expression, and we will follow this representation.
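A small NumPy sketch of this term (the vector names follow the lecture's symbols; everything else, including the example vectors, is my own):

    import numpy as np

    def diffuse_component(k_d, I_s, N, L):
        """Diffuse reflection of direct light on a Lambertian surface:
        k_d * I_s * (N . L), set to 0 when the light is behind the surface."""
        n_dot_l = float(np.dot(N, L))
        return k_d * I_s * n_dot_l if n_dot_l > 0 else 0.0

    N = np.array([0.0, 0.0, 1.0])                   # unit surface normal
    L = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)    # unit vector to the light
    print(diffuse_component(k_d=0.6, I_s=1.0, N=N, L=L))   # 0.6 * cos 45 deg ~ 0.424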
(Refer Slide Time: 23:32)
So, we have modeled two components. The third and remaining component to model is the specular reflection component.
(Refer Slide Time: 23:43)
We will model this component with an empirically derived formulation, which we will see shortly. This empirically derived model was proposed by Bui Tuong Phong back in 1973, and it is also known as the Phong specular reflection model; we will be using it in our simple lighting model.
(Refer Slide Time: 24:21)
What this model tells us is that the specular reflection intensity is assumed to be proportional to the cosine of the angle between the viewing vector and the specular reflection vector, raised to a power; that is the empirically derived law, so to say. Here V is the viewing vector, R is the specular reflection vector and the angle between them is φ, as shown here.
(Refer Slide Time: 25:39)
According to this empirically derived formula, the specular reflection component is proportional to cos^ns φ, where φ lies within the range 0 to 90 degrees. The power ns is called the specular reflection exponent, and by choosing this exponent judiciously we can generate different effects: if the value is large, say greater than 100, it generates a shiny surface effect, and if the value is close to 1, it generates a rough surface effect.
(Refer Slide Time: 26:34)
As in the case of diffuse reflection, for specular reflection also we can have a vector representation of the same expression. First let us see the actual expression for the specular reflection component. As we have said, the component is a fraction of the incident light, and ks is the specular reflectivity: earlier we said the specular intensity is proportional to cos^ns φ, and the proportionality constant is ks. So the actual component is ks multiplied by that expression. Now, cos φ can be represented by the dot product of V and R, where V and R are the unit vectors along the viewing direction and the specular reflection direction. So, if V.R > 0, we have this component, ks Is (V.R)^ns, for the specular term.
And if V.R ≤ 0, then the component is 0. Also, to make the expression consistent with the previous one, we will replace R in terms of the other vectors L and N, where L is the unit vector towards the light source and N is the surface normal. If we use this expression for R, then both reflection components due to direct light, diffuse as well as specular, can be computed in terms of L and N, rather than L and N in one case and V and R in the other. So, in the case of diffuse reflection due to the direct light source we have L and N, and in the case of specular reflection due to the direct light source we have L and N as well as V, the viewing direction, with R replaced in terms of L and N. These are the 3 components that we use to compute the overall intensity of the reflected light from the point of interest.
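A sketch of the specular term (the symbols follow the lecture; the function name, example vectors and the use of the standard reflection formula R = 2(N.L)N - L to express R in terms of L and N are my own assumptions):

    import numpy as np

    def specular_component(k_s, I_s, N, L, V, n_s):
        """Phong specular reflection: k_s * I_s * (V . R)**n_s, with the
        reflection vector computed from L and N as R = 2(N.L)N - L."""
        R = 2.0 * np.dot(N, L) * N - L
        v_dot_r = float(np.dot(V, R))
        return k_s * I_s * (v_dot_r ** n_s) if v_dot_r > 0 else 0.0

    N = np.array([0.0, 0.0, 1.0])                  # unit surface normal
    L = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)   # unit vector to the light
    V = np.array([-1.0, 0.0, 1.0]) / np.sqrt(2.0)  # unit vector to the viewer

    # Viewer exactly in the mirror direction: full specular contribution 0.8.
    print(specular_component(k_s=0.8, I_s=1.0, N=N, L=L, V=V, n_s=100))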
(Refer Slide Time: 30:05)
There is another interesting issue, called intensity attenuation. In the computations we have discussed so far, we assumed that the light intensity does not change as it travels from the source to the surface point; that is, the intensity emitted at the source and the intensity incident at a point some distance away from the source are assumed to be the same.
(Refer Slide Time: 30:41)
What is the problem with that assumption? Assume there are two surface points, one closer to the source and the other slightly farther away. The intensity of the light received by either of these points will be the same, because we are not assuming any change in the incident intensity with distance, so the color computed using our simple lighting model will also be the same. Nowhere in the computation do we explicitly take into account the distance travelled by the light, so the computed colors will be identical, and as a result we will not be able to perceive the relative difference in distance between the two points.
(Refer Slide Time: 31:41)
So, all surfaces will be illuminated with equal intensities irrespective of their distance which
will lead to indistinguishable overlapping of surfaces when projected on screen. So, we will
not be able to understand the distance between them which will reduce the perception of 3D.
(Refer Slide Time: 32:11)
In order to address this issue, we incorporate intensity attenuation in our model, in the form of attenuation factors. There are two such factors: the radial attenuation factor, denoted by AFrad, and the angular attenuation factor, denoted by AFang.
(Refer Slide Time: 32:42)
The radial factor accounts for the effect of diminishing light intensity over distance, and we model it using the inverse quadratic function AFrad = 1/(a0 + a1 d + a2 d²), where a0, a1, a2 are coefficients that we can vary to produce more realistic effects and d is the distance between the source and the surface point. By using this inverse quadratic function, we can take into account the effect of distance on the intensity.
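As a small sketch of this factor (the coefficient values here are only placeholders, not values from the lecture):

    def radial_attenuation(d, a0=1.0, a1=0.1, a2=0.01):
        """Inverse quadratic attenuation of intensity with distance d
        between the light source and the surface point."""
        return 1.0 / (a0 + a1 * d + a2 * d * d)

    print(radial_attenuation(0.0))   # 1.0  (no attenuation at the source)
    print(radial_attenuation(10.0))  # ~0.33 (dimmer farther away)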
(Refer Slide Time: 33:29)
The other attenuation factor is angular attenuation. We use it primarily to generate the spotlight effect. There are many ways to do this, but one commonly used function is shown here. With this function we can take into account the angular attenuation: the farther a point is from the spotlight cone axis, the more the intensity is reduced, and that reduction with respect to the cone axis can be computed using this expression. It will be 0 if the surface point is outside the angular limit θ: if a point is outside this limit, then of course it is not influenced by the spotlight, so the component is 0; but if it is within the limit, then the attenuation is computed from the angle φ that the point makes with the cone axis.
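A hedged sketch of one such function: a commonly used choice is AFang = (cos φ)^al inside the angular limit and 0 outside it, where al is an attenuation exponent. The exact function on the slide may differ; the exponent, names and values below are my own assumptions.

    import math

    def angular_attenuation(cos_phi, cone_angle_deg, exponent=2.0):
        """Spotlight attenuation: 0 outside the angular limit, otherwise
        (cos phi)**exponent, where phi is the angle from the cone axis."""
        if cos_phi < math.cos(math.radians(cone_angle_deg)):
            return 0.0
        return cos_phi ** exponent

    print(angular_attenuation(1.0, 30.0))    # 1.0  (on the cone axis)
    print(angular_attenuation(0.9, 30.0))    # 0.81 (off-axis, dimmer)
    print(angular_attenuation(0.5, 30.0))    # 0.0  (outside the 30 degree limit)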
(Refer Slide Time: 35:22)
Taking this attenuation into account, our simple lighting model changes. Earlier we had the model as the sum of three components, Iamb, Idiff and Ispec; now, taking the attenuation factors into account, it takes the form Ip = Iamb + AFrad AFang (Idiff + Ispec). Here AFrad denotes the radial attenuation factor and AFang denotes the angular attenuation factor. If these values are set to 1, then, as you can see, we eliminate the attenuation effect, while values other than 1 include the effect. That is all about the monochromatic point light source.
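In code the combined formula is a single line; the sketch below just combines the three components and the two factors, all of which would be computed as in the earlier sketches (function and argument names are my own).

    def point_intensity(I_amb, I_diff, I_spec, af_rad=1.0, af_ang=1.0):
        """Simple lighting model for one monochromatic point source:
        I_p = I_amb + AF_rad * AF_ang * (I_diff + I_spec).
        Setting both factors to 1 switches attenuation off."""
        return I_amb + af_rad * af_ang * (I_diff + I_spec)

    # Using the component values from the earlier sketches:
    print(point_intensity(I_amb=0.2, I_diff=0.424, I_spec=0.8))          # 1.424
    print(point_intensity(0.2, 0.424, 0.8, af_rad=0.33, af_ang=0.81))    # ~0.53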
(Refer Slide Time: 36:36)
Now let us assume a colored source; what happens in that case? A monochromatic light generates only different shades of gray, so if we have to generate color images, then we need to consider colored light sources.
(Refer Slide Time: 36:54)
As we discussed in the introductory lecture, when we talk of color we assume that there are three primary colors, red, green and blue, which together give us the perception of a particular color. Accordingly, we can assume that the source light intensity is a three-element vector: it has three component intensities, one for red, one for green and one for blue. Similarly, the reflection coefficients also have components: each coefficient, ka, kd and ks, is a vector with three values, one for each color. This is the only modification we make in order to take colored light sources into account.
(Refer Slide Time: 38:02)
Then we compute each color component separately using the light model with appropriate
source intensity and coefficient values. So, for computing the component for red we use the
red source intensity as well as the reflective coefficients for red. Similarly, for green and blue.
(Refer Slide Time: 38:37)
That is the modification. Finally, let us assume that there is more than one light source.
(Refer Slide Time: 38:48)
In that case, again, a simple extension is needed: earlier we had the ambient component plus the direct-light components; now we introduce a summation, so for each source we compute the direct-light components and add them up over all the n light sources. Note that the ambient component does not change, because it is the same for all points, with a single ambient light intensity.
The change is only in the components due to direct light, namely the diffuse component and the specular reflection component. This is the overall simple lighting model for multiple sources, and if we want color, then we simply compute IpR, IpG and IpB for red, green and blue, where the ambient and reflection coefficients are chosen according to the specific color component. So, we get 3 separate values, giving us a 3-element output. That, in summary, is our overall simple lighting model.
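Putting everything together, here is a compact sketch of the general model for n colored point sources. The vector math follows the formulas above; the data layout, function name and example values are my own assumptions, and attenuation products are passed in precomputed.

    import numpy as np

    def shade_point(k_a, k_d, k_s, n_s, I_ambient, lights, N, L_list, V,
                    att_list=None):
        """Simple lighting model for n colored sources. All intensities and
        coefficients are 3-element RGB vectors; each channel uses the same
        formula. 'lights' holds source intensities, 'L_list' the unit vectors
        towards each source, 'att_list' the AF_rad * AF_ang products."""
        k_a, k_d, k_s = map(np.asarray, (k_a, k_d, k_s))
        I_p = k_a * np.asarray(I_ambient)                 # ambient term, once
        if att_list is None:
            att_list = [1.0] * len(lights)
        for I_s, L, att in zip(lights, L_list, att_list):
            I_s = np.asarray(I_s)
            n_dot_l = float(np.dot(N, L))
            if n_dot_l <= 0:
                continue                                  # light behind surface
            diff = k_d * I_s * n_dot_l                    # diffuse term
            R = 2.0 * n_dot_l * N - L                     # reflection vector
            v_dot_r = max(float(np.dot(V, R)), 0.0)
            spec = k_s * I_s * (v_dot_r ** n_s)           # specular term
            I_p = I_p + att * (diff + spec)
        return I_p                                        # (I_pR, I_pG, I_pB)

    N = np.array([0.0, 0.0, 1.0])
    V = np.array([0.0, 0.0, 1.0])
    L1 = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
    L2 = np.array([-1.0, 0.0, 1.0]) / np.sqrt(2.0)
    print(shade_point(k_a=(0.1, 0.1, 0.1), k_d=(0.6, 0.4, 0.4), k_s=(0.5,) * 3,
                      n_s=50, I_ambient=(1.0, 1.0, 1.0),
                      lights=[(1.0, 1.0, 1.0), (0.5, 0.0, 0.0)],
                      L_list=[L1, L2], V=V))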
(Refer Slide Time: 40:30)
So, to summarize, we have discussed a simple model assuming that there is one point light source. Initially we assumed a monochromatic source, then colored light sources; initially we assumed a single light source, then multiple light sources. But in all cases we assumed a point light source, that is, a dimensionless light source characterized only by position and intensity.
Another thing you should note here is the simplifying assumptions we made. To compute the ambient light we assumed that there is a single ambient light intensity, which is not true in practice. To compute the diffuse component due to direct light, we assumed Lambertian surfaces, which again need not be true in practice. And to compute the specular component we assumed an empirically derived model, Phong's specular model, which does not reflect the actual optical behavior.
But in spite of these assumptions, what we get is a working solution to our problem of computing colors, and it works in practice. Although it does not reflect the actual optical behavior, it gives us a working solution, and because of these many simplifying assumptions we call it the simple lighting model. In order to discuss this simple lighting model we left out many important topics, which are actually designed to take the actual optical behavior into account. Those, in turn, give much better realistic effects, as is expected, but at the cost of heavily increased computation. To know more about such models, you may refer to the material mentioned in the next slide.
(Refer Slide Time: 42:48)
So, with this, we conclude our discussion on the simple lighting model. As I said, to learn more you may refer to this book, chapter 4, section 4.2, for more details on the topics I have discussed today; you may also refer to the reference material mentioned in that book and chapter for more details on the more realistic lighting models, which are much more complex than the simple model. With this, I conclude today's lecture. Thank you and goodbye.
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No 15
Shading Models
Hello, and welcome to lecture number 15 in the course Computer Graphics. As usual, we will
start with a quick recap of the pipeline stages that we are currently discussing.
(Refer Slide Time: 00:43)
So, as you may recollect, there are five stages in the graphics pipeline. The first stage is Object
Representation; the second stage is Modeling Transformation. The third stage is Lighting or
assigning color to the surface points. The fourth stage is the Viewing pipeline which itself
consists of five sub-stages namely Viewing transformation, Clipping, Hidden surface removal,
Projection transformation and Window to Viewport transformation.
The fifth and final stage of the graphics pipeline is Scan conversion. I would like to emphasize here again that although in this lecture, and in this course, I will be following this sequence of stages, in practice it is not necessary to follow the exact sequence. So, when a graphics package is implemented, you may find that some stages come after other stages even though, in the sequence I have discussed, they are actually before them; for example, Hidden surface removal may come after Scan conversion although we discuss it before Scan conversion. So, this sequence is not a strict requirement; the basic concepts are what matter the most.
So far, we have completed our discussion on the first two stages, namely Object representation and Geometric or Modeling transformation. Currently, we are discussing the third stage, that is, Lighting or assigning color to the surface points. In the Lighting stage, we have introduced the basic issues that are addressed, and in the previous lecture we went through a simple Lighting model. If you may recollect, in the simple lighting model we assume that the color is essentially a composition of three constituent colors or intensities.
Intensity due to ambient light, intensity due to diffuse reflection, and intensity due to specular
reflection. And we have learned models for each of these components and how to combine those
models in the form of a summation of these three individual components.
(Refer Slide Time: 03:23)
Today, we are going to discuss Shading models, which are related to assigning colors to the surface points, but in a slightly different way. As we have seen during the simple lighting model discussion, the model itself is computation-intensive.
(Refer Slide Time: 03:57)
The calculation of color at a surface point in a 3D scene involves lots of operations. As a result, generation of the image, which includes assigning colors, is complex and expensive in terms of computing resources, such as processor and memory, and it also takes time. So, when we talk of assigning or computing colors, which is the job of the third stage, what we are referring to is essentially the utilization of the underlying computing resources. And in the Lighting model we have seen that this utilization is likely to be very high, because the computation involves lots of mathematical operations on real numbers; it is also likely to take time.
(Refer Slide Time: 05:26)
In practice, whenever we use graphics applications, we may have noticed that the screen images change frequently. For example, in computer animation, computer games or any other interactive application, the screen content changes at a very fast rate. So, the requirement is that we should be able to generate new content and render it on the screen very quickly. But if we get bogged down with lots of complex computations for assigning colors, or, as we shall see, for the other pipeline operations, then that requirement may not be fulfilled; we will not be able to generate images quickly.
(Refer Slide Time: 06:32)
That may result in visible flicker and distortion, which in turn may lead to irritation and annoyance for the user, and we certainly do not want such a situation to occur. In order to avoid it by reducing the amount of computation involved in assigning colors to surface points, we make use of Shading models.
The idea of Shading models is that we have Lighting models and can make use of them to determine the color at a given point; however, if we do that for each and every point, it is likely to be computation-intensive and time-consuming. To reduce computation, we use some tricks in the form of Shading models.
(Refer Slide Time: 07:41)
So, what do we do with a Shading model? First, we use the Lighting model to compute the colors of only a few of the points on the surface. Then, using those computed values, we perform interpolation, and through interpolation we assign colors to the other surface points that are mapped to screen pixels. So, Shading models are used when the surface points have already been mapped to screen pixels, that is, after rendering has taken place.
(Refer Slide Time: 08:38)
Now, between the Lighting model and Shading model, there are broadly two differences.
(Refer Slide Time: 08:50)
We have already mentioned that the Lighting model is very expensive because it involves a large number of floating-point operations. In contrast, Shading models are interpolation-based. That
means, we can come up with efficient incremental procedures to perform the computations rather
than going for complex floating-point operations as we shall see in our subsequent discussions.
(Refer Slide Time: 09:28)
The other major difference is that Lighting models are applied on the scene description, that is, in the 3D world coordinate system, whereas, as we have just mentioned, Shading models typically work at the pixel level, after the scene is mapped to the screen, that is, after the rendering of the fifth stage of the pipeline is performed.
So, as I said at the beginning, it is not necessary that everything works as per the sequence we have outlined. In practice, things work with a slightly modified sequence; what is important is to know the basic concepts rather than sticking to the exact sequence of pipeline stages.
(Refer Slide Time: 10:23)
So, that is the idea of the Shading model, and those are the two major differences between Lighting and Shading models. Now, let us try to understand some Shading models briefly. We will start with the simplest of them, that is, Flat Shading.
(Refer Slide Time: 10:51)
So, it involves the least amount of computation. What does it do?
(Refer Slide Time: 11:00)
In the Flat Shading model, what we do first is find out the color of any one point on a surface using the Lighting model. So, we apply the Lighting model and compute the color of a single point on the surface, and then this color is assigned to all other surface points that are mapped to the screen pixels. So, suppose this is a surface, and this is the pixel grid that I am drawing here.
Consider this scan line here; the pixels that are part of the surface are these three. What we do in the Flat Shading model is choose any arbitrary point, apply the Lighting model and compute its color, the color of that particular point in the 3D world coordinate system, because we are required to compute the vectors as well, and then we use that color for all other pixels that are part of the surface. So, suppose we have computed the color at this point, say the color is C; then we use this color to set the color values of all other surface pixels. For example, these three we set to C.
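As a rough sketch of this idea, assuming we already have the list of pixels covered by the projected surface and some lighting function (such as the illustrative one given earlier), flat shading amounts to a single lighting evaluation; the names and data structures here are hypothetical.

```python
def flat_shade(surface_pixels, lighting_fn, sample_point_args):
    """Evaluate the lighting model once and reuse the color for every pixel.

    surface_pixels    : list of (x, y) pixel coordinates covered by the surface
    lighting_fn       : function evaluating the lighting model at a 3D point
    sample_point_args : arguments describing the single chosen surface point
    """
    c = lighting_fn(*sample_point_args)              # one lighting evaluation
    return {pixel: c for pixel in surface_pixels}    # same color C for all pixels
```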
(Refer Slide Time: 13:03)
Clearly, this is a very simple scheme, and it is likely to lead to unrealistic images unless we choose the application scenario properly. So, we must say that Flat Shading works in certain situations but is not, in general, good for coloring any surface. In general, we will not be able to use this particular Shading technique, because it may result in unrealistic images. When will Flat Shading be useful? There are a few conditions. Let us see what they are.
(Refer Slide Time: 13:46)
So, in order to make this particular Shading method work, we have to assume three things. First, the surface should be polygonal. Second, all light sources should be sufficiently far from the surface, so that noticeable shading effects, that is, sets of different intensities or colors across the surface, do not arise. Third, the viewing position is also sufficiently far from the surface. It may be obvious that if we assume the light source is very far away and the viewer is also looking at the scene from a very large distance, then the minute differences between colors at neighboring regions may not be perceivable to the viewer, and accordingly whatever color we assign will look uniform. In that case, Flat Shading may work, and these three conditions restrict the use of the Flat Shading algorithm.
I repeat: in order to make Flat Shading work, three conditions should be satisfied. First, the surface must be polygonal in nature; second, all light sources should be sufficiently far from the surface; and third, the viewing position should be sufficiently far from the surface. If these three conditions are not met, then the resulting colored surface may look unrealistic.
(Refer Slide Time: 15:56)
To avoid the problems associated with Flat Shading, there is an improved Shading model called Gouraud Shading. Let us try to understand Gouraud Shading.
(Refer Slide Time: 16:18)
It gives us a more realistic coloring effect than Flat Shading but, at the same time, it involves more computation. So, the improvement comes at the expense of increased computation.
(Refer Slide Time: 16:37)
In this Shading method, we first determine the average unit normal vector at each vertex of a polygonal surface; we will soon see what we mean by the average unit normal vector. Then, using that vector, we compute a color by applying a Lighting model at each vertex of the surface. Finally, we linearly interpolate the vertex intensities over the projected area of the polygon.
So, there are three steps: in the first step, we compute the average unit normal vectors; in the second step, we compute colors at the vertex positions using those average unit normal vectors; and in the third step, we linearly interpolate the colors computed at the vertices of the surface to assign colors to the other pixels that are part of the surface.
(Refer Slide Time: 17:56)
Now, let us try to understand the steps in detail. In the first step, we compute the average unit normal vector. The need arises because a vertex may be shared by more than one surface. For example, consider this vertex here; it is shared by all four surfaces in this figure. In that case, when we try to compute the color at this vertex, which surface normal should we use?
So, there is a confusion. In order to avoid it, Gouraud Shading tells us to compute the average unit normal vector, which is essentially the average of the unit normals of the surfaces sharing the vertex. In this particular example, the vertex is shared by four surfaces, each with its own normal vector: say N1 for Surface 1, N2 for Surface 2, N3 for Surface 3 and N4 for Surface 4. We take the unit normal vectors and compute the average using the simple formula: a vector addition divided by a scalar quantity, namely the magnitude of the sum of the four unit vectors. So, at that particular shared vertex, we compute and use the average unit normal.
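A minimal sketch of this averaging, assuming the unit normals of the sharing surfaces are already available as numpy vectors:

```python
import numpy as np

def average_unit_normal(unit_normals):
    """Average unit normal at a shared vertex:
    N_avg = (N1 + ... + Nk) / |N1 + ... + Nk|."""
    s = np.sum(unit_normals, axis=0)
    return s / np.linalg.norm(s)

# Hypothetical example: a vertex shared by four surfaces
n_avg = average_unit_normal([np.array([0.0, 0.0, 1.0]),
                             np.array([0.0, 1.0, 0.0]),
                             np.array([1.0, 0.0, 0.0]),
                             np.array([0.0, 0.0, 1.0])])
```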
(Refer Slide Time: 19:43)
Then, in the second step, with the average normal we compute the color at this vertex using the simple Lighting model. As you may recollect from our discussion of the simple Lighting model, to compute the color components for diffuse and specular reflection we had to use surface normals. So, instead of the regular surface normal, we use the average unit normal to compute the color. We do this for all the vertices of the surface: we take one surface at a time and compute colors for all the vertices that define that particular surface.
(Refer Slide Time: 20:39)
In the third step, which is the final step, we use these vertex colors to linearly interpolate the
colors of the pixels that are part of the projected surface. So, we are assuming here that the
surface is already projected on screen through the final stage of rendering and we already know
the pixels that are part of the surface. Since we have computed the vertex colors in the first two
stages, we use these colors to linearly interpolate and assign colors to other pixels that are part of
the surface.
(Refer Slide Time: 21:24)
Let us try to understand in terms of one example. So, in this figure, we have shown a projected
surface defined by three Vertices, Vertex 1, Vertex 3, Vertex 2. So, if we apply Gouraud Shading
after the second step, we have already computed the colors of these three vertices by using the
Simple Lighting model as well as the average unit normal vector at these vertex locations.
Now, we are interested to assign or find out the colors of the pixels that are part of the surface,
but not vertices. For example, there are Pixels 4, 5, 6, 7 these are all part of the surface, also 8
and many more. 4, 5, 6, 7 belong to the same Scan line, 4 and 8 belong to two consecutive Scan
lines.
(Refer Slide Time: 22:47)
So, what we do is perform linear interpolation in terms of the colors already computed for the vertices. We take one scan line at a time; for example, we have taken the (i+1)-th scan line. We compute the colors at 4 and 7, which are the two edge intersection points on the scan line, that is, the points where the edges of the surface intersect the scan line. We then apply interpolation, where I1 and I2 denote the intensity or color values already computed at Vertex 1 and Vertex 2. So, for I4 we require those two values, while for I7 we require I3 and I2, where I3 is the vertex color at Vertex 3; and y4, y2 and so on are the y coordinates of the corresponding points.
So, we first compute the colors I4 and I7 on the same scan line, and then, using I4 and I7, we compute I5, which lies inside the projected surface on the same scan line. The interpolation is shown here: I5 is computed in terms of I4 and I7. Note that to compute I4 and I7 we used y coordinates, whereas to compute I5 we use the x coordinates of the corresponding pixels.
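For reference, the edge-and-span interpolation just described can be sketched as follows; the formulas follow the standard Gouraud scheme (interpolating along the edges by y, then along the span by x), so the exact symbols should be checked against the slide.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + t * (b - a)

def gouraud_scanline(I1, y1, I2, y2, I3, y3, y_scan, x4, x7, x5):
    """Intensities at the edge points 4, 7 and the interior point 5.

    Point 4 lies on edge 1-2 and point 7 on edge 3-2 of the projected
    triangle; both are interpolated by y. The interior point 5 is then
    interpolated between 4 and 7 by x along the scan line.
    """
    I4 = lerp(I2, I1, (y_scan - y2) / (y1 - y2))   # along edge 1-2
    I7 = lerp(I2, I3, (y_scan - y2) / (y3 - y2))   # along edge 3-2
    I5 = lerp(I4, I7, (x5 - x4) / (x7 - x4))       # along the scan line
    return I4, I7, I5
```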
That is for a single scan line. What happens when we want to compute the colors on subsequent scan lines in terms of previously computed colors, say the color of the point 8?
(Refer Slide Time: 25:27)
That is also possible. Actually, the equations or the formula that I have shown in the previous
slide are not what is implemented in practice. There is a more efficient implementation of
Gouraud Shading where we do not necessarily always compute the ratios and multiply it with the
color values as we have seen in the previous slide. Instead, we perform interpolation with only
addition, the multiplication and division are not required.
However, for more details on this incremental approach of interpolation, you may refer to the
reference material mentioned at the end of this lecture. We will quickly have a look at the
corresponding algorithm.
(Refer Slide Time: 26:37)
The incremental approach is encapsulated here. In these two lines, as you can see, the color can be found by simply taking the color already computed plus some predetermined constants. Similarly, in this stage also we can use simple addition to compute the color, where the addition is between the previously computed color and some pre-computed constant, as shown in line 2 here. For more explanation of this algorithm, you may refer to the material mentioned at the end. The basic idea is that this linear interpolation can be computed using only addition, rather than the multiplication and division required if we do it in the classical way. So, this is a more efficient implementation of stage three of Gouraud Shading.
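The transcript does not reproduce the algorithm on the slide, but the incremental idea can be illustrated as below: along a span the intensity changes by a constant increment, so each new pixel needs only one addition. This is an illustrative reconstruction, not the slide's exact algorithm.

```python
def shade_span_incremental(I_left, I_right, x_left, x_right):
    """Fill one scan-line span using addition-only interpolation."""
    n = x_right - x_left
    dI = (I_right - I_left) / n if n > 0 else 0.0    # constant per-pixel increment
    span = []
    I = I_left
    for x in range(x_left, x_right + 1):
        span.append((x, I))
        I += dI                                      # one addition per pixel
    return span
```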
(Refer Slide Time: 27:59)
One more thing we should note here is that this particular Shading technique, Gouraud Shading, is implemented along with a later stage of the pipeline, namely hidden surface removal, which is a sub-stage of the fourth stage; we will discuss it later. So, Gouraud Shading assigns colors, but it is typically implemented together with that later sub-stage of the pipeline.
(Refer Slide Time: 28:41)
There are problems with Gouraud Shading as well. Although it generates more realistic images compared to Flat Shading, there are two major problems. One is that it is still not good at generating a specular effect, that is, the shiny surface or the bright spots that we get to see on a surface. This is primarily because the linear interpolation results in a smooth change of color between neighboring pixels, which is not what happens with specular reflection, where there is a sudden change between neighboring pixels.
Secondly, Gouraud Shading suffers from the occurrence of Mach bands, a kind of psychological phenomenon in which we see bright bands where two blocks of solid color meet. So, if two adjacent surfaces are assigned different colors, then at their junction we may get to see some band-like artifacts; this psychological phenomenon is known as the Mach banding effect, and it may result if we apply Gouraud Shading.
(Refer Slide Time: 30:13)
There is a third Shading method, which is quite advanced and it eliminates all problems that we
have discussed so far with Flat Shading and Gouraud Shading.
(Refer Slide Time: 30:26)
But, it is heavily computation-intensive and requires huge resources as well as time. We will just
learn the basic idea and we will not go into the details. So, this Phong Shading is also known as
Normal vector interpolation rendering.
(Refer Slide Time: 30:51)
Now, in this method, we actually compute the color at each point, and we find the normal vectors in a different way. So, there is no interpolation of colors involved; interpolation is used only for finding the vectors, not for computing the colors. As expected, it takes much more time, and it does not have the advantage of the other Shading models in terms of reduced computation.
So, it gives us a very realistic image because the coloring effect is closer to reality due to the
very sophisticated approach, but for the same reason, it cannot compute colors with reduced
computations, which are the advantages of Shading models. So, it is not having the main
advantage, but it gives us more realistic images. We will not go into the details of it, it is quite
complex. And if you are interested you may refer to the reference material that will be mentioned
at the end of this lecture.
(Refer Slide Time: 32:12)
I will just mention the three steps. In the first stage, we compute the average unit normal vector, as in Gouraud Shading. In stage two, we apply the Lighting model at each vertex to compute color, and in stage three we apply interpolation, but in a different way.
(Refer Slide Time: 32:43)
What is that difference? Instead of interpolating colors, we now interpolate to determine the normal vector at each projected pixel position. Remember that normal vectors assume that we are in the 3D world coordinate system, whereas the projected pixel position assumes that we are already in the device coordinate system, which is 2D. So, we need to calculate normal vectors in order to apply the lighting model, which involves the use of normal vectors.
We do that here in Phong Shading: the interpolation is not used to compute intensity, instead it is used to determine normal vectors. Once that is done, at each projected pixel we know the normal vector through interpolation, and we compute the color using the Lighting model. So, here we compute colors using the Lighting model, not through interpolation; the only difference is that, in order to compute a color with the Lighting model, we need a normal vector, which we find through interpolation.
So, essentially, to summarize: the surface is projected, we identify the set of pixels that constitute the surface, and at each pixel location we apply the Lighting model; before that, we use interpolation to find the normal vector at that pixel location, and then we use the Lighting model. So, we use the Lighting model repeatedly, which increases the computation and time.
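A minimal sketch of this per-pixel step, under the assumption that we interpolate between two known vertex normals with a parameter t and reuse a lighting function like the earlier sketch; all names here are illustrative.

```python
import numpy as np

def interpolate_normal(n_a, n_b, t):
    """Linearly interpolate two unit normals and renormalize the result."""
    n = (1.0 - t) * n_a + t * n_b
    return n / np.linalg.norm(n)

def phong_shade_pixel(n_a, n_b, t, lighting_fn, lighting_args):
    """Phong shading at one projected pixel: interpolate the normal, then
    evaluate the full lighting model at that pixel."""
    n = interpolate_normal(n_a, n_b, t)
    return lighting_fn(N=n, **lighting_args)         # lighting model per pixel
```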
For more details, you may refer to the material mentioned at the end; we will stop here with this outline of Phong Shading. Now, let us try to understand the idea of Shading in terms of one illustrative example.
(Refer Slide Time: 35:01)
Let us consider a cubical object with the given vertices A, B, C, D, E, F, G and H. With this object, we want to create a scene of a room in which the object is treated as a shelf attached to a wall, keeping the relative positions of the corresponding vertices the same. So, the relative positions will be the same, and there is also a specification for the wall: it is parallel to the XZ plane, cutting the positive Y-axis at a distance of 1 unit. The length is reduced by half, and we also mention the corresponding vertices of the shelf with respect to the original vertices. So, after the specified transformation, this figure shows the 3D scene with the shelf attached to the wall as specified in the problem.
(Refer Slide Time: 36:39)
We also have to know its projection in order to be able to apply Shading. That is mentioned here: the shelf looks something like this, as shown, with the vertices specified, each of which corresponds to a vertex in the original scene. So, F'' corresponds to F', E'' corresponds to E', and so on. In the projected scene, we have mentioned one vertex coordinate so that the other coordinates can be derived. For example, here we have mentioned the vertex coordinate (4, 7); from it we can derive E: the x value remains the same, while y is reduced by 5, so y will be 2, and so on for the other vertices. In that way, we can derive the locations.
(Refer Slide Time: 37:57)
Now, assume that the room has a monochromatic point light source at a given location with an intensity of 2 units, and also assume there is ambient light with an intensity of 1 unit. The reflection coefficients, or reflectivities, for the three components are specified: ka for ambient light, kd for diffuse reflection due to the direct light, and ks for specular reflection due to the direct light. The specular exponent is also specified as 10, and the viewer is located at this position.
(Refer Slide Time: 38:48)
Assuming this setting let us try to compute the colors at the pixels P1, P2, and P3 assuming the
simplest of all Flat Shading. So, this is P1, this is P2 and this is P3, how we can do that?
(Refer Slide Time: 39:13)
So, we first determine the coordinates of the projected vertices which should be easy.
(Refer Slide Time: 39:35)
Then, we have to compute the color at any given point on the surface. Note that as per the
problem description light source is above the surface A’, B’, C’, D’, and on the left side of the
plane which contains the surface B’, F’, G’, C’. Thus, it will illuminate this surface, but will not
contribute anything towards the illumination of the other surface. So, this is the first observation
of the problem description.
(Refer Slide Time: 40:17)
Now, in order to compute color, we can calculate color at any point and then use the same value
throughout the surface in Flat Shading. So, let us calculate color at this vertex B’.
(Refer Slide Time: 40:37)
From the scene and the object description, we know the surface normal at B', and the unit surface normal will be this. We also know the light source, so the unit vector towards the light source can be computed in this way, and the unit vector towards the viewer, since we know the viewer location, can be computed in this way.
(Refer Slide Time: 41:17)
Then, with these values, we can get the first dot product as something like this, and also the second dot product, for the specular component, as something like this. With these values and the reflectivity coefficients, we add up the three components to get the overall color value of 0.79 units at B'.
(Refer Slide Time: 42:06)
Now, we know that P1 and P2 are both part of the same surface containing B'. Since we are using Flat Shading and we have already computed the color at B', we simply assign that color to all the surface points, which means to P1 and P2. So, the color values of P1 and P2 will be 0.79 units.
(Refer Slide Time: 42:47)
We have also noted that the light source does not contribute to the illumination of the other surface B'F'G'C'. In that case, there will be no contribution from the direct light source, so the two components due to diffuse reflection and specular reflection from the direct light will be 0, and the surface will be illuminated only by the ambient light, which is computed using the expression ka times Ia, where ka is the coefficient and Ia is the ambient intensity; from that we get this value.
So, these are the values we have computed for P1, P2 and P3 using Flat Shading. Note that we did not use the Lighting model to compute the values at P1 and P2; instead, we computed the value only at B' and used that to assign colors to P1 and P2, and we did the same for P3.
So, here we have reduced the usage of the simple Lighting model and, by that, the amount of computation required. However, as I said before, since we are using Flat Shading, the computed colors may not look realistic when rendered on the screen if the distances of the source and the viewer from the surface are not sufficiently large.
(Refer Slide Time: 44:34)
Now, here also it may be noted that we have done some informal reasoning to come to the
conclusion of the color values. But if we simply apply the algorithms, then also we will get the
same result. We do not need to actually do any informal reasoning but that you can try on your
own. We will not work that out here.
(Refer Slide Time: 45:00)
I would also like to request you to use the Gouraud Shading algorithm to perform the same computations for the three points; I leave that as an exercise for all of you. You can then compare the amount of computation as well as the final values you obtain, and from there you can get some informal idea of the effect of applying these different Shading models.
So, we have come to the end of today's lecture. To quickly recap, we learned about the idea of Shading and its difference from the Lighting model. Then we discussed the Flat Shading and Gouraud Shading models in detail, and just outlined the idea of Phong Shading. With the illustrative example, I hope you got some idea of how Shading models are applied and of their advantage over applying only the Lighting model to compute colors. With that, I would like to end today's lecture.
(Refer Slide Time: 46:38)
For more details, including the ones that are mentioned at different points of the lecture you may
like to refer to this book. Please have a look at Chapter 4, Section 4.3 for the details on all the
topics that I have covered today. Thank you and goodbye.
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No 16
Intensity Mapping
Hello and welcome to lecture number 16 in the course Computer Graphics. We are currently discussing the different pipeline stages; the pipeline describes how the rendering of a 2D image on a computer screen takes place through the process of computer graphics.
(Refer Slide Time: 00:57)
Now, as we know, there are five stages; we have already discussed the first two, namely Object representation and Modeling transformations. Currently, we are discussing Lighting, the third stage, and after this we will be left with two more stages to discuss: the fourth stage, the Viewing pipeline, and the fifth stage, Scan conversion.
(Refer Slide Time: 01:25)
In the third stage, Lighting, we deal with assigning colors to the surface points of an object. In the previous couple of lectures, we learned about the process of coloring: we learned about a simple Lighting model and also about Shading models. To recap, the Lighting model is a complex mathematical expression to compute the color at a given point, and it makes use of the various components of light that are present when we see a colored object.
These components are ambient light, diffuse reflection due to a direct light source, and specular reflection due to a direct light source. For each of these we have learned models, and these models in turn make use of vectors, namely the surface normal, the viewing vector and the vector towards the light source; all of these vectors are used to compute the components. At the end, we sum up the three component contributions to get the overall color value, which is expressed in terms of an intensity value.
Now, this Lighting model is complex and involves lots of operations, so it essentially takes time. In order to reduce the computation time, we learned about Shading models, where we do not compute color values using the Lighting model at each and every point; instead, we compute values at a very small number of points, maybe a single point on a surface, and use interpolation techniques to assign colors to the other points.
This interpolation technique is much simpler compared to the Lighting model computations. These two techniques for assigning colors were discussed in the previous lectures. One more thing remains: how we map the intensity values computed, either with the Lighting model or with the Shading models, to a bit sequence, a sequence of 0's and 1's, that the computer understands. That will be the subject matter of today's discussion, Intensity Mapping. This is the third component of assigning color to an object surface.
(Refer Slide Time: 04:25)
Now, when we talk of Intensity Mapping, what do we refer to? We refer to a mapping process. What does it map? It maps the intensity value that we have computed using the Lighting or the Shading model to a value that a computer understands, that is, a string of 0's and 1's.
(Refer Slide Time: 04:50)
If you may recollect, during the worked-out examples that we have discussed in the previous
lectures, we have seen the computation of intensity values. And those values are real numbers
typically within the range 0 to 1. Now, these values are supposed to be used to drive the
mechanism to draw pictures on the screen.
In the introductory lectures, we have touched upon the basic idea of a graphics system. There we
mentioned that through the pipeline stages we compute intensity values and these values are used
to basically drive some electromechanical arrangement which is responsible for rendering or
displaying a colored object on a computer screen.
As an example, we briefly touched upon the idea of cathode ray tube displays. As you may recollect, we said there that a CRT display consists of an electromechanical arrangement in which electron beams are generated and made to hit specific locations on the screen
representing the pixel grid. Now, this generation of electron beams is done through an
electromechanical arrangement consisting of cathodes and anodes and magnetic fields.
And this electromechanical arrangement is controlled by the values that we compute at the end
of the pipeline stages. So, our ultimate objective is to use the values, intensity values and use
them to drive the mechanism that actually is responsible for drawing colors on the screen or
drawing pictures on the screen.
(Refer Slide Time: 07:18)
As we have already mentioned, in a CRT display, this picture drawing is done by an arrangement
of electron guns, which emits electron beams, and there is a mechanism to deflect those beams to
specific regions on the screen where phosphor dots are present. And when the beam hits the
phosphor dots, the dots emit photons with particular intensity that is light intensity, which gives
us the sensation of a colored image on a screen.
Of course, CRT displays are now obsolete, and you may not be familiar with them nowadays, but there are lessons to learn from them. Towards the end of this course, we will learn about other displays where similar things happen, where we use the computed intensities to generate some effect on the screen that gives us the sensation of color, and where these computed intensity values drive the mechanism that generates those effects. We will talk about such display mechanisms at the end of the course, where we will have dedicated lectures on Graphics Hardware.
(Refer Slide Time: 09:00)
Now, the point is this: we are saying that these intensity values are supposed to drive a mechanism, some arrangement, which in turn is responsible for generating the effect of a colored image. But if the intensity values are computed as real numbers in the range 0 to 1, how do we make the computer understand those values? Computers do not understand such real numbers directly; they only understand digital values, binary strings of 0's and 1's.
The problem here is that an arbitrary intensity value cannot be represented and used for the purpose of driving the arrangement that generates the visual effect of a colored image on the screen, and we need some way to represent the corresponding intensity values in the computer. This representation depends on how we design the frame buffer.
(Refer Slide Time: 10:25)
And that is what we call the Mapping Problem. What is this problem?
(Refer Slide Time: 10:40)
Let us try to understand it in terms of an example. Suppose we have a graphics system with a frame buffer that has 8 bits for each pixel location, that is, 8 bits to store the intensity value of each pixel. Now, with 8 bits, how many colors can we represent? That is 2 to the power 8, or 256 values. It means that for each pixel location we can assign any one of 256 values as the color value. So, for that particular graphics device, we can say that any pixel can take at most 256 color values.
(Refer Slide Time: 11:37)
On the other hand, when we are computing the pixel colors, there is no such restriction; we can compute any value between 0 and 1, which is essentially an infinite range of values. Note that this computation takes place with the help of the Lighting or Shading models. So, on the one hand, we have values that can be anything, real values between 0 and 1 obtained by applying the Lighting or Shading models; and on the other hand, due to the particular hardware design, we can represent at most a restricted number of values for each pixel location, 256 values in our example. Essentially, we need to map these potentially infinite intensity values to the 256 values; this is the problem. Given the set size, that is, the number of values that can be represented in the computer, we have to map the potential range of values to that restricted set.
(Refer Slide Time: 13:03)
This is our mapping problem. We have to keep in mind that we cannot use just any arbitrary mapping, because that may lead to visible distortion; our perception is a very sensitive and complex thing. If we decide the mapping arbitrarily, then we may perceive the image differently from what ideally should have been the case. So, avoiding this distortion is another objective of our mapping.
We need to map, and we need to map in such a way that this distortion is not there. How can we achieve this objective? Let us try to understand the scheme through which we can achieve it.
(Refer Slide Time: 13:58)
So, that is the Mapping scheme.
(Refer Slide Time: 14:01)
The core idea behind the scheme is that we need to distribute the computed values among the
system supported values such that the distribution corresponds to the way our eyes perceive
intensity difference. So, this is a slightly complex idea. Let us try to understand this in terms of
some example.
(Refer Slide Time: 14:37)
Now, this core idea actually relies on our psychological behavior, that is, on how we perceive intensity differences.
(Refer Slide Time: 14:58)
Let us take one example. Suppose there are two sets of intensity values. In the first set there are two intensities, 0.1 and 0.11, so the difference between the two intensities is 10 percent. In the second set there are also two intensities, 0.5 and 0.55; again the difference is 10 percent. But, due to our psychological behavior, we will not be able to perceive the absolute difference between the intensity values; the difference will look the same, although the absolute values are different.
So, in the first case we have two absolute values whose relative difference is 10 percent, and in the second set we have two absolute values, different from the first set, but with the same 10 percent relative difference.
If we are asked to look at those two sets of values, we will not be able to perceive a difference between them, because of our psychological behavior: we do not perceive absolute differences, instead we perceive relative differences. If the relative differences are the same, then we will not perceive any difference, in spite of the absolute differences being there.
(Refer Slide Time: 16:45)
So, that is one crucial behavioral trait of ours: we cannot perceive absolute differences in intensity values, only relative differences matter. Now, if that is the case, then we can utilize this knowledge to distribute the intensity values among the device-supported intensity values. How can we do that?
(Refer Slide Time: 17:17)
It follows from this behavioral trait that if the ratio of two intensities is the same as the ratio of two other intensities, then we perceive the differences as the same. This is an implication of the psychological behavior we just described, and using it we can distribute the intensities. Let us see how.
(Refer Slide Time: 17:56)
Recall that we are given a continuous range of values between 0.0 and 1.0; this is the range of intensity values computed using the Lighting or Shading model. On the other hand, the device supports a set of discrete values, because the frame buffer is designed that way, and we are supposed to map this continuous range to that set of discrete values. The continuous range needs to be distributed over the finite set of discrete values.
We can do that without distorting the image by preserving the ratio between successive intensity values. If we preserve this ratio, then even if we approximate a computed intensity by a device-supported intensity, the resulting image will not appear distorted. This comes from the psychological trait we just discussed: our eyes are not designed to perceive absolute differences in intensities; instead, only relative differences matter.
(Refer Slide Time: 19:35)
So, based on this reasoning, we can come up with a mapping algorithm, a step by step process to
map a computed intensity to one of the device supported intensities.
(Refer Slide Time: 19:52)
Let us assume that the device supports a fixed set of discrete values for each pixel, and let us denote these values by I_0, I_1, up to I_N; counting I_0, there are N+1 discrete values in total.
(Refer Slide Time: 20:23)
Now, we can use a particular device called a photometer to determine the boundary values, that is, I_0 and I_N. It means that we know the range of intensities supported by that particular system; this is called the dynamic range, which is bounded by I_0 and I_N.
(Refer Slide Time: 21:12)
Now, the highest value, that is I_N, is usually taken to be 1.0; that is the convention used. So, the intensities range between I_0 and 1, that is, the range [I_0, 1]. This is the dynamic range, and the value of I_0 we can obtain using the photometer.
(Refer Slide Time: 21:44)
Now, we will apply the knowledge that we have just discussed: to preserve the ratio between successive intensities, we must ensure that I_1/I_0 = I_2/I_1 = ... = I_N/I_(N-1) = r, a common ratio. So, the ratio of consecutive intensity values supported by the device should be the same.
(Refer Slide Time: 22:26)
In other words, we can express all intermediate values in terms of the lowest value. So, I_1 we can write as r I_0, I_2 similarly as r^2 I_0, I_3 as r^3 I_0, and so on.
(Refer Slide Time: 22:52)
So, in general, we can say that the equation I_k = r^k I_0 holds, where I_0 is the minimum intensity and k > 0. Going along this line, we can say I_N = r^N I_0. So, this equation holds for any intensity value supported by the device. Here you can notice that the total number of intensity values supported by the device is N+1, where I_N is the maximum intensity value and I_0 is the minimum intensity value.
So, what do we need to do? As we already discussed, we have determined the minimum value using a photometer, and we assume the maximum value to be 1. Then, using this equation, we can determine the value of r by solving 1 = r^N I_0, where we know the value of I_0 and we know N from the total number of intensity values supported by the device. Using this value of r, which we compute by solving the equation, we can obtain all the intensity values from I_k = r^k I_0 for any particular k.
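As a small illustration of this step, assuming I_0 has been measured and the maximum level is taken as 1.0:

```python
def device_intensity_levels(I0, num_levels):
    """Device-supported intensities I_0 .. I_N with a constant ratio r.

    Solves 1.0 = r**N * I0 for r (with N = num_levels - 1) and returns
    (r, [I0, r*I0, r**2*I0, ..., 1.0])."""
    N = num_levels - 1
    r = (1.0 / I0) ** (1.0 / N)          # from 1 = r^N * I_0
    return r, [I0 * r ** k for k in range(num_levels)]
```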
(Refer Slide Time: 25:43)
Now, let us see what to do next. In the previous step, we computed the value of r knowing the minimum value, the maximum value and the total number of intensity values supported by the device; based on that we can compute any I_k. Now, suppose, using a Lighting model, we compute an intensity value for a pixel; let us denote this intensity value by I_P.
We maintain a table in which we keep the intensity values supported by the device, computed using the earlier equation: I_0, which we get with the photometer, then I_1, I_2, and so on up to I_N. Once we compute I_P, we look in this table for the value that comes closest to I_P, that is, the nearest value; let us call it I_k. For each value in the table we also store a bit pattern: bit pattern 0, bit pattern 1, bit pattern 2, and so on up to bit pattern N for the N+1 intensity values. So, for the k-th intensity value I_k in the table, we know the corresponding bit pattern. We then take that bit pattern and store it in the frame buffer. That is how we map a value computed using a Lighting model to a bit pattern that represents a value supported by the device.
(Refer Slide Time: 28:23)
So, in summary, what we do is the following. We determine the value of N and the minimum value I_0 using a photometer, and assume I_N, the maximum value, to be 1.0. Then, using the equation I_N = r^N I_0, we solve for the value of r. Using this value of r, we calculate the device-supported intensity values: we know I_0, then we calculate I_1 = r I_0, I_2 = r^2 I_0, and so on. For each of these computed values we keep a bit pattern; this is our table, up to the bit pattern for the maximum value. Then, for a pixel, we compute the intensity value using a Lighting model, map it to the nearest device-supported intensity value by looking at the table, use the corresponding bit pattern to represent that computed intensity value and, finally, store that pattern in the frame buffer. That is how we map a computed intensity value to a bit pattern and store it in the frame buffer location.
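Putting the pieces together, the table construction and the nearest-value lookup can be sketched as below, reusing the level-building helper above; the bit-pattern encoding (simply the binary form of the table index) and the function names are illustrative assumptions.

```python
def build_intensity_table(I0, num_levels):
    """Pairs of (device intensity, bit pattern); the bit pattern here is
    simply the binary encoding of the table index."""
    r, levels = device_intensity_levels(I0, num_levels)
    width = len(format(num_levels - 1, 'b'))
    return [(Ik, format(k, '0{}b'.format(width))) for k, Ik in enumerate(levels)]

def map_intensity(Ip, table):
    """Map a computed intensity Ip to the nearest device-supported entry."""
    k = min(range(len(table)), key=lambda i: abs(table[i][0] - Ip))
    Ik, pattern = table[k]
    return k, Ik, pattern   # index, device intensity, bit pattern for the frame buffer
```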
(Refer Slide Time: 30:18)
Let us try to understand this whole process using one example.
(Refer Slide Time: 30:25)
Suppose we have a display device that supports a minimum intensity I_0 of 0.01; this value, of course, as we mentioned earlier, is found with the photometer device. As usual, we assume the maximum intensity value supported by the device to be I_N = 1.0.
(Refer Slide Time: 30:58)
Let us assume that the device supports 8 bits for each pixel location; in other words, it has 8 bits to represent the color of a pixel. Then the total number of intensity values supported by the device for each pixel, which we denote by M = N+1 as discussed earlier, is 2^8 or 256. So, M = 256, which means N is 255, and I_0 to I_255 are the intensity values supported by the device.
(Refer Slide Time: 31:59)
So, these intensity values we can denote by I_0, I_1, I_2, up to I_255. Now, we can set up an equation based on the relationship I_N = r^N I_0; substituting the values of I_N, I_0 and N gives 1.0 = r^255 × 0.01, and we solve this equation to get the value of r.
(Refer Slide Time: 32:38)
Solving this, we get r = 1.0182, and using this value we get the other intensity values: I_1 = r I_0, that is 0.0102; I_2 = r^2 I_0, that is 0.0104; and so on. We then create a table of these values.
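As a quick check, the helper sketched earlier reproduces these numbers (up to rounding):

```python
r, levels = device_intensity_levels(0.01, 256)
print(round(r, 4))           # 1.0182
print(round(levels[1], 4))   # 0.0102  (I_1)
print(round(levels[2], 4))   # 0.0104  (I_2)
```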
(Refer Slide Time: 33:15)
In this table we also assign bit patterns. To I_0 we assign the all-zero pattern, to I_1 this bit pattern, to I_2 this bit pattern, and so on, up to this bit pattern for the last value. This is, of course, one possible mapping; the assignment of bit patterns can be arbitrary, it really does not matter, because it has nothing to do with the preservation of the ratio. What matters is the actual calculation of the intensity values, which is done based on the principle of preserving the ratios of successive intensity values, so that the resulting image is not distorted.
(Refer Slide Time: 34:09)
Now, let us assume that we have computed an intensity value using the Lighting model at a pixel location, and that value is 0.01039. So, this is our table, and we have computed this value.
(Refer Slide Time: 34:33)
So, as per the algorithm, we try to find the nearest intensity value that the device supports. In this case, that is I_2, or 0.0104, and the bit pattern corresponding to I_2 is this one. So, we store this bit pattern at the corresponding frame buffer location.
(Refer Slide Time: 34:58)
Here you may note that the final intensity value that we represent and store in the frame buffer is different from the actual value computed using the Lighting model, because of the mapping. It means that there is some error; there will always be some error. Although, with the preservation of the ratio of successive intensities, we can alleviate the problem of visual distortion in the resulting image, there are still ways to improve the selection of the intensity that represents a computed intensity.
(Refer Slide Time: 35:57)
And there are some techniques to do that, at different level, one is Gamma correction, other one
is Error correction for intensity mapping through halftoning or dithering methods. However, we
will not go into the details of these methods. The basic idea is that using these methods, we can
actually reduce the effect that arises due to the introduction of mapping errors, the difference
between computed intensity, and the intensity that we represent and store in the frame buffer. If
you are interested, you may refer to the material that will be mentioned at the end of this lecture.
(Refer Slide Time: 36:48)
So, in summary, what can we say?
(Refer Slide Time: 36:53)
In stage three, there are three broad concepts that we have covered, what are these concepts.
(Refer Slide Time: 37:09)
The first is the Lighting model. This is the basic method we follow to simulate the optical properties and behavior that give us the sensation of color. Now, the Lighting model is complex, so, in order to avoid the complexity, we take recourse to Shading models; this is the second concept that we have learned. A Shading model is essentially a way to reduce computation while assigning colors to surface points: it makes use of the Lighting model, but in a very limited way, and uses interpolation, which is less computation-intensive, to assign colors to the surface points.
The third concept that we have discussed is Intensity Mapping. With a Lighting or Shading model, we compute color as a real number within the range 0 to 1, so any value can be computed. However, a computer does not support arbitrary values; it is discrete in nature, so only a subset of all possible values is supported. For example, if we have an 8-bit frame buffer, meaning each pixel location is represented by 8 bits, we can support at most 256 intensity values for each pixel. A pixel color can be any one of these 256 values, whereas we compute color as any value between 0 and 1. So, we need to map it; this mapping is complex, and it introduces some amount of error, which may result in distortion.
However, to avoid distortion, we make use of one psychological aspect of our visual perception: we distribute the computed, or potential, intensities among the device-supported intensities in such a way that the ratio of consecutive intensities remains the same. If we do that, then the perceived distortion of the image may be avoided. In spite of that, however, we still introduce some error, which may affect the quality of the image.
Because whatever color we are computing, we are not actually assigning exactly the same color
to the final image, instead, we are mapping it to a nearest color. So, in turn it may affect the
overall quality. To reduce that, few techniques are used like Gamma correction or Error
propagation. And these techniques you can go through on your own in the reference material that
we will mention at the end. In this lecture, we will not go into the details of those techniques, as
those are not relevant for our discussion.
(Refer Slide Time: 40:44)
In the next lecture, we will discuss another important aspect of the third stage, that is, color models. Along with that, we will also learn about texture synthesis; both are part of the third stage, that is, coloring. So far we have learned three concepts, and two more we will learn in the subsequent lectures.
(Refer Slide Time: 40:44)
Whatever we have discussed today can be found in this book. You may refer to Chapter 4,
Section 4.5, to learn about the topics, and also you may find more details on the topics that we
mentioned, but we did not discuss in details, namely the Error propagation techniques and
Gamma correction techniques. That is all for today. Thank you and goodbye.
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture No. 17
Color models and texture synthesis
Hello and welcome to lecture number 17 in the course Computer Graphics. Currently we are
discussing the third stage of the graphics pipeline.
(Refer Slide Time: 0:40)
We have already covered first two stages in our earlier lectures and currently we are
discussing the third stage that is lighting or assigning colour to the surface points of objects in
a scene.
(Refer Slide Time: 0:58)
We have already covered 3 topics in this stage namely lighting, shading and intensity
mapping. Today also we will continue our discussion on the activities that we perform in the
third stage of the pipeline. So, two of the concepts that are part of third stage will be
discussed today, one is colour model, other one is the texture mapping.
(Refer Slide Time: 1:33)
And with the discussion on these two topics, we will conclude our overall discussion on the
third stage of the graphics pipeline. So, before we start our discussion on the topics I would
like you to understand the basic idea behind our perception of color, in order to do that we
need to know the psychology and physiology of vision. How do we perceive?
(Refer Slide Time: 2:11)
476
We mentioned earlier that color is a psychological phenomenon. So, it is essentially our
perception that there is a color. Now, from where this perception comes? That is due to the
physiology of our visual system or the way our eyes are made and the way they work. So,
essentially the physiology determines the psychological effect.
(Refer Slide Time: 2:54)
Let us go through the physiology of vision briefly. Look at the figure here; it shows how our eye is organized. We have the cornea on the outside of the eye, and then the other components: the pupil, iris, lens, retina, optic nerve and the central fovea. Now, when light arrives after getting reflected from a point, suppose this is the light ray; as the figure shows, the light rays incident on the eye pass through the cornea, that is this component here, the pupil, this component here, and the lens, that is this component here, and after passing through these they reach the back side, that is, the retina.
(Refer Slide Time: 4:43)
Now, during its passage through these various components, the light gets refracted by the cornea as well as the lens so that the image is focused on the retina; the lens and the cornea help to focus the image on the retina. Once the light rays fall on the retina, an image is formed and then transmitted to the brain through the optic nerve.
(Refer Slide Time: 5:34)
The amount of light entering the eye is controlled by the iris, this component, and that is done through the dilation or constriction of the pupil, this one; the iris dilates or constricts the pupil to control the amount of light entering the eye.
(Refer Slide Time: 6:12)
Now, I said that once the light rays fall on the retina, an image is formed. How is that done? The retina is composed of optic nerve fibers and photoreceptors; they help in forming the image and transmitting it to the brain.
(Refer Slide Time: 6:42)
Now, there are two types’ photoreceptors; rods and cones. Rods are more in the peripheral
region of the retina, this region whereas, Cones are mainly in a small central region of retina
called the phobia, this component. So, we have two types of photoreceptors rods and cones in
retina which receives the light and help create the image.
(Refer Slide Time: 7:25)
One more thing: more than one rod can share an optic nerve fiber, so there can be more than one rod for each optic nerve fiber in the retina. The rods help in one thing: they aid sensitivity to lower levels of light, so when the light is not very bright we still manage to see things; that is due to the presence of the rods.
(Refer Slide Time: 8:15)
On the other hand, in the case of cones, the other kind of photoreceptor, there is more or less one optic nerve fiber for each cone, unlike the case of rods, and this aids image resolution or acuity.
(Refer Slide Time: 8:39)
Now, when we get to see something with the help of cones that is called photopic vision and
when we get to see something with the help of rods that is called scotopic vision. So, there
are two types of vision, photopic and scotopic.
(Refer Slide Time: 8:58)
And when we say we are seeing a coloured image, the fact is we perceive colors only in case
of photopic vision. In scotopic vision, we do not get to see colors instead, we get to see series
of grays or different gray levels rather than different colors. So, this is very important, we
should remember that when we talk about colored images, that means we are perceiving
colors, so we are talking about photopic vision only.
(Refer Slide Time: 9:41)
Now, there is one more thing we should know that is the idea of visible light. So, when we
see a color, as we have already discussed earlier, that is due to the light. So, light coming
from some source gets reflected from the point and enters our eye and because of that we get
to see colour at that point. Now, this light is called visible light. It allows us to perceive color.
Now, what is this visible light? It refers to a spectrum of frequencies of electromagnetic
waves, which are the light waves. Now, the spectrum means it is a range. At one end of the
spectrum is the red light with the frequency mentioned here and 700-nanometre wavelength.
And at the other end of the spectrum is violet light with a frequency and wavelength
mentioned here.
So, red is the component with the lower frequency and violet the component with the higher frequency, and all frequencies in between are part of the visible light. Likewise, red is the component with the longest wavelength and violet the component with the shortest wavelength, and the in-between wavelengths are also there in the visible light.
(Refer Slide Time: 11:35)
Now, why we are calling it visible light? Because there are light waves with frequencies that
are outside this range also but we are not calling that as part of visible light. That is for one
simple reason. The frequencies that are present in the visible light spectrum are able to excite
the cones in our eye giving photopic vision or the perception of color.
So, these frequencies that are part of the visible light can excite the cones which gives the
perception of photopic vision or coloured images. That is why we are calling this as visible
light. Light waves that fall outside this spectrum do not have this property.
(Refer Slide Time: 12:47)
Now, there are three cone types present in the retina, three types of cone photoreceptors. One type is called L or R; from the name you may guess that this type of cone is most sensitive to red light. Then we have M or G, which is most sensitive to green light; green light has a wavelength of 560 nanometres. And then we have S or B; this type of cone is most sensitive to blue light, with a 430-nanometre wavelength. So, there are three cone types, each sensitive to a particular light wave with a specified frequency; we call these light waves red, green and blue.
(Refer Slide Time: 14:03)
Then how we perceive colour? So, when light comes, it contains all these frequencies.
Accordingly, all the three cone types get stimulated and then as a combined effect of
stimulation of all the three cone types, we get to perceive the colour. Now, this theory which
tells us how we perceive colour is also known as the tristimulus theory of vision because the
basis of it is the idea that there are three cone types and these three gets stimulated to give us
the perception. So, it is called the tristimulus theory of vision. We will come back to this idea
later.
So, that is in a nutshell how we perceive colour. So, we have our eye constructed in a
particular way having cone types in retina, three types of cone, these cone types gets
stimulated with the frequencies present in the visible light, and then as a result, we get to
perceive the colour.
(Refer Slide Time: 15:31)
Now, that is the basic theory of how our eyes work and how we get to perceive colour. With this knowledge, how can we actually build a realistic computer graphics system? Let us try to see how this knowledge helps us in colour generation in computer graphics.
This question brings us to a concept called metamerism, or metamers. What is that?
(Refer Slide Time: 16:10)
So, what do we want in computer graphics? We are primarily interested in synthesizing colours; we are not interested in the actual optical process that gives us the perception of colour. Our sole objective is to be able to synthesize the colour so that the resulting scene or image looks realistic.
485
(Refer Slide Time: 16:41)
We can do that with the idea of metamers and the overall theory of metamerism. How we can
do that?
(Refer Slide Time: 16:54)
Now, let us revisit what we have just discussed. When light is incident on our eye, it is composed of different frequencies of the light spectrum, including the visible light frequencies. These visible light frequencies excite the three cone types, L, M, S (or R, G, B), in different ways, and that in turn gives us the sensation of a particular colour. So, all three cone types get excited due to the presence of the corresponding frequencies; this excitation is different for different incident light, and accordingly we get to see different colours.
486
(Refer Slide Time: 17:53)
But one thing we should keep in mind: when we say that we are getting the perception of a colour, the underlying process need not be unique. In the eye it works in a particular way, because there are three cone types and these three types get excited in a particular way to give us the perception of colour, but there can be other ways to give us the perception of the same colour. In other words, the sensation of a colour C resulting from a spectrum S1 can also result from a different spectrum S2.
That means we have a light source; the light gets reflected from a point and comes to our eye. It has one spectrum, which excites the three cone types in a particular way and gives us the sensation of a colour C. That does not mean that this is the only spectrum that can give us this particular sensation. There can be another spectrum which excites the three cone types in a different way but in the end gives us the same colour perception. So, there are multiple ways to generate a colour perception; it is not unique. And this is a very important piece of knowledge that we exploit in computer graphics.
487
(Refer Slide Time: 19:30)
Now, this possibility that multiple spectra can give us the sensation of the same colour is due to the optical behaviour which we call metamerism. The different spectra that result in the sensation of the same colour are known as metamers. So, metamerism is effectively the idea that different spectra can give us the sensation of the same colour, and these different spectra are known as metamers.
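To make the idea concrete, here is a minimal Python (NumPy) sketch with made-up cone sensitivities and spectra (toy numbers only, not real colorimetric data), showing two different spectra that excite the three cone types identically and are therefore metamers.

import numpy as np

# Toy cone sensitivities sampled at four wavelengths (made-up values).
L_sens = np.array([0.2, 0.3, 0.6, 0.1])
M_sens = np.array([0.1, 0.5, 1.0, 0.2])
S_sens = np.array([0.8, 0.2, 0.4, 0.0])

def cone_response(spectrum):
    # Each cone's response is the incident spectrum weighted by that cone's sensitivity.
    return np.array([np.dot(L_sens, spectrum),
                     np.dot(M_sens, spectrum),
                     np.dot(S_sens, spectrum)])

S1 = np.array([0.5, 0.4, 0.2, 0.3])   # one spectrum
S2 = np.array([0.5, 0.8, 0.0, 0.3])   # a different spectrum, chosen so the responses match

print(cone_response(S1))   # both prints give the same L, M, S responses,
print(cone_response(S2))   # approximately [0.37 0.51 0.56] -> S1 and S2 are metamers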
(Refer Slide Time: 20:11)
So then, what does it imply? Metamerism implies that we do not need to know the exact physical process behind colour perception, because the exact physical process may involve one particular spectrum, say S1. Instead, we can come up with an artificial way to generate the same sensation using another spectrum S2 which is in our control. This S1 and S2 are metamers and give the perception of the same colour.
So, we may not be able to know exactly what the spectrum is when we perceive a particular colour, but we can always recreate that sensation by using another spectrum which is a metamer of the actual spectrum.
(Refer Slide Time: 21:15)
In other words, we can come up with a set of basic or primary colours and then combine or mix these colours in appropriate amounts to synthesize the desired colour. So, a corollary of the metamerism concept is that we need not know the exact spectrum; instead, we can always come up with a set of primary or basic colours and mix them in appropriate amounts to get the sensation of the desired colour.
489
(Refer Slide Time: 22:09)
So, the idea boils down to finding out the set of basic or primary colours. Such sets are called colour models. This brings us to the idea of colour models: ways to represent and manipulate colours. The basic idea is that we have a set of basic colours, using which we can generate different colours by mixing them in an appropriate way and in appropriate amounts. That is the idea of colour models.
So, the idea of metamers brings us to the idea of colour models, which help us to synthesize coloured images. We can create the perception of a colour using a colour model, where the mixture acts as a metamer of the actual spectrum.
(Refer Slide Time: 23:12)
490
Thus, coming back to the question we posed, that is, how we generate colours in CG: one way to do that is with the use of colour models, without bothering about how the colour sensation is actually generated in the eye, or about the actual spectrum that is responsible for giving us the perception.
(Refer Slide Time: 23:38)
Now, there are many colour models. Let us try to understand the idea of colour models in terms of the most basic of them all, namely the RGB or Red-Green-Blue colour model. Remember that we talked about three cone types, L, M and S: L gets excited mostly by the red light, M by the green light and S by the blue light present in a light spectrum. We also mentioned that incident light excites these three cone types in different ways, that is, by different amounts, which results in photopic vision and gives us the perception of colour.
491
(Refer Slide Time: 24:37)
Thus, we can think of colour as a mixture of three primary colours, red, green and blue, and we need to mix these three colours in appropriate amounts to synthesize the desired colour. This idea, that colour is a mixture of the three primary colours red, green and blue, is called the RGB colour model, which is the most basic colour model. The idea of this colour model, as you can guess, comes directly from the way our eye works: there are three cone types, and we excite them differently, by varying the light spectrum incident on the eye, to generate the perception of different colours.
The idea is illustrated in this figure. As you can see, these are the three component colours red, green and blue. When they are mixed in a particular way, for example, here blue and green are mixed along with red, we get one colour; if we mix different amounts, we get another colour, and so on. So, in the RGB model, we use the three primary colours red, green and blue and mix them to get any colour. Now, the idea is to mix them. What do we mean by mixing them?
492
(Refer Slide Time: 26:43)
Here, when we talk of mixing, we mean we add the colors. So, RGB is an additive model.
Here, any color is obtained by adding proper amounts of red, green, and blue colors.
(Refer Slide Time: 27:01)
It is important to remember that this is an additive model. And how can we visualize this model? Remember that there are three primaries; thus we can think of a colour as a point in a 3D colour space, where the three axes correspond to the three primary colours.
493
(Refer Slide Time: 27:29)
Further, if we assume normalized colour values, that is, values within the range 0 to 1, which is typically what we do when we use lighting models, we can visualize the RGB model as a 3D colour cube, as shown here. This cube is also known as the colour gamut, that is, the set of all possible colours that can be generated by the model. So, the cube shown here contains all possible colours that the RGB model can generate.
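As a small illustration, here is a minimal Python (NumPy) sketch that treats a colour as a point in the normalized RGB cube and mixes primaries additively; clamping the sum to [0, 1] is an assumption of this sketch, used only to keep the result inside the gamut.

import numpy as np

def mix_rgb(*colors):
    # Additive mixing: component-wise sum of normalized (r, g, b) triples,
    # clamped to [0, 1] so the result stays inside the RGB colour cube.
    total = np.sum(np.array(colors, dtype=float), axis=0)
    return np.clip(total, 0.0, 1.0)

red   = np.array([1.0, 0.0, 0.0])
green = np.array([0.0, 1.0, 0.0])
blue  = np.array([0.0, 0.0, 1.0])

print(mix_rgb(red, green))             # [1. 1. 0.] -> yellow
print(mix_rgb(red, green, blue))       # [1. 1. 1.] -> white
print(mix_rgb(0.5 * red, 0.5 * blue))  # [0.5 0.  0.5] -> a purple shade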
(Refer Slide Time: 28:22)
Now, we said RGB is a colour model, and an additive one. There are other colour models also, for example, the XYZ model, the CMY model and the HSV model. These models are used for different purposes and in different situations, and not all of them are additive; there are subtractive models as well.
However, in this lecture we will not go into the details of any other model. If you are interested, you may refer to the material mentioned at the end of this lecture for more details on these different colour models. Now, let us move to our other topic, that is, the synthesis of textures.
(Refer Slide Time: 29:21)
Now, earlier we talked about the lighting model to compute colour. One thing we did not explicitly mention is that, when we compute intensity values using the lighting model, the surface coloured with those intensity values appears smooth, which is definitely not realistic. In reality we get to see different types of surfaces, and the majority, in fact almost all, of them are non-smooth.
They have something else apart from a smooth distribution of colour. For example, on the surface of the wooden plank in this figure, you can see various patterns. These patterns cannot be obtained by applying the lighting model we discussed earlier. When I say the lighting model, I also mean the shading models, because shading models are essentially based on the lighting model.
So, the lighting or shading models that we have discussed earlier cannot give us the various patterns and other features that typically appear on real surfaces. In order to achieve those realistic effects, various other techniques are used.
495
(Refer Slide Time: 31:09)
Now, broadly there are three such techniques, and together they are called texture synthesis: we want to synthesize the texture that appears on the surface. The three techniques are projected texture, texture mapping and solid texturing.
(Refer Slide Time: 31:41)
Now, let us start with projected texture. The idea is very simple. When we say we have generated a coloured image, that means we have created a 2D array of pixel values, after, of course, all the pipeline stages are completed, including the fifth stage, that is, scan conversion. These pixel values are essentially values representing intensities or colours. Now, on this surface we want to generate a particular texture, a particular effect or pattern.
What can we do? We can create the texture pattern and paste it on the surface. So, two things are involved here: one is that we already have a pixel grid with values representing the coloured surface without texture; the other is that we separately create a texture pattern and paste it on the pixel grid, that is, on the colour values already computed using the lighting or shading model.
(Refer Slide Time: 33:06)
Now, this projected texture method, that is, the creation and pasting of a texture on the surface, can be done using a simple technique. We create a texture image or texture map from a synthesized or scanned image; either we artificially create an image or we scan one, and we use that as a texture map, which is a 2D array of colour values.
To differentiate it from the computed colour values we talked about earlier, this 2D array is called a texel array, and each array element is a texel. So, we have a pixel grid representing the original surface, and a texel grid representing the artificially created texture pattern, created either synthetically or by scanning an image. There is a one-to-one correspondence between the texel and pixel arrays.
Then what do we do? We replace the pixel colour with the corresponding texel value to mimic the idea that we are pasting the texture on the surface. So, the first stage is creating the texel grid, that is, the creation of the texture pattern; then we paste it by replacing the pixel values with the corresponding texel values, where there is a one-to-one correspondence between the pixel and texel grid elements.
498
(Refer Slide Time: 35:16)
Now, this pasting or replacement can be done in different ways; broadly, there are three. The first is the obvious one: simply replace the pixel value with the corresponding texel value. The second is slightly more involved: here we blend the pixel and texel values using the formula shown here. We use addition for blending, where C_pixel indicates the pixel intensity, C_texel indicates the texel intensity, and k is a constant between 0 and 1.
In the third approach we perform a logical operation, either an AND or an OR, between the pixel and texel values. Remember that we store these values as bit strings in the frame buffer, so we can perform a logical AND or OR between the two bit strings, which gives us a new bit string representing the final colour, that is, the projected texture pattern.
So, this is the projected texture method: we create the texture pattern separately and then paste it. There are three ways to paste: simply replacing the pixel value with the texel value, blending, or a logical AND/OR operation between the two bit strings representing the pixel and texel intensities. Any of the three can be used to paste.
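Here is a minimal Python (NumPy) sketch of the three paste options. The blending formula is assumed here to be the usual linear combination C = k·C_texel + (1 − k)·C_pixel, since the slide's exact formula is not reproduced in the transcript, and the pixel and texel grids are assumed to hold 8-bit integer intensities for the logical operations.

import numpy as np

def paste_replace(pixel, texel):
    # Option 1: simply replace the pixel value with the texel value.
    return texel.copy()

def paste_blend(pixel, texel, k=0.5):
    # Option 2: blend the two values; assumed form of the slide's formula:
    # C = k * C_texel + (1 - k) * C_pixel, with 0 <= k <= 1.
    return k * texel + (1.0 - k) * pixel

def paste_logical(pixel, texel, op="or"):
    # Option 3: bitwise AND / OR between the bit strings stored in the frame buffer
    # (here both grids are assumed to hold 8-bit integer intensities).
    if op == "and":
        return np.bitwise_and(pixel, texel)
    return np.bitwise_or(pixel, texel)

pixels = np.array([[100, 150], [200, 250]], dtype=np.uint8)   # lit surface colours
texels = np.array([[ 30, 180], [ 60,  90]], dtype=np.uint8)   # texture pattern

print(paste_replace(pixels, texels))
print(paste_blend(pixels.astype(float), texels.astype(float), k=0.3))
print(paste_logical(pixels, texels, op="and"))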
499
(Refer Slide Time: 37:28)
There is one more special technique used in the projected texturing method, apart from the ones we just discussed. This is called the MIPMAP technique, where MIPMAP stands for Multum In Parvo map; Multum In Parvo means many things in a small space. What is the idea?
(Refer Slide Time: 38:04)
Earlier, we talked about creating one texture pattern. In the MIPMAP technique, we create more than one: in fact, we create a number of texture maps with decreasing resolutions for the same texture image, and we store them all, as shown in this figure. So, this is one image, this is another, and another, and as you can see the size progressively reduces, although the image content remains the same. So, how do we use this?
(Refer Slide Time: 38:57)
Now, suppose we are asked to generate something like this pattern. As you can see, the region closer to the viewer has bigger patterns, and as the distance from the viewer increases, the pattern sizes become progressively smaller. In the MIPMAP technique, we store these different sizes of the same pattern and simply paste them in the appropriate regions of the image rather than creating a more complicated pattern. That is the idea of MIPMAP, and we use it to generate realistic texture effects.
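A minimal Python (NumPy) sketch of building such a MIPMAP chain follows; the 2x2 averaging used to halve the resolution at each level is an assumption of this sketch (other downsampling filters can also be used).

import numpy as np

def build_mipmaps(texture):
    # texture: 2D array whose sides are powers of two.
    # Each level halves the resolution by averaging 2x2 blocks of the level above.
    levels = [texture.astype(float)]
    current = levels[0]
    while current.shape[0] > 1 and current.shape[1] > 1:
        h, w = current.shape
        current = current.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(current)
    return levels

tex = np.arange(64, dtype=float).reshape(8, 8)   # an 8x8 toy texture
for level in build_mipmaps(tex):
    print(level.shape)    # (8, 8), (4, 4), (2, 2), (1, 1)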
(Refer Slide Time: 39:57)
501
Next is the second technique, that is, texture mapping, which is primarily used for curved surfaces. On curved surfaces it is very difficult to use the previous technique; simple pasting of the texture does not work, so we go for a more general definition of the texture map.
(Refer Slide Time: 40:22)
So, what do we do there? We assume a texture map defined in a 2D texture space whose principal axes are denoted by u and w, and an object surface represented in parametric form, usually by the symbols θ and φ. Of course, this is just one notation; there can be other notations as well.
(Refer Slide Time: 40:52)
Then, we define two mapping functions from the texture space to the object space. These are
the forms of the mapping function shown here. This is the one, and this is the other one.
502
(Refer Slide Time: 41:13)
And in the simplest case, these mapping functions take the form of linear functions, as shown here, with four constants A, B, C and D. Using these mapping functions, we map a texture value defined in the texture space to a value in the object space and then use that value to create the particular pattern.
(Refer Slide Time: 41:48)
Let us try to understand this with an example. Consider the top figure, where we have a texture pattern defined in a texture space. This pattern is to be pasted on the surface here, specifically in the middle of the overall surface. To create that effect, we need to map from this pattern to the object's surface space. What mappings should we use? We will assume here that we are going for the simplest mapping, that is, the linear mapping; let us try to estimate the mapping functions.
(Refer Slide Time: 42:36)
Now, the specification of the surface is already given here in terms of its size. Using that information, we go for a parametric representation of the target surface area, that is, the middle of the square. How do we represent it? With this set of equations, which is easy to obtain; you can try to derive it yourself.
(Refer Slide Time: 43:13)
Then, with this parametric representation, we make use of the relationships between the parameters in the two spaces with respect to the corner points. For example, the point (0, 0) in the texture space is mapped to the point (25, 25) in the object space, and so on for all the corner points listed here.
(Refer Slide Time: 43:48)
So, with this mapping, we can substitute these values into the linear mapping functions we saw earlier to get the constants, which come out as A = 50, B = 25, C = 50 and D = 25. Our mapping functions are therefore θ = 50u + 25 and φ = 50w + 25.
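The same calculation can be written as a small Python (NumPy) sketch. Only the corner (0, 0) → (25, 25) is spelled out above, so the second correspondence (1, 1) → (75, 75) used below is the one implied by the resulting constants A = 50, B = 25, C = 50, D = 25.

import numpy as np

# Corner correspondences (u, w) -> (theta, phi).
(u0, w0), (t0, p0) = (0.0, 0.0), (25.0, 25.0)   # given in the example
(u1, w1), (t1, p1) = (1.0, 1.0), (75.0, 75.0)   # implied by theta = 50u + 25, phi = 50w + 25

# theta = A*u + B  ->  two linear equations in A and B.
A, B = np.linalg.solve([[u0, 1.0], [u1, 1.0]], [t0, t1])
# phi = C*w + D
C, D = np.linalg.solve([[w0, 1.0], [w1, 1.0]], [p0, p1])
print(A, B, C, D)                    # 50.0 25.0 50.0 25.0

def texture_to_object(u, w):
    # Map a point in texture space to (theta, phi) in the object's parametric space.
    return A * u + B, C * w + D

print(texture_to_object(0.5, 0.5))   # (50.0, 50.0) -> centre of the pasted patch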
(Refer Slide Time: 44:21)
So, that is the idea of the second category of texturing. Now, there is a third category of texturing technique, solid texturing. Texture mapping is typically difficult in situations where we have complex surfaces, or where there should be some continuity of the texture between adjacent surfaces. For example, consider this wooden block: as you can see, there is continuity between the textures on its faces, and unless this continuity is maintained we will not be able to create a realistic effect. In such cases, we use solid texturing.
(Refer Slide Time: 45:11)
I will just give you the basic idea without going into the details of this technique, because it is slightly more complicated than the previous techniques. Earlier, we defined the texture in a 2D texture space; now we define it in a 3D texture space whose principal axes are usually denoted by u, v and w.
(Refer Slide Time: 45:38)
506
Then we perform some transformations to place the object in the texture space; that means any point on the object surface is transformed to a point in the texture space, and whatever colour is defined at that point is taken as the colour of the corresponding surface point.
So, here we perform a transformation from object space to texture space, and the colour already defined at that texture space point is used as the colour of the surface point. This transformation is more complicated than the mapping we saw earlier, and we will not go into further details of it.
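Just to give a flavour of the idea in code, here is a minimal Python (NumPy) sketch: a surface point is transformed into the 3D texture space and a procedural colour is looked up there. Both the simple translation-and-scale transformation and the wood-ring style texture function are made-up placeholders for illustration only, since the lecture deliberately does not go into the actual transformation.

import numpy as np

def object_to_texture(point, offset=np.array([0.0, 0.0, 0.0]), scale=1.0):
    # Placeholder transformation into (u, v, w) texture space; a real system
    # would use the more involved transformation mentioned in the lecture.
    return (np.asarray(point, dtype=float) - offset) * scale

def solid_texture_color(uvw):
    # A made-up "wood ring" style procedural texture: the colour depends on the
    # distance from the w axis, so adjacent faces of a block stay continuous.
    u, v, w = uvw
    rings = 0.5 + 0.5 * np.sin(10.0 * np.sqrt(u * u + v * v))
    return np.array([0.6, 0.4, 0.2]) * rings   # brownish shades

surface_point = [0.3, 0.1, 0.7]
print(solid_texture_color(object_to_texture(surface_point)))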
(Refer Slide Time: 46:37)
So, in summary, let us quickly recap what we have learnt so far. With this lecture, we conclude our discussion on stage 3 of the pipeline, that is, colouring. We covered three broad concepts: the lighting model to compute colour, the shading model to interpolate colours, which reduces computation, and intensity mapping to map between the computed intensity and the device-supported intensity.
(Refer Slide Time: 47:14)
Along with that, we understood the basics of colour models and how to create texture patterns. Of course, these are very basic concepts; there are advanced concepts which we did not discuss in this introductory treatment, and for more details you may refer to the material at the end. In our next lecture we will start our discussion on the fourth stage of the pipeline, that is, the viewing pipeline, which itself consists of many sub-stages.
(Refer Slide Time: 47:53)
508
Whatever we have discussed so far can be found in Chapter 5, in particular Sections 5.1, 5.2.1 and 5.3. However, if you are interested in learning about other colour models as well as more details on 3D texturing, you may go through the other sections as well. See you in the next lecture; thank you and goodbye.
509
Computer Graphics
Professor Dr. Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 18
View transformation
Hello, and welcome to lecture number 18 in the course Computer Graphics. We are currently
discussing the 3D Graphics pipeline. That is the set of stages that converts an object
description to a 2D image on a computer screen. What are the stages? Let us quickly recap.
(Refer Slide Time: 00:51)
There are five stages, as shown here: object representation, modelling transformation, lighting, viewing pipeline and scan conversion. It may also be recalled that each of these stages works in a specific coordinate system. For example, object representation works in the local or object coordinate system; modelling transformation works between the local and world coordinate systems, being basically a transformation from the local coordinate system to the world coordinate system.
Then lighting, or assigning colours to objects, happens in the world coordinate system. The viewing pipeline, the fourth stage, actually consists of five sub-stages, and the coordinate systems in which they work vary. For example, the first sub-stage, viewing transformation, is a transformation from the world to a view coordinate system; clipping, the second sub-stage, works in the view coordinate system.
Hidden surface removal, the third sub-stage, works in the view coordinate system; projection transformation is again a transformation, from the 3D view coordinate system to the 2D view coordinate system; and window-to-viewport transformation, the fifth sub-stage of the fourth stage, takes the object description from the view coordinate system to the device coordinate system. The last stage, scan conversion, is essentially a transformation from device to screen or pixel coordinates.
(Refer Slide Time: 03:03)
So far, among all these stages and sub-stages, we have covered the first three stages in our previous lectures: object representation, geometric transformation, and lighting, or assigning colours to the objects.
(Refer Slide Time: 03:20)
Today, we will start our discussion on the fourth stage that is the viewing pipeline. Now let
us start with some basic idea of what we mean by the viewing pipeline.
511
(Refer Slide Time: 03:39)
Let us start with a background knowledge.
(Refer Slide Time: 03:44)
So, with whatever we have discussed in our previous lectures up to this point, we learnt how to synthesize a realistic 3D scene in the world coordinate system. We started with the object definition or object representation stage; then, in the second stage, modelling transformation, we put the objects together to construct a world coordinate scene.
Then, in the third stage, we assigned colours in the world coordinate description of the scene to make it look like a 3D scene. So, the knowledge we have gained so far is good enough to explain how we can create a realistic 3D scene. But that is not the end of it: we need to display this scene on a 2D computer screen. That means, essentially, we need a projection of the scene from the 3D description onto a 2D screen, and this projection need not be of the whole scene. It can be a part of the scene also, which is in fact the most common case; we usually talk about a portion of the overall 3D description being projected on the screen.
(Refer Slide Time: 05:33)
Now, this process of projection is actually similar to taking a photograph. When we take a photo, the photo is basically a projected image of a portion of the 3D world that we live in, and this projected image forms on the photographic plate or camera screen. So, when we talk of displaying a 2D image on a computer screen, essentially we start with a 3D description of the scene and then simulate the process of taking a photograph.
(Refer Slide Time: 06:12)
513
Now, this process of taking a photograph is simulated in computer graphics with a set of stages, and these stages together constitute the fourth stage of the graphics pipeline, that is, the viewing pipeline. What are those stages?
(Refer Slide Time: 06:32)
The very first stage is to transform the 3D world coordinate scene to a 3D view coordinate system or reference frame. This 3D view coordinate system is also known as the eye coordinate system or the camera coordinate system, and the process of transforming from 3D world coordinates to 3D view coordinates is generally called the 3D viewing transformation. So, this is the first stage in the 3D viewing pipeline.
(Refer Slide Time: 07:23)
514
Then we project the transformed scene onto the view plane. So, this is the projection
transformation, so first comes 3D view transformation, then comes projection transformation.
(Refer Slide Time: 07:38)
Now, after projection on the view plane, we perform another transformation. The projection is done within a region on the view plane; that area where the image is projected is called the window.
From this window, we make a further transformation: the object descriptions are mapped onto a viewport, which is defined in the device coordinate system. So, we perform window-to-viewport mapping. This is the third stage of the viewing pipeline that we are constructing within the fourth stage of the graphics pipeline.
Now, these three are the basic stages in the 3D viewing pipeline. Along with that there are a
couple of operations, namely clipping, and hidden surface removal, which together constitute
the entire fourth stage of the 3D graphics pipeline, namely the viewing pipeline. And we will
discuss each of these sub-stages one by one.
515
(Refer Slide Time: 09:22)
So, let us start with the first substage that is 3D viewing transformation.
(Refer Slide Time: 09:27)
Now, in order to understand this transformation, we need to understand how a photograph is taken. There are broadly three stages through which we capture a photo. First, we point the camera in a particular direction with a specific orientation so that we can capture the desired part of the scene; then we set the focus; and finally we click the picture. These are broadly the stages we follow when we capture a photo.
516
(Refer Slide Time: 10:21)
Now, the most important of these is focusing. With focusing, we get to know, or at least can estimate, the quality and the coverage of the picture we are taking. So, focusing constitutes the most important component of the overall process of capturing a photo.
(Refer Slide Time: 10:49)
Now, in order to set the focus, what we do? We essentially look at the scene through a
viewing mechanism that is provided in the camera. Typically while we try to set focus, we do
not look at the scene with our bare eyes. We look at the scene through the viewing
mechanism provided in the camera itself. And accordingly, we set our focus.
517
(Refer Slide Time: 11:19)
Now, this is important. So, we are not looking at the scene with our bare eyes to set focus.
Instead, we are setting focus based on our perception of the scene obtained by looking
through a viewing mechanism provided in the camera. So, we are looking at the scene
through the camera instead of looking at the scene directly. So, this is very important
consideration.
(Refer Slide Time: 11:51)
If we are looking at the scene directly, that means we are looking at the scene in its original
coordinate system. That is what we are calling the world coordinate system. So, when we are
looking at the scene directly, we are looking at it in its world coordinate reference frame,
world coordinate system.
518
(Refer Slide Time: 12:13)
However, if we are looking at the scene through the viewing mechanism of the camera, then we are not looking at the scene in its world coordinate system. Instead, we are looking at a different, changed scene; it is important to note that the change takes place due to the arrangement of lenses in the camera, which is there so that we can estimate the quality and coverage of the photo to be taken.
So, here we are not looking at the world coordinate scene; we are looking at a scene that is changed from its world coordinate description due to the arrangement provided in the camera's viewing mechanism.
(Refer Slide Time: 13:14)
519
So, then what happens? When we are taking a photograph with a camera, we are actually
changing or transforming the 3D world coordinate scene to a description in another
coordinate system. This is the most fundamental concept to be noted to understand how
computer graphics simulates the process of taking a photograph.
So, when we are looking at a scene to set focus through the viewing mechanism provided in a
camera, we are actually transforming the world coordinate scene to a different coordinate
system. And this coordinate system is characterized by the camera parameters, namely, the
position and orientation of the camera; this needs to be carefully noted.
(Refer Slide Time: 14:18)
So, this new coordinate system, we generally call it view coordinate system, and the
transformation between world coordinate system and view coordinate system is the viewing
transformation, which is the first sub-stage of the viewing pipeline. So, essentially we are
trying to simulate the photo-taking process, and the first stage in it is to transform the world
coordinate description of a scene to the view coordinate system, which simulates the process
of looking at the scene through a viewing mechanism provided in the camera.
520
(Refer Slide Time: 15:11)
So, to simulate this viewing transformation or to implement the viewing transformation we
need to do two things. First, we need to define the coordinate system, and second is we
perform the transformation. So, first we define the coordinate system, and second, we
perform the transformation to simulate this effect of looking at the scene through the camera.
(Refer Slide Time: 15:44)
521
Now, let us go one by one. First, we try to understand how we set up the viewing coordinate system. This figure shows the basic setting that we will be considering. On the left side is the actual world coordinate scene; this cylinder is the object in a world coordinate description defined by the three principal axes X, Y and Z.
Then, this is the camera through which we are looking at the scene, and the view coordinate system is characterized by the three principal axes Xview, Yview and Zview. However, the more common notation used in graphics for these three principal axes of the view coordinate system is u, v and n, rather than x, y and z.
So, in the subsequent discussion, we will refer to this view coordinate system in terms of this letter notation, that is, in terms of the principal axes u, v and n. The question is, how do we determine these principal axes, which define the viewing coordinate system? You may note here that n corresponds to z, v corresponds to y, and u corresponds to x.
522
(Refer Slide Time: 17:31)
So, let us try to understand how we can determine the three principal axes to define the view
coordinate system. So, the first thing is to determine the origin, origin of the view coordinate
system where the three axes meet. Now, this is simple; we assume that the camera is
represented as a point and the camera position is the origin denoted by o. So, the position of
the camera is the origin where we are assuming that the camera actually is defined as a point,
a dimensionless entity.
(Refer Slide Time: 18:20)
Now, when we try to focus the camera, we choose a point in the world coordinate system, as we have already mentioned, and we call this the centre of interest or look-at point, as shown here. So, this is our camera position, and this one is the look-at point; you may note that, as of now, both are defined in the world coordinate scene. With these points we can define vectors: this will be the origin vector o, and this will be the look-at point vector p.
(Refer Slide Time: 19:10)
Then, using vector algebra, we can define the vector n, which as you can see is the normal to the view plane. We define n = o − p, where o and p are the vectors just defined; that is simple vector algebra. We then normalize n to get the unit basis vector n̂ = n / |n|. So, we have obtained one basis vector, n̂.
(Refer Slide Time: 20:00)
Next, we specify an arbitrary point, let us denote it by p_up, along the direction of our head while looking through the camera. This we call the view-up point, along the view-up direction. So, the view-up direction is essentially the direction along which our head is oriented while we are looking at the scene through the camera.
(Refer Slide Time: 20:37)
Now, with this point we determine the view-up vector V_up as the difference of the two vectors, that is, V_up = p_up − o, as you can see from the figure here. This is again simple vector algebra. Once we have V_up, we get the unit basis vector v̂ in the same way, by dividing the vector by its length, which is a scalar quantity: v̂ = V_up / |V_up|.
(Refer Slide Time: 21:17)
So, we now have two basis vectors, n̂ and v̂. If we look at the figure, we can see that the remaining vector u is perpendicular to the plane spanned by n and v; if this is the plane, then u is perpendicular to it. We can simply use vector algebra again to define u as the cross product û = v̂ × n̂. Since both n̂ and v̂ are unit vectors, further normalization is not required, and we get the unit basis vector û directly from this cross product.
(Refer Slide Time: 22:24)
So, in summary, what have we done? We assume that three things are given: the camera position, that is, the coordinates from which we can define the origin vector o; the view-up point, from which we can define the corresponding view-up vector; and finally the look-at point p and the corresponding vector.
Based on this information, we perform a three-step process. First, we determine the unit basis vector n̂ using simple vector algebra; then we determine v̂, again using simple vector algebra; and finally we determine û as the cross product of the two vectors defined in the first two steps. Following these steps, we get the three basis vectors that define our viewing coordinate system.
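As a quick illustration, here is a minimal Python (NumPy) sketch of this three-step construction. It follows the formulas as stated above and assumes, as the lecture does, that the view-up direction is perpendicular to n so that no re-orthogonalization is needed; the function and argument names are placeholders for this sketch.

import numpy as np

def view_basis(camera_pos, look_at, up_point):
    # o: camera position (view coordinate origin), p: look-at point,
    # p_up: a point along the view-up direction, all in world coordinates.
    o = np.asarray(camera_pos, dtype=float)
    p = np.asarray(look_at, dtype=float)
    p_up = np.asarray(up_point, dtype=float)

    n = o - p
    n_hat = n / np.linalg.norm(n)          # step 1: n = o - p, normalized

    v_up = p_up - o
    v_hat = v_up / np.linalg.norm(v_up)    # step 2: view-up vector, normalized
                                           # (assumed perpendicular to n, as in the lecture)
    u_hat = np.cross(v_hat, n_hat)         # step 3: u = v x n; the cross-product order
                                           # fixes the handedness convention
    return u_hat, v_hat, n_hat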
Now, once the coordinate system is defined, our next task that is the second part of the
process is to transform the object definition from the world coordinate system to the view
coordinate system. Let us see how we can do that. So, in order to transform, we need to
perform some operations.
526
(Refer Slide Time: 24:07)
To get an idea, let us have a look at the figure here. So, suppose this is an arbitrary point in
the world coordinates scene, and we want to transform it to the view coordinate system
defined by three vectors n, u, and v.
(Refer Slide Time: 24:33)
Now, let us assume that the origin of the view coordinate system is the point with the coordinates shown here, and that the three basis vectors are represented as shown here in terms of their X, Y and Z components. We will follow this representation to formulate our mechanism for transforming any arbitrary point in the world coordinate scene to a point in the view coordinate system.
527
(Refer Slide Time: 25:12)
So, what do we need? We need a transformation matrix M; if you recollect, in the modelling transformation stage we said that any transformation can be represented in the form of a matrix. So, we need to find the matrix that will transform a given point to a point in the view coordinate system.
And how do we do that? Again, if you recollect our discussion from the lectures on modelling transformation, we multiply the point with the transformation matrix to get the transformed point. So, the transformed point is obtained by multiplying the original point with the transformation matrix.
(Refer Slide Time: 26:03)
528
And this transformation is actually a sequence of transformations required to align the view coordinate system with the world coordinate system. In the most general setting they are not aligned, as in the figure shown here; there is a slight difference in orientation between the two coordinate systems. So, we align them and then perform the transformation.
In order to do that, we require two basic transformation operations: translation and rotation. The idea is simple: we translate the view coordinate origin to the world coordinate origin and then rotate the system to align it with the world coordinate system.
(Refer Slide Time: 27:04)
So, this translation and rotation will constitute the sequence of operations we need to
transform between the two coordinate systems.
(Refer Slide Time: 27:19)
529
Now, the first thing is that we translate the view coordinate origin to the world coordinate origin. This is the translation matrix, which is the same as the one we discussed earlier, with the corresponding X, Y, Z values substituted here.
(Refer Slide Time: 27:38)
Next is the rotation. The rotation matrix is shown here; we will skip the derivation, but for the time being let us just note the matrix. If applied, this matrix rotates the viewing coordinate system to align it with the world coordinate system.
(Refer Slide Time: 28:09)
And since we perform the translation first and then the rotation, we follow the right-to-left rule to combine the two transformations into a composite transformation matrix. Thus, we write T first and then R on its left, and take the product of the two matrices to get the composite matrix M = R·T. We then multiply this matrix with the point to get the transformed point coordinates.
(Refer Slide Time: 28:51)
Let us try to understand this process with an example. Consider this setting: there is a square object defined by its vertices A, B, C, D; we have a camera located at the point (1, 2, 2); and the look-at point is the centre of the square object, that is, (2, 2, 2). It is also specified that the up direction is parallel to the positive Z direction.
Given this specification, let us try to calculate the coordinates of the centre of the object after transformation to the view coordinate system. Originally, in the world coordinate system, it is (2, 2, 2). What will the coordinates be after the transformation? Let us follow the steps we have just discussed.
(Refer Slide Time: 30:07)
The first thing is we determine the 3 unit basis vectors for the viewing coordinate system.
(Refer Slide Time: 30:20)
Now, the camera position o is (1, 2, 2), as you can see here, and the look-at point p is the centre of the object, that is, (2, 2, 2). We can then calculate the vector n = o − p, which is (−1, 0, 0). It is already a unit vector, so no further operation is needed; we have the unit basis vector n̂.
532
(Refer Slide Time: 31:00)
Now, it is also mentioned that the up direction is parallel to the positive Z direction. Therefore, we can directly determine that the unit basis vector along the up direction, v̂, is simply the unit basis vector along the Z direction, that is, (0, 0, 1); we do not need any further calculation.
As you can see, this is another way of specifying the up vector: you give the direction in terms of the available basis vectors, or in terms of a line, rather than specifying a point. So, there are different ways of specifying the up direction. Anyway, we have now found two basis vectors, n̂ and v̂.
(Refer Slide Time: 32:05)
533
Then we take the cross product of these two basis vectors to get the third basis vector û, which is (0, 1, 0).
(Refer Slide Time: 32:16)
So, then we found the view coordinate system as defined by the three-unit basis vectors n, u,
and v. Next, we compute the transformation matrix M which is the composition of the
translation and rotation matrices.
(Refer Slide Time: 32:45)
Now, we have already noted that the translation matrix is represented in this way, using the coordinate position of the view coordinate origin; since that is given to be (1, 2, 2), we substitute it here to get the translation matrix in this form. Again, we know the rotation matrix in terms of the vectors that define the coordinate system, and we have already determined these vectors: n̂ is (−1, 0, 0), û is (0, 1, 0) and v̂ is (0, 0, 1). We substitute these values, this row for u, this row for v and this row for n, to get the rotation matrix R.
(Refer Slide Time: 34:14)
So, then we multiply these, R·T, to get the transformation matrix M shown here.
(Refer Slide Time: 34:29)
Now that we have determined M, we multiply M with the original coordinates to get the transformed coordinates; note that this is done in the homogeneous coordinate system, but with a homogeneous factor of 1, so no further change is needed. After the multiplication, we get the coordinates of the transformed point as (0, 0, −1) in the view coordinate system. That is our transformed point in the view coordinate system.
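For completeness, here is a minimal Python (NumPy) sketch of this worked example, composing the translation and rotation into M = R·T and applying it to the homogeneous point (2, 2, 2, 1); the basis vectors are taken exactly as computed above.

import numpy as np

# Basis vectors of the view coordinate system from the example above.
u = np.array([ 0.0, 1.0, 0.0])
v = np.array([ 0.0, 0.0, 1.0])
n = np.array([-1.0, 0.0, 0.0])
o = np.array([ 1.0, 2.0, 2.0])   # camera position = view coordinate origin

# Translation that moves the view coordinate origin to the world origin.
T = np.identity(4)
T[:3, 3] = -o

# Rotation whose rows are the basis vectors u, v, n.
R = np.identity(4)
R[0, :3], R[1, :3], R[2, :3] = u, v, n

M = R @ T                        # right-to-left: translate first, then rotate

p_world = np.array([2.0, 2.0, 2.0, 1.0])   # centre of the square, homogeneous
print(M @ p_world)               # [ 0.  0. -1.  1.] -> (0, 0, -1) in view coordinates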
So, in summary, what did we discuss today? We are discussing the fourth stage, that is, the viewing pipeline, which essentially simulates the process of capturing a photo. This process consists of multiple stages; broadly, there are three. The first is a transformation from the world coordinate description of an object to a view coordinate system; the second is the projection from the view coordinate system onto a view plane; and the third is a transformation from the view plane to the device coordinate system.
Among them, we discussed the first major stage, that is, the transformation from the world coordinate description to a view coordinate description. There we saw how we can define a view coordinate system in terms of its three principal axes u, v and n, and how to determine these three axes given the camera position, the view-up vector and the look-at point.
Once these three are given, we can define the three principal axes of the view coordinate system, which in turn gives us the system itself. Then, once the system is defined, we determine a transformation matrix, a composition of a translation and a rotation, to transform a point from the world coordinate system to the view coordinate system.
We achieve this by multiplying the world coordinate point with the transformation matrix to
get the transformed point in the view coordinate system. We may note here that here also we
are assuming a homogeneous coordinate system. However, the homogeneous factor is still 1,
so we do not need to do any further change in the computed coordinate value.
536
(Refer Slide Time: 37:35)
In the next lecture, we will talk about the second major stage in this viewing pipeline,
namely, the projection transformation.
(Refer Slide Time: 37:47)
Whatever we have discussed today can be found in this book. And you are advised to refer to
chapter 6, section 6.1, to learn in more detail the topics that we have covered today. Thank
you, and goodbye.
537
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 19
Projection Transformation
Hello and welcome to lecture number 19 in the course Computer Graphics. We are continuing our discussion on the 3-D Graphics Pipeline. To recollect, the graphics pipeline is the set of stages that are used to convert a 3-D scene description to a 2-D image on the computer screen. There are five stages in the pipeline; what are those five stages?
(Refer Slide Time: 1:00)
We have object representation as the first stage, modeling transformation as the second stage, lighting or assigning colors to objects as the third stage, viewing pipeline as the fourth stage, and scan conversion as the fifth stage.
Among them, we have already discussed the first three stages: object representation, modeling transformation, and lighting. Currently, we are discussing the fourth stage, that is, the viewing pipeline.
538
(Refer Slide Time: 1:42)
Now, as you may recollect, the fourth stage, that is, the viewing pipeline, consists of a set of sub-stages. What are those sub-stages? The first sub-stage is a transformation from the 3-D world coordinate scene to a view coordinate description.
This view coordinate system is also known as the eye or camera coordinate system, and we have already discussed this transformation, called the 3-D viewing transformation, in the previous lecture.
539
(Refer Slide Time: 2:32)
Next comes projection: after the viewing transformation, we project the transformed scene onto the view plane. This is the projection transformation, and it is what we are going to discuss today.
(Refer Slide Time: 2:58)
There is one more sub-stage, the third one, in which we perform another transformation: from the view plane where the scene is projected, we transform the description to the device coordinate system, into a region called the viewport. This is called window-to-viewport mapping, where the window refers to a region on the view plane. That will be the subject matter of our next lecture; today we are going to discuss the second sub-stage, that is, projection.
(Refer Slide Time: 3:47)
Let us try to understand the basic idea behind projection before we discuss the projection
transformation.
(Refer Slide Time: 3:58)
So, why do we require projection? We all know that when we see an image on a screen, it is a 2-D image; the image is displayed on a 2-D computer screen.
541
(Refer Slide Time: 4:14)
However, when we discussed transforming the world coordinate scene to a view coordinate scene, that was still a 3-D description; the scene transformed to the view coordinate system was still a three-dimensional scene.
(Refer Slide Time: 4:43)
Then what is required? We need a way to transform a 3-D scene into a 2-D image, and this technique of transforming a 3-D description into a 2-D image description is called projection.
So, the idea is simple: what we see on the screen is 2-D, whereas our definitions as well as representations are in 3-D, so we require some way to go from the 3-D description to a 2-D description, and that is projection.
(Refer Slide Time: 5:22)
In general, projection transforms objects from n dimensions to (n − 1) dimensions; it reduces the dimension by 1. Of course, we will not go into the general treatment of projection; instead, we will restrict our discussion to projection from 3-D to 2-D, which serves the purpose of this course.
(Refer Slide Time: 5:51)
543
So, let us try to understand the basic setting. We have the world coordinate system in which the scene is described. We are looking at the scene through the viewing mechanism provided in the camera, and then we take a snapshot. This is the look-at point, and the film or screen on which the snapshot is taken is called the view plane; this is the camera position, this is the view-up point, and along with it the view-up vector; all these concepts we already discussed in the previous lecture.
The view coordinate system is defined by the three principal axes n, u and v. So, essentially, what are we doing? We are projecting the 3-D objects onto the 2-D view plane.
(Refer Slide Time: 7:24)
But the entire view plane is not utilized; we define an area on this plane that contains the projected objects, and this area is typically called the clipping window.
So, in graphics, we will assume that whatever we want to project is projected onto a particular region of the view plane called the clipping window; later on, we will see why it is called a clipping window.
544
(Refer Slide Time: 7:57)
There is a third component also: we define a 3-D volume, or region of space, in the scene. So, there is a scene, and within that scene we define a 3-D volume, called the view volume. This view volume is our way of specifying which objects we need to project onto the clipping window. Whichever objects lie inside the view volume are projected onto the view plane, more specifically onto the clipping window, and the objects that are outside are discarded.
This discarding takes place through a process called clipping, which we will discuss in subsequent lectures. So, essentially, we have a view plane; within it we define a clipping window; and in the 3-D scene description we define a view volume. Whatever objects are inside the view volume are projected onto the clipping window on the view plane, and whatever lies outside the volume is discarded through the process called clipping.
545
(Refer Slide Time: 9:27)
So, the point to be noted here is that the entire scene is not projected; only the portion enclosed by the view volume is projected. In this way, by controlling the view volume, we can control the amount of the scene that we want to display on the screen. This gives us some flexibility in synthesizing the images: by increasing the volume we can show a larger part of the scene, and by reducing the volume we can show a smaller part.
(Refer Slide Time: 10:12)
It may also be noted that larger and smaller here refer to the amount of the 3-D scene description that we want to display. By a larger image, I mean that we can show a larger amount of the scene on the screen if we increase the view volume size; similarly, if we reduce the view volume size, we can show a smaller region of the scene on the display. This brings us to the question of how to choose an appropriate view volume so that we can control the content of the image, which requires some understanding of the different projection types.
(Refer Slide Time: 11:23)
So, let us try to understand the different projection types. In order to do that, we have to understand the basic idea of projection at another level. What do we want? We want to project a 3-D object onto a 2-D view plane; so, from each surface point of the objects present in the 3-D scene, we generate straight lines towards the view plane in order to create the projected image.
These straight lines are known as projectors, and they intersect the view plane; when we put together these intersection points, we get the projected image. So, essentially, how do we generate the projection? We generate projectors, straight lines originating from the surface points in the original scene; these lines go towards the view plane and intersect it, and the intersection points taken together give us the projected image.
547
(Refer Slide Time: 12:49)
Now, depending on the nature of the projectors, we can have two broad types of projection: parallel projection and perspective projection. There are many sub-types also, but here we will not go into the finer details of the different types and sub-types; instead, we will restrict our discussion to these two broad types.
(Refer Slide Time: 13:20)
So, what happens in the case of perspective projection? The projectors are not parallel to each other; they converge to a center of projection.
Here, as you can see, this is the object and this is the view plane. From the object we create projectors, like here, here and here. These projectors are not parallel to each other; they are directed towards a common center of projection where they meet, and in the process they intersect, or pass through, the view plane. The points of intersection, like here, here and here, taken together give us the projected image.
So, this object is projected in this way, and this type of projection, where the projectors are not parallel to each other but meet at a center of projection, is called perspective projection.
(Refer Slide Time: 14:31)
Now, there is another type of projection, that is, parallel projection. In this case the projectors are parallel to each other; they do not meet at a common point, or, as it is typically put, they meet at infinity, which in simple terms means they do not meet. When these projectors intersect the view plane, they generate a set of intersection points, and these points taken together give us the projected image, as shown in this figure. Here, as you can see, this entire thing is the object, and we get this projection on the view plane.
549
(Refer Slide Time: 15:39)
Now, in the case of perspective projection, certain things happen that we should be aware of; in fact, it is because of these things that we get the perception of reality. Let us see what they are; together, they are known as anomalies.
These anomalies happen because the projectors converge at a point, and they indicate that the appearance of the object, in terms of shape and size, gets changed in perspective projection.
(Refer Slide Time: 16:06)
550
Now, one anomaly is perspective foreshortening: if two objects of the same size are placed at different distances from the view plane, the more distant object appears smaller after projection.
So, if this is the view plane, this is one object and this is another object; as you can see, the projected size of object AB is smaller than that of object CD, although they are actually of the same size. This happens because the projectors meet at a common center of projection, and, as you can see from here, because of this we get the sense of reality that distant objects appear smaller.
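To see foreshortening numerically, here is a tiny Python sketch. It assumes the standard perspective relation in which a projected coordinate scales as d/z (the actual projection equations are derived later in the course), so the numbers below are only illustrative.

def perspective_project_x(x, z, d):
    # Assumed standard perspective relation: the projected x coordinate shrinks
    # in proportion to the distance z from the center of projection, with the
    # view plane placed at distance d.
    return x * d / z

# Two objects of the same width (2 units), one at z = 10 and one at z = 40.
near_width = perspective_project_x(2.0, 10.0, 5.0)   # 1.0
far_width  = perspective_project_x(2.0, 40.0, 5.0)   # 0.25
print(near_width, far_width)   # the more distant object projects smaller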
(Refer Slide Time: 17:09)
Another anomaly is called vanishing points: lines that are not parallel to the view plane appear to meet at some point on the view plane after projection. The point where such lines appear to meet is called a vanishing point. For example, if we have an object like this, as you can see this side and this side appear to meet at these points; these are vanishing points. The object shape is projected accordingly, and this again gives us a sense of 3-D reality, due to the occurrence of the vanishing points.
551
(Refer Slide Time: 18:15)
There is another anomaly called view confusion. If the view plane is behind the center of projection, that is, the point where the projectors meet, then objects that are in front of the center of projection appear upside down on the view plane after projection. This is simple to understand: this is the center of projection and the view plane is behind it, so this object appears upside down as shown here; this point is projected to this point, and this point is projected to that point. This is view confusion.
So, we have perspective foreshortening, where distant objects appear smaller, then vanishing points, and then view confusion. Together, these anomalies make us perceive an object with changed shape and size, which in turn reinforces our perception of 3-D reality. So, how can we use the projections?
552
(Refer Slide Time: 19:42)
As I said, the perspective anomalies actually help in generating realistic images, since this is the way we perceive objects in the real world. So, perspective projection can be used for realistic computer graphics; for example, in games or animations, wherever we require realistic 3-D scenes to be generated, we should use perspective projection.
(Refer Slide Time: 20:24)
On the other hand, in the case of parallel projection, the shape and size of the objects are preserved; they do not change, unlike in perspective projection. As a result, parallel projection is not suitable for giving us realistic images.
So, if we use parallel projection to generate a scene, it would not look realistic. It is not used for realistic 3-D; where is it used, then? In graphics systems that typically deal with engineering drawings, such as the CAD (computer-aided design) packages we discussed at the beginning of the course. There, realism is not important; other things matter more, so parallel projection may be useful.
So, with that, I hope you got some basic idea of projections, so to recap, we use projections to
map 3-D scene to a 2-D image on the view plane, and we do this on a clipping window,
which is a region on the view plane that is based on the view volume that we define in the 3D space, and then we perform projection transformation.
So, let us now shift our focus to the idea of projection transformation what it is and how we
can do this.
(Refer Slide Time: 22:17)
As the name suggests, it is a transformation. So, it is similar to all other transformations that
we have already encountered, namely, the modeling transformation and the view transformation. In which way is it similar? We can perform these transformations with matrix multiplication between the point vectors and the projective transformation matrices.
So, essentially our objective is to have all sorts of transformations represented as a matrix
multiplication, and we have already seen how to do it with modeling transformation; we have
seen how to do it with view transformation. Now, let us try to derive projective
transformation matrices so that we can do it with projection transformation as well.
(Refer Slide Time: 23:21)
Now, in order to understand those matrices, we require some understanding of the view
volume because the projections are dependent on the view volumes, and the shape of the
view volumes actually depends on the type of projection we are interested in. We have already mentioned that there are two types, one is the perspective projection and one is the parallel projection, and their corresponding view volumes are different.
(Refer Slide Time: 23:52)
Now, in the case of parallel projection, the view volume takes the shape of a rectangular parallelepiped, as shown in this figure. Here, there are six faces, marked as the near plane (towards the view plane), then the right, the bottom, then the far plane, the top and the left plane. Now, this near plane is also the clipping window.
So, essentially it is the view plane containing the clipping window. So, in the case of parallel
projection we define the view volume as a rectangular parallelepiped defined by six planes, and
the near plane is the view plane containing the clipping window.
(Refer Slide Time: 25:00)
What happens in the case of perspective projection? In this case, what we use is a frustum, as shown in this figure. Like in parallel projection, here also the frustum is defined in terms of its bounding planes, so we have the near plane, far plane, right plane, left plane, top plane and bottom plane, and the near plane contains the clipping window. So, essentially the idea is the same as in parallel projection, in the sense that in both cases the near plane is the plane where the clipping window is situated.
(Refer Slide Time: 26:02)
So, with this knowledge of the view volume, let us try to understand the projection
transformation matrices for the two types of projection. Let us start with the parallel
projection. So, in this case, let us consider a point P which is in the view volume with
coordinates x, y, z. Now, we want to project this point on the view plane as a projected point
P’ with the new coordinates x’, y’, z’. And this projection takes place on the clipping
window, as we have already mentioned, and our objective is to relate these two points.
(Refer Slide Time: 26:53)
Let us assume that the near plane is at a distance d along the -z direction. Then it is quite obvious that, since it is a parallel projection, the new x coordinate will be the same as the original one and the y coordinate will also be the same; only the z coordinate will change, and the new z coordinate will be -d.
(Refer Slide Time: 27:29)
Then, we can actually represent this information in the form of a transformation matrix for
parallel projection, as shown here.
So, when we multiply this transformation matrix with the point vector P, then we will get the
new point vector P’.
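The matrix itself is on the slide and not reproduced in this transcript; as a rough sketch (with NumPy used purely for illustration, and d assumed to be the near-plane distance), the standard parallel projection matrix and its use look like this:

    import numpy as np

    d = 0.5  # assumed distance of the view (near) plane along -z

    # Parallel projection onto the plane z = -d: x and y are unchanged, z is replaced by -d
    M_parallel = np.array([
        [1, 0, 0,  0],
        [0, 1, 0,  0],
        [0, 0, 0, -d],
        [0, 0, 0,  1],
    ], dtype=float)

    P = np.array([2.0, 3.0, -4.0, 1.0])   # an arbitrary point in homogeneous coordinates
    P_projected = M_parallel @ P           # gives [2, 3, -0.5, 1]; here w is already 1

This is only one common way of writing the matrix; the exact form on the slide may differ slightly.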
(Refer Slide Time: 27:59)
However, we have to keep in mind that this multiplication that we are performing, that is, obtaining the transformed point by multiplying the transformation matrix with the original point in this way, takes place in a homogeneous coordinate system.
(Refer Slide Time: 28:27)
So, the multiplication is performed in this homogeneous coordinate system, and the real coordinates of P’ should be obtained by dividing the result by the homogeneous factor w. Now, we shall see later that in the case of projection transformations, w need not be 1, so we require this division. We will see some examples later where w is not 1, unlike in the previous transformations where the homogeneous factor was 1.
(Refer Slide Time: 29:03)
Now, let us see the perspective projection. This is more complex than parallel projection; in parallel projection we simply drop or change one coordinate, but here we require changes to all the coordinates because the projectors meet at a point. Now, to derive these changes, let us try to understand the projection with this figure; the figure shows the side view along the -x direction.
So, what we need? We need to derive the transformation matrix that projects P. This is the
point P to the point on the view plane or clipping window P’.
(Refer Slide Time: 30:05)
Now, from the original and projected points, you can see that they are part of two similar triangles, and from these triangles we can form some relationships, for example, between the y coordinates and between the x coordinates, in terms of d, that is, the distance of the view plane or the near plane from the origin, and the original z coordinate value.
(Refer Slide Time: 30:52)
From there, we can reorganize to get the transformation matrix in this form, where d is the
distance shown here between the origin and the projected point on the near plane or between
the origin and the near plane.
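The matrix on the slide is not reproduced here; as a sketch under the same setup (view plane at z = -d, center of projection at the origin), one common form of the perspective projection matrix and the division by w look like this, again using NumPy only for illustration:

    import numpy as np

    d = 0.5  # assumed view-plane distance along -z

    # Perspective projection for a view plane at z = -d.
    # The last row makes w = -z/d, so dividing by w scales x and y by d/(-z).
    M_persp = np.array([
        [1, 0,      0, 0],
        [0, 1,      0, 0],
        [0, 0,      1, 0],
        [0, 0, -1.0/d, 0],
    ], dtype=float)

    P = np.array([2.0, 3.0, -4.0, 1.0])   # a point inside the view volume (z negative)
    Ph = M_persp @ P                       # homogeneous result: [2, 3, -4, 8]
    P_projected = Ph / Ph[3]               # divide by w: [0.25, 0.375, -0.5, 1]

The exact matrix used in the lecture slides may be organised slightly differently, but the role of the division by the homogeneous factor is the same.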
(Refer Slide Time: 31:17)
Now, like in the case of parallel projection, here also, what can we do? We can multiply this perspective projection matrix with the original point vector to get the projected point in the homogeneous coordinate system, and to get the actual projected point, what we require is to divide it by this homogeneous factor w; we will see that here again, w will not be equal to 1.
So, we require some divisions, unlike in the cases of other transformations that we have seen before. That is how we can derive the projection matrices. Now, a few things we should note here: one is that the derivations shown are based on very simplified situations; in reality, the actual projection matrices are slightly more complicated. Also, there is some other information that is stored along with the projection; those things we will discuss later briefly, although we will not go into the minute details of those concepts.
(Refer Slide Time: 32:51)
So, to summarize, today what we have learned is the idea of projection. Why do we require projection? To transform from a 3-D scene description to a description on a 2-D plane, which we are calling the view plane. On the view plane, we define a region called the clipping window, on which this projection takes place, and for the purpose of projection we define a 3-D region in the 3-D space that is called the view volume.
Now, there are two types of view volumes defined for the two types of projections, a rectangular parallelepiped for parallel projection and a frustum for perspective projection. In case of perspective projection, we get to see several anomalies that change the shape and size of the objects after projection, and that gives us the perception of 3-D reality. Accordingly, for
applications of computer graphics where we require to generate 3-D realistic scenes, we use
perspective projection, whereas parallel projection can be used in situations where 3-D
realism is not required.
And we have also shown how to derive the projection transformation matrices for the basic
two types of projections, namely parallel projection and perspective projection. The
transformation idea is the same. Essentially, we have a transformation matrix; this matrix we multiply with the point vector to get a new point vector, the transformed point vector. However, we have to keep in mind that this transformed point vector is defined in the homogeneous coordinate system. So, to get the actual transformed point vector, we need to divide the obtained coordinate values by the homogeneous factor w, and in the case of transformations where projection is involved, w is not 1, unlike the other transformations that we have seen before, namely modeling transformation and view transformation.
In the next lecture, we will talk about the other sub-stage of the view pipeline that is the
window to viewport mapping.
(Refer Slide Time: 35:39)
Whatever I have discussed today can be found in this book; you are advised to refer to chapter 6, the whole of section 6.2, excluding section 6.2.3. That section we will discuss in the next
lecture. Now, in this section, you will find more details about projections and types of
projections. You may go through those details if you are interested. That is all for today,
thank you and goodbye.
Computer Graphics
Professor Dr Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 20
Windows to Viewport Transformation
Hello and welcome to lecture number 20 in the course Computer Graphics. We are currently
discussing the graphics pipeline, that is, the series of stages or steps that have to be performed to convert a 3D description of a scene to a 2D image on a computer screen, or on the display that we get to see.
(Refer Slide Time: 0:58)
So, there are five stages, as we have already mentioned object representation first stage,
modelling transformations second stage, lighting third stage, these three stages we have already
discussed completely. Currently, we are in the fourth stage viewing pipeline, and there will be
one more stage fifth stage that is scan conversion.
(Refer Slide Time: 1:24)
Now, the fourth stage, the viewing pipeline, contains a set of sub-stages. The first sub-stage is a
transformation from a 3D world coordinate scene description to a 3D view coordinate scene
description. Now, this view coordinate is also called an eye or camera coordinate system. And
this transformation is generally called 3D viewing transformation, which we have already
discussed in our earlier lectures.
(Refer Slide Time: 2:11)
The second stage is projection, so we project the 3D view coordinate description onto the view
plane.
And this projection is performed through a transformation which is generally called projection
transformation. This also we have discussed in our earlier lectures.
(Refer Slide Time: 2:43)
There is a third stage in which we perform a mapping or transformation from the view plane to a viewport, which is defined in the device coordinate system. This is called the window-to-viewport mapping, where the window is on the view plane and the viewport is in the device coordinate system. And this third stage we are going to discuss today.
Now, before we discuss the mapping, we would discuss one important aspect of projection
transformation that we did not discuss in our last lecture, that is the idea of the canonical view
volume. Let us see what this volume means.
(Refer Slide Time: 3:42)
As we mentioned earlier, there is an important stage in the graphics pipeline. In fact, this is part
of the fourth stage that we are currently discussing that is viewing pipeline. Here, what we do is
whatever objects are outside the view volume are clipped out. Now, that stage is called clipping.
And we do it to remove all the objects that are outside the view volume.
We already know what a view volume is, that is a region in the 3D space which we want to
project. Now, if it involves lots of objects which are partly outside of view volume and partly
inside or lots of objects that are outside the view volume, then we require a large number of
calculations to determine what to clip out.
And this involves object surface and view volume boundary intersection point calculations, that is, finding where the object surfaces intersect the view volume boundaries. So, if the number of such objects is large, then the number of such boundary calculations will be large, and these boundary calculations are not easy. They involve a lot of floating-point operations, and accordingly, the complexity is high.
(Refer Slide Time: 5:21)
Now, if we have to perform such intersection calculations with respect to an arbitrary view volume, where we have no control over the boundary planes of the view volume, then this complexity only increases. So, we can expect a large number of computations, which is likely to take a large amount of time, reducing the quality of the image, as we may get to see flicker.
(Refer Slide Time: 6:05)
In order to avoid that, we can use one simple idea, that is we can come up with a standardized
definition of view volume irrespective of how the actual view volume looks. We can always
convert it to a standardized view volume or a standard view volume. This is called a canonical
view volume or CVV in short.
Essentially it is a standard representation of view volume irrespective of the actual nature of the
volume. Remember that there are two types of view volume. One is for parallel projection, that is
a rectangular parallelepiped, and the other one is for perspective projection, that is a frustum. Now
both these can be converted to a standard form which we call canonical view volume, which
makes the intersection calculations standard and easier to implement.
(Refer Slide Time: 7:10)
So, for both parallel and perspective projection, the standardized view volume looks the same.
However, the way to arrive at the standardized or canonical volume is different for the two types of projections. So, let us start with the parallel projection. For parallel projection, the canonical view volume that we define is a cube within a specified range, which is -1 to 1 along all the three axes X, Y and Z. And as I already mentioned, any
arbitrary view volume can be transformed to the CVV simply by the scaling operation.
So, suppose this is an arbitrary view volume defined in terms of its bounding planes, six planes,
so we can always map it by scaling within this range along the X Y Z direction and
correspondingly, we get the canonical view volume.
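As a rough sketch of this step (the slide's exact matrix is not reproduced here), the normalisation for the parallel-projection case can be written as a single matrix that combines translating the volume's centre to the origin with scaling into the -1 to 1 range; the bounding values below are hypothetical parameters, and the sign convention for the near/far pair depends on the handedness being used:

    import numpy as np

    def parallel_cvv_matrix(left, right, bottom, top, near, far):
        # Map the axis-aligned view volume [left,right] x [bottom,top] x [near,far]
        # onto the canonical cube [-1,1]^3 (centre translated to the origin, then scaled).
        return np.array([
            [2.0/(right-left), 0.0, 0.0, -(right+left)/(right-left)],
            [0.0, 2.0/(top-bottom), 0.0, -(top+bottom)/(top-bottom)],
            [0.0, 0.0, 2.0/(far-near),   -(far+near)/(far-near)],
            [0.0, 0.0, 0.0, 1.0],
        ])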
(Refer Slide Time: 8:28)
In case of perspective projection, this transformation is slightly more complicated because here we are dealing with a view frustum, and we need to convert it to the canonical view volume for parallel projection, that is, the rectangular parallelepiped where the X, Y and Z extents of the bounding planes are within a specified range.
So, what we can do here, we will just talk about the idea rather than going into the details, we
can convert this arbitrary view frustum to the canonical view volume here, by applying shearing
and scaling in sequence. As we can guess from the figures, that shearing is required to change
the shape and scaling is required to change the size. So, when we apply these two transformations on the original view frustum, we get the canonical view volume; of course, here we will not go into any further details beyond this basic idea.
So, what we have learned? That we define a view volume and this view volume, we transform to
a canonical view volume so that in later stages when we perform clipping, the calculations are
easier to implement because we are dealing with a standardized definition of the view volume.
(Refer Slide Time: 10:14)
Let us revisit the sequence of transformations that we perform to project a point p in the world coordinate scene to a point on the view plane. We mentioned that this is akin to taking a photograph, that is, we transfer it to the view coordinate system and then take a projection. However, earlier we mentioned only these two steps.
Now a third step is added. So first, we transform the world coordinate point to the view coordinate system, as we have discussed in an earlier lecture. And the next step is not projection. Instead, what we do is, in this view coordinate description, we define a view volume, and this
view volume is transformed to a canonical view volume. And accordingly, the point is also
transformed by applying the same set of transformations. So, the next stage is to transform the
point in the view volume to a point in the canonical view volume; then the final stage is to
perform the projection transformation that is, project the point in the canonical view volume on
the view plane.
So, these three steps, a transformation to view coordinate then transformation to canonical view
volume and then projection transformation constitute the sequence through which we project a
point in the world coordinate scene into a point on the view plane. Mathematically or in matrix
notation that we are following, we can write this series of steps as shown here in this expression
where this transformation represents the transformation to the view coordinate system.
This one represents the transformation to canonical view volume, and this one represents the
projection transformation. Since we are applying them in sequence, we are following the right-to-left rule. So, first the transformation to the view coordinate system, then the transformation to
canonical view volume and then transformation to view plane through projection.
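In the matrix notation we have been following, and with hypothetical names for the three matrices, this sequence can be sketched as P' = Mprojection · Mcvv · Mview · P, applied right to left, where Mview is the view transformation, Mcvv is the transformation to the canonical view volume, and Mprojection is the projection transformation.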
(Refer Slide Time: 13:11)
So, that is the idea of performing projection on the view plane. There is one more point to be
noted. So far, we mentioned that in projection, 3D points are mapped to 2D. The implication is that we are removing the Z or depth component. However, it may be noted at this point that when we implement the pipeline, this depth component is actually not removed. Why is that so?
(Refer Slide Time: 13:53)
One operation that we perform in this fourth stage is called hidden surface removal. We will talk
about this operation in detail in a later lecture. The point to be noted here is that this operation
requires depth information. So, the depth information after projection is actually not removed.
Instead, this original depth information is stored in separate storage, which is called the Z-buffer
or the depth buffer.
So, we are actually not removing the depth information, although we are performing a projection;
instead, we are keeping it stored separately in the Z-buffer or the depth buffer. And this
information is required to perform a later operation called hidden surface removal, which gives
us a realistic effect in an image.
(Refer Slide Time: 14:56)
So, that is, in short, what we do during projection and how we project from a world coordinate
scene to a view plane. Now there is one more stage. Let us go to that stage that is mapping from
this view plane to a viewport on the device space.
(Refer Slide Time: 15:21)
So far, what have we discussed? We discussed the steps to transform a point in world coordinates to a
clipping window on the view plane. That means a region on the view plane on which we are
projecting the objects that are part of the view volume.
Now, we have also shown that this is typically the near plane of the canonical view volume. So,
this is our window or clipping window.
(Refer Slide Time: 15:59)
We can assume, just for simplicity, that the window is at 0 depth, or Z equal to 0, although in general that is not an absolute requirement.
(Refer Slide Time: 16:17)
It may also be noted that we are talking of the canonical view volume, that is, the X and Y extents must be within a fixed range irrespective of their actual position in the world coordinate scene. Because of this reason, that we are restricting everything within a fixed range, these canonical view volumes are standardized, and the clipping window that is defined on the near plane of the canonical view volume is often called a normalized window.
So, here we are dealing with a normalized window where the extent of values are to be within a
predefined range.
(Refer Slide Time: 17:19)
Now, this view plane is actually an abstract concept, so accordingly, the clipping window is also
an abstract and intermediate concept. We cannot see it; what we get to see on the screen is
something different. The points that are there on the clipping window are to be shown on the
screen. But the scene that is there in the window need not occupy the whole screen, for example,
here.
Suppose this outer rectangle defines a whole scene, out of which we have projected this part defined within the clipping window. Now, this part can be displayed on any region of the screen and can be of any size. The region on which this part is displayed on the screen is called the viewport.
(Refer Slide Time: 18:41)
So, we have two concepts here: the window, which is the same as the clipping window and is
normalized. And objects are projected on this window, and we have the other concept viewport,
which is defined in the device space with respect to the screen origin and dimensions. So, this
viewport refers to a region on the device space where this projected image needs to be shown.
Now, this region can be at any location in the device space and can be of any size,
irrespective of the size of the clipping window. So, what we need? We need to map from this
window to the viewport.
So, it requires one more transformation to transfer the points from the window to the viewport.
(Refer Slide Time: 19:59)
So, let us see how we can perform this transformation. What do we want? Suppose this is our window and this is our viewport; note that here we are not using the normalized range. We are formulating the problem in a very generic setting where Wx and Wy can take any value. So, (Wx, Wy) is a point on the window, and we want to map it to a point on the viewport, (Vx, Vy).
(Refer Slide Time: 20:44)
So, how can we do that? The first thing is that we have to maintain the relative position of this point with respect to the window boundaries; the same relative position has to be maintained in the viewport. If we want to maintain that, then we get relationships like the one shown here between the window dimensions and the viewport dimensions.
(Refer Slide Time: 21:17)
Now, this expression can be simplified in this form. So, we can represent the X coordinate of the
point in the viewport in terms of the X coordinate of the point in window and these two
constants, which are defined here in terms of the window and viewport sizes.
(Refer Slide Time: 22:04)
Similar relationship we can form between the Y coordinate of the point in the viewport and the Y
coordinate of the same point in the window, by first forming the relationship between the
y coordinates and then simplifying and rearranging to get this relationship where Vy is the Y
coordinate of the point in the viewport, Wy is the Y coordinate of the point in the window.
And these two are constants defined here in terms of, again, the window and viewport sizes.
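As a sketch of these relations (using hypothetical variable names; the constants correspond to the sx, sy, tx, ty used shortly in the examples), the mapping can be written as a small function:

    def window_to_viewport(wx, wy, win, vp):
        # win and vp are (xmin, xmax, ymin, ymax) for the window and the viewport.
        # Preserving the relative position of the point gives a scale plus a translation.
        wxmin, wxmax, wymin, wymax = win
        vxmin, vxmax, vymin, vymax = vp
        sx = (vxmax - vxmin) / (wxmax - wxmin)
        sy = (vymax - vymin) / (wymax - wymin)
        tx = vxmin - sx * wxmin
        ty = vymin - sy * wymin
        return sx * wx + tx, sy * wy + ty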
(Refer Slide Time: 23:05)
So, using those expressions, we can actually form the transformation matrix as shown here. So, this is the matrix to transform the window point to the viewport point.
(Refer Slide Time: 23:23)
And we will follow the same rule, that is, to get the transformed point we will multiply the original point with the transformation matrix as shown here. Note that here again we are dealing with the homogeneous coordinate system; since these are two-dimensional points, we have three-element vectors and three-by-three matrices.
And at the end, we need to divide the obtained coordinates by the homogeneous factor, as shown here, to get the transformed points. The approach is similar to what we have seen earlier.
So, that is the basic idea of how to transform from a point in the window or the clipping window
to a point in the viewport, which can be anywhere on the device space.
Now, let us try to understand the concepts that we have gone through so far in terms of
illustrative examples.
(Refer Slide Time: 24:38)
So, in our earlier lecture, we have come across this example where we mentioned one object,
shown here, and a camera position and view-up direction; everything has been mentioned, and we computed the transformed centre point of the object in the view coordinate system.
(Refer Slide Time: 25:11)
So, we will not go into the details of how we calculated that again; let us just mention the transformed point, that is, (0, 0, -1), which we got after applying the viewing transformation.
(Refer Slide Time: 25:29)
Now let us assume that the view plane is located at Z equal to - 0.5. And we want parallel
projection. So, what would be the coordinate of the object centre after the projection? Assuming
that the view volume is sufficiently large to encompass the whole transformed object.
(Refer Slide Time: 26:08)
So, our parallel projection transformation matrix is given here, and we know d is 0.5. So, this is
our parallel projection matrix.
(Refer Slide Time: 26:27)
So, if we use this matrix and perform the matrix multiplication shown here, between the projection matrix and the point vector, then we get this point as the projected point on the view plane. Now here, since the homogeneous factor is 1, our point is directly obtained.
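For reference, since parallel projection keeps x and y unchanged and sets z to -d, the transformed centre (0, 0, -1) projects to (0, 0, -0.5) on this view plane; the slide's intermediate numbers are not reproduced in this transcript.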
(Refer Slide Time: 26:56)
Now, let us consider perspective projection. Earlier, we considered parallel projection, what will
happen if we now consider the perspective projection with the same view plane? So, what would
be the new point after projection?
(Refer Slide Time: 27:29)
So, the transformation matrix for perspective projection is shown here. We know the value of d; replacing d in it, we get our projection matrix. And with this matrix, what do we do?
(Refer Slide Time: 27:58)
We multiply it with the point vector as before, as shown here, and after the multiplication we get this transformed point in the homogeneous coordinate system. Now note here that the homogeneous factor is not 1; earlier I mentioned that in projection, particularly perspective projection, we get homogeneous factors that are not 1. So there we need to be careful: to obtain the final transformed point, we have to divide whatever we got by the homogeneous factor. After the division, we will get this point, and this is the final point that we get after the perspective projection is applied to the central point.
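For reference, the geometry alone fixes this result: the transformed centre (0, 0, -1) lies on the -z axis, so its perspective projection onto the view plane at z = -0.5 is (0, 0, -0.5). The homogeneous factor obtained before the division depends on the exact form of the matrix (it is 2 under the form sketched earlier, with last row [0, 0, -1/d, 0]).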
(Refer Slide Time: 29:01)
So, we have performed projection. Now let us try to see what happens if we want to perform this
window to viewport transformation. Now, let us assume that we projected the point on a
normalized clipping window. And this projected point is at the centre of the normalized window.
Now we are defining a viewport with a lower-left corner at (4, 4) and the top right corner at (6,
8).
So, that means if this is our viewport, then the lower-left corner is this one. This is (4, 4) and the top
right corner is (6, 8). So, if we perform a window to viewport transformation, then what would
be the position of the point, the same central point in the viewport? Let us try to derive that.
(Refer Slide Time: 30:18)
Now, we already mentioned that the clipping window is normalized, so the values or the extents of the window are fixed, and we get these values: this is between -1 and 1, and again, this is also between -1 and 1. From the viewport specification, we can see that this point is (4, 4), so this is 4 and this is 4; and since this point is (6, 8), this must be 6 and this must be 8. So, we get these values. Next, we simply substitute these values in the transformation matrix that we have seen earlier.
(Refer Slide Time: 31:27)
We first compute the constant values sx, sy, tx, ty by using those values, as we have seen earlier, to get these results: sx is 1, sy is 2, tx is 5 and ty is 6.
(Refer Slide Time: 31:53)
So, the transformation matrix can be obtained by replacing the sx, sy, tx, ty values in this original
transformation matrix which gives us this matrix. So, this will be our window to viewport
transformation matrix.
(Refer Slide Time: 32:16)
Now, once we obtain the transformation matrix, it is easy to obtain the transformed point in the viewport: we multiply the transformation matrix with the point vector to obtain the transformed point in homogeneous coordinates. Here again, the homogeneous factor is not 1. So, we have to divide these values by the homogeneous factor, as shown here and here, which eventually gives us the point (5, 6).
So, this will be our transformed point after we apply the window to viewport transformation. So,
this is how we get a transformed point in the viewport. Now in this example, you probably have
noted that we have defined viewport irrespective of the window description, we can define it
anywhere in the device space.
What we need is basically a transformation. Also, you probably have noted that the viewport size
has nothing to do with the window size, the window is normalized, whereas the viewport is not
normalized. So, we can define any size by specifying its coordinate extents, and through
mapping, we get that transformed point. So, this gives us the flexibility of placing the projected
image anywhere on the screen with any size.
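Plugging the example's numbers into the window_to_viewport sketch given earlier reproduces the same result: window_to_viewport(0, 0, (-1, 1, -1, 1), (4, 6, 4, 8)) returns (5.0, 6.0), that is, the centre of the normalized window mapped to the point (5, 6) in the viewport.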
(Refer Slide Time: 34:18)
So, in summary, what we have discussed so far are three sub-stages of the viewing pipeline
stage. So, these three sub-stages are view transformation, projection transformation and viewport
transformation. Just to recap, these three sub-stages are used to simulate the effect of taking a photograph. When we take a photograph, we look at the scene through the viewing mechanism provided in the camera; that we mimic by performing the viewing transformation, in which we transform the world coordinate scene to a 3D view coordinate system, which is actually equivalent to watching the scene through the viewing mechanism of the camera. Then, we take a photo, that means we project it on the view plane; that is done through the projection transformation.
And finally, we display it on the screen, which is of course not part of the photograph analogy,
but we do it in computer graphics; that stage is mimicked with the use of the window-to-viewport transformation. This transformation is required to have the flexibility of displaying the projected
image anywhere on the screen and with any size, irrespective of the size of the clipping window.
In the fourth stage, apart from these three sub-stages, which are related to three types of
transformations, there are two more operations that are done.
We have already mentioned them in this lecture: one is clipping and the other is hidden surface removal. So, these
two operations we will discuss in our subsequent lectures.
(Refer Slide Time: 36:11)
Whatever I have discussed today can be found in this book, Computer Graphics. You can go through Chapter 6, sections 6.2.3 and 6.3; the first section is on the topic of the canonical view volume, and the second discusses in detail the window-to-viewport transformation. That is all
for today. Thank you and goodbye.
Computer Graphics
Professor. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 21
Clipping Introduction and 2D Point and Line Clipping
Hello and welcome to lecture number 21 in the course, Computer Graphics. We are currently
discussing the graphics pipeline, that is how the 3D scene gets converted to a 2D image on the
computer screen, what are the stages there to perform this task. Together these stages are known
as pipeline, as we have already discussed. So, let us just quickly have a relook at the pipeline
stages and then we will start our discussion on today's topic.
(Refer Slide Time: 01:09)
What are the pipeline stages? We have object representation as the first stage, then modeling
transformation as the second stage, lighting or assigning color to objects as the third stage,
viewing pipeline as the fourth stage, and scan conversion as the fifth stage. So, here I would like
to reemphasize the point that the sequence shown here need not be exactly followed during implementation of the pipeline; there, the stages may be in a slightly different sequence.
Now, among these stages, we have already discussed first stage, second stage, third stage, and
currently we are discussing the fourth stage that is viewing pipeline. As you can see here, in the
viewing pipeline there are sub stages. So, we have viewing transformation, clipping, hidden
surface removal, projection transformation and window-to-viewport transformation.
Among them, we have already discussed in the previous lectures the viewing transformation, the projection transformation, and the window-to-viewport transformation. Two more operations in the fourth stage are remaining, namely clipping and hidden surface removal. So, these operations we are going to discuss in today's and subsequent lectures.
(Refer Slide Time: 02:56)
So, in the viewing pipeline stage, we have already covered these three transformations: view transformation, projection transformation, and viewport transformation.
(Refer Slide Time: 03:07)
And there are two more operations: clipping and hidden surface removal which are part of the
fourth stage.
(Refer Slide Time: 03:18)
So, these operations we are going to cover in the lectures that we are going to have this week.
(Refer Slide Time: 03:29)
Let us start with clipping: what is this operation and how is it performed?
(Refer Slide Time: 03:37)
If you recollect, earlier we talked about a concept called the view volume. Essentially, what we discussed is that when we are trying to generate a 2D image, this is analogous to taking a photo of a 3D scene. So, we first perform the view transformation, to transfer the content
from world coordinate system to view coordinate system, then we perform projection
transformation to project the 3D view coordinate system description of the scene to a 2D view
plane description. And finally, we perform window-to-viewport mapping.
However, in this process, what we project on the view plane are only the objects that are present in the view volume, that is, a region in the 3D view coordinate space that we have decided to project. So, whatever objects are within the view volume should be projected. Thus, we need to define this view volume, this 3D region, in the view coordinate system before projection.
(Refer Slide Time: 05:19)
So whichever objects lie within this volume are projected, whereas objects that are outside the volume are discarded. For example, let us have a look at this figure. Here the view volume is this rectangular parallelepiped. Now, as you can see, this object is entirely within the view volume,
so it will be projected on the view plane. This object is partially within the view volume, so
whichever part is within the view volume will be projected and the outside part, this one will be
discarded. Whereas in this case, the entire object is outside the view volume so it will not be
projected.
So, we have three situations. In one case, the entire object is within the volume, so the entire object is projected. In another case, the object is partially within the volume; the part that is within the volume will be projected, whereas the outside part will be discarded. And in the third case, we
have the entire object outside the volume which will be discarded. Now, the process of
discarding objects is called clipping. So, before projection we would perform clipping and then
whatever objects remain should be projected.
(Refer Slide Time: 06:46)
Now, the question is how a computer can discard or clip an object? That is done through some
programs or algorithms, which collectively are known as clipping algorithms. So, we perform
clipping with the help of some clipping algorithms. In this lecture and subsequent lectures, we will go through a few of those algorithms which are easier to understand.
(Refer Slide Time: 07:22)
Before we start our discussion on the clipping algorithms, we should keep in mind two points.
First thing is, the algorithms will be assuming that the clipping is done against canonical view
volume. To recollect, a canonical view volume is a standardized view volume where the shape of
the view volume is a rectangular parallelepiped and its bounding planes are within a fixed range. So,
it is a standardized view volume.
So, whenever we are talking of clipping, we assume that the scene is already transferred to the
canonical view volume and then we are performing clipping. Secondly, we will first discuss 2D
clipping algorithms for simplicity. It will be easier to understand clipping when we are dealing
with 2D objects and then we will extend our discussion to 3D clipping.
(Refer Slide Time: 08:35)
So, let us see what are the algorithms that we can use to perform 2D clipping.
(Refer Slide Time: 08:44)
Now since we are discussing 2D clipping, we have to restrict ourselves to the 2D concepts. Now
view volume that we mentioned earlier is a 3D concept but that is not relevant in 2D. So instead
of view volume we now use the concept of view window which is a square region on the view
plane. In earlier lectures, we were already introduced to this idea: we mentioned that on the view plane there is a clipping window on which objects are projected; the concept is the same as the view window.
So it is equivalent, actually, to assume that the view volume and all objects are already projected on the view plane; that is another way of looking at 2D clipping. So the projection is already done, we have the 2D clipping window, and now we want to perform clipping. That is, after projection we want to perform clipping.
(Refer Slide Time: 10:00)
So, this view volume is projected to form the window and other objects are projected to form
points, lines and fill areas, such as a polygon. So, we have a window which is formed by
projecting the view volume on the view plane and then other objects are projected to form points,
lines as well as fill areas such as polygons.
(Refer Slide Time: 10:34)
So, then our objective boils down to performing the clipping operation for points, lines, and fill
areas with respect to the window. So, the scenario is that we have a clipping window or view
window and we have objects that are projected on the view plane and we want to clip the
projected objects against this window where the projection takes place in the form of points,
lines or fill areas.
(Refer Slide Time: 11:18)
Let us start with the simplest of all clipping, that is point clipping. How to clip a point against the
view window.
(Refer Slide Time: 11:31)
Let us assume that we are given a point with coordinates (x, y). Now the clipping idea is simple: we have to check whether the point is within the window or outside. So, what we have to do is simply check if the coordinate values lie within the window boundary. So, we need to perform these checks: here we are checking the x value against the window boundaries, and here we are checking the y value against the window boundaries, to determine whether they are inside the boundary or outside. If the point is inside the boundary, we keep it; otherwise we clip it out. That is a very simple idea.
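A minimal sketch of this check (with the window boundaries passed as hypothetical parameters):

    def clip_point(x, y, xmin, xmax, ymin, ymax):
        # Keep the point only if it lies inside the clipping window (boundaries included).
        return xmin <= x <= xmax and ymin <= y <= ymax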
(Refer Slide Time: 12:35)
More complicated is line clipping. Here we do not have a single point, we have a large number
of points to consider. So, the corresponding algorithm would be more complicated than what we
have seen for point clipping.
(Refer Slide Time: 13:00)
Let us first talk about a very simple intuitive approach. What we can do, we can represent any
line segment with its end points and then we check the end point positions to decide whether to
clip or not. So, we are representing a line with its end points. Then for each end point, we are
applying this point clipping approach to check whether the endpoint is within the boundary or
not. And after checking both the end points, we can come to a conclusion whether the line is
within the window or not.
(Refer Slide Time: 13:49)
But there is a problem. If we follow this approach, there can be any one of three scenarios. In
the first case, both end points may lie within the window boundary. Consider this line segment as
L1. Here, both the end points are within the boundary. So we do not clip. In the second case, one
end point is inside and the other end point is outside. Consider L2, here this endpoint is inside
and this endpoint is outside. So, in that case it has to be clipped. And then there is a third
scenario where both the endpoints are outside.
However, here we have a problem. Consider the line L4. In this case, both the end points are
outside and the entire line is outside, so we can simply discard it. But in case of L3, here also
both the end points are outside. However, a portion of it defined between these two intersection
points is actually inside the view window. So, we cannot discard the line entirely. Instead, we
have to clip out these outside parts and discard them whereas this inside part must be kept. So, in
this case, then we have to check for line boundary intersections to determine whether some
portion is inside the window or not.
(Refer Slide Time: 15:45)
Now, these intersection checks are computationally expensive because they involve floating-point operations. And in practical applications, every change of screen involves a large number of such
intersection checks, which may slow down the overall rendering process and it may even turn out
to be impractical. So, we require some alternative, some better solution.
(Refer Slide Time: 16:25)
One such solution is provided by an algorithm called the Cohen-Sutherland algorithm. Let us go
through the steps of the algorithm.
(Refer Slide Time: 16:38)
So, in this algorithm, we assume a representation of the view plane. So here we assume that the
window and its surrounding is divided into nine regions as shown here. So, this is the window,
the central part, and then we have above left, above, above right with respect to the window, left,
right, with respect to the window again, and below left, below, and below right with respect to
the window position again. So, these nine regions are assumed to represent the view plane, and we get the nine regions by extending the window boundaries as shown here. So, this is the first assumption.
(Refer Slide Time: 17:41)
Then what we do, we assign a code to each of these regions. Now, there are nine regions. So, we
require four bits to represent all the nine regions, which is very obvious, and each region is given a unique four-bit code. Now, each bit indicates the position of the region with respect to the window. So, this is the nine-region representation and these are the codes assigned to
these nine regions. Note here that each code is unique.
So above left is represented by 1001, above is represented by 1000, above right is represented by
1010, left represented by 0001, right represented by 0010, below left represented by 0101, below
represented by 0100, and below right represented by 0110. The central region or the window is
represented by all zeros.
And the organization of the code looks something like this, where the leftmost bit indicates the above location, the next bit indicates the below location, the next bit indicates right, and the rightmost bit indicates the left location. Each bit can take either 1 or 0; 1 means the region lies in that direction with respect to the window, and 0 means it does not. For example, consider 1001. We have above 1, below 0, right 0, and left 1; that means the region is above left, as shown here, because these two bits are 1, whereas the below and right bits
are 0. So that is how the codes are assigned to each region.
(Refer Slide Time: 20:26)
Now that is the assumption and assignment of code. Then the algorithm starts working once
those things are done. So, in step one of the algorithm we assign region codes to the end points
of the line segment. So given a line segment, we first have to assign the corresponding region codes to its endpoints. How can we do that? Assume that one end point is denoted by P(x, y) and the window is specified by these boundary values Xmin, Xmax, Ymin, Ymax. So, these are the boundary values which specify the window.
(Refer Slide Time: 21:30)
Then what we do is perform a few simple checks to determine the region code of the endpoint P. What are those checks? We check the sign of the difference between y and Ymax. P has coordinates x and y, and Xmax, Ymax, Xmin, Ymin are the window boundaries. So, we take the difference of y and Ymax and check its sign. The sign of this quantity will give us the bit value 1 if the quantity is positive, and 0 if it is negative. We do that for each bit position. So, for bit 3, we check the difference between y and Ymax. For bit 2, we check the difference between Ymin and y. For bit 1 we check the difference between x and Xmax, and for bit 0 we check the difference between Xmin and x; we take their signs and then apply this rule to get the actual bit values.
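A sketch of this region-code assignment (the bit layout follows the lecture: bit 3 above, bit 2 below, bit 1 right, bit 0 left):

    ABOVE, BELOW, RIGHT, LEFT = 8, 4, 2, 1   # bit 3, bit 2, bit 1, bit 0

    def region_code(x, y, xmin, xmax, ymin, ymax):
        # Four-bit Cohen-Sutherland region code of the point (x, y) with respect to the window.
        code = 0
        if y > ymax: code |= ABOVE   # sign of (y - ymax) positive
        if y < ymin: code |= BELOW   # sign of (ymin - y) positive
        if x > xmax: code |= RIGHT   # sign of (x - xmax) positive
        if x < xmin: code |= LEFT    # sign of (xmin - x) positive
        return code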
(Refer Slide Time 22:54)
So, the first step is assignment of region codes to the end points. In step 2, we perform more
checks on the region codes that are assigned to the endpoints and then we take action. So, if both
endpoint codes turn out to be 0000, that means the line is completely inside. So, we retain the
line. Now, if the logical AND or bitwise AND operation on the endpoint codes turns out to be not
equal to 0000, then the line is completely outside and we discard the entire line. Note that, here
we do not need to check for intersection.
(Refer Slide Time: 23:48)
In the next step, when none of the cases that we just discussed in step 2 occur, we know that the
line is partially inside the window and we need to clip it. So, in step 2 we perform checks to
decide whether the line is completely inside or completely outside and accordingly we take action. Then, if none of these conditions is satisfied, that means the line is partially inside and we need to clip
it, that we do in step 3.
(Refer Slide Time: 24:26)
So for clipping, we need to calculate the boundary and line intersection points. Here we cannot avoid calculation of the intersection points. That we can do in different ways; one possible way is to take the boundaries in a particular order, for example, first the above boundary, then the below boundary, then the right boundary, then the left boundary, and so on; any order is fine. And for each boundary, we compare the corresponding bit values of the end point region codes to decide whether the line is crossing that boundary or not.
(Refer Slide Time: 25:25)
If the corresponding bits are not the same, then the line intersects that particular boundary; we then form the line equation from the end points and determine the intersection point by solving the
equations. And then we assign region code to the intersection point, as we have done earlier. And
we do it for all boundaries and in this process, we discard the line segment that lies outside the
window.
(Refer Slide Time: 26:06)
Now, with respect to a particular boundary, we have a new line segment, that is, the intersection point and the other end point. So, we compare these two new end points to see if the segment is completely inside the window. If it is, then of course we keep it; if not, then we take the other end point and repeat the process. So, what do we do in step 3? In step 3, we go for clipping the line. We start with one end point and its region code, which is already there; we check the region code with respect to all the boundaries following a particular sequence, and based on that check we determine if the line is intersecting a particular boundary.
If that is so, then we use the line equation and the boundary equation to solve for the intersection point. Then we assign a new code, the way we did before, to this intersection point; the part between the original end point and the intersection point that clearly lies outside the clipping window is discarded. Then we have two new end points, that is, the intersection point and the original remaining endpoint. We check if the segment is completely inside the window; if it is, then we keep it, otherwise we repeat the process again for the other end point.
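Putting steps 2 and 3 together, a compact sketch of the whole procedure could look like the following (it reuses the region_code function and the bit constants sketched earlier; this is an illustration of the idea, not the exact pseudocode from the lecture):

    def cohen_sutherland_clip(x1, y1, x2, y2, xmin, xmax, ymin, ymax):
        # Returns the clipped end points, or None if the whole segment is discarded.
        c1 = region_code(x1, y1, xmin, xmax, ymin, ymax)
        c2 = region_code(x2, y2, xmin, xmax, ymin, ymax)
        while True:
            if c1 == 0 and c2 == 0:            # both end points inside: trivially accept
                return (x1, y1), (x2, y2)
            if (c1 & c2) != 0:                 # both outside the same boundary: trivially reject
                return None
            # Pick an end point that is outside and intersect the line with the crossed boundary.
            c_out = c1 if c1 != 0 else c2
            if c_out & ABOVE:
                x = x1 + (x2 - x1) * (ymax - y1) / (y2 - y1); y = ymax
            elif c_out & BELOW:
                x = x1 + (x2 - x1) * (ymin - y1) / (y2 - y1); y = ymin
            elif c_out & RIGHT:
                y = y1 + (y2 - y1) * (xmax - x1) / (x2 - x1); x = xmax
            else:                              # LEFT
                y = y1 + (y2 - y1) * (xmin - x1) / (x2 - x1); x = xmin
            # Replace the outside end point with the intersection point and recompute its code,
            # which discards the part of the line lying outside that boundary.
            if c_out == c1:
                x1, y1, c1 = x, y, region_code(x, y, xmin, xmax, ymin, ymax)
            else:
                x2, y2, c2 = x, y, region_code(x, y, xmin, xmax, ymin, ymax)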
(Refer Slide Time: 27:55)
Let us try to understand whatever we have discussed so far in terms of some examples.
(Refer Slide Time: 28:03)
So earlier we have shown how to assign the region codes, and based on that how to determine
whether to clip a line or not. Let us consider this line segment defined between the end points A and B, with the window extents provided here: Xmin is 2, Xmax is 4, Ymin is 2, and Ymax is 4.
We check the sign and accordingly we assign region code to the end points. For example, for A,
let us try to do this. Let us start with bit 3, here the corresponding expression is sign of 3 minus
4, which is sign of minus 1, since it is a negative quantity so bit 3 will be 0.
Similarly, bit 2 again negative quantity so it will be 0. Bit 1, this is a positive quantity 1 so it will
yield 1. And bit 0 which is again a negative quantity and we are taking the sign of it, so it will
result in 0. So, our region code will be given by these bits.
(Refer Slide Time: 29:35)
Or 0010. In a similar way, we can check for B, which is also 0010; it will be the same.
(Refer Slide Time: 29:57)
Now we go to step 2, the series of checks. The first check is whether both end point codes are equal to 0000. They are not 0000, so the first check fails. The second check is whether their logical AND is not equal to 0000, which happens to be true in this case. So, we can be sure that this line is completely outside the window boundary, no further check is required, and we can simply discard it. So, this particular line will be completely discarded: just by checking the region codes, we managed to determine that it is completely outside.
(Refer Slide Time: 30:51)
Now let us consider another line segment, that is given by the end points P and Q. As usual we
will first determine the end point region codes. The P code will turn out to be 0000, as you can
see here for bit 3 sign of a negative quantity 0. Again, bit 2 negative quantity 0, bit 1 negative
quantity 0 and bit 0 sign of a negative quantity again 0. Similarly, we can determine the region
code of Q, which turns out to be 0010. You can try it yourself.
(Refer Slide Time: 31:43)
Then once the region codes are decided, we go for the series of checks. The first check fails: both end point codes are not 0000. The second check also fails: the logical AND turns out to be 0000.
So, it is not completely outside. So, we need to go for the third step and need to determine the
intersections.
(Refer Slide Time: 32:15)
From the endpoints we can derive the line equation, as shown here. Then we check for intersection of this line with the boundaries in the following order: the above boundary first, then the below boundary, then the right boundary, and then the left boundary. For the above boundary, you can see that bit 3, which represents the above boundary, is the same for both the end points P and Q. So, the line does not cross the above boundary. Similarly, you can see that the line does not cross the below boundary. However, there is a difference in the bit values in the case of bit 1, so we can conclude that the line crosses the right boundary.
(Refer Slide Time: 33:30)
So, we have to find out the intersection point with the right boundary. Now, right boundary line
equation is x equal to 4, as you can see here. And we put this value in the line equation to get the
intersection point at (4, 5/2). This is the intersection point; let us call it Q’. Since the line crosses the right boundary and Q is outside, the segment Q’Q is outside the boundary. So, this segment is discarded and the new line segment becomes PQ’. These are the two new end
points.
(Refer Slide Time: 34:26)
Next, we determine the region code of Q’ as we have seen earlier which turns out to be 0000.
Now, as you can see both P and Q’ have the same region code 0000. So, this segment is entirely
within the window. And it is retained by changing Q to Q’.
(Refer Slide Time: 35:00)
One thing was left: earlier we checked the above, below and right boundaries, but we did not check the left boundary. However, as we can see from the region codes of P and Q’ (that is, the new line segment), bit 0 is the same for both end points, so there is no intersection. And so, at the end of the algorithm, we get PQ’ as the clipped line. That is how we can perform clipping following this algorithm.
(Refer Slide Time: 35:54)
Let us see one more example. Now let us consider a slightly more complicated line segment
given by M and N. As before we can determine the region code as shown here, for both the end
points M and N, which turn out to be 0001 for M and 0010 for N.
(Refer Slide Time: 36:21)
Then, in the next step, we go for the series of checks. Here also the first check fails: both end point codes are not equal to 0000. The second check also fails: the logical AND results in 0000, so the line is not completely outside. Thus, we cannot completely keep it or completely discard it. So,
there must be some intersection and we have to determine that intersection and clip the line. So,
we need to determine line boundary intersection points and clip it.
(Refer Slide Time: 37:07)
For that we need to derive the line equation from its end points, which is easy, as shown here. Then we check for the line intersection with the boundaries following this order: the above, below, right, and left boundaries. Bit 3 of M and N is the same, which means the line does not cross the above boundary. Similarly, it does not cross the below boundary since bit 2 is the same. However, the bit 1 values are different, and the line crosses the right boundary.
(Refer Slide Time: 37:55)
Then we check for intersection points, so the right boundary equation is x equal to 4. So, using
this equation and the line equation, we get the intersection point to be N’ at (4, 9/4), that is, here.
Now this point is the intersection point between the right boundary and the line and since N is
outside the right boundary, so as before we discard this segment N’N and the new line segment
becomes MN’, so this part. Then we decide the region code of the new end point N’ which is
0000.
(Refer Slide Time: 39:05)
Thus, we now have two new endpoints M and N’. M has code 0001 and N’ has code 0000. Now
earlier we checked the above, below, and right boundaries; the left boundary check was pending, and we will do that next. Here, if we check, we can see that for the left boundary the bit values are not the same, which means there is again an intersection with the left boundary, and we check for the intersection point between the left boundary and the new line segment given by the end points
M and N’.
(Refer Slide Time: 39:59)
Now, we know left boundary equation is x=2. We use this equation and the line equation to get
the intersection point M’, which is 2 and 11/4, here. Now the point M is outside the left boundary
that we already know, that means on the left side, so we discard this segment MM’. And new line
segment becomes M’N’ between these two points, this is M’ and this one is N’. So, we now
decide or determine the region code of this new endpoint M’ which is 0000.
(Refer Slide Time: 40:52)
Then we go for checking the end point region codes again, step 2 of the algorithm, and we find that M’ and N’ have the same region code 0000; that means the entire line segment M’N’ is within the
window. So, the algorithm resets the line segment to M’N’ and we have already checked for all
the boundaries, so no more boundary remains to be checked. And the algorithm returns this line
segment as the clipped line segment and stops. That is how the algorithm works.
(Refer Slide Time: 41:44)
So to summarize, the Cohen-Sutherland algorithm is designed to reduce intersection calculations. If we go for the intuitive method, it is difficult to tell when the line is completely inside or completely outside, so for every line we have to go for intersection checks; that is not required in the case of the Cohen-Sutherland method.
Here, we assign some region codes to the line endpoints and based on the region codes we can
decide whether the line is completely inside or completely outside. In those cases, we do not
need to go for intersection checking. However, if the line is not completely inside or completely
outside, we know that there is some intersection and we need to clip, there we have to go for
some intersection checks and find out the intersection points. So, some amount of intersection
calculation still remains.
(Refer Slide Time: 42:48)
But it reduces the calculations to a great extent, which helps render complex scenes faster. But as I said, it still retains some intersection calculations.
625
(Refer Slide Time: 43:12)
So, this particular algorithm works well when the number of lines which can be clipped without
further processing is large compared to the size of the input set of lines. So, when the number of
lines that can be clipped without further processing is large, then clearly Cohen Sutherland works
much better because no intersection calculation is involved. But if that is not so, then it still has
some problem.
(Refer Slide Time: 43:52)
626
There are in fact other, faster methods developed to reduce intersection calculations further, and those methods are based on more efficient tests that do not require intersection calculations or complex floating-point operations.
(Refer Slide Time: 44:15)
There was one algorithm proposed by Cyrus and Beck, which was among the earliest attempts in
this direction and which relied on parametric line equations. Later, a more efficient version was
proposed by Liang and Barsky.
(Refer Slide Time: 44:44)
627
However, in this course we will not discuss those algorithms any further. If you are interested
then you may refer to the reading material that will be mentioned next. So to summarize, today
we have learned about clipping in 2D, clipping means discarding objects that are outside the
view volume. In case of 2D clipping we want to discard lines or points that are outside the clipping window. Later on, we will see how to discard fill areas that are outside the clipping window as well. And in order to do that, we rely on algorithms that reduce extensive intersection point calculations.
One such algorithm we have learned today, that is the Cohen-Sutherland algorithm. It is quite efficient; however, it still retains some amount of complex calculations, which can be avoided with other, more efficient algorithms, namely those by Cyrus and Beck and by Liang and Barsky. Those algorithms we have not discussed.
(Refer Slide Time: 46:07)
If you want to learn about those you may refer to the reading material that is mentioned in this
slide. So, you may refer to this book. Go through chapter 7, section 7.1, today we discussed
section 7.1.1. However, if you are interested to learn about Liang Barsky algorithm, then you can
go through the next section as well, section 7.1.2 also. We will not discuss that algorithm but you
may go through it, if you want more information. That is all for today. Thank you and goodbye.
628
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 22
2D Fill-area Clipping and 3D Clipping
(Refer Slide Time: 00:46)
Hello and welcome to lecture number 22 in the course Computer Graphics. We are currently
discussing the 3D graphics pipeline. And the pipeline has got 5 stages. We have already
discussed object representation that is the first stage. Then modelling transformations - second
stage. Lighting or assigning colour - third stage. Currently, we are in the fourth stage that is
viewing pipeline. As you can see, it consists of 5 sub-stages. We have already discussed a few of those and are continuing our discussion on the remaining ones.
629
(Refer Slide Time: 01:20)
So, among those sub-stages, we have already discussed view transformation, projection transformation, and viewport transformation.
(Refer Slide Time: 01:34)
Two more operations are there, as we have seen in the pipeline; clipping and hidden surface
removal. Among them currently we are discussing clipping.
630
(Refer Slide Time: 01:48)
So, in the last lecture, we introduced the basic idea of clipping and also discussed 2D line clipping. So, we will continue our discussion on clipping. Today, we are going to discuss fill area clipping as well as 3D clipping.
(Refer Slide Time: 02:12)
631
So, what is this fill area clipping? As we mentioned, when we talk of clipping there is a clipping window, and earlier we discussed how to clip points and lines against this window. However, when we project objects, the projection may be in the form of a fill area, such as a polygon with a boundary.
Now clipping a fill area is different from clipping a point or a line, as we shall see in today's lecture. In fact, such situations are quite frequent in practice, where we have to clip polygons against the clipping window. So, it requires some mechanism to do that.
632
(Refer Slide Time: 03:12)
Now, what can be a very obvious and straightforward approach? Let us try to understand the situation. Suppose this is our clipping window and we are given a polygon after projection, something like this, say this triangle. We have to keep the part which is inside the clipping window, which I am showing shaded, and we have to clip out the outside part. How can we do that?
One way can be to use the line clippers that we discussed in the earlier lecture on each edge of the fill area, like here is one edge, one edge, one edge. We then perform clipping on the edges and decide on the clipped region. However, as you can see from this example, that is not necessarily an easy, efficient or good approach. Sometimes it is even difficult to understand how it works.
633
(Refer Slide Time: 04:45)
Instead, we require better approaches. There are in fact many efficient algorithms proposed for the purpose. In this lecture we are going to discuss two of those approaches. One is the Sutherland-Hodgeman algorithm and the other is the Weiler-Atherton algorithm. Let us try to understand these algorithms.
(Refer Slide Time: 05:13)
634
We will start with the Sutherland-Hodgeman algorithm. What does this algorithm do? Here we start with 4 clippers. Now, these clippers are essentially the lines that define the window boundary. For example, if this is my window boundary, then each of the lines defining the boundary is a clipper.
So, there are 4 clippers in 2D clipping, that is, right, left, above and below. Now, each clipper takes as input a list of ordered pairs of vertices, which essentially indicate the edges; each pair of vertices indicates an edge. From that input list it produces another list of output vertices; that is the basic idea. So, there are 4 clippers, each clipper takes as input a list of ordered pairs of vertices where each pair of vertices represents an edge, and then it performs some operations to produce an output list of vertices.
635
(Refer Slide Time: 06:55)
Now, when we perform these operations, we impose some order of checking against each clipper; that can be any order. Here in this discussion we will assume the order: left clipper first, then right clipper, then bottom clipper, and at the end the top or above clipper.
(Refer Slide Time: 07:20)
Now, as we said, we start with the left clipper. Its input set is the original polygon vertices or, in other words, the original polygon edges represented by pairs of vertices; that is the input set to the first or left clipper.
636
(Refer Slide Time: 07:44)
Now, to create a vertex list as output or also to provide the input vertex list, we need to follow a
naming convention, whether to name the vertices in a clockwise manner or anticlockwise
manner. Here again, we will assume that we will follow an anticlockwise naming of vertices.
With these conventions, let us denote input vertex list to a clipper by the set V having these
vertices.
637
(Refer Slide Time: 08:31)
Now, for each edge, that is, each pair of vertices in the list denoted by (vi, vj), we perform some checks, and based on the check results we take some action. So, what are those checks?
638
(Refer Slide Time: 08:52)
If vi is inside and vj is outside of the clipper then we return the intersection point of the clipper
with the edge represented by the vertex pair vi, vj. If both vertices are inside the clipper, then we
return vj.
639
(Refer Slide Time: 09:25)
If vi is outside and vj is inside of the clipper, then we return two things. One is the intersection
point of the clipper with the edge represented by the pair vi, vj and also vj. Both the things we
return intersection point and vj. And finally, if both vertices are outside the clipper then we do
not return anything, we return null.
640
(Refer Slide Time: 10:05)
Now, here we have used the terms inside and outside. How are they defined? In fact, these terms are to be interpreted differently for different clippers. There is not a single meaning to these terms; we define them based on the clipper.
(Refer Slide Time: 10:27)
Let us now go through this definition for each of the 4 clippers. For the left clipper, when we talk of inside we mean that the vertex is on the right side of the clipper, and when we talk of outside we mean that it is on the left side of the clipper. For the right clipper it is just the opposite: when the vertex is on the left side, we call it inside; otherwise it is outside. For the top clipper, if a vertex is below the clipper it is inside; otherwise it is outside. And for the bottom clipper it is again just the opposite of the top clipper: an inside vertex is above the clipper, whereas outside means it is below. And how do we determine whether a vertex is on the right side or left side, or above or below? Just by comparing the coordinate values of the vertex with respect to the particular clipper.
For example, suppose this is the top clipper and its equation is given by y = 4. Now suppose a point is denoted by (3, 5). We check the y-value of the point, that is 5; clearly 5 is greater than 4, which is the y-value of the top boundary. Then we can say that this point is outside because it is above the clipper. Similarly, we can determine inside and outside by comparing the x or y coordinate value of the vertex with the clipper value.
(Refer Slide Time: 12:55)
If the vertex is on the clipper then it is considered inside in all the cases. So, for the left clipper, inside means either it is on the right side or on the clipper; otherwise it is outside. And the same is true for all the other clippers.
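Before moving to the example, here is a small Python sketch of one Sutherland-Hodgeman pass, following the four per-edge rules and the inside conventions above. Representing each clipper as a pair of small functions (an inside test and an intersection routine) is an illustrative choice for this sketch, not the notation of the lecture; the clipper order follows the left, right, bottom, top order mentioned earlier.

def clip_against(vertices, inside, intersect):
    out = []
    n = len(vertices)
    for k in range(n):
        vi, vj = vertices[k], vertices[(k + 1) % n]   # edge (vi, vj), anticlockwise order
        if inside(vi) and inside(vj):
            out.append(vj)                            # both inside: return vj
        elif inside(vi) and not inside(vj):
            out.append(intersect(vi, vj))             # inside to outside: intersection only
        elif not inside(vi) and inside(vj):
            out.append(intersect(vi, vj))             # outside to inside: intersection and vj
            out.append(vj)
        # both outside: return nothing
    return out

def clip_polygon(vertices, xmin, xmax, ymin, ymax):
    # a vertex on the clipper counts as inside, hence the non-strict comparisons below
    def x_cut(p, q, x):   # intersection of edge pq with a vertical clipper x = constant
        t = (x - p[0]) / (q[0] - p[0])
        return (x, p[1] + t * (q[1] - p[1]))
    def y_cut(p, q, y):   # intersection of edge pq with a horizontal clipper y = constant
        t = (y - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y)
    clippers = [
        (lambda v: v[0] >= xmin, lambda p, q: x_cut(p, q, xmin)),   # left clipper
        (lambda v: v[0] <= xmax, lambda p, q: x_cut(p, q, xmax)),   # right clipper
        (lambda v: v[1] >= ymin, lambda p, q: y_cut(p, q, ymin)),   # bottom clipper
        (lambda v: v[1] <= ymax, lambda p, q: y_cut(p, q, ymax)),   # top clipper
    ]
    for inside, intersect in clippers:
        vertices = clip_against(vertices, inside, intersect)
        if not vertices:
            break
    return vertices

Keeping the per-edge rules in one routine and only swapping the inside test and the intersection computation per clipper is the main structural idea of the algorithm.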
642
(Refer Slide Time: 13:21)
Now, let us try to understand this algorithm in terms of an illustrative example. Let us consider this situation: here we have defined one clipping window and we have a fill area. This fill area is defined by the vertices {1, 2, 3}; as you can see here, we followed a counter-clockwise or anticlockwise naming convention to list the vertices.
Our objective is to determine the clipped polygon, that is, the polygon denoted by the vertices {2’, 3’, 3’’, 1’, 2}. And we do that by following the Sutherland-Hodgeman algorithm.
643
(Refer Slide Time: 14:44)
So, at the beginning we start with the left clipper. Then we check against right clipper, then top
clipper and then bottom clipper.
(Refer Slide Time: 15:06)
Let us see what happens after checking against the left clipper. Here the input vertex list is the original vertices, that is {1, 2, 3}, which indicates the three edges represented by the vertex pairs {1, 2}, {2, 3} and {3, 1}. For each pair we perform the check. For the pair {1, 2} we can see that both vertices are on the right side of the left clipper, which means both are inside. So, Vout is {2} as per the algorithm.
Similarly, after checking {2, 3} against the left clipper, Vout becomes {2, 3}. And after checking {3, 1} the final list becomes {2, 3, 1}. Against the left clipper, all the vertices are inside in all the cases.
(Refer Slide Time: 16:29)
Now, let us check against the right clipper. The input vertex list is the same, {1, 2, 3}, and initially Vout is NULL. The pairs of vertices to be checked are {1, 2}, {2, 3} and {3, 1}; all three edges we need to check. For {1, 2} both vertices are inside the right clipper, which we can check by comparing their coordinate values, because both of them are on the left side of the right clipper. So, Vout is now {2}. Then we check the pair {2, 3}; here we can see that 2 is inside whereas 3 is outside. In that case we compute the intersection point 2’, this point, and then set Vout to be {2, 2’}.
Then we check {3, 1}; here vertex 3 is outside because it is on the right side of the clipper, whereas 1 is inside because it is on the left side. So, here we calculate the intersection point 3’ and also return 1, and the output vertex list becomes {2, 2’, 3’, 1}. So, after checking against the right clipper, we get this output list: {2, 2’, 3’ (that means this point), 1}.
645
(Refer Slide Time: 18:40)
Then we check against the top clipper. In this case, Vin, the input vertex list, is the output vertex list after checking against the right clipper, that is {2, 2’, 3’, 1}. Initially Vout is NULL. The pairs of vertices we need to check are four: {2, 2’}, {2’, 3’}, {3’, 1} and {1, 2}.
So, first we check {2, 2’} against the top clipper and find that both 2 and 2’ are inside, because both of them are below the clipper, so the output list becomes {2’}. Then we check the next vertex pair {2’, 3’}; again both 2’ and 3’ are below, so inside, and Vout becomes {2’, 3’}.
Then we check {3’, 1}; in this case we see that 3’ is inside whereas 1 is outside. We calculate the intersection point 3’’ here and modify our output list to be {2’, 3’, 3’’}. Finally, we check {1, 2}. Here we see that 1 is outside whereas 2 is inside. Then again we calculate the intersection point 1’ and modify Vout to be {2’, 3’, 3’’, 1’, 2}. So, this is our output list after checking against the top clipper, and it serves as the input list to the remaining clipper to be checked, that is the bottom clipper.
646
(Refer Slide Time: 20:43)
This is the input list for the bottom clipper, and the output list is initially NULL. As we can see, all these vertices 2’, 3’, 3’’, 1’ and 2 are inside because they are above the bottom clipper. So, the output list becomes the same, that is {2’, 3’, 3’’, 1’, 2}. This is also the output of the algorithm, because there are no more clippers to check against and the algorithm stops. At the end of the algorithm, we get this vertex list, which represents the clipped region. That is how the algorithm works.
647
(Refer Slide Time: 21:43)
Now let us move to our next algorithm, that is the Weiler-Atherton algorithm. The Sutherland-Hodgeman algorithm that we just discussed works well when the fill area is a convex polygon and it is to be clipped against a rectangular clipping window. So, if these conditions are satisfied, then the Sutherland-Hodgeman algorithm works well.
648
(Refer Slide Time: 22:16)
However, that need not always be the case, and the Weiler-Atherton algorithm provides a more general solution. This algorithm can be used for any polygon, either concave or convex, against any polygonal clipping window; it need not be only a rectangle. Let us see how it works.
(Refer Slide Time: 22:44)
So, we will try to understand the algorithm in terms of an example rather than formal steps. Let us consider this scenario: here we have a rectangular clipping window and a fill area. We will try to understand how the algorithm helps us identify the part to be discarded, that is this region, and the parts to be kept after clipping, that is these two regions, this one and this one. We start by processing the fill area edges in a particular order, which is typically the anticlockwise order.
(Refer Slide Time: 24:00)
So, what do we do in the processing? We check the edges one by one and continue along the edges till we encounter an edge that crosses to the outside of the clip window boundary. Let us start with this edge (1, 2), this edge. We check whether it crosses the window boundary or not; that is our processing. It does not cross, so we continue to the next edge, that is the edge represented by the vertex pair {2, 3}.
Now this edge crosses to the outside of the window boundary. Note that here we are following the anticlockwise order. If the edge does not cross to the outside, that is, if the edge is crossing into the inside of the window, then we just record the intersection point, whereas if the edge is crossing to the outside, then we stop and perform a different action. What do we do in that case?
650
(Refer Slide Time: 25:24)
At the intersection point, we make a detour. Here the intersection point is 2’, this point, so we make a detour there. We no longer continue along this direction; instead, we now follow the edge of the clip window along the same direction, maintaining the traversal order.
So, in this example we follow the anticlockwise direction and make a detour from here along the window boundary, so here we follow this order. Essentially, we initially traversed this way, then while traversing found that this edge crosses to the outside, so we traverse this way along the window boundary instead of continuing along the polygon edge.
651
(Refer Slide Time: 26:25)
Now, this along-the-boundary traversal we continue till we encounter another fill area edge that crosses to the inside of the clip window. Here, as you can see, if we follow an anticlockwise traversal, then this edge actually crosses to the inside. The edge is the one denoted by the vertex pair (6, 1), which crosses to the inside of the window, and we encounter it while traversing along the window boundary. At that point, what do we do?
652
(Refer Slide Time: 27:17)
At that point, we resume the polygon edge traversal, again along the same direction. So, we stop here and then continue along the same direction till we encounter a previously processed intersection point. Here we continue up to point 1, because point 1 has already been processed, and we stop there.
(Refer Slide Time: 27:53)
So, after this part, we see that we started from here, traversed up to this point, determined this intersection point, then traversed along this line up to this intersection point, and traversed back up to the originating point. There are two rules of traversal: from an intersection point due to an outside-to-inside fill area edge we should follow the polygon edges, and from an intersection point due to an inside-to-outside fill area edge we should follow the window boundaries.
These are the rules we applied while performing the traversal. This gives us one part of the clipped area, that is this part, and apparently the traversal stopped here. So, how do we get the other part? Actually, the algorithm does not stop here. What happens next?
(Refer Slide Time: 29:00)
Before we go into that, you should also remember that whenever we traverse, the traversal direction remains the same, irrespective of whether we are traversing along an edge or along the window boundary. So, if we are following an anticlockwise direction, it should be anticlockwise always.
654
(Refer Slide Time: 29:21)
And after this traversal ends, the output is the vertex list representing a clipped area, as we have seen. In this case the traversal ended at 1, so we get the vertex list {1, 2, 2’, 1’}, which gives us this clipped area.
(Refer Slide Time: 29:46)
But clearly, the whole fill area is not yet covered; some of the vertices are still not processed, so what do we do? If all the vertices are not processed, we resume the traversal along the polygon edges, in the same direction, from the last intersection point of an inside-to-outside polygon edge. Our last intersection point of an inside-to-outside polygon edge is 2’ here. Remember that 1’ is on an outside-to-inside edge, so it is not applicable; what is applicable is 2’. From there we resume our traversal till we cover the remaining vertices.
This traversal is done in a similar way as before. What we do here is traverse along the anticlockwise direction to the vertex here; we traverse this edge, then this edge. Here, as you can see, there is an outside-to-inside crossing, so we do not do anything special and keep on traversing this way, this way. Now at this point we can see that there is an inside-to-outside crossing; in the earlier case it was outside to inside.
Here it is inside to outside, at 6’. So, now we traverse along the clip window edge. Then we encounter this intersection point again; this is from outside to inside, so now we resume our traversal along the polygon edge. Finally, what we did is traverse this direction, this direction, this direction, then this direction, then this direction, this direction. Since we have already encountered 4 before, we stop our traversal here when we encounter 4. Then we get the remaining portion of the clipped area as well, just like the way we got the earlier one. That is how Weiler-Atherton works.
(Refer Slide Time: 32:19)
So, we discussed two algorithms: one is Sutherland-Hodgeman, the other is Weiler-Atherton. Sutherland-Hodgeman is simpler but has restricted use; it is applicable when we have a convex polygon clipped against a rectangular window. Weiler-Atherton is more generic: it is applicable for any polygonal fill area, either concave or convex, against any polygonal clipping window.
So far we have discussed clipping in 2D; we have learned how to clip a point, a line and a fill area. Now let us try to understand clipping in 3D, because here our main focus is the 3D graphics pipeline. Clipping in 3D is essentially an extension of the ideas that we have already discussed for 2D. Let us see how these extensions are done.
(Refer Slide Time: 33:42)
The only thing we have to keep in mind is that here we are talking about clipping against the normalized view volume, which is usually a symmetric cube with each coordinate in the range -1 to 1 in the 3 directions. That is the normalized view volume we assume while performing the clipping. Now, for Cohen-Sutherland, we can extend the basic 2D version to 3D with some modification.
657
(Refer Slide Time: 34:22)
Point clipping also we can extend, so let us first talk about point clipping. Here we check x, y and z (earlier we were checking only x and y) to see whether these values are within the range of the canonical volume. If so, then the point is kept; otherwise it is clipped out.
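As a sketch, this is a single range comparison; the function name is illustrative only.

def point_inside_view_volume(x, y, z):
    # keep the point only if all three coordinates lie within the canonical range [-1, 1]
    return -1 <= x <= 1 and -1 <= y <= 1 and -1 <= z <= 1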
(Refer Slide Time: 34:53)
The Cohen-Sutherland line clipping algorithm can also be easily extended to 3D clipping with some modifications; the core idea remains the same. That is, we divide the view coordinate space into regions. Earlier we had 9 regions; since we are now dealing with 3D, we have 27 regions, 3 times as many. And since we have 27 regions, each region needs to be represented with 6 bits, one bit for each of the 6 planes that define the canonical view volume: far, near, top, bottom, right, left. This is in contrast with the 4 bits earlier used to denote the 4 sides of the window.
For each plane we have 9 regions defined: there are 9 regions behind the far plane, 9 regions between the near and far planes, and 9 regions in front of the near plane. Together there are 27 regions, and each region is represented with a 6-bit code, where bit 6 represents the far region, bit 5 the near region, bit 4 the top region, bit 3 the bottom region, bit 2 the right region and bit 1 the left region. The idea remains the same as in 2D; only the size changes because we are now dealing with 3D. The other steps remain the same.
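A small Python sketch of this 6-bit code, using the bit assignment just described (bit 1 = left up to bit 6 = far). Treating z < -1 as beyond the far plane and z > 1 as in front of the near plane is an assumption of this sketch, since that depends on the orientation convention fixed earlier.

def region_code_3d(x, y, z):
    code = 0
    if x < -1: code |= 1 << 0   # bit 1: left of the volume
    if x >  1: code |= 1 << 1   # bit 2: right of the volume
    if y < -1: code |= 1 << 2   # bit 3: below the volume
    if y >  1: code |= 1 << 3   # bit 4: above the volume
    if z >  1: code |= 1 << 4   # bit 5: in front of the near plane (assumed orientation)
    if z < -1: code |= 1 << 5   # bit 6: behind the far plane (assumed orientation)
    return code

The trivial accept and trivial reject tests then read exactly as in 2D: accept when both endpoint codes are zero, reject when their logical AND is non-zero.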
659
(Refer Slide Time: 37:21)
Now let us try to understand the extension of the algorithms for fill area clipping. Here, we first check if the bounding volume of the polyhedron (that is, the fill area) is outside the view volume, simply by comparing their maximum and minimum coordinate values in each of the x, y and z directions. If the bounding volume is outside, then we clip the object out entirely. Otherwise, we apply the 3D extension of the Sutherland-Hodgeman algorithm for clipping.
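A sketch of this bounding-volume rejection test; representing the bounding box by its minimum and maximum corner points is an illustrative choice.

def bounding_volume_outside(min_pt, max_pt):
    # min_pt and max_pt are the componentwise minimum and maximum (x, y, z) of the polyhedron
    for lo, hi in zip(min_pt, max_pt):
        if hi < -1 or lo > 1:     # entirely outside the canonical range in this direction
            return True
    return False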
660
(Refer Slide Time: 38:16)
Here also the core idea of the 3D Sutherland-Hodgeman algorithm remains the same as the 2D version, with two main differences. What are those differences?
(Refer Slide Time: 38:31)
A polyhedron is made up of polygonal surfaces, so here we take one surface at a time to perform clipping. Earlier we took one line at a time; here we take one surface at a time. Usually the polygons are divided into triangular meshes, and there are algorithms to do so, which you can refer to in the reference material at the end of this lecture. Using those algorithms, we can divide a polygon into a triangular mesh and then process one triangle at a time.
(Refer Slide Time: 39:17)
And the second difference is that instead of the 4 clippers we had earlier, we now have 6 clippers, which correspond to the 6 bounding surfaces of the normalized view volume, which is a cube. So, these are the differences between the 2D version and the 3D version of the algorithm: earlier we were considering one line at a time for clipping, now we consider one surface at a time. These surfaces are polygonal surfaces, and we can convert them into triangular meshes and then perform clipping one triangle at a time; that is one difference. The other difference is that earlier we were dealing with 4 clippers, and now we have 6 clippers representing the 6 bounding planes of the view volume, which is a cube. That is, in summary, the major difference between 2D clipping and 3D clipping; the core ideas remain the same, with some minor changes. With that, we come to the end of our discussion on clipping.
662
(Refer Slide Time: 40:43)
Our next topic will be hidden surface removal. A few things were omitted during this discussion, for example the creation of a triangular mesh from a given polygon. For those details you may refer to the material mentioned in the next slide.
(Refer Slide Time: 41:16)
So, whatever I have covered today can be found in this book. You can go through chapter 7, sections 7.1.3, 7.1.4 and section 7.2 for the topics I have covered. Outside these topics there are also a few interesting things that I did not discuss, but you can find them in the book, so you may like to go through that material as well. That is all for today. Thank you and goodbye.
664
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture - 23
Hidden Surface Removal - 1
Hello and welcome to lecture number 23 in the course Computer Graphics. So we are
currently discussing the graphics pipeline and there are five stages in the pipeline as we all
know.
(Refer Slide Time: 00:46)
The first stage is object representation, second stage is modeling transformation, third stage is
lighting, fourth stage is viewing pipeline, and the fifth stage is scan conversion. Among them,
we are currently in the fourth stage that is viewing transformation. The previous three stages
we have already discussed.
665
(Refer Slide Time: 01:16)
Now, in this fourth stage, as you may recollect there are many sub-stages. So there are 3
transformations and 2 sub-stages related to some other operations. So the three
transformations are view transformation, projection transformation, and viewport
transformation.
(Refer Slide Time: 01:47)
Then we have two operations, clipping and hidden surface removal. Now, among all these
transformations and operations, we have already covered the three transformations and
clipping.
666
(Refer Slide Time: 02:02)
Today, we are going to discuss the remaining operation that is hidden surface removal. Let us
see what is hidden surface removal, what is the basic idea, and how we can do this.
(Refer Slide Time: 02:22)
667
Earlier, during our discussion on clipping, we learned how to remove objects, fully or partially, that are outside the view volume; that we did using the clipping algorithms. Note that clipping was done on objects that are partially or fully outside the view volume. Sometimes we actually need to remove, again fully or partially, objects that are inside the view volume.
So in case of clipping, we are dealing with objects that are outside the view volume, whereas
in case of hidden surface removal, we deal with objects that are inside the volume. Now when
the objects are inside the volume, clearly, we cannot apply the clipping algorithm because
clipping algorithms are designed to detect the objects that are outside, either fully or partially.
(Refer Slide Time: 03:40)
668
Let us see one example. Consider this image. Here, there are two objects; this is the one and
this cylinder is the other one. Now ideally, there is one surface here. For realistic image
generation, if we are looking at this object from this direction then ideally, we should not be
able to see this surface represented by the dotted boundary line, this surface. So if the viewer
is located at this point here then this object A surface, which is behind this object B should
not be visible to the viewer.
So before we render the image to have the realistic effect, we should be able to eliminate this
surface from the rendered image.
(Refer Slide Time: 05:11)
Here in this case we cannot use clipping, because here we are assuming that both the objects
are within the view volume. So clipping algorithms are not applicable. What we require is a
different algorithm or a different set of algorithms. Now, these algorithms are collectively
known as hidden surface removal methods or alternatively, visible surface detection methods.
To note: with clipping, what do we do? We try to remove objects that are partially or fully outside the view volume. With hidden surface removal, we try to remove objects or object surfaces which are inside the view volume but which are blocked from view, due to the presence of other objects or surfaces, with respect to a particular viewing position.
669
(Refer Slide Time: 06:17)
So in case of hidden surface removal, we assume a specific viewing direction, because a surface hidden from a particular viewing position may not be hidden if we look at it from another direction. So only with respect to a viewing position can we determine whether a surface or an object is hidden or not.
(Refer Slide Time: 06:46)
Now, before we go into the details of the methods for hidden surface removal, we should keep in mind two assumptions that we will be making. First, we will use a right-handed coordinate system and assume that the viewer is looking at the scene along the negative Z direction. Secondly, the objects in the scene have polygonal surfaces, so all the object surfaces are polygonal. These two assumptions we make in order to explain the hidden surface removal methods.
(Refer Slide Time: 07:48)
Now let us go into the details of the methods that are there to detect and eliminate hidden
surfaces.
(Refer Slide Time: 08:00)
There are many methods, and all of them can be broadly divided into two types: one is the object space method, the second is the image space method. What is the idea behind these methods? Let us try to understand.
671
(Refer Slide Time: 08:24)
In case of object space method what we do, we compare objects or parts of the objects to
each other to determine the visible surfaces. So here we are dealing with objects at the level
of 3D.
(Refer Slide Time: 08:53)
And the general approach followed to perform hidden surface removal with an object space method broadly consists of two stages. For each object in the scene, we first determine those parts of the object whose view is unobstructed by other parts or by any other object with respect to the viewing specification.
So the first stage is to determine the parts that are not hidden with respect to the viewing position. Then, in the second stage, we render the parts that are not hidden, essentially those parts that are not obstructed, with the color of the object. These are the two general steps performed in any object space method: the first stage is to determine the surfaces that are not hidden, and in the second stage we render those surfaces or parts of the objects with the particular color.
(Refer Slide Time: 10:12)
Since here we are dealing with objects, so essentially these methods work before projection,
at the 3D object level. Remember that once we perform projection, the objects are
transformed to a 2D description, so we no longer have these 3D characteristics.
673
(Refer Slide Time: 10:41)
So what are the advantages? There is one advantage: object space methods are device-independent and work for any resolution of the screen. But they also have some drawbacks. First, determination of the surfaces that are hidden or not hidden is computation intensive. Secondly, depending on the complexity of the scene and the resources available, these methods can even become infeasible: because they are computation intensive, if the resources are not sufficient we may not be able to implement them at all.
(Refer Slide Time: 11:43)
674
Usually, such methods are suitable for simple scenes with a small number of objects. So object space methods are best applicable when the scene is simple and has a small number of objects.
(Refer Slide Time: 12:00)
In case of the image space method, what happens? As the name suggests, the detection and rendering take place at the level of the image, that means after projection. Here, visibility is decided point-by-point at each pixel position on the projection plane. So we are no longer dealing in 3D space; we are dealing on the 2D projection plane at the level of pixels.
(Refer Slide Time: 12:41)
675
Again, there are two steps in the general approach. For each pixel on the screen, we first determine the object that is closest to the viewer and is pierced by the projector through the pixel, essentially the closest object that is projected to that point. The second step is to draw the pixel with that object's color.
So in the first stage, we determine the closest object that is projected on the pixel, and in the second stage we assign the pixel color as the object color, and this we do for each pixel on the screen. To compare with what we were doing earlier: earlier we were doing it for each surface, here we are doing it for each pixel.
(Refer Slide Time: 13:58)
Clearly, the methods work after surfaces are projected and rasterized that means mapped to
pixel grid, unlike the previous case where we were in the 3D domain.
676
(Refer Slide Time: 14:20)
Here the computations are usually less compared to object space methods. However, the
method depends on display resolution because we are doing the computations for each pixel.
So if there is a change in resolution then we require re-computation of pixel colors. So that is
the overhead.
(Refer Slide Time: 14:52)
So broadly, there are these two methods, object space methods, and image space methods.
Later on, we will see examples of each of these methods which are very popular. But before
going into that let us try to talk about some properties that actually are utilized to come up
with efficient methods for hidden surface detection and removal.
677
(Refer Slide Time: 15:25)
So there are many such properties. Collectively, these properties are called coherence
properties, which are used to reduce the computations in hidden surface removal methods. As
we already talked about, these methods are computationally intensive. So if we use these coherence properties then some amount of computation can be reduced, as we shall see later.
Now these properties are essentially related to some similarities between images or parts of
the images and if we perform computation for one part then due to these properties, we can
apply the results on other parts and that is how we reduce computation.
So essentially, we exploit local similarities that means making use of results that we have
calculated for one part of a scene or an image for the other nearby parts. So we perform
computation for one part and use the result for other part without repeating the same
computation and in that way, we reduce some amount of computation.
678
(Refer Slide Time: 16:54)
Now, there are many such coherence properties; broadly, of six types, object coherence, face
coherence, edge coherence, scan line coherence, depth coherence, and frame coherence. Let
us quickly have a discussion on each of these for better understanding, although we will not go into detailed discussions of how they are related to the different methods.
(Refer Slide Time: 17:25)
First is object coherence. What does it tell us? Here, we check for visibility of an object with respect to another object by comparing their circumscribing solids, which in many cases are of simple forms such as spheres or cubes. Only if the solids overlap do we go for further processing. If there is no overlap, there are no hidden surfaces, so we do not need to do any further processing. This is a simple way of eliminating lots of computations due to the object coherence property.
(Refer Slide Time: 18:23)
Next comes face coherence. Here, surface properties computed for one part of a surface can be applied to adjacent parts of the same surface; that is the implication of the face coherence property. For example, if the surface is small, then we can assume that the surface is invisible to a viewer if one part of it is invisible. So we do not need to check invisibility for each and every part; we check it for one part and then simply say that the other parts will also be invisible if that part is invisible.
(Refer Slide Time: 19:10)
680
Then third is edge coherence. Here, this property indicates visibility of an edge changes only
when it crosses another edge. If one segment of a non-intersecting edge is visible, we can determine without further calculation that the entire edge is also visible.
us that there will be a change in visibility only if the edge intersects another edge. In other
words, if one segment of an edge is visible and the edge is not intersecting with any other
edge that means we can say that entire edge is also visible.
(Refer Slide Time: 20:11)
Then comes scan line coherence. What does it tell us? It implies that a line or surface segment visible in one scan line is also likely to be visible in the adjacent scan lines, so we do not need to perform the visibility computations for every scan line; we do it for one scan line and apply the result to adjacent scan lines.
681
(Refer Slide Time: 20:48)
Next is depth coherence, which tells us that the depths of adjacent parts of the same surface are similar; there is not much change in depth at adjacent parts of a surface. This information, in turn, helps us determine the visibility of adjacent parts of a surface without too much computation.
(Refer Slide Time: 21:26)
Then there is frame coherence, which tells us that pictures of the same scene at successive points in time are likely to be similar, despite small changes in objects and viewpoint, except near the edges of moving objects. That means visibility computations need not be performed for every scene rendered on the screen. So frame coherence is related to scene change: the earlier coherence properties were related to static images, whereas here we are talking of dynamic changes in images, and based on this coherence property we can conclude that visibility can be determined without computing it again and again for every scene.
So that is, in short, the six coherence properties. The first five properties are related to static images; the last property can be used for rendering animations, which anyway is not part of our lectures here. The hidden surface removal methods make use of these properties to reduce computations. Now let us go into the details of such methods.
(Refer Slide Time: 23:10)
So we start with a simple method that is called back face elimination method.
(Refer Slide Time: 23:19)
683
What is this method? This is actually the simplest way of removing a large number of hidden
surfaces for a scene consisting of polyhedrons. So here, we are assuming each object is a
polyhedron, and using back face elimination, we can remove a large number of hidden
surfaces. The objective is to detect and eliminate surfaces that are on the backside of objects with respect to the viewer.
When a surface is on the backside of an object with respect to a particular viewer, clearly that back surface should not be shown during rendering. With the back face elimination method, we can detect those back surfaces and then remove them from further consideration during rendering.
(Refer Slide Time: 24:18)
The steps are very simple for this particular method; there are three steps. In the first step, we determine a normal vector N for each surface, represented in terms of its scalar components a, b, c along the three axes. I am assuming here that you all know how to calculate the normal vector for a given surface; if not, you may refer to any basic book on vector algebra. It is a very simple process and I will not discuss the details here.
Once the normal is determined, we check its Z component. If this Z component, the scalar component c, is less than or equal to 0, then we eliminate that particular surface, because when the Z component is less than 0 the surface is a back face, whereas when it is equal to 0 the viewing vector grazes the surface; in that case also we consider it to be a back face.
If c is greater than 0, then we retain the surface; it is not a back face. We perform steps one and two for all the surfaces in a loop. So as you can see, it is a very simple method: we take one surface at a time, compute its surface normal and check the Z component, the scalar component c. If it is less than or equal to 0 then we eliminate the surface, otherwise we retain it, and we do this for all the surfaces.
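A sketch of this test in Python, assuming each surface is a triangle whose vertices are listed so that the usual cross product gives the outward normal; only the z component c of the normal is needed for the test, and it depends only on the x and y coordinates of the vertices.

def is_back_face(v0, v1, v2):
    # z component of the normal, i.e. of the cross product (v1 - v0) x (v2 - v0)
    c = (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v1[1] - v0[1]) * (v2[0] - v0[0])
    return c <= 0      # c <= 0: back face (or grazing), eliminate; c > 0: retain

# usage sketch: visible = [tri for tri in triangles if not is_back_face(*tri)]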
(Refer Slide Time: 26:24)
Let us consider one example. Suppose this is our object; it contains four surfaces, ACB, ADB, DCB, and ADC. For each of these surfaces, we perform the back face elimination method: we calculate the Z component of the surface normal as mentioned in the previous steps.
685
(Refer Slide Time: 27:05)
Let us do it for the surfaces. For ACB, the z component of the normal is -12, which is less than or equal to 0. So ACB is not visible; as you can see, from this side ACB is on the backside of the object.
(Refer Slide Time: 27:28)
For ADB, DCB, and ADC the z components of normal are -4, 4, and 2, respectively. So we
can see that for DCB and ADC the z component is greater than 0, so these are visible
surfaces. But for ADB, it is less than 0 again so it is not a visible surface. So that is the simple
way of doing it.
686
(Refer Slide Time: 28:05)
And you should note here that we are dealing with 3D description of the objects in the view
coordinate system, so it works on surfaces, therefore it is an object space method. In practice,
using this very simple method, we can eliminate about half of all the surfaces in a scene
without any further complicated calculations.
(Refer Slide Time: 28:45)
However, there is a problem with this method: it does not consider the obscuring of a surface by other objects in the scene. What we did was essentially eliminate the back faces of an object; a back face is obscured by surfaces of the same object. If a surface is not a back face but is obscured by a surface of some other object, then such surfaces cannot be detected using the back face elimination method and we require some other algorithm.
Those other algorithms are useful for detecting a surface that is obscured by other object surfaces, and we can use them in conjunction with this method.
(Refer Slide Time: 29:48)
Let us discuss one of those methods, the depth buffer algorithm, also known as the Z-buffer algorithm.
(Refer Slide Time: 29:58)
688
Now, this algorithm is an image space method. That means here we perform comparisons at
the pixel level. So we assume here that already the surfaces are projected on the pixel grid
and then we are comparing the distance of the surface from the viewer position.
(Refer Slide Time: 30:24)
Earlier, we mentioned that after projection the depth information is lost. However, we require
the depth information here to compare the distance of the surface from a viewer. So we store
that depth information even after projection and we assume an extra storage, which has this
depth information; this is called the depth buffer or Z-buffer. The size of this buffer is the same as that of the frame buffer, that is, there is one storage location for each pixel in the buffer.
(Refer Slide Time: 31:12)
689
Another assumption we make is that we are dealing with canonical volumes, so the depth of any point cannot exceed the normalized range. We already have a normalized range for the volume, and we assume the depth cannot exceed that range. If we assume that, then we can fix the depth buffer size, or the number of bits per pixel. Otherwise, if we allow unrestricted depth, we do not know how many bits to keep and that may create implementation issues. So we go for this standardized consideration.
(Refer Slide Time: 32:02)
Now, the depth buffer algorithm is shown here. The input is the depth buffer, initialized to 1; the frame buffer, initialized to the background color; the list of surfaces; and the list of projected points for each surface. The output is the depth buffer and frame buffer with appropriate values; that means the depth buffer values will keep on changing, and the frame buffer will contain the final values at the end of the algorithm.
So what do we do? For each surface in the surface list, we perform some steps. For each surface, we have the projected pixel positions. For each projected pixel position (i, j) of the surface, starting from the top-left-most projected pixel position, we calculate the depth d of the projected point on the surface and then compare it with the already stored depth at that point.
If d is less than what is already stored in the depth buffer at the corresponding location, then we update the depth buffer entry and update the frame buffer entry with the color of the particular surface. This we continue for all the pixels of that projected surface, and we do it for all the surfaces. Now the crucial step here is the calculation of the depth; how do we do that?
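A sketch of this loop in Python. The data layout, in which each surface carries its projected pixel positions together with the depth at each of them and a single color, is an assumption made to keep the sketch self-contained; computing those depths is exactly the iterative procedure discussed next.

def z_buffer(surfaces, width, height, background):
    depth = [[1.0] * width for _ in range(height)]          # depth buffer, initialised to 1
    frame = [[background] * width for _ in range(height)]   # frame buffer, background colour
    for s in surfaces:
        # s["pixels"] is assumed to be a list of (i, j, d): column i, row j, depth d
        for i, j, d in s["pixels"]:
            if d < depth[j][i]:                             # closer than what is stored so far
                depth[j][i] = d
                frame[j][i] = s["color"]
    return frame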
(Refer Slide Time: 34:06)
We can do that iteratively. Let us see how.
(Refer Slide Time: 34:14)
Consider this scenario here; this is an illustrative way to understand the iterative method. We are considering this triangular surface, which after projection looks something like this. The surface equation we know, which we can represent as ax + by + cz + d = 0. Again, if you are not familiar with this, you may refer to basic textbooks on vectors and planes.
Given this surface equation, we can find the z value in terms of a, b, c as shown here, and the z value is the depth of that particular point, so this gives the depth of any point on the surface. Now, we are assuming the canonical view volume, which means all projections are parallel projections. So the projection is a simple one: if a point is (x, y, z), then after projection it becomes (x, y); we drop the z component. That is our assumption.
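To state the depth relation explicitly (it is shown only on the slide, but it follows directly from the plane equation written above): since ax + by + cz + d = 0, the depth of the surface point that projects to (x, y) is z = -(ax + by + d)/c, assuming c is not zero.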
(Refer Slide Time: 35:53)
Now let us consider one projected pixel (i, j) of the particular surface, so x is i and y is j. The depth z of the original surface point is then given by this expression, where we replace x and y with i and j; a, b, c, and d are constants.
(Refer Slide Time: 36:23)
692
Now, as we progress along the same scan line, say from this point to this point, the next pixel is at (i+1, j). The depth of the corresponding surface point at the next pixel location is given by this expression, where we replace i with i+1. After expanding and rearranging, we get this form.
This part we already know to be the depth of the point (i, j). So we can simply say the new depth is z’ = z - a/c, and as you can see, a/c is a constant term. So for successive points along a scan line, we can compute the depth by simply subtracting a constant term from the previous depth; that is the iterative method.
(Refer Slide Time: 37:39)
So that is along a scan line; what happens across scan lines? A similar iterative method can be formulated. Let us take a point (x, y) on an edge of a projected surface, say here. In the next scan line, x becomes (x - 1/m) and the y value becomes (y - 1), where m is the slope of the edge line.
693
(Refer Slide Time: 38:22)
Then we can compute the new depth at this new point as shown here, and if we expand and rearrange, what do we get? The new depth in terms of the previous depth and a constant term: z’ = z + (a/m + b)/c. So again we see that across scan lines we can compute the depth at the edges by adding a constant term to the previous depth, and then along a scan line we can continue by subtracting a constant term from the previous depth.
So this is the iterative way of computing depth and this method we follow in the Z-buffer
algorithm to compute depth at a given point. Let us try to understand this algorithm with an
illustrative example.
(Refer Slide Time: 39:40)
694
Let us assume there are two triangular surfaces s1 and s2; clearly, they are in the view volume. The vertices of s1 are given as these three vertices, and those of s2 as these three vertices. As before, we are assuming parallel projection due to the canonical view volume transformation, and we can derive the projected vertices of s1 and s2 on the view plane, denoted by the vertices shown here, which we obtain by simply dropping the z component. This is the situation shown in this figure.
(Refer Slide Time: 40:48)
Now, we are given a point (3, 1) and we want to determine the color at this point. Note that
this point is part of both the surfaces, so which surface color it should get, we can determine
using the algorithm. Now let us assume that cl1 and cl2 are the colors of s1 and s2 and bg is the
background color.
Initially, the depth buffer values are set to a very high value and the frame buffer values are set to the background color. Then we follow the algorithm steps and process the surfaces one at a time, in the order s1 followed by s2.
695
(Refer Slide Time: 41:40)
Now let us start with s1. What happens? From the given vertices, we can determine the s1 surface equation to be x + y + z - 6 = 0. Then we determine the depth of the left-most projected surface pixel on the topmost scan line, that is pixel (0, 6) here, and the depth z comes out to be 0.
(Refer Slide Time: 42:14)
This is the only point on the topmost scan line, as you can see in the figure. Then we move to the next scan line below, that is y = 5. Using the iterative method, and using this expression, we determine the depth of the left-most projected pixel on this scan line to be 1, because here m is very high, infinity. The surface equation is as shown.
(Refer Slide Time: 43:00)
696
Then the algorithm proceeds to the computation of depth and color determination along y= 5
till the right edge. At that point, it goes to the next scan line down that is y=4 here. Now, we
can skip all these steps and we can go directly to y=1, this line on which the point of interest
lies.
(Refer Slide Time: 43:34)
Now, following the iterative procedure that we outlined earlier for moving across scan lines, we first compute the depth of the left-most point here as z = 5. We skip those steps; you can do the calculations on your own and verify. Then we move along this scan line in this direction: we go to the next point here, then here, and so on up to the point (3, 1), and calculate that at this point z is 2.
This depth value is less than the already stored value, which is a very high value. So we set this value at the corresponding depth buffer location and then reset the frame buffer value from the background color to the color of surface s1.
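As a quick numerical check of these iterative steps (this snippet is only a verification aid, not part of the lecture): the plane of s1 is x + y + z - 6 = 0, so a = b = c = 1, and the left edge of the projected triangle is vertical, so the a/m term vanishes.

a, b, c = 1, 1, 1
z = 0.0                      # depth at the top-most projected pixel (0, 6)
for _ in range(5):           # move down the left edge from y = 6 to y = 1
    z = z + (0 + b) / c      # vertical edge: 1/m is 0
print(z)                     # 5.0, depth at the left-most pixel of scan line y = 1
for _ in range(3):           # move along y = 1 from x = 0 to x = 3
    z = z - a / c
print(z)                     # 2.0, depth at pixel (3, 1), matching the value above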
(Refer Slide Time: 44:51)
Then our processing continues for all points, but those are not of much relevance here because we are concerned only with this point, so we will skip that processing. Once the processing completes for all the projected points of s1, we go to s2 and perform similar iterative steps. We then find the depth at that particular point for s2, perform the comparison and assign the color accordingly.
We skip all the other calculations here; it is left as an exercise for you to complete them. That is the idea behind the depth buffer algorithm, or the Z-buffer algorithm.
698
(Refer Slide Time: 45:54)
Now there is one point to note. With this particular algorithm, a pixel can have only one surface color: given multiple surfaces, a pixel can take the color of only one of them at a time. That means from any given viewing position only one surface is visible at that pixel. This situation is acceptable if we are dealing with opaque surfaces.
(Refer Slide Time: 46:28)
If the surfaces are not opaque, that is, if they are transparent, then we definitely get to see multiple surfaces, which is not possible with this particular depth buffer algorithm. In case of transparent surfaces, the pixel color is a combination of the surface color plus contributions from the surfaces behind, and our depth buffer will not work in that case because we have only one location to store the depth value for each pixel. So we cannot store all the surface contributions to the color value.
(Refer Slide Time: 47:08)
There is another method, called the A-buffer method, which can be used to overcome this particular limitation. We will not go into the details of this method; you may refer to the reading material. That is, in short, what we can do with the depth buffer method.
So to recap, today we learned about the basic idea of hidden surface removal. We learned about different properties that can be utilized to reduce computations. Then we learned about the two broad classes of hidden surface removal algorithms: the object space methods and the image space methods. We learned about one object space method, the back face elimination method, and one image space method, the depth buffer algorithm or Z-buffer algorithm.
700
(Refer Slide Time: 48:14)
Whatever we have discussed today can be found in this book, you may refer to Chapter 8,
sections 8.1 to 8.4. And if you want to learn more about A-buffer algorithm then you may
also check section 8.5. That is all for today. Thank you and goodbye.
701
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture - 24
Hidden Surface Removal - 2
Hello and welcome to lecture number 24 in the course Computer Graphics. We are in the
process of learning about the 3D graphics pipeline, which has five stages.
(Refer Slide Time: 00:46)
What are those stages, let us recap. Object representation, modeling transformation, lighting,
viewing pipeline, and scan conversion. So we are currently in this fourth stage of discussion
that is the viewing pipeline.
702
(Refer Slide Time: 01:12)
There we covered three transformations that are sub stages of the fourth stage namely, view
transformation, projection transformation, viewport transformation.
(Refer Slide Time: 01:28)
Then there are two more operations. These are also sub stages of this fourth stage, clipping
and hidden surface removal.
703
(Refer Slide Time: 01:40)
Among them, we have already discussed clipping and we started our discussion on HSR or
hidden surface removal. So we will continue our discussion on HSR and conclude that
discussion today.
(Refer Slide Time: 01:54)
So in the last lecture, we talked about two hidden surface removal methods, namely back face elimination and the depth buffer algorithm. Today, we are going to talk about a few more hidden surface removal methods. We will start our discussion with another method, called the depth sorting algorithm.
704
(Refer Slide Time: 02:21)
Now, this depth sorting algorithm is also known by another popular name, the painter’s algorithm, and it works in both image and object space; that is, it works at the pixel level as well as at the surface level. And why is it called the painter’s algorithm? Because it tries to simulate the way a painter draws a scene.
(Refer Slide Time: 02:56)
Now this algorithm consists of two basic steps.
705
(Refer Slide Time: 03:06)
What is the first step? In the first step, it sorts the surfaces based on their depth with respect to the viewing position. To do that, we need to determine the maximum and minimum depth of each surface, and then we create a sorted surface list based on the maximum depth.
So we can denote this list in terms of the surfaces si, arranged in ascending order of depth; in this notation, the depth of si is less than the depth of si+1. That is the first stage of the algorithm.
(Refer Slide Time: 04:21)
706
In the next stage, we render the surfaces on the screen one at a time, starting with the surface having the maximum depth, that is, the nth surface in the list, down to the surface with the least or lowest depth.
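A minimal sketch of these two basic steps in Python is shown below; it ignores for the moment the overlap checks discussed next, and the surface attribute z_max and the draw_surface callback are assumed, illustrative names rather than anything from the lecture.

```python
# Sketch of depth sorting (painter's algorithm), without the overlap checks.
def depth_sort_and_render(surfaces, draw_surface):
    # Step 1: sort the surfaces in ascending order of maximum depth,
    # so that depth(s_i) < depth(s_{i+1}).
    ordered = sorted(surfaces, key=lambda s: s.z_max)
    # Step 2: render one surface at a time, starting with the surface
    # having maximum depth (the last one in the list) down to the nearest.
    for s in reversed(ordered):
        draw_surface(s)
```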
(Refer Slide Time: 04:45)
Now, during rendering, that is, during the second stage, we perform some comparisons. So when we are rendering a surface si, we compare it with all other surfaces in the list S to check for depth overlap. There is no depth overlap when the minimum depth of one surface is greater than the maximum depth of the other, as illustrated here: for these two surfaces there is no depth overlap because the minimum is greater than the maximum. However, here the minimum is not greater than the maximum, so there is a depth overlap.
707
(Refer Slide Time: 05:52)
If there is no overlap then render the surface and remove it from the list S.
(Refer Slide Time: 06:03)
In case there is overlap, we perform more checks. The first check is whether the bounding rectangles of the two surfaces do not overlap. The second check is whether surface si is completely behind the overlapping surface relative to the viewing position. The third check is whether the overlapping surface is completely in front of si relative to the viewing position.
708
And finally, the fourth check is whether the boundary edge projections of the two surfaces onto the view plane do not overlap. So there is a series of checks that we perform in case there is a depth overlap.
(Refer Slide Time: 07:06)
Let us try to understand these checks. The first check is that the bounding rectangles do not overlap. Now, how do we check for it? The rectangles do not overlap if there is no overlap in the x and y coordinate extents of the two surfaces.
Consider the situation here, with one surface and another surface. Here, the Xmin of one surface is less than the Xmax of the other, so there is overlap in x. If there were no overlap, Xmin would be higher than the Xmax of the other surface. If both the x and y extents overlap, then the condition fails, that means the bounding rectangles overlap. So we check the x and y coordinate extents and then decide whether the bounding rectangles overlap or not.
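As a rough illustration, the extent test might look like the following sketch, where the x_min, x_max, y_min, y_max attributes are assumed names for the projected coordinate extents of a surface.

```python
# Sketch of the first check: do the bounding rectangles of two surfaces overlap?
def bounding_rects_overlap(s1, s2):
    x_overlap = s1.x_min <= s2.x_max and s2.x_min <= s1.x_max
    y_overlap = s1.y_min <= s2.y_max and s2.y_min <= s1.y_max
    # The rectangles overlap only if both the x and y extents overlap;
    # the "no overlap" condition of the first check is the negation of this.
    return x_overlap and y_overlap
```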
709
(Refer Slide Time: 08:22)
The next check is whether surface si is completely behind the overlapping surface relative to the viewing position. Now, how do we determine this? We determine the plane equation of the overlapping surface, where the normal points towards the viewer.
Next, we check all vertices of si against the plane equation of that overlapping surface. If, for all vertices of si, the plane equation of the overlapping surface returns a value less than 0, then si is behind the overlapping surface; otherwise, it is not behind and this condition fails. The situation is depicted in this diagram.
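A possible sketch of this test is given below, assuming the plane of the overlapping surface is available as coefficients (A, B, C, D) with the normal pointing towards the viewer, and that each surface stores its vertices as (x, y, z) tuples; these names are illustrative, not from the lecture.

```python
# Sketch of the second check: is 'surface' completely behind the given plane?
def completely_behind(surface, plane):
    A, B, C, D = plane
    # The surface is behind the plane if the plane equation is negative
    # at every one of its vertices.
    return all(A * x + B * y + C * z + D < 0 for (x, y, z) in surface.vertices)
```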
(Refer Slide Time: 09:22)
710
The third condition that we check is whether the overlapping surface is completely in front of the surface of interest si, again relative to the viewing position. This we can check with plane equations, similarly to what we have done earlier.
This time we use the plane equation of si rather than that of the overlapping surface, and we use the vertices of the overlapping surface in the plane equation. If, for all vertices, the equation returns a positive value, then the overlapping surface is completely in front; otherwise, the condition fails. The situation is shown in this figure.
(Refer Slide Time: 10:19)
And finally, we check that the boundary edge projections of the two surfaces onto the view plane do not overlap; this is the final check. In order to check for this, we need the set of projected pixels for each surface and then check whether there are any common pixels in the two sets. The idea is illustrated here: if there are common pixels in the two sets, then there is definitely an overlap; otherwise, there is no overlap.
711
(Refer Slide Time: 11:00)
Now, as you can see here, this algorithm incorporates elements of both object space and image space methods. The first and the last checks are performed at the pixel level, so that is the image space part, whereas the other two, the second and the third, are performed at the object level. So the element of the object space method is also present.
(Refer Slide Time: 11:39)
Now, when we perform the tests, we follow the ascending order maintained in S and also the order of the checks that we have mentioned. As soon as any one of the checks is true, we move on to the check for overlap with the next surface in the list.
712
So essentially, what are we doing? Initially, we check for depth (Z) overlap of one surface with all the other surfaces; if there is no overlap, we simply render the surface. Otherwise, we perform the checks in the order given, and as soon as any check is true we move on to the next surface rather than continuing with the remaining checks.
(Refer Slide Time: 12:43)
Now, if all the tests fail, what happens in that case? We swap the order of the two surfaces in the list. This is called reordering, and then we restart the whole process from the beginning. So if all checks fail, we need to reorder the surface list and start from the beginning again.
(Refer Slide Time: 13:12)
713
Now, sometimes there are issues. Sometimes we may get surfaces that intersect each other. For example, see these surfaces: this is one surface, this is another surface, and they intersect each other. So in this example, one part of surface 1 is at a depth larger than that of surface 2, whereas the other part is at a depth smaller than that of surface 2, as you can see in this figure.
(Refer Slide Time: 13:48)
Now, in such situations we may face problem, we may initially keep surface 1 and surface 2
in a particular way that is surface 1 after surface 2 in the sorted list.
(Refer Slide Time: 14:10)
714
However, if you run the algorithm, you will see that for these surfaces all conditions fail, so we have to reorder. But that will not serve our purpose: even if we reorder, the conditions will fail again and we will have to reorder again.
So initially we have S1 followed by S2; next we will have S2 followed by S1; then we will have to reorder again to S1 followed by S2, and this will go on. We may end up in an indefinite loop because the surfaces intersect and the relative ordering of the two is difficult to determine.
(Refer Slide Time: 15:05)
In order to avoid such situations, what we can do is use an extra flag, a Boolean flag, for each surface. If a surface is reordered, then the corresponding flag is set ON, which indicates that the surface has already been reordered once.
715
(Refer Slide Time: 15:29)
Now, if the surface needs to be reordered again the next time, we do the following. We divide the surface along the intersection line and then add the two new surfaces to the sorted list at the appropriate positions. So when a surface needs to be reordered again, we know that there is an intersection; we then divide the surface along the intersection line and add two new surfaces instead of one to the list, in sorted order.
Of course, these steps are easy to state but require a lot of computation; however, we will not go into the details, we just give the idea rather than the details of how to do it. So that is the basic idea of the painter's algorithm.
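The following rough sketch shows how the depth-overlap test, the four checks, and the reordering flag might fit together for one surface against the remaining surfaces; all helper names (depth_overlap, the entries of checks, split_surface, the reordered attribute) are hypothetical placeholders for the steps described above, not part of any standard library.

```python
# Rough sketch of the per-surface decision logic in the painter's algorithm.
def resolve_surface(s, others, depth_overlap, checks, split_surface):
    for other in others:
        if not depth_overlap(s, other):
            continue                        # no depth overlap: nothing to resolve
        if any(check(s, other) for check in checks):
            continue                        # one of the four checks succeeded
        if other.reordered:                 # already swapped once: surfaces intersect
            return ("split", split_surface(other))
        other.reordered = True
        return ("swap", other)              # reorder and restart from the beginning
    return ("render", s)                    # safe to render s now
```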
(Refer Slide Time: 16:15)
716
We will discuss one more algorithm Warnock’s algorithm.
(Refer Slide Time: 16:29)
This is actually part of a group of methods for hidden surface removal which are collectively known as area subdivision methods, and they all work on the same general idea. And what is that idea?
(Refer Slide Time: 16:56)
So we first consider an area of the projected image.
717
(Refer Slide Time: 17:05)
Then if we can determine which polygonal surfaces are visible in the area then we assign
those surface colors to the area. Of course, if we can determine then our problem is solved.
So that determination is the key issue here.
(Refer Slide Time: 17:28)
Now, if we cannot determine that, we recursively subdivide the area into smaller regions and apply the same decision logic on the sub regions. So it is a recursive process.
718
(Refer Slide Time: 17:44)
Warnock's algorithm is one of the earliest subdivision methods developed.
(Refer Slide Time: 18:02)
In this algorithm, we subdivide a screen area into four equal squares. As you can see this is
the region which we divide into four equal squares P1, P2, P3 and P4 then we perform
recursion.
719
(Refer Slide Time: 18:28)
We check for visibility in each square to determine pixel colors in the square region. So we
process each square at a time.
(Refer Slide Time: 18:40)
And in this processing, there are three cases to check. Case 1 is that the current square region being checked does not contain any surface. In that case, we do not subdivide the region any further, because it does not contain any surface there is no point in further checking, and we simply assign the background color to the pixels contained in this sub region.
720
(Refer Slide Time: 19:13)
In case 2, the nearest surface completely overlaps the region under consideration; that means the region is completely covered by the surface that is closest to the viewer. In this case also, we do not subdivide the square further; instead, we simply assign the surface color to the region, because it is completely covered by the surface. Note that here we need to determine the nearest surface and then determine the extent of this surface after projection, so that we can check whether it completely covers the sub region.
(Refer Slide Time: 20:05)
And there is case 3, where neither case 1 nor case 2 holds. In this case, we perform recursion: we recursively divide the region into four sub regions and then repeat the checks. Recursion
721
stops when either of the cases is met or the region size becomes equal to the pixel size. For example, here, as you can see, we subdivided into four more sub regions P31, P32, P33, and P34. Then we performed another recursion, again dividing a sub region into four sub regions.
And we continue till either condition 1 or 2 is met, or the sub region size becomes equal to the pixel size, that is, the smallest size possible. So this is the idea of the algorithm: we assume that we have the projected image, and then we divide it into four sub regions at a time and perform the recursive steps.
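A simplified, runnable sketch of this recursive subdivision is given below; for illustration only, each surface is reduced to a dict holding a projected bounding box, a depth, and a color, whereas a real implementation would test actual polygon coverage. The screen region is assumed to be a power-of-two square.

```python
# Simplified sketch of Warnock-style area subdivision.
BACKGROUND = (0, 0, 0)

def overlaps(s, x, y, size):
    return not (s["x_max"] < x or s["x_min"] > x + size - 1 or
                s["y_max"] < y or s["y_min"] > y + size - 1)

def covers(s, x, y, size):
    return (s["x_min"] <= x and s["x_max"] >= x + size - 1 and
            s["y_min"] <= y and s["y_max"] >= y + size - 1)

def warnock(x, y, size, surfaces, out):
    visible = [s for s in surfaces if overlaps(s, x, y, size)]
    if not visible:                               # case 1: no surface in region
        out[(x, y, size)] = BACKGROUND
        return
    nearest = min(visible, key=lambda s: s["depth"])
    if covers(nearest, x, y, size) or size == 1:  # case 2 (or pixel-sized region)
        out[(x, y, size)] = nearest["color"]
        return
    half = size // 2                              # case 3: recurse on four squares
    for qx, qy in [(x, y), (x + half, y), (x, y + half), (x + half, y + half)]:
        warnock(qx, qy, half, surfaces, out)
```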
(Refer Slide Time: 21:27)
So with that, we have come to the conclusion of our discussion on hidden surface removal.
(Refer Slide Time: 21:37)
722
Now, before we conclude, a few things should be noted here. Hidden surface removal is an important operation in the fourth stage, but it involves lots of complex operations, and we exploit the coherence properties to reduce these complexities. These are the things that we should remember.
(Refer Slide Time: 22:05)
Also, we should remember that there are many methods for hidden surface removal and
broadly, they are of two types, object space method and image space method.
(Refer Slide Time: 22:16)
Among these methods, we covered four: back face elimination, which is an object space method; the Z-buffer algorithm, an image space method; the painter's algorithm, a
723
mix of image space and object space methods; and Warnock's algorithm, which is an image space method. There are other approaches, of course.
(Refer Slide Time: 22:42)
One popular approach, which is an object space method, is the octree method, which we will not discuss in detail; you may refer to the learning material. So we have covered the fourth stage and all its sub stages, namely the three transformations, that is, view transformation, projection transformation, and viewport transformation, and also the two operations, clipping and hidden surface removal.
(Refer Slide Time: 23:19)
724
Whatever we have discussed so far can be found in this book. You may refer to chapter 8,
sections 8.6 and 8.7. Also, if you are interested to learn more about another object space
method that is the Octree method you may check section 8.8 as well.
So that is all for today. In the next lecture, we will start our discussion on the next stage of the
pipeline that is scan conversion. Till then, thank you and good bye.
725
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture - 25
Scan Conversion of Basic Shapes - 1
Hello and welcome to lecture number 25 in the course Computer Graphics. We are currently
discussing the 3D graphics pipeline which consists of five stages. Let us quickly recap the stages.
(Refer Slide Time: 00:52)
As you can see in this figure, first stage is object representation, then we have modeling
transformation, then lighting or the coloring of objects, then viewing pipeline, and the fifth stage
is scan conversion. Among them, we have already discussed the first four stages namely, object
representation, modeling transformation, lighting, and the viewing pipeline.
726
(Refer Slide Time: 01:23)
Now we are going to discuss the fifth stage, that is, rendering, also known as scan conversion. So what is this stage all about?
(Refer Slide Time: 01:40)
Let us have a look at the very basic problem that we try to address in this fifth stage.
727
(Refer Slide Time: 01:48)
So far, whatever we have learned that gives us some idea of one particular thing. Through these
four stages that we have discussed so far, we can transform a 3D scene to a 2D viewport
description, which is in the device coordinate system. Just quickly have a relook at how it is done
as we have learned in our previous discussions.
(Refer Slide Time: 02:19)
So first, we have a 3D scene say, for example, this cube. Now, this cube is defined in its own
coordinate system or local coordinate, which we do in the first stage that is object representation.
Then what we do, we transfer it to a world coordinate system through modeling transformation.
728
So this is stage one: the object is defined in its local coordinate system. In the second stage, through modeling transformation, we transfer it to the world coordinate description, so the object is now in world coordinates.
Then we assign colors by using the lighting and shading models; so we color the object in stage three. After that, we perform the viewing pipeline, which is stage four, in which we transfer it to a viewport description. This involves first transferring the world coordinates to a view coordinate system; then, from the view coordinates, we perform projection transformation and transfer it to the view plane; then, from the view plane, we perform a window-to-viewport mapping to transfer it to the viewport, which is in the device coordinate system.
So these three transformations take place in stage four along with, of course, clipping and hidden
surface removal. And after that what we get is a 2D representation of the object or scene on a
viewport, which is in the device coordinate system. This is how things get transformed from
object definition to viewport description.
(Refer Slide Time: 05:10)
However, the device coordinate system that we are talking about is a continuous system that
means the coordinate values of any point can be any real number. So we can have coordinate like
2, 3, which are integers, whereas, we can also have a coordinate like 2.5, 3.1, which are real
numbers. So all sorts of coordinates are allowed in device coordinate system.
729
(Refer Slide Time: 05:47)
In contrast, when we are actually trying to display it on a screen, we have a pixel grid that means
it is a discrete coordinate system. So all possible coordinates are not allowed, instead, we must
have something where only integer coordinates are defined. So whatever we want to display on
the pixel grid must be displayed in terms of integer coordinates, we cannot have real coordinate
values.
(Refer Slide Time: 06:35)
Thus what we need? We need to map from the viewport description, which is a continuous
coordinate space to a pixel grid, which is a discrete coordinate space. So this is the final mapping
730
that we need to do before a scene is rendered on a physical display screen. Now, these mapping
algorithms or the techniques that we use for mapping are collectively known as rendering or
more popularly, they are called scan conversion or sometime rasterization as we are mostly
dealing with raster scan devices. So all these three terms are used, rendering, scan conversion, or
rasterization.
(Refer Slide Time: 07:34)
So what can be the straightforward approach to do this? You may think it is pretty simple. What
we can do is simply round off the real coordinates to the nearest integer coordinates. For
example, if we have a coordinate value like (2.3, 2.6), we can round it up to (2, 3) that is the
nearest integer coordinate values.
However, this may be good for converting points or mapping points from continuous coordinate
space to discrete coordinate space, however, same scheme may not be good for lines, circles, or
other primitive shapes that are required for rendering a scene. Now, let us try to understand how
then we can take care of scan conversion of lines, circles, or other primitive shapes.
731
(Refer Slide Time: 08:53)
Let us start with line scan conversion, how we can basically map a line defined in a continuous
coordinate space to a line defined in a discrete coordinate space.
(Refer Slide Time: 09:13)
We will start with a very simple and intuitive approach and then, we will try to understand the
problem with the intuitive approach, and then, we will introduce better and better approaches.
Now, we all know that we can define a line segment in terms of its endpoints. So to scan convert it, what do we need? We need to first map the endpoints of the line to the appropriate pixels, and also the other points that lie on the line to the appropriate pixels.
732
(Refer Slide Time: 10:06)
Now, let us go through a very simple approach, how we can map the points that are on the line to
the nearest pixels in the pixel grid.
(Refer Slide Time: 10:24)
So we can follow a four-step approach. In the first step, we map the end points to pixels simply
by rounding off to the nearest integer. In that way, we get the starting and ending pixels for the
line segment. So we now know which pixels are defining the line. Then in the second step, we
take one endpoint having the lower x and y values as the starting point. In the third step, we work
out the y value for successive x values.
733
Now, since we are dealing with a pixel grid, we know that pixels are separated by unit distances. So the successive x values will differ by 1. In the fourth step, these computed y values are mapped to the nearest integers, giving us the pixel coordinates of those points.
So we first convert the end points to the nearest pixels, then we choose the end point having
lower x and y values as the starting point, and starting with that point, we compute y value taking
as input the x value, where the successive x values differ by 1 and we continue this till the other
endpoint. So we compute all the y values between the starting and ending pixels. And these
computed y values are then mapped to the nearest integer values, giving us the pixel coordinates
for those points. Let us try to understand this in terms of one example.
(Refer Slide Time: 13:07)
Suppose this is our line segment, as shown in this figure. So this is one endpoint, this one is
another endpoint. Initially, the line was defined by these two endpoints, A and B. As you can see,
both are real numbers. So we have to map it to the nearest pixel.
734
(Refer Slide Time: 13:35)
For A, if we do the rounding off, we will get this pixel as the nearest pixel, and for B will get this
one as the nearest pixel if we perform the rounding off. Now, we can see that coordinates of A’ is
less than B’. So we start with A’, we choose it as our starting pixel. So we start with this pixel
and continue finding out the y values till we reach this other pixel, other endpoint.
(Refer Slide Time: 14:21)
Now, our objective is to compute the y values for successive x values. For that, we require the
line equation which involves computation of the slope m, which in our case turns out to be this.
735
Because we know the two endpoints we can compute the slope and also, the y-intercept value.
We are assuming here that the line is expressed in terms of this equation, y = mx + b, where m is
the slope, and b is the y-intercept. Given the two endpoints, we can solve for m and b and find
that m is 3 by 5 and b is 4 by 5.
(Refer Slide Time: 15:16)
Then what we do? For each x separated by unit distance, starting from the lower end pixel that is
x value is 2, we compute the y values using the line equation till we reach the other endpoint.
And the line equation is given here. So in this equation, we use the x values to get the y values.
(Refer Slide Time: 15:58)
736
If we do so, what we will find? So for x =2, we have y=2; when x is 3, so the next pixel, x
coordinate, we get y is 2.6; when x is 4, then we get y to be 3.2; x=5, y=3.8; x=6, y=4.4. So we
get the four intermediate pixels and corresponding y values computed using the line equations.
(Refer Slide Time: 16:45)
So the four points are (3, 2.6), (4, 3.2), (5, 3.8), and (6, 4.4). These are the four values that we
compute using the line equation. Now, we map this y values as shown here to the nearest integer
to get the pixel locations. So if we do the rounding off, we will get the four pixels to be (3, 3)
which is here; then (4, 3), which is here; (5, 4) here, and (6, 4) here. So these are our four
intermediate pixels corresponding to the points on the line.
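The simple approach illustrated above can be written directly as the following sketch, assuming integer endpoint pixels and a slope of magnitude at most 1, as in this example.

```python
# Straightforward line scan conversion using the line equation y = mx + b.
def simple_line(x0, y0, x1, y1):
    m = (y1 - y0) / (x1 - x0)          # slope: a floating-point value
    b = y0 - m * x0                    # y-intercept
    pixels = []
    for x in range(x0, x1 + 1):        # successive x values differ by 1
        y = m * x + b                  # floating-point multiplication
        pixels.append((x, round(y)))   # rounding off: another float operation
    return pixels

# simple_line(2, 2, 7, 5) returns
# [(2, 2), (3, 3), (4, 3), (5, 4), (6, 4), (7, 5)], matching the example.
```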
737
(Refer Slide Time: 17:45)
Now, with this approach, as you can see, the way we have computed the values and ultimately
found out the pixels, there are two problems broadly. First problem is we need to perform
multiplication of m and x. Now, m is likely to be a real value, so that is a floating-point
operation. Secondly, we need to round off y coordinate values that is also floating-point
operation. Together these floating-point operations are computation intensive.
(Refer Slide Time: 18:40)
738
So we have a computationally expensive approach to convert a line to corresponding pixels. In
reality, we need to scan convert very large number of lines within a very small time. Now, if we
have floating-point operations involved, then this process will become slow and we will perceive
flickers, which is of course something we do not want. So what do we need? We need some better solutions. Let us have a look at a slightly better approach.
(Refer Slide Time: 19:29)
But before that, I would like to point your attention towards another important topic that is
consideration of slope of a line when we are performing this scan conversion.
(Refer Slide Time: 19:47)
739
In our example, what we did, we calculated the y coordinate values for each x coordinate value.
So we increased x by 1 and corresponding y values, we calculated using the line equation. You
may wonder why we did so, we could have similarly done it the other way around. We could
have increased y and calculated the x values. Let us try to see what happens if we do so if we
calculate x values by increasing y values.
(Refer Slide Time: 20:26)
Now, we have these two endpoint pixels and the lower pixel is the starting point denoted by A’.
740
(Refer Slide Time: 20:41)
This time we are increasing y by 1. That means we are moving this way. Earlier, we were
moving along this direction, now we are moving this way from one scan line to the next. And
then, we calculate the x based on the equation. Now, we need a modified equation which is given
here and we already know b and m, the y-intercept and slope respectively. So we simply replace
the y value to get the x value.
(Refer Slide Time: 21:24)
Now, if we do so, we will see that we have only two successive y values, this one and this one, between y=2 and y=5. So earlier we computed four intermediate y values; this time we are required to compute
741
only two x values, because there are only two increments of y between the endpoints. So when y=3, x turns out to be 3.7 using the equation, and when y=4, x is 5.3.
(Refer Slide Time: 22:19)
Then what we have computed between the two end points? Two new points, (3.7, 3) and (5.3, 4).
If we round it off to the nearest integers, then we get the pixels (4, 3) and (5, 4); let us place it
here. So (4, 3) is this point and (5, 4) is this point. Note that earlier, we got two additional points
when we moved along x-direction, now we are getting only two points, these two, when we are
moving along y-direction and computing x.
742
(Refer Slide Time: 23:11)
Clearly, the first set that is these 4 plus the 2 endpoints, total 6, pixels will give us a better
approximation to the line compared to the second set consisting of total 4 pixels, the two
endpoints, and the two newly computed pixels. So you have the first set, which is better than
second set because the approximation is better due to the larger number of pixels.
(Refer Slide Time: 23:55)
Now, that is the issue here. How do we decide which coordinate to calculate and when? Should
we start with x and calculate y, or should we increase y and calculate x? Now, this decision is
taken based on the slope of the line, depending on the slope we take a call.
743
(Refer Slide Time: 24:30)
If the slope is within the range −1 ≤ m ≤ 1, then we work out or calculate the y values based on the x coordinates of the pixels. So we increase x by 1 and compute the y values when m is within this range. If m is not within this range, then we compute x by increasing the y coordinates. So that is our rule: when m is within the range given here, we compute y based on x, where x takes the pixel coordinates, that is, integer values; otherwise, that is, when m is not within this range, we compute x based on y, where y takes the pixel coordinates as integers. That is how we make the decision.
(Refer Slide Time: 25:48)
744
Now, let us go back to a better line scan conversion algorithm compared to the simple approach
that we have learned earlier. So this approach is called DDA or digital differential analyzer.
(Refer Slide Time: 26:06)
DDA stands for digital differential analyzer, and this is an incremental approach which is
developed to reduce floating-point operations. That means to increase the computation speed,
speed up the scan conversion process.
(Refer Slide Time: 26:32)
745
Let us try to first understand the idea. We will use the same example that we have seen earlier
but this time, we will note a few more points. So earlier, we computed 4 points between the two
end points (2, 2) and (7, 5) by increasing x and computing y values. Now, these 4 points are (3,
2.6), (4, 3.2), (5, 3.8) and (6, 4.4). Now, we will have a closer look at these points, what they tell
us.
(Refer Slide Time: 27:17)
We computed that slope is 3/5 or 0.6. Now, the successive y values are actually addition of this
slope value to the current value. The first value that we got is 2.6. Second value that we got is
3.2, which we can get by adding 0.6 that is the slope value to the earlier value that is 2.6.
Next value we got is 3.8, which is again the earlier value plus the slope. Finally, we got 4.4,
which is again the earlier value plus slope. So there is a pattern. We add the slope value to the
earlier value to get the new value. This idea is exploited in this DDA algorithm.
746
(Refer Slide Time: 28:31)
So instead of computing y with the line equation every time, we can simply add m to the current
y value. That means the new y we can get by adding m to the current value. So we do not need to
go for solving the line equation every time we want to compute the y value.
(Refer Slide Time: 28:59)
What is the advantage of that? It eliminates floating-point multiplication which is involved in
this computation that is m into x. So we can eliminate these calculations which in turn is going to
reduce the computational complexities.
747
(Refer Slide Time: 29:28)
Now, as I said earlier, slope is an important consideration here. So when the slope is not within
the range that means the slope is greater than 1 or less than minus 1, then we do not compute
successive y values, instead we compute x values. Again, in a similar way that is new value is
the old value plus a constant term, which in this case is 1/m, earlier it was only m. So we can
obtain the new x value by adding this constant 1/m to the current value. And here also, by this,
we are eliminating floating-point operations.
(Refer Slide Time: 30:19)
748
So the complete algorithm is shown here. The input is the endpoint, the two endpoints are the
input. And the output is the set of all pixels that are part of the line. So we compute m. Now,
when m is within this range, we compute successive y values as shown here, by adding m to the
current y value, round it off to get the pixel, and add the pixel to the set. And when m is not
within this range, we compute successive x values by adding 1/m to the current value and
perform the same steps again.
So we continue, in both cases, till the other endpoint, as you can see in the loop termination conditions. So that is how we improve on the simple line scan conversion approach, by exploiting one particular property: we can compute the successive x or y values by simply adding a constant term.
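A sketch of the DDA approach is shown below, assuming the endpoints are integer pixels and that, when stepping along x, the line is given from the lower-x endpoint; successive values are obtained by adding a constant (m or 1/m) instead of multiplying.

```python
# DDA line scan conversion: incremental additions instead of multiplications.
def dda_line(x0, y0, x1, y1):
    if x1 == x0:                          # vertical line: slope undefined
        return [(x0, y) for y in range(min(y0, y1), max(y0, y1) + 1)]
    m = (y1 - y0) / (x1 - x0)
    pixels = [(x0, y0)]
    if -1 <= m <= 1:                      # slope in range: step along x
        y = float(y0)
        for x in range(x0 + 1, x1 + 1):   # assumes x0 < x1
            y += m                        # add slope to current y (no multiply)
            pixels.append((x, round(y)))
    else:                                 # otherwise: step along y
        x = float(x0)
        step = 1 if y1 > y0 else -1
        for y in range(y0 + step, y1 + step, step):
            x += step / m                 # add 1/m (with direction) to current x
            pixels.append((round(x), y))
    return pixels

# dda_line(2, 2, 7, 5) again gives [(2, 2), (3, 3), (4, 3), (5, 4), (6, 4), (7, 5)].
```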
(Refer Slide Time: 31:58)
This is clearly some improvement over the simple approach. However, there are still issues.
749
(Refer Slide Time: 32:07)
With the DDA algorithm, as we have noted, we can reduce floating-point operations, but only some of them. We cannot remove all of them; we can only eliminate the multiplications.
(Refer Slide Time: 32:30)
That still leaves us with other floating-point operations, which are addition and rounding off.
Now, any floating-point operation is computationally expensive and it involves additional
resources. So when we, in reality, require to generate large number of line segments in a very
short span of time, our ideal objective should be to eliminate all floating-point operations
750
altogether, rather than eliminating few. Eliminating few, of course, improves the overall
rendering rate, but eliminating all should be our ultimate objective.
(Refer Slide Time: 33:22)
That is so since, for large line segments or a large number of line segments, these floating-point operations may create problems. Particularly when we are dealing with a very large line segment, the rounding off may result in pixels far away from the actual line. For example, consider a very long line like this: if we perform rounding off, then we may keep on getting pixels something like this, which actually looks like a distorted line. For small segments this may not be the case, but for large line segments there is a possibility of visible distortion, which of course we do not want.
751
(Refer Slide Time: 34:28)
That is one problem, of course; plus, our ultimate objective is to remove all floating-point operations because, along with this distortion, they also increase the time to render a line segment and also require resources. So we need a better solution than what is provided by DDA. One such approach we will discuss in the next lecture.
(Refer Slide Time: 35:06)
So whatever we have discussed today can be found in this book, Computer Graphics. You are
advised to go through Chapter 9 up to Section 9.1.1 to know in more details whatever we have
752
discussed. So the improved line drawing algorithm will be taken up in the next lecture. Till then,
thank you and goodbye.
753
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture - 26
Scan Conversion of Basic Shapes - 2
Hello and welcome to lecture number 26 in the course Computer Graphics, we will continue our
discussion on the graphics pipeline. For a quick recap, let us just go through the stages.
(Refer Slide Time: 00:47)
So, we have already discussed the first stage that is object representation, second stage modeling
transformation, third stage lighting, fourth stage viewing pipeline, and the only stage that is
remaining is the fifth stage scan conversion. We are currently discussing the fifth stage.
754
(Refer Slide Time: 01:14)
In the last lecture, we talked about rendering of lines, which is part of the fifth stage. And there
we talked about a very intuitive approach as well as a slightly improved approach that is the
DDA methods. Today, we will continue our discussion on line rendering, where we will talk
about even better approach. And also we will discuss rendering of circles.
Now, before we go into the discussion on a better line rendering approach, let us quickly recap
what we have seen in the previous lecture on line rendering.
(Refer Slide Time: 02:02)
755
So the idea was to map a description from viewport to a pixel grid. That is, of course, the
objective of the fifth stage.
(Refer Slide Time: 02:20)
In order to do that, simplest approach is just to round off the real coordinates to the nearest
integers, which are pixels, for example, from (2.3, 2.6) to (2, 3). Now, this is good for points but
for mapping lines or circles or other primitive shapes, this may not be good.
756
(Refer Slide Time: 02:49)
And for line what we did? So we first assume that a line segment is defined by the endpoints.
And our objective is to map all the points that are on the line to the appropriate pixels.
(Refer Slide Time: 03:08)
The straightforward approach that we discussed is that first we map the end points to pixels, then
we start with one endpoint which is having the lower x and y coordinate values, then we work
out y-coordinate values for successive x-coordinates, where the x-coordinates differ by 1 because
we are talking pixel grids. And then, this y values that we computed are mapped to the nearest
integer thereby getting the pixels.
757
(Refer Slide Time: 03:55)
Now, this approach has two problems. First, we require multiplication which is a floating-point
operation. And secondly, we require rounding off which is also a floating-point operation. Now,
these floating-point operations are computationally expensive and may result in slower rendering
of lines.
(Refer Slide Time: 04:20)
To improve, we discussed one incremental approach. There we did not go for multiplication,
instead we used addition. So to compute y, we simply added this m value to the current value, or
to compute x, new x, we simply added this 1 by m value to the current x value. Now when to
758
choose whether to compute x or y, that depends on the slope. So if the m value is within this
range, then we compute y given x, otherwise, we compute x given y using the line equation.
(Refer Slide Time: 05:18)
Now, the DDA can reduce some floating-point operations as we have discussed, particularly
multiplications. However, it still requires other floating-point operations, namely additions and
rounding off. So it is still not completely efficient so to speak, and we require a better approach.
One such approach is given by Bresenham’s algorithm.
(Refer Slide Time: 06:03)
759
Now, this is an efficient way to scan convert line segments and we will discuss the algorithm
assuming m to be within these ranges. That means we will concentrate on computing y value
given the x value.
(Refer Slide Time: 06:26)
Now, let us try to understand the situation. Suppose, this is the actual point on the line and we are
moving along the x-direction, the current position is given by this point (xk, yk). Now, the actual
point on the line is a floating-point number real number, so we need to map it to the nearest pixel
grid point.
Now, there are two potential candidates for that, one is this pixel or the upper candidate pixel
that is (xk+1, yk+1), and the other one is the lower candidate pixel that is (xk+1, yk), and we have
to choose 1 of those. How to choose that?
760
(Refer Slide Time: 07:24)
Our objective is to choose a pixel that is closer with respect to the other pixel to the original line.
So between these two pixels, we have to decide which one is closer to the original line and
choose that pixel.
(Refer Slide Time: 07:51)
Let us denote by dupper the distance of the upper candidate pixel (xk + 1, yk + 1) from the line, as shown here. Similarly, dlower indicates the distance of the lower candidate pixel from the line.
761
(Refer Slide Time: 08:28)
Now, at x = xk + 1, the y value on the line is given by the line equation, y = m(xk + 1) + b, where m is the slope. Then we can say that dupper is given by (yk + 1) − y, that is, the y coordinate of the upper candidate pixel minus the y value on the line.
Similarly, dlower can be given as y − yk, the y value on the line minus the y coordinate of the lower candidate pixel. Replacing y from the line equation gives the corresponding expressions. Now, let us do some mathematical manipulation on these expressions.
(Refer Slide Time: 09:49)
762
But before that, we should note that if the difference dlower − dupper is less than 0, then the lower pixel is closer to the line and we choose it; otherwise, we choose the upper pixel.
(Refer Slide Time: 10:41)
Now, let us substitute m with the ratio Δy/Δx, where Δy is the y coordinate difference between the endpoints and Δx is the x coordinate difference between the endpoints. We then multiply both sides by Δx, rearrange and expand, and collect all the constant terms into a single constant c = 2Δy + Δx(2b − 1).
What we get is Δx(dlower − dupper) = 2Δy xk − 2Δx yk + c. This is a simple manipulation of the terms.
763
(Refer Slide Time: 11:55)
Now, let us denote the left-hand side by pk and call it the decision parameter for the kth step. This parameter is used to decide the closeness of a pixel to the line. Since Δx is positive, its sign will be the same as the sign of the difference dlower − dupper.
(Refer Slide Time: 12:28)
Thus, if pk˂0, then this lower pixel is closer to the line and we choose it, otherwise, we choose
the upper pixel.
764
(Refer Slide Time: 12:47)
So that is at step k. Now, at step k+1, that is, the next step, we get pk+1, which is essentially given by the same expression with xk replaced by xk+1 and yk replaced by yk+1. Then we take the difference between the two, which gives us pk+1 − pk = 2Δy(xk+1 − xk) − 2Δx(yk+1 − yk).
(Refer Slide Time: 13:26)
Now, we know, because we are dealing with a pixel grid, that xk+1 is essentially xk + 1. So we can rearrange and rewrite the expression as pk+1 = pk + 2Δy − 2Δx(yk+1 − yk). That means the decision variable at the (k+1)th step is given by the decision variable at the kth step plus an additional term.
765
Now, if pk < 0, that is, the lower pixel is closer, then we set yk+1 = yk; otherwise, we set yk+1 = yk + 1. Thus, based on the sign of pk, the term (yk+1 − yk) becomes either 0 or 1. You can see the physical significance from this figure. If pk < 0, the lower pixel is closer in the current step, that means we have chosen this one, and in the next step we choose yk+1 = yk, the lower pixel.
If that is not the case, then we choose yk+1 = yk + 1, that is, the upper pixel. So depending on the sign of pk, the term yk+1 − yk turns out to be either 0 or 1: if pk < 0 it is 0; if pk ≥ 0 it is 1.
(Refer Slide Time: 15:50)
So where do we start? We have to decide on the first decision parameter, which we call p0 and which is given by p0 = 2Δy − Δx. We calculate this value and then we continue.
766
(Refer Slide Time: 16:13)
So the overall algorithm is given here. We first compute these differences between the endpoints
and the first decision parameter. Then we go inside a loop till we reach the end point. We start
with one end point and till we reach the other end point, we continue in the loop.
Now, if p < 0, we keep the y value unchanged and update p as p + 2Δy. If p ≥ 0, we increment y by 1 and update p as p + 2Δy − 2Δx. In either case, we add the corresponding (x, y) value to the set of pixels that is the output of the algorithm. So, depending on the decision value, we choose a pixel and add it to the set of pixels.
767
(Refer Slide Time: 17:31)
Now, here we assume that m is within this range. When m is outside this range, we have to
modify this algorithm but that is a minor modification. So you may try it yourself.
(Refer Slide Time: 17:50)
So what is the advantage of this algorithm? Here if you note, we are choosing the pixels at each
step depending on the sign of decision parameter, and the decision parameter is computed
entirely with integer operations so there is no floating-point operation.
768
Thus, we have eliminated all floating-point operations (additions, rounding off, as well as multiplications), which is a huge improvement because, in reality, we need to render a large number of lines in a very short span of time, so this saving is substantial. There are even better approaches, but we will not discuss those any further.
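A sketch of Bresenham's algorithm for the case discussed here (slope between 0 and 1, integer endpoints given left to right) is shown below; note that only integer additions and comparisons are involved.

```python
# Bresenham line scan conversion for 0 <= m <= 1 (integer arithmetic only).
def bresenham_line(x0, y0, x1, y1):
    dx, dy = x1 - x0, y1 - y0
    p = 2 * dy - dx                    # initial decision parameter p0
    x, y = x0, y0
    pixels = [(x, y)]
    while x < x1:
        x += 1
        if p < 0:                      # lower candidate pixel is closer
            p += 2 * dy
        else:                          # upper candidate pixel is closer
            y += 1
            p += 2 * dy - 2 * dx
        pixels.append((x, y))
    return pixels

# bresenham_line(2, 2, 7, 5) produces
# [(2, 2), (3, 3), (4, 3), (5, 4), (6, 4), (7, 5)],
# matching the worked example discussed next.
```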
(Refer Slide Time: 18:48)
Now, let us try to understand the algorithm in terms of one example.
(Refer Slide Time: 18:54)
769
We will continue with our example that we have introduced in the previous lecture. So this is the
line segment given, these are the endpoints already mapped and our job is to find out the
intermediate pixels that correspond to the points on the line.
(Refer Slide Time: 19:17)
So we will start with computing Δx, Δy, and initial p. Then we start with one endpoint and add it
to the list of pixels, the endpoint.
(Refer Slide Time: 19:38)
770
Now, we have computed p to be 1, which is ≥0. So here, the upper pixel is closer that means we
choose this one. We add this and update p with the expression to be -3, and (3, 3) is added to the
grid. Now, this is not the end point, we have not yet reached the end point so we will continue.
(Refer Slide Time: 20:13)
In the second execution, we check that p is -3, which ≤0. So in the second case, the lower pixel is
chosen and we update p again, to be 3, add this lower pixel to the output list and check whether
we have reached the end point. Since we have not yet reached we continue the loop. And in this
way, we continue to get other points.
771
(Refer Slide Time: 20:43)
So in the next stage, p=3 > 0. So we choose the upper pixel, add this one to the output pixel list,
continue the loop since we are yet to reach the end point.
(Refer Slide Time: 21:05)
772
Then we find p to be -1, less than 0. So we choose the lower pixel, add the pixel to the output
list, and now, we see that we have reached the other end point. So we stopped the loop and add
the other endpoint into the list. That is our last step.
(Refer Slide Time: 21:46)
So finally, what are the points that we get? These are the pixels that we get following the steps of
the Bresenham’s algorithm. Now, you can compare it with the previous methods that we used.
However, while comparing, you should keep in mind the number of floating-point operations
that we avoided because that is the advantage. So if you find that both the sets or all the sets that
we have found earlier are same that is not a problem because we saved in terms of computation.
773
So, with that, we end our discussion on line scan conversion. We learned three things: first, we started with a simple approach and found its problems; then we discussed one improved approach, the DDA approach; and finally we discussed an even better approach, Bresenham's line drawing algorithm, which eliminates all the floating-point operations. Now we will move on to scan conversion of another primitive shape, that is, the circle.
(Refer Slide Time: 23:19)
Initially, we will assume that the circle is centered at the origin with radius r, so its equation is given by x² + y² = r². We all know this equation. Now, in the simple approach, the most intuitive and straightforward approach, what do we do? We solve for y after every unit increment of x in the pixel grid by using the equation.
774
(Refer Slide Time: 23:54)
Clearly, here we have lots of floating-point computations, which involve square roots and multiplications because r need not be an integer. So this is inefficient. We may also need to round off the computed values, which adds further floating-point operations, and the pixels that we obtain may not generate a smooth circle, because there may be gaps between the actual points and the chosen pixels after rounding off.
(Refer Slide Time: 24:44)
So it suffers from many problems and we require a better solution.
775
(Refer Slide Time: 24:52)
Let us try to go through one such solution, which is called the Midpoint algorithm.
(Refer Slide Time: 25:05)
Now, this algorithm exploits an interesting property of circle that is called eight-way symmetry.
Now, what is this property?
776
(Refer Slide Time: 25:19)
If we look at this figure, we will see that this is the origin and the circle is centered at the origin. We can divide the circle into eight octants, as shown. If we determine one point on any octant, say this point, then we can determine seven other points on the circle belonging to the other seven octants without much computation.
So if this point is (x, y), then this point will be (y, x), this one will be (y, -x), this one will be (x, -y), this one will be (-x, -y), this one will be (-y, -x), this one will be (-y, x), and this one will be (-x, y). These we can determine straight away without any further computation.
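A small helper that lists these eight symmetric points for a computed point (x, y) on a circle centred at the origin might look like this:

```python
# Eight-way symmetry of a circle centred at the origin.
def eight_way_points(x, y):
    return [( x,  y), ( y,  x), ( y, -x), ( x, -y),
            (-x, -y), (-y, -x), (-y,  x), (-x,  y)]
```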
777
(Refer Slide Time: 26:28)
We can exploit this property in circle scan conversion by computing one point in an octant and then using that point to derive the other seven points on the circle. That means we determine one pixel and, from there, we determine the other seven pixels. So instead of determining the pixels through computation eight times, we do the computation once and save the other seven.
(Refer Slide Time: 27:07)
Now, let us see how to determine a pixel for a given quadrant. Suppose, we have determined a
pixel (xk, yk); this is one quadrant of the circle given. Now, next pixel should be either this one or
778
this one. Again, we can call them upper candidate pixel and lower candidate pixel. And in this
case, note that we are going down along this direction, down the scan lines, and our objective is
to choose a pixel that is closer to the circle. Now, how do we decide which of these two
candidate pixels is closer to the circle?
(Refer Slide Time: 28:11)
Again, we go for some mathematical trick.
(Refer Slide Time: 28:18)
779
Now, the circle equation we can reorganize in this way: f(x, y) = x² + y² − r². This is the circle equation restated in this form.
(Refer Slide Time: 28:33)
Now, this function can be evaluated as shown here: f(x, y) < 0 if the point (x, y) is inside the circle; f(x, y) = 0 if the point is on the circle; and f(x, y) > 0 if the point is outside the circle. This we know from geometry.
(Refer Slide Time: 29:05)
780
Then, we can evaluate the function at the midpoint of the two candidate pixels. That means at
(xk+1, yk – ½). Note that this is yk, so midpoint will be this point that is yk – ½ and it will be xk+1
that will be the new x coordinate. Now, this will be our decision variable pk after k steps. So let
us try to see what this variable looks like.
(Refer Slide Time: 29:48)
So essentially, we compute this function at the point (xk + 1, yk − ½), which is the midpoint between the two candidate pixels; since the two candidate pixels are separated by a unit distance, the midpoint lies at half of this unit distance below yk.
781
(Refer Slide Time: 30:16)
So pk will be the function value at this point. Now, if we expand the function at these coordinates, we get the expression for pk: pk = f(xk + 1, yk − ½) = (xk + 1)² + (yk − ½)² − r².
(Refer Slide Time: 30:42)
Now, if pk<0 that means, the function evaluates to be less than 0. Then we know from the
geometric properties that midpoint is inside the circle. That means the upper candidate pixel will
be closer to the circle boundary. So we choose (xk+1, yk). If that is not the case then we choose
the other candidate pixel that is (xk+1, yk-1).
782
Note that we are going down the scan lines, so next y coordinate will be yk-1. Because in that
case, midpoint is outside and this lower candidate pixel is closer to the boundary.
(Refer Slide Time: 31:33)
Now, let us see the expression for the decision variable at the next step, that is, pk+1. Here, our new point will be (xk+1 + 1, yk+1 − ½), that is, the x coordinate incremented by 1, so pk+1 = f(xk+1 + 1, yk+1 − ½), which after expansion will look like this.
(Refer Slide Time: 31:55)
783
Now, we may expand it further and then rearrange to get a simplified expression: pk+1 = pk + 2(xk + 1) + (yk+1² − yk²) − (yk+1 − yk) + 1, that is, the current decision value plus an additional term.
(Refer Slide Time: 32:14)
Now, yk+1 is yk if pk is less than 0, as we have already seen. So in that case, pk+1 = pk + 2xk+1 + 1, where xk+1 = xk + 1. Now, if pk is not less than 0, then yk+1 = yk − 1, as we have also seen. Then the pk+1 term becomes pk+1 = pk + 2xk+1 + 1 − 2yk+1.
(Refer Slide Time: 33:01)
784
As you can see these are all integer operations and we choose the pixels based on an incremental
approach that is computing the next decision parameter from the current value and that too by
avoiding floating-point operations.
(Refer Slide Time: 33:22)
However, that is not entirely the case, because here the initial decision parameter or decision variable involves a floating-point operation. We have to keep in mind that, unlike Bresenham's algorithm, although the expression for computing the next decision variable apparently does not involve any floating-point operation, we may start with a floating-point value and that will remain. So here we are not completely eliminating floating-point operations, but we are reducing them significantly.
785
(Refer Slide Time: 34:11)
So what is the complete algorithm? We first compute the first decision variable and choose the
first or the starting point. Now, one point we have chosen, then using symmetry we can add four
other points or pixels.
Now, when the decision parameter is less than 0, we update the parameter in the first way and get the pixel to add to the set of pixels. When the decision parameter is not less than 0, we update the decision parameter in the second way and get this point as the new pixel. Then we add the new pixel to the list of pixels, plus the seven symmetric points using the symmetry property, and we continue until we reach the end of the octant, that is, until x ≥ y.
So that is how the midpoint algorithm works. As you have noted, if we go for the simple approach, we require a lot of floating-point operations, multiplications and square roots, which we avoid with the midpoint algorithm.
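A sketch of the midpoint algorithm for a circle of radius r centred at the origin is given below, reusing the eight_way_points() helper shown earlier; as noted in the example that follows, the output may contain duplicate pixels, which are removed afterwards.

```python
# Midpoint circle scan conversion for a circle of radius r centred at the origin.
def midpoint_circle(r):
    x, y = 0, round(r)
    p = 5 / 4 - r                        # initial decision parameter (may be a float)
    pixels = list(dict.fromkeys(eight_way_points(x, y)))  # 4 distinct start pixels
    while x < y:
        x += 1
        if p < 0:                        # midpoint inside: keep the same y
            p += 2 * x + 1
        else:                            # midpoint outside: go down one scan line
            y -= 1
            p += 2 * x + 1 - 2 * y
        pixels.extend(eight_way_points(x, y))
    return pixels

# midpoint_circle(2.7) yields 20 entries, with repetitions such as (2, 2),
# matching the illustrative example discussed next.
```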
786
(Refer Slide Time: 35:58)
Now here, of course, it may be noted that the algorithm assumes circle is centered at origin and
we require some modification when we are assuming circles which has its center at any arbitrary
location. But that minor modification we will not go into the details, you may try it yourself.
(Refer Slide Time: 36:23)
Now, let us try to understand this algorithm better in terms of one illustrative example.
787
(Refer Slide Time: 36:34)
Let us start with the assumption that we have a circle with radius r to be 2.7 that means a real
number. Now, let us execute the algorithm to find out the pixels that we should choose to
represent the circle.
(Refer Slide Time: 36:59)
The first stage is to compute p, which is 5/4 − r, giving −1.45 in this case. And we start by rounding off r to 3, so that (0, 3) is the first point, the first pixel, in our list. Based on this first pixel, we add its symmetric pixels as well, giving four pixels in the output list. Then we enter the main loop.
788
(Refer Slide Time: 37:34)
So we have p<0, then we update p as per the expression for p<0 and then, get this new pixel
value. With that, we add eight pixels to the output list (1, 3), (3, 1) (3, -1) (1, -3), and so on.
Since we have not yet reached the end of the loop, we continue with the loop.
(Refer Slide Time: 38:11)
In the second loop run, we have p>0. So we use the expression to update p and we get the new
value and we decide on this new pixel, based on that we choose the eight pixels as shown here.
Now, we have arrived at the end of the loop, the termination condition is reached so we stop. So
then at the end what we get.
789
(Refer Slide Time: 38:44)
We get 20 pixels, shown here. One thing you may note is that some pixels are repeated. For example, (2, 2) occurs twice, (-2, -2) occurs twice, and (2, -2) occurs twice. So this repetition is there at the end of the execution of the algorithm.
(Refer Slide Time: 39:27)
These duplicate entries, we need to remove. So before rendering, we perform further checks and
processing on the output list to remove such duplicate entries. So the algorithm may give us a list
having duplicate entries, we need to perform some checks before we use those pixels to render
the circle to avoid duplications. So that is what we do to render circle.
790
So we have learned how to render a line, and we have learned how to render a circle. In both cases, our objective was to map from real coordinate values to the pixel grid, and to do so without involving floating-point operations to the extent possible, because in practical applications we need to render these shapes very frequently. If too many floating-point operations are involved, the speed at which we can render may slow down, giving us the perception of a distorted image or flickers, which are unwelcome.
In the next class, we will discuss more on rendering other things. Whatever we have discussed
today, can be found in this book.
(Refer Slide Time: 41:15)
You may go through chapter 9, section 9.1.2 and 9.2 to get the details on the topics that we
covered today. So we will meet in the next lecture. Till then, thank you and goodbye.
791
Computer Graphics
Professor Doctor Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 27
Fill Area and Character Scan Conversion
Hello and welcome to lecture number 27 in the course computer graphics. We are discussing the
3D graphics pipeline, as you may recollect it has 5 stages and we have already discussed 4 stages
in details and currently we are in the fifth stage. So, let us just have a quick relook at the 5 stages.
(Refer Slide Time: 00:55)
As you can see in this figure. We have already discussed first stage in details, object
representation, then second stage modelling transformation, third stage lighting or assigning
colour, fourth stage viewing pipeline and currently we are at the fifth stage scan conversion or
rendering.
792
(Refer Slide Time: 01:23)
Now, in scan conversion, what we do is essentially try to map a description of an image given in the device coordinate system to a description on the pixel grid, that means a set of pixels. So, in the earlier lectures we have covered the methods that are followed for such mapping, for points, lines and circles.
And we have seen how we can improve efficiency of these methods by introducing better
approaches such as the Bresenham's line drawing algorithm used for line scan conversion,
midpoint algorithm for circle scan conversion and so on. Today we are going to discuss another
scan conversion technique related to fill areas, along with that we will also discuss how we
display characters that means the letters, numbers etcetera on the screen. We will try to get a
broad idea on character rendering.
793
(Refer Slide Time: 02:56)
Let us start with fill area rendering. First, let us try to understand what a fill area is.
(Refer Slide Time: 03:05)
So far, we have discussed how to determine the pixels that define a line or a circle boundary.
794
(Refer Slide Time: 03:24)
Sometimes that may not be the case; sometimes we may know the pixels that are part of a region, and we may want to apply a specific colour to that whole region. So, earlier we determined pixels that are part of a single line or a circle boundary, but there may be situations where we want to assign colours to a region rather than to a line or a boundary.
(Refer Slide Time: 04:09)
Now, that is the same as saying that we want to fill a region with a specified colour. So, that is fill area rendering, one of the topics of our discussion today. When we talk about fill area rendering, we refer to a region, and our objective is to fill that entire region, that is, the pixels that are part of that region, with a specified colour. This is in contrast to what we learned earlier, where our objective was to find out the pixels, and of course assign colours to them, that are part of a line or the boundary of a circle.
(Refer Slide Time: 05:02)
Let us try to understand this concept with an example. Consider an interactive painting system. In such a system we may draw any arbitrary shape and then wish to assign some colours to that shape, that is, assign colours inside the boundary of that shape. We may also want to change the colour. So the first thing is that we may want to colour the arbitrary shape that we have drawn.
Now, when we say we are trying to colour some shape, that means we want to colour the boundary as well as the interior. We may also want to change the colour, and that too interactively: select some colour from a menu and click in the interior of the shape to indicate that the new colour is to be applied to that shape. If you have used an interactive painting system, then you may already be familiar with these things.
For example, suppose this is our canvas and here we have drawn a shape something like this. Then there may be a menu of colours, say a colour palette; we may choose a colour from this menu, say this one, click our mouse pointer or touch some point inside the shape, and then the chosen colour is applied to the interior of the shape. So, that is interactive colouring of a shape. And here, as you can see, we are concerned with colouring a region rather than only the boundary, unlike what we did when we were trying to determine pixels, as well as their colours, for lines or circle boundaries.
(Refer Slide Time: 07:26)
The question is, how can we perform such colouring or region filling? That depends on how the regions are defined; there can be different ways to define a region, and depending on that definition we have different region filling approaches.
(Refer Slide Time: 07:50)
Broadly, there are two ways of defining a region: the pixel-level definition and the geometric definition.
797
(Refer Slide Time: 08:02)
In the case of a pixel-level definition, we define a region in terms of pixels. That means we may define the region in terms of its boundary pixels, or we may define it in terms of the pixels within a boundary. In the first case, when we define a region in terms of boundary pixels, that is, the set of pixels that define the boundary, such a definition is called boundary defined. In the other case we do not explicitly define a boundary but give the set of pixels that defines the whole region; in that case we call it interior defined. Such pixel-level definitions are useful when we are dealing with regions having complex boundaries, or, as we have just seen, in applications such as interactive painting systems. For complex shapes it is difficult to deal with the boundary, so a pixel-level definition may be useful there; also, in interactive systems, pixel-level definitions are very useful.
798
(Refer Slide Time: 09:28)
The other type of fill area definition is the geometric definition. Here we define a region in terms of geometric primitives such as edges and vertices; this we have already seen during our discussion of object representation techniques. This particular approach is primarily meant for polygonal regions, and such definitions are commonly used in general graphics packages, which we have already mentioned earlier.
So, essentially, a geometric definition means defining a region in terms of geometric primitives such as edges and vertices; if you recollect, we discussed such things during object representation, where we used vertex lists and edge lists to define objects or regions. Geometric definitions are primarily meant to define regions that are polygonal in shape.
799
(Refer Slide Time: 10:52)
Now, with this knowledge of the two broad definitions of regions, let us try to understand the different region filling scan conversion algorithms. We will start with one simple approach called the seed fill algorithm. Let us try to understand what it does.
(Refer Slide Time: 11:16)
So, the idea is very simple for a seed fill algorithm, we start with one interior pixel and colour
the region progressively, that is the simple idea.
800
(Refer Slide Time: 11:33)
Clearly, here we are assuming a pixel-level definition, particularly a boundary definition of a region, where the boundary pixels are specified. We also assume that we know at least one interior pixel; that pixel is called the seed pixel, and if we know the boundary pixels it is easy to decide on a seed pixel. Because we are dealing with a seed pixel, the algorithm is named the seed fill algorithm. So, we have a seed pixel and we have a boundary definition of the region in terms of pixels.
(Refer Slide Time: 12:26)
801
Next, in this algorithm it is also assumed that interior pixels are connected to other pixels in either of two ways: they can be connected to 4 pixels, which is called 4-connected, or they can be connected to 8 pixels, which is called 8-connected. These are the neighbouring pixels; for example, suppose this is a seed pixel and there are pixels around it, if this is the grid where the circles show the pixels, then each pixel can be assumed to be connected either to 4 neighbouring pixels or to all 8 neighbouring pixels.
Accordingly, the nature of the connection is called 4-connected or 8-connected. When we talk of 4-connected, let us redraw the figure: suppose these intersection points of the grid are the pixels and this is one pixel; in the 4-connected case the 4 neighbouring pixels are defined as top, bottom, left and right, that means this is the top, this is the bottom, this is the right and this is the left.
Whereas when we are dealing with 8-connected pixels, we deal with the 8 neighbours: top, top left (this is the top left), top right (here), then left, right, bottom, bottom left (this is here) and bottom right (here). Either of these connections we can assume, and accordingly the algorithm is executed.
(Refer Slide Time: 15:05)
So, the basic idea is simple: we maintain a stack, the seed pixel is first pushed onto the stack, and then a loop is executed as long as the stack is not empty. In each step, we pop the top pixel from the stack and assign the desired colour to that pixel.
802
(Refer Slide Time: 15:37)
The algorithm is shown here. What is our input? The boundary pixel colour, the specified colour which we want to assign to the region, and the seed or interior pixel (any one seed pixel); the output is the interior pixels with the specified colour, which is our objective. We start by pushing the seed pixel onto a stack, and then we enter a loop where we set the current pixel to the stack-top pixel by popping it from the stack and apply the specified colour to that pixel; then we make use of the connectedness property.
So, if we are assuming 4-connectedness, then for each of the 4 connected pixels, or if we are assuming 8-connectedness, then for each of the 8 connected pixels of the current pixel, we check whether the connected pixel's colour is neither the boundary colour (that means we have not reached the boundary) nor the specified colour (that means we are yet to assign it any colour); if so, we push it onto the stack.
So, for each pixel we push its 4-connected or 8-connected neighbours onto the stack, depending on the nature of connectedness that we are assuming. Then we come back here, and the loop continues till the stack is empty, that is, till we have reached the boundary everywhere and have assigned colours to all the interior pixels. That is the simple idea of the seed fill algorithm; a small code sketch is given below. After that, we will discuss another approach called flood fill, whose idea is almost similar with some minor variations. Let us see how it works.
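As a rough sketch, the boundary-defined seed fill with 4-connectedness might look like the following. Here the frame buffer is assumed to be a simple 2D list of colours indexed as grid[y][x]; the function and variable names are our own illustration, not the lecture's pseudocode.

def seed_fill(grid, seed, boundary_colour, fill_colour):
    # seed: (x, y) of one known interior pixel
    stack = [seed]
    while stack:                                   # loop till the stack is empty
        x, y = stack.pop()                         # current pixel = stack top
        grid[y][x] = fill_colour                   # apply the specified colour
        # 4-connected neighbours: right, left, top, bottom
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]):
                c = grid[ny][nx]
                # push only if it is neither the boundary colour nor already filled
                if c != boundary_colour and c != fill_colour:
                    stack.append((nx, ny))
    return grid

For 8-connectedness, the tuple of neighbour offsets would simply be extended with the four diagonal neighbours.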
803
(Refer Slide Time: 18:03)
Now, in the case of the flood fill algorithm, we assume a different definition, the interior definition, which means the interior pixels are known. Earlier we assumed a boundary definition with only one known interior pixel, the seed pixel. Here we assume an interior definition, that is, all the interior pixels are known, and our objective is to colour or recolour the region with a specified colour.
(Refer Slide Time: 18:38)
The idea is similar to seed fill, with some differences. In this case the decisions are taken based on the original interior colour of the pixels instead of the boundary pixel colour. Other things remain the same, that is, using a stack and utilizing the stack elements, colouring them in a particular way.
(Refer Slide Time: 19:17)
The algorithm is shown here. Again the input is the interior pixel colour, the specified colour and one interior or seed pixel; it is even easier here because we already know the interior pixels and can randomly pick one of them. The output is the set of interior pixels with the specified colour assigned.
Now, we push the seed pixel onto the stack and, as before, enter a loop: first we pop the stack, set the popped pixel as the current pixel and apply the specified colour; then, assuming connectedness as we did before, we deal with either the 4 or the 8 connected pixels, and for each of them we perform a check. Here the check is slightly different from what we did earlier: we check if the colour of the connected pixel is the interior colour.
Only in that case do we push the connected pixel, because here we cannot check for a boundary colour; there is no boundary specified. Then we continue, as before, till the stack is empty. So, in both cases we start with a seed pixel, but in one case we are dealing with a boundary definition in terms of pixels and in the other with an interior definition of the region in terms of pixels, and accordingly our algorithm changes slightly; otherwise the broad idea remains the same.
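For comparison, a sketch of the flood fill variant under the same assumptions (again with our own names) is shown below; the only real change is the test, which now compares against the original interior colour.

def flood_fill(grid, seed, interior_colour, fill_colour):
    if interior_colour == fill_colour:
        return grid                                # nothing to do; avoids endless re-pushing
    stack = [seed]
    while stack:
        x, y = stack.pop()
        grid[y][x] = fill_colour
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]):
                # push only pixels that still carry the original interior colour
                if grid[ny][nx] == interior_colour:
                    stack.append((nx, ny))
    return grid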
805
(Refer Slide Time: 21:29)
We will now discuss a third approach, which relies on a geometric definition; this is called the scan line polygon fill algorithm. The earlier approaches, seed fill and flood fill, depend on pixel-level definitions, whereas the scan line polygon fill algorithm depends on a geometric definition.
(Refer Slide Time: 21:56)
So, here we assume that the region is defined in terms of its vertices and edges, of course here
the implicit assumption is that the region is polygonal and the vertices are rounded off to the
nearest pixels. These are the things that we assume.
806
(Refer Slide Time: 22:23)
We will first discuss the algorithm and then try to understand it with an illustrative example. Here the input is the set of vertices and the output is the interior pixels with the specified colour. From the vertices, we determine the maximum and minimum scan lines, that is, the maximum and minimum y values of the polygon.
For example, suppose this is our pixel grid and we have a shape like this; we need to know the minimum y, which is here, ymin, and the maximum y, which is here, ymax. So first we determine this maximum and minimum, then we start from the minimum scan line, that is, the lowermost one here.
Then we enter a loop and continue until we reach the maximum scan line, as shown in this loop condition. In the loop, for each edge, or pair of vertices, of the polygon, we check whether the scan line is within the range defined by the y coordinates of the edge; if it is, we determine the edge-scan line intersection point.
After these steps, we sort the intersection points in increasing order of x coordinates; that means we first determine the intersection points and then sort them in increasing order. Then we apply the specified colour to the pixels that lie between the intersection points, that is, all the intermediate pixels, and then we go to the next scan line. That is the broad idea.
807
So, first we determine the minimum and maximum; we start with the minimum and continue the processing till the maximum scan line is reached. In each loop execution we determine the intersection points of the edges with the scan line to get the two extremes on a single scan line, and then assign the specified colour to all the pixels that lie within these extremes. That is the simple idea. A compact code sketch of this procedure is given below; after that, let us try to understand it with an example.
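The sketch below assumes a convex polygon given as a list of (x, y) vertices already rounded to pixel coordinates; set_pixel stands for whatever routine writes a colour into the frame buffer, and all names are our own.

import math

def scanline_fill(vertices, set_pixel, colour):
    n = len(vertices)
    y_min = min(y for _, y in vertices)
    y_max = max(y for _, y in vertices)
    for scan_y in range(y_min, y_max + 1):             # one scan line per iteration
        xs = []
        for i in range(n):                             # each edge = consecutive vertex pair
            (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
            if y1 == y2:
                continue                               # skip horizontal edges
            if min(y1, y2) <= scan_y <= max(y1, y2):
                # edge / scan-line intersection from the line equation
                xs.append(x1 + (scan_y - y1) * (x2 - x1) / (y2 - y1))
        if xs:
            xs.sort()                                  # increasing order of x
            left, right = xs[0], xs[-1]                # two extreme intersections
            for x in range(math.ceil(left), math.floor(right) + 1):
                set_pixel(x, scan_y, colour)           # colour all in-between pixels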
(Refer Slide Time: 25:58)
We will go through one illustrative example to get more clarity on the algorithm.
(Refer Slide Time: 26:07)
808
Let us consider this figure. Here there is a polygon or fill area specified with 4 vertices A, B, C
and D as shown here. Now, we followed an anti-clockwise vertex naming convention, so there
are 4 edges AB, BC, CD and DA.
(Refer Slide Time: 26:46)
Now first, we determine the minimum and maximum extent of the scan lines. Here 1 is the minimum, as you can see, and 6 is the maximum scan line; this we determine as the first step.
(Refer Slide Time: 27:10)
Then we start the loop. We start from 1 and continue till 6, and in each execution of the loop we process one scan line. So, when we start with scan line 1, our objective is to determine the intersection points of the scan line y = 1 with all 4 edges in the inner loop of the algorithm, that is, lines 6 to 10.
(Refer Slide Time: 27:44)
If you execute the lines, you will find that for the edge AB the if condition is satisfied and the
intersection point is A, for BC and CD edges the condition is not satisfied, again for DA the
condition is satisfied and we get A again, so the intersection point is only A.
(Refer Slide Time: 28:15)
Since it is already a vertex, there cannot be any intermediate pixels. So, we get 2 intersection points, which are the same vertex A; thus it is the only pixel and we apply the specified colour to it. Then we go to the next iteration by setting the scan line to 2 and checking that 2 does not exceed the maximum scan line, which is 6, so we execute the loop again.
(Refer Slide Time: 28:57)
In the second iteration of the loop, what we do? We check for intersection points as before with
the edges and the scan line y equal to 2, that is this scan line.
(Refer Slide Time: 29:14)
Now, for y = 2, if we check the edges, we see that for AB the if condition is satisfied, that means there is an intersection point; using the edge's line equation and the scan line equation we can get the intersection point, this one. For BC and CD the if condition is not satisfied, so there is no intersection.
For DA the condition is satisfied again. So, this is the AB intersection point and this is the DA intersection point, which we can find to be (3, 2) by using the line equation of the edge as well as the scan line equation. So, this point is one intersection point and this point is the other.
(Refer Slide Time: 30:41)
Then we perform a sort, as mentioned in the algorithm, and get these two intersection points in sorted order, like this. So, this is one intersection point and this is the other. In-between pixels are there, as you can see: this itself is an in-between pixel, then we have this pixel, which is (4, 2), and then we have (5, 2).
Note that the other intersection point is not a pixel in itself because it involves a real number as a coordinate. So, we found the 3 pixels that lie between the two intersection points, and we apply the specified colour to them. Then we set the scan line to 3 and check whether we have crossed the maximum scan line; we have not, so we re-enter the loop.
812
(Refer Slide Time: 32:15)
In a similar way we process the scan lines y = 3, y = 4, y = 5, up to y = 6. That is the idea of the algorithm. To summarize, in the scan line polygon fill algorithm we assume a geometric representation of a polygonal region in terms of edges and vertices. Then, for each scan line between the minimum and maximum scan lines, we determine the intersection points of the edges with that scan line, sort them to get the two extreme intersection points, identify the in-between pixels and colour those pixels. And this we do for all the scan lines between the minimum and the maximum.
813
(Refer Slide Time: 33:29)
Now, there are two things that require some elaboration in the algorithm.
(Refer Slide Time: 33:38)
First, how do we determine the edge-scan line intersection point? I think all of you know that we can use the line equation, which we can determine from the two endpoints, and evaluate it with the scan line value to get the intersection point. This is a very simple approach and I think all of you may already know how to do it.
814
(Refer Slide Time: 34:20)
Secondly, how do we determine the pixels between two intersection points? This is again very simple: we start from the leftmost pixel, which is either the intersection point itself or just next to it, and continue along the scan line as long as the pixel's x coordinate does not exceed the right intersection point; in all these checks we compare the pixel's x coordinate. So, both steps are simple, but the second one, how to determine the pixels between two intersection points, is not as simple as it appears. Why is that so?
(Refer Slide Time: 35:27)
815
If we have a concave polygon, then there is a problem. So far, whatever explanation was given was based on the assumption that we are dealing with convex polygons. For concave polygons it is not so easy to determine the intermediate pixels; additional issues are there which need to be solved before we can determine the interior pixels.
(Refer Slide Time: 36:16)
Let us take an example. Here in this figure, as you can see, we have a concave polygon. So, when we are dealing with these two extreme intersection points, some pixels between them are outside the polygon. If we follow the approach outlined earlier, that is, simply move along the scan line to collect all the intermediate pixels, then the outside pixels will also be treated as interior pixels, which we definitely do not want. So, we need a different approach when dealing with concave polygons. What we need to do is explicitly determine the inside pixels, the pixels that are inside the polygon; as you can see from the figure, that is not so obvious for concave polygons.
(Refer Slide Time: 37:24)
So, in this case we need to perform an inside outside test for each pixel which is of course an
additional overhead.
(Refer Slide Time: 37:39)
817
And how can we do that? For each pixel p, we first determine the bounding box of the polygon, that is, the maximum and minimum x and y values of the polygon's vertices. This is the first step. In the second step we choose an arbitrary point, let us denote it by p0, outside the bounding box.
In this case, this can be our bounding box; as you can see, it covers all vertices of the polygon. Then we choose one point outside this bounding box, somewhere, say here, that is, a point (x, y) which is outside the min and max range of the polygon coordinates. In the third stage we create a line segment by joining p and p0.
So, we create a line between the pixel of concern and the point that is outside the bounding box. In the final stage we check: if the line intersects the polygon edges an even number of times, then p is outside; otherwise it is inside. So, as you can see, suppose we have a pixel here; if we join these two points, the line intersects the boundary once, that is, an odd number of times, so this pixel is inside.
Whereas if we are dealing with a pixel here and we join it to the outside point, the line intersects the boundary twice, that is, an even number of times, so this pixel is outside. Similarly, for this pixel, if we join the two points we see that the line does not intersect the polygon edges at all, that is, zero times, so in that case also the pixel is outside.
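A sketch of this even-odd test is given below. It assumes the polygon is a list of (x, y) vertices and picks the outside point far to the right of the bounding box, at the same height as the pixel being tested; the names are our own.

def is_inside(p, vertices):
    # Join p to a point outside the bounding box (far to the right, same y)
    # and count how many polygon edges the joining line crosses.
    # Odd number of crossings => inside; even (including zero) => outside.
    px, py = p
    inside = False
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        if (y1 > py) != (y2 > py):                 # edge straddles the horizontal line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:                       # crossing lies between p and the outside point
                inside = not inside                # toggle on every crossing
    return inside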
But of course all these checks take time, so there is an additional overhead when we are dealing with concave polygons; otherwise, the scan line polygon fill algorithm for convex polygons is quite simple. So, we have discussed different region fill algorithms, both for pixel-level definitions and for geometric definitions.
For pixel-level definitions we learned about the seed fill algorithm and the flood fill algorithm, which are similar with minor variations. For geometric definitions we learned about the scan line polygon fill algorithm; this algorithm is quite simple and straightforward when we are dealing with convex polygons, but requires additional checks for concave polygons. Now, let us try to understand how characters are displayed on the computer screen. How do we render characters?
818
(Refer Slide Time: 41:28)
Here, character means alphanumeric characters.
(Refer Slide Time: 41:35)
Now, character rendering, as we all know, is an important issue. For example, consider this slide; here we see lots of alphanumeric characters displayed. So, clearly it is an important issue, and characters are the building blocks of any textual content, so any application that deals with displaying text must have support for character rendering. And how is it done?
819
(Refer Slide Time: 42:15)
As we all know, when we are dealing with a text processing application, typically a large amount of text needs to be displayed in a short time span. For example, consider scrolling: with each scroll action the whole set of characters is redrawn, and that has to be done very quickly. So, efficient rendering of characters is a very important issue in computer graphics.
(Refer Slide Time: 43:05)
Now, before we try to understand character rendering, let us have a quick look at the idea of a font. We have probably all heard of this term, so let us see what a font is and how it is dealt with in computer graphics.
820
(Refer Slide Time: 43:22)
When we talk of a font, a font or typeface denotes the overall design style of the characters, and there are many such fonts that probably all of us use every day, such as Times New Roman, Courier, Arial and so on. So, these fonts or typefaces indicate the design style of the characters, how they look.
(Refer Slide Time: 43:58)
Now, each font can be rendered with varying appearance. So, appearance may be different, it
may be bold, it may be italicised or it may be both bold and italicised.
821
(Refer Slide Time: 44:17)
Along with that there is another concept called size: how big or small the character appears on the screen. That is denoted in points, for example a 10-point font, a 12-point font and so on, which is a measure of character height in inches. This term is borrowed from typography, but in computer graphics we do not use the original measure.
Instead, we assume that a point is equivalent to 1/72 of an inch, or approximately 0.0139 inch; this is also known as the DTP (desktop publishing) or PostScript point. So, when we talk of a point, we assume that it is 1/72 of an inch, which indicates the height of the character; a 12-point character, for instance, is nominally 12/72 = 1/6 inch tall.
822
(Refer Slide Time: 45:33)
Now, with that basic knowledge, let us try to understand how characters are rendered.
(Refer Slide Time: 45:41)
So, there are broadly two ways of rendering characters: one is bitmapped and the other is outline.
823
(Refer Slide Time: 45:53)
In the case of a bitmapped font, we define a pixel grid for each character. For example, consider this 8 by 8 pixel grid; we can define the grid for the capital character B, where the pixels that are part of the character are marked ON and the others are OFF. So, when the grid for B is rendered, only those pixels will be illuminated and the other pixels will not be. The black circles here indicate the ON pixels, as you can see, and the white boxes indicate the OFF pixels. We can have this type of grid for each character when we are dealing with a bitmapped font; a small illustration is given below.
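As a small sketch, assume an 8×8 grid stored as strings in which '1' marks an ON pixel; the particular bit pattern below is only indicative and not the exact grid shown on the slide, and set_pixel is an assumed frame-buffer routine.

# A rough 8x8 bitmap for a capital 'B' (illustrative pattern only).
GLYPH_B = [
    "11111100",
    "11000110",
    "11000110",
    "11111100",
    "11000110",
    "11000110",
    "11111100",
    "00000000",
]

def draw_bitmap_char(glyph, x0, y0, set_pixel, colour):
    # Illuminate only the ON pixels; OFF pixels are left untouched.
    for row, bits in enumerate(glyph):
        for col, bit in enumerate(bits):
            if bit == "1":
                set_pixel(x0 + col, y0 + row, colour)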
(Refer Slide Time: 46:59)
824
In contrast, when we are dealing with an outline font, the approach is totally different: here characters are defined using geometric primitives such as points and lines. A few pixels may be provided, and the other pixels are determined by using the scan conversion techniques for points, lines and circles.
So, essentially, a few pixels are provided and, using those, the computer draws the primitives such as lines or circles to construct the character, like creating an image. In the case of a bitmapped font we already specify all the pixels, whereas in the case of an outline font we do not specify all the pixels; a few pixels are specified and from those the overall shape is computed by following the scan conversion techniques.
(Refer Slide Time: 48:18)
Clearly, bitmapped fonts are simple to define and fast to render, because no computation is involved; we do not need to compute any pixels, they are already specified. But they have some problems.
825
(Refer Slide Time: 48:37)
Obviously, they require additional storage, a large amount of it, because for each character we are storing the pixel grid information. Also, if we want to resize or reshape the characters to generate different stylistic effects, that is not easily possible with bitmapped definitions, and the resulting font may look bad.
(Refer Slide Time: 49:22)
The third concern is that the font size depends on the screen resolution, because we are fixing the pixel grid; if the resolution changes, the rendering may not look good. For example, suppose we have defined a bitmap that is 12 pixels high. It will produce a 12-point character at a resolution of 72 pixels per inch, since 12 pixels there span 12/72 inch, which is 12 points. If we change the resolution to 96 pixels per inch, the same bitmap produces only a 9-point character (72 × 12/96 = 9), which we may not want. So, depending on the resolution, the outcome may change.
(Refer Slide Time: 50:18)
On the other hand, outline fonts compute the intermediate pixels rather than storing everything, so they require less storage; they can undergo geometric transformations with satisfactory results for reshaping and resizing, so the distortion is less; and they are not resolution dependent.
(Refer Slide Time: 50:50)
827
But, on the other hand, rendering such fonts is slow, which is quite obvious because computations are involved: we need to create the shape and scan convert it before rendering. Due to such computations, rendering is slower compared to bitmapped fonts. So, both approaches have their positive and negative sides, and depending on the resources available and the outcome desired we can choose one particular approach.
(Refer Slide Time: 51:34)
Whatever I have discussed today can be found in this book; you may go through Chapter 9, Sections 9.3 and 9.4 for more details on the topics that we have covered today. In the next lecture, we will discuss an interesting issue in scan conversion called the aliasing effect. Till then, thank you and goodbye.
828
Computer Graphics
Professor Doctor Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 28
Anti-Aliasing Techniques
Hello and welcome to lecture number 28 in the course Computer Graphics. We are currently in
our last leg of discussion on the 3D graphics pipeline. Let us quickly recap the pipeline and then
we will continue our discussion today.
(Refer Slide Time: 00:47)
So, as we have learned, there are five stages; let us quickly go through them once again. The first stage is object representation, the second stage is modelling transformation, the third stage is lighting, the fourth stage is the viewing pipeline, and the fifth stage is scan conversion. Just to recap, among these stages object representation deals with the representation of the objects that constitute a scene, and there the objects are defined in their local coordinate systems.
The second stage is modelling transformation, where we combine different objects to construct a scene; there we perform a transformation, the local-to-world coordinate transformation, and then the scene is defined in the world coordinate system. In this world coordinate system we assign colours to the objects, which is the lighting stage. Then, in the fourth stage, the viewing pipeline, we perform a series of transformations: the view transformation, where we transform from the world to a view coordinate system; the projection transformation, where we transform from the 3D view coordinate system to the 2D view coordinate system; and thirdly the window-to-viewport transformation, where we transform from the 2D view coordinate system to a device coordinate system.
In this fourth stage we also perform two operations, namely clipping and hidden surface removal; both of these operations are done in the view coordinate system. The fifth stage, scan conversion, is also related to transformation: there we transform the device coordinate description of a scene to a screen coordinate description, or pixel grid. Currently we are discussing this fifth stage, scan conversion.
(Refer Slide Time: 03:01)
In this stage we have already covered a few topics, namely how to scan convert points, lines, circles, fill areas and characters. Today we will discuss an important concept related to scan conversion, called anti-aliasing techniques; these are required to smoothen the scan converted or rendered shapes.
830
(Refer Slide Time: 03:39)
Let us try to understand what anti-aliasing is, and then we will discuss a few anti-aliasing techniques with examples.
(Refer Slide Time: 03:48)
Let us start with the basic idea: what is anti-aliasing? Consider this figure. As you can see, the dotted line indicates the original line that we want to scan convert, and this is the pixel grid; on this pixel grid we want to scan convert the dotted line. However, as you can see, not all the points on the line pass through pixels, so we need to map those points to the nearest pixels, as we have seen in our earlier discussion.
831
As a result, what we get is this scan converted line, which looks like a stair-step pattern, represented by the thick black line. This is definitely not exactly the original line; it is an approximation, as we have already said, and due to this approximation some distortion occurs. In the case of lines these distortions are called jaggies, the stair-step-like patterns.
(Refer Slide Time: 05:13)
In general, there will be distortion of the original shape after scan conversion, for any shape, not only lines. This phenomenon, where the rendered shape is distorted due to the algorithms that we follow for scan conversion, is called aliasing. Now, why is it called aliasing? What is the significance of this term?
832
(Refer Slide Time: 05:52)
Before going into that, we should note that aliasing is an undesirable side effect of scan conversion, and we want to avoid it to the extent possible. So, additional operations are performed to remove such distortions, and these techniques together are known as anti-aliasing techniques. In other words, when we apply techniques to remove aliasing effects, we call them anti-aliasing techniques.
(Refer Slide Time: 06:33)
So, why do we call it aliasing? What is the significance of this term?
833
(Refer Slide Time: 06:41)
In fact, the term aliasing is related to signal processing; we can explain the idea in terms of concepts borrowed from the signal processing domain.
(Refer Slide Time: 07:05)
In computer graphics, what we want is to synthesize images. In other words, we want to render a true image, and here we have to think of it as rendering the image on the window in the view coordinate system, or the 2D view coordinate system.
834
(Refer Slide Time: 07:41)
Now, how do we define this image? In terms of intensity values, which can be any real number. Remember that we are still in the view coordinate system, where we are free to use any real number to represent intensity. Those intensity values, if we think of them in a different way, represent some distribution of continuous values.
So, if we plot those values in a graph, we will get some curve which represents the distribution, and here the values are continuous; they can take any real number. Essentially, we can think of a true image as a continuous signal; that is how we map the idea of image rendering to signal representation.
835
(Refer Slide Time: 08:44)
Now, when we perform rendering, what do we do? The process of rendering can be viewed, in a very broad sense, as a two-stage process. In the first stage we sample the signal, that is, the intensity values; in other words, we sample those values at the pixel locations. So we try to get the pixel intensities; this is what we have discussed so far, how to obtain the pixel intensities from the actual intensities.
But there is also a second stage: from the sampled intensities we want to reconstruct the original signal as a set of coloured pixels on the display. So, essentially, we are given an image which is a continuous signal of intensity values (we can think of it in that way), and we want to render this image on the pixel grid; that is the purpose of scan conversion. How do we do that? We follow two stages, broadly: in the first stage we sample the pixel values, and in the second stage we reconstruct the image from those samples to get the rendered image.
836
(Refer Slide Time: 10:20)
Since we are reconstructing the original signal from sampled values, clearly it is not the exact signal; we are dealing with a false representation of the original signal. Now, in English a person using a false name is known as an alias, and the same idea is adopted here: we are trying to represent an original signal in terms of a false, reconstructed signal, hence this reconstructed signal is called an alias and the effect we get is known as aliasing.
(Refer Slide Time: 11:19)
Now, since we are reconstructing, there will be some change from the original. Usually it results in visually distracting images or artefacts, and to reduce or eliminate this effect we use anti-aliasing techniques; that is, the techniques that reduce aliasing effects are called anti-aliasing techniques.
(Refer Slide Time: 11:48)
Now, how do we do that? Again we can borrow terms from signal processing. The continuous intensity signal, that is, the true image, can be viewed as a composition of various frequency components, in other words, primary signals of varied frequencies. That is one way of viewing the intensity values.
838
(Refer Slide Time: 12:21)
Now, there are two components in those signals. Uniform regions with constant intensity values
may be viewed as corresponding to low frequency components whereas there are values that
change abruptly and these values correspond to sharp edges at the high end of frequency
spectrum. In other words, wherever there are abrupt changes in values we can think of those
regions as representing the high frequency components.
(Refer Slide Time: 13:10)
Now, because of those high frequency components such abrupt changes we get aliasing effects.
So, we need to smoothen out those aliasing effects, and how do we do that?
839
(Refer Slide Time: 13:30)
By removing, or filtering out, the high frequency components from the reconstructed intensity signal. If we do not perform any additional operations, the reconstructed signal will contain both high and low frequency components, which will result in distortion. So, our objective is to eliminate, or filter out, the high frequency components in the reconstructed signal.
(Refer Slide Time: 14:12)
There are broadly two ways to do that; we apply broadly two groups of filtering techniques. One set of techniques comes under pre-filtering and the other set comes under post-filtering.
840
(Refer Slide Time: 14:34)
In pre-filtering, we perform filtering before sampling; that means we work on the true signal, which is in the continuous space, and try to derive proper values for individual pixels from the true signal. This group of techniques is also called area sampling.
(Refer Slide Time: 15:07)
In contrast, in post-filtering, as the name suggests, we perform filtering after sampling, that is, we filter the high frequency components from the sampled data. In other words, we compute the pixel values and then, using post-filtering techniques, we modify those values. This group of techniques is also known as super sampling. So, we have pre-filtering or area sampling techniques and post-filtering or super sampling techniques. Now, let us try to learn about a few of these techniques.
(Refer Slide Time: 15:53)
We will start with area sampling: what are the techniques and how do they work?
(Refer Slide Time: 16:01)
Now, in the case of area sampling, we assume that a pixel has an area: a pixel is not a dimensionless point but has an area, usually considered to be a square, or a circle with unit radius, and the lines passing through those pixels have a finite width. So, each line also has an area.
842
(Refer Slide Time: 16:35)
To compute the pixel intensity, we determine the fraction of the pixel area occupied by the line; let us denote it by p. Then the pixel intensity I is computed as a weighted sum of the line colour cl and the background colour cb, as shown in this expression: I = p·cl + (1 − p)·cb. So, to compute the intensity of a pixel, say for example this pixel 01 here, you can see that the line covers 50 percent of the pixel area, so the intensity is 0.5 times the line colour cl, whatever that is, plus (1 − 0.5) times the background colour; that will be the colour of this particular pixel 01 in the figure. Note that earlier we simply assigned the line colour to such a pixel, but here we consider how much of the pixel's area is occupied by the line and change the colour accordingly. This is one approach.
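A minimal sketch of this unweighted area-sampling rule is shown below; the coverage fraction p is assumed to be known already (estimating it is a separate problem), and the colours are (R, G, B) tuples with assumed example values.

def area_sample(p, line_colour, background_colour):
    # p: fraction of the pixel's area covered by the thick line, 0 <= p <= 1
    return tuple(p * cl + (1 - p) * cb
                 for cl, cb in zip(line_colour, background_colour))

# Example from the discussion: the line covers half of the pixel area
# (assumed red line on an assumed white background).
print(area_sample(0.5, (1.0, 0.0, 0.0), (1.0, 1.0, 1.0)))   # -> (1.0, 0.5, 0.5)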
843
(Refer Slide Time: 18:16)
Let us have a look at a slightly more sophisticated approach, involving more computation, called the Gupta-Sproull algorithm.
(Refer Slide Time: 18:27)
This is a pre-filtering technique used for anti-aliased line drawing. Here the pixel intensity is set based on the distance of the line (or the line centre, since here we assume the line has a finite width) from the pixel centre. The idea of this particular algorithm is based on the midpoint line drawing algorithm.
844
So, earlier we have talked about DDA algorithm, Bresenham’s line drawing algorithm. Now,
there is another algorithm that is midpoint line drawing algorithm, this algorithm is used in the
Gupta-Sproull pre-filtering technique. So, we will first try to understand this algorithm then we
will talk about the actual pre filtering technique.
(Refer Slide Time: 19:30)
So, in the midpoint algorithm, let us assume that we have just determined the current pixel here, and for the next pixel we have two candidate pixels: the upper candidate NE and the lower candidate E, as shown. Up to this point it is similar to Bresenham's line drawing algorithm that we saw earlier; what changes is the way we decide which of these candidate pixels to choose.
845
(Refer Slide Time: 20:15)
Earlier we made the decision based on the distance of the line from the candidate pixels. In the midpoint algorithm, we instead consider the midpoint between the two candidate pixels. The midpoint is shown here between these two candidate pixels, and it can be represented by this expression, as you can see.
(Refer Slide Time: 20:53)
Now, we can represent a line as shown here with an expression of the form ax + by + c = 0, where a, b, c are integer constants. We can restate this equation by multiplying it by 2 and get this expression without affecting anything in the equation; that is just a trick, which avoids fractions later when the expression is evaluated at a midpoint whose y coordinate involves 1/2.
846
(Refer Slide Time: 21:28)
Then we set the decision variable dk to be the function evaluated at the midpoint M, that is, at the point (xk + 1, yk + 1/2). If we use the modified equation, after multiplication by 2, and expand, it looks something like this; that is the decision variable at step k.
(Refer Slide Time: 22:00)
Now, if dk is greater than 0, the midpoint is below the line; in other words, the line passes above the midpoint, so the pixel NE is closer and we should choose NE, otherwise we should choose E. When NE is chosen, that is, when dk is greater than 0, the next decision variable dk+1 corresponds to the next midpoint, and if we expand and rearrange we can express it in terms of the previous decision variable: dk+1 = dk plus the constant 2(a + b).
(Refer Slide Time: 23:02)
When dk ≤ 0, we choose E instead, because then the midpoint is not below the line and E is the closer pixel; in that case the next decision variable, if we rearrange and reorganize the expression, works out to dk+1 = dk + 2a.
(Refer Slide Time: 23:41)
848
What is the initial decision variable? It is given by this expression, and after expanding, knowing that the function value at the starting endpoint is 0 (as you can see in this derivation), we get the initial value to be 2a + b.
(Refer Slide Time: 24:04)
Here we have summarized the steps of the algorithm. The input is the two line endpoints and the output is the set of pixels to render the line. The first task is to find the values of the constants a, b and c from the endpoints, and then the initial decision value d; then we start from one endpoint and continue till the other endpoint, as before. If d > 0, we update x and y in this way and update the decision parameter in this way; otherwise, we update only x and update the decision parameter in the other way. In each iteration we add the chosen pixel to P, and we continue till we reach the other endpoint. That is the midpoint algorithm; a brief sketch is given below. After that, let us see how the midpoint algorithm is used in the Gupta-Sproull algorithm.
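The sketch below handles a line with slope between 0 and 1 and integer endpoints; the coefficients and update constants follow the derivation above, and the names are our own.

def midpoint_line(x0, y0, x1, y1):
    dx, dy = x1 - x0, y1 - y0
    a, b = dy, -dx                   # line: a*x + b*y + c = 0
    d = 2 * a + b                    # initial decision value at the first midpoint
    x, y = x0, y0
    pixels = [(x, y)]
    while x < x1:
        if d > 0:                    # midpoint below the line: NE is closer
            x, y = x + 1, y + 1
            d += 2 * (a + b)
        else:                        # otherwise E is closer
            x += 1
            d += 2 * a
        pixels.append((x, y))
    return pixels

# Example: midpoint_line(0, 0, 5, 2) -> [(0,0), (1,0), (2,1), (3,1), (4,2), (5,2)]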
849
(Refer Slide Time: 25:15)
Now in Gupta Sproull algorithm there is some modification to the basic midpoint algorithm.
Consider this figure here, here suppose we have chosen this pixel at present xk, yk and based on
midpoint let us assume that we have chosen E this pixel in the next step. Later on, we will see
what will happen if we choose NE instead of E.
(Refer Slide Time: 25:53)
Now, D is the perpendicular distance from the pixel centre to the line, and we can compute D using geometry as shown here, where Δx and Δy are the differences in the x and y coordinates of the line endpoints; they are constants, so the denominator here is a constant.
850
(Refer Slide Time: 26:35)
Now, what should be the value of the intensity here? It will be a fraction of the original line colour. We will not assign the full line colour, to avoid the aliasing effect; instead we will assign a fraction of it, and this fraction is chosen based on the distance D. Note that this is in contrast to the earlier approaches, like Bresenham's algorithm or the DDA algorithm, where the line colour is simply assigned to the chosen pixel.
(Refer Slide Time: 27:14)
851
How does this distance determine the colour? Typically, a cone filter function is used, which means the more distant the line is from the chosen pixel centre, the lesser the intensity: the pixel becomes dimmer.
(Refer Slide Time: 27:39)
This distance-to-intensity mapping is implemented as a table: we maintain a table in which, for a given distance, a particular intensity fraction is stored, and depending on the computed distance we simply take the value from the table and apply it to the chosen pixel. Each entry in the table represents a fraction corresponding to a given D; some precomputed D values and their corresponding intensity fractions are kept in the table.
852
(Refer Slide Time: 28:22)
That is not all; along with that, to increase the smoothness of the line, the intensities of the neighbours are also changed. Here E is the chosen pixel and its neighbours are this pixel and this pixel. Their intensity is also modified, again according to their distances Dupper and Dlower from the line. So, here, as you can see, this is Dlower and this is Dupper, and depending on these values the neighbouring pixel intensities are set.
(Refer Slide Time: 29:02)
853
This Dupper and Dlower can be again obtained using geometry as shown in these expressions where
v is this distance and Δx Δy are as before difference between the x and y coordinates of the
endpoints.
(Refer Slide Time: 29:33)
Depending on those distances, tables are again used to set the values of the neighbouring pixels. Now, instead of E, suppose we have chosen NE; then of course these distances are computed differently. Again we can use geometry to check that the distances are given by these expressions; here Dupper and Dlower are different from the previous case.
So, this is the perpendicular distance of the chosen pixel from the line, and these are Dupper and Dlower; for each we maintain tables that implement the cone filter function. In the table, just to recap, we maintain distances and the corresponding fractions of the line colour to be applied, and based on that we choose the colour.
854
(Refer Slide Time: 30:43)
So, there are a few additional steps performed in each iteration of the midpoint algorithm. In the regular algorithm we do not modify the pixel colour: whatever the line colour is, we assign it to the chosen pixel. In the Gupta-Sproull algorithm those steps are modified. After choosing a candidate pixel, we determine the pixel colour by first computing the distance D (if E is chosen, the distance is given by one expression; if NE is chosen, by a separate expression).
Then we update the decision value as in the regular algorithm, set the intensity of the chosen pixel according to D, compute Dupper and Dlower, and set the intensities of the two vertical neighbours. These are the additional steps performed in the Gupta-Sproull algorithm compared to the original midpoint line drawing algorithm; a rough sketch of this colouring step is given below.
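This is not the lecture's exact incremental formulation: it computes the perpendicular distances directly and replaces the precomputed table with a stand-in cone filter function, so the helper names, the filter radius and blend_pixel (which mixes a fraction of the line colour into a pixel) are all our own assumptions.

import math

def perpendicular_distance(px, py, x0, y0, x1, y1):
    # Distance from the pixel centre (px, py) to the line through (x0, y0)-(x1, y1).
    dx, dy = x1 - x0, y1 - y0
    return abs(dy * (px - x0) - dx * (py - y0)) / math.hypot(dx, dy)

def cone_filter(d, radius=1.0):
    # Stand-in for the table lookup: the fraction falls off linearly with distance.
    return max(0.0, 1.0 - d / radius)

def shade_chosen_and_neighbours(px, py, x0, y0, x1, y1, line_colour, blend_pixel):
    # The chosen pixel and its two vertical neighbours each get a fraction of the line colour.
    for y in (py, py + 1, py - 1):                  # pixel, upper neighbour, lower neighbour
        d = perpendicular_distance(px, y, x0, y0, x1, y1)
        blend_pixel(px, y, line_colour, cone_filter(d))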
855
(Refer Slide Time: 32:00)
Let us try to understand the algorithm with one example.
(Refer Slide Time: 32:06)
Suppose this is our line, shown here with its endpoints, and we want to choose the pixels and colour them. So we need to do both things: choose the pixels and also choose appropriate intensity values for them.
856
(Refer Slide Time: 32:35)
So, first we would go for choosing the pixel and then we will decide on its colour. So, the line
equation can be given in this way or we get a equal to this, b equal to this, and c equal to this,
and initial decision value d is given as 1.
(Refer Slide Time: 33:02)
So, in the first iteration we need to choose between NE and E; d is greater than 0, as we can see, so we choose NE, that is, this pixel, and then reset d.
857
(Refer Slide Time: 33:23)
Now, after choosing NE the next iteration we choose this pixel E'(3, 2) depending on the value of
d and then again we reset d.
(Refer Slide Time: 33:51)
Now, after choosing these two pixels, we find that we have reached the other endpoint, so we stop. At the end the algorithm returns these four pixels, which are to be rendered. While choosing the pixels, we also use the modifications proposed in the Gupta-Sproull algorithm to assign colours.
858
(Refer Slide Time: 34:21)
So, for example when we are talking about NE this is one chosen pixel, so you have to choose its
colour as well as the colour of its two neighbours. Similarly, when we have chosen E we have to
choose its colour as well as the colour of its two neighbours.
(Refer Slide Time: 34:46)
For that we determine Δx and Δy. Then we compute D, the distance of the line from NE, which is given here, and the intensity is chosen based on this distance.
859
(Refer Slide Time: 35:06)
Also, we compute Dupper and Dlower, by first computing v.
(Refer Slide Time: 35:25)
So, Dupper and Dlower are these two values for this particular pixel NE. So, you have Dupper, Dlower.
Now, we use the table to determine the fraction of the original line colour to be applied to the
three pixels based on the three computed distances.
860
(Refer Slide Time: 35:56)
That was for NE; now for E' we have these two neighbouring pixels and the chosen pixel E'. Here, similarly, we can compute D to be this value, 1/√13.
(Refer Slide Time: 36:17)
And we compute v here to get the Dupper and Dlower values.
861
(Refer Slide Time: 36:33)
So, using v we get Dupper to be this value and Dlower to be this value. Now we know the distance of this pixel from the line as well as Dupper and Dlower, so again we go back to the table and perform the table lookup to determine the fraction of the line colour to be assigned to these 3 pixels.
That is how we choose the pixel colours. In summary, in the Gupta-Sproull algorithm we do not simply assign the line colour to the pixels chosen to represent the line. Instead, we choose the pixels following a particular line drawing algorithm and then compute three distances: the distance of the chosen pixel from the line and the distances of the neighbouring pixels from the line (here we consider only the vertical neighbours).
Based on these distances we find the fraction of the line colour to be applied to the chosen pixel as well as to the neighbouring pixels. Here we apply a cone filter function in the form of a table, in which fractions are listed against distances. So, those were the area sampling techniques.
862
(Refer Slide Time: 38:15)
Now, let us try to understand the other broad class of anti-aliasing techniques known as super
sampling.
(Refer Slide Time: 38:23)
Now, what do we do in super sampling? Here each pixel is assumed to consist of a grid of subpixels; in other words, we are effectively increasing the resolution of the display. To draw anti-aliased lines, we count the number of subpixels through which the line passes, and this number determines the intensity to be applied. For example, here the whole square is a pixel, which is represented with a 2×2 subpixel grid.
863
(Refer Slide Time: 39:12)
That is one approach. There are other approaches also. For example, we can use a finite line width and determine the inside subpixels, that is, the subpixels that are inside the finite width of the line. There can be a simple check for that: we may consider a subpixel to be inside only if its lower left corner is inside the line. Then the pixel intensity is a weighted average of the subpixel intensities, where the weights are the fractions of subpixels inside and outside; that is another simple approach.
(Refer Slide Time: 40:25)
864
For example, in this figure, as you can see, the line has some width and each pixel has a 2×2 subpixel grid. This is a pixel with a 2×2 subpixel grid, the pixel (0, 2); similarly, this is the pixel (1, 1), which also has a 2×2 subpixel grid. Now let us assume that the original line colour is given by these R, G, B values and the background colour is yellow, again with these R, G, B values. Then how do we choose the actual colour after rendering?
(Refer Slide Time: 41:30)
Now, in the figure, as you can see, these three subpixels may be considered inside the line. Thus the fraction of subpixels that is inside is 3/4 and the outside fraction is 1/4.
(Refer Slide Time: 42:07)
865
Now if we take a weighted average for individual intensity components, then we get the R value
for this particular pixel as this or this value, G value will be this value and B value will be 0. And
this R value or this G value and this B value together will give us the colour of that particular
pixel.
(Refer Slide Time: 42:45)
So, the intensity will be set with R equal to this, G equal to this and B equal to this value; we got this by considering the fraction of subpixels that are inside the line and the fraction that are outside. A worked sketch with assumed numbers is given below.
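The slide's actual R, G, B values are not reproduced here, so assume the line colour is pure red (1, 0, 0), the yellow background is (1, 1, 0), and 3 of the 4 subpixels of a pixel fall inside the line.

def supersample_colour(inside, total, line_colour, background_colour):
    # Weighted average of the two colours; weights are the subpixel fractions.
    w = inside / total
    return tuple(w * cl + (1 - w) * cb
                 for cl, cb in zip(line_colour, background_colour))

print(supersample_colour(3, 4, (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)))
# -> (1.0, 0.25, 0.0): R = 1, G = 0.25, B = 0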
(Refer Slide Time: 43:07)
866
Sometimes we also use weighting masks to control the amount of contribution of the various subpixels to the overall intensity. The mask size depends on the subpixel grid size, which is obvious because we want to control the contribution of each and every subpixel; so if we have a 3×3 subpixel grid, the mask size will also be 3×3, they should be the same.
(Refer Slide Time: 43:48)
For example, consider a 3×3 subpixel grid and suppose we are given this mask. Given this mask and the 3×3 subpixel grid, how do we choose the colour, or intensity, of a pixel?
(Refer Slide Time: 44:09)
867
We can use the rule that the intensity contribution of a subpixel is its corresponding mask value divided by 16, which is the sum of all the mask values. So for the subpixel (0, 0) the contribution will be 1/16; for the subpixel (1, 1) the contribution will be 4/16, because the corresponding mask value is 4, and so on. Whatever the subpixel intensity value is, we multiply it by this fraction to get its fractional contribution, and then we add those up to get the overall pixel intensity.
(Refer Slide Time: 45:07)
Now suppose a line passes through, or encloses, the top, centre, bottom-left and bottom subpixels of a pixel. Assume the same mask as shown here and a 3×3 subpixel grid.
(Refer Slide Time: 45:32)
868
Then, to get the pixel intensity for the line, where the line colour is cl (the original colour) and the background colour is cb, we can use this simple formulation: pixel intensity = (total contribution of the covered subpixels) × cl + (1 − total contribution) × cb, applied to each of the R, G, B components. A small sketch of this computation is given below.
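The mask values quoted above (1 at a corner, 4 at the centre, summing to 16) are consistent with the common 1-2-1 / 2-4-2 / 1-2-1 pattern, which we assume here; the covered subpixels are the top, centre, bottom-left and bottom ones from the example, indexed as (row, column), and the colours are assumed values.

MASK = [[1, 2, 1],
        [2, 4, 2],
        [1, 2, 1]]        # assumed 3x3 weighting mask, sums to 16

def masked_intensity(covered, line_colour, background_colour):
    # covered: list of (row, col) subpixels the line passes through or encloses
    w = sum(MASK[r][c] for r, c in covered) / 16.0      # total contribution
    return tuple(w * cl + (1 - w) * cb
                 for cl, cb in zip(line_colour, background_colour))

# Top, centre, bottom-left and bottom subpixels: weight = (2 + 4 + 1 + 2) / 16.
print(masked_intensity([(0, 1), (1, 1), (2, 0), (2, 1)], (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)))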
(Refer Slide Time: 46:11)
So, that is, in effect, what we can do in super sampling. Let us now summarize what we have learned so far.
(Refer Slide Time: 46:25)
869
With this discussion we have come to the end of our coverage of the 3D pipeline; in this series of lectures we have covered the five stages, as I mentioned at the beginning. Today we learned about one important issue, aliasing, and how to address it. It happens because in the fifth stage, when we convert a shape or an image to a pixel grid, we get distortions, and anti-aliasing techniques are useful for avoiding those distortions.
To do that, we follow broadly two groups of techniques: area sampling or pre-filtering, and super sampling or post-filtering. Under pre-filtering we learned about the Gupta-Sproull algorithm along with another simple approach; similarly, under post-filtering we learned about three approaches, and there are many others, of course.
These are meant to give you some idea of the issues and how they are addressed, but clearly, as you can see, if we go for anti-aliasing techniques, it entails additional computation and additional hardware support, which of course has its own cost. With that we end our discussion today, and that is also the end of our discussion on the 3D graphics pipeline.
In the next lecture, we shall learn about pipeline implementation, that is, how the pipeline stages that we have learned so far are implemented; in particular, we will learn about graphics hardware and software. In the introductory lectures we learnt about graphics hardware and software only briefly; here we will go into more detail.
(Refer Slide Time: 48:53)
Whatever I have discussed today can be found in this book. You are advised to go through
chapter 9 section 9.5 for more details on the topics. So see you in the next lecture, till then, thank
you and goodbye.
Computer Graphics
Professor Doctor Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology Guwahati
Lecture 29
Graphics I/O Devices
Hello, and welcome to lecture number 29 in the course Computer Graphics. So, before we go
into today's topic, we will quickly recap what we have learned so far.
(Refer Slide Time: 00:44)
Now, till today, we have covered the stages of the 3D Graphics Pipeline. We completed our
discussions on the pipeline stages. Today and in the next few lectures, we are going to look
into its implementation that means, how the pipeline stages are implemented.
(Refer Slide Time: 01:15)
So, in these lectures on the pipeline, as well as the lectures that preceded the pipeline discussion,
what have we learned?
(Refer Slide Time: 01:30)
We can summarize the learning as the fundamental process involved in synthesizing or
depicting an image on a computer screen; that is what we have learned so far.
(Refer Slide Time: 01:54)
Now, in this process there are several stages. The process starts with an abstract
representation of objects, which involves representing points or vertices, lines or edges and
other such geometric primitives; that is the first thing we do in executing the process. Next,
the subsequent stages of the pipeline are applied to convert this representation to a bit
sequence, a sequence of 0s and 1s.
And then this sequence is stored in the frame buffer, and the content of the frame
buffer is used by the video controller to activate appropriate pixels, so that we perceive the
image; that is the whole process. We first define some objects, or in other words we define a
scene, then we apply the pipeline stages on this definition to convert it to 0s and 1s, and
these 0s and 1s are stored in the frame buffer. The frame buffer values are used by the video
controller to activate appropriate pixels on the screen to give us the perception of the desired
image.
(Refer Slide Time: 03:43)
So far, we have discussed only the theoretical aspects of this process, that is, how it works
conceptually. But we did not discuss how these concepts are implemented. Today and in the
next few lectures, we will do that; that will be our primary focus: how the concepts that we
have discussed to understand the process are implemented in practice.
(Refer Slide Time: 04:23)
So, what we will learn? We will learn the overall architecture of a graphics system, how it
looks. Then, we will have discussion on display device technology. We will also learn about
graphics processing unit or GPU in brief. Then, we will mention how the 3D pipeline is
implemented on graphics hardware.
And finally, we will learn about OpenGL, which is a library provided to ease graphics
software implementation. So, we will start with how a graphic system architecture looks.
Remember that we have already introduced a generic system architecture in our introductory
lectures. We will quickly recap and then try to understand it with the new knowledge that we
have gained in our previous discussions.
(Refer Slide Time: 05:37)
So, if you recollect, in the generic system architecture we have
several components, as shown in this figure. We have the host computer, which issues
commands and accepts interaction data. Then we have the display controller, which is a
dedicated graphics processing unit and which may take input from input devices. The output
of this display controller is stored in video memory, and this video memory content is used
by the video controller to render the image on the screen; that is what we have briefly learned
about earlier.
(Refer Slide Time: 06:41)
But as may be obvious, the terms that we used were very broad, they give some generic idea
without any details.
(Refer Slide Time: 06:55)
In the last few lectures, we have learned about new things: how the pipeline is organized,
what the algorithms are and what they do. So, in light of that new knowledge, let us try to
understand the relationship between these hardware components and the pipeline stages. Let
us assume that we have written a program to display two objects on the screen.
It is a very simple image having only a ball and a cube, something like this. So, this is the
screen; here we will show a ball, maybe with some lines, and a cube. So, we want to show
these two objects as an image on the screen, and we have written a program to do that. Then
let us try to understand with respect to the generic architecture, what happens.
(Refer Slide Time: 08:25)
Once the CPU detects that the process involves graphics operations, because here display is
involved, it transfers the control to display controller. In other words, it frees itself from
doing graphics related activities, so that it can perform other activities. Now, the controller
has its own processing unit separate from CPU, which is called GPU or graphics processing
unit. We will learn in more details about the GPU in a later lecture.
Now, these processing units can perform the pipeline stages in a better way. So, there are
specialized instructions using which the stages can be performed on object definition by the
GPU to get the sequence of bits. So essentially, conversion of the object definition to
sequence of bits is performed by GPU with the use of specialized instructions.
(Refer Slide Time: 09:58)
Now, this bit sequence is stored in frame buffer, which we have already mentioned before. In
case of interactive systems, where user can provide input, frame buffer content may change
depending on the input that is coming from the input devices.
(Refer Slide Time: 10:21)
But we must keep in mind that frame buffer is only a part of the video memory. It is not the
entire video memory. We also require other memory to store object definitions as well as to
store instructions for graphics operations, that means the code and data part. So, that is what
constitute video memory, we have frame buffer as well as other memory to store various
things.
(Refer Slide Time: 11:04)
Now, how do we organize this memory? There are two ways. We can integrate the memory into the
generic architecture as shared system memory, that is, a single memory shared by both
CPU and GPU. Clearly, to access this memory we need to use the common system bus, as
shown in this figure, so execution may be slower.
So, in this figure, as you can see, we have the CPU, and the GPU here as part of the display controller,
and we have a common system memory which both of them access through this bus. If the GPU
wants to access it, it has to use the system bus; if the CPU wants to access it, it also has to use the
system bus, and accordingly it may be slow.
(Refer Slide Time: 12:13)
Otherwise, we can have dedicated graphics memory, which can be part of the graphics
controller organization. As shown here, the display controller has exclusive access to this
dedicated graphics memory or video memory. This memory has two
components: one is the memory containing other things and the other is the memory called the frame
buffer. And here there is no need to access shared memory through the common
system bus, so it is faster compared to the previous scheme.
(Refer Slide Time: 13:09)
Now, once the data is available in the frame buffer, the video controller acts on the frame buffer
content. Acting here means that the frame buffer content is mapped by the video controller to the
activation of corresponding pixels on the screen. For example, in case of a CRT, activation refers to
excitation, as we have seen earlier, by an appropriate amount of the corresponding phosphor dots
that are there on the screen.
Now, how to choose the appropriate amount? This amount of excitation is determined by
electron beam intensity, which in turn is determined by voltage applied on electron gun,
which in turn is determined by the frame buffer value. So, this is how this frame buffer value
affects the amount of excitation in case of CRT, and similar thing happens with respect to
other devices as well.
So, that is in summary, how we can understand the generic system architecture in light of the
stages that we have learned. So, we can relate the stages to the ultimate generation of image
on the screen at a very broad level as we have just discussed. Now, let us try to have a more
detailed understanding of different graphics hardware and software. So, we will start with
graphics input and output devices.
(Refer Slide Time: 15:18)
Let us start with the output devices. Now, as we all know, whenever we talk of a graphics
output device, what immediately comes to mind is the video monitor or the so-called
computer screen. But there are other output devices as well. For example, output also means
projectors, where we project the content. Of course, as we all know, both can be present together in
a graphics system, both the monitor as well as a projector.
In addition, there is a third mode of output, that is, hardcopy output. We are already
familiar with these devices: one is the printer, the other is the plotter. Also, nowadays we have wearable
displays such as head mounted displays or HMDs, which are not traditional computer screens
but also provide a way to display output. So, outputs are available in different
ways.
(Refer Slide Time: 16:38)
In this lecture, we will talk about video monitors and hardcopy outputs, namely printers and
plotters in brief. We will start with video monitor.
(Refer Slide Time: 17:00)
Now, whatever screens we see nowadays are all called flat panel displays. This is a generic
term used to represent displays that are flat, as compared to earlier CRTs, which used to be
bulky. They are thinner and lighter compared to CRTs, of course, and useful for both
non-portable and portable systems. And they are almost everywhere: desktops, laptops, palmtops,
calculators, advertising boards, video-game consoles, wristwatches and so on. Everywhere, we
get to see flat panel displays. Now, there is a wide variation in these displays.
(Refer Slide Time: 17:54)
Flat panel effectively is a generic term, which indicates a display monitor having a much
reduced volume, weight and power consumption compared to CRT. So, whenever we are
talking of flat, it has to be understood in the context of CRT.
(Refer Slide Time: 18:22)
Now there are broadly two types of flat panel displays, one is emissive display, other one is
non-emissive displays.
(Refer Slide Time: 18:38)
In case of emissive displays, which are often known as emitters, what happens is that these
displays convert electrical energy into light on the screen. Examples are plasma panels, thin-film
electroluminescent displays and light emitting diodes or LEDs; these are all emissive
displays.
(Refer Slide Time: 19:15)
In case of non-emissive display, what happens is that such displays convert light which may
be natural or may come from other sources to graphics pattern on screen through some
optical effects, this is important. Example is LCD or liquid crystal displays.
(Refer Slide Time: 19:49)
Let us go into a bit more details of these types of displays. We will start with emissive
display.
(Refer Slide Time: 19:57)
As we mentioned, one example of emissive displays is the plasma panel display. In such
displays, we have two glass panels or plates placed parallel to each other, as shown in this figure.
And the region in between is filled with a mixture of gases: xenon, neon and
helium. So, this is the inside region between the two parallel glass plates, which is filled
with gases.
(Refer Slide Time: 20:45)
Now, the inner walls of each plate contain set of parallel conductors. And these conductors
are very thin and ribbon shaped. As shown here, these are set of parallel conductors, these are
also sets of parallel conductors. The conductors are placed on the inner side of the plate.
(Refer Slide Time: 21:22)
And as shown in this figure, one plate has set of vertical conductors, whereas the other
contains a set of horizontal conductors. The region between each corresponding pair of
conductors that means horizontal and vertical conductors is defined as a pixel. So, the region
in between these parallel conductors is called a pixel as shown here.
(Refer Slide Time: 22:06)
Now, the screen side wall of the pixel is coated with phosphors. For RGB or colour displays,
we have 3 phosphors corresponding to RGB values.
(Refer Slide Time: 22:32)
Now, what happens? The effect of the image displayed on the screen happens due to ions that
rush towards the electrodes and collide with the phosphor coating. When they collide, they emit
light, and this light gives us, as in the case of a CRT, the perception of the image. The
separation between pixels is achieved by the electric fields of the conductors. That is how
plasma panels work.
(Refer Slide Time: 23:22)
Then we have LEDs or light emitting diodes, which are another type of emissive device. In this
case, each pixel position is represented by an LED, so the overall
display is a grid of LEDs corresponding to the pixel grid. Now, this is different from the plasma
panel, as you can see, where we did not have such grids; instead, ions collide with phosphors
and produce light.
Now, based on the frame buffer content, a suitable voltage is applied to each diode in the grid
to emit an appropriate amount of light. This is again similar to the CRT, where we use the frame buffer
content to produce an electron beam of suitable intensity, which in turn produces a suitable intensity
from the phosphors.
(Refer Slide Time: 24:46)
Let us now try to understand non-emissive displays. An example is the LCD or liquid crystal
display. Here, like the plasma panel, we have two parallel glass plates, each having a
light-polarizing material, with the two polarizers aligned perpendicular to each other. Rows of horizontal
transparent conductors are placed on the inside surface of one plate, which has the vertical polarizer,
and columns of vertical transparent conductors on the other plate, which has the horizontal
polarizer.
(Refer Slide Time: 25:45)
Now, between the plates we have a liquid crystal material. This material refers to special
types of materials that have a crystalline molecular arrangement even though they flow like
liquids. LCDs typically contain threadlike, or nematic, crystalline molecules, which tend to
align along their long axes.
(Refer Slide Time: 26:27)
The intersection points of each pair of mutually perpendicular conductors define the pixel
positions. When a pixel position is active, molecules are aligned.
(Refer Slide Time: 26:52)
Now, this LCD can be of 2 types, reflective and transmissive.
(Refer Slide Time: 27:01)
In case of a reflective display, external light enters through one polarizer and gets
polarized. Then the molecular arrangement ensures that the polarized light gets twisted, so
that it can pass through the opposite polarizer. And behind that polarizer, a reflective surface
reflects the light back to the viewer. So here, it depends on external light.
(Refer Slide Time: 27:42)
In case of a transmissive display, we have a light source present on the backside of the screen,
unlike reflective displays where there is no light source. Light from the source
gets polarized after passing through the polarizer, is then twisted by the liquid crystal molecules,
and passes through the screen-side polarizer to the viewer. Here, to deactivate a pixel, a voltage
is applied to the intersecting pair of conductors, which causes the molecules in the pixel region
to get rearranged.
This arrangement prevents the polarized light from getting twisted and passing through the opposite
polarizer, effectively blocking the light. So, we do not get to see any colour at
those pixel locations. So, the basic idea in liquid crystal displays is that we have liquid
crystal in between the pixel positions; due to the molecular arrangement, light passes through or
gets blocked, and we get the image on the screen accordingly.
(Refer Slide Time: 29:20)
Another thing to note here is that both reflective and transmissive LCDs of this kind are also known
as passive matrix LCD technology.
(Refer Slide Time: 29:34)
In contrast, we also have active matrix LCD technology, which is another method of
constructing LCDs. In this case, thin film transistors or TFTs are placed at each pixel location
to have more control over the voltage at those locations, so they are more sophisticated. These
transistors also help prevent charge from gradually leaking out of the liquid crystal cells. So,
essentially, in case of passive matrix LCDs we do not have explicit control at the pixel locations,
whereas in case of active matrix LCDs we have transistors placed at those locations to have
more control over the way light passes.
(Refer Slide Time: 30:31)
Now, let us try to understand output devices, graphic output devices.
(Refer Slide Time: 30:41)
So, as we said, when we talk of output devices, one type is the display screen and the other is
hardcopy devices; display screens we have already discussed. Among hardcopy output devices we
have printers and plotters. In case of printers, there are broadly two types: impact printers and
non-impact printers.
(Refer Slide Time: 31:14)
Now, in case of impact printers, pre-formed character faces are pressed against an inked
ribbon on the paper. An example is the line printer, where typefaces mounted on a band, chain,
drum or wheel are used, and these typefaces are pressed against an ink ribbon on the paper.
So, in case of a line printer, a whole line gets printed at a time.
(Refer Slide Time: 32:00)
There is also the character printer, in which case one character is printed at a time; an example is the dot
matrix printer. Although they are no longer very widespread nowadays, in a few cases
they are still used. In such printers, the print head contains a rectangular array or matrix of
protruding wire pins or dots. The number of pins determines the print quality; a higher number
means better quality. Now, this matrix represents characters, and each pin can be retracted
inwards.
(Refer Slide Time: 33:28)
During printing, some pins are retracted, whereas the remaining pins press against the ribbon
on the paper, giving the impression of a particular character or pattern. So here, the objective is
to control the pins or dots: which pins to let impact the ribbon and which pins to pull
back inwards. Those are impact printers. More popular nowadays are non-impact printers,
with which we are all familiar. These include laser printers, inkjet printers, electrostatic methods and
electrothermal printing methods.
(Refer Slide Time: 33:51)
In case of a laser printer, what happens? A laser beam is applied to a rotating drum coated with a
photoelectric material such as selenium. Consequently, a charge
distribution is created on the drum due to the application of the laser beam. Toner is then
applied to the drum and gets transferred to the paper. So, due to the charge distribution,
the toner forms a pattern, the pattern of what we wanted to print, and that gets transferred to the
paper.
(Refer Slide Time: 34:41)
That was the laser printing technology. In case of inkjet printers, what happens? An electrically
charged ink stream is sprayed in horizontal rows across a paper wrapped around a
drum. Electrical fields deflect the charged ink stream, so that dot matrix patterns of ink are
created on the paper. So essentially, there is an ink stream which gets deflected by the electrical
field and then creates the desired pattern on the paper, which is wrapped around a drum.
(Refer Slide Time: 35:40)
Then we have electrostatic printer. In this case, a negative charge is placed on paper at
selected dot positions one row at a time. Now, the paper is then exposed to positively charged
toner, which gets attracted to the negatively charged areas, producing the desired output.
(Refer Slide Time: 36:14)
And finally, we also have electrothermal methods of printing. In this case, heat is applied to
selected pins of a dot matrix print head, and the print head is used to put patterns on heat-sensitive
paper. Of course, these two types are not as common as laser and inkjet printers,
but they are still used. That is about how printers work.
(Refer Slide Time: 36:50)
So far, we have not mentioned anything about colour printing. We will quickly try to
understand how colour printing works. In case of impact printers, different coloured ribbons are
used to produce colour printing, but the range of colours and the quality are usually
limited; they are much better in case of non-impact printers.
Here, colour is produced by combining three colour pigments: cyan, magenta and yellow. In
case of laser and electrostatic devices, these three pigments are deposited on separate passes. In
case of inkjet printers, these colours are shot together in a single pass along each line. So,
colour printing works differently for different printers.
(Refer Slide Time: 37:58)
Apart from printers, we also have plotters as another graphics output device. They are
hardcopy outputs. And typically, they are used to generate drafting layouts and other
drawings.
(Refer Slide Time: 38:19)
This figure shows one example plotter. Typically, in pen plotters, one or more pens
are mounted on a carriage or crossbar which spans a sheet of paper. This paper can lie
flat or be rolled onto a drum or belt, and it is held in place with clamps.
It can also be held in place with a vacuum or an electrostatic charge. As shown here, there is
a pen, a pen carriage and a moving arm, and there are spare pens as well, for different
colours. So, the pen can move along the arm, and the arm can move across the page.
(Refer Slide Time: 39:36)
To generate shading or styles, different pens can be used with varying colours and widths as
shown here.
(Refer Slide Time: 39:55)
And as I already mentioned, the pen-holding carriage can move; it can also be stationary,
depending on the nature of the plotter.
(Refer Slide Time: 40:20)
Sometimes, instead of a pen, inkjet technology is also used; that means instead of a pen, ink
sprays are used to create the drawing.
(Refer Slide Time: 40:36)
And how is this movement controlled? Again, it depends on the content of the frame buffer.
So, depending on the frame buffer values, the movement of the pens or the ink spray is
determined, just like in the case of video monitors. So, we have learned, in brief, about two types of
graphics output devices, namely video monitors and hardcopy outputs. Let us now try to
quickly understand the input devices: what kinds of inputs are there and how they affect the
frame buffer.
(Refer Slide Time: 41:22)
Most of the graphics systems that we typically see nowadays provide data input facilities,
which means the users can manipulate screen images. These facilities are provided in terms of
input devices. The most well-known such input devices are the keyboard and the mouse,
but there are many other devices and methods available. Let us have a quick look at all those
different devices and methods.
(Refer Slide Time: 42:04)
So, in the modern-day computing environment, as we know, we are surrounded by
various computing devices. We have laptops, desktops, tablets, smartphones, smart TVs,
microwaves, washing machines, pedometers and many more such devices that we interact with
every day, each of which can be termed a computer by the classical definition of a computer.
And accordingly, we make use of various input devices to provide input to these computers.
(Refer Slide Time: 43:02)
So, all these input devices or input methods can be divided into broad categories. The
objective of such devices is to provide means for natural interaction. They include
speech-based interaction, which means the computers are equipped with speech recognition and
synthesis facilities.
In case of speech recognition, the computer can understand what we say: we provide
input through our voice, and a speech recognition system understands what we
say. The computer can also produce output in terms of speech, human-understandable
speech, through the synthesis method.
Note that this is different from the input and output we have mentioned earlier. Then we
have eye-gaze interaction, where we use our eye gaze to provide input, and haptic or touch
interaction; one example is the touchscreen, which we use heavily nowadays because of
smartphones and tablets.
There are alternative output mechanisms also, exploiting the sensation of touch. These are
called tactile interfaces. Here, we do not rely on a visual display; instead we go for
tactile interfaces. These are primarily useful for people who have difficulty seeing.
(Refer Slide Time: 44:57)
We can also have "in air" gestures to provide input. These gestures can be made
using any of our body parts, like hands, fingers or even the head, and there is no need to touch
any surface, unlike smartphones or touchscreen devices, where we provide gestures
by touching the surface.
We can also provide input through our head or body
movements. So, all these are input mechanisms that are heavily used nowadays. Traditional
input mechanisms like the keyboard, mouse, joystick and stylus are no longer very popular; instead
we mostly interact with the computers that we see around us through touch, through gestures,
through speech and so on. All these devices are equipped to recognize such
input mechanisms.
And also, as I said, output need not always be visible; sometimes it can be different, as
in the case of tactile output, where we perceive the output through touch rather than by seeing
anything. There also, the frame buffer content can be utilized to create the particular
sensation of touch to give us a specific output. We can also provide output through speech,
speech synthesis to be more precise, and so on.
(Refer Slide Time: 46:58)
Now, these inputs can be used to alter the frame buffer content. For example, suppose I have created
an image of a cube and a ball, as in the example we started with. Now I give a voice command
to place the ball on the left side of the cube; the computer will understand this
command and accordingly modify the frame buffer values, so that the ball is now placed on
the left side of the cube.
Similarly, I can also give a command to place the ball on the right side of the cube. Again,
the frame buffer values will change, so that the display we get is an image showing
the ball on the right side of the cube, and so on. So, with these inputs we can change the
output. That is, in brief, how we can provide input and how it affects the frame buffer
content to get different outputs.
(Refer Slide Time: 48:25)
Whatever I have discussed today can be found in this book; you may refer to chapter 10,
sections 10.1 and 10.2. So today, we briefly discussed the different
technologies that are used in computer graphics systems, namely the display technologies,
the hardcopy output technologies and the input technologies.
In the next lecture, we are going to go deeper into the graphics hardware and learn
more about how the controller works, how the GPUs in the controller are organized and how they
help implement the pipeline stages. See you in the next lecture. Thank you, and goodbye.
Computer Graphics
Professor Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 30
Introduction to GPU and Shaders
Hello and welcome to lecture number 30 in the course, computer graphics. We have reached
almost the end of the course.
(Refer Slide Time: 00:45)
In the previous lecture we learned about basic graphics hardware: the graphics input and output
devices and the general architecture of a graphics system. Now, we will continue our discussion
on graphics hardware, and today we will cover the basics of the GPU and GPU programming,
which are part of the graphics hardware.
(Refer Slide Time: 01:21)
So, let us start with GPU. What it is and how it is used to implement the pipeline.
(Refer Slide Time: 01:40)
One thing we should note is that graphics operations are highly parallel in
nature. That is a very crucial characteristic of graphics operations. Consequently, there is a
need to go for parallel processing of these operations.
(Refer Slide Time: 01:56)
For example, consider the modeling transformation stage. Remember what we do in this stage?
We convert or transform objects defined in their own or local coordinate system to a world
coordinate scene. How do we do that? We apply transformations, for example rotations, to
the vertices that define the objects.
(Refer Slide Time: 02:34)
And what is a transformation? If you recollect from our earlier discussions, we define a
transformation as a multiplication of two things: one is a transformation matrix and the other
is a vertex vector.
The thing to note here is that the same matrix-vector multiplication is done for all the vertices
that we want to transform. That means we are essentially performing the same operation,
a multiplication, for all the vertices.
(Refer Slide Time: 03:26)
Now, we are given a set of vertices that define the objects. We can go for serial
multiplication, where we perform one matrix-vector multiplication at a time. However, that
is not going to be very efficient.
Because we are essentially performing the same operation, instead of going for serial
multiplication, if we can perform the same operation on all the vectors at the same time, that is,
in parallel, then we are going to have a significant gain in performance. And this is very
important in real-time rendering of scenes, because typically we need to process millions of
vertices per second. Therefore, if we can process all these
millions of vertices in parallel, then we are going to have a huge gain in performance.
(Refer Slide Time: 04:40)
If we perform these operations using our CPU, then we cannot take advantage of this
inherent parallel nature of graphics operations, because CPUs are not designed for that. In
order to address this issue, that is, to take advantage of this inherent parallelism,
special purpose hardware comes with our systems.
Almost all graphics systems come with a separate graphics card containing its own
processing unit and memory elements. This separate specialized hardware is
called the graphics processing unit or GPU. So, essentially, a GPU is specialized hardware
used to perform graphics operations by exploiting the inherent parallelism of those
operations.
(Refer Slide Time: 06:08)
(Refer Slide Time: 06:24)
Let us go into the working of the GPU. We have to note that a GPU is a multicore system,
which means it contains a large number of cores or unit processing elements. Each of
these cores or unit processing elements is called a stream processor, because it works
on data streams, streams of input data.
(Refer Slide Time: 06:55)
Now, these cores are simple hardware capable of performing only simple integer and
floating-point arithmetic operations. So, each core can perform arithmetic operations
only, either integer or floating-point. And multiple cores are grouped
together to form another unit called a streaming multiprocessor or SM. So, each core is called a
stream processor, and many such cores are grouped together to form streaming
multiprocessors.
(Refer Slide Time: 07:43)
Now, this brings us to the idea of SIMD; note the term. To understand it, let us consider one
example, the geometric transformation of vertices that we were discussing earlier. Here our
instruction is the same, a multiplication. The data on which this instruction operates
varies, because the vertex vectors vary, although the transformation matrix remains the
same. So, what we are doing here is having a single instruction working on multiple
data.
This is the idea of SIMD or Single Instruction Multiple Data, and the GPU streaming
multiprocessors are essentially examples of SIMD. How does it work? As you can see, we
have the same instruction given to all the cores, and the cores take data which may be different,
but the instruction is the same and operates on the different data
streams. Another illustration of this idea is given here: if we do not consider SIMD, then
what happens? We have two addition operations on two data streams,
A0, B0 and A1, B1, so there will be two separate instructions for performing these two separate
additions, giving the outputs C0 and C1; this is normal operation. In case of SIMD,
we have these as data streams and a single instruction. So, in the first case we have two
instructions working on two separate data streams, whereas with SIMD we have a single
instruction working on both data streams to give us the desired output. That is the idea of
SIMD. So, you now know that GPUs contain SMs or streaming multiprocessors, which work
based on the idea of SIMD.
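As an aside, the same SIMD idea can be illustrated on an ordinary CPU using SSE intrinsics, where one add instruction operates on four pairs of values at once. This is only an analogy for how a streaming multiprocessor applies a single instruction to multiple data streams, not actual GPU code, and the specific values are assumptions for illustration.

    #include <stdio.h>
    #include <xmmintrin.h>      /* x86 SSE intrinsics */

    int main(void)
    {
        __m128 A = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);     /* A0..A3 */
        __m128 B = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f); /* B0..B3 */
        float C[4];

        /* a single add instruction produces all four results C0..C3 */
        _mm_storeu_ps(C, _mm_add_ps(A, B));

        printf("%g %g %g %g\n", C[0], C[1], C[2], C[3]);
        return 0;
    }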
(Refer Slide Time: 10:25)
Then let us have a look at how GPUs are organized. As I said, each streaming
multiprocessor is designed to perform SIMD operations, and we have many such
streaming multiprocessors, as shown here. Then we have some specialized memory, the purpose
of which will be explained shortly, and other components to manage this parallel
processing. Each streaming multiprocessor contains multiple streaming processors or cores,
as shown here, and each core is capable of performing only simple integer or floating-point
arithmetic operations.
(Refer Slide Time: 11:33)
So, that is broadly what is there in a GPU: streaming multiprocessors and dedicated memory
units, plus additional components to manage the parallel processing. Now, let us try to
understand how the graphics operations work in the GPU. The first thing we should note is that
most real-time graphics systems assume that the scene is made of triangles. So, we actually
convert any surface into triangles or triangular meshes; this point we have already discussed
earlier when we were talking about object representation.
(Refer Slide Time: 12:34)
Now, given that triangular mesh information, what happens is that, through dedicated APIs
provided by graphics libraries such as OpenGL or Direct3D, these triangles are
sent to the GPU one vertex at a time, serially, and the GPU assembles them into triangles.
(Refer Slide Time: 13:03)
Also, we should note here that the vertices are represented in the homogeneous coordinate
system. And since we are dealing here with the very first stage, that is object definition, these
objects are defined in their local or modeling coordinate systems. Then the GPU performs all the
stages: first it performs the modeling transformation on the vertices, which is the first stage of
processing.
(Refer Slide Time: 13:54)
And as we have explained earlier, this transformation is achieved with a single transformation
matrix and vertex vector multiplication operation.
(Refer Slide Time: 14:15)
As we have noted earlier, the multicore GPU performs such operations simultaneously, in
parallel, so multiple vertices are transformed at the same time; it is not the case that we perform
the multiplications one after another. What we get after the multiplications is a stream of
triangles, but this time they are defined in the world coordinate
system, which is the purpose of the modeling transformation stage. It is also assumed that the
viewer is located at the origin of the world coordinate system and the view direction is aligned
with the z axis; this is the assumption with which the hardware is designed.
(Refer Slide Time: 15:17)
So, after the modeling transformation, the GPU computes the vertex colours, that is, the lighting
stage is performed. This is done based on the light defined for the scene: some light
source is assumed, and based on that light source the colouring is done. Why is the GPU
suitable for computing colours? Because, if you recollect our discussion on lighting, we
noted that colour can be computed with vector dot products and a series of addition
and multiplication operations. These operations are performed simultaneously for
multiple vertices by the GPU, because it is designed that way; so again we are
exploiting the inherent parallelism of graphics operations.
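As a reminder of why lighting maps so well to this hardware, here is a small C sketch of a per-vertex diffuse (Lambertian) term, just a dot product followed by a multiplication. The function names and the assumption that the vectors are already normalized are illustrative; this is only one piece of the lighting model discussed earlier, not the lecture's code.

    typedef struct { float x, y, z; } Vec3;

    static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    /* diffuse intensity = kd * max(0, N . L); the same arithmetic is
     * repeated for every vertex, so the GPU can do many at once */
    float diffuse(Vec3 normal, Vec3 to_light, float kd)
    {
        float d = dot(normal, to_light);
        return (d > 0.0f) ? kd * d : 0.0f;
    }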
(Refer Slide Time: 16:38)
After colouring, each coloured 3D vertex is projected onto the view plane. That again is
done using a matrix-vector multiplication, as we have already noted during our
discussion on projection transformation, and the output we get is a stream of triangles in the
screen or device coordinates, ready to be converted to pixels. Note here that this
projection actually involves the view transformation as well, which we have not explicitly
mentioned here, and also the window-to-viewport transformation. All these transformations
can be clubbed together by multiplying the corresponding transformation matrices to get a single
transformation matrix.
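Clubbing transformations together is itself just a matrix-matrix multiplication. A small C sketch is shown below (illustrative, not lecture code); the combined matrix would then be applied to every vertex with a single matrix-vector product, as in the earlier sketch.

    /* C = A * B for 4x4 matrices; e.g. combined = Projection * View * Model,
     * with the matrices applied to a vertex from right to left. */
    void mat4_mul(const float A[4][4], const float B[4][4], float C[4][4])
    {
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++) {
                C[i][j] = 0.0f;
                for (int k = 0; k < 4; k++)
                    C[i][j] += A[i][k] * B[k][j];
            }
    }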
(Refer Slide Time: 17:46)
So, after that stage we get the device-space triangles, and we now go for rasterization or scan
conversion. Here it may be noted that each device-space triangle overlaps some pixels on the
screen, meaning those pixels are part of the triangle. In the rasterization stage these
pixels are determined.
(Refer Slide Time: 18:20)
Now, the GPU designers who developed GPUs over the years incorporated many such
rasterization algorithms; we have already discussed a few in our lectures on rasterization.
These algorithms exploit one crucial observation: each pixel can be treated
independently of all other pixels. So, it is not necessary to treat the pixels as dependent on
each other; they can be treated independently.
(Refer Slide Time: 19:01)
Accordingly, the pixels can be rasterized in parallel; we can use this inherent parallelism to
rasterize all the pixels simultaneously. And that is one big advantage of having a GPU: we do
not have to process one pixel at a time; instead we can process all the pixels together to get the
output quickly.
(Refer Slide Time: 19:36)
Now, if you recollect the 3D graphics pipeline, during the pixel processing stage there are
two more activities: one is surface texturing, or assigning patterns to the surface colours, and
the second is hidden surface removal or HSR.
(Refer Slide Time: 20:03)
The surface texturing idea is very simple: there is a texture image which is imposed on the
surface to give us the illusion of detail. Note that it only creates an illusion
rather than actually computing a texture pattern; we simply replace pixel colours
with texture colours. That is the simplest idea, which we discussed earlier.
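A minimal C sketch of this "replace the pixel colour with the texture colour" idea is shown below; the nearest-neighbour lookup, the structure names and the row-major texture layout are assumptions made for illustration only.

    typedef struct { unsigned char r, g, b; } Texel;

    /* Return the texel nearest to texture coordinates (u, v) in [0, 1];
     * this colour then simply replaces the pixel colour. */
    Texel texture_lookup(const Texel *tex, int width, int height,
                         float u, float v)
    {
        int x = (int)(u * (width  - 1) + 0.5f);   /* nearest texel column */
        int y = (int)(v * (height - 1) + 0.5f);   /* nearest texel row    */
        return tex[y * width + x];                /* row-major storage    */
    }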
(Refer Slide Time: 20:40)
Now, in order to do that, we need to store these texture images or texture maps. And since we need
to access the texture images frequently, ideally they should be stored in high-speed
memory so that the access time is low. This is because, as we said earlier, pixel calculations
are very frequent, and each pixel calculation may need to access the texture images. Secondly, the
access is usually very regular in nature: nearby pixels tend to access nearby
texture image locations. So, to reduce the access time, a specialized memory
cache is used to store the texture images, as shown here in this figure. These are specialized
memory locations in the GPU for storing texture images.
(Refer Slide Time: 22:05)
Also, we discussed earlier, during our discussion on hidden surface removal, the idea of the
Z-buffer or depth buffer algorithm. That is implemented in GPUs, and for that also GPUs
are typically equipped with a specialized memory element, the depth buffer. It stores the
distance of each pixel from the viewer. So, that is typically part of the GPU.
(Refer Slide Time: 22:48)
Now, if you recollect how the Z-buffer works, here also the GPU compares the new pixel's
distance with the distance of the pixel already present, that is, it simply executes the algorithm, and the
display memory is updated only if the new pixel is closer. So, it implements the Z-buffer
algorithm; for more details you may refer to lecture 23. So, you have the streaming
multiprocessors, each containing cores, then the various data items for performing these
simultaneous operations, plus the specialized texture storage; these together form the GPU.
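The per-pixel test the GPU performs might look like the following C sketch; the buffer layout and the names are assumptions for illustration, and the real hardware implements this in fixed-function or highly optimized form.

    /* Depth test for one pixel: the new fragment replaces what is stored
     * only if it is closer to the viewer (smaller depth value). */
    void depth_test(float *depth_buf, unsigned int *color_buf, int index,
                    float new_depth, unsigned int new_color)
    {
        if (new_depth < depth_buf[index]) {   /* closer than stored pixel */
            depth_buf[index] = new_depth;     /* update depth buffer      */
            color_buf[index] = new_color;     /* update display memory    */
        }
    }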
(Refer Slide Time: 24:07)
Now, there is one more concept, the idea of shaders and shader programming. Let us
try to understand this programming concept in brief, at a very introductory level. In our earlier
discussion on the GPU, we discussed how GPUs implement the pipeline stages.
In that discussion, as you may have noted, there are two broad groups of activities: one is the
processing of vertices, or vertex processing, also called geometry processing;
the other is the processing of pixels. These two broad groups of activities were discussed to
explain the working of the GPU.
(Refer Slide Time: 24:54)
Now, during the early years of GPUs, they used to come with a fixed-function hardware
pipeline, meaning that all the stages that implement the pipeline were
pre-programmed and embedded into the hardware. The GPU contained dedicated components for
specific tasks, and the user had no control over how a task should be performed or which
processing unit performs which stage of the pipeline.
So, earlier GPUs used to come with fixed-function hardware: everything was
predetermined, including which component of the GPU would deal with which part of the pipeline, and
the user had no control over it. The flow was typically like this: from the user program the
primitives were sent, then there were components for geometry processing whose output is 2D
screen coordinates, from there pixel processing starts, and those components were again fixed.
(Refer Slide Time: 26:11)
But then people realized that this was actually reducing flexibility, and because of that the power of the
GPU was not fully utilized. To leverage the GPU power better, modern GPUs are designed to
be programmable, that means we can program them. Fixed-function units are replaced by a
unified grid of processors known as shaders. So, where earlier there were fixed-function units, now
there is a unified grid of processors, which are called shaders.
(Refer Slide Time: 27:05)
Any processing unit can now be used for performing any pipeline stage calculation, and the
GPU elements, that is, the processing units and memory, can be reused through user programs.
So, earlier we had fixed units for performing different stages; now we have common facilities
which are reused to perform different stages, and it is determined through programming
which portion of the GPU elements, namely the processing units and memory, is used
for performing the operations related to a particular stage, and how. The idea is shown here: once
the primitives are sent to the GPU, the GPU has common elements, and subsets of these
common elements are used for different purposes; as you can see, the memory is also
shared and reused.
(Refer Slide Time: 28:35)
Now, the idea is that we write programs to use GPU elements, these programs are called
shader programs and the corresponding approach is called shader programming. Let us
briefly go through the basics of shader programming.
(Refer Slide Time: 29:01)
Now, with the programmable GPUs that we have just introduced, it is possible for a programmer
to modify how the GPU hardware processes vertices and shades pixels; shading means assigning
colour to the pixels.
This is possible by writing vertex shaders and fragment shaders, also called vertex
programs and fragment programs. These are terms that you have probably come across;
they are used to specify to the GPU how to use its hardware for a specific purpose. And this
approach, as I said, is known as shader programming; it has other names also, such as GPU
programming, graphics hardware programming and so on.
(Refer Slide Time: 30:12)
In case of vertex shaders, these programs are used to process vertices, or the geometry.
Essentially, these programs are used to perform the modeling transformations,
lighting, and projection to screen coordinates, which involves the intermediate
view transformation and, conceptually, the window-to-viewport transformation
as well.
(Refer Slide Time: 30:52)
A fragment shader does a different job: these are programs that perform the
computations required for pixel processing. What are those computations? They are related
to how each pixel is rendered, how texture is applied (texture mapping), and whether to draw
a pixel or not (hidden surface removal). So, these three are the tasks done by fragment
shaders; note that all three are related to the processing of pixels. Vertex shaders deal with the
processing of vertices, mostly the transformations from modeling coordinates to device
coordinates and all the transformations in between,
whereas fragment shaders deal with pixel processing, that is, rendering of pixels, applying
textures, and performing hidden surface removal at the pixel level.
(Refer Slide Time: 32:14)
Now, why are the pixel processing units called fragment shaders? The name implies that the GPU at any
instant processes a subset, or fragment, of all the screen pixels. So, at a time
a subset of the screen pixels is processed; hence it is called a fragment shader.
(Refer Slide Time: 32:43)
Now, these shader programs are small pieces of code which are embedded in user programs,
sent to the graphics hardware by the user programs, essentially by calling some APIs, and
executed on the graphics hardware. So, we should keep this in mind: they are small pieces of
code that are executed on the graphics hardware.
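To make this concrete, here is a minimal, hedged sketch in C of how such shader code is typically handed to the GPU through the OpenGL 2.x API; glCreateShader, glShaderSource, glCompileShader, glCreateProgram, glAttachShader and glLinkProgram are standard OpenGL calls, while the two one-line shaders, the use of GLEW to obtain the entry points, and the omission of error checking are assumptions made for illustration, not code from the lecture.

    #include <GL/glew.h>   /* assumed: an extension loader provides GL 2.x entry points */

    /* Tiny vertex and fragment shaders written as C strings (legacy GLSL):
     * the vertex shader transforms the vertex, the fragment shader paints red. */
    static const char *vsrc =
        "void main() { gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex; }";
    static const char *fsrc =
        "void main() { gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); }";

    /* Compile both shaders and link them into one program object. */
    GLuint build_program(void)
    {
        GLuint vs = glCreateShader(GL_VERTEX_SHADER);
        GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
        glShaderSource(vs, 1, &vsrc, NULL);
        glShaderSource(fs, 1, &fsrc, NULL);
        glCompileShader(vs);
        glCompileShader(fs);

        GLuint prog = glCreateProgram();
        glAttachShader(prog, vs);
        glAttachShader(prog, fs);
        glLinkProgram(prog);
        return prog;           /* activate later with glUseProgram(prog) */
    }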
(Refer Slide Time: 33:18)
In fact, this ability to program GPUs gave rise to a new idea, that of the general
purpose GPU or GPGPU. These are common terms nowadays and you have probably
come across them; the idea is that we can use a GPU for any purpose, not necessarily only
to perform graphics-related operations. So, with GPGPU we can perform tasks
that are not related to graphics at all. However, these are very involved subjects and we will
not go any further into these concepts.
(Refer Slide Time: 34:06)
So, in summary what we have learnt today, let us try to recap quickly.
(Refer Slide Time: 34:20)
We learned how the hardware works, that is, the graphics processing units
which are part of the computer systems that deal with graphics operations. We also
learned how the pipeline stages are implemented in the GPU and got introduced to the idea of
shaders and shader programs.
(Refer Slide Time: 35:02)
Now, that is about hardware. So, in the previous lecture and today's lecture we learned about
graphics hardware.
We started with a discussion of the general architecture of a graphics system, a very generic
architecture, then explained different terms, and then in some detail learned how the GPU works.
One component remains: how, as a programmer, we can write a program to perform
graphics operations. That aspect of the course, writing programs to
perform graphics operations or create a scene on the screen, we will learn in the next
lecture, where we will learn about writing programs using OpenGL, which is a graphics
library.
(Refer Slide Time: 36:10)
Whatever we have discussed today you can find in this book, specifically Chapter 10, Section
10.3. That is all for today; see you in the next lecture. Thank you and goodbye.
Computer Graphics
Professor Dr. Samit Bhattacharya
Department of Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 31
Programming with OpenGL
Hello and welcome to lecture number 31 in the course Computer Graphics. This is going to be
our final lecture on the topic. So far, what have we discussed?
(Refer Slide Time: 00:44)
We discussed the pipeline, and currently we are discussing pipeline implementation. In the
earlier lectures we learned about the basic graphics hardware, including the graphics input and
output devices, the GPU or Graphics Processing Unit, and the basic idea of GPU programming.
Today, in this last topic, we are going to learn about programming, that is, how to write graphics
programs; this is essentially the software aspect of computer graphics.
Now, before we learn to program, we will first start with a basic introduction to graphics
software. If you recollect, during our introductory lectures we had a preliminary
introduction, but today we are going to recap as well as expand those discussions.
(Refer Slide Time: 01:55)
As we have mentioned earlier, graphics software is broadly of two types: one is the special
purpose packages and the other is the general programming packages.
(Refer Slide Time: 02:13)
What do we have in the special purpose packages? These are essentially complete software systems
with their own GUIs or user interfaces. For example, a painting system has its own user
interface through which an artist can select objects, select colours, place the objects at the desired
position on the canvas or screen, and change the size, shape and orientation of the objects,
and so on.
And all this the artist can do by interacting with the user interface; the artist need not
know anything about the graphics pipeline or how it is implemented. These are examples of
complete software systems or packages. Another example is the CAD package that we
learned about in the introductory lectures; CAD or Computer Aided Design packages are
primarily used in architecture, medicine, business, engineering and such domains.
(Refer Slide Time: 03:46)
The other type of software is the general programming package. Here we have libraries of
graphics functions that are provided and can be used with a programming language
such as C, C++ or Java, and these functions are meant to help a programmer perform the
pipeline tasks. In other words, they help the programmer implement the pipeline.
An example is OpenGL, which stands for Open Graphics Library. There are also VRML
(Virtual Reality Modeling Language), Java 3D and so on. So, there are many such libraries
provided to implement graphics functions.
(Refer Slide Time: 04:55)
Now, these functions are also known as a computer graphics application programming interface or
CG API. They are essentially a software interface between the programming language and the
hardware. For example, when we want to write an application program in a language, say C,
these library functions allow us to construct and display pictures on the output device;
without these functions we would not be able to do so.
(Refer Slide Time: 05:39)
But one thing we should keep in mind is that the graphics functions are typically defined
independently of the programming language, and access from a particular language is achieved
through a concept called language binding. A language binding is defined for a particular high-level
programming language.
Through such a binding we get the particular syntax to be used for accessing the various graphics
functions from that language. So, essentially, a language binding allows us to use these library
functions from inside a program written in a particular language.
(Refer Slide Time: 06:33)
Now, each language binding is designed to make the best use of the capabilities of a
particular language and to handle various syntax issues such as data types,
parameter passing and error handling. These specifications or language bindings are set by
the ISO or International Organization for Standardization, so we need to know about these standards. We
will have a brief introduction to the different standards used for computer graphics.
(Refer Slide Time: 07:28)
So, what are those standards, software standards that are used in computer graphics?
(Refer Slide Time: 07:32)
Now, why do we need standards? Let us try to understand. When we write a program with
graphics functions, it may be the case that the program is moved from one hardware platform
to another. How will the computer then understand the program if the platform is changed?
That is where we require a standard. Without some standard, which is essentially a commonly agreed
syntax, this movement between platforms will not be possible and we would need to rewrite the whole
program, essentially starting from scratch. So, a standard helps us avoid such a
situation.
(Refer Slide Time: 08:33)
The first graphics standard came in 1984, long ago, and was known as the Graphics Kernel System or,
in short, GKS. It was adopted by ISO as well as many other national standards bodies.
(Refer Slide Time: 08:57)
Then came a second standard, developed by extending GKS, called PHIGS,
which stands for Programmer's Hierarchical Interactive Graphics System. PHIGS again was
adopted by the standards organizations worldwide.
(Refer Slide Time: 09:36)
Now, around the same time that the other standards were being developed, Silicon Graphics Inc.
or SGI started to ship their graphics workstations with a set of routines or library
functions; together these are called the Graphics Library or GL.
(Refer Slide Time: 10:10)
Subsequently this set of functions, GL, became very popular and eventually evolved into
OpenGL in the early 1990s, which has become a de facto graphics standard. This standard is
now maintained by the OpenGL Architecture Review Board, which is a consortium of
representatives from many graphics companies and organizations.
(Refer Slide Time: 10:45)
Now, let us try to understand what is there in OpenGL, what functions it provide and how to use
those functions.
(Refer Slide Time: 10:58)
Let us try to understand OpenGL with respect to one example program, shown here. This program
is meant to display a straight line on the screen. It has been
written by calling OpenGL library functions from C, the C language. Let us try to
understand the syntax of the program.
(Refer Slide Time: 11:30)
So, in order to make use of the library functions, the first thing we should do is include a
header file that contains the library functions; here we have included it with
the statement #include <GL/glut.h>. Now, what does this library name mean?
(Refer Slide Time: 12:04)
The core library of OpenGL actually does not support input and output operations, because those
functions were designed to be device independent, whereas support for I/O must be device
dependent. So, we need to do something about it, because we have to display the line on the
output device, which is essentially a device-dependent operation.
(Refer Slide Time: 12:40)
So, to display, we require auxiliary libraries on top of the core library; this is provided by the
library GLUT, the OpenGL Utility Toolkit, which is what is mentioned in this
include statement.
(Refer Slide Time: 13:14)
Now, GLUT provides a library of functions for interacting with any screen windowing system,
essentially any display device. It allows us to set up a display window on our screen; in this
window we are going to show the image, or whatever we want to display. This display window
is essentially a rectangular area which contains the image, and we can set it up with the help of
functions provided in the GLUT library.
(Refer Slide Time: 14:01)
Now, whichever library functions we use that are part of GLUT, they come with the prefix 'glut'.
(Refer Slide Time: 14:16)
So, essentially these functions provide an interface to the device-specific window systems that we
have already mentioned. We can write device-independent programs using these GLUT
functions, and the functions themselves are used to link our program to the particular device.
(Refer Slide Time: 14:45)
Also, we should note here that the GLUT library is suitable for graphics operations only; for
any other operation we may need to include other header files such as stdio.h or stdlib.h, as
we do in our regular programs.
(Refer Slide Time: 15:12)
Now, let us start with the main function, which is shown here, and let us try to
understand its body. As we said, GLUT allows us to create and manage a display
window, the screen region on which we want to display the line. So, the first thing that is
required is to initialize GLUT with the statement glutInit as shown here; this is the initialization
function required at the beginning.
(Refer Slide Time: 16:00)
After initialization, we can set various options for the display window using the function
glutInitDisplayMode as shown in the second statement. So, what are these options?
(Refer Slide Time: 16:27)
Now, these options are provided as arguments in the form of symbolic GLUT constants, as shown here:
GLUT_SINGLE and GLUT_RGB.
(Refer Slide Time: 16:44)
Now, in this particular call we have used these two arguments,
GLUT_SINGLE and GLUT_RGB. They indicate that a single refresh buffer is to
be used for the display window and that the RGB color mode is to be used for selecting color values.
GLUT_SINGLE is for the first task, the single refresh buffer, and GLUT_RGB indicates that the RGB
color mode is to be used.
(Refer Slide Time: 17:23)
Now, here we should look at the syntax, that is, how this glutInitDisplayMode function is used. In the
constant names which provide the options, we have GLUT as a prefix in all capitals, followed by
an underscore symbol and then the constant name, again in all capitals, as shown here. This is the
particular syntax used to provide the arguments. To combine multiple options we use the
logical OR operation, to indicate that we want both; that is the syntax used to provide the options.
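For illustration, a minimal sketch of these two calls as they might appear at the start of main is given below; argc and argv are simply the arguments of main passed through to glutInit.

glutInit(&argc, argv);                        /* initialize GLUT */
glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB);  /* single refresh buffer, RGB color mode */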
(Refer Slide Time: 18:32)
Then we use the two functions glutInitWindowPosition and glutInitWindowSize. These are used to
provide values different from the default window position and size built into the library. So, if
we want to change the values, we use glutInitWindowPosition, where we specify the position, and
glutInitWindowSize, where we specify the size.
(Refer Slide Time: 19:16)
Now, which position does this window position specify? It specifies the top-left corner of the
window, assuming an integer screen coordinate system with its origin at the top-left corner of the
screen. These are the assumptions when we specify these values.
(Refer Slide Time: 19:45)
Then, in the case of glutInitWindowSize, where we specify the size, the first argument specifies
the width, that is 800, and the second argument specifies the height, that is 600. Both values are in
pixels, so 800 pixels by 600 pixels. So, we have understood these four functions: glutInit,
glutInitDisplayMode, glutInitWindowPosition and glutInitWindowSize.
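A sketch of these two calls is shown below; the 800-by-600 size matches the values mentioned above, while the position values (50, 50) are only illustrative assumptions.

glutInitWindowPosition(50, 50);   /* top-left corner of the display window on the screen */
glutInitWindowSize(800, 600);     /* width = 800 pixels, height = 600 pixels */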
(Refer Slide Time: 20:26)
Next, we create the window and set an optional caption using the function glutCreateWindow;
the caption is provided within the parentheses.
(Refer Slide Time: 20:52)
The next thing we do is specify the picture that is to be displayed in the window, that is, the
line. We have to create the line before we can display it in the window; this creation is
done by a separate, user-defined function which we are calling createLine.
(Refer Slide Time: 21:23)
Now, this createLine function is passed as an argument to another GLUT library function,
glutDisplayFunc, which is shown here. This indicates that the line is to be displayed in the
window. So, with this function we indicate that we are creating a line, which is our image,
using the createLine function, and that this line is to be displayed in the window created
through the earlier statements. But before we do that, certain initializations are required.
(Refer Slide Time: 22:05)
These initializations are performed in the init function shown here. The init function is
used mainly to keep our code clean; we could have organized it differently. We
will come back to this function later.
(Refer Slide Time: 22:37)
So, in order to keep the code clean and to indicate that we want to display a line in the window,
we add these two lines, init and glutDisplayFunc, as shown here.
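These steps might look as follows in the code; the caption string is an illustrative assumption, and createLine is the user-defined drawing routine discussed above.

glutCreateWindow("OpenGL line example");  /* create the display window with an optional caption */
init();                                   /* one-time OpenGL parameter settings (defined later) */
glutDisplayFunc(createLine);              /* register createLine as the routine that draws the window content */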
(Refer Slide Time: 22:53)
Now, all that is done, but the window is still not on the screen; we need to activate it once the
window content is decided. That we do with the function glutMainLoop, which activates all the
display windows created, along with their graphic contents. So, this function glutMainLoop
actually puts the window with its content on the screen.
(Refer Slide Time: 23:29)
This function must be the last one in our program. It puts the program into an infinite loop
because we want the display to persist. In this loop the program waits for inputs from input
devices such as the mouse or keyboard; even if there is no input, the loop ensures that the
picture stays displayed till the window is closed. So, since we want the picture to remain on the
screen until there is some input or the window is closed, we use the loop, and this must be the
last statement of the code in main, after we create the image and put it in the window.
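Putting all of this together, a minimal sketch of the complete main function consistent with the description above could be the following; the caption and the window position values are illustrative assumptions, not taken from the slide.

void init(void);        /* defined later: one-time OpenGL parameter settings */
void createLine(void);  /* defined later: creates and draws the line */

int main(int argc, char **argv)
{
    glutInit(&argc, argv);                        /* initialize GLUT */
    glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB);  /* single refresh buffer, RGB color mode */
    glutInitWindowPosition(50, 50);               /* illustrative window position */
    glutInitWindowSize(800, 600);                 /* 800 x 600 pixel window */
    glutCreateWindow("OpenGL line example");      /* create the window with an optional caption */
    init();                                       /* one-time parameter settings */
    glutDisplayFunc(createLine);                  /* register the drawing routine */
    glutMainLoop();                               /* must be last: enter the infinite display loop */
    return 0;                                     /* never reached in practice */
}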
(Refer Slide Time: 24:23)
Now, as we have noted, we have explained all the functions that are there in main. All of them
start with glut, indicating that they are GLUT library functions, except the two functions init and
createLine. In these two functions we use OpenGL library functions rather than GLUT library
functions, and accordingly their syntax is different.
(Refer Slide Time: 24:57)
Each OpenGL function is prefixed with gl, as we can see in the init function as well as in the
createLine function. Each function starting with the prefix gl indicates that it is an OpenGL
function. Each component word within the function name has its first letter capitalized; for
example, the C in Color is capitalized, the M in Matrix is capitalized, and
so on. So, that is the syntax of an OpenGL library function: it starts with gl, and each component
word within the function name has its first letter capitalized.
(Refer Slide Time: 25:57)
Sometimes a function may require one or more arguments that are assigned symbolic
constants, for example a parameter name, a parameter value or a particular mode. These
constants are also part of the OpenGL library syntax.
(Refer Slide Time: 26:21)
Now, all these constants begin with GL in capital letters. Each component of the name is
written in capital letters and separated by an underscore symbol, as we have seen in the case of the
GLUT constants as well, for example GL_RGB or GL_AMBIENT_AND_DIFFUSE,
where everything is in capitals separated by underscores.
(Refer Slide Time: 26:57)
Also, the OpenGL functions expect specific data types, for example a 32-bit integer as a
parameter value, and the library provides built-in data type names for this purpose.
(Refer Slide Time: 27:19)
Each of these names begins with GL followed by the data type name, for example GLbyte or
GLdouble, where the data type part is in lowercase.
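For illustration, a few of these built-in type names in use; the variable names are arbitrary.

GLint    count  = 10;    /* 32-bit signed integer */
GLfloat  shade  = 0.5f;  /* single-precision floating point */
GLdouble length = 2.0;   /* double-precision floating point */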
(Refer Slide Time: 27:42)
So, those are the syntax rules for using OpenGL library functions. Now, let us try to
understand the two functions that we have defined using OpenGL library functions: init and
createLine. Let us start with init. This is essentially meant to initialize and perform
one-time parameter settings. In our function we have used three OpenGL library routines or
library functions. What do they do?
(Refer Slide Time: 28:25)
Now, the first one is glClearColor, which takes four arguments. It is
used to set a background color for our display window, and this color is specified with RGB
components.
(Refer Slide Time: 28:48)
Now, these RGB components are specified in the first three arguments, in that order: R, G, then B.
With this particular set of values, as we all know, we get white as
the background color; we can set any background color. For example, if we set all three to 0, we will get
black.
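For instance, a hypothetical call for a black background would simply pass zeros for the three color components:

glClearColor(0.0, 0.0, 0.0, 0.0);  /* black background (R = G = B = 0) */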
(Refer Slide Time: 29:21)
Now, there is also a fourth parameter, which we have set to 0.0. This is called the alpha value
of the specified color and is used as a blending parameter. In other words, it specifies the
transparency of the color: the value 0.0 means the color is totally transparent and
1.0 means totally opaque. So, it indicates transparency.
(Refer Slide Time: 30:08)
Now, here we are displaying a line, which is a 2D object. However, OpenGL does not treat 2D
objects separately; it treats 2D pictures as a special case of 3D viewing. So, essentially
all the 3D pipeline stages are performed.
(Refer Slide Time: 30:34)
So, we need to specify the projection type and other viewing parameters. That is done with these
two functions: glMatrixMode with the argument GL_PROJECTION, and gluOrtho2D with some
arguments.
(Refer Slide Time: 30:58)
Now, this function gluOrtho2D is prefixed with glu rather than gl. This indicates that the
function belongs to GLU, the OpenGL Utility library, another auxiliary library. Earlier
we saw GLUT, the OpenGL Utility Toolkit; now we are seeing the OpenGL Utility library, another
auxiliary library.
This library provides routines for complex tasks such as setting up viewing and projection
matrices, describing complex objects with line and polygon approximations, processing
surface-rendering operations and displaying splines with linear approximations. These are some
examples of the complex pipeline tasks implemented in this
OpenGL Utility auxiliary library.
(Refer Slide Time: 32:00)
Now, together these two functions, glMatrixMode and gluOrtho2D, specify an orthographic
projection to be used to map the line from the view plane to the screen. The view-plane window
is specified in terms of the lower-left and top-right corners of the window; these arguments specify
those corners. During this projection, anything outside the window is clipped out, as we
discussed during our pipeline discussion.
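A minimal sketch of the init routine described above is given below; the white background and the projection setup follow the discussion, while the view-window extents of 0 to 800 and 0 to 600 are an assumption chosen to match the display window size used earlier.

void init(void)
{
    glClearColor(1.0, 1.0, 1.0, 0.0);    /* background color: white, alpha = 0.0 */
    glMatrixMode(GL_PROJECTION);         /* subsequent calls set up the projection matrix */
    gluOrtho2D(0.0, 800.0, 0.0, 600.0);  /* 2D orthographic view window (left, right, bottom, top) */
}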
(Refer Slide Time: 32:58)
Now, let us move to our second function, createLine. This function actually creates the
line which we want to display. Its first line is glClear with an argument. This function
is used to display the window with the specified background color. The argument, as you can
see, is an OpenGL symbolic constant indicating the bit values in the color or refresh buffer that are to be
set to the background color values specified in the glClearColor function. So, essentially
this function applies the background color to the display window.
(Refer Slide Time: 34:10)
Now, OpenGL also allows us to set the object color, with the function glColor3f. There
are three arguments, which again specify the RGB components. So, these two functions
are used to set the color values of the background as well as of the object.
(Refer Slide Time: 34:37)
Now, in the second function, the suffix 3f indicates that the three components are specified using
floating-point values; in other words, the values can range from 0.0 to 1.0. In this
particular case, the three values denote the color green.
(Refer Slide Time: 35:09)
Next, we have a piece of code between the two functions glBegin and glEnd. This indicates the
line segment to be drawn between the endpoints provided as arguments. So, this block
essentially creates the line between the two endpoints specified in the two calls to the
function glVertex2i.
(Refer Slide Time: 35:50)
Now, the suffix 2i in the function name, as you can guess, indicates that each vertex is specified by two
integer values denoting the X and Y coordinates; this is quite straightforward.
(Refer Slide Time: 36:10)
Now, the first and second endpoints are determined by their ordering in the code. The vertex that
appears first will always be treated as the first point, and the other as the second point. So,
the way the code is written determines the first and second points.
(Refer Slide Time: 36:35)
And the function glBegin with the constant GL_LINES, together with the function glEnd, indicates
that the vertices are line endpoints.
(Refer Slide Time: 36:56)
Now, with all these functions, our basic line-creation program is ready. One point to note
here is that these functions may be stored at different locations in memory, depending on
how OpenGL is implemented.
(Refer Slide Time: 37:19)
And we need to force the system to process all these functions. This we do with another
function, glFlush, as shown here. This should be the last line of our picture-generation
procedure; it indicates that all the functions we have used must be processed one after
another.
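A minimal sketch of the createLine routine described above is given below; the green drawing color follows the discussion, while the two endpoint coordinates are illustrative assumptions.

void createLine(void)
{
    glClear(GL_COLOR_BUFFER_BIT);  /* fill the window with the background color set in init */
    glColor3f(0.0, 1.0, 0.0);      /* object color: green, given as floating-point R, G, B */
    glBegin(GL_LINES);             /* the vertices that follow are line endpoints */
        glVertex2i(100, 100);      /* first endpoint (x, y), integer coordinates */
        glVertex2i(500, 400);      /* second endpoint (x, y) */
    glEnd();
    glFlush();                     /* force processing of all the OpenGL calls above */
}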
(Refer Slide Time: 37:47)
So, that is how we can create a program using OpenGL. In our example we have used the
OpenGL library in the setting of the C language, and we have also seen that the OpenGL library alone is
not sufficient; we need some auxiliary libraries. Here we have used the GLUT and GLU
auxiliary libraries: GLUT, the OpenGL Utility Toolkit, allows us to create the window,
which is a display-dependent operation, and GLU allows us to perform other complex tasks
that are not in the core OpenGL library.
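As a usage note, on a typical Linux system with the freeglut and GLU development packages installed, such a program could be compiled by linking against the GL, GLU and glut libraries, for example with gcc line.c -o line -lglut -lGLU -lGL, and then run as ./line; the file name line.c is an assumption for illustration only.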
(Refer Slide Time: 38:44)
So, with this we have come to the end of the topic. We have learned various things: we started
with a generic architecture of a graphics system, then learned about the graphics hardware,
including the input/output devices and the GPU, and today we learned about graphics software,
how such software is created, the different standards, and an example program using OpenGL,
which can be used to write any graphics program. With this lecture we have almost come to the
end of the course; in the next lecture we will summarize what we have learnt so far.
(Refer Slide Time: 39:44)
Whatever I discussed today can be found in this book; you can go through Chapter 10, Section
10.4, to learn about graphics software, including the OpenGL example. In the last lecture we will
summarize our learning so far, so we will see you in the concluding lecture. Till then, thank you
and goodbye.
Computer Graphics
Professor Samit Bhattacharya
Computer Science and Engineering
Indian Institute of Technology, Guwahati
Lecture 32
Concluding Remarks
Hello and welcome to the last lecture in the course Computer Graphics. So we have reached the
end of the course. Let us reflect on what we have learnt so far and how to use the knowledge.
(Refer Slide Time: 00:45)
(Refer Slide Time: 00:48)
So, we started with some objectives. What are the objectives?
(Refer Slide Time: 00:54)
Our broad objective was to learn about the process of computer graphics, in particular the
rendering of static images on a screen. This has broadly two components: one is the idea of the
pipeline and the other is the implementation of the pipeline. So essentially what we tried to learn
is how an image is displayed on the screen, starting from object definition to final image
synthesis and display.
And it has two components. The first is how to create or synthesize the image; as we
have discussed, that is done through the 3D pipeline. The second is how the image is actually
physically rendered on the screen; that is done with hardware and software support together,
which is the implementation of the pipeline.
(Refer Slide Time: 02:14)
Now that was the broad objective.
In order to achieve that broad objective we divided our learning into smaller specific
objectives. There are broadly three specific objectives: learning about object representation,
which is the very first stage of image synthesis; then the pipeline stages, which convert object
definitions into a representation on the pixel grid; and finally the implementation of that
representation on the physical screen, where we have the objective of learning about the basic
hardware as well as the software.
(Refer Slide Time: 03:00)
Now let us see how we covered this broad idea and what are the things that we have
learned.
(Refer Slide Time: 03:09)
To start with, we learnt about a very generic graphics architecture to understand the image
synthesis process. This graphics system architecture consists of 3 major components: the
display controller, as shown here, then the video memory, and finally the video
controller. These 3 components are used to synthesize an image. The display controller is
essentially the graphics card with the GPU, or graphics processing unit.
This is the component responsible for implementing the pipeline stages in hardware.
Recollect that the idea is to exploit the inherent parallelism in graphics processing operations,
and that is achieved with the use of the GPU. The video memory is also on the graphics card; it is
the separate memory component of the card, although it may also be part of main memory, which is
typically not the case.
And finally, the video controller is used to convert whatever is in the memory, digital data,
into analog signals for controlling the electromechanical arrangements that are ultimately responsible for
exciting the pixels on the screen. Along with that, there may be input devices attached to the
graphics system which allow the user to change the synthesized image. That is the broad idea of
the graphics system and the components involved in processing the graphics operations.
(Refer Slide Time: 05:34)
Then we learned about the pipeline, that is, the conceptual stages involved in converting a
scene described in the form of component objects into the final synthesized image. We learnt the
stages in a particular sequence, starting with object representation, the first stage, where the
objects are defined in their local coordinate systems. Then we have the second stage, modeling
transformation. Here a transformation takes place which constructs a scene
in the world coordinate system by combining the objects that are defined in their local
coordinate systems.
So essentially, here a transformation from local to world coordinates takes place. In the third stage we
assign colors to the objects, where we assume that the objects are defined in world coordinates.
In the fourth stage a series of transformations takes place; this fourth stage is called the viewing
pipeline, and it is itself a pipeline of sub-stages. There are 5 sub-stages. The first sub-stage is the viewing
transformation, where the world coordinate scene is transformed to a view
coordinate system, so essentially a world-to-view coordinate transformation takes place here.
Then, in this view coordinate system, we perform clipping: we define a view volume, and
whatever objects are outside that volume are clipped out. This takes place in the view
coordinate system. Then whatever is inside the view volume is further processed to
remove hidden surfaces with respect to a particular viewer position; conceptually, this also takes place in the
view coordinate system.
After that we project the view coordinate scene to a 2D view coordinate system, that is, from the 3D
view coordinate system to the 2D view coordinate system; this transformation is the projection
transformation. Finally, from the 2D view coordinate system we transform the image description
to a device coordinate system; that is the final sub-stage of the fourth stage. After this viewing
pipeline stage is over, in the fifth stage we convert the resulting image description in the device coordinate
system, that is, from continuous device coordinates we map it to the discrete pixel grid or the
screen coordinate system.
So, these are the 5 stages of the pipeline that we have learnt. I would like to emphasize here again
that these stages need not occur in the exact sequence in which we learnt them. In an implementation,
this sequence may be different, so the exact sequence need not be followed when implementing the
pipeline. I have used this sequence just to explain the concepts rather than to explain how they are
actually implemented.
(Refer Slide Time: 09:21)
So, to achieve this broader learning objective, we covered several topics; let us go through an
overview of those topics.
(Refer Slide Time: 09:27)
So, there were a total of 31 lectures covering this broad idea.
(Refer Slide Time: 09:37)
These lectures were divided into groups. The first 3 lectures were devoted to an introduction to
the field.
(Refer Slide Time: 09:50)
Then an introduction to the 3D graphics pipeline was covered in lecture 4.
(Refer Slide Time: 10:00)
Lectures 5 to 9 were devoted to discussions of various object representation techniques.
(Refer Slide Time: 10:13)
Then the other pipeline stages were covered in lectures 10 to 28. Lectures 10 to 12 covered
geometric modeling, the second stage. Lectures 13 to 17 covered lighting. Lectures 18 to 24
covered the viewing pipeline, and lectures 25 to 28 covered the final stage, that is, rendering or scan
conversion.
(Refer Slide Time: 10:54)
The pipeline implementation, that is, how the pipeline is implemented in hardware as well as in
software, was covered in the remaining lectures. Lectures 29 and 30 were devoted to the
explanation of graphics hardware, and the final lecture, lecture 31, was used to discuss
graphics software.
(Refer Slide Time: 11:20)
So, I hope that you have enjoyed the course and the lectures and have learned the concepts. With
this learning, I hope you will be able to understand how graphics systems work and how
your program can create an image on the screen of your computer. Maybe with this
knowledge you can even think of developing a library of your own, a general-purpose graphics
library which others can use to create their own programs. You can also think of developing
special graphics applications using such library functions, like the ones we discussed earlier:
painting packages, CAD packages and so on.
(Refer Slide Time: 12:24)
So, I hope that you have learned all these concepts and that the lectures were interesting and
understandable. Of course, in the lectures I could not cover everything, so for more details you
may always refer to the main learning material as well as the reference materials that I
have mentioned throughout the lectures. That is all. Wish you all the best, thank you.
(Refer Slide Time: 12:54)