§ Color Systems and Conversion Between Color Models
․Commonly used color models: (1) RGB, (2) YIQ, (3) HSV, (4) YUV, (5) YCbCr. Image compression and color image
processing both separate luminance from chrominance, and the chrominance part is handled in the way humans distinguish colors.
․Image I/O devices such as displays, scanners, and cameras work in the RGB space; CMY is used for printing, YUV and
YIQ for data compression, and HSI for image processing.
․RGB is a rectangular coordinate system; the three primaries defined by the CIE are red (700 nm),
green (546.1 nm), and blue (435.8 nm). RGB images are usually not processed directly:
detecting object edges in RGB, for example, yields separate edges for the individual R, G, and B
components rather than the true object edges, which is why other color systems exist.
[Figure: the RGB color cube, with black, the primaries, cyan/magenta/yellow, and white at its corners.]
․Complementary colors: each complementary pair of colored lights mixes additively to white light, as shown in the color-wheel figure:
red--cyan, green--magenta, blue--yellow. Each ink absorbs its complementary light; the conversion between CMY and RGB is
  \begin{bmatrix} C \\ M \\ Y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} R \\ G \\ B \end{bmatrix}
Color printing displays colors through the light-absorbing property of pigments or dyes and is
therefore called a subtractive system (i.e., the CMY system): the C, M, and Y inks are deposited on white paper in different
proportions, and under white light different proportions of R, G, and B light are reflected, producing the various colors.
Cb: blue minus “black and white”; Cr: red minus “black and white”.
CIE: Commission Internationale de L’Eclairage
․Both YUV and YIQ were designed so that a color television signal could be transmitted over the existing black-and-white
broadcasting system (bandwidth 4.5 MHz), i.e., the carrier spacing between stations is 4.5 MHz. A black-and-white image
needs 4 MHz, so transmitting R, G, and B directly would require 12 MHz. The RGB signal is therefore converted into a
luminance signal plus chrominance signals; exploiting the eye's insensitivity to chrominance variations, the chrominance is
modulated together with the luminance at a lower bandwidth (about 500 kHz).
․YUV has become the standard for video compression; its relation to RGB is
  Y = 0.299R + 0.587G + 0.114B
  U = 0.493(B - Y) = 0.493(-0.299R - 0.587G + 0.886B)
  V = 0.877(R - Y) = 0.877(0.701R - 0.587G - 0.114B)
or, in matrix form,
  \begin{bmatrix} Y \\ U \\ V \end{bmatrix} =
  \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix}
  \begin{bmatrix} R \\ G \\ B \end{bmatrix}
The ranges of U and V can both be normalized to (-0.5, 0.5), i.e.
  \begin{bmatrix} Y \\ U' \\ V' \end{bmatrix} =
  \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{bmatrix}
  \begin{bmatrix} R \\ G \\ B \end{bmatrix}
Converting YUV back to RGB:
  \begin{bmatrix} R \\ G \\ B \end{bmatrix} =
  \begin{bmatrix} 1 & 0 & 1.140 \\ 1 & -0.395 & -0.581 \\ 1 & 2.032 & 0 \end{bmatrix}
  \begin{bmatrix} Y \\ U \\ V \end{bmatrix}
  \quad\text{or}\quad
  \begin{bmatrix} R \\ G \\ B \end{bmatrix} =
  \begin{bmatrix} 1 & 0 & 1.402 \\ 1 & -0.344 & -0.714 \\ 1 & 1.772 & 0 \end{bmatrix}
  \begin{bmatrix} Y \\ U' \\ V' \end{bmatrix}
․The plane spanned by U and V is called the color-difference (chrominance) plane; from the equations above one can show
that U and V are orthogonal to each other. The weights in Y reflect the eye's different brightness sensitivity to equal R, G,
and B values, while U and V represent blue and red with the luminance removed. When R = G = B, both U and V are 0,
meaning there is no color difference, i.e., white light. The constants 0.493 and 0.877 in the U and V equations are moderate
attenuations applied to avoid over-modulation when these two color-difference signals are combined with Y into the video signal.
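As a rough illustration of these conversions, the following Python sketch uses the two matrices listed above to convert an RGB triple to YUV and back; the function names are my own and the values are only as accurate as the rounded coefficients.

```python
import numpy as np

# RGB -> YUV matrix from the notes above.
RGB_TO_YUV = np.array([[ 0.299,  0.587,  0.114],
                       [-0.147, -0.289,  0.436],
                       [ 0.615, -0.515, -0.100]])

# YUV -> RGB matrix from the notes above.
YUV_TO_RGB = np.array([[1.0,  0.000,  1.140],
                       [1.0, -0.395, -0.581],
                       [1.0,  2.032,  0.000]])

def rgb_to_yuv(rgb):
    """Convert one RGB vector (or an Nx3 array) to YUV."""
    return np.asarray(rgb, dtype=float) @ RGB_TO_YUV.T

def yuv_to_rgb(yuv):
    """Convert one YUV vector (or an Nx3 array) back to RGB."""
    return np.asarray(yuv, dtype=float) @ YUV_TO_RGB.T

if __name__ == "__main__":
    rgb = np.array([100.0, 150.0, 200.0])
    yuv = rgb_to_yuv(rgb)
    print("YUV:", yuv)                    # Y is dominated by the green channel
    print("RGB back:", yuv_to_rgb(yuv))   # close to the original, up to coefficient rounding
```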
․The phase of the (I, Q) vector differs from that of the (U, V) vector by 33°, i.e.
  I = V cos33° - U sin33°,  Q = V sin33° + U cos33°
  i.e.  I = 0.839V - 0.545U,  Q = 0.545V + 0.839U
so that
  \begin{bmatrix} Y \\ I \\ Q \end{bmatrix} =
  \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.275 & -0.321 \\ 0.212 & -0.523 & 0.311 \end{bmatrix}
  \begin{bmatrix} R \\ G \\ B \end{bmatrix}
For example, Y = 0.299R + 0.587G + 0.114B shows that the luminance Y is influenced by green far more than by R and B;
hence, when converting to a high-quality grayscale image, the Y obtained after converting to YIQ is very suitable. I and Q
represent the two color-difference vectors.
Normalizing the ranges of I and Q to (-0.5, 0.5) gives
  \begin{bmatrix} Y \\ I' \\ Q' \end{bmatrix} =
  \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.500 & -0.231 & -0.269 \\ 0.203 & -0.500 & 0.297 \end{bmatrix}
  \begin{bmatrix} R \\ G \\ B \end{bmatrix}
To convert YIQ back to RGB, use
  \begin{bmatrix} R \\ G \\ B \end{bmatrix} =
  \begin{bmatrix} 1 & 0.956 & 0.620 \\ 1 & -0.272 & -0.647 \\ 1 & -1.108 & 1.705 \end{bmatrix}
  \begin{bmatrix} Y \\ I \\ Q \end{bmatrix}
  \quad\text{or}\quad
  \begin{bmatrix} R \\ G \\ B \end{bmatrix} =
  \begin{bmatrix} 1 & 1.139 & 0.648 \\ 1 & -0.324 & -0.677 \\ 1 & -1.321 & 1.785 \end{bmatrix}
  \begin{bmatrix} Y \\ I' \\ Q' \end{bmatrix}

Ex1: Given a 2x2 RGB image
  I = [ (10, 20, 40)    (40, 30, 20)
        (100, 150, 200) (50, 250, 120) ]
where (10, 20, 40) lists the R, G, B values in order, find Y(1,1), I(1,2), and Q(2,2).
A: Y(1,1) = 19.29, I(1,2) = 9.17, Q(2,2) = -82.83.
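A small NumPy check of Ex1, using the YIQ matrix above (the script and its names are only an illustration, not part of the original exercise):

```python
import numpy as np

RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.275, -0.321],
                       [0.212, -0.523,  0.311]])

# The 2x2 RGB image of Ex1, indexed as image[row, col] = (R, G, B).
image = np.array([[(10, 20, 40),    (40, 30, 20)],
                  [(100, 150, 200), (50, 250, 120)]], dtype=float)

yiq = image @ RGB_TO_YIQ.T            # per-pixel conversion; last axis becomes (Y, I, Q)

print("Y(1,1) =", round(yiq[0, 0, 0], 2))   # 19.29
print("I(1,2) =", round(yiq[0, 1, 1], 2))   # 9.17
print("Q(2,2) =", round(yiq[1, 1, 2], 2))   # -82.83
```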
HSI System
․The main drawback of RGB is the high correlation between its components; to distinguish different colors one therefore
uses hue, saturation, and intensity, which is what the HSI system means.
․The components of HSI are mutually uncorrelated. Hue: the color sensation produced in the human eye by light of
different wavelengths, e.g., red, orange, yellow, green. Saturation: the purity of a color, i.e., how much white is mixed in;
high saturation means little white is mixed in, so pink has lower saturation than red.
Chromaticity refers to hue plus saturation.
․The CIE defines the chromaticity diagram: all spectral colors are distributed along its boundary, and every color that can
be mixed from them lies inside the diagram; its relation to RGB is shown in the figure.
Hue is represented by an angle, with red at 0°; as shown in the figure, rotating the ray WR about the white point W by an
angle θ represents a different color. Saturation is expressed as a percentage: a pure spectral color has saturation 100%
and white has 0%, so every color on the boundary has saturation 100%. Points X and Y in the figure have the same hue,
but Y has saturation 100%, while the saturation of X is (WX / WY) x 100%.
[Figure: chromaticity diagram with the spectral boundary running from 380 nm to 780 nm, the white point W, and points X and Y.]
․A line connecting any two points of the chromaticity diagram represents all the colors that can be mixed from those two
points; the colors in the diagram are defined at one and the same luminance. Colors of different luminance form different
color triangles; in the figure below, every point in the triangle represented by A has the same luminance, which is smaller
than the luminance of every point in the triangle represented by C.
[Figure: nested color triangles A and C inside the R-G-B triangle.]
․After a vector derivation, the following relations are obtained:
  I = \frac{1}{3}(R + G + B)
  S = 1 - \frac{3\,\min(R, G, B)}{R + G + B}
  H_1 = \cos^{-1}\!\left( \frac{0.5\,[(R - G) + (R - B)]}{\sqrt{(R - G)^2 + (R - B)(G - B)}} \right)
  H = H_1 \text{ if } B \le G;\quad H = 360^{\circ} - H_1 \text{ if } B > G
Alternatively (the HSV form),
  S = \frac{\max(R, G, B) - \min(R, G, B)}{\max(R, G, B)},\qquad V = \frac{\max(R, G, B)}{255}
Here S denotes saturation, with values between 0 and 1; the saturation of human skin color lies between 0.23 and 0.63.
V denotes brightness, also with values between 0 and 1; the tonal range of human faces lies between 0 and 0.9.
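The relations above can be coded directly. Below is a minimal Python sketch for a single pixel, following the H1/H formula and the max/min forms of S and V given in these notes; the function names are my own.

```python
import numpy as np

def rgb_to_hsi(r, g, b):
    """Return (H in degrees, S, I) for one RGB pixel with components in [0, 255]."""
    i = (r + g + b) / 3.0
    s = 1.0 - 3.0 * min(r, g, b) / (r + g + b) if (r + g + b) > 0 else 0.0
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    h1 = np.degrees(np.arccos(num / den)) if den > 0 else 0.0
    h = h1 if b <= g else 360.0 - h1
    return h, s, i

def rgb_to_sv(r, g, b):
    """The alternative (HSV-style) saturation and value used above."""
    mx, mn = max(r, g, b), min(r, g, b)
    s = (mx - mn) / mx if mx > 0 else 0.0
    v = mx / 255.0
    return s, v

if __name__ == "__main__":
    print(rgb_to_hsi(201, 187, 180))   # a desaturated color: small S
    print(rgb_to_sv(79, 10, 47))       # S close to 0.873, as in Ex2 below
```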
․The HSV system is also called the HSB system (where B = Brightness) and the HSI system (where I = gray-level value).
H = 0° represents red, H = 120° green, and H = 240° blue; S = 0 corresponds to a grayscale image; an image with H = 0°
and S = 1 is red; V = 0 means black and V = 1 means bright white.
․Face detection is easily affected by changes in light intensity, so H (hue) in HSV is often used as the basis for detecting
faces, mainly because H is less affected by brightness.
․To convert HSI back to RGB there are three cases.
(1) 0° ≤ H < 120° (B is the smallest component):
  B = (1 - S)\,I
  R = \left[ 1 + \frac{S\cos H}{\cos(60^{\circ} - H)} \right] I
  G = 3I - (R + B)
(2) 120° < H ≤ 240°:
  R = (1 - S)\,I
  G = \left[ 1 + \frac{S\cos(H - 120^{\circ})}{\cos(180^{\circ} - H)} \right] I
  B = 3I - (R + G)
(3) 240° < H ≤ 360°:
  G = (1 - S)\,I
  B = \left[ 1 + \frac{S\cos(H - 240^{\circ})}{\cos(300^{\circ} - H)} \right] I
  R = 3I - (B + G)
․YUV and YIQ are used for data compression. YIQ is used in the American NTSC (National Television
Standards Committee) color television system, and YUV in the British PAL (Phase Alternation
Line) color television system.
․The I, Q of the fourth model (YIQ) and the U, V of the second model (YUV) are related by
  I = -U sin33° + V cos33°,  Q = U cos33° + V sin33°
so RGB can be converted to YUV by way of YIQ.
In the JPEG system, RGB is usually converted to YCbCr:  Cb = (B - Y)/2 + 0.5,  Cr = (R - Y)/2 + 0.5.
Ex2: Two pixels, (R,G,B) = (a) (201,187,180) and (b) (79,10,47), belong to two different images. Which one appears more vivid?
A: S_a = [max(201,187,180) - min(201,187,180)] / max(201,187,180) = 21/201 = 0.104,
   S_b = (79 - 10)/79 = 0.873;  S_b is larger, so pixel (b) is the more vivid color.
․A CIE color model separates a color into a chromaticity part and a luminance part; it outperforms RGB in the clustering
of color distributions and in evaluating color differences. Common CIE systems include CIE XYZ, CIE xyY, CIE La*b*,
and CIE Lu'v'.
․How is RGB converted to CIE Lu'v'? First convert to CIE XYZ using
  \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
  \begin{bmatrix} 0.49 & 0.31 & 0.20 \\ 0.17697 & 0.8124 & 0.01063 \\ 0.00 & 0.01 & 0.99 \end{bmatrix}
  \begin{bmatrix} R \\ G \\ B \end{bmatrix}
where the resulting Y can be regarded as the luminance L. Then obtain u' and v' from
  u' = \frac{4X}{X + 15Y + 3Z},\qquad v' = \frac{9X}{X + 15Y + 3Z}
Ex3: Convert a 3x3 RGB image (shown at right) to CIE Lu'v'.
A: For the pixel values (R, G, B) = (117, 62, 70), (132, 70, 97), (159, 100, 115), the corresponding results are
  L       u'      v'
  71.82   0.2629  0.5916
  81.26   0.2620  0.5895
  110.6   0.2471  0.5559
․The CIE u'v' chromaticity diagram is shown at right. The green arc is called the spectral locus (Spectral Locus); the region
it encloses is the visible-light region, with wavelength running from the upper right to the lower left. The triangle inside it is
the range of colors a typical CRT monitor can display, called the color gamut triangle (Color Gamut Triangle), with
R'(u',v') = (0.4507, 0.5229), G'(u',v') = (0.125, 0.5625), and B'(u',v') = (0.1754, 0.1579);
W(u',v') = (0.17798, 0.4683) is mapped to the white point.
[Figure: the u'v' diagram with both axes running from 0.0 to 0.7.]
․Because all images must ultimately be displayed in RGB, after processing in CIE Lu'v' the result still has to be converted
back to RGB. First convert to CIE xyY:
  x = \frac{9u'}{6u' - 16v' + 12},\qquad y = \frac{4v'}{6u' - 16v' + 12},\qquad Y = L
Then recover CIE XYZ from
  X + Y + Z = \frac{Y}{y},\qquad X = x\,(X + Y + Z),\qquad Z = z\,(X + Y + Z),\ \text{where } z = 1 - x - y
․Color image processing methods are basically the same as those for black-and-white images; different kinds of processing
simply use different color systems. Topics such as edge detection, segmentation, and color contrast enhancement are
therefore left for last; grayscale images are treated first.
Review Exercises
1. For colors A = (100,100,50) and B = (50,50,0): their intensities compare as _____ (=, >, <), their hue
values compare as ____ (=, ≠), and their saturations compare as ____ (=, >, <). For color C = (20,20,20), its saturation
value is _____.
2. Color 1 has r = 0.5, g = 0.5, b = 0.0 and color 2 has r = 0.4, g = 0.4, b = 0.2. Are their H, S, I values the
same? Compare their H, S, I values.
3. In an additive color system, what colors can be generated by A(r1,g1,b1) and
B(r2,g2,b2)? ___________________________
4. A color image can be represented in different color system. __________in
computer display, __________in JPEG, and __________ in the printing
industry.
5. Run the attached program, rgbhsv.m, with disney.bmp/ autumn.png for true-color
images and trees.tif/ emuu.fig/ pout2.png for binary images to see what
happens.
Graphics File Formats
․Bitmap is a two-dimensional array of values, and each element of the array
corresponds to a single dot in a picture, i.e. a pixel. In most cases a pixel’s
value is an index into a table of colors, indicating that the pixel is to be
displayed using the color pointed to by the index. The colors in the table are
collectively referred to as a palette.
․Graphics file formats are termed bitmapped and vector formats. They are fundamentally different: the
former stores a complete, digitally encoded image, while the latter represents a picture as a series of lines, arcs, circles, and text, something like
“move to (100,100), select color blue, circle with radius 50”, etc. The main
disadvantage of vector formats is that they can only reasonably describe a line drawing, not a
photographic image. Metafiles contain a list of image-creation commands
along with vectors and circles, and are really programs. Drawing the image a metafile
describes is impractical without access to the graphics package it depends on.
․If the palette sizes used are 16 or 256 colors, then the corresponding index sizes
are 4 and 8 bits, referred to as the number of bits per pixel. In a bitmap of 4
bits per pixel, each byte holds 2 separate pixel values. In a bitmap of 8 bits
per pixel, each byte represents a single pixel.
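A tiny sketch of how palette indices pack into bytes at 4 and 8 bits per pixel (illustrative only; the high-nibble-first ordering is an assumption, since the actual order is format dependent):

```python
def unpack_4bpp(row_bytes):
    """Each byte of a 4-bit-per-pixel scan line holds two palette indices."""
    pixels = []
    for b in row_bytes:
        pixels.append(b >> 4)      # high nibble: first pixel (assumed ordering)
        pixels.append(b & 0x0F)    # low nibble: second pixel
    return pixels

def unpack_8bpp(row_bytes):
    """At 8 bits per pixel, each byte is itself one palette index."""
    return list(row_bytes)

print(unpack_4bpp(bytes([0x12, 0xAB])))   # [1, 2, 10, 11]
print(unpack_8bpp(bytes([0x12, 0xAB])))   # [18, 171]
```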
․Bitmaps that represent very large numbers of colors simultaneously generally
do not employ the palette scheme, but a pixel’s value directly defines a color.
․Pixel Ordering: The simplest is to store the pixels a row at a time. Each row is
referred to as a scan line, and most often storing from left to right with rows
from top to bottom.
․Image Geometry: Every computer image has an internal geometry used to
position elements in the picture. The 2 most common are screen coordinates
and graph coordinates. The former is commonly used for display and shown
as the following left figure (the 2 scales may be different, IBM VGA for
example.)
[Figure: screen coordinates (left), with (0,0) at the top-left corner and rows increasing downward, and graph coordinates (right), with (0,0) at the bottom-left and Y increasing upward.]
The latter is often used for output printed on paper.
․Bitmapped Graphics File Formats
BMP Microsoft Windows Bitmap, general-purpose for bitmapped image
GIF CompuServe Graphics Interchange Format, general-purpose to
transmit image by modem–- utilizing data compression to reduce
transmission times and also supporting interlaced image.
TIFF Aldus/Microsoft Tagged Image File Format, complex, multipurpose and
open-ended and supporting all types of bitmaps and bitmap-related
measures.
JPEG Joint Photographic Experts Group under the auspices of the ISO,
fundamentally a bitmapped format. Instead of storing individual pixels,
it stores blocks of data that can be approximated to reconstruct blocks
of pixels, and is therefore also called lossy.
․Interleaving: The simplest scheme is to store the even-numbered rows, then the odd
rows, i.e. 0, 2, 4, …, 1, 3, 5, ... Or maybe 0, 2, 4, …, 98, 99, 97,
95, …, 3, 1, supposing there were a total of 100 rows. The
original point is to match the order of scan lines used on TV, i.e. even-down, odd-up. Another advantage is that one can quickly construct an approximate
version of the image without having to read the whole file.
․GIF uses a four-way interleave that first stores every eighth row, then three
more sets of rows, each of which fills in the rows halfway between the ones
already stored. GIF is copyrighted but freely used, and employs patented
LZW compression.
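A sketch of the row order this four-way interleave produces. The pass spacing below follows the standard GIF interlace pattern (every 8th row, then the rows halfway between those already stored, and so on); treat it as an illustration rather than part of a real decoder.

```python
def gif_interlace_order(height):
    """Row order for a GIF four-way interlaced image."""
    order = []
    for start, step in ((0, 8), (4, 8), (2, 4), (1, 2)):
        order.extend(range(start, height, step))
    return order

print(gif_interlace_order(16))
# [0, 8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]
```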
․The most practical approach to dealing with a bitmapped image is to treat it
as a collection of scan lines--- writing functions that read and write scan lines,
display scan lines, and the like.
****************************************************************************************************************
․Most adaptive-dictionary-based techniques have their roots in two landmark papers by Jacob
Ziv and Abraham Lempel in 1977 and 1978, giving what we call the LZ77 family (also known as
LZ1) and the LZ78 or LZ2 family. The most well-known modification of LZ2 is the one by Terry
Welch, known as LZW.
․Printer Data Files: 2 general types, namely extended text formats and page
description languages. The former embed picture
information inside a conventional text stream; that is, plain
text prints as itself, and escape sequences introduce non-text elements, Hewlett-Packard's PCL
being a de facto standard for low- to medium-performance laser printers, for example. The other approach is to define an entirely new
language to describe what is to be printed on the page, PostScript having become
the standard page description language, for example.
Converting File Types
1) bitmap to bitmap, one reads a file format, extracts the array of pixels, and
then writes the same pixels in any other format, PBM utilities (PGM, PPM)
supporting the transformations for example. Image transformation of this
kind has nothing to do with file processing per se!
․Promoting from a less expressive format to a more expressive format does
nothing at all– a white pixel remains white, a 50 percent gray remains a 50
percent gray, and so forth. Conversion in the reverse direction is not easy. The
goal is to produce the best-looking image possible given the limitations of
the new format.
2) Color to Gray Conversion: For each pixel, one need only determine the
pixel’s luminance, a value conventionally computed from the component
values as Y(or L) in slide 2.
․Color Quantization: Sometimes one has a color image with more colors than
the local hardware can handle, such as a full-color image to be displayed on
a screen that can only show 64 or 256 colors. A process called quantization
selects a representative set of colors and then assigns one of those colors to
each pixel. For example, a digitized photograph using 256 gray-scale
values, displayed on a screen with 3 bits per pixel (a display of only 8 colors), is
much coarser but still recognizable.
․Dithering: The limited number of colors is quite noticeable in areas with
gradual changes of color. One way to decrease this effect is by dithering,
spreading the quantization error around from pixel to pixel to avoid unwanted step effects. Using the same 8 colors together with dithering turns out to give a much smoother image than the previous
example.
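One common way to do this error spreading is Floyd-Steinberg error diffusion. The notes do not name a specific method, so the sketch below is just one possibility, shown for a grayscale image quantized to a few levels.

```python
import numpy as np

def floyd_steinberg(img, levels=8):
    """Quantize a grayscale float image (values in [0, 255]) to `levels` gray
    values, diffusing the quantization error to the neighboring pixels."""
    out = img.astype(float).copy()
    step = 255.0 / (levels - 1)
    h, w = out.shape
    for y in range(h):
        for x in range(w):
            old = out[y, x]
            new = round(old / step) * step          # nearest representable gray
            out[y, x] = new
            err = old - new
            if x + 1 < w:               out[y, x + 1]     += err * 7 / 16
            if y + 1 < h and x > 0:     out[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:               out[y + 1, x]     += err * 5 / 16
            if y + 1 < h and x + 1 < w: out[y + 1, x + 1] += err * 1 / 16
    return out
```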
3) Vector to vector conversion reconciles the slightly different semantics of
different formats and, to some degree, handles differing coordinate systems. For
example, a ‘circle’ command in the original turns into a ‘circle’ command in
the translated file. Problems arise when the two formats don’t have
corresponding commands. One might approximate it or simulate it with a
series of short line segments.
4) Vector to bitmap rasterization is the task of taking an image described in a
vector graphics format (shapes) and converting it into a raster image (pixels
or dots) for output on a video display or printer, or for storage in a bitmap file
format. Rasterization refers to the popular rendering algorithm for displaying
three-dimensional shapes on a computer. Real-time applications need to
respond immediately to user input, and generally need to produce frame
rates of at least 25 frames per second to achieve smooth animation.
Rasterization is simply the process of computing the mapping from scene
geometry to pixels and does not prescribe a particular way to compute the
color of those pixels. Shading, including programmable shading, may be
based on physical light transport, or artistic intent.
․Since all modern displays are raster-oriented, the difference between raster-only
and vector graphics comes down to where they are rasterised: client side
in the case of vector graphics, as opposed to already rasterised on the (web)
server.
․Basic Approach: The most basic algorithm takes a 3D scene, described as
polygons, and renders it onto a 2D surface, usually a computer monitor.
Polygons are themselves represented as collections of triangles. Triangles
are represented by 3 vertices in 3D-space. At a very basic level, rasterizers
simply take a stream of vertices, transform them into corresponding 2D
points on the viewer’s monitor and fill in the transformed 2D triangles as
appropriate.
․Transformations are usually performed by matrix multiplication. Quaternion
math may also be used. The main transformations are translation, scaling,
rotation, and projection. A 3D vertex may be transformed by augmenting an
extra variable (known as a "homogeneous variable") and left multiplying the
resulting 4-component vertex by a 4 x 4 transformation matrix.
․A translation is simply the movement of a point from its original location to
another location in 3-space by a constant offset. Translations can be
represented by the matrix below, where X, Y, and Z are the offsets in the 3 dimensions, respectively:
  \begin{bmatrix} 1 & 0 & 0 & X \\ 0 & 1 & 0 & Y \\ 0 & 0 & 1 & Z \\ 0 & 0 & 0 & 1 \end{bmatrix}
․A scaling transformation is performed by multiplying the position of a vertex
by a scalar value. This has the effect of scaling a vertex with respect to the
origin. Scaling can be represented by the matrix below, where X, Y, and Z are
the values by which each of the 3 dimensions is multiplied. Asymmetric
scaling can be accomplished by varying the values of X, Y, and Z:
  \begin{bmatrix} X & 0 & 0 & 0 \\ 0 & Y & 0 & 0 \\ 0 & 0 & Z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
․Rotation matrices depend on the axis around which a point is to be rotated.
1) Rotation about the X-axis:
  \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
2) Rotation about the Y-axis:
  \begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
3) Rotation about the Z-axis:
  \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
θ in each of these cases represents the angle of rotation.
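A NumPy sketch that builds these 4x4 matrices and applies them to a homogeneous vertex; the helper names are mine.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 translation matrix with offsets (tx, ty, tz)."""
    m = np.eye(4)
    m[:3, 3] = (tx, ty, tz)
    return m

def scaling(sx, sy, sz):
    """4x4 scaling matrix about the origin."""
    return np.diag([sx, sy, sz, 1.0])

def rotation_z(theta):
    """4x4 rotation about the Z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1.0]])

# Augment a 3D vertex with the homogeneous variable and left-multiply.
v = np.array([1.0, 0.0, 0.0, 1.0])
m = translation(5, 0, 0) @ rotation_z(np.pi / 2) @ scaling(2, 2, 2)
print(m @ v)   # scale, then rotate, then translate: approximately [5, 2, 0, 1]
```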
․Rasterization systems generally use a transformation stack to move the
stream of input vertices into place. The transformation stack is a standard
stack which stores matrices. Incoming vertices are multiplied by the matrix
stack. As an illustrative example, imagine a simple scene with a single model
of a person. The person is standing upright, facing an arbitrary direction while
his head is turned in another direction. The person is also located at a certain
offset from the origin. A stream of vertices, the model, would be loaded to
represent the person. First, a translation matrix would be pushed onto the
stack to move the model to the correct location. A scaling matrix would be
pushed onto the stack to size the model correctly. A rotation about the y-axis
would be pushed onto the stack to orient the model properly. Then, the
stream of vertices representing the body would be sent through the rasterizer.
Since the head is facing a different direction, the rotation matrix would be
popped off the top of the stack and a different rotation matrix about the y-axis
with a different angle would be pushed. Finally the stream of vertices
representing the head would be sent to the rasterizer.
After all points have been transformed to their desired locations in 3-space
with respect to the viewer, they must be transformed to the 2-D image plane.
The orthographic projection simply involves removing the z component from
transformed 3d vertices. Orthographic projections have the property that all
parallel lines in 3-space will remain parallel in the 2-D representation.
However, real world images are perspective images, with distant objects
appearing smaller than objects close to the viewer. A perspective projective
transformation needs to be applied to these points.
․Conceptually, the idea is to transform the perspective viewing volume into
the orthogonal viewing volume. The perspective viewing volume is a frustum,
that is, a truncated pyramid. The orthographic viewing volume is a
rectangular box, where both the near and far viewing planes are parallel to
the image plane.
․A perspective projection transformation can be represented by the following
matrix, where F and N are the distances of the far and near viewing planes, respectively:
  \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & (F + N)/N & -F \\ 0 & 0 & 1/N & 0 \end{bmatrix}
The resulting four-vector will be a vector where the homogeneous
variable is not 1. Homogenizing the vector, or
multiplying it by the inverse of the homogeneous variable such that the
homogeneous variable becomes unitary, gives us our resulting 2-D location
in the x and y coordinates.
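A sketch of applying this matrix and homogenizing the result; the values of F and N here are chosen arbitrarily for illustration.

```python
import numpy as np

def perspective(near, far):
    """The projection matrix given above, with N = near and F = far plane distances."""
    return np.array([[1, 0, 0,                    0],
                     [0, 1, 0,                    0],
                     [0, 0, (far + near) / near, -far],
                     [0, 0, 1 / near,             0.0]])

def project(vertex, near=1.0, far=100.0):
    """Transform a 3D vertex and divide by the homogeneous variable."""
    v = np.append(np.asarray(vertex, dtype=float), 1.0)   # augment with w = 1
    clip = perspective(near, far) @ v
    return clip[:3] / clip[3]          # homogenize; x and y give the 2-D location

print(project([2.0, 1.0, 4.0]))   # a point 4 units away: x, y shrink toward the center
```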
․Clipping: Once triangle vertices are transformed to their proper 2d locations,
some of these locations may be outside the viewing window, or the area on
the screen to which pixels will actually be written. Clipping is the process of
truncating triangles to fit them inside the viewing area.
․The common technique is the Sutherland-Hodgman clipping algorithm: each
of the 4 edges of the image plane is tested at a time. For each edge, test all
points to be rendered. If the point is outside the edge, the point is removed.
For each triangle edge that is intersected by the image plane’s edge, that is,
one vertex of the edge is inside the image and another is outside, a point is
inserted at the intersection and the outside point is removed.
․Scan conversion: The final step in the traditional rasterization process is to
fill in the 2D triangles that are now in the image plane, also known as scan
conversion. The first problem to consider is whether or not to draw a pixel at
all. For a pixel to be rendered, it must be within a triangle, and it must not be
occluded, or blocked by another pixel. The most popular algorithm of filling in
pixels inside a triangle is the scanline algorithm. Since it is difficult to know
that the rasterization engine will draw all pixels from front to back, there must
be some way of ensuring that pixels close to the viewer are not overwritten
by pixels far away.
․The z buffer, the most common solution, is a 2d array corresponding to the
image plane which stores a depth value for each pixel. Whenever a pixel is
drawn, it updates the z buffer with its depth value. Any new pixel must check
its depth value against the z buffer value before it is drawn. Closer pixels are
drawn and farther pixels are disregarded.
․To find out a pixel's color, textures and shading calculations must be applied.
A texture map is a bitmap that is applied to a triangle to define its look. Each
triangle vertex is also associated with a texture and a texture coordinate (u,v)
for normal 2-d textures in addition to its position coordinate. Every time a pixel
on a triangle is rendered, the corresponding texel (or texture element) in the
texture must be found. This is done by interpolating between the triangle's vertices'
associated texture coordinates, weighted by the pixel's on-screen distance from the
vertices. In perspective projections, interpolation is performed on the texture
coordinates divided by the depth of the vertex to avoid a problem known as
perspective foreshortening (a process known as perspective texturing).
․Before the final color of the pixel can be decided, a lighting calculation must
be performed to shade the pixels based on any lights which may be present
in the scene. There are generally three light types commonly used in scenes.
․Directional lights are lights which come from a single direction and have the
same intensity throughout the entire scene. In real life, sunlight comes close
to being a directional light, as the sun is so far away that rays from the sun
appear parallel to Earth observers and the falloff is negligible.
․Point lights are lights with a definite position in space and radiate light
evenly in all directions. Point lights are usually subject to some form of
attenuation, or fall off in the intensity of light incident on objects farther away.
Real life light sources experience quadratic falloff. Finally, spotlights are like
real-life spotlights, with a definite point in space, a direction, and an angle
defining the cone of the spotlight. There is also often an ambient light value
that is added to all final lighting calculations to arbitrarily compensate for
global illumination effects which rasterization can not calculate correctly.
․All shading algorithms need to account for distance from light and the normal
vector of the shaded object with respect to the incident direction of light. The
fastest algorithms simply shade all pixels on any given triangle with a single
lighting value, known as flat shading.
․There is no way to create the illusion of smooth surfaces except by subdividing into many small triangles. Algorithms can also separately shade
vertices, and interpolate the lighting value of the vertices when drawing pixels,
known as Gouraud shading. The slowest and most realistic approach is to
calculate lighting separately for each pixel, noted as Phong shading. This
performs bilinear interpolation of the normal vectors and uses the result to do
local lighting calculation.
****************************************************************************************************************
․Bilinear interpolation is an extension of linear interpolation for interpolating functions of two variables on a
regular grid. The key idea is to perform linear interpolation first in one direction, and then again
in the other direction. In computer vision and image processing, bilinear interpolation is one of
the basic resampling (image interpolation) techniques.
․Application in image processing:
1) It is a texture mapping technique that produces a reasonably realistic image, also known as
bilinear filtering or bilinear texture mapping. An algorithm is used to map a screen pixel location
to a corresponding point on the texture map. A weighted average of the attributes (color, alpha,
etc.) of the four surrounding texels is computed and applied to the screen pixel. This process is
repeated for each pixel forming the object being textured.
2) When an image needs to be scaled up, each pixel of the original image needs to be moved in a
certain direction based on the scale constant. However, when scaling up an image, there are pixels
(i.e. holes) that are not assigned appropriate pixel values. In this case, those holes should be
assigned appropriate image values so that the output image does not have non-valued pixels.
3) Typically bilinear interpolation can be used where perfect image transformation, matching and
imaging is impossible so that it can calculate and assign appropriate image values to pixels.
Unlike other interpolation techniques such as nearest-neighbor interpolation and bicubic
interpolation, bilinear interpolation uses only the 4 nearest pixel values, located in
diagonal directions from the given pixel, in order to find the appropriate color intensity value of the
desired pixel.
․As shown in the figure, a 4x4 image can be enlarged to an 8x8 image by this method; the red dots are the original
points and the white dots are the newly generated points.
[Figure: the new point (x', y') lies between the four original points (x, y), (x, y+1), (x+1, y), (x+1, y+1), at fractional distances λ and μ along the two axes.]
Suppose that we want to find the value of the unknown function f at the point P = (x, y). It is
assumed that we know the value of f at the four points Q11 = (x1, y1), Q12 = (x1, y2), Q21 =
(x2, y1), and Q22 = (x2, y2).
We first do linear interpolation in the x-direction. This yields
  f(x, y_1) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{11}) + \frac{x - x_1}{x_2 - x_1} f(Q_{21}),\qquad
  f(x, y_2) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{12}) + \frac{x - x_1}{x_2 - x_1} f(Q_{22}).
We proceed by interpolating in the y-direction:
  f(x, y) \approx \frac{y_2 - y}{y_2 - y_1} f(x, y_1) + \frac{y - y_1}{y_2 - y_1} f(x, y_2).
This gives us the desired estimate of f(x, y).
If we choose a coordinate system in which the four points where f is known are (0, 0), (0, 1),
(1, 0), and (1, 1), then the interpolation formula simplifies to
  f(x, y) \approx f(0,0)(1 - x)(1 - y) + f(1,0)\,x(1 - y) + f(0,1)(1 - x)\,y + f(1,1)\,xy.
Or equivalently, in matrix operations:
  f(x, y) \approx \begin{bmatrix} 1 - x & x \end{bmatrix}
                  \begin{bmatrix} f(0,0) & f(0,1) \\ f(1,0) & f(1,1) \end{bmatrix}
                  \begin{bmatrix} 1 - y \\ y \end{bmatrix}.
Contrary to what the name suggests, the interpolant is not linear. Instead, it is of the form
  (a_1 + a_2 x)(a_3 + a_4 y),
so it is a product of two linear functions. Alternatively, the interpolant can be written as
  b_1 + b_2 x + b_3 y + b_4 xy,
where the four coefficients b_i are determined by the four known values of f.
In both cases, the number of constants (four) corresponds to the number of data points where f
is given. The interpolant is linear along lines parallel to either the x or the y direction,
equivalently if x or y is set constant. Along any other straight line, the interpolant is quadratic.
The result of bilinear interpolation is independent of the order of interpolation. If we had first
performed the linear interpolation in the y-direction and then in the x-direction, the resulting
approximation would be the same.
The obvious extension of bilinear interpolation to three dimensions is called trilinear
interpolation.
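A direct transcription of the two-step interpolation above; the names Q11 etc. follow the text, and the grid points are assumed axis-aligned.

```python
def bilinear(x, y, x1, y1, x2, y2, q11, q12, q21, q22):
    """Estimate f(x, y) from f at Q11=(x1,y1), Q12=(x1,y2), Q21=(x2,y1), Q22=(x2,y2)."""
    # Linear interpolation in the x-direction along the two rows y1 and y2.
    fxy1 = (x2 - x) / (x2 - x1) * q11 + (x - x1) / (x2 - x1) * q21
    fxy2 = (x2 - x) / (x2 - x1) * q12 + (x - x1) / (x2 - x1) * q22
    # Then linear interpolation in the y-direction.
    return (y2 - y) / (y2 - y1) * fxy1 + (y - y1) / (y2 - y1) * fxy2

# On the unit square this reduces to
# f(0,0)(1-x)(1-y) + f(1,0)x(1-y) + f(0,1)(1-x)y + f(1,1)xy.
print(bilinear(0.5, 0.5, 0, 0, 1, 1, q11=10, q12=20, q21=30, q22=40))   # 25.0
```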
․Acceleration techniques: To extract the maximum performance out of any
rasterization engine, a minimum number of polygons should be sent to the
renderer, culling out objects which can not be seen.
․Backface culling: The simplest way to cull polygons is to cull all polygons
which face away from the viewer, known as backface culling. Since most 3d
objects are fully enclosed, polygons facing away from a viewer are always
blocked by polygons facing towards the viewer unless the viewer is inside the
object. A polygon’s facing is defined by its winding, or the order in which its
vertices are sent to the renderer. A renderer can define either clockwise or
counterclockwise winding as front or back facing. Once a polygon has been
transformed to screen space, its winding can be checked and if it is in the
opposite direction, it is not drawn at all. Note: backface culling can not be
used with degenerate and unclosed volumes.
․Using spatial data structures to cull out objects which are either outside the
viewing volume or are occluded by other objects. The most common are
binary space partitions, octrees, and cell and portal culling.
․Texture filtering, one of further refinements, to create clean images at any
distance : Textures are created at specific resolutions, but since the surface
they are applied to may be at any distance from the viewer, they can show up
at arbitrary sizes on the final image. As a result, one pixel on screen usually
does not correspond directly to one texel.
․Environment mapping is a form of texture mapping in which the texture
coordinates are view-dependent. One common application, for example, is to
simulate reflection on a shiny object. One can environment map the interior of
a room to a metal cup in a room. As the viewer moves about the cup, the
texture coordinates of the cup’s vertices move accordingly, providing the
illusion of reflective metal.
․Bump mapping is another form of texture mapping which does not provide
pixels with color, but rather with depth. Especially with modern pixel shaders,
bump mapping creates the feel of view and lighting-dependent roughness on
a surface to enhance realism greatly.
․Level of detail: Though the number of polygons in any scene can be
phenomenal, a viewer in a scene will only be able to discern details of close-by objects. Objects right in front of the viewer can be rendered at full
complexity while objects further away can be simplified dynamically, or even
replaced completely with sprites.
․Shadow mapping and shadow volumes are two common modern techniques
for creating shadows, taking object occlusion into consideration.
․Hardware acceleration: Most modern programs are written to interface with
one of the existing graphics APIs, which drives a dedicated GPU. The latest
GPUs feature support for programmable pixel shaders which drastically
improve the capabilities of programmers. The trend is towards full programmability of the graphics pipeline.
RGB Basis for Color
․The RGB encoding in graphics systems usually uses 3 bytes, enabling (2^8)^3,
or roughly 16 million, distinct color codes. Display devices whose color
resolution matches the human eye are said to use true color. At least 16 bits
are needed: a 15-bit encoding might use 5 bits for each of R, G, B, while a 16-bit encoding would better model the relatively larger green sensitivity by using 6 bits for green.
․The table below shows one way of encoding: red (255, 0, 0) and green (0,
255, 0) combined in equal amounts create yellow (255, 255, 0). The amount
of each primary color gives its intensity.

  color    RGB              CMY              HSI
  red      (255, 0, 0)      (0, 255, 255)    (0.0, 1.0, 255)
  green    (0, 255, 0)      (255, 0, 255)    (2.09, 1.0, 255)
  blue     (0, 0, 255)      (255, 255, 0)    (4.19, 1.0, 255)
  yellow   (255, 255, 0)    (0, 0, 255)      (1.05, 1.0, 255)
           (100, 100, 50)   (155, 155, 205)  (1.05, 0.5, 100)
  white    (255, 255, 255)  (0, 0, 0)        (-1.0, 0.0, 255)
  grey     (192, 192, 192)  (63, 63, 63)     (-1.0, 0.0, 192)
           (127, 127, 127)  (128, 128, 128)  (-1.0, 0.0, 127)
           (63, 63, 63)     (192, 192, 192)  (-1.0, 0.0, 63)
  …        …                …                …
  black    (0, 0, 0)        (255, 255, 255)  (-1.0, 0.0, 0)

NOTE: H∈[0, 2π], S∈[0, 1] and I∈[0, 255]. Byte codings exist for H and S.
․If all components are of highest intensity, then the color white results. Equal
proportions of less intensity create shades of grey(c, c, c) for any constant
0<c<255 down to black(0, 0, 0). It is more convenient to scale values in the
range 0 to 1 rather than 0 to 255 since such a range is device-independent.
․The RGB system is an additive color system because colors are created by
adding components to black(0, 0, 0). Three neighboring elements of
phosphor corresponding to a pixel are struck by 3 electron beams of intensity
c1 , c2 , c3 respectively; the human eye integrates their luminance to perceive
color(c1, c2, c3). The light of 3 wavelengths from a small region of the CRT
screen is thus physically added or mixed together.
․Encoding a pixel of a digital image as (R,G,B), where each coordinate is in
the range [0, 255], one can use the following equations to normalize image
data for interpretation by both computers and people and for transformation
to other color systems.
Intensity I = (R+G+B)/3
normalized red r = R/(R+G+B)
normalized green g = G/(R+G+B)
normalized blue b = B/(R+G+B)
( There are alternative normalizations.)
․By using r+g+b=1, the relationship of coordinate
values to colors can be plotted via a 2D graph as
in the figure at top right. Pure colors are represented by
points near the corners of the triangle. The blue axis
is out of the page, perpendicular to the r and g axes
in the figure, and thus the triangle is actually a slice
through the points [1,0,0], [0,1,0], [0,0,1] in 3D.
The value for blue can be computed as b = 1 - r - g
for any pair of r-g values shown inside the triangle.
[Figure: the r-g chromaticity triangle with the white point W near (1/3, 1/3), spectral wavelengths from 400 nm to 780 nm along the boundary, and sample colors such as pink and gold marked inside.]
CMY
․The CMY system models printing on white paper and subtracts from white, thus
creating appropriate reflections when the printed image is illuminated with
white light. Some encodings are white(0, 0, 0) because no white illumination
should be absorbed, black(255, 255, 255) because all components of white
light should be absorbed and yellow(0, 0, 255) because the blue component
of incident white light should be absorbed by the inks, leaving the red and
green components to create the perception of yellow.
HSI
․The HSI system encodes color information by separating out an overall
intensity value I from two values encoding chromaticity—hue H and
saturation S. In the color cube representation, each r, g, b value can
range independently in [0, 1]. If we project the
color cube along its major diagonal, i.e. from [0,0,0]
to [1,1,1], we arrive at the hexagon in the following
figure: shades of grey that were formerly along the
color cube diagonal are all projected to the
center white point, while the red point is now at the
right corner and the green point is at the top left corner of the hexagon.
[Figure: the RGB color cube with corners black [0,0,0], red [1,0,0], green [0,1,0], blue [0,0,1], yellow [1,1,0], cyan [0,1,1], magenta [1,0,1], and white [1,1,1], and its projection onto a hexagon.]
․A 3D representation, called the
hexacone, is shown at the
right, allowing us to visualize
the former cube diagonal as
a vertical intensity axis I. Hue H
is defined by an angle between
0 and 2π relative to the red axis,
with pure red at an angle of 0,
pure green at 2π/3 and pure blue
at 4π/3. Saturation S is the 3rd coordinate value needed in order to completely specify a point in this color space.
[Figure: the HSI hexacone, with hue angles H = 0 (red), π/3 (yellow), 2π/3 (green), π (cyan), 4π/3 (blue), and the magenta direction; intensity I runs from 0 at the black apex to 1.0 at the white top.]
․Saturation models the purity of the color or hue, with 1 modeling a completely pure or saturated color and 0 modeling a completely unsaturated hue, that
is, some shade of gray.
․The HSI system is also referred to as the HSV system, using the term value
instead of intensity. HSI is more convenient for some graphics designers
because it provides direct control of brightness and hue. Pastels are
centered near the I axis, while deep or rich colors are out at the periphery of
the hexacone. HSI might also provide better support for computer vision
algorithms because it can normalize for lighting and focus on the two
chromaticity parameters that are more associated with the intrinsic character
of a surface rather than the source that is lighting it.
․It is important to note that hue H is unchanged when the saturation of the very same image is
increased or decreased; H should thus be a reliable feature for color segmentation
despite variations in the intensity of the white light under which a machine vision
system might have to operate.
§ Bit Planes
․The three RGB primaries can be decomposed into an R plane, a G plane, and a B plane, as shown at right.
High-bit-depth grayscale pixels can likewise be decomposed into eight bit planes:
assuming the 256 gray levels are written as (g8 g7 g6 g5 g4 g3 g2 g1)2, each pixel contributes its i-th bit gi
to form the i-th bit plane (i.e., the i-th binary image), as follows:
Ex4: Given the 4x4 sub-image below, compute the third bit plane.
   8  7  6  5
  32 31 30 29
  10 11 12 13
   0  1  2  3
A: In binary the sub-image is
  00001000 00000111 00000110 00000101
  00100000 00011111 00011110 00011101
  00001010 00001011 00001100 00001101
  00000000 00000001 00000010 00000011
⇒ taking g3 (the third bit) of every pixel gives the third bit plane:
  0 1 1 1
  0 1 1 1
  0 0 1 1
  0 0 0 0
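A short NumPy sketch that reproduces Ex4, extracting bit plane i = 3 (the bit with weight 2^2 under the g8...g1 numbering above); the helper name is mine.

```python
import numpy as np

def bit_plane(img, i):
    """Return the i-th bit plane (i = 1 is the least significant bit g1)."""
    return (np.asarray(img) >> (i - 1)) & 1

sub_image = np.array([[ 8,  7,  6,  5],
                      [32, 31, 30, 29],
                      [10, 11, 12, 13],
                      [ 0,  1,  2,  3]])

print(bit_plane(sub_image, 3))
# [[0 1 1 1]
#  [0 1 1 1]
#  [0 0 1 1]
#  [0 0 0 0]]
```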
․A drawback of embedding an image via bit planes: after compression, the embedded image is easily damaged, and the
image recovered after decompression is often already corrupted; the process behaves like a mathematical one-way function.
§ Steganography and Watermark
․If the four high-order bit planes are superimposed (discarding the four low-order bit planes), the resulting image is almost
indistinguishable from the original by eye; discarding the four low-order bits therefore does not affect the image features
much (the lower the bit, the smaller its weight, so the smaller its influence on the image features).
For example, suppose an image contains two pixels with gray values 193 = (11000001)2 and 192 = (11000000)2.
A third pixel with gray value 37 = (00100101)2 can be hidden in them (one nibble in the low four bits of each); the resulting
gray values are 194 = (11000010)2 and 197 = (11000101)2, and the human eye can hardly perceive any change in the image.
․Assume one bit can be hidden per byte, with the following steganographic rules:
(1) If the bit read from the watermark is 0, the last two bits of the corresponding image byte are changed from 01/10 to 00/11.
(2) If the bit read from the watermark is 1, the last two bits of the corresponding image byte are changed from 00/11 to 01/10.
(3) Otherwise the byte is left unchanged.
For example, to hide bit 1 in the byte 11000000, it is changed to 11000001; to hide bit 0, the byte
11000000 stays unchanged.
Ex5: The original image and the watermark to be hidden are
  image:  24  7 21      watermark:  1 0 0
          42  8 66                  1 0 1
          34 10 12                  0 1 0
Find the decimal image after the watermark has been embedded.
A: First write the image in binary (e.g., the first column is 00011000, 00101010, 00100010), then apply the rules to get
  25  7 20
  42  8 66
  35 10 12
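One reading of the three rules above (consistent with Ex5) is to flip the least-significant bit whenever the XOR of the byte's last two bits differs from the watermark bit. The sketch below implements that reading; the function name is mine.

```python
import numpy as np

def embed_watermark(image, watermark):
    """Embed one watermark bit per pixel byte: flip the LSB whenever the XOR of
    the byte's last two bits does not already equal the watermark bit."""
    img = np.asarray(image).copy()
    wm = np.asarray(watermark)
    parity = ((img >> 1) & 1) ^ (img & 1)    # XOR of the last two bits
    img[parity != wm] ^= 1                   # rules (1) and (2); rule (3): leave alone
    return img

image = np.array([[24,  7, 21],
                  [42,  8, 66],
                  [34, 10, 12]])
watermark = np.array([[1, 0, 0],
                      [1, 0, 1],
                      [0, 1, 0]])

print(embed_watermark(image, watermark))
# [[25  7 20]
#  [42  8 66]
#  [35 10 12]]
```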
Basic principle
․Let B' be the result of hiding A inside B. PSNR is commonly used to evaluate the similarity between B' and B:
  PSNR = 10 \log_{10} \frac{255^2}{MSE},\qquad MSE = \frac{1}{N^2} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \left[ B'(x, y) - B(x, y) \right]^2
PSNR is a reasonable measure of distortion, but it cannot fully reflect distortion of texture. In watermarking, A can be regarded
as a logo, and this logo is usually also a form of copyright; for example, "NCKU" for National Cheng Kung University.
Note: A must be smaller than B; when necessary, A can be compressed first.
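A direct NumPy transcription of the PSNR/MSE definition above for a square grayscale image; the example data reuse Ex5.

```python
import numpy as np

def psnr(b_prime, b):
    """PSNR in dB between the watermarked image B' and the original B."""
    b_prime = np.asarray(b_prime, dtype=float)
    b = np.asarray(b, dtype=float)
    mse = np.mean((b_prime - b) ** 2)        # (1/N^2) * sum of squared differences
    if mse == 0:
        return float("inf")                  # identical images
    return 10 * np.log10(255.0 ** 2 / mse)

# Example: the Ex5 image before and after embedding the watermark.
original = np.array([[24, 7, 21], [42, 8, 66], [34, 10, 12]])
marked   = np.array([[25, 7, 20], [42, 8, 66], [35, 10, 12]])
print(round(psnr(marked, original), 2))      # about 52.9 dB -- visually indistinguishable
```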
․Let A be a grayscale image that can be compressed, given as a rectangular matrix with Rank(A) = m. The Singular Value
Decomposition of A can be written as A = UΣV^t, where V and U are orthogonal and
  Σ = diag(σ_1, σ_2, ..., σ_n)
with the singular values σ_1, ..., σ_n satisfying
  σ_1 ≥ σ_2 ≥ ... ≥ σ_m > 0 and σ_{m+1} = σ_{m+2} = ... = σ_n = 0,
where σ_i = \sqrt{λ_i} and λ_i is the i-th eigenvalue of the matrix A^tA.
Ex6: Prove λ_i ≥ 0.
A: \|AX\|^2 = (AX)^t (AX) = X^t A^t A X = X^t (λX) = λ X^t X = λ \|X\|^2, hence λ = \|AX\|^2 / \|X\|^2 ≥ 0.
Ex7: Prove
  A = UΣV^t = (U_1\ U_2) \begin{bmatrix} Σ_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^t \\ V_2^t \end{bmatrix} = U_1 Σ_1 V_1^t.
First find the orthogonal matrix V = (V_1, V_2): V_1 is formed by the eigenvectors v_1, v_2, ..., v_m computed from
λ_1, λ_2, ..., λ_m, i.e. V_1 = (v_1, v_2, ..., v_m); V_2 = (v_{m+1}, v_{m+2}, ..., v_n) is formed by the eigenvectors obtained
from λ_{m+1} = λ_{m+2} = ... = λ_n = 0.
For example, let
  A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}, so that A^t A = \begin{bmatrix} 8 & 8 \\ 8 & 8 \end{bmatrix}
with eigenvalues λ_1 = 16, λ_2 = 0; hence σ_1 = 4, σ_2 = 0, and the eigenvectors are V_1 = (1, 1)^t, V_2 = (1, -1)^t, so that
  V = (V_1, V_2) = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.
Using AV_j = σ_j u_j, we have
  A V_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 4 \\ 4 \end{bmatrix} = 4 \cdot \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = σ_1 u_1, so u_1 = \frac{1}{\sqrt{2}} (1, 1)^t;
since σ_2 = 0, A^t u_2 = 0, hence u_2 = \frac{1}{\sqrt{2}} (1, -1)^t, and Σ = \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix}. The SVD of A is therefore
  A = U Σ V^t = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix} \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.
Note: To solve A = UΣV^t, (1) first solve Σ, (2) next solve V, (3) finally solve U, using AV = UΣ and A^t U = V Σ^t.
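The same decomposition can be checked with NumPy (up to the sign ambiguity of the singular vectors):

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [2.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
print(s)                                     # [4. 0.] -- the singular values sigma_1, sigma_2
print(U)                                     # columns proportional to (1,1)/sqrt(2) and (1,-1)/sqrt(2)
print(np.allclose(U @ np.diag(s) @ Vt, A))   # True: A = U Sigma V^t
```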
․The reason SVD is used for steganography: the singular values of the embedded image A can be made very small. The
transformed image A' is then embedded into B, and the singular values of the SVD of the composite image B' are still
dominated by those of B. Note: the foreground keeps the larger singular values; that is, the singular values of A' are
appended after those of B, so A' is not easily noticed.
Morphology
․Suppose hue H is the feature used for detecting human faces. The skin-tone hue range measured from a set of training
images may look fragmented; the morphological opening and closing operators can be used to delete noise regions that
are too small and isolated while joining blocks that are very close together. Taking hair into account as well, one can
further decide whether a region is a face.
․The closing operator performs dilation first and then erosion. Its effect: after dilation, small regions next to a region are
merged into it, while small noise far from the region remains isolated; the subsequent erosion keeps the nearby noise
inside the new region but erodes away the distant noise.
․The opening operator performs the operations in the opposite order and removes small noise specks; it can break apart
two nearby blocks connected by a thin line, because the thin connection disappears first and the later dilation of the two
regions can no longer merge them.
․Basic image processing topics such as the DCT, the sampling theorem, and aliasing are not discussed here.
Digital Watermarking
․A digital watermark is a signal permanently embedded into
digital data (audio, images, and text) that can be detected or
extracted later by means of computing operations in order to
make assertions about the data. It has been developed to
protect the copyright of media signals.
․It is hidden in the host data in such a way that it is inseparable
from the data and so that it is resistant to many operations not
degrading the host document. Thus by means of watermarking,
the work is still accessible but permanently marked.
․It is derived from steganography, which means covered writing.
Steganography is the science of communicating information
while hiding the existence of the communication.
․The goal of steganography is to hide an information message
inside harmless messages in such a way that it is not possible
even to detect that there is a secret message present. Watermarking is not like encryption in that the latter has the aim of
making messages unintelligible to any unauthorized persons
who might interpret them. Once encrypted data is decrypted,
the media is no longer protected.
Morphology
․Morphology means the form and structure of an object, or the
arrangements and interrelationships between the parts of an
object. Digital morphology is a way to describe or analyze the
shape of a digital (most often raster) object. The math behind it
is simply set theory.
․We can assume the existence of three color components (red,
green and blue) is an extension of a grey level, or each color
can be thought of as a separate domain containing new
information.
․Closing the red and blue images should brighten the green
images, and opening the green images should suppress the
green ones.
․Images consist of a set of picture elements (pixels) that collect
into groups having two-dimensional structure (shape).
Mathematical operations on the set of pixels can be used to
enhance specific aspects of the shapes so that they might be
(for example) counted or recognized.
․Erosion: Pixels matching a given pattern are deleted from the
image.
․Dilation: A small area about a pixel is set to a given pattern.
Binary Dilation (simple): First mark all white pixels having at
least one black neighbor, and then
set all of the marked pixels to black.
(Dilation of the original by 1 pixel.)
․In general the object is considered to be a mathematical set of
black pixels, written as A={(3,3),(3,4),(4,3),(4,4)} if the upper left
pixel has the index (0,0).
․Translation of the set A by the point x:  A_x = \{ c \mid c = a + x,\ a \in A \}
For example, if x were at (1,2) then the first (upper left) pixel in
A_x would be (3,3)+(1,2) = (4,5); all of the pixels in A shift down by
one row and right by two columns.
․Reflection:  \hat{A} = \{ c \mid c = -a,\ a \in A \}
This is really a rotation of the object A by 180 degrees about the
origin. The complement of the set A is A^c = \{ c \mid c \notin A \}.
․Intersection, union and difference (i.e. A \cap B^c) correspond to
the usual language of set theory.
․Dilation:  A \oplus B = \{ c \mid c = a + b,\ a \in A,\ b \in B \}; the set B is called a
structuring element, and its composition defines the nature of
the specific dilation.
Ex1: Let B = {(0,0),(0,1)}. Then A \oplus B = C = (A \oplus \{(0,0)\}) \cup (A \oplus \{(0,1)\}):
(3,3)+(0,0)=(3,3), (3,3)+(0,1)=(3,4), … Some are duplicates.
[Figure: B; (0,0) added to A; (0,1) added to A; A after the union.]
Note: If the set B has a set pixel to the right of the origin, then a
dilation grows a layer of pixels on the right of the object.
To grow in all directions, we can use a B having one pixel on
every side of the origin, i.e. a 3x3 square with the origin at the
center.
Ex2: Suppose A1={(1,1),(1,2),(2,2),(3,2),(3,3),(4,4)} and
B1={(0,-1),(0,1)}. The translation of A1 by (0,-1) yields
(A1)(0,-1)={(1,0),(1,1),(2,1),(3,1),(3,2),(4,3)} and
(A1)(0,1)={(1,2),(1,3),(2,3),(3,3),(3,4),(4,5)} as following.
[Figure: B1 (not including the origin), and A1 before and after the dilation.]
Note: (1) The original object pixels belonging to A1 are not
necessarily set in the result, (4,4) for example, due to
the effect of the origin not being a part of B1.
(2) In fact, A \oplus B = \bigcup_{b \in B} (A)_b = \bigcup_{a \in A} (B)_a since dilation is
commutative. This gives a clue concerning a possible
implementation for the dilation operator. When the origin of B
implementation for the dilation operator. When the origin of B
aligns with a black pixel in the image, all of the image pixels that
correspond to black pixels in B are marked, and will later be
changed to black. After the entire image has been swept by B,
the dilation is complete. Normally the dilation is not computed in
place; that is, where the result is copied over the original image.
A third image, initially all white, is used to store the dilation while
it is being computed.
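A sketch of dilation as the union of translates described above (equivalent to sweeping B over the image), for binary images stored as 0/1 NumPy arrays with black = 1; the structuring element is given as a list of (row, col) offsets from its origin, and the helper name is mine.

```python
import numpy as np

def dilate(image, offsets):
    """Binary dilation: OR together copies of the image translated by each
    offset in the structuring element B (a list of (dr, dc) pairs)."""
    out = np.zeros_like(image)                # third image, initially all white (0)
    h, w = image.shape
    for dr, dc in offsets:
        src = image[max(0, -dr):h - max(0, dr), max(0, -dc):w - max(0, dc)]
        out[max(0, dr):h - max(0, -dr), max(0, dc):w - max(0, -dc)] |= src
    return out

A = np.zeros((8, 8), dtype=int)
A[3:5, 3:5] = 1                               # the 2x2 object {(3,3),(3,4),(4,3),(4,4)}
simple = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]   # 3x3 square, origin at center
print(dilate(A, simple))                      # the object grown by one pixel on every side
```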
[Figure: the dilation built up from successive translations (1st, 2nd, 3rd, final), and the corresponding erosion illustration.]
Binary Erosion
• If dilation can be said to add pixels to an object, or to make it
bigger, then erosion will make an image smaller. Erosion can be
implemented by marking all black pixels having at least one
white neighbor, and then setting to white all of the marked
pixels. Only those that initially place the origin of B at one of the
members of A need to be considered. It is defined as
  A \ominus B = \{ c \mid (B)_c \subseteq A \}
Ex3: B={(0,0),(1,0)}, A={(3,3),(3,4),(4,3),(4,4)}
Four such
translations: B(3,3)={(3,3),(4,3)}
B(3,4)={(3,4),(4,4)}
B(4,3)={(4,3),(5,3)}
B(4,4)={(4,4),(5,4)}
Ex4: B2 = {(1,0)}, i.e. (0,0) \notin B2. The ones that result in a match are:
B(2,3)={(3,3)} B(2,4)={(3,4)} B(3,3)={(4,3)} B(3,4)={(4,4)}
Note: {(2,3),(2,4),(3,3),(3,4)} is not a subset of A, meaning the
eroded image is not always a subset of the original.
․Erosion and dilation are not inverse operations. Yet, erosion and
dilation are duals in the following sense:
  (A \ominus B)^c = A^c \oplus \hat{B}
․There is an issue of a “don’t care” state in B, which was not a concern
for dilation. When using a strictly binary structuring element
to perform an erosion, the member black pixels must
correspond to black pixels in the image in order to set the pixel
in the result, but the same is not true for a white pixel in B. We
don’t care what the corresponding pixel in the image might be
when the structuring element pixel is white.
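A matching erosion sketch under the same 0/1 array convention as the dilation sketch above; a pixel survives only if every structuring-element offset lands on an object pixel.

```python
import numpy as np

def erode(image, offsets):
    """Binary erosion: out[c] = 1 only if image[c + b] = 1 for every offset b."""
    h, w = image.shape
    out = np.ones_like(image)
    for dr, dc in offsets:
        shifted = np.zeros_like(image)        # image shifted so shifted[c] = image[c + (dr, dc)]
        src = image[max(0, dr):h - max(0, -dr), max(0, dc):w - max(0, -dc)]
        shifted[max(0, -dr):h - max(0, dr), max(0, -dc):w - max(0, dc)] = src
        out &= shifted
    return out

A = np.zeros((8, 8), dtype=int)
A[2:6, 2:6] = 1
simple = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
print(erode(A, simple))        # the 4x4 square shrinks to a 2x2 square
```

Using the dilate sketch above, one can also verify the duality (A \ominus B)^c = A^c \oplus \hat{B} numerically for this symmetric B.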
Opening and Closing
․The application of an erosion immediately followed by a dilation
using the same B is referred to as an opening operation,
describing the operation tends to “open” small gaps or spaces
between touching objects in an image. After an opening using the
simple structuring element, the objects are better isolated, and might now be
counted or classified.
․Another use of opening: the removal of noise. Thresholding a noisy grey-level image results in isolated pixels in random
locations. The erosion step in an opening will remove isolated
pixels as well as boundaries of objects, and the dilation step will
restore most of the boundary pixels without restoring the noise.
This process seems to be successful at removing spurious
black pixels, but does not remove the white ones.
․A closing is similar to an opening except that the dilation is
performed first, followed by an erosion using the same B, and
will fill the gaps or “close” them. It can remove much of the
white pixel noise, giving a fairly clean image. (A more complete
method for fixing the gaps may use 4 or 5 structuring elements,
and 2 or 3 other techniques outside of morphology.)
․Closing can also be used for smoothing the outline of objects in
an image, i.e. to fill the jagged appearances due to digitization
in order to determine how rough the outline is. However, more
than one B may be needed since the simple structuring element
is only useful for removing or smoothing single pixel irregularities. N dilation/erosion (named depth N) applications should
result in the smoothing of irregularities of N pixels in size.
․A fast erosion method is based on the distance map of each
object, where the numerical value of each pixel is replaced by a
new value representing the distance of that pixel from the
nearest background pixel. Pixels on a boundary would have a
value of 1, being that they are one pixel width from a background pixel; a value of 2 meaning two widths from the background, and so on. The result has the appearance of a contour
map, where the contours represent the distance from the
boundary.
․The distance map contains enough information to perform an
erosion by any number of pixels in just one pass through the
image, and a simple thresholding operation will give any desired
erosion.
․There is another way to encode all possible openings as one
grey-level image, and all possible closings can be computed at
the same time. First, all pixels in the distance map that do NOT
have at least one neighbor nearer to the background and one
neighbor more distant are located and marked as nodal pixels.
If the distance map is thought of as a three-dimensional surface
where the distance from the background is represented as
height, then every pixel can be thought of as being peak of a
pyramid having a standardized slope. Those peaks that are not
included in any other pyramid are the nodal pixels.
․One way to locate nodal pixels is to scan the distance map,
looking at all object pixels; find the MIN and MAX value of all
neighbors of the target pixel, and compute MAX-MIN. If the
value is less than the MAX possible, then the pixel is nodal.
The “Hit and Miss” Transform
․It is a morphological operator designed to locate simple shapes
within an image. Though the erosion of A by S also includes
places where the background pixels in that region do not match
those of S, these locations would not normally be thought of as
a match.
․Matching the foreground pixels in S against those in A is a “hit,”
and is accomplished with an erosion A \ominus S. The background
pixels in A are those found in A^c, and while we could use S^c as
the background for S, a more flexible approach is to specify
the background pixels explicitly in a new structuring element T.
A “hit” in the background is called a “miss,” and is found by
A^c \ominus T.
․What we need is an operation that matches both the foreground
and the background pixels of S in A, which are the pixels:
  A \otimes (S, T) = (A \ominus S) \cap (A^c \ominus T)
Ex5: To detect upper-right corners. Figure (a) below shows
an image interpreted as being two overlapping squares.
[Figure panels: (a) the image; (b) foreground structuring element; (c) erosion of (a) by (b) -- the ‘hit’; (d) complement of (a); (e) background structuring element, showing 3 pixels of the corner; (f) erosion of (d) by (e) -- the ‘miss’; (g) intersection of (c) and (f) -- the result.]
Identifying Region Boundaries
․The pixels on the boundary of an object are those that have at
least one neighbor that belongs to the background. It can’t be
known in advance which neighbor to look for! A single structuring element can’t be constructed to detect the boundary. This is
in spite of the fact that an erosion removes exactly these pixels.
․The boundary can be stripped away using an erosion, and the
eroded image can then be subtracted from the original, written
as:  Boundary(A) = A - (A \ominus simple)
Ex6: Figure (h) results from the previous figure (a) after an erosion, and (i) shows (a)-(h): the boundary.
[Figure panels: (a) of Ex5, (h) the erosion, (i) the boundary.]
Conditional Dilation
․There are occasions when it is desirable to dilate an object in
such a way that certain pixels remain immune. The forbidden
area of the image is specified as a second image. The notation is A \oplus (S, A') and the result is
computed in an iterative fashion:  A_i = (A_{i-1} \oplus S) \cap A'
A': the second (conditioning) image, outside of which dilation is forbidden;  A_i: the desired dilation.
․One place where this is useful is in segmenting an image.
I_high: a very high threshold applied to an image -- a great many
object pixels will be missed.
I_low: a very low threshold applied to the original image -- some
background will be marked.
R: a segmented version of the original -- a superior result than
using any single threshold in some cases, obtained by:
  R = I_high \oplus (simple, I_low)
․Another application of conditional dilation is that of filling a
region with pixels, which is the inverse operation of boundary
extraction. It is to dilate until the inside region is all black, and
then combine with the boundary image to form the final result:
  Fill = P \oplus (S_cross, A^c)
where P is an image containing only the seed pixel, known to be
inside the region to be filled, A is the boundary image and S_cross
is the cross-shaped structuring element, (j) for example.
Ex7: [Figure panels: (i) boundary, (j) structuring element, (k) seed pixel at iteration 0 of the process, (l) iteration 1, (m) iteration 2,
(n) iteration 3, (o) iteration 4, (p) iteration 5 and completed, (q) union of (i) with (p) -- the result.]
Counting Regions
․It is possible to count the number of regions in a binary image
using morphological operators, first discussed by Levialdi, using
6 different structuring elements -- 4 for erosion, named L1~L4,
and 2 for counting isolated “1” pixels (the # operator). The initial
count of regions is the number of isolated pixels in the input
image, and the image of iteration 0 is A:
  count_0 = #A,  A_0 = A,  count_n = #A_n
The image of the next iteration is the union of the four erosions
of the current image:
  A_{n+1} = (A_n \ominus L1) \cup (A_n \ominus L2) \cup (A_n \ominus L3) \cup (A_n \ominus L4)
The iteration stops when A_n becomes empty (all 0 pixels), and
the overall number of regions is the sum
of all of the values count_i.
Ex8: Counting 8-connected regions.
[Figure panels: (a)~(d) the structuring elements L1~L4, (e) the image being counted.]
Grey-Level Morphology
․A pixel can now have any integer value, so the nice picture of
an image being a set disappears! The figure shows how the
dilated grey-level line (a) might appear as (b), computed as follows, A being the grey-level image to be dilated:
  (A \oplus S)[i, j] = \max\{\, A[i - r, j - c] + S[r, c] \mid [i - r, j - c] \in A,\ [r, c] \in S \,\}
[Figure: (a) background is 0 and line pixels have the value 20; (b) the grey line after a dilation.]
․Process of the above computation:
(1) Position the origin of the structuring element over the first
pixel of the image being dilated.
(2) Compute the sum of each corresponding pair of pixel values
in the structuring element and the image.
(3) Find the maximum value of all of these sums, and set the
corresponding pixel in the output image to this value.
(4) Repeat this process for each pixel in the image being dilated.
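A direct sketch of that four-step procedure for grey-level dilation. The structuring element is represented here as a dict mapping (r, c) offsets to its grey values; the names are mine.

```python
import numpy as np

def grey_dilate(A, S):
    """Grey-level dilation: (A + S)[i, j] = max over structuring-element
    positions of A[i - r, j - c] + S[r, c], taken where the image is defined."""
    h, w = A.shape
    out = np.zeros_like(A)
    for i in range(h):
        for j in range(w):
            vals = [A[i - r, j - c] + s
                    for (r, c), s in S.items()
                    if 0 <= i - r < h and 0 <= j - c < w]
            out[i, j] = max(vals)
    return out

line = np.zeros((5, 5), dtype=int)
line[2, :] = 20                                  # background 0, line pixels 20
S = {(0, 0): 0, (-1, 0): 0, (1, 0): 0}           # flat 3x1 vertical element
print(grey_dilate(line, S))                      # the line thickens to rows 1-3
```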
Chromaticity Diagram
• Chromaticity is an objective specification of the quality of a
color regardless of its luminance, that is, as determined by its
colorfulness (or saturation, chroma, intensity, or excitation
purity) and hue.
• In color science, the white point of an illuminant or of a display
is a neutral reference characterized by a chromaticity; for
example, the white point of an sRGB display is an x,y chromaticity of [0.3127,0.3290]. All other chromaticities may be defined
in relation to this reference using polar coordinates. The hue is
the angular component, and the purity is the radial component,
normalized by the maximum radius for that hue.
• Purity is roughly equivalent to the term "saturation" in the HSV
color model. The property "hue" is as used in general color
theory and in specific color models such as HSV or HSL,
though it is more perceptually uniform in color models such as
Munsell, CIELAB or CIECAM02.
․Some color spaces separate the three dimensions of color into
one luminance dimension and a pair of chromaticity dimensions.
For example, the chromaticity coordinates are a and b in Lab
color space, u and v in Luv color space, x and y in xyY space,
etc. These pairs define chromaticity vectors in a rectangular 2-space, unlike the polar coordinates of hue angle and saturation
that are used in HSV color space.
․On the other hand, some color spaces such as RGB and XYZ
do not separate out chromaticity; chromaticity coordinates such
as r and g or x and y can be calculated by an operation that
normalizes out intensity.
․The xyY space is a cross between the CIE XYZ color space and
its normalized chromaticity coordinates xyz, such that the
luminance Y is preserved and augmented with just the required
two chromaticity dimensions.
․The CIE (1931) diagram is a projection of a 3D color space,
called XYZ color space, to 2D. The light emitted by a device, or
light reflected from a surface consists of photons with different
wavelengths. The amount of photons with a certain wavelength,
λ, in a given light composition is represented by the function
C(λ). The CIE diagram comprises three functions μx(λ), μy(λ), and
μz(λ), and is used for comparing colors produced by color-producing devices, e.g. PC monitors, printers, and cameras.
The science of quantifying color is called colorimetry. The X, Y,
Z coordinates are found as follows:
  X = \int C(λ)\,μ_x(λ)\,dλ,\qquad Y = \int C(λ)\,μ_y(λ)\,dλ,\qquad Z = \int C(λ)\,μ_z(λ)\,dλ
․The projection to the CIE diagram is obtained via
  x = \frac{X}{X + Y + Z},\qquad y = \frac{Y}{X + Y + Z},\qquad z = \frac{Z}{X + Y + Z}
where x + y + z = 1, so that only two of x, y,
z are independent, making the projection
a planar surface.
Texture
․A major characteristic is the repetition of a pattern or patterns
over a region. The pattern may be repeated exactly, or as a set
of small variations on the theme, possibly a function of position.
The goal of studying texture is to segment regions rather than
characterizing textures, determining which regions have texture
A and which have texture B. The result could be an image in
which texture has been replaced by a unique grey level or color.
․texton: the size, shape, color, and orientation of the elements of
the pattern.
Sometimes the difference between two textures is contained in
the degree of variation alone, or in the statistical distribution
found relating the textons. A region cannot display texture if it is
small compared with the size of a texton.
․The same texture at two different scales will be perceived as
two different textures, provided that the scales are different
enough. This leaves us with a problem of scale. As the scales
become closer together the textures are harder to distinguish,
and at some point they become the same.
Ex1: regions characterized by their textures -- both (a) and (b) are artificial.