Image Geometry and Interpolation; Translation

Image Geometry
․Images sometimes need the following processing: enlarging an image to fit a particular space or for
printing, shrinking it to place it on a web page, rotating it to correct a bad camera angle or simply for
effect, and so on. These all belong to image geometry. Rotation and scaling are so-called affine
transformations: after the transformation, lines remain lines and parallel lines remain parallel. There
are also non-affine geometric transformations, such as warping, which are outside our scope.
․Geometric transformations can be used to restore geometrically distorted images, e.g. image registration,
projection onto earth coordinates (refer to my third copy of slides), correcting the aspect ratio of an
image, or the operations mentioned above. An image geometric transformation has two parts: (1) coordinate
transformation, which describes the coordinate relation between the original image and the new image and
can be designed as required; (2) pixel resampling: the new coordinates (x',y') produced by the coordinate
transformation may not fall exactly on the old grid points (x,y), so the grey value at (x',y') must be
computed from the grey values of some neighbouring points by interpolation or another method.
(figure: Tg maps (x,y) to (x',y'))
․Coordinate transformation: map a region R in (x,y) coordinates onto a region R* in (x',y'), defined as
Tg: R → R*   or   Tg⁻¹: R* → R
(5.1) x' = p(x,y) ; y' = q(x,y)
(5.2) x = p⁻¹(x',y') ; y = q⁻¹(x',y')
These are usually approximated by polynomials of degree m:
x' = Σ_{j=0}^{m} Σ_{k=0}^{m} a_jk x^j y^k
y' = Σ_{j=0}^{m} Σ_{k=0}^{m} b_jk x^j y^k
The coefficients a_jk and b_jk can be found from a set of corresponding points (xi,yi) and (xi',yi') by the
least-squares method; for slowly varying geometric distortions, m = 2 usually gives a good approximation.
․Bilinear transformation: x' = a0+a1x+a2y+a3xy , y' = b0+b1x+b2y+b3xy. It contains eight unknowns, so four
pairs of corresponding points suffice to determine it. The linear transformation covering enlargement,
reduction, rotation, translation and skewing contains six unknowns and is defined as
x' = c0+c1x+c2y , y' = d0+d1x+d2y
Its special cases are (1) rotation by φ: x' = x cosφ + y sinφ
y' = -x sinφ + y cosφ
(2) horizontal scaling by a, vertical scaling by b: x' = ax ; y' = by
(3) skew by θ: x' = x + y tanθ ; y' = y
․An image transformation is generally carried out in one of two ways: (a) use Eq. 5.1 to map R directly
onto R*; the drawback is that some points of R* may never be hit, so R* must afterwards be passed through a
low-pass filter; (b) use Eq. 5.2: for every point (x',y') in R*, find (x,y) and compute the grey value of
(x',y') by interpolation or another method. Method (b) is the one normally used.
Ex: With method (a), enlarge an image f(x,y) by a factor of two to obtain g(x,y); then g(2x,2y) = f(x,y),
and g(2x+1,2y) = g(2x,2y+1) = g(2x+1,2y+1) = 0
sol: Every other row and column of g(x,y) is zero; the interpolation can be of order 0 or order 1
(i.e. linear). Zero-order interpolation convolves g(x,y) with the 2x2 kernel of ones
 1 1
 1 1
For example,
 1 3   enlarge x2    1 0 3 0   convolve     1 1 3 3
 5 7       →         0 0 0 0   with ones →  1 1 3 3
                     5 0 7 0                5 5 7 7
                     0 0 0 0                5 5 7 7
Linear interpolation convolves g(x,y) with
 1/4 1/2 1/4
 1/2  1  1/2
 1/4 1/2 1/4
which gives
 1 0 3 0     1/4 1/2 1/4     1    2    3    1.5
 0 0 0 0  *  1/2  1  1/2  =  3    4    5    2.5
 5 0 7 0     1/4 1/2 1/4     5    6    7    3.5
 0 0 0 0                     2.5  3    3.5  1.75
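A minimal MatLab sketch of the two resampling kernels above (assuming conv2 with zero padding outside the image; the variable names are mine):
f = [1 3; 5 7];
g = zeros(2*size(f));            % zero-interleaved image
g(1:2:end, 1:2:end) = f;         % g(2x,2y) = f(x,y) (1-based indexing)
k0 = ones(2);                    % zero-order (pixel replication) kernel
k1 = [1 2 1; 2 4 2; 1 2 1]/4;    % linear interpolation kernel
g0 = conv2(g, k0);               % full convolution, then crop to 4x4
g0 = g0(1:4, 1:4)                %  -> [1 1 3 3; 1 1 3 3; 5 5 7 7; 5 5 7 7]
g1 = conv2(g, k1, 'same')        %  -> [1 2 3 1.5; 3 4 5 2.5; 5 6 7 3.5; 2.5 3 3.5 1.75]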
Ex: After applying method (b), (x,y) may not fall on a grid point; how is the grey value of (x',y')
determined?
sol: (1) Nearest-neighbour interpolation, also called substitution interpolation: represent the grey value
of (x',y') by the grey value of the grid point closest to (x,y). Let j = int(x+0.5), k = int(y+0.5); then
g(x',y') = f(j,k), where x and y are given by Eq. 5.2. This method, however, produces scattered or jagged
images (because the spatial error can be up to ±1/2 grid point). The figure shows the relation: the solid
lines are the grid of (x',y') and the dashed lines the grid of (x,y).
(2) Bilinear interpolation: approximate the grey value from the four neighbours of (x,y), as in the figure
at right (corners f(j,k), f(j+1,k), f(j,k+1), f(j+1,k+1); horizontal offset α between A and B, vertical
offset β):
j = int(x), α = x − j ; k = int(y), β = y − k
grey(A) = (1−β) f(j,k) + β f(j,k+1)
grey(B) = (1−β) f(j+1,k) + β f(j+1,k+1)
grey(f(x,y)) = α·B + (1−α)·A = (1−α)(1−β) f(j,k) + β(1−α) f(j,k+1) + α(1−β) f(j+1,k) + αβ f(j+1,k+1)
When α = β = 0, f(x,y) = f(j,k) and the coefficient in the formula above is 1, as illustrated at right.
Result: nearest-neighbour interpolation is easy to compute but gives poorer results; bilinear interpolation
approximates better, although it takes longer and still cannot avoid some blurring.
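A minimal MatLab sketch of the two rules above; the function name bilinear_at is mine, and the clamping of j and k at the image border is an added safeguard not mentioned in the notes:
function v = bilinear_at(f, x, y)
% Bilinear interpolation of image f at the (possibly non-integer) point (x,y), following
% grey = (1-a)(1-b)f(j,k) + b(1-a)f(j,k+1) + a(1-b)f(j+1,k) + ab f(j+1,k+1).
% Nearest-neighbour interpolation would instead return f(round(x), round(y)).
j = min(floor(x), size(f,1)-1);  a = x - j;   % row index and offset
k = min(floor(y), size(f,2)-1);  b = y - k;   % column index and offset
v = (1-a)*(1-b)*f(j,k) + b*(1-a)*f(j,k+1) ...
  + a*(1-b)*f(j+1,k)   + a*b*f(j+1,k+1);
end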
․Bilinear interpolation is an extension of linear interpolation to functions of two
variables on a regular grid. The key idea is to perform linear interpolation first
in one direction, and then again in the other direction. In computer vision and
image processing, bilinear interpolation is one of the basic resampling
techniques (image interpolation).
․Application in image processing:
1) It is a texture mapping technique that produces a reasonably realistic
image, also known as bilinear filtering or bilinear texture mapping. An
algorithm is used to map a screen pixel location to a corresponding point
on the texture map. A weighted average of the attributes (color, alpha,
etc.) of the four surrounding texels is computed and applied to the screen
pixel. This process is repeated for each pixel forming the object being
textured.
2) When an image needs to be scaled up, each pixel of the original image
needs to be moved in a certain direction based on the scale constant. However,
when scaling up an image there are pixels (i.e. holes) that are not assigned
appropriate pixel values. In this case, those holes should be assigned
appropriate image values so that the output image does not have non-valued
pixels.
3) Typically, bilinear interpolation can be used where perfect image transformation, matching and imaging
are impossible, so that appropriate image values can be calculated and assigned to pixels. Unlike other
interpolation techniques such as nearest-neighbor interpolation and bicubic interpolation, bilinear
interpolation uses only the 4 nearest pixel values, which are located in diagonal directions from a given
pixel, in order to find the appropriate color intensity value of the desired pixel.
․Data interpolation: as in the figure at right, consider four points x1, x2, x3 and x4, evenly spaced one
unit apart, with values f(x1), f(x2), f(x3), f(x4). Along the same line mark off eight evenly spaced points
x1', x2', x3', x4', x5', x6', x7', x8'.
Redrawing the line (next page) clarifies the relation between x and x':
(6.1) x' = (7x−4)/3 ; x = (3x'+4)/7
․Except for the first and last points, which coincide, the xi' never fall on the original xj; therefore
f(xi') must be estimated from the known neighbouring values f(xj), which is what is meant by interpolation.
If we let f(xi') = f(xj), where xj is the original point closest to xi', we obtain one kind of
interpolation, called nearest-neighbor interpolation.
․Another kind is linear interpolation: join the original function values with straight lines and take the
value on the line as the interpolated value, as shown in figure (1). The procedure is shown in figure (2):
let k2 = k1 + 1 and let F be the value to be found; from the slope,
(F − f(k1)) / λ = (f(k2) − f(k1)) / 1   ⇒   F = λ f(k2) + (1−λ) f(k1)      (6.2)
Ex: f(x1) = 2, f(x2) = 3, f(x3) = 1.5, f(x4) = 2.5. Find (a) the point x4' lying between x2 and x3, whose
corresponding λ is 2/7; (b) the point x7' lying between x3 and x4, with λ = 4/7.
A:
(a) f(x4') = (2/7) f(x3) + (5/7) f(x2) = 2.5714
(b) f(x7') = (4/7) f(x4) + (3/7) f(x3) = 2.0714
Image Interpolation
․As shown in the figure, a 4x4 image can be enlarged to an 8x8 image with this method; the red dots are the
original points and the white dots are the newly created points. First apply linear interpolation along the
top row to obtain f(x,y'); then apply it along the bottom row to obtain f(x+1,y'). Finally, interpolate
along the column y' between the new values to obtain f(x',y').
(figure: (x,y), (x,y'), (x,y+1) on the top row and (x+1,y), (x+1,y'), (x+1,y+1) on the bottom row, with
horizontal offset µ and vertical offset λ)
Using Eq. 6.2, f(x,y') = µ f(x,y+1) + (1−µ) f(x,y)
f(x+1,y') = µ f(x+1,y+1) + (1−µ) f(x+1,y) ;
interpolating along column y' gives
f(x',y') = λ f(x+1,y') + (1−λ) f(x,y') ; substituting the values just obtained,
f(x',y') = λ [µ f(x+1,y+1) + (1−µ) f(x+1,y)] + (1−λ) [µ f(x,y+1) + (1−µ) f(x,y)]
= λµ f(x+1,y+1) + λ(1−µ) f(x+1,y) + (1−λ)µ f(x,y+1) + (1−λ)(1−µ) f(x,y)
This is the bilinear interpolation formula.
․Returning to Eq. (6.1), we can also write (x, y) = ((3x'+4)/7, (3y'+4)/7), i.e. a scale factor of about
1/2 < 1: the resulting image array is smaller than the original, so the roles should be read the other way
round: the white dots are the original points and the red dots are the newly created points (the original
scale factor 2 > 1 meant enlargement). MatLab's imresize function performs this kind of operation:
imresize(A,k,'method'), where A is an image of any type, k is the scale factor, and method can be nearest,
bilinear or cubic.
Another form of imresize is imresize(A,[m,n],'method'), where [m,n] is the size of the output; the size or
type of the low-pass filter applied before shrinking an image can also be chosen.
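A small usage sketch of both calling forms (pout2.png is the file used elsewhere in these notes; the output size [128,96] and the variable names are mine):
A  = imread('pout2.png');
B1 = imresize(A, 2, 'nearest');          % scale factor k = 2, nearest neighbour
B2 = imresize(A, 2, 'bilinear');         % same enlargement, bilinear resampling
B3 = imresize(A, [128 96], 'bilinear');  % explicit output size [m,n]
figure, imshow(B1), figure, imshow(B2), figure, imshow(B3)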
General Interpolation
․The idea is as follows: let x' − x1 = λ with x1 ≤ x' ≤ x2, let the interpolation function be R(u), and set
f(x') = R(−λ) f(x1) + R(1−λ) f(x2); the operation is illustrated in the upper-right figure. The function
R(u) is centred at x', so x1 corresponds to u = −λ and x2 corresponds to u = 1−λ. The functions R0(u) and
R1(u) in the two figures at right are both defined on −1 ≤ u ≤ 1:
R0(u) = 0 if u < −0.5 ; 1 if −0.5 ≤ u ≤ 0.5 ; 0 if u > 0.5
R1(u) = 1+u if u ≤ 0 ; 1−u if u ≥ 0 , which can also be written R1(u) = 1 − |u|.
Replacing R(u) in 6.2 by R0(u) gives nearest neighbor interpolation; considering λ < 0.5 and λ ≥ 0.5
separately makes this clear. If λ < 0.5 then R0(−λ) = 1 and R0(1−λ) = 0,
so f(x') = 1·f(x1) + 0·f(x2) = f(x1) ;
if λ ≥ 0.5 then R0(−λ) = 0 and R0(1−λ) = 1, so f(x') = 0·f(x1) + 1·f(x2) = f(x2).
In either case f(x') is the value of the pixel closest to x'. Similarly, replacing R(u) in 6.2 by R1(u)
gives linear interpolation: f(x') = R1(−λ) f(x1) + R1(1−λ) f(x2) = (1−λ) f(x1) + λ f(x2)
․Another similar function is called cubic interpolation, defined as
R3(u) = 1.5|u|³ − 2.5|u|² + 1             if |u| ≤ 1
R3(u) = −0.5|u|³ + 2.5|u|² − 4|u| + 2     if 1 < |u| ≤ 2
As shown in the figure at right, this function is defined on −2 ≤ u ≤ 2; besides the values f(x1) and f(x2)
at the points x1 and x2 on either side of x', values of x further away are also used for the interpolation.
From 6.2 we derive
f(x') = R3(−1−λ) f(x1) + R3(−λ) f(x2) + R3(1−λ) f(x3) + R3(2−λ) f(x4) , with x' − x2 = λ
(figure: starting points → interpolation along rows → interpolation along columns)
In practice, the 16 known values surrounding (x,y) are used: interpolate along the rows first and then along
the columns (or the other way round); applying cubic interpolation to an image in both the row and column
directions is therefore called bicubic interpolation. In MatLab, typing 'bicubic' in the method argument of
imresize performs bicubic interpolation;
for example >> head4c=imresize(head,4,'cubic'); imshow(head4c)
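A minimal sketch of the kernel R3 and of one-dimensional cubic interpolation between x2 and x3 (the function names cubic_interp1 and cubic_kernel are mine; place both in a file cubic_interp1.m):
function v = cubic_interp1(f1, f2, f3, f4, lambda)
% Cubic interpolation at the point lying a fraction lambda to the right of x2,
% using the four samples f1..f4 at x1..x4 and the kernel R3 defined above.
v = cubic_kernel(-1-lambda)*f1 + cubic_kernel(-lambda)*f2 ...
  + cubic_kernel(1-lambda)*f3 + cubic_kernel(2-lambda)*f4;
end
function r = cubic_kernel(u)
% R3(u) = 1.5|u|^3 - 2.5|u|^2 + 1 for |u| <= 1 ; -0.5|u|^3 + 2.5|u|^2 - 4|u| + 2 for 1 < |u| <= 2.
u = abs(u);
if u <= 1
    r = 1.5*u^3 - 2.5*u^2 + 1;
elseif u <= 2
    r = -0.5*u^3 + 2.5*u^2 - 4*u + 2;
else
    r = 0;
end
end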
․Shrinking an image: one method is simply to discard pixels at regular intervals; for example, to shrink
the image to one sixteenth of its original size, keep only the pixels (i,j) with i and j both multiples of
4. This is also called subsampling. With MatLab's imresize command the corresponding method argument is
'nearest'; however, this does not work well on the high-frequency parts of the image (for example, circular
edges become broken) -- converting to a binary image by a threshold operation (described later) gives better
results.
θ
 x' cos  sin    x 
․旋轉: 如右圖所示, 逆時針旋轉θ度,
標出點(x,y)及旋轉後的
 y'   sin  cos
   y 
  
 x   cos sin    x'
 y    sin  cos   y'
對應點(x’,y’); 矩陣算法為
  
 
因為反矩陣與轉置矩陣相同, 所以矩陣正交, 故可得
圖中的實心圓代表原始位置, 影像網格
(a,b)
(0,b)
可視為由像素所構成的笛卡爾整數值網格,
必須保證旋轉後的像素仍位於網格上,
(a,0)
(b)
故如圖(a)即不適用於影像處理; 需如(b)圖,
(x’,y’)
以一個方形框住旋轉後的影像, 影像中的
(x”,y”)
點(x’,y’)都是再旋轉回去, 也會落在原始
影像範圍內的數值, 即 0≤ x’cosθ+y’sinθ≤a
(d)
0≤ -x’sinθ+y’cosθ≤b
(x,y)
(a)
(c)
圖a旋轉300後成為c圖, 可以找到旋轉後的影像像素位置, 但是數值呢? 將旋轉
後的像素點(x’,y’)再旋轉回原始影像中, 得到點(x’’,y’’)如d圖, (x’,y’)的灰階值可透
過周圍的灰階, 以內插法找出來。
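A minimal sketch of rotation by inverse mapping with nearest-neighbour resampling, i.e. a simplified version of what an imrotate-style routine does (the function name rotate_nn and the rotation about the image centre are my own choices):
function g = rotate_nn(f, theta)
% Rotate image f counter-clockwise by theta (radians) about its centre:
% for every output pixel, rotate its coordinates back and sample the nearest grid point.
[rows, cols] = size(f);
g  = zeros(rows, cols, 'like', f);
cx = (rows+1)/2;  cy = (cols+1)/2;
for xp = 1:rows
    for yp = 1:cols
        x =  (xp-cx)*cos(theta) + (yp-cy)*sin(theta) + cx;   % inverse mapping
        y = -(xp-cx)*sin(theta) + (yp-cy)*cos(theta) + cy;
        j = round(x);  k = round(y);                         % nearest grid point
        if j >= 1 && j <= rows && k >= 1 && k <= cols
            g(xp, yp) = f(j, k);
        end
    end
end
end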
․MatLab's image-rotation command is imrotate(image, angle, 'method'); as with imresize, the method argument
can be set to nearest, bilinear or bicubic, and if it is not set, nearest is used by default. For example,
the attached program im_geometry.m processes pout2.png with the three methods, rotating it by 10° and 60°
and enlarging it by a factor of two; close inspection shows that the jagged edges are most visible with the
nearest method.
If the rotation angle is an integer multiple of 90°, the rotation needs only a matrix transpose together
with the flipud command, which flips a matrix upside down, or the fliplr command, which flips it left to
right; i.e. 90°: flipud(image); 180°: fliplr(flipud(A)); 270°: fliplr(image)
․Distortion (skewing): stretching or twisting the shape of an object. Run the attached program distortion.m
and observe how it works and what changes; using the function pixval appropriately, the skull image can be
extracted on its own, for example
>> e1 = A(1:150,28:200); % extract specific image
>> e2=imresize(imrotate(e1,-22,'bicubic'),[20,150],'bicubic'); % distortion
>> e3=imresize(e2,4,'bicubic'); imshow(e3);
To undo the distortion, the operations must be applied in the reverse order; if the way the distortion was
produced is not known beforehand, only repeated trial and error will make the image readable.
Review Exercises:
1. What are the two steps of a geometric image transformation?
2. Write down the geometric transformation for a skew of 45 degrees.
3. Using bilinear interpolation, find the grey value F of the image point in the figure at the upper right
(the figure shows grid values 100, 200, 150, 220 and 450 and offsets 0.5 and 0.8). F = ____________
4. Transform the original square image into a circular one (illustrated at right; use or modify the
formulas below, which take (256,256) as the centre point). Hint: view the image as a set of concentric
squares of different side lengths, and map each concentric square onto a concentric circle.
X = 256 + R cosθ
Y = 256 − R sinθ
R = max(|x−256|, |y−256|)
cosθ = (x−256) / √((x−256)² + (y−256)²)
sinθ = (256−y) / √((x−256)² + (y−256)²)
5. Run the programs, im_geometry.m and distortion.m, with the attached files
(calligraphy.bmp, pout2.png or graphical files) to see the effect and effectiveness of different approaches.
§ Bit Planes
․The three primary colours RGB can be separated into an R plane, a G plane and a B plane, as shown at
right. A pixel with many grey levels can likewise be decomposed into eight bit planes: writing the 256 grey
values as (g8g7g6g5g4g3g2g1)₂, each pixel contributes its i-th bit gi to form the i-th bit plane (that is,
the i-th black-and-white image), as shown below.
Ex4: Given the 4x4 sub-image
 8   7   6   5
 32  31  30  29
 10  11  12  13
 0   1   2   3
compute the third bit plane.
A: In binary the sub-image is
 00001000 00000111 00000110 00000101
 00100000 00011111 00011110 00011101
 00001010 00001011 00001100 00001101
 00000000 00000001 00000010 00000011
⇒ third bit plane:
 0 1 1 1
 0 1 1 1
 0 0 1 1
 0 0 0 0
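Ex4 can be checked quickly in MatLab with bitget (bit 3 has weight 4):
A   = uint8([8 7 6 5; 32 31 30 29; 10 11 12 13; 0 1 2 3]);
bp3 = bitget(A, 3)    % third bit plane: [0 1 1 1; 0 1 1 1; 0 0 1 1; 0 0 0 0]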
․Drawback of embedding an image in a bit plane: after compression the embedded image is easily destroyed,
and the image recovered after decompression is often already damaged; mathematically this acts as a one-way
function.
§ Steganography and Watermark
․If the four high-order bit planes are actually superimposed (discarding the four low-order bit planes),
the resulting image is almost indistinguishable by eye from the original; discarding the four low-order
bits therefore does not change the image characteristics very much (the lower the bit, the smaller its
weight, and hence the smaller its influence on the image).
For example, suppose an image contains two pixels with grey values 193₁₀ = (11000001)₂ and
192 = (11000000)₂; a third pixel with grey value 37 = (00100101)₂ can be hidden in them. The resulting grey
values are 194 = (11000010)₂ and 197 = (11000101)₂, and the human eye can hardly notice any change in the
image.
․Suppose each byte can hide one bit, and the steganography rules are as follows (a sketch of these rules is
given below):
(1) If the bit read from the watermark is 0, the last two bits of the corresponding image byte are changed
from 01/10 to 00/11.
(2) If the bit read from the watermark is 1, the last two bits of the corresponding image byte are changed
from 00/11 to 01/10.
(3) In all other cases the byte is left unchanged.
For example, to hide the bit 1 in the byte 11000000, change it to 11000001; to hide 0, the byte 11000000 is
left unchanged.
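A minimal sketch of these rules for a single byte (the function name embed_bit is mine; rule (1) maps 01→00 and 10→11, rule (2) maps 00→01 and 11→10):
function b = embed_bit(b, w)
% Embed one watermark bit w (0 or 1) into the byte b according to the rules above.
last2 = bitand(uint8(b), uint8(3));          % last two bits of the byte
if w == 0 && last2 == 1                      % 01 -> 00
    b = b - 1;
elseif w == 0 && last2 == 2                  % 10 -> 11
    b = b + 1;
elseif w == 1 && last2 == 0                  % 00 -> 01
    b = b + 1;
elseif w == 1 && last2 == 3                  % 11 -> 10
    b = b - 1;
end                                          % otherwise the byte is unchanged
end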
Ex5: The original image is
 24  7  21
 42  8  66
 34 10  12
and we want to hide the watermark
 1 0 0
 1 0 1
 0 1 0
Find the watermarked image in decimal.
A: First write the pixels in binary (e.g. the first column is 00011000, 00101010, 00100010), then apply the
rules above to obtain
 25  7  20
 42  8  66
 35 10  12
Basic Principle
․Let B' be the result of hiding A in B. PSNR is commonly used to assess the similarity of B' and B:
PSNR = 10 log₁₀ ( 255² / MSE )
MSE = (1/N²) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} [ B'(x,y) − B(x,y) ]²
PSNR is a reasonable measure of distortion, but it cannot fully reflect distortion of texture. In
watermarking, A can be regarded as a logo -- and the logo is usually also a form of copyright; for example,
NCKU for National Cheng Kung University. Note: A must be smaller than B, so if necessary A can be
compressed first.
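A two-line sketch of the computation, assuming B and Bp (= B') already exist as equally sized grey images with levels 0..255:
mse     = mean((double(Bp(:)) - double(B(:))).^2);   % mean squared error over all pixels
psnr_db = 10 * log10(255^2 / mse);                   % PSNR in dB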
․Let A be a grey-level image that can be compressed, given as a rectangular matrix with Rank(A) = m. The
Singular Value Decomposition of A can then be written A = UΣVᵗ, where U and V are orthogonal and
Σ = diag(σ1, σ2, ..., σn)
where σ1, σ2, ..., σn are the singular values, satisfying
σ1 ≥ σ2 ≥ ... ≥ σm > 0  and  σ_{m+1} = σ_{m+2} = ... = σn = 0 ,
with σi = √λi , where λi is the i-th eigenvalue of the matrix AᵗA.
Ex6: Prove λi ≥ 0
‖AX‖² = (AX)ᵗ(AX) = XᵗAᵗAX = Xᵗ(λX) = λ XᵗX = λ ‖X‖² ,  hence λ = ‖AX‖² / ‖X‖² ≥ 0
Ex7: Prove A = UΣVᵗ = (U1 U2) [Σ1 0; 0 0] [V1ᵗ; V2ᵗ] = U1 Σ1 V1ᵗ
First find the orthogonal matrix V = (V1, V2): V1 is made up of the eigenvectors v1, v2, ..., vm computed
from λ1, λ2, ..., λm, i.e. V1 = (v1, v2, ..., vm); V2 = (v_{m+1}, v_{m+2}, ..., v_n) is made up of the
eigenvectors obtained from λ_{m+1} = λ_{m+2} = ... = λ_n = 0.
Note: to solve the SVD A = UΣVᵗ, (1) first find Σ, (2) then V, (3) finally U. Since AV = UΣ, we have
AV_j = σ_j u_j, which gives the u_j; when σ_j = 0, AᵗU = VΣᵗ gives Aᵗu_j = 0, from which the remaining u_j
are obtained.
For example, let A = [2 2; 2 2]. Then AᵗA = [8 8; 8 8] with eigenvalues λ1 = 16 and λ2 = 0, so σ1 = 4 and
σ2 = 0; the eigenvectors are V1 = (1,1)ᵗ and V2 = (1,−1)ᵗ, giving (after normalisation)
V = (V1, V2) = (1/√2) [1 1; 1 −1]
Using AV_1 = σ1 u1 we get u1 = (1/σ1) A V1 = (1/4)(2√2, 2√2)ᵗ = (1/√2)(1, 1)ᵗ; from Aᵗu2 = 0 we get
u2 = (1/√2)(1, −1)ᵗ. Hence
U = (1/√2) [1 1; 1 −1] ,  Σ = [4 0; 0 0]
and the SVD of A is A = UΣVᵗ.
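The worked example can be checked with MatLab's built-in svd (the signs of the singular vectors may differ, since they are only determined up to ±1):
A = [2 2; 2 2];
[U, S, V] = svd(A)     % S = [4 0; 0 0]; U and V equal (1/sqrt(2))*[1 1; 1 -1] up to sign
norm(A - U*S*V')       % ~0, confirming A = U*Sigma*V^t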
․Why SVD is used for steganography: the singular values of the embedded image A can be made very small; if
the transformed image A' is then embedded in B, the singular values of the SVD of the composite image B'
are still dominated by those of B. Note: the foreground keeps the larger singular values; that is, the
singular values of A' are appended after those of B, so A' is not easily detected.
Morphology
․Suppose the hue H is used as the basis for detecting human faces; the skin-hue regions measured from a set
of training images may turn out to be fragmented. The morphological opening and closing operators can be
used to delete noise that is too small and too isolated while connecting blocks that are very close
together; taking the hair into account as well, we can then further decide whether a region is a face.
․The closing operator performs a dilation first and then an erosion. The effect: after the dilation, small
regions next to a region are merged into it, while small noise far from the region remains isolated; after
the subsequent erosion, nearby noise that was merged into the new region stays inside it, while distant
noise is eroded away.
․The opening operator applies the two operations in the opposite order and removes small specks of noise;
it can also break apart two nearby blocks connected by a thin line, because once the thin connection
between the two regions disappears, even dilating the two regions will not merge them again.
․Basic image-processing topics such as the DCT, the sampling theorem and aliasing are not discussed here.
Digital Watermarking
․A digital watermark is a signal permanently embedded into
digital data (audio, images, and text) that can be detected or
extracted later by means of computing operations in order to
make assertions about the data. It has been developed to
protect the copyright of media signals.
․It is hidden in the host data in such a way that it is inseparable
from the data and so that it is resistant to many operations not
degrading the host document. Thus by means of watermarking,
the work is still accessible but permanently marked.
․It is derived from steganography, which means covered writing.
Steganography is the science of communicating information
while hiding the existence of the communication.
․The goal of steganography is to hide an information message
inside harmless messages in such a way that it is not possible
even to detect that there is a secret message present. Watermarking is not like encryption in that the latter has the aim of
making messages unintelligible to any unauthorized persons
who might intercept them. Once encrypted data is decrypted,
the media is no longer protected.
Morphology
․Morphology means the form and structure of an object, or the
arrangements and interrelationships between the parts of an
object. Digital morphology is a way to describe or analyze the
shape of a digital (most often raster) object. The math behind it
is simply set theory.
․We can assume the existence of three color components (red,
green and blue) is an extension of a grey level, or each color
can be thought of as a separate domain containing new
information.
․Closing the red and blue images should brighten the green
images, and opening the green images should suppress the
green ones.
․Images consist of a set of picture elements (pixels) that collect
into groups having two-dimensional structure (shape).
Mathematical operations on the set of pixels can be used to
enhance specific aspects of the shapes so that they might be
(for example) counted or recognized.
․Erosion: Pixels matching a given pattern are deleted from the
image.
․Dilation: A small area about a pixel is set to a given pattern.
Binary Dilation (simple): first mark all white pixels having at
least one black neighbor, and then set all of the marked pixels
to black (a dilation of the original by 1 pixel).
․In general the object is considered to be a mathematical set of
black pixels, written as A={(3,3),(3,4),(4,3),(4,4)} if the upper left
pixel has the index (0,0).
․Translation of the set A by the point x: A_x = {c | c = a + x, a ∈ A}
For example, if x were at (1,2) then the first (upper left) pixel in
Ax would be (3,3)+(1,2)=(4,5); all of the pixels in A shift down by
one row and right by two columns.
․Reflection: Â = {c | c = −a, a ∈ A}
This is really a rotation of the object A by 180 degrees about the
origin. The complement of the set A is Aᶜ = {c | c ∉ A}.
․Intersection, union and difference (i.e. A ∩ Bᶜ) correspond to
the language of the set theory.
․Dilation: A ⊕ B = {c | c = a + b, a ∈ A, b ∈ B}; the set B is called a
structuring element, and its composition defines the nature of
the specific dilation.
Ex1: Let B={(0,0),(0,1)}. Then A ⊕ B = C = (A + {(0,0)}) ∪ (A + {(0,1)}):
(3,3)+(0,0)=(3,3), (3,3)+(0,1)=(3,4), … Some are duplicates.
(figure: B; (0,0) added to A; (0,1) added to A; A after the union)
Note: If the set B has a set pixel to the right of the origin, then a
dilation grows a layer of pixels on the right of the object.
To grow in all directions, we can use B having one pixel on
every side of the origin; i.e. a 3X3 square with the origin at the
center.
Ex2: Suppose A1={(1,1),(1,2),(2,2),(3,2),(3,3),(4,4)} and
B1={(0,-1),(0,1)}. The translation of A1 by (0,-1) yields
(A1)(0,-1)={(1,0),(1,1),(2,1),(3,1),(3,2),(4,3)} and
(A1)(0,1)={(1,2),(1,3),(2,3),(3,3),(3,4),(4,5)} as follows.
(figure: B1, which does not include the origin; A1 before and after the dilation)
Note: (1) The original object pixels belonging to A1 are not
necessarily set in the result, (4,4) for example, due to
the effect of the origin not being a part of B1.
(2) In fact, A ⊕ B = ∪_{b∈B} (A)_b = ∪_{a∈A} (B)_a since dilation is
commutative. This gives a clue concerning a possible
implementation for the dilation operator. When the origin of B
aligns with a black pixel in the image, all of the image pixels that
correspond to black pixels in B are marked, and will later be
changed to black. After the entire image has been swept by B,
the dilation is complete. Normally the dilation is not computed in
place; that is, where the result is copied over the original image.
A third image, initially all white, is used to store the dilation while
it is being computed.
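A minimal sketch of this sweeping implementation for binary images (1 = black object pixel, 0 = white background). The function name bin_dilate and the offset-list representation of B are mine; the Image Processing Toolbox function imdilate serves the same purpose:
function out = bin_dilate(A, B)
% Binary dilation of image A by structuring element B,
% where B is a k-by-2 list of (row,col) offsets from the origin.
out = false(size(A));                        % a third image, initially all "white"
[r, c] = find(A);                            % the black pixels of A
for i = 1:numel(r)
    for j = 1:size(B,1)
        rr = r(i) + B(j,1);  cc = c(i) + B(j,2);
        if rr >= 1 && rr <= size(A,1) && cc >= 1 && cc <= size(A,2)
            out(rr, cc) = true;              % mark the pixel; it becomes black
        end
    end
end
end
For Ex1's structuring element the call would be bin_dilate(A, [0 0; 0 1]).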
(figure: dilation computed step by step -- 1st, 2nd and 3rd translations and the final result -- together with erosion examples)
Binary Erosion
• If dilation can be said to add pixels to an object, or to make it
bigger, then erosion will make an image smaller. Erosion can be
implemented by marking all black pixels having at least one
white neighbor, and then setting to white all of the marked
pixels. Only those that initially place the origin of B at one of the
members of A need to be considered. It is defined as
A ⊖ B = {c | (B)_c ⊆ A}
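A matching sketch of erosion built directly from this definition (the function name bin_erode and the offset-list form of B are mine; imerode is the toolbox equivalent):
function out = bin_erode(A, B)
% Binary erosion of A by structuring element B (a k-by-2 list of offsets):
% a pixel c is set iff the translate (B)_c lies entirely inside A.
out = false(size(A));
for r = 1:size(A,1)
    for c = 1:size(A,2)
        ok = true;
        for j = 1:size(B,1)
            rr = r + B(j,1);  cc = c + B(j,2);
            if rr < 1 || rr > size(A,1) || cc < 1 || cc > size(A,2) || ~A(rr,cc)
                ok = false; break;
            end
        end
        out(r,c) = ok;
    end
end
end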
Ex3: B={(0,0),(1,0)}, A={(3,3),(3,4),(4,3),(4,4)}
Four such
translations: B(3,3)={(3,3),(4,3)}
B(3,4)={(3,4),(4,4)}
B(4,3)={(4,3),(5,3)}
B(4,4)={(4,4),(5,4)}
Ex4: B2={(1,0)}, i.e. (0,0) ∉ B2. The ones that result in a match are:
B(2,3)={(3,3)} B(2,4)={(3,4)} B(3,3)={(4,3)} B(3,4)={(4,4)}
Note: {(2,3),(2,4),(3,3),(4,4)} is not a subset of A, meaning the
eroded image is not always a subset of the original.
․Erosion and dilation are not inverse operations. Yet, erosion and
dilation are duals in the following sense: (A ⊖ B)ᶜ = Aᶜ ⊕ B̂
․There is an issue of a “don’t care” state in B, which was not a concern
for dilation. When using a strictly binary structuring element
to perform an erosion, the member black pixels must
correspond to black pixels in the image in order to set the pixel
in the result, but the same is not true for a white pixel in B. We
don’t care what the corresponding pixel in the image might be
when the structuring element pixel is white.
Opening and Closing
․The application of an erosion immediately followed by a dilation
using the same B is referred to as an opening operation,
describing how the operation tends to “open” small gaps or spaces
between touching objects in an image. After an opening using
simple, the objects are better isolated, and might now be
counted or classified.
․Another use of opening: the removal of noise. Thresholding a noisy
grey-level image results in isolated pixels at random
locations. The erosion step in an opening will remove isolated
pixels as well as boundaries of objects, and the dilation step will
restore most of the boundary pixels without restoring the noise.
This process seems to be successful at removing spurious
black pixels, but does not remove the white ones.
․A closing is similar to an opening except that the dilation is
performed first, followed by an erosion using the same B, and
will fill the gaps or “close” them. It can remove much of the
white pixel noise, giving a fairly clean image. (A more complete
method for fixing the gaps may use 4 or 5 structuring elements,
and 2 or 3 other techniques outside of morphology.)
․Closing can also be used for smoothing the outline of objects in
an image, i.e. to fill the jagged appearances due to digitization
in order to determine how rough the outline is. However, more
than one B may be needed since the simple structuring element
is only useful for removing or smoothing single pixel irregularities. N dilation/erosion (named depth N) applications should
result in the smoothing of irregularities of N pixels in size.
․A fast erosion method is based on the distance map of each
object, where the numerical value of each pixel is replaced by
new value representing the distance of that pixel from the
nearest background pixel. Pixels on a boundary would have a
value of 1, being that they are one pixel width from a background pixel; a value of 2 meaning two widths from the background, and so on. The result has the appearance of a contour
map, where the contours represent the distance from the
boundary.
․The distance map contains enough information to perform an
erosion by any number of pixels in just one pass through the
image, and a simple thresholding operation will give any desired
erosion.
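A two-line sketch of this idea using the toolbox function bwdist (A is a logical object image and n the desired erosion depth; both names are mine):
D = bwdist(~A);        % distance of every object pixel to the nearest background pixel
eroded_by_n = D > n;   % a single thresholding gives the erosion by any number n of pixels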
․There is another way to encode all possible openings as one
grey-level image, and all possible closings can be computed at
the same time. First, all pixels in the distance map that do NOT
have at least one neighbor nearer to the background and one
neighbor more distant are located and marked as nodal pixels.
If the distance map is thought of as a three-dimensional surface
where the distance from the background is represented as
height, then every pixel can be thought of as being the peak of a
pyramid having a standardized slope. Those peaks that are not
included in any other pyramid are the nodal pixels.
․One way to locate nodal pixels is to scan the distance map,
looking at all object pixels; find the MIN and MAX value of all
neighbors of the target pixel, and compute MAX-MIN. If the
value is less than the MAX possible, then the pixel is nodal.
The “Hit and Miss” Transform
․It is a morphological operator designed to locate simple shapes
within an image. Though the erosion of A by S also includes
places where the background pixels in that region do not match
those of S, these locations would not normally be thought of as
a match.
․Matching the foreground pixels in S against those in A is a “hit,”
and is accomplished with an erosion A ⊖ S. The background
pixels in A are those found in Aᶜ, and while we could use Sᶜ as
the background for S, a more flexible approach is to specify
the background pixels explicitly in a new structuring element T.
A “hit” in the background is called a “miss,” and is found by
Aᶜ ⊖ T.
․What we need is an operation that matches both the foreground
and the background pixels of S in A; these are the pixels:
A ⊗ (S, T) = (A ⊖ S) ∩ (Aᶜ ⊖ T)
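A minimal sketch of the hit-and-miss transform built from the erosion sketch given earlier (bin_erode as defined above; S and T are k-by-2 offset lists for the foreground and background elements):
function out = hit_and_miss(A, S, T)
% A (x) (S,T) = (A eroded by S) intersected with (A^c eroded by T)
out = bin_erode(A, S) & bin_erode(~A, T);
end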
Ex5: To detect upper right corners. Figure (a) below shows
an image interpreted as being two overlapping squares.
(figure panels: (a) the image; (b) foreground structuring element; (c) erosion of (a) by (b) -- the ‘hit’;
(d) complement of (a); (e) background S, showing 3 pixels of the corner; (f) erosion of (d) by (e) -- the
‘miss’; (g) intersection of (c) and (f) -- the result)
Identifying Region Boundaries
․The pixels on the boundary of an object are those that have at
least one neighbor that belongs to the background. It can’t be
known in advance which neighbor to look for! A single structuring element can’t be constructed to detect the boundary. This is
in spite of the fact that an erosion removes exactly these pixels.
․The boundary can be stripped away using an erosion and the
eroded image can then be subtracted from the original, written
as: Boundary A  ( Asimple)
Ex6: Figure (h) results from the previous figure (a) after an erosion, and (i) shows (a)-(h): the boundary.
(figure panels: (a) of Ex5; (h) the erosion; (i) the boundary)
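A short sketch of this boundary extraction with the 3x3 element simple, reusing bin_erode from above (the set difference is done with a logical AND-NOT):
[dr, dc] = meshgrid(-1:1, -1:1);
simple   = [dr(:) dc(:)];               % all nine offsets around the origin
boundary = A & ~bin_erode(A, simple);   % Boundary = A - (A eroded by simple)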
Conditional Dilation
․There are occasions when it is desirable to dilate an object in
such a way that certain pixels remain immune. The forbidden
area of the image is specified as a second image in which the
forbidden pixels are black. The notation is A ⊕ (S, A′) and it is
computed in an iterative fashion: A_i = (A_{i−1} ⊕ S) ∩ A′
A′: the conditioning image of allowed pixels (the complement of the forbidden area); A_i: the desired dilation
․One place where this is useful is in segmenting an image.
Ihigh: the result of a very high threshold applied to the image -- a great many
object pixels will be missed.
Ilow: the result of a very low threshold applied to the original image -- some
background will be marked.
R: a segmented version of the original -- in some cases a superior result to
using any single threshold, obtained by:
R = I_high ⊕ (simple, I_low)
․Another application of conditional dilation is that of filling a
region with pixels, which is the inverse operation of boundary
extraction. It is to dilate until the inside region is all black, and
then combine with the boundary image to form the final result.
Fill = P ⊕ (S_cross, Aᶜ)
where P is an image containing only the seed pixel, known to be
inside the region to be filled, A is the boundary image and Scross
is the cross-shaped structuring element, (j) for example.
Ex7: (i) boundary, (j) structuring element, (k) seed pixel, iteration 0 of the process, (l) iteration 1, (m) iteration 2,
(n) iteration 3, (o) iteration 4, (p) iteration 5 and completed, (q) union of (i) with (p)-- the result
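A minimal sketch of region filling by conditional dilation, reusing bin_dilate from above (A is the boundary image and seed the (row,col) of a pixel known to be inside; both names are mine):
Scross = [0 0; -1 0; 1 0; 0 -1; 0 1];              % cross-shaped structuring element
P = false(size(A));  P(seed(1), seed(2)) = true;   % image containing only the seed pixel
prev = false(size(A));
while ~isequal(P, prev)                            % iterate until nothing changes
    prev = P;
    P = bin_dilate(P, Scross) & ~A;                % A_i = (A_{i-1} dilated by S) restricted to A^c
end
Fill = P | A;                                      % union with the boundary gives the result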
Counting Regions
․It is possible to count the number of regions in a binary image
using morphological operators, first discussed by Levialdi using
6 different structuring elements -- 4 for erosion, named L1~L4,
and 2 for counting isolated “1” pixels (the # operator). The initial
count of regions is the number of isolated pixels in the input
image, and the image of iteration 0 is A:
count0 = #A, A0 = A, countn = #An
The image of the next iteration is the union of the four erosions
of the current image: A_{n+1} = (A_n ⊖ L1) ∪ (A_n ⊖ L2) ∪ (A_n ⊖ L3) ∪ (A_n ⊖ L4)
The iteration stops when An becomes empty (all 0 pixels), and
the overall number of regions is the sum
of all of the values counti.
Ex8: Counting 8-connected regions, shown in panel (e); panels (a)~(d) show L1~L4.
Grey-Level Morphology
․A pixel can now have any integer value, so the nice picture of
an image being a set disappears! The figure shows how the
dilated grey-level line (a) might appear as (b), computed as
follows, A being the grey-level image to be dilated:
(A ⊕ S)[i,j] = max{ A[i−r, j−c] + S[r,c] : [i−r, j−c] ∈ A, [r,c] ∈ S }
(a) Background is 0, and line
pixels have the value 20.
(b) Grey line after a dilation
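A minimal sketch of grey-level dilation by this max-plus rule (the function name grey_dilate and the offset/weight representation of S are mine):
function out = grey_dilate(A, offs, w)
% Grey-level dilation: out[i,j] = max over [r,c] in S of A[i-r, j-c] + S[r,c].
% offs: k-by-2 list of (r,c) offsets; w: the k corresponding weights S[r,c].
out = -inf(size(A));
for i = 1:size(A,1)
    for j = 1:size(A,2)
        for s = 1:size(offs,1)
            r = i - offs(s,1);  c = j - offs(s,2);
            if r >= 1 && r <= size(A,1) && c >= 1 && c <= size(A,2)
                out(i,j) = max(out(i,j), A(r,c) + w(s));
            end
        end
    end
end
end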