Report

1. Introduction
The main purpose of video segmentation is to enable content-based representation by
extracting objects of interest from a series of consecutive video frames. This
technique can be used in the MPEG-4 standard and is also key to many robotic vision
applications. Most vision-based autonomous vehicles acquire information about their
surroundings by analyzing video. In particular, segmentation is required for high-level
image understanding and scene interpretation, such as spotting and tracking special
events in surveillance video. For instance, pedestrian and highway traffic can be
regulated using density estimates obtained by segmenting people and vehicles. With
object segmentation, speeding or suspicious vehicles, road obstacles, and unusual
activities can be detected, and forbidden zones, parking lots, and elevators can be
monitored automatically. Gesture recognition and visual biometric extraction can also
be performed for user interfaces.
We developed a novel algorithm for automatic and reliable segmentation of moving
objects in video sequences. We implement an efficient video object segmentation
algorithm in MATLAB and survey a hardware implementation based on systolic arrays
and a pipelined architecture.
2. Algorithm
Our algorithm is based on inter-frame change detection. Change detection from
inter-frame differences is one of the most feasible approaches because it enables
automatic detection of objects and can handle non-rigid motion. Its drawback,
however, is noise caused by decision errors, which leads to missing spatial edge
information, especially in some crucial image areas, and produces small false regions.
To overcome this drawback, we use morphological operations to remove small hole
regions and smooth the object boundaries.
2.1 Extraction of Moving Edge (ME) Map
Our algorithm starts with edge detection. Although edge information plays a key role
in capturing physical changes of the corresponding surfaces in a real scene, simply
differencing the edge maps to extract the shape of moving objects in a video sequence
suffers from a great deal of noise, even for a stationary background. This is because
the random noise created in one frame differs from the noise created in the successive
frame. The difference of edges is defined as
\nabla(I_n) - \nabla(I_{n-1}) = \nabla(G * I_n) - \nabla(G * I_{n-1})    (1)
where I_n and I_{n-1} are the current and previous frames, respectively, and \nabla(I)
is the edge map obtained by the Canny edge detector [2], which performs a gradient
operation \nabla on the Gaussian-convolved image G * I. Extracting edges from the
difference of successive frames instead yields a noise-robust difference edge map
DE_n, because the Gaussian convolution included in the Canny operator suppresses the
noise in the luminance difference:
DE_n = \nabla(I_n - I_{n-1}) = \nabla(G * (I_n - I_{n-1}))    (2)
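For illustration, a minimal sketch of Eq. (2), assuming the Image Processing Toolbox
is available; In and In1 are hypothetical variables holding two consecutive grayscale
frames as double matrices (our MATLAB code in Section 5 uses a hand-written detector
instead):

% Sketch of Eq. (2): difference edge map via the built-in Canny detector.
% Assumes the Image Processing Toolbox; In and In1 are consecutive
% grayscale frames stored as double matrices.
D  = abs(double(In) - double(In1));  % luminance difference |I_n - I_{n-1}|
DE = edge(D, 'canny');               % Canny edge map of the difference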
After calculating the difference edge map with the Canny detector, we extract the
moving edge map ME_n of the current frame I_n based on the difference edge map DE_n
of the difference I_n - I_{n-1}, the current frame's edge map E_n = \nabla(I_n), and
the background edge map E_b. For a still camera, E_b contains the absolute background
edges; it can be extracted from the first frame or by counting the number of edge
occurrences at each pixel over the first several frames. We define the current edge
map E_n = {e_1, ..., e_k} as the set of all edge points detected by the Canny
detector, and the moving edge map ME_n = {m_1, ..., m_l} as a set of l moving edge
points, where l <= k and ME_n \subseteq E_n. ME_n is generated by selecting edge
pixels of E_n within a small distance T_change of DE_n, that is

ME_n^{change} = \{ e \in E_n \mid \min_{x \in DE_n} \| e - x \| \le T_{change} \}    (3)
Some of these moving edges may be scattered noise that needs to be removed. In
addition, the previous frame's moving edges can be referenced to detect temporarily
still moving edges, that is

ME_n^{still} = \{ e \in E_n \mid e \notin E_b, \ \min_{x \in ME_{n-1}} \| e - x \| \le T_{still} \}    (4)
The final moving edge map for the current frame I_n is obtained by combining the two maps:
ME_n = ME_n^{change} \cup ME_n^{still}    (5)
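A minimal sketch of Eqs. (3)-(5), assuming binary edge maps En, DE, Eb, and MEprev of
equal size and the Image Processing Toolbox function bwdist; our code in Section 5
approximates the distance test with a 3-by-3 neighborhood instead:

% Sketch of Eqs. (3)-(5): changed edges, temporarily still edges, and
% their union. En, DE, Eb, MEprev are logical matrices of the same size.
Tchange = 1;  Tstill = 1;                           % distance thresholds (pixels)
MEchange = En & (bwdist(DE) <= Tchange);            % Eq. (3)
MEstill  = En & ~Eb & (bwdist(MEprev) <= Tstill);   % Eq. (4)
ME = MEchange | MEstill;                            % Eq. (5): final moving edge map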
2.2 Extraction of object
Once the moving edge map ME_n, shown in Figure 1, has been detected from DE_n, the
object is ready to be extracted. The horizontal object candidates are declared to be
the region between the first and last moving edge in each row; the vertical object
candidates are obtained in the same way column by column. After finding both
horizontal and vertical candidates, we combine the two candidate maps as shown in
Figure 2, and then apply morphological operations to remove small false regions and
smooth the object boundary. Here we use erosion and dilation operations, as shown in
Figure 3; a compact sketch of the candidate extraction is given below.
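A compact sketch of this candidate extraction (ME is the binary moving edge map; the
full loop version appears in Section 5.1):

% Sketch: fill between the first and last moving edge of every row
% (horizontal candidates) and of every column (vertical candidates),
% then combine the two maps.
[h, w] = size(ME);
H = false(h, w);  V = false(h, w);
for i = 1:h
    c = find(ME(i, :));                       % moving-edge columns in row i
    if ~isempty(c), H(i, c(1):c(end)) = true; end
end
for j = 1:w
    r = find(ME(:, j));                       % moving-edge rows in column j
    if ~isempty(r), V(r(1):r(end), j) = true; end
end
obj = H & V;                                  % combined object candidate region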
In the following sections, we describe the details of some of the arithmetic used in
this project.
Figure 1. Edge map of the current frame
Figure 2. Combination of horizontal and vertical candidates
Figure 3. After applying morphology
Figure 4. Block diagram of the segmentation
Figure 5. Our result
2.3 Gradient operator
The gradient of an image f(x,y) at location (x,y) is the vector
\nabla F = \begin{bmatrix} f_x \\ f_y \end{bmatrix} = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \end{bmatrix}    (6)
It is well known from vector analysis that the gradient vector points in the direction
of the maximum rate of change of f at (x,y). In edge detection, an important quantity
is the magnitude of this vector, generally referred to simply as the gradient and
denoted \nabla f, where
f  mag (f )  [ f x2  f y2 ]1/ 2
(7)
This quantity equals the maximum rate of increase of f(x,y) per unit distance in the
direction of \nabla f. Common practice is to approximate the gradient with absolute
values:
\nabla f \approx |f_x| + |f_y|    (8)
which is much simpler to implement, particularly with dedicated hardware.
Note from Eqs. (6) and (7) that computing the gradient of an image requires the
partial derivatives f_x and f_y at every pixel location. There are several ways to
approximate these first-order derivatives in digital form [3]. Among these methods,
we choose the Sobel operator for our project. The Sobel responses are
f x  ( z7  2 z8  z9 )  ( z1  2 z 2  z3 )
(9)
and
7
f y  ( z3  2 z 6  z9 )  ( z1  2 z 4  z 7 )
(10)
where z_i denotes the pixels in a 3x3 window:

    z_1  z_2  z_3
    z_4  z_5  z_6
    z_7  z_8  z_9

so the two corresponding 3x3 Sobel masks are

    -1  -2  -1          -1   0   1
     0   0   0          -2   0   2
     1   2   1          -1   0   1

for f_x and f_y, respectively.
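For reference, a minimal sketch that applies these two masks with filter2 and forms
the absolute-value gradient of Eq. (8); f is a grayscale image stored as a double
matrix:

% Sketch: Sobel responses of Eqs. (9)-(10) and the gradient magnitude
% approximation of Eq. (8).
Sx = [-1 -2 -1; 0 0 0; 1 2 1];     % mask for f_x, Eq. (9)
Sy = [-1 0 1; -2 0 2; -1 0 1];     % mask for f_y, Eq. (10)
fx = filter2(Sx, f);               % correlation with the f_x mask
fy = filter2(Sy, f);               % correlation with the f_y mask
gradMag = abs(fx) + abs(fy);       % |f_x| + |f_y|, Eq. (8)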
2.4 Morphology
In our project, we use two operations: dilation and erosion.
Dilation
With A and B as sets in Z^2 and \emptyset denoting the empty set, the dilation of A
by B, denoted A \oplus B, is defined as

A \oplus B = \{ x \mid (\hat{B})_x \cap A \ne \emptyset \}    (11)

where \hat{B} denotes the reflection of B and (\hat{B})_x the translation of \hat{B} by x.
Thus the dilation process consists of obtaining the reflection of B about its origin
and then shifting this reflection by x. The dilation of A by B is then the set of all
displacements x such that \hat{B} and A overlap by at least one nonzero element. Based
on this interpretation, Eq. (11) may be rewritten as

A \oplus B = \{ x \mid [(\hat{B})_x \cap A] \subseteq A \}    (12)
Erosion
For A and B as sets in Z^2, the erosion of A by B, denoted A \ominus B, is defined as

A \ominus B = \{ x \mid (B)_x \subseteq A \}    (13)
which, in words, says that the erosion of A by B is the set of all points x such that
B, translated by x, is contained in A.
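A minimal sketch of binary dilation and erosion with a k-by-k square structuring
element, using only base MATLAB; the functions in Sections 5.2.3 and 5.2.4 implement
slightly relaxed variants of the same idea:

% Sketch: dilation and erosion of a logical map A with a k-by-k square
% structuring element, implemented with a counting convolution.
k   = 3;                                   % structuring element size
cnt = conv2(double(A), ones(k), 'same');   % number of set pixels of A under the mask
Adil = cnt > 0;                            % dilation: mask overlaps A somewhere
Aero = cnt == k^2;                         % erosion: mask fits entirely inside A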
3. Pipelined architectures of the first phase segmentation
3.1 Feature Extraction
A pixel (m, n) is chosen as the center of a small window W, and two features, the
local mean f_{t1}(m, n) and the local variance f_{t2}(m, n), describe the textural
appearance of the area surrounding the center:

f_{t1}(m, n) = \frac{1}{N_W} \sum_{(k,l) \in W} f_y(m+k, n+l)    (14)

f_{t2}(m, n) = \frac{1}{N_W} \sum_{(k,l) \in W} ( f_y(m+k, n+l) - f_{t1}(m, n) )^2    (15)

where N_W is the number of pixels in the W_s x W_s window W.
Systolic architectures are used to extract the mean and variance with a 5-by-5
sliding window. Let N_c be the number of columns of the reference frame. The
luminance component of the reference frame is scanned and serially fed into a FIFO of
length 1 + 4N_c, which introduces a latency of 1 + 4N_c clock cycles, before entering
the systolic array. As the figure shows, the first systolic array calculates the
local mean and the second systolic array calculates the local variance every clock
cycle, with a latency of five clock cycles. Block A accumulates the luminance
components, and block M generates the mean by dividing the accumulated result by N_W.
The luminance input and the local mean are then the inputs of block V, which
calculates the local variance. The first systolic array uses N_W A blocks, N_W B
blocks, and one M block; the second uses N_W V blocks, N_W B blocks, and one M block.
After feature extraction, the features are weighted and fed into the feature-labeling
array.
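For comparison, a software sketch of Eqs. (14)-(15) over the same 5-by-5 window (fy
is the luminance frame as a double matrix); the systolic arrays above produce one
such result per clock cycle:

% Sketch of Eqs. (14)-(15): local mean and variance over a 5x5 window,
% computed for every pixel with two convolutions (border pixels are
% only approximate because the window is truncated there).
Ws  = 5;
NW  = Ws^2;                                % number of pixels in the window
box = ones(Ws) / NW;                       % averaging mask
ft1 = conv2(fy, box, 'same');              % local mean, Eq. (14)
ft2 = conv2(fy.^2, box, 'same') - ft1.^2;  % local variance, Eq. (15)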
3.2 Feature labeling
In order to generate the initial segmentation of a frame, each weighted feature
vector is assigned a label. The segmentation label NL(m, n) of pixel (m, n) is the
index of the code word with the minimum Euclidean distance to the input feature
vector \vec{f}(m, n). To avoid the squaring needed for the Euclidean distance, the
expression can be transformed as follows:

NL(m, n) = \arg\min_j \{ \| \vec{f}(m, n) - c_j \|^2 \} = \arg\max_j \{ \vec{f}(m, n) \cdot c_j - 0.5 \| c_j \|^2 \}    (16)

where c_j is a fixed code word vector and 0.5 \| c_j \|^2 can be calculated in advance.
The systolic architecture for feature labeling is shown in the figure below. An input
weighted feature vector \vec{f}(m, n) enters the array and is shifted right to compute
the inner product with every code word. The L blocks accumulate the products of the
input and the code words; 0.5 \| c_j \|^2 is subtracted from the results in the S
blocks, which are then shifted into the W blocks. The W blocks search for the best
index j by comparing the current S-block output (si1) with a temporary maximum value
(si2) and its index (lji) shifted from the previous stage. If the current result is
larger than the previous one, the W block replaces the maximum value and its index. W
therefore ends up with the index of the code word that best matches the input feature
vector, and this index NL(m, n) is stored in a buffer.
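A sketch of the labeling computation of Eq. (16) for one pixel, assuming a
hypothetical codebook matrix C (one code word per row) and a weighted feature row
vector fvec:

% Sketch of Eq. (16): label of one pixel as the index of the nearest
% code word, evaluated in the argmax form so no squared distance is
% computed per pixel. C is Ncb-by-d, fvec is 1-by-d.
bias  = 0.5 * sum(C.^2, 2);     % 0.5*||c_j||^2, precomputed once per codebook
score = C * fvec.' - bias;      % f(m,n).c_j - 0.5*||c_j||^2 for every j
[~, NL] = max(score);           % NL(m,n): index of the best-matching code word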
3.3 Edge fusion
The edge fusion algorithm integrates the edge-linked map into the initially labeled
regions and iterates a region-merging process: whenever the luminance distance
between a region and one of its neighbors is less than a preset threshold, the
current region is merged with that neighbor. The merging stops once no such
neighboring pair remains or the merge counter reaches the maximum allowable number of
iterations.
4. Conclusion
We successfully extracted the moving objects from a video sequence and, in addition,
surveyed a hardware implementation for video object segmentation. The VLSI
architecture proposed by Jinsang Kim and Tom Chen [3] is a highly pipelined
systolic-array microarchitecture: a frame goes through feature extraction,
normalization, weighting, labeling, and edge fusion in 4 + Nf + Ncb + N clock cycles.
The architecture is able to perform video segmentation on QCIF images and can also be
used as acceleration hardware for segmentation.
5. Matlab code
5.1 Main function:
clear;
w=176; % frame width (QCIF)
h=144; % frame height (QCIF)
f=300; % number of frames to process
msklen=7; % size of the square morphology mask
fid=fopen('akiyo.yuv','r'); % input 4:2:0 YUV sequence
fid2=fopen('mdtest.yuv','w'); % output sequence with the segmented object
% read first 5 frames
for k=1:5
lum=fread(fid,w*h)'; % luminance (Y) plane
chr1=fread(fid,w*h/4)'; % chrominance (U) plane
chr2=fread(fid,w*h/4)'; % chrominance (V) plane
for j=1:h
lumMatrix(j,:,k)=lum(1+(j-1)*w:j*w);
end
% edge detection
En1(:,:,k)=edged(lumMatrix(:,:,k));
end
figure(1)
imshow(En1(:,:,k),[])
% background edge map Eb: pixels that are edges in all of the first 5 frames
Eb=zeros(h,w);
for i=1:h
for j=1:w
a=0;
for k=1:5
a=a+En1(i,j,k);
end
if a==5
Eb(i,j)=1;
end
end
end
figure(2)
imshow(Eb,[]);
% main loop: compute the moving edge map ME for every frame
for k=2:f
if k<6
% frames 2..5 are already in memory; reuse their stored edge maps
% difference between successive frames
DI=abs(lumMatrix(:,:,k)-lumMatrix(:,:,k-1));
En=En1(:,:,k);
lumMatrix(:,:,2)=lumMatrix(:,:,k);
else
if k==6
lumMatrix(:,:,1)=lumMatrix(:,:,5);
end
lum=fread(fid,w*h)';
chr1=fread(fid,w*h/4)';
chr2=fread(fid,w*h/4)';
for j=1:h
lumMatrix(j,:,2)=lum(1+(j-1)*w:j*w);
end
DI=abs(lumMatrix(:,:,2)-lumMatrix(:,:,1));
En=edged(lumMatrix(:,:,2));
lumMatrix(:,:,1)=lumMatrix(:,:,2);
end
DE=edged(DI);
% keep edges of En that have a nearby edge in DE (changed edges)
% (the leading space in ' and' keeps the string the same length as 'nand'
% so that the == comparison inside edgcomp works)
MEch=edgcomp(En,DE,' and');
% keep edges of En that do not coincide with background edges Eb
MEstil=edgcomp(En,Eb,'nand');
% combine the two moving-edge maps
ME=MEch+MEstil;
for i=1:h
for j=1:w
if ME(i,j)>0
ME(i,j)=1;
end
end
end
ME1=zeros(h,w); % horizontal object candidates
ME2=zeros(h,w); % vertical object candidates
ix=zeros(h,2); % first and last moving-edge column in each row
for i=1:h
j=1;
while ME(i,j)==0&j<w
j=j+1;
end
ix(i,1)=j;
j=w;
while ME(i,j)==0&j>1
j=j-1;
end
ix(i,2)=j;
if ix(i,1)<=ix(i,2)
ME1(i,ix(i,1):ix(i,2))=1;
end
end
iy=zeros(w,2); % first and last moving-edge row in each column
for j=1:w
i=1;
while ME(i,j)==0&i<h
i=i+1;
end
iy(j,1)=i;
i=h;
while ME(i,j)==0&i>1
i=i-1;
end
iy(j,2)=i;
if iy(j,1)<=iy(j,2)
ME2(iy(j,1):iy(j,2),j)=1;
end
end
% final object mask: intersection of horizontal and vertical candidates
for i=1:h
for j=1:w
if ME1(i,j)==1&ME2(i,j)==1
ME(i,j)=1;
else
ME(i,j)=0;
end
end
end
figure(9)
imshow(ME,[])
% morphology: opening (erosion, dilation) followed by closing (dilation, erosion)
ME=eros(ME,msklen);
ME=dil(ME,msklen);
ME=dil(ME,msklen);
ME=eros(ME,msklen);
obj=ME.*lumMatrix(:,:,2); % mask the luminance with the object map
for i=1:h
lum(1+(i-1)*w:i*w)=obj(i,:);
end
MEchr=zeros(h/2,w/2); % object mask downsampled to chroma resolution (4:2:0)
for i=1:h/2
for j=1:w/2
MEchr(i,j)=ME(i*2,j*2);
end
end
MEchrv=zeros(1,h*w/4);
for i=1:h/2
MEchrv(1+(i-1)*w/2:i*w/2)=MEchr(i,:);
end
chr1=chr1.*MEchrv;
chr2=chr2.*MEchrv;
fwrite(fid2,lum);
fwrite(fid2,chr1);
fwrite(fid2,chr2);
lumMatrix(:,:,1)=lumMatrix(:,:,2);
end
fclose all
5.2 Sub-functions
5.2.1 Edge detection function
% edge detection: Gaussian smoothing, gradient magnitude, mean-value threshold
function En1=edged(a)
[h,w]=size(a);
l=5; % Gaussian kernel size
s2=1; % Gaussian variance
x=-(l-1)/2:(l-1)/2;
[X,Y]=meshgrid(x);
G=exp(-(X.^2+Y.^2)/2/s2); % Gaussian kernel
a1=conv2(G,a); % smoothed image (full-size convolution)
En=grad(a1); % gradient magnitude (Sobel operator, Section 2.3)
Enavg=sum(sum(En))/(h+l-1)/(w+l-1); % mean gradient used as threshold
for i=1:h+l-1
for j=1:w+l-1
if En(i,j)>Enavg
En(i,j)=1;
else
En(i,j)=0;
end
end
end
En1=En((l-1)/2+1:(l-1)/2+h,(l-1)/2+1:(l-1)/2+w);
5.2.2 Edge comparison function
% compare two edge maps within a 3x3 neighborhood
% a is the edge map to filter, b is the reference edge map
% str = ' and': keep pixels of a that have a b-edge in their 3x3 neighborhood
% str = 'nand': keep pixels of a that have no b-edge in their 3x3 neighborhood
% (the leading space in ' and' keeps both strings the same length for ==)
function y=edgcomp(a,b,str)
[h,w]=size(a);
y=a;
for i=2:h-1
for j=2:w-1
if a(i,j)>0
tmp=b(i-1:i+1,j-1:j+1);
if sum(sum(tmp))==0
if str==' and'
y(i,j)=0;
end
else
if str=='nand'
y(i,j)=0;
end
end
end
end
end
if a(1,1)>0
if (b(1,1)+b(1,2)+b(2,2)+b(2,1))==0
if str==' and'
y(1,1)=0;
end
else
if str=='nand'
y(1,1)=0;
end
end
end
if a(h,1)>0
if (b(h,1)+b(h,2)+b(h-1,1)+b(h-1,2))==0
if str==' and'
y(h,1)=0;
end
else
if str=='nand'
y(h,1)=0;
end
end
end
if a(1,w)>0
if (b(1,w)+b(1,w-1)+b(2,w)+b(2,w-1))==0
if str==' and'
y(1,w)=0;
end
else
if str=='nand'
y(1,w)=0;
end
end
end
if a(h,w)>0
if (b(h,w)+b(h,w-1)+b(h-1,w-1)+b(h-1,w))==0
if str==' and'
y(h,w)=0;
end
else
if str=='nand'
y(h,w)=0;
end
end
end
for i=2:h-1
if a(i,1)>0
if sum(sum(b(i-1:i+1,1:2)))==0
if str==' and'
y(i,1)=0;
end
else
if str=='nand'
y(i,1)=0;
end
end
end
if a(i,w)>0
if sum(sum(b(i-1:i+1,w-1:w)))==0
if str==' and'
y(i,w)=0;
end
else
if str=='nand'
y(i,w)=0;
end
end
end
end
for j=2:w-1
if a(1,j)>0
if sum(sum(b(1:2,j-1:j+1)))==0
if str==' and'
y(1,j)=0;
end
else
if str=='nand'
y(1,j)=0;
end
end
end
if a(h,j)>0
if sum(sum(b(h-1:h,j-1:j+1)))==0
if str==' and'
y(h,j)=0;
end
else
if str=='nand'
y(h,j)=0;
end
end
end
end
5.2.3 Erosion function
% erosion of a binary map x with a msklen-by-msklen square mask
% (relaxed: a pixel survives if at most 4 pixels in its window are zero)
function y=eros(x,msklen)
[h,w]=size(x);
eros=zeros(h+msklen-1,w+msklen-1);
erostmp=eros;
msk2=(msklen-1)/2;
erostmp(msk2+1:msk2+h,msk2+1:msk2+w)=x;
tmp=zeros(msklen,msklen);
for i=1+msk2:msk2+h
for j=1+msk2:msk2+w
tmp=erostmp(i-msk2:i+msk2,j-msk2:j+msk2);
if sum(sum(tmp))<msklen^2-4
eros(i,j)=0;
else
eros(i,j)=1;
end
end
end
y=eros(msk2+1:msk2+h,msk2+1:msk2+w);
5.2.4 Dilation function
% dilation of a binary map x with a msklen-by-msklen square mask
% (relaxed: a pixel is set if more than one pixel in its window is set)
function y=dil(x,msklen)
[h,w]=size(x);
dil=zeros(h+msklen-1,w+msklen-1);
diltmp=dil;
msk2=(msklen-1)/2;
diltmp(msk2+1:msk2+h,msk2+1:msk2+w)=x;
tmp=zeros(msklen,msklen);
for i=1+msk2:msk2+h
for j=1+msk2:msk2+w
tmp=diltmp(i-msk2:i+msk2,j-msk2:j+msk2);
if sum(sum(tmp))>1
dil(i,j)=1;
else
dil(i,j)=0;
end
end
end
y=dil(msk2+1:msk2+h,msk2+1:msk2+w);
References
[1] Changick Kim and Jenq-Neng Hwang, "Fast and automatic video object segmentation
and tracking for content-based applications," IEEE Trans. Circuits and Systems for
Video Technology, vol. 12, no. 2, pp. 122-129, Feb. 2002.
[2] J. F. Canny, "A computational approach to edge detection," IEEE Trans. Pattern
Anal. Machine Intell., vol. PAMI-8, pp. 679-698, Nov. 1986.
[3] Jinsang Kim and Tom Chen, "Real-time video objects segmentation using a highly
pipelined microarchitecture," Proceedings of the IASTED International Conference on
Visualization, Imaging, and Image Processing, Marbella, Spain, Sep. 3-5, 2001,
pp. 483-488.
[4] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing.