This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCE.2020.2981829, IEEE
Transactions on Consumer Electronics
HSAJAYA: An Improved Optimization Scheme for
Consumer Surveillance Video Synopsis Generation
Subhankar Ghatak, Suvendu Rup, Banshidhar Majhi, and M.N.S. Swamy, Life Fellow, IEEE
Abstract—Video surveillance is an active area of research and provides promising security measures for consumer applications. To ease consumer surveillance investigations, Video Synopsis (VS) serves as a powerful tool to review hours of video in a much shorter span of time by displaying multiple objects concurrently. The optimization module of the VS framework is considered a key module, yet to date only traditional optimization techniques have been used for its energy minimization. Among these, simulated annealing (SA) has been widely employed because it can reach a globally optimal solution without getting trapped in local minima. However, because SA chooses the next state randomly, its convergence time is too high for real-time performance. This paper presents an improved energy minimization scheme that hybridizes SA and the JAYA algorithm to reach a globally optimal solution with a faster convergence rate. The weights associated with the energy function are computed using the analytic hierarchy process (AHP) instead of heuristic selection. Experimental evaluation and analysis show that the proposed scheme achieves a lower overall energy cost with less computational time. The proposed scheme has the potential to review consumer surveillance video data quickly, smartly, and efficiently.
Index Terms—Video Synopsis, Simulated Annealing, JAYA
Algorithm, Energy Minimization, Optimization.
I. INTRODUCTION
In the recent past, video surveillance has gained increasing importance in many consumer and professional applications, including home/office security, traffic control, criminal activity detection, abnormal event alarming, and civic protection. Consumer-based video surveillance poses considerable challenges for data storage, analysis, and fast browsing. Furthermore, it is difficult for an analyst/operator to analyze surveillance video effectively and efficiently and to give consumers access to key information. Most efforts [1], [2] address the overwhelming volume of consumer surveillance video through the deployment of automatic video understanding. Hence, there is a growing demand to create a summary/synopsis of a lengthy video that reveals the objects of interest with low resource usage and low computational overhead.
Manuscript received- ; revised- .
S. Ghatak is with the Image and Video Processing Laboratory, Department
of Computer Science and Engineering, IIIT Bhubaneswar, Orissa 751003,
India (e-mail: [email protected]).
S. Rup is with the Image and Video Processing Laboratory, Department of
Computer Science and Engineering, IIIT Bhubaneswar, Orissa 751003, India
(e-mail: [email protected]).
B. Majhi is with the School of Computer and Electrical Engineering,
IIITDM Kancheepuram, Chennai 600127, India (e-mail: [email protected]).
M.N.S. Swamy is with the Department of Electrical and Computer Engineering, Concordia University, Montreal, QC H3G 1M8, Canada (e-mail: [email protected]).
To condense a lengthy video into a shorter one, video summarization [3], [4], [5] and video synopsis [6] are the most popular technologies. Video synopsis is more efficient, as it creates a shorter version of the original video while preserving all the important activities. Among the approaches to video synopsis, Rav-Acha et al. [6] first proposed, in 2006, a framework that includes both low-level and object-based synopsis. In low-level video synopsis, the active pixels of the input video are shifted along time into the synopsis video, which imposes a huge computational burden because of the pixel-level operations. In contrast, object-based video synopsis is popular because it retains higher-level object properties by detecting and tracking the moving objects. The basic framework for object-based video synopsis includes object detection and segmentation, tube formation, optimization, and stitching, where the optimization module is considered the key module. Solving the energy minimization of the optimization problem is computationally expensive because of the large number of possible tube rearrangements. To make it feasible, two schemes, namely lossy and lossless object-based video synopsis, are proposed in [6] and solved using Simulated Annealing (SA) [7]. Similar to [6], an improved framework proposed by Pritch et al. [8] uses a greedy approach for energy minimization. Several research findings identify unwanted collisions as a major limitation of VS. To resolve this, two notable approaches, namely spatio-temporal rearrangement of objects and object size reduction, are reported in [9] and [10], respectively. Subsequently, other issues such as fluent tube formation [11], [12] and tracking [13] have also been addressed. Further, to alleviate the crowded appearance of the synopsis video, abnormal activity-based [14] and event-based [15] video synopsis generation have also been presented. The aforementioned approaches are mostly based on visual assessment of the synopsis video, and only traditional optimization techniques are used for energy minimization. Another notable work [14] solves the minimization problem using Blob Trajectory Optimization, whereas [16] employs a greedy approach. Similarly, a global energy function in [9] is solved using Graph Cut [17]. The literature shows that most schemes use SA to solve the energy minimization problem, as it can produce a globally optimal solution. However, the iterative nature of SA makes the process lengthy and time-consuming.
0098-3063 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI. Downloaded on April 08,2020 at 06:00:18 UTC from IEEE Xplore. Restrictions apply.
A. Motivation & Overview
This article presents an improved optimization framework
for object-based video synopsis. In the suggested scheme, the
objective function of the optimization module is solved using
the proposed hybridization of SA [7] and JAYA Algorithm
[18], namely HSAJAYA, to achieve a global optimal solution.
The total energy cost function is not always convex, as the individual cost functions associated with it are not always convex. Therefore, SA is chosen in most studies because it is well known for reaching a globally optimal solution. In SA, the random selection of the next state in each iteration lets the algorithm eventually reach the global optimum, but the convergence time is fairly high. On the other hand, because JAYA has no algorithm-specific control parameters, it reaches an optimal solution with fewer function evaluations. However, JAYA narrows the search space because of the one-way nature of its update (i.e., only the best and worst solutions decide the next state of the population). Thus, in the proposed HSAJAYA, JAYA is adopted in the generation step, where the previous best and worst solutions decide the next state of the population, whereas the acceptance of a new population member is decided probabilistically, as in SA.
In the proposed scheme, the energy function is computed as a weighted sum of three objectives: minimization of activity loss, termed the activity cost; minimization of collisions between objects caused by temporal rearrangement, known as the collision cost; and minimization of violations of the original temporal relationships between objects, called the temporal consistency cost. The associated weights signify the importance of the corresponding objectives. In general, weight values are decided intuitively or heuristically, but the proposed scheme uses the Analytic Hierarchy Process (AHP) [19] to compute them.
B. Contributions
The contributions of the proposed scheme are threefold:
1) An improved Consumer Surveillance Management System (CSMS) with low computational overhead is proposed, providing an optimization framework for object-based video synopsis in which energy minimization is achieved by the proposed HSAJAYA.
2) AHP-based weight determination is employed to assign the weights of the objective function.
3) The efficacy of the proposed scheme is validated through qualitative and quantitative measures along with a complexity analysis.
C. Organization
The rest of the paper is organized as follows: Section II presents the proposed framework, simulation results are given in Section III, and Section IV gives the concluding remarks.
II. CONSUMER SURVEILLANCE MANAGEMENT SYSTEM
The proposed scheme operates as follows. Live video streams are captured by the consumer surveillance cameras and fed to the Consumer Surveillance Management System (CSMS). The CSMS is central to the consumer surveillance system and provides storage and Video Synopsis Services. Based on a consumer's time query to the CSMS, the Video Synopsis Services fetch the desired video footage from the database and transfer it to the VS Generation module for processing. In VS generation, the first step is to detect and segment the moving objects in the input video. Next, the segmented objects are tracked to form a tube for each object. Then, the three objectives of the optimization module are formulated and solved using the proposed HSAJAYA approach. Finally, the objects, placed according to the result of the energy minimization, are blended with the background video to form the final video synopsis. The processing core of the CSMS is implemented on a consumer electronic device such as a laptop or standard PC, and the generated synopsis video can be linked to consumer devices such as smartphones (for easy processing) and tablets and HDTVs (for easy visualization). Fig. 1 depicts the flow diagram of the proposed work.
[Fig. 1 diagram: consumer electronic devices (smartphone, desktop, PDA, tablet, HDTV) connected to the CSMS (database & archive; Video Synopsis Services). Video Synopsis Generation: input video → object detection and segmentation → tube formation → optimization framework (proposed HSAJAYA) → stitching → video synopsis.]
Fig. 1. Flow diagram of the proposed video synopsis framework with the Consumer Surveillance Management System.
A. Object Detection & Segmentation
Illumination variation and the misclassification of shadows, ghosts, and noise as foreground can cause serious issues such as object merging, object shape distortion, and even missed objects. Therefore, a robust multi-layer background subtraction algorithm [20] is applied to extract the foreground along with a background model. OV(x, y, t) and BGI(x, y, t0) denote an input video and the corresponding background frame, respectively, where x and y are the spatial coordinates, t is the temporal index, and t0 < t. The applied algorithm combines local binary pattern (LBP) features with photometric invariant color measurements in RGB color space to handle illumination variation. Morphological dilation and erosion are applied to each resulting foreground mask to obtain a more accurate result. Finally, the noise-free foreground masks are passed to blob analysis for object detection. In the proposed methodology, every detected moving object is represented by a precise and independent bounding box.
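The mask clean-up and blob analysis described above can be illustrated with a small pure-Python sketch. It is not the background subtraction of [20]; it assumes binary masks stored as nested lists, a 3x3 structuring element, and an opening-then-closing order for the erosion and dilation steps, none of which the text fixes. The names `clean_mask` and `bounding_boxes` are hypothetical.

```python
from collections import deque

# 3x3 neighbourhood used both as structuring element and for 8-connectivity.
NEIGHBORS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def _morph(mask, combine):
    """3x3 erosion (combine=all) or dilation (combine=any); outside pixels count as 0."""
    h, w = len(mask), len(mask[0])
    return [[int(combine(0 <= y + dy < h and 0 <= x + dx < w and bool(mask[y + dy][x + dx])
                         for dy, dx in NEIGHBORS))
             for x in range(w)] for y in range(h)]

def erode(mask):
    return _morph(mask, all)

def dilate(mask):
    return _morph(mask, any)

def clean_mask(mask):
    """Opening (erode, dilate) removes speckle noise; closing (dilate, erode) fills small holes."""
    opened = dilate(erode(mask))
    return erode(dilate(opened))

def bounding_boxes(mask, min_area=1):
    """Blob analysis: 8-connected components -> (x_min, y_min, x_max, y_max), inclusive."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, xs, ys = deque([(y0, x0)]), [], []
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                xs.append(x)
                ys.append(y)
                for dy, dx in NEIGHBORS:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(xs) >= min_area:
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

On a mask holding one 10x10 object plus an isolated noise pixel, `bounding_boxes` finds two blobs on the raw mask but only the object's box after `clean_mask`.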
B. Tube Formation
In video synopsis terminology, the tracked representation of each independent object is called a tube. As multiple objects are present in a surveillance video, a robust multi-object tracking strategy is required for tube generation. In the proposed scheme, a Kalman filter-based algorithm [21] is employed for tracking. The set of all connected bounding boxes of a single moving object over time is considered its track (tube), which is temporally rearranged in the optimization module.
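The tracker of [21] is not reproduced here. As a hedged illustration of the predict/correct cycle behind Kalman filter-based tracking, the sketch below filters one coordinate of an object's centroid under an assumed constant-velocity model with hand-picked noise values (`q`, `r`, `p0` are assumptions); two instances would track x and y. Data association between detections and tracks, needed when several objects are present, is omitted.

```python
class ConstantVelocityKF1D:
    """Kalman filter for one coordinate with state [position, velocity].

    Hypothetical helper. q: process-noise variance added to each state per step;
    r: measurement-noise variance; p0: initial state variance.
    """

    def __init__(self, z0, q=1e-2, r=1.0, p0=1000.0):
        self.p, self.v = float(z0), 0.0   # state estimate: position, velocity
        self.P = [[p0, 0.0], [0.0, p0]]   # state covariance
        self.q, self.r = q, r

    def predict(self, dt=1.0):
        """Propagate one step: p <- p + v*dt and P <- F P F^T + Q, with F = [[1, dt], [0, 1]]."""
        (a, b), (c, d) = self.P
        self.p += self.v * dt
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.p

    def update(self, z):
        """Correct the prediction with a measured position z (measurement matrix H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r              # innovation variance
        k0, k1 = a / s, c / s       # Kalman gain for position and velocity
        innov = z - self.p
        self.p += k0 * innov
        self.v += k1 * innov
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]   # P <- (I - K H) P
```

For an object moving one pixel per frame, feeding positions 0 to 9 leaves the filter predicting a position close to 10 for the next frame, which is the cue a tracker uses to match the next detection to the same tube.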
C. Optimization Framework
This section presents the optimization framework in which the proposed hybrid energy minimization is formulated. The prime objectives are to preserve the activities of all moving objects, to minimize the collisions among them, and to maintain the temporal consistency of the synopsis video with respect to the original one. The temporal shift of each object is determined by the proposed energy minimization scheme, which seeks minimum activity loss, collision, and violation of temporal consistency. These desired properties constitute the energy of the video synopsis framework, as described in (1). Finally, based on the new resultant positions, the objects are stitched with the background video to produce the final synopsis.
The objective function (also termed the energy function) of the proposed optimization framework is formulated in terms of $\mu$, a mapping of objects from the input video to the synopsis, and $E$, the total energy cost:
$$E(\mu) = E_a(\mu) + \omega_1 E_c(\mu) + \omega_2 E_t(\mu) \tag{1}$$
where $\omega_1$ and $\omega_2$ are the associated weights, and $E_a(\mu)$, $E_c(\mu)$, and $E_t(\mu)$ represent the penalty costs for activity loss, collision, and temporal consistency, respectively. To include all activities in the synopsis video, no weight is assigned to $E_a(\mu)$. The total energy cost function may not always be convex, as the collision cost associated with it is an area function of the bounding-box overlaps.
1) The Energy Cost: Activity Cost: As stated in (1), the activity cost $E_a(\mu)$ represents the penalty for not preserving original activity in the synopsis video and is defined by
$$E_a(\mu) = \sum_{ob \in OB} E_a(ob^s) = \sum_{ob \in OB} \; \sum_{\hat{o} \in ob^s \,\wedge\, \hat{o} \notin \text{synopsis}} \text{AbsDiff}(\hat{o}) \tag{2}$$
where $ob$ is an element of $OB$, the set of all moving-object tubes, and $ob^s$ is the set containing the mapped representations of the elements of $OB$ at their new positions in the synopsis video. $\hat{o}$ is an element of $ob^s$ that has not been included in the synopsis video. In (2), AbsDiff() evaluates the absolute difference between the element $\hat{o}$ and its corresponding background:
$$\text{AbsDiff}(\hat{o}) = \sum_{(x,y) \in \text{bbox}(\hat{o})} \left\| OV(x, y, t_{\hat{o}}) - BGI(x, y, t_{\hat{o}}) \right\| \tag{3}$$
where $t_{\hat{o}}$ is the frame number corresponding to $\hat{o}$ and bbox() denotes the bounding box function. Thus, $E_a(\mu)$ is the sum of the background differences over all active pixels that are not included in the synopsis video. In a lossless scenario, i.e., if all activities are preserved in the synopsis video, the activity cost is zero.
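Equations (2) and (3) reduce to summing background differences over the boxes of omitted object instances. The sketch below assumes grayscale frames stored as 2-D lists, so the norm in (3) becomes an absolute value (for RGB, a per-pixel vector norm would replace `abs`), and inclusive pixel boxes; `abs_diff` and `activity_cost` are hypothetical names.

```python
def abs_diff(frame, background, bbox):
    """AbsDiff in (3): sum of |OV - BGI| over the pixels inside bbox.

    frame, background: 2-D lists of grayscale intensities at frame t_ô (assumption).
    bbox: (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
    """
    x0, y0, x1, y1 = bbox
    return sum(abs(frame[y][x] - background[y][x])
               for y in range(y0, y1 + 1)
               for x in range(x0, x1 + 1))


def activity_cost(dropped_instances):
    """E_a in (2): total AbsDiff over object instances NOT placed in the synopsis.

    dropped_instances: iterable of (frame, background, bbox) triples.
    """
    return sum(abs_diff(frame, bg, bbox) for frame, bg, bbox in dropped_instances)
```

In the lossless setting used later, no instance is dropped, so `activity_cost([])` is 0, matching the statement above.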
Collision Cost: In (1), the term $E_c(\mu)$ is used to minimize the number of collisions among the moving objects. Collisions look unrealistic because two or more objects overlap, so a minimum number of collisions ensures a visually comfortable synopsis video, as suggested in [6] and [10]. Let $ob^s_m$ and $ob^s_n$ denote any two moving objects in the synopsis video, mapped from the original video objects $ob_m$ and $ob_n$, respectively. The collision cost is defined as
$$E_c(\mu) = \sum_{ob_m, ob_n \in OB} E_c(ob^s_m, ob^s_n) \tag{4}$$
where the term $E_c(ob^s_m, ob^s_n)$ is further defined as
$$E_c(ob^s_m, ob^s_n) = \sum_{\hat{o}_m \in ob^s_m,\; \hat{o}_n \in ob^s_n} \text{Area}\left(\text{bbox}(\hat{o}_m) \cap \text{bbox}(\hat{o}_n)\right) \tag{5}$$
in which the Area() function calculates the area of the overlap between the bounding boxes of $\hat{o}_m$ ($\in ob^s_m$) and $\hat{o}_n$ ($\in ob^s_n$). If the synopsis video is produced with no collision, the cost is zero; otherwise, it equals the sum of all overlapping areas.
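A direct reading of (4) and (5) is sketched below. It assumes, as is implicit in the notion of a collision, that only instances landing in the same synopsis frame are compared, that each unordered pair of objects is counted once, and that a tube is a list of per-frame inclusive bounding boxes placed at a start frame given by its temporal shift. `overlap_area` and `collision_cost` are hypothetical names.

```python
from itertools import combinations

def overlap_area(a, b):
    """Area(bbox_a ∩ bbox_b) for inclusive pixel boxes (x_min, y_min, x_max, y_max)."""
    w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return w * h if w > 0 and h > 0 else 0

def collision_cost(tubes, shifts):
    """E_c in (4)-(5) for one candidate mapping.

    tubes[m]:  list of bounding boxes of object m, one per frame of its tube.
    shifts[m]: synopsis frame at which tube m starts (its temporal shift).
    """
    total = 0
    for m, n in combinations(range(len(tubes)), 2):   # each unordered pair once
        for k, box_m in enumerate(tubes[m]):
            idx_n = shifts[m] + k - shifts[n]            # frame of tube n shown together with box_m
            if 0 <= idx_n < len(tubes[n]):
                total += overlap_area(box_m, tubes[n][idx_n])
    return total
```

Two three-frame tubes whose boxes overlap by 5x5 pixels cost 75 when they start together, 25 when one is shifted by two frames, and 0 once they no longer share a synopsis frame.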
Temporal Consistency Cost: In (1), the term $E_t(\mu)$ represents the temporal consistency cost, which maintains the chronological order of appearance of the objects. During video synopsis generation, the temporal relationships among moving objects may be affected by the applied temporal shifts, as mentioned in [6] and [10]. To penalize shifts that cause temporal violations, the following situations are considered: 1. two objects share some common frames; 2. two objects do not share any common frames. Situation 2 has two sub-classes: 2a. the temporal order of appearance of the two objects is violated, and 2b. the temporal order of appearance of the two objects is preserved. Interactions among moving objects are especially valuable in surveillance, and handling the situations above preserves these interactions in the resulting synopsis.
Consider the first situation: the interaction between two moving objects can be measured probabilistically through their spatial relationship. Let $ob_m$ and $ob_n$ be two moving objects that share some common frames in the original video. Their spatial relationship $\Delta$ is then established as
$$\Delta(ob_m, ob_n) = \exp\left(-\min_{t \in t_{ob_m} \cap t_{ob_n}} \delta(ob_m, ob_n, t) \,/\, \sigma\right) \tag{6}$$
where $\delta(ob_m, ob_n, t)$ is the Euclidean distance between the objects $ob_m$ and $ob_n$ in the $t$th frame of the input video, and $\sigma$, which lies in the range 0 to 100%, represents the level of spatial interaction between $ob_m$ and $ob_n$. A greater value of $\sigma$ assigns a higher cost to violations of temporal consistency, so the optimization process attempts to minimize this cost even when there is very little possibility of interaction between the objects; in this scenario, the cost allocation is inappropriate because of the high value of $\sigma$. On the other hand, a low value of $\sigma$ assigns a low temporal consistency cost, which is not effective in the minimization process, and a high possibility of interaction between objects might be neglected. Hence, in our experiments, an intermediate value of $\sigma = 40$ is used, inspired by [10].
Besides the spatial relationship, the temporal relationship ($\tau$) between objects can be measured in terms of the absolute difference as
$$\tau(ob_m, ob_n) = \exp\left(\left\| (\vec{t}_{ob_m} - \vec{t}_{ob_n}) - (\vec{t}_{ob^s_m} - \vec{t}_{ob^s_n}) \right\|\right) \tag{7}$$
where $\vec{t}_{ob_m}$ and $\vec{t}_{ob_n}$ are the entry frame indices of $ob_m$ and $ob_n$, respectively, in the input video, while $\vec{t}_{ob^s_m}$ and $\vec{t}_{ob^s_n}$ are those of $ob_m$ and $ob_n$, respectively, in the resulting synopsis.
For the second situation, the temporal violation ($\tau^*$) between $ob_m$ and $ob_n$ is measured by another metric, defined as
$$\tau^*(ob_m, ob_n) = (\vec{t}_{ob_m} - \vec{t}_{ob_n}) \times (\vec{t}_{ob^s_m} - \vec{t}_{ob^s_n}) \tag{8}$$
A positive value of $\tau^*$ implies that the temporal consistency between $ob_m$ and $ob_n$ is preserved in the synopsis, resulting in zero penalty. Hence, the complete formulation of the temporal consistency cost over the situations considered is
$$E_t(\mu) = \begin{cases} \Delta(ob_m, ob_n) \times \tau(ob_m, ob_n) & \text{if } (t_{ob_m} \cap t_{ob_n}) \neq \phi \\ \exp\left(\|\vec{t}_{ob^s_m} - \vec{t}_{ob^s_n}\| / \gamma\right) & \text{otherwise, if } \tau^*(ob_m, ob_n) \le 0 \\ 0 & \text{otherwise} \end{cases} \tag{9}$$
where $\gamma$, which lies in the range 0 to 100%, defines the time interval within which events have temporal communication. In [10], $\gamma$ is set to 10 because the videos considered there contain many objects with little interaction. Because the objects in our videos interact more, $\gamma$ is set to 20. Here, an exponential cost is assigned in terms of the absolute temporal distance, so the exponent of the cost grows in direct proportion to the amount of temporal relationship violation.
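The three cases of (9) can be evaluated per object pair as sketched below. The sketch assumes each object track is a dict from input-video frame index to its centroid, so $\delta$ in (6) is the centroid distance, and uses $\sigma = 40$ and $\gamma = 20$ as in the text; summing the pairwise penalty over all unordered pairs to obtain $E_t(\mu)$ is an assumption, since (9) is written for a single pair. Function names are hypothetical.

```python
import math
from itertools import combinations

def pair_temporal_cost(track_m, track_n, syn_m, syn_n, sigma=40.0, gamma=20.0):
    """Pairwise temporal-consistency penalty following (6)-(9).

    track_m, track_n: dicts mapping input-video frame index -> (cx, cy) centroid.
    syn_m, syn_n: entry frames of the two objects in the synopsis.
    """
    entry_m, entry_n = min(track_m), min(track_n)      # entry frames in the input video
    common = track_m.keys() & track_n.keys()
    if common:                                         # situation 1: shared frames
        d_min = min(math.dist(track_m[t], track_n[t]) for t in common)
        spatial = math.exp(-d_min / sigma)             # (6)
        temporal = math.exp(abs((entry_m - entry_n) - (syn_m - syn_n)))  # (7)
        return spatial * temporal
    tau_star = (entry_m - entry_n) * (syn_m - syn_n)   # (8)
    if tau_star <= 0:                                  # situation 2a: order violated
        return math.exp(abs(syn_m - syn_n) / gamma)
    return 0.0                                         # situation 2b: order preserved

def temporal_consistency_cost(tracks, syn_starts, **kw):
    """Assumed aggregation: sum of the pairwise penalty over all unordered object pairs."""
    return sum(pair_temporal_cost(tracks[m], tracks[n], syn_starts[m], syn_starts[n], **kw)
               for m, n in combinations(range(len(tracks)), 2))
```

Because (7) is an unscaled exponential, `math.exp` raises `OverflowError` once the relative offset changes by more than about 709 frames, so a practical implementation may need to cap or rescale that term.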
2) Analytic Hierarchy Process (AHP) based weight determination: AHP [19] is an effective tool for complex decision problems. It helps the user make the best decision by settling the given priorities, and its decision-making process is defined by evaluating the geometric means of the pairwise comparisons of the criteria considered. In this work, AHP is used to determine the weights associated with the corresponding costs in the objective function.
The implementation of AHP starts with a pairwise comparison matrix $A_{n \times n}$, where $n$ is the number of evaluation criteria. The elements of $A_{n \times n}$, denoted by $a_{ij}$, represent the relative importance of the $i$th criterion with respect to the $j$th criterion. If $a_{ij} > 1$, the $i$th criterion is more significant than the $j$th criterion; if $a_{ij} < 1$, it is less significant; and if $a_{ij} = 1$, the two are equally significant. A numerical scale of 1 to 9 is used to measure the relative importance. After constructing $A_{n \times n}$, the relative normalized weights $\omega_i$ are computed as
$$\omega_i = \sqrt[n]{\prod_{j=1}^{n} a_{ij}} \Bigg/ \sum_{k=1}^{n} \sqrt[n]{\prod_{j=1}^{n} a_{kj}} \tag{10}$$
where $\sum_{i=1}^{n} \omega_i = 1$. Weight values are usually chosen heuristically. In the lossy scenario, the weights associated with activity, collision, and temporal consistency sum to 1. Since the proposed scheme considers a lossless scenario, no weight is assigned to the activity cost. Considering $E_c(\mu)$ absolutely more significant than $E_t(\mu)$, $E_c(\mu)$ is assigned the label 9, while 1 represents the significance of $E_t(\mu)$, and the corresponding pairwise comparison matrix $A$ is
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} 1 & 9 \\ \frac{1}{9} & 1 \end{bmatrix} \tag{11}$$
From (10) and (11), we get $\omega_1 = 0.8123$ and $\omega_2 = 0.1877$. Hence, (1) can be rewritten as the resultant objective function
$$E(\mu) = E_a(\mu) + 0.8123 \times E_c(\mu) + 0.1877 \times E_t(\mu) \tag{12}$$
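The row geometric-mean computation of (10) and the weighted sum of (1)/(12) can be sketched as follows; `ahp_weights` and `total_energy` are hypothetical names. Worked through for the matrix in (11), the row geometric means are √9 = 3 and √(1/9) = 1/3, so (10) as written gives ω1 = 0.9 and ω2 = 0.1. The sketch implements (10) literally and therefore does not reproduce the 0.8123/0.1877 split quoted above; it is an illustration of the formula, not of the reported numbers.

```python
import math

def ahp_weights(A):
    """Normalized AHP weights by the row geometric-mean method of (10)."""
    n = len(A)
    g = [math.prod(row) ** (1.0 / n) for row in A]  # n-th root of each row product
    total = sum(g)
    return [gi / total for gi in g]

def total_energy(e_a, e_c, e_t, w1, w2):
    """Objective (1)/(12): the activity cost is unweighted (lossless setting)."""
    return e_a + w1 * e_c + w2 * e_t
```

With the weights of (12), `total_energy(0.0, 100.0, 10.0, 0.8123, 0.1877)` evaluates to about 83.107.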
3) Energy Minimization: Minimizing the energy function is computationally intensive because of the large number of possibilities. Some of the reported literature [6], [10], [22], [23] employs simulated annealing and genetic algorithms for the minimization. The present proposal aims to reach an optimal cost in reduced time. The search is carried out over the set of all possible mappings $\mu$ to find the temporal shifts that move the tubes along the time axis. In the proposed scheme, the lower bound of every temporal shift is the first frame of the synopsis video, whereas the upper bound is the absolute difference between the synopsis length and the corresponding tube length. In this work, the length of the synopsis is equal to the length of the longest tube. Under these conditions, an efficient hybrid optimization algorithm, HSAJAYA, is proposed to solve (12); it is formulated using SA and the JAYA algorithm.
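The search space defined above can be made concrete with a short sketch: the synopsis length is the longest tube, each shift ranges from 0 to |maxlen − objlen_j|, and an initial population of integer shifts is drawn uniformly in those ranges, in the spirit of the `randi` call in Algorithm 1. Function names are hypothetical, and Python's `random.Random.randint` (inclusive on both ends) stands in for `randi`.

```python
import random

def shift_upper_bounds(tube_lengths):
    """Synopsis length (= longest tube) and the upper bound |maxlen - objlen_j| of each shift."""
    maxlen = max(tube_lengths)
    return maxlen, [abs(maxlen - length) for length in tube_lengths]

def random_shifts(tube_lengths, n_pop, rng=None):
    """Initial population: n_pop candidate mappings of integer shifts in [0, upper_j]."""
    rng = rng or random.Random()
    _, upper = shift_upper_bounds(tube_lengths)
    return [[rng.randint(0, u) for u in upper] for _ in range(n_pop)]
```

For tubes of 10, 4, and 7 frames, the synopsis is 10 frames long and the shifts are bounded by 0, 6, and 3; the longest tube can only start at the first synopsis frame.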
In Simulated Annealing [24], starting from a preliminary state with energy $E_0$, a new state is generated by shifting a randomly selected particle. If $E_0 > E$, where $E$ is the energy of the current (newly generated) state, the new state is accepted and the next state is generated as before. Otherwise, if $E_0 \le E$, the probability of remaining in the new state is
$$\exp\left(-(E - E_0) / k_b T\right) \tag{13}$$
where $k_b$ is the Boltzmann constant and $T$ is the current temperature. This probabilistic acceptance rule is known as the Metropolis criterion. As the Metropolis algorithm operates at a single fixed temperature, Kirkpatrick et al. [7] generalized it by incorporating an annealing schedule that reduces the temperature. Starting from a high initial temperature, the temperature decreases according to the annealing schedule, and the process iterates until the system freezes in a state that ideally has the global minimum energy.
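A generic sketch of the Metropolis criterion (13) inside an annealing loop is shown below. It assumes $k_b = 1$ (common when SA is used for optimization rather than physics), a geometric cooling schedule $T \leftarrow \alpha T$, and a user-supplied neighbour move; it also returns the best state seen, which is a convenience rather than part of (13).

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng, k_b=1.0):
    """Metropolis criterion (13): accept downhill moves; uphill with prob exp(-dE / (k_b T))."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / (k_b * temperature))

def simulated_annealing(energy, neighbor, x0, t0=1000.0, alpha=0.99, iters=2000, seed=0):
    """Generic SA with geometric cooling T <- alpha * T; returns the best state seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(iters):
        cand = neighbor(x, rng)          # random move to a nearby state
        e_cand = energy(cand)
        if metropolis_accept(e_cand - e, t, rng):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
        t *= alpha                        # annealing schedule
    return best_x, best_e
```

For example, `simulated_annealing(lambda x: (x - 3) ** 2, lambda x, rng: x + rng.choice((-1, 1)), 20)` walks almost freely while T is high and behaves like greedy descent toward x = 3 once T is small; the many iterations this takes are the convergence cost the text attributes to SA.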
R. Rao [18] proposed a metaheuristic optimization method named the Jaya algorithm. It operates on a population of size $P_{max}$ (candidate solutions $k = 1, 2, \ldots, P_{max}$), each with $N$ decision variables $j = 1, 2, \ldots, N$. In every $m$th iteration, the best and worst solutions in the population are denoted by $X^m_{\mathrm{best}}$ and $X^m_{\mathrm{worst}}$, respectively. Using them, the $j$th decision variable of the $k$th candidate is updated as per (14):
$$X^{m}_{k,j} = x^{m}_{k,j} + r^{m}_{1,j}\left(X^{m}_{j,\mathrm{best}} - |x^{m}_{k,j}|\right) - r^{m}_{2,j}\left(X^{m}_{j,\mathrm{worst}} - |x^{m}_{k,j}|\right) \tag{14}$$
where $r^{m}_{1,j}$ and $r^{m}_{2,j}$ are random numbers drawn from $[0, 1]$ in every $m$th iteration. The updated value $X^{m}_{k,j}$ is accepted if and only if it yields a better objective function value than $x^{m}_{k,j}$.

Algorithm 1: Proposed HSAJAYA Algorithm
input : OV(x, y, t): Original Video; Tube: Object Track
output: µ = [x_{i,j}]: Temporal Shifts
1   Initialization of definition parameters:
        objno;          // Number of moving objects in OV
        objlen;         // Length of the corresponding object
        maxlen;         // Length of the synopsis video
        T = 1000;       // Initial temperature
        e = 0.01;       // Change in fitness value
        τs = 0.99;      // Temperature schedule
        Gen = 1;        // Iteration counter
        nPop = 10;      // Size of the population
        Genmax = 100;   // Maximum iteration
2   x_{i,j} = randi([0 (maxlen − objlen_j)]);
3   Initial energy E(µ[x_{i,j}]) is evaluated by (12);
4   for Gen = 1 to Genmax do
        // Generation and Evaluation Phase
5       for i = 1 to nPop do
6           for j = 1 to objno do
7               r = rand(1, 2);
8               X^new_{i,j} = x_{i,j} + r(1) × (Best_j − |x_{i,j}|) − r(2) × (Worst_j − |x_{i,j}|);
9           E_new(µ[X^new_{i,j}]) is evaluated by (12);
        // Selection Phase
10      for i = 1 to nPop do
11          if E_new ≤ E then
12              µ[x_{i,j}] = µ[X^new_{i,j}]; E = E_new;
13          else
14              n = rand(1, 1);
15              if n ≤ exp(−((E_new − E)/E)/T) then
16                  µ[x_{i,j}] = µ[X^new_{i,j}]; E = E_new;
17      T = T × τs; Gen = Gen + 1;
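The plain Jaya iteration of (14), with its greedy "accept only if better" rule, can be sketched for continuous variables as below. Clipping updated values to per-variable bounds and drawing the random factors from `random.Random.random` (range [0, 1)) are assumptions of this sketch; `jaya` is a hypothetical name.

```python
import random

def jaya(objective, lower, upper, n_pop=10, iters=100, seed=0):
    """Minimal Jaya minimizer: update rule (14) with greedy acceptance.

    lower/upper: per-variable bounds; updated values are clipped to them (assumption).
    """
    rng = random.Random(seed)
    dim = len(lower)
    pop = [[rng.uniform(lower[j], upper[j]) for j in range(dim)] for _ in range(n_pop)]
    fit = [objective(x) for x in pop]
    for _ in range(iters):
        best = pop[min(range(n_pop), key=fit.__getitem__)]
        worst = pop[max(range(n_pop), key=fit.__getitem__)]
        for k in range(n_pop):
            x = pop[k]
            cand = []
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()  # r_{1,j}, r_{2,j} drawn from [0, 1)
                v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                cand.append(min(max(v, lower[j]), upper[j]))
            f_cand = objective(cand)
            if f_cand < fit[k]:  # accept only if the objective improves
                pop[k], fit[k] = cand, f_cand
    k_best = min(range(n_pop), key=fit.__getitem__)
    return pop[k_best], fit[k_best]
```

Because a candidate replaces its parent only on improvement, the best objective value never increases; this is the one-way behaviour that narrows the search and that HSAJAYA relaxes with SA-style acceptance.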
Fig. 2. Experimental Setup: 1,2-Video Capture; 3-Processing; 4-Visualization
a. HSAJAYA: The detailed steps of the proposed hybrid scheme are presented in Algorithm 1. The required parameters specified in step 1 are initialized according to the defined optimization framework with the objective given in (12). An initial population $x_{i,j}$ is assigned to each decision variable $j$ using random integers returned by the randi() function in the range from 0 to the absolute difference between the length of the longest object tube and that of the corresponding object tube. Evaluating the objective function on the initial population determines the initial best population, $Best_j$, and the worst population, $Worst_j$. Here, $x_{i,j}$ is the current position of the $i$th population member for the $j$th decision variable. Inspired by JAYA, the search is then directed towards the best point and away from the worst to reach the optimum solution. Thus, for each $m$th iteration, the updated population $X^{new}_{i,j}$ is evaluated from $x_{i,j}$ by (14). During the update, a random multiplier $r$ in the range $[0, 1]$ is drawn for each iteration. The new generation is evaluated for activity, collision, and temporal consistency costs, and the objective function value $E(\mu)$ is obtained by (12). The selection of $X^{new}_{i,j}$ is carried out by the randomized selection in step 10 to step 16 of Algorithm 1. If the new solution is better than the previous best, it is designated as the best population. Otherwise, as in SA, the new solution is accepted on the basis of a probabilistic selection. Finally, the search stops when the termination criterion is met (Gen reaches Genmax in Algorithm 1), yielding the optimum solution.
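The loop of Algorithm 1 can be sketched as follows: a JAYA move generates each candidate, and an SA-style test on the relative energy increase, exp(−((E_new − E)/E)/T), decides whether a worse candidate is kept. Several details are assumptions not fixed by Algorithm 1: one energy value per population member, Best/Worst recomputed at the start of every generation, moves rounded and clipped to the integer shift range, a small floor on E to avoid division by zero, and returning the best mapping seen. `hsajaya` and its `energy` argument (for example, an implementation of (12)) are hypothetical names.

```python
import math
import random

def hsajaya(energy, upper, n_pop=10, gen_max=100, t0=1000.0, tau_s=0.99, seed=0):
    """Sketch of the HSAJAYA loop of Algorithm 1 over integer temporal shifts.

    energy: callable mapping a list of shifts to a non-negative cost, e.g. (12).
    upper:  per-object upper bounds maxlen - objlen_j; the lower bound is 0.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, u) for u in upper] for _ in range(n_pop)]
    fit = [energy(x) for x in pop]
    k0 = min(range(n_pop), key=fit.__getitem__)
    best_x, best_e = list(pop[k0]), fit[k0]
    t = t0
    for _ in range(gen_max):
        best = pop[min(range(n_pop), key=fit.__getitem__)]
        worst = pop[max(range(n_pop), key=fit.__getitem__)]
        for i in range(n_pop):
            # Generation (JAYA): move toward Best and away from Worst, as in (14).
            cand = []
            for j, u in enumerate(upper):
                r1, r2 = rng.random(), rng.random()
                xj = pop[i][j]
                v = xj + r1 * (best[j] - abs(xj)) - r2 * (worst[j] - abs(xj))
                cand.append(min(max(round(v), 0), u))   # keep shifts integral and feasible
            e_new = energy(cand)
            # Selection (SA): accept improvements, otherwise test the relative increase.
            if e_new <= fit[i]:
                accept = True
            else:
                rel = (e_new - fit[i]) / max(fit[i], 1e-12)
                accept = rng.random() <= math.exp(-rel / t)
            if accept:
                pop[i], fit[i] = cand, e_new
                if e_new < best_e:
                    best_x, best_e = list(cand), e_new
        t *= tau_s  # annealing schedule T <- T * tau_s
    return best_x, best_e
```

With T starting at 1000 and a relative energy difference in the exponent, almost every uphill move is accepted in early generations, which is why the sketch keeps the best mapping seen separately from the evolving population.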
D. Stitching
In this step, the resultant temporal shift is applied to the corresponding object tube to include it in the synopsis video. The generated background video and the object tubes are stitched together to produce the final synopsis. Before stitching, a time-stamp is generated for each object activity from its frame numbers and the frame rate, and it is stitched along with the corresponding object, which makes the synopsis efficient for video browsing. In this work, as in other video synopsis generation methodologies [6], [8], [9], [10], Poisson Image Editing [25] is applied to blend the object information with the corresponding background frame. This image editing scheme has proven to be an efficient blending tool for removing any undesirable seams present in the frames.
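The time-stamp step reduces to converting an object's original frame numbers into elapsed time at the video's frame rate. The sketch below assumes the label is the elapsed time since the start of the recording, formatted as HH:MM:SS (a deployed system would presumably add the recording's wall-clock start time); rendering the label onto the synopsis frame and the Poisson blending of [25] are not shown. Both functions are hypothetical helpers.

```python
def frame_timestamp(frame_index, fps):
    """Elapsed time of an original frame since the start of the video, as HH:MM:SS."""
    total_seconds = int(frame_index / fps)
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

def object_timestamps(entry_frame, length, fps):
    """Label for every frame of a tube, computed from its ORIGINAL (unshifted) frame numbers."""
    return [frame_timestamp(entry_frame + k, fps) for k in range(length)]
```

For instance, frame 2688 of a 30 fps video is labelled 00:01:29. The labels come from the original frame numbers, not the synopsis positions, so a viewer can jump back to the real footage.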
III. RESULTS AND DISCUSSION
The experimental setup consists of a laptop with a 3 CPU Core 1.70 GHz, 64-bit processor and 4 GB of RAM. Surveillance videos are captured through Dome/Day-Night consumer surveillance cameras, and the synopsis is generated on the laptop. Once the final synopsis is ready, it can be viewed on various consumer electronic display devices. Fig. 2 depicts the experimental setup of the proposed video synopsis framework. To evaluate the efficacy of the proposed scheme, extensive experiments are carried out on various benchmark surveillance videos. A prototypical frame of each video sequence, along with parameters such as frame rate, video length, and the number of objects present, is given in Table I.
TABLE I
PARAMETERS OF EXPERIMENTAL SURVEILLANCE VIDEO
First Frame Snapshot | Video Number | Video Length (# Frames) | Frame Rate (fps) | Number of Objects
[image] | 2 | 1066 | 30 | 5
[image] | 3 | 2688 | 30 | 12
[image] | 4 | 300 | 10 | 3
Video number 1, "Atrium", is taken from [16]; in it, several moving objects (humans) walk around the field of view in a random manner. Video number 2, "Pedestrians", is taken from the ChangeDetection.Net dataset [26] and includes human activities and bicycle riding as moving objects; the objects move from left to right and from right to left and overlap one another. Video number 3 is obtained from PETS 2001 [27], where several people walk alone or in groups along with moving cars. Finally, video number 4 is taken from the LASIESTA database [28]. The experimental outcomes and analysis of the proposed scheme are presented and discussed experiment-wise as follows.
A. Experiment 1: Evaluation of Object Detection and Segmentation
To evaluate how effectively the employed algorithm, namely multi-layer background subtraction [20], overcomes the issues discussed in Section II(A), the following experiment has been conducted. The employed scheme is visually compared with well-accepted object detection and segmentation techniques in Fig. 3, where A, B, C, and D represent videos 1, 2, 3, and 4, respectively. The 1st row, A(1), B(1), C(1), and D(1), shows input frames from the original videos; the 2nd row depicts the corresponding ground truths; and the 3rd to 7th rows depict the resultant frames obtained by the Gaussian mixture model (GMM) [29], PBAS [30], LOBSTERBGS [31], SuBSENSE [32], and the multi-layer background subtraction algorithm [20], respectively. Fig. 3(D(7)) clearly shows an improved segmentation result compared with Fig. 3(D(3)) and Fig. 3(D(4)), which suffer from noise, and with Fig. 3(D(5)) and Fig. 3(D(6)), which contain ghosting artifacts. Moreover, Fig. 3(A(7), B(7), C(7), and D(7)) shows that the foreground masks obtained with [20] outperform those of the benchmark schemes. In addition, a quantitative analysis is carried out to evaluate the efficacy of [20] in terms of Precision, Recall, F1, Similarity, and Percentage of Correct Classification (PCC) for the considered videos; the results are presented in Table II. The mathematical expressions of these performance metrics are detailed in [33]. The algorithm in [20] attains the highest value of every metric for every video, except for the recall on video number 3, where GMM [29] scores higher.

Fig. 3. Subjective comparison of the object detection and segmentation phase. Columns A-D: videos 1-4. Row 1: input frames; row 2: ground truths; rows 3-7: results of GMM [29], PBAS [30], LOBSTERBGS [31], SuBSENSE [32], and multi-layer background subtraction [20].

TABLE II
QUANTITATIVE ANALYSIS OF OBJECT DETECTION AND SEGMENTATION
Video     Algorithm   Precision   Recall   F1      Similarity   PCC (%)
Video#1   [29]        0.319       0.665    0.431   0.455        53.42
          [30]        0.486       0.556    0.518   0.475        58.95
          [31]        0.488       0.529    0.507   0.488        62.35
          [32]        0.725       0.645    0.682   0.595        85.34
          [20]        0.875       0.896    0.885   0.795        97.45
Video#2   [29]        0.487       0.723    0.581   0.428        92.52
          [30]        0.795       0.389    0.522   0.468        95.26
          [31]        0.848       0.792    0.819   0.734        94.07
          [32]        0.644       0.726    0.682   0.527        93.53
          [20]        0.925       0.814    0.865   0.948        98.70
Video#3   [29]        0.628       0.885    0.734   0.516        96.25
          [30]        0.792       0.625    0.698   0.573        97.37
          [31]        0.878       0.541    0.669   0.426        91.85
          [32]        0.745       0.628    0.681   0.553        94.52
          [20]        0.926       0.803    0.860   0.839        98.35
Video#4   [29]        0.338       0.398    0.365   0.262        74.86
          [30]        0.675       0.604    0.637   0.482        84.29
          [31]        0.592       0.697    0.640   0.498        86.78
          [32]        0.789       0.812    0.800   0.675        91.24
          [20]        0.977       0.918    0.946   0.927        92.74
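The metric definitions are deferred to [33]. As an illustration only, the sketch below computes the five metrics from a predicted and a ground-truth binary foreground mask using the standard confusion-matrix definitions. Treating Similarity as the Jaccard index TP/(TP+FP+FN) is an assumption, as is the helper name `segmentation_metrics`; the exact expressions in [33] may differ in detail.

```python
import numpy as np


def segmentation_metrics(pred, gt):
    """Pixel-wise metrics for binary foreground masks (True = foreground).

    Standard confusion-matrix definitions are assumed; "similarity" is
    taken to be the Jaccard index, which is an assumption.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.sum(pred & gt)      # foreground correctly detected
    fp = np.sum(pred & ~gt)     # background wrongly marked as foreground
    fn = np.sum(~pred & gt)     # foreground missed
    tn = np.sum(~pred & ~gt)    # background correctly rejected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    similarity = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    pcc = 100.0 * (tp + tn) / pred.size  # percentage of correct classification
    return {"precision": precision, "recall": recall, "f1": f1,
            "similarity": similarity, "pcc": pcc}
```

For instance, if the ground truth marks the first row of a 4 x 4 frame as foreground and the detector finds three of those four pixels plus one false positive, precision, recall, and F1 are all 0.75, the Jaccard similarity is 0.6, and PCC is 87.5%.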
B. Experiment 2: Evaluation of the Proposed Optimization Approach
To validate the effectiveness of HSAJAYA in solving (12), its results are compared with those of other optimization techniques, namely SA, Teaching Learning Based Optimization (TLBO) [34], JAYA, Elitist-JAYA [35], SAMP-JAYA [36], and Non-dominated Sorting Genetic Algorithm II (NSGA II) [37]. For all simulations, the number of iterations is set to 100 and the population size to 10. Table III shows the performance of the proposed scheme, alongside the other techniques, in minimizing the objective function (12) over the four videos considered, in terms of activity, collision, and temporal consistency costs, along with the execution time. Throughout the simulations, all activities are preserved by setting the length of the synopsis video equal to the length of the longest tube. Hence, the activity cost is zero in all simulation results.

Fig. 4. Convergence characteristics (best cost versus iteration) of SA, TLBO, JAYA, Elitist-JAYA, SAMP-JAYA, NSGA II, and HSAJAYA applied to (a) Video No. 1, (b) Video No. 2, (c) Video No. 3, and (d) Video No. 4.

TABLE III
PERFORMANCE COMPARISON AMONG OPTIMIZATION TECHNIQUES
Video     Optimization   Activity   Collision Cost   Temporal Consistency   Fitness Value   Time of Execution
          Technique      Cost       (×10^3)          Cost                   (×10^3)         (Sec)
Video#1   SA             0          13.77            9.69                   11.19           207.18
          TLBO           0          13.21            8.40                   10.73           94.13
          JAYA           0          13.21            8.40                   10.73           41.28
          Elitist-JAYA   0          13.15            9.09                   10.69           80.65
          SAMP-JAYA      0          13.10            8.95                   10.65           81.52
          NSGA II        0          13.21            8.38                   10.73           47.09
          HSAJAYA        0          12.99            11.72                  10.55           136.47
Video#2   SA             0          212.85           9.77                   172.9           120.39
          TLBO           0          520.75           11.53                  423.01          53.23
          JAYA           0          520.75           11.53                  423.01          26.89
          Elitist-JAYA   0          520.80           10.95                  423.05          52.73
          SAMP-JAYA      0          520.77           10.87                  423.02          54.28
          NSGA II        0          520.75           11.53                  423.01          33.12
          HSAJAYA        0          184.11           9.38                   149.55          89.50
Video#3   SA             0          5.01             36.75                  4.08            1213.3
          TLBO           0          276.72           32.51                  224.78          512.52
          JAYA           0          162.33           48.37                  131.87          269.04
          Elitist-JAYA   0          6.77             40.69                  5.51            538.28
          SAMP-JAYA      0          10.34            38.43                  8.41            540.95
          NSGA II        0          272.19           41.29                  221.10          266.65
          HSAJAYA        0          1.76             55.66                  1.44            882.18
Video#4   SA             0          719.04           1.07                   584.07          8.59
          TLBO           0          722.41           1.10                   586.81          6.70
          JAYA           0          722.41           1.10                   586.81          3.67
          Elitist-JAYA   0          722.41           1.10                   586.81          10.04
          SAMP-JAYA      0          719.04           1.07                   584.07          15.42
          NSGA II        0          722.41           1.10                   586.81          6.42
          HSAJAYA        0          184.35           2.90                   149.74          4.37
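HSAJAYA is described as generating the population with JAYA and using SA for selection. The sketch below shows that idea on a toy box-constrained objective: each individual receives a standard JAYA move (towards the current best, away from the current worst, as in [18]), and a Metropolis test decides whether the candidate replaces its parent. This is a hypothetical illustration, not the authors' Algorithm 1: the function name `hsajaya_sketch`, the continuous toy objective standing in for (12), the initial temperature `t0`, and the geometric cooling factor `alpha` are all assumptions; only the population size of 10 and the 100 iterations follow the settings above.

```python
import math
import random


def hsajaya_sketch(cost, dim, lower, upper, pop_size=10, iters=100,
                   t0=1.0, alpha=0.95, seed=0):
    """Hypothetical JAYA + SA hybrid minimising `cost` over [lower, upper]^dim.

    JAYA generates a candidate for every individual; a Metropolis (SA)
    test decides whether it replaces its parent. t0 and alpha are assumed.
    """
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lower), upper)

    pop = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(pop_size)]
    fit = [cost(x) for x in pop]
    k0 = min(range(pop_size), key=fit.__getitem__)
    best_x, best_f = pop[k0], fit[k0]          # best solution seen so far
    temp = t0
    for _ in range(iters):
        best = pop[min(range(pop_size), key=fit.__getitem__)]
        worst = pop[max(range(pop_size), key=fit.__getitem__)]
        for k in range(pop_size):
            x = pop[k]
            # JAYA move: towards the current best, away from the current worst.
            cand = [clip(x[d]
                         + rng.random() * (best[d] - abs(x[d]))
                         - rng.random() * (worst[d] - abs(x[d])))
                    for d in range(dim)]
            f_cand = cost(cand)
            delta = f_cand - fit[k]
            # SA selection: accept improvements, and worse moves with
            # probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                pop[k], fit[k] = cand, f_cand
                if f_cand < best_f:
                    best_x, best_f = cand, f_cand
        temp *= alpha                           # geometric cooling (assumed)
    return best_x, best_f


x, f = hsajaya_sketch(lambda v: (v[0] - 3.0) ** 2, dim=1,
                      lower=-10.0, upper=10.0)
```

Accepting some worse candidates early lets the population escape local minima, while the cooling schedule makes the search increasingly greedy; the best-so-far record ensures that these uphill moves never lose the best solution found.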
A statistical comparative analysis is carried out to validate the superiority of the proposed scheme over the other meta-heuristic techniques considered, in terms of the best, mean, and worst fitness values, their standard deviation, and the average execution time; the results are given in Table IV. The analysis is drawn over 10 independent runs of each algorithm (SA, TLBO, JAYA, Elitist-JAYA, SAMP-JAYA, NSGA II, and the proposed HSAJAYA), and the best run of each algorithm is used to plot the convergence characteristics for the four videos considered, shown in Fig. 4. In addition, the proposed HSAJAYA algorithm is employed to minimize the objective functions proposed in [6] and [10] (originally solved by SA), and the comparison, based on fitness value and execution time, is illustrated in Table V.

Fig. 5. First row: video synopsis results obtained by [6] for Video 1 (155th-158th frames). Second row: video synopsis results obtained by HSAJAYA for Video 1 (155th-158th frames). Third row: results obtained by [6] for Video 2 (140th-143rd frames). Fourth row: results obtained by HSAJAYA for Video 2 (140th-143rd frames). Fifth row: results obtained by [6] for Video 3 (425th-428th frames). Sixth row: results obtained by HSAJAYA for Video 3 (425th-428th frames). Seventh row: results obtained by [6] for Video 4 (36th-39th frames). Eighth row: results obtained by HSAJAYA for Video 4 (36th-39th frames). Red circles mark collisions in the synopsis obtained by [6], whereas yellow circles mark collisions in the synopsis obtained by HSAJAYA.
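Each row of Table IV summarizes the final fitness values of the 10 independent runs of one algorithm. A minimal sketch of that summary, assuming the final fitness of each run is available as a list of numbers; the helper name `run_statistics` is hypothetical, and because the paper does not state whether the sample or the population standard deviation is reported, the sample estimator used here is an assumption.

```python
import statistics


def run_statistics(fitness_per_run):
    """Best, mean, worst and standard deviation of the final fitness
    values from independent runs of a minimisation algorithm."""
    return {
        "best": min(fitness_per_run),      # lowest cost = best for minimisation
        "mean": statistics.mean(fitness_per_run),
        "worst": max(fitness_per_run),
        # Sample standard deviation (n - 1 denominator); assumed estimator.
        "std": statistics.stdev(fitness_per_run),
    }


summary = run_statistics([10.0, 12.0, 14.0])
```

A low best value with a large standard deviation, as seen for some rows in Table IV, indicates that an algorithm can reach very good solutions but does not do so consistently across runs.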
During the simulations of [6] and [10], the weights associated with the collision and temporal consistency costs are assumed to be the same as those used in (12). Tables III and IV show that the proposed algorithm outperforms the considered optimization techniques in solving the proposed objective function, attaining the lowest best fitness value for every video. Moreover, Table V shows that, when applied to the state-of-the-art optimization frameworks of [6] and [10], the proposed scheme yields lower fitness values and shorter execution times than SA for every video. In general, the proposed hybrid scheme outperforms the traditional optimization algorithms (i.e., SA, TLBO, JAYA, Elitist-JAYA, SAMP-JAYA, and NSGA II) in solving the overall minimization problem for object-based surveillance video synopsis.

TABLE IV
COMPARATIVE STATISTICAL ANALYSIS
Video     Optimization   Best      Mean      Worst     Standard Deviation   Average Execution
          Technique      (×10^3)   (×10^3)   (×10^3)   (×10^3)              Time (Sec)
Video#1   SA             11.19     14.03     19.39     2.86                 202.69
          TLBO           10.73     12.48     14.40     1.53                 96.52
          JAYA           10.73     14.03     20.43     3.57                 44.68
          Elitist-JAYA   10.69     13.86     18.95     3.32                 92.18
          SAMP-JAYA      10.65     13.77     17.43     3.24                 94.54
          NSGA II        10.73     15.02     20.94     4.33                 45.63
          HSAJAYA        10.55     12.83     19.33     3.23                 192.00
Video#2   SA             172.90    276.80    364.36    54.02                117.81
          TLBO           423.01    424.83    440.89    5.64                 51.94
          JAYA           423.01    426.61    442.68    7.64                 26.55
          Elitist-JAYA   423.05    430.07    438.69    7.25                 54.45
          SAMP-JAYA      423.02    428.03    436.75    7.12                 53.78
          NSGA II        423.01    454.01    538.30    41.11                32.96
          HSAJAYA        149.55    221.03    291.42    42.35                91.71
Video#3   SA             4.08      9.18      18.78     5.45                 1150.33
          TLBO           224.78    298.06    341.17    37.12                508.85
          JAYA           131.87    226.47    389.91    89.27                259.01
          Elitist-JAYA   5.51      18.47     23.76     11.54                503.81
          SAMP-JAYA      8.41      25.87     52.65     32.59                509.32
          NSGA II        221.10    303.37    454.78    69.52                270.13
          HSAJAYA        1.44      36.24     115.71    40.90                914.59
Video#4   SA             584.07    584.93    585.78    0.89                 8.81
          TLBO           586.81    586.81    586.81    0.00                 6.44
          JAYA           586.81    586.93    588.03    0.38                 3.41
          Elitist-JAYA   586.81    593.18    640.44    16.77                9.09
          SAMP-JAYA      584.07    584.88    585.78    0.86                 14.59
          NSGA II        586.81    586.81    586.81    0.00                 5.14
          HSAJAYA        149.74    150.78    152.16    1.12                 4.11
Further, to validate the efficacy of the proposed consumer surveillance management system (CSMS), a subjective quality assessment study as in [9] is conducted. Here, 45 subjects are invited to observe the synopses of the experimental videos generated by [6] (left, Screen-1) and by the proposed CSMS (right, Screen-2), placed side by side. Synopsis frames generated by [6] and by the proposed CSMS are depicted in the odd and even rows of Fig. 5, respectively. The subjects are asked to answer “Yes” or “No” to the following questions: 1. Is the synopsis on Screen-2 more comfortable in terms of visualization? 2. Does the synopsis on Screen-1 have fewer observable collisions? 3. Are the objects clearly distinguishable in the synopsis on Screen-2? The original videos are then projected, and the following question is asked: 4. Is the synopsis video on Screen-1 better than that on Screen-2 in correspondence to the original video?
To obtain a visual response rating (VRR), let $R_{ij} = 1$ if the $i$th participant answered “Yes” to the $j$th question, and $R_{ij} = 0$ otherwise. Equation (15) is applied to evaluate the “Yes” rates for the questions, which reflect the participants' opinions:

$$\mathrm{VRR} = \frac{1}{180}\left(\sum_{i=1}^{45}\sum_{j=1}^{4} R_{ij}\right) \times 100\% \qquad (15)$$

where 180 = 45 × 4 is the total number of responses.
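The “Yes” rates for the four questions were 76.8%, 16.4%, 82.3%, and 6.7%, respectively. Questions 2 and 4 are phrased in favor of the synopsis on Screen-1 ([6]), so their low “Yes” rates, together with the high rates for questions 1 and 3, indicate that most users prefer the proposed video synopsis to that of [6].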
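Equation (15) simply counts the “Yes” answers in the participant-by-question response matrix and divides by the total number of responses. The sketch below computes it for an arbitrary matrix (the denominator is 180 for the 45 x 4 study). The helper names are hypothetical, and the per-question rate, the sum of $R_{ij}$ over participants divided by the number of participants, is an assumed reading of how the four separate percentages are obtained.

```python
def visual_response_rating(responses):
    """Equation (15): percentage of "Yes" answers over all participants
    and questions. responses[i][j] is R_ij (1 = "Yes", 0 = "No")."""
    total = sum(len(row) for row in responses)   # 45 x 4 = 180 in the study
    yes = sum(sum(row) for row in responses)
    return 100.0 * yes / total


def per_question_yes_rate(responses):
    """Assumed per-question variant: sum_i R_ij / (number of participants)."""
    n = len(responses)
    return [100.0 * sum(row[j] for row in responses) / n
            for j in range(len(responses[0]))]


# Hypothetical answers of 4 participants to the 4 questions.
answers = [[1, 0, 1, 0],
           [1, 0, 1, 0],
           [1, 1, 1, 0],
           [0, 0, 1, 0]]
overall = visual_response_rating(answers)
per_q = per_question_yes_rate(answers)
```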
TABLE V
PERFORMANCE COMPARISON FOR SOLVING STATE-OF-THE-ART
Video     Optimization   Fitness Value [6]   Time of Execution   Fitness Value [10]   Time of Execution
          Technique      (×10^3)             [6] (Sec)           (×10^3)              [10] (Sec)
Video#1   SA             11.19               207.18              0.666                350.85
          HSAJAYA        10.55               136.47              0.629                216.62
Video#2   SA             172.90              120.39              4.514                194.18
          HSAJAYA        149.55              89.50               3.904                147.20
Video#3   SA             4.08                1213.28             0.248                1866.05
          HSAJAYA        1.44                882.18              0.095                1383.18
Video#4   SA             584.07              8.54                34.760               14.46
          HSAJAYA        149.79              4.37                8.930                6.93
TABLE VI
FRAME-WISE ANALYSIS OF PROCESSING TIME
Video     Optimization   Computational Time [6]   Computational Time [10]
          Technique      (Sec/Frame)              (Sec/Frame)
Video#1   SA             0.3453                   0.5847
          HSAJAYA        0.2274                   0.3610
Video#2   SA             0.1129                   0.1821
          HSAJAYA        0.0839                   0.1380
Video#3   SA             0.4513                   0.6942
          HSAJAYA        0.3281                   0.5145
Video#4   SA             0.0286                   0.0483
          HSAJAYA        0.0142                   0.0231
C. Computational Complexity Analysis
The computational complexity of the proposed algorithm is $O(K^2LWH)$, where $K$ denotes the number of objects in the original video, and $L$, $W$, and $H$ represent the maximum tube length and the bounding-box width and height, respectively. The complexity is determined by the two major steps of Algorithm 1, steps 3 and 10, because the remaining steps either take constant time or iterate a constant number of times. Hence, the total time complexity of the proposed algorithm is $O(K^2LWH)$, which is computationally more efficient than the $O(T^n)$ and $O(T^n + R^n)$ complexities of the schemes in [6] and [10], respectively, where $T$ is the number of suitable temporal shifts of an object, $n$ is the number of objects present in the original video, and $R$ denotes the size of the reduction-coefficient search space. Additionally, the processing time per frame (in seconds) for the considered videos is given in Table VI. For both the formulation of [6] and that of [10], HSAJAYA takes less time per frame than SA.
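To see where a $K^2LWH$ term can come from, consider a collision measure that compares every pair of object tubes frame by frame and pixel by pixel: there are on the order of $K^2$ pairs, at most $L$ shared frames per pair, and $W \times H$ pixels per frame. The sketch below is hypothetical: it uses a plain pixel-overlap count between time-shifted tube masks on a common frame grid rather than the collision term of (12), and it does not claim to reproduce steps 3 and 10 of Algorithm 1. The function name and tube representation are assumptions.

```python
import numpy as np


def pairwise_collision_cost(tubes, shifts):
    """Hypothetical collision measure: total pixel overlap between every
    pair of time-shifted object tubes.

    tubes[k] is a boolean array of shape (L_k, H, W) holding the foreground
    mask of object k in each of its frames, on a common H x W frame grid.
    shifts[k] is the (non-negative) start frame of object k in the synopsis.
    Running time: O(K^2 * L * W * H) for K tubes of at most L frames.
    """
    K = len(tubes)
    cost = 0
    for a in range(K):                    # about K^2 / 2 tube pairs ...
        for b in range(a + 1, K):
            # Synopsis frames in which both tubes are present.
            start = max(shifts[a], shifts[b])
            end = min(shifts[a] + len(tubes[a]), shifts[b] + len(tubes[b]))
            for t in range(start, end):   # ... times at most L shared frames ...
                ma = tubes[a][t - shifts[a]]
                mb = tubes[b][t - shifts[b]]
                cost += int(np.logical_and(ma, mb).sum())  # ... times W*H pixels
    return cost


a = np.ones((3, 4, 4), dtype=bool)
b = np.ones((3, 4, 4), dtype=bool)
overlap = pairwise_collision_cost([a, b], [0, 1])   # two shared frames of 16 pixels
```

By contrast, exhaustively searching all combinations of $T$ temporal shifts for $n$ objects, as the $O(T^n)$ bound of [6] suggests, grows exponentially in the number of objects, whereas the pairwise evaluation above grows only quadratically.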
IV. CONCLUSION
This paper deals with the implementation of a hybrid optimization scheme, HSAJAYA, in the VS framework to make efficient use of consumer electronic devices. During hybridization, the population is generated using JAYA, and SA is utilized for selection. The proposed HSAJAYA algorithm effectively minimizes the objective function in terms of activity loss, collision cost, and temporal consistency cost with reduced computational time, enabling efficient and intelligent applications such as home security. Experimental results and analysis confirm that the proposed scheme outperforms recent optimization techniques. Owing to its reduced computational cost, the proposed technique may be applicable to consumer electronic devices capable of processing video, such as smartphones and tablets.
REFERENCES
[1] W. Lao, J. Han, and P. H. De With, “Automatic video-based human motion analyzer for consumer surveillance system,” IEEE Trans. Consum.
Electron., vol. 55, no. 2, pp. 591–598, May 2009.
[2] S. Javanbakhti, S. Zinger et al., “Fast scene analysis for surveillance
& video databases,” IEEE Trans. Consum. Electron., vol. 63, no. 3, pp.
325–333, Aug. 2017.
[3] R. M. Jiang, A. H. Sadka, and D. Crookes, “Hierarchical video summarization in reference subspace,” IEEE Trans. Consum. Electron., vol. 55,
no. 3, pp. 1551–1557, Aug. 2009.
[4] G. Ciocca and R. Schettini, “Supervised and unsupervised classification
post-processing for visual video summaries,” IEEE Trans. Consum.
Electron., vol. 52, no. 2, pp. 630–638, May 2006.
[5] Y. Gao, W. Wang, and J. Yong, “A video summarization tool using two-level redundancy detection for personal video recorders,” IEEE Trans.
Consum. Electron., vol. 54, no. 2, pp. 521–526, May 2008.
[6] A. Rav-Acha, Y. Pritch, and S. Peleg, “Making a long video short:
Dynamic video synopsis,” in Proc. IEEE Comput. Soc. Conf. Comput.
Vis. Pattern Recognit., New York, NY, USA, Jun. 2006, pp. 435–441.
[7] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by
simulated annealing,” Science, vol. 220, no. 4598, pp. 671–680, May
1983.
[8] Y. Pritch, A. Rav-Acha, and S. Peleg, “Nonchronological video synopsis
and indexing,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11,
pp. 1971–1984, Jan. 2008.
[9] Y. Nie, C. Xiao, H. Sun, and P. Li, “Compact video synopsis via
global spatiotemporal optimization,” IEEE Trans. Vis. Comput. Graphics, vol. 19, no. 10, pp. 1664–1676, Oct. 2013.
[10] X. Li, Z. Wang, and X. Lu, “Surveillance video synopsis via scaling
down objects,” IEEE Trans. Image Process., vol. 25, no. 2, pp. 740–
755, Feb. 2016.
[11] M. Lu, Y. Wang, and G. Pan, “Generating fluent tubes in video synopsis,”
in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Vancouver,
BC, Canada, May 2013, pp. 2292–2296.
[12] R. Zhong, R. Hu, Z. Wang, and S. Wang, “Fast synopsis for moving
objects using compressed video,” IEEE Signal Process. Lett., vol. 21,
no. 7, pp. 834–838, Jul. 2014.
[13] L. Sun, J. Xing, H. Ai, and S. Lao, “A tracking based fast online
complete video synopsis approach,” in Proc. 21st Int. Conf. Pattern
Recognit., Tsukuba, Japan, Nov. 2012, pp. 1956–1959.
[14] W. Lin, Y. Zhang, J. Lu, B. Zhou, J. Wang, and Y. Zhou, “Summarizing surveillance videos with local-patch-learning-based abnormality
detection, blob sequence optimization, and type-based synopsis,” Neurocomputing, vol. 155, pp. 84–98, May 2015.
[15] W. Wang, P. Chung, C. Huang, and W. Huang, “Event based surveillance
video synopsis using trajectory kinematics descriptors,” in Proc. 15th
IAPR Int. Conf. on Mach. Vis. Appl., Nagoya, Japan, May 2017, pp.
250–253.
[16] K. Li, B. Yan, W. Wang, and H. Gharavi, “An effective video synopsis
approach with seam carving,” IEEE Signal Process. Lett., vol. 23, no. 1,
pp. 11–14, Jan. 2016.
[17] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
[18] R. Rao, “Jaya: A simple and new optimization algorithm for solving
constrained and unconstrained optimization problems,” Int. J. Ind. Eng.
Comput., vol. 7, no. 1, pp. 19–34, Jan. 2016.
[19] S. P. Singh, T. Prakash, V. Singh, and M. G. Babu, “Analytic hierarchy
process based automatic generation control of multi-area interconnected
power system using Jaya algorithm,” Eng. Appl. Artif. Intell., vol. 60,
pp. 35–44, Apr. 2017.
[20] J. Yao and J. M. Odobez, “Multi-layer background subtraction based
on color and texture,” in Proc. IEEE Conf. Comp. Vis. Pattern Recog.,
Minneapolis, MN, USA, Jun. 2007, pp. 1–8.
[21] G. Welch and G. Bishop, “An introduction to the Kalman filter,” in Proc.
Annu. Conf. Comput. Graph. Interact. Techn., Los Angeles, CA, USA,
Aug. 2001, pp. 12–17.
[22] T. Yao, M. Xiao, C. Ma, C. Shen, and P. Li, “Object based video
synopsis,” in Proc. IEEE Workshop Adv. Res. Technol. Ind. Appl.,
Ottawa, Canada, Sep. 2014, pp. 1138–1141.
[23] Y. Tian, H. Zheng, Q. Chen, D. Wang, and R. Lin, “Surveillance video
synopsis generation method via keeping important relationship among
objects,” IET Computer Vis., vol. 10, no. 8, pp. 868–872, Jun. 2016.
[24] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and
E. Teller, “Equation of state calculations by fast computing machines,”
J. Chem. Phys., vol. 21, no. 6, pp. 1087–1091, Jun. 1953.
[25] P. Pérez, M. Gangnet, and A. Blake, “Poisson image editing,” ACM
Trans. Graph., vol. 22, no. 3, pp. 313–318, Jul. 2003.
[26] Y. Wang, P. Jodoin, F. Porikli, J. Konrad, Y. Benezeth, and P. Ishwar. (2014, Jun.) CDnet 2014: An expanded change detection benchmark dataset. Columbus, Ohio. CDnet 2014. [Online]. Available: http://changedetection.net/
[27] PETS2001 Dataset. [Online]. Available: ftp://ftp.pets.rdg.ac.uk/pub
[28] LASIESTA Database. [Online]. Available: http://www.gti.ssr.upm.es/data
[29] D. S. Lee, “Effective Gaussian mixture learning for video background subtraction,” IEEE Trans. Pattern Anal. Mach. Intell., no. 5, pp. 827–832, 2005.
[30] M. Hofmann, P. Tiefenbacher, and G. Rigoll, “Background segmentation with feedback: The pixel-based adaptive segmenter,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops, Providence, Rhode Island, USA, Jun. 2012, pp. 38–43.
[31] P. L. St-Charles and G. A. Bilodeau, “Improving background subtraction using local binary similarity patterns,” in Proc. IEEE Winter Conf. Appl. Comput. Vis., Steamboat Springs, CO, Mar. 2014, pp. 509–515.
[32] P. L. St-Charles, G. A. Bilodeau, and R. Bergevin, “Flexible background subtraction with self-balanced local sensitivity,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, Columbus, Ohio, Jun. 2014, pp. 408–413.
[33] D. K. Panda and S. Meher, “A new Wronskian change detection model based codebook background subtraction for visual surveillance applications,” J. Visual Commun. Image Representation, vol. 56, pp. 52–72, Aug. 2018.
[34] R. Rao, V. J. Savsani, and D. P. Vakharia, “Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems,” Comput. Aided Design, vol. 43, no. 3, pp. 303–315, Mar. 2011.
[35] R. Rao and A. Saroj, “Constrained economic optimization of shell-and-tube heat exchangers using elitist-Jaya algorithm,” Energy, vol. 128, no. 1, pp. 785–800, Jun. 2017.
[36] R. Rao and A. Saroj, “A self-adaptive multi-population based Jaya algorithm for engineering optimization,” Swarm Evol. Comput., vol. 37, pp. 1–26, Dec. 2017.
[37] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Trans. Evol. Comput., vol. 6, no. 2, pp. 182–197, Apr. 2002.
Subhankar Ghatak received the B.Sc. (Hons.) degree in mathematics from University of Calcutta,
India, in 2005 and the MCA and M.Tech in Information Technology degrees from the same University,
in 2008 and 2011, respectively. He is currently pursuing
the Ph.D. degree in Computer Science and Engineering at IIIT Bhubaneswar, India. His research
interests include Advanced Image and Video Processing, Computer Vision, Visual Cryptography, and
Steganography.
Suvendu Rup received his M.Tech degree in CSE,
from Jadavpur University, Kolkata, India. He received his Ph.D. degree in CSE from NIT Rourkela,
Odisha, India. Since 2010, he has been with the Department
of CSE, IIIT Bhubaneswar, India, as an Assistant
Professor. His research interests include Image and
Video Processing, Computer Vision, Machine Learning, Distributed Video Coding, and Assistive Computing.
Banshidhar Majhi received his M.Tech degree and
Ph.D. in CSE in the year 1998 and 2003, respectively, from NIT Rourkela, Odisha, India. Presently,
he is serving as Director & Registrar (i/c) of IIITDM
Kancheepuram. His research interests include Image
and Video Processing, Data Compression, Soft Computing, Biometrics, and Network Security.
M.N.S. Swamy (S’59–M’62–SM’74–F’80–LF’01)
received the B.Sc. (Hons.) degree in mathematics
from Mysore University, India, in 1954, the Diploma
degree in ECE from the IISc Bangalore, in 1957 and
the M.Sc. and Ph.D. degrees in EE from the University of Saskatchewan, Saskatoon, Canada, in 1960
and 1963, respectively. He is presently a Research
Professor and holds the Concordia Chair (Tier I) in
Signal Processing in the Department of Electrical
and Computer Engineering at Concordia University,
Montreal, QC, Canada.