Uploaded by SIddhant Shirodkar

Paper Summary

Surveillance videos are a valuable tool for enhancing security, preventing crime, and improving
safety in many settings. They allow specific incidents to be reviewed and analyzed after they
have occurred: the visual evidence they capture can help identify suspects, reconstruct events,
and answer open questions during an investigation. However, if the information in a video
cannot be understood and analyzed, it is difficult to draw accurate conclusions or make
meaningful inferences. The paper proposes an effective method for automatically detecting
anomalies in surveillance videos.
The proposed method uses both normal and anomalous videos to learn anomalies. It uses a
multiple instance ranking framework to automatically predict anomaly scores for the segments of
a video, and it is trained on weakly labeled videos, where the training labels are given at the
video level rather than the clip level. To localize anomalies better during training, the method
introduces sparsity and temporal smoothness constraints into the ranking loss function. Because
this approach needs a large dataset to produce an effective model, the authors also introduce a
new dataset containing 128 hours of video. It consists of 1900 long, untrimmed real-world
surveillance videos covering 13 realistic anomalies, such as fighting, road accidents, burglary,
and robbery, as well as normal activities.
The proposed method works as follows. First, each surveillance video is divided into a fixed
number of segments for training. Each video is then treated as a bag whose instances are its
segments: anomalous videos form positive bags and normal videos form negative bags. Finally,
an anomaly detection model is trained on these bags using the proposed Multiple Instance
Learning (MIL) ranking loss.
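A minimal sketch of the segmentation step, in Python. It assumes each video has already been turned into a list of clip-level feature vectors by some pretrained feature extractor (not shown) and averages those vectors inside each temporal segment. The default of 32 segments and the helper name `segment_features` are assumptions for illustration, not details stated in this summary.

```python
from typing import List

Vector = List[float]


def segment_features(clip_features: List[Vector], num_segments: int = 32) -> List[Vector]:
    """Split one video's clip-level features into `num_segments` temporal
    segments and average the feature vectors inside each segment.

    Hypothetical helper: the clip features are assumed to come from some
    pretrained video feature extractor, which is not shown here.
    """
    num_clips = len(clip_features)
    if num_clips == 0:
        raise ValueError("video has no clips")
    dim = len(clip_features[0])
    segments: List[Vector] = []
    for s in range(num_segments):
        # Proportional boundaries over clip indices; each segment gets >= 1 clip.
        start = (s * num_clips) // num_segments
        end = max(start + 1, ((s + 1) * num_clips) // num_segments)
        chunk = clip_features[start:end]
        # Element-wise mean of the clip vectors that fall in this segment.
        segments.append([sum(clip[d] for clip in chunk) / len(chunk) for d in range(dim)])
    return segments


if __name__ == "__main__":
    # 64 toy clips with 2-D features -> 32 segments, two clips per segment.
    clips = [[float(i), 1.0] for i in range(64)]
    segments = segment_features(clips)
    print(len(segments), segments[0], segments[-1])
```

Fixing the number of segments means every bag has the same number of instances regardless of video length; when a video has fewer clips than segments, neighbouring segments simply reuse the same clip.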
In a standard supervised algorithm such as a Support Vector Machine (SVM), the labels of all
positive and negative examples are available, and the classifier is learned by optimizing an
objective over them. In supervised anomaly detection, however, the classifier would need
temporal annotations for each segment of every video, which are tedious and time-consuming to
obtain. The proposed method therefore assumes that the precise temporal locations of anomalies
are unknown and uses only a video-level label indicating whether an anomaly is present anywhere
in the video. Videos containing an anomaly are labeled positive and videos without one are
labeled negative; each video is then represented as a positive or negative bag whose instances
are its segments.
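The bag construction can be sketched as a small data structure. What it illustrates is that only one label is stored per video; the individual segments carry no labels at all. The names `Bag`, `make_bags`, and the example video ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Bag:
    """One video viewed as a MIL bag: many segment instances, one label."""
    video_id: str
    segments: List[List[float]]  # one feature vector per segment (instance)
    label: int                   # 1 = anomaly somewhere in the video, 0 = normal


def make_bags(
    videos: Iterable[Tuple[str, List[List[float]], bool]]
) -> Tuple[List[Bag], List[Bag]]:
    """Sort (video_id, segments, has_anomaly) triples into positive and negative bags.

    Only the video-level flag is used; no segment-level labels exist anywhere.
    """
    positive: List[Bag] = []
    negative: List[Bag] = []
    for video_id, segments, has_anomaly in videos:
        bag = Bag(video_id, segments, 1 if has_anomaly else 0)
        (positive if has_anomaly else negative).append(bag)
    return positive, negative


if __name__ == "__main__":
    pos, neg = make_bags([
        ("burglary_001", [[0.1], [0.9], [0.2]], True),
        ("normal_001", [[0.2], [0.3], [0.1]], False),
    ])
    print([b.video_id for b in pos], [b.video_id for b in neg])
```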
The authors treat anomaly detection as a regression problem in which anomalous segments should
receive higher anomaly scores than normal segments. A standard ranking loss would directly
encourage higher scores for anomalous segments than for normal ones, but it requires
segment-level annotations. Since such annotations are missing, the method instead proposes a
multiple instance ranking objective function in which the maximum score is taken over all video
segments in each bag.
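For contrast, a standard segment-level ranking loss in hinge form, max(0, 1 - f(V_a) + f(V_n)), needs to know which segments are anomalous. The sketch below assumes such annotations exist and averages the hinge over every anomalous/normal segment pair; the margin of 1 and the function names are illustrative assumptions.

```python
from typing import Sequence


def segment_ranking_loss(score_anomalous: float, score_normal: float, margin: float = 1.0) -> float:
    """Hinge ranking loss for one annotated (anomalous, normal) segment pair."""
    return max(0.0, margin - score_anomalous + score_normal)


def supervised_ranking_loss(anomalous_scores: Sequence[float], normal_scores: Sequence[float]) -> float:
    """Average hinge over every anomalous/normal segment pair.

    This only works if we already know which segments are anomalous,
    i.e. it needs exactly the segment-level annotations the paper avoids.
    """
    pairs = [(a, n) for a in anomalous_scores for n in normal_scores]
    return sum(segment_ranking_loss(a, n) for a, n in pairs) / len(pairs)
```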
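The segment with the highest anomaly score in the positive bag is most likely to be the true
positive instance, that is, the segment that actually contains the anomaly. The segment with the
highest anomaly score in the negative bag is the one that looks most similar to an anomalous
segment but is actually a normal instance. The ranking loss is therefore defined on these two
segments:

$$l(\mathcal{B}_a, \mathcal{B}_n) = \max\left(0,\; 1 - \max_{i \in \mathcal{B}_a} f(\mathcal{V}_a^i) + \max_{i \in \mathcal{B}_n} f(\mathcal{V}_n^i)\right)$$

where $\mathcal{B}_a$ and $\mathcal{B}_n$ are the positive (anomalous) and negative (normal) bags,
$\mathcal{V}_a$ and $\mathcal{V}_n$ represent anomalous and normal video segments, and
$f(\mathcal{V}_a)$ and $f(\mathcal{V}_n)$ are the corresponding predicted scores.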
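A sketch of the multiple instance ranking loss for one positive and one negative bag, assuming the hinge form max(0, 1 - max_i f(V_a^i) + max_i f(V_n^i)) with a margin of 1. The scores are plain Python floats produced by some scoring model; `mil_ranking_loss` is a hypothetical name.

```python
from typing import Sequence


def mil_ranking_loss(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """MIL ranking hinge loss for one positive bag and one negative bag.

    Only the highest-scoring segment of each bag is compared: the top positive
    segment should outscore the top negative segment by a margin of 1.
    """
    return max(0.0, 1.0 - max(pos_scores) + max(neg_scores))


if __name__ == "__main__":
    positive_bag = [0.1, 0.2, 0.95, 0.3]   # scores f(V_a^i) for an anomalous video
    negative_bag = [0.05, 0.4, 0.1, 0.2]   # scores f(V_n^i) for a normal video
    print(mil_ranking_loss(positive_bag, negative_bag))
```

For these scores the loss is about 0.45, and it depends only on segment 2 of the positive bag and segment 1 of the negative bag (zero-based); the other segments do not affect it, which is how the method copes with missing segment labels.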
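A limitation of this loss is that it ignores the temporal structure of anomalous videos. First,
an anomaly usually occupies only a short part of a video, so the scores of instances in an
anomalous bag should be sparse: only a few segments should receive high scores. Second, since a
video is a sequence of segments, the anomaly score should vary smoothly between adjacent
segments. Therefore, temporal smoothness is imposed by minimizing the difference between the
scores of temporally adjacent segments, and sparsity is imposed by minimizing the sum of the
scores in the anomalous bag. This gives the loss

$$l(\mathcal{B}_a, \mathcal{B}_n) = \max\left(0,\; 1 - \max_{i \in \mathcal{B}_a} f(\mathcal{V}_a^i) + \max_{i \in \mathcal{B}_n} f(\mathcal{V}_n^i)\right) + \lambda_1 \underbrace{\sum_{i}^{n-1} \left(f(\mathcal{V}_a^i) - f(\mathcal{V}_a^{i+1})\right)^2}_{(1)} + \lambda_2 \underbrace{\sum_{i}^{n} f(\mathcal{V}_a^i)}_{(2)}$$

where term (1) is the temporal smoothness term, term (2) is the sparsity term, $n$ is the number
of segments in a bag, and $\lambda_1$ and $\lambda_2$ weight the two terms.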
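A sketch of the constrained loss for one pair of bags: the MIL hinge term, plus lambda1 times the sum of squared differences between adjacent scores in the anomalous bag (temporal smoothness), plus lambda2 times the sum of its scores (sparsity). The default weights of 8e-5 are illustrative assumptions, and `constrained_mil_loss` is a hypothetical name.

```python
from typing import Sequence


def constrained_mil_loss(
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    lambda1: float = 8e-5,  # illustrative weight for temporal smoothness
    lambda2: float = 8e-5,  # illustrative weight for sparsity
) -> float:
    """MIL ranking loss plus smoothness and sparsity terms on the positive bag."""
    ranking = max(0.0, 1.0 - max(pos_scores) + max(neg_scores))
    # (1) temporal smoothness: squared differences between adjacent segment scores
    smoothness = sum(
        (pos_scores[i] - pos_scores[i + 1]) ** 2 for i in range(len(pos_scores) - 1)
    )
    # (2) sparsity: total score of the anomalous bag, so few segments should be high
    sparsity = sum(pos_scores)
    return ranking + lambda1 * smoothness + lambda2 * sparsity


if __name__ == "__main__":
    print(constrained_mil_loss([0.0, 1.0, 0.0], [0.5, 0.5, 0.5], lambda1=1.0, lambda2=1.0))
```

Both extra terms act only on the anomalous bag: the sparsity term pushes most of its segment scores towards zero, while the smoothness term penalizes isolated spikes such as the one in `[0.0, 1.0, 0.0]`.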
By training on a large number of positive and negative bags, the authors expect the network to
learn a generalized model that predicts high scores for anomalous segments in positive bags.
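The scoring function f can be any model that maps a segment feature vector to a score. As an assumption, the sketch below uses a small fully connected network with one ReLU hidden layer and a sigmoid output, so every score lies between 0 and 1. The layer sizes, the initialization, and the `SegmentScorer` class are hypothetical, and training (for example, gradient descent on the bag loss over many bag pairs) is not shown.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    # Small random weights; the last entry of each row is the bias.
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]


def dense(x: List[float], layer: Matrix) -> List[float]:
    # Affine map: each output is w . x + b.
    return [sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1] for row in layer]


class SegmentScorer:
    """Hypothetical scorer f: segment feature vector -> anomaly score in (0, 1)."""

    def __init__(self, dim: int, hidden: int = 32, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_layer = init_layer(dim, hidden, rng)
        self.output_layer = init_layer(hidden, 1, rng)

    def __call__(self, x: List[float]) -> float:
        h = [max(0.0, v) for v in dense(x, self.hidden_layer)]  # ReLU
        z = dense(h, self.output_layer)[0]
        return 1.0 / (1.0 + math.exp(-z))                       # sigmoid

    def score_bag(self, segments: List[List[float]]) -> List[float]:
        # One score per segment (instance) in the bag.
        return [self(s) for s in segments]


if __name__ == "__main__":
    scorer = SegmentScorer(dim=4)
    print(scorer.score_bag([[0.1] * 4, [0.5] * 4, [1.0] * 4]))
```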
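Finally, the complete objective function is given by

$$\mathcal{L}(\mathcal{W}) = l(\mathcal{B}_a, \mathcal{B}_n) + \lVert \mathcal{W} \rVert_F$$

where $\mathcal{W}$ represents the model weights and $\lVert \cdot \rVert_F$ is the Frobenius norm.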
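A sketch of the complete objective, assuming it is the bag loss plus the Frobenius norm of the model weights, computed here jointly over all weight matrices as the square root of the sum of their squared entries. Taking the norm jointly rather than per layer is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def frobenius_norm(matrices: Sequence[Matrix]) -> float:
    """||W||_F over all weight matrices: sqrt of the sum of squared entries."""
    return math.sqrt(sum(w * w for m in matrices for row in m for w in row))


def total_objective(bag_loss: float, weights: Sequence[Matrix]) -> float:
    """Bag loss l(B_a, B_n) plus the Frobenius-norm penalty on the weights."""
    return bag_loss + frobenius_norm(weights)


if __name__ == "__main__":
    print(total_objective(0.5, [[[3.0, 4.0]]]))
```

The norm term keeps the weights small, which discourages the network from fitting the few top-scoring segments of each training bag too closely.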
The proposed method is evaluated in terms of the dataset used, the experiments performed, and
comparisons with other methods. The dataset is divided into two parts: a training set of 800
normal and 810 anomalous videos, and a testing set of the remaining 150 normal and 140
anomalous videos. The experiments show that the proposed method outperforms the existing
methods it is compared against.
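As a quick consistency check on the split, the numbers add up to the 1900 videos in the dataset, with equal numbers of normal and anomalous videos:

$$
\begin{aligned}
\text{training: } & 800 + 810 = 1610 \\
\text{testing: } & 150 + 140 = 290 \\
\text{total: } & 1610 + 290 = 1900 \\
\text{normal: } & 800 + 150 = 950, \quad \text{anomalous: } 810 + 140 = 950
\end{aligned}
$$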
In conclusion, the paper presents a deep learning approach for detecting real-world anomalies in
surveillance videos. Using video-level labels rather than clip-level annotations, enforcing
temporal smoothness and sparsity, and training a deep multiple instance ranking framework on
weakly labeled data allow the proposed method to outperform the other methods. The method is a
valuable contribution to the analysis of surveillance videos.