Surveillance video is a valuable tool for enhancing security, deterring crime, and improving safety in many settings. Recorded footage lets investigators review and analyze an incident after it has occurred: the visual evidence can help identify suspects, reconstruct events, and resolve questions that came up during an investigation. That evidence is only useful, however, if the footage can actually be interpreted; without a way to comprehend and analyze what the video shows, it is difficult to draw accurate conclusions or make meaningful inferences.
The paper proposes a method for anomaly detection that addresses this need. The method learns anomalies from both normal and anomalous videos and uses a multiple instance ranking framework to predict anomaly scores automatically. It is trained on weakly labeled videos, meaning that the training labels are given at the video level rather than the clip level. To localize anomalies better during training, the method adds sparsity and temporal smoothness constraints to the ranking loss function.
Training an effective model in this way requires a large dataset, so the paper uses one containing 128 hours of video. It consists of 1900 long, untrimmed real-world surveillance videos covering 13 realistic anomalies, such as fighting, road accidents, burglary, and robbery, as well as normal activities.
The method proceeds in three steps. First, each surveillance video is divided into a fixed number of segments for training. Second, the segments of each video are treated as the instances of a bag: the segments of an anomalous video form a positive bag, and the segments of a normal video form a negative bag. Third, the anomaly detection model is trained with the proposed Multiple Instance Learning (MIL) ranking loss.
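The first two steps, splitting a video into a fixed number of segments and grouping them into a labeled bag, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `split_into_segments` and `make_bag` are hypothetical, plain integers stand in for frames, and the segment count of 4 is an arbitrary choice for the example.

```python
from typing import List, Sequence, Tuple


def split_into_segments(frames: Sequence, num_segments: int) -> List[list]:
    """Split a frame sequence into `num_segments` contiguous, near-equal parts."""
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    n = len(frames)
    if n < num_segments:
        raise ValueError("video has fewer frames than requested segments")
    # Integer boundaries chosen so segment lengths differ by at most one frame.
    bounds = [(k * n) // num_segments for k in range(num_segments + 1)]
    return [list(frames[bounds[k]:bounds[k + 1]]) for k in range(num_segments)]


def make_bag(frames: Sequence, video_is_anomalous: bool,
             num_segments: int) -> Tuple[List[list], int]:
    """Return (instances, bag_label); label 1 is a positive bag, 0 a negative bag.

    Only the video-level label is stored. No segment carries its own label,
    which is what makes the supervision weak.
    """
    return split_into_segments(frames, num_segments), int(video_is_anomalous)


# Toy usage: integers stand in for frames (assumption for illustration only).
anomalous_bag, anomalous_label = make_bag(range(10), True, num_segments=4)
normal_bag, normal_label = make_bag(range(8), False, num_segments=4)
```

Every video yields the same number of instances regardless of its length, so bags from long and short videos can be compared directly during training.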
In a standard supervised algorithm such as a Support Vector Machine (SVM), the labels of all positive and negative examples are available, and the classifier is learned by optimizing an objective over them. For supervised anomaly detection, this means the classifier needs temporal annotations for each segment of every video, which are laborious and time-consuming to produce. The proposed method therefore does not assume that the precise temporal location of an anomaly is known; it uses only a video-level label indicating whether an anomaly is present anywhere in the video. A video containing an anomaly is labeled positive and a video with no anomaly is labeled negative, and each video is represented as a positive or negative bag whose instances are its segments.
The authors treat anomaly detection as a regression problem in which anomalous segments should receive higher anomaly scores than normal segments. With segment-level labels, this would be expressed as a ranking constraint that encourages high scores for anomalous segments, $f(V_a) > f(V_n)$, where $V_a$ and $V_n$ represent anomalous and normal video segments and $f(V_a)$ and $f(V_n)$ represent the corresponding predicted scores. Since segment-level annotations are missing, the method instead proposes a multiple instance ranking objective in which the maximum is taken over all video segments in each bag. The segment with the highest anomaly score in the positive bag is most likely to be the true positive instance. The segment with the highest anomaly score in the negative bag is the one that looks most similar to an anomalous segment but is actually normal, that is, a hard negative. Writing $B_a$ and $B_n$ for the positive and negative bags and $V_a^i$, $V_n^i$ for their $i$-th segments, the ranking loss in hinge form is
$$l(B_a, B_n) = \max\left(0,\; 1 - \max_{i \in B_a} f(V_a^i) + \max_{i \in B_n} f(V_n^i)\right).$$
This loss has a limitation: it ignores the temporal structure of the video. First, an anomaly typically occupies only a few segments of a video, so the scores of the instances in an anomalous bag should be sparse. Second, since a video is a sequence of segments, the anomaly score should vary smoothly between adjacent segments.
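The multiple instance ranking objective described above compares only the top-scoring segment of each bag. The sketch below illustrates that idea under explicit assumptions: the per-segment scores are taken as given (in practice they would come from a scoring model), the ranking is enforced with a hinge loss whose margin defaults to 1, and the function name `mil_ranking_loss` and the toy scores are hypothetical.

```python
from typing import Sequence


def mil_ranking_loss(pos_scores: Sequence[float],
                     neg_scores: Sequence[float],
                     margin: float = 1.0) -> float:
    """Hinge ranking loss between the top-scoring instance of each bag.

    pos_scores: anomaly scores of the segments in a positive (anomalous) bag.
    neg_scores: anomaly scores of the segments in a negative (normal) bag.
    """
    top_pos = max(pos_scores)  # most likely the true anomalous segment
    top_neg = max(neg_scores)  # hardest normal segment (looks most anomalous)
    # Zero loss only when the top positive beats the top negative by `margin`.
    return max(0.0, margin - top_pos + top_neg)


# Toy scores in [0, 1], chosen for illustration only.
loss_overlapping = mil_ranking_loss([0.25, 0.75, 0.5], [0.25, 0.0, 0.125])
loss_separated = mil_ranking_loss([0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
```

Because only the two maxima enter the loss, the gradient in a trainable version would flow through one segment per bag, which is how the method gets by without segment-level labels.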
Temporal smoothness is therefore imposed between the anomaly scores of temporally adjacent video segments by minimizing the difference between their scores, and sparsity is encouraged by penalizing the sum of the scores in the anomalous bag. This gives the loss
$$l(B_a, B_n) = \max\left(0,\; 1 - \max_{i \in B_a} f(V_a^i) + \max_{i \in B_n} f(V_n^i)\right) + \lambda_1 \sum_{i=1}^{n-1} \left(f(V_a^i) - f(V_a^{i+1})\right)^2 + \lambda_2 \sum_{i=1}^{n} f(V_a^i),$$
where $B_a$ and $B_n$ are the anomalous and normal bags, $f(V_a^i)$ and $f(V_n^i)$ are the predicted scores of their $i$-th segments, $n$ is the number of segments per video, the term weighted by $\lambda_1$ is the temporal smoothness term, and the term weighted by $\lambda_2$ is the sparsity term. By training on a large number of positive and negative bags, the authors expect the network to learn a generalized model that predicts high scores for anomalous segments in positive bags. The complete objective function is
$$L(W) = l(B_a, B_n) + \lVert W \rVert_F,$$
where $W$ represents the model weights.
To evaluate the proposed method, the paper describes the dataset used, the experiments, and comparisons with other methods. The dataset is divided into a training set of 800 normal and 810 anomalous videos and a test set of the remaining 150 normal and 140 anomalous videos. In the experiments, the proposed method outperforms the existing methods it is compared against.
In conclusion, the paper presents a deep learning approach for detecting real-world anomalies in surveillance videos. Using video-level labels rather than clip-level labels, enforcing temporal smoothness and sparsity, and training a deep multiple instance ranking framework on weakly labeled data together improve the proposed method over other methods. This method could be a valuable contribution to the analysis of surveillance videos.
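As a closing illustration, the loss with its temporal smoothness and sparsity terms can be sketched in a few lines. This is a hedged sketch rather than the authors' code: it assumes a hinge-style ranking term with margin 1, takes per-segment scores as given, and uses the hypothetical names `lambda_smooth` and `lambda_sparse` for the weights of the two added terms. The weight values in the usage example are arbitrary, and the regularization on the model weights W is omitted because no model is defined here.

```python
from typing import List


def mil_loss_with_constraints(pos_scores: List[float],
                              neg_scores: List[float],
                              lambda_smooth: float,
                              lambda_sparse: float,
                              margin: float = 1.0) -> float:
    """Ranking loss plus smoothness and sparsity penalties on the positive bag."""
    # Ranking term: top positive segment should outscore top negative segment.
    hinge = max(0.0, margin - max(pos_scores) + max(neg_scores))
    # Smoothness term: squared score differences between adjacent segments.
    smoothness = sum((a - b) ** 2 for a, b in zip(pos_scores, pos_scores[1:]))
    # Sparsity term: sum of scores, pushing most segments toward zero.
    sparsity = sum(pos_scores)
    return hinge + lambda_smooth * smoothness + lambda_sparse * sparsity


# Toy scores and weights, chosen for illustration only.
total = mil_loss_with_constraints(
    pos_scores=[0.0, 0.5, 1.0, 0.5],
    neg_scores=[0.25, 0.0, 0.25, 0.0],
    lambda_smooth=0.5,
    lambda_sparse=0.25,
)
```

Both penalties apply only to the positive bag: the smoothness term discourages abrupt jumps between neighboring segments, while the sparsity term reflects the expectation that only a few segments of an anomalous video are actually anomalous.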