International Journal of Engineering Trends and Technology (IJETT) – Volume 15 Number 2 – Sep 2014

An Interactive Video Surveillance System for Detection and Classification of Moving Vehicles

S. B. Kulkarni#1, Sharada K. S#2, U. P. Kulkarni#3, Nagaraj M. Benakanahalli#4
#1 Professor, Department of Computer Science & Engineering, SDMCET, Dharwad, Karnataka, India
#2 Lecturer, Department of Computer Science & Engineering, SKSVMACET, Laxmeshwar, Karnataka, India
#3 Professor, Department of Computer Science & Engineering, SDMCET, Dharwad, Karnataka, India
#4 Asst. Professor, Department of Automation and Robotics, BVBCET, Hubli, Karnataka, India

Abstract — Traffic control and monitoring are major issues today, as the number of vehicles has increased tremendously. It is therefore imperative to develop an efficient traffic-control mechanism that knows how many vehicles are moving on the highways; the categories of those vehicles are equally important information for the traffic authority monitoring them. The main objective of this work is to develop an interactive system that counts and classifies vehicles for better traffic management. The proposed algorithm detects moving vehicles on the road, classifies them into light and heavy vehicles, interacts with the user to tag newly detected vehicles, and produces the count of vehicles passing on the highway. The system accepts an integer tag of 0 for light vehicles and 1 for heavy vehicles when a new vehicle type is detected. In this paper the pre-processed video is used as input, the improved adaptive Gaussian mixture model is used for background subtraction and detection of vehicles, and the design of the interactive system then proceeds with the extraction of geometrical features for each detected vehicle.
Based on the geometrical features extracted, ratios are calculated, and a reference ratio is set for the corresponding vehicle type. When a vehicle of a different type is detected, its ratios are compared with the stored reference ratios, the vehicle is classified, and the count for the respective vehicle type is updated.

Keywords — Background subtraction, improved adaptive Gaussian mixture model, geometrical features, reference ratio

ISSN: 2231-5381

I. INTRODUCTION

Vehicle detection and classification is a popular research area nowadays. The increasing population and the growth of the transport system have resulted in huge usage of vehicles of different types, and the increased number of vehicles on the roads has raised the traffic over the highways. To control traffic at signals, video surveillance is used, which needs manual intervention to fetch information about the number of vehicles moving at a particular location. When the problem of video surveillance and monitoring is considered, the position of the camera should not affect the working of the system. An efficient system should be designed in such a way that it is not affected by the visual field or by whatever lighting effects occur. It is desirable that the system adapts to gradual changes in the appearance of the environment, as the light intensity typically varies through the day. It should also be capable of dealing with movement through cluttered areas, objects overlapping in the visual field, shadows, lighting changes, effects of moving elements of the scene, slow-moving objects, and objects being introduced into or removed from the scene. Traditional approaches based on backgrounding methods typically fail in these general situations. Within the tasks of a surveillance system, target detection is a special form of dimensionality reduction, and finding moving objects in image sequences is an important task in computer vision.
The numerous approaches to this problem vary in the type of background model used and in the procedure used to update the model. One such method to detect the target in video surveillance is background subtraction, one of the most widely used techniques to detect moving objects in a static scene [1]. Since background subtraction is a low-level task, it should consider two aspects: accuracy and computational resources (time and memory). First, accuracy is critical because the output of background subtraction is used by higher-level tasks such as tracking, recognition and classification. Second, the computational resources used for background subtraction are critical, since the resources remaining after this low-level task should be available for the high-level tasks; low resource usage also makes it feasible to implement the task in real-time embedded systems such as smart cameras. Therefore, it is important for a background subtraction method to obtain high accuracy and low resource requirements at the same time. To make the proposed system robust, two major steps are considered. The first is background modeling, which results in foreground detection; the improved adaptive Gaussian mixture model is used for the background modeling. The second step is classification of the detected vehicles, which involves geometrical feature extraction. The resulting classification system aims at a simple design that detects the vehicles on roads and classifies them into light and heavy vehicles together with the count.

II. RELATED WORK

Detection, counting and classification of vehicles in a video have become a potential area of research due to numerous applications in video-based intelligent transportation systems [2]. Vehicle type recognition is a relatively new research domain, which has begun to interest various laboratories around the world.
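The accuracy-versus-resources trade-off above is easiest to see on the simplest possible background-subtraction scheme: a running-average background per pixel. The sketch below is purely illustrative (the paper itself uses the improved adaptive Gaussian mixture model described later); the pixel values, learning rate and threshold are assumptions for demonstration.

```python
# Minimal background subtraction on a single gray-level pixel stream:
# keep a running-average background and flag the pixel as foreground
# when it deviates from the background by more than a threshold.

def subtract_background(frames, alpha=0.05, threshold=25):
    """Return one boolean per frame: True = the pixel looks like foreground."""
    background = float(frames[0])      # initialise from the first frame
    masks = []
    for value in frames:
        masks.append(abs(value - background) > threshold)
        # blend the new frame in slowly, so static scenery dominates
        background = (1 - alpha) * background + alpha * value
    return masks

# A pixel showing static road (~100) except while a bright vehicle
# (~200) passes through frames 4-6:
stream = [100, 101, 99, 200, 198, 201, 100, 100]
masks = subtract_background(stream)
print(masks)
```

The single learning rate `alpha` here is exactly what the improved adaptive GMM generalizes: instead of one running mean per pixel, it maintains a mixture of Gaussians with adaptively updated weights.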
Vehicle classification is very important, as accurate identification of the moving target is the basis for subsequent traffic-management analysis. At present, many approaches have been proposed to automatically classify vehicles in images and videos. A. J. Lipton et al. [3] proposed an approach based purely on a target's shape. R. T. Collins et al. [4] used viewpoint-specific neural networks to classify moving objects as people or vehicles, using spatial features such as image-blob dispersedness (perimeter²/area), image-blob area, the apparent aspect ratio of the blob bounding box and the camera zoom; this method is based on training, uses many features and has high complexity. The work of A. Selinger and L. Wixson [5] and of Javed and Shah [6] presented methods for classifying a moving object as rigid or non-rigid based on the similarity of its appearance over multiple frames; when morphological operations are applied to the blobs to improve segmentation, this method may not be reliable. R. Cutler and L. S. Davis [7] described techniques that use image similarity to detect and analyse periodic motion, which may not work when the motion is non-periodic. Y. Bogomolov et al. [8] used hybrid features combining motion and appearance. L. B. Chen et al. [9] used object size, velocity, location and differences of Histograms of Oriented Gradients to separate people from vehicles. These approaches use complicated features for classification based on tracking results and need the size feature to be calibrated, which makes them hard to transplant to other scenes.

In order to classify the objects, background modeling is needed first. Background modeling is generally done by analysing some regular statistical characteristics, and an object is then detected by comparing the current frame with the modeled background. Though it sounds simple, this technique is inadequate when dealing with complex environments. Most of the available non-adaptive background subtraction methods cannot be used in surveillance because they require manual initialization, and in surveillance it is not possible to have a pre-defined background model beforehand. It is therefore very important that the background model is adaptive and robust for detecting the objects in the foreground.

III. PROPOSED METHODOLOGY

This research work presents an efficient algorithm that performs two tasks, namely vehicle detection and vehicle classification: it detects the presence of a vehicle in the video and classifies it into one of the known classes. The proposed methodology involves several steps, as shown in Fig. 1. The work starts with pre-processing, where the true-colour input image is converted into a gray-level image, and then proceeds with background subtraction. Many approaches have been developed for background subtraction, as discussed in the survey above. For the proposed work, the improved adaptive Gaussian mixture model is chosen so as to overcome problems with lighting conditions: illumination results from the lighting conditions present when the image is captured, and can change when those conditions change, whereas reflectance results from the way the objects in the image reflect light, and is determined by the intrinsic properties of the objects themselves, which do not change. For our problem of eliminating apparent changes in image reflectance, the improved adaptive Gaussian mixture model [10] is applied to achieve foreground detection. Once the foreground is detected, ratios are calculated and a corresponding reference ratio is set for each detected vehicle type. Each detected vehicle is then tagged into one of the vehicle types, such as light and heavy vehicles, and the proposed system provides an interactive environment to perform this detection dynamically based on tags given by the user, i.e., 0 for light vehicles and 1 for heavy vehicles.

Fig. 1 Flow chart for vehicle classification: Input Video → Pre-processing → Foreground vehicle detection using IAGMM background modeling → Geometrical feature extraction from the detected vehicle → Determine Ratio (DR) → Receive tags from the user (0 = light vehicle, 1 = heavy vehicle) and set the respective Reference Ratio (RR) for light and heavy vehicles → if DR = RR, classify the vehicle as light or heavy; otherwise store the new RR.

A. Pre-processing

The proposed methodology uses video of vehicles moving on road segments, captured by a camera placed at an elevation on the road side. The initial task of vehicle classification is to convert the true-colour input image into a gray-level image. The input frame I = [I_R I_G I_B]^T is a combination of the three colour components Red, Green and Blue. The input colour frame is converted into a gray-level image I_G according to a luminance converter [11] so as to obtain enhanced images.

B. Background Modeling using the Improved Adaptive Gaussian Mixture Model

Finding moving objects in video sequences is one of the most important tasks in surveillance. For many years, the obvious approach has been first to compute a stationary background image, and then to identify the target as those pixels in the image that vary significantly from the background. The numerous approaches to this problem vary in the type of background model used and the procedure used to update the model.
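The pre-processing step can be sketched as follows. The exact coefficients of the luminance converter [11] are not reproduced in the text, so the widely used ITU-R BT.601 luma weights are assumed here.

```python
# Sketch of the pre-processing step: convert an RGB frame I = [IR IG IB]^T
# to a gray-level image IG. The 0.299/0.587/0.114 weights are the common
# ITU-R BT.601 luma coefficients (an assumption; the paper's converter
# [11] may use different weights).

def rgb_to_gray(frame):
    """frame: list of rows, each row a list of (R, G, B) tuples in 0..255."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in frame]

# A tiny 2x2 frame: pure red, green, blue and white pixels.
frame = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
gray = rgb_to_gray(frame)
print(gray)
```

Note how the green channel dominates the gray value, matching the eye's sensitivity, while white maps to full intensity.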
There are several popular approaches to background subtraction, such as the basic GMM [15], but the results obtained with that algorithm had very high false positives and some false negatives: the foreground object was registered, but the large number of false positives made it very difficult to extract, and morphological operations such as erosion and dilation also failed to reduce them. Even when there was no foreground object in the scene, the false positives in the resulting background-subtraction frame after GMM-based classification were very high. To overcome this problem the proposed methodology uses the improved adaptive Gaussian mixture model for background subtraction.

In order to choose the background model, it is important to know which approach achieves foreground detection with good accuracy. The proposed method uses the improved adaptive Gaussian mixture model given by Z. Zivkovic [10], which is an extension of the basic GMM approach. A Gaussian mixture model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities [13]. GMMs are commonly used as a parametric model of the probability distribution of continuous measurements; their parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm, or by maximum a-posteriori estimation from a well-trained prior model. As discussed in the related work, the basic GMM fails to adapt to illumination changes and to identify the foreground object; to resolve these problems the improved adaptive Gaussian mixture model is chosen, and it is discussed below.

The enhanced images are fed as input to construct the background model. This paper uses the improved adaptive Gaussian mixture model approach [10] to build a background model for tracking the objects; it helps to create foreground frames for the subsequent operations of feature extraction and classification. In this setting, the illumination in the scene can change suddenly, a new object can be brought into the scene, or a present object can be removed from it. In order to adapt to such changes, the training set is updated by adding new samples and discarding the old ones. Choosing a reasonable time period T, at time t the training set is given by equation (1):

\chi_T = \{ x^{(t)}, \ldots, x^{(t-T)} \}    (1)

For each new sample we update the training data set \chi_T and re-estimate the density given by equation (2):

\hat{p}(x \mid \chi_T, BG)    (2)

However, among the samples there could be some values that belong to foreground objects, so we should denote this estimate as in equation (3):

\hat{p}(x \mid \chi_T, BG + FG)    (3)

We use a GMM with M components, given by equation (4):

\hat{p}(x \mid \chi_T, BG + FG) = \sum_{m=1}^{M} \hat{\pi}_m \, \mathcal{N}(x; \hat{\mu}_m, \hat{\sigma}_m^2 I)    (4)

where \hat{\mu}_1, \ldots, \hat{\mu}_M are the estimates of the means and \hat{\sigma}_1, \ldots, \hat{\sigma}_M are the estimates of the variances that describe the Gaussian components. The covariance matrices are assumed to be diagonal, and the identity matrix I has the proper dimensions. The mixing weights, denoted by \hat{\pi}_m, are non-negative and add up to one. Given a new data sample x^{(t)} at time t, the recursive update equations are given by (5), (6) and (7):

\hat{\pi}_m \leftarrow \hat{\pi}_m + \alpha \left( o_m^{(t)} - \hat{\pi}_m \right)    (5)

\hat{\mu}_m \leftarrow \hat{\mu}_m + o_m^{(t)} (\alpha / \hat{\pi}_m) \, \delta_m    (6)

\hat{\sigma}_m^2 \leftarrow \hat{\sigma}_m^2 + o_m^{(t)} (\alpha / \hat{\pi}_m) \left( \delta_m^T \delta_m - \hat{\sigma}_m^2 \right)    (7)

where \delta_m = x^{(t)} - \hat{\mu}_m. Instead of the time interval T mentioned above, the constant \alpha describes an exponentially decaying envelope that is used to limit the influence of the old data, with approximately \alpha = 1/T. For a new sample, the ownership o_m^{(t)} is set to 1 for the 'close' component with the largest \hat{\pi}_m, and the ownerships of the other components are set to zero. The squared distance from the m-th component is calculated as shown in equation (8):

D_m^2(x^{(t)}) = \delta_m^T \delta_m / \hat{\sigma}_m^2    (8)

If there are no 'close' components, a new component is generated with equations (9) and (10):

\hat{\pi}_{M+1} = \alpha, \quad \hat{\mu}_{M+1} = x^{(t)}    (9)

\hat{\sigma}_{M+1} = \sigma_0    (10)

where \sigma_0 is some appropriate initial variance. If the maximum number of components is reached, we discard the component with the smallest \hat{\pi}_m.

The foreground objects will be represented by some additional clusters with small weights \hat{\pi}_m. Therefore, we approximate the background model by the first B largest clusters, given by equation (11):

\hat{p}(x \mid \chi_T, BG) \sim \sum_{m=1}^{B} \hat{\pi}_m \, \mathcal{N}(x; \hat{\mu}_m, \hat{\sigma}_m^2 I)    (11)

If the components are sorted to have descending weights, then

B = \arg\min_b \left( \sum_{m=1}^{b} \hat{\pi}_m > (1 - c_f) \right)    (12)

where c_f is a measure of the maximum portion of the data that can belong to foreground objects without influencing the background model. For example, if a new object comes into the scene and remains static for some time, it will probably generate an additional stable cluster. Since the old background is occluded, the weight of the new cluster will be constantly increasing. If the object remains static long enough, its weight becomes larger than c_f and it can be considered part of the background. By looking at equation (5), we can conclude that the object should be static for approximately \log(1 - c_f)/\log(1 - \alpha) frames.

The weight \hat{\pi}_m describes how much of the data belongs to the m-th component of the GMM. It can be regarded as the probability that a sample comes from the m-th component, and in this way the \hat{\pi}_m define an underlying multinomial distribution. Let us assume that we have t data samples and that each of them belongs to one of the components of the GMM. Let us also assume that the number of samples that belong to the m-th component is given by equation (13):

n_m^{(t)} = \sum_{i=1}^{t} o_m^{(i)}    (13)

where the ownerships o_m^{(i)} are defined as in the previous section. The assumed multinomial distribution for the n_m^{(t)} gives the likelihood function \mathcal{L} = \prod_{m=1}^{M} \pi_m^{n_m^{(t)}}. The mixing weights are constrained to sum up to one; we take this into account by introducing the Lagrange multiplier \lambda. The maximum-likelihood estimate follows from equation (14):

\frac{\partial}{\partial \hat{\pi}_m} \left[ \log \mathcal{L} + \lambda \left( \sum_{m=1}^{M} \hat{\pi}_m - 1 \right) \right] = 0    (14)

After getting rid of \lambda we get equation (15):

\hat{\pi}_m^{(t)} = n_m^{(t)} / t = \frac{1}{t} \sum_{i=1}^{t} o_m^{(i)}    (15)

The estimate from t samples is denoted \hat{\pi}_m^{(t)}, and it can be rewritten in recursive form as a function of the estimate \hat{\pi}_m^{(t-1)} for t - 1 samples and the ownership o_m^{(t)} of the last sample, given by equation (16):

\hat{\pi}_m^{(t)} = \hat{\pi}_m^{(t-1)} + \frac{1}{t} \left( o_m^{(t)} - \hat{\pi}_m^{(t-1)} \right)    (16)

If we fix the influence of the new samples by fixing 1/t to \alpha = 1/T, we get the update equation (5). This fixed influence of the new samples means that we rely more on the new samples, and the contribution from the old samples is down-weighted in an exponentially decaying manner, as mentioned before.

Prior knowledge for the multinomial distribution can be introduced by using its conjugate prior, the Dirichlet prior, given by equation (17):

P = \prod_{m=1}^{M} \pi_m^{c_m}    (17)

The coefficients c_m have a meaningful interpretation: for the multinomial distribution, c_m presents the prior evidence for the class m, i.e., the number of samples that belong to that class a priori. Negative prior evidence means that we accept that the class m exists only if there is enough evidence from the data for its existence. This type of prior is also related to the minimum-message-length criterion that is used for selecting proper models for given data. The MAP solution that includes the mentioned prior follows from equation (18), where we set c_m = -c:

\frac{\partial}{\partial \hat{\pi}_m} \left[ \log \mathcal{L} + \log P + \lambda \left( \sum_{m=1}^{M} \hat{\pi}_m - 1 \right) \right] = 0, \quad P = \prod_{m=1}^{M} \pi_m^{-c}    (18)

We get equation (19):

\hat{\pi}_m^{(t)} = \frac{1}{K} \left( n_m^{(t)} - c \right), \quad \text{where } K = \sum_{m=1}^{M} \left( n_m^{(t)} - c \right) = t - Mc    (19)

We rewrite (19) as equation (20):

\hat{\pi}_m^{(t)} = \frac{\hat{\Pi}_m^{(t)} - c/t}{1 - Mc/t}    (20)

where \hat{\Pi}_m^{(t)} = n_m^{(t)}/t is the ML estimate from equation (15), and the bias from the prior is introduced through c/t. The bias decreases for larger data sets (larger t). However, if a small bias is acceptable, we can keep it constant by fixing c/t to c_T = c/T with some large T. This means that the bias will always be the same as it would have been for a data set with T samples. It is easy to show that the recursive version of (19) with fixed c/t = c_T is given by equation (21):

\hat{\pi}_m^{(t)} = \hat{\pi}_m^{(t-1)} + \frac{1}{t} \left( \frac{o_m^{(t)}}{1 - Mc_T} - \hat{\pi}_m^{(t-1)} \right) - \frac{1}{t} \frac{c_T}{1 - Mc_T}    (21)

Since we usually expect only a few components M and c_T is small, we assume 1 - Mc_T \approx 1, set 1/t to \alpha as mentioned, and get the final modified adaptive update equation (22):

\hat{\pi}_m \leftarrow \hat{\pi}_m + \alpha \left( o_m^{(t)} - \hat{\pi}_m \right) - \alpha c_T    (22)

This equation is used instead of equation (5). After each update, the weights \hat{\pi}_m need to be normalized so that they add up to one. We start with a GMM with one component centered on the first sample, and new components are added as mentioned above. The Dirichlet prior with negative weights suppresses the components that are not supported by the data: we discard the component m when its weight becomes negative, which also ensures that the mixing weights stay non-negative. For a chosen c we require that at least c = 0.01·T samples support a component, and we get c_T = 0.01.

1) Foreground Detection: The above operations are performed and checked against the learning rate. The learning rate is defined as follows: when the change of background dynamics becomes smaller, the rate of updating the background model with the current video frame should decrease; this rate of background-model updating is referred to as the learning rate. This is useful in order not to miss any foreground pixels. When the change of background dynamics increases, the learning rate should increase so that the background model converges more rapidly, in order to reduce the number of false alarms. Here the learning rate is compared with a threshold value: if the learning rate exceeds 0.01, the pixels are updated into the background model. The adapted background model is then taken as the reference against which the next incoming frames are compared. It is used for detecting the foreground object, which is highly dependent on the background model: the video frame is compared with the background frame, and the foreground object is identified from the frame. Finally, this step eliminates any pixels which are not connected to the image; it involves improving the foreground mask based on information obtained from outside the background model.

C. Geometrical Feature Extraction and Classification

From the vehicles detected by the improved adaptive Gaussian mixture model [10], geometrical features such as length, width, area and perimeter are extracted, and ratios are calculated from these features.
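The per-pixel update rules of Section B — ownership via the squared distance (8), the recursive updates (5)–(7) with the modified weight update (22), and component creation (9)–(10) — can be sketched for a single gray-level pixel as follows. The parameter values (alpha, cT, sigma0, a maximum of four components) are illustrative assumptions, not the authors' settings, and this is a re-implementation for exposition, not the authors' code.

```python
# Scalar sketch of the improved adaptive GMM update for one gray pixel.
# components: list of dicts {'pi': weight, 'mu': mean, 'var': variance}.

def update_gmm(components, x, alpha=0.01, cT=0.01, sigma0=15.0, max_components=4):
    # Ownership: the 'close' component (squared distance < 3^2, eq. 8)
    # with the largest weight owns the new sample x.
    owner = None
    for m in sorted(range(len(components)), key=lambda m: -components[m]['pi']):
        c = components[m]
        if (x - c['mu']) ** 2 / c['var'] < 9.0:
            owner = m
            break
    for m, c in enumerate(components):
        o = 1.0 if m == owner else 0.0
        c['pi'] += alpha * (o - c['pi']) - alpha * cT        # eq. (22)
        if o:
            delta = x - c['mu']
            c['mu'] += (alpha / c['pi']) * delta             # eq. (6)
            c['var'] += (alpha / c['pi']) * (delta ** 2 - c['var'])  # eq. (7)
    if owner is None:                                        # eqs. (9)-(10)
        if len(components) >= max_components:
            components.remove(min(components, key=lambda c: c['pi']))
        components.append({'pi': alpha, 'mu': float(x), 'var': sigma0 ** 2})
    # Discard non-positive weights (the negative Dirichlet prior at work),
    # then renormalize so the weights add up to one.
    components = [c for c in components if c['pi'] > 0]
    total = sum(c['pi'] for c in components)
    for c in components:
        c['pi'] /= total
    return components

# A pixel that sees static road (value 100) for 50 frames, then a bright
# vehicle (value 200): the vehicle spawns a new low-weight component.
comps = [{'pi': 1.0, 'mu': 100.0, 'var': 225.0}]
for _ in range(50):
    comps = update_gmm(comps, 100)
comps = update_gmm(comps, 200)
print(len(comps), comps[0]['mu'])   # background component keeps mu = 100
```

A pixel is then labeled background if its sample is owned by one of the B heaviest components from equation (12); the new low-weight cluster for the vehicle stays outside that set until (and unless) it persists long enough.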
For each detected object (vehicle), the reference ratio RR is calculated, for example: lw = length/width; al = area/length; aw = area/width; ap = area/perimeter; lp = length/perimeter; wp = width/perimeter, which together give the reference ratio RR = [lw al aw ap lp wp]. This vector is stored in the database, as shown in Table I and Table II. When the next vehicle is detected, its ratios are determined, say DR, and RR and DR are compared. If both are equal, the vehicle type is classified as an already-detected one; otherwise the ratios calculated for the detected vehicle are stored in the database as a new RR for future reference, as shown in Fig. 1. By this comparison the system classifies the vehicle as light or heavy. The system accepts an integer value of 0 for light vehicles and 1 for heavy vehicles for newly detected vehicle types, as shown in Fig. 3 and Fig. 4.

IV. EXPERIMENTAL RESULTS

The work has been implemented on a set of input videos captured from an immovable camera. The pre-processed video sequences are given as input to the system, and the improved adaptive Gaussian mixture model is applied so as to achieve foreground detection. The result of background modelling is shown in Fig. 2: the detection of the vehicles is very clear, and the frame is not affected by lighting conditions, clutter or shadow.

Fig. 2 Result of background subtraction (foreground detection)

Once the system receives the video input to be processed, the vehicles are detected as the video proceeds. First, a vehicle is detected in the sample video and the system interacts with the user to specify the tag (0 for light vehicles); after receiving the tag, the count of the light vehicle type is updated and this information is stored to continue the count record.
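The interactive tagging and counting flow just described can be sketched as below. The matching tolerance and the simulated user answers are assumptions for illustration (the paper compares DR and RR for equality, and asks the user interactively).

```python
# Sketch of interactive tagging and counting: when a detected vehicle
# matches no stored reference ratio, ask the user for a tag (0 = light,
# 1 = heavy) and store the new reference; otherwise increment the count
# of the matched class. User input is simulated by a list of answers.

def tag_and_count(detections, user_tags, tol=0.05):
    """detections: list of ratio vectors DR; user_tags: 0/1 answers in order."""
    references = []          # list of (RR vector, tag)
    counts = {0: 0, 1: 0}    # 0 = light, 1 = heavy
    answers = iter(user_tags)
    for dr in detections:
        match = next((tag for rr, tag in references
                      if all(abs(a - b) <= tol for a, b in zip(dr, rr))), None)
        if match is None:
            match = next(answers)        # "ask" the user for the new class
            references.append((dr, match))
        counts[match] += 1
    return counts

light = [1.8172, 29.595, 53.781, 8.9151, 0.30124, 0.16577]
heavy = [3.8338, 9.2165, 35.334, 3.0845, 0.33467, 0.087296]
counts = tag_and_count([light, heavy, light, light, heavy], user_tags=[0, 1])
print(counts)
```

Only the first occurrence of each vehicle type consults the user; every later detection of the same type is counted automatically.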
As the input video proceeds further and a new type of vehicle is subsequently detected, the system interacts with the user again, asks for a new tag for the new vehicle category, and displays the counts for the corresponding vehicle types. Fig. 3, Fig. 4 and Fig. 5 illustrate part of the results.

Fig. 3 Light vehicle detected, waiting for vehicle tag option
Fig. 4 Heavy vehicle detected, waiting for vehicle tag option
Fig. 5 Classification result and count of vehicles

The extracted geometrical features are used for the calculation of the reference ratios. The reference ratios are calculated for each vehicle type when it is detected for the first time, and are saved separately for light and heavy vehicles in a text file for future reference, as shown in Table I.

TABLE I
SAMPLE REFERENCE RATIOS RR = [LW AL AW AP LP WP] FOR LIGHT AND HEAVY VEHICLES WHEN DETECTED FOR THE FIRST TIME

Vehicle type | lw=length/width | al=area/length | aw=area/width | ap=area/perimeter | lp=length/perimeter | wp=width/perimeter
Light        | 1.7             | 29.134         | 59.8          | 2.14              | 0.2986              | 0.17564
Heavy        | 3.8338          | 9.2165         | 35.334        | 3.0845            | 0.33467             | 0.087296

The determined ratios are calculated by the system and compared with the previously stored reference ratios. When a vehicle is detected, its ratios are compared with the previously stored reference ratios (as shown in Table II). If they are equal, the system adds to the count of the previously detected class type; otherwise it stores the newly calculated ratios as a reference ratio for future comparison, and asks the user to tag the new class type as 0 or 1 for light and heavy vehicles respectively.
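The reference-ratio vector of Table I can be computed from the four extracted features as follows. The length, width, area and perimeter values used below are illustrative, not measurements from the paper's videos.

```python
# Sketch of computing RR = [lw al aw ap lp wp] from the geometrical
# features extracted for a detected vehicle blob.

def reference_ratio(length, width, area, perimeter):
    return [length / width,      # lw
            area / length,       # al
            area / width,        # aw
            area / perimeter,    # ap
            length / perimeter,  # lp
            width / perimeter]   # wp

# Hypothetical bounding-blob measurements for one detected vehicle:
rr = reference_ratio(length=90, width=50, area=4000, perimeter=280)
print([round(v, 3) for v in rr])
```

Because all six entries are ratios of the same four measurements, the vector is invariant to uniform rescaling of the blob, which is what makes it usable across frames.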
TABLE II
DETERMINED RATIOS FOR EACH OF THE SAMPLE VEHICLE IMAGES FROM DIFFERENT FRAMES

Frame id | lw=length/width | al=area/length | aw=area/width | ap=area/perimeter | lp=length/perimeter | wp=width/perimeter | User input tag | Vehicle type
1        | 1.8172          | 29.595         | 53.781        | 8.9151            | 0.30124             | 0.16577            | 0              | Light
2        | 3.8338          | 9.2165         | 35.334        | 3.0845            | 0.33467             | 0.087296           | 1              | Heavy
3        | 1.8172          | 29.595         | 53.781        | 8.9151            | 0.30124             | 0.16577            | 0              | Light
4        | 3.8338          | 9.2165         | 35.334        | 3.0845            | 0.33467             | 0.087296           | 1              | Heavy
5        | 1.8172          | 29.595         | 53.781        | 8.9151            | 0.30124             | 0.16577            | 0              | Light
6        | 1.8172          | 29.595         | 53.781        | 8.9151            | 0.30124             | 0.16577            | 0              | Light
7        | 1.8172          | 29.595         | 53.781        | 8.9151            | 0.30124             | 0.16577            | 0              | Light

To analyse the performance of the proposed system, manually delineated vehicles were used as reference vehicles to check the accuracy of the system. The performance of the proposed system is evaluated using the following measures [12] — the true positive rate (23), the false positive rate (24) and the false negative rate (25):

TPR = TP / (TP + FN)    (23)

FPR = FP / (TP + FP)    (24)

FNR = FN / (TP + FN)    (25)

where TP is the number of correctly detected vehicles, FP the number of false detections, and FN the number of missed vehicles. The input videos are taken for testing, and the results obtained are shown in Table III. To obtain good performance, the methodology must try to reduce the false-negative and false-positive error rates; FNR = 0 and FPR = 0 indicates 100% accuracy of the classification method.
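The measures (23)–(25) can be computed directly from the detection counts reported for each video; since the equation bodies are not reproduced in the text, the standard TP/FP/FN forms are assumed below.

```python
# Sketch of the evaluation measures from per-video counts:
# TP = vehicles detected, FN = vehicles missed, FP = false detections.

def evaluate(appearing, detected, false_detections):
    missed = appearing - detected                     # FN
    tpr = detected / (detected + missed)              # eq. (23)
    fpr = false_detections / (detected + false_detections)  # eq. (24)
    fnr = missed / (detected + missed)                # eq. (25)
    return round(tpr, 4), round(fpr, 4), round(fnr, 4)

# Counts corresponding to Video 2: 52 vehicles appearing, 45 detected,
# 0 false detections.
metrics = evaluate(appearing=52, detected=45, false_detections=0)
print(metrics)
```

With zero false detections FPR is 0, and TPR alone tracks the per-video accuracy figure.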
TABLE III
PERFORMANCE STATISTICS OF THE PROPOSED APPROACH

Input video | Vehicles appearing in the video frames | Vehicles detected | False detections | Vehicles missed | Accuracy %
Video 1     | 22                                     | 18                | 0                | 4               | 75
Video 2     | 52                                     | 45                | 0                | 7               | 86.53
Video 3     | 8                                      | 7                 | 0                | 1               | 87.5

The sample outcome results signify the accuracy and relatively high performance of the proposed methodology. It can detect and classify vehicles with high efficiency in traffic scenes without being affected by lighting conditions. The tagging of the vehicle types for each category of vehicle in the sample input video is shown in Fig. 3, Fig. 4 and Fig. 5.

V. CONCLUSION AND FUTURE WORK

The proposed system provides an economical and efficient solution for vehicle detection and classification with counting. The system uses low-level geometrical features such as width and length to achieve classification, which results in a simple implementation, and the system is inexpensive to use. The system also provides an interactive environment to tag the vehicle types before classification, through which the user can issue new tags for a vehicle type as and when the vehicle is detected. The system can identify new classes over further input videos consisting of various vehicles, which is another advantage of the proposed approach. It is intended to work on several classification tasks through possible extensions of the proposed methodology to detect more vehicle classes efficiently with above 90% performance.

ACKNOWLEDGMENT

The authors wish to acknowledge CASIA for providing the iris database; the shared CASIA Iris Database is available on the web [10]. The work is partially supported by a Research Grant from AICTE, Govt. of India, Reference no. 8023/RID/RPS-114(PVT)/2011-12, dated December 24, 2011.

REFERENCES

[1] M. Piccardi, "Background subtraction techniques: a review," in Proc. IEEE International Conference on Systems, Man and Cybernetics, pp. 3099–3104, Oct. 2004.
[2] N. C. Mithun, N. U. Rashid, and S. M. M. Rahman, "Detection and classification of vehicles from video using multiple time-spatial images," IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 3, September 2012.
[3] A. J. Lipton, H. Fujiyoshi, and R. S. Patil, "Moving target classification and tracking from real-time video," in Proc. IEEE Workshop on Applications of Computer Vision, Princeton, NJ, 1998, pp. 8–14.
[4] R. T. Collins, A. J. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, et al., "A system for video surveillance and monitoring," VSAM final report, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, Tech. Rep. CMU-RI-TR-00-12, 2000.
[5] A. Selinger and L. Wixson, "Classifying moving objects as rigid or non-rigid without correspondences," in Proc. DARPA Image Understanding Workshop, Monterey, CA, 1998, pp. 341–358.
[6] O. Javed and M. Shah, "Tracking and object classification for automated surveillance," in Proc. European Conference on Computer Vision, Copenhagen, Denmark, 2002, pp. 343–357.
[7] R. Cutler and L. S. Davis, "Robust real-time periodic motion detection, analysis and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 781–796, 2000.
[8] Y. Bogomolov, G. Dror, S. Lapchev, E. Rivlin, and M. Rudzsky, "Classification of moving targets based on motion and appearance," in Proc. British Machine Vision Conference, Norwich, 2003, pp. 42–438.
[9] L. B. Chen, R. Feris, Y. Zhai, L. Brown, and A. Hampapur, "An integrated system for moving object classification in surveillance videos," in Proc. IEEE Advanced Video and Signal Based Surveillance, Santa Fe, New Mexico, 2008, pp. 52–59.
[10] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proc. IEEE International Conference on Pattern Recognition (ICPR), vol. 2, Cambridge, UK, pp. 28–31, August 2004.
[11] L. G. Shapiro and G. C. Stockman, Computer Vision. Upper Saddle River, NJ: Prentice Hall, 2001.
[12] L. M. Brown, A. W. Senior, Y.-L. Tian, J. Connell, A. Hampapur, C.-F. Shu, H. Merkl, and M. Lu, "Performance evaluation of surveillance systems under varying conditions," in Proc. IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Colorado, January 2005.