Midterm Status Report

3.1. Counting People with Body Recognition Sub-system

When people enter the place, the system recognizes their bodies and counts them. It does the same when they leave, and it keeps the two counts separately: people who came and people who left.

3.1.1. Requirements

3.1.2. Technologies and methods

The system should have enough cameras to count people. We process the camera source with OpenCV and detect people. The system needs at least two computers: one acting as the server and the other(s) acting as kiosk clients. Each kiosk needs an interface to take input from users, and it retrieves its data from the server database. The cameras should be connected to the main computer.

3.1.3. Conceptualization

The system should detect the people who visit the place. People can be counted after they park and leave their cars, and they can also be counted at the entrance as they pass the detectors.

3.1.4. Physical architecture

The cameras for counting people should be placed at the entrance to measure how many people visit. The cameras used for body detection should be positioned where they can capture bodies clearly.

3.1.5. Materialization

The libraries and extensions we have used so far in the body detection sub-system are briefly described below.

OpenCV: A powerful library for reading, writing and displaying images and video. We mainly use it to open the video file or camera source and read it frame by frame. For each frame we get the colour channels (in OpenCV's default BGR order) and the shape (width and height). We also use it to resize and preprocess the frame before feeding it into the DNN (Deep Neural Network), and to draw text and lines on the frame so the model results can be shown in the GUI.
NumPy: A library for numerical computing. We use it to compute the bounding boxes of body detections through an array multiplication on NumPy arrays. We also use it to compute the direction in which each body's centroid is moving by comparing its current position with the mean of its previous positions.

Tkinter: Python's standard GUI toolkit. We used it to design the whole GUI application with widgets. The application has buttons to open the source, either a camera or a video, to start the model, and to exit when required. The frame itself is displayed with OpenCV, but this display is not integrated into the application yet because of bugs we hit while objectifying the code. In addition, a text area shows the model results in the app. These include the number of people who have passed, split by whether they are on the upper or lower side of a predetermined line on the frame, and the execution time. Finally, the library's filedialog module is used to browse to and select the file to open.

Dlib: We use this library to initialize the rectangles surrounding detected objects and to track them over time with correlation tracking.

Apart from these libraries, the sub-system includes a few configuration and classification files for the labeled objects.

3.1.6. Evaluation

After a broad investigation of programming languages, we concluded that Python is the most suitable one for the body detection model. It has an enormous range of libraries, especially for image processing and data manipulation. Libraries such as NumPy compensate for the slowness of interpreted Python by running array operations in compiled C code.
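To make the NumPy point concrete, the sketch below shows the kind of array multiplication the detection step relies on. A single multiplication scales every person box in a MobileNet SSD output from normalized coordinates to pixels, with no Python loop. The sketch makes several assumptions. First, it assumes the output layout that OpenCV's `cv2.dnn` module commonly returns for Caffe MobileNet SSD models: an array of shape `(1, 1, N, 7)` with rows `[batch_id, class_id, confidence, x1, y1, x2, y2]`. Second, it assumes class id 15 is `person`, as in the 21-class PASCAL VOC label set. The function name `person_boxes` and the 0.4 confidence threshold are hypothetical and not taken from the project code.

```python
import numpy as np

# Assumption: OpenCV's DNN forward pass for a Caffe MobileNet SSD returns an
# array of shape (1, 1, N, 7); each row is
# [batch_id, class_id, confidence, x1, y1, x2, y2], coordinates in [0, 1].
PERSON_CLASS_ID = 15  # "person" in the 21-class PASCAL VOC labels (assumption)


def person_boxes(detections, width, height, min_confidence=0.4):
    """Return pixel boxes (x1, y1, x2, y2) of confident person detections.

    Hypothetical helper for illustration; not the project's code.
    """
    rows = detections[0, 0]  # shape (N, 7)
    keep = (rows[:, 1] == PERSON_CLASS_ID) & (rows[:, 2] > min_confidence)
    # A single broadcast multiplication scales every kept box to pixels.
    scale = np.array([width, height, width, height])
    return (rows[keep, 3:7] * scale).astype(int)


if __name__ == "__main__":
    detections = np.array([[[
        [0, 15, 0.90, 0.25, 0.25, 0.50, 0.75],  # confident person: kept
        [0, 15, 0.20, 0.50, 0.50, 0.75, 1.00],  # person below threshold
        [0, 7, 0.95, 0.00, 0.00, 0.50, 0.50],   # class 7 (car): ignored
    ]]])
    print(person_boxes(detections, 400, 300).tolist())  # [[100, 75, 200, 225]]
```

The boolean mask and the broadcast multiplication both run inside NumPy's compiled code. This is the speed-up the paragraph above relies on, compared with looping over detections in Python.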
With the language determined, we examined various projects on the subject to decide which model is most suitable for a body detection project. We chose MobileNet SSD because it is known to be effective for real-time object detection. We had previously tried a pre-trained Haar cascade face detection model, but it was extremely slow. We also surveyed common image processing libraries and supporting libraries for the interface, and chose the ones that were the most time-efficient and easiest to implement.

The algorithm mainly consists of the following steps:

- read the video source, or the camera feed itself, in a loop;
- interpret the model's detections;
- compute and track the detection rectangles;
- classify each detection by a line that divides the frame into two parts.

By counting the people who pass from one side of the line to the other, the system obtains the number of people entering and exiting the place. From these two counts it calculates the current number of people inside (entries minus exits). The results are saved to a .csv file after the loop ends.

After implementing a first demo containing the main algorithm, we discussed whether to test it first with an IP camera or a wired webcam. We realized the two differ little in implementation and decided to use a webcam to test the model. To gather test data, we searched mainly through Kaggle and videos on the internet. The large datasets took some time to traverse before we found suitable ones. During testing we noticed that the model sometimes detects one person as two objects. We attribute this to crowded scenes with other objects and to reflections. Apart from this, the model was highly accurate on videos shot from a bird's-eye perspective, which is likely to match a mall entrance camera.
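The counting step of the algorithm described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the project's code. It assumes the tracking step (dlib correlation tracking in the project) supplies a stable id and a centroid for each person on every frame. It also assumes the dividing line is horizontal at `line_y` and that moving down the frame counts as entering. The names `LineCounter`, `update`, `summary_rows` and `save_csv` are hypothetical. Following the report, the direction of motion is the current centroid y minus the mean of the previous y positions. Each person is counted once, when they end up on the other side of the line from where they were first seen.

```python
import csv
from statistics import mean


class LineCounter:
    """Counts tracked people crossing a horizontal line at y = line_y.

    Hypothetical helper: moving down the frame (increasing y) is treated
    as entering and moving up as exiting; this mapping is an assumption.
    """

    def __init__(self, line_y):
        self.line_y = line_y
        self.history = {}      # track id -> list of previous centroid y values
        self.counted = set()   # track ids already counted (count each person once)
        self.entered = 0
        self.exited = 0

    def update(self, track_id, centroid):
        """Record a new centroid (x, y) for a track and update the tallies."""
        y = centroid[1]
        ys = self.history.setdefault(track_id, [])
        if ys and track_id not in self.counted:
            # Direction: current y minus the mean of the previous y positions.
            direction = y - mean(ys)
            started_above = ys[0] < self.line_y
            now_above = y < self.line_y
            if direction > 0 and started_above and not now_above:
                self.entered += 1
                self.counted.add(track_id)
            elif direction < 0 and not started_above and now_above:
                self.exited += 1
                self.counted.add(track_id)
        ys.append(y)

    @property
    def current(self):
        """People currently inside: entries minus exits."""
        return self.entered - self.exited

    def summary_rows(self):
        """Header row plus one row of totals, as written to the .csv file."""
        return [["entered", "exited", "current"],
                [self.entered, self.exited, self.current]]

    def save_csv(self, path):
        """Write the final tallies to a .csv file once the video loop ends."""
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(self.summary_rows())


if __name__ == "__main__":
    counter = LineCounter(line_y=100)
    # Track 1 walks down across the line, track 2 walks up across it.
    for track_id, centroid in [(1, (50, 80)), (1, (50, 95)), (1, (50, 110)),
                               (2, (10, 150)), (2, (10, 120)), (2, (10, 90))]:
        counter.update(track_id, centroid)
    print(counter.entered, counter.exited, counter.current)  # 1 1 0
```

Comparing against the mean of past positions, rather than only the previous frame, smooths out jitter in the detected boxes. The `counted` set prevents a person who lingers near the line from being counted several times.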
A bird's-eye perspective also keeps few unrelated objects in view, which further supports accuracy.

Mehmet Fatih Gülmez 2002966
Haktan Tana 2001646
Musa Ozturk 1906061