TrashWatch: Empowering Cleanliness through Smart Cameras
Jash Jain, Manthan Juthani, Kashish Jain and Anant V. Nimkar
Sardar Patel Institute of Technology, Mumbai, India
{jash.jain, manthan.juthani, kashish.jain, anant_nimkar}@spit.ac.in
Abstract. Illegal garbage disposal is a pressing social and environmental issue. Dumping garbage at arbitrary sites damages the environment and poses a major health hazard, and we aim to minimize the disposal of garbage in public places. This paper introduces a novel approach to addressing littering incidents by using computer vision and surveillance cameras to detect and monitor littering actions accurately. The system distinguishes littering from litter removal by combining YOLOv4 and the Area Over Intersection method with traditional geometry. The research also determines the camera features, including focal length and maximum camera angle, needed to optimize the system's performance in different settings. The proposed system, named 'TrashWatch', lets users choose camera resolutions and settings based on their specific requirements. The study demonstrates the system's effectiveness in identifying littering and cleaning actions while accommodating user preferences for camera features. Ultimately, this comprehensive solution lays the groundwork for automating the entire process of detecting litterbugs, including tracking the responsible individuals and employing facial recognition for penalty enforcement.
Keywords: Littering Detection, Camera Features, Frame Extraction, Throwing Action
1 Introduction
Littering, such as tossing a Styrofoam cup from a moving car, is a relatively recent phenomenon. It became more common in the 1950s with the rise of throwaway products and plastic packaging. An estimated 8 million tons of plastic waste enter the oceans each year, polluting water, land, and air. As litter disintegrates, cigarette butts leach arsenic and formaldehyde into waterways and endanger humans and animals. Trash causes 60% of global water contamination, and over 40% of it is burned in the open, producing harmful toxins that cause breathing issues and acid rain.
Traditional littering detection relies on human observation and eyewitness testimony, which can be slow and unreliable. Video surveillance cameras can automate littering detection. The accuracy and scope of such a system depend on camera features including focal length, resolution, distance, and angle. This study provides a mathematical link between these elements to enable customized settings.
Currently, no research considers both the littering action and the camera features required to detect it. While papers by R. Csordás [1] and S. Mahankali [2] discuss specific littering actions, and Porikli's paper [7] explains general object detection, this paper covers these topics more extensively. Object detection is the primary step, followed by geometry, center of mass, and Area over Intersection to determine whether an action is littering or merely carrying an object.
This paper addresses these gaps. Cameras record the scene, and the system identifies and traces litterers using YOLOv4, Area over Intersection, and geometry, differentiating littering from picking litter up. The solution applies to both public and private deployments. For private use, the second part of the approach suggests camera features based on the coverage area needed to accurately identify littering. The technique uses a simulator to iterate over configurations and find the greatest camera angle at a given focal length. After the user enters the objects to be identified as litter and the distance at which they should be detected, the system outputs candidate camera resolutions, the ideal focal length at each resolution, the maximum camera angle that covers the most area, and the percentage of the area covered. Users can then pick the option that suits their needs.
The paper's main contribution is the implementation of the proposed system, 'TrashWatch', which identifies littering and cleaning acts and matches user choices for surveillance camera features. Computer vision and state-of-the-art methods are used to detect littering, and the system's ability to meet user camera feature needs makes it complete. This technology may therefore automate the full litterbug detection process, including identifying littering behavior, tracking the perpetrator, and administering sanctions via facial recognition.
This paper proposes an organized approach to public-place pollution. Section II analyzes current computer vision and object-tracking developments pertinent to the issue. Section III discusses the novel qualities of the proposed system and its research contributions. Section IV describes the approach and instruments used to evaluate the effectiveness of the system. The findings of the study, together with a more in-depth analysis, are presented in Section V. The most important discoveries, contributions, and future work are highlighted in Section VI.
2 Literature Survey
Several research papers have addressed various challenges in object detection and recognition, as well as related applications in surveillance and waste management. Csordás et al. [1] proposed a robust method for detecting objects thrown over a fence using a monocular camera system. Their approach utilized optical flow to track object trajectories and was particularly effective for tracking blurry, small, or variably shaped objects. However, it required a specific camera placement at the end of the fence and focused on object detection over the fence line. Mahankali et al. [2] presented a system for identifying illegal garbage dumping in video footage using deep learning algorithms. Their system achieved
a high accuracy of 95% in classifying objects as garbage or non-garbage based
on shape, size, and color features. Nevertheless, the system had limitations in
detecting obscured or specific types of garbage and identifying the individuals
responsible for the dumping.
In the field of object detection, Liu et al. [3] introduced a Region Proposal
Network (RPN) that improved the efficiency and precision of object detection
networks by sharing convolutional features between the RPN and the detection network. This integration allowed for real-time operation and enhanced the
quality of proposed regions. Conversely, Esen et al. [4] focused on motion-based
detection for surveillance videos, proposing the motion co-occurrence feature
(MCF) as a promising candidate for abnormal event detection. However, the
computational time required for high frame history values limited its suitability
for real-time applications.
For action recognition in videos, Wang et al. [5] presented the temporal
segment network (TSN), which improved the modeling of long-range temporal structures and achieved state-of-the-art performance in action recognition
while maintaining computational efficiency. Zhang et al. [6] conducted a survey
of vision-based fall detection methods, highlighting the challenges of distinguishing falls from other similar activities in daily life. They found that vision-based
methods alone may not provide a comprehensive solution for fall detection.
Porikli et al. [7] proposed a pixel-wise method that relied on dual foregrounds
to detect objects brought into a scene at a subsequent time, such as abandoned
items or illegally parked vehicles. This approach did not rely on object tracking
and was effective in crowded scenarios. Zhou et al. [8] presented DECOLOR, a
motion-based algorithm for moving object detection that addressed non-static
backgrounds by using a parametric motion model and low-rank representation.
While DECOLOR achieved accurate results, it was not suitable for real-time
object detection.
In garbage detection, Majchrowska et al. [13] analyzed existing trash datasets and introduced two new benchmarks, detect-waste and classify-waste, in which a two-stage detector used EfficientDet-D2 to localize litter and EfficientNet-B2 to categorize it into garbage types. Xu et al. [14] enhanced the YOLO-CS detection system, enabling YOLOv4 to distinguish several objects in a single cell; their collaborative prediction method outperformed state-of-the-art detectors on CrowdHuman and CityPersons.
3 TrashWatch
The proposed project involves the development of a litter detection algorithm capable of detecting instances of littering, as well as identifying the specific type of waste being discarded by individuals. To implement this system, the proposal offers a hardware recommendation that enables the government or public entities to specify the prevalent type of litter in a given area, the minimum size of the debris, and the distance at which it must be detected. On the basis of these inputs, the system derives the camera parameters necessary for
the proposed solution. The camera’s minimum resolution, focal length, coverage
area, and angle of view would be specified.
3.1 Littering Detection
Initially, to identify instances of throwing or littering, we considered isolating an individual as a pixelated binary entity. If the entire human form is converted into a pixelated binary blob, a separate binary blob could be discerned by observing its movement away from the human blob; this second blob was assumed to be a refuse item. This reasoning gave rise to the idea of using a video that documented changes in the scene. The method, however, suffers from fragmentation of the human subject in the frame, which leads to inconsistent results.
The YOLOv4 (You Only Look Once) system is a real-time object recognition system capable of identifying specific objects in various media formats such as videos, live feeds, and still images [9][10]. The YOLO machine learning algorithm employs a deep convolutional neural network and leverages the learned features to accurately identify an object. The prediction process uses 1 × 1 convolutions, so the dimensions of the preceding feature map and the resultant prediction map are equivalent. The YOLO model comprises a total of eighty object classes, out of which thirteen were identified as litter through our analysis. The inventory comprises objects such as receptacles for liquids, bags for carrying personal belongings, and protective gear for rain, along with edible produce such as bananas and apples.
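To illustrate how this detection step might look in practice, the following is a minimal sketch of running a YOLOv4 model with OpenCV's DNN module and keeping only detections whose class falls in a litter subset. The weight, config, and class-list file names, as well as the exact class subset, are assumptions for illustration rather than the paper's actual configuration.

```python
import cv2
import numpy as np

# Hypothetical file names; the trained weights, config, and class list are assumptions.
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

with open("coco.names") as f:
    coco_classes = [line.strip() for line in f]

# Illustrative subset of COCO classes treated as potential litter (not the paper's exact list).
LITTER_CLASSES = {"bottle", "handbag", "backpack", "umbrella", "banana",
                  "apple", "cup", "book"}

def detect_litter(frame, conf_threshold=0.4, nms_threshold=0.4):
    """Return [(class_name, confidence, (x, y, w, h))] for litter-like detections."""
    class_ids, confidences, boxes = model.detect(frame, conf_threshold, nms_threshold)
    results = []
    for cid, conf, box in zip(np.ravel(class_ids), np.ravel(confidences), boxes):
        name = coco_classes[int(cid)]
        if name in LITTER_CLASSES:
            results.append((name, float(conf), tuple(int(v) for v in box)))
    return results
```

In practice the fine-tuned weights described later in the experimental setup would replace the stock COCO weights.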
The identification of humans in the stream was accomplished using the Histogram of Oriented Gradients (HOG) technique. The HOG person detector uses a sliding detection window that is translated across the image. The texture-based HOG method, with the advantages and features elucidated in [?], appeared to be the most appropriate choice. A HOG descriptor is computed at every position of the detection window and is then passed to a trained Support Vector Machine (SVM), which classifies it as either "a person" or "not a person." A constant padding was added to the borders of each bounding box so that multiple bounding boxes enclosing a single person merge into one.
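A rough sketch of this person detection step, using OpenCV's built-in HOG descriptor with its pretrained people-detecting SVM and a constant border padding, is shown below; the padding value and window parameters are illustrative assumptions.

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

PAD = 15  # constant border (pixels) added so overlapping boxes around one person merge; value is illustrative

def detect_people(frame):
    """Return padded (x, y, w, h) boxes for detected pedestrians."""
    rects, _weights = hog.detectMultiScale(frame, winStride=(8, 8), padding=(8, 8), scale=1.05)
    h_img, w_img = frame.shape[:2]
    padded = []
    for (x, y, w, h) in rects:
        x0 = max(0, x - PAD)
        y0 = max(0, y - PAD)
        x1 = min(w_img, x + w + PAD)
        y1 = min(h_img, y + h + PAD)
        padded.append((x0, y0, x1 - x0, y1 - y0))
    return padded
```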
After waste is detected by the YOLO model, the AOI (Area over Intersection) technique is applied. Where two axis-aligned bounding boxes intersect, the result is always another axis-aligned bounding box. Using this overlap principle, the area of intersection between the person and the waste item is computed. Once the litter item has separated from the person's bounding box, the throw is registered and the perpetrator can be identified.
\[ \mathrm{AoI} = \frac{\text{area of overlap}}{\text{area of union}} \]
Fig. 1. Area Over Intersection
Fig. 2. Littering and Cleaning Action Detected
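A minimal sketch of the AoI computation, assuming bounding boxes in (x, y, w, h) form, could look as follows; it directly mirrors the formula and figure above.

```python
def area_over_intersection(box_a, box_b):
    """AoI (overlap area divided by union area) of two axis-aligned boxes given as (x, y, w, h)."""
    ax0, ay0, aw, ah = box_a
    bx0, by0, bw, bh = box_b
    ax1, ay1 = ax0 + aw, ay0 + ah
    bx1, by1 = bx0 + bw, by0 + bh

    # The intersection of two axis-aligned boxes is itself an axis-aligned box (possibly empty).
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```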
Conversely, a cleaning action is deemed complete once a particular item of litter enters an individual's bounding box and subsequently moves a designated distance upward or downward. To validate our model, we employed both a live webcam feed and a pre-recorded video stream. Both methods were effective in identifying individuals who were engaged in either cleaning up or disposing of waste.
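The following sketch shows one way the AoI value and the vertical movement of a tracked item could be combined into a littering or cleaning decision, reusing the AoI helper sketched above. The thresholds and the track representation are assumptions, not values reported in the paper.

```python
LITTER_AOI_THRESHOLD = 0.05   # illustrative: below this the item is treated as separated from the person
VERTICAL_SHIFT_PX = 40        # illustrative vertical displacement used to confirm a drop or pick-up

def classify_action(person_box, item_track):
    """
    item_track: list of (x, y, w, h) boxes of one litter item over consecutive frames.
    Returns "littering", "cleaning", or None (undecided).
    """
    if len(item_track) < 2:
        return None
    first, last = item_track[0], item_track[-1]
    started_with_person = area_over_intersection(person_box, first) > LITTER_AOI_THRESHOLD
    ends_with_person = area_over_intersection(person_box, last) > LITTER_AOI_THRESHOLD
    vertical_shift = last[1] - first[1]   # positive means the item moved down in image coordinates

    # Item leaves the person's box and drops: treat as littering.
    if started_with_person and not ends_with_person and vertical_shift > VERTICAL_SHIFT_PX:
        return "littering"
    # Item enters the person's box and moves up: treat as cleaning (pick-up).
    if not started_with_person and ends_with_person and vertical_shift < -VERTICAL_SHIFT_PX:
        return "cleaning"
    return None
```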
3.2 Camera Feature Extraction
As part of our research contribution, various types of litter, such as purses, bottles, and umbrellas, were used to generate camera features. These objects were used in different sizes and placed at varying distances from the camera, simulating real-world littering scenarios.
The purpose of this approach was to accurately assess the performance of our TrashWatch system in detecting littering incidents. By analyzing the output of the system when confronted with different objects and distances, valuable data was gathered. This data was then entered into the JVSG Lens Simulator, which provided the camera features required for effective detection if the objects were littered in a real-life CCTV camera scenario.
Through the simulations, the camera angles and percentages of coverage area at which the littering action was detected were found. Using the angle of view, the ideal focal length was then calculated using
\[ \mathrm{AOV} = 2 \tan^{-1}\!\left(\frac{d}{2f}\right) \qquad (1) \]
where AOV represents the angle of view, d corresponds to the chosen dimension
(such as film or sensor size), and f denotes the effective focal length of the camera.
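Equation (1) can be evaluated directly, or inverted to recover the ideal focal length from a target angle of view, with a few lines of code; the sensor width in the numeric example below is a hypothetical value chosen for illustration and is not taken from the paper.

```python
import math

def angle_of_view(d_mm, f_mm):
    """Angle of view (degrees) from Eq. (1): AOV = 2 * atan(d / 2f)."""
    return math.degrees(2 * math.atan(d_mm / (2 * f_mm)))

def ideal_focal_length(d_mm, aov_deg):
    """Invert Eq. (1): focal length (mm) that yields the requested angle of view."""
    return d_mm / (2 * math.tan(math.radians(aov_deg) / 2))

# Illustrative example: a sensor roughly 5.37 mm wide and a target 72.4 degree angle of view
# would need roughly a 3.7 mm lens under this model (assumed numbers, not the paper's data).
print(round(ideal_focal_length(5.37, 72.4), 2))
```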
By conducting these experiments and leveraging the simulator, precise and
relevant camera features specific to litter detection were obtained. This research
contributes to the advancement of surveillance systems by enhancing their capabilities to identify and monitor instances of littering, leading to improved
cleanliness and environmental preservation.
4 Experimental Setup
The experimental setup used TrashWatch to process camera video. The Person Detection Model located persons in the frame while the Custom YOLOv4 Model located litter. The models supplied bounding box coordinates and detection probabilities. The AOI Function determined whether the activity was cleaning or littering by calculating bounding box overlap. The Camera Features Model produced specifications for resolution, focal length, angle, and coverage area; object distance and size were entered into the JVSG Lens Simulator to calculate these values. The TrashWatch system was tested using varied video footage, ground truth annotations, and quantitative evaluations. The Person Detection Model, Custom YOLOv4 Model, AOI Function, and Camera Features Model were combined to enable real-time litter detection and monitoring.
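A simplified per-frame loop tying these components together might look like the sketch below, reusing the hypothetical helper functions sketched in Section 3; keying tracks by class name is a deliberate simplification of a real tracker and not the paper's implementation.

```python
import cv2

def process_stream(source=0):
    """Per-frame loop combining person detection, litter detection, and the AoI decision."""
    cap = cv2.VideoCapture(source)
    tracks = {}  # item class name -> list of recent boxes (stand-in for a real per-object tracker)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        people = detect_people(frame)
        items = detect_litter(frame)
        for name, conf, box in items:
            tracks.setdefault(name, []).append(box)
        for person_box in people:
            for name, track in tracks.items():
                action = classify_action(person_box, track[-10:])  # look at roughly the last 10 frames
                if action:
                    print(f"{action} detected: {name}")
    cap.release()
```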
The study used the state-of-the-art YOLOv4 model to improve litter detection accuracy. Bottles, handbags, and umbrellas were initially hard to identify. A custom dataset was used to fine-tune the YOLOv4 model to overcome this limitation. Cropping and rotation were used to augment a wide range of litter images. By training the model for 100 epochs at a learning rate of 0.001, litter prediction accuracy improved significantly.
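As a hedged illustration of the cropping-and-rotation augmentation described above, a simple image-level sketch is given below; the variant count, rotation range, and crop ratios are assumptions rather than the paper's exact pipeline.

```python
import random
from PIL import Image

def augment(img_path, out_prefix, n_variants=4):
    """Expand a small litter image set with random rotations and crops (illustrative only)."""
    img = Image.open(img_path)
    for i in range(n_variants):
        angle = random.uniform(-25, 25)
        rotated = img.rotate(angle, expand=True)
        rw, rh = rotated.size
        # Random crop covering 70-95% of each dimension.
        cw = int(rw * random.uniform(0.7, 0.95))
        ch = int(rh * random.uniform(0.7, 0.95))
        left = random.randint(0, rw - cw)
        top = random.randint(0, rh - ch)
        rotated.crop((left, top, left + cw, top + ch)).save(f"{out_prefix}_{i}.jpg")
```

For detector training, the bounding-box annotations would also need to be transformed consistently with each crop and rotation.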
Due to a lack of litter detection datasets, only 12 litter categories can be identified. The model's minimum detectable object height is 13 cm, hence this threshold was chosen for detection. The video resolution should be at least 480p to accurately detect litter and faces for penalization, and the camera is expected to be able to see pedestrians' faces at a given point to help track litterers.
The next step in our procedure was to create a framework that specifies only the camera hardware necessary to set up a litter detection system in a neighborhood. The JVSG Lens Calculator was used to obtain information based on the lens type and distance. Based on the subject's height and distance,
Fig. 3. Architectural Diagram
the camera's focal length (in millimeters), resolution, and angle were set. The following resolutions were tried and tested: 480p, 640p, 1280p, and 2048p.
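The link between object size, distance, focal length, and resolution can be approximated with a simple pinhole-camera sketch such as the one below; the sensor dimension and the resulting pixel count are illustrative assumptions rather than measured values from the study.

```python
def pixels_on_object(object_height_m, distance_m, focal_mm, sensor_height_mm, image_height_px):
    """Rough pinhole-model estimate of how many pixel rows an object occupies in the frame."""
    projected_mm = focal_mm * object_height_m / distance_m   # projected size of the object on the sensor
    return projected_mm * image_height_px / sensor_height_mm

# Illustrative example: a 13 cm object, 4 m away, 7.7 mm lens,
# a hypothetical 4.8 mm sensor dimension, and a 720-pixel-tall frame.
print(round(pixels_on_object(0.13, 4.0, 7.7, 4.8, 720)))
```

With these illustrative numbers the 13 cm object would span roughly 38 pixel rows, which is the kind of quantity that determines whether a detector can still recognize it at that distance and resolution.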
5 Results & Discussions
The study faces challenges in its attempt to simplify the system for widespread use, as stated in its scope. The detection model relies on object and pedestrian detection for all calculations. The camera has several internal features that affect its object detection accuracy, but evaluating them all is too demanding for a typical user, which contradicts the study's goal. In addition, not all of these factors are considered during purchase or installation, so, to simplify matters, only the fundamental factors are considered: the height of the installation (kept constant at 3.5 m), the focal length of the lens (in mm), the camera's resolution, the horizontal distance of the object to be detected from the camera (in meters), and the angle of depression from the lens to the detection area. The results draw conclusions from hundreds of simulations to explore the association between these elements.
For the user, Table 1 shows the results displayed when they enter their requirements in terms of the object to be detected and the distance at which they want it detected.
When the user wants to detect bananas at a maximum distance of 4 meters, they get four options from which they can pick according to requirements, availability, and budget. The camera with the lowest resolution gives 73% coverage, the next resolution gives 92% coverage, and the two highest resolutions give 100% coverage in this case. Along with the coverage, the study also outputs the ideal focal length of the lens to be set up and the maximum angle at which the camera should point at the detection area for optimum detection.
Table 1. Camera features for a selected object at a distance of 4 m

Resolution     Ideal Focal Length (mm)   Max Camera Angle (°)   % Coverage
480 × 360      23                        55.9                   73%
640 × 512      17                        62.3                   92%
1280 × 720     7.7                       72.4                   100%
2048 × 1536    7.7                       72.4                   100%
First, we look for a connection between the ideal focal length and the camera resolution, assuming both the object's distance from the camera and its size remain the same. The result is seen in Figure 4: the ideal focal length decreases as the camera resolution improves. The ideal focal length is the length the study recommends to give the user the maximum coverage of the area. It is also mathematically correlated with the maximum angle of depression the camera can have, given in [12], while still being able to recognize the object at that particular distance.
Figure 5 shows the relationship between the minimum object length and the percentage coverage of the total area for various focal lengths and camera angles, for objects like a banana, a bottle, and a bag, assuming a constant object-to-camera distance of 4 meters and a constant camera resolution of 480 by 360 pixels. The minimum length is considered because if a 20-centimeter bottle is detected properly, then a bottle with comparable characteristics but a slightly greater height would also be detected. The testing was done with bottles of various sizes, but for the camera feature analysis the minimum object length was chosen: 20 cm for bottles, 13 cm for bananas, and 35 cm for bags.
6 Conclusion
TrashWatch detects littering involving twelve object classes with high accuracy. The derived camera specifications make personalized deployment straightforward. The system's efficacy has been demonstrated in varied settings and with different test participants. The camera's coverage decreased slightly as object size rose while distance and resolution were held constant, and optimizing camera resolution and focal length increased coverage significantly. The study showed how camera resolution and focal length affect item identification. TrashWatch could grow into a comprehensive solution that includes surveillance-based litter detection, parallel active monitoring for facial identification, and an automatic penalty system. Active tracking can locate offenders, and working with authorities to create a powerful image database connected to National Identification Cards (NIC) can help enforcement in schools and offices. This study lays the groundwork for waste management technology developments and reinforces the commitment to cleaner and more accountable surroundings.
Fig. 4. Ideal Focal Length of the Camera for Maximum Coverage vs Camera Resolution
Fig. 5. Maximum Percentage Area Covered for Detection Using the System vs Minimum Object Length to be Detected
References
1. Csordás, Róbert, László Havasi, and Tamás Szirányi. "Detecting objects thrown over
fence in outdoor scenes." International Conference on Computer Vision Theory and
Applications. Vol. 2. SciTePress, 2015.
2. Mahankali, Sriya, et al. "Identification of illegal garbage dumping with video analytics." 2018 International Conference on Advances in Computing, Communications
and Informatics (ICACCI). IEEE, 2018.
3. Liu, Wei, et al. "SSD: Single shot multibox detector." Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer International Publishing, 2016.
4. Esen, Ersin, Mehmet Ali Arabaci, and Medeni Soysal. "Fight detection in surveillance videos." 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI). IEEE, 2013.
5. Wang, Limin, et al. "Temporal segment networks: Towards good practices for deep
action recognition." European conference on computer vision. Springer, Cham, 2016.
6. Zhang, Zhong, Christopher Conly, and Vassilis Athitsos. "A survey on vision-based
fall detection." Proceedings of the 8th ACM international conference on PErvasive
technologies related to assistive environments. 2015.
7. Porikli, Fatih, Yuri Ivanov, and Tetsuji Haga. "Robust abandoned object detection
using dual foregrounds." EURASIP Journal on Advances in Signal Processing 2008
(2007): 1-11.
8. Zhou, Xiaowei, Can Yang, and Weichuan Yu. "Moving object detection by detecting
contiguous outliers in the low-rank representation." IEEE transactions on pattern
analysis and machine intelligence 35.3 (2012): 597-610.
9. Jiang, Peiyuan, et al. "A Review of Yolo algorithm developments." Procedia Computer Science 199 (2022): 1066-1073.
10. Redmon, Joseph, and Ali Farhadi. "YOLOv3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
11. Rana, Md Sohel, Aiden Nibali, and Zhen He. "Selection of object detections using overlap map predictions." Neural Computing and Applications 34.21 (2022): 18611-18627.
12. Li, Xiang, et al. "Evaluating effects of focal length and viewing angle in a comparison of recent face landmark and alignment methods." EURASIP Journal on Image and Video Processing 2021 (2021): 1-18.
13. Majchrowska, Sylwia, et al. "Deep learning-based waste detection in natural and
urban environments." Waste Management 138 (2022): 274-284.
14. Xu, Hong-hui, et al. "Object detection in crowded scenes via joint prediction."
Defence Technology (2021).
15. Begur, Hema, et al. "An edge-based smart mobile service system for illegal dumping
detection and monitoring in San Jose." 2017 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation
(SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). IEEE, 2017.
16. Dabholkar, Akshay, et al. "Smart illegal dumping detection." 2017 IEEE Third
International Conference on Big Data Computing Service and Applications (BigDataService). IEEE, 2017.
17. Zhang, Qing, Yongwei Nie, and Wei-Shi Zheng. "Dual illumination estimation for
robust exposure correction." Computer graphics forum. Vol. 38. No. 7. 2019.
18. Guo, Xiaojie, Yu Li, and Haibin Ling. "LIME: Low-light image enhancement via
illumination map estimation." IEEE Transactions on image processing 26.2 (2016):
982-993.
19. Hasan, I., et al. "Generalizable Pedestrian Detection: The Elephant in the Room." 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021.
20. Wang, Haoran, Zhen Hua, and Jinjiang Li. "Two-stage progressive residual learning
network for multi-focus image fusion." IET Image Processing 16.3 (2022): 772-786.
21. Singh, Mohit, Vijay Laxmi, and Parvez Faruki. "Dense spatially-weighted attentive residual-haze network for image dehazing." Applied Intelligence 52.12 (2022): 13855-13869.