2022 2nd International Conference on Image Processing and Robotics (ICIPRob) | 978-1-6654-0771-7/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICIPRob54042.2022.9798740
AI Based Object Recognition Performance between
General Camera and Omnidirectional Camera
Images
Shota Kaneda
Department of Electrical Engineering and Computer Science,
Graduate School of Engineering and Science
Shibaura Institute of Technology
Tokyo, Janpan
ma20027@shibaura-it.ac.jp
Chinthaka Premachandra
Department of Electrical Engineering and Computer Science,
Graduate School of Engineering and Science
Shibaura Institute of Technology
Tokyo, Japan
chintaka@shibaura-it.ac.jp

Abstract— In this paper, we present a comparison of the accuracies of AI-based object recognition using a general camera and an omnidirectional camera. Recently, with the improvement in machine learning technology, there has been significant research related to the detection and classification of objects from images and videos. In this field, it is common to use conventional horizontal images and videos. However, omnidirectional cameras, which can acquire information from the entire surrounding area, are becoming popular in addition to general cameras. Although there are some studies on object recognition using these cameras, almost no studies have focused on comparisons between object recognition using general and omnidirectional cameras. Therefore, in this study, we compared the recognition rates of object recognition using the YOLO algorithm on general and omnidirectional images taken in the same environment.

Keywords— Omnidirectional camera, Object detection, Machine learning, YOLO, Recognition comparison

I. INTRODUCTION

Because of advancements in image processing technology, research on object recognition and classification has progressed in recent years. Additionally, owing to the innovation of Internet of Things (IoT) technology, image processing can be realized in previously unthinkable situations, and information processing using images in various situations is increasing. In these studies, information about an object can be obtained by photographing it with a camera and then performing digital image processing on the acquired image or video.

In recent years, research in the fields of object recognition and classification using artificial intelligence (AI), such as machine learning and deep learning, has made remarkable progress. With the development of deep learning technology, computers can now automatically extract feature values from training data, whereas defining feature values was previously the most difficult problem in conventional machine learning. Additionally, the remarkable development of graphics processing unit (GPU) computing technology and its processing power in recent years has contributed to the advancement of machine learning research. The computational power of GPUs, which was previously used for graphics processing, is now being used for matrix calculations, which are essential for machine learning. Consequently, the time required for machine learning has been significantly reduced compared to the time required for learning using CPUs in the past. Along with the development of deep learning technology, the development of object detection methods has also been remarkable. By using deep learning, object detection with higher accuracy is now possible [1].

When performing machine learning using AI, it is important to process the data to be learned. In the case of object recognition, images and videos are used as training data. In this case, it is necessary to annotate each image with relevant information before learning, such as the coordinates of the part of the image or video that contains the target object, and to label the category into which the target object is classified. The datasets commonly used for object recognition are captured with standard cameras.
Fig. 1. Omnidirectional camera “RICOH THETA S”
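The annotation step described in the introduction, recording each object's bounding-box coordinates and class label per image, can be sketched in the normalized text format used by YOLO-family detectors. This is a minimal sketch; the function name and example box are illustrative assumptions, not taken from the paper:

```python
# Sketch of preparing one YOLO-style annotation line for a labeled object.
# Assumes pixel-space box corners; names and values are illustrative.

def to_yolo_annotation(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space bounding box (x1, y1)-(x2, y2) into the
    normalized "class cx cy w h" line used by YOLO-format label files."""
    cx = (x1 + x2) / 2.0 / img_w   # box center, normalized to [0, 1]
    cy = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / img_w          # box size, normalized to [0, 1]
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# Hypothetical example: a bottle occupying pixels (400, 300)-(560, 660)
# in a 960x960 image (the dataset resolution used in this study).
print(to_yolo_annotation(0, 400, 300, 560, 660, 960, 960))
# -> 0 0.500000 0.500000 0.166667 0.375000
```

One such line per object is written to a text file alongside each training image before learning begins.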
Recently, there has been growing interest in the use of 360-degree cameras, which can acquire information on the entire
surrounding area, to obtain a wider range of information
compared to ordinary cameras [2]. A 360-degree camera uses
a fisheye lens to focus light from a wider area than a standard
flat camera. The angle of view (approximately 70°), which is
almost unidirectional in a normal flat camera, is extended to
180° (half omnidirectional shape) using a fisheye lens.
Omnidirectional cameras have a structure such that there are
hemispheric fisheye lenses in front of and behind the camera
body (Fig. 1). Because an omnidirectional camera has a wide
field of view with a single camera, it has the advantage that
the number of cameras installed can be reduced compared to
the number of general cameras used in surveillance systems in
open areas [3]. Additionally, when tracking a target object, it
is possible to obtain a broader range of information in the
image that can be acquired from the same camera than the
angle of view of a general camera. Recently, research has been
Authorized licensed use limited to: National Yang Ming Chiao Tung University. Downloaded on March 06,2024 at 16:07:48 UTC from IEEE Xplore. Restrictions apply.
conducted on the development of omnidirectional photography, including image processing research on generating omnidirectional camera images by matching two independent hemispherical camera images [4] and high-resolution imaging of omnidirectional camera images [5-6].
Object recognition and its evaluation by deep learning using
AI on the basis of image processing of images and videos
obtained from the aforementioned omnidirectional camera
have also been conducted [6-13]. However, the comparison of
object recognition accuracies between cameras with different
lens imaging methods, such as general and omnidirectional
cameras, has not yet been studied. In this study, we acquired
images of the same object in the same environment using a
general camera and an omnidirectional camera, and we
created training data for machine learning for each camera
based on these images. Based on the training data, learning
was performed using deep learning technology. Following the
learning model, we detected the target object using different
testing data images from both normal and omnidirectional
cameras, and we compared the detection accuracy.
This paper consists of five sections. Section 2 compares the image capture methods and other aspects of the general and omnidirectional cameras used in this study, and it introduces the technology used. Section 3 defines the problem and describes the methodology used. Section 4 presents the experimental environment and the results of this study. Section 5 summarizes the results.

II. OMNIDIRECTIONAL AND GENERAL IMAGE GENERATION

A. Imaging Process

This section describes the imaging methods for the two types of cameras that are important in this research. In this study, we refer to a camera with a general imaging method as a general camera and to a 360-degree camera capable of capturing the entire surroundings as an omnidirectional camera.

In general cameras, the following steps are taken: first, the lens captures the object to be photographed, and then the shutter is released to save the digital image in the camera body or storage device (Fig. 2).

Fig. 2. Structure of a typical digital camera

First, the object to be photographed is captured by the lens, and then the amount of light entering the lens and the focus are adjusted to produce an optical image of the object on the image sensor of the flat camera, based on the light-refraction characteristics of the lens. The actual lens in a general digital camera consists of a combination of multiple lenses with different magnifications, refractive indices, and shapes. The image sensor consists of a collection of minute semiconductors. When the shutter is pressed, the light image on the image sensor collected by the lens is converted from analog information carried by light into electrical digital information through the photoelectric effect of the photodiodes that constitute the image sensor. The electrical information acquired by a single photodiode is the information of one pixel in the digital image. The information of all pixels is integrated to form a single image. This image data is stored in the storage area of the camera itself or on various memory cards after the image processing engine in the camera adjusts the contents, such as color and brightness. Because the number of semiconductors on the image sensor corresponds to the resolution of the acquired image, in general, the more photodiodes the image sensor has, the more information there is in the acquired image. An image acquired by a camera often has a rectangular shape that is long in the horizontal direction.

As mentioned earlier, the lens used in an omnidirectional camera is a half-omnidirectional fisheye lens, and each lens can acquire information from approximately 180° around the camera. In general, the ambient information acquired by the omnidirectional camera is projected onto the image sensor in the camera using the equidistant projection method, as shown in Fig. 4, and then saved as image data in the storage area. The ambient information of the lens acquired using the equidistant projection method forms a circle in which the distance from the center of the lens is proportional to the angle of the object. The stored image is actually a square, and outside the circular area described by equidistant projection, the image is filled with black pixels that carry no light information.

Fig. 3. Equidistant projection of a fisheye lens

Fig. 4. Real image of equidistant projection

In the equidistant projection images shown in Figs. 3 and 4, the red dot indicates the lens center, and the red circles indicate equidistant object positions. Because the distances from the red point to each red circle are equal, owing to the characteristics of the fisheye lens, more distortion is caused in the acquired image the farther an object is from the center red point, further distorting the appearance of the object compared to a typical camera. Some methods correct this peculiar distortion and modify the image [6].
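The equidistant projection described above, in which the distance r from the image center is proportional to the incidence angle of the incoming ray, can be sketched as follows. The image center (480, 480) follows from the 960 × 960 images used in this study, but the focal constant f is an illustrative assumption, not a calibration value from the paper:

```python
import math

# Sketch of the equidistant (f-theta) fisheye projection: the radius r
# from the image center grows linearly with the incidence angle theta.
# The focal constant f is an assumed value, not a calibrated one.

def equidistant_project(theta, phi, f=160.0, cx=480.0, cy=480.0):
    """Map a ray with incidence angle theta (rad, 0 = optical axis) and
    azimuth phi (rad) to pixel coordinates, using r = f * theta."""
    r = f * theta  # equidistant projection: radius proportional to angle
    return (cx + r * math.cos(phi), cy + r * math.sin(phi))

# A ray along the optical axis lands at the image center,
print(equidistant_project(0.0, 0.0))          # (480.0, 480.0)
# and a ray at 90 deg (the edge of the 180 deg hemisphere) lands on the
# bounding circle of the fisheye image, at radius r = f * pi / 2.
print(equidistant_project(math.pi / 2, 0.0))
```

Everything outside that bounding circle is the black region of the square stored image mentioned above.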
III. PROBLEM SETTING

In this study, we compared the object-recognition performance of a general camera and an omnidirectional camera. While countless freely available datasets exist for images taken by general cameras, there are very few freely available datasets of images captured by fisheye (omnidirectional) cameras. Additionally, we could not find any datasets that were nearly equivalent for both general and omnidirectional cameras. Therefore, we acquired images from both general and omnidirectional cameras while maintaining the same distance relationship between the camera and the object, and we created a dataset for object recognition for each camera (Fig. 5). Furthermore, YOLOv5 was used as the object detection algorithm in this study; YOLO is characterized by the fact that two processes, object detection and class classification, are performed simultaneously [7]. Table 2 shows the details of the datasets created for each camera used in this study.

In this study, we used plastic bottles as target objects for object recognition. We chose plastic bottles as the target object because they come in various colors and sizes, are inexpensive, and can be used to capture images for the dataset in various locations and conditions.

Fig. 5. Photographic equipment

In this study, after creating the result (learning model) file through deep learning with the produced dataset, we verified the learning model using the following method. In general object recognition, evaluation metrics such as intersection over union (IoU) are used in many cases. However, when object recognition is performed using the training results of the fisheye lens used in the omnidirectional camera, the recognition rate at the center of the lens, where the lens distortion is small, is expected to differ from the recognition rate at the periphery, where the lens distortion is large. Therefore, we made it possible to check the performance by changing various conditions when taking images for verification.

To verify the recognition rate, we used three objects: one in the center and one each on the left and right sides. The leftmost object is called object 1 (obj1), the object placed in the center is called object 2 (obj2), and the rightmost object is called object 3 (obj3) (Fig. 6).

Fig. 6. Verification objects (obj1: left, obj2: center, obj3: right)

Obj1 and obj3 were moved outward from the table center in 5 cm steps, up to a maximum distance of 80 cm from the center. Seventeen different images were taken, and the recognition rate of the objects was calculated for each image (Fig. 7). Furthermore, the distance between obj2 and the camera was defined as the camera distance, and the camera position was changed every 0.25 m. The images were taken in eight patterns, with camera distances ranging from 0.25 m to 2 m. For the recognition rate verification, we used 1920 × 1080 images for the general camera and 960 × 960 images for the fisheye camera.

Fig. 7. Measurement method

Tab. 2. Dataset Contents

  Dataset                          General    Omnidirectional
  Number of images                 800        800
  Number of objects (per image)    2~5        2~5
  Image size (pixel × pixel)       960 × 960  960 × 960
  Epochs                           500        500

The above verification of the recognition rate for the two types of cameras was performed by applying the learning model obtained from deep learning using the dataset created from the images taken by each of the general and omnidirectional cameras. We also checked the effect of applying learning models created from images of a different camera type on object recognition.
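The verification conditions described in Section III can be enumerated programmatically. A minimal sketch, assuming one validation image per combination of camera distance and lateral offset (the names are illustrative, not from the paper's code):

```python
# Sketch of the verification grid: obj1/obj3 are moved outward in 5 cm
# steps up to 80 cm (17 offsets, including the start position), and the
# camera distance to obj2 ranges from 0.25 m to 2.0 m in 0.25 m steps
# (8 positions). Names are illustrative, not from the paper's code.

def verification_grid():
    lateral_offsets_cm = [5 * i for i in range(17)]       # 0, 5, ..., 80 cm
    camera_distances_m = [0.25 * i for i in range(1, 9)]  # 0.25, ..., 2.0 m
    # One validation shot per (distance, offset) combination.
    return [(d, off) for d in camera_distances_m for off in lateral_offsets_cm]

shots = verification_grid()
print(len(shots))  # 8 distances x 17 offsets = 136 combinations
```

The recognition rate is then measured for each of obj1, obj2, and obj3 at every grid point, which is what the per-distance curves in Section IV plot.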
IV. EXPERIMENTAL EVALUATION
A. General camera's verification
Fig. 8 shows the recognition rate of the target object at each distance in the validation images captured by the general camera. The average recognition rates of each object are listed in Table 3.

Fig. 8. General camera's recognition rate (learning model trained with the general camera)

Tab. 3. Average recognition rate for the general camera

  Object    Average recognition rate
  obj1      0.762
  obj2      0.646
  obj3      0.811
The recognition rate of obj2 at the center was slightly lower for the general camera at camera distances of 1.0 m to 1.5 m. However, the recognition rate was relatively high in the other cases (Fig. 8, right).
Comparing obj1 on the left and obj3 on the right, which were moved symmetrically with respect to the center, the recognition rate of obj1 tended to be slightly lower, but both objects showed stable recognition rates. Obj1 and obj3, which are often located at the edge of the image, had slightly higher recognition rates than obj2, which is located in the center of the image (Fig. 8, left).
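Detection metrics such as the intersection over union (IoU) mentioned in Section III underlie recognition-rate measurements like these. The following is a minimal sketch of IoU for axis-aligned boxes given as (x1, y1, x2, y2) tuples; it illustrates the metric itself, not the paper's exact evaluation pipeline:

```python
# Minimal sketch of intersection over union (IoU) for two axis-aligned
# bounding boxes in (x1, y1, x2, y2) form. Illustrative only.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih                                # overlap area (0 if disjoint)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes -> 1.0
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-shifted -> 50/150
```

A detection is typically counted as correct when its IoU with the ground-truth box exceeds a threshold such as 0.5.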
B. Omnidirectional camera's verification
Fig. 9 shows the recognition rate of the target object at each distance in the validation images captured by the omnidirectional camera. The average recognition rates of each object are listed in Table 4.

Fig. 9. Omnidirectional camera's recognition rate (learning model trained with the omnidirectional camera)

Tab. 4. Average recognition rate for the omnidirectional camera

  Object    Average recognition rate
  obj1      0.402
  obj2      0.773
  obj3      0.666
For the omnidirectional camera, it was confirmed that the recognition rate of the central obj2 gradually decreased as the camera-object distance increased (Fig. 9, right).
Looking at obj1 on the left and obj3 on the right, which were moved symmetrically with respect to the center, the recognition rate tends to decrease as the camera-object distance increases, similar to obj2 at the center. Also, comparing obj2 at the center of the lens with obj1 and obj3 located at the periphery of the lens, the recognition rate of obj2 at the center tends to be higher. When we compared the recognition rate of obj1 on the left with that of obj3 on the right, the recognition rate of obj1 tended to be lower than that of obj3, and this tendency was stronger with the omnidirectional camera.
C. Omnidirectional camera's verification with the general camera's learning model
Fig. 10 shows the recognition rate of the target object at each distance in the validation images taken by the omnidirectional camera, with results derived by applying the learning result (learning model) obtained with the general camera. The average recognition rates for each object are listed in Table 5.
Fig. 10. Omnidirectional camera's recognition rate (learning model trained with the general camera)

Tab. 5. Average recognition rate for the omnidirectional camera

  Object    Average recognition rate
  obj1      0.343
  obj2      0.609
  obj3      0.252
These are the results of measuring the recognition rate for object recognition on omnidirectional camera images using the learning result (learning model) obtained with the general camera. The results show that the recognition rates of all objects decrease significantly as the camera distance increases. This trend is more pronounced than in the verification of the omnidirectional camera using the omnidirectional camera's own learning model.
In addition, there was a large difference in recognition rates
between the central obj2 and symmetrically moving obj1 and
obj3, with the central obj2 having a higher recognition rate.
D. General camera's verification with the omnidirectional camera's learning model
Fig. 11 shows the recognition rate of the target object at each distance in the validation images taken by the general camera, with results derived by applying the learning result (learning model) obtained with the omnidirectional camera. The average recognition rates for each object are listed in Table 6.
Fig. 11. General camera's recognition rate (learning model trained with the omnidirectional camera)

Tab. 6. Average recognition rate for the general camera

  Object    Average recognition rate
  obj1      0.027
  obj2      0.587
  obj3      0.433

These are the results of measuring the recognition rate for object recognition on general camera images using the learning result (learning model) obtained with the omnidirectional camera. The results show that where the camera-object distance was the shortest, no object could be recognized. The recognition rate was rather low when the camera distance was between 1.0 m and 1.5 m, but relatively high in the other cases (Fig. 11). This trend was somewhat similar to that of applying the learning model of the general camera to the general camera's images.

V. CONCLUSION

In this study, we have addressed the comparison of object-recognition performance using a general camera and an omnidirectional camera. In our experiments, we created datasets from two types of cameras, a general camera and an omnidirectional camera, in almost the same environment, and performed machine learning. Based on the learning results (models), we verified the object recognition of the target object by changing the positional relationship between the camera and the target object.

As a result, we found that the recognition range of a general camera is narrow owing to its viewing angle, but its recognition rate is not easily affected by distance changes or the position of the object in the image within a few meters, whereas the recognition range of an omnidirectional camera is wide, but its recognition rate is easily affected by distance changes and the position of the object in the image. This indicates that the tendency of object recognition in omnidirectional camera images differs between the center and the periphery of the fisheye lens.

Furthermore, we confirmed that the recognition rate decreases when recognition is performed by a learning model created with images from a different shooting method. Thus, it was demonstrated that creating a dataset with the camera actually used for object recognition is a very effective method from the viewpoint of recognition rate. In the future, we would like to consider a method for object recognition utilizing the fisheye lens of an omnidirectional camera based on the results of this study.

REFERENCES

[1] K. B. Lee and H. S. Shin, "An Application of a Deep Learning Algorithm for Automatic Detection of Unexpected Accidents Under Bad CCTV Monitoring Conditions in Tunnels," 2019.
[2] C. Premachandra, S. Ueda, and Y. Suzuki, "Detection and Tracking of Moving Objects at Road Intersections Using a 360-Degree Camera for Driver Assistance and Automated Driving," IEEE Access, Vol. 8, pp. 135652-135660, July 2020.
[3] M. Budagavi, J. Furton, G. Jin, A. Saxena, J. Wilkinson, and A. Dickerson, "360 degrees video coding using region adaptive smoothing," Proc. IEEE Int. Conf. Image Process. (ICIP), pp. 750-754, Sep. 2015.
[4] S. Ono and C. Premachandra, "Generation of Panoramic Images by Two Hemiomnidirectional Cameras Independent of Installation Location," IEEE Consumer Electronics Magazine (Early Access Article).
[5] C. Premachandra and M. Tamaki, "A Hybrid Camera System for High-Resolutionization of Target Objects in Omnidirectional Images," IEEE Sensors Journal, Vol. 21, No. 9, pp. 10752-10760, May 2021.
[6] M. Tamaki and C. Premachandra, "An automatic compensation system for unclear area in 360-degree images using Pan-Tilt camera," Proc. of 5th IEEE International Symposium on Systems Engineering, Oct. 2019.
[7] C. Premachandra, S. Ueda, and Y. Suzuki, "Detection and Tracking of Moving Objects at Road Intersections Using a 360-Degree Camera for Driver Assistance and Automated Driving," IEEE Access, Vol. 8, pp. 135652-135660, July 2020.
[8] M. Budagavi, J. Furton, G. Jin, A. Saxena, J. Wilkinson, and A. Dickerson, "360 degrees video coding using region adaptive smoothing," Proc. IEEE Int. Conf. Image Process. (ICIP), pp. 750-754, Sep. 2015.
[9] A. K. Mulya, F. Ardilla, and D. Pramadihanto, "Ball tracking and goal detection for middle size soccer robot using omnidirectional camera," Proc. Int. Electron. Symp. (IES), pp. 432-437, Sep. 2016.
[10] G. Pudics, M. Z. Szabo-Resch, and Z. Vamossy, "Safe robot navigation using an omnidirectional camera," Proc. 16th IEEE Int. Symp. Comput. Intell. Informat. (CINTI), pp. 227-231, Nov. 2015.
[11] T. Kai, H. Lu, and T. Kamiya, "Object Recognition from Omnidirectional Camera Images Based on YOLOv3," 2020 20th International Conference on Control, Automation and Systems (ICCAS), 2020.
[12] B. Zhang, J. Wang, J. Li, and X. Wang, "Fisheye Lens Distortion Correction Based on an Ellipsoidal Function Model," 2015 International Conference on Industrial Informatics - Computing Technology, Intelligent Technology, Industrial Information Integration, 2015.
[13] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," https://arxiv.org/abs/1506.02640, June 2015.