AI Based Object Recognition Performance between General Camera and Omnidirectional Camera Images

Shota Kaneda
Department of Electrical Engineering and Computer Science, Graduate School of Engineering and Science
Shibaura Institute of Technology
Tokyo, Japan
ma20027@shibaura-it.ac.jp

Chinthaka Premachandra
Department of Electrical Engineering and Computer Science, Graduate School of Engineering and Science
Shibaura Institute of Technology
Tokyo, Japan
chintaka@shibaura-it.ac.jp

Abstract— In this paper, we present a comparison of the accuracies of AI-based object recognition using a general camera and an omnidirectional camera. Recently, with the improvement in machine learning technology, there has been significant research related to the detection and classification of objects from images and videos. In this field, it is common to use horizontal images and videos. However, omnidirectional cameras, which can acquire information from the entire surrounding area, are becoming popular in addition to general cameras. Although there are some studies on object recognition using these cameras, almost no studies have focused on comparisons between object recognition using general and omnidirectional cameras. Therefore, in this study, we compared the recognition rate of object recognition using the YOLO algorithm on both general and omnidirectional images taken in the same environment.

Keywords— Omnidirectional camera, Object detection, Machine learning, YOLO, Recognition comparison

I. INTRODUCTION

Because of advancements in image processing technology, research on object recognition and classification has progressed in recent years. Additionally, owing to the innovation of Internet of Things (IoT) technology, image processing can be realized in previously unthinkable situations, and information processing using images in various situations is increasing. In these studies, it is possible to obtain information about an object by photographing it with a camera and then performing digital image processing on the acquired image or video.

In recent years, research in the fields of object recognition and classification using artificial intelligence (AI), such as machine learning and deep learning, has made remarkable progress. With the development of deep learning technology, computers can now automatically extract feature values from training data; defining such feature values had previously been the most difficult problem in conventional machine learning.

Additionally, the remarkable development of graphics processing unit (GPU) computing technology and its processing power in recent years has contributed to the advancement of machine learning research. The computational power of GPUs, which was previously used for graphics processing, is now being used for the matrix calculations that are essential for machine learning. Consequently, the time required for machine learning has been significantly reduced compared to the time required for learning using CPUs in the past. Along with the development of deep learning technology, the development of object detection methods has also been remarkable. By using deep learning, object detection with higher accuracy is now possible [1].

When performing machine learning using AI, it is important to prepare the data to be learned. In the case of object recognition, images and videos are used as training data, and before learning, each image must be annotated with relevant information, such as the coordinates of the part of the image or video occupied by the target object and a label for the category into which the target object is classified. Most existing datasets for object recognition were captured using standard cameras.
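As a concrete illustration of this annotation step, the YOLO family of detectors used later in this study stores one text file per image, where each line holds a class index and a bounding box normalized by the image size. The following minimal Python sketch converts a pixel-space box to that format; the box values and class index are hypothetical examples, not taken from the paper's dataset.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) to a
    YOLO-format line: class x_center y_center width height,
    all normalized to [0, 1] by the image dimensions."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# Hypothetical example: a plastic bottle (class 0) in a 960x960 image.
print(to_yolo_label(0, (400, 300, 520, 620), 960, 960))
# -> "0 0.479167 0.479167 0.125000 0.333333"
```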
Recently, there has been growing interest in the use of 360-degree cameras, which can acquire information on the entire surrounding area, to obtain a wider range of information than ordinary cameras provide [2]. A 360-degree camera uses a fisheye lens to focus light from a wider area than a standard flat camera. The angle of view (approximately 70°), which is almost unidirectional in a normal flat camera, is extended to 180° (a half-omnidirectional hemisphere) by the fisheye lens. Omnidirectional cameras have a structure in which hemispheric fisheye lenses are mounted on the front and back of the camera body (Fig. 1).

Fig. 1. Omnidirectional camera "RICOH THETA S"

Because an omnidirectional camera has a wide field of view with a single camera, it has the advantage that fewer cameras need to be installed than the general cameras used in surveillance systems in open areas [3]. Additionally, when tracking a target object, a broader range of information can be obtained from the same camera than the angle of view of a general camera allows. Recently, research has been conducted on the development of omnidirectional photography, including image processing research on generating omnidirectional camera images by matching two independent hemispherical camera images [4] and high-resolution imaging of omnidirectional camera images [5-6]. Object recognition and its evaluation by deep learning on images and videos obtained from such omnidirectional cameras have also been studied [6-13]. However, the comparison of object recognition accuracies between cameras with different lens imaging methods, such as general and omnidirectional cameras, has not yet been studied.

In this study, we acquired images of the same object in the same environment using a general camera and an omnidirectional camera, and we created training data for machine learning for each camera based on these images. Learning was then performed on this training data using deep learning technology. Using the resulting learning models, we detected the target object in separate test images from both the general and omnidirectional cameras, and we compared the detection accuracy.

This paper consists of five sections. Section II compares the image capture methods and other aspects of the general and omnidirectional cameras used in this study and introduces the technology used. Section III defines the problem and describes the methodology. Section IV presents the experimental environment and the results. Section V summarizes the results.
II. OMNIDIRECTIONAL AND GENERAL IMAGE GENERATION

A. Imaging processing

This section describes the imaging methods of the two types of cameras that are important in this research. In this study, we refer to a camera with a general imaging method as a general camera and to a 360-degree camera capable of capturing the entire surroundings as an omnidirectional camera.

In general cameras, the following steps are taken: first, the lens captures the object to be photographed, and then the shutter is released to save the digital image in the camera body or on a storage device (Fig. 2).

Fig. 2. Structure of a typical digital camera

First, the object to be photographed is captured by the lens, and the amount of light entering the lens and the focus are adjusted to produce an optical image of the object on the image sensor of the flat camera, based on the light-refraction characteristics of the lens. The actual lens in a general digital camera consists of a combination of multiple lenses with different magnifications, refractive indices, and shapes. The image sensor consists of a collection of minute semiconductor photodiodes. When the shutter is pressed, the light image collected by the lens on the image sensor is converted from analog light information into electrical digital information by the photoelectric effect of the photodiodes that constitute the sensor. The electrical information acquired by a single photodiode is the information of one pixel in the digital image, and the information of all pixels is integrated to form a single image. This image data is stored in the storage area of the camera itself or on various memory cards after the image processing engine in the camera adjusts contents such as color and brightness. Because the number of photodiodes on the image sensor corresponds to the resolution of the acquired image, in general, the more sensor elements there are, the more information the acquired image contains. An image acquired by a general camera usually has a rectangular shape that is longer in the horizontal direction.

As mentioned earlier, the lens used in an omnidirectional camera is a half-omnidirectional fisheye lens, and each lens can acquire information from approximately 180° around the camera. In general, the ambient information acquired by the omnidirectional camera is projected onto the image sensor using the equidistant projection method, as shown in Figs. 3 and 4, and then saved as image data in the storage area. The ambient information acquired through equidistant projection forms a circle in which the distance from the center of the lens is proportional to the incidence angle of the object. The stored image itself is square, and outside the circular area described by the equidistant projection, the image is filled with black pixels that carry no light information.

Fig. 3. Equidistant projection of a fisheye lens

Fig. 4. Real image of equidistant projection

In the equidistant projection images shown in Figs. 3 and 4, the red dot indicates the lens center, and the red circles indicate equidistant object positions. Because the distances from the red dot to each red circle are equal, the characteristics of the fisheye lens cause distortion to increase with distance from the center of the acquired image, so the appearance of an object is more distorted than with a typical camera. Some methods correct this peculiar distortion and modify the image [6].
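To make the equidistant projection concrete: an equidistant fisheye maps a ray arriving at incidence angle theta (in radians) from the optical axis to an image-plane radius r = f * theta, where f is the focal length. The Python sketch below converts between pixel positions and incidence angles under this model and flags the black region outside the image circle; the frame size and field of view are illustrative assumptions, not RICOH THETA S specifications.

```python
import math

# Hypothetical parameters: a 960x960 fisheye frame whose image circle
# spans the full width, covering a 180-degree (pi rad) field of view.
IMG_SIZE = 960
CX = CY = IMG_SIZE / 2.0
R_MAX = IMG_SIZE / 2.0               # radius of the image circle
F = R_MAX / (math.pi / 2.0)          # r = f * theta, with theta_max = 90 deg

def pixel_to_angle(u, v):
    """Return (theta, phi): incidence angle from the optical axis and
    azimuth, or None if the pixel lies outside the image circle."""
    dx, dy = u - CX, v - CY
    r = math.hypot(dx, dy)
    if r > R_MAX:
        return None                  # black corner region, no light info
    theta = r / F                    # equidistant model: theta = r / f
    phi = math.atan2(dy, dx)
    return theta, phi

def angle_to_pixel(theta, phi):
    """Inverse mapping: project a ray (theta, phi) to pixel coordinates."""
    r = F * theta                    # image radius grows linearly with theta
    return CX + r * math.cos(phi), CY + r * math.sin(phi)

# A ray 45 degrees off-axis lands halfway to the image-circle edge.
print(angle_to_pixel(math.pi / 4, 0.0))   # -> (720.0, 480.0)
```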
III. PROBLEM SETTING

In this study, we compared the object-recognition performance of a general camera and an omnidirectional camera. While countless freely usable datasets of images taken by general cameras are available, there are very few freely usable datasets of images captured by fisheye (omnidirectional) cameras. Additionally, we could not find any pair of datasets that covered nearly equivalent scenes for both general and omnidirectional cameras. Therefore, we acquired images from both the general and omnidirectional cameras while maintaining the same distance relationship between the camera and the object, and we created a dataset for object recognition for each camera (Fig. 5). Furthermore, YOLOv5 was used as the object detection algorithm in this study; YOLO is characterized by the fact that two processes, object detection and class classification, are performed simultaneously [7]. Tab. 2 shows the details of the datasets created for each camera.

In this study, we used plastic bottles as the target objects for object recognition. We chose plastic bottles because they come in various colors and sizes, are inexpensive, and can be used to capture images for the dataset in various locations and conditions.

Fig. 5. Photographic equipment

Tab. 2. Dataset Contents

  Dataset                          General     Omnidirectional
  Number of images                 800         800
  Number of objects (per image)    2~5         2~5
  Image size (pixel × pixel)       960×960     960×960
  Epochs                           500         500

After creating the result (learning model) file through deep learning with the produced datasets, we verified the learning models using the following method. In general object recognition, evaluation metrics such as intersection over union (IoU) are commonly used. However, when object recognition is performed using the training results of the fisheye lens used in the omnidirectional camera, the recognition rate at the center of the lens, where the lens distortion is small, is expected to differ from the recognition rate at the periphery, where the lens distortion is large. Therefore, we designed the verification so that performance can be checked while various conditions are changed when capturing the verification images.

To verify the recognition rate, we used three objects: one in the center and one each on the left and right sides. The leftmost object is called object 1 (obj1), the center object is called object 2 (obj2), and the rightmost object is called object 3 (obj3) (Fig. 6).

Fig. 6. Verification objects (obj1: left, obj2: center, obj3: right)

Obj1 and obj3 were moved outward by 5 cm at a time, up to a maximum distance of 80 cm from the center. Seventeen different images were taken in this way, and the recognition rate of each object was calculated for each image (Fig. 7). Furthermore, the distance between obj2 and the camera was defined as the camera distance, and the camera position was changed in 0.25 m steps. Images were taken in eight patterns, with camera distances ranging from 0.25 m to 2 m. For the recognition-rate verification, we used 1920 × 1080 images for the general camera and 960 × 960 images for the fisheye camera.

Fig. 7. Measurement method

The above verification of the recognition rate for the two types of cameras was performed by applying the learning model obtained from deep learning on the dataset created from the images taken by the same camera. We also checked the effect on object recognition of applying a learning model created from images of the other camera type.
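A minimal sketch of such a verification loop, assuming the torch.hub loading interface documented for the ultralytics/yolov5 repository; the weight and image file names are hypothetical, and the paper does not state its exact tooling. Matching each detection to obj1, obj2, or obj3 by its horizontal position in the frame is omitted for brevity.

```python
import torch

# Load a custom-trained YOLOv5 model (hypothetical weights file).
# 'ultralytics/yolov5' with 'custom' is the repository's documented
# torch.hub entry point; best.pt would come from the 500-epoch training.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

def recognition_scores(image_path):
    """Run detection on one verification image and return the
    confidence score of every detection (single bottle class here)."""
    results = model(image_path)
    # results.xyxy[0] rows: x1, y1, x2, y2, confidence, class
    return [float(det[4]) for det in results.xyxy[0]]

# Hypothetical sweep: 8 camera distances x 17 lateral positions.
for dist_idx in range(8):            # 0.25 m .. 2.00 m in 0.25 m steps
    for pos_idx in range(17):        # 0 cm .. 80 cm in 5 cm steps
        path = f"verify/d{dist_idx}_p{pos_idx}.jpg"
        print(path, recognition_scores(path))
```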
IV. EXPERIMENTAL EVALUATION

A. General camera's verification

Fig. 8 shows the recognition rate of the target objects at each distance in the validation images captured by the general camera. The average recognition rates of the objects are listed in Table 3.

Fig. 8. General camera's recognition rate (learning model from the general camera)

Tab. 3. Average recognition rate for the general camera

  Object   Average recognition rate
  obj1     0.762
  obj2     0.646
  obj3     0.811

For the general camera, the recognition rate of the central obj2 was slightly lower at camera distances of 1.0 m to 1.5 m, but relatively high in the other cases (Fig. 8, right). Comparing obj1 on the left and obj3 on the right, which were moved symmetrically with respect to the center, the recognition rate of obj1 tended to be slightly lower than that of obj3, but both objects showed stable recognition rates. Obj1 and obj3, which are often located near the edge of the image, had slightly higher recognition rates than obj2, which is located at the center of the image (Fig. 8, left).

B. Omnidirectional camera's verification

Fig. 9 shows the recognition rate of the target objects at each distance in the validation images captured by the omnidirectional camera. The average recognition rates of the objects are listed in Table 4.

Fig. 9. Omnidirectional camera's recognition rate (learning model from the omnidirectional camera)

Tab. 4. Average recognition rate for the omnidirectional camera

  Object   Average recognition rate
  obj1     0.402
  obj2     0.773
  obj3     0.666

For the omnidirectional camera, it was confirmed that the recognition rate of the central obj2 gradually decreased as the camera-object distance increased (Fig. 9, right). For obj1 on the left and obj3 on the right, which were moved symmetrically with respect to the center, the recognition rate also tended to decrease as the camera-object distance increased, similar to obj2 at the center. Comparing obj2 at the center of the lens with obj1 and obj3 at the periphery of the lens, the recognition rate of obj2 tended to be higher. When comparing obj1 on the left with obj3 on the right, the recognition rate of obj1 tended to be lower than that of obj3, and this tendency was more pronounced with the omnidirectional camera.
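For reference, the per-object averages reported in Tables 3 to 6 are plain means over all verification images; a minimal sketch, assuming the per-image recognition rates have already been collected for each object (e.g., by a loop like the one sketched in Section III):

```python
from statistics import mean

# Hypothetical container: one recognition-rate entry per verification
# image (8 camera distances x 17 object positions) for each object.
rates = {"obj1": [], "obj2": [], "obj3": []}

def average_rates(rates):
    """Mean recognition rate per object, in the style of Tables 3-6."""
    return {name: round(mean(vals), 3) for name, vals in rates.items() if vals}
```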
C. Omnidirectional camera's verification with the general camera's learning model

Fig. 10 shows the recognition rate of the target objects at each distance in the validation images taken by the omnidirectional camera, with results obtained by applying the learning model trained on the general camera's images. The average recognition rates of the objects are listed in Table 5.

Fig. 10. Omnidirectional camera's recognition rate (learning model from the general camera)

Tab. 5. Average recognition rate for the omnidirectional camera (general camera's learning model)

  Object   Average recognition rate
  obj1     0.343
  obj2     0.609
  obj3     0.252

When the recognition rate for omnidirectional camera images was measured using the learning model trained on the general camera, the recognition rates of all objects decreased significantly as the camera distance increased. This trend is more pronounced than when the omnidirectional camera's images were verified with the omnidirectional camera's own learning model. In addition, there was a large difference in recognition rates between the central obj2 and the symmetrically moved obj1 and obj3, with the central obj2 having the higher recognition rate.

D. General camera's verification with the omnidirectional camera's learning model

Fig. 11 shows the recognition rate of the target objects at each distance in the validation images taken by the general camera, with results obtained by applying the learning model trained on the omnidirectional camera's images. The average recognition rates of the objects are listed in Table 6.

Fig. 11. General camera's recognition rate (learning model from the omnidirectional camera)

Tab. 6. Average recognition rate for the general camera (omnidirectional camera's learning model)

  Object   Average recognition rate
  obj1     0.027
  obj2     0.587
  obj3     0.433

When the recognition rate for general camera images was measured using the learning model trained on the omnidirectional camera, no object could be recognized at the shortest camera-object distance. The recognition rate was rather low at camera distances between 1.0 m and 1.5 m, but relatively high in the other cases (Fig. 11). This trend was somewhat similar to that of applying the general camera's learning model to the general camera's images.

V. CONCLUSION

In this study, we compared the performance of object recognition using a general camera and an omnidirectional camera. In our experiments, we created datasets from the two types of cameras in almost the same environment and performed machine learning. Based on the learning results (models), we verified the recognition of the target objects while changing the positional relationship between the camera and the objects. As a result, we found that the recognition range of a general camera is narrow because of its viewing angle, but within a few meters its recognition rate is not easily affected by distance changes or by the position of the object in the image, whereas the recognition range of an omnidirectional camera is wide, but its recognition rate is easily affected by distance changes and by the position of the object in the image. This indicates that the tendency of object recognition in omnidirectional camera images differs between the center and the periphery of the fisheye lens. Furthermore, we confirmed that the recognition rate decreases when recognition is performed with a learning model created from images taken with a different shooting method. Thus, creating a dataset with the camera actually used for object recognition is a very effective approach from the viewpoint of recognition rate. In the future, we would like to consider a method for object recognition that exploits the fisheye lens of an omnidirectional camera based on the results of this study.

REFERENCES
[1] K. B. Lee and H. S. Shin, "An Application of a Deep Learning Algorithm for Automatic Detection of Unexpected Accidents Under Bad CCTV Monitoring Conditions in Tunnels," 2019.
[2] C. Premachandra, S. Ueda, and Y. Suzuki, "Detection and Tracking of Moving Objects at Road Intersections Using a 360-Degree Camera for Driver Assistance and Automated Driving," IEEE Access, vol. 8, pp. 135652-135660, July 2020.
[3] M. Budagavi, J. Furton, G. Jin, A. Saxena, J. Wilkinson, and A. Dickerson, "360 degrees video coding using region adaptive smoothing," Proc. IEEE Int. Conf. Image Process. (ICIP), pp. 750-754, Sep. 2015.
[4] S. Ono and C. Premachandra, "Generation of Panoramic Images by Two Hemiomnidirectional Cameras Independent of Installation Location," IEEE Consumer Electronics Magazine (Early Access Article).
[5] C. Premachandra and M. Tamaki, "A Hybrid Camera System for High-Resolutionization of Target Objects in Omnidirectional Images," IEEE Sensors Journal, vol. 21, no. 9, pp. 10752-10760, May 2021.
[6] M. Tamaki and C. Premachandra, "An Automatic Compensation System for Unclear Area in 360-Degree Images Using Pan-Tilt Camera," Proc. 5th IEEE International Symposium on Systems Engineering, Oct. 2019.
[7] C. Premachandra, S. Ueda, and Y. Suzuki, "Detection and Tracking of Moving Objects at Road Intersections Using a 360-Degree Camera for Driver Assistance and Automated Driving," IEEE Access, vol. 8, pp. 135652-135660, July 2020.
[8] M. Budagavi, J. Furton, G. Jin, A. Saxena, J. Wilkinson, and A. Dickerson, "360 degrees video coding using region adaptive smoothing," Proc. IEEE Int. Conf. Image Process. (ICIP), pp. 750-754, Sep. 2015.
[9] A. K. Mulya, F. Ardilla, and D. Pramadihanto, "Ball tracking and goal detection for middle size soccer robot using omnidirectional camera," Proc. Int. Electron. Symp. (IES), pp. 432-437, Sep. 2016.
[10] G. Pudics, M. Z. Szabo-Resch, and Z. Vamossy, "Safe robot navigation using an omnidirectional camera," Proc. 16th IEEE Int. Symp. Comput. Intell. Informat. (CINTI), pp. 227-231, Nov. 2015.
[11] T. Kai, H. Lu, and T. Kamiya, "Object Recognition from Omnidirectional Camera Images Based on YOLOv3," Proc. 20th International Conference on Control, Automation and Systems (ICCAS), 2020.
[12] B. Zhang, J. Wang, J. Li, and X. Wang, "Fisheye Lens Distortion Correction Based on an Ellipsoidal Function Model," Proc. International Conference on Industrial Informatics - Computing Technology, Intelligent Technology, Industrial Information Integration, 2015.
[13] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," https://arxiv.org/abs/1506.02640, June 2015.