Using depth cameras for biomass estimation – a multi-angle approach

Andújar, D.1, Escolà, A.2, Rosell-Polo, J.R.2, Ribeiro, A.3, San Martín, C.1, Fernández-Quintanilla, C.1, and Dorado, J.1
1 Institute of Agricultural Sciences, CSIC, Madrid 28006, Spain
2 Research Group on AgroICT & Precision Agriculture, UdL, Lleida 25198, Spain
3 Center for Automation and Robotics, CSIC, Arganda del Rey 28500, Madrid, Spain
dionisioandujar@hotmail.com

Abstract

Multi-angle plant reconstruction with sensors such as the Microsoft Kinect creates realistic models. However, a full 3D reconstruction from every angle is not currently possible under field conditions. When on-the-go measurements are taken, the sensor must be fixed to a vehicle and its best position needs to be determined. The objective of this study was to assess the ability of the Microsoft Kinect for Windows v1 sensor to quantify the biomass of poplar trees from different angles at a stationary position, in other words, to explore the best location of the sensor with respect to the trees. For this purpose, readings were obtained by placing the sensor at one meter from the tree and comparing four view angles: top view (0º), 45º, perpendicular (90º) and ground (-45º). Good correlations were found between dry biomass and the plant surface area calculated from the raw measured data. The comparison of the different view angles revealed that the top view gave the poorest results because top leaves occluded lower leaves, whereas the other views led to good results. Consequently, the Microsoft Kinect for Windows v1 sensor provides reliable information about crop biomass.

Keywords: depth cameras, Kinect, angle of view, biomass assessment.

Introduction

Numerous sensors have been used for crop characterization: RGB cameras, NDVI sensors, fluorescence systems, ultrasonic sensors and LiDAR sensors. RGB cameras have been widely used because of their efficiency and high resolution. NDVI sensors have mainly been targeted at variable-rate nitrogen applications. Ultrasonic sensors could support the reliable adoption of site-specific management technologies. Escolà et al. (2011) developed a real-time sprayer based on ultrasonic sensors, each of which managed and activated a specific boom section. However, these sensors have various limitations, mainly related to their small field of view, so several of them are necessary to scan a representative proportion of the field. In this regard, a perception system with wider scanning possibilities could improve the results. LiDAR (LIght Detection And Ranging) sensing technologies could be a powerful tool, and Terrestrial Laser Scanners (TLS) have become popular in plant science in recent years for obtaining 3D models (Méndez et al., 2014). Several other techniques provide 3D models, such as magnetic resonance or X-ray imaging, ultrasonic systems (Andújar et al., 2011), stereo vision and depth cameras (Dal Mutto et al., 2012). The low cost and high frame rate (number of frames per second) of depth cameras are promoting their increased usage. These cameras can be divided into two groups: Time-of-Flight (ToF) cameras and structured-light cameras. Structured-light scanners are a recent development for 3D modeling. Various studies have been conducted to assess different types of 3D cameras. Paulus et al. (2014) reconstructed the volumetric shape of sugar beet taproots and their leaves in order to compare the cameras available on the market.
They showed the great potential of these cameras for plant reconstruction, phenotyping and possible automated applications, suggesting that they could replace some alternative high-cost tools. Chéné et al. (2012) also assessed the potential of these low-cost cameras in plant phenotyping, concluding that Kinect (Microsoft Inc., Redmond, Washington, USA) sensors could successfully identify plant shape. Very few studies have assessed 3D cameras for biomass estimation in woody plants. Nock et al. (2013) used structured-light cameras to scan Salix branches at different distances, showing their potential for reconstructing the branching architecture of woody plants.

New technologies are required to improve agricultural practices and to increase crop yields. One of these technologies is automatic characterization of the crop canopy. Although a full geometric characterization is now possible, obtaining a proper reconstruction is sometimes unachievable or time-intensive. Structured-light cameras, such as the Kinect sensor, could contribute to solving this problem. However, since poplars are planted in wide (3-m) rows, a full 3D reconstruction from every angle is not realistic under field conditions. In on-the-go measurements, the sensor must be fixed to a vehicle and its best position needs to be found. Thus, the objective of this research was to evaluate the ability of a Kinect sensor to estimate poplar biomass depending on the angle at which the sensor is positioned on a stationary platform.

Material and methods

Sampling system

The 3D modeling system tested consisted of a Kinect sensor and specific software for point cloud acquisition. The Kinect sensor was originally designed for gaming with the Microsoft Xbox console, but it has since been used in many other applications. The sensor is a structured-light device integrating an RGB camera, an infrared (IR) emitter and an IR depth sensor. The integrated depth sensor consists of the IR emitter combined with an IR camera, which is a CMOS sensor. The RGB camera is equipped with a 400-800 nm bandpass filter. The IR camera has an 850-1100 nm bandpass filter and captures the depth image. The system captures depth images of 640 × 480 pixels by a structured-light method at a rate of 30 frames per second. The depth range is short compared with Time-of-Flight systems: the minimum range is 800 mm and the maximum 4000 mm. Since the sensor outputs a large number of frames per second and this information is largely redundant, it can be used to automatically remove outliers from the point cloud of the 3D model. Skanect 1.5® software was used for data acquisition. This software uses the Kinect Fusion algorithms to recreate a 3D point cloud from the depth video stream.
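The acquisition software performs the conversion from raw depth frames to 3D points internally. As a minimal sketch of what that step involves, the following Python fragment back-projects a single 640 × 480 depth frame through the pinhole camera model and discards pixels outside the sensor's 800-4000 mm working range. The intrinsic parameters (FX, FY, CX, CY) are typical published values for the Kinect v1 IR camera, not calibrated values from this study, and the frame shown is synthetic.

```python
# Minimal sketch: one Kinect v1 depth frame (640 x 480, values in mm) to a
# 3D point cloud via the pinhole camera model. Intrinsics are assumed
# (commonly cited Kinect v1 values), not calibrated for this study.
import numpy as np

FX, FY = 594.2, 591.0        # focal lengths in pixels (assumed)
CX, CY = 339.5, 242.7        # principal point in pixels (assumed)
MIN_MM, MAX_MM = 800, 4000   # Kinect v1 working depth range reported above

def depth_to_points(depth_mm: np.ndarray) -> np.ndarray:
    """Project a (480, 640) depth image in millimetres to an (N, 3) array
    of 3D points in metres, keeping only pixels in the valid range."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = (depth_mm >= MIN_MM) & (depth_mm <= MAX_MM)
    z = depth_mm[valid] / 1000.0      # mm -> m
    x = (u[valid] - CX) * z / FX      # back-project along the image x axis
    y = (v[valid] - CY) * z / FY      # back-project along the image y axis
    return np.column_stack((x, y, z))

# Synthetic example; real frames would come from the Kinect driver
# (e.g. libfreenect) or be exported by the acquisition software.
frame = np.full((480, 640), 1500, dtype=np.uint16)  # flat surface at 1.5 m
points = depth_to_points(frame)
print(points.shape)  # (307200, 3) when every pixel is in range
```

Kinect Fusion additionally registers and fuses the many redundant frames mentioned above into a single cloud; that step is beyond this sketch.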
Study site and data processing

Field measurements were conducted during 2014. The poplar field was located at La Poveda Research Farm (Arganda del Rey, Central Spain, 40º 18' N, 3º 29' W, 618 m elevation). The location is characterized by a Mediterranean climate with cold winters and hot, dry summers. The field was irrigated during spring and summer, and the agronomic management was that commonly used for growing poplar in the area. Poplar cuttings were planted during spring 2014, with 0.5 m between trees within the row and 3 m between rows. At sampling time, poplar height ranged from 0.4 m to 1.5 m. A total of 20 trees were sampled.

Each poplar tree was scanned individually. Readings were obtained by positioning the sensor at four different angles: top view (0º), downwards view (45º), front view (90º) and upwards view from the ground (-45º) (Figure 1). The Kinect was mounted on a mast and oriented to the required angle. Data were acquired from a stationary location in single-shot mode, i.e. neither the sensor nor the object moved. Right after the Kinect data were recorded, the actual height of the trees was measured manually. Thereafter, the trees were cut and processed in the lab for biomass determination (dry weight basis).

Figure 1. Schematic positioning of the sensor, mounted on an ATV, with respect to the tree: 1) top view (0º); 2) downwards (45º); 3) front view (90º); and 4) upwards (-45º).

Meshes were processed off-line in the open-source software MeshLab®. This software was able to manage and plot the stored data, creating a model from the values recorded by the sensor, and its tools allow cleaning, managing and processing the obtained meshes. The meshes were processed in two steps: (1) filtering, removal of outliers and noise reduction. Individual points were removed using statistical filters; points lying more than 2 cm outside the grid were removed, and outliers the filters could not catch were eliminated after visual assessment. (2) Neighboring poplars were removed from the model; the target poplar was isolated by removing nearby plants from the bounding box occupied by the studied tree. The model height was then measured and compared with the actual poplar height. Additionally, the tree volume was measured and compared with the dry biomass.

Results and discussion

In general, the results showed that sensor location is a key factor for correct plant phenotyping. Total biomass and tree height were estimated well (correlations significant at the 99% level) from all sensor positions, indicating the potential of depth cameras for plant phenotyping (Table 1). Tree volume was also measured properly, and mean values did not differ greatly between view angles. The top view (0º) gave the poorest tree reconstruction, whereas the 45º, 90º and -45º views gave good results for the measured parameters. However, small branches were not detected (Figure 2), probably because of the small branch diameter in the top segment. Nock et al. (2013), trying to reconstruct the branching architecture of Salix plants with a similar sensor, found that the sensor accurately captured the diameter of branches >6 mm, but smaller branches remained undetectable at certain distances. The front view (90º) reconstructed most parameters of tree shape and biomass almost perfectly, and similar results were obtained for the -45º view. Thus, the use of the top view is not justified in any case, while the other three angles could be used depending on the study target: small ground robots could use -45º to obtain depth models, whereas other vehicles, such as ATVs or tractors, could use 45º or 90º, which showed better results for biomass estimation.

Table 1. Coefficients of correlation between actual plant parameters (i.e., biomass and height) and parameters estimated with the Kinect sensor from different view angles in poplar trees during the first growing cycle.

View angle       0º        45º       90º       -45º
Total biomass    0.729**   0.799**   0.802**   0.747**
Height           0.747**   0.979**   0.982**   0.944**
**Correlation significant at the P<0.01 level

Figure 2. Example of the 3D model acquired at: a) 0º angle; b) 45º angle; c) 90º angle; d) -45º angle; and e) multi-angle.
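The cleaning and measurement steps described above can be condensed into a short script. The sketch below is a NumPy/SciPy stand-in for the MeshLab workflow actually used in the study: the neighbour count k, the 2 cm distance threshold and the bounding-box limits are illustrative assumptions, and the convex hull is only one simple volume proxy, since the paper does not specify how volume was computed.

```python
# Minimal sketch of the off-line point-cloud processing: statistical outlier
# removal, bounding-box isolation of the target tree, and height/volume
# measurement. Thresholds and box limits are illustrative assumptions.
import numpy as np
from scipy.spatial import cKDTree, ConvexHull

def remove_outliers(points: np.ndarray, k: int = 8,
                    max_dist: float = 0.02) -> np.ndarray:
    """Statistical filter: drop points whose mean distance to their k
    nearest neighbours exceeds max_dist (2 cm, as in the paper)."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)  # first neighbour is the point itself
    mean_dist = dists[:, 1:].mean(axis=1)
    return points[mean_dist <= max_dist]

def crop_to_box(points: np.ndarray, lo, hi) -> np.ndarray:
    """Isolate the target tree: keep only points inside an axis-aligned
    bounding box, so neighbouring poplars fall outside."""
    mask = np.all((points >= lo) & (points <= hi), axis=1)
    return points[mask]

def tree_metrics(points: np.ndarray):
    """Model height (vertical extent, axis 2 assumed vertical once the
    cloud is levelled) and convex-hull volume as a simple volume proxy."""
    height = points[:, 2].max() - points[:, 2].min()
    volume = ConvexHull(points).volume
    return height, volume

# cloud = ...  # (N, 3) array exported from the acquisition software
# cloud = remove_outliers(cloud)
# cloud = crop_to_box(cloud, lo=(-0.5, -0.5, 0.0), hi=(0.5, 0.5, 2.0))
# print(tree_metrics(cloud))
```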
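Table 1 reports Pearson correlation coefficients flagged at the P<0.01 level. A minimal sketch of that validation step, using placeholder numbers rather than the study's data, is:

```python
# Pearson correlation between field-measured values (dry biomass or height)
# and the values estimated from one view angle, with its two-sided P value.
# The arrays are placeholders, not the study's data.
from scipy.stats import pearsonr

measured  = [0.41, 0.55, 0.62, 0.48, 0.70]   # e.g. dry biomass per tree (kg)
estimated = [0.38, 0.59, 0.60, 0.51, 0.66]   # e.g. model-derived estimate

r, p = pearsonr(measured, estimated)
print(f"r = {r:.3f}, significant at P<0.01: {p < 0.01}")
```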
Therefore, a single fixed angle cannot be defined as the best location for the sensor on a vehicle, although the front angle appears to be the best position. The most important limitation on the use of structured-light sensors in agriculture is that outdoor detection requires special illumination conditions: under very low or very high natural illumination, not enough points are collected for the point cloud and color detection is not accurate enough. In very low illumination or total darkness, Kinect detection can be improved by illuminating the crop with artificial light. Compared with the Kinect, TLS have the advantage of not being influenced by ambient light, while Kinect sensors are superior in that a 3D point cloud is obtained for each frame, unlike 2D TLS, which must be moved to obtain a 3D model. On the other hand, the precision of the Kinect is lower for the detection of thin branches. In summary, when comparing LiDAR sensors (TLS) and depth cameras, the latter have the advantage of recording 3D color point clouds in a single image and of detecting plant structure in a faster and more reliable way with very high spatial resolution. The new model launched by Microsoft (Kinect for Windows v2) should extend the advantages of this sensor with respect to TLS: it includes a high-definition (HD) color camera and uses the Time-of-Flight (ToF) principle instead of structured light, improving accuracy and penetration into plants for 3D characterization. However, none of these features will matter if the sensor is not located correctly; the location of the sensor is key to achieving good results.

Conclusions

Single shots showed the potential of structured-light sensors to collect the spatial and color information needed to characterize the three-dimensional (RGB-D) properties of crops. Validation against actual parameters showed that the front angle is one of the best locations and that the top angle leads to poorer results. The Kinect sensor provided reliable data. These sensors are relatively cheap, and their integration into real-time applications would improve crop management. However, the possibilities of on-line operation need to be explored further using the best sensor location and angle on a vehicle.

Acknowledgments

This research was funded by the Spanish CICyT (Project No. AGL2011-25243).

References

Andújar, D., Escolà, A., Dorado, J. and Fernández-Quintanilla, C. 2011. Weed discrimination using ultrasonic sensors. Weed Research 51, 543-547.
Chéné, Y., Rousseau, D., Lucidarme, P., Bertheloot, J., Caffier, V., Morel, P. et al. 2012. On the use of depth camera for 3D phenotyping of entire plants. Computers and Electronics in Agriculture 82, 122-127.
Dal Mutto, C., Zanuttigh, P. and Cortelazzo, G. M. 2012. Time-of-Flight Cameras and Microsoft Kinect. SpringerBriefs in Electrical and Computer Engineering. 108 pp.
Méndez, V., Rosell-Polo, J. R., Sanz, R., Escolà, A. and Catalán, H. 2014. Deciduous tree reconstruction algorithm based on cylinder fitting from mobile terrestrial laser scanned point clouds. Biosystems Engineering 124, 78-88.
Nock, C. A., Taugourdeau, O., Delagrange, S. and Messier, C. 2013. Assessing the potential of low-cost 3D cameras for the rapid measurement of plant woody structure. Sensors 13, 16216-16233.
Paulus, S., Behmann, J., Mahlein, A. K., Plümer, L. and Kuhlmann, H. 2014. Low-cost 3D systems: suitable tools for plant phenotyping. Sensors 14, 3001-3018.