Active Range Sensing for Indoor Environment Modeling

Shadia Elgazzar, Ramiro Liscano, François Blais, and Andrew Miles

Abstract— This paper investigates modeling indoor environments using a low-cost, compact, active range camera, known as BIRIS, mounted on a pan and tilt motor unit. The BIRIS sensor, developed at the National Research Council of Canada, is a rugged, small camera with no moving parts. The objectives of this paper are to describe and demonstrate the viability of a low-cost range sensor in the domain of indoor environment modeling; to present the results of processing three-dimensional (3-D) data to build a virtual environment for navigation and visualization; and to analyze and outline the advantages and limitations encountered when scanning large indoor environments.

Index Terms— Data acquisition, environment modeling, geometric modeling, mapping, range sensing, virtual environment, virtual reality.

Manuscript received June 1, 1997; revised April 1, 1998. S. Elgazzar, R. Liscano, and F. Blais are with the National Research Council of Canada, Ottawa, Ont., Canada K1A 0R6. A. Miles is with Carleton University, Ottawa, Ont., Canada K1S 5B6.

I. INTRODUCTION

The long-term objective of this research is to build the necessary tools and to develop the required algorithms to model indoor environments. The first step toward achieving this objective is to assemble, build, and/or develop the necessary hardware and software tools for data acquisition and model construction. Data acquisition implies the use of a sensor that can provide reasonable accuracy for the application and that can give good coverage of the environment in one scan. A compact active range camera, known as BIRIS, is used for this purpose [1], [2]. This is one of the first attempts to use BIRIS for the acquisition and modeling of data from indoor environments.

Research in the domain of modeling using three-dimensional (3-D) data has primarily focused on the extraction of 3-D surfaces and volumetric primitives for the purpose of either object recognition or creating more precise models from 3-D sensory data of machined parts. These types of objects can easily be carried and placed in a controlled environment and scanned using a high-resolution range sensor. This is significantly different from modeling large indoor environments, where it is necessary to bring the sensor to the environment, changing the characteristics of the sensed data dramatically. Because of the larger domain in which the sensor is operating, research on the modeling of indoor environments has primarily focused on the incremental synthesis of sensor views and/or position estimation of the sensor using either 3-D active sensing or stereo vision.

In modeling large environments, the need for detail diminishes and the challenge becomes one of trying to extract from the sparse sensory data an overall concept of the shape and size of the structures within the environment. Previous attempts in this domain have integrated intensity data with range data to help define the boundaries of surfaces extracted from the 3-D data, and then used a set of heuristics to decide which surfaces should be joined.
For this application, it becomes necessary to develop algorithms that can hypothesize the existence of surface continuity and intersections among surfaces [3] and the formation of more composite features from the surfaces [4].

The paper starts with a description of the BIRIS scanner and its data acquisition and scanning modes. The processing of the data is then described, and the results are presented as a set of planar surfaces, represented as polygons, in 3-D space.

II. DESCRIPTION OF THE BIRIS RANGE SCANNER

BIRIS uses active laser triangulation techniques which rely on sophisticated processing to extract precise range data. The BIRIS head [1], [2] uses a standard CCD video camera, a laser line projector, and a modified lens (Fig. 1). A double-aperture mask introduced inside a conventional camera lens (Bi-IRIS) produces two distinct intensity peaks, p1 and p2, on the CCD sensor for a single target point illuminated on the object surface along the projected laser line. Both the center position of the laser points, (p1 + p2)/2, and their separation, d = p2 - p1, are used to calculate the distance of the camera to the object. The processing software uses the distance separating the laser projector and the lens as the base for triangulation. The separation d is also used to calculate the range, but mainly for the purpose of validation. The intensity data, obtained by measuring the amount of laser light reflected back to the CCD, is in perfect registration with the range data. Real-time processing provides an exact 3-D profile of the object at the speed of the CCD camera.

Fig. 1. Triangulation using BIRIS.

Fig. 2. Immunity to ambient illumination.

This arrangement offers a number of advantages: high immunity to ambient illumination and background interference, due to the validation mechanism illustrated in Fig. 2; compactness and reliability; sub-pixel resolution, obtained by matching the shape of the light intensity on the CCD array to determine the position of the reflection; speed, with 3-D profiles generated at the frame rate of the CCD camera (256 points/profile, 60 profiles/s); registered intensity and range images, which give complementary information for image interpretation; and, finally, low-cost implementation. Furthermore, BIRIS allows flexible configurations. It is possible to customize the sensor to adapt it to the required field of view, distance, resolution, and speed, and to have multiple heads with overlapping fields of view. The maximum range attainable is limited mainly by the error of the measurement at that range and by the laser power. The maximum rate of acquisition is defined mainly by the frame transfer rate of the CCD camera and the application software. The prototype used in this research has a range of 0.5–4 m with an accuracy (rms) of 1 mm at 1.2 m and 2 mm at 2 m.

The output of the camera is two single-column arrays of registered range and intensity data. To obtain a system that covers larger views of the environment, the camera was mounted on a pan and tilt unit (PTU) and interfaced to the BIRIS software. The camera is rotated as data are acquired by the sensor, resulting in two perfectly registered images of range and intensity. A simple experiment was set up in order to test the PTU controller's acceleration and velocity capabilities; Fig. 3 displays the resulting intensity data. Processing the data representing the targets' positions showed good matching velocity and acceleration results.

Fig. 3. Intensity data for the experimental setup.
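To make the triangulation geometry concrete, the following minimal sketch recovers a range estimate from the two peak positions and checks it against the estimate implied by the peak-pair center, in the spirit of the validation mechanism described above. Every constant and name here is an illustrative assumption for the sketch, not a parameter of the actual BIRIS software.

```python
# Illustrative double-aperture triangulation in the spirit of Fig. 1.
# All constants below are assumed values, not calibrated BIRIS parameters.
FOCAL_LENGTH_MM = 16.0      # assumed lens focal length
APERTURE_SPACING_MM = 10.0  # assumed spacing of the two mask apertures
BASELINE_MM = 100.0         # assumed projector-to-lens separation
PIXEL_PITCH_MM = 0.01       # assumed CCD pixel pitch

def range_from_peaks(p1_px, p2_px, tol_mm=20.0):
    """Estimate range (mm) from the two intensity peaks of one laser point.

    The peak separation acts like a stereo disparity across the two
    apertures (z ~ f * s / d), while the peak-pair center gives an
    independent estimate via conventional projector-camera triangulation.
    A sample is kept only when the two estimates agree. Pixel coordinates
    are assumed to be measured from the image of the projector axis.
    """
    d_mm = abs(p2_px - p1_px) * PIXEL_PITCH_MM       # peak separation
    c_mm = 0.5 * (p1_px + p2_px) * PIXEL_PITCH_MM    # peak-pair center
    if d_mm <= 0.0 or c_mm <= 0.0:
        return None                                  # degenerate measurement
    z_sep = FOCAL_LENGTH_MM * APERTURE_SPACING_MM / d_mm
    z_tri = FOCAL_LENGTH_MM * BASELINE_MM / c_mm
    return z_tri if abs(z_tri - z_sep) <= tol_mm else None  # validate

print(range_from_peaks(120.0, 132.5))  # ~1267 mm, inside the 0.5-4 m range
```

The double estimate is what drives the immunity claimed above: a spurious reflection from ambient light or background interference is unlikely to produce two peaks whose separation and center agree on the same range.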
III. DATA ACQUISITION

To scan a large environment, two modes of scanning were tested. In the first, the camera system was kept in the same position and was rotated θ degrees for every tilt angle, where θ is a user-defined pan angle. The second mode consisted of a set of scans taken from different positions; this is necessary when taking different viewpoints of the same scene so that occluded views are captured. In both cases, the camera position is saved with the corresponding data. The camera position is obtained either from the odometry data (of the mobile vehicle carrying the camera) or by photogrammetric methods. The results presented in this paper give one example using odometry.

The tilt angle is incremented according to the breadth angle of the camera. By breadth angle we mean the interior angle of the plane of light. To determine the breadth angle of the BIRIS scanner, a ruler larger than the field of view, positioned along the axis of the plane of light, was scanned. A set of reflectors was attached to the ruler at equal intervals. From the pixel positions of the top and bottom markers and the known spacing between the reflectors, the distance per pixel was calculated to be 2.1739 mm/pixel, which is equivalent to a breadth angle of 17.448°.

IV. MODEL CONSTRUCTION

Two primary steps are followed to obtain a model of the environment: surface extraction, where a sequence of algorithms is applied to the range data to find surfaces and their location in 3-D space, and surface registration, where surfaces are manipulated to obtain a full 3-D model that can be displayed. For surface extraction, the data was processed as follows.

Raw Filtering: This is the basic preprocessing applied to the acquired data. The algorithm starts by transforming the data from a cylindrical coordinate system to a world coordinate system and recording the position of the scan. Then, it checks all range acquisitions and removes the ones that are above a certain user-specified range (usually ranges greater than the calibrated maximum range). The filtering algorithm also removes stray pixels from the image.
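As a rough illustration of this preprocessing (a minimal sketch, not the authors' implementation), the functions below map one pan/tilt range sample to world coordinates and drop out-of-range or stray samples. The paper describes a cylindrical-to-world transform; a spherical form matching the pan/tilt geometry is sketched here, and the axis conventions, the 4 m cutoff, and the stray-pixel rule are all assumptions for the example.

```python
import math

MAX_RANGE_M = 4.0  # calibrated maximum range of the prototype (0.5-4 m)

def to_world(pan_deg, tilt_deg, range_m, scan_pos=(0.0, 0.0, 0.0)):
    """Map a (pan, tilt, range) sample to world (x, y, z) in meters.

    Samples beyond the calibrated maximum range are rejected, mirroring
    the user-specified range cutoff described above. The conventions
    (z up, pan about the vertical, tilt above the horizon) are assumed.
    """
    if not 0.0 < range_m <= MAX_RANGE_M:
        return None  # out-of-range acquisition: remove it
    pan, tilt = math.radians(pan_deg), math.radians(tilt_deg)
    return (scan_pos[0] + range_m * math.cos(tilt) * math.cos(pan),
            scan_pos[1] + range_m * math.cos(tilt) * math.sin(pan),
            scan_pos[2] + range_m * math.sin(tilt))

def remove_stray(profile, jump_m=0.25):
    """Null out isolated range spikes along one scan profile (assumed rule:
    a sample that disagrees with both neighbors by more than jump_m)."""
    out = list(profile)
    for i in range(1, len(profile) - 1):
        if (abs(profile[i] - profile[i - 1]) > jump_m and
                abs(profile[i] - profile[i + 1]) > jump_m):
            out[i] = None  # stray pixel
    return out
```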
Data Segmentation: Segmentation, which is applied to the 3-D data directly, removes superfluous data and groups the remaining points into regions. The algorithm used on our data is the one published in [5], [6]. The results from this algorithm are: a label map (region and intensity information) of planar surfaces, shown in Fig. 4 for the data depicted in Fig. 3; the parametric equations defining the regions in space; the number of points in each region; its center of mass; the covariance matrix; the invariant coefficients; the approximation error; and a list of adjoining neighbors. A total of 30 regions were extracted. In some particular situations, the regions contain small holes, which are removed by applying dilation and erosion morphological algorithms.

Fig. 4. Label map, after segmentation, of Fig. 3.

Boundary Extraction: To complete the representation of the surfaces, an edge tracking algorithm is applied, in two dimensions (2-D), to each of the surfaces depicted in the labeled image (Fig. 4) so as to define a boundary corresponding to each particular surface (Fig. 5). The edge tracking algorithm is an extension of the one developed in [7] in that it computes an estimate of the curvature of the edge while tracking the boundary of a surface. The curvature along the edge is computed as a difference of running averages of the gradient values along the boundary. Currently, this filter uses the average of three pixel gradient values and appears able to filter out the majority of large changes in the gradient values, which are due primarily to the discretization of the image.

Fig. 5. Boundaries for surfaces in Fig. 4.

The high-curvature points are used to define polygons that represent the 3-D surfaces. Any sequential set of high-curvature points is replaced by a straight line defined by the first and last high-curvature points in that segment. The result is a set of polygons whose corners are high-curvature points connected by straight lines or low-curvature edges. This procedure can be justified by the fact that, in most cases, these sets of points are associated with fictitious boundaries where the returned signal is not strong enough to be registered. The primary reason for applying this procedure is to reduce the number of points required to represent the boundaries (sometimes by as much as a factor of ten), which leads to improved performance in visualization and in reasoning among the surfaces.
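A minimal sketch of this simplification is given below: the curvature estimate is a difference of running averages of the boundary gradient values (three-value window, as in the text), and runs of consecutive high-curvature points collapse to their endpoints. The threshold, the data layout, and the function names are assumptions for the example, not the authors' implementation.

```python
def edge_curvature(grad, w=3):
    """Curvature along a tracked boundary as the difference of running
    averages of the gradient values (three-value window, as described)."""
    curv = [0.0] * len(grad)
    for i in range(w, len(grad) - w):
        ahead = sum(grad[i + 1:i + 1 + w]) / w
        behind = sum(grad[i - w:i]) / w
        curv[i] = abs(ahead - behind)
    return curv

def polygon_corners(points, curv, thresh=0.5):
    """Keep high-curvature points as polygon corners, collapsing each
    consecutive run to its first and last members so the whole run is
    replaced by one straight segment (thresh is an assumed cutoff)."""
    corners, i, n = [], 0, len(points)
    while i < n:
        if curv[i] > thresh:
            j = i
            while j + 1 < n and curv[j + 1] > thresh:
                j += 1                     # extend the high-curvature run
            corners.append(points[i])      # run start
            if j > i:
                corners.append(points[j])  # run end: straight line between
            i = j + 1
        else:
            i += 1                         # low-curvature point: not a corner
    return corners
```

Dropping everything but the corner points is what yields the reduction in boundary points (up to roughly a factor of ten) mentioned above.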
V. RESULTS

In this section, results of processing 3-D data to build a virtual environment for navigation and visualization are presented using figures and images. The display of the results is composed of a set of planar surfaces, represented as polygons, in 3-D space. These polygons are defined using the Virtual Reality Modeling Language (VRML).

A setup in the laboratory that included different partitions and many objects was scanned from three positions 120° apart, covering a full 360° pan. Each pan position had a corresponding series of five tilt scans, for a total of 15 images. Each image consists of 256 × 512 data points. Each set of pan scans ranged from -70° to +70°, thus overlapping adjacent images by 10°. A few degrees of overlap (2° to 3°) also existed between most of the tilt scans. The intensity image constructed from a single pan position and five tilt positions (five intensity images in total) is shown in Fig. 6. Since the PTU is fairly accurate relative to the sensory data and no translational movement of the sensor occurs, it is not necessary to apply any registration algorithms to the multiple images.

Fig. 6. Five intensity images representing one pan position.

The five range images corresponding to the intensity images of Fig. 6 were processed (i.e., surfaces extracted), manually registered, and formatted in VRML. Fig. 7 displays the processed 3-D image from a point of view that shows the part of the ceiling that was scanned by the camera; the back of the floor can also be seen as the dark surfaces on the bottom. Similarly, a set of range images formed from the three pan positions at the middle tilt scan was processed, registered, and formatted in VRML (Fig. 8). The polygons in Figs. 7 and 8 were intentionally left at different levels of gray to better display the results to the reader.

Fig. 7. Polygonal representation of five (vertically) registered images showing the ceiling and the back of the floor.

Fig. 8. Polygonal representation of three (horizontally) registered images showing the layout of the lab.

VI. DISCUSSION AND CONCLUSION

BIRIS, as a low-cost, medium-accuracy, real-time range sensor, was found to be viable for modeling indoor environments. The real challenge in indoor environment modeling is in reducing the amount of information that the sensor detects by extracting the key features in the sensory data and grouping the surfaces into larger surfaces with less detail. Figs. 7 and 8 are examples of surfaces that have not been grouped together. For an environment modeler to become a viable tool for computer-aided design, it is necessary to develop approaches that hypothesize the formation of more composite features from the surfaces. At a minimum, it is essential to determine some measure of proximity between nonadjoining surfaces [3] and the possibility of surface-to-surface intersection so that a better approximation of the surface boundaries can be performed. Also, in environments that are cluttered with objects, it is important to hypothesize on the continuation of surfaces. Previous attempts in this domain have used a set of heuristics to decide which surfaces should be joined. In most circumstances, these heuristics are a set of rules with predefined thresholds that determine if the surfaces should be joined. Currently, we are investigating the use of Bayesian networks [4] to manage the uncertainty associated with such decisions; a sketch contrasting the two approaches follows below. A Bayesian network offers a unified approach to the specification of relationships among surfaces, as well as a method for computing a belief value in the existence of a compound feature given the evidence from the sensory data.

Scanning large environments in 3-D allows their representation in a manageable number of images, but this also has its limitations. Since the accuracy of the sensor degrades with distance, the resolution of the acquired data, when scanning a large environment, will differ substantially within a single image and from one image to another. One of the technical problems that still needs to be solved satisfactorily is how to fuse multiresolution data from different views. Although some manual and semi-automatic solutions exist for multiview registration, robust solutions are needed for multiview registration in the presence of sensory error.
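To make the contrast between fixed-threshold heuristics and belief-based grouping concrete, here is a caricature of each. Neither is the authors' actual rule set or the Bayesian network of [4]; all names, thresholds, and probabilities are invented for illustration.

```python
def should_join(gap_m, normal_angle_deg, gap_max=0.05, angle_max=5.0):
    """Fixed-threshold heuristic: join two planar surfaces when their
    boundary gap and normal misalignment are both small (toy values)."""
    return gap_m < gap_max and normal_angle_deg < angle_max

def join_belief(likelihoods_join, likelihoods_split, prior=0.5):
    """Naive Bayes-style belief that two surfaces form one compound
    feature, combining per-evidence likelihoods; a simplified stand-in
    for the richer relationships a Bayesian network can express."""
    j, s = prior, 1.0 - prior
    for lj, ls in zip(likelihoods_join, likelihoods_split):
        j *= lj   # evidence likelihood under the "join" hypothesis
        s *= ls   # evidence likelihood under the "keep separate" hypothesis
    return j / (j + s) if (j + s) > 0.0 else prior
```

Where the threshold rule returns a brittle yes/no, the belief form degrades gracefully with noisy evidence, which is the motivation given above for managing uncertainty explicitly.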
REFERENCES

[1] F. Blais, M. Rioux, and J. Domey, "Optical range image acquisition for the navigation of a mobile robot," in Proc. IEEE Int. Conf. Robotics and Automation, Sacramento, CA, Apr. 9–11, 1991, vol. 3, pp. 2574–2580.
[2] F. Blais, M. Lecavalier, and J. Bisson, "Real-time processing and validation of optical ranging in a cluttered environment," in Proc. Int. Conf. Signal Processing Applications and Technology, Boston, MA, Oct. 7–10, 1996, vol. 2, pp. 1066–1070.
[3] R. Liscano, S. Elgazzar, and A. K. C. Wong, "A proximity compatibility function among 3-D surfaces for environment modeling," in Proc. IASTED 5th Int. Conf. Robotics and Manufacturing, Cancun, Mexico, May 29–31, 1997.
[4] R. Liscano, S. Elgazzar, and A. K. C. Wong, "Use of belief networks for modeling indoor environments," in Proc. Vision Interface '97, Kelowna, B.C., Canada, May 19–23, 1997.
[5] P. Boulanger and F. Blais, "Range image segmentation, free space determination, and position estimate for a mobile vehicle," in Proc. SPIE Mobile Robots VII, Boston, MA, Nov. 18–20, 1992, vol. 1831, pp. 444–455.
[6] P. Boulanger and P. Cohen, "Viewpoint invariant computation of surface curvatures in range images," in Proc. Vision Interface '94, Banff, Alta., Canada, May 16–20, 1994, pp. 145–154.
[7] Q. Gao and A. K. C. Wong, "Curve detection based on perceptual organization," Pattern Recognit., vol. 26, no. 7, pp. 1039–1046, 1993.

Shadia Elgazzar received the Ph.D. degree in electrical engineering from the University of Manitoba, Winnipeg, Man., Canada, in 1981, in the area of optimal control.
She is a Senior Research Officer in the Visual Information Technology Group of the Institute for Information Technology at the National Research Council of Canada, Ottawa, Ont., which she joined in 1978. Since then, most of her activities have been in the areas of kinematics, multiprocessor architectures for robot controllers, and digital signal processing. Her present interests are in 3-D vision and the visual aspects of sensor-based robot control.

Ramiro Liscano received the B.Sc. degree in mechanical engineering from the University of New Brunswick, Fredericton, N.B., Canada, in 1982, the M.Sc. degree in mechanical engineering from the University of Rhode Island, Kingston, in 1984, and the Ph.D. degree in systems design engineering from the University of Waterloo, Waterloo, Ont., Canada, in 1998. His thesis research topic is the grouping and management of uncertainty in 3-D sensory data for environment modeling.
Since 1984, he has been a Researcher at the National Research Council of Canada, Ottawa, Ont., working on the control of manipulators using visual feedback, calibration and dynamic programming of robot manipulators, environment sensing and modeling, autonomous navigation, real-time computing, and intelligent system design.

François Blais received the M.Sc. degree from Laval University, Québec City, P.Q., Canada.
He is a Senior Research Officer at the Institute for Information Technology of the National Research Council of Canada. He joined the National Research Council, Ottawa, Ont., in 1984, and has since been involved in the development of various 3-D range sensor technologies and applications. His topics of interest cover various fields in digital signal and image processing, control, optics, 3-D vision, and their applications.

Andrew Miles received the B.Sc. degree in mathematics from Carleton University, Ottawa, Ont., Canada, in February 1994, and graduated from the School of Computer Science, Carleton University, in June 1997, with a specialization in scientific computing.
He is currently the Network Administrator at Carleton University and is pursuing graduate work in the area of 3-D graphics.