Active Range Sensing for
Indoor Environment Modeling
Shadia Elgazzar, Ramiro Liscano, François Blais, and Andrew Miles
Abstract— This paper investigates modeling indoor environments using a low-cost, compact, active-range camera, known
as BIRIS, mounted onto a pan and tilt motor unit. The BIRIS
sensor, developed at the National Research Council of Canada, is
a rugged small camera with no moving parts. The objectives of
this paper are to describe and demonstrate the viability of using a low-cost range sensor in the domain of indoor environment
modeling; to present the results of processing three-dimensional
(3-D) data to build a virtual environment for navigation and
visualization; and, to analyze and outline the advantages and limitations encountered when scanning large indoor environments.
Index Terms— Data acquisition, environment modeling, geometric modeling, mapping, range sensing, virtual environment,
virtual reality.
I. INTRODUCTION
The long-term objective of this research is to build the
necessary tools and to develop the required algorithms
to model indoor environments. The first step toward achieving this objective is to assemble, build, and/or develop the
necessary hardware and software tools for data acquisition
and model construction. Data acquisition implies the use of a
sensor that can provide reasonable accuracy for the application
and which can give a good coverage of the environment in one
scan. A compact active range camera, known as BIRIS, is used
for this purpose [1], [2]. This is one of the first attempts to use
BIRIS for the acquisition and modeling of data from indoor
environments.
Research in the domain of modeling using three-dimensional (3-D) data has primarily focused on the
extraction of 3-D surfaces and volumetric primitives for
the purpose of either object recognition or creating more
precise models from 3-D sensory data of machined parts.
These types of objects can easily be carried and placed in a
controlled environment and scanned using a high resolution
range sensor. This is significantly different from modeling
large indoor environments where it is necessary to bring the
sensor to the environment, changing the characteristics of the
sensed data dramatically. Because of the larger domain in
which the sensor is operating, research on the modeling of
indoor environments has primarily focused on the incremental
synthesis of sensor views and/or position estimation of the
sensor using either 3-D active sensing or stereo vision.
Manuscript received June 1, 1997; revised April 1, 1998.
S. Elgazzar, R. Liscano, and F. Blais are with the National Research Council
of Canada, Ottawa, Ont., Canada K1A 0R6.
A. Miles is with Carleton University, Ottawa, Ont. Canada K1S 5B6.
Publisher Item Identifier S 0018-9456(98)05455-2.
In modeling large environments, the need for detail diminishes and the challenge becomes one of trying to extract from
the sparse sensory data an overall concept of shape and size
of the structures within the environment. Previous attempts in
this domain have integrated intensity data with range data to
help define the boundaries of surfaces extracted from the 3-D
data, and then used a set of heuristics to decide what surfaces
should be joined. For this application, it becomes necessary
to develop algorithms that can hypothesize the existence of
surface continuity and intersections among surfaces [3] and the
formation of more composite features from the surfaces [4].
The paper starts with a description of the BIRIS scanner
and its data acquisition and scanning modes. The processing
of the data is described and the results are presented as a set
of planar surfaces, represented as polygons, in 3-D space.
II. DESCRIPTION OF THE BIRIS RANGE SCANNER
BIRIS uses active laser triangulation techniques which rely
on sophisticated processing to extract precise range data. The
BIRIS head [1], [2] uses a standard CCD video camera, a
laser line projector, and a modified lens (Fig. 1). A double
aperture mask introduced inside a conventional camera lens
(Bi-IRIS) produces two distinct intensity peaks, $p_1$ and $p_2$,
on the CCD sensor, of a single target point illuminated on
the object surface, along the projected laser line. Both the
center position of the laser points, $(p_1 + p_2)/2$, and their
separation, $p_2 - p_1$, are used to calculate the distance
of the camera to the object. The processing software uses
the distance separating the laser projector and the center point
$(p_1 + p_2)/2$ as the base for triangulation. The separation is also used to
calculate the range, but mainly for the purpose of validation. In
perfect registration with the range data is the intensity data, which
is obtained by measuring the amount of laser light reflected
back to the CCD. Real-time processing provides an exact 3-D
profile of the object at the speed of the CCD camera.

Fig. 1. Triangulation using BIRIS.

Fig. 2. Immunity to ambient illumination.

Fig. 3. Intensity data for the experimental setup.
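To make the geometry concrete: for an idealized two-aperture model with aperture baseline $b$, effective focal length $f$, and lens focus distance $z_0$ (symbols introduced here for illustration; the paper does not state the exact relation), the peak separation on the CCD varies linearly with inverse range,

\[ p_2 - p_1 = f\,b\left(\frac{1}{z} - \frac{1}{z_0}\right), \]

so that inverting this calibrated relation yields a range estimate $z$ from the separation, which can be checked against the primary triangulation from the laser-projector baseline for validation.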
This arrangement offers a number of advantages: high
immunity to ambient illumination and background interference, due to the
validation mechanism illustrated in Fig. 2; compactness and reliability; sub-pixel resolution, achieved by matching the
shape of the light intensity on the CCD array to determine the
position of the reflection; speed, where 3-D profiles are generated at the frame rate of the CCD camera (256 points/profile,
60 profiles/s); registered intensity and range images, which
give complementary information for image interpretation; and,
finally, low-cost implementation. Furthermore, BIRIS allows
flexible configurations. It is possible to customize the sensor
to adapt it to the required field of view, distance, resolution,
speed, and to have multiple heads with overlapping fields of
view. The maximum range attainable is limited mainly by the
error of the measurement at that range and the laser power. The
maximum rate of acquisition is mainly defined by the frame
transfer rate of the CCD camera and the application software.
The prototype used in this research has a range of 0.5–4 m
with an accuracy (rms) of 1 mm @ 1.2 m and 2 mm @ 2 m.
As mentioned above, the output of the camera is two
single-column arrays of registered range and intensity data. To
obtain a system that covers larger views of the environment,
the camera was mounted on a pan and tilt unit (PTU) and
interfaced to the BIRIS software. The camera is rotated as
data are acquired by the sensor, resulting in two perfectly
registered images of range and intensity. A simple experiment
was set up in order to test the PTU controller’s acceleration
and velocity capabilities; Fig. 3 displays the resulting intensity
data. Processing the data representing the targets’ positions
showed that the PTU’s velocity and acceleration closely matched the commanded values.
III. DATA ACQUISITION
To scan a large environment, two modes of scanning were
tested. In the first, the camera system was kept in the same
position and was rotated $\theta$ degrees for every tilt angle, where
$\theta$ is a user-defined pan angle. The second mode consisted of
a set of scans taken from different positions; this is necessary
when taking different viewpoints of the same scene so that
occluded views are captured. In both cases, the camera position
is saved with the corresponding data. The camera position
is obtained either from the odometry data (of the mobile
vehicle carrying the camera) or by photogrammetric methods.
The results presented in this paper give one example using
odometry.
The tilt angle is incremented according to the breadth angle
of the camera. By breadth angle we mean the interior angle
of the plane of light. To determine the breadth angle of the
BIRIS scanner, a ruler, larger than the field of view and
positioned along the axis of the laser plane, was scanned. Along the ruler, a
set of reflectors was attached at equal intervals. From the
pixel positions of the top and bottom markers and the value
of the spacing between the reflectors, the distance/pixel was
calculated to be 2.1739 mm/pixel. This is equivalent to a
breadth angle of 17.448°.
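As a sketch of this conversion (a pinhole model and the symbols below are assumptions made here; the paper reports only the final figures), if $s$ is the measured spacing per pixel at scan distance $d$ and $N$ is the number of pixels spanned along the laser line, the breadth angle $\theta$ follows from

\[ \theta = 2\arctan\!\left(\frac{N s}{2 d}\right), \]

which, with the measured $s = 2.1739$ mm/pixel and the standoff distance used during calibration, yields the reported 17.448°.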
IV. MODEL CONSTRUCTION
Two primary steps are followed to obtain a model of
the environment: surface extraction where a sequence of
algorithms is applied to the range data to find surfaces and
their location in 3-D space, and surface registration where
surfaces are manipulated to obtain a full 3-D model that can
be displayed. For surface extraction, the data was processed
as follows:
Raw Filtering: This is the basic preprocessing applied to
the acquired data. The algorithm starts by transforming the
data from a cylindrical coordinate system to a world coordinate
system and recording the position of the scan. Then, it checks
all range acquisitions and removes those that are above
a certain user-specified range (usually ranges greater than
the calibrated maximum range). The filtering algorithm also
removes stray pixels from the image.
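A minimal sketch of this step, in Python, is given below. The per-sample layout (pan and tilt angles plus a measured range) and the spherical-style conversion are assumptions made here for illustration, since the exact cylindrical convention of the BIRIS software is not given; stray-pixel removal is omitted.

    import numpy as np

    def raw_filter(pan, tilt, rng, scan_position, max_range=4.0):
        """Convert (pan, tilt, range) samples to world coordinates and
        drop readings beyond the calibrated maximum range."""
        pan, tilt, rng = map(np.asarray, (pan, tilt, rng))
        keep = rng <= max_range                   # discard out-of-range readings
        pan, tilt, rng = pan[keep], tilt[keep], rng[keep]
        x = rng * np.cos(tilt) * np.cos(pan)      # angles in radians
        y = rng * np.cos(tilt) * np.sin(pan)
        z = rng * np.sin(tilt)
        return np.column_stack((x, y, z)) + np.asarray(scan_position)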
Data Segmentation: Segmentation, which is applied to the
3-D data directly, removes superfluous data and groups them
into regions. The algorithm used on our data is the one
published in [5], [6]. The results from this algorithm are: a
label map (region and intensity information) of planar surfaces,
shown in Fig. 4 for the data depicted in Fig. 3; the parametric
equations defining the regions in space; the number of points
in the region; its center of mass; the covariance matrix; the
invariant coefficients; the approximation error; and a list of
adjoining neighbors. A total of 30 regions were extracted. In
some particular situations, the regions contain small holes; these
are removed by applying dilation and erosion morphological
operations.

Fig. 4. Label map, after segmentation, of Fig. 3.

Fig. 5. Boundaries for surfaces in Fig. 4.

Fig. 6. Five intensity images representing one pan position.
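The hole removal just described can be sketched with standard morphological operators; the 3 × 3 structuring element is an assumption made here, and the authors' implementation may differ.

    import numpy as np
    from scipy import ndimage

    def fill_region_holes(label_map, label, size=3):
        """Remove small holes inside one labeled region by a morphological
        closing (dilation followed by erosion)."""
        region = label_map == label
        structure = np.ones((size, size), dtype=bool)  # assumed 3x3 element
        closed = ndimage.binary_closing(region, structure=structure)
        out = label_map.copy()
        out[closed] = label                            # absorb the filled holes
        return out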
Boundary Extraction: To complete the representation of
the surfaces, an edge tracking algorithm is applied, in two dimensions (2-D), to each of the surfaces depicted in the
labeled image (Fig. 4) so as to define a boundary corresponding to a particular surface (Fig. 5). The edge tracking
algorithm is an extension of the one developed in [7] in that
it computes an estimate of the curvature of the edge while
tracking the boundary of a surface. The curvature along the
edge is computed as a difference of running averages of the
gradient values along the boundary. Currently this filter uses
the average of three pixel gradient values and appears to be
able to filter the majority of large changes in the gradient
values, which are due primarily to the discretization of the
image.
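A minimal sketch of this estimate follows, assuming the boundary is available as an ordered chain of pixel coordinates; the three-sample window matches the averaging described above, but the exact filter extended from [7] is not reproduced.

    import numpy as np

    def boundary_curvature(chain, window=3):
        """Curvature estimate along a closed boundary chain: the difference
        of running averages of the edge direction taken just ahead of and
        just behind each point."""
        chain = np.asarray(chain, dtype=float)        # (n, 2) pixel coordinates
        steps = np.roll(chain, -1, axis=0) - chain    # steps between neighbors
        theta = np.arctan2(steps[:, 1], steps[:, 0])  # direction of each step
        n = len(theta)
        curv = np.empty(n)
        for i in range(n):
            # Averaging raw angles ignores the +/-pi cut; adequate for
            # smooth chains and short windows.
            ahead = np.mean([theta[(i + k) % n] for k in range(1, window + 1)])
            behind = np.mean([theta[(i - k) % n] for k in range(1, window + 1)])
            diff = ahead - behind
            curv[i] = (diff + np.pi) % (2 * np.pi) - np.pi  # wrap to (-pi, pi]
        return curv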
The high curvature points are used to define polygons
that represent the 3-D surfaces. Any sequential set of high
curvature points is replaced by a straight line defined by
the first and last high curvature points in that segment. The
result is a set of polygons whose corners are high curvature
points connected by straight lines or low curvature edges.
This procedure can be justified by the fact that in most cases
these sets of points are associated with fictitious boundaries
where the returned signal is not strong enough to be registered.
The primary reason for applying this procedure is to reduce
the number of points required to represent the boundaries
(sometimes by as much as a factor of ten). This leads to an
improved performance in visualization and reasoning among
the surfaces.
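The reduction can be sketched as follows; the curvature threshold is an assumed parameter, and the returned list holds the polygon corners only (the low-curvature edges between them are kept implicitly).

    import numpy as np

    def polygon_corners(chain, curvature, thresh=0.5):
        """Collapse each consecutive run of high-curvature boundary points
        to its first and last members, which become polygon corners joined
        by straight lines."""
        high = np.abs(np.asarray(curvature)) > thresh  # assumed threshold
        corners, i, n = [], 0, len(chain)
        while i < n:
            if high[i]:
                j = i
                while j + 1 < n and high[j + 1]:       # extend the run
                    j += 1
                corners.append(chain[i])               # first point of the run
                if j > i:
                    corners.append(chain[j])           # last point of the run
                i = j + 1
            else:
                i += 1
        return corners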
V. RESULTS
In this section, results of processing 3-D data to build a
virtual environment for navigation and visualization are presented using figures and images. The display of the results is
composed of a set of planar surfaces, represented as polygons,
in 3-D space. These polygons are defined using the Virtual Reality Modeling Language (VRML).
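As an illustration of this output stage, the sketch below writes a set of planar polygons as a single VRML IndexedFaceSet; VRML 2.0 syntax is chosen here for concreteness and is not necessarily what the authors' formatter produced.

    def write_vrml(path, polygons):
        """Write planar 3-D polygons (lists of (x, y, z) vertex tuples) as
        one VRML 2.0 IndexedFaceSet; -1 terminates each face's index list."""
        points, faces, offset = [], [], 0
        for poly in polygons:
            points.extend(poly)
            faces.extend(list(range(offset, offset + len(poly))) + [-1])
            offset += len(poly)
        with open(path, "w") as f:
            f.write("#VRML V2.0 utf8\n")
            f.write("Shape {\n  geometry IndexedFaceSet {\n")
            f.write("    coord Coordinate { point [\n")
            f.write(",\n".join("      %g %g %g" % p for p in points))
            f.write("\n    ] }\n")
            f.write("    coordIndex [ %s ]\n" % ", ".join(str(i) for i in faces))
            f.write("  }\n}\n")

    # e.g., write_vrml("lab.wrl", [[(0, 0, 0), (1, 0, 0), (1, 1, 0)]])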
A setup in the laboratory that included different partitions
and many objects was scanned from three positions 120° apart,
covering a 360° pan. Each pan position had a corresponding
series of five tilt scans, totaling 15 images. Each image
consists of 256 × 512 data points. Each set of pan scans ranged
from -70° to 70°, thus overlapping adjacent images by 10°.
A few degrees of overlap (2° to 3°) also existed between most
of the tilt scans. The intensity image constructed from a single
pan position and five tilt positions (five intensity images total)
is shown in Fig. 6. Since the PTU is fairly accurate relative to
the sensory data and no translational movement of the sensor
occurs, it is not necessary to apply any registration algorithms
to the multiple images.
The five range images corresponding to the intensity images
of Fig. 6 were processed (i.e., surfaces extracted), manually
registered, and formatted in VRML. Fig. 7 displays the processed 3-D image from a point of view that shows the part of
the ceiling that was scanned by the camera; the back of the
floor can also be seen as the dark surfaces on the bottom.

Fig. 7. Polygonal representation of five (vertically) registered images showing the ceiling and the back of the floor.

Similarly, a set of range images formed of the three pan
positions for the middle tilt scan was processed, registered,
and formatted in VRML (Fig. 8). The polygons in Figs. 7 and
8 were intentionally left at different levels of gray to better
display the results to the reader.

Fig. 8. Polygonal representation of three (horizontally) registered images showing the layout of the lab.
VI. DISCUSSION AND CONCLUSION
BIRIS, a low-cost, medium-accuracy, real-time range
sensor, was found to be viable for modeling indoor
environments. The real challenge in indoor environment modeling is in reducing the amount of information that the sensor
detects by extracting the key features in the sensory data and
grouping the surfaces into larger surfaces with less details.
Figs. 7 and 8 are examples of surfaces that have not been
grouped together. For an environment modeler to become
a viable tool for computer aided design, it is necessary to
develop approaches that hypothesize the formation of more
composite features from the surfaces. At a minimum, it is
essential to determine some measure of proximity between
nonadjoining surfaces [3] and the possibility of surface-to-surface intersection so that a better approximation of the
surface boundaries can be performed. Also, in environments
that are cluttered with objects, it is important to hypothesize
on the continuation of surfaces. Previous attempts in this
domain have used a set of heuristics to decide what surfaces
should be joined. In most circumstances, these heuristics are
a set of rules with predefined thresholds that determine whether
the surfaces should be joined. Currently we are investigating
the use of Bayesian networks [4] to manage the uncertainty
associated with such decisions. A Bayesian network offers a
unified approach to the specification of relationships among
surfaces as well as a method for computing a belief value in
the existence of a compound feature given the evidence from
the sensory data.
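As a toy illustration of such a belief computation (the variables, structure, and numbers below are invented for the example and are not the network of [4]), two pieces of evidence can be combined by Bayes' rule:

    # Toy posterior: P(join | proximity, coplanarity), assuming the two
    # evidence sources are conditionally independent given the hypothesis.
    def belief_join(prior, p_prox_join, p_prox_not, p_cop_join, p_cop_not):
        """Posterior belief that two patches form one compound surface,
        given observed proximity and coplanarity evidence."""
        num = prior * p_prox_join * p_cop_join
        den = num + (1.0 - prior) * p_prox_not * p_cop_not
        return num / den

    # Strong proximity and coplanarity evidence raises the belief markedly:
    print(belief_join(0.4, 0.9, 0.2, 0.8, 0.3))   # -> about 0.889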
Scanning large environments in 3-D allows their representation in a manageable number of images, but this also has
its limitations. Since the accuracy of the sensor degrades with
distance, the resolution of the acquired data, when scanning
a large environment, will differ substantially within a single
image and from one image to another. One of the technical
problems that still needs to be solved satisfactorily is how to
fuse multiple-resolution data from different views. Although
some manual and semi-automatic solutions exist for multiview registration, robust solutions are needed for multiview
registration with sensory error.

REFERENCES

[1] F. Blais, M. Rioux, and J. Domey, “Optical range image acquisition for the navigation of a mobile robot,” in IEEE Int. Conf. Robotics and Automation, Sacramento, CA, Apr. 9–11, 1991, vol. 3, pp. 2574–2580.
[2] F. Blais, M. Lecavalier, and J. Bisson, “Real-time processing and validation of optical ranging in a cluttered environment,” in Int. Conf. Signal Processing Applications & Technology, Boston, MA, Oct. 7–10, 1996, vol. 2, pp. 1066–1070.
[3] R. Liscano, S. Elgazzar, and A. K. C. Wong, “A proximity compatibility function among 3-D surfaces for environment modeling,” in IASTED 5th Int. Conf. Robotics and Manufacturing, Cancun, Mexico, May 29–31, 1997.
[4] R. Liscano, S. Elgazzar, and A. K. C. Wong, “Use of belief networks for modeling indoor environments,” in Vision Interface ’97, Kelowna, B.C., Canada, May 19–23, 1997.
[5] P. Boulanger and F. Blais, “Range image segmentation, free space determination, and position estimate for a mobile vehicle,” in SPIE Proc. Mobile Robots VII, Boston, MA, Nov. 18–20, 1992, vol. 1831, pp. 444–455.
[6] P. Boulanger and P. Cohen, “Viewpoint invariant computation of surface curvatures in range images,” in Vision Interface ’94, Banff, Alta., Canada, May 16–20, 1994, pp. 145–154.
[7] Q. Gao and A. K. C. Wong, “Curve detection based on perceptual organization,” Pattern Recognit., vol. 26, no. 7, pp. 1039–1046, 1993.

Shadia Elgazzar received the Ph.D. degree in electrical engineering from the University of Manitoba,
Winnipeg, Man., Canada, in 1981, in the area of
optimal control.
She is a Senior Research Officer of the Visual
Information Technology Group of the Institute for
Information Technology at the National Research
Council of Canada, Ottawa, Ont., which she joined
in 1978. Since then most of her activities were in
the area of kinematics, multiprocessor architecture
for robot controllers, and digital signal processing.
Her present interests are in 3-D vision and the visual aspects of sensor-based
robotics control.
Ramiro Liscano received the B.Sc. degree in mechanical engineering from the University of New
Brunswick, Fredericton, N.B., Canada, in 1982 and
the M.Sc. degree in mechanical engineering from
the University of Rhode Island, Kingston, in 1984.
He received the Ph.D. degree in systems design
engineering from the University of Waterloo, Ont.,
Canada, in 1998. His thesis research topic is in the
grouping and management of uncertainty in 3-D
sensory data for environment modeling.
Since 1984, he has been a Researcher at the
National Research Council of Canada, Ottawa, Ont., working in the control
of manipulators using visual feedback, calibration and dynamic programming of robot manipulators, environment sensing and modeling, autonomous
navigation, real-time computing, and intelligent system design.
François Blais received the M.Sc. degree from
Laval University, Québec City, P.Q., Canada.
He is a Senior Research Officer at the Institute for
Information Technology of the National Research
Council of Canada. In 1984, he joined the National
Research Council, Ottawa, Ont., where he has been
involved in the development of various 3-D range
sensor technologies and applications. His topics of
interest cover various fields in digital signal and
image processing, control, optics, 3-D vision, and
their applications.
Andrew Miles graduated from the School of Computer Science, Carleton University, Ottawa, Ont.,
Canada, in June 1997, with specialization in scientific computing. He received the B.Sc. degree in
mathematics from Carleton University in February
1994.
He is currently the Network Administrator at
Carleton University and is pursuing graduate work
in the area of 3-D graphics.