TERRESTRIAL IMAGE BASED 3D EXTRACTION OF URBAN UNFOLIAGED TREES

advertisement
TERRESTRIAL IMAGE BASED 3D EXTRACTION OF URBAN UNFOLIAGED TREES
OF DIFFERENT BRANCHING TYPES
Hai Huang
Institute of Photogrammetry and Cartography
Bundeswehr University Munich, 85577 Neubiberg, Germany
hai.huang@unibw.de, www.unibw.de/ipk
Commission III, ThS-7
KEY WORDS: Image Understanding, 3-D Feature Extraction, Computer Vision, Feature Extraction, Urban Planning, Vegetation,
Three-dimensional, Statistics
ABSTRACT:
In this paper we propose extensions to a generative statistical approach for three-dimensional (3D) extraction of urban unfoliaged trees
of different branching types from terrestrial wide-baseline image sequences. Unfoliaged trees are difficult to extract from images due
to their weak contrast, background clutter, and particularly the possibly varying order of branches in different images. By combining
generative modeling by L-systems and statistical sampling one can reconstruct the main branching structure of trees in 3D based on
image sequences in spite of these problems. Here, we particularly classify trees into different branching types and specific L-systems
are applied for each type for a more plausible description. We combine Monte Carlo (MC) with subsequential Markov Chain Monte
Carlo (MCMC) to robustly and efficiently deal with the sparse distributions of the branching parameters. First results show the potential
of the extended approach.
1
INTRODUCTION
as well as the height. Image data, also in the infrared is used to
mostly reliably differentiate pine, spruce, and deciduous trees.
The extraction of trees can help to enhance 3D urban geoinformation. Trees are an essential component adding a natural touch
to and for a realistic visualization of city models.
Aerial images are used in (Cheng et al., 2006), which like our
work employs a statistical (Bayesian) framework consisting also
of a generative component. It comprises segmentation, stereo,
and 3D fitting and is demonstrated by extracting individual trees.
Both data-driven (inverse modeling, 2D data to 3D geometries)
and generative – model-driven (3D models to 2D images) components are integrated.
In this paper we aim at extracting the main branching structure
of individual unfoliaged deciduous trees from wide-baseline image sequences. We basically follow our approach presented in
(Huang and Mayer, 2007), extending it with the following novelties:
(Pfeifer et al., 2004) and (Gorte and Pfeifer, 2004) extract detailed
models for trees from terrestrial laser scanner data. (Pfeifer et
al., 2004) aims at fitting cylinders to the trunk and the thicker
branches. A possibly better solution is provided by (Gorte and
Pfeifer, 2004). The laser points are rasterized in 3D voxel space
and operations such as closing from mathematical morphology
and thinning are used to obtain a connected 3D skeleton. We
note that for both approaches a relatively high density of points
is needed to avoid gaps in the skeleton, or even more critical, to
fit cylinders at all.
• Classification of branching types and refined modeling
• Combined MC and MCMC sampling for robust and efficient
search
Deciduous trees are popular in cities as they provide shadow in
summer while letting the sunlight through in winter. Thus, they
often form the majority of trees in urban areas. From a practical
point of view images for data acquisition are often taken when
trees are unfoliaged, as facades, etc. are then more readily visible. From a scientific but also a practical point of view unfoliaged
trees have the advantage, that they explicitly show the branches.
Yet, it is hard to extract unfoliaged trees because of their geometric complexity, weak contrast, background clutter, and the varying order of the branches when projected into different images
even when the images have been taken very close to each other.
In the work of (Sakaguchi and Ohya, 1999), which employs like
our work terrestrial images, a volume is carved out by intersecting the view cones generated from the tree silhouettes in multiple
images. The voxels of the volume are colored with the average
brightness of the rays from the different images. A branching process is started on the ground extending into dark areas assumed to
correspond to the trunk or branches. The given results are plausible, but there is much human intervention involved. A more
sophisticated automatic approach is (Shlyakhter et al., 2001). 3D
volumes are generated as in (Sakaguchi and Ohya, 1999). From
the volumes 3D medial axes are constructed. The medial axes
are constrained to the “botanical fidelity of the branching pattern
and the leaf distribution” (Shlyakhter et al., 2001) via an open
Lindenmayer-, or in short L-system (Mĕch and Prusinkiewicz,
1996). Again, manual interaction is employed to generate results
which are good in terms of visualization.
Former work has mostly dealt with tree extraction in aerial images and especially recently laser scanner data. Much work focuses on forests. (Hyyppä et al., 2005) aims in their empirical
study on the accuracy of the estimation of tree volumes from
aerial data. They estimate a segment for each tree from image
data, but found, that the height is much more reliably determined
from laser-scanner data than by image matching. (Persson et al.,
2004) uses laser-scanner data for the determination of the outline
253
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B3a. Beijing 2008
Recent work includes (Tan et al., 2007), which focuses on the
semi-automatic construction of realistic looking tree models from
images. It is based on high quality structure from motion and
dense depth estimation and it employs shape patterns of visible
branches to reconstruct missing parts of the tree.
In this paper we show how generative statistical modeling based
on L-systems and MCMC makes it feasible, to extract branches
in wide-baseline image sequences taken unconstrained with standard consumer cameras in spite of the problems with clutter and
occlusions stated above. Our basis is a highly precise structure
from motion procedure (Mayer, 2005) making use of calibration
via the five-point-algorithm (Nistér, 2004). Corresponding points
are obtained with high precision by least-squares matching and
bundle adjustment is used after every step.
In Section 2 the generative modeling of the tree structure by means
of L-systems is described for three different branching types. The
generation of 3D hypotheses, their 2D projection, and evaluation
are presented in Sections 3 and 4 comprising also the classification of the branching types based on results for the first level
of branches. Hypotheses for trunks are generated by line extraction and image matching while those for branches are produced
by MC/MCMC sampling of their parameters in 3D object-space
with given prior distributions based on the classified branching
type. The evaluation of new hypotheses is conducted using the
normalized cross correlation coefficient (CCC) as a substitute
for likelihood. After demonstrating the potential of the extended
approach with preliminary results in Section 5, the paper ends up
with conclusions.
2
L-SYSTEM FOR DIFFERENT BRANCHING TYPES
As stated above, the branching structure of trees is difficult to extract from terrestrial wide-baseline urban image sequences. To
construct 3D models of trees bottom-up/data-driven from the images, we need to match the branches. Often, the ordering constraint, i.e., a point left of another point on an epipolar line in one
image is also left of the corresponding point on the epipolar line
in the other image, is employed to guide matching. Yet, because
of the complex 3D structure of trees, the ordering constraint is
often not valid for branches even for images taken close to each
other. All this means that the bottom-up extraction and matching of branches does not seem promising and suitable constraints
describing the structure of trees are essential for their 3D reconstruction.
We therefore decided to model the tree structure generatively,
i.e., top-down/model-driven by means of L-systems (Mĕch and
Prusinkiewicz, 1996). The latter are widely used in computer
graphics to simulate the structure and growth of vegetation. They
are parallel string rewriting systems representing branching structures in terms of bracketed strings of symbols possibly with associated numerical parameters. The simulation of branching starts
with an initial string (axiom). By means of productions substrings
of the predecessor string are substituted by successor strings according to specific rules. Models produced by L-systems are able
to maintain basic botanic features such as hierarchical structure
and self-similarity.
2.1
Branching types
Basically, branching structures of trees can be divided into two
main groups: “monopodial” and “sympodial”, cf., e.g., (Deussen
and Lintermann, 2005), for which different Production Rules have
to be used. The monopodial branching system (cf. Fig. 1 (m))
has a prominent main axis, which is stronger and longer than the
254
side branches. The side branches are again stronger and longer
than their side branches of the second order, etc. Because of the
dominant axes, monopodial branching structures have a radially
symmetric crown.
Fig. 1 (sd) and (sm) show the two main types of sympodial
branching. “Sympodial-dichasium” – sd branching means, that
two buds of a branch sprout and grow synchronously. For this
kind of tree trunk and crown are clearly separated. The most common branching structure for trees is “sympodial-monochasium”
– sm, where one of the secondary branches has approximately the
same direction as the original branch. Sympodial-monochasium
branching results into only partially symmetric branching structures, which will still often appear very similar to monopodial
branching.
Figure 1: Branching types: (m) monopodial; (sd) sympodialdichasium; (sm) sympodial-monochasium
2.2
L-systems
We have devised L-systems for the different branching types with
predefined basic Production Rules for each type. An example for
how L-systems work is given for monopodial – m trees. An Lsystem as shown in Fig. 1 (m) is, e.g., defined as follows:
G(m) = (V , S, ω, P )
with
V(m) (Variable): F
S(m) (Constants): +, –, <, >, [, ]
ω(m) (Initial State): F
P(m) (Production Rule): F = F[+>F][–<F] F
in which the Variable F corresponds to growth, i.e., a new piece
of cylinder, while the Constants describe rotation around certain
axes (“+” and “−” indicate turn left and right – inclinations; “<”
and “>” indicate roll left and right – azimuths) and the creation of
new branches (enclosed by brackets “[” and “]”). The Production
Rule instructs to replace F with the specific string in the next
iteration. Different Production Rules lead to different branching
structures.
The L-systems for sympodial-dichasium – sd and sympodial-monochasium – sm (cf. Fig. 1 (sd) and (sm)) branching are defined
in the same way, but with specific Production Rules:
P(sd)
:
F=F[+<F][–<F]
P(sm)
:
F=F[+>F]F[–<F]F
Variable and Constants are parameterized, i.e., the values of the
size of the cylinder and the angles of rotation are not fixed. In
contrast to graphical modeling with L-systems, the parameters in
our approach are determined by statistical sampling to fit the real
scene.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B3a. Beijing 2008
3
GENERATION OF 3D HYPOTHESES
after the first round of refinement and the best one is finally found
after another twenty MCMC iterations for each of them.
The L-system provides a reasonable structure for the branches.
Yet, the L-system alone only gives means to generate and also
visualize trees. Thus, after the trunk is located by line extraction
and image matching, the branches are extracted via a generative
statistical approach: MC and MCMC are employed to link the
3D model to the scene, i.e., to find a plausible structure for the
tree visible in the images. Likely candidates for branches are
generated by stochastic sampling and are verified by comparing
simulated and real images. Via classification the branching type
is determined from the first level of branches.
3.1
The basic MC search and the Metropolis-Hastings algorithm (Neal,
1993), integrated into our MCMC, both help to avoid local minima while still allowing to find solutions quickly. Preliminary
prior distributions for the parameters have been devised from experience and observations, resulting in an efficient search, particularly after the first level of branching.
3.4
The classification of the branching types is done after the determination of the first level of branches, i.e., the branches, which
directly grow from the trunk. We employ a particularly flexible
search for this level by adding the vertical Z-coordinate of the begin point of the branch, i.e., its joint position along the trunk, as
an additional parameter. This parameter is sampled together with
the branching angles. Thus, branches do not have to start exactly
at the end point of trunk, but can be distributed in its vicinity.
Extraction of the trunk
The trunk is a basic part of many trees we are interested in. For its
extraction we assume that it corresponds to a thick, mostly vertical line and it defines the lower part of the main axis outside the
crown. The vertical direction is presumed to be known approximately by basically taking images horizontally. It can often be
improved by computing the vertical vanishing point from the vertical edges of trunks or on facades as we focus on urban scenes.
Vertical lines, i.e., hypotheses for trunks, are verified by matching in several images. We use the trifocal tensor (Hartley and
Zisserman, 2003) derived from the known orientation parameters
to predict from lines in two images hypotheses for lines representing the trunk in other images. For the remainder of the paper
we assume that the 3D position of the tree is determined by the
trunk.
3.2
Classification of branching types
The classification is based on the distribution of the determined
branches: trees are supposed to be of the monopodial – m type if
most of the joint positions of the extracted branches are concentrated in a relatively small area. If there is no branch maintaining the direction of the trunk, trees are classified as sympodialdichasium – sd. If none of the above holds, the branches are assumed to be more suitably described as sympodial-monochasium
– sm trees. The appropriate Production Rule of the corresponding
L-system (cf. Section 2.2) is then employed for the further levels.
Hypotheses generation for branches
4
A branch in 3D object-space is modeled as a cylinder with known
begin. As parameters azimuth (angle with x-axis of branch projected into horizontal plane), inclination (angle between branch
and horizontal plane), length, and diameter are used. New hypotheses are generated by sampling the space of the parameters.
4.1
2D projection
The 3D hypotheses generated above are projected into 2D resulting into simulated images. They are evaluated by comparing them with the given images. We use a simple and efficient
2D representation derived from the 3D representation instead of
the actual projection of the 3D cylinder, as the latter entails a
larger computational effort and for statistical sampling many projections are needed. Another reason for doing so is that the projection of the branches results into patches of nearly constant
brightness anyhow.
In Fig. 2 the basic idea for the generation of branches is presented. Based on the trunk, branches are grown by sampling their
parameters statistically, guided by appropriate prior distributions.
I.e., the inclination is presumed to be often about 45 upwards and
relatively seldom downwards. A newly generated hypothesis is
projected into the images via the given highly precisely known
orientation parameters. The hereby generated simulated images
are matched to the given images (cf. Section 4). As model for the
background clutter we use Gaussian noise.
The chosen 2D representation consists of trapezoids. A trapezoid
is described by its direction (angle with x-axis), length, width of
begin, and width of end. We determine the centers of the ends of a
trapezoid by projecting the end points of the 3D axis of the cylinder into the image. Reasonable approximations for the widths are
obtained from the projection of points on the normals to the lines
from the end points to the projection center with the distance radius of the cylinder.
Our current experiments are limited to the major branching structure, which includes the longer and thicker branches in the first
few levels. Small twigs are ignored in modeling and search.
3.3
EVALUATION OF HYPOTHESES
Combined MC and MCMC sampling
Combined Monte Carlo plus subsequential Markov Chain Monte
Carlo (MC+MCMC) is used for the statistical sampling. Since
the parameters, especially the branching angles, are distributed
sparsely in a relatively large space, plain Monte Carlo, i.e., random numbers are drawn from the given prior distributions, is used
in the first phase for a coarse sampling and refined in the second
phase by means of MCMC. MCMC, cf. (Neal, 1993), is characterized by the Markov property: a sampling step depends only on
the previous step, i.e., the space of parameters is sampled locally.
4.2
Evaluation by cross correlation coefficient
The cross correlation coefficient (CCC) is used for the evaluation of the 2D-projections of the 3D branch hypotheses based on
the given images. Hypotheses are projected with the trunk color,
Gaussian noise simulates background clutter, and both together
compose simulated images.
Each simulated image i and its corresponding given image are
compared based on the intensities computed by the Hue-SaturationIntensity (HSI) color transformation and results in an independent
CCCi . To be able to compare different hypotheses for the whole
tree, the matching is done against the projections of the convex
Particularly, the best samples, e.g., ten from one hundred, from
MC are taken as candidates for a refined search using ten MCMC
iterations for each. The number of candidates is reduced to three
255
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B3a. Beijing 2008
Figure 2: Stochastic sampling based on an L-system results in a 3D tree hypothesis (left). Projection of a new branch (red) into three
empty images in the form of a trapezoid with Gaussian noise as background (center) and given image data (right). For the sake of
clarity only the projection of the new branch is shown.
3D hull of all hypotheses. As MCMC sampling usually entails
a larger number of iterations, the comparison has to be efficient.
This is done by an incremental update of only those parts of the
2D projection and the corresponding variances and covariances,
which have changed.
digital camera. As output, the major branching structure of the
target tree is represented in the form of a VRML (Virtual Reality Modeling Language) model. Focusing on urban scenes, we
mainly analyze monopodial – m and sympodial-monochasium –
sm trees, which form the majority of urban trees.
CCCi values for the n individual images are combined via multiplication into a global CCC value
A result for an image triplet depicting an sm tree is presented
in Fig. 3. A snapshot taken at the end of the search process
shows the extracted branches. The search is limited to the thicker
branches, which are modeled by the L-system and stops when no
more hypotheses are accepted. Fig. 4 and 5 show an sm and an
m tree, the latter taken under very different lighting conditions.
The VRML models are presented from positions corresponding
to the imaging positions to allow for an easy comparison. Since
we model both the trunk and the branches as cylinders, the arcshape of the trunk and the main branches of the tree in Fig. 4 is
roughly represented as consecutive cylinders.
CCCglobal =
n
Y
CCCi .
(1)
i=1
We use multiplication because we assume that the CCCi values
are proportional to likelihoods and we assume independence of
the images given the 3D model. Moreover, we found empirically,
that this conservative combination helps to sort out wrong hypotheses early. We are aware that the actual size of the CCCi
values can be far from correct likelihoods. Yet, our experiments
give evidence to assume that they are reasonably proportional to
likelihoods.
We basically have determined the main structure on the first several branching levels of the trees even for partially occluded branches as, e.g., the lowest left branch of Fig. 5. However, thinner
branches and twigs are still missing, because they are not yet described by the L-system.
The CCC values of all accepted hypotheses are normalized and
result into a criterion, which hypotheses have to fulfill, to avoid
implausible results. This is especially helpful for irregular trees
and artificial pruning.
6
We have presented extensions to our approach (Huang and Mayer,
2007) for the 3D extraction of unfoliaged trees from image sequences. The approach combines the descriptive power for trees
of L-systems with statistical sampling and cross correlation into
a generative statistical approach. A more sophisticated modeling
by L-systems is devised for an improved interpretation of trees.
Branching structures of trees are classified into different types
after the extraction of the first level of branches and specific Lsystems are applied for refined modeling. For the statistical sampling of the sparsely distributed branching parameters we combine MC and MCMC together with the Metropolis-Hastings algorithm for a more robust and efficient sampling. With the approach
we are able to extract major branches to represent the basic structure of individual trees even when they are partly occluded and
the images are taken under different lighting conditions. Preliminary results show the feasibility and potential of the approach.
By means of experiments we found that it is not useful to sample
all parameters of a branch at the same time. Thus, sampling of the
parameters is conducted sequentially. Firstly, only azimuth and
inclination (and additionally joint position for the first level, cf.
Section 3.4) are jointly sampled by means of MC+MCMC as described in Section 3.3 while the length is kept fixed. The length is
optimized only by MCMC subsequently. For future research we
plan to relax the sequential sampling via conditional probabilities
controlling which parameter to sample next.
5
CONCLUSIONS
RESULTS
The input data for our experiments consists of wide-baseline image sequences taken unconstrained with a hand-held consumer
256
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B3a. Beijing 2008
ture of trees. The estimation of empirical distributions for the parameters from a larger number of examples is expected to lead
to better results and more efficient modeling. Those parameters
could include contraction rates for length, diameters and branching angles.
By correlating against trees and representative samples of the
background, a function to upgrade correlation coefficients to correct likelihoods could be determined. Reversible Jumps - RJ
(Green, 1995) are to be integrated in conjunction with model selection into MCMC. Search is thus optimized by an informed selection of competing hypotheses while at the same time avoiding
overfitting.
Figure 3: Result for a sympodial-monochasium tree limited to
the trunk and the first several levels of branches – image triplet
(top), a snapshot of the search process (top-center), and the result
as a VRML model (bottom). The cameras are shown as (green)
pyramids.
We finally note that generative statistical modeling is not confined
to L-systems. One basically just needs a means to construct realistic looking trees that can be efficiently controlled. For this, e.g.,
also (Lintermann and Deussen, 1999) could be basis. As shown
above, the basic branching type of the object tree is classified
during the extraction of main branches. The Production Rules
of L-system could also be refined according to the already found
structure. We assume that the upper stages of branches with very
thin twigs might be grown stochastically with the derived Production Rules and parameters to match the image density.
ACKNOWLEDGEMENTS
We thank Deutsche Forschungsgemeinschaft for supporting Hai
Huang under grant MA 1651/12-1 and the anonymous reviewers
for their helpful comments.
REFERENCES
Cheng, L., Caelli, T. and Sanchez-Azofeifa, A., 2006. Component Optimization for Image Understanding: A Bayesian Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(5), pp. 684–693.
Deussen, O. and Lintermann, B., 2005. Digital Design of Nature.
Springer, Berlin, Germany.
Gorte, B. and Pfeifer, N., 2004. Structuring Laser-scanned Trees
Using 3d Mathematical Morphology. The International Archives
of the Photogrammetry, Remote Sensing and Spatial Information
Sciences 35(B5), pp. 929–933.
Green, P., 1995. Reversible Jump Markov Chain Monte Carlo
Computation and Bayesian Model Determination. Biometrika 82,
pp. 711–732.
Hartley, R. and Zisserman, A., 2003. Multiple View Geometry in
Computer Vision – Second Edition. Cambridge University Press,
Cambridge, UK.
Huang, H. and Mayer, H., 2007. Extraction of the 3d Branching
Structure of Unfoliaged Deciduous Trees from Image Sequences.
Photogrammetrie – Fernerkundung – Geoinformation (6/2007),
pp. 429–436.
Figure 4: Result for a sympodial-monochasium tree limited to the
trunk and the first several levels of branches – image triplet (left)
as well as the VRML model (right) seen from the corresponding
camera positions.
Hyyppä, J., Mielonen, T., Hyyppä, H., Maltamo, M., Yu, X.,
Honkavaara, E. and Kaartinen, H., 2005. Using Individual Tree
Crown Approach for Forest Volume Extraction with Aerial Images and Laser Point Clouds. The International Archives of the
Photogrammetry, Remote Sensing and Spatial Information Sciences 36 (3/W19), pp. 144–149.
Our future research aims at more complex, e.g., context-sensitive,
Lintermann, B. and Deussen, O., 1999. Interactive Modeling of
L-systems based on a more detailed geometrical modeling with
Plants. IEEE Computer Graphics and Applications 19(1), pp. 2–
11.
generalized cylinders for the reconstruction of the branching struc257
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B3a. Beijing 2008
Figure 5: Result for a monopodial tree limited to the trunk and the first several levels of branches – image quadruple taken at night (top)
as well as the VRML model (bottom). Please note the occluded branch on the lower left.
Mayer, H., 2005. Robust Least-Squares Adjustment Based Orientation and Auto-Calibration of Wide-Baseline Image Sequences.
In: ISPRS Workshop in conjunction with ICCV 2005 “Towards
Benchmarking Automated Calibration, Orientation and Surface
Reconstruction from Images” (BenCos), Beijing, China, pp. 1–6.
Tan, P., Zeng, G., Wang, J., Kang, S. and Quan, L., 2007. Imagebased Tree Modeling. ACM Transaction on Graphics.
Mĕch and Prusinkiewicz, P., 1996. Visual Models of Plants Interacting with Their Environment. In: SIGGRAPH ’96, pp. 397–
410.
Neal, R., 1993. Probabilistic Inference Using Markov Chain
Monte Carlo Methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto.
Nistér, D., 2004. An Efficient Solution to the Five-Point Relative Pose Problem. IEEE Transactions on Pattern Analysis and
Machine Intelligence 26(6), pp. 756–770.
Persson, Å., Holmgren, J., Söderman, U. and Olsson, H., 2004.
Tree Species Classification of Individual Trees in Sweden by
Combining High Resolution Laser Data with High Resolution
Near-infrared Digital Images. The International Archives of the
Photogrammetry, Remote Sensing and Spatial Information Sciences 36 (8/W2), pp. 204–207.
Pfeifer, N., Gorte, B. and Winterhalder, D., 2004. Automatic
Reconstruction of Single Trees from Terrestrial Laser Scanner
Data. The International Archives of the Photogrammetry, Remote
Sensing and Spatial Information Sciences 35 (B5), pp. 114–119.
Sakaguchi, T. and Ohya, J., 1999. Modeling and Animation of
Botanical Trees for Interactive Environments. In: Symposium on
Virtual Reality Software and Technology, pp. 139–146.
Shlyakhter, I., Rozenoer, M., Dorsey, J. and Teller, S., 2001. Reconstructing 3D Tree Models from Instrumented Photographs.
IEEE Computer Graphics and Applications 21(3), pp. 53–61.
258
Download