ROAD EXTRACTION IN URBAN AREAS SUPPORTED BY CONTEXT OBJECTS

Stefan Hinz, Albert Baumgartner
Technische Universität München, D–80290 Munich, Germany
Tel: +49-89-2892-3880
Fax: +49-89-280-9573
E-mail: {hinz | albert}@photo.verm.tu-muenchen.de
KEY WORDS: Image Understanding, Road Extraction, Context
ABSTRACT
In this contribution, we introduce our concept for automatic road extraction in urban areas using high resolution aerial
imagery and dense height information. To facilitate the extraction, we use a hierarchical road model comprising different
road elements (markings, lanes, junctions, road network) at different scales as well as context objects of roads like buildings and vehicles. Global contextual knowledge about roads in residential areas helps us to focus on certain image parts,
and thus, to cope with the inherently high complexity of urban scenes. Road extraction is then performed in a mainly data
driven fashion starting with the extraction of the basic road elements at the lowest level of the road model and reaching
the top level, i.e., a topologically consistent road network, with the final steps of processing. At each step in between,
hypotheses for the respective road elements are generated and validated, and thus evidence about roads is collected and
transferred to the next step. Since urban roads are often characterized by rather dense traffic, the detection of vehicles and
vehicle convoys plays a major role. The results of the currently implemented modules encourage us to pursue this
concept further.
1 INTRODUCTION
In the past, the automation of road extraction from digital imagery has received considerable attention. Research on this
issue is mainly motivated by the increasing importance of geographic information systems (GIS) and the need for data
acquisition and update for GIS. Especially in the field of urban planning, there is a high demand for up-to-date, accurate,
and detailed information about the road network as required for applications like traffic flow analysis and simulation,
estimation of air and noise pollution, street maintenance, etc. The OEEPE Survey on 3D City Models by the European
Organization for Experimental Photogrammetric Research (OEEPE) confirmed this demand: about 85 % of the
participants stated that information about the road network is of greatest interest to them (see (Fuchs et al., 1998)).
For a number of cities in Europe and North America, larger parts of the road network are already digitally mapped,
most importantly in commercial and military navigation and route planning systems. In practice, however, the level of detail
of the data acquisition, i.e., which elements of a road are thought to be important and which are negligible, is (naturally)
influenced by the application for which the data is collected. Since, for navigation purposes, the topology of the road
network is in principle more important than the number of lanes (except for traffic-dependent route planning), most existing
area-wide road databases provide road information in the form of the road axis attributed with the road class or the road width.
They thus lack the level of detail required for the above-mentioned applications (though, in many cases, the
underlying data models would allow for a more detailed road description). Since a high percentage of the relevant urban
features, including roads and their sub-structures, i.e., lanes and road markings, can be extracted from aerial images with
high accuracy (Englisch and Heipke, 1998), the interpretation of aerial images is, despite existing GIS data, a crucial
task for the acquisition of detailed road data. For example, (Bogenberger et al., 1999) use aerial images in order to count lanes
and vehicles, measure queue lengths, and estimate vehicle velocities for calculating and simulating traffic flow conditions
on high capacity roads. The time- and cost-intensive manual procedure necessary for extracting the desired information
constitutes the main bottleneck, which needs to be overcome.
In this paper, we present our concept for the automatic extraction of roads and their sub-structures from aerial imagery
taken over urban areas. Besides multiple overlapping images, the approach relies on accurate height information as, e.g.,
derived from airborne laser data. The approach is designed to include additional information sources without depending
on them. The most important such sources are color channels, external information provided by GIS, and results of semi- or
fully-automatic building extraction. As in our previous work (Baumgartner et al., 1999, Baumgartner et al., 1997), we
distinguish three ”global contexts” called context regions: rural, forest, and urban. For road extraction in urban areas we
use a model that describes roads and their sub-structures at different scales for different sensors. The extraction starts
with the segmentation of Regions of Interest (RoI) based on height and image data at coarse scale. Then, the basic road
International Archives of Photogrammetry and Remote Sensing. Vol. XXXIII, Part B3. Amsterdam 2000.
features at the lowest level of the road model, i.e., road markings and lanes, are extracted separately in each fine scale
image. After a first validation, which also includes the detection of vehicles, the results of the different images are fused
into larger road segments (possibly consisting of several lanes). During the next step junctions and connection hypotheses
between the road segments are generated and validated in each image. The results are fused again, and the next iteration
starts. The extraction is completed if no more connections can be formed or verified.
The next section describes the existing approaches to road extraction that are most relevant to our work. After a short discussion
of the appearance of roads in urban areas, our road model is introduced in Sect. 3. In Sect. 4 the extraction strategy is
outlined in detail and results of the currently implemented modules are shown. Finally, the results are discussed and
conclusions for our future work are drawn (Sect. 5).
2 RELATED WORK
Compared to the intense research activities on the extraction and visualization of buildings, site models, or city models
(see (Mayer, 1999) and (Förstner, 1999) for an overview), only few groups work on the automatic extraction of roads
in urban environments. One reason for this is that most past and current efforts in road extraction, including our
own previous work, rely on road models that describe the appearance of roads in rural terrain. Depending on the image
resolution, roads are modeled as line-like structures (Wiedemann and Hinz, 1999, Heller et al., 1998, Gruen and Li,
1997, Geman and Jedynak, 1996) or relatively homogeneous areas satisfying certain shape and size features (Harvey,
1999, Zhang and Baltsavias, 1999, Baumgartner et al., 1999, Mayer et al., 1998, Trinder and Wang, 1998). Throughout
all the different approaches some issues have proved to be of great importance: The fusion of different scales helps to
eliminate isolated disturbances on the road while the fundamental structures are emphasized (Mayer and Steger, 1998).
Exploiting the network characteristics adds global information and, thus, the selection of the correct hypotheses becomes
easier (Wiedemann, 1999, Heller et al., 1998). The integration of context helps to cope with the influence of background
objects like trees and buildings on the appearance of roads. In the following, we discuss three selected approaches, each
consisting of certain promising elements, and each having significant influence on our concept:
(Vosselman and de Gunst, 1997) and (de Gunst, 1996) use a detailed and scale dependent road model for updating
outdated road maps. According to road construction guidelines, they model roads and freeways as an aggregation of different lanes. Bright markings separate the individual lanes. Intersecting roads form junctions of different types (crossings,
Y-junctions, and fly-overs). In order to detect changes in the road network, in a first step, the old information is validated
using road features extracted from medium or small scale aerial images (about 0.4 m / 1.6 m resolution). Then, in case
of inconsistencies, a change in the road width, e.g. an additional lane, or a junction where a new road branches off is hypothesized. Again, extracted road features verify the hypotheses. With this strategy it is possible to detect both changes in
existing roads and completely new roads. They use, however, rather simple thresholding and profile matching techniques
for the extraction of road features. The approach is consequently very sensitive to disturbances like cars, shadows, or
occlusions. Hence, this system seems only applicable to roads and freeways in open and rural areas, though the concept
is more generic.
In the approach of (Ruskoné, 1996, Ruskoné et al., 1994, Airault et al., 1994), the interpretation of local context is used
to verify an automatically extracted road network. The extraction starts with detecting seed points in a medium (about
0.5 m resolution) scale image followed by low level road tracking along homogeneous elongated areas. Then, hypotheses
for the connection of the extracted road parts are generated and checked based on geometric criteria like distance and
direction. The resulting road network is geometrically adjusted using snakes. For verification, the network is split into
smaller pieces. A supervised classification assigns the meanings ”road”, ”crossing”, ”shadow”, ”tree”, or ”field” to each
piece. These so-called local contexts are validated using several geometric and radiometric criteria. In urban areas, where
techniques based on road profiles or road sides may yield erratic results, the validation is done by detecting cars. With a
neural network classifier car-like patterns are extracted and thereafter grouped into convoys (Ruskoné et al., 1996).
The approach of (Price, 1999) is particularly designed for extraction of urban street grids from medium and high resolution
images (about 0.8 m - 0.2 m resolution). The road network is modeled as a combination of grids with a rather regular mesh
size, the size of a single building block. Junctions – the nodes on the grid – are connected by individual road segments
of approximately constant width and height. A human operator initializes the grid spacing and orientation by manual
selection of three points specifying the first mesh. Then, the grid is iteratively expanded by adding new meshes. In each
iteration, the new road segments are refined and evaluated by simultaneously matching their sides to image edges. Thus,
longer portions of the road sides must be visible in at least one of the overlapping images. During final verification,
height information and contextual knowledge are used to adjust the position of several consecutive road segments and to
remove short road portions. To do this, the adjusted segments are evaluated anew but now regarding each segment’s direct
neighbors.
The above approaches show individually promising parts of road model and extraction strategy for different types of
scenes. The varying appearance of roads in complex scenes can be captured by a detailed model for a road and its
contextual relations to background objects. Vice-versa, context helps to guide the interpretation, e.g. ”easiest first”,
making the extraction easier to handle and more robust. This can be supported by considering the function of roads
connecting different sites and thus forming a fairly dense and sometimes even regular network. Data from different
and complementary sources, especially overlapping images and accurate height information, is very useful to address
occlusions and shadows in urban environments. Furthermore, all of these knowledge sources need to be integrated into a
single system.
3 MODELING URBAN ROADS
3.1 Appearance of Roads
Figure 1 visualizes two examples of urban roads. It is obvious that these images exhibit a more complex content than
scenes showing rural areas since the number of different objects and their heterogeneity are much greater.
Generally, this implies that more details of the road and context model must be
exploited for road extraction. In dense urban areas, for instance, some of the roads
comprise several lanes that are linked by complex road crossings. Moreover,
as the number of objects increases, the complexity of their relations grows,
too. In Fig. 1a), for example, some parts of the streets are occluded by vehicles,
especially at the road sides. Hence in this particular case, a road is mainly defined
by groups of (parked) cars but not by parallel road sides or a homogeneous surface.
A similar relation is the occurrence of shadows cast by high buildings. A road
generally appears bright in open areas, but in the case of shadows two problems
for the extraction arise: (1) the surface is darkened significantly, and (2) at the
margin of the shadow regions, strong gray value edges in almost any direction
may occur on the road disturbing the usually homogeneous reflectance.
Figure 1b) shows a different kind of problem: The roof of the rectangular building
in the center of the image could be wrongly identified as a parking lot because
its shape and reflectance properties match the ones of a road-like object almost
perfectly. Only the combination with height data as given by a DSM (Digital
Surface Model) or, as in this case, implicitly given by a corresponding shadow
region provides enough information for avoiding this misdetection.
It follows that, on the one hand, those features of a road ought to be selected
on which the influence of the above-mentioned phenomena is minimal. On the
other hand, it is very important to consider context objects, in particular different
kinds of vehicles, in order to explain abnormal changes in the appearance of a
road. Hence, the model described below consists of two parts: The first part describes characteristic properties of roads in the real world and in aerial imagery,
and represents a road model derived from these properties. The second part defines different local contexts and assigns those to the global contexts.
Figure 1: Roads in urban areas (panels (a) and (b))
3.2 Road Model
Besides an (at least partially) homogeneous surface and more or less densely arranged vehicles, one obvious feature of roads in urban areas are road markings
separating a road into different lanes. To make use of them, we extended the model of our previous work and now model
roads and complex junctions as a combination of several lanes consisting of one or more lane segments. Dashed or solid
linear markings define the border of a lane segment. The road model condensed from the findings of the previous section
is illustrated in Fig. 2 in the form of a hierarchical semantic network.
The model describes objects by means of “concepts,” and is split into three levels defining different points of view.
The real world level comprises the objects to be extracted and their relations. On this level the road network consists of junctions and road links that connect junctions. Road links are constructed from road segments. In fine scale,
road segments are aggregations of lanes, which consist of pavement and markings. For markings there are two specializations: symbols and line-shaped markings. The concepts of the real world are connected to the concepts of the
geometry and material level via concrete relations (Tönjes, 1997), which connect concepts representing the same object on different levels. The geometry and material level is an intermediate level which represents the 3D-shape of an
object as well as its material (Clément et al., 1993). The idea behind this level is that in contrast to the image level it
describes objects independently from sensor characteristics and viewpoint. Road segments are linked to the “straight
bright lines” of the image level in coarse scale. In contrast to this, the pavement as a part of a road segment in fine
scale is linked to the “elongated bright region” of the image level via the “elongated, flat concrete or asphalt region.”
Whereas the fine scale gives detailed
information, the coarse scale adds
global information. Because of the
abstraction in coarse scale, additional
correct hypotheses for roads can be
found and sometimes also false ones
can be eliminated based on topological
criteria, while details, like exact width
and position of the lanes and markings,
from fine scale are integrated. In this
way the extraction benefits from both
scales.
Figure 2: Road model [figure: hierarchical semantic network over three levels (real world, geometry and material, image) in coarse and fine scale, with part-of, specialization, concrete, and general relations]
3.3 Context Model
The road model presented above comprises knowledge about radiometric, geometric, and topological characteristics of roads. This model is extended by knowledge about context: So-called context objects, i.e. background objects like buildings, trees, or
vehicles, can support road extraction, but they can also interfere (see the discussion of Sect. 3.1). Also, external GIS data
can be regarded as context object. Experience has shown that modeling this interaction between road objects and context
objects on a local level as well as a global level is an aid for guiding the extraction since the interpretation problem is split
into smaller sub-problems which can be solved more efficiently by using specific models and extraction strategies.
In order to capture the varying appearance of roads globally, we distinguish between the context regions urban, forest,
and rural (cf. (Baumgartner et al., 1997)). Furthermore, we model the local context with so-called context relations, i.e.,
certain relations between a small number of road and context objects. For typical context relations in the rural context
region, we refer the reader to (Baumgartner et al., 1997). In the following we turn our focus on context relations in the
urban context region (see Fig. 3).
Figure 3: Context relations for urban areas [figure: context objects (vehicle, building, GIS road axis), road objects (lane segment, junction), and the sub-structure orthogonal marking; relation labels: ”is approximately parallel to”, ”casts shadow on”, ”occludes”, ”is close to”, ”defines end of”, ”divides”]
Almost every building in the real world is connected with the road network. The denser the settlement is, the closer the buildings move to the road, and the more parallel their outlines are to the road sides. Therefore, this context relation gets more useful for the extraction in downtown areas, where, in some extreme cases, roads and junctions are purely defined by the building outlines. Vice-versa, buildings or other high objects standing close to the road potentially occlude larger parts of it or cast shadows on it. Hence, a context relation ”occlusion” suggests selecting another image providing a better view of that particular part of the scene, whereas a context relation ”shadow” can tell an extraction algorithm to choose modified parameter settings.
Both context relations imply that roads lie beneath the surrounding objects. Consequently, there is no need to search for
roads on locally high objects.
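The ”shadow” relation presupposes that shadow regions can be predicted from the height data. A minimal sketch of such a prediction follows, assuming a regular DSM grid with north at the top and a known sun azimuth and elevation. The function name `predict_shadow`, its parameters, and the brute-force ray marching are illustrative assumptions and not a description of the implemented system.

```python
import math

import numpy as np


def predict_shadow(dsm, cell_size, sun_azimuth_deg, sun_elevation_deg, max_steps=200):
    """Return a boolean mask that is True where a DSM cell lies in shadow.

    Assumptions (not from the paper): rows run north to south, columns west
    to east, azimuth is measured clockwise from north, and one marching
    step covers one grid cell.
    """
    dsm = np.asarray(dsm, dtype=float)
    rows, cols = dsm.shape
    azimuth = math.radians(sun_azimuth_deg)
    step_col = math.sin(azimuth)    # eastward component of the ray
    step_row = -math.cos(azimuth)   # northward means decreasing row index
    # Height gained by the sun ray per marching step.
    rise = math.tan(math.radians(sun_elevation_deg)) * cell_size

    shadow = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            for k in range(1, max_steps + 1):
                rr = int(round(r + k * step_row))
                cc = int(round(c + k * step_col))
                if not (0 <= rr < rows and 0 <= cc < cols):
                    break  # ray left the grid without being blocked
                if dsm[rr, cc] > dsm[r, c] + k * rise:
                    shadow[r, c] = True  # a higher object blocks the sun
                    break
    return shadow


if __name__ == "__main__":
    heights = np.zeros((10, 10))
    heights[:, 5] = 10.0  # a 10 m high wall running north to south
    mask = predict_shadow(heights, cell_size=1.0,
                          sun_azimuth_deg=90.0, sun_elevation_deg=45.0)
    print(mask.astype(int))
```

With the sun in the east, the cells west of the wall are flagged, which is the kind of cue that would trigger modified parameter settings for marking extraction.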
For some settlements, road axes might be available digitally. Such information can be integrated consistently by using a context relation that models parallelism and closeness between an extracted piece of road and the
mapped road axis. With such a context relation, cues are provided where a road might be present. Nevertheless, the
extraction has to verify independently whether the road truly exists.
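The parallelism and closeness test behind this context relation can be sketched as follows. The thresholds `max_dist` and `max_angle_deg`, the use of the piece's midpoint, and all function names are assumptions chosen for illustration; the paper does not specify them.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the foot point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def undirected_angle(a, b):
    """Direction of the line a-b in [0, pi), ignoring orientation."""
    return math.atan2(b[1] - a[1], b[0] - a[0]) % math.pi


def supported_by_gis_axis(piece, axis, max_dist=10.0, max_angle_deg=15.0):
    """True if the road piece is close and roughly parallel to any axis segment.

    piece: ((x1, y1), (x2, y2)); axis: list of polyline vertices.
    A positive answer is only a cue; it does not verify the road.
    """
    p, q = piece
    mid = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    piece_dir = undirected_angle(p, q)
    for a, b in zip(axis, axis[1:]):
        diff = abs(piece_dir - undirected_angle(a, b))
        diff = min(diff, math.pi - diff)
        close = point_segment_distance(mid, a, b) <= max_dist
        if close and diff <= math.radians(max_angle_deg):
            return True
    return False


if __name__ == "__main__":
    gis_axis = [(0, 0), (100, 0), (100, 100)]
    print(supported_by_gis_axis(((20, 4), (40, 5)), gis_axis))
    print(supported_by_gis_axis(((50, 3), (52, 20)), gis_axis))
```

The first piece runs alongside the mapped axis and is accepted as a cue; the second crosses it at a steep angle and is not.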
Vehicles are related to a road, or more specifically, to a lane segment by occluding the lane’s pavement.
However, since vehicles drive or stand collinear with a lane (at least in most cases), we can directly use a detected
vehicle or vehicle convoy for road extraction – in particular, we treat a detected vehicle as a lane segment. By doing so, we
need not take care of moving vehicles if we want to fuse extraction results obtained from images taken at different
times.
In addition to relations between road and context objects, we also consider relations between the object and its substructures. This is best exemplified with orthogonal markings at the end of a lane. In most cases, they define the end of a
lane and relate it to a junction. Fig. 3 summarizes the relations between road objects, context objects, and sub-structures
by using the concepts ”Lane segment” and ”Junction” as the basic entities of a road network.
Note, however, that the use of knowledge about local context and the verification of specific relations between local objects
will in most cases be possible in high resolution imagery only, because the image features which contribute to the local
context are usually not very prominent. Therefore, the local context is more tightly connected with the high resolution,
whereas information about global context usually can be derived from images with a resolution > 2 m and is useful to
guide the road extraction in both scales.
4 EXTRACTION STRATEGY
4.1 Overview
Generally speaking, the extraction strategy embodies knowledge about how and when certain parts of the road and context model
are optimally exploited. In some reasonable cases, this knowledge is easy to implement as a set of predefined fixed
rules, e.g., a road never runs through a house (at least in our model). However, the flexibility of rule-based systems is
well-known to be rather limited. Dynamic control systems, e.g., based on Bayesian networks, overcome such a drawback.
However, they can get computationally expensive rather quickly. Therefore, we plan to implement a control strategy which
applies dynamic decision making to smaller sub-problems, e.g., finding reasonable parameter settings for the extraction
of markings in shadow regions of a particular scene, whereas the overall flow of the extraction, i.e., (1) Context-based
data analysis, (2) Extraction of salient road segments, and (3) Road network completion is defined as shown in Fig. 4.
Figure 4: Extraction Strategy [flow chart with three stages: context-based data analysis (segmentation of context regions, analysis of context relations, focus on initial Regions of Interest), extraction of salient roads (marking groups, vehicle outlines, lane segments, fusion into road segments), and road network completion (generation and verification of connection hypotheses until no further hypotheses remain)]
Since the road model incorporates 3D information as well as small sub-structures to a considerable extent, the extraction relies on the one hand on aerial imagery consisting of overlapping gray scale images with a fairly high resolution (< 15 cm) and on the other hand on an accurate Digital Surface Model (DSM), e.g., as it can be derived from dense airborne laser data. In contrast to other approaches, we neither use orthophotos for extraction nor do we extract completely in 3D by using multiple images simultaneously as, e.g., (Gruen and Li, 1997) do. The latter procedure is conceptually elegant, but it inherently implies matching procedures throughout the extraction process, which become increasingly burdensome with higher scene complexities. Furthermore, we want to avoid feature extraction in orthophotos despite using accurate DSM information. Inaccuracies of the DSM due to erroneous height measurements, filtering, resampling, or moving objects still remain and, for instance, they would disturb useful collinear properties of image structures like road markings. Hence, during extraction, we separate image and height information to a certain extent:
At the first stage, down-sampled image information is used to extract the global context regions (see (Baumgartner et al., 1997)). In contrast, we use DSM information to analyze height dependent context relations, e.g., for the prediction of shadow regions and occlusions. In dense built-up areas approximate road axes and road sides are derived from the DSM, whereas, e.g. in suburban areas, the image information is used for this. Thereafter, we project the results of context analysis into each image separately. This enables us to treat basic bottom-up procedures purely as 2D-problems, e.g., the extraction and grouping of linear features for hypothesizing vehicle outlines and for detecting groups of markings.
At the next stage, if enough information is collected to construct road objects of a certain spatial extent – in particular, lane segments – we include 3D-information by back-projecting the individual objects on the height model. Here, we fuse the results of the individual images in order to achieve a consistent data set. Optionally, results of road extraction in different context regions can be integrated at this point. Once the lane segments are fused and aggregated to more complex objects (lanes and finally road segments), connection hypotheses between the road segments can be generated. Experience from road extraction in the rural context region has shown that this should be done both locally and globally. Thereafter, verification is carried out in the original images using context relations and standard verification techniques like snakes and homogeneity tracking. The accepted hypotheses are fused with the already
extracted roads, again, and new connections are hypothesized. This process iterates until no more connections are found
to be correct.
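The hypothesize, verify, and fuse iteration of the network completion stage can be summarized as a simple control loop. The sketch below is only a schematic under stated assumptions: `generate_hypotheses` and `verify` are hypothetical callables standing in for the hypothesis generation and the image-based verification, and the one-dimensional toy data merely make the loop executable.

```python
def complete_network(road_segments, generate_hypotheses, verify):
    """Iterate hypothesis generation and verification until nothing is accepted.

    road_segments: initial salient road segments (any hashable description).
    generate_hypotheses(network): proposes candidate connections.
    verify(hypothesis, network): returns True if the connection is accepted.
    """
    network = list(road_segments)
    while True:
        accepted = []
        for hypothesis in generate_hypotheses(network):
            if hypothesis in network or hypothesis in accepted:
                continue  # already part of the network
            if verify(hypothesis, network):
                accepted.append(hypothesis)
        if not accepted:
            return network  # no further connections could be verified
        network.extend(accepted)  # fuse and start the next iteration


# Toy stand-ins (assumptions for illustration): segments are intervals on a line.
def gap_hypotheses(network):
    ordered = sorted(network)
    return [(a[1], b[0]) for a, b in zip(ordered, ordered[1:]) if b[0] > a[1]]


def short_gap(hypothesis, network, max_gap=5.0):
    return hypothesis[1] - hypothesis[0] <= max_gap


if __name__ == "__main__":
    print(complete_network([(0, 10), (12, 20), (40, 50)], gap_hypotheses, short_gap))
```

In the toy run the short gap between the first two segments is bridged, while the long gap is proposed in every iteration but never accepted, so the loop terminates.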
4.2 Preliminary results
In the first phase of realization of the proposed concept, we focused
on the one hand on the context-based data analysis, in particular, the
segmentation of Regions of Interest (RoI) and the delineation of
approximate position and direction of potential roads, since context
analysis plays a key role to reduce the scene complexity. On the
other hand we implemented modules for the construction of lane
segments based on the extraction and grouping of markings. For
vehicle detection, some preliminary steps have been undertaken.
By this, the contribution of small sub-structures and context objects to road extraction could be investigated. In the following, an
example is described which shows results that can be achieved with
the modules implemented up to now. For implementation issues,
we refer the reader to (Hinz et al., 1999).
The first example shows imagery and DSM of the downtown area
of Washington (DC) (20 cm, resp. 3 m resolution). RoIs are
segmented using the context relation that most buildings are higher
than the road surface. Therefore, the parts that correspond to locally
high regions in a DSM are removed from the image. The segmentation procedure compares a smoothed version of the DSM with the
original DSM and removes regions where the height difference between both data exceeds a threshold. Furthermore, approximately
parallel building outlines are detected by searching for elongated
valleys in the DSM. Figure 5 shows the down-sampled image, the
DSM image, and the segmented image with the detected valleys.
The results are then transformed to the original image resolution
followed by the extraction of thin, bright lines (see Fig. 6).
Figure 5: Context analysis with DSM. (a) Original image in reduced resolution (3 m), (b) DSM in reduced resolution (3 m), (c) masked image with valleys detected in DSM
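The RoI segmentation described above, i.e., comparing the DSM with a smoothed version of itself and masking locally high regions, can be sketched as follows. The box filter, the window radius, the threshold value, and the restriction to positive height differences are assumptions made for this illustration; the text only states that a threshold on the height difference is used.

```python
import numpy as np


def box_smooth(dsm, radius):
    """Mean filter with a (2*radius+1)^2 window and edge replication."""
    dsm = np.asarray(dsm, dtype=float)
    rows, cols = dsm.shape
    padded = np.pad(dsm, radius, mode="edge")
    acc = np.zeros((rows, cols))
    size = 2 * radius + 1
    # Sum all shifted copies of the padded DSM that fall into the window.
    for dy in range(size):
        for dx in range(size):
            acc += padded[dy:dy + rows, dx:dx + cols]
    return acc / (size * size)


def segment_roi(dsm, radius=5, height_threshold=3.0):
    """Return a mask that is True where roads may be searched.

    Cells rising more than height_threshold above the smoothed surface
    are treated as locally high objects (e.g., buildings) and removed.
    """
    dsm = np.asarray(dsm, dtype=float)
    locally_high = (dsm - box_smooth(dsm, radius)) > height_threshold
    return ~locally_high


if __name__ == "__main__":
    heights = np.zeros((20, 20))
    heights[8:12, 8:12] = 10.0  # a single 10 m high building
    roi = segment_roi(heights)
    print(roi.astype(int))
```

Applying the resulting mask to the down-sampled image would leave only the ground-level regions in which thin, bright lines are then extracted.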
Thereafter, an iterative graph-based grouping algorithm is applied
to group the lines into extended linear objects according to perceptual principles: absolute and relative proximity of lines as well
as their continuation (see Fig. 7 a)). In regions where the context
analysis was able to find hypothetical road center axes from parallel
building outlines, i.e., the valleys in the DSM, only marking groups
are kept that show good parallelism with DSM-valleys. The grouping procedure results in a set of unique and topologically consistent
marking groups, from which hypotheses for lane segments are generated. We first find parallel marking groups and define their medial
axis as the lane axis. However, due to occluding vegetation or parking cars, we can rarely detect road markings at the road sides, even
if they are painted there. Hence, we construct additional hypotheses for lane segments on those portions of each side of a particular
marking group where no parallel relation to another group could be
established.
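The grouping of extracted lines into marking groups can be illustrated with a strongly simplified sketch. It links two line segments when their closest endpoints are near each other and their directions agree, and then collects connected segments with a union-find structure. The thresholds, the function names, and the reduction of the perceptual criteria to an endpoint gap and an angle difference are assumptions; in particular, no lateral offset test is included, so this is not the grouping algorithm of the implemented system.

```python
import math


def direction(segment):
    """Undirected orientation of a segment in [0, pi)."""
    (x1, y1), (x2, y2) = segment
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def angle_difference(a, b):
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def endpoint_gap(s, t):
    """Smallest distance between any endpoint of s and any endpoint of t."""
    return min(math.dist(p, q) for p in s for q in t)


def group_segments(segments, max_gap=6.0, max_angle=math.radians(10.0)):
    """Group segments that are close and similarly oriented.

    Returns a sorted list of index lists, one per group.
    """
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            near = endpoint_gap(segments[i], segments[j]) <= max_gap
            aligned = angle_difference(direction(segments[i]),
                                       direction(segments[j])) <= max_angle
            if near and aligned:
                parent[find(i)] = find(j)  # link the two groups

    groups = {}
    for i in range(len(segments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    dashes = [((0, 0), (3, 0)), ((6, 0), (9, 0)),
              ((12, 0.2), (15, 0.2)), ((20, 10), (20, 14))]
    print(group_segments(dashes))
```

In the example the three dashes of one marking line end up in a single group, while the distant orthogonal line stays separate.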
Figure 6: (a) DSM-valleys projected on image patch, (b) result of line extraction
Finally, the hypotheses are validated: The surface of a lane segment should be homogeneous in the direction of a lane. In road regions where this criterion is not fulfilled, a car must be present. We therefore check the radiometry of lanes in one pixel wide stripes along the respective lane axis. To this end, we shift a one dimensional mask over each stripe and calculate mean brightness and variance. Please note that the lanes are not constrained to be straight, though
some generic knowledge about the geometry of lanes by means of lower bounds for their length and curvature is included
(see, e.g., the upper left corner of Fig. 7 b).
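The sliding-mask statistics of this homogeneity check can be sketched as follows, assuming the gray values along one stripe have already been sampled into a one dimensional array. The mask length, the variance threshold, and the label strings are illustrative assumptions.

```python
import numpy as np


def sliding_stats(stripe, mask_len):
    """Mean and variance of every window of length mask_len along the stripe."""
    values = np.asarray(stripe, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(values, mask_len)
    return windows.mean(axis=1), windows.var(axis=1)


def label_stripe(stripe, mask_len=5, max_variance=25.0):
    """Label each mask position as homogeneous or as a vehicle search region."""
    _, variance = sliding_stats(stripe, mask_len)
    return np.where(variance <= max_variance, "good", "vehicle_search")


if __name__ == "__main__":
    # Bright pavement interrupted by a dark object of four pixels.
    gray_values = [100] * 10 + [30] * 4 + [100] * 10
    print(label_stripe(gray_values))
```

Mask positions that overlap the dark object show a high variance and are flagged, whereas the undisturbed pavement on both sides is labeled as homogeneous.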
Homogeneous parts of a lane segment are labeled as ”good” hypotheses, whereas larger portions with high variance
indicate abnormal changes in the surface and are labeled as search regions for vehicles. Figure 7 b) visualizes selected bright and homogeneous lane segments, but no hypothesis is ultimately rejected at this point. Some of the
hypotheses are not selected because of dark shadows and some of the correct hypotheses are interrupted because of
cars. Dark regions can be detected by shadow prediction and verification using the respective context relation. Since
we already use this kind of information in our road extraction system for rural areas, extracting the exact shadow region in urban areas was not our primary interest. Instead, we focused on the detection of
vehicles, which should provide the system with an explanation of why such inhomogeneities inside lane segments occur.
We are currently working on the implementation of a model-based scheme
for vehicle detection. It starts with the extraction of rectangular
edge structures. Then, adjacent rectangles are iteratively concatenated to set up 2D hypotheses for the outlines and center-lines of
isolated vehicles or vehicle convoys. Basically, this restricts the hypothesis formation to approximate nadir views. However, since we
generally have to select such views in urban areas due to occluding objects, we do not consider this a true limitation. Figure 8 a)
shows a particular part of a lane axis (in black) which was not accepted by the homogeneity check. (In fact, it is the horizontal road,
resp. lane, in the center of the image shown in Fig. 7 b).) The result of
hypothesis formation is visualized in Fig. 8 b); extracted edge segments are plotted in black. The generated car hypothesis (shown
in white) consists of individual rectangles which complete the fragmented edges, resulting in a chain of rectangles. The medial axis of
the rectangle chain corresponds to the center-line of the hypothesized vehicle.
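The concatenation of adjacent rectangles into a chain can be sketched as follows. This is a strongly simplified illustration and not the scheme under implementation: it assumes that every rectangle is already expressed in a lane-aligned frame as (s_min, s_max, t_min, t_max), with s along and t across the lane axis, it replaces the iterative concatenation by a single pass over the sorted rectangles, and the gap tolerance max_gap is a hypothetical parameter.

```python
def chain_rectangles(rects, max_gap):
    """Group rectangles into chains along the lane direction.

    A rectangle joins the current chain if the gap between its start and
    the current end of the chain does not exceed max_gap.
    """
    chains = []
    chain_end = None
    for rect in sorted(rects):
        s_min, s_max = rect[0], rect[1]
        if chains and s_min - chain_end <= max_gap:
            chains[-1].append(rect)
            chain_end = max(chain_end, s_max)
        else:
            chains.append([rect])
            chain_end = s_max
    return chains

def center_line(chain):
    """Medial axis of a chain, approximated by the centers of its rectangles."""
    return [((s0 + s1) / 2, (t0 + t1) / 2) for s0, s1, t0, t1 in chain]
```

In this simplified picture, a chain with a single rectangle would correspond to an isolated vehicle hypothesis and a long chain to a convoy hypothesis.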
Figure 7: (a) Detected groups of markings, (b) lane segments with bright homogeneous interior

Once we have generated a hypothesis for a vehicle’s center-line, we
are able to check for significant geometric and radiometric symmetries
along and across the vehicle. In aerial imagery, most vehicles
are characterized by dark windows. Furthermore, their front, top,
and rear show similar reflectance. The use of color images could be a potential aid for the decision which rectangles
belong to one single car and how to discriminate different cars within a convoy. With this information we should be
able to select a few, rather generic 3D models from a database containing a limited set of vehicle models. Verification is
carried out by projecting the selected models back to the image and matching, e.g., their wire-frame representation to the
grayvalue gradients. A good match between the model and the image indicates that the system has selected a reasonable
model. An additional and, in fact, strong verification for a vehicle is its shadow region. By illuminating the 3D model we
can predict the corresponding shadow region on the road surface and search in the image for the respective features (i.e.,
dark region, corresponding edges, dark side of the edges against the sun).
However, it should be noted that our primary goal is the extraction
of lanes and not the detection of individual vehicles. It is sufficient
for this purpose to know that a vehicle occludes the road, but not
what kind of vehicle it is.
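The shadow prediction mentioned above can be illustrated for the simplest possible vehicle model. The sketch assumes a box-shaped model standing on a flat, horizontal road, a sun azimuth measured clockwise from north (+y, with +x pointing east), and a sun elevation above the horizon; the function names and the box model are illustrative assumptions and do not describe the 3D models of the database.

```python
from math import cos, radians, sin, tan

def convex_hull(points):
    """Convex hull by the monotone chain method, counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def predict_shadow(footprint, height, sun_azimuth_deg, sun_elevation_deg):
    """Ground outline covered by a box-shaped vehicle and its cast shadow."""
    shadow_length = height / tan(radians(sun_elevation_deg))
    azimuth = radians(sun_azimuth_deg)
    # The shadow falls away from the sun.
    dx = -shadow_length * sin(azimuth)
    dy = -shadow_length * cos(azimuth)
    roof_on_ground = [(x + dx, y + dy) for x, y in footprint]
    return convex_hull(list(footprint) + roof_on_ground)
```

With the sun in the south (azimuth 180 degrees) at 45 degrees elevation, a box of height 1.5 m casts a shadow that extends about 1.5 m north of its footprint. The part of the returned outline outside the footprint is the region in which the dark area and the corresponding edges would be searched in the image.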
Figure 8: Rectangular car hypothesis

5 SUMMARY AND OUTLOOK
We presented our concept for automatic road extraction in urban
areas. It is based on a detailed model for roads, including lanes and
road markings, and their context. By using image and DSM information, the extraction strategy first exploits global context knowledge in order to reduce the inherently high complexity of urban
scenes. Then, it focuses on image parts where initial lane and
road hypotheses are rather easy to detect. Iteratively hypothesizing
and verifying connections between already extracted road segments
completes the network. So-called context relations help to analyze
and explain abnormal changes in the expected appearance of the
road. In particular, the context relation that models vehicles on roads
is of great importance. The results of the currently implemented modules encourage us to further realize this concept.
However, there are still many steps to go and many questions to be answered.
REFERENCES
Airault, S., Ruskoné, R. and Jamet, O., 1994. Road detection from aerial images: a cooperation between local and global
methods. In: J. Desachy (ed.), Image and Signal Processing for Remote Sensing, Proc. SPIE 2315, pp. 508–518.
Baumgartner, A., Eckstein, W., Mayer, H., Heipke, C. and Ebner, H., 1997. Context-Supported Road Extraction. In:
(Gruen et al., 1997), pp. 299–308.
Baumgartner, A., Steger, C., Mayer, H., Eckstein, W. and Ebner, H., 1999. Automatic Road Extraction Based on Multi-Scale, Grouping, and Context. Photogrammetric Engineering and Remote Sensing 65(7), pp. 777–785.
Bogenberger, K., Ernhofer, O. and Schütte, C., 1999. Effects of Telematic Applications for a High Capacity Ring Road
in Munich. In: Proceedings of the 6th World Congress on Intelligent Transportation Systems, Toronto.
Clément, V., Giraudon, G., Houzelle, S. and Sandakly, F., 1993. Interpretation of Remotely Sensed Images in a Context of
Multisensor Fusion Using a Multispecialist Architecture. IEEE Transactions on Geoscience and Remote Sensing 31(4),
pp. 779–791.
de Gunst, M. E., 1996. Knowledge-based Interpretation of Aerial Images for Updating of Road Maps. PhD thesis, Delft
University of Technology, Delft.
Englisch, A. and Heipke, C., 1998. Erfassung und Aktualisierung topographischer Geo-daten mit Hilfe analoger und
digitaler Luftbilder. Photogrammetrie - Fernerkundung - Geoinformation (PFG).
Förstner, W., 1999. 3D-City Models: Automatic and Semiautomatic Acquisition Techniques. In: D. Fritsch and R. Spiller
(eds), Photogrammetric week’99, Wichmann.
Fuchs, C., Gülch, E. and Förstner, W., 1998. OEEPE Survey on 3D-City Models. OEEPE Publication, 35.
Geman, D. and Jedynak, B., 1996. An active testing model for tracking roads in satellite images. IEEE Transactions on
Pattern Analysis and Machine Intelligence 18(1), pp. 1–14.
Gruen, A. and Li, H., 1997. Linear feature extraction with LSB-snakes. In: (Gruen et al., 1997), pp. 287–298.
Gruen, A., Baltsavias, E. and Henricsson, O. (eds), 1997. Automatic Extraction of Man-Made Objects from Aerial and
Space Images (II). Birkhäuser Verlag, Basel.
Harvey, W. A., 1999. Performance Evaluation for Road Extraction. In: International Workshop on 3D Geospatial Data
Production: Meeting Application Requirements, Paris.
Heller, A., Fischler, M., Bolles, R. and Connolly, C., 1998. An Integrated Feasibility Demonstration for Automatic
Population of Spatial Databases. In: Image Understanding Workshop.
Hinz, S., Baumgartner, A., Steger, C., Mayer, H., Eckstein, W., Ebner, H. and Radig, B., 1999. Road extraction in rural
and urban areas. In: W. Förstner, C.-E. Liedke and J. Bückner (eds), SMATI’99: Semantic Modeling for the Acquisition
of Topographic Information from Images and Maps, pp. 133–153.
Mayer, H., 1999. Automatic Object Extraction from Aerial Imagery – A Survey Focussing on Buildings. Computer
Vision and Image Understanding 74(2), pp. 138–149.
Mayer, H. and Steger, C., 1998. Scale-Space Events and Their Link to Abstraction for Road Extraction. ISPRS Journal
of Photogrammetry and Remote Sensing 53(2), pp. 62–75.
Mayer, H., Laptev, I. and Baumgartner, A., 1998. Multi-Scale and Snakes for Automatic Road Extraction. In: Fifth
European Conference on Computer Vision, pp. 720–733.
Price, K., 1999. Road Grid Extraction and Verification. In: International Archives of Photogrammetry and Remote
Sensing, Vol. XXXII, part 3-2W5, International Society for Photogrammetry and Remote Sensing.
Ruskoné, R., 1996. Road Network Automatic Extraction by Local Context Interpretation: application to the production
of cartographic data. PhD thesis, Université Marne-La-Vallée.
Ruskoné, R., Airault, S. and Jamet, O., 1994. Road network interpretation: A topological hypothesis driven system. In:
International Archives of Photogrammetry and Remote Sensing, Vol. XXX, part 3/2, pp. 711–717.
Ruskoné, R., Guigues, L., Airault, S. and Jamet, O., 1996. Vehicle detection on aerial images: A structural approach. In:
13th International Conference on Pattern Recognition, Vol. III, pp. 900–903.
Steger, C., 1998. An Unbiased Detector of Curvilinear Structures. IEEE Transactions on Pattern Analysis and Machine
Intelligence 20(2), pp. 113–125.
Tönjes, R., 1997. 3D Reconstruction of Objects from Aerial Images Using a GIS. In: International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 3-2W3, pp. 140–147.
Trinder, J. and Wang, Y., 1998. Knowledge-Based Road Interpretation in Aerial Images. In: International Archives of
Photogrammetry and Remote Sensing, Vol. XXXII, part 4, pp. 635–640.
Vosselman, G. and de Gunst, M., 1997. Updating Road Maps by Contextual Reasoning. In: (Gruen et al., 1997), pp. 267–276.
Wiedemann, C., 1999. Automatic completion of road networks. In: Proceedings of the ISPRS Joint Workshop ’Sensors
and Mapping from Space 1999’, Hanover.
Wiedemann, C. and Hinz, S., 1999. Automatic Extraction and Evaluation of Road Networks from Satellite Imagery.
In: International Archives of Photogrammetry and Remote Sensing, Vol. XXXII, part 3-2W5, International Society for
Photogrammetry and Remote Sensing, pp. 95–100.
Zhang, C. and Baltsavias, E., 1999. Road Network Detection by Mathematical Morphology. In: International Workshop
on 3D Geospatial Data Production: Meeting Application Requirements, Paris.