InfraWatch: Data management of large systems
for monitoring infrastructural performance
Arno Knobbe¹, Hendrik Blockeel¹˒², Arne Koopman¹, Toon Calders³, Bas Obladen⁴, Carlos Bosma⁴, Hessel Galenkamp⁴, Eddy Koenders⁵, and Joost Kok¹
¹ LIACS, Leiden University, the Netherlands, knobbe@liacs.nl
² Katholieke Universiteit Leuven, Belgium
³ Eindhoven Technical University, the Netherlands
⁴ Strukton, the Netherlands
⁵ Delft University of Technology, the Netherlands
Abstract. This paper introduces a new project, InfraWatch, that demonstrates the many challenges that a large, complex data analysis application has to offer in terms of data capture, management, analysis and
reporting. The project is concerned with the intelligent monitoring and
analysis of large infrastructural projects in the public domain, such as
public roads, highways, tunnels and bridges. As a demonstrator, the
project includes the detailed measurement of traffic and weather load on
one of the largest highway bridges in the Netherlands. As part of a recent
renovation and reinforcement effort, the bridge has been equipped with
a substantial sensor network, which has been producing large amounts
of sensor data for more than a year. The bridge is currently equipped
with a multitude of vibration and stress sensors, a video camera and
a weather station. We propose this bridge as a challenging environment
for intelligent data analysis research. In this paper we outline the reasons for monitoring infrastructural assets through sensors and the scientific
challenges in, for example, data management and analysis, and we present
a visualization tool for the data coming from the bridge. We think that
the bridge can serve as a means to promote research and education in
intelligent data analysis.
1 Introduction
In practical projects involving data, one often has to model and analyze complex, dynamic systems. An example of this, which is gaining importance, is the
monitoring of infrastructural assets such as bridges and tunnels [1]. Nowadays,
the use of advanced sensing and monitoring systems provides the opportunity to
collect real-time information from such structures, in order to monitor their performance and to deduce relevant knowledge for decisions on their maintenance
needs. Asset owners can use this information to assess the expected lifetime
of (crucial) infrastructural links and to plan the window within which maintenance can be conducted. For the service-life assessment of a stock of infrastructural assets,
monitoring and sensing systems are very valuable
instruments that can be used to extract up-to-date information about the condition
and performance of those assets.
In terms of condition, sensor systems mounted in or on a structure
monitor both its environmental and its internal condition. Environmental conditions relate to the climate in which the structure has to
operate; in terms of performance, the external and internal actions acting on the structure are recorded. This calls for continuous (24/7), long-running monitoring systems,
which generate large amounts of data. These data need to be evaluated
in a smart way, such that relevant changes are noticed and asset
management systems are informed or alerted efficiently.
Managing the huge amounts of data extracted from the monitoring systems
requires integral knowledge in the field of data management. The best representation for storing the data is not necessarily the best representation for
evaluating the data. Intelligent Data Analysis is needed for the smart evaluation of data that reflect the degradation mechanisms. The ultimate challenge
is to design, develop and optimize a data management system for measuring and
reporting the actual performance of large infrastructural projects. Such a system
should provide monitoring, notification and reporting services. The goal of such
a system is to manage the output of sensors in an optimal way for infrastructure
condition assessment.
In this paper we present the InfraWatch project. The goal of this project is
to construct a data management system with the above properties for a particular infrastructural asset, the “Hollandse Brug”, one of the Netherlands’ major
highway bridges. All the challenges mentioned above are present in this project.
We present some initial experiments on the bridge data, discuss the concrete
challenges of the project, and argue that it is interesting for the intelligent data
analysis community for a variety of reasons. It contains important challenges,
but it also provides an attractive and tangible environment for defining data
analysis tasks, demonstrating the value of methods, and promoting research and
education in intelligent data analysis.
In the next section we introduce the problem of infrastructural asset monitoring and the entailed requirements for data management and analysis on a
general level. After that, we will zoom in on the concrete case of the Hollandse
Brug.
2 Monitoring infrastructure
In view of asset management of large infrastructural projects, monitoring the
actual performance in real-time conditions is becoming indispensable. The actual
performance of infrastructural assets in relation to their action loads (traffic,
climate, etc.), is considered to be the key element for managing the maintenance
requests and for the control of the budgets in the long run. Decisions
regarding maintenance needs can be considered in view of the technical and
economic perspective of an infrastructural asset and, even more importantly, in
view of its functional perspective. Monitoring the performance of infrastructure
requires (1) sensor systems to measure, and (2) data management and intelligent
data analysis systems for dealing with the large streams of data coming from the
sensors. Both are necessary to arrive at an optimized system for condition
monitoring. In the design of such a system, choices need to be made regarding:
1. The type of sensors: depending on the materials used and other parameters,
different types of sensors can be used to trace a structure’s actual condition,
e.g. sensors for chloride concentration, moisture content, carbonation, etc.
2. The placement of the sensors: the layout and grid density with which the sensors are positioned in order to get a reliable picture of the
condition.
3. The data management: the data received from the sensors need to be
collected, processed and stored by the system. Results
should be communicated by notification and reporting services.
4. Asset modelling: in addition to the information coming from the data,
predicting the future condition of the asset requires modeling of (parts
of) the asset. The combination of the data-driven approach with
computational modeling gives predictions about the condition of the asset
in the future.
Points 1 and 2 are engineering decisions; from the data management and
analysis point of view, we assume the type and configuration of sensors are
given and we have no control over them. Point 3, data management, contains a
number of challenges that hold for infrastructure monitoring in general, and we
will discuss these here. The modeling methods relevant for point 4 will strongly
depend on the concrete context. Therefore we will discuss point 4 in the next
section, after we have introduced the concrete setting of the Hollandse Brug.
2.1 Data Management
Monitoring systems for infrastructural assets are continuously producing large
quantities of data. Clearly, it is infeasible to store on location all sensor output at
a high resolution, for example more than once per second. Still, for some applications, such detailed data may be required. At the same time, it is unrealistic to
perform all computation required for data analysis on-site. Therefore, a sophisticated data management strategy will have to be developed that brings together
computing power and the necessary data. This strategy will have to take into
account local data collection at different time-resolutions, periodic replication to
an off-line data warehouse, scheduled snapshots of intense measurement, as well
as some amount of local data analysis for monitoring recent events.
Data collected in monitoring systems roughly serves two purposes:
1. Large scale data mining for analyzing patterns in the stream of sensor output
or between different (types of) sensors, as well as analyzing trends over time,
for example for finding so-called concept drift.
2. On-line data analysis of (to a certain degree) real-time data for monitoring
the integrity of the infrastructure and detecting recent changes.
The first purpose may involve a large range of analysis techniques, and typically requires substantial computational resources. Therefore, this purpose entails the periodic downloading of recent measurement data to a data warehouse,
where it can be massaged and analyzed at will. It seems unlikely that all sensor data will be stored at the highest resolution available, at least not for the
entire life-span of the infrastructure/monitoring system. The data management
system will therefore need to allow for sensor-dependent data storage that may
also vary over time. For specific periods of interest, say a notoriously busy day
of the year, one may be interested in intense measurement of data, to allow for
specific integrated types of data analysis. The effect of varying traffic load on the
infrastructure may be assessed by involving video-streams as well as vibration
and stress sensors. On the other hand, the influence of weather may be analyzed
on a much larger time scale.
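As an illustration of storing sensor output at a reduced resolution, the following minimal sketch averages consecutive blocks of samples while keeping the block minimum and maximum, so that short peaks are not lost entirely. The function name `downsample` and the choice of summary statistics are assumptions made for this example; the project's actual storage scheme is not specified here.

```python
from statistics import fmean


def downsample(samples, factor):
    """Summarise consecutive blocks of `factor` samples as (mean, min, max).

    A trailing block shorter than `factor` is summarised as well.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    summaries = []
    for start in range(0, len(samples), factor):
        block = samples[start:start + factor]
        summaries.append((fmean(block), min(block), max(block)))
    return summaries


# Hypothetical example: reduce six raw readings to two stored summaries.
raw = [1, 2, 3, 4, 5, 6]
stored = downsample(raw, 3)
```

A sensor-dependent policy could then simply choose a different `factor` per sensor type and per period, with `factor = 1` reserved for periods of intense measurement.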
The real-time monitoring will clearly have different data management requirements. Such a system component will be using a recent history of sensor
output (say one day) to compare these to characteristics of load and infrastructure response over much longer periods, with the intent of detecting recent
changes in this relationship. Obviously, this on-line tracking of changes in the
infrastructure will have to be done on site, and therefore cannot involve huge
amounts of computation. One intended approach is to analyze stress parameters
over a longer period off-line, and detect significant patterns and key characteristics that represent nominal behavior of the infrastructure. These patterns may
then be uploaded to the monitoring system (with the occasional update), where
they can be used to compare to similar results obtained on the recently acquired
sensor data. In this way, no large collections of data will need to be stored and
processed on-site.
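The division of labour described above can be sketched as follows: a compact profile of nominal behaviour is computed off-line from a long history, and only that profile is needed on site to check recent data. This is a deliberately simple illustration under stated assumptions: the profile consists of just a mean and a standard deviation of one summary feature, and the names `NominalProfile`, `build_profile` and `deviates` as well as the threshold of 3 standard deviations are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass(frozen=True)
class NominalProfile:
    """Compact description of nominal behaviour, small enough to upload."""
    mean: float
    std: float


def build_profile(history):
    """Off-line step: summarise a long history of a stress feature."""
    return NominalProfile(mean=fmean(history), std=pstdev(history))


def deviates(profile, recent, threshold=3.0):
    """On-site step: flag a recent window whose mean is far from nominal."""
    recent_mean = fmean(recent)
    if profile.std == 0:
        return recent_mean != profile.mean
    z_score = abs(recent_mean - profile.mean) / profile.std
    return z_score > threshold


# Hypothetical feature values (for example, daily peak stress readings).
profile = build_profile([10.0, 12.0, 11.0, 9.0, 13.0, 11.0])
normal_day = deviates(profile, [11.5, 10.5])
unusual_day = deviates(profile, [20.0, 21.0])
```

Only the two numbers in the profile need to be stored on site, which matches the goal of avoiding large on-site data collections; richer patterns would follow the same upload-and-compare scheme.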
3 The Hollandse Brug
The InfraWatch project is centred around an important highway bridge that is already producing substantial quantities of data: the Hollandse Brug. This
bridge connects the provinces of Flevoland and Noord-Holland and is located
where the Gooimeer joins the IJmeer (see Figure 1). The bridge was
opened in June 1969 and carries national road A6. A railway connection runs parallel to the highway bridge, and there is a lane for cyclists on the
west side of the car bridge. In April 2007 it was announced that measurements
had shown that the bridge did not meet the quality and security requirements. Therefore, the bridge was closed in both directions to freight traffic on
April 27, 2007. The repairs were launched in August 2007, and a consortium of
companies (Strukton, RWS and Reef) has installed a monitoring configuration
underneath the first south span of the Hollandse Brug, with the main aim of collecting data for evaluating how the bridge responds. The sensor network is part of
the strengthening project, which was necessary to upgrade the bridge’s capacity
by overlaying.
Fig. 1. Aerial picture of the situation of the Hollandse Brug, which connects the ‘island’
Flevoland to the province Noord-Holland, and the adjacent railway bridge (top).
The monitoring system comprises 145 sensors that measure different aspects
of the condition of the bridge, at several locations along the bridge (see Figure 2
for an illustration). The following types of sensors are employed:
– 34 ‘geo-phones’ (vibration sensors) that measure the vertical movement of
the bottom of the road-deck as well as the supporting columns.
– 16 strain-gauges embedded in the concrete, measuring horizontal longitudinal stress, and an additional 34 gauges attached to the outside.
– 28 strain-gauges embedded in the concrete, measuring horizontal stress perpendicular to the first 16 strain-gauges, and an additional 13 gauges attached
to the outside.
– 10 thermometers embedded in the concrete, and 10 attached on the outside.
Furthermore, there is a weather station, and a video-camera provides a continuous video stream of the actual traffic on the bridge. Additionally, there are also
plans to monitor the adjacent railway bridge.
Clearly, the current monitoring set-up already poses many challenges
for data management. For one, the 145 sensors produce data at a rate of 100
Hz, which can amount to a gigabyte of data per day. Adding to that is the continuous stream of video. Although the InfraWatch project is in its early stages,
data are already being gathered and provisionally monitored. However, the
data currently available for analysis consist of short snapshots of stress and video
data, which are manually transported from the site to the monitoring location
(typically an office environment or Leiden University). One of the aims of the
project is to develop sophisticated methods for data management, as outlined in
the previous section.
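The order of magnitude of the sensor data volume can be checked with a short calculation. The sensor count and sampling rate are taken from the text; the number of bytes per stored sample is not stated, so the widths tried below are assumptions, and timestamps, compression and video are ignored.

```python
# Back-of-the-envelope estimate of the raw daily sensor data volume.
N_SENSORS = 145            # from the text
SAMPLE_RATE_HZ = 100       # from the text
SECONDS_PER_DAY = 24 * 60 * 60

samples_per_day = N_SENSORS * SAMPLE_RATE_HZ * SECONDS_PER_DAY


def daily_volume_bytes(bytes_per_sample):
    """Raw bytes per day for all sensors at an assumed sample width."""
    return samples_per_day * bytes_per_sample


for width in (1, 2, 4):    # assumed sample widths in bytes
    gigabytes = daily_volume_bytes(width) / 1e9
    print(f"{width} byte(s) per sample: {gigabytes:.2f} GB per day")
```

The network produces roughly 1.25 billion samples per day, so even at one byte per sample the raw stream is on the order of a gigabyte per day, and wider samples scale this figure proportionally.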
Fig. 2. Detail of the diagram explaining the individual sensor placement
Prior to the start of the InfraWatch project, an initial monitoring application was developed that allows the visual inspection of both video and sensor
information. The application allows the user to navigate through a selected timeframe and watch the traffic passing over the bridge, while the data from one or
more sensors are displayed in synchronised fashion. The user can select the type
of sensor as well as its location, which does not necessarily have to
correspond with the location of the camera. Using this application, it is already fairly
easy to observe some patterns in the data. For example, the vertical
load data correspond closely with heavy vehicles passing. However, more sophisticated data analysis should be developed in the course of the project that also
takes into account multivariate behaviour of the data and spatial relationships
between sensors, to name just a few options. In the next sections, we provide
some suggestions for the range of analysis approaches this data allows.
4 Data Analysis
We need to distinguish two forms of data analysis: the first form, which we call
model construction and which happens offline, consists of analysing data to find
patterns in them; the patterns together form a model of the data. The second
form, which we call model application, happens online and consists of checking
whether the data stream is still consistent with the model. The Hollandse Brug
data poses interesting challenges on both sides.
1) Model construction: much data mining research focuses on the model construction task. Many algorithms for detecting patterns in data, and constructing
descriptive or predictive models from these patterns, have been described in the
literature. The sensor data that we need to deal with here, however, have characteristics that render it impossible to use standard data mining algorithms. First,
there is the temporal dimension. Each sensor essentially produces a time series of
data. Analysis of time series is a well-investigated problem. However, in this case
we cannot analyse each time series on its own: relationships between different
time series are relevant. A simple example of this is that a pattern might state
that two particular time series normally correlate negatively; but patterns may
actually involve much more complicated forms of relationships between (possibly
more than two) time series. In addition, these time series may have a different
granularity. It is currently not known how such data are best analysed. Second,
there is a spatial dimension: the sensors are related to each other through their
spatial location. The relative position of sensors may be indicated with a graph
structure. In that case, patterns may involve combinations of graph structures
and time series patterns (for instance, two sensors tend to correlate if they are
the same type of sensor and are connected to each other in the spatial graph).
It is not obvious how to represent such patterns, and a fortiori no algorithms for
discovering them are known. Third, the data are dynamic: there may be concept
drift, which implies that the patterns relevant at some point in time gradually
become less relevant. Models should therefore be adapted regularly. But while a
slow shift in the patterns may be normal, a sudden change may indicate a reason
for alarm. The question is: how can we distinguish these two different cases?
2) Model application is an equally important task in this context. Model application will happen online, in real-time, with limited computational resources.
It is crucial, then, that the developed models can indeed be applied efficiently.
This is true for many, but not all models; for instance, for probabilistic graphical
models it is known that exact inference is NP-hard in general, which makes it non-obvious that
they can be applied in this context. The efficient applicability of the learned
models is an additional constraint on the data mining task.
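The simplest pattern mentioned under model construction, two time series that normally correlate negatively, together with the question of telling gradual drift from sudden change, can be illustrated with a small sketch. It computes the Pearson correlation of two sensors over consecutive non-overlapping windows and flags windows where the correlation jumps. The window length, the jump threshold `max_step` and all function names are assumptions for this example, not methods proposed in the project.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def windowed_correlation(x, y, window):
    """Correlation of x and y over consecutive non-overlapping windows."""
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(0, len(x) - window + 1, window)
    ]


def sudden_changes(correlations, max_step=0.5):
    """Indices of windows whose correlation differs sharply from the previous one."""
    return [
        i for i in range(1, len(correlations))
        if abs(correlations[i] - correlations[i - 1]) > max_step
    ]


# Hypothetical readings: negatively correlated at first, then positively.
sensor_a = [1, 2, 3, 4, 1, 2, 3, 4]
sensor_b = [4, 3, 2, 1, 1, 2, 3, 4]
correlations = windowed_correlation(sensor_a, sensor_b, window=4)
alarms = sudden_changes(correlations)
```

Small steps between consecutive windows would be absorbed as drift when the model is updated, whereas a large step is reported. Each window costs time linear in its length, which is relevant given the limited resources available for model application.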
Viewed as a whole, we are confronted with data with a complex and evolving
relational and spatio-temporal structure. Applying statistical, data mining and
pattern recognition techniques to such data is a non-trivial task: there are open
questions regarding the optimal representation of the data, how to represent the
patterns, what algorithms can be used to detect these patterns in the data (again,
existing algorithms will likely not suffice for this task), how to detect significant
shifts in the patterns, and how to efficiently detect significant deviations of the
data with respect to a given pattern. The development of suitable representations
and algorithms to solve these problems is an important research task.
The format of the data and the way it is generated is clearly reminiscent
of data streams. The context of this project is somewhat different from what
is typically considered in data stream mining: for instance, due to the offline
analysis of data, the usual constraints on data stream mining algorithms (namely,
that model construction happens online) are less stringent here. This allows us to
explore a wider range of algorithms. Nevertheless, it is clear that stream mining
is relevant for this project.
In recent years there has been a growing interest in the study and analysis
of data streams. Typical examples of such streams include continuous sensor
readings. Traditional data mining approaches are not suitable for mining such
streams, because they assume static data stored in a database, whereas streams
are continuous, high speed, and unbounded. Therefore, streams must be analyzed
as they are produced and high quality, online results need to be guaranteed.
Until now, most pattern mining techniques focus either on non-streaming
data, or only consider very simple patterns, such as identifying the hot items
from one stream, or constantly maintaining the frequencies in a window sliding
over the stream. The challenging task is to extend the existing state of the art
in two orthogonal directions: on the one hand, the mining of more complex
patterns in streams, such as sequential patterns and evolving graph patterns, and
on the other hand, more natural stream support measures taking into account
the temporal nature of most data streams. Clearly, the classical pattern mining
algorithms do not fulfil the constraints imposed on stream processing algorithms.
Mining data streams, or stream mining, is therefore a challenging task.
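To make the "simple patterns" mentioned above concrete, the following sketch maintains item frequencies in a window sliding over a stream and reports the items that are currently frequent. The class name `SlidingWindowCounter` and the example items (vehicle classes) are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Maintains item frequencies over the most recent `window_size` items."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()

    def add(self, item):
        """Add a new item and expire the oldest one if the window is full."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts[expired] -= 1
            if self.counts[expired] == 0:
                del self.counts[expired]

    def hot_items(self, min_count):
        """Items occurring at least `min_count` times in the current window."""
        return sorted(item for item, c in self.counts.items() if c >= min_count)


counter = SlidingWindowCounter(window_size=3)
for vehicle in ["car", "truck", "car", "bus", "bus", "bus"]:
    counter.add(vehicle)
hot = counter.hot_items(min_count=2)
```

Each update takes constant time and memory is bounded by the window size, which is the kind of constraint stream processing algorithms have to satisfy.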
The most popular techniques that have been developed so far are randomization and approximation, sampling, sketches, and summaries. Randomization
and approximation techniques render stream mining algorithms sufficiently fast,
at the expense of no longer guaranteeing exactness. Sampling implies that a
small sample of the data stream is taken, and costly algorithms are run on the
sample. Sketches and summaries help deal with the abundance of data:
instead of storing the complete data stream, which is infeasible, a summary of
the relevant features is kept that allows queries about the stream to be answered
approximately.
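Sampling from a stream of unknown length can be done in a single pass with reservoir sampling, a standard technique shown here as one possible instance of the sampling approach; it is not claimed to be the method used in the project. The function name and the optional `rng` parameter are choices made for this sketch.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable.

    Uses one pass and O(k) memory, so the stream itself is never stored.
    """
    if rng is None:
        rng = random.Random()
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1).
            slot = rng.randint(0, index)  # both bounds inclusive
            if slot < k:
                reservoir[slot] = item
    return reservoir


# Hypothetical usage: keep 5 of 1000 readings for a costly analysis.
sample = reservoir_sample(range(1000), 5, rng=random.Random(42))
```

A costly algorithm can then be run on `sample` instead of on the full stream, at the price of approximate rather than exact results.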
5 First experiments
Although the InfraWatch project has only recently started, the sensor network
has been up and running for more than a year. During this period, a number
of experiments have been performed and specific samples of data have been
collected. Some exploratory analysis has been performed to investigate what
challenges need to be faced in different aspects of the structural modeling. This
section gives some examples.
In theory, one can interpret ‘traffic’ as a series of discrete events, with events
being a vehicle passing a particular point at a certain time. However, each individual event will appear to a vibration or load sensor as a signal over some
period of time. This temporal spread of the signal is caused by three factors:
1. The physical size of the vehicle. As a vehicle has a certain length, it
takes some time to pass a particular sensor. One can safely assume that
this duration increases with the length of the vehicle (in the direction of travel)
and decreases with its speed.
2. The sensitivity area of the sensor. As the sensor is connected to a rigid part
of the structure, any movement of the structure will be conducted along it,
causing a change in signal of the sensor, even if the vehicle is not exactly
located over the sensor. However, the effect of the vehicle on the signal will
diminish with the distance from the sensor. In effect, the area of sensitivity
will act as a form of smoothing on the signal, producing a bump rather than
a single peak in the sensor data. The effect of this factor will differ between
vibration and load sensors, being larger for the latter because complete
bridge sections carry the load of a vehicle.
3. Specific physical properties of the structure, such as the resonance frequency
of the bridge. Sudden events, such as a heavy vehicle entering specific sections
of the bridge, may cause the bridge to subtly sway at a specific frequency
that is a physical property of the bridge, and that depends on structural
characteristics, such as the size, weight and rigidity of each section. This
resonance will cause a signal that starts at the vehicle passing, but that
continues for some duration after the event. A Fourier analysis will reveal
such dominant frequencies in the spectrum.
Fig. 3. The 10 axle test truck that was driven across the bridge in the early morning.
Fig. 4. Measurement of a load (top) and vibration sensor at the moment when the test
truck was passing. Individual axles can be observed.
One of the essential tasks of the project is to match the continuous signals
caused by these three factors with the discrete events of the actual traffic. One
way to approach this is to consider isolated events, and determine their effect
in the sensor-space. Figure 3 shows two pictures of such an isolated test. Trucks
were driven at specific speeds (ranging between 50 and 90 km/h) over the
sensor network in the early morning, when regular traffic is sparse. Prior to the
test, the weight and load distribution over the 10 axles was determined. Different
loads were tested, to get a proper variation in examples. Using the resulting data,
the sensor network can effectively be used as a Weigh-In-Motion (WIM) system [2].
Figure 4 shows the effect of a test run on both a load and a vibration sensor.
The right graph also shows a subtle vibration of the bridge superimposed on the
load signal. This vibration was determined to be approx. 2.5 Hz, over a period
of one month. Sudden changes, or gradual drift of this resonance frequency can
point to structural degradation of the bridge.
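How a Fourier analysis exposes such a resonance frequency can be shown on synthetic data. The sketch below uses a plain discrete Fourier transform from the standard library and a made-up signal sampled at 100 Hz that contains a 2.5 Hz component plus a weaker 7 Hz component; the signal, the 4-second duration and the function name `dominant_frequency` are assumptions for illustration, not measurements from the bridge.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate_hz):
    """Frequency (Hz) of the strongest non-zero DFT component of a real signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [value - mean for value in signal]  # remove the constant offset
    best_bin, best_magnitude = 0, 0.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin * sample_rate_hz / n


SAMPLE_RATE_HZ = 100
# Synthetic 4-second signal: 2.5 Hz sway plus a weaker 7 Hz component.
signal = [
    math.sin(2 * math.pi * 2.5 * t / SAMPLE_RATE_HZ)
    + 0.3 * math.sin(2 * math.pi * 7.0 * t / SAMPLE_RATE_HZ)
    for t in range(4 * SAMPLE_RATE_HZ)
]
resonance_hz = dominant_frequency(signal, SAMPLE_RATE_HZ)
```

The frequency resolution equals the sampling rate divided by the number of samples (0.25 Hz here), so detecting a gradual drift of the resonance frequency requires correspondingly long windows. The direct transform above is quadratic in the window length; a fast Fourier transform would be the natural replacement for longer signals.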
An alternative means of matching continuous signals with discrete events is
to remove (or at least minimize) the variable of speed. Figure 5 (left) shows a
situation of slow-moving traffic on the far lane of the bridge. By careful manual
annotation of consecutive individual video-frames, one can determine the individual events, including some estimate of the size of the vehicle. The right graph
shows the effect that the five highlighted vehicles have had on one of the strain-gauges. In such slow-moving conditions, the individual bumps can be identified,
and matched to the video-stream. However, there will be a certain amount of
‘stretch’ in the signal, due to the intermittent nature of the passing vehicles. This
will make the bumps vary in width in a manner that is somewhat independent
of the length of the vehicle.
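A first, naive way to extract individual bumps from a strain-gauge signal is to look for maximal runs of samples above a threshold. The sketch below does exactly that; the threshold value, the example readings and the function name `find_bumps` are hypothetical, and the approach ignores overlapping vehicles and baseline drift.

```python
def find_bumps(signal, threshold):
    """Return (start, end) index pairs of maximal runs above `threshold`.

    `end` is exclusive, so `end - start` is the bump width in samples.
    """
    bumps = []
    start = None
    for index, value in enumerate(signal):
        if value > threshold and start is None:
            start = index                   # a bump begins
        elif value <= threshold and start is not None:
            bumps.append((start, index))    # the bump has ended
            start = None
    if start is not None:
        bumps.append((start, len(signal)))  # bump still open at the end
    return bumps


# Hypothetical strain readings containing two bumps.
readings = [0, 0, 5, 6, 0, 0, 7, 0]
bumps = find_bumps(readings, threshold=1)
widths = [end - start for start, end in bumps]
```

Dividing a width by the sampling rate gives the bump duration in seconds, which can then be compared with the time stamps of annotated vehicles in the video stream.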
For the above-mentioned settings, annotation of the video-stream was performed manually, by carefully inspecting individual frames. In order to be able
to process large periods of video and sensor-streams, we have been experimenting with automatic detection of vehicles in the images, using a technique for
separating the background from the moving traffic (see Figure 6). This technique is flexible and robust, in the sense that it can deal with slight movement
of the camera (due to wind and bridge movement), as well as with changing
environmental situations (such as weather and lighting). The figure for example
shows a rainy day, with a number of large water drops on the lens. Based on
the detected location of moving objects, a further aggregation step identifies actual vehicles. The current implementation works fairly consistently, but a clear
matching from blobs to events (especially over multiple frames) is still a major
challenge.
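The exact background-separation technique is not specified in the text. As one common possibility, the following sketch keeps a running average of recent frames as the expected background and marks pixels that differ strongly from it as moving. Frames are represented as nested lists of grey values; the learning rate `alpha`, the threshold and the function names are assumptions for this example.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background estimate."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30.0):
    """Mark pixels that differ from the background by more than `threshold`."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Hypothetical 2x3 grey-value frames: one bright pixel appears.
background = [[100.0 for _ in range(3)] for _ in range(2)]
frame = [[100.0 for _ in range(3)] for _ in range(2)]
frame[0][1] = 200.0

mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

Because the background is updated slowly, gradual changes such as lighting are absorbed into it, while fast-moving vehicles stand out in the mask. Grouping neighbouring marked pixels into blobs and tracking them over frames would be separate steps.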
Fig. 5. Slow-moving traffic, and the corresponding output of one of the strain-gauges.
Fig. 6. Estimating large blobs of moving objects: (left) the input image, (middle) the
expected background over the recent past, (right) the estimated location of moving
objects.
6 Education Opportunities
Besides being an excellent research challenge and a complex fielded application
of Data Mining techniques, the InfraWatch project and its Hollandse Brug are
also intended to serve educational purposes [9]. Because of its practical nature,
the project will be, and has already been, an important tool in the teaching of intelligent data analysis techniques, primarily to computer science students.
Rather than the traditional focus on basic analysis techniques and algorithms,
we now have an opportunity to demonstrate the many complications that tend
to arise in actual analysis projects [4, 5], and how these should be tackled. These
complications include the measuring of data (noise, sensor-failure, ...), the continuous flow of data (data volume, versioning issues, sample rates), the range of
analysis paradigms (multivariate analysis, streams, relational aspects), and the
inclusion of domain knowledge (spatial aspects, feature extraction). Apart from
making the existing data analysis education more attractive and realistic, the
project will also serve to attract potential students to analysis-related courses
and computer science in general.
7 Conclusion
In this paper we have introduced the InfraWatch project, which has as main goal
the setting up of an intelligent infrastructure monitoring system, in particular a
data management and analysis system for the Hollandse Brug. It is clear that this
system will have online and offline components, and the challenges involved are:
determining which functionality is best offered online and offline, determining
the optimal representation for online and offline data storage and processing,
determining what kind of models are most suitable for this kind of system, and
developing the necessary data analysis techniques for constructing and applying
such models.
We believe the project offers a very attractive environment for data analysis
for students, scientists and experienced practitioners alike. It provides a tangible
and even somewhat spectacular application, with challenges on all levels: students can try to analyse infrastructural data with existing techniques and see
what they can find; practitioners can tackle a number of concrete challenges using their expertise on data mining; scientists can study the presented challenges
in depth and develop novel techniques and approaches to solve them. Solving the
problems defined within the project will require bringing together expertise from
very diverse areas in intelligent data analysis, including data and knowledge representation, spatio-temporal data mining, graph mining, sequence mining, data
stream mining, computer vision, data visualisation, and more.
Acknowledgements
The InfraWatch project is funded by the Dutch funding agency STW, under
project number 10970.
References
1. M. Dejori, H.H. Malik, F. Moerchen, N.C. Tas, and C. Neubauer, 2009, Development of Data Infrastructure for the Long Term Bridge Performance Program, In
Proceedings of Structures ’09, Austin, USA.
2. E. Doupal, R. Calderara, 2004, Weigh-In-Motion, In Proceedings of First International Conference on Virtual and Remote Weigh Stations, Orlando.
3. S. Džeroski, H. Blockeel, B. Kompare, S. Kramer, B. Pfahringer, W. van Laer,
1999, Experiments in Predicting Biodegradability, In Proceedings ILP 1999, LNCS 1634,
Springer.
4. A. Knobbe, 1997, Data Mining for Adaptive System Management, In Proceedings
of PAKDD ’97, London.
5. A. Knobbe, B. Marseille, O. Moerbeek, D.M.G. van der Wallen, 1998, Results
in Adaptive System Management, Benelearn ’98.
6. G. Meijer, 2008, Smart Sensor Systems, ISBN: 978-0-470-86691-7, Hardcover, 404
pages.
7. T. Hastie, R. Tibshirani, J. Friedman, 2001, The Elements of Statistical Learning:
Data Mining, Inference, and Prediction, Springer Verlag.
8. N. Bessis, 2009, Grid Technology for Maximizing Collaborative Decision Management and Support: Advancing Effective Virtual Organizations, University of Bedfordshire, UK.
9. R. Gavaldà, 2008, Machine Learning in Secondary Education?, In Proceedings TML 2008, Saint Etienne, France, http://www.lsi.upc.edu/~gavalda/docencia/tml08-revised.pdf