ARTICLE
Real-time winter road surface condition monitoring using an improved residual CNN
Guangyuan Pan, Matthew Muresan, Ruifan Yu, and Liping Fu
Abstract: This paper proposes a real-time winter road surface condition (RSC) monitoring solution that automatically generates descriptive RSC information, in terms of snow and ice coverage, from images taken by fixed traffic and weather cameras. Several state-of-the-art pre-trained deep neural networks are customized and fine-tuned for a specific domain: classifying the amount of snow coverage on a road surface. A thorough evaluation is conducted to identify and select the best model. This evaluation uses an extensive set of experiments to test the accuracy and generalization of each model, and uses transfer learning to fine-tune each pre-trained model on independent images from different traffic and weather cameras. The transferability of each model, the relationship between model performance and data size, and the system settings of each model are then examined. Lastly, three online weight calibration methods are proposed to update the model automatically in new environments. The results show that re-training the model with images from a mixed set of cameras gives the most promising results.
Key words: road surface condition, real-time recognition, deep learning, convolutional neural network.
1. Introduction
In countries with severe winter seasons such as Canada, road surface conditions (RSC) on highways during snow events can vary considerably from location to location and over time. Because of the vast spatial distances covered by the highway network and the uncertain nature of weather events, such variations are often hard to monitor and predict, making both winter road maintenance and public travel extremely challenging. Maintenance agencies often struggle to obtain up-to-date information on the RSC of the highway network, which is essential for making effective and efficient decisions on maintenance operations such as salting and plowing. Providing RSC data to the general public in real time also has benefits, as it allows travellers to consider current roadway conditions when planning trips.
Received 19 June 2019. Accepted 13 October 2020.
G. Pan. Department of Civil & Environmental Engineering, University of Waterloo, Waterloo, ON, Canada; Shenzhen Garry Intelligent Technology Limited, Shenzhen, China.
M. Muresan and L. Fu.* Department of Civil & Environmental Engineering, University of Waterloo, Waterloo, ON, Canada.
R. Yu. Google, Kitchener, ON, Canada.
Corresponding author: Liping Fu (email: lfu@uwaterloo.ca).
*Liping Fu served as an Associate Editor at the time of manuscript review and acceptance; peer review and editorial decisions regarding this manuscript were handled by another Editorial Board Member.
Copyright remains with the author(s) or their institution(s). Permission for reuse (free in most cases) can be obtained from copyright.com.
Can. J. Civ. Eng. 48: 1215–1222 (2021) dx.doi.org/10.1139/cjce-2019-0367
Published at www.cdnsciencepub.com/cjce on 20 October 2020.
Traditionally, RSC monitoring is done either by manual patrols conducted by highway agencies and maintenance contractors or by road weather information systems (RWIS). Both methods, however, have limitations. Manual observation by patrollers is subjective, inaccurate, and time-consuming, while RWIS data are limited to the sparse points where RWIS stations are installed (Buchanan and Gwartz 2005; Kwon et al. 2015, 2017; Gu et al. 2019). Some new technologies, for example, in-vehicle video recorders, smartphone-based systems, and high-end imaging systems, have been developed to collect RSC data; however, these technologies still require manual observation because no reliable image recognition solutions are available to automate the process (Hong et al. 2009; Omer and Fu 2010; Jonsson et al. 2015; Linton and Fu 2015). Researchers have also attempted to apply traditional machine learning models, including artificial neural networks (ANN), random forests (RF), and support vector machines (SVM), to classify winter RSC images; however, these models have not been shown to be practically applicable in terms of accuracy and transferability (Linton and Fu 2016). This has changed with the development of a newer machine learning technique: deep learning (DL), or deep neural networks (DNN). DNN models have been studied extensively and have shown excellent performance on a variety of problems, including unsupervised-learning-based classification, object recognition, forecasting, and reinforcement-learning-based game playing (LeCun et al. 2015; Qiao et al. 2019). Deep learning and big-data approaches have also been adopted for traffic flow forecasting, accident prediction, and other applications, with satisfactory improvements (Lv et al. 2015; Basso et al. 2018; Zhang et al. 2017, 2018; Lippi et al. 2013). In our previous work, we obtained promising results when applying this technique to RSC recognition problems (Pan et al. 2018, 2019).
This paper describes the results of our continuing effort, with a specific focus on the potential of applying one of the most successful convolutional neural networks, the deep residual network (ResNet), to classify winter road surface conditions. The research makes the following key contributions: (1) it is the first effort to customize and compare several state-of-the-art pre-trained convolutional neural network (CNN) models for monitoring winter RSC using images from traffic and RWIS cameras; (2) it extensively investigates how the performance of the customized pre-trained CNN models, and their transferability to images from new cameras, relates to the size of the training data; and (3) it proposes and tests three new methods that automate model updating with new data for real-world implementation.
2. Literature review
The image recognition problem has been studied extensively in the literature, and many machine learning models have been proposed over the past decades. In particular, convolutional neural networks (CNN) have been applied successfully to pattern recognition problems, including large-scale image classification and video analysis. A CNN is a feedforward neural network with a deep structure that contains convolutional computing elements, and it is one of the most typical algorithms in deep learning. This success has been made possible by both the availability of large public image repositories (such as ImageNet) and advances in computational power, such as graphics processing units (GPU), tensor processing units (TPU), and, more recently, neural-network processing units (NPU) (Glorot and Bengio 2010; He and Sun 2015). In recent years, CNNs have become one of the most popular techniques in the machine learning field, and many variations have been developed for improved performance (Sermanet et al. 2014; Howard 2014).
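To make the terms concrete, the three operations that recur throughout CNNs (convolution, ReLU, and max pooling) are sketched below in plain Python on a toy single-channel image. This is an illustration only: real frameworks operate on multi-channel tensors with learned kernels, whereas the 3 × 3 vertical-edge kernel here is hand-picked, and the function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """2D 'valid' cross-correlation, which CNN libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Rectified linear unit applied element-wise: negative values become 0."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling (an odd trailing row/column is dropped)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


if __name__ == "__main__":
    # Toy 5x5 image: dark on the left, bright on the right.
    image = [[0, 0, 1, 1, 1] for _ in range(5)]
    # Hand-picked kernel that responds to a dark-to-bright vertical edge.
    kernel = [[-1, 0, 1] for _ in range(3)]
    fmap = conv2d_valid(image, kernel)   # 3x3 feature map
    print(max_pool2x2(relu(fmap)))       # [[3]]
```

Stacking many such convolution, ReLU, and pooling stages, with kernels learned from data rather than chosen by hand, is what gives a CNN its deep structure.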
One successful early model was AlexNet, developed in 2012 by Krizhevsky et al. (2012). Since then, further research has advanced the state of the art, and many successful models have been developed by commercial entities, such as Microsoft’s ResNet and Google’s Xception and Inception (Simonyan and Zisserman 2014; He et al. 2016a; Szegedy et al. 2016; Chollet 2016). Although deep learning provides excellent performance in many settings, there are many situations where it still faces difficulties. Deep learning, like many other big data technologies, cannot learn useful features without substantial data input. When data are limited, approaches common to traditional machine learning methods, such as the preprocessing of raw images, must be employed (Linton and Fu 2016; Liu et al. 2013; Breiman 2001; Marsh and Bearfield 2004). Furthermore, while previously developed models excel in many situations, some tasks pose additional challenges that reduce their effectiveness (Veit et al. 2016; He et al. 2016b; Chen-McCaig et al. 2017; Soon et al. 2018; Mondal et al. 2017; Yi and Shirk 2018; Xue and Li 2018). One key strategy recently developed to address this is transfer learning, which adapts a model trained on one task to a new task (Pan and Yang 2010; Zheng et al. 2018).
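A common way to apply transfer learning, and the pattern followed later in this paper, is to train only a new task-specific head first and then unfreeze the upper pre-trained layers with a smaller learning rate. The sketch below expresses that plan over abstract layer indices rather than a real deep learning framework; `Stage` and `transfer_schedule` are hypothetical names, and the default learning rates (0.001 and 0.0005) and unfreezing the last 48 layers mirror the settings reported in Section 4.2 (where the 48 are convolutional layers).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    trainable: Tuple[int, ...]  # indices of layers whose weights are updated
    learning_rate: float


def transfer_schedule(n_pretrained, n_new, k_unfreeze,
                      lr_head=0.001, lr_finetune=0.0005):
    """Plan a two-stage transfer-learning run over abstract layer indices.

    Layers 0 .. n_pretrained-1 come from the pre-trained network; layers
    n_pretrained .. n_pretrained+n_new-1 form the new task-specific head.
    Every layer not listed in a stage's `trainable` tuple stays frozen.
    """
    if not 0 <= k_unfreeze <= n_pretrained:
        raise ValueError("k_unfreeze must lie between 0 and n_pretrained")
    head = tuple(range(n_pretrained, n_pretrained + n_new))
    upper = tuple(range(n_pretrained - k_unfreeze, n_pretrained))
    return [
        # Stage 1: pre-trained layers frozen, only the new head learns.
        Stage("train head", head, lr_head),
        # Stage 2: also unfreeze the top k pre-trained layers, with smaller steps.
        Stage("fine-tune", upper + head, lr_finetune),
    ]


if __name__ == "__main__":
    # Hypothetical sizes: 100 pre-trained layers and a 3-layer head.
    for stage in transfer_schedule(n_pretrained=100, n_new=3, k_unfreeze=48):
        print(stage.name, len(stage.trainable), stage.learning_rate)
    # train head 3 0.001
    # fine-tune 51 0.0005
```

Keeping the lower layers frozen preserves the generic features learned from the large source dataset, while the smaller second-stage learning rate limits how far the unfrozen upper layers drift from their pre-trained weights.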
In previous work, we applied a number of machine learning methods to RSC classification (Linton and Fu 2016). We then extended this work by developing a VGG16-based RSC recognition model using images from in-vehicle cameras and showed that it outperformed other, traditional machine learning techniques (Pan et al. 2018). In another work, we studied the testing accuracy of the model with different training dataset sizes and showed that much higher accuracy could potentially be achieved with a larger training dataset. In yet another work (Pan et al. 2019), we evaluated four of the most successful CNN models of recent years, namely VGG16 (Oxford), ResNet50 (Microsoft), Inception-V3 (Google), and Xception (Google), for their potential to address the particular challenges that hinder RSC classification. To summarize, (1) CNNs have been shown to have superior performance for image recognition, (2) transfer learning based on pre-trained CNNs has greatly increased learning efficiency, and (3) several pre-trained CNNs have been tested for road surface condition monitoring using a small number of images; however, the performance of these models is yet to be evaluated using large-scale datasets from real-world applications. Building on this work, in this paper we aim to establish a relatively complete RSC recognition system based on ResNet50 and transfer learning, and to construct a method for updating the system automatically.
3. Deep residual neural network
3.1. Model structure
Deep residual neural networks are a type of convolutional neural network that have been shown to be considerably more powerful than other models for image recognition in tests on two standard benchmark image datasets, CIFAR-10 and ImageNet-2012 (Göring et al. 2013).
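For reference, the building block that gives residual networks their name (He et al. 2016b) adds an identity shortcut around a small stack of layers, so that the stacked layers only need to learn a residual correction to their input:

$$
\mathbf{y} = \mathcal{F}\left(\mathbf{x}, \{W_i\}\right) + \mathbf{x}
$$

Here $\mathbf{x}$ and $\mathbf{y}$ are the input and output of the block, and $\mathcal{F}$ is the mapping computed by the stacked convolutional layers with weights $W_i$. When $\mathcal{F}(\mathbf{x})$ and $\mathbf{x}$ have different dimensions, He et al. (2016b) replace the identity shortcut with a linear projection $W_s\mathbf{x}$. Because the shortcut passes signals and gradients through unchanged, very deep stacks of such blocks (50 layers in ResNet50) remain trainable.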
The advantage of using one of these successful pre-trained models is that it has already been trained on millions of images and may therefore have “remembered” the major common features of various objects. Transferring such a model to a new application, with some additional training on a smaller data set, is thus expected to be much easier than training a completely new model on a large data set. In our previous research (Pan et al. 2019), ResNet50 was the best performer, providing the most accurate predictions.
In this research, the model is trained using raw images resized to three-channel (red, green, blue) images with dimensions of 224 × 224. RGB values are normalized by subtracting the mean RGB value from the value of each pixel. Although this causes some minor information loss, it reduces the complexity of the learning process and lowers the computation time. The original images are classified into one of four classes: bare pavement, partly covered pavement, fully covered pavement, and unrecognizable images. Four nodes are used in the output layer, with the output of each node representing the probability that the image belongs to the corresponding category. This structure is identical to the one used in a previous work (Pan et al. 2019) and uses two hidden layers with 1000 neurons each. As outlined in the next section, all nodes in the hidden layers are rectified linear units (ReLU). The category assigned to an image is the one with the highest probability in the softmax output layer. Because all the models used in this research were pre-trained on general-purpose images, and the ImageNet database used to train the ResNet model does not include RSC classes or images, these classes must be added by further training and fine-tuning the model with our domain-specific data (RSC images). The customized ResNet for RSC recognition therefore includes two fully connected layers of 1000 neurons each, which must be trained from scratch, and an output layer with four nodes.
3.2. Algorithm: training and fine-tuning
Figure 1 shows the process involved in developing the proposed model. The key strategy of the development process consists of steps to train the top classifier (Steps 1 and 2) and to fine-tune the whole network (Steps 3 and 4).
In Step 1, all layers of the pre-trained model except the two customized fully connected layers are initialized with the pre-stored weights. The model is then fed with the training and validation datasets. In this step, all outputs from the convolutional functions, ReLU transfer functions, and max pooling layers are computed.
In Step 2, the convolutional and max pooling layers computed in Step 1 are frozen, and only the extra fully connected layers on top are trained. Weights are updated using stochastic gradient descent (SGD). Storing the features offline and training only the last fully connected layers increases computational speed, because tuning the model from a randomly initialized state is computationally expensive, especially if training is done on a CPU.
In Step 3, the model’s upper layers and the top-level classifier are fine-tuned. This process uses small weight updates while re-training the model’s layers with an additional dataset. During this process, the lower layers of the model are frozen and gradient descent is used to update the upper layers only.
Step 4 is introduced to automate the model updating process. In real-world applications, new cameras may be installed over time, and camera settings may be changed to address new condition monitoring needs. This step allows the model to be updated automatically with a completely new dataset without significant loss in prediction performance. It is done by training the model for a few epochs using data from the newly installed cameras or from cameras with new settings.
Fig. 1. Development of the winter road condition monitoring system. [Colour online.]
Published by Canadian Science Publishing
4. Experiment
4.1. Experimental design
The image dataset was collected from fixed weather cameras installed at Road Weather Information System (RWIS) stations throughout the Ontario road network. In addition to images, these stations provide highway maintenance staff with real-time weather and road surface conditions so that they can proactively make the most appropriate winter maintenance decisions. The RWIS network is divided into five regions: western Ontario (WR), eastern Ontario (ER), central Ontario (CR), northeastern Ontario (NER), and northwestern Ontario (NWR). The RWIS cameras typically take pictures at three different angles (left, middle, and right), as shown in Table 1. In this experiment, each image is first manually classified (ground truth) according to the four-class scheme shown in Table 1 and then used to train and test the image recognition model. The four classes are bare pavement (BP), in which the road surface, including lanes and shoulders, is completely clear; partly snow covered (PSC); fully snow covered (FSC); and not recognizable, in which the image quality is too poor to be used for classifying road surface conditions. This RSC classification system follows the Canadian standard for reporting winter highway conditions proposed by the Transportation Association of Canada, and it is typically used to convey RSC information about maintenance routes to the general public. Sample images for each category are provided in Table 1.
The model is implemented using open-source software available online. All experiments are conducted on a Windows 10 machine with a GTX1070 GPU and 16 GB of memory.
Four experiments are designed to evaluate the performance and transferability of the trained prototype under different conditions. The first assesses the overall performance of the ResNet50-based model compared with other models, using data from all 60 cameras. The second focuses on the transferability of the model, with images from some cameras used for training and those from the remaining cameras held out for validation. The third, based on the results of the second, consists of two tests that analyze the relationship between model performance, the number of cameras used, and data size. Finally, the fourth tests the effectiveness of a model updating process based on a sequential training and fine-tuning approach.
4.2. Performance assessment
In this experiment, the performance of ResNet50 is evaluated in comparison with other well-known deep neural network models, including VGG16 (Simonyan and Zisserman 2014), Inception-V3 (Szegedy et al. 2016), and Xception (Chollet 2016). Because the convolutional layers of each of these models are pre-trained and fixed in structure, we can modify only the fully connected layers, and only by changing the number of units (nodes) in those layers.
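Because only the fully connected head is customized, its size and its decision rule are easy to state. The sketch below counts the trainable parameters of a head with two 1000-unit hidden layers and a four-node output (the structure described in Section 3.1) and picks the class with the highest softmax probability. It assumes the head receives the 2048-dimensional globally average-pooled ResNet50 feature vector; if the convolutional output were flattened instead, the input dimension, and hence the count, would be much larger. The order of `CLASSES` is likewise a hypothetical assumption.

```python
import math

# Hypothetical class order; the paper's output-node order is not stated here.
CLASSES = ("bare", "partly snow covered", "fully snow covered", "not recognizable")


def dense_param_count(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_param_count(feature_dim, hidden=(1000, 1000), n_classes=4):
    """Trainable parameters of the customized head on top of the CNN."""
    sizes = (feature_dim, *hidden, n_classes)
    return sum(dense_param_count(a, b) for a, b in zip(sizes, sizes[1:]))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Pick the category with the highest softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


if __name__ == "__main__":
    print(head_param_count(2048))  # 3054004, assuming a 2048-d pooled input
    print(predict_class([0.2, 2.5, 0.1, -1.0])[0])  # partly snow covered
```

Under this assumption, about two thirds of the head's roughly 3 million parameters sit in the first fully connected layer, which is why changing the number of units there has the largest effect on the head's size.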
Although structural changes are important, the effectiveness of the re-training process depends heavily on how training is configured. Model training is divided into “epochs”, each of which is a single pass through the training images. The size of the weight updates is controlled by the learning rate, and different rates can be specified for the training and fine-tuning stages. As mentioned previously, some layers can also be frozen, preventing weight updates on their nodes. Based on a sensitivity analysis conducted in our previous work (Pan et al. 2018, 2019), we determined that using two fully connected layers each containing 1000 neurons, a learning rate of 0.001, a fine-tuning learning rate of 0.0005, and restricting fine-tuning to the last 48 convolutional layers provided the best performance. These settings are adopted in this research.
Table 1. Definition of different types of snow coverage.
Four-class description | Description
Bare | At least 3 m of the pavement cross-section in all lanes clear of snow or ice.
Partly snow covered | Only part of the wheel path is clear of snow or ice.
Fully snow covered (more than 90% snow coverage) | No wheel path clear of snow or ice.
Not recognizable | Not recognizable because the image is too dark, has too much light, or is too blurry.
The data were downloaded from Ontario’s RWIS system and comprise a total of 24 779 images, of which 80% (19 864 images) are randomly chosen for training and the remaining 20% (4915 images) for testing.
The results in Fig. 2 and Table 2 show how performance evolves during training and fine-tuning, and the final performance of the four models. Performance is measured by accuracy, defined as the ratio of the number of correct predictions to the total number of testing images. ResNet50 had the best final performance: its accuracy started at 62.17% and reached 81.4% after 20 training epochs, and its final accuracy was 95.18% after fine-tuning. Additional results are shown in Table 2. However, the differences in testing accuracy among the models are relatively small. In a previous study (Linton and Fu 2016), other machine learning models were tested on an in-vehicle image classification problem with three classes: the accuracy was 83.6% for traditional artificial neural networks, 85.3% for random tree, 85.4% for random forest, and only 84.8% for traditional convolutional neural networks. ResNet50’s high accuracy is consistent with another recent study, which found it to be the most robust model with the lowest variance (Pan et al. 2019).
As a more comprehensive comparison, the confusion matrix of the ResNet model (shown in Table 3) is also provided to show different aspects of model performance.
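Accuracy, as defined above, and the per-class entries of a confusion matrix can be computed directly from true and predicted labels. The sketch below uses hypothetical short class codes (BP, PSC, FSC, NR) and toy labels, not the paper's data. For each true class it reports the true-positive rate and the false-negative rate, which sum to one; the TP and FN columns of Table 3 satisfy the same relationship (for example, 95.33% + 4.67% = 100%).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Correct predictions divided by the number of test images."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_rates(y_true, y_pred, classes):
    """Map each class to (true-positive rate, false-negative rate).

    Rates are fractions of the images whose true label is that class; a class
    with no test images maps to None.
    """
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    support = Counter(y_true)             # true class -> number of images
    rates = {}
    for c in classes:
        if support[c] == 0:
            rates[c] = None
            continue
        tp_rate = pairs[(c, c)] / support[c]
        rates[c] = (tp_rate, 1.0 - tp_rate)
    return rates


if __name__ == "__main__":
    y_true = ["BP", "BP", "BP", "BP", "PSC", "PSC", "FSC", "FSC"]
    y_pred = ["BP", "BP", "BP", "PSC", "PSC", "FSC", "FSC", "FSC"]
    print(accuracy(y_true, y_pred))  # 0.75
    print(per_class_rates(y_true, y_pred, ["BP", "PSC", "FSC", "NR"]))
```

Per-class rates reveal weaknesses that overall accuracy hides: a model can score well overall while still confusing two neighbouring classes, such as partly and fully snow-covered pavement.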
4.3. Effect of camera mix on model transferability
This experiment is designed to assess the transferability of the proposed model. The ResNet50-based RSC model is first trained using images from a varying number of cameras, and its performance is then tested using data from five holdout cameras (CR-01, CR-21, ER-26, NER-21, and WR-11; see Table 4). The experiment starts with images from one camera (CR-04), a Highway 400 camera near Bradford in central Ontario that covers a road with two directions and three lanes in each direction. The five cameras in the testing set have similar angles and snow-covered areas but differ in roadway curvature, lane count, weather conditions, and so on. The results are shown in Table 4, with the last column giving the accuracy for each camera’s image set.
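What makes this a transferability test is that no image from a holdout camera is seen during training, so the split must be made at the camera level rather than at the image level. A minimal sketch of such a split is shown below, assuming each image is represented as a hypothetical `(camera_id, image_id)` tuple; the camera IDs are those named above, and the image counts in the usage example are toy values.

```python
HOLDOUT = ("CR-01", "CR-21", "ER-26", "NER-21", "WR-11")


def camera_split(images, train_cameras, test_cameras):
    """Split (camera_id, image_id) pairs by camera, never by image.

    Splitting at the camera level guarantees that the test cameras are
    unseen during training, which is what a transferability test requires.
    """
    train_set, test_set = set(train_cameras), set(test_cameras)
    if train_set & test_set:
        raise ValueError("a camera cannot be used for both training and testing")
    train = [img for img in images if img[0] in train_set]
    test = [img for img in images if img[0] in test_set]
    return train, test


if __name__ == "__main__":
    # Toy data: three hypothetical images per camera.
    images = [(cam, i) for cam in ("CR-04",) + HOLDOUT for i in range(3)]
    train, test = camera_split(images, ["CR-04"], HOLDOUT)
    print(len(train), len(test))  # 3 15
```

A random image-level split would instead let near-duplicate frames from the same camera appear in both sets, which would overstate how well the model generalizes to new cameras.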
Fig. 2. Training (first 20 epochs) and fine-tuning (last 20 epochs) using four pre-trained models. [Colour online.]
Table 2. Model comparisons in training and testing.
Model | Training time per epoch (s) | Fine-tuning time per epoch (s) | Training accuracy | Testing accuracy
VGG16 | 220 | 332 | 95.28% | 93.08%
Xception | 220 | 438 | 96.30% | 94.67%
Inception-V3 | 215 | 260 | 96.05% | 94.93%
ResNet50 | 211 | 320 | 97.53% | 95.18%
Testing samples | / | / | | /
Table 3. Confusion matrix on ResNet model testing performance.
Class | Predicted Class 1 | Predicted Class 2 | Predicted Class 3 | Predicted Class 4 | True positives | False negatives
Class 1 | 95.33% | 1.50% | 0.28% | 1.95% | 95.33% | 4.67%
Class 2 | 3.57% | 96.14% | 15.41% | 0.39% | 96.14% | 3.86%
Class 3 | 0.00% | 2.15% | 82.91% | 0.78% | 82.91% | 17.09%
Class 4 | 1.10% | 0.21% | 1.40% | 96.88% | 96.88% | 3.12%
Table 4. Results of transferability testing.
Set | Site name | Testing accuracy
Training | CR-04 | 96.42%
Testing | CR-01 | 100%
Testing | CR-21 | 87.23%
Testing | ER-26 | 99.01%
Testing | NER-21 | 92.15%
Testing | WR-11 | 92.74%
4.4. Effect of training data size
In this experiment, we further analyze the effect of data size on the performance of the proposed model.
(1) Accuracy test: increasing mixed cameras and training data
In this experiment, the performance of ResNet50 on fixed traffic cameras is evaluated with different training dataset sizes. First, a subset of data of a specific size is randomly drawn from the training data set (e.g., images from 10, 35, or 60 of the 60 training cameras) and used to train and fine-tune the model. The selected dataset is split into two subsets: a training set containing 80% of the data and a testing set containing the remaining 20%. The trained model is then used to classify the testing data set. The number of training and fine-tuning epochs is set to 15 in the 10- and 35-camera experiments and to 20 in the 60-camera experiment.
Table 5. Testing result with increased training cameras.
Case | Training size | Training accuracy | Testing accuracy | Variance
Case 1: Train on 10, test on 10 | 3439 | 95.79% | 92.14% | 0.0109
Case 2: Train on 35, test on 35 | 10 975 | 97.69% | 93.63% | 0.0050
Case 3: Train on 60, test on 60 | 19 864 | 97.53% | 95.18% | 0.0027
As shown in Table 5, the classification accuracy of the model is 92.14% at the smallest data size but increases quickly as the data size increases. This improvement trend suggests that the model could reach a higher level of classification accuracy (over 95%) if more training data were available.
(2) Transferability test: mixed cameras vs. training data
To examine the effect of camera mix on model performance, ResNet50 is trained on a varying number of cameras while the number of training images is fixed. This experiment tests how the transferability of the ResNet model relates to the variety of the camera mix. The same five cameras used in the camera-mix experiment (Section 4.3) (CR-01, CR-21, ER-26, NER-21, and WR-11) are used to test model transferability. Images are divided into five training groups, each created from a randomly drawn subset of images from a given number of the remaining cameras (10/55, 20/55, 30/55, 40/55, and 55/55 of the remaining training cameras). To separate the effect of the number of cameras from that of data size, each group contains the same number of images regardless of camera count (Table 6). The ResNet50 model is trained and fine-tuned separately on each group, and each trained model is then used to classify the testing data set. The number of training and fine-tuning epochs is set to 15 for each group because of the small sample size of the training dataset.
Table 6. Transferability performance versus data size.
Test site | Train 10 cameras | Train 20 cameras | Train 30 cameras | Train 40 cameras | Train 55 cameras
CR-01 | 74.07% | 78.57% | 76.45% | 77.77% | 80.15%
CR-21 | 68.89% | 79.95% | 82.25% | 78.80% | 82.48%
ER-26 | 73.89% | 78.81% | 83.25% | 78.57% | 81.77%
NER-21 | 59.92% | 74.31% | 75.48% | 75.09% | 75.48%
WR-11 | 81.67% | 78.62% | 76.33% | 70.22% | 73.28%
Total accuracy | 71.79% | 78.35% | 79.33% | 76.68% | 79.39%
Table 7. Results of sequential training.
Test site | Without re-training | Re-trained on one camera | Re-trained using images from all new cameras but with only one class of RSC (only bare pavement) | Re-trained using data from all new cameras with all RSC classes
CR-01 | 84.12% | 92.83% | 83.95% | 94.19%
CR-21 | 82.25% | 85.79% | 86.09% | 87.27%
ER-26 | 80.04% | 90.41% | 79.94% | 92.21%
NER-21 | 68.87% | 82.69% | 71.15% | 80.76%
WR-11 | 84.73% | 87.68% | 80.78% | 88.66%
Accuracy | 80.54% | 88.29% | 81.16% | 89.23%
Fig. 3. Improvements after re-fine-tuning. [Colour online.]
Table 6 shows the trends in accuracy of the five trained ResNet50 models as the number of cameras used in training increases while the training sample size is kept the same. The trained models perform consistently with the previous results, as the cameras, samples, and model parameters are set up in the same way. As the number of training cameras increases, the transferability accuracy first rises to an average of approximately 78%, drops slightly, and finally reaches 79.39% when 55 cameras are used. Looking at the results for individual testing cameras, most perform better when more cameras are used for training, which suggests that using more cameras captures more useful features.
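The fixed-size training groups used in this test can be drawn as sketched below: first choose which cameras contribute, then sample the same number of images from their pooled images. This is a sketch under stated assumptions: the `images_by_camera` mapping, the camera IDs `cam00` to `cam54`, and the group size of 500 in the usage example are hypothetical illustrations, not the paper's data.

```python
import random


def fixed_size_group(images_by_camera, n_cameras, n_images, seed=0):
    """Draw n_images from a random subset of n_cameras cameras.

    Holding n_images constant while n_cameras varies separates the effect of
    camera variety from the effect of training-set size.
    """
    rng = random.Random(seed)
    cameras = rng.sample(sorted(images_by_camera), n_cameras)
    pool = [(cam, img) for cam in cameras for img in images_by_camera[cam]]
    if len(pool) < n_images:
        raise ValueError("the chosen cameras do not have enough images")
    return cameras, rng.sample(pool, n_images)


if __name__ == "__main__":
    # Hypothetical: 55 training cameras with 100 image ids each.
    images_by_camera = {f"cam{i:02d}": list(range(100)) for i in range(55)}
    for k in (10, 20, 30, 40, 55):
        cameras, group = fixed_size_group(images_by_camera, k, n_images=500)
        print(k, len(group), len({cam for cam, _ in group}))
```

Sampling uniformly from the pooled images means cameras with more images contribute more samples; a per-camera quota would be an alternative if equal camera representation were wanted.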
4.5. Sequential training and automatic updating
In this section, we test a fine-tuning scheme for quickly re-training the model with image data from new cameras. A sample-based approach is introduced, in which a few labelled images from newly installed cameras are used to update the model. The ResNet50 model is first trained with data from all the cameras except the five held out for testing, and is then re-trained using a few images drawn from the new cameras (the five selected cameras). In this experiment, the re-training set includes only 20% of the data, while the testing set includes the remaining 80%. We explore three re-training methods, in which the re-training data are drawn (1) from each of the new cameras individually, (2) from a mixed dataset of the five cameras, or (3) from images in the bare pavement category only (because image samples in the snow-covered state may not be immediately available).
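The three re-training sets can be assembled from a single 20/80 split of each new camera's images, as sketched below. The split is made per camera so that every camera contributes to both parts; this per-camera design, the `(camera_id, label, image_id)` tuple format, and the function names are assumptions for illustration, since the exact sampling procedure is not detailed above.

```python
import random
from collections import defaultdict


def split_per_camera(images, retrain_frac=0.2, seed=0):
    """Split each camera's images into a re-training part and a test part."""
    rng = random.Random(seed)
    by_camera = defaultdict(list)
    for img in images:
        by_camera[img[0]].append(img)
    retrain, test = {}, {}
    for cam, imgs in sorted(by_camera.items()):
        imgs = imgs[:]  # copy before shuffling
        rng.shuffle(imgs)
        k = round(retrain_frac * len(imgs))
        retrain[cam], test[cam] = imgs[:k], imgs[k:]
    return retrain, test


def retraining_set(retrain, method, camera=None):
    """Build the re-training data for one of the three methods."""
    if method == "one_camera":   # (1) images from a single new camera
        return list(retrain[camera])
    pooled = [img for cam in sorted(retrain) for img in retrain[cam]]
    if method == "mixed":        # (2) mixed images from all new cameras
        return pooled
    if method == "bare_only":    # (3) bare pavement images only
        return [img for img in pooled if img[1] == "BP"]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    labels = ["BP", "PSC", "FSC", "NR"]  # hypothetical class codes
    cams = ["CR-01", "CR-21", "ER-26", "NER-21", "WR-11"]
    images = [(cam, labels[i % 4], f"{cam}-{i}") for cam in cams for i in range(50)]
    retrain, test = split_per_camera(images)
    for method in ("one_camera", "mixed", "bare_only"):
        print(method, len(retraining_set(retrain, method, camera="ER-26")))
```

Because all three methods draw from the same re-training split, their test sets (the remaining 80% of each camera's images) are identical, which keeps the comparison between methods fair.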
Table 7 and Fig. 3 compare the results of the different re-training methods. Large improvements are achieved when re-training on one extra camera and on mixed cameras, especially for CR-01, ER-26, and NER-21. Significant improvements are also achieved for CR-21 and WR-11. However, for NER-21, re-training on mixed data gave lower accuracy than re-training on that specific camera alone (Fig. 3). Overall, the prototype model shows a 2% to 3% improvement when re-trained with images of bare pavement only (with ER-26 and NER-21 showing decreases in accuracy), and an 8% to 9% improvement when re-trained with images of all conditions, using only a few epochs. The model therefore performs much better after updating (the re-training and updating process is shown in Fig. 1).
5. Conclusions
In this paper, we have discussed the results of an extensive investigation into applying a pre-trained deep convolutional neural network, ResNet, to winter road surface condition (RSC) recognition. A set of customized models is trained with our problem-specific training data, an RSC image dataset. By analyzing model parameters and sensitivity, we identified the best model and structure. The results show that the proposed solution reduces the need for large training datasets and reduces computational time while maintaining high accuracy. Meanwhile, the model is shown to be highly transferable and can adapt to new tasks with only minor tuning using new data. This finding suggests that, compared with the traditional approach of developing several local models separately, the proposed method of developing a single model using ResNet50 significantly reduces the training effort. Furthermore, the automatic updating technique is shown to have the flexibility to make use of newly available data to improve model performance, which would otherwise be a time-consuming process.
Acknowledgement
This research is supported by Natural Sciences and Engineering
Research Council of Canada (NSERC), Ontario Research Fund –
Research Excellence (ORF-RE), and the Ministry of Transportation
Ontario (MTO) through its Highway Infrastructure Innovation
Funding Program (HIIFP).
References
Basso, F., Basso, L.J., Bravo, F., and Pezoa, R. 2018. Real-time crash prediction
in an urban expressway using disaggregated data. Transportation Research
Part C: Emerging Technologies, 86: 202–219. doi:10.1016/j.trc.2017.11.014.
Breiman, L. 2001. Random forests. Machine Learning, 45: 5–32. doi:10.1023/
A:1010933404324.
Buchanan, F., and Gwartz, S.E. 2005. Road weather information systems at
the Ministry of Transportation, Ontario. In Proceedings of the 2005 Annual Conference of the Transportation Association of Canada, Calgary,
Alta.
Chen-McCaig, Z., Hoseinnezhad, R., and Hadiasha, A. 2017. Convolutional neural
networks for texture recognition using transfer learning. In Proceedings of
the 2017 International Conference on Control, Automation and Information
Sciences (ICCAIS), Chiang Mai, Thailand, 31 October–1 November 2017. IEEE.
pp. 187–192. doi:10.1109/ICCAIS.2017.8217573.
Chollet, F. 2016. Xception: deep learning with depthwise separable convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), Honolulu, HI, 21–26 July 2017. pp. 1800–
1807. doi:10.1109/CVPR.2017.195.
Glorot, X., and Bengio, Y. 2010. Understanding the difficulty of training
deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), Chia
Laguna Resort, Sardinia, Italy. pp. 249–256.
Göring, C., Freytag, A., and Rodner, E. 2013. Fine-grained categorization –
short summary of our entry for the ImageNet Challenge 2012. In Proceedings of the Conference on Computer Vision and Pattern Recognition.
arXiv:1310.4759 [cs.CV].
Gu, L., Kwon, T.J., and Qiu, Z.J. 2019. A geostatistical approach to winter
road surface condition estimation using mobile RWIS data. Canadian
Journal of Civil Engineering, 46(6): 511–521. doi:10.1139/cjce-2018-0341.
He, K., and Sun, J. 2015. Convolutional neural networks at constrained time
cost. In Proceedings of the Conference on Computer Vision and Pattern
Recognition. pp. 5353–5360. arXiv:1412.1710 [cs.CV].
He, K., Zhang, X., Ren, S., and Sun, J. 2016a. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision. Springer, Cham. pp. 630–645. arXiv:1603.05027 [cs.CV].
He, K., Zhang, X., Ren, S., and Sun, J. 2016b. Deep residual learning for
image recognition. In Proceedings of the Conference on Computer Vision
and Pattern Recognition. pp. 770–778. arXiv:1512.03385 [cs.CV].
Hong, L., Lin, J., and Feng, Y. 2009. Road surface condition recognition
method based on color models. In Proceedings of the 2009 First International Workshop on Database Technology and Applications, Wuhan,
China, 25–26 April 2009. pp. 61–63.
Howard, A.G. 2014. Some improvements on deep convolutional neural network based image classification. In Proceedings of the Conference on
Computer Vision and Pattern Recognition. arXiv:1312.5402 [cs.CV].
Jonsson, P., Vaa, T., Dobslaw, F., and Thörnberg, B. 2015. Road condition
imaging – model development. In Proceedings of the Transportation
Research Board 94th Annual Meeting, Washington, DC, 11–15 January
2015. The National Academies of Sciences, Engineering, and Medicine,
Washington, DC.
Krizhevsky, A., Sutskever, I., and Hinton, G. 2012. ImageNet classification
with deep convolutional neural networks. In Proceedings of the Advances
in Neural Information Processing Systems 25 (NIPS 2012). Curran Associates Inc. pp. 1097–1105.
Kwon, T.J., Fu, L., and Jiang, C. 2015. Road weather information system
stations — where and how many to install: a cost benefit analysis approach.
Canadian Journal of Civil Engineering, 42(1): 57–66. doi:10.1139/cjce-2013-0569.
Kwon, T.J., Fu, L., and Melles, S.J. 2017. Location optimization of road weather
information system (RWIS) network considering the needs of winter road
maintenance and the traveling public. Computer-Aided Civil and Infrastructure Engineering, 32(1): 57–71. doi:10.1111/mice.12222.
LeCun, Y., Bengio, Y., and Hinton, G. 2015. Deep learning. Nature, 521(7553):
436–444. doi:10.1038/nature14539.
Linton, M.A., and Fu, L. 2015. Winter road surface condition monitoring:
field evaluation of a smartphone-based system. Transportation Research
Record: Journal of the Transportation Research Board, 2482: 46–56. doi:10.3141/
2482-07.
Linton, M.A., and Fu, L. 2016. Connected vehicle solution for winter road
surface condition monitoring. Transportation Research Record: Journal
of the Transportation Research Board, 2551: 62–72. doi:10.3141/2551-08.
Lippi, M., Bertini, M., and Frasconi, P. 2013. Short-term traffic flow forecasting:
an experimental comparison of time-series analysis and supervised learning.
IEEE Transactions on Intelligent Transportation Systems, 14(2): 871–882.
doi:10.1109/TITS.2013.2247040.
Liu, M., Wang, M., Wang, J., and Li, D. 2013. Comparison of random forest,
support vector machine and back propagation neural network for electronic tongue data classification: application to the recognition of orange beverage and Chinese vinegar. Sensors and Actuators B: Chemical,
177: 970–980. doi:10.1016/j.snb.2012.11.071.
Lv, Y., Duan, Y., and Kang, W. 2015. Traffic flow prediction with big data: a
deep learning approach. IEEE Transactions on Intelligent Transportation
Systems, 16(2): 865–873. doi:10.1109/TITS.2014.2345663.
Marsh, W., and Bearfield, G. 2004. Using Bayesian networks to model accident causation in the UK railway industry. In Probabilistic Safety Assessment and Management. Springer, London. pp. 3597–3602.
Mondal, M., Mondal, P., Saha, N., and Chattopadhyay, P. 2017. Automatic
number plate recognition using CNN based self-synthesized feature
learning. In Proceedings of the 2017 IEEE Calcutta Conference (CALCON),
Kolkata, India, 2–3 December 2017. pp. 378–381. doi:10.1109/CALCON.2017.
8280759.
Omer, R., and Fu, L. 2010. An automatic image recognition system for winter
road surface condition classification. In Proceedings of the 13th International
IEEE Conference on Intelligent Transportation Systems (ITSC), Funchal, Portugal, 19–22 September
2010. pp. 19–22. doi:10.1109/ITSC.2010.5625290.
Pan, S.J., and Yang, Q. 2010. A survey on transfer learning. IEEE Transactions
on Knowledge and Data Engineering, 22(10): 1345–1359. doi:10.1109/TKDE.2009.191.
Pan, G., Fu, L., Yu, R., and Muresan, M. 2018. Winter road surface condition
recognition using a pre-trained deep convolutional neural network. In
Proceedings of the Transportation Research Board 97th Annual Meeting.
Washington, DC. arXiv:1812.06858 [eess.IV].
Pan, G., Fu, L., Yu, R., and Muresan, M. 2019. Evaluation of alternative pretrained convolutional neural networks for winter road surface condition
monitoring. In Proceedings of the 2019 5th International Conference on
Transportation Information and Safety (ICTIS), Liverpool, UK, 14–17 July
2019. IEEE. pp. 614–620. doi:10.1109/ICTIS.2019.8883540.
Qiao, J., Pan, G., and Han, H. 2019. A regularization-reinforced DBN for digital
recognition. Natural Computing, 18(4): 721–733. doi:10.1007/s11047-016-9597-7.
Sermanet, P., Eigen, D., and Zhang, X. 2014. OverFeat: integrated recognition,
localization and detection using convolutional networks. In Proceedings
of the Conference on Computer Vision and Pattern Recognition. arXiv:1312.
6229 [cs.CV].
Simonyan, K., and Zisserman, A. 2014. Very deep convolutional networks
for large-scale image recognition. In Proceedings of the Conference on
Computer Vision and Pattern Recognition. arXiv:1409.1556 [cs.CV].
Soon, F.C., Khaw, H.Y., Chuah, J.H., and Kanesan, J. 2018. Hyper-parameters
optimisation of deep CNN architecture for vehicle logo recognition. IET
Intelligent Transport Systems, 12(8): 939–946. doi:10.1049/iet-its.2018.5127.
Szegedy, C., Vanhoucke, V., and Ioffe, S. 2016. Rethinking the inception
architecture for computer vision. In Proceedings of the Conference on
Computer Vision and Pattern Recognition. pp. 2818–2826. arXiv:1512.00567
[cs.CV].
Veit, A., Wilber, M., and Belongie, S. 2016. Residual networks behave like
ensembles of relatively shallow networks. In Proceedings of the Conference
on Computer Vision and Pattern Recognition. pp. 550–558. arXiv:1605.06431
[cs.CV].
Xue, Y., and Li, Y. 2018. A fast detection method via region-based fully convolutional neural networks for shield tunnel lining defects. Computer-Aided Civil and Infrastructure Engineering, 33(8): 638–654. doi:10.1111/mice.12367.
Yi, Z., and Shirk, M. 2018. Data-driven optimal charging decision making for
connected and automated electric vehicles: a personal usage scenario.
Transportation Research Part C: Emerging Technologies, 86: 37–58. doi:10.1016/
j.trc.2017.10.014.
Zhang, A., Wang, K.C.P., Li, B., Yang, E., Dai, X., Peng, Y., et al. 2017. Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network. Computer-Aided Civil and Infrastructure Engineering, 32(10): 805–819. doi:10.1111/mice.12297.
Zhang, Z., He, Q., Gao, J., and Ni, M. 2018. A deep learning approach for
detecting traffic accidents from social media data. Transportation Research
Part C: Emerging Technologies, 86: 580–596. doi:10.1016/j.trc.2017.11.027.
Zheng, Q., Yang, M., Yang, J., Zhang, Q., and Zhang, X. 2018. Improvement of generalization ability of deep CNN via implicit regularization in two-stage training process. IEEE Access, 6: 15844–15869. doi:10.1109/ACCESS.2018.2810849.
Published by Canadian Science Publishing
Copyright of Canadian Journal of Civil Engineering is the property of Canadian Science
Publishing and its content may not be copied or emailed to multiple sites or posted to a
listserv without the copyright holder's express written permission. However, users may print,
download, or email articles for individual use.