Science Journal of Environmental Engineering Research
ISSN: 2276-7495
http://www.sjpub.org/sjeer.html
© Author(s) 2013. CC Attribution 3.0 License.
Research Articles
Published By
Science Journal Publication
International Open Access Journal
Volume 2013, Article ID sjeer-118, 4 Pages, 2013. doi: 10.7237/sjeer/118
On A Generalization of a Dynamic Reconstruction-Method for Gap Filling in
Daily Temperature Datasets From Weather Networks
Gianmarco Tardivo and Antonio Berti
University of Padua. Department of Agriculture, Food, Natural Resources, Animals, and Environment (DAFNAE). Viale Dell'Università, 16.
Legnaro (Padova) – Italy.
e-mail: gian0812@gmail.com
Accepted 29th July 2013
ABSTRACT
Research in the environmental sciences often makes use of
databases provided by measuring instruments. Missing or
false readings are important and unavoidable drawbacks of
their use.
Finding methods to solve these problems is required
to improve the quality of the data and of the analyses
carried out with them.
A dynamic method to fill in the missing values of a dataset
from a network of automatic temperature sensors was
recently presented in the international literature.
The aim of this paper is to suggest a way to improve this
method and to generalize some of its procedures.
KEYWORDS: reconstructing method; weather network;
daily temperature; neural networks; regression trees.
INTRODUCTION
In environmental studies, datasets are often needed
to analyze the variables related to the topic under
study. These databases are sometimes prone to
missing values or to quality problems in general.
Such problems can rule out certain types of analysis
or make their results unreliable, so a solution is
desirable (Mahmoud et al., 2013; Zahumensky, 2004;
Huising et al.). Nowadays, many international
research papers in the field of applied climatology
address these issues (Eischeid et al., 1995; Kemp et
al., 1983; Vicente-Serrano et al., 2010; Young, 1992).
Recently, a new dynamic method was presented in the
international literature (Tardivo and Berti, 2012),
specifically to fill in the missing values found in a
daily temperature database collected from a weather
network.
This new method was designed to reconstruct missing
daily maximum, mean and minimum temperatures
recorded by a grid of 114 precision linear thermistor
sensors in an Italian region (Veneto). It is based on a
dynamic management of the multiple regression model
within the dataset. The dynamic character is due to
the particular system used to select the predictors
and the sampling size of the multiple regression
formula, which is carried out gap by gap (a gap is a
set of temporally contiguous missing values). A
driving algorithm guides the multiple regression
model through the database to fulfill this task.
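The notion of a gap given above can be made concrete with a minimal sketch. It assumes that missing daily values are stored as NaN in a one-dimensional array; the function name `find_gaps` and the sample values are hypothetical and are not taken from the original method.

```python
import numpy as np

def find_gaps(series):
    """Return (start, length) for each run of contiguous NaNs in a 1-D series."""
    missing = np.isnan(np.asarray(series, dtype=float))
    # Pad with False so that runs touching either end are still delimited.
    padded = np.concatenate(([False], missing, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)    # first missing index of each run
    ends = np.flatnonzero(changes == -1)     # index just after each run
    return [(int(s), int(e - s)) for s, e in zip(starts, ends)]

# Hypothetical daily temperatures with two gaps.
temps = [12.1, np.nan, np.nan, 13.0, 12.7, np.nan, 11.9]
print(find_gaps(temps))
```

Each returned pair is one gap, so a gap-by-gap procedure can simply iterate over this list.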
Tardivo and Berti (2012) presented the dynamic
algorithm that drives the multiple regression formula
in the neighbourhood of each gap. From a modelling
point of view, some generalization can be obtained.
In our opinion, it is easy to deduce that a model
other than multiple regression could be applied as
the driving algorithm proceeds. In this paper a
formalization of this generalization is presented.
For this purpose a series of mathematical symbols
will be defined and/or used. Mathematical symbols
usually "cut the readership in half", but in this
case the symbolic exposition was considered desirable
to explain the argument and to obtain a shorthand
format. This opinion paper is strictly linked to the
aforementioned article (Tardivo and Berti, 2012). The
method of that article can be subdivided into two
phases (Tardivo and Berti, 2013), for each gap and
each station: (1) the selection of the predictors and
(2) an inference procedure (the aforementioned
algorithm) that chooses the sampling size to be used
in the final reconstructing multiple regression
formula. The driving algorithm of the second phase is
shown in Fig-01.
In this opinion paper, a mathematical translation of
the second phase and a resulting generalization are
presented. This generalization makes it possible to
decouple this phase from the regression model and
give freedom to use any other model. Neural network
models can be cited as an example.
MATERIALS AND METHODS
REFERENCE METHOD
The Tardivo and Berti method is based on multiple
linear regressions, using a set of surrounding stations
as regressors. The approach used to select the
stations and to identify the best period for coupling
the reconstructing and target stations can be
summarised as follows:
1. analysis of the target station to identify a period
   without gaps of sufficient length contiguous to
   the gap to be filled, preceding and/or following
   the gap;
2. identification of two groups of stations
   (maximum 4 stations per group) that can be
   potentially used for data reconstruction in the
   neighbourhood of the target station, one
   considering data preceding the gap to be filled,
   the other group following the gap;
3. selection of the period to be considered (before
   or after the gap);
4. identification of the subset of stations, in the
   period previously identified, giving the best
   correlation with the target station for the specific
   gap to be filled. The search is done considering all
   the possible subsets within the selected group;
5. identification of the best sampling size (length of
   the period used for data coupling) that minimises
   the reconstruction error;
6. reconstruction of the gap.
The procedure is then repeated for the subsequent
gaps and, after reaching the end of the time series of
the considered target station, for the gaps previously
skipped for lack of a sufficiently long period without
gaps, using the values reconstructed in the meantime.
The present paper focuses on step 5.
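Step 1 of the list above can be sketched as follows. This is only an illustration under stated assumptions: missing values are stored as NaN, the required length `needed` is supplied by the caller, and the function name `clean_periods` is hypothetical. The source does not say how the sufficient length is chosen.

```python
import numpy as np

def clean_periods(series, gap_start, gap_len, needed):
    """Check for `needed` gap-free values directly before and after a gap."""
    x = np.asarray(series, dtype=float)
    before = x[max(0, gap_start - needed):gap_start]
    after = x[gap_start + gap_len:gap_start + gap_len + needed]
    # A slice shorter than `needed` means the series ends too early.
    ok_before = len(before) == needed and not np.isnan(before).any()
    ok_after = len(after) == needed and not np.isnan(after).any()
    return {"before": bool(ok_before), "after": bool(ok_after)}

# Hypothetical target series: the gap starts at index 3 and lasts 2 days.
target = [10.0, 11.0, 12.0, np.nan, np.nan, 13.0, np.nan, 14.0]
print(clean_periods(target, gap_start=3, gap_len=2, needed=3))
```

A gap for which both flags are false would be one of those set aside and revisited at the end of the series, as described above.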
USEFUL DEFINITIONS
Having a finite sequence of real numbers,
$$S_D := \{x_1, x_2, \dots, x_D\}, \quad D \in \mathbb{N},$$
we define a contiguous subsequence of $S_D$ as
$$S_{j,(k,D)} := \{x_j, x_{j+1}, \dots, x_{j+k-1}\}, \quad j \le D - k + 1,$$
and we denote by
$$\mathcal{S}_{k,D} := \{S_{1,(k,D)}, S_{2,(k,D)}, \dots, S_{(D-k+1),(k,D)}\}$$
the set of all $S_{j,(k,D)}$.
This definition is matched to the interval of the target
series contiguous to the gap and covering the future
and/or the past of the gap (Fig-01). $S_D$ can be
matched to the $D$ period (Fig-01) and $S_{j,(k,D)}$ to the
periods of length $i + T$ (Fig-01; Tardivo and Berti,
2012).
Of course, all this can also be defined in the
$n$-dimensional space $\mathbb{R}^n$. In this case a vector
notation $\bar{x}$ will be used to indicate the elements of
the sequences: $\bar{x}_d \in \mathbb{R}^n$ for each
$d = 1, 2, \dots, D$.
This is to include also the series of the predictors
selected in the first phase of the method (Tardivo and
Berti, 2013).
For each $D$, $k$ and $j$ such that $j \le (D - k + 1)$, a
function $\varphi$ can be defined in this way:
$$\varphi(j, k) := \{\bar{x}_j, \dots, \bar{x}_{j+k-1}\} \in \mathcal{S}_{k,D}, \quad k \ge 1.$$
When the values $D$ and $k$ are set we can also write
$\varphi(j) := S_j$ with $j = 1, 2, \dots, (D - k + 1)$.
We subdivide each contiguous subsequence into
two further contiguous subsequences ($k = i + T$):
$$S_j = \{\bar{x}_j, \dots, \bar{x}_{j+i-1}\} \cup \{\bar{x}_{j+i}, \dots, \bar{x}_{j+k-1}\},$$
and we define two functions $f$ and $g$:
$$f: \mathcal{S}_{i,\triangleleft} \to \mathbb{R}^m \quad \text{and} \quad g: \mathcal{S}_{T,\triangleright} \times \mathbb{R}^m \to \mathbb{R}^T, \quad m > 1.$$
$\mathcal{S}_{i,\triangleleft}$ denotes the left-sub-subsequences (lss) of size
$i$, and $\mathcal{S}_{T,\triangleright}$ the right-sub-subsequences (rss)
of size $T$.
This last definition is easily linked to the Tardivo and
Berti (2012) method. The function $f$ corresponds to
the multiple regression formula. For each
subsequence: (1) through the multiple regression
formula, over the lss, a vector of $m$ coefficients is found
($m$ is the number of the selected predictors + 1); (2)
these coefficients are multiplied, through $g$, by the
submatrix of the matrix belonging to $\mathcal{S}_{T,\triangleright}$, the rss.
This submatrix is obtained by excluding the values of
the target series. In fact these target values are
simulated to assess the method.
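The two functions defined above can be sketched as interchangeable callables, named `f` and `g` in this sketch. The code is a hedged illustration, not the authors' implementation, and rests on several assumptions that are not stated in the source: column 0 of each array holds the target series and the other columns hold the predictors, the reconstruction error is the mean absolute error, and for each tested size i every window of length i + T inside the D period is used as a trial. All function names are hypothetical.

```python
import numpy as np

def fit_ols(lss):
    """f: least-squares coefficients (intercept first) estimated on the lss.
    Assumed layout: column 0 is the target series, the rest are predictors."""
    y, X = lss[:, 0], lss[:, 1:]
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(rss_predictors, coef):
    """g: apply the coefficients to the rss with the target column excluded."""
    A = np.column_stack([np.ones(len(rss_predictors)), rss_predictors])
    return A @ coef

def best_sampling_size(period, T, I, f=fit_ols, g=predict_ols):
    """Return (i, mean absolute error) for the sampling size i <= I with the
    smallest error over simulated gaps of length T, or None if no size
    between m + 1 and I can be tested."""
    D, m = period.shape                      # m = number of predictors + 1
    best = None
    for i in range(m + 1, I + 1):            # more rows than coefficients
        k = i + T
        errors = []
        for j in range(D - k + 1):           # every window of length k in D
            lss, rss = period[j:j + i], period[j + i:j + k]
            simulated = g(rss[:, 1:], f(lss))
            errors.append(np.mean(np.abs(simulated - rss[:, 0])))
        if errors:
            score = float(np.mean(errors))
            if best is None or score < best[1]:
                best = (i, score)
    return best

# Synthetic demo: the target is an exact linear mix of two predictors.
rng = np.random.default_rng(0)
predictors = rng.normal(15.0, 5.0, size=(40, 2))
target = 1.0 + 0.6 * predictors[:, 0] + 0.4 * predictors[:, 1]
period = np.column_stack([target, predictors])
print(best_sampling_size(period, T=3, I=10))
```

Because `f` and `g` are ordinary arguments, the search over sampling sizes does not depend on the model being a regression; only the pair of callables would change.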
CONCLUSIONS
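In conclusion, our opinion is that one or both of the
two functions defined above (the one that estimates
the model over the lss and the one that applies it to
the rss) could be replaced by types of models
different from the multiple regression model.
Possibly, this substitution could improve the
reliability of the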
reconstructions and the performance of the driving
algorithm.
Possible alternative models include regression trees
(Faucher et al., 1999; Burrows et al., 1995), neural
networks (Kim and Pachepsky, 2010; Kuligowski and
Barros, 1998; Malek et al., 2008), etc. Any of them
could be used while maintaining the dynamic character
of the system (a specific selection of predictors and
sampling size for each gap), which is the main
advantage of the Tardivo and Berti (2012) approach.
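As a hedged illustration of such a substitution, the sketch below replaces the regression pair with a depth-1 regression tree (a single split), the simplest possible instance of the regression trees cited above. It is not taken from any of the cited papers. The function names are hypothetical, and column 0 of the lss is assumed to hold the target series, with the predictors in the remaining columns.

```python
import numpy as np

def fit_stump(lss):
    """Fit step: depth-1 regression tree on the lss.
    Returns the parameters (feature, threshold, left_mean, right_mean)."""
    y, X = lss[:, 0], lss[:, 1:]
    # Fallback when no split exists: everything goes left, predict the mean.
    best = (np.inf, 0.0, np.inf, y.mean(), y.mean())
    for feat in range(X.shape[1]):
        values = np.unique(X[:, feat])                  # sorted unique values
        for thr in (values[:-1] + values[1:]) / 2.0:    # candidate midpoints
            left, right = y[X[:, feat] <= thr], y[X[:, feat] > thr]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, float(feat), thr, left.mean(), right.mean())
    return np.array(best[1:])

def predict_stump(rss_predictors, params):
    """Reconstruction step: apply the fitted split to the rss predictors."""
    feat, thr, left_mean, right_mean = params
    go_left = rss_predictors[:, int(feat)] <= thr
    return np.where(go_left, left_mean, right_mean)

# Hypothetical lss: target in column 0, one predictor in column 1.
lss = np.array([[10.0, 1.0], [10.0, 2.0], [20.0, 8.0], [20.0, 9.0]])
params = fit_stump(lss)
print(params, predict_stump(np.array([[3.0], [7.0]]), params))
```

The fit step still returns a vector of real parameters and the reconstruction step still maps the rss predictors and that vector to simulated target values, so the choice of predictors and sampling size for each gap is unaffected by the swap.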
REFERENCES
1. Burrows WR, Benjamin M, Beauchamp S, Lord ER,
   McCollor D, Thomson B (1995). CART decision-tree
   statistical-analysis and prediction of summer season
   maximum surface ozone for the Vancouver, Montreal,
   and Atlantic regions of Canada. J. Appl. Meteorol. 34
   (8): 1848-1862.
2. Eischeid JK, Baker CB, Karl TR, Diaz HF (1995). The
   quality control of long-term climatological data using
   objective data analysis. J. Appl. Meteorol. 34:
   2787-2795.
3. Faucher M, Burrows WR, Pandolfo L (1999).
   Empirical-statistical reconstruction of surface marine
   winds along the western coast of Canada. Clim. Res.
   11 (3): 173-190.
4. Huising EJ, Kihara J, Nziguheba G. Africa Soil
   Information Service. Spatially explicit and evidence
   based soil management recommendations. Technical
   specifications. Data quality control procedures for data
   pertaining to the diagnostic trials.
   http://afsisdt.ciat.cgiar.org/downloads/DT_data_quality_control.pdf
5. Kemp WPD, Burnell DG, Everson DO, Thomson AJ
   (1983). Estimating missing daily maximum and
   minimum temperatures. J. Clim. Appl. Meteorol. 22:
   1587-1593.
6. Kim JW, Pachepsky YA (2010). Reconstructing missing
   daily precipitation data using regression trees and
   artificial neural networks for SWAT streamflow
   simulation. J. Hydrol. 394: 305-314.
7. Kuligowski RJ, Barros AP (1998). Using artificial neural
   networks to estimate missing rainfall data. J. Am.
   Water Resour. As. 34 (6): 1437-1447.
8. Mahmoud MA, Saleh NA, Madbuly DF (2013). Phase I
   Analysis of Individual Observations with Missing Data.
   Qual. Reliab. Engng. Int. doi: 10.1002/qre.1508
9. Malek MA, Harun S, Shamsuddin SM, Mohamad I
   (2008). Reconstruction of missing daily data rainfall
   using unsupervised artificial neural network. In:
   Proceedings of World Academy of Science, Engineering
   and Technology, 2008, Paris, France.
10. Tardivo G, Berti A (2012). A Dynamic Method for Gap
    Filling in Daily Temperature Datasets. J. Appl.
    Meteorol. Clim. 51: 1079-1086. doi:
    10.1175/JAMC-D-11-0117.1
11. Tardivo G, Berti A (2013). The selection of predictors
    in a regression-based method for gap filling in daily
    temperature datasets. Int. J. Climatol. doi:
    10.1002/joc.3766
12. Vicente-Serrano SM, Beguerìa S, Lòpez-Moreno JI,
    Garcìa-Vera MA, Stepanek P (2010). A complete daily
    precipitation database for northeast Spain:
    reconstruction, quality control, and homogeneity. Int. J.
    Climatol. 30: 1146-1163. doi: 10.1002/joc.1850
13. Young KC (1992). A three-way model for interpolating
    for monthly precipitation values. Mon. Weather Rev.
    120: 2561-2569.
14. Zahumensky I (2004). World Guidelines on Quality
    Control Procedures for Data from Automatic Weather
    Stations. World Meteorological Organization.
Fig-01. Progress of CV trials for every sampling size in the case of the selection of the period before the real gap.
D = (U-1) + T + I; I = maximum sampling size allowed; i = tested sampling size; U = number of CV trials; T =
length of the gap (days). Each u represents a CV trial (from Tardivo and Berti, 2012).
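The relation in the caption can be checked with a short worked example. The numerical values below are hypothetical and chosen only for illustration; the derivation assumes that consecutive CV trials are shifted by one day, which is what $D = (U-1) + T + I$ implies.

```latex
\begin{align*}
\text{number of windows of length } I+T \text{ in } D
  &= D - (I + T) + 1 \\
  &= \bigl[(U - 1) + T + I\bigr] - (I + T) + 1 = U, \\
\text{e.g. } I = 10,\; T = 3,\; U = 5:\quad
  D &= 4 + 3 + 10 = 17, \qquad 17 - 13 + 1 = 5 .
\end{align*}
```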
How to Cite this Article: Gianmarco Tardivo and Antonio Berti "On A Generalization of a Dynamic Reconstruction-Method for Gap Filling in Daily
Temperature Datasets From Weather Networks" Science Journal of Environmental Engineering Research, Volume 2013, Article ID sjeer-118, 4
Pages, 2013. doi: 10.7237/sjeer/118