New evidence on temperature variations and trends over China from

Supplementary 1: Homogenization of the observed data from the surface stations
The station data for China were adjusted by Li et al. (2004 and 2009) using the
homogenization method of Peterson and Easterling (1994 and 1995). Because changes in
instruments, station moves, or different observing practices can cause step changes, the
method detects a change in the trend of a time series by identifying the change point in
a two-phase regression. Clear discontinuities are adjusted once they have been
cross-checked against the station metadata. Fig. 1 shows the distribution of the station
standard deviations for the temperature homogenization of China. Twelve stations had
standard deviations larger than 1°C. Fig. 2 shows the temperature series of these stations
before and after the adjustments. The homogenization adjustment substantially affects
the local temperature at only a few stations. Fig. 3 shows the annual adjustments
across all 607 sites according to the data of Li et al. The majority of adjustments are
in the range ±(0.01~1.00°C), and the “average” across all sites is close to zero but
slightly negative.
Fig. 1 Standard deviations (SD) for station temperature homogenization
Fig. 2 Temperature series from the stations with standard deviations larger than 1℃. The blue line indicates
the pre-adjustment series and the red line indicates the post-adjustment series.
Fig. 3 Distribution of annual station homogeneity adjustments
Notice that the vertical axis corresponds to the relative frequency of each data value, which is written in
decimal form. The relative frequencies sum to 1. The histogram is based on adjustments from 607 stations
from CMA.
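The change-point search in a two-phase regression can be illustrated with a minimal sketch. It assumes the input is a candidate-minus-reference difference series, fits a separate least-squares line on each side of every candidate split, and keeps the split with the smallest total residual sum of squares. The function names and the `min_seg` parameter are hypothetical, and the significance test and metadata check described above are omitted.

```python
def _sse_line(x, y):
    """Residual sum of squares of a least-squares line fitted to (x, y)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def two_phase_change_point(years, diffs, min_seg=5):
    """Return (year, sse) for the best split; year is the first year of phase two.

    min_seg is an assumed minimum segment length, not a value from the source.
    """
    best = None
    for k in range(min_seg, len(years) - min_seg + 1):
        sse = _sse_line(years[:k], diffs[:k]) + _sse_line(years[k:], diffs[k:])
        if best is None or sse < best[1]:
            best = (years[k], sse)
    return best
```

In practice the candidate break would only be adjusted if it is statistically significant and supported by the station history.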
Supplementary 2: Treatment of missing data
The missing values are estimated using linear interpolation of data from the neighboring
stations. The reference neighboring stations are selected according to the following
criteria: a high correlation of temperature records, with a coefficient (r) higher than 0.8
for the nearest 30 years around the missing year; similar elevation; and similar
land surface configuration. To evaluate the statistical fidelity of these regression models,
split-sample calibration–verification tests (Meko and Graybill, 1995) are used. Two of
the most rigorous measures of model validation, the reduction of error (RE) and the
coefficient of efficiency (CE), are calculated. Table 1 shows sample results for
Tianjin station. The closer the value is to 1, the better the regression model.
Overall, the RE and CE values are positive for nearly all models, which supports
the validity of the regression models.
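As a rough illustration of the two verification statistics, the sketch below uses their commonly cited definitions: RE compares the model's squared errors in the verification period with those of a forecast equal to the calibration-period mean, and CE uses the verification-period mean instead. The function names are hypothetical, and the source does not state the exact formulas it applied.

```python
def reduction_of_error(obs_ver, pred_ver, cal_mean):
    """RE = 1 - SSE(model) / SSE(calibration-period mean), over the verification period."""
    sse = sum((o - p) ** 2 for o, p in zip(obs_ver, pred_ver))
    ref = sum((o - cal_mean) ** 2 for o in obs_ver)
    return 1.0 - sse / ref


def coefficient_of_efficiency(obs_ver, pred_ver):
    """CE = 1 - SSE(model) / SSE(verification-period mean)."""
    ver_mean = sum(obs_ver) / len(obs_ver)
    sse = sum((o - p) ** 2 for o, p in zip(obs_ver, pred_ver))
    ref = sum((o - ver_mean) ** 2 for o in obs_ver)
    return 1.0 - sse / ref
```

With these definitions CE can never exceed RE for the same predictions, so CE is the stricter of the two tests.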
We compare the mean annual temperature time series before and after missing-data
imputation (Fig. 4). In general, the data imputation does not significantly influence
the temperature trend, and only minor differences are present in a few years. The
standard deviation of the annual difference series is 0.012°C. The seasonal analysis
gives results similar to those for the annual mean temperature (not shown).
Table 1 Statistics of calibration and verification test results for the Tianjin station
      Calibration   Verification   Calibration   Verification   Full calibration
      (1954-1973)   (1974-1983)    (1964-1983)   (1954-1963)    (1954-1983)
r     0.98          0.99           0.97          0.87           0.98
RE    -             1.00           -             0.52           -
CE    -             0.92           -             0.45           -
Fig. 4 Annual mean temperature anomaly before and after missing data imputation for China
Supplementary 3: Satellite observed land-use data
The land-use database was developed by the Chinese Academy of Sciences (CAS). The
original data were derived from satellite remote sensing data provided by the US Landsat
TM/ETM images, which have a spatial resolution of 30 by 30 meters (Vogelmann et al.
2001). These have been aggregated by CAS into 100 by 100 meter picture elements
(‘pixels’) (Liu et al. 2002 and 2005). We use the period from 1980 onwards. For
each time period more than 500 TM scenes are used to cover the entire country. The CAS
data team also spent considerable time validating the interpretation of the images and
the land-cover classifications against extensive field surveys (Liu et al. 2002). The
Landsat TM/ETM images were geo-referenced and orthorectified using field-collected
ground control points and high-resolution digital elevation models. For each TM/ETM
scene, at least 20 evenly distributed sites served as Ground Control Points
(GCPs). Visual interpretation and digitization of the images at a scale of 1:100,000 were
carried out to generate thematic maps of land cover, with technical support from Intergraph
MGE (Modular GIS Environment) software. A hierarchical classification system of 25
land-use classes was originally applied to the data, and the CAS data team aggregated
these further into six classes of land use: cultivated land, forestry area, grassland, water
area, residential and industrial land (including urban built-up land, other industrial land,
and rural residential land), and unused land. Urban built-up land comprises areas
of intensive use with much of the land covered by structures. Industrial land includes
large industrial areas, factories, oil fields, salt works, quarries, transport roads,
and airports.
Urban land-use types include urban built-up land and other industrial land uses in
large, medium and small cities and towns. Fig. 5 shows urban land use in China for the
selected period, with each cell representing a 100 m by 100 m pixel. Fig. 5a presents the
recent urban land-use coverage, which is 0.67 % of the total area of China. Fig. 5b shows
the increase in urban land use since 1980.
Fig. 5 Urban land use in China
a. 2005 b. increases since 1980. The processed image of urban land use is from satellite observation. Each
pixel is 100 m by 100 m.
Supplementary 4: Dividing method of urban and reference stations
To divide the stations into urban or rural types, we extract land-use data from the area
immediately surrounding each meteorological station. To choose the radius of this area,
we test various distances around the weather stations and select the one with the
greatest correlation between annual temperature trends (⊿T) and urban land-use
expansion (⊿U). Buffers with radii of up to 50 km are built around every station, and we
calculate the correlation coefficient (r) between ⊿T and ⊿U at 5 km increments
(i.e., 1 km, 5 km, 10 km, … , 50 km). Fig. 6 presents the r values for the various radii
around the stations. High values are found at 10–20 km; beyond that, the correlation
decreases gradually. Within 10–20 km, r was recalculated at 1 km increments; the highest
value occurs at a radius of 11 km, with r = 0.217 (n = 607), which is statistically
significant at the 95 % level (r = 0.217 exceeds the critical value t0.05,607 = 0.08).
Therefore, we use the urban land-use data within an 11 km circle (the unit distance)
surrounding each station to designate sites as either urban or reference stations.
Fig. 6 Correlation of annual temperature trend and urban land-use change at variable distances around sites.
The urban land-use change refers to the increase over the 1980 level. The temperature trend is the
temperature increase in corresponding period.
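The critical value of about 0.08 quoted for n = 607 can be checked with a short sketch. It converts the two-sided critical value of the test statistic into a critical correlation with r = t / sqrt(df + t^2), where df = n - 2. As a labeled simplification, the Student t quantile is replaced by the normal quantile, which is very close for samples this large. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate two-sided critical Pearson r for a sample of size n.

    Assumption: the normal quantile stands in for the Student t quantile,
    which is adequate only when n is large.
    """
    df = n - 2
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z / sqrt(df + z * z)
```

For n = 607 this gives a threshold just under 0.08, so a correlation of 0.217 is well above it.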
The result indicates that 95 % of all stations have experienced significant urban
land-use expansion in their surroundings. It is therefore difficult to select an adequate
number of reference stations across China that are free from urbanization effects.
Stations with the least urban land use are regarded as reference stations. To be defined
as a reference station, a station must have urban land covering less than 1 % of the
area within the unit distance around it; otherwise, the station is designated an urban
station. In certain highly urbanized areas (e.g., eastern China), several neighboring
grids may have no reference station. For these regions, a less strict reference-station
criterion is applied, in which the urban area can be larger than 1 % but less than 10 %,
provided that urban land use has not expanded greatly over the past several decades.
Fig. 7 shows the distribution of urban and reference stations. There are 196 reference
stations (in blue) and 411 urban stations (in red).
Fig. 7 The distribution of meteorological stations in China used for this study covering the period 1951–2010 (red: urban stations; blue: reference stations; see Sect. 4.2).
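The station classification rule described above can be written as a small decision function. This is a sketch under stated assumptions: the urban fraction within the unit distance is taken as already computed, and the two qualitative conditions (no reference station in the neighboring grids, and no great expansion of urban land) are passed in as booleans because the source gives no numeric threshold for them. All names are hypothetical.

```python
def classify_station(urban_fraction,
                     no_reference_nearby=False,
                     expanded_greatly=False,
                     strict_limit=0.01,
                     relaxed_limit=0.10):
    """Label a station 'reference' or 'urban' from its surrounding urban fraction.

    urban_fraction: share of urban land within the unit distance (0 to 1).
    no_reference_nearby: True where neighboring grids lack a reference station.
    expanded_greatly: True if urban land grew strongly in recent decades
                      (a judgment the source does not quantify).
    """
    if urban_fraction < strict_limit:
        return "reference"
    if (no_reference_nearby
            and urban_fraction < relaxed_limit
            and not expanded_greatly):
        return "reference"
    return "urban"
```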
Supplementary 5: Estimation of uncertainties
Surface temperature errors originate from station error, bias error, imputation error,
sampling error, and interpolation error. Because these errors are independent, the total
error is the square root of the sum of the squares of all errors. The errors are estimated
according to the calculation method of Brohan et al. (2006).
(1) Station error
The uncertainty in the reported station temperature originates from the measurement
error, the homogenization adjustment error, and the normal error (caused by subtracting
the station normal from the observed temperature). For the measurement error
($\varepsilon_{ob}$), the random error in a single thermometer reading is about 0.2°C
(Folland et al. 2001). The monthly average is based on at least two readings a day
throughout the month, giving 60 or more values contributing to the monthly mean and 730
or more values contributing to the yearly mean. The error in the yearly average is
therefore at most $0.2/\sqrt{730} \approx 0.007$ °C, and it is uncorrelated with the value
for any other station or the value for any other month or year. The homogenization
adjustment error ($\varepsilon_H$) results from homogenization adjustments.
Inhomogeneities are introduced into the station temperature series by changes in the
station site, changes in measurement time, or changes in instrumentation (Brohan et al.
2006), so the station data were adjusted to remove them. Such adjustments are not exact,
and the resulting $\varepsilon_H$ for an individual station is taken as the error value of
0.4 °C from Brohan et al. (2006). The total adjustment error of all stations is calculated
as $0.4/\sqrt{n}$, where $n$ is the number of stations. The normal error
($\varepsilon_N$) results from the normalization of the observed temperature. The station
temperature in each month during the normal period can be considered as the sum of a
constant station normal value and a random weather value (with standard deviation
$\sigma_i$). The normal error is
$$\varepsilon_N = \sigma_i / \sqrt{N}$$
where N is the number of years for which there is data in the normal period.
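The arithmetic behind the three station-error terms, and the combination of independent errors as the square root of the sum of squares, can be sketched as follows. The default values 0.2 °C, 730 readings, and 0.4 °C come from the text; the function names are hypothetical.

```python
from math import sqrt


def measurement_error(single_reading_sd=0.2, n_readings=730):
    """Error of a yearly mean built from n_readings independent readings."""
    return single_reading_sd / sqrt(n_readings)


def homogenization_error(n_stations, per_station_error=0.4):
    """Adjustment error of an average over n_stations independent stations."""
    return per_station_error / sqrt(n_stations)


def normal_error(sigma_i, n_years):
    """Error of the station normal estimated from n_years of data."""
    return sigma_i / sqrt(n_years)


def combine_independent(*errors):
    """Total error: square root of the sum of squared independent errors."""
    return sqrt(sum(e * e for e in errors))
```

For example, `measurement_error()` evaluates to roughly 0.007 °C, matching the figure in the text.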
(2) Bias error
Bias error has two sources: the effect of urbanization and the exposure of the thermometer.
We discuss the former effect in more detail later in this paper. Regarding the exposure of
the thermometer, note that since the 1950s thermometers have been placed in instrument
shelters; therefore, this problem no longer exists.
(3) Imputation error
Imputing missing monthly station data via interpolation introduces errors. Suppose
Station A has missing data and Station B is close to A and is expected to be the most
relevant neighbor. The imputation error is then calculated as follows. First, a
regression model is fitted to the monthly temperature data of Stations A and B during a
period without missing data. Then, the missing temperature data of Station A are
simulated using the predicted values from the model. Finally, the standard error of the
regression slope is taken as the imputation error. This is done for each month, and the
annual value for a given year is the mean of the monthly values in that year.
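The regression step and the standard error of its slope can be sketched with ordinary least squares. Here `x` holds the neighbor (Station B) values and `y` the target (Station A) values over a period with complete data; the function name is hypothetical, and the source does not specify further details of the model.

```python
from math import sqrt


def fit_line_with_slope_error(x, y):
    """Least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, se_slope), where se_slope is the standard
    error of the slope: sqrt(SSE / (n - 2) / Sxx). Needs at least 3 points.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(sse / (n - 2) / sxx)
    return slope, intercept, se_slope
```

A missing value at Station A would then be estimated as `intercept + slope * b_value`, with `se_slope` recorded as that month's imputation error.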
(4) Sampling error (SE)
SE occurs when the temperature estimate does not represent the true grid value because
there are not enough stations within the grid box. We use the calculation method of Jones
et al. (1997) to estimate the SE of each grid box. The SE of the region is then calculated
using the large-scale regional average method applied by Smith et al. (1994) and Jones
et al. (1997).
First, we calculated the SE of each grid point,
$$SE^2 = \frac{s_i^2 \, r \, (1 - r)}{1 + (n - 1)\, r} \qquad (1)$$
where $s_i^2$ is the mean temperature variance of the stations within the grid point, $r$
is the average correlation coefficient between stations, and $n$ is the number of stations
in the grid point.
For grid points with multiple stations, $r$ was calculated using the station data. For grid
points with few stations, $r$ was calculated using the theory of temperature correlation
decay length. This expression is
$$r = \frac{x_0}{X} \left( 1 - e^{-X / x_0} \right) \qquad (2)$$
where $X$ is the diagonal length of the grid point, and $x_0$ is the correlation decay
length. For the calculation of the correlation decay length, see Briffa et al. (1993).
To avoid the influences of station relocation and station density change, the
mean temperature variance of the stations at the grid point, $s_i^2$, can be calculated as
$$s_i^2 = \frac{S^2 \, n}{1 + (n - 1)\, r} \qquad (3)$$
where $S^2$ is the estimate of the grid-point temperature variance during the climate
reference period, and $n$ is the average number of stations in the grid point during the
climate reference period.
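Equations (1) to (3) chain together naturally, which the following sketch tries to make concrete. It reads Eq. (2) as giving the average inter-station correlation from the decay length, Eq. (3) as converting a grid-point variance into a mean station variance, and Eq. (1) as the squared sampling error of the grid point. The function and argument names are hypothetical.

```python
from math import exp


def mean_correlation_from_decay(diagonal_length, decay_length):
    """Eq. (2): average correlation r within a grid point of diagonal X."""
    ratio = diagonal_length / decay_length
    return (1.0 - exp(-ratio)) / ratio


def mean_station_variance(grid_variance, n, r):
    """Eq. (3): mean station variance from the grid-point variance."""
    return grid_variance * n / (1.0 + (n - 1.0) * r)


def sampling_error_squared(station_variance, r, n):
    """Eq. (1): squared sampling error of a grid point with n stations."""
    return station_variance * r * (1.0 - r) / (1.0 + (n - 1.0) * r)
```

The squared error shrinks as the number of stations grows and vanishes when stations are perfectly correlated (r = 1), since a single station then represents the whole grid point.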
Second, we calculated the SE of the region using the large-scale regional average
method. The regional sampling error $SE_{region}$ is
$$SE_{region}^2 = \overline{SE^2} / N_{eff} \qquad (4)$$
where $\overline{SE^2}$ is the area-weighted average of the grid-point sampling variance,
and $N_{eff}$ is the degrees of freedom. This can be expressed as
$$N_{eff} = 2R / F \qquad (5)$$
$$F = \left( \frac{e^{-\pi R / x_0}}{R} + \frac{1}{R} \right) \bigg/ \left( \frac{1}{x_0^2} + \frac{1}{R^2} \right) \qquad (6)$$
where $R$ is the radius of Earth, and $x_0$ is the average correlation decay length of the
region.
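A sketch of Eqs. (4) to (6) follows. It assumes that the exponent in Eq. (6) is $-\pi R / x_0$, that $R$ and $x_0$ share the same length unit, and that the area-weighted mean of the grid-point sampling variances has already been computed. The function names are hypothetical.

```python
from math import exp, pi


def f_factor(radius, decay_length):
    """Eq. (6), with the exponent taken as -pi * R / x0 (an assumed reading)."""
    numerator = exp(-pi * radius / decay_length) / radius + 1.0 / radius
    denominator = 1.0 / decay_length ** 2 + 1.0 / radius ** 2
    return numerator / denominator


def effective_dof(radius, decay_length):
    """Eq. (5): effective degrees of freedom N_eff = 2R / F."""
    return 2.0 * radius / f_factor(radius, decay_length)


def regional_sampling_error_squared(mean_grid_se_squared, n_eff):
    """Eq. (4): regional squared sampling error."""
    return mean_grid_se_squared / n_eff
```

A longer decay length means neighboring grid points carry less independent information, so `effective_dof` falls and the regional sampling error rises.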
(5) Interpolation error
Interpolation error arises from incomplete data coverage when gridding the station
data. There are two types of interpolation errors: (a) when there is no
station in the grid and interpolation is performed with regard to the nearest station within
a neighboring grid; and (b) when there is no rural station in the grid, but most of the
region is agricultural land and interpolation is performed with regard to the nearest rural
station. The method used to calculate the interpolation error is as follows. Assume that
stations A, B, C, D, and E exist in a certain region. Now, suppose that stations B-E do
not exist; in this case, the temperature of station A is used to impute the temperatures of
stations B-E. This allows us to calculate how the interpolation error for station A
changes with distance. As shown in Fig. 8, the y-axis is the temperature standard
deviation between station A (for example) and stations B-E. The x-axis represents the
distance between station A and its neighboring stations. The scatter plot suggests that
shorter distances between stations give smaller temperature standard deviations and
therefore more accurate interpolated values. An exponential relationship can be obtained
by modeling the standard deviations as a function of the distances among stations. Now,
suppose we need to use the temperature of station A to interpolate that of station F; the
interpolation error of station F can be calculated by substituting the distance between
stations A and F into the exponential model. The four examples shown in Fig. 8 were
selected randomly.
Fig. 8 The temperature Standard Deviations (SD) between annual station anomalies and the distance
between 4 selected stations. (The solid line is the exponential model fit)
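The exponential model of standard deviation against distance can be sketched as below. The source does not give the functional form, so this assumes SD = a * exp(b * d) and fits it by least squares on log(SD); other exponential forms would need a different fit. The function names are hypothetical, and all SD values must be positive.

```python
from math import exp, log


def fit_exponential(distances, sds):
    """Fit sd = a * exp(b * d) via a straight-line fit to log(sd).

    Assumed model form, not taken from the source. Returns (a, b).
    """
    logs = [log(s) for s in sds]
    n = len(distances)
    md = sum(distances) / n
    ml = sum(logs) / n
    sxx = sum((d - md) ** 2 for d in distances)
    sxy = sum((d - md) * (l - ml) for d, l in zip(distances, logs))
    b = sxy / sxx
    a = exp(ml - b * md)
    return a, b


def interpolation_error(a, b, distance):
    """Predicted SD (interpolation error) at a given station separation."""
    return a * exp(b * distance)
```

With a model fitted to the pairs (A, B) through (A, E), the error for a new station F is `interpolation_error(a, b, d_AF)`, where `d_AF` is the distance between stations A and F.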