STATISTICAL ANALYSIS OF URBAN HEAT ISLAND
BY TEMPERATURE CHANGES IN URBAN AREA
Gendai Takahashi (1), Tomomichi Suzuki (2)
(1, 2) Department of Industrial Administration, Tokyo University of Science
2461 Yamazaki, Noda, Chiba, 278-8510, JAPAN
(1) j7407617@ed.noda.tus.ac.jp
(2) suzuki@ia.noda.tus.ac.jp
ABSTRACT
The temperature of an urban district is often higher than that of the surrounding area; this phenomenon is called the "urban heat island". Tokyo is no exception, and its urban heat island is becoming a serious problem. We aim to understand the features of the urban heat island of Tokyo by applying statistical methods to observed temperature data. Observed data of this kind often contain missing values and outliers. If such data are analyzed without dealing with missing values and outliers, the results may be biased or wrong, so handling them is important. This paper therefore investigates the patterns of missing values and the detection of outliers in actually observed temperature data. We found that missing values appeared at different times and at different observation points, and that the lengths of the missing intervals vary widely. We also found that the spatio-temporal correlation should be considered when detecting outliers.
Keywords: spatio-temporal data, missing data and outlier analysis, multivariate time series
INTRODUCTION
The temperature of an urban district is higher than that of the surrounding area; this is called the "urban heat island". Tokyo is no exception. According to the Japan Meteorological Agency, the annual mean temperature of Tokyo rose by about 3˚C over 100 years [2]. This rise is one of the largest among the major cities in Japan. The urban heat island is thought to be caused by changes in ground cover due to urbanization, increased energy use, and the loss of the natural environment. In addition, the urban heat island contributes to an increase in tropical nights, local downpours, and heatstroke, and it is becoming a serious problem. The urban heat island has been studied in many research fields. For example, geography and climatology study the airflow structure and the effect on precipitation, while medicine and biology study the effects on ecosystems and health. Statistical analysis is one of these fields. In previous studies, the time scales analyzed were long-term (secular), seasonal, and diurnal changes, and the spatial scales analyzed ranged from a whole country or region to a single city, town, or street. We aim to understand the features of the urban heat island of Tokyo by applying statistical methods to temperature data.
Even when climate observations are made with appropriate equipment and properly maintained, erroneous data may be recorded because of various troubles. An automatic climate observation system records data with measuring instruments, so if a trouble such as a power failure occurs, the data are missing. Consequently, actually observed data often include missing values and outliers. If such data are analyzed without dealing with missing values and outliers, the results may be biased or wrong, so the data need some correction. The temperature data form a time series, and each area has its own characteristics, so corrections should take the spatio-temporal correlation into account. It is difficult to detect and correct all erroneous data at once by automatic detection: if the climate conditions and the characteristics of each area are not considered, true values may be judged as outliers, or erroneous values may be missed. Therefore, when analyzing data that include missing values and outliers, we should study the patterns of missing values and how to detect outliers accurately, and then correct them according to these patterns.
We aim to find the patterns of missing values and outliers, and ways to detect them, in actually observed data.
DATA TO BE ANALYZED
We use the METROS temperature data [1]. The data were observed every ten minutes at 106 observation points in the Tokyo metropolitan area. The observation points are located at elementary schools, about 2.5 km apart from one another. The observation period is from July 2002 to March 2005. Figure 1 shows a map of the observation points.
Figure 1. Observation points of METROS (map with 3 km scale bar and Tokyo Bay; source: Tokyo Metropolitan Research Institute for Environmental Protection)
MISSING VALUE ANALYSIS
First, we focus on the missing values in the METROS data for 2003 and examine how missing values occur in the actually observed temperature data.
To investigate the length of the missing intervals, Figure 2 shows a histogram of the time from the start of each missing period until observation resumed.
Figure 2. Histogram of missing interval (x-axis: interval time of missing value in weeks, with a separate bin for intervals of less than an hour; y-axis: frequency; n = 90, ave = 3.13, sd = 3.11)
Missing values occurred 90 times in total. In Figure 2, intervals of less than an hour account for about 15% of the total; the others are long intervals, for example two weeks, one month, or four months. When interpolating the missing parts, short gaps of less than an hour can be filled with the mean of the neighbouring time points. When the missing interval is long, however, the daily mean or the mean of neighbouring time points may not be suitable; we should instead consider past values or the pattern of temperature change at the observation point.
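The short-gap rule in the paragraph above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a single point's ten-minute series stored as a Python list with `None` marking missing values, and the helper names `missing_runs` and `fill_short_gaps` and the cutoff `max_steps=5` (50 minutes, i.e. under an hour) are hypothetical choices. Long gaps are deliberately left unfilled, since the text argues they need past values or change patterns instead.

```python
from typing import List, Optional, Tuple

STEP_MINUTES = 10  # METROS records one value every ten minutes


def missing_runs(values: List[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of consecutive missing values (None)."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(values):
        if v is None and start is None:
            start = i                        # a missing run begins
        elif v is not None and start is not None:
            runs.append((start, i - start))  # observation resumed
            start = None
    if start is not None:                    # series ends while still missing
        runs.append((start, len(values) - start))
    return runs


def fill_short_gaps(values: List[Optional[float]],
                    max_steps: int = 5) -> List[Optional[float]]:
    """Fill gaps of at most max_steps missing values with the mean of the valid
    values just before and after the gap. Longer gaps, and gaps at either end
    of the series, are left as None (hypothetical cutoff: 5 steps = 50 min)."""
    out = list(values)
    for start, length in missing_runs(values):
        end = start + length                 # index of first valid value after the gap
        if length > max_steps or start == 0 or end >= len(values):
            continue
        fill = (values[start - 1] + values[end]) / 2
        for i in range(start, end):
            out[i] = fill
    return out


# Hypothetical series: a 20-minute gap and a 70-minute gap
series = [20.0, None, None, 21.0] + [None] * 7 + [19.0]
for start, length in missing_runs(series):
    print(start, length * STEP_MINUTES, "minutes")
print(fill_short_gaps(series))  # the 20-minute gap becomes 20.5; the 70-minute gap stays None
```

The run lengths from `missing_runs`, multiplied by ten minutes, are also what a histogram like Figure 2 would be built from.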
To investigate how the frequency varies by season, Figure 3 shows a histogram of the month in which each missing period starts.
Figure 3. Histogram of the month in which missing starts (x-axis: month, 1 to 12; y-axis: frequency; n = 90, ave = 5.96, sd = 3.57)
Figure 3 shows that many missing periods start in March, April, and May, and fewer start in summer than in other seasons. The month in which missing values occur is therefore biased.
To investigate the time of day when missing values occur, Figure 4 shows a histogram of the hour at which each missing period starts. Figure 4 shows that many missing periods start in the daytime, especially between ten and eleven o'clock, and few start at night.
Figure 4. Histogram of the time of day at which missing starts (x-axis: time, hour 0 to 23; y-axis: frequency; n = 90, ave = 10.57, sd = 4.37)
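The statements that missing periods are biased by month and by hour could be checked with a simple goodness-of-fit statistic. The sketch below is an assumption-laden illustration, not an analysis the paper performs: it assumes the start time of each missing period is available as a `datetime`, tallies starts by month and hour (the counts behind Figures 3 and 4), and computes Pearson's chi-square statistic against equal expected counts. The critical values are standard upper 5% points of the chi-square distribution (11 degrees of freedom for 12 months, 23 for 24 hours). With only 90 starts spread over 24 hours the expected count per hour is below 5, so the chi-square approximation would be rough there.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List

# Upper 5% points of the chi-square distribution (standard table values).
CHI2_05 = {11: 19.675, 23: 35.172}


def start_counts(starts: List[datetime]) -> Dict[str, Counter]:
    """Count missing-period start times by calendar month (1-12) and hour of day (0-23)."""
    return {
        "month": Counter(t.month for t in starts),
        "hour": Counter(t.hour for t in starts),
    }


def chi_square_uniform(counts: Counter, categories: Iterable[int]) -> float:
    """Pearson chi-square statistic of observed counts against equal expected
    counts in every category (categories with no starts count as zero)."""
    cats = list(categories)
    n = sum(counts[c] for c in cats)
    if n == 0:
        return 0.0
    expected = n / len(cats)
    return sum((counts[c] - expected) ** 2 / expected for c in cats)


# Hypothetical start times, for illustration only
starts = [datetime(2003, 4, 1, 10, 30), datetime(2003, 4, 2, 10, 50),
          datetime(2003, 8, 3, 22, 0)]
c = start_counts(starts)
stat = chi_square_uniform(c["month"], range(1, 13))
print(stat, stat > CHI2_05[11])
```

A statistic above the tabulated value would be consistent with the bias seen in Figure 3, though with this little data it should be read as a rough indication only.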
Discussion of missing value pattern
The METROS temperature data often contain missing values. The causes are thought to be external factors or accidents affecting the measuring instrument. The observer collected the data every one and a half months; if a measuring instrument had stopped at that time, the observer restarted or repaired it. This explains why, in Figure 2, missing intervals often end within one and a half months. The missing interval of four and a half months continues until the end of the measurement period; presumably the observer did not repair the instrument because the end of the measurement period was near.
Some of the short missing intervals are probably caused by external factors rather than by breakdowns of the measuring instrument, for example momentary power failures around the observation point or mischief by children (the observation points are located at elementary schools). From Figure 3, missing values may be extremely few in August because few children are at school during the summer holidays. From Figure 4, the many missing periods starting in the daytime may likewise reflect a large effect of external factors.
OUTLIER ANALYSIS
As mentioned above, it is difficult to detect and correct all erroneous data at once by automatic detection, because if the climate conditions and the characteristics of each area are not considered, true values may be judged as outliers or erroneous values may be missed. This chapter therefore examines the outlier patterns in the METROS data.
Procedure
We investigated the data according to the following procedure.
(1) We check the temperature changes at all observation points.
First, we check the temperature changes over the whole of Tokyo. The data include unrealistic values and obviously deviating changes; for example, -10˚C is observed in Tokyo in midsummer, or values are 20 to 30˚C higher than at other points. To find the patterns of these values, we plotted all points on one graph. Such patterns can be found simply by inspecting the graph. We found four patterns of outliers in this graph.
(2) We check the variability of the temperatures of all points at a given time.
After checking the gross outliers in (1), we checked the variability of the values of all points at a given time. We made a scatter diagram of the temperatures at that time and mapped their geographical distribution. First, we identify the observation points of outliers in the scatter diagram. Then we compare their values with those of nearby points using the geographical distribution of the temperature. Because nearby points are likely to have similar values, deviating values are easy to find, and this step may catch outliers missed in (1). We found three patterns of outliers in this step.
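The two visual procedures above can be approximated by simple automatic rules, sketched below for the readings of all points at one time. Everything here is a hypothetical illustration rather than the authors' method: the thresholds, the 5 km neighbour radius (about two station spacings), and the assumption that station coordinates are given as planar (x, y) values in km. Procedure (1) is mimicked by a plausibility range plus a comparison with the median of all points; procedure (2) by a comparison with the median of nearby points only.

```python
import math
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical thresholds for illustration only (not taken from the paper).
PLAUSIBLE_RANGE = (-15.0, 45.0)  # deg C: rough bounds for air temperature in Tokyo
GLOBAL_DEV = 10.0                # deg C from the median of all points at the same time
NEIGHBOR_RADIUS_KM = 5.0         # about two station spacings (~2.5 km each)
NEIGHBOR_DEV = 3.0               # deg C from the median of nearby points


def distance_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Euclidean distance between planar (x, y) station coordinates in km."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def flag_at_time(temps: Dict[int, float],
                 coords: Dict[int, Tuple[float, float]]) -> Dict[int, List[str]]:
    """Flag suspicious readings among simultaneous values {point_id: temperature}."""
    lo, hi = PLAUSIBLE_RANGE
    all_median = median(temps.values())
    flags: Dict[int, List[str]] = {}
    for p, t in temps.items():
        reasons = []
        # Procedure (1): implausible value, or far from the rest of Tokyo
        if not lo <= t <= hi:
            reasons.append("out_of_range")
        if abs(t - all_median) > GLOBAL_DEV:
            reasons.append("far_from_all")
        # Procedure (2): compare only with nearby points
        near = [temps[q] for q in temps
                if q != p and distance_km(coords[p], coords[q]) <= NEIGHBOR_RADIUS_KM]
        if near and abs(t - median(near)) > NEIGHBOR_DEV:
            reasons.append("far_from_neighbors")
        if reasons:
            flags[p] = reasons
    return flags


# Hypothetical midsummer readings: point 4 reports -10 deg C
coords = {1: (0.0, 0.0), 2: (2.5, 0.0), 3: (0.0, 2.5), 4: (2.5, 2.5), 5: (30.0, 30.0)}
temps = {1: 30.0, 2: 30.5, 3: 29.8, 4: -10.0, 5: 31.0}
print(flag_at_time(temps, coords))
```

Note that a fixed range alone would not catch the midsummer -10˚C example from the text; it is the comparison with other points at the same time that flags it, which is the motivation for procedure (2).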
Result
We found the following patterns of outliers or doubtful values.
Pattern 1. Unrealistic values and obviously deviating changes
Pattern 2. Changes similar to other points, but values too high or too low
Pattern 3. Strange changes over a long interval
Pattern 4. The same value over a long interval
Pattern 5. Spikes
Pattern 6. Changes opposite to other points
Pattern 7. Similar spikes at the same time of day on nearby days
Examples of each pattern are shown below.
[Pattern 1. Unrealistic values and obviously deviating changes]
Figure 5. Examples of unrealistic values and obviously deviating changes (1) (14 August to 16 August; y-axis: temperature [℃]; x-axis: time; observation location numbers 100, 55, 2, 43, 51, 3, 39)
In Figure 5, Points 100, 55, 2, and 43 may be outliers. It was raining heavily at this time, and because the measuring instruments are installed outdoors, the outliers may have been caused by the rain. In such cases, judging outliers is often difficult; for example, it is hard to tell when Point 43 returned to normal.
Figure 6. Examples of unrealistic values and obviously deviating changes (2) (19 May to 21 May; y-axis: temperature [℃]; x-axis: time; observation location numbers 51 to 58)
In Figure 6, it was raining with thunder on the night of 20 May, and Points 52 and 55 may be outliers. Some instruments, such as the one at Point 55, produce outliers many times. Points 51 to 58 are close to one another, yet Figure 6 shows outliers and normal values at the same time. We therefore cannot automatically conclude that a value is an outlier simply because it was a rainy day.
[Pattern 2. Changes similar to other points, but values too high or too low]
We found another pattern in which values change similarly to other points but are higher or lower. Figure 7 shows an example. The observation points in Figure 7 are close to one another. The values of Point 13 change similarly to those of the other points but are lower, and at the end they become clear outliers. Thus, the values of Point 13 may have gradually become outliers. We often found this pattern near unrealistic and obviously deviating values, and in this pattern it is difficult to judge when the values became outliers. The pattern is hard to find from a graph of only one point, because doubtful values are then difficult to recognize; plotting all points together is effective for finding it.
Figure 7. Examples of changes similar to other points but with values too high or too low (17 August to 23 August; y-axis: temperature [℃]; x-axis: time; observation location numbers 1, 12, 13, 50, 53)
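Pattern 2 suggests tracking the difference between a point and its neighbours over time rather than the raw value. The sketch below is one hypothetical way to do that, not a method from the paper: it assumes time-aligned ten-minute series for the target point and its nearby points, computes the offset from the neighbours' median at each step, and flags steps where the trailing one-day mean offset has moved away from the offset in an initial reference period (assumed trustworthy) by more than a hypothetical threshold. A gradual drift like that of Point 13 would show up as a growing departure.

```python
from statistics import mean, median
from typing import List, Sequence


def offset_series(target: Sequence[float],
                  neighbors: Sequence[Sequence[float]]) -> List[float]:
    """Difference between the target point and the median of its neighbours
    at each time step (all series must be aligned in time)."""
    return [t - median(vals) for t, vals in zip(target, zip(*neighbors))]


def drift_flags(offsets: Sequence[float], window: int = 144,
                baseline: int = 144, threshold: float = 1.5) -> List[bool]:
    """Flag steps whose trailing-window mean offset differs from the mean offset
    over the first `baseline` steps by more than `threshold` deg C.
    144 ten-minute steps = one day; all defaults are hypothetical."""
    ref = mean(offsets[:baseline])
    flags = []
    for i in range(len(offsets)):
        recent = offsets[max(0, i - window + 1): i + 1]
        flags.append(abs(mean(recent) - ref) > threshold)
    return flags


# Hypothetical short example: the target drops 5 deg C below its neighbours
target = [20.0] * 10 + [15.0] * 10
neighbors = [[20.0] * 20, [20.5] * 20, [19.5] * 20]
offs = offset_series(target, neighbors)
print(drift_flags(offs, window=3, baseline=5))
```

Using the neighbours' median rather than their mean keeps one faulty neighbour from shifting the reference, which matters because Pattern 2 often appears next to Pattern 1 values.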
[Pattern 3. Strange changes over a long interval]
We found a pattern of strange changes over a long interval, shown in Figure 8. As with Pattern 2, doubtful values in this pattern are hard to recognize from a graph of only one point.
Figure 8. Strange changes over a long interval (16 August to 18 August; y-axis: temperature [℃]; x-axis: time; observation location numbers 8, 9, 10, 11, 59)
[Pattern 4. The same value over a long interval]
We found a pattern in which the same value continues over a long interval, shown in Figure 9. Temperature is unlikely to stay at exactly the same value for a long time, so these values may be outliers.
Figure 9. Same values over a long interval (26 January to 17 March; y-axis: temperature [˚C]; x-axis: time; observation location numbers 23, 24, 25, 26)
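Pattern 4 lends itself to a simple run-length check. This sketch assumes a ten-minute series with `None` for missing values (missing runs are handled separately, not reported here); the function name `constant_runs`, the minimum run of 36 steps (six hours) and the tolerance are hypothetical and would need tuning to the instrument's resolution, since a few equal consecutive readings can occur naturally on a calm night.

```python
from typing import List, Optional, Sequence, Tuple


def constant_runs(values: Sequence[Optional[float]], min_len: int = 36,
                  tol: float = 0.0) -> List[Tuple[int, int]]:
    """Return (start_index, length) of runs in which consecutive readings differ
    by at most `tol` for at least `min_len` steps (36 ten-minute steps = 6 hours).
    Missing values (None) break a run and are never reported as a run."""
    runs: List[Tuple[int, int]] = []
    start = 0
    n = len(values)
    for i in range(1, n + 1):
        same = (i < n and values[i] is not None and values[i - 1] is not None
                and abs(values[i] - values[i - 1]) <= tol)
        if not same:
            length = i - start
            if values[start] is not None and length >= min_len:
                runs.append((start, length))
            start = i
    return runs


# Hypothetical series: a reading stuck at 5.0 for 40 steps, then changing again
series = [5.0] * 40 + [5.1, 5.3]
print(constant_runs(series))
```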
[Pattern 5. Spikes]
Values often rise or drop suddenly and sharply, appearing as spikes in the graph. Figures 10 and 11 show examples of such spikes.
Figure 10. Spike (1) (28 January; y-axis: temperature [˚C]; x-axis: time, 0:00 to 20:00; observation location numbers 31 to 36)
In Figure 10, the value at Point 36 rises suddenly. Points 31 to 36 are close to one another. Judging this value is difficult. It is doubtful because its change differs from those of the surrounding points; however, if something happened locally at Point 36, for example if the sun came out locally there, such a value could occur.
Figure 11 also shows a spike: the value at Point 51 drops suddenly. Judging this value is difficult for the same reason.
To deal with these patterns, we should consider the spatio-temporal correlation.
Figure 11. Spike (2) (7 October; y-axis: temperature [˚C]; x-axis: time, 0:00 to 20:00; observation location numbers 51 to 55)
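Because the text says spikes should be judged using spatio-temporal correlation, a spike rule can combine a temporal test with a spatial one. The sketch below is a hypothetical illustration: it treats a step as a spike when the value rises (or drops) by more than `jump` degrees relative to both the previous and the next reading, and keeps only spikes that no nearby point shows at the same time step. It only catches one-step spikes; a spike lasting several readings, as may be the case in Figure 10, would need a wider window. The threshold and the dictionary layout `{point_id: series}` are assumptions.

```python
from typing import Dict, List, Sequence


def spike_indices(x: Sequence[float], jump: float = 3.0) -> List[int]:
    """Indices t where x[t] rises (or drops) by more than `jump` deg C relative
    to both x[t-1] and x[t+1], i.e. a one-step spike in a 10-minute series."""
    out = []
    for t in range(1, len(x) - 1):
        up, down = x[t] - x[t - 1], x[t] - x[t + 1]
        if (up > jump and down > jump) or (up < -jump and down < -jump):
            out.append(t)
    return out


def isolated_spikes(series: Dict[int, Sequence[float]], point: int,
                    neighbors: Sequence[int], jump: float = 3.0) -> List[int]:
    """Spikes at `point` that no neighbouring point shares at the same time step."""
    shared = set()
    for q in neighbors:
        shared.update(spike_indices(series[q], jump))
    return [t for t in spike_indices(series[point], jump) if t not in shared]


# Hypothetical readings: point 36 jumps at step 2, its neighbours do not
series = {36: [5.0, 5.2, 12.0, 5.4, 5.5],
          31: [5.0, 5.1, 5.2, 5.3, 5.4],
          32: [4.8, 4.9, 5.0, 5.1, 5.2]}
print(isolated_spikes(series, 36, [31, 32]))
```

An isolated spike is only a candidate outlier: as the text notes for Point 36, a genuine local event could produce the same signature.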
[Pattern 6. Changes opposite to other points]
We found a pattern in which a point changes in the opposite direction to other points, shown in Figure 12. The observation points are close to one another, but Point 9 changes opposite to the other points. Temperature is unlikely to change locally in the opposite direction to other points over a whole day, so we regard these values as outliers.
Figure 12. Changes opposite to other points (16 August to 18 August; y-axis: temperature [˚C]; x-axis: time; observation location numbers 8, 9, 10, 11, 59)
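One way to quantify Pattern 6 is to correlate the step-to-step changes of a point with the average change of its neighbours; a strongly negative correlation over a day would indicate opposite behaviour. This is a hypothetical sketch, not the authors' criterion: it assumes time-aligned series, uses first differences so that a constant offset between points does not matter, and leaves the decision threshold to the user.

```python
from statistics import mean
from typing import List, Sequence


def first_differences(x: Sequence[float]) -> List[float]:
    """Change between consecutive readings."""
    return [b - a for a, b in zip(x, x[1:])]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0


def change_correlation(target: Sequence[float],
                       neighbors: Sequence[Sequence[float]]) -> float:
    """Correlation between the target's changes and the neighbours' average changes."""
    ref = [mean(vals) for vals in zip(*neighbors)]
    return pearson(first_differences(target), first_differences(ref))


# Hypothetical hourly-like values: the target moves against its neighbours
n1 = [20.0, 21.0, 22.0, 23.0, 22.0, 21.0]
n2 = [19.0, 20.0, 21.0, 22.0, 21.0, 20.0]
target = [25.0, 24.0, 23.0, 22.0, 23.0, 24.0]
print(change_correlation(target, [n1, n2]))
```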
[Pattern 7. Similar spikes at the same time of day on nearby days]
This pattern resembles Figures 10 and 11, but the spike appears at the same time of day on nearby days. Figure 13 shows this pattern.
Figure 13. Similar spikes at the same time of day on nearby days (two panels, 6 January and 10 January; y-axis: temperature [℃]; x-axis: time, 0:00 to 20:00; observation location numbers 37 to 42)
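Pattern 7 can be screened for once spike times are known (for instance from a one-step spike rule). The sketch below is a hypothetical illustration: it groups spike times by hour of day and reports hours in which spikes occur on at least `min_days` different days lying within `max_span_days` of each other. Both parameters are assumptions; binning by clock hour also means a spike slot that straddles an hour boundary would be split.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Sequence, Set


def recurring_slots(spike_times: Sequence[datetime], min_days: int = 2,
                    max_span_days: int = 7) -> Dict[int, List[date]]:
    """Group spike times by hour of day and report hours in which spikes occur on
    at least `min_days` different days lying within `max_span_days` of each other."""
    by_hour: Dict[int, Set[date]] = defaultdict(set)
    for t in spike_times:
        by_hour[t.hour].add(t.date())
    result: Dict[int, List[date]] = {}
    for hour, days in by_hour.items():
        ds = sorted(days)
        for i in range(len(ds)):
            # extend the window while the next day is still close to ds[i]
            j = i
            while j + 1 < len(ds) and (ds[j + 1] - ds[i]).days <= max_span_days:
                j += 1
            if j - i + 1 >= min_days:
                result[hour] = ds[i:j + 1]
                break
    return result


# Hypothetical spike times (year chosen arbitrarily for illustration)
spikes = [datetime(2004, 1, 6, 13, 10), datetime(2004, 1, 10, 13, 30),
          datetime(2004, 1, 8, 2, 0)]
print(recurring_slots(spikes))
```

A recurring slot points to a cause tied to the clock, such as sunlight reaching the instrument at a certain hour, which is exactly why such spikes should not be judged one day at a time.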
CONCLUSION
We investigated the patterns of missing values and the detection of outliers in actually observed temperature data.
We found that missing values appeared at different times and at different observation points, and that the lengths of the missing intervals vary widely.
We found seven patterns of outliers using the two procedures. This confirms that actually observed data include many kinds of outlier patterns. It is difficult to detect and correct all erroneous data at once by simple automatic detection alone; a simple rule such as deleting values of 50˚C cannot detect all outliers. We should therefore develop methods that can detect each pattern, together with criteria for judging whether values are outliers that take these patterns into account. Hence, it is important to study the patterns of missing values and outliers in actually observed data.
As future work, we will continue to investigate whether other outlier patterns exist. In addition, we will develop criteria for detecting outliers that take into account the features of these patterns. We will also study how to interpolate the actually observed data.
REFERENCES
[1] Yokoyama Hitoshi, Ando Haruo, Yamaguchi Takako, Ichino Mika, Akiyama Yukari, Ishii Koichiro, Mikami Takehiko, "Realities of summer urban heat island in Tokyo wards: observation results of METROS from 2002 to 2004", Annual Report of the Tokyo Metropolitan Research Institute for Environmental Protection, 2005, pp. 3-9.
[2] Tokyo Metropolitan Research Institute for Environmental Protection, http://www2.kankyo.metro.tokyo.jp/kankyoken/
[3] Tomomichi Suzuki, Yoshinori Iizuka, "Statistical approach on analyzing heat island phenomena", Proc. The 2nd International Symposium on Business and Industrial Statistics (Yokohama), pp. 151-154, 2001-2008.