STATISTICAL ANALYSIS OF URBAN HEAT ISLAND BY TEMPERATURE CHANGES IN URBAN AREA

Gendai Takahashi 1, Tomomichi Suzuki 2
1,2 Department of Industrial Administration, Tokyo University of Science
2461 Yamazaki, Noda, Chiba, 278-8510, JAPAN
1 j7407617@ed.noda.tus.ac.jp
2 suzuki@ia.noda.tus.ac.jp

ABSTRACT

The temperature of an urban district is often higher than that of the surrounding area, a phenomenon called the "urban heat island". Tokyo is no exception, and its urban heat island is becoming a serious problem. We aim to understand the features of the urban heat island of Tokyo by applying statistical methods to temperature data. Observed data of this kind often include missing values and outliers. If such data are analyzed without any treatment of missing values and outliers, the result may be biased or wrong, so handling them is important. This paper therefore investigates the patterns of missing values and the detection of outliers in actually observed temperature data. We found that missing values appear at different moments and at different observation points, and that the length of the missing interval varies widely. We also conclude that the spatio-temporal correlation should be considered when detecting outliers.

Keywords: spatio-temporal data, missing data and outlier analysis, multivariate time series

INTRODUCTION

The temperature of an urban district is higher than that of the surrounding area, a phenomenon called the "urban heat island". Tokyo is no exception. According to the Japan Meteorological Agency, the average yearly temperature of Tokyo rose about 3˚C in 100 years [2]. This rise is one of the largest among the big cities in Japan. The causes of the urban heat island are thought to be changes in ground cover due to urbanization, increased use of energy, and the loss of natural areas. In addition, the urban heat island increases the number of tropical nights, local downpours, and cases of heatstroke.
The urban heat island is becoming a serious problem, and it has been studied in many fields of research. For example, geography and climatology study the airflow structure and the effect on precipitation, while medicine and biology study the effect on ecosystems and health. Statistical analysis is one of those fields. In previous studies, the time ranges analyzed are secular change, seasonal change, and diurnal change, and the spatial ranges analyzed are a country, a region, and a single city, town, or street. We aim to understand the features of the urban heat island of Tokyo by applying statistical methods to temperature data.

In climate observation, erroneous data may be recorded because of some trouble even if the measurement is made with appropriate equipment and the equipment is appropriately maintained. An automatic climate observing system usually observes with a measuring machine, and if a trouble such as a power failure occurs, the data go missing. Missing values and outliers are therefore often included in actually observed data. If such data are analyzed without any treatment of missing values and outliers, the result may be biased or wrong, so the data should be corrected.

Temperature data form a time series, and each area has its own features. When correcting the data, we should consider the spatio-temporal correlation. It is difficult to detect and correct all wrong data instantly by automatic detection, because if the climate conditions or the features of each area are not considered, a true value may be judged as an outlier or a wrong value may be missed. Therefore, when we analyze data that include missing values and outliers, we should study the missing patterns and the accurate detection of outliers, and correct the data according to those patterns. We aim to find the patterns of missing values and to detect outliers in actually observed data.
DATA TO BE ANALYZED

We use the temperature data of METROS [1]. The data were observed every ten minutes in the Tokyo metropolitan area at 106 observation points. The observation points are located in elementary schools, and neighboring points are about 2.5 km apart. The observation period is from July 2002 to March 2005. Figure 1 shows the map of the observation points.

Figure 1 observation points of METROS [map with a 3 km scale bar and a label for Tokyo Bay; from Tokyo Metropolitan Research Institute for Environmental Protection]

MISSING VALUE ANALYSIS

First, we focus on the missing values of the METROS data in 2003 and study how missing values occur in the actually observed temperature data. To investigate the length of the missing intervals, we show Figure 2. Figure 2 is the histogram of the interval from the occurrence of missing values to the restart of observation.

Figure 2 histogram of missing interval [x-axis: interval time of missing value (week); y-axis: frequency; annotations: "less than an hour", n=90, ave=3.13, sd=3.11]

Missing values occurred 90 times in total. From Figure 2, missing intervals of less than an hour are about 15% of the total. The others are long intervals, for example, two weeks, one month, or four months. When we interpolate values into a missing part, a short interval (on the order of tens of minutes) can be filled with the mean of nearby times. When the missing interval is long, however, the mean of the day or the mean of nearby times may not be suitable, and we should consider the past values or the pattern of temperature change at the observation point.

To investigate the frequency by season, we show Figure 3. Figure 3 is the histogram of the month in which missing starts.

Figure 3 histogram of month that missing starts [x-axis: month; y-axis: frequency; n=90, ave=5.96, sd=3.57]

From Figure 3, many missing values start in March, April, and May.
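The bookkeeping behind Figures 2 and 3 can be sketched as follows. This is a minimal illustration, not the procedure actually used for METROS: it assumes each station's record is available as a sorted list of the timestamps at which a valid value exists, and the function names `find_gaps` and `summarize` are hypothetical. Only the ten-minute step comes from the data description above.

```python
from collections import Counter
from datetime import datetime, timedelta

STEP = timedelta(minutes=10)  # observation step stated for the METROS data


def find_gaps(timestamps, step=STEP):
    """Return (first_missing, resumed, length) for every gap in one station's record.

    `timestamps` is assumed to be the sorted times at which a valid value exists.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > step:
            first_missing = prev + step
            gaps.append((first_missing, curr, curr - first_missing))
    return gaps


def summarize(gaps):
    """Count gap starts by calendar month and by hour of day."""
    by_month = Counter(start.month for start, _, _ in gaps)
    by_hour = Counter(start.hour for start, _, _ in gaps)
    return by_month, by_hour


# Hypothetical record: values exist at 10:00 and 10:10, then again from 10:40.
t0 = datetime(2003, 4, 1, 10, 0)
observed = [t0, t0 + STEP, t0 + 4 * STEP, t0 + 5 * STEP]
gaps = find_gaps(observed)
by_month, by_hour = summarize(gaps)
```

A gap that runs to the end of the observation period is not found by comparing consecutive timestamps, so the end time of the period would have to be supplied separately.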
Fewer missing values start in summer than in the other seasons, so the month in which missing occurs is biased. To investigate the time of day at which missing occurs, we show Figure 4. Figure 4 is the histogram of the time of day at which missing starts. From Figure 4, many missing values start in the daytime, especially between ten o'clock and eleven o'clock, and few start in the nighttime.

Figure 4 histogram of time zone that missing starts [x-axis: time (0 to 23); y-axis: frequency; n=90, ave=10.57, sd=4.37]

Discussion of missing value pattern

Values in the METROS temperature data were often missing. The cause is thought to be an external factor or an accident of the measuring machine. The observer collected the data every one and a half months. If the measuring machine had stopped measuring by that time, the observer restarted or repaired it. Therefore, as Figure 2 suggests, a missing interval often ends within one and a half months. The missing interval of four and a half months continues until the end of the measurement. The measuring machine was presumably not repaired because the end of the measurement was near. Some of the short missing intervals are probably caused by an external factor rather than a breakdown of the measuring machine, for example, a momentary power failure around the observation point or children's mischief (the observation points are located in elementary schools). From Figure 3, missing values are extremely few in August, perhaps because there are few children in the schools during the summer holiday. From Figure 4, the fact that many missing values start in the daytime may also be due to a large effect of external factors.

OUTLIER ANALYSIS

As mentioned previously, it is difficult to detect and correct all wrong data instantly by automatic detection.
If we do not consider the climate conditions or the features of each area, a true value may be judged as an outlier, or a wrong value may be missed. This section therefore examines the outlier patterns in the METROS data.

Procedure

We investigated the data according to the following procedure.

(1) We check the change of the temperature of all observation points.
First, we check the change of temperature over the whole of Tokyo. The data include values that are obviously unreal and changes that obviously deviate. For example, -10˚C is observed in Tokyo in midsummer, or the values are 20 to 30˚C higher than at other points. To find the patterns of these values, we drew the graph of all points. These patterns can be found simply by drawing the graph and looking at it. We found four patterns of outliers from the graph of all points.

(2) We check the variability of the temperature of each point at a certain time.
After checking the rough outliers in (1), we checked the variability of the values of each point at a certain time. We made a scatter diagram of the temperature at a certain time, and we showed the geographical distribution of the temperature at that time on the map. First, we find the observation points of the outliers in the scatter diagram. Then we compare them with the values of nearby points using the geographical distribution of the temperature. Because the values of surrounding points should be similar, it is easy to find a change in the values. This step may also find outliers that were missed in (1). We found three patterns of outliers.

Result

We found the following patterns of outliers or doubtful values.

Pattern1. unreal value and change that deviates obviously
Pattern2. similar change to others, but value is high or low
Pattern3. strange change at long interval
Pattern4. same values at long interval
Pattern5. change of spike
Pattern6. opposite change to other points
Pattern7.
a similar spike in the same time zone on near days

We show figures of examples of these patterns.

[Pattern1. unreal value and change that deviates obviously]

Figure 5 examples of unreal value and deviated changes obviously (1) [14 August ~ 16 August; x-axis: time; y-axis: temperature [℃]; series: observation locations 100, 55, 2, 43, 51, 3, 39]

From Figure 5, the values of Point 100, Point 55, Point 2, and Point 43 might be outliers. A possible reason is that it was raining heavily at this time, and outliers might have occurred because the measuring machines are set up outdoors. Moreover, in this case the judgment of outliers is often difficult. For example, it is difficult to judge when Point 43 returned to normal.

Figure 6 examples of unreal value and deviated changes obviously (2) [19 May ~ 21 May; x-axis: time; y-axis: temperature [℃]; series: observation locations 51, 52, 53, 54, 55, 56, 57, 58]

In the period shown in Figure 6, it was raining with thunder on the night of 20 May. The values of Point 52 and Point 55 might be outliers. Some machines, like the one at Point 55, produce outliers many times. Points 51 to 58 are located near one another, yet Figure 6 contains outliers and normal values at the same time. So we cannot automatically judge that a value is an outlier because it was observed on a rainy day.

[Pattern2. similar change to others, but value is high or low]

We found another pattern in which the values change similarly to those of other points but are higher or lower. Figure 7 shows an example of this pattern. The observation points in Figure 7 are located near one another. The values of Point 13 change similarly to those of the other points but are lower, and they become clear outliers at the end. Thus, the values of Point 13 may have turned into outliers gradually. We often found this pattern close to obviously unreal and deviating values. In this pattern, it is difficult to judge when the values became outliers.
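One way to look for the onset of a Pattern2 offset is to compare a point with the median of nearby points at each time step. The sketch below is only an illustration under assumptions that are not in the paper: the function names are hypothetical, the 2˚C threshold and the run length of six steps are arbitrary, and the choice of which points count as neighbors is left to the caller.

```python
from statistics import median


def neighbor_offsets(target, neighbors):
    """Difference between one point and the median of nearby points, per time step.

    `target` is a list of temperatures; `neighbors` is a list of equally long lists.
    """
    return [t - median(vals) for t, vals in zip(target, zip(*neighbors))]


def first_persistent_offset(offsets, threshold=2.0, min_run=6):
    """Index where |offset| first stays above `threshold` for `min_run` steps, else None.

    Both parameters are illustrative assumptions, not values from the paper.
    """
    run = 0
    for i, d in enumerate(offsets):
        run = run + 1 if abs(d) > threshold else 0
        if run == min_run:
            return i - min_run + 1
    return None


# Hypothetical data: the target follows its neighbors, then sits 3 degrees low.
neighbors = [
    [25.0, 25.5, 26.0, 26.5, 27.0, 27.5, 28.0, 28.5],
    [25.2, 25.6, 26.1, 26.4, 27.1, 27.6, 28.1, 28.4],
    [24.9, 25.4, 25.9, 26.6, 26.9, 27.4, 27.9, 28.6],
]
target = [25.1, 25.5, 23.0, 23.5, 24.0, 24.5, 25.0, 25.5]
offsets = neighbor_offsets(target, neighbors)
start = first_persistent_offset(offsets, threshold=2.0, min_run=6)
```

The median is used rather than the mean so that a single faulty neighbor does not shift the reference. A point flagged this way is only a candidate, since a persistent offset can also reflect a real local feature of the area.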
This pattern is not easily found from the graph of a single point, because doubtful values are difficult to check there. Drawing the graph of all points is therefore effective for finding this pattern.

Figure 7 examples of similar change to others, but value is high or low [17 August ~ 23 August; x-axis: time; y-axis: temperature [℃]; series: observation locations 1, 12, 13, 50, 53]

[Pattern3. strange change at long interval]

We found a pattern in which the values change strangely over a long interval. We show this pattern in Figure 8. This pattern is also difficult to recognize as doubtful from the graph of a single point.

Figure 8 strange changes at long interval [16 August ~ 18 August; x-axis: time; y-axis: temperature [℃]; series: observation locations 8, 9, 10, 11, 59]

[Pattern4. same values at long interval]

We found a pattern in which the same value continues over a long interval. We show this pattern in Figure 9. We do not think that the temperature keeps the same value for a long time, so these values might be outliers.

Figure 9 same values at long interval [26 January ~ 17 March; x-axis: time; y-axis: temperature [˚C]; series: observation locations 23, 24, 25, 26]

[Pattern5. change of spike]

The value often rises or drops suddenly. Such changes appear as spikes in the graph. Figures 10 and 11 show examples of spikes.

Figure 10 change of spike (1) [28 January; x-axis: time; y-axis: temperature [˚C]; series: observation locations 31, 32, 33, 34, 35, 36]

From Figure 10, the value of Point 36 rises suddenly. Points 31 to 36 are located near one another.
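Pattern4 and Pattern5 lend themselves to simple checks on a single point's series, sketched below. The function names and both cutoffs are assumptions made for illustration (36 steps corresponds to six hours at the ten-minute observation step, and 5˚C is an arbitrary jump size); the paper does not give such criteria. The spike check only catches a one-step excursion, whereas the spikes in the figures may last longer.

```python
def flat_runs(values, min_run=36):
    """Return (start, length) for runs of identical consecutive values.

    Only runs of at least `min_run` steps are reported; the default is an
    assumed cutoff of six hours at ten-minute steps.
    """
    runs, start = [], 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = i
    return runs


def spikes(values, jump=5.0):
    """Return indices whose value differs from both neighbors by more than `jump`
    in the same direction, i.e. a one-step rise-and-fall or drop-and-recover.
    """
    found = []
    for i in range(1, len(values) - 1):
        before = values[i] - values[i - 1]
        after = values[i] - values[i + 1]
        if abs(before) > jump and abs(after) > jump and before * after > 0:
            found.append(i)
    return found


# Hypothetical series: a stuck value for 40 steps, then a single upward spike.
series = [3.2] * 40 + [3.4, 3.6, 12.0, 3.7, 3.9]
stuck = flat_runs(series, min_run=36)
spiky = spikes(series, jump=5.0)
```

Both checks look at one point in isolation, so their output should be read as a list of candidates to compare against nearby points rather than as a final judgment.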
It is difficult to judge the value of Point 31 in Figure 10. The value is doubtful because its change differs from that of the surrounding points. However, such values can occur if something has happened locally at Point 36, for example, if the weather is fine only locally at Point 31. Figure 11 also shows a spike: the value of Point 51 drops suddenly. Judging this value is difficult for the same reason. To judge these patterns, we should consider the spatio-temporal correlation.

Figure 11 change of spike (2) [7 October; x-axis: time; y-axis: temperature [˚C]; series: observation locations 51, 52, 53, 54, 55]

[Pattern6. opposite change to other points]

We found a pattern in which the values change in the opposite direction to those of other points. We show this pattern in Figure 12. The observation points in the figure are located near one another, but the values of Point 9 change in the opposite direction to the others. We do not think that the temperature at one place changes oppositely to that of nearby points over a whole day. Thus, we think that these values are outliers.

Figure 12 opposite change to other points [16 August ~ 18 August; x-axis: time; y-axis: temperature [˚C]; series: observation locations 8, 9, 10, 11, 59]

[Pattern7. a similar spike in the same time zone on near days]

Although this pattern looks like those in Figures 10 and 11, it has the feature that the spike appears in the same time zone on nearby days. We show this pattern in Figure 13.

Figure 13 similar spikes in the same time zone on near days [two panels, 6 January and 10 January; x-axis: time; y-axis: temperature [℃]; series: observation locations 37, 38, 39, 40, 41, 42]

CONCLUSION

We investigated the patterns of missing values and the detection of outliers in actually observed temperature data.
We found that missing values appear at different moments and at different observation points, and that the length of the missing interval varies widely. We found seven patterns of outliers by following the two-step procedure. As a result, we confirmed that various kinds of outlier patterns are included in actually observed data. It is difficult to detect and correct all wrong data instantly by simple automatic detection alone. Simple automatic detection, for example, erasing every value of 50˚C, cannot detect all outliers. Thus, we should consider methods by which each pattern can be detected, and we should consider criteria for judging whether values are outliers in light of those patterns. It is therefore important to study the missing value patterns and the outlier patterns in actually observed data.

As future tasks, we will continue to investigate whether other outlier patterns exist. In addition, we will derive criteria for detecting outliers that take the features of those patterns into account. Moreover, we will study how to interpolate the actually observed data.

REFERENCE

[1] Yokoyama Hitoshi, Ando Haruo, Yamaguchi Takako, Ichino Mika, Akiyama Yukari, Ishii Koichiro, Mikami Takehiko, Realities of summer urban heat island in Tokyo wards - observation results of METROS since 2002 to 2004 -, Annual report of the Tokyo metropolitan research institute for environmental protection, 2005, pp. 3-9
[2] Tokyo metropolitan research institute for environmental protection, http://www2.kankyo.metro.tokyo.jp/kankyoken/
[3] Tomomichi Suzuki, Yoshinori Iizuka, Statistical approach on analyzing heat island phenomena, Proc. The 2nd international symposium on business and industrial statistics (Yokohama), 151-154, 2001-2008