COST733CAT - a database of weather and circulation type classifications

Andreas Philipp
University of Augsburg, Germany
a.philipp@geo.uni-augsburg.de

Judit Bartholy
Eotvos Lorand University, Budapest, Hungary
bari@ludens.elte.hu

Christoph Beck
University of Augsburg, Germany
c.beck@geo.uni-augsburg.de

Michel Erpicum

Pere Esteban
Institut d'Estudis Andorrans (IEA), Andorra
pesteban.cenma@iea.ad

Xavier Fettweis

Radan Huth
Institute of Atmospheric Physics, Prague, Czech Republic
huth@ufa.cas.cz

Paul James

Sylvie Jourdain

Frank Kreienkamp
Climate and Environment Consulting Potsdam GmbH, Germany
frank.kreienkamp@cec-potsdam.de

Thomas Krennert

Spyros Lykoudis
Institute of Environmental Research and Sustainable Development, National Observatory of Athens, Greece
slykoud@meteo.noa.gr

Silas Michalides

Krystyna Pianko

Piia Post

Domingo Rassilla Álvarez

Arne Spekat
Climate and Environment Consulting Potsdam GmbH, Germany
arne.spekat@cec-potsdam.de

Filippos S. Tymvios

Correspondence to: Andreas Philipp, Universitaetsstrasse 10, 86135 Augsburg, Germany

Key words: weather type classification, circulation type classification, dataset, COST 733, Europe

Abstract
A new database of classification catalogs has been compiled during the COST Action 733 "Harmonisation and Applications of Weather Type Classifications for European regions" in order to evaluate different methods for weather and circulation type classification. This paper gives a technical description of the included methods and provides a basis for further studies dealing with the dataset. Even though the list of included methods is far from complete, it demonstrates the huge variety of methods and their variations.
To allow a systematic overview, an attempt at a new conceptual systematization of classification methods is presented, reflecting the way types are defined. Methods using predefined types include manual and threshold based classifications, while methods producing derived types include those based on eigenvector techniques, leader algorithms and optimization algorithms. Concerning the input dataset and the method configuration, the classification catalogs included in the dataset have been unified to allow for comparisons of the pure methods. The different methods are discussed with respect to their intention and implementation as well as their systematization.

1.) Introduction
Classification of weather and atmospheric circulation states into distinct types is a widely used tool for describing and analyzing weather and climate conditions. The general idea is to transfer multivariate information given on the metrical scale in an input dataset, e.g. a time series of daily pressure fields, to a univariate time series of type membership on the nominal scale, i.e. a so called classification catalog. The advantage of such a substantial information compression is the straightforward use of the catalogs. On the other hand, the loss of information caused by the classification process sometimes makes it difficult to relate the remaining information to other climate elements like temperature or precipitation, which in many cases is the main objective for applications of classifications. For other applications, which are only interested in a compact description of atmospheric dynamics, the problem is the same, since as much information as possible should be recorded by a single nominal variable, the classification catalog.
It may be a consequence of this conflict that the number of different classification methods is huge and still increasing, in the hope of finding a classification method that produces a simple catalog but reflects the most relevant parts of climate variability. However, the number of classification methods and their results is a drawback, as it is hard to decide which one to use for a given application. This was the reason to initiate COST Action 733 entitled "Harmonisation and Applications of Weather Type Classifications for European regions", which intends to compare different classification methods and to evaluate whether there might be one or a few universally superior methods that can be recommended. Further, it should be evaluated whether it is possible to combine favorable properties of existing methods in order to develop a reference classification. The basis for this work has been established by producing a database of classification catalogs (called cost733cat) under unified conditions concerning the input dataset and the configuration of the classification procedure, in order to make the resulting catalogs as comparable as possible concerning the method alone. As a consequence, the included catalogs might be suboptimal for some applications, e.g. all methods have been applied to mean sea level pressure, while the 850 or 500 hPa geopotential height might be more suitable for some applications. However, the current version of the collection, which will be made available for free at the end of COST Action 733 in 2010, can be applied to more than intercomparison questions, and future versions might include more suitable configurations.
In order to describe the included methods, an attempt at a new systematization has been developed within the COST Action and is used in the following.
For a long time classification methods have been divided into two main groups, namely manual and automated classifications (e.g. Yarnal 1993). An alternative distinction refers to "subjective" versus "objective" methods, which is not quite the same, since automated methods, which are often regarded as objective, always include subjective decisions. A third group of hybrid methods has been established (e.g. Sheridan 2002), accounting for methods that define types subjectively but assign all observation patterns automatically (see Huth et al. 2008 for a more detailed discussion). Another distinction could be made between circulation type classifications (CTC), utilizing solely atmospheric circulation data like air pressure, and so called weather type classifications (WTC), using additional information about other weather elements like temperature, precipitation, etc. However, new developments of WTCs and of subjective classifications, which always include some information about the synoptic weather situation of a circulation type, are rare, and only one automated method (called WLK) using parameters other than pressure fields is included in cost733cat. The reasons are probably the high effort required for manual classifications as well as the higher demand for CTCs in so called circulation-to-environment applications (Yarnal 1993), which relate circulation to other weather elements after the classification process.
With the growing availability of computing capacity during the last decades, the number of automated CTC methods has increased considerably, since it is now easy to modify existing algorithms and produce new classifications. Therefore it seems necessary to find a new systematization of methods, especially accounting for the increased diversity of automated methods and their algorithms (see Huth et al.
2008 and Jolliffe & Philipp 2009 for further developments). Therefore an attempt is made in this paper to distinguish between basic classification strategies, which might shed some more light on the functionality and effects of the different methods.
This paper is organized as follows: A review of the classification methods included in the catalog database is given in section 2, structured by a methodological systematization which is explained for each method group respectively. Section 3 describes the dataset compilation concerning the input dataset and the unified method configurations. Finally, in section 4 the systematization of methods is discussed and concluding remarks point out the main methodological differences and commonalities.

2.) Classification methods and systematization
In order to get an overview of commonly used methods, an initial inventory of classification schemes developed by different authors was produced by a questionnaire in the year 2006 (Huth et al. 2008). By removing redundant methods and adding some classical methods (e.g. Lund 1963), a broad spectrum of different strategies for classification became apparent. In order to systematize these different approaches concerning methodological commonalities, the kind of type definition can be utilized.
Two main strategies can be discerned concerning the relation between type definition and assignment of objects to the types. The first strategy is to establish a set of types prior to the process of assignment (called "predefined types" hereafter), while the second is to arrange the entities to be classified (daily patterns in this case) following a certain algorithm such that the types are, together with the assignment, the result of the process (called "derived types").
The use of predefined types corresponds to a deductive top down approach where it is already known what the relevant types look like, while the generation of derived types corresponds to the inductive bottom up approach where more or less no knowledge about the structure of the dataset or the effects of certain types is assumed but should be derived by data mining. This fundamental difference will be specified in the following.

I. Methods using predefined types
Methods using predefined types include those with subjectively chosen weather situations and those where the allocation of days to one type depends on thresholds or rules. They have in common a presumed concept of the relation between circulation and surface climate variables like temperature and precipitation, even though it is rarely formulated explicitly. For European surface climate in particular it is, for example, important whether the large scale flow is organized zonally or meridionally. Therefore predefined types are preferentially defined to clearly discern between these two configurations, while this is not necessarily the case for derived types. The difference between subjectively defined types and their definition by thresholds is just the formulation of explicit rules for the latter.

I.1. Subjective definition of types
Subjective classifications are based on expert experience about the effect of certain circulation patterns on various surface climate parameters, i.e. they try to discern between typical synoptic situations. A main problem for this approach (as well as for the subject of weather and circulation type classification as a whole) is the diffuse meaning of "typical".
To define "typical" as "more frequent than other situations" does not solve the problem, because there is no obvious way to separate other situations from it, since there are smooth gradual transitions from one situation to another. However, typical situations may be made more concrete by including (not always in an explicit way) the effects of circulation on associated surface climate variables. Thus a typical westerly pattern for central Europe might be defined by prevailing westerly winds combined with a high probability of stratiform precipitation and warm temperatures in winter. It may be this integration of effects, which widens the range of possible situations, that results in the characteristically high number of types of subjective classifications, ranging between 29 for Hess-Brezowsky (1952) and 43 for the ZAMG classification (Lauscher 1985), which originally included over 80 classes (see below). The only exception so far is the classification by Peczely (1957) with 13 types. There are several drawbacks of subjective classifications. One is that they are not transferable and scalable to other regions. Another is the high likelihood of inhomogeneities caused e.g. by changing classifiers, which at least in the case of the Hess-Brezowsky classification has to be considered (Cahynová and Huth 2009). However, in order to compare non-subjective classifications with the existing subjective expert classifications concerning their information content, some of the latter are included in the dataset.

HBGWL/HBGWT - Hess Brezowsky Grosswetterlagen/-typen
One of the most famous catalogs focusing on central Europe is the one founded by Baur (1948), revised and further developed by Hess and Brezowsky (1952) and more recently by Gerstengarbe and Werner (1999). The concept of type definition strongly follows the flow direction of air masses onto central Europe, discerning zonal, mixed and meridional types, which are further discriminated into 10 "Großwettertypes" and on the last hierarchical level into 29 "Großwetterlagen" and one undefined or transitional type. The subjective definition of types and assignment of daily patterns includes the knowledge of the authors about the importance of the specific circulation patterns for temperature and precipitation conditions in central Europe.

OGWL - Objective Grosswetterlagen
An objectivized version of the "Hess and Brezowsky Großwetterlagen" has been produced by James (2007) using only circulation composites (mean MSLP and 500 hPa GPH) of the 29 original types and newly assigning the daily circulation patterns to them. This has been done for winter and summer separately and by applying time filters in order to achieve a minimum persistence of 3 days. In the cost733cat dataset an unfiltered version based solely on MSLP is included.

PECZELY - Hungarian weather types
György Péczely (1924-1984), a Hungarian climatologist and professor at Szeged University, Hungary, originally published his macrocirculation classification system in 1957. The system was defined on the basis of the geographical location of cyclones and anticyclones over the Carpathian basin; however, the position of cold and warm fronts is also considered. Altogether 13 types were composed, which are pooled into 3 superordinate groups of meridional-northern, zonal-western and zonal-eastern types (Péczely, 1957, 1961, 1983).
After the death of György Péczely, one of his followers, Csaba Karossy, continued the coding process (Karossy, 1994, 1997).

PERRET - Alpine weather statistics
The Swiss weather type classification after Perret (1987) is part of the Alpine Weather Statistics, a comprehensive characterization of the regional synoptic situation in a circle with a radius of 2° latitude (ca. 222 km) centered on Switzerland. Here it serves as one piece of information among 33 parameters, including others like the threshold based Schüepp classification (below). The Perret classification shows some concordance with the Hess/Brezowsky classification described above. However, the main idea behind it differs in that it is not the main flow direction that determines the superordinate groups but the intensity and cyclonicity of the circulation within the target area. Thus the main distinction is made between types dominated by upper level flow, types dominated by upper level highs and those dominated by upper level lows. On a second hierarchy level 5 flow directions are discriminated for the first supergroup, which are further characterized as cyclonic or anticyclonic, which leads to a total of 12 types. The upper level high and low groups, on the other hand, are specified by the position of the pressure centers on a second and third detail level, leading to 9 and 10 types and a total of 31 types.

ZAMG - Central Agency for Meteorology and Geodynamics Eastern Alpine weather types
Since the 1950s weather types have been identified on a daily basis at the Central Institute of Meteorology in Vienna, Austria. Weather types according to Baur (1948) and Lauscher (1985), air masses and passages of surface fronts over the eastern Alpine edge have been logged.
The individual knowledge and education of the forecasters led to a mixture of methods being applied during the following years; the classification types of Lauscher and Hess-Brezowsky (1952) are mostly used within the catalog. In addition, 8 circulation types (cyclonic or anticyclonic) and weak gradient situations result in 17 additional classes. Altogether more than 80 different classes were found on handwritten sheets during the process of digitization and were reduced to 43 classes. However, similar class types are still present, e.g. a low over western parts of Europe and a southwesterly anticyclonic flow at the eastern Alpine edge, both types being causally related. Consequently, additional work is needed for further homogenization of the catalog. Besides the weather type classification, the ZAMG catalog also includes the detection of 6 cold and 6 warm air masses at an upper level (< 850 hPa) and a surface level (> 850 hPa). Their possible changes during the day indicate advection without fronts. Further, fronts and frontal passages over the town of Vienna are logged (11 different surface and upper fronts). This manual classification is still done regularly by the night shift forecaster in Vienna around 2200 UTC.

I.2. Threshold based methods
Compared to subjective classifications, threshold based methods define their types indirectly by declaring a borderline between different types in the form of thresholds. Alternatively, the distinction between types can be realized by predefined rules for assignment, which is essentially the same. For example, a distinction can be made between days with a westerly main flow direction over the domain and days with northerly, easterly or southerly direction, where the angle between the sectors represents a threshold or borderline between the types.
In contrast to the subjective methods, the use of thresholds or explicit rules allows for automated classification. However, the term "objective", which is sometimes used to point out the difference to subjective classifications, is debatable, since the predetermination of thresholds and rules also includes subjective decisions. Their advantages are reproducibility and, of course, fast computer based processing.

GWT - Grosswetter-types or Prototype classification
Ten main circulation types are determined according to the so-called Großwetter-types of HBGWL (see above). The basic idea is to characterize them in terms of varying degrees of zonality, meridionality and vorticity of the large-scale MSLP field. Coefficients of zonality (Z), meridionality (M), and vorticity (V) for each case (day) are determined as spatial correlation coefficients between the respective MSLP field and three prototypical SLP patterns representing idealized W-E, S-N, and central low-pressure isobars over the region of interest. The ten main circulation types are then defined by means of particular combinations of these three coefficients. Assignment to central high and low pressure types results from a maximum V coefficient (negative and positive, respectively). The eight directional types are defined in terms of the Z and M coefficients (e.g. Z = 1 and M = 0 for the W-E pattern, Z = 0.7 and M = 0.7 for the SW-NE pattern, and so on), and remaining cases are assigned to one of these types according to the minimum Euclidean distance of their respective Z and M coefficients from those of the predefined types. A subdivision of the directional circulation types into cyclonic and anticyclonic subtypes according to the sign of the V coefficient leads to 18 types, and an even finer partitioning into subtypes of negative, indifferent and positive V coefficient results in 27 circulation types.
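
The ten-type GWT assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cost733cat implementation: the shapes and sign conventions of the three prototype patterns, the rule that a day is a central type when |V| exceeds both |Z| and |M|, and the numeric labels (0 to 7 directional, 8 central high, 9 central low) are all hypothetical choices made for the example.

```python
import numpy as np

def prototypes(ny, nx):
    """Idealized prototype fields (assumed shapes and signs)."""
    y, x = np.meshgrid(np.linspace(-1, 1, ny), np.linspace(-1, 1, nx), indexing="ij")
    zonal = -y              # W-E isobars: pressure falls with increasing row coordinate
    meridional = x          # S-N isobars: pressure rises with increasing column coordinate
    vortex = x**2 + y**2    # central low: minimum pressure at the domain centre
    return zonal, meridional, vortex

def pattern_corr(a, b):
    """Spatial (pattern) correlation between two fields of equal shape."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# Eight directional prototypes in (Z, M) space, 45 degrees apart:
# (1, 0), (0.7, 0.7), (0, 1), ... as in the text.
ANGLES = np.deg2rad(np.arange(0, 360, 45))
DIRECTIONAL = np.column_stack([np.cos(ANGLES), np.sin(ANGLES)])

def gwt10(field):
    """Return a hypothetical type label for one daily MSLP field."""
    zonal, meridional, vortex = prototypes(*field.shape)
    z = pattern_corr(field, zonal)
    m = pattern_corr(field, meridional)
    v = pattern_corr(field, vortex)
    # Assumed rule: a dominant V coefficient marks a central type.
    if abs(v) > max(abs(z), abs(m)):
        return 9 if v > 0 else 8    # 9 = central low, 8 = central high
    # Otherwise: nearest directional prototype in the (Z, M) plane.
    dist = np.linalg.norm(DIRECTIONAL - np.array([z, m]), axis=1)
    return int(np.argmin(dist))
```

Extending the sketch to 18 or 27 types would only require splitting each directional label by the sign (or by three categories) of `v`.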

LITADVE/LITTC - Litynski advection and circulation types
The classification scheme is based on three indices, calculated from gridded MSLP data, for estimating the advection of air masses and the cyclonicity characteristic. Meridional (Wp) and zonal (Ws) indices are calculated as spatially averaged components of the geostrophic wind vector, while cyclonicity (Cp) is estimated as the MSLP value at the central grid point of the domain. Threshold values for these three indices are defined for each day of the year utilizing the respective long-term means and standard deviations, resulting in three categories for each index (negative, indifferent, positive). Circulation types are defined as distinct combinations of index categories for Wp and Ws, leading to nine types characterized by the direction of advection (e.g. Wp=negative and Ws=positive for type NW). Including Cp results in a further subdivision of directional types according to their cyclonicity characteristics into 27 circulation types (e.g. Wp=positive, Ws=indifferent and Cp=negative for type SC). Each case is assigned to one circulation type according to its categorized index values.

LWT2 - Lamb weather types version 2
LWT2 is a modified version of the objective Jenkinson and Collison (1977) system for classifying daily MSLP fields according to the subjectively derived Lamb weather types (Lamb 1950). Daily MSLP fields are classified into 26 circulation types, indicating flow direction and vorticity. Based on a selection of grid points, direction and intensity of flow and vorticity are computed for each daily MSLP field. The appropriate circulation type is determined according to vorticity-flow ratio thresholds, resulting in 8 pure directional types (e.g. W=West), 2 pure anticyclonic / cyclonic types (A, C) and 16 hybrid types (e.g. ANE=North-East, partly anticyclonic).
In contrast to Jenkinson and Collison (1977), no unclassified days are allowed in LWT2 (James 2006), and the vorticity and flow strength thresholds are adjusted so that exactly 33% of days fall into each of the three vorticity classes anticyclonic, indeterminate and cyclonic.

WLK - Objektive Wetterlagenklassifikation
This method is based on the WLK weather type classification by Dittmann et al. 1995 and Bissolli and Dittmann 2003, originally including 40 different types. The alphanumeric output consists of five letters: the first two letters denote the main wind sector number counting clockwise: sector 01=NE, 02=SE, 03=SW, 04=NW, 00=undefined (varying directions). As input parameter the true wind direction (u-, v-components) at 700 hPa is used. Within the domain different weights can be defined. 2/3 of all weighted wind directions are connected to a sector of 10°; these sectors are further assigned to a quadrant (i.e. "SW" = [190°, 280°]). The third and fourth letter denote >A<nticyclonicity or >C<yclonicity at 925 hPa and at 500 hPa, respectively. For each grid point a weighted area mean value of the quasi-geostrophic vorticity is calculated. The fifth letter denotes >D<ry or >W<et conditions, according to a weighted area mean value of precipitable water (whole atmospheric column) which is compared to the long-term daily mean. Further input parameters are the geopotential height, the temperature and the relative humidity at 955, 850, 700, 500 and 300 hPa. The derived method WLKC733 included in cost733cat.1.2 provides a few simplifications: circulation patterns are derived from a simple majority (threshold) of the weighted wind field vectors at 700 hPa. Furthermore, up to nine circulation patterns and one undefined circulation class are computed, though the number can be adapted to the requirements of the application. I.e.
9 main wind sectors are defined counting clockwise (WLKC09): sector 01=345-15°, 02=15-75°, 03=75-105°, 04=105-165°, 05=165-195°, 06=195-255°, 07=255-285°, 08=285-345°, 00=undefined (varying directions); 6 main sectors are used in WLKC28: 01=330-30°, 02=30-90°, 03=90-150°, 04=150-210°, 05=210-270°, 06=270-330°, 00=undefined (varying directions). The last deviation from the original scheme is the replacement of the integrated precipitable water content by the DMO parameter of towering water content. Therefore the number of input parameter fields is reduced.

SCHÜEPP - Alpine weather statistics
Although initially developed as a manual classification scheme (Schüepp 1957, 1968), explicitly focusing on the western Alpine region and Switzerland, this classification is assigned to the threshold based methods, as it utilizes distinct numerically derived thresholds. The classification of days into 40 classes is based on MSLP and 500 hPa data for a spatial domain of 2° radius centered at 46.5°N, 9°E. Circulation types are defined by thresholds with regard to the following variables: surface pressure gradient and wind direction, 500 hPa wind direction and intensity, 500 hPa GPH, vertical wind shear and baroclinicity. The resulting 40 classes are grouped into 3 generic main groups: advective and convective types, characterized by pressure gradients above and below a certain threshold respectively, and mixed types showing intermediate characteristics concerning horizontal and vertical dynamics. The Schüepp classification is only available for the original spatial domain.

II. Methods producing derived types
In contrast to methods utilizing predefined types, the following methods are based on the idea of finding types which are indicated by any structure in the dataset itself. In particular, three superordinate strategies may be discerned.
The first group utilizes principal component analysis (PCA) to determine principal components (PCs) explaining major fractions of the variance of the input data. The patterns to be classified are assigned to classes according to some measure of relation to the principal components. The second strategy is to find leading patterns according to their number of similar patterns within a certain distance around them, while the third strategy is the combinatorial approach of optimizing a partition according to a function, commonly the minimization of within-type variability.

II.1. PCA based methods
The potential of principal component analysis (PCA) as a classification tool was suggested by Richman (1981) and discussed and elaborated in more depth by Gong and Richman (1995). The basic idea of using PCA as a classification tool consists in assigning each case to a principal component according to some rule. However, there are different modes of PCA which differ fundamentally from each other. In the most often used s-mode the results are score time series representing the most important types of data variability in time, while the loadings indicate where and how strongly these time series are realized. Things are reversed in t-mode, where the scores describe important patterns and the loadings reflect the amount of their time variant realization. Thus the t-mode seems more appropriate; however, the s-mode might also be utilized.

TPCA - principal component analysis in t-mode
To classify circulation patterns by PC loadings, PCA has to be used in t-mode with oblique rotation (e.g., Richman 1986, Huth 1993, and Compagnucci and Richman 2008). Here we apply TPCA in a setting similar to Huth (2000): PCA is first conducted on ten subsets of data, the first subset being defined by selecting the 1st, 11th, 21st, etc.
day; the second subset by selecting the 2nd, 12th, 22nd, etc. day, and so on. The solutions are projected onto all data by solving the matrix equation $\Phi A^T = F^T Z$ (here $F$ and $\Phi$ are matrices of PC scores and PC correlations, respectively, $Z$ is the full data matrix, and $A$ are pseudo-loadings to be determined). Each day is classified with that PC (type) for which it has the highest loading. Contingency tables are finally used to compare the ten classifications and, based on the tables, the classification most consistent with the other nine classifications is selected as the resultant one.

P27 - Kruizinga empirical orthogonal function types
The P27 classification scheme developed at the Royal Netherlands Meteorological Institute (Kruizinga 1979; Buishand and Brandsma 1997) utilizes the s-mode variant of PCA. Originally, daily 500 hPa heights were transformed to patterns $p_{tq}$ of reduced seasonal variability by subtracting the long-term daily average from each gridpoint.
The pattern of each day $t$ is approximated as: $p_t \approx s_{1t} a_1 + s_{2t} a_2 + s_{3t} a_3$, $t = 1, \ldots, N$, where $a_1$, $a_2$ and $a_3$ are the first three principal component vectors and $s_{1t}$ to $s_{3t}$ are the respective scores. Instead of using the correlation or covariance matrix, the eigenvectors are calculated from the unadjusted (neither with regard to mean nor variance) product matrix. The amplitudes of the scores of the first three components, representing zonality, meridionality and cyclonality respectively, are subdivided into $n_1$, $n_2$ and $n_3$ equiprobable intervals. Finally each object (day) is assigned to one of the $n_1 \times n_2 \times n_3$ possible interval combinations. The original classification has 3 equiprobable intervals for all components, resulting in 27 classes. However, varying numbers of classes can be achieved by defining different numbers of amplitude intervals (e.g. 3 x 3 x 2 = 18 classes).
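
The P27 steps (eigenvectors of the unadjusted product matrix, scores of the first three components, equiprobable score intervals) can be sketched as follows. This is an illustrative sketch only: the function name `p27_classes`, the input layout (days in rows, gridpoints in columns, long-term daily average already subtracted) and the use of empirical quantiles as interval boundaries are assumptions made for the example, not details taken from the original scheme.

```python
import numpy as np

def p27_classes(anom, bins=(3, 3, 3)):
    """anom: (n_days, n_gridpoints) anomaly patterns; returns class ids from 0."""
    product = anom.T @ anom                    # unadjusted product matrix
    _, eigvec = np.linalg.eigh(product)        # eigenvalues in ascending order
    leading = eigvec[:, ::-1][:, :len(bins)]   # leading eigenvectors a1, a2, a3
    scores = anom @ leading                    # daily scores s_1t, s_2t, s_3t
    class_id = np.zeros(anom.shape[0], dtype=int)
    for k, n in enumerate(bins):
        # Interior quantiles split the scores into n equiprobable intervals.
        edges = np.quantile(scores[:, k], np.linspace(0, 1, n + 1)[1:-1])
        interval = np.searchsorted(edges, scores[:, k], side="right")
        class_id = class_id * n + interval     # combine the interval indices
    return class_id

# Usage with synthetic anomalies (random numbers, for illustration only)
rng = np.random.default_rng(0)
ids = p27_classes(rng.standard_normal((900, 20)))
```

Passing `bins=(3, 3, 2)` yields the 18-class variant mentioned in the text.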
The interpretation of the principal components as representing the zonal and meridional components of the flow and cyclonality does not always hold, depending on the location and the scale of the grid used.

PCAXTR - principal component analysis extreme scores
For this classification approach Varimax rotated s-mode PCA, based on the correlation matrix, is used to build initial centroids based on the score time series. The centroids are calculated by averaging the days that fulfil the principle of the "extreme scores" (Esteban et al. 2005, 2006, 2009): patterns with high score values for a certain component (normally values higher than +2 for the positive phase, or lower than -2 for the negative phase), but with low score values for the remaining components (normally between +1 and -1) are selected. Accordingly, this method yields twice as many classes as retained PCs. However, PCs for which no real case exhibits an "extreme score" are considered an artificial result of the PCA and the corresponding class is eliminated. All days are finally assigned to the remaining centroids according to their minimum Euclidean distance.

II.2. Leader algorithms
Methods based on the so called leader algorithm (Hartigan 1975) were established when computing capacity had become available but was still limited. They try to find key (or leader) patterns in the sample of maps, which are located in the center of high density clouds of entities within the multidimensional phase space spanned by the variables, i.e. grid point values. Thus the aim is close to that of non-hierarchical cluster analysis, but the expensive iterations of the latter are avoided.

LUND - classical leader algorithm
Lund (1963) used a simple linear correlation method to identify frequently appearing, well separated sea-level pressure patterns over the northeastern USA.
To do so Pearson correlation coefficients between all days are calculated and for each day all coefficients of r > 0.7 to any other pattern are counted. The first key pattern (leader) is defined as the day with the largest number of correlation coefficients r > 0.7. This day and all days with a correlation coefficient r > 0.7 to that key day are then removed from the dataset. On the remaining dataset the search for the second and following leaders is done in the same manner, until all types (of the predefined number of types) have a key day. Finally all days are assigned to the key day with which they have the highest correlation coefficient, regardless of the key day to which they had been assigned initially.

ESLP/EZ850 - Erpicum and Fettweis
This classification algorithm is similar to Lund, but differs in the calculation and use of the similarity measure (Fettweis et al. 2009). Another difference is that the final reassignment step is skipped. Thus the final classification of the days is based on their first assignment to a key day. However, to avoid very unequal class sizes a novel scheme varying the threshold is introduced as explained below. To give the same weight to all the grid points of the input maps in the similarity index computation, the data are first normalized over the temporal dimension in a preprocessing step. Then, for each pair of days the similarity index i(day1,day2) = 1 - 1/2 * SUM |Z(day1) - Z(day2)| is calculated, where Z(day) is the vector of pressure values for one day. This index reaches its maximum of 1 if the Z surface is compared with itself and is negative if both Z surfaces are very different. Departing from Lund (1963) the threshold ic for the determination of key patterns is progressively lowered, starting from one and being reduced by a step epsilon for each type, i.e. ic(k) = 1 - epsilon * k; k=1,n, where k is the number of the class and n is the maximum number of types. The whole classification scheme (see above) is repeatedly applied for different values of epsilon in order to optimize the skill score (percentage of explained variance) defined by Buishand and Brandsma (1997). For each run epsilon is increased according to epsilon = 0.05*(r-1); r=1,m, where r is the number of the run and m is the maximum number of runs. Thus, compared to the Lund (1963) classification this procedure is considerably more time demanding, but it includes a mechanism for reducing within-type variability.

KH - Kirchhofer types
The method initially introduced by Kirchhofer (1974) was intended to classify 500 hPa geopotential height patterns over Europe, using as criterion the squared distances between normalized grid-point values. Two patterns were allowed to be classified as similar only if this so called Kirchhofer score was high enough, and if the Kirchhofer scores for a set of user-defined sub-sections of the two patterns were also acceptable. The method implemented here is a variation of the original based on Yarnal (1984), who suggested that the columns and rows of the grid could serve as sub-domains, and Blair (1998), who replaced the squared distances with the equivalent linear correlation coefficients between map patterns. Since this measure is considerably lower than the usual pattern correlation coefficient, the threshold for finding key patterns is set to 0.4. Apart from that the procedure follows exactly the Lund (1963) classification scheme (see above).

II.3. Optimization algorithms
Optimization methods are combinatorial approaches to arrange the whole set of objects (days) within groups (or clusters) in such a way that a certain function is optimized.
Here this function is the within-type variability, measured as the overall sum of the Euclidean distances between the member objects of a type and the average of the type (centroid), which is to be minimized. Most of the included optimization methods are based on the k-means clustering algorithm (e.g. Hartigan 1975). In order to avoid redundancy its principle is described here. K-means starts with an initial partition of the objects (daily pressure maps) and checks one object after the other whether it is in the most similar cluster; if not, the object is shifted to a nearer one in terms of the Euclidean distance between the object and the centroid of the cluster. By doing so, the affected centroids have to be recalculated, which in turn changes the situation for the following checks and makes it necessary to iterate repeatedly through the list of objects and check them again. At some point of this process all objects are in their nearest cluster and no further shift is necessary or possible, i.e. convergence to an optimum has been reached. Most of the following optimization methods differ only in the starting partition or the data preprocessing for k-means. Only the SANDRA and NNW methods use alternative ways of optimization. The PETISCO classification does not really optimize the whole partition in the end but iteratively tries to find optimal leading centroids in an intermediate step, which are finally combined. Thus this method is a hybrid between the leader algorithm and a kind of cluster analysis and might also be assigned to the group of leader algorithms or form a group of its own. For convenience, however, it is listed here beside the actual optimization methods. Likewise NNW does not converge to an optimum since its iterations have been limited to 2000 here.
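
The k-means principle described above can be illustrated with a minimal Python sketch. It is not the code used to produce the catalogs; the function names (`centroid`, `kmeans`), the `max_sweeps` limit and the rule of shifting only when another centroid is strictly nearer are assumptions made for this illustration.

```python
import math

def centroid(members):
    """Mean vector of a non-empty list of equal-length vectors."""
    return [sum(col) / len(members) for col in zip(*members)]

def kmeans(objects, labels, k, max_sweeps=100):
    """Object-by-object k-means from a given starting partition.

    objects: list of vectors (e.g. flattened daily pressure maps)
    labels:  starting partition, one cluster index in range(k) per
             object; every cluster must have at least one member
    max_sweeps is a hypothetical safety limit, not part of the method.
    """
    labels = list(labels)

    def cluster_centroid(c):
        return centroid([o for o, l in zip(objects, labels) if l == c])

    centroids = [cluster_centroid(c) for c in range(k)]
    for _ in range(max_sweeps):
        shifted = False
        for i, obj in enumerate(objects):
            old = labels[i]
            dists = [math.dist(obj, cen) for cen in centroids]
            new = dists.index(min(dists))
            # Shift only if another centroid is strictly nearer.
            if dists[new] < dists[old]:
                labels[i] = new
                # Only the two affected centroids have to be recalculated.
                centroids[old] = cluster_centroid(old)
                centroids[new] = cluster_centroid(new)
                shifted = True
        if not shifted:  # every object is in its nearest cluster
            break
    return labels, centroids
```

Because an object that is the only member of its cluster has distance zero to its own centroid, it is never shifted, so no cluster becomes empty as long as the starting partition has none. The methods below would differ mainly in how `labels` is initialized and how `objects` is preprocessed.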

CKMEANS - k-means by dissimilar seeds
For this classification method the k-means algorithm is initiated using a starting partition based on dissimilar pressure fields of the dataset to be classified, following Enke and Spekat (1997). The initialization takes place by randomly selecting one object (pressure map). The seed for the second cluster is then determined as the object most different from the first, while the seed for the third cluster is the object with the highest sum of the distances to the first two seed-patterns, and so on, until every cluster has one seed-pattern. The starting partition, initially consisting of the k seed-patterns, is then built up in a stepwise procedure. All remaining days are assigned to their most similar class. With each day entering a class, the centroid positions are re-computed. As a consequence the multi-dimensional distance between class centroids continually decreases while the variability within the individual classes of the starting partition, on the other hand, increases. After all days have been assigned, the iterative k-means clustering process is launched (see above). The centroids converge towards a final configuration which has no similarity with the starting partition. In order to retain representativity, classes are kept in the process only if they do not fall short of a certain size, e.g. 5% of all days. Otherwise the class is removed and its contents are distributed among the remaining classes.

PCACA - k-means by seeds from hierarchical cluster analysis of principal components
This method follows some recommendations proposed by Yarnal (1993). In a preprocessing step, a high-pass filter using the 13-day running mean is applied to the input data in order to remove the seasonal cycle.
Afterwards an s-mode PCA (Varimax rotated, using the correlation matrix) is applied to the filtered data to reduce collinearity, simplify the numerical calculations and improve the performance of the subsequent clustering procedure. The daily PC-score time series of the retained PCs (mostly 3 according to a scree test) are the input of the clustering procedure. In order to obtain the starting partition for the k-means procedure (see above) the hierarchical cluster algorithm by Ward (1963) is applied.

PETISCO - leader algorithm with optimized key patterns
This method resembles the leader algorithm in some aspects, but includes an optimization procedure for the classification seeds (Petisco 2005). As for LUND (see above) key patterns are determined among all days but here with a threshold of 0.9 for the pattern correlation r. If more than one level is used (originally both the MSLP and the 500 hPa GPH fields were used), r is the minimum of the correlation coefficients calculated separately for each level. In contrast to the leader algorithm the key pattern is calculated as the centroid of the key day and all days within r > 0.9, the latter called key group in the following. The calculation of the key group centroid and the search for new members of the key group are repeated iteratively until optimized key patterns exist, i.e. they do not change anymore. From these key groups the largest are selected as final types and all remaining days are assigned to them according to their minimum correlation coefficient.

PCAXTRKMS - k-means using PCA derived seeds
For this k-means variant the starting partition is determined by the PCAXTR method described above. Consequently this is the only optimization method with some restrictions on the number of types.
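
How a PCAXTR-type starting partition could be derived from PC scores is sketched below in Python. This is an illustration under stated assumptions, not the authors' implementation: the function names, the strict inequalities and the default thresholds (`high=2.0`, `low=1.0`, taken from the "normally" used values quoted in the PCAXTR description) are choices made for this sketch, and the PC scores are assumed to have been computed beforehand.

```python
import math

def extreme_score_seeds(maps, scores, high=2.0, low=1.0):
    """Build seed centroids from days with "extreme scores".

    maps:   list of daily patterns (vectors of grid-point values)
    scores: list of PC-score vectors, one per day, same order as maps
    A day is extreme for PC j in phase sign (+1 or -1) if
    sign * score_j > high while all other scores are within (-low, +low).
    Returns {(j, sign): centroid}; PC phases without any extreme day
    are omitted, so at most two seeds per retained PC result.
    """
    n_pcs = len(scores[0])
    seeds = {}
    for j in range(n_pcs):
        for sign in (+1, -1):
            days = [m for m, s in zip(maps, scores)
                    if sign * s[j] > high
                    and all(abs(v) < low
                            for i, v in enumerate(s) if i != j)]
            if days:
                seeds[(j, sign)] = [sum(col) / len(days)
                                    for col in zip(*days)]
    return seeds

def assign_nearest(maps, seeds):
    """Assign every day to the seed with minimum Euclidean distance."""
    keys = list(seeds)
    return [min(keys, key=lambda k: math.dist(m, seeds[k])) for m in maps]
```

The list returned by `assign_nearest` corresponds to a PCAXTR-like catalog; for a PCAXTRKMS-like variant it would serve as the starting partition of a subsequent k-means run. The omission of empty phases is what restricts the possible numbers of types.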

SANDRA - simulated annealing and diversified randomization clustering
As discussed in the literature, the k-means clustering algorithm is by design an unstable method (see e.g. Michelangeli et al. 1995, Philipp et al. 2007, Fereday et al. 2008), since it has no strategy to avoid local optima of the optimization function. In contrast, the heuristic simulated annealing algorithm is able to approximate the global optimum. The only difference from k-means is the use of so called wrong reassignments, i.e. objects may be removed from their nearest cluster, depending on a probability P which is high at the beginning but slowly decreases during the optimization process. Thus, if the process is trapped in a local optimum of poor quality some objects can be shifted again, which might lead to an overall improvement in the following steps. In order to slowly reduce P, a control parameter T, which is a huge number at the beginning, is reduced stepwise by a cooling factor C (e.g. C=0.999) from iteration to iteration by calculating T=T*C. P is then calculated as P = exp( (Dcurrent - Dnew)/T ), where Dcurrent is the Euclidean distance of the object to its current cluster and Dnew the distance to a potentially new cluster. If a random number r (0<r<1) is less than P, the wrong reassignment takes effect. At the end, when T is a tiny number and hopefully all poor-quality local optima have been bypassed, only improving reassignments take effect (as in k-means) and convergence is reached. In order to reduce the run time SANDRA uses a relatively low cooling factor (C=0.999) leading to quick convergence, but repeats the whole process 1000 times with randomly diversified starting partitions and a randomized scheme for object and cluster ordering, leading to a diversified chronology of the tests and thus different ways of approaching the global optimum. From these 1000 results the best is finally selected.
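
The acceptance rule and the cooling step described above can be written down in a few lines of Python. This is only a sketch of the formulas P = exp((Dcurrent - Dnew)/T) and T = T*C; the function names, the `rng` parameter and the shortcut for improving moves (added to avoid numerical overflow of the exponential) are assumptions of this illustration and not taken from the SANDRA code.

```python
import math
import random

def accept_shift(d_current, d_new, temperature, rng=random):
    """Decide whether an object is moved to a candidate cluster.

    P = exp((d_current - d_new) / T). Moves to a nearer cluster give
    P >= 1 and are always accepted; "wrong" moves to a more distant
    cluster are accepted only if a random number r in [0, 1) is
    smaller than P.
    """
    exponent = (d_current - d_new) / temperature
    if exponent >= 0:
        return True  # improving (or neutral) move, P >= 1
    return rng.random() < math.exp(exponent)

def cooling_schedule(t_start, cooling=0.999, steps=5):
    """Return the first `steps` values of T under T = T * C."""
    values = []
    t = t_start
    for _ in range(steps):
        values.append(t)
        t *= cooling
    return values
```

With a huge T the exponent is close to zero, so P is close to 1 and almost every wrong reassignment is accepted; as T shrinks, P for wrong reassignments tends to zero and the procedure behaves like k-means.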

SANDRAS - classification of sequences of days with SANDRA
This method differs from SANDRA only by using modified input data. Instead of single daily patterns, three-day sequences are used; thus the history of the development leading to the last day of the sequence is included in the type definition, resulting in types of successions of maps (Philipp 2008). This approach might in principle be applied to all classification methods, but has been included in the dataset only for the SANDRA scheme in order to evaluate its effects in general.

NNW - neural network self organizing maps (Kohonen network)
The neural network architecture proposed for the classification of weather patterns (see e.g. Michaelides et al. 2007, Tymvios et al. 2007, Tymvios et al. 2008a, Tymvios et al. 2008b) is Kohonen's (1990) SOFM (Self-Organising Features Map). The SOFM network has the ability to learn without being shown correct outputs in the sample patterns and is able to separate data into a specified number of categories with only two layers: an input layer and an output layer, the latter consisting of one neuron for each possible output category. The objective is to discover significant features or regularities in the input data X(k,j); k=1, 2, ..., p; j=1, 2, ..., M; where k is a day, p the total number of days, j a gridpoint and M the total number of gridpoints. The input vector X(k), representing the pressure pattern of day k, is connected with each output neuron through weights w(j), j=1, 2, ..., M, which are randomly chosen at the beginning. The output neuron whose weight vector is most similar to the input vector X is the so called winner neuron.
The weight vectors of the winner neuron and of its neighborhood neurons are updated to become more similar to the input pattern following the learning rule, which in the standard Kohonen form reads

$w_i^{new} = w_i^{old} + a\,(X - w_i^{old})$ for $i \in N(I,R)$,

where the neighborhood set N(I,R) of neuron I of radius R consists of the neurons I, I±1, …, I±R, assuming these neurons exist, with the maximum adjustment being around the winner I and decreasing for more distant neurons in terms of the distance within the (hexagonal) network topology. The coefficient a in the above relationship is called the learning factor and decreases to zero as the learning progresses. The radius R of the neighborhood around the winner unit is also relatively large to start with, so as to include all neurons. As the learning process continues, the neighborhood is successively shrunk down to the point where only the winner unit is updated (Kohonen, 2001). Once all vectors in the training set have been presented at the input (one epoch), the whole procedure is repeated; for the cost733cat dataset version 1.2 it was repeated 2000 times. In the end, this algorithm organizes the weights of the two-dimensional map such that topologically close nodes become sensitive to input that is physically similar. These networks are extremely demanding as far as computing power and memory are concerned.

3.) Method configuration and dataset compilation
The different methods reviewed above represent a broad range of strategies for dividing synoptic situations into classes. Apart from the subjective classifications, which are included as a supplement, all are automated and allow for recalculation, which is a basic criterion for later applications since they should be applicable to different regions, datasets, periods etc.
However, in order to allow subsequent comparison studies of the pure effects caused by different classification methods, it is necessary to apply the methods in a standardized way to a standardized input dataset. This should exclude differences in the resulting classification catalogs caused by conditions other than the classification method alone. Therefore unified test conditions have been defined and all automated methods have been recalculated accordingly. The standardization has been applied in two steps, the first concerning the input data source and the temporal and spatial domain, the second additionally specifying the number of types produced and MSLP as the only variable.
The predefined input dataset is the ECMWF ERA40 reanalysis dataset (Uppala et al. 2005), available for the period 09/1957 to 08/2002 in 1° by 1° grid resolution and in 6-hourly time steps, thus excluding discrepancies caused by different input data sources, assimilation or preprocessing schemes. It has been decided to use only 12 o'clock UTC data and to apply all methods to the full year. Another important feature is the clipping of the ERA40 1° by 1° grid. Different spatial scales and regions are covered by a set of 12 unified domains throughout Europe presented in Figure 1 and Table 1. The largest covers the whole of Europe with a reduced grid resolution (domain 0) while the smallest is confined to the greater Alpine area comprising 12x18 grid points (domain 6). All methods have been applied to all 12 domains.

[Figure 1.]

[Table 1.]

Several of the catalogs listed above use other or more variables than MSLP in their original variants. To exclude the use of different variables as one potential source of dissimilarity, in a second step towards unification all methods (except WLK) have been recalculated using only MSLP, again for all domains, and the resulting catalogs have been added to the dataset. Another source of considerable dissimilarities is the number of types. While some methods allow an arbitrary number of types to be chosen (like cluster analysis), others are limited to one or a few numbers, either due to their concept (e.g. division by wind sectors) or for technical reasons (e.g. empty classes). The original numbers of types vary between 4 and 43, which makes it rather difficult to compare the classifications. Preliminary analyses suggested that the quality of classifications, e.g. their power to discriminate surface temperature and precipitation, is strongly sensitive to the number of types (Huth 2009, Beck and Philipp 2009). To eliminate this effect, it has been decided to fix the numbers of types in the classifications to a few distinct numbers. To span the usual range of the numbers of types in circulation classifications, and to allow methods with non-flexible numbers of types (such as the Litinsky (1969) classification) to be included, it has been decided to fix the numbers of types at 9, 18, and 27, with deviations of up to two from these numbers being allowed. Therefore, each method has been run three times where necessary, once for each of the reference numbers of types, again for all spatial domains.
All in all 73 method variants, each of them applied to the 12 spatial domains where possible, are included in cost733cat version 1.2. An overview of the variants of classifications is presented in Table 2. It includes variants initially created by the authors as well as variants produced under the unified classification conditions as described above.
The abbreviations (column 2) reflect the method or the author's name as well as (in the case of variants with unified conditions) the input parameter (i.e. SLP) and the number of types. The number of types (column 3) may be given as a range, which applies to catalogs with different numbers for the 12 spatial domains. At least one unified variant is given with the number of types close to 9, 18 and 27 for all domains. A key reference for more detailed descriptions of the method is added in column 4. Finally the method group is indicated in the last column.

[Table 2.]

The dataset physically consists of ASCII files, named according to the convention in Table 2 (column 2) and the domain number as documented in Table 1. Each file holds 16436 lines, each line representing a day; the date is given by year, month and day in the first three columns of each line, followed by the type number of the day. These single catalog files are organized in directories for each method variant (see Table 2). Additionally, catalog compilations for similar numbers of types are provided in comprehensive files for easier processing in comparison analyses.

4.) Discussion and concluding remarks

A total of 788 weather and circulation type classification catalogs for European regions has been compiled into a consistent dataset. While 8 spatially fixed and mostly not automated classifications are included, 22 automated classification methods in 65 method variants have been applied to 12 standard domains within Europe, including variants for unified classification conditions, in order to provide comparable catalogs for future studies on classification methods and their possible improvement. The various classification methods have been reviewed and systematized according to the strategy for generation of types. In the following the main aspects of these methods are summarized and discussed.
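
For readers who want to work with the compiled catalogs, the file layout given in Section 3 (year, month, day and type number per line) can be read with a few lines of Python. The sketch below assumes whitespace-separated columns, which the description does not state explicitly; the function names and the sample lines are hypothetical and do not come from the dataset.

```python
from collections import Counter
from datetime import date

def parse_catalog(lines):
    """Parse catalog lines of the assumed form 'year month day type'."""
    catalog = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip empty lines
            continue
        year, month, day, ctype = (int(f) for f in fields[:4])
        catalog.append((date(year, month, day), ctype))
    return catalog

def type_frequencies(catalog):
    """Relative frequency of each type number in a parsed catalog."""
    counts = Counter(ctype for _, ctype in catalog)
    return {t: c / len(catalog) for t, c in sorted(counts.items())}

def days_in_period(first, last):
    """Number of days from `first` to `last`, both inclusive."""
    return (last - first).days + 1
```

As a plausibility check, `days_in_period(date(1957, 9, 1), date(2002, 8, 31))` gives 16436, which matches the stated number of lines per file for the period 09/1957 to 08/2002. A real file could be passed to `parse_catalog` through an open file object, since iterating over it yields lines.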
A main distinction is made between methods using predefined types and those producing derived types. Within the former, manual or subjective methods are still in use and have their value due to the integration of the experience of experts, which is often hard to formulate in precise rules for automated classification. Therefore 4 subjective catalogs have been included in the dataset for comparison (HBGWL/HBGWT, PECZELY, PERRET and ZAMG), even though it is unlikely that new manual classifications will be established in the future. Their disadvantages are the sometimes imprecise definition of the types, which may lead to inhomogeneities, and the fact that they are made for distinct geographical regions and are not scalable. The former problem is overcome by the automatic assignment of days according to their pressure patterns, as is done by the OGWL method; the latter problem, however, remains.
Subjective methods are now often replaced by threshold based methods (GWT, LITADVE/LITTC, LWT2, WLK, SCHÜEPP), which have the advantage of automatic processing and a lower degree of subjectivity. However, subjectivity is still present in the definition of the thresholds. Even though some thresholds seem natural (like the symmetric partitioning of the wind rose) it is always an arbitrary decision to use certain thresholds and not others. On the other hand these methods are appealing for their clear and precise definition of types. It also became apparent that the definition and application of thresholds or rules differ considerably between methods, even though their common concept is based on airmass flow.
The second basic strategy for classification is to derive types from the input dataset by data analysis. PCA based methods (TPCA, P27, PCAXTR) have in common the use of eigenvectors; however, these are calculated and applied quite differently and it is debatable whether they can be considered a homogeneous group of methods.
This is especially true for P27, where there is some conceptual similarity to the GWT classification of the threshold group. Methodologically more homogeneous is the group of methods based on the leader algorithm (LUND, ESLP/EZ850, KH), which differ only in using different distance metrics and different thresholds for finding key patterns, and in omitting the final redistribution step for ESLP/EZ850. Perhaps the most coherent group is the group of optimization methods (CKMEANS, PCACA, PETISCO, PCAXTRKMS, SANDRA, SANDRAS, NNW), where only the PETISCO method does not try to minimize the within-type differences at the end and thus might be considered a member of the leader algorithm group as well. Differences between the other methods are confined to the way starting partitions are created, data preprocessing and variations of the optimization algorithm, while their objective is the same and clearly defined.
[-------------------------------------------------------------->
Unification, drawbacks
[The reason for the rareness of subjective WTCs is probably the demand from the majority of applications to use CTCs for relating circulation to target variables (circulation-environment approach after Yarnal 1993), e.g. for downscaling where only reliable circulation data are available.]

[In some cases the assignment of methods to a method group is not unambiguous, e.g. P27 is assigned to the PCA group but is also related to THR methods.
It turned out that PETISCO is now inspired by the leader algorithm (LDR group) and is somewhat misplaced within the OPT group (it seems it was initially a k-means procedure).]

While most of the traditional subjective classifications are made for the full year without subdivisions for different seasons, some applications use separate classifications, e.g. for the four meteorological seasons, which helps to stratify links between circulation and other parameters which are seasonally variable. However, this approach complicates seamless studies on the seasonality of types. For some classification schemes (e.g. the so called GWT) it makes no difference whether the seasons are resolved or not. In the current version of the cost733cat dataset the full year classification approach has been implemented for all classifications, accepting that this might lead to a drop in performance for some applications; however, since this is done for all methods, relative differences in performance should be conserved.

While most of the authors contributing to the catalog dataset originally used mean sea level pressure (MSLP), some methods have been applied to other parameters like the geopotential height of the 850 or 500 hPa level or wind components, humidity and temperature (the three latter parameters only for the WLK classification). ----> Unification

In version 1.2 (the current version at the time of writing) of the cost733cat dataset not all configuration features have been unified so far. Thus methods use different distance measures, mostly Euclidean distance and correlation coefficient, and some include special preprocessing like high-pass filtering and a preceding PCA (e.g. PCACA). It is debatable whether such differences should be regarded as method-specific or should be unified as well, and indeed such considerations are under discussion. Furthermore there are discussions under way on whether the overall quality of the catalogs should be improved, e.g. by using 850 hPa, seasonal classifications or the inclusion of temperature and precipitation as input variables. And last but not least there will be some more catalogs in order to test further developed classification methods in the future.
However, such future modifications of the catalogs will simply be added as additional variants in order to keep the dataset backwards compatible.
<--------------------------------------------------------------]

The list of methods included in the cost733 catalog database is far from complete. However, it is believed that the most common basic methods are represented with some of their variations. Methods not included are mostly those which are specialized for a certain target variable or do not fulfill the criterion of producing distinct and unambiguous classifications of the entities. An example of a method combining both aspects is the fuzzy classification method developed by Bárdossy et al. (2002), which tries to find circulation types explaining the variations of a specified target variable, e.g. station precipitation time series, while the daily maps are members of more than one class. The consequence of optimizing a classification for one or a few target variables is that the resulting catalog is applicable to a small target region but not to climate variations on a larger, e.g. continental, scale. On the other hand one of the attractive properties of classification methods is the transfer from the metric scale to the nominal scale, which is violated by the principle of fuzzy classification. These aspects should not diminish the value of the mentioned methods, since they are usually superior to generalized and discrete methods, but their results are useful for specific purposes only (see e.g. Enke et al. 2005).
However, further studies of the presented classification dataset might show that the idea of generalized classifications including the main links between circulation and climate variability in different or large regions is unrealistic. And the attempt to find generally "better" classification methods is like shooting at a moving target.
If a method is optimal for circulation it may be bad for 733 explaining temperature variations, if it is optimal for temperature it may be bad for precipitation, 734 etc. Thus a universally best method might be unreachable, explaining the large number of attempts 735 to develop more and more different variations of existing methods or to develop completely new 736 ones. It is the hope of the authors that this database reflecting the large variety of classification 737 methods will help to come to a conclusion on whether it is worth the amount of effort spent in the 738 search for the ultimate best method. However, even if the results of the intercomparison studies of 739 the classifications only show that some of the methods are more or less useful in many aspects, the 740 aim of the work on this catalog collection will have been reached. 741 742 743 References 744 745 Bárdossy A. J. Stehlík, and C. Hans-Joachim, 2002: Automated objective classification of daily 746 circulation patterns for precipitation and temperature downscaling based on optimized fuzzy rules. 747 Climate Res., 23, 11–22. 748 749 Baur, F., 1948: Einführung in die Großwetterkunde (Introduction into large scale weather), 750 Dieterich Verlag, Wiesbaden. 751 752 Beck, C., 2000. Zirkulationsdynamische Variabilität im Bereich Nordatlantik-Europa seit 1780. 753 Würzburger Geographische Arbeiten 95. (German) 754 755 Beck, C., Jacobeit, J., Jones, P. D., 2007. Frequency and within-type variations of large scale 756 circulation types and their effects on low-frequency climate variability in Central Europe since 757 1780. Int. J. Climatol. 27, 473-491. 758 759 Bissolli, P., Dittmann, E., 2003: Objektive Wetterlagenklassen. In: Klimastatusbericht 2003. DWD 760 (Hrsg.). Offenbach 2004. 761 762 Blair (1998): The Kirchhofer technique of synoptic typing revisited. Int. J. Climatol. 18: 1625–1635 763 764 Buishand, T.A. 
and Brandsma, T., Comparison of circulation typeification schemes for predicting 765 temperature and precipitation in the Netherlands. International Journal of Climatology 17, 875-889, 766 (1997). 767 768 Cahynová M. and R. Huth, 2009. Enhanced lifetime of atmospheric circulation types over Europe: 769 fact or fiction? Tellus A, DOI: 10.1111/j.1600-0870.2009.00393.x 770 771 CHARALAMBOUS, C., CHARITOU, A., and KAOUROU, F. (2001), Comparative analysis of 772 neural network models: Application in bankruptcy prediction, Annals of Operations Res. 99, 403– 773 425 774 775 KOHONEN, T., (1990), The Self-Organizing Map. Proceedings of IEEE, 78(9), pp. 776 1464-1480 777 778 Compagnucci, R.H., Richman, M.B., 2008: Can principal component analysis provide atmospheric 779 circulation or teleconnection patterns? Int. J. Climatol., 28, 703-726. 780 781 Dittmann, E., Barth, S., Lang, J., Müller-Westermeier, G., 1995. Objektive 782 Wetterlagenklassifikation. Ber. Dt. Wetterd. 197, Offenbach a. M., Germany. 783 784 Ekstrom M, Jonsson P and Barring L. 2002. Synoptic pressure patterns associated with major wind 785 erosion events in southern Sweden. Climate Research 23: 51–66. 786 787 Enke, W., Spekat, A., 1997. Downscaling climate model outputs into local and regional weather 788 elements by classification and regression. Clim. Res. 8, 195-207. 789 790 Enke, W.; Schneider, F.; Deutschländer, Th. (2005):A novel scheme to derive optimised circulation 791 pattern classifications for downscaling and forecast purposes. Theoretical and Applied Climatology, 792 DOI: 10.1007/s00704-004-0116-x. 793 794 Erpicum, M., Mabille, G., Fettweis, X., 2008. Automatic synoptic weather circulation types 795 classification based on the 850 hPa geopotential height. Book of Abstracts COST 733 Mid-term 796 Conference, Advances in Weather and Circulation Type Classifications & Applications, 22-25 797 October 2008 Krakow, Poland, 33. 798 ESTEBAN P, JONES PD, MARTÍN-VIDE J, MASES M. 
(2005) Atmospheric circulation patterns related to heavy snowfall days in Andorra, Pyrenees. International Journal of Climatology 25: 319-329.

Esteban P, Martín-Vide J, Mases M (2006) Daily atmospheric circulation catalogue for Western Europe using multivariate techniques. International Journal of Climatology 26: 1501-1515.

Esteban, P.; Ninyerola, M. and Prohom, M. (2009): Spatial modelling of air temperature and precipitation for Andorra (Pyrenees) from daily circulation patterns. Theoretical and Applied Climatology. In press. Online at: http://www.springerlink.com/content/r2728365g085q007/?p=0713eaeac536434ebc4b237b6677218a&pi=1

Fereday D.R., J.R. Knight, A.A. Scaife, C.K. Folland, A. Philipp, 2008. Cluster analysis of North Atlantic / European circulation types and links with tropical Pacific sea surface temperatures. Journal of Climate, 21(15), 3687-3703. DOI: 10.1175/2007JCLI1875.1

Fettweis, X., Mabille, G., Erpicum, M. and Nicolay, S.: The 1958-2008 Greenland ice sheet surface melt and the mid-tropospheric atmospheric circulation, Climate Dynamics, in preparation, 2009.

Gerstengarbe, F.-W., P. Werner, 1999: Katalog der Großwetterlagen Europas (1881-1998), nach Paul Hess und Helmuth Brezowsky (Catalog of European weather types (1881-1998) after Paul Hess and Helmuth Brezowsky, in German). Potsdam, pp 138. Online at http://www.pik-potsdam.de/~uwerner/gwl/

Gong, X., Richman, M.B., 1995: On the application of cluster analysis to growing season precipitation data in North America east of the Rockies. J. Climate, 8, 897-931.

Hartigan, J.A., 1975. Clustering algorithms, Wiley series in probability and mathematical statistics, New York, 351 pp.

Hess, P., Brezowsky, H., 1952. Katalog der Großwetterlagen Europas. Ber. Dt. Wetterd. in der US-Zone 33, Bad Kissingen, Germany.

Hewitson B, Crane RG (1992) Regional climates in the GISS Global Circulation Model and Synoptic-scale Circulation. J Climate 5: 1002-1011.

Huth, R., 1993: An example of using obliquely rotated principal components to detect circulation types over Europe. Meteorol. Z., 2, 285-293.

Huth, R. (2000): A circulation classification scheme applicable in GCM studies. Theor. Appl. Climatol., 67, 1-18.

Huth R., C. Beck, A. Philipp, M. Demuzere, Z. Ustrnul, M. Cahynová, J. Kyselý, O. E. Tveito (2008): Classification of Atmospheric Circulation Patterns - Recent Advances and Applications. Annals of the New York Academy of Sciences, 1146, pp. 105-152.

Huth R. (2009): Synoptic-climatological applicability of circulation classifications from the COST733 collection: First results. Physics and Chemistry of the Earth, current issue.

James, P. M., 2006. Second Generation Lamb Weather Types – a new generic classification with evenly tempered type frequencies. 6th Annual Meeting of the EMS / 6th ECAC, EMS2006A00441, Ljubljana, Slovenia.

James P.M. (2007): An objective classification for Hess and Brezowsky Grosswetterlagen over Europe. Theor. Appl. Climatol., 88, 17-42.

Jenkinson, A.F., Collison, B. P., 1977. An initial climatology of gales over the North Sea. Synop. Climatol. Branch Memo. No. 62, Meteorological Office, London, UK. 18 pp.

Károssy, Cs., 1994: Péczely’s classification of macrosynoptic types and the catalogue of weather situations (1951-1992). In: Nowinsky, L. (ed): Light trapping of insects influenced by abiotic factors. Part I. Savaria University Press, Szombathely. 117-130.

Károssy, Cs., 1997: Catalogue of Péczely’s macrosynoptic weather situations (1993-1996). In: Nowinsky, L. (ed): Light trapping of insects influenced by abiotic factors. Part II. Savaria University Press, Szombathely. 159-162.

Kirchhofer, W., 1974.
Classification of European 500 mb patterns. Arbeitsbericht der Schweizerischen Meteorologischen Zentralanstalt, 43, Zurich, Switzerland, pp. 1-16.

Kohonen, T. (2001), Self-Organizing Maps, 3rd Ed. (Springer Series in Information Sciences).

Kruizinga, S., 1979. Objective classification of daily 500 mbar patterns. Preprints Sixth Conference on Probability and Statistics in Atmospheric Sciences, 9-10 October 1979, Banff, Alberta, American Meteorological Society, Boston, MA, 126-129.

Lamb, H., 1950. Types and spells of weather around the year in the British Isles: Annual trends, seasonal structure of years, singularities. Quart. J. R. Met. Soc. 76, 393-438.

Lauscher, F., 1985: Klimatologische Synoptik Österreichs mittels der ostalpinen Wetterlagenklassifikation, Arbeiten aus der Zentralanstalt für Meteorologie und Geodynamik, Publikation Nr. 302, Heft 64.

Litynski, J. (1969): A numerical classification of circulation patterns and weather types in Poland. Prace Panstwowego Instytutu Hydrologiczno-Meteorologicznego 97: 3–15 (in Polish).

Lund, I. A., 1963. Map-pattern classification by statistical methods. J. Appl. Meteorol. 2, 56-65.

Martin, E., 2004. Validation of Alpine snow in ERA-40. ERA-40 Project Report Series 14. European Centre for Medium-Range Weather Forecasts, Shinfield, Reading, UK.

Michaelides, S.C., Pattichis, C.S., and Kleovoulou, G. (2001), Classification of rainfall variability by using artificial neural networks, Internat. J. Climatology 21, 1401–1414.

Michaelides, S.C., Liassidou F. and Schizas C.N. (2007), Synoptic classification and establishment of analogues with artificial neural networks. Pure and Applied Geophysics 164, 1347-1364.

Péczely, Gy., 1957: Grosswetterlagen in Ungarn. Kleinere Veröffentlichungen der Zentralanstalt für Meteorologie, No. 30.
Budapest.

Péczely, Gy., 1961: Characterising the meteorological macrosynoptic situations in Hungary (in Hungarian). OMI Kisebb Kiadványai No. 32. Budapest.

Péczely, Gy., 1983: Catalogue of macrosynoptic situations of Hungary in years 1881-1983 (in Hungarian). OMSz Kisebb Kiadványai No. 53. Budapest.

Perret R. (1987): Une classification des situations météorologiques à l'usage de la prévision, Schweizerischen Meteorologischen Zentralanstalt, 46, Zurich, Switzerland, 124 pp.

Petisco, S.E., J.M. Martín and D. Gil (2005): Método de estima de precipitación mediante "downscaling" (A method for precipitation downscaling, in Spanish). Nota técnica n.º 11 del Servicio de Variabilidad y Predicción del Clima, INM, Madrid.

Philipp, A., Della-Marta, P. M., Jacobeit, J., Fereday, D. R., Jones, P. D., Moberg, A., Wanner, H., 2007. Long term variability of daily North Atlantic-European pressure patterns since 1850 classified by simulated annealing clustering. J. Clim. 20, 4065-4095.

Philipp, A., 2008. Comparison of principal component and cluster analysis for classifying circulation pattern sequences for the European domain. Theor. Appl. Climatol. DOI 10.1007/s00704-008-0037-1

Pianko-Kluczynska, K. (2007): New Calendar of Atmosphere Circulation Types According to J. Litynski, Wiadomosci Meteorologii Hydrologii Gospodarki Wodnej, pp. 65-85 (in Polish).

Richman, M.B., 1981: Obliquely rotated principal components: An improved meteorological map typing technique? J. Appl. Meteorol., 20, 1145-1159.

Schüepp, M., 1968. Kalender der Wetter- und Witterungslagen von 1955 bis 1967. Veröffentlichungen der Schweizerischen Meteorologischen Zentralanstalt, 11, 43 pp.

Schüepp, M., 1957. Klassifikationsschema, Beispiele und Probleme der Alpenwetterstatistik. La Meteorologie 4, 291-299.

Schüepp, M., 1979. Witterungsklimatologie (Klimatologie der Schweiz III).
Beihefte zu den Annalen der Schweizerischen Meteorologischen Anstalt 1978, 93 pp.

Tymvios F.S., Costantinides P., Retalis A., Michaelides S.C., Paronis S., Evripidou P. and Kleanthous S. (2007), The AERAS project: data base implementation and Neural Network classification tests. Urban Air Quality Proceedings, 2007.

Tymvios F.S., Savvidou K., Michaelides S.C., Nikolaides K.A. (2008a), Atmospheric circulation patterns associated with heavy precipitation over Cyprus. Geophysical Research Abstracts, Vol. 10, EGU2008-A-00000, 2008.

Tymvios F.S., Savvidou K., Michaelides S.C. (2008b), Association of geopotential height patterns to heavy rainfall events in the eastern Mediterranean. Advances in Geosciences (submitted).

Richman MB (1986) Rotation of principal components. J Climatol 6: 293-335.

Uppala, S.M., Kållberg, P.W., Simmons, A.J., Andrae, U., da Costa Bechtold, V., Fiorino, M., Gibson, J.K., Haseler, J., Hernandez, A., Kelly, G.A., Li, X., Onogi, K., Saarinen, S., Sokka, N., Allan, R.P., Andersson, E., Arpe, K., Balmaseda, M.A., Beljaars, A.C.M., van de Berg, L., Bidlot, J., Bormann, N., Caires, S., Chevallier, F., Dethof, A., Dragosavac, M., Fisher, M., Fuentes, M., Hagemann, S., Hólm, E., Hoskins, B.J., Isaksen, L., Janssen, P.A.E.M., Jenne, R., McNally, A.P., Mahfouf, J.-F., Morcrette, J.-J., Rayner, N.A., Saunders, R.W., Simon, P., Sterl, A., Trenberth, K.E., Untch, A., Vasiljevic, D., Viterbo, P., and Woollen, J. 2005: The ERA-40 re-analysis. Quart. J. R. Meteorol. Soc., 131, 2961-3012. doi:10.1256/qj.04.176

Yarnal B. (1984): A procedure for the classification of synoptic weather maps from gridded atmospheric pressure surface data, Computers & Geosciences, 10, 397-410.

Yarnal B (1993) Synoptic climatology in environmental analysis. Belhaven Press, London.

Figure 1.
(will be finished soon and made available on the wiki)

Table 1: Identification numbers, names and coordinates of spatial domains defined for classification input data. Values in parentheses are the resulting numbers of grid points along each axis.

| # | name | longitudes | latitudes |
|---|---|---|---|
| 00 | Europe | 37°W to 56°E by 3° (32) | 30°N to 76°N by 2° (24) |
| 01 | Iceland | 34°W to 3°W by 1° (32) | 57°N to 72°N by 1° (16) |
| 02 | West Scandinavia | 06°W to 25°E by 1° (32) | 57°N to 72°N by 1° (16) |
| 03 | Northeastern Europe | 24°E to 55°E by 1° (32) | 55°N to 70°N by 1° (16) |
| 04 | British Isles | 18°W to 08°E by 1° (27) | 47°N to 62°N by 1° (16) |
| 05 | Baltic Sea | 08°E to 34°E by 1° (27) | 53°N to 68°N by 1° (16) |
| 06 | Alps | 03°E to 20°E by 1° (18) | 41°N to 52°N by 1° (12) |
| 07 | Central Europe | 03°E to 26°E by 1° (24) | 43°N to 58°N by 1° (16) |
| 08 | Eastern Europe | 22°E to 45°E by 1° (24) | 41°N to 56°N by 1° (16) |
| 09 | Western Mediterranean | 17°W to 09°E by 1° (27) | 31°N to 48°N by 1° (18) |
| 10 | Central Mediterranean | 07°E to 30°E by 1° (24) | 34°N to 49°N by 1° (16) |
| 11 | Eastern Mediterranean | 20°E to 43°E by 1° (24) | 27°N to 42°N by 1° (16) |

Table 2: Overview of methods and variants. Alphabetical list of the abbreviations used to identify individual variants of classification configurations. The "types" column gives the number of types, the "parameters" column the parameters used for classification (SLP: sea level pressure, Z: geopotential height, U, V: zonal and meridional wind components, PW: precipitable water; SFC denotes the surface and numbers denote the respective pressure level), and the "key reference" column the key reference. The "method group" column gives the method group, where SUB means subjective, THR threshold based, PCA based on principal component analysis, LDR leader algorithm and OPT optimization algorithm.
| # | abbreviation | types | parameters | key reference | method group |
|---|---|---|---|---|---|
| 1 | CKMEANSC09 | 9 | SLP | Enke and Spekat (1997) | OPT |
| 2 | CKMEANSC18 | 18 | SLP | Enke and Spekat (1997) | OPT |
| 3 | CKMEANSC27 | 27 | SLP | Enke and Spekat (1997) | OPT |
| 4 | ESLPC09 | 9 | SLP | Erpicum et al. (2008) | LDR |
| 5 | ESLPC18 | 18 | SLP | Erpicum et al. (2008) | LDR |
| 6 | ESLPC27 | 27 | SLP | Erpicum et al. (2008) | LDR |
| 7 | EZ850C10 | 10 | Z850 | Erpicum et al. (2008) | LDR |
| 8 | EZ850C20 | 20 | Z850 | Erpicum et al. (2008) | LDR |
| 9 | EZ850C30 | 30 | Z850 | Erpicum et al. (2008) | LDR |
| 10 | GWT | 18 | SLP | Beck (2000) | THR |
| 11 | GWTC10 | 10 | SLP | Beck (2000) | THR |
| 12 | GWTC18 | 18 | SLP | Beck (2000) | THR |
| 13 | GWTC26 | 26 | SLP | Beck (2000) | THR |
| 14 | HBGWL | 29 | various | Hess and Brezowsky (1952) | SUB |
| 15 | HBGWT | 10 | various | Hess and Brezowsky (1952) | SUB |
| 16 | KHC09 | 9 | SLP | Blair (1998) | LDR |
| 17 | KHC18 | 18 | SLP | Blair (1998) | LDR |
| 18 | KHC27 | 27 | SLP | Blair (1998) | LDR |
| 19 | LITADVE | 9 | SLP | Litynski (1969) | THR |
| 20 | LITTC | 27 | SLP | Litynski (1969) | THR |
| 21 | LITTC18 | 18 | SLP | Litynski (1969) | THR |
| 22 | LUND | 10 | SLP | Lund (1963) | LDR |
| 23 | LUNDC09 | 9 | SLP | Lund (1963) | LDR |
| 24 | LUNDC18 | 18 | SLP | Lund (1963) | LDR |
| 25 | LUNDC27 | 27 | SLP | Lund (1963) | LDR |
| 26 | LWT2 | 26 | SLP | James (2006) | THR |
| 27 | LWT2C10 | 10 | SLP | James (2006) | THR |
| 28 | LWT2C18 | 18 | SLP | James (2006) | THR |
| 29 | NNW | 9-30 | Z500 | Michaelides (2001) | OPT |
| 30 | NNWC09 | 9 | SLP | Michaelides (2001) | OPT |
| 31 | NNWC18 | 18 | SLP | Michaelides (2001) | OPT |
| 32 | NNWC27 | 27 | SLP | Michaelides (2001) | OPT |
| 33 | OGWL | 29 | SLP, Z500 | James (2007) | SUB |
| 34 | OGWLSLP | 29 | SLP | James (2007) | SUB |
| 35 | P27 | 27 | Z500 | Kruizinga (1979) | PCA |
| 36 | P27C08 | 8 | SLP | Kruizinga (1979) | PCA |
| 37 | P27C16 | 16 | SLP | Kruizinga (1979) | PCA |
| 38 | P27C27 | 27 | SLP | Kruizinga (1979) | PCA |
| 39 | PCACA | 4-5 | SLP | Yarnal (1993) | OPT |
| 40 | PCACAC09 | 9 | SLP | Yarnal (1993) | OPT |
| 41 | PCACAC18 | 18 | SLP | Yarnal (1993) | OPT |
| 42 | PCACAC27 | 27 | SLP | Yarnal (1993) | OPT |
| 43 | PCAXTR | 11-17 | SLP | Esteban et al. (2005) | PCA |
| 44 | PCAXTRC09 | 9-10 | SLP | Esteban et al. (2005) | PCA |
| 45 | PCAXTRC18 | 15-18 | SLP | Esteban et al. (2005) | PCA |
| 46 | PCAXTRKM | 11-17 | SLP | Esteban et al. (2006) | OPT |
| 47 | PCAXTRKMC09 | 9-10 | SLP | Esteban et al. (2006) | OPT |
| 48 | PCAXTRKMC18 | 15-18 | SLP | Esteban et al. (2006) | OPT |
| 49 | PECZELY | 13 | various | Peczely (1957) | SUB |
| 50 | PERRET | 40 | various | Perret (1987) | SUB |
| 51 | PETISCO | 25-38 | SLP, Z500 | Petisco (2005) | OPT/LDR |
| 52 | PETISCOC09 | 9 | SLP | Petisco (2005) | OPT/LDR |
| 53 | PETISCOC18 | 18 | SLP | Petisco (2005) | OPT/LDR |
| 54 | PETISCOC27 | 27 | SLP | Petisco (2005) | OPT/LDR |
| 55 | SANDRA | 18-23 | SLP | Philipp et al. (2007) | OPT |
| 56 | SANDRAC09 | 9 | SLP | Philipp et al. (2007) | OPT |
| 57 | SANDRAC18 | 18 | SLP | Philipp et al. (2007) | OPT |
| 58 | SANDRAC27 | 27 | SLP | Philipp et al. (2007) | OPT |
| 59 | SANDRAS | 30 | Z925, Z500 | Philipp (2008) | OPT |
| 60 | SANDRASC09 | 9 | SLP | Philipp (2008) | OPT |
| 61 | SANDRASC18 | 18 | SLP | Philipp (2008) | OPT |
| 62 | SANDRASC27 | 27 | SLP | Philipp (2008) | OPT |
| 63 | SCHUEPP | 40 | SLP, Z500, U/V SFC/500 | Schüepp (1979) | THR |
| 64 | TPCA07 | 7 | SLP | Huth (1993) | PCA |
| 65 | TPCAC09 | 9 | SLP | Huth (1993) | PCA |
| 66 | TPCAC18 | 18 | SLP | Huth (1993) | PCA |
| 67 | TPCAC27 | 27 | SLP | Huth (1993) | PCA |
| 68 | TPCAV | 6-12 | SLP | Huth (1993) | PCA |
| 69 | WLKC09 | 9 | U700, V700 | Dittmann et al. (1995) | THR |
| 70 | WLKC18 | 18 | U700, V700, Z925 | Dittmann et al. (1995) | THR |
| 71 | WLKC28 | 28 | U700, V700, Z925/500 | Dittmann et al. (1995) | THR |
| 72 | WLKC733 | 40 | U/V700, Z925/500, PW | Dittmann et al. (1995) | THR |
| 73 | ZAMG | 43 | various | Lauscher (1985) | SUB |
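
The domain definitions in Table 1 ("start to end by step") can be expanded into explicit grid coordinates. The following is a minimal sketch, not part of the COST733 software: the helper name `axis` and the sign convention (west longitudes as negative degrees) are assumptions made for illustration. It reproduces the numbers given in parentheses in Table 1 for two of the domains.

```python
# Hypothetical helper (not from the COST733 software): expand a
# "start to end by step" axis definition from Table 1 into coordinates.

def axis(start, end, step):
    """Return coordinates from start to end (inclusive) in increments of step.

    Assumed convention: west longitudes are passed as negative degrees.
    """
    count = round((end - start) / step) + 1
    return [start + i * step for i in range(count)]

# Domain 00 (Europe): 37°W to 56°E by 3°, 30°N to 76°N by 2°
lon_europe = axis(-37, 56, 3)
lat_europe = axis(30, 76, 2)

# Domain 06 (Alps): 03°E to 20°E by 1°, 41°N to 52°N by 1°
lon_alps = axis(3, 20, 1)
lat_alps = axis(41, 52, 1)

print(len(lon_europe), len(lat_europe))  # 32 24
print(len(lon_alps), len(lat_alps))      # 18 12
```

The same call can be applied to any other row of Table 1 to check its grid size or to subset a gridded input dataset to that domain.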