Renewable Energy 29 (2004) 2111–2131
www.elsevier.com/locate/renene

Stochastic generation of hourly mean wind speed data

Hafzullah Aksoy*, Z. Fuat Toprak, Ali Aytek, N. Erdem Ünal

Department of Civil Engineering, Civil Engineering Faculty, Istanbul Technical University, Hydraulics Division, Maslak, 34469 Istanbul, Turkey

Received 19 September 2003; accepted 23 March 2004

* Corresponding author. Tel.: +90-212-2856577; fax: +90-212-2856587. E-mail address: haksoy@itu.edu.tr (H. Aksoy).

0960-1481/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.renene.2004.03.011

Abstract

Use of wind speed data is of great importance in civil engineering, especially in structural and coastal engineering applications. Synthetic data generation techniques are used in practice for cases where long wind speed records are required. In this study, a new wind speed data generation scheme based upon wavelet transformation is introduced and compared to the existing wind speed generation methods, namely normal and Weibull distributed independent random numbers, the first- and second-order autoregressive models, and the first-order Markov chain. The results suggest the wavelet-based approach as an alternative to the existing wind speed data generation methods.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Normal distribution; Weibull distribution; Autoregressive models; Markov chain; Wavelet; Hourly mean wind speed

1. Introduction and existing literature

Climatology is defined as a set of probabilistic statements on long-term weather conditions [1], and wind climatology as that branch of climatology that specialises in the study of winds, from which information on extreme winds is provided to structural designers. Such information is also needed by wind energy producers and by engineers who design coastal civil structures, for example breakwaters. From a structural engineering point of view, forecasting the maximum wind speed that is expected to affect a structure during its lifetime is important to the designer. In coastal engineering practice, on the other hand, not only the magnitude but also the directionality of wind becomes important. The duration of wind, in addition to its magnitude and direction, is also required in wind energy production systems, and the amount of energy that can be produced depends upon it.

The information required by structural and coastal engineers or by wind energy producers is related to wind speed data, and is a matter of quality and quantity. The quality of the wind speed data refers to whether the data set is reliable and micrometeorologically homogeneous. A data set is reliable if (i) the measurement instrument performs adequately, (ii) the instrument is not influenced by obstructions and (iii) the atmospheric stratification is neutral. A set of wind speed data is considered micrometeorologically homogeneous if the data set is obtained under identical micrometeorological conditions [1]. The size of the data set (quantity) is related to the time period during which the wind speed data are recorded.

The time period over which wind speed data are recorded is usually shorter than the lifetime of civil engineering structures. Therefore, the worst wind load that the structure is expected to face during its lifetime is determined by modelling the wind speed record in hand. For this, climatological and physical modelling techniques are available. Additionally, probabilistic and stochastic models have been developed, for which the existing literature is reviewed in brief below. The main aim of those techniques is to determine minimum design loads due to wind [2].

Short records of daily, weekly, and monthly highest wind speeds taken at 36 weather stations in the US were empirically analyzed [3] in order to determine design wind speeds.
Short records of hourly mean wind speed data from normal regions in the US were used by Cheng and Chiu [4] to determine the transition probabilities of the Markov chain upon which the methodology in that study was based. This methodology was later extended to tropical cyclone-prone regions [5]. A knowledge-based expert system, similar in principle to these methodologies, was also made available [6,7]. Alternative approaches used in the generation of simulated wind speed time series were compared by Kaminsky et al. [8]. Sfetsos [9] examined adaptive neuro-fuzzy inference systems and neural logic networks and compared them to the traditional autoregressive moving average (ARMA) models. Dukes and Palutikof [10] employed the Markov chain in order to estimate hourly mean wind speed with very long return periods. Another Markov chain based study was conducted by Sahin and Sen [11]. Castino et al. [12] coupled autoregressive processes to the Markov chain and simulated both wind speed and direction. A recent study [13] presents a wavelet-based method to generate artificial wind data. The Weibull distribution has commonly been fitted to hourly mean wind speed data [14,15]. The peaks-over-threshold approach has also been commonly used in the estimation of extreme quantiles of wind speed data [16–19].

2. Methods

In this study, a number of probabilistic and stochastic methods are compared in terms of their ability to reproduce long series of hourly mean wind speed data with the same statistical behaviour as that of the observations in hand. The normal and Weibull probability distribution functions are chosen in order to generate independent and identically distributed random numbers. Autoregressive processes are useful tools in generating data sets in cases where persistency exists.
Persistency means that large values tend to be followed by large values, and small values by small values, so that runs of values of similar magnitude tend to persist throughout the sequence. First- and second-order autoregressive processes are chosen in this study. Another concept commonly employed in wind speed data generation studies is the first-order Markov chain. The results of these methods are compared to those obtained from a newly developed wavelet-based approach. The methods are described below. Only the wavelet-based approach is detailed, whereas the remaining five methods are outlined briefly, as they have been well documented in the literature.

2.1. Normal distribution

Hourly mean wind speed time series are generated by using a sequence of independent random numbers from the normal distribution. The normal probability distribution function is given by

$$f(w) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(w-\mu)^2}{2\sigma^2}\right] \tag{1}$$

where $w$ is the variable (hourly mean wind speed, in this study), $\mu$ the mean value of wind speed, and $\sigma$ the standard deviation of wind speed. A number of computational methods are available for the generation of random numbers with normal probability distribution of mean $\mu$ and standard deviation $\sigma$.

2.2. Weibull distribution

The Weibull distribution is another probability distribution function commonly used for the frequency analysis of wind speed data [14,15]. It is given by

$$f(w) = \frac{a}{b^{a}}\, w^{a-1} \exp\left[-\frac{1}{b^{a}}\, w^{a}\right], \qquad w \ge 0;\; a, b > 0 \tag{2}$$

where $a$ and $b$ are shape and scale parameters, respectively, that can be determined by using either a graphical method or the method of moments. They can also be determined using the method of probability weighted moments (PWMs), for which explicit equations are available. This is the method used in this study for the determination of the parameters. The equations to be used for this purpose are given by

$$a = \frac{\ln(2)}{L_{2,(\ln w)}} \tag{3a}$$

$$b = \exp\left(L_{1,(\ln w)} + \frac{0.5772}{a}\right) \tag{3b}$$

In Eqs.
(3a,b), $L_{1,(\ln w)}$ and $L_{2,(\ln w)}$ are the $L_1$ and $L_2$ moments of the logarithm of the hourly mean wind speed time series. The $L_1$ and $L_2$ moments of a series are given by

$$L_1 = b_0 \tag{4a}$$

$$L_2 = 2b_1 - b_0 \tag{4b}$$

in which $b_0$ and $b_1$ are given by

$$b_0 = \bar{x} \tag{5a}$$

$$b_1 = \sum_{j=1}^{N-1} \frac{(N-j)}{N(N-1)}\, x_j \tag{5b}$$

$x_j$ in Eq. (5b) comes from the time series sorted in descending order as $x_N \le \ldots \le x_i \le \ldots \le x_1$. Detailed information on L moments and the method of PWM is given in [20]. Once the parameters are determined, the generation of Weibull distributed random numbers is a matter of a simple computer code, as the cumulative distribution function of the Weibull distribution can be obtained in closed form.

2.3. AR(1) model

The hourly mean wind speed time series is highly dependent (autocorrelated). This property requires a wind speed data generation model that incorporates the dependence structure of the observations. As mentioned, neither normal nor Weibull distributed random numbers take this property into account, as they are independent, whereas autoregressive models are of correlated type and hence capable of simulating this property of the data series. The use of autoregressive type models is reported very commonly in the literature. In an autoregressive model, the observed sequence of wind speed data $\{w_1, w_2, \ldots, w_t, \ldots\}$ is used to fit a model of the form

$$w_i = \sum_{j=1}^{m} a_j w_{i-j} + e_i \tag{6}$$

where $w$ is the hourly mean wind speed, $a$ the autoregressive coefficient, that is, the model parameter, and $e$ a normally distributed independent random variable. Note that Eq. (6) is written for the $m$th order. The first-order autoregressive [AR(1)] model accommodates only the effect of the previous value in the series. It is the simplest case of Eq. (6), obtained for $m = 1$, and is also called the Markov model. Eq. (6) then becomes

$$y_i = r_1 y_{i-1} + e_i \tag{7}$$

where $y$ is the standardised (zero mean and unit variance) version of the variable and $r_1$ the lag-one serial correlation coefficient of the sequence.

The random component ($e$) in AR(1) is normally distributed with zero mean and a variance of $1 - r_1^2$. The simulation procedure for the process is very simple. It requires only random numbers from a normal distribution to be generated.

2.4. AR(2) model

With an increase in the order of the autoregressive model, the dependence structure in the observations is better preserved. Therefore, the second-order autoregressive [AR(2)] model is preferred to AR(1). This becomes more important in cases where dependence in the data set is very obvious, as in the hourly mean wind speed data. AR(2) is formulated as

$$y_i = \phi_1 y_{i-1} + \phi_2 y_{i-2} + e_i \tag{8}$$

where the autoregressive coefficients $\phi_1$ and $\phi_2$ are given by

$$\phi_1 = \frac{r_1 (1 - r_2)}{1 - r_1^2} \tag{9a}$$

$$\phi_2 = \frac{r_2 - r_1^2}{1 - r_1^2} \tag{9b}$$

in which $r_1$ is the lag-one autocorrelation coefficient and $r_2$ the lag-two autocorrelation coefficient of the wind speed time series. The random component in AR(2) is again normally distributed, with zero mean and variance equal to $1 - R^2$, where

$$R^2 = \frac{r_1^2 + r_2^2 - 2 r_1^2 r_2}{1 - r_1^2} \tag{10}$$

2.5. Markov chain

In this approach, the observed time series is divided into a number of states. A wind speed state contains wind speeds between certain values. For example, State 1 might include wind speeds below 2 m/s, State 2 wind speeds between 2 and 4 m/s, and so on, until the final wind speed state includes all speeds above the highest observed value or a predefined upper limit. The upper and lower limits of the states are highly subjective values. For instance, the hourly mean wind speed data set in this study was divided into 10 states. In another wind speed study [11], states were defined depending upon the standard deviation of the data set: each state was taken as wide as one standard deviation of the observed hourly mean wind speed time series. Dukes and Palutikof [10], on the other hand, used a fixed width of 2 m/s for the states.
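The state definitions described above can be illustrated with a short Python sketch. It assumes equal-width states starting at zero and an open-ended last state; the function name `assign_states` and its arguments are illustrative and do not come from the original study.

```python
import numpy as np

def assign_states(speeds, width, n_states):
    """Map each wind speed (m/s) to a 0-based state index.

    States are equal-width bins [0, width), [width, 2*width), ...;
    the last state is open-ended and collects every higher speed.
    """
    speeds = np.asarray(speeds, dtype=float)
    index = np.floor(speeds / width).astype(int)
    # Clip so that speeds beyond the last lower limit fall into the top state.
    return np.clip(index, 0, n_states - 1)

# Fixed 2 m/s states; index 0 corresponds to "State 1" in the text.
print(assign_states([0.0, 1.9, 2.0, 3.9, 25.0], width=2.0, n_states=5))
```

Passing `width=2.0` mimics the fixed-width choice of [10], whereas passing the standard deviation of the observed series as `width` would mimic the choice of [11].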
In the Markov chain approach, the state of wind speed in the current hour can be defined depending only upon the previous state. This is called the first-order or one-step Markov chain. Two previous states are used in the second-order or two-step Markov chain in determining the current state of the wind speed. Although they are not as common as the first- and second-order Markov chains, higher-order Markov chains can also be used. However, the dramatic increase in the number of their parameters limits their use.

The parameter set of a Markov chain consists of the probabilities of transition from one state to another, which are given in transition probability matrices. The transition probability matrix of a first-order Markov chain with $m$ states can be written symbolically as

$$P = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1m} \\ P_{21} & P_{22} & \cdots & P_{2m} \\ \vdots & \vdots & P_{ij} & \vdots \\ P_{m1} & P_{m2} & \cdots & P_{mm} \end{bmatrix} \tag{11}$$

where $P_{ij}$ is the probability of transition from state $i$ to state $j$. The number of parameters is $m(m-1)$, as the sum of the probabilities is equal to 1 (100%) for each row of the matrix. If $n_{ij}$ is the total number of hours of observation in state $j$ with the previous state $i$, the probabilities of transition from state $i$ to state $j$ can be calculated as

$$P_{ij} = \frac{n_{ij}}{\sum_{j} n_{ij}}, \qquad i, j = 1, 2, \ldots, m \tag{12}$$

The procedure for generating the simulated hourly mean wind speed time series is as follows. First, the cumulative transition probability matrix is calculated by cumulatively summing the probabilities within each row; hence, each row in that matrix ends with 1. Then, an initial state is adopted. No wind (State 1) can, for example, be assumed as the initial state. Using a uniform random number, the next state of wind speed can be determined. If State 1 is obtained as the new state of wind speed, then it is first checked if the wind speed is zero.
If the wind speed is not zero, then a uniform random number is generated from the interval of State 1. If the highest state is found to be the new state of wind speed, then a shifted one-parameter gamma distributed random number is used in order to find the magnitude of the wind speed. The reason for choosing the gamma distribution is discussed in the section where the results obtained from application of the methods are presented. For intermediate states, a uniform random number from the interval of the corresponding state is generated and set as the wind speed at the current hour.

2.6. Wavelet-based approach

A real- or complex-valued continuous function with zero mean and finite variance is called a wavelet [21]. There are many functions that can qualify as wavelets. Some examples of wavelets are Morlet, Mexican hat, Shannon and Meyer. A simple wavelet is the Haar wavelet (Fig. 1), defined as

$$\psi(t) = \begin{cases} 1 & 0 \le t < 1/2 \\ -1 & 1/2 \le t < 1 \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

Fig. 1. Haar wavelet.

Decomposing a signal and then reconstructing it is the basis of the wavelet transform. In this study, the Haar wavelet was used due to its simplicity. Therefore, decomposition of a signal (multiresolution analysis) with the Haar wavelet is considered and explained in detail below.

For a certain value of $k$, let us define $f_k(t)$ as the average of $f(s)$ over an interval of size $2^k$:

$$f_k(t) = \frac{1}{2^k} \int_{2^k l}^{2^k (l+1)} f(s)\, ds, \qquad 2^k l < t < 2^k (l+1) \tag{14}$$

where $k$ and $l$ are integers, $k$ a scale variable ($k > 0$ means stretching and $k < 0$ means contracting of the wavelet) and $l$ a translation variable [21]. For $k = -\infty, \ldots, -1, 0, 1, \ldots, \infty$, $f_k(t)$ is as follows:

$$\begin{aligned}
f_{-\infty}(t) &= f(t) \\
&\;\;\vdots \\
f_{-1}(t) &= 2 \int_{l/2}^{(l+1)/2} f(s)\, ds, & \frac{l}{2} < t < \frac{l+1}{2} \\
f_{0}(t) &= \int_{l}^{l+1} f(s)\, ds, & l < t < l+1 \\
f_{1}(t) &= \frac{1}{2} \int_{2l}^{2(l+1)} f(s)\, ds, & 2l < t < 2(l+1) \\
&\;\;\vdots \\
f_{\infty}(t) &= 0
\end{aligned} \tag{15}$$

The resolution decreases as $k$ increases.
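For a discrete sample, the averaging in Eq. (14) reduces to taking block means. The following Python sketch illustrates this under the assumptions that $k \ge 0$ and that the sample length is a multiple of $2^k$; the function name `block_average` is illustrative and not part of the original paper.

```python
import numpy as np

def block_average(sample, k):
    """Discrete analogue of Eq. (14) for k >= 0.

    Each element is replaced by the mean of the block of 2**k
    successive elements that contains it.
    """
    x = np.asarray(sample, dtype=float)
    width = 2 ** k
    if x.size % width != 0:
        raise ValueError("sample length must be a multiple of 2**k")
    block_means = x.reshape(-1, width).mean(axis=1)
    # Repeat each block mean so the output has the same length as the input.
    return np.repeat(block_means, width)

sample = [1.0, 3.0, 5.0, 7.0]
print(block_average(sample, 0))  # the sample itself
print(block_average(sample, 1))  # means of pairs
print(block_average(sample, 2))  # mean of all four elements
```

Each increase in `k` halves the number of distinct values, which is the loss of resolution referred to in the text.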
The difference between the successive averages $f_{k-1}(t)$ and $f_k(t)$ is defined as a detail function:

$$g_k(t) = f_{k-1}(t) - f_k(t) \tag{16}$$

It can be easily seen that

$$f(t) = \sum_{k=-\infty}^{\infty} g_k(t) \tag{17}$$

According to Eq. (17), the original signal is obtained when all detail functions are summed up. The change in data resolution with change in $k$, the resolution level, can be seen in the upper part of Fig. 2, in which the average of the time series taken at different resolution levels according to Eq. (15) is shown. Note that the data sample used in Fig. 2 has 16 elements. The increase in the ordinates of $f_k(t)$ with decrease in $k$ shows the change (increase) in the resolution. The middle part of Fig. 2 shows the detail functions calculated using Eq. (16) for different resolution levels. Note from Eq. (15) that $f_4(t) = 0$ for all $t$. At the bottom of Fig. 2, $f(t)$, the sum of the four detail functions according to Eq. (17), is seen, and it represents the original data, $f_0(t)$.

Fig. 2. Decomposition and reconstruction of a data sequence.

Eq. (17) is the basis for the generation algorithm explained below. Let us consider a data sample of size $M = 2^K$, where $K$ is a positive integer ($K = 4$ for the sequence in Fig. 2), taken from a stochastic process $f(t)$ with zero mean: $f(1), f(2), \ldots, f(M)$. Define the sample $f_k(i)$ ($k = 0, 1, \ldots, K$; $i = 1, \ldots, M$) consisting of averages of $2^k$ successive elements of the sample. $f_0(i)$ is the original sample and $f_K(i)$ is a sample of all zeros, since the average of the $M$ elements is zero. The detail function $g_k(t)$ has a sample consisting of $M$ elements given by Eq. (16) for $k = 1, 2, \ldots, K$. Thus, for each element $f(i)$ of the original sample, we have $K$ detail function values, $g_k(i)$, corresponding to different resolutions. Choosing randomly from the $M$ elements of each $g_k(t)$, and then summing them up using Eq. (17), one obtains a simulated value for $f(t)$ as

$$f(j) = \sum_{k=1}^{K} g_k(j) \tag{18}$$

where $j$ is the index for generated elements. The generation algorithm is given step by step as follows [22] and is illustrated in Fig. 3 for $K = 4$.

1. In order to obtain the first element of the series ($j = 1$), one $g_k$ value for each resolution level ($k = 1, 2, \ldots, K$, as in Eq. (18)) is chosen randomly from the $M$ values available at that level, and the chosen values are summed up to obtain $f_1$ (Fig. 3).
2. The second element ($j = 2$) is generated by choosing, for each $k$, the $g_k$ coming just after the $g_k$ value chosen in the first step. $f_2$ is obtained by the summation of these (Fig. 3).
3. Data generation is continued in this way for the desired number of elements using, for the generation of each element $f_j$, the detail function values right next to those of the previous step $j - 1$ at each resolution level.

Fig. 3. Construction of a simulated data sequence.

This generation algorithm is a newly developed approach for data simulation purposes. It was first used in non-skewed annual and monthly streamflow data simulation studies [22,23]. The approach was later used for the simulation of the storage capacity of river reservoirs [24]. Suspended sediment discharge series [25] and annual and monthly rainfall data series [26] were also modelled successfully by this approach. The algorithm reproduced the mean, standard deviation and correlation structure of the observed streamflow data sets. When one is interested in the generation of skewed data, it is first required to transform the data to a non-skewed structure, generate them and then transform them back to their skewed structure.

3. Application

The methods were applied to an hourly mean wind speed data set that is introduced in the following subsections. Results obtained from the application of the methods are presented and discussed below. The performance of the methods was measured according to their ability to capture the statistical behaviour of the observed data set.
A comparison of the methods is finally presented.

3.1. Data

Table 1 shows the main statistical characteristics of the data set of hourly mean wind speed taken from the State Meteorological Works' meteorology station in Diyarbakir, a southeastern Anatolian city. The data set is four years long, from 1994 to 1997 (35064 hours in total). The region is normal, as is seen in Table 1. The data set is highly correlated, as expected, and skewed. For the wavelet-based approach, 32768 hours of data, extending from the first hour of April 6, 1994, to the eighth hour of December 31, 1997, were used. This particular window was chosen for no specific reason. Characteristics corresponding to that part of the observed series are also given in Table 1.

3.2. Parameters

The hourly mean wind speed data set used in the study is skewed. This prevents fitting of the normal distribution to the data. Therefore, a power transformation ($y = x^{\theta}$, where $x$ is the raw (untransformed) variable, $y$ the transformed variable, and $\theta$ the transformation coefficient) was adopted in order to obtain non-skewed data, to which the normal distribution can be fitted. The transformation coefficient was obtained as $\theta = 0.38585$ for the data set in the study. As the normal distribution is fitted to the transformed hourly mean wind speed time series (but not to the raw data series), the parameters of the normal distribution are the mean and standard deviation of the transformed hourly mean wind speed time series. Those parameters are presented in Table 2. The normal probability distribution function based upon the determined parameters was fitted to the transformed wind speed data series (Fig. 4). It is seen that the distribution fits very well to the observations as well as to the generated data, which are explained in the following sections.

The Weibull distribution has two parameters ($a$, the shape parameter, and $b$, the scale parameter).
The parameters were determined using the method of L-moments, on which detailed information was given previously. The reason for choosing this method is that explicit equations are available for determination of the parameters of the distribution. The method also has the advantage of being less sensitive to outliers, so that outliers have little effect on the parameter estimates. The only problem with this method is the presence of zero wind speeds, which makes the method inapplicable due to the logarithm included. In order to overcome this problem, zero wind speeds were removed from the observed time series, as their number of occurrences was very small, less than 0.5%. The parameters of the Weibull distribution determined by the method of L-moments are listed in Table 2. Fig. 5 shows the agreement between the observed data and the fitted Weibull probability distribution function. It can be considered a very good fit, although the Weibull probability distribution function gives the mode an occurrence probability slightly lower than that in the observation.

Table 1
Statistical characteristics of observed hourly mean wind speed time series

Statistic                       1 January 1994–          6 April 1994–
                                31 December 1997         31 December 1997
Number of data                  35064                    32768
Mean (m/s)                      2.538                    2.555
Standard deviation (m/s)        1.786                    1.794
Coefficient of variation        0.703                    0.702
Coefficient of skewness         1.285                    1.283
Maximum wind speed (m/s)        14.4                     14.4
Correlation coefficient r1      0.860                    0.861
Correlation coefficient r2      0.732                    0.733
Correlation coefficient r3      0.633                    0.635
Correlation coefficient r4      0.549                    0.551
Correlation coefficient r5      0.476                    0.438

Table 2
Parameter sets of methods

Method          Parameter set
Normal          μ = 1.347 m/s      σ = 0.392 m/s
Weibull         a = 1.583          b = 1.973
AR(1), AR(2)    r1 = 0.820         r2 = 0.688

AR(1) is a parametric model with two parameters ($a$, the autoregression coefficient, and $\sigma_e^2$, the variance of the independent normal variable).
The model requires only the lag-one serial correlation coefficient ($r_1$), as both parameters depend only upon $r_1$. AR(2) has three parameters ($\phi_1$ and $\phi_2$, the autoregression coefficients, and $\sigma_e^2$, the variance of the independent normal variable), all functions of $r_1$ and $r_2$, the lag-one and lag-two serial correlation coefficients listed in Table 2.

Fig. 4. Normal probability distribution function fitted to the observed and simulated random wind speed sequences.

Fig. 5. Weibull probability distribution function fitted to the observed and simulated random wind speed sequences.

Of the six methods, the Markov chain is the one that requires the highest number of parameters. The number of parameters required changes with the number of states used for the wind speed. In this study, 10 states were chosen for the wind speed, each 1.5 m/s wide. This resulted in 90 transition probabilities to be determined from the observed wind speed data set, considering that summation over any row in the transition probability matrix results in 100% probability. The transition probability matrix of the data set is given in Table 3. Not only the transition probabilities, but also the wind speed distribution in each state must be known for this method. In this study, wind speed was assumed to be distributed uniformly over the states except for the last one (the state of highest wind speeds with no upper limit), where the one-parameter gamma distribution was used.
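The estimation of Eq. (12) and the first-order Markov chain generation procedure of Section 2.5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: states are 0-based and equal-width, the special treatment of zero wind speeds in State 1 is left out, and the shape parameter of the shifted one-parameter gamma distribution (`top_shape`) is a hypothetical user-supplied value. All function names are illustrative.

```python
import numpy as np

def transition_matrix(states, m):
    """Estimate Eq. (12), P[i, j] = n_ij / sum_j n_ij, from 0-based state labels."""
    states = np.asarray(states, dtype=int)
    counts = np.zeros((m, m))
    # n_ij: number of hours in state j whose previous hour was in state i.
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # A state never observed as a "previous" state keeps an all-zero row.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def simulate_states(P, n, rng, start=0):
    """Generate n successive states using the cumulative transition matrix."""
    cumulative = np.cumsum(P, axis=1)
    # Guard against rounding in the last column (an all-zero row would thus
    # jump to the last state; such rows should not be reachable in practice).
    cumulative[:, -1] = 1.0
    states = np.empty(n, dtype=int)
    states[0] = start
    u = rng.random(n)
    for t in range(1, n):
        # First state whose cumulative probability exceeds the uniform number.
        states[t] = np.searchsorted(cumulative[states[t - 1]], u[t], side="right")
    return states

def states_to_speeds(states, width, m, rng, top_shape):
    """Uniform speeds inside each state; shifted gamma in the open-ended top state."""
    states = np.asarray(states)
    speeds = (states + rng.random(states.size)) * width
    top = states == m - 1
    # Assumption: gamma with unit scale, shifted to the lower limit of the top state.
    speeds[top] = (m - 1) * width + rng.gamma(top_shape, size=int(top.sum()))
    return speeds

rng = np.random.default_rng(1)
observed_states = np.array([0, 1, 0, 1, 1, 0])
P = transition_matrix(observed_states, m=2)
simulated = simulate_states(P, 10, rng)
print(P)
print(states_to_speeds(simulated, width=1.5, m=2, rng=rng, top_shape=1.0))
```

The explicit loop in `simulate_states` is kept for readability, since each state depends on the previous one.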
In State 1, with the lower limit of zero, the probability of occurrence of zero wind speed was also taken into consideration in order to reproduce the zero wind speeds, although their occurrence was very low.

Table 3
Transition probability matrix of the observed hourly mean wind speed data set

P_ij    j=1     2       3       4       5       6       7       8       9       10
i=1     0.7053  0.2779  0.0144  0.0015  0.0008  0.0001  0.0000  0.0000  0.0000  0.0000
2       0.2405  0.6089  0.1306  0.0153  0.0041  0.0005  0.0000  0.0001  0.0000  0.0000
3       0.0256  0.2839  0.5317  0.1352  0.0178  0.0048  0.0005  0.0003  0.0002  0.0000
4       0.0042  0.0491  0.3116  0.4954  0.1191  0.0179  0.0023  0.0003  0.0000  0.0000
5       0.0008  0.0176  0.0865  0.3486  0.4311  0.0978  0.0168  0.0008  0.0000  0.0000
6       0.0000  0.0089  0.0266  0.1197  0.3437  0.4013  0.0865  0.0111  0.0022  0.0000
7       0.0000  0.0152  0.0076  0.0455  0.1212  0.3561  0.3485  0.1061  0.0000  0.0000
8       0.0000  0.0000  0.0000  0.0526  0.0000  0.2105  0.3684  0.2632  0.0789  0.0263
9       0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.1818  0.3636  0.3636  0.0909
10      0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  1.0000  0.0000

There is no parameter to be listed for the wavelet approach, as it is a nonparametric method. The length of the series to be used in this method is equal to $2^K$, where $K$ is a positive integer, equal to 15 in this study. This corresponds to a series 32768 hours in length. Another requirement for the wavelet approach is that the data set should be of a non-skewed structure. Therefore, the part of the observed series used for the wavelet approach was transformed by using the power transformation with $\theta = 0.3853$.

3.3. Simulation and results

A thousand-year-long (8760000-hour) series was generated for each method. The correlogram, the frequency distribution of maximum wind speeds and the wind duration curve obtained from the simulations are compared to those of the observed series. It is obvious that the hourly mean wind speed time series has a highly dependent structure.
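As an illustration of how such a long, dependent series could be produced with the wavelet-based approach of Section 2.6, the following Python sketch computes the detail functions of Eq. (16) for a sample of length $2^K$ and then follows the three generation steps. It is a sketch under stated assumptions rather than the authors' implementation: the sample is centred on its mean so that $f_K$ vanishes, the power transformation for skewed data is omitted, and the index is assumed to wrap around to the start of the sample when its end is reached, which the text does not specify. The names `haar_details` and `generate_series` are illustrative.

```python
import numpy as np

def haar_details(sample):
    """Return the K detail samples g_k (k = 1..K) of a length 2**K sample, Eq. (16)."""
    x = np.asarray(sample, dtype=float)
    M = x.size
    K = M.bit_length() - 1
    if M < 2 or 2 ** K != M:
        raise ValueError("sample length must be a power of two")
    x = x - x.mean()  # zero mean, so the coarsest average f_K is all zeros
    averages = [x]    # f_0 is the (centred) sample itself
    for k in range(1, K + 1):
        block_means = x.reshape(-1, 2 ** k).mean(axis=1)
        averages.append(np.repeat(block_means, 2 ** k))
    # g_k = f_{k-1} - f_k, stacked as an array of shape (K, M)
    return np.stack([averages[k - 1] - averages[k] for k in range(1, K + 1)])

def generate_series(details, n, rng):
    """Steps 1-3: random start per level, then consecutive detail values."""
    K, M = details.shape
    start = rng.integers(0, M, size=K)            # step 1: one random index per level
    steps = np.arange(n)
    # Steps 2-3: move one element forward at every level (wrap-around is assumed).
    index = (start[:, None] + steps[None, :]) % M
    return details[np.arange(K)[:, None], index].sum(axis=0)  # Eq. (18)

rng = np.random.default_rng(0)
sample = np.arange(16.0)
details = haar_details(sample)
print(np.allclose(details.sum(axis=0), sample - sample.mean()))  # Eq. (17)
print(generate_series(details, 24, rng))
```

Because every level starts at its own random position, the generated values combine details from different parts of the record while each level keeps its own ordering.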
The normal and Weibull distributions, however, are of independent structure (Fig. 6), yet they are very common methods used in generating wind speed data. These methods may be useful in offering, to the structural designer, the highest wind speed that the structure will possibly face during its lifetime. It is seen from Fig. 4 that wind speed data generated by the normal distribution fit the observed series very well. It is seen in Fig. 5 that the Weibull fit is very good as well. Other than those two methods, the AR(1), AR(2), and Markov chain methods appeared to reproduce the dependence structure of the series. However, with increasing lags in time, the success of those methods in reproducing the correlation structure of the series decreases (Fig. 6). Of the six methods studied, the wavelet method was found to be the best in preserving the correlation structure of the series.

Fig. 6. Correlogram of the observed and simulated wind speeds.

Fig. 7. Cumulative frequency diagram of maximum wind speeds.

The annual maximum values of the simulated series are compared in Fig. 7. It is seen that the normal distribution, AR(1) and AR(2) produced similar maxima, whereas the wavelet approach produced higher, the Markov chain slightly lower and the Weibull distribution considerably lower maxima. From the structural engineering point of view, therefore, it is safer to use the wind load due to the maximum wind speed generated by the wavelet-based approach.

Maxima obtained from the Markov chain method should be discussed specifically. There are three vertical jumps (one of them is very obvious) in the cumulative frequency diagram of the maxima of this method, as is seen in Fig. 7. The reason for those jumps can be explained very simply.
It is seen from Table 3 that the probabilities of transition of the wind speed to the highest states are very low, making transition of wind speed to those states almost impossible in the simulated series. It is only possible to make a transition to State 10 if the previous state in the simulated series is either State 8 or State 9. Otherwise State 10 is not simulated. This causes the maximum value of the series to be bounded by the upper limit of State 9, which was taken as 13.5 m/s in this study. A very small jump exists in the frequency curve in Fig. 7 due to this circumstance. Similarly, State 9 can be simulated if and only if the previous state of the wind speed is one of the following states: 3, 6, 8, 9 and 10 (Table 3). The big jump in the cumulative frequency curve in Fig. 7 is due to this situation: when State 9 is not reached, the maxima of the simulated series are bounded by the upper limit of State 8, which was taken as 12 m/s in this study. The third jump at the very beginning of the curve (close to the y-axis of the graph) is due to a similar situation. It is the result of not having simulated wind speeds from State 8, which limits the maximum wind speed to 10.5 m/s, the upper limit of State 7. This drawback of the method can be overcome by forcing the simulated series to have at least one value from the highest state, which results in a maximum wind speed data series generated entirely from the highest state with no upper limit. Such a forcing can be considered quite reasonable, and it does not affect the transition probability matrix, as the number of data is usually very large (of the order of tens of thousands).

Uniformly distributed wind speeds were accepted for the intermediate states, whereas the one-parameter gamma distribution was adopted for the highest state (State 10 in this study). The reason for choosing this distribution is explained below together with a discussion on other distributions.
A distribution with no upper limit should be used for the highest state so that maximum wind speeds higher than those in the observed series can possibly be generated. Therefore, in this study, wind speeds of the highest state were first simulated by using the exponential distribution shifted to that state, as Sahin and Sen [11] did. This is quite a reasonable choice for simulating the wind speeds in that state. However, it was seen that the exponential distribution generated lower maximum wind speeds compared to those generated by the other methods. Therefore, the Gumbel distribution was tested. It was seen that the maximum wind speeds generated by this distribution were too low compared to those obtained by the other methods. The Frechet distribution, which is accepted as the distribution of the maximum wind speeds [1], also generated low maximum wind speeds compared to the other methods. The two-parameter gamma distribution was then fitted, which again resulted in low maximum wind speeds. Finally, the one-parameter gamma distribution was fitted and results comparable to those of the other methods (in Fig. 7) were obtained. The conclusion that can be drawn from those trials is that a one-parameter distribution can fit the highest state better than distributions with two or more parameters. If the standard deviation of the highest state, which is bounded by the lower and upper limits of the state, is included in the generation scheme, then lower maximum wind speeds are generated. Therefore, mean-dependent probability distribution functions are better in the simulation of maximum wind speeds.

The transition probability matrix of the wind speed series simulated by the Markov chain is given in Table 4.
The simulated matrix is almost the same as its observed counterpart given in Table 3, which means that the Markov chain based simulation technique worked very well in the simulation of the state of the wind speed series.

Table 4
Transition probability matrix of hourly mean wind speed data simulated by Markov chain method

P_ij    j=1     2       3       4       5       6       7       8       9       10
i=1     0.7049  0.2782  0.0143  0.0016  0.0009  0.0001  0.0000  0.0000  0.0000  0.0000
2       0.2406  0.6086  0.1309  0.0153  0.0041  0.0005  0.0000  0.0001  0.0000  0.0000
3       0.0255  0.2845  0.5318  0.1346  0.0179  0.0048  0.0005  0.0003  0.0002  0.0000
4       0.0042  0.0491  0.3115  0.4958  0.1191  0.0177  0.0023  0.0003  0.0000  0.0000
5       0.0008  0.0181  0.0860  0.3474  0.4322  0.0982  0.0165  0.0008  0.0000  0.0000
6       0.0001  0.0089  0.0259  0.1206  0.3421  0.4028  0.0861  0.0113  0.0022  0.0000
7       0.0001  0.0156  0.0077  0.0478  0.1227  0.3526  0.3491  0.1045  0.0000  0.0000
8       0.0000  0.0000  0.0000  0.0501  0.0000  0.2197  0.3667  0.2606  0.0777  0.0252
9       0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.1888  0.3619  0.3720  0.0773
10      0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  1.0000  0.0000

The wind duration curve is a graph with time percentage as abscissa and wind speed as ordinate (Fig. 8). It is an important tool used in determining the percentage of time that the wind speed exceeds a specified level. Wind energy production systems use this graph in order to determine the wind energy potential of the region under consideration. A very good fit was obtained in Fig. 8, where the wind duration curves of the six methods were plotted together with the one extracted from the observed series. Although the wind duration curve of the Markov chain method fluctuates around the others, it has a fit that is good enough as well. The first three central moments of the observed and simulated series are given in Table 5. The maximum values and the first five lags of the correlation are also listed.
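The wind duration curve and the statistics compared in Table 5 can be computed directly from a series. The following Python sketch is one way to do so. The function names are hypothetical, and the estimators (division by n for the variance, skewness and autocovariances) are an assumption, since this section does not state which estimators were used.

```python
import math


def wind_duration_curve(speeds):
    """Return (time percentage, speed) pairs.

    Each speed is equalled or exceeded for the paired percentage of the
    record, so the pairs run from the highest speed to the lowest.
    """
    ordered = sorted(speeds, reverse=True)
    n = len(ordered)
    return [(100.0 * (rank + 1) / n, v) for rank, v in enumerate(ordered)]


def summary_statistics(speeds, max_lag=5):
    """Mean, standard deviation, coefficients of variation and skewness,
    maximum, and autocorrelation coefficients for lags 1..max_lag."""
    n = len(speeds)
    mean = sum(speeds) / n
    dev = [v - mean for v in speeds]
    var = sum(d * d for d in dev) / n          # assumed: divide by n
    std = math.sqrt(var)
    skew = sum(d ** 3 for d in dev) / n / std ** 3
    r = [sum(dev[t] * dev[t + k] for t in range(n - k)) / (n * var)
         for k in range(1, max_lag + 1)]
    return {"mean": mean, "std": std, "cv": std / mean,
            "skew": skew, "max": max(speeds), "r": r}


# Hypothetical short record (m/s), for illustration only.
record = [1.0, 2.0, 3.0, 4.0]
curve = wind_duration_curve(record)
stats = summary_statistics(record, max_lag=2)
```

Running both functions on the observed series and on each simulated series yields the quantities needed for a comparison of the kind shown in Fig. 8 and Table 5.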
It is seen that the mean values of the simulated series are almost the same as those of the observations. The wavelet-based method approaches its counterparts with a relative error of 0.3%. Standard deviation and coefficient of variation were best captured by the normal probability distribution and by the AR(1) and AR(2) processes. Skewness coefficient in the wind speed time series was best reproduced by AR(1). Higher maximums were obtained by the methods of AR(1), wavelet and normal distribution and lower maximums by Weibull distribution. Correlation structure, as discussed earlier, was best simulated by the wavelet-based method.

Fig. 8. Wind duration curve of observed and simulated wind speeds.

Table 5
Statistical characteristics of simulated series

Statistic                      Normal   Weibull  AR(1)    AR(2)    Markov   Wavelet
Mean (m/s)                     2.538    2.529    2.537    2.537    2.585    2.566
Standard deviation (m/s)       1.777    1.634    1.776    1.776    2.058    1.833
Coefficient of variation       0.700    0.646    0.700    0.700    0.796    0.714
Coefficient of skewness        1.315    0.980    1.309    1.313    0.983    1.438
Maximum wind speed (m/s)       26.94    16.22    27.30    25.23    21.29    31.13
Correlation coefficient r1     0.0005   0.0003   0.815    0.815    0.715    0.715
Correlation coefficient r2     0.0002   0.0005   0.661    0.677    0.587    0.580
Correlation coefficient r3     0.0005   0.0003   0.536    0.561    0.483    0.523
Correlation coefficient r4     0.0002   0.0001   0.436    0.466    0.399    0.443
Correlation coefficient r5     0.0002   0.0007   0.355    0.387    0.330    0.421

4. Summary and conclusion
In this study, hourly mean wind speed data sets were generated by traditional simulation methods: the normal and Weibull probability distribution functions, the first- and second-order autoregressive processes, and the Markov chain. Additionally, the newly developed wavelet-based approach was used. The normal and Weibull methods generate independent, identically distributed random numbers.
The autoregressive models include the correlation structure of the observations and hence generate dependent series. The Markov chain is a two-step method that first determines the state of the wind speed and then generates its magnitude by using a preselected distribution. All the mentioned methods are parametric, and they therefore require the time series to have a specific probability distribution. This is a drawback of parametric models more than a limitation. A nonparametric model, of which the wavelet approach in this study is one of the best examples, can be applied to data sets with any distribution. However, it should be kept in mind that the wavelet approach works only with sequences of zero skewness. The correlation structure of the observations, the distribution of the maximum wind speeds, the wind duration curve and the statistical features of the series were used to compare the success of the methods. The generation of maximum wind speeds requires special attention in Markov chain based simulation methods. Based upon the application in this study, it is concluded that the uniform probability distribution function is suitable for use in the first and intermediate states. A probability distribution function with no upper limit should be used for the highest state. It is concluded that the one-parameter gamma distribution is good enough for fitting the wind speed data in the highest state of the series for normal regions, such as the one used in this study. Some methods performed better than others in preserving particular characteristics. For example, the wavelet method is clearly the best in preserving the correlation structure of the sequence. This method is as good as the other methods at preserving the other statistical features of the series. Therefore, in conclusion, the wavelet method is proposed as a tool to substitute for the classical generation schemes for the simulation of hourly mean wind speed data.
Acknowledgements
The wavelet approach presented in this study is a result of an earlier cooperation between the first author (H. Aksoy) and Professor M. Bayazit of Istanbul Technical University, Turkey, whom the authors sincerely thank.

References
[1] Simiu E, Scanlan RH. Wind effects on structures. New York: John Wiley & Sons; 1986.
[2] American Society of Civil Engineers. Minimum design loads for buildings and other structures. ANSI/ASCE 7-93 (Revision of ANSI/ASCE 7-88), New York, 1994.
[3] Simiu E, Filliben JJ, Shaver JR. Short-term records and extreme wind speeds. ASCE, Journal of the Structural Division 1982;108(ST11):2571–7.
[4] Cheng EDH, Chiu ANL. Extreme winds simulated from short-period records. ASCE, Journal of Structural Engineering 1985;111(1):77–94.
[5] Cheng EDH, Chiu ANL. Extreme winds generated from short records in a tropical cyclone-prone region. Journal of Wind Engineering and Industrial Aerodynamics 1988;28:69–78.
[6] Cheng EDH. Wind data generator: a knowledge-based expert system. Journal of Wind Engineering and Industrial Aerodynamics 1991;38:101–8.
[7] Cheng EDH, Chiu ANL. An expert system for extreme wind simulation. Journal of Wind Engineering and Industrial Aerodynamics 1990;36:1235–43.
[8] Kaminsky FC, Kirchhoff RH, Syu CY, Manwell JF. A comparison of alternative approaches for the synthetic generation of a wind speed time series. Transactions of the ASME 1991;113:280–9.
[9] Sfetsos A. A comparison of various forecasting techniques applied to mean hourly wind speed time series. Renewable Energy 2000;21:23–35.
[10] Dukes MDG, Palutikof JP. Estimation of extreme wind speeds with very long return periods. Journal of Applied Meteorology 1995;34:1950–61.
[11] Sahin AD, Sen Z. First-order Markov chain approach to wind speed modelling. Journal of Wind Engineering and Industrial Aerodynamics 2001;89:263–9.
[12] Castino F, Festa R, Ratto CF. Stochastic modelling of wind velocities time series. Journal of Wind Engineering and Industrial Aerodynamics 1998;74–76:141–51.
[13] Kitagawa T, Nomura T. A wavelet-based method to generate artificial wind fluctuation data. Journal of Wind Engineering and Industrial Aerodynamics 2003;91:943–64.
[14] Garcia A, Torres JL, Prieto E, De Francisco A. Fitting wind speed distributions: a case study. Solar Energy 1998;62(2):139–44.
[15] Grigoriu M. Estimates of design wind from short records. ASCE, Journal of the Structural Division 1982;108(ST5):1034–48.
[16] Heckert NA, Simiu E, Whalen T. Estimates of hurricane wind speeds by ‘peaks over threshold’ method. ASCE, Journal of Structural Engineering 1998;124(4):445–9.
[17] Lechner A, Simiu E, Heckert NA. Assessment of ‘peaks over threshold’ methods for estimating extreme value distribution tails. Structural Safety 1993;12:305–14.
[18] Pandey MD, Van Gelder PHAJM, Vrijling JK. The estimation of extreme quantiles of wind velocity using L-moments in the peaks-over-threshold approach. Structural Safety 2001;23:179–92.
[19] Simiu E, Heckert NA. Extreme wind distribution tails: a ‘peaks over threshold’ approach. ASCE, Journal of Structural Engineering 1996;122(5):539–47.
[20] Stedinger JR, Vogel RM, Foufoula-Georgiou E. Frequency analysis of extreme events. In: Maidment D, editor. Handbook of hydrology. New York: McGraw Hill Book Co; 1993 [Chapter 18].
[21] Rao RM, Bopardikar AJ. Wavelet transforms, introduction to theory and applications. Reading, MA: Addison-Wesley; 1998.
[22] Bayazit M, Aksoy H. Using wavelets for data generation. Journal of Applied Statistics 2001;28(2):157–66.
[23] Bayazit M, Onoz B, Aksoy H. Nonparametric streamflow simulation by wavelet or Fourier analysis. Hydrological Sciences Journal 2001;46(4):623–34.
[24] Aksoy H. Storage capacity for river reservoirs by wavelet-based generation of sequent peak algorithm. Water Resources Management 2001;15(6):423–37.
[25] Aksoy H, Akar T, Unal NE. Wavelet analysis for modeling suspended sediment discharge. Nordic Hydrology 2004;35:165–74.
[26] Unal NE, Aksoy H, Akar T. Annual and monthly rainfall data generation schemes. Stochastic Environmental Research and Risk Assessment 2004;18(6):in press.