A machine learning approach for analyzing and predicting wind

1 A machine learning approach for analyzing and predicting wind turbine response to 2 diurnal and nocturnal wind fields 3 J. Park1, K. Smarsly2, K. H. Law1 and D. Hartmann3 4 5 6 1 Department of Civil and Environmental Engineering, Stanford University, 473 Via Ortega, Stanford, CA 94305, USA; email: {jkpark11, law}@stanford.edu 7 8 2 9 10 Department of Civil Engineering, Bauhaus University Weimar, Coudraystr. 7, 99423 Weimar, GERMANY; email: kay.smarsly@uni-weimar.de 11 12 13 3 Department of Civil and Environmental Engineering, Ruhr University Bochum, Universitätsstr. 150, 44780 Bochum, GERMANY; email: hartus@inf.bi.rub.de 14 15 ABSTRACT 16 17 Site- and time-specific wind field characteristics have a significant impact on the 18 structural response and the lifespan of wind turbines. This paper presents a machine 19 learning approach towards analyzing and predicting the response of a wind turbine 20 structure to diurnal and nocturnal wind fields. Machine learning algorithms are applied (i) 21 to better understand the changes of wind field characteristics due to atmospheric 22 conditions, and (ii) to gain insights into the wind turbine loads being affected by the wind 23 field. Using Gaussian Mixture Models, the variations in wind field characteristics are 1 24 investigated by comparing the joint probability density functions of selected wind field 25 features. The wind field features are constructed from long-term monitoring data taken 26 from a 500 kW wind turbine in Germany that is used as a reference system. Furthermore, 27 employing Gaussian Discriminant Analysis, representative daytime and nocturnal wind 28 turbine loads are compared and analyzed. 29 30 INTRODUCTION 31 32 Stochastic wind effects can cause fluctuations in the wind turbine responses such as loads, 33 displacement, fatigue damages and power outputs. Typically, recorded site-specific wind 34 field data are described by time-averaged statistics, such as mean wind speed, mean wind 35 direction, turbulence intensity and power exponent describing the vertical profile of the 36 mean wind speed. In addition to the mean statistics, the distributions of such wind 37 parameters are important factors in evaluating the wind turbine responses. Currently, the 38 wind turbine design standard of the International Electrotechnical Commission (IEC 39 61400-1, 2005) uses the Weibull probability distribution to describe only the mean wind 40 speed. In this study, we describe site-specific wind conditions by using multivariate 41 probability density functions (PDFs) for wind parameters. We implement Gaussian 42 Mixture Models (GMMs) to construct the PDFs to study how wind field characteristics 43 vary over the day and night time periods. 44 45 Analyzing wind turbine responses is difficult due to the complex interactions between a 46 wind turbine structure and the stochastic wind flow. Describing the input wind field is 2 47 even more difficult. With recent advances in sensor technologies, however, a wind 48 turbine can now be continuously monitored and the measurements data can be used to 49 gain a better understanding on the dynamic responses of the wind turbine structure. Using 50 the measured wind field as input data and the wind turbine response as output data, the 51 relationship between them can be studied by applying supervised machine learning 52 techniques. In this study, we investigate the influence of wind fields on wind turbine 53 response by constructing the response classification models using Gaussian discriminant 54 analysis. By combining the variation of a wind field captured by the PDF of the wind 55 features and the influences of wind fields on the wind turbine captured by the response 56 classification model, new information can be generated about how different atmospheric 57 conditions affect wind turbine responses. In this study, we elucidate the reason for the 58 different levels of wind turbine response at day and night time periods by relating the 59 time-specific PDFs for the wind features and the wind turbine response classification 60 functions. The machine learning approach is demonstrated using the data sets collected 61 from a 500 kW wind turbine located in Germany. 62 63 This paper is organized as follows: First, the integrated life-cycle management (LCM) 64 framework implemented for the 500 kW wind turbine is presented. The system, which 65 includes two major components, namely a “structural health monitoring (SHM) system” 66 and a “decentralized software system”, has been in operation for over four years. The 67 methodologies for the proposed machine learning approaches are then described. The 68 basic features of Gaussian Mixture Models (GMMs) are introduced and the application of 69 GMMs to construct the multivariate probability density function for the collected wind 3 70 field data is presented. The use of Gaussian discriminant analysis to study the effect of 71 wind field on wind turbine response is discussed. The paper is concluded with a brief 72 summary and discussion. 73 74 WIND TURBINE LIFE-CYCLE MANAGEMENT FRAMEWORK 75 76 This study is based on the monitoring data collected on a 500 kW wind turbine, located in 77 Germany (see Figure 1). The wind turbine has a hub height of 65 m and upwind rotor 78 diameter of 40.3 m. A life cycle management (LCM) framework has been developed for 79 the monitoring and management operation of the wind turbine. The LCM framework, as 80 shown in Figure 2, consists of six interconnected components that are installed at 81 geographically distributed locations: 82 83 1. A SHM system comprises of sensors, several data acquisition units as well as a 84 local computer. The SHM system is installed on the wind turbine to 85 continuously collect not only structural, environmental but also operational 86 data. 87 88 2. A decentralized software system is installed on different computers at the 89 Institute for Computational Engineering (ICE) in Bochum, Germany. The 90 software system is remotely connected to the SHM system to continuously 91 collect, process and archive the recorded data. Additionally, software modules 4 92 have been developed to support automated data management and provide 93 Internet-enabled remote access to the monitoring data. 94 3. An agent-based self-diagnostic system for detecting sensor malfunctions has been implemented to ensure reliability and availability of the SHM system. 95 96 97 4. A model updating component is included to continuously update the 98 computational wind turbine models and to support damage detection analyses 99 based on continuously updated monitoring data. 100 101 5. A management module is incorporated to enable online life-cycle 102 management through remote data access and analyses of the structural, 103 environmental, and operational conditions of the wind turbine. 104 105 6. A 32-node PC cluster, also located at the ICE in Bochum, is integrated into 106 the LCM framework to provide high-performance parallel computing 107 capabilities for solving computationally intensive calculations. 108 5 109 110 Figure 1. Monitored wind turbine and anemometers used. 111 112 Details of the components of the LCM framework have been previously described by 113 Smarsly et al. (2012a,b,c, 2013a,b) and Hartmann et al. (2011). This long term life cycle 114 management system provides valuable information that can be used not only to monitor 115 the integrity of the wind turbine structure, but also to support new research and to test 116 algorithmic developments. 117 applicability of machine learning algorithms for analyzing the relationships between wind 118 turbine response and wind field characteristics. The following briefly discusses the two 119 basic components that are related to the collection and management of the measurement 120 data, namely, the “SHM System” and the “Decentralized Software System”. We then 121 discuss the data sets employed in this study. Using the monitoring data, this study investigates the 6 122 123 Figure 2.Architecture of the LCM framework. 124 125 Structural health monitoring system 126 127 A SHM system is developed and installed inside and outside of the zinc-coated steel 128 tower as well as at the concrete foundation of the wind turbine structure (Lachmann et al. 129 2009). The SHM system consists of a network of sensors, data acquisition units (DAUs) 130 and an on-site server installed in the maintenance room of the wind turbine. Six three- 131 dimensional accelerometers, type PCB-3713D1FD3G, are mounted on five different 132 levels on the inner surface of the wind turbine tower. The sensitivity of the 133 accelerometers is 700 mV/g with a measurement range of ± 3.0 g and a frequency range 134 between 0 and 100 Hz. Three additional accelerometers are placed at the foundation of 135 the wind turbine. Since very small accelerations are expected at the foundation, single- 7 136 axis piezoelectric seismic ICP accelerometers of type PCB-393B12 with a sensitivity of 137 10,000 mV/g are used; their measurement range amounts to ± 0.5 g. Furthermore, six 138 inductive displacement transducers HBM-W5K, having a range of 5 mm, are mounted at 139 21 m and 42 m height, inside the tower. To determine temperature effects on the 140 displacement measurements, the inductive displacement transducers are complemented 141 by RTD surface sensors Pt100 capable of measuring temperature from –60 °C to +200 °C. 142 Additional temperature sensors are placed inside and outside the tower to gauge 143 temperature gradients. Last but not least, two anemometers are deployed. The first one, a 144 cup anemometer that is directly connected to the integrated supervisory control and data 145 acquisition (SCADA) system of the wind turbine, is installed on top of the nacelle at 67 146 m height (Figure 1). The second one, a three-dimensional ultrasonic anemometer, is 147 mounted on a telescopic mast adjacent to the wind turbine at 13 m height; it continuously 148 monitors the horizontal and vertical wind speed (0...60 m/s), the wind direction (0...360°) 149 and the air temperature (–40...60°C). 150 151 The sensors are controlled by the DAUs that are connected to the on-site server located in 152 the maintenance room of the wind turbine. For the acquisition of temperature data, three 153 4-channel Picotech RTD input modules PT-104 are deployed. For the acquisition of 154 acceleration and displacement data, four Spider8 measuring units are used. Each Spider8 155 unit has separate A/D converters ensuring simultaneous measurements at sampling rates 156 between 1 Hz and 9,600 Hz. All data sets, being sampled and digitized, are continuously 157 forwarded from the DAUs to the on-site server for temporary storage. In addition to the 158 structural and environmental sensor data collected by the DAUs, operational wind turbine 8 159 data, such as power production and rotational speed, is taken directly from the wind 160 turbine’s internal SCADA system. 161 162 All recorded data sets, referred to as “primary monitoring data”, are forwarded 163 continuously from the data acquisition units to the on-site server for temporary storage 164 and periodic backups. Using a permanently installed DSL connection, the on-site server 165 transfers the primary monitoring data to the decentralized software system of the LCM 166 framework. 167 168 Decentralized software system 169 170 Installed at different computers at ICE in Bochum, the decentralized software system 171 serves two basic purposes. First, it provides a persistent storage for the data recorded 172 from the wind turbine supporting automated data management and processing. Second, it 173 includes a set of program modules enabling remote access to the LCM framework. 174 175 The collected primary monitoring data (or “raw data”) is transferred from the on-site 176 server to the main server, and then, to the MySQL central monitoring database system 177 both installed at the ICE. The data transmission is automatically executed by a time-based 178 UNIX utility running on the on-site server that ensures the periodic execution of tasks 179 according to prescribed scheduled time intervals. On the main server at ICE, the primary 180 monitoring data is converted into “secondary monitoring data” that can easily be 181 interpreted by human users. The primary monitoring data is synchronized and aggregated. 9 182 Metadata is added to provide specifications of the installed sensors, the IDs of data 183 acquisition units, output specification details, date and time formats, etc. Furthermore, 184 “tertiary monitoring data”, summarizing the basic statistics of the data sets, such as 185 quartiles, medians and means, is computed at given time intervals. The secondary and 186 tertiary monitoring data is backed up in a RAID-based storage system and then stored in 187 the monitoring database. 188 189 Once the data is successfully converted and persistently stored, an acknowledgement is 190 sent from the main server at ICE to the on-site server in the wind turbine, whereupon the 191 respective primary monitoring data on the on-site server is deleted. In order to avoid 192 inconsistencies, all data sets involved are automatically “locked” during the conversion 193 process to prohibit unacceptable accessed by external software programs or by human 194 users. The data conversion process described is implemented using the commercial tool 195 “PDI” (Pentaho Data Integration), which offers metadata-driven conversion and data 196 extraction capabilities (Roldan 2009, Castors 2008). Descriptions of performance tests 197 validating the automated data conversion process have been documented by Smarsly and 198 Hartmann (2009a, 2009b, 2010). 199 200 Remote access to the monitoring data is provided in two ways, through a web interface 201 and a direct database connection. In particular, the web interface, installed at ICE in 202 Bochum, offers problem dependent GUIs for remotely visualizing, exporting and 203 analyzing the monitoring data, while the database connection allows accessing the data 204 directly. The web interface is implemented as a web service written in Java; it is based on 10 205 Adobe Flex, an open-source framework for developing Rich Internet Applications (RIAs), 206 using the Adobe Flash platform (Adobe 2007, Polanco and Pedersen 2009). Figure 3 207 shows the web interface exemplarily displaying the monitoring data collected over 4 208 hours, on October 1, 2013. Using the control panel on the right hand side and the menu 209 bar on top of the GUI, the user can select the data collected by specific sensors, specify 210 the time intervals to be plotted, conduct online data analyses, and export data sets. Figure 211 3 displays a time history of horizontal acceleration data collected by a Spider8 measuring 212 unit through a 3D accelerometer that is installed at 63 m height of the wind turbine tower. 213 In addition to the acceleration data, a time history of horizontal wind speed (labeled U67), 214 collected by the cup anemometer installed on top of the wind turbine nacelle at 67 m 215 height, is shown. 216 217 218 Figure 3. Web interface providing remote access to the LCM framework. 219 11 220 Data sets used in this study 221 222 The machine learning approaches to be discussed employ various inputs (i.e. wind field 223 statistics) and the corresponding outputs (i.e. structural response data) in order to 224 understand the wind characteristics and to make reliable predictions on the structural 225 wind turbine behavior, by “learning” from the existing input-output patterns. As input, 226 long-term wind field statistics computed from the monitoring data recorded by the 227 anemometers described above are used in this study. Specifically, the input feature vector 228 ̅67 , 𝑈 ̅13 , (𝑄3 − 𝑄1 )13 ) consists of three statistical features, where 𝒙 = (𝑥1 , 𝑥2 , 𝑥3 ) = (𝑈 229 ̅67 is the mean of a 20-minute wind speed time series recorded at 67 m height by the cup 𝑈 230 ̅13 is the mean of a 20-minute wind speed time series obtained at 13 m anemometer, 𝑈 231 height by the ultrasonic anemometer, and(𝑄3 − 𝑄1 )13is the interquartile, the difference 232 between the 75 percentile and the 25 percentile for the data distribution over the 20- 233 minute wind speed time series recorded at the 13 m height. Since the monitoring data 234 stored in the database system includes only the statistical mean and quartile data for each 235 20-minute period, the interquartile is used in this study as a measure to quantify the level 236 of turbulence. 237 238 For the output response, the acceleration of the tower recorded by one of the 239 accelerometers located at 21 m height is used. Specifically, the interquartile values for the 240 20-minute acceleration time series are employed as an indirect measure for the averaged 241 magnitude of the wind turbine tower acceleration. Assuming the acceleration time series 242 are of zero or constant mean values, the interquartile value is a reasonable measure 12 243 analogous to the RMS value of the acceleration time series, which is commonly used for 244 quantifying the averaged magnitude of acceleration response of on a structure. For 245 classification, the interquartile values are evenly divided and categorized into 𝑁𝐶 246 discretized classes based on their magnitude; a response class is denoted by 𝑦𝐶 , 𝑦𝐶 ∈ 247 {1, … , 𝑁𝐶 }. In total, 7,960 pairs of 𝒙 (input feature vector) and 𝑦𝐶 (the interquartile output 248 class), provided by the LCM framework, serve as the basis for this study. 249 250 CHARACTERIZATION OF WIND FIELDS 251 252 Gaussian mixture models (GMMs) can be regarded as a statistical unsupervised learning 253 to extract the underlying hidden structure by estimating the probability density of features 254 from a statistical data set. Due to their capability of representing different probability 255 distributions, GMMs have been widely used in computer vision and speaker recognition 256 (Stauffer and Grimson 1999; Reynolds et al., 2000). GMMs have also been applied to 257 describe the stochastic load variations in a distributed energy grid (Singh et al., 2010; 258 Gonzalez-Longatt et al., 2012). In this study, GMMs are used to study the wind field 259 characteristics for different time periods of a day by constructing the multivariate 260 probability density function based on diurnal and nocturnal wind statistics. 261 262 Gaussian Mixture Models 263 264 A Gaussian mixture model (GMM) can be regarded as a multivariate probability density 265 function written in terms of the weighted sum of the Gaussian probability density 13 266 functions. Due to the combination of multiple Gaussian functions, a GMM can represent 267 a complex multi-modal probability density function. For the statistical wind input feature 268 ̅67 , 𝑈 ̅13 , (𝑄3 − 𝑄1 )13 ) , the multivariate PDF 𝑓𝑋 (𝒙) is vector 𝒙 = (𝑥1 , 𝑥2 , 𝑥3 ) = (𝑈 269 expressed in terms of a linear combination of 𝐾 probability density functions: 𝐾 𝑓𝑋 (𝒙) = ∑ 𝜙𝑗 𝑝𝑗 (𝒙) (1) 𝑗=1 270 where 𝑝𝑗 (𝒙) is the Gaussian probability density fucntion (GPDF) for the 𝑗th mixture 271 component of the GMM and 𝜙𝑗 , ∑𝐾 𝑗=1 𝜙𝑗 = 1, is the corresponding mixture weight. The 272 𝑗th GPDF 𝑝𝑗 (𝒙) can be fully described by its mean vector 𝝁𝑗 and covariance matrix 𝚺𝑗 273 (Reynolds et al., 2000): 𝑝𝑗 (𝒙) = 1 1 𝑇 exp⁡(− (𝒙 − 𝝁𝑗 ) 𝚺𝑗−1 (𝒙 − 𝝁𝑗 )) 2 √(2𝜋)𝑛 |𝚺𝑗 | (2) 274 where 𝑛 is the dimension of input feature vector 𝒙. To construct the PDF 𝑓𝑋 (𝒙), the 275 parameters 𝝓 = {𝜙1 , … , 𝜙𝐾 }, 𝝁 = {𝝁1 , … , 𝝁𝐾 } and 𝚺 = {𝚺1 , … , 𝚺𝐾 } are determined from 276 (𝑖) ̅ (𝑖) (𝑖) ̅67 the measurement (training) data set 𝒙(𝑖) = ⁡ (𝑈 , 𝑈13 , (𝑄3 − 𝑄1 )13 ), 𝑖 = 1, … , 𝑚𝑢 , where 277 𝒙(𝑖) is the 𝑖th input feature vector and 𝑚𝑢 denotes the size (number of input feature 278 vectors) of the training data set. 279 Assuming the input vectors 𝒙(𝑖) , 𝑖 = 1, … , 𝑚𝑢 , in the training data set are independent, 280 the parameters 𝝓, 𝝁⁡and⁡𝚺 can be estimated as the values that maximize the following 281 log-likelihood estimation from the measured wind input feature data as: 14 𝑚𝑢 𝑙(𝝓, 𝝁, 𝚺) = ln ∏ 𝑝( 𝒙(𝑖) ; ⁡𝝓, 𝝁, 𝚺) 𝑖=1 𝑚𝑢 ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡= ∑ ln 𝑝(𝒙(𝑖) ; ⁡𝝓, 𝝁, 𝚺) ⁡ 𝑖=1 𝑚𝑢 (3) ⁡𝐾 ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡= ∑ ln ( ∑ 𝑝⁡(𝒙(𝑖) , 𝑧 (𝑖) ; 𝝁, 𝚺, 𝝓)) 𝑧 (𝑖) =1 ⁡𝐾 𝑖=1 𝑚𝑢 282 ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡= ∑ ln ( ∑ 𝑝(𝒙(𝑖) |𝑧 (𝑖) ; 𝝁, 𝚺)𝑝(𝑧 (𝑖) ; 𝝓)) 𝑧 (𝑖) =1 𝑖=1 283 284 where 𝑝(𝒙(𝑖) ; 𝝓, 𝝁, 𝚺) is the probability of the input feature 𝒙(𝑖) given the parameters, 285 𝝓, 𝝁⁡and⁡𝚺. The latent random variable 𝑧 (𝑖) specifies one of the 𝐾 possible Gaussian 286 probability distributions from which 𝒙(𝑖) is drawn. Because 𝑧 (𝑖) is a random vairable 287 individually assigned to each 𝒙(𝑖) , it follows different probablity distributions over the 𝐾 288 possible GPDFs. If 𝑧 (𝑖) is explicitly known apriori, Eq. (3) can be expressed in terms of 289 the parameters of GMM and optimized with repsect to the parameters. 290 291 Since 𝑧 (𝑖) is unknown, Eq. (3) cannot be explicitly defined; therefore, the parameters of 292 the GMM are difficult to estimate. In this study, an iterative expectation maximization 293 (EM) algorithm is employed to find the maximum likelihood estimates of the statistical 294 paraemters (Render and Walker, 1984). In EM, the lower bound of the log-likelihood 295 function 𝑙(𝝓, 𝝁, 𝚺) in Eq. (3) is represented as: 296 15 𝑚𝑢 𝐾⁡ ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡𝑙(𝝓, 𝝁, 𝚺) = ∑ ln ∑ 𝑝 (𝒙(𝑖) , 𝑧 (𝑖) ; ⁡𝝓, 𝝁, 𝚺) 𝑖=1 𝑧 (𝑖) =1 𝑚𝑢 𝐾⁡ 𝑝(𝒙(𝑖) , 𝑧 (𝑖) ; ⁡𝝓, 𝝁, 𝚺) ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡= ∑ ln ∑ 𝑞𝑖 (𝑧 ) 𝑞𝑖 (𝑧 (𝑖) ) (𝑖) (𝑖) 𝑖=1 𝑧 𝑚𝑢 𝐾⁡ (4) =1 𝑝(𝒙(𝑖) , 𝑧 (𝑖) ; ⁡𝝓, 𝝁, 𝚺) ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡≥ ∑ ∑ 𝑞𝑖 (𝑧 ) ln 𝑞𝑖 (𝑧 (𝑖) ) (𝑖) (𝑖) 𝑖=1 𝑧 =1 297 where 𝑞𝑖 (𝑧 (𝑖) ), ∑𝐾⁡ 𝑞 (𝑧 (𝑖) ) = 1, denotes the probability that the 𝑖th input feature 𝒙(𝑖) 𝑧 (𝑖) =1 𝑖 298 is drawn from the 𝑧 (𝑖) th GPDF among the 𝐾 GPDFs. If we let 𝑌(𝑧 (𝑖) ) = 299 the inequality in Eq.(4) is always true for any probability distribution 𝑞𝑖 (𝑧 (𝑖) ) because, by 300 the Jensen’s inequaltiy, ln E[𝑌(𝑧 (𝑖) )] ≥ E[ln 𝑌(𝑧 (𝑖) )] (Boyd and Vandenberghe, 2004). 𝑝(𝒙(𝑖) ,𝑧 (𝑖) ;⁡𝝓,𝝁,𝚺) , 𝑞𝑖 (𝑧 (𝑖) ) 301 302 The EM algorithm consists of two iterative steps. For the “E-step” the probability 303 distribution 𝑞𝑖 (𝑧 (𝑖) = 𝑗)⁡for⁡𝑗 = 1, … , 𝐾 is computed (i.e., the GPDF from which 𝒙(𝑖) is 304 drawn) using the current feature vector 𝒙(𝑖) and the parameters, 𝝓, 𝝁⁡and⁡𝚺, estimated in 305 the preceding (M-step): 𝑞𝑖 (𝑧 (𝑖) = 𝑗) = 𝑝(𝑧 (𝑖) = 𝑗|𝒙(𝑖) ; 𝝓, 𝝁, 𝚺) ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡= 𝑝(𝒙(𝑖) |𝑧 (𝑖) = 𝑗; 𝝁, 𝚺)𝑝(𝑧 (𝑖) = 𝑗; 𝝓) 𝑝(𝒙(𝑖) ; 𝝓, 𝝁, 𝚺) (5) 306 Eq. (5) can be easily evaluated using Eq. (2) because here we assume 𝑧 (𝑖) = 𝑗. The “M- 307 step” then updates the parameters, 𝝓, 𝝁⁡and⁡𝚺, that maximize the lower bound of the log- 308 likelihood function in Eq. (4) by using 𝑞𝑖 (𝑧 (𝑖) ) estimated in the preceding E-step. The E- 16 309 and M-steps are repeated until the parameters, 𝝓, 𝝁⁡and⁡𝚺, converge to some acceptable 310 values. 311 312 Application to wind fields 313 314 To study the diurnal and nocturnal wind field characteristics at the wind turbine site, we 315 construct the PDFs from the wind data collected from the anemometers. In this study, we 316 set 𝑡1 = 6: 00⁡AM⁡~⁡6: 00⁡PM (daytime) and 𝑡2 = 6: 00⁡~⁡to⁡6: 00⁡PM (night time). A 317 total number of 7,960 input feature vectors, measured for a period of 110 days, are 318 categorized into the two time periods 𝑡1 and 𝑡2 . Two data sets (each with 3,980 feature 319 vectors) are then used to construct the time specific PDFs, 𝑓𝑋 (𝒙|𝑡1 ) and 𝑓𝑋 (𝒙|𝑡2 ). The 320 wind input feature data for the two time periods are plotted as shown in Figure 4. It can 321 be seen that the wind field characteristics for the two time periods are quite different 322 because of the temperature gradients with respect to the elevation; the wind speeds and 323 the levels of turbulence are usually higher during the day than during the night due to 324 more active airflow mixing (Stull, 1988). 17 ̅67 and 𝑈 ̅13 Figure 4. Wind data collected during day (circle) and night (cross) times. 𝑈 denote the mean of a 20-minute wind speed time series data at 67 m height and at 13 m height, respectively.⁡(𝑄3 − 𝑄1 )13 denotes the interquartile of a 20-minute wind speed time series at 13 m height. 325 Figure 5 shows the PDFs 𝑓𝑋 (𝒙|𝑡1 ) and 𝑓𝑋 (𝒙|𝑡2 ) constructed by applying the GMM to the 326 diurnal and the nocturnal wind data sets, respectively. In this study, 3 GPDFs are used to 327 construct the PDFs 𝑓𝑋 (𝒙|𝑡1 ) and 𝑓𝑋 (𝒙|𝑡2 ). The number of GPDFs is determined such that 328 the Akaike information criterion (AIC) (Akaike, 1974), that quantifies the tradeoff 329 between the complexity of a statistical model and the fitness of the statistical model to 330 data, is maximized. Figure 5 shows the shapes and the densities of the PDFs, which 331 depict the different characteristics for the wind fields for the two time periods. In Figure 5, 332 the regions with darker color correspond to the regions with lots of measurement data 333 points in Figure 4. The constructed PDFs effectively describe the density of the wind 334 input feature data and the correlations among the different wind input features using only 335 a small number of GMM parameters. 18 (a) 𝑓𝑋 (𝒙|𝑡1 ) (daytime) (b) 𝑓𝑋 (𝒙|𝑡2 ) (night time) Figure 5. Multivariate probability density functions for the wind field input features. 336 337 With the multivariate probability density function 𝑓𝑋 (𝒙) for the three wind field input 338 ̅67 , 𝑈 ̅13 and (𝑄3 − 𝑄1 )13 , three marginal PDFs, 𝑓(𝑈 ̅67 , 𝑈 ̅13 ) , 𝑓(𝑈 ̅13 , (𝑄3 − features, 𝑈 339 ̅67 ) , can be computed in that 𝑓𝑋 (𝒙\𝑥𝑘 ) is obtained by 𝑄1 )13 ) and 𝑓((𝑄3 − 𝑄1 )13 , 𝑈 340 integrating 𝑓𝑋 (𝒙) with respect to the feature variable 𝑥𝑘 . Figure 6 shows the marginal 341 PDFs computed from the 𝑓𝑋 (𝒙|𝑡1 ) and 𝑓𝑋 (𝒙|𝑡2 ). Based on the marginal PDFs, we can 342 study more clearly about the relationships between each pair of wind statistical 343 parameters. It can be seen that positive correlations exist between the features, but the 344 levels of correlation differ slightly between the day and the night times. 345 19 (a) Diurnal (b) Nocturnal Figure 6. Marginal PDFs showing the characteristics for the diurnal and nocturnal wind fields. The contours represent PDFs constructed from GMM applied to the input data, represented as dot. 20 346 Analogously, we can obtain an univariate PDF 𝑓𝑋𝑖 (𝑥𝑖 ) for a specific feature variable by 347 integrating the 𝑓𝑋 (𝒙) with respect to all other feature variables. The probability mass 348 function for a feature variable can then be computed by integrating the univariate PDF 349 over a discrete interval. Using the discretized intervals of 14/30 m/s, 8/30 m/s and 4/30 350 m/s, which divide the range of input features into 30 bins, respectively, the discrete 351 ̅67 ) , 𝑃(𝑈 ̅13 ) and 𝑃((𝑄3 − 𝑄1 )13 ) , are estimated and probability mass functions, 𝑃(𝑈 352 plotted as continuous line in Figure 7. That is, the univariate PDFs are generated from the 353 multivariate PDF constructed by applying GMMs to the wind feature training data set. 354 Comparing the discrete PMF computed by counting the frequency of occurrence directly 355 from the wind data (shown as bar in Figure 7), we can observe that the statistical trends 356 predicted from the GMM are in good agreement with those described by the data. 357 358 The univariate PDFs can also be derived individually by applying GMMs to each wind 359 feature. As shown in Figure 7, the PMF obtained from the multivariate PDF (3-D GMM) 360 and the PMF from the independent univariate PDF (1-D GMM) are in good agreement. 361 This result indicates that the high dimensional PDFs constructed appear to preserve the 362 statistical trend in the wind input features. Furthermore, the multivariate PDFs can 363 provide valuable information about the correlation among the wind field features. 364 21 (a) Diurnal (b) Nocturnal Figure 7. PMFs showing the characteristics for the diurnal and nocturnal wind fields. 365 366 WIND TURBINE LOAD CLASSIFICATION 367 368 Gaussian discriminant analysis (GDA), first proposed by Fisher (1936) as statistical 369 classification analysis, has been widely adopted as a supervised learning algorithm to 370 extract the underlying relationships between the input (feature vectors) and the output 371 (classes) data. In practice, if the input features for each output class follow the Gaussian 22 372 distribution, GDA requires the least number of training data to find the parameters for a 373 statistical classification model and has better classification accuracy compared to other 374 classification algorithms that map the input features directly to the output classes (Pohar 375 et al. 2004). In this study, GDA is used to study the relationships between the wind input 376 features and the tower acceleration response of the wind turbine structure. 377 378 Gaussian Discriminant Analysis 379 380 The Gaussian discriminant analysis constructs the response classification (prediction) 381 ̅67 , 𝑈 ̅13 , (𝑄3 − 𝑄1 )13 ), as defined function 𝑔𝐶 (𝒙) that maps the wind input features 𝒙 = (𝑈 382 previously, to the output response class 𝑦𝐶 ∈ {1, … , 𝑁𝐶 }, where 𝑁𝐶 is the number of 383 classes that categorize the different levels of output response feature. In this study, the 384 output response feature is the interquartile values (i.e. the difference between the 75 385 percentile and the 25 percentile) for a 20 minute acceleration time series recorded by an 386 accelerometer at a 21 m height inside the wind turbine tower. Analogous to the RMS 387 measure, the interquartile is used as measure to quantify the average magnitude of the 388 tower acceleration. Each interquartile data value is then assigned to an output response 389 class 𝑦𝐶 . 390 (𝑖) 391 Using 𝑚𝑠 pairs of input and output (training) data, (𝒙(𝑖) , 𝑦𝐶 ), 𝑖 = 1, … , 𝑚𝑠 , GDA 392 constructs the probability density function 𝑝(𝒙|𝑦𝐶 ) for the input feature vector 𝒙 393 conditional on the given output response class 𝑦𝐶 . Assuming multivariate Gaussian 23 394 distribution with mean vector 𝝁𝑗 and covariance matrix 𝚺𝑗 , the PDF for the 𝑗th output 395 class 𝑝(𝒙|𝑦𝐶 = 𝑗), 𝑗 = 1, … , 𝑁𝐶 , can be represented as: 396 𝑝(𝒙|𝑦𝐶 = 𝑗) = 1 √(2𝜋)𝑛 |𝚺𝑗 | 1 𝑇 exp⁡(− (𝒙 − 𝝁𝑗 ) 𝚺𝑗−1 (𝒙 − 𝝁𝑗 )) 2 (6) 397 398 where 𝑛 is the number of features in 𝒙, i.e. the dimension of 𝒙. Furthermore, the prior 399 distribution on the response class 𝑦𝐶 is expressed as a multinomial distribution, i.e. 400 𝑝(𝑦𝐶 = 𝑗; ⁡𝝋) = 𝜑𝑗 , where 𝜑𝑗 is simply the ratio between the number of data points for 401 class 𝑦𝐶 = 𝑗. 402 403 To obtain the mapping function 𝑔𝐶 (𝒙) between the input and output features, we 404 construct 𝑝(𝒙|𝑦𝐶 ) and 𝑝(𝑦𝐶 ) by estimating the parameters, 𝝁 = {𝝁1 , … , 𝝁𝑁𝐶 } , 𝚺 = 405 {𝚺1 , … , 𝚺𝑁𝐶 } and 𝝋 = {𝜑1 , … , 𝜑𝑁𝐶 }, for the 𝑁𝐶 conditional distributions and the priors. 406 The parameters 𝝁, 𝚺 and 𝝋 can be determined by maximizing the log-likelihood function 407 over the 𝑚𝑠 pairs of input and output (training) data sets (Li et al., 2006): 𝑚𝑠 (𝑖) 𝑙(𝝋, 𝝁, 𝚺) = ∑ ln⁡𝑝( 𝒙(𝑖) , 𝑦𝐶 ; 𝝋, 𝝁, 𝚺) 𝑖=1 (7) 𝑚𝑠 (𝑖) (𝑖) ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡= ∑ ln ⁡𝑝(𝒙(𝑖) |𝑦𝐶 ; 𝝁, 𝚺)𝑝(𝑦𝐶 ; 𝝋) 𝑖=1 408 24 (𝑖) 409 The response class 𝑦𝐶 denotes the specific PDF from which 𝑥 (𝑖) is drawn. As the class 410 𝑦𝐶 is explicitly given from the training data, Eq. (7) can be expressed in terms of the 411 previously defined 𝑝(𝒙|𝑦𝐶 ) and 𝑝(𝑦𝐶 ). Note that the log-likelihood function in Eq. 7 is 412 concave (because of the sum of logarithmic functions), the function can be easily 413 optimized with respect to the parameters 𝝁, 𝚺, and 𝝋. (𝑖) 414 415 Once the parameters⁡𝝁, 𝚺, and 𝝋, for 𝑝(𝒙|𝑦𝐶 ) and 𝑝(𝑦𝐶 ) are determined, the posterior 416 distribution on the response class 𝑦𝐶 given the wind input feature 𝒙 is modeled according 417 to the Bayes’ rule as follows: 418 𝑝(𝑦𝐶 |𝒙) = 𝑝(𝒙|𝑦𝐶 )𝑝(𝑦𝐶 ) 𝑝(𝒙|𝑦𝐶 )𝑝(𝑦𝐶 ) = ⁡ ∑𝑦𝐶 𝑝(𝒙|𝑦𝐶 )𝑝(𝑦𝐶 ) 𝑝(𝒙) (8) 419 Therefore, for the new wind input feature vector 𝒙𝑛𝑒𝑤 representing the newly observed 420 wind field characteristics, the corresponding estimated response class 𝑦̂𝐶 ∈ {1, … , 𝑁𝐶 } 421 can be determined according to the maximum a posteriori detection (MAP) principle 422 expressed as: 423 𝑦̂𝐶 = 𝑔𝐶 (𝒙𝑛𝑒𝑤 ) ≝ argmax 𝑝(𝑦𝐶 |𝒙𝑛𝑒𝑤 ) 𝑦𝐶 (9) ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡= argmax 𝑝(𝒙𝑛𝑒𝑤 |𝑦𝐶 )𝑝(𝑦𝐶 ) 𝑦𝐶 424 425 The response classification function 𝑔𝐶 (𝒙) divides the domain of the input feature 𝒙 (in a 426 3D space) into 𝑁𝐶 regions, each of which corresponds to a different output response class. 25 427 We can construct the boundary between the two neighboring classes 𝑖 and 𝑗 by setting 428 𝑝(𝑦𝐶 = 𝑖|𝒙) = 𝑝(𝑦𝐶 = 𝑗|𝒙) in Eq. (6), which leads to a quadratic equation in terms of 429 the input feature 𝒙 as follows (Hastie et al. 2008): 𝒙𝑇 (𝚺𝑖−1 − 𝚺𝑗−1 )𝒙 − 2(𝝁𝑻𝑖 𝚺𝑖−1 − 𝝁𝑻𝑗 𝚺𝑗−1 )𝒙 + 𝝁𝑻𝑖 𝚺𝑖−1 𝝁𝑖 − 𝝁𝑻𝑗 𝚺𝑗−1 𝝁𝑗 + ln |𝚺𝑖 | |𝚺𝑗 | =0 (10) 430 431 Eq. (10) represents a quadratic surface that separates two neighboring regions (classes). 432 Note that if the covariance matrices are assumed to be the same for all the classes, i.e. 433 𝚺𝑖 = 𝚺⁡ for⁡𝑖 = 1, … , 𝑁𝐶 , Eq. (10) becomes a linear plane surface and the problem is 434 generally known as linear discriminant analysis. 435 436 Application to wind fields and wind turbine responses 437 438 The size of the training data set, 𝑚𝑠 , and the number of output response classes, 𝑁𝐶 , can 439 affect the accuracies of the response classification function 𝑔𝐶 (𝒙) shown in Eq. (9). 440 Figure 8 shows the accuracy ratios of 𝑔𝐶 (𝒙) with different number of response classes 441 and different size of the training data set. The accuracy ratio is calculated as the ratio 442 between the number of correct predictions by 𝑔𝐶 (𝒙) and the size of the test data set, 𝑚𝑡 . 443 Altogether, 1,000 pairs ( 𝑚𝑡 = 1,000 ) of the input feature vector 𝒙 and the output 444 response class 𝑦𝐶⁡ , which do not belong to the training data set, are used to calculate the 445 accuracy ratio for the classification by 𝑔𝐶 (𝒙). Furthermore, to define different number of 446 classes, the interquartile values are evenly divided and categorized into 𝑁𝐶 discretized 447 classes based on their magnitudes, which range from 0 to 19 mg. 26 448 449 Form Figure 8, it can be seen that the accuracy ratio decreases as the number of the 450 response classes increases. On the other hand, for a given number of response classes, the 451 accuracy ratio rarely changes with the size of the training data set. As the size of training 452 data set exceeds the minimum number of data points necessary to estimate the covariance 453 matrix for each class, the accuracy ratio seems to remain the same, which indicates the 454 effectiveness of GDA in terms of the training data size. With the input features employed 455 in this study, we select 5 output response classes (𝑁𝐶 = 5) that show reasonably high 456 accuracy ratios (78 ~ 80 %) from the test data set. The ranges of the interquartile values 457 for each class are shown in Table 1. Figure 8. Comparison of the accuracy ratios for the response classification functions using different numbers of classes and different sizes of training data sets. 458 459 Table 1. The ranges of interquartile value for each output response class. Class 𝑦𝐶 Interquartile range 1 0.0 ~ 3.8 (mg) 2 3.8 ~ 7.6 (mg) 3 7.6 ~ 11.4 (mg) 460 27 4 11.4 ~ 15.2 (mg) 5 15.2 ~ 19.0 (mg) 461 For a total of 7,960 pairs input feature vectors and output response data collected over 462 110 days, we randomly select 3,980 pairs as the training data set. Figure 9 shows the data 463 points for the input feature vectors and the corresponding output response classes. The 464 boundary surfaces discriminating two neighboring classes as determined by Eq. (10) are 465 also plotted in Figure 9. It can be seen that the quadratic boundary surfaces discriminate 466 the response classes reasonably well. 467 468 469 470 471 472 473 474 Figure 9. Training data for GDA and the constructed boundaries classifying the response classes. EXPECTED WIND TURBINE LOADS AND CLASS PROBABILITIES 475 476 The diurnal and nocturnal wind field characteristics are investigated by constructing 477 multivariate probability density functions via GMMs, while the influences of the wind 478 fields on the wind turbine response are studied by constructing classification functions 28 479 with the aid of GDA. Combining these two independently constructed models, we can 480 establish insightful information by establishing a probabilistic framework to study the 481 response of the wind turbine during the day and the night times. Specifically, we compare 482 the class probabilities and the expected classes of the wind turbine tower responses 483 corresponding to the diurnal and the nocturnal winds. 484 485 Evaluation for the Class Probabilities and the Expected Class 486 487 Given the load classification function and the probability density function for wind 488 statistical parameters, the class probabilities and the expected response class can be 489 obtained, without requiring further measurements of the wind turbine responses from 490 sensors. The probability of the wind turbine response class 𝑖⁡conditional on the specific 491 time period 𝑡𝑗 can be estimated by integrating the time-specific probability density 492 function 𝑓𝑋 (𝒙|𝑡𝑗 ) over the input feature domain {𝒙|𝑔𝐶 (𝒙) = 𝑖} corresponding to the wind 493 turbine response class 𝑖 as: ⁡ 𝑝(𝑦𝐶 = 𝑖|𝑡𝑗 ) ≈ 𝑝(𝑦̂𝐶 = 𝑖|𝑡𝑗 ) = ∫ 𝑓𝑋 (𝒙|𝑡𝑗 )𝑑𝒙 {𝒙|𝑔𝐶 (𝒙) = 𝑖 } (11) 494 495 As mentioned before, 𝑦𝐶 is the true wind turbine response class and 𝑦̂𝐶 is the estimated 496 response class by the load classification function 𝑔𝐶 (𝒙), i.e., 𝑦̂𝐶 = 𝑔𝐶 (𝒙). The response 497 class probability 𝑝(𝑦𝐶 |𝑡𝑗 ) can provide how the wind turbine response is distributed 498 during the time period 𝑡𝑗 . The expected wind turbine response class conditional on the 499 time period 𝑡𝑗 can be determined by integrating the response classification function 29 500 𝑔𝐶 (𝒙) weighted by the time specific probability density function 𝑓𝑋 (𝒙|𝑡𝑗 ) over the entire 501 input feature domain as: ⁡ E[𝑦𝐶 |𝑡𝑗 ] ≈ E[𝑦̂𝐶 |𝑡𝑗 ] = ∫ 𝑔𝐶 (𝒙)𝑓𝑋 (𝒙|𝑡𝑗 )𝑑𝒙 (12) 𝒙 502 503 The expected wind turbine response can provide general insight into the difference in the 504 wind turbine response for the time period considered. 505 506 Application to wind fields and wind turbine responses 507 508 To study the wind turbine responses corresponding to the diurnal and the nocturnal wind 509 fields, we construct the class probabilities 𝑃(𝑦𝐶 = 𝑖|𝑡𝑗 ) and the expected response classes 510 E[𝑦𝐶 |𝑡𝑗 ] for the wind turbine tower acceleration during the daytime 𝑡1 and during the 511 nighttime 𝑡2 . Figure 10 shows 𝑓𝑋 (𝒙|𝑡1 ) and 𝑓𝑋 (𝒙|𝑡2 ) overlapped with the load 512 classification function⁡𝑔𝐶 (𝒙), which divides the input feature domain into five different 513 classes. For display purposes, the PDFs are represented by the contour surfaces with 514 𝑓𝑋 (𝒙|𝑡𝑗 ) = 0.02. The probability of the class 𝑖 can be estimated by integrating the PDF 515 intersected with the input feature domain corresponding to the class 𝑖 using Eq. (11). As 516 shown in Figure 10, the PDF for the daytime wind disperses more widely in the region 517 associated with the high load classes (𝑦𝐶 = 3, 4⁡and⁡5) than the PDF for the nighttime 518 wind. As a result, the diurnal wind fields have higher probabilities for classes 3, 4 and 5 519 but lower class probabilities for classes 1 and 2 than the nocturnal wind fields as shown 520 in Figure 11. 30 (a) Day (b) Night Figure 10. PDF 𝑓𝑋 (𝒙|𝑡𝑗 ) and the response classification function ⁡𝑔𝐶 (𝒙) for evaluating the class probabilities and the expected response classes. 521 Figure 11. Comparison of the estimated class probabilities for the day and the night times. 522 523 Using the available data, the true class probabilities and the expected classes 524 corresponding to the diurnal and the nocturnal wind fields can be directly calculated by 525 counting the number of measurement data points in each response class for the day and 526 night time periods. Table 2 compares the class probabilities and the expected classes 527 obtained from the measurement data and those estimated from 𝑓𝑋 (𝒙|𝑡𝑗 ) and 𝑔𝐶 (𝒙). In 528 general, the estimated class probabilities compare very well with the true (measured) 31 529 class probabilities with only slight deviation observed for the nocturnal case. The 530 estimated expected class also matches very well with the expected classes obtained from 531 the measurement data. It is worth noting that while accurately predicting the statistical 532 trend in the wind turbine output response, the machine learning approach can also 533 elucidate the reason for the statistical trend in the wind turbine output response by 534 relating 𝑓𝑋 (𝒙|𝑡𝑗 ) and 𝑔𝐶 (𝒙). That is, the daytime wind fields induce the higher level of 535 the wind turbine tower acceleration due to the higher mean wind speed and turbulence 536 compared to the night time wind fields. Table 2. Comparison of the class probabilities and the expected loads. 537 𝑡1 (day) 𝑡2 (night) Measured Estimated Measured Estimated Class 1 Class 2 Class 3 Class 4 Class 5 0.3835 0.3832 0.5202 0.4916 0.2901 0.2913 0.2807 0.3319 0.1849 0.1845 0.1468 0.1160 0.0799 0.0766 0.0279 0.0347 0.0616 0.0645 0.0244 0.0257 Expected class 2.1461 2.1480 1.7557 1.7710 538 539 540 CONCLUSION 541 542 A machine learning approach has been presented to analyze time-specific wind field 543 characteristics and to predict the response of a wind turbine at different time periods. The 544 machine learning algorithms have been tested using the long-term monitoring data, 545 collected by an integrated life-cycle management system installed at a wind turbine in 546 Germany. 547 32 548 A Gaussian mixture model (GMM) approach is applied to the measured wind data to 549 construct the probability density function for wind statistical parameters in an attempt to 550 understand the characteristics of wind fields. We study the characteristics of the daytime 551 and nighttime wind fields by comparing the PDFs constructed based on the diurnal and 552 nocturnal wind data. The PDFs clearly show that the wind fields in the daytime are 553 characterized by faster mean wind speeds and higher levels of fluctuations than the 554 nighttime counterparts; this trend is effectively characterized by a multivariate probability 555 density function for wind field statistical parameters. 556 557 A Gaussian discriminative analysis (GDA) has been applied to find the relationship 558 between the wind input features and the response of the monitored wind turbine tower. In 559 this study, three input features include the mean wind speeds at 13 m and 67 m height, 560 and the interquartile values of the 20-minute wind speed time series measured at 13 m 561 height. The output response is the interquartile values of 20-minute wind turbine tower 562 acceleration time series. In general, wind turbine long-term monitoring data are collected 563 at limited locations and only statistical parameters (such as means, quartiles and standard 564 deviations) of time series data are archived. It may be worth noting that although the 565 interquartile values employed in this study is a reasonable measure analogous to RMS 566 values of the acceleration time series, the standard deviation or RMS values could be 567 more ideal features for quantifying the averaged magnitude of the acceleration or other 568 types of wind turbine response. In the future, the wind turbine monitoring system will be 569 updated to calculate and archive the standard deviations of wind speed and wind turbine 570 response time series. 33 571 572 By combining the multivariate probability density function and the response 573 classification function, useful information that are helpful to the operation of the wind 574 turbine can be obtained. This study examines the class probabilities and the expected 575 classes for the levels of wind turbine tower acceleration at daytime and nighttime and 576 used them to elucidate the reasons for different wind turbine tower vibrational response. 577 It becomes transparent that the expected response class for the daytime is higher than that 578 for the nighttime because daytime wind fields have higher mean wind speeds and higher 579 levels of turbulence, both of which are positively correlated to level of wind turbine 580 tower vibration. 581 582 The statistical machine learning approach presented in this paper can also be used to 583 study the influence of the daily, monthly, seasonal and yearly variations of wind field 584 characteristics on wind turbine response. If we establish the accurate wind turbine 585 response classification function for a specific wind turbine model and construct a site 586 specific probability density function for the wind input statistical features, the approach 587 can potentially be expanded to evaluate how the wind turbine will respond at that site. In 588 future research, we plan to develop an economical and practical methodology for 589 structure monitoring that uses wind input statistical features commonly collected at each 590 wind turbine but employs structural responses measured from only one or a few 591 representative wind turbines equipped with sensors. 592 593 34 594 ACKNOWLEDGEMENTS 595 596 This research is partially funded by the German Research Foundation (DFG) under grants 597 SM 281/1-1, SM 281/2-1, and HA 1463/20-1. Any opinions, findings, conclusions, or 598 recommendations are those of the authors and do not necessarily reflect the views of the 599 DFG. 600 601 REFERENCES 602 603 Akaike, H. (1974). “A new look at the statistical model identification,” IEEE Transaction 604 on Automatic Control, 19(6), pp.716–723. 605 606 Adobe Systems Incorporated. (2007). “Datasheet Adobe Flex Builder 3 – Create 607 engaging, cross-platform rich Internet applications,” San Jose, CA, USA: Adobe 608 Systems Incorporated. 609 610 Boyd, S. and Vandenberghe, L. (2004). “Convex Optimization,” Cambridge University 611 Press. 612 613 Castors, M. (2008). “PDI Performance tuning check-list,” Technical Report. Orlando, FL, 614 USA: Pentaho Corporation. 615 35 616 Fisher, R (1936). “The use of multiple measurements in taxonomic problems,” Annal 617 Eugen (7), pp.179–188. 618 619 Gonzalez-Longatt, F. M., Rueda, J. L. and Erlich, I., Bogdanov, D. and Villa, W. (2012). 620 “Identification of Gaussian Mixture Model using Mean Variance Mapping Optimization: 621 Venezuelan Case,” In: Proceedings of IEEE PES Innovative Smart Grid Technologies 622 Europe, Berlin, Germany, October 14, 2012. 623 624 Hastie, T, Tibshirani, R. and Friedman, J. (2008). “The elements of statistical learning,” 625 Springer. 626 627 Hartmann, D., Smarsly, K. and Law, K. H. (2011). “Coupling sensor-based structural 628 health monitoring with finite element model updating for probabilistic lifetime estimation 629 of wind energy converter Structures,” In: Proceedings of the 8th International Workshop 630 on Structural Health Monitoring 2011. Stanford, CA, USA, September 13, 2011. 631 632 International Electotechnical Commission. (2005). “Wind Turbines - Part 1: Design 633 Requirements,” IEC 61400-1, Edition 3.0. 634 635 Lachmann, S., Baitsch, M., Hartmann, D. and Höffer, R. (2009). “Structural lifetime 636 prediction for wind energy converters based on health monitoring and system 637 identification,” In: Proceedings of the 5th European & African Conference on Wind 638 Engineering. Florence, Italy, July 19, 2009. 36 639 640 Li, T., Zhu, S. and Ogihara, M. (2006). “Using discriminant analysis for multi-class 641 classiﬁcation: an experimental investigation,” Knowledge and Information Systems, 10(4), 642 pp. 453-472. 643 644 Polanco, J. and Pedersen, A. (2009). “Understanding the Flex 3 Component and 645 Framework Lifecycle,” San Francisco, CA, USA: Development Arc LLC. 646 647 Pohar, M., Blas, M. and Turk, S. (2004). “Comparison of logistic regression and linear 648 discriminant analysis: a simulation study,” Metodološki zvezki, 1(1), pp.143-161. 649 650 Render, A. R. and Walker, F. H. (1984). “Mixture densities, maximum likelihood and the 651 eM algorithm,” SIAM Review, 26(2), pp. 195-239. 652 653 Reynolds, A. D., Quatieri, F. T. and Dunn, B. R. (2000). “Speaker verification using 654 adapted Gaussian Mixture Models,” Digital Signal Processing, 10, pp. 19-41. 655 656 Roldan, M.C. (2009). “Pentaho Data Integration (Kettle) Tutorial,” Technical Report. 657 Orlando, FL, USA: Pentaho Corporation. 658 659 Singh, R., Pal, C. B. and Jabr, A. R. (2010). “Statistical representation of distribution 660 system loads using Gaussian mixture model,” IEEE Transactions on Power Systems, 25 661 (1), pp. 29-37. 37 662 663 Smarsly, K. and Hartmann, D. (2009a). “Real-time monitoring of wind turbines based on 664 software agents,” In: Proceedings of the 18th International Conference on the 665 Applications of Computer Science and Mathematics in Architecture and Civil 666 Engineering. Weimar, Germany, July 7, 2009. 667 668 Smarsly, K. and Hartmann, D. (2009b). “Multi-scale monitoring of wind energy plants 669 based on agent technology,” In: Proceedings of the 7th International Workshop on 670 Structural Health Monitoring. Stanford, CA, USA, September 9, 2009. 671 672 Smarsly, K. and Hartmann, D. (2010). “Agent-oriented development of hybrid wind 673 turbine monitoring systems,” In: Proceedings of the 2010 EG-ICE Workshop on 674 Intelligent Computing in Engineering. Nottingham, UK, June 30, 2010. 675 676 Smarsly, K., Law, K. H. and Hartmann, D. (2012a). “A multiagent-based collaborative 677 framework for a self-managing Structural health monitoring system,” ASCE Journal of 678 Computing in Civil Engineering, 26(1), pp. 76-89. 679 680 Smarsly, K., Law, K. H. and Hartmann, D. (2012b). “Towards life-cycle management of 681 wind turbines based on structural health monitoring,” In: Proceedings of the First 682 International Conference on Performance-Based Life-Cycle Structural Engineering. 683 Hong Kong, China, December 5, 2012. 684 38 685 Smarsly, K., Hartmann, D. and Law, K. H. (2012c). “Integration of Structural Health and 686 Condition Monitoring into the Life-Cycle Management of Wind Turbines,” In: 687 Proceedings of the Civil Structural Health Monitoring Workshop. Berlin, Germany, 688 November 6, 2012. 689 690 Smarsly, K., Hartmann, D. & Law, K. H. (2013a). “A computational framework for life- 691 cycle management of wind turbines incorporating structural health monitoring,” 692 Structural Health Monitoring – An International Journal, 12(4), pp. 359-376. 693 694 Smarsly, K., Hartmann, D. & Law, K. H. (2013b). “An integrated monitoring system for 695 life-cycle management of wind Turbines,” International Journal of Smart Structures and 696 Systems, 12(2), pp. 209-233. 697 698 Stull, R. B. (1998). “An introduction to boundary layer meteorology,” Kluwer Academic 699 Publishers. 700 701 Stauffer, C. and Grimson, W.E.L. (1999). “Adaptive background mixture models for real- 702 time tracking,” In: Proceeding of IEEE Computer Society Conference on Computer 703 Vision and Pattern Recognition, Fort Collins, U.S.A., 23 Jun, 1999 704 705 706 39

A machine learning approach for analyzing and predicting wind

Related documents

Products

Support

A machine learning approach for analyzing and predicting wind

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib