PV MODULE PERFORMANCE UNDER REAL-WORLD TEST CONDITIONS–A DATA ANALYTICS APPROACH

by YANG HU

Submitted in partial fulfillment of the requirements for the degree of Master of Science

Thesis Adviser: Prof. Roger H. French

Department of Materials Science and Engineering
CASE WESTERN RESERVE UNIVERSITY
May, 2014

Case Western Reserve University
Case School of Graduate Studies

We hereby approve the thesis¹ of YANG HU for the degree of Master of Science

Prof. Roger French, Committee Chair, Adviser, 11/21/2013
Prof. David Matthiesen, Committee Member, 11/21/2013
Prof. Jennifer Carter, Committee Member, 11/21/2013
Prof. Jiayang Sun, Committee Member, 11/21/2013
Dr. Timothy Peshek, Committee Member, 11/21/2013
Dr. Yifan Xu, Committee Member, 11/21/2013

¹ We certify that written approval has been obtained for any proprietary material contained therein.

Dedicated to science and the pursuit of progress.

Table of Contents

List of Figures
Acknowledgements
Abstract
Chapter 1. Introduction
    Lifetime and degradation science approach
    Thesis overview
Chapter 2. Background and literature review
    Previous research on real world PV modules’ performance
    Standards
    Data science
    Clustering analysis
Chapter 3. Real-world Data Acquisition
    SDLE SunFarm design
    Global SunFarm network and Energy CRADLE
Chapter 4. Results: Real-world data analytics
    Overview
    Raw data validation
    Exploratory Data Analysis (EDA) on Integrated Data
    Clustering of AC Power Data
    Data Assembly
    Sub-sampling
    Clustering of Solar Noon Time Performance Ratio Data
Chapter 5. Discussion
    Data analytics
    Performance at different relative positions
    Performance of different brands
    Power time series data clustering
    Solar noon time performance ratio clustering
Chapter 6. Conclusions
Chapter 7. Future research
    Improved SunFarm data quality and redundancy
    Predictive model
Appendix A. List of 24 manufacturers and nameplate power
Appendix B. SunFarm network
    SDLE SunFarm design & characteristics
    Energy CRADLE SunFarm informatics
Appendix. Complete References

List of Figures

2.1 Pie chart of method used to determine R_d
2.2 PR subsetting
4.1 60 PV modules distribution
4.2 Baseline result of 20 brands
4.3 Power time series plot
4.4 Microinverter’s efficiency
4.5 Power curve comparison
4.6 Total power production of 20 brands
4.7 Normalized power production of 20 brands
4.8 Hierarchical cluster 1
4.9 Total within cluster sum of square 1
4.10 Power time series plot with clustering result
4.11 Normalized performance metrics
4.12 Noontime PR versus Y_I
4.13 PR in different climate condition
4.14 Pairs plot of PR 1
4.15 Pairs plot of PR 2
4.16 Total within cluster sum of square 2
4.17 Hierarchical cluster 2
4.18 PR time series plot with clustering result
5.1 Sensor cross check
5.2 Averaged performance ratio of 15 min around solar noon time
5.3 Comparison of normalized AC power in winter
5.4 Comparison of normalized AC power in summer
B.1 An overview of SDLE SunFarm
B.2 Sample tray and concentrator
B.3 Dual axis tracker
B.4 Tracker frame
B.5 SunFarms within Ohio
B.6 Architecture of NO-SQL Hadoop system
B.7 Architecture of Energy CRADLE’s user front end

Acknowledgements

I would like to express my deepest gratitude for the patience, diligence, and resourcefulness of the entire team of researchers in the Solar Durability and Lifetime Extension (SDLE) center in the Department of Materials Science and Engineering at Case Western Reserve University, headed by Prof. Roger H. French. Special thanks go to Dr. Timothy Peshek and Mohammad A. Hossain, who helped build and maintain the SDLE SunFarm data acquisition system.
Thanks also go to the researchers at the Center for Statistical Research, Computing and Collaboration (SR2C), Department of Epidemiology & Biostatistics, for their coordinated efforts. The guidance of Prof. Jiayang Sun and Dr. Yifan Xu in statistics and data science was instrumental in this work. Assistance and technical support from researchers in the Medical Informatics Division of EECS, especially Prof. G.Q. Zhang and his group members Yashwanth Reddy Gunapati and Tarun Jian, were extremely valuable in completing the data collection and Energy CRADLE parts of this work.

I would also like to acknowledge the funding for this work. The SDLE center was established with funding from the Ohio Third Frontier, Wright Project Program Award Tech 12-004. The PV module case study was supported by the Bay Area Photovoltaic Consortium, Prime Award No. DE-EE0004946, Subaward Agreement No. 60220829-51077-T.

Finally, I certify that there is no proprietary material in this thesis.

Abstract

by YANG HU

In pursuit of a higher-fidelity understanding of the long-term degradation of long-lived technologies, such as photovoltaic (PV) systems, the framework of Lifetime and Degradation Science (L&DS) goes beyond initial qualification tests and investigates the underpinning mechanisms of degradation. L&DS concerns itself with the complex and multivariate signatures of the degradation process and with uncovering the fundamental physical mechanisms contributing to that degradation. In the case of PV modules, this effort requires extensive continuous monitoring of the modules’ power production and of climatic conditions. The responses of PV modules to real-world stressors are cross-correlated with their responses to the simulated and accelerated stressors placed on devices in a laboratory setting.
A unique, highly instrumented outdoor test facility for PV materials, components, and systems, the Solar Durability and Lifetime Extension (SDLE) center’s SunFarm, was built to better understand the power degradation mechanisms of PV modules and materials. The SDLE SunFarm provides an apparatus for the collection of real-world time series data consisting of output power, weather, and insolation metrology. The SunFarm comprises 122 individual PV power plants: 120 module-level plants and two string-level plants of 8 modules each. Output power is monitored through appropriate grid-tied inverters. The metrology package developed at CWRU for the collection of time series data provides a model to be implemented at external sites around the globe. In order to expand the ability to monitor PV systems’ performance under different climatic conditions, a global SunFarm Network was implemented among nine outdoor test facilities around the world in collaboration with academic institutions and industrial partners, including commercial power plants.

This thesis provides the initial data analytics on the first six months of data from 60 PV modules on the SDLE SunFarm, and serves as a model for the analytics of the full dataset from the global SunFarm Network. The data were first validated by characterization of the measurement apparatus, redundancy of measurement, and time-slewing according to minimization of the time cross-correlation function, using the free and open-source statistical software language and packages known as “R”. Clustering analysis of the unfiltered AC power time series, performed in R (v3.0.1) [1], showed that the data fell into six clusters, which represented the six different electrical sites of the SDLE SunFarm. The data were then assembled and subsampled around solar noon time.
PV performance ratio (PR), which is a measure of a PV module’s output at a given incident power from sunlight, was used as an indicator of the modules’ working effectiveness. Correlations among the filtered subset of solar noon time PR data were discerned with hierarchical clustering analysis. K-means clustering was used to confirm the optimum number of clusters for the analytics. The clustering results differentiated modules on different physical sites, pointed out malfunctions of the PV mounting system, and revealed the underperformance of certain module brands. These results are useful for correlating different modules’ responses to stressors and those stressors’ effects on overall performance.

1 Introduction

Solar energy is becoming a more mature and mainstream source of electricity; the photovoltaic (PV) industry has experienced remarkable growth over the past decade. Worldwide, PV exceeded the 100 GW installed capacity mark in 2012 [2]. Germany led installations in 2012 with 7.6 GW, followed by China with between 3.5 and 4.5 GW [3,4]. In the US, 3.2 GW were installed during 2012, the fourth most in the world [2]. A solar project will be installed, on average, every four minutes in the US [5]. By the end of 2013, over 100,000 individual solar systems will be installed, exceeding 4.4 GW in capacity.

In the academic world, although much PV research still focuses on gaining higher efficiencies and inserting new technologies, interest in lifetime and degradation has risen. At the 2010 Department of Energy Science for Energy Technology workshop [6], the topic of PV lifetime and degradation science (L&DS) was made a research priority, and its importance was reconfirmed in the Mesoscale Science Report [7]. A quantification of power decline over time, known as the degradation rate (R_d), is as important as initial performance. Especially for investigators and PV power plant owners, degradation rates essentially determine the lifetime of a PV system.
A well-known disaster in the PV industry was Carrizo Plains, which was once the largest PV power plant in the world [8]. The installation failed after four years of operation because it exhibited a power degradation rate of 10% per year. Commercial PV panels claim a degradation rate lower than 1% per year, and usually come with a 25-year manufacturer warranty [9]. However, recent research, sampling over 2000 degradation rates reported around the world, suggests that some PV systems exhibit a power degradation rate (R_d) higher than 1%. Additionally, the study observes that R_d is highly dependent on the operating environment [10].

1.1 Lifetime and degradation science approach

In order to predict the performance and lifetime of PV modules, a better understanding of degradation mechanisms and of the influence of climate conditions is necessary. A performance and lifetime prediction (PLP) tool based on a reliability physics and prognostics approach was proposed; it requires both indoor accelerated studies of PV materials, components, and systems and a real-world degradation and time series analysis of PV modules [11,12].

Real-world testing plays a critical role in researching degradation mechanisms, firstly because the real world is the typical operating environment for PV systems [13]. A real-world environment is a unique combination of different stressors that no indoor testing chamber is able to duplicate. Stressors in the real world include, but are not limited to, solar irradiance, rain, snow, salt fog, and soiling. Isolating the influence of a single stressor or of several stressors requires precise and redundant monitoring of climate conditions. Secondly, outdoor testing is the only way to correlate indoor accelerated testing to real-world performance.
By developing metrics, metrology, and tools to quantify, compare, and cross-correlate the response of PV modules and components to a variety of stressors for both accelerated and outdoor testing, it is possible to link observed responses to particular stressors and determine quantitative rates of degradation.

1.2 Thesis overview

1.2.1 Background and literature review

This thesis provides a literature review of previous research on PV modules, on PV power plant performance under real-world operating conditions, and on the different data filtering methods applied. Two IEC standards that were used for data monitoring and data cleaning in this study are also reviewed. Finally, some background information on data science is provided.

1.2.2 Real-world data acquisition

The SDLE SunFarm’s design and the data acquisition methods applied to the case study of 60 PV modules on the SDLE SunFarm are explained.

1.2.3 Results: Real-world data analytics

Descriptive data analysis and data clustering results are presented in this section.

1.2.4 Discussion

This section discusses the data analytic procedures for outdoor test data and compares modules’ performance under different climate conditions and at different positions relative to sunlight, as well as the initial indoor performance and outdoor performance of 20 brands.

1.2.5 Conclusions

Conclusions drawn from the data analysis are presented in this section.

1.2.6 Future research

An improved study protocol and a predictive model are planned for future research.

1.2.7 Appendix

A list of the 24 PV models studied in this thesis, the SunFarm design and characteristics, and full references are presented in the appendixes.

2 Background and literature review

2.1 Previous research on real world PV modules’ performance

2.1.1 PV module degradation

PV modules’ power output is known to decline over time; this phenomenon is quantified by the degradation rate (R_d) of a PV system.
It is as important for investigators and power plant owners to know the degradation rates of PV modules as to know their initial efficiency. Jordan and Kurtz reviewed over 2000 degradation rate reports in 2011 [10]. All the degradation rates that had been reported were determined using one of the four methods introduced below.

Current-voltage (I-V) curves, which are typically taken at discrete time intervals either indoors with a solar simulator or outdoors with a portable I-V curve tracer, are used for determining R_d [14]. In order to take an indoor I-V curve, the PV module needs to be taken off the array, which is not convenient for PV system owners. Outdoor I-V curve tracing requires a very clear sky. Fig. 2.1 shows the methodologies used to determine R_d. The use of indoor I-V curve tracing increased after the year 2000 due to the widespread use of flash indoor solar simulators. Neither of these methods provides continuous measurements; in fact, it would take a large effort to acquire I-V curve measurements on every PV module in a real PV power plant [15]. As a result, a large portion of the R_d measurements (40 out of 58, around 70%) were determined using only one or two data points, which leads to low accuracy and high uncertainty [16]. Using continuous power data for R_d determination can improve the accuracy [10].

Figure 2.1. Pie chart of the number of references deploying the indicated methods to determine degradation rates prior to and following the year 2000 [10].

The Photovoltaics for Utility Scale Applications (PVUSA) [17] and performance ratio (PR) [18] methods of measuring R_d are in the continuous data category. PVUSA is an AC rating method developed by engineers working on the PVUSA project. The PVUSA method provides an empirical relationship for the module’s AC output as a function of solar irradiance, ambient temperature, and wind speed.
PR measures the ratio of a module’s conversion efficiency in the field to the manufacturer-provided qualification test efficiency under standard test conditions (STC) of 25 °C, 1 kW/m², and AM 1.5 irradiance. The degradation rate is determined from the trend of the continuous data using time series analysis [19,20]. Both methods display strong seasonality that can affect reported rates and increase uncertainties. In practice, data filtering, the process of preferentially choosing data subsets such as data for sunny days only, can reduce the noise in the data [21]. However, data filtering usually eliminates or disregards the impact of different climate conditions on modules’ R_d.

2.1.2 Performance ratio filtering

Performance ratio (PR) reflects the PV system conversion efficiency in the field compared to that under qualification STC. Previous research reported that the typical PR of PV systems is about 70%-80%. A survey conducted by Nils Reich from the Fraunhofer Institute for Solar Energy Systems suggests that the PR for newly built PV systems in Germany increased to 90% [22]. However, these reported PR values are all filtered and averaged with a certain methodology. Reich’s study only considered plane-of-array (POA) irradiance between 800 and 1000 W/m² and temperatures in either the 35-40 °C or the 40-45 °C temperature bin. After this first round of filtering some “outliers” still remained, so all data points deviating by more than ±5% from the median of the annual PR were discarded. Fig. 2.2 shows how the annual PR was determined from the already filtered data set. There are obvious outliers exceeding 110% at the beginning of the study, and additional outliers lower than 40% during the study. This range was selected because “there is no physical reason apart from malfunctions or measurement uncertainty, why PR at selected irradiance and temperature conditions should differ that much” [23].

Figure 2.2.
PR subsetting of an entire plant, keeping data within ±5% of the median of the annual PR [22].

Another study, conducted by Jordan et al. from NREL, used three-step filtering [13]. POA irradiance was restricted to between 800 and 1200 W/m². Two further filters were applied, denoted as stability and outliers. The stability filter “eliminates data points when POA changes more than 20 W/m²/min and the module temperature more than 1 °C/min”. The outlier filter “uses DC/POA to eliminate snow days, partial shading conditions. Furthermore, the data for sunny days were selected by filtering for clearness index >0.5”. The clearness index of the sky is the ratio of the measured global irradiance to the extraterrestrial beam irradiance on a similarly tilted surface [24]. After filtering, PR shows good precision, which benefits degradation rate determination. However, such filtering keeps only data from consistently bright, sunny days and eliminates all other weather conditions. The filtered data were also averaged to eliminate seasonality, yet weather and season have important effects.

Recently, Hasselbrink et al. from SunPower Corporation developed a unique approach using “3 million module-years of live site data” [25]. Instead of determining yearly degradation from monthly averaged PR with a moving average method, which ignores seasonality by smoothing out the variation, they used the performance index of the same day of the year to determine a degradation rate for each day of the year. The yearly degradation rate was then determined from the distribution of the 365 R_d values. This method included all climate conditions; however, isolating the influence of each climate stressor was not the focus of their study.

2.1.3 Influence of weather stressors

A PV system’s operating environment is a combination of multiple weather stressors including temperature, humidity, radiation, soiling, etc. Interest in investigating the influence of one or multiple stressors has risen.
Faiman, Ye et al. conducted an experiment on three different types of modules: mono c-Si, micromorph Si, and single-junction a-Si. Their performance under two distinct monsoon seasons throughout the year was modeled [26]. The results show that module efficiency is highly correlated to temperature. However, as a result of Singapore’s low altitude, module efficiency at noon time is not strongly correlated to spectral effects, which arise from changes in air mass. Another study, focused on the soiling losses of solar systems, was conducted by a group of researchers at the University of California San Diego. They qualitatively modeled the losses caused by dust accumulating on module surfaces between two days of rain [27]. The research explicitly compared the average soiling losses of modules mounted at tilt angles of 0-5, 6-19, and greater than 20 degrees. Sites with tilt angles shallower than 5° showed soiling losses five times those of the other sites.

Seasonal variation, which has usually been neglected in the process of determining R_d, contains information about the influence of climate stresses on PV module performance and reliability. The research reported here aims to extract more information by performing exploratory data analysis and clustering analysis on the entire AC power time series data before sub-sampling or “filtering”.

2.2 Standards

2.2.1 Photovoltaic system performance monitoring

IEC 61724 describes general guidelines for the monitoring and analysis of the electrical performance of photovoltaic systems [28].

Meteorology. For climate condition monitoring, the total irradiance in the plane of array (G_I) shall be measured in the same plane as the PV array by calibrated reference devices or pyranometers. Ambient air temperature (T_am) shall be measured at a location that can represent array conditions, using temperature sensors that are shielded from direct solar radiation.
Wind speed (S_W) shall be measured at a height that can represent array conditions.

Electrical parameters. The PV system electrical parameters, including output voltage (V_A), output current (I_A), and output power (P_A), represent the DC electrical characteristics. The utility grid electrical parameters include utility voltage (V_U), current to the utility grid (I_TU), current from the grid (I_FU), and power to the utility grid (P_TU). The standard also points out that “AC voltage and current may not need to be monitored in every situation. DC power can either be calculated in real time as the product of sampled voltage and current quantities or measured directly using a power sensor. If DC power is calculated, the voltage and current quantities shall be sampled not averaged.” This explains why the microinverter used in this study provides instantaneous DC voltage and DC current and averaged AC power.

System performance indices. System performance indices are derived parameters that relate to system energy balance and performance and are calculated from the recorded monitoring data. Performance indices normalize system performance, which makes PV systems of different configurations and at different locations comparable. These indices include yields, losses, and efficiencies. Yields are energy quantities normalized to rated array power. System efficiencies are normalized to array area. Losses are the differences between yields.

Daily mean yields.

a) The array yield Y_A is the daily array energy output per kW of installed PV array:

\[ Y_A = E_{A,d}/P_0 = \tau_r \times \Big(\sum_{day} P_A\Big)/P_0 \tag{2.1} \]

This yield represents “the number of hours per day that the array would need to operate at its rated output power, P_0, to contribute the same daily array energy to the system as was monitored”.
b) The final PV system yield Y_f is the portion of the daily net energy output of the entire PV plant which was supplied by the array, per kW of installed PV array:

\[ Y_f = Y_A \times \eta_{LOAD} \tag{2.2} \]

This yield represents the number of hours per day that the array would need to operate at its rated power output to equal the monitored net daily yield. \eta_{LOAD} is the load efficiency.

c) The reference yield Y_r can be calculated by dividing the total daily in-plane irradiation by the module’s reference in-plane irradiance G_{I,ref}:

\[ Y_r = \tau_r \times \Big(\sum_{day} G_I\Big)/G_{I,ref} \tag{2.3} \]

This yield represents the number of hours in a day that the sun would need to be at the reference irradiance level in order to contribute the same incident energy as was measured in the field.

Normalized losses. Normalized losses are calculated by subtracting yields.

a) The “array capture” losses L_c represent the losses due to array operation:

\[ L_c = Y_r - Y_A \tag{2.4} \]

b) The balance of system (BOS) losses L_BOS represent the losses in the BOS components:

\[ L_{BOS} = Y_A \times (1 - \eta_{BOS}) \tag{2.5} \]

c) The PR indicates the overall effect of losses on the array’s rated output due to array temperature, incomplete utilization of the irradiation, and system component inefficiencies or failures:

\[ PR = Y_f / Y_r \tag{2.6} \]

2.2.2 Procedures for temperature and irradiance corrections to measured current-voltage characteristics

IEC 60891 [18] introduces three correction procedures. For brevity, only the first procedure, which was used for the baseline data correction in this work, is introduced here. The second procedure is especially suited to large irradiance corrections (>20%). The third procedure needs to be used when the temperature coefficients of the PV device are unknown.

Correction procedure 1.
The measured current-voltage characteristic shall be corrected to standard test conditions, which are 25 °C and 1000 W/m², by applying the following equations:

\[ I_2 = I_1 + I_{SC} \cdot (G_2/G_1 - 1) + \alpha \cdot (T_2 - T_1) \tag{2.7} \]

\[ V_2 = V_1 - R_S \cdot (I_2 - I_1) - \kappa \cdot I_2 \cdot (T_2 - T_1) + \beta \cdot (T_2 - T_1) \tag{2.8} \]

where I_1, V_1 are the coordinates of points on the measured characteristic; I_2, V_2 are the coordinates of the corresponding points on the corrected characteristic; G_1 is the irradiance measured with the reference device; G_2 is the irradiance at the standard or other desired conditions; T_1 is the measured temperature of the test specimen; T_2 is the standard or other desired temperature; I_SC is the measured short-circuit current of the test specimen at G_1 and T_1; α and β are the current and voltage temperature coefficients of the test specimen in the standard or target irradiance for correction and within the temperature range of interest; R_S is the internal series resistance of the test specimen; and κ is a curve correction factor.

2.3 Data science

2.3.1 Data validation

Data validation is the process of ensuring that data analysis is based on a clean, correct, and useful data set [29]. Data validation includes data type checks, for example, whether the data is the power production of a PV module or the irradiance intensity on the PV module’s plane; file existence checks, which determine the days for which data files are available for analysis; and cross-system consistency checks, which compare a data point to the same variable collected in a different system to ensure that the two are consistent. In practice, data validation rules can be implemented through the automated facilities of a data dictionary [30], or by the inclusion of explicit application program validation logic [31].
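The three kinds of checks described above can be sketched in a few lines of code. The example below is a minimal illustration under stated assumptions, not the validation code used in this work: the function names, the one-file-per-day naming pattern, and the 5% relative tolerance are all hypothetical.

```python
from datetime import date, timedelta
from pathlib import Path


def missing_days(data_dir, start, end, pattern="{day}.csv"):
    """File existence check: return the days in [start, end] with no data file.

    The one-file-per-day naming pattern is an assumption for illustration.
    """
    missing = []
    day = start
    while day <= end:
        if not (Path(data_dir) / pattern.format(day=day.isoformat())).is_file():
            missing.append(day)
        day += timedelta(days=1)
    return missing


def is_valid_reading(value, lower, upper):
    """Data type and range check: a real number within [lower, upper]."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return lower <= value <= upper  # NaN fails this comparison


def inconsistent_points(series_a, series_b, rel_tol=0.05):
    """Cross-system consistency check on two records of the same variable.

    Returns the indices where the two systems differ by more than rel_tol
    (hypothetical 5% default) relative to the larger magnitude.
    """
    flagged = []
    for i, (a, b) in enumerate(zip(series_a, series_b)):
        scale = max(abs(a), abs(b))
        if scale > 0 and abs(a - b) / scale > rel_tol:
            flagged.append(i)
    return flagged


# Hypothetical irradiance readings (W/m^2) from two pyranometers;
# the third pair disagrees by far more than 5%.
print(inconsistent_points([800.0, 805.0, 790.0], [802.0, 801.0, 600.0]))
```

A relative tolerance is used here so that the same threshold applies across the wide range of irradiance seen during a day; an absolute tolerance tied to the sensor specification would be an equally reasonable choice.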
2.3.2 Exploratory data analysis

In outdoor testing of PV systems, test conditions are not controllable; the best we can do is to collect as much data as possible so as to quantitatively evaluate climate stressors and the PV systems’ response. Exploratory data analysis (EDA) [32] encompasses and surpasses initial data analysis (IDA) [33]: while IDA focuses narrowly on hypothesis testing and checking assumptions, EDA encourages statisticians to explore the data, possibly formulating hypotheses that can guide further experiments and data collection. EDA usually summarizes the main characteristics of data by visual methods, including box plots, histograms, and multi-vari charts, which graphically display patterns of variation.

2.4 Clustering analysis

Data describe the characteristics of different PV systems. In order to understand the various responses and phenomena, one of the most important steps of data analysis is to classify or group data into a set of categories or clusters. Data objects that are classified in the same group or cluster should reflect similar properties based on some criteria. Classification processes can be supervised or unsupervised. Supervised classification maps data objects into predefined classes. Unsupervised classification is known as cluster data analysis [34]. As described in the literature, “A direct reason for unsupervised clustering comes from the requirement of exploring the unknown natures of the data that are integrated with little or no prior information” [35]. The clustering algorithms discussed in this thesis are hierarchical clustering and k-means clustering.

Hierarchical clustering is a connectivity-based clustering algorithm. It is based on the core idea of “objects being more related to nearby objects than to objects farther away” [36]. In order to determine the similarity of two objects, the distance between the two objects needs to be defined.
Distance metrics include Euclidean distance, squared Euclidean distance, Manhattan distance, dynamic time warping, etc. Euclidean distance computes the square root of the sum of squared differences between the coordinates of a pair of objects:

\[ D_{XY} = \sqrt{\sum_k (x_{ik} - x_{jk})^2} \tag{2.9} \]

The standard Euclidean distance can be squared in order to place progressively greater weight on objects that are farther apart:

\[ D_{XY}^2 = \sum_k (x_{ik} - x_{jk})^2 \tag{2.10} \]

Manhattan distance, or city block distance, represents the distance between points in a city road grid. It computes the sum of the absolute differences between the coordinates of a pair of objects:

\[ D_{XY} = \sum_k | x_{ik} - x_{jk} | \tag{2.11} \]

The linkage criterion specifies whether two sets of objects can be joined into one by measuring the distances between pairs of objects drawn from the two sets.

K-means clustering is also known as centroid-based clustering [37]; it partitions objects in such a way that objects assigned to the same cluster are nearest to each other. K-means clustering uses the Euclidean distance metric. The quantity used to evaluate the quality of a k-means clustering result is the within-cluster sum of squares (WCSS), which sums the squared distances between each object and the centroid of its cluster. The goal is to assign each object to a cluster such that the total WCSS is minimized.

3 Real-world Data Acquisition

3.1 SDLE SunFarm design

The SDLE SunFarm, located on the west campus of CWRU, is about one acre in size. Fourteen high-precision Feina SF20 dual-axis trackers and two sites of adjustable tilted racking comprise the 16 electrical sites of the SDLE SunFarm. The 122 individual PV power plants include 120 PV modules with microinverters and two sets of 8 PV modules connected in series with string inverters. Output power is monitored through the inverters and fed back to the grid through a reversing relay. The 120 modules with microinverters were evenly divided into two groups of 60 samples each (3 module samples from each of 20 brands). The two groups use different microinverter models for comparison.
The first 60 microinverters installed were Enphase model M215. Electrical data was reported by Enphase’s embedded Enlighten data acquisition system.

The metrology platform (shown in Table 3.1) includes insolation and weather monitoring. Minute-by-minute global horizontal irradiance (GHI) data was monitored by a Kipp & Zonen CMP6 pyranometer positioned near the fixed racking. A second pyranometer, a Kipp & Zonen CMP11, was also set on the horizontal plane and connected to a Daystar multi-tracer. Two Vaisala WXD520 weather stations were placed on the SunFarm to record wind speed, wind direction, rainfall, rain intensity, rain duration, and humidity. An anemometer was connected to the Master Control Unit of the trackers to monitor the wind load on the trackers. T-type thermocouples were used for backsheet temperature monitoring.

Instrument                         Attributes
Enphase micro-inverter             AC power, DC current, DC voltage
Kipp & Zonen pyranometer           Solar irradiance
Vaisala WXD 520 weather station    Temperature, wind speed, wind direction, rainfall, hail, relative humidity
T-type thermocouple                Backsheet temperature

Table 3.1. Parameters monitored using SDLE SunFarm metrology platform

The data acquisition system consists of 17 networked Campbell Scientific CR-1000 dataloggers, each connected to an AM 16-32 multiplexer that extends the capacity of the datalogger to 32 differential measurement channels. The Campbell dataloggers monitor thermocouple and sensor outputs. Enphase micro-inverters use the Enphase Envoy Communications Gateway to connect each individual micro-inverter to the Enlighten monitoring and management software. Similarly, Solectria string inverters use the Solrenview system to collect data. Minute-by-minute data can be downloaded from the Solrenview web servers.

3.2 Global SunFarm network and Energy CRADLE

Cleveland’s humid continental climate is not typical for PV degradation research.
In order to study PV modules’ performance under different climatic conditions, a global SunFarm network was established among nine PV outdoor test beds across the world.

The purpose of the Energy Common Research Analytics and Data Lifecycle Environment (Energy CRADLE) is to create, for engineering and in particular for lifetime science, the tools and protocols necessary to transform Big Data into information, which informs scientific knowledge to guide further analysis [30,38–40]. Energy CRADLE is tightly focused on serving the needs of handling and sharing data across the SunFarm network. Appendix B provides further details.

4 Results: Real-world data analytics

4.1 Overview

This chapter focuses on the real-world performance of 60 crystalline silicon PV modules from 20 different manufacturers exposed from November 25, 2012 to May 31, 2013 on the SDLE SunFarm. The purpose of this case study is to interpret the information in the data collected during the first 6 months of the SDLE SunFarm’s operation and to develop a data cleaning and data munging procedure. This analytic procedure will be integrated within Energy CRADLE and will guide the way for data processing on the cloud. This case study can also inform experimental design and evoke further research interests.

Fig. 4.1 is a blueprint of the SDLE SunFarm’s 16 electrical sites. The 60 modules studied are distributed on the sites marked with red boxes, specifically fixed rack Site 1 and tracker Sites 4, 6, 8, 12, and 14. All three modules from the same manufacturer are placed on the same site. On fixed rack Site 1, 18 modules from 6 brands are aligned horizontally, with modules of the same brand placed adjacent to each other. On the trackers, which carry either 6, 9 or 12 modules each, modules of the same brand are evenly distributed on the same tracker frame.
In this study, manufacturer information of the modules is withheld; each brand is referred to by a capital letter, A through T. A module’s location is represented by a lower case f or t, short for “fixed rack” or “tracker”, respectively, followed by the site number. Each module has a sample number starting with “sa”. For example, in Fig. 4.3 power data was recorded from “A.f1.sa18259.00”, which is a module of brand A mounted on fixed rack Site 1 with sample number “sa18259.00”.

Figure 4.1. The 60 PV modules studied in this section are distributed on 6 different electrical sites, shown in red boxes in the plot. Site 1, which is the long site along the bottom, is a fixed tilt rack site. 18 modules are exposed on Site 1. The rest of the modules are exposed on even-numbered tracker sites. There are 12 modules on Site 4, 6 modules each on Sites 6 and 8, and 9 modules each on Sites 12 and 14.

4.1.1 Analytical methods

R [1], a free and open source programming language and software environment for statistical computing and graphics, was chosen as the data analysis tool. The data analytical methods applied to this case study consist of raw data validation, exploratory data analysis, data assembly, data subsampling, and clustering data analysis.

4.2 Raw data validation

4.2.1 Module baseline

All the modules studied on the SDLE SunFarm were brand new modules purchased on the open market. Before the modules were exposed to sunlight, the I-V characteristics of each module were recorded using a SPIRE SPI-4800 solar simulator and I-V curve tracer located at the Wright Center for Photovoltaic Innovation and Commercialization (PVIC) at the University of Toledo. In order to reduce the impact of instrument uncertainty, sixteen I-V curve measurements were acquired for each module. Additionally, the backsheet temperature and the irradiance intensities were recorded.
Each measurement was corrected to standard test conditions (STC), specified as 25 °C and 1 kW/m², according to IEC 60891 [18]. Maximum power output (Pmax) was taken from the 16 corrected I-V curve measurements to represent the initial performance of each module under STC. For the 60 modules, the standard deviations of the 16 Pmax measurements fall between 0.04% and 0.9%, which supports the reliability of the baseline results. In order to evaluate the initial performance of each brand, the mean Pmax of the three modules of each brand was taken and normalized by dividing by the nominal power output of the module. Fig. 4.2 shows the normalized performance of each brand; the deviation among the three module samples is shown as error bars. The initial performance of most brands (all except H and Q) falls between 0.95 and 1.05, which means their initial performance met the common market expectation of ±5% of nominal power.

Figure 4.2. Cross-sectional comparison of crystalline silicon PV modules from 20 different manufacturers. The Y axis is normalized power. The X axis shows the brands and locations. Brand names were replaced with letters A through T. The letters f and t represent fixed tilt rack and tracker. The maximum power output (Pmax) of three modules of each brand was measured. The bars show the averaged normalized power of each brand. The standard deviation is plotted as error bars.

4.2.2 Power data

As introduced in the previous chapter, electricity generated by all 60 modules is reported by the microinverters’ data acquisition system, Enlighten. Enlighten reports DC current, DC voltage, the microinverter’s internal temperature, and AC power. The data collection interval is 5 minutes. Prior to Energy CRADLE, power data was collected from Enlighten manually. Fig. 4.3 shows an example module’s AC power over six months. From this figure, we can clearly see the daily variation of the power data.
During the 180 days, there are several gaps in the data, partly due to three trips of the interconnection relay. Over the 180 days of observation time, 99 days have power data reports.

Figure 4.3. Power production versus time of one PV module.

4.2.3 Microinverter’s efficiency

A microinverter’s conversion efficiency is given by the ratio of AC power to DC power. DC power is calculated as the product of DC current and DC voltage. AC power is provided in the data. Both DC current and DC voltage have two significant digits, while AC power is given as an integer. Fig. 4.4 shows the efficiency of the 60 microinverters. The majority of the efficiencies are between 95% and 99%, which is consistent with the efficiency provided by the manufacturer. However, there are 12 points that exceed 1.0, which is contrary to the laws of thermodynamics. Inspection of the raw data suggests that when the PV module’s DC output is low (around 1 W), the microinverter tends to “round up” the product of DC current and voltage to an integer. This rounding behavior explains why abnormal efficiency appears mostly on trackers 12 and 14; these two trackers did not track properly during the majority of the 180 days over which data was collected. The modules on these trackers were exposed to low irradiance levels for longer than the other modules.

Figure 4.4. The efficiency of 60 microinverters from 99 days of data collection after exposure on the SDLE SunFarm on fixed rack 1 (blue), tracker 4 (red), tracker 6 (yellow), tracker 8 (green), tracker 12 (purple), and tracker 14 (light blue).

4.2.4 Microinverter burst-mode

Further investigation of the “round up” effect shows that it arises because the Enphase microinverter works in “burst-mode” at low DC input.
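The efficiency check of Section 4.2.3 can be illustrated with a minimal sketch. It is written in Python for illustration only; the analysis in this work was carried out in R. The function name and the two sample readings are hypothetical and are not taken from the Enlighten data. The second reading shows how an integer-valued AC power of 1 W combined with a DC power below 1 W yields an apparent efficiency above 1.0.

```python
def inverter_efficiency(ac_power_w, dc_current_a, dc_voltage_v):
    """Return AC/DC conversion efficiency, or None when DC power is not positive."""
    dc_power_w = dc_current_a * dc_voltage_v
    if dc_power_w <= 0:
        return None
    return ac_power_w / dc_power_w

# Hypothetical readings: (integer AC power in W, DC current in A, DC voltage in V)
readings = [
    (190, 6.90, 28.50),  # normal operation, DC power about 196.7 W
    (1, 0.03, 27.00),    # low irradiance, DC power about 0.81 W but AC reported as 1 W
]

efficiencies = [inverter_efficiency(*r) for r in readings]

# Flag physically impossible values for inspection rather than silently dropping them.
suspect = [e is not None and e > 1.0 for e in efficiencies]
```

Flagging rather than deleting such points keeps them available for later steps, such as the sub-sampling analysis, where the same low-power behavior reappears.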
When a PV module is working under low irradiance, the DC output of the module is low, and therefore the DC to AC conversion efficiency drops [41]. Microinverters can scan the DC voltage at each AC cycle (1/60 second). When a microinverter detects that the DC input is lower than 30%, it charges a capacitor instead of converting DC to AC power. At the next cycle, the microinverter scans the PV module for its output again and adds it to the charge already stored in the capacitor bank from the previous cycle. If the combined power is high enough for a DC-to-AC conversion, the capacitor releases the charge. As a result, when the microinverter is “bursting” the stored-up charge, the AC output of the microinverter is higher than what the DC input would dictate. This explains why a microinverter appears to round up its AC power, which in turn makes its efficiency appear higher when it is working at low irradiance levels. However, it is also known that the AC power reported by Enlighten is an averaged value instead of an instantaneous measurement. The effect of burst-mode will be shown in the data subsampling part of this chapter.

4.2.5 Weather data

Insolation data. This study uses global horizontal irradiance (GHI) monitored by a Kipp & Zonen CMP6 pyranometer placed on a horizontal plane as reference. The sampling rate for irradiance data was determined by the datalogger’s scan period, which is 1 minute for all the dataloggers on the SDLE SunFarm. Incident irradiance on a PV module’s plane is different from that on the horizontal plane, so in order to convert GHI to plane of array (POA) irradiance, the assumption was made that all incident sunlight is direct sunlight.

Global horizontal irradiance (GHI) to plane of array (POA) irradiance conversion. In reality, global horizontal irradiance (GHI) consists of direct irradiance and diffuse irradiance.
Direct irradiance is proportional to direct normal irradiance (DNI) through a sine function, while diffuse irradiance varies on different planes.

$$\begin{aligned} GHI &= DNI \times \sin\alpha + I_{dif} \\ POA_{tracker} &= DNI + I_{dif} \\ POA_{fixed} &= DNI \times \sin(\alpha + \beta) + I_{dif} \end{aligned} \tag{4.1}$$

where $I_{dif}$ is the diffuse irradiance, $\alpha$ is the elevation angle of the sun, and $\beta$ is the tilt angle of the fixed rack; in this case $\beta$ equals 22.3°. The elevation angle is given as:

$$\alpha = 90^\circ - \theta + \delta \tag{4.2}$$

where $\theta$ is the latitude and $\delta$ is the declination angle, given as:

$$\delta = 23.45^\circ \sin[360^\circ \times (284 + d)/365] \tag{4.3}$$

where $d$ is the day of the year. Since neither DNI nor $I_{dif}$ data was available for the first 6 months, the assumption here is that all the incident light is direct sunlight, which simplifies the formulas to:

$$\begin{aligned} GHI &= DNI \times \sin\alpha \\ POA_{tracker} &= DNI \\ POA_{fixed} &= DNI \times \sin(\alpha + \beta) \end{aligned} \tag{4.4}$$

The estimated POA is expected to be higher than the actual incident irradiance on a module’s surface, because diffuse light is treated as direct light and is therefore amplified when GHI is converted to POA. In the future, this systematic error can be removed by setting an irradiance sensor on the plane of array and adding a direct irradiance sensor for DNI. In this case study, since module performance is cross compared, the systematic error can be ignored.

Additional climate data. Additional climate data, including special climate events and cloudiness levels, were collected from online open source historical data such as Weather Underground (http://www.wunderground.com).

4.2.6 Data alignment

Data alignment is another important validation process for time series data. There are multiple data sources on the SunFarm, and different devices synchronize time from different time sources. For example, the weather data used in this study was collected by dataloggers on the SDLE SunFarm.
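The simplified GHI-to-POA conversion of Equations 4.2 to 4.4 can be sketched as follows. This is an illustrative Python sketch (the analysis in this work used R), not the code used for the results. The function names are hypothetical, the latitude of 41.5° is an approximate value assumed for Cleveland, and the GHI value of 300 W/m² is an arbitrary example. Equation 4.2 as written gives the elevation angle at solar noon, so the sketch applies only to noon-time values.

```python
import math

def declination_deg(day_of_year):
    """Solar declination angle in degrees (Equation 4.3)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))

def elevation_deg(latitude_deg, day_of_year):
    """Solar elevation angle in degrees (Equation 4.2)."""
    return 90.0 - latitude_deg + declination_deg(day_of_year)

def poa_from_ghi(ghi, latitude_deg, day_of_year, tilt_deg=22.3):
    """Estimate (POA_tracker, POA_fixed) from GHI, assuming all light is direct (Equation 4.4)."""
    alpha = math.radians(elevation_deg(latitude_deg, day_of_year))
    if math.sin(alpha) <= 0:
        raise ValueError("sun is at or below the horizon")
    dni = ghi / math.sin(alpha)  # invert GHI = DNI * sin(alpha)
    poa_tracker = dni            # a tracker plane is normal to the beam
    poa_fixed = dni * math.sin(alpha + math.radians(tilt_deg))
    return poa_tracker, poa_fixed

# Example with assumed inputs: GHI of 300 W/m^2 on day 355 at latitude 41.5 degrees.
example_tracker, example_fixed = poa_from_ghi(300.0, latitude_deg=41.5, day_of_year=355)
```

With a low winter sun the division by sin(α) is large, which is why treating diffuse light as direct light inflates the POA estimate most strongly at low elevation angles.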
Time on these dataloggers was synchronized through controller software on a desktop computer in the SDLE lab. The power data was reported by the Enphase user interface, which synchronizes time with the Enphase server. PV modules generate power almost instantaneously when sunlight hits the front surface; therefore, the time series of power and irradiance should be highly correlated. Weather and power data were aligned using the sample cross correlation function (ccf) in R. The ccf in R is defined as the set of sample correlations between time series X at time t + h (h = 0, ±1, ±2, ...) and time series Y at time t, where X is potentially a predictor of Y. If two time series are perfectly aligned, the correlation is highest when h = 0 and drops as the absolute value of h increases. However, if the maximum correlation appears when h is positive, then X lags Y. If the correlation is maximum when h is negative, then X leads Y. It was determined using the ccf that the weather data was leading the power data by 3 minutes before March 10th, 2013. After daylight saving time began on March 10th, 2013, the time shift became 63 minutes. When checked against standard Greenwich Mean Time (GMT), the timestamps of the power data were found to be the more trustworthy. The weather data were therefore separated into two parts, before and after March 10, and the timestamps of each part were shifted accordingly.

4.2.7 Malfunction of trackers

According to the maintenance record, tracker 4 did not experience any mechanical problems. All the other trackers experienced some amount of malfunction during the observation time. In order to determine the days when trackers malfunctioned, the power data from an example module on each tracker were plotted versus time and compared to the example power data from the functioning tracker 4.
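The lag search described in Section 4.2.6 can be sketched in a few lines. The alignment in this work used the `ccf` function in R; the Python below is only an illustrative re-implementation of the same definition (correlation between X at time t + h and Y at time t), with hypothetical function names and a synthetic test signal rather than SunFarm data.

```python
import math

def cross_correlation(x, y, max_lag):
    """Sample cross-correlation between x[t+h] and y[t] for h in [-max_lag, max_lag]."""
    n = len(x)
    if n != len(y):
        raise ValueError("series must have equal length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("constant series have no defined correlation")
    result = {}
    for h in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            if 0 <= t + h < n:
                total += (x[t + h] - mean_x) * (y[t] - mean_y)
        result[h] = total / (n * sd_x * sd_y)
    return result

def best_lag(x, y, max_lag):
    """Return the lag h with the highest cross-correlation."""
    ccf = cross_correlation(x, y, max_lag)
    return max(ccf, key=ccf.get)

# Synthetic check: x is y delayed by 3 samples, so x lags y and the peak is at h = +3.
y = [math.exp(-((t - 30) / 5.0) ** 2) for t in range(80)]
x = [0.0] * 3 + y[:-3]
lag = best_lag(x, y, max_lag=10)
```

Once the lag is known, the timestamps of one series are shifted by that number of sampling intervals; for data with a change in offset, such as the daylight saving transition, the search is run separately on each segment.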
An example of the power data curve from a stopped tracker (tracker 8) and the power data curve from a normally operating tracker (tracker 4) is shown in Fig. 4.5. The power curve on tracker 8 is not symmetric, with the majority of the power generated in the afternoons; thus, the tracker was stopped facing west. By comparing the curves in this manner, the malfunctioning dates of each tracker were determined. Tracker 6 was stopped for 5 days in May. Tracker 8 was stopped for 10 days in April and May. Tracker 12 was off tracking until its gear counter was replaced. Tracker 14 was not tracking most of the time because of a gear stopper issue.

Figure 4.5. A comparison of AC power generated by a module on tracker 8 (top) and tracker 4 (bottom) between May 12th and May 18th, 2013, when tracker 8 stopped functioning and was facing west.

4.3 Exploratory Data Analysis (EDA) on Integrated Data

4.3.1 Total power production

A common way of evaluating a PV module’s performance is to compare total power production. A module’s power production, in this case, is affected not only by its nominal power rating but also by its mounting system. The averaged total power production of each brand is shown in Fig. 4.6. The highest power production in 99 days is 60.36 kWh, from brand G on tracker 4, and the lowest, 28.79 kWh, is from brand T on tracker 14. The four brands on tracker 4 (red), on average, produced about 40% more power than the six brands on the fixed rack (blue). Modules on the other trackers produced less power than those on tracker 4 by varying degrees. Generally, modules on a tracker should produce more power than those on a fixed rack when the tracker is operating correctly. However, except for the trackers on Sites 4 and 6, the modules on trackers produced less power on average than the modules on fixed rack Site 1.
In order to compare the performance of different brands on the same site, total power needs to be normalized.

Figure 4.6. The bar graph shows the averaged total power production of each brand from fixed rack 1 (blue), tracker 4 (red), tracker 6 (yellow), tracker 8 (green), tracker 12 (purple), and tracker 14 (light blue). The standard deviations are plotted as error bars.

4.3.2 Normalized power yield

Normalized power yield is defined as the ratio of the total power production to the product of nominal power and exposure days (i.e., 99 days) [28]. Normalized power yield is equivalent to the time per day that the PV plant would have to operate at nominal power output to produce the same energy. Normalized power is an important factor in choosing PV modules. In the PV industry, a module’s price is presented in dollars per watt, so modules with higher normalized power yield are more cost effective. Fig. 4.7 shows the normalized power production of each brand. As the trackers experienced different problems, it is not valid to compare the performance of two brands on different sites. However, the ranking of brands within one site demonstrates the relative performance of different brands. For example, on the fixed rack, shown in blue, brand B’s total power production is the lowest but its normalized power production is the highest among the 6 brands. This indicates that under the same environment, including temperature and irradiance conditions, brand B performs better than the other brands on Site 1.

4.4 Clustering of AC Power Data

Section 4.3 showed that the integrated performance, in terms of total power production and normalized power production, varies among the 20 different brands; however, brands on the same site perform similarly. In order to determine whether modules of the same brand always perform similarly, it is necessary to check the similarity of the 60 modules’ AC power time series data.
As mentioned in Chapter 2, a statistical way of checking the similarity of multiple observations is clustering analysis. A hierarchical clustering analysis (HCA) was conducted on all of the AC power time series data from the 99 days of observation. There are 9698 observations for each module. A dendrogram that uses the Euclidean distance metric and the average linkage criterion is shown in Fig. 4.8. The distance metric and linkage criteria will be discussed in Chapter 5. Red boxes in the plot show the result of dividing the modules into six groups. The grouping reflects exactly the 6 physical sites on the SDLE SunFarm. Although there are some exceptions, most of the modules of the same brand are close to each other in distance. In Fig. 4.8, from left to right, the 6 groups consist of modules from tracker 14, tracker 12, tracker 8, fixed rack 1, tracker 6, and tracker 4, respectively.

Figure 4.7. The bar graph shows the averaged normalized power production of each brand from fixed rack 1 (blue), tracker 4 (red), tracker 6 (yellow), tracker 8 (green), tracker 12 (purple), and tracker 14 (light blue). The standard deviations are plotted as error bars.

However, six is an arbitrary number chosen from experience with the data. In order to confirm that the result of the HCA is valid, the k-means algorithm was used. K-means clustering partitions observations into k clusters so as to minimize the total within-cluster sum of squares (WCSS). In this case, each sample (PV module) has a set of 9698 observations, so each sample is treated as a 9698-dimensional vector. In order to determine the k value that gives the most reasonable result, a commonly used method is the elbow method [42]. The elbow method is applied to a plot of WCSS as a function of the number of clusters, k. The best clustering result occurs at the point beyond which adding another cluster no longer substantially improves the model of the data. This point should be chosen as the cluster number, hence the “elbow criterion”. The WCSS as a function of k is plotted in Fig. 4.9. The elbow point occurs at k = 6, which is marked with a red circle. The k-means clustering result at k equal to 6 is consistent with the result of the HCA.

Figure 4.8. Hierarchical cluster analysis of 60 modules based on all AC power time series data. The clusters were generated using “hclust” in the “stats” package in R (v3.0.1). The distance matrix is computed using the Euclidean method. Distance between sets of observations is defined with the average linkage method. When the dendrogram tree is divided into 6 groups, each group includes exactly the modules physically located on the same electrical site.

Figure 4.9. Total within-cluster sum of squares (WCSS). The elbow point occurs when k is equal to 6.

In order to visually confirm that the AC power time series falling into each group are similar to each other, Fig. 4.10 plots the AC power output of the 60 PV modules over 99 days according to both the k-means and the hierarchical clustering results. The 60 AC power time series were separated into 6 groups; modules from the same brand are shown in the same color. Groups 1 through 6 correspond to tracker 14, tracker 12, tracker 8, fixed rack 1, tracker 6, and tracker 4, respectively. Fig. 4.10 confirms that the shape and magnitude of the AC power time series in each cluster are similar.

Figure 4.10. AC power of 60 modules grouped by hierarchical clustering result. The color of each curve differentiates module brands.

4.5 Data Assembly

4.5.1 Performance metrics

Up to this point, the analysis was based on the 60 modules’ AC power data. However, in order to correlate the modules’ power output with climate conditions, climate data and power data were assembled in the following way.
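The elbow method of Section 4.4 can be illustrated with a small, self-contained sketch. The clustering in this work was performed in R; the Python below is a plain version of Lloyd's k-means algorithm written only for illustration. The deterministic choice of initial centroids and the nine two-dimensional toy points are assumptions made for this example; they stand in for the 60 AC power vectors of 9698 dimensions each.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; initial centroids are k evenly spaced input points (arbitrary choice)."""
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

def total_wcss(centers, clusters):
    """Total within-cluster sum of squared distances to the cluster centroids."""
    return sum(squared_distance(p, c) for c, members in zip(centers, clusters) for p in members)

# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
wcss_by_k = {k: total_wcss(*kmeans(points, k)) for k in range(1, 5)}
```

Plotting `wcss_by_k` against k gives the curve used by the elbow method: the WCSS falls steeply up to k = 3 for this toy data and only marginally afterwards, so the elbow sits at the true number of groups.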
To compare the performance of 60 modules with different nominal power and different mounting systems, a normalized analysis and presentation was introduced based on IEC 61724 [28] and the work of H. Haeberlin et al. [43].

Normalized energy yields and losses. The definitions of the six performance indices introduced in IEC 61724 were discussed in Chapter 2. Since the 60 modules being studied each work with an individual microinverter instead of in a PV array, and the AC power generated is fed directly back to the grid, each module is one PV plant. Data was collected on a minute basis instead of on a daily basis. Specifically, power data was collected every 5 minutes and weather data was collected every minute; therefore, it is necessary to modify the performance metrics. These new performance indices are normalized instantaneous quantities.

Irradiance yield, $Y_I$, is the POA irradiance normalized to the reference irradiance of 1 kW/m² (Equation 4.5).

$$Y_I = POA / G_0, \quad G_0 = 1\ \mathrm{kW/m^2} \tag{4.5}$$

DC yield, $Y_{DC}$, is the DC power normalized to the module’s nominal power (Equation 4.6). DC power was calculated by multiplying DC current by DC voltage.

$$Y_{DC} = P_{DC} / P_0 \tag{4.6}$$

AC yield, $Y_{AC}$, is the AC power normalized to the module’s nominal power (Equation 4.7).

$$Y_{AC} = P_{AC} / P_0 \tag{4.7}$$

Capture losses, $L_c$, are the part of the incident solar power not captured by the solar cell (Equation 4.8).

$$L_c = Y_I - Y_{DC} \tag{4.8}$$

System losses, $L_s$, are the DC-AC inverter conversion losses (Equation 4.9).

$$L_s = Y_{DC} - Y_{AC} \tag{4.9}$$

Performance ratio (PR) is the ratio of the useful energy fed back into the grid to the energy that would be generated by an ideal PV module with a cell temperature of 25 °C and the same irradiance.

$$PR = Y_{AC} / Y_I \tag{4.10}$$

4.5.2 Solar time

Local noon time is usually not when the sun is the highest in the sky due to the Earth’s orbit and human adjustments such as time zones and daylight saving time.
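The six instantaneous metrics of Equations 4.5 to 4.10 can be computed together, as in the following minimal sketch. It is written in Python for illustration only (the analysis in this work used R); the function name and the sample values are hypothetical and do not come from the SunFarm data.

```python
def performance_metrics(poa_w_m2, p_dc_w, p_ac_w, p_nominal_w, g0_w_m2=1000.0):
    """Return the normalized yields, losses, and PR for one observation."""
    y_i = poa_w_m2 / g0_w_m2        # irradiance yield (Equation 4.5)
    y_dc = p_dc_w / p_nominal_w     # DC yield (Equation 4.6)
    y_ac = p_ac_w / p_nominal_w     # AC yield (Equation 4.7)
    return {
        "Y_I": y_i,
        "Y_DC": y_dc,
        "Y_AC": y_ac,
        "L_c": y_i - y_dc,          # capture losses (Equation 4.8)
        "L_s": y_dc - y_ac,         # system losses (Equation 4.9)
        # PR is undefined without irradiance (Equation 4.10)
        "PR": y_ac / y_i if y_i > 0 else None,
    }

# Hypothetical observation: 800 W/m^2 POA, 180 W DC, 172 W AC, 240 W nominal power.
m = performance_metrics(800.0, 180.0, 172.0, 240.0)
```

By construction the metrics satisfy Y_I = L_c + L_s + Y_AC, which is a convenient consistency check when the assembled weather and power data are merged.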
Noon local solar time (LST) is defined as the time when the sun is highest in the sky for a particular location, which is not necessarily local noon [44]. In order to better understand the modules’ performance in relation to solar motion, the timestamps of the data need to be converted from local time (LT) to LST. The local standard time meridian (LSTM) is a reference meridian used for a particular time zone and is similar to the Prime Meridian (longitude = 0°), which is used for Greenwich Mean Time (GMT) [45]. The formula for calculating LSTM is given by Equation 4.11:

$$LSTM = 15^\circ \times \Delta T_{GMT} \tag{4.11}$$

where $\Delta T_{GMT}$ is the difference between the local time and GMT in hours. $\Delta T_{GMT}$ equals −4 for eastern daylight time (EDT) and −5 for eastern standard time (EST). The equation of time (EoT) corrects for the eccentricity of the Earth’s orbit and the Earth’s axial tilt (Equation 4.12).

$$EoT = 9.87 \sin(2B) - 7.53 \cos(B) - 1.5 \sin(B) \tag{4.12}$$

where $B = 360^\circ (d - 81)/365$ in degrees and $d$ is the day number of the year. The net time correction factor (TC) accounts for the variation of LST within a given time zone (Equations 4.13 and 4.14).

$$TC = 4(Longitude - LSTM) + EoT \tag{4.13}$$

$$LST = LT + TC/60 \tag{4.14}$$

Six performance metric variables of one single module over one day in LST are shown in Fig. 4.11. The $Y_I$, $Y_{DC}$, $Y_{AC}$, $L_c$, $L_s$, and PR curves are plotted in black, green, blue, yellow, brown, and red, respectively. On a clear sunny day both irradiance and PR show a dome shaped curve. The PR curve has a comparatively flat top, which suggests that PR and POA are highly correlated and that PR is less sensitive to POA irradiance at high levels (over 750 W/m²).

4.6 Sub-sampling

4.6.1 Solar noon time performance ratio

From the EDA plot of performance metrics (Fig. 4.11), it is clear that PR is correlated with POA irradiance. PR can reach up to 0.85 on the 22.3° fixed rack and 0.90 on the tracker at solar noon time when the POA irradiance is high.
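The local-time to solar-time conversion of Equations 4.11 to 4.14 can be sketched as follows. The Python is illustrative only (the analysis in this work used R). The function names are hypothetical, times are expressed in decimal hours, longitude is taken as positive east of the Prime Meridian, and the longitude of −81.7° is an approximate value assumed for Cleveland.

```python
import math

def equation_of_time_min(day_of_year):
    """Equation of time in minutes (Equation 4.12)."""
    b = math.radians(360.0 * (day_of_year - 81) / 365.0)
    return 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)

def local_solar_time_h(local_time_h, longitude_deg, gmt_offset_h, day_of_year):
    """Convert local time (decimal hours) to local solar time (Equations 4.11, 4.13, 4.14)."""
    lstm_deg = 15.0 * gmt_offset_h                     # Equation 4.11
    tc_min = 4.0 * (longitude_deg - lstm_deg) + equation_of_time_min(day_of_year)  # Equation 4.13
    return local_time_h + tc_min / 60.0                # Equation 4.14

# Example with assumed inputs: 12:00 EST (GMT offset -5 h) on day 81 at longitude -81.7 degrees.
lst = local_solar_time_h(12.0, longitude_deg=-81.7, gmt_offset_h=-5, day_of_year=81)
```

For a site west of its time zone meridian the longitude term is negative, so solar noon occurs later than clock noon; the ±15 minute sub-sampling window is then centered on the clock time at which `local_solar_time_h` returns 12.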
In order to reduce the volume of data and reduce temporal fluctuations, the PR is subset to ±15 minutes around solar noon time. The sampling interval of the PR is 5 minutes, so there are about 7 data points within this 30 minute window. During the 99 days, there are roughly 700 observations for each module, which is still statistically sufficient for further analysis.

Figure 4.11. Normalized performance of one single module (D.f1.sa18286.00) on the fixed rack on December 12, 2012. The variables are PR (red), $Y_I$ (black), $Y_{DC}$ (green), $Y_{AC}$ (blue), $L_c$ (yellow), and $L_s$ (brown).

4.6.2 Snowy days

EDA on the solar noon time PR subset was performed by plotting PR versus $Y_I$ for each module. An example PR versus $Y_I$ plot is shown in Fig. 4.12. Three groups of abnormal data points are marked with yellow circles in the plot.

Figure 4.12. Solar noon time PR of a module (C.f1.sa18328.00) on the fixed rack versus $Y_I$. The vertical blue line marks a POA irradiance of 1200 W/m²; the red horizontal line marks a PR of 1.0. Group 1 contains the points with a PR greater than one. Group 2 contains the points at irradiance higher than 1200 W/m². Group 3 contains the points with zero PR.

Group 1. In theory, the PR can never exceed one. In the literature and standards [22,28,43], PR is normally reported to be 0.8-0.85 on average. The abnormal points in Group 1 have a calculated PR greater than one. By looking at the raw data, including AC power, DC power, and POA irradiance, at each of these data points, two potential causes were found. First, these data points appeared when the irradiance changed quickly. As discussed previously, power data was reported by the Enphase Envoy system, and its method of data acquisition is unknown. It could be that Enphase does not report instantaneous power but an averaged value. Using averaged power data together with instantaneous POA measurements for the PR calculation introduces a systematic error.
Another cause of the high performance ratio could be the microinverter working in burst-mode, as discussed earlier.

Group 2. Solar radiation outside the Earth’s atmosphere is 1.36 kW/m², and the global irradiance on a tracker plane at noon time in Denver, Colorado is less than 1.2 kW/m². The abnormal data points in Group 2 show a POA irradiance on the 22.3° tilted fixed rack plane that is higher than 1.2 kW/m². This is a systematic error introduced by converting GHI to POA. Without direct global POA irradiance monitoring or direct sunlight monitoring, it is not possible to correct the error. However, since the same irradiance conversion method is used for all modules, it does not affect the cross-sectional comparison of the modules’ performance.

Group 3. The PR of the modules was small or equal to zero even when the irradiance was not very low, which indicates that the module may have been covered. Moreover, this occurred only on certain days in December and January, and only on some modules, mainly those on the fixed rack. Given the climate, this suggests that it was caused by snow coverage.

Snowy days. In order to document the relationship between low module performance and snowy weather, historical climate condition data was collected from a third party web site. PR time series were plotted for each module, with data points on snowy days highlighted in red and blue [46]. Fig. 4.13 shows the PR of six modules of two different brands (three modules from each brand). Low PR appears only during or after snow or fog-snow days, which supports the interpretation that the abnormal points in Group 3 were most likely caused by snow coverage. All snow-covered dates were determined by plotting the PR of all 60 modules versus time, and the snowy day data were assembled as a subgroup.

Figure 4.13. Solar noon time PR of six modules; data points recorded when there was snow or fog-snow are highlighted in red and blue.
The three modules on the top row are from brand A, placed on the fixed rack. The three modules on the bottom row are from brand G, placed on tracker 4. All three brand A modules showed low or zero performance during or after some snowy days, while the three modules on the tracker did not.

4.7 Clustering of Solar Noon Time Performance Ratio Data

As discussed in Section 4.6, the PV modules’ performance ratio data was subset to ±15 minutes around solar noon time. In order to reduce data volume, the average of each day’s PR data was taken. After removing the snowy days, there are 75 days of PR data left; thus 75 data points represent the solar noon time performance of each module. Since the relationships among the 60 modules’ noon time performance are not intuitive, an EDA can lead to a better understanding of the data. A pairs plot is commonly the first step of EDA.

Figure 4.14. A pairs plot of solar noon time PR of three modules of the same brand. For each row, the Y axis is the PR of one module. For each column, the X axis is the PR of one module. Each module’s sample number is shown in the diagonal boxes. The correlation coefficient of each X, Y pair is calculated and represented by varying shades of green. The darker the green, the higher the correlation coefficient.

The pairs plot takes the PR of one module as the X coordinate and the PR of another module as the Y coordinate. If the two modules under comparison performed the same at each observation time, then we expect to see all data points on a diagonal line. Fig. 4.14 is a plot of solar noon time PR of three modules of the same brand. In order to better visualize the correlation of the X and Y coordinates, the correlation coefficient of the two modules is represented by varying shades of green related to the strength of the correlation coefficient.
A darker green indicates a higher correlation coefficient between the two PR series. In Fig. 4.14, the performance of module G.t4.sa18211.00 is over 99% correlated with that of G.t4.sa18210.00. Only the pairs plot of the first ten modules is shown in Fig. 4.15, as space is limited; however, a pairs plot of all 60 modules was studied. In the pairs plot of all 60 modules, a green and gray pattern helps visualize that the modules fall into different groups. Quantitatively grouping the modules requires a Pearson distance matrix, which uses the correlation coefficient to define the distance between different observations [47]. Also, since there is no domain knowledge suggesting the number of clusters, a k-means clustering analysis was used to determine the number of clusters. Fig. 4.16 shows the WCSS as a function of the number of clusters, k, and there is a clear “elbow point” when k equals 5. An HCA dendrogram of solar noon time performance ratio using the Pearson distance matrix and the average linkage criterion is shown in Fig. 4.17. The modules are divided into 5 groups using the cutree function. The first group on the left consists of all the modules on the fixed rack. The second group contains all modules on trackers 4, 6, and 8 except for the three modules of brand M. The third group contains all modules on tracker 14 and the fourth group contains all modules on tracker 12. The last group on the right contains only the three modules of brand M. The time series of each module are plotted according to the HCA result (Fig. 4.18). There are several gaps since the data was not continuous due to snow and noncontiguous AC power data. The variability of the curves in the same group is mainly caused by the noncontiguous nature of the data. The largest dispersion of data curves appears in the last group. Before mid-February, the data curves of the three modules are highly varied.

Figure 4.15. Pairs plot of solar noon time PR of ten modules. For each row, the Y axis is the PR of one module. For each column, the X axis is the PR of one module. Each module’s sample number is shown in the diagonal boxes. The correlation coefficient of each X, Y pair is calculated and represented by color. The darker the green background, the higher the correlation coefficient. The first three modules showed strong correlations (over 99%) among themselves. They also showed fairly strong correlations (about 90%) with the next three modules, which are at a different location. However, the first three modules showed low correlation with the last four modules; the correlation coefficient is lower than 30%.

Figure 4.16. Total within-cluster sum of squares (WCSS). The elbow point occurs when k equals 5.

Figure 4.17. The hierarchical clustering of 60 modules based on solar noon time PR time series data. The distance matrix is computed using the Pearson method. Distance between the sets of observations is defined with the average linkage method. From left to right, the first group includes all modules on the fixed rack; the second includes all modules from trackers 4 and 6, and brand N on tracker 8; the third includes all modules on tracker 14; the fourth includes all modules on tracker 12; the last group consists of the three modules from brand M (on tracker 8).

Figure 4.18. Solar noon time PR of 60 modules grouped by hierarchical clustering result. The color of each curve differentiates samples.

5 Discussion

5.1 Data analytics

This section focuses on the problems found in the process of data cleaning, munging, and exploratory data analysis (EDA).

5.1.1 Irradiance data crosscheck

In the data subsampling part (Section 4.6), some modules, especially those on the fixed rack, showed low performance during and after snowy days due to snow coverage. Snow has the potential to cover irradiance sensors on the SunFarm.
Since all irradiance data used in this study were measured by a pyranometer mounted on top of an electrical cabinet near the fixed rack, it is necessary to evaluate the irradiance data quality. Two pyranometers were working on the SDLE SunFarm during the observation time. The GHI data used in this work was collected by a CMP11 pyranometer mounted on an electrical cabinet. The other pyranometer was mounted horizontally and connected to a Daystar multi-tracer. The Daystar can trace real-time I-V curves of up to 32 modules. Unlike the dataloggers, the Daystar does not collect irradiance measurements every minute; it records irradiance data only when an I-V curve is being taken. In the first several months, the Daystar took an I-V curve every 30 minutes. After proper data cleaning and data alignment, each day's irradiance data collected by the two pyranometers was compared in the same figure.

Figure 5.1. Cross-comparison of global horizontal irradiance (GHI) from two irradiance sensors. Daily panels cover 2012-12-26 through 2012-12-31.

In Fig. 5.1, red points represent irradiance data collected by the Daystar, and the black curve shows irradiance collected by the datalogger. Because the two instruments have different sensitivities and their sampling times and rates are not the same, an efficient way of evaluating the data is visual comparison. In most of the plots, the red dots lie on the black curve. However, in the second plot on the first row, December 27th, 2012, the red points are far above the curve, which indicates that the irradiance measured by the Daystar pyranometer is much higher than that measured by the one near the fixed rack. This is potentially caused by snow coverage. The weather condition data confirms that this data was collected on a snowy day. A survey of the irradiance cross check was done by plotting the two sensors' data together. On the following dates, the fixed rack pyranometer's reading is lower than that of the Daystar pyranometer: 2012-12-27, 2012-12-28, 2012-12-30, 2013-01-04, and 2013-01-25. These days' data were eliminated.

5.1.2 Performance ratio filtering

Several methods of data filtering were tried; however, with limited data sources (direct POA irradiance, Direct Normal Irradiance (DNI), and the module's temperature were not available for the case study), taking the PR data within ±15 min of solar noon time was considered the best filtering method, once malfunctions such as snow covered module surfaces and snow covered irradiance sensor data were eliminated from the final dataset. The average PR over the 30 min around solar noon time is plotted in Fig. 5.2. Modules on the fixed rack have the highest PR, around 0.75. Modules on trackers showed lower PR because the POA irradiance on the trackers was overestimated by the conversion, and because some microinverters on the trackers saturated at noon time during the last two months of operation.
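The solar noon filtering described above can be illustrated with a minimal sketch. It assumes the instantaneous form of the performance ratio, PR = (P_AC / P_nameplate) / (G_POA / G_STC) with G_STC = 1000 W/m^2, which may differ in detail from the data assembly used in this work. The function names, the sample layout, and the example values are hypothetical.

```python
from statistics import mean

G_STC = 1000.0  # W/m^2; reference irradiance (assumed normalization)


def performance_ratio(p_ac, p_nameplate, g_poa):
    """Instantaneous PR: normalized AC power over normalized POA irradiance."""
    return (p_ac / p_nameplate) / (g_poa / G_STC)


def noon_mean_pr(samples, p_nameplate, half_window_min=15.0):
    """Average PR of the samples within +/- half_window_min of solar noon.

    samples: iterable of (minutes_from_solar_noon, p_ac_watts, g_poa_w_per_m2).
    Returns None when no usable sample falls inside the window.
    """
    values = [
        performance_ratio(p_ac, p_nameplate, g_poa)
        for offset, p_ac, g_poa in samples
        if abs(offset) <= half_window_min and g_poa > 0
    ]
    return mean(values) if values else None


# Hypothetical samples for one 240 W module on one day.
day = [
    (-20.0, 150.0, 900.0),   # outside the window, ignored
    (-10.0, 180.0, 1000.0),
    (0.0, 186.0, 1000.0),
    (10.0, 174.0, 1000.0),
    (25.0, 140.0, 850.0),    # outside the window, ignored
]
daily_pr = noon_mean_pr(day, p_nameplate=240.0)
print(daily_pr)
```

Each day then contributes a single value per module, which is the form of data used for the solar noon time PR clustering.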
5.2 Performance at different relative positions

Nominally, there are only two positions of the PV modules relative to the sun (fixed rack tilted at 22.3° and tracker mounted); however, because of the trackers' mechanical failures during parts of the exposure, there are actually six different positions of PV modules relative to the sun. This is because the relative positions of modules on different trackers are not identical in practice.

Figure 5.2. Average performance ratio of 60 PV modules 15 min around solar noon time. The Y axis shows the sample ID of each module. From left to right are modules on fixed rack 1 (blue), tracker 4 (red), tracker 6 (yellow), tracker 8 (green), tracker 12 (purple), and tracker 14 (light blue), respectively.

5.2.1 Performance on sunny days

Modules mounted on the dual-axis trackers should always track the sun, which increases the amount of light incident on a module. Furthermore, as the incident light is normal to the module's surface, a larger portion of the sunlight can be absorbed. Intuitively, modules mounted on trackers should produce more electricity than those mounted on the fixed rack. Fig. 5.3 shows the normalized AC power production of one PV module on the fixed rack (blue) and one PV module on a normally operating tracker (red) on December 12, 2012. The fixed rack is tilted at 22.3°, which tends to optimize the irradiance gain during the summer months. Thus in December, even on a bright sunny day, the noon time peak power did not reach 70% of the nominal power. On this particular day, the total power production of the module on the tracker (G.t4.sa18211.00) is 1.64 times that of D.f1.sa18286.00.

Figure 5.3. Comparison of normalized AC power of G sa18211.00 (red) on tracker and D sa18286.00 (blue) on tilted rack on December 12th, 2012.

However, in the summer, as the sun's elevation angle is higher at noon time, the performance of the modules on the fixed rack increases. Fig. 5.4 shows the AC power production of the same modules on May 13th, 2013. The module on the tracker gains more power in the morning and afternoon, but produces almost the same power as the module on the fixed rack around noon time. As discussed above, one factor is that the noon sun in May is closer to the normal of the fixed rack's plane. Another possible effect is microinverter saturation. According to the data provided by Enphase, the M215's maximum output power is 225 W [48]. Around noon time, the AC power output of the brand G module increased to 225 W and the microinverter saturated, which forms the "flat head" of the curve.

Figure 5.4. Comparison of normalized AC power of G sa18211.00 (red) on tracker and D sa18286.00 (blue) on tilted rack on May 13th, 2013.

5.2.2 Performance on snowy days

Snow coverage occurred on all of the modules mounted on the fixed rack. When a PV module is fully covered by snow, no light reaches the solar cells and the module's power production is zero. If part of a solar cell is exposed to sunlight, it will generate electricity. However, because the cells are connected in series and protected by bypass diodes, a single working cell cannot drive current through the string. This small amount of electricity will be dissipated as heat, raising the temperature of the PV module and melting the snow on the surface. As a result, snow coverage should not last long, but it will increase the thermal stresses on the solar cells and strings, which may lead to reliability issues in the future. In contrast, most of the modules on the trackers do not have the snow coverage problem. On a normally working tracker, the tracker frame is almost vertical during dawn and dusk, which mechanically prevents snow accumulation on the module surface.

5.3 Performance of different brands

PV modules are chosen for PV power plants based on the nominal power on a module's data sheet. Fig. 4.2 shows that brands H and Q did not reach 95% of their nominal power, and therefore that a power plant could lose 5% of the designed power output by using these two models of PV module. For a nominal 10 MW utility scale power plant, that is over 500 kW of power "lost" due to the over-estimation of the PV modules' power output. In comparison, power plants using module type A, J, or S may produce more power than specified, which is not necessarily good, since it may overload the power grid or downstream electricity storage equipment.

Furthermore, a PV module's price is also based on the nominal power. For example, while brands A and Q share the same nominal power, the initial selling price of brand A in 2011 was 0.82 $/W, while that of brand Q was 0.80 $/W, suggesting brand Q is cheaper. However, given the baseline results of the two brands, brand A's initial performance is 101% of its nominal power, but brand Q's initial performance is only 92.5%. Dividing each selling price by the measured fraction of nominal power (0.82/1.01 and 0.80/0.925) gives effective prices for brands A and Q of 0.812 $/W and 0.865 $/W, respectively. Thus, brand A is actually the more cost effective choice.

The deviation of power output within a brand may also be an important factor for the initial performance of a PV power plant. Since most power plants are implemented with string inverters, the power output of a string is determined by the lowest-performing module in the string. The lower the power output deviation of modules in the same string, the less electricity is dissipated as heat in the string. Internal thermal stress is also suspected to be a factor that causes power degradation.

5.4 Power time series data clustering

In the hierarchical clustering of power time series data, two distance measures were considered: Euclidean distance and dynamic time warping (DTW) [49]. Euclidean distance, the most commonly used measure, is the square root of the sum of squared attribute differences.
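As an illustration of the Euclidean distance just defined, d(x, y) = sqrt(sum over t of (x_t - y_t)^2), the following minimal sketch computes the distance between two equally sampled power time series and builds a pairwise distance table. The function names and the example series are hypothetical and are not taken from the SunFarm data.

```python
import math


def euclidean_distance(series_a, series_b):
    """Square root of the sum of squared point-by-point differences."""
    if len(series_a) != len(series_b):
        raise ValueError("series must be aligned and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(series_a, series_b)))


def distance_table(series_by_module):
    """Pairwise Euclidean distances, keyed by (module, module)."""
    names = sorted(series_by_module)
    return {
        (m, n): euclidean_distance(series_by_module[m], series_by_module[n])
        for m in names
        for n in names
    }


# Hypothetical normalized AC power at four common time stamps.
power = {
    "fixed_1": [0.10, 0.50, 0.60, 0.20],
    "fixed_2": [0.10, 0.50, 0.60, 0.20],
    "tracker_1": [0.40, 0.80, 0.80, 0.60],
}
table = distance_table(power)
print(table[("fixed_1", "tracker_1")])
```

Because the distance compares values at identical time stamps, it presumes that all series share the same, rigidly aligned time axis.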
DTW is an algorithm often used in time series analysis for measuring the similarity of two temporal sequences that may vary in time or speed. However, in our application DTW was not used, because all time series data were rigidly aligned. The speed of variation of AC power was determined by the incident sunlight and a module's internal characteristics, so all series should vary at the same pace. Therefore, dynamic time warping is not applicable and Euclidean distance is more appropriate.

Nominally, there are only two positions of the PV modules relative to the sun (fixed rack and tracker mounted); however, because of the mechanical failures of the trackers during parts of the exposure time, there are actually six different positions of PV modules relative to the sun. This is because the relative positions of modules on different trackers are not identical in practice. To find a proper linkage criterion, hierarchical clustering results were compared using three linkage criteria: complete, single, and average. In complete-linkage clustering, or farthest neighbor clustering [50], the distance between two clusters equals the distance between the two elements that are farthest from each other. In single-linkage clustering, or nearest neighbor clustering, the distance between two clusters equals the distance between the nearest pair of elements. Average-linkage clustering, or UPGMA (unweighted pair group method with arithmetic mean), defines the distance between two clusters as the average of all distances between pairs of elements. In the dendrograms for all three linkage criteria, with the number of groups set to six, the modules on the same site are always grouped in one cluster. When these 6 large groups are cut into 20 smaller clusters, modules from the same brand are more likely to form a cluster with average linkage than in the dendrograms using the other two criteria.
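The three linkage criteria compared above differ only in how the pairwise element distances between two clusters are summarized. The sketch below makes this concrete; the function names and the one-dimensional example data are hypothetical, and any distance function can be passed in.

```python
from itertools import product
from statistics import mean


def pair_distances(cluster_a, cluster_b, dist):
    """All distances between one element of cluster_a and one of cluster_b."""
    return [dist(a, b) for a, b in product(cluster_a, cluster_b)]


def complete_linkage(cluster_a, cluster_b, dist):
    """Farthest neighbor: the largest pairwise distance."""
    return max(pair_distances(cluster_a, cluster_b, dist))


def single_linkage(cluster_a, cluster_b, dist):
    """Nearest neighbor: the smallest pairwise distance."""
    return min(pair_distances(cluster_a, cluster_b, dist))


def average_linkage(cluster_a, cluster_b, dist):
    """UPGMA: the arithmetic mean of all pairwise distances."""
    return mean(pair_distances(cluster_a, cluster_b, dist))


def absolute_difference(a, b):
    return abs(a - b)


# Hypothetical one-dimensional clusters; pairwise distances are 4, 6, 3, 5.
left = [0.0, 1.0]
right = [4.0, 6.0]
print(complete_linkage(left, right, absolute_difference))  # largest of the four
print(single_linkage(left, right, absolute_difference))    # smallest of the four
print(average_linkage(left, right, absolute_difference))   # mean of the four
```

Single linkage reacts to the closest pair only, complete linkage to the most distant pair only, and average linkage uses every pair, which makes it less sensitive to a single unusual module.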
Although k-means clustering is itself an independent clustering method, in this application it was used to determine the number of groups and to confirm the result of the hierarchical clustering. For power data clustering, the k-means result is consistent with the hierarchical clustering result when k equals six, which is also the elbow point. This result confirmed that a module's location (the site on which it was mounted on the SDLE SunFarm) has the strongest influence on its power production over time.

The analytical method of power time series data clustering gives a way of distinguishing PV systems mounted in different configurations. In this case study, it was already known from field observation that the six sites worked differently. For data shared through Energy CRADLE, which does not necessarily come with a maintenance log, power time series clustering will be a good first tool for classifying the power data.

5.5 Solar noon time performance ratio clustering

Solar noon time PR clustering is based on a subset of the PR time series data. Snow covered PV modules and snow covered irradiance sensor data were removed from the data. For the other 85 days, the average of the solar noon time PR was taken. In hierarchical clustering, the Pearson correlation coefficient was used as the distance metric. It is a measure of how similar the shapes of two time series are, in other words, how similarly two modules' performances vary with time. As with power time series clustering, the average linkage criterion was used. The clustering results can be interpreted as a group of fixed rack modules, a group of functioning trackers, two groups of malfunctioning trackers, and a group of malfunctioning modules. Fig. 4.17 shows this clearly: the first cluster on the left contains all the modules on the fixed rack.
Even at solar noon time, the performance of modules on the fixed rack is different from that of modules on trackers, because the fixed rack modules are tilted at 22.3°, which is much shallower than the tilt of the trackers at solar noon time in the winter. The second cluster from the left consists of all modules from trackers 4 and 6, and one brand from tracker 8. These three trackers were tracking correctly most of the time; however, trackers 6 and 8 stopped for a short time. Taking the mean of PR at solar noon time minimized the influence of outliers and made the difference between fixed rack and trackers more distinguishable. It is noteworthy that the "fixed rack group" is very close to the "normal tracker group", which indicates that their performances are very similar. Fig. 4.18 shows that the shapes of the curves in group 1 and group 2 are similar, though the amplitudes are different because of the difference in angles. Tracker 12 was off tracking most of the time, and tracker 14 was not in motion, so they each formed a group.

The last group on the right isolated brand M from the other modules on the same tracker. The brand M modules were replaced during the experiment because the previous models were not compatible with the Enphase M215 microinverter. During the first two months of operation errors were reported by Enlighten, so these modules were replaced on Feb. 15th. The previous clustering result did not reflect this change, yet clustering the noon time PR data with the Pearson coefficient distance method distinguished it. The distance between this cluster and the other modules on the same tracker (brand N modules) is quite large, which indicates that the performance of brand M is unlikely to be related to that of brand N.

The solar noon time mean PR clustering result using the Pearson correlation coefficient distance metric neglected diminutive differences among functioning trackers [51], differentiated the fixed rack from the trackers, and identified the malfunctioning trackers 12 and 14 and the malfunctioning module brand M.
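The behavior of a correlation-based distance can be illustrated with a short sketch. It assumes the common convention d = 1 - r, where r is the Pearson correlation coefficient; the exact form used by the clustering software in this work is not restated here, so this convention is an assumption. The function names and the example series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def pearson_distance(x, y):
    """Assumed convention: 0 for identical shapes, 2 for opposite shapes."""
    return 1.0 - pearson_r(x, y)


# Hypothetical daily noon PR values: same shape, different amplitude.
fixed_rack = [0.70, 0.75, 0.72, 0.78]
tracker = [0.60, 0.65, 0.62, 0.68]
print(pearson_distance(fixed_rack, tracker))
```

Under this convention, two modules whose PR curves rise and fall together are close even when their PR levels differ, which matches the observation that groups with similar curve shapes but different amplitudes sit near each other in the dendrogram.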
6 Conclusions

Previous work on time series analysis (TSA) of a photovoltaic (PV) system's real-world performance focused on determining precise and accurate degradation rates (Rd) by using highly filtered data and neglecting seasonality. This research provides a higher fidelity data analytics approach to TSA. With a case study of the first six months of real-world performance of 60 PV modules and the accompanying climate conditions, a data analytic procedure was developed, which includes the following parts.

First, raw data was validated by characterization of the measurement apparatus. The impact on AC power data of microinverters working in burst mode and of microinverter efficiency was evaluated. Irradiance data was converted from global horizontal irradiance (GHI) to plane of array (POA) irradiance, and the systematic error thereby introduced was discussed. Secondly, using the redundancy of measurements, snow covered irradiance sensor data was detected by visually cross checking the daily irradiance profiles and then eliminated from the data set. Thirdly, data alignment was accomplished using a cross-correlation function in R to minimize the time lag between two time series.

Exploratory data analysis (EDA) on the integrated data indicates that the total energy harvest of each PV system varies widely. A clustering analysis on power time series data found that PV systems located on the same mounting system performed similarly, and that the behavior of the tracking systems was not identical. Data was assembled based on IEC 61724, and performance ratio (PR) was used to evaluate the PV systems' performance. The PR of the 60 modules was subsampled to solar noon time, and snow covered days were eliminated. The solar noon time performance ratio clustering neglected diminutive differences between sites, strongly differentiated the fixed rack from the trackers, and identified the malfunctioning trackers 12 and 14 and the malfunctioning module brand M.
This work leads to improvements to the SunFarm metrology platform and suggests that redundant measurements are necessary. The clustering results provide guidance for future data modeling.

7 Future research

7.1 Improved SunFarm data quality and redundancy

Since the conversion of GHI to POA introduced systematic error, sufficient irradiance measurements are critical to improving data quality. Another pyranometer measuring global irradiance was mounted on tracker 7 in June 2013. A pyrheliometer, which measures direct irradiance, was also mounted on tracker 7 at the same time. Thus, direct measurements of POA irradiance and direct normal irradiance (DNI) are now available. By taking the ratio of DNI to POA irradiance, it is possible to determine the clearness (or cloudiness) of the sky. Redundant irradiance measurements will also be made to cross-check the sensors' accuracy. On the Energy CRADLE user interface, a sensor cross-check page will enable real-time on-site monitoring.

7.2 Predictive model

The Global SunFarm Network has already come online, so a sufficient amount of PV performance data from different climatic conditions will be available for the next step. In the next phase of study, it would be interesting to build a mixed effect model [52] of a PV module's power output as a function of multiple climate stresses. A mixed effect model is a statistical model that contains both fixed effects, whose explanatory variables are treated as non-random quantities, and random effects, which are treated as arising from random variation among groups. Furthermore, by correlating to indoor test results, we hope to be able to predict a PV module's degradation under climatic stresses. This kind of data can direct the improvement of PV module qualification testing, which will eventually lead to improved lifetime for PV modules.
Appendix A. List of 24 manufacturers and nameplate power

No.  Manufacturer   Nameplate power (W)
1    AUO            240
2    Astronergy     235
3    Bosch          225
4    CSI            220
5    Conergy        220
6    ET Solar       235
7    EcoSolargy     230
8    Helios         240
9    Hyundai        230
10   Kyocera        240
11   LG             220
12   MX Solar       230
13   Mage           230
14   Perlight       240
15   REC            230
16   Sanyo          220
17   Schott         230
18   Schuco         240
19   Sharp          235
20   Siliken        220
21   Solar World    230
22   Trina          230
23   UpSolar        240
24   Yingli         230

Appendix B. SunFarm network

1 SDLE SunFarm design & characteristics

1.1 Overview

The SDLE SunFarm was established for outdoor testing of long lived materials, components, and systems [53, 54]. It is a highly instrumented outdoor test facility of a kind not commonly found in academic research. It is located on the CWRU west campus and is about one acre in size. There are 16 electrical sites on the SunFarm, including 14 high precision dual-axis trackers and two sites of fixed tilt racking, shown in Fig. B.1. A total of 148 full-sized crystalline silicon modules, bought on the open market from 24 different manufacturers in sets of six or eight, were exposed on both trackers and fixed racking. On the trackers, 8000 PV material samples will be exposed under 1X, 2X, 4X, and 5X suns illumination with front surface mirror concentrators.

1.2 Samples

Samples being exposed on the SDLE SunFarm are divided into two major groups: PV modules and PV material sample coupons.

Full-sized crystalline PV modules. In order to better understand power degradation mechanisms and determine power degradation rates (Rd), 148 full-sized crystalline silicon modules from 24 different manufacturers around the world are being exposed on the SunFarm to investigate their performance under real-world working conditions. The majority of the population are polycrystalline silicon modules; only two brands are monocrystalline silicon modules. The 24 manufacturers and their nameplate powers are listed in Appendix A.
The 60 crystalline PV module samples studied in this thesis are part of this total population. These 60 modules are from 20 different manufacturers, with 3 samples from each brand.

Figure B.1. The top half shows the blueprint of the SDLE SunFarm and the distribution of the 16 electrical sites. The bottom half shows an operating tracker and the electrical cabinets behind the trackers. On the tracker frame shown in the figure, the top half holds six PV modules mounted horizontally, and the bottom half holds 48 sample trays mounted in 12 by 4 rows.

Figure B.2. A mechanical drawing of a sample tray, on the left. A 3D drawing of a 5X front surface mirror concentrator, on the right.

PV material samples. In order to better understand PV module degradation mechanisms, we need to know how each component of a PV module degrades over time. PV material and component samples are made and exposed on the SDLE SunFarm. Backsheet samples, front sheet samples, and transparent conducting oxide (TCO) samples are cut into 1 × 1.5 inch coupons and held in sample trays (Fig. B.2). With the use of front surface concentrators (Fig. B.2), PV material samples can be exposed at 1X, 2X, 4X, and 5X sunlight intensities under real-world climate conditions. Material samples will be taken off periodically for optical characterization measurements.

1.3 PV mounting system

Two different PV mounting systems are being used on the SunFarm. Fourteen dual-axis trackers, of a kind commonly used for high-concentration photovoltaics (HCPV), can keep the module plane normal to the sunlight. Two sites are fixed tilt racks, which are commonly used for rooftop PV installations or utility power plants. The fixed racks face south and are tilted at 22.3°.

Fixed rack. Power produced from a PV array is proportional to the direct sunlight it receives. Typically, fixed PV arrays are tilted at an angle equal to the latitude of the array's location, which points the array at the average noon position of the sun throughout the year.
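The effect of the tilt angle on the noon incidence angle can be estimated with a short sketch. It relies on two assumptions that are not part of this work: Cooper's approximation for the solar declination, and a south-facing plane at a northern mid-latitude site, for which the noon incidence angle is |latitude - declination - tilt|. The function names are hypothetical.

```python
import math


def solar_declination_deg(day_of_year):
    """Approximate solar declination in degrees (Cooper's formula, assumed)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def noon_elevation_deg(latitude_deg, day_of_year):
    """Sun elevation at solar noon for a northern mid-latitude site."""
    return 90.0 - latitude_deg + solar_declination_deg(day_of_year)


def noon_incidence_deg(latitude_deg, tilt_deg, day_of_year):
    """Angle between the noon sun and the normal of a south-facing tilted plane."""
    return abs(latitude_deg - solar_declination_deg(day_of_year) - tilt_deg)


LATITUDE = 41.5   # degrees, SDLE SunFarm
RACK_TILT = 22.3  # degrees, fixed rack

for label, day in (("December 12", 347), ("May 13", 133)):
    angle = noon_incidence_deg(LATITUDE, RACK_TILT, day)
    print(label, round(angle, 1), "degrees from the rack normal")
```

Under these assumptions the noon sun is roughly 1° from the rack normal in mid-May and roughly 42° from it in mid-December, while a rack tilted at the latitude angle would face the noon sun directly at the equinoxes.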
Here in Cleveland, since we usually have more cloudy days in winter, the fixed rack on the SDLE SunFarm was installed at a shallower angle: the latitude of the SDLE SunFarm is 41.5°, and the tilt angle of the fixed racking is 22.3°. The 30 meters of fixed tilted rack are divided into two identical electrical sites. Eighteen PV modules are exposed on each site.

Trackers. Dual-axis solar trackers orient PV modules normal to direct sunlight at all times. They are often seen in concentrated photovoltaic (CPV) applications, especially HCPV systems, where they enable the optical components of the concentration system to function. In flat-panel PV applications, trackers can maximize the performance of PV modules by minimizing the incidence angle of sunlight on the module plane. The 14 dual-axis trackers on the SDLE SunFarm were manufactured by Feina Tracker, Spain. Each tracker consists of three parts: foundation pole, tracker head, and tracker panel, shown in Fig. B.3. The tracker head is 10 feet off the ground and is driven by two DC motors in both horizontal and azimuth directions. The motion of the motors is controlled by a Tracker Control Unit (TCU) inside the 4 to 5 inch electrical box behind each tracker. Tracker panels are 16' 4" (5 m) wide × 13' 1" (4 m) long, shown in Fig. B.4, and can hold up to 12 PV modules in landscape mode. To enhance the capability of testing various modules, components, and materials, ten flexible unistruts were placed on the tracker panel to fasten modules and sample trays to the tracker.

Figure B.3. The photo was taken when the SunFarm was under construction. The relative position of the fixed rack and the trackers is shown in the image; the fixed rack is at the front of the SunFarm to avoid shading. The relative positions of the tracker head, the tracker foundation, and the two electrical cabinets behind each tracker are shown on the right hand side.

Figure B.4. A mechanical drawing of the tracker frame is shown on the left hand side. The distance between horizontal unistruts is 0.5 m in order to mount sample trays. On the right is a drawing of a tracker frame fully loaded with 12 full size PV modules.

1.4 SunFarm electrical design

Two sites of fixed tilted rack plus 14 dual-axis trackers form the 16 electrical sites of the SDLE SunFarm. Two electrical cabinets behind each site separate the power devices from the datalogger system (Fig. B.3). 110 V AC is connected to the power cabinets to power the tracking system as well as the Ethernet switches. In the data cabinet, a data logging system consisting of a Campbell CR1000 datalogger, a multiplexer, and a battery monitors the sensor readings. The SDLE SunFarm has 122 individual power plants, including 120 individual PV modules connected to microinverters and 2 strings of 8 PV modules connected to string inverters. These 122 power plants, which can generate about 32 kW of electricity at peak, are all tied to the grid through a reversing relay.

1.5 Metrology platform

Power data. A metrology platform was built for SDLE SunFarm data monitoring, including power, insolation, and weather monitoring. For power monitoring, either inverters or I-V curve tracers were used. Two trackers with eight full-sized modules on each used Solectria PVI1800 string inverters. Another 10 trackers as well as the two tilt rack sites used two brands of micro-inverters, Enphase and Power-One. On the other two trackers, a Daystar multi-tracer is used to take I-V curves of full-sized modules and mini-modules at one minute time intervals. A portable I-V curve tracer was used on clear days to take I-V curves on demand.

Insolation data. Redundant insolation sensors were placed around the SDLE SunFarm in order to get accurate irradiance data and align the trackers. Four Kipp & Zonen pyranometers of three different models (CMP6, CMP11, CMP21) were placed on the horizontal, tilt rack, and tracker planes. A Kipp & Zonen pyrheliometer (CHP1) was used to measure direct illumination.
Multiple split-cell reference cells, Li-cor Li-200 pyranometers, and Apogee SP-212 full spectra irradiance sensors were placed in the tracker plane to help align the tracker frames' orientations. Another four Apogee SP-212 full spectra irradiance sensors and Apogee SU-100 UV sensors were mounted on the sample trays to measure the concentrated solar irradiance.

Climate data. Two Vaisala WXD520 weather stations were placed on the SunFarm to record wind speed, wind direction, rainfall, rain intensity, rain duration, and humidity. An anemometer was connected to the Master Control Unit of the trackers to monitor the wind load on the trackers. A snow cup was used to measure the precipitation. T-type thermocouples were used for backsheet temperature monitoring.

Data acquisition system. The data acquisition system consists of 17 networked Campbell Scientific CR-1000 dataloggers, with each datalogger connected to an AM 16-32 multiplexer, extending the capacity of the datalogger to 32 differential measurement channels. The Campbell dataloggers monitor thermocouple and sensor outputs. Enphase micro-inverters use an Envoy unit to collect data from each individual micro-inverter. Similarly, Solectria string inverters use the Solenview system to collect data. Minute by minute data can be downloaded from their web servers.

1.6 SunFarm Network

Cleveland's climate, humid continental, is not typical for PV degradation research. In order to study PV modules' performance under different climatic conditions, a global SunFarm network was established among nine PV outdoor test beds across the world. These test beds include four Ohio SunFarms: SDLE SunFarm, Cleveland, Ohio; Lakeview 1 MW power plant, Cleveland, Ohio; Replex SunFarm, Mt. Vernon, Ohio; and AEP Dolan test center, Columbus, Ohio (Fig. B.5). Within the United States, we cooperate with two Q-Lab SunFarms in Arizona and Florida, which are in mid-latitude desert and humid subtropical climate areas, respectively. On an even larger scale, we established three SunFarms abroad with international collaborators: Underwriter Lab SunFarms in Taitung and Lujhu, Taiwan, and a SunFarm at the Indian Institute of Technology Gandhinagar, Ahmedabad (IITGN). These nine SunFarms span a large range of environmental conditions across the globe. Similar data collection methods were applied to each SunFarm. In order to better manipulate the Big Data that streams back daily, and to manage the sensors on each site, a data acquisition system, Energy CRADLE, was established.

Figure B.5. The upper left corner is the SDLE SunFarm, with the tilted fixed rack in the front and dual-axis trackers in the back. The bottom right is the Replex SunFarm at Replex Plastics, Mount Vernon, with fixed rack, single-axis tracker, and dual-axis trackers. The bottom left is the AEP SunFarm at Dolan Technology Center. Mirror Augmented PV (MAPV) systems are on the bottom half of the tilt racks, and flat back surface mirrors were mounted tilted towards the modules. The top half of the tilt rack has a non-augmented PV system.

2 Energy CRADLE SunFarm informatics

The purpose of the Energy Common Research Analytics and Data Lifecycle Environment (Energy CRADLE) is to create for engineering, and in particular lifetime science, the tools and protocols necessary to transform Big Data into information, which informs scientific knowledge to guide further analysis [30, 38, 39, 40]. Energy CRADLE is tightly focused on serving the needs of handling and sharing data among the SunFarm network researchers. Raw data collected from the SunFarms goes through data pre-processing and semantic annotation and is stored in a NO-SQL Hadoop system. With domain knowledge, Energy CRADLE can manage the organization and orchestration of the data, making querying of the data more efficient. The Energy CRADLE data integration environment has two features, shown in Fig. B.6.
First, it can push all the raw data collected from the SunFarms onto a Hadoop Distributed File System (HDFS) and then map the data to HBase, a distributed database. Second, through Thrift and REST servers, users can interact with the data stored in HBase via a visual front end.

Figure B.6. Architecture of NO-SQL Hadoop system.

The front end of Energy CRADLE (Fig. B.7) consists of four web pages, named according to their functions: Data Inquiry, Equipment Registration, Maintenance, and Metrology Check. On the Data Inquiry page, data can be queried by location, system ID, local time, or local solar time. Invalid data points and NAs are filtered out. The Equipment Registration page was built for managing sensors and samples. Location, serial number, and calibration coefficients can be registered remotely from any one of the SunFarms.

Because the top of the tracker is more than 30 feet above the ground when it is operating, maintenance (e.g., changing samples) has to be done when the tracker is in birdbath mode. Whenever maintenance is needed, the operator can go to the Maintenance page to specify the location and duration. Data collected during maintenance are flagged in the database to warn that the tracker was not in normal operation mode. Redundant insolation and weather sensors are cross-checked against one another to verify that the sensors are working correctly. The Metrology Check page can plot the same variable as collected by multiple sensors side by side, which makes the cross-checking easy.

Figure B.7. Architecture of Energy CRADLE’s user front end.

Complete References

[1] Kurt Hornik. The R FAQ, 2013.
[2] Gaëtan Masson, Marie Latour, Manoël Rekinger, Ioannis-Thomas Theologitis, and Myrto Papoutsi. Global market outlook for photovoltaics. European Photovoltaic Industry Association annual report, 2013.
[3] Sandra Enkhardt. Germany sets new PV installation record in 2012, January 2013.
[4] Becky Beetz. China 2012: 5 GW of PV installations predicted, October 2012.
[5] Stephen Lacey. A solar system is installed in the US every 4 minutes, August 2013.
[6] J. Hemminger, G. Crabtree, and A. Malozemoff. Science for energy technology: Strengthening the link between basic research and industry. A report from the Basic Energy Sciences Advisory Committee, US Department of Energy, 2010.
[7] John Hemminger. From quanta to the continuum: Opportunities for mesoscale science. Technical report, A report from the Basic Energy Sciences Advisory Committee, 2012.
[8] Andrew L Rosenthal and Cary G Lane. Field test results for the 6 MW Carrizo solar photovoltaic power plant. Solar Cells, 30(1):563–571, 1991.
[9] John H Wohlgemuth, Daniel W Cunningham, Paul Monus, Jay Miller, and Andy Nguyen. Long term reliability of photovoltaic modules. In Photovoltaic Energy Conversion, Conference Record of the 2006 IEEE 4th World Conference on, volume 2, pages 2050–2053. IEEE, 2006.
[10] Dirk C Jordan and Sarah R Kurtz. Photovoltaic degradation rate - an analytical review. Progress in Photovoltaics: Research and Applications, 21(1):12–29, 2013.
[11] M.P. Murray, L.S. Bruckman, and R.H. French. Durability of acrylic: Stress and response characterization of materials for photovoltaics. In Energytech, 2012 IEEE, pages 1–6, May 2012.
[12] Myles P. Murray, Laura S. Bruckman, and Roger H. French. Photodegradation in a stress and response framework: Poly(methyl methacrylate) for solar mirrors and lens. Journal of Photonics for Energy, 2(1):022004, 2012.
[13] D. C. Jordan and S. R. Kurtz. The dark horse of evaluating long-term field performance - data filtering. In PV Module Reliability Workshop, February 26–27, 2013, Golden, Colorado, 2012.
[14] Ryan M Smith and National Renewable Energy Laboratory (U.S.). Outdoor PV Module Degradation of Current-Voltage Parameters: Preprint. Number 5200-53713 in NREL/CP. National Renewable Energy Laboratory, Golden, CO, 2012.
[15] Radhika Lad, John Wohlgemuth, and G TamizhMani. Outdoor energy ratings and spectral effects of photovoltaic modules. In Photovoltaic Specialists Conference (PVSC), 2010 35th IEEE, pages 002827–002832. IEEE, 2010.
[16] D Jordan and S Kurtz. Photovoltaic degradation risk. In World Renewable Energy Forum, Colorado, 2012.
[17] A. Kimber, T. Dierauf, L. Mitchell, C. Whitaker, T. Townsend, J. NewMiller, D. King, J. Granata, K. Emery, and C. Osterwald. Improved test method to verify the power rating of a photovoltaic (PV) project. In Photovoltaic Specialists Conference (PVSC), 2009 34th IEEE, pages 000316–000321, 2009.
[18] IEC 60891 ed2.0 - Photovoltaic devices - Procedures for temperature and irradiance corrections to measured I-V characteristics. IEC Webstore.
[19] Sanford Weisberg. Applied linear regression, volume 528. John Wiley & Sons, 2005.
[20] John Fox and Sanford Weisberg. An R companion to applied regression. Sage, 2011.
[21] D. C. Jordan and S. R. Kurtz. Data filtering impact on PV degradation rates and uncertainty (poster). In PV Module Reliability Workshop, 28 February - 2 March 2012, Golden, Colorado, 2012.
[22] Nils H Reich, Alexander Goebel, Daniela Dirnberger, and Klaus Kiefer. System performance analysis and estimation of degradation rates based on 500 years of monitoring data. In Photovoltaic Specialists Conference (PVSC), 2012 38th IEEE, pages 001551–001555. IEEE, 2012.
[23] Klaus Kiefer and Daniela Dirnberger. A degradation analysis of PV power plants. In 25th EUPVSEC, 2012, Valencia, Spain, pages 005032–005037. EUPVSEC, 2010.
[24] Matthew J. Reno and Joshua Stein. Using cloud classification to model solar variability. Technical report, Sandia National Laboratories, 2013.
[25] Zach Campeau, Charlie Hasselbrink, Mike Anderson, Zoe Defreitas, Mark Mikofski, and Yu-Chen Shen. Validation of the PVLife model using 3 million module-years of live site data. In 39th IEEE Photovoltaic Specialists Conference, 2013.
[26] J. Ye, T. Reindl, and J. Luther. Seasonal variation of PV module performance in tropical regions. In Conference Record of the IEEE Photovoltaic Specialists Conference, pages 2406–2410, 2012.
[27] F.A. Mejia and J. Kleissl. Soiling losses for solar photovoltaic systems in California. Solar Energy, 95:357–363, 2013.
[28] IEC 61724 ed1.0 - Photovoltaic system performance monitoring - Guidelines for measurement, data exchange and analysis. IEC Webstore.
[29] Werner Horn, Silvia Miksch, Gerhilde Egghart, Christian Popow, and Franz Paky. Effective data validation of high-frequency data: time-point-, time-interval-, and trend-based methods. Computers in Biology and Medicine, 27(5):389–409, 1997.
[30] G.Q. Zhang, T. Siegler, P. Saxman, N. Sandberg, R. Mueller, N. Johnson, D. Hunscher, and S. Arabandi. Visage: A query interface for clinical research. In Proceedings of the 2010 AMIA Clinical Research Informatics Summit, San Francisco, March 12–13, 2010, pages 76–80, March 2010.
[31] German Puebla, Francisco Bueno, and Manuel Hermenegildo. A generic preprocessor for program validation and debugging. In Analysis and Visualization Tools for Constraint Programming, pages 63–107. Springer, 2000.
[32] John W Tukey. Exploratory data analysis. Reading, Ma, 231, 1977.
[33] Matthew B Miles and A Michael Huberman. Qualitative data analysis: An expanded sourcebook. Sage, 1994.
[34] Michael R Anderberg. Cluster analysis for applications. Technical report, DTIC Document, 1973.
[35] Rui Xu and Don Wunsch. Clustering, volume 10. Wiley, 2008.
[36] Wikipedia. Hierarchical clustering, June 2013.
[37] James MacQueen et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volume 1, page 14. California, USA, 1967.
[38] G.Q. Zhang, S. Arabandi, and S. Redline. Physio-MIMI lessons learned. Technical report, National Center for Research Resources (NCRR), 2011.
[39] Physio-MIMI homepage, 2012.
[40] R. Mueller, S. Sahoo, X. Dong, S. Redline, S. Arabandi, L. Luo, and G.Q. Zhang. Mapping multi-institution data sources to domain ontology for data federation: The Physio-MIMI approach. In AMIA Clinical Research Informatics Summit, March 2011.
[41] Enphase. “Burst-mode” makes Enphase micro-inverter systems the smarter choice. Enphase Whitepaper Series, 2010.
[42] Kiri Wagstaff, Claire Cardie, Seth Rogers, and Stefan Schrödl. Constrained k-means clustering with background knowledge. In ICML, volume 1, pages 577–584, 2001.
[43] Heinrich Häberlin. Normalized representation of energy and power of PV systems. Photovoltaics: System Design and Practice, pages 487–506.
[44] Richard Perez, Pierre Ineichen, Robert Seals, Joseph Michalsky, and Ronald Stewart. Modeling daylight availability and irradiance components from direct and global irradiance. Solar Energy, 44(5):271–289, 1990.
[45] PVEducation.org. Solar time, June 2009.
[46] PVEducation.org. Weather data, June 2013.
[47] Kevin R. Coombes. oompaBase: Class unions and matrix operations for OOMPA, 2013. R package version 3.0.1.
[48] David Briggs, Dave Williams, Preston Steele, and Tefford Reed. Bigger is better: Sizing solar modules for microinverters. Enphase Whitepaper Series, 2010.
[49] Meinard Müller. Dynamic time warping. Information Retrieval for Music and Motion, pages 69–84, 2007.
[50] Charles J Krebs et al. Ecological methodology, volume 620. Benjamin/Cummings Menlo Park, California, 1999.
[51] Karl Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58(347-352):240–242, 1895.
[52] Alex J Sutton, Keith R Abrams, David R Jones, Trevor A Sheldon, and Fujian Song. Methods for meta-analysis in medical research. J. Wiley, 2000.
[53] Yang Hu, Mohammad A Hossain, Tarun Jain, Yashwanth R Gunapati, Lauren Elkin, GQ Zhang, and Roger H French. Global SunFarm data acquisition network, Energy CRADLE, and time series analysis. In Energytech, 2013 IEEE, pages 1–5. IEEE, 2013.
[54] Yang Hu, Dave Hollingshead, Mohammad A Hossain, Mark Schuetz, and Roger French. Comparison of multi-crystalline silicon PV modules’ performance under augmented solar irradiation. In MRS Proceedings, volume 1493, pages mrsf12–1493. Cambridge Univ Press, 2013.