PV MODULE PERFORMANCE UNDER REAL-WORLD TEST CONDITIONS–A DATA ANALYTICS APPROACH

by
YANG HU
Submitted in partial fulfillment of the requirements
For the degree of Master of Science
Thesis Adviser: Prof. Roger H. French
Department of Materials Science and Engineering
CASE WESTERN RESERVE UNIVERSITY
May, 2014
Case Western Reserve University
Case School of Graduate Studies
We hereby approve the thesis* of
YANG HU
for the degree of
Master of Science

Prof. Roger French, Committee Chair, Adviser (11/21/2013)
Prof. David Matthiesen, Committee Member (11/21/2013)
Prof. Jennifer Carter, Committee Member (11/21/2013)
Prof. Jiayang Sun, Committee Member (11/21/2013)
Dr. Timothy Peshek, Committee Member (11/21/2013)
Dr. Yifan Xu, Committee Member (11/21/2013)

* We certify that written approval has been obtained for any proprietary material contained therein.
Dedicated to science
and the pursuit of progress.
Table of Contents
List of Figures
Acknowledgements
Abstract
Chapter 1. Introduction
    Lifetime and degradation science approach
    Thesis overview
Chapter 2. Background and literature review
    Previous research on real world PV modules’ performance
    Standards
    Data science
    Clustering analysis
Chapter 3. Real-world Data Acquisition
    SDLE SunFarm design
    Global SunFarm network and Energy CRADLE
Chapter 4. Results: Real-world data analytics
    Overview
    Raw data validation
    Exploratory Data Analysis (EDA) on Integrated Data
    Clustering of AC Power Data
    Data Assembly
    Sub-sampling
    Clustering of Solar Noon Time Performance Ratio Data
Chapter 5. Discussion
    Data analytics
    Performance at different relative positions
    Performance of different brands
    Power time series data clustering
    Solar noon time performance ratio clustering
Chapter 6. Conclusions
Chapter 7. Future research
    Improved SunFarm data quality and redundancy
    Predictive model
Appendix A. List of 24 manufacturers and nameplate power
Appendix B. SunFarm network
    SDLE SunFarm design & characteristics
    Energy CRADLE SunFarm informatics
Appendix. Complete References
List of Figures
2.1 Pie chart of method used to determine Rd
2.2 PR subsetting
4.1 60 PV modules distribution
4.2 Baseline result of 20 brands
4.3 Power time series plot
4.4 Microinverter’s efficiency
4.5 Power curve comparison
4.6 Total power production of 20 brands
4.7 Normalized power production of 20 brands
4.8 Hierarchical cluster 1
4.9 Total within cluster sum of square 1
4.10 Power time series plot with clustering result
4.11 Normalized performance metrics
4.12 Noontime PR versus yI
4.13 PR in different climate condition
4.14 Pairs plot of PR 1
4.15 Pairs plot of PR 2
4.16 Total within cluster sum of square 2
4.17 Hierarchical cluster 2
4.18 PR time series plot with clustering result
5.1 Sensor cross check
5.2 Averaged performance ratio of 15 min around solar noon time
5.3 Comparison of normalized AC power in winter
5.4 Comparison of normalized AC power in summer
B.1 An overview of SDLE SunFarm
B.2 Sample tray and concentrator
B.3 Dual axis tracker
B.4 Tracker frame
B.5 SunFarms within Ohio
B.6 Architecture of NO-SQL Hadoop system
B.7 Architecture of Energy CRADLE’s user front end
Acknowledgements
I would like to express my deepest gratitude for the patience, diligence, and resourcefulness of the entire team of researchers in the Solar Durability and Lifetime Extension (SDLE) center at Case Western Reserve University, Department of Materials Science and Engineering, headed by Prof. Roger H. French. Particular thanks go to Dr. Timothy Peshek and Mohammad A. Hossain, who helped build and maintain the SDLE SunFarm data acquisition system.
Thanks also go to the researchers at the Center for Statistical Research, Computing and Collaboration (SR2C), Department of Epidemiology & Biostatistics, for their coordinated efforts. The guidance of Prof. Jiayang Sun and Dr. Yifan Xu in statistics and data science was instrumental in this work.
The assistance and technical support of researchers in the Medical Informatics Division of EECS, especially Prof. G.Q. Zhang and his group members Yashwanth Reddy Gunapati and Tarun Jian, were extremely valuable in completing the data collection and Energy CRADLE part of this work.
I would also like to acknowledge the funding for this work. The SDLE center was established with funding from the Ohio Third Frontier, Wright Project Program Award Tech 12-004. The PV module case study was supported by the Bay Area Photovoltaic Consortium Prime Award No. DE-EE0004946, Subaward Agreement No. 60220829-51077-T.
Finally, I would like to certify that there is no proprietary material in this thesis.
Abstract
In pursuit of a higher fidelity understanding of the long-term degradation of long-lived technologies, such as photovoltaic (PV) systems, the framework of Lifetime and Degradation Science (L&DS) goes beyond initial qualification tests and investigates the underpinning mechanisms of degradation. L&DS concerns itself with the complex and multivariate signatures of the degradation process and with uncovering the fundamental physical mechanisms contributing to that degradation. In the case of PV modules, this effort requires extensive continuous monitoring of the modules’ power production and climatic conditions. The responses of PV modules to the stressors of the real world are cross-correlated to the simulated and accelerated stressors placed on devices in a laboratory setting.
A unique, highly instrumented outdoor test facility for PV materials, components, and systems, the Solar Durability and Lifetime Extension (SDLE) center’s SunFarm, was built for the purpose of better understanding the power degradation mechanisms of PV modules and materials. The SDLE SunFarm provides an apparatus for the collection of real-world time series data consisting of output power, weather, and insolation metrology. The SunFarm comprises 122 individual PV power plants, including 120 module-level plants and two string-level plants of eight modules each. Output power is monitored through appropriate grid-tied inverters.
The metrology package developed at CWRU for the collection of time series data provides a model to be implemented at external sites around the globe. In order to expand the ability to monitor PV systems’ performance under different climatic conditions, a global SunFarm Network was implemented among nine outdoor test facilities around the world in collaboration with academic institutions and industrial partners, including commercial power plants.
This thesis provides the initial data analytics on the first six months of data from 60 PV modules on the SDLE SunFarm, and serves as a model for the analytics of the full dataset from the global SunFarm Network. The data were first validated by characterization of the measurement apparatus, redundancy of measurement, and time-slewing according to minimization of the time cross-correlation function, using the free and open-source statistical software language and packages known as “R”. Clustering analysis in R (v3.0.1) [1], based upon the unfiltered AC power time series, showed that the data fell into six clusters, which represented the six different electrical sites of the SDLE SunFarm.
The data were intelligently assembled and subsampled around solar noon time. The PV performance ratio (PR), which is a measure of a PV module’s output at a given incident power from sunlight, was used as an indicator of the modules’ working effectiveness. Correlations among the filtered subset of solar noon time PR data were discerned with hierarchical clustering analysis. K-means clustering was used to confirm the optimum number of clusters for the analytics. The clustering results differentiated modules on different physical sites and pointed out malfunctions of the PV mounting system and the incapacity of certain module brands. These results are useful for correlating different modules’ responses to stressors and those stressors’ effects on overall performance.
1 Introduction
Solar energy is becoming a more mature and mainstream source of electricity, and the photovoltaic (PV) industry has experienced remarkable growth over the past decade. Worldwide, PV exceeded the 100 GW installed capacity mark in 2012 [2]. Germany led installations in 2012 with 7.6 GW, followed by China with between 3.5 and 4.5 GW [3,4]. In the US, 3.2 GW were installed during 2012, the fourth most in the world [2]. A solar project will be installed, on average, every four minutes in the US [5]. By the end of 2013, over 100,000 individual solar systems will be installed, exceeding 4.4 GW in capacity. In the academic world, although much PV research still focuses on gaining higher efficiencies and introducing new technologies, interest in lifetime and degradation has risen. At the 2010 Department of Energy Science for Energy Technology workshop [6], the topic of PV lifetime and degradation science (L&DS) was made a research priority, and its importance was reconfirmed in the Mesoscale Science Report [7]. A quantification of power decline over time, known as the degradation rate (Rd), is as important as initial performance. For investigators and PV power plant owners in particular, degradation rates essentially determine the lifetime of a PV system. A well-known disaster in the PV industry was Carrizo Plains, once the largest PV power plant in the world [8]. The installation failed after four years of operation because it exhibited a power degradation rate of 10% per year. Commercial PV panels claim a degradation rate lower than 1% per year and usually come with a 25-year manufacturer warranty [9]. However, recent research, sampling over 2000 degradation rates reported around the world, suggests that some PV systems exhibit a power degradation rate (Rd) higher than 1%. Additionally, the study observes that Rd is highly dependent on the operating environment [10].
1.1 Lifetime and degradation science approach
In order to predict the performance and lifetime of PV modules, a better understanding of degradation mechanisms and the influence of climate conditions is necessary. A performance and lifetime prediction (PLP) tool based on a reliability physics and prognostics approach was proposed, which requires indoor accelerated studies of PV materials, components, and systems, as well as real-world degradation and time series analysis of PV modules [11,12].
Real-world testing plays a critical role in researching degradation mechanisms. Firstly, the real world is the typical operating environment for PV systems [13]. A real-world environment is a unique combination of stressors that no indoor testing chamber is able to duplicate. Stressors in the real world include, but are not limited to, solar irradiance, rain, snow, salt fog, and soiling. Isolating the influence of a single stressor or of several stressors requires precise and redundant monitoring of climate conditions. Secondly, outdoor testing is the only way to correlate indoor accelerated testing to real-world performance. By developing metrics, metrology, and tools to quantify, compare, and cross-correlate the response of PV modules and components to a variety of stressors for both accelerated and outdoor testing, it is possible to link observed responses to particular stressors and determine quantitative rates of degradation.
1.2 Thesis overview
1.2.1 Background and literature review
A literature review is provided of previous research on the performance of PV modules and PV power plants under real-world operating conditions and of the data filtering methods applied. Two IEC standards, which were used for data monitoring and data cleaning in this study, are also reviewed. Finally, some background information on data science is provided.
1.2.2 Real-world Data Acquisition
SDLE SunFarm’s design and the data acquisition methods applied to the case study of 60 PV modules on SDLE SunFarm are explained.
1.2.3 Results: Real-world data analytics
Descriptive data analysis and data clustering results are presented in this section.
1.2.4 Discussion
This section discusses the data analytic procedures for outdoor test data, compares module performance under different climate conditions and at different positions relative to sunlight, and compares the initial indoor performance with the outdoor performance of 20 brands.
1.2.5 Conclusions
Conclusions drawn from the data analysis are presented in this section.
1.2.6 Future research
An improved study protocol and a predictive model are planned for future research.
1.2.7 Appendix
A list of the 24 PV models studied in this thesis, the SunFarm design and characteristics, and the full references are presented in the appendices.
2 Background and literature review
2.1 Previous research on real world PV modules’ performance
2.1.1 PV module degradation
PV modules’ power output is known to decline over time; this decline is quantified by the degradation rate (Rd) of a PV system. It is as important for investigators and power plant owners to know the degradation rates of PV modules as their initial efficiency. Jordan and Kurtz reviewed over 2000 degradation rate reports in 2011 [10]. All of the reported degradation rates were determined using one of the four methods introduced below.
Current-voltage (I-V) curves, which are typically taken at discrete time intervals either indoors with a solar simulator or outdoors with a portable I-V curve tracer, are used for determining Rd [14]. To take an indoor I-V curve, the PV module needs to be taken off the array, which is not convenient for PV system owners. Outdoor I-V curve tracing requires a very clear sky. Fig. 2.1 shows the methodologies used to determine Rd. The use of indoor I-V curve tracing increased after the year 2000 due to the widespread use of flash indoor solar simulators. Neither of these methods provides continuous measurements; in fact, it would take a large effort to acquire I-V curve measurements on every PV module in a real PV power plant [15]. As a result, a large portion of the Rd measurements (40 out of 58, around 70%) were determined using only two, or even one, data points, which leads to low accuracy and high uncertainty [16].
Figure 2.1. Pie chart of the number of references deploying the indicated methods to determine degradation rates prior to and following the year 2000 [10].
Using continuous power data for Rd determination can improve the accuracy [10]. The Photovoltaics for Utility Scale Applications (PVUSA) [17] and performance ratio (PR) [18] methods of Rd measurement are in the continuous data category. PVUSA is an AC rating method developed by engineers working on the PVUSA project. The PVUSA method provides an empirical relationship for the module’s AC output as a function of solar irradiance, ambient temperature, and wind speed. PR is the ratio of a module’s conversion efficiency in the field to the manufacturer-provided qualification test efficiency under standard test conditions (STC) of 25 °C, 1 kW/m², and AM 1.5 irradiance.
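As an illustration of the PVUSA idea, the sketch below fits an empirical power model by ordinary least squares. The functional form P = G·(a + b·G + c·T + d·W) is the commonly cited PVUSA rating equation and is an assumption here, since the thesis does not write it out; the function names, units (G in kW/m², T in °C, W in m/s), and synthetic data are likewise hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def pvusa_power(coeffs, g, t_amb, wind):
    """Assumed PVUSA-style model: P = G * (a + b*G + c*T + d*W)."""
    a, b, c, d = coeffs
    return g * (a + b * g + c * t_amb + d * wind)


def fit_pvusa(irr, t_amb, wind, power):
    """Least-squares fit of (a, b, c, d) via the normal equations."""
    rows = [[g, g * g, g * t, g * w] for g, t, w in zip(irr, t_amb, wind)]
    k = 4
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * p for r, p in zip(rows, power)) for i in range(k)]
    return solve_linear(ata, atb)


# Synthetic, noise-free example data (not SunFarm measurements).
true_coeffs = [0.25, -0.02, -0.001, 0.002]
irr, t_amb, wind = [], [], []
for g in (0.2, 0.4, 0.6, 0.8, 1.0):
    for t in (0.0, 10.0, 20.0, 30.0):
        for w in (0.0, 2.0, 5.0):
            irr.append(g)
            t_amb.append(t)
            wind.append(w)
power = [pvusa_power(true_coeffs, g, t, w) for g, t, w in zip(irr, t_amb, wind)]
fitted = fit_pvusa(irr, t_amb, wind, power)
```

With real data the fit would normally be restricted to high-irradiance records, and the fitted model evaluated at a fixed set of reference conditions so that ratings from different months can be compared.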
In both methods, the degradation rate is determined from the trend of the continuous data using time series analysis [19,20]. Both methods display strong seasonality that can affect reported rates and increase uncertainties. In practice, data filtering, the process of preferentially choosing data subsets such as sunny-day data only, can reduce the noise in the data [21]. However, data filtering usually eliminates or disregards the impact of different climate conditions on modules’ Rd.
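A minimal sketch of extracting a degradation rate from a continuous series is given below. It uses a plain ordinary-least-squares line as a stand-in for the time series methods in the cited work, and expresses Rd as the fitted slope relative to the fitted initial value, in percent per year. The function names and the synthetic PR series are assumptions made for illustration only.

```python
def linear_trend(t_years, values):
    """Ordinary least-squares slope and intercept of values against time."""
    n = len(t_years)
    t_mean = sum(t_years) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in t_years)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(t_years, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return slope, intercept


def degradation_rate_percent_per_year(t_years, metric):
    """Rd as slope relative to the fitted value at t = 0; negative means decline."""
    slope, intercept = linear_trend(t_years, metric)
    return 100.0 * slope / intercept


# Synthetic monthly PR values over three years, declining by 0.008 per year.
t = [month / 12.0 for month in range(36)]
pr = [0.80 - 0.008 * ti for ti in t]
rd = degradation_rate_percent_per_year(t, pr)
```

A straight-line fit ignores the seasonality discussed above, so on real data the series would first need to be deseasonalized or compared year over year.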
2.1.2 Performance ratio filtering
Performance ratio (PR) reflects the PV system conversion efficiency in the field compared to that under qualification STC. Previous research reported that the typical PR of PV systems is about 70%-80%. A survey conducted by Nils Reich of the Fraunhofer Institute for Solar Energy Systems suggests that the PR of newly built PV systems in Germany has increased to 90% [22]. However, these reported PR values are all filtered and averaged with a particular methodology. Reich’s study only considered plane-of-array (POA) irradiance between 800 and 1000 W/m² and temperatures in either the 35-40 °C or the 40-45 °C bin. After this first round of filtering some “outliers” still remained, so all data points deviating by more than ±5% from the median of the annual PR were discarded. Fig. 2.2 shows how annual PR was determined from the already filtered data set. There are obvious outliers exceeding 110% at the beginning of the study, and additional outliers lower than 40% during the study. This range was selected because “there is no physical reason apart from malfunctions or measurement uncertainty, why PR at selected irradiance and temperature conditions should differ that much” [23].
Figure 2.2. PR subsetting of an entire plant, keeping data within ±5% of the median of annual PR [22].
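The two-stage filtering described for Reich’s study can be sketched as follows. This is not the study’s code: the record layout (`g_poa`, `t_mod`, `pr`), the choice of module temperature for the bin, and the reading of ±5% as relative to the median are all assumptions made for illustration.

```python
from statistics import median


def window_filter(records, g_min=800.0, g_max=1000.0, t_min=35.0, t_max=40.0):
    """Keep records inside one irradiance window and one temperature bin."""
    return [
        r for r in records
        if g_min <= r["g_poa"] <= g_max and t_min <= r["t_mod"] < t_max
    ]


def median_band_filter(records, tolerance=0.05):
    """Discard records whose PR deviates from the median PR by more than the
    given relative tolerance (assumed interpretation of the +/-5% rule)."""
    if not records:
        return []
    med = median(r["pr"] for r in records)
    return [r for r in records if abs(r["pr"] - med) <= tolerance * med]


# Hypothetical records: irradiance in W/m^2, temperature in deg C, PR as a fraction.
records = [
    {"g_poa": 900.0, "t_mod": 37.0, "pr": 0.85},
    {"g_poa": 950.0, "t_mod": 38.0, "pr": 0.86},
    {"g_poa": 820.0, "t_mod": 36.0, "pr": 0.84},
    {"g_poa": 880.0, "t_mod": 39.0, "pr": 1.12},  # outlier above 110%
    {"g_poa": 910.0, "t_mod": 35.5, "pr": 0.38},  # outlier below 40%
    {"g_poa": 600.0, "t_mod": 37.0, "pr": 0.85},  # outside irradiance window
]
kept = median_band_filter(window_filter(records))
kept_pr = [r["pr"] for r in kept]
```

Because the band is centered on the median rather than the mean, a handful of extreme values does not shift the acceptance range.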
Another study, conducted by Jordan et al. from NREL, used three filtering steps [13]. POA irradiance was restricted to between 800 W/m² and 1200 W/m². Two further filters, denoted stability and outlier, were then applied. The stability filter “eliminates data points when POA changes more than 20 W/m²/min and the module temperature more than 1 °C/min”. The outlier filter “uses DC/POA to eliminate snow days, partial shading conditions. Furthermore, the data for sunny days were selected by filtering for clearness index >0.5”. The clearness index of the sky is the ratio of the measured global irradiance to the extraterrestrial beam irradiance on a similarly tilted surface [24]. After filtering, PR shows good precision, which benefits degradation rate determination. However, the filters keep only data from consistently bright, sunny days and eliminate all other weather conditions. The filtered data were also averaged to eliminate seasonality, yet weather and season have important effects.
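A hedged sketch of the irradiance window, stability filter, and clearness-index threshold described above is shown here. The thresholds come from the text; the record fields (`minute`, `g_poa`, `t_mod`, `kt`), the use of the preceding sample to estimate rates of change, and the treatment of either rate exceeding its limit as grounds for rejection are assumptions. The DC/POA outlier step is not reproduced.

```python
def stable_points(samples, g_min=800.0, g_max=1200.0,
                  max_dg=20.0, max_dt=1.0, min_kt=0.5):
    """Return samples that pass the irradiance, stability, and clearness checks.

    Rates of change are estimated against the preceding sample, so the first
    sample of a series can never be kept.
    """
    kept = []
    for prev, cur in zip(samples, samples[1:]):
        minutes = cur["minute"] - prev["minute"]
        if minutes <= 0:
            continue  # skip duplicated or out-of-order timestamps
        dg = abs(cur["g_poa"] - prev["g_poa"]) / minutes  # W/m^2 per minute
        dt = abs(cur["t_mod"] - prev["t_mod"]) / minutes  # deg C per minute
        if (g_min <= cur["g_poa"] <= g_max
                and dg <= max_dg
                and dt <= max_dt
                and cur["kt"] > min_kt):
            kept.append(cur)
    return kept


# Hypothetical minute-by-minute samples.
samples = [
    {"minute": 0, "g_poa": 900.0, "t_mod": 40.0, "kt": 0.7},
    {"minute": 1, "g_poa": 905.0, "t_mod": 40.2, "kt": 0.7},   # stable
    {"minute": 2, "g_poa": 950.0, "t_mod": 40.4, "kt": 0.7},   # irradiance jump
    {"minute": 3, "g_poa": 955.0, "t_mod": 40.5, "kt": 0.7},   # stable
    {"minute": 4, "g_poa": 960.0, "t_mod": 42.0, "kt": 0.7},   # temperature jump
    {"minute": 5, "g_poa": 700.0, "t_mod": 42.1, "kt": 0.7},   # below window
    {"minute": 6, "g_poa": 1000.0, "t_mod": 42.2, "kt": 0.4},  # cloudy sky
]
kept_minutes = [s["minute"] for s in stable_points(samples)]
```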
Recently, Hasselbrink et al. from SunPower Corporation developed a unique approach using “3 million module-years of live site data” [25]. Instead of determining the yearly degradation rate from monthly averaged PR with a moving average method, which ignores seasonality by smoothing out the variation, they used the performance index on the same day of each year to determine a degradation rate for each day of the year. The yearly degradation rate was then determined from the distribution of the 365 Rd values. This method included all climate conditions; however, isolating the influence of each climate stressor was not the focus of their study.
2.1.3 Influence of weather stressors
A PV system’s operating environment is a combination of multiple weather stressors, including temperature, humidity, radiation, and soiling. Interest has risen in investigating the influence of one or multiple stressors. Faiman, Ye et al. conducted an experiment on three different types of modules: mono c-Si, micromorph Si, and single-junction a-Si. Their performance under two distinct monsoon seasons throughout the year was modeled [26]. The results show that module efficiency is highly correlated to temperature. However, as a result of Singapore’s low altitude, module efficiency at noon time is not strongly correlated to spectral effects, which arise from changes in air mass. Another study, focused on the soiling losses of solar systems, was conducted by a group of researchers at the University of California San Diego. They qualitatively modeled the losses caused by dust accumulating on module surfaces between two days of rain [27]. The research explicitly compared the average soiling losses of modules mounted at tilt angles of 0-5, 6-19, and greater than 20 degrees. Sites with tilt angles shallower than 5° showed soiling losses five times those of the rest of the sites.
Seasonal variation, which has usually been neglected in the process of determining Rd, contains information about the influence of climate stressors on PV module performance and reliability. The research reported here aims to extract more information by performing exploratory data analysis and clustering analysis on the entire AC power time series before sub-sampling or “filtering”.
2.2 Standards
2.2.1 Photovoltaic system performance monitoring
IEC 61724 describes general guidelines for the monitoring and analysis of the electrical performance of photovoltaic systems [28].
Meteorology. For climate condition monitoring, total irradiance in the plane of array ($G_I$) shall be measured in the same plane as the PV array by calibrated reference devices or pyranometers. Ambient air temperature ($T_{am}$) shall be measured at a location that is representative of array conditions, using temperature sensors that are shielded from direct solar radiation. Wind speed ($S_W$) shall be measured at a height that is representative of array conditions.
Electrical parameters. The PV system electrical parameters, including output voltage ($V_A$), output current ($I_A$), and output power ($P_A$), represent the DC electrical characteristics. The utility grid electrical parameters include utility voltage ($V_U$), current to the utility grid ($I_{TU}$), current from the grid ($I_{FU}$), and power to the utility grid ($P_{TU}$). The standard also points out that “AC voltage and current may not need to be monitored in every situation. DC power can either be calculated in real time as the product of sampled voltage and current quantities or measured directly using a power sensor. If DC power is calculated, the voltage and current quantities shall be sampled not averaged.” This explains why the microinverter used in this study provides instantaneous DC voltage and DC current and averaged AC power.
System performance indices. System performance indices are part of derived parameters that relate to system energy balance and performance calculated from the recorded
monitoring data. Performance indices normalize system performance, which makes PV
systems of different configurations and at different locations comparable. These indices
include yield, losses and efficiencies. Yields are energy quantities normalized to rated
array power. System efficiencies are normalized to array area. Losses are the differences
between yields.
Daily mean yields. a) The array yield $Y_A$ is the daily array energy output per kW of installed PV array:
$$Y_A = E_{A,d}/P_0 = \tau_r \times \Big(\sum_{day} P_A\Big)/P_0 \tag{2.1}$$
where $\tau_r$ is the recording interval. This yield represents “the number of hours per day that the array would need to operate at its rated output power, $P_0$, to contribute the same daily array energy to the system as was monitored”.
b) The final PV system yield $Y_f$ is the portion of the daily net energy output of the entire PV plant that was supplied by the array per kW of installed PV array:
$$Y_f = Y_A \times \eta_{LOAD} \tag{2.2}$$
This yield represents the number of hours per day that the array would need to operate at its rated power output to equal the monitored net daily yield. $\eta_{LOAD}$ is the load efficiency.
c) The reference yield $Y_r$ can be calculated by dividing the total daily in-plane irradiation by the module’s reference in-plane irradiance $G_{I,ref}$:
$$Y_r = \tau_r \times \Big(\sum_{day} G_I\Big)/G_{I,ref} \tag{2.3}$$
This yield represents the number of hours in a day that the sun would need to be at the reference irradiance level in order to contribute the same incident energy as was measured in the field.
Normalized losses. Normalized losses are calculated by subtracting yields.
a) The “array capture” losses $L_c$ represent the losses due to array operation:
$$L_c = Y_r - Y_A \tag{2.4}$$
b) The balance of system (BOS) losses $L_{BOS}$ represent the losses in the BOS components:
$$L_{BOS} = Y_A \times (1 - \eta_{BOS}) \tag{2.5}$$
c) The PR indicates the overall effect of losses on the array’s rated output due to array temperature, incomplete utilization of the irradiation, and system component inefficiencies or failure:
$$PR = Y_f/Y_r \tag{2.6}$$
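Equations 2.1 to 2.6 translate directly into a few lines of code. The sketch below assumes that the recording interval is given in hours, power in kW, and irradiance in kW/m² with a reference irradiance of 1 kW/m²; the function names and sample values are illustrative and not taken from the SunFarm data.

```python
def array_yield(p_a_kw, tau_r_h, p0_kw):
    """Eq. 2.1: Y_A = tau_r * sum(P_A) / P_0, in hours per day."""
    return tau_r_h * sum(p_a_kw) / p0_kw


def final_yield(y_a, eta_load):
    """Eq. 2.2: Y_f = Y_A * eta_LOAD."""
    return y_a * eta_load


def reference_yield(g_i_kw_m2, tau_r_h, g_ref_kw_m2=1.0):
    """Eq. 2.3: Y_r = tau_r * sum(G_I) / G_I,ref, in hours per day."""
    return tau_r_h * sum(g_i_kw_m2) / g_ref_kw_m2


def capture_losses(y_r, y_a):
    """Eq. 2.4: L_c = Y_r - Y_A."""
    return y_r - y_a


def bos_losses(y_a, eta_bos):
    """Eq. 2.5: L_BOS = Y_A * (1 - eta_BOS)."""
    return y_a * (1.0 - eta_bos)


def performance_ratio(y_f, y_r):
    """Eq. 2.6: PR = Y_f / Y_r."""
    return y_f / y_r


# Four hypothetical 15-minute records for a 0.25 kW module.
tau_r = 0.25                       # recording interval in hours
p_a = [0.10, 0.15, 0.20, 0.15]     # array output power in kW
g_i = [0.50, 0.75, 1.00, 0.75]     # in-plane irradiance in kW/m^2

y_a = array_yield(p_a, tau_r, p0_kw=0.25)
y_r = reference_yield(g_i, tau_r)
y_f = final_yield(y_a, eta_load=0.95)
pr = performance_ratio(y_f, y_r)
l_c = capture_losses(y_r, y_a)
```

For these values the array yield is 0.6 h, the reference yield 0.75 h, and the PR 0.76, which shows how normalizing by rated power and reference irradiance makes modules of different sizes comparable.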
2.2.2 Procedures for temperature and irradiance corrections to measured current-voltage characteristics
In IEC 60891 [18], three correction procedures are introduced. For brevity, only the first procedure, which was used for the baseline data correction in this work, is described here. The second procedure is especially suited to large irradiance corrections (>20%). The third procedure is needed when the temperature coefficients of the PV device are unknown.
Correction procedure 1. The measured current-voltage characteristic shall be corrected to standard test conditions, given as 25 °C and 1000 W/m², by applying the following equations:
$$I_2 = I_1 + I_{SC} \cdot (G_2/G_1 - 1) + \alpha \cdot (T_2 - T_1) \tag{2.7}$$
$$V_2 = V_1 - R_S \cdot (I_2 - I_1) - \kappa \cdot I_2 \cdot (T_2 - T_1) + \beta \cdot (T_2 - T_1) \tag{2.8}$$
where $I_1$, $V_1$ are coordinates of points on the measured characteristics; $I_2$, $V_2$ are coordinates of the corresponding points on the corrected characteristics; $G_1$ is the irradiance measured with the reference device; $G_2$ is the irradiance at the standard or other desired irradiance; $T_1$ is the measured temperature of the test specimen; $T_2$ is the standard or other desired temperature; $I_{SC}$ is the measured short-circuit current of the test specimen at $G_1$ and $T_1$; $\alpha$ and $\beta$ are the current and voltage temperature coefficients of the test specimen in the standard or target irradiance for correction and within the temperature range of interest; $R_S$ is the internal series resistance of the test specimen; $\kappa$ is a curve correction factor.
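Correction procedure 1 can be applied point by point to a measured I-V curve. The following sketch implements Equations 2.7 and 2.8 as written; the function name and the example coefficient values are hypothetical and do not describe any module in this study.

```python
def correct_iv_point(i1, v1, g1, t1, i_sc, alpha, beta, r_s, kappa,
                     g2=1000.0, t2=25.0):
    """Translate one measured (I1, V1) point to target conditions (G2, T2).

    Units assumed: A, V, W/m^2, deg C; alpha in A/deg C, beta in V/deg C,
    r_s in ohm, kappa in ohm/deg C.
    """
    i2 = i1 + i_sc * (g2 / g1 - 1.0) + alpha * (t2 - t1)                      # Eq. 2.7
    v2 = v1 - r_s * (i2 - i1) - kappa * i2 * (t2 - t1) + beta * (t2 - t1)     # Eq. 2.8
    return i2, v2


# One point measured at 800 W/m^2 and 45 deg C, corrected to STC.
i2, v2 = correct_iv_point(
    i1=8.0, v1=30.0, g1=800.0, t1=45.0,
    i_sc=8.5, alpha=0.004, beta=-0.12, r_s=0.3, kappa=0.001,
)
```

Here the corrected current is 10.045 A and the corrected voltage about 31.987 V. A point measured exactly at the target conditions is returned unchanged, which is a quick sanity check on any implementation.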
2.3 Data science
2.3.1 Data validation
Data validation is the process of ensuring that data analysis is based on a clean, correct, and useful data set [29]. Data validation includes data type checks, for example, whether the data are the power production of a PV module or the irradiance on the PV module’s plane; file existence checks, which determine the days for which data files are available for analysis; and cross-system consistency checks, which compare data points for the same variable collected by different systems to ensure that they are consistent. In practice, data validation rules can be implemented through the automated facilities of a data dictionary [30] or by the inclusion of explicit validation logic in the application program [31].
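The kinds of checks listed above can be written as small explicit validation functions. The sketch below is illustrative only: the function names, the plausibility limits, and the 5% agreement tolerance are assumptions, and a range check stands in for the data type check.

```python
from datetime import date, timedelta


def missing_days(available, start, end):
    """File existence check: days in [start, end] with no data file."""
    have = set(available)
    n_days = (end - start).days + 1
    all_days = [start + timedelta(days=k) for k in range(n_days)]
    return [d for d in all_days if d not in have]


def range_check(values, low, high):
    """Return indices of values outside a physically plausible range."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]


def cross_check(series_a, series_b, rel_tol=0.05):
    """Cross-system consistency check: indices where two measurements of the
    same variable disagree by more than rel_tol of the larger reading."""
    flagged = []
    for i, (a, b) in enumerate(zip(series_a, series_b)):
        scale = max(abs(a), abs(b))
        if scale > 0 and abs(a - b) / scale > rel_tol:
            flagged.append(i)
    return flagged


# Hypothetical inputs.
gaps = missing_days(
    [date(2012, 11, 25), date(2012, 11, 27)],
    start=date(2012, 11, 25), end=date(2012, 11, 28),
)
bad_irradiance = range_check([100.0, -5.0, 1500.0, 800.0], low=0.0, high=1400.0)
disagreements = cross_check([800.0, 500.0, 0.0], [810.0, 600.0, 0.0])
```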
2.3.2 Exploratory data analysis
In outdoor testing of PV systems, test conditions are not controllable; the best we can do is to collect as much data as possible so as to quantitatively evaluate climate stressors and the PV systems’ response. Exploratory data analysis (EDA) [32] encompasses and surpasses initial data analysis (IDA) [33]. While IDA focuses narrowly on hypothesis testing and checking assumptions, EDA encourages statisticians to explore the data, possibly formulating hypotheses that can guide further experiments and data collection. EDA usually summarizes the main characteristics of data by visual methods, including box plots, histograms, and multi-vari charts, which graphically display patterns of variation.
2.4 Clustering analysis
Data describe the characteristics of different PV systems. In order to understand all kinds of responses and phenomena, one of the most important steps in data analysis is to classify or group data into a set of categories or clusters. Data objects that are classified in the same group or cluster should have similar properties based on some criteria. Classification processes can be supervised or unsupervised. Supervised classification maps data objects into predefined classes. Unsupervised classification is known as cluster data analysis [34]. As described in the literature, “A direct reason for unsupervised clustering comes from the requirement of exploring the unknown natures of the data that are integrated with little or no prior information” [35]. The clustering algorithms discussed in this thesis are hierarchical clustering and k-means clustering.
Hierarchical clustering is a connectivity-based clustering algorithm. It is based on the core idea of “objects being more related to nearby objects than to objects farther away” [36]. In order to determine the similarity of two objects, the distance between them needs to be defined. Distance metrics include Euclidean distance, squared Euclidean distance, Manhattan distance, dynamic time warping, etc. Euclidean distance is the square root of the sum of squared differences between the coordinates of a pair of objects:
$$D_{XY} = \sqrt{\sum_k (x_{ik} - x_{jk})^2} \tag{2.9}$$
The standard Euclidean distance can be squared in order to place progressively greater weight on objects that are farther apart:
$$D^2_{XY} = \sum_k (x_{ik} - x_{jk})^2 \tag{2.10}$$
Manhattan distance, or city block distance, represents the distance between points in a city road grid. It is the sum of the absolute differences between the coordinates of a pair of objects:
$$D_{XY} = \sum_k |x_{ik} - x_{jk}| \tag{2.11}$$
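The three distance metrics defined above are short enough to state in code. This is a plain-Python sketch with hypothetical function names; each object is assumed to be an equal-length sequence of coordinates, such as the samples of one module’s power time series.

```python
import math


def squared_euclidean(x, y):
    """Eq. 2.10: sum over coordinates of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean(x, y):
    """Eq. 2.9: square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(x, y))


def manhattan(x, y):
    """Eq. 2.11: sum over coordinates of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


x = [0.0, 3.0]
y = [4.0, 0.0]
d_euclid = euclidean(x, y)           # 5.0
d_squared = squared_euclidean(x, y)  # 25.0
d_city = manhattan(x, y)             # 7.0
```

The same pair of points is therefore 5, 25, or 7 units apart depending on the metric, which is why the choice of metric can change the clustering result.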
The linkage criterion determines whether two sets of objects can be joined into one by measuring the distances between pairs of objects drawn from the two sets.
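To make the role of the linkage criterion concrete, here is a small agglomerative clustering sketch. The single, complete, and average linkage rules shown are the standard textbook choices and are named here as assumptions, since the thesis does not state at this point which rule it uses; the function names and sample points are likewise illustrative.

```python
import math
from itertools import combinations


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def linkage_distance(cluster_a, cluster_b, points, method="complete"):
    """Distance between two clusters, given as lists of point indices."""
    pairwise = [euclidean(points[i], points[j])
                for i in cluster_a for j in cluster_b]
    if method == "single":      # closest pair of members
        return min(pairwise)
    if method == "complete":    # farthest pair of members
        return max(pairwise)
    if method == "average":     # mean over all member pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError("unknown linkage method: " + method)


def agglomerate(points, n_clusters, method="complete"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage_distance(
                clusters[ab[0]], clusters[ab[1]], points, method),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return [sorted(c) for c in clusters]


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
groups = agglomerate(points, n_clusters=3)
```

Recording the merge distances at each step, instead of stopping at a fixed number of clusters, yields the dendrogram usually plotted for hierarchical clustering.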
K-means clustering is also known as centroid-based clustering [37]; it partitions objects so that objects assigned to the same cluster are nearest to each other. K-means clustering uses the Euclidean distance metric. The quality of a k-means clustering result can be evaluated by the within-cluster sum of squares (WCSS), which is the sum of the squared distances between each object and the centroid of its cluster. The goal is to assign each object to a cluster such that the total WCSS is minimized.
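The sketch below shows one common way to minimize WCSS, Lloyd’s iteration, together with the WCSS computation itself. It is an illustrative plain-Python version with caller-supplied starting centroids, not the R routine used in this work; the function names and sample points are assumptions.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(members)
    return [sum(p[d] for p in members) / n for d in range(len(members[0]))]


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = centroid(members)
    return labels, centroids


def wcss(points, labels, centroids):
    """Total within-cluster sum of squared distances to the centroids."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels2, cents2 = kmeans(points, init_centroids=[(0.0, 0.0), (10.0, 0.0)])
labels1, cents1 = kmeans(points, init_centroids=[(0.0, 0.0)])
wcss_k2 = wcss(points, labels2, cents2)  # 4.0
wcss_k1 = wcss(points, labels1, cents1)  # 104.0
```

Plotting the total WCSS against the number of clusters and looking for the point where the curve flattens is a common way to choose the number of clusters.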
3 Real-world Data Acquisition
3.1 SDLE SunFarm design
The SDLE SunFarm, located on the west campus of CWRU, is about one acre in size. Fourteen high-precision Feina SF20 dual-axis trackers and two sites of adjustable tilted racking comprise the 16 electrical sites of the SDLE SunFarm. The 122 individual PV power plants include 120 PV modules with microinverters and two sets of 8 PV modules connected in series with string inverters. Output power is monitored through the inverters and fed back to the grid through a reversing relay. The 120 modules with microinverters were evenly separated into two groups of 60 samples each (3 module samples from each of 20 brands). The two groups use two different microinverter models for comparison. The first 60 microinverters installed were Enphase model M215. Electrical data were reported by Enphase’s embedded Enlighten data acquisition system. The metrology platform (shown in Table 3.1) includes insolation and weather monitoring. Minute-by-minute global horizontal irradiance (GHI) data were monitored by a Kipp & Zonen CMP6 pyranometer positioned near the fixed racking. A second pyranometer, a Kipp & Zonen CMP11, was also set on the horizontal plane and connected to a Daystar multi-tracer. Two Vaisala WXD520 weather stations were placed on the SunFarm to record wind speed, wind direction, rainfall, rain intensity, rain duration, and humidity. An anemometer was connected to the Master Control Unit of the trackers to monitor the wind load on the trackers. T-type thermocouples were used for backsheet temperature monitoring.
Instrument                         Attributes
Enphase micro-inverter             AC power; DC current; DC voltage
Kipp & Zonen pyranometer           Solar irradiance
Vaisala WXD 520 weather station    Temperature; wind speed; wind direction; rainfall; hail; relative humidity
T-type thermocouple                Backsheet temperature
Table 3.1. Parameters monitored using SDLE SunFarm metrology platform
The data acquisition system consists of 17 networked Campbell Scientific CR-1000 dataloggers, each connected to an AM 16-32 multiplexer that extends the capacity of the datalogger to 32 differential measurement channels. The Campbell dataloggers monitor thermocouple and sensor outputs. The Enphase micro-inverters use the Enphase Envoy Communications Gateway to connect each individual micro-inverter to the Enlighten monitoring and management software. Similarly, the Solectria string inverters use the Solrenview system to collect data. Minute-by-minute data can be downloaded from the Solrenview web servers.
3.2 Global SunFarm network and Energy CRADLE
Cleveland’s humid continental climate is not typical for PV degradation research. In
order to study PV modules’ performance under different climatic conditions, a global
SunFarm network was established among nine PV outdoor test beds across the world.
The purpose of the Energy Common Research Analytics and Data Lifecycle Environment (Energy CRADLE) is to create, for engineering and in particular for lifetime science,
the tools and protocols necessary to transform Big Data into information, which informs
scientific knowledge to guide further analysis 30,38–40 . Energy CRADLE is tightly focused
on serving the needs of handling and sharing data across the SunFarm network. Appendix B provides further details.
4 Results: Real-world data analytics
4.1 Overview
In this section, the results focus on the real-world performance of 60 crystalline silicon
PV modules from 20 different manufacturers exposed from November 25, 2012 to May
31, 2013 on the SDLE SunFarm. The purpose of this case study is to interpret the information in the data collected during the first 6 months of the SDLE SunFarm’s
operation and to develop a data cleaning and data munging procedure. This analytic procedure will be integrated within Energy CRADLE and will guide data processing on the cloud. This case study can also inform experimental design and motivate
further research.
Fig. 4.1 is a blueprint of the SDLE SunFarm’s 16 electrical sites. The 60 modules studied are distributed on the sites marked with red boxes, specifically fixed rack Site 1 and
tracker Sites 4, 6, 8, 12, and 14. All three modules from the same manufacturer are placed
on the same site. On fixed rack Site 1, 18 modules from 6 brands are aligned horizontally,
with modules of the same brand placed adjacent to each other. On the trackers, which carry 6, 9, or 12 modules each, modules of the same brand are evenly distributed on the
same tracker frame.
In this study, the manufacturer information of the modules is withheld; each brand
is referred to by a capital letter, A through T. A module’s location is represented by a
lower case f or t, short for “fixed rack” or “tracker”, respectively, followed by the
site number. Each module has a sample number starting with “sa”. For example, in Fig. 4.3
power data was recorded from “A.f1.sa18259.00”, which is a module of brand A mounted on
fixed rack site 1 with sample number “sa18259.00”.
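The naming convention above can be split mechanically into its parts. The following is a minimal Python sketch (the thesis analysis itself was done in R); the names `ModuleId` and `parse_module_id` are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple


class ModuleId(NamedTuple):
    brand: str    # capital letter A through T
    mount: str    # "fixed rack" or "tracker"
    site: int     # electrical site number
    sample: str   # sample number, e.g. "sa18259.00"


def parse_module_id(label: str) -> ModuleId:
    """Split a label such as 'A.f1.sa18259.00' into brand, mount, site, sample."""
    # Split on the first two dots only; the sample number contains a dot itself.
    brand, location, sample = label.split(".", 2)
    mounts = {"f": "fixed rack", "t": "tracker"}
    if location[0] not in mounts or not sample.startswith("sa"):
        raise ValueError(f"unrecognized module label: {label!r}")
    return ModuleId(brand, mounts[location[0]], int(location[1:]), sample)


example = parse_module_id("A.f1.sa18259.00")
```

Keeping the brand and site as separate fields makes it straightforward to group modules by site or by brand in later steps.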
Figure 4.1. The 60 PV modules studied in this section are distributed on 6 different electrical sites, shown in red boxes in the plot. Site 1, which is the
long site along the bottom, is a fixed tilt rack site. 18 modules are exposed
on Site 1. The rest of the modules are exposed on even-numbered tracker
sites: 12 modules on Site 4, 6 modules each on Sites 6 and 8, and 9 modules
each on Sites 12 and 14.
4.1.1 Analytical methods
R 1 , a free and open source programming language and software environment
for statistical computing and graphics, was chosen as the data analysis tool. The data
analytical methods applied to this case study consist of raw data validation, exploratory
data analysis, data assembly, data subsampling, and clustering data analysis.
4.2 Raw data validation
4.2.1 Module baseline
All the modules studied on the SDLE SunFarm were brand new modules purchased on the
open market. Before the modules were exposed to sunlight, the I-V characteristics of each module were recorded using a SPIRE SPI-4800 solar simulator and I-V curve tracer located
at the Wright Center for Photovoltaic Innovation and Commercialization (PVIC) at the
University of Toledo. In order to reduce the impact of instrument uncertainty, sixteen
I-V curve measurements were acquired for each module. Additionally, the backsheet
temperature and the irradiance intensity were recorded. Each measurement was corrected to standard test conditions (STC), specified as 25 °C and 1 kW/m², according to
IEC 60891 18 . Maximum power output (Pmax) was taken from the 16 corrected I-V curve
measurements to represent the initial performance of each module under STC. For the 60
modules, the standard deviations of the 16 Pmax measurements fall between 0.04% and 0.9%,
which supports the reliability of the baseline results.
In order to evaluate the initial performance of each brand, the mean Pmax of the
three modules of each brand was taken and normalized by dividing by the nominal power
output of the module. Fig. 4.2 shows the normalized performance of each brand, and
the deviation among the three module samples is shown as error bars. The initial performance of most brands (all except
H and Q) falls between 0.95 and 1.05, which means their initial
performance met the common market expectation of ± 5% of their nominal power.
Figure 4.2. Cross-sectional comparison of crystalline silicon PV modules
from 20 different manufacturers. Y axis is normalized power. X axis
shows the brands and location. Brand names were replaced with letters
A through T. Letters f and t represent fixed tilt rack and tracker. The maximum power output (Pmax) of three modules of each brand was measured. The bars show the averaged normalized power of each brand. The
standard deviation is plotted as error bars.
4.2.2 Power data
As introduced in the previous chapter, the electricity generated by all 60 modules is reported
by the microinverters’ data acquisition system, Enlighten. Enlighten reports DC
current, DC voltage, the microinverter’s internal temperature, and AC power. The data collection interval is 5 minutes. Prior to Energy CRADLE, power data was collected from Enlighten manually. Fig. 4.3 shows an example module’s AC power over six months. From
this figure, we can clearly see the daily variation of the power data. During the 180 days, there
are several gaps in the data, partly due to three trips of the interconnection relay. Over the
180-day observation period, 99 days have power data reports.
Figure 4.3. Power production versus time of one PV module.
4.2.3 Microinverter’s efficiency
A microinverter’s conversion efficiency is given by the ratio of AC power to DC power. DC power is calculated as the product of DC current and DC voltage; AC power is provided in the data.
Both DC current and DC voltage have two significant digits, while AC power is given as
an integer. Fig. 4.4 shows the efficiency of the 60 microinverters. The majority of the efficiencies are between 95% and 99%, which is consistent with the efficiency provided by
the manufacturer. However, there are 12 points that exceed 100%, which is contrary to the
laws of thermodynamics. The raw data suggest that when the PV module’s DC output is low (around 1 W), the microinverter tends to “round up” the product of DC
current and voltage to an integer. This rounding behavior explains why abnormal efficiency appears mostly on trackers 12 and 14; these two trackers did not track properly
during the majority of the 180 days over which data was collected. The modules on these trackers
were exposed to low irradiance levels longer than the other modules.
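As an illustration of the efficiency check described above, the following Python sketch computes the AC/DC ratio for two records and flags values above one. The records are hypothetical numbers chosen only to show how an integer AC value near 1 W can push the ratio above 1.0; they are not taken from the SunFarm data.

```python
def inverter_efficiency(i_dc, v_dc, p_ac):
    """Ratio of reported AC power to DC power (I x V); None when DC power is zero."""
    p_dc = i_dc * v_dc
    return p_ac / p_dc if p_dc > 0 else None


# Hypothetical records: (DC current [A], DC voltage [V], reported AC power [W]).
records = [
    (7.50, 28.00, 203),  # normal operation, 210 W DC
    (0.03, 25.50, 1),    # about 0.77 W DC reported as 1 W AC
]
effs = [inverter_efficiency(*r) for r in records]
# Flag physically impossible efficiencies for inspection rather than dropping them.
flagged = [e is not None and e > 1.0 for e in effs]
```

In this sketch the first record gives an efficiency of about 0.97 and the second exceeds 1.0, which mirrors the pattern seen at low DC output.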
Figure 4.4. The efficiency of 60 microinverters from 99 days of data
collection after exposure on the SDLE SunFarm on fixed rack 1 (blue),
tracker 4 (red), tracker 6 (yellow), tracker 8 (green), tracker 12 (purple), and
tracker 14 (light blue).
4.2.4 Microinverter burst-mode
Further investigation of the “round up” effect shows that it arises because the Enphase
microinverter works in “burst-mode” at low DC input. When a PV module is working
under low irradiance, the DC output of the module is low, and therefore the DC to AC conversion efficiency will drop 41 . Microinverters can scan the DC voltage at each AC cycle
(1/60 second). When a microinverter detects that the DC input is lower than 30%, it will
charge a capacitor instead of converting DC to AC power. At the next cycle, the microinverter scans the PV module for its output again and adds that to the amount of charge
already stored in the capacitor bank from the previous cycle. If the combined power
is high enough for a DC-to-AC conversion, the capacitor will release the charge. As a
result, when the microinverter is “bursting” the stored-up charge, the AC output of the
microinverter will be higher than what the DC input would dictate. This explains why
a microinverter appears to round up its AC power, which makes its apparent efficiency higher when it is working at a low irradiance level. However, it is also known that
the AC power reported by Enlighten is an averaged value instead of an instantaneous measurement. The effect of “burst-mode” will be shown in the data subsampling part of this
chapter.
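The mechanism described above can be illustrated with a toy per-cycle model. This Python sketch is not Enphase's actual control logic: the thresholds, the energy units, and the function name `burst_mode` are all assumptions made for illustration, and conversion losses are ignored.

```python
def burst_mode(dc_in, low_threshold, release_level):
    """Toy model: store energy while the input is low, release it in bursts."""
    stored, ac_out = 0.0, []
    for energy in dc_in:
        if energy >= low_threshold:
            ac_out.append(energy)      # normal conversion (losses ignored)
            continue
        stored += energy               # low input: charge the capacitor
        if stored >= release_level:
            ac_out.append(stored)      # burst: release everything stored
            stored = 0.0
        else:
            ac_out.append(0.0)         # nothing converted in this cycle
    return ac_out


# Hypothetical constant low input, in arbitrary energy units per cycle.
dc = [0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
ac = burst_mode(dc, low_threshold=1.0, release_level=1.0)
```

In a burst cycle the output exceeds that cycle's input, so an instantaneous AC/DC ratio can be above one even though the total energy over all cycles is conserved in this model.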
4.2.5 Weather data
Insolation data. This study uses global horizontal irradiance (GHI) monitored by a Kipp
& Zonen CMP6 pyranometer placed on a horizontal plane as the reference. The sampling
rate for irradiance data was determined by the datalogger’s scan period, which is 1 minute
for all the dataloggers on the SDLE SunFarm. Incident irradiance on a PV module’s plane
is different from that on the horizontal plane, so in order to convert GHI to plane of array (POA)
irradiance, the assumption was made that all incident sunlight is direct sunlight.
Global horizontal irradiance (GHI) to plane of array (POA) irradiance conversion. In
reality, global horizontal irradiance (GHI) consists of direct irradiance and diffuse irradiance. Direct irradiance on a plane is proportional to direct normal irradiance (DNI) through a sine
function of the sun's angle above that plane, while diffuse irradiance varies on different planes.
\[
\begin{aligned}
\mathrm{GHI} &= \mathrm{DNI} \times \sin\alpha + I_{dif} \\
\mathrm{POA}_{tracker} &= \mathrm{DNI} + I_{dif} \\
\mathrm{POA}_{fixed} &= \mathrm{DNI} \times \sin(\alpha + \beta) + I_{dif}
\end{aligned}
\tag{4.1}
\]
where $I_{dif}$ is the diffuse irradiance, $\alpha$ is the elevation angle of the sun, and $\beta$ is the tilt
angle of the fixed rack; in this case $\beta = 22.3^\circ$. The elevation angle is given as:
\[
\alpha = 90^\circ - \theta + \delta
\tag{4.2}
\]
where $\theta$ is the latitude and $\delta$ is the declination angle, given as:
\[
\delta = 23.45^\circ \sin\left[360^\circ \times (284 + d)/365\right]
\tag{4.3}
\]
where $d$ is the day of the year. Since DNI and $I_{dif}$ data were not available for the first 6
months, the assumption here is that all the incident light is direct sunlight, which simplifies the formulas to:
\[
\begin{aligned}
\mathrm{GHI} &= \mathrm{DNI} \times \sin\alpha \\
\mathrm{POA}_{tracker} &= \mathrm{DNI} \\
\mathrm{POA}_{fixed} &= \mathrm{DNI} \times \sin(\alpha + \beta)
\end{aligned}
\tag{4.4}
\]
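The simplified conversion of Equations 4.2 to 4.4 can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the thesis (which used R). The latitude of about 41.5° N for Cleveland and the GHI value of 600 W/m² are assumptions not stated in the text. Note also that Equation 4.2 has the form of the standard expression for the elevation angle at solar noon.

```python
import math


def declination_deg(day_of_year):
    """Eq. 4.3: solar declination in degrees."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def elevation_deg(latitude_deg, day_of_year):
    """Eq. 4.2: elevation angle alpha = 90 - latitude + declination (degrees)."""
    return 90.0 - latitude_deg + declination_deg(day_of_year)


def poa_from_ghi(ghi, alpha_deg, tilt_deg=22.3):
    """Eq. 4.4 (all light treated as direct): returns (POA_tracker, POA_fixed).

    Only meaningful while the sun is above the horizon (sin(alpha) > 0).
    """
    alpha = math.radians(alpha_deg)
    dni = ghi / math.sin(alpha)
    return dni, dni * math.sin(alpha + math.radians(tilt_deg))


# Assumed latitude (about 41.5 deg N) and an arbitrary GHI of 600 W/m^2.
alpha = elevation_deg(41.5, day_of_year=81)
poa_tracker, poa_fixed = poa_from_ghi(600.0, alpha)
```

Because the division by sin α grows quickly at low elevation angles, small GHI readings near sunrise and sunset are amplified, which is one way the overestimate discussed in the text can arise.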
The estimated POA irradiance is expected to be higher than the actual incident irradiance on a
module’s surface, because diffuse light is treated as direct light and this error is amplified by converting GHI to POA. In the future, this systematic error can be removed by having an
irradiance sensor set on the plane of array and a direct irradiance sensor for DNI. In this
case study, since the performance of the modules is compared cross-sectionally, the systematic error can
be ignored.
Additional climate data. Additional climate data including special climate events and
the cloudiness levels were collected from online open source historical data, such as
Weather Underground (http://www.wunderground.com).
4.2.6 Data alignment
Data alignment is another important validation process for time series data. There are
multiple data sources on the SunFarm, and different devices synchronize time from different time sources. For example, the weather data used in this study was collected by
dataloggers on the SDLE SunFarm. Time on these dataloggers was synchronized through
controller software on a desktop computer in the SDLE lab. The power data was reported by the Enphase user interface, which synchronizes time with Enphase's server. PV modules generate power almost instantaneously when sunlight hits the front surface;
therefore, the time series data of power and irradiance should be highly correlated.
Weather and power data were aligned using the sample cross correlation function
(ccf) in R. The ccf in R is defined as the set of sample correlations between time series X at
time t + h (h = 0, ±1, ±2, ...) and time series Y at time t, where X is potentially a predictor
of Y. If two time series were perfectly aligned, then the correlation is highest when h =
0 and the correlation value drops as the absolute value of h increases. However, if the
maximum correlation value appears when h is positive, then X lags Y. If the correlation is
maximum when h is negative, then X leads Y. It was determined using the ccf that the weather
data was leading the power data by 3 minutes before March 10th, 2013. After daylight saving
time began on March 10th, 2013, the time shift became 63 minutes. The timestamps of the
power data were found to be the more trustworthy when compared with standard Greenwich
Mean Time (GMT). The weather data was therefore separated into two parts, before and after
March 10, and the timestamps of the two parts were shifted accordingly.
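The lag search described above can be sketched as follows. This Python version is a simplified stand-in for R's `ccf`: it computes a plain Pearson correlation on the overlapping segments at each lag, which is not R's exact estimator. The function names and the synthetic series are hypothetical.

```python
import random
from statistics import mean


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0


def best_lag(x, y, max_lag):
    """Lag h that maximizes the correlation between x[t + h] and y[t]."""
    n = len(x)
    scores = {}
    for h in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -h), min(n, n - h)   # values of t with 0 <= t + h < n
        scores[h] = corr(x[lo + h:hi + h], y[lo:hi])
    return max(scores, key=scores.get)


# Synthetic check: "weather" shows each value 3 samples before "power" does.
rng = random.Random(0)
power = [rng.random() for _ in range(200)]
weather = power[3:] + [0.0, 0.0, 0.0]
lag = best_lag(weather, power, max_lag=10)
```

With X = weather and Y = power, the best lag in this synthetic case is negative, which matches the convention in the text that a negative h means X leads Y.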
4.2.7 Malfunction of trackers
According to the maintenance record, tracker 4 did not experience any mechanical problems. All the other trackers experienced some malfunctions during the observation time. In order to determine the days when trackers malfunctioned, the power
data from an example module on each tracker was plotted versus time and compared
to the example power data from the functioning tracker 4. An example of the power data
curve from a stopped tracker (tracker 8) and the power data curve from a normally operating
tracker (tracker 4) is shown in Fig. 4.5.
The power curve on tracker 8 is not symmetric, with the majority of the power
generated in the afternoons. Thus, the tracker was stopped facing west. By comparing the curves
in this manner, the malfunctioning dates of each tracker were determined. Tracker 6
was stopped for 5 days in May. Tracker 8 was stopped for 10 days in April and May. Tracker 12
was off tracking until its gear counter was replaced. Tracker 14 was not tracking most of
the time because of a gear stopper issue.
Figure 4.5. A comparison of AC power generated by a module on tracker 8
(top) and on tracker 4 (bottom) from May 12th to May 18th, 2013, when
tracker 8 had stopped functioning and was facing west.
4.3 Exploratory Data Analysis (EDA) on Integrated Data
4.3.1 Total power production
A common way of evaluating a PV module’s performance is by comparing the total power
production. A module’s power production, in this case, is affected not only by its nominal power rating but also by the module’s mounting system. The averaged total
power production of each brand is shown in Fig. 4.6. The highest power production in
99 days is 60.36 kWh from brand G on tracker 4 and the lowest power production, 28.79
kWh, is from brand T on tracker 14. The four brands on tracker 4 (red), on average, produced
about 40% more power than the six brands on the fixed rack (blue). Modules on the other
trackers produced less power than those on tracker 4 by varying degrees. Generally, modules on
a correctly operating tracker should produce more power than those on a fixed rack.
However, except for the trackers on sites 4 and 6, the modules
on trackers produced less power on average than the modules on the fixed rack site 1.
In order to compare the performance of different brands on the same site, total power
needs to be normalized.
Figure 4.6. The bar graph shows the averaged total power production of
each brand from fixed rack 1 (blue), tracker 4 (red), tracker 6 (yellow),
tracker 8 (green), tracker 12 (purple), and tracker 14 (light blue). The standard deviations are plotted as error bars.
4.3.2 Normalized power yield
Normalized power yield is defined as the ratio of the total power production to the product of nominal power and exposure days (i.e., 99 days) 28 . Normalized power yield is
equivalent to the time per day that the PV plant would operate at nominal power output. Normalized power yield is an important factor in choosing PV modules. In the PV industry, a
module’s price is presented in dollars per watt, so modules that have a higher normalized power yield are more cost effective. Fig. 4.7 shows the normalized power production of each brand. As the trackers experienced different problems, it is not valid to
compare the performance of two brands on different sites. However, the ranking of brands within one site demonstrates the relative performance of different brands.
For example, on the fixed rack, which is shown in blue, brand B’s total power production is
the lowest but its normalized power production is the highest among the 6 brands. This
indicates that under the same environment, including temperature and irradiance conditions, brand B performs better than the other brands on site 1.
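The definition of normalized power yield reduces to a single division, sketched here in Python. The 60.36 kWh and 99 days are taken from the text; the 0.25 kW nominal rating is a hypothetical value, since module ratings are not given here.

```python
def normalized_yield(energy_kwh, nominal_kw, days):
    """Equivalent hours per day at nominal power: kWh / (kW x days)."""
    return energy_kwh / (nominal_kw * days)


# 60.36 kWh over 99 days is from the text; the 0.25 kW rating is hypothetical.
hours_per_day = normalized_yield(60.36, nominal_kw=0.25, days=99)
```

Under this assumed rating the result is a little under two and a half equivalent hours per day, which shows the units in which the normalized yield should be read.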
Figure 4.7. The bar graph shows the averaged normalized power production of each brand from fixed rack 1 (blue), tracker 4 (red), tracker 6 (yellow), tracker 8 (green), tracker 12 (purple), and tracker 14 (light blue). The
standard deviations are plotted as error bars.
4.4 Clustering of AC Power Data
Section 4.3 showed that the integrated performance measures, total power production and normalized power production, vary among the 20 different brands; however, brands on the
same site perform similarly. In order to determine whether the modules of the same brand
always perform similarly, it is necessary to check the similarity of the 60 modules’ AC power
time series data. As mentioned in Chapter 2, a statistical way of checking the similarity
of multiple observations is clustering analysis. A hierarchical clustering analysis (HCA)
was conducted on all of the time series AC power data from the 99 days of observed
data. There are 9698 observations for each module. A dendrogram that uses the Euclidean
distance metric and the average linkage criterion is shown in Fig. 4.8. The distance metric
and linkage criteria will be discussed in Chapter 5. Red boxes in the plot show the result of dividing the modules into six groups. The grouping result reflects exactly the 6 physical sites on the
SDLE SunFarm. Although there are some exceptions, most of the modules of the same
brand are close to each other in distance. In Fig. 4.8, from left to right, the 6 groups consist of modules from tracker 14, tracker 12, tracker 8, fixed rack 1, tracker 6, and tracker
4, respectively.
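To make the procedure concrete, the following Python sketch implements agglomerative clustering with Euclidean distance and average linkage, stopping when the requested number of clusters remains (which corresponds to cutting the dendrogram into that many groups). It is a small illustrative implementation, not the R `hclust` call used for Fig. 4.8, and the four short series are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(series, n_clusters):
    """Agglomerative clustering; returns clusters as sorted lists of indices."""
    n = len(series)
    clusters = [[i] for i in range(n)]
    # Pairwise distances between individual series, computed once.
    d = {(i, j): euclidean(series[i], series[j])
         for i in range(n) for j in range(i + 1, n)}

    def link(a, b):
        """Average linkage: mean pairwise distance between members of a and b."""
        total = sum(d[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        _, p, q = min((link(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


# Hypothetical power curves: two "noon-peaked" and two "afternoon-peaked" series.
series = [[0, 5, 9, 5, 0], [0, 5, 8, 5, 0], [0, 1, 2, 6, 9], [0, 1, 3, 6, 8]]
groups = average_linkage(series, n_clusters=2)
```

Each module is treated as one long vector, so series with a similar shape and magnitude (for example, modules sharing a tracker) end up close together under the Euclidean metric.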
Figure 4.8. Hierarchical cluster analysis of 60 modules based on all AC
power time series data. The clusters were generated using “hclust” in the
“stats” package in R (v3.0.1). The distance matrix is computed using the Euclidean method. The distance between sets of observations is defined with
the average linkage method. When the dendrogram tree is divided into
6 groups, each group includes exactly the modules physically located on
the same electrical site.
However, six is an arbitrary number chosen from experience with the data. In order
to confirm that the result of the HCA is valid, the k-means algorithm was used. K-means clustering partitions observations into k clusters that minimize the “total within-cluster sum
of squares” (WCSS). In this case, each sample (PV module) has a set of 9698 observations,
so each sample is treated as a 9698-dimensional vector. In order to determine the
k value that gives the most reasonable result, a commonly used method is the elbow
method 42 . The elbow method is applied to a plot of WCSS as a function of the number of
clusters, k. The best clustering result occurs at the point where adding an additional cluster no longer
substantially improves the model of the data. This point should be chosen as the cluster
number, hence the “elbow criterion”. A survey of the WCSS as a function of k is plotted
in Fig. 4.9. The elbow point is at k equal to 6, which is marked with a red circle. The k-means
clustering result at k equal to 6 is consistent with the result of the HCA. In order to visually
confirm that the AC power time series falling into each group are similar to each other, Fig. 4.10
plots the AC power output of the 60 PV modules over 99 days according to both the k-means and
hierarchical clustering results. The 60 AC power time series were separated into 6 groups, and
modules from the same brand are shown in the same color. Groups 1 through 6 correspond to tracker 14, tracker 12, tracker 8, fixed rack 1, tracker 6, and tracker 4, respectively. Fig. 4.10 confirms that the shape and magnitude of the AC power time series in
each cluster are similar.
Figure 4.9. Total within-cluster sum of squares (WCSS). The elbow point occurs when k is equal to 6.
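The WCSS curve behind the elbow method can be reproduced on a toy data set with the Python sketch below. It uses Lloyd's algorithm with a deterministic farthest-point seeding, which is an assumption made here for reproducibility and is not necessarily the initialization used in the thesis; the nine two-dimensional points are hypothetical and form three well-separated groups.

```python
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wcss(points, k, iters=50):
    """Run k-means (Lloyd's algorithm) and return the within-cluster sum of squares."""
    # Deterministic seeding: repeatedly add the point farthest from its nearest center.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sqdist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return sum(min(sqdist(p, c) for c in centers) for p in points)


# Hypothetical data: three tight groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
          [20, 0], [21, 0], [20, 1]]
wcss = {k: kmeans_wcss(points, k) for k in range(1, 6)}
```

Plotting `wcss` against k for this toy data shows a sharp drop up to k = 3 and only small gains afterwards, which is the "elbow" pattern the text reads from Fig. 4.9.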
Figure 4.10. AC power of 60 modules grouped by the hierarchical clustering result.
The colors of the curves differentiate module brands.
4.5 Data Assembly
4.5.1 Performance metrics
Up to this point, the analysis was based on the 60 modules’ AC power data. However, in order
to correlate module power output with climate conditions, climate data and power data
were assembled in the following way. To compare the performance of 60 modules of
different nominal power and with different mounting systems, a normalized analysis and
presentation was introduced based on IEC 61724 28 and the work of H. Haeberlin et al. 43 .
Normalized energy yields and losses. The definitions of six performance indices introduced
in IEC 61724 were discussed in Chapter 2. Since the 60 modules being studied all
work with individual microinverters instead of as a PV array, and the AC power generated
is fed directly back to the grid, each module is one PV plant. Data was collected
on a minute basis instead of on a daily basis. Specifically, power data was collected every 5 minutes and weather data was collected every minute; therefore, it is necessary
to modify the performance metrics. These new performance indices are normalized instantaneous quantities. Irradiance yield, $Y_I$, is POA irradiance normalized to the reference
irradiance of 1 kW/m² (Equation 4.5).
\[
Y_I = \mathrm{POA} / G_0, \qquad G_0 = 1\ \mathrm{kW/m^2}
\tag{4.5}
\]
DC yield, $Y_{DC}$, is the DC power normalized to a module’s nominal power (Equation 4.6).
DC power was calculated by multiplying DC current by DC voltage.
\[
Y_{DC} = P_{DC} / P_0
\tag{4.6}
\]
AC yield, $Y_{AC}$, is the AC power normalized to a module’s nominal power (Equation 4.7).
\[
Y_{AC} = P_{AC} / P_0
\tag{4.7}
\]
Capture losses, $L_c$, are the part of the incident solar power not captured by the solar cell (Equation 4.8).
\[
L_c = Y_I - Y_{DC}
\tag{4.8}
\]
System losses, $L_s$, are the DC-AC inverter conversion losses (Equation 4.9).
\[
L_s = Y_{DC} - Y_{AC}
\tag{4.9}
\]
Performance ratio (PR) is the ratio of the useful energy fed back into the grid to the energy which would be generated by an ideal PV module with a cell temperature of 25 °C and
the same irradiance.
\[
\mathrm{PR} = Y_{AC} / Y_I
\tag{4.10}
\]
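The six indices of Equations 4.5 to 4.10 can be computed for a single record as in the Python sketch below. The input record (800 W/m² POA, 6.0 A at 29.0 V, 167 W AC, 250 W nominal power) is hypothetical, and the function name is introduced here only for illustration.

```python
G0 = 1000.0  # reference irradiance in W/m^2 (1 kW/m^2)


def performance_metrics(poa, i_dc, v_dc, p_ac, p_nominal):
    """Instantaneous normalized metrics of Eqs. 4.5-4.10 for one record."""
    y_i = poa / G0                    # irradiance yield (Eq. 4.5)
    y_dc = i_dc * v_dc / p_nominal    # DC yield (Eq. 4.6)
    y_ac = p_ac / p_nominal           # AC yield (Eq. 4.7)
    return {
        "Y_I": y_i,
        "Y_DC": y_dc,
        "Y_AC": y_ac,
        "L_c": y_i - y_dc,            # capture losses (Eq. 4.8)
        "L_s": y_dc - y_ac,           # system losses (Eq. 4.9)
        # Performance ratio (Eq. 4.10); undefined when there is no irradiance.
        "PR": y_ac / y_i if y_i > 0 else None,
    }


# Hypothetical record: POA [W/m^2], DC current [A], DC voltage [V], AC [W], nominal [W].
m = performance_metrics(800.0, 6.0, 29.0, 167.0, 250.0)
```

By construction the irradiance yield splits exactly into capture losses, system losses, and AC yield, which is a convenient consistency check on assembled data.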
4.5.2 Solar time
Local noon time is usually not when the sun is highest in the sky, due to the Earth’s
orbit and human adjustments such as time zones and daylight saving time. Noon in local
solar time (LST) is defined as the time when the sun is highest in the sky for a particular
location, which is not necessarily local noon time 44 . In order to better understand the
modules’ performance in relation to solar motion, the timestamps of the data need to
be converted from local time (LT) to LST. The local standard time meridian (LSTM) is
a reference meridian used for a particular time zone and is similar to the Prime Meridian (longitude = 0°), which is used for Greenwich Mean Time (GMT) 45 . The formula for
calculating LSTM is given by Equation 4.11:
\[
\mathrm{LSTM} = 15^\circ \times \Delta T_{GMT}
\tag{4.11}
\]
where $\Delta T_{GMT}$ is the difference of the local time from GMT in hours. $\Delta T_{GMT}$ equals $-4$
for eastern daylight time (EDT) and $-5$ for eastern standard time (EST). The equation of
time (EoT) corrects for the eccentricity of the Earth’s orbit and the Earth’s axial tilt (Equation
4.12):
\[
\mathrm{EoT} = 9.87\sin(2B) - 7.53\cos(B) - 1.5\sin(B)
\tag{4.12}
\]
where
\[
B = 360^\circ (d - 81)/365
\]
in degrees and $d$ is the day of the year. The net time correction factor (TC)
accounts for the variation of LST within a given time zone (Equations 4.13 and 4.14):
\[
\mathrm{TC} = 4(\mathrm{Longitude} - \mathrm{LSTM}) + \mathrm{EoT}
\tag{4.13}
\]
\[
\mathrm{LST} = \mathrm{LT} + \mathrm{TC}/60
\tag{4.14}
\]
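Equations 4.11 to 4.14 chain together as in the Python sketch below, with EoT and TC in minutes and LT and LST in hours. The longitude of about −81.7° for Cleveland (east positive, so west is negative) is an assumption not given in the text, and the function names are hypothetical.

```python
import math


def equation_of_time_min(day_of_year):
    """Eq. 4.12: equation of time in minutes."""
    b = math.radians(360.0 * (day_of_year - 81) / 365.0)
    return 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)


def local_solar_time(local_time_h, longitude_deg, utc_offset_h, day_of_year):
    """Convert local clock time (hours) to local solar time (hours)."""
    lstm = 15.0 * utc_offset_h                                             # Eq. 4.11
    tc = 4.0 * (longitude_deg - lstm) + equation_of_time_min(day_of_year)  # Eq. 4.13
    return local_time_h + tc / 60.0                                        # Eq. 4.14


# Assumed longitude of about -81.7 deg; utc_offset_h is -5 for EST, -4 for EDT.
lst = local_solar_time(12.0, -81.7, utc_offset_h=-5, day_of_year=81)
```

In this example clock noon falls roughly half an hour before solar noon, which illustrates why the timestamps are converted before the solar noon subset is taken.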
Six performance metric variables of one single module over one day in LST are shown
in Fig. 4.11. The $Y_I$, $Y_{DC}$, $Y_{AC}$, $L_c$, $L_s$, and PR curves are plotted in black, green, blue, yellow,
brown, and red, respectively. On a clear sunny day both irradiance and PR show a dome-shaped
curve. The PR curve has a comparably flat top, which suggests that PR and POA irradiance are
highly correlated and that PR is less sensitive to POA irradiance at high levels (over 750 W/m²).
Figure 4.11. Normalized performance of one single module
(D.f1.sa18286.00) on the fixed rack on December 12, 2012. These variables are PR (red), $Y_I$ (black), $Y_{DC}$ (green), $Y_{AC}$ (blue), $L_c$ (yellow), and $L_s$
(brown).
4.6 Sub-sampling
4.6.1 Solar noon time performance ratio
From the EDA plot of performance metrics (Fig. 4.11), it is clear that PR is correlated with
POA irradiance. PR can reach up to 0.85 on the 22.3° fixed rack and 0.90 on the tracker at
solar noon time when the POA irradiance is high. In order to reduce the volume of data
and reduce temporal fluctuations, the PR data is subset to ±15 minutes around solar noon
time. The sampling interval of the PR is 5 minutes, so there are about 7 data points within this
30-minute window. During the 99 days, there are roughly 700 observations for each module,
which is still statistically sufficient for further analysis.
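The solar noon subset can be expressed as a simple filter on local solar time. In this Python sketch the records are hypothetical 5-minute samples with a constant PR, used only to show how many points fall inside the window.

```python
def solar_noon_subset(records, half_window_min=15.0):
    """Keep (lst_hours, pr) records within +/- half_window_min of 12:00 LST."""
    half_window_h = half_window_min / 60.0
    return [(t, pr) for t, pr in records if abs(t - 12.0) <= half_window_h]


# Hypothetical 5-minute samples from 11:30 to 12:30 LST with a constant PR.
records = [(11.5 + 5 * i / 60.0, 0.85) for i in range(13)]
noon = solar_noon_subset(records)
```

With samples aligned to the 5-minute grid, the window from 11:45 to 12:15 contains seven points, in line with the count given in the text.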
4.6.2 Snowy days
EDA on the solar noon time PR subset was performed by plotting PR versus $Y_I$ for each
module. An example of a PR versus $Y_I$ plot is shown in Fig. 4.12. Three groups of abnormal data points
are marked with yellow circles in the plot.
Figure 4.12. Solar noon time PR of a module (C.f1.sa18328.00) on the
fixed rack versus $Y_I$. The vertical blue line marks POA irradiance at 1200
W/m², and the red horizontal line marks PR at 1.0. Group 1 contains the points that
have a PR greater than one. Group 2 contains the points at irradiance higher
than 1200 W/m². Group 3 contains the points that show zero PR.
Group 1. In theory, the PR can never exceed one. In the literature and standards 22,28,43 ,
PR is normally reported to be 0.8-0.85 on average. The abnormal points in group 1 have
a calculated PR greater than one. By looking at the raw data, including AC power, DC
power, and POA irradiance, at each of these data points, two potential causes were found.
First, these data points appeared when the irradiance changed quickly. As discussed
previously, power data was reported by the Enphase Envoy system and the method of its
data acquisition is unknown. It could be that Enphase does not report instantaneous
power but an averaged value. Using averaged power data and instantaneous POA
measurements for the PR calculation introduces a systematic error. Another cause of the high
PR could be the microinverter working in burst-mode, as discussed earlier.
Group 2. Solar radiation outside the Earth’s atmosphere is 1.36 kW/m², and the global
irradiance on a tracker plane at noon time in Denver, Colorado is less than 1.2 kW/m².
The abnormal data points in Group 2 show a POA irradiance on the fixed rack’s 22.3°
tilted plane higher than 1.2 kW/m². This is a systematic error introduced by converting
GHI to POA. Without direct global POA irradiance monitoring or direct sunlight monitoring, it is not possible to correct the error. However, since the same irradiance conversion method is used for all modules, it will not affect the cross-sectional comparison
of the modules’ performance.
Group 3. The PR of the modules was small or equal to zero even when the irradiance was
not very low, which indicates that the module may have been covered. Moreover, this only appeared on certain days in December and January, and only on some modules,
mainly on the fixed rack. Given the climate, this suggests that it was caused by snow
coverage.
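The three groups can be expressed as simple rules on PR and $Y_I$, as in the Python sketch below. The limits of 1.0 for PR and 1.2 for $Y_I$ follow the text; the thresholds `low_irradiance` and `zero_pr` are hypothetical values chosen for illustration, since the text does not quantify "small" PR or "not very low" irradiance.

```python
def classify_point(pr, y_i, low_irradiance=0.2, zero_pr=0.05):
    """Label one solar-noon (PR, Y_I) point using the three groups in the text."""
    if pr > 1.0:
        return "group 1: PR above one"
    if y_i > 1.2:
        return "group 2: POA above 1.2 kW/m^2"
    # Hypothetical thresholds: near-zero PR while irradiance is not very low.
    if pr <= zero_pr and y_i > low_irradiance:
        return "group 3: near-zero PR (possible snow cover)"
    return "normal"


labels = [classify_point(pr, y_i)
          for pr, y_i in [(1.08, 0.6), (0.80, 1.25), (0.0, 0.7), (0.83, 0.9)]]
```

Labeling the points rather than deleting them keeps the snow-related observations available as their own subgroup for later analysis.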
Snowy days. In order to document the relationship between the low performance of modules and snowy weather, historical climate condition data was collected from a third party web
site. PR time series were plotted for each module, and data points on snowy days
were highlighted in red and blue 46 . Fig. 4.13 shows the PR of six modules of
two different brands (three modules from each brand). Low PR appears only during or
after snow or fog-snow days, which supports the conclusion that the abnormal points in group 3 were most
likely caused by snow coverage. All snow-covered dates were determined by plotting the PR of all 60 modules versus time, and the snowy day data was assembled as a subgroup.
Figure 4.13. Solar noon time PR of six modules, data points when there
was snow or fog-snow were highlighted in red and blue. Three modules
on the top row are from brand A placed on fixed rack. Three modules on
the bottom row are from brand G placed on tracker 4. All three A brand
modules showed low/zero performance during or after some snowy days,
while the other three modules on tracker do not.
4.7 Clustering of Solar Noon Time Performance Ratio Data
As discussed in Section 4.6, the PV modules’ performance ratio data was subset to ±15
minutes around solar noon time. In order to reduce the data volume, the average of each
day’s PR data was taken. After removing the snowy days, there are 75 days of PR data
left; thus 75 data points represent the solar noon time performance of each module.
Since the relationships among the 60 modules’ noon time performance are not intuitive,
an EDA can lead to a better understanding of the data. A pairs plot is commonly the first
step of EDA.
Figure 4.14. A pairs plot of the solar noon time PR of three modules of the
same brand. For each row, all the Y axes are the PR of one module. For each
column, all the X axes are the PR of one module. Each module’s sample
number is shown in the diagonal boxes. The correlation coefficient of
each X, Y pair is calculated and represented by varying shades of green.
Darker green indicates a higher correlation coefficient.
The pairs plot takes the PR of one module as the X coordinate and the
PR of another module as the Y coordinate. If the two modules under comparison performed the same at each observation time, all data points would fall on a diagonal line. Fig. 4.14 is a plot of solar noon time PR of three modules of the same brand.
To better visualize the correlation between the X and Y coordinates, the correlation
coefficient of each pair of modules is represented by a shade of green: the darker
the green, the higher the correlation coefficient between the two PR series. In Fig. 4.14, the performance of module
G.t4.sa18211.00 is over 99% correlated with that of G.t4.sa18210.00. Only the pairs plot of the first ten modules
is shown in Fig. 4.15, as space is limited; however, a pairs plot of all 60 modules was studied. In the pairs plot of all 60 modules, a green and gray pattern helps
visualize that the modules fall into different groups. Quantitatively grouping the modules requires
a Pearson distance matrix, which uses the correlation coefficient to define the distance between different observations 47. Also, since there is no domain knowledge suggesting
the number of clusters, a k-means clustering analysis was used to determine the number of clusters. Fig. 4.16 shows the WCSS as a function of the number of clusters, k, and there is
a clear “elbow point” when k equals 5. An HCA dendrogram of solar noon time performance ratio using the Pearson distance matrix and the average linkage criterion is shown in
Fig. 4.17. Modules are divided into 5 groups using the cutree function. The first group
on the left consists of all the modules on the fixed rack. The second group contains all modules on trackers 4, 6, and 8 except for three modules of brand M. The third group contains all
modules on tracker 14 and the fourth group contains all modules on tracker 12. The last group
on the right contains only the three modules of brand M. The time series of each module are
plotted according to the HCA result (Fig. 4.18). There are several gaps because the data are
not continuous, owing to snow and noncontiguous AC power data. The variability of the
curves within the same group is mainly caused by the noncontiguous nature of the data.
The largest dispersion of data curves appears in the last group. Before mid February,
the data curves of the three modules are highly varied.
Figure 4.15. Pairs plot of solar noon time PR of ten modules. Within each row,
the Y axes show the PR of one module. Within each column, the X axes
show the PR of one module. Each module's sample number is shown in
the diagonal boxes. The correlation coefficient of each X, Y pair is calculated
and represented by color. A darker green background indicates a
higher correlation coefficient. The first three modules showed strong correlations (over 99%) among themselves. They also showed fairly strong
correlations (about 90%) with the next three modules at a different location.
However, the first three modules showed low correlation with the last four modules,
with correlation coefficients lower than 30%.
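The elbow criterion used above can be illustrated with a short sketch. The clustering in this work was done in R; the Python below is only an illustration, and the function name `wcss`, the one-dimensional toy values, and the hand-picked cluster labels are assumptions made for the example (in practice the labels for each k would come from k-means).

```python
def wcss(points, labels):
    """Total within-cluster sum of squares for a given cluster assignment.

    points: list of equal-length numeric tuples (one per module, hypothetical)
    labels: list of cluster ids, one per point
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        for m in members:
            total += sum((m[d] - centroid[d]) ** 2 for d in range(dim))
    return total


# Toy data (made up): two tight groups of "modules".
toy_points = [(0.70,), (0.72,), (0.71,), (0.40,), (0.42,), (0.41,)]

wcss_k1 = wcss(toy_points, [0, 0, 0, 0, 0, 0])  # everything in one cluster
wcss_k2 = wcss(toy_points, [0, 0, 0, 1, 1, 1])  # the two natural groups
wcss_k3 = wcss(toy_points, [0, 2, 2, 1, 1, 1])  # one group split further
```

With these toy values the WCSS drops sharply from k = 1 to k = 2 and only marginally from k = 2 to k = 3; an elbow of this kind is what is read off Fig. 4.16.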
Figure 4.16. Total within-cluster sum of squares (WCSS) as a function of the
number of clusters, k. The elbow point occurs when k equals 5.
Figure 4.17. The hierarchical clustering of 60 modules based on solar
noon time PR time series data. The distance matrix is computed using
the Pearson method. The distance between sets of observations is defined with the average linkage method. From left to right, the first group
includes all modules on the fixed rack; the second includes all modules from
trackers 4 and 6, and brand N on tracker 8; the third includes all modules on
tracker 14; the fourth includes all modules on tracker 12; the last group
consists of three modules from brand M (on tracker 8).
Figure 4.18. Solar noon time PR of 60 modules grouped by hierarchical
clustering result. Color of the curve differentiates samples.
5
Discussion
5.1 Data analytics
This section will focus on the problems found in the process of data cleaning, munging,
and exploratory data analysis (EDA).
5.1.1 Irradiance data crosscheck
In the data subsampling part (Section 4.6), some modules, especially those on the fixed rack, showed low performance during and after snowy days because of snow coverage.
Snow can also cover the irradiance sensors on the SunFarm. Since all irradiance data
used in this study were measured by a pyranometer mounted on top of an electrical cabinet near the fixed rack, it is necessary to evaluate the irradiance data quality.
There were two pyranometers working on the SDLE SunFarm during the observation
time. The GHI data used in this work were collected by a CMP11 pyranometer mounted on
an electrical cabinet. The other one was mounted horizontally and connected to a
Daystar multi-tracer. The Daystar can trace real-time I-V curves of up to 32 modules.
Unlike dataloggers, the Daystar does not collect irradiance measurements every minute;
it records irradiance data only when an I-V curve is being taken. In the first several
months, the Daystar took an I-V curve at 30 minute intervals. After proper data
cleaning and data alignment, each day's irradiance data collected by the two pyranometers were compared in the same figure. In Fig. 5.1, red points represent irradiance data
collected by the Daystar, and the black curve shows irradiance collected by the datalogger.
Because the two instruments have different sensitivities, and their sampling times and rates
are not the same, an efficient way of evaluating the data is visual comparison. In most of
the plots, the red dots lie on the black curve. However, in the second plot of the first row, December 27th, 2012, the red points are far above the curve, which indicates that the irradiance
measured by the Daystar pyranometer is much higher than that measured by the one near the fixed rack.
This is potentially caused by snow coverage. The weather condition data confirm that these
data were collected on a snowy day. A survey of the irradiance cross-check was done
by plotting the two sensors' data together. On the following dates, the reading of the fixed rack pyranometer
is lower than that of the Daystar pyranometer: 2012-12-27, 2012-12-28, 2012-12-30, 2013-01-04, and 2013-01-25. The data from these days were eliminated.
[Figure 5.1 plot: six daily panels of irradiance versus time of day, titled 2012-12-26 through 2012-12-31; panel labels include "fog-snow".]
Figure 5.1. Cross-comparison of global horizontal irradiance (GHI) from
two irradiance sensors.
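The visual cross-check described above could also be screened numerically. The sketch below is one possible approach, not the procedure used in this work: the function name, the dictionary layout, and both thresholds are hypothetical, and the readings are made-up numbers.

```python
def flag_suspect_days(reference, daystar, ratio_threshold=0.5, min_irradiance=50.0):
    """Flag days on which the reference pyranometer reads well below the Daystar one.

    reference, daystar: dicts mapping a date string to a list of time-aligned
        irradiance readings in W/m^2 (hypothetical layout).
    ratio_threshold, min_irradiance: illustrative values, not taken from the thesis.
    """
    suspect = []
    for day in sorted(reference):
        pairs = [
            (ref, other)
            for ref, other in zip(reference[day], daystar.get(day, []))
            if other >= min_irradiance  # skip night and very dim samples
        ]
        if not pairs:
            continue
        ratios = sorted(ref / other for ref, other in pairs)
        # Middle element of the sorted ratios (upper median for even counts).
        median_ratio = ratios[len(ratios) // 2]
        if median_ratio < ratio_threshold:
            suspect.append(day)
    return suspect


# Made-up readings: the sensors agree on the first day and disagree on the second.
toy_reference = {"2012-12-26": [100.0, 200.0, 150.0], "2012-12-27": [10.0, 20.0, 15.0]}
toy_daystar = {"2012-12-26": [105.0, 195.0, 150.0], "2012-12-27": [120.0, 300.0, 200.0]}
flagged_days = flag_suspect_days(toy_reference, toy_daystar)
```

A screen like this would only shortlist days for inspection; the decision to drop a day would still rest on the plots and the weather records.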
5.1.2 Performance ratio filtering
Several methods of data filtering were tried; however, with limited data sources (direct
POA irradiance, Direct Normal Irradiance (DNI), and module temperature were not
available for the case study), restricting the PR data to ±15 min around solar noon time was considered the best filtering method, once malfunctions such as snow covered module surfaces and
snow covered irradiance sensor data were eliminated from the final dataset. The average PR
over the 30 min around solar noon time is plotted in Fig. 5.2. Modules on the fixed rack have
the highest PR, around 0.75. Modules on trackers showed lower PR because the
POA irradiance on the trackers was overestimated by the conversion, and because some microinverters on trackers saturated at noon time during the last two months of operation.
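A minimal sketch of the solar noon filter, assuming the usual instantaneous form of the performance ratio, PR = (P_AC / P_nameplate) / (G_POA / 1000 W/m²), and assuming each sample carries its offset from solar noon in minutes. The tuple layout, function name, and sample values are hypothetical.

```python
G_REF = 1000.0  # W/m^2, reference irradiance at standard test conditions


def noon_performance_ratio(samples, nameplate_w, half_window_min=15):
    """Mean PR of the samples within +/- half_window_min of solar noon.

    samples: list of (minutes_from_solar_noon, ac_power_w, poa_irradiance_w_m2)
        tuples (hypothetical layout).
    Returns None when no usable sample falls in the window.
    """
    ratios = []
    for offset_min, ac_power_w, poa in samples:
        if abs(offset_min) > half_window_min or poa <= 0:
            continue  # outside the noon window, or no usable irradiance
        ratios.append((ac_power_w / nameplate_w) / (poa / G_REF))
    return sum(ratios) / len(ratios) if ratios else None


# Made-up samples for a 230 W module; the first one lies outside the window.
toy_samples = [
    (-20, 100.0, 600.0),
    (-10, 120.75, 700.0),
    (0, 138.0, 800.0),
    (10, 129.375, 750.0),
]
noon_pr = noon_performance_ratio(toy_samples, nameplate_w=230.0)
```

The sketch averages the per-sample ratios; computing the ratio of summed energy to summed irradiation over the window is an equally plausible reading and would give slightly different numbers on real data.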
5.2 Performance at different relative positions
Nominally there are only two positions of the PV modules relative to the sun (fixed rack
tilted at 22.3° and tracker mounted); however, due to the trackers' mechanical failures
during parts of the exposure, there are actually six different positions of PV modules
relative to the sun. This is because the relative positions of modules on different trackers
are not identical in practice.
Figure 5.2. Average performance ratio of 60 PV modules within ±15 min of
solar noon time. The Y axis shows the sample ID of each module. From left to
right are modules on fixed rack 1 (blue), tracker 4 (red), tracker 6 (yellow),
tracker 8 (green), tracker 12 (purple), and tracker 14 (light blue) respectively.
5.2.1 Performance in sunny days
Modules mounted on the dual axis trackers should always track the sun, which increases
the amount of light incident on a module. Furthermore, as the incident light is normal to the module's surface, a larger portion of the sunlight can be absorbed. Intuitively,
modules mounted on trackers should produce more electricity than the ones mounted
on the fixed rack. Fig. 5.3 shows the normalized AC power production of one PV module on the
fixed rack (blue) and one PV module on a normally operating tracker (red) on December
12, 2012. The fixed rack is tilted at 22.3°, which tends to optimize the irradiance gain during the summer months of the year. Thus in December, even on a bright sunny day, the noon
time peak power didn't reach 70% of the nominal power. The total energy produced on
this particular day by a specific module on a tracker (G.t4 sa18211.00) is 1.64 times
that of D.f1 sa18286.00. However, in the summer time, as the sun's elevation angle is higher at noon time, the performance of the modules on the fixed rack increases.
Figure 5.3. Comparison of normalized AC power of G sa18211.00 (red) on
tracker and D sa18286.00 (blue) on tilted rack on December 12th, 2012.
Fig. 5.4 shows the AC power production of the same modules on May 13th, 2013. The
module on the tracker gains more power in the morning and afternoon, but produces almost the
same power as the module on the fixed rack around noon time. As discussed above, one factor is
that in May the noon sun is much closer to normal incidence on the fixed rack. Another possible effect is microinverter saturation. According to the data provided by Enphase,
the M215's maximum output power is 225 W 48. Around noon time, the AC power output of the brand
G module increased to 225 W and the microinverter saturated, which forms the
“flat head” of the curve.
Figure 5.4. Comparison of normalized AC power of G sa18211.00 (red) on
tracker and D sa18286.00 (blue) on tilted rack on May 13th, 2013.
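The geometric part of this seasonal difference can be sketched with Cooper's approximation for the solar declination. The sketch assumes a south-facing rack and considers only the direct beam at solar noon; the function names are illustrative, and day numbers are counted for a non-leap year.

```python
import math

LATITUDE_DEG = 41.5  # latitude of the SDLE SunFarm, as stated in Appendix B
TILT_DEG = 22.3      # tilt of the fixed rack, as stated in the text


def declination_deg(day_of_year):
    """Approximate solar declination (Cooper's formula), in degrees."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def noon_incidence_deg(day_of_year, latitude_deg=LATITUDE_DEG, tilt_deg=TILT_DEG):
    """Angle between the sun and the normal of a south-facing tilted plane at solar noon."""
    return abs(latitude_deg - tilt_deg - declination_deg(day_of_year))


december_12 = noon_incidence_deg(346)  # day 346 of a non-leap year
may_13 = noon_incidence_deg(133)       # day 133 of a non-leap year

# Cosine factor applied to the direct beam on the fixed rack at noon.
december_cosine = math.cos(math.radians(december_12))
```

For these inputs the noon incidence angle on the fixed rack is roughly 42° in mid December and about 1° in mid May. This covers only the direct-beam geometry; diffuse light, temperature, and inverter behavior are not modeled.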
5.2.2 Performance in snowy days
Snow coverage occurred on all of the modules mounted on the fixed rack. When a PV module is fully covered by snow, no light reaches the solar cells and the module's power
production is zero. If part of a solar cell is exposed to sunlight, it will generate electricity. However, because of the bypass diodes, if only one cell is working then current cannot flow. This small amount of electricity will be dissipated as heat, raising the temperature of the PV module and melting the snow on the surface. As a result, snow coverage
should not last long, but it will increase the thermal stresses on the solar cells and strings,
which may lead to reliability issues in the future. In contrast, most of the modules on
the trackers do not have the snow coverage problem. On a normally working tracker, the
tracker frame is almost vertical during dawn and dusk, and thus it mechanically avoids snow
accumulation on the module surface.
5.3 Performance of different brands
PV modules are chosen for PV power plants based on the nominal power from a module's data sheet. Fig. 4.2 shows that brands H and Q didn't reach 95% of their nominal
power, and therefore a power plant could lose more than 5% of the designed power output by
using these two models of PV module. For a nominal 10 MW utility scale power plant,
that is over 500 kW of power “lost” due to the over-estimation of the PV modules' power
output. In comparison, power plants using module type A, J, or S may produce more
power than specified, which is not necessarily good since it may overload the power
grid or downstream electricity storage equipment. Furthermore, a PV module's price is
also based on the nominal power. For example, while brands A and Q share the same
nominal power, the initial selling price of brand A in 2011 was 0.82 $/W, while that of brand Q
was 0.80 $/W, suggesting brand Q is cheaper. However, given the baseline results of the
two brands, brand A's initial performance is 101% of its nominal power, but brand Q's
initial performance is only 92.5%. Dividing the selling price by the measured fraction of nominal power, the effective prices of brands A and Q are 0.812 $/W and
0.865 $/W, respectively. Thus, brand A is actually the more cost effective choice. The deviation
of power output within a brand may also be an important factor for the initial performance
of a PV power plant. Since most power plants are implemented with string inverters,
the power output of a string is determined by the lowest-performing module in the string. The lower the
power output deviation of modules in the same string, the less electricity is dissipated
as heat in the string. Internal thermal stress is also suspected as a factor that causes power
degradation.
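The price comparison above is a one-line calculation; the sketch below reproduces it from the numbers quoted in the text. The function and variable names are illustrative.

```python
def effective_price_per_watt(list_price_per_w, measured_fraction_of_nominal):
    """Price per watt actually delivered, given the measured fraction of nominal power."""
    return list_price_per_w / measured_fraction_of_nominal


# Numbers quoted in the text: selling price ($/W) and baseline performance.
brand_a_price = effective_price_per_watt(0.82, 1.01)   # brand A at 101% of nominal
brand_q_price = effective_price_per_watt(0.80, 0.925)  # brand Q at 92.5% of nominal

# A 5% shortfall on a nominal 10 MW (10,000 kW) plant, in kW.
shortfall_kw = 10_000 * 0.05
```

Rounded to three decimals these give the 0.812 $/W and 0.865 $/W figures above, and the shortfall is the 500 kW quoted for a 10 MW plant.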
5.4 Power time series data clustering
In the hierarchical clustering of power time series data two distance measures were considered, Euclidean distance and dynamic time warping (DTW) 49. Euclidean distance, the
most commonly used measure, is the square root of the sum of squares of attribute
differences. DTW is an algorithm often used in time series analysis for measuring the
similarity of two temporal sequences which may vary in time or speed. However, in our
application DTW was not used because all time series data were rigidly aligned. The
speed of variation of AC power was determined by the incident sunlight and a module's
internal characteristics, so the series should vary at the same pace. Therefore, dynamic time
warping is not applicable and Euclidean distance is more appropriate.
There are only two different positions of the PV modules relative to the sun (fixed
rack and tracker mounted); however, due to the mechanical failures of the trackers during parts of the exposure time, there are actually six different positions of PV modules relative to the sun. This is because the relative positions of modules on different trackers are
not identical in practice. In order to find a proper linkage criterion, hierarchical clustering results were compared using three linkage criteria: complete, single, and average. In
complete-linkage clustering, or farthest neighbor clustering 50, the distance between two clusters is equal to the distance between the two elements that are farthest from each other.
In single-linkage clustering, or nearest neighbor clustering, the distance between two
clusters equals the distance between the nearest pair of elements. Average-linkage clustering, or
UPGMA (unweighted pair group method with arithmetic mean), defines the distance
between two clusters as the average of all distances between pairs of elements. When the dendrograms from all three linkage criteria are cut into six groups, the modules
on the same site are always grouped in one cluster. When these 6 large groups are cut further into
20 smaller clusters, modules from the same brand are more likely to form a cluster
under average linkage than under the other two criteria. Although k-means clustering itself
is an independent clustering method, in this application it was used to determine the
number of groups and confirm the result of the hierarchical clustering. For power data
clustering, k-means clustering is consistent with the hierarchical clustering result when k equals
six, which is also the elbow point. This result confirmed that a module's location (which
site it was mounted on at the SDLE SunFarm) has the strongest influence on its
power production over time. The analytical method of power time series data clustering gives a way of distinguishing PV systems mounted in different configurations. In
this case study, it was already known from field observation that the six sites worked differently. For data shared through Energy CRADLE, which do not necessarily carry a
maintenance log with them, this power time series clustering will be a good tool to start
to classify the power data.
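The three linkage criteria differ only in how the pairwise distances between two clusters are summarized. A minimal sketch with Euclidean distance follows; the function names and the two toy clusters are assumptions for illustration, and the clustering in this work was done in R.

```python
import math


def euclidean(a, b):
    """Square root of the sum of squared differences between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linkage_distances(cluster_a, cluster_b):
    """Distance between two clusters under complete, single and average linkage."""
    pairwise = [euclidean(a, b) for a in cluster_a for b in cluster_b]
    return {
        "complete": max(pairwise),                 # farthest pair of elements
        "single": min(pairwise),                   # nearest pair of elements
        "average": sum(pairwise) / len(pairwise),  # UPGMA: mean over all pairs
    }


# Two made-up clusters of two-point "time series".
toy_cluster_a = [(0.0, 0.0), (0.0, 1.0)]
toy_cluster_b = [(3.0, 0.0), (4.0, 0.0)]
toy_linkage = linkage_distances(toy_cluster_a, toy_cluster_b)
```

Because the average sits between the nearest and farthest pair, average linkage is less sensitive to a single outlying series than either of the other two criteria.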
5.5 Solar noon time performance ratio clustering
Solar noon time PR clustering is based on a subset of the PR time series data. Snow covered
PV modules and snow covered irradiance sensor data were removed from the data. For
the other 85 days, the average of solar noon time PR was taken. In hierarchical clustering,
the Pearson correlation coefficient was used as the distance metric. It is a measure of how
similar the shapes of two time series are; in other words, how similarly two modules' performances vary with time. As with power time series clustering, the average linkage criterion
was used. The clustering results can be interpreted as a group of fixed rack modules, a group of
functioning trackers, two groups of malfunctioning trackers, and a group of malfunctioning modules. Fig. 4.17 shows this clearly: the first cluster on the left contains all the modules
on the fixed rack. Even at solar noon time, the performance of modules on the fixed rack is different
from that of modules on trackers because they are tilted at 22.3°, which is much shallower than
the trackers at solar noon time in the winter. The second cluster from the left consists of all
modules from trackers 4 and 6, and one brand from tracker 8. These three trackers were
tracking correctly most of the time, although trackers 6 and 8 stopped for a short time. Taking the
means of PR at solar noon time minimized the influence of outliers, and made the difference between fixed rack and tracker more distinguishable. It is noteworthy that the
“fixed rack group” is very close to the “normal tracker group”, which indicates that their performances are very similar. Fig. 4.18 shows that the shapes of the curves in group 1 and group
2 are similar, though the amplitudes are different because of the difference in angles.
Tracker 12 was off tracking most of the time, and tracker 14 was not in motion, so
they each formed a group. The last group on the right isolated brand M from the other
modules on the same tracker. The brand M modules were replaced during the experiment because the previous models were not compatible with the Enphase M215 microinverter. During
the first two months of operation errors were reported by Enlighten, so these modules
were replaced on Feb. 15th. The previous clustering result didn't reflect this change, yet clustering with noon time PR data and the Pearson correlation distance distinguished
it. The distance between this cluster and the other modules on the same tracker
(brand N modules) is quite large, which indicates that the performance of brand M is unlikely to be
related to that of brand N. In summary, the solar noon time mean PR clustering using the Pearson correlation
coefficient distance metric neglected small differences among functioning trackers 51, differentiated the fixed rack from the trackers, and identified malfunctioning trackers 12 and 14 and
the malfunctioning module brand M.
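The reason a correlation-based distance groups curves by shape rather than amplitude can be shown in a few lines. The sketch assumes the common convention d = 1 − r for the Pearson distance (the exact convention used in this work is not stated here), and the PR values are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pearson_distance(x, y):
    """Correlation-based distance: 0 for identical shapes, 2 for opposite shapes."""
    return 1.0 - pearson_r(x, y)


# Made-up noon PR series: same shape at a different level, and a mirrored shape.
tracker_like = [0.60, 0.62, 0.58, 0.65, 0.61]
rack_like = [v + 0.12 for v in tracker_like]  # same shape, offset amplitude
mirrored = [1.0 - v for v in tracker_like]    # opposite shape

same_shape_distance = pearson_distance(tracker_like, rack_like)
opposite_shape_distance = pearson_distance(tracker_like, mirrored)
```

Under this convention an offset or rescaled copy of a series is at distance zero, which is consistent with the fixed rack group and the normal tracker group sitting close together despite their different PR levels.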
6
Conclusions
Previous work on time series analysis (TSA) of a photovoltaic (PV) system's real-world
performance focused on determining a precise and accurate degradation rate (Rd) by using highly
filtered data and neglecting seasonality. My research provides a higher fidelity data analytics approach to TSA. With a case study of the first six months of real-world performance of 60 PV modules and climate conditions, a data analytic procedure was developed, which includes the following parts.
Raw data were first validated by characterization of the measurement apparatus. The
impact on AC power data of microinverters working in burst mode and of microinverter
efficiency was evaluated. Irradiance data were converted from global horizontal irradiance
(GHI) to plane of array (POA) irradiance, and the systematic error thereby introduced was discussed.
Secondly, using the redundancy of measurements, snow covered irradiance sensor data
were first detected by visually cross-checking the daily irradiance profiles, and then eliminated from the data set. Thirdly, data alignment was accomplished using a cross-correlation
function in R to minimize the time lag between two time series.
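The alignment step can be sketched as a search for the lag that maximizes the correlation between two series. This is an illustration in Python rather than the R cross-correlation function used in this work; the function name, the sign convention for the lag, and the toy series are assumptions.

```python
def best_lag(reference, other, max_lag):
    """Lag k (in samples) that maximizes the correlation of reference[t] with other[t + k]."""

    def corr(x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        if var_x == 0 or var_y == 0:
            return float("-inf")  # a constant segment carries no shape information
        return cov / (var_x * var_y) ** 0.5

    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = reference[: len(reference) - lag], other[lag:]
        else:
            x, y = reference[-lag:], other[: len(other) + lag]
        n = min(len(x), len(y))
        if n < 2:
            continue
        score = corr(x[:n], y[:n])
        if score > best_score:
            best, best_score = lag, score
    return best


# Made-up series: `delayed` is `signal` shifted later by two samples.
signal = [0, 0, 1, 3, 6, 3, 1, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 3, 6, 3, 1, 0]
estimated_lag = best_lag(signal, delayed, max_lag=3)
```

Shifting the second series back by the estimated lag before merging is what brings the two time stamps into agreement.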
Exploratory data analysis (EDA) on the integrated data indicates that the total energy harvest varies widely among the PV systems. A clustering analysis of the power time series data
found that PV systems located on the same mounting system performed similarly, and
that the behavior of the tracking systems was not identical.
Data were assembled based on IEC 61724, and performance ratio (PR) was used to
evaluate the PV systems' performance. The PR of the 60 modules was subsampled to solar
noon time, and snow covered days were eliminated. The solar noon time performance
ratio clustering neglected small differences between sites, strongly differentiated the fixed rack from the trackers, and identified malfunctioning trackers 12 and 14 and the malfunctioning
module brand M.
This work leads to improvements to the SunFarm metrology platform and suggests that
redundant measurements are necessary. The clustering results provide guidance
for future data modeling.
7
Future research
7.1 Improved SunFarm data quality and redundancy
Since the conversion of GHI to POA introduced systematic error, sufficient irradiance
measurements are critical to improve data quality. Another pyranometer measuring
global irradiance was mounted on tracker 7 in June, 2013. A pyrheliometer, which measures direct irradiance, was also mounted on tracker 7 at the same time. Thus, direct
measurements of POA irradiance and direct normal irradiance (DNI) are now available. By taking
the ratio of DNI to POA irradiance, it is possible to determine the clearness (or cloudiness) of the sky.
Redundant irradiance measurements will also be made to cross-check the sensors' accuracy. On the Energy CRADLE user interface, a sensor cross-check page will enable
real-time on-site monitoring.
7.2 Predictive model
The Global SunFarm Network has already come online, so a sufficient amount
of PV performance data from different climatic conditions will be available for the next
step. In the next phase of study, it would be interesting to build a mixed effect model 52
of a PV module's power output as a function of multiple climate stresses. A mixed effect
model is a statistical model that contains both fixed effects, which represent the observed quantities in terms of explanatory variables treated as non-random, and random effects. Furthermore, by
correlating to indoor test results, we hope to be able to predict a PV module's degradation
under climatic stresses. This kind of data can direct the improvement of PV module qualification testing, which will eventually lead to improved lifetime for PV modules.
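For reference, a linear mixed effect model is commonly written in the following general form. The symbols are the conventional ones and are not taken from this work: y is the vector of observations (here it would be module power output), X and Z are design matrices, β holds the fixed effects (for example climate stresses), u holds the random effects (for example module-to-module or site-to-site variation), and G and R are the covariance matrices of the random effects and the residuals.

```latex
\begin{equation}
  \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon},
  \qquad \mathbf{u} \sim \mathcal{N}(\mathbf{0}, \mathbf{G}),
  \qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{R})
\end{equation}
```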
Appendix
Appendix A
List of 24 manufacturers and nameplate power
No.  Manufacturer   Nameplate power (W)
1    AUO            240
2    Astronergy     235
3    Bosch          225
4    CSI            220
5    Conergy        220
6    ET Solar       235
7    EcoSolargy     230
8    Helios         240
9    Hyundai        230
10   Kyocera        240
11   LG             220
12   MX Solar       230
13   Mage           230
14   Perlight       240
15   REC            230
16   Sanyo          220
17   Schott         230
18   Schuco         240
19   Sharp          235
20   Siliken        220
21   Solar World    230
22   Trina          230
23   UpSolar        240
24   Yingli         230
Appendix B
SunFarm network
1 SDLE SunFarm design & characteristics
1.1 Overview
The SDLE SunFarm was established for outdoor testing of long-lived materials, components, and systems 53,54. It is a highly instrumented outdoor test facility of a kind not
commonly found in academic research. It is located on the CWRU west campus, and is
about one acre in size. There are 16 electrical sites on the SunFarm, including 14 high
precision dual axis trackers and two sites of fixed tilt racking, shown in Fig. B.1. A total
of 148 full-sized crystalline silicon modules, bought on the open market from 24 different
manufacturers in sets of six or eight, were exposed on both trackers and fixed racking.
On the trackers, 8000 PV material samples will be exposed under 1X, 2X, 4X, and 5X suns
illumination with front surface mirror concentrators.
1.2 Samples
Samples being exposed on the SDLE SunFarm are divided into two major groups: PV
modules and PV material sample coupons.
Full-sized crystalline PV modules. In order to better understand power degradation
mechanisms and determine power degradation rates (Rd), 148 full-sized crystalline silicon modules from 24 different manufacturers around the world are being exposed on
the SunFarm to investigate their performance under real-world working conditions. The
majority of the population are polycrystalline silicon modules; only two brands are monocrystalline silicon modules. The 24 manufacturers and their nameplate powers are listed in
Appendix A. The 60 crystalline PV module samples studied in this thesis are part of this
total population. These 60 modules are from 20 different manufacturers, with 3 samples
from each brand.
Figure B.1. The top half shows the blueprint of the SDLE SunFarm and the distribution of the 16 electrical sites. The bottom half shows an operating tracker and
the electrical cabinets behind the trackers. On the tracker frame shown in
the figure, the top half features six PV modules mounted horizontally, and the
bottom half features 48 sample trays mounted in 12 by 4 rows.
Figure B.2. A mechanical drawing of a sample tray, on the left. A 3D drawing of a 5X front surface mirror concentrator, on the right.
PV material samples. In order to better understand PV module degradation mechanisms, we need to know how each component of a PV module degrades over time. PV
material and component samples are made and exposed on the SDLE SunFarm. Backsheet samples, front sheet samples and transparent conducting oxide (TCO) samples are
cut into 1 × 1.5 inch coupons and held in sample trays (Fig. B.2). With the use of front
surface concentrators (Fig. B.2), PV material samples can be exposed to 1X, 2X, 4X, and 5X
sunlight intensities as well as to real-world climate conditions. Material samples will be
taken off periodically for optical characterization measurements.
1.3 PV mounting system
Two different PV mounting systems are being used on the SunFarm. Fourteen dual-axis
trackers, of a type commonly used for high-concentration photovoltaics (HCPV), can
keep the module plane normal to sunlight. Two sites are fixed tilt racks, which are commonly used for roof PV installations or utility power plants. The fixed racks face south
and are tilted at 22.3°.
Fixed rack. Power produced from a PV array is proportional to the direct sunlight it
receives. Typically, fixed PV arrays are tilted to an angle equal to the latitude of the arrays'
location, which points the array toward the noon sun at the equinoxes, the middle of the sun's yearly range of noon elevation angles. Here in
Cleveland, since we usually have more cloudy days in winter, the fixed rack on the SDLE
SunFarm was installed at a shallower angle: the latitude of the SDLE SunFarm is 41.5°, and the
tilt angle of the fixed racking is 22.3°. The 30 meters of fixed tilted rack are divided into
two identical electrical sites. Eighteen PV modules are exposed on each site.
Trackers. Dual-axis solar trackers orient PV modules normal to direct sunlight at all
times. They are often seen in concentrated photovoltaic (CPV) applications, especially
HCPV systems, where tracking enables the optical components of the concentration system. In flat-panel PV applications, trackers can maximize the performance of PV modules by minimizing the incidence angle of sunlight on the module plane. The 14 dual-axis trackers
on the SDLE SunFarm were manufactured by Feina Tracker, Spain. Each tracker consists of three parts: foundation pole, tracker head, and tracker panel, shown in Fig. B.3.
The tracker head is 10 feet off the ground and is driven by two DC motors in the elevation and
azimuth directions. The motion of the motors is controlled by a Tracker Control Unit
(TCU) inside the 4 to 5 inch electrical box behind each tracker.
Tracker panels are 16’ 4” (5 m) wide × 13’ 1” (4 m) long, shown in Fig. B.4, and
can hold up to 12 PV modules in landscape mode. In order to enhance the capability
of testing various modules, components, and materials, ten flexible unistruts were placed
on the tracker panel to fasten modules and sample trays to the tracker.
Figure B.3. The photo was taken when the SunFarm was under construction.
The relative position of the fixed rack and trackers is shown in the image; the fixed
rack is at the front of the SunFarm in order to avoid shading. The relative
positions of the tracker head, tracker foundation, and two electrical cabinets
behind each tracker are shown on the right hand side.
Figure B.4. A mechanical drawing of tracker frame is shown on the left
hand side. Distance between horizontal unistrut is 0.5m in order to
mount sample trays. On the right, there is a drawing of a tracker frame
fully loaded with 12 full size PV modules.
1.4 SunFarm electrical design
Two sites of fixed tilted rack plus 14 dual-axis trackers form the 16 electrical sites of
the SDLE SunFarm. Two electrical cabinets behind each site separate the power devices
from the datalogger system (Fig. B.3). 110 V AC is connected to the power cabinets to power
the tracking system as well as the Ethernet switches. In the data cabinet, a data logging
system consisting of a Campbell CR1000 datalogger, multiplexer, and battery monitors the sensor
readings. The SDLE SunFarm has 122 individual power plants, including 120 individual
PV modules connected to microinverters and 2 strings of 8 PV modules connected to
string inverters. These 122 power plants, which can generate about 32 kW of electricity at
peak, are all tied to the grid through a reversing relay.
1.5 Metrology platform
Power Data. A metrology platform was built for SDLE SunFarm data monitoring, including
power, insolation, and weather monitoring. For power monitoring, either inverters or
I-V curve tracers were used. Two trackers, with eight full-sized modules on each, used
Solectria PVI1800 string inverters. Another 10 trackers as well as the two tilt rack sites used
two brands of micro-inverters, Enphase and Power-One. On the other two trackers a
Daystar multi-tracer is used to take I-V curves of full-sized modules and mini-modules
at one minute intervals. A portable I-V curve tracer was used on clear days to
take I-V curves on demand.
Insolation data. Redundant insolation sensors were placed around the SDLE SunFarm
in order to get accurate irradiance data and align the trackers. Four Kipp & Zonen pyranometers of three different models (CMP6, CMP11, CMP21) were placed on the horizontal, tilt rack, and tracker planes. A Kipp & Zonen pyrheliometer (CHP1) was used to
measure direct illumination. Multiple split-cell reference cells, Li-cor Li-200 pyranometers, and Apogee SP-212 full spectrum irradiance sensors were placed in the tracker plane
to help align the tracker frames' orientations. Another four Apogee SP-212 full spectrum
irradiance sensors and Apogee SU-100 UV sensors were mounted on the sample trays to
measure the concentrated solar irradiance.
Climate data. Two Vaisala WXD520 weather stations were placed on the SunFarm to
record wind speed, wind direction, rainfall, rain intensity, rain duration, and humidity.
An anemometer was connected to the Master Control Unit of trackers to monitor the
wind load on the trackers. A snow cup was used to measure the precipitation. T-type
thermocouples were used for backsheet temperature monitoring.
Data acquisition system. The data acquisition system consists of 17 networked Campbell Scientific CR1000 dataloggers, with each datalogger connected to an AM 16-32
multiplexer, extending the capacity of the datalogger to 32 differential measurement channels. The Campbell dataloggers monitor thermocouple and sensor outputs. Enphase
micro-inverters use an Envoy unit to collect data from each individual micro-inverter. Similarly, Solectria string inverters use the Solenview system to collect data. Minute by minute
data can be downloaded from their web servers.
1.6 SunFarm Network
Cleveland's climate, humid continental, is not typical for PV degradation research. In
order to study PV modules' performance under different climatic conditions, a global
SunFarm network was established among nine PV outdoor test beds across the world.
These test beds include four Ohio SunFarms: SDLE SunFarm, Cleveland, Ohio; Lakeview
1 MW power plant, Cleveland, Ohio; Replex SunFarm, Mt. Vernon, Ohio; and AEP Dolan
test center, Columbus, Ohio (Fig. B.5).
Within the United States, we cooperate with two Q-Lab SunFarms in Arizona and
Florida, which are in mid-latitude desert and humid subtropical climate areas,
respectively. On an even larger scale, we established three SunFarms abroad with international collaborators: Underwriter Lab SunFarms in Taitung and Lujhu, Taiwan, and
a SunFarm at the Indian Institute of Technology Gandhinagar, Ahmedabad (IITGN). These
nine SunFarms span a large range of environmental conditions across the globe.
Similar data collection methods were applied at each SunFarm. In order to better manipulate the Big Data that streams back daily, and manage the sensors that go on each
site, a data acquisition system, Energy CRADLE, was established.
Figure B.5. The upper left corner shows the SDLE SunFarm with the tilted fixed rack in the
front and dual axis trackers at the back. The bottom right shows the Replex SunFarm
at Mount Vernon Replex Plastics, with fixed rack, single axis tracker and
dual axis trackers. The bottom left shows the AEP SunFarm at Dolan Technology Center. Mirror Augmented PV (MAPV) systems are on the bottom half of the
tilt racks, and flat back surface mirrors were mounted tilted towards the
modules. The top half of the tilt rack has a non-augmented PV system.
2 Energy CRADLE SunFarm informatics
The purpose of the Energy Common Research Analytics and Data Lifecycle Environment
(Energy CRADLE) is to create for engineering, and in particular lifetime science, the tools
and protocols necessary to transform Big Data into information, which informs scientific
knowledge to guide further analysis [30, 38, 39, 40]. Energy CRADLE is tightly focused on handling and sharing data among the SunFarm network researchers.
Raw data collected from the SunFarms goes through pre-processing and semantic annotation and is stored in a NoSQL Hadoop system. With domain knowledge, Energy
CRADLE can manage the organization and orchestration of the data, making data queries more efficient. The Energy CRADLE data integration environment has two
features, shown in Fig. B.6. First, it can push all the raw data collected from the SunFarms
onto a Hadoop Distributed File System (HDFS) and further map it to HBase, which is a
distributed database. Second, through Thrift and REST servers, users can use a visual
front end to interact with the data stored in HBase.
Figure B.6. Architecture of the NoSQL Hadoop system.
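The table layout used in HBase is not described here, so the following is only a sketch of one plausible design. HBase keeps rows sorted lexicographically by row key, so a composite key built from site, system ID, and timestamp would let a query by location and system become a contiguous range scan. The key format, the site and system names, and the helper functions are all hypothetical, and the scan is simulated on a sorted Python list rather than through an HBase client.

```python
from bisect import bisect_left
from datetime import datetime


def row_key(site: str, system_id: str, ts: datetime) -> str:
    """Build a hypothetical composite key: site, system ID, then timestamp."""
    return f"{site}|{system_id}|{ts.strftime('%Y%m%d%H%M%S')}"


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix, mimicking a sorted-key range scan."""
    start = bisect_left(sorted_keys, prefix)
    end = bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[start:end]


# Hypothetical minute-by-minute records from two sites (names are illustrative).
keys = sorted([
    row_key("SDLE", "sys01", datetime(2013, 6, 1, 12, 0)),
    row_key("SDLE", "sys01", datetime(2013, 6, 1, 12, 1)),
    row_key("SDLE", "sys02", datetime(2013, 6, 1, 12, 0)),
    row_key("Replex", "sys01", datetime(2013, 6, 1, 12, 0)),
])

# All rows for one system at one site form a contiguous block of sorted keys.
sdle_sys01 = prefix_scan(keys, "SDLE|sys01|")
print(sdle_sys01)
```

Placing the timestamp last keeps each system's time series contiguous, which suits queries by location and system ID; a different key order would favor queries across all systems at a given time.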
The front end of Energy CRADLE (Fig. B.7) consists of four web pages, named according to their functions: Data Inquiry, Equipment Registration, Maintenance, and Metrology Check. On the Data Inquiry page, data can be
queried by location, system ID, local time, or local solar time. Invalid data points and NAs
are filtered out. The Equipment Registration page was built for sensor and
sample management. Location, serial number, and calibration coefficients can be registered remotely from any one of the SunFarms. Because the top of the tracker is more
than 30 feet above the ground when it is operating, maintenance (e.g., changing samples)
has to be done when the tracker is in birdbath mode. Whenever maintenance is needed, the
operator can go to the Maintenance page to specify the location and duration. Data collected during maintenance are flagged in the database to warn that the tracker was not
in normal operation mode. The redundant insolation and weather sensors are cross-checked
to ensure that they are working correctly. The Metrology Check page can plot the
same variables collected by multiple sensors side by side, making the cross-checking
easy.
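The filtering, maintenance flagging, and sensor cross-checking described above can be illustrated with a minimal sketch. It is not the Energy CRADLE implementation: the record layout, the function names, the sample values, and the 5% agreement tolerance are all assumptions made for illustration.

```python
import math
from datetime import datetime


def is_valid(value):
    """A reading is valid if it is present and not NaN."""
    return value is not None and not math.isnan(value)


def in_maintenance(ts, windows):
    """True if timestamp ts falls inside any (start, end) maintenance window."""
    return any(start <= ts <= end for start, end in windows)


def sensors_agree(a, b, rel_tol=0.05):
    """Cross-check two redundant sensors within an assumed 5% relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)


# Hypothetical maintenance window registered by an operator.
windows = [(datetime(2013, 6, 1, 10, 0), datetime(2013, 6, 1, 11, 0))]

# Hypothetical records: (timestamp, irradiance sensor A, irradiance sensor B), W/m^2.
records = [
    (datetime(2013, 6, 1, 9, 59), 800.0, 810.0),
    (datetime(2013, 6, 1, 10, 30), 805.0, 400.0),
    (datetime(2013, 6, 1, 11, 30), float("nan"), 820.0),
    (datetime(2013, 6, 1, 12, 0), 900.0, 700.0),
]

report = []
for ts, a, b in records:
    if not (is_valid(a) and is_valid(b)):
        continue  # invalid points and NAs are filtered out
    report.append({
        "time": ts,
        "maintenance": in_maintenance(ts, windows),  # flag, do not delete
        "sensors_agree": sensors_agree(a, b),
    })

for row in report:
    print(row)
```

In this sketch a disagreement during a flagged maintenance window (the 10:30 record) can be explained by the tracker being out of normal operation, while a disagreement outside any window (the 12:00 record) points to a sensor that may need attention.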
Figure B.7. Architecture of Energy CRADLE’s user front end.
Bibliography
Complete References
[1] Kurt Hornik. The R FAQ, 2013.
[2] Gaëtan Masson, Marie Latour, Manoël Rekinger, Ioannis-Thomas Theologitis, and Myrto Papoutsi.
Global market outlook for photovoltaics. European Photovoltaic Industry Association annual report, 2013.
[3] Sandra Enkhardt. Germany sets new PV installation record in 2012, January
2013.
[4] Becky Beetz. China 2012: 5 GW of PV installations predicted, October 2012.
[5] Stephen Lacey. A solar system is installed in the US every 4 minutes, August 2013.
[6] J. Hemminger, G. Crabtree, and A. Malozemoff. Science for energy technology:
Strengthening the link between basic research and industry. A report from the Basic
Energy Sciences Advisory Committee, US Department of Energy, 2010.
[7] John Hemminger. From quanta to the continuum: Opportunities for Mesoscale Science. Technical report, A Report from the Basic Energy Sciences Advisory Committee, 2012.
[8] Andrew L Rosenthal and Cary G Lane. Field test results for the 6 MW Carrizo solar
photovoltaic power plant. Solar Cells, 30(1):563–571, 1991.
[9] John H Wohlgemuth, Daniel W Cunningham, Paul Monus, Jay Miller, and Andy
Nguyen. Long term reliability of photovoltaic modules. In Photovoltaic Energy Conversion, Conference Record of the 2006 IEEE 4th World Conference on, volume 2,
pages 2050–2053. IEEE, 2006.
[10] Dirk C Jordan and Sarah R Kurtz. Photovoltaic degradation rate-an analytical review. Progress in Photovoltaics: Research and Applications, 21(1):12–29, 2013.
[11] M.P. Murray, L.S. Bruckman, and R.H. French. Durability of acrylic: Stress and response characterization of materials for photovoltaics. In Energytech, 2012 IEEE,
pages 1 –6, May 2012.
[12] Myles P. Murray, Laura S. Bruckman, and Roger H. French. Photodegradation in a
stress and response framework: Poly(methyl methacrylate) for solar mirrors and
lens. Journal of Photonics for Energy, 2(1):022004–022004, 2012.
[13] D. C. Jordan and S. R. Kurtz. The dark horse of evaluating long-term field
performance-data filtering. In PV Module Reliability Workshop, February 26–27,
2013, Golden, Colorado, 2012.
[14] Ryan M Smith and National Renewable Energy Laboratory (U.S.). Outdoor PV Module Degradation of Current-Voltage Parameters Preprint. Number 5200-53713 in
NREL/CP. National Renewable Energy Laboratory, Golden, CO, 2012.
[15] Radhika Lad, John Wohlgemuth, and G TamizhMani. Outdoor energy ratings and
spectral effects of photovoltaic modules. In Photovoltaic Specialists Conference
(PVSC), 2010 35th IEEE, pages 002827–002832. IEEE, 2010.
[16] D Jordan and S Kurtz. Photovoltaic degradation risk. In World Renewable Energy
Forum, Colorado, 2012.
[17] A. Kimber, T. Dierauf, L. Mitchell, C. Whitaker, T. Townsend, J. NewMiller, D. King,
J. Granata, K. Emery, and C. Osterwald. Improved test method to verify the power
rating of a photovoltaic (PV) project. In Photovoltaic Specialists Conference (PVSC),
2009 34th IEEE, pages 000316–000321, 2009.
[18] IEC 60891 ed2.0 - Photovoltaic devices - Procedures for temperature and irradiance
corrections to measured I-V characteristics. IEC Webstore publication abstract, preview, scope.
[19] Sanford Weisberg. Applied linear regression, volume 528. John Wiley & Sons, 2005.
[20] John Fox and Sanford Weisberg. An R companion to applied regression. Sage, 2011.
[21] D. C. Jordan and S. R. Kurtz. Data filtering impact on PV degradation rates and uncertainty (poster). In PV Module Reliability Workshop, 28 February - 2 March 2012,
Golden, Colorado, 2012.
[22] Nils H Reich, Alexander Goebel, Daniela Dirnberger, and Klaus Kiefer. System performance analysis and estimation of degradation rates based on 500 years of monitoring data. In Photovoltaic Specialists Conference (PVSC), 2012 38th IEEE, pages
001551–001555. IEEE, 2012.
[23] Klaus Kiefer and Daniela Dirnberger. A degradation analysis of PV power plants. In
25th EUPVSEC, 2012, Valencia, Spain, pages 005032–005037. EUPVSEC, 2010.
[24] Matthew J. Reno and Joshua Stein. Using cloud classification to model solar variability. Technical report, Sandia National Laboratories, 2013.
[25] Zach Campeau, Charlie Hasselbrink, Mike Anderson, Zoe Defreitas, Mark Mikofski, and Yu-Chen Shen. Validation of the PVLife model using 3 million module-years of live
site data. In 39th IEEE Photovoltaic Specialists Conference, 2013.
[26] J. Ye, T. Reindl, and J. Luther. Seasonal variation of PV module performance in tropical regions. In Conference Record of the IEEE Photovoltaic Specialists Conference,
pages 2406–2410, 2012.
[27] F.A. Mejia and J. Kleissl. Soiling losses for solar photovoltaic systems in California.
Solar Energy, 95:357–363, 2013.
[28] IEC 61724 ed1.0 - Photovoltaic system performance monitoring - Guidelines for
measurement, data exchange and analysis. IEC Webstore publication abstract,
preview, scope.
[29] Werner Horn, Silvia Miksch, Gerhilde Egghart, Christian Popow, and Franz Paky.
Effective data validation of high-frequency data: time-point-, time-interval-, and
trend-based methods. Computers in biology and medicine, 27(5):389–409, 1997.
[30] G.Q. Zhang, T. Siegler, P. Saxman, N. Sandberg, R. Mueller, N. Johnson, D. Hunscher,
and S. Arabandi. Visage: A query interface for clinical research. In Proceedings of
the 2010 AMIA Clinical Research Informatics Summit; San Francisco. March 12–13;
2010, pages 76–80, March 2010.
[31] German Puebla, Francisco Bueno, and Manuel Hermenegildo. A generic preprocessor for program validation and debugging. In Analysis and Visualization Tools
for Constraint Programming, pages 63–107. Springer, 2000.
[32] John W Tukey. Exploratory data analysis. Reading, Ma, 231, 1977.
[33] Matthew B Miles and A Michael Huberman. Qualitative data analysis: An expanded
sourcebook. Sage, 1994.
[34] Michael R Anderberg. Cluster analysis for applications. Technical report, DTIC
Document, 1973.
[35] Rui Xu and Don Wunsch. Clustering, volume 10. Wiley, 2008.
[36] Wikipedia. Hierarchical clustering, June 2013.
[37] James MacQueen et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical
statistics and probability, volume 1, page 14. California, USA, 1967.
[38] G.Q. Zhang, S. Arabandi, and S. Redline. Physio-MIMI lessons learned. Technical
report, National Center for Research Resources (NCRR), 2011.
[39] Physio-MIMI homepage, 2012.
[40] R. Mueller, S. Sahoo, X. Dong, S. Redline, S. Arabandi, L. Luo, and G.Q. Zhang. Mapping multi-institution data sources to domain ontology for data federation: The
PhysioMIMI approach. In AMIA Clinical Research Informatics Summit, March 2011.
[41] Enphase. “Burst-mode” makes Enphase micro-inverter systems the smarter
choice. Enphase Whitepaper Series, 2010.
[42] Kiri Wagstaff, Claire Cardie, Seth Rogers, and Stefan Schrödl. Constrained k-means
clustering with background knowledge. In ICML, volume 1, pages 577–584, 2001.
[43] Heinrich Häberlin. Normalized representation of energy and power of PV systems.
Photovoltaics: System Design and Practice, pages 487–506.
[44] Richard Perez, Pierre Ineichen, Robert Seals, Joseph Michalsky, and Ronald Stewart.
Modeling daylight availability and irradiance components from direct and global
irradiance. Solar Energy, 44(5):271–289, 1990.
[45] PVEducation.org. Solar time, June 2009.
[46] PVEducation.org. Weather data, June 2013.
[47] Kevin R. Coombes. oompaBase: Class unions and matrix operations for OOMPA,
2013. R package version 3.0.1.
[48] David Briggs, Dave Williams, Preston Steele, and Tefford Reed. Bigger is better: Sizing
solar modules for microinverters. Enphase Whitepaper Series, 2010.
[49] Meinard Müller. Dynamic time warping. Information Retrieval for Music and Motion, pages 69–84, 2007.
[50] Charles J Krebs et al. Ecological methodology, volume 620. Benjamin/Cummings
Menlo Park, California, 1999.
[51] Karl Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58(347-352):240–242, 1895.
[52] Alex J Sutton, Keith R Abrams, David R Jones, Trevor A Sheldon, and
Fujian Song. Methods for meta-analysis in medical research. J. Wiley, 2000.
[53] Yang Hu, Mohammad A Hossain, Tarun Jain, Yashwanth R Gunapati, Lauren Elkin,
GQ Zhang, and Roger H French. Global SunFarm data acquisition network, Energy
CRADLE, and time series analysis. In Energytech, 2013 IEEE, pages 1–5. IEEE, 2013.
[54] Yang Hu, Dave Hollingshead, Mohammad A Hossain, Mark Schuetz, and Roger
French. Comparison of multi-crystalline silicon PV modules’ performance under
augmented solar irradiation. In MRS Proceedings, volume 1493, pages mrsf12–1493.
Cambridge Univ Press, 2013.