Detailed Explanation - IEEE Standards Working Group Areas

Descriptions of Candidate Major Event Day Classification Methods
Collected by: C. A. Warren
Contributors: R. D. Christie, C. Williams, C.A. Warren
February 8, 2002
After the winter power meeting, it was clear that there was profound confusion about the
possible ways to segment days using the proposed methods: three sigma and bootstrap.
This paper attempts to clear up the confusion. Three methods were applied to the data as
described below.
Method A
Many members took the natural logarithm (ln) of the SAIDI/Day data, then found the
mean and standard deviation of the log data. The next step was to compute mean + 3
standard deviations and then to find e^(mean + 3 standard deviations) to obtain the cutoff value for
“normal” days.
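As a minimal sketch of Method A (the SAIDI/Day values below are hypothetical, not actual utility data), the calculation is:

```python
import math

# Hypothetical SAIDI/Day values (minutes), for illustration only
saidi_per_day = [0.5, 1.2, 0.8, 2.5, 0.9, 1.6, 0.7, 3.1, 1.1, 0.6]

logs = [math.log(x) for x in saidi_per_day]       # ln of each value
mean_ln = sum(logs) / len(logs)                   # mean of the log data
var_ln = sum((v - mean_ln) ** 2 for v in logs) / (len(logs) - 1)
std_ln = math.sqrt(var_ln)                        # sample std dev of the log data

# Cutoff for "normal" days: e^(mean + 3 standard deviations)
threshold = math.exp(mean_ln + 3 * std_ln)
normal_days = [x for x in saidi_per_day if x <= threshold]
```

For this small sample the threshold lies well above the largest observed value, so no day is flagged.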
Method B
Boot Strap – see previous documents
Method C
For the “3 sigma” method described by Charlie Williams, Crystal Ball software was used
to perform the calculations.
A fit of the SAIDI/Day data f(x) is made. Crystal Ball identifies the best fitting
probability distribution; this typically turns out to be the log normal distribution. Crystal
Ball calculates a mean (m) and standard deviation (s) for the distribution. We tried to
duplicate the Crystal Ball results for m and s using Excel, but had no luck. Reading the
Crystal Ball manual, it appears that the program takes the natural logarithm of the
data, computes the log mean and log standard deviation, and then finds the geometric mean
(e^m) and geometric standard deviation (e^s). Once the mean and standard deviation are
found, mean + 3 standard deviations is calculated as the threshold for determining
abnormal days.
To clarify these processes, we first need to agree on some common terminology about the
log normal distribution. A short tutorial on the log normal distribution follows:
Log Normal Distribution
The log-normal distribution is the probability distribution in which the natural log of the
sample values has a normal distribution. The probability density function (pdf) is given
by

    f(x) = (1 / (x β √(2π))) · exp( −(ln x − α)² / (2β²) )
where α is the mean of ln(x) and β is the standard deviation of ln(x). These are related to
the mean and standard deviation of random variable x (μ and σ respectively) as follows:


    μ = exp( α + β²/2 )

    σ = √( exp(2α + β²) · (exp(β²) − 1) )

    α = ln(μ) − (1/2) · ln( σ²/μ² + 1 )

    β = √( ln( σ²/μ² + 1 ) )
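These relations can be checked numerically. The sketch below picks arbitrary values of α and β, converts forward to μ and σ, and then inverts back; the round trip should recover the originals:

```python
import math

alpha, beta = 0.2, 0.7  # arbitrary log-mean and log-std, for illustration only

# Forward: (alpha, beta) -> (mu, sigma)
mu = math.exp(alpha + beta ** 2 / 2)
sigma = math.sqrt(math.exp(2 * alpha + beta ** 2) * (math.exp(beta ** 2) - 1))

# Inverse: (mu, sigma) -> (alpha, beta) should recover the starting values
alpha_back = math.log(mu) - 0.5 * math.log(sigma ** 2 / mu ** 2 + 1)
beta_back = math.sqrt(math.log(sigma ** 2 / mu ** 2 + 1))
```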

And note that

    P(x ≤ X) = F( (ln X − α) / β )

where P is the cumulative probability function (CDF) of the log normal distribution,
which is

    P(x ≤ X) = ∫₀^X (1 / (x β √(2π))) · exp( −(ln x − α)² / (2β²) ) dx

and F is the cumulative probability function (CDF) for the standard normal probability
distribution.
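This identity can be spot-checked numerically. In the sketch below (parameter values are arbitrary), the log-normal CDF is approximated by midpoint-rule integration of the pdf and compared against the standard normal CDF written with `math.erf`:

```python
import math

alpha, beta, X = 0.0, 1.0, 2.0  # arbitrary parameters and evaluation point

def lognorm_pdf(x):
    """Log-normal pdf with log-mean alpha and log-std beta."""
    return (math.exp(-(math.log(x) - alpha) ** 2 / (2 * beta ** 2))
            / (x * beta * math.sqrt(2 * math.pi)))

# Crude midpoint-rule integration of the pdf from 0 to X
n = 20000
h = X / n
integral = sum(lognorm_pdf((i + 0.5) * h) for i in range(n)) * h

# Standard normal CDF F(z), written via the error function
z = (math.log(X) - alpha) / beta
F = 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

The two values agree to within the integration error.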
Discussion
From this, Method A above involves finding the values of the log normal parameters α
and β, and computing a threshold for ln(x) of α + 3β. Since α and β are the mean and
standard deviation of ln(x), which is normally distributed if x is log normal, this should in
theory result in the identification (on average) of a fixed percentage of days as Major
Event Days, specifically, 0.00135 * 365 ≈ 0.49 MEDs, or about half a day a year, on
average.
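The expected count follows from the standard normal tail probability beyond three standard deviations, which can be reproduced with only the standard library:

```python
import math

# P(Z > 3) for a standard normal variable, via the error function
tail = 0.5 * (1 - math.erf(3 / math.sqrt(2)))  # about 0.00135

expected_meds_per_year = tail * 365  # about half a day per year
```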
Consistently larger numbers of MEDs would tell us the tail of the distribution does not
conform to the log-normal distribution. This in turn may be the result of crew saturation,
or of a separate weather process, i.e. hurricanes on top of normal weather.
Method C seems to match Method A as far as finding the values of α and β, but there
appears to be no justification for what happens after that. Clearly there will be a different
threshold value R*, though, since:

    Method A threshold: R*_A = e^(α + 3β)

    Method C threshold: R*_C = e^α + 3e^β
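For arbitrary illustrative values of α and β (not taken from any real data set), the two thresholds come out quite different:

```python
import math

alpha, beta = 0.1, 0.8  # hypothetical log-mean and log-std

r_a = math.exp(alpha + 3 * beta)            # Method A: e^(alpha + 3*beta)
r_c = math.exp(alpha) + 3 * math.exp(beta)  # Method C: e^alpha + 3*e^beta
```

With these values, Method A's threshold is the larger of the two.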
Method A theoretically identifies a fixed number of days per year, on average, as MEDs.
Since there is little justification for calculating the threshold shown above for Method C,
we should consider using Method A.
Note that method A answers Rich’s critique of the “Three Sigma” method as inequitable.
The critique was based on the notion of using the mean and standard deviation of the
daily reliability to set the threshold, rather than mean and standard deviation of the
natural log of the daily reliability. Method A is in fact mathematically identical to the
curve fitting method Rich originally proposed, except that the number of MEDs/year is
hidden in the choice of 3 as the multiplier of β. As proposed above, it seems simpler for
engineers to understand and implement than the way Rich originally presented the curve
fitting method.
From this point forward let’s call Method A the "Three Beta" method, since there is a
good chance that people hearing "Three Sigma" will just use the standard deviation of the
samples, rather than the standard deviation of the natural logs of the samples.
Note also that there is an issue in how to calculate α and β if any days have SAIDI/day of
zero. Rich explained how to deal with this in the curve fitting method, but it would
increase the complexity of method A by an order of magnitude. It's probably best just to
drop them from the data, if there are few of them. This would make the average higher,
but probably not much.
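Dropping the zero days amounts to a simple filter before taking logs (the values below are hypothetical), since ln(0) is undefined:

```python
import math

saidi = [1.2, 0.0, 0.8, 2.5, 0.0, 1.1]  # hypothetical SAIDI/Day values

nonzero = [x for x in saidi if x > 0]   # drop zero days; ln(0) is undefined
logs = [math.log(x) for x in nonzero]
alpha = sum(logs) / len(logs)           # log-mean over the remaining days
```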
To remove or not to remove 6 beta days…
The other remaining question is whether to segment the data twice – first to remove 6
beta days and then to remove 3 beta days.
Let's call this Method A' (A prime). The method involves finding α and β as above, then
setting a first threshold

    R*_{A'1} = e^(α + 6β)

then removing all days with reliability greater than this first threshold, computing
new values α' and β' on the remaining data, and finding a second threshold

    R*_{A'2} = e^(α' + 3β')
and removing all days greater than this threshold.
The clear effect is to lower the final threshold, that is,

    R*_{A'2} < R*_A
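The two-pass segmentation can be sketched as follows (the data are hypothetical, with one extreme day included for illustration):

```python
import math

def log_params(data):
    """Return (alpha, beta): mean and sample std dev of ln(data)."""
    logs = [math.log(x) for x in data]
    a = sum(logs) / len(logs)
    b = math.sqrt(sum((v - a) ** 2 for v in logs) / (len(logs) - 1))
    return a, b

# Hypothetical SAIDI/Day values; 40.0 represents an extreme day
saidi = [0.5, 1.2, 0.8, 2.5, 0.9, 1.6, 0.7, 40.0, 1.1, 0.6]

# First pass: remove days above e^(alpha + 6*beta)
alpha, beta = log_params(saidi)
t1 = math.exp(alpha + 6 * beta)
remaining = [x for x in saidi if x <= t1]

# Second pass: recompute alpha', beta' on the remaining data
# and apply the final threshold e^(alpha' + 3*beta')
alpha_p, beta_p = log_params(remaining)
t2 = math.exp(alpha_p + 3 * beta_p)
meds = [x for x in saidi if x > t2]
```

As the inequality above suggests, the second threshold always comes out below the first.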
A statistics professor could probably tell you exactly how many more MEDs per year this
creates, on average, assuming daily reliability is log-normally distributed, but for now
that's beyond the scope of the discussion. Note that chopping off the tail undoubtedly
makes the remaining data fit the log-normal distribution less well, so there is less
theoretical basis for using the log normal parameters α and β to compute the second
threshold.
If there were a 1 in 25 year chance of something happening, then we should not want to
ignore it. We have set up a concept that says MEDs are identified and analyzed separately
because utility response is qualitatively different. Does utility response get analyzed in
the 1 in 25 year event? You bet it does, just as the guys in Montreal (or Missouri...) are
getting analyzed. So I think the outlier thing represents wishful thinking by the utility.
In terms of impact on average reliability, what's at stake is the extra MEDs generated by
the reduced values of α and β after the outliers are removed. I hate watching
utilities scramble to improve their reliability statistics. If I were the PUC I would demand
that they report ALL unreliability, and then allow them to report and analyze major event
performance and some measure of routine reliability performance. What we should be
thinking of is a valid measure of routine reliability performance.
Process
Use the attached spreadsheet and do the following:
1. Use five years of data (or as many as you have, up to five).
2. Calculate SAIDI/Day and order from highest to lowest.
3. Calculate the natural log (LN function) of each value.
4. Calculate the mean (α) (AVERAGE function) and standard deviation (β)
   (STDEV function) of the natural log values.
5. Find the threshold e^(α + 3β) (EXP function).
6. Segment the days above the threshold into the abnormal group.
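The spreadsheet steps map directly onto a short script. This is a sketch with hypothetical values, assuming SAIDI/Day has already been computed from the five years of data:

```python
import math

# Step 2: hypothetical SAIDI/Day values, ordered highest to lowest
saidi = sorted([0.5, 1.2, 0.8, 2.5, 0.9, 1.6, 0.7, 3.1], reverse=True)

# Step 3: natural log of each value (LN)
logs = [math.log(x) for x in saidi]

# Step 4: mean (alpha, AVERAGE) and sample std dev (beta, STDEV) of the logs
alpha = sum(logs) / len(logs)
beta = math.sqrt(sum((v - alpha) ** 2 for v in logs) / (len(logs) - 1))

# Step 5: threshold e^(alpha + 3*beta) (EXP)
threshold = math.exp(alpha + 3 * beta)

# Step 6: segment days above the threshold into the abnormal group
abnormal = [x for x in saidi if x > threshold]
normal = [x for x in saidi if x <= threshold]
```

Excel's STDEV is the sample (n − 1) standard deviation, which is what the `beta` line computes.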