UNIVERSITY OF CALGARY

A Comparison of Mean Estimators

by

Maryam Moghadasi

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF CHEMICAL AND PETROLEUM ENGINEERING

CALGARY, ALBERTA

January, 2014

© Maryam Moghadasi 2014

Abstract

The mean values of reservoir parameters such as permeability, porosity, and
hydrocarbon reserves are widely used to evaluate a formation for potential development
and perform reservoir simulations. Among different mean estimators, the arithmetic
average and Swanson’s rule are commonly used within the petroleum industry.
In the petroleum literature, Swanson’s rule has been promoted as a superior
alternative to the arithmetic average. A few researchers have evaluated its performance
for the case of a log-normal distribution with a limited range of variability, but they have
overlooked its performance for other types of distribution that may describe
reservoir parameters. Prior studies concentrated only on evaluating the
bias of Swanson's rule, whereas an optimum mean estimator should simultaneously have
zero bias, small uncertainty, consistency, and high efficiency. This research study,
therefore, evaluates the performance of mean estimators based on these other
properties in addition to bias.
This research study also compares the performance of Swanson's rule with some
well-known mean estimators, namely the arithmetic average, the maximum likelihood estimator, and
Pearson-Tukey's rule, for the log-normal, power-normal, and bimodal distributions.
The mean estimators’ properties are analytically derived and numerically validated via
Monte Carlo simulation. We find that none of these mean estimators simultaneously
satisfies all conditions of an optimum mean estimator for all ranges of variability and
sample size. In other words, each mean estimator can be an optimum mean estimator
depending on sample size, variability, and distribution type.
Being unbiased is a desirable property, but it is not necessarily the most important
property because a mean estimator can be de-biased. We propose a de-biased version of
Swanson's rule and find that it is an appropriate alternative for approximating the mean value,
particularly for a data set with large standard deviation and small sample size. Moreover,
we evaluate the performance of the mean estimators when data follow a first-order
auto-regressive model to show that auto-correlation causes the mean estimators to
behave differently than in the uncorrelated case.
Acknowledgments

I would like to express my sincere gratitude to my advisor, Dr. Jerry Jensen, for his
help, guidance, and encouragement throughout my Ph.D. study.
I wish to thank the members of my advisory committee, Dr. Jalal Abedi and Dr.
Hassan Hassanzadeh, as well as my examining committee, Dr. Laurence Robert Bentley
and Dr. Clayton Deutsch, for their time and comments.
I would also like to thank all my friends and fellow graduate students, in particular:
Dr. Danial Kaviani, Mohammad Soroush, and Mehdi Majdi Yazdi for their continuous
friendship and support during my Ph.D. study.
I gratefully acknowledge the financial support from the Natural Sciences and Engineering
Research Council of Canada (NSERC).
Finally, and most importantly, I would like to extend my gratitude to my husband
(Mehdi Bahonar), my parents (Iraj Moghadasi and Pari Anvari), my brother (Alireza
Moghadasi), and my sister (Roya Moghadasi) for the endless love, support, and
encouragement they have given me throughout my Ph.D. study, without which this work
would not have been accomplished.
Dedicated to my Dear
Parents & Husband
For their Love & Support
&
God for the Endless Opportunities.
Table of Contents
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Nomenclature
Chapter 1 : Introduction
1.1 Thesis Organization
Chapter 2 : Literature Review
2.1 Notation
2.2 Definitions
2.2.1 Bias
2.2.2 Uncertainty
2.2.3 Consistency
2.2.4 Efficiency
2.3 Detailed Analysis of Literature
2.3.1 Arithmetic Average
2.3.2 Discretization Methods
2.3.3 Maximum Likelihood Estimator
2.4 Distributions Types
2.5 Gaps in the Existing Body of Knowledge
Chapter 3 : Performance Evaluation for the Case of the Log-Normal Distribution
3.1 Analytical Expressions of Mean Estimators' Properties
3.2 Validation of Analytical Expressions using Monte Carlo Simulation
3.3 Analysis of the Analytical Expressions of the Mean Estimators' Properties
3.4 Improving Swanson's Rule
3.4.1 Adjusting Swanson's Rule by a Coefficient
3.4.2 Moment Matching with Fixed Values
3.5 Concluding Remarks
Chapter 4 : Performance Evaluation for the Case of Bimodal Distribution
4.1 Analytical Expressions of Mean Estimators' Properties
4.2 Validation of Analytical Expressions using Monte Carlo Simulation
4.3 Analyses of the Analytical Expressions of Mean Estimators' Properties
4.4 Concluding Remarks
Chapter 5 : Performance Evaluation for the Case of Power-Normal Distribution
5.1 Analytical Expressions of Mean Estimators' Properties
5.2 Validation of Analytical Expressions using Monte Carlo Simulation
5.3 Analyses of Mean Estimators' Properties
5.4 Improving Swanson's Rule
5.5 Concluding Remarks
Chapter 6 : Performance Evaluation for the Case of Auto-Correlated Random Variables
6.1 Assumptions
6.2 Analytical Expressions of Mean Estimators' Properties
6.3 Analytical Expression Validations Using Monte Carlo Simulation
6.4 Analysis of the Analytical Expressions of the Mean Estimators' Properties
6.5 Auto-Correlated Random Variables with Bimodal Distribution
6.6 Concluding Remarks
Chapter 7 : Comparison of Mean Estimators for Independent Random Variables
Chapter 8 : Case Studies
Chapter 9 : Conclusions and Recommendations
9.1 Conclusions
9.2 Future Work
9.2.1 Evaluate Swanson's Rule Performance for Very Small Sample Sizes
9.2.2 Consider Beta Distribution for Percentiles
9.2.3 Extend Delfiner's Approach
9.2.4 Evaluate Swanson's Rule Performance for Truncated Log-normal Distribution
Appendix A : Order-Statistics Samples
Appendix B : Moments of the Maximum Likelihood Estimator
Appendix C : Conditions for a Bimodal Distribution
Appendix D : First and Second Moments of Maximum Likelihood for Bimodal Distribution
Appendix E : First and Second Moments of a Power Normal Distribution
Appendix F : Parameters of the First Order Auto-Regressive Model
Appendix G : Moments of Discretization Methods for the Case of Dependent Random Variables
Appendix H : Moments of the Maximum Likelihood Estimator for Dependent Random Variables
References
List of Figures
Fig. 2-1 – Estimated x10, x50, x90, and xSR (black squares) compared to the exponential regression function (Delfiner 2007).
Fig. 3-1 – Comparison of E(xT)/E(X) and (xT)A/E(X) of (a) SR and PT and (b) the AA and MLE.
Fig. 3-2 – Standard errors of the AA, MLE, SR, and PT obtained from analytical and numerical approaches for the cases of σ = 1 and σ = 1.5.
Fig. 3-3 – RMSE's of the AA, MLE, SR, and PT obtained from analytical and numerical approaches for the cases of σ = 1 and σ = 1.5.
Fig. 3-4 – Analytical ratios of E(xT)/E(X) versus σ and the Dykstra-Parsons coefficient.
Fig. 3-5 – Ratio of SE's to x50 of the AA, MLE, SR, and PT for four different σ values.
Fig. 3-6 – Ratios of SE/x50 of the AA, SR, MLE, and PT versus σ and VDP for (a) n=50 and (b) n=600.
Fig. 3-7 – Ratio of RMSE to x50 of the AA, SR, MLE, and PT for four different σ values.
Fig. 3-8 – Ratios RMSE/x50 of the AA, MLE, SR, and PT versus σ when (a) n=50 or (b) n=600.
Fig. 3-9 – Sample standard deviation obtained from analytical expression and MC simulation with error bars showing 95% confidence interval (a) for two different σ values and (b) for the general case.
Fig. 3-10 – E(xSR-C1) obtained from analytical expression and MC simulation with error bars showing 95% confidence interval, when σ is either known or unknown.
Fig. 3-11 – Weights of SR versus σ, where σ is known and unknown, with error bars showing 95% confidence interval.
Fig. 3-12 – E(xSR-C2) obtained from analytical expression and numerically calculated using MC simulation with error bars showing 95% confidence interval, when σ is either known or unknown.
Fig. 3-13 – Ratio of the expected values of SR, SRC1, SRC2, PT, and the MLE to E(X).
Fig. 3-14 – (a) RMSE/x50 and (b) SE/x50 of the AA, MLE, PT, SR, SRC1, and SRC2 versus σ and VDP when n=200.
Fig. 3-15 – Ratio of the RMSE's of the AA, MLE, PT, SR, SRC1, and SRC2 to x50 versus the square root of the inverse of sample size.
Fig. 4-1 – Bimodal region when µ1=1 and σ2=0.5.
Fig. 4-2 – (a) Expected value and (b) SE of the AA.
Fig. 4-3 – (a) Expected value and (b) SE of MLE.
Fig. 4-4 – (a) Expected value and (b) SE of SR.
Fig. 4-5 – (a) Expected value and (b) SE of PT.
Fig. 4-6 – Ratio E(xT)/E(X) of (a) the AA and MLE, and (b) SR and PT when σ2=0.5.
Fig. 4-7 – Standard errors of the AA, MLE, SR, and PT for four different values of σ1.
Fig. 4-8 – RMSE's of the AA, MLE, SR, and PT for four different values of σ1.
Fig. 5-1 – Ratios of MC to analytical results of (a) expected value, (b) SE, and (c) RMSE of the AA for the case of square root power-normal distribution.
Fig. 5-2 – Ratios of MC to analytical results of the (a) expected value, (b) SE, and (c) RMSE of SR for square root power-normal distribution.
Fig. 5-3 – Ratios of MC to analytical results of the (a) expected value, (b) SE, and (c) RMSE of PT for square root power-normal distribution.
Fig. 5-4 – E(xT)/E(X) of (a) SR and (b) PT versus σ for different λ values.
Fig. 5-5 – Analytical ratios of (a) E(xSR)/E(X) and (b) E(xPT)/E(X) versus VDP for different λ values.
Fig. 5-6 – Standard errors of the AA, SR, and PT for four different values of λ and σ.
Fig. 5-7 – RMSE's of the AA, SR, and PT for four different values of λ and σ.
Fig. 5-8 – Justified weights of SR versus σ for three different λ values when σ is known and unknown, with error bars showing a 95% confidence interval.
Fig. 5-9 – E(xSR_C) analytically derived and numerically calculated using MC simulation with error bars showing 95% confidence interval, when σ is either known or unknown.
Fig. 5-10 – Ratio of the expected value of SRC to E(X) for four λ values.
Fig. 5-11 – SE's of the AA, SR, and PT for four different values of λ and σ.
Fig. 5-12 – σ versus n showing regions where SRC has smaller SE than (a) PT and (b) the AA.
Fig. 5-13 – RMSE's of the AA, SR, and PT for four different values of λ and σ.
Fig. 5-14 – σ versus n showing regions where SRC is more efficient than SR when (a) λ=1/2 and (b) SRC is more efficient than SR when σ is greater than the value given by each curve depending on n and λ; otherwise SR is more efficient.
Fig. 6-1 – Expected value of the AA/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-2 – Standard error of the AA/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-3 – Expected value of SR/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-4 – Standard error of SR/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-5 – Expected value of PT/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-6 – Standard error of PT/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-7 – Expected value of MLE/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-8 – Standard error of MLE/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-9 – The ratio of expected values of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting 95% confidence interval when ρx1=0.3 and σ=1.5.
Fig. 6-10 – The ratio of expected values of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting 95% confidence interval when ρx1=0.0 and σ=1.5.
Fig. 6-11 – The ratio of standard errors of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting 95% confidence interval when ρx1=0.3 and σ=1.5.
Fig. 6-12 – The ratio of standard errors of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting 95% confidence interval when ρx1=0.0 and σ=1.5.
Fig. 6-13 – Analytical standard errors/x50 of the AA, SR, and PT.
Fig. 6-14 – Analytical RMSE/x50 of the AA, SR, PT, and MLE.
Fig. 6-15 – The ratio of standard errors of the mean estimators to x50, analytically derived for three different ρx1 values when σ=1.5.
Fig. 6-16 – RMSE/x50's of the mean estimators analytically derived for three different ρx1 values when σ=1.5.
Fig. 6-17 – Standard errors of the AA, SR, and PT with error bars showing 95% confidence interval.
Fig. 6-18 – RMSE's of the AA, SR, and PT with error bars showing 95% confidence interval.
Fig. 7-1 – σ versus n showing regions in which a mean estimator (a) has the smallest bias, (b) has the lowest SE, and (c) is the most efficient estimator compared to other estimators for the case of log-normal distribution.
Fig. 7-2 – σ1 versus n showing regions in which a mean estimator (a) has smaller uncertainty, and (b) is more efficient than other estimators when σ2=0.5 for the case of bimodal distribution.
Fig. 7-3 – SR has smaller SE than the AA when σ is greater than the value given by each curve depending on n and λ; otherwise the AA has less SE for the case of power-normal distribution (solid curves and dots obtained from the analytical expressions and MC simulation, respectively).
Fig. 7-4 – (a) PT is more efficient than SR when σ is greater than the value given by each curve depending on n and λ; otherwise SR is more efficient; and (b) when λ=1/16, a mean estimator is the most efficient depending on σ and n (solid curves and dots obtained from the analytical expressions and MC simulation, respectively).
Fig. 8-1 – Probability plots of data sets taken from (a) Hurst et al. (2000) in million barrels of oil (MMBO), and (b) EUR of an OK field in million cubic feet (MMCFE) with statistical properties calculated from available data sets.
Fig. 8-2 – Probability plot of the data set taken from MacCrossan (1969) with sample statistical properties calculated from available data sets.
Fig. 8-3 – Probability plot of the transformed EUR of the Hemphill gas field with exponent λ=0.28.
Fig. 8-4 – Probability plot of a permeability data set taken from the North Sea.
List of Tables
Table 3-1 – Analytical expressions of E(xT)/E(X).
Table 3-2 – Analytical expressions of RMSE's of the mean estimators.
Table 5-1 – Derived ω's for some power-normal distributions with different λ's.
Table 8-1 – Statistical properties of the Hurst et al. (2000) data set.
Table 8-2 – Statistical properties of gas reserves of an Oklahoma field.
Table 8-3 – Statistical properties of measured permeability in the Cleveland Formation.
Table 8-4 – Statistical properties of the data set taken from MacCrossan (1969).
Table 8-5 – Statistical properties of the EUR data set of the Hemphill gas field.
Table 8-6 – Statistical properties of the permeability data set measured along a well located in the North Sea.
Table E-1 – Expected value of power-normal distribution for different λ values.
Table E-2 – Bias of Swanson's rule for different λ values.
Table E-3 – Bias of Pearson-Tukey for different λ values.
Nomenclature

Symbols
$b_T$ = Bias of the mean estimator $T$
$\mathrm{cov}(\cdot)$ = Covariance
$E(\cdot)$ = Expected value
$h_X(x)$ = Probability density function of $X$
$H_X(x)$ = Cumulative distribution function of $X$
$m$ = Number of data sets
$n$ = Sample size
$P_i$ = Assigned weight to the uth percentile
$s$ = Sample standard deviation
$\mathrm{Std}(\cdot)$ = Standard deviation
$T$ = Mean estimator
$\mathrm{Var}(\cdot)$ = Variance
$V_{DP}$ = Dykstra-Parsons coefficient
$w_u = \Phi^{-1}(u/100)$
$w_u^{*} = \Phi_T^{-1}(u/100)$
$x$ = Deterministic variable
$x_T$ = Approximated mean value by the estimator $T$ using an analytical expression
$\hat{x}_T$ = Approximated mean value using the estimator $T$ obtained from a numerical approach
$x_u$ = The uth percentile
$X$ = Random variable

Abbreviations
AA = Arithmetic Average
AR(1) = First-order auto-regressive model
CDF = Cumulative Distribution Function
CV = Coefficient of Variation
d.i.d. = Dependent and identically distributed
ESS = Effective sample size
EUR = Estimated ultimate recovery
i.i.d. = Independent and identically distributed
LF = Likelihood function
MC = Monte Carlo
mD = Millidarcy
MLE = Maximum Likelihood Estimator
MMBO = Million barrels of oil
MMCFE = Million cubic feet
MSE = Mean square error
N = Normal
OK = Oklahoma
PDF = Probability Density Function
PT = Pearson-Tukey's rule
RMSE = Root Mean Square Error
RV = Random variable
SD = Standard deviation
SE = Standard error
SR = Swanson's Rule
TN = Truncated normal
TND = Truncated normal distribution

Greek Symbols
$\alpha$ = Portion of a distribution in a bimodal distribution
$\beta$ = An index parameter and a correction factor
$\theta$ = Population parameter
$\lambda$ = Exponent for power-normal transformation
$\mu$ = Mean of a population
$\rho_\tau$ = Correlation coefficient between pairs of values separated by an interval $\tau$
$\sigma$ = Standard deviation of a population
$\phi$ = Standard normal probability density function
$\Phi$ = Standard normal cumulative distribution function
$\Phi_T$ = Truncated standard normal cumulative distribution function
$\omega$ = Assigned weights to the uth percentile in discretization methods
$\hat{\omega}$ = Estimated $\omega$
$\Gamma(\cdot)$ = Gamma function

Subscript
A = Arithmetic average
dep = Dependent
dis = Discretization method
eff = Effective
Chapter 1 : Introduction

The mean values of reservoir parameters such as permeability, porosity, fluid
saturation, recovery factor, and hydrocarbon reserves are widely used to evaluate a
formation for potential reservoir development. For example, these parameters are
used as inputs to reservoir simulators to predict complex fluid flow behavior in
reservoirs, and they are also used in decision analysis. Thus, it is important to choose an
optimal mean estimator among several available options. An optimum mean estimator should
simultaneously be unbiased and consistent and have small uncertainty and high
efficiency (these terms are defined in the next chapter).
Among different mean estimators, the arithmetic average (AA) and discretization
methods, such as Swanson's rule (SR) (Megill 1984), are commonly used within the oil
and gas industry. The AA approximates a mean value by assigning an equal weight
of 1/n to each of the n samples. SR, on the other hand, assigns weights of 0.3, 0.4, and 0.3
to the sample 10th, 50th, and 90th percentiles, respectively.
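The two weighting schemes can be sketched in Python as follows. This is a minimal illustration, not the thesis's implementation: the sample-percentile definition (linear interpolation via `statistics.quantiles` with `method="inclusive"`) is an assumption, and other percentile rules will shift the SR value slightly for small samples. The permeability values are hypothetical, not thesis data.

```python
import statistics

def sample_percentile(data, u):
    """Return the u-th sample percentile (integer u from 1 to 99).

    Assumption: linear interpolation between order statistics, as done by
    statistics.quantiles(..., method="inclusive"); the thesis may use a
    different percentile definition.
    """
    cut_points = statistics.quantiles(data, n=100, method="inclusive")
    return cut_points[u - 1]  # cut_points[0] is the 1st percentile

def arithmetic_average(data):
    """AA: every one of the n samples receives the same weight 1/n."""
    return sum(data) / len(data)

def swanson_rule(data):
    """SR: 0.3*x10 + 0.4*x50 + 0.3*x90 using sample percentiles."""
    return (0.3 * sample_percentile(data, 10)
            + 0.4 * sample_percentile(data, 50)
            + 0.3 * sample_percentile(data, 90))

if __name__ == "__main__":
    # Hypothetical, right-skewed permeability sample in md (not thesis data)
    perm = [0.01, 0.02, 0.03, 0.05, 0.06, 0.08, 0.09, 0.12, 0.2, 0.35, 0.9]
    print(f"AA = {arithmetic_average(perm):.3f} md")
    print(f"SR = {swanson_rule(perm):.3f} md")
```

For a right-skewed sample like this one, SR falls below the AA: the few largest values enter the AA directly but influence SR only through x90.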
Two examples are briefly explained here to emphasize how the selection of a mean
estimator can significantly affect the project development process and economic
assessment. The first dataset consists of Cleveland Formation permeability measurements
(Rollins et al. 1992), and the second dataset consists of the estimated ultimate recovery of
416 wells located in the Hemphill Field, Texas, USA. In the first dataset, SR estimates
the mean value as 0.09 md, while the AA gives 0.18 md, a 50% difference. For
production prediction, this two-fold difference is relatively modest. The difference is
important, however, for tax and regulatory purposes, as it changes the classification of the
Cleveland Formation from tight (< 0.1 md) to conventional. In the second dataset,
the AA and SR give sample means of 1,824 and 1,797 million cubic feet (MMCFE),
respectively, only a 1.5% difference. Although the difference is very small, it is
equivalent to around 27 MMCFE per well and 11,300 MMCFE in total difference in
reserves estimation, which in turn corresponds to a significant difference of ~US$34
million in economic assessment, assuming a gas price of US$3/MCFE. Hence, it is
clear that the choice of a proper mean estimator is important, particularly to estimate the
mean values of critical reservoir parameters such as hydrocarbon reserves and
permeability.
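The arithmetic behind the Hemphill Field comparison can be checked with a few lines of Python. Only the numbers quoted in the text are used, together with the standard unit relation 1 MMCFE = 1,000 MCFE. Note that the rounded means give 27 × 416 = 11,232 MMCFE, slightly below the 11,300 MMCFE stated, which may reflect rounding of the reported means.

```python
# Figures quoted in the text for the Hemphill Field EUR data set
n_wells = 416
aa_mean = 1824.0   # MMCFE per well, arithmetic average
sr_mean = 1797.0   # MMCFE per well, Swanson's rule
gas_price = 3.0    # US$ per MCFE, price assumed in the text

diff_per_well = aa_mean - sr_mean            # MMCFE per well
rel_diff = diff_per_well / aa_mean           # relative to the AA
total_diff = diff_per_well * n_wells         # MMCFE over all wells
value_usd = total_diff * 1_000 * gas_price   # 1 MMCFE = 1,000 MCFE

print(f"relative difference: {rel_diff:.1%}")
print(f"total difference: {total_diff:,.0f} MMCFE")
print(f"value: US${value_usd / 1e6:.1f} million")
```

This prints a 1.5% relative difference, 11,232 MMCFE, and about US$33.7 million, consistent with the ~US$34 million quoted.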
Despite many studies that have used and supported SR as an alternative mean
estimator, only a few studies, such as Keefer and Bodily (1983), Megill (1984),
and Bickel et al. (2011), have questioned the applicability of SR as a mean estimator
and evaluated its performance in terms of its bias for the log-normal distribution
with limited variability. Unbiasedness is a desirable property; however, a bias
can be removed using a correction factor. Thus, besides bias, other properties of mean
estimators, such as uncertainty, consistency, and efficiency, should be evaluated. However,
none of Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011)
investigated the consistency, uncertainty, and efficiency of SR in addition to its bias.
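To make the idea of a correction factor concrete, the sketch below computes the ratio of Swanson's rule to the true mean for a log-normal population whose percentiles are known exactly, and then divides that ratio out. It is a hypothetical illustration under stated assumptions (exact population percentiles, known σ of ln X), not the de-biased SR developed in this thesis; with sample percentiles and an estimated σ, the correction itself becomes uncertain. The function names are illustrative only.

```python
import math
from statistics import NormalDist

# Standard normal 90th percentile (about 1.2816). For ln X ~ N(mu, sigma^2),
# the population percentiles are x10 = x50*exp(-Z90*sigma), x90 = x50*exp(Z90*sigma)
Z90 = NormalDist().inv_cdf(0.90)

def sr_bias_ratio(sigma):
    """Ratio of population SR to the log-normal mean E(X) = x50*exp(sigma^2/2).

    x50 = exp(mu) cancels, so the ratio depends on sigma only.
    """
    sr_over_x50 = 0.3 * math.exp(-Z90 * sigma) + 0.4 + 0.3 * math.exp(Z90 * sigma)
    mean_over_x50 = math.exp(0.5 * sigma ** 2)
    return sr_over_x50 / mean_over_x50

def debiased_sr(sr_value, sigma):
    """Hypothetical correction: divide an SR value by the bias ratio for sigma."""
    return sr_value / sr_bias_ratio(sigma)

for s in (0.5, 1.0, 2.0):
    print(f"sigma = {s}: population SR / E(X) = {sr_bias_ratio(s):.3f}")
```

The ratio stays close to 1 for small σ and falls well below 1 as σ grows (about 0.58 at σ = 2), in line with the observation that SR underestimates the mean as variability increases.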
Thus, this study focuses on a comprehensive evaluation of SR performance based
on its bias, uncertainty, consistency, and efficiency for the log-normal distribution
with a wider range of variability, since reservoir characteristics can be highly variable
(Rollins et al. 1992). SR performance is compared to the performances of the AA and
other discretization methods, such as Pearson-Tukey's rule (PT) (Pearson-Tukey 1969). In
addition to the AA and PT, SR performance is also compared to the performance of
the maximum likelihood estimator (MLE), since it is an
optimum mean estimator from a statistical perspective (Kenney and Keeping 1951;
Quenouille 1956; Kendall and Stuart 1977).
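For comparison with SR, PT and the MLE can be sketched in the same style. Two assumptions are made here: PT is taken in its usual form, 0.185·x5 + 0.63·x50 + 0.185·x95, and the MLE is the standard log-normal estimator exp(μ̂ + σ̂²/2), where μ̂ and σ̂² are the maximum likelihood estimates (divisor n) of the mean and variance of ln X. The percentile rule is again the inclusive interpolation of `statistics.quantiles`, and the sample is hypothetical.

```python
import math
import statistics

def pearson_tukey(data):
    """PT: 0.185*x5 + 0.63*x50 + 0.185*x95 using sample percentiles."""
    q = statistics.quantiles(data, n=100, method="inclusive")  # q[k-1] = k-th percentile
    return 0.185 * q[4] + 0.63 * q[49] + 0.185 * q[94]

def lognormal_mle_mean(data):
    """MLE of E(X) when X is log-normal: exp(mu_hat + sigma2_hat / 2)."""
    logs = [math.log(x) for x in data]               # requires strictly positive data
    mu_hat = statistics.fmean(logs)
    sigma2_hat = statistics.pvariance(logs, mu_hat)  # ML variance: divisor n, not n - 1
    return math.exp(mu_hat + 0.5 * sigma2_hat)

if __name__ == "__main__":
    sample = [0.01, 0.02, 0.03, 0.05, 0.06, 0.08, 0.09, 0.12, 0.2, 0.35, 0.9]
    print(f"PT  = {pearson_tukey(sample):.3f}")
    print(f"MLE = {lognormal_mle_mean(sample):.3f}")
```

Unlike SR and PT, the MLE uses every observation, but only through the logarithms, so its good behavior rests on the log-normal assumption being correct.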
The log-normal distribution is commonly used to describe the distributions of reservoir parameters such as drainage areas, gross and net pay, reserves, recovery, and permeability (e.g., Kaufman 1965; Bennion 1966; Megill 1984; Rollins et al. 1992; Rose 2001; Seidle and O’Connor 2003). Several studies, on the other hand, have shown that reservoir parameters are not necessarily log-normally distributed and can be described by other kinds of distribution, such as power-normal and bimodal distributions (e.g., Jensen et al. 1987; Seyedghasemipour and Bhattacharyya 1990; MacCrossan 1969). However, no attention has been paid to SR performance when the underlying distribution is power-normal or bimodal. Thus, this research also aims to evaluate the performances of the mean estimators for non-log-normal distributions (e.g., power-normal and bimodal distributions).
Thus far, all studies on SR performance have assumed that samples are independent and identically distributed. However, reservoir parameters such as permeability might be auto-correlated (Jensen et al. 2000). Thus, in this study, SR performance is also evaluated and compared to the performances of other mean estimators when the random variables are identically distributed and follow a first-order autoregressive model.
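The effect of auto-correlation on the AA can be sketched with a stationary Gaussian AR(1) series with lag-one correlation ρ. The sketch below assumes unit marginal variance and uses the standard expression for the variance of the mean of n consecutive AR(1) values, $\mathrm{Var}(x_A) = \frac{\mathrm{Var}(X)}{n}\left[1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho^k\right]$, checking it by simulation. All helper names are hypothetical, and the auto-correlated samples used later in this study (e.g., of log-normal variables) may be generated differently.

```python
import math
import random
import statistics


def ar1_series(n, rho, rng):
    """Stationary Gaussian AR(1) series with unit marginal variance (hypothetical helper)."""
    x = rng.gauss(0.0, 1.0)  # first value drawn from the stationary distribution
    innovation_sd = math.sqrt(1.0 - rho**2)  # keeps Var(X_t) = 1 for every t
    series = [x]
    for _ in range(n - 1):
        x = rho * x + rng.gauss(0.0, innovation_sd)
        series.append(x)
    return series


def variance_inflation(n, rho):
    """Exact Var(AA) / (Var(X)/n) for n consecutive AR(1) values."""
    return 1.0 + 2.0 * sum((1.0 - k / n) * rho**k for k in range(1, n))


def mc_variance_inflation(n, rho, reps=4000, seed=1):
    """Monte Carlo counterpart of variance_inflation (Var(X) = 1 by construction)."""
    rng = random.Random(seed)
    means = [statistics.fmean(ar1_series(n, rho, rng)) for _ in range(reps)]
    return statistics.variance(means) * n


def samples_needed(m_independent, rho):
    """Approximate number of AR(1) samples giving the AA the precision of
    m independent samples, using the large-n limit (1 + rho) / (1 - rho)."""
    return math.ceil(m_independent * (1.0 + rho) / (1.0 - rho))


print(round(variance_inflation(50, 0.5), 3), round(mc_variance_inflation(50, 0.5), 3))
print(samples_needed(20, 0.5))  # 60 correlated samples instead of 20 independent ones
```

With ρ = 0.5 the variance of the AA is roughly three times its independent-sample value, so roughly three times as many correlated samples are needed for the same precision.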
In the course of this study, it was found that none of the AA, MLE, SR, and PT satisfies all conditions of an optimum mean estimator for every sample size and variability. The AA is unbiased and the MLE is asymptotically unbiased, whereas both SR and PT are biased except for the case of the normal distribution. They estimate a mean value with small bias for the nearly homogeneous case but significantly underestimate the mean value as variability increases. Although SR approximates the mean value with slightly larger bias than PT, this study demonstrates that SR is more efficient and less uncertain than the AA, MLE, and PT for some ranges of sample size and variability. Therefore, SR may be an optimum mean estimator for some ranges of variability and sample size, because its bias can be compensated by its smaller uncertainty and higher efficiency.
As just pointed out, bias can be removed using a correction factor or by modifying the mean estimator’s formula. In this study, SR is converted into an unbiased mean estimator using two approaches: (1) adjusting the SR weights based on population variability and (2) using a correction factor. These approaches lead to an unbiased SR with the highest efficiency and lowest uncertainty for some ranges of sample size and variability.
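For a log-normal population with known log-standard deviation σ, both de-biasing routes can be written in closed form at the population level, i.e., using the exact 10th, 50th, and 90th percentiles rather than sample percentiles. With $x_p = e^{\mu + z_p\sigma}$, a symmetric rule $w\,x_{10} + (1-2w)\,x_{50} + w\,x_{90}$ equals $e^{\mu}[1 + 2w(\cosh(z_{0.9}\sigma) - 1)]$, so it matches $E(X) = e^{\mu+\sigma^2/2}$ when $w = (e^{\sigma^2/2} - 1)/[2(\cosh(z_{0.9}\sigma) - 1)]$. The sketch below illustrates this under that assumption only; it is not the procedure developed in the later chapters, and the function names are hypothetical. Both quantities depend on σ alone because $e^{\mu}$ cancels.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)  # standard normal 90th percentile, about 1.2816


def symmetric_rule(mu, sigma, w):
    """w*x10 + (1 - 2w)*x50 + w*x90 on the exact percentiles of a log-normal population."""
    x10 = math.exp(mu - Z90 * sigma)
    x50 = math.exp(mu)
    x90 = math.exp(mu + Z90 * sigma)
    return w * x10 + (1.0 - 2.0 * w) * x50 + w * x90


def debiased_weight(sigma):
    """Approach (1), sketched: the w making the symmetric rule equal E(X) (sigma > 0)."""
    return (math.exp(sigma**2 / 2) - 1.0) / (2.0 * (math.cosh(Z90 * sigma) - 1.0))


def correction_factor(sigma):
    """Approach (2), sketched: multiplier E(X) / x_SR for the standard 0.3/0.4/0.3 weights."""
    return math.exp(sigma**2 / 2) / symmetric_rule(0.0, sigma, 0.3)


for sigma in (0.5, 1.0, 1.5, 2.0):
    print(sigma, round(debiased_weight(sigma), 4), round(correction_factor(sigma), 4))
```

The adjusted weight tends to about 0.304 as σ approaches zero, close to Swanson’s 0.3, but it exceeds 0.5 for σ larger than about 1.8, where the median would need a negative weight; this simple symmetric adjustment therefore cannot cover the whole range of variability on its own.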
1.1 Thesis Organization
In the next few paragraphs, the main contents of each chapter are described.
Chapter Two describes previous work on evaluating Swanson’s rule and other mean estimators in detail. It provides the definitions of bias, uncertainty, consistency, and efficiency, which are used extensively in the following chapters. Furthermore, it identifies gaps in the literature that motivate the issues assessed in this research study.
Chapter Three is devoted to evaluating the performance of the selected mean estimators when the underlying distribution is log-normal. This chapter also proposes two methods to de-bias SR: modifying the weights of SR and defining a correction factor.
Due to geological variation, reservoir parameters can follow a bimodal distribution. Thus, Chapter Four evaluates the performance of the mean estimators under the assumption that reservoir parameters can be described by a bimodal distribution. It shows that both PT and SR are biased, the AA is unbiased, and the MLE is asymptotically unbiased. Although the MLE has the smallest uncertainty and the highest efficiency, it involves complex computations; thus, other mean estimators are preferable.
Chapter Five describes the performance of the mean estimators when reservoir parameters follow a power-normal distribution. This chapter shows that SR estimates a mean value with insignificant bias when the exponent of the power-normal transformation, λ, tends to one; however, it significantly underestimates the mean value as λ approaches zero. None of the mean estimators under review is the optimum mean estimator for every variability, sample size, and λ. Thus, each of them can be chosen as the optimum mean estimator for certain ranges of variability and sample size, depending on the λ value. Moreover, this chapter attempts to de-bias SR by modifying its weights.
Up to Chapter Five, it is assumed that reservoir parameters are independent; however, reservoir parameters can be auto-correlated. Chapter Six assesses the performance of the mean estimators when random variables are auto-correlated. This chapter shows that auto-correlation between data points decreases efficiency and increases uncertainty. In other words, auto-correlated samples are less informative than uncorrelated samples; thus more auto-correlated samples, a number referred to as the effective sample size (ESS), are needed to achieve a given accuracy. Auto-correlation causes the mean estimators to behave differently and, depending on which mean estimator is used, different ESSs are needed to achieve a given accuracy.
Chapter Seven gives an integrated view of the preceding results to define regions where one mean estimator has the smallest uncertainty and bias and the highest efficiency among the mean estimators. These regions are determined as a function of the number of samples and variability. This chapter shows that, at certain values of sample size and variability, SR is an optimum mean estimator because its bias can be compensated by its smaller uncertainty and higher efficiency.
Chapter Eight applies the mean estimators to some case studies. The statistical properties of the mean estimators are calculated analytically and compared to the results obtained from the bootstrap method.
The last chapter lists the main conclusions of this study and raises some questions and
issues for future research.
Chapter 2 : Literature Review
As mentioned in Chapter One, the performances of the mean estimators are evaluated
based on four properties. This chapter provides the definitions of these properties in
detail.
This chapter also summarizes previous studies that evaluated the performances of the mean estimators mentioned in the previous chapter and then describes the issues these studies overlooked.
2.1 Notation
The following notation is used in this study. Assume $X$ is a random variable (RV) with probability density function (PDF) $h_X(x)$, where $x$ is a deterministic variable. The expected value of $X$ is given by
$$E(X) = \int_{-\infty}^{\infty} \xi\, h_X(\xi)\, d\xi, \tag{2-1}$$
and $\mathrm{Var}(X) = E\{[X - E(X)]^2\}$ is the variance of $X$.
$x_T$ denotes the mean value approximated by an estimator $T$ using an analytical expression, and $\hat{x}_T$ represents the estimated mean value obtained numerically using Monte Carlo (MC) simulation.
2.2 Definitions
The mean estimator $T$ estimates the population mean by $x_T$, a function of samples randomly taken from the population. Therefore, $x_T$ is an RV whose behaviour is described by a PDF. The mean and standard deviation of this PDF are used to analytically and numerically compute the mean estimator’s properties.
The choice of a mean estimator depends on how its performance compares with the performances of other estimators, which are evaluated based on four properties: bias, uncertainty, consistency, and efficiency. These properties are not necessarily the most important ones, but they are commonly used to assess estimators; this section therefore describes them in detail.
2.2.1 Bias
It is desirable that the PDF of the estimate, $x_T$, is centered around the true mean value, $E(X)$; otherwise the mean estimator, $T$, tends to underestimate or overestimate the mean value. Bias measures the difference between the expected value of the mean estimator, $E(x_T)$, and the true mean value as
$$b_T = E(x_T) - E(X). \tag{2-2}$$
The estimator $T$ is unbiased when $b_T = 0$; otherwise it is biased.
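Eq. 2-2 can be estimated numerically by drawing many samples, applying the estimator to each, and averaging the estimates. The following sketch does this for the AA (Eq. 2-8) and SR (Eq. 2-10) with log-normal samples, for which $E(X) = e^{\mu + \sigma^2/2}$. It assumes sample percentiles computed by linear interpolation between order statistics; other percentile definitions would shift the SR results somewhat. The function names are illustrative.

```python
import math
import random
import statistics


def percentile(sorted_x, p):
    """Sample p-th percentile (0-100) of a sorted list, linear interpolation."""
    pos = (len(sorted_x) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def sr(sample):
    """Swanson's rule (Eq. 2-10) applied to sample percentiles."""
    s = sorted(sample)
    return 0.3 * percentile(s, 10) + 0.4 * percentile(s, 50) + 0.3 * percentile(s, 90)


def mc_bias(estimator, mu, sigma, n, reps=4000, seed=7):
    """Monte Carlo estimate of b_T = E(x_T) - E(X) for log-normal samples of size n."""
    rng = random.Random(seed)
    true_mean = math.exp(mu + sigma**2 / 2)
    estimates = [
        estimator([rng.lognormvariate(mu, sigma) for _ in range(n)]) for _ in range(reps)
    ]
    return statistics.fmean(estimates) - true_mean, true_mean


b_aa, true_mean = mc_bias(statistics.fmean, 0.0, 1.5, n=20)  # AA, Eq. 2-8
b_sr, _ = mc_bias(sr, 0.0, 1.5, n=20)
print(f"relative bias: AA {b_aa / true_mean:+.3f}, SR {b_sr / true_mean:+.3f}")
```

For σ = 1.5 the AA bias should come out close to zero, while the SR bias should be clearly negative.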
2.2.2 Uncertainty
Uncertainty, the second property of a mean estimator, refers to the spread of possible outcomes of the estimate and should be as small as possible. It is evaluated in terms of the standard error (SE) of a mean estimator, i.e., the standard deviation of $x_T$; an estimator with a smaller SE has a lower degree of uncertainty.
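The SE can be estimated in the same way as the bias: repeat the sampling many times and take the standard deviation of the resulting estimates. For the AA with independent samples the exact value is $\sqrt{\mathrm{Var}(X)/n}$, which provides a check on the simulation; the sketch assumes a log-normal population, for which $\mathrm{Var}(X) = (e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$. Any other estimator, such as SR, can be passed in place of `statistics.fmean`; the helper names are illustrative.

```python
import math
import random
import statistics


def lognormal_variance(mu, sigma):
    """Var(X) for a log-normal RV with log-mean mu and log-standard deviation sigma."""
    return (math.exp(sigma**2) - 1.0) * math.exp(2.0 * mu + sigma**2)


def mc_standard_error(estimator, mu, sigma, n, reps=4000, seed=11):
    """SE of an estimator: standard deviation of its estimates over `reps` samples of size n."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.lognormvariate(mu, sigma) for _ in range(n)]) for _ in range(reps)
    ]
    return statistics.stdev(estimates)


n = 20
se_aa_mc = mc_standard_error(statistics.fmean, 0.0, 1.0, n)
se_aa_exact = math.sqrt(lognormal_variance(0.0, 1.0) / n)  # sqrt(Var(X)/n), independent samples
print(round(se_aa_mc, 4), round(se_aa_exact, 4))
```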
2.2.3 Consistency
Another feature used to assess the performance of estimators is consistency. It is expressed through another parameter, the mean square error (MSE), given by $\mathrm{MSE} = E\{[x_T - E(X)]^2\}$. The estimator $T$ is consistent when the following condition is satisfied:
$$\lim_{n\to\infty} E\left\{\left[x_{T_n} - E(X)\right]^2\right\} = 0, \tag{2-3}$$
where $n$ is the number of samples (Lindgren 1968). By expanding Eq. 2-3, the consistency condition can be rewritten as
$$\lim_{n\to\infty} \left[\mathrm{Var}_{T_n} + b_{T_n}^2\right] = 0, \tag{2-4}$$
where $\mathrm{Var}_{T_n}$ is the variance of the estimator $T$. According to Eq. 2-4, the MSE incorporates both the bias and the variance of the mean estimator $T$.
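The decomposition behind Eq. 2-4 also holds exactly for a finite set of replicate estimates, such as those produced by MC simulation, provided the variance is computed with divisor N rather than N − 1. A minimal sketch with hypothetical replicate values:

```python
import random
import statistics


def mse_decomposition(estimates, true_value):
    """Return (MSE, variance, bias) of replicate estimates of true_value.

    The population variance (divisor N) is used so that the identity
    MSE = variance + bias**2 holds exactly for the replicates.
    """
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    bias = statistics.fmean(estimates) - true_value
    variance = statistics.pvariance(estimates)
    return mse, variance, bias


# Hypothetical replicate estimates of a quantity whose true value is 1.0.
rng = random.Random(3)
estimates = [rng.gauss(1.1, 0.2) for _ in range(1000)]
mse, variance, bias = mse_decomposition(estimates, 1.0)
print(mse, variance + bias**2)  # equal up to floating-point rounding
```

Using `statistics.variance` (divisor N − 1) instead would make the two printed values differ slightly, which is why the population form is chosen here.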
The MSE has the same units as the square of the quantity being estimated. In analogy to the standard deviation, the square root of the MSE, known as the root mean square error (RMSE), which has the same units as the quantity being estimated, is considered here instead of the MSE. Taking the square root of Eq. 2-4 modifies the consistency condition to
$$\lim_{n\to\infty} \sqrt{\mathrm{Var}_{T_n} + b_{T_n}^2} = 0. \tag{2-5}$$
Eq. 2-5 is satisfied when the variance and bias of $T$ both approach zero as $n$ becomes very large. In other words, the sequence $\{x_{T_n}\}$ becomes more and more concentrated around $E(X)$ as $n$ increases, i.e., the sequence converges in probability to $E(X)$. This means that the probability that $x_{T_n}$ differs from $E(X)$ by at least any fixed amount becomes smaller and smaller as the number of samples tends to infinity. This definition is formulated as (Lindgren 1968)
$$\lim_{n\to\infty} P\left(\left|x_{T_n} - E(X)\right| \ge \epsilon\right) = 0 \quad \text{for any } \epsilon > 0. \tag{2-6}$$
However, Eq. 2-6 is not always satisfied, because for some estimator $T'$ the sequence $\{x_{T'_n}\}$ may become more and more concentrated around a value that differs from $E(X)$ as $n$ becomes infinite. Such an estimator $T'$ is known as an estimator that converges in probability, but to a wrong value (Lindgren, 1968).
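SR is an estimator $T'$ of this kind for log-normal data: as $n$ grows, the sample percentiles converge to the population percentiles, so $x_{SR}$ settles at $e^{\mu}[0.4 + 0.6\cosh(z_{0.9}\sigma)]$, where $z_{0.9} \approx 1.2816$ is the standard normal 90th percentile, rather than at $E(X) = e^{\mu + \sigma^2/2}$. A minimal sketch, assuming linear-interpolation sample percentiles and illustrative parameter values:

```python
import math
import random
from statistics import NormalDist


def percentile(sorted_x, p):
    """Sample p-th percentile (0-100) of a sorted list, linear interpolation."""
    pos = (len(sorted_x) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def sr_estimate(n, mu, sigma, seed):
    """SR (Eq. 2-10) computed from one log-normal sample of size n."""
    rng = random.Random(seed)
    s = sorted(rng.lognormvariate(mu, sigma) for _ in range(n))
    return 0.3 * percentile(s, 10) + 0.4 * percentile(s, 50) + 0.3 * percentile(s, 90)


mu, sigma = 0.0, 1.5
z90 = NormalDist().inv_cdf(0.90)
true_mean = math.exp(mu + sigma**2 / 2)  # E(X)
sr_limit = math.exp(mu) * (0.4 + 0.6 * math.cosh(z90 * sigma))  # SR on population percentiles
for n in (100, 1_000, 50_000):
    print(n, round(sr_estimate(n, mu, sigma, seed=n), 3), round(sr_limit, 3), round(true_mean, 3))
```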
2.2.4 Efficiency
The fourth property of an optimal estimator is that its RMSE should be as small as possible. Lindgren (1968) states that the estimator $T$ is more efficient than an estimator $T'$ if
$$\sqrt{\mathrm{Var}_T + b_T^2} \le \sqrt{\mathrm{Var}_{T'} + b_{T'}^2}. \tag{2-7}$$
In other words, an estimator with a smaller RMSE is more precise and efficient. In the case of unbiased estimators ($b_T = b_{T'} = 0$), Eq. 2-7 reduces to an inequality between the SEs of the two estimators, and the one with the smaller SE is more efficient.
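Eq. 2-7 can be applied to simulated estimates by computing each estimator’s RMSE and keeping the smaller one. The sketch below compares the AA and SR on the same log-normal samples (same seed, so the comparison is paired); the parameter values and names are illustrative, and which estimator wins depends on n and σ, so no particular outcome is implied. For strongly skewed populations the MC estimate of the AA’s RMSE converges slowly, so more replicates would be needed there.

```python
import math
import random
import statistics


def percentile(sorted_x, p):
    """Sample p-th percentile (0-100) of a sorted list, linear interpolation."""
    pos = (len(sorted_x) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def sr(sample):
    """Swanson's rule (Eq. 2-10) applied to sample percentiles."""
    s = sorted(sample)
    return 0.3 * percentile(s, 10) + 0.4 * percentile(s, 50) + 0.3 * percentile(s, 90)


def mc_rmse(estimator, mu, sigma, n, reps=4000, seed=17):
    """Return (RMSE, bias) of an estimator of E(X) for log-normal samples of size n."""
    rng = random.Random(seed)
    true_mean = math.exp(mu + sigma**2 / 2)
    est = [estimator([rng.lognormvariate(mu, sigma) for _ in range(n)]) for _ in range(reps)]
    bias = statistics.fmean(est) - true_mean
    rmse = math.sqrt(statistics.pvariance(est) + bias**2)  # one side of Eq. 2-7
    return rmse, bias


rmse_aa, bias_aa = mc_rmse(statistics.fmean, 0.0, 1.0, n=10)
rmse_sr, bias_sr = mc_rmse(sr, 0.0, 1.0, n=10)
more_efficient = "AA" if rmse_aa <= rmse_sr else "SR"  # smaller RMSE wins (Eq. 2-7)
print(round(rmse_aa, 3), round(rmse_sr, 3), more_efficient)
```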
2.3 Detailed Analysis of Literature
Among the different mean estimators, this study concentrates on a few that are extensively used in the oil and gas industry: the AA and discretization methods such as SR and PT. In addition to the AA, SR, and PT, the MLE is also evaluated here, since it is an optimum mean estimator from a statistical perspective (Kenney and Keeping 1951; Quenouille 1956; Kendall and Stuart 1977).
2.3.1 Arithmetic Average
The AA is used to approximate the mean values of reservoir parameters such as porosity and permeability when a medium is horizontally stratified and the flow path parallels the layers. It estimates a mean value by assigning equal weights of $1/n$ to all $n$ samples $X_1, X_2, \ldots, X_n$ as
$$x_A = \frac{1}{n}\sum_{i=1}^{n} X_i. \tag{2-8}$$
For decades, the properties of the AA have been studied by many researchers, such as Kenney and Keeping (1951, p. 133), Lindgren (1968, p. 221), and Jensen (1998). For independent samples, the sequence $\{x_A\}$ is centred around $E(X)$ with variance $\mathrm{Var}(X)/n$ (i.e., $E(x_A) = E(X)$ and $\mathrm{Var}(x_A) = \mathrm{Var}(X)/n$), and thus the AA is an unbiased mean estimator. Its estimates are subject to sampling variation because only a limited number of samples is available for estimating the mean value; however, this variation can be reduced by taking a sufficient number of samples.
2.3.2 Discretization Methods
Among different discretization methods, SR and PT are the most commonly used alternative mean estimators within the oil and gas industry. Unlike the AA, they approximate the mean value by assigning unequal weights to percentiles. PT was introduced by Pearson and Tukey (1969) as a mean estimator that gives the mean value as
$$x_{PT} = 0.185\,x_5 + 0.630\,x_{50} + 0.185\,x_{95}, \tag{2-9}$$
where $x_5$, $x_{50}$, and $x_{95}$ are the 5th, 50th, and 95th percentiles, respectively. Later, in 1972, Swanson proposed SR as an alternative mean estimator:
$$x_{SR} = 0.3\,x_{10} + 0.4\,x_{50} + 0.3\,x_{90}, \tag{2-10}$$
where $x_{10}$ and $x_{90}$ are the 10th and 90th percentiles, respectively (Megill 1984). Swanson empirically found this rule to be a “good” mean estimator for modestly skewed distributions (Megill 1984).
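Because Eqs. 2-9 and 2-10 need only three percentiles, they can be evaluated directly on a known population. The sketch below (helper names are hypothetical) applies both rules to the exact percentiles of a normal and a log-normal population: for any normal population the symmetric weights reproduce the mean exactly, whereas for a log-normal population with σ = 1 both rules fall below $E(X) = e^{\mu+\sigma^2/2}$, SR by about 5% and PT by about 1.5%.

```python
import math
from statistics import NormalDist

# (percentile, weight) pairs from Eqs. 2-9 and 2-10.
RULES = {
    "PT": ((5, 0.185), (50, 0.630), (95, 0.185)),
    "SR": ((10, 0.3), (50, 0.4), (90, 0.3)),
}


def rule_estimate(rule, quantile):
    """Apply a discretization rule to a population quantile function q -> x_q."""
    return sum(w * quantile(p / 100.0) for p, w in RULES[rule])


def lognormal_quantile(mu, sigma):
    """Quantile function of a log-normal population."""
    std = NormalDist()

    def q(p):
        return math.exp(mu + sigma * std.inv_cdf(p))

    return q


normal_q = NormalDist(10.0, 2.0).inv_cdf
lognormal_q = lognormal_quantile(0.0, 1.0)
true_lognormal_mean = math.exp(0.0 + 1.0**2 / 2)
for rule in RULES:
    print(rule, rule_estimate(rule, normal_q), rule_estimate(rule, lognormal_q) / true_lognormal_mean)
```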
SR is more commonly used within the oil and gas industry than PT, perhaps because it involves the 10th and 90th percentiles, which are representative of possible and proved reserves, respectively. Many studies, such as Hurst et al. (2000), Rose (2001), and Delfiner (2007), recommended SR as an alternative mean estimator. Arild et al. (2008), on the other hand, suggested MC simulation instead of SR.
Hurst et al. (2000) applied SR to estimate the mean reserves of the fields and discoveries in the Upper Jurassic salt-related play in the United Kingdom and Norwegian Central North Sea. They recommended the use of SR based on work done by Megill (1984). Hurst et al. (2000), however, did not consider the bias and uncertainty associated with SR.
Delfiner (2007) used SR to estimate permeability from the porosity-permeability (Phi-k) relationship in order to approximate the pseudo-flow profile. He vertically divided the Phi-k cross plot, on a semi-log scale, into slices 5 p.u. wide and computed the 10th, 50th, and 90th percentiles for each slice (Fig. 2-1). The effective permeability was then computed using SR for each slice, and a trend line was fitted through the resulting SR points to provide an equation for estimating permeability at any porosity in the fitted range. He computed the pseudo-flow profile using two approaches: first, with the permeability predicted from the curve fitted through the SR points, and second, with the permeability approximated by exponential regression. The pseudo-flow profiles from these two approaches were then compared to the true profile. Delfiner (2007) showed an improvement in the predicted pseudo-flow profile as a result of using SR. Thus, he proposed SR as a solution to the pitfall associated with Phi-k transforms. He also showed that SR improves permeability power averaging.
Fig. 2-1 – Estimated x10, x50, x90, and xSR (black squares) compared to the exponential-regression
function (Delfiner 2007).
Rose (2001) advocated SR to approximate the mean value of parameters that are log-normally distributed with low and moderate heterogeneity, particularly hydrocarbon reserves. He stated that the distribution of reserves should be truncated below $P_1$ and above $P_{99}$ because of economic limits on producing reserves below $P_1$ and the low probability of reserves above $P_{99}$. Otherwise, approximating the mean value yields an “unrealistically large” value. He referred to the population mean of the truncated distribution as the truncated mean and stated that truncated mean values are close to the mean values estimated via SR. Rose (2001) supported his argument using a log-normal distribution with log-mean $\mu = 1.61$ and log-standard deviation $\sigma = 1.67$. He calculated the mean value before and after truncation and concluded that SR underestimates the population mean by 24%, whereas it underestimates the truncated mean by only 4%. Hence, he supported SR as an appropriate alternative approach to estimate the mean value of hydrocarbon reserves.
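Rose’s comparison can be sketched in closed form, assuming that truncating below $P_1$ and above $P_{99}$ means conditioning $X$ on lying between its 1st and 99th percentiles. For a log-normal $X$, the partial expectation over $[x_a, x_b]$ is $e^{\mu+\sigma^2/2}[\Phi(z_b - \sigma) - \Phi(z_a - \sigma)]$, which is divided by the retained probability 0.98. The percentages the sketch prints are population-level values under this assumption and need not coincide with the rounded figures Rose reported; the function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def lognormal_truncated_mean(mu, sigma, p_lo=0.01, p_hi=0.99):
    """Mean of a log-normal X conditioned on x_{p_lo} <= X <= x_{p_hi} (closed form)."""
    z_lo, z_hi = STD.inv_cdf(p_lo), STD.inv_cdf(p_hi)
    full_mean = math.exp(mu + sigma**2 / 2)
    # Partial expectation of a log-normal: the standardized limits shift by sigma.
    partial = full_mean * (STD.cdf(z_hi - sigma) - STD.cdf(z_lo - sigma))
    return partial / (p_hi - p_lo)


def sr_population(mu, sigma):
    """Eq. 2-10 evaluated on the exact 10th, 50th, and 90th percentiles."""
    def x(p):
        return math.exp(mu + sigma * STD.inv_cdf(p))
    return 0.3 * x(0.10) + 0.4 * x(0.50) + 0.3 * x(0.90)


mu, sigma = 1.61, 1.67  # parameters of Rose's (2001) example
full = math.exp(mu + sigma**2 / 2)
truncated = lognormal_truncated_mean(mu, sigma)
sr = sr_population(mu, sigma)
print(f"SR vs. full mean: {sr / full - 1:+.1%}; SR vs. truncated mean: {sr / truncated - 1:+.1%}")
```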
Arild et al. (2008) compared SR to MC simulation in the context of the value of information (VOI). VOI investigates whether additional information should be collected prior to making a decision. They showed that the results obtained from SR and MC simulation differ and suggested using MC simulation instead of SR, although they could not determine which method has the smaller bias because their problem had no analytical solution.
While numerous studies have used SR to estimate a mean value, only a few researchers, such as Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011), have compared the performance of SR with that of other mean estimators in terms of bias. A few studies have also considered the ability of SR to estimate higher moments.
Keefer and Bodily (1983) used both PT and SR to approximate a population variance¹ and referred to these as extended SR and PT, although neither Pearson and Tukey (1965) nor Swanson (Megill, 1984) recommended PT and SR as estimators of higher moments; they proposed them only for estimating mean values. Keefer and Bodily (1983) numerically investigated the performances of PT and SR for a wide range of beta distributions and a limited range of log-normal distributions with log-mean $\mu = 0$ and log-standard deviation $\sigma \in [0.1, 1.5]$, judging their ability to approximate the mean and variance. They concluded that both PT and SR perform well as mean estimators, while PT estimates the variance more accurately than SR. Therefore, they advocated PT as the “clear winner”. Keefer and Bodily (1983) also qualitatively commented that it is more difficult to accurately approximate the 5th and 95th percentiles than percentiles closer to the center of a distribution. Hence they recommended SR as an alternative mean estimator if staying close to the center of the distribution is the main concern, because SR uses the 10th and 90th percentiles to estimate a mean value.
¹ $\mathrm{var}(x)_{PT} = 0.185\,x_5^2 + 0.63\,x_{50}^2 + 0.185\,x_{95}^2 - (x_{PT})^2$ and $\mathrm{var}(x)_{SR} = 0.3\,x_{10}^2 + 0.4\,x_{50}^2 + 0.3\,x_{90}^2 - (x_{SR})^2$ are the population variances estimated using PT and SR, respectively.
Megill (1984) investigated the bias of SR for the case of a log-normal distribution. He plotted the ratio $x_{SR}/E(X)$ versus the ratio $x_{90}/x_{50}$ $(= e^{1.28\sigma})$, varying from one to 15 (i.e., $\sigma$ = 0 to 2.1), as a measure of the variability of $X$, where $\sigma$ is the log-standard deviation. He stated that SR is “close” to the mean value generated by 5000 iterations of MC simulation for modestly skewed distributions; however, it becomes significantly biased as the distribution becomes highly skewed. For example, SR underestimates the mean by 10% when the ratio $x_{90}/x_{50}$ is 5, and the bias increases to 45% when the ratio reaches 15.
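For a log-normal population, Megill’s ratio has a closed form. Writing $r = x_{90}/x_{50} = e^{z_{0.9}\sigma}$ gives $x_{10}/x_{50} = 1/r$, so $x_{SR}/x_{50} = 0.4 + 0.3(r + 1/r)$ and $E(X)/x_{50} = e^{\sigma^2/2}$. The sketch below evaluates this population-level ratio (no sampling, and the unrounded $z_{0.9} \approx 1.2816$ instead of 1.28); it gives underestimates of about 11% at $r = 5$ and about 47% at $r = 15$, close to the simulation-based values Megill reported. The function name is illustrative.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)  # about 1.2816


def sr_to_mean_ratio(r):
    """x_SR / E(X) for a log-normal population with r = x90 / x50 (r >= 1)."""
    sigma = math.log(r) / Z90  # log-standard deviation implied by r
    return (0.4 + 0.3 * (r + 1.0 / r)) / math.exp(sigma**2 / 2)


for r in (1, 2, 5, 10, 15):
    print(r, f"{1.0 - sr_to_mean_ratio(r):.1%} underestimate")
```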
Recently, Bickel et al. (2011) studied the biases of different discretization methods. They described three different approaches to deriving discretization methods, in which the PDF is approximated by a few representative values and corresponding probabilities. They stated that applying moment matching directly to each input distribution yields the maximum accuracy. Hence they applied moment matching to develop weights for the 10th, 50th, and 90th percentiles, and they concluded that SR has no analytical justification for any distribution other than the normal distribution. In other words, the SR weights can be justified by moment matching only when the data are normally distributed. In addition, they investigated the performance of different discretization methods by comparing them to MC simulation in the context of estimating population moments. For this comparison, they analytically derived the number of samples (the S-equivalence) that MC simulation requires to estimate the kth raw moment more accurately than a discretization method with a given probability. They computed the 95% S-equivalence for the uniform $U(0, b)$, normal $N(0, \sigma)$, triangular $T(0, b, b/2)$, exponential $E(\lambda)$, and log-normal $L(\mu, 1)$ distributions. They then demonstrated that the performance of SR is “quite poor”, whereas other discretization methods such as PT and GQN² work well. Therefore, they did not support the use of SR to estimate a mean value and recommended more accurate alternatives such as PT and GQN instead.
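The moment-matching idea can be sketched for a symmetric three-point rule on $N(m, s^2)$ with points $m - zs$, $m$, and $m + zs$ and weights $w$, $1 - 2w$, $w$. Symmetry matches the mean automatically; matching the variance requires $2wz^2 = 1$, and also matching the fourth central moment $3s^4$ requires $2wz^4 = 3$, i.e. $z^2 = 3$. This is the classical three-point Gauss-Hermite construction, offered as an illustration rather than a reproduction of Bickel et al.’s derivation. It gives weights of about 0.304, 0.391, 0.304 at the 10th, 50th, and 90th percentiles, close to SR’s 0.3, 0.4, 0.3, and weights 1/6, 2/3, 1/6 at about the 4.2nd, 50th, and 95.8th percentiles, the GQN weights.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def matching_weight(z):
    """Weight w for the points m - z*s and m + z*s (the center gets 1 - 2w) so that the
    three-point rule reproduces the variance s**2 of N(m, s**2): 2*w*z**2 = 1."""
    return 1.0 / (2.0 * z**2)


# Points at the 10th and 90th percentiles, as in SR.
z90 = STD.inv_cdf(0.90)
w_sr_points = matching_weight(z90)

# Also matching the fourth central moment 3*s**4 requires 2*w*z**4 = 3, i.e. z**2 = 3.
z_gqn = math.sqrt(3.0)
w_gqn = matching_weight(z_gqn)
p_gqn = STD.cdf(z_gqn)  # percentile level of the upper point

print(f"10/50/90 points: weights {w_sr_points:.4f}, {1 - 2 * w_sr_points:.4f}, {w_sr_points:.4f}")
print(f"z = sqrt(3): percentile {100 * p_gqn:.1f}, weights {w_gqn:.4f}, {1 - 2 * w_gqn:.4f}")
```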
Bickel et al. (2011) and Keefer (1994) have suggested PT and GQN as alternative
mean estimators instead of SR. They, however, neglected the uncertainty associated with
PT and GQN caused by using percentiles which are close to the tails of a distribution.
One may wonder why PT and GQN outperformed SR. The first reason is that PT and GQN give weights of 0.63 and 0.667, respectively, to $x_{50}$, while SR assigns a weight of 0.4 to $x_{50}$. Thus $x_{50}$ makes a larger contribution to the mean estimate in PT and GQN than in SR. Steel and Torrie (1980, p. 19) claimed that the AA can be replaced by the median as the mean estimator for a skewed distribution. However, the median is not affected by values that are distant from the center of a distribution, so on its own it cannot be an appropriate representative of the mean value, although it is a good starting point for estimating it. The second reason is that as the standard deviation of the population increases, the data points spread over a larger interval (i.e., the points close to the extremes lie farther from the mean value). As $\sigma$ increases, $x_{10}$ becomes much larger than $x_5$ in the lower tail, and $x_{95}$ becomes much larger than $x_{90}$ in the upper tail. For example, for a log-normal distribution with log-standard deviation $\sigma = 2$, $x_{10}$ and $x_{95}$ are about twice as large as $x_5$ and $x_{90}$, respectively. Therefore, using $x_{10}$ and $x_{90}$ as substitutes for the lower and upper tails of the distribution misses values that are much smaller than $x_{10}$ and much larger than $x_{90}$. On the other hand, $x_5$ and $x_{95}$ are closer to the extremes than $x_{10}$ and $x_{90}$, so they better represent the tails. Reliable estimation of the 5th and 95th percentiles, however, is more difficult than estimation of the 10th and 90th percentiles, especially for large variability.
² GQN is based on applying three-point moment matching to a normal distribution; it estimates a mean value by weighting the 4.2nd, 50th, and 95.8th percentiles by 0.167, 0.667, and 0.167, respectively.
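Both points in the preceding paragraph can be checked for a log-normal population. The sketch below computes the exact ratios $x_{10}/x_5$ and $x_{95}/x_{90}$ for σ = 2 (both about 2.07) and, as one illustrative measure of estimation difficulty, the spread of the logarithm of the sample 90th and 95th percentiles across repeated samples of size 30. It assumes linear-interpolation sample percentiles; the function names and sample size are illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist

STD = NormalDist()


def lognormal_percentile(p, mu, sigma):
    """Exact p-th percentile (0-100) of a log-normal population."""
    return math.exp(mu + sigma * STD.inv_cdf(p / 100.0))


def percentile(sorted_x, p):
    """Sample p-th percentile (0-100) of a sorted list, linear interpolation."""
    pos = (len(sorted_x) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def log_spread(p, n, sigma, reps=3000, seed=3):
    """Standard deviation of ln(sample p-th percentile) over repeated samples of size n."""
    rng = random.Random(seed)
    logs = []
    for _ in range(reps):
        s = sorted(rng.lognormvariate(0.0, sigma) for _ in range(n))
        logs.append(math.log(percentile(s, p)))
    return statistics.stdev(logs)


sigma = 2.0
print(lognormal_percentile(10, 0, sigma) / lognormal_percentile(5, 0, sigma))  # about 2.07
print(lognormal_percentile(95, 0, sigma) / lognormal_percentile(90, 0, sigma))  # about 2.07
print(log_spread(90, 30, sigma), log_spread(95, 30, sigma))
```

Because both spreads use the same seed, they are computed from the same simulated samples, so the comparison between the two percentiles is paired.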
2.3.3 Maximum Likelihood Estimator
The idea behind the MLE is to estimate the parameters of a population such that they maximize the likelihood of the observed sample data. From a statistical point of view, the MLE is the preferred estimator because it is asymptotically efficient and asymptotically normal, and it converges in probability under general conditions as the number of samples becomes large (Kenney and Keeping 1951; Quenouille 1956; Kendall and Stuart 1977). In general, the MLE is biased; however, it is unbiased for some distribution types, such as the normal distribution, and asymptotically unbiased for others, such as the log-normal distribution (Kendall and Stuart 1977, Example 18.2).
Suppose the independent RV $X$ is taken from a population with continuous PDF $h_X(x|\theta)$, where $\theta$ is an unknown parameter. For any observed data set $X_1, \ldots, X_n$, the joint PDF is denoted by $L_n(x|\theta) = \prod_{i=1}^{n} h_{X_i}(x_i|\theta)$, which is called the likelihood function (LF). The parameter $\theta$ is estimated by maximizing $\ln(L)$; thus $\ln(L)$ should be at least a twice-differentiable function with respect to $\theta$.
All local maxima of the LF are found such that
$$\frac{\partial[\ln(L)]}{\partial\theta} = 0 \quad \text{and} \quad \frac{\partial^2[\ln(L)]}{\partial\theta^2} < 0, \tag{2-11}$$
and if there is more than one, the largest one is selected.
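For independent log-normal data, the conditions in Eq. 2-11 can be solved in closed form: $\hat\mu$ is the average of $\ln X_i$, and $\hat\sigma^2$ is the average squared deviation of $\ln X_i$ from $\hat\mu$ (divisor $n$, not $n - 1$). By the invariance property of maximum likelihood, the MLE of the mean is then $e^{\hat\mu + \hat\sigma^2/2}$. A minimal sketch under these assumptions, with an illustrative function name:

```python
import math
import random
import statistics


def lognormal_mle_mean(sample):
    """MLE of E(X) for independent log-normal data: exp(mu_hat + sigma2_hat / 2)."""
    logs = [math.log(x) for x in sample]
    mu_hat = statistics.fmean(logs)  # solves d ln(L) / d mu = 0
    sigma2_hat = statistics.pvariance(logs, mu_hat)  # solves d ln(L) / d sigma^2 = 0 (divisor n)
    return math.exp(mu_hat + sigma2_hat / 2.0)  # invariance property of the MLE


rng = random.Random(13)
sample = [rng.lognormvariate(0.5, 1.0) for _ in range(100_000)]
print(lognormal_mle_mean(sample), math.exp(0.5 + 1.0 / 2))  # estimate vs. true E(X)
```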
2.4 Distribution Types
There are different methods to estimate a mean value, and a few of them are commonly used in the oil and gas industry, as described above. Each of these methods has its own bias and SE, which may differ from one distribution type to another. For example, the bias of the AA does not depend on the PDF: the AA is unbiased for any PDF with a finite mean. Although using SR requires no assumption about the underlying distribution type, the bias of SR differs from one distribution to another. As shown by Bickel et al. (2011), SR has zero bias when the underlying distribution is normal, while it becomes biased for the log-normal distribution. Therefore, knowing the behaviour of the mean estimators under different distribution types assists in selecting the appropriate mean estimator for a given distribution type. This section thus summarizes the distribution types commonly used in the oil and gas industry in order to focus on specific types and study the statistical properties of the mean estimators for each of them.
Law (1944) was one of the first researchers to statistically characterize permeability data sets; he represented permeability variation by a log-normal distribution. Many subsequent investigators have statistically analyzed permeability data sets (e.g., Bennion 1966; Lambert 1981). According to these studies, possible permeability PDFs include the normal, log-normal, and exponential distributions. Later, Jensen et al. (1987) analyzed six permeability data sets and proposed that permeability distributions are not necessarily log-normal and can be transformed into the normal distribution by a power transformation. Therefore, the power-normal distribution is another possible permeability distribution.
In addition to the unimodal log-normal and power-normal distributions, a bimodal distribution can be assigned to reservoir parameters because of geological heterogeneity. For instance, a formation may consist of fractures (high-permeability media) and matrix (shaly media with low permeability). Therefore, the permeability distribution may be either bimodal or unimodal.
The oil and gas field size is another parameter that has been statistically studied by many researchers for decades. Kaufman (1965) examined the Yule, Pareto, and log-normal distributions on some empirical data and concluded that the log-normal is the best distribution type to describe hydrocarbon reserves. Following that, the log-normal distribution was used as the distribution of oil and gas reserves by many researchers, such as Megill (1984), Hurst (2000), Rose (2001), and Cartwright (2007). However, some studies have shown that hydrocarbon reserves are not necessarily log-normally distributed. For example, Seyedghasemipour and Bhattacharyya (1990) introduced the log-hyperbolic distribution as an alternative reserves distribution by studying Denver Basin oil fields, and MacCrossan (1969) showed that the bimodal distribution is another possible reserves distribution by analyzing the oil and gas reserves distributions of Western Canada from 1965 to 1969. MacCrossan (1969) showed that the size-frequency distributions of ultimate recoverable reserves of the Viking, Rainbow Reef, and Nisku oil pools and the Leduc and Wabamun gas pools are bimodal, and that each of these bimodal distributions can be split into two log-normal distributions. He described a few hypotheses to explain the conditions under which reserves can have a bimodal distribution. First, bimodality may be a consequence of mixing two geologically different groups that have dissimilar reservoir characteristics, such as porosity. Second, it might result from combining large pools, whose mean is larger because of enhanced recovery, with smaller pools under primary production. Moreover, Laherrere and Sornette (1998) have proposed the more flexible power-normal distribution as a reserves distribution.
Consequently, for the purposes of this study, the variable $X$ is assumed to be randomly drawn from populations with one of three PDFs: log-normal, power-normal, and bimodal, where the bimodal distribution is described by a combination of two log-normal distributions.
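A bimodal population of the kind MacCrossan (1969) described can be sketched as a mixture of two log-normal components. The mixture mean is the weighted sum of the component means, $E(X) = \sum_j p_j e^{\mu_j + \sigma_j^2/2}$, and its percentiles, which SR and PT need, can be found by inverting the mixture CDF numerically. The component weights and parameters below are hypothetical values chosen only for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical bimodal population: (weight, log-mean, log-standard deviation) per mode.
COMPONENTS = ((0.6, 0.0, 0.5), (0.4, 2.5, 0.5))


def mixture_mean(components):
    """E(X) of a log-normal mixture: weighted sum of component means."""
    return sum(p * math.exp(mu + s**2 / 2) for p, mu, s in components)


def mixture_cdf(x, components):
    """P(X <= x) for x > 0."""
    return sum(p * NormalDist(mu, s).cdf(math.log(x)) for p, mu, s in components)


def mixture_percentile(q, components, lo=1e-6, hi=1e6, iters=200):
    """Invert the mixture CDF by bisection on ln(x); q must lie in (0, 1)."""
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if mixture_cdf(math.exp(mid), components) < q:
            a = mid
        else:
            b = mid
    return math.exp(0.5 * (a + b))


def mixture_sample(n, components, rng):
    """Draw n values: pick a mode by its weight, then draw from that log-normal."""
    weights = [p for p, _, _ in components]
    draws = []
    for _ in range(n):
        _, mu, s = rng.choices(components, weights=weights)[0]
        draws.append(rng.lognormvariate(mu, s))
    return draws


true_mean = mixture_mean(COMPONENTS)
x10, x50, x90 = (mixture_percentile(q, COMPONENTS) for q in (0.10, 0.50, 0.90))
sr_value = 0.3 * x10 + 0.4 * x50 + 0.3 * x90  # Eq. 2-10 on population percentiles
mc_mean = statistics.fmean(mixture_sample(50_000, COMPONENTS, random.Random(21)))
print(round(true_mean, 3), round(mc_mean, 3), round(sr_value / true_mean, 3))
```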
2.5 Gaps in the Existing Body of Knowledge
Being unbiased is desirable, but it is not necessarily the most important factor in selecting an estimator, because it might be possible to correct the bias either by using a correction factor or by modifying the formulation of the mean estimator, as discussed in the following chapters. Therefore, in addition to bias, other properties such as uncertainty, consistency, and efficiency should be assessed in order to choose an optimal mean estimator. Furthermore, a good estimator should simultaneously be unbiased and consistent and have good efficiency and small uncertainty. However, none of Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011) has assessed properties of the mean estimators other than bias. Thus it is of interest to assess the performances of the mean estimators based on bias, uncertainty, consistency, and efficiency.
Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011) have investigated the bias of SR for a log-normal distribution with a limited range of mean and standard deviation. For instance, Megill (1984) reported the bias of SR for a log-normal distribution with $\sigma \in [0, 2.1]$. Hence it is of interest to extend Megill’s results to a wider range of heterogeneity, $\sigma \in [0, 5]$, since reservoir characteristics can be this variable. For example, Rollins et al. (1992) show log-normal probability plots and list characteristics for permeability with $\sigma$ values of 4 (Cotton Valley Formation) and 2.6 (Travis Peak Formation). Seidle and O’Connor (2003) report that the distribution of estimated ultimate gas recoveries (EURs) for the San Juan Basin is log-normal with $\sigma = 2.2$.
As mentioned before, Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011) have evaluated the SR performance specifically for the case of a log-normal distribution and overlooked its performance for other types of distribution. It is well understood, on the other hand, that reservoir parameters such as hydrocarbon reserves and permeability are not necessarily log-normally distributed and can be described by other types of distribution (e.g., bimodal and power-normal distributions). Therefore, this research study investigates the SR performance for bimodal and power-normal distributions in addition to the log-normal distribution and compares it with the performances of other mean estimators.
Rose (2001) has raised a noteworthy aspect of SR; however, his conclusion would have been more persuasive if he had quantitatively studied the bias of SR for a wide range of truncated log-normal distributions. Another issue he overlooked is that, after truncation, 98% of the cumulative distribution function (CDF) is used to calculate the mean value, whereas SR’s formula is based on using 100% of the CDF. Therefore, SR’s formula might need to change to reflect this truncation. This change might be insignificant, but it should be evaluated. It would therefore be of interest to comprehensively evaluate the bias, uncertainty, efficiency, and consistency of SR when the underlying truncated distribution is log-normal with a wide range of variability.
Delfiner (2007) also advocated the use of SR to reduce the pitfalls of estimating
permeability from a Phi-k relationship. He made this comparison for a Phi-k data set
with a correlation coefficient of 0.64. However, it remains unclear whether the result
improved because of the use of SR or because of the change in the method of estimating
permeability from Phi-k (i.e., vertically slicing the Phi-k cross plot and estimating the
permeability mean for each slice). He also did not address whether his method is
applicable to all Phi-k cross plots with different correlation coefficients. Although
Delfiner (2007) showed an improvement in estimating the pseudo-flow profile using SR,
his conclusion would have been more persuasive had he considered the problem with SR
discussed by Megill (1984) and investigated more examples. He could also have
investigated the uncertainty associated with the permeability estimated from the Phi-k
relationship by SR. Thus it remains unclear how precisely and efficiently SR estimates
the permeability for a Phi-k data set with different degrees of heterogeneity. Moreover, he
did not consider the uncertainty associated with estimating the mean via SR for each slice
due to the change in the number of data points in each slice. Some slices contain a few
data points while others contain many; therefore, reliable estimates of $x_{10}$ and $x_{90}$
become difficult for the slices with few data points.
Both Delfiner (2007) and Keefer and Bodily (1983) have mentioned that reliably
estimating the 5th and 95th percentiles is difficult because they are closer to the extremes
than the 10th and 90th percentiles. However, they did not quantitatively investigate the
variability of the $x_5$ and $x_{95}$ estimates in contrast to the $x_{10}$ and $x_{90}$ estimates,
and consequently its effect on estimating the population mean.
Moreover, it remains unclear whether SR is asymptotically unbiased like the MLE
(i.e., whether SR becomes unbiased as a sufficiently large number of samples is drawn
from a population).
Researchers have studied SR performance under the assumption that samples are
independent and identically distributed (i.i.d.). Reservoir parameters such as
permeability, however, might be autocorrelated, and no attention has been paid to SR
performance when samples are autocorrelated. Hence the performance of SR and other
mean estimators when samples are dependent and identically distributed (d.i.d.) is
evaluated in this study.
Chapter 3 : Performance Evaluation for the Case of the Log-Normal Distribution
In his 1984 book, Megill stated that SR offers good estimates of the mean for
modestly skewed distributions but becomes significantly biased as the distribution
becomes highly skewed. In addition to Megill (1984), Keefer and Bodily (1983) and
Bickel et al. (2011) evaluated the bias of SR for a log-normal distribution with a limited
range of mean and standard deviation.
As mentioned before, an optimum mean estimator should simultaneously be unbiased
and consistent and have small uncertainty and high efficiency. Therefore, besides bias,
the other properties of the mean estimators, such as consistency, efficiency, and
uncertainty, should be assessed.
Hence, this chapter evaluates all of the aforementioned properties of SR and compares
them with the properties of the AA, MLE, and PT when the underlying distribution is
log-normal with log-standard deviation σ varying between zero and five, σ ∈ [0, 5],
roughly twice the range of variability studied by previous researchers, since reservoir
parameters can be this variable (Rollins et al. 1992; Seidle and O’Connor 2003). These
properties are derived analytically, generalized in terms of the log-mean μ, and then
verified numerically via MC simulation. Moreover, this chapter proposes two approaches
to de-bias SR: applying a correction factor and modifying the weights of SR.
3.1 Analytical Expressions of Mean Estimators’ Properties
Two assumptions are used in this chapter to derive the properties of the mean estimators
analytically. First, the RVs $X_1, X_2, \ldots, X_n$ are assumed to be independent and
identically distributed (i.i.d.) with PDF $h_X(x)$, where $h_X(x)$ is the log-normal PDF with
$E[\ln(X)] = \mu$ and $Var[\ln(X)] = \sigma^2$. Thus, based on the properties of the log-normal
distribution, $E(X) = e^{\mu + \sigma^2/2}$ and $Var(X) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$, where
E(X) and Var(X) are the expected value and variance of X, respectively. Second, the uth
percentile of X is normally distributed with mean $X_u$ and variance
$(u/100)(1 - u/100)/\left[n\,h_X(X_u)^2\right]$ (Ord and Stuart 1987).
According to the assumptions mentioned above, the expected values, $E(x_T)$, and
variances, $Var(x_T)$, of the AA, SR, PT, and the MLE are as follows.
The statistical properties of the AA are
$$E(x_A) = e^{\mu + \sigma^2/2} \tag{3-1}$$
and
$$Var(x_A) = e^{2\mu + \sigma^2}\left(\frac{e^{\sigma^2} - 1}{n}\right). \tag{3-2}$$
The expected values and variances of SR and PT are given by
$$E(x_{SR}) = e^{\mu}\left(0.3e^{\sigma w_{10}} + 0.4e^{\sigma w_{50}} + 0.3e^{\sigma w_{90}}\right); \tag{3-3}$$
$$Var(x_{SR}) = \frac{2\pi\sigma^2}{n}e^{2\mu}\Big\{0.0081\left(e^{2\sigma w_{10} + w_{10}^2} + e^{2\sigma w_{90} + w_{90}^2}\right) + 0.04 + 0.012\left(e^{\sigma w_{10} + w_{10}^2/2} + e^{\sigma w_{90} + w_{90}^2/2}\right) + 0.0018\,e^{(w_{10}^2 + w_{90}^2)/2}\Big\}; \tag{3-4}$$
$$E(x_{PT}) = e^{\mu}\left(0.185e^{\sigma w_{5}} + 0.630e^{\sigma w_{50}} + 0.185e^{\sigma w_{95}}\right); \tag{3-5}$$
and
$$Var(x_{PT}) = \frac{2\pi\sigma^2}{n}e^{2\mu}\Big\{0.0016\left(e^{2\sigma w_{5} + w_{5}^2} + e^{2\sigma w_{95} + w_{95}^2}\right) + 0.099 + 0.0058\left(e^{\sigma w_{5} + w_{5}^2/2} + e^{\sigma w_{95} + w_{95}^2/2}\right) + 0.00017\,e^{(w_{5}^2 + w_{95}^2)/2}\Big\}, \tag{3-6}$$
where $w_u = \Phi^{-1}(u/100)$; Φ denotes the standard normal cumulative distribution
function; $E(x_{SR})$ and $E(x_{PT})$ are the expected values of SR and PT, respectively; and
$Var(x_{SR})$ and $Var(x_{PT})$ are the variances of SR and PT, respectively (see Appendix A for
derivations).
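As a cross-check on Eqs. 3-1 to 3-6, the sketch below evaluates the mean and variance of any weighted-percentile estimator under the two assumptions stated above. It uses the large-sample covariance of sample percentiles, Cov(x_p, x_q) ≈ p(1 − q)/[n h_X(X_p) h_X(X_q)] for p ≤ q, which is a standard result assumed here rather than restated in the text. The function names are hypothetical, σ must be positive, and `statistics.NormalDist` requires Python 3.8 or later. With Swanson’s weights the variance agrees with Eq. 3-4 to floating-point precision; PT’s coefficients in Eq. 3-6 are rounded, so small differences are expected there.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; w_u = Phi^{-1}(u/100)


def percentile_estimator_moments(weights, percentiles, mu, sigma, n):
    """Approximate E and Var of T = sum_i a_i * x_{u_i} for a log-normal
    population with E[ln X] = mu and Var[ln X] = sigma**2 (sigma > 0).

    Uses the large-sample normal approximation for sample percentiles:
    Cov(x_p, x_q) ~= p(1 - q) / (n h(X_p) h(X_q)) for p <= q.
    """
    probs = [u / 100.0 for u in percentiles]
    w = [STD_NORMAL.inv_cdf(p) for p in probs]
    # Population percentiles X_u = exp(mu + sigma * w_u)
    x_u = [math.exp(mu + sigma * wi) for wi in w]
    mean = sum(a * x for a, x in zip(weights, x_u))
    # Log-normal PDF at X_u: phi(w_u) / (sigma * X_u)
    dens = [STD_NORMAL.pdf(wi) / (sigma * x) for wi, x in zip(w, x_u)]
    var = 0.0
    for i, (a_i, p_i) in enumerate(zip(weights, probs)):
        for j, (a_j, p_j) in enumerate(zip(weights, probs)):
            p, q = min(p_i, p_j), max(p_i, p_j)
            var += a_i * a_j * p * (1.0 - q) / (n * dens[i] * dens[j])
    return mean, var


def swanson_moments(mu, sigma, n):
    """Eqs. 3-3 and 3-4: weights 0.3-0.4-0.3 on the 10th, 50th, 90th percentiles."""
    return percentile_estimator_moments([0.3, 0.4, 0.3], [10, 50, 90], mu, sigma, n)


def pearson_tukey_moments(mu, sigma, n):
    """Eqs. 3-5 and 3-6: weights 0.185-0.630-0.185 on the 5th, 50th, 95th percentiles."""
    return percentile_estimator_moments([0.185, 0.630, 0.185], [5, 50, 95], mu, sigma, n)


def arithmetic_average_moments(mu, sigma, n):
    """Eqs. 3-1 and 3-2."""
    mean = math.exp(mu + sigma**2 / 2)
    var = math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1) / n
    return mean, var


# Bias ratio E(x_T)/E(X) and SE at sigma = 1, n = 100 (mu = 0, so x50 = 1)
e_x = math.exp(0.5)
for name, fn in [("AA", arithmetic_average_moments), ("SR", swanson_moments),
                 ("PT", pearson_tukey_moments)]:
    mean, var = fn(0.0, 1.0, 100)
    print(name, round(mean / e_x, 4), round(math.sqrt(var), 4))
```

Writing the estimator as a generic weighted sum keeps the covariance bookkeeping in one place, so the same function also covers other percentile weightings.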
Furthermore, the expected value and variance of the MLE are given by
$$E(x_{MLE}) = e^{\mu + \sigma^2/2}\,e^{-\frac{(n-1)\sigma^2}{2n}}\left(1 - \frac{\sigma^2}{n-1}\right)^{-\frac{n-1}{2}} \tag{3-7}$$
and
$$Var(x_{MLE}) = e^{2\mu + \sigma^2/n}\left[e^{\sigma^2/n}\left(1 - \frac{2\sigma^2}{n-1}\right)^{-\frac{n-1}{2}} - \left(1 - \frac{\sigma^2}{n-1}\right)^{-(n-1)}\right], \tag{3-8}$$
respectively (see Appendix B for the derivations).
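A direct transcription of Eqs. 3-7 and 3-8 is sketched below. The form of these equations is consistent with an estimator $\exp(\hat{\mu} + s^2/2)$ in which $s^2$ is the sample variance of ln x with divisor n − 1; that reading is an inference from the equations, not a statement from the text. Both expressions are finite only when $2\sigma^2 < n - 1$, so the hypothetical function below rejects other inputs.

```python
import math


def mle_moments(mu, sigma, n):
    """E(x_MLE) and Var(x_MLE) from Eqs. 3-7 and 3-8 (log-normal population).

    Both expressions are finite only when 2 * sigma**2 < n - 1.
    """
    k = n - 1
    if 2 * sigma**2 >= k:
        raise ValueError("Eqs. 3-7 and 3-8 require 2*sigma**2 < n - 1")
    # Eq. 3-7 divided by E(X) = exp(mu + sigma^2/2): the bias ratio
    ratio = math.exp(-k * sigma**2 / (2 * n)) * (1 - sigma**2 / k) ** (-k / 2)
    mean = math.exp(mu + sigma**2 / 2) * ratio
    # Eq. 3-8
    var = math.exp(2 * mu + sigma**2 / n) * (
        math.exp(sigma**2 / n) * (1 - 2 * sigma**2 / k) ** (-k / 2)
        - (1 - sigma**2 / k) ** (-k)
    )
    return mean, var


# The bias ratio drifts toward one as n grows (asymptotic unbiasedness)
for n in (35, 100, 10_000):
    mean, var = mle_moments(0.0, 1.0, n)
    print(n, round(mean / math.exp(0.5), 4), round(math.sqrt(var), 4))
```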
3.2 Validation of Analytical Expressions using Monte Carlo Simulation
The analytical expressions for the mean estimators’ properties derived above are
validated numerically using MC simulation. To this end, m = 10,000 data sets
containing n = 25 to 3,000 samples are drawn randomly from a log-normal distribution
using the inverse cumulative method. In this method, $X = H^{-1}(u)$, where $H^{-1}$ is the
inverse of the CDF, $H_X(x)$, and u is drawn randomly from a uniform distribution on
the interval (0, 1).
The mean estimator T is applied to each of the m data sets, which generates a set of
estimates $\{\hat{x}_{T1}, \ldots, \hat{x}_{Tm}\}$. The expected value of this set is approximated by the
AA, designated $(\hat{x}_T)_A$, and its variance is obtained from
$\sum_{i=1}^{m}\left[\hat{x}_{Ti} - (\hat{x}_T)_A\right]^2/(m - 1)$. The estimated means and variances are used to
compute the biases, SE’s, and RMSE’s numerically.
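The MC procedure can be sketched in Python as follows. Several details are assumptions not fixed by the text: sample percentiles come from `statistics.quantiles` with its default 'exclusive' method, random numbers come from Python’s `random` module, and m is kept small so that the example runs quickly (the study itself uses m = 10,000). Function names are illustrative, and Python 3.8 or later is needed.

```python
import math
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_lognormal(n, mu, sigma, rng):
    """Inverse-CDF sampling: X = H^{-1}(u) = exp(mu + sigma * Phi^{-1}(u))."""
    xs = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # random() may return 0.0; inv_cdf needs 0 < u < 1
            u = rng.random()
        xs.append(math.exp(mu + sigma * STD_NORMAL.inv_cdf(u)))
    return xs


def arithmetic_average(xs):
    return statistics.fmean(xs)


def swanson_rule(xs):
    deciles = statistics.quantiles(xs, n=10)  # 9 cut points: 10th ... 90th
    return 0.3 * deciles[0] + 0.4 * deciles[4] + 0.3 * deciles[8]


def monte_carlo(estimator, mu, sigma, n, m, seed=0):
    """Apply `estimator` to m data sets of size n; return (x_T)_A and the
    sample variance of the m estimates (divisor m - 1)."""
    rng = random.Random(seed)
    estimates = [estimator(sample_lognormal(n, mu, sigma, rng)) for _ in range(m)]
    mean_est = statistics.fmean(estimates)
    var_est = statistics.variance(estimates, mean_est)
    return mean_est, var_est


# Example: sigma = 0.5, n = 100, m = 500 (far fewer data sets than in the study)
mu, sigma, n = 0.0, 0.5, 100
e_x = math.exp(mu + sigma**2 / 2)
for name, est in [("AA", arithmetic_average), ("SR", swanson_rule)]:
    mean_est, var_est = monte_carlo(est, mu, sigma, n, m=500, seed=1)
    print(name, "bias ratio:", round(mean_est / e_x, 4), "SE:", round(math.sqrt(var_est), 4))
```

The ratio of the MC mean to E(X) and the square root of the MC variance are the numerical counterparts of the bias ratio and SE compared against the analytical expressions.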
Bias, the first mean estimator property, is evaluated using the ratio of $E(x_T)$ to $E(X)$
(Table 3-1). Deviation from unity indicates that the mean estimator is biased, and zero
deviation implies that the mean estimator is unbiased. As seen in Table 3-1, the analytical
expressions for the ratio $E(x_T)/E(X)$ are independent of μ; thus these ratios can be
applied to all log-normal distributions with different mean values.
The expressions in Table 3-1 are verified using MC simulation (Fig. 3-1). The
analytical and numerical results for SR and PT match reasonably well, and the
agreement improves as n becomes large (Fig. 3-1a). The analytical and numerical results
for the MLE agree perfectly, but the agreement for the AA depends strongly on n,
especially for large σ, similar to Agterberg’s observations (Agterberg 1974, p. 237)
(Fig. 3-1b).
Table 3-1 – Analytical expressions of E(xT)/E(X).
Estimator    E(x_T)/E(X)
AA           1
SR           $\left(0.3e^{\sigma w_{10}} + 0.4e^{\sigma w_{50}} + 0.3e^{\sigma w_{90}}\right)/e^{\sigma^2/2}$
PT           $\left(0.185e^{\sigma w_{5}} + 0.63e^{\sigma w_{50}} + 0.185e^{\sigma w_{95}}\right)/e^{\sigma^2/2}$
MLE          $e^{-(n-1)\sigma^2/(2n)}\left[1 - \sigma^2/(n-1)\right]^{-(n-1)/2}$

[Figure: (a) E(xT)/E(X) and (xT)A/E(X) of SR and PT versus standard deviation σ (0 to 5), analytical curves and MC results for n = 35 and 100; (b) the same ratios for the AA (analytical; MC, n = 100 and 3000) and the MLE (analytical and MC, n = 35 and 100).]
Fig. 3-1 – Comparison of E(xT)/E(X) and (xT)A/E(X) of (a) SR and PT (b) the AA and MLE.
Fig. 3-2 compares the SE’s obtained from the analytical expressions (Eqs. 3-2, 3-4,
3-6, and 3-8) with those computed by MC simulation for σ = 1 and σ = 1.5. The
analytical expressions for the AA and MLE follow the numerical results perfectly
(Fig. 3-2a and 3-2c); however, there is a slight discrepancy between the analytical and
numerical results for SR, especially for small n (Fig. 3-2b and 3-2d). For PT, the
difference between the analytical and numerical approaches becomes significant for large
σ (Fig. 3-2d). As n increases or σ decreases, these differences approach zero. They might
be caused by assuming that the uth percentile is normally distributed, whereas it actually
has a beta distribution for small n (Ord and Stuart 1987).
[Figure: SD(xT) versus 1/√n, analytical curves and MC results; (a) AA and MLE and (b) SR and PT for σ = 1.0; (c) AA and MLE and (d) SR and PT for σ = 1.5.]
Fig. 3-2 – Standard errors of the AA, MLE, SR, and PT obtained from analytical and numerical
approaches for the cases of σ = 1 and σ = 1.5.
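Table 3-2 – Analytical expressions of RMSE’s of the mean estimators.
Estimator    RMSE
AA           $e^{\mu}\,e^{\sigma^2/2}\sqrt{e^{\sigma^2} - 1}\,\big/\sqrt{n}$
SR           $e^{\mu}\left\{\left(0.3e^{\sigma w_{10}} + 0.4e^{\sigma w_{50}} + 0.3e^{\sigma w_{90}} - e^{\sigma^2/2}\right)^2 + \left(2\pi\sigma^2/n\right) f_{SR}(\sigma)\right\}^{1/2}$,
             where $f_{SR}(\sigma) = 0.0081\left[e^{2\sigma w_{10} + w_{10}^2} + e^{2\sigma w_{90} + w_{90}^2}\right] + 0.04 + 0.012\left[e^{\sigma w_{10} + w_{10}^2/2} + e^{\sigma w_{90} + w_{90}^2/2}\right] + 0.0018\,e^{(w_{10}^2 + w_{90}^2)/2}$
PT           $e^{\mu}\left\{\left(0.185e^{\sigma w_{5}} + 0.630e^{\sigma w_{50}} + 0.185e^{\sigma w_{95}} - e^{\sigma^2/2}\right)^2 + \left(2\pi\sigma^2/n\right) f_{PT}(\sigma)\right\}^{1/2}$,
             where $f_{PT}(\sigma) = 0.0016\left[e^{2\sigma w_{5} + w_{5}^2} + e^{2\sigma w_{95} + w_{95}^2}\right] + 0.099 + 0.0058\left[e^{\sigma w_{5} + w_{5}^2/2} + e^{\sigma w_{95} + w_{95}^2/2}\right] + 0.00017\,e^{(w_{5}^2 + w_{95}^2)/2}$
MLE          $e^{\mu}\left\{e^{\sigma^2}\left[e^{-(n-1)\sigma^2/(2n)}\left(1 - \frac{\sigma^2}{n-1}\right)^{-(n-1)/2} - 1\right]^2 + e^{\sigma^2/n}\left[e^{\sigma^2/n}\left(1 - \frac{2\sigma^2}{n-1}\right)^{-(n-1)/2} - \left(1 - \frac{\sigma^2}{n-1}\right)^{-(n-1)}\right]\right\}^{1/2}$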
As described in Chapter 2, the RMSE, which incorporates both the bias and the
variance of a mean estimator, is also derived analytically (Table 3-2). These analytical
expressions are verified by MC simulations for σ = 1 and σ = 1.5 and different n
values (Fig. 3-3). The numerical and analytical results for the AA and MLE
approximately match (Fig. 3-3a and 3-3c). However, there is a difference between the
numerical and analytical results for SR and PT that depends on n and σ (Fig. 3-3b
and 3-3d).
[Figure: RMSE versus 1/√n, analytical curves and MC results; (a) AA and MLE and (b) SR and PT for σ = 1.0; (c) AA and MLE and (d) SR and PT for σ = 1.5.]
Fig. 3-3 – RMSE’s of the AA, MLE, SR, and PT obtained from analytical and numerical approaches
for the cases of σ = 1 and σ = 1.5.
3.3 Analysis of the Analytical Expressions of the Mean Estimators’ Properties
Because the numerical and analytical results agree well for the bias, SE, and RMSE,
the rest of this chapter focuses on comparing the mean estimators’ properties obtained
from the analytical expressions.
As mentioned before, the ratio $E(x_T)/E(X)$ is used to assess the bias of an
estimator T, and any deviation from unity implies that the mean estimator is biased.
According to Table 3-1, the ratio $E(x_A)/E(X)$ is one, which means the AA is an
unbiased estimator, as shown in many statistics textbooks. The ratio $E(x_{MLE})/E(X)$ is a
function of n and approaches one when n becomes very large. In other words, the MLE is
asymptotically unbiased for the log-normal distribution, as has been demonstrated
previously (e.g., Kendall and Stuart 1977) (Fig. 3-4). The ratios $E(x_{SR})/E(X)$ and
$E(x_{PT})/E(X)$ do not deviate appreciably from one when σ is within the interval (0, 1),
whereas the deviations become substantial as σ increases. PT estimates the mean value
with a smaller bias than SR does.
Fig. 3-4 supports Megill’s (1984) conclusion that SR is biased and Keefer and
Bodily’s (1983) finding that PT outperforms SR in terms of bias. Here, however, these
results are extended to log-normal distributions with wider variation, σ ∈ [0, 5].
[Figure: E(xT)/E(X) of the AA, MLE (n = 35, 100, and 10,000), PT, and SR versus σ (0 to 5), with a second horizontal axis in VDP (0.39 to 0.993); the axis is divided into the regions CV < 0.5, 0.5 < CV < 1.0, and CV > 1.0.]
Fig. 3-4 – Analytical ratios of E(xT)/E(X) versus σ and the Dykstra-Parsons coefficient.
In order to study the mean estimators’ characteristics in terms of permeability
variation, the ratio $E(x_T)/E(X)$ is also plotted versus the Dykstra-Parsons coefficient,
VDP (Dykstra and Parsons 1950) (Fig. 3-4, second horizontal axis). VDP is commonly
used as a measure of permeability variation in the oil and gas industry and is expressed by
$$V_{DP} = \frac{k_{50} - k_{16}}{k_{50}}, \tag{3-9}$$
where $k_{50}$ is the median permeability and $k_{16}$ is located one standard deviation below
$k_{50}$ on a log-normal probability plot. VDP varies from zero (homogeneous reservoir) to
unity (infinitely heterogeneous reservoir), with typical values in the range of 0.5 to 0.9
(Willhite 1986, Fig. 5.45). Many researchers, such as Jensen et al. (2000), Lambert (1981),
and Pintos et al. (2011), have used this coefficient and described how it can be
approximated from available data sets. The x-axes of Fig. 3-4 are divided into three
intervals based on the coefficient of variation (CV), which is another way to express
permeability heterogeneity in geological and engineering studies. Jensen et al. (2000)
proposed the following ranges of CV: a homogeneous region with CV ≤ 0.5, a
heterogeneous region with 0.5 < CV ≤ 1, and a very heterogeneous region with CV > 1.
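For a log-normal population, $k_{16} = k_{50}e^{-\sigma}$ on the probability plot, so Eq. 3-9 reduces to $V_{DP} = 1 - e^{-\sigma}$, and the coefficient of variation is $CV = \sqrt{e^{\sigma^2} - 1}$; both are standard log-normal results rather than derivations from the text. The helper names below are hypothetical. The conversion reproduces the VDP values on the secondary axis of Fig. 3-4, for example 0.39, 0.63, 0.78, and 0.86 for σ = 0.5, 1.0, 1.5, and 2.0.

```python
import math


def vdp_from_sigma(sigma):
    """Eq. 3-9 for a log-normal population: k16 = k50 * exp(-sigma)."""
    return 1.0 - math.exp(-sigma)


def sigma_from_vdp(vdp):
    """Inverse of vdp_from_sigma, valid for 0 <= vdp < 1."""
    return -math.log(1.0 - vdp)


def cv_from_sigma(sigma):
    """Coefficient of variation of a log-normal population."""
    return math.sqrt(math.expm1(sigma**2))


def heterogeneity_class(cv):
    """CV ranges attributed to Jensen et al. (2000) in the text."""
    if cv <= 0.5:
        return "homogeneous"
    if cv <= 1.0:
        return "heterogeneous"
    return "very heterogeneous"


for sigma in (0.5, 1.0, 1.5, 2.0):
    cv = cv_from_sigma(sigma)
    print(f"sigma={sigma}: VDP={vdp_from_sigma(sigma):.2f}, CV={cv:.2f} ({heterogeneity_class(cv)})")
```

Because VDP and CV are both monotonic in σ for a log-normal population, either one can be converted to σ before evaluating the estimator ratios.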
According to Fig. 3-4, SR underestimates the population mean by at most 4% when VDP
is less than 0.6; however, the underestimation increases sharply when VDP exceeds 0.75.
In terms of CV, SR underestimates the mean value by at most 2% when CV ≤ 1, but this
underestimation can exceed 20% for CV > 1. A 16% underestimation can be seen in
Delfiner’s data set (Delfiner 2007), with VDP = 0.75 and CV = 1.3.
The RMSE’s and SE’s of the mean estimators are functions of μ, σ, and n (Table 3-2
and Eqs. 3-2, 3-4, 3-6, and 3-8). This dependence on μ implies that the RMSE and SE are
unique to each log-normal distribution. Thus it is of interest to modify the formulas of
the RMSE and SE such that they can be used for all log-normal distributions with
different mean values. This approach makes it possible to predict the behaviour of the
mean estimators by estimating a single parameter, σ, from a data set. Hence, dividing both
sides of Eq. 3-4 (for instance) by $e^{2\mu}$ and taking the square root yields
$$\frac{Std(x_{SR})}{e^{\mu}} = \sqrt{\frac{2\pi\sigma^2}{n}\Big\{0.0081\left(e^{2\sigma w_{10} + w_{10}^2} + e^{2\sigma w_{90} + w_{90}^2}\right) + 0.04 + 0.012\left(e^{\sigma w_{10} + w_{10}^2/2} + e^{\sigma w_{90} + w_{90}^2/2}\right) + 0.0018\,e^{(w_{10}^2 + w_{90}^2)/2}\Big\}}, \tag{3-10}$$
where $Std(x_{SR})$ is the SE of $x_{SR}$ and $e^{\mu} = x_{50}$; hence the new equation is only a function
of σ and n (i.e., the graph of $Std(x_{SR})/x_{50}$ versus σ can be used for all log-normal
distributions with different μ values). The same approach is used to modify the SE’s of
the AA, MLE, and PT. The ratio $Std(x_T)/x_{50}$ is used to compare the degree of
uncertainty of the mean estimators in the rest of this chapter; an estimator with smaller
$Std(x_T)/x_{50}$ has smaller uncertainty.
In general, all SE’s approach zero as n becomes large. For small σ, $Std(x_A)/x_{50}$ and
$Std(x_{MLE})/x_{50}$ are approximately identical and smaller than $Std(x_{SR})/x_{50}$ and
$Std(x_{PT})/x_{50}$ (Fig. 3-5a and 3-5b). Thus, for small σ, the AA has the same uncertainty as
the MLE and less uncertainty than SR and PT. However, when σ exceeds a certain value
depending on n, SR has a smaller SE than the other mean estimators (Fig. 3-5d and
3-6).
The same approach used for generalizing the SE’s is applied to the RMSE’s; hence the
ratio $RMSE/x_{50}$, which is only a function of σ and n, applies to all log-normal
distributions with different μ values. For example, the RMSE of SR can be
modified as
$$\frac{RMSE_{SR}}{x_{50}} = \left\{\left(0.3e^{\sigma w_{10}} + 0.4e^{\sigma w_{50}} + 0.3e^{\sigma w_{90}} - e^{\sigma^2/2}\right)^2 + \frac{2\pi\sigma^2}{n}f_{SR}\right\}^{1/2}, \tag{3-11}$$
where $f_{SR} = 0.0081\left[e^{2\sigma w_{10} + w_{10}^2} + e^{2\sigma w_{90} + w_{90}^2}\right] + 0.04 + 0.012\left[e^{\sigma w_{10} + w_{10}^2/2} + e^{\sigma w_{90} + w_{90}^2/2}\right] + 0.0018\,e^{(w_{10}^2 + w_{90}^2)/2}$.
This ratio is used to evaluate the consistency and efficiency of the mean estimators. An
estimator is consistent when its $RMSE/x_{50}$ tends to zero for large n, and it is the most
efficient when it has the minimum $RMSE/x_{50}$ among the estimators considered.
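The consistency criterion can be checked numerically with a minimal sketch of Eq. 3-11 and the corresponding AA ratio, which follows from Eq. 3-2 because the AA has no bias. The helper names are hypothetical, and $w_{10} = -w_{90}$ is computed with Python’s `statistics.NormalDist` (Python 3.8 or later) rather than taken from a table. As n grows, the AA ratio shrinks toward zero, whereas the SR ratio levels off at its bias term, which for σ = 1 is about 0.085.

```python
import math
from statistics import NormalDist

W90 = NormalDist().inv_cdf(0.90)  # about 1.28155
W10 = -W90                        # w50 = 0, so exp(sigma * w50) = 1


def f_sr(sigma):
    """f_SR in Eq. 3-11 (also Table 3-2)."""
    return (0.0081 * (math.exp(2 * sigma * W10 + W10**2) + math.exp(2 * sigma * W90 + W90**2))
            + 0.04
            + 0.012 * (math.exp(sigma * W10 + W10**2 / 2) + math.exp(sigma * W90 + W90**2 / 2))
            + 0.0018 * math.exp((W10**2 + W90**2) / 2))


def rmse_sr_over_x50(sigma, n):
    """Eq. 3-11: squared bias plus variance, normalized by x50 = exp(mu)."""
    bias = 0.3 * math.exp(sigma * W10) + 0.4 + 0.3 * math.exp(sigma * W90) - math.exp(sigma**2 / 2)
    return math.sqrt(bias**2 + (2 * math.pi * sigma**2 / n) * f_sr(sigma))


def rmse_aa_over_x50(sigma, n):
    """AA is unbiased, so RMSE/x50 = Std(x_A)/x50 from Eq. 3-2."""
    return math.exp(sigma**2 / 2) * math.sqrt(math.expm1(sigma**2) / n)


# AA ratio tends to zero (consistent); SR ratio levels off at |bias|/x50 (inconsistent).
for n in (100, 10_000, 1_000_000):
    print(n, round(rmse_aa_over_x50(1.0, n), 5), round(rmse_sr_over_x50(1.0, n), 5))
```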
[Figure: SD(xT)/x50 of the AA, MLE, SR, and PT versus 1/√n for (a) σ = 0.05, VDP = 0.05; (b) σ = 0.5, VDP = 0.39; (c) σ = 1.0, VDP = 0.63; (d) σ = 1.5, VDP = 0.78.]
Fig. 3-5 – Ratio of SE’s to x50 of the AA, MLE, SR, and PT for four different σ values.
[Figure: std(xT)/x50 (logarithmic scale) of the AA, MLE, PT, and SR versus σ (0 to 5), with a second horizontal axis in VDP, for (a) n = 50 and (b) n = 600.]
Fig. 3-6 – Ratios of SE/x50 of the AA, SR, MLE, and PT versus σ and VDP for (a) n=50 and (b) n=600.
Figures 3-7a to 3-7d present the analytical expressions of $RMSE/x_{50}$ for the AA,
MLE, SR, and PT versus $1/\sqrt{n}$ for four different values of σ and VDP. When σ is small,
the AA is as efficient as the MLE and more efficient than SR and PT; SR is slightly more
efficient than PT (Fig. 3-7a and 3-7b). When σ reaches 1 and n < 150, SR is still less
efficient than the AA and MLE but more efficient than PT. However, for n ≥ 150, PT
becomes more efficient than SR (Fig. 3-7c). As σ increases to 1.5, SR becomes the most
efficient estimator for a very small number of samples (n ≤ 50); nevertheless, as n
increases, the AA, MLE, and PT all become more efficient than SR (Fig. 3-7d).
Therefore, SR becomes more efficient than the others only for certain ranges of n and σ.
In other words, SR becomes more efficient than the AA, MLE, and PT when σ exceeds a
certain value that depends on n (Fig. 3-8).
[Figure: RMSE/x50 of the AA, MLE, SR, and PT versus 1/√n for (a) σ = 0.05, VDP = 0.05; (b) σ = 0.5, VDP = 0.39; (c) σ = 1.0, VDP = 0.63; (d) σ = 1.5, VDP = 0.78.]
Fig. 3-7 – Ratio of RMSE to x50 of the AA, SR, MLE, and PT for four different σ values.
Equating the analytical expression for the RMSE of SR to the analytical expressions for
the RMSE’s of the AA, MLE, and PT yields the intervals in which SR is more efficient
than the other estimators (Eqs. 3-12 through 3-14). If
$$\begin{cases} n < 100, & \sigma \ge \dfrac{7.68}{n} - \dfrac{3.49}{\sqrt{n}} + 1.37,\\[6pt] n \ge 100, & \sigma \ge \dfrac{50169}{n^2} - \dfrac{12188}{n^{3/2}} + \dfrac{1010}{n} - \dfrac{51.18}{\sqrt{n}} + 3.39, \end{cases} \tag{3-12}$$
SR is more efficient than the AA; if
$$\begin{cases} n < 50, & \sigma \ge 1,\\[6pt] 50 \le n < 800, & \sigma \ge \dfrac{15700}{n^2} - \dfrac{7658.1}{n^{3/2}} + \dfrac{1344.5}{n} - \dfrac{131.62}{\sqrt{n}} + 8.30,\\[6pt] n \ge 800, & \sigma \ge 5, \end{cases} \tag{3-13}$$
SR is more efficient than the MLE; and SR is more efficient than PT if
$$\sigma < -\frac{386.27}{n^2} - \frac{31.76}{n^{3/2}} + \frac{35.95}{n} + \frac{8.8}{\sqrt{n}} + 0.17 \quad \text{for any } n. \tag{3-14}$$
For example, when n = 600, SR is more efficient than the AA and MLE when σ is
greater than approximately 2.3 and 4.7, respectively, and has a smaller RMSE than PT
when σ is smaller than 0.6 (Fig. 3-8b).
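Instead of the fitted bounds in Eqs. 3-12 to 3-14, the crossover σ for a given n can also be located numerically by bisection on the analytical RMSE/x50 ratios. The sketch below does this for SR versus the AA only. It assumes that the RMSE difference changes sign exactly once in the search interval, its function names are hypothetical, and the values it finds need not coincide exactly with the fitted polynomials.

```python
import math
from statistics import NormalDist

W90 = NormalDist().inv_cdf(0.90)  # w10 = -w90, w50 = 0


def rmse_sr(sigma, n):
    """RMSE/x50 of Swanson's rule (Eq. 3-11)."""
    a, b = sigma * W90, W90**2
    f = (0.0081 * (math.exp(-2 * a + b) + math.exp(2 * a + b)) + 0.04
         + 0.012 * (math.exp(-a + b / 2) + math.exp(a + b / 2)) + 0.0018 * math.exp(b))
    bias = 0.3 * math.exp(-a) + 0.4 + 0.3 * math.exp(a) - math.exp(sigma**2 / 2)
    return math.sqrt(bias**2 + 2 * math.pi * sigma**2 / n * f)


def rmse_aa(sigma, n):
    """RMSE/x50 of the arithmetic average (unbiased, Eq. 3-2)."""
    return math.exp(sigma**2 / 2) * math.sqrt(math.expm1(sigma**2) / n)


def crossover_sigma(n, lo=0.05, hi=5.0, tol=1e-6):
    """Bisection for the sigma at which RMSE_SR = RMSE_AA.

    Assumes g(sigma) = RMSE_SR - RMSE_AA changes sign exactly once in [lo, hi];
    above the root, SR has the smaller RMSE.
    """
    def g(s):
        return rmse_sr(s, n) - rmse_aa(s, n)

    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("RMSE difference does not change sign in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


for n in (50, 600):
    print(n, round(crossover_sigma(n), 3))
```

The same bisection applies to the MLE and PT comparisons by swapping in their RMSE/x50 expressions from Table 3-2.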
[Figure: RMSE/x50 (logarithmic scale) of the AA, MLE, PT, and SR versus σ (0 to 5), with a second horizontal axis in VDP, for (a) n = 50 and (b) n = 600.]
Fig. 3-8 – Ratios RMSE/x50 of the AA, MLE, SR, and PT versus σ and VDP when (a) n=50 or (b) n=600.
The RMSE’s of SR and PT never reach zero as the number of samples becomes large
(Fig. 3-7d). The SE’s of SR and PT (square roots of Eqs. 3-4 and 3-6, and Fig. 3-5)
tend to zero as n approaches infinity. However, the ratios $E(x_{SR})/E(X)$ and
$E(x_{PT})/E(X)$ are not functions of n and depend only on σ, so they never tend to one
except at σ = 0 (Table 3-1 and Fig. 3-4). Thus the deviations of the RMSE’s of SR and
PT from zero for large n are due to their biases. This means that both SR and PT are
inconsistent: they converge in probability to a value that differs from the true mean.
3.4 Improving Swanson’s Rule
Unbiasedness is a desirable property, but it is not necessarily the main criterion for
selecting an optimal estimator because, first, bias can be removed by including a
correction factor and, second, other properties of a mean estimator can compensate for its
bias. For example, although SR is biased, it has the smallest SE for large σ for any n and
the smallest RMSE when σ is large and n is small.
Two approaches are used to remove or reduce the bias of SR: (1) multiply $x_{SR}$ by a
correction factor, z, such that the ratio $E(z\,x_{SR})/E(X)$ becomes unity; and (2) change
the weights ω of SR as σ changes.
3.4.1 Adjusting Swanson’s Rule by a Coefficient
The bias of SR is a function of σ and becomes significant as σ increases (Fig. 3-4).
The first approach to improve SR is to multiply it by a coefficient z such that
$x_{SR\text{-}C1} = z\,x_{SR}$ and $E(x_{SR\text{-}C1}) \approx E(X)$. The coefficient is approximated using
generalized reduced gradient nonlinear optimization code³ and is given by
$$z = \exp\left(-0.00771\sigma^4 + 0.105\sigma^3 - 0.043\sigma^2 - 0.00342\sigma\right). \tag{3-15}$$
Because z is a function of σ, σ must be known in order to de-bias SR. However, σ is
not always available, so the sample standard deviation, s, is used instead. s can be
evaluated as $s = \ln(x_{90}/x_{10})/(2w_{90})$, where $x_{10}$ and $x_{90}$ are the 10th and 90th
percentiles, and $w_{90} = -w_{10} \approx 1.28155$.
When n is very small, s is estimated with about 27% error both for a very
heterogeneous case, σ = 5.0, and for a less heterogeneous case, σ = 1.0 (Fig. 3-9a).
Fig. 3-9a also shows that the error associated with estimating s depends strongly on n,
regardless of how heterogeneous the population is; as n increases, s becomes a good
approximation of σ (Fig. 3-9b).
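The first de-biasing approach, with σ in Eq. 3-15 replaced by its percentile-based estimate s, can be sketched as follows. Assumptions: $w_{90}$ is computed with `statistics.NormalDist().inv_cdf(0.9)`, sample percentiles use `statistics.quantiles` with its default method, and the function names are hypothetical. Since $\ln x_{90} - \ln x_{10} = 2\sigma w_{90}$ for a log-normal population, the estimate s recovers σ exactly when population percentiles are supplied.

```python
import math
import random
import statistics
from statistics import NormalDist

W90 = NormalDist().inv_cdf(0.90)  # = -w10, about 1.28155


def correction_factor(sigma):
    """Eq. 3-15: fitted coefficient z(sigma) such that E(z * x_SR) ~ E(X)."""
    return math.exp(-0.00771 * sigma**4 + 0.105 * sigma**3 - 0.043 * sigma**2 - 0.00342 * sigma)


def sigma_from_percentiles(x10, x90):
    """s = ln(x90 / x10) / (2 * w90); for a log-normal, ln x90 - ln x10 = 2 * sigma * w90."""
    return math.log(x90 / x10) / (2 * W90)


def swanson_c1(xs):
    """SR_C1 with sigma replaced by the percentile-based estimate s."""
    deciles = statistics.quantiles(xs, n=10)
    x10, x50, x90 = deciles[0], deciles[4], deciles[8]
    s = sigma_from_percentiles(x10, x90)
    return correction_factor(s) * (0.3 * x10 + 0.4 * x50 + 0.3 * x90)


# Usage on one synthetic log-normal data set (mu = 0, sigma = 1.5, n = 200)
rng = random.Random(7)
data = [rng.lognormvariate(0.0, 1.5) for _ in range(200)]
print("SR_C1 estimate:", round(swanson_c1(data), 3), "true mean:", round(math.exp(1.5**2 / 2), 3))
```

A single data set only illustrates the mechanics; the sampling error of s, and hence of z(s), is what the MC comparison with σ known and unknown quantifies.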
The analytical expressions of the expected value and SE of SRC1 are
$E(x_{SR\text{-}C1}) = z(\sigma)\,E(x_{SR})$ and $std(x_{SR\text{-}C1}) = z(\sigma)\,std(x_{SR})$, respectively. The analytical
results do not follow the MC simulation results except for large n (Fig. 3-10). This
discrepancy might be caused by the assumption that the uth percentile is normally
distributed, whereas it actually has a beta distribution (Ord and Stuart 1987).
³ Microsoft Excel Solver tool was developed by Leon Lasdon, University of Texas at Austin, and Allan
Waren, Cleveland State University.
[Figure: (a) sample SD versus 1/√n for σ = 5.0 and σ = 1.0, analytical and MC; (b) sample SD/σ versus 1/√n, analytical and MC.]
Fig. 3-9 – Sample standard deviation obtained from analytical expression and MC simulation with
error bars showing 95% confidence interval (a) for two different σ values (b) for general case.
[Figure: E(xSR-C1) versus 1/√n, analytical curve and MC results with σ known and unknown, for (a) σ = 0.5, VDP = 0.39; (b) σ = 1.0, VDP = 0.63; (c) σ = 1.5, VDP = 0.78; (d) σ = 2.0, VDP = 0.86.]
Fig. 3-10 – E(xSR‐C1 ) obtained from analytical expression and MC simulation with error bars
showing 95% confidence interval, when σ is either known or unknown.
The error associated with estimating σ causes at most a 17% error in $E(x_{SR\text{-}C1})$
when σ = 2.0, compared with the case in which σ is known; however, this error
decreases rapidly and approaches zero as n increases and/or σ decreases (Fig. 3-10). For
instance, when σ decreases from 2.0 to 1.5, the error drops to about 5% for n = 25 and
becomes zero for very large n (Fig. 3-10c).
3.4.2 Moment Matching with Fixed Values
Another way to de-bias SR is to calculate the weights of SR analytically. Hurst et al.
(2000) theoretically justified the SR weights by using the general form of Eq. 2-10,
given as
$$x_{sw} = \omega x_{10} + (1 - 2\omega)x_{50} + \omega x_{90}, \tag{3-16}$$
which preserves symmetry but allows the weights to vary. They showed that the weights
are identical to the weights originally proposed by Swanson (Megill 1984) when σ
approaches zero, but they deviate from the 0.3-0.4-0.3 rule as σ increases. Bickel et al.
(2011) stated that directly applying moment matching to each sample distribution yields
maximum accuracy; thus they applied moment matching to uniform, normal, exponential,
and triangular distributions to derive discretization methods with fixed values or fixed
probabilities. They concluded that SR has no analytical justification for any distribution
other than the normal distribution.
Moment matching is a form of Gaussian quadrature,
$$\int_X h(\xi)\,\xi^r\,d\xi = E(X^r) \approx \sum_{i=1}^{N} P_i X_i^r \quad \text{for } r = 0, 1, \ldots, N. \tag{3-17}$$
It approximates the rth non-centered moment, $E(X^r)$, by the sum $\sum_{i=1}^{N} P_i X_i^r$, where N
is the number of probability-value pairs, and 2N moments can be approximated by N
points (Miller and Rice 1983). Hurst et al. (2000) used moment matching assuming
N = 3, with the 10th, 50th, and 90th percentiles as the $X_1$, $X_2$, and $X_3$ values,
respectively, and $P_1 = P_3 = \omega$. Taking the expected value of Eq. 3-16 and equating it to
E(X) gives the weight ω as
$$\omega = \frac{e^{\sigma^2/2} - 1}{e^{\sigma w_{10}} - 2 + e^{\sigma w_{90}}}, \tag{3-18}$$
where $w_{10} = -w_{90} \approx -1.28155$. Because ω is a function of σ and independent of the
population mean, this formulation can be used for all log-normal distributions with
different mean values.
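Eq. 3-18 translates directly into code; the sketch below also checks that the resulting weights satisfy the first-moment condition behind it. The denominator $e^{\sigma w_{10}} - 2 + e^{\sigma w_{90}}$ is rewritten as $4\sinh^2(\sigma w_{90}/2)$, an algebraically identical form that avoids cancellation for small σ. As σ → 0, ω tends to $1/(2w_{90}^2) \approx 0.304$, close to Swanson’s 0.3; for σ = 2 the formula gives ω ≈ 0.578, so the median weight 1 − 2ω becomes negative. Function names are hypothetical.

```python
import math
from statistics import NormalDist

W90 = NormalDist().inv_cdf(0.90)
W10 = -W90


def swanson_weight(sigma):
    """Eq. 3-18: weight omega that makes E(x_sw) = E(X) for a log-normal population.

    The denominator exp(sigma*w10) - 2 + exp(sigma*w90) equals 4*sinh(sigma*w90/2)**2,
    which is used here to avoid cancellation when sigma is small.
    """
    if sigma == 0:
        return 1.0 / (2.0 * W90**2)  # limiting value, about 0.304
    return math.expm1(sigma**2 / 2) / (4.0 * math.sinh(sigma * W90 / 2) ** 2)


def expected_x_sw(mu, sigma, omega):
    """Expected value of Eq. 3-16 using population percentiles X_u = exp(mu + sigma*w_u)."""
    return math.exp(mu) * (omega * math.exp(sigma * W10) + (1 - 2 * omega) + omega * math.exp(sigma * W90))


for sigma in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(sigma, round(swanson_weight(sigma), 3))
```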
As derived in Eq. 3-18, SR can be converted into an unbiased mean estimator,
designated $x_{SR\text{-}C2}$, by modifying the weights based on σ. As mentioned before, σ is not
always known, so s is used instead. The error associated with estimating σ causes,
for instance, at most a 15% error in estimating the weights of SR for a heterogeneous case
with σ = 2.0 and n = 200 (Fig. 3-11). This error leads to at most a 20% error in
$E(x_{SR\text{-}C2})$ when σ = 2.0, compared with the case in which σ is known, but it rapidly
approaches zero as n increases and/or σ decreases (e.g., the error decreases from 20% to
7% when σ = 1.5) (Fig. 3-12).
[Figure: weight of SR (ω) versus σ (0 to 3), with a second horizontal axis in VDP, for n = 200, with σ known and σ unknown.]
Fig. 3-11 – Weights of SR versus σ, where σ is known and unknown with error bars showing 95%
confidence interval.
[Figure: E(xSR-C2) versus 1/√n, analytical curve and MC results with σ known and unknown, for (a) σ = 0.5, VDP = 0.39; (b) σ = 1.0, VDP = 0.63; (c) σ = 1.5, VDP = 0.78; (d) σ = 2.0, VDP = 0.86.]
Fig. 3-12 – E(xSR‐C2 ) obtained from analytical expression and numerically calculated using MC
simulation with error bars showing 95% confidence interval, when σ is either known or unknown.
[Figure: E(xT)/E(X) of the MLE (n = 35 and 1,000), PT, SR, SR_C1, and SR_C2 versus σ (0 to 5), with a second horizontal axis in VDP, for n = 200.]
Fig. 3-13 – Ratio of the expected values of SR, SRC1, SRC2, PT, and the MLE to E(X).
Both de-biased SR approaches make SR unbiased and, depending on n and σ, the most
efficient estimator (Fig. 3-13 and 3-14a); however, they produce larger SE’s than the
original SR when σ exceeds a certain value (Fig. 3-14b). In other words, although these
modifications remove the bias of SR, they increase its SE, and this increase depends on σ
and can become appreciable for large σ. Compared with the AA and MLE, however,
SRC1 and SRC2 have smaller SE’s.
[Figure: (a) RMSE/x50 and (b) std(xT)/x50 (logarithmic scale) of the AA, MLE, PT, SR, SR_C1, and SR_C2 versus σ (0 to 3), with a second horizontal axis in VDP, for n = 200.]
Fig. 3-14 – (a) RMSE/x50 and (b) SE/x50 of the AA, MLE, PT, SR, SRC1, and SRC2 versus σ and VDP
when n=200.
[Figure: RMSE/x50 of the AA, MLE, PT, SR, SR_C1, and SR_C2 versus 1/√n in four panels, for σ = 0.05 (VDP = 0.05), σ = 0.5 (VDP = 0.39), σ = 1.0 (VDP = 0.63), and σ = 1.5 (VDP = 0.78)]
Fig. 3-15 – Ratio of the RMSE’s of the AA, MLE, PT, SR, SRC1, and SRC2 to x50 versus the square root of the inverse of sample size.
The comparison of the RMSE’s of unbiased estimators reduces to the comparison of their SE’s, because the RMSE of an unbiased estimator equals its SE. When σ is small, the AA and MLE are the most efficient mean estimators (Figs. 3-15a and 3-15b). However, for large σ’s, SRC1 and SRC2 are more efficient than the AA, MLE, and SR for some ranges of n (Figs. 3-15c and 3-15d).
3.5 Concluding Remarks
This chapter shows that SR, unlike MLE, is not asymptotically unbiased and significantly underestimates the mean value in highly heterogeneous cases. However, it becomes more efficient than the AA, MLE, and PT when σ becomes large. Hence, there are statistical benefits to using SR as an alternative mean estimator under some conditions, but SR must be used with care because its bias can diminish its other advantages.
This chapter also finds that the de-biased SR’s become consistent and, for certain ranges of variability and sample size, the most efficient mean estimators among all estimators considered here.
38 Bimodal Distribution
Chapter 4: Performance Evaluation for the Case of Bimodal Distribution
Megill (1984) graphically showed that SR estimates a mean value of modestly
skewed log-normal distributions with acceptable error, for instance 5.0% when σ=1;
however, it significantly underestimates the mean value as the distribution becomes
highly skewed. Recently, Bickel et al. (2011) concluded that SR has zero bias when the population is normally distributed. In other words, the performance of SR differs from one distribution to another.
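As a quick illustration of this distribution dependence, the Python sketch below (not the thesis's code; all helper names are hypothetical) plugs exact population percentiles into the SR and PT weights. It therefore assumes E(x_u) equals the population percentile X_u and ignores sampling effects. For a normal population both ratios E(x_T)/E(X) equal one, while for a log-normal population with σ = 1 the SR ratio is about 0.949, in line with the roughly 5% error reported by Megill (1984).

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal; Z.inv_cdf(u) is the quantile w_u


def swanson(p10, p50, p90):
    """Swanson's rule (SR): 0.3*P10 + 0.4*P50 + 0.3*P90."""
    return 0.3 * p10 + 0.4 * p50 + 0.3 * p90


def pearson_tukey(p5, p50, p95):
    """Pearson-Tukey (PT): 0.185*P5 + 0.63*P50 + 0.185*P95."""
    return 0.185 * p5 + 0.63 * p50 + 0.185 * p95


def bias_ratios(quantile, true_mean):
    """E(x_T)/E(X) for SR and PT when population percentiles replace E(x_u)."""
    sr = swanson(quantile(0.10), quantile(0.50), quantile(0.90))
    pt = pearson_tukey(quantile(0.05), quantile(0.50), quantile(0.95))
    return sr / true_mean, pt / true_mean


def normal_case(mu, sigma):
    return bias_ratios(lambda u: mu + sigma * Z.inv_cdf(u), mu)


def lognormal_case(mu, sigma):
    return bias_ratios(lambda u: math.exp(mu + sigma * Z.inv_cdf(u)),
                       math.exp(mu + sigma**2 / 2))


for s in (0.5, 1.0, 2.0):
    r_sr, r_pt = lognormal_case(0.0, s)
    print(f"log-normal, sigma={s}: SR/E(X)={r_sr:.3f}, PT/E(X)={r_pt:.3f}")
print("normal:", normal_case(10.0, 2.0))
```

Because SR and PT are symmetric combinations of symmetric percentiles, any symmetric population gives a ratio of one; the skewness of the log-normal case is what produces the bias.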
Knowing in advance how the mean estimators perform under different distribution types helps in selecting an appropriate mean estimator for a given distribution type. A few studies, such as Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011), have assessed the bias of SR for the case of a log-normal distribution. Thus, it is of interest to study the performance of SR when the underlying distribution is neither normal nor log-normal.
Sometimes, reservoir parameters are better described by a bimodal distribution because of geological heterogeneity. For instance, a formation may consist of high-quality, high-permeability sand interbedded with low-permeability shale, so that permeability might follow a bimodal distribution. Oil and gas field sizes and hydrocarbon reserves are other parameters that are not necessarily log-normally distributed and can be described by bimodal distributions (MacCrossan 1969).
Hence, this chapter evaluates the performance of SR against those of the AA, PT, and MLE. To this end, the bias, consistency, efficiency, and uncertainty of the mean estimators are analytically derived, and these expressions are then numerically validated via MC simulation.
4.1 Analytical Expressions of Mean Estimators’ Properties
In this chapter, it is assumed that the RV’s, X1, …, Xn, are i.i.d. and follow a bimodal distribution that can be split into two log-normal distributions as

$$h_X(x;\mu,\sigma^2,\alpha) = \alpha\, h_{X_1}(x;\mu_1,\sigma_1^2) + (1-\alpha)\, h_{X_2}(x;\mu_2,\sigma_2^2), \tag{4-1}$$

where h_X1(x; μ1, σ1²) and h_X2(x; μ2, σ2²) are the PDF’s of two log-normal distributions with log-means μ1 and μ2 and log-variances σ1² and σ2², and α, which varies from zero to one, is the proportion of the first distribution in the population (the second contributes 1 − α). The PDF h_X(x; μ, σ², α) can be written as

$$h_X(x;\mu,\sigma^2,\alpha) = \alpha\,\frac{x^{-1}}{\sigma_1\sqrt{2\pi}}\,e^{-\frac{1}{2}\left[\frac{\ln(x)-\mu_1}{\sigma_1}\right]^2} + (1-\alpha)\,\frac{x^{-1}}{\sigma_2\sqrt{2\pi}}\,e^{-\frac{1}{2}\left[\frac{\ln(x)-\mu_2}{\sigma_2}\right]^2}, \tag{4-2}$$

where μ and σ² denote the mean and variance of h_X(x), which are respectively given by

$$E(X) = \alpha\, e^{\mu_1+\frac{\sigma_1^2}{2}} + (1-\alpha)\, e^{\mu_2+\frac{\sigma_2^2}{2}} \tag{4-3}$$

and

$$Var(X) = \alpha\, e^{2\mu_1+2\sigma_1^2} + (1-\alpha)\, e^{2\mu_2+2\sigma_2^2} - \left[\alpha\, e^{\mu_1+\frac{\sigma_1^2}{2}} + (1-\alpha)\, e^{\mu_2+\frac{\sigma_2^2}{2}}\right]^2. \tag{4-4}$$

μ1, μ2, σ1², and σ2² are selected such that the mode of the first log-normal distribution is smaller than the mode of the second (i.e., exp(μ1 − σ1²) < exp(μ2 − σ2²)). Note that not every combination of these five parameters, μ1, σ1², μ2, σ2², and α, yields a bimodal distribution (see Appendix C for detail). For example, the PDF h_X(x; μ1 = 1, σ1², μ2, σ2² = 0.5², α) is bimodal when α lies between the two same-colored curves shown in Fig. 4-1, whose positions depend on σ1 and μ2.
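A minimal numerical sketch of Eqs. 4-2 to 4-4, assuming NumPy is available. The helper `count_modes` is a hypothetical name, not from the thesis: it checks bimodality by counting strict local maxima of the PDF on a fine log-spaced grid, which is one simple way to trace regions like those in Fig. 4-1, and it is not the analytical criterion of Appendix C.

```python
import numpy as np


def lognormal_pdf(x, mu, s):
    """Log-normal density with log-mean mu and log-standard deviation s."""
    return np.exp(-0.5 * ((np.log(x) - mu) / s) ** 2) / (x * s * np.sqrt(2 * np.pi))


def mixture_pdf(x, mu1, s1, mu2, s2, alpha):
    """Eq. 4-2: alpha*h_X1 + (1 - alpha)*h_X2 for x > 0."""
    x = np.asarray(x, dtype=float)
    return alpha * lognormal_pdf(x, mu1, s1) + (1 - alpha) * lognormal_pdf(x, mu2, s2)


def mixture_mean(mu1, s1, mu2, s2, alpha):
    """Eq. 4-3."""
    return alpha * np.exp(mu1 + s1**2 / 2) + (1 - alpha) * np.exp(mu2 + s2**2 / 2)


def mixture_var(mu1, s1, mu2, s2, alpha):
    """Eq. 4-4: E(X^2) - E(X)^2, using E(X_i^2) = exp(2*mu_i + 2*s_i^2)."""
    second = alpha * np.exp(2 * mu1 + 2 * s1**2) + (1 - alpha) * np.exp(2 * mu2 + 2 * s2**2)
    return second - mixture_mean(mu1, s1, mu2, s2, alpha) ** 2


def count_modes(mu1, s1, mu2, s2, alpha, n_grid=20_001):
    """Hypothetical helper: number of strict local maxima of Eq. 4-2 on a log grid."""
    lo = min(mu1 - 6 * s1, mu2 - 6 * s2)
    hi = max(mu1 + 6 * s1, mu2 + 6 * s2)
    x = np.exp(np.linspace(lo, hi, n_grid))
    f = mixture_pdf(x, mu1, s1, mu2, s2, alpha)
    peaks = (f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])
    return int(np.count_nonzero(peaks))


params = (1.0, 0.5, 3.0, 0.5, 0.3)  # mu1, sigma1, mu2, sigma2, alpha (Section 4.2 values)
print(mixture_mean(*params), mixture_var(*params), count_modes(*params))
```

Sweeping α and μ2 for a fixed σ1 and recording where `count_modes` switches between 1 and 2 gives a numerical approximation of one pair of boundary curves.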
[Figure: mixing proportion α (0 to 1) versus µ2 (0 to 5), with pairs of curves bounding the bimodal region for σ1 = 0.05, 0.3, 0.5, 1.0, and 1.5]
Fig. 4-1 – Bimodal region when µ1=1 and σ2=0.5.
As shown before, SR and PT are functions of percentiles, and consequently their statistical properties are functions of the means, variances, and covariances of the percentiles. Hence, the statistical properties of the 5th, 10th, 50th, 90th, and 95th percentiles are derived first. The uth percentile is assumed to be normally distributed with mean X_u and variance u(1 − u)/[n h_X(X_u)²], where X_u = H_X⁻¹(u) and H_X(x) is the CDF (Ord and Stuart 1987). The joint distribution of the uth and vth percentiles is bivariate normal, with covariance u(1 − v)/[n h(x_u) h(x_v)] for u < v (Ord and Stuart 1987).
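These percentile assumptions can be collected into a mean vector and a covariance matrix. The sketch below uses a hypothetical helper in plain Python that builds both for any distribution with a known inverse CDF and density; the percentile levels are fractions (0.1 rather than 10). For a standard normal population the variance of the sample median reduces to π/(2n), which serves as a convenient check.

```python
import math
from statistics import NormalDist


def percentile_moments(probs, quantile, pdf, n):
    """Asymptotic means and covariance matrix of sample percentiles.

    probs: percentile levels as fractions (e.g. 0.1, 0.5, 0.9)
    quantile: population inverse CDF H_X^{-1}(u); pdf: density h_X(x); n: sample size.
    Var(x_u) = u(1-u)/(n h(X_u)^2); Cov(x_u, x_v) = u(1-v)/(n h(X_u) h(X_v)), u < v.
    """
    means = [quantile(u) for u in probs]
    dens = [pdf(x) for x in means]
    cov = [[0.0] * len(probs) for _ in probs]
    for i, u_i in enumerate(probs):
        for j, u_j in enumerate(probs):
            u, v = min(u_i, u_j), max(u_i, u_j)
            cov[i][j] = u * (1 - v) / (n * dens[i] * dens[j])
    return means, cov


Z = NormalDist()
# Check: for a standard normal population, Var(sample median) -> pi / (2n).
_, cov_median = percentile_moments([0.5], Z.inv_cdf, Z.pdf, n=100)
print(cov_median[0][0], math.pi / 200)

# Log-normal example (log-mean 0, log-sd 1) for the three SR percentiles.
log_scale = NormalDist(0.0, 1.0)
q = lambda u: math.exp(log_scale.inv_cdf(u))
h = lambda x: log_scale.pdf(math.log(x)) / x   # change of variables to x
means, cov = percentile_moments([0.1, 0.5, 0.9], q, h, n=200)
```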
Based on the assumptions stated above, the expected values, E(x_T), and variances, Var(x_T), of the AA, SR, PT, and MLE are analytically derived as follows.
According to the properties of the AA, E(x_A) = E(X) and Var(x_A) = Var(X)/n. The statistical properties of the MLE are analytically derived using the same approach as that used to derive the mean and SE of the MLE for the log-normal case (see Appendix D for detail). Therefore, the statistical properties of the MLE are given by

$$E(x_{MLE}) = \alpha\, e^{\mu_1+\frac{\sigma_1^2}{2}}\, e^{-\frac{(n-1)\sigma_1^2}{2n}} \left(1-\frac{\sigma_1^2}{n-1}\right)^{-\frac{n-1}{2}} + (1-\alpha)\, e^{\mu_2+\frac{\sigma_2^2}{2}}\, e^{-\frac{(n-1)\sigma_2^2}{2n}} \left(1-\frac{\sigma_2^2}{n-1}\right)^{-\frac{n-1}{2}} \tag{4-5}$$

and
$$Var(x_{MLE}) = \alpha^2 e^{2\mu_1+\frac{\sigma_1^2}{n}}\left[e^{\frac{\sigma_1^2}{n}}\left(1-\frac{2\sigma_1^2}{n-1}\right)^{-\frac{n-1}{2}} - \left(1-\frac{\sigma_1^2}{n-1}\right)^{-(n-1)}\right] + (1-\alpha)^2 e^{2\mu_2+\frac{\sigma_2^2}{n}}\left[e^{\frac{\sigma_2^2}{n}}\left(1-\frac{2\sigma_2^2}{n-1}\right)^{-\frac{n-1}{2}} - \left(1-\frac{\sigma_2^2}{n-1}\right)^{-(n-1)}\right]. \tag{4-6}$$

The first and second moments of SR and PT are derived by substituting the means, variances, and covariances of the percentiles into the following equations:

$$E(x_{SR}) = 0.3\,E(x_{10}) + 0.4\,E(x_{50}) + 0.3\,E(x_{90}); \tag{4-7}$$

$$Var(x_{SR}) = 0.09\left[var(x_{10}) + var(x_{90})\right] + 0.16\,var(x_{50}) + 0.24\left[cov(x_{10},x_{50}) + cov(x_{50},x_{90})\right] + 0.18\,cov(x_{10},x_{90}); \tag{4-8}$$

$$E(x_{PT}) = 0.185\,E(x_5) + 0.63\,E(x_{50}) + 0.185\,E(x_{95}); \tag{4-9}$$

and

$$Var(x_{PT}) = 0.034\left[var(x_5) + var(x_{95})\right] + 0.397\,var(x_{50}) + 0.233\left[cov(x_5,x_{50}) + cov(x_{50},x_{95})\right] + 0.068\,cov(x_5,x_{95}), \tag{4-10}$$

where E(x_SR) and E(x_PT) are the expected values of SR and PT, respectively, and Var(x_SR) and Var(x_PT) are their respective variances.
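The constants in Eqs. 4-8 and 4-10 follow from expanding Var(Σ w_i x_i) = Σ w_i² var(x_i) + Σ_{i<j} 2 w_i w_j cov(x_i, x_j) with the SR weights (0.3, 0.4, 0.3) and the PT weights (0.185, 0.63, 0.185). The short sketch below (hypothetical names) reproduces them. As an additional, assumed use, the optional `probs` argument also multiplies each term by the percentile factor u(1 − u) or u(1 − v) of the Ord and Stuart covariance, with u and v as fractions; by Eqs. 5-6, 5-8, and 5-9 this is how the numeric constants multiplying A × 10⁻³ in the Chapter 5 variance expressions can be obtained.

```python
from itertools import combinations


def variance_coefficients(weights, probs=None):
    """Coefficients of var(x_i) and cov(x_i, x_j), i < j, in Var(sum_i w_i x_i).

    If probs (increasing percentile levels as fractions) is given, each
    coefficient is further multiplied by u(1 - u) or u(1 - v) with u < v.
    """
    var_terms = []
    for i, w in enumerate(weights):
        c = w * w
        if probs is not None:
            c *= probs[i] * (1 - probs[i])
        var_terms.append(c)
    cov_terms = {}
    for i, j in combinations(range(len(weights)), 2):
        c = 2 * weights[i] * weights[j]
        if probs is not None:
            c *= probs[i] * (1 - probs[j])
        cov_terms[(i, j)] = c
    return var_terms, cov_terms


SR_WEIGHTS, SR_PROBS = (0.3, 0.4, 0.3), (0.10, 0.50, 0.90)
PT_WEIGHTS, PT_PROBS = (0.185, 0.63, 0.185), (0.05, 0.50, 0.95)

for name, w, p in (("SR", SR_WEIGHTS, SR_PROBS), ("PT", PT_WEIGHTS, PT_PROBS)):
    v, c = variance_coefficients(w)
    print(name, "weights only:", [round(x, 4) for x in v],
          {k: round(x, 4) for k, x in c.items()})
    v, c = variance_coefficients(w, p)
    print(name, "x1000 with u(1-v):", [round(1000 * x, 4) for x in v],
          {k: round(1000 * x, 4) for k, x in c.items()})
```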
4.2 Validation of Analytical Expressions using Monte Carlo Simulation
Analytical expressions are numerically validated using MC simulation. For this purpose, m = 10,000 data sets of n = 25 to 3,000 samples each are randomly drawn from a population, as described in the previous chapter, except that the PDF of the underlying distribution is h_X(x) = α h_X1(x; μ1, σ1²) + (1 − α) h_X2(x; μ2, σ2²), where μ1 = 1, μ2 = 3, σ1 varies from 0.05 to 2, σ2 = 0.5, and α = 0.3.
To draw a data set from this population, two subsets, x1i ∈ X1 and x2i ∈ X2, are generated. These two sets are randomly drawn from the entire domains of X1 and X2 using the inverse cumulative method, where X_i = H_i⁻¹(u), H_i⁻¹ is the inverse of the CDF, H_Xi(x_i), of the ith log-normal distribution, i = 1, 2, and u is uniformly distributed over the interval (0, 1). Next, these sets are combined using

$$X = \beta X_1 + (1-\beta) X_2, \tag{4-11}$$

where β is an index that is either zero or one depending on the value of a uniform random number u: β equals zero if u is greater than α; otherwise, β is one. The draws of X1 and X2 are each assumed to be i.i.d., and consequently the draws of X are i.i.d. as well.
Applying the mean estimator, T, to each data set results in the sequence {x̂_T}, where x̂_T is the mean value estimated using T. The expected value, (x̂_T)_A, and variance, Var(x̂_T), of the sequence are approximated by the AA and by the formula Var(x̂_T) = Σ_{i=1}^{m} [x̂_T,i − (x̂_T)_A]² / (m − 1), respectively. (x̂_T)_A and Var(x̂_T) are then used to validate the analytical expressions of the mean estimators’ properties (Fig. 4-2 through Fig. 4-5).
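The sampling scheme of Eq. 4-11 and the summary statistics above can be sketched with NumPy as follows. For brevity, the component values are drawn with NumPy's log-normal generator rather than by explicit inverse-CDF sampling (the two are equivalent in distribution), and m and n are illustrative choices. With μ1 = 1, σ1 = 0.5, μ2 = 3, σ2 = 0.5, and α = 0.3, Eqs. 4-3 and 4-4 give E(X) ≈ 16.86 and Var(X) ≈ 185.1, so the MC mean and variance of the AA should approach 16.86 and 185.1/n.

```python
import numpy as np


def sample_mixture(rng, size, mu1, s1, mu2, s2, alpha):
    """Draw from Eq. 4-1 using the indicator scheme of Eq. 4-11.

    Component values come from NumPy's log-normal generator, which is
    equivalent in distribution to inverse-CDF sampling of each component.
    """
    x1 = rng.lognormal(mu1, s1, size)
    x2 = rng.lognormal(mu2, s2, size)
    u = rng.uniform(size=size)
    beta = u <= alpha            # beta = 1 if u <= alpha, else 0
    return np.where(beta, x1, x2)


def mc_summary(estimates):
    """(x_T)_A and Var(x_T) over the m data sets, with the (m - 1) divisor."""
    return float(np.mean(estimates)), float(np.var(estimates, ddof=1))


rng = np.random.default_rng(2014)
m, n = 10_000, 100                 # illustrative sizes
data = sample_mixture(rng, (m, n), 1.0, 0.5, 3.0, 0.5, 0.3)
aa_mean, aa_var = mc_summary(data.mean(axis=1))   # AA applied to every data set
print(aa_mean, aa_var)   # compare with E(X) and Var(X)/n from Eqs. 4-3 and 4-4
```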
[Figure: (a) E(xA) and (b) SD(xA) versus 1/√n; MC and analytical results for σ1 = 0.5, 1.0, and 1.5]
Fig. 4-2 – (a) Expected value and (b) SE of the AA.
[Figure: (a) E(xMLE) and (b) SD(xMLE) versus 1/√n; MC and analytical results for σ1 = 0.5, 1.0, and 1.5]
Fig. 4-3 – (a) Expected value and (b) SE of MLE.
[Figure: (a) E(xSR) and (b) SD(xSR) versus 1/√n; MC and analytical results for σ1 = 0.5, 1.0, and 1.5]
Fig. 4-4 – (a) Expected value and (b) SE of SR.
[Figure: (a) E(xPT) and (b) SD(xPT) versus 1/√n; MC and analytical results for σ1 = 0.5, 1.0, and 1.5]
Fig. 4-5 – (a) Expected value and (b) SE of PT.
The analytical expressions of the expected values and SE’s of the AA and MLE closely follow the MC simulation results (Figs. 4-2 and 4-3); however, there are slight discrepancies between the MC results and the analytical expressions of SR and PT. These discrepancies reach at most 5.0%, for large σ1 and small n (Figs. 4-4 and 4-5). In general, the analytical expressions reasonably match the numerical results; thus, the rest of this chapter focuses on comparing the mean estimator properties obtained from the analytical expressions.
4.3 Analyses of the Analytical Expressions of Mean Estimators’ Properties
As mentioned in the previous chapter, the ratio E(x_T)/E(X), where E(x_T) is the expected value of the mean estimator T and E(X) is the true mean value, is used to assess bias. Any deviation from one indicates that the mean estimator is biased, and no deviation indicates that it is unbiased. The AA is unbiased since E(x_A)/E(X) = 1, and MLE is asymptotically unbiased since its deviation from unity decreases as n tends to infinity (Fig. 4-6a). However, SR and PT are both biased: they slightly overestimate the mean value for small standard deviations, σ1, but significantly underestimate it once σ1 exceeds one (Fig. 4-6b).
[Figure: E(xT)/E(X) versus standard deviation σ1 (0 to 4); panel (a) the AA and MLE (n = 100 and n = 450); panel (b) SR and PT]
Fig. 4-6 – Ratio E(xT)/E(X) of (a) the AA and MLE, and (b) SR and PT when σ2=0.5.
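The pattern in Fig. 4-6b can be approximated by inserting population percentiles of the mixture into the SR and PT weights, i.e., treating E(x_u) as X_u, which is the large-n limit of the percentile assumption above. The plain-Python sketch below inverts the mixture CDF by bisection on ln x (a hypothetical helper, not the thesis code), using μ1 = 1, μ2 = 3, σ2 = 0.5, and α = 0.3 as in Section 4.2. It gives an SR ratio slightly above one for σ1 = 0.5 and clearly below one for σ1 = 2.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def mixture_cdf_log(lx, mu1, s1, mu2, s2, alpha):
    """H_X evaluated at x = exp(lx) for the mixture of Eq. 4-1."""
    return alpha * Z.cdf((lx - mu1) / s1) + (1 - alpha) * Z.cdf((lx - mu2) / s2)


def mixture_quantile(u, params, lo=-50.0, hi=50.0, iters=200):
    """Hypothetical helper: X_u = H_X^{-1}(u) by bisection on ln(x)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mixture_cdf_log(mid, *params) < u:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


def bimodal_bias_ratios(mu1, s1, mu2, s2, alpha):
    """E(x_SR)/E(X) and E(x_PT)/E(X) with E(x_u) approximated by X_u."""
    p = (mu1, s1, mu2, s2, alpha)
    q = {u: mixture_quantile(u, p) for u in (0.05, 0.10, 0.50, 0.90, 0.95)}
    true_mean = alpha * math.exp(mu1 + s1**2 / 2) + (1 - alpha) * math.exp(mu2 + s2**2 / 2)
    sr = 0.3 * q[0.10] + 0.4 * q[0.50] + 0.3 * q[0.90]
    pt = 0.185 * q[0.05] + 0.63 * q[0.50] + 0.185 * q[0.95]
    return sr / true_mean, pt / true_mean


for s1 in (0.5, 1.0, 1.5, 2.0):
    r_sr, r_pt = bimodal_bias_ratios(1.0, s1, 3.0, 0.5, 0.3)
    print(f"sigma1={s1}: SR ratio={r_sr:.3f}, PT ratio={r_pt:.3f}")
```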
Uncertainty, the second property, is evaluated based on the SE. MLE estimates the mean value with the smallest SE when σ1 ≤ 1.5 (Fig. 4-7a through Fig. 4-7c); however, its SE is larger than those of SR and PT once σ1 exceeds 1.5 (Fig. 4-7d). SR has a smaller SE than PT for any n and σ1, and its SE is at most 0.35% smaller than that of the AA when σ1 ≤ 1.0. However, as σ1 increases, the difference between the SE’s of SR and the AA grows substantially, reaching, for instance, 80% when σ1 = 2.0. For σ1 ≥ 1.5, the AA estimates the mean value with the largest SE, and when σ1 reaches 2.0, SR has the smallest SE (Fig. 4-7d).
Besides bias and uncertainty, efficiency and consistency are evaluated in terms of the RMSE to choose an appropriate mean estimator. As mentioned before, an estimator with a smaller RMSE is more efficient, and an estimator is consistent if its RMSE tends to zero as n becomes very large. A zero RMSE means that both the bias and the SE approach zero for very large n. All SE’s of the mean estimators approach zero as n increases (Fig. 4-7); thus, any non-zero RMSE at large n is caused by a non-zero bias.
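The link between bias, SE, and RMSE used above is RMSE² = bias² + SE². The small sketch below uses a hypothetical estimator whose SE shrinks like c/√n while its bias stays fixed at b (both values are illustrative, not thesis results). As n grows, the RMSE levels off at |b| instead of reaching zero, which is why a biased estimator cannot be consistent.

```python
import math


def rmse(bias, se):
    """RMSE from its decomposition RMSE^2 = bias^2 + SE^2."""
    return math.hypot(bias, se)


def rmse_curve(bias, c, sample_sizes):
    """RMSE of a hypothetical estimator with fixed bias and SE = c / sqrt(n)."""
    return [rmse(bias, c / math.sqrt(n)) for n in sample_sizes]


ns = [25, 100, 1_000, 1_000_000]
unbiased = rmse_curve(0.0, 2.0, ns)   # shrinks toward 0: consistent
biased = rmse_curve(0.5, 2.0, ns)     # levels off at 0.5: inconsistent
for n, a, b in zip(ns, unbiased, biased):
    print(f"n={n:>9}: unbiased RMSE={a:.4f}, biased RMSE={b:.4f}")
```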
[Figure: SD(xT) of the AA, PT, SR, and MLE versus 1/√n in four panels, for σ1 = 0.5, 1.0, 1.5, and 2.0]
Fig. 4-7 – Standard errors of the AA, MLE, SR, and PT for four different values of σ1.
The AA and MLE are consistent; however, the RMSE’s of SR and PT do not approach zero for large n because of their bias, so SR and PT are inconsistent (Fig. 4-8). When σ1 ≤ 1.5, MLE has the smallest RMSE for any n (Fig. 4-8a through Fig. 4-8c); however, MLE has a higher RMSE than SR and PT when σ1 > 1.5 and n < 50 (Fig. 4-8d). For σ1 ≤ 1, the AA and SR have approximately identical RMSE’s; nevertheless, as n becomes very large, the RMSE of the AA becomes smaller and approaches zero, whereas the RMSE of SR tends to a non-zero value. The RMSE of PT is the largest when σ1 ≤ 1, regardless of n, but as σ1 increases to 2.0 and n < 60, PT has the smallest RMSE. The AA becomes the least efficient mean estimator when σ1 ≥ 1.5 and n is moderate, but as n becomes very large, its efficiency improves (Figs. 4-8c and 4-8d).
[Figure: RMSE of the AA, MLE, SR, and PT versus 1/√N in four panels, for σ1 = 0.5, 1.0, 1.5, and 2.0]
Fig. 4-8 – RMSE’s of the AA, MLE, SR, and PT for four different values of σ1.
4.4 Concluding Remarks
This chapter shows that PT has slightly less bias than SR, the AA is unbiased, and MLE is asymptotically unbiased. SR estimates the mean value with smaller uncertainty than the AA and PT for any variability and sample size, but it is more uncertain than MLE for certain ranges of variability and sample size. Although MLE has the highest efficiency for moderate variability, applying MLE to a bimodal distribution is complex, so other mean estimators are preferable. Each of the AA, SR, and PT has the highest efficiency only for certain ranges of variability and sample size. Therefore, SR may be an optimum mean estimator for some ranges of variability and sample size, because its bias is compensated by its smaller uncertainty and higher efficiency.
47 Power-Normal Distribution
Chapter 5: Performance Evaluation for the Case of Power-Normal Distribution
The log-normal distribution is widely used to describe the distributions of reservoir parameters; however, many researchers, such as Lambert (1981), Bennion (1966), and Jensen et al. (1987), have shown that reservoir parameters such as permeability and hydrocarbon reserves are not necessarily log-normally distributed and can be described by other kinds of distribution, such as the power-normal distribution. Yet none of Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011) evaluated the performance of SR when the underlying distribution is power-normal. Thus, the intention of this chapter is to evaluate the performance of SR and compare it with the performances of other commonly used mean estimators, such as the AA and PT, for the case of a power-normal distribution. For this purpose, the mean estimators’ properties, including bias, uncertainty, efficiency, and consistency, are analytically derived and numerically validated using MC simulation.
5.1 Analytical Expressions of Mean Estimators’ Properties
Let the RV X (with i.i.d. samples) be transformable to a normal distribution by the power-normal transformation

$$Y = \begin{cases} \dfrac{X^{\lambda}-1}{\lambda}, & \lambda \neq 0 \\[4pt] \ln(X), & \lambda = 0 \end{cases} \tag{5-1}$$

where −1 ≤ λ ≤ +1 (Box and Cox 1964). The PDF of a power-normally distributed RV, X, with transformed mean μ and variance σ² is expressed as

$$h_X(x) = \begin{cases} \dfrac{1}{\Phi[\mathrm{sign}(\lambda)K]}\,\dfrac{x^{\lambda-1}}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left[\frac{y-\mu}{\sigma}\right]^2}, & x > 0 \\[4pt] 0, & x \le 0 \end{cases} \tag{5-2}$$

where Φ denotes the standard normal CDF and K = 1/(λσ) + μ/σ; −K is the truncation point of the normalized transformed RV Z = (Y − μ)/σ.
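A small sketch of the transformation and of the truncation constant in Eq. 5-2, in plain Python with hypothetical function names. It also shows why truncation matters for the parameter ranges used later in this chapter.

```python
import math
from statistics import NormalDist


def power_transform(x, lam):
    """Eq. 5-1 (Box-Cox): Y = (X**lam - 1)/lam, or ln(X) when lam == 0."""
    return math.log(x) if lam == 0 else (x**lam - 1) / lam


def inverse_power_transform(y, lam):
    """Back-transform X = (lam*Y + 1)**(1/lam); needs lam*Y + 1 > 0 when lam != 0."""
    return math.exp(y) if lam == 0 else (lam * y + 1) ** (1 / lam)


def truncation_constant(mu, sigma, lam):
    """K = 1/(lam*sigma) + mu/sigma; -K is where Z = (Y - mu)/sigma is truncated."""
    if lam == 0:
        return math.inf      # log-normal case: no truncation
    return 1 / (lam * sigma) + mu / sigma


def normalizing_constant(mu, sigma, lam):
    """Phi[sign(lam) K], the denominator of Eq. 5-2."""
    if lam == 0:
        return 1.0
    K = truncation_constant(mu, sigma, lam)
    return NormalDist().cdf(math.copysign(1.0, lam) * K)


print(truncation_constant(5.0, 1.0, 0.5), normalizing_constant(5.0, 1.0, 0.5))
print(truncation_constant(5.0, 12.0, 0.5), normalizing_constant(5.0, 12.0, 0.5))
```

For μ = 5 and σ = 1, K = 7 and Φ(K) is indistinguishable from one. For μ = 5 and σ = 12, however, K ≈ 0.58 and Φ(K) ≈ 0.72, so roughly 28% of the untruncated normal mass would lie below the truncation point.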
The PDF h_X(x) is defined based on the fact that Y follows a truncated normal distribution (TND), i.e., Y ~ TN(μ, σ²). When λ = 1, X follows a TND, and when λ = 0, X is log-normally distributed. In practice, the truncation issue can be resolved by assigning a very large value to μ. As a result, K becomes large enough that Φ[sign(λ)K] ≈ 1, and consequently Y is approximately normally distributed. Gnanadesikan (1977) proposed using X + c instead of X, where c is an adequately large value; however, this method can cause the likelihood function to behave poorly (Atkinson et al. 1991).
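Because Y is simply a truncated normal RV, the moments of X can also be checked by direct numerical quadrature over the truncated standard normal, without the series expansion of Eq. 5-3 below. The sketch assumes λ > 0 and NumPy, and the function names are hypothetical. For λ = 1/2 without appreciable truncation, X = (1 + Y/2)², so E(X) = (1 + μ/2)² + σ²/4, which gives a simple closed-form check.

```python
import math
import numpy as np
from statistics import NormalDist


def power_normal_moment(r, mu, sigma, lam, n_grid=200_001, z_max=12.0):
    """E(X^r) for a power-normal RV with lam > 0, by trapezoidal quadrature of
    (1 + lam*(mu + sigma*z))**(r/lam) against the standard normal density over
    z > -K, divided by Phi(K). Assumes -K < z_max."""
    if lam <= 0:
        raise ValueError("this sketch only handles lam > 0")
    K = 1 / (lam * sigma) + mu / sigma
    z = np.linspace(max(-K, -z_max), z_max, n_grid)
    base = np.maximum(1 + lam * (mu + sigma * z), 0.0)   # clip round-off at the truncation point
    f = base ** (r / lam) * np.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)
    integral = np.sum((f[1:] + f[:-1]) * np.diff(z)) / 2
    return float(integral) / NormalDist().cdf(K)


def half_power_mean(mu, sigma):
    """Closed form for lam = 1/2 without truncation: E[(1 + Y/2)^2]."""
    return (1 + mu / 2) ** 2 + sigma**2 / 4


mu, sigma, lam = 5.0, 1.0, 0.5
m1 = power_normal_moment(1, mu, sigma, lam)
m2 = power_normal_moment(2, mu, sigma, lam)
print(m1, half_power_mean(mu, sigma), m2 - m1**2)   # E(X), closed form, Var(X)
```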
Among the different approaches used to derive the statistical properties of a power-normal distribution, the approach of Freeman and Modarres (2006) is used in this study. The expected value and variance of X are analytically derived based on the fact that Y follows a TND. The rth moment of the RV X is given as

$$E(x^r) = \frac{1}{\Phi[\mathrm{sign}(\lambda)K]} \begin{cases} \left[\displaystyle\sum_{i=0,\,\mathrm{even}}^{\infty} \frac{T^{(i)}(\mu)\,\sigma^i}{2^{i/2}\,(i/2)!} - \sum_{i=0}^{\infty} \frac{T^{(i)}(\mu)\,\sigma^i}{i!}\int_{-\infty}^{-K} z^i\,\phi(z)\,dz\right], & \lambda > 0 \\[12pt] \left[\displaystyle\sum_{i=0,\,\mathrm{even}}^{\infty} \frac{T^{(i)}(\mu)\,\sigma^i}{2^{i/2}\,(i/2)!} - \sum_{i=0}^{\infty} \frac{T^{(i)}(\mu)\,\sigma^i}{i!}\int_{-K}^{\infty} z^i\,\phi(z)\,dz\right], & \lambda < 0 \end{cases} \tag{5-3}$$

where T⁽ⁱ⁾(μ) = (1 + λμ)^(r/λ − i) ∏_{j=0}^{i−1} (r − jλ) is the ith derivative of (1 + λy)^(r/λ) evaluated at y = μ; z = (y − μ)/σ follows a standard TND; and φ is the standard normal PDF (see Appendix E for detail). Thus, E(X) is obtained by setting r = 1, and Var(X) = E(X²) − E(X)². The expected value and variance of the AA are given by E(x_A) = E(X) and Var(x_A) = Var(X)/n, respectively.
To derive the statistical properties of SR and PT, Eqs. 2-9 and 2-10 are rewritten in the general form

$$x_T = \omega x_r + (1-2\omega)\, x_s + \omega x_v, \tag{5-4}$$

where x_T is the mean value estimated using estimator T; x_u is the uth percentile, with r, s, and v being the 10th, 50th, and 90th percentiles for SR and the 5th, 50th, and 95th percentiles for PT; and ω equals 0.3 and 0.185 in the SR and PT formulas, respectively. Taking the expectation and variance of both sides of Eq. 5-4 gives

$$E(x_T) = \omega E(x_r) + (1-2\omega) E(x_s) + \omega E(x_v) \tag{5-5}$$

and

$$Var(x_T) = \omega^2 Var(x_r) + (1-2\omega)^2 Var(x_s) + \omega^2 Var(x_v) + 2\omega(1-2\omega)\,cov(x_r,x_s) + 2\omega(1-2\omega)\,cov(x_s,x_v) + 2\omega^2\,cov(x_r,x_v). \tag{5-6}$$

According to Eqs. 5-5 and 5-6, the first step in analytically deriving the expected values and variances of SR and PT is to calculate the statistical properties of the percentiles. To this end, the uth percentile is assumed to be normally distributed, x_u ~ N(X_u, u(1 − u)/[n h_X(X_u)²]), and the covariance between the uth and vth percentiles is given by u(1 − v)/[n h(x_u) h(x_v)], u < v (Ord and Stuart 1987). Therefore, the expected value and variance of the uth percentile are

$$E(x_u) = (1 + \lambda\sigma w_u^* + \lambda\mu)^{\frac{1}{\lambda}} \tag{5-7}$$

and

$$Var(x_u) = A\, u(1-u)\,(1 + \lambda\sigma w_u^* + \lambda\mu)^{\frac{2(1-\lambda)}{\lambda}}\, e^{w_u^{*2}}, \tag{5-8}$$

where A = (2πσ²/n) Φ²[sign(λ)K], Φ is the standard normal CDF, w_u* = Φ_T⁻¹(u/100), and Φ_T is the truncated standard normal CDF, Φ_T(·) = [Φ(·) − Φ(−K)]/[1 − Φ(−K)]. In the factors u(1 − u) above and u(1 − v) below, u and v are expressed as fractions (e.g., 0.1 for the 10th percentile). The covariance of two percentiles, x_u and x_v, where u < v, is

$$cov(x_u, x_v) = A\, u(1-v)\,(1 + \lambda\sigma w_u^* + \lambda\mu)^{\frac{1-\lambda}{\lambda}}\,(1 + \lambda\sigma w_v^* + \lambda\mu)^{\frac{1-\lambda}{\lambda}}\, e^{\frac{w_u^{*2} + w_v^{*2}}{2}}. \tag{5-9}$$

Substituting Eq. 5-7 into Eq. 5-5 yields the expected values of SR and PT, respectively, as

$$E(x_{SR}) = 0.3\left[(1+\lambda\sigma w_{10}^*+\lambda\mu)^{\frac{1}{\lambda}} + (1+\lambda\sigma w_{90}^*+\lambda\mu)^{\frac{1}{\lambda}}\right] + 0.4\,(1+\lambda\sigma w_{50}^*+\lambda\mu)^{\frac{1}{\lambda}} \tag{5-10}$$

and

$$E(x_{PT}) = 0.185\left[(1+\lambda\sigma w_5^*+\lambda\mu)^{\frac{1}{\lambda}} + (1+\lambda\sigma w_{95}^*+\lambda\mu)^{\frac{1}{\lambda}}\right] + 0.63\,(1+\lambda\sigma w_{50}^*+\lambda\mu)^{\frac{1}{\lambda}}. \tag{5-11}$$
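Applying Eqs. 5-8 and 5-9 to Eq. 5-6 yields the variances of SR and PT, respectively, as

$$\begin{aligned} Var(x_{SR}) = A\times 10^{-3}\Big\{ & 8.1\Big[(1+\lambda\sigma w_{10}^*+\lambda\mu)^{\frac{2(1-\lambda)}{\lambda}} e^{w_{10}^{*2}} + (1+\lambda\sigma w_{90}^*+\lambda\mu)^{\frac{2(1-\lambda)}{\lambda}} e^{w_{90}^{*2}}\Big] + 40\,(1+\lambda\sigma w_{50}^*+\lambda\mu)^{\frac{2(1-\lambda)}{\lambda}} e^{w_{50}^{*2}} \\ & + 12\Big[(1+\lambda\sigma w_{10}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}}(1+\lambda\sigma w_{50}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}} e^{\frac{w_{10}^{*2}+w_{50}^{*2}}{2}} + (1+\lambda\sigma w_{50}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}}(1+\lambda\sigma w_{90}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}} e^{\frac{w_{50}^{*2}+w_{90}^{*2}}{2}}\Big] \\ & + 1.8\,(1+\lambda\sigma w_{10}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}}(1+\lambda\sigma w_{90}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}} e^{\frac{w_{10}^{*2}+w_{90}^{*2}}{2}}\Big\} \end{aligned} \tag{5-12}$$

and

$$\begin{aligned} Var(x_{PT}) = A\times 10^{-3}\Big\{ & 1.63\Big[(1+\lambda\sigma w_{5}^*+\lambda\mu)^{\frac{2(1-\lambda)}{\lambda}} e^{w_{5}^{*2}} + (1+\lambda\sigma w_{95}^*+\lambda\mu)^{\frac{2(1-\lambda)}{\lambda}} e^{w_{95}^{*2}}\Big] + 99.23\,(1+\lambda\sigma w_{50}^*+\lambda\mu)^{\frac{2(1-\lambda)}{\lambda}} e^{w_{50}^{*2}} \\ & + 5.83\Big[(1+\lambda\sigma w_{5}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}}(1+\lambda\sigma w_{50}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}} e^{\frac{w_{5}^{*2}+w_{50}^{*2}}{2}} + (1+\lambda\sigma w_{50}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}}(1+\lambda\sigma w_{95}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}} e^{\frac{w_{50}^{*2}+w_{95}^{*2}}{2}}\Big] \\ & + 0.171\,(1+\lambda\sigma w_{5}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}}(1+\lambda\sigma w_{95}^*+\lambda\mu)^{\frac{1-\lambda}{\lambda}} e^{\frac{w_{5}^{*2}+w_{95}^{*2}}{2}}\Big\}. \end{aligned} \tag{5-13}$$

5.2 Validation of Analytical Expressions using Monte Carlo Simulation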
The analytical expressions derived in the previous section are numerically validated by MC simulation. For this purpose, m = 10,000 data sets of n = 35 to 10,000 samples each are drawn from a power-normal distribution with a transformed mean of μ = 5, a standard deviation σ varying from zero to 12, and an exponent λ = 1/2.
As mentioned before, the RV X can be transformed into the RV Y, where Y ~ TN(μ, σ²); it is therefore easier to generate Y first and then transform it into X using

$$X = (\lambda Y + 1)^{\frac{1}{\lambda}}. \tag{5-14}$$

n samples, y1, …, yn, are randomly drawn from the entire domain of Y as Y = F⁻¹{F(−1/λ) + [1 − F(−1/λ)]u}, where F and F⁻¹ are the CDF and inverse CDF of the normal distribution with mean μ and standard deviation σ, so that Y is truncated at −1/λ (note that F(−1/λ) = Φ(−K), where Φ is the standard normal CDF), and u is drawn from a uniform distribution over the interval [0, 1]. This procedure is repeated m times to generate m data sets. The mean estimators are then applied to each data set, producing a set of mean estimates, {x̂_T1, …, x̂_Tm}. The expected value of this set, (x̂_T)_A, is approximated by the AA, and its variance is calculated using Var(x̂_T) = Σ_{i=1}^{m} [x̂_Ti − (x̂_T)_A]² / (m − 1).
To validate the analytical expressions, the ratios of the MC results to the analytical results are used in this chapter (Fig. 5-1 through Fig. 5-3). Any deviation from one indicates a discrepancy between the MC simulation and the analytical expressions. The ratio (x̂_A)_A/E(x_A) deviates from one by at most 0.7%, and the ratios (x̂_SR)_A/E(x_SR) and (x̂_PT)_A/E(x_PT) differ from one by at most 2% for small n and large σ. As n increases and/or σ decreases, however, the deviation from unity approaches zero. Therefore, in general, the analytical expressions reasonably follow the numerical results, and the rest of this chapter focuses on assessing the mean estimators’ properties based on the analytical expressions.
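The sampling procedure above can be sketched as follows, using NumPy plus the standard-library NormalDist and assuming λ > 0. Sample percentiles are taken with numpy.percentile's default linear interpolation, which is an assumption since the percentile definition used for the MC runs is not stated here, and m is reduced for speed. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def sample_power_normal(rng, size, mu, sigma, lam):
    """Inverse-CDF draws of Y ~ N(mu, sigma^2) truncated below at -1/lam (lam > 0),
    back-transformed with Eq. 5-14."""
    F = NormalDist(mu, sigma)
    p0 = F.cdf(-1.0 / lam)                                 # F(-1/lam) = Phi(-K)
    u = rng.uniform(size=size)
    p = np.clip(p0 + (1.0 - p0) * u, 1e-15, 1 - 1e-15)    # keep inv_cdf arguments in (0, 1)
    y = np.vectorize(F.inv_cdf)(p)
    return np.maximum(lam * y + 1.0, 0.0) ** (1.0 / lam)


def apply_estimators(data):
    """AA, SR, and PT for every row (data set); NumPy's default percentile interpolation."""
    p5, p10, p50, p90, p95 = np.percentile(data, [5, 10, 50, 90, 95], axis=1)
    aa = data.mean(axis=1)
    sr = 0.3 * p10 + 0.4 * p50 + 0.3 * p90
    pt = 0.185 * p5 + 0.63 * p50 + 0.185 * p95
    return aa, sr, pt


rng = np.random.default_rng(7)
m, n = 2_000, 100               # smaller than the m = 10,000 used in the text, for speed
data = sample_power_normal(rng, (m, n), mu=5.0, sigma=1.0, lam=0.5)
aa, sr, pt = apply_estimators(data)
for name, est in (("AA", aa), ("SR", sr), ("PT", pt)):
    print(name, est.mean(), est.var(ddof=1))   # (x_T)_A and Var(x_T)
```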
[Figure: ratios of MC to analytical results for the AA versus standard deviation σ (0 to 12), for n = 35, 600, 2,000, and 10,000; panels (a) (x̂A)A/E(xA), (b) SD(x̂A)/SD(xA), and (c) RMSE(x̂A)/RMSE(xA)]
Fig. 5-1 – Ratios of MC to analytical results of (a) expected value, (b) SE, and (c) RMSE of the AA for the case of square root power-normal distribution.
[Figure: ratios of MC to analytical results for SR versus standard deviation σ (0 to 12), for n = 35, 600, 2,000, and 10,000; panels (a) (x̂SR)A/E(xSR), (b) SD(x̂SR)/SD(xSR), and (c) RMSE(x̂SR)/RMSE(xSR)]
Fig. 5-2 – Ratios of MC to analytical results of the (a) expected value, (b) SE, and (c) RMSE of SR for square root power-normal distribution.
[Figure: ratios of MC to analytical results for PT versus standard deviation σ (0 to 12), for n = 35, 600, 2,000, and 10,000; panels (a) (x̂PT)A/E(xPT), (b) SD(x̂PT)/SD(xPT), and (c) RMSE(x̂PT)/RMSE(xPT)]
Fig. 5-3 – Ratios of MC to analytical results of the (a) expected value, (b) SE, and (c) RMSE of the PT for square root power-normal distribution.
5.3 Analyses of Mean Estimators’ Properties
As mentioned previously, the mean estimators’ performances are evaluated based on their statistical properties, including bias, uncertainty, efficiency, and consistency. Bias is evaluated using the ratio of E(x_T) to E(X), where E(x_T) is the expected value of the mean estimator T and E(X) is the true mean (Fig. 5-4). The mean estimator T is unbiased when this ratio is one, and any deviation from unity indicates that it is biased. Since E(x_A) = E(X), the ratio is one and the AA is unbiased, whereas SR and PT are biased for any λ except λ = 1 (Fig. 5-4). Their biases are unimportant when λ is close to one (e.g., λ = 1/2); nonetheless, the biases become significant as λ approaches zero. PT estimates the mean value with less bias than SR; for instance, when λ = 1/4, PT estimates the mean value with a 2% error whereas SR gives a 14% error.
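The bias ratios discussed above follow from Eqs. 5-7, 5-10, and 5-11. The sketch below evaluates them with the truncated quantiles w_u* = Φ_T⁻¹(u) (u as a fraction) and computes E(X) by trapezoidal quadrature rather than the series of Eq. 5-3. It assumes λ > 0, uses hypothetical helper names, and is meant as a cross-check rather than the thesis's implementation. For μ = 5 and σ = 1 it gives an SR ratio of about 0.9997 for λ = 1/2 and one, to rounding, for λ = 1.

```python
import math
import numpy as np
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def truncation_K(mu, sigma, lam):
    return 1 / (lam * sigma) + mu / sigma


def w_star(u, K):
    """w_u* = Phi_T^{-1}(u) for a standard normal truncated below at -K (u as a fraction)."""
    p0 = PHI.cdf(-K)
    return PHI.inv_cdf(p0 + u * (1 - p0))


def percentile_mean(u, mu, sigma, lam):
    """Eq. 5-7: E(x_u) = (1 + lam*sigma*w_u* + lam*mu)**(1/lam), lam > 0."""
    K = truncation_K(mu, sigma, lam)
    return (1 + lam * sigma * w_star(u, K) + lam * mu) ** (1 / lam)


def mean_X(mu, sigma, lam, n_grid=200_001, z_max=12.0):
    """E(X) by trapezoidal quadrature over the truncated standard normal (lam > 0)."""
    K = truncation_K(mu, sigma, lam)
    z = np.linspace(max(-K, -z_max), z_max, n_grid)
    base = np.maximum(1 + lam * (mu + sigma * z), 0.0)
    f = base ** (1 / lam) * np.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(z)) / 2) / PHI.cdf(K)


def power_normal_bias_ratios(mu, sigma, lam):
    """E(x_SR)/E(X) and E(x_PT)/E(X) from Eqs. 5-10 and 5-11."""
    q = lambda u: percentile_mean(u, mu, sigma, lam)
    e_sr = 0.3 * (q(0.10) + q(0.90)) + 0.4 * q(0.50)
    e_pt = 0.185 * (q(0.05) + q(0.95)) + 0.63 * q(0.50)
    ex = mean_X(mu, sigma, lam)
    return e_sr / ex, e_pt / ex


for lam in (1.0, 0.5, 0.25, 0.125):
    print(lam, power_normal_bias_ratios(5.0, 2.0, lam))
```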
[Figure: E(xT)/E(X) versus standard deviation σ (0 to 16) for λ = 1, 1/2, 1/4, 1/6, 1/8, 1/16, and 0; panel (a) SR, panel (b) PT]
Fig. 5-4 – E(xT)/E(X) of (a) SR and (b) PT versus σ for different λ values.
As mentioned in Chapter Three, in addition to σ, the VDP (Dykstra and Parsons 1950) is used as a measure of variability to evaluate the biases of SR and PT (Fig. 5-5). Both SR and PT estimate the mean value with insignificant errors when VDP ranges from 0 to 0.75 (i.e., from homogeneous to relatively heterogeneous reservoirs); however, as VDP approaches one, they significantly underestimate the mean value.
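Assuming the standard Dykstra-Parsons definition V = (k50 − k84.1)/k50, where k84.1 is the value exceeded by 84.1% of the population, the VDP corresponding to a given σ can be computed directly. For a log-normal population this reduces to V = 1 − exp(−σ), which reproduces the σ and VDP pairs quoted in the Chapter 3 figures (e.g., σ = 1.0 with VDP = 0.63). The sketch below uses a hypothetical function name and ignores truncation for λ > 0.

```python
import math


def vdp(mu, sigma, lam):
    """Dykstra-Parsons coefficient V = (k50 - k84.1)/k50 for a power-normal
    population (lam = 0: log-normal), ignoring truncation.

    k84.1 is the value exceeded by 84.1% of the population, i.e. the
    percentile at w = -1 on the transformed (normal) scale.
    """
    def percentile(w):
        y = mu + sigma * w
        return math.exp(y) if lam == 0 else (1 + lam * y) ** (1 / lam)

    k50, k841 = percentile(0.0), percentile(-1.0)
    return (k50 - k841) / k50


for s in (0.5, 1.0, 1.5, 2.0):
    print(s, round(vdp(0.0, s, 0), 2))    # log-normal: V = 1 - exp(-sigma)
print(vdp(5.0, 1.0, 0.5))                  # square-root power-normal example
```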
[Figure: analytical bias ratios versus VDP (0 to 1) for λ = 1, 1/2, 1/4, 1/6, 1/8, 1/16, and 0; panel (a) E(xSR)/E(X), panel (b) E(xPT)/E(X)]
Fig. 5-5 – Analytical ratios of (a) E(xSR)/E(X) and (b) E(xPT)/E(X) versus VDP for different λ values.
As mentioned before, another important estimator property is uncertainty; an estimator with a smaller SE has smaller uncertainty. In general, the SE’s of the mean estimators decrease as σ decreases and/or n increases (Fig. 5-6). When λ > 1/8, the AA has the smallest SE, and SR has a slightly smaller SE than PT (Fig. 5-6a through Fig. 5-6c). As λ decreases to 1/16, SR and the AA have approximately identical SE’s and perform better than PT for moderate σ’s; however, for large σ’s, for instance σ = 2.0, SR has a significantly smaller SE than PT and the AA (Fig. 5-6d).
[Figure: SD(xT) of the AA, SR, and PT versus 1/√N for σ = 0.5, 1.0, 1.5, and 2.0, in panels for λ = 1/2, 1/4, 1/8, and 1/16]
Fig. 5-6 – Standard errors of the AA, SR, and PT for four different values of λ and σ.
Consistency, another property, describes the effect of n on the accuracy of estimates. As mentioned before, a mean estimator is consistent if its RMSE approaches zero for very large n (i.e., a zero RMSE occurs when both bias and SE tend to zero as n approaches infinity). All SE’s tend to zero for very large n (Fig. 5-6); hence, a non-zero RMSE is caused by a non-zero bias. The AA is unbiased, so it is consistent for all σ and n; nonetheless, SR and PT are inconsistent because of their biases (Fig. 5-7).
The biases of SR and PT are very small when λ = 1/2 and λ = 1/4; however, they increase considerably as λ approaches zero (Fig. 5-4). For example, when λ = 1/2 and σ = 12, SR and PT estimate the mean value with at most 0.5% and 0.15% errors, respectively. Thus, SR and PT can be expected to be approximately consistent for some λ values (Figs. 5-7a and 5-7b).
When σ < 1, the AA is the most efficient mean estimator regardless of n and λ. However, as σ increases, each of the AA, SR, and PT can have the smallest RMSE for certain ranges of n and σ, depending on λ. For instance, when λ ≥ 1/4, the AA is the most efficient mean estimator, and SR has slightly higher efficiency than PT regardless of n (Figs. 5-7a and 5-7b). However, for λ < 1/4, SR becomes the most efficient mean estimator for certain ranges of n and σ. PT gives a mean with significantly smaller bias than SR for small λ (Fig. 5-4); therefore, at large n, the RMSE of SR deviates significantly from zero, whereas the RMSE of PT deviates only slightly (Figs. 5-7c and 5-7d).
[Figure: RMSE(xT) of the AA, SR, and PT versus 1/√N for σ = 0.5, 1.0, 1.5, and 2.0, in panels for λ = 1/2, 1/4, 1/8, and 1/16]
Fig. 5-7 – RMSE’s of the AA, SR, and PT for four different values of λ and σ.
5.4 Improving Swanson’s Rule
Zero bias is a desirable property of a mean estimator, but it is not necessarily the most important one. A biased mean estimator can often be converted into an unbiased one by applying a correction factor or by modifying its formula.
One way to de-bias SR is to calculate its weights analytically by setting E(x_SR) = E(X). The resulting estimator is designated SRC, as described previously in Chapter Three. For simplicity, K is assumed to be large enough that truncation is not an issue. Equating Eq. 3-16 to E(X) yields

$$\omega=\frac{\displaystyle\sum_{\substack{i=2\\ i\ \mathrm{even}}}^{\infty}(1+\lambda\mu)^{\frac{1}{\lambda}-i}\left[\prod_{j=1}^{i-1}(1-j\lambda)\right]\frac{\sigma^{i}}{2^{i/2}\,(i/2)!}}{\left(1+\lambda\sigma w_{10}+\lambda\mu\right)^{1/\lambda}-2\left(1+\lambda\mu\right)^{1/\lambda}+\left(1+\lambda\sigma w_{90}+\lambda\mu\right)^{1/\lambda}} \qquad (5\text{-}15)$$

where w_u = Φ⁻¹(u/100), Φ denotes the standard normal CDF, and w_10 = −w_90. Table 5-1 gives the adjusted ω for power-normal distributions with several λ values. For λ = 1/2 and λ = 1/3, SR can be converted into an unbiased mean estimator by changing its weights by only 1.3% (ω = 0.304 instead of 0.3), regardless of μ and σ. For other λ values, however, ω must be adjusted according to μ and σ.
Table 5-1 – Derived ω values for power-normal distributions with different λ values.
λ = 1/6:
$$\omega=\frac{5\sigma^{2}(\mu/6+1)^{4}/12+5\sigma^{4}(\mu/6+1)^{2}/144+5\sigma^{6}/15552}{2(\sigma w_{90}/6)^{6}+30(\mu/6+1)^{4}(\sigma w_{90}/6)^{2}+30(\mu/6+1)^{2}(\sigma w_{90}/6)^{4}}$$
λ = 1/4:
$$\omega=\frac{3\sigma^{2}(\mu/4+1)^{2}/8+3\sigma^{4}/256}{2\sigma^{4}w_{90}^{4}/256+12(\mu/4+1)^{2}\sigma^{2}w_{90}^{2}/16}$$
λ = 1/3: ω = 0.304
λ = 1/2: ω = 0.304
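The Python sketch below cross-checks Eq. 5-15 and Table 5-1 by evaluating the weight numerically. It assumes λ = 1/k with k a positive integer. In that case the product in Eq. 5-15 contains the factor (1 − kλ) = 0 for every i > k, so the series terminates. The function name omega_sr and the test values of μ and σ are illustrative choices and are not part of the original study.

```python
import math
from statistics import NormalDist


def omega_sr(k, mu, sigma):
    """Adjusted SR weight from Eq. 5-15 for lambda = 1/k (k a positive integer).

    For lambda = 1/k the series terminates at i = k, because for i > k the
    product prod_{j=1}^{i-1} (1 - j*lambda) contains the factor (1 - k/k) = 0.
    K is assumed large enough that 1 + lambda*(mu + sigma*w10) > 0.
    """
    lam = 1.0 / k
    base = 1.0 + lam * mu
    num = 0.0
    for i in range(2, k + 1, 2):                      # even i only
        prod = 1.0
        for j in range(1, i):
            prod *= 1.0 - j * lam
        num += (base ** (k - i) * prod * sigma**i
                / (2 ** (i // 2) * math.factorial(i // 2)))
    w90 = NormalDist().inv_cdf(0.90)
    w10 = -w90                                        # symmetry of the standard normal

    def pct(w):                                       # power-normal percentile x_u
        return (base + lam * sigma * w) ** k

    den = pct(w10) - 2.0 * base ** k + pct(w90)
    return num / den


for k in (2, 3, 4, 6):
    print(f"lambda=1/{k}: omega = {omega_sr(k, mu=5.0, sigma=1.0):.4f}")
```

For k = 2 and k = 3 the function returns about 0.304 for any admissible μ and σ, which matches Table 5-1. For k = 4 and k = 6 it agrees with the closed-form rows of the table.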
As Eq. 5-15 and Table 5-1 show, ω is a function of μ and σ whose form depends on λ, except for certain λ values such as 1/2 and 1/3. Because of this dependency, these population parameters must be known in order to adjust ω. In most cases neither is known, so both must be estimated from the available data set. Using the sample transformed mean, m, the sample standard deviation, s, and the sample exponent, λ̂, introduces errors into the estimate of ω, denoted ω̂, and consequently into the estimated mean value. Many researchers have investigated how λ can be estimated from a set of observations (Box and Cox 1964; Hinkley 1975; Emerson and Stoto 1982).
Assume that λ is known but μ and σ are unknown. Under this assumption, ω is estimated within the 95% confidence interval with at most 1.5% error when λ = 1/8, compared with the case in which μ and σ are known (Fig. 5-8). As λ increases and/or σ decreases, this error drops considerably, to at most 0.2% when λ = 1/4.
[Figure: weight of SR (ω) versus σ (0 to 2) for λ = 1/8, 1/6, and 1/4, with σ and μ known and unknown; n = 200, μ = 5.0.]
Fig. 5-8 – Adjusted weights of SR versus σ for three different λ values when σ and μ are known and unknown, with error bars showing the 95% confidence interval.
Using ω̂ introduces error into the estimate of E(x_SRC). For example, for σ = 2.0 the errors are at most 0.3% when λ = 1/6 and 0.04% when λ = 1/4, compared with the case where μ and σ are known (Figs. 5-9b and 5-9d). As Fig. 5-9 shows, the error in the estimate of E(x_SRC) decreases as σ decreases and/or n increases.
The expected value and variance of SRC are also derived analytically from Eqs. 5-8 and 5-9, respectively, with ω obtained from Eq. 5-15 (Fig. 3-12, solid lines). The analytical derivation assumes that all population parameters, μ, σ, and λ, are known.
[Figure: E(xSR_C) versus 1/√n; panels (a) σ = 1.5, λ = 1/6, (b) σ = 2.0, λ = 1/6, (c) σ = 1.5, λ = 1/4, (d) σ = 2.0, λ = 1/4; curves: analytical (Analy), MC with σ and μ known, MC with σ and μ unknown.]
Fig. 5-9 – E(xSR_C) derived analytically and calculated numerically using MC simulation, with error bars showing the 95% confidence interval, when σ and μ are either known or unknown.
[Figure: E(xSRc)/E(X) versus standard deviation σ (0 to 3) for λ = 1/2, 1/4, 1/8, and 1/16.]
Fig. 5-10 – Ratio of the expected value of SRC to E(X) for four λ values.
The de-biasing approach makes SR unbiased, since the ratio E(x_SRC)/E(X) is almost one. Once σ exceeds three, however, the ratio deviates from unity for λ = 1/2 and λ = 1/4 (Fig. 5-10). In that range (σ > 3), the assumption behind Eq. 5-15 that K is sufficiently large no longer holds. Eq. 5-15 is then no longer valid, and the ratio starts to deviate from one. The performance comparisons of SR, SRC, the AA, and PT are therefore limited to σ ∈ [0, 3].
One may wonder how this modification affects the SE and RMSE of SR. The modification makes the SE of SRC larger than the SE of SR for any σ, n, and λ (Fig. 5-11). The increase is insignificant for large λ but becomes considerable for small λ. For instance, when σ = 3, the SE increases by only 0.74% for λ = 1/2, whereas for λ = 1/16 the SE of SRC is 35% larger than that of SR. SRC has a smaller SE than PT and a larger SE than the AA for any σ, n, and λ, except for very small λ such as λ = 1/16 (Figs. 5-11 and 5-12).
[Figure: SD(xT) versus 1/√N for the AA, SR, SRc, and PT at σ = 1.0, 2.0, and 3.0; panels (a) λ = 1/2, (b) λ = 1/4, (c) λ = 1/8, (d) λ = 1/16.]
Fig. 5-11 – SEs of the AA, SR, SRC, and PT for four different values of λ and σ.
[Figure: σ (0 to 3) versus n (0 to 10000) for λ = 1/16; panel (a) separates the regions where PT and where SRc have the smaller SE; panel (b) separates the regions where SRc and where the AA have the smaller SE.]
Fig. 5-12 – σ versus n showing the regions where SRC has a smaller SE than (a) PT and (b) the AA.
[Figure: RMSE(xT) versus 1/√N for the AA, SR, SRc, and PT at σ = 1.0, 2.0, and 3.0; panels (a) λ = 1/2, (b) λ = 1/4, (c) λ = 1/8, (d) λ = 1/16.]
Fig. 5-13 – RMSEs of the AA, SR, SRC, and PT for four different values of λ and σ.
The modification converts SR into a consistent mean estimator, but it increases the RMSE except for some ranges of n and σ that depend on λ (Fig. 5-13). The AA is more efficient than SRC for any n and σ except for small λ. For example, at λ = 1/16 the AA is more efficient than SRC only when σ < 2.067 (Figs. 5-12 and 5-13d). SRC is more efficient than PT for some ranges of n and σ, depending on λ. For instance, when λ = 1/16 and σ = 3.0, PT is more efficient than SRC for n < 52 (Fig. 5-13d), but when λ = 1/4, SRC is more efficient for any σ and n (Fig. 5-13c).
As mentioned before, SRC has a higher SE than SR for any n and σ, but it is unbiased. Its zero bias can therefore offset its larger SE, and SRC becomes more efficient than SR for some ranges of n and σ, depending on λ (Fig. 5-14).
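The trade-off above follows from the decomposition RMSE² = bias² + SE². The Python sketch below illustrates it under simplifying assumptions that do not come from the thesis. Both SEs are modelled as c/√n with a constant c, and the bias of SR is treated as constant in n. Under these assumptions, an unbiased estimator with a larger SE overtakes a biased one once n exceeds (c_SRC² − c_SR²)/bias². The numbers are hypothetical and serve only to show the crossover.

```python
import math


def rmse(bias, se):
    """RMSE from bias and standard error: RMSE**2 = bias**2 + SE**2."""
    return math.hypot(bias, se)


def breakeven_n(bias_biased, c_biased, c_unbiased):
    """Sample size above which an unbiased estimator with SE = c_unbiased/sqrt(n)
    has a smaller RMSE than a biased one with constant bias and SE = c_biased/sqrt(n).

    Solves c_unbiased**2 / n = bias**2 + c_biased**2 / n for n
    (meaningful only when c_unbiased > c_biased).
    """
    return (c_unbiased**2 - c_biased**2) / bias_biased**2


# Hypothetical values chosen only for illustration (not read from Figs. 5-11 to 5-14)
bias_sr, c_sr, c_src = 0.5, 10.0, 12.0
n_star = breakeven_n(bias_sr, c_sr, c_src)
print(f"crossover sample size: {n_star:.0f}")
for n in (100, 400):
    r_sr = rmse(bias_sr, c_sr / math.sqrt(n))
    r_src = rmse(0.0, c_src / math.sqrt(n))
    print(n, "SR" if r_sr < r_src else "SRC", "has the smaller RMSE")
```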
[Figure: σ (0 to 3) versus n (up to 10000); panel (a) λ = 1/2 with the SR and SRC regions; panel (b) boundary curves for λ = 1/4, 1/8, and 1/16.]
Fig. 5-14 – σ versus n showing the regions where SRC is more efficient than SR when (a) λ = 1/2 and (b) λ = 1/4, 1/8, and 1/16. SRC is more efficient than SR when σ is greater than the value given by each curve, which depends on n and λ; otherwise SR is more efficient.
5.5 Concluding Remarks
This chapter demonstrates that as λ approaches one, SR and PT approximate a mean value with insignificant bias, but their biases rise significantly as λ approaches zero. SR underestimates the mean with a larger bias than PT, but it has smaller uncertainty for any variability, sample size, and λ value. The AA has the smallest uncertainty when λ = 1 and λ = 1/2. For other λ values, the AA has smaller uncertainty than SR only for certain ranges of variability and sample size. None of the mean estimators under review is an absolute winner in terms of efficiency across all variabilities, sample sizes, and λ values. SR can therefore become the optimum mean estimator for some ranges of variability and sample size, where its smaller uncertainty offsets its bias and gives it higher efficiency.
64 Auto-Correlated Random Variables
Chapter 6: Performance Evaluation for the Case of Auto-Correlated Random Variables
Although SR is used as an alternative to the AA in the oil and gas industry, only a few researchers have studied its performance in terms of bias, such as Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011). They considered only samples that are independent and identically distributed. Reservoir parameters, however, may depend on their neighbouring points. For example, permeability measured along a well might be auto-correlated in a way that reflects the sequence of lithofacies in the well. No attention appears to have been paid to the performance of SR when samples are auto-correlated.
This chapter therefore evaluates the performance of SR and compares it with that of the AA, MLE, and PT when the RVs are dependent and follow a first-order auto-regressive model. The mean estimators’ properties, including uncertainty, consistency, and efficiency, are first evaluated analytically for a log-normal distribution. The analytical expressions are then validated using MC simulation.
6.1 Assumptions
The assumptions used in this chapter to derive the properties of the mean estimators analytically are as follows. The RVs X_1, …, X_n are assumed to be log-normally distributed with E[ln(X)] = μ and Var[ln(X)] = σ², and to follow a first-order auto-regressive model, AR(1). In other words, X can be converted into another RV, Y = ln(X), with Y ~ N(μ, σ²), and Y_1, …, Y_n follow AR(1) as

$$Y_z=C+\rho_1Y_{z-1}+\varepsilon_Y \qquad (6\text{-}1)$$

where C is a constant and ε_Y is a normally distributed RV with mean μ_ε and variance σ_ε². The subscript z denotes the location at which Y is measured, and ρ_1 is the correlation coefficient between Y_z and Y_{z−1}. {Y_z} is called a first-order auto-regressive process: the observation at location z depends on the observation at location z − 1, with correlation coefficient ρ_1, and on ε_Y. Eq. 6-1 can be treated as a linear regression of Y_z on Y_{z−1} with ε_Y as the error term.
{Y(z)} is assumed to be completely stationary. That is, the joint distribution of {Y(z_1), …, Y(z_n)} is identical to the joint distribution of {Y(z_1 + Δz), …, Y(z_n + Δz)} for any z and Δz, where n is the number of samples. In particular, E(Y_z) = μ and Var(Y_z) = σ² for all z.
These assumptions give C = μ(1 − ρ_1), μ_ε = 0, and σ_ε² = (1 − ρ_1²)σ² (see Appendix F for derivations).
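The Python sketch below is a minimal illustration of Eq. 6-1 under the stated stationarity assumptions. With C = μ(1 − ρ_1) and σ_ε² = (1 − ρ_1²)σ², a series started from N(μ, σ²) keeps mean μ, variance σ², and lag-one correlation ρ_1. The helper name simulate_ar1, the seed, and the parameter values are illustrative assumptions.

```python
import math
import random


def simulate_ar1(n, mu, sigma, rho1, seed=0):
    """Stationary AR(1) of Eq. 6-1 with C = mu*(1 - rho1), mu_eps = 0,
    and sigma_eps**2 = (1 - rho1**2) * sigma**2 (hypothetical helper)."""
    rng = random.Random(seed)
    c = mu * (1.0 - rho1)
    sd_eps = math.sqrt(1.0 - rho1**2) * sigma
    y = [rng.gauss(mu, sigma)]                # start from the stationary N(mu, sigma^2)
    for _ in range(n - 1):
        y.append(c + rho1 * y[-1] + rng.gauss(0.0, sd_eps))
    return y


y = simulate_ar1(200_000, mu=4.6, sigma=1.0, rho1=0.7, seed=1)
n = len(y)
mean_y = sum(y) / n
var_y = sum((v - mean_y) ** 2 for v in y) / (n - 1)
lag1 = sum((y[i] - mean_y) * (y[i + 1] - mean_y) for i in range(n - 1)) / ((n - 1) * var_y)
print(round(mean_y, 3), round(var_y, 3), round(lag1, 3))  # close to 4.6, 1.0, 0.7
```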
Although the AR(1) model considers only first-step dependency, values of {Y(z)} separated by an interval τ are still correlated. Their correlation coefficient,

$$\rho_\tau=\frac{\operatorname{cov}\{Y(z),Y(z-\tau)\}}{\sigma\{Y(z)\}\,\sigma\{Y(z-\tau)\}},$$

decreases in magnitude as τ increases and approaches zero for large τ. The reason is that Y_z is related to Y_{z−1}, and Y_{z−1} is related to Y_{z−2}. Consequently, Y_z is related to Y_{z−2} with a smaller correlation coefficient, and so on. Since ρ_τ is an even function of τ when Y_z is real-valued, it can be written as (Priestley 1981)

$$\rho_\tau=\rho_1^{|\tau|},\qquad \tau=0,\pm1,\pm2,\dots \qquad (6\text{-}2)$$

After Y(z) is generated, X(z) is obtained by the exponential transformation X(z) = e^{Y(z)}. The correlation coefficient between X(z) and X(z − τ) is (Vanmarcke 2010)

$$\rho_X(\tau)=\frac{e^{\rho_\tau\sigma^2}-1}{e^{\sigma^2}-1} \qquad (6\text{-}3)$$

In other words, X_z = C′ + ρ_X1 X_{z−1} + ε_X, where ρ_X1 is the correlation coefficient between X_z and X_{z−1} and C′ = e^{μ+σ²/2}(1 − ρ_X1). The term ε_X is normally distributed with mean E(ε_X) = μ_εX = 0 and variance Var(ε_X) = σ_εX² = (1 − ρ_X1²)e^{2μ+σ²}(e^{σ²} − 1).
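The sketch below illustrates Eq. 6-3 and is not the author's code. It evaluates Eq. 6-3 with ρ_τ = ρ_1^|τ| (Eq. 6-2) and compares the result with the sample lag correlations of a long simulated series X = e^Y, where Y follows the stationary AR(1) of Eq. 6-1. The function names rho_x, lag_corr, and simulate_log_ar1 are hypothetical, as are the choices σ = 0.5 and ρ_1 = 0.7.

```python
import math
import random


def rho_x(tau, rho1, sigma):
    """Eq. 6-3 with rho_tau = rho1**|tau| (Eq. 6-2): lag-tau correlation of X = exp(Y)."""
    rho_tau = rho1 ** abs(tau)
    return math.expm1(rho_tau * sigma**2) / math.expm1(sigma**2)


def lag_corr(x, tau):
    """Sample lag-tau autocorrelation (simple moment estimator)."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    cov = sum((x[i] - m) * (x[i + tau] - m) for i in range(n - tau)) / n
    return cov / var


def simulate_log_ar1(n, mu, sigma, rho1, seed=0):
    """Y follows the stationary AR(1) of Eq. 6-1; returns X = exp(Y)."""
    rng = random.Random(seed)
    sd_eps = math.sqrt(1.0 - rho1**2) * sigma
    y = rng.gauss(mu, sigma)
    xs = [math.exp(y)]
    for _ in range(n - 1):
        y = mu * (1.0 - rho1) + rho1 * y + rng.gauss(0.0, sd_eps)
        xs.append(math.exp(y))
    return xs


x = simulate_log_ar1(200_000, mu=0.0, sigma=0.5, rho1=0.7, seed=42)
for tau in (1, 2, 3):
    print(tau, round(rho_x(tau, 0.7, 0.5), 3), round(lag_corr(x, tau), 3))
```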
6.2 Analytical Expressions of Mean Estimators’ Properties
In this section, the expected values and SEs of the mean estimators are derived analytically from the assumptions above.
The expected value of the AA is E(x_A) = E(X), and its variance is

$$\operatorname{Var}\left(x_{A,dep}\right)=\frac{\operatorname{Var}(X)}{n}\left[1+2\sum_{\tau=1}^{n-1}\left(1-\frac{\tau}{n}\right)\rho_X(\tau)\right] \qquad (6\text{-}4)$$

where Var(X) = e^{2μ+σ²}(e^{σ²} − 1) is the variance of X, n is the number of samples, and the subscript dep denotes the dependent case (Vicens and Schaake 1972).
Eq. 6-4 shows that n auto-correlated samples do not carry the same information as n uncorrelated samples. To achieve a given accuracy, more or fewer auto-correlated samples are needed, depending on the sign of ρ_1. The equivalent number of uncorrelated samples is called the effective sample size (ESS), designated n_eff, where the subscript eff stands for effective. If ρ_X > 0, more auto-correlated samples are needed; if ρ_X < 0, fewer correlated samples might be needed. For strong positive correlation (ρ_X → 1), n_eff approaches one (Priestley 1981).
Thiebaux and Zwiers (1984) discuss several approaches for calculating the ESS. This study uses one that considers both the variance of the AA of independent data, Var(x_A), and the variance of the AA of dependent data, Var(x_A,dep). The ESS is given by

$$n_{eff}=n\Bigg/\left[1+2\sum_{\tau=1}^{n-1}\left(1-\frac{\tau}{n}\right)\rho_X(\tau)\right] \qquad (6\text{-}5)$$
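Eqs. 6-2, 6-3, and 6-5 combine into a short computation of the ESS, and dividing Var(X) by n_eff reproduces Eq. 6-4. The Python sketch below is a minimal illustration of these equations. The function names are hypothetical, and σ must be positive so that the denominator of Eq. 6-3 is non-zero.

```python
import math


def rho_x(tau, rho1, sigma):
    # Eq. 6-3 with rho_tau = rho1**|tau| (Eq. 6-2); requires sigma > 0
    return math.expm1(rho1 ** abs(tau) * sigma**2) / math.expm1(sigma**2)


def n_eff(n, rho1, sigma):
    """Effective sample size of the arithmetic average, Eq. 6-5."""
    s = sum((1.0 - tau / n) * rho_x(tau, rho1, sigma) for tau in range(1, n))
    return n / (1.0 + 2.0 * s)


def var_aa_dep(n, mu, sigma, rho1):
    """Eq. 6-4 written as Var(X) / n_eff, with Var(X) = exp(2mu + sigma^2)(exp(sigma^2) - 1)."""
    var_x = math.exp(2.0 * mu + sigma**2) * math.expm1(sigma**2)
    return var_x / n_eff(n, rho1, sigma)


for rho1 in (0.0, 0.3, 0.7):
    print(rho1, round(n_eff(100, rho1, 1.5), 1))
```

For ρ_1 = 0, every ρ_X(τ) with τ ≥ 1 vanishes and n_eff equals n. A positive ρ_1 reduces n_eff below n, in line with the discussion of Eq. 6-4.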
Thus, Eq. 6-4 can be rewritten as

$$\operatorname{Var}\left(x_{A,dep}\right)=\frac{e^{2\mu+\sigma^2}\left(e^{\sigma^2}-1\right)}{n_{eff}} \qquad (6\text{-}6)$$

As mentioned before, the statistical properties of SR and PT are functions of the statistical properties of the percentiles, so the mean and standard deviation of the percentiles are derived first. These depend on the log-mean, $m_y=\sum_{i=1}^{n}\ln(x_i)/n$, and the log-standard deviation, $s=\sqrt{\sum_{i=1}^{n}\left[\ln(x_i)-m_y\right]^2/(n-1)}$ (see Appendix G). Because n_eff is derived from the variance of the AA, the ESS must be modified before it is used for the statistical properties of the percentiles, and consequently for those of SR and PT.
Zieba (2010) derived an expression for the sample variance of auto-correlated samples:

$$s_a^2=\frac{(n-1)\,s^2}{n\left[1-\dfrac{1+2\sum_{\tau=1}^{n-1}\left(1-\frac{\tau}{n}\right)\rho_\tau}{n}\right]} \qquad (6\text{-}7)$$

where s_a² is the sample variance of n auto-correlated samples and s² is the sample variance of n uncorrelated samples. In other words, the variance of auto-correlated samples can also be written as the variance of uncorrelated samples divided by a correction factor

$$\beta=\frac{n}{n-1}\left[1-\frac{1+2\sum_{\tau=1}^{n-1}\left(1-\frac{\tau}{n}\right)\rho_\tau}{n}\right] \qquad (6\text{-}8)$$

where β approaches one for large n (Zieba 2010, Fig. 4). Hence Eq. 6-7 can be rewritten as

$$s_a^2=\frac{1}{\beta}\,s^2 \qquad (6\text{-}9)$$

Taking the expectation of Eq. 6-9 gives the expected value of s_a²:

$$E\left(s_a^2\right)=\frac{1}{\beta}\,\sigma^2 \qquad (6\text{-}10)$$

Consequently, a new ESS is defined as

$$n_{eff}^{*}=\frac{n}{\frac{1}{\beta}\left\{1+2\sum_{\tau=1}^{n-1}\left(1-\frac{\tau}{n}\right)\rho_\tau\right\}} \qquad (6\text{-}11)$$

That is, when the mean value is estimated with SR and PT, n auto-correlated samples are equivalent to n_eff* uncorrelated samples in reaching a given uncertainty.
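The Python sketch below is a minimal implementation of Eqs. 6-8 and 6-11, taking ρ_τ = ρ_1^τ from Eq. 6-2. The function names are hypothetical. With ρ_1 = 0 it returns β = 1 and n_eff* = n. For ρ_1 > 0 it shows β approaching one as n grows, as stated above.

```python
def _weighted_rho_sum(n, rho1):
    # sum_{tau=1}^{n-1} (1 - tau/n) * rho_tau, with rho_tau = rho1**tau (Eq. 6-2)
    return sum((1.0 - tau / n) * rho1 ** tau for tau in range(1, n))


def beta_factor(n, rho1):
    """Correction factor beta of Eq. 6-8."""
    return n / (n - 1) * (1.0 - (1.0 + 2.0 * _weighted_rho_sum(n, rho1)) / n)


def n_eff_star(n, rho1):
    """Modified effective sample size of Eq. 6-11."""
    return n / ((1.0 / beta_factor(n, rho1)) * (1.0 + 2.0 * _weighted_rho_sum(n, rho1)))


for n in (25, 100, 1000):
    print(n, round(beta_factor(n, 0.7), 4), round(n_eff_star(n, 0.7), 2))
```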
The statistical properties of SR and PT are derived as follows:

$$E(x_{SR})=e^{\mu+\frac{\sigma^2}{2n_{eff}^{*}}}\left\{0.3\,e^{w_{10}E(s_y)}\left[1+\sum_{i=2}^{4}\frac{w_{10}^{\,i}}{i!}T_i\right]+0.4+0.3\,e^{w_{90}E(s_y)}\left[1+\sum_{i=2}^{4}\frac{w_{90}^{\,i}}{i!}T_i\right]\right\} \qquad (6\text{-}12)$$

$$E(x_{PT})=e^{\mu+\frac{\sigma^2}{2n_{eff}^{*}}}\left\{0.185\,e^{w_{5}E(s_y)}\left[1+\sum_{i=2}^{4}\frac{w_{5}^{\,i}}{i!}T_i\right]+0.63+0.185\,e^{w_{95}E(s_y)}\left[1+\sum_{i=2}^{4}\frac{w_{95}^{\,i}}{i!}T_i\right]\right\} \qquad (6\text{-}13)$$

$$\begin{aligned}\operatorname{Var}(x_{SR})={}&e^{2\mu+\frac{2\sigma^2}{n_{eff}^{*}}}\Big\{0.09\,e^{2w_{10}E(s_y)}\Big[1+\sum_{i=2}^{4}\frac{(2w_{10})^{i}}{i!}T_i\Big]+0.24\,e^{w_{10}E(s_y)}\Big[1+\sum_{i=2}^{4}\frac{w_{10}^{\,i}}{i!}T_i\Big]+0.16\\&+0.09\,e^{2w_{90}E(s_y)}\Big[1+\sum_{i=2}^{4}\frac{(2w_{90})^{i}}{i!}T_i\Big]+0.24\,e^{w_{90}E(s_y)}\Big[1+\sum_{i=2}^{4}\frac{w_{90}^{\,i}}{i!}T_i\Big]+0.18\Big\}\\&-\Big\{0.09E(x_{10})^2+0.16E(x_{50})^2+0.09E(x_{90})^2+0.24E(x_{10})E(x_{50})+0.24E(x_{50})E(x_{90})+0.18E(x_{10})E(x_{90})\Big\}\end{aligned} \qquad (6\text{-}14)$$

and

$$\begin{aligned}\operatorname{Var}(x_{PT})={}&e^{2\mu+\frac{2\sigma^2}{n_{eff}^{*}}}\Big\{0.034\,e^{2w_{5}E(s_y)}\Big[1+\sum_{i=2}^{4}\frac{(2w_{5})^{i}}{i!}T_i\Big]+0.233\,e^{w_{5}E(s_y)}\Big[1+\sum_{i=2}^{4}\frac{w_{5}^{\,i}}{i!}T_i\Big]+0.397\\&+0.034\,e^{2w_{95}E(s_y)}\Big[1+\sum_{i=2}^{4}\frac{(2w_{95})^{i}}{i!}T_i\Big]+0.233\,e^{w_{95}E(s_y)}\Big[1+\sum_{i=2}^{4}\frac{w_{95}^{\,i}}{i!}T_i\Big]+0.07\Big\}\\&-\Big\{0.034E(x_{5})^2+0.397E(x_{50})^2+0.034E(x_{95})^2+0.233E(x_{5})E(x_{50})+0.233E(x_{50})E(x_{95})+0.07E(x_{5})E(x_{95})\Big\}\end{aligned} \qquad (6\text{-}15)$$

The coefficients in Eqs. 6-14 and 6-15 are products of the SR weights (0.3, 0.4, 0.3) and of the PT weights (0.185, 0.63, 0.185), respectively. For example, 0.63² ≈ 0.397, 2(0.185)(0.63) ≈ 0.233, and 2(0.185)² ≈ 0.07. In these equations, $T_i=E\{[s_y-E(s_y)]^i\}$ and

$$E(s_y)=\frac{\Gamma\left(n_{eff}^{*}/2\right)}{\Gamma\left[\left(n_{eff}^{*}-1\right)/2\right]}\left[\frac{2\sigma^2}{n_{eff}^{*}-1}\right]^{1/2}$$

(see Appendix G for the derivations).
The expected value and SE of the MLE are derived from two effective numbers of samples. The first, $n_{eff_y}=n\big/\left\{1+2\sum_{\tau=1}^{n-1}\left(1-\frac{\tau}{n}\right)\rho_\tau\right\}$, is derived from the variance of the sample mean of y = ln(x). The second, n_v*, was introduced by Bayley and Hammersley (1946) using the variance of the sample variance. The statistical properties of the MLE are therefore given by (see Appendix H for derivations)

$$E(x_{MLE})=e^{\mu+\frac{\sigma^2}{2n_{eff_y}}}\left(1-\frac{\sigma^2}{n_v^{*}-1}\right)^{-\frac{n_v^{*}-1}{2}} \qquad (6\text{-}16)$$

and

$$\operatorname{Var}(x_{MLE})=e^{2\mu+\frac{\sigma^2}{n_{eff_y}}}\left\{e^{\frac{\sigma^2}{n_{eff_y}}}\left(1-\frac{2\sigma^2}{n_v^{*}-1}\right)^{-\frac{n_v^{*}-1}{2}}-\left(1-\frac{\sigma^2}{n_v^{*}-1}\right)^{-(n_v^{*}-1)}\right\} \qquad (6\text{-}17)$$
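Eqs. 6-16 and 6-17 are straightforward to evaluate. The Python sketch below implements them and, as a sanity check, compares them with a small Monte Carlo experiment for independent samples, where both effective sample sizes reduce to n. The check assumes x_MLE = exp(m_y + s²/2), with s² the (n − 1)-denominator sample variance, because that form reproduces Eq. 6-16 when n_eff_y = n_v* = n. The function name, seed, and parameter values are illustrative.

```python
import math
import random


def mle_moments(mu, sigma, n_eff_y, n_v):
    """Expected value (Eq. 6-16) and variance (Eq. 6-17) of x_MLE."""
    k = n_v - 1.0
    a = 1.0 - sigma**2 / k
    b = 1.0 - 2.0 * sigma**2 / k
    if b <= 0.0:
        raise ValueError("Eq. 6-17 requires 2*sigma**2 < n_v - 1")
    mean = math.exp(mu + sigma**2 / (2.0 * n_eff_y)) * a ** (-k / 2.0)
    var = math.exp(2.0 * mu + sigma**2 / n_eff_y) * (
        math.exp(sigma**2 / n_eff_y) * b ** (-k / 2.0) - a ** (-k)
    )
    return mean, var


# Independent samples, so n_eff_y = n_v* = n (illustrative parameter values)
mu, sigma, n, m = 0.0, 0.5, 20, 20000
rng = random.Random(7)
draws = []
for _ in range(m):
    logs = [rng.gauss(mu, sigma) for _ in range(n)]
    m_y = sum(logs) / n
    s2 = sum((v - m_y) ** 2 for v in logs) / (n - 1)
    draws.append(math.exp(m_y + s2 / 2.0))      # assumed form of x_MLE

mc_mean = sum(draws) / m
mc_var = sum((d - mc_mean) ** 2 for d in draws) / (m - 1)
mean_an, var_an = mle_moments(mu, sigma, n, n)
print(f"E:   analytical {mean_an:.4f}, MC {mc_mean:.4f}")
print(f"Var: analytical {var_an:.5f}, MC {mc_var:.5f}")
```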
6.3 Analytical Expression Validations Using Monte Carlo Simulation
The term e^μ = x50 is common to all the expressions derived in the previous section. The ratios of the analytical expected values and SEs to x50 are therefore used to validate the analytical expressions numerically with MC simulation. For this purpose, m = 30,000 data sets containing n = 25 to 3000 samples of x are drawn from a log-normal distribution with a log-mean of 4.6, a log-standard deviation σ from 0.05 to 1.5, and a correlation coefficient ρ_X1 = 0.7.
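This section does not spell out how the log-scale coefficient ρ_1 is set to obtain ρ_X1 = 0.7. One natural assumption is to invert Eq. 6-3 at τ = 1, which gives ρ_1 = ln[1 + ρ_X1(e^{σ²} − 1)]/σ². The Python sketch below uses that assumption to generate auto-correlated log-normal data sets and estimate E(x_A)/x50, which should be close to e^{σ²/2}. It uses far fewer data sets than the m = 30,000 of the study, and all names and values are illustrative.

```python
import math
import random


def rho1_from_rho_x1(rho_x1, sigma):
    """Invert Eq. 6-3 at tau = 1 to get the log-scale AR(1) coefficient (assumption)."""
    return math.log1p(rho_x1 * math.expm1(sigma**2)) / sigma**2


def mc_ratio_aa(m, n, mu, sigma, rho_x1, seed=0):
    """Monte Carlo estimate of E(x_A)/x50, with x50 = exp(mu), over m auto-correlated data sets."""
    rng = random.Random(seed)
    rho1 = rho1_from_rho_x1(rho_x1, sigma)
    sd_eps = math.sqrt(1.0 - rho1**2) * sigma
    total = 0.0
    for _ in range(m):
        y = rng.gauss(mu, sigma)                # stationary start
        s = math.exp(y)
        for _ in range(n - 1):
            y = mu * (1.0 - rho1) + rho1 * y + rng.gauss(0.0, sd_eps)
            s += math.exp(y)
        total += s / n                          # arithmetic average of one data set
    return total / m / math.exp(mu)


ratio = mc_ratio_aa(m=2000, n=100, mu=4.6, sigma=0.5, rho_x1=0.7, seed=3)
print(round(ratio, 3), round(math.exp(0.5**2 / 2), 3))
```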
[Figure: E(xA)/x50 versus 1/√n, analytical (Analy) and MC results, for σ = 0.05, 0.5, 1.0, and 1.5.]
Fig. 6-1 – Expected value of the AA/x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars showing the 95% confidence interval.
[Figure: SD(xA)/x50 versus 1/√n, analytical (Analy) and MC results, for σ = 0.05, 0.5, 1.0, and 1.5.]
Fig. 6-2 – Standard error of the AA/x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars showing the 95% confidence interval.
For the AA, the analytical ratios of the expected value and SE to x50 follow the MC simulation results (Figs. 6-1 and 6-2).
For E(x_SR)/x50, the analytical and MC results differ by at most 8% (within the 95% confidence interval) when σ = 1.5 and n = 25. The difference decreases, for instance to 2.0% when σ decreases to 0.5, and approaches zero as n becomes very large (Fig. 6-3). For Std(x_SR)/x50, the difference is at most 18% when σ = 1.5 and n = 25, but it decreases sharply as σ decreases and n becomes very large (Fig. 6-4).
[Figure: E(xSR)/x50 versus 1/√n, analytical (Analy) and MC results, for σ = 0.05, 0.5, 1.0, and 1.5.]
Fig. 6-3 – Expected value of SR/x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars showing the 95% confidence interval.
[Figure: SD(xSR)/x50 versus 1/√n, analytical (Analy) and MC results, for σ = 0.05, 0.5, 1.0, and 1.5.]
Fig. 6-4 – Standard error of SR/x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars showing the 95% confidence interval.
MC simulation gives values of E(x_PT)/x50 up to 20% smaller than the analytical approach for σ = 1.5 and n = 25. The difference decreases, for instance to 1.0% when σ decreases to 0.5, and approaches zero as n becomes very large (Fig. 6-5). For Std(x_PT)/x50, the difference between the analytical and MC results is at most 40% when σ = 1.5 and n = 25, but it decreases sharply as σ decreases and n becomes very large (Fig. 6-6).
[Figure: E(xPT)/x50 versus 1/√n, analytical (Analy) and MC results, for σ = 0.05, 0.5, 1.0, and 1.5.]
Fig. 6-5 – Expected value of PT/x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars showing the 95% confidence interval.
[Figure: SD(xPT)/x50 versus 1/√n, analytical (Analy) and MC results, for σ = 0.05, 0.5, 1.0, and 1.5.]
Fig. 6-6 – Standard error of PT/x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars showing the 95% confidence interval.
For E(x_MLE)/x50, the discrepancy between the analytical and MC results reaches at most 27% for σ = 1.5 and n = 25. The difference decreases, for instance to 10% when σ decreases to 1.0, and approaches zero as n becomes very large (Fig. 6-7). For Std(x_MLE)/x50, the difference is at most 54% when σ = 1.5 and n = 25, but it decreases sharply as σ decreases and n becomes very large (Fig. 6-8).
[Figure: E(xMLE)/x50 versus 1/√n, analytical (Analy) and MC results, for σ = 0.05, 0.5, 1.0, and 1.5.]
Fig. 6-7 – Expected value of MLE/x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars showing the 95% confidence interval.
[Figure: SD(xMLE)/x50 versus 1/√n, analytical (Analy) and MC results, for σ = 0.05, 0.5, 1.0, and 1.5.]
Fig. 6-8 – Standard error of MLE/x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars showing the 95% confidence interval.
The analytical expressions derived in the previous section are also validated for ρ_X1 = 0.3 and ρ_X1 = 0.0 (Figs. 6-9 through 6-12, shown only for σ = 1.5). These figures show insignificant discrepancies between the analytical expressions and the MC simulation results. For example, consider the MLE with n = 25. When ρ_X1 decreases from 0.7 to 0.3, the discrepancy for E(x_MLE)/x50 falls from 27% to 8% (compare Fig. 6-7d with Fig. 6-9d). Over the same change, the discrepancy for Std(x_MLE)/x50 falls from 54% to 9% (compare Fig. 6-8d with Fig. 6-11d). When ρ_X1 decreases further from 0.3 to 0.0, the discrepancy increases slightly but remains smaller than for ρ_X1 = 0.7 (Figs. 6-10 and 6-12).
For SR and PT with ρ_X1 = 0.0, the discrepancy between the MC and analytical results is slightly higher than in Chapter Three (independent log-normally distributed random variables). The analytical expressions in Figs. 6-10 and 6-12 give smaller expected values and standard errors than the MC results. The reason is that the expansion of e^{s w_u} in Eqs. G-1 and G-2 is truncated after the fourth term (Eq. G-7), whereas Eqs. A-2 and A-3 involve no truncation.
[Figure: E(xT)/x50 versus 1/√n for the AA, SR, PT, and MLE, analytical (Analy) and MC results.]
Fig. 6-9 – Ratio of the expected values of the mean estimators to x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars depicting the 95% confidence interval, when ρx1 = 0.3 and σ = 1.5.
[Figure: E(xT)/x50 versus 1/√n for the AA, SR, PT, and MLE, analytical (Analy) and MC results.]
Fig. 6-10 – Ratio of the expected values of the mean estimators to x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars depicting the 95% confidence interval, when ρx1 = 0.0 and σ = 1.5.
[Figure: SD(xT)/x50 versus 1/√n for the AA, SR, PT, and MLE, analytical (Analy) and MC results.]
Fig. 6-11 – Ratio of the standard errors of the mean estimators to x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars depicting the 95% confidence interval, when ρx1 = 0.3 and σ = 1.5.
[Figure: SD(xT)/x50 versus 1/√n for the AA, SR, PT, and MLE, analytical (Analy) and MC results.]
Fig. 6-12 – Ratio of the standard errors of the mean estimators to x50 obtained from the analytical expressions and computed numerically using MC simulation, with error bars depicting the 95% confidence interval, when ρx1 = 0.0 and σ = 1.5.
6.4 Analysis of the Analytical Expressions of the Mean Estimators’ Properties
As shown in the previous section, the analytical results match the MC simulation results well for a small correlation coefficient. For a large correlation coefficient, the analytical results are approximately identical to the MC results for σ < 1. When σ > 1.0, however, the two differ, especially for small n. Overall, as n increases, the analytical results follow the numerical results with insignificant error regardless of variability. The analytical expressions are therefore used in this section to analyze the mean estimators’ properties for n > 100.
Consistency, uncertainty, and efficiency are evaluated in this section for ρ_X1 = 0.7. Fig. 6-13 compares the SEs of the mean estimators; the estimator with the smaller SE is less uncertain. When σ ≤ 0.5, the SEs of the AA, SR, PT, and MLE are approximately identical (Fig. 6-13a), but they diverge as σ increases. When σ ≥ 1, PT has a larger SE than SR for any n and σ, yet a smaller SE than the AA for a certain range of n and σ (Fig. 6-13d). The SE of the AA is approximately identical to that of SR when σ = 1. Once σ exceeds one, SR has a smaller SE than the AA and MLE for any n (Fig. 6-13d).
[Figure: SD(xT)/x50 versus 1/√n for the AA (Arith), SR, PT, and MLE at σ = 0.05, 0.5, 1.0, and 1.5.]
Fig. 6-13 – Analytical standard errors/x50 of the AA, SR, PT, and MLE.
Fig. 6-14 compares the RMSEs of the mean estimators; the estimator with the smaller RMSE is more efficient. When σ ≤ 0.5, all mean estimators have approximately identical efficiency, but they perform differently as σ increases. For instance, in the heterogeneous case σ = 1.5, SR is the most efficient mean estimator except for very large sample sizes (n > 600). There, the RMSE of SR approaches a non-zero value (Fig. 6-14d).
The RMSEs of SR and PT do not approach zero for very large n, so these estimators are inconsistent. The AA and MLE, by contrast, are consistent (Fig. 6-14).
[Figure: RMSE/x50 versus 1/√n for the AA (Arith), SR, PT, and MLE at σ = 0.05, 0.5, 1.0, and 1.5.]
Fig. 6-14 – Analytical RMSE/x50 of the AA, SR, PT, and MLE.
The dependency between data points causes the mean estimators to behave differently than when samples are uncorrelated. For example, when σ = 1.5 and n > 100, SR has the lowest efficiency for uncorrelated samples, whereas it becomes the most efficient mean estimator for 100 < n < 600 when samples are auto-correlated (compare Fig. 3-7d with Fig. 6-14d).
As mentioned before, n positively correlated samples are less informative than n uncorrelated samples (ESS < n), and as the correlation coefficient approaches zero, the ESS tends to n. Thus, the mean estimators approximate the mean value with smaller uncertainty and higher efficiency when the data points are weakly or not auto-correlated (Figs. 6-15 and 6-16).
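The loss of information under positive auto-correlation can be illustrated with the effective sample size of the AA for a stationary first-order auto-regressive (AR(1)) series with lag-one correlation ρ. The sketch below assumes the standard result Var(x̄) = (σ²/n)[1 + 2 Σ_{k=1}^{n−1} (1 − k/n) ρ^k], so that ESS = n / [1 + 2 Σ_{k=1}^{n−1} (1 − k/n) ρ^k]; the ESS definition used earlier in this thesis may differ in detail, and the function name is hypothetical. For large n, the ratio ESS/n approaches (1 − ρ)/(1 + ρ).

```python
def ar1_effective_sample_size(n: int, rho: float) -> float:
    """Effective sample size of the arithmetic average of n AR(1) samples.

    Assumes a stationary AR(1) process whose lag-k correlation is rho**k.
    """
    inflation = 1.0 + 2.0 * sum((1.0 - k / n) * rho ** k for k in range(1, n))
    return n / inflation


for rho in (0.0, 0.3, 0.7):
    print(rho, round(ar1_effective_sample_size(100, rho), 1))
```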
[Fig. 6-15: four panels showing SD(xA)/x50, SD(xSR)/x50, SD(xMLE)/x50, and SD(xPT)/x50 versus 1/√n for ρx1 = 0.0, 0.3, and 0.7.]
Fig. 6-15 – Ratios of the standard errors of the mean estimators to x50, derived analytically for three different ρx1 values when σ = 1.5.
[Fig. 6-16: four panels showing RMSE/x50 of the AA, SR, MLE, and PT versus 1/√n for ρ = 0.0, 0.3, and 0.7 (σ = 1.5).]
Fig. 6-16 – RMSE/x50’s of the mean estimators analytically derived for three different ρx1 values
when σ=1.5.
The analyses in this section assume that the correlation coefficient is known; in a real case, however, it has to be estimated, which adds an error to the estimated properties of the mean estimators.
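Since ρX1 has to be estimated in practice, the sketch below shows one common moment estimator of the lag-one auto-correlation (the sum of lag-one cross-products about the sample mean divided by the sum of squares). It is a standard textbook estimator offered for illustration, not necessarily the one used in this study, and the function name is hypothetical. Applied to a simulated AR(1) series with a true value of 0.7, the estimate scatters around 0.7, which is the source of the additional error mentioned above.

```python
import random
from statistics import fmean


def lag1_autocorrelation(x):
    """Moment estimator of the lag-one auto-correlation of a series."""
    m = fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


# Simulate a stationary AR(1) series with unit variance and true lag-one correlation 0.7.
rng = random.Random(1)
rho, series = 0.7, [rng.gauss(0.0, 1.0)]
for _ in range(999):
    series.append(rho * series[-1] + rng.gauss(0.0, (1 - rho**2) ** 0.5))
print(round(lag1_autocorrelation(series), 2))
```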
6.5 Auto-Correlated Random Variables with Bimodal Distribution
For the case of auto-correlated RV’s with a bimodal distribution, the estimator properties are evaluated only numerically, using MC simulation, since deriving analytical expressions is beyond the scope of this study. For this purpose, m = 20,000 data sets of n = 25 to 1,000 samples each are generated.
It is assumed that the RV Xz, taken at location z, follows a bimodal distribution described by a mixture of two log-normal distributions. In other words, the distribution of the transformed RV, Yz = ln(Xz), is split into two normal distributions with a mixing proportion of α = 0.3. It is easier to generate the RV Yz first and then transform it to the RV Xz using Xz = e^{Yz}.
In order to generate a data set of yz, two different sets, y1i,z ∈ Y1,z and y2i,z ∈ Y2,z, are generated, where Y1,z is normally distributed with μ1 = 1 and σ1 varying from 0.05 to 1.5, and Y2,z follows a normal distribution with μ2 = 3 and σ2 = 0.5. It is also assumed that Y1,z and Y2,z follow the first-order auto-regressive model:

$$Y_{1,z} = \rho_{Y_1 1}\, Y_{1,z-1} + C_1 + \varepsilon_{1,z}, \qquad (6\text{-}18)$$

and

$$Y_{2,z} = \rho_{Y_2 1}\, Y_{2,z-1} + C_2 + \varepsilon_{2,z}, \qquad (6\text{-}19)$$

where ρY11 and ρY21 are the lag-one auto-correlation coefficients of Y1,z and Y2,z, C1 and C2 are constants, ε1,z and ε2,z are uncorrelated random errors, and z represents the location at which a sample is taken.
For the numerical study, it is presumed that ρX11 = ρX21 = 0.7; using the log-normal transformation, the auto-correlation functions of the RV Yz are then calculated using Eq. 6-3. Following that, the two subsets are combined as

$$y_z = \begin{cases} y_{1,z}, & z \le Z \\ y_{2,z}, & \text{otherwise} \end{cases}, \qquad (6\text{-}20)$$

where Z is a constant that depends on α; that is, the first subset occurs up to location Z and the second subset appears after Z.
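The generation procedure of Eqs. 6-18 to 6-20 can be sketched in Python as follows. Several details are our assumptions rather than statements of this section: the correlation conversion is taken as ρY = ln[1 + ρX(e^{σ²} − 1)]/σ², the standard relation between lag correlations before and after exponentiation, which we assume is what Eq. 6-3 expresses; the constants are set to Ci = μi(1 − ρYi1) and the error standard deviation to σi(1 − ρYi1²)^{1/2}, so that each sub-series is stationary with mean μi and standard deviation σi; and the switch location is taken as Z = αn. All function names are hypothetical.

```python
import math
import random


def rho_y_from_rho_x(rho_x: float, sigma: float) -> float:
    """Lag-one correlation of Y = ln(X) that yields lag-one correlation rho_x in X.

    Assumed form of Eq. 6-3: rho_x = (exp(rho_y * sigma**2) - 1) / (exp(sigma**2) - 1).
    """
    return math.log1p(rho_x * math.expm1(sigma ** 2)) / sigma ** 2


def ar1_series(n, mu, sigma, rho, rng):
    """Stationary AR(1): Y_z = rho*Y_(z-1) + C + eps_z, with C = mu*(1 - rho)."""
    c = mu * (1.0 - rho)
    eps_sd = sigma * math.sqrt(1.0 - rho ** 2)  # keeps Var(Y_z) = sigma**2
    y = [rng.gauss(mu, sigma)]  # start in the stationary distribution
    for _ in range(n - 1):
        y.append(rho * y[-1] + c + rng.gauss(0.0, eps_sd))
    return y


def bimodal_ar1_sample(n, alpha=0.3, mu1=1.0, sigma1=1.0, mu2=3.0,
                       sigma2=0.5, rho_x=0.7, rng=None):
    """One data set x_z = exp(y_z) built as in Eqs. 6-18 to 6-20.

    The switch location Z = round(alpha * n) is a hypothetical choice.
    """
    if rng is None:
        rng = random.Random()
    y1 = ar1_series(n, mu1, sigma1, rho_y_from_rho_x(rho_x, sigma1), rng)
    y2 = ar1_series(n, mu2, sigma2, rho_y_from_rho_x(rho_x, sigma2), rng)
    switch = round(alpha * n)
    # 0-based index: positions 0..switch-1 play the role of z <= Z in Eq. 6-20.
    y = [y1[z] if z < switch else y2[z] for z in range(n)]
    return [math.exp(v) for v in y]


data = bimodal_ar1_sample(100, sigma1=1.5, rng=random.Random(42))
```

Repeating the last call m times for each n and applying the mean estimators to every data set gives the MC expected values and SE’s discussed next.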
The mean estimators are applied to each data set to calculate their expected values and SE’s as described before, and these are used to evaluate the performances of the mean estimators numerically. For σ1 ≤ 1, MLE has the smallest SE, at most 13% smaller than that of the AA. As n increases, the AA, SR, and PT perform similarly in terms of uncertainty, but MLE has slightly smaller uncertainty (Figs. 6-17a to 6-17c). As σ1 exceeds one, the curves of all mean estimators overlap for small n, which makes it difficult to identify the estimator with the smallest SE; however, as n becomes very large, SR and MLE have less uncertainty than the AA and PT (Fig. 6-17d).
In addition to SE, RMSE is obtained numerically to evaluate the consistency and efficiency of the estimators (Fig. 6-18). For σ1 ≤ 1, MLE has the smallest RMSE, and the other estimators have approximately identical RMSE’s. For σ1 > 1, the curves of all mean estimators overlap for small n, which makes it difficult to tell which one has the smallest RMSE. However, as n increases, MLE has slightly higher efficiency than the other estimators (Fig. 6-18d).
The dependency between samples causes the SE’s (compare Fig. 4-7 with Fig. 6-17) and RMSE’s (compare Fig. 4-8 with Fig. 6-18) of the mean estimators to increase. This means that more positively auto-correlated samples are needed to extract the same information as n uncorrelated samples. Hence, auto-correlation causes the mean estimators to behave differently in terms of uncertainty and efficiency compared with the case of uncorrelated data points. For instance, as pointed out before, SR has smaller uncertainty than the AA for any n and σ (Fig. 4-7); however, auto-correlation leads the AA to become less uncertain than SR for certain ranges of n and σ (Figs. 6-17a to 6-17c). For large σ’s, the AA has a significantly higher uncertainty than the other mean estimators for uncorrelated samples (Figs. 4-7c and 4-7d); however, there is no significant difference between the SE’s of the AA and the other mean estimators for auto-correlated samples (Fig. 6-17d). Moreover, auto-correlation causes the differences between the RMSE’s to diminish (Fig. 6-18d), whereas the AA has a significantly larger RMSE than the other mean estimators for the uncorrelated case when σ = 1.5 and n is small (Fig. 4-8c). This means that more auto-correlated samples are needed to achieve a given accuracy if SR and PT are used as mean estimators instead of the AA.
[Fig. 6-17 panels (a) to (d): σ = 0.05, 0.5, 1.0, and 1.5; standard error SD(xT) versus 1/√n for AA, MLE, SR, and PT, with 95% confidence error bars.]
Fig. 6-17 – Standard errors of the AA, MLE, SR, and PT, with error bars showing 95% confidence intervals.
[Fig. 6-18 panels (a) to (d): σ = 0.05, 0.5, 1.0, and 1.5; RMSE versus 1/√n for AA, MLE, SR, and PT, with 95% confidence error bars.]
Fig. 6-18 – RMSE’s of the AA, MLE, SR, and PT, with error bars showing 95% confidence intervals.
6.6 Concluding Remarks
This chapter shows that, for the log-normal distribution, all mean estimators have approximately identical uncertainty and efficiency when σ < 1; however, they perform differently as σ increases and/or n decreases. SR has the smallest uncertainty and the highest efficiency among the mean estimators once σ exceeds one. The results also demonstrate that as the data points become weakly or not auto-correlated, the mean estimators approximate the mean value with smaller uncertainty and higher efficiency.
For the bimodal distribution, the mean estimators’ properties are computed only numerically, via MC simulation. The results show that auto-correlation causes the mean estimators to approximate mean values with larger uncertainty and lower efficiency. These changes in uncertainty and efficiency occur at different rates for different mean estimators.
89 Comparison of Mean Estimators
Chapter 7 : Comparison of Mean Estimators for Independent Random Variables
A reliable estimator should simultaneously have small bias, small uncertainty (i.e., small SE), high efficiency (i.e., small RMSE), and consistency (i.e., an RMSE that approaches zero for large n). However, none of the mean estimators considered in this study satisfies all four conditions together for all variabilities and sample sizes.
The AA is unbiased and MLE is asymptotically unbiased, whereas SR and PT are both biased, even for small variability. Their biases are insignificant for near-homogeneous populations but rise sharply as the population becomes heterogeneous. Nevertheless, SR and PT are unbiased when the underlying distribution is normal and have insignificant bias for some power-normal distributions with small λ, such as λ = 1/2.
For the log-normal distribution, SR and PT have smaller bias than the MLE when σ and n are both very small and/or both moderately large (Fig. 7-1a). SR has a smaller SE than PT, SRC1, and SRC2 for any n and σ; however, when

$$\sigma < \frac{17.3}{n^{2}} + \frac{4.9}{n^{3/2}} - \frac{6.8}{n} + \frac{0.03}{\sqrt{n}} + 1.1, \qquad (7\text{-}1)$$

SR has a larger SE than the AA and MLE (Fig. 7-1b). SRC1 and SRC2 are the most efficient mean estimators for some ranges of n and σ (Fig. 7-1c).
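Eq. 7-1 can be evaluated directly to check on which side of the SE boundary in Fig. 7-1b a data set falls. The following is a minimal Python transcription, assuming Eq. 7-1 reads σ < 17.3/n² + 4.9/n^{3/2} − 6.8/n + 0.03/√n + 1.1. Because it is a fitted curve, it is only meaningful over roughly the ranges of n and σ plotted in Fig. 7-1; the function names are hypothetical.

```python
import math


def eq_7_1_boundary(n: int) -> float:
    """Right-hand side of Eq. 7-1 (fitted SE boundary, log-normal case)."""
    return 17.3 / n**2 + 4.9 / n**1.5 - 6.8 / n + 0.03 / math.sqrt(n) + 1.1


def sr_se_larger_than_aa_and_mle(sigma: float, n: int) -> bool:
    """True when (sigma, n) lies in the region where SR has the larger SE."""
    return sigma < eq_7_1_boundary(n)


for n in (25, 100, 2500):
    print(n, round(eq_7_1_boundary(n), 3))
```

For typical sample sizes the boundary stays close to σ = 1 (about 0.90 at n = 25 and about 1.10 at n = 2,500), so SR has the larger SE roughly when σ < 1.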
[Fig. 7-1: three panels, (a) to (c), plotting standard deviation (σ) versus number of samples, n (25 to 2,500, log scale), with regions labelled by estimator (AA, MLE, SR, SR & PT, PT, SRC1, SRC2).]
Fig. 7-1 – σ versus n showing regions in which a mean estimator (a) has the smallest bias, (b) has the lowest SE, and (c) is the most efficient estimator compared with the other estimators, for the log-normal distribution.
When a data set follows a log-normal distribution, Fig. 7-1 can be used as a guideline for selecting an appropriate estimator of the data set’s mean value, depending on its σ and n.
Although SR has less uncertainty than the AA only over a certain range for the log-normal distribution, SR has smaller uncertainty than PT and the AA for any n and σ1 when the underlying distribution is bimodal. SR has larger uncertainty than MLE if (Fig. 7-2a)

$$\sigma_1 < \frac{32.2}{n^{2}} - \frac{4.54}{n} + 1.69. \qquad (7\text{-}2)$$

Among the mean estimators considered in this study, MLE has the highest efficiency except for large σ1 and small n (Fig. 4-8). However, applying MLE to a bimodal distribution is complex, so other mean estimators are preferable. Which of the AA, SR, and PT is the most efficient depends on the ranges of σ1 and n (Fig. 7-2b).
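Eq. 7-2 can be transcribed the same way. The sketch below assumes Eq. 7-2 reads σ1 < 32.2/n² − 4.54/n + 1.69 (bimodal case with σ2 = 0.5) and, being a fitted curve, is only meaningful over roughly the n range of Fig. 7-2; the function names are hypothetical.

```python
def eq_7_2_boundary(n: int) -> float:
    """Right-hand side of Eq. 7-2 (bimodal case, sigma2 = 0.5)."""
    return 32.2 / n**2 - 4.54 / n + 1.69


def sr_less_certain_than_mle(sigma1: float, n: int) -> bool:
    """True when (sigma1, n) lies in the region where SR has a larger SE than MLE."""
    return sigma1 < eq_7_2_boundary(n)


for n in (25, 250, 2500):
    print(n, round(eq_7_2_boundary(n), 3))
```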
[Fig. 7-2: two panels plotting σ1 versus number of samples, n (25 to 2,500, log scale); (a) regions labelled MLE, SR, PT, and AA, with MC and analytical (Analy) boundaries; (b) MC and analytical boundaries for SR vs. PT and SR vs. AA.]
Fig. 7-2 – σ1 versus n showing regions in which a mean estimator (a) has smaller uncertainty and (b) is more efficient than the other estimators when σ2 = 0.5, for the bimodal distribution.
While a small SE is a desirable property, none of the AA, SR, and PT has the smallest SE for all n and σ. This also applies when the underlying distribution is power-normal. SR has a smaller SE than PT for any n, σ, and λ; however, it has a smaller SE than the AA only for some ranges of n and σ, depending on λ, except when λ = 1 or λ = 1/2, where the AA has the smallest SE for any n and σ (Fig. 7-3).
[Fig. 7-3: standard deviation (σ) versus number of samples, n (0 to 10,000), with curves for λ = 1/4, 1/6, 1/8, and 1/16.]
Fig. 7-3 – SR has smaller SE than the AA when σ is greater than the value given by each curve
depending on n and λ; otherwise the AA has less SE for the case of power-normal distribution (solid
curves and dots obtained from the analytical expressions and MC simulation, respectively).
SR has a smaller RMSE than PT for some ranges of n and σ, depending on λ (Fig. 7-4a). The AA has the smallest RMSE for any n and σ when λ ≥ 1/8; nevertheless, as λ decreases, for example to 1/16, the AA has the smallest RMSE only for some ranges of n and σ (Fig. 7-4b).
[Fig. 7-4: two panels plotting standard deviation (σ) versus number of samples, n (25 to 25,000, log scale); (a) curves for λ = 1/2, 1/4, 1/6, 1/8, and 1/16; (b) regions labelled PT, SR, and AA for λ = 1/16.]
Fig. 7-4 – (a) PT is more efficient than SR when σ is greater than the value given by each curve
depending on n and λ; otherwise SR is more efficient; and (b) when λ =1/16, a mean estimator is the
most efficient depending on σ and n (solid curves and dots obtained from the analytical expressions
and MC simulation, respectively).
The AA is an optimum mean estimator for the power-normal distribution with λ = 1 and λ = 1/2 because it is unbiased and has the smallest uncertainty and the highest efficiency. However, as λ departs from these two values, SR is preferable to the AA for certain ranges of sample size and variability because it estimates the mean value with insignificant bias, the smallest uncertainty, and the highest efficiency.
For the bimodal distribution, MLE is an optimum mean estimator when the sample size is sufficient. However, it involves complex manipulations, so other mean estimators are preferable. SR has the smallest uncertainty compared with the AA and PT for any n and σ (Fig. 4-7); nevertheless, it is biased and less efficient for certain ranges of sample size and variability. Although the AA has larger uncertainty than SR, it is unbiased and has higher efficiency than SR over a large range of sample size and variability (Fig. 7-2b). The dependency between samples causes the mean estimators to behave differently; consequently, a different mean estimator may be chosen as optimum for auto-correlated samples than for uncorrelated samples. Auto-correlation leads the AA to have smaller uncertainty than SR for certain ranges of n and σ (Figs. 6-17a to 6-17c), whereas for uncorrelated samples the AA has larger uncertainty for all n and σ. When σ ≤ 1, the AA has higher efficiency than SR for auto-correlated data, while the two have approximately identical RMSE’s for uncorrelated data (compare Fig. 4-8 with Fig. 6-18). For example, suppose a data set has n = 30 and 1 < σ1 < 1.5. SR can be an optimum mean estimator because it has insignificant bias (Fig. 4-6), the smallest uncertainty, and the highest efficiency (Fig. 7-2b). Nevertheless, if this data set follows the auto-regressive model, the AA is an optimum mean estimator because it is unbiased and, after MLE, has the smallest uncertainty (Fig. 6-17) and the highest efficiency (Fig. 6-18).
As shown in Fig. 7-1, depending on n and σ, each of the AA, MLE, SR, and PT can be the optimum mean estimator. Although the AA is unbiased, it has the highest uncertainty and the lowest efficiency for certain ranges of n and σ. Both SR and PT are biased, but their biases can be compensated by their small uncertainty and high efficiency. However, for certain ranges of n and σ, they are not appropriate mean estimators because they significantly underestimate the mean value even though they have small SE’s and RMSE’s. Under this condition, the de-biased versions of SR can be used instead; they estimate the mean value with zero or insignificant bias, the lowest uncertainty, and the highest efficiency.
Auto-correlation also changes the performance of the mean estimators for the log-normal distribution. For example, when σ is small, SR and PT have larger uncertainty than the AA and MLE for uncorrelated samples (Fig. 3-5a), whereas they have slightly smaller SE’s than the AA and MLE for auto-correlated samples (Fig. 6-13a). Moreover, both SR and PT have larger RMSE’s than the AA and MLE for uncorrelated samples (Fig. 3-7a), whereas they have slightly smaller RMSE’s than the AA and MLE for auto-correlated samples (Fig. 6-14a). Thus, the AA is an optimum mean estimator for very small σ, regardless of n, for uncorrelated samples (Fig. 7-1); however, SR is an optimum mean estimator when the samples are auto-correlated.
Each curve in Figs. 7-1 to 7-4 represents the σ’s and n’s at which the mean estimators on either side of the curve have an identical property. For example, the green curve in Fig. 7-1c shows the σ’s and n’s at which the AA and MLE have identical RMSE’s, and as we move away from the curve, the difference between the RMSE’s of the AA and MLE increases. In other words, regions close to a curve that separates two estimators can be considered transition zones in which either estimator can be used as an optimum mean estimator.
In most cases, measurements are associated with errors (i.e., x′i = xi + e, where xi is the true value, e is the error associated with the measurement, and x′i is the reported value). If the error has a mean of zero and a standard deviation of σe and is independent of the true value, it increases the variability of the data, Var(x′i) = Var(xi) + σe², while leaving the mean value unchanged, E(x′i) = E(xi) + E(e) = E(xi). Consequently, measurement error may lead to one mean estimator being chosen as optimum for a data set, whereas another mean estimator would be more appropriate for the same data set with zero error.
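A quick Python simulation illustrates both identities, assuming zero-mean Gaussian errors independent of the true values; the log-normal parameters, σe, and the sample size below are arbitrary illustrative choices, not values from this study.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(7)
sigma_e = 0.3  # hypothetical measurement-error standard deviation
true_values = [rng.lognormvariate(0.0, 0.5) for _ in range(50_000)]
# Reported value x' = x + e, with e ~ N(0, sigma_e**2) independent of x.
reported = [x + rng.gauss(0.0, sigma_e) for x in true_values]

print(f"E(x) = {fmean(true_values):.3f}, E(x') = {fmean(reported):.3f}")
print(f"Var(x) + sigma_e**2 = {pvariance(true_values) + sigma_e**2:.3f}, "
      f"Var(x') = {pvariance(reported):.3f}")
```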
Reservoir parameters can be split into a number of subsets based on geological or geophysical character (e.g., a permeability data set subdivided by facies type). This subdivision converts a data set with variability σ and sample size n into a number of subsets with smaller variabilities, σ′, and smaller sample sizes, n′. Therefore, depending on the σ′ and n′ of each subset, a different mean estimator might be appropriate, estimating the mean value with different degrees of uncertainty and efficiency.
95 Case Studies
Chapter 8 : Case Studies
The bootstrap is a resampling technique for the case where the true distribution of the RV X is unknown and only an observed data set is available. The method randomly draws n samples, with replacement, from the observed samples, so that each sample can be selected more than once. Repeating this resampling creates m subsets. The mean estimators are then applied to the m data sets, generating the sequence $\{\hat{x}^{*}_{T,1}, \ldots, \hat{x}^{*}_{T,m}\}$. The mean value of this sequence is approximated by the AA, designated $(\hat{x}^{*}_{T})_{A}$, and its standard deviation is calculated as

$$\operatorname{std}(\hat{x}^{*}_{T}) = \left\{ \frac{\sum_{i=1}^{m} \left[ \hat{x}^{*}_{T,i} - (\hat{x}^{*}_{T})_{A} \right]^{2}}{m-1} \right\}^{1/2}.$$

$(\hat{x}^{*}_{T})_{A}$ and $\operatorname{std}(\hat{x}^{*}_{T})$ are approximations of the mean and SE of $\hat{x}_{T}$, respectively, and are good representatives when m is sufficiently large.
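The bootstrap procedure above can be sketched in Python. Swanson’s rule is taken as 0.3x10 + 0.4x50 + 0.3x90 and Pearson-Tukey’s rule as 0.185x5 + 0.63x50 + 0.185x95, their usual forms; the percentile interpolation (`statistics.quantiles` with `method="inclusive"`), the function names, and the synthetic data are our assumptions. The 95% interval is formed as (x̂*T)A ± 1.96 std(x̂*T), which reproduces the intervals listed in Tables 8-1 and 8-4 to within rounding. m defaults to 2,000 here for speed, rather than the 30,000 resamples used in this chapter.

```python
import random
from statistics import fmean, quantiles, stdev


def arithmetic_average(data):
    return fmean(data)


def swanson(data):
    """Swanson's rule 0.3*x10 + 0.4*x50 + 0.3*x90 (percentile convention assumed)."""
    q = quantiles(data, n=100, method="inclusive")  # q[k-1] is the k-th percentile
    return 0.3 * q[9] + 0.4 * q[49] + 0.3 * q[89]


def pearson_tukey(data):
    """Pearson-Tukey rule 0.185*x5 + 0.63*x50 + 0.185*x95."""
    q = quantiles(data, n=100, method="inclusive")
    return 0.185 * q[4] + 0.63 * q[49] + 0.185 * q[94]


def bootstrap(data, estimator, m=2000, seed=0):
    """Return ((x*_T)_A, std(x*_T), 95% interval) from m resamples of size n."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n values with replacement from the observed data.
    replicates = [estimator(rng.choices(data, k=n)) for _ in range(m)]
    mean_a = fmean(replicates)
    sd = stdev(replicates)  # divides by m - 1
    return mean_a, sd, (mean_a - 1.96 * sd, mean_a + 1.96 * sd)


# Synthetic stand-in data (NOT the case-study data): 21 log-normal values.
gen = random.Random(3)
sample = [gen.lognormvariate(3.66, 1.17) for _ in range(21)]
for name, est in [("AA", arithmetic_average), ("SR", swanson), ("PT", pearson_tukey)]:
    mean_a, sd, (lo, hi) = bootstrap(sample, est)
    print(f"{name}: mean {mean_a:.1f}, std {sd:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
```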
In this chapter, several datasets are analyzed to illustrate how the results of the
previous chapters can be applied and the results are compared to the bootstrap-derived
estimates. These datasets are as follows:
1. Reserves in the United Kingdom and Norwegian Central North Sea, as described
in Hurst et al. (2000);
2. Estimated ultimate recovery (EUR) of an Oklahoma gas field;
3. A permeability data set from the Cleveland Formation (Rollins et al. 1992);
4. Ultimate recoverable gas reserves of Wabamun pool (MacCrossan 1969);
5. EUR of the Hemphill gas field; and
6. A permeability data set from the North Sea.
Using the bootstrap method, m = 30,000 subsets are generated from each available data set, and the mean estimators are then applied to each subset (see the results in Tables 8-1 to 8-6).
In addition to the bootstrap, the performances of the mean estimators are evaluated using the analytical expressions derived before. There are discrepancies between the bootstrap and analytical results, mainly caused by insufficient samples relative to the heterogeneity of each data set. For example, although the second data set has about four times as many samples (n = 83) as the first (n = 21), it is more heterogeneous by a factor of 1.8 (the first and second sets have sample standard deviations of 105 and 189, respectively). Hence, given their variabilities and sample sizes, the first data set carries approximately as much information as the second ((21/83)(189/105)² ≈ 0.82).
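The factor quoted above is the ratio of n/s², the inverse of the squared SE of the AA, for the two data sets. A one-line Python check (the function name is hypothetical):

```python
def relative_information(n_a, sd_a, n_b, sd_b):
    """Ratio of n / s**2 (inverse squared SE of the AA) for data set A versus B."""
    return (n_a / sd_a**2) / (n_b / sd_b**2)


print(round(relative_information(21, 105, 83, 189), 2))  # -> 0.82
```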
The probability plots suggest that the first two examples are log-normally distributed (Fig. 8-1), so $e^{m_y + s_y^2/2}$ could be used to approximate the mean value, where $m_y$ and $s_y$ are the sample log-mean and log-standard deviation, respectively. Bayley and Hammersley (1946), however, showed that this antilog of the mean log is biased; hence, it should be multiplied by a correction factor, giving $\beta e^{m_y + s_y^2/2}$. Later, Agterberg (1974, p. 235) tabulated this correction coefficient, $\beta = \Psi_n(t)/\Psi_\infty(t)$, where $\Psi_n(t)$ is an infinite series in t and n, $\Psi_\infty(t) = e^{s_y^2/2}$, and $t = s_y^2/2$.
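The correction can be sketched in Python, assuming that Ψn(t) is the Finney-type series Ψn(t) = 1 + (n−1)t/n + Σ_{j≥2} (n−1)^{2j−1} t^j / [n^j (n+1)(n+3)⋯(n+2j−3) j!]. We believe this is the series Agterberg tabulated, but that is an assumption to be checked against Agterberg (1974); the function names are hypothetical. Since Ψ∞(t) = e^t, β e^{my + sy²/2} simplifies to e^{my} Ψn(t). With the rounded statistics of the first case study (my = 3.66, sy = 1.17, n = 21), this sketch gives β ≈ 0.95, slightly above the 0.94 used below, so the tabulated series or the unrounded inputs may differ from what is assumed here.

```python
import math


def psi_n(t: float, n: int, tol: float = 1e-15) -> float:
    """Assumed Finney-type series Psi_n(t), summed until terms are negligible.

    Term j (j >= 2) is (n-1)**(2j-1) * t**j / (n**j (n+1)(n+3)...(n+2j-3) j!),
    computed recursively from term j-1 to avoid overflow.
    """
    term = (n - 1) * t / n  # j = 1 term
    total = 1.0 + term
    j = 1
    while term > tol * total:
        j += 1
        term *= (n - 1) ** 2 * t / (n * (n + 2 * j - 3) * j)
        total += term
    return total


def corrected_lognormal_mean(m_y: float, s_y: float, n: int):
    """Return (beta, beta * exp(m_y + s_y**2 / 2)) with beta = Psi_n(t) / exp(t)."""
    t = s_y ** 2 / 2.0
    beta = psi_n(t, n) / math.exp(t)  # Psi_inf(t) = exp(t)
    return beta, beta * math.exp(m_y + t)


beta, mean = corrected_lognormal_mean(3.66, 1.17, 21)
print(f"beta = {beta:.3f}, mean = {mean:.1f}")
```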
[Fig. 8-1: probability plots of reserves versus probability (%). (a) Hurst et al. (2000) reserves (MMBO): # points = 21; sample mean of ln(x) = 3.66; sample SD of ln(x) = 1.17; sample SR = 67.5; sample AA = 76.3; SR/AA = 0.88; sample SD = 105; sample CV = 1.38. (b) OK field reserves (MMCFE): # wells = 83; sample mean of ln(x) = 3.09; sample SD of ln(x) = 1.94; sample SR = 69.1; sample AA = 90.6; SR/AA = 0.76; sample SD = 189; CV = 2.09.]
Fig. 8-1 – Probability plots of data sets taken from (a) Hurst et al. (2000) in million barrel oil
(MMBO), and (b) EUR of an OK field in million cubic feet (MMCFE) with statistical properties
calculated from available data sets.
Hurst et al.’s (2000) data set has n = 21 points, which approximate a log-normal distribution with a log-mean of 3.66 and a log-standard deviation of 1.17 (Fig. 8-1a). The correction factor β = 0.94 is used to calculate the unbiased estimate of the mean as 72.7 million barrels of oil (MMBO). Hurst et al. (2000) proposed a most likely reserves distribution because Swanson’s mean estimate is feasible given the available mapped closure and the variation in pay and recovery factor. However, SR underestimates the mean value by 10% (Fig. 3-4). In other words, the proposed reserves distribution may have 10% more reserves, which is easily accommodated within the geological constraints described by Hurst et al. (2000).
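For a log-normal population, the expected SR-to-mean ratio can be approximated by plugging population percentiles into Swanson’s rule: xp = e^{μ + zp σ}, with z0.9 ≈ 1.2816, so SR/E(X) ≈ [0.3e^{−1.2816σ} + 0.4 + 0.3e^{1.2816σ}]/e^{σ²/2}, and μ cancels. This large-sample approximation ignores sampling error in the percentiles and is not necessarily the expression behind Fig. 3-4; the function name is hypothetical. At σ = 1.17 it gives a ratio of about 0.91, in line with the theoretical E(xT)/E(X) listed for SR in Table 8-1.

```python
import math
from statistics import NormalDist


def sr_to_mean_ratio_lognormal(sigma: float) -> float:
    """Large-sample SR / E(X) for a log-normal population with log-SD sigma.

    Uses population percentiles x_p = exp(mu + z_p * sigma); mu cancels out.
    """
    z90 = NormalDist().inv_cdf(0.9)
    sr = 0.3 * math.exp(-z90 * sigma) + 0.4 + 0.3 * math.exp(z90 * sigma)
    return sr / math.exp(sigma ** 2 / 2.0)


for s in (0.05, 0.5, 1.17, 1.5):
    print(s, round(sr_to_mean_ratio_lognormal(s), 3))
```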
Based on the analytical expressions, in which log-normality of the data set is assumed, xSR is preferable to xPT, as the SR bias (10%) is compensated by its 15% smaller SE and 12% smaller RMSE than those of PT (Table 8-1, last three columns). The MLE performs poorly because its performance depends strongly on n, which is very small in this case. SR outperforms the AA in terms of efficiency and uncertainty; on the other hand, SR estimates the mean value with a 10% error, whereas the AA is unbiased, which makes it difficult to choose between SR and the AA as the preferable mean estimator. The two SR bias-reduction approaches significantly reduce the SR bias but increase the SE from 25.1 to 27.5 and 29.2 MMBO, respectively. Nevertheless, SRC1 has a smaller SE and RMSE than the other mean estimators, with the exception of SR, and estimates the mean with zero bias.
The bootstrap results show that the AA, MLE, and PT estimate the mean value with large SE’s (Table 8-1). Although SRC1 and SRC2 have about 16% larger SE’s than SR, they estimate the mean value with 0.08% and 0.01% bias, respectively. Thus, based on the bootstrap and analytical results, SR is not an attractive estimator for this data set, while SRC1 and SRC2 are possible mean estimators in this case. Furthermore, their reduced bias increases the economic value of the region by approximately $0.7 billion at $100/bbl oil.
Table 8-1 – Statistical properties of the Hurst et al. (2000) data set.
Mean        Sample mean   Bootstrap results (MMBO)                      Theory (MMBO)
estimator   x̂T (MMBO)     (x̂*T)A   std(x̂*T)   95% CI of x̂*T          std(xT)   E(xT)/E(X)   RMSE
AA          76.30         76.24    30.27      16.92 – 135.57         28.79     1.00         28.79
MLE         77.40         79.17    27.04      26.16 – 132.17         29.04     1.06         29.39
PT          56.80         81.62    26.68      29.33 – 133.90         29.46     0.97         29.55
SR          67.50         72.05    22.70      27.56 – 116.54         25.11     0.91         25.98
SRC1        74.10         79.28    27.12      26.11 – 132.44         27.53     1.00         27.53
SRC2        75.50         79.21    27.00      26.29 – 132.14         29.17     1.00         29.17
The second data set consists of the gas reserves of an Oklahoma (OK) field, based on 83 wells. The probability plot shows an approximately log-normal distribution for the EUR, with a log-mean of 3.1 and a log-standard deviation of 1.94; with β = 0.94, the unbiased estimate of the mean is 136.85 million cubic feet (MMCFE) (Fig. 8-1b). This data set gives a Swanson mean of 69 MMCFE, whereas the AA is 90.6 MMCFE, a 24% difference (Table 8-2). At $3 per MCFE, this difference is equivalent to about US$65,000 in reserves per well, which could lead to a poor economic assessment of the whole prospect.
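The economic figures quoted in this chapter follow from simple arithmetic: a difference in mean reserves in MMCFE is converted to MCFE (×1,000) and multiplied by the gas price and, where applicable, the number of wells. The small Python helper below (a hypothetical name) reproduces the US$65,000-per-well figure here, as well as the Wabamun and Hemphill figures given later in this chapter.

```python
def reserve_value_difference(mean_a_mmcfe, mean_b_mmcfe, price_per_mcfe, wells=1):
    """US$ value of the gap between two mean reserve estimates (1 MMCFE = 1,000 MCFE)."""
    return abs(mean_a_mmcfe - mean_b_mmcfe) * 1_000 * price_per_mcfe * wells


print(reserve_value_difference(90.6, 69.1, 3.0))               # OK field, per well
print(reserve_value_difference(327_659, 308_175, 3.0))         # Wabamun pool
print(reserve_value_difference(1824, 1796.8, 3.0, wells=416))  # Hemphill field
```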
Based on the analytical expressions, although the AA is unbiased, it has the lowest efficiency and the largest SE (Table 8-2, last three columns). The MLE performs better than the AA in terms of efficiency and uncertainty; however, it is biased because n = 83 is not sufficiently large. PT performs slightly better than MLE in terms of efficiency and uncertainty, but it has a larger bias than MLE. SR has the smallest SE, but it underestimates the mean value by 40% and is as inefficient as MLE. The de-biased SR corrections increase the SE from 29.7 to 48.66 and 52.83 MMCFE. However, they are preferable to SR and PT because they have smaller RMSE’s and negligible bias, which compensate for their larger SE’s. SRC1 and SRC2 are also preferable to the AA and MLE because they have smaller SE’s and RMSE’s.
Table 8-2 – Statistical properties of gas reserves of an Oklahoma field.
Mean        Sample mean   Bootstrap results (MMCFE)                     Theory (MMCFE)
estimator   x̂T (MMCFE)    (x̂*T)A   std(x̂*T)   95% CI of x̂*T          std(xT)   E(xT)/E(X)   RMSE
AA          90.60         90.66    107.32     0 – 130.60             102.77    1.00         102.77
MLE         143.00        144.88   48.39      50.04 – 222.50         59.90     1.07         60.73
PT          88.30         97.27    38.33      22.14 – 173.30         45.50     0.78         55.28
SR          69.10         88.62    19.61      50.18 – 126.70         29.74     0.61         63.15
SRC1        112.90        145.49   39.93      67.23 – 211.20         48.66     1.0045       48.66
SRC2        99.20         144.89   39.84      66.80 – 197.20         52.83     0.9998       52.83
The bootstrap results also indicate that SR has the smallest SE; however, the unbiased sample mean of 136.85 MMCFE does not lie within the 95% confidence interval of SR. The AA has the largest SE, followed in descending order by MLE, SRC1, SRC2, and PT. As mentioned before, the MLE is biased for small n, so it is not a good alternative mean estimator. PT is not an appropriate alternative either, because it underestimates the mean value by 22% (Fig. 3-4). Hence, according to the bootstrap and analytical results, the de-biased SRC1 and SRC2, in that order, might be appropriate mean estimators.
The third example is a permeability data set from the Cleveland Formation reported by Rollins et al. (1992). This data set (n = 319) follows a log-normal distribution with a log-mean of −3.6 and a log-standard deviation of 1.73, with permeability measured in millidarcies (mD) (Rollins et al. 1992, Fig. 5). Consequently, the unbiased estimate of the mean is 0.121 mD, using the correction factor β = 0.99. The bootstrap method is not applied to this data set because the data values are unavailable; only analytical results are presented here (Table 8-3).
Table 8-3 – Statistical properties of measured permeability in the Cleveland Formation.
Mean        Sample mean   Theory (mD)
estimator   x̂T (mD)       std(xT)   E(xT)/E(X)   RMSE
AA          0.180         0.0297    1.0          0.0297
MLE         0.121         0.0191    1.01         0.0192
PT          0.100         0.0184    0.86         0.0254
SR          0.090         0.0130    0.71         0.0372
SRC1        0.122         0.0188    1.006        0.0189
SRC2        0.121         0.0198    1.0          0.0198
The data set gives a Swanson mean of 0.09 mD, whereas the AA is 0.18 mD, a 50% difference. This 50% difference changes the classification of the Cleveland Formation from tight (less than 0.1 mD) to conventional for tax and regulatory purposes. SRC1 and SRC2 estimate the mean value as 0.122 and 0.121 mD, respectively.
SR and PT significantly underestimate the mean value by 28% and 15%, respectively,
and they have the lowest efficiency. Therefore, neither SR nor PT is an appropriate mean
estimator, although SR has the smallest SE. There is a clear preference for xSRC1, xSRC2,
and xMLE compared to SR as around 30% larger SE’s of SRC1, SRC2, and MLE are
compensated by their almost zero bias and 50% smaller RMSE. The AA is not a suitable
mean estimator because it has low efficiency and large SE although it is unbiased. SRC1,
SRC2, and MLE are preferable to the AA because they have smaller SE’s and RMSE’s
than the AA in addition to having around zero bias. Hence, SRC1, SRC2, and MLE might
be among the best mean estimators for this case.
To illustrate our findings for the bimodal distribution, an example that follows a bimodal distribution is provided here: the ultimate recoverable gas reserves of the Wabamun pool based on 28 wells (MacCrossan 1969). The distribution of the data set can be described by a combination of two log-normal distributions with α = 0.6 (Fig. 8-2). The data set gives SR and AA values of 327,659 and 308,175 MMCFE, respectively, a 6% difference (Fig. 8-2). Although this difference appears insignificant, it is equivalent to a 19,500 MMCFE difference in reserves and a US$58,500,000 difference in value at $3 per MCFE.
[Fig. 8-2: probability plot of gas reserves (MMCFE, in thousands, log scale) versus probability (%). Sample properties of ln(x): Mean1 = 23.81, Std1 = 0.81, Mean2 = 26.79, Std2 = 0.82. # Wells = 28; Swanson's Rule = 3.28E+05; Arithmetic Average = 3.08E+05; SR/Arith A = 0.94; Sample Std = 4.E+05; VDP = 0.8; CV = 1.33.]
Fig. 8-2 – Probability plot of the data set taken from MacCrossan (1969) with sample statistical
properties calculated from available data sets.
The bootstrap results show that SR has the largest SE, whereas MLE has the smallest (Table 8-4). MLE is not a good candidate alternative mean estimator for this data set because it depends strongly on n, such that it is biased for small n, and it is complex to apply when estimating the mean of a bimodal distribution. Although PT has a smaller SE than the AA, it is biased; thus, the AA might be appropriate for estimating the mean value.
If it is assumed that the true distribution is bimodal with μ1 = 23.81, μ2 = 26.79, σ1 = 0.81, σ2 = 0.82, and α = 0.6, the statistical properties of the mean estimators can be calculated analytically (Table 8-4, last three columns). The results show that MLE has the smallest SE and RMSE, which can compensate for its 1.6% bias; however, as mentioned before, MLE involves complex manipulations. Thus, MLE is not an appropriate mean estimator, and PT cannot be an optimum mean estimator either, since it has the largest bias, SE, and RMSE. Although SR has a 1.6% smaller SE and a 2.6% smaller RMSE than the AA, it underestimates the mean value by 4.5%. Thus, the AA might be the proper mean estimator.
Table 8-4 – Statistical properties of the data set taken from MacCrossan (1969).
Mean        Sample mean   Bootstrap results (MMCFE)                       Theory (MMCFE)
estimator   x̂T (MMCFE)    (x̂*T)A    std(x̂*T)   95% CI of x̂*T           std(xT)   E(xT)/E(X)   RMSE
AA          308,175       307,930   75,857     159,251 – 456,609       86,796    1.000        86,796
MLE         266,060       267,458   52,000     165,538 – 369,378       44,063    1.016        43,871
PT          285,089       302,212   70,707     163,627 – 440,798       90,020    0.908        86,840
SR          327,659       340,311   91,234     161,492 – 519,130       85,414    0.953        84,532
Another example consists of the EUR of 416 wells in the Hemphill gas field. The probability plot of the transformed data set, with y = (x^0.28 - 1)/0.28, indicates a normal distribution. The transformed sample mean is 23.14 MMCFE^0.28 and the standard deviation is 7.65 MMCFE^0.28 (Fig. 8-3).
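The transformation used for the Hemphill data is a power (Box-Cox-type) transform. The sketch below implements the transform and its inverse, assuming λ = 0.28. The EUR values are purely hypothetical.

```python
import numpy as np

LAM = 0.28  # exponent reported for the Hemphill EUR data


def power_transform(x, lam=LAM):
    """y = (x**lam - 1) / lam for lam != 0, and y = ln(x) for lam == 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam


def inverse_power_transform(y, lam=LAM):
    """x = (1 + lam*y)**(1/lam) for lam != 0, and x = exp(y) for lam == 0."""
    y = np.asarray(y, dtype=float)
    return np.exp(y) if lam == 0 else (1.0 + lam * y) ** (1.0 / lam)


# Hypothetical EUR values (MMCFE), not the field data.
eur = np.array([350.0, 900.0, 1500.0, 2400.0, 5200.0])
y = power_transform(eur)
print(y.mean(), y.std(ddof=1))   # transformed sample mean and SD
```

The transform is monotone. Back-transforming the transformed mean, e.g. `inverse_power_transform(23.14)`, therefore gives the median of EUR under the power-normal assumption, not its mean. The mean requires the moments derived in Appendix E.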
[Figure: probability plot of (EUR^λ - 1)/λ, EUR in MMCFE, versus probability (%). Sample SR = 1796.8, sample AA = 1824, SR/AA = 0.98, sample CV = 0.92, sample SD = 1683.9, sample VDP = 0.68.]
Fig. 8-3 – Probability plot of the transformed EUR of the Hemphill gas field with exponent λ=0.28.
The AA and SR give sample means of 1824 and 1796.8 MMCFE, respectively, only a 2% difference. This small difference amounts to about 27.2 MMCFE per well and 11,300 MMCFE in total (n = 416 wells). At US$3 per MCFE, that is a US$33,950,000 difference in the economic assessment. The example thus shows that choosing the correct mean estimator matters and may affect decisions on further development.
Assume the samples were taken from a population whose true distribution is power-normal, with a transformed mean of 23.16 MMCFE^0.28 and a standard deviation of 7.65 MMCFE^0.28. The statistical properties of the mean estimators can then be calculated analytically (Table 8-5, last three columns).

- PT estimates the mean value with slightly smaller bias (0.1%) than SR (0.4%) and SRC (0.3%).
- SR and SRC are nevertheless more desirable than PT, since their SEs are about 2.0% smaller and their RMSEs about 1.8% smaller.
- The AA is unbiased, but SR and SRC are preferable to it because their biases are compensated by 6.3% smaller SE and 6.0% smaller RMSE.

Even so, using SR or SRC to estimate the mean value underestimates reserves by about US$22,000 and US$15,000 per well, respectively.
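The bias-versus-SE trade-off in the paragraph above rests on RMSE combining both terms: RMSE² = SE² + bias². The sketch uses hypothetical numbers, not Table 8-5 values. It shows how a slightly biased estimator with a smaller SE can still have the smaller RMSE.

```python
import math


def rmse(se, bias):
    """Root-mean-square error from standard error and bias: sqrt(SE^2 + bias^2)."""
    return math.hypot(se, bias)


def pct_smaller(a, b):
    """Percent by which a is smaller than b, relative to b."""
    return 100.0 * (b - a) / b


# Hypothetical illustration: an unbiased estimator versus one with a 0.4% low bias
# but a smaller SE.
true_mean = 1000.0
unbiased = rmse(se=90.0, bias=0.0)
biased = rmse(se=84.0, bias=-0.004 * true_mean)
print(unbiased, biased, pct_smaller(biased, unbiased))
```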
Table 8-5 – Statistical properties of EUR data set of the Hemphill gas field.
Mean estimator | Sample mean x̂_T (MMCFE) | Bootstrap mean of x̂*_T (MMCFE) | Bootstrap std(x̂*_T) (MMCFE) | Bootstrap 95% confidence interval of x̂*_T (MMCFE) | Theory std(x_T) (MMCFE) | Theory E(x_T)/E(X) | Theory RMSE (MMCFE)
AA | 1824.00 | 1824.4 | 82.21 | 1663.3 to 1985.5 | 90.8 | 1.000 | 90.81
PT | 1831.00 | 1838.8 | 95.49 | 1651.6 to 2026.0 | 86.8 | 1.001 | 86.83
SR | 1796.80 | 1807.3 | 92.56 | 1625.9 to 1988.7 | 85.0 | 0.996 | 85.26
SRC | 1799.76 | 1811 | 92.42 | 1629.9 to 1992.1 | 85.2 | 0.997 | 85.33
The last example consists of 94 core-plug permeability measurements taken from a well in the North Sea. The probability plot of the data set on a log scale indicates a log-normal distribution (Fig. 8-4). The sample mean of ln(x) is 4.31 and the sample standard deviation of ln(x) is 1.33. The data follow an AR(1) model with a lag-one autocorrelation coefficient of 0.4. The AA and SR give sample means of 163 and 150 mD, respectively, an 8% difference.
[Figure: probability plot of permeability (mD, log scale) versus probability (%), 94 points. Sample mean of ln(x) = 4.31, sample SD of ln(x) = 1.33.]
Fig. 8-4 – Probability plot of a permeability data set taken from the North Sea.
The mean estimators' properties are calculated analytically under the assumption that the data set comes from a log-normal population. The population has a log-mean of 4.31, a log-standard deviation of 1.33, and ρ_X1 = 0.4 (Table 8-6, last three columns). The analytical results show the following:

- PT approximates the mean value with 5% and 10% less bias than MLE and SR, respectively.
- However, PT has 8% and 15% higher uncertainty, and 9% and 20% lower efficiency, than MLE and SR, respectively.
- Compared with the AA and PT, the smaller SEs and RMSEs of MLE and SR can therefore compensate for their larger bias.
- SR is preferable to MLE because it has 7% smaller uncertainty and 10% higher efficiency than MLE.
- Although the AA is unbiased, SR is preferable to it because the SR bias (9%) is compensated by 18% smaller SE and 13% smaller RMSE.
Because the data points are auto-correlated, the ordinary bootstrap used in the previous examples cannot be used as the resampling technique here, since it assumes independent samples. The block bootstrap method is used instead. Its results agree with the analytically calculated results.
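A moving-block bootstrap resamples contiguous blocks so that short-range autocorrelation is preserved within each block. The sketch below is one common variant. Two assumptions apply:

- The block length of 5 is arbitrary; the block length used for Table 8-6 is not stated here.
- The AR(1) series is synthetic, generated with the parameters quoted above.

```python
import numpy as np


def moving_block_bootstrap(x, estimator, block_len, n_boot=2000, seed=0):
    """Moving-block bootstrap: draw overlapping blocks of length block_len at random,
    concatenate them, trim to len(x), and apply the estimator to each pseudo-sample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    n_blocks = int(np.ceil(n / block_len))
    n_starts = n - block_len + 1                     # number of distinct overlapping blocks
    reps = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)).ravel()[:n]
        reps[b] = estimator(x[idx])
    lo, hi = np.percentile(reps, [2.5, 97.5])
    return reps.mean(), reps.std(ddof=1), (lo, hi)


def swanson(x):
    p10, p50, p90 = np.percentile(x, [10, 50, 90])
    return 0.3 * p10 + 0.4 * p50 + 0.3 * p90


# Synthetic AR(1) log-permeability series using the parameters quoted above
# (log-mean 4.31, log-SD 1.33, lag-one autocorrelation 0.4); not the measured data.
rng = np.random.default_rng(3)
n, rho, mu, sigma = 94, 0.4, 4.31, 1.33
y = np.empty(n)
y[0] = rng.normal(mu, sigma)
for i in range(1, n):
    y[i] = (1 - rho) * mu + rho * y[i - 1] + rng.normal(0.0, sigma * np.sqrt(1 - rho**2))
k = np.exp(y)

print("AA:", moving_block_bootstrap(k, np.mean, block_len=5))
print("SR:", moving_block_bootstrap(k, swanson, block_len=5))
```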
Table 8-6 – Statistical properties of permeability data set measured along a well located in the North Sea.
Mean estimator | Sample mean x̂_T (mD) | Block bootstrap mean of x̂*_T (mD) | Block bootstrap std(x̂*_T) (mD) | Block bootstrap 95% confidence interval of x̂*_T (mD) | Theory std(x_T) (mD) | Theory E(x_T)/E(X) | Theory RMSE (mD)
AA | 163.29 | 173.68 | 66.70 | 42.95 to 304.41 | 68.11 | 1.000 | 68.11
MLE | 181.31 | 197.56 | 65.21 | 69.75 to 325.37 | 63.56 | 1.053 | 64.28
PT | 168.71 | 186.28 | 70.30 | 48.49 to 324.07 | 69.22 | 1.002 | 69.22
SR | 149.73 | 167.96 | 61.39 | 47.63 to 288.28 | 57.71 | 0.907 | 60.14
Chapter 9 : Conclusions and Recommendations
9.1 Conclusions
An optimum mean estimator should simultaneously have small bias, small uncertainty (small SE), consistency (RMSE approaching zero for large n), and high efficiency (small RMSE). None of the AA, MLE, SR, and PT satisfies all these conditions at the same time for all ranges of variability and sample size. For every distribution type considered except the normal distribution, both SR and PT are biased even for a very large n and small σ. By contrast, the AA is unbiased and MLE is asymptotically unbiased.
For a log-normal distribution, the AA is consistent for any n and σ, but it has the smallest SE and RMSE only for certain ranges of n and σ. SR and PT are inconsistent even for a very large n and small σ. PT estimates the mean value with slightly less bias than SR, but SR has a smaller SE than PT for any n and σ. MLE is asymptotically unbiased and efficient, which means that its performance strongly depends on n. Although SR is biased, it might be preferable to the AA and MLE, since it is more efficient than both for some ranges of n and σ.
SR underestimates the mean value, even for log-normal populations with small standard deviations. As Megill (1984) observes, this underestimation increases rapidly as σ rises, so users should be aware of the SR bias. For example, a 10% underestimation of the mean reserves of Hurst et al.'s (2000) data set could lead to a poor assessment of the prospect. Likewise, a 50% underestimation could cause the Cleveland formation to be classified as a tight reservoir.
Being unbiased is a desirable property, but it is not necessarily the most important property of a mean estimator, because SR can be de-biased using a correction factor. Two approaches to de-biasing SR are described here:

- multiplying SR by a coefficient (x_SRC1);
- adjusting the weights of SR based on the population standard deviation (x_SRC2).

Both approaches need σ, which is not always available, so the sample standard deviation must be used instead. Estimating σ introduces errors. For example, when σ = 2.0 the errors in estimating E(x_SRC1) and E(x_SRC2) are at most 17% and 20%, respectively. Nonetheless, the errors rapidly approach zero as n increases and/or σ decreases.
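As an illustration of the first approach, one natural coefficient for a log-normal population is the reciprocal of E(x_SR)/E(X) from Eq. A-11, with σ replaced by the sample standard deviation of ln(x). This form is assumed for illustration only; the thesis's exact definition of x_SRC1 may differ.

```python
import math
from statistics import NormalDist, stdev

W90 = NormalDist().inv_cdf(0.90)   # w_90; for the symmetric SR weights, w_10 = -w_90


def sr_bias_ratio(sigma):
    """E(x_SR)/E(X) for a log-normal population, from Eq. A-11 with weights 0.3/0.4/0.3.
    The log-mean mu cancels, so the ratio depends on sigma only."""
    num = 0.3 * math.exp(-sigma * W90) + 0.4 + 0.3 * math.exp(sigma * W90)
    return num / math.exp(sigma**2 / 2)


def corrected_sr(sr_value, sigma):
    """Hypothetical multiplicative correction: divide SR by its expected-value ratio."""
    return sr_value / sr_bias_ratio(sigma)


def corrected_sr_plugin(sr_value, x):
    """Same correction with sigma replaced by the sample SD of ln(x)."""
    s = stdev(math.log(v) for v in x)
    return corrected_sr(sr_value, s)


print(sr_bias_ratio(0.5), sr_bias_ratio(1.0), sr_bias_ratio(2.0))
```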
Converting SR to an unbiased mean estimator increases its SE. Even so, for some ranges of n and σ the SEs of x_SRC1 and x_SRC2 remain smaller than those of the other mean estimators except SR. De-biasing makes SR consistent for any n and σ, and the most efficient mean estimator over large ranges of n and σ.
For a bimodal distribution, SR has smaller uncertainty than the AA and PT for any variability and sample size, but smaller uncertainty than MLE only for certain ranges of variability and sample size. For moderate variability, MLE is the most efficient mean estimator, but it has the lowest efficiency for large variability and small sample sizes. None of the AA, SR, and PT is the most efficient mean estimator for all ranges of variability and sample size. Hence, for some ranges of variability and sample size, SR becomes an optimum mean estimator because its bias is compensated by its smaller uncertainty and higher efficiency.
For a power-normal distribution, SR and PT are both biased for all λ values except λ = 1, and their bias is negligible when λ = 1/2 and λ = 1/3. PT approximates the mean value with smaller bias than SR, whereas SR has smaller uncertainty than PT for any n, σ, and λ. Efficiency is evaluated with RMSE, which incorporates both SE and bias. SR is therefore more efficient than PT when its smaller SE compensates for its bias; otherwise, PT is more efficient. Consequently, SR is preferable to PT for some ranges of n and σ, depending on λ.
When λ = 1 and λ = 1/2, the AA has the smallest uncertainty. As λ departs from these two values, SR has smaller uncertainty than the AA for certain ranges of n and σ, depending on λ. When λ ≥ 1/8, the AA has the highest efficiency and becomes the most preferable mean estimator, because being unbiased and most efficient can compensate for any larger SE. As λ approaches zero, however, any of the AA, SR, and PT could be the preferable mean estimator, depending on n, σ, and λ.
To de-bias SR, its weights are modified based on μ, σ, and λ. Since these parameters are unknown in most cases, they are estimated from the available data set. Using the estimates introduces errors into the estimation of E(x_SRC), but these errors rapidly drop to zero as λ increases and/or σ decreases.

Making SR unbiased increases its SE compared with the original SR. The SE is nevertheless still smaller than the SEs of the AA and PT, except when λ is very small, such as λ = 1/16. SRC has higher efficiency than PT and lower efficiency than the AA for any n, σ, and λ. The exception is λ = 1/16, where SRC has higher efficiency than both PT and the AA for certain ranges of n and σ. Compared with SR, SRC is more efficient for some ranges of n and σ, depending on λ.
So far, the RVs have been assumed to be i.i.d., but this is not always valid, because reservoir parameters might be auto-correlated. Positive auto-correlation decreases efficiency and increases uncertainty in estimating the mean value. Auto-correlated samples are thus less informative than uncorrelated samples, so more of them are needed to achieve a given accuracy. Auto-correlation also affects each mean estimator differently. The effective sample size (ESS) needed to achieve a given accuracy therefore depends on which mean estimator is used.
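For the arithmetic average of a stationary AR(1) series, the ESS has a closed form. The sketch assumes ESS is defined as the i.i.d. sample size that gives the same variance of the AA. Other estimators, such as SR, would need their own ESS, as noted above.

```python
def ess_ar1_mean(n, rho):
    """Effective sample size of the arithmetic average of n values from a stationary AR(1).

    Var(AA) = sigma^2 / n^2 * [n + 2 * sum_{k=1}^{n-1} (n - k) * rho^k]; the ESS is the
    i.i.d. sample size that would give the same variance, sigma^2 / Var(AA).
    """
    s = n + 2.0 * sum((n - k) * rho**k for k in range(1, n))
    return n * n / s


def ess_ar1_mean_large_n(n, rho):
    """Large-n approximation of the same quantity: n (1 - rho) / (1 + rho)."""
    return n * (1.0 - rho) / (1.0 + rho)


print(ess_ar1_mean(94, 0.4), ess_ar1_mean_large_n(94, 0.4))
```

For the North Sea case of Chapter 8 (n = 94, ρ = 0.4), the large-n approximation gives about 40 effective samples.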
9.2 Future Work
The following questions and issues are briefly described as potential topics for future research.
9.2.1 Evaluate Swanson’s Rule Performance for Very Small Sample Sizes
This study evaluates the performance of SR for n ≥ 25. In some cases, however, making many measurements is expensive, so the available data set contains few samples (n < 25). It is therefore recommended to assess the performance of SR for very small sample sizes.
9.2.2 Consider Beta Distribution for Percentiles
This study assumes that the uth percentile is normally distributed. In fact, its exact distribution follows from a beta distribution: the population CDF evaluated at the sample uth percentile is beta distributed. The uth percentile becomes normally distributed only for very large sample sizes, i.e., it is asymptotically normal.

As seen before, there is a discrepancy between the analytical and numerical approaches, especially for small sample sizes, which might be due to this assumption. It is therefore recommended to derive the properties of SR analytically from the beta distribution of the uth percentile.
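The beta-distribution fact behind this recommendation can be illustrated directly. If U_(k) is the kth order statistic of n uniforms, then U_(k) ~ Beta(k, n − k + 1). For a log-normal population, x_(k) = exp(μ + σΦ⁻¹(U_(k))). The sketch compares a Monte Carlo mean of x_(k) with the normal-approximation value of Eq. A-2. The rank k = round(u(n + 1)) used for the percentile is an assumption.

```python
import numpy as np
from scipy.special import ndtri   # inverse of the standard normal CDF


def order_statistic_mean_lognormal(k, n, mu, sigma, reps=200_000, seed=0):
    """Monte Carlo E(x_(k)) for a log-normal sample of size n, using the exact result
    U_(k) ~ Beta(k, n - k + 1) and x_(k) = exp(mu + sigma * Phi^{-1}(U_(k)))."""
    rng = np.random.default_rng(seed)
    u = rng.beta(k, n - k + 1, size=reps)
    return float(np.exp(mu + sigma * ndtri(u)).mean())


def normal_approx_mean_lognormal(u, mu, sigma):
    """Value used under the normal approximation: E(x_u) = exp(mu + sigma * w_u), Eq. A-2."""
    return float(np.exp(mu + sigma * ndtri(u)))


n, mu, sigma = 25, 0.0, 1.0
k = round(0.9 * (n + 1))          # rank taken for P90 (one common convention; an assumption)
print(order_statistic_mean_lognormal(k, n, mu, sigma), normal_approx_mean_lognormal(0.9, mu, sigma))
```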
9.2.3 Extend Delfiner’s Approach
Delfiner (2007) advocated the use of SR to reduce the pitfalls of permeability estimates from the Phi-k relationship. He made this comparison for a Phi-k data set with a correlation coefficient of 0.64. However, he did not address whether the method applies to Phi-k cross-plots with other correlation coefficients. It is therefore recommended to evaluate his approach for different correlation coefficients and to test whether it is statistically better.
9.2.4 Evaluate Swanson’s Rule Performance for Truncated Log-normal Distribution
Rose (2001) raised a notable issue about SR, but his conclusion would have been more persuasive had he quantitatively studied the bias of SR for a wide range of truncated log-normal distributions. He also overlooked that, after truncation, only 98% of the cumulative distribution function (CDF) is used to calculate the mean value, whereas the SR formula is based on 100% of the CDF. The SR formula might therefore need to change to account for the truncation. The change might be insignificant, but it should be evaluated. It would therefore be of interest to evaluate comprehensively the bias, uncertainty, efficiency, and consistency of SR when the underlying distribution is a truncated log-normal distribution with a wide range of variability.
Appendix A : Order-Statistics Samples
We wish to derive analytically the expected value and standard deviation of the discretization methods, which can be written in the general form

$$x_{dis} = P_1 x_r + P_2 x_s + P_3 x_t, \qquad \text{(A-1)}$$

where the subscript $dis$ stands for discretization method and $P_i$ is the weight assigned to the corresponding percentile ($x_r$, $x_s$, or $x_t$, where $x_u$ denotes the $u$th percentile). For this purpose, $x_u$ is assumed to be normally distributed with mean $X_u$ and variance $u(1-u)/\left(n\,h^2(X_u)\right)$. The covariance of the $u$th and $v$th percentiles is assumed to be $u(1-v)/\left(n\,h(X_u)\,h(X_v)\right)$, where $u < v$ (Ord and Stuart 1987). In these products, and in Eqs. A-3 and A-4, $u$ and $v$ are expressed as fractions.

Hence, when the population is log-normally distributed with log-mean $\mu$ and log-variance $\sigma^2$, the expected value and variance of the $u$th percentile can be expressed as

$$E(x_u) = e^{\mu + \sigma w_u}, \qquad \text{(A-2)}$$

and

$$Var(x_u) = \frac{2\pi\sigma^2}{n}\, u(1-u)\, e^{2\mu + 2\sigma w_u + w_u^2}, \qquad \text{(A-3)}$$

respectively, where $w_u = \Phi^{-1}(u/100)$, $\Phi$ denotes the standard normal cumulative distribution function, and $n$ is the sample size. The covariance of the $u$th and $v$th percentiles, where $u < v$, is given by

$$Cov(x_u, x_v) = \frac{2\pi\sigma^2}{n}\, u(1-v)\, e^{2\mu + \sigma(w_u + w_v) + \frac{1}{2}\left(w_u^2 + w_v^2\right)}. \qquad \text{(A-4)}$$

Eq. A-1 is a linear combination of three percentiles, so Pearson's method is used to derive the analytical expressions of $E(x_{dis})$ and $Var(x_{dis})$ (Ord and Stuart 1987).
To derive the expected value and variance of a function of RVs, $g_{X_1,\dots,X_k}(x_1,\dots,x_k)$, Pearson's method takes a Taylor series expansion of the function around the expected values of the RVs $X_1, \dots, X_k$. The expansion is then truncated after the linear term:

$$g_X(x) = g(\theta) + \sum_{i=1}^{k} g'_i(\theta)\,(x_i - \theta_i) + O(n^{-1}), \qquad \text{(A-5)}$$

where $g'_i(x) = \partial g(x_1,\dots,x_i,\dots,x_k)/\partial x_i$ is evaluated at $\theta = \{\theta_1,\dots,\theta_k\}$, and $\theta_i$ is the expected value of the RV $X_i$, $i = 1,\dots,k$. The expected value and variance of $g(x)$ can then be expressed, respectively, as

$$E\left(g_X(x)\right) = g(\theta) + O(n^{-1}), \qquad \text{(A-6)}$$

and

$$Var\left(g_X(x)\right) = \sum_{i=1}^{k} g'_i(\theta)^2\, Var(X_i) + \sum_{i=1}^{k}\sum_{\substack{j=1\\ j\neq i}}^{k} g'_i(\theta)\, g'_j(\theta)\, Cov(X_i, X_j) + O(n^{-1}). \qquad \text{(A-7)}$$
The Taylor expansion of Eq. A-1 is given by

$$x_{dis} = P_1 E(x_r) + P_2 E(x_s) + P_3 E(x_t) + P_1[x_r - E(x_r)] + P_2[x_s - E(x_s)] + P_3[x_t - E(x_t)]. \qquad \text{(A-8)}$$

Because Eq. A-1 is linear, this expansion is exact. Consequently, the expected value and variance of Eq. A-8 are, respectively,

$$E(x_{dis}) = P_1 E(x_r) + P_2 E(x_s) + P_3 E(x_t), \qquad \text{(A-9)}$$

and

$$Var(x_{dis}) = P_1^2 Var(x_r) + P_2^2 Var(x_s) + P_3^2 Var(x_t) + 2P_1P_2\, Cov(x_r, x_s) + 2P_1P_3\, Cov(x_r, x_t) + 2P_2P_3\, Cov(x_s, x_t). \qquad \text{(A-10)}$$

Substituting Eq. A-2 into Eq. A-9 yields

$$E(x_{dis}) = P_1 e^{\mu + \sigma w_r} + P_2 e^{\mu + \sigma w_s} + P_3 e^{\mu + \sigma w_t}, \qquad \text{(A-11)}$$

and applying Eqs. A-3 and A-4 to Eq. A-10 gives

$$\begin{aligned} Var(x_{dis}) = \frac{2\pi\sigma^2}{n} e^{2\mu} \Big\{ & P_1^2\, r(1-r)\, e^{2\sigma w_r + w_r^2} + P_2^2\, s(1-s)\, e^{2\sigma w_s + w_s^2} + P_3^2\, t(1-t)\, e^{2\sigma w_t + w_t^2} \\ & + 2P_1P_2\, r(1-s)\, e^{\sigma(w_r + w_s) + \frac{1}{2}(w_r^2 + w_s^2)} + 2P_1P_3\, r(1-t)\, e^{\sigma(w_r + w_t) + \frac{1}{2}(w_r^2 + w_t^2)} \\ & + 2P_2P_3\, s(1-t)\, e^{\sigma(w_s + w_t) + \frac{1}{2}(w_s^2 + w_t^2)} \Big\}. \qquad \text{(A-12)} \end{aligned}$$
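Eqs. A-11 and A-12 amount to the quadratic form PᵀΣP, with the covariance matrix Σ built from Eqs. A-3 and A-4. The sketch below assumes percentile levels are passed in percent and sorted in increasing order. The function names are illustrative.

```python
import numpy as np
from statistics import NormalDist


def percentile_moments(levels, mu, sigma, n):
    """Mean vector (Eq. A-2) and covariance matrix (Eqs. A-3, A-4) of log-normal sample
    percentiles. `levels` are percentile levels in percent, sorted in increasing order."""
    u = np.asarray(levels, dtype=float) / 100.0          # fractions for u(1-u) and u(1-v)
    w = np.array([NormalDist().inv_cdf(p) for p in u])   # w_u = Phi^{-1}(u/100)
    mean = np.exp(mu + sigma * w)
    k = len(u)
    cov = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            a, b = min(i, j), max(i, j)                   # a indexes the lower percentile
            cov[i, j] = (2 * np.pi * sigma**2 / n) * u[a] * (1 - u[b]) * np.exp(
                2 * mu + sigma * (w[i] + w[j]) + (w[i] ** 2 + w[j] ** 2) / 2)
    return mean, cov


def discretization_moments(weights, levels, mu, sigma, n):
    """E(x_dis) and Var(x_dis) for x_dis = sum_i P_i x_(u_i), i.e. Eqs. A-11 and A-12."""
    p = np.asarray(weights, dtype=float)
    mean, cov = percentile_moments(levels, mu, sigma, n)
    return float(p @ mean), float(p @ cov @ p)


# Swanson's rule for a log-normal population with mu = 0, sigma = 1 and n = 50
e_sr, var_sr = discretization_moments([0.3, 0.4, 0.3], [10, 50, 90], mu=0.0, sigma=1.0, n=50)
print(e_sr / np.exp(0.5), np.sqrt(var_sr))   # E(x_SR)/E(X) and SE of x_SR
```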
Appendix B : Moments of the Maximum Likelihood Estimator
MLE approximates the parameters of a population by maximizing the likelihood function. For any data set $(x_1, \dots, x_n)$ taken from a log-normal population with log-mean $\mu$ and log-variance $\sigma^2$, MLE estimates the mean value as

$$x_{MLE} = \exp\left(m_y + s_y^2/2\right),$$

where $y_i = \ln(x_i)$,

$$m_y = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[\ln(x_i) - m_y\right]^2.$$
The sample mean and sample variance are independent RVs when random samples are drawn from a normal distribution. Here $m_y$ and $s_y^2$ are the sample mean and sample variance of data drawn from the normal distribution $Y = \ln(X) \sim N(\mu, \sigma^2)$. They are therefore independent, and the covariance of $m_y$ and $s_y^2$ is zero.
The expected value and variance of $x_{MLE}$ are derived analytically using a property of expectation: if two RVs $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$. As just stated, $m_y$ and $s_y^2$ are independent. Thus

$$E(x_{MLE}) = E\left(e^{m_y + s_y^2/2}\right) = E\left(e^{m_y}\right) E\left(e^{s_y^2/2}\right). \qquad \text{(B-1)}$$

Because $Y$ is normal, $m_y$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$. This holds exactly, not only asymptotically as the CLT would give. By the properties of the log-normal distribution, $e^{m_y}$ therefore has mean $e^{\mu + \sigma^2/(2n)}$ and variance $e^{2\mu + \sigma^2/n}\left(e^{\sigma^2/n} - 1\right)$.
The expectation of $e^{a s_y^2}$, where $a$ is a constant coefficient, is given by

$$E\left(e^{a s_y^2}\right) = \left(1 - \frac{2a\sigma^2}{n-1}\right)^{-\frac{n-1}{2}} \qquad \text{(B-2)}$$

(Finney 1941). This result follows from the moment-generating function of $(n-1)s_y^2/\sigma^2 \sim \chi^2_{n-1}$ and holds for $2a\sigma^2 < n-1$. Thus $E\left(e^{s_y^2/2}\right) = \left(1 - \sigma^2/(n-1)\right)^{-(n-1)/2}$, and consequently the expected value of $x_{MLE}$ is given by

$$E(x_{MLE}) = e^{\mu + \frac{\sigma^2}{2}}\, e^{-\frac{(n-1)\sigma^2}{2n}} \left(1 - \frac{\sigma^2}{n-1}\right)^{-\frac{n-1}{2}}. \qquad \text{(B-3)}$$

Eq. B-3 reveals that MLE is asymptotically unbiased. As $n$ becomes sufficiently large, the two factors of $e^{-\frac{(n-1)\sigma^2}{2n}}\left(1 - \frac{\sigma^2}{n-1}\right)^{-\frac{n-1}{2}}$ tend to $e^{-\sigma^2/2}$ and $e^{\sigma^2/2}$, so the term approaches one. Thus $E(x_{MLE}) \to E(X)$.
The variance of $e^{m_y + s_y^2/2}$ can be written as

$$Var\left(e^{m_y + s_y^2/2}\right) = E\left(e^{2m_y + s_y^2}\right) - \left[E\left(e^{m_y + s_y^2/2}\right)\right]^2 = E\left(e^{2m_y}\right) E\left(e^{s_y^2}\right) - \left[E\left(e^{m_y + s_y^2/2}\right)\right]^2. \qquad \text{(B-4)}$$

From the properties of the log-normal distribution, $E\left(e^{b m_y}\right) = e^{b\mu + b^2\sigma^2/(2n)}$, where $b$ is a constant coefficient. From Eq. B-2, $E\left(e^{s_y^2}\right) = \left(1 - 2\sigma^2/(n-1)\right)^{-(n-1)/2}$. Eq. B-4 therefore simplifies, and the variance of MLE is given by

$$Var(x_{MLE}) = e^{2\mu + \frac{\sigma^2}{n}} \left\{ e^{\frac{\sigma^2}{n}} \left(1 - \frac{2\sigma^2}{n-1}\right)^{-\frac{n-1}{2}} - \left(1 - \frac{\sigma^2}{n-1}\right)^{-(n-1)} \right\}. \qquad \text{(B-5)}$$
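Eqs. B-3 and B-5 are exact: they rest only on the independence of m_y and s_y² and on Eq. B-2. They can therefore be checked against a direct simulation. The sketch uses illustrative parameters.

```python
import numpy as np


def mle_moments(mu, sigma, n):
    """E(x_MLE) and Var(x_MLE) for log-normal samples, Eqs. B-3 and B-5.
    The variance exists only if 2*sigma**2 < n - 1."""
    s2, m = sigma**2, n - 1
    # exp(mu + s2/(2n)) equals exp(mu + s2/2) * exp(-(n-1) s2 / (2n)) in Eq. B-3
    mean = np.exp(mu + s2 / (2 * n)) * (1 - s2 / m) ** (-m / 2)
    var = np.exp(2 * mu + s2 / n) * (
        np.exp(s2 / n) * (1 - 2 * s2 / m) ** (-m / 2) - (1 - s2 / m) ** (-m))
    return mean, var


def mle_monte_carlo(mu, sigma, n, reps=100_000, seed=0):
    """Simulated mean and variance of x_MLE = exp(m_y + s_y^2 / 2)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, sigma, size=(reps, n))          # y = ln(x)
    x_mle = np.exp(y.mean(axis=1) + y.var(axis=1, ddof=1) / 2)
    return x_mle.mean(), x_mle.var(ddof=1)


print(mle_moments(0.0, 0.5, 30))
print(mle_monte_carlo(0.0, 0.5, 30))
```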
Appendix C : Conditions for a Bimodal Distribution
Investigators such as Eisenberger (1964), Robertson and Fryer (1969), Behboodian
(1970), and Schilling et al. (2002) have analyzed the bimodality of a combination of two
normal distributions. They proposed an interval for the difference between the means of
two distributions such that the combination of two normal distributions yields a bimodal
distribution. They mentioned that when the difference between the means lies somewhere
outside of this interval, their combination results in a unimodal distribution.
This appendix determines the conditions under which a combination of two log-normal distributions yields a bimodal distribution.
Suppose that the PDF $h_X(x)$ has two modes and can be split into two log-normal distributions. The two distributions have log-means $\mu_1$ and $\mu_2$ and log-standard deviations $\sigma_1$ and $\sigma_2$, with $e^{\mu_1 - \sigma_1^2} < e^{\mu_2 - \sigma_2^2}$, i.e., the mode of the first distribution is smaller than the mode of the second one. Hence, the first derivative of the PDF, $h'_X(x)$, should have three real roots.

When $x < e^{\mu_1 - \sigma_1^2}$, $h'_X(x) > 0$ because both $h'_1$ and $h'_2$ are positive. For $x > e^{\mu_2 - \sigma_2^2}$, $h'_X(x) < 0$ because $h'_1$ and $h'_2$ are both negative. The roots of $h'_X(x) = 0$ can therefore lie only in the interval $e^{\mu_1 - \sigma_1^2} < x < e^{\mu_2 - \sigma_2^2}$. There must exist an $x_0$ in this interval such that $h'_X(x_0) = 0$ and $h''_X(x_0) > 0$, i.e., $h_X(x)$ is concave up between the two modes.
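The first and second derivatives of $h_X(x)$ are, respectively,

$$h'_X(x) = -\alpha \frac{\sigma_1^2 + \ln(x) - \mu_1}{x^2 \sigma_1^3 \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{\ln(x) - \mu_1}{\sigma_1}\right)^2} - (1-\alpha) \frac{\sigma_2^2 + \ln(x) - \mu_2}{x^2 \sigma_2^3 \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{\ln(x) - \mu_2}{\sigma_2}\right)^2}, \qquad \text{(C-1)}$$

and

$$\begin{aligned} h''_X(x) = {} & \frac{\alpha}{x^3 \sigma_1^3 \sqrt{2\pi}} \left\{ \frac{\left[\ln(x) - \mu_1 + \sigma_1^2\right]^2}{\sigma_1^2} + \left[\ln(x) - \mu_1 + \sigma_1^2\right] - 1 \right\} e^{-\frac{1}{2}\left(\frac{\ln(x) - \mu_1}{\sigma_1}\right)^2} \\ & + \frac{1-\alpha}{x^3 \sigma_2^3 \sqrt{2\pi}} \left\{ \frac{\left[\ln(x) - \mu_2 + \sigma_2^2\right]^2}{\sigma_2^2} + \left[\ln(x) - \mu_2 + \sigma_2^2\right] - 1 \right\} e^{-\frac{1}{2}\left(\frac{\ln(x) - \mu_2}{\sigma_2}\right)^2}. \qquad \text{(C-2)} \end{aligned}$$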
Combining the two conditions $h'_X(x) = 0$ and $h''_X(x) > 0$ yields a cubic condition in $\ln(x)$:

$$f(x) \equiv \frac{1}{\sigma_1^2}\Big\{(r-1)\ln^3(x) + \left[2A_1 + A_2 - r(A_1 + 2A_2)\right]\ln^2(x) + \left[r\left(2A_1A_2 + A_2^2\right) - \left(2A_1A_2 + A_1^2\right)\right]\ln(x) + \left[A_1^2A_2 - rA_1A_2^2 - \sigma_1^2(A_2 - A_1)\right]\Big\} > 0, \qquad \text{(C-3)}$$

where $A_1 = \mu_1 - \sigma_1^2$, $A_2 = \mu_2 - \sigma_2^2$, and $r = \sigma_1^2/\sigma_2^2$. At either end of the interval, $x = e^{\mu_1 - \sigma_1^2}$ or $x = e^{\mu_2 - \sigma_2^2}$ (i.e., $\ln(x) = A_1$ or $\ln(x) = A_2$), $f$ is negative: $f = -(A_2 - A_1) < 0$.
For $h_X(x)$ to be bimodal, the cubic $f(x)$ must have two real roots in this interval, and hence three distinct real roots. The discriminant of $f(x)$ must therefore be positive. Up to a positive factor, it is

$$D = r^2(A_2 - A_1)^4 + 2\sigma_1^2\left(2r^3 - 3r^2 - 3r + 2\right)(A_2 - A_1)^2 - 27\sigma_1^4(r-1)^2. \qquad \text{(C-4)}$$

Eq. C-4 is a quadratic equation in $(A_2 - A_1)^2$. Setting Eq. C-4 to zero and solving gives

$$\frac{(A_2 - A_1)_0}{\sigma_1} = \left\{\frac{1}{r^2}\left[-2r^3 + 3r^2 + 3r - 2 + 2\left(1 - r + r^2\right)^{3/2}\right]\right\}^{1/2}. \qquad \text{(C-5)}$$

If $(A_2 - A_1) \le (A_2 - A_1)_0$, the discriminant is negative or zero, and consequently $f(x)$ has at most one real root. Since $f(x)$ is negative at both boundaries, it is negative throughout the interval. Hence $h''_X(x)$ is negative wherever $h'_X(x) = 0$, i.e., $h_X(x)$ is concave down at any stationary point. Therefore, $h_X(x)$ is unimodal.
Now suppose that $(A_2 - A_1) > (A_2 - A_1)_0$. Then $f(x)$ has two real roots $x_1$ and $x_2$ such that $e^{\mu_1 - \sigma_1^2} < x_1 < x_2 < e^{\mu_2 - \sigma_2^2}$, since $f'\left(e^{\mu_1 - \sigma_1^2}\right) > 0$ and $f'\left(e^{\mu_2 - \sigma_2^2}\right) < 0$. $f(x)$ is positive between $x_1$ and $x_2$. It is negative when $e^{\mu_1 - \sigma_1^2} < x < x_1$ and when $x_2 < x < e^{\mu_2 - \sigma_2^2}$. This means that $h_X(x)$ cannot have more than two modes, and $h_X(x)$ is bimodal.
Eq. 4-2 shows that the relation between $\alpha$ and $x$ is one-to-one, because $dx/d\alpha < 0$ in the interval of interest, $e^{\mu_1 - \sigma_1^2} < x < e^{\mu_2 - \sigma_2^2}$. Substituting $x_1$ and $x_2$ into Eq. 4-2 yields two values, $\alpha_1$ and $\alpha_2$. This gives the second necessary condition for a bimodal distribution: $\alpha$ must lie in the open interval $(\alpha_1, \alpha_2)$. The boundaries of $\alpha$ are given by

$$\alpha_j = \frac{\left[\mu_2 - \sigma_2^2 - \ln(x_j)\right] e^{-\frac{1}{2}\left(\frac{\ln(x_j) - \mu_2}{\sigma_2}\right)^2}}{r^{-1.5}\left[\sigma_1^2 + \ln(x_j) - \mu_1\right] e^{-\frac{1}{2}\left(\frac{\ln(x_j) - \mu_1}{\sigma_1}\right)^2} + \left[\mu_2 - \sigma_2^2 - \ln(x_j)\right] e^{-\frac{1}{2}\left(\frac{\ln(x_j) - \mu_2}{\sigma_2}\right)^2}}, \qquad \text{(C-6)}$$

where $x_j$, $j = 1, 2$, is the $j$th root of $f(x)$ that lies in the interval of interest.
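The threshold in Eq. C-5 and the number of modes can be checked numerically. The sketch evaluates Eq. C-5 and counts local maxima of h_X(x) on a fine grid in ln(x). The grid extent and resolution are assumptions, and a grid count is only a check, not a proof.

```python
import numpy as np


def bimodality_threshold(r):
    """(A2 - A1)_0 / sigma1 from Eq. C-5, with r = sigma1^2 / sigma2^2."""
    inner = (-2 * r**3 + 3 * r**2 + 3 * r - 2 + 2 * (1 - r + r**2) ** 1.5) / r**2
    return np.sqrt(inner)


def count_modes(alpha, mu1, s1, mu2, s2, n_grid=20001):
    """Count local maxima of h_X(x) = alpha*LN1 + (1 - alpha)*LN2 on a grid in ln(x)."""
    lo = min(mu1 - s1**2, mu2 - s2**2) - 6 * max(s1, s2)
    hi = max(mu1 - s1**2, mu2 - s2**2) + 6 * max(s1, s2)
    L = np.linspace(lo, hi, n_grid)        # L = ln(x); x = exp(L) is monotone in L

    def lognormal_pdf(mu, s):
        # 1 / (x s sqrt(2 pi)) * exp(-0.5 ((ln x - mu) / s)^2), with 1/x = exp(-L)
        return np.exp(-0.5 * ((L - mu) / s) ** 2 - L) / (s * np.sqrt(2 * np.pi))

    h = alpha * lognormal_pdf(mu1, s1) + (1 - alpha) * lognormal_pdf(mu2, s2)
    peaks = (h[1:-1] > h[:-2]) & (h[1:-1] > h[2:])
    return int(peaks.sum())


print(bimodality_threshold(1.0))   # equal log-variances (r = 1): 2.0
print(count_modes(0.5, 0.0, 0.5, 6.0, 0.5), count_modes(0.5, 0.0, 1.0, 0.1, 1.0))
```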
Appendix D : First and Second Moments of Maximum Likelihood for Bimodal Distribution
Let the RVs $X_1, \dots, X_n$ be i.i.d. and follow a bimodal distribution that can be split into two log-normal distributions as

$$h_X(x; \mu, \sigma^2, \alpha) = \alpha\, h_{X_1}(x; \mu_1, \sigma_1^2) + (1-\alpha)\, h_{X_2}(x; \mu_2, \sigma_2^2), \qquad \text{(D-1)}$$

where $h_{X_1}(x; \mu_1, \sigma_1^2)$ and $h_{X_2}(x; \mu_2, \sigma_2^2)$ are the PDFs of two log-normal distributions. These have log-means $\mu_1$ and $\mu_2$ and log-variances $\sigma_1^2$ and $\sigma_2^2$. The parameter $\alpha$, which varies from zero to one, is the proportion of the first distribution in the population.
As mentioned in Appendix B, $x_{MLE_j} = \exp\left(m_{y_j} + s_{y_j}^2/2\right)$ for $j = 1, 2$, where:

- $y_{i_j} = \ln\left(x_{i_j}\right)$;
- $m_{y_j} = \frac{1}{n}\sum_{i=1}^{n} y_{i_j}$;
- $s_{y_j}^2 = \sum_{i=1}^{n}\left[\ln\left(x_{i_j}\right) - m_{y_j}\right]^2/(n-1)$.

Then

$$x_{MLE} = \alpha\, x_{MLE_1} + (1-\alpha)\, x_{MLE_2}. \qquad \text{(D-2)}$$

Therefore, using the properties of the expected value and variance of the sum of two independent random variables, the first and second moments of MLE for a bimodal distribution are

$$E(x_{MLE}) = \alpha\, E\left(x_{MLE_1}\right) + (1-\alpha)\, E\left(x_{MLE_2}\right), \qquad \text{(D-3)}$$

and

$$Var(x_{MLE}) = \alpha^2\, Var\left(x_{MLE_1}\right) + (1-\alpha)^2\, Var\left(x_{MLE_2}\right). \qquad \text{(D-4)}$$
Applying Eqs. B-3 and B-5 to Eqs. D-3 and D-4, respectively, yields the expected value and variance of MLE as

$$E(x_{MLE}) = \alpha\, e^{\mu_1 + \frac{\sigma_1^2}{2}} e^{-\frac{(n-1)\sigma_1^2}{2n}} \left(1 - \frac{\sigma_1^2}{n-1}\right)^{-\frac{n-1}{2}} + (1-\alpha)\, e^{\mu_2 + \frac{\sigma_2^2}{2}} e^{-\frac{(n-1)\sigma_2^2}{2n}} \left(1 - \frac{\sigma_2^2}{n-1}\right)^{-\frac{n-1}{2}}, \qquad \text{(D-5)}$$

and

$$\begin{aligned} Var(x_{MLE}) = {} & \alpha^2 e^{2\mu_1 + \frac{\sigma_1^2}{n}} \left[ e^{\frac{\sigma_1^2}{n}} \left(1 - \frac{2\sigma_1^2}{n-1}\right)^{-\frac{n-1}{2}} - \left(1 - \frac{\sigma_1^2}{n-1}\right)^{-(n-1)} \right] \\ & + (1-\alpha)^2 e^{2\mu_2 + \frac{\sigma_2^2}{n}} \left[ e^{\frac{\sigma_2^2}{n}} \left(1 - \frac{2\sigma_2^2}{n-1}\right)^{-\frac{n-1}{2}} - \left(1 - \frac{\sigma_2^2}{n-1}\right)^{-(n-1)} \right]. \qquad \text{(D-6)} \end{aligned}$$
Appendix E : First and Second Moments of a Power Normal Distribution
Among the different approaches used to derive the statistical properties of a power-normal distribution, Freeman and Modarres's (2006) approach is used in this study. Let $X$ be power-normally distributed with transformed mean $\mu$, transformed variance $\sigma^2$, and exponent $\lambda$. The $r$th moment of the power-normal distribution is given by

$$E(X^r) = \int_0^{\infty} x^r\, \frac{x^{\lambda - 1}}{\Phi[\mathrm{sign}(\lambda) K]\, \sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2} dx, \qquad \text{(E-1)}$$

where $y = (x^{\lambda} - 1)/\lambda$. The RV $X$ is obtained by the inverse transformation $X = (1 + \lambda Y)^{1/\lambda}$ for $\lambda \neq 0$ and $X = \exp(Y)$ (i.e., $Y = \ln X$) for $\lambda = 0$. Eq. E-1 is rearranged as

$$E(X^r) = \begin{cases} \displaystyle\int_{-1/\lambda}^{\infty} \frac{(1 + \lambda y)^{r/\lambda}}{\Phi[\mathrm{sign}(\lambda) K]\, \sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2} dy, & \lambda > 0, \\[2ex] \displaystyle\int_{-\infty}^{-1/\lambda} \frac{(1 + \lambda y)^{r/\lambda}}{\Phi[\mathrm{sign}(\lambda) K]\, \sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2} dy, & \lambda < 0. \end{cases} \qquad \text{(E-2)}$$

To simplify Eq. E-2, the term $(1 + \lambda y)^{r/\lambda}$ is expanded in a Taylor series about the mean $\mu$:

$$T(y) = (1 + \lambda y)^{r/\lambda} = \sum_{i=0}^{\infty} \frac{1}{i!} T^{(i)}(\mu)\, (y - \mu)^i, \qquad \text{(E-3)}$$

where $T^{(i)}(y) = (1 + \lambda y)^{\frac{r}{\lambda} - i} \prod_{j=0}^{i-1} (r - j\lambda)$ is the $i$th derivative of $T(y)$ with respect to $y$.
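Therefore, Eq. E-2 becomes

$$E(X^r) = \begin{cases} \displaystyle\int_{-1/\lambda}^{\infty} \frac{\sum_{i=0}^{\infty} \frac{1}{i!} T^{(i)}(\mu)(y-\mu)^i}{\Phi[\mathrm{sign}(\lambda) K]\, \sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2} dy, & \lambda > 0, \\[2ex] \displaystyle\int_{-\infty}^{-1/\lambda} \frac{\sum_{i=0}^{\infty} \frac{1}{i!} T^{(i)}(\mu)(y-\mu)^i}{\Phi[\mathrm{sign}(\lambda) K]\, \sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2} dy, & \lambda < 0. \end{cases} \qquad \text{(E-4)}$$

A new RV, $z = (y - \mu)/\sigma$, is introduced; it follows a truncated standard normal distribution. Therefore, Eq. E-4 reduces to

$$E(X^r) = \frac{1}{\Phi[\mathrm{sign}(\lambda) K]} \begin{cases} \displaystyle\sum_{i=0}^{\infty} \left[\frac{1}{i!} T^{(i)}(\mu)\, \sigma^i\right] \int_{-K}^{\infty} \frac{1}{\sqrt{2\pi}} z^i e^{-\frac{1}{2}z^2} dz, & \lambda > 0, \\[2ex] \displaystyle\sum_{i=0}^{\infty} \left[\frac{1}{i!} T^{(i)}(\mu)\, \sigma^i\right] \int_{-\infty}^{-K} \frac{1}{\sqrt{2\pi}} z^i e^{-\frac{1}{2}z^2} dz, & \lambda < 0. \end{cases} \qquad \text{(E-5)}$$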
The integral term in the equation above is the $i$th moment of the standard normal distribution taken over a truncated range. It is related to the Gamma function, which is defined by the integral

$$\Gamma(k) = \int_0^{\infty} t^{k-1} e^{-t}\, dt. \qquad \text{(E-6)}$$

With Eq. E-6, the integral term of Eq. E-5 is evaluated by

$$E_T\left(z^i\right) = \begin{cases} \displaystyle\int_{-K}^{\infty} \frac{1}{\sqrt{2\pi}} z^i e^{-\frac{1}{2}z^2} dz = \int_{-\infty}^{\infty} z^i \phi(z)\, dz - \int_{-\infty}^{-K} z^i \phi(z)\, dz, & \lambda > 0, \\[2ex] \displaystyle\int_{-\infty}^{-K} \frac{1}{\sqrt{2\pi}} z^i e^{-\frac{1}{2}z^2} dz = \int_{-\infty}^{\infty} z^i \phi(z)\, dz - \int_{-K}^{\infty} z^i \phi(z)\, dz, & \lambda < 0, \end{cases} \qquad \text{(E-7)}$$

where $E_T\left(z^i\right)$ is the $i$th moment of a truncated standard normal distribution and $\phi$ is the standard normal PDF. The first term in the equation, $\int_{-\infty}^{\infty} z^i \phi(z)\, dz$, is

$$E\left(z^i\right) = \int_{-\infty}^{\infty} z^i \phi(z)\, dz = \begin{cases} 0, & i \text{ odd}, \\[1ex] \dfrac{i!}{2^{i/2}\,(i/2)!}, & i \text{ even}. \end{cases} \qquad \text{(E-8)}$$
Finally, the $r$th moment of a power-normal distribution is evaluated as

$$E(X^r) = \frac{1}{\Phi[\mathrm{sign}(\lambda) K]} \begin{cases} \displaystyle\sum_{i=0,\ i\ \mathrm{even}}^{\infty} T^{(i)}(\mu) \frac{\sigma^i}{2^{i/2}(i/2)!} - \sum_{i=0}^{\infty} T^{(i)}(\mu) \frac{\sigma^i}{i!} \int_{-\infty}^{-K} z^i \phi(z)\, dz, & \lambda > 0, \\[2ex] \displaystyle\sum_{i=0,\ i\ \mathrm{even}}^{\infty} T^{(i)}(\mu) \frac{\sigma^i}{2^{i/2}(i/2)!} - \sum_{i=0}^{\infty} T^{(i)}(\mu) \frac{\sigma^i}{i!} \int_{-K}^{\infty} z^i \phi(z)\, dz, & \lambda < 0, \end{cases} \qquad \text{(E-9)}$$

where $T^{(i)}(\mu) = (1 + \lambda\mu)^{\frac{r}{\lambda} - i} \prod_{j=0}^{i-1} (r - j\lambda)$. Taking the logarithm of the data points to convert them to a normal distribution corresponds to the power-normal distribution with $\lambda = 0$, for which the $r$th moment is

$$E(X^r) = e^{r\mu + \frac{r^2\sigma^2}{2}}. \qquad \text{(E-10)}$$
If $K$ is large enough that $Y$ is effectively normal (untruncated), the first moment of the RV $X$ for a few power-normal distributions with different exponents, $\lambda > 0$, is as given in Table E-1.

The AA is unbiased, whereas SR and PT are biased. Their biases are, respectively,

$$b_{SR} = 0.3\left(1 + \lambda\sigma_Y w_{10}^* + \lambda\mu\right)^{1/\lambda} + 0.4\left(1 + \lambda\sigma_Y w_{50}^* + \lambda\mu\right)^{1/\lambda} + 0.3\left(1 + \lambda\sigma_Y w_{90}^* + \lambda\mu\right)^{1/\lambda} - E(X), \qquad \text{(E-11)}$$

and

$$b_{PT} = 0.185\left(1 + \lambda\sigma_Y w_{5}^* + \lambda\mu\right)^{1/\lambda} + 0.63\left(1 + \lambda\sigma_Y w_{50}^* + \lambda\mu\right)^{1/\lambda} + 0.185\left(1 + \lambda\sigma_Y w_{95}^* + \lambda\mu\right)^{1/\lambda} - E(X), \qquad \text{(E-12)}$$

where $E(X)$ is the expected value of $X$ given in Table E-1, and $b_{SR}$ and $b_{PT}$ are the biases of SR and PT, respectively.
Table E-1 – Expected value of power-normal distribution for different λ values.
$\lambda$ | $E(X)$
1/4 | $(\mu/4 + 1)^4 + \frac{3}{8}\sigma^2(\mu/4 + 1)^2 + 3\sigma^4/256$
1/3 | $(\mu/3 + 1)^3 + \sigma^2(\mu/3 + 1)/3$
1/2 | $(\mu/2 + 1)^2 + \sigma^2/4$
1 | $\mu + 1$
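Table E-2 – Bias of Swanson’s rule for different λ values.
$\lambda$ | $b_{SR} = E(x_{SR}) - E(X)$
1/4 | $\frac{3}{8}\left(0.6\, w_{10}^2 - 1\right)\left(\frac{\mu}{4} + 1\right)^2 \sigma^2 + \frac{\left(0.6\, w_{10}^4 - 3\right)\sigma^4}{256}$
1/3 | $\left(0.6\, w_{10}^2 - 1\right)\left(\frac{\mu}{3} + 1\right)\frac{\sigma^2}{3}$
1/2 | $\frac{0.6\, w_{10}^2 - 1}{4}\,\sigma^2$
1 | $0$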
Eqs. E-11 and E-12 are general expressions for the biases of SR and PT, showing that the biases are functions of $\lambda$, $\sigma$, and $\mu$. The biases $b_{SR}$ and $b_{PT}$ for power-normal distributions with four different exponents are given in Tables E-2 and E-3, respectively. As these two tables show, SR and PT are unbiased for a normal distribution ($\lambda = 1$), and the bias increases as $\lambda$ tends to zero.
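Table E-3 – Bias of Pearson-Tukey for different λ values.
$$\begin{array}{c|l}
\lambda & b_{PT} = E(x_{PT}) - E(X) \\ \hline
1/4 & 3\left(0.046\,w_{5}^{2} - \frac{1}{8}\right)\left(\frac{\mu}{4}+1\right)^{2}\sigma^{2} + \left(0.37\,w_{5}^{4} - 3\right)\frac{\sigma^{4}}{256} \\
1/3 & \left(0.37\,w_{5}^{2} - 1\right)\left(\frac{\mu}{3}+1\right)\frac{\sigma^{2}}{3} \\
1/2 & \left(0.37\,w_{5}^{2} - 1\right)\frac{\sigma^{2}}{4} \\
1 & 0
\end{array}$$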
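The general bias expressions can also be evaluated directly from the percentiles and compared with the closed forms of Tables E-2 and E-3. This Python sketch makes the following assumptions:

- $w_u$ (written $w_u^{*}$ in Eqs. E-11 and E-12) is the standard normal quantile $\Phi^{-1}(u/100)$, as in Appendix G.
- The sign convention is $b = E(x_{est}) - E(X)$, as in the table headers.
- For λ = 1/4 the code uses the exact factor $\frac{3}{8}(c\,w^2 - 1)$. Here $3(0.075\,w_{10}^2 - \frac{1}{8})$ is the SR case, and $3(0.046\,w_5^2 - \frac{1}{8})$ is the rounded PT case.

The helpers `x_pct`, `bias`, and `table_bias` are hypothetical names. μ = 2 and σ = 1 are illustrative values.

```python
from statistics import NormalDist

def w(u):
    # Assumption: w_u (w_u^* in Eqs. E-11/E-12) is the standard normal quantile Phi^-1(u/100)
    return NormalDist().inv_cdf(u / 100)

def x_pct(k, mu, sigma, u):
    # u-th percentile of X = (1 + Y/k)**k with Y ~ N(mu, sigma^2); lambda = 1/k
    return (1 + (mu + sigma * w(u)) / k) ** k

def mean_x(k, mu, sigma):
    # E(X) from Table E-1
    a = mu / k + 1
    return {4: a**4 + 3 * sigma**2 * a**2 / 8 + 3 * sigma**4 / 256,
            3: a**3 + sigma**2 * a / 3,
            2: a**2 + sigma**2 / 4,
            1: a}[k]

def bias(k, mu, sigma, weights):
    # Eqs. E-11/E-12 with the table sign convention: E(x_est) - E(X)
    est = sum(p * x_pct(k, mu, sigma, u) for u, p in weights.items())
    return est - mean_x(k, mu, sigma)

def table_bias(k, mu, sigma, c, wq):
    # Closed forms of Table E-2 (c = 0.6, wq = w_10) and Table E-3 (c = 0.37, wq = w_5)
    a = mu / k + 1
    if k == 4:
        return 3 / 8 * (c * wq**2 - 1) * a**2 * sigma**2 + (c * wq**4 - 3) * sigma**4 / 256
    if k == 3:
        return (c * wq**2 - 1) * a * sigma**2 / 3
    if k == 2:
        return (c * wq**2 - 1) * sigma**2 / 4
    return 0.0

SR = {10: 0.3, 50: 0.4, 90: 0.3}
PT = {5: 0.185, 50: 0.63, 95: 0.185}
mu, sigma = 2.0, 1.0
for k in (4, 3, 2, 1):
    print(f"lambda = 1/{k}: b_SR {bias(k, mu, sigma, SR):+.5f} "
          f"(table {table_bias(k, mu, sigma, 0.6, w(10)):+.5f}), "
          f"b_PT {bias(k, mu, sigma, PT):+.5f} "
          f"(table {table_bias(k, mu, sigma, 0.37, w(5)):+.5f})")
```

With μ = 2 and σ = 1, the magnitude of the SR bias in this example grows as λ decreases from 1/2 to 1/4, which agrees with the remark above.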
Appendix F : Parameters of the First Order AutoRegressive Model
Let $\{Y_z\}$ follow the first-order autoregressive model, AR(1), given by
$$Y_z = C + \rho_1 Y_{z-1} + \varepsilon_z, \qquad \text{(F-1)}$$
where $C$ is a constant; $\varepsilon_z$ is a normally distributed RV with mean $\mu_\varepsilon$ and variance $\sigma_\varepsilon^{2}$; and $z$ is the location where $Y$ is measured. It is assumed that $\{Y_z\}$ is stationary, which means that all moments of $Y_z$ are constant and independent of location $z$ (e.g., $E(Y_z) = \mu$ and $Var(Y_z) = \sigma^{2}$ for all $z$).
The constant $C$ is derived as follows. Multiplying both sides of Eq. F-1 by $Y_{z-1}$ and taking expectations yields
$$E\left(Y_zY_{z-1}\right) = C\,E\left(Y_{z-1}\right) + E\left(\rho_1Y_{z-1}^{2}\right) + E\left(\varepsilon_zY_{z-1}\right). \qquad \text{(F-2)}$$
From the covariance between $Y_z$ and $Y_{z-1}$, $E\left(Y_zY_{z-1}\right) = \rho_1\sigma^{2} + \mu^{2}$ and $E\left(\rho_1Y_{z-1}^{2}\right) = \rho_1\left(\sigma^{2} + \mu^{2}\right)$. Since $Y_{z-1}$ is a linear function of $\varepsilon_{z-1}, \varepsilon_{z-2}, \varepsilon_{z-3}, \ldots$, it follows that $E\left(\varepsilon_zY_{z-1}\right) = 0$. Substituting these results into Eq. F-2 gives $(1 - \rho_1)\mu^{2} = C\mu$, so
$$C = (1 - \rho_1)\mu. \qquad \text{(F-3)}$$
To obtain $\mu_\varepsilon$, take expectations of both sides of Eq. F-1:
$$E(Y_z) = C + \rho_1E\left(Y_{z-1}\right) + E(\varepsilon_z), \qquad \text{(F-4)}$$
then substitute Eq. F-3 into this equation. Consequently, $E(\varepsilon_z) = \mu_\varepsilon = 0$. $Var(\varepsilon_z)$ is derived by taking the variance of both sides of Eq. F-1, which gives
$$Var(\varepsilon_z) = \sigma_\varepsilon^{2} = \left(1 - \rho_1^{2}\right)\sigma^{2}. \qquad \text{(F-5)}$$
To derive the correlation coefficient function, multiply both sides of Eq. F-1 by $Y_{z-\tau}$ and take expectations, which gives
$$E\left(Y_zY_{z-\tau}\right) = E\left(CY_{z-\tau}\right) + \rho_1E\left(Y_{z-1}Y_{z-\tau}\right) + E\left(\varepsilon_zY_{z-\tau}\right). \qquad \text{(F-6)}$$
From the definition of covariance, $E\left(Y_zY_{z-\tau}\right) = \rho_\tau\sigma^{2} + \mu^{2}$ and $E\left(Y_{z-1}Y_{z-\tau}\right) = \rho_{\tau-1}\sigma^{2} + \mu^{2}$. As mentioned before, $Y_{z-\tau}$ is uncorrelated with $\varepsilon_z$ for $\tau \geq 1$, so $E\left(\varepsilon_zY_{z-\tau}\right) = 0$. Substituting these results and Eq. F-3 into Eq. F-6 gives $\rho_\tau = \rho_1\rho_{\tau-1}$. Hence $\rho_\tau = \rho_1^{2}\rho_{\tau-2} = \rho_1^{3}\rho_{\tau-3} = \cdots = \rho_1^{\tau}\rho_0 = \rho_1^{\tau}$, since $\rho_0 = 1$. Because $\rho_\tau$ is an even function of $\tau$ when $Y_z$ is real-valued, the correlation coefficients can be written as (Priestley 1981)
$$\rho_\tau = \rho_1^{|\tau|}, \qquad \tau = 0, \pm1, \pm2, \ldots \qquad \text{(F-7)}$$
Although the AR(1) model considers only first-step dependency, Eq. F-7 implies that the correlation coefficients, $\rho_\tau$, do not become zero after $\tau = 1$; instead, they decay toward zero (for $|\rho_1| < 1$). The reason is that $Y_z$ is related to $Y_{z-1}$, and $Y_{z-1}$ is related to $Y_{z-2}$. Consequently, $Y_z$ is related to $Y_{z-2}$, and so on.
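The AR(1) results of Eqs. F-3, F-5, and F-7 can be checked by simulation. The Python sketch below generates a stationary series with $C = (1 - \rho_1)\mu$ and $Var(\varepsilon) = (1 - \rho_1^2)\sigma^2$. It then compares the sample mean, variance, and lag-τ correlations with μ, σ², and $\rho_1^{\tau}$. The parameters (μ = 3, σ = 2, ρ1 = 0.6), the seed, and the series length are illustrative assumptions, and the function names are hypothetical.

```python
import random
import statistics

def simulate_ar1(n, mu, sigma, rho1, seed=0):
    """Stationary AR(1) series from Eq. F-1, with C from Eq. F-3 and Var(eps) from Eq. F-5."""
    rng = random.Random(seed)
    c = (1 - rho1) * mu
    sigma_eps = sigma * (1 - rho1**2) ** 0.5
    y = rng.gauss(mu, sigma)                            # start in the stationary distribution
    values = []
    for _ in range(n):
        values.append(y)
        y = c + rho1 * y + rng.gauss(0.0, sigma_eps)    # mu_eps = 0
    return values

def sample_autocorr(values, lag):
    """Lag-tau sample correlation coefficient."""
    m = statistics.fmean(values)
    d = [v - m for v in values]
    return sum(d[i] * d[i + lag] for i in range(len(d) - lag)) / sum(v * v for v in d)

mu, sigma, rho1 = 3.0, 2.0, 0.6
ys = simulate_ar1(100_000, mu, sigma, rho1)
m = statistics.fmean(ys)
var = statistics.fmean((v - m) ** 2 for v in ys)
print(f"mean {m:.3f} (mu = {mu}), variance {var:.3f} (sigma^2 = {sigma**2})")
for tau in range(1, 6):
    print(tau, round(sample_autocorr(ys, tau), 3), round(rho1**tau, 3))
```

The printed lag correlations shrink geometrically rather than dropping to zero after τ = 1.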
Appendix G : Moments of Discretization Methods for
the Case of Dependent Random Variables
The expected values and variances of $x_{SR}$ and $x_{PT}$ are functions of the statistical properties of the 5th, 10th, 50th, 90th, and 95th percentiles. Hence, the expected values and variances of these percentiles must first be derived analytically.
Suppose that $y = \ln(x) \sim N\left(m_y, s_y^{2}\right)$, so that $x_u = e^{m_y + w_u s_y}$, where $x_u$ is the uth percentile. According to the properties of the log-normal distribution, the statistical properties of the percentile can be written as
$$E(x_u) = E\left[e^{m_y + w_u s_y}\right], \qquad \text{(G-1)}$$
and
$$Var(x_u) = E\left[e^{2m_y + 2w_u s_y}\right] - \left\{E\left[e^{m_y + w_u s_y}\right]\right\}^{2}, \qquad \text{(G-2)}$$
where $w_u = \Phi^{-1}(u/100)$ and $\Phi$ denotes the cumulative distribution function of the standard normal distribution. The analytical expressions for $E(x_u)$ and $Var(x_u)$ are derived from a property of expectation: if two RVs $V_1$ and $V_2$ are independent, then $E(V_1V_2) = E(V_1)E(V_2)$. As mentioned in Appendix B, the sample mean, $m_y$, and sample variance, $s_y^{2}$, are independent when the samples $y_1, \ldots, y_n$ are identically distributed and follow a normal distribution. Thus
$$E(x_u) = E\left[e^{m_y}e^{w_u s_y}\right] = E\left(e^{m_y}\right)E\left(e^{w_u s_y}\right), \qquad \text{(G-3)}$$
and
$$Var(x_u) = E\left(e^{2m_y}\right)E\left(e^{2w_u s_y}\right) - \left[E(x_u)\right]^{2}. \qquad \text{(G-4)}$$
Based on the CLT, $m_y$ is normally distributed with mean $\mu$ and variance $\sigma^{2}/n'$. Here $n' = n\big/\left[1 + 2\sum_{\tau=1}^{n-1}\left(1 - \tau/n\right)\rho_\tau\right]$, and $\rho_\tau$ is the correlation coefficient between pairs of $Y$'s separated by $\tau$. Therefore, according to the properties of the log-normal distribution, $e^{m_y}$ has the mean
$$E\left(e^{m_y}\right) = e^{\left(\mu + \frac{\sigma^{2}}{2n'}\right)}, \qquad \text{(G-5)}$$
and variance
$$Var\left(e^{m_y}\right) = e^{\left(2\mu + \frac{\sigma^{2}}{n'}\right)}\left(e^{\frac{\sigma^{2}}{n'}} - 1\right). \qquad \text{(G-6)}$$
Moreover, based on the properties of the log-normal distribution, $E\left(e^{bm_y}\right) = e^{\left[b\mu + b^{2}\sigma^{2}/(2n')\right]}$, where $b$ is a constant coefficient.
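To illustrate Eq. G-5 for dependent data, the sketch below computes $n'$ for the AR(1) correlation function $\rho_\tau = \rho_1^{\tau}$ (Eq. F-7). It then compares $E(e^{m_y})$ with a Monte Carlo average over simulated stationary Gaussian AR(1) samples. For such samples $m_y$ is exactly normal with variance $\sigma^2/n'$. The values n = 20, ρ1 = 0.5, μ = 0, σ = 1, and the seed are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random

def n_prime(n, rho):
    """n' = n / [1 + 2 * sum_{tau=1}^{n-1} (1 - tau/n) * rho(tau)]."""
    s = sum((1 - tau / n) * rho(tau) for tau in range(1, n))
    return n / (1 + 2 * s)

def exp_mean_moments(mu, sigma, n_eff):
    """Mean and variance of e^{m_y}: Eqs. G-5 and G-6."""
    mean = math.exp(mu + sigma**2 / (2 * n_eff))
    var = math.exp(2 * mu + sigma**2 / n_eff) * (math.exp(sigma**2 / n_eff) - 1)
    return mean, var

def mc_mean_of_exp_mean(n, mu, sigma, rho1, reps=20_000, seed=1):
    """Monte Carlo estimate of E(e^{m_y}) for stationary Gaussian AR(1) samples of size n."""
    rng = random.Random(seed)
    c, s_eps = (1 - rho1) * mu, sigma * math.sqrt(1 - rho1**2)
    total = 0.0
    for _ in range(reps):
        y, acc = rng.gauss(mu, sigma), 0.0
        for _ in range(n):
            acc += y
            y = c + rho1 * y + rng.gauss(0.0, s_eps)
        total += math.exp(acc / n)
    return total / reps

n, mu, sigma, rho1 = 20, 0.0, 1.0, 0.5
n_p = n_prime(n, lambda tau: rho1**tau)      # AR(1) correlation, Eq. F-7
print(n_p, exp_mean_moments(mu, sigma, n_p)[0], mc_mean_of_exp_mean(n, mu, sigma, rho1))
```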
The expectation $E\left[e^{w_u s_y}\right]$ is derived in three steps. First, $e^{w_u s_y}$ is expanded in a Taylor series about the expected value of the sample standard deviation, $E(s_y)$. Second, the series is truncated after the fourth-order term. Third, expectations are taken:
$$E\left[e^{w_u s_y}\right] = e^{w_u E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_u^{i}}{i!}T_i\right], \qquad \text{(G-7)}$$
where $T_i = E\left\{\left[s_y - E(s_y)\right]^{i}\right\}$. The $i = 1$ term vanishes because $T_1 = 0$.
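The accuracy of the fourth-order truncation in Eq. G-7 can be gauged against a direct numerical expectation. The sketch below follows the assumption behind Finney's (1941) result in Eq. G-9: $k s_y^2/\sigma^2$ follows a chi-square distribution with k degrees of freedom (k = n − 1 for independent samples). It proceeds as follows:

- $E(s_y)$ and $T_2$ to $T_4$ are obtained from the raw moments $E(s_y^{j}) = E(s_y^{2p})$ with p = j/2.
- The truncated series is compared with Simpson integration over the chi-square density.

k = 20 and σ = 1 are illustrative values. The function names are hypothetical.

```python
import math

def s_raw_moment(j, sigma, k):
    """E(s^j) for s^2 = sigma^2 * chi2_k / k (Finney's result with p = j/2 and k = n - 1)."""
    p = j / 2
    return math.exp(math.lgamma(k / 2 + p) - math.lgamma(k / 2)) * (2 * sigma**2 / k) ** p

def taylor_mgf(w, sigma, k):
    """Eq. G-7: e^{w E(s)} [1 + sum_{i=2}^{4} w^i T_i / i!], T_i from raw moments."""
    m1, m2, m3, m4 = (s_raw_moment(j, sigma, k) for j in range(1, 5))
    T = {2: m2 - m1**2,
         3: m3 - 3 * m2 * m1 + 2 * m1**3,
         4: m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4}
    return math.exp(w * m1) * (1 + sum(w**i * T[i] / math.factorial(i) for i in T))

def exact_mgf(w, sigma, k, n=20_000):
    """E[e^{w s}] by Simpson's rule over the chi-square(k) density of q = k s^2 / sigma^2."""
    qmax = k + 40 * math.sqrt(2 * k)
    h = qmax / n
    log_norm = -(k / 2) * math.log(2) - math.lgamma(k / 2)
    total = 0.0
    for i in range(1, n + 1):                 # the q = 0 term is zero when k > 2
        q = i * h
        dens = math.exp(log_norm + (k / 2 - 1) * math.log(q) - q / 2)
        weight = 1 if i == n else (4 if i % 2 else 2)
        total += weight * math.exp(w * sigma * math.sqrt(q / k)) * dens
    return total * h / 3

for w in (-1.2816, 1.2816, 2 * 1.2816):      # roughly w_10, w_90, and 2 w_90
    print(w, taylor_mgf(w, 1.0, 20), exact_mgf(w, 1.0, 20))
```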
As mentioned before, Zięba (2010) derived an expression for the sample variance of autocorrelated samples (Eq. 6-7). Based on this expression, the variance of autocorrelated samples can alternatively be written as the product of the variance of uncorrelated samples and a correction factor
$$\beta = \frac{n}{n-1}\left[1 - \frac{1 + 2\sum_{\tau=1}^{n-1}\left(1 - \frac{\tau}{n}\right)\rho_\tau}{n}\right],$$
where $\beta$ approaches one for large $n$ (Zięba 2010). Consequently, a new ESS is defined as
$$n_{eff}^{*} = \frac{n}{\beta\left\{1 + 2\sum_{\tau=1}^{n-1}\left(1 - \frac{\tau}{n}\right)\rho_\tau\right\}}, \qquad \text{(G-8)}$$
and used instead of $n'$. The moments of $s_y$ are derived from Finney's (1941) result, which can be expressed as
$$E\left(s_y^{2p}\right) = \frac{\Gamma\left(\frac{n_{eff}^{*}-1}{2} + p\right)}{\Gamma\left(\frac{n_{eff}^{*}-1}{2}\right)}\left(\frac{2\sigma^{2}}{n_{eff}^{*}-1}\right)^{p}, \qquad \text{(G-9)}$$
where $p$ is a constant. Setting $p = 1/2, 1, 3/2,$ and $2$ gives the first four raw moments of $s_y$, from which $E(s_y)$ and $T_i$ follow. Hence, the expected value and variance of the uth percentile are, respectively,
$$E(x_u) = e^{\left[\mu + \frac{\sigma^{2}}{2n_{eff}^{*}}\right]}e^{w_uE(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_u^{i}}{i!}T_i\right], \qquad \text{(G-10)}$$
and
$$Var(x_u) = e^{\left[2\mu + \frac{2\sigma^{2}}{n_{eff}^{*}}\right]}e^{2w_uE(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(2w_u)^{i}}{i!}T_i\right] - \left[E(x_u)\right]^{2}. \qquad \text{(G-11)}$$
The covariance between the uth and vth percentiles is
$$\begin{aligned} cov(x_u, x_v) &= E(x_ux_v) - E(x_u)E(x_v) = E\left(e^{2m_y}\right)E\left[e^{(w_u+w_v)s_y}\right] - E(x_u)E(x_v) \\ &= e^{\left[2\mu + \frac{2\sigma^{2}}{n_{eff}^{*}}\right]}e^{(w_u+w_v)E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(w_u+w_v)^{i}}{i!}T_i\right] - E(x_u)E(x_v). \end{aligned} \qquad \text{(G-12)}$$
Substituting Eq. G-10 (with $T_i$ obtained from Eq. G-9) into Eq. A-9 yields $E(x_{SR})$ and $E(x_{PT})$ as follows:
$$E(x_{SR}) = e^{\left[\mu + \frac{\sigma^{2}}{2n_{eff}^{*}}\right]}\left\{0.3e^{w_{10}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{10}^{i}}{i!}T_i\right] + 0.4 + 0.3e^{w_{90}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{90}^{i}}{i!}T_i\right]\right\}, \qquad \text{(G-13)}$$
and
$$E(x_{PT}) = e^{\left[\mu + \frac{\sigma^{2}}{2n_{eff}^{*}}\right]}\left\{0.185e^{w_{5}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{5}^{i}}{i!}T_i\right] + 0.63 + 0.185e^{w_{95}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{95}^{i}}{i!}T_i\right]\right\}. \qquad \text{(G-14)}$$
The variances of SR and PT are, respectively,
$$\begin{aligned} Var(x_{SR}) = {} & e^{\left[2\mu + \frac{2\sigma^{2}}{n_{eff}^{*}}\right]}\Bigg\{0.09e^{2w_{10}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(2w_{10})^{i}}{i!}T_i\right] + 0.24e^{w_{10}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{10}^{i}}{i!}T_i\right] + 0.16 \\ & + 0.09e^{2w_{90}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(2w_{90})^{i}}{i!}T_i\right] + 0.24e^{w_{90}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{90}^{i}}{i!}T_i\right] + 0.18\Bigg\} \\ & - \Big\{0.09E(x_{10})^{2} + 0.16E(x_{50})^{2} + 0.09E(x_{90})^{2} + 0.24E(x_{10})E(x_{50}) + 0.24E(x_{50})E(x_{90}) + 0.18E(x_{10})E(x_{90})\Big\}, \end{aligned} \qquad \text{(G-15)}$$
and
$$\begin{aligned} Var(x_{PT}) = {} & e^{\left[2\mu + \frac{2\sigma^{2}}{n_{eff}^{*}}\right]}\Bigg\{0.034e^{2w_{5}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(2w_{5})^{i}}{i!}T_i\right] + 0.233e^{w_{5}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{5}^{i}}{i!}T_i\right] + 0.397 \\ & + 0.034e^{2w_{95}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(2w_{95})^{i}}{i!}T_i\right] + 0.233e^{w_{95}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{95}^{i}}{i!}T_i\right] + 0.068\Bigg\} \\ & - \Big\{0.034E(x_{5})^{2} + 0.397E(x_{50})^{2} + 0.034E(x_{95})^{2} + 0.233E(x_{5})E(x_{50}) + 0.233E(x_{50})E(x_{95}) + 0.068E(x_{5})E(x_{95})\Big\}. \end{aligned} \qquad \text{(G-16)}$$
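Eqs. G-8 to G-16 can be assembled into a small calculator, sketched below in Python. The helper names are hypothetical, and the AR(1) correlation $\rho_\tau = 0.5^{\tau}$ with n = 30 is an illustrative assumption. The sketch proceeds as follows:

- Compute β and $n_{eff}^{*}$.
- Obtain $E(s_y)$ and $T_i$ from Finney's moments.
- Evaluate the mean of a three-point rule as $\sum_u a_u E(x_u)$.
- Evaluate its variance as $\sum_u\sum_v a_u a_v\,cov(x_u, x_v)$.

Expanding that double sum reproduces the weight products in Eqs. G-15 and G-16:

- SR: 0.09, 0.16, 0.24, and 0.18.
- PT: 0.185² ≈ 0.034, 0.63² ≈ 0.397, 2(0.185)(0.63) ≈ 0.233, and 2(0.185)² ≈ 0.068.

```python
import math
from itertools import product
from statistics import NormalDist

def n_eff_star(n, rho):
    """beta (Zieba 2010) and n_eff^* of Eq. G-8 for a correlation function rho(tau)."""
    d = 1 + 2 * sum((1 - tau / n) * rho(tau) for tau in range(1, n))
    beta = n / (n - 1) * (1 - d / n)
    return beta, n / (beta * d)

def s_stats(sigma, n_star):
    """E(s_y) and T_i (i = 2, 3, 4) from Finney's moments, Eq. G-9 with p = j/2."""
    k = n_star - 1
    m1, m2, m3, m4 = (math.exp(math.lgamma(k / 2 + j / 2) - math.lgamma(k / 2))
                      * (2 * sigma**2 / k) ** (j / 2) for j in range(1, 5))
    T = {2: m2 - m1**2,
         3: m3 - 3 * m2 * m1 + 2 * m1**3,
         4: m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4}
    return m1, T

def mgf(t, m1, T):
    """Truncated Taylor form of E[e^{t s_y}] used in Eqs. G-7 and G-10 to G-12."""
    return math.exp(t * m1) * (1 + sum(t**i * T[i] / math.factorial(i) for i in T))

def rule_moments(weights, mu, sigma, n_star):
    """Mean and variance of sum_u a_u x_u from Eq. G-10 and the covariances of Eq. G-12."""
    w = {u: NormalDist().inv_cdf(u / 100) for u in weights}
    m1, T = s_stats(sigma, n_star)
    e1 = math.exp(mu + sigma**2 / (2 * n_star))
    e2 = math.exp(2 * mu + 2 * sigma**2 / n_star)
    mean_x = {u: e1 * mgf(w[u], m1, T) for u in weights}
    mean = sum(a * mean_x[u] for u, a in weights.items())
    var = sum(a * b * (e2 * mgf(w[u] + w[v], m1, T) - mean_x[u] * mean_x[v])
              for (u, a), (v, b) in product(weights.items(), repeat=2))
    return mean, var

SR = {10: 0.3, 50: 0.4, 90: 0.3}
PT = {5: 0.185, 50: 0.63, 95: 0.185}
beta, n_star = n_eff_star(30, lambda tau: 0.5**tau)   # AR(1) correlation, Eq. F-7
for name, rule in (("SR", SR), ("PT", PT)):
    print(name, beta, n_star, rule_moments(rule, 0.0, 1.0, n_star))
```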
Appendix H : Moments of the Maximum Likelihood
Estimator for Dependent Random Variables
The MLE approximates the parameters of a population by maximizing the likelihood function. Consider any data set $x_1, \ldots, x_n$ taken from a log-normal population with log-mean $\mu$ and log-variance $\sigma^{2}$. The MLE estimates the mean value as $x_{MLE} = \exp\left(m_y + s_y^{2}/2\right)$, where $y_i = \ln(x_i)$, $m_y = \frac{1}{n}\sum_{i=1}^{n}y_i$, and the sample variance is
$$s_y^{2} = \frac{\sum_{i=1}^{n}\left[\ln(x_i) - m_y\right]^{2}}{n-1}. \qquad \text{(H-1)}$$
The expected value and variance of $x_{MLE}$ are derived analytically from the property of expectation for the product of two independent random variables. As stated before, the sample mean, $m_y$, and sample variance, $s_y^{2}$, are independent. Thus
$$E(x_{MLE}) = E\left(e^{m_y + s_y^{2}/2}\right) = E\left(e^{m_y}\right)E\left(e^{s_y^{2}/2}\right). \qquad \text{(H-2)}$$
Based on the CLT, $m_y$ is normally distributed with mean $\mu$ and variance $\sigma^{2}/n'$. Here $n' = n\big/\left\{1 + 2\sum_{\tau=1}^{n-1}\left(1 - \tau/n\right)\rho_\tau\right\}$, and $\rho_\tau$ is the correlation coefficient between pairs of $Y$'s separated by $\tau$. Therefore, according to the properties of the log-normal distribution, $e^{m_y}$ has a mean of $e^{\left[\mu + \sigma^{2}/(2n')\right]}$ and a variance of $e^{\left[2\mu + \sigma^{2}/n'\right]}\left(e^{\sigma^{2}/n'} - 1\right)$.
Bayley and Hammersley (1946) introduced an effective sample size, $n_v^{*}$, derived from the variance of the sample variance, $Var\left(s_y^{2}\right)$. Therefore, using Finney's (1941) derivations, the expectation of $e^{as_y^{2}}$ can be written as
$$E\left(e^{as_y^{2}}\right) = \left[1 - \frac{2a\sigma^{2}}{n_v^{*}-1}\right]^{-\frac{n_v^{*}-1}{2}}, \qquad \text{(H-3)}$$
where $a$ is a constant coefficient and $2a\sigma^{2} < n_v^{*}-1$ is required for the expectation to be finite. Thus the expected value of $e^{s_y^{2}/2}$ is $E\left[e^{s_y^{2}/2}\right] = \left[1 - \frac{\sigma^{2}}{n_v^{*}-1}\right]^{-\frac{n_v^{*}-1}{2}}$, and consequently the expected value of $x_{MLE}$ is given by
$$E(x_{MLE}) = e^{\left(\mu + \frac{\sigma^{2}}{2n'}\right)}\left(1 - \frac{\sigma^{2}}{n_v^{*}-1}\right)^{-\frac{n_v^{*}-1}{2}}. \qquad \text{(H-4)}$$
The variance of $x_{MLE}$ can be written as
$$Var\left[e^{m_y + s_y^{2}/2}\right] = E\left\{e^{2m_y + s_y^{2}}\right\} - \left\{E\left[e^{m_y + s_y^{2}/2}\right]\right\}^{2}. \qquad \text{(H-5)}$$
From the properties of the log-normal distribution, $E\left(e^{bm_y}\right) = e^{\left[b\mu + b^{2}\sigma^{2}/(2n')\right]}$, where $b$ is a constant coefficient. Eq. H-5 then simplifies to
$$Var(x_{MLE}) = e^{\left(2\mu + \frac{\sigma^{2}}{n'}\right)}\left\{e^{\frac{\sigma^{2}}{n'}}\left(1 - \frac{2\sigma^{2}}{n_v^{*}-1}\right)^{-\frac{n_v^{*}-1}{2}} - \left(1 - \frac{\sigma^{2}}{n_v^{*}-1}\right)^{-(n_v^{*}-1)}\right\}. \qquad \text{(H-6)}$$
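A minimal Python sketch of the MLE of Eq. H-1 and of Eqs. H-4 and H-6 is given below. The Monte Carlo check assumes independent samples. In that case $n' = n$ (all $\rho_\tau = 0$), and $n_v^{*}$ is also taken as n, which is an assumption made only for this check. The values μ = 0, σ = 0.5, n = 10, the seed, and the function names are illustrative. As in Eq. H-1, the sample variance uses the (n − 1) divisor, which is what `statistics.variance` computes.

```python
import math
import random
import statistics

def mle_mean(xs):
    """x_MLE = exp(m_y + s_y^2 / 2) with y_i = ln(x_i) and an (n - 1) divisor (Eq. H-1)."""
    ys = [math.log(x) for x in xs]
    return math.exp(statistics.fmean(ys) + statistics.variance(ys) / 2)

def mle_moments(mu, sigma, n_p, n_v):
    """E(x_MLE) and Var(x_MLE) from Eqs. H-4 and H-6 (n_p = n', n_v = n_v^*)."""
    k = n_v - 1
    if 2 * sigma**2 >= k:
        raise ValueError("Eq. H-3 with a = 1 requires 2*sigma^2 < n_v* - 1")
    mean = math.exp(mu + sigma**2 / (2 * n_p)) * (1 - sigma**2 / k) ** (-k / 2)
    var = math.exp(2 * mu + sigma**2 / n_p) * (
        math.exp(sigma**2 / n_p) * (1 - 2 * sigma**2 / k) ** (-k / 2)
        - (1 - sigma**2 / k) ** (-k))
    return mean, var

def mc_mle(n, mu, sigma, reps=20_000, seed=7):
    """Monte Carlo mean of x_MLE for independent log-normal samples of size n."""
    rng = random.Random(seed)
    return statistics.fmean(
        mle_mean([math.exp(rng.gauss(mu, sigma)) for _ in range(n)]) for _ in range(reps))

n, mu, sigma = 10, 0.0, 0.5
print(mle_moments(mu, sigma, n, n), mc_mle(n, mu, sigma), math.exp(mu + sigma**2 / 2))
```

For small n, the expected value from Eq. H-4 exceeds the population mean $e^{\mu + \sigma^2/2}$, and the two converge as n grows.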
References
Agterberg, F.P., (1974) “Geomathematics: Mathematical Background and GeoScience Applications” Elsevier Scientific Pub. Co., Amsterdam, New York, 569 p.
Arild, Ø., Lohne, H.P., Bratvold, R., (2008) “A Monte Carlo Approach to Value of
Information” SPE IPTC-11969.
Atkinson, A.C., Pericchi, L.R., Smith, R.L., (1991) “Grouped Likelihood for the
Shifted Power Transformation” Journal of the Royal Statistical Society, Series B, 53, No.
2, 473–482.
Bayley, G.V., Hammersley, G.M., (1946) “The Effective Number of Independent
Observations in an Autocorrelated Time-Series”. J. Roy. Stat. Soc. Suppl., 8, 184-197.
Behboodian, J., (1970) “On the Modes of a Mixture of Two Normal Distributions”
Technometrics, 12, No. 1, 131-139.
Bennion, D.W., (1966) “A Stochastic Model for Predicting Variations in Reservoir
Rock Properties” SPE Journal 1187-PA, 6, No. 1, 9-16.
Bickel, J.E., Lake, L.W., Lehman, J., (2011) “Discretization, Simulation, and
Swanson’s (Inaccurate) Mean” SPE Economic and Management, 3, No. 3, 128-140.
Box, G.E.P., Cox, D.R., (1964) “An Analysis of Transformations” Journal of the Royal
Statistical Society, Series B, 26, 211-252.
Cartwright, L.G., (2007) “Applying Modern Portfolio Theory in the Upstream Oil
and Gas Sector”, Oil and Gas Financial Journal.
DeGroot, M.H., (1989) “Probability and Statistics”, Addison-Wesley Publishing,
Reading City, 2nd ed., 723 p.
Delfiner, P., (2007) “Three Statistical Pitfalls of Phi-K Transform” SPE Reservoir
Evaluation & Engineering, 10, No. 6, 609-617.
Dykstra, H., Parsons, R.L., (1950) “The Prediction of Oil Recovery by Waterflood”
Secondary Recovery of Oil in the United States, New York, American Petroleum
Institute, 2nd ed., 160-174.
Efron, B., Tibshirani, R.J., (1993) “An Introduction to the Bootstrap” Chapman and
Hall, New York, 436 p.
Eisenberger, I., (1964) “Genesis of Bimodal Distributions” Technometrics, 6, No. 4,
357-364.
Emerson, J.D., Stoto, M.A., (1982) “Exploratory Methods for Choosing Power
Transformations” Journal of the American Statistical Association, 77, No. 377, 103-108.
Finney, D.J., (1941) “On the Distribution of a Variate Whose Logarithm is Normally
Distributed” Supplement to the Journal of the Royal Statistical Society, 7, No. 2, 155-161.
Freeman, J., Modarres, R., (2006) “Inverse Box–Cox: The power-normal
distribution” Statistics & Probability Letters, 76, No. 8, 15, 764–772.
Gnanadesikan, R., (1977) “Methods for Statistical Data Analysis of Multivariate
Observations” Wiley, New York, 368 p.
Hinkley, D.V., (1975) “On Power Transformations to Symmetry” Biometrika, 62,
No. 1, 101-111.
Hurst, A., Brown, G.C., Swanson, R.I., (2000) “Swanson’s 30-40-30 Rule” AAPG
Bulletin, 84, No. 12, 1883-1891.
Jensen, J.L., (1998) “Some Statistical Properties of Power Averages for lognormal
Samples” Water Resources Research, 34, No. 9, 2415-2418.
Jensen, J.L., Hinkley, D.V., Lake, L.W., (1987) “A Statistical Study of Reservoir
Permeability: Distributions, Correlations, and Averages” SPE Formation Evaluation
14270-PA, 2, No. 4, 461-468.
Jensen, J.L., Lake, L.W., Corbett, P.W.M., Goggin, D.J., (2000) “Statistics for
Petroleum Engineers and Geoscientists” Elsevier, Amsterdam, New York, 2nd ed., 338 p.
Kaufman, G.M., (1965) “Statistical Analysis of the Size Distribution of Oil and Gas
Fields” SPE 1096-MS.
Keefer, D.L., (1994) “Certainty Equivalent for Three-Point Discrete-Distribution
Approximations”, Management Science, 40, No. 6, 760-773.
Keefer, D.L., Bodily, S.E., (1983) “Three-Point Approximations for Continuous
Random Variables” Management Science, 29, No. 5, 595-609.
Kendall, M., Stuart, A., (1977) “The Advanced Theory of Statistics”, Macmillan
Publishing Company, New York City, 2, 748 p.
Kenney, J.F., Keeping E.S., (1951) “Mathematics of Statistics” D. Van Nostrand,
Princeton Pt. 2, 2nd ed., 429 p.
Laherrère, J., Sornette, D., (1998) “Stretched Exponential Distributions in Nature and
Economy: “Fat Tails” with Characteristic Scales” The European Physical Journal B, 2,
525-539.
Lambert, M.E., (1981) “A Statistical Study of Reservoir Heterogeneity” MS Thesis,
U of Texas, Austin, TX.
Law, J., (1944) “Statistical Approach to the Interstitial Heterogeneity of Sand
Reservoirs”, Transactions of the AIME, 155, No. 1, 202-222.
Lindgren, B.W., (1968) “Statistical Theory” Macmillan Company, London, 2, 521 p.
MacCrossan, R.G., (1969) “An Analysis of Size Frequency Distribution of Oil and
Gas Reserves of Western Canada” Canadian Journal of Earth Sciences, 6, No. 2, 201-211.
Megill, R.E., (1984) “An Introduction to Risk Analysis” Pennwell Publishing
Company, Tulsa, Oklahoma, 2nd ed., 274 p.
Miller, A.C., Rice, T.R., (1983) “Discrete Approximation of Probability
Distributions” Management Science, 29, No. 3, 352-362.
Ord, K., Stuart, A., (1987) “Kendall's Advanced Theory of Statistics, Distribution
Theory”, Oxford University Press, New York City, 1, 604 p.
Pearson, E.S., Tukey, J.W., (1965) “Approximate Means and Standard Deviations
Based on Distances Between Percentage Points of Frequency Curves” Biometrika, 52,
No. 3-4, 533-546.
Pintos, S., Bohorquez, C., Queipo, N.V., (2011) “Asymptotic Dykstra–Parsons
Distribution, Estimates and Confidence Intervals” Mathematical Geosciences, 43, No. 3,
329-343.
Priestley, M.B., (1981) “Spectral analysis and time series” Academic Press, London,
New York, 890 p.
Quenouille, M.H., (1956) “Note on Bias in Estimation” Biometrika, 43, No. 3-4, 353-360.
Rice, J.A., (2007) “Mathematical Statistics and Data Analysis” the University of
California, Berkeley, 3rd ed., 603 p.
Robertson, C.A., Fryer, J.G., (1969) “Some Descriptive Properties of Normal
Mixture” Scandinavian Actuarial Journal, 1969, No. 3-4, 137-146.
Rollins, J.B., Holditch, S.A., Lee, W.J., (1992) “Characterizing Average Permeability
in Oil and Gas Formations” SPE Formation Evaluation 19793-PA, 7, No.1, 99-105.
Rose, P.R., (2001) “Risk Analysis and Management of Petroleum Exploration
Ventures” American Association of Petroleum Geologists, Tulsa, Oklahoma, 164 p.
Schilling, M.F., Watkins, A.E., Watkins, W., (2002) “Is Human Height Bimodal?”
The American Statistician, 56, No. 3, 223-229.
Seidle, J.P., O’Connor, L.S., (2003) “Production Based Probabilistic Economics for
Unconventional Gas” SPE paper 82024-MS.
Seyedghasemipour, S.J., Bhattacharyya, B.B., (1990) “The Log-hyperbolic an
Alternative to the Lognormal”, Mathematical Geology, 22, No. 5, 557-571.
Steel, R.G.D., Torrie, J.H., (1980) “Principles and Procedures of Statistics: A
Biometrical Approach” 2nd ed., New York: McGraw-Hill, 666 p.
Thiebaux, H.J., Zwiers, F.W., (1984) “The Interpretation and Estimation of Effective
Sample Size” Journal of Applied Meteorology, 23, Issue 5, 800-811.
Vanmarcke, E., (2010) “Random Fields Analysis and Synthesis” World Scientific
Publishing Company Pte. Ltd, 350 p.
Vicens, G., Schaake J.C., Jr. (1972), “Simulation Criteria for Selecting Water
Resources System Alternatives”, Report No. 154, Ralph M. Parsons Laboratory for Water
Resources and Hydrodynamics, Department of Civil Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts.
Willhite, G.P., (1986) “Waterflooding” Society of Petroleum Engineers, 326 p.
Zięba, A., (2010) “Effective number of observations and unbiased estimators of
variance for autocorrelated data - an overview”, Metrology and Measurement Systems,
17, Issue 1, 3-16.