UNIVERSITY OF CALGARY

A Comparison of Mean Estimators

by

Maryam Moghadasi

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF CHEMICAL AND PETROLEUM ENGINEERING

CALGARY, ALBERTA

January, 2014

© Maryam Moghadasi 2014

Abstract

The mean values of reservoir parameters such as permeability, porosity, and hydrocarbon reserves are widely used to evaluate a formation for potential development and to perform reservoir simulations. Among the different mean estimators, the arithmetic average and Swanson’s rule are commonly used in the petroleum industry, and the petroleum literature has promoted Swanson’s rule as a superior alternative to the arithmetic average. A few researchers have evaluated its performance for a log-normal distribution with a limited range of variability, but they have overlooked its performance for other distribution types that may describe reservoir parameters. Moreover, prior studies concentrated only on the bias of Swanson’s rule, whereas an optimum mean estimator should simultaneously have zero bias, small uncertainty, consistency, and high efficiency. This study therefore evaluates the mean estimators on these other properties in addition to bias. It also compares the performance of Swanson’s rule with several well-known mean estimators (the arithmetic average, the maximum likelihood estimator, and Pearson-Tukey’s rule) for the log-normal, power-normal, and bimodal distributions. The properties of the mean estimators are derived analytically and validated numerically via Monte Carlo simulation. We find that none of these mean estimators simultaneously satisfies all conditions of an optimum mean estimator for all ranges of variability and sample size. Instead, each of them can be the optimum mean estimator depending on sample size, variability, and distribution type.
Being unbiased is a desirable property, but it is not necessarily the most important one, because a biased mean estimator can be de-biased. We propose a de-biased version of Swanson’s rule and find that it is an appropriate alternative for approximating the mean value, particularly for data sets with a large standard deviation and a small sample size. Moreover, we evaluate the performance of the mean estimators when the data follow a first-order auto-regressive model and show that auto-correlation causes the mean estimators to behave differently from the uncorrelated case.

Acknowledgments

I would like to express my sincere gratitude to my advisor, Dr. Jerry Jensen, for his help, guidance, and encouragement throughout my Ph.D. study. I wish to thank the members of my advisory committee, Dr. Jalal Abedi and Dr. Hassan Hassanzadeh, as well as my examining committee, Dr. Laurence Robert Bentley and Dr. Clayton Deutsch, for their time and comments. I would also like to thank all my friends and fellow graduate students, in particular Dr. Danial Kaviani, Mohammad Soroush, and Mehdi Majdi Yazdi, for their continuous friendship and support during my Ph.D. study. I gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC). Finally, and most importantly, I would like to extend my gratitude to my husband (Mehdi Bahonar), my parents (Iraj Moghadasi and Pari Anvari), my brother (Alireza Moghadasi), and my sister (Roya Moghadasi) for the endless love, support, and encouragement they have given me throughout my Ph.D. study, without which this work would not have been accomplished.

Dedicated to my dear parents & husband for their love & support, & God for the endless opportunities.

Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Nomenclature
Chapter 1: Introduction
1.1 Thesis Organization
Chapter 2: Literature Review
2.1 Notation
2.2 Definitions
2.2.1 Bias
2.2.2 Uncertainty
2.2.3 Consistency
2.2.4 Efficiency
2.3 Detailed Analysis of Literature
2.3.1 Arithmetic Average
2.3.2 Discretization Methods
2.3.3 Maximum Likelihood Estimator
2.4 Distribution Types
2.5 Gaps in the Existing Body of Knowledge
Chapter 3: Performance Evaluation for the Case of the Log-Normal Distribution
3.1 Analytical Expressions of Mean Estimators’ Properties
3.2 Validation of Analytical Expressions using Monte Carlo Simulation
3.3 Analysis of the Analytical Expressions of the Mean Estimators’ Properties
3.4 Improving Swanson’s Rule
3.4.1 Adjusting Swanson’s Rule by a Coefficient
3.4.2 Moment Matching with Fixed Values
3.5 Concluding Remarks
Chapter 4: Performance Evaluation for the Case of Bimodal Distribution
4.1 Analytical Expressions of Mean Estimators’ Properties
4.2 Validation of Analytical Expressions using Monte Carlo Simulation
4.3 Analyses of the Analytical Expressions of Mean Estimators’ Properties
4.4 Concluding Remarks
Chapter 5: Performance Evaluation for the Case of Power-Normal Distribution
5.1 Analytical Expressions of Mean Estimators’ Properties
5.2 Validation of Analytical Expressions using Monte Carlo Simulation
5.3 Analyses of Mean Estimators’ Properties
5.4 Improving Swanson’s Rule
5.5 Concluding Remarks
Chapter 6: Performance Evaluation for the Case of Auto-Correlated Random Variables
6.1 Assumptions
6.2 Analytical Expressions of Mean Estimators’ Properties
6.3 Analytical Expression Validations Using Monte Carlo Simulation
6.4 Analysis of the Analytical Expressions of the Mean Estimators’ Properties
6.5 Auto-Correlated Random Variables with Bimodal Distribution
6.6 Concluding Remarks
Chapter 7: Comparison of Mean Estimators for Independent Random Variables
Chapter 8: Case Studies
Chapter 9: Conclusions and Recommendations
9.1 Conclusions
9.2 Future Work
9.2.1 Evaluate Swanson’s Rule Performance for Very Small Sample Sizes
9.2.2 Consider Beta Distribution for Percentiles
9.2.3 Extend Delfiner’s Approach
9.2.4 Evaluate Swanson’s Rule Performance for Truncated Log-normal Distribution
Appendix A: Order-Statistics Samples
Appendix B: Moments of the Maximum Likelihood Estimator
Appendix C: Conditions for a Bimodal Distribution
Appendix D: First and Second Moments of Maximum Likelihood for Bimodal Distribution
Appendix E: First and Second Moments of a Power Normal Distribution
Appendix F: Parameters of the First Order Auto-Regressive Model
Appendix G: Moments of Discretization Methods for the Case of Dependent Random Variables
Appendix H: Moments of the Maximum Likelihood Estimator for Dependent Random Variables
References

List of Figures

Fig. 2-1 – Estimated x10, x50, x90, and xSR (black squares) compared to the exponential regression function (Delfiner 2007).
Fig. 3-1 – Comparison of E(xT)/E(X) and (xT)A/E(X) of (a) SR and PT and (b) the AA and MLE.
Fig. 3-2 – Standard errors of the AA, MLE, SR, and PT obtained from analytical and numerical approaches for the cases of σ = 1 and σ = 1.5.
Fig. 3-3 – RMSE’s of the AA, MLE, SR, and PT obtained from analytical and numerical approaches for the cases of σ = 1 and σ = 1.5.
Fig. 3-4 – Analytical ratios of E(xT)/E(X) versus σ and the Dykstra-Parsons coefficient.
Fig. 3-5 – Ratio of SE’s to x50 of the AA, MLE, SR, and PT for four different σ values.
Fig. 3-6 – Ratios of SE/x50 of the AA, SR, MLE, and PT versus σ and VDP for (a) n=50 and (b) n=600.
Fig. 3-7 – Ratio of RMSE to x50 of the AA, SR, MLE, and PT for four different σ values.
Fig. 3-8 – Ratios RMSE/x50 of the AA, MLE, SR, and PT versus σ when (a) n=50 or (b) n=600.
Fig. 3-9 – Sample standard deviation obtained from analytical expression and MC simulation with error bars showing 95% confidence interval (a) for two different σ values and (b) for the general case.
Fig. 3-10 – E(xSR-C1) obtained from analytical expression and MC simulation with error bars showing 95% confidence interval, when σ is either known or unknown.
Fig. 3-11 – Weights of SR versus σ, where σ is known and unknown, with error bars showing 95% confidence interval.
Fig. 3-12 – E(xSR-C2) obtained from analytical expression and numerically calculated using MC simulation with error bars showing 95% confidence interval, when σ is either known or unknown.
Fig. 3-13 – Ratio of the expected values of SR, SRC1, SRC2, PT, and the MLE to E(X).
Fig. 3-14 – (a) RMSE/x50 and (b) SE/x50 of the AA, MLE, PT, SR, SRC1, and SRC2 versus σ and VDP when n=200.
Fig. 3-15 – Ratio of the RMSE’s of the AA, MLE, PT, SR, SRC1, and SRC2 to x50 versus the square root of the inverse of sample size.
Fig. 4-1 – Bimodal region when µ1=1 and σ2=0.5.
Fig. 4-2 – (a) Expected value and (b) SE of the AA.
Fig. 4-3 – (a) Expected value and (b) SE of MLE.
Fig. 4-4 – (a) Expected value and (b) SE of SR.
Fig. 4-5 – (a) Expected value and (b) SE of PT.
Fig. 4-6 – Ratio E(xT)/E(X) of (a) the AA and MLE, and (b) SR and PT when σ2=0.5.
Fig. 4-7 – Standard errors of the AA, MLE, SR, and PT for four different values of σ1.
Fig. 4-8 – RMSE’s of the AA, MLE, SR, and PT for four different values of σ1.
Fig. 5-1 – Ratios of MC to analytical results of (a) expected value, (b) SE, and (c) RMSE of the AA for the case of square root power-normal distribution.
Fig. 5-2 – Ratios of MC to analytical results of the (a) expected value, (b) SE, and (c) RMSE of SR for square root power-normal distribution.
Fig. 5-3 – Ratios of MC to analytical results of the (a) expected value, (b) SE, and (c) RMSE of the PT for square root power-normal distribution.
Fig. 5-4 – E(xT)/E(X) of (a) SR and (b) PT versus σ for different λ values.
Fig. 5-5 – Analytical ratios of (a) E(xSR)/E(X) and (b) E(xPT)/E(X) versus VDP for different λ values.
Fig. 5-6 – Standard errors of the AA, SR, and PT for four different values of λ and σ.
Fig. 5-7 – RMSE’s of the AA, SR, and PT for four different values of λ and σ.
Fig. 5-8 – Justified weights of SR versus σ for three different λ values when σ is known and unknown, with error bars showing a 95% confidence interval.
Fig. 5-9 – E(xSR-C) analytically derived and numerically calculated using MC simulation with error bars showing 95% confidence interval, when σ is either known or unknown.
Fig. 5-10 – Ratio of the expected value of SRC to E(X) for four λ values.
Fig. 5-11 – SE’s of the AA, SR, and PT for four different values of λ and σ.
Fig. 5-12 – σ versus n showing regions where SRC has smaller SE than (a) PT and (b) the AA.
Fig. 5-13 – RMSE’s of the AA, SR, and PT for four different values of λ and σ.
Fig. 5-14 – σ versus n showing regions where SRC is more efficient than SR when (a) λ=1/2 and (b) SRC is more efficient than SR when σ is greater than the value given by each curve depending on n and λ; otherwise SR is more efficient.
Fig. 6-1 – Expected value of the AA/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-2 – Standard error of the AA/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-3 – Expected value of SR/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-4 – Standard error of SR/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-5 – Expected value of PT/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-6 – Standard error of PT/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-7 – Expected value of MLE/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-8 – Standard error of MLE/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing 95% confidence interval.
Fig. 6-9 – The ratio of expected values of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting 95% confidence interval when ρx1=0.3 and σ=1.5.
Fig. 6-10 – The ratio of expected values of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting 95% confidence interval when ρx1=0.0 and σ=1.5.
Fig. 6-11 – The ratio of standard errors of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting 95% confidence interval when ρx1=0.3 and σ=1.5.
Fig. 6-12 – The ratio of standard errors of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting 95% confidence interval when ρx1=0.0 and σ=1.5.
Fig. 6-13 – Analytical standard errors/x50 of the AA, SR, and PT.
Fig. 6-14 – Analytical RMSE/x50 of the AA, SR, PT, and MLE.
Fig. 6-15 – The ratio of standard errors of the mean estimators to x50, analytically derived for three different ρx1 values when σ=1.5.
Fig. 6-16 – RMSE/x50’s of the mean estimators analytically derived for three different ρx1 values when σ=1.5.
Fig. 6-17 – Standard errors of the AA, SR, and PT with error bars showing 95% confidence interval.
Fig. 6-18 – RMSE’s of the AA, SR, and PT with error bars showing 95% confidence interval.
Fig. 7-1 – σ versus n showing regions in which a mean estimator (a) has the smallest bias, (b) has the lowest SE, and (c) is the most efficient compared to other estimators for the case of log-normal distribution.
Fig. 7-2 – σ1 versus n showing regions in which a mean estimator (a) has smaller uncertainty and (b) is more efficient than other estimators when σ2=0.5 for the case of bimodal distribution.
Fig. 7-3 – SR has smaller SE than the AA when σ is greater than the value given by each curve depending on n and λ; otherwise the AA has less SE for the case of power-normal distribution (solid curves and dots obtained from the analytical expressions and MC simulation, respectively).
Fig. 7-4 – (a) PT is more efficient than SR when σ is greater than the value given by each curve depending on n and λ; otherwise SR is more efficient; and (b) when λ=1/16, a mean estimator is the most efficient depending on σ and n (solid curves and dots obtained from the analytical expressions and MC simulation, respectively).
Fig. 8-1 – Probability plots of data sets taken from (a) Hurst et al. (2000) in million barrels of oil (MMBO), and (b) EUR of an OK field in million cubic feet equivalent (MMCFE) with statistical properties calculated from available data sets.
Fig. 8-2 – Probability plot of the data set taken from MacCrossan (1969) with sample statistical properties calculated from available data sets.
Fig. 8-3 – Probability plot of the transformed EUR of the Hemphill gas field with exponent λ=0.28.
Fig. 8-4 – Probability plot of a permeability data set taken from the North Sea.

List of Tables

Table 3-1 – Analytical expressions of E(xT)/E(X).
Table 3-2 – Analytical expressions of RMSE’s of the mean estimators.
Table 5-1 – Derived ω’s for some power-normal distributions with different λ’s.
Table 8-1 – Statistical properties of the Hurst et al. (2000) data set.
Table 8-2 – Statistical properties of gas reserves of an Oklahoma field.
Table 8-3 – Statistical properties of measured permeability in Cleveland Formation.
Table 8-4 – Statistical properties of the data set taken from MacCrossan (1969).
Table 8-5 – Statistical properties of EUR data set of the Hemphill gas field.
Table 8-6 – Statistical properties of permeability data set measured along a well located in the North Sea.
Table E-1 – Expected value of power-normal distribution for different λ values.
Table E-2 – Bias of Swanson’s rule for different λ values.
Table E-3 – Bias of Pearson-Tukey for different λ values.

Nomenclature

Symbols
b_T = Bias of the mean estimator T
cov(.) = Covariance
E(.) = Expected value
h(x) = Probability density function of X
H(x) = Cumulative distribution function of X
m = Number of data sets
n = Sample size
P = Assigned weight to the uth percentile
s = Sample standard deviation
Std(.) = Standard deviation
T = Mean estimator
Var(.) = Variance
V_DP = Dykstra-Parsons coefficient
w_u = Φ^(-1)(u/100)
w_u* = Φ_T^(-1)(u/100)
x = Deterministic variable
x_T = Approximated mean value by the estimator T using an analytical expression
x̂_T = Approximated mean value using the estimator T obtained from a numerical approach
x_u = The uth percentile
X = Random variable

Abbreviations
AA = Arithmetic Average
AR(1) = First-order auto-regressive model
CDF = Cumulative Distribution Function
CV = Coefficient of Variation
d.i.d. = Dependent and identically distributed
ESS = Effective sample size
EUR = Estimated ultimate recovery
i.i.d. = Independent and identically distributed
LF = Likelihood function
MC = Monte Carlo
mD = Millidarcy
MLE = Maximum Likelihood Estimator
MMBO = Million barrels of oil
MMCFE = Million cubic feet equivalent
MSE = Mean square error
N = Normal
OK = Oklahoma
PDF = Probability Density Function
PT = Pearson-Tukey’s rule
RMSE = Root Mean Square Error
RV = Random variable
SD = Standard deviation
SE = Standard error
SR = Swanson’s Rule
TN = Truncated normal
TND = Truncated normal distribution

Greek Symbols
α = Portion of a distribution in a bimodal distribution
β = An index parameter and a correction factor
θ = Population parameter
λ = Exponent for power-normal transformation
μ = Mean of a population
ρ_τ = Correlation coefficient between pairs of values separated by an interval τ
σ = Standard deviation of a population
φ = Standard normal probability density function
Φ = Standard normal cumulative distribution function
Φ_T = Truncated standard normal cumulative distribution function
ω = Assigned weights to the uth percentile in discretization methods
ω̂ = Estimated ω
Γ(.) = Gamma function

Subscripts
A = Arithmetic average
dep = Dependent
dis = Discretization method
eff = Effective

Chapter 1: Introduction

The mean values of reservoir parameters such as permeability, porosity, fluid saturation, recovery factor, and hydrocarbon reserves are widely used to evaluate a formation for potential reservoir development. For example, these parameters are input into reservoir simulators to predict complex fluid flow behavior, and they are also used in decision analysis. It is therefore essential to choose an optimum mean estimator among the several available options. An optimum mean estimator should simultaneously be unbiased and consistent and have small uncertainty and high efficiency (these terms are defined in the next chapter).
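These properties can be estimated numerically by Monte Carlo simulation, the numerical approach this study uses to validate its analytical results. The sketch below is illustrative only and assumes the standard definitions bias = E(T) - E(X), SE = standard deviation of T over repeated samples, and RMSE = sqrt(bias² + SE²); the precise definitions used in this thesis follow in the next chapter. The function names, the log-normal population with ln X ~ N(0, 1), and the replication count are hypothetical choices. Consistency shows up as an RMSE that shrinks as n grows, and efficiency is commonly compared between estimators through their MSE or RMSE.

```python
import numpy as np


def estimator_properties(estimator, sampler, true_mean, n, reps=2000, seed=0):
    """Monte Carlo bias, standard error (SE), and RMSE of a mean estimator.

    estimator -- function mapping a 1-D sample to a single mean estimate
    sampler   -- function (rng, n) -> 1-D sample of size n
    """
    rng = np.random.default_rng(seed)
    estimates = np.array([estimator(sampler(rng, n)) for _ in range(reps)])
    bias = estimates.mean() - true_mean                    # E(T) - E(X)
    se = estimates.std()                                   # 1/reps divisor, so RMSE^2 = bias^2 + SE^2 exactly
    rmse = np.sqrt(np.mean((estimates - true_mean) ** 2))
    return float(bias), float(se), float(rmse)


def lognormal_sampler(rng, n, sigma=1.0):
    """Hypothetical population: ln X ~ N(0, sigma^2)."""
    return rng.lognormal(mean=0.0, sigma=sigma, size=n)


# E(X) = exp(mu + sigma^2 / 2) with mu = 0 and sigma = 1
true_mean = np.exp(1.0 ** 2 / 2)

# The arithmetic average (np.mean) evaluated at increasing sample sizes
for n in (25, 100, 400):
    b, se, rmse = estimator_properties(np.mean, lognormal_sampler, true_mean, n)
    print(f"n={n:4d}  bias={b:+.4f}  SE={se:.4f}  RMSE={rmse:.4f}")
```

Any other estimator with the same one-argument signature can be passed in place of `np.mean`, so several estimators can be compared on identical simulated samples by reusing the same seed.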
Among the different mean estimators, the arithmetic average (AA) and discretization methods such as Swanson’s rule (SR) (Megill 1984) are commonly used in the oil and gas industry. The AA estimates the mean by assigning an equal weight of 1/n to each of the n samples. SR, on the other hand, assigns weights of 0.3, 0.4, and 0.3 to the sample 10th, 50th, and 90th percentiles, respectively.

Two examples illustrate how the choice of mean estimator can significantly affect the project development process and economic assessment. The first dataset consists of Cleveland Formation permeability measurements (Rollins et al. 1992), and the second consists of the estimated ultimate recovery (EUR) of 416 wells in the Hemphill Field, Texas, USA. For the first dataset, SR estimates the mean as 0.09 mD while the AA gives 0.18 mD, a 50% difference. For production prediction, this two-fold difference is relatively modest. It is important, however, for tax and regulatory purposes, because it determines whether the Cleveland Formation is classified as tight (< 0.1 mD) or conventional. For the second dataset, the AA and SR give sample means of 1,824 and 1,797 million cubic feet equivalent (MMCFE), respectively, only a 1.5% difference. Although small, this difference amounts to about 27 MMCFE per well and about 11,300 MMCFE in total reserves, which, at an assumed gas price of US$3/MCFE, corresponds to a significant difference of roughly US$34 million in the economic assessment. Hence, the choice of a proper mean estimator is important, particularly for estimating the mean values of critical reservoir parameters such as hydrocarbon reserves and permeability.

Despite the many studies that have used and supported SR as an alternative mean estimator, only a few, such as Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011), have questioned its applicability, and they evaluated only its bias for the log-normal distribution with limited variability. Being unbiased is desirable, but bias can be removed using a correction factor. Therefore, besides bias, other properties of mean estimators, namely uncertainty, consistency, and efficiency, should also be evaluated; none of these three studies investigated them. This study therefore comprehensively evaluates SR in terms of bias, uncertainty, consistency, and efficiency for the log-normal distribution over a wider range of variability, since reservoir characteristics can be highly variable (Rollins et al. 1992). SR performance is compared with that of the AA and of another discretization method, Pearson-Tukey’s rule (PT) (Pearson-Tukey 1969). SR is also compared with the maximum likelihood estimator (MLE), because the MLE is an optimum mean estimator from a statistical perspective (Kenney and Keeping 1951; Quenouille 1956; Kendall and Stuart 1977).

The log-normal distribution is commonly used to describe reservoir parameters such as drainage area, gross and net pay, reserves, recovery, and permeability (e.g., Kaufman 1965; Bennion 1966; Megill 1984; Rollins et al. 1992; Rose 2001; Seidle and O’Connor 2003). Several other studies, however, have shown that reservoir parameters are not necessarily log-normally distributed and can be described by other distributions, such as the power-normal and bimodal distributions (e.g., Jensen et al. 1987; Seyedghasemipour and Bhattacharyya 1990; MacCrossan 1969).
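For concreteness, the four estimators compared in this study can be sketched for a single sample as follows. The sketch assumes the conventional weights: 0.3, 0.4, and 0.3 on the 10th, 50th, and 90th percentiles for SR, and 0.185, 0.63, and 0.185 on the 5th, 50th, and 95th percentiles for PT. The MLE shown is the maximum likelihood estimate of the mean under a log-normal assumption, exp(μ̂ + σ̂²/2). Sample percentiles are computed with NumPy's default linear interpolation; this is an assumption, and the thesis may define sample percentiles differently (for example, through order statistics). The function names and the simulated sample are hypothetical.

```python
import numpy as np


def arithmetic_average(x):
    """AA: equal weight 1/n on every sample value."""
    return float(np.mean(x))


def swanson_rule(x):
    """SR: 0.3*P10 + 0.4*P50 + 0.3*P90 of the sample."""
    # Assumption: NumPy's default (linear interpolation) sample percentiles
    p10, p50, p90 = np.percentile(x, [10, 50, 90])
    return float(0.3 * p10 + 0.4 * p50 + 0.3 * p90)


def pearson_tukey(x):
    """PT: 0.185*P5 + 0.63*P50 + 0.185*P95 of the sample."""
    p5, p50, p95 = np.percentile(x, [5, 50, 95])
    return float(0.185 * p5 + 0.63 * p50 + 0.185 * p95)


def lognormal_mle(x):
    """MLE of E(X) assuming ln X ~ N(mu, sigma^2): exp(mu_hat + sigma_hat^2 / 2)."""
    y = np.log(x)
    # np.var uses the 1/n divisor by default, which is the maximum likelihood estimate
    return float(np.exp(y.mean() + y.var() / 2))


# Hypothetical skewed data: ln X ~ N(0, 1.5^2), so E(X) = exp(1.5^2 / 2)
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=0.0, sigma=1.5, size=50)
print(f"population mean: {np.exp(1.5 ** 2 / 2):.3f}")
for name, estimator in [("AA", arithmetic_average), ("SR", swanson_rule),
                        ("PT", pearson_tukey), ("MLE", lognormal_mle)]:
    print(f"{name:>3}: {estimator(sample):.3f}")
```

For a symmetric sample such as 1, 2, 3, 4, 5, the AA, SR, and PT all return 3, because the percentile weights are symmetric; differences between the estimators appear only when the data are skewed.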
However, no attention has been paid to SR performance when the underlying distribution is either power-normal or bimodal. This research therefore also evaluates the performances of the mean estimators for non-log-normal distributions (e.g., power-normal and bimodal distributions). Thus far, all studies on SR performance have assumed that samples are independent and identically distributed. However, reservoir parameters such as permeability may be auto-correlated (Jensen et al. 2000). In this study, SR performance is therefore also evaluated, and compared to the performances of other mean estimators, when the random variables are identically distributed and follow a first-order autoregressive model. In the course of this study, it was found that none of the AA, MLE, SR, and PT satisfies all conditions of an optimum mean estimator for every sample size and variability. The AA is unbiased and the MLE is asymptotically unbiased, whereas both SR and PT are biased except for the normal distribution. SR and PT estimate a mean value with small bias in the nearly homogeneous case but significantly underestimate it as variability increases. SR approximates the mean value with slightly larger bias than PT. Nevertheless, this study demonstrates that SR is more efficient and less uncertain than the AA, MLE, and PT for some ranges of sample size and variability. Therefore, SR can be an optimum mean estimator for some ranges of variability and sample size, because its bias can be compensated by its smaller uncertainty and higher efficiency. As noted above, bias can be removed using a correction factor or by modifying the mean estimator's formula. In this study, SR is converted into an unbiased mean estimator using two approaches: (1) adjusting the SR weights based on population variability and (2) using a correction factor.
These approaches lead to an unbiased SR with the highest efficiency and lowest uncertainty for some ranges of sample size and variability.

1.1 Thesis Organization

The following paragraphs describe the main contents of each chapter. Chapter Two describes previous work on evaluating Swanson's rule and other mean estimators in detail. It provides the definitions of bias, uncertainty, consistency, and efficiency, which are used throughout the following chapters. It also identifies gaps in the literature that motivate the issues assessed in this research study. Chapter Three evaluates the performance of selected mean estimators when the underlying distribution is log-normal. This chapter also proposes two methods to de-bias SR: modifying the SR weights and defining a correction factor. Because of geological variation, reservoir parameters can follow a bimodal distribution. Chapter Four therefore evaluates the performance of the mean estimators assuming that reservoir parameters can be described by a bimodal distribution. It shows that both PT and SR are biased, the AA is unbiased, and the MLE is asymptotically unbiased. Although the MLE has the smallest uncertainty and the highest efficiency, it involves complex manipulation, so the other mean estimators are preferable. Chapter Five describes the performance of the mean estimators when reservoir parameters follow a power-normal distribution. This chapter shows that SR estimates a mean value with insignificant bias when the exponent of the power-normal transformation, λ, tends to one; however, it significantly underestimates the mean value as λ approaches zero. None of the mean estimators under review is an absolute winner for every variability, sample size, and λ. Thus, each of them can be the optimum mean estimator for certain ranges of variability and sample size, depending on the λ value.
Moreover, this chapter attempts to de-bias SR by modifying its weights. Up to Chapter Five, reservoir parameters are assumed to be independent; however, reservoir parameters can be auto-correlated. Chapter Six assesses the performance of the mean estimators when the random variables are auto-correlated. This chapter shows that auto-correlation between data points decreases efficiency and increases uncertainty. In other words, auto-correlated samples are less informative than uncorrelated samples, so more auto-correlated samples are needed to achieve a given accuracy; this is quantified by the effective sample size (ESS). Auto-correlation causes the mean estimators to behave differently, and the ESS needed to achieve a given accuracy depends on which mean estimator is used. Chapter Seven gives an integrated view of the preceding results. It defines regions where one mean estimator has the smallest uncertainty and bias, and the highest efficiency, among the mean estimators considered. These regions are determined as a function of the number of samples and variability. This chapter shows that at certain values of sample size and variability, SR is an optimum mean estimator because its bias can be compensated by its smaller uncertainty and higher efficiency. In Chapter Eight, the mean estimators are applied to some case studies. The statistical properties of the mean estimators are calculated analytically and compared to the results obtained from the bootstrap method. The last chapter lists the main conclusions of this study and raises some questions and issues for future research.

Chapter 2: Literature Review

As mentioned in Chapter One, the performances of the mean estimators are evaluated based on four properties. This chapter provides the definitions of these properties in detail.
This chapter also summarizes previous studies that evaluated the performances of the mean estimators mentioned in the previous chapter, and then describes the issues overlooked in these studies.

2.1 Notation

The following notation is used in this study. Assume $X$ is a random variable (RV) with probability density function (PDF) $h(x)$, where $x$ is a deterministic variable. The expected value of $X$ is given by

$$E(X) = \int_{-\infty}^{\infty} \xi \, h(\xi) \, d\xi, \tag{2-1}$$

and $\mathrm{Var}(X) = E\{[X - E(X)]^2\}$ is the variance of $X$. $x_T$ denotes the mean value approximated by an estimator $T$ using an analytical expression, and $\hat{x}_T$ denotes the mean value estimated numerically using Monte Carlo (MC) simulation.

2.2 Definitions

The mean estimator $T$, which estimates the population mean by $x_T$, is a function of samples randomly taken from the population. Therefore, $x_T$ is an RV whose behaviour is described by a PDF. The mean and standard deviation of this PDF are used to compute the mean estimator's properties analytically and numerically. The choice of a mean estimator depends on its performance relative to the other estimators, evaluated on four properties: bias, uncertainty, consistency, and efficiency. These properties are not necessarily the most important ones, but they are commonly used to assess estimators, so this section describes them in detail.

2.2.1 Bias

It is desirable that the PDF of the estimate, $x_T$, be centered around the true mean value, $E(X)$; otherwise, the mean estimator $T$ tends to underestimate or overestimate the mean value.
Bias measures the difference between the expected value of the mean estimator, $E(x_T)$, and the true mean value:

$$b_T = E(x_T) - E(X). \tag{2-2}$$

The estimator $T$ is unbiased when $b_T = 0$; otherwise, it is biased.

2.2.2 Uncertainty

Uncertainty, the second property of a mean estimator, refers to the range of possible outcomes and should be as small as possible. It is evaluated in terms of the standard error (SE) of a mean estimator; an estimator with a smaller SE has a lower degree of uncertainty.

2.2.3 Consistency

Another feature used to assess the performance of estimators is consistency. It is expressed through the mean square error (MSE), given by $\mathrm{MSE} = E\{[x_T - E(X)]^2\}$. The estimator $T$ is consistent when the following condition is satisfied:

$$\lim_{n \to \infty} E\left\{\left[x_T - E(X)\right]^2\right\} = 0, \tag{2-3}$$

where $n$ is the number of samples (Lindgren 1968). Adding and subtracting $E(x_T)$ inside the square in Eq. 2-3 (the cross term has zero expectation), the consistency condition can be rewritten as

$$\lim_{n \to \infty} \left[\mathrm{Var}_T + b_T^2\right] = 0, \tag{2-4}$$

where $\mathrm{Var}_T$ is the variance of the estimator $T$. According to Eq. 2-4, the MSE incorporates both the bias and the variance of the mean estimator $T$. The MSE has the units of the square of the quantity being estimated. By analogy with the standard deviation, this study uses the square root of the MSE, known as the root mean square error (RMSE), instead of the MSE, because it has the same units as the quantity being estimated. Taking the square root of Eq. 2-4 gives the consistency condition

$$\lim_{n \to \infty} \sqrt{\mathrm{Var}_T + b_T^2} = 0. \tag{2-5}$$

Eq. 2-5 is satisfied when the variance and bias of $T$ both approach zero as $n$ becomes very large.
In other words, the sequence $\{x_T\}$ becomes more and more concentrated around $E(X)$ as $n$ increases; that is, $\{x_T\}$ converges in probability to $E(X)$. This means that the probability that $x_T$ differs from $E(X)$ by more than any fixed amount becomes smaller and smaller as the number of samples tends to infinity. This definition is formulated as (Lindgren 1968)

$$\lim_{n \to \infty} P\left(\left|x_T - E(X)\right| > \epsilon\right) = 0 \quad \text{for any } \epsilon > 0. \tag{2-6}$$

However, not every estimator satisfies Eq. 2-6. For an estimator $T'$, the sequence $\{x_{T'}\}$ may become more and more concentrated around a value different from $E(X)$ as $n$ becomes infinite. Such an estimator $T'$ converges in probability, but to the wrong value (Lindgren 1968).

2.2.4 Efficiency

The fourth property of an optimal estimator is that its RMSE should be as small as possible. Lindgren (1968) states that the estimator $T$ is more efficient than an estimator $T'$ if

$$\sqrt{\mathrm{Var}_T + b_T^2} < \sqrt{\mathrm{Var}_{T'} + b_{T'}^2}. \tag{2-7}$$

In other words, an estimator with a smaller RMSE is more precise and efficient. For unbiased estimators, $b_T = b_{T'} = 0$, Eq. 2-7 reduces to an inequality between the SEs of the two estimators, and the one with the smaller SE is more efficient.

2.3 Detailed Analysis of Literature

Among the different mean estimators, this study concentrates on a few that are extensively used in the oil and gas industry: the AA and discretization methods such as SR and PT. In addition to the AA, SR, and PT, the MLE is also evaluated here, since it is an optimum mean estimator from a statistical perspective (Kenney and Keeping 1951; Quenouille 1956; Kendall and Stuart 1977).
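The four properties defined in Section 2.2 can also be estimated numerically, in the same spirit as the MC validation used in this study. The Python sketch below is a minimal illustration, not the author's code. The helper `estimator_properties` (a hypothetical name) draws `reps` independent samples of size `n`, applies a mean estimator to each, and reports the bias (Eq. 2-2), the SE, and the RMSE (the square root of the MSE). The SE is computed with divisor `reps`, so the identity $\mathrm{RMSE}^2 = \mathrm{SE}^2 + b_T^2$ behind Eq. 2-4 holds exactly for the simulated values. The log-normal population in the usage example is an arbitrary choice for illustration.

```python
import math
import random
import statistics


def estimator_properties(estimator, draw, true_mean, n, reps=2000, seed=1):
    """Monte Carlo bias, standard error and RMSE of a mean estimator.

    estimator -- function mapping a list of n observations to an estimate x_T
    draw      -- function taking a random.Random and returning one observation
    true_mean -- population mean E(X)
    """
    rng = random.Random(seed)  # fixed seed for reproducible results
    estimates = [estimator([draw(rng) for _ in range(n)]) for _ in range(reps)]
    bias = statistics.fmean(estimates) - true_mean          # Eq. 2-2
    se = statistics.pstdev(estimates)                       # uncertainty
    rmse = math.sqrt(statistics.fmean((e - true_mean) ** 2 for e in estimates))
    return bias, se, rmse


if __name__ == "__main__":
    # Illustrative population: log-normal with log-mean 0 and log-sd 1.
    draw = lambda rng: rng.lognormvariate(0.0, 1.0)
    for n in (10, 100):
        b, se, rmse = estimator_properties(statistics.fmean, draw, math.exp(0.5), n)
        print(f"AA, n = {n:3d}: bias = {b:+.3f}, SE = {se:.3f}, RMSE = {rmse:.3f}")
```

For the AA, the bias stays near zero and the RMSE shrinks as $n$ grows, which is the consistency behaviour of Eq. 2-5. Passing a different `estimator` function lets two estimators be ranked by Eq. 2-7.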
2.3.1 Arithmetic Average

The AA is used to approximate the mean values of reservoir parameters such as porosity and permeability when a medium is horizontally stratified and the flow path parallels the layers. It estimates a mean value by assigning equal weights of 1/n to all n samples $X_1, X_2, \ldots, X_n$:

$$x_{AA} = \frac{1}{n} \sum_{i=1}^{n} X_i. \tag{2-8}$$

For decades, the properties of the AA have been studied by many researchers, such as Kenney and Keeping (1951, p. 133), Lindgren (1968, p. 221), and Jensen (1998). The sequence $\{x_{AA}\}$ is centred around $E(X)$ with variance $\mathrm{Var}(X)/n$ (i.e., $E(x_{AA}) = E(X)$ and $\mathrm{Var}(x_{AA}) = \mathrm{Var}(X)/n$); thus, the AA is an unbiased mean estimator. It is subject to sampling variation because only a limited number of samples is available for estimating the mean value; however, this variation can be reduced by taking a sufficient number of samples.

2.3.2 Discretization Methods

Among the different discretization methods, SR and PT are the most commonly used alternative mean estimators within the oil and gas industry. Unlike the AA, they approximate the mean value by assigning unequal weights to selected percentiles. PT was introduced by Pearson and Tukey (1969) as a mean estimator that gives the mean value as

$$x_{PT} = 0.185\, x_{5} + 0.630\, x_{50} + 0.185\, x_{95}, \tag{2-9}$$

where $x_5$, $x_{50}$, and $x_{95}$ are the 5th, 50th, and 95th percentiles, respectively. Later, in 1972, Swanson proposed SR as an alternative mean estimator:

$$x_{SR} = 0.3\, x_{10} + 0.4\, x_{50} + 0.3\, x_{90}, \tag{2-10}$$

where $x_{10}$ and $x_{90}$ are the 10th and 90th percentiles, respectively (Megill 1984). Swanson empirically found this rule to be a "good" mean estimator for modestly skewed distributions (Megill 1984).
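Eqs. 2-8 to 2-10 translate directly into code once a sample percentile is defined. The text does not fix that definition here. The Python sketch below therefore assumes linear interpolation between order statistics, as implemented by the standard library's `statistics.quantiles(..., method="inclusive")`. Other percentile conventions will give somewhat different PT and SR values for small samples. The function names are illustrative only.

```python
import statistics


def sample_percentiles(sample, ps):
    """Sample percentiles (integers 1..99) by linear interpolation.

    Assumption: 'inclusive' interpolation between order statistics; the
    percentile definition is a modelling choice, not fixed by Eqs. 2-9/2-10.
    """
    cuts = statistics.quantiles(sample, n=100, method="inclusive")
    return [cuts[p - 1] for p in ps]  # cuts[k] is the (k+1)-th percentile


def arithmetic_average(sample):
    """Eq. 2-8: equal weights 1/n."""
    return sum(sample) / len(sample)


def pearson_tukey(sample):
    """Eq. 2-9: 0.185 x5 + 0.630 x50 + 0.185 x95."""
    x5, x50, x95 = sample_percentiles(sample, (5, 50, 95))
    return 0.185 * x5 + 0.630 * x50 + 0.185 * x95


def swanson(sample):
    """Eq. 2-10: 0.3 x10 + 0.4 x50 + 0.3 x90."""
    x10, x50, x90 = sample_percentiles(sample, (10, 50, 90))
    return 0.3 * x10 + 0.4 * x50 + 0.3 * x90


if __name__ == "__main__":
    skewed = [1] * 9 + [100]  # nine small values and one large one
    print(arithmetic_average(skewed), swanson(skewed))
```

For the right-skewed sample in the usage example, the AA is 10.9, whereas SR gives about 3.97. The single large value enters SR only through the interpolated 90th percentile. For a symmetric sample such as the integers 0 to 100, all three estimators return 50.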
SR is more commonly used within the oil and gas industry than PT, perhaps because it involves the 10th and 90th percentiles, which are representative of proved and possible reserves, respectively. Many studies, such as Hurst et al. (2000), Rose (2001), and Delfiner (2007), have recommended SR as an alternative mean estimator. Arild et al. (2008), on the other hand, suggested MC simulation instead of SR. Hurst et al. (2000) applied SR to estimate the mean reserves of the fields and discoveries in the Upper Jurassic salt-related play in the United Kingdom and Norwegian Central North Sea. They recommended SR based on the work of Megill (1984). Hurst et al. (2000), however, did not consider the bias and uncertainty associated with SR. Delfiner (2007) used SR to estimate permeability from the porosity-permeability (Phi-k) relationship in order to approximate the pseudo-flow profile. To do so, he divided the Phi-k cross plot, on a semi-log scale, into vertical slices 5 porosity units (p.u.) wide and computed the 10th, 50th, and 90th percentiles for each slice (Fig. 2-1). The effective permeability was then computed using SR for each slice. A trend line was fitted through the resulting SR points to provide an equation for estimating permeability at any porosity within the fitted range. He computed the pseudo-flow profile with two approaches: first, using the permeability predicted by the curve fitted through the SR points, and second, using the permeability approximated by exponential regression. The pseudo-flow profiles obtained from these two approaches were then compared to the true profile. Delfiner (2007) showed an improvement in the predicted pseudo-flow profile as a result of using SR. Thus, he proposed SR as a solution to the pitfall associated with Phi-k transforms. He also showed that SR improves permeability power averaging.

Fig. 2-1 – Estimated $x_{10}$, $x_{50}$, $x_{90}$, and $x_{SR}$ (black squares) compared to the exponential-regression function (Delfiner 2007).

Rose (2001) advocated SR for approximating the mean value of log-normally distributed parameters with low to moderate heterogeneity, particularly hydrocarbon reserves. He stated that the distribution of reserves should be truncated below $P_1$ and above $P_{99}$. His reasons were the economic limits on producing reserves below $P_1$ and the low probability of reserves above $P_{99}$. Otherwise, the approximated mean value is "unrealistically large". He referred to the mean of the truncated distribution as the truncated mean and stated that truncated mean values are close to the mean values estimated via SR. Rose (2001) supported his argument using a log-normal distribution with log-mean $\mu = 1.61$ and log-standard deviation $\sigma = 1.67$. He calculated the mean value before and after truncation and concluded that SR underestimates the population mean by 24%, whereas it underestimates the mean value after truncation by only 4%. Hence, he supported SR as an appropriate approach for estimating the mean value of hydrocarbon reserves. Arild et al. (2008) compared SR to MC simulation in the context of the value of information (VOI), which assesses whether additional information should be collected before making a decision. They showed that the results obtained from SR and MC simulation differ and suggested using MC simulation instead of SR. However, they did not conclude which method has smaller bias, since their problem did not have an analytical solution. While numerous studies have used SR to estimate a mean value, only a few researchers, such as Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011), have compared the bias of SR with that of other mean estimators.
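For a log-normal population, the bias these studies examine can be computed directly, because the exact percentiles are $x_u = e^{\mu + z_u \sigma}$, where $z_u$ is the standard normal quantile. The Python sketch below (function names are illustrative, not from the cited works) evaluates Eq. 2-10 at the exact percentiles. It divides the result by $E(X) = e^{\mu+\sigma^2/2}$, the ratio Megill (1984) plotted against $x_{90}/x_{50} = e^{1.28\sigma}$. It also computes a truncated mean in the sense of Rose (2001), assuming truncation exactly at the 1st and 99th percentiles. Because exact population percentiles are used, the sketch shows the large-sample behaviour of SR, not its sampling error.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal: Z.cdf and Z.inv_cdf


def lognormal_mean(mu, sigma):
    """E(X) = exp(mu + sigma^2 / 2) for a log-normal X."""
    return math.exp(mu + sigma ** 2 / 2)


def lognormal_percentile(mu, sigma, p):
    """Exact p-th percentile (0 < p < 100): exp(mu + z_p * sigma)."""
    return math.exp(mu + sigma * Z.inv_cdf(p / 100))


def swanson_population(mu, sigma):
    """Eq. 2-10 evaluated at the exact population percentiles."""
    x10, x50, x90 = (lognormal_percentile(mu, sigma, p) for p in (10, 50, 90))
    return 0.3 * x10 + 0.4 * x50 + 0.3 * x90


def truncated_lognormal_mean(mu, sigma, p_lo=1, p_hi=99):
    """Mean of X conditional on x_{p_lo} < X < x_{p_hi}.

    Uses E[X; a < X < b] = E(X) * [Phi(z_b - sigma) - Phi(z_a - sigma)],
    divided by the retained probability (p_hi - p_lo) / 100.
    """
    z_lo, z_hi = Z.inv_cdf(p_lo / 100), Z.inv_cdf(p_hi / 100)
    partial = Z.cdf(z_hi - sigma) - Z.cdf(z_lo - sigma)
    return lognormal_mean(mu, sigma) * partial / ((p_hi - p_lo) / 100)


if __name__ == "__main__":
    z90 = Z.inv_cdf(0.9)
    for ratio in (5, 15):                      # Megill's x90/x50 axis
        sigma = math.log(ratio) / z90
        print(ratio, swanson_population(0, sigma) / lognormal_mean(0, sigma))
    mu, sigma = 1.61, 1.67                     # Rose's (2001) example parameters
    sr = swanson_population(mu, sigma)
    print(sr / lognormal_mean(mu, sigma), sr / truncated_lognormal_mean(mu, sigma))
```

With exact percentiles, $x_{SR}/E(X)$ is about 0.89 at $x_{90}/x_{50} = 5$ and about 0.53 at 15, and it does not depend on $\mu$. For Rose's parameters, SR is much closer to the truncated mean than to the untruncated one. The exact percentages depend on how the truncation points and percentiles are defined.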
A few studies have also considered SR's ability to estimate higher moments. Keefer and Bodily (1983) used both PT and SR to approximate a population variance¹ and referred to them as extended SR and PT. Neither Pearson and Tukey (1965) nor Swanson (Megill 1984), however, recommended PT or SR for estimating higher moments; they proposed them only for estimating mean values. Keefer and Bodily (1983) numerically investigated the performances of PT and SR for a wide range of beta distributions and a limited range of log-normal distributions, with log-mean $\mu = 0$ and log-standard deviation $\sigma \in [0.1, 1.5]$. They judged each method's ability to approximate the mean and variance. They concluded that both PT and SR perform well as mean estimators, while PT estimates the variance more accurately than SR. Therefore, they advocated PT as the "clear winner". Keefer and Bodily (1983) also commented, qualitatively, that the 5th and 95th percentiles are more difficult to estimate accurately than percentiles closer to the center of a distribution. Hence, they recommended SR as an alternative mean estimator if staying close to the center of the distribution is the main concern, because SR uses the 10th and 90th percentiles. Megill (1984) investigated the bias of SR for the log-normal distribution. He plotted the ratio $x_{SR}/E(X)$ against the ratio $x_{90}/x_{50}$ ($= e^{1.28\sigma}$), varying from one to 15 (i.e., $\sigma$ = 0 to 2.1), as a measure of the variability of $X$, where $\sigma$ is the log-standard deviation. He stated that SR is "close" to the mean value generated by 5000 MC iterations for modestly skewed distributions; however, it becomes significantly biased as the distribution becomes highly skewed. For example, SR underestimates the mean by 10% when the ratio $x_{90}/x_{50}$ is 5, and the bias increases to 45% when the ratio reaches 15.

¹ $\mathrm{var}(x)_{PT} = 0.185\, x_5^2 + 0.63\, x_{50}^2 + 0.185\, x_{95}^2 - (x_{PT})^2$ and $\mathrm{var}(x)_{SR} = 0.3\, x_{10}^2 + 0.4\, x_{50}^2 + 0.3\, x_{90}^2 - (x_{SR})^2$ are the population variances estimated using PT and SR, respectively.

Recently, Bickel et al. (2011) studied the biases of different discretization methods. They described three approaches to deriving discretization methods. In discretization methods, the PDF is approximated by a few representative values and corresponding probabilities. Bickel et al. stated that applying moment matching directly to each input distribution yields maximum accuracy. Hence, they applied moment matching to develop weights for the 10th, 50th, and 90th percentiles. They concluded that SR has no analytical justification for any distribution other than the normal distribution. In other words, SR estimates the mean with zero error only when it is applied to normally distributed data. In addition, they compared different discretization methods with MC simulation for estimating population moments. For this comparison, they analytically derived the number of MC samples required (the S-equivalence) for MC simulation to estimate the kth raw moment more accurately than a discretization method with a given probability. They computed the 95% S-equivalence for the uniform $U(0, b)$, normal $N(0, \sigma)$, triangular $T(0, b, b/2)$, exponential $E(\lambda)$, and log-normal $L(\mu, 1)$ distributions. They then demonstrated that the performance of SR is "quite poor", whereas other discretization methods, such as PT and GQN², work well. Therefore, they did not support using SR to estimate a mean value and instead recommended more accurate alternatives, such as PT and GQN. Bickel et al. (2011) and Keefer (1994) suggested PT and GQN as alternative mean estimators to SR. However, they neglected the uncertainty that PT and GQN incur by using percentiles close to the tails of a distribution. One may wonder why PT and GQN outperformed SR.
The first reason is that PT and GQN give weights of 0.63 and 0.667, respectively, to $x_{50}$, while SR assigns a weight of 0.4 to $x_{50}$. Thus, $x_{50}$ contributes more to the mean estimate in PT and GQN than in SR. Steel and Torrie (1980, p. 19) claimed that the median can replace the AA as the mean estimator for a skewed distribution. However, the median is not affected by values that are distant from the center of a distribution. The median therefore cannot properly represent the mean value on its own, although it is a good starting point for estimating the mean. The second reason is that, as the standard deviation of the population increases, the data spread over a larger interval (i.e., the points near the extremes lie farther from the mean value). As $\sigma$ increases, $x_{10}$ becomes much larger than $x_5$ in the lower tail, and $x_{95}$ becomes much larger than $x_{90}$ in the upper tail. For example, for a log-normal distribution with log-standard deviation $\sigma = 2$, $x_{10}$ and $x_{95}$ are about twice as large as $x_5$ and $x_{90}$, respectively. Therefore, using $x_{10}$ and $x_{90}$ as substitutes for the lower and upper tails of the distribution misses values that are much smaller than $x_{10}$ and much larger than $x_{90}$. On the other hand, $x_5$ and $x_{95}$ are closer to the extremes than $x_{10}$ and $x_{90}$, so they better represent the tails. Reliable estimation of the 5th and 95th percentiles, however, is more difficult than estimation of the 10th and 90th percentiles, especially for large variability.

² GQN is based on applying three-point moment matching to a normal distribution; it estimates a mean value by weighting the 4.2th, 50th, and 95.8th percentiles by 0.167, 0.667, and 0.167, respectively.

2.3.3 Maximum Likelihood Estimator

The idea behind the MLE is to estimate the parameters of a population such that they maximize the probability of the observed sample data.
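For the log-normal case, this maximization has a closed form. Because $\ln X$ is normally distributed, the likelihood is maximized by the sample mean of $\ln X$ and its variance computed with divisor $n$. By the invariance property of maximum likelihood, the mean $E(X) = e^{\mu+\sigma^2/2}$ is then estimated by substituting these values. The Python sketch below assumes an exactly log-normal population. The function name is illustrative, and the sketch shows only the idea of the MLE, not its full derivation in this study.

```python
import math
import random
import statistics


def lognormal_mle_mean(sample):
    """MLE of E(X), assuming the sample comes from a log-normal population.

    The log-normal likelihood is maximized by the mean of ln(x) and the
    variance of ln(x) with divisor n; by the invariance property of maximum
    likelihood, E(X) = exp(mu + sigma^2/2) is estimated by plugging them in.
    """
    logs = [math.log(x) for x in sample]
    mu_hat = statistics.fmean(logs)
    var_hat = statistics.pvariance(logs, mu_hat)  # divisor n (MLE), not n - 1
    return math.exp(mu_hat + var_hat / 2)


if __name__ == "__main__":
    rng = random.Random(3)
    data = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
    print(lognormal_mle_mean(data), math.exp(0.5))  # estimate vs. true mean
```

The plug-in places the divisor-$n$ variance inside an exponential, so this estimator is biased for small samples and only asymptotically unbiased.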
From a statistical point of view, the MLE is the preferred estimator because it is asymptotically efficient and asymptotically normal. It also converges in probability to the true value under general conditions for a large number of samples (Kenney and Keeping 1951; Quenouille 1956; Kendall and Stuart 1977). In general, the MLE is biased; however, it is unbiased for some distribution types, such as the normal distribution, and asymptotically unbiased for others, such as the log-normal distribution (Kendall and Stuart 1977, Example 18.2). Suppose the independent RVs $X_1, \ldots, X_n$ are drawn from a population with continuous PDF $h(x \mid \theta)$, where $\theta$ is an unknown parameter. For an observed data set $x_1, \ldots, x_n$, the joint PDF

$$L(\theta \mid x) = \prod_{i=1}^{n} h(x_i \mid \theta)$$

is called the likelihood function (LF). The parameter $\theta$ is estimated by maximizing $\ln(L)$; thus, $\ln(L)$ should be at least twice differentiable with respect to $\theta$. All local maxima of the LF are found such that

$$\frac{\partial [\ln(L)]}{\partial \theta} = 0 \quad \text{and} \quad \frac{\partial^2 [\ln(L)]}{\partial \theta^2} < 0, \tag{2-11}$$

and if there is more than one, the one with the largest likelihood is selected.

2.4 Distribution Types

There are different methods to estimate a mean value, and a few of them are commonly used in the oil and gas industry, as described above. Each of these methods has its own bias and SE, which may differ from one distribution type to another. For example, the bias of the AA does not depend on the PDF; the AA is unbiased for any PDF. Although SR requires no assumption about the underlying distribution type, its bias differs from one distribution to another. As shown by Bickel et al. (2011), SR has zero bias when the underlying distribution is normal, while it becomes biased for the log-normal distribution. Therefore, knowing the behaviour of the mean estimators under different distribution types helps in selecting the appropriate mean estimator for a given distribution type.
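The distribution-free unbiasedness of the AA, and the dependence of SR's bias on the distribution type, can be checked by simulation. The Python sketch below is illustrative only. It assumes sample percentiles computed by linear interpolation (`statistics.quantiles` with `method="inclusive"`) and a sample size of 30. It uses two arbitrary populations, a normal $N(10, 2)$ and a log-normal with $\mu = 0$ and $\sigma = 1$. None of these settings is taken from this study, and the helper names are hypothetical.

```python
import math
import random
import statistics


def swanson(sample):
    """Eq. 2-10 with linearly interpolated sample percentiles (an assumption)."""
    q = statistics.quantiles(sample, n=100, method="inclusive")
    return 0.3 * q[9] + 0.4 * q[49] + 0.3 * q[89]  # q[k] = (k+1)-th percentile


def mc_bias(estimator, draw, true_mean, n=30, reps=4000, seed=7):
    """Average estimate minus the true mean over `reps` simulated samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += estimator([draw(rng) for _ in range(n)])
    return total / reps - true_mean


# Illustrative populations: (sampler, population mean).
POPULATIONS = {
    "normal N(10, 2)": (lambda r: r.gauss(10.0, 2.0), 10.0),
    "log-normal (0, 1)": (lambda r: r.lognormvariate(0.0, 1.0), math.exp(0.5)),
}

if __name__ == "__main__":
    for name, (draw, mean) in POPULATIONS.items():
        aa = mc_bias(statistics.fmean, draw, mean)
        sr = mc_bias(swanson, draw, mean)
        print(f"{name:18s}  AA bias {aa:+.3f}   SR bias {sr:+.3f}")
```

Under these assumptions, the AA bias is close to zero for both populations. SR's bias is close to zero for the symmetric normal population and clearly negative for the log-normal one.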
This section, thus, summarizes the distribution types commonly used in the oil and gas industry, in order to focus on specific distribution types and study the statistical properties of the mean estimators for each of them. Law (1944) was one of the first researchers to characterize permeability data sets statistically; he represented permeability variation by a log-normal distribution. Many subsequent investigators statistically analyzed permeability data sets (e.g., Bennion 1966; Lambert 1981). According to these studies, possible permeability PDFs include the normal, log-normal, and exponential distributions. Later, Jensen et al. (1987) analyzed six permeability data sets. They proposed that permeability distributions are not necessarily log-normal and can be transformed into the normal distribution by a power transformation. Therefore, the power-normal distribution is another possible permeability distribution. In addition to the unimodal log-normal and power-normal distributions, a bimodal distribution can be assigned to reservoir parameters because of geological heterogeneity. For instance, a formation may consist of fractures, which are high-permeability media, and matrix or shaly media with low permeability. Therefore, the permeability distribution may be either bimodal or unimodal. The oil and gas field size is another parameter that many researchers have studied statistically for decades. Kaufman (1965) tested the Yule, Pareto, and log-normal distributions on some empirical data and concluded that the log-normal is the best distribution type to describe hydrocarbon reserves. Following that, the log-normal distribution was used for oil and gas reserves by many researchers, such as Megill (1984), Hurst et al. (2000), Rose (2001), and Cartwright (2007). However, some studies have shown that hydrocarbon reserves are not necessarily log-normally distributed.
For example, Seyedghasemipour and Bhattacharyya (1990) studied Denver Basin oil fields and introduced the log-hyperbolic distribution as an alternative reserves distribution. MacCrossan (1969) analyzed the oil and gas reserves distributions of Western Canada from 1965 to 1969 and showed that the bimodal distribution is another possible reserves distribution. He showed that the size-frequency distributions of ultimate recoverable reserves are bimodal for the Viking, Rainbow Reef, and Nisku oil pools and the Leduc and Wabamun gas pools. He also showed that each of these bimodal distributions can be split into two log-normal distributions. He proposed a few hypotheses for the conditions under which reserves can have a bimodal distribution. First, bimodality may be a consequence of mixing two geologically different groups with dissimilar reservoir characteristics, such as porosity. Second, it might result from combining larger pools, whose larger mean results from enhanced recovery, with smaller pools under primary production. Moreover, Laherrere and Sornette (1998) proposed the more flexible power-normal distribution as a reserves distribution. Consequently, for the purposes of this study, the variable $X$ is assumed to be randomly drawn from a population with one of three PDFs: log-normal, power-normal, or bimodal. The bimodal distribution is described by a combination of two log-normal distributions.

2.5 Gaps in the Existing Body of Knowledge

Being unbiased is desirable, but it is not necessarily the most important factor in selecting an estimator. It may be possible to correct the bias using either a correction factor or a modification of the mean estimator's formulation, as discussed in the following chapters.
Therefore, in addition to bias, other properties such as uncertainty, consistency, and efficiency should be assessed in order to choose an optimal mean estimator. Furthermore, a good estimator should simultaneously be unbiased and consistent and have high efficiency and small uncertainty. However, none of Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011) assessed any property of the mean estimators other than bias. Thus, it is of interest to assess the performances of the mean estimators based on bias, uncertainty, consistency, and efficiency. Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011) investigated the bias of SR for a log-normal distribution with a limited range of mean and standard deviation. For instance, Megill (1984) reported the bias of SR for a log-normal distribution with $\sigma \in [0, 2.1]$. Hence, it is of interest to extend Megill's results to a wider range of heterogeneity, $\sigma \in [0, 5]$, since reservoir characteristics can be this variable. For example, Rollins et al. (1992) show log-normal probability plots and list characteristics for permeability with $\sigma$ values of 4 (Cotton Valley Formation) and 2.6 (Travis Peak Formation). Seidle and O'Connor (2003) report that the distribution of estimated ultimate gas recoveries (EURs) for the San Juan Basin is log-normal with $\sigma = 2.2$. As mentioned before, Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011) evaluated SR performance specifically for the log-normal distribution and overlooked its performance for other types of distribution. It is well understood, on the other hand, that reservoir parameters such as hydrocarbon reserves and permeability are not necessarily log-normally distributed and can be described by other types of distribution (e.g., bimodal and power-normal distributions).
Therefore, this research study investigates SR performance for bimodal and power-normal distributions, in addition to the log-normal distribution, and compares it with the performances of other mean estimators. Rose (2001) raised a noteworthy aspect of SR; however, his conclusion would have been more persuasive if he had quantitatively studied the bias of SR for a wide range of truncated log-normal distributions. Another issue he overlooked is that, after truncation, only the central 98% of the distribution is used to calculate the mean value, whereas SR's formula was proposed for the complete distribution. Therefore, SR's weights might need to change for a truncated distribution. This change might be insignificant, but it should be evaluated. Hence, it would be of interest to evaluate comprehensively the bias, uncertainty, efficiency, and consistency of SR when the underlying truncated distribution is log-normal with a wide range of variability. Delfiner (2007) also advocated using SR to avoid the pitfall related to permeability estimates from the Phi-k relationship. He made this comparison for a Phi-k data set with a correlation coefficient of 0.64. However, it remains unclear whether the result improved because of the use of SR or because of the change in the method of estimating permeability from Phi-k (i.e., vertically slicing the Phi-k cross plot and estimating the permeability mean for each slice). Also, he did not address whether his method is applicable to Phi-k cross plots with different correlation coefficients. Delfiner (2007) showed an improvement in estimating the pseudo-flow profile using SR. His conclusion would be more persuasive, however, if he had considered the problem with SR discussed by Megill (1984) and investigated more examples. In particular, he could have investigated the uncertainty associated with the permeability that SR estimates from the Phi-k relationship.
Thus, it remains unclear how precisely and efficiently SR estimates permeability for Phi-k data sets with different degrees of heterogeneity. He also did not consider the uncertainty in estimating the mean via SR for each slice that arises from the varying number of data points per slice: some slices contain only a few data points while others contain many, and reliable estimates of $x_{10}$ and $x_{90}$ are difficult to obtain from a small number of data points. Both Delfiner (2007) and Keefer and Bodily (1983) mentioned that reliably estimating the 5th and 95th percentiles is difficult because they are closer to the extremes than the 10th and 90th percentiles. However, they did not quantitatively investigate the variability of the $x_{5}$ and $x_{95}$ estimates in contrast to the $x_{10}$ and $x_{90}$ estimates, and consequently its effect on estimating the population mean.

Moreover, it remains unclear whether SR is asymptotically unbiased like the MLE (i.e., whether SR becomes unbiased as a sufficiently large number of samples is drawn from a population). Researchers have studied SR performance under the assumption that samples are independent and identically distributed (i.i.d.). However, reservoir parameters such as permeability may be auto-correlated, and no attention has been paid to SR performance when samples are auto-correlated. Hence, this study also evaluates the performances of SR and other mean estimators when samples are dependent and identically distributed (d.i.d.).

Chapter 3 : Performance Evaluation for the Case of the Log-Normal Distribution

In his 1984 book, Megill stated that SR offers good estimates of the mean for modestly skewed distributions but becomes significantly biased as the distribution becomes highly skewed. In addition to Megill (1984), Keefer and Bodily (1983) and Bickel et al.
(2011) evaluated the bias of SR for a log-normal distribution with a limited range of mean and standard deviation. As mentioned before, an optimum mean estimator should simultaneously be unbiased and consistent and have small uncertainty and high efficiency. Therefore, besides bias, other properties such as consistency, efficiency, and uncertainty should be assessed. Hence, this chapter evaluates all of these properties of SR and compares them with those of the AA, MLE, and PT when the underlying distribution is log-normal with a log-standard deviation σ between zero and five, σ ∈ [0, 5]. This range is more than twice as wide as that examined by Megill (1984), σ ∈ [0, 2.1], because reservoir parameters can be this variable (Rollins et al. 1992; Seidle and O’Connor 2003). These properties are derived analytically, expressed in a form independent of the log-mean μ, and then verified numerically via MC simulation. Moreover, this chapter proposes two approaches to de-bias SR: applying a correction factor and modifying the weights of SR.

3.1 Analytical Expressions of Mean Estimators’ Properties

Two assumptions are used in this chapter to derive the properties of the mean estimators analytically. First, the RV’s $X_1, X_2, \ldots, X_n$ are assumed to be independent and identically distributed (i.i.d.) with PDF $h(x)$, where $h(x)$ is the log-normal PDF with $E[\ln(X)] = \mu$ and $\mathrm{Var}[\ln(X)] = \sigma^2$. Thus, from the properties of the log-normal distribution, $E(X) = e^{\mu+\sigma^2/2}$ and $\mathrm{Var}(X) = e^{2\mu+\sigma^2}\left(e^{\sigma^2}-1\right)$, where $E(X)$ and $\mathrm{Var}(X)$ are the expected value and variance of X, respectively. Second, the sample uth percentile of X, $x_u$, is normally distributed with mean $X_u$ (the population uth percentile) and variance $u(1-u)/\left[n\,h^2(X_u)\right]$, with u expressed as a fraction (Ord and Stuart 1987). Based on these assumptions, the expected values, $E(x_T)$, and variances, $\mathrm{Var}(x_T)$, of the AA, SR, PT, and MLE are as follows.
The statistical properties of the AA are

$$E(x_{AA}) = e^{\mu+\sigma^2/2} \tag{3-1}$$

and

$$\mathrm{Var}(x_{AA}) = \frac{e^{2\mu+\sigma^2}\left(e^{\sigma^2}-1\right)}{n}. \tag{3-2}$$

The expected values and variances of SR and PT are given by

$$E(x_{SR}) = e^{\mu}\left(0.3\,e^{\sigma w_{10}} + 0.4\,e^{\sigma w_{50}} + 0.3\,e^{\sigma w_{90}}\right); \tag{3-3}$$

$$\mathrm{Var}(x_{SR}) = \frac{2\pi\sigma^2}{n}\,e^{2\mu}\left\{0.0081\left(e^{2\sigma w_{10}+w_{10}^2}+e^{2\sigma w_{90}+w_{90}^2}\right)+0.04+0.012\left(e^{\sigma w_{10}+w_{10}^2/2}+e^{\sigma w_{90}+w_{90}^2/2}\right)+0.0018\,e^{\left(w_{10}^2+w_{90}^2\right)/2}\right\}; \tag{3-4}$$

$$E(x_{PT}) = e^{\mu}\left(0.185\,e^{\sigma w_{5}} + 0.630\,e^{\sigma w_{50}} + 0.185\,e^{\sigma w_{95}}\right); \tag{3-5}$$

and

$$\mathrm{Var}(x_{PT}) = \frac{2\pi\sigma^2}{n}\,e^{2\mu}\left\{0.0016\left(e^{2\sigma w_{5}+w_{5}^2}+e^{2\sigma w_{95}+w_{95}^2}\right)+0.099+0.0058\left(e^{\sigma w_{5}+w_{5}^2/2}+e^{\sigma w_{95}+w_{95}^2/2}\right)+0.00017\,e^{\left(w_{5}^2+w_{95}^2\right)/2}\right\}, \tag{3-6}$$

where $w_u = \Phi^{-1}(u/100)$ (so that $w_{50} = 0$); Φ denotes the standard normal cumulative distribution function; $E(x_{SR})$ and $E(x_{PT})$ are the expected values of SR and PT, respectively; and $\mathrm{Var}(x_{SR})$ and $\mathrm{Var}(x_{PT})$ are their variances (see Appendix A for derivations). Furthermore, the expected value and variance of the MLE are given by

$$E(x_{MLE}) = e^{\mu+\sigma^2/2}\,e^{-(n-1)\sigma^2/(2n)}\left(1-\frac{\sigma^2}{n-1}\right)^{-(n-1)/2} \tag{3-7}$$

and

$$\mathrm{Var}(x_{MLE}) = e^{2\mu+\sigma^2/n}\left[e^{\sigma^2/n}\left(1-\frac{2\sigma^2}{n-1}\right)^{-(n-1)/2} - \left(1-\frac{\sigma^2}{n-1}\right)^{-(n-1)}\right], \tag{3-8}$$

respectively (see Appendix B for the derivations).

3.2 Validation of Analytical Expressions using Monte Carlo Simulation

The analytical expressions of the mean estimators’ properties derived above are validated numerically using MC simulation. For this purpose, m = 10,000 data sets, each containing n = 25 to 3,000 samples, are drawn randomly from a log-normal distribution using the inverse cumulative method.
In this method, $X = H^{-1}(u)$, where $H^{-1}$ is the inverse of the CDF, $H(x)$, and u is drawn randomly from a uniform distribution on the interval (0, 1). The mean estimator T is applied to each of the m data sets, generating a set of estimates $\{\hat{x}_T^{\,1}, \ldots, \hat{x}_T^{\,m}\}$. The expected value of this set is approximated by its AA, designated $(\hat{x}_T)_A$, and its variance is obtained from $\sum_{j=1}^{m}\left[\hat{x}_T^{\,j} - (\hat{x}_T)_A\right]^2/(m-1)$. The estimated means and variances are used to compute the biases, SE’s, and RMSE’s numerically.

Bias, the first mean-estimator property, is evaluated using the ratio $E(x_T)/E(X)$ (Table 3-1). A deviation from unity indicates that the estimator is biased, and zero deviation implies that it is unbiased. As seen in Table 3-1, the analytical expressions of $E(x_T)/E(X)$ are independent of μ, so these ratios apply to all log-normal distributions regardless of their mean values. The expressions in Table 3-1 are verified using MC simulation (Fig. 3-1). The analytical and numerical results for SR and PT match reasonably well, and the match improves as n becomes large (Fig. 3-1a). The analytical and numerical results for the MLE agree perfectly, but the agreement for the AA strongly depends on n, especially for large σ, similar to Agterberg’s observations (Agterberg 1974, p. 237) (Fig. 3-1b).

Table 3-1 – Analytical expressions of E(xT)/E(X).
Estimator   E(x_T)/E(X)
AA          1
SR          $\left(0.3\,e^{\sigma w_{10}} + 0.4\,e^{\sigma w_{50}} + 0.3\,e^{\sigma w_{90}}\right)/e^{\sigma^2/2}$
PT          $\left(0.185\,e^{\sigma w_{5}} + 0.63\,e^{\sigma w_{50}} + 0.185\,e^{\sigma w_{95}}\right)/e^{\sigma^2/2}$
MLE         $e^{-(n-1)\sigma^2/(2n)}\left(1-\sigma^2/(n-1)\right)^{-(n-1)/2}$

Fig. 3-1 – Comparison of E(xT)/E(X) and (xT)A/E(X) of (a) SR and PT (analytical; MC with n = 35 and 100) and (b) the AA (MC with n = 100 and 3,000) and MLE (n = 35 and 100), versus standard deviation (σ).

Fig. 3-2 compares the SE’s obtained from the analytical expressions (Eqs. 3-2, 3-4, 3-6, and 3-8) with those computed by MC simulation for σ = 1 and σ = 1.5. The analytical expressions for the AA and MLE closely follow the numerical results (Fig. 3-2a and 3-2c); however, there is a slight discrepancy between the analytical and numerical results for SR, especially for small n (Fig. 3-2b and 3-2d). For PT, the difference between the analytical and numerical approaches becomes significant for large σ (Fig. 3-2d). These differences approach zero as n increases or σ decreases. They may be caused by assuming that the uth percentile is normally distributed, whereas for small n it actually has a beta distribution (Ord and Stuart 1987).

Fig. 3-2 – Standard errors of the AA, MLE, SR, and PT obtained from analytical and numerical approaches for the cases of σ = 1 and σ = 1.5 (panels a and c: AA and MLE; panels b and d: SR and PT; SD(xT) versus 1/√n).

Table 3-2 – Analytical expressions of RMSE’s of the mean estimators.
Estimator   RMSE
AA    $e^{\mu+\sigma^2/2}\sqrt{e^{\sigma^2}-1}\,/\sqrt{n}$
SR    $e^{\mu}\left\{\left(0.3\,e^{\sigma w_{10}} + 0.4\,e^{\sigma w_{50}} + 0.3\,e^{\sigma w_{90}} - e^{\sigma^2/2}\right)^2 + \left(2\pi\sigma^2/n\right) f_{SR}(\sigma)\right\}^{1/2}$, where $f_{SR}(\sigma) = 0.0081\left[e^{2\sigma w_{10}+w_{10}^2}+e^{2\sigma w_{90}+w_{90}^2}\right] + 0.04 + 0.012\left[e^{\sigma w_{10}+w_{10}^2/2}+e^{\sigma w_{90}+w_{90}^2/2}\right] + 0.0018\,e^{\left(w_{10}^2+w_{90}^2\right)/2}$
PT    $e^{\mu}\left\{\left(0.185\,e^{\sigma w_{5}} + 0.630\,e^{\sigma w_{50}} + 0.185\,e^{\sigma w_{95}} - e^{\sigma^2/2}\right)^2 + \left(2\pi\sigma^2/n\right) f_{PT}(\sigma)\right\}^{1/2}$, where $f_{PT}(\sigma) = 0.0016\left[e^{2\sigma w_{5}+w_{5}^2}+e^{2\sigma w_{95}+w_{95}^2}\right] + 0.099 + 0.0058\left[e^{\sigma w_{5}+w_{5}^2/2}+e^{\sigma w_{95}+w_{95}^2/2}\right] + 0.00017\,e^{\left(w_{5}^2+w_{95}^2\right)/2}$
MLE   $e^{\mu}\left\{e^{\sigma^2}\left[e^{-(n-1)\sigma^2/(2n)}\left(1-\frac{\sigma^2}{n-1}\right)^{-(n-1)/2} - 1\right]^2 + e^{\sigma^2/n}\left[e^{\sigma^2/n}\left(1-\frac{2\sigma^2}{n-1}\right)^{-(n-1)/2} - \left(1-\frac{\sigma^2}{n-1}\right)^{-(n-1)}\right]\right\}^{1/2}$

As described in Chapter 2, the RMSE, which incorporates both the bias and the variance of a mean estimator, is also derived analytically (Table 3-2). These analytical expressions are verified by MC simulation for σ = 1 and σ = 1.5 and different values of n (Fig. 3-3). The numerical and analytical results for the AA and MLE approximately match (Fig. 3-3a and 3-3c). However, the numerical and analytical results for SR and PT differ, and the difference depends on n and σ (Fig. 3-3b and 3-3d).

Fig. 3-3 – RMSE’s of the AA, MLE, SR, and PT obtained from analytical and numerical approaches for the cases of σ = 1 and σ = 1.5 (panels a and c: AA and MLE; panels b and d: SR and PT; RMSE versus 1/√n).

3.3 Analysis of the Analytical Expressions of the Mean Estimators’ Properties

Because of the good agreement between the numerical and analytical results for the bias, SE, and RMSE, the rest of this chapter compares the mean estimators’ properties using the analytical expressions.
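The MC validation of Section 3.2 can be sketched with the Python standard library alone. The sketch below is illustrative rather than the code used in this study: it draws log-normal data sets by the inverse cumulative method, applies SR to each, and compares the AA and standard deviation of the m estimates with the analytical ratio $E(x_{SR})/E(X)$ of Table 3-1 and with the SE of Eq. 3-4 normalized by $e^{\mu}$. The percentile estimator (`statistics.quantiles` with the 'exclusive' method) is an assumption, since the text does not specify one, and the function names are hypothetical. The run is far smaller than the m = 10,000 used above, so the agreement is only approximate.

```python
import math
import random
from statistics import NormalDist, fmean, quantiles, stdev

PHI = NormalDist()  # standard normal; PHI.inv_cdf(p) gives w for probability p


def w(u_percent):
    """w_u = Phi^{-1}(u/100); for example w(10) is about -1.2816 and w(50) is 0."""
    return PHI.inv_cdf(u_percent / 100.0)


def sr_ratio(sigma):
    """Analytical E(x_SR)/E(X) for a log-normal population (Table 3-1)."""
    num = (0.3 * math.exp(sigma * w(10)) + 0.4 * math.exp(sigma * w(50))
           + 0.3 * math.exp(sigma * w(90)))
    return num / math.exp(sigma ** 2 / 2)


def sr_se_over_x50(sigma, n):
    """Analytical SE of SR divided by e^mu = x50 (square root of Eq. 3-4 over e^{2 mu})."""
    a, b = w(10), w(90)
    f = (0.0081 * (math.exp(2 * sigma * a + a * a) + math.exp(2 * sigma * b + b * b))
         + 0.04
         + 0.012 * (math.exp(sigma * a + a * a / 2) + math.exp(sigma * b + b * b / 2))
         + 0.0018 * math.exp((a * a + b * b) / 2))
    return math.sqrt(2 * math.pi * sigma ** 2 / n * f)


def draw_lognormal(mu, sigma, n, rng):
    """Inverse cumulative method: X = H^{-1}(u) = exp(mu + sigma * Phi^{-1}(u))."""
    sample = []
    while len(sample) < n:
        u = rng.random()              # uniform on [0, 1)
        if u > 0.0:                   # Phi^{-1} is undefined at u = 0, so redraw
            sample.append(math.exp(mu + sigma * PHI.inv_cdf(u)))
    return sample


def swanson(sample):
    """SR = 0.3 x10 + 0.4 x50 + 0.3 x90 from sample percentiles (assumed estimator)."""
    p = quantiles(sample, n=100, method="exclusive")  # p[k - 1] is the k-th percentile
    return 0.3 * p[9] + 0.4 * p[49] + 0.3 * p[89]


def monte_carlo_sr(mu, sigma, n, m, seed=1):
    """Apply SR to m data sets of size n; return the AA and SD of the m estimates."""
    rng = random.Random(seed)
    estimates = [swanson(draw_lognormal(mu, sigma, n, rng)) for _ in range(m)]
    return fmean(estimates), stdev(estimates)


if __name__ == "__main__":
    mu, sigma, n, m = 0.0, 1.0, 400, 500
    mc_mean, mc_sd = monte_carlo_sr(mu, sigma, n, m)
    print("E(xSR)/E(X): analytical %.3f, MC %.3f"
          % (sr_ratio(sigma), mc_mean / math.exp(mu + sigma ** 2 / 2)))
    print("SE/x50:      analytical %.3f, MC %.3f"
          % (sr_se_over_x50(sigma, n), mc_sd / math.exp(mu)))
```

With σ = 1 and n = 400, the MC mean of SR should sit slightly above the analytical ratio of about 0.949, because sample percentiles of a right-skewed distribution carry a small positive bias at finite n; the gap shrinks as n grows.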
As mentioned before, the ratio $E(x_T)/E(X)$ is used to assess the bias of an estimator T, and any deviation from unity implies that the estimator is biased. According to Table 3-1, the ratio $E(x_{AA})/E(X)$ equals one, so the AA is unbiased, as also shown in many statistics textbooks. The ratio $E(x_{MLE})/E(X)$ is a function of n and approaches one as n becomes very large; in other words, the MLE is asymptotically unbiased for the log-normal distribution, as previously demonstrated (e.g., Kendall and Stuart 1977) (Fig. 3-4). The ratios $E(x_{SR})/E(X)$ and $E(x_{PT})/E(X)$ do not deviate appreciably from one when σ lies within (0, 1), whereas the deviations become substantial as σ increases. PT estimates the mean with a smaller bias than SR does. Fig. 3-4 supports Megill’s (1984) conclusion that SR is biased and Keefer and Bodily’s (1983) finding that PT outperforms SR in terms of bias; here, however, these results are extended to log-normal distributions with wider variation, σ ∈ [0, 5].

Fig. 3-4 – Analytical ratios of E(xT)/E(X) versus σ and the Dykstra-Parsons coefficient (curves: AA; MLE with n = 35, 100, and 10,000; PT; SR; intervals CV < 0.5, 0.5 < CV < 1.0, and CV > 1.0).

To study the mean estimators’ characteristics in terms of permeability variation, the ratio $E(x_T)/E(X)$ is also plotted versus the Dykstra-Parsons coefficient, $V_{DP}$ (Dykstra and Parsons 1950) (Fig. 3-4, second horizontal axis). $V_{DP}$, commonly used in the oil and gas industry as a measure of permeability variation, is expressed by

$$V_{DP} = \frac{k_{50} - k_{16}}{k_{50}}, \tag{3-9}$$

where $k_{50}$ is the median permeability and $k_{16}$ lies one standard deviation below $k_{50}$ on a log-normal probability plot.
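For a log-normal population, $k_{16}$ lies one log-standard deviation below the median, $k_{16} = k_{50}e^{-\sigma}$, so Eq. 3-9 reduces to $V_{DP} = 1 - e^{-\sigma}$, while the CV is $\sqrt{e^{\sigma^2}-1}$. These two relations map σ onto the second horizontal axis and the CV intervals of Fig. 3-4 (for example, σ = 0.5, 1.0, and 1.5 give $V_{DP}$ ≈ 0.39, 0.63, and 0.78). A minimal Python sketch, with hypothetical helper names:

```python
import math


def vdp_from_sigma(sigma):
    """Dykstra-Parsons coefficient (Eq. 3-9) of a log-normal population.

    k16 lies one log-standard deviation below the median k50, so
    V_DP = (k50 - k50 * exp(-sigma)) / k50 = 1 - exp(-sigma).
    """
    return 1.0 - math.exp(-sigma)


def sigma_from_vdp(vdp):
    """Inverse of vdp_from_sigma, valid for 0 <= vdp < 1."""
    return -math.log(1.0 - vdp)


def cv_from_sigma(sigma):
    """Coefficient of variation sqrt(Var(X)) / E(X) of a log-normal population."""
    return math.sqrt(math.expm1(sigma ** 2))


def heterogeneity_class(cv):
    """CV ranges proposed by Jensen et al. (2000)."""
    if cv <= 0.5:
        return "homogeneous"
    if cv <= 1.0:
        return "heterogeneous"
    return "very heterogeneous"


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 1.5, 2.0, 3.0):
        cv = cv_from_sigma(sigma)
        print(f"sigma={sigma:3.1f}  V_DP={vdp_from_sigma(sigma):.2f}  "
              f"CV={cv:6.2f}  {heterogeneity_class(cv)}")
```

For measured data, $V_{DP}$ and CV would be estimated from the sample rather than computed from σ, so field values need not follow these log-normal relations exactly.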
$V_{DP}$ varies from zero (homogeneous reservoir) to unity (infinitely heterogeneous reservoir), with typical values in the range of 0.5 to 0.9 (Willhite 1986, Fig. 5.45). Many researchers, such as Jensen et al. (2000), Lambert (1981), and Pintos et al. (2011), have used this coefficient and described how it can be estimated from available data sets. The x-axes of Fig. 3-4 are also divided into three intervals based on the coefficient of variation (CV), another common measure of permeability heterogeneity in geological and engineering studies. Jensen et al. (2000) proposed the following CV ranges: homogeneous, CV ≤ 0.5; heterogeneous, 0.5 < CV ≤ 1; and very heterogeneous, CV > 1. According to Fig. 3-4, SR underestimates the population mean by at most 4% when $V_{DP}$ is less than 0.6; however, the underestimation increases sharply when $V_{DP}$ exceeds 0.75. In terms of CV, SR underestimates the mean by at most 2% when CV ≤ 1, but the underestimation can exceed 20% when CV > 1. An underestimation of about 16% can be seen in Delfiner’s data set (Delfiner 2007), with $V_{DP}$ = 0.75 and CV = 1.3.

The RMSE’s and SE’s of the mean estimators are functions of μ, σ, and n (Table 3-2 and Eqs. 3-2, 3-4, 3-6, and 3-8). This dependence on μ implies that the RMSE and SE are unique to each log-normal distribution. It is therefore useful to modify the RMSE and SE formulas so that they apply to all log-normal distributions regardless of their mean values; the behaviour of the mean estimators can then be predicted by estimating a single parameter (σ) from a data set. Hence, $e^{2\mu}$ is canceled from both sides of, for instance, Eq.
3-4, and taking the square root of each side yields

$$\frac{\mathrm{Std}(x_{SR})}{e^{\mu}} = \left[\frac{2\pi\sigma^2}{n}\left\{0.0081\left(e^{2\sigma w_{10}+w_{10}^2}+e^{2\sigma w_{90}+w_{90}^2}\right)+0.04+0.012\left(e^{\sigma w_{10}+w_{10}^2/2}+e^{\sigma w_{90}+w_{90}^2/2}\right)+0.0018\,e^{\left(w_{10}^2+w_{90}^2\right)/2}\right\}\right]^{1/2}, \tag{3-10}$$

where $\mathrm{Std}(x_{SR})$ is the SE of $x_{SR}$ and $e^{\mu} = x_{50}$. The new equation is therefore a function of σ and n only (i.e., a graph of $\mathrm{Std}(x_{SR})/x_{50}$ versus σ applies to all log-normal distributions, whatever their μ values). The same approach is used to modify the SE’s of the AA, MLE, and PT. The ratio $\mathrm{Std}(x_T)/x_{50}$ is used in the rest of this chapter to compare the uncertainty of the mean estimators: an estimator with a smaller $\mathrm{Std}(x_T)/x_{50}$ has a smaller uncertainty.

In general, all SE’s approach zero as n becomes large. For small σ, $\mathrm{Std}(x_{AA})/x_{50}$ and $\mathrm{Std}(x_{MLE})/x_{50}$ are approximately identical and smaller than $\mathrm{Std}(x_{SR})/x_{50}$ and $\mathrm{Std}(x_{PT})/x_{50}$ (Fig. 3-5a and 3-5b). Thus, for small σ, the AA has the same uncertainty as the MLE and less uncertainty than SR and PT. However, when σ exceeds a certain value that depends on n, SR has a smaller SE than the other mean estimators (Fig. 3-5d and 3-6).

The approach used to generalize the SE’s is also applied to the RMSE’s; hence the ratio $\mathrm{RMSE}/x_{50}$, which depends only on σ and n, applies to all log-normal distributions regardless of μ. For example, the RMSE of SR becomes

$$\frac{\mathrm{RMSE}_{SR}}{x_{50}} = \left\{\left(0.3\,e^{\sigma w_{10}} + 0.4\,e^{\sigma w_{50}} + 0.3\,e^{\sigma w_{90}} - e^{\sigma^2/2}\right)^2 + \frac{2\pi\sigma^2}{n}\,f_{SR}\right\}^{1/2}, \tag{3-11}$$

where $f_{SR} = 0.0081\left[e^{2\sigma w_{10}+w_{10}^2}+e^{2\sigma w_{90}+w_{90}^2}\right] + 0.04 + 0.012\left[e^{\sigma w_{10}+w_{10}^2/2}+e^{\sigma w_{90}+w_{90}^2/2}\right] + 0.0018\,e^{\left(w_{10}^2+w_{90}^2\right)/2}$. This ratio is used to evaluate the consistency and efficiency of the mean estimators. An estimator is consistent when its $\mathrm{RMSE}/x_{50}$ tends to zero for large n, and it is the most efficient when it has the smallest $\mathrm{RMSE}/x_{50}$ among the estimators compared.
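Because the normalized ratios depend only on σ and n, consistency can be checked numerically by evaluating $\mathrm{RMSE}/x_{50}$ for increasing n. The sketch below codes the AA and MLE rows of Table 3-2 and Eq. 3-11 for SR, each divided by $x_{50} = e^{\mu}$. It is an illustration under this chapter’s assumptions (asymptotically normal percentiles for SR, and $2\sigma^2 < n - 1$ so that the MLE expressions are finite); the function names are hypothetical.

```python
import math
from statistics import NormalDist

W10 = NormalDist().inv_cdf(0.10)  # about -1.2816
W90 = -W10


def rmse_aa(sigma, n):
    """RMSE/x50 of the arithmetic average (AA row of Table 3-2 divided by e^mu)."""
    return math.exp(sigma ** 2 / 2) * math.sqrt(math.expm1(sigma ** 2) / n)


def rmse_sr(sigma, n):
    """RMSE/x50 of Swanson's rule (Eq. 3-11): sqrt(bias^2 + variance)."""
    a, b = W10, W90
    # e^{sigma w50} = 1 because w50 = 0
    bias = (0.3 * math.exp(sigma * a) + 0.4 + 0.3 * math.exp(sigma * b)
            - math.exp(sigma ** 2 / 2))
    f_sr = (0.0081 * (math.exp(2 * sigma * a + a * a) + math.exp(2 * sigma * b + b * b))
            + 0.04
            + 0.012 * (math.exp(sigma * a + a * a / 2) + math.exp(sigma * b + b * b / 2))
            + 0.0018 * math.exp((a * a + b * b) / 2))
    return math.sqrt(bias ** 2 + 2 * math.pi * sigma ** 2 / n * f_sr)


def rmse_mle(sigma, n):
    """RMSE/x50 of the MLE (MLE row of Table 3-2); requires 2 sigma^2 < n - 1."""
    s2, k = sigma ** 2, n - 1
    if 2 * s2 >= k:
        raise ValueError("expression undefined unless 2 * sigma**2 < n - 1")
    g1 = math.exp(-k / 2 * math.log1p(-s2 / k))      # (1 - s2/k)^(-k/2)
    g2 = math.exp(-k / 2 * math.log1p(-2 * s2 / k))  # (1 - 2 s2/k)^(-k/2)
    ratio = math.exp(-k * s2 / (2 * n)) * g1         # E(x_MLE)/E(X), Table 3-1
    bias_sq = math.exp(s2) * (ratio - 1.0) ** 2
    var = math.exp(s2 / n) * (math.exp(s2 / n) * g2 - g1 ** 2)  # Eq. 3-8 / e^{2 mu}
    return math.sqrt(bias_sq + var)


if __name__ == "__main__":
    sigma = 1.0
    for n in (25, 100, 1000, 100000):
        print(f"n={n:>6}  AA={rmse_aa(sigma, n):.4f}  "
              f"MLE={rmse_mle(sigma, n):.4f}  SR={rmse_sr(sigma, n):.4f}")
```

As n grows, the AA and MLE values fall toward zero, whereas the SR value levels off at its normalized bias (about 0.085 for σ = 1), which is the numerical signature of the inconsistency of SR discussed below.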
Fig. 3-5 – Ratio of SE’s to x50 of the AA, MLE, SR, and PT for four different σ values: (a) σ = 0.05, VDP = 0.05; (b) σ = 0.5, VDP = 0.39; (c) σ = 1.0, VDP = 0.63; (d) σ = 1.5, VDP = 0.78 (SD(xT)/x50 versus 1/√n).

Fig. 3-6 – Ratios of SE/x50 of the AA, SR, MLE, and PT versus σ and VDP for (a) n = 50 and (b) n = 600.

Figures 3-7a to 3-7d present the analytical expressions of $\mathrm{RMSE}/x_{50}$ for the AA, MLE, SR, and PT versus $1/\sqrt{n}$ for four different values of σ and $V_{DP}$. When σ is small, the AA is as efficient as the MLE and more efficient than SR and PT, and SR is slightly more efficient than PT (Fig. 3-7a and 3-7b). When σ reaches 1, SR is still less efficient than the AA and MLE but more efficient than PT for n < 150; for larger n, however, PT becomes more efficient than SR (Fig. 3-7c). As σ increases further to 1.5, SR becomes the most efficient estimator for very small samples (n < 50); nevertheless, as n increases, the AA, MLE, and PT all become more efficient than SR (Fig. 3-7d). Therefore, SR is more efficient than the other estimators only for certain ranges of n and σ. In particular, SR becomes more efficient than the AA and MLE when σ exceeds a certain value that depends on n (Fig. 3-8).
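The boundary at which SR overtakes the AA can also be located numerically. Rather than reproducing the threshold expressions of Eqs. 3-12 to 3-14, the sketch below solves $\mathrm{RMSE}_{SR} = \mathrm{RMSE}_{AA}$ by bisection using the Table 3-2 expressions normalized by $x_{50}$, so its output need not coincide exactly with thresholds quoted from those equations. It assumes a single sign change of the RMSE difference within the search interval; the helper names are hypothetical.

```python
import math
from statistics import NormalDist

W90 = NormalDist().inv_cdf(0.90)  # w90 = -w10, about 1.2816


def rmse_aa(sigma, n):
    """RMSE/x50 of the arithmetic average (AA row of Table 3-2 divided by e^mu)."""
    return math.exp(sigma ** 2 / 2) * math.sqrt(math.expm1(sigma ** 2) / n)


def rmse_sr(sigma, n):
    """RMSE/x50 of Swanson's rule (Eq. 3-11)."""
    a, b = -W90, W90
    bias = (0.3 * math.exp(sigma * a) + 0.4 + 0.3 * math.exp(sigma * b)
            - math.exp(sigma ** 2 / 2))
    f_sr = (0.0081 * (math.exp(2 * sigma * a + a * a) + math.exp(2 * sigma * b + b * b))
            + 0.04
            + 0.012 * (math.exp(sigma * a + a * a / 2) + math.exp(sigma * b + b * b / 2))
            + 0.0018 * math.exp((a * a + b * b) / 2))
    return math.sqrt(bias ** 2 + 2 * math.pi * sigma ** 2 / n * f_sr)


def crossover_sigma(n, lo=0.1, hi=5.0, tol=1e-6):
    """sigma at which RMSE_SR = RMSE_AA for sample size n, found by bisection.

    Assumes RMSE_SR - RMSE_AA is positive (AA better) at lo, negative
    (SR better) at hi, and changes sign once in between.
    """
    def diff(s):
        return rmse_sr(s, n) - rmse_aa(s, n)

    if not (diff(lo) > 0 > diff(hi)):
        raise ValueError("no sign change of RMSE_SR - RMSE_AA in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(mid) > 0:
            lo = mid      # AA still better: crossover lies above mid
        else:
            hi = mid      # SR already better: crossover lies at or below mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n in (25, 50, 100):
        print(f"n = {n:3d}: SR has the smaller RMSE for sigma above about "
              f"{crossover_sigma(n):.2f}")
```

For n = 50 the crossover falls slightly above σ = 1, in line with SR still being less efficient than the AA at σ = 1 for small n (Fig. 3-7c) and outperforming it at σ = 1.5 (Fig. 3-7d).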
Fig. 3-7 – Ratio of RMSE to x50 of the AA, SR, MLE, and PT for four different σ values: (a) σ = 0.05, VDP = 0.05; (b) σ = 0.5, VDP = 0.39; (c) σ = 1.0, VDP = 0.63; (d) σ = 1.5, VDP = 0.78 (RMSE/x50 versus 1/√n).

Equating the analytical RMSE of SR to the analytical RMSE’s of the AA, MLE, and PT yields the intervals in which SR is more efficient than each of these estimators (Eqs. 3-12 through 3-14). Eq. 3-12 gives separate thresholds for n < 100 and n ≥ 100, Eq. 3-13 for n < 50, 50 ≤ n < 800, and n ≥ 800, and Eq. 3-14 a single threshold valid for any n. SR is more efficient than the AA if

ቐ ݊ ൏ 100,ߪ ݊ 100,ߪ ହଵଽ మ െ ଵଶଵ଼଼ య⁄మ .଼ ଵଵ െ െ ଷ.ସଽ √ ହଵ.ଵ଼ √ 1.37 3.39 (3-12)

more efficient than the MLE if

݊ ൏ 50,ߪ 1 ଵହ ହ଼.ଵ ଵଷସସ.ହ ଵଷଵ.ଶ ቐ50 ݊ ൏ 800,ߪ మ െ య⁄మ െ 8.30, √ ݊ 800,ߪ 5 (3-13)

and more efficient than PT if

ߪ ൏െ ଷ଼.ଶ మ െ ଷଵ. య⁄మ ଷହ.ଽହ ଼.଼ √ 0.17 for any ݊. (3-14)

For example, when n = 600, SR is more efficient than the AA and MLE when σ is greater than 2.01 and 4.7, respectively, and has a smaller RMSE than PT when σ is smaller than 0.6 (Fig. 3-8b).

Fig. 3-8 – Ratios RMSE/x50 of the AA, MLE, SR, and PT versus σ and VDP when (a) n = 50 or (b) n = 600.

The RMSE’s of SR and PT never reach zero as the number of samples becomes large (Fig. 3-7d), although their SE’s (square roots of Eqs. 3-4 and 3-6; Fig. 3-5) tend to zero as n approaches infinity.
However, the ratios $E(x_{SR})/E(X)$ and $E(x_{PT})/E(X)$ depend only on σ, not on n, so they never tend to one except when σ = 0 (Table 3-1 and Fig. 3-4). Thus, the nonzero RMSE’s of SR and PT for large n are due to their biases. This means that both SR and PT are inconsistent: they converge in probability to a value that differs from the true mean.

3.4 Improving Swanson’s Rule

Being unbiased is a desirable property, but it is not necessarily the main criterion for selecting an optimal estimator because, first, bias can be removed by including a correction factor, and second, other properties of a mean estimator can compensate for its bias. For example, although SR is biased, it has the smallest SE for large σ for any n, and the smallest RMSE when σ is large and n is small. Two approaches are used to remove or reduce the bias of SR: (1) multiply $x_{SR}$ by a correction factor, z, such that the ratio $E(z\,x_{SR})/E(X)$ becomes unity; and (2) change the weights ω of SR as σ changes.

3.4.1 Adjusting Swanson’s Rule by a Coefficient

The bias of SR is a function of σ and becomes significant as σ increases (Fig. 3-4). The first approach to improving SR is to multiply it by a coefficient such that $x_{SR\text{-}C1} = z\,x_{SR} \approx E(X)$. The coefficient is approximated using a generalized reduced gradient nonlinear optimization code³ and is given by

$$z = \exp\left(-0.00771\,\sigma^4 + 0.105\,\sigma^3 - 0.043\,\sigma^2 - 0.00342\,\sigma\right). \tag{3-15}$$

Because z is a function of σ, σ must be known in order to de-bias SR. However, σ is not always available, so the sample standard deviation, s, is used instead. s can be evaluated as $\ln(x_{90}/x_{10})/(2w_{90})$, where $x_{10}$ and $x_{90}$ are the sample 10th and 90th percentiles and $w_{90} = -w_{10} = 1.28176$. When n is very small, s has an error of about 27% for both a very heterogeneous case, σ = 5.0, and a nearly homogeneous case, σ = 1.0 (Fig. 3-9a). From Fig.
3-9a, it can also be concluded that the error in s strongly depends on n, regardless of how heterogeneous the population is; as n increases, s becomes a good approximation of σ (Fig. 3-9b). The analytical expressions for the expected value and SE of SRC1 are $E(x_{SR\text{-}C1}) = z(\sigma)\,E(x_{SR})$ and $\mathrm{std}(x_{SR\text{-}C1}) = z(\sigma)\,\mathrm{std}(x_{SR})$, respectively. The analytical results do not follow the MC simulation results except for large n (Fig. 3-10). This discrepancy may be caused by the assumption that the uth percentile is normally distributed, whereas it actually has a beta distribution (Ord and Stuart 1987).

³ Microsoft Excel Solver tool, developed by Leon Lasdon (University of Texas at Austin) and Allan Waren (Cleveland State University).

Fig. 3-9 – Sample standard deviation obtained from analytical expression and MC simulation with error bars showing 95% confidence interval (a) for two different σ values (σ = 5.0 and σ = 1.0; sample SD versus 1/√n) and (b) for the general case (sample SD/σ versus 1/√n).

Fig. 3-10 – E(xSR‐C1) obtained from analytical expression and MC simulation with error bars showing 95% confidence interval, when σ is either known or unknown (panels: σ = 0.5, VDP = 0.39; σ = 1.0, VDP = 0.63; σ = 1.5, VDP = 0.78; σ = 2.0, VDP = 0.86; E(xSR_C1) versus 1/√n).
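The first de-biasing approach combines three steps: estimate s from the sample 10th and 90th percentiles, evaluate z of Eq. 3-15 at s (or at σ when it is known), and multiply SR by z. The sketch below is a minimal illustration, not the implementation used in this study; the percentile estimator (`statistics.quantiles`, 'exclusive' method) and the function names are assumptions, and Eq. 3-15 is only used within the range of σ studied here.

```python
import math
from statistics import NormalDist, quantiles

W90 = NormalDist().inv_cdf(0.90)  # w90 = -w10, about 1.2816


def z_factor(sigma):
    """Correction factor of Eq. 3-15, used within the studied range sigma in [0, 5]."""
    return math.exp(-0.00771 * sigma ** 4 + 0.105 * sigma ** 3
                    - 0.043 * sigma ** 2 - 0.00342 * sigma)


def s_from_percentiles(x10, x90):
    """Percentile-based estimate of sigma: ln(x90 / x10) / (2 w90)."""
    return math.log(x90 / x10) / (2 * W90)


def x_sr_c1(sample, sigma=None):
    """SRC1 = z * SR; z is evaluated at sigma if given, otherwise at s."""
    p = quantiles(sample, n=100, method="exclusive")  # assumed percentile estimator
    x10, x50, x90 = p[9], p[49], p[89]
    sr = 0.3 * x10 + 0.4 * x50 + 0.3 * x90
    if sigma is None:
        sigma = s_from_percentiles(x10, x90)
    return z_factor(sigma) * sr


if __name__ == "__main__":
    # Hypothetical example: a log-normal "sample" built from evenly spaced quantiles (mu = 0)
    sigma = 1.5
    grid = [math.exp(sigma * NormalDist().inv_cdf((i + 0.5) / 2000)) for i in range(2000)]
    mean_x = math.exp(sigma ** 2 / 2)  # E(X) with mu = 0
    print("SRC1/E(X), sigma known:   %.3f" % (x_sr_c1(grid, sigma) / mean_x))
    print("SRC1/E(X), sigma unknown: %.3f" % (x_sr_c1(grid) / mean_x))
```

Because z is evaluated at s rather than σ when σ is unknown, the small-sample error in s (Fig. 3-9) propagates into $x_{SR\text{-}C1}$, which is the effect shown in Fig. 3-10.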
The error in estimating σ causes at most a 17% error in $E(x_{SR\text{-}C1})$ when σ = 2.0, compared with the case in which σ is known; however, this error decreases rapidly and approaches zero as n increases and/or σ decreases (Fig. 3-10). For instance, when σ decreases from 2.0 to 1.5, the error drops to about 5% for n = 25 and becomes zero for very large n (Fig. 3-10c).

3.4.2 Moment Matching with Fixed Values

Another way to de-bias SR is to calculate its weights analytically. Hurst et al. (2000) theoretically justified the SR weights by using the general form of Eq. 2-10,

$$x_{sw} = \omega\,x_{10} + (1-2\omega)\,x_{50} + \omega\,x_{90}, \tag{3-16}$$

which preserves symmetry but allows the weights to vary. They showed that the weights are identical to those originally proposed by Swanson (Megill 1984) when σ approaches zero but deviate from the 0.3-0.4-0.3 rule as σ increases. Bickel et al. (2011) stated that applying moment matching directly to each sample distribution yields maximum accuracy; they therefore applied moment matching to uniform, normal, exponential, and triangular distributions to derive discretization methods with fixed values or fixed probabilities, and concluded that SR has no analytical justification for any distribution other than the normal distribution. Moment matching is a form of Gaussian quadrature,

$$\int h(\xi)\,\xi^{r}\,d\xi = E(X^{r}) \approx \sum_{i=1}^{N} P_i\,X_i^{r}, \qquad r = 0, 1, \ldots, 2N-1. \tag{3-17}$$

It approximates the rth non-centered moment, $E(X^r)$, by the sum $\sum_{i=1}^{N} P_i X_i^r$, where N is the number of probability-value pairs; with N points, 2N moments can be matched (Miller and Rice 1983). Hurst et al. (2000) used moment matching with N = 3, taking the 10th, 50th, and 90th percentiles as $X_1$, $X_2$, and $X_3$, respectively, and $P_1 = P_3 = \omega$. By taking the expected value of Eq.
3-16 and equating it to E(X), the weight ω is given by

$$\omega = \frac{e^{\sigma^2/2} - 1}{e^{\sigma w_{10}} - 2 + e^{\sigma w_{90}}}, \tag{3-18}$$

where $w_{10} = -w_{90} = -1.28176$. ω is a function of σ and independent of the population mean, so this formulation can be used for all log-normal distributions regardless of their mean values. With the weights of Eq. 3-18, SR is converted into an unbiased mean estimator, designated $x_{SR\text{-}C2}$. As mentioned before, σ is not always known, so s is used instead. The error in estimating σ causes, for instance, at most a 15% error in the SR weights for a heterogeneous case with σ = 2.0 and n = 200 (Fig. 3-11). This leads to at most a 20% error in $E(x_{SR\text{-}C2})$ when σ = 2.0, compared with the case in which σ is known, but the error rapidly approaches zero as n increases and/or σ decreases (e.g., it decreases from 20% to 7% when σ = 1.5) (Fig. 3-12).

Fig. 3-11 – Weights of SR versus σ and VDP (n = 200), where σ is known and unknown, with error bars showing 95% confidence interval.

Fig. 3-12 – E(xSR‐C2) obtained from analytical expression and numerically calculated using MC simulation with error bars showing 95% confidence interval, when σ is either known or unknown (panels: σ = 0.5, 1.0, 1.5, and 2.0; E(xSR_C2) versus 1/√n).
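Eq. 3-18 is simple to evaluate, and it makes the limiting behaviour described by Hurst et al. (2000) easy to check: as σ → 0, ω tends to $1/(2w_{90}^2)$ ≈ 0.304, close to Swanson’s 0.3. The sketch below (hypothetical function names) computes ω with the identity $e^{\sigma w_{10}} - 2 + e^{\sigma w_{90}} = 4\sinh^2(\sigma w_{90}/2)$, which avoids cancellation for small σ, and confirms that the resulting weights reproduce $E(X)/x_{50} = e^{\sigma^2/2}$ when population percentiles are used.

```python
import math
from statistics import NormalDist

W90 = NormalDist().inv_cdf(0.90)  # w90 = -w10, about 1.2816


def omega(sigma):
    """Moment-matching weight of Eq. 3-18 for a log-normal population (sigma > 0).

    The denominator e^{sigma w10} - 2 + e^{sigma w90} equals 4 sinh^2(sigma w90 / 2);
    together with expm1 in the numerator this avoids cancellation for small sigma.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.expm1(sigma ** 2 / 2) / (4.0 * math.sinh(sigma * W90 / 2) ** 2)


def expected_src2_over_x50(sigma):
    """omega x10 + (1 - 2 omega) x50 + omega x90 with population percentiles and x50 = 1."""
    om = omega(sigma)
    return om * math.exp(-sigma * W90) + (1 - 2 * om) + om * math.exp(sigma * W90)


if __name__ == "__main__":
    print("limit as sigma -> 0: 1/(2 w90^2) = %.4f" % (1 / (2 * W90 ** 2)))
    for sigma in (0.05, 0.5, 1.0, 2.0, 3.0):
        print("sigma = %.2f  omega = %.4f" % (sigma, omega(sigma)))
```

For σ = 2, ω ≈ 0.58, so the weight on $x_{50}$, 1 − 2ω, becomes negative; the de-biased rule then no longer has the form of a weighted average with positive weights.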
Fig. 3-13 – Ratio of the expected values of SR, SRC1, SRC2, PT, and the MLE (n = 35 and n = 1,000) to E(X), versus σ and VDP (n = 200).

Both de-biasing approaches make SR unbiased and, depending on n and σ, the most efficient estimator (Fig. 3-13 and 3-14a); however, they result in larger SE’s than the original SR when σ exceeds a certain value (Fig. 3-14b). Compared with the AA and MLE, SRC1 and SRC2 have smaller SE’s. In other words, although these modifications remove the bias of SR, they increase its SE; this increase depends on σ and can become appreciable for large σ.

Fig. 3-14 – (a) RMSE/x50 and (b) SE/x50 of the AA, MLE, PT, SR, SRC1, and SRC2 versus σ and VDP when n = 200.

Fig. 3-15 – Ratio of the RMSE’s of the AA, MLE, PT, SR, SRC1, and SRC2 to x50 versus the square root of the inverse of sample size, for (a) σ = 0.05, VDP = 0.05; (b) σ = 0.5, VDP = 0.39; (c) σ = 1.0, VDP = 0.63; (d) σ = 1.5, VDP = 0.78.

For unbiased estimators, comparing RMSE’s reduces to comparing SE’s. When σ is small, the AA and MLE are the most efficient mean estimators (Fig. 3-15a and 3-15b).
However, for large σ, SRC1 and SRC2 are more efficient than the AA, MLE, and SR for some ranges of n (Fig. 3-15c and 3-15d).

3.5 Concluding Remarks

This chapter shows that SR, unlike the MLE, is not asymptotically unbiased and significantly underestimates the mean of a heterogeneous population. However, it becomes more efficient than the AA, MLE, and PT when σ becomes large. Hence, there are statistical benefits to using SR as an alternative mean estimator under some conditions, but SR must be used with care because its bias can outweigh its other advantages. This chapter also finds that the de-biased versions of SR are consistent and, for certain ranges of variability and sample size, the most efficient of the mean estimators considered here.

Chapter 4 : Performance Evaluation for the Case of Bimodal Distribution

Megill (1984) graphically showed that SR estimates the mean of modestly skewed log-normal distributions with acceptable error (e.g., 5.0% when σ = 1) but significantly underestimates the mean as the distribution becomes highly skewed. More recently, Bickel et al. (2011) concluded that SR has zero bias when the population is normally distributed. In other words, the performance of SR differs from one distribution to another, and knowing in advance how the mean estimators perform under different distribution types helps in selecting an appropriate estimator for a given distribution type. Only a few studies, such as Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011), have assessed the bias of SR for the log-normal distribution. It is therefore of interest to study the performance of SR when the underlying distribution is neither normal nor log-normal. Reservoir parameters can sometimes be better described by a bimodal distribution because of geological heterogeneity.
For instance, a formation may consist of high-quality, high-permeability sand interbedded with low-permeability shale, so that permeability follows a bimodal distribution. Oil and gas field size and hydrocarbon reserves are other parameters that are not necessarily log-normally distributed and can be described by bimodal distributions (MacCrossan 1969). Hence, this chapter evaluates the performance of SR against the performances of the AA, PT, and MLE. To this end, the bias, consistency, efficiency, and uncertainty of the mean estimators are derived analytically, and these expressions are then validated numerically via MC simulation.

4.1 Analytical Expressions of Mean Estimators' Properties

In this chapter, it is assumed that the RV's, X_1, …, X_n, are i.i.d. and follow a bimodal distribution that can be split into two log-normal distributions as

$$h(x;\mu,\sigma^2,\alpha) = \alpha\,h_1(x;\mu_1,\sigma_1^2) + (1-\alpha)\,h_2(x;\mu_2,\sigma_2^2), \qquad (4\text{-}1)$$

Here h_1(x; μ1, σ1²) and h_2(x; μ2, σ2²) are the PDF's of two log-normal distributions with log-means μ1 and μ2 and log-variances σ1² and σ2². α is the proportion of the first distribution in the population and varies from zero to one. The PDF h(x; μ, σ², α) can be written as

$$h(x;\mu,\sigma^2,\alpha) = \frac{\alpha}{x\,\sigma_1\sqrt{2\pi}}\exp\!\left[-\frac{1}{2}\left(\frac{\ln x-\mu_1}{\sigma_1}\right)^2\right] + \frac{1-\alpha}{x\,\sigma_2\sqrt{2\pi}}\exp\!\left[-\frac{1}{2}\left(\frac{\ln x-\mu_2}{\sigma_2}\right)^2\right], \qquad (4\text{-}2)$$

where μ and σ² are the mean and variance of h(x), given respectively by

$$E(X) = \alpha\,e^{\mu_1+\sigma_1^2/2} + (1-\alpha)\,e^{\mu_2+\sigma_2^2/2}, \qquad (4\text{-}3)$$

and

$$Var(X) = \alpha\,e^{2\mu_1+2\sigma_1^2} + (1-\alpha)\,e^{2\mu_2+2\sigma_2^2} - \left[\alpha\,e^{\mu_1+\sigma_1^2/2} + (1-\alpha)\,e^{\mu_2+\sigma_2^2/2}\right]^2. \qquad (4\text{-}4)$$

μ1, μ2, σ1², and σ2² are selected such that the mode of the first log-normal distribution is smaller than the mode of the second one (i.e., $e^{\mu_1-\sigma_1^2} < e^{\mu_2-\sigma_2^2}$).
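Eqs. 4-2 through 4-4 can be evaluated directly. The plain-Python sketch below is illustrative only, and its function names are hypothetical. The bimodality check counts local maxima of the PDF on a dense log-spaced grid; this is an assumed numerical shortcut, not the procedure of Appendix C. The example uses the parameters of Section 4.2 with σ1 = 0.5.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def lognormal_pdf(x, mu, s):
    """PDF of a log-normal variable with log-mean mu and log-standard deviation s."""
    return math.exp(-0.5 * ((math.log(x) - mu) / s) ** 2) / (x * s * SQRT_2PI)


def mixture_pdf(x, mu1, s1, mu2, s2, alpha):
    """Eq. 4-2: alpha * h1(x) + (1 - alpha) * h2(x), defined for x > 0."""
    return alpha * lognormal_pdf(x, mu1, s1) + (1.0 - alpha) * lognormal_pdf(x, mu2, s2)


def mixture_mean(mu1, s1, mu2, s2, alpha):
    """Eq. 4-3: E(X) of the two-component log-normal mixture."""
    return alpha * math.exp(mu1 + s1**2 / 2) + (1.0 - alpha) * math.exp(mu2 + s2**2 / 2)


def mixture_var(mu1, s1, mu2, s2, alpha):
    """Eq. 4-4: Var(X) = E(X^2) - E(X)^2."""
    second_moment = (alpha * math.exp(2 * mu1 + 2 * s1**2)
                     + (1.0 - alpha) * math.exp(2 * mu2 + 2 * s2**2))
    return second_moment - mixture_mean(mu1, s1, mu2, s2, alpha) ** 2


def count_modes(mu1, s1, mu2, s2, alpha, n_grid=20000):
    """Count local maxima of the mixture PDF on a log-spaced grid (illustrative check)."""
    lo = min(mu1 - 6 * s1, mu2 - 6 * s2)
    hi = max(mu1 + 6 * s1, mu2 + 6 * s2)
    xs = [math.exp(lo + (hi - lo) * i / n_grid) for i in range(n_grid + 1)]
    ys = [mixture_pdf(x, mu1, s1, mu2, s2, alpha) for x in xs]
    return sum(1 for i in range(1, n_grid) if ys[i - 1] < ys[i] >= ys[i + 1])


# Section 4.2 parameters with sigma1 = 0.5: mu1 = 1, mu2 = 3, sigma2 = 0.5, alpha = 0.3
print(mixture_mean(1.0, 0.5, 3.0, 0.5, 0.3), mixture_var(1.0, 0.5, 3.0, 0.5, 0.3))
print(count_modes(1.0, 0.5, 3.0, 0.5, 0.3))
```

Setting μ2 = μ1 and σ2 = σ1 collapses the mixture to a single log-normal, which gives a quick sanity check that the mode counter returns one.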
Note that not every combination of the five parameters μ1, σ1², μ2, σ2², and α yields a bimodal distribution (see Appendix C for details). For example, the PDF h(x; μ1 = 1, σ1², μ2, σ2² = 0.5², α) is bimodal when α lies between the two same-colored curves shown in Fig. 4-1. The position of these curves depends on σ1 and μ2.

Fig. 4-1 – Bimodal region (mixing proportion α versus µ2, for σ1 = 0.05, 0.3, 0.5, 1.0, and 1.5) when µ1=1 and σ2=0.5.

As shown before, SR and PT are functions of percentiles. Consequently, their statistical properties are functions of the means, variances, and covariances of the percentiles, so the statistical properties of the 5th, 10th, 50th, 90th, and 95th percentiles are derived first. The uth percentile is assumed to be normally distributed with mean X_u and variance u(1−u)/[n h²(X_u)], where X_u = H^{-1}(u) and H(x) is the CDF (Ord and Stuart 1987). The joint distribution of the uth and vth percentiles is bivariate normal, with covariance u(1−v)/[n h(X_u) h(X_v)] for u < v (Ord and Stuart 1987). Based on these assumptions, the expected values, E(x_T), and variances, Var(x_T), of the AA, SR, PT, and MLE are derived analytically as follows.

According to the AA's properties, E(x_A) = E(X) and Var(x_A) = Var(X)/n. The statistical properties of the MLE are derived using the same approach as for the mean and SE of the MLE in the log-normal case (see Appendix D for details). They are given by

$$E(x_{MLE}) = \alpha\,e^{\mu_1+\sigma_1^2/2}\,e^{-\frac{(n-1)\sigma_1^2}{2n}}\left(1-\frac{\sigma_1^2}{n}\right)^{-\frac{n-1}{2}} + (1-\alpha)\,e^{\mu_2+\sigma_2^2/2}\,e^{-\frac{(n-1)\sigma_2^2}{2n}}\left(1-\frac{\sigma_2^2}{n}\right)^{-\frac{n-1}{2}}, \qquad (4\text{-}5)$$

and

$$Var(x_{MLE}) = \alpha^2\,e^{2\mu_1+\sigma_1^2/n}\left[e^{\sigma_1^2/n}\left(1-\frac{2\sigma_1^2}{n}\right)^{-\frac{n-1}{2}} - \left(1-\frac{\sigma_1^2}{n}\right)^{-(n-1)}\right] + (1-\alpha)^2\,e^{2\mu_2+\sigma_2^2/n}\left[e^{\sigma_2^2/n}\left(1-\frac{2\sigma_2^2}{n}\right)^{-\frac{n-1}{2}} - \left(1-\frac{\sigma_2^2}{n}\right)^{-(n-1)}\right]. \qquad (4\text{-}6)$$

The first and second moments of SR and PT are obtained by substituting the means, variances, and covariances of the percentiles into the following equations:

$$E(x_{SR}) = 0.3\,E(x_{10}) + 0.4\,E(x_{50}) + 0.3\,E(x_{90}); \qquad (4\text{-}7)$$

$$Var(x_{SR}) = 0.09\,[var(x_{10}) + var(x_{90})] + 0.16\,var(x_{50}) + 0.24\,[cov(x_{10},x_{50}) + cov(x_{50},x_{90})] + 0.18\,cov(x_{10},x_{90}); \qquad (4\text{-}8)$$

$$E(x_{PT}) = 0.185\,E(x_{5}) + 0.63\,E(x_{50}) + 0.185\,E(x_{95}); \qquad (4\text{-}9)$$

and

$$Var(x_{PT}) = 0.034\,[var(x_{5}) + var(x_{95})] + 0.397\,var(x_{50}) + 0.233\,[cov(x_{5},x_{50}) + cov(x_{50},x_{95})] + 0.068\,cov(x_{5},x_{95}), \qquad (4\text{-}10)$$

where E(x_SR) and E(x_PT) are the expected values of SR and PT, respectively, and Var(x_SR) and Var(x_PT) are their respective variances.

4.2 Validation of Analytical Expressions using Monte Carlo Simulation

The analytical expressions are validated numerically using MC simulation. For this purpose, m = 10,000 data sets, each containing n = 25 to 3,000 samples, are drawn randomly from a population as described in the previous chapter. The only difference is that the PDF of the underlying distribution is h(x) = α h_1(x; μ1, σ1²) + (1−α) h_2(x; μ2, σ2²), with μ1 = 1, μ2 = 3, σ1 varying from 0.05 to 2, σ2 = 0.5, and α = 0.3. To draw a data set from this population, two subsets, x_1 ∈ X_1 and x_2 ∈ X_2, are generated. These two sets are drawn randomly from the entire domains of X_1 and X_2 using the inverse cumulative method, X_i = H_i^{-1}(u). Here H_i^{-1} is the inverse of the CDF, H_i(x_i), of the ith log-normal distribution (i = 1, 2), and u is uniformly distributed over the interval (0, 1).
Next, these sets are combined by the following formula:

$$X = \beta X_1 + (1-\beta) X_2, \qquad (4\text{-}11)$$

where β is an index that is either zero or one depending on the value of u. β equals zero if u is greater than α; otherwise, β is one. The RV's X_1 and X_2 are assumed to be i.i.d., and consequently the RV X is i.i.d. as well. Applying the mean estimator T to each data set results in the sequence $\{\hat{x}_T\}$, where $\hat{x}_T$ is the mean value estimated using T. The expected value, $\hat{E}(\hat{x}_T)$, of the sequence is approximated by the AA. Its variance, $\widehat{Var}(\hat{x}_T)$, is approximated by $\sum_{i=1}^{m}[\hat{x}_{T,i} - \hat{E}(\hat{x}_T)]^2/(m-1)$. These two quantities are used to validate the analytical expressions of the mean estimators' properties (Fig. 4-2 through Fig. 4-5).

Fig. 4-2 – (a) Expected value and (b) SE of the AA (MC and analytical results versus 1/√n for σ1 = 0.5, 1.0, and 1.5).

Fig. 4-3 – (a) Expected value and (b) SE of MLE.

Fig. 4-4 – (a) Expected value and (b) SE of SR.
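The sampling procedure of Section 4.2 can be sketched as below. It draws from each log-normal component by the inverse-CDF method and combines the draws through the index β of Eq. 4-11.

The following choices are assumptions, not details stated in the text:
- The u that sets β is an independent uniform draw.
- Sample percentiles come from Python's `statistics.quantiles` with `method="inclusive"`, which is only one of several percentile conventions.
- m is kept far below 10,000 so the sketch runs quickly.

SR and PT use the weights of Eqs. 4-7 and 4-9. All function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def open_unit(rng):
    """Uniform draw on the open interval (0, 1); NormalDist.inv_cdf rejects 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def draw_mixture(rng, n, mu1, s1, mu2, s2, alpha):
    """Draw n values using Eq. 4-11, X = beta*X1 + (1 - beta)*X2."""
    d1, d2 = NormalDist(mu1, s1), NormalDist(mu2, s2)
    data = []
    for _ in range(n):
        x1 = math.exp(d1.inv_cdf(open_unit(rng)))  # inverse-CDF draw from H1
        x2 = math.exp(d2.inv_cdf(open_unit(rng)))  # inverse-CDF draw from H2
        beta = 0 if open_unit(rng) > alpha else 1  # assumed independent uniform for the index
        data.append(beta * x1 + (1 - beta) * x2)
    return data


def mean_estimates(data):
    """AA, SR, and PT for one data set; q[k - 1] is the k-th sample percentile."""
    q = statistics.quantiles(data, n=100, method="inclusive")
    return {
        "AA": statistics.fmean(data),
        "SR": 0.3 * q[9] + 0.4 * q[49] + 0.3 * q[89],
        "PT": 0.185 * q[4] + 0.63 * q[49] + 0.185 * q[94],
    }


def monte_carlo(m, n, mu1, s1, mu2, s2, alpha, seed=0):
    """Return {estimator: (E_hat, Var_hat)} over m simulated data sets."""
    rng = random.Random(seed)
    runs = [mean_estimates(draw_mixture(rng, n, mu1, s1, mu2, s2, alpha)) for _ in range(m)]
    summary = {}
    for name in ("AA", "SR", "PT"):
        values = [r[name] for r in runs]
        # statistics.variance uses the (m - 1) denominator, as in the text
        summary[name] = (statistics.fmean(values), statistics.variance(values))
    return summary


print(monte_carlo(m=200, n=200, mu1=1.0, s1=0.5, mu2=3.0, s2=0.5, alpha=0.3))
```

The MC averages can then be compared with Eq. 4-3 and with Var(X)/n for the AA, in the same spirit as Fig. 4-2.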
Fig. 4-5 – (a) Expected value and (b) SE of PT.

The analytical expressions of the expected values and SE's of the AA and MLE agree closely with the MC simulation results (Fig. 4-2 and 4-3). For SR and PT, however, there are slight discrepancies between the MC results and the analytical expressions. These discrepancies reach at most 5.0%, for large σ1 and small n (Fig. 4-4 and 4-5). In general, the analytical expressions reasonably match the numerical results, so the rest of this chapter compares the mean estimators' properties obtained from the analytical expressions.

4.3 Analyses of the Analytical Expressions of Mean Estimators' Properties

As in the previous chapter, the ratio E(x_T)/E(X) is used to assess the bias, where E(x_T) is the expected value of the mean estimator T and E(X) is the true mean value. Any deviation from one indicates that the mean estimator is biased, and no deviation indicates that it is unbiased. The AA is unbiased since E(x_A)/E(X) = 1. MLE is asymptotically unbiased, since its deviation from unity decreases as n tends to infinity (Fig. 4-6a). However, SR and PT are both biased. They slightly overestimate the mean value for small standard deviation, σ1, but significantly underestimate it once σ1 exceeds one (Fig. 4-6b).

Fig. 4-6 – Ratio E(xT)/E(X) of (a) the AA and MLE (n=100 and n=450), and (b) SR and PT versus σ1 when σ2=0.5.

Uncertainty, the second property, is evaluated based on SE. MLE estimates the mean value with the smallest SE when σ1 ≤ 1.5 (Fig. 4-7a through Fig.
4-7c); however, it has a larger SE than SR and PT once σ1 exceeds 1.5 (Fig. 4-7d). SR has a smaller SE than PT for any n and σ1. When σ1 ≤ 1.0, its SE is at most 0.35% smaller than that of the AA. As σ1 increases, however, the difference between the SE's of SR and the AA grows substantially, reaching, for instance, 80% when σ1 = 2.0. For σ1 ≥ 1.5, the AA estimates the mean value with the largest SE, and when σ1 reaches 2.0, SR has the smallest SE (Fig. 4-7d).

Besides bias and uncertainty, efficiency and consistency are evaluated in terms of RMSE to choose an appropriate mean estimator. As mentioned before, an estimator with a smaller RMSE is more efficient, and an estimator is consistent if its RMSE tends to zero as n becomes very large. Zero RMSE means that both bias and SE approach zero for very large n. All SE's of the mean estimators approach zero as n increases (Fig. 4-7), so any non-zero limiting RMSE is caused by non-zero bias.

Fig. 4-7 – Standard errors of the AA, MLE, SR, and PT versus 1/√n for four different values of σ1 (0.5, 1.0, 1.5, and 2.0).

The AA and MLE are consistent. The RMSE's of SR and PT, however, do not approach zero for large n because of their bias, so SR and PT are inconsistent (Fig. 4-8). When σ1 ≤ 1.5, MLE has the smallest RMSE for any n (Fig. 4-8a through Fig. 4-8c). However, MLE has a higher RMSE than SR and PT when σ1 > 1.5 and n < 50 (Fig. 4-8d). For σ1 ≤ 1, the AA and SR have approximately identical RMSE's. Nevertheless, as n becomes very large, the RMSE of the AA becomes smaller and approaches zero, whereas the RMSE of SR tends to a non-zero value.
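The consistency argument can be made concrete with the large-sample percentile moments behind Eqs. 4-7 and 4-8. The sketch below computes E(x_SR) and Var(x_SR) for the bimodal population. Mixture percentiles are found by bisection on the CDF, and function names are hypothetical.

The sketch then shows that RMSE = √(bias² + SE²) of SR levels off at |bias| as n grows. By contrast, the RMSE of the AA, √(Var(X)/n), tends to zero.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def mix_cdf(x, p):
    """H(x) for the two-component log-normal mixture; p = (mu1, s1, mu2, s2, alpha)."""
    mu1, s1, mu2, s2, alpha = p
    t = math.log(x)
    return alpha * STD.cdf((t - mu1) / s1) + (1 - alpha) * STD.cdf((t - mu2) / s2)


def mix_pdf(x, p):
    """h(x) of Eq. 4-2."""
    mu1, s1, mu2, s2, alpha = p
    t = math.log(x)
    return (alpha * STD.pdf((t - mu1) / s1) / s1
            + (1 - alpha) * STD.pdf((t - mu2) / s2) / s2) / x


def mix_mean(p):
    """Eq. 4-3."""
    mu1, s1, mu2, s2, alpha = p
    return alpha * math.exp(mu1 + s1**2 / 2) + (1 - alpha) * math.exp(mu2 + s2**2 / 2)


def mix_quantile(u, p, iters=100):
    """X_u = H^{-1}(u), found by bisection on ln(x)."""
    mu1, s1, mu2, s2, _ = p
    lo = min(mu1 - 12 * s1, mu2 - 12 * s2)
    hi = max(mu1 + 12 * s1, mu2 + 12 * s2)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mix_cdf(math.exp(mid), p) < u:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


def sr_moments(n, p):
    """E(x_SR) and Var(x_SR) from Eqs. 4-7 and 4-8 with asymptotic percentile moments."""
    u = (0.10, 0.50, 0.90)
    w = (0.3, 0.4, 0.3)
    xq = [mix_quantile(ui, p) for ui in u]
    h = [mix_pdf(xi, p) for xi in xq]
    mean = sum(wi * xi for wi, xi in zip(w, xq))
    var = 0.0
    for i in range(3):
        for j in range(3):
            a, b = min(i, j), max(i, j)
            var += w[i] * w[j] * u[a] * (1 - u[b]) / (n * h[a] * h[b])  # u(1-v)/(n h_u h_v)
    return mean, var


def rmse(bias, var):
    return math.sqrt(bias**2 + var)


p = (1.0, 2.0, 3.0, 0.5, 0.3)  # mu1, sigma1, mu2, sigma2, alpha
for n in (50, 1000, 10**6):
    m_sr, v_sr = sr_moments(n, p)
    bias = m_sr - mix_mean(p)
    print(n, rmse(bias, v_sr), abs(bias))
```

The double loop is simply Eq. 4-8 written as a quadratic form: diagonal terms carry the percentile variances, and each off-diagonal covariance is counted twice.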
The RMSE of PT is the largest when σ1 ≤ 1, regardless of n. However, when σ1 increases to 2.0 and n < 60, PT has the smallest RMSE. The AA becomes the least efficient mean estimator when σ1 ≥ 1.5 and n is moderate, but as n becomes very large, the efficiency of the AA improves (Fig. 4-8c and 4-8d).

Fig. 4-8 – RMSE's of the AA, MLE, SR, and PT versus 1/√n for four different values of σ1 (0.5, 1.0, 1.5, and 2.0).

4.4 Concluding Remarks

This chapter shows that PT has slightly less bias than SR, the AA is unbiased, and MLE is asymptotically unbiased. SR estimates the mean value with smaller uncertainty than the AA and PT for any variability and sample size, but it is more uncertain than MLE for certain ranges of variability and sample size. Although MLE has the highest efficiency for moderate variability, it is complex to apply to a bimodal distribution, so other mean estimators are preferable. Each of the AA, SR, and PT has the highest efficiency only for certain ranges of variability and sample size. Therefore, SR may become the optimum mean estimator for some ranges of variability and sample size, because its smaller uncertainty and higher efficiency compensate for its bias.

Chapter 5 : Performance Evaluation for the Case of Power-Normal Distribution

The log-normal distribution is widely used to describe the distributions of reservoir parameters; however, many researchers, such as Lambert 1981, Bennion 1966, and Jensen et al.
1987, have shown that reservoir parameters such as permeability and hydrocarbon reserves are not necessarily log-normally distributed. Such parameters can be described by other distributions, for example the power-normal distribution. However, none of Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011) evaluated the performance of SR when the underlying distribution is power-normal. This chapter therefore evaluates the performance of SR for the power-normal distribution and compares it with that of other commonly used mean estimators, namely the AA and PT. For this purpose, the mean estimators' properties (bias, uncertainty, efficiency, and consistency) are derived analytically and validated numerically using MC simulation.

5.1 Analytical Expressions of Mean Estimators' Properties

Let X be an RV whose i.i.d. samples can be transformed to a normal distribution by the power-normal transformation:

$$Y = \begin{cases} \dfrac{X^{\lambda}-1}{\lambda}, & \lambda \neq 0 \\[4pt] \ln(X), & \lambda = 0 \end{cases}, \qquad (5\text{-}1)$$

where −1 ≤ λ ≤ 1 (Box and Cox 1964). The PDF of a power-normally distributed RV X with transformed mean μ and transformed variance σ² is

$$h(x) = \begin{cases} \dfrac{x^{\lambda-1}}{\Phi[\mathrm{sign}(\lambda)K]\,\sigma\sqrt{2\pi}}\exp\!\left[-\dfrac{1}{2}\left(\dfrac{(x^{\lambda}-1)/\lambda-\mu}{\sigma}\right)^2\right], & x > 0 \\[4pt] 0, & x \le 0 \end{cases}, \qquad (5\text{-}2)$$

where Φ denotes the standard normal CDF and K = 1/(λσ) + μ/σ; −K is the truncation point of the normalized transformed RV Z = (Y − μ)/σ.

The PDF h(x) is defined on the basis that Y follows a truncated normal distribution (TND), i.e., Y ~ TN(μ, σ²). When λ = 1, X follows a TND, and when λ = 0, X is log-normally distributed. In practice, the truncation issue can be resolved by assigning a very large value to μ. K then becomes large enough that Φ[sign(λ)K] ≈ 1, and Y is effectively normally distributed.
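Eq. 5-2 can be checked numerically. Because of the factor Φ[sign(λ)K] in the denominator, the PDF integrates to one even when the truncation at Y = −1/λ is substantial.

The sketch below covers λ > 0 only and uses a midpoint rule on t = ln x. The function names and integration limits are illustrative assumptions. For λ = 1/2, μ = 5, and σ = 1, K = 7 and truncation is negligible. The mean should therefore be close to E[(1 + Y/2)²] = (1 + μ/2)² + σ²/4 = 12.5.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def k_value(mu, sigma, lam):
    """K = 1/(lam*sigma) + mu/sigma (lam != 0)."""
    return 1.0 / (lam * sigma) + mu / sigma


def power_normal_pdf(x, mu, sigma, lam):
    """Eq. 5-2 (lam != 0): PDF of X when Y = (X**lam - 1)/lam ~ TN(mu, sigma^2)."""
    if x <= 0.0:
        return 0.0
    y = (x**lam - 1.0) / lam
    norm = PHI(math.copysign(1.0, lam) * k_value(mu, sigma, lam))  # Phi[sign(lam) K]
    return (x ** (lam - 1.0) / (norm * sigma * math.sqrt(2.0 * math.pi))
            * math.exp(-0.5 * ((y - mu) / sigma) ** 2))


def integrate_log_x(f, t_lo=-40.0, t_hi=8.0, steps=50000):
    """Midpoint rule for the integral of f(x) dx over x > 0, using x = e^t (dx = x dt)."""
    h = (t_hi - t_lo) / steps
    total = 0.0
    for i in range(steps):
        x = math.exp(t_lo + (i + 0.5) * h)
        total += f(x) * x
    return total * h


# Strongly truncated case (K = 1.5) and the near-untruncated mean for lam = 1/2
print(integrate_log_x(lambda x: power_normal_pdf(x, 1.0, 2.0, 0.5)))
print(integrate_log_x(lambda x: x * power_normal_pdf(x, 5.0, 1.0, 0.5)))
```

Dropping `norm` from the PDF would make the strongly truncated case integrate to Φ(1.5) ≈ 0.93 rather than one. This is why the normalizing factor is needed whenever K is not large.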
Gnanadesikan (1977) proposed using X + c instead of X, where c is a sufficiently large value; however, this method can cause the likelihood function to behave poorly (Atkinson et al. 1991). Among the approaches used to derive the statistical properties of a power-normal distribution, this study uses that of Freeman and Modarres (2006). The expected value and variance of X are derived analytically on the basis that Y follows a TND. The rth moment of X is

$$E(X^r) = \frac{1}{\Phi[\mathrm{sign}(\lambda)K]} \begin{cases} \displaystyle\sum_{v=0,\,v\ \mathrm{even}}^{\infty} \frac{T^{(v)}(\mu)\,\sigma^{v}}{2^{v/2}\,(v/2)!} - \sum_{v=0}^{\infty} \frac{T^{(v)}(\mu)\,\sigma^{v}}{v!}\int_{-\infty}^{-K} z^{v}\,\phi(z)\,dz, & \lambda > 0 \\[10pt] \displaystyle\sum_{v=0,\,v\ \mathrm{even}}^{\infty} \frac{T^{(v)}(\mu)\,\sigma^{v}}{2^{v/2}\,(v/2)!} - \sum_{v=0}^{\infty} \frac{T^{(v)}(\mu)\,\sigma^{v}}{v!}\int_{-K}^{\infty} z^{v}\,\phi(z)\,dz, & \lambda < 0 \end{cases}, \qquad (5\text{-}3)$$

where $T^{(v)}(\mu) = (1+\lambda\mu)^{r/\lambda - v}\prod_{j=0}^{v-1}(r - j\lambda)$, z = (y − μ)/σ follows a standard TND, and φ is the standard normal PDF (see Appendix E for details). Thus, E(X) is obtained by setting r = 1, and Var(X) = E(X²) − E(X)².

The expected value and variance of the AA are E(x_A) = E(X) and Var(x_A) = Var(X)/n, respectively. To derive the statistical properties of SR and PT, Eqs. 2-9 and 2-10 are rewritten in the general form

$$x_T = \omega\,x_u + (1-2\omega)\,x_s + \omega\,x_v, \qquad (5\text{-}4)$$

where x_T is the mean value estimated using the estimator T. x_u, x_s, and x_v are the uth, sth, and vth percentiles, with s = 50. For SR, u = 10 and v = 90; for PT, u = 5 and v = 95. ω equals 0.3 in the SR formula and 0.185 in the PT formula. Taking the expectation and variance of both sides of Eq. 5-4 gives

$$E(x_T) = \omega\,E(x_u) + (1-2\omega)\,E(x_s) + \omega\,E(x_v), \qquad (5\text{-}5)$$

and

$$Var(x_T) = \omega^2 Var(x_u) + (1-2\omega)^2 Var(x_s) + \omega^2 Var(x_v) + 2\omega(1-2\omega)\,cov(x_u,x_s) + 2\omega(1-2\omega)\,cov(x_s,x_v) + 2\omega^2\,cov(x_u,x_v). \qquad (5\text{-}6)$$

According to Eqs.
5-5 and 5-6, the first step in deriving the expected values and variances of SR and PT is to calculate the statistical properties of the percentiles analytically. To this end, the uth percentile is assumed to be normally distributed, x_u ~ N(X_u, u(1−u)/[n h²(X_u)]). The covariance between the uth and vth percentiles is u(1−v)/[n h(X_u) h(X_v)] for u < v (Ord and Stuart 1987). In these variance and covariance terms, u and v are expressed as fractions. Therefore, the expected value and variance of the uth percentile are

$$E(x_u) = (1 + \lambda\sigma w_u^* + \lambda\mu)^{1/\lambda} \qquad (5\text{-}7)$$

and

$$Var(x_u) = A\,u(1-u)\,(1 + \lambda\sigma w_u^* + \lambda\mu)^{2(1-\lambda)/\lambda}\,e^{w_u^{*2}}, \qquad (5\text{-}8)$$

where $A = \dfrac{2\pi\sigma^2}{n}\,\Phi^2[\mathrm{sign}(\lambda)K]$ and Φ is the standard normal CDF. The quantity $w_u^* = \Phi_T^{-1}(u/100)$ is defined through Φ_T, the truncated standard normal CDF, $\Phi_T(\cdot) = \dfrac{\Phi(\cdot)-\Phi(-K)}{1-\Phi(-K)}$. The covariance of two percentiles, x_u and x_v, where u < v, is

$$cov(x_u,x_v) = A\,u(1-v)\,(1 + \lambda\sigma w_u^* + \lambda\mu)^{(1-\lambda)/\lambda}\,(1 + \lambda\sigma w_v^* + \lambda\mu)^{(1-\lambda)/\lambda}\,e^{(w_u^{*2}+w_v^{*2})/2}. \qquad (5\text{-}9)$$

Substituting Eq. 5-7 into Eq. 5-5 yields the expected values of SR and PT, respectively, as

$$E(x_{SR}) = 0.3\left[(1+\lambda\sigma w_{10}^*+\lambda\mu)^{1/\lambda} + (1+\lambda\sigma w_{90}^*+\lambda\mu)^{1/\lambda}\right] + 0.4\,(1+\lambda\sigma w_{50}^*+\lambda\mu)^{1/\lambda}, \qquad (5\text{-}10)$$

and

$$E(x_{PT}) = 0.185\left[(1+\lambda\sigma w_{5}^*+\lambda\mu)^{1/\lambda} + (1+\lambda\sigma w_{95}^*+\lambda\mu)^{1/\lambda}\right] + 0.63\,(1+\lambda\sigma w_{50}^*+\lambda\mu)^{1/\lambda}. \qquad (5\text{-}11)$$

Substituting Eqs. 5-8 and 5-9 into Eq.
5-6 results in the variances of SR and PT, respectively, as

$$Var(x_{SR}) = A\times10^{-3}\Big\{8.1\big[(1+\lambda\sigma w_{10}^*+\lambda\mu)^{2(1-\lambda)/\lambda}e^{w_{10}^{*2}} + (1+\lambda\sigma w_{90}^*+\lambda\mu)^{2(1-\lambda)/\lambda}e^{w_{90}^{*2}}\big] + 40\,(1+\lambda\sigma w_{50}^*+\lambda\mu)^{2(1-\lambda)/\lambda}e^{w_{50}^{*2}} + 12\big[(1+\lambda\sigma w_{10}^*+\lambda\mu)^{(1-\lambda)/\lambda}(1+\lambda\sigma w_{50}^*+\lambda\mu)^{(1-\lambda)/\lambda}e^{(w_{10}^{*2}+w_{50}^{*2})/2} + (1+\lambda\sigma w_{50}^*+\lambda\mu)^{(1-\lambda)/\lambda}(1+\lambda\sigma w_{90}^*+\lambda\mu)^{(1-\lambda)/\lambda}e^{(w_{50}^{*2}+w_{90}^{*2})/2}\big] + 1.8\,(1+\lambda\sigma w_{10}^*+\lambda\mu)^{(1-\lambda)/\lambda}(1+\lambda\sigma w_{90}^*+\lambda\mu)^{(1-\lambda)/\lambda}e^{(w_{10}^{*2}+w_{90}^{*2})/2}\Big\}, \qquad (5\text{-}12)$$

and

$$Var(x_{PT}) = A\times10^{-3}\Big\{1.63\big[(1+\lambda\sigma w_{5}^*+\lambda\mu)^{2(1-\lambda)/\lambda}e^{w_{5}^{*2}} + (1+\lambda\sigma w_{95}^*+\lambda\mu)^{2(1-\lambda)/\lambda}e^{w_{95}^{*2}}\big] + 99.23\,(1+\lambda\sigma w_{50}^*+\lambda\mu)^{2(1-\lambda)/\lambda}e^{w_{50}^{*2}} + 5.83\big[(1+\lambda\sigma w_{5}^*+\lambda\mu)^{(1-\lambda)/\lambda}(1+\lambda\sigma w_{50}^*+\lambda\mu)^{(1-\lambda)/\lambda}e^{(w_{5}^{*2}+w_{50}^{*2})/2} + (1+\lambda\sigma w_{50}^*+\lambda\mu)^{(1-\lambda)/\lambda}(1+\lambda\sigma w_{95}^*+\lambda\mu)^{(1-\lambda)/\lambda}e^{(w_{50}^{*2}+w_{95}^{*2})/2}\big] + 0.17\,(1+\lambda\sigma w_{5}^*+\lambda\mu)^{(1-\lambda)/\lambda}(1+\lambda\sigma w_{95}^*+\lambda\mu)^{(1-\lambda)/\lambda}e^{(w_{5}^{*2}+w_{95}^{*2})/2}\Big\}. \qquad (5\text{-}13)$$

5.2 Validation of Analytical Expressions using Monte Carlo Simulation

The analytical expressions derived in the previous section are validated numerically by MC simulation. For this purpose, m = 10,000 data sets with n = 35 to 10,000 samples are drawn from a power-normal distribution with transformed mean μ = 5, transformed standard deviation σ varying from zero to 12, and exponent λ = 1/2. As mentioned before, X can be transformed into Y, where Y ~ TN(μ, σ²). It is therefore easier to generate Y first and then transform it into X using

$$X = (\lambda Y + 1)^{1/\lambda}. \qquad (5\text{-}14)$$

The n samples, y_1, …, y_n, are drawn randomly from the entire domain of Y as $Y = F^{-1}\{\Phi(-K) + [1-\Phi(-K)]\,u\}$. Here F^{-1} is the inverse of the normal CDF with mean μ and standard deviation σ, truncated at −1/λ; u is drawn from a uniform distribution over [0, 1]; and Φ is the standard normal CDF. This procedure is repeated m times to generate m data sets. The mean estimators are then applied to each data set, generating a set of mean estimates, $\{\hat{x}_{T,1}, \dots, \hat{x}_{T,m}\}$. The expected value of this set, $\hat{E}(\hat{x}_T)$, is approximated by the AA, and its variance is calculated as $\widehat{Var}(\hat{x}_T) = \sum_{i=1}^{m}[\hat{x}_{T,i} - \hat{E}(\hat{x}_T)]^2/(m-1)$.

To validate the analytical expressions, this chapter uses the ratios of MC results to analytical results (Fig. 5-1 through Fig. 5-3). Any deviation from one indicates a discrepancy between the MC simulation and the analytical expressions. The ratio $\hat{E}(\hat{x}_A)/E(x_A)$ deviates from one by at most 0.7%. The ratios $\hat{E}(\hat{x}_{SR})/E(x_{SR})$ and $\hat{E}(\hat{x}_{PT})/E(x_{PT})$ differ from one by at most 2%, for small n and large σ. As n increases and/or σ decreases, however, the deviation from unity approaches zero. In general, therefore, the analytical expressions reasonably follow the numerical results, and the rest of this chapter assesses the mean estimators' properties based on the analytical expressions.

Fig. 5-1 – Ratios of MC to analytical results of (a) expected value, (b) SE, and (c) RMSE of the AA versus σ (n = 35, 600, 2000, and 10000) for the case of square root power-normal distribution.

Fig. 5-2 – Ratios of MC to analytical results of the (a) expected value, (b) SE, and (c) RMSE of SR for square root power-normal distribution.

Fig. 5-3 – Ratios of MC to analytical results of the (a) expected value, (b) SE, and (c) RMSE of the PT for square root power-normal distribution.

5.3 Analyses of Mean Estimators' Properties

As mentioned previously, the mean estimators' performances are evaluated based on their statistical properties, namely bias, uncertainty, efficiency, and consistency. Bias is evaluated using the ratio of E(x_T) to E(X), where E(x_T) is the expected value of the mean estimator T and E(X) is the true mean (Fig. 5-4). The mean estimator T is unbiased when this ratio is one, and any deviation from unity shows that it is biased. Since E(x_A) = E(X), the ratio is one for the AA, so the AA is unbiased. SR and PT, by contrast, are biased for any λ except λ = 1 (Fig. 5-4). Their biases are unimportant when λ is close to one (e.g., λ = 1/2); nonetheless, they become significant as λ approaches zero.
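The bias ratios plotted in Fig. 5-4 can be reproduced in outline from Eqs. 5-7, 5-10, and 5-11. The plain-Python sketch below rests on these assumptions:
- Only λ > 0 is handled.
- E(X) is obtained by direct numerical integration over the truncated normal Z, not by the series of Eq. 5-3.
- A helper implements the truncated inverse-CDF sampling of Section 5.2 followed by Eq. 5-14.

All function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

STD = NormalDist()


def phi_minus_k(mu, sigma, lam):
    """Phi(-K): probability mass removed by the truncation (lam > 0 assumed)."""
    return STD.cdf(-(1.0 / (lam * sigma) + mu / sigma))


def percentile_value(u, mu, sigma, lam):
    """Eq. 5-7 with w_u* = Phi_T^{-1}(u/100) for the truncated standard normal."""
    c = phi_minus_k(mu, sigma, lam)
    w_star = STD.inv_cdf(c + (1.0 - c) * u / 100.0)
    return (1.0 + lam * sigma * w_star + lam * mu) ** (1.0 / lam)


def estimator_mean(omega, u, v, mu, sigma, lam):
    """Eq. 5-5 with s = 50; gives Eq. 5-10 (SR) or Eq. 5-11 (PT)."""
    return (omega * percentile_value(u, mu, sigma, lam)
            + (1.0 - 2.0 * omega) * percentile_value(50, mu, sigma, lam)
            + omega * percentile_value(v, mu, sigma, lam))


def true_mean(mu, sigma, lam, steps=20000, z_max=12.0):
    """E(X) by midpoint integration of (1 + lam*(mu + sigma*z))^(1/lam) over the TND of Z."""
    k = 1.0 / (lam * sigma) + mu / sigma
    lo = max(-k, -z_max)
    h = (z_max - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += (1.0 + lam * (mu + sigma * z)) ** (1.0 / lam) * STD.pdf(z)
    return total * h / STD.cdf(k)


def bias_ratios(mu, sigma, lam):
    """(E(x_SR)/E(X), E(x_PT)/E(X))."""
    ex = true_mean(mu, sigma, lam)
    return (estimator_mean(0.3, 10, 90, mu, sigma, lam) / ex,
            estimator_mean(0.185, 5, 95, mu, sigma, lam) / ex)


def sample_power_normal(rng, n, mu, sigma, lam):
    """Y = F^{-1}{Phi(-K) + [1 - Phi(-K)]u}, then X = (lam*Y + 1)^(1/lam) (Eq. 5-14)."""
    c = phi_minus_k(mu, sigma, lam)
    dist = NormalDist(mu, sigma)
    out = []
    for _ in range(n):
        p = c + (1.0 - c) * rng.random()
        p = min(max(p, 1e-15), 1.0 - 1e-15)  # keep inv_cdf strictly inside (0, 1)
        y = dist.inv_cdf(p)
        out.append(max(lam * y + 1.0, 0.0) ** (1.0 / lam))  # guard rounding near y = -1/lam
    return out


for lam in (1.0, 0.5, 0.25, 0.125):
    print(lam, bias_ratios(5.0, 2.0, lam))
```

For λ = 1, X is a (truncated) normal variable and both symmetric percentile rules are essentially unbiased. This matches the statement that SR and PT are biased for any λ except λ = 1.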
PT estimates the mean value with less bias than SR. For instance, when λ = 1/4, PT estimates the mean value with a 2% error, whereas SR gives a 14% error.

Fig. 5-4 – E(XT)/E(X) of (a) SR and (b) PT versus σ for different λ values (λ = 1, 1/2, 1/4, 1/6, 1/8, 1/16, and 0).

As in Chapter Three, the Dykstra-Parsons coefficient, VDP (Dykstra and Parsons 1950), is used in addition to σ as a measure of variability to evaluate the biases of SR and PT (Fig. 5-5). Both SR and PT estimate the mean value with insignificant errors when VDP ranges from 0 to 0.75, that is, from homogeneous to relatively heterogeneous reservoirs. As VDP approaches one, however, they significantly underestimate the mean value.

Fig. 5-5 – Analytical ratios of (a) E(xSR)/E(X) and (b) E(xPT)/E(X) versus VDP for different λ values.

As mentioned before, another important estimator property is uncertainty; an estimator has smaller uncertainty than others when it has a smaller SE. In general, the SE's of the mean estimators decrease as σ decreases and/or n increases (Fig. 5-6). When λ ≥ 1/8, the AA has the smallest SE, and SR has a slightly smaller SE than PT (Fig. 5-6a through Fig. 5-6c). When λ decreases to 1/16, SR and the AA have approximately identical SE's and perform better than PT for moderate σ's. For large σ's, however, for instance σ = 2.0, SR has a significantly smaller SE than PT and the AA (Fig. 5-6d).
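For the log-normal case, VDP and σ are linked by VDP = 1 − e^{−σ}. This reproduces the paired axes of Fig. 3-13 (e.g., σ = 1 corresponds to VDP = 0.63).

The sketch below assumes the usual percentile definition, VDP = (x_50 − x_84.1)/x_50. Here x_84.1 is the value exceeded by 84.1% of the population, which lies one transformed standard deviation below the median. For the power-normal case, applying this definition to Eq. 5-7 with truncation ignored makes VDP depend on μ and λ as well as σ. That extension is an assumption of this sketch, not a formula stated in the thesis.

```python
import math


def vdp(sigma, lam=0.0, mu=0.0):
    """Dykstra-Parsons coefficient from population percentiles.

    VDP = (x50 - x15.9) / x50, where x15.9 lies one transformed standard
    deviation below the median. lam = 0 is the log-normal case; for lam != 0
    truncation is ignored and 1 + lam*(mu - sigma) must be positive.
    """
    if lam == 0.0:
        return 1.0 - math.exp(-sigma)
    x50 = (1.0 + lam * mu) ** (1.0 / lam)
    x159 = (1.0 + lam * (mu - sigma)) ** (1.0 / lam)
    return (x50 - x159) / x50


for s in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(s, round(vdp(s), 3), round(vdp(s, lam=0.25, mu=5.0), 3))
```

As λ tends to zero with μ fixed, the power-normal expression converges to the log-normal value 1 − e^{−σ}. For λ > 0, the same σ maps to a smaller VDP than in the log-normal case.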
Fig. 5-6 – Standard errors of the AA, SR, and PT versus 1/√N for four different values of λ (1/2, 1/4, 1/8, and 1/16) and σ (0.5, 1.0, 1.5, and 2.0).

Consistency, another property, describes how n affects the accuracy of the estimates. As mentioned before, a mean estimator is consistent if its RMSE approaches zero for very large n, which requires both bias and SE to tend to zero as n approaches infinity. All SE's tend to zero for very large n (Fig. 5-6), so any non-zero limiting RMSE is caused by non-zero bias. The AA is unbiased and is therefore consistent for all σ and n. SR and PT, on the other hand, are inconsistent because of their biases (Fig. 5-7). The biases of SR and PT are very small when λ = 1/2 and λ = 1/4, but they increase considerably as λ approaches zero (Fig. 5-4). For example, when λ = 1/2 and σ = 12, SR and PT estimate the mean value with at most 0.5% and 0.15% errors, respectively. SR and PT can therefore be expected to be approximately consistent for some λ values (Fig. 5-7a and 5-7b).

When σ < 1, the AA is the most efficient mean estimator regardless of n and λ. As σ increases, however, each of the AA, SR, and PT can have the smallest RMSE for certain ranges of n and σ, depending on λ. For instance, when λ ≥ 1/4, the AA is the most efficient mean estimator, and SR has slightly higher efficiency than PT regardless of n (Fig. 5-7a and 5-7b). For λ < 1/4, however, SR becomes the most efficient mean estimator for certain ranges of n and σ. PT gives a mean with significantly smaller bias than SR for small λ (Fig.
5-4). Therefore, the RMSE of SR deviates significantly from zero, whereas the RMSE of PT deviates only slightly from zero (Fig. 5-7c and 5-7d).

Fig. 5-7 – RMSE's of the AA, SR, and PT versus 1/√N for four different values of λ (1/2, 1/4, 1/8, and 1/16) and σ (0.5, 1.0, 1.5, and 2.0).

5.4 Improving Swanson's Rule

Zero bias is a desirable property of a mean estimator, but it is not necessarily the most important one. A biased mean estimator can be converted into an unbiased one using a correction factor or a modification of its formula.

One way to de-bias SR is to calculate its weights analytically by setting x_SR = E(X), as described in Chapter Three; the result is designated SRC. For simplicity, K is assumed to be large enough that truncation can be neglected. Equating Eq. 3-16 to E(X) yields

$$\omega = \frac{\displaystyle\sum_{v=2,\,v\ \mathrm{even}}^{\infty} (1+\mu\lambda)^{1/\lambda - v}\,\frac{\sigma^{v}}{2^{v/2}\,(v/2)!}\prod_{j=0}^{v-1}(1-j\lambda)}{[1+\lambda\sigma w_{10}+\lambda\mu]^{1/\lambda} - 2\,[1+\lambda\mu]^{1/\lambda} + [1+\lambda\sigma w_{90}+\lambda\mu]^{1/\lambda}}, \qquad (5\text{-}15)$$

where w_u = Φ^{-1}(u/100) and Φ denotes the standard normal CDF. Table 5-1 provides the derived ω's for power-normal distributions with several λ values. For λ = 1/2 and λ = 1/3, SR can be converted into an unbiased mean estimator with only a 1.3% change in its weights, regardless of μ and σ. For other λ values, however, ω must be adjusted according to μ and σ.

Table 5-1 – Derived ω's for some power-normal distributions with different λ's.
λ = 1/6: $\omega = \dfrac{5\sigma^2(\mu/6+1)^4/12 + 5\sigma^4(\mu/6+1)^2/144 + 5\sigma^6/15552}{2(\sigma w_{90}/6)^6 + 30(\mu/6+1)^4(\sigma w_{90}/6)^2 + 30(\mu/6+1)^2(\sigma w_{90}/6)^4}$
λ = 1/4: $\omega = \dfrac{3\sigma^2(\mu/4+1)^2/8 + 3\sigma^4/256}{2\sigma^4 w_{90}^4/256 + 12(\mu/4+1)^2\sigma^2 w_{90}^2/16}$
λ = 1/3: ω = 0.304
λ = 1/2: ω = 0.304

As derived in Eq. 5-15 and shown in Table 5-1, ω is a function of μ and σ, except for some λ's such as 1/2 and 1/3. This dependency means that these population parameters must be known to adjust ω. In most cases, however, neither is known, so both must be estimated from the available data set. Using the sample transformed mean, m, the sample standard deviation, s, and the sample exponent, λ̂, introduces errors into the estimate of ω, ω̂, and consequently into the estimated mean value. Many researchers have investigated how λ can be estimated from a set of observations (Box and Cox 1964; Hinkley 1975; Emerson and Stoto 1982).

Assume that λ is known but μ and σ are unknown. Under this assumption, for a 95% confidence interval, ω is estimated with at most 1.5% error when λ = 1/8, compared with the case in which μ and σ are known (Fig. 5-8). As λ increases and/or σ decreases, this error drops considerably, to at most 0.2% when λ = 1/4.

Fig. 5-8 – Adjusted weights of SR (ω) versus σ (n=200, µ=5.0) for three λ values (1/8, 1/6, and 1/4) when σ and µ are known and unknown, with error bars showing a 95% confidence interval.

Using ω̂ introduces error into the estimation of E(x_SRC). For example, for σ = 2.0, the errors are at most 0.3% when λ = 1/6 and 0.04% when λ = 1/4, compared with the case where μ and σ are known (Fig. 5-9b and 5-9d). As seen in Fig. 5-9, the error in the estimation of E(x_SRC) drops as σ decreases and/or n increases.
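Eq. 5-15 and the plug-in procedure described above can be sketched as follows. The sketch rests on these assumptions:
- Truncation is neglected (large K).
- λ is treated as known, and μ and σ are replaced by the sample mean and standard deviation of the transformed data.
- Sample percentiles follow `statistics.quantiles(..., method="inclusive")`.

Function names are hypothetical. For λ = 1/2 and λ = 1/3, the weight reduces algebraically to 1/(2w_90²) ≈ 0.304 for any μ and σ, which is consistent with Table 5-1.

```python
import math
import random
import statistics
from statistics import NormalDist

W90 = NormalDist().inv_cdf(0.90)  # w_90 = Phi^{-1}(0.9); w_10 = -w_90 by symmetry


def t_v(v, mu, lam, r=1.0):
    """T^(v)(mu) = (1 + lam*mu)^(r/lam - v) * prod_{j=0}^{v-1} (r - j*lam)."""
    prod = 1.0
    for j in range(v):
        prod *= r - j * lam
    return (1.0 + lam * mu) ** (r / lam - v) * prod


def omega_src(mu, sigma, lam, terms=60):
    """Eq. 5-15 with truncation neglected. The series terminates when 1/lam is a
    positive integer, as for every lam in Table 5-1."""
    numer = sum(t_v(v, mu, lam) * sigma**v / (2 ** (v // 2) * math.factorial(v // 2))
                for v in range(2, terms, 2))

    def pct(w):
        return (1.0 + lam * sigma * w + lam * mu) ** (1.0 / lam)

    denom = pct(-W90) - 2.0 * pct(0.0) + pct(W90)
    return numer / denom


def src_estimate(data, lam):
    """Plug-in SRC: lam known; mu and sigma replaced by sample m and s of the transformed data."""
    y = [(x**lam - 1.0) / lam for x in data]
    w_hat = omega_src(statistics.fmean(y), statistics.stdev(y), lam)
    q = statistics.quantiles(data, n=100, method="inclusive")  # q[k - 1] = k-th percentile
    return w_hat * q[9] + (1.0 - 2.0 * w_hat) * q[49] + w_hat * q[89], w_hat


print(omega_src(5.0, 1.0, 0.5), 1.0 / (2.0 * W90**2))
rng = random.Random(3)
sample = [(1.0 + 0.5 * rng.gauss(5.0, 1.0)) ** 2 for _ in range(20000)]  # lam = 1/2, mu = 5, sigma = 1
print(src_estimate(sample, 0.5))
```

For λ = 1/4, the series reproduces the closed-form entry of Table 5-1 term by term. Its v = 2 and v = 4 terms are exactly 3σ²(μ/4+1)²/8 and 3σ⁴/256.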
The expected value and variance of SRC are also derived analytically, using Eqs. 5-5 and 5-6 (with Eqs. 5-7 through 5-9) and the weight ω obtained from Eq. 5-15 (Fig. 5-9, solid lines). In the analytical derivation, all population parameters, μ, σ, and λ, are assumed to be known.

Fig. 5-9 – E(xSR_C) derived analytically and calculated numerically using MC simulation, with error bars showing the 95% confidence interval, when σ and µ are either known or unknown: (a) σ=1.5, λ=1/6; (b) σ=2.0, λ=1/6; (c) σ=1.5, λ=1/4; (d) σ=2.0, λ=1/4.

Fig. 5-10 – Ratio of the expected value of SRC to E(X) versus σ for four λ values (1/2, 1/4, 1/8, and 1/16).

The de-biased SR approach makes SR unbiased, as the ratio E(x_SRC)/E(X) is almost one. However, once σ exceeds three, the ratio deviates from unity for λ = 1/2 and λ = 1/4 (Fig. 5-10). Under this condition (σ > 3), the assumption required by Eq. 5-15 (K sufficiently large) is no longer satisfied, so Eq. 5-15 is no longer valid. The performance comparisons of SR, SRC, the AA, and PT are therefore limited to the range σ ∈ [0, 3].

One may wonder how this modification affects the SE and RMSE of SR. The modification makes the SE of SRC larger than that of SR for any σ, n, and λ (Fig. 5-11). This increase is insignificant for large λ's but becomes considerable for small λ's.
For instance, when $\sigma = 3$, the SE increases by only 0.74% when $\lambda = 1/2$; however, as $\lambda$ decreases to 1/16, the SE of SRC is 35% larger than that of SR. SRC has a smaller SE than PT and a larger SE than the AA for any $\sigma$, $n$, and $\lambda$, except for very small $\lambda$, such as $\lambda = 1/16$ (Fig. 5-11 and 5-12).

Fig. 5-11 – SEs of the AA, SR, SRC, and PT versus 1/√N for σ = 1.0, 2.0, and 3.0 and four values of λ: (a) 1/2, (b) 1/4, (c) 1/8, (d) 1/16.

Fig. 5-12 – σ versus n for λ = 1/16 showing regions where SRC has a smaller SE than (a) PT and (b) the AA.

Fig. 5-13 – RMSEs of the AA, SR, SRC, and PT versus 1/√N for σ = 1.0, 2.0, and 3.0 and four values of λ: (a) 1/2, (b) 1/4, (c) 1/8, (d) 1/16.

The SR modification converts SR into a consistent mean estimator but increases its RMSE, except for some ranges of $n$ and $\sigma$ that depend on $\lambda$ (Fig. 5-13). The AA is more efficient than SRC for any $n$ and $\sigma$ except for small $\lambda$; for example, when $\lambda = 1/16$, the AA is more efficient than SRC only when $\sigma < 2.067$ (Fig. 5-12 and 5-13d).
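The trade-off described above, in which SRC accepts a larger SE in exchange for zero bias, follows from the decomposition $\mathrm{RMSE}^2 = \mathrm{SE}^2 + \mathrm{bias}^2$. The minimal Python sketch below illustrates it. The bias values and SE constants are hypothetical numbers chosen for illustration, not values read from Figs. 5-11 to 5-13, and the assumption that the SE shrinks like $c/\sqrt{n}$ is likewise only illustrative.

```python
import math


def rmse(bias, se):
    """Root-mean-square error from its decomposition RMSE^2 = bias^2 + SE^2."""
    return math.sqrt(bias ** 2 + se ** 2)


def rmse_curve(bias, c, ns):
    """RMSE versus n, ASSUMING (hypothetically) that the SE shrinks like c / sqrt(n)."""
    return [rmse(bias, c / math.sqrt(n)) for n in ns]


ns = [25, 100, 1000, 100000]
# Hypothetical estimators: "biased" mimics SR (nonzero bias, smaller SE constant),
# "unbiased" mimics SRC (zero bias, larger SE constant).
biased = rmse_curve(bias=-0.5, c=8.0, ns=ns)
unbiased = rmse_curve(bias=0.0, c=9.0, ns=ns)

for n, b, u in zip(ns, biased, unbiased):
    winner = "unbiased" if u < b else "biased"
    print(f"n={n:>6}: RMSE biased={b:.3f}, unbiased={u:.3f} -> more efficient: {winner}")
```

With these numbers the biased estimator wins at small $n$, the unbiased one wins as $n$ grows, and the biased estimator's RMSE levels off at $|\mathrm{bias}|$ instead of approaching zero, which is what makes it inconsistent.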
SRC is more efficient than PT for some ranges of $n$ and $\sigma$ that depend on $\lambda$. For instance, when $\lambda = 1/16$ and $\sigma = 3.0$, PT is more efficient than SRC for $n < 52$ (Fig. 5-13d); however, when $\lambda = 1/4$, SRC is more efficient for any $\sigma$ and $n$ (Fig. 5-13b). As mentioned before, SRC has a higher SE than SR for any $n$ and $\sigma$, but it is unbiased. Its larger SE can therefore be offset by its zero bias, so that SRC becomes more efficient than SR for some ranges of $n$ and $\sigma$ that depend on $\lambda$ (Fig. 5-14).

Fig. 5-14 – σ versus n showing regions where SRC is more efficient than SR: (a) λ = 1/2; (b) λ = 1/4, 1/8, and 1/16, where SRC is more efficient than SR when σ is greater than the value given by each curve for the given n; otherwise SR is more efficient.

5.5 Concluding Remarks

This chapter demonstrates that as $\lambda$ approaches one, SR and PT approximate the mean with insignificant bias; however, their biases rise significantly as $\lambda$ approaches zero. SR underestimates the mean with a larger bias than PT, but it has a smaller uncertainty for any variability, sample size, and $\lambda$. The AA has the smallest uncertainty when $\lambda = 1$ and $\lambda = 1/2$, but for other $\lambda$ values, the AA has a smaller uncertainty than SR only for certain ranges of variability and sample size. None of the mean estimators under review is an absolute winner in terms of efficiency for all variabilities, sample sizes, and $\lambda$ values. SR can therefore be the optimum mean estimator for some ranges of variability and sample size, where its bias is compensated by its smaller uncertainty, giving it higher efficiency.
Chapter 6 : Performance Evaluation for the Case of Auto-Correlated Random Variables

Although SR is used as an alternative to the AA for estimating mean values in the oil and gas industry, only a few researchers, such as Keefer and Bodily (1983), Megill (1984), and Bickel et al. (2011), have studied its performance, and then only in terms of its bias for independent and identically distributed samples. Reservoir parameters, however, might depend on their values at neighbouring points. For example, permeability measured along a well might be auto-correlated, reflecting the sequence of lithofacies in the well. It appears that no attention has been paid to the performance of SR when samples are auto-correlated. This chapter therefore evaluates the performance of SR and compares it with the performance of the AA, MLE, and PT when the RVs are dependent and follow the first-order auto-regressive model. The mean estimators' properties, including uncertainty, consistency, and efficiency, are evaluated analytically for the log-normal distribution, and the analytical expressions are then validated using MC simulation.

6.1 Assumptions

The assumptions used in this chapter to derive the properties of the mean estimators analytically are as follows. The RVs $X_1, \dots, X_n$ are assumed to be log-normally distributed with $E[\ln(X)] = \mu$ and $\mathrm{Var}[\ln(X)] = \sigma^2$ and to follow the first-order auto-regressive model, AR(1). In other words, the RV $X$ can be converted into another RV $Y = \ln(X)$ with $Y \sim N(\mu, \sigma^2)$, and the RVs $Y_1, \dots, Y_n$ follow AR(1):
$$Y_z = C + \rho_1 Y_{z-1} + \varepsilon_z, \qquad \text{(6-1)}$$
where $C$ is a constant, $\varepsilon_z$ is a normally distributed RV with mean $\mu_\varepsilon$ and variance $\sigma_\varepsilon^2$, the subscript $z$ denotes the location at which $Y$ is measured, and $\rho_1$ is the correlation coefficient between the pairs $Y_z$ and $Y_{z-1}$.
$\{Y_z\}$ is called a first-order auto-regressive process (i.e., the observation at location $z$ depends on the observation at location $z-1$, with correlation coefficient $\rho_1$, and on $\varepsilon_z$). Eq. 6-1 can be treated as a linear regression between $Y_z$ and $Y_{z-1}$ with $\varepsilon_z$ as the error term. $\{Y(z)\}$ is assumed to be completely stationary, which means that the joint distribution of $\{Y(z_1), \dots, Y(z_n)\}$ is identical to the joint distribution of $\{Y(z_1+\Delta z), \dots, Y(z_n+\Delta z)\}$ for any $z$ and $\Delta z$, where $n$ is the number of samples (i.e., $E(Y_z) = \mu$ for all $z$, $\mathrm{Var}(Y_z) = \sigma^2$ for all $z$, etc.). Under these assumptions, $C = \mu(1-\rho_1)$, $\mu_\varepsilon = 0$, and $\sigma_\varepsilon^2 = (1-\rho_1^2)\sigma^2$ (see Appendix F for the derivations).

Although the AR(1) model considers only the first-step dependency, the correlation coefficient $\rho_\tau$ between pairs of values of $\{Y(z)\}$ separated by an interval $\tau$,
$$\rho_\tau = \frac{\mathrm{Cov}\{Y(z), Y(z-\tau)\}}{\sigma\{Y(z)\}\,\sigma\{Y(z-\tau)\}},$$
becomes smaller as $\tau$ increases and approaches zero for large $\tau$. The reason is that $Y_z$ is related to $Y_{z-1}$, and $Y_{z-1}$ is related to $Y_{z-2}$; consequently, $Y_z$ is related to $Y_{z-2}$ with a smaller correlation coefficient, and so on. Since $\rho_\tau$ is an even function of $\tau$ when $Y_z$ is real-valued, it can be written as (Priestley 1981)
$$\rho_\tau = \rho_1^{|\tau|}, \qquad \tau = 0, \pm 1, \pm 2, \dots \qquad \text{(6-2)}$$
After $Y(z)$ is generated, the RV $X(z)$ is obtained by the exponential transformation $X(z) = e^{Y(z)}$. The correlation coefficient between $X(z)$ and $X(z-\tau)$ is (Vanmarcke 2010)
$$\rho_x(\tau) = \frac{e^{\sigma^2 \rho_\tau} - 1}{e^{\sigma^2} - 1}. \qquad \text{(6-3)}$$
In other words, $X_z = C' + \rho_{x1} X_{z-1} + \varepsilon'_z$, where $\rho_{x1}$ is the correlation coefficient between $X_z$ and $X_{z-1}$, $C' = e^{\mu+\sigma^2/2}(1-\rho_{x1})$, and $\varepsilon'_z$ is normally distributed with mean $E(\varepsilon') = \mu_{\varepsilon'} = 0$ and variance $\mathrm{Var}(\varepsilon') = \sigma_{\varepsilon'}^2 = (1-\rho_{x1}^2)\,e^{2\mu+\sigma^2}(e^{\sigma^2}-1)$.

6.2 Analytical Expressions of Mean Estimators' Properties

In this section, the expected values and SEs of the mean estimators are derived analytically under the assumptions above. The expected value of the AA is $E(x_A) = E(X)$, and its variance is
$$\mathrm{Var}(x_{A,de}) = \frac{\mathrm{Var}(X)}{n}\left[1 + 2\sum_{\tau=1}^{n-1}\left(1-\frac{\tau}{n}\right)\rho_x(\tau)\right], \qquad \text{(6-4)}$$
where $\mathrm{Var}(X) = e^{2\mu+\sigma^2}(e^{\sigma^2}-1)$ is the variance of the RV $X$, $n$ is the number of samples, and the subscript $de$ stands for the dependent case (Vicens and Schaake 1972). Eq. 6-4 reveals that $n$ auto-correlated samples are not equivalent to $n$ uncorrelated samples: to achieve a given accuracy, more or fewer auto-correlated samples are needed, depending on the sign of $\rho_1$. The equivalent number of uncorrelated samples is called the effective sample size (ESS), designated by $n_{eff}$. If $\rho_1 > 0$, more auto-correlated samples are needed; if $\rho_1 < 0$, fewer might be needed; and $n_{eff}$ approaches one for strong positive correlation ($\rho_1 \to 1$) (Priestley 1981). Among the approaches discussed by Thiebaux and Zwiers (1984) for calculating the ESS, this study uses a method that considers both the variance of the AA of independent data, $\mathrm{Var}(x_A)$, and the variance of the AA of dependent data, $\mathrm{Var}(x_{A,de})$. The ESS is then
$$n_{eff} = n \Big/ \left[1 + 2\sum_{\tau=1}^{n-1}\left(1-\frac{\tau}{n}\right)\rho_x(\tau)\right]. \qquad \text{(6-5)}$$
Thus, Eq. 6-4 can be rewritten as
$$\mathrm{Var}(x_{A,de}) = \frac{e^{2\mu+\sigma^2}(e^{\sigma^2}-1)}{n_{eff}}. \qquad \text{(6-6)}$$

As mentioned before, the statistical properties of SR and PT are functions of the statistical properties of the percentiles, so the mean and standard deviation of the percentiles are derived first. These depend on the log-mean $m_y = \sum_{i=1}^{n}\ln(x_i)/n$ and the log-standard deviation $s_y = \sqrt{\sum_{i=1}^{n}[\ln(x_i)-m_y]^2/(n-1)}$ (see Appendix G). Because $n_{eff}$ is derived from the variance of the AA, the ESS must be modified before it can be used to estimate the statistical properties of the percentiles and, consequently, of SR and PT. Zieba (2010) derived an expression for the sample variance of auto-correlated samples:
$$s_{de}^2 = \frac{(n-1)\,s^2}{n\left[1 - \dfrac{1 + 2\sum_{\tau=1}^{n-1}(1-\tau/n)\rho_\tau}{n}\right]}, \qquad \text{(6-7)}$$
where $s_{de}^2$ is the sample variance of $n$ auto-correlated samples and $s^2$ is the sample variance of $n$ uncorrelated samples. In other words, the variance of auto-correlated samples can alternatively be written as the variance of uncorrelated samples divided by the correction factor
$$\beta = \frac{n}{n-1}\left[1 - \frac{1 + 2\sum_{\tau=1}^{n-1}(1-\tau/n)\rho_\tau}{n}\right], \qquad \text{(6-8)}$$
where $\beta$ approaches one for large $n$ (Zieba 2010, Fig. 4). Hence, Eq. 6-7 is rewritten as
$$s_{de}^2 = \frac{1}{\beta}\,s^2. \qquad \text{(6-9)}$$
Taking the expectation of Eq. 6-9 yields
$$E(s_{de}^2) = \frac{1}{\beta}\,\sigma^2. \qquad \text{(6-10)}$$
Consequently, a new ESS is defined as
$$n_{eff}^* = \frac{n}{\beta\left\{1 + 2\sum_{\tau=1}^{n-1}(1-\tau/n)\rho_\tau\right\}}. \qquad \text{(6-11)}$$
That is, for estimating the mean with SR and PT to a given uncertainty, $n$ auto-correlated samples are equivalent to $n_{eff}^*$ uncorrelated samples.
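Eqs. 6-2 to 6-6, 6-8, and 6-11 can be evaluated directly. The minimal Python sketch below computes the lag-$\tau$ correlations, $n_{eff}$, $\mathrm{Var}(x_{A,de})$, $\beta$, and $n_{eff}^*$ for an AR(1) log-space process. The function names are hypothetical, and Eq. 6-11 is coded as $n/(\beta F)$, where $F$ denotes the bracketed correlation sum evaluated with $\rho_\tau$ of $Y$.

```python
import math


def rho_y(tau, rho1):
    """Eq. 6-2: lag-tau autocorrelation of Y = ln(X) under AR(1)."""
    return rho1 ** abs(tau)


def rho_x(tau, rho1, sigma):
    """Eq. 6-3: lag-tau autocorrelation of X = exp(Y)."""
    return (math.exp(sigma ** 2 * rho_y(tau, rho1)) - 1.0) / (math.exp(sigma ** 2) - 1.0)


def corr_sum(n, rho):
    """1 + 2 * sum_{tau=1}^{n-1} (1 - tau/n) * rho(tau): the bracket in Eqs. 6-4 to 6-11."""
    return 1.0 + 2.0 * sum((1.0 - tau / n) * rho(tau) for tau in range(1, n))


def n_eff(n, rho1, sigma):
    """Eq. 6-5: ESS for the arithmetic average, using rho_x(tau)."""
    return n / corr_sum(n, lambda t: rho_x(t, rho1, sigma))


def var_aa_dependent(n, mu, sigma, rho1):
    """Eq. 6-6: variance of the AA of n auto-correlated log-normal samples."""
    var_x = math.exp(2.0 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1.0)
    return var_x / n_eff(n, rho1, sigma)


def beta(n, rho1):
    """Eq. 6-8: correction factor for the sample variance, using rho_tau of Y."""
    return n / (n - 1.0) * (1.0 - corr_sum(n, lambda t: rho_y(t, rho1)) / n)


def n_eff_star(n, rho1):
    """Eq. 6-11: ESS used for the SR and PT properties."""
    return n / (beta(n, rho1) * corr_sum(n, lambda t: rho_y(t, rho1)))


if __name__ == "__main__":
    for r in (0.0, 0.3, 0.7):
        print(f"rho1={r}: n_eff={n_eff(100, r, 1.0):.1f}, "
              f"n_eff*={n_eff_star(100, r):.1f}, beta={beta(100, r):.4f}")
```

For uncorrelated data ($\rho_1 = 0$) every correction collapses: $F = 1$, $\beta = 1$, and both effective sample sizes equal $n$. If every lag were perfectly correlated, $F = n$ and $n_{eff} = 1$, matching the limit quoted above.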
The statistical properties of SR and PT are then given by
$$E(x_{SR}) = e^{\mu + \frac{\sigma^2}{2n_{eff}^*}}\left\{0.3\,e^{w_{10}E(s_y)}\left[1 + \sum_{k=2}^{4}\frac{w_{10}^k}{k!}T_k\right] + 0.4 + 0.3\,e^{w_{90}E(s_y)}\left[1 + \sum_{k=2}^{4}\frac{w_{90}^k}{k!}T_k\right]\right\}, \qquad \text{(6-12)}$$
$$E(x_{PT}) = e^{\mu + \frac{\sigma^2}{2n_{eff}^*}}\left\{0.185\,e^{w_{5}E(s_y)}\left[1 + \sum_{k=2}^{4}\frac{w_{5}^k}{k!}T_k\right] + 0.63 + 0.185\,e^{w_{95}E(s_y)}\left[1 + \sum_{k=2}^{4}\frac{w_{95}^k}{k!}T_k\right]\right\}, \qquad \text{(6-13)}$$
$$\begin{aligned}
\mathrm{Var}(x_{SR}) = {} & e^{2\mu + \frac{2\sigma^2}{n_{eff}^*}}\Big\{0.09\,e^{2w_{10}E(s_y)}\Big[1 + \sum_{k=2}^{4}\frac{(2w_{10})^k}{k!}T_k\Big] + 0.24\,e^{w_{10}E(s_y)}\Big[1 + \sum_{k=2}^{4}\frac{w_{10}^k}{k!}T_k\Big] + 0.16 \\
& + 0.09\,e^{2w_{90}E(s_y)}\Big[1 + \sum_{k=2}^{4}\frac{(2w_{90})^k}{k!}T_k\Big] + 0.24\,e^{w_{90}E(s_y)}\Big[1 + \sum_{k=2}^{4}\frac{w_{90}^k}{k!}T_k\Big] + 0.18\Big\} \\
& - \Big\{0.09\,E(x_{10})^2 + 0.16\,E(x_{50})^2 + 0.09\,E(x_{90})^2 + 0.24\,E(x_{10})E(x_{50}) + 0.24\,E(x_{50})E(x_{90}) + 0.18\,E(x_{10})E(x_{90})\Big\},
\end{aligned} \qquad \text{(6-14)}$$
and
$$\begin{aligned}
\mathrm{Var}(x_{PT}) = {} & e^{2\mu + \frac{2\sigma^2}{n_{eff}^*}}\Big\{0.034\,e^{2w_{5}E(s_y)}\Big[1 + \sum_{k=2}^{4}\frac{(2w_{5})^k}{k!}T_k\Big] + 0.233\,e^{w_{5}E(s_y)}\Big[1 + \sum_{k=2}^{4}\frac{w_{5}^k}{k!}T_k\Big] + 0.397 \\
& + 0.034\,e^{2w_{95}E(s_y)}\Big[1 + \sum_{k=2}^{4}\frac{(2w_{95})^k}{k!}T_k\Big] + 0.233\,e^{w_{95}E(s_y)}\Big[1 + \sum_{k=2}^{4}\frac{w_{95}^k}{k!}T_k\Big] + 0.068\Big\} \\
& - \Big\{0.034\,E(x_{5})^2 + 0.397\,E(x_{50})^2 + 0.034\,E(x_{95})^2 + 0.233\,E(x_{5})E(x_{50}) + 0.233\,E(x_{50})E(x_{95}) + 0.068\,E(x_{5})E(x_{95})\Big\},
\end{aligned} \qquad \text{(6-15)}$$
where the PT coefficients are $(0.185)^2 \approx 0.034$, $2(0.185)(0.63) \approx 0.233$, $(0.63)^2 \approx 0.397$, and $2(0.185)^2 \approx 0.068$, in the same way that the SR coefficients are $(0.3)^2$, $2(0.3)(0.4)$, $(0.4)^2$, and $2(0.3)^2$; $E(x_5)$, $E(x_{10})$, $E(x_{50})$, $E(x_{90})$, and $E(x_{95})$ are the expected values of the estimated percentiles (the individual terms of Eqs. 6-12 and 6-13); $T_k = E\{[s_y - E(s_y)]^k\}$; and
$$E(s_y) = \frac{\Gamma(n_{eff}^*/2)}{\Gamma[(n_{eff}^*-1)/2]}\left[\frac{2\sigma^2}{n_{eff}^*-1}\right]^{1/2}$$
(see Appendix G for the derivations).

The expected value and SE of MLE are derived using two effective sample sizes: $n_{m_y} = n/\{1 + 2\sum_{\tau=1}^{n-1}(1-\tau/n)\rho_\tau\}$, which is derived from the variance of the sample mean of $y = \ln(x)$, and $n_v^*$, which was introduced by Bayley and Hammersley (1946) using the variance of the sample variance. Hence, the statistical properties of MLE are (see Appendix H for the derivations)
$$E(x_{MLE}) = e^{\mu + \frac{\sigma^2}{2n_{m_y}}}\left(1 - \frac{\sigma^2}{n_v^*-1}\right)^{-\frac{n_v^*-1}{2}}, \qquad \text{(6-16)}$$
and
$$\mathrm{Var}(x_{MLE}) = e^{2\mu + \frac{\sigma^2}{n_{m_y}}}\left\{e^{\frac{\sigma^2}{n_{m_y}}}\left(1 - \frac{2\sigma^2}{n_v^*-1}\right)^{-\frac{n_v^*-1}{2}} - \left(1 - \frac{\sigma^2}{n_v^*-1}\right)^{-(n_v^*-1)}\right\}. \qquad \text{(6-17)}$$

6.3 Analytical Expression Validations Using Monte Carlo Simulation

Since the term $e^{\mu} = x_{50}$ is common to the expressions derived in the previous section, the ratios of the analytical expected value and SE to $x_{50}$ are used to validate the analytical expressions numerically with MC simulation. For this purpose, $m = 30{,}000$ data sets containing $n = 25$ to $3000$ samples of $x$ are drawn from a log-normal distribution with a log-mean of 4.6 and a log-standard deviation $\sigma$ varying from 0.05 to 1.5, with correlation coefficient $\rho_{x1} = 0.7$.

Fig. 6-1 – Expected value of the AA/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing a 95% confidence interval: (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5 (horizontal axis 1/√n).

Fig. 6-2 – Standard error of the AA/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing a 95% confidence interval: (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5.

The analytical ratios of the expected value and SE of the AA to $x_{50}$ agree with the MC simulation (Fig. 6-1 and 6-2).
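The validation procedure described above can be sketched for the AA, for which Eqs. 6-4 to 6-6 hold exactly when $\ln(X)$ is a stationary Gaussian AR(1) process. The minimal Python sketch below generates $Y$ with Eq. 6-1 ($C = \mu(1-\rho_1)$, $\sigma_\varepsilon^2 = (1-\rho_1^2)\sigma^2$), exponentiates it, and compares the MC mean and SD of the AA with the analytical values. Converting the target $\rho_{x1}$ into the log-space $\rho_1$ by inverting Eq. 6-3 at lag one is an assumption of this sketch, as are the function names and the much smaller $m$ used to keep the run short.

```python
import math
import random
import statistics


def rho1_from_rho_x1(rho_x1, sigma):
    """ASSUMED conversion: invert Eq. 6-3 at lag 1 (rho_tau = rho1) for the log-space rho1."""
    return math.log(1.0 + rho_x1 * (math.exp(sigma ** 2) - 1.0)) / sigma ** 2


def analytic_aa(mu, sigma, rho1, n):
    """E(x_A) = E(X) and SD(x_A,de) from Eqs. 6-3 to 6-6."""
    def rho_x(t):
        return (math.exp(sigma ** 2 * rho1 ** t) - 1.0) / (math.exp(sigma ** 2) - 1.0)
    factor = 1.0 + 2.0 * sum((1.0 - t / n) * rho_x(t) for t in range(1, n))
    var_x = math.exp(2.0 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1.0)
    return math.exp(mu + sigma ** 2 / 2.0), math.sqrt(var_x * factor / n)


def mc_aa(mu, sigma, rho1, n, m, seed=0):
    """MC mean and SD of the AA over m data sets of n AR(1) log-normal samples."""
    rng = random.Random(seed)
    c = mu * (1.0 - rho1)                          # C = mu (1 - rho1)
    sd_eps = sigma * math.sqrt(1.0 - rho1 ** 2)    # sigma_eps
    averages = []
    for _ in range(m):
        y = rng.gauss(mu, sigma)                   # start in the stationary distribution
        total = 0.0
        for _ in range(n):
            total += math.exp(y)                   # X_z = exp(Y_z)
            y = c + rho1 * y + rng.gauss(0.0, sd_eps)  # Eq. 6-1
        averages.append(total / n)
    return statistics.fmean(averages), statistics.stdev(averages)


if __name__ == "__main__":
    mu, sigma, n = 4.6, 0.5, 25
    rho1 = rho1_from_rho_x1(0.7, sigma)
    x50 = math.exp(mu)
    e_an, sd_an = analytic_aa(mu, sigma, rho1, n)
    e_mc, sd_mc = mc_aa(mu, sigma, rho1, n, m=5000)
    print(f"E(x_A)/x50:  analytic {e_an / x50:.4f}, MC {e_mc / x50:.4f}")
    print(f"SD(x_A)/x50: analytic {sd_an / x50:.4f}, MC {sd_mc / x50:.4f}")
```

Because both sides of the comparison are divided by $x_{50} = e^{\mu}$, the ratios do not depend on the log-mean, which is why the figures above report them instead of raw values.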
The discrepancy between the analytical and MC simulation results for $E(x_{SR})/x_{50}$ is at most 8% (at the 95% confidence level) when $\sigma = 1.5$ and $n = 25$; the difference decreases, for instance to 2.0% when $\sigma$ decreases to 0.5, and approaches zero as $n$ becomes very large (Fig. 6-3). The difference between the analytical and MC simulation results for $\mathrm{Std}(x_{SR})/x_{50}$ is at most 18% when $\sigma = 1.5$ and $n = 25$; however, it decreases sharply as $\sigma$ decreases and $n$ becomes very large (Fig. 6-4).

Fig. 6-3 – Expected value of SR/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing a 95% confidence interval: (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5.

Fig. 6-4 – Standard error of SR/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing a 95% confidence interval: (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5.

MC simulation gives values of $E(x_{PT})/x_{50}$ at most 20% smaller than the analytical approach for $\sigma = 1.5$ and $n = 25$. The difference decreases, for instance to 1.0% when $\sigma$ decreases to 0.5, and approaches zero as $n$ becomes very large (Fig. 6-5).
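The closed-form expressions of Section 6.2 that are compared with MC results here can be evaluated in a few lines. The minimal Python sketch below codes $E(s_y)$ and Eqs. 6-12, 6-13, 6-16, and 6-17. It rests on two assumptions that are not stated in the text and are labelled in the code: the percentile scores $w_p$ are taken as standard normal quantiles, and the central moments $T_k$ are computed by treating $(n_{eff}^*-1)s_y^2/\sigma^2$ as chi-square with $n_{eff}^*-1$ degrees of freedom. That treatment reproduces the $E(s_y)$ expression above, but Appendix G may compute $T_k$ differently. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def _raw_moment_sy(j, sigma, n_star):
    """E(s_y^j), ASSUMING (n*-1) s_y^2 / sigma^2 ~ chi-square(n*-1)."""
    nu = n_star - 1.0
    return (2.0 * sigma ** 2 / nu) ** (j / 2.0) * math.exp(
        math.lgamma((nu + j) / 2.0) - math.lgamma(nu / 2.0))


def expected_sy(sigma, n_star):
    """E(s_y) = Gamma(n*/2) [2 sigma^2 / (n*-1)]^(1/2) / Gamma[(n*-1)/2]."""
    return _raw_moment_sy(1, sigma, n_star)


def t_k(k, sigma, n_star):
    """T_k = E{[s_y - E(s_y)]^k}, from the raw moments via the binomial expansion."""
    m1 = expected_sy(sigma, n_star)
    return sum(math.comb(k, j) * _raw_moment_sy(j, sigma, n_star) * (-m1) ** (k - j)
               for j in range(k + 1))


def percentile_factor(w, sigma, n_star):
    """e^{w E(s_y)} [1 + sum_{k=2}^{4} w^k T_k / k!]: the bracketed terms of Eqs. 6-12/6-13."""
    series = 1.0 + sum(w ** k * t_k(k, sigma, n_star) / math.factorial(k) for k in range(2, 5))
    return math.exp(w * expected_sy(sigma, n_star)) * series


def expected_sr(mu, sigma, n_star):
    """Eq. 6-12; w10 = -w90 ASSUMED to be standard normal quantiles."""
    w90 = NormalDist().inv_cdf(0.90)
    lead = math.exp(mu + sigma ** 2 / (2.0 * n_star))
    return lead * (0.3 * percentile_factor(-w90, sigma, n_star) + 0.4
                   + 0.3 * percentile_factor(w90, sigma, n_star))


def expected_pt(mu, sigma, n_star):
    """Eq. 6-13; w5 = -w95 ASSUMED to be standard normal quantiles."""
    w95 = NormalDist().inv_cdf(0.95)
    lead = math.exp(mu + sigma ** 2 / (2.0 * n_star))
    return lead * (0.185 * percentile_factor(-w95, sigma, n_star) + 0.63
                   + 0.185 * percentile_factor(w95, sigma, n_star))


def expected_mle(mu, sigma, n_my, n_v):
    """Eq. 6-16 (requires n_v - 1 > sigma^2)."""
    d = n_v - 1.0
    return math.exp(mu + sigma ** 2 / (2.0 * n_my)) * (1.0 - sigma ** 2 / d) ** (-d / 2.0)


def var_mle(mu, sigma, n_my, n_v):
    """Eq. 6-17 (requires n_v - 1 > 2 sigma^2)."""
    d = n_v - 1.0
    second = math.exp(sigma ** 2 / n_my) * (1.0 - 2.0 * sigma ** 2 / d) ** (-d / 2.0)
    first_sq = (1.0 - sigma ** 2 / d) ** (-d)
    return math.exp(2.0 * mu + sigma ** 2 / n_my) * (second - first_sq)


if __name__ == "__main__":
    mu, sigma = 4.6, 1.0
    x50 = math.exp(mu)
    for n_star in (25.0, 100.0, 1000.0):
        print(f"n*={n_star:>6}: E(x_SR)/x50={expected_sr(mu, sigma, n_star) / x50:.4f}, "
              f"E(x_PT)/x50={expected_pt(mu, sigma, n_star) / x50:.4f}")
```

For large effective sample sizes $E(x_{MLE})$ tends to the population mean $e^{\mu+\sigma^2/2}$, while $E(x_{SR})$ settles between the median $e^{\mu}$ and that mean, consistent with SR underestimating the mean of a log-normal population.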
The difference between the analytical and MC simulation results for $\mathrm{Std}(x_{PT})/x_{50}$ is at most 40% when $\sigma = 1.5$ and $n = 25$; however, it decreases sharply as $\sigma$ decreases and $n$ becomes very large (Fig. 6-6).

Fig. 6-5 – Expected value of PT/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing a 95% confidence interval: (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5.

Fig. 6-6 – Standard error of PT/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing a 95% confidence interval: (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5.

The discrepancy between the analytical and MC results for $E(x_{MLE})/x_{50}$ reaches at most 27% for $\sigma = 1.5$ and $n = 25$. The difference decreases, for instance to 10% when $\sigma$ decreases to 1.0, and approaches zero as $n$ becomes very large (Fig. 6-7). The difference between the analytical and MC simulation results for $\mathrm{Std}(x_{MLE})/x_{50}$ is at most 54% when $\sigma = 1.5$ and $n = 25$; however, it decreases sharply as $\sigma$ decreases and $n$ becomes very large (Fig. 6-8).
Fig. 6-7 – Expected value of MLE/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing a 95% confidence interval: (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5.

Fig. 6-8 – Standard error of MLE/x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars showing a 95% confidence interval: (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5.

The analytical expressions derived in the previous section are also validated for $\rho_{x1} = 0.3$ and $\rho_{x1} = 0.0$ (Fig. 6-9 through Fig. 6-12, shown only for $\sigma = 1.5$). These figures show insignificant discrepancies between the analytical expressions and the MC simulation results. For MLE, for example, when $\rho_{x1}$ decreases from 0.7 to 0.3 with $n = 25$, the difference between the analytical and MC results decreases from 27% to 8% for $E(x_{MLE})/x_{50}$ and from 54% to 9% for $\mathrm{Std}(x_{MLE})/x_{50}$ (compare Fig. 6-7d with Fig. 6-9d, and Fig. 6-8d with Fig. 6-11d). When $\rho_{x1}$ decreases from 0.3 to 0.0, the discrepancy increases slightly but remains smaller than for $\rho_{x1} = 0.7$ (Fig. 6-10 and Fig. 6-12).
The discrepancy between the MC simulation and analytical results for SR and PT when $\rho_{x1} = 0.0$ is slightly higher than in Chapter Three (independent log-normally distributed random variables). The analytical expressions shown in Fig. 6-10 and Fig. 6-12 give smaller expected values and standard errors than the MC results, because the expansion of the term $e^{s_y w_u}$ in Eqs. G-1 and G-2 is truncated after the fourth term (Eq. G-7), whereas there is no truncation in Eqs. A-2 and A-3.

Fig. 6-9 – The ratio of expected values of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting a 95% confidence interval, when ρx1 = 0.3 and σ = 1.5: (a) AA, (b) SR, (c) PT, (d) MLE.

Fig. 6-10 – The ratio of expected values of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting a 95% confidence interval, when ρx1 = 0.0 and σ = 1.5: (a) AA, (b) SR, (c) PT, (d) MLE.
Fig. 6-11 – The ratio of standard errors of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting a 95% confidence interval, when ρx1 = 0.3 and σ = 1.5: (a) AA, (b) SR, (c) PT, (d) MLE.

Fig. 6-12 – The ratio of standard errors of mean estimators to x50 obtained from analytical expressions and computed numerically using MC simulation, with error bars depicting a 95% confidence interval, when ρx1 = 0.0 and σ = 1.5: (a) AA, (b) SR, (c) PT, (d) MLE.

6.4 Analysis of the Analytical Expressions of the Mean Estimators' Properties

As shown in the previous section, the analytical results match the MC simulation results well for a small correlation coefficient. For a large correlation coefficient, the analytical results are approximately identical to the MC results for $\sigma < 1$, but there are discrepancies when $\sigma \ge 1.0$, especially when $n$ is small. Overall, as $n$ increases, the analytical results follow the numerical results with insignificant error regardless of variability. The analytical expressions are therefore used in this section to analyze the mean estimators' properties for $n \ge 100$.

The consistency, uncertainty, and efficiency are evaluated in this section for $\rho_{x1} = 0.7$. Fig. 6-13 compares the SEs of the mean estimators; the estimator with the smaller SE is less uncertain. When $\sigma \le 0.5$, the SEs of the AA, SR, PT, and MLE are approximately identical (Fig. 6-13a); as $\sigma$ increases, however, the SEs differ. When $\sigma > 1$, PT has a larger SE than SR for any $n$ and $\sigma$, but a smaller SE than the AA for certain ranges of $n$ and $\sigma$ (Fig. 6-13d). The SE of the AA is approximately identical to that of SR when $\sigma = 1$; however, as $\sigma$ exceeds one, SR has a smaller SE than the AA and MLE for any $n$ (Fig. 6-13d).

Fig. 6-13 – Analytical standard errors/x50 of the AA, SR, PT, and MLE: (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5.

Figure 6-14 compares the RMSEs of the mean estimators; the estimator with the smaller RMSE is more efficient. When $\sigma \le 0.5$, all mean estimators have approximately identical efficiency; as $\sigma$ increases, however, they perform differently. For instance, in the heterogeneous case $\sigma = 1.5$, SR is the most efficient mean estimator except for very large sample sizes, $n > 600$, where the RMSE of SR approaches a nonzero value (Fig. 6-14d). The RMSEs of SR and PT do not approach zero for very large $n$, so these estimators are inconsistent; the AA and MLE, however, are consistent (Fig. 6-14).
Fig. 6-14 – Analytical RMSE/x50 of the AA, SR, PT, and MLE: (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5.

The dependency between data points makes the mean estimators behave differently from the uncorrelated case. For example, when $\sigma = 1.5$ and $n \ge 100$, SR has the lowest efficiency for uncorrelated samples, whereas it becomes the most efficient mean estimator for $100 < n < 600$ with auto-correlated samples (compare Fig. 3-7d with Fig. 6-14d). As mentioned before, $n$ positively correlated samples are less informative than $n$ uncorrelated samples (ESS < $n$), and as the correlation coefficient approaches zero, the ESS tends to $n$. Thus, the mean estimators approximate the mean with smaller uncertainty and higher efficiency when data points are weakly auto-correlated or uncorrelated (Fig. 6-15 and 6-16).

Fig. 6-15 – The ratio of standard errors of the mean estimators to x50, analytically derived for three different ρx1 values (0.0, 0.3, and 0.7) when σ = 1.5.
Fig. 6-16 – RMSE/x50 of the mean estimators, analytically derived for three different ρx1 values (0.0, 0.3, and 0.7) when σ = 1.5.

The analyses in this section assume that the correlation coefficient is known; in a real case, however, it has to be estimated, which adds error to the estimated properties of the mean estimators.

6.5 Auto-Correlated Random Variables with Bimodal Distribution

For auto-correlated RVs with a bimodal distribution, the estimator properties are evaluated only numerically, using MC simulation, as deriving analytical expressions is beyond the scope of this study. For this purpose, $m = 20{,}000$ data sets of $n = 25$ to $1{,}000$ samples are generated. The RV $X_z$, taken at location $z$, is assumed to follow a bimodal distribution described by a mixture of two log-normal distributions. In other words, the distribution of the transformed RV, $Y_z = \ln(X_z)$, is a mixture of two normal distributions with mixing proportion $\alpha = 0.3$. It is easier to generate $Y_z$ first and then transform it using $X_z = e^{Y_z}$. To generate a data set of $y_z$, two sets, $y_{1z} \in Y_{1z}$ and $y_{2z} \in Y_{2z}$, are generated, where $Y_{1z}$ is normally distributed with $\mu_1 = 1$ and $\sigma_1$ varying from 0.05 to 1.5, and $Y_{2z}$ is normally distributed with $\mu_2 = 3$ and $\sigma_2 = 0.5$.
It is also assumed that $Y_{1z}$ and $Y_{2z}$ follow the first-order auto-regressive model:
$$Y_{1z} = \rho_{11} Y_{1,z-1} + C_1 + \varepsilon_{1z}, \qquad \text{(6-18)}$$
$$Y_{2z} = \rho_{21} Y_{2,z-1} + C_2 + \varepsilon_{2z}, \qquad \text{(6-19)}$$
where $\rho_{11}$ and $\rho_{21}$ are the lag-one auto-correlation coefficients and $z$ represents the location at which a sample is taken. For the numerical study, it is assumed that $\rho_{11} = \rho_{21} = 0.7$; through the log-normal transformation, the corresponding auto-correlation functions of $X_z$ are calculated using Eq. 6-3. The two subsets are then combined as
$$y_z = \begin{cases} y_{1z}, & z \le Z, \\ y_{2z}, & \text{otherwise}, \end{cases} \qquad \text{(6-20)}$$
where $Z$ is a constant that depends on $\alpha$; that is, the first subset occurs up to location $Z$ and the second subset appears after $Z$.

The mean estimators are applied to each data set to calculate their expected values and SEs as described before, and these are used to evaluate the performance of the mean estimators numerically. For $\sigma_1 \le 1$, MLE has the smallest SE, at most 13% smaller than that of the AA. As $n$ increases, the AA, SR, and PT perform similarly in terms of uncertainty, but MLE has slightly smaller uncertainty (Fig. 6-17a to 6-17c). As $\sigma_1$ exceeds one, the SEs of all mean estimators overlap for small $n$, which makes it difficult to identify the estimator with the smallest SE; as $n$ becomes very large, however, SR and MLE have less uncertainty than the AA and PT (Fig. 6-17d).

In addition to the SE, the RMSE is computed numerically to evaluate the consistency and efficiency of the estimators (Fig. 6-18). For $\sigma_1 \le 1$, MLE has the smallest RMSE, and the other estimators have approximately identical RMSEs.
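The data-generation scheme of Eqs. 6-18 to 6-20 can be sketched as follows. This minimal Python sketch relies on assumptions that the text does not fix and that are labelled in the code: $C_i = \mu_i(1-\rho_{i1})$ and $\varepsilon_{iz} \sim N(0, (1-\rho_{i1}^2)\sigma_i^2)$, by analogy with Section 6.1, and the hypothetical choice $Z = \mathrm{round}(\alpha n)$ for the switching location. The function names are also hypothetical.

```python
import math
import random


def ar1_normal(n, mu, sigma, rho, rng):
    """Stationary AR(1) normal series Y_z = rho*Y_{z-1} + C + eps_z (Eqs. 6-18/6-19).

    ASSUMPTION: C = mu*(1 - rho) and eps ~ N(0, (1 - rho^2) sigma^2), as in Section 6.1.
    """
    c = mu * (1.0 - rho)
    sd_eps = sigma * math.sqrt(1.0 - rho ** 2)
    y = rng.gauss(mu, sigma)  # start in the stationary distribution
    series = []
    for _ in range(n):
        series.append(y)
        y = rho * y + c + rng.gauss(0.0, sd_eps)
    return series


def bimodal_ar1_sample(n, alpha=0.3, mu1=1.0, sigma1=0.5, mu2=3.0, sigma2=0.5,
                       rho11=0.7, rho21=0.7, seed=None):
    """One data set x_1..x_n built with the two-subset scheme of Eq. 6-20."""
    rng = random.Random(seed)
    y1 = ar1_normal(n, mu1, sigma1, rho11, rng)
    y2 = ar1_normal(n, mu2, sigma2, rho21, rng)
    z_switch = round(alpha * n)  # HYPOTHETICAL choice: Z = round(alpha * n)
    y = [y1[z] if z < z_switch else y2[z] for z in range(n)]  # Eq. 6-20, 0-based z
    return [math.exp(v) for v in y]  # X_z = exp(Y_z)


if __name__ == "__main__":
    xs = bimodal_ar1_sample(100, sigma1=1.0, seed=42)
    print(f"first x: {xs[0]:.3f}, last x: {xs[-1]:.3f}, AA: {sum(xs) / len(xs):.3f}")
```

Each estimator would then be applied to many such data sets to obtain MC estimates of its expected value, SE, and RMSE.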
For $\sigma_1 > 1$, the RMSEs of all mean estimators overlap for small $n$, which makes it difficult to identify the one with the smallest RMSE; as $n$ increases, however, MLE has slightly higher efficiency than the other estimators (Fig. 6-18d).

The dependency between samples increases the SEs (compare Fig. 4-7 with Fig. 6-17) and RMSEs (compare Fig. 4-8 with Fig. 6-18) of the mean estimators. This means that more positively auto-correlated samples are needed to extract the same information as $n$ uncorrelated samples. Auto-correlation therefore makes the mean estimators behave differently, in terms of uncertainty and efficiency, from the uncorrelated case. For instance, as pointed out before, SR has a smaller uncertainty than the AA for any $n$ and $\sigma$ with uncorrelated samples (Fig. 4-7); auto-correlation, however, makes the AA less uncertain than SR for certain ranges of $n$ and $\sigma$ (Fig. 6-17a to 6-17c). For large $\sigma$, the AA has a significantly higher uncertainty than the other mean estimators for uncorrelated samples (Fig. 4-7c and 4-7d), whereas there is no significant difference between the SEs of the AA and the other mean estimators for auto-correlated samples (Fig. 6-17d). Moreover, auto-correlation makes the differences between the RMSEs diminish (Fig. 6-18d), whereas the AA has a significantly larger RMSE than the other mean estimators in the uncorrelated case when $\sigma = 1.5$ and $n$ is small (Fig. 4-8c). This means that, for auto-correlated data, more samples are needed to achieve a given accuracy if SR and PT are used as mean estimators instead of the AA.

Fig. 6-17 – Standard errors of the AA, MLE, SR, and PT, with error bars showing a 95% confidence interval: (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5.
[Fig. 6-18: four panels, (a) σ = 0.05, (b) σ = 0.5, (c) σ = 1.0, (d) σ = 1.5; RMSE versus 1/√n for the AA, MLE, SR, and PT.]

Fig. 6-18 – RMSE’s of the AA, MLE, SR, and PT with error bars showing 95% confidence intervals.

6.6 Concluding Remarks

This chapter shows that, for the log-normal distribution, all mean estimators have approximately identical uncertainty and efficiency when σ < 1; however, they perform differently as σ increases and/or n decreases. SR has the smallest uncertainty and highest efficiency among the mean estimators as σ exceeds one. The results demonstrate that as the data points become less auto-correlated or un-correlated, the mean estimators approximate the mean value with smaller uncertainty and higher efficiency. For the bimodal distribution, the mean estimators’ properties are computed only numerically via MC simulation. The results show that auto-correlation causes the mean estimators to approximate mean values with larger uncertainty and lower efficiency, and these changes occur at different rates for different mean estimators.

Chapter 7 : Comparison of Mean Estimators for Independent Random Variables

A reliable estimator should simultaneously have small bias, small uncertainty (i.e., small SE), high efficiency (i.e., small RMSE), and consistency (i.e., RMSE approaching zero for large n). However, none of the mean estimators considered in this study satisfies all four conditions together for all variabilities and sample sizes. The AA is unbiased and MLE is asymptotically unbiased, whereas SR and PT are both biased, even for small variability. Their biases are insignificant for near-homogeneous populations but rise sharply as the population becomes heterogeneous.
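The four criteria can be estimated by Monte Carlo simulation, in the spirit of the numerical checks in this study. The Python sketch below is illustrative rather than the implementation used here: it assumes the commonly quoted weights for Swanson’s rule (0.3, 0.4, 0.3 on the 10th, 50th, and 90th percentiles) and for the Pearson-Tukey rule (0.185, 0.63, 0.185 on the 5th, 50th, and 95th percentiles), the log-normal MLE exp(m_y + s²_ML/2) with the divide-by-n log variance, and the "inclusive" interpolation of `statistics.quantiles` for sample percentiles. Other percentile conventions give slightly different numbers.

```python
import math
import random
import statistics


def estimators(sample):
    """AA, log-normal MLE, Swanson's rule (SR) and Pearson-Tukey (PT) for one sample."""
    # Cut points at 5%, 10%, ..., 95%; index 0 -> P5, 1 -> P10, 9 -> P50, 17 -> P90, 18 -> P95.
    q = statistics.quantiles(sample, n=20, method="inclusive")
    logs = [math.log(v) for v in sample]
    m_y = statistics.fmean(logs)
    s2_ml = statistics.pvariance(logs, mu=m_y)  # divide-by-n (maximum-likelihood) variance
    return {
        "AA": statistics.fmean(sample),
        "MLE": math.exp(m_y + s2_ml / 2.0),
        "SR": 0.3 * q[1] + 0.4 * q[9] + 0.3 * q[17],
        "PT": 0.185 * q[0] + 0.63 * q[9] + 0.185 * q[18],
    }


def monte_carlo(mu, sigma, n, trials=2000, seed=0):
    """Bias, SE and RMSE of each estimator over `trials` log-normal samples of size n."""
    rng = random.Random(seed)
    true_mean = math.exp(mu + sigma ** 2 / 2.0)
    runs = [estimators([rng.lognormvariate(mu, sigma) for _ in range(n)]) for _ in range(trials)]
    results = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        results[name] = {
            "bias": statistics.fmean(values) - true_mean,
            "se": statistics.pstdev(values),  # divide-by-trials, so RMSE^2 = SE^2 + bias^2
            "rmse": math.sqrt(statistics.fmean([(v - true_mean) ** 2 for v in values])),
        }
    return results


for name, r in monte_carlo(0.0, 1.5, n=50).items():
    print(f"{name}: bias={r['bias']:+.3f}  SE={r['se']:.3f}  RMSE={r['rmse']:.3f}")
```

Because the SE is computed with the divide-by-trials standard deviation, RMSE² = SE² + bias² holds exactly for each estimator, which is how a biased estimator such as SR can still have the smallest RMSE.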
Nevertheless, SR and PT are unbiased when the underlying distribution is normal and have insignificant bias for some power-normal distributions with small λ, such as λ = 1/2. For the log-normal distribution, SR and PT have smaller bias than MLE when σ and n are both very small and/or both moderately large (Fig. 7-1a). SR has smaller SE than PT, SRC1, and SRC2 for any n and σ; however, when

ߪ൏ ଵ.ଷ మ ସ.ଽ య మ െ .଼ .ଷ √ 1.1, (7-1)

SR has larger SE than the AA and MLE (Fig. 7-1b). SRC1 and SRC2 are the most efficient mean estimators for some ranges of n and σ (Fig. 7-1c).

[Fig. 7-1: three panels of σ (0 to 5) versus number of samples n (25 to 2500), with regions labelled by estimator (AA, MLE, SR, PT, SRC1, SRC2).]

Fig. 7-1 – σ versus n showing regions in which a mean estimator (a) has the smallest bias, (b) has the lowest SE, and (c) is the most efficient compared to the other estimators for the log-normal distribution.

When a data set follows a log-normal distribution, Fig. 7-1 can be used as a guideline to select an appropriate estimator of its mean value depending on σ and n. Although SR has less uncertainty than the AA only for certain ranges of n and σ for the log-normal distribution, it has smaller uncertainty than PT and the AA for any n and σ1 when the underlying distribution is bimodal. SR has larger uncertainty than MLE if (Fig. 7-2a)

ߪଵ ൏ ଷଶ.ଶ మ െ ସ.ହସ 1.69. (7-2)

Among the mean estimators considered in this study, MLE has the highest efficiency except for large σ1 and small n (Fig. 4-8).
However, using MLE for the bimodal distribution is complex, so other mean estimators are preferable. Which of the AA, SR, and PT has the greatest efficiency depends on the ranges of σ1 and n (Fig. 7-2b).

[Fig. 7-2: σ1 versus number of samples n (25 to 2500); (a) regions labelled SR and MLE; (b) regions labelled PT, SR, and AA; boundaries from MC simulation and from the analytical expressions (SR vs. PT and SR vs. AA).]

Fig. 7-2 – σ1 versus n showing regions in which a mean estimator (a) has smaller uncertainty and (b) is more efficient than the other estimators when σ2 = 0.5 for the bimodal distribution.

While a small SE is a desirable property, none of the AA, SR, and PT has the smallest SE for all n and σ. This also applies when the underlying distribution is power-normal. SR has smaller SE than PT for any n, σ, and λ; however, it has smaller SE than the AA only for some ranges of n and σ that depend on λ, and when λ = 1 or λ = 1/2 the AA has the smallest SE for any n and σ (Fig. 7-3).

[Fig. 7-3: σ (0 to 12) versus number of samples n (0 to 10,000), with boundary curves for λ = 1/4, 1/6, 1/8, and 1/16.]

Fig. 7-3 – SR has smaller SE than the AA when σ is greater than the value given by each curve, depending on n and λ; otherwise the AA has less SE for the power-normal distribution (solid curves and dots obtained from the analytical expressions and MC simulation, respectively).

SR has smaller RMSE than PT for some ranges of n and σ depending on λ (Fig. 7-4a). The AA has the smallest RMSE for any n and σ when λ ≥ 1/8; nevertheless, as λ decreases, for example, to 1/16, the AA has the smallest RMSE only for some ranges of n and σ (Fig. 7-4b).

[Fig. 7-4: σ (0 to 12) versus number of samples n (25 to 25,000); (a) boundary curves for λ = 1/2, 1/4, 1/6, 1/8, and 1/16; (b) regions labelled PT, SR, and AA for λ = 1/16.]

Fig. 7-4 – (a) PT is more efficient than SR when σ is greater than the value given by each curve, depending on n and λ; otherwise SR is more efficient; and (b) when λ = 1/16, the most efficient mean estimator depends on σ and n (solid curves and dots obtained from the analytical expressions and MC simulation, respectively).

The AA is an optimum mean estimator for the power-normal distribution with λ = 1 and λ = 1/2 because it is unbiased and has the smallest uncertainty and highest efficiency. However, as λ departs from these two values, SR is preferable to the AA for certain ranges of sample size and variability because it estimates the mean value with insignificant bias, the smallest uncertainty, and the highest efficiency. For the bimodal distribution, MLE is an optimum mean estimator when the sample size is sufficient; however, it involves complex manipulations, so other mean estimators are preferable. SR has the smallest uncertainty compared to the AA and PT for any n and σ (Fig. 4-7); nevertheless, it is biased and less efficient for certain ranges of sample size and variability. Although the AA has larger uncertainty than SR, it is unbiased and has higher efficiency than SR over a large range of sample size and variability (Fig. 7-2b).

The dependency between samples causes the mean estimators to behave differently; consequently, a different mean estimator may be optimum for auto-correlated samples than for un-correlated samples. Auto-correlation leads the AA to have smaller uncertainty than SR for certain ranges of n and σ (Fig. 6-17a to 6-17c), whereas for un-correlated samples the AA has larger uncertainty for all n and σ. When σ ≤ 1, the AA has higher efficiency than SR for auto-correlated data, while they have approximately identical RMSE for un-correlated data (compare Fig. 4-8 with Fig. 6-18). For example, suppose a data set has n = 30 and 1 < σ1 < 1.5.
SR can be an optimum mean estimator here because it has insignificant bias (Fig. 4-6), the smallest uncertainty, and the highest efficiency (Fig. 7-2b). Nevertheless, if this data set follows the auto-regressive model, the AA is an optimum mean estimator because it is unbiased and, after MLE, has the smallest uncertainty (Fig. 6-17) and the highest efficiency (Fig. 6-18).

As shown in Fig. 7-1, depending on n and σ, each of the AA, MLE, SR, and PT can be the optimum mean estimator. Although the AA is unbiased, it has the highest uncertainty and lowest efficiency for certain ranges of n and σ. SR and PT are both biased, but their biases can be compensated by their small uncertainty and high efficiency. However, for certain ranges of n and σ, they are not appropriate mean estimators because they significantly underestimate the mean value even though they have small SE’s and RMSE’s. Under this condition, the de-biased versions of SR can be used instead, which estimate the mean value with zero or insignificant bias, the lowest uncertainty, and the highest efficiency. Auto-correlation also changes the performance of the mean estimators for the log-normal distribution. For example, when σ is small, SR and PT have larger uncertainty than the AA and MLE for un-correlated samples (Fig. 3-5a), whereas they have slightly smaller SE’s than the AA and MLE for auto-correlated samples (Fig. 6-13a). Moreover, SR and PT both have larger RMSE’s than the AA and MLE for un-correlated samples (Fig. 3-7), whereas they have slightly smaller RMSE’s than the AA and MLE for auto-correlated samples (Fig. 6-14). Thus, for very small σ, the AA is an optimum mean estimator regardless of n for un-correlated samples (Fig. 7-1); however, SR is an optimum mean estimator when the samples are auto-correlated.

Each curve in Fig. 7-1 through Fig. 7-4 represents the σ’s and n’s at which the mean estimators on either side of the curve have an identical property. For example, the green curve in Fig. 7-1c shows the σ’s and n’s at which the AA and MLE have identical RMSE, and the difference between their RMSE’s increases as we move away from the curve. In other words, regions close to a curve separating two estimators can be considered a transition zone in which either estimator can be used as an optimum mean estimator.

In most cases, measurements are associated with errors, i.e., x′ = x + e, where x is the true value, e is the measurement error, and x′ is the reported value. If the error has zero mean, standard deviation σe, and is independent of x, it increases the variability of the data, Var(x′) = Var(x) + σe², while leaving the mean unchanged, E(x′) = E(x) + E(e) = E(x). Consequently, measurement error may lead to a different mean estimator being chosen as the optimum than would be appropriate for the error-free data.

Reservoir parameters can be split into a number of subsets based on geological or geophysical character (e.g., a permeability data set can be subdivided on the basis of facies types). This subdivision converts a data set with variability σ and sample size n into a number of subsets with smaller variabilities, σ′, and smaller sample sizes, n′. Therefore, depending on σ′ and n′ of each subset, a different mean estimator might be appropriate, which estimates the mean value with a different degree of uncertainty and efficiency.

Chapter 8 : Case Studies

The bootstrap is a resampling technique for the case where the true distribution of the RV, X, is unknown and only an observed data set is available. It is based on randomly drawing n samples from the observed samples with replacement, so that each sample can be selected more than once. Repeating this resampling creates m subsets.
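The resampling just described, together with the summary statistics defined in the next paragraph, can be sketched in Python as follows. The function name is hypothetical; `random.choices` draws with replacement, the standard deviation uses the (m − 1) divisor, and the 95% interval is assumed here to be the normal approximation mean ± 1.96 std, which is only one of several possible choices.

```python
import random
import statistics


def bootstrap(data, estimator, m=30_000, seed=None):
    """Return mean(x*_T), std(x*_T) and an assumed normal-approximation 95% interval."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n points with replacement from the observed data.
    estimates = [estimator(rng.choices(data, k=n)) for _ in range(m)]
    mean_star = statistics.fmean(estimates)
    std_star = statistics.stdev(estimates)  # (m - 1) divisor
    half_width = 1.96 * std_star
    return mean_star, std_star, (mean_star - half_width, mean_star + half_width)


data = [12.0, 35.0, 8.0, 150.0, 60.0, 22.0, 41.0, 95.0]  # hypothetical values, not a thesis data set
print(bootstrap(data, statistics.fmean, m=5000, seed=1))
```

Passing a different function as `estimator` (for example, an implementation of Swanson’s rule) gives the corresponding bootstrap column.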
The mean estimators are then applied to the m data sets, generating the sequence {x̂*T,1, …, x̂*T,m}. The mean of this sequence is approximated by the AA, designated mean(x̂*T), and its standard deviation is calculated as

$$\mathrm{std}(\hat{x}^{*}_{T}) = \left\{ \frac{\sum_{i=1}^{m} \left[ \hat{x}^{*}_{T,i} - \mathrm{mean}(\hat{x}^{*}_{T}) \right]^{2}}{m-1} \right\}^{1/2}$$

mean(x̂*T) and std(x̂*T) approximate the mean and SE of x̂T, respectively, and are good representatives when m is sufficiently large.

In this chapter, several data sets are analyzed to illustrate how the results of the previous chapters can be applied, and the results are compared with bootstrap-derived estimates. The data sets are:

1. Reserves in the United Kingdom and Norwegian Central North Sea, as described in Hurst et al. (2000);
2. Estimated ultimate recovery (EUR) of an Oklahoma gas field;
3. A permeability data set from the Cleveland Formation (Rollins et al. 1992);
4. Ultimate recoverable gas reserves of the Wabamun pool (MacCrossan 1969);
5. EUR of the Hemphill gas field; and
6. A permeability data set from the North Sea.

Using the bootstrap method, m = 30,000 subsets are generated from each available data set, and the mean estimators are applied to each subset (see results in Table 8-1 through Table 8-6).

In addition to the bootstrap, the performance of the mean estimators is evaluated using the analytical expressions derived before. There are discrepancies between the bootstrap and analytical results, which are mainly caused by having insufficient samples given the sample heterogeneity. For example, although the second data set has about four times as many samples (n = 83) as the first one (n = 21), it is more heterogeneous by a factor of 1.8 (the first and second sets have sample standard deviations of 105 and 189, respectively).
Hence, given their variabilities and sample sizes, the first data set carries approximately as much information as the second one: (21/83)(189/105)² ≈ 0.82.

The probability plots suggest that the first two examples are log-normally distributed (Fig. 8-1), so $e^{m_y + s_y^2/2}$ could be used to approximate the mean value, where m_y and s_y are the sample log-mean and log-standard deviation, respectively. Bayley and Hammersley (1946), however, showed that this estimator is biased; hence, it should be multiplied by a correction factor, giving $\beta e^{m_y + s_y^2/2}$. Later, Agterberg (1974, p. 235) tabulated this correction coefficient, β = Ψn(t)/Ψ∞(t), where Ψn(t) is an infinite series in t and n, Ψ∞(t) = $e^{s_y^2/2}$, and t = s_y²/2.

[Fig. 8-1: probability plots, reserves versus probability (%); (a) reserves (MMBO), 21 points; (b) reserves (MMCFE), 83 wells.]

Fig. 8-1 – Probability plots of the data sets taken from (a) Hurst et al. (2000), in million barrels of oil (MMBO), and (b) EUR of an Oklahoma field, in million cubic feet equivalent (MMCFE), with statistical properties calculated from the available data sets.

Hurst et al.’s (2000) data set has n = 21 points, which approximate a log-normal distribution with a log-mean of 3.66 and log-standard deviation of 1.17 (Fig. 8-1a). The correction factor β = 0.94 is used to calculate the unbiased estimate of the mean as 72.7 million barrels of oil (MMBO). Hurst et al. (2000) proposed a most likely reserve distribution because Swanson’s mean estimate is feasible based on the available mapped closure and the variation in pay and recovery factor. However, SR underestimates the mean value by 10% (Fig. 3-4).
In other words, the proposed reserves distribution may have 10% more reserves, which is easily accommodated within the geological constraints described by Hurst et al. (2000). Based on the analytical expressions, in which log-normality of the data set is assumed, x_SR is preferable to x_PT, as the SR bias (10%) is compensated by its 15% and 12% smaller SE and RMSE, respectively, compared with PT (Table 8-1, last three columns). MLE performs poorly because its performance strongly depends on n, which is very small in this case. SR outperforms the AA in terms of efficiency and uncertainty; on the other hand, SR approximates the mean value with a 10% error whereas the AA is unbiased, which makes it difficult to choose between SR and the AA. The two SR bias-reduction approaches significantly reduce the SR bias but increase the SE from 25.1 to 27.5 and 29.2 MMBO. Nevertheless, SRC1 has smaller SE and RMSE than the other mean estimators except SR and estimates the mean with zero bias.

The bootstrap results show that the AA, MLE, and PT estimate the mean value with large SE (Table 8-1). Although SR has about 16% smaller SE than SRC1 and SRC2, these two estimate the mean value with 0.08% and 0.01% bias, respectively. Thus, based on the bootstrap and analytical results, SR is not an attractive estimator for this data set, while SRC1 and SRC2 are possible mean estimators in this case. Furthermore, their reduced bias increases the economic value of the region by approximately US$0.7 billion at US$100/bbl oil.

Table 8-1 – Statistical properties of Hurst et al.’s (2000) data set.
Units: MMBO. Columns 3 to 5: bootstrap results; columns 6 to 8: theory.

Estimator | x̂T (sample) | mean(x̂*T) | std(x̂*T) | 95% CI of x̂*T | std(xT) | E(xT)/E(X) | RMSE
AA | 76.30 | 76.24 | 30.27 | 16.92 to 135.57 | 28.79 | 1.00 | 28.79
MLE | 77.40 | 79.17 | 27.04 | 26.16 to 132.17 | 29.04 | 1.06 | 29.39
PT | 56.80 | 81.62 | 26.68 | 29.33 to 133.90 | 29.46 | 0.97 | 29.55
SR | 67.50 | 72.05 | 22.70 | 27.56 to 116.54 | 25.11 | 0.91 | 25.98
SRC1 | 74.10 | 79.28 | 27.12 | 26.11 to 132.44 | 27.53 | 1.00 | 27.53
SRC2 | 75.50 | 79.21 | 27.00 | 26.29 to 132.14 | 29.17 | 1.00 | 29.17

The second data set consists of the gas reserves of an Oklahoma (OK) field, based on 83 wells. The probability plot shows an approximately log-normal distribution for the EUR, with a log-mean of 3.1 and log-standard deviation of 1.94; with β = 0.94, the unbiased estimate of the mean is 136.85 MMCFE (Fig. 8-1b). This data set gives a Swanson mean of 69 MMCFE, whereas the AA is 90.6 MMCFE, a 24% difference (Table 8-2). This difference is equivalent to a US$65,000 difference in reserves value per well at US$3 per MCFE, which could lead to a poor economic assessment of the whole prospect. Based on the analytical expressions, although the AA is unbiased, it has the lowest efficiency and the largest SE (Table 8-2, last three columns). MLE performs better than the AA in terms of efficiency and uncertainty; however, it is biased because n = 83 is not sufficiently large. PT performs slightly better than MLE in terms of efficiency and uncertainty, but it has larger bias than MLE. SR has the smallest SE, but it underestimates the mean value by 40% and is about as inefficient as MLE. The de-biased SR corrections increase the SE from 29.7 to 48.66 and 52.83 MMCFE. However, they are preferable to SR and PT because their smaller RMSE’s and negligible bias compensate for their larger SE’s. SRC1 and SRC2 are also preferable to the AA and MLE since they have smaller SE’s and RMSE’s.
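The correction factor β can be evaluated numerically if, as we assume here, Ψn(t) is Finney’s series for the unbiased estimator of a log-normal mean,

$$\Psi_n(t) = 1 + \frac{n-1}{n}\,t + \sum_{k \ge 2} \frac{(n-1)^{2k-1}}{n^{k}\,(n+1)(n+3)\cdots(n+2k-3)}\,\frac{t^{k}}{k!},$$

with t = s_y²/2 computed from the (n − 1)-divisor log variance. The Python sketch below sums the series term by term using a ratio recursion, so no factorial has to be converted to a float; the function names are hypothetical.

```python
import math


def finney_psi(n, t, rel_tol=1e-14, max_terms=500):
    """Finney's series Psi_n(t) (assumed form), summed until terms become negligible."""
    coef = (n - 1) / n   # coefficient of t**k / k! for k = 1
    power = t            # t**k / k! for k = 1
    total = 1.0 + coef * power
    for k in range(2, max_terms + 1):
        # coef_k = coef_{k-1} * (n - 1)**2 / (n * (n + 2k - 3))
        coef *= (n - 1) ** 2 / (n * (n + 2 * k - 3))
        power *= t / k
        term = coef * power
        total += term
        if term <= rel_tol * total:
            break
    return total


def beta_correction(n, s_log):
    """beta = Psi_n(t) / Psi_inf(t) with t = s_log**2 / 2 and Psi_inf(t) = exp(t)."""
    t = s_log ** 2 / 2.0
    return finney_psi(n, t) / math.exp(t)


def corrected_lognormal_mean(m_log, s_log, n):
    """beta * exp(m_y + s_y**2 / 2), which equals exp(m_y) * Psi_n(s_y**2 / 2)."""
    return math.exp(m_log) * finney_psi(n, s_log ** 2 / 2.0)


print(round(beta_correction(83, 1.94), 2), round(beta_correction(319, 1.73), 2))  # 0.94 0.99
```

Under this assumption the sketch reproduces the factors of 0.94 and 0.99 quoted for the Oklahoma and Cleveland data sets; as n grows, Ψn(t) approaches Ψ∞(t) = e^t and β tends to one.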
Table 8-2 – Statistical properties of gas reserves of an Oklahoma field.

Units: MMCFE. Columns 3 to 5: bootstrap results; columns 6 to 8: theory.

Estimator | x̂T (sample) | mean(x̂*T) | std(x̂*T) | 95% CI of x̂*T | std(xT) | E(xT)/E(X) | RMSE
AA | 90.60 | 90.66 | 107.32 | 0 to 130.60 | 102.77 | 1.00 | 102.77
MLE | 143.00 | 144.88 | 48.39 | 50.04 to 222.50 | 59.90 | 1.07 | 60.73
PT | 88.30 | 97.27 | 38.33 | 22.14 to 173.30 | 45.50 | 0.78 | 55.28
SR | 69.10 | 88.62 | 19.61 | 50.18 to 126.70 | 29.74 | 0.61 | 63.15
SRC1 | 112.90 | 145.49 | 39.93 | 67.23 to 211.20 | 48.66 | 1.0045 | 48.66
SRC2 | 99.20 | 144.89 | 39.84 | 66.80 to 197.20 | 52.83 | 0.9998 | 52.83

The bootstrap results also indicate that SR has the smallest SE; however, the unbiased estimate of the mean, 136.85 MMCFE, does not lie within the 95% confidence interval of SR. The AA has the largest SE, followed by MLE, SRC1, SRC2, and PT in descending order. As mentioned before, MLE is biased for small n, so it is not a good alternative mean estimator. PT is not an appropriate alternative either, because it underestimates the mean value by 22% (Fig. 3-4). Hence, according to the bootstrap and analytical results, the de-biased SRC1 and SRC2 might be appropriate mean estimators, in descending order of preference.

The third example is a permeability data set from the Cleveland Formation reported by Rollins et al. (1992). This data set (n = 319) follows a log-normal distribution with a log-mean of −3.6 and log-standard deviation of 1.73, with permeability in millidarcies (mD) (Rollins et al. 1992, Fig. 5). Consequently, the unbiased estimate of the mean is 0.121 mD using the correction factor β = 0.99. The bootstrap method is not applied to this data set because the data values are unavailable, so only analytical results are presented (Table 8-3).

Table 8-3 – Statistical properties of measured permeability in the Cleveland Formation.
Units: mD. Theory only.

Estimator | x̂T (sample) | std(xT) | E(xT)/E(X) | RMSE
AA | 0.180 | 0.0297 | 1.0 | 0.0297
MLE | 0.121 | 0.0191 | 1.01 | 0.0192
PT | 0.100 | 0.0184 | 0.86 | 0.0254
SR | 0.090 | 0.0130 | 0.71 | 0.0372
SRC1 | 0.122 | 0.0188 | 1.006 | 0.0189
SRC2 | 0.121 | 0.0198 | 1.0 | 0.0198

The data set gives a Swanson mean of 0.09 mD, whereas the AA is 0.18 mD, a 50% difference. This 50% difference changes the Cleveland from a tight (less than 0.1 mD) to a conventional classification for tax and regulatory purposes. SRC1 and SRC2 estimate the mean value as 0.122 and 0.121 mD, respectively. SR and PT significantly underestimate the mean value, by 28% and 15%, respectively, and they have the lowest efficiency. Therefore, neither SR nor PT is an appropriate mean estimator, although SR has the smallest SE. There is a clear preference for x_SRC1, x_SRC2, and x_MLE over SR, as the roughly 30% larger SE’s of SRC1, SRC2, and MLE are compensated by their almost zero bias and 50% smaller RMSE. The AA is not a suitable mean estimator because it has low efficiency and a large SE, although it is unbiased. SRC1, SRC2, and MLE are preferable to the AA because, in addition to having almost zero bias, they have smaller SE’s and RMSE’s. Hence, SRC1, SRC2, and MLE might be among the best mean estimators for this case.

To illustrate our findings for the bimodal distribution, an example that follows a bimodal distribution is provided: the ultimate recoverable gas reserves of the Wabamun pool, based on 28 wells (MacCrossan 1969). The distribution of the data set can be described by a combination of two log-normal distributions with α = 0.6 (Fig. 8-2). The data set gives SR and AA values of 327,659 and 308,175 MMCFE, respectively, a 6% difference (Fig. 8-2). Although this difference appears insignificant, it is equivalent to a 19,500 MMCFE difference in reserves and a US$58,500,000 difference in value at US$3 per MCFE.
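The SR and PT underestimation quoted for the Hurst et al., Oklahoma, and Cleveland data sets can be checked at the population level. For a log-normal variable with log-standard deviation σ, the u-th percentile is exp(μ + z_u σ) and the mean is exp(μ + σ²/2), so the ratio of either rule to the mean does not depend on μ. The Python sketch below assumes the commonly used weights, 0.3/0.4/0.3 on the 10th/50th/90th percentiles for SR and 0.185/0.63/0.185 on the 5th/50th/95th percentiles for PT, and uses population rather than sample percentiles, so finite-sample effects are ignored.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)  # standard-normal 90th percentile, about 1.2816
Z95 = NormalDist().inv_cdf(0.95)  # standard-normal 95th percentile, about 1.6449


def sr_ratio(sigma):
    """Swanson's rule on population percentiles divided by the log-normal mean."""
    sr = 0.3 * math.exp(-Z90 * sigma) + 0.4 + 0.3 * math.exp(Z90 * sigma)
    return sr / math.exp(sigma ** 2 / 2.0)  # the common factor exp(mu) cancels


def pt_ratio(sigma):
    """Pearson-Tukey rule on population percentiles divided by the log-normal mean."""
    pt = 0.185 * math.exp(-Z95 * sigma) + 0.63 + 0.185 * math.exp(Z95 * sigma)
    return pt / math.exp(sigma ** 2 / 2.0)


for s in (1.17, 1.73, 1.94):
    print(f"sigma={s}: SR/mean={sr_ratio(s):.2f}, PT/mean={pt_ratio(s):.2f}")
# sigma=1.17: SR/mean=0.91, PT/mean=0.97
# sigma=1.73: SR/mean=0.71, PT/mean=0.86
# sigma=1.94: SR/mean=0.61, PT/mean=0.78
```

The printed ratios agree to two decimals with the SR and PT entries of the E(xT)/E(X) column in Tables 8-1, 8-3, and 8-2 for σ = 1.17, 1.73, and 1.94, respectively.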
[Fig. 8-2: probability plot of gas reserves (thousands of MMCFE) versus probability (%), 28 wells; sample properties of ln(x): Mean1 = 23.81, Std1 = 0.81, Mean2 = 26.79, Std2 = 0.82.]

Fig. 8-2 – Probability plot of the data set taken from MacCrossan (1969), with sample statistical properties calculated from the available data set.

The bootstrap results show that SR has the largest SE whereas MLE has the smallest SE (Table 8-4). MLE is not a good candidate as an alternative mean estimator for this data set because it strongly depends on n, so that it is biased for small n, and it is also complex to apply to the mean of a bimodal distribution. Although PT has smaller SE than the AA, it is biased; thus, the AA might be appropriate for estimating the mean value. If the true distribution is assumed to be bimodal with μ1 = 23.81, μ2 = 26.79, σ1 = 0.81, σ2 = 0.82, and α = 0.6, the statistical properties of the mean estimators can be calculated analytically (Table 8-4, last three columns). The results show that MLE has the smallest SE and RMSE, which can compensate for its 1.6% bias; however, as mentioned before, MLE involves complex manipulations. Thus, MLE is not an appropriate mean estimator, and PT cannot be an optimum mean estimator either, since it has the largest bias, SE, and RMSE. Although SR has 1.6% and 2.6% smaller SE and RMSE than the AA, respectively, it underestimates the mean value by 4.5%. Thus, the AA might be the proper mean estimator.

Table 8-4 – Statistical properties of the data set taken from MacCrossan (1969).
Units: MMCFE. Columns 3 to 5: bootstrap results; columns 6 to 8: theory.

Estimator | x̂T (sample) | mean(x̂*T) | std(x̂*T) | 95% CI of x̂*T | std(xT) | E(xT)/E(X) | RMSE
AA | 308,175 | 307,930 | 75,857 | 159,251 to 456,609 | 86,796 | 1.000 | 86,796
MLE | 266,060 | 267,458 | 52,000 | 165,538 to 369,378 | 44,063 | 1.016 | 43,871
PT | 285,089 | 302,212 | 70,707 | 163,627 to 440,798 | 90,020 | 0.908 | 86,840
SR | 327,659 | 340,311 | 91,234 | 161,492 to 519,130 | 85,414 | 0.953 | 84,532

Another example consists of the EUR of 416 wells located in the Hemphill gas field. The probability plot of the transformed data set, with y = (x^0.28 − 1)/0.28, shows a normal distribution with a transformed sample mean of 23.14 MMCFE^0.28 and standard deviation of 7.65 MMCFE^0.28 (Fig. 8-3).

[Fig. 8-3: probability plot of (EUR (MMCFE)^λ − 1)/λ versus probability (%).]

Fig. 8-3 – Probability plot of the transformed EUR of the Hemphill gas field with exponent λ = 0.28.

The AA and SR give sample means of 1824 and 1796.8 MMCFE, respectively, only a 2% difference. This small difference amounts to about 27.2 MMCFE per well and 11,300 MMCFE in total (n = 416 wells), equivalent to a US$33,950,000 difference in the economic assessment at US$3 per MCFE. Thus, the example shows that choosing a correct mean estimator is imperative and may affect decisions on further development. If we assume that the samples have been taken from a population whose true distribution is power-normal with a transformed mean of 23.16 MMCFE^0.28 and standard deviation of 7.65 MMCFE^0.28, the statistical properties of the mean estimators can be calculated analytically (Table 8-5, last three columns).
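The Hemphill calculations above combine a power transform with simple value arithmetic. In the Python sketch below (hypothetical function names), the transform assumes the Box-Cox form y = (x^λ − 1)/λ stated above, with the logarithm as the λ = 0 limit; because the transform is monotone, back-transforming the mean of the normal transformed data estimates the median EUR rather than its mean. The value helper converts an MMCFE gap to MCFE (× 1,000) and multiplies by the assumed price of US$3 per MCFE.

```python
import math


def power_transform(x, lam):
    """Box-Cox-type transform y = (x**lam - 1) / lam, with log(x) as the lam = 0 limit."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inverse_power_transform(y, lam):
    """Inverse transform; needs 1 + lam * y > 0 when lam != 0."""
    return math.exp(y) if lam == 0 else (1.0 + lam * y) ** (1.0 / lam)


def value_gap_usd(estimate_a_mmcfe, estimate_b_mmcfe, wells=1, usd_per_mcfe=3.0):
    """Dollar value of the gap between two mean estimates (1 MMCFE = 1,000 MCFE)."""
    return abs(estimate_a_mmcfe - estimate_b_mmcfe) * wells * 1000.0 * usd_per_mcfe


# Back-transforming the mean of the transformed data estimates the median EUR (roughly 1,320 MMCFE).
print(f"{inverse_power_transform(23.14, 0.28):.0f}")
print(f"{value_gap_usd(1824.0, 1796.8, wells=416):,.0f}")  # 33,945,600
```

The last figure rounds to the US$33,950,000 quoted above; the same helper gives 64,500 for value_gap_usd(90.6, 69.1) and 58,452,000 for value_gap_usd(327_659, 308_175), consistent with the US$65,000 per well and US$58,500,000 quoted for the Oklahoma and Wabamun examples. The back-transformed median is well below the sample AA of 1824 MMCFE, which is why a separate mean estimator is still needed.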
Analytical results show that PT estimates the mean value with slightly smaller bias (0.1%) than SR (0.4%) and SRC (0.3%); however, SR and SRC are more desirable than PT because they have around 2.0% smaller SE’s and 1.8% smaller RMSE’s than PT. Although the AA is unbiased, SR and SRC are both preferable to the AA because their biases are compensated by their 6.3% and 6.0% smaller SE and RMSE, respectively. Nevertheless, using either SR or SRC to estimate the mean value leads to an underestimation of reserves of around US$22,000 and US$15,000 per well, respectively.

Table 8-5 – Statistical properties of the EUR data set of the Hemphill gas field.

Units: MMCFE. Columns 3 to 5: bootstrap results; columns 6 to 8: theory.

Estimator | x̂T (sample) | mean(x̂*T) | std(x̂*T) | 95% CI of x̂*T | std(xT) | E(xT)/E(X) | RMSE
AA | 1824.00 | 1824.4 | 82.21 | 1663.3 to 1985.5 | 90.8 | 1.000 | 90.81
PT | 1831.00 | 1838.8 | 95.49 | 1651.6 to 2026.0 | 86.8 | 1.001 | 86.83
SR | 1796.80 | 1807.3 | 92.56 | 1625.9 to 1988.7 | 85.0 | 0.996 | 85.26
SRC | 1799.76 | 1811 | 92.42 | 1629.9 to 1992.1 | 85.2 | 0.997 | 85.33

The last example consists of 94 core-plug permeability measurements taken in a well in the North Sea. The probability plot of the data set on a log scale indicates a log-normal distribution with a log-sample mean of 4.3 and log-sample standard deviation of 1.33 (Fig. 8-4). The data follow an AR(1) model with a first-step auto-correlation coefficient of 0.4. The AA and SR give sample means of 163 and 150 mD, respectively, an 8% difference.

[Fig. 8-4: probability plot of permeability (mD, log scale) versus probability (%); 94 points.]

Fig. 8-4 – Probability plot of a permeability data set taken from the North Sea.

The mean estimators’ properties are calculated analytically assuming that the data set comes from a log-normal population with a log-mean of 4.31, log-standard deviation of 1.33, and ρ1 = 0.4 (Table 8-6, last three columns).
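Because these North Sea data are auto-correlated, resampling individual points would destroy the dependence between neighbouring samples, which is why a block bootstrap is used for this example. The variant and block length are not stated here, so the Python sketch below is an assumption: a moving-block bootstrap that concatenates randomly chosen overlapping blocks of `block_len` consecutive points until the original length is reached. The function names and the default block length are hypothetical.

```python
import random
import statistics


def moving_block_resample(data, block_len, rng):
    """One moving-block resample built from overlapping blocks of block_len consecutive points."""
    n = len(data)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(data)")
    starts = range(n - block_len + 1)  # every admissible block start
    resample = []
    while len(resample) < n:
        s = rng.choice(starts)
        resample.extend(data[s:s + block_len])  # keep neighbouring samples together
    return resample[:n]  # trim the last block to the original length


def block_bootstrap(data, estimator, block_len=5, m=10_000, seed=None):
    """Mean and (m - 1)-divisor standard deviation of the estimator over m block resamples."""
    rng = random.Random(seed)
    estimates = [estimator(moving_block_resample(data, block_len, rng)) for _ in range(m)]
    return statistics.fmean(estimates), statistics.stdev(estimates)
```

With block_len=1 the procedure reduces to the ordinary i.i.d. bootstrap; longer blocks preserve more of the auto-correlation but leave fewer distinct blocks to resample.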
Based on the analytical results, PT approximates the mean value with 5% and 10% less bias than MLE and SR, respectively; however, it has 9% and 20% higher uncertainty, and 8% and 15% lower efficiency, than MLE and SR, respectively. Therefore, compared to the AA and PT, the smaller SE’s and RMSE’s of MLE and SR can compensate for their larger bias. SR is preferable to MLE because it has 10% smaller uncertainty and 7% higher efficiency than MLE. Although the AA is unbiased, SR is preferable to the AA because its bias (9%) is compensated by its 18% and 13% smaller SE and RMSE, respectively. Since the data points are auto-correlated, the bootstrap method used in the previous examples cannot be used as the resampling technique in this case, so the block bootstrap method is used instead. The results obtained from the block bootstrap agree with the analytical results.

Table 8-6 – Statistical properties of the permeability data set measured along a well located in the North Sea.

Units: mD. Columns 3 to 5: block bootstrap results; columns 6 to 8: theory.

Estimator | x̂T (sample) | mean(x̂*T) | std(x̂*T) | 95% CI of x̂*T | std(xT) | E(xT)/E(X) | RMSE
AA | 163.29 | 173.68 | 66.70 | 42.95 to 304.41 | 68.11 | 1.000 | 68.11
MLE | 181.31 | 197.56 | 65.21 | 69.75 to 325.37 | 63.56 | 1.053 | 64.28
PT | 168.71 | 186.28 | 70.30 | 48.49 to 324.07 | 69.22 | 1.002 | 69.22
SR | 149.73 | 167.96 | 61.39 | 47.63 to 288.28 | 57.71 | 0.907 | 60.14

Chapter 9 : Conclusions and Recommendations

9.1 Conclusions

An optimum mean estimator should simultaneously have small bias, small uncertainty (small SE), consistency (RMSE approaching zero for large n), and high efficiency (small RMSE). None of the AA, MLE, SR, and PT satisfies all these conditions at the same time for all ranges of variability and sample size. For every distribution type considered except the normal distribution, SR and PT are both biased even for a very large n and small σ, whereas the AA is unbiased and MLE is asymptotically unbiased.
For the log-normal distribution, the AA is consistent for any n and σ; however, it has the smallest SE and RMSE only for certain ranges of n and σ. SR and PT are inconsistent even for a very large n and small σ. PT estimates the mean value with slightly less bias than SR; however, SR has smaller SE than PT for any n and σ. MLE is asymptotically unbiased and efficient, which means that its performance strongly depends on n. SR might be preferable to the AA and MLE since it is more efficient than both for some ranges of n and σ, although it is biased.

SR underestimates the mean value, even for log-normal populations with small standard deviations. As Megill (1984) observes, this underestimation increases rapidly as σ rises, so users should be aware of the SR bias. Otherwise, for example, a 10% underestimation of the mean reserves of Hurst et al.’s (2000) data set could lead to a poor assessment of the prospect, and a 50% underestimation could cause the Cleveland Formation to be classified as a tight reservoir.

Being unbiased is a desirable property, but it is not necessarily the most important property of a mean estimator, because SR can be de-biased using a correction factor. Two approaches are described here to de-bias SR: multiplying SR by a coefficient, x_SRC1, and adjusting the weights of SR based on the population standard deviation, x_SRC2. Both approaches need σ, which is not always available, so the sample standard deviation should be used instead. Estimating σ causes, for example, errors of at most 17% and 20% in estimating E(x_SRC1) and E(x_SRC2) when σ = 2.0; nonetheless, the errors rapidly approach zero as n increases and/or σ decreases. Converting SR to an unbiased mean estimator increases its SE, but the SE’s of x_SRC1 and x_SRC2 are still smaller than those of the other mean estimators, except SR, for some ranges of n and σ.
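The sensitivity of a multiplicative correction to the estimated σ can be illustrated as follows. This is a hypothetical sketch, not necessarily the SRC1 coefficient used in this study: it takes the coefficient to be the population ratio c(σ) = exp(σ²/2) / [0.3 exp(−z0.9 σ) + 0.4 + 0.3 exp(z0.9 σ)] for a log-normal variable, so that c(σ) times the population SR equals the mean, and reports the relative error in the corrected expectation when an estimate σ̂ is plugged in instead of σ. It assumes the standard 0.3/0.4/0.3 weights on the 10th/50th/90th percentiles.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)  # standard-normal 90th percentile, about 1.2816


def sr_debias_factor(sigma):
    """Hypothetical coefficient c(sigma) = E(X) / SR for a log-normal population."""
    sr_over_median = 0.3 * math.exp(-Z90 * sigma) + 0.4 + 0.3 * math.exp(Z90 * sigma)
    return math.exp(sigma ** 2 / 2.0) / sr_over_median


def plugin_error(sigma, sigma_hat):
    """Relative error in E(c * SR) when sigma_hat is used in place of the true sigma."""
    return sr_debias_factor(sigma_hat) / sr_debias_factor(sigma) - 1.0


for sigma_hat in (1.8, 2.0, 2.2):
    print(f"sigma_hat={sigma_hat}: {plugin_error(2.0, sigma_hat):+.1%}")
```

With σ = 2.0, a ±10% error in σ̂ shifts the corrected expectation by about −14% and +21% under this assumed coefficient, which shows why a plug-in correction becomes sensitive for large σ and why the errors shrink as σ decreases.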
The de-biasing approaches make SR consistent for any $n$ and $\sigma$. They also make it the most efficient mean estimator over large ranges of $n$ and $\sigma$.

For the bimodal distribution, SR has smaller uncertainty than the AA and PT for any variability and sample size, but it has smaller uncertainty than MLE only for certain ranges of variability and sample size. For moderate variability, MLE is the most efficient mean estimator; for large variability and small sample sizes, however, it has the lowest efficiency. None of the AA, SR, and PT is the most efficient mean estimator for all ranges of variability and sample size. Hence, for some ranges of variability and sample size, SR becomes an optimum mean estimator, because its bias is compensated by its smaller uncertainty and higher efficiency.

For the power-normal distribution, SR and PT are both biased for all $\lambda$ values except $\lambda = 1$, and their bias is negligible when $\lambda = 1/2$ and $\lambda = 1/3$. PT approximates the mean value with smaller bias than SR, whereas SR has smaller uncertainty than PT for any $n$, $\sigma$, and $\lambda$. Efficiency is evaluated with RMSE, which incorporates both SE and bias. SR is therefore more efficient than PT when its smaller SE compensates for its bias, and PT is more efficient otherwise. Consequently, SR becomes preferable to PT for some ranges of $n$ and $\sigma$, depending on $\lambda$.

When $\lambda = 1$ and $\lambda = 1/2$, the AA has the smallest uncertainty. As $\lambda$ departs from these two values, SR has smaller uncertainty than the AA for certain ranges of $n$ and $\sigma$, depending on $\lambda$. When $\lambda \ge 1/8$, the AA has the highest efficiency. Its larger SE may then be compensated by its being unbiased and having the highest efficiency, so the AA becomes the most preferable mean estimator. As $\lambda$ approaches zero, however, any of the AA, SR, and PT could be the preferable mean estimator, depending on the values of $n$, $\sigma$, and $\lambda$. To de-bias SR, its weights are modified based on $\mu$, $\sigma$, and $\lambda$.
These parameters are unknown in most cases, so they are estimated from the available data set. Using the estimates introduces errors into the estimation of $E(x_{SRC})$, but these errors rapidly drop to zero as $\lambda$ increases and/or $\sigma$ decreases. Once SR is made unbiased, its SE increases relative to the original SR. It remains smaller than the SEs of the AA and PT, except when $\lambda$ is very small, such as $\lambda = 1/16$. For any $n$, $\sigma$, and $\lambda$, SRC has higher efficiency than PT and lower efficiency than the AA. The exception is $\lambda = 1/16$, where SRC is more efficient than both PT and the AA for certain ranges of $n$ and $\sigma$. Compared to SR, SRC becomes more efficient for some ranges of $n$ and $\sigma$, depending on $\lambda$.

So far, the RVs have been assumed to be i.i.d. This assumption is not always valid, because reservoir parameters might be auto-correlated. When the mean value is estimated, positive auto-correlation decreases efficiency and increases uncertainty. In other words, auto-correlated samples are less informative than uncorrelated samples, so more auto-correlated samples are needed to achieve a given accuracy. Auto-correlation affects each mean estimator differently. As a result, the effective sample size (ESS) needed to achieve a given accuracy depends on which mean estimator is used.

9.2 Future Work

The following subsections briefly describe some questions and issues as potential topics for future research.

9.2.1 Evaluate Swanson's Rule Performance for Very Small Sample Sizes

This study evaluates the performance of SR for $n \ge 25$. In some cases, however, it is expensive to take many measurements, so the available data set contains few samples ($n < 25$). It is therefore recommended to assess the performance of SR for very small sample sizes.

9.2.2 Consider Beta Distribution for Percentiles

In this study, the uth percentile is assumed to be normally distributed.
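The loss of information caused by positive auto-correlation can be quantified through the effective sample size $n' = n/[1 + 2\sum_{\tau=1}^{n-1}(1-\tau/n)\rho_\tau]$. This is the quantity that governs the variance of the sample mean, $\sigma^2/n'$, in Appendix G. The sketch below (Python/NumPy; parameter values are hypothetical) simulates a stationary AR(1) process parameterized as in Appendix F, with $\rho_\tau = \rho_1^{|\tau|}$. It then compares the simulated variance of the sample mean with $\sigma^2/n'$.

```python
import numpy as np

def effective_sample_size(n, rho1):
    """n' = n / [1 + 2 sum_{tau=1}^{n-1} (1 - tau/n) rho_tau], with AR(1) rho_tau = rho1**tau."""
    tau = np.arange(1, n)
    return float(n / (1 + 2 * np.sum((1 - tau / n) * rho1**tau)))

def simulate_ar1(reps, n, rho1, mu, sigma, rng):
    """reps independent stationary AR(1) series Y_z = (1 - rho1) mu + rho1 Y_{z-1} + eps_z,
    with Var(eps) = (1 - rho1^2) sigma^2, so that every Y_z has mean mu and variance sigma^2."""
    y = np.empty((reps, n))
    y[:, 0] = rng.normal(mu, sigma, size=reps)          # start in the stationary distribution
    eps_sd = sigma * np.sqrt(1 - rho1**2)
    for z in range(1, n):
        y[:, z] = (1 - rho1) * mu + rho1 * y[:, z - 1] + rng.normal(0.0, eps_sd, size=reps)
    return y

rng = np.random.default_rng(1)
n, mu, sigma, reps = 50, 0.0, 1.0, 20_000
summary = {}
for rho1 in (0.0, 0.3, 0.7):
    m_y = simulate_ar1(reps, n, rho1, mu, sigma, rng).mean(axis=1)   # sample mean of each series
    n_eff = effective_sample_size(n, rho1)
    summary[rho1] = (n_eff, float(np.var(m_y)), sigma**2 / n_eff)
    print(f"rho1={rho1:.1f}  n'={n_eff:6.2f}  Var(m_y): simulated={summary[rho1][1]:.5f}, "
          f"sigma^2/n'={summary[rho1][2]:.5f}")
```

For $\rho_1 = 0.7$ and $n = 50$, $n'$ is only about 9. Fifty correlated values therefore carry roughly the same information about the mean as nine independent ones.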
In fact, the sample percentile is not exactly normal. When it is taken as an order statistic, its cumulative probability under the population distribution, $F(x_u)$, follows a beta distribution. The sample percentile $x_u$ itself becomes normally distributed only for very large sample sizes; in other words, the uth percentile is asymptotically normally distributed. As seen before, the analytical and numerical approaches disagree, especially for small sample sizes, and this assumption might be the cause. It is therefore recommended to derive the properties of SR analytically from the beta distribution of the uth percentile.

9.2.3 Extend Delfiner's Approach

Delfiner (2007) advocated the use of SR to reduce the pitfalls of permeability estimates obtained from the Phi-k relationship. He made this comparison for a single Phi-k data set with a correlation coefficient of 0.64. He did not address whether the method applies to Phi-k cross-plots with other correlation coefficients. It is therefore recommended to evaluate his approach for different correlation coefficients and to test whether it performs statistically better.

9.2.4 Evaluate Swanson's Rule Performance for Truncated Log-normal Distribution

Rose (2001) raised a notable issue about SR. His conclusion would have been more persuasive had he quantitatively studied the bias of SR for a wide range of truncated log-normal distributions. He also overlooked a second issue: after truncation, 98% of the cumulative distribution function (CDF) is used to calculate the mean value, whereas the SR formula is based on 100% of the CDF. The SR formula might therefore need to change to account for this truncation. The change might be insignificant, but it should be evaluated. It would therefore be of interest to evaluate comprehensively the bias, uncertainty, efficiency, and consistency of SR when the underlying truncated distribution is log-normal with a wide range of variability.
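The beta result referred to above is the standard one for order statistics. If $x_{(k)}$ is the kth smallest of $n$ values from a continuous population with CDF $F$, then $F(x_{(k)}) \sim \mathrm{Beta}(k, n-k+1)$. The sketch below (Python/NumPy) checks this by simulation for $n = 25$. It assumes that the 10th sample percentile is the order statistic with $k = \lceil 0.1n \rceil$, which is only one of several percentile conventions. It also prints the normal-approximation variance of Eq. A-3 next to the simulated variance of $x_{(k)}$, so that the size of the small-sample discrepancy can be inspected.

```python
import math
from statistics import NormalDist
import numpy as np

rng = np.random.default_rng(7)
n, mu, sigma, reps = 25, 0.0, 1.0, 20_000
k = math.ceil(0.10 * n)      # assumption: 10th sample percentile taken as the k-th order statistic

x = rng.lognormal(mean=mu, sigma=sigma, size=(reps, n))
x_k = np.sort(x, axis=1)[:, k - 1]                       # k-th smallest value of each sample
cdf = NormalDist(mu, sigma).cdf
F_xk = np.array([cdf(v) for v in np.log(x_k)])           # population CDF evaluated at x_(k)

# Exact result: F(x_(k)) ~ Beta(k, n - k + 1) for any continuous parent distribution.
a, b = k, n - k + 1
beta_mean = a / (a + b)
beta_var = a * b / ((a + b) ** 2 * (a + b + 1))

# Normal approximation of Eq. A-3 for the percentile itself, with u = 0.10.
u = 0.10
w_u = NormalDist().inv_cdf(u)
var_normal_approx = 2 * math.pi * sigma**2 / n * u * (1 - u) * math.exp(2 * mu + 2 * sigma * w_u + w_u**2)

print(f"F(x_(k)): mean {F_xk.mean():.4f} vs beta {beta_mean:.4f}; "
      f"var {F_xk.var():.6f} vs beta {beta_var:.6f}")
print(f"Var(x_(k)): simulated {x_k.var():.5f} vs normal approximation (Eq. A-3) {var_normal_approx:.5f}")
```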
Appendix A : Order-Statistics Samples

We wish to derive analytically the expected value and standard deviation of the discretization methods, which can be written in the general form

$$x_{dis} = P_1 x_r + P_2 x_s + P_3 x_t, \tag{A-1}$$

where the subscript $dis$ stands for discretization method and $P_i$ is the weight assigned to the corresponding percentile, $x_u$ ($u = r, s, t$). For this purpose, two assumptions are made. First, $x_u$ is normally distributed with mean $X_u$ and variance $u(1-u)/[n\,h^2(x_u)]$. Second, the covariance of two percentiles, the uth and vth, is $u(1-v)/[n\,h(x_u)\,h(x_v)]$, where $u < v$ and $h$ is the population PDF (Ord and Stuart 1987). Hence, when the population is log-normally distributed with log-mean $\mu$ and log-variance $\sigma^2$, the expected value of the uth percentile is

$$E(x_u) = e^{(\mu + \sigma w_u)}, \tag{A-2}$$

and its variance is

$$Var(x_u) = \frac{2\pi\sigma^2}{n}\,u(1-u)\,e^{\left(2\mu + 2\sigma w_u + w_u^2\right)}. \tag{A-3}$$

Here $w_u = \Phi^{-1}(u/100)$, $\Phi$ denotes the cumulative standard normal distribution function, and $n$ is the sample size. The covariance of the uth and vth percentiles, where $u < v$, is given by

$$Cov(x_u, x_v) = \frac{2\pi\sigma^2}{n}\,u(1-v)\,e^{\left[2\mu + \sigma(w_u + w_v) + \frac{1}{2}\left(w_u^2 + w_v^2\right)\right]}. \tag{A-4}$$

Eq. A-1 is a linear combination of three percentiles, so Pearson's method is used to derive the analytical expressions of $E(x_{dis})$ and $Var(x_{dis})$ (Ord and Stuart 1987).
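Eqs. A-2 to A-4 give the mean and variance of any linear percentile rule through the usual formulas for a linear combination of random variables. The sketch below (Python/NumPy) evaluates these formulas for Swanson's rule. The function names and parameter values are illustrative, and $u$ is passed as a fraction. The result is compared with a Monte Carlo simulation at $n = 400$, a sample size at which the large-sample normal approximation of the percentiles is expected to hold.

```python
import math
from statistics import NormalDist
import numpy as np

def w(u):
    """w_u = Phi^{-1}(u), with u given as a fraction."""
    return NormalDist().inv_cdf(u)

def pct_mean(u, mu, sigma):
    return math.exp(mu + sigma * w(u))                                                # Eq. A-2

def pct_var(u, mu, sigma, n):
    return 2 * math.pi * sigma**2 / n * u * (1 - u) * math.exp(2 * mu + 2 * sigma * w(u) + w(u) ** 2)  # Eq. A-3

def pct_cov(u, v, mu, sigma, n):
    """Eq. A-4, valid for u < v."""
    wu, wv = w(u), w(v)
    return 2 * math.pi * sigma**2 / n * u * (1 - v) * math.exp(2 * mu + sigma * (wu + wv) + (wu**2 + wv**2) / 2)

def discretization_moments(P, probs, mu, sigma, n):
    """Mean and variance of sum_i P_i x_{u_i}; probs must be in ascending order."""
    mean = sum(p * pct_mean(u, mu, sigma) for p, u in zip(P, probs))
    var = sum(p**2 * pct_var(u, mu, sigma, n) for p, u in zip(P, probs))
    for i in range(len(P)):
        for j in range(i + 1, len(P)):
            var += 2 * P[i] * P[j] * pct_cov(probs[i], probs[j], mu, sigma, n)
    return mean, var

mu, sigma, n, reps = 0.0, 1.0, 400, 5000
mean_sr, var_sr = discretization_moments((0.3, 0.4, 0.3), (0.10, 0.50, 0.90), mu, sigma, n)

rng = np.random.default_rng(11)
samples = rng.lognormal(mean=mu, sigma=sigma, size=(reps, n))
p10, p50, p90 = np.percentile(samples, [10, 50, 90], axis=1)
sr = 0.3 * p10 + 0.4 * p50 + 0.3 * p90

print(f"E(x_SR): analytic {mean_sr:.4f}, Monte Carlo {sr.mean():.4f}")
print(f"SD(x_SR): analytic {math.sqrt(var_sr):.4f}, Monte Carlo {sr.std(ddof=1):.4f}")
```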
Pearson's method gives the expected value and variance of a function of RVs, $g_{\theta_1,\ldots,\theta_k}(x_1,\ldots,x_k)$. It takes the Taylor series expansion of the function around the expected values of the RVs, $X_1,\ldots,X_k$, and truncates the expansion after the first-order term:

$$g(x) = g(\theta) + \sum_{i=1}^{k} g_i'(\theta)\,(x_i - \theta_i) + O(n^{-1}), \tag{A-5}$$

where $g_i'(x) = \partial g(x_1,\ldots,x_i,\ldots,x_k)/\partial x_i$ is evaluated at $\theta = \{\theta_1,\ldots,\theta_k\}$, and $\theta_i$ is the expected value of the RV $X_i$, $i = 1,\ldots,k$. It follows that the expected value of $g(x)$ is

$$E[g(x)] = g(\theta) + O(n^{-1}), \tag{A-6}$$

and its variance is

$$Var[g(x)] = \sum_{i=1}^{k} g_i'(\theta)^2\,Var(X_i) + \sum_{i=1}^{k}\sum_{j \neq i} g_i'(\theta)\,g_j'(\theta)\,Cov(X_i, X_j) + O(n^{-1}). \tag{A-7}$$

The Taylor expansion of Eq. A-1 is given by

$$x_{dis} = P_1E(x_r) + P_2E(x_s) + P_3E(x_t) + P_1[x_r - E(x_r)] + P_2[x_s - E(x_s)] + P_3[x_t - E(x_t)]. \tag{A-8}$$

Consequently, the expected value of Eq. A-8 is

$$E(x_{dis}) = P_1E(x_r) + P_2E(x_s) + P_3E(x_t), \tag{A-9}$$

and its variance is

$$Var(x_{dis}) = P_1^2Var(x_r) + P_2^2Var(x_s) + P_3^2Var(x_t) + 2P_1P_2\,Cov(x_r, x_s) + 2P_1P_3\,Cov(x_r, x_t) + 2P_2P_3\,Cov(x_s, x_t). \tag{A-10}$$

Substituting Eq. A-2 into Eq. A-9 yields

$$E(x_{dis}) = P_1e^{(\mu + \sigma w_r)} + P_2e^{(\mu + \sigma w_s)} + P_3e^{(\mu + \sigma w_t)}, \tag{A-11}$$

and applying Eqs. A-3 and A-4 in Eq. A-10 gives

$$Var(x_{dis}) = \frac{2\pi\sigma^2}{n}e^{2\mu}\Big\{P_1^2\,r(1-r)\,e^{2\sigma w_r + w_r^2} + P_2^2\,s(1-s)\,e^{2\sigma w_s + w_s^2} + P_3^2\,t(1-t)\,e^{2\sigma w_t + w_t^2} + 2P_1P_2\,r(1-s)\,e^{\sigma(w_r + w_s) + \frac{w_r^2 + w_s^2}{2}} + 2P_1P_3\,r(1-t)\,e^{\sigma(w_r + w_t) + \frac{w_r^2 + w_t^2}{2}} + 2P_2P_3\,s(1-t)\,e^{\sigma(w_s + w_t) + \frac{w_s^2 + w_t^2}{2}}\Big\}. \tag{A-12}$$

Appendix B : Moments of the Maximum Likelihood Estimator

MLE approximates the parameters of a population by maximizing the likelihood function. Consider any data set $(x_1,\ldots,x_n)$ taken from a log-normal population with log-mean $\mu$ and log-variance $\sigma^2$. For such a data set, MLE estimates the mean value as $x_{MLE} = \exp(m_y + s_y^2/2)$, with $y_i = \ln(x_i)$, where

$$m_y = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[\ln(x_i) - m_y\right]^2.$$

The sample mean and sample variance are independent RVs when random samples are drawn from a normal distribution. Here $m_y$ and $s_y^2$ are the sample mean and sample variance of a data set drawn from the normal distribution $Y = \ln(X) \sim N(\mu, \sigma^2)$, so they are independent, and their covariance is therefore zero. The expected value and variance of $x_{MLE}$ follow from the property of expectation that if two RVs $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$. Thus

$$E(x_{MLE}) = E\left(e^{m_y + s_y^2/2}\right) = E\left(e^{m_y}\right)E\left(e^{s_y^2/2}\right). \tag{B-1}$$

Because the $y_i$ are normally distributed, $m_y$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$ (more generally, this holds approximately by the CLT). Therefore, by the properties of the log-normal distribution, $e^{m_y}$ has mean $e^{\mu + \sigma^2/(2n)}$ and variance $e^{2\mu + \sigma^2/n}\left(e^{\sigma^2/n} - 1\right)$. The expectation of $e^{a s_y^2}$ is given by

$$E\left(e^{a s_y^2}\right) = \left(1 - \frac{2a\sigma^2}{n-1}\right)^{-\frac{n-1}{2}}, \tag{B-2}$$

which holds for $2a\sigma^2 < n-1$, where $a$ is a constant coefficient (Finney 1941).
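Eq. B-2 is the moment generating function of a scaled chi-square variable, because $(n-1)s_y^2/\sigma^2 \sim \chi^2_{n-1}$ for i.i.d. normal samples. The sketch below (Python/NumPy; parameter values are hypothetical) checks it by simulation for the two values needed in this appendix: $a = 1/2$, used in $E(e^{s_y^2/2})$, and $a = 1$, used in $E(e^{s_y^2})$. The moment exists only when $2a\sigma^2 < n-1$, and the helper raises an error otherwise.

```python
import numpy as np

def finney_mgf(a, sigma, n):
    """E[exp(a * s^2)] for the (n-1)-divisor sample variance s^2 of n i.i.d. N(mu, sigma^2)
    values (Eq. B-2); it exists only when 2*a*sigma^2/(n-1) < 1."""
    base = 1 - 2 * a * sigma**2 / (n - 1)
    if base <= 0:
        raise ValueError("moment does not exist: need 2*a*sigma**2 < n - 1")
    return base ** (-(n - 1) / 2)

rng = np.random.default_rng(3)
n, mu, sigma, reps = 20, 1.0, 1.0, 200_000
y = rng.normal(mu, sigma, size=(reps, n))
s2 = y.var(axis=1, ddof=1)

checks = {}
for a in (0.5, 1.0):
    checks[a] = (float(np.exp(a * s2).mean()), finney_mgf(a, sigma, n))
    print(f"a = {a}: Monte Carlo {checks[a][0]:.4f}   Eq. B-2 {checks[a][1]:.4f}")
```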
Thus, $E\left(e^{s_y^2/2}\right) = \left[1 - \sigma^2/(n-1)\right]^{-(n-1)/2}$, and consequently the expected value of $x_{MLE}$ is given by

$$E(x_{MLE}) = e^{\left(\mu + \frac{\sigma^2}{2}\right)}\,e^{-\frac{(n-1)\sigma^2}{2n}}\left(1 - \frac{\sigma^2}{n-1}\right)^{-\frac{n-1}{2}}. \tag{B-3}$$

Eq. B-3 shows that MLE is asymptotically unbiased. As $n$ becomes sufficiently large, the factor $e^{-(n-1)\sigma^2/(2n)}\left[1 - \sigma^2/(n-1)\right]^{-(n-1)/2}$ in Eq. B-3 approaches one, and thus $E(x_{MLE}) \to E(X)$. The variance of $e^{m_y + s_y^2/2}$ can be written as

$$Var\left(e^{m_y + s_y^2/2}\right) = E\left(e^{2m_y + s_y^2}\right) - \left[E\left(e^{m_y + s_y^2/2}\right)\right]^2 = E\left(e^{2m_y}\right)E\left(e^{s_y^2}\right) - \left[E\left(e^{m_y + s_y^2/2}\right)\right]^2. \tag{B-4}$$

From the properties of the log-normal distribution, $E\left(e^{b\,m_y}\right) = e^{b\mu + b^2\sigma^2/(2n)}$, where $b$ is a constant coefficient. From Eq. B-2, $E\left(e^{s_y^2}\right) = \left[1 - 2\sigma^2/(n-1)\right]^{-(n-1)/2}$. Eq. B-4 then simplifies, and the variance of MLE is given by

$$Var(x_{MLE}) = e^{\left(2\mu + \frac{\sigma^2}{n}\right)}\left\{e^{\frac{\sigma^2}{n}}\left(1 - \frac{2\sigma^2}{n-1}\right)^{-\frac{n-1}{2}} - \left(1 - \frac{\sigma^2}{n-1}\right)^{-(n-1)}\right\}. \tag{B-5}$$

Appendix C : Conditions for a Bimodal Distribution

Investigators such as Eisenberger (1964), Robertson and Fryer (1969), Behboodian (1970), and Schilling et al. (2002) have analyzed the bimodality of a mixture of two normal distributions. They proposed an interval for the difference between the means of the two distributions such that the mixture yields a bimodal distribution. When the difference between the means lies outside this interval, the mixture is unimodal. In this appendix, we seek the condition(s) under which a mixture of two log-normal distributions yields a bimodal distribution. Suppose the PDF $h(x)$ has two modes and can be split into two log-normal distributions. These have log-means $\mu_1$ and $\mu_2$ and log-standard deviations $\sigma_1$ and $\sigma_2$, with $e^{(\mu_1 - \sigma_1^2)} < e^{(\mu_2 - \sigma_2^2)}$; that is, the mode of the first distribution is smaller than the mode of the second.
Hence, the first derivative of the PDF, $h'(x)$, should have three real roots. When $x < e^{(\mu_1 - \sigma_1^2)}$, $h'(x) > 0$ because both $h_1'$ and $h_2'$ are positive. When $x > e^{(\mu_2 - \sigma_2^2)}$, $h'(x) < 0$ because $h_1'$ and $h_2'$ are both negative. Thus, the roots of $h'(x)$ (i.e., the solutions of $h'(x) = 0$) can only lie in the interval $e^{\mu_1 - \sigma_1^2} < x < e^{\mu_2 - \sigma_2^2}$. For bimodality, this interval must contain an $x$ at which $h'(x) = 0$ and $h''(x) > 0$, so that $h(x)$ is concave up between the two modes. The first and second derivatives of $h(x)$ are, respectively,

$$h'(x) = -\alpha\,\frac{\sigma_1^2 + \ln(x) - \mu_1}{x^2\sigma_1^3\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{\ln(x) - \mu_1}{\sigma_1}\right)^2} - (1-\alpha)\,\frac{\sigma_2^2 + \ln(x) - \mu_2}{x^2\sigma_2^3\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{\ln(x) - \mu_2}{\sigma_2}\right)^2}, \tag{C-1}$$

and

$$h''(x) = \frac{-\alpha}{x^3\sigma_1^3\sqrt{2\pi}}\left\{-2\left[\ln(x) - \mu_1 + \sigma_1^2\right] - \frac{\left[\ln(x) - \mu_1\right]\left[\ln(x) - \mu_1 + \sigma_1^2\right]}{\sigma_1^2} + 1\right\}e^{-\frac{1}{2}\left(\frac{\ln(x) - \mu_1}{\sigma_1}\right)^2} + \frac{-(1-\alpha)}{x^3\sigma_2^3\sqrt{2\pi}}\left\{-2\left[\ln(x) - \mu_2 + \sigma_2^2\right] - \frac{\left[\ln(x) - \mu_2\right]\left[\ln(x) - \mu_2 + \sigma_2^2\right]}{\sigma_2^2} + 1\right\}e^{-\frac{1}{2}\left(\frac{\ln(x) - \mu_2}{\sigma_2}\right)^2}. \tag{C-2}$$

Combining the two conditions $h'(x) = 0$ and $h''(x) > 0$ yields a cubic inequality:

$$f(x) \equiv \frac{1}{\sigma_1^2}\Big\{(r-1)\ln^3(x) + \left[2A_1 + A_2 - r(A_1 + 2A_2)\right]\ln^2(x) + \left[r\left(2A_1A_2 + A_2^2\right) - \left(2A_1A_2 + A_1^2\right)\right]\ln(x) + \left[A_1^2A_2 - rA_1A_2^2 - \sigma_1^2(A_2 - A_1)\right]\Big\} > 0, \tag{C-3}$$

where $A_1 = \mu_1 - \sigma_1^2$, $A_2 = \mu_2 - \sigma_2^2$, and $r = \sigma_1^2/\sigma_2^2$. At either $x = e^{(\mu_1 - \sigma_1^2)}$ or $x = e^{(\mu_2 - \sigma_2^2)}$, $f(x)$ is negative. Written as a function of $\ln(x)$, this means $f(A_1) = f(A_2) = -(A_2 - A_1) < 0$. For $h(x)$ to be bimodal, the cubic $f(x)$ must have two real roots in this interval, and hence three distinct real roots. The discriminant of $f(x)$ is, up to a positive factor,

$$D = r^2(A_2 - A_1)^4 + 2\sigma_1^2\left(2r^3 - 3r^2 - 3r + 2\right)(A_2 - A_1)^2 - 27\sigma_1^4(r-1)^2, \tag{C-4}$$

and it must therefore be positive. Eq. C-4 is a quadratic equation in $(A_2 - A_1)^2$. Setting Eq. C-4 equal to zero and solving gives the critical separation

$$\frac{(A_2 - A_1)_0}{\sigma_1} = \left\{\frac{1}{r^2}\left[-2r^3 + 3r^2 + 3r - 2 + 2\left(1 - r + r^2\right)^{3/2}\right]\right\}^{1/2}. \tag{C-5}$$

The discriminant is either negative or zero if $(A_2 - A_1) \le (A_2 - A_1)_0$, and consequently $f(x)$ has at most one real root.
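The discriminant condition can be checked numerically. The sketch below (Python/NumPy; parameter values are hypothetical) does four things:

1. It builds the cubic of Eq. C-3 in $L = \ln(x)$.
2. It writes the discriminant as $D = r^2(A_2-A_1)^4 + 2\sigma_1^2(2r^3-3r^2-3r+2)(A_2-A_1)^2 - 27\sigma_1^4(r-1)^2$.
3. It cross-checks $D$ against the textbook cubic discriminant $18abcd - 4b^3d + b^2c^2 - 4ac^3 - 27a^2d^2$, which should equal $(A_2-A_1)^2 D$ for $\sigma_1^2 f$.
4. It counts the real roots of $f$ between the two log-modes, just below and just above the critical separation of Eq. C-5.

```python
import numpy as np

def cubic_coeffs(A1, A2, r, s1sq):
    """sigma_1^2 * f from Eq. C-3 as a cubic in L = ln(x), highest power first."""
    return [r - 1,
            2 * A1 + A2 - r * (A1 + 2 * A2),
            r * (2 * A1 * A2 + A2**2) - (2 * A1 * A2 + A1**2),
            A1**2 * A2 - r * A1 * A2**2 - s1sq * (A2 - A1)]

def cubic_discriminant(a, b, c, d):
    """Textbook discriminant of a*L^3 + b*L^2 + c*L + d."""
    return 18 * a * b * c * d - 4 * b**3 * d + b**2 * c**2 - 4 * a * c**3 - 27 * a**2 * d**2

def D_c4(A1, A2, r, s1sq):
    """Discriminant in the form of Eq. C-4 (cubic discriminant divided by (A2 - A1)^2)."""
    g2 = (A2 - A1) ** 2
    return r**2 * g2**2 + 2 * s1sq * (2 * r**3 - 3 * r**2 - 3 * r + 2) * g2 - 27 * s1sq**2 * (r - 1) ** 2

def critical_gap(r, s1):
    """Eq. C-5: the critical separation (A2 - A1)_0."""
    return s1 * np.sqrt((-2 * r**3 + 3 * r**2 + 3 * r - 2 + 2 * (1 - r + r**2) ** 1.5) / r**2)

def roots_between(A1, A2, r, s1sq, tol=1e-9):
    """Real roots of f (in ln x) lying strictly between the two log-modes A1 and A2."""
    roots = np.roots(cubic_coeffs(A1, A2, r, s1sq))
    real = roots[np.abs(roots.imag) < tol].real
    return np.sort(real[(real > A1) & (real < A2)])

# Hypothetical example: sigma_1 = 1 and r = 0.5 (i.e., sigma_2^2 = 2).
A1, s1, r = 0.0, 1.0, 0.5
gap0 = critical_gap(r, s1)
for factor in (0.9, 1.1):
    A2 = A1 + factor * gap0
    print(f"A2 - A1 = {factor:.1f} x critical gap: D = {D_c4(A1, A2, r, s1**2):+.3f}, "
          f"roots in (A1, A2): {roots_between(A1, A2, r, s1**2)}")
```

For $r = 1$, the threshold reduces to $2\sigma_1$, the familiar condition for a mixture of two normal distributions with equal variances.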
$f(x)$ is negative at both boundaries and has at most one real root, so it cannot become positive within the interval. Hence $h''(x)$ is not positive wherever $h'(x) = 0$ in the interval; in other words, $h(x)$ is concave down at its stationary points. Therefore, $h(x)$ is unimodal.

Now suppose that $(A_2 - A_1) > (A_2 - A_1)_0$. In that case $f(x)$ in Eq. C-3 has two real roots, $x_1$ and $x_2$, with $e^{\mu_1 - \sigma_1^2} < x_1 < x_2 < e^{\mu_2 - \sigma_2^2}$, because $f'\left(e^{\mu_1 - \sigma_1^2}\right) > 0$ and $f'\left(e^{\mu_2 - \sigma_2^2}\right) < 0$. $f(x)$ is positive between $x_1$ and $x_2$. It is negative for $e^{\mu_1 - \sigma_1^2} < x < x_1$ and for $x_2 < x < e^{\mu_2 - \sigma_2^2}$. This means that $h(x)$ cannot have more than two modes, and $h(x)$ is bimodal. Eq. 4-2 shows that the relation between $\alpha$ and $x$ is one-to-one, because $dx/d\alpha < 0$ in the interval of interest, $e^{\mu_1 - \sigma_1^2} < x < e^{\mu_2 - \sigma_2^2}$. Substituting $x_1$ and $x_2$ into Eq. 4-2 yields two values, $\alpha_1$ and $\alpha_2$. These give the second necessary condition for a bimodal distribution: $\alpha$ must lie in the open interval $(\alpha_1, \alpha_2)$. The boundaries of $\alpha$ are given by

$$\alpha_j = \frac{-\left[\sigma_2^2 + \ln(x_j) - \mu_2\right]e^{-\frac{1}{2}\left(\frac{\ln(x_j) - \mu_2}{\sigma_2}\right)^2}}{r^{-1.5}\left[\sigma_1^2 + \ln(x_j) - \mu_1\right]e^{-\frac{1}{2}\left(\frac{\ln(x_j) - \mu_1}{\sigma_1}\right)^2} - \left[\sigma_2^2 + \ln(x_j) - \mu_2\right]e^{-\frac{1}{2}\left(\frac{\ln(x_j) - \mu_2}{\sigma_2}\right)^2}}, \tag{C-6}$$

where $x_j$, $j = 1, 2$, is the jth root of $f(x)$ that lies in the interval of interest.

Appendix D : First and Second Moments of the Maximum Likelihood Estimator for a Bimodal Distribution

Let the RVs $X_1,\ldots,X_n$ be i.i.d. and follow a bimodal distribution that can be split into two log-normal distributions as

$$h(x;\mu,\sigma^2,\alpha) = \alpha\,h_1(x;\mu_1,\sigma_1^2) + (1-\alpha)\,h_2(x;\mu_2,\sigma_2^2), \tag{D-1}$$

where $h_1(x;\mu_1,\sigma_1^2)$ and $h_2(x;\mu_2,\sigma_2^2)$ are the PDFs of two log-normal distributions with log-means $\mu_1$ and $\mu_2$ and log-variances $\sigma_1^2$ and $\sigma_2^2$. The proportion of the first distribution in the population, $\alpha$, varies from zero to one. As in Appendix B, $x_{MLE_j} = \exp\left(m_{y_j} + s_{y_j}^2/2\right)$, where $y_{ij} = \ln(x_{ij})$, $m_{y_j} = \frac{1}{n}\sum_{i=1}^{n} y_{ij}$, $s_{y_j}^2 = \sum_{i=1}^{n}\left[\ln(x_{ij}) - m_{y_j}\right]^2/(n-1)$, and $j = 1, 2$. Then

$$x_{MLE} = \alpha\,x_{MLE_1} + (1-\alpha)\,x_{MLE_2}. \tag{D-2}$$

The expected value and variance of a sum of two independent random variables then give the first and second moments of MLE for the bimodal distribution:

$$E(x_{MLE}) = \alpha\,E(x_{MLE_1}) + (1-\alpha)\,E(x_{MLE_2}), \tag{D-3}$$

and

$$Var(x_{MLE}) = \alpha^2\,Var(x_{MLE_1}) + (1-\alpha)^2\,Var(x_{MLE_2}). \tag{D-4}$$

Applying Eqs. B-3 and B-5 to Eqs. D-3 and D-4, respectively, yields the expected value and variance of MLE:

$$E(x_{MLE}) = \alpha\,e^{\mu_1 + \frac{\sigma_1^2}{2}}\,e^{-\frac{(n-1)\sigma_1^2}{2n}}\left(1 - \frac{\sigma_1^2}{n-1}\right)^{-\frac{n-1}{2}} + (1-\alpha)\,e^{\mu_2 + \frac{\sigma_2^2}{2}}\,e^{-\frac{(n-1)\sigma_2^2}{2n}}\left(1 - \frac{\sigma_2^2}{n-1}\right)^{-\frac{n-1}{2}}, \tag{D-5}$$

and

$$Var(x_{MLE}) = \alpha^2\,e^{2\mu_1 + \frac{\sigma_1^2}{n}}\left[e^{\frac{\sigma_1^2}{n}}\left(1 - \frac{2\sigma_1^2}{n-1}\right)^{-\frac{n-1}{2}} - \left(1 - \frac{\sigma_1^2}{n-1}\right)^{-(n-1)}\right] + (1-\alpha)^2\,e^{2\mu_2 + \frac{\sigma_2^2}{n}}\left[e^{\frac{\sigma_2^2}{n}}\left(1 - \frac{2\sigma_2^2}{n-1}\right)^{-\frac{n-1}{2}} - \left(1 - \frac{\sigma_2^2}{n-1}\right)^{-(n-1)}\right]. \tag{D-6}$$

Appendix E : First and Second Moments of a Power-Normal Distribution

Several approaches exist for deriving the statistical properties of a power-normal distribution; this study uses the approach of Freeman and Modarres (2006). Let $X$ be power-normally distributed with transformed mean $\mu$, transformed variance $\sigma^2$, and exponent $\lambda$. The rth moment of the power-normal distribution is given by

$$E(X^r) = \int_0^\infty x^r\left[\frac{x^{\lambda-1}}{\Phi[\mathrm{sgn}(\lambda)K]\,\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{(x^\lambda - 1)/\lambda - \mu}{\sigma}\right)^2}\right]dx. \tag{E-1}$$

The RV $X$ is obtained by the inverse transformation $X = (1 + \lambda Y)^{1/\lambda}$ for $\lambda \ne 0$ and $X = \exp(Y)$ for $\lambda = 0$. Eq. E-1 is rearranged as

$$E(X^r) = \begin{cases}\displaystyle\int_{-1/\lambda}^{\infty}\frac{(1 + \lambda y)^{r/\lambda}}{\Phi[\mathrm{sgn}(\lambda)K]\,\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2}dy, & \lambda > 0,\\[2ex] \displaystyle\int_{-\infty}^{-1/\lambda}\frac{(1 + \lambda y)^{r/\lambda}}{\Phi[\mathrm{sgn}(\lambda)K]\,\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2}dy, & \lambda < 0.\end{cases} \tag{E-2}$$

To simplify Eq. E-2, the term $(1 + \lambda y)^{r/\lambda}$ is expanded in a Taylor series about the mean $\mu$:

$$T(y) = (1 + \lambda y)^{r/\lambda} = \sum_{i=0}^{\infty}\frac{1}{i!}\,T^{(i)}(\mu)\,(y - \mu)^i, \tag{E-3}$$

where $T^{(i)}(y) = (1 + \lambda y)^{\frac{r}{\lambda} - i}\prod_{j=0}^{i-1}(r - j\lambda)$ is the ith derivative of $T(y)$ with respect to $y$. Therefore, Eq. E-2 becomes

$$E(X^r) = \begin{cases}\displaystyle\int_{-1/\lambda}^{\infty}\frac{\sum_{i=0}^{\infty}\frac{1}{i!}T^{(i)}(\mu)(y - \mu)^i}{\Phi[\mathrm{sgn}(\lambda)K]\,\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2}dy, & \lambda > 0,\\[2ex] \displaystyle\int_{-\infty}^{-1/\lambda}\frac{\sum_{i=0}^{\infty}\frac{1}{i!}T^{(i)}(\mu)(y - \mu)^i}{\Phi[\mathrm{sgn}(\lambda)K]\,\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2}dy, & \lambda < 0.\end{cases} \tag{E-4}$$

A new RV, $z = (y - \mu)/\sigma$, is introduced, which follows a truncated standard normal distribution. Eq. E-4 then reduces to

$$E(X^r) = \frac{1}{\Phi[\mathrm{sgn}(\lambda)K]}\begin{cases}\displaystyle\sum_{i=0}^{\infty}\left[\frac{1}{i!}T^{(i)}(\mu)\,\sigma^i\right]\int_{-K}^{\infty} z^i\,\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz, & \lambda > 0,\\[2ex] \displaystyle\sum_{i=0}^{\infty}\left[\frac{1}{i!}T^{(i)}(\mu)\,\sigma^i\right]\int_{-\infty}^{-K} z^i\,\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz, & \lambda < 0.\end{cases} \tag{E-5}$$

The integral term in the equation above is a partial moment of the standard normal distribution. It is also related to the Gamma function, defined by the integral

$$\Gamma(k) = \int_0^\infty t^{k-1}e^{-t}\,dt. \tag{E-6}$$

With Eq. E-6, the integral term of Eq. E-5 is evaluated by

$$E_T(z^i) = \begin{cases}\displaystyle\int_{-K}^{\infty} z^i\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz = \int_{-\infty}^{\infty} z^i\phi(z)\,dz - \int_{-\infty}^{-K} z^i\phi(z)\,dz, & \lambda > 0,\\[2ex] \displaystyle\int_{-\infty}^{-K} z^i\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz = \int_{-\infty}^{\infty} z^i\phi(z)\,dz - \int_{-K}^{\infty} z^i\phi(z)\,dz, & \lambda < 0,\end{cases} \tag{E-7}$$

where $E_T(z^i)$ is the ith moment of a truncated standard normal distribution and $\phi$ is the standard normal PDF. The first term, $\int_{-\infty}^{\infty} z^i\phi(z)\,dz$, is

$$E(z^i) = \int_{-\infty}^{\infty} z^i\phi(z)\,dz = \begin{cases}0, & i \text{ odd},\\[1ex] \dfrac{i!}{2^{i/2}\,(i/2)!}, & i \text{ even}.\end{cases} \tag{E-8}$$

Finally, the rth moment of a power-normal distribution is evaluated as

$$E(X^r) = \frac{1}{\Phi[\mathrm{sgn}(\lambda)K]}\begin{cases}\displaystyle\sum_{i=0,\ i\ \mathrm{even}}^{\infty}\frac{T^{(i)}(\mu)\,\sigma^i}{2^{i/2}\,(i/2)!} - \sum_{i=0}^{\infty}\frac{T^{(i)}(\mu)\,\sigma^i}{i!}\int_{-\infty}^{-K} z^i\phi(z)\,dz, & \lambda > 0,\\[2ex] \displaystyle\sum_{i=0,\ i\ \mathrm{even}}^{\infty}\frac{T^{(i)}(\mu)\,\sigma^i}{2^{i/2}\,(i/2)!} - \sum_{i=0}^{\infty}\frac{T^{(i)}(\mu)\,\sigma^i}{i!}\int_{-K}^{\infty} z^i\phi(z)\,dz, & \lambda < 0,\end{cases} \tag{E-9}$$

where $T^{(i)}(\mu) = (1 + \lambda\mu)^{\frac{r}{\lambda} - i}\prod_{j=0}^{i-1}(r - j\lambda)$.
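When $1/\lambda$ is an integer and truncation is negligible (large $K$), $X = (1+\lambda Y)^{1/\lambda}$ is a polynomial in a normal variable. Gauss-Hermite quadrature therefore gives $E(X)$ exactly. The sketch below (Python/NumPy; $\mu = 2$ and $\sigma = 1.5$ are hypothetical) uses this fact to check the expected values and the SR and PT biases tabulated in this appendix. The biases are evaluated from population percentiles as $E(x_{rule}) - E(X)$. The closed forms compared against are written out in `table_e1` and `table_bias`, with the SR and PT $\lambda = 1/4$ bias written as $(c\,w^4 - 3)\sigma^4/256 + 3(c\,w^2 - 1)(\mu/4+1)^2\sigma^2/8$ ($c = 0.6$, $w = w_{10}$ for SR; $c = 0.37$, $w = w_5$ for PT).

```python
import math
from statistics import NormalDist
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule with 10 nodes: exact for polynomials in Z of degree <= 19.
nodes, wts = hermegauss(10)
wts = wts / math.sqrt(2 * math.pi)      # now sum(wts * g(nodes)) = E[g(Z)] for Z ~ N(0, 1)

def back_transform(y, lam):
    """x = (1 + lam*y)**(1/lam); here 1/lam is an integer (lam = 1, 1/2, 1/3, 1/4)."""
    return (1 + lam * y) ** round(1 / lam)

def mean_x(mu, sigma, lam):
    """E(X) by quadrature, assuming K is large enough that truncation of Y can be ignored."""
    return float(np.sum(wts * back_transform(mu + sigma * nodes, lam)))

def rule_mean(mu, sigma, lam, weights, probs):
    """Value of a discretization rule evaluated at the population percentiles."""
    z = NormalDist()
    return sum(p * back_transform(mu + sigma * z.inv_cdf(u), lam) for p, u in zip(weights, probs))

def table_e1(mu, sigma, lam):
    if lam == 1/4:
        a = mu / 4 + 1
        return a**4 + 3 * sigma**2 * a**2 / 8 + 3 * sigma**4 / 256
    if lam == 1/3:
        a = mu / 3 + 1
        return a**3 + sigma**2 * a / 3
    if lam == 1/2:
        return (mu / 2 + 1) ** 2 + sigma**2 / 4
    return mu + 1                                            # lam == 1

def table_bias(mu, sigma, lam, c, w):
    """Bias formulas: c = 0.6, w = w_10 for SR; c = 0.37, w = w_5 for PT."""
    if lam == 1/4:
        return (c * w**4 - 3) * sigma**4 / 256 + 3 * (c * w**2 - 1) * (mu / 4 + 1) ** 2 * sigma**2 / 8
    if lam == 1/3:
        return (c * w**2 - 1) * (mu / 3 + 1) * sigma**2 / 3
    if lam == 1/2:
        return (c * w**2 - 1) * sigma**2 / 4
    return 0.0

mu, sigma = 2.0, 1.5
W10, W5 = NormalDist().inv_cdf(0.10), NormalDist().inv_cdf(0.05)
for lam in (1/4, 1/3, 1/2, 1):
    ex = mean_x(mu, sigma, lam)
    b_sr = rule_mean(mu, sigma, lam, (0.3, 0.4, 0.3), (0.10, 0.50, 0.90)) - ex
    b_pt = rule_mean(mu, sigma, lam, (0.185, 0.63, 0.185), (0.05, 0.50, 0.95)) - ex
    print(f"lambda={lam:.3f}  E(X)={ex:9.4f} (closed form {table_e1(mu, sigma, lam):9.4f})  "
          f"b_SR={b_sr:+.5f} ({table_bias(mu, sigma, lam, 0.6, W10):+.5f})  "
          f"b_PT={b_pt:+.5f} ({table_bias(mu, sigma, lam, 0.37, W5):+.5f})")
```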
Taking the logarithm of the data points to convert them to a normal distribution corresponds to the power-normal distribution with $\lambda = 0$, so the rth moment is given by

$$E(X^r) = e^{\left(r\mu + \frac{r^2\sigma^2}{2}\right)}. \tag{E-10}$$

If $K$ is sufficiently large that $Y$ is effectively normally distributed, the first moment of the RV $X$ for a few power-normal distributions with different exponents, $\lambda > 0$, is as given in Table E-1. The AA is unbiased; however, SR and PT are biased, and their biases are, respectively,

$$b_{SR} = \left[0.3\left(1 + \lambda\sigma w_{10} + \lambda\mu\right)^{1/\lambda} + 0.4\left(1 + \lambda\sigma w_{50} + \lambda\mu\right)^{1/\lambda} + 0.3\left(1 + \lambda\sigma w_{90} + \lambda\mu\right)^{1/\lambda}\right] - E(X), \tag{E-11}$$

and

$$b_{PT} = \left[0.185\left(1 + \lambda\sigma w_{5} + \lambda\mu\right)^{1/\lambda} + 0.63\left(1 + \lambda\sigma w_{50} + \lambda\mu\right)^{1/\lambda} + 0.185\left(1 + \lambda\sigma w_{95} + \lambda\mu\right)^{1/\lambda}\right] - E(X), \tag{E-12}$$

where $E(X)$ is the expected value of $X$ given in Table E-1, and $b_{SR}$ and $b_{PT}$ are the biases of SR and PT, respectively.

Table E-1 – Expected value of the power-normal distribution for different λ values.

| $\lambda$ | $E(X)$ |
|---|---|
| 1/4 | $(\mu/4 + 1)^4 + 3\sigma^2(\mu/4 + 1)^2/8 + 3\sigma^4/256$ |
| 1/3 | $(\mu/3 + 1)^3 + \sigma^2(\mu/3 + 1)/3$ |
| 1/2 | $(\mu/2 + 1)^2 + \sigma^2/4$ |
| 1 | $\mu + 1$ |

Table E-2 – Bias of Swanson's rule for different λ values.

| $\lambda$ | $b_{SR} = E(x_{SR}) - E(X)$ |
|---|---|
| 1/4 | $(0.6w_{10}^4 - 3)\,\sigma^4/256 + 3(0.075w_{10}^2 - 0.125)(\mu/4 + 1)^2\sigma^2$ |
| 1/3 | $(0.6w_{10}^2 - 1)(\mu/3 + 1)\,\sigma^2/3$ |
| 1/2 | $(0.6w_{10}^2 - 1)\,\sigma^2/4$ |
| 1 | 0 |

Eqs. E-11 and E-12 are general equations for the biases of SR and PT. They show that these biases are functions of $\lambda$, $\sigma$, and $\mu$. Tables E-2 and E-3 give $b_{SR}$ and $b_{PT}$, respectively, for power-normal distributions with four different exponents. As these two tables show, SR and PT are unbiased for a normal distribution ($\lambda = 1$), but the bias increases as $\lambda$ tends to zero.

Table E-3 – Bias of Pearson-Tukey for different λ values.

| $\lambda$ | $b_{PT} = E(x_{PT}) - E(X)$ |
|---|---|
| 1/4 | $(0.37w_{5}^4 - 3)\,\sigma^4/256 + 3(0.04625w_{5}^2 - 0.125)(\mu/4 + 1)^2\sigma^2$ |
| 1/3 | $(0.37w_{5}^2 - 1)(\mu/3 + 1)\,\sigma^2/3$ |
| 1/2 | $(0.37w_{5}^2 - 1)\,\sigma^2/4$ |
| 1 | 0 |

Appendix F : Parameters of the First-Order Auto-Regressive Model

Let $\{Y_z\}$ follow the first-order auto-regressive model

$$Y_z = C + \rho_1 Y_{z-1} + \varepsilon_z, \tag{F-1}$$

where $C$ is a constant, $\varepsilon_z$ is a normally distributed RV with mean $\mu_\varepsilon$ and variance $\sigma_\varepsilon^2$, and $z$ is the location where $Y$ is measured. $\{Y_z\}$ is assumed to be stationary, which means that all moments of $Y_z$ are constant and independent of location $z$: $E(Y_z) = \mu$ for all $z$, $Var(Y_z) = \sigma^2$ for all $z$, and so on. The constant $C$ is derived as follows. Multiplying both sides of Eq. F-1 by $Y_{z-1}$ and taking expectations yields

$$E(Y_zY_{z-1}) = C\,E(Y_{z-1}) + E\left(\rho_1 Y_{z-1}^2\right) + E(\varepsilon_z Y_{z-1}). \tag{F-2}$$

From the covariance between $Y_z$ and $Y_{z-1}$, $E(Y_zY_{z-1}) = \rho_1\sigma^2 + \mu^2$ and $E\left(\rho_1 Y_{z-1}^2\right) = \rho_1(\sigma^2 + \mu^2)$. $Y_{z-1}$ is a linear function of $\varepsilon_{z-1}, \varepsilon_{z-2}, \varepsilon_{z-3}, \ldots$, so $E(\varepsilon_z Y_{z-1}) = 0$. Hence, Eq. F-2 simplifies to

$$C = (1 - \rho_1)\mu. \tag{F-3}$$

To obtain $\mu_\varepsilon$, take expectations of both sides of Eq. F-1:

$$E(Y_z) = C + \rho_1 E(Y_{z-1}) + E(\varepsilon_z), \tag{F-4}$$

and then substitute Eq. F-3 into this equation. Consequently, $E(\varepsilon_z) = \mu_\varepsilon = 0$. Taking the variance of both sides of Eq. F-1 gives

$$Var(\varepsilon_z) = (1 - \rho_1^2)\,\sigma^2. \tag{F-5}$$

To derive the correlation coefficient function, multiply both sides of Eq. F-1 by $Y_{z-\tau}$ and take expectations. This gives

$$E(Y_zY_{z-\tau}) = C\,E(Y_{z-\tau}) + \rho_1 E(Y_{z-1}Y_{z-\tau}) + E(\varepsilon_z Y_{z-\tau}). \tag{F-6}$$

From the definition of covariance, $E(Y_zY_{z-\tau}) = \rho_\tau\sigma^2 + \mu^2$ and $E(Y_{z-1}Y_{z-\tau}) = \rho_{\tau-1}\sigma^2 + \mu^2$. As mentioned before, $Y_{z-\tau}$ is uncorrelated with $\varepsilon_z$, so $E(\varepsilon_z Y_{z-\tau}) = 0$. Therefore, $\rho_\tau = \rho_1\rho_{\tau-1}$, and then $\rho_\tau = \rho_1^2\rho_{\tau-2} = \rho_1^3\rho_{\tau-3} = \cdots = \rho_1^\tau\rho_0 = \rho_1^\tau$, since $\rho_0 = 1$. $\rho_\tau$ is an even function of $\tau$ when $Y_z$ is real-valued, so the correlation coefficients can be written as (Priestley 1981)

$$\rho_\tau = \rho_1^{|\tau|}, \qquad \tau = 0, \pm1, \pm2, \ldots \tag{F-7}$$

The AR(1) model considers only first-step dependency. Even so, Eq. F-7 implies that the correlation coefficients, $\rho_\tau$, do not become zero after $\tau = 1$; they approach zero gradually instead. The reason is that $Y_z$ is related to $Y_{z-1}$, and $Y_{z-1}$ is related to $Y_{z-2}$. Consequently, $Y_z$ is related to $Y_{z-2}$, and so on.

Appendix G : Moments of Discretization Methods for the Case of Dependent Random Variables

The expected value and variance of $x_{SR}$ and $x_{PT}$ depend on the statistical properties of the 5th, 10th, 50th, 90th, and 95th percentiles. The expected value and variance of these percentiles must therefore be derived analytically first. Suppose that $y = \ln(x) \sim N(m_y, s_y^2)$, so that $x_u = e^{m_y + w_u s_y}$, where $x_u$ is the uth percentile. By the properties of the log-normal distribution, the expected value of the percentile is

$$E(x_u) = E\left[e^{m_y + w_u s_y}\right], \tag{G-1}$$

and its variance is

$$Var(x_u) = E\left[e^{2m_y + 2w_u s_y}\right] - \left\{E\left[e^{m_y + w_u s_y}\right]\right\}^2, \tag{G-2}$$

where $w_u = \Phi^{-1}(u/100)$ and $\Phi$ denotes the cumulative standard normal distribution function. The analytical expressions for $E(x_u)$ and $Var(x_u)$ rely on the property of expectation that if two RVs $V_1$ and $V_2$ are independent, then $E(V_1V_2) = E(V_1)E(V_2)$.
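The correction factor $\beta$ used in this appendix can be checked directly, because for a stationary series $E(s_y^2) = \beta\sigma^2$. The sketch below (Python/NumPy; parameter values are hypothetical) assumes the AR(1) correlations $\rho_\tau = \rho_1^{|\tau|}$ of Appendix F. It simulates many series and compares their average sample variance with $\beta\sigma^2$.

```python
import numpy as np

def beta_factor(n, rho1):
    """beta = n/(n-1) * [1 - (1 + 2 sum_{tau=1}^{n-1} (1 - tau/n) rho_tau) / n],
    with AR(1) correlations rho_tau = rho1**tau; E(s_y^2) = beta * sigma^2."""
    tau = np.arange(1, n)
    inflation = 1 + 2 * np.sum((1 - tau / n) * rho1**tau)
    return float(n / (n - 1) * (1 - inflation / n))

rng = np.random.default_rng(5)
n, mu, sigma, reps = 30, 1.0, 0.8, 20_000
results = {}
for rho1 in (0.0, 0.3, 0.6):
    y = np.empty((reps, n))
    y[:, 0] = rng.normal(mu, sigma, size=reps)               # stationary start
    eps_sd = sigma * np.sqrt(1 - rho1**2)                    # Var(eps) = (1 - rho1^2) sigma^2
    for z in range(1, n):
        y[:, z] = (1 - rho1) * mu + rho1 * y[:, z - 1] + rng.normal(0.0, eps_sd, size=reps)
    mean_s2 = float(y.var(axis=1, ddof=1).mean())            # average (n-1)-divisor sample variance
    results[rho1] = (mean_s2 / sigma**2, beta_factor(n, rho1))
    print(f"rho1={rho1:.1f}  E(s^2)/sigma^2: simulated {results[rho1][0]:.4f}, beta {results[rho1][1]:.4f}")
```

Positive correlation makes $\beta < 1$: the $(n-1)$-divisor sample variance then underestimates $\sigma^2$ on average.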
As mentioned in Appendix B, the sample mean, $m_y$, and sample variance, $s_y^2$, are independent when the samples $y_1,\ldots,y_n$ are assumed to be identically distributed and normal. Thus

$$E(x_u) = E\left[e^{m_y}e^{w_u s_y}\right] = E\left(e^{m_y}\right)E\left(e^{w_u s_y}\right), \tag{G-3}$$

and

$$Var(x_u) = E\left(e^{2m_y}\right)E\left(e^{2w_u s_y}\right) - \left[E(x_u)\right]^2. \tag{G-4}$$

By the CLT, $m_y$ is normally distributed with mean $\mu$ and variance $\sigma^2/n'$, where $n' = n/\left[1 + 2\sum_{\tau=1}^{n-1}(1 - \tau/n)\rho_\tau\right]$ and $\rho_\tau$ is the correlation coefficient between pairs of $Y$'s separated by $\tau$. Therefore, by the properties of the log-normal distribution, $e^{m_y}$ has mean

$$E\left(e^{m_y}\right) = e^{\left(\mu + \frac{\sigma^2}{2n'}\right)}, \tag{G-5}$$

and variance

$$Var\left(e^{m_y}\right) = e^{\left(2\mu + \frac{\sigma^2}{n'}\right)}\left(e^{\frac{\sigma^2}{n'}} - 1\right). \tag{G-6}$$

Moreover, by the properties of the log-normal distribution, $E\left(e^{b\,m_y}\right) = e^{\left[b\mu + b^2\sigma^2/(2n')\right]}$, where $b$ is a constant coefficient. $E\left[e^{w_u s_y}\right]$ is derived by expanding $e^{w_u s_y}$ in a Taylor series about the expected value of the sample standard deviation, $E(s_y)$. The expansion is truncated after the fourth-order term, and its expectation is

$$E\left[e^{w_u s_y}\right] = e^{w_u E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_u^i}{i!}T_i\right], \tag{G-7}$$

where $T_i = E\left\{\left[s_y - E(s_y)\right]^i\right\}$. As mentioned before, Zieba (2010) derived an expression for the sample variance of auto-correlated samples (Eq. 6-7). Based on this expression, the variance of auto-correlated samples can alternatively be written as the variance of uncorrelated samples multiplied by a correction factor,

$$\beta = \frac{n}{n-1}\left[1 - \frac{1 + 2\sum_{\tau=1}^{n-1}(1 - \tau/n)\rho_\tau}{n}\right],$$

where $\beta$ approaches one for large $n$ (Zieba 2010). Consequently, a new ESS is defined as

$$n^* = \frac{n}{\beta\left\{1 + 2\sum_{\tau=1}^{n-1}\left(1 - \frac{\tau}{n}\right)\rho_\tau\right\}}, \tag{G-8}$$

and used instead of $n'$. The moments of $s_y$ are derived from Finney's (1941) results as

$$E\left(s_y^a\right) = \frac{\Gamma\left(\frac{n^* - 1}{2} + \frac{a}{2}\right)}{\Gamma\left(\frac{n^* - 1}{2}\right)}\left(\frac{2\sigma^2}{n^* - 1}\right)^{a/2}, \tag{G-9}$$

where $a$ is a constant value. Hence, the expected value of the uth percentile is

$$E(x_u) = e^{\left[\mu + \frac{\sigma^2}{2n^*}\right]}\,e^{w_u E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_u^i}{i!}T_i\right], \tag{G-10}$$

and its variance is

$$Var(x_u) = e^{\left[2\mu + \frac{2\sigma^2}{n^*}\right]}\,e^{2w_u E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(2w_u)^i}{i!}T_i\right] - \left[E(x_u)\right]^2. \tag{G-11}$$

The covariance between the uth and vth percentiles is

$$Cov(x_u, x_v) = E(x_ux_v) - E(x_u)E(x_v) = E\left(e^{2m_y}\right)E\left[e^{(w_u + w_v)s_y}\right] - E(x_u)E(x_v) = e^{\left[2\mu + \frac{2\sigma^2}{n^*}\right]}\,e^{(w_u + w_v)E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(w_u + w_v)^i}{i!}T_i\right] - E(x_u)E(x_v). \tag{G-12}$$

Substituting Eq. G-10 into Eq. A-9 gives $E(x_{SR})$:

$$E(x_{SR}) = e^{\left[\mu + \frac{\sigma^2}{2n^*}\right]}\left\{0.3\,e^{w_{10}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{10}^i}{i!}T_i\right] + 0.4 + 0.3\,e^{w_{90}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{90}^i}{i!}T_i\right]\right\}, \tag{G-13}$$

and, in the same way, $E(x_{PT})$:

$$E(x_{PT}) = e^{\left[\mu + \frac{\sigma^2}{2n^*}\right]}\left\{0.185\,e^{w_5E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_5^i}{i!}T_i\right] + 0.63 + 0.185\,e^{w_{95}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{95}^i}{i!}T_i\right]\right\}. \tag{G-14}$$

The variance of SR is

$$Var(x_{SR}) = e^{\left[2\mu + \frac{2\sigma^2}{n^*}\right]}\Big\{0.09\,e^{2w_{10}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(2w_{10})^i}{i!}T_i\right] + 0.24\,e^{w_{10}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{10}^i}{i!}T_i\right] + 0.16 + 0.09\,e^{2w_{90}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(2w_{90})^i}{i!}T_i\right] + 0.24\,e^{w_{90}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{90}^i}{i!}T_i\right] + 0.18\Big\} - \Big\{0.09\,E(x_{10})^2 + 0.16\,E(x_{50})^2 + 0.09\,E(x_{90})^2 + 0.24\,E(x_{10})E(x_{50}) + 0.24\,E(x_{50})E(x_{90}) + 0.18\,E(x_{10})E(x_{90})\Big\}, \tag{G-15}$$

and the variance of PT is

$$Var(x_{PT}) = e^{\left[2\mu + \frac{2\sigma^2}{n^*}\right]}\Big\{0.034225\,e^{2w_5E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(2w_5)^i}{i!}T_i\right] + 0.2331\,e^{w_5E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_5^i}{i!}T_i\right] + 0.3969 + 0.034225\,e^{2w_{95}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{(2w_{95})^i}{i!}T_i\right] + 0.2331\,e^{w_{95}E(s_y)}\left[1 + \sum_{i=2}^{4}\frac{w_{95}^i}{i!}T_i\right] + 0.06845\Big\} - \Big\{0.034225\,E(x_5)^2 + 0.3969\,E(x_{50})^2 + 0.034225\,E(x_{95})^2 + 0.2331\,E(x_5)E(x_{50}) + 0.2331\,E(x_{50})E(x_{95}) + 0.06845\,E(x_5)E(x_{95})\Big\}. \tag{G-16}$$

Appendix H : Moments of the Maximum Likelihood Estimator for Dependent Random Variables

MLE approximates the parameters of a population by maximizing the likelihood function. Consider any data set $x_1,\ldots,x_n$ taken from a log-normal population with log-mean $\mu$ and log-variance $\sigma^2$. For such a data set, MLE estimates the mean value as $x_{MLE} = \exp\left(m_y + s_y^2/2\right)$, where $y_i = \ln(x_i)$, $m_y = \frac{1}{n}\sum_{i=1}^{n} y_i$, and the sample variance is

$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[\ln(x_i) - m_y\right]^2. \tag{H-1}$$

The expected value and variance of $x_{MLE}$ are derived analytically from the expectation of the product of two independent random variables. As stated before, the sample mean, $m_y$, and sample variance, $s_y^2$, are independent. Thus

$$E(x_{MLE}) = E\left(e^{m_y + s_y^2/2}\right) = E\left(e^{m_y}\right)E\left(e^{s_y^2/2}\right). \tag{H-2}$$

By the CLT, $m_y$ is normally distributed with mean $\mu$ and variance $\sigma^2/n'$, where $n' = n/\left\{1 + 2\sum_{\tau=1}^{n-1}(1 - \tau/n)\rho_\tau\right\}$ and $\rho_\tau$ is the correlation coefficient between pairs of $Y$'s separated by $\tau$. Therefore, by the properties of the log-normal distribution, $e^{m_y}$ has mean $e^{\left[\mu + \sigma^2/(2n')\right]}$ and variance $e^{\left[2\mu + \sigma^2/n'\right]}\left(e^{\sigma^2/n'} - 1\right)$. Bayley and Hammersley (1946) introduced an effective sample size, $n_v^*$, derived from the variance of the sample variance, $Var(s_y^2)$. Using Finney's (1941) derivations with this sample size, the expectation of $e^{a s_y^2}$ can be written as

$$E\left(e^{a s_y^2}\right) = \left[1 - \frac{2a\sigma^2}{n_v^* - 1}\right]^{-\frac{n_v^* - 1}{2}}, \tag{H-3}$$

where $a$ is a constant coefficient.
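When the samples are uncorrelated, $n' = n_v^* = n$, and the results of this appendix reduce to Eqs. B-3 and B-5. The sketch below (Python/NumPy; parameter values are hypothetical) evaluates those i.i.d. expressions and compares them with a Monte Carlo simulation of $\exp(m_y + s_y^2/2)$. It also prints the true mean $e^{\mu+\sigma^2/2}$, so the small-sample bias predicted by Eq. B-3 is visible.

```python
import math
import numpy as np

def mle_moments(mu, sigma, n):
    """E(x_MLE) and Var(x_MLE) for i.i.d. log-normal samples (Eqs. B-3 and B-5)."""
    q1 = 1 - sigma**2 / (n - 1)
    q2 = 1 - 2 * sigma**2 / (n - 1)
    if q2 <= 0:
        raise ValueError("Var(x_MLE) is undefined unless 2*sigma**2 < n - 1")
    mean = math.exp(mu + sigma**2 / (2 * n)) * q1 ** (-(n - 1) / 2)
    var = math.exp(2 * mu + sigma**2 / n) * (math.exp(sigma**2 / n) * q2 ** (-(n - 1) / 2) - q1 ** (-(n - 1)))
    return mean, var

rng = np.random.default_rng(9)
mu, sigma, n, reps = 0.0, 1.0, 25, 200_000
y = rng.normal(mu, sigma, size=(reps, n))                       # y = ln(x)
x_mle = np.exp(y.mean(axis=1) + y.var(axis=1, ddof=1) / 2)
mean_th, var_th = mle_moments(mu, sigma, n)

print(f"E(x_MLE): Monte Carlo {x_mle.mean():.4f}, Eq. B-3 {mean_th:.4f}, "
      f"true mean {math.exp(mu + sigma**2 / 2):.4f}")
print(f"Var(x_MLE): Monte Carlo {x_mle.var():.4f}, Eq. B-5 {var_th:.4f}")
```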
Thus the expected value of $e^{s_y^2/2}$ is $E\left[e^{s_y^2/2}\right]=\left[1-\frac{\sigma^2}{n_v^*-1}\right]^{-(n_v^*-1)/2}$, and consequently the expected value of $x_{MLE}$ is

$$E(x_{MLE})=e^{\mu+\sigma^2/(2n')}\left(1-\frac{\sigma^2}{n_v^*-1}\right)^{-(n_v^*-1)/2}. \qquad \text{(H-4)}$$

The variance of $x_{MLE}$ can be written as

$$\operatorname{Var}\left[e^{m_y+s_y^2/2}\right]=E\left\{e^{2m_y+s_y^2}\right\}-\left\{E\left[e^{m_y+s_y^2/2}\right]\right\}^2. \qquad \text{(H-5)}$$

From the properties of the log-normal distribution, $E\left(e^{b\,m_y}\right)=e^{b\mu+b^2\sigma^2/(2n')}$, where $b$ is a constant coefficient. Eq. H-5 then simplifies to

$$\operatorname{Var}(x_{MLE})=e^{2\mu+\sigma^2/n'}\left\{e^{\sigma^2/n'}\left(1-\frac{2\sigma^2}{n_v^*-1}\right)^{-(n_v^*-1)/2}-\left(1-\frac{\sigma^2}{n_v^*-1}\right)^{-(n_v^*-1)}\right\}. \qquad \text{(H-6)}$$

References

Agterberg, F.P., (1974) “Geomathematics: Mathematical Background and GeoScience Applications” Elsevier Scientific Pub. Co., Amsterdam, New York, 569 p.
Arild, Ø., Lohne, H.P., Bratvold, R., (2008) “A Monte Carlo Approach to Value of Information” SPE IPTC-11969.
Atkinson, A.C., Pericchi, L.R., Smith, R.L., (1991) “Grouped Likelihood for the Shifted Power Transformation” Journal of the Royal Statistical Society, Series B, 53, No. 2, 473-482.
Bayley, G.V., Hammersley, J.M., (1946) “The Effective Number of Independent Observations in an Autocorrelated Time-Series” J. Roy. Stat. Soc. Suppl., 8, 184-197.
Behboodian, J., (1970) “On the Modes of a Mixture of Two Normal Distributions” Technometrics, 12, No. 1, 131-139.
Bennion, D.W., (1966) “A Stochastic Model for Predicting Variations in Reservoir Rock Properties” SPE Journal 1187-PA, 6, No. 1, 9-16.
Bickel, J.E., Lake, L.W., Lehman, J., (2011) “Discretization, Simulation, and Swanson’s (Inaccurate) Mean” SPE Economics & Management, 3, No. 3, 128-140.
Box, G.E.P., Cox, D.R., (1964) “An Analysis of Transformations” Journal of the Royal Statistical Society, Series B, 26, No. 2, 211-252.
Cartwright, L.G., (2007) “Applying Modern Portfolio Theory in the Upstream Oil and Gas Sector” Oil and Gas Financial Journal.
DeGroot, M.H., (1989) “Probability and Statistics” Addison-Wesley Publishing, Reading, Massachusetts, 2nd ed., 723 p.
Delfiner, P., (2007) “Three Statistical Pitfalls of Phi-K Transform” SPE Reservoir Evaluation & Engineering, 10, No. 6, 609-617.
Dykstra, H., Parsons, R.L., (1950) “The Prediction of Oil Recovery by Waterflood” Secondary Recovery of Oil in the United States, New York, American Petroleum Institute, 2nd ed., 160-174.
Efron, B., Tibshirani, R.J., (1993) “An Introduction to the Bootstrap” Chapman and Hall, New York, 436 p.
Eisenberger, I., (1964) “Genesis of Bimodal Distributions” Technometrics, 6, No. 4, 357-364.
Emerson, J.D., Stoto, M.A., (1982) “Exploratory Methods for Choosing Power Transformations” Journal of the American Statistical Association, 77, No. 377, 103-108.
Finney, D.J., (1941) “On the Distribution of a Variate Whose Logarithm is Normally Distributed” Supplement to the Journal of the Royal Statistical Society, 7, No. 2, 155-161.
Freeman, J., Modarres, R., (2006) “Inverse Box–Cox: The Power-Normal Distribution” Statistics & Probability Letters, 76, No. 8, 764-772.
Gnanadesikan, R., (1977) “Methods for Statistical Data Analysis of Multivariate Observations” Wiley, New York, 368 p.
Hinkley, D.V., (1975) “On Power Transformations to Symmetry” Biometrika, 62, No. 1, 101-111.
Hurst, A., Brown, G.C., Swanson, R.I., (2000) “Swanson’s 30-40-30 Rule” AAPG Bulletin, 84, No. 12, 1883-1891.
Jensen, J.L., (1998) “Some Statistical Properties of Power Averages for Lognormal Samples” Water Resources Research, 34, No. 9, 2415-2418.
Jensen, J.L., Hinkley, D.V., Lake, L.W., (1987) “A Statistical Study of Reservoir Permeability: Distributions, Correlations, and Averages” SPE Formation Evaluation 14270-PA, 2, No. 4, 461-468.
Jensen, J.L., Lake, L.W., Corbett, P.W.M., Goggin, D.J., (2000) “Statistics for Petroleum Engineers and Geoscientists” Elsevier, Amsterdam, New York, 2nd ed., 338 p.
Kaufman, G.M., (1965) “Statistical Analysis of the Size Distribution of Oil and Gas Fields” SPE 1096-MS.
Keefer, D.L., (1994) “Certainty Equivalent for Three-Point Discrete-Distribution Approximations” Management Science, 40, No. 6, 760-773.
Keefer, D.L., Bodily, S.E., (1983) “Three-Point Approximations for Continuous Random Variables” Management Science, 29, No. 5, 595-609.
Kendall, M., Stuart, A., (1977) “The Advanced Theory of Statistics” Macmillan Publishing Company, New York City, 2, 748 p.
Kenney, J.F., Keeping, E.S., (1951) “Mathematics of Statistics” D. Van Nostrand, Princeton, Pt. 2, 2nd ed., 429 p.
Laherrère, J., Sornette, D., (1998) “Stretched Exponential Distributions in Nature and Economy: ‘Fat Tails’ with Characteristic Scales” The European Physical Journal B, 2, 525-539.
Lambert, M.E., (1981) “A Statistical Study of Reservoir Heterogeneity” MS Thesis, University of Texas, Austin, TX.
Law, J., (1944) “Statistical Approach to the Interstitial Heterogeneity of Sand Reservoirs” Transactions of the AIME, 155, No. 1, 202-222.
Lindgren, B.W., (1968) “Statistical Theory” Macmillan Company, London, 2, 521 p.
MacCrossan, R.G., (1969) “An Analysis of Size Frequency Distribution of Oil and Gas Reserves of Western Canada” Canadian Journal of Earth Sciences, 6, No. 2, 201-211.
Megill, R.E., (1984) “An Introduction to Risk Analysis” PennWell Publishing Company, Tulsa, Oklahoma, 2nd ed., 274 p.
Miller, A.C., Rice, T.R., (1983) “Discrete Approximation of Probability Distributions” Management Science, 29, No. 3, 352-362.
Ord, K., Stuart, A., (1987) “Kendall's Advanced Theory of Statistics, Distribution Theory” Oxford University Press, New York City, 1, 604 p.
Pearson, E.S., Tukey, J.W., (1965) “Approximate Means and Standard Deviations Based on Distances Between Percentage Points of Frequency Curves” Biometrika, 52, No. 3-4, 533-546.
Pintos, S., Bohorquez, C., Queipo, N.V., (2011) “Asymptotic Dykstra–Parsons Distribution, Estimates and Confidence Intervals” Mathematical Geosciences, 43, No. 3, 329-343.
Priestley, M.B., (1981) “Spectral Analysis and Time Series” Academic Press, London, New York, 890 p.
Quenouille, M.H., (1956) “Note on Bias in Estimation” Biometrika, 43, No. 3-4, 353-360.
Rice, J.A., (2007) “Mathematical Statistics and Data Analysis” the University of California, Berkeley, 3rd ed., 603 p.
Robertson, C.A., Fryer, J.G., (1969) “Some Descriptive Properties of Normal Mixtures” Scandinavian Actuarial Journal, 1969, No. 3-4, 137-146.
Rollins, J.B., Holditch, S.A., Lee, W.J., (1992) “Characterizing Average Permeability in Oil and Gas Formations” SPE Formation Evaluation 19793-PA, 7, No. 1, 99-105.
Rose, P.R., (2001) “Risk Analysis and Management of Petroleum Exploration Ventures” American Association of Petroleum Geologists, Tulsa, Oklahoma, 164 p.
Schilling, M.F., Watkins, A.E., Watkins, W., (2002) “Is Human Height Bimodal?” The American Statistician, 56, No. 3, 223-229.
Seidle, J.P., O’Connor, L.S., (2003) “Production Based Probabilistic Economics for Unconventional Gas” SPE 82024-MS.
Seyedghasemipour, S.J., Bhattacharyya, B.B., (1990) “The Log-hyperbolic an Alternative to the Lognormal” Mathematical Geology, 22, No. 5, 557-571.
Steel, R.G.D., Torrie, J.H., (1980) “Principles and Procedures of Statistics: A Biometrical Approach” McGraw-Hill, New York, 2nd ed., 666 p.
Thiebaux, H.J., Zwiers, F.W., (1984) “The Interpretation and Estimation of Effective Sample Size” Journal of Applied Meteorology, 23, No. 5, 800-811.
Vanmarcke, E., (2010) “Random Fields: Analysis and Synthesis” World Scientific Publishing Company Pte. Ltd, 350 p.
Vicens, G., Schaake, J.C., Jr., (1972) “Simulation Criteria for Selecting Water Resources System Alternatives” Report No. 154, Ralph M. Parsons Laboratory for Water Resources and Hydrodynamics, Department of Civil Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts.
Willhite, G.P., (1986) “Waterflooding” Society of Petroleum Engineers, 326 p.
Zięba, A., (2010) “Effective Number of Observations and Unbiased Estimators of Variance for Autocorrelated Data: An Overview” Metrology and Measurement Systems, 17, No. 1, 3-16.