Simulation Modeling and Analysis
Input Modeling

Outline
• Introduction
• Data Collection
• Matching Distributions with Data
• Parameter Estimation
• Goodness of Fit Testing
• Input Models without Data
• Multivariate and Time Series Input Models

Introduction
• Steps in Developing Input Data Model
  – Data collection from the real system
  – Identification of a probability distribution representing the data
  – Selection of distribution parameters
  – Goodness of fit testing

Data Collection
• Useful Suggestions
  – Plan, practice, preobserve
  – Analyze data as it is collected
  – Combine homogeneous data sets
  – Watch out for censoring
  – Build scatter diagrams
  – Check for autocorrelation

Identifying the Distribution
• Construction of Histograms
  – Divide range of data into equal subintervals
  – Label horizontal and vertical axes appropriately
  – Determine frequency of occurrences within each subinterval
  – Plot frequencies

Physical Basis of Common Distributions
• Binomial: Number of successes in n independent trials, each with success probability p.
• Negative Binomial (Geometric when k = 1): Number of trials required to achieve k successes.
• Poisson: Number of independent events occurring in a fixed amount of time or space (Time between events is Exponential).

Physical Basis of Common Distributions - contd
• Normal: Processes which are the sum of component processes.
• Lognormal: Processes which are the product of component processes.
• Exponential: Times between independent events (Number of events is Poisson).
• Gamma: Many applications. Non-negative random variables only.

Physical Basis of Common Distributions - contd
• Beta: Many applications. Bounded random variables only.
• Erlang: Processes which are the sum of several exponential component processes.
• Weibull: Time to failure.
• Uniform: Complete uncertainty.
• Triangular: When only minimum, most likely and maximum values are known.
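The Poisson/Exponential link noted in the physical-basis slides can be checked numerically. The sketch below is only an illustration: the function name, the rate of 3.0 and the seed are assumptions, not part of the slides. It generates exponential times between events and counts the events in each unit interval; for a Poisson count, the mean and the variance should both be close to the rate.

```python
import random

def count_events_per_interval(rate, n_intervals, seed=0):
    """Simulate arrivals with Exponential(rate) gaps and count arrivals per unit interval."""
    rng = random.Random(seed)
    counts = [0] * n_intervals
    t = rng.expovariate(rate)           # time of the first event
    while t < n_intervals:
        counts[int(t)] += 1             # event falls in interval [int(t), int(t) + 1)
        t += rng.expovariate(rate)      # add the next exponential gap
    return counts

counts = count_events_per_interval(rate=3.0, n_intervals=10_000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean count = {mean:.3f}, variance = {var:.3f}")  # both should be near the rate
```

A mean and variance that disagree strongly in real data would suggest that the Poisson assumption (and hence exponential interarrival times) is questionable.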
Quantile-Quantile Plots
• If X is a RV with cdf F, the q-quantile of X is the value $\gamma$ such that $F(\gamma) = P(X \le \gamma) = q$
• Raw data $\{x_i\}$, $i = 1, \dots, n$
• Data rearranged by magnitude, smallest to largest: $\{y_j\}$
• Then $y_j$ is an estimate of the $(j - 1/2)/n$ quantile of X, i.e. $y_j \approx F^{-1}[(j - 1/2)/n]$

Quantile-Quantile Plots - contd
• If F is a member of an appropriate family then a plot of $y_j$ vs. $F^{-1}[(j - 1/2)/n]$ is approximately a straight line.
• If F also has the appropriate parameter values the line has slope 1.

Parameter Estimation
• Once a distribution family has been determined, its parameters must be estimated.
• Sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and sample standard deviation $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$ are the basic statistics.

Parameter Estimation - contd
• Suggested Estimators
  – Poisson (mean $\alpha$): $\hat{\alpha} = \bar{X}$
  – Exponential (rate $\lambda$): $\hat{\lambda} = 1/\bar{X}$
  – Uniform (on [0,b]): $\hat{b} = (n+1)\max(X)/n$
  – Normal: $\hat{\mu} = \bar{X}$; $\hat{\sigma}^2 = S^2$

Goodness of Fit Tests
• Test the hypothesis that a random sample of size n of the random variable X follows a specific distribution.
  – Chi-Square Test (large n; continuous and discrete distributions)
  – Kolmogorov-Smirnov Test (small n; continuous distributions only)

Chi-Square Test
• Statistic: $\chi_0^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i$
• Follows approximately the chi-square distribution with $k - s - 1$ degrees of freedom (s = number of parameters of the hypothesized distribution estimated from the data)
• Here $E_i = n p_i$ is the expected frequency in cell i ($p_i$ is the hypothesized probability of cell i) while $O_i$ is the observed frequency.

Chi-Square Test - contd
• Steps
  – Arrange the n observations into k cells
  – Compute the statistic $\chi_0^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i$
  – Find the critical value of $\chi^2$ with $k - s - 1$ degrees of freedom (Handout)
  – Reject the null hypothesis if $\chi_0^2$ exceeds the critical value; otherwise do not reject it
• Example: Stat::Fit

Chi-Square Test - contd
• If the test involves a discrete distribution, each value of the RV should be its own class interval, unless adjacent values must be combined to reach the minimum expected frequency.
• If the test involves a continuous distribution, class intervals should be selected which are equal in probability rather than width.

Chi-Square Test - contd
• Example: Exponential distribution.
• Example: Weibull distribution.
• Example: Normal distribution.
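As a rough sketch of the chi-square steps for the exponential case, the code below estimates the rate as 1/mean, builds k cells of equal probability (not equal width), and computes the statistic. The function name, the sample size, the seed and k = 8 are illustrative assumptions; the critical value 12.59 is the standard table entry for a 0.05 significance level with k - s - 1 = 8 - 1 - 1 = 6 degrees of freedom.

```python
import math
import random

def chi_square_exponential(data, k):
    """Chi-square statistic for an exponential fit using k equal-probability cells."""
    n = len(data)
    lam_hat = n / sum(data)             # estimated rate = 1 / sample mean
    # Cell boundaries a_i satisfy F(a_i) = i/k, i.e. a_i = -ln(1 - i/k) / lam_hat
    edges = [-math.log(1 - i / k) / lam_hat for i in range(1, k)]
    observed = [0] * k
    for x in data:
        cell = sum(1 for a in edges if x >= a)   # number of boundaries at or below x
        observed[cell] += 1
    expected = n / k                    # E_i = n * p_i with p_i = 1/k for every cell
    return sum((o - expected) ** 2 / expected for o in observed)

rng = random.Random(42)
sample = [rng.expovariate(0.5) for _ in range(200)]   # synthetic data, true rate 0.5
k = 8
chi2_0 = chi_square_exponential(sample, k)
critical = 12.59   # chi-square table value, alpha = 0.05, 6 degrees of freedom
print(f"chi2_0 = {chi2_0:.2f}; reject H0: {chi2_0 > critical}")
```

With equal-probability cells every expected frequency is n/k, so the only design choice is k; it should be small enough that n/k stays comfortably above the usual minimum expected frequency.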
Kolmogorov-Smirnov Test
• Identify the maximum absolute difference D between the empirical cdf of a random sample and the cdf of a specified theoretical distribution.
• Compare against the critical value of D (Handout).
• Reject H0 if D exceeds the critical value; otherwise do not reject it.
• Example.

Input Models without Data
• When hard data are not available, use:
  – Engineering data (specs)
  – Expert opinion
  – Physical and/or conventional limitations
  – Information on the nature of the process
  – Uniform, triangular or beta distributions
• Check sensitivity!

Multivariate and Time-Series Input Models
• If input variables are not independent, their relationship must be taken into consideration (multivariate input model).
• If input variables constitute a sequence (in time) of related random variables, their relationship must be taken into account (time-series input model).

Covariance and Correlation
• Measure the linear dependence between two random variables $X_1$ (mean $\mu_1$, std dev $\sigma_1$) and $X_2$ (mean $\mu_2$, std dev $\sigma_2$):
  $X_1 - \mu_1 = \beta (X_2 - \mu_2) + \epsilon$
• Covariance: $\mathrm{cov}(X_1, X_2) = E(X_1 X_2) - \mu_1 \mu_2$
• Correlation: $\rho = \mathrm{cov}(X_1, X_2) / (\sigma_1 \sigma_2)$

Multivariate Input Models
• If $X_1$ and $X_2$ are normally distributed and interrelated, they can be modeled by a bivariate normal distribution
• Steps
  – Generate $Z_1$ and $Z_2$, independent standard normal RVs
  – Set $X_1 = \mu_1 + \sigma_1 Z_1$
  – Set $X_2 = \mu_2 + \sigma_2 (\rho Z_1 + (1 - \rho^2)^{1/2} Z_2)$

Time-Series Input Models
• Let $X_1, X_2, X_3, \dots$ be a sequence of identically distributed and covariance-stationary RVs. The lag-h correlation is $\rho_h = \mathrm{corr}(X_t, X_{t+h}) = \rho^h$
• If all $X_t$ are normal: AR(1) model.
• If all $X_t$ are exponential: EAR(1) model.

AR(1) model
• For a time series model
  $X_t = \mu + \phi (X_{t-1} - \mu) + \epsilon_t$
  where $\epsilon_t$ are independent normal with mean = 0 and var = $\sigma_\epsilon^2$, and $-1 < \phi < 1$

AR(1) model - contd
1.- Generate $X_1$ from a normal with mean $\mu$ and variance $\sigma_\epsilon^2 / (1 - \phi^2)$. Set t = 2.
2.- Generate $\epsilon_t$ from a normal with mean = 0 and variance $\sigma_\epsilon^2$.
3.- Set $X_t = \mu + \phi (X_{t-1} - \mu) + \epsilon_t$
4.- Set t = t+1 and go to 2.
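The four AR(1) steps can be sketched as follows. The names `mu`, `phi` and `sigma_eps` (process mean, lag-1 correlation and standard deviation of the noise term) are assumed labels for the model parameters, and the numeric values and seed are arbitrary illustrations.

```python
import math
import random

def generate_ar1(mu, phi, sigma_eps, n, seed=0):
    """Generate n values of a stationary AR(1) series."""
    if not -1 < phi < 1:
        raise ValueError("phi must satisfy -1 < phi < 1 for stationarity")
    rng = random.Random(seed)
    # Step 1: X_1 ~ Normal(mu, sigma_eps^2 / (1 - phi^2)); gauss takes a std dev
    x = [rng.gauss(mu, sigma_eps / math.sqrt(1 - phi ** 2))]
    for _ in range(1, n):
        eps = rng.gauss(0.0, sigma_eps)            # Step 2: noise term
        x.append(mu + phi * (x[-1] - mu) + eps)    # Step 3: recursion
    return x                                       # Step 4 is the loop itself

def lag1_correlation(x):
    """Sample lag-1 autocorrelation of a sequence."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

series = generate_ar1(mu=10.0, phi=0.7, sigma_eps=1.0, n=20_000)
print(f"sample mean = {sum(series) / len(series):.3f}")   # should be near mu
print(f"lag-1 correlation = {lag1_correlation(series):.3f}")  # should be near phi
```

Drawing the first value from the stationary distribution, rather than starting at an arbitrary point, means no warm-up observations need to be discarded.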
EAR(1) model
• For a time series model
  $X_t = \phi X_{t-1}$ with prob $\phi$
  $X_t = \phi X_{t-1} + \epsilon_t$ with prob $1 - \phi$
  where $\epsilon_t$ are exponential with mean = $1/\lambda$ and $0 \le \phi < 1$

EAR(1) model - contd
1.- Generate $X_1$ from an exponential with mean $1/\lambda$. Set t = 2.
2.- Generate U from a uniform on [0,1]. If $U \le \phi$ set $X_t = \phi X_{t-1}$. Otherwise generate $\epsilon_t$ from an exponential with mean $1/\lambda$ and set $X_t = \phi X_{t-1} + \epsilon_t$
3.- Set t = t+1 and go to 2.
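A minimal sketch of the EAR(1) generation steps is given below. The names `lam` (rate of the exponential, so the mean is 1/lam) and `phi` (lag-1 correlation, between 0 and 1) are assumed labels, and starting X_1 from an exponential with mean 1/lam is an assumption chosen so that the series begins in its stationary distribution. The numeric values and seed are arbitrary.

```python
import random

def generate_ear1(lam, phi, n, seed=0):
    """Generate n values of an EAR(1) series with exponential marginals of mean 1/lam."""
    if lam <= 0 or not 0 <= phi < 1:
        raise ValueError("need lam > 0 and 0 <= phi < 1")
    rng = random.Random(seed)
    x = [rng.expovariate(lam)]                  # Step 1: X_1 ~ exponential, mean 1/lam
    for _ in range(1, n):
        u = rng.random()                        # Step 2: U ~ uniform on [0, 1)
        if u <= phi:
            x.append(phi * x[-1])               # with probability phi: no new term
        else:
            x.append(phi * x[-1] + rng.expovariate(lam))  # otherwise add exponential term
    return x

series_ear = generate_ear1(lam=2.0, phi=0.6, n=20_000)
mean_ear = sum(series_ear) / len(series_ear)
print(f"sample mean = {mean_ear:.3f}")   # should be near 1/lam = 0.5
```

Unlike AR(1), the EAR(1) series can never go negative, which makes it a candidate for correlated interarrival or service times.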