Lectures 13 and 14.pptx

Selecting Input Probability
Distributions
Techniques for Assessing Sample
Independence - 1
• Most methods to specify input distributions assume observed
data X1, X2, ..., Xn are an independent (random) sample from
some underlying distribution
– If not, most methods are invalid
– Need a way to check data empirically for independence
– Heuristic plots vs. formal statistical tests for independence
Techniques for Assessing Sample
Independence - 2
• Correlation plot: If data are observed in a time sequence,
compute sample correlation with lag j and plot it
n j
ˆ j 
 X
i 1
i

 X ( n ) X i  j  X ( n)

 n  j  S 2 ( n)
– If data are independent then the correlations should be near
zero for all lags
– Keep in mind that these are just estimates
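As a quick check, the lag-j estimator above is easy to compute directly. A minimal sketch (assuming NumPy is available; the function name is illustrative):

```python
import numpy as np

def sample_autocorrelation(x, max_lag):
    """Estimate rho_j for j = 1..max_lag using the estimator above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    s2 = x.var(ddof=1)          # sample variance S^2(n)
    rhos = []
    for j in range(1, max_lag + 1):
        num = np.sum((x[:n - j] - xbar) * (x[j:] - xbar))
        rhos.append(num / ((n - j) * s2))
    return np.array(rhos)

# Example: independent expo(1) draws should give correlations near zero.
rng = np.random.default_rng(42)
data = rng.exponential(scale=1.0, size=500)
print(sample_autocorrelation(data, max_lag=10))
```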
Techniques for Assessing Sample
Independence - 3
• Scatter diagram: Plot pairs (Xi, Xi+1)
– If data are independent the pairs should be scattered
randomly
– If data are positively (negatively) correlated the pairs will lie
along a positively (negatively) sloping line
Techniques for Assessing Sample
Independence
• Independent draws from expo(1) distribution (independent by
construction)
Techniques for Assessing Sample
Independence
• Delays in queue from M/M/1 queue with utilization factor ρ = 0.8
(positively correlated)
• There are also formal statistical testing procedures, like the “runs”
test
Activity I: Hypothesizing Families of
Distributions - 1
• First, need to decide what form or family to use—exponential,
gamma, or what?
• Later, need to estimate parameters and assess goodness of fit
• Sometimes prior knowledge of a certain random variable’s role in
simulation may be useful:
– Requires no data
– Use theoretical knowledge of random variable’s role in
simulation
Activity I: Hypothesizing Families of
Distributions - 2
• Sometimes prior knowledge of a certain random variable’s
role in simulation may be useful:
– Seldom have enough prior knowledge to specify a distribution
completely. Exceptions are:
• Arrivals one-at-a-time, constant mean rate, independent:
exponential interarrival times or Poisson arrival process
• Sum of many independent pieces: normal
• Product of many independent pieces: lognormal
• Percentage/probability: Beta (range is [0, 1])
– Often use prior knowledge to rule out distributions on basis of
range
• Service times: not normal (the normal distribution's range always includes negative values)
– Still should be supported by data (e.g., for parameter-value
estimation)
Summary Statistics
• Compare simple sample statistics with theoretical population
versions for some distributions to get a hint
– Bear in mind that we get only estimates subject to uncertainty
• If the sample mean $\bar{X}(n)$ and sample median $\hat{x}_{0.5}(n)$ are close, this suggests a symmetric distribution
• Coefficient of variation of a distribution: $cv = \sigma/\mu$; estimate via $\widehat{cv} = S(n)/\bar{X}(n)$; sometimes useful for discriminating among continuous distributions
– cv < 1 suggests gamma or Weibull with shape parameter α > 1
– cv = 1 suggests exponential
– cv > 1 suggests gamma or Weibull with shape parameter α < 1
Summary Statistics
• Lexis ratio of a distribution: $\tau = \sigma^2/\mu$; estimate via $\hat{\tau} = S^2(n)/\bar{X}(n)$; sometimes useful for discriminating among discrete distributions:
– $\tau$ < 1 suggests binomial
– $\tau$ = 1 suggests Poisson
– $\tau$ > 1 suggests negative binomial or geometric
• Other summary statistics: range, skewness, kurtosis
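A minimal sketch of computing these summary statistics (assuming NumPy and SciPy; the function name is illustrative):

```python
import numpy as np
from scipy import stats

def summary_statistics(x):
    """Sample statistics used to hypothesize a distribution family."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    median = np.median(x)
    s2 = x.var(ddof=1)                    # sample variance S^2(n)
    cv = np.sqrt(s2) / mean               # coefficient-of-variation estimate
    lexis = s2 / mean                     # Lexis-ratio estimate (discrete data)
    skew = stats.skew(x, bias=False)
    return {"mean": mean, "median": median, "cv": cv,
            "lexis": lexis, "skewness": skew}

# Example: exponential data should give a cv estimate close to 1.
rng = np.random.default_rng(1)
print(summary_statistics(rng.exponential(scale=0.4, size=219)))
```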
Histograms
Continuous Data Set
• Break the range of the data into k intervals of width $\Delta b$ each
– k and $\Delta b$ are determined basically by trial and error
– One rule of thumb: Sturges's rule
$$k \approx 1 + \log_2 n = 1 + 3.322 \log_{10} n$$
– Compute the proportion $h_j$ of the data falling in the jth interval; plot a constant of height $h_j$ above the jth interval
• $h_j$ is basically an unbiased estimate of $\Delta b\, f(x)$ on the jth interval, where f(x) is the true (unknown) underlying density of the observed data and $\Delta b$ is a constant
– Shape of the plot should resemble the density of the underlying distribution; compare the shape of the histogram to density shapes.
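A sketch of a histogram routine using Sturges's rule for k (assuming NumPy and Matplotlib; names illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_histogram(x):
    """Histogram with the number of intervals chosen by Sturges's rule."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = int(np.ceil(1 + 3.322 * np.log10(n)))    # Sturges's rule
    counts, edges = np.histogram(x, bins=k)      # counts per interval
    h = counts / n                               # proportions h_j
    plt.bar(edges[:-1], h, width=np.diff(edges), align="edge",
            edgecolor="black")
    plt.xlabel("x")
    plt.ylabel("proportion h_j")
    plt.show()

rng = np.random.default_rng(2)
plot_histogram(rng.exponential(scale=0.4, size=219))
```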
Histograms
Discrete Data Set
• Basically an unbiased estimate of the (unknown) underlying probability
mass function of the data
• For each possible value xj that can be assumed by the data, let hj be
the proportion of the data that are equal to xj; plot a bar of height hj
above xj
• Shape of plot should resemble mass function of underlying distribution;
compare shape of histogram to mass-function shapes in Sec. 6.2.3
Multimodal Data
• Histogram might have multiple local modes, rather than just one; no
single “standard” distribution adequately represents it
• Reason: data can be separated on some context-dependent basis (e.g.,
observed machine downtimes are classified as minor vs. major)
• Separate data on this basis, fit separately, recombine as a mixture
Quantile Summaries
• Numerical synopsis of sample quantiles useful for detecting whether
underlying density or mass function is symmetric or skewed one way
or the other
• Definition of quantiles: Suppose the CDF F(x) is continuous and
strictly increasing whenever 0 < F(x) < 1, and let q be strictly
between 0 and 1. Then the q-quantile of F(x) is the number $x_q$ such that $F(x_q) = q$. If $F^{-1}$ is the inverse of F, then $x_q = F^{-1}(q)$
– q = 0.5: median
– q = 0.25 or 0.75: quartiles
– q = 0.125, 0.875: octiles
– q = 0, 1: extremes
Quantile Summaries
• Quantile summary: List median, average of quartiles,
average of octiles, and average of extremes
• If distribution is symmetric, then
median ≈ average of quartiles ≈ average of octiles ≈ average of extremes
• If distribution is skewed right, then
median < average of quartiles < average of octiles <
average of extremes
• If distribution is skewed left, then
median > average of quartiles > average of octiles >
average of extremes
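A sketch of the quantile summary (assuming NumPy; the dictionary keys are illustrative):

```python
import numpy as np

def quantile_summary(x):
    """Median and averaged quartiles/octiles/extremes for a symmetry check."""
    x = np.asarray(x, dtype=float)
    q = np.quantile
    return {
        "median": q(x, 0.5),
        "avg_quartiles": (q(x, 0.25) + q(x, 0.75)) / 2,
        "avg_octiles": (q(x, 0.125) + q(x, 0.875)) / 2,
        "avg_extremes": (x.min() + x.max()) / 2,
    }

# Right-skewed exponential data: the four values should be increasing.
rng = np.random.default_rng(3)
print(quantile_summary(rng.exponential(scale=0.4, size=219)))
```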
Box Plots
• Graphical display of quantile summary
• On horizontal axis, plot median, extremes, octiles, and a box
ending at quartiles
• Symmetry or asymmetry of plot indicates symmetry or
skewness of distribution
Hypothesizing a Family of Distributions:
Example with Continuous Data
• Sample of n = 219 interarrival times of cars to a drive-up bank over a
90-minute peak-load period are obtained
– Number of cars arriving in each of the six 15-minute periods was
approximately equal (suggesting stationarity of arrival rate)
– Sample mean = 0.399 (all times in minutes) > median = 0.270,
skewness = +1.458 (suggesting right skewness)
– cv = 0.953, close to 1 (suggesting exponential distribution)
• Histograms (for different choices of interval width Db) suggest the
exponential distribution
Hypothesizing a Family of Distributions:
Example with Discrete Data
• Sample of n = 156 observations on number of items demanded per
week from an inventory over a three-year period are obtained
– Range 0 through 11
– Sample mean = 1.891 > median = 1.00, skewness = +1.655
(suggesting right skewness)
– Lexis ratio = 5.285/1.891 = 2.795 > 1 (suggesting negative
binomial or geometric (special case of negative binomial)
distribution)
• Histogram suggests the geometric distribution
Activity II: Estimation of Parameters
• Have: Hypothesized parametric distribution
• Need: Numerical estimates of its parameter(s)—this constitutes the “fit”
• Many methods to estimate distribution parameters
– Method of moments
– Unbiased estimates
– Least squares estimation
– Maximum likelihood estimation (MLE)
• In some sense, MLE is the preferred method for our purposes
– Good statistical properties
– Somewhat justifies chi-square goodness-of-fit test
– Intuitive
– Allows estimates of error in the parameters—sensitivity analysis
Activity II: Estimation of Parameters
• Idea for MLEs:
– Have observed sample X1, X2, ..., Xn
– Came from some true (unknown) parameter(s) of the distribution form
– Pick the parameter(s) that make it most likely that you would get what
you did get (or close to what you got in the continuous case)
– An optimization (mathematical-programming) problem, often messy
Desirable properties of MLEs
• Invariance principle: Let $\hat{\theta}_1, \ldots, \hat{\theta}_m$ be the MLEs of the parameters $\theta_1, \ldots, \theta_m$. Then the MLE of any function $h(\theta_1, \ldots, \theta_m)$ of these parameters is the function $h(\hat{\theta}_1, \ldots, \hat{\theta}_m)$ of the MLEs
• Asymptotic behavior of MLEs: when the sample size is large,
– the MLE is approximately the minimum-variance unbiased estimator (MVUE) of θ
– the MLE is approximately normally distributed
MLEs for Discrete Distributions
• Have hypothesized family with PMF pθ (xj) = Pθ(X = xj)
• Single (for now) unknown parameter θ to be estimated
• For any trial value of θ, the probability of getting the already-observed
sample is
• P{getting $X_1, X_2, \ldots, X_n$} $= P(X = X_1)\, P(X = X_2) \cdots P(X = X_n) = p_\theta(X_1)\, p_\theta(X_2) \cdots p_\theta(X_n) = L(\theta)$, the likelihood function
• Task: Find the value of θ that makes L(θ) as big as it can be
• How? Differential calculus, take logarithm (turns products into sums),
nonlinear programming methods, tabled values in the book
• MLEs for continuous distributions are obtained by replacing the PMF
with the PDF
• MLEs for multiparameter distributions are obtained similarly where the
likelihood function is a function of all of the parameters
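When no closed-form MLE exists, the log-likelihood can be maximized numerically. A minimal sketch (assuming SciPy) for the exponential case, where the known answer $\hat{\beta} = \bar{X}(n)$ serves as a check; names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def exponential_mle(x):
    """Maximize l(beta) = -n ln(beta) - sum(x)/beta numerically."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    neg_loglik = lambda beta: n * np.log(beta) + x.sum() / beta
    res = minimize_scalar(neg_loglik, bounds=(1e-9, 10 * x.mean()),
                          method="bounded")
    return res.x

rng = np.random.default_rng(4)
data = rng.exponential(scale=0.399, size=219)
print(exponential_mle(data), data.mean())   # both approximately 0.399
```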
Example of Continuous MLE
• $\{X_i \mid i = 1, \ldots, 219\}$: a random sample with $\bar{X}(219) = 0.399$
• Hypothesized distribution is the exponential distribution
$$f_\beta(x) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta} & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$
• Likelihood function is
$$L(\beta) = \left( \frac{1}{\beta} e^{-X_1/\beta} \right) \left( \frac{1}{\beta} e^{-X_2/\beta} \right) \cdots \left( \frac{1}{\beta} e^{-X_n/\beta} \right) = \beta^{-n} \exp\!\left( -\frac{1}{\beta} \sum_{i=1}^{n} X_i \right)$$
• Want the value of β that maximizes L(β) over all β > 0
• Equivalent (and easier) to maximize the log-likelihood function l(β) = ln L(β), since ln is a monotonically increasing function
Example of Continuous MLE
$$l(\beta) = -n \ln \beta - \frac{1}{\beta} \sum_{i=1}^{n} X_i$$
$$\frac{dl(\beta)}{d\beta} = -\frac{n}{\beta} + \frac{1}{\beta^2} \sum_{i=1}^{n} X_i = 0 \;\Rightarrow\; \hat{\beta} = \frac{\sum_{i=1}^{n} X_i}{n} = \bar{X}(n)$$
$$\frac{d^2 l(\beta)}{d\beta^2} = \frac{n}{\beta^2} - \frac{2}{\beta^3} \sum_{i=1}^{n} X_i \;\Rightarrow\; \left. \frac{d^2 l}{d\beta^2} \right|_{\hat{\beta}} = \frac{n}{\bar{X}(n)^2} - \frac{2 n \bar{X}(n)}{\bar{X}(n)^3} = -\frac{n}{\bar{X}(n)^2} < 0$$
so the critical point is a maximum
$$\hat{\beta} = \bar{X}(n) = 0.399 \text{ from the observed sample of } n = 219 \text{ points}$$
Example of Discrete MLE
• Hypothesized distribution is the geometric distribution
$$p_p(x) = p (1-p)^x \qquad (x = 0, 1, 2, \ldots)$$
• Likelihood function is
$$L(p) = p(1-p)^{X_1} \cdot p(1-p)^{X_2} \cdots p(1-p)^{X_n} = p^n (1-p)^{\sum_{i=1}^{n} X_i}$$
• Want the value of p that maximizes L(p) over all 0 < p < 1
• Equivalent (and easier) to maximize the log-likelihood function l(p) = ln L(p), since ln is a monotonically increasing function
Example of Discrete MLE
$$l(p) = n \ln p + \left( \sum_{i=1}^{n} X_i \right) \ln(1-p)$$
$$\frac{dl(p)}{dp} = \frac{n}{p} - \frac{\sum_{i=1}^{n} X_i}{1-p} = 0 \;\Rightarrow\; \hat{p} = \frac{1}{\bar{X}(n) + 1}$$
$$\frac{d^2 l(p)}{dp^2} = -\frac{n}{p^2} - \frac{\sum_{i=1}^{n} X_i}{(1-p)^2} < 0, \text{ so this is a maximum}$$
• MLE is $\hat{p} = \dfrac{1}{1.891 + 1} = 0.346$ from the observed sample of n = 156 points
• An approximate confidence interval comes from the expected information:
$$E\!\left[ -\frac{d^2 l(p)}{dp^2} \right] = \frac{n}{p^2} + \frac{\sum_{i=1}^{n} E[X_i]}{(1-p)^2} = \frac{n}{p^2} + \frac{n (1-p)/p}{(1-p)^2} = \frac{n}{p^2 (1-p)}$$
$$\delta^2(p) = n \Big/ E\!\left[ -\frac{d^2 l(p)}{dp^2} \right] = p^2 (1-p)$$
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\delta^2(\hat{p})}{n}} = \hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}^2 (1-\hat{p})}{n}}$$
$$\text{90\% CI: } 0.346 \pm 1.645 \sqrt{\frac{0.346^2 (1 - 0.346)}{156}} = 0.346 \pm 0.037 = [0.309, 0.383]$$
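A sketch reproducing these numbers from the slide's summary values (assuming NumPy; the function name is illustrative):

```python
import numpy as np

def geometric_mle_ci(xbar, n, z=1.645):
    """MLE p_hat = 1/(xbar + 1) and approximate CI using
    delta^2(p) = p^2 (1 - p) from the Fisher information."""
    p_hat = 1.0 / (xbar + 1.0)
    half_width = z * np.sqrt(p_hat**2 * (1.0 - p_hat) / n)
    return p_hat, (p_hat - half_width, p_hat + half_width)

# Demand-size data: sample mean 1.891, n = 156 (values from the slides).
print(geometric_mle_ci(xbar=1.891, n=156))
# -> p_hat ~= 0.346, CI ~= (0.309, 0.383)
```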
Activity III: Determining How Representative
the Fitted Distributions Are
• Have: Hypothesized family and estimated parameters
• Question: Does the fitted distribution agree with the observed data?
• Approaches
– Heuristic
• Density/Histogram overplots and frequency comparisons
• Distribution function differences plots
• Probability and quantile plots
– P-P plot
– Q-Q plot
– Statistical tests
• Chi-square goodness-of-fit tests
• Kolmogorov-Smirnov tests
• Anderson-Darling tests
• Poisson process tests
Heuristic Procedures
Continuous Data
• Density/Histogram Overplots and Frequency Comparisons
– Density/histogram overplot: plot $\Delta b\, \hat{f}(x)$ over the histogram h(x); look for similarities (recall that the area under h(x) is $\Delta b$ and $\hat{f}$ is the density of the fitted distribution)
– Interarrival-time data for drive-up bank and fitted exponential
Heuristic Procedures
Continuous Data
• Data Set 3 against a Weibull distribution: density-histogram plot (fitted Weibull density overplotted on a histogram with 13 intervals of width 1.2; axes are interval midpoint vs. density/proportion)
Heuristic Procedures
• Frequency comparison
– Take histogram intervals $[b_{j-1}, b_j]$ for j = 1, 2, ..., k, each of width $\Delta b$
– Let $h_j$ be the observed proportion of data in the jth interval
– Let
$$r_j = \int_{b_{j-1}}^{b_j} \hat{f}(x)\, dx$$
be the expected proportion of data in the jth interval if the fitted distribution is correct
– Plot hj and rj together, look for similarities
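A sketch of the frequency comparison for continuous data (assuming NumPy and SciPy; the frozen-distribution argument and names are illustrative):

```python
import numpy as np
from scipy import stats

def frequency_comparison(x, k, dist):
    """Observed proportions h_j vs fitted proportions r_j on k equal-width
    intervals; `dist` is a frozen scipy.stats continuous distribution."""
    x = np.asarray(x, dtype=float)
    counts, edges = np.histogram(x, bins=k)
    h = counts / len(x)                              # observed proportions
    r = dist.cdf(edges[1:]) - dist.cdf(edges[:-1])   # fitted proportions
    return h, r

rng = np.random.default_rng(5)
data = rng.exponential(scale=0.4, size=219)
fitted = stats.expon(scale=data.mean())   # exponential with MLE mean
h, r = frequency_comparison(data, 10, fitted)
print(np.round(h, 3))
print(np.round(r, 3))
```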
Heuristic Procedures
Discrete Data
• Frequency comparison
– Let hj be the observed proportion of data that are equal to the jth
possible value xj
– Let $r_j = \hat{p}(x_j)$ be the expected proportion of the data equal to $x_j$ if the fitted probability mass function $\hat{p}$ is correct
– Plot hj and rj together, look for similarities
– Demand-size data for inventory and fitted geometric distribution
Distribution Function Differences Plots
• Density/histogram overplots are comparisons of individual probabilities
of fitted distribution with observed individual probabilities
• Instead of individual probabilities, one can compare cumulative probabilities via the fitted CDF $\hat{F}(x)$ against a (new) empirical CDF
$$F_n(x) = \frac{\text{number of } X_i\text{'s} \leq x}{n} = \text{proportion of the data that are} \leq x$$
• Could plot $\hat{F}(x)$ and $F_n(x)$ together and look for similarities, but it is harder to see such similarities for cumulative than for individual probabilities
• Alternatively, plot $\hat{F}(x) - F_n(x)$ against the range of x values and look for closeness to a flat horizontal line at 0.
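A sketch of the differences plot (assuming NumPy, SciPy, and Matplotlib; names illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def differences_plot(x, dist):
    """Plot Fhat(x) - Fn(x) over the data range; `dist` is a frozen
    scipy.stats distribution with a .cdf method."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    grid = np.linspace(x[0], x[-1], 400)
    Fn = np.searchsorted(x, grid, side="right") / n   # empirical CDF
    plt.plot(grid, dist.cdf(grid) - Fn)
    plt.axhline(0.0, color="black", linewidth=0.8)    # reference line at 0
    plt.xlabel("x")
    plt.ylabel("F_hat(x) - F_n(x)")
    plt.show()

rng = np.random.default_rng(6)
data = rng.exponential(scale=0.4, size=219)
differences_plot(data, stats.expon(scale=data.mean()))
```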
Distribution Function Differences Plots
Interarrival-time data for drive-up bank and fitted exponential distribution
Distribution Function Differences Plots
Distribution-function-differences plot of Data Set 3 against a fitted Weibull distribution (mean diff. = 0.01706; axes are x vs. difference in proportion)
• Use caution if plot crosses line
Distribution Function Differences Plots
Demand-size data for inventory and fitted geometric distribution
Probability Plots
• This is another way to compare CDF of fitted distribution with an
empirical distribution directly from the data
• Sort data into increasing order: X(1), X(2), ..., X(n) (called the order
statistics of the data)
• Another empirical CDF definition, defined only at the order statistics, is
$$F_n\!\left(X_{(i)}\right) = \text{observed proportion of data} \leq X_{(i)} = \frac{i}{n} \quad \left(\text{adjusted to } \frac{i - 0.5}{n}\right)$$
• If F(x) is the true (unknown) CDF of the data, then $F(x) = P(X \leq x)$ for any x
• So taking $x = X_{(i)}$, $F\!\left(X_{(i)}\right) = P\!\left(X \leq X_{(i)}\right)$, which is estimated by $(i - 0.5)/n$
• Thus, we should have $F\!\left(X_{(i)}\right) \approx (i - 0.5)/n$ for all i = 1, 2, ..., n
P-P Plots
• Schematic: $\tilde{F}_n$ (the empirical distribution) and $\hat{F}$ (the fitted model distribution) are drawn as curves over x; for each x, the P-P plot pairs the two probability values $\tilde{F}_n(x)$ and $\hat{F}(x)$
P-P Plot
• If the fitted distribution (with CDF $\hat{F}$) is correct (i.e., close to the true unknown F), we should have
$$\hat{F}\!\left(X_{(i)}\right) \approx \frac{i - 0.5}{n}$$
• So, plotting the pairs
$$\left( \frac{i - 0.5}{n},\; \hat{F}\!\left(X_{(i)}\right) \right)$$
for all i = 1, 2, ..., n should result in an approximately straight line from (0, 0) to (1, 1) if $\hat{F}$ is correct
• This plot is valid for both continuous and discrete data
• It is sensitive to misfits in the center of the range of the distribution
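A sketch of the P-P plot (assuming NumPy, SciPy, and Matplotlib; names illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def pp_plot(x, dist):
    """P-P plot: ((i - 0.5)/n, Fhat(X_(i))) should hug the 45-degree line."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    empirical = (np.arange(1, n + 1) - 0.5) / n
    model = dist.cdf(x)
    plt.plot(empirical, model, "o", markersize=3)
    plt.plot([0, 1], [0, 1], "k--")      # reference line from (0,0) to (1,1)
    plt.xlabel("(i - 0.5)/n")
    plt.ylabel("fitted CDF at X_(i)")
    plt.show()

rng = np.random.default_rng(7)
data = rng.exponential(scale=0.4, size=219)
pp_plot(data, stats.expon(scale=data.mean()))
```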
P-P Plot
P-P plot of interarrival-time data for fitted exponential distribution
P-P Plot
P-P plot of demand-size data for fitted geometric distribution
P-P Plot
• P-P plot of Data Set 3 against a Weibull distribution (discrepancy = 0.03753; model value vs. sample value over the range of the sample)
Q-Q Plot
• Schematic: for each probability level q, the Q-Q plot pairs the sample quantile $x_q^S$ (from the empirical distribution $\tilde{F}_n$) with the model quantile $x_q^M$ (from the fitted distribution $\hat{F}$)
Q-Q Plot
• Taking the inverse $\hat{F}^{-1}$ of the P-P plot, we get the new plot of the pairs
$$\left( \hat{F}^{-1}\!\left( \frac{i - 0.5}{n} \right),\; X_{(i)} \right)$$
• So, plotting these pairs for all i = 1, 2, ..., n should result in an
approximately straight line from (X(1), X(1)) to (X(n), X(n)) if F̂ is
correct
• This plot is valid only for continuous data
• Depending on the form of the fitted distribution, there may not be a closed-form formula for $\hat{F}^{-1}$
• It is sensitive to misfits in the tails of the distributions
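A sketch of the Q-Q plot, using SciPy's ppf as the inverse CDF $\hat{F}^{-1}$ (names illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def qq_plot(x, dist):
    """Q-Q plot: (Fhat^{-1}((i - 0.5)/n), X_(i)); dist.ppf is the inverse CDF."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    model_q = dist.ppf((np.arange(1, n + 1) - 0.5) / n)
    plt.plot(model_q, x, "o", markersize=3)
    lims = [min(model_q[0], x[0]), max(model_q[-1], x[-1])]
    plt.plot(lims, lims, "k--")          # reference line
    plt.xlabel("model quantile")
    plt.ylabel("sample quantile X_(i)")
    plt.show()

rng = np.random.default_rng(8)
data = rng.exponential(scale=0.4, size=219)
qq_plot(data, stats.expon(scale=data.mean()))
```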
Q-Q Plot
Q-Q plot of interarrival-time data for fitted exponential distribution
Goodness-of-Fit Test
• Formal statistical hypothesis tests for
– H0: The observed data X1, X2, ..., Xn are IID random variables
with F̂ distribution function
• Caution: Failure to reject H0 does not constitute “proof” that the fit
is good
– Power of some goodness-of-fit tests is low, particularly for
small sample size n
• Also, large n creates high power, so tests will nearly always reject
H0
– Keep in mind that null hypotheses are seldom literally true,
and we are looking for an “adequate” fit of the distribution
Chi-Square Test
• Very old (Karl Pearson, 1900), and general (continuous or discrete
data)
• It is a statistical formalization of frequency comparisons
• Divide range of data into k intervals, not necessarily of equal
width:
– $[a_0, a_1), [a_1, a_2), \ldots, [a_{k-1}, a_k]$
– $a_0$ could be $-\infty$, or $a_k$ could be $+\infty$
• Compare actual amount of observed data in each interval with what
the fitted distribution would predict
– Let Nj = the number of observed data points in the jth interval
– Let pj = the expected proportion of the data in the jth interval if
the fitted distribution were literally true
Chi-Square Test
$$p_j = \begin{cases} \displaystyle\int_{a_{j-1}}^{a_j} \hat{f}(x)\, dx & \text{for continuous data} \\[2mm] \displaystyle\sum_{a_{j-1} \le x \le a_j} \hat{p}(x) & \text{for discrete data} \end{cases}$$
• Thus, $n p_j$ = expected (under the fitted distribution) number of points in the jth interval
• If the fitted distribution is correct, we would expect $N_j \approx n p_j$
• The test statistic is
$$\chi^2 = \sum_{j=1}^{k} \frac{\left( N_j - n p_j \right)^2}{n p_j}$$
Chi-Square Test
• Under $H_0$: the fitted distribution is correct, $\chi^2$ has (approximately; see book for details) a chi-square distribution with k − 1 degrees of freedom
• Reject $H_0$ at level α if $\chi^2$ exceeds the upper critical value
• Advantages
– Completely general
– Asymptotically valid (as $n \to \infty$) if MLEs were used
• Drawbacks
– Arbitrary choice of intervals (can affect test conclusion)
– Conventional advice
• Want $n p_j \ge 5$ or so for all but a couple of the j's
• Pick intervals such that the $p_j$'s are close to each other
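A sketch of the equal-probability-interval version of the test (assuming NumPy and SciPy; names illustrative). Transforming the data through the fitted CDF makes every interval have $p_j = 1/k$, which is equivalent to inverting the CDF to get the interval endpoints:

```python
import numpy as np
from scipy import stats

def chi_square_equal_prob(x, dist, k):
    """Chi-square GOF test with k equal-probability intervals: transform the
    data through the fitted CDF, then count in k equal-width cells on [0, 1]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    u = dist.cdf(x)                                    # probability transform
    N = np.histogram(u, bins=k, range=(0.0, 1.0))[0]   # observed counts N_j
    expected = n / k                                   # n p_j with p_j = 1/k
    chi2 = np.sum((N - expected) ** 2 / expected)
    critical = stats.chi2.ppf(0.90, df=k - 1)          # upper 0.10 point
    return chi2, critical

rng = np.random.default_rng(9)
data = rng.exponential(scale=0.4, size=219)
fitted = stats.expon(scale=data.mean())
chi2, crit = chi_square_equal_prob(data, fitted, k=20)
print(chi2, crit)   # do not reject H0 at level 0.10 if chi2 <= crit
```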
Chi-Square Test
Exponential distribution fitted to interarrival-time data
• Choose k = 20 intervals so that pj = 1/20 = 0.05 for each
interval (see book for details on how the endpoints were
chosen, which involved inverting the exponential CDF and
taking $a_{20} = +\infty$)
• Thus, $n p_j$ = (219)(0.05) = 10.95 for each interval
• Counted observed frequencies $N_j$, computed test statistic $\chi^2 = 22.188$
• Use degrees of freedom = k − 1 = 19; upper 0.10 critical value is $\chi^2_{19,0.90} = 27.204$
• Since test statistic does not exceed the critical level, do not
reject H0
Chi-Square Test
Geometric distribution fitted to demand-size data
• Since data are discrete, cannot choose intervals so that the pj’s are
exactly equal to each other
• Choose k = 3 intervals (classes) {0}, {1, 2}, and {3, 4, ...}
• Obtained np1 = 53.960, np2 = 58.382, and np3 = 43.658
• Counted observed frequencies Nj, computed test statistic χ2 = 1.930
• Use degrees of freedom = k − 1 = 2; upper 0.10 critical value is $\chi^2_{2,0.90} = 4.605$
• Since test statistic does not exceed the critical level, do not reject H0
Chi-Square Test
• Data Set 3 against Weibull
Chi-Square Test
• Data Set 3 against Gamma
Kolmogorov-Smirnov Test
• Advantages with respect to chi-square tests
– No arbitrary choices, like intervals
– Exactly valid for any (finite) n
• Disadvantage with respect to chi-square tests
– Not as general
• It is a kind of a statistical formalization of probability plots
– Compare empirical CDF from data against fitted CDF
• Yet another version of the empirical distribution function is used
– $F_n(x)$ = proportion of the $X_i$ data that are $\leq x$ (a step function)
– On the other hand, we have the fitted CDF $\hat{F}(x)$
– In a perfect world, $F_n(x) = \hat{F}(x)$ for all x
Kolmogorov-Smirnov Test
• The worst (vertical) discrepancy is
$$D_n = \sup_x \left| F_n(x) - \hat{F}(x) \right|$$
• Computing $D_n$ (must be careful; it is sometimes stated incorrectly):
$$D_n^+ = \max_{i=1,\ldots,n} \left\{ \frac{i}{n} - \hat{F}\!\left(X_{(i)}\right) \right\}, \qquad D_n^- = \max_{i=1,\ldots,n} \left\{ \hat{F}\!\left(X_{(i)}\right) - \frac{i-1}{n} \right\}, \qquad D_n = \max\left\{ D_n^+, D_n^- \right\}$$
• Reject $H_0$: the fitted distribution is correct if $D_n$ is too big
• If all parameters are known, reject $H_0$ at level α if
$$\left( \sqrt{n} + 0.12 + \frac{0.11}{\sqrt{n}} \right) D_n > c_{1-\alpha}$$
• There are several different kinds of tables depending on the form and specification of the hypothesized distribution (see book for details and example)
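A sketch of computing $D_n$ from the order statistics (assuming NumPy and SciPy; names illustrative; scipy.stats.kstest computes the same statistic):

```python
import numpy as np
from scipy import stats

def ks_statistic(x, dist):
    """D_n = max(D_n^+, D_n^-) computed from the order statistics."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    F = dist.cdf(x)                             # Fhat at the order statistics
    d_plus = np.max(np.arange(1, n + 1) / n - F)
    d_minus = np.max(F - np.arange(0, n) / n)
    return max(d_plus, d_minus)

rng = np.random.default_rng(10)
data = rng.exponential(scale=0.4, size=219)
fitted = stats.expon(scale=data.mean())
dn = ks_statistic(data, fitted)
n = len(data)
print(dn, (np.sqrt(n) + 0.12 + 0.11 / np.sqrt(n)) * dn)  # adjusted statistic
```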
Anderson-Darling Test
• As in the K-S test, look at vertical discrepancies between $\hat{F}(x)$ and $F_n(x)$
• Difference:
– K-S weights differences the same for each x
– Sometimes more interested in getting accuracy in (right) tail
• Queueing applications
• The P-K (Pollaczek-Khinchine) formula depends on the variance of service times
– A-D applies increasing weight on differences toward tails
– A-D is more sensitive (powerful) than K-S in tail discrepancies
• Define the weight function
$$\psi(x) = \frac{1}{\hat{F}(x) \left[ 1 - \hat{F}(x) \right]}$$
• Note that $\psi(x)$ is smallest (= 4) in the middle (at the median, where $\hat{F}(x) = 1/2$) and largest ($\to \infty$) in either tail
Anderson-Darling Test
• The test statistic is
$$A_n^2 = n \int_{-\infty}^{\infty} \left[ F_n(x) - \hat{F}(x) \right]^2 \psi(x)\, \hat{f}(x)\, dx = -\frac{\sum_{i=1}^{n} (2i-1) \left\{ \ln \hat{F}\!\left(X_{(i)}\right) + \ln\!\left[ 1 - \hat{F}\!\left(X_{(n+1-i)}\right) \right] \right\}}{n} - n$$
• Reject $H_0$: the fitted distribution is correct if $A_n^2$ is too big
• There are several different kinds of tables depending on the form and specification of the hypothesized distribution (see book for details and example)
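A sketch of the closed-form order-statistic expression for $A_n^2$ (assuming NumPy and SciPy; names illustrative):

```python
import numpy as np
from scipy import stats

def anderson_darling(x, dist):
    """A_n^2 from the order-statistic expression above."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    F = dist.cdf(x)
    i = np.arange(1, n + 1)
    # F[::-1][j] is Fhat at X_(n+1-i) for the ith term of the sum
    s = np.sum((2 * i - 1) * (np.log(F) + np.log(1.0 - F[::-1])))
    return -s / n - n

rng = np.random.default_rng(11)
data = rng.exponential(scale=0.4, size=219)
print(anderson_darling(data, stats.expon(scale=data.mean())))
```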
Poisson-Process Test
• Common situation in simulation: modeling an event or arrival
process over time:
– Arrivals of customers or jobs, Breakdowns of machines,
Occurrence of accidents
• A popular (and realistic) model is the Poisson process with some
rate λ
• Equivalent definitions are
1. Number of events in (t1, t2] ~ Poisson with mean λ (t2 – t1)
2. Time between successive events ~ exponential with mean 1/ λ
3. Given the number of events in a fixed time interval, the event times are distributed uniformly over that interval
• Use the second or third definition to develop a test for observed data coming from a Poisson process:
• Test for inter-event times being exponential (chi-square, K-S, A-D, ...)
• Test for placement of events over time being uniform
• See book for details and example
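A sketch of the uniformity version of the test, using the fact that, given their count, the event times of a Poisson process over (0, T] are IID uniform on (0, T] (assuming NumPy and SciPy; names and simulated data are illustrative):

```python
import numpy as np
from scipy import stats

def poisson_process_uniformity_test(event_times, horizon):
    """K-S test of event times on (0, horizon] against the uniform
    distribution, as implied by a constant-rate Poisson process."""
    u = np.sort(np.asarray(event_times, dtype=float)) / horizon
    return stats.kstest(u, "uniform")    # test against U(0, 1)

# Example: simulate a rate-2 Poisson process over 90 time units.
rng = np.random.default_rng(12)
gaps = rng.exponential(scale=0.5, size=400)
times = np.cumsum(gaps)
times = times[times <= 90.0]
print(poisson_process_uniformity_test(times, horizon=90.0))
```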
The ExpertFit Software
• Need software assistance to carry out the statistical procedures
• Standard statistical-analysis packages do not suffice
– Often too oriented to normal-theory and related distributions
– Need wider variety of “nonstandard” distributions to achieve an adequate fit
– Difficult calculations like inverting non-closed-form CDFs, computation of critical
values and p-values for tests
• The ExpertFit package is tailored to these needs
• Other packages exist, sometimes packaged with simulation-modeling software
• See book for details on ExpertFit and an extended, in-depth example
• In Arena, distribution fitting is done with the Input Analyzer
Shifted Distributions
• Many standard distributions have range $[0, \infty)$
– Exponential, gamma, Weibull, lognormal, PT5, PT6, log-logistic
• But in some situations we'd like the range to be $[\gamma, \infty)$ for some parameter $\gamma$
– A service time cannot physically be arbitrarily close to 0; there is some absolute positive minimum $\gamma$ for the service time
• One can shift one of the above distributions to the right by $\gamma$
– Replace x in their density definitions by $x - \gamma$ (including in the definition of the ranges)
• Introduces a new parameter $\gamma$ that must be estimated from the data
– Depending on the distribution form, this may be relatively easy (e.g., exponential) or very challenging (e.g., global MLEs are ill-defined for gamma, Weibull, lognormal)
Truncated Distributions
• Data are well fitted by a distribution with range $[0, \infty)$, but the physical situation dictates that no value can exceed some finite constant b
• Need to truncate the distribution above b to make the effective range [0, b]
• This is really a random variate-generation issue (covered in
Chapter 8)
Multivariate Distributions, Correlations,
and Stochastic Processes
• Assumption so far: we want to generate independent, identically distributed (IID) random variables as input to drive the simulation
• Sometimes there is correlation between random variables in reality
– A = interarrival time of a job from an upstream process
– S = service time of job at the station being modeled
– Perhaps a large A means that the job is “large,” taking a lot of
time upstream— then it probably will take a lot of time here too
(S large) so that Cor(A, S) > 0
– Ignoring this correlation can lead to serious errors in output
validity
– Need ways to estimate this dependence, and (later) generate it
in the simulation
Multivariate Distributions
• Some of the model’s input random variables together form a jointly
distributed random vector
• One must specify the joint distribution form and estimate its
parameters
• Correlations between the random variables are then determined by the joint distribution form
• At present, this is limited to several specific special cases (see
book for details): multivariate normal, multivariate lognormal,
multivariate Johnson, and bivariate Bézier
$$(X_1, X_2, \ldots, X_n) \sim \mathrm{MVN}(\mu, \Sigma) \;\Rightarrow\; E[X_i] = \mu_i, \quad \mathrm{Cov}(X_i, X_j) = \Sigma_{ij}$$
$$f(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma}}\; e^{-\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu)}$$
Arbitrary Marginal Distributions and
Correlations
• This is less ambitious than specifying the joint distribution, but
affords greater flexibility
• Allow for possible correlation between input random variables, but
fit their univariate (marginal) distributions separately
• Must specify the univariate marginal distributions (earlier methods)
and estimate the correlations (fairly easy)
• Does not in general uniquely specify the joint distribution
– Except in multivariate normal case where specifying the
marginal distributions and all the correlations uniquely specify
the joint distribution
• Must take care that the correlations are compatible with the
marginal distributions
– Marginal distributions place constraints on what correlation
structure is theoretically possible
• How to generate this structure for input to the simulation?
Stochastic Processes
• Suppose you have an input stochastic process {X₁, X₂, ...} where the $X_i$'s have the same distribution, but there is a correlation structure for them at various lags; for example, $X_i$ is the size of the ith incoming message in a communications system, and it could be that large messages tend to be followed by other large messages (or the reverse)
• One can regard this as an infinite-dimensional random vector for input
• Some specific models (see book for details) are
– Autoregressive (AR):
$$X_i = \mu + \phi_1 (X_{i-1} - \mu) + \phi_2 (X_{i-2} - \mu) + \cdots + \phi_p (X_{i-p} - \mu) + \varepsilon_i$$
– Autoregressive moving average (ARMA):
$$X_i = \mu + \phi_1 (X_{i-1} - \mu) + \cdots + \phi_p (X_{i-p} - \mu) + \theta_1 \varepsilon_{i-1} + \theta_2 \varepsilon_{i-2} + \cdots + \theta_q \varepsilon_{i-q} + \varepsilon_i$$
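A sketch of generating an AR(1) series, the p = 1 special case above with normal noise (assuming NumPy; names and parameter values are illustrative):

```python
import numpy as np

def generate_ar1(mu, phi, sigma, n, rng):
    """Generate n values from a stationary AR(1) process:
    X_i = mu + phi*(X_{i-1} - mu) + eps_i, eps_i ~ N(0, sigma^2)."""
    x = np.empty(n)
    # start from the stationary marginal N(mu, sigma^2 / (1 - phi^2))
    x[0] = mu + rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))
    for i in range(1, n):
        x[i] = mu + phi * (x[i - 1] - mu) + rng.normal(0.0, sigma)
    return x

series = generate_ar1(mu=5.0, phi=0.8, sigma=1.0, n=1000,
                      rng=np.random.default_rng(13))
print(np.corrcoef(series[:-1], series[1:])[0, 1])  # lag-1 correlation near 0.8
```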
Selecting a Distribution in the Absence of Data
• What if there are no data?
• One must rely to some extent on subjective information (guesses)
• You can ask an “expert” for:
– min, max → uniform distribution on [a, b]
– min, max, mode → triangular distribution with parameters [a, b, c]
– min, max, mode, mean → beta distribution with parameters [a, b, c, μ]
• See book for details and example
• It is a good idea to do sensitivity analysis and change the input
distributions to see if output changes appreciably
Models of Arrival Processes
• In many applications, we need to model arrival processes
• As with distributions, we need to specify the form of the process and estimate its parameters
• Three common models: the Poisson process, the nonstationary Poisson process, and the compound Poisson process (batch arrivals)
Poisson Process
• Three “behavioral” assumptions:
1. Events occur one at a time
2. Number of events in a time interval is independent of the past
3. Expected rate of events is constant over time
• Fitting: Fit exponential distribution to inter-event times via MLE
• Testing: Chi-square or K-S test for the exponential distribution
$$P(T_1 > t) = e^{-\lambda t}, \qquad P(N(t) = n) = \frac{e^{-\lambda t} (\lambda t)^n}{n!}$$
Nonstationary Poisson Process
• Drop behavioral assumption 3 above and keep the others
• This allows the expected rate of events to vary with time, so the arrival-rate constant λ is replaced with a rate function λ(t); define the cumulative rate
$$\Lambda(t) = \int_0^t \lambda(s)\, ds$$
Then
$$P\left( N(t_2) - N(t_1) = n \right) = \frac{e^{-\left[ \Lambda(t_2) - \Lambda(t_1) \right]} \left[ \Lambda(t_2) - \Lambda(t_1) \right]^n}{n!}$$
$$P\left( T_{n+1} - T_n > t \mid T_n \right) = e^{-\left[ \Lambda(T_n + t) - \Lambda(T_n) \right]}$$
• Estimation of the rate function
– Assume rate function is constant over subintervals of time
• Must specify subintervals thought to be appropriate
• Must be careful to keep the units straight
– Other methods exist (see book for discussion and references)
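A sketch of the piecewise-constant rate estimate (assuming NumPy; names, subinterval choice, and simulated data are illustrative):

```python
import numpy as np

def piecewise_rate(event_times, edges):
    """Estimate a piecewise-constant rate function: events per unit time
    in each subinterval [edges[j], edges[j+1])."""
    event_times = np.asarray(event_times, dtype=float)
    counts = np.histogram(event_times, bins=edges)[0]
    widths = np.diff(edges)
    return counts / widths          # rate estimate on each subinterval

# Example: arrivals over a 90-minute period, six 15-minute subintervals.
rng = np.random.default_rng(14)
times = np.sort(rng.uniform(0.0, 90.0, size=219))
print(piecewise_rate(times, edges=np.linspace(0.0, 90.0, 7)))
```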
Compound Poisson Process
(Batch Arrivals)
• Drop behavioral assumption 1 above and keep the others
• This allows the number of events arriving together (the batch size) to be a discrete random variable independent of the event-time process
• The event time process is a Poisson process
$$X(t) = B_1 + B_2 + \cdots + B_{N(t)}$$
• Fitting:
– Fit exponential distribution to interevent times via MLE
– Fit a discrete random variable to observed “group” sizes
• Testing
– Chi-square or K-S tests separately for interevent times and
group sizes
Assessing the Homogeneity of Different Data Sets
• Sometimes we have different data sets on related but separate
processes
– Service-time observations for ten different days
– Can the data sets be merged?
– In other words, is the underlying distribution the same for each day?
• Advantages of merging (if it can be justified)
– Larger sample size, so get better specification of the input distribution
– Just one specification problem rather than several
– Just one distribution from which to generate in the simulation model
• We need to test
– H0: All the population distribution functions are identical, vs.
– H1: At least one of the populations tends to yield larger observations
than at least one of the other populations
• Formal statistical test is the Kruskal-Wallis test, which is a
nonparametric test based on the ranks of the data sets (see book for
details)
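A sketch of the Kruskal-Wallis test via SciPy (the data here are simulated placeholders for the daily service-time samples):

```python
import numpy as np
from scipy import stats

# Service-time samples from three different days (simulated here).
rng = np.random.default_rng(15)
day1 = rng.exponential(scale=0.4, size=60)
day2 = rng.exponential(scale=0.4, size=55)
day3 = rng.exponential(scale=0.4, size=70)

# Kruskal-Wallis rank test of H0: all days share one distribution.
stat, p_value = stats.kruskal(day1, day2, day3)
print(stat, p_value)   # a large p-value gives no evidence against merging
```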