Further Statistical Issues Chapter 12

advertisement
Further
Statistical
Issues
Chapter 12
Last revision June 9, 2003
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 1 of 39
What We’ll Do ...
•
•
•
•
•
•
Random-number generation
Generating random variates
Nonstationary Poisson processes
Variance reduction
Sequential sampling
Designing and executing simulation experiments
Backup material:
•Appendix C: A Refresher on Probability and Statistics
•Appendix D: Arena’s Probability Distributions
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 2 of 39
Random-Number Generators (RNGs)
•
Algorithm to generate independent, identically
distributed draws from the continuous UNIF (0, 1)
f(x)
distribution

•
1
0
1
x
Basis for generating observations from all other
distributions and random processes

•
•
These are called random numbers
in simulation
Transform random numbers in a way that depends on the
desired distribution or process (later in this chapter)
It’s essential to have a good RNG
There are a lot of bad RNGs — this is very tricky

Methods, coding are both tricky
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 3 of 39
The Nature of RNGs
•
Recursive formula (algorithm)



•
•

Not really “random” as in unpredictable

Does this matter? Philosophically? Practically?
Want to “design” RNGs




•
Start with a seed (or seed vector)
Do something weird to the seed to get the next one
Repeat … generate the same sequence
Will eventually repeat (cycle) … want long cycle length
Long cycle length
Good statistical properties (uniform, independent) -- tests
Fast
Streams (subsegments) – many and long (for variance
reduction … later)
This is not easy!

Doing something weird isn’t enough
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 4 of 39
Linear Congruential Generators
(LCGs)
•
The most common of several different methods

•
•
•
•
•
•
But not the one in Arena (though it’s related) … more later
Generate a sequence of integers Z1, Z2, Z3, … via
the recursion
Zi = (a Zi–1 + c) (mod m)
a, c, and m are carefully chosen constants
Specify a seed Z0 to start off
“mod m” means take the remainder of dividing by
m as the next Zi
All the Zi’s are between 0 and m – 1
Return the ith “random number” as Ui = Zi / m
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 5 of 39
Example of a “Toy” LCG
•
Parameters m = 63, a = 22, c = 4, Z0 = 19:
Zi = (22 Zi–1 + 4) (mod 63), seed with Z0 = 19
i
0
1
2
3
4
:
61
62
63
64
65
66
:
22 Zi–1+4
422
972
598
686
:
158
708
334
422
972
598
:
Zi
19
44
27
31
56
:
32
15
19
44
27
31
:
Simulation with Arena, 3rd ed.
Ui
0.6984
0.4286
0.4921
0.8889
:
0.5079
0.2381
0.3016
0.6984
0.4286
0.4921
:
• Cycling — will repeat forever
• Cycle length  m
(could be << m depending
on parameters)
• Pick m BIG
• But that might not be enough
for good statistical properties
Chapter 12 – Further Statistical Issues
Slide 6 of 39
Issues with LCGs
•
Cycle length: < m

Typically, m = 2.1 billion (= 231 – 1) or more
–
•

Other parameters chosen so that cycle length = m or m – 1
Statistical properties


Uniformity, independence
There are many tests of RNGs
–
•
•
•
Which used to be a lot … more later
–
Empirical tests
Theoretical tests — “lattice” structure (next slide …)
Speed, storage — both are usually fine
Must be carefully, cleverly coded — BIG integers
Reproducibility — streams (long internal
subsequences) with fixed seeds
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 7 of 39
Issues with LCGs (cont’d.)
•
“Regularity” of LCGs (and other kinds of RNGs):
For the earlier “toy” LCG …
Plot of Ui vs. i
Plot of Ui+1 vs. Ui
“Random Numbers
Fall Mainly in the
Planes”
— George Marsaglia
•
•
“Design” RNGs: dense lattice in high dimensions
Other kinds of RNGs — longer memory in
recursion, combination of several RNGs
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 8 of 39
The Original (1983) Arena RNG
•
•
•
•
•
m = 231 – 1 = 2,147,483,647
a = 75 = 16,807
c=0
Cycle length = m – 1 = 2.1  109
Ten different automatic streams with fixed seeds
In its day, this was a good, well-tested generator
in an efficient code
But current computer speed make this cycle
length inadequate
LCG with:

•
Exhaust in 10 minutes on 2 GHz PC if we just generate
Can get this generator (not recommended)

Place a Seeds module (Elements panel) in your model
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 9 of 39
The Current (2000) Arena RNG
•
Uses some of the same ideas as LCG

•
But is not an LCG


•
Modulo division, recursive on earlier values
Combines two separate component generators
Recursion involves more than just the preceding value
Combined multiple recursive generator (CMRG)
An = (1403580 An-2 – 810728 An-3) mod 4294967087
Bn = (527612 Bn-1 – 1370589 Bn-3) mod 4294944443
Zn = (An – Bn) mod 4294967087 Combine the two
Zn / 4294967088 if Zn > 0
The next
Un =
random
4294967087 / 4294967088 if Zn = 0 number
Seed = a six-vector of first three An’s, Bn’s
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Two
simultaneous
recursions
Slide 10 of 39
The Current (2000) Arena RNG –
Properties
•
Extremely good statistical properties

•
Cycle length = 3.1  1057



•
Good uniformity in up to 45-dimensional hypercube
To cycle, all six seeds must match up
On 2 GHz PC, would take 2.8  1040 millennia to exhaust
Under Moore’s law, it will be 216 years until this generator
can be exhausted in a year of nonstop computing
Only slightly slower than old LCG

And RNG is usually a minor part of overall computing time
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 11 of 39
The Current (2000) Arena RNG –
Streams and Substreams
•
Automatic streams and substreams


•
•
•
1.8  1019 streams of length 1.7  1038 each
Each stream further divided into 2.3  1015 substreams of
length 7.6  1022 each
–
2 GHz PC would take 669 million years to exhaust a substream
Default stream is 10 (historical reasons)

Also used for Chance-type Decide module
To use a different stream, append its number after
a distribution’s parameters

For example, EXPO(6.7, 4) to use stream 4
When using multiple replications, Arena
automatically advances to next substream in
each stream for the next replication

Helps synchronize for variance reduction
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 12 of 39
Generating Random Variates
•
•
•
Have: Desired input distribution for model (fitted
or specified in some way), and RNG (UNIF (0, 1))
Want: Transform UNIF (0, 1) random numbers
into “draws” from the desired input distribution
Method: Mathematical transformations of
random numbers to “deform” them to the desired
distribution


•
Specific transform depends on desired distribution
Details in online Help about methods for all distributions
Do discrete, continuous distributions separately
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 13 of 39
Generating from Discrete Distributions
•
Example: probability
mass function
–2
•
0
3
Divide [0, 1]




Into subintervals of length
0.1, 0.5, 0.4
Generate U ~ UNIF (0, 1)
See which subinterval it’s in
Return X = corresponding
value
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 14 of 39
Discrete Generation: Another View
•
Plot cumulative distribution function; generate U
and plot on vertical axis; read “across and down”
• Inverting
the CDF
• Equivalent
to earlier
method
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 15 of 39
Generating from Continuous
Distributions
•
Example: EXPO (5) distribution
Density (PDF)
Distribution (CDF)
•
General algorithm (can be rigorously justified):
1. Generate a random number U ~ UNIF(0, 1)
2. Set U = F(X) and solve for X = F–1(U)


Solving analytically for X may or may not be simple (or
possible)
Sometimes use numerical approximation to “solve”
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 16 of 39
Generating from Continuous
Distributions (cont’d.)
•
•
Solution for EXPO (5) case:
Set U = F(X) = 1 – e–X/5
e–X/5 = 1 – U
–X/5 = ln (1 – U)
X = – 5 ln (1 – U)
Picture (inverting the CDF, as in discrete case):
Intuition (garden hose):
More U’s will hit F(x)
where it’s steep
This is where the density
f(x) is tallest, and we
want a denser
distribution of X’s
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 17 of 39
Nonstationary Poisson Processes
•
•
•
Many systems have externally originating events
affecting them — e.g., arrivals of customers
If process is stationary over time, usually specify
a fixed interevent-time distribution
But process could vary markedly in its rate


•
•
Fast-food lunch rush
Freeway rush hours
Ignoring nonstationarity can lead to serious
model and output errors
Already seen this — automotive repair shop,
Chapter 5
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 18 of 39
Nonstationary Poisson Processes –
Definition
•
Usual model: nonstationary Poisson process:


Have a rate function l(t)
Number of events in [t1, t2] ~ Poisson with mean
l(t)
•
t
Issues:


How to estimate rate function?
Given an estimate, how to generate during simulation?
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 19 of 39
Nonstationary Poisson Processes –
Estimating the Rate Function
•
Estimation of the rate function

Probably the most practical method is piecewise constant
–
–
–

Decide on a time interval within which rate is fixed
Estimate from data the (constant) rate during each interval
Be careful to get the units right
Other (more complicated) methods exist in the literature
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 20 of 39
Nonstationary Poisson Processes –
Generation
•
Arena has a built-in method to generate,
assuming a piecewise-constant rate function

•
Method is to invert a rate-one stationary Poisson
process against the cumulate rate function L



•
Arrival Schedule in Create module – auto repair (Model 5-2)
Similar to inverting CDF for continuous random-variable
generation
Exploits some speed-up possibilities
Details in Help topic “Non-Stationary Exponential
Distribution”
Alternative method: “thinning” of a stationary
Poisson process at the peak rate
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 21 of 39
Variance Reduction
•
•
Random input  random output (RIRO)
In other words, output has variance


Higher output variance means less precise results
Would like to eliminate or reduce output variance
–
–
–
•
One (bad) way to eliminate: replace all input random variables by
constants (like their mean)
Will get rid of random output, but will also invalidate model
Thus, best hope is to reduce output variance
Easy (brute-force) variance reduction: just
simulate some more


Terminating: additional replications
Steady-state: additional replications or a longer run
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 22 of 39
Variance Reduction (cont’d.)
•
•
But sometimes can reduce variance without more
runs — free lunch (?)
Key: unlike physical experiments, can control
randomness in computer-simulation experiments
via manipulating the RNG

•
Re-use the same “random” numbers either as they were, in
some opposite sense, or for a similar but simpler model
Several different variance-reduction techniques



Classified into categories — common random numbers,
antithetic variates, control variates, indirect estimation, …
Usually requires thorough understanding of model, “code”
Will look only at common random numbers in detail
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 23 of 39
Common Random Numbers (CRN)
•
Applies when objective is to compare two (or
more) alternative configurations or models


Interest is in difference(s) of performance measure(s)
across alternatives
Model 7-2 (small mfg. system), total avg. WIP output —
two alternatives
A. Base case (as is)
B. 3.5% increase in business (interarrival-time mean falls from 13 to
12.56 minutes)

Same run conditions, but change model into Model 12-1:
–
Remove Output File Total WIP History.dat
–
Add entry to Statistic module to compute and save to a .dat file the
total avg. WIP on each replication
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 24 of 39
The “Natural” Comparison
•
Run case A, make the change to get to case B and
run it, then Compare Means via Output Analyzer:
•
Difference is not statistically significant



Were the runs of A and B statistically independent?
Did we use the same random numbers running A and B?
Did we use the same random numbers intelligently?
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 25 of 39
CRN Intuition
•
Get sharper comparison if you subject all
alternatives to the same “conditions”


Then observed differences are due to model differences
rather than random differences in the “conditions”
Small mfg. system: For both A and B runs, cause:
–
–
–
•
The “same” parts arrive at the same times
Be assigned same attributes (job type)
Have the same process times at each step
Then observed differences will be attributable to
system differences, not random bounce

There isn’t any random bounce
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 26 of 39
Synchronization of Random Numbers
in CRN
•
Generally, get CRN by using the same RNG, seed,
stream(s) for all alternatives


•
Already are using the same stream, default = stream 10
But its usage generally gets mixed up across alternatives
Must use the same random numbers for the same
purposes across the alternatives —
synchronization of random-number usage




Usually requires some work, understanding of model
Usually use different streams in the RNG
Usually different ways to do this in a given model
Sometimes can’t synchronize completely for complex
models — settle for partial synchronization
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 27 of 39
Synchronization of Random Numbers
in CRN (cont’d.)
•
Synchronize by source of randomness (we’ll do)

Assign stream to each point of variate generation
–
–

•
Fairly simple but might not ensure complete
synchronization; still usually get some benefit from this
Synchronize by entity (won’t do — see Exercises)

•
Separate random-number “faucets” … extra parameter in r.v. calls
Model 12-1: 14 sources of randomness, separate stream for each
(see book for details), modify into Model 12-2

Pre-generate every possible random variate an entity might
need when it arrives, assign to attributes, used downstream
Better synchronization insurance but uses more memory
Across replications, RNG automatically goes to
next substream within each stream

Maintains synchronization if alternatives disagree on
number of random numbers used per stream per replication
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 28 of 39
Effect of CRN
“Natural”
Comparison
Synchronized
CRN
•
•
CRN was no more computational effort
Effect here is fairly dramatic, but it will not always
be this strong

Depends on “how similar” A and B are
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 29 of 39
CRN Statistical Issues
•
In Output Analyzer, Analyze > Compare Means
option, have choice of Paired-t or Two-Sample-t
for c.i. test on difference between means


•
Paired-t subtracts results replication by replication — must
use this if doing CRN
Two-Sample-t treats the samples independently — can use
this if doing independent sampling, often better than
Paired-t
Mathematical justification for CRN

Let X = output r.v. from alternative A, Y = output from B
Var(X – Y) = Var(X) + Var(Y) – 2 Cov(X, Y)
= 0 if indep
> 0 with CRN
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 30 of 39
Other Variance-Reduction Techniques
•
•
For single-system simulation, not comparisons
Antithetic variates: make pairs of runs



•
Control variates

•
Use “U’s” on first run of pair, “1 – U’s” on second run of pair
Take average of two runs: negatively correlated, reducing
variance of average
Like CRN, must take care to synchronize
Use internal variate generation to “control” results up, down
Indirect estimation

Simulation estimates something other than what you want,
but related to what you want by a fixed formula
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 31 of 39
Sequential Sampling
•
Always try to quantify imprecision in results



•
If imprecision is “small enough,” you’re done
If not, need to do something to increase precision
Just saw one way: variance-reduction techniques
Obvious way to increase precision: keep
simulating one more “step” at a time, quit when
you achieve desired precision

Terminating models: “step” = another replication
–

Cannot extend length of replications — that’s part of the model
Steady-state models:
–
–
“step” = another replication if using truncated replications, or
“step” = some extension of the run if using batch means
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 32 of 39
Sequential Sampling —
Terminating Models
•
•
Modify Model 12-2 (small mfg., base case, with
random-number streams) into Model 12-3
Made 25 replications, get 95% c.i. on expected
average Total WIP as 13.03 ± 1.28




Suppose the ±1.28 is too big — want to reduce to ±0.5
Approximate formulas from Sec. 6.3: need 124 or 164 total
replications (depending on which formula) rather than 25
Instead, just make one more at a time, re-compute c.i., stop
as soon as half-width is less than 0.5
“Trick” Arena to keep making more replications until c.i.
half-width < tolerance = 0.5
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 33 of 39
Sequential Sampling —
Terminating Models (cont’d.)
•
•
Recall: With > 1 replication, automatically get
cross-replication 95% c.i.’s for expected values in
Category Overview report
Related internal Arena variables:

ORUNHALF(Output ID) = half-width of 95% c.i. using
completed replications (Output ID = Avg Total WIP)

MREP = total number of replications asked for (initially,
MREP = Number of Replications in Run > Setup >
Replication Parameters)
NREP = replication number now in progress (= 1, 2, 3, …)

•
Use, manipulate these variables
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 34 of 39
Sequential Sampling —
Terminating Models (cont’d.)
•
Initially set MREP to huge value in Number of
Replications in Run > Setup > Replication
Parameters

•
Keep replicating until we cut off when half-width  0.5
Add a logic (in a submodel) to sense when done


Create one “control” entity at beginning of each replication
Control entity immediately checks to see if:
–
–

NREP  2: This is beginning of 1st or 2nd replication
ORUNHALF(Avg Total WIP) > 0.5: c.i. on completed replications
is still too big
In either case, keep going with this replication (and the next
one too); control entity is Disposed and takes no action
If both conditions are false, Control entity Assigns MREP =
NREP to stop after this replication, and is Disposed
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 35 of 39
Sequential Sampling —
Terminating Models (cont’d.)
•
Details




•
This overshoots required number of replications by one
In Assign module setting MREP to NREP, have to select
Type = Other since MREP is a built-in Arena variable
Results: Stopped with 232 total replications, yielding half
width = 0.49699 (barely less than 0.5)
Different from earlier number-of-replications approximations
(they’re just that)
Generalizations


Precision demands on several outputs
Relative-width stopping: (half-width) / (pt. estimate) small
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 36 of 39
Sequential Sampling —
Steady-State Models
•
If doing truncated-replications approach to
steady-state statistical analysis



Same strategy as above for terminating models
Warm-up Period specified in Run > Setup > Replication
Parameters
Err on the side of too much warmup
–
–
–
Point-estimator bias is especially dangerous in sequential sampling
Getting tight c.i. centered in the wrong place
The tighter the c.i. demand, the worse the coverage probability
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 37 of 39
Sequential Sampling —
Steady-State Models (cont’d.)
•
Batch-means approach




Model 12-4: modification of Model 7-4 (small mfg. system)
– want half-width on E(average WIP) to be < 1
Keep extending the run to reduce c.i. half-width
Use automatic run-time batch-means 95% c.i.’s
Stopping criterion: Terminating Condition field of Run >
Setup > Replication Parameters
–
–



Half-width variables are THALF(Tally ID) or DHALF(Dstat ID)
For us, condition is DHALF(Total WIP) < 1
Remove all other stopping devices from model
If “Insuf” or “Corr” would be returned because of too little
data, half-width variables set to huge value — keep going
Could demand multiple smallness criteria, relative precision
(use TAVG, DAVG variables for point-estimate
denominator)
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 38 of 39
Designing and Executing Simulation
Experiments
•
Think of a simulation model as a convenient
“testbed” or laboratory for experimentation


•
Look at different output responses
Look at effects, interaction of different input factors
Apply classical experimental-design techniques






Factorial experiments — main effects, interactions
Fractional-factorial experiments
Factor-screening designs
Response-surface methods, “metamodels”
CRN is “blocking” in experimental-design terminology
Process Analyzer (PAN) provides a convenient way to carry
out a designed experiment
–
See Chapt. 6 for an example of using PAN for a factorial experiment
Simulation with Arena, 3rd ed.
Chapter 12 – Further Statistical Issues
Slide 39 of 39
Download