Introduction to Simulation
Andy Wang
CIS 5930-03
Computer Systems
Performance Analysis
Simulations
• Useful when the system is not available
• Good for exploring a large parameter space
• However, simulations often fail
• Need both statistical and programming skills
• Can take a long time
Common Mistakes
• Inappropriate level of detail
– More details → more development time → more bugs → more time to run
– More details require more knowledge of parameters, which may not be available
• E.g., requested disk sector
– Better to start with a less detailed model
• Refine as needed
Common Mistakes
• Improper language
– Simulation languages
• Less time for development and statistical analysis
– General-purpose languages
• More portable
• Potentially more efficient
Common Mistakes
• Invalid models
– Need to be confirmed by analytical models, measurements, or intuition
• Improperly handled initial conditions
– Should discard results from the initial (transient) period
– They are not representative of the system behavior
• Simulations that are too short
– Heavily dependent on initial conditions
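As a minimal sketch of discarding the initial period, the snippet below drops the first `warmup` observations before averaging. The function name `steady_state_mean`, the sample values, and the cutoff of 4 are illustrative assumptions, not part of the slides; picking a cutoff in practice needs its own analysis.

```python
from statistics import mean

def steady_state_mean(observations, warmup):
    """Average the observations after discarding the first `warmup` samples."""
    if warmup >= len(observations):
        raise ValueError("warm-up period covers the whole run; simulate longer")
    return mean(observations[warmup:])

# Hypothetical queue-length samples: the run starts from an empty queue,
# so the early values are biased low.
samples = [0, 1, 2, 4, 5, 5, 6, 5, 5, 6]
print(mean(samples))                  # includes the empty-queue start
print(steady_state_mean(samples, 4))  # uses only the later samples
```

A run that is too short leaves few samples after the cutoff, which is why the function refuses a warm-up that swallows the entire run.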
Common Mistakes
• Poor random number generators
– Safer to use well-known ones
– Even well-known ones have problems
• Improper selection of seeds
– Need to maintain independence among random number streams
– Bad idea to initialize all streams with the same seed (e.g., zeros)
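The seed problem can be illustrated with Python's standard `random` module. This is only a sketch: `make_streams` and the base seed are hypothetical, and giving each stream a distinct seed avoids identical streams but is not by itself a proof of statistical independence.

```python
import random

def make_streams(base_seed, count):
    """Create `count` generators, each with its own distinct seed."""
    return [random.Random(base_seed + i) for i in range(count)]

# One stream per random quantity in the model (names are illustrative).
arrivals, services = make_streams(base_seed=12345, count=2)
a = [arrivals.random() for _ in range(3)]
s = [services.random() for _ in range(3)]
print(a != s)

# The mistake named on the slide: two streams seeded identically
# produce exactly the same sequence, so they are perfectly correlated.
bad1, bad2 = random.Random(0), random.Random(0)
same = [bad1.random() for _ in range(3)] == [bad2.random() for _ in range(3)]
print(same)
```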
Other Causes of Simulation Analysis Failure
• Inadequate time estimate
– Time and effort are commonly underestimated
– Simulation generally takes the longest time compared to modeling and measurement
• Due to debugging and verification
• No achievable goal
– The goal needs to be quantifiable
Other Causes of Simulation Analysis Failure
• Incomplete mix of essential skills
– Project leadership
– Modeling and statistics
– Programming
– Knowledge of modeled system
• Inadequate level of user participation
– Need periodic meetings with end users
Other Causes of Simulation Analysis Failure
• Obsolete or nonexistent documentation
• Inability to manage the development of a large, complex computer program
– Need to keep track of objectives, requirements, data structures, and program estimates
• Mysterious results
– May need more detailed models
Terminology
• State variables: the variables whose values define the state of the system
– E.g., length of a job queue for a CPU scheduler
• Event: a change in the system state
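The two terms can be made concrete with a small sketch. The class `SchedulerState`, the function `apply_event`, and the event names "arrival" and "departure" are assumptions chosen for illustration, following the CPU-scheduler example above.

```python
from dataclasses import dataclass

@dataclass
class SchedulerState:
    queue_length: int = 0  # state variable: jobs waiting for the CPU

def apply_event(state, event):
    """An event changes the system state; here 'arrival' or 'departure'."""
    if event == "arrival":
        state.queue_length += 1
    elif event == "departure" and state.queue_length > 0:
        state.queue_length -= 1
    return state

state = SchedulerState()
for ev in ["arrival", "arrival", "departure"]:
    apply_event(state, ev)
print(state.queue_length)  # 1
```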
Static and Dynamic Models
• Static model: time is not a variable
– E.g., E = mc^2
• Dynamic model: system state changes with time
– E.g., CPU scheduling
Continuous and Discrete-time Model
Continuous-time model
• System state is defined at all times
Discrete-time model
• System state is defined only at instants in time
[Figure: continuous-time plot of time spent executing a job vs. time; discrete-time plot of number of students attending this class vs. time (Tuesdays and Thursdays)]
Continuous and Discrete-state Model
Continuous-state model
• Use continuous state variables
Discrete-state model
• Use discrete state variables
[Figure: continuous-state plot of time spent executing a job vs. time; discrete-state plot of number of jobs vs. time]
• Possible to have all four combinations of continuous/discrete time/state models
Deterministic and Probabilistic Model
Deterministic model
• Output of a model can be predicted with certainty
Probabilistic model
• May give a different result on each repetition for the same input parameters
[Figure: output vs. input for a deterministic model and for a probabilistic model]
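A short sketch of the distinction, under assumed toy models: `deterministic_model` and `probabilistic_model` are hypothetical functions that compute the total service time for a number of jobs, the first with a fixed service time and the second with exponentially distributed service times.

```python
import random

def deterministic_model(n_jobs, service_time=2.0):
    """Same input always gives the same output."""
    return n_jobs * service_time

def probabilistic_model(n_jobs, rng, mean_service_time=2.0):
    """Same input can give a different output on each repetition."""
    rate = 1.0 / mean_service_time
    return sum(rng.expovariate(rate) for _ in range(n_jobs))

rng = random.Random(1)
print(deterministic_model(5), deterministic_model(5))
first = probabilistic_model(5, rng)
second = probabilistic_model(5, rng)
print(first, second)
```

Because the probabilistic model varies between repetitions, its output is usually reported as an average over several runs rather than as a single number.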
Linear and Nonlinear Models
Linear model
• Output parameters are linearly correlated with input parameters
Nonlinear model
• Otherwise
Stable and Unstable Models
Stable model
• Settles down to a steady state
Unstable model
• Otherwise
Open and Closed Models
Open model
• Input is external to the model and is independent of the model
Closed model
• No external input
Computer System Models
• Generally
– Continuous time
– Discrete state
– Probabilistic
– Dynamic
– Nonlinear
Selecting a Language for Simulation
• Simulation language
• General-purpose language
• Extension of general-purpose language
• Simulation package
Simulation Languages
• Have built-in facilities
– Time advancing
– Event scheduling
– Entity manipulation
– Random-variate generation
– Statistical data collection
– Report generation
• Examples: SIMULA, Maisie, ParSEC
General-purpose Languages
• C++, Java
• No need to learn a new language
• Simulation languages may not be available
• More portable
• Can be optimized
Extensions of General-Purpose Languages
• Provide routines commonly required in simulation
• Examples: CSim, NS-2 (OTcl + C++)
Simulation Packages
• Provide a library of data structures, routines, algorithms
• Significant time savings
– A simulation can sometimes be built in one day
• However, not flexible for unforeseen scenarios
Types of Simulations
• Emulation
– Hybrid simulation
• Monte Carlo simulation
• Trace-driven simulation
• Discrete-event simulation
Emulation and Hybrid Simulation
• Emulation
– A simulation using hardware/firmware
• Hybrid simulation
– A simulation that combines simulation and hardware
– E.g., a 5-disk RAID
• One simulated disk
• Four real disks
Monte Carlo Simulation
• A type of static simulation
• Models probabilistic phenomena
• Can be used to evaluate nonprobabilistic expressions
– E.g., use the average of estimates to evaluate difficult integrals
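The integral example can be sketched as follows. The integrand exp(-x^2) on [0, 1], the sample count, and the function name `mc_integral` are assumptions chosen for illustration; the estimate is (b - a) times the average of the integrand at uniformly random points.

```python
import math
import random

def mc_integral(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] from n uniform random samples."""
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

rng = random.Random(2024)
estimate = mc_integral(lambda x: math.exp(-x * x), 0.0, 1.0, 100_000, rng)

# Reference value from the error function, used only to check the estimate.
reference = math.sqrt(math.pi) / 2 * math.erf(1.0)
print(estimate, reference)
```

Time never appears in the computation, which is what makes this a static simulation.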
Trace-Driven Simulation
• Trace: a time-ordered record of events on a real system
– Needs to be as independent of the underlying system as possible
• Storage-level trace may be specific to the cache replacement mechanisms above, the working set, the memory size, etc.
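A minimal sketch of a trace-driven simulation, assuming a trace that is simply a time-ordered list of block IDs and a simulated LRU cache. The function `replay_trace`, the trace contents, and the cache sizes are hypothetical; the capacity is assumed to be at least 1.

```python
from collections import OrderedDict

def replay_trace(trace, capacity):
    """Replay a time-ordered trace of block IDs through an LRU cache.

    Returns the number of cache hits. Assumes capacity >= 1.
    """
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits

trace = [1, 2, 3, 1, 4, 2]  # hypothetical block-access trace
for capacity in (2, 3):
    print(capacity, replay_trace(trace, capacity))
```

Both cache sizes are driven by the same deterministic input, so any difference in hit counts comes from the alternatives being compared and not from workload randomness.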
Advantages of Trace-Driven Simulation
• Credibility
• Easy validation
– Just compare measured vs. simulated numbers
• Accurate workload
– Preserves the correlation and interference effects
Advantages of Trace-Driven Simulation
• Less randomness
– Deterministic input
– Less variance
– Fewer runs needed to get good confidence
• Fairer comparison (deterministic input)
– For different alternatives
• Similarity to the actual implementation
Disadvantages of Trace-Driven Simulation
• Complexity
– More detailed simulation to take realistic trace inputs
• Representativeness
– Trace from one system may not be representative of the workload on another system
– Can become obsolete quickly
Disadvantages of Trace-Driven Simulation
• Finiteness
– A trace of a few minutes may not capture enough activity
• Single point of validation
– Algorithms optimized for one trace may not work for other traces
• Trade-off
– Difficult to change workload characteristics
Discrete-Event Simulation
• Uses discrete-state model
– May use continuous or discrete time values
Common Components
• Event scheduler
– E.g., schedule event X at time T
• Simulation clock
• A time-advancing mechanism
– Unit time: Advances the clock in small fixed increments
– Event-driven: Advances the clock automatically to the time of the next earliest event
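The two time-advancing mechanisms can be compared with a small sketch. The functions `run_event_driven` and `run_unit_time`, the event list, and the step size are illustrative assumptions; events are `(time, name)` pairs.

```python
import heapq

def run_event_driven(events):
    """Jump the clock straight to each next-earliest event; return the clock values visited."""
    pending = list(events)
    heapq.heapify(pending)
    visited = []
    while pending:
        clock, _name = heapq.heappop(pending)
        visited.append(clock)
    return visited

def run_unit_time(events, step, end_time):
    """Advance the clock in fixed steps; return (number of steps, names of events fired)."""
    pending = sorted(events)
    fired, ticks = [], 0
    while ticks * step < end_time:
        ticks += 1
        clock = ticks * step
        # Fire every event whose time has been reached by this step.
        while pending and pending[0][0] <= clock:
            fired.append(pending.pop(0)[1])
    return ticks, fired

events = [(0.5, "arrival"), (7.25, "departure"), (9.0, "arrival")]
print(run_event_driven(events))           # one clock update per event
print(run_unit_time(events, 0.25, 10.0))  # one clock update per step
```

With these inputs the event-driven loop updates the clock three times, while the unit-time loop needs 40 steps to cover the same interval, most of which find nothing to do.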
Common Components
• System state variables
• Event routines (handlers)
• Input routines
– E.g., number of repetitions
• Report generator
• Initialization routines
– Run at the beginning of a simulation, iteration, or repetition
Common Components
• Trace routines (for debugging)
– Should have an on/off feature
– Snapshot/continue from a snapshot
• Dynamic management
• Main program
Event-Set Algorithms
• How to track events
– Ordered linked list (< 20 events)
– Indexed linked list (20 – 120 events)
• Calendar queue
– Tree structure (> 120 events)
• E.g., heap
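Two of the structures above can be sketched in a few lines. Both classes are hypothetical: `SortedListEventSet` uses a sorted Python list as a stand-in for an ordered linked list, and `HeapEventSet` uses the standard `heapq` module as the tree structure. The indexed linked list (calendar queue) is not shown.

```python
import bisect
import heapq
import itertools

class SortedListEventSet:
    """Ordered-list event set: linear-time insert, earliest event at the front."""
    def __init__(self):
        self._events = []

    def schedule(self, time, name):
        bisect.insort(self._events, (time, name))

    def next_event(self):
        return self._events.pop(0)

class HeapEventSet:
    """Heap event set: logarithmic-time insert and removal."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: insertion order at equal times

    def schedule(self, time, name):
        heapq.heappush(self._heap, (time, next(self._counter), name))

    def next_event(self):
        time, _, name = heapq.heappop(self._heap)
        return (time, name)

for event_set in (SortedListEventSet(), HeapEventSet()):
    for t, name in [(5.0, "timeout"), (1.0, "arrival"), (3.0, "departure")]:
        event_set.schedule(t, name)
    print([event_set.next_event() for _ in range(3)])
```

Both sets return events in time order; they differ in how the cost of scheduling grows with the number of pending events, which is the basis for the size thresholds listed above.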