Stochastic Simulation of Field Operations in Surveys

Overview of Simulation Models and
a Simulation Model for NHIS Field
Operations and Cost Estimates
Bor-Chung Chen
Office of Railroad Safety
Federal Railroad Administration/USDOT
April 7, 2011
Simulation Modeling
An Operations Research Method
Optimization
Save Resources and/or
Improve Data Quality
2
Operations Research (OR) seeks the determination of the best (optimum) course of action of a decision problem under the restriction of limited resources.
3
An optimization model is a decision-making
tool that recommends an answer (the goal to
be optimized) based on analyses of
information (constraints and decision
variables). It consists of three components:
• The goal to be optimized,
• Constraints, and
• Decision variables
4
Operations Research Models
• Deterministic Models
– Linear Programming Models
– Integer Programming Models
– Network Flow Programming Models
– Nonlinear Programming Models
• Stochastic Models
– Inventory Models
– Queueing Models
– Queueing Networks and Decision Models
– Simulation Models
5
Types of OR Models
• Analytical Models:
– The objective and constraints of the models can
be expressed quantitatively or mathematically
as functions of the decision variables.
• Simulation Models:
– The relationships between the inputs and outputs of
the models are not explicitly stated; the models
break down the modeled system into basic or
elemental modules that are then linked to one
another by well-defined logical relationships.
6
An M/M/c Queueing System
[Figure: an M/M/c queueing system. Arriving customers enter the system, wait in a queue (waiting line), are served at a service facility with servers 1, 2, ..., c, and leave as departing customers.]
7
Performance Measures of Queueing Systems


Arrival rate: $\lambda$
Departure rate: $\mu$
Server utilization: $\rho = \dfrac{\lambda}{c\mu} < 1$
Expected number of customers in queue:
$L_q = \dfrac{(c\rho)^{c+1}}{(c-1)!\,(c - c\rho)^2} \left\{ \sum_{n=0}^{c-1} \dfrac{(c\rho)^n}{n!} + \dfrac{(c\rho)^c}{c!\,(1-\rho)} \right\}^{-1}$
Expected number of customers in system: $L_s = L_q + c\rho$
Expected waiting time in queue: $W_q = \dfrac{L_q}{\lambda}$
Expected waiting time in system: $W_s = W_q + \dfrac{1}{\mu}$
8
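The M/M/c performance measures can be evaluated numerically. The following is a minimal Python sketch, assuming the standard M/M/c formulas with arrival rate lam, per-server service rate mu, and c servers; the function name and argument names are illustrative and not part of the original slides.

```python
from math import factorial


def mmc_measures(lam, mu, c):
    """Return (Lq, Ls, Wq, Ws) for an M/M/c queue.

    lam: arrival rate, mu: service rate per server, c: number of servers.
    """
    a = lam / mu      # offered load, equal to c * rho
    rho = a / c       # server utilization
    if rho >= 1:
        raise ValueError("unstable queue: need lam < c * mu")
    # Probability that the system is empty (the inverted braces term).
    p0 = 1.0 / (
        sum(a**n / factorial(n) for n in range(c))
        + a**c / (factorial(c) * (1 - rho))
    )
    lq = a ** (c + 1) / (factorial(c - 1) * (c - a) ** 2) * p0
    ls = lq + a           # Ls = Lq + c * rho
    wq = lq / lam         # Wq = Lq / lambda
    ws = wq + 1.0 / mu    # Ws = Wq + 1 / mu
    return lq, ls, wq, ws


if __name__ == "__main__":
    # Two servers, one arrival and one service completion per unit time.
    print(mmc_measures(1.0, 1.0, 2))
```

For c = 1 the expressions reduce to the familiar single-server results, which is a convenient check on any implementation.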
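Total Cost of a Queueing System (Taha [2011])
[Figure: cost per unit time plotted against the number of servers (tellers), with three curves: cost of operation, cost of waiting, and total cost. The minimum of the total cost curve marks the optimum number of servers (tellers).]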
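The trade-off between the cost of operation and the cost of waiting can be explored numerically. The sketch below assumes a simple linear cost model (a fixed cost per server per unit time plus a cost per customer in the system per unit time) on top of the standard M/M/c formulas. The cost figures, the search bound c_max, and the function names are hypothetical choices for illustration only.

```python
from math import factorial


def expected_in_system(lam, mu, c):
    """Ls for an M/M/c queue (requires lam < c * mu)."""
    a = lam / mu
    rho = a / c
    p0 = 1.0 / (
        sum(a**n / factorial(n) for n in range(c))
        + a**c / (factorial(c) * (1 - rho))
    )
    lq = a ** (c + 1) / (factorial(c - 1) * (c - a) ** 2) * p0
    return lq + a


def optimum_servers(lam, mu, cost_per_server, cost_per_customer, c_max=50):
    """Search c = c_min..c_max for the lowest total cost per unit time.

    Assumed cost model (not from the slides):
    total(c) = cost_per_server * c + cost_per_customer * Ls(c).
    """
    c_min = int(lam / mu) + 1   # smallest c that keeps the queue stable
    best = None
    for c in range(c_min, c_max + 1):
        total = cost_per_server * c + cost_per_customer * expected_in_system(lam, mu, c)
        if best is None or total < best[1]:
            best = (c, total)
    return best


if __name__ == "__main__":
    # Hypothetical rates and costs.
    print(optimum_servers(lam=10.0, mu=4.0, cost_per_server=12.0, cost_per_customer=50.0))
```

An exhaustive search is adequate here because the number of candidate server counts is small.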
9
Queueing Systems vs. Field Operations
• Queueing Systems
– The customers come to the servers
– The system is small and simple
– No traveling time involved
• Field Operations (Personal Visits)
– The servers (interviewers) go to the customers (respondents)
– The system is very large and complicated
– Server traveling time
10
Inbound vs. Outbound Telephone Call
Centers
• Inbound
– 800 Customer Services
– Help Desks
• Outbound
– Telemarketing
– Telephone Surveys
– Charities
– Politicians
– Some Companies
11
Outbound Telephone Dialing System
as a Closed Queueing Network (Samuelson[1999])
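[Figure: closed queueing network of an outbound dialing system. Recoverable labels: W (Waiting To Dial), D, NA (Party Does Not Answer), A (Party Answers), Queue or Waiting Line, Service Facility with servers 1, 2, ..., c, N (Lines with Parties Who Hang Up or Get Turned Away), S, R.]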
12
Outbound Telephone Dialing System
Decision Variables
• Amount of time to anticipate service completions
– Dialing too early obtains the new party before a representative is free, resulting in an abandoned call and the need to start dialing again
– Waiting too long results in unnecessary idle time for the representatives
• Number of calls to attempt at once
– If two or more parties answer, we will have one or more abandoned calls
– If none answers, we will have idle representative time
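The trade-off in the number of calls to attempt at once can be illustrated with a small calculation. The sketch below assumes that each dialed call is answered independently with the same probability and that exactly one representative is about to become free; the answer probability 0.3 and the function name are hypothetical.

```python
from math import comb


def dialing_outcomes(n_calls, p_answer):
    """Return (P(none answers), P(exactly one answers), P(two or more answer)).

    Assumes independent calls with a common answer probability.
    """
    p_none = (1 - p_answer) ** n_calls
    p_one = comb(n_calls, 1) * p_answer * (1 - p_answer) ** (n_calls - 1)
    return p_none, p_one, 1 - p_none - p_one


if __name__ == "__main__":
    # Idle-time risk (none answers) falls as more calls are attempted,
    # while abandoned-call risk (two or more answer) rises.
    for n in range(1, 6):
        p_none, p_one, p_many = dialing_outcomes(n, 0.3)
        print(n, round(p_none, 3), round(p_one, 3), round(p_many, 3))
```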
13
Objectives
• Develop a valid method of predicting cost,
response rates, and timing of new or
continuing surveys for the field operations.
• The simulation modeling will be followed
by the optimization of the field operations if
a simulation model is feasible and valid.
14
Definition of Discrete-Event Simulation
• Event Driven: Each occurrence of an
event changes the state of the system
• Using a model (implemented as a
computer program), rather than
experimenting with a real system
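To make the event-driven idea concrete, here is a minimal Python sketch of a discrete-event simulation of a single-server queue with exponential interarrival and service times. It is not the field operations model; the event list layout, function name, and parameter values are assumptions chosen for illustration.

```python
import heapq
import random


def simulate_mm1(lam, mu, horizon, seed=1):
    """Simulate a single-server queue up to time `horizon`.

    Returns (time-average number in system, number of customers served).
    """
    rng = random.Random(seed)
    events = [(rng.expovariate(lam), "arrival")]   # future event list
    clock = 0.0
    in_system = 0      # system state
    area = 0.0         # time integral of the number in system
    served = 0
    while events:
        time, kind = heapq.heappop(events)         # next event in time order
        if time > horizon:
            break
        area += in_system * (time - clock)
        clock = time
        if kind == "arrival":
            in_system += 1
            heapq.heappush(events, (clock + rng.expovariate(lam), "arrival"))
            if in_system == 1:                     # server was idle
                heapq.heappush(events, (clock + rng.expovariate(mu), "departure"))
        else:
            in_system -= 1
            served += 1
            if in_system > 0:                      # start the next service
                heapq.heappush(events, (clock + rng.expovariate(mu), "departure"))
    area += in_system * (horizon - clock)
    return area / horizon, served


if __name__ == "__main__":
    print(simulate_mm1(lam=1.0, mu=2.0, horizon=20000.0))
```

Each popped event changes the state (the number in the system) and may schedule further events, which is the defining feature of the event-driven approach.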
15
Steps of Simulation Study (Banks 1998)
• Model Conceptualization
• Data Collection
• Input Data Analysis
• Model Translation
• Verification and Validation
• Experimental Design
• Production Runs and Output Analysis
16
Model Conceptualization
• Problem Formulation
• Objectives and Project Plan
• The modeling begins simply and the model
grows until a model of appropriate
complexity has been developed with the
objectives in mind.
17
Data Collection
• A data set for each variable from a survey is
collected.
• Whenever possible, collect between 100
and 200 observations.
• Collect a number of samples from different
time periods of field operations, such as different
times of day and/or days of the week
18
Input Data Analysis and Modeling
• Assessing Independence
• Probability Plots
• Estimation of Parameters
• Goodness of Fit Tests
• Empirical Distributions
• Simulation Support Software
– ExpertFit (A. M. Law and Associates)
– Stat::Fit (Geer Mountain Software Corporation)
19
Model Translation
• The conceptual model constructed is coded
into a computer-recognizable form, an
operational model.
• General-Purpose Software
• Manufacturing-Oriented Software
• Business Process Reengineering
• Simulation-Based Scheduling
• Field Operations? C++, FORTRAN?
20
Random Number and Random
Variate Generation
• Random (pseudorandom) numbers between 0 and 1 from the uniform distribution, U(0,1) or RN(0,1)
• Use Inverse Transform Method to obtain a random variable, X:
$\Pr(X \le x) = F(x) = u, \qquad x = F^{-1}(u)$
$F(x) = \begin{cases} 1 - \exp(-x/\beta), & x \ge 0; \\ 0, & \text{otherwise} \end{cases}$
$x = F^{-1}(u) = -\beta \ln(1 - u)$
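The inverse transform step can be sketched in a few lines of Python. The example assumes the exponential distribution with mean beta, F(x) = 1 - exp(-x / beta), so that x = -beta * ln(1 - u); the function name, the seed, and the value beta = 2.0 are illustrative assumptions.

```python
import math
import random


def exponential_variate(beta, u):
    """Invert F(x) = 1 - exp(-x / beta) at a uniform number u in [0, 1)."""
    return -beta * math.log(1.0 - u)


# Draw U(0,1) numbers and transform them into exponential variates.
rng = random.Random(12345)
sample = [exponential_variate(2.0, rng.random()) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)   # should be close to beta = 2.0
print(sample_mean)
```

Because random() returns values in [0, 1), the argument of the logarithm is never zero.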
21
Verification and Validation
• Verification determines whether the operational
model is performing properly.
• Validation is the determination that the
conceptual model is an accurate
representation of the field operations (or the
real system).
22
Verification and Validation Process
• It is an iterative process:
1. Add new details to the model
2. Run the model
3. Evaluate the results
4. If the results are not sufficiently accurate, continue with step 5
5. Identify other details (operations/input data)
6. Go to step 1 and the cycle starts anew
7. At some point, the model is determined to be “close enough”
23
Experimental Design
• For each scenario that is to be simulated,
decisions need to be made concerning the
length of the simulation run, the number of
runs (also called replications), the manner
of initialization, and controllable decision
variables as required.
24
Production Runs and Output Analysis
• Production runs and their subsequent output
analysis are used to estimate the
performance measures (cost, timing, and
response rates) for the scenarios that are
being simulated.
• Finite-Horizon Simulations
• Steady-State Simulations
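Output analysis over independent replications can be sketched as follows. This is a minimal Python example assuming a normal-approximation confidence interval; toy_run is a hypothetical stand-in for one run of a simulation model, and its response probability of 0.86, the base seed, and the function names are illustrative only.

```python
import math
import random
import statistics


def replicate(run_once, n_reps, base_seed=1000):
    """Run the model n_reps times, each with its own seed."""
    return [run_once(random.Random(base_seed + i)) for i in range(n_reps)]


def mean_and_half_width(values, z=1.96):
    """Point estimate and approximate 95% confidence half-width."""
    mean = statistics.fmean(values)
    half = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half


def toy_run(rng):
    # Hypothetical stand-in for one simulation run: the fraction of
    # 1050 cases that respond, each independently with probability 0.86.
    return sum(rng.random() < 0.86 for _ in range(1050)) / 1050


if __name__ == "__main__":
    rates = replicate(toy_run, 200)
    print(mean_and_half_width(rates))
```

Using a separate seed per replication keeps the runs independent, so the usual standard error formula applies to the replication means.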
25
Simulation Model of Simplified NHIS
Field Operations (Prototype)
• Ten FRs, 1050 cases, 105 cases per FR
• Each FR covers a PSU of 60 x 60 miles (3600 square miles)
• FRs are given 17 days starting from a Monday
• All FRs start to work at 3:00 PM each day
• 2004 NHIS CHI data set for input modeling
• Visiting order: Traveling Salesman Problem
• The model: about 1900 lines of C++ code
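The slides do not say how the Traveling Salesman Problem is solved in the prototype. As one possible illustration, the Python sketch below uses the nearest-neighbor heuristic, a simple approximation rather than an exact method. It assumes straight-line distances, cases scattered uniformly over a 60 x 60 mile square, and an FR who starts and ends at the center; all of these are hypothetical choices.

```python
import math
import random


def nearest_neighbor_tour(start, points):
    """Visit every point by always moving to the closest unvisited one.

    Returns (visiting order, total distance including the return to start).
    """
    remaining = list(points)
    tour = []
    here = start
    total = 0.0
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(here, p))
        total += math.dist(here, nxt)
        remaining.remove(nxt)
        tour.append(nxt)
        here = nxt
    total += math.dist(here, start)   # drive back to the starting point
    return tour, total


# 105 hypothetical case locations in a 60 x 60 mile PSU.
rng = random.Random(7)
cases = [(rng.uniform(0, 60), rng.uniform(0, 60)) for _ in range(105)]
tour, miles = nearest_neighbor_tour((30.0, 30.0), cases)
print(len(tour), round(miles, 1))
```

In practice an FR would not attempt all assigned cases in a single trip, so a routine like this would be applied to each day's subset of cases.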
26
Field Operation Inputs
• Frequency distribution of 28 outcomes
• Interview length distributions by outcomes
• Contact/No-Contact Bernoulli distribution
• Contact time distributions
• Uniform distributions for vehicle speed
27
Software Development for Field
Operations Simulation Modeling
[Flow diagram: Field Operations Inputs -> Input Modeling -> Field Operations Simulation Model -> Output Analysis -> Costs, Response Rates, Timing]
28
Performance Measures
• Low Cost: Direct Labor Cost (Hours and Mileage)
– Average number of personal visits per case
• High Response Rate
• Short Timing: How long it takes each month (17 days)
• Together these goals are called LHS (Low cost, High response rate, Short timing)
29
Preliminary Results
• 1000 independent replications with different
seeds
• Cost: $25,475
– Based on $10/hr and $0.35/mile
– Average number of PV = 1.74
• Response Rate: 86.04%
• Timing: 17 days
30
Response Rates of 2004 NHIS Q2
Region         RR(%)     Region         RR(%)
Boston         86.09     Charlotte      90.62
New York       76.40     Atlanta        91.72
Philadelphia   82.86     Dallas         87.23
Detroit        93.46     Denver         92.04
Chicago        91.21     Los Angeles    87.32
Kansas City    93.48
Seattle        86.99     National       88.63
31
Design of Experiments:
Controllable Parameters
• Starting time: 10:00 AM, 12:00 noon, and 3:00 PM
• Number of FRs: 10 and 15
– Timing: 17 days vs. 11 days
– Area: 3600 vs. 2401 square miles
– Cases per FR: 105 vs. 70
– FR-Days: 170 vs. 165
32
Selected Frequency Distributions of
Contact (C)/No-Contact (NC)
Hours (%)          Sun    Mon    Tue    Wed    Thur   Fri    Sat    All
10:00-12:00  C     49.02  51.54  49.75  50.67  53.62  51.09  55.17  51.88
             NC    50.98  48.46  50.25  49.33  46.38  48.91  44.83  48.12
12:00-15:00  C     52.97  50.63  51.10  51.22  50.31  51.96  54.25  51.64
             NC    47.03  49.37  48.90  48.78  49.69  48.04  45.75  48.36
15:00-20:00  C     51.95  55.50  56.05  56.94  56.26  53.86  51.77  55.32
             NC    48.05  44.50  43.95  43.06  43.74  46.14  48.23  44.68
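Contact frequencies like these can drive a Bernoulli contact/no-contact draw for each simulated visit. The Python sketch below uses only the "All" column and reads the row labels as the time windows 10:00 to 12:00, 12:00 to 15:00, and 15:00 to 20:00, which is an interpretation of the table rather than something the slides state. The function and variable names are hypothetical.

```python
import random

# Contact (C) percentages from the "All" column, as probabilities,
# keyed by an assumed (start hour, end hour) window.
CONTACT_PROB = {
    (10, 12): 0.5188,
    (12, 15): 0.5164,
    (15, 20): 0.5532,
}


def contact_probability(hour):
    """Look up the contact probability for a visit starting at `hour`."""
    for (start, end), p in CONTACT_PROB.items():
        if start <= hour < end:
            return p
    raise ValueError("hour outside the tabulated 10:00-20:00 window")


def visit_makes_contact(hour, rng):
    """Bernoulli draw: True if the household is contacted."""
    return rng.random() < contact_probability(hour)


if __name__ == "__main__":
    rng = random.Random(2004)
    visits = [visit_makes_contact(16, rng) for _ in range(10)]
    print(visits)
```

A fuller version would key the lookup by day of week as well, using the other columns of the table.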
33
The Six Parameter Settings
for the Experiments
Setting  Start Time  FRs  Days  Area (sq. miles)  FR-Days  Adj. Days
1        10:00       10   17    3600              170      17.00
2        12:00       10   17    3600              170      17.00
3        15:00       10   17    3600              170      17.00
4        10:00       15   11    2401              165      11.33
5        12:00       15   11    2401              165      11.33
6        15:00       15   11    2401              165      11.33
34
The Estimates of the PMs of the Six
Parameter Settings
                                   Adjusted to 170 FR-Days
Setting  Cost($)  RR(%)  AVs       Cost($)  RR(%)  AVs   Saved(%)
1        25,375   86.19  1.72      25,375   86.19  1.72
2        25,238   86.86  1.71      25,238   86.86  1.71
3        25,475   86.04  1.74      25,475   86.04  1.74
4        20,722   82.23  1.68      21,349   84.72  1.73  15.86
5        20,575   83.50  1.66      21,199   86.03  1.71  16.00
6        20,589   83.88  1.67      21,213   86.42  1.72  16.73
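The adjusted figures for settings 4 to 6 appear consistent with scaling each estimate by 170/165, and the Saved(%) column with comparing the adjusted cost against the 10-FR setting that has the same starting time. The slides do not state the formula, so the Python sketch below is an assumption checked only against the tabulated numbers; the function names are hypothetical.

```python
def adjust_to_fr_days(value, fr_days, target_fr_days=170):
    """Scale an estimate proportionally to a common number of FR-days."""
    return value * target_fr_days / fr_days


def percent_saved(baseline_cost, new_cost):
    """Cost reduction relative to the baseline, in percent."""
    return 100.0 * (baseline_cost - new_cost) / baseline_cost


# Setting 6 (165 FR-days) compared with setting 3 (170 FR-days).
adjusted_cost_6 = adjust_to_fr_days(20589, 165)
saved_6 = percent_saved(25475, adjusted_cost_6)
print(round(adjusted_cost_6), round(saved_6, 2))   # 21213 16.73
```

Small differences of a dollar or so for settings 4 and 5 are to be expected, since the tabulated costs are already rounded.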
35
Federal Statistics in the FY 2010 Budget
• Source:
http://www.copafs.org/reports/federal_statistics_in_the_fy_2010_budget.aspx
(Total direct funding in millions)   FY2008 Actual   FY2009 Estimate   FY2010 Request
Census Bureau Current Programs       $   232.8       $   263.6         $   289.0
Census Bureau Periodic Programs        1,234.0         3,906.3           7,115.7
Others                                 1,217.7         1,330.5           1,431.0
Total                                  2,684.5         5,500.4           8,835.7
36
Cost Estimates of the Replication with
Seed 169001
Setting  Total Time (hours)  Wages ($)  Total Distance (miles)  Mileage ($)  Total Cost ($)
3        1,349.37            13,494     35,012                  12,254       25,748
6        1,107.52            11,075     27,101                  9,486        20,561
6(adj)   1,141.08            11,411     27,922                  9,773        21,184
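The cost columns follow from the rates quoted with the preliminary results, $10 per hour and $0.35 per mile. A minimal Python sketch of the arithmetic, with hypothetical function and constant names:

```python
WAGE_PER_HOUR = 10.00    # dollars per hour, as quoted in the slides
RATE_PER_MILE = 0.35     # dollars per mile, as quoted in the slides


def direct_cost(hours, miles):
    """Return (wages, mileage cost, total direct labor cost) in dollars."""
    wages = hours * WAGE_PER_HOUR
    mileage = miles * RATE_PER_MILE
    return wages, mileage, wages + mileage


# Setting 3 of the replication with seed 169001.
wages, mileage, total = direct_cost(1349.37, 35012)
print(round(wages), round(mileage), round(total))   # 13494 12254 25748
```

The tabulated distances are rounded to whole miles, so recomputing other rows from the table can differ from the printed mileage cost by a dollar.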
37
The Other Three Parameter Settings
for the Experiments
Setting  Start Time  FRs  Days  Area (sq. miles)  FR-Days  Adj. Days
1        10:00       10   17    3600              170      17.00
2        12:00       10   17    3600              170      17.00
3        15:00       10   17    3600              170      17.00
7        10:00       15   17    2401              255      11.33
8        12:00       15   17    2401              255      11.33
9        15:00       15   17    2401              255      11.33
38
The Estimates of the PMs of the Other
Three Parameter Settings
Setting  Cost($)  RR(%)  AVs   Cost Saved(%)  RR Gain(%)
1        25,375   86.19  1.72
2        25,238   86.86  1.71
3        25,475   86.04  1.74
7        24,545   89.93  1.78  3.27           3.74
8        24,085   89.96  1.75  4.57           3.10
9        23,926   89.98  1.75  6.08           3.94
39
Optimum number of FRs
40
Conclusions
• Simulation models can be used for optimizing
field operations
• Smaller PSU area is more cost effective
– Less time on the roads and more time knocking on the
doors
– Not at the expense of the response rate
– Field operations can be completed sooner
41
Microsimulation of NHIS
• Physical Impediments and At-Home Patterns
of Households
• Interviewer Strategies
• Multiple Visits of Completed Interviews
• Unrelated Persons Living in the Same House
• Classification of Interviewers
• Multiple Surveys
• Sample Designs
42
What Next?
• Most Recent NHIS CHI Data
• Classification of PSUs:
– Population Densities
– Car Densities
– Traffic Statistics
• Development of A Simulation Language for
Field Operations?
43
Simulation and Modeling Textbooks
• Law and Kelton: Simulation Modeling and Analysis, 3rd edition, 2000, McGraw-Hill
• Jerry Banks, Editor: Handbook of Simulation, 1998, Wiley & Sons
• Hamdy A. Taha: Operations Research: An Introduction, 9th edition, 2011, Prentice Hall
• Hillier and Lieberman: Introduction to Operations Research, 8th edition, 2005, McGraw-Hill
44
Operations Research Models
• Deterministic Models
– Linear Programming Models
– Integer Programming Models
– Network Flow Programming Models
– Nonlinear Programming Models
• Stochastic Models
– Inventory Models
– Queueing Models
– Queueing Networks and Decision Models
– Simulation Models
– Field Operating Models?
45