Overview of Simulation Models and a Simulation Model for NHIS Field Operations and Cost Estimates
Bor-Chung Chen
Office of Railroad Safety, Federal Railroad Administration/USDOT
April 7, 2011

Simulation Modeling
Simulation Modeling (An Operations Research Method) → Optimization → Save Resources and/or Improve Data Quality

Operations Research
Operations Research (OR) seeks to determine the best (optimum) course of action for a decision problem under the restriction of limited resources.

Optimization Models
An optimization model is a decision-making tool that recommends an answer based on an analysis of the available information. It consists of three components:
• The goal to be optimized
• Constraints
• Decision variables

Operations Research Models
• Deterministic Models
  – Linear Programming Models
  – Integer Programming Models
  – Network Flow Programming Models
  – Nonlinear Programming Models
• Stochastic Models
  – Inventory Models
  – Queueing Models
  – Queueing Networks and Decision Models
  – Simulation Models

Types of OR Models
• Analytical Models:
  – The objective and constraints of the model can be expressed quantitatively or mathematically as functions of the decision variables.
• Simulation Models:
  – The relationship between the input and output of the model is not explicitly stated; the model breaks the modeled system down into basic or elemental modules that are then linked to one another by well-defined logical relationships.

An M/M/c Queueing System
[Figure: arriving customers join a single queue (waiting line) and are served by a service facility of c parallel servers (1, 2, ..., c) before departing; the queue and the service facility together form the system.]

Performance Measures of Queueing Systems
• λ: arrival rate
• μ: service (departure) rate of each server
• ρ = λ/(cμ): server utilization
• P0: probability that the system is empty
• Lq: expected number of customers in queue
• Ls: expected number of customers in system
• Wq: expected waiting time in queue
• Ws: expected waiting time in system

$$P_0 = \left\{ \sum_{n=0}^{c-1} \frac{(\lambda/\mu)^n}{n!} + \frac{(\lambda/\mu)^c}{c!\,(1-\rho)} \right\}^{-1}, \qquad L_q = \frac{\lambda\mu\,(\lambda/\mu)^c}{(c-1)!\,(c\mu-\lambda)^2}\,P_0$$

$$L_s = L_q + \frac{\lambda}{\mu}, \qquad W_q = \frac{L_q}{\lambda}, \qquad W_s = W_q + \frac{1}{\mu}$$

Total Cost of a Queueing System (Taha [2011])
[Figure: cost per unit time vs. number of servers (tellers), showing the cost of operation, the cost of waiting, their sum (the total cost), and the optimum number of servers (tellers).]

Queueing Systems vs. Field Operations
• Queueing Systems
  – The customers come to the servers
  – The system is small and simple
  – No traveling time is involved
• Field Operations (Personal Visits)
  – The servers (interviewers) go to the customers (respondents)
  – The system is very large and complicated
  – Servers spend time traveling

Inbound vs. Outbound Telephone Call Centers
• Inbound
  – 800-Number Customer Services
  – Help Desks
• Outbound
  – Telemarketing
  – Telephone Surveys
  – Charities
  – Politicians
  – Some Companies

Outbound Telephone Dialing System as a Closed Queueing Network (Samuelson [1999])
[Figure: closed queueing network of N lines, with nodes labeled W (waiting to dial), D, NA (party does not answer), A (party answers), S, and R; answered parties join a queue (waiting line) for a service facility of c representatives, and some parties hang up or get turned away.]

Outbound Telephone Dialing System Decision Variables
• Amount of time to anticipate service completions
  – Dialing the next party too early can result in an abandoned call and the need to start dialing again
  – Waiting too long results in unnecessary idle time for the representatives
• Number of calls to attempt at once
  – If two or more parties answer, one or more calls will be abandoned
  – If no one answers, representative time is idle

Objectives
• Develop a valid method of predicting the cost, response rates, and timing of new or continuing surveys for field operations.
• If a simulation model proves feasible and valid, the simulation modeling will be followed by optimization of the field operations.
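The M/M/c measures and the total-cost trade-off attributed to Taha can be checked numerically. The following is a minimal Python sketch. It assumes a linear cost model in which total cost per unit time = c × (cost per server) + (cost per customer in system) × Ls. The function names, the cost parameters, and the use of Ls rather than Lq in the waiting cost are illustrative assumptions, not taken from the slides.

```python
import math


def mmc_measures(lam: float, mu: float, c: int) -> dict:
    """Steady-state performance measures of an M/M/c queue.

    lam: arrival rate, mu: service (departure) rate per server, c: servers.
    The formulas hold only when the utilization rho = lam / (c * mu) < 1.
    """
    rho = lam / (c * mu)
    if rho >= 1:
        raise ValueError("unstable queue: need lam < c * mu")
    r = lam / mu  # offered load lambda / mu
    # P0: probability that the system is empty
    p0 = 1.0 / (
        sum(r**n / math.factorial(n) for n in range(c))
        + r**c / (math.factorial(c) * (1 - rho))
    )
    lq = lam * mu * r**c / (math.factorial(c - 1) * (c * mu - lam) ** 2) * p0
    ls = lq + r        # expected number in system
    wq = lq / lam      # expected wait in queue (Little's law)
    ws = wq + 1 / mu   # expected time in system
    return {"rho": rho, "P0": p0, "Lq": lq, "Ls": ls, "Wq": wq, "Ws": ws}


def optimum_servers(lam: float, mu: float, cost_per_server: float,
                    cost_per_customer: float, c_max: int = 50):
    """Hypothetical cost model: total(c) = c * cost_per_server + cost_per_customer * Ls(c).

    Returns (c, total cost) for the stable c <= c_max with the lowest total cost.
    """
    best = None
    for c in range(1, c_max + 1):
        if lam >= c * mu:
            continue  # skip unstable configurations
        total = c * cost_per_server + cost_per_customer * mmc_measures(lam, mu, c)["Ls"]
        if best is None or total < best[1]:
            best = (c, total)
    return best
```

As a sanity check, `mmc_measures(1.0, 2.0, 1)` reduces to the M/M/1 case with ρ = 0.5. It gives Lq = ρ²/(1 − ρ) = 0.5 and Ls = ρ/(1 − ρ) = 1.0.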
Definition of Discrete-Event Simulation
• Event Driven: each occurrence of an event changes the state of the system
• Uses a model (implemented as a computer program) rather than experimenting with the real system

Steps of a Simulation Study (Banks [1998])
• Model Conceptualization
• Data Collection
• Input Data Analysis
• Model Translation
• Verification and Validation
• Experimental Design
• Production Runs and Output Analysis

Model Conceptualization
• Problem Formulation
• Objectives and Project Plan
• The modeling begins simply, and the model grows until a model of appropriate complexity has been developed with the objectives in mind.

Data Collection
• A data set is collected for each variable from a survey.
• Whenever possible, collect between 100 and 200 observations.
• Collect samples from different time periods; for field operations, for example, by time of day and/or day of week.

Input Data Analysis and Modeling
• Assessing Independence
• Probability Plots
• Estimation of Parameters
• Goodness-of-Fit Tests
• Empirical Distributions
• Simulation Support Software
  – ExpertFit (A. M. Law and Associates)
  – Stat::Fit (Geer Mountain Software Corporation)

Model Translation
• The conceptual model is coded into a computer-recognizable form, called an operational model.
• General-Purpose Software
• Manufacturing-Oriented Software
• Business Process Reengineering
• Simulation-Based Scheduling
• Field Operations? C++, FORTRAN?

Random Number and Random Variate Generation
• Random (pseudorandom) numbers between 0 and 1 are drawn from the uniform distribution, U(0,1) or RN(0,1).
• Use the Inverse Transform Method to obtain a random variate X. For the exponential distribution,

$$F(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

setting $\Pr(X \le x) = F(x) = u$ and solving for $x$ gives

$$x = F^{-1}(u) = -\frac{1}{\lambda}\ln(1-u)$$

Verification and Validation
• Verification concerns whether the operational model is performing properly.
• Validation is the determination that the conceptual model is an accurate representation of the field operations (or the real system).
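The inverse transform step for the exponential case takes only a few lines. The following is a minimal Python sketch that uses `random.Random` as the U(0,1) source. The function name `exponential_variate` is hypothetical. Python's standard library also provides `random.expovariate` for this distribution. A fixed seed makes a replication reproducible, which is how independent replications with different seeds can be generated.

```python
import math
import random


def exponential_variate(lam: float, rng: random.Random) -> float:
    """Inverse transform method for the exponential distribution.

    F(x) = 1 - exp(-lam * x) for x >= 0; solving u = F(x) gives
    x = F^{-1}(u) = -ln(1 - u) / lam with u ~ U(0, 1).
    """
    u = rng.random()            # u in [0, 1), so 1 - u is in (0, 1] and log is defined
    return -math.log(1.0 - u) / lam


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed so the run is reproducible
    lam = 2.0
    sample = [exponential_variate(lam, rng) for _ in range(100_000)]
    print(f"sample mean {sum(sample) / len(sample):.4f} vs. 1/lam = {1 / lam:.4f}")
```

With a large sample, the sample mean should be close to the theoretical mean 1/λ. This is a simple check during verification.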
Verification and Validation Process
• It is an iterative process:
  1. Add new details to the model
  2. Run the model
  3. Evaluate the results
  4. The results are not sufficiently accurate
  5. Identify other details (operations/input data)
  6. Go to step 1, and the cycle starts anew
  7. At some point, the model is determined to be "close enough"

Experimental Design
• For each scenario to be simulated, decisions must be made about the length of the simulation run, the number of runs (also called replications), the manner of initialization, and the controllable decision variables as required.

Production Runs and Output Analysis
• Production runs and their subsequent output analysis are used to estimate the performance measures (cost, timing, and response rates) for the scenarios being simulated.
• Finite-Horizon Simulations
• Steady-State Simulations

Simulation Model of Simplified NHIS Field Operations (Prototype)
• Ten field representatives (FRs), 1,050 cases, 105 cases per FR
• Each FR covers a primary sampling unit (PSU) of 60 × 60 miles (3,600 square miles)
• FRs are given 17 days, starting on a Monday
• All FRs start work at 3:00 PM each day
• 2004 NHIS CHI data set used for input modeling
• Visiting order: Traveling Salesman Problem
• The model: about 1,900 lines of C++ code

Field Operation Inputs
• Frequency distribution of 28 outcomes
• Interview length distributions by outcome
• Contact/No-Contact Bernoulli distribution
• Contact time distributions
• Uniform distributions for vehicle speed

Software Development for Field Operations Simulation Modeling
Field Operations Inputs → Input Modeling → Field Operations Simulation Model → Output Analysis → Costs, Response Rates, Timing

Performance Measures
• Low Cost: direct labor cost (hours and mileage)
  – Average number of personal visits per case
• High Response Rate
• Short Timing: how long the field operations take each month (17 days)
• Together these are called LHS (Low cost, High response rate, Short timing)

Preliminary Results
• 1,000 independent replications with different seeds
• Cost: $25,475
  – Based on $10/hr and $0.35/mile
  – Average number of personal visits (PVs) per case = 1.74
• Response Rate: 86.04%
• Timing: 17 days

Response Rates of 2004 NHIS Q2
Region          RR (%)
Boston          86.09
New York        76.40
Philadelphia    82.86
Detroit         93.46
Chicago         91.21
Kansas City     93.48
Seattle         86.99
Charlotte       90.62
Atlanta         91.72
Dallas          87.23
Denver          92.04
Los Angeles     87.32
National        88.63

Design of Experiments: Controllable Parameters
• Starting time: 10:00 AM, 12:00 noon, and 3:00 PM
• Number of FRs: 10 and 15
  – Timing: 17 days vs. 11 days
  – Area: 3,600 vs. 2,401 square miles
  – Cases per FR: 105 vs. 70
  – FR-Days: 170 vs. 165

Selected Frequency Distributions of Contact (C)/No-Contact (NC), %
Hours         Outcome  Sun    Mon    Tue    Wed    Thur   Fri    Sat    All
10:00-12:00   C        49.02  51.54  49.75  50.67  53.62  51.09  55.17  51.88
10:00-12:00   NC       50.98  48.46  50.25  49.33  46.38  48.91  44.83  48.12
12:00-15:00   C        52.97  50.63  51.10  51.22  50.31  51.96  54.25  51.64
12:00-15:00   NC       47.03  49.37  48.90  48.78  49.69  48.04  45.75  48.36
15:00-20:00   C        51.95  55.50  56.05  56.94  56.26  53.86  51.77  55.32
15:00-20:00   NC       48.05  44.50  43.95  43.06  43.74  46.14  48.23  44.68

The Six Parameter Settings for the Experiments
Setting  Start Time  FRs  Days  Area (sq. mi.)  FR-Days  Adj. Days
1        10:00       10   17    3,600           170      17.00
2        12:00       10   17    3,600           170      17.00
3        15:00       10   17    3,600           170      17.00
4        10:00       15   11    2,401           165      11.33
5        12:00       15   11    2,401           165      11.33
6        15:00       15   11    2,401           165      11.33

The Estimates of the Performance Measures (PMs) of the Six Parameter Settings
                   As Simulated              Adjusted to 170 FR-Days
Setting  Cost ($)  RR (%)  AVs    Cost ($)  RR (%)  AVs   Saved (%)
1        25,375    86.19   1.72   25,375    86.19   1.72
2        25,238    86.86   1.71   25,238    86.86   1.71
3        25,475    86.04   1.74   25,475    86.04   1.74
4        20,722    82.23   1.68   21,349    84.72   1.73  15.86
5        20,575    83.50   1.66   21,199    86.03   1.71  16.00
6        20,589    83.88   1.67   21,213    86.42   1.72  16.73
AVs = average number of personal visits per case; Saved (%) = adjusted cost saving relative to the 10-FR setting with the same starting time (settings 4, 5, 6 vs. 1, 2, 3).

Federal Statistics in the FY 2010 Budget
• Source: http://www.copafs.org/reports/federal_statistics_in_the_fy_2010_budget.aspx
Total direct funding (millions of dollars)
Program                           FY2008 Actual  FY2009 Estimate  FY2010 Request
Census Bureau Current Programs    $232.8         $263.6           $289.0
Census Bureau Periodic Programs   1,234.0        3,906.3          7,115.7
Others                            1,217.7        1,330.5          1,431.0
Total                             2,684.5        5,500.4          8,835.7

Cost Estimates of the Replication with Seed 169001
Setting   Total Time (hours)  Wages ($)  Total Distance (miles)  Mileage ($)  Total Cost ($)
3         1,349.37            13,494     35,012                  12,254       25,748
6         1,107.52            11,075     27,101                  9,486        20,561
6 (adj.)  1,141.08            11,411     27,922                  9,773        21,184

The Other Three Parameter Settings for the Experiments
Setting  Start Time  FRs  Days  Area (sq. mi.)  FR-Days  Adj. Days
1        10:00       10   17    3,600           170      17.00
2        12:00       10   17    3,600           170      17.00
3        15:00       10   17    3,600           170      17.00
7        10:00       15   17    2,401           255      11.33
8        12:00       15   17    2,401           255      11.33
9        15:00       15   17    2,401           255      11.33

The Estimates of the PMs of the Other Three Parameter Settings
Setting  Cost ($)  RR (%)  AVs   Cost Saved (%)  RR Gain (%)
1        25,375    86.19   1.72
2        25,238    86.86   1.71
3        25,475    86.04   1.74
7        24,545    89.93   1.78  3.27            3.74
8        24,085    89.96   1.75  4.57            3.10
9        23,926    89.98   1.75  6.08            3.94
Cost Saved and RR Gain are relative to the 10-FR setting with the same starting time (settings 7, 8, 9 vs. 1, 2, 3); RR Gain is in percentage points.

Optimum Number of FRs
[Figure: optimum number of FRs]

Conclusions
• Simulation models can be used to optimize field operations
• A smaller PSU area is more cost-effective
  – Less time on the roads and more time knocking on doors
  – Not at the expense of the response rate
  – Field operations can be completed sooner

Microsimulation of NHIS
• Physical Impediments and At-Home Patterns of Households
• Interviewer Strategies
• Multiple Visits of Completed Interviews
• Unrelated Persons Living in the Same House
• Classification of Interviewers
• Multiple Surveys
• Sample Designs

What Next?
• Most Recent NHIS CHI Data
• Classification of PSUs:
  – Population Densities
  – Car Densities
  – Traffic Statistics
• Development of a Simulation Language for Field Operations?

Simulation and Modeling Textbooks
• Law and Kelton: Simulation Modeling and Analysis, 3rd edition, 2000, McGraw-Hill
• Jerry Banks (Editor): Handbook of Simulation, 1998, Wiley & Sons
• Hamdy A. Taha: Operations Research: An Introduction, 9th edition, 2011, Prentice Hall
• Hillier and Lieberman: Introduction to Operations Research, 8th edition, 2005, McGraw-Hill

Operations Research Models
• Deterministic Models
  – Linear Programming Models
  – Integer Programming Models
  – Network Flow Programming Models
  – Nonlinear Programming Models
• Stochastic Models
  – Inventory Models
  – Queueing Models
  – Queueing Networks and Decision Models
  – Simulation Models
  – Field Operating Models?
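The cost figures follow from the stated rates of $10/hr and $0.35/mile. The "adjusted to 170 FR-Days" columns appear consistent with linear scaling by 170/165; for example, 1,107.52 hours × 170/165 ≈ 1,141.08 hours. The following is a minimal Python sketch under that assumption. The function names are hypothetical, and the proportional adjustment is inferred from the tabulated numbers rather than stated on the slides.

```python
WAGE_PER_HOUR = 10.00   # $/hour, rate stated on the slides
MILEAGE_RATE = 0.35     # $/mile, rate stated on the slides


def direct_labor_cost(hours: float, miles: float) -> dict:
    """Direct labor cost = wages (hours x wage rate) + mileage (miles x mileage rate)."""
    wages = hours * WAGE_PER_HOUR
    mileage = miles * MILEAGE_RATE
    return {"wages": wages, "mileage": mileage, "total": wages + mileage}


def adjust_to_fr_days(value: float, fr_days: float, target_fr_days: float = 170.0) -> float:
    """Assumed linear scaling of a measure to a common FR-day budget."""
    return value * target_fr_days / fr_days


def percent_saved(baseline_cost: float, new_cost: float) -> float:
    """Cost saving of new_cost relative to baseline_cost, in percent."""
    return 100.0 * (baseline_cost - new_cost) / baseline_cost


if __name__ == "__main__":
    # Setting 3, replication with seed 169001: 1,349.37 hours and 35,012 miles
    print(direct_labor_cost(1349.37, 35012))
    # Setting 6 (165 FR-days) adjusted to 170 FR-days, compared with setting 3
    adj = adjust_to_fr_days(20589, 165)
    print(round(adj), round(percent_saved(25475, adj), 2))
```

Under these assumptions, the sketch reproduces the tabulated wages ($13,494), mileage ($12,254), and total ($25,748) for setting 3. It also reproduces the adjusted cost of $21,213 and the 16.73% saving for setting 6. Small differences in other rows (for example, $21,349 for setting 4) are likely due to rounding of the unadjusted costs.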