The Role of Subway Travel in an Influenza Epidemic: a New York City Simulation

Philip Cooley (a), Shawn T. Brown (b, c), James Cajka (a), Bernadette Chasteen (a), Laxminarayana Ganapathi (a), John J. Grefenstette (b), Craig R. Hollingsworth (a), Bruce Y. Lee (b), Burton Levine (a), William D. Wheaton (a), Diane K. Wagener (a)

(a) RTI International, Research Triangle Park, NC, USA. (b) University of Pittsburgh, PA, USA. (c) Pittsburgh Supercomputing Center, Pittsburgh, PA, USA.

Correspondence: Phil Cooley, RTI International, 3040 Cornwallis Road, P.O. Box 12194, Research Triangle Park, NC 27709, USA. E-mail: pcc@rti.org

Keywords: influenza, pandemic, H1N1, mathematical model, subway transportation, commuting behaviour

SUPPLEMENTARY INFORMATION

Contents
1. Agent Data
  Region included
  Generating Synthesized Households and Persons
  Household Size and Age Structure of the Synthetic NYC Population
  School Data and Allocation Model
  Data Sources for Schools
  Allocating Students to Public Schools
  Allocating Students to Private Schools
  Assignment Results for Public and Private Schools
  Data Sources for Schools and Assignments
  Roads
  Private Schools
  Assignment Results for Public and Private Schools
  Workplace Data and Allocation Model
  Commuting Data
2. Model Details
  Details of the Transmission Model
  Infection seeding
  Interventions
  Computational resources
3. Natural history and transmission parameters
  Pandemic influenza model parameterization
  Contacts within Household
  Estimation of R0 from inter-pandemic influenza household data
  Assumptions about Population Behavior during a Pandemic
4. Transmissibility Scenarios
Sensitivity Analyses
  Worker/Student Ratio
  Transmissions by Workers Employed Outside of NYC
  Within-place Group Structure and Targeting
  Proportion of Infections that become Clinical Cases
  Behavior of Symptomatic Individuals
  Parameter Estimation Strategy Summary
  Estimates of the Proportion of Infections that Occur on the Subway

Figures
Figure SI-1: New York City's Five Boroughs Region
Figure SI-2: Impact of Reducing Contacts on Subways for an R0 = 1.4 Epidemic
Figure SI-3: Impact of Vaccinating NYC Adults
Figure SI-4: Latent Density (days 1-3) and Incubation Density (days 4-7)
Figure SI-5: Transmissibility Scenarios: with and without Immunity
Figure SI-6: Comparison of Calibration Rules for R0 = 1.4 Epidemics
Figure SI-7: Comparison of Baseline Epidemic with and without non-NYC Workplace Infections
Figure SI-8: Comparison of Epidemic Curves for the 67% versus 50% Symptomatic Case Assumption
Figure SI-9: Comparison of Epidemic Curves for the 25%, 50% (Baseline), and 75% Stay-Home Assumptions

Tables
Table SI-1: Synthetic population age distribution compared to Census data for NYC
Table SI-2: Synthesized household size compared to Census data for NYC
Table SI-3: Student school Assignment Summary by Age, Borough and Private versus Public Schools
Table SI-4: Distribution of Firms by Size and Borough
Table SI-5: Distribution of Workers by Borough of Residence
Table SI-6: Estimated Distance-to-Work Distribution Derived from Census Commuting Data and Synthesized Household Locations
Table SI-7: Transmission Parameters
Table SI-8: Effect of Subway Specific Contact Reducing Interventions
Table SI-9: Effect of Contact Reducing Interventions
Table SI-10: Estimated Person-to-Person Contact Values (Contacts per Day)
Table SI-11: Age Specific Attack Rates (AR) for Alternative Calibration Methods
Table SI-12: Estimates of Subway Rider Infections

1. Agent Data

Region included

Our region of study, shown in dark shading in Figure SI-1, is New York City's five-borough region. We used the iterative proportional fitting method developed by Beckman et al. [1] to generate an agent population from the US Census Bureau's Public Use Microdata Sample (PUMS) files and Census aggregate data [2]; see Wheaton et al. [3] for a detailed description. Our model contained a total of 7,847,465 computer agents, or virtual people. Each person resided in one of the five boroughs and had a set of socio-demographic characteristics and daily behaviors, including age, sex, employment status, occupation, and household location and membership. A total of 2,005,024 people were under 18 years of age, and 848,590 were over 65.

Figure SI-1: New York City's Five Boroughs Region

Generating Synthesized Households and Persons

We used three primary data sources, all produced by the US Census Bureau, to generate the synthesized agents and households.

US Census Bureau TIGER Data. The TIGER (Topologically Integrated Geographic Encoding and Referencing) data provide the spatial context for decennial Census data collection [4]. Among many other things, TIGER defines the boundaries of states, counties, Census tracts, block groups, and blocks, and Census tabulation data are aggregated to these geographic units. The smallest Census geographic unit for which the full suite of Census variables (including socioeconomic variables) is available is the Census block group.
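The Beckman et al. method cited under Region included is built on iterative proportional fitting (IPF): a seed cross-tabulation taken from microdata is rescaled, alternately by rows and by columns, until its margins match aggregate Census totals. The sketch below shows only that fitting step for a two-way table. The function name `ipf` and the toy seed and targets are hypothetical. The subsequent step of drawing PUMS households to populate each block group (see [1] and [3]) is not shown.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting (IPF) of a two-way table.

    seed:        list of rows, e.g. a cross-tabulation built from PUMS records
    row_targets: required row totals (e.g. block-group counts from SF3)
    col_targets: required column totals; must share the row grand total
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Column step: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows match as well.
        row_error = max(abs(sum(r) - t) for r, t in zip(table, row_targets))
        if row_error < tol:
            break
    return table


# Hypothetical seed: household size class (rows) x age class of head (columns).
seed = [[10, 20], [30, 40]]
fitted = ipf(seed, row_targets=[50, 50], col_targets=[40, 60])
print([[round(v, 2) for v in row] for row in fitted])
```

IPF preserves the cross-product (odds) ratios of the seed while matching the margins. This is what lets it combine detailed but sparse microdata with complete aggregate counts.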
The TIGER files also include the boundaries of bodies of water and road networks, which we also used in generating the synthesized database.

Summary File 3 (SF3) Data. The SF3 data contain the demographic variables from the Census, organized and aggregated to many different geographic units [5], and include variables on both population and housing.

Public Use Microdata Sample (PUMS). The PUMS data contain records representing a five percent sample of responses to the Census long-form questionnaire, with family structure information retained [6]. Household-level data include the number of persons in the household, number of bedrooms, age of building, access to telephone service, type of heating, mortgage data, and many other variables. PUMS also provides data on individuals within each household, including age, sex, ethnicity, language spoken, school enrollment, occupation, travel time to work, and military service. Because the PUMS data set maintains the linkages between individuals and households, the household population structure can be carried forward into further analyses. The PUMS data are available for predefined Census areas known as Public Use Microdata Areas (PUMAs). PUMAs are defined by each state rather than by the US Census Bureau. The Census Bureau requires that each PUMA contain about 100,000 persons, but otherwise states have wide latitude to define the shape and extent of each PUMA. As a result, PUMAs tend to be relatively small in densely populated areas and relatively large in sparsely populated areas.

Household Size and Age Structure of the Synthetic NYC Population

Table SI-1 compares the age structure of the NYC synthetic population with 2000 Census data. Overall, the two agree closely, with slightly more synthetic agents under 20 and over 70 years of age than in the Census data.
Note that the synthetic population represents household dwellers only, whereas the Census data also include persons living in group quarters (e.g., prisons, university dorms, and nursing homes).

Table SI-1: Synthetic population age distribution compared to Census data for NYC

Age        Synthetic*      Census**
0-4           565,376       532,676
5-9           579,563       562,978
10-19       1,056,597     1,047,294
20-39       2,488,302     2,608,925
40-69       2,508,337     2,577,237
70+           649,290       593,334
Total       7,847,465     7,922,444

* Household counts only
** Household + non-household (group quarters) counts

Table SI-2 compares the distribution of household sizes in the NYC synthetic population with 2000 Census data, counting household occupants only in both sources. The numbers agree very closely.

Table SI-2: Synthesized household size compared to Census data for NYC

Number of Household Members    Model (households)    Census (households)
1                                         964,234                961,941
2                                         802,494                801,950
3-4                                       866,228                868,563
5-6                                       309,364                310,022
7+                                         80,154                 80,001
Total                                   3,022,474              3,022,477

School Data and Allocation Model

Our model also represented NYC schools and assigned school-age persons in the synthetic population to schools using the methods described below. First, each school-age individual was designated as attending either a public or a private school; information about the type of school an individual might attend is available from the 2000 Census of Population and Housing Public Use Microdata Sample. Cajka et al. describe these data and the assignment methods [7], which differ by school type.

Data Sources for Schools

To identify the schools in our study area and their locations, we used a database that included all public and private NYC schools. The schools' addresses were geocoded and used to assign school-age children in the synthetic population to schools. The dataset included the number of students by school year, which we used to specify an age-specific student capacity for each school.
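Because each school carries an age-specific capacity, the public-school assignment is capacity-constrained. The rules listed under Allocating Students to Public Schools (proximity, same borough, grade-specific capacity) can be illustrated with a greedy sketch. This is not the ArcPlot LocateAllocate algorithm the authors used. It simply sends each student, in input order, to the nearest in-borough school with a remaining seat for the student's age. `School`, `assign_public`, and the toy coordinates are hypothetical, and rectilinear distance stands in for road-network distance.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    borough: str
    location: tuple   # (x, y) in arbitrary planar units (hypothetical)
    capacity: dict    # age -> seats still available


def rectilinear(a, b):
    """Stand-in for road-network distance (an assumption of this sketch)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def assign_public(students, schools, distance=rectilinear):
    """Send each student to the nearest in-borough school with a free seat.

    students: list of (student_id, borough, age, (x, y)), processed in order
    Returns {student_id: school name, or None if no seat was found}.
    """
    assignment = {}
    for sid, borough, age, loc in students:
        open_schools = [s for s in schools
                        if s.borough == borough and s.capacity.get(age, 0) > 0]
        if not open_schools:
            assignment[sid] = None
            continue
        nearest = min(open_schools, key=lambda s: distance(loc, s.location))
        nearest.capacity[age] -= 1  # consume one age-specific seat
        assignment[sid] = nearest.name
    return assignment


schools = [School("A", "Bronx", (0, 0), {7: 1}),
           School("B", "Bronx", (5, 0), {7: 2}),
           School("C", "Queens", (1, 0), {7: 5})]
students = [(1, "Bronx", 7, (1, 0)), (2, "Bronx", 7, (1, 0)), (3, "Bronx", 8, (1, 0))]
assignments = assign_public(students, schools)
print(assignments)
```

Student 2 goes to the farther school B because A's single age-7 seat is already taken, and school C is excluded because it lies in another borough. Because the loop is greedy, results depend on processing order. This is one way the simplification may differ from the LocateAllocate results.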
The National Center for Education Statistics (NCES) [8] maintains downloadable files, available at http://nces.ed.gov/ccd/bat, that contain information about all known public schools in the United States. From these files we retrieved enrollment data by grade for each US public school, along with each school's name, address, and NCES School ID. We converted these data into a text file and sent it to Tele Atlas North America to be geocoded by address. Using the Environmental Systems Research Institute's (ESRI's) ArcGIS software, we then converted the text file into a spatial data layer from the latitude and longitude pairs in the geocoded data set. We processed this layer interactively to resolve ambiguous geocoding results (e.g., address not found, Post Office box address), using a variety of Internet mapping resources and aerial photography. We also checked that, at a minimum, each school was located in the correct county; spot checks showed that, overall, schools were very well located. After the geographic locations were checked, we loaded the data into a SQL Server database running ESRI's ArcSDE middleware.

Allocating Students to Public Schools

Which public school a student attends depends on the individual school district's goals and on many factors, such as geographic proximity, socioeconomic characteristics, physical barriers, availability of busing, and politics. No single formula or set of criteria applies to school assignments nationwide. To assign students to public schools in NYC, we used the method described in Cajka et al. [7]. The major underlying assumptions were:
- Geographic proximity is a major criterion for making assignments.
- Students are assigned to a school on the basis of distance along a network (roads) rather than straight-line distance.
- Students attend school only in their borough of residence.
- Students are assigned to a school according to the school's capacity for their grade.
- No special allowances are made to assign siblings to the same school, other than the fact that siblings share the same location and are therefore each assigned to the closest school with capacity for their grade level.

We used the LocateAllocate command from within the ArcPlot module of the workstation version of ArcGIS to assign students to schools. One of the required parameters was the network along which assignments could occur; here, the roads network connected potential students to their neighboring schools. The US Census Bureau's 2000 TIGER/Line files supplied the roads data.

Allocating Students to Private Schools

The principal assumption in making private-school assignments was that, although students could attend a school anywhere within the NYC region, they would be more likely to attend a school close to their residence. To start this process, we extracted from the synthetic population database persons aged 4 to 17 years whose Census enrollment code indicated that they attended private school, and we extracted from the schools layer the private schools with total enrollment greater than zero. The private-school assignment process of Cajka et al. divided the area around each private school into three concentric rings: the first extended from the school out to 10 km, the second from 10 km to 15 km, and the third from 15 km to 20 km. The assignment process used the ArcPlot Reselect command and selected 50 percent of the students from the first ring, 25 percent from the second ring, and 25 percent from the third ring. We established these proportions after running the command several times and reviewing the results for assignment completion rates.
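The ring scheme can be sketched for a single private school and grade. This sketch assumes the 50/25/25 percent shares apply to the school's seats for that grade and uses straight-line distances in kilometres. The function name and data layout are hypothetical, and the code does not reproduce the ArcPlot Reselect command. Students left unplaced would fall through to the nearest-school post-processing step (within 80 km).

```python
import math
import random

# (inner km, outer km, share of the school's seats), per the ring scheme above
RINGS = [(0.0, 10.0, 0.50), (10.0, 15.0, 0.25), (15.0, 20.0, 0.25)]


def fill_private_school(school_loc, seats, students, assigned, rng):
    """Fill one private school's seats for one grade from distance rings.

    students: list of (student_id, (x_km, y_km)) enrolled in private school
              at this grade (hypothetical layout)
    assigned: set of ids already placed anywhere; updated in place
    Returns the ids assigned to this school.
    """
    chosen = []
    for inner, outer, share in RINGS:
        quota = round(share * seats)
        in_ring = [sid for sid, loc in students
                   if sid not in assigned
                   and inner <= math.dist(school_loc, loc) < outer]
        # Random draw of up to `quota` unplaced students from this ring.
        picked = rng.sample(in_ring, min(quota, len(in_ring)))
        assigned.update(picked)
        chosen.extend(picked)
    return chosen


rng = random.Random(0)
students = [(1, (5, 0)), (2, (0, 5)), (3, (3, 4)),
            (4, (12, 0)), (5, (0, 18)), (6, (25, 0))]
assigned = set()
chosen = fill_private_school((0, 0), 4, students, assigned, rng)
print(sorted(chosen))
```

In this toy run, two of the three students at 5 km, the student at 12 km, and the student at 18 km are placed. The remaining 5 km student and the student at 25 km are left for post-processing.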
Even with these proportions, not all privately enrolled students had been assigned to a private school by the end of the process, either because the private schools had reached capacity or because the students lived more than 20 km from a private school that taught their grade. To assign these students, we added a private-school post-processing step that assigned each unassigned student to the nearest private school within 80 km that taught the student's grade. This step made the allocation much more complete, although it did overfill some schools. The resulting assignments are shown in Table SI-3.

Assignment Results for Public and Private Schools

We included 2,073 public and private schools in the NYC region, attended by 1,400,430 students of school age (5-17). Table SI-3 shows the total number of students assigned, by age, within each of the five boroughs.

Our model also depicts NYC public schools and assigns synthetic students to schools using the methods described below. School-age individuals were selected for assignment to public or private schools; information about the type of school an individual might attend is available from the 2000 Census of Population and Housing Public Use Microdata Sample. Cajka et al. describe these data and assignment methods [7]. Again, the assignment methods depend on school type. We used a database that included all public and private NYC schools, with each school's address geocoded, to assign school-age children in the synthetic population to schools. The dataset included the number of students by school year, which was used to specify an age-specific student capacity for each school. Because no single formula or set of criteria applies to school assignments nationwide, we used the Cajka et al. method [7] to assign students to public schools in NYC.
The major underlying assumptions are:
- Geographic proximity is a major criterion for making assignments.
- Students are assigned to a school based on distance along a network (roads) rather than straight-line distance.
- Students attend school only in their borough of residence.
- Students are assigned to a school according to the school's capacity for their grade.
- No special allowances are made to assign siblings to the same school, other than the fact that siblings share the same location and are therefore each assigned to the closest school with capacity for their grade level.

Data Sources for Schools and Assignments

As described earlier, we used the NCES downloadable files containing information about all known public schools in the United States [8]. From these files we retrieved enrollment data by grade for each US public school, as well as each school's name, address, and NCES School ID; we then converted these data into a text file and sent it to Tele Atlas North America for geocoding. Using ArcGIS, we converted the text file into a spatial data layer and then processed this layer interactively to resolve ambiguous geocoding results (e.g., address not found, Post Office box address), using a variety of Internet mapping resources and aerial photography. As before, we checked that each school was located in the correct county and spot-checked that schools were well located. After checking the geographic locations, we loaded the data into a SQL Server database running ESRI's ArcSDE middleware.

Roads

Again, we used the LocateAllocate command from within the ArcPlot module of ArcGIS to assign students to schools. One of the required parameters was the network along which assignments could occur.
In this case, we used the roads network to connect students to their neighboring schools; the US Census Bureau's 2000 TIGER/Line files provided the roads data.

Private Schools

Our principal assumption in making private-school assignments was that students could attend a school anywhere within the NYC region but would be more likely to attend a school close to their residence. To start this process, we extracted from the synthetic population database persons aged 4 to 17 years whose Census enrollment code indicated that they attended private school. We also extracted private schools with total enrollment greater than zero from the schools layer.

Again, the private-school assignment process divided the area around each private school into three concentric rings: from the school out to 10 km, from 10 km to 15 km, and from 15 km to 20 km. The assignment process used the ArcPlot Reselect command and selected 50 percent of the students from the first ring, 25 percent from the second, and 25 percent from the third. We established these proportions after running the command several times and reviewing the results for assignment completion rates. Even so, not all privately enrolled students were assigned to a private school, because some private schools reached capacity or some privately enrolled students lived more than 20 km from a school that taught their grade. To assign these students, we created a private-school post-processing step that assigned each unassigned student to the nearest private school within 80 km that taught the student's grade. In general, this step made the allocation much more complete, although it did overfill some schools. The student-to-school assignments are shown in Table SI-3.

Assignment Results for Public and Private Schools

A total of 2,073 public and private schools were included in the NYC region, with a total of 1,400,430 students of school age (5-17) in attendance.
Table SI-3 identifies the total number of students assigned, by age, within each of the five boroughs.

Table SI-3: Student school Assignment Summary by Age, Borough and Private versus Public Schools

                     005         047         061         081         085
Age                Bronx    Brooklyn   Manhattan      Queens  Staten Is.        Total
Public Schools
5-11             146,897     218,911      82,972     172,955      36,245      657,980
12-14             53,250      89,187      31,928      65,948      13,855      254,168
15-18             50,537      86,038      31,374      65,653      12,463      245,905
Total Public     250,524     394,136     146,274     304,556      62,563    1,158,053
Private Schools
5-11              22,190      51,438      22,255      31,482      10,693      138,058
12-14              8,478      20,099       9,017      12,671       4,400       54,665
15-18              8,473      17,524       8,001      10,977       4,682       49,657
Total Private     39,149      89,061      39,273      55,130      19,775      242,380
Total All        289,665     483,197     185,547     359,686      82,338    1,400,430

Column headings are New York county FIPS codes: 005 = Bronx, 047 = Kings (Brooklyn), 061 = New York (Manhattan), 081 = Queens, 085 = Richmond (Staten Island).

The total of 1,400,430 students includes approximately 1,158,000 public school students and 242,000 private school students. This compares favorably with the total elementary and high school enrollment of 1.3 million children reported for 2006-2008 by the US Census Bureau's American FactFinder (http://factfinder.census.gov/servlet/NPTable?_bm=y&-geo_id=16000US3651000&qr_name=ACS_2008_3YR_G00_NP01&-ds_name=&-redoLog=false).

Workplace Data and Allocation Model

To create a network of employed agents linked to their places of employment, we needed an assignment process that established a realistic workplace social network in our study area. Assigning workplaces required a different approach and different data sources than assigning students to schools. In particular, the proximity assumption used for schools does not hold in general for workplaces, because many people do not live close to where they work. We needed two sources of information to implement this process: (1) counts of persons who lived in one Census tract but worked in another, and (2) counts of firms by size for the same Census tracts.
The US Census Bureau published the Census 2000 Special Tabulation Product 64 (STP64) [9], which summarizes the number of persons by combined Census tract of work and Census tract of residence. In addition, the commercial company InfoUSA has compiled the number of US firms by firm size category and Census block group. We used the InfoUSA firm data to create synthetic workplaces, locating each at the centroid of the block group indicated by the firm's address. We then assigned workers to those workplace sites using the STP64 data: workers who reported working in a specific block group were assigned at random to a firm located within that block group. The workplaces also included schools, hospitals, and other types of institutions, which could be used to track special synthetic agents such as teachers and health care workers.

We also had to account for persons not assigned a specific work tract in the STP64 data. In most cases, the STP64 data file contained the number of persons who lived in one Census tract and worked in another. However, some records listed the number of persons who lived in one Census tract and worked somewhere in the same county. To assign these workers, we apportioned them among tracts within the county in proportion to the number of other persons working in those tracts, so the tracts already employing the most people received most of these additional workers. We also used basic US Census counts to calculate, for each Census tract, the percentages of persons who were 15 to 54 years old and who were 55 to 74 years old. This allowed the program to assign workers proportionately in areas with a larger share of older people (e.g., retirement areas) and in areas with a larger share of younger people (e.g., college towns). The NYC five-borough region included 184,617 workplaces with 2,856,727 employees. The distribution of firms by size and borough is presented in Table SI-4.
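The proportional apportionment of county-only STP64 records can be sketched as follows. The text does not say how fractional shares were rounded. Largest-remainder rounding is our assumption, and the function name and tract identifiers are hypothetical.

```python
def apportion(n_workers, tract_jobs):
    """Split workers whose STP64 record gives only a county of work.

    tract_jobs: {tract_id: workers already assigned to that tract}
    Shares are proportional to existing employment; largest-remainder
    rounding (an assumption) keeps the integer shares summing to n_workers.
    """
    total = sum(tract_jobs.values())
    quotas = {t: n_workers * jobs / total for t, jobs in tract_jobs.items()}
    shares = {t: int(q) for t, q in quotas.items()}
    leftover = n_workers - sum(shares.values())
    # Give the remaining workers to the tracts with the largest fractional parts.
    ranked = sorted(quotas, key=lambda t: quotas[t] - shares[t], reverse=True)
    for t in ranked[:leftover]:
        shares[t] += 1
    return shares


# Hypothetical tracts within one county.
print(apportion(7, {"tract_A": 500, "tract_B": 300, "tract_C": 200}))
```

The busiest tract receives the largest share, matching the observation that tracts already employing the most people received most of these additional workers. Each tract's workers would then be placed at random in firms located there.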
By size, 4,857 firms have 100 or more employees, 6,019 have 50 to 99 employees, 9,216 have 20 to 49, 21,425 have 10 to 19, and 143,100 have fewer than 10. Nearly 70% of these workers commute to work five to seven days a week by subway, by bus, by their own vehicles, or as part of a car pool, but a significant percentage (> 16%) walk to work [10].

Table SI-4: Distribution of Firms by Size and Borough

Size (employees)    Total     Bronx   Brooklyn   Manhattan    Queens   Staten Is.
1-9               143,100    12,590     30,003      66,291    28,425        5,791
10-19              21,425     1,614      5,184       9,886     3,807          934
20-49               9,216       847      1,896       4,551     1,598          324
50-99               6,019       580      1,297       2,811     1,101          230
100+                4,857       337        916       2,452       704          178
Total             184,617    15,968     39,296      85,991    35,635        7,457

The 184,617 firms in NYC employ a total of 2,856,727 workers who live in the five-borough region, or more than 91.5% of all workers living in NYC. Table SI-5 presents the distribution of NYC workers by borough of residence. Approximately eight percent (8.09%) of people who live in NYC work outside the five-borough region, mainly elsewhere in New York State but also in New Jersey and Connecticut. The Other-Travel row in Table SI-5 represents the less than one-half percent of persons who were traveling beyond NYC and its suburbs when they completed the Census questionnaire summarized in STP64. We assigned these workers randomly to firms within their borough of residence.

In terms of risk, the eight percent of NYC residents who work outside the city are at risk of infection from their fellow workers. If influenza is transmitted to them at the workplace, they return to NYC and can transmit the infection through their household, subway, and community mixing. We therefore assess the impact of including this source of transmission in the Sensitivity Analyses section below.
Table SI-5: Distribution of Workers by Borough of Residence

Borough of        Number of    % Working     % of All
Residence           Workers      in NYC    NYC Workers
Bronx               214,978        7.53          6.89
Brooklyn            571,976       20.02         18.32
Manhattan         1,506,944       52.75         48.26
Queens              459,598       16.09         14.72
Staten Is.          103,231        3.61          3.31
NYC               2,856,727      100.00         91.50
Outside NYC         252,564           -          8.09
Other-Travel         12,680           -          0.41
Total             3,121,971           -        100.00

Commuting Data

The number of commuters by mode of travel is available from the 2000 US Census data, and we used these data to estimate the number of synthetic agents commuting to work by subway. We obtained non-commuting travel patterns from New York Household Travel Patterns: A Comparison Analysis [10] and the 1997/98 Regional Travel/Household Interview Survey [11]. The 2006 NYC Community Health Survey identified characteristics of subway riders [12]. We assigned each virtual agent a probability of using the subway based on the model developed by Levine et al. [13], which links agent traits to the incidence of subway ridership.

Because STP64 does not identify the specific workplace address (beyond the block group) to which the census population commutes, we used the centroid of the workplace block group as the destination location. Using the household location and the workplace destination, we constructed the distributions of distance travelled to work (assuming city-block distance) shown in Table SI-6.

Table SI-6: Estimated Distance-to-Work Distribution Derived from Census Commuting Data and Synthesized Household Locations

Borough of       Average Distance    Maximum Distance
Workplace             (miles)             (miles)
Bronx                  8.430              42.929
Brooklyn               7.007              32.447
Manhattan              3.854              39.531
Queens                 8.548              45.324
Staten Is.            11.068              42.626
Total                  7.101              45.324

The estimate of 7.1 miles per trip compares favorably with the Travel 2001 survey estimate of 8.82 miles (see [10], Table 3.2b, page 3-7).
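The Table SI-6 summary can be outlined by measuring the city-block (rectilinear) distance from each synthetic household to the centroid of its workplace block group, then averaging and taking the maximum. The sketch assumes projected planar coordinates in miles, since the text does not state the projection. The function names and toy trips are hypothetical.

```python
from statistics import mean


def block_distance(home, work):
    """City-block (rectilinear) distance between two planar points, in miles."""
    return abs(home[0] - work[0]) + abs(home[1] - work[1])


def distance_summary(trips):
    """Average and maximum commute distance, as reported in Table SI-6.

    trips: list of ((x, y) household, (x, y) workplace block-group centroid),
    in projected coordinates measured in miles (an assumption of this sketch).
    """
    distances = [block_distance(home, work) for home, work in trips]
    return mean(distances), max(distances)


# Hypothetical trips for three workers in one borough.
trips = [((0, 0), (3, 4)), ((1, 1), (2, 2)), ((0, 0), (10, 0))]
average, maximum = distance_summary(trips)
print(round(average, 3), maximum)
```

Grouping trips by workplace borough before calling `distance_summary` would give one row of the table per borough.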
One important issue in the STP64 data is the wording of the Census question that is the source of the commuting estimate. Respondents were asked to identify the place where they spent the most time working in the previous week. As a result, the dataset contains data on regular commutes to the individual's typical workplace as well as on occasional work-related trips. Because work trips lasting most of a week can be expected to involve longer distances than a typical commute, the greater-than-expected number of very long-distance commutes may be attributable to such occasional work-related travel. Our response was to reassign the small number of NYC workers who indicated employment outside the New York, New Jersey, and Connecticut region to firms located within the NYC region. We also assumed that workers living in NYC but working elsewhere in one of those three states commuted to those states.

2. Model Details

Details of the Transmission Model

Our transmission model is a stochastic, spatially structured, individual-based simulation. On each simulated weekday, agents moved among their households, their assigned workplaces (or schools, depending on age), and various locations in the community, where they interacted with other nearby agents. Agents interacted more frequently with agents with whom they had closer relationships (e.g., family and household members, classmates, and office mates). Employees of large firms interacted most closely with their office mates but also encountered people who worked in other offices of the same firm. Workers in firms with a single workplace (office) contacted the same people repeatedly each day. On weekends, schools and many workplaces were closed.
On those days, agents suspended their school and work activities and instead increased their community interactions by 50%. A minority (20%) of employees continued to work on weekends.

Following the approach of Longini et al. [14], we described transmission within each contact group by a contact transmission probability $c_i$ (Table SI-7), which depended on the ages of both the infectious and the susceptible person. This contact probability represented the likelihood, within each 24-hour simulated period, that two individuals in a given social setting have a contact of sufficient duration and closeness to transmit an infectious dose of influenza virus. The settings in Table SI-7 correspond to the main social networks represented in the model, which consist of households, schools, classrooms within schools, firms (corporate level), workplaces within firms, subway commuters, and non-commuter subway riders.

A community social network represented a "catch-all" class covering all contacts not explicitly represented by the other categories. Because the Longini et al. and Ferguson et al. models do not treat subway riders as a distinct social network, the community contacts in our model exclude the contacts that occur while riding the subway, which we represent separately. The probability of transmission given a contact, P, is a single number that multiplies every contact probability. Adjusting P links the degree of contagiousness to the basic reproductive number, R0, without modifying the assumptions underlying the social interaction network parameters. We did not allow for seasonal or weekly effects, but we did represent weekend effects by reducing or eliminating contact rates in some social networks (workplaces, subway commutes, and schools) and increasing them in others (households and students within the community network). Births and deaths not related to influenza were not represented.
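A minimal sketch of the weekday/weekend contact adjustments described above follows. The 50% weekend increase in community contacts and the 20% share of weekend workers come from the text; treating subway commuting as following the workplace schedule and leaving household contacts unchanged (the text says they increase but gives no size) are assumptions of this sketch, and all names are hypothetical.

```python
import random

WEEKEND_WORKER_SHARE = 0.20      # share of employees who keep working on weekends
WEEKEND_COMMUNITY_FACTOR = 1.5   # community contacts rise by 50% on weekends

def contact_multipliers(is_weekend, works_weekends=False):
    """Relative contact level per social network for one agent on one day.

    Assumptions of this sketch: subway commuting follows the workplace
    schedule, and household contacts stay at 1.0 because the text gives no
    size for their weekend increase.
    """
    if not is_weekend:
        return {"household": 1.0, "school": 1.0, "workplace": 1.0,
                "subway_commute": 1.0, "community": 1.0}
    working = 1.0 if works_weekends else 0.0
    return {"household": 1.0, "school": 0.0, "workplace": working,
            "subway_commute": working, "community": WEEKEND_COMMUNITY_FACTOR}

def assign_weekend_workers(n_workers, rng):
    """Flag each employee as a weekend worker with probability 0.20."""
    return [rng.random() < WEEKEND_WORKER_SHARE for _ in range(n_workers)]

rng = random.Random(1)
flags = assign_weekend_workers(10_000, rng)
print(f"weekend workers: {sum(flags) / len(flags):.3f}")
print(contact_multipliers(is_weekend=True))
```

The weekend-worker flag is drawn once per employee, so the same 20% keep their workplace and commuting contacts every weekend.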
The model computed the probability of infection for each susceptible individual each day from the transmission probability of each potential infectious contact,

$$p_i(a_1, a_2) = P \times c_i(a_1, a_2), \qquad a_1, a_2 \in \{a, c\},$$

where a denotes an adult and c a child, the first argument refers to the susceptible person and the second to the infectious person, and $i$ indexes the contact setting; for example, $p_i(a, c)$ is the probability that an infectious child transmits influenza to a susceptible adult. If the infectious contact was receiving treatment of some type, this transmission probability was multiplied by $(1 - T_i)$, where $T_i$ is the treatment efficacy for infectiousness. The transmission probability $p_i$ can be further reduced for asymptomatic (yet infectious) contacts.

The probability $P_t$ that a susceptible agent becomes infected on a given day was computed from the product of the escape probabilities $(1 - p_i)$ over all infectious contacts that occur that day. For example, consider a susceptible adult who, on a given day, has one infectious child in the household, one infectious adult co-worker, one infectious contact on the subway, and three infectious contacts in the community. The probability that this adult becomes infected that day is

$$P_t = 1 - \bigl(1 - p_{hh}(a, c)\bigr)\bigl(1 - p_{wp}(a, a)\bigr)\bigl(1 - p_s\bigr)\bigl(1 - p_c\bigr)^3,$$

where the subscripts hh, wp, s, and c denote the household, workplace, subway, and community settings, respectively (inside the parentheses, a and c still denote adult and child). Infection is then decided by a Bernoulli trial with two possible outcomes, transmission and no transmission: a uniform random number is generated and, if it is less than $P_t$, the susceptible adult becomes infected. The source of infection can be determined by processing the infectious contacts in sequence and recording the first one whose individual contribution produces an infection. In the example above, each of the six infectious contacts had a nonzero probability of being identified as the source. To assess the bias resulting from the order in which social networks are processed, we varied this order as part of the sensitivity analysis and found that it had a negligible impact.

Community transmission depended explicitly on distance, as it represented random contacts associated with travel within the region for shopping, attending recreational events, personal visits, and the like.
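The daily infection step can be sketched as below. It draws one Bernoulli trial per infectious contact in processing order; the agent is infected if any trial succeeds, which happens with probability P_t, and the first success is recorded as the source. This is statistically equivalent to a single trial against P_t but makes the order dependence explicit. The contact probabilities come from Table SI-7; P = 1.0 is a placeholder (the calibrated value is not given here), the subway term is omitted because Table SI-7 lists no subway value, and all names are hypothetical.

```python
import random

# Contact transmission probabilities c_i from Table SI-7,
# keyed by (setting, infected agent type, susceptible agent type).
CONTACT_PROB = {
    ("household", "child", "adult"): 0.3,
    ("workplace", "adult", "adult"): 0.0575,
    ("community", "all", "all"): 0.0048,
}

def daily_infection(contacts, P, rng):
    """Simulate one day for one susceptible agent.

    contacts: infectious contacts in processing order, each given as
        (key into CONTACT_PROB, treatment efficacy T_i of the infectious contact).
    P: per-contact transmission scaling factor (placeholder value below).
    Returns (P_t, source): the daily infection probability and the setting of
    the first successful trial, or None if the agent escapes infection.
    """
    escape = 1.0   # probability of escaping every contact processed so far
    source = None
    for key, treatment_efficacy in contacts:
        p_i = P * CONTACT_PROB[key] * (1.0 - treatment_efficacy)
        escape *= 1.0 - p_i
        # One Bernoulli trial per contact; the first success names the source.
        if source is None and rng.random() < p_i:
            source = key
    return 1.0 - escape, source

# The worked example without the subway term: one household child,
# one adult co-worker and three community contacts, none under treatment.
day = ([(("household", "child", "adult"), 0.0),
        (("workplace", "adult", "adult"), 0.0)]
       + [(("community", "all", "all"), 0.0)] * 3)
p_t, source = daily_infection(day, P=1.0, rng=random.Random(42))
print(f"P_t = {p_t:.4f}, source = {source}")
```

Reordering the `contacts` list leaves P_t unchanged but shifts which setting tends to be credited as the source, which is the order effect examined in the sensitivity analysis.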
Table SI-7: Transmission Parameters

Contact Group        Infected   Susceptible   Transmission Probability(a)
Household            Adult      Adult         0.4
Household            Child      Adult         0.3
Household            Adult      Child         0.3
Household            Child      Child         0.6
Elementary school    Student    Student       0.0435
Middle school        Student    Student       0.0375
High school          Student    Student       0.0315
Workplace            Adult      Adult         0.0575
Community            All        All           0.0048

(a) Contact transmission probability estimates $c_i$ from Longini et al. [14].

Infection seeding

Initially, we seeded each run with 10 infected cases selected at random. Additional scenarios explored the effects of changing the number and timing of new influenza introductions into NYC, including increasing the initial seed from 10 to 100 cases. These changes did not affect the general findings. We also examined the effect of adding daily and weekly seeds representing infections imported by workers who live in NYC but work outside the city. These additional seeds produced a more aggressive epidemic, with a higher peak (5,000 additional cases) that occurred a few days earlier and with cumulative infections 4.4% higher than in the epidemic without external seeding. Figure SI-7 presents this comparison.

Interventions

We examined a small number of interventions to evaluate targeted strategies intended to restrict disease transmission on the subway only, and then compared these policies with a more realistic intervention applied to the entire NYC population. Because we do not represent bus-riding agents explicitly (they are included in the community network), we did not explicitly examine the effects of reducing or eliminating subway service, since that would cause subway riders to switch to an alternative mode of travel. Instead, we simulated a "perfect" intervention that eliminated all infections transmitted from one subway rider to another on the subway.
This illustrates the effect of a completely effective subway intervention and provides a reference point against which all related interventions can be compared. We did not simulate school closure, as that has been studied by others (see Lee et al. [15]). Instead, we focused on personal hygiene and social distancing behaviours, as well as vaccination policies, to explore reducing transmission for a fixed level of contacts. The policies we investigated were the following.

Hand Washing, Microbial Use, and Mask Wearing on Subways: We investigated the collective effect of reducing contacts on subways only. We do not argue that these are realistic, well-crafted interventions; rather, we investigated whether they would be effective enough to warrant pursuing their adoption. We assumed that some combination of hand washing, microbial applications, and mask wearing targeted specifically at subway riders would reduce the effective number of contacts by a fixed percentage, and we considered 10%, 20%, and 30% reductions in transmission on subways. The impact of these assumptions was very low, as shown in Figure SI-2.

Figure SI-2: Impact of Reducing Contacts on Subways for an R0 = 1.4 Epidemic (incident infections by day, days 1 to 208; curves: Subway Int 10, Subway 20, Subway 30, Subway 100)

The subway-only interventions had a small effect on the peak daily infection rate and on the cumulative number of infections, as further illustrated in Table SI-8. Even the totally effective subway-targeted intervention reduced the peak only from 104,944 to 84,604 (19%) and cumulative infections from about 2,600,000 to 2,270,000 (12%). The principal point is that, because a relatively small portion of total infections occurs through subway mixing, subway-targeted interventions can have only a limited effect on containing an epidemic.
Table SI-8: Effect of Subway-Specific Contact-Reducing Interventions

Contact Reduction (%)   Peak      Subway Infections   Total Infections
0                       101,557   114,377             2,596,176
10                      100,354   100,427             2,564,504
20                       97,974    87,434             2,512,033
30                       88,110    74,232             2,491,400
100                      84,604         0             2,271,697

Hand Washing, Microbial Use, and Mask Wearing in the Community: We also simulated contact-reducing effects in the general community by assuming that contact reductions occurred on the subway as well as in the community at large, while contacts within the workplace, the home, and the school remained at their baseline levels. As above, we investigated the effects of 10%, 20%, and 30% reductions in contacts; the results are shown in Table SI-9. In these scenarios, the interventions were broader, directed at a larger population that also includes subway riders. The larger impact is best illustrated by noting that even a 10% contact-reducing intervention in the community sector reduces the peak by 19% and cumulative infections by 11%, an effect as large as that of the "totally effective" subway-targeted intervention.

Table SI-9: Effect of Contact-Reducing Interventions

Contact Reduction (%)   Peak      Subway Infections   Total Infections
0                       101,557   114,377             2,596,176
10                       82,092    80,167             2,315,767
20                       75,047    75,047             2,060,896
30                       53,276    64,093             1,824,884

Vaccination Programs: We also evaluated a low-compliance vaccination program as a potentially interesting intervention. We were motivated to analyze vaccination because Levine et al. [13] reported that subway commuters have lower vaccination rates than other segments of the population. Here we did not specifically target subway riders, nor did we limit our focus to contact behaviours outside the workplace and school. Instead, we simulated vaccinating 10%, 20%, or 30% of all adults on day 14 of the epidemic. The vaccine efficacy was assumed to be 80%.
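The percentage reductions quoted for the community interventions are relative changes against the no-intervention row. The short check below, using the Table SI-9 values, reproduces the 19% peak and 11% cumulative reductions for a 10% contact reduction; the dictionary layout and function name are illustrative only.

```python
# Table SI-9 rows: contact reduction (%) -> (peak daily infections, total infections).
TABLE_SI9 = {
    0: (101_557, 2_596_176),
    10: (82_092, 2_315_767),
    20: (75_047, 2_060_896),
    30: (53_276, 1_824_884),
}

def percent_reduction(baseline, scenario):
    """Relative reduction of a scenario value versus the baseline, in percent."""
    return 100.0 * (baseline - scenario) / baseline

base_peak, base_total = TABLE_SI9[0]
for reduction, (peak, total) in sorted(TABLE_SI9.items()):
    if reduction == 0:
        continue  # the no-intervention baseline
    print(f"{reduction}% contact reduction: "
          f"peak -{percent_reduction(base_peak, peak):.1f}%, "
          f"total -{percent_reduction(base_total, total):.1f}%")
# The 10% row prints: peak -19.2%, total -10.8%
```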
The infection curves are shown in Figure SI-3. Cumulative infections declined steadily (2,475,566; 2,362,695; 2,268,802) as the vaccination rate increased (10%, 20%, 30%). The proportion of infections that occurred on the subway was consistently just above 4% of the total.

Figure SI-3: Impact of Vaccinating NYC Adults (incident infections by day, days 1 to 208; curves: Vaccinate 10%, Vaccinate 20%, Vaccinate 30%)

Computational resources

The simulation was written in C++ and compiled with the GNU compiler. The US simulation used approximately 15 GB of RAM, and each realization ran in 10 to 15 minutes on a single CPU of an Opteron 854-based server.

3. Natural history and transmission parameters

We employed the influenza natural history model described by Longini et al. (see [14] and [17]). In this model, infected people progress through a latent state (mean, 1.2 days) followed by an infectious state (mean, 4.1 days), after which they recover with immunity or die. The incubation period is longer than the latent period, so infected people who develop influenza symptoms do so on average 1.9 days after infection. A portion of those infected (33%) do not develop symptoms. The probability distributions of the latent, incubation, and infectious periods are shown in Figure SI-4. The 67% of infected people who develop influenza are twice as infectious as those without influenza symptoms. In addition, this model withdraws a significant fraction of symptomatic persons from all of their mixing groups except their household.

Figure SI-4: Latent Density (days 1-3) and Incubation Density (days 4-7) (percent by day, days 1 to 7)

The natural history model assumes constant infectiousness from the end of the latent period to recovery.
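A per-infection draw from this natural history model can be sketched as follows. The discrete duration distributions are assumptions chosen only so that their means match the text (latent 1.2 days, incubation 1.9 days, infectious 4.1 days); they are not read from Figure SI-4. The 67% symptomatic share and the doubled infectiousness of symptomatic cases come from the text, and all names are hypothetical.

```python
import random

# Hypothetical discrete distributions (days -> probability), chosen only so
# that their means match the text; not read from Figure SI-4.
LATENT = {1: 0.8, 2: 0.2}
INCUBATION = {1: 0.3, 2: 0.5, 3: 0.2}
INFECTIOUS = {3: 0.3, 4: 0.4, 5: 0.2, 6: 0.1}
P_SYMPTOMATIC = 0.67        # 33% of infections never develop symptoms
SYMPTOMATIC_WEIGHT = 2.0    # symptomatic cases are twice as infectious

def mean_days(pmf):
    """Mean of a discrete duration distribution."""
    return sum(days * p for days, p in pmf.items())

def draw(pmf, rng):
    """Sample one duration from a discrete distribution."""
    durations = list(pmf)
    return rng.choices(durations, weights=[pmf[d] for d in durations])[0]

def natural_history(rng):
    """Draw one infection's course under constant infectiousness."""
    symptomatic = rng.random() < P_SYMPTOMATIC
    latent = draw(LATENT, rng)
    incubation = None
    if symptomatic:
        # The text states incubation exceeds the latent period; taking the
        # max enforces incubation >= latent and nudges the mean above 1.9 d.
        incubation = max(draw(INCUBATION, rng), latent)
    return {
        "symptomatic": symptomatic,
        "latent_days": latent,
        "infectious_days": draw(INFECTIOUS, rng),
        "incubation_days": incubation,
        "relative_infectiousness": SYMPTOMATIC_WEIGHT if symptomatic else 1.0,
    }

print([round(mean_days(d), 2) for d in (LATENT, INCUBATION, INFECTIOUS)])  # [1.2, 1.9, 4.1]
print(natural_history(random.Random(3)))
```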
Others, including Ferguson et al. [16], use an infectiousness function that differs from the Longini et al. model in that infectiousness varies over the course of infection. Their estimated generation time for human influenza, Tg = 2.6 days, is shorter than that implied by the Longini et al. [14] model.

Pandemic influenza model parameterization

The potential pandemic influenza strain was assumed to have the age-dependent attack rate pattern of the historical 1957-58 "Asian" influenza A (H2N2) pandemic (see Longini et al. [14]). Accordingly, we calibrated our model using the approach of Ferguson et al., based on the historical 1957-58 and 1968-69 influenza pandemics. Our calibration targets followed guidelines established by others [15], [16], [17], [18]. Specifically, we used the 30-70 rule developed by Ferguson et al. [16], in which 70% of all transmission occurs outside the household: 33% in the general community and 37% in schools and workplaces. The additional requirement that transmission rates in schools be double those in workplaces was a sensitivity analysis target in [16], as it was in our NYC model.

Calibrating the model involved targeting an epidemic with a 33% attack rate (AR), consistent with the age-specific parameters derived from the 1957-58 pandemic. Daily contact rates were treated as endogenous parameters, interpreted as the daily contact rates that reproduced a pandemic with a 33% AR in a population with no acquired immunity while satisfying the 30-70 rule. Our estimated contact patterns therefore produced a NYC epidemic designed to be similar in transmissibility to the 1957-58 epidemic, with an AR of 33% and a basic reproductive number (R0) of approximately 1.4. Table SI-10 lists the estimated number of contacts per day for each social network category. These estimates offer an additional avenue for validation: if daily contact estimates from an independent source become available, they could confirm (or contradict) our values.
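A check of the 30-70 rule against the infection-source tallies of a simulated epidemic might look like the sketch below. The target shares come from the text; the tolerance, the function names, and the example tallies are assumptions for illustration. How subway infections are allocated among these categories is not stated here, so the sketch simply takes counts for four settings.

```python
# 30-70 rule targets: 30% of transmission in households, 70% outside them
# (33% community, 37% schools and workplaces combined).
TARGET_SHARES = {"household": 0.30, "community": 0.33, "school_workplace": 0.37}

def source_shares(counts):
    """Share of infections by source setting, pooling schools and workplaces."""
    total = sum(counts.values())
    return {
        "household": counts["household"] / total,
        "community": counts["community"] / total,
        "school_workplace": (counts["school"] + counts["workplace"]) / total,
    }

def satisfies_30_70(counts, tolerance=0.01):
    """True if every pooled share is within `tolerance` of its target."""
    shares = source_shares(counts)
    return all(abs(shares[k] - TARGET_SHARES[k]) <= tolerance for k in TARGET_SHARES)

# Hypothetical infection-source tallies from two simulated epidemics.
print(satisfies_30_70({"household": 301, "community": 329, "school": 210, "workplace": 160}))  # True
print(satisfies_30_70({"household": 450, "community": 250, "school": 150, "workplace": 150}))  # False
```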
We also assumed that 50% of sick students and workers stayed at home and did not interact with anyone outside the household. Our workplace absentee rate is consistent with other models; however, our school absentee rate is generally lower than in other models (Ferguson et al. [16] use a 90% absentee rate). In addition, we assumed that student-community and adult-community contacts increased by 50% on weekends.

Table SI-10: Estimated Person-to-Person Contact Values (Contacts per Day)

Place                              Participant         Mean Contacts per Day   Social Network
*Within school                     Student             14.98                   School
*Per firm                          Worker              1.84                    Workplace
*Subway                            Worker (commuter)   33.88                   Subway
*Subway                            Non-worker          6.75                    Subway
*Community                         Non-student         34.80                   Community
*Household                         All                 0.9221(1)               Household
**Classroom                        Student             29.96                   School
**Community, weekday non-school    Student             7.50                    Community
**Community, weekend non-school    Student             11.24                   Community
**Per office                       Worker              3.68                    Workplace

(1) Daily contact probability per person. * Estimated. ** Based on the estimates in rows 1-6.

Contacts within Household

We observed that daily mutual contacts between every pair of family members overstated transmission within households. This made calibration to the 30-70 target criteria impossible unless within-household contacts were restricted to between 2 and 5 contacts per week per interacting pair. Accordingly, this parameter was treated as an endogenous variable in our model and was estimated as part of the calibration process.

Estimation of R0 from inter-pandemic influenza household data

We calculated R0 by two different methods. The first was to average the number of secondary infections resulting from the agents used to seed the epidemic. The difficulty with this approach is that the sample size is too small, leading to statistical errors too large to determine R0 to within a precision of 0.1. The second method is an approximation based on the slope of the logarithm of the cumulative number of cases; see Lipsitch et al. [19].
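The growth-rate approximation of Lipsitch et al. [19] is R = 1 + λγ + f(1 − f)(λγ)², where γ = d ln N/dt is the growth rate of the cumulative case count N, λ is the serial interval, and f is the latent fraction of it. A minimal sketch follows, assuming λ = 1.2 + 4.1 = 5.3 days and f = 1.2/5.3 from the natural history parameters, and estimating γ by differencing log cumulative cases between consecutive days (a simplification of this sketch; a smoothed fit would reduce the early-time noise discussed below). The expression factors as (1 + Lγ)(1 + Dγ) for latent period L and infectious period D, which the test uses as a cross-check.

```python
import math

LATENT_DAYS = 1.2
INFECTIOUS_DAYS = 4.1
SERIAL_INTERVAL = LATENT_DAYS + INFECTIOUS_DAYS   # lambda = 5.3 days
LATENT_FRACTION = LATENT_DAYS / SERIAL_INTERVAL    # f = 1.2 / 5.3

def reproductive_number(gamma, lam=SERIAL_INTERVAL, f=LATENT_FRACTION):
    """R = 1 + lam*gamma + f*(1 - f)*(lam*gamma)**2."""
    x = lam * gamma
    return 1.0 + x + f * (1.0 - f) * x * x

def growth_rates(cumulative_cases):
    """gamma(t) ~ d ln N / dt as ln N(t+1) - ln N(t); counts must be positive."""
    return [math.log(later) - math.log(earlier)
            for earlier, later in zip(cumulative_cases, cumulative_cases[1:])]

def reproductive_number_series(cumulative_cases):
    """R(t) for each pair of consecutive days of cumulative counts."""
    return [reproductive_number(g) for g in growth_rates(cumulative_cases)]

# Synthetic cumulative counts growing 7% per day (illustration only).
cases = [10 * math.exp(0.07 * day) for day in range(30)]
print([round(r, 2) for r in reproductive_number_series(cases)[:3]])  # [1.4, 1.4, 1.4]
```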
In this method, the reproductive number is

$$R(t) = 1 + \lambda\gamma + f(1 - f)(\lambda\gamma)^2,$$

where λ is the serial interval (the mean latent period plus the mean infectious period, 1.2 + 4.1 = 5.3 days for our model), f is the relative duration of the latent period (i.e., f = 1.2 days / 5.3 days for our model), and γ is the derivative of the logarithm of the cumulative number of cases N, i.e., $\gamma = d[\ln N(t)]/dt$. We then calculated the reproductive number as a function of time. Although there are large oscillations at early times because of the larger statistical errors (from fewer cases), R(t) is clearly largest early in the epidemic and drops off rapidly as the epidemic progresses. This is related to school children being particularly important spreaders in the initial stages of an influenza outbreak because of their strong household and school interactions.

The first R0 calculation method can only provide an averaged, static estimate, whereas the second method can describe how R(t) develops over time. The early-time fluctuations and the elevated early reproductive number clearly demonstrate the difficulty of measuring this quantity from the data available in a real epidemic. This behavior is further complicated by the spatiotemporal spread of the epidemic, which causes local variations of R(t) over time.

Assumptions about Population Behavior during a Pandemic

During a pandemic, researchers naturally assume that various spontaneous behavior changes would occur, including temporary school and workplace closures, once disease incidence exceeds some threshold. We know that during past pandemics, schools and theatres closed, some mass gatherings were cancelled, and healthy adults stayed at home to look after sick family members (see Barry [20]). However, even after the 2009 H1N1 pandemic, there are few or no data that researchers can use to accurately parameterize changes in contact rates.
Accordingly, we adopted a baseline epidemic scenario that assumed no such behavioral changes, and then examined social distancing and other interventions against this baseline. The baseline scenarios used here should therefore be viewed as worst cases in terms of how quickly the epidemic peaks and, consequently, how high the peak is. While early spread may occur at the maximal transmission rates seen in past pandemics, later spread may be slowed by spontaneous behavioral changes that reduce peak incidence and extend the duration of the epidemic rather than reduce cumulative attack rates.

We assumed that clinical disease affects individual behavior. Thus, while we assumed that clinical cases are twice as intrinsically infectious as non-clinical cases, we also assumed that 50% of symptomatic individuals reduced their school, workplace, and community contact rates by 50%. This represented the effect of sickness-related absenteeism and withdrawal to the home. We were unable to find data from past pandemics with which to estimate these contact rate reductions.

Transmissibility Scenarios

In light of the analysis above, we focused on two transmissibility scenarios: (1) a 1957-58 type epidemic with R0 = 1.4, and (2) a 2009 scenario with R0 = 1.2, which represented the 1957-58 epidemic but with acquired immunity levels consistent with those reported by Miller et al. [21] in England for the 2009 H1N1 pandemic (see Figure SI-5).

Figure SI-5. Transmissibility Scenarios with and without Immunity (incident infections by day, days 1 to 210; curves: With Immunity, Without Immunity)

Sensitivity Analyses

Worker/Student Ratio

As indicated above, we made assumptions about the proportion of transmission that occurs in schools and workplaces because there were no data with which to estimate these parameters.
For the NYC model, our baseline assumption was that 37% of transmission occurred in these settings, with the within-school transmission coefficient twice the within-workplace coefficient. However, this choice was arbitrary. We tested the sensitivity of the calibration scheme against an alternative calibration method also described by Ferguson et al. [17]. This test investigated an alternative 30-70 calibration rule that assumed equal transmission rates in schools and workplaces (in contrast to the baseline rule, which assumed a 2 to 1 ratio of school to workplace transmission). Figure SI-6 depicts both alternatives applied to a 1957-58 type, R0 = 1.4 epidemic. The assumption of equal transmission rates produced flatter infection curves, with lower peak attack rates and small shifts to the right, but with only a slight reduction (1%) in cumulative incidence. Table SI-11 contrasts the more rapid spread among school-aged children in the 2 to 1 epidemic with the 1 to 1 epidemic; its age-specific attack rate columns show the differential spread in school-aged children under each assumption (69.26% versus 58.85% for ages 5-14). The higher student transmission rates under the 2 to 1 assumption cause infections to spread earlier and faster in children, but the epidemic also dissipates faster. There is no clear evidence favoring one alternative over the other. However, because an epidemic with 2 to 1 transmission peaks more rapidly, policy makers may find it a more difficult scenario to contend with, and for this reason alone we adopted it as the baseline scenario.

Figure SI-6.
Comparison of Calibration Rules for R0 = 1.4 Epidemics (incident infections by day, days 1 to 209; curves: Baseline 2-1, Baseline 1-1)

Table SI-11: Age-Specific Attack Rates (AR) for Alternative Calibration Methods

Age Group   Infected(1)   AR(1) (%)   Infected(2)   AR(2) (%)   Population
0-4           129,587      22.92        125,042      22.12         565,376
5-14          778,264      69.26        661,243      58.85       1,124,576
15-24         385,727      36.69        390,984      37.18       1,051,279
25-44         735,965      28.96        798,454      31.42       2,541,471
45-64         403,019      24.25        335,045      20.16       1,662,265
65+           162,656      18.02        172,872      19.15         902,498
Total       2,595,218      33.07      2,593,640      33.05       7,846,465

(1) 2 to 1 transmission rate assumption. (2) 1 to 1 transmission rate assumption.

Transmissions by Workers Employed Outside of NYC

Workers living in NYC but working in one of the three surrounding states were assumed to commute to those states. These workers would likely be exposed to a comparable epidemic; some of them would become infected at their workplace and might infect their family members upon returning home. We used two assumptions. The first ignored this possibility and assumed that no external infections would occur. In the second, we modified our seeding submodel to seed new infections at a rate comparable to that of the main epidemic. The curves produced by the two assumptions are presented in Figure SI-7. We used a concept devised by Ferguson et al. [16] to seed infections in the NYC region. We modeled the epidemic outside NYC with a simple deterministic SEIR model [22] with the same basic reproductive number (R0 = 1.4). The simple model calculated the incidence of infection through time, I(t).
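The outside-NYC epidemic and the resulting importation rate can be sketched as follows: a deterministic SEIR model integrated with small forward-Euler steps, using R0 = 1.4 and the 1.2-day latent and 4.1-day infectious means, with its daily incidence fed into the imported-infection formula M(t) = nL I(t) / (365 N) with n = 252,564, L = 1.2 days, and N = 19.1 million. The exponential stage durations, the step size, the 10 initial infectious persons, and the function names are assumptions of this sketch, not details reported for the authors' model.

```python
def seir_daily_incidence(r0=1.4, latent_days=1.2, infectious_days=4.1,
                         population=19.1e6, initial_infectious=10.0,
                         days=400, steps_per_day=10):
    """Daily new infections I(t) from a deterministic SEIR model.

    Uses forward-Euler steps of 1/steps_per_day days; the recovered
    compartment is implied as population - S - E - infectious.
    """
    sigma = 1.0 / latent_days        # rate of leaving the latent (E) state
    gamma = 1.0 / infectious_days    # recovery rate
    beta = r0 * gamma                # transmission rate giving the chosen R0
    s = population - initial_infectious
    e = 0.0
    infectious = initial_infectious
    dt = 1.0 / steps_per_day
    incidence = []
    for _ in range(days):
        new_today = 0.0
        for _ in range(steps_per_day):
            infections = beta * s * infectious / population * dt
            onsets = sigma * e * dt
            recoveries = gamma * infectious * dt
            s -= infections
            e += infections - onsets
            infectious += onsets - recoveries
            new_today += infections
        incidence.append(new_today)
    return incidence

def imported_infections(incidence, n=252_564, latent_days=1.2, population=19.1e6):
    """M(t) = n * L * I(t) / (365 * N) for each day's incidence I(t)."""
    return [n * latent_days * daily / (365.0 * population) for daily in incidence]

incidence = seir_daily_incidence()
imports = imported_infections(incidence)
peak_day = max(range(len(incidence)), key=incidence.__getitem__)
print(f"outside-NYC attack rate {sum(incidence) / 19.1e6:.2f}, "
      f"peak on day {peak_day} with M = {imports[peak_day]:.2f} imports/day")
```

Because the SEIR model is homogeneously mixing, its attack rate is higher than the 33% of the calibrated agent-based epidemic; the sketch is meant only to show how I(t) drives M(t).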
The expected number of imported infections per day, M(t), is then given by

$$M(t) = \frac{n L\, I(t)}{365\, N},$$

where n = 252,564 is the number of NYC residents working outside NYC, L = 1.2 days is the average latent period of the disease (we assumed that symptomatic individuals would not travel to work), and N = 19.1 × 10^6 is the population of the greater New York metropolitan region. Figure SI-7 shows the baseline case with and without the effect of workplace infections among workers employed outside the five boroughs. The additional infections caused a slightly more rapid epidemic, with a slight increase of 3% in cumulative infections.

Within-place Group Structure and Targeting

In a manner similar to the scenario reported by Ferguson et al. [17], we incorporated into our model a group structure within schools and firms to represent school classes or groups of closely working colleagues in a workplace. We assumed that the school classroom and the workplace office are social units in which the same people see each other every school or work day in relatively confined spaces, and that they are social units that might be targeted by control policies. While everyone in a school or workplace can contact any other person in that location, we assumed higher contact rates between people in the same group than between people in different groups. Specifically, we assumed that contacts within classrooms and offices were double those within the school or firm as a whole (Table SI-10). This assumption produced slightly larger epidemics and overall had very little impact on baseline dynamics.

Figure SI-7.
Comparison of Baseline Epidemic with and without Non-NYC Workplace Infections (incident infections by day, days 1 to 208; curves: Baseline 2-1, External Seeds 2-1)

Proportion of Infections that Become Clinical Cases

We assumed that 67% of infections would be clinically severe enough for individuals to potentially seek healthcare and thus be defined as clinical cases. We also assumed these individuals to be more infectious than non-cases and more likely to be absent from school or work. This is consistent with Longini et al. [14] and the methods described in Halloran et al. [23]. Other authors [16] assume that half of infections become cases. Making the same assumption in our model decreased the baseline cumulative clinical attack rate for R0 = 1.4 from 33% to 30%, an 8% drop in cumulative infections. Based on this modest change in cumulative infections, we believe that assuming 67% of infections become cases, while producing a more intense epidemic, also produces one that responds more dramatically to interventions and consequently could provide a more optimistic assessment of the feasibility of certain control measures.

Figure SI-8. Comparison of Epidemic Curves for the 67% versus 50% Symptomatic Case Assumption (incident infections by day, days 1 to 208; curves: Asym 67%, Asym 50%)

Figure SI-8 shows the effect of changing the assumed proportion of symptomatic cases for an R0 = 1.4 epidemic, with all other parameters identical. Epidemics were seeded on day 0 with 10 infections.

Behavior of Symptomatic Individuals

Our default assumption was that half of the severely symptomatic children (i.e., clinical cases, 67% of infections) stay home from school, and that half of the symptomatic adults do likewise.
In addition, we assumed that symptomatic individuals have a 50% reduced community contact rate. We found no data with which to verify these assumptions, particularly in a pandemic setting. To check whether these assumptions significantly affected model projections, we raised this rate to 75% (from 50%) and also lowered it to 25% in all cases. That is, we generated simulations that assumed 75% (and 25%) of symptomatic adults were absent from work and that community contact rates were reduced by 75% (25%) for all symptomatic individuals. Keeping R0 fixed, the effect of these changes is shown in Figure SI-9; they produce a swing of approximately 6% in cumulative attack rates.

Figure SI-9. Comparison of Epidemic Curves for the 25%, 50% (Baseline), and 75% Stay-Home Assumptions (incident infections by day, days 1 to 208; curves: Retire to home 25, Retire to home 50, Retire to home 75)

Parameter Estimation Strategy Summary

In our approach, we: obtained ridership data from published sources; used these data to create two separate subway rider groups, commuters and others, with membership in both groups possible; treated the total number of subway commuters as a fixed (exogenous) variable; assumed that classroom daily contacts were twice the estimated within-school contacts and that office contacts were twice the per-firm contacts; and treated the set of contacts as endogenous variables and fit the epidemic model to reproduce disease spread with the properties of a 1957-58 pandemic occurring in present-day NYC.
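The estimation strategy summarized above, together with the criterion described next (the sum of absolute differences between targets and model estimates over six response variables), can be sketched as below. The model run itself and the search procedure are not specified in this section, so the sketch stops at the derived contacts and the loss; all names are hypothetical, the within-school and per-firm inputs are the Table SI-10 values, and the target and estimate numbers are placeholders.

```python
RESPONSES = (
    "attack_rate",               # overall AR, a proxy for R0
    "commuter_share_of_riders",  # proportion of subway riders who are commuters
    "src_household",             # share of infections acquired in households
    "src_school",
    "src_workplace",
    "src_community",
)

def derived_contacts(fitted):
    """Contacts set by fixed ratios rather than fitted directly."""
    return {
        "classroom": 2.0 * fitted["school"],   # classroom = 2 x within-school
        "office": 2.0 * fitted["workplace"],   # office = 2 x per-firm
    }

def calibration_loss(targets, estimates):
    """Sum of absolute target-minus-estimate differences over the six responses."""
    return sum(abs(targets[name] - estimates[name]) for name in RESPONSES)

print(derived_contacts({"school": 14.98, "workplace": 1.84}))
# Placeholder targets and estimates, for illustration only.
targets = {name: 0.30 for name in RESPONSES}
estimates = dict(targets, attack_rate=0.35, src_school=0.28)
print(round(calibration_loss(targets, estimates), 4))  # 0.07
```

An absolute-difference loss weights each response equally regardless of scale, which keeps a single badly missed target from dominating the fit the way a squared loss would.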
The criterion we used to estimate the contacts was the sum of the absolute differences between target values and model estimates of the following six response variables: the overall attack rate (a proxy for R0); the proportion of subway riders who are commuters; and the proportions of infections whose source was households, schools, workplaces, and the community, respectively. The six endogenous contact variables we estimated to minimize this criterion are daily workplace (flu-transmissible) contacts, daily school contacts, daily community contacts, daily subway contacts by commuters, daily subway contacts by non-commuters, and daily household contacts. We also included the probability that a NYC resident rides the subway for non-commuting purposes. The estimated set of contacts is intended to reproduce the 1957-58 pandemic in a NYC setting and constitutes the baseline run. The assessments of interventions are based on the set of contacts that defines the baseline epidemic.

Estimates of the Proportion of Infections that Occur on the Subway

A major question we sought to address is "What is the role of the subway in influenza transmission?" Our estimate necessarily depends on the many assumptions incorporated in the model logic. The model provides a method for estimating the answer to this question, but the estimate has substantial variance. We estimated the variance of the contact estimates from the Monte Carlo process and found those variance estimates to be overly optimistic (i.e., too small), because they were based on a fixed set of assumptions. To provide an estimate that reflects changing assumptions, we present summary results in Table SI-12, which records the proportion of infections occurring on the subway under each assumption used in the sensitivity analysis. Two potential sources of sensitivity not discussed above are included.
These new sources are the probability of infection given contact with an infected person, 0.0048 (as defined in Table SI-7), plus and minus 10%, and the proportion of commuters who ride the subway, 0.47 (see Table 3), plus and minus 10%. The average proportion of total infections that occur on the subway is 4.29%, with a 99% confidence interval of 4.06% to 4.54%. In summary, we estimate that 3.5% to 5.0% of total influenza cases are transmitted while riding the subway.

Table SI-12: Estimates of Subway Rider Infections

Assumption                                  Subway Infections   Total Infections   Proportion on Subway (%)
Baseline 2-1                                153,084             2,598,059          4.38
Baseline 1-1                                135,091             2,599,910          4.73
Baseline 2-1 + WP Seeds                     164,442             2,733,654          4.10
R0 = 1.2                                    133,982             1,592,351          3.66
Stay Home 75%                               122,236             2,430,375          4.09
Stay Home 25%                               184,126             2,753,788          4.67
Asymptomatic = 0.5                          125,789             2,365,661          4.26
Transmission probability - 10%               90,736             2,314,707          3.92
Transmission probability + 10%              133,908             2,838,113          4.72
Estimate of commuter subway riders + 10%    117,325             2,601,461          4.51
Estimate of commuter subway riders - 10%    108,807             2,596,812          4.19

References

[1] Beckman RJ, Baggerly K, McKay M. Creating synthetic baseline populations. Transportation Research Part A: Policy and Practice. 1996;30(6):415-429.
[2] Centers for Disease Control and Prevention (CDC). Serum cross-reactive antibody response to a novel influenza A (H1N1) virus after vaccination with seasonal influenza vaccine. MMWR. 2009;58(19):521-4.
[3] Wheaton WD, Cajka JC, Chasteen BM, Wagener DK, Cooley PC, Ganapathi L. Synthesized population databases: A U.S. geospatial database for agent-based models. RTI Press. 2009; http://www.rti.org/pubs/mr-0010-0905-wheaton.pdf
[4] US Census Bureau; Department of Commerce, Economics and Statistics Administration. 2000 Topologically Integrated Geographic Encoding and Referencing (TIGER) system. Washington, DC: US Census Bureau; 2005.
[5] US Census Bureau; Department of Commerce, Economics and Statistics Administration.
2000 Census of population and housing, summary file 3. Washington, DC: US Census Bureau; 2005 March.
[6] US Census Bureau; Department of Commerce, Economics and Statistics Administration. 2000 Washington, DC: US Census Bureau; 2005 Dec.
[7] Cajka JC, Cooley PC, Wheaton WD. Attribute Assignment to a Synthetic Population in Support of Agent-Based Disease Modeling. RTI Press. 2010; http://www.rti.org/pubs/mr-0019-1009cajka.pdf
[8] National Center for Education Statistics. Common Core of Data Build a Table [Internet]. Washington, DC: US Dept. of Education [date unknown] [cited 2010 Sep 3]. Available from: http://nces.ed.gov/ccd/bat
[9] US Census Bureau. Census 2000 special tabulation: Census tract of work by Census tract of residence (STP 64) [CD-ROM]. Washington, DC: US Dept. of Commerce, Economics and Statistics Administration; 2004 [cited 2010 Sep 3]. Available from: http://www.census.gov/mp/www/cat/decennial_census_2000/census_2000_special_tabulation_census_tract_of_work_by_census_tract_of_residence_stp_64.html
[10] Hu PS, Reuscher TR. New York Household Travel Patterns: A Comparison Analysis. Office of Transportation Policy and Strategy, New York State Department of Transportation, prepared by Oak Ridge National Laboratory. 2007. ORNL/TM-2006/624.
[11] New York City Department of City Planning. Changes in Employment and Commuting Patterns among Workers in New York City and the New York Metropolitan Area, 2000-2007. 2008; http://www.nyc.gov/html/dcp/pdf/census/census_commute_patterns0007.pdf
[12] New York City Department of Health and Mental Hygiene. 2006 Community Health Survey. http://home2.nyc.gov/html/doh/html/survey/survey-2006.shtml. Accessed September 16, 2010.
[13] Levine B, Wilcosky T, Wagener D, Cooley P. Mass commuting and influenza vaccination prevalence in New York City: Protection in a mixing environment. Epidemics. 2010; doi:10.1016/j.epidem.2010.07.002.
[14] Longini I, Nizam A, Xu S, et al. Containing pandemic influenza at the source. Science. 2005;309:1083-1087.
[15] Lee BY, Brown ST, Cooley P, Potter MA, Wheaton WD, Voorhees RE, et al. Simulating School Closure Strategies to Mitigate an Influenza Epidemic. J Public Health Manag Pract. 2010;16(3):252-61.
[16] Ferguson NM, Cummings DA, Fraser C, Cajka JC, Cooley PC, Burke DS. Strategies for mitigating an influenza pandemic. Nature. 2006;442(7101):448-52.
[17] Germann T, Kadau K, Longini IM, Macken C. Mitigation strategies for pandemic influenza in the United States. PNAS. 2006;103(15):5935-5940.
[18] Ferguson N, Cummings D, Cauchemez S, et al. Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature. 2005;437:209-214.
[19] Lipsitch M, Cohen T, Cooper B, Robins JM, Ma S, James L, Gopalakrishna G, Chew SK, Tan CC, Samore MH, et al. Science. 2003;300:1966-1970.
[20] Barry JM. The Great Influenza: The Epic Story of the Deadliest Plague in History. Penguin; 2005.
[21] Miller E, Hoschler K, Hardelid P, Stanford E, Andrews N, Zambon M. Incidence of 2009 pandemic influenza A H1N1 infection in England: a cross-sectional serological study. Lancet. 2010;375(9720):1100-1108.
[22] Anderson RM, May RM. Infectious Diseases of Humans: Dynamics and Control. Oxford: Oxford University Press; 1991.
[23] Halloran ME, Eubank S, Ferguson NM, Longini IM, Barrett C, Beckman R, Burke DS, Cummings DA, Fraser C, Germann TC, Kadau K, Lewis B, Macken CA, Vullikanti A, Wagener DK, Cooley PC. Modeling targeted layered containment of an influenza pandemic in the USA. Proceedings of the National Academy of Sciences. 2008; published online.