Principal Investigator/ Program Director: Emch, Michael, Edward

Introduction

This project will model spatio-temporal fluctuations of cholera in Bangladesh and Vietnam by integrating spatial data sets including satellite imagery, climatic variables, and socio-demographic data. Lobitz et al. (2000) suggested that cholera is influenced by climatic changes, which can be indirectly measured using satellite imagery. They illustrated that sea surface temperature (SST) and sea surface height (SSH) in the Bay of Bengal were associated with temporal fluctuations of cholera in Dhaka, Bangladesh from 1992 to 1995. In this proposed study we will expand their model by: (1) including several more satellite-derived biophysical variables in three additional study areas; (2) investigating how temporal associations with satellite-derived biophysical variables vary in space (i.e., between and within study areas); and (3) using satellite imagery to model changes in estuaries, the postulated environmental reservoir for cholera. The specific variables that we will incorporate into the model of the spatio-temporal distributions of cholera include SSH derived from the TOPEX/Poseidon satellite; SST derived from Advanced Very High Resolution Radiometer (AVHRR), ADEOS Ocean Color and Temperature Scanner (OCTS), Terra Moderate-resolution Imaging Spectroradiometer (MODIS), and Aqua MODIS; chlorophyll concentration derived from the Coastal Zone Color Scanner (CZCS), OCTS, SeaWiFS, Terra MODIS, and Aqua MODIS; flooding derived from Radarsat and the European Remote Sensing (ERS) satellite; land use/ land cover (LULC) derived from the Landsat Multispectral Scanner (MSS), Thematic Mapper (TM), Enhanced Thematic Mapper (ETM+), and ASTER sensors; monthly temperature and rainfall from weather stations; and population distribution and socio-economic status from spatially referenced demographic databases.
Associations between these variables and cholera incidence in Bangladesh and Vietnam can be used to predict future epidemics in other parts of the world. This project is a unique interdisciplinary opportunity to merge geographic, epidemiological, and ecological theories and methods for infectious disease research. While the emergence and fluctuation of cholera is not well understood, it is clear that where, when, and how many people contract cholera is related to distributions of environmental and climatic variables. People contract cholera when they ingest an infective dose of Vibrio cholerae bacteria, which have been shown to be present in estuaries, ponds, lakes, and rivers. Fluctuations in bacteria populations depend on environmental conditions such as water temperature and plankton concentrations. Identifying the niches of cholera requires defining the areas of increased risk of contracting the disease. This study will use geographic information technologies including satellite remote sensing to model environmental, climatic, and socio-demographic dynamics, and will compare these distributions with fluctuations in the spatio-temporal distributions of cholera in Bangladesh and Vietnam. In Vietnam, cholera case data will be compiled from hospital records from 1980 to 2003. In Bangladesh, cholera case data will be compiled from treatment facility records from 1983 to 2003. The locations of all cholera cases will be mapped and integrated with the environmental, climatic, and socio-demographic variables that will be derived from the satellite sensors, local population census databases, and other secondary data sources. 
The main research question for this project is: What are the spatio-temporal associations between cholera incidence and satellite-derived environmental variables (i.e., chlorophyll concentration, SST, SSH, rainfall, LULC, flooding), climatic variables (i.e., in situ rainfall, temperature), and socio-demographic variables (i.e., population density, socio-economics)? The study areas include Hue and Nha Trang, Vietnam and Matlab, Bangladesh. The broader impact of this study is both theoretical and methodological. It will describe how satellite imagery and other spatial information can be used to predict cholera outbreaks. Thus, the project will build upon recent literature describing the use of remote sensing in health research (Epstein, 1998; Beck et al., 2000). We will report how much of the spatio-temporal variation of cholera can be explained using the aforementioned predictor variables in Bangladesh and Vietnam. We will investigate whether there are patterns to relationships in time and space. In other words, we will determine whether the relationships in a particular time and place (e.g., 1992-95 in Dhaka) are also significant and informative in other times and places. The results of the proposed study will also serve as methodological case studies that describe how specific satellite image products can be used to predict cholera in areas that have had many severe cholera epidemics during their respective study periods. Satellite imagery will also be used to model how local and regional ecosystems are changing and how changes in the different ecosystems affect humans (i.e., cholera distributions). The findings can then be used to extrapolate to other areas of the world where cholera epidemics might occur in similar ecosystems. During the present cholera pandemic, cholera has spread to similar ecosystems in Asia, Africa, and South America. Ali et al.
(2002a) predicted that the next cholera pandemic, caused by a new genetic variant of cholera (i.e., Vibrio cholerae O139), might also spread to similar ecosystems around the world. This study is a collaborative effort between multidisciplinary investigators at the International Centre for Diarrhoeal Disease Research, Bangladesh (ICDDR,B), the Vietnam National Institute of Hygiene and Epidemiology (NIHE), the International Vaccine Institute (IVI) and Portland State University (PSU).

Theoretical Context

In order to advance the philosophical and theoretical implications of this study it is necessary to situate it within a theoretical context. In a recent speech to the Royal Swedish Academy of Science, Professor Rita Colwell described her vision of how science will become more interdisciplinary and suggested that there is a need for new frameworks of inquiry (Colwell, 2002a). Her concept of biocomplexity is pertinent to this study as it “denotes the study of complex interactions in biological systems, including humans, and their physical environments.” She believes that old science relied too much on a reductionist approach and that new frameworks should be integrative and use modern tools such as remote sensing and information technologies. Professor Colwell developed many of these concepts using cholera as an example (Colwell, 2002b). She stated that “the cholera story, still to be fully unraveled, embraces environmental factors from the cellular level to the scale of global climate.” She defined biocomplexity as “the dynamic web of interrelationships that arise when living entities at all levels, from genes to human beings to ecosystems, interact with their environment.” Her case example of cholera calls for a dynamic view of the disease looking at the complex and dynamic interactions between environment, host (humans), and disease agents (Figure 1).
She argues that real world phenomena such as cholera distributions have interactions across many scales, and that many variables concerning the interactions of humans with their environment must be considered to understand this complex disease.

[Figure 1: Colwell’s Biocomplexity Theory Applied to Cholera (after Colwell, 2002a). Diagram relating Environment, Agent, and Host; labels include longevity and infectivity, distribution and transport, altered selective pressures, pathogenicity, immune response, host specificity, nutrition, hygiene, treatment, and housing.]

This project can be situated within the human-environment tradition of geography. Theoretical concepts of human-environment connections have evolved within the field of geography for more than a century. Recent theoretical progress within the human-environment tradition of geography parallels other ecological fields. New ecological theories assume that there are non-equilibrium conditions in environment and change processes and that there are significant human impacts on natural environments in areas that were once thought to be natural (Zimmerer, 1994; Zimmerer and Young, 1998). There is a growing understanding that environmental change and its causes vary across settings; thus, research involving human-environment interaction must be multifaceted, multiscalar, and multitemporal. The proposed study uses a holistic approach to investigate the distribution of cholera in space and time. Turner (2002) recently suggested that the field of geography should become a “human-environment science” and that inquiry of “coupled natural-human systems” is a logical division of the systematic sciences. Medical geographers have been developing geographic theory that is aligned with both Turner’s definition of what geography should be in the future and Colwell’s philosophy of a future science.
The medical geographic theoretical approach of disease ecology maintains that disease results from a dynamic complex of variables that coincide in time and space (May, 1958, 1977; Mayer, 1982, 1984, 2000; Mayer and Meade, 1994; Meade, 1977; Meade et al., 1988; Meade and Earickson, 2000; Learmonth, 1988; Paul, 1985; Pyle, 1977, 1979). Hunter (1974) argues that we must not have a pathogencentric view of disease, i.e. one that focuses only on the disease agent. He suggests that our studies of disease "must co-jointly involve pathogen, host, and environment" (Hunter, 1974). He views environment broadly as consisting of "diverse physical, biological, social, cultural, and economic components" (Hunter, 1974). Hunter defines geography as a discipline that bridges the social and environmental sciences and writes that "its integration and coherence derive from systems-related analysis of man-environmental interactions through time and over space" (Hunter, 1974). This medical geographic approach is holistic recognizing that one must investigate the integration of many different types of variables responsible for disease. While types of variables to be investigated have been classified in many different ways, Mayer's (1986) classification system is most useful. Mayer differentiated between biological, socioeconomic, behavioral, and environmental variables. Biological variables are those that describe biological characteristics of the host (e.g., blood type). Behavioral variables are those that describe individual or group behaviors and may be related to culture or individual decision making (e.g., what types of food people eat). Environmental variables are those of the biophysical environment (e.g., climatic variables). Socioeconomic variables are variables that affect the coincidence of agent and host (e.g., wealth or class). Different patterns of socioeconomic, biological, and environmental variables result in different spatial and temporal patterns of disease. 
Virtually every disease exhibits spatial and temporal variation and medical geographers attempt to explain this variation. This study goes beyond our previous medical geography/ spatial epidemiology work on cholera (Emch, 1999, 2000; Emch and Ali, 2001, 2003; Ali et al., 2002a, 2002b, 2002c, 2002d) since it describes the use of satellite imagery for measuring environmental variables and how they can be used to predict disease. It also investigates how the spatio-temporal distributions of cholera vary within and between ecosystems, and how these relationships have changed over a 20-year period, during which there have been significant changes in the environment, the human dimension, and the genetic strain of the disease agent.

Literature Review

Cholera is an acute infection caused by the colonization and multiplication of Vibrio cholerae O1 or O139 within the human small intestine. The incubation period ranges from one to five days and the disease is characterized by watery diarrhea, muscle cramps, vomiting, and dehydration. Vibrios are water-borne organisms that are natural inhabitants of seas, estuaries, brackish waters, rivers, and ponds of coastal areas of the tropical world. They flourish in the dense organic matter, algae, and zooplankton of the Ganges delta and similar ecosystems. Lipp et al. (2002) offered a hierarchical model for environmental cholera transmission that includes abiotic factors, phytoplankton, and zooplankton leading to human ingestion of an infective dose of V. cholerae. Abiotic conditions including temperature, pH, Fe³⁺, salinity, and sunlight influence vibrio growth and expression of virulence genes such as those that regulate cholera toxin (responsible for watery diarrhea) (Lipp et al., 2002; Faruque et al., 1998). These abiotic factors influence phytoplankton and aquatic plants, which promote survival of V. cholerae and provide food for zooplankton. V.
cholerae proliferate in an environment that includes commensal copepods and crustaceans because they provide attachment sites for vibrios to multiply and serve as a vector to transmit an infective dose to humans (Huq and Colwell, 1995). In their model, Lipp et al. (2002) also note that there are other influences on cholera transmission including climate variability (i.e., climate change, El Nino-Southern Oscillation [ENSO], North Atlantic Oscillation), seasonal effects (i.e., sunlight, temperature, precipitation, monsoons), and human dimensions (i.e., socioeconomics, demographics, and sanitation). Pascual et al. (2000) found that the temporal variability of cholera is associated with three inter-related climate variables including upper troposphere humidity, cloud cover and top-of-atmosphere absorbed solar radiation. Rodo et al. (2002) suggest that because of ENSO-related climate variability patterns, there is a 4-year periodicity to the temporal cycle of cholera. The theoretical basis for the Lobitz et al. (2000) study and for the cholera prediction model that we will build is as follows. Increased SST facilitates phytoplankton growth and therefore commensal copepods that eat the phytoplankton flourish (Kiørboe, 1994; Huq and Colwell, 1996). Lobitz et al. (2000) suggest that there is a relationship between SST and phytoplankton concentrations and that this is the reason for the relationship with cholera. After an initial lag period, V. cholerae proliferate and are subsequently transmitted to humans. Lobitz et al. (2000) argue that SSH is related to human-Vibrio contact because it causes tidal intrusion of plankton, which transports the bacteria into inland waters. They found a direct temporal relationship between SSH and cholera outbreaks.
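The lagged relationship described above (a rise in SST followed, after a delay, by a rise in cholera) can be illustrated with a simple lagged correlation. The following is a minimal sketch only, not the model proposed in this study or the method of Lobitz et al. (2000): the function names, the use of Pearson correlation, and the monthly values are all assumptions made for illustration, and the data are synthetic.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / sqrt(sxx * syy)


def lagged_correlations(predictor, cases, max_lag):
    """Correlate the predictor at month t with cases at month t + lag."""
    results = {}
    for lag in range(max_lag + 1):
        x = predictor[: len(predictor) - lag]  # drop the last `lag` months
        y = cases[lag:]                        # drop the first `lag` months
        results[lag] = pearson(x, y)
    return results


# Synthetic monthly values, for illustration only (not study data).
sst = [26, 27, 29, 30, 28, 26, 25, 27, 29, 30, 28, 26]
# Cases are constructed to follow SST with a two-month delay.
cases = [0, 0] + [10 * (v - 24) for v in sst[:-2]]

correlations = lagged_correlations(sst, cases, max_lag=4)
best_lag = max(correlations, key=correlations.get)
```

Because the synthetic case series is built from SST shifted by two months, the correlation peaks at a lag of two. With real surveillance data the peak would be weaker, and autocorrelation and seasonality would need to be handled before any lag is interpreted.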
In our proposed study we will test whether SSH is related to cholera in our three study areas and use satellite-derived chlorophyll levels in water as a surrogate for phytoplankton as well as SST. We will investigate these phenomena at a much more detailed spatial scale than did Lobitz et al. (2000), in several different environments, over a much longer time period, and using several more variables including those that incorporate humans in the cholera ecosystem such as socioeconomics and demographics. The human dimension is essential, enabling us to describe the human circumstances in which prediction of cholera based on biophysical variables will work. In order to contract cholera, a person must ingest an infective dose of V. cholerae, which is about 10⁶ bacteria. Cholera is either foodborne or caused by ingesting fecally contaminated water (Hughes et al. 1982; Khan et al. 1981; Spira et al. 1980). Ingestion can occur if a flood overruns the sewage system or non-septic latrines, thus contaminating water supplies. Ingestion can also occur when people swallow contaminated water while bathing or washing, or when they drink or cook with untreated water (Figure 2).

Figure 2: Major Transmission Routes of Cholera (modified from Mintz et al., 1994)

The large amount of bacteria excreted in the feces of infected people can cause massive environmental pollution. Poor sanitation facilitates the fecal-oral transmission process. Aquatic reservoirs also facilitate cholera transmission by providing long-term natural habitats for the pathogen; the consumption of water or food from such reservoirs puts humans at risk of infection (Figure 2). V. cholerae can survive and grow in food when the conditions are adequate including low temperatures, high moisture levels, high organic content, and near-neutral pH (DePaola 1981; PAHO 1991). V. cholerae can survive on most foods from 2 to 14 days (PAHO 1991).
Cholera transmission can be divided into primary and secondary types (Colwell and Spira, 1992). Primary cases are the result of infection by surface water sources; for example, a person is directly infected with the bacteria by drinking untreated pond water or eating undercooked shellfish. Secondary cases are people infected by fecal-oral transmission from other people; for example, a healthy family member is infected by a sick family member who puts his/her hands in the family's drinking water pot. Another example of secondary transmission is when a mother is infected by the feces of her baby. Primary transmission is controlled by factors such as temperature, salinity, nutrient concentrations, the number of available attachment sites (plankton), shellfish consumption, and contact with water (Colwell and Spira, 1992). Emch (1999) found that sanitation and water availability and use are extremely important in the effort to reduce secondary cholera transmission. Sommer and Woodward (1972) found an inverse relationship between diarrhea and access to tube well water. Khan (1981) found that people were more likely to contract cholera if they had greater access to canal water compared with river or pond water. Glass et al. (1982) reported higher cholera incidence rates in villages that are not adjacent to rivers. Hughes et al. (1982) found that people who used contaminated surface water for cooking and bathing were more likely to contract cholera than those who did not. Several studies have found that risk of diarrheal diseases is associated with environmental variables. Emch (1999, 2000) found that cholera is associated with flood control projects. A recent study found that the influx of fresh water from rainfall events upstream from an estuary led to increases in vibrio populations (Colwell, 2002a, cited study by Valerie Louis). Before 1963, classical Vibrio cholerae O1 was the dominant strain of cholera in Bangladesh. 
A new strain, Vibrio cholerae O1 El Tor, was initially recognized as a mild cholera-like disease in an Indonesian village in 1937 and was confined there for approximately two decades (Burua, 1992). In 1959, however, it was detected in Thailand, and by 1963 it had spread to India and Bangladesh in pandemic form. It then spread throughout the world, attacked more than half a million people, and claimed 5000 lives within eighteen months (Epstein, 1997). The recent history of the disease suggests that classical cholera was entirely replaced by El Tor. The replacement occurred because the environmental niches where the vibrios live are the same for the two strains (Ali et al., 2002a). They share similar ecological environments and patterns of transmission (Shears, 1994). In 1992, a new strain, Vibrio cholerae O139, emerged in India and subsequently began to spread to Bangladesh and neighboring countries in 1993 (Shimada et al., 1993; Cheasty et al., 1993; Attapattu, 1994; Tay et al., 1994; Sachdeva et al., 1995; Siddique et al., 1996; Dalsgaard et al., 1996, 1998). Vibrio cholerae O139 has spread even more rapidly than did O1 El Tor. This raises the question of whether El Tor will be replaced by O139 in the future. While the symptoms of classical, El Tor, and O139 cholera are similar in Bangladesh, there are some differences in the seasonal cycles of the different strains; for example, O139 cases appear later in the post-monsoon period (personal communication, M. Yunus, 2003). In Bangladesh, cholera transmission is seasonal with a peak after the monsoon, extending from September to December (Emch, 1999). Baqui et al. (1992) identified two cholera peaks, one sometime between September and December, and the other between March and June.
Colwell and Spira (1992) suggested that the post-monsoon epidemic is associated with a heavy bloom of zooplankton; they postulated that there is a permanent environmental reservoir for Vibrio cholerae in the brackish ponds and canals of rural Bangladesh. Oppenheimer et al. (1978) reported that zooplankton populations decrease during the monsoon season and then increase after the monsoon because of phytoplankton bloom. Emch and Ali (2001) described the temporal cycle of cholera epidemics in Matlab, Bangladesh (Figure 3). They found that during a three-year study period (1992 through 1994) there were three main cholera peaks in September, October, or November. The 1992 epidemic was far less severe than the 1993 and 1994 epidemics. Secondary epidemics occurred in March and April in all three years; however, in 1992 the secondary epidemic was more severe than in the other two years. Cholera cases were completely absent near the beginning of each year.

[Figure 3: Temporal Distribution of Cholera in Matlab, Bangladesh. Plot of cholera cases (axis range 0 to 90) by date from 1/1/92 to 11/1/94, with "Pre" and "Post" labels.]

Specific Research Objective, Questions, Hypotheses

The objective of this study is to investigate the spatio-temporal dynamics of cholera in Bangladesh and Vietnam and to develop a cholera prediction model. We will answer the following research questions:

1. What are the spatio-temporal associations between cholera incidence and satellite-derived environmental variables (i.e., chlorophyll concentration, SST, SSH, rainfall, LULC, flooding), climatic variables (i.e., in situ rainfall, temperature), and socio-demographic variables (i.e., population density, socio-economic status)?
2.
How do associations between cholera and satellite-derived biophysical, climatic, and sociodemographic variables vary in space (i.e., between and within study areas), time (i.e., during the 20-year longitudinal study period), and by cholera strain (i.e., classical, El Tor, O139)?
3. Are changes in estuaries (i.e., turbidity, chlorophyll concentration) and areas around estuaries (i.e., LULC) related to cholera incidence?

We will test the following hypotheses in each of the study areas:

1. There is a relationship between satellite-derived SSH and cholera incidence.
2. There is a relationship between satellite-derived SST (time lagged) and cholera incidence.
3. There is a relationship between satellite-derived chlorophyll concentration (time lagged) and cholera incidence.
4. There is a relationship between satellite-derived flooding and cholera incidence.
5. There is a relationship between in situ and satellite-derived rainfall and cholera incidence.
6. The aforementioned relationships between cholera and environmental and climatic variables (hypotheses 1-5) are influenced by demographic and socioeconomic distributions (i.e., the relationships in the physical world vary across socio-demographic situations).
7. The model of spatio-temporal fluctuations of cholera is neither constant nor linear in time and/or in space.
8. The model of spatio-temporal fluctuations of cholera is not constant by dominant cholera strain (classical, El Tor, O139).

Study Data

Table 1 summarizes the source and availability of each environmental, climatic, and sociodemographic variable. The cholera incidence data are described below in each study area description. Different data are available for different time periods. For instance, at the beginning of the study period, in 1983, the only available satellite data are intermittent CZCS images for chlorophyll concentration, AVHRR for SST, and Landsat MSS for LULC around estuaries.
At the end of the study period in 2003, almost all of the satellite data sources are available and the sensors generally have much better spatial and spectral resolutions and therefore provide more specific information to input into the models. Also, in Bangladesh, cholera incidence can be mapped at the extended household unit level throughout the study period because both the cholera numerator and population denominator are available. However, in both Vietnamese study areas only recent cases can be mapped at the household level; cases before 1990 must be mapped by local-level administrative unit (hamlet).

The Bangladesh Study Area: Matlab

The research site for the ICDDR,B and for this project is called Matlab because the Centre's hospital is located in Matlab Town. Matlab is in south-central Bangladesh, approximately 50 kilometers south-east of Dhaka, adjacent to where the Ganges River meets the Meghna River forming the Lower Meghna River. Figure 4 shows the Matlab study area relative to the Meghna River. The river flowing adjacent to Matlab Town is the Dhonagoda River.

[Figure 4: Study area superimposed on Landsat TM satellite image. Map labels: Meghna River, Matlab, Study Area.]
Table 1: Source and Availability of Independent Variables (Note: Appendices 1 & 2 list satellite data product attributes, sources, and costs in greater detail)

Variable: Data Source and Availability

Environmental Independent Variables
Chlorophyll Concentration in Water: CZCS (1978-86 intermittent), ADEOS OCTS (1996-97), SeaWiFS (1997-), Terra MODIS (1999-), Aqua MODIS (2002-)
Sea Surface Temperature: AVHRR (1979-), Terra MODIS (1999-), Aqua MODIS (2002-)
Sea Surface Height: TOPEX/ Poseidon (1992-)
LULC Change In and Around Estuaries: Landsat MSS (1972-), TM (1986-), ETM+ (1999-March 2003), ASTER (1999-), Hyperion (2000-)
Water Turbidity: Landsat MSS (1972-), Landsat TM (1986-), ETM+ (1999-March 2003), AVHRR (1979-), ASTER (1999-)
Flooding: ERS (1992-), Radarsat (1996-)
Distance from Water Bodies/ Flooding: Landsat MSS (1972-), Landsat TM (1986-), ETM+ (1999-March 2003), AVHRR (1979-), ASTER (1999-), ERS (1992-), Radarsat (1996-), GIS Analysis

Climatic Independent Variables
Monthly Rainfall: Weather stations (Bangladesh- Chandpur; Vietnam- Hue & Nha Trang; Mozambique- Beira and Quelimane)
Monthly Temperature: Weather stations (Bangladesh- Chandpur; Vietnam- Hue & Nha Trang; Mozambique- Beira and Quelimane)

Socio-demographic Independent Variables
Population Distribution: Demographic surveillance systems, vaccine trial databases, and/or census combined with GIS (availability varies by time- see study area descriptions)
Socioeconomic Distribution: Demographic surveillance systems, vaccine trial databases, and/or census combined with GIS (availability varies by time- see study area descriptions)

A demographic surveillance system (DSS) has recorded all vital events of the study area population since 1963; the study area population has been approximately 200,000 since that time. The database is the most comprehensive longitudinal demographic database of a large population in the developing world.
The people of the study area live in clusters of patrilineally-related groups of households called baris. The P.I. created a vector GIS database of the Matlab field research area (Emch, 1995; Emch, 1998; Emch, 1999; Ali et al., 2001a). Features in digital format include baris, rivers, health facilities, and a flood-control embankment. Figure 5 shows three features in the GIS database including the flood-control embankment, the Dhonagoda River, and baris.

Figure 5: Study area GIS database.

The three map views in Figure 5 are displayed at different scales. The map view on the far right has the individual bari identification numbers visible. The baris are all identified by an ICDDR,B DSS census number within the structure of the GIS database. This allows us to link attribute data to the spatial database. In turn, demographic, disease, and other data can be linked to specific bari locations. The Matlab field research center has in- and out-patient services, a medical laboratory, and research facilities. One hundred twenty community health workers (CHWs) visit each household area every two weeks to collect demographic, morbidity, and other data. The DSS conducts periodic censuses and uses CHWs to update demographic data (e.g., births, deaths, and migrations).

The Vietnam Study Areas: Nha Trang and Hue

The Vietnam case study areas are Nha Trang and Hue, both of which are in coastal regions (Figure 6). Figure 7 is a map of several features of Nha Trang including administrative units (communes), rivers, lakes, roads, railroads, and the locations of the two estuaries in the study area. Cholera caused by Vibrio cholerae O1 El Tor first appeared in Vietnam in 1964. Cholera case data will be compiled from hospital records from 1980 to 2003 and from a household-level vaccine trial database from 1995 to 2003.
While there are presently many cases of cholera in Hue (approximately 200 laboratory confirmed cases in one part of Hue in 2003), there have been no cases in Nha Trang since 2000; the reason for the disappearance of cholera is unknown and is confusing in the light of the current socio-environmental conditions. To prepare for this proposed study, in September 2003 the P.I. took a boat into the northern estuary of Nha Trang (Figure 8). At the red dot there was a dense slum with approximately 15 latrines hanging over the water. The black dots are the individual households, mapped using global positioning system (GPS) receivers as part of a census that was done for the vaccine trial. There were small children swimming in the water near the hanging latrines; thus, the socio-economic and sanitation environment was perfect for cholera yet there has not been a case for several years. Cholera transmission will not occur unless the environmental situation is right for the disease. Since Hue and Nha Trang possess similar sociodemographics, some yet undetected difference must exist that is responsible for the different cholera patterns. One obvious difference between the two study areas is that Hue is farther up the estuary and the feeder river is much larger (Figure 9). However, there may be other differences that can be determined using satellite imagery. The locations of cholera cases will be mapped at different levels. This study will derive cholera incidence from commune health center records and the population database that is being created jointly by the NIHE and IVI for vaccine trials and disease burden studies in Hue and Nha Trang. A detailed population census has been conducted for the 300,000 persons living in Nha Trang, and the locations of each of the 40,000 households have been mapped using GPS receivers.
A similar spatial and population database has been compiled for the 285,000 persons living in 56,000 households in Hue. Cholera cases will be derived from hospital records.

Figure 6: Study Area Locations

Figure 7: Nha Trang GIS Database

Independent Variables

The beginning of this project will involve conducting a comprehensive search (via various web and ftp sites) for all of the available cloud-free satellite images for the three study areas during the entire study period. First, we will collect and/or model the data for the independent variables that are hypothesized to be related to cholera distributions; next we will divide the data into spatial units for statistical analysis. The proposed study investigates local statistical relationships (Fotheringham et al., 2002) and thus we will divide variables into spatial subsets. Lobitz et al. (2000) measured the temporal relationship between cholera and SST and SSH at one point in the Bay of Bengal. We will base our analyses on many points and/or average values within or adjacent to cholera data collection units. Some variables are continuous (e.g., SST, SSH, chlorophyll); however, other data sets, including those collected from weather stations, are not available as spatially-continuous distributions. Therefore, these variables will be incorporated into the models only as temporally varying distributions. Some of the independent variables listed in Table 1 do not require any preprocessing because they do not vary in space (i.e., the monthly temperature and rainfall from weather stations). Others require minimal data processing by the investigators because they are collected as derived variables.
The calculations used to compute the derived variables are too complex to describe here; they are described in detail in the following sources (Strong et al., 1984; McClain et al., 1985; Esaias et al., 1998; Mitchum, 1998; O'Reilly et al., 1998; Walton et al., 1998; Brown and Minnett, 1999; Chambers et al., 2003). The variables that will be derived from secondary sources include AVHRR and MODIS-derived SST (Figure 10) and chlorophyll concentration derived from CZCS, OCTS, SeaWiFS, Terra MODIS, and Aqua MODIS (Figure 11). These datasets will be georeferenced, a lengthy process because of the large number of satellite images.

Figure 8: Study Area Locations
Figure 9: Nha Trang GIS Database

Some independent variables will require a significant amount of preprocessing work to derive the predictor variables from the satellite imagery or primary data sources. These variables include water turbidity and LULC (derived from Landsat MSS, TM, ETM+, AVHRR, and ASTER) as well as flooding (derived from Radarsat and ERS) (Figure 12). The satellite-derived data have various spatial resolutions including ≤30 meters (e.g., multispectral bands of Landsat TM and ETM+ and radar imagery), 79 meters (e.g., Landsat MSS), 250 meters (e.g., MODIS-derived chlorophyll and SST), 1.1 kilometers (e.g., AVHRR-derived SST), and 4 kilometers (i.e., SeaWiFS) (Figure 11). Measuring relationships between cholera and independent variables collected at various spatial resolutions requires that the images be resampled to a common resolution. The dependent variable (i.e., cholera incidence) will also need to be calculated at the same resolution. The models may be built at different resolutions for different time periods, depending on which satellite data are available.
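The resampling step described above can be illustrated with a minimal sketch. It assumes the simplest possible case: a fine-resolution grid is aggregated to a coarser one by averaging square blocks of cells, with masked cells (e.g., cloud cover) represented as None. The function name block_average and the sample values are hypothetical and are used only for illustration; in the project itself, resampling would be carried out within GIS software.

```python
def block_average(grid, factor):
    """Aggregate a fine raster to a coarser one by averaging factor x factor blocks.

    Cells set to None (e.g., cloud-masked pixels) are skipped; a block with
    no valid cells yields None.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            # Collect the valid cells that fall inside this coarse block.
            values = [
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
                if grid[r][c] is not None
            ]
            out_row.append(sum(values) / len(values) if values else None)
        coarse.append(out_row)
    return coarse


# Hypothetical 4 x 4 chlorophyll grid (illustrative values, not real data).
fine = [
    [1.0, 3.0, 2.0, 2.0],
    [1.0, 3.0, None, 2.0],
    [4.0, 4.0, None, None],
    [4.0, 4.0, None, None],
]
coarse = block_average(fine, 2)
print(coarse)
```

Under the resolutions listed above, aggregating a 250-meter grid to a 4-kilometer grid would correspond to a factor of 16. Block averaging is only one of several possible resampling rules and is shown here because it is the easiest to verify by hand.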
When only SeaWiFS data are available to describe chlorophyll distributions for a given period, for example, it is not possible to build a model that predicts cholera distributions at a resolution finer than 4 kilometers. Since 1999, when Terra MODIS was launched, chlorophyll concentration and SST have been available at a spatial resolution of 250 meters (Figure 10). Aqua MODIS was launched in 2002, and thus two SST and chlorophyll scenes of the same area are now collected each day, Terra MODIS in the morning and Aqua MODIS in the afternoon. Once all of the images are georeferenced, they will be integrated and resampled to a common spatial resolution within geographic information system (GIS) software.

Figure 10: (A) Terra MODIS Derived Chlorophyll (B) Terra MODIS Derived SST
Figure 11: Chlorophyll (A) 2002 SeaWiFS of Vietnam (B) 1980 CZCS Asia and Africa (C) 1997 OCTS Bangladesh

Landsat and other satellite imagery will be used to measure ecological change in and around the estuaries for as many dates as possible during the study period. Figure 13 is a 1997 Landsat TM image of the southern estuary in the Nha Trang, Vietnam study area (also see Figure 7 for a study area map).

Figure 12: ERS Image of the Matlab Study Area During the 1998 Monsoon
Figure 13: Landsat TM Image of Nha Trang

The left map view shows the region around Nha Trang, and the right map is a magnified view of the estuary. Using image-processing software, the satellite data will be used to measure environmental changes in natural vegetation, anthropogenic features, and water quality (i.e., turbidity and floating vegetation). Satellite imagery from the aforementioned sensors will be used to classify different LULC classes that may be related to cholera incidence. The images will first be radiometrically corrected so that all scenes are comparable.
We will use the dark-object subtraction algorithm as described in Song et al. (2001). The imagery will then be classified into LULC classes. First, we will develop a classification scheme that will include such coarse classes as forest, water, soil, impervious surface, agricultural lands, aquaculture areas, and grasslands. Then, we will attempt to model more detailed classes such as specific crops. We will use a variety of image processing methods including traditional hard classifiers (e.g., maximum likelihood) as well as knowledge-based schemes (i.e., those that rely on GIS data). While we lack ancillary data that would provide accurate training data for classifying the retrospective satellite imagery, we will collect ground-truth data through field visits. These field visits will involve both collecting training data for modern features (that may or may not have changed during the study period) and asking land holders what the LULC was in the past. We will show the land holders the satellite images and ask them if they remember what the LULC class was when the imagery was acquired. Several environmental variables will be modeled using multitemporal satellite imagery and subsequent GIS analysis. There are numerous canals and rivers in the study areas, which are often connected to open latrines. We assume that the use of these water bodies for bathing, washing, and cooking is greater for people who live closer to them. While approximately 95 percent of the people living in Matlab drink tube well water (Emch, 1999), people can contract cholera by swallowing water while they are bathing. This is supported by a recent study (Sack et al., 2003) that found that people in Matlab who bathed exclusively with tube well water were 0.4 times as likely to contract cholera as people who used pond and/or river and/or canal water.
Also, rural Bangladeshis use pond or river water for cooking; if they do not boil the water, they are at risk of contracting the disease. Thus, while people are not usually drinking water directly from the cholera reservoir, access to these water sources has been shown to be a risk factor for the disease. This mode of transmission is also likely to occur in Vietnam and Mozambique. We will compute distance from water bodies using Landsat satellite imagery (i.e., MSS, TM, ETM+, or ASTER, depending on the availability of cloud-free dates) for the dry season and radar satellite data (i.e., Radarsat and/or ERS) for the rainy/monsoon season (unless cloud-free Landsat imagery is available, though this is unlikely). Satellite data will be acquired for several dates during the 20-year study period based on availability. After rectification, the imagery will be classified into water and nonwater classes, and then a distance surface will be created from the water pixels. The distance surface created using the dry season images will describe proximity to the permanent surface water features including rivers, canals, ponds, and swamps. The distance surface created using the wet season images will describe proximity to flood-inundated areas. We will also use GIS analytical tools to differentiate between the water types (i.e., rivers, canals, ponds, and swamps) so that we can create separate distance surfaces for each type. Lastly, we will measure water turbidity using the imagery. We will calculate ecological (i.e., neighborhood-level) variables using the household, bari, and hamlet-level GIS databases. Spatial filtering techniques will be used to create neighborhood-level variables because they are more appropriate than individual or household-level variables in some cases.
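The distance-surface step described above (classifying imagery into water and nonwater pixels, then computing each pixel's distance to the nearest water pixel) can be sketched as follows. This is a minimal illustration under stated assumptions: the water mask is a small hypothetical grid, the 30-meter cell size is taken from the Landsat TM resolution cited earlier, and the function name distance_to_water is invented for this example. It uses a brute-force search that is practical only for small grids; GIS software would normally use an optimized distance transform.

```python
import math


def distance_to_water(water_mask, cell_size_m):
    """Return a grid of Euclidean distances (in meters) to the nearest water pixel.

    water_mask is a list of rows; truthy cells are water, falsy cells are nonwater.
    Distances are measured between pixel centers.
    """
    water = [
        (r, c)
        for r, row in enumerate(water_mask)
        for c, is_water in enumerate(row)
        if is_water
    ]
    if not water:
        raise ValueError("water_mask contains no water pixels")
    surface = []
    for r, row in enumerate(water_mask):
        surface.append([
            # Brute force: check every water pixel and keep the closest one.
            min(math.hypot(r - wr, c - wc) for wr, wc in water) * cell_size_m
            for c in range(len(row))
        ])
    return surface


# Hypothetical 3 x 3 classified scene with one water pixel in the corner.
mask = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
surface = distance_to_water(mask, 30.0)
for row in surface:
    print([round(d, 1) for d in row])
```

Running the same function on a dry-season mask and a wet-season mask would give the two surfaces described in the text: proximity to permanent water features and proximity to flood-inundated areas. Separate masks per water type would give the type-specific surfaces.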
To illustrate why neighborhood-level variables matter, consider a person living in a socio-economically deprived area (i.e., where there is poor sanitation). If that person has only 10 neighbors rather than 500 living in close proximity, the local environment will be less polluted because fewer people are introducing fecal material into the water. Various spatial filtering methods have been used so that data collected from a field survey can be scaled to remove noise or create neighborhood-level variables (Meijerink et al., 1994; Watkins et al., 1993; Ali et al., 2002d). We will use methods proposed by Ali et al. (2002d), which apply a low-pass filter to develop neighborhood-level population density and socio-economic variables.

Statistical Analysis

One goal of this study is to measure the relationships between the independent variables and cholera incidence. Ordinary least squares (OLS) models will be built to measure global relationships between the independent variables and incidence. One assumption of OLS regression is that observations are independent of one another; with geographic data, however, this is not likely to be the case. Statistical models need to control for both spatial and temporal autocorrelation. Stern and Cressie (2000) analyzed disease risk by incorporating spatial components into statistical models to improve validity and increase predictive performance. Since cholera is a highly infectious disease, spatial components are important in explaining occurrence rates and specifying the causes and propagation of the disease. Infectious diseases usually have spatially correlated occurrence rates; therefore, interpretation of the results of statistical models should consider the correlation structure of the disease distribution. For these reasons, the use of conventional statistical models may be misleading.
For spatially correlated events, several spatial statistical methods have been proposed including the Markov Random Field (MRF) auto-Poisson model (Besag, 1974) and the hierarchical model (Clayton and Kaldor, 1987; Aitchison and Ho, 1989). There are several diagnostic measures to test for violations of the assumptions of OLS regression, including non-normal errors, heteroskedastic (non-constant variance) errors, multicollinear predictors, and spatially autocorrelated errors. The Kiefer-Salmon test shows whether the residuals from the OLS model are significantly different from a normal distribution. The Breusch-Pagan test shows whether there is significant heteroskedasticity of errors. Moran's I and Lagrange Multiplier tests can be used to measure whether there is spatial autocorrelation in the model residuals. The spatial structure of data sets is ignored in conventional regression; thus, conventional regression is often an inadequate tool for comparing spatial distributions because of spatial dependence (i.e., what happens in one place depends on what happens in other places) and spatial heterogeneity (i.e., relationships vary across space). Statistical inference is problematic when dependence and heterogeneity are ignored and regression assumptions are violated. Spatial heterogeneity refers not only to nonconstant relationships between variables in space but also to heteroskedasticity, or nonconstant variance, which can result from omitted variables. One solution to dependency is to use specialized “spatial regression” methods that incorporate spatial effects (Anselin, 1988, 1998; Anselin and Bao, 1997). Spatial autocorrelation should be accounted for so that significance tests are not suspect. If variables are spatially dependent, explanation is not complete without some characterization of spatial interaction.
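As an illustration of the Moran's I diagnostic mentioned above, the following minimal sketch computes the statistic for a set of model residuals given a spatial weights matrix. The weights matrix, the residual values, and the function name morans_i are all hypothetical: four spatial units arranged in a line, each neighboring the units beside it. The sketch computes only the statistic itself; it does not perform the significance test, which requires additional assumptions about the distribution of the residuals.

```python
def morans_i(values, weights):
    """Compute Moran's I for values observed at n spatial units.

    weights[i][j] is the spatial weight between units i and j (0 if they are
    not neighbors). I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2),
    where z_i are deviations from the mean and S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    denom = sum(zi * zi for zi in z)
    return (n / s0) * (cross / denom)


# Hypothetical binary contiguity weights: four units in a row (0-1, 1-2, 2-3).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1.0, 1.0, -1.0, -1.0]    # similar residuals sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbors always have opposite signs

print(morans_i(clustered, w))    # positive: residuals are spatially clustered
print(morans_i(alternating, w))  # negative: residuals are spatially dispersed
```

A clearly positive value for OLS residuals would suggest that the independence assumption is violated and that a spatial regression specification should be considered.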
Inclusion of a spatial lag variable, a variable representing the neighborhood effect of cholera incidence, may explain some of the residual variation. By accounting for the spatial effects in the model, we can interpret the significance of the other, non-spatial variables. Models that include a spatial lag variable are called regressive spatial autoregressive models. Spatial regression analysis allows us to relate a spatial distribution to various other factors that are geographic in nature (Kulldorff, 1998). This research project will use spatial regression analysis to investigate relationships between socio-environmental conditions and cholera incidence. We will build statistical models to identify the strengths of relationships between cholera incidence and the variables listed in Table 1. In addition, we will use geographically weighted regression (GWR) to explore the relationships between cholera and the socio-environmental risk factors (Brunsdon et al., 1999; Fotheringham et al., 2002). GWR results in locally specific parameter estimates so the spatial variation of relationships can be mapped, enabling us to explore how relationships vary within and between the different ecosystems in the three study areas.

Expected Outcomes

We will disseminate our findings at international conferences and by writing papers. We will submit several papers to peer-reviewed journals on the following topics.

1. Papers describing spatio-temporal associations between cholera incidence and satellite-derived environmental variables (i.e., chlorophyll concentration, SST, SSH, rainfall, LULC, and flooding), climatic variables (i.e., in situ rainfall and temperature), and socio-demographic variables (i.e., population density and socio-economics).
2. A paper describing how associations between cholera and satellite-derived biophysical, climatic, and socio-demographic variables vary in space (i.e., between and within study areas) and time (i.e., during the 20-year longitudinal study period).
3. A paper describing how changes in estuaries (i.e., turbidity and chlorophyll concentration) and areas around estuaries (i.e., LULC) are related to cholera incidence.
4. A paper presenting the cholera prediction model, including descriptions of the data and methods that should be used for prediction. This paper will address such theoretical ideas as nonlinearity of relationships in space and time and whether the model differs by dominant cholera strain (classical, El Tor, O139).