First International Symposium of Health GIS
Bangkok, Thailand
December 1-2, 2005

CONTEXTUALIZING THE URBAN HEALTHCARE SYSTEM
METHODOLOGY FOR DEVELOPING A GEODATABASE OF DELHI’S HEALTHCARE SYSTEM

Pierre CHAPELET, Bertrand LEFEBVRE
University of Rouen, France – Centre de Sciences Humaines, India
chapelet@altern.org, bertrand.lefebvre@csh-delhi.com

ABSTRACT: This communication introduces the setting up of a Geographical Information System on Delhi for studies in the Social Sciences focusing on healthcare system organization (developed at the Centre de Sciences Humaines, New Delhi, India). Through an explanation of the methodological procedure and a demonstration of thematic applications focusing on the healthcare system’s spatial organization, the authors lead us through the inherent difficulties of building a GIS in an emerging country like India. They also attempt to demonstrate that this kind of tool nevertheless remains a relevant support for research in the Social Sciences, as long as it is used with care and with knowledge of the dataset frame. From this perspective, Exploratory Data Analysis coupled with the play of scales provides powerful ways to assess the socio-spatial and healthcare system dynamics taking place in the Indian capital.

KEY WORDS: GIS, Healthcare system, Exploratory Data Analysis, Multiscalar, Delhi, Census 1991/2001.

INTRODUCTION

This paper presents the setting up of a Geographical Information System (GIS) on the healthcare system of the Delhi agglomeration. This GIS is still under development and is currently used by different researchers working on the healthcare system in Delhi. Instead of presenting the results of these research programmes (which could be tedious for a reader who is neither a specialist nor an Indianist), we thought it would be more relevant to briefly introduce this tool and then to focus on a few specific methodological points, which are in fact relevant to any research project dealing with GIS.
For further information about the research projects using this tool, we invite the reader to visit http://www.cshdelhi.com/electronicop11/start.htm to discover in-depth interactive presentations of its fields of application.

1. OBJECTIVES OF THE PROJECT

This GIS was primarily developed in the context of a health geography research project on access to pharmaceuticals in the Indian capital, Delhi. At that time, the purpose of implementing such a tool was manifold:

Firstly, at the beginning of the project in January 2002, there was an acute lack of documentation precisely describing the spatial organisation and functioning of the city. Integrating detailed census data in a GIS was thus a first step towards better assessing the socio-spatial dynamics taking place in the Indian capital. Furthermore, comparing the recently released 2001 census datasets with the already available 1991 census datasets was a very promising way to obtain a dynamic picture of the city, something which had not been attempted until then.

Secondly, in order to work as geographers on access to pharmaceuticals, and more generally on health topics, it was necessary to set up a spatial database enabling us to visualise the spatial setting of healthcare infrastructures in the city and its periphery, identifying the actors, their location and their spatial organisation. Cross-checking this layer of information with census data was seen as a way to better understand the location strategies of healthcare system actors. In the field of health geography, our posture was thus essentially to rely on the spatial analysis of healthcare infrastructure locations.
Thirdly, given the two above-mentioned objectives, this tool seemed very promising for sampling purposes: to select samples of health infrastructures, in particular those distributing medicines, for our fieldwork survey according to their location in the urban agglomeration; and to select urban areas according to their profile (socio-economic, demographic), as well as according to the spatial distribution of healthcare actors, for further investigations (household surveys).

As soon as we started conceptualising our spatial database, it became obvious, given the variety of data available through the census database, that it could be very useful for any researcher working on Delhi, particularly in the fields of social sciences, urban development or health studies. This led us to design this GIS so that it could be re-used. That is why we decided to publish a detailed presentation of this work in digital format (CD-ROM and web site). This allowed us to incorporate interactive maps (Flash), helping potential users to understand how they can use this tool. Moreover, since the main limitation in the use of this kind of tool often stems from the difficulty of feeding the system with data, we also had to ensure that it could be quickly updated and completed with fresh datasets coming from other research projects. This led us to base our databases on a common georeferenced framework and to develop tools for easy updates. Besides, the inevitable imperfections of the datasets led us to resort to methodological devices to check the bias generated by relatively poor data quality.

2. RESEARCH POSTURE AND METHODOLOGY

The research posture and the resulting framework of our GIS emerged from different constraints.

2.1 Constraints to the use of GIS

There are in fact many constraints on building and using a GIS in a developing country such as India, especially when working in the field of health and, furthermore, when focusing on intra-urban issues. As we will see, these constraints often shape the design of the GIS itself and lead one to resort to specific methodologies to work around them.

The main constraints are of course related to data (statistical and cartographic). Albert, Gesler et al. point out the “4-I” rule of every GIS: Intensive, Inaccurate, Inaccessible, Incomplete (Albert, Gesler & al.). In order to limit the unavoidable inaccuracy of a system that seeks to simplify reality through modelling, one must intensively feed the system with fresh datasets. The processing power of such a system partly comes from the wealth of its data. Yet datasets are often inaccessible, inaccurate or simply incomplete, and India is no exception. The Indian situation is in fact quite paradoxical regarding data availability. On the one hand, India is an important producer of statistics. On the other hand, these statistics are not easily accessible, especially when looking for detailed data. As soon as statistical information below the district level is needed, it becomes very difficult to procure. (What is a district for a non-specialist?)

2.1.1 Statistical datasets

One of the main sources of information for analysing relations between the healthcare system and the population served is of course the Census of India, which provides detailed demographic, social and economic statistics every decade. Moreover, it has been available in digital format since 1991. However, when working on intra-urban dynamics, many limitations emerge. Firstly, the number of spatial units in a city is quite low for an in-depth understanding of intra-urban dynamics. Secondly, in Delhi the shape of these units was modified between the last two censuses.

We could also cite other data sources, such as the National Sample Survey (NSS) or the National Family Health Survey (NFHS), which focus especially on health and healthcare themes. Again, data are not available at a sufficiently fine level for integration in a GIS. For example, the NFHS statistical sample has been merged into 2 spatial units in Delhi (rural/urban).

The situation is the same when one simply wants to locate healthcare infrastructures. Each government agency (central, federal, municipal) maintains its own list of health infrastructures, but there is no coordination between these agencies. Furthermore, many actors of the healthcare system are not listed centrally, or listings are incomplete, such as for general practitioners, pharmacies or private nursing homes.

2.1.2 Cartographic data

After this brief overview of statistical data availability, what about cartographic data? Cartographic data are in fact highly controlled by the government, and it is thus very difficult to obtain full coverage of a given area. Moreover, many maps are outdated or simply unavailable. This is a truly paradoxical situation given that very accurate tools are now becoming available on the market. For the big Indian metropolises, private companies such as Eicher City Maps now provide detailed coverage (but it is very costly for small research projects). We could also cite Google Maps, which allows the user to zoom deep inside a city (this tool was not available when we started our project, but is now used to extend our database).

Finally, the picture of data availability does not seem very encouraging for a researcher starting a GIS project on the healthcare system in Delhi.
However, it is possible to work around these problems using specific tools and methodologies, which are often missing in GIS…

2.2 Contextualising data through the play of scales

While a GIS handles the question of scale in spatial continuity, we have seen that thematic data often depend on administrative divisions, which are by contrast not spatially continuous (thematic discontinuity). When working with a GIS, this obliges us to choose perception levels adapted to the scale of the spatial units studied (see figure 1). However, selecting only one perception level, such as census administrative divisions, to study the spatial distribution of healthcare infrastructures in Delhi or to compute catchment area assessments (Desserte rate) can disguise the spatial organisation of the studied phenomenon because of the internal heterogeneity of each object (are census units really pertinent for computing this rate?).

Source: Roudier Daval, 2004
Figure 1 – Comparison Criteria between Exploratory Data Analysis and Confirmatory Data Analysis

Thus we decided to develop a multi-level framework based on different perception levels. The idea is that a phenomenon can be properly analysed only if it is studied at different scales, in order to understand the context in which its spatial organisation takes place. Furthermore, one can work around the problem of data weakness (limited accuracy or limited spatial range) when a dataset is contextualised using other working scales, which themselves unveil other levels of spatial organisation. The results of a study at a given scale will reveal a trend in the spatial organisation of a phenomenon, which may or may not be confirmed when contextualised with other scales.

2.3 Cartomatic and Exploratory Data Analysis

In association with this idea of contextualisation, we used Exploratory Data Analysis (EDA) techniques and specific cartomatic tools to analyse the collected datasets and unveil their underlying structure.
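One measure of such underlying structure is spatial autocorrelation, for instance Moran's I, which is among the coefficients used in this project. The following is only a minimal sketch in Python and not the Philcarto or GeoDa implementation: it assumes a binary contiguity matrix (1 when two spatial units are neighbours) without row standardisation, and the four units, their values and the function name `morans_i` are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I = (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i**2,
    where d_i is the deviation of unit i from the mean and S0 is the
    sum of all weights. Undefined when all values are identical."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical example: four units in a row, each adjacent to the next one.
contiguity = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10, 10, 0, 0]    # similar values sit next to each other
alternating = [10, 0, 10, 0]  # neighbours always differ

print(round(morans_i(clustered, contiguity), 3))    # 0.333
print(round(morans_i(alternating, contiguity), 3))  # -1.0
```

A positive value indicates that neighbouring units tend to carry similar values, a negative value that they tend to differ. Other weighting schemes (row-standardised or distance-based weights, for example) would give different figures for the same data.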
Table 1 summarises the main differences between exploratory and classical confirmatory analysis. Through an increase in the number of views, this approach makes it possible not only to choose a representation amongst a set of solutions, but also to favour the emergence of hypotheses regarding the underlying spatial organisation of the studied phenomena. EDA finally appears as a kind of “Interative System of Thoughts Assisted by Computer” (Antoni & Klein, 2003).

Exploratory Analysis          Confirmatory Analysis
Descriptive Approach          Inferential Approach
Robust Statistics             Sensitive Statistics
Flexible Research Program     Rigid Research Program
Graphic Expression            Numerical Expression
Intuitive Vision              Deductive Vision

Source: (WANIEZ 2002)
Table 1 – Comparison Criteria between Exploratory Data Analysis and Confirmatory Data Analysis

Graphic expression remains the key factor of such an approach. However, visualisation is of course not limited to the simple representation of a given dataset. Indeed, many classical tools were used in our projects to help visualise the data. Amongst the methods used to process statistical variables, we can mention: Factor Analysis; Hierarchical Agglomerative Cluster Analysis; Linear Regression and scatter plot analysis; spatial autocorrelation coefficients such as Moran's and Geary's. These methods were applied using different free software packages such as Philcarto and GeoDa. Non-geographic statistical tests are now conducted using R.

It is only after this first “radiography” of the data series (this contextualisation through exploratory methods) that we built up our research hypotheses about the organisation of Delhi's healthcare system. We then conducted specific spatial analysis tests and surveys to test our hypotheses (confirmatory methods). These results were in turn integrated and generalised in our database, allowing new interactions with the datasets and new hypotheses.

3. RESULTS AND FUTURE DEVELOPMENTS

At present, our GIS contains four perception levels, each one corresponding to an individual geographic database. Of course, the georeferencing of each database ensures compatibility between them. As mentioned earlier, in order to help researchers discover the potential of such a multilevel approach, we published a few illustrations of possible treatments on our CD-ROM. The selected perception levels are as follows (from smaller to larger scale):

3.1 North West India: Contains demographic data and the number/type of public healthcare infrastructures attached to cities (up to 20 000 inhabitants). It allows the user to contextualise the Delhi situation regarding healthcare provision and population evolution within much larger dynamics. (See figure 2)

3.2 Delhi and its periphery (Delhi Metropolitan Area): Covering around 11 000 km2 and more than 17 000 inhabitants, this layer is still incomplete and is currently being extended to cover a much larger area around the city. It will contain Census demographic data (2001) and public health care infrastructures at the Census Charge level (the most accurate spatial unit available from the Census of India). It aims at capturing the full extent of the Delhi agglomeration and its satellite cities.

3.3 Delhi (National Capital Territory): This perception level contains the 1991 and 2001 Census datasets. We also collected the complete list of health infrastructures from various government agencies. The tables pertaining to public hospitals and dispensaries are as exhaustive as we could expect, but as far as private nursing homes are concerned, most of them were still unregistered in 2002. Instead of attaching these infrastructures to census spatial units (a method which implied a loss of location accuracy given the size of the units concerned), we preferred to attach them to a new layer containing the location of each locality-place in the city (2300 georeferenced point entities located at the geometric centre of each locality). This method still allows the user to aggregate data by census unit for a rapid comparison, without losing location accuracy during geocoding. This layer is also a very useful tool for researchers doing fieldwork surveys, allowing them for example to locate exactly the different places visited by patients, and then to study variations in spatial mobility in relation to socio-economic status, morbidity profile or health service used.

Since the shape of the spatial units was modified between the two censuses, we plan to rely on interpolation techniques in order to allow comparisons between 1991 and 2001 and to study demographic trends. We have already carried out this treatment for the 1991 census data. For instance, instead of mapping density calculated on the basis of census units, we used a remote sensing image to automatically calculate population density based on actual land use. For each census unit, we assigned a point entity at the barycentre of each built-up area. The number of inhabitants was then divided between the built-up areas according to each one's spatial share in the census unit. We calculated density by dividing the number of inhabitants by the surface of each built-up area (and not by the census charge area). Finally, we generated a trend surface (map 2). This method can be applied to various demographic and health data. The generated trend surfaces can then be overlaid, allowing the study of population dynamics.

Although the user can of course map each demographic indicator by census unit to compare it with the spatial organisation of healthcare system actors, or combine a few variables for deeper analysis, we thought it would be useful to synthesise the census datasets for the user in advance. To do so, we selected 15 socio-demographic variables and ran a Factor Analysis. Table 2 presents the composition of the first four principal components (which together account for 64% of the total information).

Principal Components
Variables: V01 Density; V02 % Workers; V03 Persons per Household; V04 % SC; V05 Sex Ratio Population; V06 Workforce Sex Ratio; V07 Child Women Ratio; V08 % Literate Women; V09 % Households Manufacturing; V10 % Manufacturing (others); V11 % Construction; V12 % Trade/commerce; V13 % Transport/storage; V14 % Other Services; V15 % Primary sector
Columns: CP1 CP2 CP3 CP4
V01 to V08: -629 191 221 240 219 786 -704 371 405 -310 397 -69 172 323 275 -92 -327 -578 121 -29 -113 164 -583 453 701 32 265 -392 -837 -325 -187 37
V09 to V15: -158 92 580 -24 -247 673 318 45 39 362 -186 -623 -768 339 208 40 -11 -444 22 -560 -327 754 -378 -437 -591 24 -290 418

Table 2 – Principal Component Analysis Axis Saturations (*1000)
Source: Census of India, 1991

Then, we ran a Hierarchical Agglomerative Cluster Analysis based on the first four principal components and mapped the results. Figure X presents the resulting typology of NCT space. Finally, comparing this typology of NCT space with the location of each type of healthcare system actor available in the database gives users strong evidence of the different location strategies deployed. (Insert a public/private location map without the typology of charges?)

3.4 Specific Intra-urban Zones (Gurgaon, New Delhi, Shahdara): The use of the three previous perception levels already allows us to build up an initial picture of the healthcare system's organisation. However, in order to better grasp the reality, we strengthened the analysis by zooming in to a large-scale level.
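The density step described in section 3.3 (population allocated to built-up areas in proportion to their surface, then divided by the built-up surface rather than by the census charge area) can be illustrated with a minimal Python sketch. All unit names, populations and areas below are hypothetical, the built-up surfaces are assumed to have already been extracted from the remote sensing image, and the trend surface interpolation itself is not shown.

```python
def built_up_density(unit_population, built_up_areas):
    """Allocate each census unit's population to its built-up areas by
    area share, then return (unit, area_km2, inhabitants, density) rows."""
    rows = []
    for unit, areas in built_up_areas.items():
        total_built_up = sum(areas)
        for area in areas:
            inhabitants = unit_population[unit] * area / total_built_up
            rows.append((unit, area, inhabitants, inhabitants / area))
    return rows


# Hypothetical census units: population, total surface and built-up patches (km2).
unit_population = {"charge_A": 12000, "charge_B": 3000}
unit_area_km2 = {"charge_A": 6.0, "charge_B": 10.0}
built_up_areas = {"charge_A": [1.0, 2.0], "charge_B": [0.5]}

# Gross density divides by the whole census unit; built-up density does not.
gross_density = {u: unit_population[u] / unit_area_km2[u] for u in unit_population}
rows = built_up_density(unit_population, built_up_areas)

print(gross_density)
for row in rows:
    print(row)
```

In this toy case the gross densities are 2000 and 300 inhabitants per km2, whereas the built-up densities are 4000 and 6000, which reverses the ranking of the two units. With a purely proportional allocation, all built-up areas of a given unit share the same density, so the gain over the gross figure comes from excluding unbuilt land.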
Since there is no legislation constraining private structures such as general practitioners or medical shops in their choice of location, do we observe spatially specific location strategies? Does place matter? So far, only specific census units have been selected according to their socio-demographic profile and infrastructure availability (one peri-urban unit and four central ones). This time, we digitised built-up areas and land use (commercial, residential, industrial…) using the Delhi Eicher City Map (the only map precisely indicating land use) and carried out fieldwork investigations to locate each and every health infrastructure in the selected census units. Then, we generated different distance calculations (see figure X):

Firstly, crossing these two layers of information, we generated graphs showing the attractiveness of each kind of land use for a given actor, enabling us to gauge to what extent the urban environment can influence its establishment. The results showed clear evidence of specific location strategies for each actor. While public actors such as dispensaries serve each type of urban area equally (spatial equity), private infrastructures such as pharmacies prefer particular areas such as commercial ones.

Secondly, we generated graphs showing the attractiveness of each actor for other infrastructures. This work clearly unveiled cooperation or substitution strategies between actors. The case of pharmacies is again very interesting in this respect, since their location closely depends on their functional need to cooperate with other players such as doctors. As observed on the graph, this cooperation leads to the creation of spatial clusters.

CONCLUSION

Contextualising datasets through a multiplicity of perception levels is really helpful for working around data weaknesses.
… To be completed …

REFERENCES

Journal
Albert, Gesler & al.
Dale, P.E.R., Chandica, A.L., and Evans, M., 1996, Using image subtraction and classification to evaluate change in subtropical intertidal wetlands. International Journal of Remote Sensing, 17, 703-719.

Books
Barret, E.C., and Curtis, L.F., 1992, Introduction to Environmental Remote Sensing, 3rd edition (London: Chapman and Hall).

Edited Books
Strum, B., 1981, The atmospheric correction of remotely sensed data and the qualitative determination of suspended matter in marine water surface layers. In Remote Sensing in Oceanography and Meteorology, edited by A. P. Cracknell (Chichester: Ellis Horwood).

References from websites
Nakhapakorn, K., 2005. Proceeding on the Health GIS symposium “Analysis of Spatial Factors affecting dengue epidemics using GIS”, Bangkok, Thailand. http://www.jgeoinfo.net/HealthGIS/HG001.html

3.5 Acknowledgements (optional)
Acknowledgements of support for the project/paper/author are welcome.