Data Appendix for
Shared Space: Ethnic Groups, State Accommodation and Localized Conflict

Kathleen Gallagher Cunningham
International Peace Research Institute, Oslo and Iowa State University

Nils B. Weidmann
Princeton University

DATASETS

Two datasets are provided for this article: (i) a GIS dataset of first-level administrative units for 1991, and (ii) the (non-spatial) dataset containing the variables required to replicate the regression results reported in the article.

THE ADMINISTRATIVE REGIONS GIS DATASET

The administrative unit dataset contains the first-level administrative regions of 150 countries as of 1991, stored as vector polygons. The dataset is provided in the ESRI shapefile format (http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf) in the WGS 1984 reference system. It can be used with all standard GIS tools. Some statistical software also provides basic support for the shapefile format, for example R.

Our dataset was created from an existing administrative unit dataset provided by ESRI, which is now freely available at http://geodata.grid.unep.ch/download/admin98_li_shp.zip. However, this dataset captures the administrative subdivisions of countries at different points in time. For this reason, we backdated it to reflect the 1991 status. This was done using the country descriptions and maps given on the Statoids web page (http://www.statoids.com/).

Apart from the geographic extent, the GIS dataset contains the following variables for each unit:

ADMIN_NAME: The name of the unit
FIPS_CNTRY: The FIPS identifier of the country the unit is part of
CNTRY_NAME: The name of the country the unit is part of
COWCODE: The COW identifier of the country the unit is part of
ADMINID: A unique numeric identifier for the unit

THE REPLICATION DATASET

This dataset is a cross-section of first-level administrative units, as given by our GIS dataset introduced above. The unit of analysis is the first-level administrative unit in 1991.
Newly created or changed units in the 1990s are not included in the data.

Coding information

Ethnic groups coding

For each administrative unit, the number and relative population of different ethnic groups were identified using the GREG GIS dataset. See the GREG article forthcoming in the Journal of Peace Research for an illustration of how this is done. The article is available on the GREG webpage (http://www.icr.ethz.ch/research/greg). We then rank groups according to their relative size in the unit, and use the difference between the population shares of the largest and the second largest group as our indicator of domination. This is a continuous measure from 0 to 1. A score of 0 indicates that the two largest groups have equal population shares; a score of 1 indicates that the unit has only one ethnic group.

Description of variables computed from GREG:

LUNITPOP: logged total population of a unit
LCTRPOP: logged total country population, computed as the sum of all unit populations in a country
DOMINATION: the difference in the population shares between the largest and the second largest group (or 1 if there is only one group)

Conflict coding

Cases of ethnic conflict are based on a combination of the Cunningham et al. (2009) dataset and the Minorities at Risk (MAR) rebellion coding, both of which were matched to the GREG group list. The Cunningham et al. (2009) data use a threshold of 25 battle deaths and include dyadic data on conflict that encompass all distinct groups in the dispute. The MAR data are coded by group and indicate a level of rebellion. All groups that received a score of 2 or greater were given a positive coding for violence (that is, all violence greater than “banditry/scattered terrorism”).

The geocoding of the conflict was done by the authors using the following process. First, ethnic groups in the GREG list were matched to parties in conflict in the Cunningham et al. (2009) and MAR data.
Next, using the Minorities at Risk profiles and Uppsala conflict profiles, the location of conflict was determined for each conflict case. Possible locations included all regional sub-units that contained the ethnic group in question.

Description of conflict variables:

LOCALCONF: group was in conflict in the 1990s in this unit
INTERCONF: group was in communal conflict in the 1990s in this unit
CONF: group was in either civil war violence or communal conflict in the 1990s in this unit

Example: map of ethnic group coding and conflict locations

The map in Figure A1 shows our coding of India. For each of the states, the color shading indicates the level of domination computed for this state. The white dots denote those states that are coded as having ethnic conflict in the 1990s.

Figure A1: Map of India, showing the domination scores at the state level as well as our conflict coding.
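
Illustration: computing the domination score

The DOMINATION measure described under "Ethnic groups coding" can be illustrated with a minimal Python sketch. It is not the code used to build the replication dataset. The function name `domination` and the input format (a mapping from group name to estimated group population within one unit) are assumptions made for this example, and the GREG-based step that produces those population estimates is not reproduced here.

```python
def domination(group_populations):
    """Return the domination score for one administrative unit.

    group_populations: hypothetical mapping of ethnic group name to that
    group's population within the unit (an assumed input format).
    The score is the difference between the population shares of the
    largest and the second largest group, or 1 if there is only one group.
    """
    total = sum(group_populations.values())
    if total <= 0:
        raise ValueError("unit must have a positive total population")

    # Rank groups by their share of the unit population, largest first.
    shares = sorted(
        (population / total for population in group_populations.values()),
        reverse=True,
    )

    # A unit with a single ethnic group is fully dominated by that group.
    if len(shares) == 1:
        return 1.0

    return shares[0] - shares[1]


# Illustrative (made-up) units, not taken from the dataset.
single_group = domination({"A": 1200})
parity = domination({"A": 50, "B": 50})
mixed = domination({"A": 500, "B": 250, "C": 250})
```

In this sketch, a unit with one group scores 1, a unit whose two largest groups are equal in size scores 0, and all other units fall in between, which matches the range described for the measure.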