An Introduction to Multiple Systems Estimation for Estimating a Count of Adverse Events

Jana Asher
Carnegie Mellon University
October 16, 2002

Outline
• Background
• Overview of capture-recapture
• Capture-recapture model assumptions
• Multiple systems estimation
  • log-linear models
  • Rasch models
• Example
  • Ethnic Albanian deaths in Kosovo, March – June 1999

Background
• Used for estimating a population count.
  • The size of a wildlife population.
  • The number of WWW pages.
  • The number of people in the USA.
  • The number of human rights violations (civilian deaths) in Guatemala and Kosovo.
• Capture-recapture = dual systems estimation.
• Multiple capture-recapture = multiple systems estimation.

Overview of Capture-Recapture
[Figure: two capture samples, Capture 1 and Capture 2, and their overlap.]

                      List 2
                      In       Out      Total
  List 1    In        x_11     x_10     x_1+
            Out       x_01     x_00     x_0+
            Total     x_+1     x_+0     N

\[ \hat{N} = \frac{x_{1+}\, x_{+1}}{x_{11}} \]

Capture-Recapture Assumptions
• Independence of lists
• Homogeneity of capture probabilities
• Error-free matching across lists
• No in- or out-migration
• No duplicates within a list
• Lists are random samples

Multiple Systems Estimation for Three Lists
• Three lists allow for modeling of dependency and/or heterogeneity.

  List 3 = In
                      List 2
                      In       Out      Total
  List 1    In        x_111    x_101    x_1+1
            Out       x_011    x_001    x_0+1
            Total     x_+11    x_+01    x_++1

  List 3 = Out
                      List 2
                      In       Out      Total
  List 1    In        x_110    x_100    x_1+0
            Out       x_010    x_000    x_0+0
            Total     x_+10    x_+00    x_++0

• Model for dependency:

\[ \log m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)}, \qquad \text{where } m_{ijk} = E(x_{ijk}) \]
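The two-list estimator and the three-list dependency model can be sketched in a few lines of Python. This is an illustration rather than the method used in the talk: the function names and all counts are hypothetical. For three lists, the sketch relies on a standard closed-form result. When the log-linear model contains every pairwise interaction but no three-way term, it reproduces the seven observed cells exactly, and the projected missing cell is x_000 = x_111 x_100 x_010 x_001 / (x_110 x_101 x_011).

```python
def dual_system_estimate(x11, x10, x01):
    """Two-list estimate N = x_1+ * x_+1 / x_11.

    x11: cases on both lists; x10: list 1 only; x01: list 2 only.
    """
    if x11 <= 0:
        raise ValueError("the estimator needs a non-empty overlap (x11 > 0)")
    x1_plus = x11 + x10  # everyone on list 1
    x_plus1 = x11 + x01  # everyone on list 2
    return x1_plus * x_plus1 / x11


def three_list_estimate(x):
    """Estimate N under the model with all pairwise interactions.

    x maps capture histories (i, j, k), each 0 or 1, to observed counts.
    The cell (0, 0, 0) is unobserved and must not be supplied.
    """
    observed = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1),
                (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    denominator = x[1, 1, 0] * x[1, 0, 1] * x[0, 1, 1]
    if denominator == 0:
        raise ValueError("every two-list-only cell must be non-zero")
    numerator = x[1, 1, 1] * x[1, 0, 0] * x[0, 1, 0] * x[0, 0, 1]
    x000_hat = numerator / denominator  # projected missing cell
    return sum(x[h] for h in observed) + x000_hat


# Hypothetical counts, chosen only so the arithmetic is easy to follow.
two_list_total = dual_system_estimate(x11=20, x10=30, x01=60)

example_counts = {
    (1, 1, 1): 10,
    (1, 1, 0): 20, (1, 0, 1): 20, (0, 1, 1): 20,
    (1, 0, 0): 40, (0, 1, 0): 40, (0, 0, 1): 40,
}
three_list_total = three_list_estimate(example_counts)
```

With these made-up counts the two-list estimate is 50 × 80 / 20 = 200, and the three-list projection adds 80 unobserved cases to the 190 observed ones. Simpler models that drop some interaction terms have no such closed form in general and need an iterative fit.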
• Full quasi-symmetry (Rasch) model for heterogeneity:

\[ \log \pi_{k_1 k_2 k_3} = \alpha + k_1 \beta_1 + k_2 \beta_2 + k_3 \beta_3 + \gamma\Bigl(\sum_{j=1}^{3} k_j\Bigr) \]

  where \( \pi_{k_1 k_2 k_3} = \Pr(\text{observing a count in cell } (k_1 k_2 k_3)) \), \( k_j \in \{0, 1\} \).
• The Rasch model enables projection to the missing cell via moment constraints (inequality restrictions).

Multiple Systems Estimation for More than Three Lists
• Same modeling techniques, more parameters.
• The more high-quality lists are available, the fewer assumptions are required.

Example: Kosovo
• Analysis required for the trial of former Yugoslav President Slobodan Milosevic for war crimes allegedly committed in Kosovo.
• Question of interest: Did a systematic campaign by Yugoslav forces cause Kosovar Albanian deaths and expel Kosovar Albanians from their homes?
• Migration data from two sources; analyzed using standard demographic techniques.
• Ethnic Albanian death data from four sources; estimates of the number of deaths derived via multiple systems estimation.

Kosovo: Data Sources
• The American Bar Association Central and East European Law Initiative: 1,674 interviews; 5,089 incidents.
• Exhumations by international teams on behalf of the International Criminal Tribunal for the Former Yugoslavia: 1,767 exhumations.
• Human Rights Watch: 337 interviews; 1,717 incidents.
• The Organization for Security and Cooperation in Europe: 1,837 interviews; one or more incidents per interview.

Kosovo: Data Matching
• Duplicates within each list removed.
• 6 matches performed, one for each pair of lists.
  • Human coders used match-facilitation software.
  • Each list pair matched 2-4 times by different coders.
• Number of individual deaths (killings where the victim can be named): 4,400.
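The count of six matches follows from taking every pair of the four lists. A minimal sketch of that bookkeeping is below. The list abbreviations are the ones used in the matching table; the variable names are hypothetical.

```python
from itertools import combinations, product

# Abbreviations for the four data sources, as used in the matching table.
LISTS = ("ABA", "EXH", "HRW", "OSCE")

# One matching exercise per unordered pair of lists.
pairs = list(combinations(LISTS, 2))

# After matching, each named death has a capture history: one 0/1 flag per
# list. The all-zero history (on no list) can never be observed.
observable_histories = [
    history for history in product((1, 0), repeat=len(LISTS)) if any(history)
]

for first, second in pairs:
    print(f"match {first} against {second}")
print(len(observable_histories), "observable cells, plus one unobserved cell")
```

Four lists give 6 pairs and 2^4 − 1 = 15 observable cells. The sixteenth cell, deaths recorded by no list, is what multiple systems estimation projects.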
Kosovo: Data Matching

                     HRW:    Yes      Yes      No       No
  ABA     EXH        OSCE:   Yes      No       Yes      No
  Yes     Yes                27       18       181      177
  Yes     No                 32       31       217      845
  No      Yes                42       106      228      1,131
  No      No                 123      306      936      ???

  Total: 4,400

Kosovo: Death Count Estimates
• Overall number of deaths, estimated from a log-linear model of the four-way cross-classification table: 10,356 (9,002, 12,122).
• Number of deaths in each two-day time period, estimated from log-linear models of three-way cross-classification tables; four such tables per time period.

Kosovo: Analysis
• Regression analysis performed using KLA and NATO activity data as independent variables and death/migration estimates as dependent variables.
• The analysis supports the conclusion that a systematic campaign of Yugoslav forces was responsible for ethnic Albanian migrations and deaths in Kosovo between March and June of 1999.

Overall Conclusions
• Where several high-quality pre-existing incomplete lists of adverse events exist, multiple systems estimation is a viable technique for estimating a total count of adverse events.
• Relatively sophisticated technical expertise is required to use this estimation technique well.

Further Reading
• Ball, P., Betts, W., Scheuren, F., Dudukovich, J., and Asher, J. (2002). Killings and Refugee Flow in Kosovo March - June 1999: A Report to the International Criminal Tribunal for the Former Yugoslavia. American Association for the Advancement of Science, Washington, DC.
  • Contains a good reference list.
  • Available on my website: http://www.stat.cmu.edu/~asher/PAPERS2002/polkilkos_020109.pdf
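As a rough companion to the four-way table on the Kosovo: Data Matching slide, the sketch below collapses it over each pair of lists and applies the two-list estimator. It makes two assumptions. First, it reads the table with ABA/EXH membership on the rows and HRW/OSCE membership on the columns. Second, it treats each pair as if the lists were independent, which is the assumption that log-linear modeling relaxes. The function names are hypothetical, and the results are not the estimates in the report.

```python
from itertools import combinations

LISTS = ("ABA", "EXH", "HRW", "OSCE")

# Observed counts keyed by (ABA, EXH, HRW, OSCE) membership, 1 = on the list.
# The layout is an assumed reading of the slide's table; the (0, 0, 0, 0)
# cell is unobserved and therefore absent.
COUNTS = {
    (1, 1, 1, 1): 27,  (1, 1, 1, 0): 18,  (1, 1, 0, 1): 181, (1, 1, 0, 0): 177,
    (1, 0, 1, 1): 32,  (1, 0, 1, 0): 31,  (1, 0, 0, 1): 217, (1, 0, 0, 0): 845,
    (0, 1, 1, 1): 42,  (0, 1, 1, 0): 106, (0, 1, 0, 1): 228, (0, 1, 0, 0): 1131,
    (0, 0, 1, 1): 123, (0, 0, 1, 0): 306, (0, 0, 0, 1): 936,
}


def pair_table(a, b):
    """Collapse the four-way table to lists a and b (indices into LISTS).

    Returns (on both, on a only, on b only); deaths on neither list are
    dropped, since a two-list analysis would not see them.
    """
    x11 = x10 = x01 = 0
    for history, count in COUNTS.items():
        if history[a] and history[b]:
            x11 += count
        elif history[a]:
            x10 += count
        elif history[b]:
            x01 += count
    return x11, x10, x01


def pairwise_estimates():
    """Two-list estimate x_1+ * x_+1 / x_11 for every pair of lists."""
    estimates = {}
    for a, b in combinations(range(len(LISTS)), 2):
        x11, x10, x01 = pair_table(a, b)
        estimates[(LISTS[a], LISTS[b])] = (x11 + x10) * (x11 + x01) / x11
    return estimates


if __name__ == "__main__":
    print("named deaths observed:", sum(COUNTS.values()))
    for pair, estimate in pairwise_estimates().items():
        print(pair, round(estimate))
```

Disagreement among the six pairwise figures would be one informal sign that the independence and homogeneity assumptions do not hold, which is the motivation for modeling all four lists jointly.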