An Introduction to Multiple Systems Estimation for Estimating a Count of Adverse Events

Jana Asher
Carnegie Mellon University
October 16, 2002
Outline
• Background
• Overview of capture-recapture
• Capture-recapture model assumptions
• Multiple systems estimation
  • log-linear models
  • Rasch models
• Example
  • Ethnic Albanian deaths in Kosovo, March – June 1999
Background
• Used for estimating a population count.
  • The size of a wildlife population.
  • The number of WWW pages.
  • The number of people in the USA.
  • The number of human rights violations (civilian deaths) in Guatemala and Kosovo.
• Capture-recapture = dual systems estimation.
• Multiple capture-recapture = multiple systems estimation.
Overview of Capture-Recapture
[Figure: Capture 1 and Capture 2, with their overlap]
Overview of Capture-Recapture
                     List 2
                 In      Out     Total
List 1   In      x11     x10     x1+
         Out     x01     x00     x0+
         Total   x+1     x+0     x++ = N
$$\hat{N} = \frac{x_{1+}\, x_{+1}}{x_{11}}$$
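As a minimal sketch of the two-list estimator, the Python function below computes N-hat = (x1+ · x+1) / x11 from the three observed cells of the table. The function name and the counts are hypothetical, chosen only to illustrate the arithmetic; the result is only as trustworthy as the assumptions behind the method, notably independence of the two lists.

```python
def dual_system_estimate(x11, x10, x01):
    """Two-list (Lincoln-Petersen) estimate of the population size N.

    x11: on both lists; x10: on List 1 only; x01: on List 2 only.
    """
    if x11 <= 0:
        raise ValueError("estimator is undefined when the lists do not overlap")
    list1_total = x11 + x10  # x1+
    list2_total = x11 + x01  # x+1
    return list1_total * list2_total / x11


# Hypothetical counts, for illustration only.
x11, x10, x01 = 60, 140, 90
n_hat = dual_system_estimate(x11, x10, x01)
x00_hat = n_hat - (x11 + x10 + x01)  # implied count missed by both lists
print(n_hat, x00_hat)  # 500.0 210.0
```

The implied count for the unobserved cell equals x10 · x01 / x11, which is the independence assumption stated in terms of the 2x2 table.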
Capture-Recapture Assumptions
• Independence of lists
• Homogeneity of capture probabilities
• Error-free matching across lists
• No in- or out-migration
• No duplicates within a list
• Lists are random samples
Multiple Systems Estimation for Three Lists
• Three lists allow for modeling of dependency and/or heterogeneity.
                     List 3 In            List 3 Out
                     List 2               List 2
                     In       Out         In       Out
List 1   In          x111     x101        x110     x100
         Out         x011     x001        x010     x000
Margins are written with +, for example x11+ = x111 + x110.
Multiple Systems Estimation for Three Lists
• Model for dependency:
$$\log(m_{ijk}) = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)}$$
where $m_{ijk} = E(x_{ijk})$.
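With exactly three lists, the dependency model above (all two-list interactions, no three-list interaction) has as many parameters as there are observed cells, and the fitted count for the unobserved cell has a well-known closed form: m000 = m111 · m100 · m010 · m001 / (m110 · m101 · m011). A minimal Python sketch follows; the function name and the counts are hypothetical and serve only to show the calculation.

```python
def missing_cell_no_three_way(x):
    """Estimate x000 under the log-linear model with no three-list interaction.

    x maps capture patterns (i, j, k), with 1 = in and 0 = out, to counts.
    """
    numerator = x[(1, 1, 1)] * x[(1, 0, 0)] * x[(0, 1, 0)] * x[(0, 0, 1)]
    denominator = x[(1, 1, 0)] * x[(1, 0, 1)] * x[(0, 1, 1)]
    if denominator == 0:
        raise ValueError("closed form needs nonzero counts in the two-list cells")
    return numerator / denominator


# Hypothetical counts for the seven observed cells, for illustration only.
observed = {
    (1, 1, 1): 50, (1, 1, 0): 150, (1, 0, 1): 75, (1, 0, 0): 225,
    (0, 1, 1): 50, (0, 1, 0): 150, (0, 0, 1): 75,
}
x000_hat = missing_cell_no_three_way(observed)
n_hat = sum(observed.values()) + x000_hat
print(x000_hat, n_hat)  # 225.0 1000.0
```

Simpler models that drop some of the two-list terms no longer have this closed form in general and are usually fitted numerically.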
Multiple Systems Estimation for Three Lists
• Full quasi-symmetry (Rasch) model for heterogeneity:
$$\log(\pi_{k_1 k_2 k_3}) = \alpha + k_1\beta_1 + k_2\beta_2 + k_3\beta_3 + \gamma\Big(\sum_{j=1}^{3} k_j\Big)$$
where $\pi_{k_1 k_2 k_3} = \Pr(\text{observing a count in cell } (k_1 k_2 k_3))$, $k_j \in \{0, 1\}$.
• The Rasch model enables projection to the missing cell via moment constraints (inequality restrictions).
Multiple Systems Estimation for More than Three Lists
• Same modeling techniques, more parameters.
• The more high-quality lists are available, the fewer assumptions are required.
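To illustrate how the same machinery extends to any number of lists, the sketch below handles only the simplest member of the log-linear family, the main-effects (independent lists) model. For that model the maximum likelihood estimate of N satisfies N - n = N · Π(1 - n_j / N), where n_j is the size of list j and n is the number of distinct individuals observed. The function name and the data are hypothetical, and this is not the model used in the Kosovo analysis.

```python
def independence_estimate(list_totals, n_observed, tol=1e-9):
    """Solve N - n = N * prod(1 - n_j / N) for N by bisection."""
    if sum(list_totals) <= n_observed:
        raise ValueError("lists do not overlap; no finite estimate exists")

    def f(n_total):
        # Expected number missed by every list if the total were n_total.
        missed = n_total
        for n_j in list_totals:
            missed *= 1.0 - n_j / n_total
        return (n_total - n_observed) - missed

    lo = float(n_observed)  # f(lo) <= 0: N cannot be below the observed count
    hi = 2.0 * lo
    while f(hi) <= 0:       # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical data: four lists and 832 distinct individuals in total.
n_hat = independence_estimate([500, 400, 300, 200], 832)
print(round(n_hat, 3))  # 1000.0
```

Richer models add interaction or heterogeneity terms and are normally fitted with a generalized linear model routine instead of a one-dimensional root search.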
Example: Kosovo
• Analysis required for the trial of former Yugoslav President Slobodan Milosevic for war crimes allegedly committed in Kosovo.
• Question of interest: Did a systematic campaign by Yugoslav forces lead to Kosovar Albanian deaths and expel Kosovar Albanians from their homes?
Example: Kosovo
• Migration data from two sources; analyzed using standard demographic techniques.
• Ethnic Albanian death data from four sources; estimates of number of deaths derived via multiple systems estimation.
Kosovo: Data Sources
• The American Bar Association Central and East European Law Initiative (ABA): 1,674 interviews; 5,089 incidents.
• Exhumations by international teams on behalf of the International Criminal Tribunal for the Former Yugoslavia (EXH): 1,767 exhumations.
• Human Rights Watch (HRW): 337 interviews; 1,717 incidents.
• The Organization for Security and Cooperation in Europe (OSCE): 1,837 interviews; one or more incidents per interview.
Kosovo: Data Matching
• Duplicates within each list removed.
• Six matches performed, one for each pair of lists.
  • Human coders used match-facilitation software.
  • Each list pair matched 2-4 times by different coders.
• Number of individual deaths (killings where the victim can be named): 4,400.
Kosovo: Data Matching
              HRW:    Yes      Yes      No       No
ABA    EXH    OSCE:   Yes      No       Yes      No
Yes    Yes            27       18       181      177
Yes    No             32       31       217      845
No     Yes            42       106      228      1,131
No     No             123      306      936      ???
Total: 4,400
Kosovo: Death Count Estimates
• Estimate of the overall number of deaths, from a log-linear model of the four-way cross-classification table: 10,356 (9,002, 12,122).
• Estimates of the number of deaths in each two-day time period, from log-linear models of three-way cross-classification tables; four such cross-classification tables per time period.
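Because 4,400 named deaths appear on at least one list, the overall estimate implies a count for deaths that no list documented. The short Python sketch below works through that arithmetic using only figures quoted in these slides; the variable names are illustrative, and shifting the interval by the observed count is a rough reading, not a separately derived interval.

```python
observed_deaths = 4_400      # named victims found on at least one list
estimated_total = 10_356     # overall estimate from the four-way table
interval = (9_002, 12_122)   # interval reported with the estimate

unobserved = estimated_total - observed_deaths
unobserved_range = tuple(bound - observed_deaths for bound in interval)
share_documented = observed_deaths / estimated_total

print(unobserved)                  # 5956
print(unobserved_range)            # (4602, 7722)
print(round(share_documented, 3))  # 0.425
```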
Kosovo: Analysis
• Regression analysis performed using KLA and NATO activity data as independent variables and death/migration estimates as dependent variables.
• The analysis supports the conclusion that a systematic campaign of Yugoslav forces was responsible for ethnic Albanian migrations and deaths in Kosovo between March and June of 1999.
Overall Conclusions
• Where several high-quality pre-existing incomplete lists of adverse events exist, multiple systems estimation is a viable technique for estimating a total count of adverse events.
• Relatively sophisticated technical expertise is required to use this estimation technique well.
Further Reading
• Ball, P., Betts, W., Scheuren, F., Dudukovich, J., and Asher, J. (2002). Killings and Refugee Flow in Kosovo March - June 1999: A Report to the International Criminal Tribunal for the Former Yugoslavia. American Association for the Advancement of Science, Washington, DC.
  • Contains a good reference list.
  • Available on my website: http://www.stat.cmu.edu/~asher/PAPERS2002/polkilkos_020109.pdf