aggregate and individual data

Advanced Lazarsfeldian
Methodology Conference
From Lazarsfeldian Contextual analysis to
Multilevel models
(Strategies for analysis of individual and/or
aggregate data)
Petr Soukup
Basic „ideology“
• Gauss (1805) – „Regression analysis, OLS“
• Homans (1950) – The Human Group
• Robinson (1950) – „Ecological fallacy“
• Lazarsfeld, Menzel (1961) – On the relation between individual and collective properties
• Iversen (1991) – Contextual analysis
• And many others… on multilevel modeling
Three possible strategies
how to analyse data
Three possible strategies to analyze
individual and/or aggregate data
• Analyze only individual data (classical regression or correlation analysis)
• Analyze only aggregate data (this runs into Robinson's problem; ecological inference (EI) is the proposed solution)
• Analyze individual and aggregate data at once (contextual and multilevel analysis)
Individual level data only
1st possible strategy
(only individual data)
• We omit information about aggregate levels (groups etc.), so we lose some explained variance
• We use classical regression or correlation analysis and many other methods
• We make statistical mistakes by ignoring dependencies among observations (that is, by pretending they are independent)
• Results are usually quite good, but we cannot differentiate between aggregate levels
• If we have both individual and aggregate data, this is only the second-best strategy
Results for 1st strategy
[Figure: income (axis 0 to 30000) against years of education (axis up to 25), with one fitted regression line, y = 5000 + 950x]
• The result is one regression line (one equation). This equation is the same for all individuals (an „average“ line).
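A minimal sketch of the first strategy, assuming synthetic data generated around the line from the figure (the sample size, the noise level, and the helper name `ols_line` are illustrative assumptions, not part of the original analysis). It fits a single pooled line for all individuals:

```python
import random


def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic data (assumption): income scattered around y = 5000 + 950x.
rng = random.Random(0)
education = [rng.uniform(5, 25) for _ in range(500)]
income = [5000 + 950 * x + rng.gauss(0, 2000) for x in education]

# One equation for everybody: group membership is ignored entirely.
intercept, slope = ols_line(education, income)
print(f"income = {intercept:.0f} + {slope:.0f} * years_of_education")
```

Whatever groups the individuals belong to, this approach returns the same two numbers for everyone.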
Inference from aggregate data
2nd possible strategy
(only aggregate data)
• We omit, or do not have, information about the individual level
• We want to infer individual behaviour
• We can make crucial mistakes, the so-called ecological fallacy (Robinson's problem); demonstrated on inference about the relation between education and salary
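The fallacy can be illustrated with a small sketch on synthetic data. All numbers below are assumptions chosen to make the effect visible: five hypothetical regions are built so that regions with more education have higher salaries, while inside every region the relation is negative.

```python
import random


def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


rng = random.Random(1)
groups = []
for j in range(5):
    mean_edu = 8 + 2 * j             # regional average education (assumed)
    base_salary = 10000 + 3000 * j   # regional salary level (assumed)
    edu = [mean_edu + rng.gauss(0, 2) for _ in range(200)]
    # Within a region, salary falls with education (assumed slope -500).
    sal = [base_salary - 500 * (x - mean_edu) + rng.gauss(0, 500) for x in edu]
    groups.append((edu, sal))

# Aggregate analysis: regress regional mean salary on regional mean education.
mean_edu_by_group = [sum(e) / len(e) for e, _ in groups]
mean_sal_by_group = [sum(s) / len(s) for _, s in groups]
aggregate_slope = slope(mean_edu_by_group, mean_sal_by_group)

# Individual analysis inside each region.
within_slopes = [slope(e, s) for e, s in groups]

print(f"aggregate slope: {aggregate_slope:.0f}")
print("within-group slopes:", [round(w) for w in within_slopes])
```

Reading the aggregate slope as a statement about individuals would get even the sign wrong in this constructed case.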
2nd possible strategy
(only aggregate data)
• Solution to the ecological fallacy = ECOLOGICAL INFERENCE
• 1950s: method of bounds (Duncan, Davis, 1953), ecological regression (Goodman, 1953, 1959)
• 1990s: King, A Solution to the Ecological Inference Problem (1997)
• A general solution cannot be found (we always lose information by aggregation); current solutions are only specific ones
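To make the method of bounds concrete, here is a hedged sketch for a single areal unit and a 2x2 table. The function name and the example shares are hypothetical. Knowing only the share of the unit belonging to a group (`x`) and the share showing the outcome (`t`), the unknown rate of the outcome inside the group can be bounded but not pinned down:

```python
def duncan_davis_bounds(x, t):
    """Deterministic bounds on the outcome rate within a group.

    x: share of the unit's population that belongs to the group (0 < x <= 1)
    t: share of the unit's population that shows the outcome (0 <= t <= 1)
    """
    if not 0 < x <= 1 or not 0 <= t <= 1:
        raise ValueError("x must be in (0, 1] and t in [0, 1]")
    # Lowest rate: everyone outside the group shows the outcome first.
    lower = max(0.0, (t - (1.0 - x)) / x)
    # Highest rate: all outcome cases are placed inside the group.
    upper = min(1.0, t / x)
    return lower, upper


# Hypothetical unit: 60 % belong to the group, 70 % show the outcome.
lower, upper = duncan_davis_bounds(0.6, 0.7)
print(f"group rate lies between {lower:.2f} and {upper:.2f}")
```

The width of the interval shows how much information aggregation has removed; the bounds narrow only when the margins happen to be extreme.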
Aggregate (groups) + individual data
Picture 4
(Different intercepts and slopes)
[Figure: income (axis 0 to 30000) against years of education (axis up to 25), with separate regression lines for male, female, and male+female]
Lazarsfeldian approach
• [Lazarsfeld, Menzel 1961] – typology of variables
• 1) global, 2) relational, 3) contextual – individual level
• 4) analytical and 5) structural – aggregate level
• Examples of these types
• Warning: This is a „reduced“ version of the original typology
   • 4) can be derived from 1) by aggregation
   • 5) can be derived from 2) by aggregation
   • 3) can be derived by disaggregation from 1) or 2) measured at the aggregate level
Lazarsfeldian approach
• This process (aggregation and disaggregation) can in principle span any number of levels (in practical analyses, two or three)
• The name „contextual analysis“: we use information about aggregate data when we analyse individual data
• Today we use multilevel analysis, which builds on Lazarsfeldian contextual ideas
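The aggregation and disaggregation steps can be sketched on a toy data set. The pupils, schools, and scores below are hypothetical. An individual-level variable is aggregated into a group-level (analytical) variable, which is then attached back to each individual as a contextual variable:

```python
from collections import defaultdict

# Hypothetical individual-level records: one row per pupil.
pupils = [
    {"school": "A", "score": 10.0},
    {"school": "A", "score": 14.0},
    {"school": "B", "score": 20.0},
    {"school": "B", "score": 22.0},
    {"school": "B", "score": 24.0},
]

# Aggregation: individual scores -> school mean (aggregate level).
scores_by_school = defaultdict(list)
for pupil in pupils:
    scores_by_school[pupil["school"]].append(pupil["score"])
school_mean = {s: sum(v) / len(v) for s, v in scores_by_school.items()}

# Disaggregation: the school mean becomes a contextual variable
# carried by every pupil of that school.
for pupil in pupils:
    pupil["school_mean_score"] = school_mean[pupil["school"]]

print(school_mean)
print(pupils[0])
```

After the second step each pupil row holds both an individual value and a context value, which is the data layout that contextual and multilevel analyses work on.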
Contextual/multilevel analysis
Problem with group/context
• Group boundaries – these can sometimes be fuzzy; it is difficult to decide whether somebody is a member of a group or not
• Mobility between groups – people tend to change group membership (change of school, neighborhood, church etc.); „new“ members are not influenced by the group to the same degree as the old ones
• Multiple membership (overlapping) – people are usually members of more than one group, so we should work with several contexts. Which context(s) is (are) the most important? (For a possible solution, see the slide „Other problems that can be solved via ML models“.)
Two types of contextual analysis
• Interaction variables for individuals and groups (Method 1), or
• A two-step model: first estimate a model on first-level variables separately within each context, then use these estimates as dependent variables at the second level (Method 2).
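Method 2 can be sketched as follows, assuming synthetic data and made-up coefficient values (the names `g00`, `g01`, `g10`, `g11`, the group variable `z`, and the helper `ols_line` are illustrative assumptions). Step 1 fits one regression per group; step 2 treats the estimated intercepts and slopes as dependent variables and regresses them on the group variable:

```python
import random


def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


rng = random.Random(2)
g00, g01, g10, g11 = 1000.0, 200.0, 500.0, 50.0  # assumed true values

z_values, intercepts, slopes = [], [], []
for j in range(10):
    z = float(j)                              # group-level variable
    b0 = g00 + g01 * z + rng.gauss(0, 100)    # group intercept
    b1 = g10 + g11 * z + rng.gauss(0, 10)     # group slope
    xs = [rng.uniform(0, 10) for _ in range(100)]
    ys = [b0 + b1 * x + rng.gauss(0, 100) for x in xs]
    a_hat, b_hat = ols_line(xs, ys)           # step 1: within-group fit
    z_values.append(z)
    intercepts.append(a_hat)
    slopes.append(b_hat)

# Step 2: first-step estimates become second-level dependent variables.
est_g00, est_g01 = ols_line(z_values, intercepts)
est_g10, est_g11 = ols_line(z_values, slopes)
print(f"intercepts ~ {est_g00:.0f} + {est_g01:.0f} * z")
print(f"slopes     ~ {est_g10:.0f} + {est_g11:.1f} * z")
```

The second step treats every first-step estimate as equally reliable, which is one reason this procedure is less efficient than estimating both levels jointly.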
Multilevel analysis
• Inclusion of random error at the second (group) level
• Estimates by iterative methods
• More precise estimates (lower standard errors)
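As a rough illustration of what a random error at the group level means, the sketch below splits the variance of an outcome into a between-group and a within-group part. It assumes balanced synthetic groups and uses the simple one-way ANOVA (method-of-moments) estimator, not the iterative estimation that multilevel software performs; all names and values are hypothetical.

```python
import random


def variance_components(groups):
    """ANOVA estimates (between_var, within_var, icc) for equal-sized groups."""
    k = len(groups)
    n = len(groups[0])  # assumes every group has the same size
    grand = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, means) for y in g
    ) / (k * (n - 1))
    between = max(0.0, (ms_between - ms_within) / n)
    icc = between / (between + ms_within)
    return between, ms_within, icc


rng = random.Random(5)
groups = []
for _ in range(50):
    u0 = rng.gauss(0, 3)  # group-level random error (assumed variance 9)
    # Individual-level error has assumed variance 16.
    groups.append([100 + u0 + rng.gauss(0, 4) for _ in range(20)])

between_var, within_var, icc = variance_components(groups)
print(f"between-group variance: {between_var:.1f}")
print(f"within-group variance:  {within_var:.1f}")
print(f"intraclass correlation: {icc:.2f}")
```

A clearly non-zero intraclass correlation signals that observations in the same group are dependent, which is the situation the group-level random error is meant to capture.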
Multilevel approach as more general
Growth models
• We measure a characteristic of individuals many times. We can treat the measurement occasions as the first level (similar to pupils in schools) and the individuals as the second level (similar to schools). The „average“ growth curve is one result of the analysis; a second result can be a description (or explanation) of the differences between individual growth curves.
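The two results mentioned above can be sketched with a simple per-person fit on synthetic data. This is a two-step approximation rather than a full multilevel growth model, and the number of occasions, the growth rates, and the helper `ols_line` are assumptions:

```python
import random


def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


rng = random.Random(3)
occasions = [0, 1, 2, 3, 4]  # level 1: measurement occasions

curves = []
for person in range(100):    # level 2: individuals
    start = 50 + rng.gauss(0, 5)     # person-specific starting point
    rate = 2.0 + rng.gauss(0, 0.5)   # person-specific growth rate
    scores = [start + rate * t + rng.gauss(0, 1) for t in occasions]
    curves.append(ols_line(occasions, scores))

# Result 1: the "average" growth curve.
mean_intercept = sum(a for a, _ in curves) / len(curves)
mean_slope = sum(b for _, b in curves) / len(curves)

# Result 2: how much individual growth curves differ from one another.
slope_sd = (
    sum((b - mean_slope) ** 2 for _, b in curves) / (len(curves) - 1)
) ** 0.5

print(f"average curve: {mean_intercept:.1f} + {mean_slope:.2f} * time")
print(f"spread of individual slopes (SD): {slope_sd:.2f}")
```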
Meta-analysis
• 1st level: data from individual studies
• 2nd level: individual studies
Goal:
• 1. to find a common result of all covered studies, and
• 2. to find the reasons for differences between studies
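Both goals can be sketched with a classical fixed-effect pooling, used here as a simpler stand-in for the multilevel formulation. The three effect sizes and their sampling variances are hypothetical. The inverse-variance weighted mean addresses the first goal, and Cochran's Q indicates whether there are differences between studies worth explaining:

```python
# Hypothetical study results: effect estimates and their sampling variances.
effects = [0.30, 0.10, 0.50]
variances = [0.01, 0.04, 0.02]

# Goal 1: common result, weighting each study by its precision.
weights = [1.0 / v for v in variances]
pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
pooled_se = (1.0 / sum(weights)) ** 0.5

# Goal 2 (first step): Cochran's Q measures between-study heterogeneity;
# under homogeneity it is approximately chi-square with k - 1 df.
q_stat = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
df = len(effects) - 1

print(f"pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
print(f"Cochran's Q: {q_stat:.2f} on {df} df")
```

A multilevel meta-analysis would go further by adding a study-level random error and study characteristics as second-level predictors.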
Cross-classified models
An individual can belong to more than one group; these groups are not hierarchically nested, and their influences are mixed (crossed).
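Whether two groupings are nested or crossed can be checked directly from the membership data. The sketch below uses hypothetical labels and a hypothetical helper `is_nested`; it reports a grouping as nested only when every unit of the first grouping falls into exactly one unit of the second:

```python
def is_nested(pairs):
    """True if each first-grouping label maps to exactly one second label."""
    seen = {}
    for inner, outer in pairs:
        # setdefault stores the first outer label seen for this inner label.
        if seen.setdefault(inner, outer) != outer:
            return False
    return True


# Classes sit inside schools: a hierarchical (nested) structure.
class_in_school = [
    ("class_a", "school_1"),
    ("class_b", "school_1"),
    ("class_c", "school_2"),
]

# Pupils of one school live in several neighbourhoods: a crossed structure.
school_by_neighbourhood = [
    ("school_1", "north"),
    ("school_1", "south"),
    ("school_2", "north"),
]

print(is_nested(class_in_school))          # nested hierarchy
print(is_nested(school_by_neighbourhood))  # crossed classification
```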
Conclusion?
Real research - examples
• HSB – USA
• PISA, TIMSS or ICCS internationally
Model for multilevel analysis
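(random coefficient model; see Hox, 2002)
i – index for individual, j – index for group
Lowest (individual) level:
(1) $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + e_{ij}$
X – individual variable
Second (group) level:
(2) $\beta_{0j} = \gamma_{00} + \gamma_{01} Z_j + u_{0j}$
(3) $\beta_{1j} = \gamma_{10} + \gamma_{11} Z_j + u_{1j}$
Z – group variable
NEW: random parts at the second level ($u_{0j}$, $u_{1j}$), because we do not have information about all groups!
Combining the individual and group levels, (1)+(2)+(3):
$Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} Z_j + \gamma_{11} Z_j X_{ij} + u_{1j} X_{ij} + u_{0j} + e_{ij}$
- fixed part: $\gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} Z_j + \gamma_{11} Z_j X_{ij}$
- random part: $u_{1j} X_{ij} + u_{0j} + e_{ij}$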
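The substitution of (2) and (3) into (1) can be checked numerically. The sketch below assumes made-up values for the four second-level coefficients (named `g00`, `g01`, `g10`, `g11` in the code) and for the error variances. It generates data level by level and confirms that the result equals the sum of the fixed part and the random part:

```python
import random

rng = random.Random(4)
g00, g01, g10, g11 = 10.0, 2.0, 3.0, 0.5  # hypothetical coefficients

max_gap = 0.0
for j in range(20):                   # groups
    z = rng.uniform(0, 5)             # group variable Z_j
    u0 = rng.gauss(0, 1.0)            # random part of the intercept
    u1 = rng.gauss(0, 0.3)            # random part of the slope
    b0 = g00 + g01 * z + u0           # equation (2)
    b1 = g10 + g11 * z + u1           # equation (3)
    for i in range(30):               # individuals within group j
        x = rng.uniform(0, 10)        # individual variable X_ij
        e = rng.gauss(0, 2.0)         # individual-level error
        y_two_level = b0 + b1 * x + e                 # equation (1)
        fixed_part = g00 + g10 * x + g01 * z + g11 * z * x
        random_part = u1 * x + u0 + e
        gap = abs(y_two_level - (fixed_part + random_part))
        max_gap = max(max_gap, gap)

print(f"largest difference between the two forms: {max_gap:.2e}")
```

The term `g11 * z * x` in the fixed part is the cross-level interaction: it appears only because the slope in (3) depends on the group variable.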