Advanced Lazarsfeldian Methodology Conference

From Lazarsfeldian Contextual Analysis to Multilevel Models
(Strategies for analysis of individual and/or aggregate data)
Petr Soukup

Basic "ideology"
- Gauss (1805): "Regression analysis, OLS"
- Homans (1950): The Human Group
- Robinson (1950): "Ecological fallacy"
- Lazarsfeld, Menzel (1961): On the relation between individual and collective properties
- Iversen (1991): Contextual analysis
- And many others on multilevel modeling

Three possible strategies to analyze individual and/or aggregate data
1. Analyze only individual data (classical regression or correlation analysis)
2. Analyze only aggregate data (but: Robinson's problem and the ecological inference (EI) solution)
3. Analyze individual and aggregate data at once (contextual and multilevel analysis)

Individual level data only
1st possible strategy (only individual data)
- We omit information about aggregate levels (groups etc.), so we lose some explained variance
- We use classical regression or correlation analysis and many other methods
- We make some statistical mistakes by ignoring dependencies among observations (that is, by pretending they are independent)
- Results are usually quite good, but we are not able to differentiate between aggregate levels
- If we have both individual and aggregate data, this is only the second-best strategy

Results for 1st strategy
[Figure: income plotted against years of education, with the fitted line y = 5000 + 950x]
The result is one regression line (one equation). This equation is the same for all individuals (an "average" line).
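The single "average" line of the 1st strategy can be illustrated with a minimal sketch. It assumes two hypothetical groups whose true income lines differ; the group lines, the education values, and the helper name ols_line are invented for illustration and do not come from the talk. Pooling both groups and fitting one least-squares line returns a single equation that describes neither group exactly.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical, noise-free data: the same education values in both groups,
# but a different true line in each group (assumed for illustration only).
years = [8, 10, 12, 14, 16, 18, 20]
group_a = [4000 + 900 * x for x in years]
group_b = [6000 + 1000 * x for x in years]

# 1st strategy: ignore group membership and fit one line to the pooled data.
pooled_x = years + years
pooled_y = group_a + group_b
intercept, slope = ols_line(pooled_x, pooled_y)

print(f"group A line: y = 4000 + 900x")
print(f"group B line: y = 6000 + 1000x")
print(f"pooled line:  y = {intercept:.0f} + {slope:.0f}x")
```

With these invented numbers the pooled fit lands halfway between the two group lines, which is why the group differences are invisible once only individual data are analyzed.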
Inference from aggregate data
2nd possible strategy (only aggregate data)
- We omit, or do not have, information about the individual level
- We want to infer about individual behaviour
- We can make crucial mistakes (Robinson's problem), the so-called ecological fallacy (demonstrated on the inference about the relation between education and salary)

Ecological fallacy solution = ECOLOGICAL INFERENCE
- 1950s: method of bounds (Duncan, Davis, 1953), ecological regression (Goodman, 1953, 1959)
- 1990s: King, A Solution to the Ecological Inference Problem (1997)
- A general solution cannot be found (we always lose information by aggregation); current solutions are only specific ones

Aggregate (groups) + individual data
[Picture 4: Different intercepts and slopes. Income plotted against years of education, with separate lines for male, female, and male+female]

Lazarsfeldian approach
[Lazarsfeld, Menzel 1961]: typology of variables
- Individual level: 1) global, 2) relational, 3) contextual
- Aggregate level: 4) analytical, 5) structural
Examples of these types
Warning: this is a "reduced" version of the original typology
- 4) can be derived from 1) by aggregation
- 5) can be derived from 2) by aggregation
- 3) can be derived by disaggregation from 1) or 2) measured at the aggregate level

Lazarsfeldian approach (continued)
- This process (aggregation and disaggregation) can of course involve more than two levels, in principle without limit (in practical analyses, two or three levels)
- The name "contextual analysis": we use information about aggregate data when we analyse individual data
- Currently we use multilevel analysis, which is based on Lazarsfeldian contextual ideas

Contextual/multilevel analysis
Problem with group/context
- Group boundaries: they can sometimes be fuzzy, and it is difficult to decide whether somebody is a member of a group or not
- Mobility between groups: people tend to change group membership (change of school, neighborhood, church, etc.)
  ("new" members are not influenced by the group to the same degree as the old ones)
- Multiple membership (overlapping): people are usually members of more than one group, so we should work with more than one context. Which context(s) is (are) the most important? (For a possible solution, see the slide "Other problems that can be solved via ML models".)

Two types of contextual analysis
- Method 1: interaction variables for individuals and groups, or
- Method 2: a two-step model. First, a model based on variables measured at the first level is estimated separately for each context; then these estimates are used as dependent variables at the second level.

Multilevel analysis
- Inclusion of random error at the second (group) level
- Estimates by iterative methods
- More precise estimates (lower standard errors)

Multilevel approach as more general
Growth models
- We measure a characteristic of individuals many times. We can treat the measurements at individual time points as the first level (similar to pupils in schools) and the individuals as the second level (similar to schools).
- The "average" growth curve is one result of the analysis; a second result can be a description (or explanation) of the differences between individual growth curves.
Meta-analysis
- 1st level: data from individual studies
- 2nd level: individual studies
- Goal: 1. to find a common result across all covered studies, and 2. to find the reasons for differences between studies
Cross-classified models
- An individual can be included in more than one group; these groups are not hierarchically nested, and their influences are mixed (crossed)

Conclusion?

Real research - examples
- HSB (USA)
- PISA, TIMSS or ICCS (international)

Model for multilevel analysis (random coefficient model, see Hox, 2002)
i: index for individual, j: index for group
Lowest (individual) level:
(1) $Y_{ij} = b_{0j} + b_{1j} X_{ij} + e_{ij}$, where X is an individual variable
Second (group) level:
(2) $b_{0j} = g_{00} + g_{01} Z_j + u_{0j}$
(3) $b_{1j} = g_{10} + g_{11} Z_j + u_{1j}$, where Z is a group variable
NEW: random parts at the second level ($u_{0j}$, $u_{1j}$), because we do not have information about all groups!
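Combining individual and group level, (1)+(2)+(3):
$Y_{ij} = g_{00} + g_{10} X_{ij} + g_{01} Z_j + g_{11} Z_j X_{ij} + u_{1j} X_{ij} + u_{0j} + e_{ij}$
- fixed part: $g_{00} + g_{10} X_{ij} + g_{01} Z_j + g_{11} Z_j X_{ij}$
- random part: $u_{1j} X_{ij} + u_{0j} + e_{ij}$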
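The substitution of (2) and (3) into (1) can be checked numerically with a short simulation. This is only a sketch under stated assumptions: the coefficient values, the variances, the binary group variable Z, and the function name simulate are all hypothetical and chosen for illustration. It generates data level by level and confirms that the combined equation, split into its fixed and random parts, gives the same Y. It does not estimate the model; estimation would need the iterative methods mentioned earlier.

```python
import random


def simulate(n_groups=50, n_per_group=20, seed=1):
    """Generate two-level data and return rows of
    (group, x, z, y_from_levels, y_from_combined_equation)."""
    rng = random.Random(seed)
    # Hypothetical fixed coefficients g00, g01, g10, g11 (assumed values).
    g00, g01, g10, g11 = 5000.0, 800.0, 950.0, 100.0
    rows = []
    for j in range(n_groups):
        z = rng.choice([0.0, 1.0])   # group variable Z_j
        u0 = rng.gauss(0, 500)       # random part of the intercept, u_0j
        u1 = rng.gauss(0, 50)        # random part of the slope, u_1j
        b0 = g00 + g01 * z + u0      # equation (2)
        b1 = g10 + g11 * z + u1      # equation (3)
        for _ in range(n_per_group):
            x = rng.uniform(5, 25)   # individual variable X_ij
            e = rng.gauss(0, 1000)   # individual-level error e_ij
            y_levels = b0 + b1 * x + e                       # equation (1)
            fixed = g00 + g10 * x + g01 * z + g11 * z * x    # fixed part
            rand_part = u1 * x + u0 + e                      # random part
            rows.append((j, x, z, y_levels, fixed + rand_part))
    return rows


rows = simulate()
# Largest difference between the level-by-level and the combined form;
# it should be zero up to floating-point rounding.
max_gap = max(abs(y_levels - y_comb) for _, _, _, y_levels, y_comb in rows)
print(f"rows: {len(rows)}, largest gap between the two forms: {max_gap:.2e}")
```

Note that the random part contains the term u1j * Xij, so the spread of Y around the fixed part depends on X; that is what distinguishes a random coefficient model from a regression with a single error term.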