Multilevel modelling in 15 minutes

advertisement
Scottish Social Survey Network:
Master Class 1
Data Analysis with Stata
Dr Vernon Gayle and Dr Paul Lambert
23rd January 2008, University of Stirling
The SSSN is funded under Phase II of the ESRC
Research Development Initiative
1
Multilevel data and analysis
with Stata (in 15 minutes)
2
Generalised linear model
• Y = BX + e
• Y = outcome variable(s)
• X = explanatory variables
• e = error term for each individual response
Generalised linear mixed models
– Adding complexity to the GLM, such as by
disaggregating the error structures
3
The work of statistical modelling
• Yi = BXi + ei
• Most of the time:
– we have a single Y
– we ignore e
– we concentrate on what goes into B
4
Example
• Data: British Household Panel Survey 2005
adult interviews (7k adults in work)
• Y = GHQ scale score for adults in employment
(General Health Questionnaire, higher = worse
subjective well-being)
• X = various possible measures, including
gender, age, marital status, occupational
advantage, education, partner’s GHQ
• You can run this example, the files are at:
5
Results from four linear models
1
2
3
4
11.03**
6.29**
6.14**
6.56**
Fem
1.25**
1.28**
1.39**
Age
0.22**
0.23**
0.22**
-0.0024**
-0.0026**
-0.0024**
-0.77**
-0.76**
-1.52**
-0.01*
-0.01
Cons
Age-squared
Cohab
-0.33*
Own CAMSIS
Father’s CAMSIS
0.01
Degree/Diploma
-0.05
Vocational qual
-0.13
No qual
-0.11
Works > 10hrs
0.13
Partner’s GHQ
R2
0.08**
0.0009
0.0234
0.0244
0.0293
Some regression assumptions
All variables are measured without errors
All relevant predictors of the independent
variable are included in the analysis
Expected value of the error is zero
Heteroscedasticity of the error
No autocorrelation (no relation between error
terms for different cases)
– [above using: Menard, S. 1995. Applied
Logistic Regression Analysis, London: Sage.]
7
Multilevel modelling
• What if there was some connection
between some of the cases within the
dataset?
– This occurs by design in certain projects
• e.g. educational research, sample includes
multiple children from the same school
– Some connections (‘hierarchical clusters’) are
standard in most social surveys
8
. .
Regions
PSU1
Individuals
Person Groups
PSU2
. .
PSU3
Wave 1
Wave 2
Wave 3
.
.
.
.
Interviewers :
Interviewer1
W 1, 3 :
Interviewer2
W 2 only :
.
.
.
.
.
.
Interviewer2
Interviewer3
.
.
Interviewer3
Interviewer1
9
How to account for hierarchy /
clustering in individual data?
1. We could try a unique dummy var. for every cluster
–
–
–
–
Country: Y = BX + scot + wal + Nir + e
‘areg’ in Stata allows several hundred variables like this
often called a ‘hierarchical fixed effect’
but many hierarchies have too many clusters for this to be
satisfactory
2. We could use higher level explanatory variables
–
–
e.g. average unemployment rate in local authority district
these are also ‘hierarchical fixed effects’
3. We could try telling the model that we expect the error terms
to be related
–
these are ‘hierarchical random effects’ = multilevel models
10
Creating a multilevel model
• Linear model:
Yi = BXi + ei
• Multilevel model (‘random intercepts’)
Yij = BXij + uj + eij
• Multilevel model (‘random coefficients’)
Yij = BXij + UBj + uj + eij
11
How to implement multilevel
models?
• In SPSS and Stata, there are extension
specifications which can be made in order
to specify the simplest random intercepts
model
12
Stata examples
• regress ghq fem age age2 cohab
• regress ghq fem age age2 cohab, robust cluster(ohid)
• xtmixed ghq fem age age2 cohab ||ohid:
13
Comments
• Models which ignore clustering should be
unbiassed but inefficient
• The simplest multilevel model:
Shouldn’t change coefficent estimates
(unbiased)
Should change confidence intervals
(inefficient)
14
15
16
3-level model in Stata (xtmixed)
17
The same model in MLwiN
18
A controversial claim about Stata
• Stata is the best package to use for multilevel modelling,
because:
– It is integrated with data management capacity: easy to change
variables; change cases; add higher level explanatory variables;
etc
– It has a wide range of hierarchical model estimators
– It allows easy comparison between long-standing hierarchical
estimators (from economics) and new random effects models
• By constrast:
– Other mainstream packages don’t have adequate range of
model estimators
– Specialist packages (e.g. MLwiN; HLM) do have more advanced
modelling estimators, but they inhibit data manipulation / serious
model building
19
Download