Univariate Models

advertisement
Univariate Model Fitting
Sarah Medland
QIMR – openMx workshop
Brisbane 16/08/10
Univariate Twin Models
= using the twin design to assess the
aetiology of one trait (univariate)
1.
2.
Path Diagrams
Basic ACE R Script
1. Path Diagrams for Univariate
Models
Basic Twin Model - MZ
1.0
1.0
1
1
A
1
1
C
a
c
E
e
Twin 1 Trait
E
C
e
1
1
c
A
a
Twin 2 Trait
Basic Twin Model - DZ
0.5
1.0
1
1
A
1
1
C
a
c
E
e
Twin 1 Trait
E
C
e
1
1
c
A
a
Twin 2 Trait
Basic Twin Model
rMZ = 1.0;
rDZ = 0.5
rMZ = 1.0;
rDZ = 1.0
1
1
A
1
1
C
a
c
E
e
Twin 1 Trait
E
C
e
1
1
c
A
a
Twin 2 Trait
The Variance

Since the variance of a variable is the
covariance of the variable with itself,
the expected variance will be the sum
of all paths from the variable to itself,
which follow Wright’s rules
Variance for Twin 1 - A
1
1
A
1
C
a
c
Twin 1 Trait
E
e
a*1*a = a2
Variance for Twin 1 - C
1
1
A
1
C
a
c
Twin 1 Trait
E
e
a*1*a = a2
c*1*c =
2
c
Variance for Twin 1 - E
1
1
A
1
C
a
c
Twin 1 Trait
E
e
a*1*a =
2
a
c*1*c =
2
c
e*1*e = e2
Total Variance for Twin 1
1
1
A
1
C
a
c
E
e
Twin 1 Trait
a*1*a =
2
a
c*1*c =
2
c
e*1*e = e2
Total Variance = a2 + c2 + e2
Covariance - MZ
1.0
2
a
1.0
1
1
A
1
1
C
a
c
E
e
Twin 1 Trait
E
C
e
1
1
c
A
a
Twin 2 Trait
+
2
c
Covariance - DZ
0.5
2
0.5a
1.0
1
1
A
1
1
C
a
c
E
e
Twin 1 Trait
E
C
e
1
1
c
A
a
Twin 2 Trait
+
2
c
Variance-Covariance Matrices
MZ
Twin 1
Twin 2
Twin 1
a2 + c2 + e2
a2 + c2
Twin 2
a2 + c2
a2 + c2 + e2
Variance-Covariance Matrices
DZ
Twin 1
Twin 2
Twin 1
a2 + c2 + e2 0.5a2 + c2
Twin 2
0.5a2 + c2 a2 + c2 + e2
Variance-Covariance Matrices
MZ
Twin 1
Twin 2
Twin 1
a2 + c2 + e2
a2 + c2
Twin 2
a2 + c2
a2 + c2 + e2
DZ
Twin 1
Twin 2
Twin 1
a2 + c2 + e2
0.5a2 + c2
Twin 2
0.5a2 + c2
a2 + c2 + e2
Why isn’t e2 included in the covariance?
Because, e2 refers to environmental
influences UNIQUE to each twin.
Therefore, this cannot explain why
there is similarity between twins.
Why is a2 only .5 for DZs but not
MZs?
Because DZ twins share on average half
of their genes, whereas MZs share all
of their genes.
2. Basic openMx ACE Script
Overview

OpenMx script

Running the script

Describing the output
ACE model
Specify the
matrices you need
To build the model
# Fit ACE Model with RawData and Matrices Input
# ----------------------------------------------------------------------univACEModel <- mxModel("univACE",
mxModel("ACE",
# Matrices a, c, and e to store a, c, and e path coefficients
mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=10, label="a11", name="a" ),
mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=10, label="c11", name="c" ),
mxMatrix( type="Lower", nrow=nv, ncol=nv, free=TRUE, values=10, label="e11", name="e" ),
# Matrices A, C, and E compute variance components
mxAlgebra( expression=a %*% t(a), name="A" ),
1
1
1
mxAlgebra( expression=c %*% t(c), name="C" ),
A
C
mxAlgebra( expression=e %*% t(e), name="E" ),
# Algebra to compute total variances and standard deviations (diagonal only)
mxAlgebra( expression=A+C+E, name="V" ),
c
e
a
Do some algebra to
get the variances
Twin 1 Trait
E
Start values?
a2 = additive genetic variance (A)
c2 = Shared E variance (C)
e2 = Non-shared E variance (E)
Sum is modelled to be expected Total Variance
Start Values for a, c, e:
1
 (Total Variance / 3)
1
A
1
C
a
c
Twin 1 Trait
E
e
Standardize parameter estimates
# Algebra to compute total variances and standard deviations (diagonal only)
mxAlgebra( expression=A+C+E, name="V" ),
mxMatrix( type="Iden", nrow=nv, ncol=nv, name="I"),
mxAlgebra( expression=solve(sqrt(I*V)), name="iSD"),
# Algebra to compute standardized path estimares and variance components
mxAlgebra( expression=a%*%iSD, name="sta"),
mxAlgebra( expression=c%*%iSD, name="stc"),
mxAlgebra( expression=e%*%iSD, name="ste"),
mxAlgebra( expression=A/V, name="h2"),
mxAlgebra( expression=C/V, name="c2"),
1
mxAlgebra( expression=E/V, name="e2"),
A
The regression coefficient a is standardized by: (a * SD(A)) / SD(Trait)
where SD(Trait) is the standard deviation of the dependent
variable, and SD(A) is the standard deviation of the predictor,
the latent factor ‘A’ (=1)
1 * V = V
SD
inv
=
1/SD
V
a
Twin 1 Trait
a * 1/SD = a/SD
Standardize parameter estimates
# Algebra to compute total variances and standard deviations (diagonal only)
mxAlgebra( expression=A+C+E, name="V" ),
mxMatrix( type="Iden", nrow=nv, ncol=nv, name="I"),
mxAlgebra( expression=solve(sqrt(I*V)), name="iSD"),
# Algebra to compute standardized path estimares and variance components
mxAlgebra( expression=a%*%iSD, name="sta"),
mxAlgebra( expression=c%*%iSD, name="stc"),
mxAlgebra( expression=e%*%iSD, name="ste"),
mxAlgebra( expression=A/V, name="h2"),
mxAlgebra( expression=C/V, name="c2"),
1
mxAlgebra( expression=E/V, name="e2"),
A
The heritability ‘h2’ is the proportion of the total variance due to A (additive
genetic effects; = A / V. Note: this will be “sta” squared.
The standardized variance components for C and E are: C / V; E / V N
A / V =
“h2”
“sta”
2
sta
Twin 1 Trait
Standardising data
V=A+C+E
A/V
=
73/233=
.31
V = [73] + [90] + [70]
C/V
=
90/233=
.39
V = [233]
E/V
=
70/233=
.30
a = 8.55
c = 9.49
e = 8.35
SD = sqrt(V) =
15.26
sta = 8.55 / 15.26 = 0.56
stc = 9.49 / 15.26 = 0.62
ste = 8.35 / 15.26 = 0.55
squared = .31
squared = .39
squared = .30
# Algebra for expected variance/covariance matrix in MZ
mxAlgebra( expression= rbind ( cbind(A+C+E , A+C),
cbind(A+C , A+C+E)), name="expCovMZ" ),
# Algebra for expected variance/covariance matrix in DZ
mxAlgebra( expression= rbind ( cbind(A+C+E , 0.5%x%A+C),
cbind(0.5%x%A+C , A+C+E)), name="expCovDZ" )
MZ
Twin 1
Twin 2
Twin 1
a2 + c2 + e2
a2 + c2
Twin 2
a2 + c2
a2 + c2 + e2
DZ
Twin 1
Twin 2
Twin 1
a2 + c2 + e2
0.5a2 + c2
Twin 2
0.5a2 + c2
a2 + c2 + e2
1/ 0.5
1.0
1
1
A
C
a
c
1
E
e
Twin 1 Trait
1
E
e
C
c
1
a
Twin 2 Trait
1
A
mxModel("MZ",
mxData( observed=mzData, type="raw" ),
mxFIMLObjective( covariance="ACE.expCovMZ",
means="ACE.expMean", dimnames=selVars )
),
mxModel("DZ",
mxData( observed=dzData, type="raw" ),
mxFIMLObjective( covariance="ACE.expCovDZ",
means="ACE.expMean", dimnames=selVars )
),
mxAlgebra( expression=MZ.objective + DZ.objective,
name="m2ACEsumll" ),
mxAlgebraObjective("m2ACEsumll"),
mxCI(c('ACE.A', 'ACE.C', 'ACE.E'))
)
univACEFit <- mxRun(univACEModel, intervals=T)
univACESumm <- summary(univACEFit)
univACESumm
Sub-models
# Fit AE model
# ----------------------------------------------------------------------univAEModel <- mxModel(univACEFit, name="univAE",
mxModel(univACEFit$ACE,
mxMatrix( type="Lower", nrow=1, ncol=1, free=FALSE, values=0, label="c11", name="c" ) )
)
univAEFit <- mxRun(univAEModel)
univAESumm <- summary(univAEFit)
univAESumm
A
E
Twin 1 Trait
E
A
Twin 2 Trait
You can fit sub-models to test the significance
of your parameters
-you simply drop the parameter and see if the
model fit is significantly worse than full model
Sub-models
# Fit CE model
# ----------------------------------------------------------------------univCEModel <- mxModel(univACEFit, name="univCE",
mxModel(univACEFit$ACE,
mxMatrix( type="Lower", nrow=1, ncol=1, free=FALSE, values=0, label="a11", name="a" ) )
)
univCEFit <- mxRun(univCEModel)
univCESumm <- summary(univCEFit)
univCESumm
C
Twin 1 Trait
E
E
C
Twin 2 Trait
E
Twin 1 Trait
E
Twin 2 Trait
# Fit E model
# ----------------------------------------------------------------------univEModel <- mxModel(univAEFit, name="univE",
mxModel(univAEFit$ACE,
mxMatrix( type="Lower", nrow=1, ncol=1, free=FALSE, values=0, label="a11",
name="a" ) )
)
The E parameter can never not be dropped
univEFit <- mxRun(univEModel)
univESumm <- summary(univEFit)
because it includes measurement error
univESumm
OpenMx Output
univACESumm
free parameters:
name matrix row col Estimate
1 a11 ACE.a 1 1 8.546504
2 c11 ACE.c 1 1 9.488454
3 e11 ACE.e 1 1 8.352197
4 mean ACE.Mean 1 1 94.147803
observed statistics: 2198
estimated parameters: 4
degrees of freedom: 2194
-2 log likelihood: 17639.91
saturated -2 log likelihood: NA
number of observations: 1110
chi-square: NA
p: NA
AIC (Mx): 13251.91
BIC (Mx): 1127.663
adjusted BIC:
RMSEA: NA
tableFitStatistics
models compared to saturated model
Name
ep
-2LL
df
M1 : univTwinSat
M2 : univACE
M3 : univAE
M4 : univCE
M5 : univE
10
4
3
3
2
17637.98
17639.91
17670.38
17665.27
18213.28
2188
2194
2195
2195
2196
AIC
13261.98
13251.91
13280.38
13275.27
13821.28
diffLL diffdf
1.93
32.4
27.3
575.3
6
7
7
8
p
0.93
0
0
0
Smaller -2LL means better fit. -2LL of sub-model is
always higher (worse fit). The question is: is it
significantly worse.
Chi-sq test: dif in -2LL is chi-square distributed.
Evaluate sig of chi-sq test.
A non-sig p-value means that the model is consistent
with the data.
Nested.fit
models compared to ACE model
Name
ep
-2LL
df
diffLL diffdf
p
univACE
univAE
univCE
univE
4
3
3
2
17639.91
17670.38
17665.27
18213.28
2194
2195
2195
2196
NA
NA
32.47
1
25.36
1
573.37 2
NA
0
0
0
Smaller -2LL means better fit. -2LL of sub-model is
always higher (worse fit). The question is: is it
significantly worse.
Chi-sq test: dif in -2LL is chi-square distributed. Evaluate
sig of chi-sq test. Critical Chi-sq value for 1 DF = 3.84
A non-sig p-value means that the dropped parameter(s)
are non-significant.
Estimates ACE model
> univACEFit$ACE.h2
[,1]
[1,] 0.3137131
> univACEFit$ACE.c2
[,1]
[1,] 0.3866762
> univACEFit$ACE.e2
[,1]
[1,] 0.2996108
Download