Mixed Models: Optimizing, Iterating and beyond PCA
Peter Fox
Data Analytics – ITWS-4963/ITWS-6965
Week 13a, April 28, 2015

What is a mixed model?
• Often known as latent class (mixed models) or linear, or non-linear mixed models
• Basic type – mix of two models
  – Random component to model, or is unobserved
  – Systematic component = observed…
• E.g. linear model: y = y0 + br x + bs y
  – y0 – intercept
  – br – for random coefficient
  – bs – for systematic coefficient
• Or y = y0 + fr(x,u,v,w) + fs(y,z,a,b)
  – Or …

Example
• Gender – systematic
• Movie preference – random?
• In semester – systematic
• Students on campus – random?
• Summer – systematic
• People at the beach – random?

Remember latent variables?
• In factor analysis – goal was to use observed variables (as components) in “factors”
• Some variables were not used – why?
  – Low cross-correlations?
  – Small contribution to explaining the variance?
• Mixed models aim to include them!!
  – Thoughts?

Latent class (LC)
• LC models do not rely on the traditional modeling assumptions which are often violated in practice (linear relationship, normal distribution, homogeneity)
  – less subject to biases associated with data not conforming to model assumptions.
• In addition, LC models include variables of mixed scale types (nominal, ordinal, continuous and/or count variables) in the same analysis.
• For improved cluster or segment description the relationship between the latent classes and external variables (covariates) can be assessed simultaneously with the identification of the clusters.
  – eliminates the need for the usual second stage of analysis where a discriminant analysis is performed to relate the cluster results to demographic and other variables.

Kinds of Latent Class Models
• Three common statistical application areas of LC analysis are those that involve
  – 1) clustering of cases,
  – 2) variable reduction and scale construction, and
  – 3) prediction.

Thus!
• To construct and then run a mixed model, YOU must make many choices including:
  – the nature of the hierarchy,
  – the fixed effects and,
  – the random effects.

Beyond mixture = 2?
• Hierarchy, fixed, random = 3?
• More?
• Changes over time – a fourth dimension?

Comparing lm, glm, lme4, lcmm
lmm.data <- read.table("http://www.unt.edu/rss/class/Jon/R_SC/Module9/lmm.data.txt", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
summary(lmm.data)
       id             extro            open           agree           social       class   school
 Min.   :   1.0   Min.   :30.20   Min.   :22.30   Min.   :18.48   Min.   : 46.31   a:300   I  :200
 1st Qu.: 300.8   1st Qu.:54.17   1st Qu.:36.20   1st Qu.:31.90   1st Qu.: 89.32   b:300   II :200
 Median : 600.5   Median :60.15   Median :39.98   Median :35.05   Median : 99.20   c:300   III:200
 Mean   : 600.5   Mean   :60.27   Mean   :40.06   Mean   :35.07   Mean   : 99.53   d:300   IV :200
 3rd Qu.: 900.2   3rd Qu.:66.50   3rd Qu.:43.93   3rd Qu.:38.42   3rd Qu.:109.83           V  :200
 Max.   :1200.0   Max.   :90.83   Max.   :57.87   Max.   :58.44   Max.   :151.96           VI :200

> head(lmm.data)
  id    extro     open    agree    social class school
1  1 63.69356 43.43306 38.02668  75.05811     d     IV
2  2 69.48244 46.86979 31.48957  98.12560     a     VI
3  3 79.74006 32.27013 40.20866 116.33897     d     VI
4  4 62.96674 44.40790 30.50866  90.46888     c     IV
5  5 64.24582 36.86337 37.43949  98.51873     d     IV
6  6 50.97107 46.25627 38.83196  75.21992     d      I
> nrow(lmm.data)
[1] 1200

lm.1 <- lm(extro ~ open + social, data = lmm.data)
summary(lm.1)
Call:
lm(formula = extro ~ open + social, data = lmm.data)
Residuals:
     Min       1Q   Median       3Q      Max
-30.2870  -6.0657  -0.1616   6.2159  30.2947
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 58.754056   2.554694  22.998   <2e-16 ***
open         0.025095   0.046451   0.540    0.589
social       0.005104   0.017297   0.295    0.768
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.339 on 1197 degrees of freedom
Multiple R-squared: 0.0003154, Adjusted R-squared: -0.001355
F-statistic: 0.1888 on 2 and 1197 DF, p-value: 0.828

And then
lm.2 <- lm(extro ~ open + agree + social, data = lmm.data)
summary(lm.2)
Call:
lm(formula = extro ~ open + agree + social, data = lmm.data)
Residuals:
     Min       1Q   Median       3Q      Max
-30.3151  -6.0743  -0.1586   6.2851  30.0167
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 57.839518   3.148056  18.373   <2e-16 ***
open         0.024749   0.046471   0.533    0.594
agree        0.026538   0.053347   0.497    0.619
social       0.005082   0.017303   0.294    0.769
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.342 on 1196 degrees of freedom
Multiple R-squared: 0.0005222, Adjusted R-squared: -0.001985
F-statistic: 0.2083 on 3 and 1196 DF, p-value: 0.8907

anova(lm.1, lm.2)
Analysis of Variance Table
Model 1: extro ~ open + social
Model 2: extro ~ open + agree + social
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1   1197 104400
2   1196 104378  1    21.598 0.2475  0.619

Nesting, etc
lm.3 <- lm(extro ~ open + social + class + school, data = lmm.data)
summary(lm.3)
Call:
lm(formula = extro ~ open + social + class + school, data = lmm.data)
Residuals:
     Min       1Q   Median       3Q      Max
-13.1368  -0.9154   0.0176   0.8631  13.6773
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.069523   0.476596  90.369   <2e-16 ***
open         0.010793   0.008346   1.293    0.196
social      -0.001773   0.003106  -0.571    0.568
classb       2.038816   0.136575  14.928   <2e-16 ***
classc       3.696904   0.136266  27.130   <2e-16 ***
classd       5.654166   0.136286  41.488   <2e-16 ***
schoolII     7.921787   0.167294  47.353   <2e-16 ***
schoolIII   12.119003   0.166925  72.602   <2e-16 ***
schoolIV    16.052566   0.167100  96.066   <2e-16 ***
schoolV     20.410702   0.166936 122.266   <2e-16 ***
schoolVI    28.063091   0.167009 168.033   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.669 on 1189 degrees of freedom
Multiple R-squared: 0.9683, Adjusted R-squared: 0.968
F-statistic: 3631 on 10 and 1189 DF, p-value: < 2.2e-16

lm.4 <- lm(extro ~ open + agree + social + class + school, data = lmm.data)
summary(lm.4)
Call:
lm(formula = extro ~ open + agree + social + class + school, data = lmm.data)
Residuals:
     Min       1Q   Median       3Q      Max
-13.1270  -0.9090   0.0155   0.8734  13.7295
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.254814   0.577059  74.957   <2e-16 ***
open         0.010833   0.008349   1.298    0.195
agree       -0.005474   0.009605  -0.570    0.569
social      -0.001762   0.003107  -0.567    0.571
classb       2.044195   0.136939  14.928   <2e-16 ***
classc       3.701818   0.136577  27.104   <2e-16 ***
classd       5.660806   0.136822  41.374   <2e-16 ***
schoolII     7.924110   0.167391  47.339   <2e-16 ***
schoolIII   12.117899   0.166983  72.569   <2e-16 ***
schoolIV    16.050765   0.167177  96.011   <2e-16 ***
schoolV     20.406924   0.167115 122.113   <2e-16 ***
schoolVI    28.065860   0.167127 167.931   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.669 on 1188 degrees of freedom
Multiple R-squared: 0.9683, Adjusted R-squared: 0.968
F-statistic: 3299 on 11 and 1188 DF, p-value: < 2.2e-16

Analyze the variances
anova(lm.3, lm.4)
Analysis of Variance Table
Model 1: extro ~ open + social + class + school
Model 2: extro ~ open + agree + social + class + school
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1   1189 3311.4
2   1188 3310.5  1   0.90492 0.3247 0.5689

Specific interaction term
# 'class:school' as a fixed interaction term is a different situation than one
# with random effects (e.g., nested variables).
lm.5 <- lm(extro ~ open + social + class:school, data = lmm.data)
summary(lm.5)

Summary
Call:
lm(formula = extro ~ open + social + class:school, data = lmm.data)
Residuals:
    Min      1Q  Median      3Q     Max
-9.8354 -0.3287  0.0141  0.3329 10.3912
Coefficients: (1 not defined because of singularities)
                   Estimate Std. Error  t value Pr(>|t|)
(Intercept)       8.008e+01  3.073e-01  260.581   <2e-16 ***
open              6.019e-03  4.965e-03    1.212    0.226
social            5.239e-04  1.853e-03    0.283    0.777
classa:schoolI   -4.038e+01  1.970e-01 -204.976   <2e-16 ***
classb:schoolI   -3.460e+01  1.971e-01 -175.497   <2e-16 ***
classc:schoolI   -3.186e+01  1.970e-01 -161.755   <2e-16 ***
classd:schoolI   -2.998e+01  1.972e-01 -152.063   <2e-16 ***
classa:schoolII  -2.814e+01  1.974e-01 -142.558   <2e-16 ***
classb:schoolII  -2.675e+01  1.971e-01 -135.706   <2e-16 ***
classc:schoolII  -2.563e+01  1.970e-01 -130.139   <2e-16 ***
classd:schoolII  -2.456e+01  1.969e-01 -124.761   <2e-16 ***
classa:schoolIII -2.356e+01  1.970e-01 -119.605   <2e-16 ***
classb:schoolIII -2.259e+01  1.970e-01 -114.628   <2e-16 ***
classc:schoolIII -2.156e+01  1.970e-01 -109.482   <2e-16 ***
classd:schoolIII -2.064e+01  1.971e-01 -104.697   <2e-16 ***
classa:schoolIV  -1.974e+01  1.972e-01 -100.085   <2e-16 ***
classb:schoolIV  -1.870e+01  1.970e-01  -94.946   <2e-16 ***
classc:schoolIV  -1.757e+01  1.970e-01  -89.165   <2e-16 ***
classd:schoolIV  -1.660e+01  1.969e-01  -84.286   <2e-16 ***
classa:schoolV   -1.548e+01  1.970e-01  -78.609   <2e-16 ***
classb:schoolV   -1.430e+01  1.970e-01  -72.586   <2e-16 ***
classc:schoolV   -1.336e+01  1.974e-01  -67.687   <2e-16 ***
classd:schoolV   -1.202e+01  1.970e-01  -61.051   <2e-16 ***
classa:schoolVI  -1.045e+01  1.970e-01  -53.038   <2e-16 ***
classb:schoolVI  -8.532e+00  1.971e-01  -43.298   <2e-16 ***
classc:schoolVI  -5.575e+00  1.969e-01  -28.310   <2e-16 ***
classd:schoolVI          NA         NA       NA       NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9844 on 1174 degrees of freedom
Multiple R-squared: 0.9891, Adjusted R-squared: 0.9889
F-statistic: 4264 on 25 and 1174 DF, p-value: < 2.2e-16
# The output of both interaction models (lm.5 here and lm.6 next) shows 'NA' where an interaction
# term is redundant with one listed somewhere above it (there are 4 classes and 6 schools).

lm.6 <- lm(extro ~ open + agree + social + class:school, data = lmm.data)
summary(lm.6)
# some output omitted…
Coefficients: (1 not defined because of singularities)
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      8.036e+01  3.680e-01 218.376   <2e-16 ***
open             6.097e-03  4.964e-03   1.228    0.220
agree           -7.751e-03  5.699e-03  -1.360    0.174
social           5.468e-04  1.852e-03   0.295    0.768
…
classd:schoolVI         NA         NA      NA       NA
Residual standard error: 0.9841 on 1173 degrees of freedom
Multiple R-squared: 0.9891, Adjusted R-squared: 0.9889
F-statistic: 4103 on 26 and 1173 DF, p-value: < 2.2e-16

Compare interaction terms
anova(lm.5, lm.6)
Analysis of Variance Table
Model 1: extro ~ open + social + class:school
Model 2: extro ~ open + agree + social + class:school
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1   1174 1137.7
2   1173 1135.9  1    1.7916 1.8502  0.174

Structure in glm
• Even the more flexible Generalized Linear Model (glm) function cannot handle nested effects, although it can handle some types of random effects (e.g., repeated measures designs/data, which are not covered here).
• The primary benefit of the 'glm' function is the ability to specify non-normal distributions
• Output from the 'glm' function offers the Akaike Information Criterion (AIC), which can be used to compare models and is much preferred over R-square or even adjusted R-square
  – lower AIC indicates a better fitting model; an AIC of 14.25 indicates a better fitting model than one with an AIC of 22.45

glm?
'glm' function offers the Akaike Information Criterion (AIC) – so…
glm.1 <- glm(extro ~ open + social + class + school, data = lmm.data)
summary(glm.1)
Call:
glm(formula = extro ~ open + social + class + school, data = lmm.data)
Deviance Residuals:
     Min       1Q   Median       3Q      Max
-13.1368  -0.9154   0.0176   0.8631  13.6773
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.069523   0.476596  90.369   <2e-16 ***
open         0.010793   0.008346   1.293    0.196
social      -0.001773   0.003106  -0.571    0.568
classb       2.038816   0.136575  14.928   <2e-16 ***
classc       3.696904   0.136266  27.130   <2e-16 ***
classd       5.654166   0.136286  41.488   <2e-16 ***
schoolII     7.921787   0.167294  47.353   <2e-16 ***
schoolIII   12.119003   0.166925  72.602   <2e-16 ***
schoolIV    16.052566   0.167100  96.066   <2e-16 ***
schoolV     20.410702   0.166936 122.266   <2e-16 ***
schoolVI    28.063091   0.167009 168.033   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 2.785041)
Null deviance: 104432.7 on 1199 degrees of freedom
Residual deviance: 3311.4 on 1189 degrees of freedom
AIC: 4647.5
Number of Fisher Scoring iterations: 2

Glm2, 3
> glm.2 <- glm(extro ~ open + social + class:school, data = lmm.data)
> glm.3 <- glm(extro ~ open + agree + social + class:school, data = lmm.data)

Compare…
• Glm1 – AIC: 4647.5
• Glm2 – AIC: 3395.5
• Glm3 – AIC: 3395.6
• Conclusion?

However…
In order to adequately test these nested (random) effects, we must turn to another type of modeling function/package.
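Before leaving glm, the AIC comparison above can be tried out on any data. What follows is a minimal sketch on simulated data, not the lmm.data analysis itself: the variables g, x, y and the model names fit.small and fit.big are made up for illustration.

```r
# Hypothetical simulated data: a 4-level grouping factor shifts the outcome,
# while the continuous predictor x has no effect.
set.seed(1)
n <- 200
g <- factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
x <- rnorm(n)
y <- 50 + 2 * as.integer(g) + rnorm(n)

# Two Gaussian glm fits: without and with the grouping factor.
fit.small <- glm(y ~ x)
fit.big   <- glm(y ~ x + g)

# AIC() and BIC() accept several fitted models and return one row per model.
# Lower values indicate the better-fitting model after the penalty for
# additional parameters.
aic <- AIC(fit.small, fit.big)
bic <- BIC(fit.small, fit.big)
print(aic)
print(bic)
```

The model that includes the factor that actually drives the outcome should come out with the clearly lower AIC and BIC, which is the same pattern as Glm1 versus Glm2 above.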
> library(lme4)
• The Linear Mixed Effects (lme4) package is designed to fit a linear mixed model or a generalized linear mixed model or a nonlinear mixed model.
• Example – following lm and glm
• Fit linear mixed effect models with fixed effects for open & social or open, agree, & social, as well as random/nested effects for class within school; to predict scores on the outcome variable, extroversion (extro)

BIC v. AIC
• Note in the output we can use the Bayesian Information Criterion (BIC) to compare models; it is similar to the AIC mentioned previously but more conservative (it penalizes extra parameters more heavily), and is thus preferred here.
• Like AIC, lower BIC reflects better model fit.
• The 'lmer' function uses REstricted Maximum Likelihood (REML) to estimate the variance components, which is preferred over standard Maximum Likelihood; the latter is also available as an option (REML = FALSE).

Random effects 1
Note below that class is nested within school: class is 'under' school. Random effects are specified inside parentheses and can be repeated measures, interaction terms, or nested (as is the case here). Simple interactions simply use the colon separator: (1|school:class)
lmm.1 <- lmer(extro ~ open + social + class + (1|school/class), data = lmm.data)
summary(lmm.1)

Summary(lmm.1)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ open + social + class + (1 | school/class)
Data: lmm.data
REML criterion at convergence: 3521.5
Scaled residuals:
     Min       1Q   Median       3Q      Max
-10.0144  -0.3373   0.0164   0.3378  10.5788
Random effects:
 Groups       Name        Variance Std.Dev.
 class:school (Intercept)  2.8822  1.6977
 school       (Intercept) 95.1725  9.7556
 Residual                  0.9691  0.9844
Number of obs: 1200, groups: class:school, 24; school, 6
Fixed effects:
             Estimate Std. Error t value
(Intercept) 5.712e+01  4.052e+00  14.098
open        6.053e-03  4.965e-03   1.219
social      5.085e-04  1.853e-03   0.274
classb      2.047e+00  9.835e-01   2.082
classc      3.698e+00  9.835e-01   3.760
classd      5.656e+00  9.835e-01   5.751
Correlation of Fixed Effects:
       (Intr) open   social classb classc
open   -0.049
social -0.046 -0.006
classb -0.121 -0.002  0.005
classc -0.121 -0.001  0.000  0.500
classd -0.121  0.000  0.002  0.500  0.500

Random effects 2
lmm.2 <- lmer(extro ~ open + agree + social + class + (1|school/class), data = lmm.data)
summary(lmm.2)

Summary(lmm.2)
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ open + agree + social + class + (1 | school/class)
Data: lmm.data
REML criterion at convergence: 3528.1
Scaled residuals:
     Min       1Q   Median       3Q      Max
-10.0024  -0.3360   0.0056   0.3403  10.6559
Random effects:
 Groups       Name        Variance Std.Dev.
 class:school (Intercept)  2.8836  1.6981
 school       (Intercept) 95.1716  9.7556
 Residual                  0.9684  0.9841
Number of obs: 1200, groups: class:school, 24; school, 6
Fixed effects:
              Estimate Std. Error t value
(Intercept) 57.3838787  4.0565827  14.146
open         0.0061302  0.0049634   1.235
agree       -0.0077361  0.0056985  -1.358
social       0.0005313  0.0018523   0.287
classb       2.0547978  0.9837264   2.089
classc       3.7049300  0.9837084   3.766
classd       5.6657332  0.9837204   5.759
Correlation of Fixed Effects:
       (Intr) open   agree  social classb classc
open   -0.048
agree  -0.047 -0.012
social -0.045 -0.006 -0.009
classb -0.121 -0.002 -0.006  0.005
classc -0.121 -0.001 -0.005  0.001  0.500
classd -0.121  0.000 -0.007  0.002  0.500  0.500

Extract
# To extract the estimates of the fixed effects parameters.
fixef(lmm.2)
  (Intercept)          open         agree        social        classb        classc        classd
57.3838786775  0.0061301545 -0.0077360954  0.0005312869  2.0547977907  3.7049300285  5.6657331867

# To extract the estimates of the random effects parameters.
ranef(lmm.2)
$`class:school`
      (Intercept)
a:I    -3.4072737
a:II    0.9313953
a:III   1.3514697
a:IV    1.2673650
a:V     1.2019019
a:VI   -1.3448582
b:I     0.3041239
b:II    0.2723129
b:III   0.2902246
b:IV    0.2664160
b:V     0.3434127
b:VI   -1.4764901
c:I     1.3893592
c:II   -0.2505584
c:III  -0.3458313
c:IV   -0.2497709
c:V    -0.3678469
c:VI   -0.1753517
d:I     1.2899307
d:II   -1.1384176
d:III  -1.3554560
d:IV   -1.2252297
d:V    -0.9877007
d:VI    3.4168733
$school
    (Intercept)
I    -13.989270
II    -6.114665
III   -1.966833
IV     1.940013
V      6.263157
VI    13.867597

Random effects 2
# To extract the coefficients for each group of the random effect factors (class:school = 24 groups, school = 6 groups)
coef(lmm.2)
$`class:school`
      (Intercept)        open        agree       social   classb  classc   classd
a:I      53.97660 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:II     58.31527 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:III    58.73535 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:IV     58.65124 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:V      58.58578 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:VI     56.03902 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:I      57.68800 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:II     57.65619 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:III    57.67410 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:IV     57.65029 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:V      57.72729 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:VI     55.90739 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:I      58.77324 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:II     57.13332 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:III    57.03805 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:IV     57.13411 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:V      57.01603 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:VI     57.20853 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:I      58.67381 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:II     56.24546 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:III    56.02842 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:IV     56.15865 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:V      56.39618 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
d:VI     60.80075 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
$school
    (Intercept)        open        agree       social   classb  classc   classd
I      43.39461 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
II     51.26921 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
III    55.41705 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
IV     59.32389 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
V      63.64704 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
VI     71.25148 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
attr(,"class")
[1] "coef.mer"

coef(lmm.2)$`class:school`
# ….
      (Intercept)        open        agree       social   classb  classc   classd
a:I      53.97660 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:II     58.31527 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:III    58.73535 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:IV     58.65124 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:V      58.58578 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
a:VI     56.03902 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:I      57.68800 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:II     57.65619 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:III    57.67410 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:IV     57.65029 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:V      57.72729 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
b:VI     55.90739 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:I      58.77324 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:II     57.13332 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:III    57.03805 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:IV     57.13411 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733
c:V      57.01603 0.006130155 -0.007736095 0.0005312869 2.054798 3.70493 5.665733

prediction
# To extract the predicted values (based on the fitted model).
yhat <- fitted(lmm.2)
summary(yhat)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  39.91   54.43   60.16   60.27   66.35   80.49

# To extract the residuals (errors); and summarize, as well as plot them.
residuals <- resid(lmm.2)
summary(residuals)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-9.843000 -0.330600  0.005528  0.000000  0.334800 10.490000

Plot residuals
hist(residuals)
[Figure: Histogram of residuals; x-axis: residuals (about −10 to 10), y-axis: Frequency]

Intra Class Correlation
# First, run the 'null' model (which includes just the intercepts and the random effect for the highest level of the nesting variables; in this example 'school').
lmm.null <- lmer(extro ~ 1 + (1|school), data = lmm.data)
summary(lmm.null)

summary
Linear mixed model fit by REML ['lmerMod']
Formula: extro ~ 1 + (1 | school)
Data: lmm.data
REML criterion at convergence: 5806.1
Scaled residuals:
    Min      1Q  Median      3Q     Max
-5.9773 -0.5315  0.0059  0.5298  6.2109
Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept) 95.87    9.791
 Residual              7.14    2.672
Number of obs: 1200, groups: school, 6
Fixed effects:
            Estimate Std. Error t value
(Intercept)   60.267      3.998   15.07

Intra Class Correlation (ICC)
# Notice the variance component estimates for the random effect. If we add these together, then divide the 'school' variance estimate by that total, we get the ICC.
95.8720 + 7.1399      # total variance = 103.0119
95.8720 / 103.0119    # ICC = 0.9306886
# This indicates that 93.06886% of the variance in 'extro' can be "explained" by school group membership (verified below using Bliese's multilevel package).
# ICC1 and ICC2 as described by Bliese.
library(multilevel)
aov.1 <- aov(extro ~ school, lmm.data)
summary(aov.1)
              Df Sum Sq Mean Sq F value Pr(>F)
school         5  95908   19182    2687 <2e-16 ***
Residuals   1194   8525       7
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

ICC1/ ICC2 (Bliese)
# Below (ICC1) indicates that 93.07% of the variance in 'extro' can be "explained" by school group membership.
ICC1(aov.1)
[1] 0.930689
# The ICC2 value (below) of .9996 indicates that school groups can be very reliably differentiated in terms of 'extro' scores.
> ICC2(aov.1)
[1] 0.9996278

Simulating the Posterior Distribution of Predicted Values.
# Load the 'arm' package and use the 'sim' function. Note: n = 100 is the default for 'sim'.
library(arm)
sim.100 <- sim(lmm.2, n = 100)
# Show the structure of objects in the 'sim' object.
str(sim.100)
<not displayed>

# Fixed effect parameter estimates resulting from the 'sim' function applied to the fitted model (lmm.2).
fe.sim <- fixef(sim.100)
fe.sim
      (Intercept)         open         agree        social     classb    classc   classd
 [1,]    55.24643 0.0113879890 -7.370662e-03  4.115703e-03 1.99092257 2.9418821 3.162604
 [2,]    56.69630 0.0051451242 -1.373704e-02 -1.799054e-03 1.73041539 3.8671053 6.160748
 [3,]    63.18570 0.0003935109  2.607783e-03  1.435752e-03 1.80586410 3.2203590 5.802364
 [4,]    56.00007 0.0042571840 -6.076147e-03 -5.324692e-03 2.71728164 5.6066533 6.852651
 [5,]    59.94718 0.0026340937 -2.584516e-03  3.295548e-07 1.45650055 3.3174045 5.871667
 [6,]    65.26589 0.0100470520 -1.324052e-02 -3.480780e-04 1.79030239 3.3253023 4.050358
 [7,]    56.80116 0.0082074105 -8.175804e-03  1.182413e-03 2.35693946 3.0119753 5.937348
 [8,]    61.32350 0.0047934705 -1.484498e-02 -2.710392e-03 2.11558934 4.2048688 6.552194
 [9,]    53.87001 0.0054213155 -7.160089e-03  8.668833e-04 1.86080451 2.8613245 4.761669
[10,]    57.47641 0.0055136083 -6.293459e-03 -5.253847e-05 3.17600677 6.4525022 6.438270

# Random effect parameter estimates resulting from the 'sim' function applied to the fitted model (lmm.2).
re.sim <- ranef(sim.100)
re.sim[[1]] # For "class:school" random effect.
re.sim[[2]] # For "school" random effect.

re.sim[[1]] # For "class:school" random effect.
, , (Intercept)
            a:I          a:II       a:III        a:IV        a:V        a:VI         b:I         b:II
[1,] -1.8138575   1.009722294 0.502308352 0.574242632 1.62249792  0.34486828  0.41734749 -0.516721008
[2,] -4.5023927   0.325461572 1.105711427 0.555938715 1.49927806 -1.05082790  0.72720272  1.065476210
[3,] -2.9011592   1.699112086 1.924096930 1.588047483 0.08551292 -1.71333314  0.47475579  0.095562455
[4,] -4.7454517  -1.024665550 0.449287566 1.066899463 1.56470696 -1.34450134 -0.47980863  0.964331898
[5,] -4.6413961   0.092845610 0.878011579 0.328065852 0.94227622 -2.48685750  0.13250051  0.336973705
Much more!

re.sim[[2]] # For "school" random effect.
, , (Intercept)
              I          II        III         IV            V         VI
[1,] -10.889610  11.9319979  6.4468727 7.52046579  9.407021912 14.8484638
[2,] -11.811196 -10.1548630 -2.3812528 4.24907315  6.038850618 15.1022442
[3,] -17.642004  -6.5881409  2.6734584 5.09687885  7.313420709  7.6798984
[4,] -12.201235  -6.5415744 -6.2550322 4.62112286 13.050521302 14.7147714
[5,] -16.604904 -10.9215257 -3.2698478 2.47299902  2.276550540 11.8441601

Get predicted values
# To get predicted values from the posterior distribution, use the 'fitted' function.
yhat.lmm.2 <- fitted(sim.100, lmm.2)
head(yhat.lmm.2)
< see output >
tail(yhat.lmm.2)
< see output >

# The above object (yhat.lmm.2) is a matrix of 1200 participants (rows) by 100 simulations (columns).
# In this matrix, each row represents a participant and each column represents a simulated predicted value for the outcome variable of our lmm.2 model.
# Therefore, the yhat.lmm.2 object can be used to create credible intervals for each participant (i.e. individual level).
# Note: as written, the call below takes quantiles over the whole matrix (all participants and all simulations);
# for the first participant only, pass row 1, i.e. yhat.lmm.2[1, ].
# Note: probs = .985 gives the 98.5% quantile; a symmetric 95% interval would use .975.
> quantile(yhat.lmm.2, probs = c(.025, .985))
    2.5%    98.5%
39.93096 81.29584

# We can also create a data frame with the quantiles for every participant.
quant.mat <- data.frame(matrix(rep(NA, 1200*2), ncol = 2))
names(quant.mat) <- c("2.5%", "98.5%")
quant.mat[,1] <- apply(yhat.lmm.2, 1, quantile, probs = .025)
quant.mat[,2] <- apply(yhat.lmm.2, 1, quantile, probs = .985)
head(quant.mat, 25)

Head of data frame
      2.5%    98.5%
1 47.99122 80.07736
2 66.11761 72.79333
3 76.65614 83.60897
4 46.50965 79.56451
5 48.01904 80.07742
6 47.20663 54.45487
7 49.31807 75.21708
8 48.06083 80.11512

In R - lcmm
• Estimation of latent class mixed-effect models for different types of outcomes (continuous Gaussian, continuous non-Gaussian or ordinal)
• This function fits mixed models and latent class mixed models for different types of outcomes.
  – continuous longitudinal outcomes (Gaussian or non-Gaussian) as well as bounded quantitative, discrete and ordinal longitudinal outcomes.

What does it do?
• The different types of outcomes are taken into account using parameterized nonlinear link functions between the observed outcome and the underlying latent process of interest.
• At the latent process level, the model estimates a standard linear mixed model or a latent class mixed model when heterogeneity in the population is investigated (in the same way as in function hlme -> next) but it should be noted that the program also works when no random-effect is included.
• Parameters of the nonlinear link function and of the latent process mixed model are estimated simultaneously using a maximum likelihood method.
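The two levels described above (a link function on the outcome, and a mixed model on the latent process) are commonly written as a pair of equations. The notation below is a sketch in generic symbols chosen for this note (H, η, Λ, β, b_i, B and σ do not appear on the slides), so read it as an assumption about the usual formulation for continuous links, not as the package's exact parameterization:

```latex
\begin{aligned}
H(Y_{ij};\,\eta) &= \Lambda_i(t_{ij}) + \epsilon_{ij},
  & \epsilon_{ij} &\sim \mathcal{N}(0,\sigma^2),\\
\Lambda_i(t_{ij}) &= X_{ij}^{\top}\beta + Z_{ij}^{\top} b_i,
  & b_i &\sim \mathcal{N}(0, B).
\end{aligned}
```

Here Y_ij is the observed outcome for subject i at occasion j, H is a monotone link function with parameters η, and Λ_i is the latent process. When H is linear, the pair collapses to a standard linear mixed model on the original outcome scale.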
lcmm(fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE, link = "linear", intnodes = NULL, epsY = 0.5, data, B, convB = 1e-04, convL = 1e-04, convG = 1e-04, maxiter=100, nsim=100, prior, range=NULL, na.action=1)

Turning to lcmm
# Beta link function
m11<-lcmm(Ydep2~Time+I(Time^2),random=~Time,subject='ID',ng=1,
  data=data_Jointlcmm,link="beta")
summary(m11)
plot.linkfunction(m11,bty="l")
# I-splines with 3 equidistant nodes
m12<-lcmm(Ydep2~Time+I(Time^2),random=~Time,subject='ID',ng=1,
  data=data_Jointlcmm,link="3-equi-splines")
summary(m12)
# I-splines with 5 nodes at quantiles
m13<-lcmm(Ydep2~Time+I(Time^2),random=~Time,subject='ID',ng=1,
  data=data_Jointlcmm,link="5-quant-splines")
summary(m13)
# I-splines with 5 nodes, and interior nodes entered manually
m14<-lcmm(Ydep2~Time+I(Time^2),random=~Time,subject='ID',ng=1,
  data=data_Jointlcmm,link="5-manual-splines",intnodes=c(10,20,25))
summary(m14)
plot.linkfunction(m14,bty="l")

# Thresholds
# Especially for the threshold link function, we recommend to estimate models
# with increasing complexity and use estimates of previous ones to specify
# plausible initial values (we remind that estimation of models with threshold
# link function involves a computationally demanding numerical integration
# -here of size 3)
m15<-lcmm(Ydep2~Time+I(Time^2),random=~Time,subject='ID',ng=1,
  data=data_Jointlcmm,link="thresholds",maxiter=100,
  B=c(-0.8379, -0.1103, 0.3832, 0.3788, 0.4524, -7.3180, 0.5917, 0.7364, 0.6530, 0.4038,
      0.4290, 0.6099, 0.6014, 0.5354, 0.5029, 0.5463, 0.5310, 0.5352, 0.6498, 0.6653,
      0.5851, 0.6525, 0.6701, 0.6670, 0.6767, 0.7394, 0.7426, 0.7153, 0.7702, 0.6421))
summary(m15)
plot.linkfunction(m15,bty="l")

#### Plot of estimated different link functions:
#### (applicable for models that only differ in the "link function" used.
#### Otherwise, the latent process scale is different and a rescaling is necessary)
#### Note: m10 (the "linear" column below) is not defined on these slides; presumably it is the
#### same model as m11 fitted with link="linear".
transfo <- data.frame(marker=m10$estimlink[,1],linear=m10$estimlink[,2],
  beta=m11$estimlink[,2],spl_3e=m12$estimlink[,2],spl_5q=m13$estimlink[,2],
  spl_5m=m14$estimlink[,2])
dev.new()
plot(transfo[,1]~transfo[,2],xlim=c(-10,5),col=1,type='l',xlab="latent process",
  ylab="marker",bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,3],xlim=c(-10,5),col=2,type='l',xlab="",ylab="",bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,4],xlim=c(-10,5),col=3,type='l',xlab="",ylab="",bty="l")
par(new=TRUE)
plot(transfo[,1]~transfo[,5],xlim=c(-10,5),col=4,type='l',xlab="",ylab="",bty="l")
par(new=TRUE)
plot(m15$estimlink[,1]~m15$estimlink[,2],xlim=c(-10,5),col=5,type='l',
  xlab="",ylab="",bty="l")
legend(x="bottomright",legend=c(colnames(transfo[,2:5]),"thresholds"),
  col=1:5,lty=1,inset=.02,bty="n")

#### Estimation of 2-latent class mixed models with different assumed link
#### functions with individual and class specific linear trend
#### for illustration, only default initial values were used but other
#### sets of initial values should also be tried to ensure convergence
#### towards the global maximum
# Linear link function
m20<-lcmm(Ydep2~Time,random=~Time,subject='ID',mixture=~Time,ng=2,idiag=TRUE,
  data=data_Jointlcmm,link="linear")
summary(m20)
postprob(m20)
# Beta link function
m21<-lcmm(Ydep2~Time,random=~Time,subject='ID',mixture=~Time,ng=2,idiag=TRUE,
  data=data_Jointlcmm,link="beta")
summary(m21)
postprob(m21)
# I-splines link function (and 5 nodes at quantiles)
m22<-lcmm(Ydep2~Time,random=~Time,subject='ID',mixture=~Time,ng=2,idiag=TRUE,
  data=data_Jointlcmm,link="5-quant-splines")
summary(m22)
postprob(m22)

data <- data_Jointlcmm[data_Jointlcmm$ID==193,]
plot.predict(m22,var.time="Time",newdata=data,bty="l")

Turning to multlcmm
library(lcmm)
data(data_Jointlcmm)
# linear link function
# Latent process mixed model for two curvilinear outcomes.
# Link functions are approximated by I-splines: the first one has 4 nodes
# (i.e. 2 internal nodes 8,12), the second one has 3 nodes (i.e. 1 internal node 25)
m1 <- multlcmm(Ydep1+Ydep2~1+Time*X2+contrast(X2),random=~1+Time,
  subject="ID",randomY=TRUE,link=c("4-manual-splines","3-manual-splines"),
  intnodes=c(8,12,25),data=data_Jointlcmm)
Be patient, multlcmm is running ...
The program took 56.14 seconds

Quicker lcmm
# to reduce the computation time, the same model is estimated using
# a vector of initial values
m1 <- multlcmm(Ydep1+Ydep2~1+Time*X2+contrast(X2),random=~1+Time,
  subject="ID",randomY=TRUE,link=c("4-manual-splines","3-manual-splines"),
  intnodes=c(8,12,25),data=data_Jointlcmm,
  B=c(-1.071, -0.192, 0.106, -0.005, -0.193, 1.012, 0.870, 0.881, 0.000,
      0.000, -7.520, 1.401, 1.607, 1.908, 1.431, 1.082, -7.528, 1.135,
      1.454, 2.328, 1.052))
Be patient, multlcmm is running ...
The program took 7.78 seconds

summary(m1)
General latent class mixed model
  fitted by maximum likelihood method

multlcmm(fixed = Ydep1 + Ydep2 ~ 1 + Time * X2 + contrast(X2),
  random = ~1 + Time, subject = "ID", randomY = TRUE,
  link = c("4-manual-splines", "3-manual-splines"),
  intnodes = c(8, 12, 25), data = data_Jointlcmm)

Statistical Model:
  Dataset: data_Jointlcmm
  Number of subjects: 300
  Number of observations: 3356
  Number of latent classes: 1
  Number of parameters: 21
  Link functions: Quadratic I-splines with nodes 0 8 12 17.581 for Ydep1
                  Quadratic I-splines with nodes 0 25 30 for Ydep2

summary(m1)
Iteration process:
  Convergence criteria satisfied
  Number of iterations: 4
  Convergence criteria: parameters= 5.2e-11
                      : likelihood= 2.1e-08
                      : second derivatives= 1.2e-09

Goodness-of-fit statistics:
  maximum log-likelihood: -6977.48
  AIC: 13996.95
  BIC: 14074.73

summary(m1)
Maximum Likelihood Estimates:

Fixed effects in the longitudinal model:
                              coef      Se     Wald  p-value
intercept (not estimated)  0.00000
Time                      -1.07056 0.12293 -8.70900  0.00000
X2                        -0.19225 0.16697 -1.15100  0.24957
Time:X2                    0.10627 0.18634  0.57000  0.56847

Contrasts on X2 (p=0.88696)
            coef      Se     Wald  p-value
Ydep1   -0.00483 0.03399 -0.14215  0.88696
Ydep2*   0.00483 0.03399  0.14215  0.88696
 *coefficient not estimated but obtained from the others as minus the sum of them

Variance-covariance matrix of the random-effects:
(the variance of the first random effect is not estimated)
          intercept    Time
intercept   1.00000
Time       -0.19338 1.01251

summary(m1) – last bit!
                                        Ydep1   Ydep2
Residual standard error:              0.86955 0.88053
Standard error of the random effect:  0.00000 0.00000

Parameters of the link functions:
                     coef      Se    Wald p-value
Ydep1-I-splines1 -7.51985 0.64412 -11.675   0e+00
Ydep1-I-splines2  1.40067 0.18058   7.756   0e+00
Ydep1-I-splines3  1.60739 0.10324  15.569   0e+00
Ydep1-I-splines4  1.90822 0.07873  24.238   0e+00
Ydep1-I-splines5  1.43117 0.09075  15.770   0e+00
Ydep1-I-splines6  1.08205 0.21198   5.105   0e+00
Ydep2-I-splines1 -7.52861 0.67080 -11.223   0e+00
Ydep2-I-splines2  1.13505 0.25553   4.442   1e-05
Ydep2-I-splines3  1.45345 0.14629   9.935   0e+00
Ydep2-I-splines4  2.32793 0.08636  26.956   0e+00
Ydep2-I-splines5  1.05187 0.05908  17.803   0e+00

plot(m1,which="linkfunction")
[Figure: estimated link functions for Ydep1 and Ydep2, plotted against the latent process]
# variation percentages explained by linear mixed regression
> VarExpl(m1,data.frame(Time=0))
             class1
%Var-Ydep1 56.94364
%Var-Ydep2 56.32753

summary(m2)
<…>
# posterior classification
postprob(m2)

Posterior classification:
  class1 class2
N 143.00 157.00
%  47.67  52.33

Posterior classification table:
 --> mean of posterior probabilities in each class
        prob1  prob2
class1 1.0000 0.0000
class2 0.0589 0.9411

Posterior probalities above a threshold (%):
         class1 class2
prob>0.7    100  98.09
prob>0.8    100  96.18
prob>0.9    100  85.99

# longitudinal predictions in the outcomes scales for a given profile of covariates
newdata <- data.frame(Time=seq(0,5,length=100),X1=rep(0,100),
  X2=rep(0,100),X3=rep(0,100))
predGH <- predictY(m2,newdata,var.time="Time",methInteg=0,nsim=20)
head(predGH)
Etc.
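The posterior classification printed by postprob() above can be read as Bayes' rule per subject followed by assignment to the most probable class. A minimal base-R sketch of that reading follows. It assumes a hypothetical two-class Gaussian mixture with known parameters: pi_k, mu_k, sigma and the simulated y are illustrative only and are not taken from any lcmm fit. lcmm itself derives these probabilities from the fitted longitudinal model.

```r
set.seed(1)

# Hypothetical two-class mixture (illustrative values, not from the slides)
pi_k  <- c(0.45, 0.55)   # class proportions
mu_k  <- c(20, 30)       # class-specific means
sigma <- 3               # common standard deviation

# Simulate one value per subject for 300 subjects
true_class <- sample(1:2, 300, replace = TRUE, prob = pi_k)
y <- rnorm(300, mean = mu_k[true_class], sd = sigma)

# Bayes' rule: P(class k | y) is proportional to pi_k * f(y | class k)
lik  <- cbind(dnorm(y, mu_k[1], sigma), dnorm(y, mu_k[2], sigma))
num  <- sweep(lik, 2, pi_k, `*`)
post <- num / rowSums(num)
colnames(post) <- c("prob1", "prob2")

# Assign each subject to the class with the highest posterior probability
assigned <- max.col(post, ties.method = "first")

# Counts and percentages per assigned class
counts <- table(assigned)
print(rbind(N = counts, `%` = round(100 * prop.table(counts), 2)))

# Mean posterior probabilities within each assigned class
class_table <- apply(post, 2, function(p) tapply(p, assigned, mean))
rownames(class_table) <- c("class1", "class2")
print(round(class_table, 4))

# Percentage of subjects per class whose largest posterior probability
# exceeds each threshold
pmax_post <- apply(post, 1, max)
above <- sapply(c(0.7, 0.8, 0.9),
                function(t) 100 * tapply(pmax_post > t, assigned, mean))
dimnames(above) <- list(c("class1", "class2"),
                        c("prob>0.7", "prob>0.8", "prob>0.9"))
print(round(t(above), 2))
```

Diagonal entries of the classification table close to 1 indicate well-separated classes; values nearer 0.5 would suggest that the class assignment is ambiguous for many subjects.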

In lcmm - hlme
• Fits a latent class linear mixed model (LCLMM), also known as a growth mixture model or heterogeneous linear mixed model.
• An LCLMM assumes that the population is divided into a finite number of latent classes; each latent class is characterized by a specific mean trajectory, which is described by a class-specific linear mixed model.
• Both the latent class membership and the trajectory can be explained by covariates.
• This model is limited to a Gaussian outcome.

In R
hlme(fixed, mixture, random, subject, classmb, ng = 1, idiag = FALSE, nwg = FALSE,
  cor=NULL, data, B, convB=0.0001, convL=0.0001, convG=0.0001, prior,
  maxiter=500, subset=NULL, na.action=1)

Example
data(data_hlme)
m1<-hlme(Y~Time*X1, random=~Time, subject='ID', ng=1, idiag=TRUE, data=data_hlme)
summary(m1)

Summary hlme
Heterogenous linear mixed model
  fitted by maximum likelihood method

hlme(fixed = Y ~ Time * X1, random = ~Time, subject = "ID", ng = 1,
  idiag = TRUE, data = data_hlme)

Statistical Model:
  Dataset: data_hlme
  Number of subjects: 100
  Number of observations: 326
  Number of latent classes: 1
  Number of parameters: 7

Summary hlme
Iteration process:
  Convergence criteria satisfied
  Number of iterations: 9
  Convergence criteria: parameters= 1.2e-07
                      : likelihood= 1.6e-05
                      : second derivatives= 6.2e-13

Goodness-of-fit statistics:
  maximum log-likelihood: -804.98
  AIC: 1623.95
  BIC: 1642.19

Maximum Likelihood Estimates:

Fixed effects in the longitudinal model:
              coef      Se   Wald p-value
intercept 25.86515 0.79448 32.556 0.00000
Time      -0.33282 0.17547 -1.897 0.05787
X1         1.69698 1.03466  1.640 0.10098
Time:X1   -0.39364 0.22848 -1.723 0.08491

Summary hlme
Variance-covariance matrix of the random-effects:
          intercept     Time
intercept  24.63032
Time        0.00000 1.168762

                              coef         se
Residual standard error: 0.9501876 0.05765784

plot(m1)
[Figure: four diagnostic panels: marginal residuals versus marginal predictions; subject-specific residuals versus subject-specific predictions; Normal QQ plot for marginal residuals; Normal QQ plot for subject-specific residuals]

Example
m2<-hlme(Y~Time*X1, mixture=~Time, random=~Time, classmb=~X2+X3, subject='ID',
  ng=2, data=data_hlme,
  B=c(0.11, -0.74, -0.07, 20.71, 29.39, -1, 0.13, 2.45, -0.29, 4.5, 0.36, 0.79, 0.97))
m2
Heterogenous linear mixed model
  fitted by maximum likelihood method

hlme(fixed = Y ~ Time * X1, mixture = ~Time, random = ~Time,
  subject = "ID", classmb = ~X2 + X3, ng = 2, data = data_hlme)

Statistical Model:
  Dataset: data_hlme
  Number of subjects: 100
  Number of observations: 326
  Number of latent classes: 2
  Number of parameters: 13

Iteration process:
  Convergence criteria satisfied
  Number of iterations: 2
  Convergence criteria: parameters= 1.3e-07
                      : likelihood= 4.4e-07
                      : second derivatives= 2.5e-12

Goodness-of-fit statistics:
  maximum log-likelihood: -773.82
  AIC: 1573.64
  BIC: 1607.51

Example
summary(m2)
postprob(m2)

Posterior classification:
  class1 class2
N     46     54
%     46     54

Posterior classification table:
 --> mean of posterior probabilities in each class
        prob1  prob2
class1 0.9588 0.0412
class2 0.0325 0.9675

Posterior probalities above a threshold (%):
         class1 class2
prob>0.7  93.48 100.00
prob>0.8  93.48  92.59
prob>0.9  86.96  83.33

[Figure: four diagnostic panels: marginal residuals versus marginal predictions; subject-specific residuals versus subject-specific predictions; Normal QQ plot for marginal residuals; Normal QQ plot for subject-specific residuals]

Example
### same model as m2 but initial values specified
m3<-hlme(Y~Time*X1, mixture=~Time, random=~Time, classmb=~X2+X3, subject='ID',
  ng=2, data=data_hlme, B=c(0, 0, 0, 30, 25, 0, -1, 0, 0, 5, 0, 1, 1))
m3

Predicting…
summary(m3)
Etc.
## plot of predicted trajectories using some newdata
newdata<-data.frame(Time=seq(0,5,length=100), X1=rep(0,100),
  X2=rep(0,100), X3=rep(0,100))
plot.predict(m3,newdata,"Time","right",bty="l")

plot m3
[Figure: four diagnostic panels: marginal residuals versus marginal predictions; subject-specific residuals versus subject-specific predictions; Normal QQ plot for marginal residuals; Normal QQ plot for subject-specific residuals]

Beyond PCA!
• Kernel PCA
• ICA
  – PCA is not particularly helpful for finding independent clusters
  – ICA idea:
    – Assume non-Gaussian data
    – Find multiple sets of components
    – Minimize statistical dependence (not just correlation) between components
  – Blind source separation example:
    – Given: audio recording with 2 overlapping voices
    – Goal: separate the voices into separate tracks

Beyond PCA!
• Probabilistic PCA
• Bayesian source separation
• Continuous latent variables

Reading, etc.
• http://data-informed.com/focus-predictiveanalytics/
• Lab on Friday – May 1 – open
• Assignment 7 due
• Next week – final project presentations – Tuesday and Friday – we will run over the class time to complete these – plan accordingly
• 5 MINUTES – you should not need more
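
The blind source separation idea from the ICA slide above can be sketched in base R. This is a minimal illustration under stated assumptions, not a production ICA: the two sources, the mixing matrix A and the grid search are all hypothetical, and the contrast (sum of squared excess kurtosis after whitening) is one simple non-Gaussianity measure, swapped in here instead of the FastICA-style algorithms found in dedicated packages.

```r
set.seed(42)
n <- 5000

# Two independent, non-Gaussian sources (stand-ins for two voices)
s1 <- runif(n, -1, 1)              # uniform: lighter tails than a Gaussian
s2 <- sign(rnorm(n)) * rexp(n)     # Laplace: heavier tails than a Gaussian
S  <- cbind(s1, s2)

# Hypothetical mixing matrix: each "microphone" records both sources
A <- matrix(c(1, 0.6, 0.5, 1), 2, 2)
X <- S %*% t(A)

# Step 1: centre and whiten, so the mixtures are uncorrelated with unit variance
Xc <- scale(X, center = TRUE, scale = FALSE)
e  <- eigen(cov(Xc))
W_white <- diag(1 / sqrt(e$values)) %*% t(e$vectors)
Z  <- Xc %*% t(W_white)

# Step 2: after whitening only a rotation remains unknown; pick the angle
# that makes the rotated components as non-Gaussian as possible
excess_kurtosis <- function(v) mean(v^4) / mean(v^2)^2 - 3
rotation <- function(theta) {
  matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
}
contrast <- function(theta) {
  Y <- Z %*% t(rotation(theta))
  sum(apply(Y, 2, excess_kurtosis)^2)
}
thetas <- seq(0, pi / 2, length.out = 500)   # a quarter turn covers all cases
best   <- thetas[which.max(sapply(thetas, contrast))]
Y      <- Z %*% t(rotation(best))

# Absolute correlations between recovered components (rows) and true sources
# (columns); sources are recovered only up to order, sign and scale
recovery <- abs(cor(Y, S))
print(round(recovery, 2))
```

Whitening alone is what PCA provides: the components of Z are uncorrelated but still mixed. The rotation step is what ICA adds, and it only works because the sources are non-Gaussian.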