General Linear Model (GLM)

The GLM gives us a convenient, code-efficient way to conduct statistical tests with design matrices. These matrices contain all the relevant group-membership assignments and use linear algebra to compute sums over groups, yield coefficients, and so on. The first thing we need is a design matrix X of dimensions [n x (p+1)]: rows are subjects and columns are predictors, with an extra column of 1s concatenated on (which allows the mean to enter each calculation). This is essentially nothing more than our multiple regression equation in matrix notation. Each treatment (predictor) column receives a 1 for every subject who was in that treatment and a 0 for every subject who was not. When we multiply this matrix by a coefficient vector that is [(p+1) x 1], we get a vector of size [n x 1], to which we can add an error vector of the same dimensions. This yields a model for each subject's response.

However, this matrix is slightly redundant and can be revised. We don't need all treatments to be represented: if we know a subject isn't in A or B, then they are, by default, in C. Furthermore, we can remove the mean column from the equation, since it has no variance and can simply be added back in as a constant later. By removing these columns we change the scope of our question: we are now comparing groups A and B to C instead of to the grand mean. We don't want that, so we make the mean of each column 0 so that we measure effects relative to the grand mean instead of relative to the left-out group. We do that by setting both columns A and B to -1 whenever the subject was a member of group C. So now the design matrix is of dimensions [n x (p - 1)]. Akin to an ANOVA, the intercept term that results from this coding equals the grand mean of Y (whereas with dummy coding it would be the mean of the control group). The regression coefficients represent the estimated treatment effects (a group mean minus the grand mean; with dummy coding they would be comparisons to the control group), and R-squared will equal eta-squared, because both estimate the percentage of variation in the dependent variable accounted for by variation among treatments.

If we want to add factorial designs and interaction terms, we need to split up the levels of each main effect and also create a column for each interaction. For a main effect, we set up a within-column comparison: a subject in the first level of A gets a 1 and a subject in the second level of A gets a -1 (giving the column a mean of 0). The same goes for B. An interaction column is nothing more than the product of the columns that the interaction relates to. If a factor has 3 levels, use 1 if the subject is in the current level, -1 if in the third (left-out) level, and 0 otherwise (the same setup as before).
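To make the coding concrete, here is a minimal sketch in Python (the group labels, sample sizes, and response values are invented for illustration) that builds an effect-coded design matrix for three treatments and recovers the grand mean and treatment effects with ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
groups = np.repeat(["A", "B", "C"], 5)                      # three treatments, n = 5 each
y = rng.normal(loc=[10.0, 12.0, 15.0], scale=2.0, size=(5, 3)).T.ravel()  # fake DV

# Effect coding: one column per treatment except the left-out group (C).
# Members of C get -1 in every column, so each column sums to zero and the
# coefficients become deviations from the grand mean rather than from C.
col_A = np.where(groups == "A", 1.0, np.where(groups == "C", -1.0, 0.0))
col_B = np.where(groups == "B", 1.0, np.where(groups == "C", -1.0, 0.0))
X = np.column_stack([np.ones(len(y)), col_A, col_B])        # constant + two effect columns

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept (grand mean):", beta[0])
print("treatment effects (group mean - grand mean):", beta[1:])
```

With this coding and a balanced design, the fitted intercept is the grand mean and each coefficient is a group mean minus the grand mean, matching the interpretation above.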
A word about different types of Sums of Squares:

Type III adjusts each effect for all other effects in the model: SS_AB = SS_regression(A,B,AB) - SS_regression(A,B)

Type II ignores interactions when testing main effects: SS_A = SS_regression(A,B) - SS_regression(B)

Type I SS depends on hierarchy. The effect highest up in the model simply gets the SS associated with a regression run with that effect as the sole predictor. Each subsequent effect is measured as the difference between the model containing the current effect plus the previous effects and the model containing the previous effects alone. So, if you are interested in A, then B, then AB, you would use SS_regression(A) as SS_A, then SS_B = SS_regression(A,B) - SS_regression(A), then SS_AB = SS_regression(A,B,AB) - SS_regression(A,B).

We can examine the effects of our interactions. If the interaction term accounts for any of the variation in Y, then removing the interaction predictors from the model should lead to a decrease in accountable variation. We calculate R-squared with and without the interaction term(s) and observe the difference in SS_regression. We can do the same thing to test a main effect: find SS_regression for the model that includes the effect of interest, then SS_regression when it is not included; the difference is the SS for the effect of interest. We then use the SS_residual from the full model as the error term when we wish to compute an F-statistic. As an example:

SS_regression(A,B,AB) - SS_regression(B,AB) = SS_A
F = MS_A / MS_residual(A,B,AB)

The df for our error term is (N - ab) and the df for A is simply (a - 1).

Analysis of Covariance (ANCOVA)

What if a variable of non-interest is driving the change in our dependent variable? It might account for some of the variance, but it is not what we are interested in. Therefore, we want to see whether our predictors can account for variance in the dependent variable above and beyond these covariates; we should partial out the variance that can be attributed to the covariate. The covariate is usually correlated with the DV, and if we did our job of random assignment properly, including it will reduce the error term. However, it should not be correlated with the IV; if it is, then the estimated treatment means themselves get adjusted when the covariate is removed. Essentially, what we want is "adjusted means" for our treatment effects, where the means are adjusted to be what they would have been if the treatments had not differed on the covariate. Then we use an ANCOVA to test whether these adjusted means differ significantly, using an error term from which the variance attributable to the covariate has been partialled out. The analysis is not much different from a normal ANOVA, except that we remove the effect of the covariate.

Before we start with an ANCOVA we need to meet the assumption of homogeneity of regression. This states that the regression coefficients (slopes) are equal across treatments, i.e., that the regression lines are parallel. Data that violate this cannot be analyzed by an ANCOVA, because it would be impossible to estimate a common slope between the covariate and the DV. Basically, we are just comparing SS_regression when the covariate is included with SS_regression when it is not; the difference between a model that includes the covariate and one that does not is the variation attributable to treatment effects over and above that attributable to the covariate. Formally, the full ANCOVA model (including the term used to test homogeneity of regression) is:

Y_ij = μ + τ_j + c·C_ij + (τC)_ij + e_ij

The covariate-by-treatment interaction, (τC)_ij, is the term that represents the test of homogeneity of regression. If the regression lines are parallel, so that the slopes that could be calculated for each group separately are homogeneous, then we have homogeneity of regression, and deleting the interaction term should produce only a trivial decrement in the percentage of accountable variation. We can compare the R-squared values obtained with and without the covariate interacting with a particular treatment by running an F-test on the difference between the two models.
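As a concrete illustration of this kind of model comparison, here is a minimal sketch (the helper names are mine, not from the notes, and no real data are assumed) of the extra-sum-of-squares F-test: fit the full and reduced design matrices, take the difference in explained SS, and test it against the residual MS of the full model:

```python
import numpy as np
from scipy import stats

def ss_residual(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def compare_models(X_full, X_reduced, y):
    """F-test for the predictors present in X_full but not in X_reduced.

    Both design matrices are assumed to include the intercept column.
    Because SS_total is the same for both fits, the drop in SS_residual
    equals the gain in SS_regression.
    """
    n = len(y)
    ss_res_full = ss_residual(X_full, y)
    ss_res_reduced = ss_residual(X_reduced, y)
    df_diff = X_full.shape[1] - X_reduced.shape[1]
    df_error = n - X_full.shape[1]
    ss_effect = ss_res_reduced - ss_res_full       # SS attributable to the dropped terms
    F = (ss_effect / df_diff) / (ss_res_full / df_error)
    p = stats.f.sf(F, df_diff, df_error)
    return F, p
```

The same comparison covers the Type I/II/III calculations above and the homogeneity-of-regression test; only the choice of which columns go into X_full and X_reduced changes.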
If we don't reject the null, then we can remove that interaction of covariate and treatment but still keep the covariate in. We always want the SS_residual of the full model (including the covariate) to be our SS_error for subsequent testing. The difference between SS_regression with and without the group-membership predictors must be the amount of the sum of squares attributable to treatment over and above the amount that can be explained by the covariate. That difference is, technically, the adjusted treatment SS, since we have removed the covariate's contribution from the full model and essentially set the treatment SS to what it would be if the covariate did not play a role (i.e., if each treatment group were evaluated at the same covariate level). This subtraction adjusts for any effects of the covariate and can be written formally as:

SS_treatment(adj) = SS_regression(τ, c) - SS_regression(c)

We can also do the reverse to estimate the covariate SS and see how powerful our covariate is as a predictor of non-interest. The df for the covariates is simply the number of covariates, and the df for error is N - k - c. We can test our adjusted treatment MS (with df k - 1) against the MS_error built from the SS_residual of the full model (covariate included) to obtain an F-statistic.

In order to interpret a significant adjusted treatment effect, we need to obtain the treatment means adjusted for the effects of the covariate. We need an estimate of what the mean for a specific treatment would have been if the groups had not differed on the covariate (the preinjection means, in the textbook example). We are simply trying to predict Ȳ'_j. To do so, we use the mean of the covariate instead of the individual covariate value, multiply it by the covariate slope from the full model, add in the regression coefficient (effect code) for the treatment group of interest, and then add the intercept; this yields an adjusted group mean. Any individual comparisons among treatments would now be made from these adjusted means. However, we need to modify our error term rather than just using the SS_residual from the full model: we use MS'_error (the full-model error) in conjunction with SS_e(c), the error sum of squares from an analysis of variance carried out on the covariate. We also take the mean covariate value in each treatment of interest to form the following comparison of adjusted group means:

F(1, N - k - c) = (Ȳ'_i - Ȳ'_j)² / ( MS'_error · [ (1/n_i + 1/n_j) + (C̄_i - C̄_j)² / SS_e(c) ] )

As for the effect size: if the covariate naturally varies in the population, then we can obtain

η² = SS_treatment(adj) / SS_total

We can also obtain this eta value as the difference between the R² of a model predicting the dependent variable from only the covariate and the R² of one predicting it from both the covariate and the treatment; that eta is the contribution of the treatment to explained variation after controlling for the covariate. We can also calculate a d-family statistic in the usual format,

d = ψ̂ / s

where ψ̂ is the difference between two means and s is the square root of our MS_error term from the full model (the SS_residual), or the error from a control group if we are comparing means to a control.
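To make the adjusted means concrete, here is a minimal sketch (all data simulated, variable names mine) that fits the regression just described, effect-coded treatment columns plus the covariate, and then evaluates each group's prediction at the grand mean of the covariate:

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 10
group = np.repeat(np.arange(k), n)
cov = rng.normal(50, 10, size=k * n)                      # covariate (e.g. a pretest score)
y = 5 + 0.4 * cov + np.array([0.0, 2.0, -2.0])[group] + rng.normal(0, 2, size=k * n)

# Effect-coded treatment columns; the last group is coded -1 in every column.
T = np.zeros((k * n, k - 1))
for j in range(k - 1):
    T[group == j, j] = 1.0
T[group == k - 1, :] = -1.0

X = np.column_stack([np.ones(k * n), T, cov])             # intercept, treatments, covariate
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b_treat, b_cov = b[0], b[1:k], b[k]

# Adjusted mean for group j = intercept + effect_j + slope * grand covariate mean
effects = np.append(b_treat, -b_treat.sum())              # effect for the left-out group
adjusted_means = b0 + effects + b_cov * cov.mean()
print("adjusted treatment means:", adjusted_means)
```

Differences among these adjusted means are what the modified error terms below are built to test.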
We typically want to use our covariate to help reduce our error term, because our treatment groups should be randomly assigned and the subjects within them should not vary aside from pure error. If we want to compare adjusted means in a weighted fashion (comparing one mean to two others, as in a linear contrast), then we need a new error term:

MS''_error = MS'_error · [ 1 + (SS_treat(c) / (k - 1)) / SS_e(c) ]

where anything postpended with a (c) is a sum of squares from the analysis of variance done on the covariate itself (i.e., the covariate treated as the outcome, with group membership as the predictor). Then we do our typical linear contrast to yield ψ̂ and get an F-statistic via:

F = n·ψ̂² / ( Σ a_j² · MS''_error )

If we want to do a test of simple effects (between cells), then we still need to adjust our error term further:

MS''_error = MS'_error · [ 1 + (SS_cells(c) / (ab - 1)) / SS_e(c) ]

where SS_cells(c) is again from the ANOVA on the covariate.

Meta-Analysis

A meta-analysis averages the results of many studies on a single topic. We first plot each study on a separate line of a forest plot, where we can indicate its effect size and the confidence interval on that effect size. We must decide whether what we are trying to measure is a fixed or a random effect. A fixed-effect model assumes there is one true effect size that we are trying to estimate by looking at the results of multiple experiments. If you were an astronomer attempting to measure the luminosity of a particular star, it is reasonable to think it has one true luminosity and that the differences between your measurements and your colleagues' measurements are just error variance. Thus, we assume that the only reason for variability in measurement is random sampling error; if each of our studies had an infinite number of participants, all studies would come to the same result, because they are all measuring the same thing. With other variables, like depression, the waters are muddier: the true effect may vary with gender, family setting, and a host of other variables. Random-effects models therefore assume that the true effects are randomly and normally distributed around some value. We insert random error into the random model because the true effects we are aiming for may well differ from study to study and are not all equal to some overall mean effect. So a random-effects model includes a term for the difference between each study's true effect and the grand mean of effects, whereas a fixed-effect model does not have this term, because it assumes the grand mean is the true effect.

With a meta-analysis we want to calculate the overall effect by weighting the effect size of each study. To do this we use the inverse of each study's variance estimate:

W_i = 1 / s²_(d_i)

We remember d as the difference between the treatment and control means divided by the standard deviation of the control group, and its sampling variance is:

s²_d = (n1 + n2) / (n1·n2) + d² / (2(n1 + n2))

We can use these to construct a confidence interval for each study by adding and subtracting 1.96·s_d (the square root of that study's variance) from d.
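Here is a minimal sketch (hypothetical study summaries, not real data) of the per-study quantities just described: d, its variance, the inverse-variance weight, and a 95% confidence interval.

```python
import numpy as np

def study_effect(mean_t, mean_c, sd_c, n_t, n_c):
    """Effect size, variance, weight, and 95% CI for one study's summary stats."""
    d = (mean_t - mean_c) / sd_c                              # d as defined in the notes
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    weight = 1.0 / var_d                                      # inverse-variance weight
    ci = (d - 1.96 * np.sqrt(var_d), d + 1.96 * np.sqrt(var_d))
    return d, var_d, weight, ci

# Illustrative numbers only
print(study_effect(mean_t=105.0, mean_c=100.0, sd_c=15.0, n_t=40, n_c=38))
```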
When looking at multiple studies, we want to yield the overall effect:

d̄ = Σ(W_i · d_i) / Σ W_i

We can compute a confidence interval around this overall effect, to see whether it includes 0, by getting the standard error of the overall effect:

s_d̄ = √(1 / Σ W_i)

However, we also want to make sure that our effects are measuring the same thing, so we measure the heterogeneity of a fixed-effect model with the statistic Q, which is simply a weighted sum of squared deviations of each study's effect from the mean effect size (analogous to SS_between):

Q = Σ_{i=1..k} W_i (d_i - d̄)²

We can test this against the chi-square distribution on k - 1 df, where k is the number of studies. If we want to test the same thing for a random-effects model, we estimate how far each study's true effect departs from the average effect by taking:

T² = (Q - df) / C, where C = Σ W_i - (Σ W_i²) / (Σ W_i)

Keep in mind that Q measures the differences among the effect-size measures, while its df is the variability expected if the null hypothesis is true; so the numerator of T² is the excess variability that cannot be attributed to random differences among studies. You can think of C as analogous to the within-groups term.

We can compare Qs within subgroups of our studies as well. Say one group of studies ran the intervention on rainy days and another ran it on sunny days: what if there is a meta-effect? We can calculate Q for each subgroup and then compare the sum of the subgroup Qs to the Q calculated using every study at once. The difference between these two is itself a Q, which we test under the null hypothesis that the effect size is the same for all groups, on a chi-square with g - 1 df, where g is the number of groupings we split our collection of studies into. If the variance between studies is not significant, then we have a fixed-effect model. We can compute these Q statistics with any effect measure we want; if we are dealing with risk ratios, though, we will always want to take the log first.
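Putting the pooling formulas together, here is a minimal sketch (made-up study-level values) of the fixed-effect summary and the heterogeneity statistics described above:

```python
import numpy as np
from scipy import stats

d = np.array([0.30, 0.55, 0.12, 0.45])       # per-study effect sizes
var_d = np.array([0.04, 0.06, 0.03, 0.05])   # per-study variances
w = 1.0 / var_d                              # inverse-variance weights

d_bar = np.sum(w * d) / np.sum(w)            # overall (weighted mean) effect
se_d_bar = np.sqrt(1.0 / np.sum(w))          # standard error of the overall effect
Q = np.sum(w * (d - d_bar) ** 2)             # heterogeneity: weighted SS of deviations
df = len(d) - 1
p_Q = stats.chi2.sf(Q, df)                   # test of homogeneity on k - 1 df
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
T2 = max(0.0, (Q - df) / C)                  # between-study variance estimate, floored at 0

print(d_bar, se_d_bar, Q, p_Q, T2)
```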
Non-Parametric Testing

Your data don't meet the distributional assumptions of the test you want to use? Fine. Make your own distribution. These tests will generally be robust to outliers. In bootstrapping we assume that the shape of the population is accurately reflected in the distribution of the sample we have acquired. So we draw samples from our own data over and over again with replacement (we pull one value and then allow that value to be an option again on the next pull) in order to create new samples that are just shufflings (with some duplications) of our current sample. We do this sampling 10,000 times, determine the extremes of the resulting distribution (the 2.5% at each end), and see whether our actual, observed result falls outside them and is therefore considered extreme. We mainly use bootstrapping, though, for deriving estimates of variation; we can do this for regressions to get the standard error of our beta coefficients as well. Resampling (the permutation test) does a similar thing: it shuffles group assignment randomly, runs the statistical test on those shuffled groups, and forms a distribution of results by doing this 10,000 times. Then we compare our actual statistic to this distribution. We do the same for paired samples by drawing a large number of samples of 19 difference scores, randomly assigning positive and negative signs to the differences, and then calculating the median of the differences. Do this 10,000 times and compare to the median difference of the true, non-shuffled dataset. We can get a p-value by counting how many of the 10,000 values are at or above our observed, actual statistic. We can do this for correlations as well by sampling XY pairs with replacement, calculating the correlation among the resampled pairs, and comparing our actual r to this bootstrapped distribution.

We can also use more straightforward nonparametric tests. If we want the nonparametric analogue of an independent t-test, we use Wilcoxon's rank-sum test. This test is especially sensitive to population differences in central tendency. All you have to do is rank all N scores without regard to group membership and then sum the ranks within each group. It makes sense that the sums should be roughly even if there is no ranked ordering between the groups. We then compare the sum for the smaller group (with unequal ns), or the smaller of the two sums, to the critical value for W_s in the Wilcoxon table. This table tells us the smallest value of W_s we would expect to obtain by chance if the null hypothesis were true. We compare depending on the number of subjects in each group, where n1 is always the number of subjects in the smaller group. We will only be able to reject the null if the sum of the ranks for the smaller group is sufficiently small. This might not make sense, because what if the smaller group is the higher-ranked one? That is why we also test the opposite:

W'_s = 2W̄ - W_s, where 2W̄ = n1(n1 + n2 + 1)

Compare that to our normal W_s and submit the smaller of the two to the table for testing. If there are ties in our ranks, just use that rank twice, so 1 2 2 4 would be an ordering. We can also shuffle the group labels on the ranks and run a permutation test, comparing our obtained W_s to a distribution of random samplings.

We can do the same thing for matched samples by calculating difference scores, ordering them by magnitude (absolute value), and then summing the positive ranks and the negative ranks as T+ and T-. We then compare this to a T table, submitting the smaller of T+ and T- to the test. If there is a difference score of 0, it is advised to eliminate that participant from consideration. We can also do a sign test that looks only at the sign of the differences and asks how likely a given arrangement of +s and -s is, which amounts to submitting the number of +s and the number of -s to a chi-square goodness-of-fit test to see whether the observed counts differ from the expected counts (an even split of + and -).

The nonparametric equivalent of an ANOVA is the Kruskal-Wallis one-way ANOVA, where we calculate a scaled H statistic. We rank all scores without regard to group membership, compute the sum of the ranks for each group, and weight and scale them to fit the chi-square distribution:

H = [12 / (N(N + 1))] · Σ_{i=1..k} (R_i² / n_i) - 3(N + 1)

We can then evaluate this H on the chi-square distribution with k - 1 df.
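For reference, here is a minimal sketch (fabricated scores) of the rank-based tests described above, using SciPy's implementations rather than the printed tables; the function names are SciPy's, while the data are invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(50, 10, 12)
g2 = rng.normal(55, 10, 15)
g3 = rng.normal(60, 10, 14)

# Wilcoxon rank-sum test for two independent groups
print(stats.ranksums(g1, g2))

# Wilcoxon signed-rank test for matched samples (difference scores)
before, after = rng.normal(50, 10, 12), rng.normal(53, 10, 12)
print(stats.wilcoxon(before, after))

# Kruskal-Wallis H for k independent groups
print(stats.kruskal(g1, g2, g3))
```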
Lastly, if we want to mirror a repeated-measures ANOVA, we can use Friedman's rank test for k correlated samples, where we rank the scores within a subject (or level/row) and then get the rank sums and their variance across subjects. For example, if each subject takes 3 tests, we rank each subject's scores and then see whether, consistently, subjects are getting their best score on test 2. To obtain a statistic to compare to the chi-square distribution:

χ²_F = [12 / (N·k(k + 1))] · Σ_{j=1..k} R_j² - 3N(k + 1)

R_j is the sum of the ranks for condition j, which we go on to square. If our N is greater than 50 in any of these nonparametric tests, we should use a normal approximation.
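Finally, a minimal sketch (made-up data) of two things from this section: Friedman's rank test via SciPy, and the label-shuffling permutation test described in the resampling discussion above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Friedman: each row is a subject, each column a repeated measurement
test1, test2, test3 = rng.normal([50, 55, 52], 8, size=(20, 3)).T
print(stats.friedmanchisquare(test1, test2, test3))

# Permutation test: shuffle group labels 10,000 times and compare the
# observed mean difference to the resulting null distribution.
a, b = rng.normal(50, 10, 15), rng.normal(56, 10, 15)
observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])
diffs = np.empty(10_000)
for i in range(10_000):
    perm = rng.permutation(pooled)
    diffs[i] = perm[:15].mean() - perm[15:].mean()
p_value = np.mean(np.abs(diffs) >= np.abs(observed))
print(p_value)
```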