ANOVA

Null hypothesis: μ1 = μ2 = μ3
Alternative hypothesis: ~(μ1 = μ2 = μ3), i.e., not all group means are equal.

We want to assess whether group membership accounts for individual score variance relative to the larger population (all the groups combined). Is the variance across groups the same as the variance within groups? Under the null, the treatment effect will be no larger than the individual error effect.

Key variables and their meaning:
k = number of groups
N = total number of subjects in the experiment
j = group index
i = within-group subject index
Y_ij = the dependent variable for subject i in group j (the subject's score)
Ȳ.j (or μ_j) = the mean of all scores in group j (the group mean)
Ȳ.. (or μ) = the grand mean (the mean of all the scores of all the subjects in all the groups)
e_ij = Y_ij − Ȳ.j. This is the unique effect and is referred to as error: the part left unexplained after removing the effects of the grand mean and the treatment.
τ_j = Ȳ.j − Ȳ.., the effect of being in group j: the deviation of a group mean from the grand mean.

Taken in summation, an individual's score is made up of the following:

Y_ij = μ + (μ_j − μ) + (Y_ij − μ_j)

This collapses into:

Y_ij = μ + τ_j + e_ij

This shows us that an individual's score is the grand mean, plus that individual's group's deviation from the grand mean, plus that individual's deviation from their own group mean.

Sum of Squares:

The first number we want is the sum of squares of scores around the grand mean:

SS_total = Σ (Y_ij − Ȳ..)²

Next, we want the sum of squares for treatment. That is, we want to see how far each group mean is from the grand mean. This gives us the experiment-wide:

SS_treatment/between = Σ n_j (Ȳ.j − Ȳ..)²

Lastly, we want to account for error within a group. Not all subjects will sit exactly at their group mean, so we need to find the variance within each group.
This gives us the experiment-wide:

SS_error/within = Σ (Y_ij − Ȳ.j)²

SS_error/within should be the remainder when SS_between/treatment is subtracted from SS_total. So:

SS_error/within = SS_total − SS_between/treatment

Thus, SS_error/within divided by SS_total gives us the proportion of unexplained variance. Furthermore, SS_between/treatment divided by SS_total gives us the proportion of explained variance. When treatment is effective, the errors in predicting from the treatment group means will be much smaller than the errors in predicting from the grand mean.

Population Variance Estimates and Mean Squares:

We need to calculate a variance that encompasses the entire test; that is, one that includes the variance from all the groups in the test. The most straightforward way (it doesn't depend on the truth of the null), if we have equal n per group, is to average the variances across groups: take the variance of each group, sum them, and divide by the number of groups. Such a procedure would look like:

s_pooled² = Σ s_j² / k

MS_error (or MS_within) is this sample variance, pooled into a population variance estimate. We need it to be unbiased, so we use:

MS_error/within = SS_error/within / (N − k)

We can also apply the central limit theorem and assume the null is true. The CLT tells us that if we are sampling from the same distribution, the variance of the sample means serves as a squared standard error, which tells us the spread of each sample's mean around the mean of the sample means:

s_Ȳ = s / √n  →  s_Ȳ² = s² / n  →  n·s_Ȳ² = s²

Remember that τ_j is Ȳ.j − Ȳ.., the treatment effect for a condition. So, if we want the variance of the treatment effects, we take the sum of the squared effects across groups and divide by our degrees of freedom.
Thus, the variance of the treatment effect is given by:

s_τ² = Σ τ_j² / (k − 1)

MS_treatment and MS_between are the same thing and represent the variance of the group means around the grand mean, weighted by the within-group sample size:

MS_treatment/between = SS_treatment/between / (k − 1)

F Test

The F-test in ANOVA is just a ratio of the two different ways of estimating the variance of Y in the population. The F-test is given by:

F(df_treat = k − 1, df_error = N − k) = MS_treatment / MS_error

Effect Size:

We can first look at the percent of variance explained by our treatment groups, which is the proportional reduction in error (PRE). It is positively biased. PRE is given by:

η² = SS_treatment/between / SS_total

For a less positively biased estimator in a fixed-effects design, we can solve for omega squared:

ω² = (SS_treatment/between − (k − 1)·MS_error) / (SS_total + MS_error)

Lastly, we could get the root-mean-square standardized effect (RMSSE), which is given by:

RMSSE = √[ (1/(k − 1)) · (SS_treatment / (n·MS_within)) ]

Power:

Remember that power is our probability of correctly rejecting a false null hypothesis. We need the standard error of the group means, which we get from SS_treatment:

s_Ȳ = √(MS_treatment/between / n)

This is because SS_treatment/between = n·Σ (Ȳ.j − Ȳ..)² and MS_treatment/between = SS_treatment/between / (k − 1), so:

s_Ȳ = √[ Σ (Ȳ.j − Ȳ..)² / (k − 1) ]

The full effect would then be:

φ = s_Ȳ / √(MS_within/error / n) = √(MS_treatment/between / MS_within/error)

By the same logic, noncentrality appears as:

φ′ = σ_τ / σ_e   and   φ = φ′·√n

One can rearrange these formulas to solve for an n that serves a particular power purpose.

Degrees of freedom:
Between groups (SS_treatment) = k − 1
Total (SS_total) = N − 1, where upper-case N is the total number of subjects in all of the groups.
Within groups (SS_error) = N − k

Assumptions:

Homogeneity of variance. We want all the treatment groups to have the same variance on the dependent variable. We will be pooling variance, so this is essential.
So long as the largest sample variance is no more than four times the smallest sample variance, ANOVA is robust, provided the ns are relatively equal. If we violate homogeneity, we can use a new F, F′, evaluated with 1 as the first (treatment) degree of freedom and n − 1 as the second (error). Really, though, the best approach is to calculate Welch's F and evaluate it using k − 1 and df_error.

Normality of errors. Scores on the DV should be normally distributed within groups. We need this for the interpretation of an F test.

Independence of observations. Each score is individual and not dependent on another person's. Violating independence means that your within-groups degrees of freedom are wrong, as is the estimate of mean square within.

Side Notes:

An orthogonal design is one that has an equal number of subjects per group. Furthermore, each of the independent variables should be uncorrelated; otherwise, effects on the DV are redundant and/or confounded.

A "House" model allows people to be placed in their groups proportionately. If there are fewer Alzheimer's patients in the population than normal controls, then a "House" model will reflect that, but will result in unequal ns. A "Senate" model forces equal ns regardless of the population distribution.

Type I is a sequential sum of squares.
Type II is the sum of squares for each effect after controlling for the effect of the other main effects but not the interaction.
Type III is the sum of squares for each effect after controlling for (partialing out) the effect of the other main effects and the interaction: Effect A = Effect A − (Effect B + Effect Interaction).

Post-Hoc Tests

An F-test will only tell us that treatment has an effect somewhere across the different treatments. It doesn't tell us which particular pairs of means are significantly different, nor which combinations of means differ from others. Post-hoc tests answer these questions.
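A minimal numeric sketch of the machinery above, pulling together the sum-of-squares partition, the mean squares, and the F ratio. The three groups and their scores are made up for illustration:

```python
# One-way ANOVA from scratch on hypothetical data: three groups, n = 3 each.
groups = [[3, 5, 4], [8, 9, 10], [5, 6, 7]]

scores = [y for g in groups for y in g]
N, k = len(scores), len(groups)
grand_mean = sum(scores) / N

# SS_total: squared deviations of every score from the grand mean
ss_total = sum((y - grand_mean) ** 2 for y in scores)

# SS_between: squared deviations of group means from the grand mean, weighted by n_j
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# SS_within: squared deviations of each score from its own group mean
ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

# Mean squares and the F ratio; df_treatment = k - 1, df_error = N - k
ms_between = ss_between / (k - 1)
ms_error = ss_within / (N - k)
f_stat = ms_between / ms_error

eta_squared = ss_between / ss_total  # proportion of explained variance
```

With these numbers, SS_total = 44, SS_between = 38, SS_within = 6, so the partition SS_total = SS_between + SS_within holds exactly and F = (38/2)/(6/6) = 19 on (2, 6) degrees of freedom.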
If we want to do a t-test afterwards, we can use the error term we found from the ANOVA. This will represent the population better than just the individual groups we are comparing; it is very similar in concept to pooled variance. We would normally pool variance like this:

t = (Ȳ₁ − Ȳ₂) / √[ s_p²·(1/n₁ + 1/n₂) ]   where   s_p² = (df₁·s₁² + df₂·s₂²) / df_total

However, we can get an even better estimate for our t-test during post-hoc analysis of an ANOVA:

t = (Ȳ₁ − Ȳ₂) / √(MS_error/n₁ + MS_error/n₂)

For finding a critical t, we use the df_error from our ANOVA.

Where a t-test reveals a difference between any 2 groups, we can also see if groups of groups differ from each other. For example, if we have a control group, a 20 mg drug group, and a 40 mg drug group, we can compare 20 mg to 40 mg, control to 20 mg, or control to 40 mg. With a linear contrast, however, we'd be able to compare control to (20 mg and 40 mg together). A linear contrast is given by:

L = ψ = a₁Ȳ₁ + a₂Ȳ₂ + a₃Ȳ₃ + a₄Ȳ₄ = Σ a_j·Ȳ_j

where a_j is a weight assigned to each group. The a-weights must sum to 0. If we want to assess the orthogonality of 2 different contrasts, we multiply the corresponding a-weights of the two contrasts and see if the sum of those products equals 0.

We further this by getting a sum of squares that we can use for an F-test:

SS_contrast = n·L² / Σ a_j²

F = SS_contrast / MS_error   (SS_contrast has 1 df, so it is also its own mean square)

Multiple Comparisons

A single test has a Type I error rate. The more tests we do, the more we need to account for the "family" of potential errors we could be making across all of those tests. As the number of comparisons we are interested in increases, our likelihood of committing at least 1 Type I error also increases. This is known as the family-wise error rate (FWE) and can be accommodated by making a new alpha for each comparison.
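The linear contrast above can be sketched numerically. The group names, means, and cell size here are hypothetical; the weights compare control against the average of the two drug groups and sum to 0:

```python
# Linear contrast: control vs. the average of the 20 mg and 40 mg groups.
means = {"control": 10.0, "20mg": 14.0, "40mg": 18.0}   # hypothetical group means
weights = {"control": -2, "20mg": 1, "40mg": 1}          # a-weights, sum to 0
n = 5                                                    # subjects per group

assert sum(weights.values()) == 0  # a valid contrast's weights sum to zero

# The contrast value: L = sum of a_j * mean_j
L = sum(weights[g] * means[g] for g in means)

# Its sum of squares (1 df): SS_contrast = n * L^2 / sum(a_j^2)
ss_contrast = n * L ** 2 / sum(a ** 2 for a in weights.values())

# Orthogonality check against a second contrast (20 mg vs. 40 mg):
weights2 = {"control": 0, "20mg": 1, "40mg": -1}
dot = sum(weights[g] * weights2[g] for g in means)  # 0 means orthogonal
```

Here L = (−2)(10) + 14 + 18 = 12 and SS_contrast = 5·144/6 = 120, which would then be divided by MS_error from the omnibus ANOVA to get the contrast's F.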
When we make a new alpha to be used as the per-comparison error rate, it is α′. We can find the family-wise error rate (FWE), where c is the number of tests, by:

α_FWE = 1 − (1 − α′)^c

We can create an α′ via the Bonferroni method, which is ultra-conservative, but is given by:

α′ = α / c

If we want to conduct a t-test and simultaneously correct for multiple comparisons, we can use a Bonferroni-Dunn t′, which is given by:

t′ = (Ȳ_i − Ȳ_j) / √(2·MS_error/n)   or, for a contrast,   t′ = ψ / √[ MS_error·Σ (a_j²/n_j) ]

To get the effect size for this test, we would use:

d = ψ / √MS_error

We could also do a multistage Bonferroni, where we go after the largest difference first, use the most stringent alpha on it, and then, if it is significant, go on to the next test and reduce our comparison count by 1.

For a post-hoc test of all pairwise comparisons, we can get an Honestly Significant Difference score. This is given in q and is called the studentized range statistic. We take the larger mean and subtract the smaller mean from it (order the means by magnitude). Count the number of steps between the means, inclusive of the means we are using; the more steps, the larger the critical value needed. This step count becomes the first degree of freedom, and the second df is df_error. The equation is:

q_HSD = (Ȳ_i − Ȳ_j) / √(MS_error/n)

N-Way (Factorial) ANOVA

The n in an n-way ANOVA refers to the number of independent variables (factors). The number of levels in each factor gives us the factorial layout of our design. For example, if gender is one independent variable and which of 3 countries we sampled from is another, then we'd have a 2×3 design. The order is irrelevant. Each individual will now belong to multiple groups. The intersections of these groups are cells, whose means are the cell means. Averaging cell means across the levels of one factor gives marginal means: one marginal mean for every level of every factor. So, for a 2×3, we'd have 6 cell means and 2 + 3 = 5 marginal means. Each factor is subdivided into levels.
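The family-wise error arithmetic and the Bonferroni correction above can be sketched directly; the alpha and test count are arbitrary example values:

```python
# Family-wise error rate for c independent tests.
alpha = 0.05
c = 5  # number of comparisons

# If each test runs at the nominal alpha, the family-wise rate inflates:
fwe_uncorrected = 1 - (1 - alpha) ** c   # about .226 for 5 tests at .05

# Bonferroni: divide alpha by the number of comparisons
alpha_prime = alpha / c                  # .01 per comparison

# Under the corrected per-comparison alpha, the family-wise rate
# stays at or below the nominal alpha:
fwe_corrected = 1 - (1 - alpha_prime) ** c
```

This makes the trade-off concrete: five uncorrected tests carry roughly a 22.6% chance of at least one Type I error, while the Bonferroni-corrected family stays just under 5%.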
An n-way ANOVA allows us to check for interactions. An interaction is the variance that is unexplained by either factor alone; in other words, it takes BOTH factors to bring about a particular cell mean. In terms of sums of squares, the interaction is the amount of variance explained by group membership (the cell means) that is not already explained by both factors separately (the marginal means). An interaction is non-additive: we shouldn't be able to predict a cell-mean value from the main effects if it is due to an interaction. If, say, scores are typically low in a dark room, and scores are also low among females, but females in dark rooms do really well, then that is an interaction. Interactions occur when the effects of one factor differ by level of the other factor.

Our SS_error/within and SS_total are computed the same as before, but we need to partition SS_treatment/between into each factor. So, let's say we have 5 types of recall conditions and 2 age groups. We find the mean for old age and the mean for young age; the grand mean is the average of those means (with equal ns). Then we get a sum of squares by squaring the difference of each marginal mean from the grand mean, weighted by the number of scores each mean is based on. Do this for both factors to get a sum of squares for each. To get a mean square, we take each factor's SS and divide by (number of levels in that factor − 1).

In order to see if there is an interaction, we just see what is left over from the variance accounted for by the cell means once the marginal means (our main effects) are removed:

SS_interaction(AB) = SS_cells − SS_A − SS_B

or

SS_interaction = SS_total − (SS_A + SS_B + SS_error)

Our F-test will be very similar. We can skip to calculating the error by:

SS_error = SS_total − SS_cells

This is the same as subtracting each individual score from its cell mean and summing the squares, but the math works out to just use the above calculation.
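The interaction partition above can be sketched with hypothetical cell means for a 2×2 design with equal cell sizes; the factor labels and all numbers are invented for illustration:

```python
# SS partition for a 2x2 factorial with n = 5 scores per cell (hypothetical means).
n = 5
cells = {("old", "recall_a"): 10.0, ("old", "recall_b"): 12.0,
         ("young", "recall_a"): 14.0, ("young", "recall_b"): 20.0}

grand = sum(cells.values()) / len(cells)

# Marginal means: average the cell means across the other factor
age_means = {a: (cells[(a, "recall_a")] + cells[(a, "recall_b")]) / 2
             for a in ("old", "young")}
task_means = {t: (cells[("old", t)] + cells[("young", t)]) / 2
              for t in ("recall_a", "recall_b")}

# SS for the cells and for each factor's marginal means,
# weighted by the number of scores behind each mean
ss_cells = n * sum((m - grand) ** 2 for m in cells.values())
ss_A = 2 * n * sum((m - grand) ** 2 for m in age_means.values())
ss_B = 2 * n * sum((m - grand) ** 2 for m in task_means.values())

# Interaction: what the cell means explain beyond the two main effects
ss_AB = ss_cells - ss_A - ss_B
```

With these made-up means, SS_cells = 280, SS_A = 180, SS_B = 80, leaving SS_AB = 20: the nonzero interaction reflects the young/recall_b cell sitting higher than its marginal means predict.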
And then we can calculate our F statistics:

F_A = MS_A / MS_error
F_B = MS_B / MS_error
F_AB = MS_AB(interaction) / MS_error

If our interaction is significant, we shouldn't pay much attention to the main effects, because they become more artifact than interpretable result. What's the sense in saying that men are better than women at a task if those effects change as a function of ethnicity? We should instead follow up with a test of simple effects, choosing one factor and looking at its effect at each level of the other. This is a glorified way of comparing cell means.

All of this applies only if the factors are fixed effects. If the factors are random effects, then you need to test the main effects of A and B with the interaction mean square in the denominator. If only one factor is random, then test the random effect with the mean square error, the fixed effect with the interaction mean square, and the interaction with the mean square error.

Effect sizes for a factorial ANOVA are given as follows:

η²_A = SS_A / SS_total
η²_B = SS_B / SS_total
η²_AB = SS_AB / SS_total

Power for a fixed-effects factorial ANOVA is:

φ′ = √(Σ τ_j² / (k·σ_e²)) = √(SS_treatment / (n·k·MS_error))  →  φ = φ′·√n

If we want to look at a particular effect, where a is the number of levels in factor A, b is the number of levels in factor B, and n is the number of observations in each cell:

φ′_α = √(Σ α_j² / (a·σ_e²)) = √(SS_A / (n·b·a·MS_error))  →  φ_α = φ′_α·√(n·b)

Orthogonal Designs

First, we need an equal number of individuals in each cell of a factorial ANOVA. Second, the independent variables need to be uncorrelated. In so doing, SS_total can be uniquely partitioned. If we do not achieve orthogonality, then the effect of one variable may depend on another variable.
Type III SS will address this by partialing out the effect of the other main effect and the interaction, so that:

Effect A = Effect A − (Effect B + Effect Interaction)

Regression and Correlation

The size of a treatment effect is given by:

η² = SS_between / SS_total

This gives us the percent of the variance in scores that is due to treatment-group variation. Our F test will tell us if this is significant, because the F test tests the ratio of MS_between/treatment over MS_within/error. That's why we get a higher F when MS_between is greater than MS_error. We want within-group variance to be at a minimum, so that there is coherence within groups; but we want the groups to vary greatly, because we want there to be a difference across groups.

Correlation

The degree of "linear" relation between two (continuous) variables. Only linear. Covariance is the average product of the deviation scores and is given by:

cov_XY = Σ (X − X̄)(Y − Ȳ) / N

where N is the number of pairs of observations. It is worth noting that the covariance of a variable with itself is the sample variance. For a correlation, we want to scale covariance by the standard deviations of both X and Y; our standardized covariance is thus:

r = cov_XY / (s_X·s_Y) = Σ (X − X̄)(Y − Ȳ) / (N·s_X·s_Y)

Technically, since the numerator contains each individual's deviation from the mean and we divide by the standard deviations, we are creating z-scores; so r is the average cross-product of z-scores. By computing r², we can say that r² (as a percentage) of the variance in Y is explained or accounted for by variation in X. We know things about Y just by knowing X. The remaining error is that which, when predicting Y from X, we still get wrong relative to actual Y. When r is 0, the conditional mean of Y is the same as the overall mean of Y.
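A quick sketch of covariance and Pearson's r on a small invented sample, using population-style N in the denominators to match the formulas above:

```python
# Covariance and correlation computed from deviation scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
N = len(xs)

mx, my = sum(xs) / N, sum(ys) / N

# Covariance: average product of the deviation scores
cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N

# Standard deviations (same N denominator, so the scaling cancels cleanly)
sx = (sum((x - mx) ** 2 for x in xs) / N) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / N) ** 0.5

# r is the covariance scaled by both standard deviations
r = cov_xy / (sx * sy)
r_squared = r ** 2  # proportion of variance in Y accounted for by X
```

For this sample, cov_XY = 1.2 and r ≈ .775, so roughly 60% of the variance in Y is accounted for by variation in X.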
If we want to test for the significance of an r (which depends on bivariate normality), then we can use an approximation formula:

t_(N−2) = r·√[(N − 2) / (1 − r²)]

where df = N − 2, which is why the numerator is the correlation weighted by the square root of the df; this value becomes large with either large values of the sample correlation or increasing sample size. Technically, the denominator is the square root of the percent of error (unexplained) variance.

If we want to test a hypothesis other than ρ = 0, we need to transform r into r′ first, which is done by:

r′ = (0.50)·ln[(1 + r) / (1 − r)]

Our standard error for this test will be:

s_r′ = 1 / √(N − 3)

After this conversion we can test between two correlations:

z = (r′₁ − r′₂) / √[1/(N₁ − 3) + 1/(N₂ − 3)]

Doing so, we can get a confidence interval:

CI = (r′₁ − r′₂) ± z_crit·√[1/(N₁ − 3) + 1/(N₂ − 3)]

We can then transform this confidence interval back into regular r. If we have a particular hypothesis (say that ρ is .5) and we want to see if our correlation is significantly different from that, then we can use:

z = (r′ − ρ′) / (1/√(N − 3))

Effects on correlation values: range restriction will decrease r; mixed populations will increase it; outliers will increase or decrease it (r is sensitive to them); transformations toward linearity will help if an effect is present.

To find the power of a correlation, where d = ρ, we can find the noncentrality parameter δ and, if we are looking to build a study, use that δ to figure out our N:

δ = d·√(N − 1)   and   N = (δ/d)² + 1

Chi-Square Tests

We can use a chi-square as a test of variance if we assume normality of X (in the population). If the sample variance is greater than the null value, then we will use a right-tailed test; the opposite holds true as well. The chi-square test of variance is given by:

χ²_(N−1) = (N − 1)·s² / σ²_null

We can construct confidence intervals:

lower bound = (N − 1)·s² / χ²_(N−1; .025)   and   upper bound = (N − 1)·s² / χ²_(N−1; .975)

Variances have a non-symmetric confidence interval.
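The Fisher r-to-r′ transformation and the z test between two independent correlations can be sketched as follows; the r values and sample sizes are made up for illustration:

```python
import math

# Fisher's r-to-r' transformation: r' = 0.5 * ln((1 + r) / (1 - r))
def r_prime(r):
    return 0.5 * math.log((1 + r) / (1 - r))

r1, n1 = 0.60, 53   # hypothetical sample correlations and Ns
r2, n2 = 0.30, 53

# Standard error of the difference uses 1/(N - 3) for each sample
se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
z = (r_prime(r1) - r_prime(r2)) / se

# Testing a single r against a specific rho (say rho = .5):
z_vs_rho = (r_prime(r1) - r_prime(0.50)) / (1 / math.sqrt(n1 - 3))
```

With these numbers the two-sample z works out to about 1.92, just short of the usual 1.96 cutoff, a good reminder that sizable-looking differences between correlations need substantial Ns.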
We can also use a chi-square as a goodness-of-fit test to compare observed against expected frequencies of a dependent variable as a function of an independent variable. We can know the expected frequencies if we know the frequencies in the population, OR we can expect each cell to divide N evenly. So, if we want to test an expected frequency against an observed one, we would use:

χ² = Σ (O − E)² / E

We can get a standardized result for a single cell (a standardized residual) by getting a z:

z = (O − E) / √E

A positive residual means more than expected, whereas a negative one means less than expected.

Lastly, we can use a chi-square as a test of independence, to see if a response on one variable is associated with a response on another variable. This is essentially the same test as the goodness-of-fit test, but we will be calculating E based on the following formula:

E_ij = (R_i·C_j) / N

where R_i and C_j represent the marginal sums of the row and column pertaining to that particular cell. Summing (O − E)²/E over every cell gives the experiment-wide statistic. For a 2×2 table, the effect size is phi:

φ = √(χ² / N)

For tables of any dimension, we should use Cramér's V, which is given by:

V = √[ χ² / (N·(L − 1)) ]

where L is the lesser of the number of rows or the number of columns (think of it as length).

df for chi-square is always (r − 1)·(c − 1).

When using a 2×2 table with expected frequencies below 5 in a cell, we want to use Fisher's Exact Test.

We can also calculate risk and odds as interpretable measures of success. When determining odds, we use the IV and look at the proportions of the nominal, potential dependent-variable outcomes. For example, given the following table:

            Heart Disease
             Yes    No
  >55         21     6     27
  <55         22    51     73
              43    57    100

We can calculate risk by taking any one cell count over the marginal total for that level of the IV.
So:

risk of heart disease for those over 55 = 21/27 = .78
risk of heart disease for those under 55 = 22/73 = .30

And, in turn, we can calculate the risk difference, which would be:

risk(over 55) − risk(under 55) = .78 − .30 = .48

The relative risk would be (the higher risk always goes in the numerator):

risk(over 55) / risk(under 55) = .78 / .30 = 2.58

We can calculate odds as the successes over failures within a condition:

odds of someone over 55 having heart disease = 21 (over-55 yes) / 6 (over-55 no) = 3.5
odds of someone under 55 having heart disease = 22 (under-55 yes) / 51 (under-55 no) = .43

We can then use these odds to get a relative measure (the odds ratio) of how much greater the odds are for one group relative to another:

odds(over 55) / odds(under 55) = 3.5 / .43 = 8.1
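The whole worked example above, from expected frequencies through the odds ratio, can be sketched from the notes' heart-disease table:

```python
# 2x2 heart-disease table from the notes: rows are age (>55, <55),
# columns are disease (yes, no).
observed = [[21, 6],
            [22, 51]]

row_totals = [sum(row) for row in observed]        # 27, 73
col_totals = [sum(col) for col in zip(*observed)]  # 43, 57
n = sum(row_totals)                                # 100

# Expected frequency for each cell: E_ij = R_i * C_j / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square test of independence: sum of (O - E)^2 / E over all cells
chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))

# Effect size for a 2x2 table: phi = sqrt(chi2 / N)
phi = (chi2 / n) ** 0.5

# Risk, risk difference, and relative risk
risk_over = observed[0][0] / row_totals[0]    # 21/27
risk_under = observed[1][0] / row_totals[1]   # 22/73
risk_difference = risk_over - risk_under
relative_risk = risk_over / risk_under        # higher risk in the numerator

# Odds within each condition, and the odds ratio
odds_over = observed[0][0] / observed[0][1]   # 21/6 = 3.5
odds_under = observed[1][0] / observed[1][1]  # 22/51
odds_ratio = odds_over / odds_under
```

This reproduces the hand calculations: χ² ≈ 18.25 on (2 − 1)(2 − 1) = 1 df, relative risk ≈ 2.58, and an odds ratio of about 8.1.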