N-Way ANOVA

ANOVA
Null hypothesis: $\mu_1 = \mu_2 = \mu_3$
Alternative hypothesis: $\neg(\mu_1 = \mu_2 = \mu_3)$, i.e. at least one group mean differs
We want to assess whether group membership accounts for individual score variance from a larger
population (all the groups). Is the variance across groups the same as the variance within groups? Under
the null, the treatment effect will be no larger than the individual error effect.
Key variables and their meaning:
k = number of groups
N = total number of subjects in the experiment ($n_j$, or n when groups are equal, is the number of subjects in group j)
j = group membership notation
i = within-group subject notation
$Y_{ij}$ is the dependent variable for subject i in group j (the subject’s score).
$\bar{Y}_{.j}$ or $\mu_j$ is the mean of all scores in group j (the group mean).
$\bar{Y}_{..}$ or $\mu$ is the grand mean (the mean of all the scores of all the subjects in all the groups).
$\varepsilon_{ij}$ is $Y_{ij} - \bar{Y}_{.j}$. This is the unique effect and is referred to as error. It is the unexplained part after removing
the effect of the grand mean and treatment.
$\tau_j$ is $\bar{Y}_{.j} - \bar{Y}_{..}$, which is the effect of being in group j. It is the deviation of a group mean from the grand
mean.
Taken in summation, an individual’s score should be made up of the following:
$$Y_{ij} = \mu + (\mu_j - \mu) + (Y_{ij} - \mu_j)$$
This collapses into:
$$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$$
This shows us that an individual score’s deviation from the grand mean is made up of that individual’s group’s
deviation from the grand mean and that individual’s deviation from their own group mean.
Sum of Squares:
The first number we want is the Sum of Squares for scores around the grand mean:
$$SS_{total} = \sum_j \sum_i (Y_{ij} - \bar{Y}_{..})^2$$
Next, we want to get the Sum of Squares for treatment. That is, we want to see how far each group
mean is from the grand mean, weighting each group by its size. This gives us our experiment-wide $\tau$:
$$SS_{treatment/between} = \sum_j n_j (\bar{Y}_{.j} - \bar{Y}_{..})^2$$
Lastly, we want to account for error within a group. Not all subjects are going to be exactly at the group mean,
so we need to find the variation within each group. This gives us our experiment-wide $\varepsilon$:
$$SS_{error/within} = \sum_j \sum_i (Y_{ij} - \bar{Y}_{.j})^2$$
Our SSerror/within is what remains after SSbetween/treatment is subtracted from SStotal.
So,
$$SS_{error/within} = SS_{total} - SS_{between/treatment}$$
Thus, our
$$\frac{SS_{error/within}}{SS_{total}}$$
will give us the proportion of unexplained variance that we have.
Furthermore, our
$$\frac{SS_{between/treatment}}{SS_{total}}$$
will give us the proportion of explained variance.
When treatment is effective, the errors in predicting from the treatment group mean will be much smaller than the
errors in predicting from the grand mean.
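The partition described above can be checked numerically. The following is a minimal sketch using made-up scores for three equal-sized groups; the data and variable names are assumptions for illustration only.

```python
# Hypothetical scores for k = 3 groups (made-up data, equal n per group).
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0],
]


def mean(values):
    return sum(values) / len(values)


all_scores = [y for g in groups for y in g]
grand_mean = mean(all_scores)

# Scores around the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
# Group means around the grand mean, weighted by group size.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Scores around their own group mean.
ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

proportion_explained = ss_between / ss_total
proportion_unexplained = ss_within / ss_total

print(ss_total, ss_between, ss_within)
print(proportion_explained, proportion_unexplained)
```

The two proportions sum to 1 because the between and within sums of squares add up to the total.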
Population Variance Estimates and Mean Squares:
We need to calculate a variance that will encompass the entire test. That is, we need a variance that will
include the variance from all the groups that are included in the test.
The most straightforward way (it doesn’t depend on the truth of the null) to do this, if we have equal n
per group, is to average the variances across groups. We would sum the variances of the groups and
divide by the number of groups. Such a procedure would look like:
$$\sigma_e^2 = \frac{\sum S_j^2}{K}$$
MSerror or MSwithin is this sample variance, pooled across groups into an estimate of the population variance. We need it
to be unbiased, so we use:
$$MS_{error/within} = \frac{SS_{error/within}}{N - K}$$
We can also apply the central limit theorem and assume the null is true. We know from the central limit
theorem that if we are sampling from the same distribution, the standard deviation of the sample means around
their own mean is the standard error, so the variance of the sample means gives a second estimate of the population variance:
$$\sigma_{\bar{Y}}^2 = \frac{\sigma_Y^2}{n}, \quad \sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}} \;\rightarrow\; n\sigma_{\bar{Y}}^2 = \sigma_Y^2$$
Remember that $\tau_j$ is $\bar{Y}_{.j} - \bar{Y}_{..}$, which is the treatment effect for a condition. So, if we want to get the
variance of the treatment effects, we need to sum the squared treatment effects over all the groups and divide by our
degrees of freedom. Thus, our variance of the treatment effect is given by:
$$\sigma_\tau^2 = \frac{\sum \tau_j^2}{K - 1}$$
MStreatment and MSbetween are the same thing and represent the variance of the group means around the grand mean,
weighted by the within-group sample size.
$$MS_{treatment/between} = \frac{SS_{treatment/between}}{K - 1}$$
F Test
The F-Test in ANOVA is just a ratio of the two different ways of estimating the variance of Y in the
population. The F-Test is given by:
$$F(DF_{treat} = K - 1,\; DF_{error} = N - K) = \frac{MS_{treatment}}{MS_{error}}$$
Effect Size:
We can first see the proportion of variance that is explained by our treatment groups, which would be the
proportional reduction in error variance (PRE). It will be positively biased. PRE is given by:
$$\eta^2 = \frac{SS_{treatment/between}}{SS_{total}}$$
For a less positively biased estimator in a fixed-effects design, we can solve for omega squared:
$$\omega^2 = \frac{SS_{treatment/between} - (k - 1)MS_{error}}{SS_{total} + MS_{error}}$$
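Lastly, we could get the root-mean-square standardized effect (RMSSE). Here the numerator is the unweighted sum of
squared deviations of the group means from the grand mean (no $n_j$ weighting):
$$RMSSE = \sqrt{\frac{1}{k - 1} \cdot \frac{\sum (\bar{Y}_{.j} - \bar{Y}_{..})^2}{MS_{within}}}$$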
Power:
Remember that power is our probability of correctly rejecting a false null hypothesis.
We need the standard deviation of the group means, which we get from SStreatment:
$$S_{\bar{Y}} = \sqrt{MS_{treatment/between}}$$
This is because, taking the sum of squares in this section without the $n_j$ weighting,
$SS_{treatment/between} = \sum (\bar{Y}_{.j} - \bar{Y}_{..})^2$ and $MS_{treatment/between} = \frac{SS_{treatment/between}}{K - 1}$, so
$$S_{\bar{Y}} = \sqrt{\frac{\sum (\bar{Y}_{.j} - \bar{Y}_{..})^2}{K - 1}}$$
The effect size f used for power is then:
$$f = \frac{S_{\bar{Y}}}{\sqrt{MS_{within/error}}} = \sqrt{\frac{MS_{treatment/between}}{MS_{within/error}}}$$
By the same logic, non-centrality would appear as:
$$\phi' = \frac{\sigma_\tau}{\sigma_e} \quad\text{and}\quad \phi = \phi'\sqrt{n}$$
One can rearrange this formula to solve for the n that achieves a particular level of power.
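Solving $\phi = \phi'\sqrt{n}$ for n gives $n = (\phi/\phi')^2$. A minimal sketch follows; the target value of phi would normally be read from a power chart for the desired power and degrees of freedom, so the numbers used here are hypothetical.

```python
import math


def n_per_group(phi_target, phi_prime):
    """Solve phi = phi_prime * sqrt(n) for n, rounded up to a whole subject."""
    return math.ceil((phi_target / phi_prime) ** 2)


# Hypothetical inputs: an effect of phi' = 0.5 and a chart value of phi = 2.0.
n_needed = n_per_group(phi_target=2.0, phi_prime=0.5)
print(n_needed)
```

Rounding up rather than to the nearest integer keeps the achieved phi at or above the target.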
Degrees of freedom:
Between groups (SStreatment)= k-1
SStotal is N-1. Upper case N will be the total number of subjects in all of the groups.
Within groups(SSerror) = N-k
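The mean squares, degrees of freedom, F ratio and the two effect sizes above can be pulled together in one small function. This is a sketch on made-up data; the function name `one_way_anova` and the scores are assumptions, and no p-value is computed because that needs an F distribution table or a statistics library.

```python
def mean(values):
    return sum(values) / len(values)


def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table for a list of score lists."""
    all_scores = [y for g in groups for y in g]
    k, n_total = len(groups), len(all_scores)
    grand = mean(all_scores)

    ss_total = sum((y - grand) ** 2 for y in all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = ss_total - ss_between

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "df_between": df_between,
        "df_within": df_within,
        "df_total": n_total - 1,
        "F": ms_between / ms_within,
        "eta_squared": ss_between / ss_total,
        "omega_squared": (ss_between - (k - 1) * ms_within) / (ss_total + ms_within),
    }


# Hypothetical scores for three groups of three subjects.
result = one_way_anova([[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]])
print(result)
```

On the same data omega squared comes out smaller than eta squared, which reflects the bias correction.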
Assumptions:
Homogeneity of variance. We want all the treatment groups to have the same variance on the
dependent variable. We will be pooling variance, so this is essential. ANOVA is robust so long as the largest sample
variance is no more than four times the smallest sample variance and the group ns are
relatively equal. If we violate this homogeneity, we need to use a new F, F’, which would use 1 as the first
degree of freedom (treatment) and n-1 as the second (error). The best option, though, would be to
calculate Welch’s F and evaluate that using K-1 and Welch’s adjusted dferror.
Normality of Errors. Scores on the DV should be normally distributed within groups. Need this for
interpretation of an F test.
Independence of Observations. Each score is individual and not dependent on another person’s.
Violating independence means that your within degrees of freedom are wrong, as well as the estimate
of mean squares within.
Side Notes:
An orthogonalized design is one that has an equal number of subjects per group. Furthermore, each of
the independent variables should be uncorrelated. Otherwise, effects on the DV are redundant and/or
confounded.
A “House” model allows for people to be proportionately placed in their groups. If there are fewer people with
Alzheimer’s (Alz) in the population than people without it, then a “House” model will reflect that, but will
result in unequal ns.
A “Senate” model forces equal ns despite their population distribution.
Type I is a sequential sum of squares.
Type II is a sum of squares for each effect after controlling for the effect of the other main effects but
not the interaction.
Type III is the sum of squares for each effect after controlling for (partialing out) the effect of the other
main effects and the interaction. Effect A = Effect A – (Effect B + Effect Interaction)
Post-Hoc Tests
An F-test will only tell us that treatment has an effect across the different treatments. But, it doesn’t tell
us anything about which particular pairs of means are significantly different. It also doesn’t tell us about
which combinations of means are different from others. Post-Hoc tests answer these questions.
If we want to do a t-test afterwards, we can use the error that we found from the ANOVA. This will be
representative of a population better than just the individual groups that we are comparing against. This
is very similar in concept to pooled-variance.
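We would have normally pooled variance like this:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}} \quad\text{where}\quad S_p^2 = \frac{(DF_1)S_1^2 + (DF_2)S_2^2}{DF_{total}}$$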
However, we can get an even better estimate for our t-test during post-hoc analysis of an ANOVA:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{MS_{error}}{n} + \frac{MS_{error}}{n}}}$$
For finding a critical t, we use the DFerror from our ANOVA.
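As a sketch of the post-hoc t described above, assuming equal n per group, the function below plugs the ANOVA's MSerror into the denominator. The function name and the numbers are hypothetical.

```python
import math


def posthoc_t(mean_1, mean_2, ms_error, n):
    """t for two group means using the ANOVA error term (equal n per group)."""
    standard_error = math.sqrt(ms_error / n + ms_error / n)
    return (mean_1 - mean_2) / standard_error


# Hypothetical group means of 5 and 10, MS_error = 1, n = 3 per group.
t_value = posthoc_t(5.0, 10.0, ms_error=1.0, n=3)
print(t_value)
```

The result is compared against a critical t with the ANOVA's error degrees of freedom (N - k), not the 2n - 2 of a stand-alone two-group test.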
Where a t-test reveals a difference between any 2 groups, we can also see if groups of groups are
different from each other. For example, if we have a control group, a 20mg drug group and a
40mg drug group, we can compare 20mg to 40mg, control to 20mg, or control to 40mg. However,
with a linear contrast, we’d be able to compare control to (20mg and 40mg).
A linear contrast is given by:
$$L = \psi = a_1\bar{X}_1 + a_2\bar{X}_2 + a_3\bar{X}_3 + a_4\bar{X}_4 = \sum a_j\bar{X}_j$$
Each $a_j$ is a weight assigned to a group. The a-weights must sum to 0. If we want to
assess the orthogonality of 2 different contrasts, we multiply the corresponding a-weights of the two contrasts and check that
the sum of those products equals 0.
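The weight rules above can be sketched in a few lines. The group means and the two sets of weights are hypothetical, chosen to mirror the control / 20mg / 40mg example, and equal n per group is assumed.

```python
def contrast_value(weights, means):
    """psi: the weighted sum of group means."""
    return sum(a * m for a, m in zip(weights, means))


def is_valid_contrast(weights, tol=1e-9):
    """A contrast's weights must sum to zero."""
    return abs(sum(weights)) < tol


def are_orthogonal(weights_1, weights_2, tol=1e-9):
    """Two contrasts are orthogonal if the products of their weights sum to zero."""
    return abs(sum(a * b for a, b in zip(weights_1, weights_2))) < tol


means = [5.0, 7.0, 10.0]       # hypothetical means: control, 20mg, 40mg
control_vs_drug = [-2, 1, 1]   # control against the two drug groups combined
low_vs_high = [0, -1, 1]       # 20mg against 40mg

psi = contrast_value(control_vs_drug, means)
print(psi, is_valid_contrast(control_vs_drug), are_orthogonal(control_vs_drug, low_vs_high))
```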
We further this by getting a sum of squares that we can use for an F-test:
$$SS_{contrast} = \frac{n\psi^2}{\sum a_j^2}$$
$$F = \frac{SS_{contrast}}{MS_{error}}$$
Multiple-Comparisons
A single test has a type I error rate. The more tests we do, the more we need to account for the “family”
of potential errors we could be making with each and every test. As the number of comparisons we are
interested in increases, our likelihood of committing at least 1 Type I error also increases. This is known
as the Family Wise Error Rate (FWE) and can be accommodated for by making a new alpha for each
comparison.
When we make a new alpha that is going to be used as the error rate per comparison, we call it $\alpha'$.
We can find the family-wise error rate (FWE), where c is the number of tests, by:
$$\alpha = 1 - (1 - \alpha')^c$$
We can create an $\alpha'$ via the Bonferroni method, which is ultra-conservative, and is given by:
$$\alpha' = \frac{\alpha}{c}$$
If we want to conduct a t-test and simultaneously correct for multiple comparisons, we can use a
Bonferroni-Dunn t’, which is given by:
$$t' = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{\frac{2MS_{error}}{n}}} = \frac{\psi}{\sqrt{\frac{\sum a_j^2\, MS_{error}}{n}}}$$
To get the effect size for this test, we would use:
$$d = \frac{\psi}{\sqrt{MS_{error}}}$$
We could also do a Multistage Bonferroni, where we go after the largest differences first and use the
most critical alpha on that and then, if it is significant, go on to our next test and reduce our
comparisons by 1.
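The family-wise error rate, the Bonferroni per-comparison alpha and the multistage sequence of alphas described above can be sketched as follows. The function names are assumptions, and `multistage_alphas` only lists the alpha used at each step (alpha/c, then alpha/(c-1), and so on); it does not run the tests themselves.

```python
def familywise_error(alpha_per_test, c):
    """Probability of at least one Type I error across c independent tests."""
    return 1 - (1 - alpha_per_test) ** c


def bonferroni_alpha(alpha_family, c):
    """Per-comparison alpha that holds the family-wise rate at alpha_family."""
    return alpha_family / c


def multistage_alphas(alpha_family, c):
    """Alpha for each step: the largest difference is tested first at alpha/c."""
    return [alpha_family / (c - step) for step in range(c)]


uncorrected = familywise_error(0.05, 10)
corrected = familywise_error(bonferroni_alpha(0.05, 10), 10)
print(uncorrected, corrected, multistage_alphas(0.05, 3))
```

With ten tests at .05 each the family-wise rate is roughly .40, while the Bonferroni-adjusted tests keep it just under .05.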
For a Post-Hoc test of all pairwise comparisons, we can get an Honestly Significant Difference (HSD) score. This
is given in (q) and is called the studentized range statistic. We take the larger mean and subtract the
smaller mean from it (order the means by magnitude).
Count the number of steps between the means, counting the means that we are using. The more steps,
the larger the critical value needed. Our first degree of freedom becomes this step count. The second df is
the dferror. This equation is given by:
$$q_{HSD} = \frac{\bar{X}_l - \bar{X}_s}{\sqrt{\frac{MS_{error}}{n}}}$$
N-Way ANOVA (Factorial ANOVA)
The n in an n-way ANOVA refers to the number of independent variables (or factors). The number of
levels in each factor gives us the factorial structure of our design. For example, if gender is one independent
variable and another is which of 3 countries we sampled from, then we’d have a 2x3 design. The order is
irrelevant.
Each individual will now belong to multiple groups. The intersections of these groups are known as
cells, each with its own cell mean. Averaging across cell means in one factor will give marginal means. We will have 1 cell
mean for every factor intersection and 1 marginal mean for every level of every factor. So, for a 2x3, we’d have 6 cell
means and 2 + 3 = 5 marginal means. Each factor is subdivided into levels.
An N-Way ANOVA allows us to check for Interactions. An interaction is the variance that is unexplained
by either factor alone. In other words, it takes BOTH factors to bring about a particular cell mean. In
terms of Sum of Squares, the interaction is computed from the amount of variance explained by group
membership (cell mean) that is not already explained by both factors (both marginal means).
An interaction is non-additive. We shouldn’t be able to predict a cell-mean value from the marginal means alone if it is due to an
interaction. If, say, scores are typically low in a dark room and scores are also low amongst females, but
females in dark rooms do really well, then that is an interaction. Interactions occur when the effects of
one factor differ by level of the other factor.
SSerror/within and SStotal are computed the same way as before, but we need to partition our
SStreatment/between into each factor. So, let’s say we have 5 types of recall conditions and 2 types of age
groups. We find the mean for old age and the mean for young age. With equal ns, the grand mean would be the
average of those two means. Then we can get a sum of squares by squaring the difference of each mean
from the grand mean, multiplying by the number of scores behind that mean, and summing. Do this for both factors
to get a sum of squares for each. To get mean
squares, we just take the SS and divide by the number of levels in that factor minus 1.
In order to see if there is an interaction, we just see what is left over from the variance accounted for by
the cell means after removing the marginal means (our main effects):
$$SS_{interaction(AB)} = SS_{cells} - SS_A - SS_B$$
or
$$SS_{interaction} = SS_{total} - (SS_A + SS_B + SS_{error})$$
Our F-Test will be very similar.
We can skip to calculating the error by:
$$SS_{error} = SS_{total} - SS_{cells}$$
This is the same as subtracting its cell mean from each individual score and summing the
squared differences. But the math works out the same if we just use the above calculation.
And then we can calculate our F statistics by:
$$F_A = \frac{MS_A}{MS_\epsilon} \qquad F_B = \frac{MS_B}{MS_\epsilon} \qquad F_{AB} = \frac{MS_{AB(interaction)}}{MS_\epsilon}$$
If our interaction is significant, we shouldn’t pay attention to Main effects because it’s more of an
artifact and not too interpretable. What’s the sense in saying that men are better than women in a task
if those effects change as a function of ethnicity?
Also, we should follow up with a test for simple effects by choosing a factor to look at the effect at
different levels. This is a glorified way of comparing cell means.
All of this applies only if the factors are fixed effects. If the factors are random effects, then you need to
test the main effects of A and B with the interaction mean square in the denominator. If only one factor
is random, then test the random effect with the mean square error, the fixed effect with the mean
square interaction and the interaction with the mean square error.
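For the fixed-effects case, the partition and F ratios above can be sketched for a balanced 2x2 design. The scores are made up, equal n per cell is assumed, and the nested-list layout `data[level of A][level of B]` is just one convenient choice.

```python
# Hypothetical balanced design: data[i][j] holds the scores in cell (A_i, B_j).
data = [
    [[1.0, 3.0], [4.0, 6.0]],   # A1: cells B1, B2
    [[5.0, 7.0], [2.0, 4.0]],   # A2: cells B1, B2
]


def mean(values):
    return sum(values) / len(values)


a = len(data)         # levels of factor A
b = len(data[0])      # levels of factor B
n = len(data[0][0])   # scores per cell

all_scores = [y for row in data for cell in row for y in cell]
grand = mean(all_scores)

# Marginal means: average over everything at one level of a factor.
a_means = [mean([y for cell in row for y in cell]) for row in data]
b_means = [mean([y for row in data for y in row[j]]) for j in range(b)]

ss_total = sum((y - grand) ** 2 for y in all_scores)
ss_a = n * b * sum((m - grand) ** 2 for m in a_means)
ss_b = n * a * sum((m - grand) ** 2 for m in b_means)
ss_cells = n * sum((mean(cell) - grand) ** 2 for row in data for cell in row)
ss_ab = ss_cells - ss_a - ss_b
ss_error = ss_total - ss_cells

ms_error = ss_error / (a * b * (n - 1))
f_a = (ss_a / (a - 1)) / ms_error
f_b = (ss_b / (b - 1)) / ms_error
f_ab = (ss_ab / ((a - 1) * (b - 1))) / ms_error

print(ss_a, ss_b, ss_ab, ss_error, ss_total)
print(f_a, f_b, f_ab)
```

In this made-up data the marginal means barely differ while the cell means cross over, so nearly all of the between-cell variation lands in the interaction term.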
Effect size for the factorial ANOVA is given as follows:
$$\eta_A^2 = \frac{SS_A}{SS_{Tot}} \qquad \eta_B^2 = \frac{SS_B}{SS_{Tot}} \qquad \eta_{AB}^2 = \frac{SS_{AB}}{SS_{Tot}}$$
Power for a fixed-effect factorial ANOVA is:
$$\phi' = \sqrt{\frac{\sum \tau_j^2}{k\sigma_e^2}} \;\rightarrow\; \phi = \phi'\sqrt{n}$$
In a sample, $\sum \tau_j^2$ is estimated by the unweighted sum of squared deviations of the group means from the
grand mean, and $\sigma_e^2$ is estimated by $MS_{error}$.
If we want to look at a particular effect, where a is the number of levels in factor A, b is the number
of levels in factor B and n is the number of observations in each cell:
$$\phi_\alpha' = \sqrt{\frac{\sum \alpha_j^2}{a\sigma_e^2}} \;\rightarrow\; \phi_\alpha = \phi_\alpha'\sqrt{nb}$$
Orthogonal Designs
First, we need an equal number of individuals in each cell of a factorial ANOVA.
Second, the independent variables need to be uncorrelated. When both hold, the SStotal can be uniquely
partitioned.
If we do not achieve orthogonality, then the effect of one variable may depend on another variable.
Type III SS will address this by partialing out the effect of the other main effect and the interaction so
that:
Effect A = Effect A – (Effect B + Effect Interaction)
Regression and Correlation
The size of a treatment effect is given by:
$$R^2 = \frac{SS_{between}}{SS_{total}}$$
This gives us the proportion of the variance in scores that is due to treatment group variation. Our F test
will tell us if this is significant, because the F test is the ratio of MSbetween/treatment over MSwithin/error.
That’s why we get a higher F if the MSbetween is greater than the MSerror. We want within-group variance
to be at a minimum, so that there is coherence within groups. But we want the groups to vary greatly,
because we want there to be a difference across groups.
Correlation
Correlation is the degree of “linear” relation between two (continuous) variables. Only linear.
Covariance is the average product of the deviation scores and is given by:
$$cov_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$$
where N is the number of pairs of observations.
It is worth noting that the covariance of a variable with itself is the sample variance.
For a correlation, we want to scale covariance by the standard deviations of both X and Y and, thus, our
standardized covariance is given by:
$$r = \frac{cov_{XY}}{S_X S_Y} = \frac{\frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}}{S_X S_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(N)(S_X S_Y)}$$
Technically, since in the numerator we have an individual’s difference from the mean and then dividing
by the standard deviation, we are creating z scores. So, we are looking at the average cross product of Z
scores.
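The equivalence between the standardized covariance and the average cross product of z scores can be checked directly. This sketch uses made-up paired data and divides by N throughout, matching the formulas above.

```python
import math

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
N = len(x)

mean_x = sum(x) / N
mean_y = sum(y) / N

# Covariance: average product of the deviation scores.
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / N

# Standard deviations, also dividing by N.
s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / N)
s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / N)

r = cov_xy / (s_x * s_y)

# Same value from the average cross product of z scores.
z_x = [(xi - mean_x) / s_x for xi in x]
z_y = [(yi - mean_y) / s_y for yi in y]
r_from_z = sum(zx * zy for zx, zy in zip(z_x, z_y)) / N

print(cov_xy, r, r_from_z)
```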
By computing $r^2$, we can say that $r^2 \times 100$ percent of the variance in Y is explained or accounted for by variation in X. We
know things about Y just by knowing X. The remaining error is the part of actual Y that we still get wrong when
predicting Y from X. When r is 0, the conditional mean of Y given X is the same as the overall mean of
Y.
If we want to test for significance of an r (which depends on bivariate normality), then we can use an
approximation formula:
$$t_{N-2} = r\sqrt{\frac{N - 2}{1 - r^2}}$$
where the DF = N-2. The numerator is the correlation weighted by the square root of the DF, so
this value becomes large with either large values of the sample correlation or increasing sample sizes.
Technically, the denominator is the square root of the proportion of error (unexplained) variance.
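A short sketch of this test statistic follows; the correlation and sample size are hypothetical, and the critical t still has to be looked up with N - 2 degrees of freedom.

```python
import math


def t_for_correlation(r, n_pairs):
    """t statistic with n_pairs - 2 degrees of freedom for testing r against zero."""
    return r * math.sqrt((n_pairs - 2) / (1 - r ** 2))


# Hypothetical sample: r = .50 from 27 pairs of observations.
t_value = t_for_correlation(0.5, 27)
print(t_value)
```

Holding r fixed and increasing the number of pairs makes t grow, which is the sample-size effect noted above.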
If we want to test a hypothesis other than $\rho = 0$ (or compare correlations), we need to transform r into r’ first, which is done by:
$$r' = (.50)\ln\left|\frac{1 + r}{1 - r}\right|$$
Our standard error for this test will be:
$$S_{r'} = \frac{1}{\sqrt{N - 3}}$$
After this conversion we can test between two correlations:
$$Z = \frac{r'_1 - r'_2}{\sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}}$$
Doing so, we can get a confidence interval:
$$CI = (r'_1 - r'_2) \pm Z_{CV}\sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}$$
We can then transform this confidence interval back into regular r.
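The transformation and the two-correlation test can be sketched as below. The correlations and sample sizes are hypothetical, 1.96 is assumed as the two-tailed critical value for alpha = .05, and the interval is reported in r’ units; `math.tanh` is the inverse of the transformation for a single r’.

```python
import math


def fisher_r_prime(r):
    """Transform r to r' as 0.5 * ln|(1 + r) / (1 - r)|."""
    return 0.5 * math.log(abs((1 + r) / (1 - r)))


def compare_correlations(r1, n1, r2, n2, z_crit=1.96):
    """Z test and confidence interval (in r' units) for two independent correlations."""
    diff = fisher_r_prime(r1) - fisher_r_prime(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = diff / se
    interval = (diff - z_crit * se, diff + z_crit * se)
    return z, interval


# Hypothetical samples: r = .60 and r = .30, each from 28 pairs.
z, (ci_low, ci_high) = compare_correlations(0.6, 28, 0.3, 28)

# Back-transforming a single r' recovers the original r.
r_back = math.tanh(fisher_r_prime(0.6))

print(z, ci_low, ci_high, r_back)
```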
If we have a particular hypothesis (say that rho is .5) and we want to see if our correlation is significantly
different from that, then we can use:
$$Z = \frac{r' - \rho'}{\frac{1}{\sqrt{N - 3}}}$$
where $\rho'$ is the hypothesized value after the same transformation.
Effects on correlation values:
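Range restrictions will decrease r. Mixed populations will increase it. Outliers will increase or decrease it
(r is sensitive to them). Transformations that make the relation linear will help if an effect is present (a purely
linear rescaling of X or Y leaves r unchanged).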
To find the power of a correlation, where the effect size d = $\rho$, we can find the non-centrality parameter and, if we are
looking to build a study, we can use that ncp to figure out our N:
$$\delta = \rho\sqrt{N - 1} \quad\text{and}\quad N = \left(\frac{\delta}{\rho}\right)^2 + 1$$
Chi-Square Tests
We can use a Chi-Square as a test of variance if we assume normality in X (in the population). If
the alternative is that the variance is greater than the null value, then we will be using a right-tailed test; if it is
smaller, a left-tailed test. The Chi-square test of variance is given by:
$$\chi_{N-1}^2 = \frac{(N - 1)S^2}{\sigma_{null}^2}$$
We can construct confidence intervals. Writing $\chi^2(N - 1;\, p)$ for the value with lower-tail probability p, the upper limit is:
$$S_{+}^2 = \frac{(N - 1)S^2}{\chi^2(N - 1;\, 0.025)}$$
We would do the same for the lower limit, $S_{-}^2$, using .975.
Variances have a non-symmetric confidence interval.
We can also use a chi-square as a Goodness of Fit test to compare observed against
expected frequencies of a dependent variable as a function of independent variables. We can
know the expected frequencies if we know the frequencies in the population OR if we expect the N to be evenly
divided among the cells. So, if we want to test expected frequencies against observed ones, we would
use:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
We can get a standardized result for a single cell (the standardized residual) by getting a z:
$$z = \frac{O - E}{\sqrt{E}}$$
Positive will mean more than expected, whereas negative will be less than expected.
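A goodness-of-fit test against evenly divided expected frequencies can be sketched as follows; the observed counts are made up for illustration.

```python
import math

# Hypothetical observed counts in three categories.
observed = [30, 50, 20]
N = sum(observed)

# Expected frequencies when N is divided evenly among the cells.
expected = [N / len(observed)] * len(observed)

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# One standardized residual per cell; the sign shows over- or under-representation.
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

print(chi_square, df, residuals)
```

The squared standardized residuals add up to the chi-square statistic, so they show which cells drive a significant result.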
Lastly, we can use a Chi-Square as a test of independence to see if a response on one variable is
associated with a response on another variable. This is essentially the same test as the
goodness of fit test, but we will be calculating E based on the following formula:
$$E_{ij} = \frac{R_{i.}\, C_{.j}}{N}$$
where R and C represent the marginal sums of the Row and Column pertaining to that particular cell.
You can calculate the sum of the $\chi^2$ contributions for every cell and convert it into an experiment-wide measure of
association. This formula is what we would use for a 2x2 table:
$$\phi = \sqrt{\frac{\sum \chi^2}{N}}$$
For tables of any dimension, we should use Cramer’s V, which is given by:
$$V = \sqrt{\frac{\sum \chi^2}{N(L - 1)}}$$
where L is the lesser of the number of rows or the number of columns (think of it as length).
DF for the Chi-Square test of independence is always (r-1)*(c-1); for a goodness of fit test with k categories it is k-1.
When using a 2x2 table with an expected frequency of less than 5 in any cell, we want to use Fisher’s Exact Test.
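The test of independence, phi and Cramer’s V can be sketched on a 2x2 table. The counts below are the heart-disease frequencies used in the risk and odds example in these notes; the variable names are assumptions.

```python
import math

# Rows: >55 and <55; columns: heart disease yes / no.
observed = [[21, 6], [22, 51]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)

# Expected count for each cell: row total times column total over N.
expected = [[r * c / N for c in col_totals] for r in row_totals]

chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

phi = math.sqrt(chi_square / N)
L = min(len(row_totals), len(col_totals))
cramers_v = math.sqrt(chi_square / (N * (L - 1)))

print(expected, chi_square, df, phi, cramers_v)
```

For a 2x2 table L - 1 equals 1, so Cramer’s V reduces to phi.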
We can also calculate Risk and Odds as a measure of interpretable success. When determining risk and odds, we
take a level of the IV and look at how the nominal outcomes of the Dependent Variable are distributed within it. For example, given the
following table:
              Heart Disease
Age       Yes     No      Total
>55       21      6       27
<55       22      51      73
Total     43      57      100
We can calculate Risk by taking any one cell count over the marginal total for that level of the IV. So,
$$\text{risk of heart disease for men over 55} = \frac{21}{27} = .78$$
$$\text{risk of heart disease for men under 55} = \frac{22}{73} = .30$$
And, in turn, we can calculate the risk difference, which would be:
$$\text{risk of heart disease for men over 55} - \text{risk of heart disease under 55} = .78 - .30 = .48$$
The relative risk would be (higher risk always goes in the numerator):
$$\frac{\text{risk of heart disease for men over 55}}{\text{risk of heart disease under 55}} = \frac{21/27}{22/73} = 2.58$$
We can calculate odds as the successes over failures within a condition.
$$\text{The odds of men over 55 having heart disease} = \frac{21 \text{ (number of over 55 yes)}}{6 \text{ (number no)}} = 3.5$$
$$\text{The odds of men under 55 having heart disease} = \frac{22 \text{ (number of under 55 yes)}}{51 \text{ (number no)}} = .43$$
We can then use these odds to get a relative measure of how much greater the odds are for one group
relative to another:
$$\frac{\text{The odds of men over 55 having heart disease}}{\text{The odds of men under 55 having heart disease}} = \frac{3.5}{.43} = 8.1$$
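The risk and odds calculations for the heart-disease table can be reproduced at full precision with a few helper functions. The dictionary layout and function names are assumptions for illustration; the counts come from the table above.

```python
# Counts from the heart-disease table: outcome by age group.
table = {
    "over_55": {"yes": 21, "no": 6},
    "under_55": {"yes": 22, "no": 51},
}


def risk(row):
    """Cell count over the marginal total for that level of the IV."""
    return row["yes"] / (row["yes"] + row["no"])


def odds(row):
    """Successes over failures within a condition."""
    return row["yes"] / row["no"]


risk_over = risk(table["over_55"])
risk_under = risk(table["under_55"])
risk_difference = risk_over - risk_under
relative_risk = risk_over / risk_under   # higher risk in the numerator

odds_over = odds(table["over_55"])
odds_under = odds(table["under_55"])
odds_ratio = odds_over / odds_under

print(round(risk_over, 2), round(risk_under, 2), round(risk_difference, 2))
print(round(relative_risk, 2), round(odds_over, 2), round(odds_under, 2), round(odds_ratio, 2))
```

Working from the unrounded risks and odds avoids the small drift that comes from dividing already-rounded values.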