Let’s revisit the t-test and then add Analysis of Variance.

T-Test
• Two-sample t-test: comparing two sample means.

The t statistic is the ratio of signal (the difference between the two sample means) to error (the standard error of the mean differences):

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S^2_{X_1}}{n_1} + \dfrac{S^2_{X_2}}{n_2}}}$$

It is evident from the formula that the smaller the variability, the larger the t value.

Formulas of variation
• Variance: $S^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$
• Standard deviation: $SD = \sqrt{S^2}$
• Standard error of the mean: $SEM = \dfrac{SD}{\sqrt{n}}$

Let’s take an output from a t-test analysis (example from the PASW tutorial). The dependent variable is weight.

Independent Samples Test: Levene's Test for Equality of Variances
Levene's test determines whether the variance in one group differs from the variance in the other. Equality of variances is an important assumption of the t-test.

                                   F       Sig.
  Equal variances assumed        1.138     .300

Independent Samples Test: t-test for Equality of Means

                                   t        df       Sig. (2-tailed)
  Equal variances assumed       -8.462     18             .000
  Equal variances not assumed   -8.462     17.648         .000

                                Mean Difference    Std. Error Difference
  Equal variances assumed          -32.50000             3.84086
  Equal variances not assumed      -32.50000             3.84086

The results are significant. Sig. (2-tailed) is the p value, the probability of committing a Type 1 error.

                                95% Confidence Interval of the Difference
                                   Lower         Upper
  Equal variances assumed       -40.56935     -24.43065
  Equal variances not assumed   -40.58090     -24.41910

Confidence intervals

Confidence interval: definition
• In statistics, a confidence interval (CI) is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval (i.e., it is calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest if the experiment is repeated.
• http://en.wikipedia.org/wiki/Confidence_interval

Confidence intervals
• Confidence intervals can be computed for a variety of statistics: means, t statistics, etc.
• For the mean difference in the t-test output, the confidence interval encompasses 95% of all expected mean differences given the error estimated from our data.
– Thus, for our example, we expect the mean difference to fall between (and include) -40.57 and -24.43 95% of the time.

CI cont’d
• Zero is not within that CI, so we reject the null hypothesis; indeed, the p value obtained is less than .05.
– Had zero fallen inside the CI, we would not reject the null hypothesis, and the p value would have been greater than .05.

The value of CI
• In most experimental work, investigators simply report the inferential statistic, the p value, and sometimes power.
• In many clinical papers, the CI is reported, as clinicians feel that the range of possible values is more informative.

CI cont’d
• If we know the population values of a distribution, we use the Z statistic for the number of SDs away from the mean; the exact values would be ±1.96 SD.
• When we don’t have the population, we use a t statistic for the number of SDs away from the mean, which varies depending on the sample size (see the example in the next few slides).

CI cont’d
• Any value within the CI could be considered a common value, and many physicians would regard such values as normal. However, that would have to be determined against many other measures where a pattern could be obtained.
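To make the CI arithmetic concrete, here is a minimal sketch in Python (assuming scipy is available) that reproduces the equal-variances CI from the PASW output above; the inputs are the mean difference, its standard error, and the degrees of freedom read off the table.

```python
# Minimal sketch: 95% CI for a mean difference, using the values from
# the PASW t-test output above (equal variances assumed).
from scipy import stats

mean_diff = -32.5    # mean difference between the two groups
se_diff = 3.84086    # standard error of the mean difference
df = 18              # degrees of freedom (n1 + n2 - 2)

# Critical t for a two-tailed 95% CI: cuts off 2.5% in each tail.
t_crit = stats.t.ppf(0.975, df)       # ~2.101 for df = 18

lower = mean_diff - t_crit * se_diff  # ~ -40.57
upper = mean_diff + t_crit * se_diff  # ~ -24.43
print(f"95% CI: ({lower:.5f}, {upper:.5f})")
```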
CI example
• Scores: 100, 115, 125, 111, 123, 198
• n = 6, df = n - 1 = 5
• Mean = 128.67
• $SS = \sum (X - \bar{X})^2 = 6173.3$
• Variance: $S^2 = \dfrac{SS}{n - 1} = \dfrac{6173.3}{5} = 1234.67$
• $SD = \sqrt{1234.67} = 35.14$
• $SEM = \dfrac{SD}{\sqrt{n}} = \dfrac{35.14}{\sqrt{6}} = 14.34$

CI cont’d
• For 5 degrees of freedom, the critical t is 2.571 (taken from the t tables).
• Distance from the mean = 2.571 × 14.34 ≈ ±36.88
• CI = 128.67 ± 36.88 = 91.79 to 165.55

Tests of normality
• As we discussed before, one of the assumptions of these statistics is that the samples come from normally distributed populations.
• We can test whether or not the samples come from normally distributed populations.
• The tests are:
– the Shapiro-Wilk test, designed for samples of fewer than 50 but able to handle larger sample sizes;
– the Kolmogorov-Smirnov test, which is quite suitable for large sample sizes.

Example of output from SPSS

Tests of Normality
                 Kolmogorov-Smirnov(a)         Shapiro-Wilk
                 Statistic   df   Sig.     Statistic   df   Sig.
  TestVariable     .375       6   .008       .741       6   .016
  a. Lilliefors Significance Correction

We can see here that the data are not normally distributed: both tests are significant. [The accompanying histogram was clearly not normal.]

What to do when data are not normal
1. Transform the data using a formula suited to the shape of the data:
– square root
– inverse cube
– log base 10
– natural log (ln)
– etc.
2. Use nonparametric statistics, which are insensitive to violations, including shape.

Nonparametric tests
• Since we have been discussing the t-test, we will offer an alternative to it.
• There are two:
– the Mann-Whitney U test
– the Wilcoxon rank-sum test
• Both provide identical results. The story is that both were developed independently at the same time.
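Here is a minimal sketch of these tests in Python (assuming scipy); the six scores from the CI example serve as stand-in data, and the second group for the Mann-Whitney comparison is made up purely for illustration.

```python
# Minimal sketch: normality check, then the nonparametric alternative
# to the independent-samples t-test.
import numpy as np
from scipy import stats

x = np.array([100, 115, 125, 111, 123, 198])  # scores from the CI example

# Shapiro-Wilk: preferred for small samples. A small p value suggests
# the sample did not come from a normal population.
w_stat, w_p = stats.shapiro(x)
print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {w_p:.3f}")

# If normality fails, compare two independent groups with the
# Mann-Whitney U test instead of the t-test (second group is made up).
y = np.array([95, 102, 88, 110, 99, 104])
u_stat, u_p = stats.mannwhitneyu(x, y, alternative="two-sided")
print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {u_p:.3f}")
```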
Analysis of Variance (ANOVA)

One-Way Analysis of Variance (aka Single-Factor Analysis of Variance)
1) When is a one-way analysis of variance used?
2) Sources of variation: generally from treatment and from individual differences.
3) An example of a one-way analysis of variance.
4) Assumptions underlying the F distribution.

When would you use a one-way analysis of variance?

Example 1:
• What if you were interested in investigating the efficacy of 3 types of medication as headache remedies?
• You would need to consider:
– IV: medication type. Subjects would be randomly allocated to one of three levels: 1) Tylenol, 2) Bayer, or 3) Advil.
– DV: elapsed time (in minutes) from ingesting the medication to reporting disappearance of the headache.

Notes:
1. Analysis of variance is mostly used when you have more than 2 means.
2. F = t² when you have only two groups.

Example 2:
• What if we wanted to know whether or not the household income of adults differs depending on political affiliation?
• In this case we have 5 groups, representing the political parties: Liberal, NDP, PC, Reform, and Bloc. (Note: this was before the PC, Reform, and Canadian Alliance merged.)
• IV (grouping variable): preferred party, with 5 levels.
• DV (the variable whose values will be influenced by the IV): household income.

Conceptual basis of analysis of variance
• We want to explain why people differ from each other.
– Is it because of your treatment variable (independent variable)?
– Or is it just random variation (error)?
• That is, we want to track down the sources of variation.

E.g., let's investigate how often UWO students go home during one semester. Here is a random sample of 12 students and the number of times they go home in a semester:

8, 4, 6, 1, 7, 5, 2, 7, 4, 3, 7, 4

Now we allocate subjects by the distance they have to travel if they wish to visit the homestead:

  < 2 hours drive:    8, 7, 7, 5
  2 to 4 hours drive: 6, 7, 4, 4
  > 4 hours drive:    3, 1, 2, 4

From the one-way analysis of variance we will be able to identify two sources of variance:
1) distance from home to UWO (treatment or categorization);
2) residual variation that could be due to lots of things, i.e., the variation that cannot be explained by your IV, or error.

This is exactly what happens in an Analysis of Variance
• Variation is broken down into 2 components:
1. variation due to differences between groups;
2. variation due to differences within groups.
• The analysis measures whether the between-groups variance is larger than would be expected by chance by comparing it to the variance within groups.

Let's expand on a previous example. Data copied from an Excel worksheet, representing household income in thousands of dollars:

           Bloc        PC          Liberal     NDP         Reform
           12          23          34          45          56
           14          21          34          46          57
           15          24          35          45          58
           11          22          36          44          59
           17          24          37          41          60
           16          25          38          42          68
  Means    14.16667    23.16667    35.66667    43.83333    59.66667

Grand mean = 35.3. Is the variation between means large compared to individual differences?

Do you remember the formula for variance?

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \text{sample variance}$$

The analysis of variance (F test) essentially uses the same conceptual format:

$$F = \frac{\sum_j n_j (\bar{x}_j - \bar{x}_{..})^2 \,/\, (J - 1)}{\sum_j \sum_i (x_{ij} - \bar{x}_j)^2 \,/\, (N - J)}$$

• Numerator: between-group (treatment) variance, where J = number of groups.
• Denominator: within-group (individual subject) variance, where N = total sample size.

Remember that the F test (ratio) is a statistic used to compare the size of the variance from one source against another. For us, it compares the between-group variance against the individual-subject variance.

Assumptions associated with the F distribution
1. Observations come from normally distributed populations.
2. Observations represent random samples from populations.
3. Population variances are equal.
4. The numerator and denominator of the F ratio are independent.
– They would be dependent if a score or subject in one condition were contingent on having some score or subject in another condition; e.g., scores are dependent when a subject in one condition scoring high means that a subject in another condition must score low.

How would you construct an F distribution?
1. Determine the number of levels and the number of subjects per level.
2. From a sample distribution, randomly sample with replacement.
3. With each sampling, calculate the F statistic.
4. Plot as many calculated Fs as possible to obtain a sampling distribution of Fs.
5. We can now determine the point beyond which an F will be observed less than 5% of the time if sampling from the same population.
• This is called the critical F.
• The critical F changes depending on the number of levels and the number of subjects per level.

F-Distribution
[Figure: determination of a critical F from a probability density function. The critical F depends on the number of levels and the number of subjects used in each sample.]
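To see the F formula in action, here is a minimal sketch in Python (assuming numpy and scipy) that computes F for the party-income data above, first from the definitional sums of squares and then with scipy.stats.f_oneway.

```python
# Minimal sketch: one-way ANOVA on the income data (dollars in thousands),
# by the definitional formula and via scipy.
import numpy as np
from scipy import stats

groups = {
    "Bloc":    [12, 14, 15, 11, 17, 16],
    "PC":      [23, 21, 24, 22, 24, 25],
    "Liberal": [34, 34, 35, 36, 37, 38],
    "NDP":     [45, 46, 45, 44, 41, 42],
    "Reform":  [56, 57, 58, 59, 60, 68],
}
data = [np.array(g) for g in groups.values()]

J = len(data)                           # number of groups
N = sum(len(g) for g in data)           # total sample size
grand_mean = np.concatenate(data).mean()

# Between-group (treatment) variance: J - 1 degrees of freedom.
ms_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in data) / (J - 1)

# Within-group (individual subject) variance: N - J degrees of freedom.
ms_within = sum(((g - g.mean()) ** 2).sum() for g in data) / (N - J)

print(f"by hand: F = {ms_between / ms_within:.2f}")

# scipy computes the same ratio directly.
f_stat, p_val = stats.f_oneway(*data)
print(f"scipy:   F = {f_stat:.2f}, p = {p_val:.4g}")
```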
One-way analysis of variance: Example Problem

A researcher was interested in whether or not various cholesterol-reducing drugs called statins actually result in a decrease of blood serum low-density lipids (LDL). These drugs work by inhibiting HMG-CoA reductase, a rate-controlling enzyme in the production of cholesterol. Male subjects with higher-than-recommended cholesterol levels (>160 mg/dL) were randomly assigned to one of four levels of the IV, called “LDL Reducing Drugs”:
1. Atorvastatin
2. Fluvastatin
3. Simvastatin
4. Regular treatment not consisting of a statin.

The DV is the LDL amount in blood, in mg/dL. Three weeks after being prescribed the compound, all subjects were asked to visit the research clinic and have their LDL levels measured.

Hypotheses (µ refers to the mean of the population)
H0: µA = µF = µS = µR (null)
H1: not all means are equal (alternate)

Note: You may have noticed that the alternate hypothesis simply states that not all means are equal. The analysis we conduct here determines only whether there are means which are not equal (this is an omnibus test); it will not specify which means differ from one another. Following the ANOVA you will have to conduct post-hoc analyses, which we will study later in the lecture.

The data
IV (Statin) in column 1: 1 = Atorvastatin, 2 = Fluvastatin, 3 = Simvastatin, 4 = regular treatment.
DV (LDL) in column 2: measurements in mg/dL.

  Statin  LDL
  1       110
  1       103
  1        90
  1        94
  1       101
  2       120
  2       115
  2       113
  2       105
  2       114
  3       100
  3       101
  3       110
  3       106
  3       104
  4       150
  4       144
  4       129
  4       133
  4       130

Results

$$F = \frac{\sum_j n_j (\bar{x}_j - \bar{x}_{..})^2 \,/\, (J - 1)}{\sum_j \sum_i (x_{ij} - \bar{x}_j)^2 \,/\, (N - J)} = \frac{SS_{\text{between}} / df_{BG}}{SS_{\text{within}} / df_{WG}} = \frac{\text{mean square between groups}}{\text{mean square within groups}}$$

$$F = \frac{4206.8 / 3}{774.00 / 16} = \frac{1402.267}{48.375} = 28.987$$

Fα=.05(3, 16) = 3.24. Since the obtained value is larger than the critical value, we can reject the null hypothesis that all samples come from the same population. Hence, a significant treatment effect is observed and we can state that statins have an effect.

[F tables: critical values indexed by df for treatment (numerator) and df for error (denominator). How to use the tables: http://www.statsoft.com/textbook/distribution-tables/]

Results from SPSS
[SPSS ANOVA output: the treatment and error rows match our hand calculations.]

Testing the assumptions
1) Normal distribution: use the Shapiro-Wilk test of normality.
2) Random sampling: make sure that you sample randomly; we will have to take your word for it.
3) Equal variances: tests of homogeneity of variances can be used (e.g., Levene's test).
4) Numerator and denominator are independent: if samples are random, we can assume this is true.

Failures to meet the assumptions
1) The F distribution is not terribly affected by small departures from normality. You can transform the data if you expect a large departure.
2) Not randomly sampling the population can be problematic. This can be the case if you hand-pick samples; conclusions then don't generalize to the population.
3) Unequal variances can be a problem if they are extremely different or if sample sizes are unequal. You can transform the data or use a nonparametric test.
4) Don't let subjects' scores be dependent on one another.
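As a cross-check, here is a minimal sketch in Python (assuming scipy) that runs the assumption tests named above and the omnibus ANOVA on the statin data; the F ratio should reproduce the hand calculation (F = 28.987 on 3 and 16 df).

```python
# Minimal sketch: assumption checks and omnibus ANOVA for the statin data.
from scipy import stats

atorvastatin = [110, 103, 90, 94, 101]
fluvastatin  = [120, 115, 113, 105, 114]
simvastatin  = [100, 101, 110, 106, 104]
regular      = [150, 144, 129, 133, 130]
groups = [atorvastatin, fluvastatin, simvastatin, regular]

# Shapiro-Wilk per group (normality) and Levene's test (equal variances).
names = ["Atorvastatin", "Fluvastatin", "Simvastatin", "Regular"]
for name, g in zip(names, groups):
    w, p = stats.shapiro(g)
    print(f"Shapiro-Wilk {name}: W = {w:.3f}, p = {p:.3f}")
lev_stat, lev_p = stats.levene(*groups)
print(f"Levene: W = {lev_stat:.3f}, p = {lev_p:.3f}")

# Omnibus one-way ANOVA.
f_stat, p_val = stats.f_oneway(*groups)
print(f"F(3, 16) = {f_stat:.3f}, p = {p_val:.6f}")
```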
Comparing Groups
• The analysis of variance does not determine specific group differences.
• We could use the t-test, but we would end up with an unacceptable familywise error (FW).
– FW is the accumulation of Type 1 errors committed with every comparison.
• Three comparisons using the t-test would give an FW of approximately 0.15 (3 × .05; exactly 1 − .95³ = .143), meaning there is roughly a 15% chance that at least one comparison shows a significant difference between means due to chance alone.
– We can correct for this with a Bonferroni correction (BC):
• BC = per-comparison alpha (PCα) / number of comparisons.
• This value becomes the new PCα.

Comparing Groups cont’d
• The Bonferroni correction is somewhat conservative.
– Type 2 errors are possible.
• It is recommended to use Tukey's Honestly Significant Difference (HSD) test instead.
– This test is considered a good compromise between Type 1 and Type 2 errors.

Tukey's HSD (Honestly Significant Difference)
1) Used to test for a significant difference between each pair of means.
2) A post-hoc test: you didn't plan to do that specific test ahead of time; you're reacting to a significant result after you found it. It controls the Type 1 error rate (α) across the whole set of tests (the familywise α).
3) Only used if:
(a) the ANOVA is significant;
(b) the main effect has more than two groups.
Then calculate q, where n = number of subjects per group and MS_error = the within-groups mean square from the ANOVA table:

$$q = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{MS_{\text{error}} / n}}$$

Our statin example
• q critical = 4.05 when you have 4 groups and 16 df for error.
• MS_error from the original analysis = 48.375.
• n = 5.
• Let's compare the Atorvastatin group (mean 99.6) to the control group (mean 137.2):

$$q = \frac{|99.6 - 137.2|}{\sqrt{48.375 / 5}} = \frac{37.6}{3.11} = 12.09$$

Since 12.09 exceeds the critical value of 4.05, these two groups are significantly different from one another. Notice that I'm not concerned about direction; it's the magnitude that matters here.

[Tables: percentage points of the studentized range, used to find the critical q.]

Post Hoc Tests
[SPSS output: examples of the Tukey and Bonferroni tests using data from our fictitious study.]

Homogeneous Subsets
[SPSS output: this simply shows aggregates or subsets of groups that are not different from one another.]
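Here is a minimal sketch of the same comparison in Python; the hand calculation uses only the numbers above, while the pairwise call assumes scipy >= 1.8, which provides stats.tukey_hsd.

```python
# Minimal sketch: Tukey's HSD for the statin data, by hand and via scipy.
import math
from scipy import stats

atorvastatin = [110, 103, 90, 94, 101]    # mean 99.6
fluvastatin  = [120, 115, 113, 105, 114]
simvastatin  = [100, 101, 110, 106, 104]
regular      = [150, 144, 129, 133, 130]  # mean 137.2

ms_error = 48.375   # within-groups mean square from the ANOVA
n = 5               # subjects per group

# Hand calculation of q for Atorvastatin vs. regular treatment.
q = abs(99.6 - 137.2) / math.sqrt(ms_error / n)
print(f"q = {q:.2f}")   # ~12.09, well above the critical q of 4.05

# scipy runs every pairwise comparison with familywise alpha control.
print(stats.tukey_hsd(atorvastatin, fluvastatin, simvastatin, regular))
```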