Lecture 6: Hypothesis Tests Applied to Means I

Dog Colors (agreement between two judges)

Observed counts:

              Judge 2
Judge 1    Green   Red   Blue   Total
Green         10     1      3      14
Red            2     5      2       9
Blue           0     1      9      10
Total         12     7     14      33

Sum (Agree) = 10 + 5 + 9 = 24, so % Agree = 24/33 = .727.

Expected counts for the agreement (diagonal) cells, computed as (row total x column total) / N:
Green: (14)(12)/33 = 5.091; Red: (9)(7)/33 = 1.909; Blue: (10)(14)/33 = 4.242.
Sum (Expected) = 11.242.

Cohen's kappa corrects the observed agreement for the agreement expected by chance:
k = (24 - 11.242) / (33 - 11.242) = .586.
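As a quick check of these numbers, here is a minimal sketch in base R that reproduces the agreement statistics from the table above:

```r
# Observed agreement table (rows = Judge 1, columns = Judge 2)
obs <- matrix(c(10, 1, 3,
                2, 5, 2,
                0, 1, 9),
              nrow = 3, byrow = TRUE,
              dimnames = list(Judge1 = c("Green", "Red", "Blue"),
                              Judge2 = c("Green", "Red", "Blue")))

n <- sum(obs)                                   # 33 ratings in total
expected <- rowSums(obs) %o% colSums(obs) / n   # chance-expected counts
p_agree  <- sum(diag(obs)) / n                  # 24/33 = .727
p_chance <- sum(diag(expected)) / n             # 11.242/33 = .341
kappa <- (p_agree - p_chance) / (1 - p_chance)  # .586
```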
Hypothesis tests applied to the means

Recall what you learned about sampling distributions in Chapter 4:

Sampling distribution: the distribution of the values of a particular statistic over a very large number of repeated samplings of equal sample size, each taken from the same population.

Sample statistics: describe characteristics of a sample.

Standard error: the standard deviation of a sampling distribution.

Example (SPSS Descriptive Statistics):

                               N     Mean      Std. Error   Std. Deviation   Variance
READING STANDARDIZED SCORE    271   51.82690    .575315       9.470882        89.698
Valid N (listwise)            271

Test statistics: describe differences or similarities between samples and allow us to make inferences about their respective populations.

*As an observed statistic's value falls farther and farther from the center of this distribution, you would be less and less likely to believe that the sample could have come from the hypothetical distribution that this sampling distribution represents. This constitutes the conceptual framework for hypothesis testing.

Recall the steps in the hypothesis-testing process:

1. Generate a research hypothesis (a theory-based prediction).
2. State a null hypothesis (H0), one that, based on our theory, we believe to be incorrect. That is, pretend that the data were drawn from a population with known and uninteresting characteristics. The alternative hypothesis (HA) is the logical converse of the null hypothesis.
3. Obtain the sampling distribution of the statistic assuming that the null hypothesis is true.
4. Gather data.
5. Calculate the probability of obtaining a statistic as extreme as or more extreme than the one observed, based on the sampling distribution.
6. Decide whether the observed probability is too remote to support the null hypothesis. If it is, reject the null and support your theory.
7. Substantively interpret your results.

Also recall that the decision can have several potential outcomes:

                                  Truth
Decision         H0 True                     H0 False
Reject H0        Type I error (α)            Power (1 − β)
Retain H0        Correct decision (1 − α)    Type II error (β)

And recall that a p-value indicates the probability of obtaining the observed statistic value, or one more extreme, assuming that the null hypothesis is true (as opposed to alpha (α), which dictates the size of the rejection region based on the researcher's judgment).

Sampling Distribution of the Mean

One of the most interesting sampling distributions is the sampling distribution of the mean: the distribution of sample means created by repeatedly randomly sampling a population and creating equal-sized samples. The characteristics of this distribution are summarized in the central limit theorem:

Given a population with mean $\mu_X$ and variance $\sigma_X^2$, the sampling distribution of the mean (the distribution of sample means) will have a mean equal to $\mu_X$ (i.e., $\mu_{\bar{X}} = \mu_X$), a variance $\sigma_{\bar{X}}^2 = \sigma_X^2 / N$, and a standard deviation $\sigma_{\bar{X}} = \sigma_X / \sqrt{N}$. The distribution will approach the normal distribution as N, the sample size, increases.

In English: Suppose you have a population, and you know the mean ($\mu_X$) and variance ($\sigma_X^2$) of that population (recall that we almost never know these parameters). Now suppose that you collect a very large number of random samples from that population, each of size N, and compute the means of those samples. You now have a distribution of sample means: the sampling distribution of the mean. Note that you'd have a slightly different sampling distribution if you selected a different N.

The mean of the sampling distribution of the mean ($\mu_{\bar{X}}$) equals the parameter that you are estimating ($\mu_X$). In addition, the standard deviation of the sampling distribution of the mean ($\sigma_{\bar{X}}$, a.k.a. the standard error of the mean) equals the population standard deviation divided by the square root of the sample size ($\sigma_{\bar{X}} = \sigma_X / \sqrt{N}$). Finally, the sampling distribution will be approximately normally distributed when the sample size is large.

In R: To demonstrate the concepts of the central limit theorem, let's take some random draws from a normal population with different sample sizes.
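A minimal sketch of such a demonstration; the population parameters (mu = 50, sigma = 10) and the sample sizes are illustrative:

```r
set.seed(42)                       # reproducible draws

mu <- 50; sigma <- 10              # illustrative population parameters

# Draw many samples of size N from the population and keep each sample mean
sample_means <- function(N, reps = 10000) {
  replicate(reps, mean(rnorm(N, mean = mu, sd = sigma)))
}

for (N in c(5, 25, 100)) {
  m <- sample_means(N)
  # Mean of the sample means stays near mu; their SD tracks sigma / sqrt(N)
  cat(sprintf("N = %3d: mean = %6.3f, SD = %5.3f, sigma/sqrt(N) = %5.3f\n",
              N, mean(m), sd(m), sigma / sqrt(N)))
}

# hist(sample_means(25)) shows the roughly normal shape of the distribution
```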
Think about the following questions:

• What is the value of the mean of a sample when N = the entire population?
• What is the shape of the sampling distribution of the mean when N = the entire population?
• What is the standard deviation of the sampling distribution of the mean when N = the entire population?

Revisiting the Z Test

Recall that the z-test is an inferential test that allows us to perform a hypothesis test in situations in which we would like to determine whether the mean of an observed sample could have come from a population with a known population mean ($\mu_X$) and standard deviation ($\sigma_X$). Recall that you can standardize scores via:

$z = \frac{X - \mu_X}{\sigma_X}$

Also, recall the following about the sampling distribution of the mean:

• It has a mean equal to $\mu_X$, the population mean.
• It has a standard deviation (the standard error of the mean) equal to $\sigma_{\bar{X}} = \sigma_X / \sqrt{N}$.
• It is normally distributed when the sample size, N, is large.

We can use this information within the hypothesis-testing framework in the following way:

1. Determine which test statistic is required for your problem and data. *The z-test is relevant when you want to compare the observed mean of a quantitative variable to a hypothetical population mean (theory-based) and you know the variance of the population.
2. State your research hypothesis: that the observed mean does not come from the population described by your theory.
3. State the alternative hypothesis: that the observed mean is not equal to the hypothetical mean (i.e., $H_A\!: \mu \neq \mu_0$, or the appropriate one-tailed alternative, like $H_A\!: \mu > \mu_0$).
4. State the null hypothesis: that the observed mean equals the hypothetical mean (i.e., $H_0\!: \mu = \mu_0$, or the appropriate one-tailed null, like $H_0\!: \mu \leq \mu_0$).
5. Determine the critical value for your test based on your desired α level.
6. Compute your observed z-test statistic. First, identify the location and dispersion of the relevant sampling distribution of the mean. The location is dictated by the hypothetical population mean ($\mu_0$). The dispersion equals the known population standard deviation divided by the square root of the sample size: $\sigma_{\bar{X}} = \sigma_X / \sqrt{N}$. Second, turn the observed sample mean into a z-score from the sampling distribution of the mean:

$z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\sigma_X / \sqrt{N}}$

7. Compare the observed z-test statistic value to your critical value and make a decision to reject or retain your null hypothesis.
8. Make a substantive interpretation of your test results.

[Example] Suppose we want to compare the mean GRE score of graduate students at Loyola University Chicago to the GRE test-taking population. We know the mean and standard deviation of that population: 500 and 100, respectively. Suppose the mean GRE score of our school is 565, based on 300 graduate students last year. Of course, we'd like to believe that our graduate students are more academically able than the average graduate student; this is our research hypothesis. That means that we'll use a one-tailed test, so that $H_0\!: \mu \leq \mu_0$ and $H_A\!: \mu > \mu_0$. If we adopt α = .05, then our one-tailed critical value (the value to exceed) equals 1.65 (from the z-table). We compute our observed z-statistic by plugging our known values into the equation:

$z = \frac{\bar{X} - \mu_0}{\sigma_X / \sqrt{N}} = \frac{565 - 500}{100 / \sqrt{300}} = \frac{65}{5.77} = 11.27$

The z-test statistic (11.27) is clearly larger than the critical value (1.65). It is clear that the observed difference between the sample mean and the population mean is much larger than would be expected due to sampling error. In fact, the p-value for the observed statistic is less than .0001. We would interpret this substantively with a paragraph something like this:

The mean GRE score for graduate students at Loyola University Chicago (565) is considerably larger than the mean for the GRE testing population (500). This difference is statistically significant (z = 11.27, p < .0001).

Graphically, here's what we did:

[Figure: the sampling distribution of the mean under H0, centered at $\mu_0 = 500$ with $\sigma_{\bar{X}} = 5.77$. The α = .05 rejection region lies beyond $z_{CV} = 1.65$, i.e., beyond a GRE mean of 509.52. The observed mean of 565 (z = 11.27, p < .0001) falls far inside the rejection region.]
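The same test can be verified numerically. Here is a short R sketch using the values from the example:

```r
# One-sample z-test for the GRE example
mu0  <- 500; sigma <- 100          # known population mean and SD
xbar <- 565; N <- 300              # observed sample mean and sample size

se <- sigma / sqrt(N)              # standard error of the mean: 5.77
z  <- (xbar - mu0) / se            # observed z statistic: 11.27
pnorm(z, lower.tail = FALSE)       # one-tailed p-value: < .0001
qnorm(0.95)                        # one-tailed critical value: 1.645
mu0 + qnorm(0.95) * se             # critical GRE mean: about 509.5
```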
One-Sample t Test

The z-test is only useful in somewhat contrived situations: we hardly ever know the value of the population standard deviation, so we can't compute the standard error of the mean. We need a different statistic to apply to most real-world situations. The appropriate statistical test is the one-sample t-test.

Recall the formula for the z-test, which relies on the sampling distribution of the mean:

$z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\sigma_X / \sqrt{N}}$

We can create a parallel statistic using the sample variance rather than the population variance:

$t = \frac{\bar{X} - \mu_0}{s_{\bar{X}}} = \frac{\bar{X} - \mu_0}{s_X / \sqrt{N}}$

We use a slightly different probability density function for the t-test than for the z-test, because we now use the sample variance as an estimate of the population variance. Specifically, we rely on Student's t distribution for the t-test. The feature that differentiates the various t distributions is the degrees of freedom associated with the test statistic.

The degrees of freedom for a t-test relate to the number of data points that are free to vary when calculating the variance. Hence, the degrees of freedom for the t-test equal N − 1, and there is a separate probability distribution for each number of degrees of freedom (recall that there was a single probability distribution for the z-test). The lost degree of freedom is attributable to the fact that the variance is based on the sum of the squared deviations of observations from the mean of the distribution. Because the deviations must sum to zero, one of the data values is not free to vary, so one degree of freedom is lost.

Let's apply the one-sample t-test to the GRE data. We'd still like to believe that our graduate students are more academically able than the average graduate student (i.e., $H_A\!: \mu > \mu_0$, as in the z-test example above), but in this case we don't know the value of the population variance. The COE's mean GRE score is 565, the standard deviation equals 75, and there are 300 students in the COE. Our degrees of freedom equal 299 for this test (df = 300 − 1). Looking at the t table on p. 682, in the column where the "Level of Significance for One-Tailed Test" equals .05 and the row where df = ∞ (since 299 is much larger than the largest tabled value of 100), the critical value for this test at the α = .05 level is 1.645. Our observed t statistic is:

$t = \frac{\bar{X} - \mu_0}{s_X / \sqrt{N}} = \frac{565 - 500}{75 / \sqrt{300}} = \frac{65}{4.33} = 15.01$

Since our test statistic (t = 15.01) is larger than the critical t value (t_cv = 1.645), our decision and interpretation are the same as they were when we knew the population variance.

SPSS Example: Go to "Analyze" → "Compare Means" → "One-Sample T Test". Here we test $H_0\!: \mu_{reading} = 50$.

One-Sample Statistics
                               N     Mean       Std. Deviation   Std. Error Mean
READING STANDARDIZED SCORE   271   51.82690      9.470882           .575315

One-Sample Test (Test Value = 50)
                               t      df    Sig. (2-tailed)   Mean Difference   95% CI of the Difference
READING STANDARDIZED SCORE   3.175   270         .002            1.826904         .69423 to 2.95958
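Here is a short R sketch of the same computation from summary statistics. The commented t.test() call shows how the test runs on raw data; the vector name read is hypothetical:

```r
# One-sample t-test from summary statistics (GRE example above)
mu0  <- 500                        # hypothesized population mean
xbar <- 565; s <- 75; N <- 300     # sample mean, SD, and size

se    <- s / sqrt(N)               # estimated standard error: 4.33
t_obs <- (xbar - mu0) / se         # observed t statistic: 15.01
df    <- N - 1                     # 299
qt(0.95, df)                       # one-tailed critical value: about 1.65
pt(t_obs, df, lower.tail = FALSE)  # one-tailed p-value (vanishingly small)

# With raw data, t.test() does all of this in one call, e.g. for the
# SPSS reading-score example ('read' would be the vector of scores):
# t.test(read, mu = 50)
```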
Two Matched-Samples t Test

A more common comparison in research is one in which two samples are compared to determine whether there is a larger-than-sampling-error difference between the means of the groups. There are two common strategies for constructing groups in experimental research: one involves assigning individuals to groups via randomization (matching is not necessary in that case); the other involves matching individuals and assigning one member of each matched pair to each group. Because the two matched-samples t-test (a.k.a. the two dependent-samples t-test) is a less complex extension of the one-sample t-test, we'll discuss it first.

But first, recall the reasons why we might do matching and the two most common methods of matching. We typically match cases because there are extraneous variables that are strongly related to the outcome variable, and we want to make sure that observed differences between the groups on the dependent variable cannot be attributed to group differences with respect to these extraneous variables. For example, we may want to ensure that the groups are equivalent on SES. We may match samples by pairing individuals on levels of the extraneous variable (matched samples), or we may expose the same individual to multiple conditions (repeated measures).

The groups are compared by examining the difference between the two members of each pair of individuals. The relevant statistic is the average difference score (note that N is the number of pairs):

$\bar{D} = \frac{\sum_{i=1}^{N} (X_{1i} - X_{2i})}{N}$

Although our null, theory-based value for the magnitude of this difference can be any value, we typically are interested in determining whether the difference is non-zero. Hence, we state our null hypothesis to be that the mean difference in the population equals zero (i.e., $H_0\!: \mu_D = \mu_1 - \mu_2 = 0$). Formulating the null hypothesis in this way allows us to use a variation of the one-sample t-test to make the comparison:

$t = \frac{\bar{D} - \mu_{D_0}}{s_{\bar{D}}} = \frac{\bar{D} - \mu_{D_0}}{s_D / \sqrt{N}}$

As an example, consider data from a study of the reading interests of 18 pairs of college-educated husbands and wives. Each individual in the sample was interviewed and asked how many books he/she had completed in the past year. The research question is: do males and females who come from similar environments engage in similar levels of reading? This implies a two-tailed null hypothesis ($H_0\!: \mu_D = 0$) and the corresponding alternative hypothesis ($H_A\!: \mu_D \neq 0$). Our degrees of freedom equal 17, so our two-tailed critical value using α = .05 is 2.11. The mean and standard deviation of the differences in the sample were 1.16 and 2.88, respectively. So, our t-statistic is:

$t = \frac{\bar{D} - \mu_{D_0}}{s_D / \sqrt{N}} = \frac{1.16 - 0}{2.88 / \sqrt{18}} = \frac{1.16}{0.6788} = 1.71$

Because the observed t-statistic is not more extreme than the critical value, we retain the null hypothesis. That is, we do not have evidence that men and women read different amounts. Incidentally, the p-value for the observed t statistic equals .11.

SPSS Example: Go to "Analyze" → "Compare Means" → "Paired-Samples T Test". The difference is (Reading Score − Math Score), and we test $H_0\!: \mu_D = 0$.

Paired Samples Statistics
                                    Mean       N    Std. Deviation   Std. Error Mean
READING STANDARDIZED SCORE        51.87816   270      9.450733           .575153
MATHEMATICS STANDARDIZED SCORE    51.71431   270     10.083413           .613657

Paired Samples Correlations
                                              N    Correlation   Sig.
READING & MATHEMATICS STANDARDIZED SCORE     270      .714       .000

Paired Samples Test (READING − MATHEMATICS)
  Mean      Std. Deviation   Std. Error Mean   95% CI of the Difference     t      df    Sig. (2-tailed)
 .163848      7.406301           .450733       -.723565 to 1.051261       .364    269         .717

Another way to perform the same analysis: we can calculate the differences between pairs and form a new variable (here called "diff") using "Transform" → "Compute", then run a one-sample t-test on it.

The output for this analysis:

One-Sample Statistics
         N     Mean    Std. Deviation   Std. Error Mean
diff    270   .1638       7.40630           .45073

One-Sample Test (Test Value = 0)
         t     df    Sig. (2-tailed)   Mean Difference   95% CI of the Difference
diff   .364   269         .717             .16385          -.7236 to 1.0513

Compare these to the Paired Samples Test output above; the results are identical.

Try the following questions in our text: p. 206: 7.6, 7.7, 7.10, 7.13; p. 207: 7.16, 7.17, 7.18.
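Finally, a sketch of the matched-samples analysis in R, for comparison with the SPSS output above. The vectors husband and wife are hypothetical; only the summary statistics come from the books-read example:

```r
# Matched-samples t-test; 'husband' and 'wife' would be vectors of books
# read, one element per matched pair (hypothetical data):
# t.test(husband, wife, paired = TRUE)
#
# Equivalently, compute difference scores and run a one-sample t-test
# against zero (the "diff variable" approach used in SPSS above):
# diff <- husband - wife
# t.test(diff, mu = 0)

# From the summary statistics of the books-read example:
dbar <- 1.16; s_d <- 2.88; N <- 18             # mean diff, SD of diffs, pairs
t_obs <- (dbar - 0) / (s_d / sqrt(N))          # 1.71
2 * pt(abs(t_obs), N - 1, lower.tail = FALSE)  # two-tailed p: about .11
```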