Math 311, Winter 2005, Lab 5

The Tools

In this section you’ll simply learn the mechanics of 1- and 2-sample t-tests.

Inferences With Minitab:

Have Minitab compute two columns of 10,000 rows of data. Choose the first (C1) from a normally distributed population (use N(0,1)) and the second (C2) from a uniformly distributed population, distributed between 0 and 1. Store these data in columns C1 and C2. Recall that one does this by selecting Calc>Random Data> …. We will use these data in our tests below.

1-sample t-test: View the data in column C1 as a sample of size 10,000. We may use Minitab to compute a confidence interval for the (true) mean of the population and perform a hypothesis test for the mean of the population at the same time!

a. Select Stat>Basic Statistics>1-Sample t…
b. Enter C1 for Variable
c. Enter the value of the population mean from the Null Hypothesis as the Test mean: (this is for the hypothesis test – the Alternative Hypothesis is entered below). This time, use 0 (this is, in fact, the true mean).
d. To set the confidence level select Options (this is for the confidence interval). 95% is the default setting.
e. Notice that while you’re in the Options menu, you can also select equal to, greater than, or less than for your Alternative Hypothesis. Since this is just practice, pick whichever floats your boat.
f. Finally, select OK (twice) and get something like this:

   One-Sample T: C1

   Test of mu = 0 vs not = 0

   Variable      N      Mean     StDev   SE Mean  95% CI                      T      P
   C1        10000  0.000375  1.003442  0.010034  (-0.019294, 0.020044)    0.04  0.970

g. Note: T = one-sample t-statistic and P = p-value. Notice that the p-value is high, so we do not reject the null hypothesis. That is what we expect most of the time here, since the null hypothesis is in fact correct (mu is 0).

Matched pair t-test: We can also do a matched pair t-test by viewing the data in columns C1 and C2 as two data points collected from each of 10,000 individuals and by following the directions below:

a. Select Stat>Basic Statistics>Paired t…
b. Select Samples in Columns
c. Enter C1 in First: and C2 in Second:
d. Select Options to set the confidence level, the type of hypothesis test, and the value of the mean in the null hypothesis.
e. Finally, select OK (twice) and get something like this:

   Paired T-Test and CI: C1, C2

   Paired T for C1 - C2

                   N       Mean     StDev   SE Mean
   C1          10000   0.000375  1.003442  0.010034
   C2          10000   0.501160  0.290175  0.002902
   Difference  10000  -0.500785  1.044458  0.010445

   95% CI for mean difference: (-0.521258, -0.480311)
   T-Test of mean difference = 0 (vs not = 0): T-Value = -47.95  P-Value = 0.000

f. Notice that the top shaded region (I added the shading) tells us what the difference in the sample means was (this is x̄₁ − x̄₂, here -0.500785). This number makes sense, as the normally distributed data had a mean around 0 and the uniformly distributed data had a mean around 0.5.
g. Meanwhile, the lower shaded region tells the reader both the null hypothesis (difference = 0) and the alternative hypothesis (not = 0).

2-sample t-test: We can also do 2-sample inferences by viewing the data in columns C1 and C2 as samples from independent populations and by following the directions below:

a. Select Stat>Basic Statistics>2-Sample t…
b. Select Samples in Different Columns (in this example, we’re going to compare the samples in C1 and C2).
c. Enter C1 in First: and C2 in Second:
d. Select Options to set the confidence level, the type of hypothesis test (1-sided vs. 2-sided), and the value of the mean in the null hypothesis.
e. Finally, select OK (twice) and get something like this:

   Two-Sample T-Test and CI: C1, C2

   Two-sample T for C1 vs C2

           N   Mean  StDev  SE Mean
   C1  10000   0.00   1.00    0.010
   C2  10000  0.501  0.290   0.0029

   Difference = mu (C1) - mu (C2)
   Estimate for difference: -0.500785
   95% CI for difference: (-0.521260, -0.480310)
   T-Test of difference = 0 (vs not =): T-Value = -47.94  P-Value = 0.000  DF = 11659

f. Note: Estimate for difference = difference of sample means = x̄₁ − x̄₂. DF = degrees of freedom.
g. Important note: the samples from the two populations do not need to be in separate columns – one may select the option Samples in one column. For example, suppose we were comparing the pollution levels in streams on the east coast vs. the west coast. If the pollution data were in C1 and the location (east or west coast) were in C2, then we could compare the mean pollution levels by entering C1 in the Samples: box and C2 in the Subscripts: box. Try this in the football worksheet coming up.

The Questions

In this section you’ll apply the techniques learned above.

!!!BE CERTAIN TO READ EACH SITUATION CAREFULLY – THEY CONTAIN USEFUL CLUES!!!

Twins (or Nature vs. Nurture): How much of our personality (or lack thereof), our likes, our individuality is predetermined by our genes? Identical twins, because they share an identical genotype, make ideal subjects for investigating the degree to which various environmental conditions may instigate change. The classical method of studying this phenomenon is the study of identical twins separated early in life and reared apart. Over the past twenty years, several studies of identical twins have been conducted. The most publicized study was begun in 1979 by University of Minnesota psychologist Thomas Bouchard and continues today. Bouchard and his colleagues at the Minnesota Center for Twin and Adoption Research have published over 129 scientific papers on the subject. In this lab, we consider a similar study carried out by psychologist Susan Farber and published in her book, Identical Twins Reared Apart (Basic Books, 1981). Farber chronicles and analyzes data for 95 pairs of identical twins reared apart. Much of her discussion focuses on a comparison of IQ scores. The question of concern is, “Are there significant differences between the IQ scores of identical twins, where one member of the pair is reared by the natural parents and the other member is not?” IQ scores for 32 of Farber’s 95 twin pairs are stored in Twins.mtw.
(Get the file now from the course webpage http://www.cwu.edu/~englundt/Data.htm) One member (A) from each of the sets was reared by a natural parent, whereas the other member (B) was reared by a relative or some other person.

1. Which test would be most appropriate: a one-sample t-test, a matched pair test, or a two-sample t-test? Why?
2. Construct a Null Hypothesis and an Alternative Hypothesis.
3. Perform the appropriate test and report the P-value you found (you can do this by just cutting and pasting the Minitab analysis into Word).
4. State your conclusions. Note: we are now very near the end of the course. Thus, I expect a complete and correctly worded explanation and summary of your results. If it helps, imagine that you are Dr. Farber and write accordingly.

Football (or Winning for Dummies): Data for all NFL games played over 3 seasons are contained in the file football.mtw. Get this file now. An explanation of the column headings follows:

Home/Away: Favored team is at home (1) or away (0).
Favorite Points: Points scored during the game by the favored team.
Underdog Points: Points scored during the game by the underdog team.
Pointspread: Oddsmaker’s points (determined before the game) to handicap the favored team.
Favorite Name: Code for the favored team’s name.
Underdog Name: Code for the underdog team’s name.
Year: 89, 90, or 91 season (determined by beginning of season).
Week: Week of the season the game was played.
Special: Monday night (M), Saturday (S), Thursday (H), Sunday night (N), overtime game (ot).

Let’s investigate these data:

1. Can the oddsmakers pick the right team? Consider Points Scored by the Favorite – Points Scored by the Underdog. If the oddsmakers were just randomly guessing, then it’s reasonable to expect that this difference would be zero on average. Is there evidence significant at the α = 0.05 level that this is not so?
   Perform the appropriate test; report the test used and the corresponding t-score & P-value (as above, you can do this by cutting and pasting from Minitab into Word). List your conclusion.
2. Bonus Question (i.e. extra credit): We’ll now investigate how often the spread is covered. That is, how often are the points scored by the favored team minus the points scored by the underdog team at least as great as the spread? This is, obviously, an incredibly important question for those of us who supplement our income in various ways. Does the point differential (i.e. Points Scored by the Favorite – Points Scored by the Underdog) usually exceed the Spread? Perform the appropriate test; report the test used and the corresponding t-score & P-value. List your conclusion. Note: you will need to manipulate the data using techniques learned in past labs.

Nutrition – platewaste in the 4th & 5th grades: Last year colleagues from the Family and Consumer Sciences Department and I studied various factors that affect the amount of food elementary school children eat during lunch. Four schools were studied – each for ten days. The platewaste of each child was compared to the amount served that day and was used to compute the nutrients each child consumed that day. The results for each child in 4th and 5th grade are recorded in the file nutri.mpj. Get this file now. Column C4 contains the data concerning the placement of recess with respect to lunch (before and after). Column C11 contains the data concerning the number of calories consumed by the child that day. Some days the children were served more and some days, less. To adjust for this, I computed the percent of the calories served that were consumed. These data are recorded in C12. Thus if a child ate half the calories served that day, 0.50 would appear in C12. Similar analysis was done for other nutrients (scroll right to see the results).

Let’s investigate these data:

1. Does the placement of recess affect the number of calories consumed? State your hypotheses, perform the appropriate test, and report the P-value you found. State your conclusions. Note: in this worksheet, the data are recorded in one column (C11) while the subscripts are contained in another column (C4).
2. The difference in mean calories consumed between the two samples is not very large. Find a food item in your house that contains the same number of calories as this difference.
3. Does gender affect the number of calories consumed? State your hypotheses, perform the appropriate test, and report the P-value you found. State your conclusions.
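
As a cross-check on the One-Sample T output shown in The Tools, the SE Mean, T, confidence interval, and P entries can be recomputed by hand from N, Mean, and StDev. The sketch below does that arithmetic in Python using only the standard library. One assumption to flag: it uses the normal critical value and normal tail area in place of the t distribution with 9,999 degrees of freedom, so the interval endpoints should agree with Minitab's only to within about 0.00001.

```python
import math
from statistics import NormalDist

# Values copied from the One-Sample T output in The Tools.
n = 10_000
sample_mean = 0.000375
sample_sd = 1.003442
mu0 = 0.0  # test mean from the null hypothesis

# Standard error of the mean and the one-sample t statistic.
se_mean = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu0) / se_mean

# Assumption: with 9,999 degrees of freedom the t distribution is
# approximated here by the standard normal distribution.
z_star = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence
ci_low = sample_mean - z_star * se_mean
ci_high = sample_mean + z_star * se_mean
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))  # two-sided

print(f"SE Mean = {se_mean:.6f}")
print(f"T = {t_stat:.2f}, P = {p_value:.3f}")
print(f"95% CI = ({ci_low:.6f}, {ci_high:.6f})")
```

The same three formulas (standard error, statistic, mean plus or minus critical value times standard error) apply to the Difference row of the paired output.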
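
For readers who want to see what Minitab is computing in The Tools, here is a minimal Python sketch of the three tests on freshly simulated columns. It is not Minitab's implementation. The seed, the helper names (one_sample_t, paired_t, welch_t, two_sided_p), and the normal approximation for the p-value are all assumptions made for illustration. The two-sample version does not pool the variances; the DF = 11659 in the Minitab output above is consistent with that unpooled (Welch) form.

```python
import math
import random
from statistics import NormalDist, mean, stdev, variance

random.seed(311)  # arbitrary seed so the run is repeatable
n = 10_000
c1 = [random.gauss(0.0, 1.0) for _ in range(n)]  # like column C1: N(0, 1)
c2 = [random.random() for _ in range(n)]         # like column C2: Uniform(0, 1)


def two_sided_p(t):
    """Two-sided p-value via the normal approximation (assumes large df)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def one_sample_t(xs, mu0=0.0):
    """t statistic for H0: population mean equals mu0."""
    se = stdev(xs) / math.sqrt(len(xs))
    return (mean(xs) - mu0) / se


def paired_t(xs, ys, d0=0.0):
    """Matched-pair t: a one-sample t-test on the pairwise differences."""
    return one_sample_t([x - y for x, y in zip(xs, ys)], d0)


def welch_t(xs, ys):
    """Unpooled two-sample t statistic and its Welch-Satterthwaite df."""
    a = variance(xs) / len(xs)
    b = variance(ys) / len(ys)
    t = (mean(xs) - mean(ys)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(xs) - 1) + b ** 2 / (len(ys) - 1))
    return t, df


t1 = one_sample_t(c1)
tp = paired_t(c1, c2)
t2, df2 = welch_t(c1, c2)

print(f"1-sample: T = {t1:.2f}, P = {two_sided_p(t1):.3f}")
print(f"Paired:   T = {tp:.2f}, P = {two_sided_p(tp):.3f}")
print(f"2-sample: T = {t2:.2f}, P = {two_sided_p(t2):.3f}, DF = {df2:.0f}")
```

Because the columns are random, the numbers will differ from the Minitab output above, but the pattern should match: a small one-sample T, and paired and two-sample T values near -48. If SciPy is installed, scipy.stats.ttest_1samp, scipy.stats.ttest_rel, and scipy.stats.ttest_ind with equal_var=False compute the same statistics with exact t-distribution p-values.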
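
The "Samples in one column" layout described in The Tools (data in one column, subscripts in another) amounts to splitting the data column by the subscript value before running the 2-sample test. A small sketch of that idea follows. The east/west values are simulated purely for illustration and are not real pollution data, and the unstack helper is a hypothetical name.

```python
import math
import random
from statistics import mean, variance

random.seed(5)  # arbitrary seed so the run is repeatable

# Hypothetical stacked worksheet: one data column and one subscript column.
subscripts = ["east"] * 40 + ["west"] * 40
samples = ([random.gauss(10.0, 2.0) for _ in range(40)]
           + [random.gauss(14.0, 2.0) for _ in range(40)])


def unstack(values, labels):
    """Group the data column by the matching entry in the subscript column."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return groups


groups = unstack(samples, subscripts)
east, west = groups["east"], groups["west"]

# Unpooled (Welch) two-sample t statistic and degrees of freedom.
a = variance(east) / len(east)
b = variance(west) / len(west)
t = (mean(east) - mean(west)) / math.sqrt(a + b)
df = (a + b) ** 2 / (a ** 2 / (len(east) - 1) + b ** 2 / (len(west) - 1))

print(f"Estimate for difference: {mean(east) - mean(west):.3f}")
print(f"T = {t:.2f}, DF = {df:.1f}")
```

The grouping step is all the Subscripts: box does. Once the two groups are separated, the test itself is the same as with samples in different columns.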