Paired t-test - Example

• OISE sponsored a summer institute to improve the skills of high school teachers of foreign languages. One such institute hosted 20 French teachers for four weeks. At the beginning of the period the teachers were given the Modern Language Association's listening test of understanding spoken French. After 4 weeks of immersion in French, the listening test was given again. The data are given on the following slide. Do the data provide evidence that the course improves spoken-French skills?

week 13

Data Display

Row  Teacher  Pretest  Posttest  improvement
  1        1       30        29           -1
  2        2       28        30            2
  3        3       31        32            1
  4        4       26        30            4
  5        5       20        16           -4
  6        6       30        25           -5
  7        7       34        31           -3
  8        8       15        18            3
  9        9       28        33            5
 10       10       20        25            5
 11       11       30        32            2
 12       12       29        28           -1
 13       13       31        34            3
 14       14       29        32            3
 15       15       34        32           -2
 16       16       20        27            7
 17       17       26        28            2
 18       18       25        29            4
 19       19       31        32            1
 20       20       29        32            3

(improvement = Posttest - Pretest)

• One-sample t-test for the improvement

T-Test of the Mean

Test of mu = 0.000 vs mu > 0.000

Variable     N    Mean   StDev  SE Mean      T      P
improvem    20   1.450   3.203    0.716   2.02  0.029

• MINITAB commands for the paired t-test: Stat > Basic Statistics > Paired t

Paired T-Test and Confidence Interval

Paired T for Posttest - Pretest

              N    Mean  StDev  SE Mean
Posttest     20   28.75   4.74     1.06
Pretest      20   27.30   5.04     1.13
Difference   20   1.450  3.203    0.716

95% CI for mean difference: (-0.049, 2.949)
T-Test of mean difference = 0 (vs > 0): T-Value = 2.02  P-Value = 0.029

[Figure: histogram of improvement; horizontal axis: improvement (-4 to 8), vertical axis: Frequency (0 to 6)]

Character Stem-and-Leaf Display

Stem-and-leaf of improvement  N = 20
Leaf Unit = 1.0

   2   -0 54
   4   -0 32
   6   -0 11
   8    0 11
  (7)   0 2223333
   5    0 4455
   1    0 7

Goodness of Fit Tests

• The goal of χ² goodness-of-fit tests is to test whether the data come from a certain distribution.
• There are various situations to which these tests apply.
• The first situation we will explore is when we observe count data in k different categories.
• The aim is to test the null hypothesis that the probabilities of the k categories are p1, p2, …, pk.
• We distinguish between two cases.

Case 1

• The null hypothesis completely specifies the probabilities of each of the k categories.
• For each category we calculate the expected count Ei = npi.
• The test statistic and its distribution are…

Example

• The statistics department at U of T offers introductory courses for students from other disciplines. The department believes that 40% of the students are math majors, 30% are computer science, 20% biology and 10% chemistry. A random sample of 120 students revealed 52, 38, 21, and 9 from the four majors above. Do these data support the department's claim?

Case 2

• The null hypothesis does not fully specify the probabilities.
• In this case the probabilities of the different categories may be functions of other parameters.
• First use the sample data to estimate the r unknown parameters.
• Then use the estimated parameters to estimate the k probabilities.
• For each category, calculate the estimated expected count.
• The test statistic is…

Example

• A farmer believes that the number of eggs a chicken will give per day has a Poisson(λ) distribution. He observed the following data….

Remark

• In many cases we will observe data that are not categorized and we would want to test whether the data come from a certain distribution.
• If the distribution we are testing is discrete, the values of the variable will be the actual categories.
• However, if the variable takes infinitely many possible values, the grouping should be done so that the expected frequency in each category is at least 5.
• If the distribution we are testing is continuous, we need to group the measurements of the random variable of interest into k intervals. Very often the choice of cells is done arbitrarily.

Contingency Tables

• The goal is to test whether two categorical variables are independent.
• The row variable has r categories while the column variable has c categories.
• The data are the counts of observations in the r×c table…
• The null hypothesis states that the row variable and the column variable are independent. The alternative states that the variables are dependent.
• To conduct the test, we calculate the expected count for each cell…
• The test statistic and its distribution are…

Example
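The paired t-test from the French teachers example can be checked by hand. The following is a minimal sketch, not the MINITAB procedure itself: it recomputes the mean, standard deviation, standard error and T-value of the improvement scores from the Pretest and Posttest columns. The critical value 1.729 (upper 5% point of the t distribution with 19 degrees of freedom) is taken from a standard t table and is an addition that does not appear in the slides.

```python
import math
import statistics

# Pretest and Posttest scores for the 20 teachers, copied from the Data Display.
pretest = [30, 28, 31, 26, 20, 30, 34, 15, 28, 20,
           30, 29, 31, 29, 34, 20, 26, 25, 31, 29]
posttest = [29, 30, 32, 30, 16, 25, 31, 18, 33, 25,
            32, 28, 34, 32, 32, 27, 28, 29, 32, 32]

# Paired differences: improvement = Posttest - Pretest.
improvement = [post - pre for pre, post in zip(pretest, posttest)]

n = len(improvement)
mean_diff = statistics.mean(improvement)   # sample mean of the differences
sd_diff = statistics.stdev(improvement)    # sample standard deviation (n - 1 divisor)
se_mean = sd_diff / math.sqrt(n)           # standard error of the mean difference
t_value = mean_diff / se_mean              # test statistic for H0: mu = 0

# Assumption: one-sided 5% critical value for t with n - 1 = 19 df, from a t table.
T_CRIT_5PCT_DF19 = 1.729

print(f"mean = {mean_diff:.3f}, StDev = {sd_diff:.3f}, SE Mean = {se_mean:.3f}")
print(f"T = {t_value:.2f}, df = {n - 1}")
print("reject H0 at the 5% level" if t_value > T_CRIT_5PCT_DF19
      else "do not reject H0 at the 5% level")
```

The computed values should agree with the MINITAB output above (mean 1.450, StDev 3.203, SE Mean 0.716, T = 2.02), and T exceeds the 5% critical value, consistent with the reported one-sided P-value of 0.029.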
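For the Case 1 example (120 students in four majors), the slides leave the test statistic elided. The sketch below assumes it is the usual Pearson chi-square statistic, the sum over categories of (observed - expected)² / expected with k - 1 degrees of freedom. The function name `pearson_chi_square` is hypothetical, and the critical value 7.815 (upper 5% point of chi-square with 3 degrees of freedom) comes from a standard table, not from the slides.

```python
def pearson_chi_square(observed, probs):
    """Return expected counts, Pearson chi-square statistic and degrees of freedom.

    Assumes the null hypothesis fully specifies the category probabilities (Case 1).
    """
    n = sum(observed)
    expected = [n * p for p in probs]                  # Ei = n * pi
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                             # k - 1 when nothing is estimated
    return expected, statistic, df


# Data from the example: math, computer science, biology, chemistry.
observed = [52, 38, 21, 9]
probs = [0.40, 0.30, 0.20, 0.10]

expected, statistic, df = pearson_chi_square(observed, probs)

# Assumption: 5% critical value for chi-square with 3 df, from a chi-square table.
CHI2_CRIT_5PCT_DF3 = 7.815

print("expected counts:", [round(e, 1) for e in expected])
print(f"chi-square = {statistic:.3f}, df = {df}")
print("reject H0 at the 5% level" if statistic > CHI2_CRIT_5PCT_DF3
      else "do not reject H0 at the 5% level")
```

Under these assumptions the expected counts are 48, 36, 24 and 12, and the statistic is about 1.57, well below 7.815, so the sample gives no evidence against the department's claim at the 5% level.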
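The contingency table steps can be sketched in the same way. The slides elide both the expected-count formula and the test statistic, so this sketch assumes the standard Pearson chi-square test of independence: expected count = (row total × column total) / n, with (r - 1)(c - 1) degrees of freedom. The function name `chi_square_independence` and the 2×3 table of counts are hypothetical and serve only as an illustration; they are not the data from the Example slide. The critical value 5.991 (upper 5% point of chi-square with 2 degrees of freedom) is taken from a standard table.

```python
def chi_square_independence(table):
    """Return expected counts, chi-square statistic and df for an r x c table.

    Assumes the standard test of independence; `table` is a list of rows of counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, statistic, df


# Hypothetical counts, for illustration only (not from the slides).
table = [[20, 30, 50],
         [30, 20, 50]]

expected, statistic, df = chi_square_independence(table)

# Assumption: 5% critical value for chi-square with 2 df, from a chi-square table.
CHI2_CRIT_5PCT_DF2 = 5.991

print("expected counts:", expected)
print(f"chi-square = {statistic:.3f}, df = {df}")
print("reject independence at the 5% level" if statistic > CHI2_CRIT_5PCT_DF2
      else "do not reject independence at the 5% level")
```

For this made-up table every cell in the first two columns has expected count 25 and the third column 50, giving a statistic of 4.0 on 2 degrees of freedom, which does not exceed 5.991.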