Practice Exercises for QA252 Intermediate Statistics
Professor K. Leppel

Hypothesis Testing: Type I and Type II Errors

1. A researcher has hypothesized that the mean number of traffic violations per day in Saskatchewan, Canada is 25. So the null and alternative hypotheses are H0: μ = 25, H1: μ ≠ 25. The level of significance to be used is α = .05. A random sample was taken and, based on the results of the sample, decisions were made. Complete the table below.

True (but unknown)   Is H0    Decision            Is the decision   Type of error
mean                 true?    (based on sample)   correct? (Y/N)    (I or II), if any
25                   ____     Accept H0           ____              ____
25                   ____     Reject H0           ____              ____
27                   ____     Accept H0           ____              ____
27                   ____     Reject H0           ____              ____

2. A consumer advocate believes that more than 5% of a particular manufacturer's tires are defective. The advocate wants to be 99% sure that he is correct before going public with his claim. In other words, he wants to limit the probability of claiming that a lot of the tires are bad when they are not to at most 1%. In one-sided tests, the null hypothesis can usually be set up as the devil's advocate's approach to the claim. That is, the null hypothesis is "the percent defective is not more than 5% and I'm sticking with that until you can show me reason to believe otherwise." So the claim becomes the alternative hypothesis, which will only be accepted if the null can be rejected. So, the null and alternative hypotheses are H0: π ≤ .05, H1: π > .05. The level of significance, α, is .01. A random sample was taken and, based on the results of the sample, decisions were made. Complete the table.

True (but unknown)   Is H0    Decision            Is the decision   Type of error
proportion           true?    (based on sample)   correct? (Y/N)    (I or II), if any
.05                  ____     Accept H0           ____              ____
.05                  ____     Reject H0           ____              ____
.07                  ____     Accept H0           ____              ____
.07                  ____     Reject H0           ____              ____
.03                  ____     Accept H0           ____              ____
.03                  ____     Reject H0           ____              ____

Valid Null and Alternative Hypotheses

For each pair of null and alternative hypotheses, determine whether the set is a valid set of hypotheses, and if not, explain why not.
Question   Null          Alternative   Valid?   If not valid,
number     Hypothesis    Hypothesis    (Y/N)    why not?
1          H0: μ = 25    H1: μ ≠ 25    ____     ____
2          H0: π = .25   H1: π ≠ .25   ____     ____
3          H0: x̄ = 63    H1: x̄ ≠ 63    ____     ____
4          H0: π ≥ 18    H1: π ≤ 18    ____     ____
5          H0: μ > 43    H1: μ ≤ 43    ____     ____
6          H0: p = .72   H1: p ≠ .72   ____     ____
7          H0: μ ≥ 96    H1: μ < 96    ____     ____
8          H0: p ≤ .42   H1: p > .42   ____     ____
9          H0: x̄ < 57    H1: x̄ < 57    ____     ____
10         H0: π ≤ .81   H1: π > .81   ____     ____

Issues:
   Hypotheses must involve population parameters and NOT sample statistics.
   Null and alternative hypotheses must describe different and non-overlapping situations.
   Equality must be in the null hypothesis and NOT in the alternative.

Hypothesis Testing – One Sample

Use the following GPA data from a statistics class of 18 males and 14 females to answer questions 1 to 5.

Male GPAs     3.52  3.5   3.4   3.2   3.2   3.06  3.05  2.8   2.8
              2.7   2.7   2.65  2.6   2.5   2.5   2.4   2.4   2.3
Female GPAs   3.8   3.78  3.7   3.5   3.3   2.98  2.9
              2.8   2.7   2.7   2.7   2.3   2.3   1.9

(1) Suppose that the standard deviation of GPAs for the population of male Statistics students was known to be 0.45. Test at the 5% level whether the mean GPA for all male Statistics students is equal to 2.9.

(2) Suppose that the standard deviation of GPAs for the population of male Statistics students was known to be 0.45. Calculate the p-value to test at the 5% level whether the mean GPA for all male Statistics students is equal to 2.9.

(3) Suppose that the standard deviation of GPAs for the population of male Statistics students was known to be 0.45. Test at the 5% level whether the mean GPA for all male Statistics students is less than 2.9. (Use the devil's advocate approach to set up the null hypothesis in this problem. The null is "the mean GPA for all male Statistics students is not less than 2.9, and I'm sticking with that until you can show me reason to believe otherwise." Note: Saying that the male mean GPA is not less than 2.9 is equivalent to saying that the male mean GPA is greater than or equal to 2.9. The alternative is that the mean GPA is less than 2.9.)

(4) Suppose that the standard deviation of GPAs for the population of male Statistics students was known to be 0.45. Calculate the p-value to test at the 5% level whether the mean GPA for all male Statistics students is less than 2.9. (Again, set up the null hypothesis using the devil's advocate approach.)

(5) Test at the 5% level whether the mean GPA for all male Statistics students is equal to 2.9. (You have no knowledge of the population standard deviation of GPAs.)

(6) Suppose one of the possible majors in Business Administration is International Business. In a sample of 340 college students majoring in Business Administration, 60 students are majors in International Business. Test at the 5% level whether the proportion of Business Administration students majoring in International Business is 20%.

Hypothesis Testing – Two Sample

Use the following GPA data from a statistics class of 18 males and 14 females to answer questions 1 to 5. (This is the same data set as was used in the one-sample practice problems.)

Male GPAs     3.52  3.5   3.4   3.2   3.2   3.06  3.05  2.8   2.8
              2.7   2.7   2.65  2.6   2.5   2.5   2.4   2.4   2.3
Female GPAs   3.8   3.78  3.7   3.5   3.3   2.98  2.9
              2.8   2.7   2.7   2.7   2.3   2.3   1.9

(1) Suppose that the standard deviation of GPAs for the population of male Statistics students was known to be 0.45, and the standard deviation of GPAs for the population of female Statistics students was known to be 0.55. Test at the 5% level whether the mean GPA for all male Statistics students is equal to the mean GPA for all female Statistics students.

(2) Suppose that the standard deviation of GPAs for the population of male Statistics students was known to be 0.45, and the standard deviation of GPAs for the population of female Statistics students was known to be 0.55. Test at the 5% level whether the mean GPA is greater for females than for males. (Use the devil's advocate approach to set up the null hypothesis in this problem.
The null is "the mean GPA for all female Statistics students is not greater than the mean GPA for all male Statistics students, and I'm sticking with that until you can show me reason to believe otherwise." Note: Saying that the female mean is not greater than the male mean is equivalent to saying that the female mean is less than or equal to the male mean. The alternative is that the mean GPA is greater for females than for males.)

(3) Test at the 5% level the null hypothesis that the mean GPA for all male Statistics students is equal to the mean GPA for all female Statistics students. You have no knowledge of the population standard deviations.

(4) Test at the 5% level the null hypothesis that the mean GPA for all male Statistics students is equal to the mean GPA for all female Statistics students. The population standard deviations are unknown but you believe that they are equal.

(5) Test at the 5% level whether the variance of the GPAs of the female students
   a. is equal to the variance of the GPAs of the male students.
   b. is greater than the variance of the GPAs of the male students.

(6) Suppose one of the possible majors in Business Administration is International Business. A sample of 340 college students in Business Administration consists of 190 men and 150 women. There are 60 students majoring in International Business, 20 male and 40 female. Test at the 1% level whether the proportion of women in Business Administration who major in International Business is greater than the proportion of men in Business Administration who major in International Business. (Reminder: Use the devil's advocate approach to set up the null hypothesis for this problem.)

(7) Suppose that the number of girls and the number of boys in the families of Statistics students are as given below. Test at the 10% level whether on average the number of boys in the family is equal to the number of girls in the family.

Family       1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18
# of girls   0  1  0  1  4  1  0  1  3  0   0   5   3   1   0   2   1   0
# of boys    1  1  1  2  1  1  3  1  0  2   3   2   1   0   3   0   1   2

Chi-squared Tests

(1) Suppose a manager believes that the company's customers have the following preferences for three models: 20% prefer model A, 30% prefer model B, and 50% prefer model C. Survey results from a sample of 300 customers indicate that 50 prefer model A, 80 prefer B, and 170 prefer C. Test at the 10% level whether the manager is correct.

(2) Suppose that a class of 33 students is divided into commuters and residents. The students are also divided into 3 activities categories: those with no extracurricular activities, those with exactly 1, and those with 2 or more. The result is the following table. Test at the 10% level whether student commuter versus resident status is independent of the number of extracurricular activities.

                   # of extracurricular activities
residence status   0     1     2 or more
commuter           6     7     1
resident           8     5     6

(3) You want to test at the 5% level whether the performance on a standardized exam by the students at a particular university has a standard deviation of 10. Based on the data from your sample of 20 students, you have found the standard deviation to be 14. Perform the test.

Analysis of Variance (ANOVA): Null hypothesis versus alternative hypothesis

In the table below, write H0 next to the null hypothesis and H1 next to the alternative hypothesis.

     H0 or H1   Hypothesis
1    ____       There is no difference in average salaries of Asians, Caucasians, and African Americans.
     ____       There is a difference in average salaries of Asians, Caucasians, and African Americans.
2    ____       Average number of years of education varies with income class.
     ____       Average number of years of education is the same for all income classes.
3    ____       The average number of items produced per minute by four different machines is not the same.
     ____       The average number of items produced per minute by four different machines is the same.
4    ____       The average student performance is the same for all sections of a course.
     ____       The average student performance is not the same for all sections of a course.
5    ____       The average level of productivity of employees is the same for all training programs.
     ____       The average level of productivity of employees depends on the training program.
6    ____       The average life span of a microwave oven varies with brand.
     ____       The average life span of a microwave oven does not vary with brand.
7    ____       Average price of housing does not vary by city.
     ____       Average price of housing varies by city.
8    ____       The average student performance is the same regardless of the software used.
     ____       The average student performance depends on the software used.
9    ____       The average employer rating of Co-op students depends on the university attended by the student.
     ____       The average employer rating of Co-op students does not depend on the university attended by the student.
10   ____       The average starting salary does not vary with the university from which the employee graduated.
     ____       The average starting salary does vary with the university from which the employee graduated.
11   ____       The average fee charged for household electrical repairs differs by county.
     ____       The average fee charged for household electrical repairs is the same for all counties.
12   ____       The average applicant score is the same for all interviewers.
     ____       The average applicant score is not the same for all interviewers.
13   ____       Average gasoline mileage for compact cars varies with manufacturer.
     ____       Average gasoline mileage for compact cars does not vary with manufacturer.
14   ____       Average number of hours studied per week is the same for all college class years.
     ____       Average number of hours studied per week depends on college class year.
15   ____       Average family size varies by ethnic group.
     ____       Average family size is the same for all ethnic groups.

ANOVA Table Completion

Complete the following tables:

Source of Variation           SS      DF     MS        F-statistic
Among or Between Treatments   ____    2      1000      ____
Error or Within Treatments    ____    ____   400       ----------------
Total                         6800    14     --------  ----------------

Source of Variation           SS      DF     MS        F-statistic
Among or Between Treatments   800     ____   ____      ____
Error or Within Treatments    ____    11     300       ----------------
Total                         4100    13     --------  ----------------

Source of Variation           SS      DF     MS        F-statistic
Among or Between Treatments   ____    7      500       ____
Error or Within Treatments    ____    ____   336.364   ----------------
Total                         7200    18     --------  ----------------

Analysis of Variance Testing

(1) Suppose that 34 students are divided into four categories: those with sports extracurricular activities only, those with non-sport activities only, those with both sports and non-sports activities, and those with neither type of activity. Each student was asked how many hours he/she worked per week at paid employment, if any. Based on the results, the sums of squares between and within were computed. Complete the analysis of variance table presented below. Then test at the 5% level whether the average number of hours worked per week varies with activity category.

Source of variation   Sum of squares   Degrees of freedom   Mean square
Between               450.00           ____                 ____
Within                1500.00          ____                 ____
Total                 ____             ____

(2) Suppose there are 4 years (freshman, sophomore, junior, and senior), and 2 housing statuses (commuter and resident), for a total of 8 cells. Each cell has 4 observations. The number of credits each student is carrying in the current semester is examined, and the various sums of squares are computed. Complete the analysis of variance table presented below. Then test at the 5% level whether the average number of credits carried is influenced by (a) class year, (b) housing status, and (c) the interaction of class year and housing status.

Source of variation   Sum of squares   Degrees of freedom   Mean square
Class year            30               ____                 ____
Housing status        4                ____                 ____
Interaction           24               ____                 ____
Error                 48               ____                 ____
Total                 ____             ____

Simple Regression

Consider the following data on the heights and weights of 30 students.
Use these data to answer the questions below.

height    69   68   74   67   64   65   74   73   62   71
weight   160  225  175  125  109  132  185  185  112  165

height    73   66   66   72   63   67   71   69   63   69
weight   205  140  120  200  104  175  172  160  135  175

height    67   67   65   64   68   70   80   64   75   66
weight   143  150  120  115  160  185  200  115  215  140

(1) Estimate the regression line of weight on height, $\widehat{WGT} = a + b \cdot HGT$.

(2) Calculate the standard error of the regression (or standard error of the estimate).

(3) Calculate and interpret the coefficient of determination.

(4) Calculate the standard error of the estimated coefficient b. Use this information to test at the 5% level whether the true slope of the relation between height and weight is actually zero.

(5) Calculate the 95% confidence interval for the true slope of the relation between height and weight.

(6) Calculate the sample correlation coefficient r. Test at the 5% level whether the population correlation coefficient is actually zero.

(7) Calculate the 90% forecasting interval for the weight of an individual student whose height is 5 feet 9 inches.

(8) Calculate the 90% forecasting interval for the average weight of a large group of students whose heights are all 5 feet 9 inches.

Multiple Regression

Suppose that a regression is run using the number of hours of study time per week (STUDY) as the dependent variable. There are 35 observations. The independent variables are

WKHRS:      the number of hours worked per week at a job,
COMMUTER:   dummy variable equal to one for commuting students, and 0 for resident students,
MALE:       dummy variable equal to one if the student is male, and 0 if the student is female,
SENIOR:     dummy variable equal to one if the student is a senior, and 0 otherwise.

The results are as follows.

Variable    Coefficient   Standard error
Constant    20.0          10.0
WKHRS       -0.5          0.125
COMMUTER    2.0           2.0
MALE        -3.0          6.0
SENIOR      6.0           2.0

Analysis of Variance
Source of variation   Sum of squares   Degrees of freedom   Mean square
Regression            160.0            ____                 ____
Error                 40.0             ____                 ____
Total                 200.0            ____

(1) Complete the ANOVA table.

(2) Compute the standard error of the estimate (or standard error of the regression).

(3) Compute and interpret the unadjusted coefficient of determination.

(4) Compute the coefficient of determination adjusted for degrees of freedom.

(5) Test at the 5% level whether the coefficient on the variable MALE is equal to zero.

(6) Test at the 5% level whether the coefficient on the variable SENIOR is equal to zero.

(7) Test at the 5% level the null hypothesis that the coefficient on the variable COMMUTER is equal to zero, versus the alternative that it is more than zero.

(8) Test at the 5% level the null hypothesis that the coefficient on the variable WKHRS is equal to zero, versus the alternative that it is less than zero.

(9) Test at the 5% level the hypothesis that all the slope coefficients are zero.

(10) How much does expected study time change if a student works an additional hour at a job? Specify whether this change is an increase or a decrease in study time.

(11) According to the regression results, do seniors study more or less than non-senior students? By how much more or less than non-seniors do seniors study?

Time Series

Suppose a student takes an intensive summer school course. The course meets all day Monday, Tuesday, Wednesday, and Thursday for four weeks. A short quiz is given each day. Suppose the student's quiz grades are as follows. Answer questions 1 to 3 based on these data.

week    I               II              III             IV
day     M   Tu  W   Th  M   Tu  W   Th  M   Tu  W   Th  M   Tu  W   Th
grade   4   5   8   6   6   8   5   7   6   7   9   8   8   8   9   7

(1) Using four-day moving averages, compute the "seasonal" or "daily" index for each day of the week (instead of each season).

(2) Based on your daily indices, on which day does the student usually perform the best? On which day does the student usually perform the worst?

(3) Use your daily indices to adjust the time series of quiz grades.

(4) Consider the following sequence of quiz grades. Calculate the grade forecasts for quizzes 2 to 10, using the exponential smoothing method. Let the forecast F1 for the first quiz grade be the actual value A1 for the first quiz. Use a weight on the actual grade of w = 0.5.

actual grade   forecasted grade for next quiz
(A1) 5         (F2)  ____
(A2) 8         (F3)  ____
(A3) 6         (F4)  ____
(A4) 8         (F5)  ____
(A5) 5         (F6)  ____
(A6) 7         (F7)  ____
(A7) 6         (F8)  ____
(A8) 7         (F9)  ____
(A9) 9         (F10) ____

Nonparametric Tests

(1) Suppose the grades on an exam for the male and female students in a class were as indicated below. Use the Wilcoxon rank sum test to test at the 5% level whether males and females did equally well. (Note: If two students tie for ranks 1 and 2, they both get the "middle" rank of 1.5, and the student following them is ranked 3. If three students tie for ranks 1, 2, and 3, they all get the "middle" rank of 2, and the student following them is ranked 4. If four students tie for ranks 1, 2, 3, and 4, they all get the "middle" rank of 2.5, and the student following them is ranked 5.)

males     97  94  86  86  83  81  78  76  76  69  63  56  55  51  47  38  32  21  20  3
females   97  92  89  86  86  85  77  76  70  56  49  41  39  29  15

(2) Two drivers are testing the mileage of different models of cars.

a. The gas mileage of nine different cars is as indicated. Use the Wilcoxon signed rank test to test at the 5% level whether there is a difference in the gas mileage for the two drivers based on the data below. (Use the table for the signed rank critical values for this part.)

driver 1   14.3  15.0  27.8  27.9  48.8  16.8  23.7  32.8  37.3
driver 2   16.8  17.8  26.2  33.2  47.6  18.3  28.5  33.1  44.0

b. Thirty-three cars were tested. Differences in mileage were calculated and ranked. There were three differences that were equal to zero.
The sum of the positive ranks was 200 and the sum of the negative ranks was 265. Use the Wilcoxon signed rank test to test at the 5% level whether there is a difference in the gas mileage for the two drivers.

(3) Suppose scores for 17 students from 3 schools in intermural competitions are as given below. Use the Kruskal-Wallis Test to test at the 5% level whether average scores for students from the three schools are the same.

School A: 27, 21, 30, 23, 18, 19
School B: 22, 29, 17, 26, 14, 16
School C: 24, 12, 11, 13, 28

(4) Suppose that in a class of 15 males and 10 females, course averages are such that the males (m) and females (f) rank from high to low as given below. Test whether the arrangement is random, at the 10% level.

f f m m m f m f f f m m m m m f f m m m f m m m f
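
The tie rule in the note to Nonparametric Tests question (1) (tied students all receive the "middle" of the ranks they occupy) can be sketched in a few lines of Python. This is only an illustrative helper, not part of the exercises: the function name `midranks`, the `descending` flag, and the sample scores are assumptions made for the example, and ranking from high to low is assumed because the note does not state a direction.

```python
def midranks(values, descending=True):
    """Rank values 1..n; tied values share the average of the ranks they span.

    Hypothetical helper for illustration. `descending=True` gives rank 1
    to the largest value (an assumption, not stated in the exercise).
    """
    # Indices of the values in rank order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Average of the 1-based positions start+1 .. end+1.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Made-up scores reproducing the three cases described in the note.
two_way = midranks([90, 90, 80])            # ranks 1 and 2 tied
three_way = midranks([70, 70, 70, 60])      # ranks 1, 2, 3 tied
four_way = midranks([5, 5, 5, 5, 1])        # ranks 1, 2, 3, 4 tied
print(two_way, three_way, four_way)
```

For the rank sum test, the same idea would be applied to the combined list of male and female grades, after which the ranks belonging to each group are summed.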