252x0821 3/30/08
ECO252 QBA2 SECOND EXAM March 28, 2008
Name ___________________ Class Hour ___________________

Show your work! Make diagrams! Exam is normed on 50 points. Answers without reasons are not usually acceptable.

I. (8 points) Do all the following. Make diagrams! $x \sim N(11, 13)$. If you are not using the supplement table, make sure that I know it.

1. $P(0 \le x \le 54)$
2. P(x  16)
3. $P(12 \le x \le 41)$
4. $x_{.055}$ (Do not try to use the t table to get this.)

II. (5+ points) Do all the following. Look them over first. There is a section III in the in-class exam and the computer problem is at the end. Show your work where appropriate. There is a penalty for not doing Problem 1. Page 11 is left blank if you need more space for calculations.

Note the following:
1. This test is normed on 50 points, but there are more points possible including the take-home. You are unlikely to finish the exam and might want to skip some questions.
2. A table identifying methods for comparing 2 samples is at the end of the exam.
3. If you answer ‘None of the above’ in any question, you should provide an alternative answer and explain why. You may receive credit for this even if you are wrong.
4. Use a 5% significance level unless the question says otherwise.
5. Read problems carefully. A problem that looks like a problem on another exam may be quite different.
6. Make sure that you state your null and alternative hypothesis, that I know what method you are using and what the conclusion is when you do a statistical test. Use a significance level of .05 unless you are told otherwise.

1. You wish to assess the stability of the price of a stock and you find closing prices for the last year. Rather than computing a variance of the entire population you take a sample of seven randomly picked closing prices and compute a sample standard deviation. The sample is below. Compute the sample standard deviation. Show your work!
(3)

Row    x
  1   89
  2  124
  3   56
  4   94
  5   75
  6   82
  7   63

For your convenience the sum of the first six numbers in x is $\sum_{i=1}^{6} x_i = 520$ and the sum of the first six numbers squared is $\sum_{i=1}^{6} x_i^2 = 47618$.

2) You wish to compare this stock against a second stock that your friend recommends. Your friend has taken a random sample of 10 closing prices and assures us that the sample mean price of this stock is 117.699 and the sample standard deviation is 55.2764. You don’t like your friend’s stock because 1) it has a larger variance, indicating that it is riskier, and 2) it costs more per share. The values you get are in the y column, with $z_y$ being the y values with the 117.699 subtracted and the result divided by 55.2764. Compare the variances using a statistical test of the equality of variances. (2)

Row       y     zy
  1   78.48  -0.71
  2  130.93   0.24
  3   93.17  -0.44
  4  105.37  -0.22
  5   69.50  -0.87
  6   85.43  -0.58
  7  102.84  -0.27
  8  259.27   2.56
  9  151.17   0.61
 10  100.83  -0.31

3) Are you sure that stock y has a higher average price than stock x? Using the results of 2) compare the mean prices. If you do not assume equality of variances assume that you can use 14 degrees of freedom for the test. (3)

4) Test stock y to see if it has a Normal distribution. How do your results from the test of Normality affect your assessment of the results in 2) and 3)? (4)

5) Using the sample means and standard deviations you found in 2) and 3) but assuming that both samples are of size 100 and come from a Normal distribution, do an 11% confidence interval for the difference between the means. (2) [14]

III. (18+ points) Do as many of the following as you can. (2 points each unless noted otherwise). Look them over first. The computer problem is at the beginning. Show your work where appropriate.

1. Computer question.
a) Turn in your first computer output. Only do b, c and d if you did.
(3)
b) (Meyer and Krueger) A corporation rents apartments within the city of Phoenix and in the surrounding suburbs. It wishes to verify that the mean rent in the city is lower than in the suburbs. Two independent random samples are taken. These appear below.

City 401.84 666.95 804.01 611.09
Suburb 458.98 994.09 810.44 764.69 815.86 755.37 715.30 314.14
584.52 650.46 904.77 587.72 870.44 970.26 639.96 657.92
403.64 617.37 444.47 567.60 574.94 538.26 506.58 695.45
752.60 735.26 313.08 752.66 398.33 732.83 667.61 762.35
670.29 458.07 396.20 656.04 676.23 364.37 953.06 728.25
187.23 878.82 720.20 745.79 793.68 764.80 879.91 737.99
566.75 279.74 918.40 654.05 841.70 648.31 1106.17 919.93

What are your null and alternative hypotheses? Three tests appear below. Which is correct for your null hypotheses? (3) $\alpha = .01$

Test 1.
MTB > TwoSample c1 c2;
SUBC> Confidence 99.0.

Two-Sample T-Test and CI: City, Suburb
Two-sample T for City vs Suburb
         N  Mean  StDev  SE Mean
City    30   590    169       31
Suburb  30   743    189       34
Difference = mu (City) - mu (Suburb)
Estimate for difference: -153.0
99% CI for difference: (-276.2, -29.7)
T-Test of difference = 0 (vs not =): T-Value = -3.31 P-Value = 0.002 DF = 57

Test 2.
MTB > TwoSample c1 c2;
SUBC> Confidence 99.0;
SUBC> Alternative -1.

Two-Sample T-Test and CI: City, Suburb
Two-sample T for City vs Suburb
         N  Mean  StDev  SE Mean
City    30   590    169       31
Suburb  30   743    189       34
Difference = mu (City) - mu (Suburb)
Estimate for difference: -153.0
99% upper bound for difference: -42.3
T-Test of difference = 0 (vs <): T-Value = -3.31 P-Value = 0.001 DF = 57

Test 3.
MTB > TwoSample c1 c2;
SUBC> Confidence 99.0;
SUBC> Alternative 1.

Two-Sample T-Test and CI: City, Suburb
Two-sample T for City vs Suburb
         N  Mean  StDev  SE Mean
City    30   590    169       31
Suburb  30   743    189       34
Difference = mu (City) - mu (Suburb)
Estimate for difference: -153.0
99% lower bound for difference: -263.7
T-Test of difference = 0 (vs >): T-Value = -3.31 P-Value = 0.999 DF = 57

3) From the output, but using the correct format for a confidence interval, what is an appropriate confidence interval to correct your hypotheses? (2)
4) What is your conclusion? Why? (2)
5) What method was used by the computer? D1, D2, D3, D4, D5a, D5b, D6a, D6b, D7? (1)
6) The following tests were run after the original hypotheses tests. What do they tell us about the appropriateness of the method? Why? (1) [12]

MTB > NormTest c1;
SUBC> KSTest.
[Probability Plot of City]

MTB > NormTest c2;
SUBC> KSTest.
[Probability Plot of Suburb]

2. A ‘robust’ test procedure is one that
a) Can only be done with a computer
b) Requires an underlying Normal distribution
c) Is sensitive to slight violations of its assumptions.
d) Is insensitive to slight violations of its assumptions.

3. (Ng -219, 18) Assume that you have the following information: $s_1^2 = 4$, $s_2^2 = 6$, $n_1 = 16$ and $n_2 = 25$ and you wish to do a pooled-variance t test. Your $\hat{s}_p$ and degrees of freedom are (3) [17]
a) 2.45, 41    b) 2.24, 41    c) 2.29, 41    d) 2.00, 41
e) 2.45, 39    f) 2.24, 39    g) 2.29, 39    h) 2.00, 39
i) 2.45, 16    j) 2.24, 16    k) 2.29, 16    l) 2.00, 16
m) 2.45, 25    n) 2.24, 25    o) 2.29, 25    p) 2.00, 25
e) It’s more appropriate to add standard errors and use z

4. If I want to test to see if the mean of $x_1$ is smaller than the mean of $x_2$ my null hypothesis is: (Note: $D = \mu_1 - \mu_2$) Only check one answer! (2)
a) $\mu_1$  $\mu_2$ and $D$  0
b) $\mu_1$  $\mu_2$ and $D$  0
c) $\mu_1$  $\mu_2$ and $D$  0
d) $\mu_1$  $\mu_2$ and $D$  0
e) $\mu_1$  $\mu_2$ and $D$  0
f) $\mu_1$  $\mu_2$ and $D$  0
g) $\mu_1$  $\mu_2$ and $D$  0
h) $\mu_1$  $\mu_2$ and $D$  0

5. Consumers are asked to take the Pepsi Challenge. They were asked which cola they preferred and the number that preferred Pepsi was recorded.
Sample 1 was males and sample 2 was females. The following was run on Minitab. [19]

MTB > PTwo 109 46 52 13;
SUBC> Pooled.

Test and CI for Two Proportions
Sample   X    N  Sample p
1       46  109  0.422018
2       13   52  0.250000
Difference = p (1) - p (2)
Estimate for difference: 0.172018
95% CI for difference: (0.0221925, 0.321844)
Test for difference = 0 (vs not = 0): Z = 2.12 P-Value = 0.034

On the basis of the printout above we can say one of the following.
a) At a 99% confidence level we can say that we have enough evidence to state that the proportion of men that prefer Pepsi differs from the proportion of women that prefer Pepsi
b) At a 95% confidence level we can say that we have enough evidence to state that the proportion of men that prefer Pepsi differs from the proportion of women that prefer Pepsi
c) At a 99% confidence level we can say that we have enough evidence to state that the proportion of men that prefer Pepsi equals the proportion of women that prefer Pepsi.
d) At a 96% confidence level there is insufficient evidence to indicate that the proportion of men that prefer Pepsi differs from the proportion of women that prefer Pepsi

6. (Lenzi) A group of runners run a 100 meter dash before and after running a marathon. Their times are shown below. Pat, how long did they wait before running the second dash?

Row  Before  After     d
  1    12.4   12.6  -0.2
  2    11.8   12.2  -0.4
  3    12.5   12.4   0.1
  4    12.0   12.7  -0.7
  5    11.5   12.0  -0.5
  6    11.2   11.8  -0.6
  7    12.9   12.7   0.2

Minitab printed out the following statistics.

Variable  N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
Before    7   0  12.043    0.226  0.597   11.200  11.500  12.000  12.500   12.900
After     7   0  12.343    0.134  0.355   11.800  12.000  12.400  12.700   12.700
d         7   0  -0.300    0.131  0.346   -0.700  -0.600  -0.400   0.100    0.200

Can we show that they were slower after the marathon?
a) How many degrees of freedom do we have in this problem? (1)
b) What are our null and alternative hypotheses? (1)
c) What is the approximate p-value for our result?
(3) Show your work! (2 points if you do not do a p-value)
d) On the basis of your p-value, what is our conclusion if the confidence level is 95%? Why? (1)
e) What if the confidence level is 99%? Why? (You do not need a p-value to answer this part of the question, though it would help.) (1) [26]

7. A researcher takes independent random samples of salaries of 18 women (Sample 1) and 18 men (Sample 2) who are fairly recent Business graduates with the following results.

Row  Women    Men  Difference
  1  64709  40824       23885
  2  47105  54465       -7360
  3  28972  68433      -39461
  4  31449  54941      -23492
  5  42574  54050      -11476
  6  59051  53043        6008
  7  26838  45680      -18842
  8  56651  40399       16252
  9  64929  57584        7345
 10  57497  78224      -20727
 11  38290  53722      -15432
 12  67106  34915       32191
 13  67280  59636        7644
 14  40826  50499       -9673
 15  60826  53502        7324
 16  46207  77186      -30979
 17  58976  60208       -1232
 18  45809  48381       -2572

Minitab gives the following statistics.

Descriptive Statistics: Women, Men, d
Variable   N  N*   Mean  SE Mean  StDev
Women     18   0  50283     3156  13392
Men       18   0  54761     2708  11488
d         18   0  -4478     4462  18929

Test the statement that women have a significantly lower salary than men.
a) What are your null and alternative hypotheses?
b) (Extra Credit) If you do not assume that variances of the two samples are equal
(i) How many degrees of freedom do you have? (4)
(ii) If you use the formula $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$, what is the value of $s_{\bar{d}}$? (2)
(iii) Compute the t ratio and test the hypothesis, clearly stating your conclusions. $\alpha = .05$ (2)
c) If you assume that variances of the two samples are equal
(i) How many degrees of freedom do you have? (1)
(ii) If you use the formula $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$, what is the value of $s_{\bar{d}}$? (3)
(iii) Compute the t ratio and test the hypothesis, clearly stating your conclusions. $\alpha = .05$ (2) [32]

8. (Meyer and Krueger again) Back to the Phoenix problem. The people in the problem on page 3 are still obsessing over the relationship of rents to whether an apartment is urban or suburban.
The computer output from a Chi-Squared test is below.

Results for: 251x0821-06.MTW
MTB > WSave "C:\Documents and Settings\RBOVE\My Documents\Minitab\251x0821-06.MTW";
SUBC> Replace.
Saving file as: 'C:\Documents and Settings\RBOVE\My Documents\Minitab\251x0821-06.MTW'
MTB > print c1-c3

Data Display
Row  Rent     City  Suburb
  1  <500       48       2
  2  500-599    51      11
  3  600-699    30      17
  4  700 up     22      19

MTB > ChiSquare c2 c3.

Chi-Square Test: City, Suburb
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

        City  Suburb  Total
1         48       2     50
       37.75   12.25
       2.783   8.577

2         51      11     62
       46.81   15.19
       0.375   1.156

3         30      17     47
       35.48   11.52
       0.848   2.613

4         22      19     41
       30.95   10.05
       2.591   7.983

Total    151      49    200

Chi-Sq = 26.925, DF = 3, P-Value = 0.000

a) The above is a Chi-squared test of (1)
i) independence
ii) homogeneity
iii) goodness-of-fit
iv) none of the above
b) What is the null hypothesis of this test and, assuming a 95% confidence level, what is the conclusion? (2) [35]

9. Back to Phoenix again. The people in the previous Phoenix problems are now sure that the distribution of rents is not Normal but skewed to the right. They select a random sample of 10 rents in the city and another random sample of 10 rents in the suburbs (1-City, 2-Suburb). The researchers now believe that rentals in the city are lower than rentals in the country. The researchers will do the following test.
a) T-test of paired data
b) Wilcoxon signed rank test
c) T-test of means of independent samples
d) Wilcoxon-Mann-Whitney test
e) None of the above

10. Assuming that a rank test of some sort is done in Problem 9, what will be our null hypothesis, and, assuming that the smaller of the two sums of ranks is 44 and that we are working with a 95% confidence level, what will be our conclusion and why?
(3) [40]

Blank page for calculations

ECO252 QBA2 SECOND EXAM March 28, 2008
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
Class hours registered and attended (if different): _________________________

IV. Neatness Counts! Show your work! Always state your hypotheses and conclusions clearly. (19+ points). In each section state clearly what number you are using to personalize data. There is a penalty for failing to include your student number on this page, not clarifying version number in each section and not including class hour somewhere. Please write on only one side of the paper. Be prepared to turn in your Minitab output for the first computer problem and to answer the questions on the problem sheet about it or a similar problem.

1. (Moore, McCabe et al.) A large public university took a survey of 865 students to find out if there was a relationship between the chosen major and whether the students had student loans. The students’ majors were categorized as Agriculture, Child Development, Engineering, Liberal Arts, Business, Science and Technology. Before you start, personalize the data as follows. Let a be the second-to-last digit of your student number. Change the number of Science majors with loans to 31 a and the number of business majors who have loans to 24 a for every part of this problem. The total number of students in the survey will not change. Put your version of the table below on top of the first page of your solution. Use a 99% confidence level in this problem.

       Ag  Ch  Engg  Lib  Bus  Sci  Tech
Loan   32  37    98   89   24   31    57
None   35  50   137  124   51   29    71

a) Compute the proportion of non-science majors that have loans in order to test the hypothesis that science majors are more likely to have loans than other majors. Tell which group you consider sample 1.
State $H_0$ and $H_1$ in terms of the proportions involved and also in terms of the difference between the proportions, explaining whether this difference is a statistic from sample 1 minus a statistic from sample 2 or the reverse. (1)
b) Use a test ratio to test your hypotheses from a) (2)
c) Use a critical value for the difference between proportions to test your hypotheses from a) (2)
d) Use an appropriate confidence interval to test your hypotheses from a) (2)
e) Treat each major separately and test the hypothesis that the proportion of students that have loans is independent of major (4)
f) If you did section 1e, follow your analysis with a Marascuilo procedure to compare the proportion of business students that have loans with the proportions for the other 6 majors. Tell which differences are significant. (3) [14]
g) (Extra credit) Check your results using Minitab.
(i) To do a chi-squared test on an O table that is in Columns c22-c28, simply put the row labels in Column c21 and print out your data. Then type in ChiSquare c22-c28. The computer will print back the columns with their names, but below each number from the O table you will find the corresponding values of E and $\frac{(O - E)^2}{E}$, the contribution of the value of O to the chi-square total. Use the p-value to find out if we reject the hypothesis of equal proportions at the 1% significance level.
(ii) To do a test of the alternative hypothesis $H_1: p_1 > p_2$, where $p_1 = \frac{x_1}{n_1}$ and $p_2 = \frac{x_2}{n_2}$, use the command below, substituting your numbers for $x_1$, $n_1$, $x_2$ and $n_2$.

MTB > PTwo x1 n1 x2 n2;
SUBC> Confidence 99.0;
SUBC> Alternative 1;
SUBC> Pooled.

The computer will print back $x_1$, $n_1$, $p_1 = \frac{x_1}{n_1}$, $x_2$, $n_2$ and $p_2 = \frac{x_2}{n_2}$, a p-value for a z-test and Fisher’s exact test (results should be somewhat similar to the z-test) and a 1-sided 99% confidence interval.

2. (Moore, McCabe et
al.) An absolutely tactless psychology professor has divided faculty members into categories the professor labels ‘Fat’ and ‘Fit’. A random sample of scores on a test of ‘ego strength’ of the ‘Fat’ faculty is labeled $x_1$. A sample of ‘ego strength’ of the ‘Fit’ faculty is labeled $x_2$. $d = x_1 - x_2$. Use a 95% confidence level in this problem. The professor has computed
$\sum x_1$ = Sum of Fat scores = 64.96,
$\sum x_1^2$ = Sum of squares of Fat scores = 307.607,
$\sum x_2$ = Sum of scores of Fit = 90.02,
$\sum x_2^2$ = Sum of squares of Fit scores = 581.239,
$\sum d$ = Sum of diff = -25.06 and
$\sum d^2$ = Sum of squares of diff = 51.8198.

Row   Fat   Fit   Diff
       x1    x2   d = x1 - x2
  1  4.99  6.68  -1.69
  2  4.24  6.42  -2.18
  3  4.74  7.32  -2.58
  4  4.93  6.38  -1.45
  5  4.16  6.16  -2.00
  6  5.53  5.93  -0.40
  7  4.12  7.08  -2.96
  8  5.10  6.37  -1.27
  9  4.47  6.53  -2.06
 10  5.30  6.68  -1.38
 11  3.12  5.71  -2.59
 12  3.77  6.20  -2.43
 13  5.09  6.04  -0.95
 14  5.40  6.52  -1.12

To personalize the data remove row b, where b is the last digit of your student number. Please state clearly what row you removed. At this point you will have $n_1 = n_2 = 13$ rows of data. You will need the mean and variance of all three columns of data if you do all sections of this problem. You can save yourself considerable effort by using the computational formula for the variance with the sums and sums of squares that the professor computed, with the value or value squared of the numbers you removed subtracted. The professor got the following results.

Variable   n    Mean  SE Mean  StDev  Median
Fat       14   4.640    0.184  0.690   4.835
Fit       14   6.430    0.115  0.431   6.400
diff      14  -1.790    0.196  0.732  -1.845

Your results should be relatively similar. Credit for computing the sample statistics needed is included in the relevant parts of this problem. State hypotheses and conclusions clearly in each segment of the problem.

a) Assume that $x_1$ and $x_2$ are independent random samples and test the hypothesis that the population mean of the ego strength of the ‘fit’ faculty is above the population mean of the ‘fat’ faculty.
Assume that the data comes from the Normal distribution and that the variances for the ‘fit’ and ‘fat’ populations are similar. (3)
b) (Extra credit) Assume that $x_1$ and $x_2$ are independent random samples and test the hypothesis that the population mean of the ego strength of the ‘fit’ faculty is above the population mean of the ‘fat’ faculty. Assume that the data comes from the Normal distribution and that the variances for the ‘fit’ and ‘fat’ populations are not similar. (3)
c) Assume that $x_1$ and $x_2$ are independent random samples. How would we decide whether the method in a) or b) is correct? Do the appropriate test. Assume that the data comes from the Normal distribution. Should we have used a) or b)? (2) [22]
d) Compute the mean and variance of the column of differences and test the column to see if the Normal distribution works for these data. (4)
e) Assume that we had rejected the hypothesis that the distributions in the populations that the columns come from are Normal. Do a one-sided test to see whether the ego strength of the ‘Fat’ and ‘Fit’ people differs. (2)
f) In the remainder of this problem assume that the $x_1$ and $x_2$ columns are not independent random samples but instead represent the ego strength of the same 14 or 13 faculty members before and after a fitness program. Assuming that the Normal distribution applies, can we say that the ego strength of the faculty has increased? (2)
g) Repeat f) under the assumption that the Normal distribution does not apply. (1)
h) Use the Wilcoxon signed rank test to test to see if the median of the d column is -2. (2) [35]
i) Extra credit. Use Minitab to check your work. The commands that you might need are as follows. Remember that the subcommand ’Alternative -1’ gives a left-sided test and ’Alternative +1’ gives a right-sided test. If this subcommand is not used a 2-sided test will appear. The basic command to compare two means for data in c2 and c3 is

MTB > TwoSample c2 c3.

This will produce a 2-sided test using Method D3. A semicolon followed by the Alternative subcommand will produce a 1-sided test. Adding the subcommand ’Pooled’ switches the method to D2. Remember that a semicolon tells Minitab that a subcommand is coming and a period tells Minitab that the command is complete. To use Method D4 on the same two columns use the command

MTB > Paired c2 c3.

This also can be modified with the Alternative command. To test C4 for Normality using a Lilliefors test use

MTB > NormTest c4;
SUBC> KSTest.

There are two other tests for Normality baked into Minitab. These are the Anderson-Darling test and the Ryan-Joiner test. The graph produced by any of these can be analyzed by the Fat Pencil Test. To get a basic explanation of these tests use the Stat pull-down menu, hit Basic Statistics and then Normality Test. Finally hit ‘help’ and investigate the topics available. There will be a small bonus for those of you who mention Minitab’s problems with English grammar. To use the Anderson-Darling test, use the NormTest command without a subcommand. To use the Ryan-Joiner test use

MTB > NormTest c4;
SUBC> RJTest.

A really impressive paper might compare the results of the 3 tests and then show the results of an internet search on the differences between them.

The other two tests that are relevant here can be accessed by using the Stat pull-down menu and the Nonparametrics option. The instruction for a left-sided (Wilcoxon)-Mann-Whitney test would be

MTB > Mann-Whitney 95.0 c2 c3;
SUBC> Alternative -1.

Minitab’s instructions for a 2-sided Wilcoxon signed rank test of a median of -2 from one sample in C4 would be

MTB > WTest -2 c4.

To do a one-sided test comparing samples in two columns take $d = x_1 - x_2$ and do a test that the median of d is zero. Again Alternative can be used to get a 1-sided test.

Also there is some advice from last term’s Take-home.
To fake computation of a sample variance or standard deviation of the data in column c1 using column c2 for the squares,

MTB > let C2 = C1*C1          * performs multiplication. ** would do a power, but multiplication is more accurate.
MTB > name k1 'sum'
MTB > name k2 'sumsq'
MTB > let k1 = sum(c1)
MTB > let k2 = sum(c2)        This is equivalent to let k2 = ssq(c1)
MTB > print k1 k2             This is a progress report for my data set.

Data Display
sum      3047.24
sumsq    468657

MTB > name k1 'meanx'
MTB > let k1 = k1/count(c1)   / means division. Count gives n.
MTB > let k2 = k2 - (count(c1))*k1*k1
MTB > print k1 k2

Data Display
meanx    152.362
sumsq    4372.53

MTB > name k2 'varx'
MTB > let k2 = k2/((count(c1))-1)
MTB > print k1 k2

Data Display
meanx    152.362
varx     230.133

MTB > name k2 'stdevx'
MTB > let k2 = sqrt(k2)       Sqrt gives a square root.
MTB > print k1 k2

Data Display
meanx    152.362
stdevx   15.1701

Print C1, C2

To check for equal variances for data in C1 and C2, use

MTB > VarTest c1 c2;
SUBC> Unstacked.

Both an F test and a Levene test will be run. The Levene test is for non-Normal data so you want the F test results.

To check your mean and standard deviation, use

MTB > describe C1

To put the items in column C1 in order in column C2, use

MTB > Sort c1 c2;
SUBC> By c1.

3. Sorry. This is all I’ve got.

The methods were listed in the outline in the following table.

                                                       Paired Samples  Independent Samples
Location - Normal distribution. Compare means.         Method D4       Methods D1-D3
Location - Distribution not Normal. Compare medians.   Method D5b      Method D5a
Proportions                                            Method D6b      Method D6a
Variability - Normal distribution. Compare variances.                  Method D7
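
The computational formula for the variance, used in the Minitab session from last term's take-home and recommended in take-home Problem 2, can also be sketched in Python. This is only an illustration: the function names `stats_from_sums` and `drop_value` are made up for this sketch, and removing row 1 is a hypothetical personalization, not anyone's actual version. The sums are the ones quoted for the 14 'Fat' scores.

```python
import math


def stats_from_sums(n, total, total_sq):
    """Return (mean, variance, standard deviation) from n, sum of x and sum of x squared.

    Uses the computational formula s^2 = (sum of x^2 - n * mean^2) / (n - 1).
    """
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance, math.sqrt(variance)


def drop_value(n, total, total_sq, removed):
    """Adjust n, the sum and the sum of squares after deleting one observation."""
    return n - 1, total - removed, total_sq - removed * removed


# Sums quoted in take-home Problem 2 for the 14 'Fat' scores.
n, total, total_sq = 14, 64.96, 307.607
mean, variance, stdev = stats_from_sums(n, total, total_sq)
# Rounded to three places these agree with the professor's 4.640 and 0.690.
print(round(mean, 3), round(stdev, 3))

# Hypothetical personalization: remove row 1, whose Fat score is 4.99.
n13, total13, total_sq13 = drop_value(n, total, total_sq, 4.99)
mean13, variance13, stdev13 = stats_from_sums(n13, total13, total_sq13)
print(n13, round(mean13, 3), round(stdev13, 3))
```

Working from the sums means that only one subtraction per sum is needed after a row is removed, which is the effort-saving step the problem describes.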