252y0821 10/20/08

ECO252 QBA2
SECOND EXAM
March 28, 2008
TAKE HOME SECTION

Name: _________________________
Student Number: _________________________
Class hours registered and attended (if different): _________________________

IV. Neatness Counts! Show your work! Always state your hypotheses and conclusions clearly. (19+ points). In each section, state clearly what number you are using to personalize data. There is a penalty for failing to include your student number on this page, for not clarifying the version number in each section, and for not including your class hour somewhere. Please write on only one side of the paper. Be prepared to turn in your Minitab output for the first computer problem and to answer the questions on the problem sheet about it or a similar problem.

1. (Moore, McCabe et al.) A large public university took a survey of 865 students to find out whether there was a relationship between the chosen major and whether the students had student loans. The students' majors were categorized as Agriculture, Child Development, Engineering, Liberal Arts, Business, Science and Technology. Before you start, personalize the data as follows. Let $a$ be the second-to-last digit of your student number. Change the number of Science majors with loans to $31 + a$ and the number of Business majors who have loans to $24 - a$ for every part of this problem. The total number of students in the survey will not change. Put your version of the table below on top of the first page of your solution. Use a 99% confidence level in this problem.

Major   Loan   None
Ag        32     35
Ch        37     50
Engg      98    137
Lib       89    124
Bus       24     51
Sci       31     29
Tech      57     71

a) Compute the proportion of non-science majors that have loans in order to test the hypothesis that science majors are more likely to have loans than other majors. Tell which group you consider sample 1.
State $H_0$ and $H_1$ in terms of the proportions involved and also in terms of the difference between the proportions, explaining whether this difference is a statistic from sample 1 minus a statistic from sample 2 or the reverse. (1)

b) Use a test ratio to test your hypotheses from a). (2)

c) Use a critical value for the difference between proportions to test your hypotheses from a). (2)

d) Use an appropriate confidence interval to test your hypotheses from a). (2)

e) Treat each major separately and test the hypothesis that the proportion of students that have loans is independent of major. (4)

f) If you did section 1e, follow your analysis with a Marascuilo procedure to compare the proportion of business students that have loans with the proportions for the other 6 majors. Tell which differences are significant. (3) [14]

g) (Extra credit) Check your results using Minitab.

(i) To do a chi-squared test on an O table that is in columns c22-c28, simply put the row labels in column c21 and print out your data. Then type in ChiSquare c22-c28. The computer will print back the columns with their names, but below each number from the O table you will find the corresponding values of $E$ and $(O - E)^2/E$, the contribution of the value of $O$ to the chi-square total. Use the p-value to find out whether we reject the hypothesis of equal proportions at the 1% significance level.

(ii) To do a test of the alternative hypothesis $H_1: p_1 > p_2$, where $p_1 = x_1/n_1$ and $p_2 = x_2/n_2$, use the command below, substituting your numbers for $x_1$, $n_1$, $x_2$ and $n_2$.

MTB > PTwo x1 n1 x2 n2;
SUBC> Confidence 99.0;
SUBC> Alternative 1;
SUBC> Pooled.

The computer will print back $x_1$, $n_1$, $p_1 = x_1/n_1$, $x_2$, $n_2$ and $p_2 = x_2/n_2$, a p-value for a z-test and Fisher's exact test (results should be somewhat similar to the z-test) and a 1-sided 99% confidence interval.

2. (Moore, McCabe et al.) An absolutely tactless psychology professor has divided faculty members into categories the professor labels ‘Fat’ and ‘Fit’. A random sample of scores on a test of ‘ego strength’ of the ‘Fat’ faculty is labeled $x_1$. A sample of ‘ego strength’ of the ‘Fit’ faculty is labeled $x_2$, and $d = x_1 - x_2$. Use a 95% confidence level in this problem. The professor has computed

$\sum x_1 = 64.96$ (sum of Fat scores)
$\sum x_1^2 = 307.607$ (sum of squares of Fat scores)
$\sum x_2 = 90.02$ (sum of Fit scores)
$\sum x_2^2 = 581.239$ (sum of squares of Fit scores)
$\sum d = -25.06$ (sum of differences)
$\sum d^2 = 51.8198$ (sum of squares of differences)

Row   Fat x1   Fit x2   Diff d = x1 - x2
 1     4.99     6.68     -1.69
 2     4.24     6.42     -2.18
 3     4.74     7.32     -2.58
 4     4.93     6.38     -1.45
 5     4.16     6.16     -2.00
 6     5.53     5.93     -0.40
 7     4.12     7.08     -2.96
 8     5.10     6.37     -1.27
 9     4.47     6.53     -2.06
10     5.30     6.68     -1.38
11     3.12     5.71     -2.59
12     3.77     6.20     -2.43
13     5.09     6.04     -0.95
14     5.40     6.52     -1.12

To personalize the data, remove row $b$, where $b$ is the last digit of your student number. Please state clearly which row you removed. At this point you will have $n_1 = n_2 = 13$ rows of data. You will need the mean and variance of all three columns of data if you do all sections of this problem. You can save yourself considerable effort by using the computational formula for the variance with the professor's sums and sums of squares, after subtracting the value (or the squared value) of each number you removed. The professor got the following results.

Variable    n    Mean    SE Mean   StDev   Median
Fat        14    4.640    0.184    0.690    4.835
Fit        14    6.430    0.115    0.431    6.400
diff       14   -1.790    0.196    0.732   -1.845

Your results should be relatively similar. Credit for computing the sample statistics needed is included in the relevant parts of this problem. State hypotheses and conclusions clearly in each segment of the problem.

a) Assume that $x_1$ and $x_2$ are independent random samples and test the hypothesis that the population mean of the ego strength of the ‘fit’ faculty is above the population mean of the ‘fat’ faculty.
Assume that the data comes from the Normal distribution and that the variances for the ‘fit’ and ‘fat’ populations are similar. (3)

b) (Extra credit) Assume that $x_1$ and $x_2$ are independent random samples and test the hypothesis that the population mean of the ego strength of the ‘fit’ faculty is above the population mean of the ‘fat’ faculty. Assume that the data comes from the Normal distribution and that the variances for the ‘fit’ and ‘fat’ populations are not similar. (3)

c) Assume that $x_1$ and $x_2$ are independent random samples. How would we decide whether the method in a) or b) is correct? Do the appropriate test. Assume that the data comes from the Normal distribution. Should we have used a) or b)? (2) [22]

d) Compute the mean and variance of the column of differences and test the column to see if the Normal distribution works for these data. (4)

e) Assume that we had rejected the hypothesis that the populations the columns come from are Normal, and do a one-sided test to see whether the ego strength of the ‘Fat’ and ‘Fit’ people differs. (2)

f) In the remainder of this problem, assume that the $x_1$ and $x_2$ columns are not independent random samples but instead represent the ego strength of the same 14 or 13 faculty members before and after a fitness program. Assuming that the Normal distribution applies, can we say that the ego strength of the faculty has increased? (2)

g) Repeat f) under the assumption that the Normal distribution does not apply. (1)

h) Use the Wilcoxon signed rank test to test whether the median of the d column is -2. (2) [35]

i) Extra credit. Use Minitab to check your work. The commands that you might need are as follows. Remember that the subcommand ‘Alternative -1’ gives a left-sided test and ‘Alternative +1’ gives a right-sided test. If this subcommand is not used, a 2-sided test will appear. The basic command to compare two means for data in c2 and c3 is

MTB > TwoSample c2 c3
This will produce a 2-sided test using Method D3. A semicolon followed by the Alternative subcommand will produce a 1-sided test. Adding the subcommand ‘Pooled’ switches the method to D2. Remember that a semicolon tells Minitab that a subcommand is coming and a period tells Minitab that the command is complete. To use Method C4 on the same two columns, use the command

MTB > Paired c2 c3

This also can be modified with the Alternative subcommand. To test C4 for Normality using a Lilliefors test, use

MTB > NormTest c4;
SUBC> KSTest.

There are two other tests for Normality baked into Minitab: the Anderson-Darling test and the Ryan-Joiner test. The graph produced by any of these can be analyzed by the Fat Pencil Test. To get a basic explanation of these tests, open the Stat pull-down menu, choose Basic Statistics and then Normality Test. Finally, hit ‘Help’ and investigate the topics available. There will be a small bonus for those of you who mention Minitab's problems with English grammar. To use the Anderson-Darling test, use the NormTest command without a subcommand. To use the Ryan-Joiner test, use

MTB > NormTest c4;
SUBC> RJTest.

A really impressive paper might compare the results of the 3 tests and then show the results of an internet search on the differences between them. The other two tests that are relevant here can be accessed by using the Stat pull-down menu and the Nonparametrics option. The instruction for a left-sided (Wilcoxon)-Mann-Whitney test would be

MTB > Mann-Whitney 95.0 c2 c3;
SUBC> Alternative -1.

Minitab's instruction for a 2-sided Wilcoxon signed rank test of a median of -2 from one sample in C4 would be

MTB > WTest -2 c4

To do a one-sided test comparing samples in two columns, take $d = x_1 - x_2$ and do a test that the median of $d$ is zero. Again, Alternative can be used to get a 1-sided test. Also, there is some advice from last term's Take-home.
To fake the computation of a sample variance or standard deviation of the data in column c1, using column c2 for the squares:

MTB > let C2 = C1*C1                     # * performs multiplication; ** would do a power, but multiplication is more accurate.
MTB > name k1 'sum'
MTB > name k2 'sumsq'
MTB > let k1 = sum(c1)
MTB > let k2 = sum(c2)                   # This is equivalent to let k2 = ssq(c1).
MTB > print k1 k2

Data Display        (This is a progress report for my data set.)
sum      3047.24
sumsq    468657

MTB > name k1 'meanx'
MTB > let k1 = k1/count(c1)              # / means division. Count gives n.
MTB > let k2 = k2 - (count(c1))*k1*k1
MTB > print k1 k2

Data Display
meanx    152.362
sumsq    4372.53

MTB > name k2 'varx'
MTB > let k2 = k2/((count(c1))-1)
MTB > print k1 k2

Data Display
meanx    152.362
varx     230.133

MTB > name k2 'stdevx'
MTB > let k2 = sqrt(k2)                  # Sqrt gives a square root.
MTB > print k1 k2

Data Display
meanx    152.362
stdevx   15.1701

MTB > Print C1 C2

To check for equal variances for data in C1 and C2, use

MTB > VarTest c1 c2;
SUBC> Unstacked.

Both an F test and a Levene test will be run. The Levene test is for non-Normal data, so you want the F test results. To check your mean and standard deviation, use

MTB > describe C1

To put the items in column C1 in order in column C2, use

MTB > Sort c1 c2;
SUBC> By c1.

3. Sorry. This is all I've got.
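As a cross-check on problem 1 a) through d) and on the PTwo output in 1 g), here is a minimal Python sketch of the pooled two-proportion z ratio for $H_0: p_1 = p_2$ against $H_1: p_1 > p_2$. It assumes Science is sample 1 and all other majors combined are sample 2, and it reads the personalization as $31 + a$ Science loans and $24 - a$ Business loans (an assumption that keeps the total at 865). The names BASE_TABLE, personalize and pooled_two_proportion_z are hypothetical and not part of the course materials; only Python's standard math module is used.

```python
from math import erf, sqrt

# Problem 1 table before personalization: major -> (loan, none).
BASE_TABLE = {
    "Ag": (32, 35),
    "Ch": (37, 50),
    "Engg": (98, 137),
    "Lib": (89, 124),
    "Bus": (24, 51),
    "Sci": (31, 29),
    "Tech": (57, 71),
}


def personalize(a):
    """Apply the personalization for digit a.

    ASSUMPTION: Science loans become 31 + a and Business loans 24 - a,
    which keeps the survey total at 865.
    """
    table = dict(BASE_TABLE)
    table["Sci"] = (31 + a, 29)
    table["Bus"] = (24 - a, 51)
    return table


def pooled_two_proportion_z(table, group="Sci"):
    """Pooled z ratio for H0: p1 = p2 against H1: p1 > p2.

    Sample 1 is `group`; sample 2 is every other major combined.
    Returns (p1, p2, z, right-tail p-value).
    """
    x1, none1 = table[group]
    n1 = x1 + none1
    others = [counts for major, counts in table.items() if major != group]
    x2 = sum(loan for loan, _ in others)
    n2 = sum(loan + none for loan, none in others)
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    sigma_dp = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / sigma_dp
    p_value = 0.5 * (1 - erf(z / sqrt(2)))  # P(Z > z) for a standard Normal
    return p1, p2, z, p_value


if __name__ == "__main__":
    p1, p2, z, p = pooled_two_proportion_z(personalize(a=0))
    print(f"p1 = {p1:.4f}  p2 = {p2:.4f}  z = {z:.3f}  p-value = {p:.4f}")
```

With $a = 0$ the ratio comes out near $z = 1.48$, below $z_{.01} \approx 2.326$, so the sketch would not reject $H_0$ at the 1% level; your own digit will change the numbers.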
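For problem 1 e) and the ChiSquare check in 1 g)(i), the following minimal Python sketch computes $E = (\text{row total})(\text{column total})/n$ for each cell of a 2 x k O table, sums the contributions $(O - E)^2/E$, and compares the total with the upper 1% point of chi-square with $(2-1)(k-1)$ degrees of freedom. It uses the unpersonalized counts ($a = 0$); the names LOANS, NONES and chi_square_2xk are hypothetical, and the critical value 16.8119 is the standard table value for 6 degrees of freedom.

```python
# Problem 1 counts with a = 0, columns in the order Ag, Ch, Engg, Lib, Bus, Sci, Tech.
LOANS = [32, 37, 98, 89, 24, 31, 57]
NONES = [35, 50, 137, 124, 51, 29, 71]

CHI2_01_DF6 = 16.8119  # upper 1% point of chi-square with 6 degrees of freedom


def chi_square_2xk(row1, row2):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k O table."""
    k = len(row1)
    col_totals = [row1[j] + row2[j] for j in range(k)]
    row_totals = [sum(row1), sum(row2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate((row1, row2)):
        for j in range(k):
            e = row_totals[i] * col_totals[j] / n  # E = (row total)(column total) / n
            chi2 += (row[j] - e) ** 2 / e  # contribution (O - E)^2 / E
    return chi2, (2 - 1) * (k - 1)


if __name__ == "__main__":
    chi2, df = chi_square_2xk(LOANS, NONES)
    verdict = "reject" if chi2 > CHI2_01_DF6 else "do not reject"
    print(f"chi-square = {chi2:.3f}, df = {df}: {verdict} H0 at the 1% level")
```

For the unpersonalized table the statistic is roughly 6.5 on 6 degrees of freedom, well under 16.812. Substitute your personalized Science and Business counts before drawing a conclusion.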
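The shortcut suggested for personalizing problem 2 (start from the professor's sums and sums of squares, subtract the removed value and its square, then apply $s^2 = (\sum x^2 - n\bar{x}^2)/(n-1)$) is the same computation as last term's Minitab session above. Below is a minimal Python sketch of it, with a direct computation from the definition as a cross-check. The names TOTALS, stats_without_row and direct_stats are hypothetical; the sketch only accepts rows 1 to 14, and because the professor's sums of squares are rounded, its variances can differ from the direct ones in the fifth decimal place.

```python
from math import sqrt

# Problem 2 data, rows 1-14: x1 = Fat, x2 = Fit.
FAT = [4.99, 4.24, 4.74, 4.93, 4.16, 5.53, 4.12, 5.10, 4.47, 5.30, 3.12, 3.77, 5.09, 5.40]
FIT = [6.68, 6.42, 7.32, 6.38, 6.16, 5.93, 7.08, 6.37, 6.53, 6.68, 5.71, 6.20, 6.04, 6.52]

# The professor's totals for all 14 rows: column -> (sum, sum of squares).
TOTALS = {"Fat": (64.96, 307.607), "Fit": (90.02, 581.239), "diff": (-25.06, 51.8198)}


def stats_without_row(b):
    """(n, mean, variance, s.d.) of each column after removing row b (1-based).

    Uses the computational formula s^2 = (sum x^2 - n * xbar^2) / (n - 1)
    on the professor's totals, minus the removed value and its square.
    """
    if not 1 <= b <= len(FAT):
        raise ValueError("row b must be between 1 and 14")
    removed = {"Fat": FAT[b - 1], "Fit": FIT[b - 1], "diff": FAT[b - 1] - FIT[b - 1]}
    n = len(FAT) - 1
    result = {}
    for name, (total, total_sq) in TOTALS.items():
        x = removed[name]
        mean = (total - x) / n  # subtract the removed value from the sum
        var = (total_sq - x * x - n * mean * mean) / (n - 1)  # and its square from the sum of squares
        result[name] = (n, mean, var, sqrt(var))
    return result


def direct_stats(values):
    """The same statistics from the definition, as a cross-check."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return n, mean, var, sqrt(var)


if __name__ == "__main__":
    for name, (n, mean, var, sd) in stats_without_row(b=3).items():
        print(f"{name:5s} n = {n}  mean = {mean:.3f}  var = {var:.4f}  s = {sd:.3f}")
```

Replace `b=3` with the last digit of your student number and report which row you removed.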