252y0821 10/20/08 Student Number: _________________________

ECO252 QBA2
SECOND EXAM
March 28, 2008
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
Class hours registered and attended (if different):_________________________
IV. Neatness Counts! Show your work! Always state your hypotheses and conclusions clearly. (19+
points). In each section state clearly what number you are using to personalize data. There is a penalty for
failing to include your student number on this page, not clarifying version number in each section and not
including class hour somewhere. Please write on only one side of the paper. Be prepared to turn in your
Minitab output for the first computer problem and to answer the questions on the problem sheet
about it or a similar problem.
1. (Moore, McCabe et al.) A large public university took a survey of 865 students to find out if there was a
relationship between the chosen major and whether the students had student loans. The students’ majors
were categorized as Agriculture, Child Development, Engineering, Liberal Arts, Business, Science and
Technology. Before you start, personalize the data as follows. Let a be the second-to-last digit of your
student number. Change the number of Science majors with loans to 31 + a and the number of Business
majors who have loans to 24 - a for every part of this problem. The total number of students in the survey
will not change. Put your version of the table below on top of the first page of your solution. Use a 99%
confidence level in this problem.
        Ag    Ch    Engg   Lib    Bus   Sci   Tech
Loan    32    37     98     89    24    31     57
None    35    50    137    124    51    29     71
a) Compute the proportion of non-science majors that have loans in order to test the hypothesis that science
majors are more likely to have loans than other majors. Tell which group you consider sample 1. State H 0
and H 1 in terms of the proportions involved and also in terms of the difference between the proportions,
explaining whether this difference is a statistic from sample 1 minus a statistic from sample 2 or the
reverse. (1)
b) Use a test ratio to test your hypotheses from a) (2)
c) Use a critical value for the difference between proportions to test your hypotheses from a) (2)
d) Use an appropriate confidence interval to test your hypotheses from a) (2)
e) Treat each major separately and test the hypothesis that the proportion of students that have loans is
independent of major (4)
f) If you did section 1e, follow your analysis with a Marascuilo procedure to compare the proportion of
business students that have loans with the proportions for the other 6 majors. Tell which differences are
significant. (3) [14]
g) (Extra credit) Check your results using Minitab.
(i) To do a chi-squared test on an O table that is in Columns c22-c28, simply put the row labels in
Column c21 and print out your data. Then type in
ChiSquare c22-c28.
The computer will print back the columns with their names, but below each number from the O table you
will find the corresponding values of $E$ and $\frac{(O - E)^2}{E}$, the contribution of the value of $O$ to the
chi-square total. Use the p-value to find out if we reject the hypothesis of equal proportions at the 1%
significance level.
(ii) To do a test of the alternative hypothesis $H_1: p_1 > p_2$, where $p_1 = \frac{x_1}{n_1}$ and $p_2 = \frac{x_2}{n_2}$, use the
command below, substituting your numbers for $x_1$, $n_1$, $x_2$ and $n_2$.
MTB > PTwo x1 n1 x2 n2;
SUBC> Confidence 99.0;
SUBC> Alternative 1;
SUBC> Pooled.
The computer will print back $x_1$, $n_1$, $p_1 = \frac{x_1}{n_1}$, $x_2$, $n_2$ and $p_2 = \frac{x_2}{n_2}$, a p-value for a z-test and Fisher’s
exact test (results should be somewhat similar to the z-test) and a 1-sided 99% confidence interval.
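As a cross-check on parts b) and e), the following Python sketch computes the expected counts $E$, the contributions $\frac{(O - E)^2}{E}$ and the pooled z ratio by hand. It is only an illustration under stated assumptions: it uses the table as printed (that is, a = 0, before personalizing), it takes science majors as sample 1 (one possible choice), and the function names `chi_square_table` and `pooled_z` are invented for this example and are not Minitab commands.

```python
import math

# Observed counts from the table as printed (a = 0, before personalizing).
majors = ["Ag", "Ch", "Engg", "Lib", "Bus", "Sci", "Tech"]
loan = [32, 37, 98, 89, 24, 31, 57]
none = [35, 50, 137, 124, 51, 29, 71]


def chi_square_table(rows):
    """Return (E table, contribution table, chi-square total) for an O table."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    # E = (row total)(column total) / n for every cell.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    # Each cell contributes (O - E)^2 / E to the chi-square total.
    contrib = [[(o - e) ** 2 / e for o, e in zip(r_o, r_e)]
               for r_o, r_e in zip(rows, expected)]
    total = sum(sum(r) for r in contrib)
    return expected, contrib, total


def pooled_z(x1, n1, x2, n2):
    """Test ratio for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


expected, contrib, chi2 = chi_square_table([loan, none])
df = (2 - 1) * (len(majors) - 1)
print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")

# Assumption: sample 1 = science majors, sample 2 = all other majors.
sci = majors.index("Sci")
x1, n1 = loan[sci], loan[sci] + none[sci]
x2, n2 = sum(loan) - x1, sum(loan) + sum(none) - n1
print(f"p1 = {x1 / n1:.4f}, p2 = {x2 / n2:.4f}, z = {pooled_z(x1, n1, x2, n2):.3f}")
```

The chi-square total and the z ratio still have to be compared with the appropriate table values (or p-values) at the 99% confidence level; the sketch only reproduces the arithmetic.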
2. (Moore, McCabe et al.) An absolutely tactless psychology professor has divided faculty members into
categories the professor labels ‘Fat’ and ‘Fit’. A random sample of scores on a test of ‘ego strength’ of the
‘Fat’ faculty is labeled $x_1$. A sample of ‘ego strength’ of the ‘Fit’ faculty is labeled $x_2$. $d = x_1 - x_2$.
Use a 95% confidence level in this problem.
The professor has computed
$\sum x_1$ = Sum of Fat scores = 64.96,
$\sum x_1^2$ = Sum of squares of Fat scores = 307.607,
$\sum x_2$ = Sum of scores of Fit = 90.02,
$\sum x_2^2$ = Sum of squares of Fit scores = 581.239,
$\sum d$ = Sum of diff = -25.06 and
$\sum d^2$ = Sum of squares of diff = 51.8198.
Row    Fat     Fit     Diff
       x1      x2      d = x1 - x2
 1     4.99    6.68    -1.69
 2     4.24    6.42    -2.18
 3     4.74    7.32    -2.58
 4     4.93    6.38    -1.45
 5     4.16    6.16    -2.00
 6     5.53    5.93    -0.40
 7     4.12    7.08    -2.96
 8     5.10    6.37    -1.27
 9     4.47    6.53    -2.06
10     5.30    6.68    -1.38
11     3.12    5.71    -2.59
12     3.77    6.20    -2.43
13     5.09    6.04    -0.95
14     5.40    6.52    -1.12
To personalize the data, remove row b, where b is the last digit of your student number. Please state
clearly what row you removed. At this point you will have $n_1 = n_2 = 13$ rows of data. You will need the
mean and variance of all three columns of data if you do all sections of this problem. You can save
yourself considerable effort by using the computational formula for the variance with the sums and sums of
squares that the professor computed, after subtracting the value (or the squared value) of the numbers you
removed.
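The shortcut described above can be sketched in Python as follows. This is only an illustration: the helper name `stats_after_removal` is made up for this example, and the choice b = 3 is a hypothetical student number digit. The computational formula used is $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1}$.

```python
import math

# Sums over all 14 rows of the Fat column, as computed by the professor.
SUM_FAT, SUMSQ_FAT = 64.96, 307.607


def stats_after_removal(total, total_sq, n, removed=None):
    """Mean, variance and standard deviation from sums, optionally dropping one value."""
    if removed is not None:
        total -= removed          # take the removed value out of the sum
        total_sq -= removed ** 2  # and its square out of the sum of squares
        n -= 1
    mean = total / n
    variance = (total_sq - n * mean ** 2) / (n - 1)
    return mean, variance, math.sqrt(variance)


# All 14 rows: compare with the professor's Fat line in the results table.
full_mean, full_var, full_sd = stats_after_removal(SUM_FAT, SUMSQ_FAT, 14)
print(f"n = 14: mean = {full_mean:.3f}, variance = {full_var:.4f}, st. dev. = {full_sd:.3f}")

# Hypothetical student with b = 3: row 3 has x1 = 4.74.
m13, v13, s13 = stats_after_removal(SUM_FAT, SUMSQ_FAT, 14, removed=4.74)
print(f"n = 13: mean = {m13:.3f}, variance = {v13:.4f}, st. dev. = {s13:.3f}")
```

The same function can be applied to the Fit and diff columns by substituting their sums and the corresponding removed values.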
The professor got the following results.
Variable    n     Mean      SE Mean    StDev    Median
Fat         14    4.640     0.184      0.690    4.835
Fit         14    6.430     0.115      0.431    6.400
diff        14    -1.790    0.196      0.732    -1.845
Your results should be relatively similar. Credit for computing the sample statistics needed is included in
the relevant parts of this problem. State hypotheses and conclusions clearly in each segment of the problem.
a) Assume that x1 and x 2 are independent random samples and test the hypothesis that the population
mean of the ego strength of the ‘fit’ faculty is above the population mean of the ‘fat’ faculty. Assume that
the data comes from the Normal distribution and that the variances for the ‘fit’ and ‘fat’ populations are
similar. (3)
b) (Extra credit) Assume that x1 and x 2 are independent random samples and test the hypothesis that the
population mean of the ego strength of the ‘fit’ faculty is above the population mean of the ‘fat’ faculty.
Assume that the data comes from the Normal distribution and that the variances for the ‘fit’ and ‘fat’
populations are not similar. (3)
c) Assume that x1 and x2 are independent random samples. How would we decide whether the method in
a) or b) is correct? Do the appropriate test. Assume that the data comes from the Normal distribution.
Should we have used a) or b)? (2) [22]
d) Compute the mean and variance of the column of differences and test the column to see if the Normal
distribution works for these data. (4)
e) Assume that we had rejected the hypothesis that the distributions in the populations that the columns
come from are Normal. Do a one-sided test to see whether the ego strength of the ‘Fat’ and ‘Fit’ people
differs. (2)
f) In the remainder of this problem assume that the x1 and x 2 columns are not independent random
samples but instead represent the ego strength of the same 14 or 13 faculty members before and after a
fitness program. Assuming that the Normal distribution applies, can we say that the ego strength of the
faculty has increased? (2)
g) Repeat f) under the assumption that the Normal distribution does not apply. (1)
h) Use the Wilcoxon signed rank test to see if the median of the d column is -2. (2) [35]
i) Extra credit. Use Minitab to check your work.
The commands that you might need are as follows – remember that the subcommand ’Alternative -1’
gives a left-sided test and ’Alternative +1’ gives a right sided test. If this subcommand is not used a
2-sided test will appear.
The basic command to compare two means for data in c2 and c3 is
MTB > TwoSample c2 c3.
This will produce a 2-sided test using Method D3. A semicolon followed by the Alternative subcommand
will produce a 1-sided test. Adding the subcommand ’Pooled’ switches the method to D2. Remember
that a semicolon tells Minitab that a subcommand is coming and a period tells Minitab that the command is
complete. To use Method C4 on the same two columns use the command
MTB > Paired c2 c3.
This also can be modified with the Alternative command.
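For checking the hand computations behind the pooled-variance and paired commands, a minimal Python sketch is given here. It assumes the full 14 rows (no row removed), and the function names `pooled_t` and `paired_t` are invented for this illustration; it is not a description of what Minitab does internally.

```python
import math
from statistics import mean, variance

# Full data set, all 14 rows (your personalized data will have 13).
fat = [4.99, 4.24, 4.74, 4.93, 4.16, 5.53, 4.12,
       5.10, 4.47, 5.30, 3.12, 3.77, 5.09, 5.40]
fit = [6.68, 6.42, 7.32, 6.38, 6.16, 5.93, 7.08,
       6.37, 6.53, 6.68, 5.71, 6.20, 6.04, 6.52]


def pooled_t(x1, x2):
    """t ratio and degrees of freedom for independent samples with similar variances."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2


def paired_t(x1, x2):
    """t ratio and degrees of freedom for paired data, using d = x1 - x2."""
    d = [a - b for a, b in zip(x1, x2)]
    se = math.sqrt(variance(d) / len(d))
    return mean(d) / se, len(d) - 1


t_pooled, df_pooled = pooled_t(fat, fit)
t_paired, df_paired = paired_t(fat, fit)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"paired: t = {t_paired:.3f}, df = {df_paired}")
```

Each t ratio must still be compared with the t table value for the stated degrees of freedom and the direction of the alternative hypothesis.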
To test C4 for Normality using a Lilliefors test use
MTB > NormTest c4;
SUBC> KSTest.
There are two other tests for Normality baked into Minitab. These are the Anderson-Darling test and the
Ryan-Joiner test. The graph produced by any of these can be analyzed by the Fat Pencil Test. To get a basic
explanation of these tests, open the Stat pull-down menu, choose Basic Statistics and then Normality Test.
Finally, hit ‘help’ and investigate the topics available. There will be a small bonus for those of you who mention
Minitab’s problems with English grammar. To use the Anderson-Darling test, use the NormTest command
without a subcommand. To use the Ryan-Joiner test use
MTB > NormTest c4;
SUBC> RJTest.
A really impressive paper might compare the results of the 3 tests and then show the results of an internet
search on the differences between them.
The other two tests that are relevant here can be accessed by using the Stat pull-down menu and
the Nonparametrics option. The instruction for a left-sided (Wilcoxon)-Mann-Whitney test would be
MTB > Mann-Whitney 95.0 c2 c3;
SUBC> Alternative -1.
Minitab’s instructions for a 2-sided Wilcoxon signed rank test of a median of -2 from one sample in C4
would be
MTB > WTest -2 c4.
To do a one-sided test comparing samples in two columns, take $d = x_1 - x_2$ and do a test that the median of
d is zero. Again, Alternative can be used to get a 1-sided test.
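To check a hand-ranked Wilcoxon signed rank test, the rank sums can be computed with the Python sketch below. It is an illustration only: it uses all 14 differences (no row removed), the function name `signed_rank_sums` is invented here, zero differences are dropped and tied absolute values receive average ranks, which is the usual textbook convention and is assumed rather than taken from this exam.

```python
# Differences d = x1 - x2 for all 14 rows (your personalized data will have 13).
d = [-1.69, -2.18, -2.58, -1.45, -2.00, -0.40, -2.96,
     -1.27, -2.06, -1.38, -2.59, -2.43, -0.95, -1.12]


def signed_rank_sums(values, hypothesized_median):
    """Return (T+, T-, number of nonzero differences) for a signed rank test."""
    # Round to remove floating-point noise so exact zeros and ties are detected.
    diffs = [round(v - hypothesized_median, 6) for v in values]
    diffs = [x for x in diffs if x != 0]  # zero differences are dropped
    ordered = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 through j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    t_plus = sum(ranks[k] for k, x in enumerate(ordered) if x > 0)
    t_minus = sum(ranks[k] for k, x in enumerate(ordered) if x < 0)
    return t_plus, t_minus, len(ordered)


t_plus, t_minus, n_used = signed_rank_sums(d, -2)
print(f"T+ = {t_plus}, T- = {t_minus}, n = {n_used}")
```

The two rank sums should add up to n(n + 1)/2 for the n nonzero differences, which is a quick check on the ranking; the smaller sum is then compared with the Wilcoxon table value.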
Also there is some advice from last term’s Take-home.
To fake computation of a sample variance or standard deviation of the data in column c1 using
column c2 for the squares,
MTB > let C2 = C1*C1                     # * performs multiplication. ** would do a power, but multiplication is more accurate.
MTB > name k1 'sum'
MTB > name k2 'sumsq'
MTB > let k1 = sum(c1)
MTB > let k2 = sum(c2)                   # This is equivalent to let k2 = ssq(c1)
MTB > print k1 k2                        # This is a progress report for my data set.
Data Display
sum      3047.24
sumsq    468657
MTB > name k1 'meanx'
MTB > let k1 = k1/count(c1)              # / means division. Count gives n.
MTB > let k2 = k2 - (count(c1))*k1*k1
MTB > print k1 k2
Data Display
meanx    152.362
sumsq    4372.53
MTB > name k2 'varx'
MTB > let k2 = k2/((count(c1))-1)
MTB > print k1 k2
Data Display
meanx    152.362
varx     230.133
MTB > name k2 'stdevx'
MTB > let k2 = sqrt(k2)                  # Sqrt gives a square root.
MTB > print k1 k2
Data Display
meanx    152.362
stdevx   15.1701
Print C1, C2
To check for equal variances for data in C1 and C2, use
MTB > VarTest c1 c2;
SUBC> Unstacked.
Both an F test and a Levene test will be run. The Levene test is for non-Normal data, so you want the F test
results.
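The F ratio that the variance test reports can be reproduced by hand. The Python sketch below is a hedged illustration using all 14 rows of the Fat and Fit columns from Problem 2; the function name `f_ratio` is invented here, and putting the larger sample variance on top is a common table-lookup convention that is assumed rather than taken from this exam.

```python
from statistics import variance

# Full data set from Problem 2, all 14 rows (your personalized data will have 13).
fat = [4.99, 4.24, 4.74, 4.93, 4.16, 5.53, 4.12,
       5.10, 4.47, 5.30, 3.12, 3.77, 5.09, 5.40]
fit = [6.68, 6.42, 7.32, 6.38, 6.16, 5.93, 7.08,
       6.37, 6.53, 6.68, 5.71, 6.20, 6.04, 6.52]


def f_ratio(x1, x2):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    v1, v2 = variance(x1), variance(x2)
    if v1 >= v2:
        return v1 / v2, len(x1) - 1, len(x2) - 1
    return v2 / v1, len(x2) - 1, len(x1) - 1


f_stat, df_num, df_den = f_ratio(fat, fit)
print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

The ratio is then compared with the F table value for those degrees of freedom at the significance level in use.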
To check your mean and standard deviation, use
MTB > describe C1
To put the items in column C1 in order in column C2, use
MTB > Sort c1 c2;
SUBC> By c1.
3. Sorry. This is all I’ve got.