Practice Problems - Widener University

Practice Exercises
for QA252
Intermediate Statistics
Professor K. Leppel
Hypothesis Testing: Type I and Type II Errors
1.
A researcher has hypothesized that the mean number of traffic violations per day in Saskatchewan, Canada, is 25. So the null and alternative hypotheses are H0: μ = 25, H1: μ ≠ 25. The level of significance to be used is α = .05. A random sample was taken, and decisions were made based on the results of the sample. Complete the table below.

True (but unknown) mean | Is H0 true? | Decision (based on sample) | Is the decision correct? (Y/N) | Type of error (I or II), if any
25 | ____ | Accept H0 | ____ | ____
25 | ____ | Reject H0 | ____ | ____
27 | ____ | Accept H0 | ____ | ____
27 | ____ | Reject H0 | ____ | ____

2.
A consumer advocate believes that more than 5% of a particular manufacturer's tires are defective. The advocate wants to be 99% sure that he is correct before going public with his claim. In other words, he wants to limit the probability of claiming that too many of the tires are defective, when in fact they are not, to at most 1%.
In one-sided tests, the null hypothesis can usually be set up using a devil's advocate approach to the claim. That is, the null hypothesis is "the percent defective is not more than 5%, and I'm sticking with that until you can show me reason to believe otherwise." The claim therefore becomes the alternative hypothesis, which will be accepted only if the null can be rejected. So the null and alternative hypotheses are H0: π ≤ .05, H1: π > .05. The level of significance, α, is .01.
A random sample was taken, and decisions were made based on the results of the sample. Complete the table.

True (but unknown) proportion | Is H0 true? | Decision (based on sample) | Is the decision correct? (Y/N) | Type of error (I or II), if any
.05 | ____ | Accept H0 | ____ | ____
.05 | ____ | Reject H0 | ____ | ____
.07 | ____ | Accept H0 | ____ | ____
.07 | ____ | Reject H0 | ____ | ____
.03 | ____ | Accept H0 | ____ | ____
.03 | ____ | Reject H0 | ____ | ____
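The rule behind both tables can be sketched in a few lines of Python (the function name `classify_decision` is illustrative, not part of the course materials): rejecting a true H0 is a Type I error, accepting (failing to reject) a false H0 is a Type II error, and the other two combinations are correct decisions.

```python
def classify_decision(h0_true, decision):
    """Return (decision correct? "Y"/"N", error type "I", "II", or "" if none)."""
    rejected = decision == "Reject H0"
    if h0_true and rejected:
        return ("N", "I")      # Type I error: a true H0 was rejected
    if not h0_true and not rejected:
        return ("N", "II")     # Type II error: a false H0 was accepted
    return ("Y", "")           # correct decision, so no error


# Problem 1: H0: μ = 25 is true exactly when the true mean is 25.
for true_mean, decision in [(25, "Accept H0"), (25, "Reject H0"),
                            (27, "Accept H0"), (27, "Reject H0")]:
    print(true_mean, decision, classify_decision(true_mean == 25, decision))

# Problem 2: H0: π ≤ .05 is true whenever the true proportion is at most .05.
for true_pi in (0.05, 0.07, 0.03):
    for decision in ("Accept H0", "Reject H0"):
        print(true_pi, decision, classify_decision(true_pi <= 0.05, decision))
```

The only input that changes between the two problems is how "Is H0 true?" is decided from the true parameter value.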
Valid Null and Alternative Hypotheses
For each pair of null and alternative hypotheses, determine whether the set is a valid set of
hypotheses, and if not, explain why not.
Question number | Null hypothesis | Alternative hypothesis | Valid? (Y/N) | If not valid, why not?
1 | H0: μ = 25 | H1: μ ≠ 25 | ____ | ____
2 | H0: π = .25 | H1: π ≠ .25 | ____ | ____
3 | H0: X̄ = 63 | H1: X̄ ≠ 63 | ____ | ____
4 | H0: π ≥ 18 | H1: π ≤ 18 | ____ | ____
5 | H0: μ > 43 | H1: μ ≤ 43 | ____ | ____
6 | H0: p = .72 | H1: p ≠ .72 | ____ | ____
7 | H0: μ ≥ 96 | H1: μ < 96 | ____ | ____
8 | H0: p ≤ .42 | H1: p > .42 | ____ | ____
9 | H0: X̄ < 57 | H1: X̄ < 57 | ____ | ____
10 | H0: π ≤ .81 | H1: π > .81 | ____ | ____

Issues:
Hypotheses must involve population parameters and NOT sample statistics.
Null and alternative hypotheses must describe different and non-overlapping situations.
Equality must be in the null hypothesis and NOT in the alternative.
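As a rough check on the three issues listed above, the sketch below encodes each hypothesis as a (symbol, operator, value) triple. The names `check_hypotheses`, `PARAMETERS`, and `VALID_PAIRS` are hypothetical, and the sketch assumes that μ, π, and p denote population parameters while X̄ denotes a sample statistic; if your course uses p for the sample proportion, remove it from the set.

```python
# Assumption: μ (mean), π and p (proportion) are population parameters here.
# X̄ is a sample statistic, so it is deliberately not in this set.
PARAMETERS = {"μ", "π", "p"}

# (null operator, alternative operator) pairs that are complementary,
# non-overlapping, and keep the equality part in the null hypothesis.
VALID_PAIRS = {("=", "≠"), ("≤", ">"), ("≥", "<")}


def check_hypotheses(null, alt):
    """null and alt are (symbol, operator, value) triples; returns (valid, reasons)."""
    reasons = []
    if null[0] not in PARAMETERS or alt[0] not in PARAMETERS:
        reasons.append("uses a sample statistic instead of a population parameter")
    if null[0] != alt[0] or null[2] != alt[2]:
        reasons.append("null and alternative refer to different parameters or values")
    if (null[1], alt[1]) not in VALID_PAIRS:
        reasons.append("hypotheses overlap, leave a gap, or put equality in the alternative")
    return (len(reasons) == 0, reasons)


# Example: question 5 from the table above.
valid, why = check_hypotheses(("μ", ">", 43), ("μ", "≤", 43))
print(valid, why)
```

Listing only the three allowed operator pairs covers both the "non-overlapping" and the "equality in the null" rules at once.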
Hypothesis Testing – One Sample
Use the following GPA data from a statistics class of 18 males and 14 females to answer
questions 1 to 5.
Male GPAs: 3.52, 3.5, 3.4, 3.2, 3.2, 3.06, 3.05, 2.8, 2.8, 2.7, 2.7, 2.65, 2.6, 2.5, 2.5, 2.4, 2.4, 2.3
Female GPAs: 3.8, 3.78, 3.7, 3.5, 3.3, 2.98, 2.9, 2.8, 2.7, 2.7, 2.7, 2.3, 2.3, 1.9
(1) Suppose that the standard deviation of GPAs for the
population of male Statistics students was known to be
0.45. Test at the 5% level whether the mean GPA for
all male Statistics students is equal to 2.9.
(2) Suppose that the standard deviation of GPAs for the
population of male Statistics students was known to be
0.45. Calculate the p-value to test at the 5% level
whether the mean GPA for all male Statistics students
is equal to 2.9.
(3) Suppose that the standard deviation of GPAs for the
population of male Statistics students was known to be
0.45. Test at the 5% level whether the mean GPA for
all male Statistics students is less than 2.9. (Use the
devil's advocate approach to set up the null hypothesis
in this problem. The null is "the mean GPA for all male
Statistics students is not less than 2.9, and I'm sticking
with that until you can show me reason to believe
otherwise." Note: Saying that the male mean GPA is
not less than 2.9 is equivalent to saying that the male
mean GPA is greater than or equal to 2.9. The alternative
is that the mean GPA is less than 2.9.)
(4) Suppose that the standard deviation of GPAs for the population of male Statistics students
was known to be 0.45. Calculate the p-value to test at the 5% level whether the mean GPA
for all male Statistics students is less than 2.9. (Again, set up the null hypothesis using the
devil’s advocate approach.)
(5) Test at the 5% level whether the mean GPA for all male Statistics students is equal to 2.9.
(You have no knowledge of the population standard deviation of GPAs.)
(6) Suppose one of the possible majors in Business Administration is International Business. In
a sample of 340 college students majoring in Business Administration, 60 students are
majors in International Business. Test at the 5% level whether the proportion of Business
Administration students majoring in International Business is 20%.
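A minimal Python sketch of the calculations for problems (1) to (6), using only the standard library. It assumes the usual textbook formulas: z = (X̄ - μ0)/(σ/√n) when σ is known, t = (X̄ - μ0)/(s/√n) with n - 1 df when it is not, and the normal approximation for a proportion. The t critical value 2.110 for 17 df is copied from a t table rather than computed.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

male = [3.52, 3.5, 3.4, 3.2, 3.2, 3.06, 3.05, 2.8, 2.8,
        2.7, 2.7, 2.65, 2.6, 2.5, 2.5, 2.4, 2.4, 2.3]
n, xbar = len(male), mean(male)
Z = NormalDist()                       # standard normal distribution
mu0 = 2.9

# (1)-(4): sigma = 0.45 is known, so z = (xbar - mu0) / (sigma / sqrt(n)).
sigma = 0.45
z = (xbar - mu0) / (sigma / sqrt(n))
p_two_sided = 2 * Z.cdf(-abs(z))       # (1)-(2) H1: mu != 2.9
p_lower = Z.cdf(z)                     # (3)-(4) H1: mu < 2.9
z_two, z_lower = Z.inv_cdf(0.975), Z.inv_cdf(0.05)   # about 1.96 and -1.645

# (5): sigma unknown, so use the sample s and t with n - 1 = 17 df.
s = stdev(male)                        # sample standard deviation (n - 1 divisor)
t = (xbar - mu0) / (s / sqrt(n))
T_CRIT_17 = 2.110                      # two-sided 5% value from a t table (assumed)

# (6): proportion test of H0: pi = 0.20 using the normal approximation.
x, n_prop, pi0 = 60, 340, 0.20
p_hat = x / n_prop
z_prop = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n_prop)

print(f"(1)-(2) z = {z:.3f}, p = {p_two_sided:.4f}, reject H0: {abs(z) > z_two}")
print(f"(3)-(4) z = {z:.3f}, p = {p_lower:.4f}, reject H0: {z < z_lower}")
print(f"(5) t = {t:.3f}, reject H0: {abs(t) > T_CRIT_17}")
print(f"(6) p-hat = {p_hat:.4f}, z = {z_prop:.3f}, reject H0: {abs(z_prop) > z_two}")
```

The one-sided p-value in (4) is exactly half the two-sided one in (2) here, because the sample mean falls on the side of the alternative.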
Hypothesis Testing – Two Sample
Use the following GPA data from a statistics class of 18 males and 14 females to answer
questions 1 to 5. (This is the same data set as was used in the one-sample practice problems.)
Male GPAs: 3.52, 3.5, 3.4, 3.2, 3.2, 3.06, 3.05, 2.8, 2.8, 2.7, 2.7, 2.65, 2.6, 2.5, 2.5, 2.4, 2.4, 2.3
Female GPAs: 3.8, 3.78, 3.7, 3.5, 3.3, 2.98, 2.9, 2.8, 2.7, 2.7, 2.7, 2.3, 2.3, 1.9
(1) Suppose that the standard deviation of GPAs for the
population of male Statistics students was known to be
0.45, and the standard deviation of GPAs for the
population of female Statistics students was known to
be 0.55. Test at the 5% level whether the mean GPA
for all male Statistics students is equal to the mean
GPA for all female Statistics students.
(2) Suppose that the standard deviation of GPAs for the
population of male Statistics students was known to be
0.45, and the standard deviation of GPAs for the
population of female Statistics students was known to
be 0.55. Test at the 5% level whether the mean GPA
is greater for females than for males. (Use the devil's
advocate approach to set up the null hypothesis in this
problem. The null is "the mean GPA for all female
Statistics students is not greater than the mean GPA
for all male Statistics students, and I'm sticking with
that until you can show me reason to believe
otherwise." Note: Saying that the female mean is not
greater than the male mean is equivalent to saying that
the female mean is less than or equal to the male mean.
The alternative is that the mean GPA is greater for
females than for males.)
(3) Test at the 5% level the null hypothesis that the mean GPA for all male Statistics students is
equal to the mean GPA for all female Statistics students. You have no knowledge of the
population standard deviations.
(4) Test at the 5% level the null hypothesis that the mean GPA for all male Statistics students is
equal to the mean GPA for all female Statistics students. The population standard
deviations are unknown but you believe that they are equal.
(5) Test at the 5% level whether the variance of the GPAs of the female students
a. is equal to the variance of the GPAs of the male students.
b. is greater than the variance of the GPAs of the male students.
(6) Suppose one of the possible majors in Business Administration is International Business. A
sample of 340 college students in Business Administration consists of 190 men and 150
women. There are 60 students majoring in International Business, 20 male and 40 female.
Test at the 1% level whether the proportion of women in Business Administration who
major in International Business is greater than the proportion of men in Business
Administration who major in International Business. (Reminder: Use the devil's advocate
approach to set up the null hypothesis for this problem.)
(7) Suppose that the number of girls and the number of boys in the families of Statistics
students are as given below. Test at the 10% level whether on average the number of boys
in the family is equal to the number of girls in the family.
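Family     |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18
# of girls |  0 |  1 |  0 |  1 |  4 |  1 |  0 |  1 |  3 |  0 |  0 |  5 |  3 |  1 |  0 |  2 |  1 |  0
# of boys  |  1 |  1 |  1 |  2 |  1 |  1 |  3 |  1 |  0 |  2 |  3 |  2 |  1 |  0 |  3 |  0 |  1 |  2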
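The two-sample statistics for problems (1) to (7) can be sketched as follows. Two method choices here are assumptions: problem (3) uses the Welch statistic with the Welch-Satterthwaite df (some courses instead use the smaller of n1 - 1 and n2 - 1), and problem (6) uses the pooled proportion in the standard error. Critical values for the t and F statistics should be taken from the course tables.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev, variance

male = [3.52, 3.5, 3.4, 3.2, 3.2, 3.06, 3.05, 2.8, 2.8,
        2.7, 2.7, 2.65, 2.6, 2.5, 2.5, 2.4, 2.4, 2.3]
female = [3.8, 3.78, 3.7, 3.5, 3.3, 2.98, 2.9, 2.8, 2.7, 2.7, 2.7, 2.3, 2.3, 1.9]
n1, n2 = len(male), len(female)
diff = mean(female) - mean(male)           # female minus male
v1, v2 = variance(male), variance(female)  # sample variances (n - 1 divisor)
Z = NormalDist()                           # standard normal

# (1)-(2): population sigmas known (0.45 male, 0.55 female), so use z.
z_means = diff / sqrt(0.45 ** 2 / n1 + 0.55 ** 2 / n2)
p_two = 2 * Z.cdf(-abs(z_means))           # (1) H1: the means differ
p_upper = 1 - Z.cdf(z_means)               # (2) H1: female mean > male mean

# (3): sigmas unknown and not assumed equal: Welch t with approximate df.
w1, w2 = v1 / n1, v2 / n2
t_welch = diff / sqrt(w1 + w2)
df_welch = (w1 + w2) ** 2 / (w1 ** 2 / (n1 - 1) + w2 ** 2 / (n2 - 1))

# (4): sigmas assumed equal: pooled variance with n1 + n2 - 2 = 30 df.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = diff / sqrt(sp2 * (1 / n1 + 1 / n2))

# (5): F = female variance / male variance with (13, 17) df.
F = v2 / v1

# (6): two proportions, pooled under H0: pi_women <= pi_men; one-sided 1% test.
x_w, n_w, x_m, n_m = 40, 150, 20, 190
p_pool = (x_w + x_m) / (n_w + n_m)
z_prop = (x_w / n_w - x_m / n_m) / sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_m))

# (7): paired data: one-sample t on the girls-minus-boys differences, 17 df.
girls = [0, 1, 0, 1, 4, 1, 0, 1, 3, 0, 0, 5, 3, 1, 0, 2, 1, 0]
boys = [1, 1, 1, 2, 1, 1, 3, 1, 0, 2, 3, 2, 1, 0, 3, 0, 1, 2]
d = [g - k for g, k in zip(girls, boys)]
t_paired = mean(d) / (stdev(d) / sqrt(len(d)))

print(f"(1)-(2) z = {z_means:.3f}, two-sided p = {p_two:.4f}, upper-tail p = {p_upper:.4f}")
print(f"(3) Welch t = {t_welch:.3f}, df about {df_welch:.1f}")
print(f"(4) pooled t = {t_pooled:.3f}, df = {n1 + n2 - 2}")
print(f"(5) F = {F:.3f}, df = ({n2 - 1}, {n1 - 1})")
print(f"(6) z = {z_prop:.3f}, 1% one-sided critical z = {Z.inv_cdf(0.99):.3f}")
print(f"(7) paired t = {t_paired:.3f}, df = {len(d) - 1}")
```

Problem (7) is treated as paired because each girls count and boys count come from the same family.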
Chi-squared Tests
(1) Suppose a manager believes that the company’s customers have the following preferences
for three models: 20% prefer model A, 30% prefer model B, and 50% prefer model C.
Survey results from a sample of 300 customers indicate that 50 prefer model A, 80 prefer B,
and 170 prefer C. Test at the 10% level whether the manager is correct.
(2) Suppose that a class of 33 students is divided into commuters and residents. The students
are also divided into 3 activity categories: those with no extracurricular activities, those
with exactly 1, and those with 2 or more. The result is the following table. Test at the 10%
level whether student commuter versus resident status is independent of the number of
extracurricular activities.
Residence status | # of extracurricular activities: 0 | 1 | 2 or more
Commuter | 6 | 7 | 1
Resident | 8 | 5 | 6
(3) You want to test at the 5% level whether the performance on a standardized exam by the
students at a particular university has a standard deviation of 10. Based on the data from
your sample of 20 students, you have found the standard deviation to be 14. Perform the test.
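A sketch of the three chi-squared statistics: Σ(O - E)²/E for the goodness-of-fit and independence tests, and (n - 1)s²/σ0² for the single-variance test. The critical values in the printed lines are standard chi-square table entries and are hard-coded, not computed.

```python
# (1) Goodness of fit: expected counts come from the manager's claimed shares.
observed = [50, 80, 170]
shares = [0.20, 0.30, 0.50]
n = sum(observed)
expected = [n * s for s in shares]                      # 60, 90, 150
chi2_gof = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df_gof = len(observed) - 1

# (2) Independence: expected cell = row total * column total / grand total.
table = [[6, 7, 1],    # commuter: 0, 1, 2 or more activities
         [8, 5, 6]]    # resident
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
grand = sum(row_tot)
chi2_ind = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_tot[i] * col_tot[j] / grand
        chi2_ind += (obs - exp) ** 2 / exp
df_ind = (len(table) - 1) * (len(col_tot) - 1)

# (3) Single variance: chi2 = (n - 1) s^2 / sigma0^2 with n - 1 df.
n3, s3, sigma0 = 20, 14, 10
chi2_var = (n3 - 1) * s3 ** 2 / sigma0 ** 2
df_var = n3 - 1

print(f"(1) chi2 = {chi2_gof:.3f}, df = {df_gof}; 10% critical value 4.605")
print(f"(2) chi2 = {chi2_ind:.3f}, df = {df_ind}; 10% critical value 4.605")
print(f"(3) chi2 = {chi2_var:.2f}, df = {df_var}; 5% two-sided critical values 8.907 and 32.852")
```

In (2), two expected counts fall below 5, which some texts flag as a caution for the chi-square approximation.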
Analysis of Variance (ANOVA): Null hypothesis versus alternative hypothesis
In the table below, write H0 next to the null hypothesis and H1 next to the alternative hypothesis.
No. | H0 or H1 | Hypothesis | H0 or H1 | Hypothesis
1 | ____ | There is no difference in average salaries of Asians, Caucasians, and African Americans. | ____ | There is a difference in average salaries of Asians, Caucasians, and African Americans.
2 | ____ | Average number of years of education varies with income class. | ____ | Average number of years of education is the same for all income classes.
3 | ____ | The average number of items produced per minute by four different machines is not the same. | ____ | The average number of items produced per minute by four different machines is the same.
4 | ____ | The average student performance is the same for all sections of a course. | ____ | The average student performance is not the same for all sections of a course.
5 | ____ | The average level of productivity of employees is the same for all training programs. | ____ | The average level of productivity of employees depends on the training program.
6 | ____ | The average life span of a microwave oven varies with brand. | ____ | The average life span of a microwave oven does not vary with brand.
7 | ____ | Average price of housing does not vary by city. | ____ | Average price of housing varies by city.
8 | ____ | The average student performance is the same regardless of the software used. | ____ | The average student performance depends on the software used.
9 | ____ | The average employer rating of Co-op students depends on the university attended by the student. | ____ | The average employer rating of Co-op students does not depend on the university attended by the student.
10 | ____ | The average starting salary does not vary with the university from which the employee graduated. | ____ | The average starting salary does vary with the university from which the employee graduated.
11 | ____ | The average fee charged for household electrical repairs differs by county. | ____ | The average fee charged for household electrical repairs is the same for all counties.
12 | ____ | The average applicant score is the same for all interviewers. | ____ | The average applicant score is not the same for all interviewers.
13 | ____ | Average gasoline mileage for compact cars varies with manufacturer. | ____ | Average gasoline mileage for compact cars does not vary with manufacturer.
14 | ____ | Average number of hours studied per week is the same for all college class years. | ____ | Average number of hours studied per week depends on college class year.
15 | ____ | Average family size varies by ethnic group. | ____ | Average family size is the same for all ethnic groups.
ANOVA Table Completion
Complete the following tables:
Source of Variation | SS | DF | MS | F-statistic
Among or Between Treatments | ____ | 2 | 1000 | ____
Error or Within Treatments | ____ | ____ | 400 | ----
Total | 6800 | 14 | ---- | ----

Source of Variation | SS | DF | MS | F-statistic
Among or Between Treatments | 800 | ____ | ____ | ____
Error or Within Treatments | ____ | 11 | 300 | ----
Total | 4100 | 13 | ---- | ----

Source of Variation | SS | DF | MS | F-statistic
Among or Between Treatments | ____ | 7 | 500 | ____
Error or Within Treatments | ____ | ____ | 336.364 | ----
Total | 7200 | 18 | ---- | ----
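Each of these ANOVA tables can be completed from two facts: among + error = total (for both SS and DF), and MS = SS/DF in the among and error rows, with F = MS(among)/MS(error). The hypothetical helper `complete_anova` below applies those rules repeatedly until no blank can be filled. The assignment of the given numbers to cells is one reading of the table layout and is an assumption; it is the reading under which all three tables are internally consistent.

```python
def complete_anova(cells):
    """Fill a one-way ANOVA table from whichever cells are known (unknown = None).

    Keys: ss_a, df_a, ms_a (among/between), ss_e, df_e, ms_e (error/within),
    ss_t, df_t (total). Returns a new dict that also contains F = ms_a / ms_e.
    """
    c = dict(cells)
    changed = True
    while changed:
        changed = False
        # Additivity: among + error = total, for both SS and DF.
        for a, e, t in (("ss_a", "ss_e", "ss_t"), ("df_a", "df_e", "df_t")):
            if c[t] is None and c[a] is not None and c[e] is not None:
                c[t], changed = c[a] + c[e], True
            elif c[a] is None and c[t] is not None and c[e] is not None:
                c[a], changed = c[t] - c[e], True
            elif c[e] is None and c[t] is not None and c[a] is not None:
                c[e], changed = c[t] - c[a], True
        # Mean square = SS / DF in the among and error rows.
        for ss, df, ms in (("ss_a", "df_a", "ms_a"), ("ss_e", "df_e", "ms_e")):
            if c[ms] is None and c[ss] is not None and c[df] is not None:
                c[ms], changed = c[ss] / c[df], True
            elif c[ss] is None and c[ms] is not None and c[df] is not None:
                c[ss], changed = c[ms] * c[df], True
            elif c[df] is None and c[ss] is not None and c[ms] is not None:
                c[df], changed = c[ss] / c[ms], True
    c["F"] = c["ms_a"] / c["ms_e"]
    return c


KEYS = ("ss_a", "df_a", "ms_a", "ss_e", "df_e", "ms_e", "ss_t", "df_t")
# Given cells for the three tables (an interpretation of the source layout).
given = [
    {"df_a": 2, "ms_a": 1000, "ms_e": 400, "ss_t": 6800, "df_t": 14},
    {"ss_a": 800, "df_e": 11, "ms_e": 300, "ss_t": 4100, "df_t": 13},
    {"df_a": 7, "ms_a": 500, "ms_e": 336.364, "ss_t": 7200, "df_t": 18},
]
results = [complete_anova({**dict.fromkeys(KEYS), **g}) for g in given]
for res in results:
    print({key: round(val, 3) for key, val in res.items()})
```

Because 336.364 is itself rounded, the third table's error SS comes out as 3700.004 rather than exactly 3700.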
Analysis of Variance Testing
(1) Suppose that 34 students are divided into four categories: those with sports extracurricular
activities only, those with non-sport activities only, those with both sports and non-sports
activities, and those with neither type of activity. Each student was asked how many hours
he/she worked per week at paid employment, if any. Based on the results, the sums of
squares between and within were computed. Complete the analysis of variance table
presented below. Then test at the 5% level whether the average number of hours worked per
week varies with activity category.
Source of variation | Sum of squares | Degrees of freedom | Mean square
Between | 450.00 | ____ | ____
Within | 1500.00 | ____ | ____
Total | ____ | ____ |
(2) Suppose there are 4 years (freshman, sophomore, junior, and senior), and 2 housing statuses
(commuter and resident), for a total of 8 cells. Each cell has 4 observations. The number of
credits each student is carrying in the current semester is examined, and the various sums of
squares are computed. Complete the analysis of variance table presented below. Then test
at the 5% level whether the average number of credits carried is influenced by (a) class year,
(b) housing status, and (c) the interaction of class year and housing status.
Source of variation | Sum of squares | Degrees of freedom | Mean square
Class year | 30 | ____ | ____
Housing status | 4 | ____ | ____
Interaction | 24 | ____ | ____
Error | 48 | ____ | ____
Total | ____ | ____ |
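A sketch of the F tests for both problems. For the one-way table, df(between) = k - 1 and df(within) = N - k; for the two-way table with a class years, b housing statuses, and r observations per cell, the df are a - 1, b - 1, (a - 1)(b - 1), and ab(r - 1). The 5% F critical values are copied from an F table (assumed, not computed).

```python
# (1) One-way ANOVA: N = 34 students in k = 4 activity categories.
k, N = 4, 34
ss_between, ss_within = 450.0, 1500.0
df_between, df_within = k - 1, N - k                    # 3 and 30
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F1 = ms_between / ms_within
F_CRIT_1 = 2.92                                         # F table, 5%, df (3, 30)
print(f"(1) F = {F1:.2f} on ({df_between}, {df_within}) df; reject H0: {F1 > F_CRIT_1}")

# (2) Two-way ANOVA: a = 4 class years, b = 2 housing statuses, r = 4 per cell.
a, b, r = 4, 2, 4
ss = {"Class year": 30, "Housing status": 4, "Interaction": 24, "Error": 48}
df = {"Class year": a - 1, "Housing status": b - 1,
      "Interaction": (a - 1) * (b - 1), "Error": a * b * (r - 1)}
ms = {src: ss[src] / df[src] for src in ss}
ss_total, df_total = sum(ss.values()), a * b * r - 1     # 106 and 31
# F table, 5%, with 24 error df: (3, 24) -> 3.01 and (1, 24) -> 4.26.
F_CRIT_2 = {"Class year": 3.01, "Housing status": 4.26, "Interaction": 3.01}
F2 = {src: ms[src] / ms["Error"] for src in F_CRIT_2}
for src, F in F2.items():
    print(f"(2) {src}: F = {F:.2f} on ({df[src]}, {df['Error']}) df; "
          f"reject H0: {F > F_CRIT_2[src]}")
```

Every F ratio in (2) uses the error mean square as its denominator, which is why all three comparisons share 24 denominator df.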
Simple Regression
Consider the following data on the heights and weights of 30 students. Use these data to answer
the questions below.
height | weight
69 | 160
68 | 225
74 | 175
67 | 125
64 | 109
65 | 132
74 | 185
73 | 185
62 | 112
71 | 165
73 | 205
66 | 140
66 | 120
72 | 200
63 | 104
67 | 175
71 | 172
69 | 160
63 | 135
69 | 175
67 | 143
67 | 150
65 | 120
64 | 115
68 | 160
70 | 185
80 | 200
64 | 115
75 | 215
66 | 140
(1) Estimate the regression line of weight on height, $\widehat{WGT} = a + b\,HGT$.
(2) Calculate the standard error of the regression (or standard error of
the estimate).
(3) Calculate and interpret the coefficient of determination.
(4) Calculate the standard error of the estimated coefficient b. Use
this information to test at the 5% level whether the true slope of
the relation between height and weight is actually zero.
(5) Calculate the 95% confidence interval for the true slope of the
relation between height and weight.
(6) Calculate the sample correlation coefficient r. Test at the five
percent level whether the population correlation coefficient is
actually zero.
(7) Calculate the 90% forecasting interval for the weight of an
individual student whose height is 5 feet 9 inches.
(8) Calculate the 90% forecasting interval for the average weight of a
large group of students whose heights are all 5 feet 9 inches.
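A standard-library sketch of questions (1) to (8) using the usual least-squares formulas (b = Sxy/Sxx, a = Ȳ - bX̄) and t-table values for n - 2 = 28 df, which are hard-coded. It assumes heights are in inches, so that 5 feet 9 inches is 69.

```python
from math import sqrt

# Data from the table above; heights assumed to be in inches.
height = [69, 68, 74, 67, 64, 65, 74, 73, 62, 71, 73, 66, 66, 72, 63,
          67, 71, 69, 63, 69, 67, 67, 65, 64, 68, 70, 80, 64, 75, 66]
weight = [160, 225, 175, 125, 109, 132, 185, 185, 112, 165, 205, 140, 120, 200, 104,
          175, 172, 160, 135, 175, 143, 150, 120, 115, 160, 185, 200, 115, 215, 140]
n = len(height)
xbar, ybar = sum(height) / n, sum(weight) / n
sxx = sum((x - xbar) ** 2 for x in height)
syy = sum((y - ybar) ** 2 for y in weight)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(height, weight))

b = sxy / sxx                        # (1) slope
a = ybar - b * xbar                  # (1) intercept
sse = syy - b * sxy                  # unexplained (error) sum of squares
see = sqrt(sse / (n - 2))            # (2) standard error of the estimate
r2 = 1 - sse / syy                   # (3) coefficient of determination
se_b = see / sqrt(sxx)               # (4) standard error of b
t_b = b / se_b                       # (4) t statistic for H0: slope = 0

# t-table values for n - 2 = 28 df (copied from a table, not computed).
T_975, T_95 = 2.048, 1.701
ci_b = (b - T_975 * se_b, b + T_975 * se_b)        # (5) 95% interval for the slope
r = sxy / sqrt(sxx * syy)                          # (6) sample correlation
t_r = r * sqrt((n - 2) / (1 - r ** 2))             # (6) same value as t_b

x0 = 69                                            # 5 feet 9 inches
y_hat = a + b * x0
h_one = T_95 * see * sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)  # (7) one student
h_avg = T_95 * see * sqrt(1 / n + (x0 - xbar) ** 2 / sxx)      # (8) group average

print(f"WGT-hat = {a:.3f} + {b:.4f} HGT, SEE = {see:.3f}, R^2 = {r2:.4f}")
print(f"se(b) = {se_b:.4f}, t = {t_b:.3f}, 95% CI = ({ci_b[0]:.3f}, {ci_b[1]:.3f})")
print(f"r = {r:.4f}, t = {t_r:.3f}")
print(f"90% interval, one student: {y_hat - h_one:.1f} to {y_hat + h_one:.1f}")
print(f"90% interval, group average: {y_hat - h_avg:.1f} to {y_hat + h_avg:.1f}")
```

The only difference between (7) and (8) is the extra "1 +" under the square root, which accounts for the scatter of a single student around the regression line.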
Multiple Regression
Suppose that a regression is run using the number of hours of study time per week (STUDY) as
the dependent variable. There are 35 observations. The independent variables are
WKHRS: the number of hours worked per week at a job,
COMMUTER: a dummy variable equal to 1 for commuting students and 0 for resident students,
MALE: a dummy variable equal to 1 if the student is male and 0 if the student is female,
SENIOR: a dummy variable equal to 1 if the student is a senior and 0 otherwise.
The results are as follows.
Variable | Coefficient | Standard error
Constant | 20.0 | 10.0
WKHRS | -0.5 | 0.125
COMMUTER | 2.0 | 2.0
MALE | -3.0 | 6.0
SENIOR | 6.0 | 2.0

Analysis of Variance
Source of variation | Sum of squares | Degrees of freedom | Mean square
Regression | 160.0 | ____ | ____
Error | 40.0 | ____ | ____
Total | 200.0 | ____ |
(1) Complete the ANOVA table.
(2) Compute the standard error of the estimate (or standard error of the regression).
(3) Compute and interpret the unadjusted coefficient of determination.
(4) Compute the coefficient of determination adjusted for degrees of freedom.
(5) Test at the 5% level whether the coefficient on the variable MALE is equal to zero.
(6) Test at the 5% level whether the coefficient on the variable SENIOR is equal to zero.
(7) Test at the 5% level the null hypothesis that the coefficient on the variable COMMUTER is
equal to zero, versus the alternative that it is more than zero.
(8) Test at the 5% level the null hypothesis that the coefficient on the variable WKHRS is equal
to zero, versus the alternative that it is less than zero.
(9) Test at the 5% level the hypothesis that all the slope coefficients are zero.
(10) How much does expected study time change if a student works an additional hour at a job?
Specify whether this change is an increase or a decrease in study time.
(11) According to the regression results, do seniors study more or less than non-senior students?
By how much more or less than non-seniors do seniors study?
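With n = 35 observations and k = 4 slope coefficients, the ANOVA degrees of freedom are k, n - k - 1, and n - 1, and each coefficient's t statistic is its coefficient divided by its standard error. A minimal sketch, with critical values hard-coded from t and F tables for 30 and (4, 30) df (assumed, not computed):

```python
from math import sqrt

n, k = 35, 4                                        # observations, slope coefficients
ss_reg, ss_err, ss_tot = 160.0, 40.0, 200.0
df_reg, df_err, df_tot = k, n - k - 1, n - 1        # (1) 4, 30, 34
ms_reg, ms_err = ss_reg / df_reg, ss_err / df_err   # (1) mean squares
F = ms_reg / ms_err                                 # (1), (9) overall F statistic
see = sqrt(ms_err)                                  # (2) standard error of the estimate
r2 = ss_reg / ss_tot                                # (3) unadjusted R^2
r2_adj = 1 - (ss_err / df_err) / (ss_tot / df_tot)  # (4) adjusted R^2

# (coefficient, standard error) pairs from the regression output.
coef = {"Constant": (20.0, 10.0), "WKHRS": (-0.5, 0.125),
        "COMMUTER": (2.0, 2.0), "MALE": (-3.0, 6.0), "SENIOR": (6.0, 2.0)}
t = {name: b / se for name, (b, se) in coef.items()}

# Table values (assumed): two-sided and one-sided 5% t for 30 df, and F(4, 30).
T_TWO, T_ONE, F_CRIT = 2.042, 1.697, 2.69
print("(5) MALE, reject H0:", abs(t["MALE"]) > T_TWO)
print("(6) SENIOR, reject H0:", abs(t["SENIOR"]) > T_TWO)
print("(7) COMMUTER (H1: > 0), reject H0:", t["COMMUTER"] > T_ONE)
print("(8) WKHRS (H1: < 0), reject H0:", t["WKHRS"] < -T_ONE)
print("(9) all slopes zero, reject H0:", F > F_CRIT)
print("(10) change in study time per extra work hour:", coef["WKHRS"][0])
print("(11) seniors minus non-seniors, study hours:", coef["SENIOR"][0])
```

Questions (10) and (11) read directly off the coefficients, since each slope is the expected change in STUDY holding the other variables fixed.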
Time Series
Suppose a student takes an intensive summer school course. The course meets all day Monday,
Tuesday, Wednesday, and Thursday for four weeks. A short quiz is given each day. Suppose
the student's quiz grades are as follows. Answer questions 1 to 3 based on these data.
Week | M | Tu | W | Th
I | 4 | 5 | 8 | 6
II | 6 | 8 | 5 | 7
III | 6 | 7 | 9 | 8
IV | 8 | 8 | 9 | 7
(1) Using four-day moving averages, compute the "seasonal" or "daily" index for each day of
the week (instead of each season).
(2) Based on your daily indices, on which day does the student usually perform the best? On
which day does the student usually perform the worst?
(3) Use your daily indices to adjust the time series of quiz grades.
(4) Consider the following sequence of quiz grades. Calculate the grade forecasts for quizzes 2
to 10, using the exponential smoothing method. Let the forecast F1 for the first quiz grade
be the actual value A1 for the first quiz. Use a weight on the actual grade of w = 0.5.
Actual grade | Forecasted grade for next quiz
(A1) 5 | (F2) ____
(A2) 8 | (F3) ____
(A3) 6 | (F4) ____
(A4) 8 | (F5) ____
(A5) 5 | (F6) ____
(A6) 7 | (F7) ____
(A7) 6 | (F8) ____
(A8) 7 | (F9) ____
(A9) 9 | (F10) ____
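One common way to build the daily indices is the ratio-to-centered-moving-average method: average four consecutive days, center by averaging adjacent four-day averages, divide each grade by its centered average, average the ratios for each day, and rescale so the four indices average 1. The sketch below assumes that method (courses differ, for example by using medians instead of means) and also computes the exponential-smoothing forecasts for question (4).

```python
grades = [4, 5, 8, 6, 6, 8, 5, 7, 6, 7, 9, 8, 8, 8, 9, 7]  # weeks I-IV, M Tu W Th
days = ["M", "Tu", "W", "Th"]
L = len(days)                                    # 4 class days play the role of seasons

# Four-day moving averages, then centered averages of adjacent pairs.
ma = [sum(grades[i:i + L]) / L for i in range(len(grades) - L + 1)]
cma = [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

# cma[i] is centered on grade number i + 2 (0-based): Wednesday of week I first.
ratios = {d: [] for d in days}
for i, c in enumerate(cma):
    t = i + L // 2
    ratios[days[t % L]].append(grades[t] / c)

raw = {d: sum(v) / len(v) for d, v in ratios.items()}     # mean ratio per day
scale = L / sum(raw.values())                              # make indices average 1
index = {d: raw[d] * scale for d in days}                  # (1)
best, worst = max(index, key=index.get), min(index, key=index.get)   # (2)
adjusted = [g / index[days[t % L]] for t, g in enumerate(grades)]    # (3)

# (4) Exponential smoothing: F(t+1) = w*A(t) + (1 - w)*F(t), starting from F1 = A1.
actual = [5, 8, 6, 8, 5, 7, 6, 7, 9]
w = 0.5
forecasts = [actual[0]]                                    # F1
for a_t in actual:
    forecasts.append(w * a_t + (1 - w) * forecasts[-1])    # F2, ..., F10

print({d: round(v, 4) for d, v in index.items()}, "best:", best, "worst:", worst)
print("adjusted:", [round(x, 2) for x in adjusted])
print("F2..F10:", forecasts[1:])
```

Dividing each grade by its day's index removes the typical day-of-week effect, so the adjusted series shows the underlying trend more clearly.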
Nonparametric Tests
(1) Suppose the grades on an exam for the male and female students in a class were as indicated
below. Use the Wilcoxon rank sum test to test at the 5% level whether males and females
did equally well. (Note: If two students tie for ranks 1 and 2, they both get the "middle"
rank of 1.5, and the student following them is ranked 3. If three students tie for ranks 1, 2,
and 3, they all get the "middle" rank of 2, and the student following them is ranked 4. If
four students tie for ranks 1, 2, 3, and 4, they all get the "middle" rank of 2.5, and the
student following them is ranked 5.)
males: 97, 94, 86, 86, 83, 81, 78, 76, 76, 69, 63, 56, 55, 51, 47, 38, 32, 21, 20, 3
females: 97, 92, 89, 86, 86, 85, 77, 76, 70, 56, 49, 41, 39, 29, 15
(2) Two drivers are testing the mileage of different models of cars.
a. The gas mileage of nine different cars is as indicated. Use the Wilcoxon signed rank test to
test at the 5% level whether there is a difference in the gas mileage for the two drivers based
on the data below. (Use the table for the signed rank critical values for this part.)
driver 1 | 14.3 | 15.0 | 27.8 | 27.9 | 48.8 | 16.8 | 23.7 | 32.8 | 37.3
driver 2 | 16.8 | 17.8 | 26.2 | 33.2 | 47.6 | 18.3 | 28.5 | 33.1 | 44.0
b. Thirty-three cars were tested. Differences in mileage were calculated and ranked. There
were three differences that were equal to zero. The sum of the positive ranks was 200 and
the sum of the negative ranks was 265. Use the Wilcoxon signed rank test to test at the
5% level whether there is a difference in the gas mileage for the two drivers.
(3) Suppose scores for 17 students from 3 schools in intermural competitions are as given
below. Use the Kruskal-Wallis Test to test at the 5% level whether average scores for
students from the three schools are the same.
School A: 27, 21, 30, 23, 18, 19
School B: 22, 29, 17, 26, 14, 16
School C: 24, 12, 11, 13, 28
(4) Suppose that in a class of 15 males and 10 females, course averages are such that the males
(m) and females (f) rank from high to low as given below. Test whether the arrangement is
random, at the 10% level.
f f m m m f m f f f m m m m m f f m m m f m m m f
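A sketch of the four nonparametric statistics, using large-sample normal approximations where the problem does not point to a table. The hypothetical `midranks` helper ranks from lowest to highest; ranking from highest instead, as the note in problem (1) may intend, only flips the sign of the rank-sum z. No tie corrections are applied, which is an assumption, and the small-sample values in (2a) should be compared with the signed rank table.

```python
from math import sqrt


def midranks(values):
    """Rank from lowest (rank 1) to highest; tied values share their average rank."""
    order = sorted(values)
    first = {}
    for pos, v in enumerate(order, start=1):
        first.setdefault(v, pos)              # first position of each distinct value
    return [first[v] + (order.count(v) - 1) / 2 for v in values]


# (1) Wilcoxon rank sum, large-sample normal approximation.
males = [97, 94, 86, 86, 83, 81, 78, 76, 76, 69, 63, 56, 55, 51, 47, 38, 32, 21, 20, 3]
females = [97, 92, 89, 86, 86, 85, 77, 76, 70, 56, 49, 41, 39, 29, 15]
ranks = midranks(males + females)
n_m, n_f = len(males), len(females)
N = n_m + n_f
W_f = sum(ranks[n_m:])                        # rank sum of the female sample
z_rank_sum = (W_f - n_f * (N + 1) / 2) / sqrt(n_m * n_f * (N + 1) / 12)

# (2a) Wilcoxon signed rank on driver 1 minus driver 2 (zero differences dropped).
driver1 = [14.3, 15.0, 27.8, 27.9, 48.8, 16.8, 23.7, 32.8, 37.3]
driver2 = [16.8, 17.8, 26.2, 33.2, 47.6, 18.3, 28.5, 33.1, 44.0]
diffs = [round(x - y, 1) for x, y in zip(driver1, driver2)]   # remove float noise
diffs = [d for d in diffs if d != 0]
abs_ranks = midranks([abs(d) for d in diffs])
T_plus = sum(r for r, d in zip(abs_ranks, diffs) if d > 0)
T_minus = sum(r for r, d in zip(abs_ranks, diffs) if d < 0)

# (2b) Large-sample signed rank: 33 cars minus 3 zero differences = 30.
n_sr, T_plus_b = 30, 200
mu_T = n_sr * (n_sr + 1) / 4
z_signed = (T_plus_b - mu_T) / sqrt(n_sr * (n_sr + 1) * (2 * n_sr + 1) / 24)

# (3) Kruskal-Wallis H, compared with chi-square on k - 1 = 2 df.
schools = [[27, 21, 30, 23, 18, 19], [22, 29, 17, 26, 14, 16], [24, 12, 11, 13, 28]]
pooled = [x for g in schools for x in g]
kw_ranks = midranks(pooled)
N_kw, total, start = len(pooled), 0.0, 0
for g in schools:
    R = sum(kw_ranks[start:start + len(g)])   # rank sum of this school
    total += R ** 2 / len(g)
    start += len(g)
H = 12 / (N_kw * (N_kw + 1)) * total - 3 * (N_kw + 1)

# (4) Runs test, normal approximation.
seq = "f f m m m f m f f f m m m m m f f m m m f m m m f".split()
runs = 1 + sum(1 for prev, cur in zip(seq, seq[1:]) if cur != prev)
n1, n2 = seq.count("m"), seq.count("f")
mu_R = 2 * n1 * n2 / (n1 + n2) + 1
var_R = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
z_runs = (runs - mu_R) / sqrt(var_R)

print(f"(1) W_female = {W_f}, z = {z_rank_sum:.3f}")
print(f"(2a) T+ = {T_plus}, T- = {T_minus}")
print(f"(2b) z = {z_signed:.3f}")
print(f"(3) H = {H:.3f}")
print(f"(4) runs = {runs}, z = {z_runs:.3f}")
```

A quick sanity check on any ranking is that the ranks sum to N(N + 1)/2, for example 630 for the 35 exam scores in (1).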