1. Of the 600 voters polled, 245 said they would vote... A. Find the Confidence interval of the proportion of...

1.
Of the 600 voters polled, 245 said they would vote for Herman Cain:
A. Find the confidence interval for the proportion of voters who would vote for this candidate
at the 0.05 significance level.
B. Calculate the margin of error for that proportion.
Solution:
The given information can be written in the following notation:
Sample size: n = 600 voters
Number of people who would vote for Herman Cain: X = 245
Sample proportion of voters for Herman Cain: $\hat{p} = X/n = 245/600 = 0.4083$
$\hat{q} = 1 - \hat{p} = 1 - 0.4083 = 0.5917$
A) Find the confidence interval for the proportion of voters who would vote for this candidate at
the 0.05 significance level.
Solution:
Since the sample size n = 600 is greater than 30, we use the Z (standard normal) distribution.
The 95% confidence interval for the population proportion is
$$\left( \hat{p} - Z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} \right)$$
Critical value:
At the α = 0.05 level of significance, the critical value of Z is $Z_{\alpha/2} = Z_{0.025} = 1.96$
Substituting the values into the above formula, we get
$$\left( 0.4083 - 1.96 \sqrt{\frac{0.4083 \times 0.5917}{600}},\ 0.4083 + 1.96 \sqrt{\frac{0.4083 \times 0.5917}{600}} \right)$$
(0.4083 − 1.96 × 0.02007, 0.4083 + 1.96 × 0.02007)
(0.4083 − 0.0393, 0.4083 + 0.0393)
(0.369, 0.448)
Hence the 95% confidence interval estimate of the proportion of voters is (0.369, 0.448), or about (37%, 45%).
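The interval calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `proportion_ci` is an illustrative choice, not part of the original solution. It uses the unrounded proportion 245/600, so the last digit may differ slightly from a hand calculation with rounded values.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


lower, upper, margin = proportion_ci(245, 600)
print(f"95% CI = ({lower:.3f}, {upper:.3f}), margin of error = {margin:.4f}")
```

The third returned value is the margin of error, which is the half-width of the interval.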
/---------------------------/
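B) Calculate the margin of error for that proportion.
Solution:
Margin of error: $E = Z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$
$E = 1.96 \sqrt{\frac{0.4083 \times 0.5917}{600}} = 1.96 \times 0.02007 = 0.0393$
Therefore, the margin of error for the given proportion is 0.0393, or about 3.9 percentage points.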
/---------------------/
Problem: 2
The data below represent the ages of undergraduate students at Harper College and Daley
College. Use the information to answer questions 27 to 29.
                     Harper College   Daley College
Number sampled       14               12
Mean                 21               22
Standard deviation   2.5              2.8
1.
Perform the requested operation:
C. Find the critical value of t to test the claim that the ages of students in the two colleges
are similar at the 0.05 significance level.
D. Calculate the Confidence interval of the Mean for Harper College students
Solution:
C)
At the α = 0.05 level of significance, the critical value of t with (n1 + n2 − 2) = 24 degrees of
freedom is given by
$t_{\alpha/2,\ n_1+n_2-2} = t_{0.025,\ 24} = 2.064$
(from the t-distribution table)
/----------------/
D) The 95% confidence interval for the difference between the two population means is given by
$$\left( (\bar{X}_1 - \bar{X}_2) - t_{0.025} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\ (\bar{X}_1 - \bar{X}_2) + t_{0.025} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \right)$$
Substituting the values from the table above, we have
$$\left( (21 - 22) - 2.064 \sqrt{\frac{2.5^2}{14} + \frac{2.8^2}{12}},\ (21 - 22) + 2.064 \sqrt{\frac{2.5^2}{14} + \frac{2.8^2}{12}} \right)$$
(−1 − 2.064 × 1.0487, −1 + 2.064 × 1.0487)
[−3.164, 1.164]
Thus, the 95% confidence interval for the difference between the two population means is
[−3.164, 1.164]. Because the interval contains 0, the data are consistent with similar mean ages at the two colleges.
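The interval for the difference of means can be reproduced as follows. This is a sketch under stated assumptions: the critical value is passed in rather than computed (the standard library has no t-distribution), and the value 2.064 used here is the usual table entry for 24 degrees of freedom at a two-tailed 0.05 level. The function name is illustrative.

```python
from math import sqrt


def mean_difference_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Confidence interval for mean1 - mean2 from summary statistics."""
    diff = mean1 - mean2
    # Standard error of the difference between two independent sample means.
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return diff - t_crit * se, diff + t_crit * se


# Assumption: table value of t for 24 degrees of freedom, two-tailed alpha = 0.05.
T_CRIT_24_DF = 2.064

lower, upper = mean_difference_ci(21, 2.5, 14, 22, 2.8, 12, T_CRIT_24_DF)
print(f"95% CI for the difference = [{lower:.3f}, {upper:.3f}]")
```

Swapping in a different critical value changes only the width of the interval, not its center of −1.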
/---------------------/
3. Perform the requested operation:
A. Find the critical value of F to test the claim that the ages of students in the two colleges
are similar at the 0.05 significance level.
B. Calculate the Coefficient of variation for the two groups and interpret them
Solution:
A. The Critical value of F is given by, F0.05, 14,12 = 2.64 (by referring F-table)
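B. The coefficient of variation is built from two other statistics, the standard deviation
and the mean, as follows:
C.V. = (standard deviation / mean) × 100
Harper College: C.V. = (2.5 / 21) × 100 = 11.90
Daley College: C.V. = (2.8 / 22) × 100 = 12.73
Relative to their means, ages at Daley College are slightly more variable than ages at Harper College, but the two coefficients are close.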
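As a quick check, the coefficient of variation for each college can be computed from the summary statistics given in the problem. The helper name below is an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


harper_cv = coefficient_of_variation(2.5, 21)
daley_cv = coefficient_of_variation(2.8, 22)
print(f"Harper C.V. = {harper_cv:.2f}, Daley C.V. = {daley_cv:.2f}")
```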
Problem: 4
In a clinical experiment, a researcher wants to test the effect of Medical Marijuana on Glaucoma
patients. He conducts a double-blind study using marijuana and a placebo (fake weed). The table
below summarizes the data on patient response of ‘feeling better’. With 95% Confidence, should
medical marijuana be recommended?
Questions answered correctly
Placebo Group: n = 42, Mean = 43.14, SD = 7.74
Marijuana Group: n = 34, Mean = 40.76, Var = 61.15
Solution:
The given information can be written in the following notation:
Sample size of Placebo Group: n1 = 42
Sample size of Marijuana Group: n2 = 34
Sample mean of Placebo Group: $\bar{X}_1$ = 43.14
Sample mean of Marijuana Group: $\bar{X}_2$ = 40.76
Sample standard deviation of Placebo Group: S1 = 7.74
Sample standard deviation of Marijuana Group: S2 = √61.15 = 7.82
Hypotheses:
Null hypothesis: H0: μ1 = μ2 (the mean response is the same in the placebo and marijuana groups).
Alternate hypothesis: H1: μ1 ≠ μ2 (the mean response differs between the two groups).
Using MegaStat, an add-in for Microsoft Excel, we get:
Hypothesis Test: Independent Groups (z-test)
             Placebo Group   Marijuana Group
mean         42              34
std. dev.    7.74            7.82
n            43              41

8.000      difference (Placebo Group - Marijuana Group)
1.698      standard error of difference
0          hypothesized difference
4.71       z
2.47E-06   p-value (two-tailed)
4.671      confidence interval 95.% lower
11.329     confidence interval 95.% upper
3.329      half-width

F-test for equality of variance
61.1524    variance: Marijuana Group
59.9076    variance: Placebo Group
1.02       F
.9458      p-value
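Conclusion:
The p-value 0.9458 belongs to the F-test, not to the z statistic: since it is greater than 0.05, there is no evidence that the two group variances differ. The z-test in the printout reports a difference of 8.000, which matches 42 − 34 (the two sample sizes) rather than the stated means, so its z = 4.71 and p-value of 2.47E-06 do not describe the stated data. With the stated values, the difference in means is 43.14 − 40.76 = 2.38, the standard error is $\sqrt{7.74^2/42 + 61.15/34} = 1.796$, and z = 2.38 / 1.796 = 1.33, with a two-tailed p-value of about 0.185. Since this p-value is greater than 0.05, we fail to reject the hypothesis of equal means at the 5% level. Hence, with 95% confidence, the data show no significant difference between the marijuana and placebo groups, so they do not support recommending medical marijuana.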
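The z statistic can be recomputed directly from the summary statistics in the problem statement. This is an independent check written as a minimal sketch, not a reproduction of the MegaStat run, and the function name is illustrative. The printout reports a difference of 8.000, whereas the stated means differ by 43.14 − 40.76 = 2.38, so the two results are not expected to agree.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """z statistic and two-tailed p-value for H0: the two means are equal."""
    se = sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Placebo: n = 42, mean = 43.14, SD = 7.74. Marijuana: n = 34, mean = 40.76, Var = 61.15.
z, p = two_sample_z(43.14, 7.74**2, 42, 40.76, 61.15, 34)
print(f"z = {z:.2f}, two-tailed p-value = {p:.3f}")
```

A p-value above 0.05 here means the stated data do not show a significant difference between the two groups.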
Procedure:
Problem: 5
A researcher wants to know whether race and having a single parent are factors in gun violence
in a small town. A small group of the town's residents were surveyed on their experience with
violence involving a family member. The computer printout of the ANOVA table below
summarizes the results. Interpret your results.
Two-Way ANALYSIS OF VARIANCE
Source          Sum of Squares   Degrees of freedom   Mean square   F-ratio   Significance of F
Race            10.00            1                    10.00         18.87     0.03
Single parent   0.42             1                    0.42          0.79      .40
Interaction     0.41             1                    0.41          0.77      .41
Residue         3.17             6                    0.53
Total           14               9
A. Interpret your results using the classical approach (F-value only)
B. Using α = 0.05, interpret your results using the p-value approach (p-value and alpha)
Solution:
a)
With 1 numerator and 6 denominator degrees of freedom, the critical value at α = 0.05 is F0.05, 1, 6 = 5.99 (from the F-table).
Race: F = 18.87 > 5.99, so the main effect of Race is significant.
Single Parent: F = 0.79 < 5.99, so the main effect of Single Parent is not significant.
Interaction: F = 0.77 < 5.99, so the interaction is not significant.
b)
From the above table, we have
The p-value of the main effect Race is 0.03
The p-value of the main effect Single Parent is 0.40
The p-value of the interaction is 0.41
Conclusion:
Since the p-value of the main effect Race is less than 0.05, we reject the null hypothesis
at the 0.05 level of significance. Hence we conclude that the population means differ
significantly between the levels of the factor Race.
Since the p-value of the main effect Single Parent is greater than 0.05, we do not reject
the null hypothesis at the 0.05 level of significance. Hence we conclude that there is no significant
difference in population means between the levels of the factor Single Parent.
Since the p-value of the interaction between Race and Single Parent is greater than 0.05, we
do not reject the null hypothesis at the 0.05 level of significance. Hence we conclude that there is no
significant interaction between the factors Race and Single Parent.
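The F-ratios in the ANOVA table can be checked by dividing each mean square by the residual mean square, and the classical approach then compares each ratio with a critical value. In this sketch the critical value 5.99 is an assumption taken from a standard F-table for (1, 6) degrees of freedom at α = 0.05; the variable names are illustrative.

```python
# Mean squares copied from the ANOVA printout.
mean_squares = {"Race": 10.00, "Single parent": 0.42, "Interaction": 0.41}
MS_RESIDUAL = 0.53

# Assumption: table value of F for (1, 6) degrees of freedom at alpha = 0.05.
F_CRITICAL = 5.99

# Each F-ratio is the effect's mean square divided by the residual mean square.
f_ratios = {name: ms / MS_RESIDUAL for name, ms in mean_squares.items()}
significant = {name: f > F_CRITICAL for name, f in f_ratios.items()}

for name, f in f_ratios.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: F = {f:.2f} ({verdict})")
```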
/-----------------------/
29. Perform the requested operations:
(10 points)
a. Before a student takes a Statistics prep class, he or she must take a pretest and then a posttest
after the completion of the course. Typical results for two students are shown in the table
below.
            Student 1   Student 2
Pretest     510         475
Posttest    662         620
i. Which is the independent variable?
ii. Write an equation that models the test scores
iii. Interpret the model
Solution:
(i)
Pretest is the independent variable, because the pretest is taken before the completion of
the course.
(ii)
y = 1.2x + 50, where x is the pretest score and y is the posttest score (R² = 1)
[Figure: "Pretest Vs Posttest" scatter plot with fitted line; x-axis: Pretest, y-axis: Posttest]
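(iii)
The slope of 1.2 means that each additional point on the pretest is associated with 1.2 additional points on the posttest, and the intercept of 50 is the predicted posttest score for a pretest score of 0. Because the line is fitted through only two points, R² = 1 is automatic and says nothing about how well the model fits other students.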
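With only two students, the model is simply the straight line through the two (pretest, posttest) points. A minimal sketch of that calculation follows; the function name is an illustrative choice.

```python
def line_through(point1, point2):
    """Slope and intercept of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = point1, point2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# (pretest, posttest) pairs for the two students in the table.
slope, intercept = line_through((475, 620), (510, 662))
print(f"posttest = {slope:.1f} * pretest + {intercept:.0f}")
```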
/----------------------/
b. The cost of a one-day car rental is the sum of the rental fee, $100, plus $0.78 per mile.
i. Write an equation that models the cost associated with renting a car.
Solution:
Y = 0.78x + 100, where x is the number of miles driven and Y is the total cost in dollars
ii. Interpret the model
Solution:
As the number of miles increases, the rental cost also increases: each mile adds $0.78 to the fixed fee of $100.
iii. Find the cost of renting a car and driving it for 1000 miles.
Solution:
Y = 0.78x + 100
Y = 0.78(1000) + 100
Y = 880
Therefore, the cost of renting a car and driving it for 1000 miles is $880.
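The rental model translates directly into a small function. The constant and function names below are illustrative.

```python
RENTAL_FEE = 100.00    # fixed one-day fee in dollars
COST_PER_MILE = 0.78   # dollars per mile driven


def rental_cost(miles):
    """Total one-day rental cost in dollars for the given mileage."""
    return RENTAL_FEE + COST_PER_MILE * miles


cost = rental_cost(1000)
print(f"Cost for 1000 miles: ${cost:.2f}")
```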
30. Administrators wanted to predict a student's grade on a Senior College Statistics Midterm
based on his/her SAT score. A sample of ten past senior students was selected, and their
SAT scores and Midterm scores were recorded. The table below summarizes the data.
(15 points)
Student ID   SAT score (Independent Variable) x   Midterm score (Dependent Variable) y
AB           1100                                 89
CD           1300                                 92
EF           1000                                 86
GH           1100                                 92
IJ           1200                                 90
LM           1200                                 93
NO           1400                                 98
PQ           1300                                 95
RS           1000                                 88
TZ           1400                                 95
• Use your calculator or statistical software to sketch the scatter plot for this data
and determine the correlation coefficient (r) between SAT and Midterm scores.
Solution:
[Figure: scatter plot of Midterm score (y) against SAT score (x) with fitted line y = 0.022x + 65.4, R² = 0.809]
The correlation coefficient is r = √0.809 ≈ 0.900 (positive, because the slope of the fitted line is positive).
• With α = 0.05, test the statistical significance of (r) in (a) above.
Solution:
ANOVA table
Source       SS         df   MS        F       p-value
Regression   96.8000    1    96.8000   33.96   .0004
Residual     22.8000    8    2.8500
Total        119.6000   9

Regression output
variables       coefficients   std. error   t (df=8)   p-value    95% lower   95% upper   std. coeff.
Intercept       65.4000        4.5612       14.338     5.46E-07   54.8817     75.9183     0.000
SAT Score (x)   0.0220         0.0038       5.828      .0004      0.0133      0.0307      0.900
Since the p-value (.0004) is less than 0.05, the correlation between SAT score and Midterm score is statistically significant.
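The same test can be done by hand with the t statistic for a correlation coefficient, t = r√(n − 2) / √(1 − r²), which should reproduce the slope's t-value in the regression output. In this sketch r is recovered from the sums of squares in the ANOVA table, and the critical value 2.306 is an assumption taken from a standard t-table for 8 degrees of freedom at a two-tailed 0.05 level.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for testing H0: the population correlation is zero."""
    return r * sqrt(n - 2) / sqrt(1 - r**2)


# r squared = SS regression / SS total; the positive root is taken
# because the fitted slope is positive.
r = sqrt(96.8 / 119.6)
t = correlation_t(r, 10)

# Assumption: table value of t for 8 degrees of freedom, two-tailed alpha = 0.05.
T_CRIT_8_DF = 2.306

print(f"r = {r:.3f}, t = {t:.3f}, significant: {abs(t) > T_CRIT_8_DF}")
```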
• What is the value of r² and what does it tell you?
Solution:
R² = 0.809
About 80.9% of the variation in Midterm scores is explained by the linear relationship with SAT scores. r² measures the proportion of variation that the two variables share (the
"strength" or "magnitude" of the relationship). To evaluate the correlation
between variables, it is important to know this strength as well as
the significance of the correlation.
• Determine the equation of the regression line for this data
Solution:
The equation of the regression line for this data is y = 0.022x + 65.4
• Predict the Midterm score for a student who had a score of 1275 on the SAT
Solution:
y = 0.022x + 65.4
y = 0.022(1275) + 65.4
y = 93.45
Therefore, the predicted Midterm score for a student who had a score of 1275 on the SAT
is 93.45.
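The regression line and the prediction can be reproduced from the ten data pairs with the least-squares formulas slope = Sxy / Sxx and intercept = ȳ − slope · x̄. This is a minimal standard-library sketch; the function and variable names are illustrative.

```python
def least_squares(xs, ys):
    """Slope and intercept of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


sat = [1100, 1300, 1000, 1100, 1200, 1200, 1400, 1300, 1000, 1400]
midterm = [89, 92, 86, 92, 90, 93, 98, 95, 88, 95]

slope, intercept = least_squares(sat, midterm)
predicted = slope * 1275 + intercept
print(f"y = {slope:.3f}x + {intercept:.1f}; predicted Midterm at SAT 1275 = {predicted:.2f}")
```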
/---------------------------/
31. The following table gives the mean daily calorie intake and infant mortality rate (deaths
per 1000 live births) for ten countries.
(15 points)
Country       Infant Mortality Rate (Independent Variable) x   Mean Daily Calories (Dependent Variable) y
Afghanistan   154                                              1523
Austria       6                                                3495
Burundi       114                                              1941
Colombia      24                                               2678
Ethiopia      107                                              1610
Germany       6                                                3443
Liberia       153                                              1640
New Zealand   7                                                3362
Turkey        44                                               3429
USA           7                                                3671
i. Use your calculator or statistical software to sketch the scatter plot for this data. Does
there appear to be a correlation between the variables? If so describe the correlation.
Solution:
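[Figure: scatter plot "IMR VS MDC" of Mean Daily Calories (y) against Infant Mortality Rate (x) with fitted line y = -13.53x + 3521, R² = 0.884]
Yes. The points fall close to a downward-sloping line, so there is a strong negative linear correlation: countries with higher infant mortality rates tend to have lower mean daily calorie intakes.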
ii. Determine the correlation coefficient (r) between Infant Mortality and Calorie intake.
Solution:
Correlation Matrix
                        Infant Mortality Rate   Mean Daily Calories
Infant Mortality Rate   1.000
Mean Daily Calories     -.941                   1.000

10       sample size
± .632   critical value .05 (two-tail)
± .765   critical value .01 (two-tail)
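The correlation coefficient can be recomputed from the ten data pairs and compared with the two-tail 0.05 critical value reported in the correlation matrix output. This is a minimal sketch with illustrative names, using r = Sxy / √(Sxx · Syy).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


infant_mortality = [154, 6, 114, 24, 107, 6, 153, 7, 44, 7]
calories = [1523, 3495, 1941, 2678, 1610, 3443, 1640, 3362, 3429, 3671]

r = pearson_r(infant_mortality, calories)

# Critical value for n = 10 at the two-tail 0.05 level, as reported in the output above.
CRITICAL_R = 0.632
significant = abs(r) > CRITICAL_R
print(f"r = {r:.3f}, significant at 0.05: {significant}")
```

Because |r| exceeds the critical value, the correlation is significant at the 0.05 level.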
iii. With α = 0.05, test the statistical significance of r in (ii) above.
iv. The regression analysis is carried out in Excel (Data → Data Analysis → Regression Analysis → selection of the variables) and the output is given
below:
ANOVA table
Source       SS               df   MS               F       p-value
Regression   6,517,718.2501   1    6,517,718.2501   61.48   .0001
Residual     848,149.3499     8    106,018.6687
Total        7,365,867.6000   9

Regression output
variables               coefficients   std. error   t (df=8)   p-value    95% lower    95% upper
Intercept               3,521.2450     148.7793     23.668     1.08E-08   3,178.1594   3,864.3306
Infant Mortality Rate   -13.5377       1.7266       -7.841     .0001      -17.5192     -9.5562
Since the p-value (.0001) is less than 0.05, the linear relationship between infant mortality rate and mean daily calories is statistically significant.
v. What is the value of r² and what does it tell you?
Solution:
R² = 0.884
About 88.4% of the variation in mean daily calories is explained by the linear relationship with the infant mortality rate. r² measures the proportion of variation that the two variables share (the
"strength" or "magnitude" of the relationship). To evaluate the correlation
between variables, it is important to know this strength as well as
the significance of the correlation.
vi. Find the regression equation for this data
Solution:
Therefore, the regression equation for this data is y = -13.53x + 3521.
vii. Use the regression equation to estimate the infant mortality rate for a country with a mean
daily calorie intake of 2800 calories.
Solution:
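In the regression equation, y is the mean daily calorie intake and x is the infant mortality rate, so we set y = 2800 and solve for x:
2800 = -13.53x + 3521
x = (3521 − 2800) / 13.53
x = 53.3
Therefore, the estimated infant mortality rate for a country with a mean daily calorie
intake of 2800 calories is about 53 deaths per 1000 live births.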
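Because the fitted equation has mean daily calories as y and the infant mortality rate as x, estimating mortality from a calorie intake means solving the equation for x rather than substituting 2800 for x. A minimal sketch of that inversion follows, using the rounded coefficients from the regression equation; the names are illustrative.

```python
# Rounded coefficients of the fitted line: calories = SLOPE * mortality + INTERCEPT.
SLOPE = -13.53
INTERCEPT = 3521


def mortality_for_calories(calories):
    """Solve calories = SLOPE * x + INTERCEPT for the infant mortality rate x."""
    return (calories - INTERCEPT) / SLOPE


estimate = mortality_for_calories(2800)
print(f"Estimated infant mortality rate: {estimate:.1f} deaths per 1000 live births")
```

An alternative would be to regress infant mortality on calories directly, which gives a similar but not identical estimate.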
/-----------------------/