Solution to 5 - University of South Alabama

Solution to 5.12
[Figure: scatterplot of y vs x]
a. Comment on the features:
i. The trend appears to be linear, with one outlying observation.
ii. There are only 9 data points and they look fairly scattered, so the relationship may
not be very strong.
b. The value of y appears to decrease as the value of x increases; that is, the linear
relationship seems to be negative.
c. The following table gives the computation of r.
Mean(x) = 16, mean(y) = 53, s.d.(x) = 6.82, s.d.(y) = 10.02. (How do I compute these
numbers? What are the formulae? See the solutions to the problems from chapter 4.)
To compute r, we first compute the z-scores for all x-values as well as for all y-values
in the following table. (How to compute the z-score? What is the formula? See
the solutions to the problems from chapter 4.)
  x      Zx          y      Zy          ZxZy
 23      1.02639     43     -0.998      -1.02434
 14     -0.29326     59      0.5988     -0.1756
 14     -0.29326     48     -0.499       0.146337
  0     -2.34604     77      2.39521    -5.61926
 17      0.14663     50     -0.2994     -0.0439
 20      0.58651     52     -0.0998     -0.05853
 20      0.58651     46     -0.6986     -0.40974
 15     -0.14663     51     -0.1996      0.029267
 21      0.73314     51     -0.1996     -0.14633
Here Zx = (x - mean(x))/s.d.(x) and Zy = (y - mean(y))/s.d.(y).
Σ ZxZy = sum of the last column in the table above = -7.3021
r = Σ ZxZy / (n - 1) = -7.3021/8 = -.91276
i. Since the value of r is negative, there is a negative
linear relationship between x and y.
ii. Since the value of r is quite close to -1, we conclude that there is a strong
linear relationship between x and y.
iii. The value of r supports our speculation in part b.
d. The strong negative linear relationship between anxiety and exam scores indicates that
higher anxiety goes with lower exam scores, so anxiety may explain the poor scores on the exam.
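As a cross-check on part c, the following Python sketch recomputes the means, sample standard deviations, z-scores and r from the nine (x, y) pairs in the table. It uses only the standard library, and the helper names (mean, sample_sd, correlation_from_z_scores) are illustrative choices, not taken from the textbook. It works with unrounded standard deviations, so r comes out near -.912 rather than exactly -.91276; the small difference comes from the hand computation rounding the s.d. values to two decimals.

```python
import math

# Data from the table in part c
x = [23, 14, 14, 0, 17, 20, 20, 15, 21]
y = [43, 59, 48, 77, 50, 52, 46, 51, 51]


def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation_from_z_scores(xs, ys):
    """r = sum(Zx * Zy) / (n - 1), the formula used in part c."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    zx = [(v - mx) / sx for v in xs]
    zy = [(v - my) / sy for v in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


r = correlation_from_z_scores(x, y)
print("mean(x), mean(y):", mean(x), mean(y))
print("s.d.(x), s.d.(y):", round(sample_sd(x), 2), round(sample_sd(y), 2))
print("r:", round(r, 3))
```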
5.18
x= grade in which a student is studying (independent variable)
y = percentage of students who are likely to purchase (dependent variable)
[Figure: scatterplot of percentage (y) vs grade (x)]
a. Comments on the features of the scatterplot:
i. Though there are only four data points, the scatter diagram shows a linear
trend among the points.
ii. The relationship between x and y seems to be positive.
b. To find the equation of the line, we find the values of a and b in the relation y = a + bx.
  x     x²      y       y²         xy
  6     36     32.7    1069.29     196.20
  8     64     46.1    2125.21     368.80
 10    100     75.0    5625.00     750.00
 12    144     83.6    6988.96    1003.20
Σx = 36, Σx² = 344, Σy = 237.4, Σy² = 15808.5, Σxy = 2318.2 (n = 4)
b = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]  and  a = mean(y) - b*mean(x)
Plugging the sums into the formulae, we get
b = 9.08 and a = -22.4

The fitted line is y = -22.4 + 9.08*x
slope = 9.08
y-intercept = -22.4
Since the slope is positive, the relationship between x and y is positive; that is, the value of y
increases as x increases.
Interpretation of the slope: if the value of x increases by 1 unit, then on average the value of y
increases by approximately 9 units.
How to predict?
If students are in 7th grade, what is the corresponding
percentage?
That is, predict y for x = 7.
We plug x = 7 into the fitted line.

Thus the predicted value is y = -22.4 + 9.08*7 = 41.16
That is, approximately 41% of 7th graders are likely
to purchase the tickets.
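The least-squares computation for 5.18 can be sketched in Python as follows. This is only an illustration of the formulae for b and a applied to the four (grade, percentage) pairs from the table; the function name fit_line is a hypothetical helper, not something from the textbook. Because the sketch keeps the unrounded intercept (about -22.37), its prediction for x = 7 is about 41.2, which again rounds to roughly 41%.

```python
# Data for problem 5.18: grade (x) and percentage likely to purchase (y)
grades = [6, 8, 10, 12]
percent = [32.7, 46.1, 75.0, 83.6]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x,
    using b = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n] and a = mean(y) - b*mean(x)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(v * v for v in xs)
    sum_xy = sum(u * v for u, v in zip(xs, ys))
    b = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = fit_line(grades, percent)
pred_7 = a + b * 7  # predicted percentage for 7th grade

print("slope b:", round(b, 2))
print("intercept a:", round(a, 2))
print("prediction at x = 7:", round(pred_7, 2))
```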
5.19
a. dependent variable = y = # fruit/vegetable servings/day
predictor variable = x = # TV viewing hours/day
b. The line will have a negative slope, since the # servings decreases as the # TV
viewing hours increases.
5.31
[Figure: scatterplot of y vs x]
a. Comments on the scatterplot:
i. The points on the scatterplot show a linear trend.
ii. The linear trend seems to have a positive slope; that is, the value of y
generally increases with the value of x.
iii. The points are fairly scattered, indicating that the relationship may not be
very strong.
b. Given that
Σx = 1368.1, Σx² = 117123.85, Σy = 80.9, Σy² = 412.81, Σxy = 6933.48
we plug these sums into the formulae to determine the values of b and a. The formulae are
b = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]  and  a = mean(y) - b*mean(x)
With n = 16:
b = [6933.48 - (1368.1 * 80.9)/16] / [117123.85 - (1368.1)²/16] = .1123
a = 80.9/16 - .1123 * (1368.1/16) = -4.54
c. (This is the interpretation of the slope.) When the chest circumference (x) increases by 1 cm, then
on average the corresponding vital capacity increases by .1123 units.
When the chest circumference (x) increases by 10 cm, then on average the
corresponding vital capacity increases by 10*.1123 = 1.123 units.
d. (This is prediction.) When x = 85, the corresponding predicted y is

y = -4.54 + .1123*85 = 5.0055
That is, when the chest circumference is 85 cm, the predicted average vital capacity is 5.0055
units.
e. No, the vital capacity cannot be determined completely by the chest
circumference, because factors other than chest circumference affect
the vital capacity. For example, note that in the dataset, chest
circumference = 88.1 cm corresponds to two different values of vital capacity, namely
5.7 and 5.2. This indicates that other factors also play a role in determining the vital
capacity.
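Since problem 5.31 supplies only the summary sums, the slope and intercept can be computed directly from them. The Python sketch below does this with n = 16; the variable names and the predict helper are assumptions made for illustration. It keeps unrounded values, so the prediction at x = 85 is very close to 5.0 rather than exactly 5.0055, a difference that comes only from rounding b and a in the hand computation.

```python
# Summary sums given in problem 5.31 (n = 16 observations)
n = 16
sum_x, sum_xx = 1368.1, 117123.85
sum_y, sum_xy = 80.9, 6933.48

# Least-squares slope and intercept from the sums
b = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
a = sum_y / n - b * (sum_x / n)


def predict(chest_cm):
    """Predicted vital capacity for a given chest circumference (cm)."""
    return a + b * chest_cm


print("slope b:", round(b, 4))
print("intercept a:", round(a, 2))
print("change in y for a 10 cm increase:", round(10 * b, 3))
print("prediction at x = 85:", round(predict(85), 2))
```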
5.37
[Figure: scatterplot of y vs x]

Given: the line fitted to the data has equation y = 64.50 - .45x
[Figure: fitted line plot of y vs x, y = 64.50 - 0.4500 x; S = 7.08520, R-Sq = 69.3%, R-Sq(adj) = 63.2%]
b. The red dots show the data points given to us (our sample data). The blue line
represents the fitted line y = 64.50 - .45x.
R-sq = coefficient of determination = square of r = 69.3%
The magnitude of r is the square root of R-sq: sqrt(.693) = .832. Since the slope of the fitted line is negative, r = -.832.
The y-value corresponding to x = 40 looks like an outlier. It is far away from the
fitted line, which will result in a large residual value.

c. residual = observed value of y - fitted value of y = y - ŷ
This is the error in our prediction.
The following table shows the computation of residuals, where ŷ = 64.50 - .45x.

  x      y      ŷ       residual = y - ŷ (amount of error in prediction)
  40     58     46.5     11.5
  50     34     42.0     -8.0
  60     32     37.5     -5.5
  70     30     33.0     -3.0
  80     28     28.5     -0.5
  90     27     24.0      3.0
 100     22     19.5      2.5
[Figure: scatterplot of residuals vs x]
i. This plot shows that the residual corresponding to x = 40 is very large compared to
the others.
ii. This plot also shows that, in general, the size of the residuals (the amount of error in prediction)
decreases, that is, approaches zero, as the value of x increases.
If we delete the observation at x = 40 from our computation and
recalculate the line, we get the following graph and line.
[Figure: fitted line plot of Ynew vs Xnew, Ynew = 45.33 - 0.2200 Xnew; S = 1.01653, R-Sq = 95.3%, R-Sq(adj) = 94.2%]
1. Note that most of the data values are now quite close
to the fitted line.
2. Note that the coefficient of determination has increased to
95.3%, which increases the reliability of our
prediction.
3. The magnitude of r is sqrt(.953) = .976; since the slope is negative, r = -.976.
4. In a nutshell, removing an extreme point from the given
data set gives us an improved and more reliable
prediction line.
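
To see the effect of dropping the x = 40 observation, the sketch below refits the line with and without that point and computes R-sq as 1 - SSE/SST. It is written in plain Python under the assumption that this definition of R-sq matches the one reported in the fitted line plots; fit_line and r_squared are hypothetical helper names. The sign of r is taken from the sign of the slope.

```python
import math

# Data for problem 5.37
x = [40, 50, 60, 70, 80, 90, 100]
y = [58, 34, 32, 30, 28, 27, 22]


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((u - x_bar) * (v - y_bar) for u, v in zip(xs, ys))
    sxx = sum((u - x_bar) ** 2 for u in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(xs, ys):
    """Coefficient of determination, 1 - SSE/SST."""
    intercept, slope = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((v - (intercept + slope * u)) ** 2 for u, v in zip(xs, ys))
    sst = sum((v - y_bar) ** 2 for v in ys)
    return 1 - sse / sst


# Drop the suspected outlier at x = 40
x_new = [u for u in x if u != 40]
y_new = [v for u, v in zip(x, y) if u != 40]

a_full, b_full = fit_line(x, y)
a_new, b_new = fit_line(x_new, y_new)

# r has the same sign as the slope and magnitude sqrt(R-sq)
r_full = math.copysign(math.sqrt(r_squared(x, y)), b_full)
r_new = math.copysign(math.sqrt(r_squared(x_new, y_new)), b_new)

print("all points:   a, b, R-sq, r =", round(a_full, 2), round(b_full, 2),
      round(r_squared(x, y), 3), round(r_full, 3))
print("without x=40: a, b, R-sq, r =", round(a_new, 2), round(b_new, 2),
      round(r_squared(x_new, y_new), 3), round(r_new, 3))
```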