Form A - NCWC: Faculty Pages

STATISTICS TEST 4A (SOLUTIONS)
Caveat lector: Answer each problem on your own paper. Write nothing on this page. If you
have any questions, ask only the professor. The use of calculators is permitted, but the use of cell
phones is prohibited. Copying your neighbor’s answers is also prohibited, and is a bad idea anyway
because I’ve put different numbers on different tests.
1. (a) Define correlation. Correlation is a statistical method of measuring the relationship between
two variables.
1. (b) Why is the normal distribution important in statistical analysis? It appears in many applications of statistics.
2. A pediatrician wants to determine the relation that exists between a child’s height, x, and head
circumference, y. She randomly selects 11 children from her practice, measures their height and
head circumference, and obtains the following data:
Height, x (inches):             27.75  24.5  25.5  26    25    27.75  26.5  27    26.75  26.75  27.5
Head circumference, y (inches): 17.5   17.1  17.1  17.3  16.9  17.6   17.3  17.5  17.3   17.5   17.5
(a) Draw a scatter plot of the data. A good scatter plot should have its x-axis labeled “Height”
and its y-axis labeled “Head Circumference”. The numbers should be to scale, not arranged
haphazardly or even out-of-order on the axes. If you have questions about what constitutes a
proper scatter plot, see me.
(b) Compute $\sum x^2$ and $(\sum x)^2$.
$\sum x^2 = 7{,}710.25$; $(\sum x)^2 = 84{,}681$.
(c) Compute $\sum xy$ and $(\sum x)(\sum y)$.
$\sum xy = 5{,}044.425$; $(\sum x)(\sum y) = 55{,}464.6$
(d) Compute the linear regression. (See footnote 1.)
y = 0.183x + 12.493
(e) Compute the correlation coefficient. (See footnote 2.)
0.911
(f) Use your answer for (d) to estimate a child’s head circumference if his height is 27.75 inches.
17.56 inches
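Footnote 1. The formula for linear regression is
$$y = mx + b$$
where
$$m = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
$$b = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}.$$
Footnote 2. The formula for the correlation coefficient is
$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}.$$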
(g) Do you think your answer in (f) is very reliable? Justify your answer, using your answer for
(e).
The answer should be very reliable, because it is an interpolation, and also because the correlation coefficient is close to 1.
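The arithmetic in problem 2 can be checked with a short script. This is only a sketch: it applies the summation formulas for m, b, and r to the data from the problem, and the names `fit_line`, `heights`, and `head_circs` are illustrative choices, not part of the test.

```python
from math import sqrt

# Data from problem 2 (inches).
heights = [27.75, 24.5, 25.5, 26, 25, 27.75, 26.5, 27, 26.75, 26.75, 27.5]
head_circs = [17.5, 17.1, 17.1, 17.3, 16.9, 17.6, 17.3, 17.5, 17.3, 17.5, 17.5]


def fit_line(xs, ys):
    """Return (m, b, r) computed from the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y
    denom_x = n * sum_x2 - sum_x ** 2
    denom_y = n * sum_y2 - sum_y ** 2

    m = numerator / denom_x                         # slope
    b = (sum_y * sum_x2 - sum_x * sum_xy) / denom_x  # intercept
    r = numerator / sqrt(denom_x * denom_y)          # correlation coefficient
    return m, b, r


m, b, r = fit_line(heights, head_circs)
predicted = m * 27.75 + b  # part (f): estimate at a height of 27.75 inches

print(f"y = {m:.3f}x + {b:.3f}")
print(f"r = {r:.3f}")
print(f"predicted head circumference: {predicted:.2f} inches")
```

The prediction uses the unrounded slope and intercept; plugging in the rounded values 0.183 and 12.493 instead can shift the last digit.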
3. What percentage of the area of the standard normal distribution falls below the mean? 50%
4. Find the area under the standard normal distribution curve.
(a) between z = 0 and z = 2.00 0.4772
(b) to the right of z = 1.09 0.5-0.3621=0.1379 (if you don’t understand why I subtracted, draw
a picture)
5. Find probabilities for each, using the standard normal distribution.
(a) P (0 < z < 1.69) 0.4545
(c) P (1.46 < z < 2.97) 0.4985-0.4279=0.0706 (if you don’t understand why I subtracted, draw
a picture)
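The table lookups in problems 4 and 5 can be cross-checked in Python. The sketch below uses `statistics.NormalDist` from the standard library (Python 3.8 or later); the helper names `area_between` and `area_right_of` are made up for this example. Because the table rounds each area to four places before subtracting, a computed value may differ from the table-based answer in the last digit.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


def area_right_of(z):
    """Area under the standard normal curve to the right of z."""
    return 1.0 - std_normal.cdf(z)


print(f"4(a): {area_between(0, 2.00):.4f}")
print(f"4(b): {area_right_of(1.09):.4f}")
print(f"5(a): {area_between(0, 1.69):.4f}")
print(f"5(c): {area_between(1.46, 2.97):.4f}")
```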
6. The average time for a courier to travel from Pittsburgh to Harrisburg is 200 minutes, and the
standard deviation is 10 minutes. If one of these trips is selected at random, find the probability
that the courier will travel at least 180 minutes. Assume the variable is normally distributed.
Solution:
We want P (X ≥ 180).
The corresponding z-score is
$$z = \frac{X - \mu}{\sigma} = \frac{180 - 200}{10} = -2.00.$$
The area between this z-score and the mean is 0.4772.
We need to add 0.5 to this (draw a picture!).
So the probability that the courier will travel at least 180 minutes is 97.72%.
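As a check on problem 6, the z-score and the probability can be computed directly from the stated mean and standard deviation. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 200, 10      # mean and standard deviation, in minutes
threshold = 180          # "at least 180 minutes"

z = (threshold - mu) / sigma           # z-score of the threshold
travel_time = NormalDist(mu, sigma)    # assumed normal, as the problem states
p_at_least = 1 - travel_time.cdf(threshold)

print(f"z = {z:.2f}")
print(f"P(X >= {threshold}) = {p_at_least:.4f}")
```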
7. A government agency decides to provide educational scholarships to households in the bottom
10% bracket of incomes. If the average household income in 2001 was $42,228, the standard
deviation is $21,000, and the variable is normally distributed, what is the maximum income
the government should target?
Solution:
We want to find the income level I such that 10% of all incomes are less than or equal to I.
Since we know the percentage of incomes is 10%, the area under the standard normal distribution curve is 0.1000. This is a tail (draw a picture!). The area between the mean and the tail is
0.5-0.1=0.4000.
This corresponds to a z-score of -1.28 (look up 0.4000 in the table). The value must be negative,
because this is the bottom 10% of all incomes.
We convert from z to X using the equation
$$z = \frac{X - \mu}{\sigma}$$
$$-1.28 = \frac{X - 42{,}228}{21{,}000}$$
$$-26{,}880 = X - 42{,}228$$
$$15{,}348 = X.$$
So the income level marking the bottom 10% is $15,348.
(Note: These numbers are unrealistic. To begin with, I invented the numbers. However, this
problem illustrates another error as well: income levels are not normally distributed. Assuming
that they are would ordinarily be a mistake, but I told you to do that for this problem.)
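The table lookup in problem 7 runs backwards, from an area to a z-score, and `NormalDist.inv_cdf` does the same thing without rounding. The sketch below compares the two; the variable names are illustrative. The unrounded z-score is about -1.2816 rather than -1.28, so the computed cutoff comes out a few dollars below the table-based $15,348.

```python
from statistics import NormalDist

mu, sigma = 42228, 21000   # mean and standard deviation from the problem
bottom_fraction = 0.10     # bottom 10% of incomes

# z-score with 10% of the area to its left (unrounded).
z_cut = NormalDist().inv_cdf(bottom_fraction)

cutoff_exact = mu + z_cut * sigma      # using the unrounded z-score
cutoff_table = mu + (-1.28) * sigma    # using the table value z = -1.28

print(f"z = {z_cut:.4f}")
print(f"cutoff with unrounded z: ${cutoff_exact:,.0f}")
print(f"cutoff with z = -1.28:   ${cutoff_table:,.0f}")
```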
8. A survey found that the average American family generates 17.2 pounds of glass
garbage each year. Assume the standard deviation of the distribution is 2.5 pounds. Find the
probability that the mean of a sample of 100 families will be between 17 and 18 pounds.
Solution:
We want $P(17 < \bar{X} < 18)$.
Using the Central Limit Theorem, the corresponding z-scores are
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{17 - 17.2}{2.5/\sqrt{100}} = -0.8$$
and
$$z = \frac{18 - 17.2}{2.5/\sqrt{100}} = 3.2.$$
The areas between these z-scores and the mean are 0.2881 and 0.4999 (read the table!), respectively.
We have to add these areas (draw a picture!). This gives us a total area of 0.7880.
Hence the probability that the mean of a sample of 100 families is between 17 and 18 pounds is
78.80%.
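The Central Limit Theorem step in problem 8 can be sketched the same way: divide the standard deviation by the square root of the sample size, then work with z-scores as usual. Variable names here are illustrative. The unrounded area for z = 3.2 is about 0.4993, slightly less than the 0.4999 read from the table, so the computed probability comes out near 0.7875 rather than 0.7880.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 17.2, 2.5, 100   # population mean, standard deviation, sample size

std_error = sigma / sqrt(n)     # standard deviation of the sample mean
z_low = (17 - mu) / std_error
z_high = (18 - mu) / std_error

std_normal = NormalDist()
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"standard error = {std_error}")
print(f"z-scores: {z_low:.1f} and {z_high:.1f}")
print(f"P(17 < sample mean < 18) = {prob:.4f}")
```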