STAT 366 Spring 10 HW 2
DUE 2/1/10 10am
1) Chapter 4 of Freedman et al., Review Exercises, page 74, #4 (Think about “skewness”, then think about the ranges and
averages for each measure, income and years of schooling; drawing a graph/picture might help)
Ans: In the case of years of education, the majority of the data will be right around 12 years (a full high school education),
give or take some. So let’s say 12 years is the median. There will be a bit less in the range of 12 to 16, indicating
undergraduate education, and even fewer in the range of 16 to 20, representing graduate school education. There will be a
few out beyond 20 (your instructor is currently completing his 12th year of college/university education, meaning a score of
24 years of education total), but hardly enough to register in a histogram. That will about cover it on the right side of the
median (~12 years), so the effective upward range is approximately 8-9 years above the median. Unfortunately, there is a
surprisingly large amount of data on the far left, more than 8-9 years below the median. The graph on page 39 sheds some
light on this. The result is a rather left skewed distribution of data. With a left skewed histogram, we know that the mean <
median.
In the case of income, it is the exact opposite. Let’s say the median is around $50k nationally (I really don’t know, but it’s a
reasonable guess). There are probably people who make next to nothing, but that is probably not a large proportion (even
in today’s economy), and so the distribution of income to the left of the median tapers off relatively quickly and has an
effective range of at most $50k below the median. On the right, it is a very different picture. There are lots of people who make
a lot more than $50k, and the distribution stretches out effectively to a range much greater than $50k above the median, even if I don’t
include the Bill Gateses, Warren Buffetts, etc. So what results is a very right skewed distribution, much like the NBA and NFL
salaries. Accordingly, the mean > median.
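As a rough numerical illustration of the two skewness claims, here is a minimal Python sketch. Both data lists are hypothetical values made up to mimic the shapes described above (a long left tail for schooling, a long right tail for income); they are not taken from Freedman et al.

```python
from statistics import mean, median

# Hypothetical values, invented only to mimic the shapes described above.
years_of_schooling = [4, 8, 10, 12, 12, 12, 12, 13, 14, 16]  # long left tail
income_thousands = [15, 30, 40, 45, 50, 55, 70, 100, 250]    # long right tail

schooling_mean = mean(years_of_schooling)
schooling_median = median(years_of_schooling)
income_mean = mean(income_thousands)
income_median = median(income_thousands)

# Left skew: the few very low values drag the mean below the median.
print("schooling: mean", schooling_mean, "median", schooling_median)
# Right skew: the few very high values drag the mean above the median.
print("income:    mean", round(income_mean, 1), "median", income_median)
```

In both lists the median stays at the "typical" value while the mean is pulled toward the long tail.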
2) Chapter 4 of Freedman et al., Review Exercises, page 75, #6. Briefly explain answers for (a)-(d).
For (c), recall that SD is a measure of the average distance from the mean
For (d) you can assume that (i) and (iii) are mirror reflections, that is, rotating the sketch of (i) about the value of 50
results in the sketch of (iii). This could be done mathematically by taking all the data values in (i), first multiplying
them by -1, and then adding 100 to the new value. How do these “operations” affect the SD?
Ans: (a) Clearly (ii) has the average of 50. For (i), since the range is the same as in (ii) but the data is more heavily distributed
to the right of 50, one expects the mean of (i) also to be to the right of the mean of (ii), so it would be 60. Similarly, we would
expect the mean of (iii) to be to the left of that of (ii), so it would be 40.
(b) (i) is left skewed -> median > average; (ii) is symmetric -> median ~ average; (iii) is right skewed -> median < average
(c) If the SD of (iii) were 5, then the histogram would be very much “mounded” up around the mean of 40, which it isn’t. If
the SD were 50, then there would have to be a large amount of data more than 50 away from 40, i.e. greater than 90 or less than
-10, since this would be needed to balance out the considerable amount of data less than 50 away. But since this isn’t the case, we
must conclude that the SD is around 15. Looking at the distribution curve, this seems about right.
(d) False: Since their “shapes” are about the same, just “flipped” about the value of 50, we can conclude that their
SD’s are about the same. Thinking of it another way, pick any point to the left of the mean (60) in (i), look at its distance to the
mean, call this distance L, and the height of the curve above this point, call this height H. Now looking at (iii), you will find, at a
distance L to the right of the mean (40), a point where the height of the curve above it is very close if not identical to H. You will
find the same thing if you pick points on the other side of the respective means. So the proportion of points a certain distance
away from the mean, i.e. |x-xbar|, in (i) is the same as the proportion of points the same distance away from the mean in (iii).
This means that the average distance of all the points in the distribution to the mean is the same for (i) and (iii). That is, their
SD’s are about the same.
Thinking of it in yet a third way (if the first two didn’t make sense), if {xi} is the data set for (i), with SD equal to SDi, then, as described
above, the data set for (iii) is {xi*(-1) + 100}, which results in SDiii = |-1|*SDi = SDi, or about the same as in (i).
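The third argument can be checked numerically. The sketch below uses a hypothetical left-skewed data set with mean 60 (not the actual data behind sketch (i)), applies the operations "multiply by -1, then add 100", and compares the two SDs.

```python
import math
from statistics import mean, pstdev

# Hypothetical stand-in for data set (i): left skewed, mean 60.
data_i = [20, 45, 55, 60, 62, 65, 68, 70, 75, 80]

# Mirror about 50: multiply each value by -1, then add 100.
data_iii = [-1 * x + 100 for x in data_i]

sd_i = pstdev(data_i)      # population SD of the original values
sd_iii = pstdev(data_iii)  # population SD of the mirrored values

print("means:", mean(data_i), mean(data_iii))
print("SDs:  ", sd_i, sd_iii)
print("SDs equal:", math.isclose(sd_i, sd_iii))
```

Adding 100 shifts every value and the mean by the same amount, so the distances to the mean are unchanged; multiplying by -1 only flips their signs, which squaring removes.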
3) A statistics teacher has two introductory statistics classes. On the first exam, the 20 students in the first class
averaged 92, while the 25 students in the second class averaged 83. If the teacher combines the classes, what will the
overall average be?
Ans: As I indicated in the hints:
average of the two classes combined = (sum of all scores for 1st and 2nd classes)/(total # of students)
= [(sum of scores for 1st class) + (sum of scores for 2nd class)]/(# students in 1st class + # students in 2nd class)
= [(# students in 1st class)*(1st class average) + (# students in 2nd class)*(2nd class average)]/(# students in 1st class + # students in 2nd class)
= (20*92 + 25*83)/(20 + 25) = 3915/45 = 87
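The same weighted-average arithmetic as a short Python sketch (the variable names are just illustrative):

```python
class_sizes = [20, 25]      # students in the 1st and 2nd class
class_averages = [92, 83]   # exam average of each class

# Recover each class's total points from size * average, then pool them.
total_points = sum(n * avg for n, avg in zip(class_sizes, class_averages))
combined_average = total_points / sum(class_sizes)

print(total_points, combined_average)
```

Note that simply averaging 92 and 83 would give 87.5; the combined average is lower because the larger class has the lower average.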
4) Suppose the average height of males in a particular city is 72 inches with an SD=1.5. Assuming that the heights of
males in this city are normally distributed: (show your work!)
a) What % of males fall within the height range of 69 inches and 73.5 inches ?
b) Only 10% of the males in the city have heights greater than what value?
c) What is the 90th percentile of the heights of men in this city?
d) What is the chance that a randomly selected male from this city will be shorter than 71 inches?
Ans: a) The shaded area below is that for which we are looking.
[Figure: Distribution Plot, Normal, Mean=72, StDev=1.5; density against X, with 69, 72 and 73.5 marked on the X axis and the region between 69 and 73.5 shaded]
This is found by finding the area to the left of x=73.5 and subtracting off the area to the left of x=69. These areas can be found using
Zscores.
Let x1=69 and x2=73.5.
Then Zscorex1 = (69-72)/1.5 = -2 -> area under the normal curve to the left of (Z=-2)
= [area to the right of (Z=2)] by symmetry
= [1-(area to the left of (Z=2))] = 1-.977 = .023.
Zscorex2 = (73.5-72)/1.5 = 1 -> area under the normal curve to the left of (Z=1) = .841
So the shaded area, between x=69 and x=73.5, is equivalent to the area between Z=-2 and Z=1, which is
= .841-.023 = .818 or 81.8%
b) and c) If x is the data value for which only 10% of the data is greater (lies to the right), then 90% of the data must be less than
this value (to the left). That is, the answer to c) is also the answer to b). We shall proceed by finding the 90th percentile.
The 90th percentile of the heights is that height for which 90% of the heights are below, i.e. .9 of the area under the curve lies to
the left. Starting with the Z-table, and looking under the column “Area<z”, we find .9, which corresponds to a Zscore of Z=1.28.
Since for a data value of x, (x-xbar)/SD = Zscore, we have
(x-72)/1.5 = 1.28 -> x-72 = 1.5*1.28 -> x = 72 + 1.5*1.28 = 72 + 1.92 = 73.92 inches.
***Note-> This means for any normal curve, the 90th percentile is located at a data value which is 1.28 SD’s above the average
(or mean).***
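The table lookup for the 90th percentile can be cross-checked in software. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); it is only a check on the table-based answer, not part of the required work.

```python
from statistics import NormalDist

heights = NormalDist(mu=72, sigma=1.5)  # heights of males in the city

# Zscore with 90% of the standard normal area to its left.
z_90 = NormalDist().inv_cdf(0.90)

# The same percentile expressed in inches: 72 + 1.5 * z_90.
height_90 = heights.inv_cdf(0.90)

print(round(z_90, 2), round(height_90, 2))
```

The software value of Z carries more decimal places than the table's 1.28, so the height can differ from 73.92 in the third decimal place.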
d) Let x=71. Then Zscorex = (71-72)/1.5 = -1/1.5 = -.667. So the chance of finding someone shorter than 71 inches in height is
= the area under the standard normal (Z) curve to the left of Z= -.667
= the area under the Z curve to the right of Z= .667 (by symmetry)
= 1 - (area under the Z curve to the left of Z= .667),
which is something we can now read off of the Z-table:
= 1-(.745 or .749, depending on whether you round Z down to .66 or up to .67) = .255 or .251
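The areas found from the Z-table in parts a) and d) can be cross-checked the same way. A minimal sketch, again assuming Python's standard `statistics.NormalDist`, which works directly in inches so no conversion to Zscores is needed:

```python
from statistics import NormalDist

heights = NormalDist(mu=72, sigma=1.5)  # heights of males in the city

# Part a): area to the left of 73.5 minus area to the left of 69.
share_between = heights.cdf(73.5) - heights.cdf(69)

# Part d): area to the left of 71.
chance_shorter = heights.cdf(71)

print(round(share_between, 3), round(chance_shorter, 3))
```

The part d) value falls between the two table answers, as expected, since the table forces Z= .667 to be rounded to .66 or .67.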
12) Suppose now that you have normal distributions C and D, with means μC = μD = 0 and SDC < SDD. If you pick a number
xC at random from distribution C and a number xD from distribution D, which of the following is true? Briefly
explain your answer. (Hint: draw a picture)
a)[Chance of (xC<1)] < [ Chance of (xD < 1)]
b)[ Chance of (xC<1)]> [ Chance of(xD < 1)]
c)[Chance of (xC<1)] = [Chance of (xD < 1)]
d)none of the above
Ans: b): Generally speaking, look at the Zscore for 1 on both normal curves C and D.
ZscoreD = (1-0)/SDD = 1/SDD < 1/SDC = (1-0)/SDC = ZscoreC, since SDD > SDC. Looking at the Z-table, it is clearly the case that
the higher Zscore (i.e. ZscoreC) will have a larger percentage of data to the left, which is exactly the statement of b).
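To see the comparison with concrete numbers, the sketch below picks two hypothetical SDs, SDC = 1 and SDD = 2 (the problem gives no specific values), and computes both chances with Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

# Hypothetical SDs chosen only so that SDC < SDD; both means are 0.
sd_c, sd_d = 1.0, 2.0

chance_c = NormalDist(mu=0, sigma=sd_c).cdf(1)  # Chance of (xC < 1), Zscore = 1/1
chance_d = NormalDist(mu=0, sigma=sd_d).cdf(1)  # Chance of (xD < 1), Zscore = 1/2

print(round(chance_c, 3), round(chance_d, 3))
print("b) holds:", chance_c > chance_d)
```

Any other pair with SDC < SDD gives the same ordering, because the value 1 sits more SDs above the mean on the narrower curve.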