STAT 366 Spring 10 HW 2 DUE 2/1/10 10am

1) Chapter 4 of Freedman et al., Review Exercises, page 74, #4 (Think about “skewness”, then think about the ranges and averages for each measure of income and years of schooling; drawing a graph/picture might help.)
Ans: In the case of years of education, the majority of the data will be right around 12 years (a full high school education), give or take some, so let’s say 12 years is the median. There will be somewhat fewer in the range of 12 to 16 years, indicating undergraduate education, and even fewer in the range of 16 to 20, representing graduate school education. There will be a few out beyond 20 (your instructor is currently completing his 12th year of college/university education, meaning a total of 24 years of education), but hardly enough to register in a histogram. That about covers the right side of the median (~12 years), so the effective upward range is approximately 8-9 years above the median. However, there is a surprisingly large amount of data on the far left, more than 8-9 years below the median; the graph on page 39 sheds some light on this. The result is a rather left-skewed distribution. With a left-skewed histogram, we know that the mean < median.
In the case of income, it is the exact opposite. Let’s say the median is around $50K nationally (I really don’t know, but it’s a reasonable guess). There are probably people who make next to nothing, but they are probably not a large proportion (even in today’s economy), so the distribution of income to the left of the median tapers off relatively quickly and has an effective range of $50K below the median. On the right, it is a very different picture. There are lots of people who make a lot more than $50K, and the data stretch out effectively to a range much greater than $50K above the median, even if I don’t include the Bill Gateses, Warren Buffetts, etc. So what results is a very right-skewed distribution, much like the NBA and NFL salaries.
Accordingly, the mean > median.
2) Chapter 4 of Freedman et al., Review Exercises, page 75, #6. Briefly explain your answers for (a)-(d). For (c), recall that the SD is a measure of the average distance from the mean. For (d), you can assume that (i) and (iii) are mirror reflections, that is, rotating the sketch of (i) about the value of 50 results in the sketch of (iii). This could be done mathematically by taking all the data values in (i), first multiplying them by -1, and then adding 100 to the new value. How do these “operations” affect the SD?
Ans: (a) Clearly (ii) has an average of 50. For (i), since the range is the same as in (ii) but the data are more heavily distributed to the right of 50, one expects the mean of (i) also to be to the right of the mean of (ii), so it would be 60. Similarly, we would expect the mean of (iii) to be to the left of that of (ii), so it would be 40.
(b) (i) is left-skewed -> median > average; (ii) is symmetric -> median ~ average; (iii) is right-skewed -> median < average.
(c) If the SD of (iii) were 5, then the histogram would be very much “mounded” up around the mean of 40, which it isn’t. If the SD were 50, then there would have to be a large amount of data more than 50 away from 40, i.e. greater than 90 or less than -10, to average out the considerable amount of data less than 50 away. Since this isn’t the case, we must conclude that the SD is around 15. Looking at the distribution curve, this seems about right.
(d) False: since their “shapes” are about the same, just “flipped” about the value of 50, we can conclude that their SDs are about the same. Thinking of it another way, pick any point to the left of the mean (60) in (i), look at its distance to the mean, call this distance L, and the height of the curve above this point, call this height H. Now looking at (iii), you will find, at a distance L to the right of the mean (40), a point where the height of the curve above it is very close if not identical to H.
You will find this to be true also if you pick points on the other side of the respective means. So the proportion of points a certain distance away from the mean, i.e. (x - xbar), in (i) is the same as the proportion of points the same distance away from the mean in (iii). This means that the average distance of all the points in the distribution to the mean is the same for (i) and (iii). That is, their SDs are about the same. Thinking of it in yet a third way (if the first two didn’t make sense): if {xi} is the data set for (i), with SD_(i), then, as described above, the data set for (iii) is {xi*(-1) + 100}, which results in SD_(iii) = |-1|*SD_(i) = SD_(i), the same as in (i).
3) A statistics teacher has two introductory statistics classes. On the first exam, the 20 students in the first class averaged 92, while the 25 students in the second class averaged 83. If the teacher combines the classes, what will the overall average be?
Ans: As I indicated in the hints:
average of the two classes combined = (sum of all scores for 1st and 2nd classes)/(total # of students)
= [(sum of scores for 1st class) + (sum of scores for 2nd class)]/(# students in 1st class + # students in 2nd class)
= [(# students in 1st class)*(1st class average) + (# students in 2nd class)*(2nd class average)]/(# students in 1st class + # students in 2nd class)
= (20*92 + 25*83)/(20 + 25) = (1840 + 2075)/45 = 3915/45 = 87
4) Suppose the average height of males in a particular city is 72 inches with an SD = 1.5. Assuming that the heights of males in this city are normally distributed (show your work!):
a) What % of males fall within the height range of 69 inches to 73.5 inches?
b) Only 10% of the males in the city have heights greater than what value?
c) What is the 90th percentile of the heights of men in this city?
d) What is the chance that a randomly selected male from this city will be shorter than 71 inches?
Ans: a) The shaded area below is the one we are looking for.
[Figure: Distribution Plot, Normal, Mean=72, StDev=1.5; Density vs. X, with the area between X = 69 and X = 73.5 shaded]
This is found by finding the area to the left of x = 73.5 and subtracting off the area to the left of 69. These areas can be found using Z-scores. Let x1 = 69 and x2 = 73.5. Then
Zscore_x1 = (69 - 72)/1.5 = -2 -> area under the normal curve to the left of (Z = -2)
= [area to the right of (Z = 2)] by symmetry
= [1 - (area to the left of (Z = 2))] = 1 - .977 = .023.
Zscore_x2 = (73.5 - 72)/1.5 = 1 -> area under the normal curve to the left of (Z = 1) = .841.
So the shaded area above, between x = 69 and x = 73.5, is equivalent to the area between Z = -2 and Z = 1, which is .841 - .023 = .818, or 81.8%.
b) and c) If x is the data value for which only 10% of the data is greater (lies to the right), then 90% of the data must be less than this value (to the left). That is, the answer to c) is also the answer to b). We shall proceed by finding the 90th percentile. The 90th percentile of the heights is the height below which 90% of the heights lie, i.e. .9 of the area under the curve lies to its left. Starting with the Z-table and looking under the column “Area<z”, we find .9, which corresponds to a Z-score of Z = 1.28. Since for a data value x, (x - xbar)/SD = Zscore, we have (x - 72)/1.5 = 1.28 -> x - 72 = 1.5*1.28 -> x = 72 + 1.5*1.28 = 72 + 1.92 = 73.92 inches.
***Note -> This means that for any normal curve, the 90th percentile is located at a data value which is 1.28 SDs above the average (or mean).***
d) Let x = 71. Then Zscore_x = (71 - 72)/1.5 = -1/1.5 = -.667. So the chance of finding someone shorter than 71 inches
= the area under the standard normal (Z) curve to the left of Z = -.667
= the area under the Z curve to the right of Z = .667 (by symmetry)
= 1 - (area under the Z curve to the left of Z = .667), which we can now read off of the Z-table
= 1 - (.745 or .749, depending on whether you round Z down to .66 or up to .67) = .255 or .251.
12) Suppose now that you have normal distributions C and D, with means μ_C = μ_D = 0 and SD_C < SD_D.
If you pick a number x_C at random from distribution C and a number x_D from distribution D, which of the following is true? Briefly explain your answer. (Hint: draw a picture.)
a) [Chance of (x_C < 1)] < [Chance of (x_D < 1)]
b) [Chance of (x_C < 1)] > [Chance of (x_D < 1)]
c) [Chance of (x_C < 1)] = [Chance of (x_D < 1)]
d) none of the above
Ans: b). Generally speaking, look at the Z-score for 1 on both normal curves C and D:
Zscore_D = (1 - 0)/SD_D = 1/SD_D < 1/SD_C = (1 - 0)/SD_C = Zscore_C, since SD_D > SD_C.
Looking at the Z-table, it is clearly the case that the higher Z-score (i.e. Zscore_C) will have a larger percentage of data to its left, which is exactly the statement of b).
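As a hedged numerical check on the reasoning in problems 2(d), 3, 4 and 12, the following minimal Python sketch (Python 3.8+) uses the standard library's statistics.NormalDist in place of the printed Z-table. The sample data for 2(d), the helper name combined_average, and the SDs 1 and 2 for distributions C and D are illustrative assumptions, not values from the assignment; the script only confirms the hand calculations under those assumptions.

```python
import math
from statistics import NormalDist, pstdev

# Problem 2(d): reflecting data about 50 (x -> -x + 100) leaves the SD unchanged.
# These sample values are hypothetical; they are not read off the textbook sketches.
sample = [35, 50, 55, 60, 65, 70, 75]
reflected = [-x + 100 for x in sample]
sd_unchanged = math.isclose(pstdev(sample), pstdev(reflected))

# Problem 3: the combined average weights each class average by its class size.
# combined_average is a hypothetical helper name used only for this check.
def combined_average(sizes, averages):
    total_points = sum(n * avg for n, avg in zip(sizes, averages))
    return total_points / sum(sizes)

overall = combined_average([20, 25], [92, 83])

# Problem 4: heights modeled as normal with mean 72 and SD 1.5 (given in the problem).
heights = NormalDist(mu=72, sigma=1.5)
p_between = heights.cdf(73.5) - heights.cdf(69)  # (a) area between 69 and 73.5
p90 = heights.inv_cdf(0.90)                      # (b)/(c) 90th percentile
p_shorter = heights.cdf(71)                      # (d) chance of being under 71 inches

# Problem 12: both means are 0; SDs of 1 and 2 are hypothetical, chosen so SD_C < SD_D.
dist_c = NormalDist(mu=0, sigma=1)
dist_d = NormalDist(mu=0, sigma=2)
c_more_likely = dist_c.cdf(1) > dist_d.cdf(1)

if __name__ == "__main__":
    print("2(d) SD unchanged by reflection:", sd_unchanged)  # True
    print("3) combined average:", overall)                   # 87.0
    print("4a)", round(p_between, 3))                        # 0.819
    print("4b/c)", round(p90, 2))                            # 73.92
    print("4d)", round(p_shorter, 4))
    print("12) answer b holds:", c_more_likely)              # True
```

The small differences from the table-based answers (about .819 rather than .818 in 4a, and about .2525 between the two table values in 4d) should come only from rounding in the printed Z-table.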
