the six-page handout of sample exam problems

STAT 11 April 17, 2008 A few sample problems… 1. Confidence interval for a mean. Mars Rover Two measures 64 daily high temperatures, and finds that the average of the 64 measurements is –42.5 degrees C. Assume that the standard deviation of daily temperatures is exactly 16 degrees C, based on data from Mars Rover One. a. In this problem, what are n, , x , , and s ? b. What is the standard error of the mean temperature? c. Give a 95% confidence interval for . d. Give an 82% confidence interval for . 2. Confidence interval for a proportion. You randomly selected 100 of your latest homemade stink bombs, and very carefully tested them. Unfortunately(?) 80 of them failed. Let p be the “true” failure rate (for all stink bombs, not just the ones in the sample). a. What is p̂ ? b. What is the SE ? c. What is a 95% CI for the true failure rate? d. If you felt like using Wilson’s method for this problem, what would change? 3. Confidence interval for a difference of means. You found that 30 randomly selected Sunoco stations had an average gas price of $ 3.10, with s = $ 0.05. Also, 40 randomly selected Lukoil stations had an average price of $ 3.12, with s = $ 0.08. Let D = (average of all Lukoil prices) minus (average of all Sunoco prices). What is a 95% confidence interval for D ? What can you conclude about the relative prices at Lukoil and Sunoco stations generally ? 1 4. Interpreting a scatterplot. a. Look at this scatterplot, and estimate… The mean of the “x” variable; The standard deviation of the “x” variable; The mean of the “y” variable; The standard deviation of the “y” variable; The correlation coefficient (r) for the two variables. b. Describe the relationship in words. 8 7 6 5 4 3 2 1 0 -5 0 5 10 5. More on standard errors. Science News reported that the average length of a junk-DNA sequence in Speciesus Inventedus is 128 bases, based on a sample of 100 measurements. a. What is the SE for , the true average length? b. Actually, Science News gave the SE: They said it’s 8. What is the standard deviation of the original sample? 2 6. One-way chi-square problem. A poker-dealing machine is supposed to deal cards at random, as if from an infinite deck. In a test, you counted 1600 cards, and observed the following: Spades Hearts Diamonds Clubs 404 420 400 376 Could it be that the suits are equally likely? Or are these discrepancies too much to be random? 7. Another one-way chi-square problem. Same as before, but this time jokers are included, and you counted 1662 cards, with these results: Spades Hearts Diamonds Clubs Jokers 404 420 400 356 82 a. If a deck contains 54 cards and two of them are jokers, what is the probability that any particular randomly-chosen card would be a joker? b. How many jokers would you expect out of 1662 random cards? How many of each suit? c. Is it possible that the cards are really random? Or are the discrepancies too large? ----------------------------------------------------SOLUTIONS 1a. n = 64  = the true (long-run) average daily temperature at the Mars Rover Two site (we don’t know the value) x = –42.5 degrees C  = 16 degrees C (by assumption) s = the sample standard deviation – this could be computed from the sample, but it isn’t given in the problem 3 1b. SE =  n  16 64  2.00 exactly 1c. Use z* = 1.96, since you are NOT using s as a substitute for . MOE = 1.96 × 2.00 = call it 3.9 degrees C, so the CI is [ -42.5 – 3.9, -42.5 + 3.9 ] or [ -46.4, -38.6 ] 1d. The problem here is to compute z* when C = 0.82. You want to leave a probability of /2 = 0.09 in each tail, so you could either… Find 0.09 under PHI(z) in the z table, and see that it corresponds to z = - 1.34, which is –z*, or Find 0.91 under PHI(z) in the z table, and see that it corresponds to z = + 1.34, which is +z*. Using z* = 1.34, we get MOE = 1.34 × 2.00 = call it 2.7 degrees, so the CI is [ -45.2, -39.8 ] . 2a. p̂ = 0.80 ( = 80 divided by 100 ) 2b. SE = 2c. Use z* = 1.96 (always use z* for a proportion) . The MOE is 1.96 × 0.04 = 0.08, so the CI is [ 0.72, 0.88 ]. pˆ (1  pˆ ) n (0.80)(0.20)  100  0.04 exactly. (Be careful here, if you use percentages. It would be ok to write [ 72%, 88% ], but if you write p̂ in percentage points, then be sure to do the same thing with the MOE. In particular, don’t write (WRONG) [ 79.92, 80.08 ]. ) 2d. The center of the CI would still be p̂ = 0.80 (exactly), but you would also compute p 80  2  0.788462, 100  4 and you would use it to compute SE: SE = p(1  p) n  (0.788462)(0.211538) 100  0.04084 . The resulting confidence interval would be very slightly larger than the one in 2c. 4 3. There’s no reason to pool the “s” values here, and that isn’t supposed to be part of the course material anyway. So compute the two SE’s separately: Sunoco: n = 30, x = 3.10, s = 0.05, SE = s/sqrt(n) = $ 0.00913 Lukoil: n = 40, x = 3.12, s = 0.08, SE = s/sqrt(n) = $ 0.01265 Combined SE: SE  SE12  SE22 = $ 0.01560 MOE: Use z* = 1.96 again, so MOE = (1.96) (0.01560) = $ 0.0306…call it 3 cents. The observed difference is D̂ = 0.02 (that’s 3.12 – 3.10) so the confidence interval for the difference is [ –$0.01, +$0.05 ]. You can’t tell from these reports – at least you can’t tell with 95% confidence – whether Sunoco or Lukoil has generally higher prices. 4a. The true values are as follows: mean of x: sd of x: 2.086 2.933 mean of y: sd of y: 4.300 1.925 correlation: –0.832 (using n-1) 4b. In words: The relationship is STRONG, NEGATIVE, LINEAR. (Although really, how strong it is depends on context. You might ask, strong compared to what? ) (The same is true of linearity. You might detect a slight downward bending at the left edge of the picture; or, you might decide that the apparent bending is caused by just one point and is probably accidental. In fact, the data were generated using a linear equation with normally distributed errors.) 5 5a. Trick question – you can’t tell what the SE is, because you would need to know  or s. 5b. You have SE =  / sqrt(100) =  / 10. Since SE = 8, you have 8 =  / 10, so they must be using 80 for . (You still can’t tell whether they got this from the sample or estimated it in some other way.) 6. expected expected observed (percent) (counts) 404 0.25 400 420 0.25 400 400 0.25 400 376 0.25 400 z 0.200 1.000 0.000 -1.200 chi-square-> 2.480 critical value-> 7.815 Compute each z from its own row as (observed-expected)/sqrt(expected). Be sure to use the counts in this formula, not the percentages. The chi-square statistic is the sum of the squares of the z-values. The number of degrees of freedom is 3 (number of categories minus 1). The critical value is from a table you’ll have on the exam (using  = 0.05). But you don’t need it in this case. The chi-square value is about what you would expect with 3 degrees of freedom, and none of the z statistics are out of line (not even as large as 2, certainly not beyond 4). So, DO NOT REJECT the null hypothesis. There is no reason to suspect that the cards are not random. 7. expected expected observed (percent) (counts) 404 0.2407 400.1 420 0.2407 400.1 400 0.2407 400.1 356 0.2407 400.1 82 0.0370 61.6 1662 1662 z 0.194 0.994 -0.006 -2.205 2.606 chi-square-> 12.680 critical value-> 9.488 This time, the chi-square statistic (12.68) is above the =0.05 critical value, so you could reject the null hypothesis and declare that the cards are not random. The problem is clearly that there are too many jokers at the expense of clubs – you can see that from the z statistics. On the other hand, the p-value is only 0.013 (you can’t compute that during an exam) so the test isn’t totally convincing. You wouldn’t want to arrest the machine designer on this evidence. (end) 6

the six-page handout of sample exam problems

Related documents

Products

Support

the six-page handout of sample exam problems

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib