Stat 401B Exam 1
Fall 2015

I have neither given nor received unauthorized assistance on this exam.

________________________________________________________
Name Signed                                          Date

_________________________________________________________
Name Printed

ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will receive NO partial credit. Correct numerical answers to difficult questions unaccompanied by supporting reasoning may not receive full credit. SHOW YOUR WORK/EXPLAIN YOURSELF! Completely absurd answers (that fail basic sanity checks but that you don't identify as clearly incorrect) may receive negative credit.

1. An analytical lab has two sources of "pure water" (that theoretically has pH of 7.0). Suppose that (after round-off) pH of "pure water" measured by a certain meter has one of the two probability distributions below, depending upon whether Source A or Source B water is being tested.

Source A
x             6.9    7.0    7.1
f(x | A)       .2     .4     .4

Source B
x             6.9    7.0    7.1
f(x | B)       .3     .4     .3

8 pts  a) Find the mean and standard deviation of measured pH for tests using Source A water.

$EX$ = ___________    $\sqrt{\mathrm{Var}\,X}$ = __________

8 pts  b) Suppose that among 5 beakers on a lab table, 2 are filled with Source A water and 3 are filled with Source B water. These are not labeled and an analyst doesn't know which are which. This person tests one beaker with the meter and observes a measured pH of 6.9. What is the (conditional on this outcome) probability that the tested beaker is full of Source A water?

$P(A \mid X = 6.9)$ = __________

7 pts  c) Later, the analyst starts testing pH for Source A specimens. Among the first 10 tested, what are the mean and standard deviation of the number with measured pH less than 7.0?

mean __________    standard deviation __________

8 pts  2. Random variables $X$ and $Y$ are independent with a common Exp(1) marginal distribution. Define the positive random variable $U = X/Y$. Completely set up (but do not evaluate) a double integral giving P[U … 2].

3.
Data in Statistical Quality Assurance Methods for Engineers by Vardeman and Jobe giving the production (in thousands of barrels) of oil wells drilled in a certain (real) oil field suggest a normal model with mean $\mu = 3.77$ and $\sigma = .77$ for log-production (the natural logarithm of production).

5 pts  a) Notice that if log-production is normal with $\mu = 3.77$ and $\sigma = .77$, then the 25%, 50%, and 75% points of the log-production distribution are respectively $3.77 - .675(.77) = 3.25$, $3.77$, and $3.77 + .675(.77) = 4.29$. Exponentiating, these correspond to $\exp(3.25) = 25.8$, $\exp(3.77) = 43.4$, and $\exp(4.29) = 73.0$ as the 25%, 50% and 75% points of the production distribution. Is it possible that production is also normally distributed? Explain.

Production normal? Yes/No (circle one)
Explanation:

8 pts  b) Under the suggested normal model for log-production, about what fraction of wells have log-production above 4.0?

Answer: __________

8 pts  c) If one were to drill 4 additional wells and use an iid normal model for the log-productions, what is the probability that the sample average log-production will exceed 3.7?

Answer: __________

5 pts  d) What is the physical and modeling reason that redoing the calculation in c) for 40,000 wells on this oilfield is not likely to produce a number with any practical relevance?

4. In fact, the log-production data referred to in Problem 3 had $\bar{x} = 3.77$ and $s = .77$ for $n = 52$ wells.

8 pts  a) Based on the above values, give 90% two-sided confidence limits for the mean log-production of wells of this type drilled in this kind of field. (You need not do the arithmetic/simplify.)

Limits: __________

5 pts  b) If you exponentiate your end-points from a) they can function as confidence limits for the median well production (but not as limits for the mean well production). As it turns out, using the raw (not logged) production values, 90% confidence limits for the mean production of wells of this type are $56.0 \pm 9.7$ (thousands of barrels).
From the point of view of a financial analyst for the company owning the oil field, which set of limits (for the median or for the mean) is most interesting? WHY?

Most interesting: Median/Mean (circle one)
Explanation:

10 pts  c) Suppose that the company owning the oilfield is willing to drill again on the field only if the mean yield is clearly larger than 50.0. The $n = 52$ wells had raw (unlogged) productions with sample mean 56.0 and sample standard deviation 42.5. Write out a 5-step significance test appropriate for addressing this concern. (Carefully show the 5 steps!) Should the company drill? Explain.

5 Step Write-up:

Drill? Yes/No (circle one)
Explanation:

5. In the measurement of diameters of cylinders produced on a lathe, there is measurement error. Suppose that for $x$ an actual diameter and $y$ the measured diameter, the measurement error is $\epsilon = y - x$, so that $y = x + \epsilon$. For modeling purposes, it is plausible to assume that $x$ and $\epsilon$ are independent random variables. Suppose that actual diameters have mean $\mu_x = 2.10$ in and standard deviation $\sigma_x = .02$ in, and that measurement errors have mean $\mu_\epsilon = .005$ in and standard deviation $\sigma_\epsilon = .01$ in.

8 pts  a) What are an appropriate mean ($\mu_y$) and standard deviation ($\sigma_y$) for measurements?

$\mu_y$ = __________    $\sigma_y$ = __________

5 pts  b) For very large samples (of single measurements on different cylinders) do you expect the confidence limits $\bar{y} \pm z \frac{s_y}{\sqrt{n}}$ to "close in on" the actual mean diameter, $\mu_x = 2.10$ in? Explain why or why not.

Answer: I DO expect/I DO NOT expect (circle one)
Explanation:

7 pts  6. Below are some R commands and some blank lines where R would respond to the commands. On those blank lines write what would appear (after a carriage return on the previous line) if you were running R.
> x<-c(3,2,7,3,4,6,1,7,3)
> x^2
[1]_______________
> sort(x)[4]
[1]_______________
> sum(x)/length(x)
[1]_______________
> mean(x)
[1]_______________
> d<-x-mean(x)
> d[3]
[1]_______________
> sqrt(sum(d^2)/(length(d)-1))
[1]_______________
> sd(x)
[1]_______________
> sd(2*x)
[1]_______________
> pexp(1.0,rate=1)
[1]_______________
> dexp(1.0,rate=1)
[1]_______________
> s<-1:9
> for (i in 1:3) {s[i]<-x[i]}
> s
[1]_______________