Statistics 533
Spring 2014
Midterm Exam

Name: ______________

Exam Instructions

When asked to explain something, provide an explanation that could be understood by someone who does not have formal training in statistical methods. Your explanations should be clear and concise.

Generally, you must show all of your work in order to get partial credit. When you are asked to derive an expression you must show all steps of your derivation.

Do not use your calculator until you have otherwise completed the entire exam. When you are asked to compute a quantity you do not need to do all of the computations to a final answer. Just show that you know how to do the problem. I am not interested in how well you can use your calculator.

Students may use one sheet (8.5 by 11 inches, both sides) of paper containing equations or notes.

Students may have up to 120 minutes (2 hours) to complete the exam.

There are 23 questions or question parts in the exam. Students must choose and mark three (3) question parts as "Do Not Grade."

As soon as possible after the exam is completed, it should be scanned by the proctor and returned to the ISU Testing Center by one of the suggested methods (upload, email or FAX).

1. The delta method is widely used in statistics to estimate standard errors of nonlinear functions of parameters. Suppose that you have the ML estimate $\hat{\theta}$ of a scalar parameter $\theta$ and an estimate $\widehat{\mathrm{se}}_{\hat{\theta}}$ of its standard error.

(a) Use the delta method to obtain an expression for $\widehat{\mathrm{se}}_{\log\hat{\theta}}$, an estimator of $\mathrm{se}_{\log\hat{\theta}}$, as a function of $\widehat{\mathrm{se}}_{\hat{\theta}}$.

(b) Derive an expression for an approximate $100(1 - \alpha)\%$ confidence interval for $\theta$ based on the assumption that

$$\frac{\log\hat{\theta} - \log\theta}{\widehat{\mathrm{se}}_{\log\hat{\theta}}} \sim \mathrm{NOR}(0, 1).$$

(c) In an application like this we would generally say that "We are 95% confident that the confidence interval contains $\theta$." Briefly explain the precise meaning of this statement.

(d) Explain why the delta-method approximation tends to work better (i.e., is more accurate) in large samples.
Draw a picture to help you explain.

2. Consider the random variable $T$ that describes the failure-time distribution of Component G and has cdf

$$F(t) = \Pr(T \le t) = 1 - \exp\left[-\left(\frac{t}{10}\right)^{2}\right], \quad t > 0,$$

where time is in units of thousands of hours of operation.

(a) Derive an expression for the pdf of $T$.

(b) Derive an expression for the hazard function for $T$.

(c) What can you say about the shape of the hazard function for $T$?

(d) Compute the reliability of Component G for a mission time of 4 thousand hours.

3. The bootstrap method of computing a confidence interval can provide an excellent procedure when the normal-approximation (Wald) procedure does not perform well. The bootstrap procedure is a bit more complicated when censoring is involved. Generating bootstrap samples with the parametric sampling method requires modeling the censoring mechanism as well, which is easy for simple censoring but complicated for random censoring, such as arises in field data. The nonparametric methods of generating bootstrap samples provide a better approach.

(a) Provide, perhaps by using a formula, a simple intuitive explanation for why a confidence interval based on a bootstrap will have better coverage properties (i.e., true coverage probability closer to the nominal confidence level) than a standard normal-approximation (Wald) interval.

(b) For problems with censored data, explain the important advantage of using the fractional-weights method versus the integer-weights method.

4. Suppose that time to failure $T$ for Component-A has the cdf

$$\Pr(T \le t) = F(t) = \exp\left[-\left(\frac{\delta}{t}\right)^{\beta}\right], \quad t > 0. \tag{1}$$

This is known as the Fréchet distribution of maxima and is a distribution in the log-location-scale family.

(a) Derive expressions for the quantile of $T$ and for the quantile of $\log(T)$.

(b) Show that $\log(T)$ has a location-scale distribution.

(c) Show how you would transform time ($t$) and fraction failing to construct a probability plot for the Fréchet distribution.

5. Explain why the nonparametric estimate from interval-censored data is a set of points, while the nonparametric estimate from data with "exact" failures is a step function.

6. Briefly explain the difference between quality and reliability.

7. A random sample of $n = 40$ motors was put on test at a particular point in time and the test was run until $t_c = 2000$ hours, at which point $r = 5$ units had been reported as failing. The reported failure times were 14.1, 88.4, 132.7, 165.1, 409.2 hours. Because of automatic monitoring, the reported failure times were close to exact.

(a) Compute a nonparametric estimate of $F(t)$, the cdf for the failure-time distribution of the motors.

(b) Compute a nonparametric approximate 95% confidence interval for the cdf at time $t = 200$ hours. You can use the easiest large-sample approximation that is available.

(c) Do you think that the Weibull ML estimate of $F(t)$ (which would plot as a straight line on Weibull probability paper) would agree well with your nonparametric estimate? Explain why or why not.

8. A sample of 25 washing machine transmissions was tested for 4 thousand hours and there were no failures. Based on previous experience with similar transmissions, the engineers believe that the Weibull shape parameter can be assumed to be $\beta = 2$.

(a) Compute a lower confidence bound for the Weibull scale parameter $\eta$.

(b) Comment on the practical importance of the confidence bound computed in part 8(a).

(c) Compute a lower confidence bound for the Weibull 0.10 quantile.

9. Under usual regularity conditions, the large-sample approximate variance of the ML estimator of a scalar parameter $\theta$ (e.g., an exponential distribution mean) based on a sample size of $n$ observations (possibly with censoring) is

$$\mathrm{Avar}(\hat{\theta}) = \frac{1}{E\left[-\dfrac{\partial^{2} L(\theta)}{\partial\theta^{2}}\right]} = \frac{1}{n} V_{\hat{\theta}}, \tag{2}$$

where $L(\theta)$ is the log likelihood of the data and the expectation is, as usual, with respect to the data to be collected. Censoring would be accounted for in taking the expectation.
The quantity $V_{\hat{\theta}}$ is a variance factor that does not depend on the sample size $n$.

(a) Provide an intuitive explanation for why the expression in the center of (2) reflects variability in the sampling distribution of $\hat{\theta}$.

(b) Explain how the quantities in (2) can be used to choose a sample size to control the precision of a confidence interval for $\theta$.
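
A minimal sketch of how expression (2) might be checked by simulation in one simple case. Everything specific here is an assumption for illustration and does not come from the exam: the data are complete (uncensored) exponential observations with mean θ, so the ML estimator is the sample mean, the log likelihood is L(θ) = −n log θ − Σt/θ, and the variance factor works out to V = θ². The function names, seed, and numerical values are hypothetical.

```python
import random
import statistics


def avar_exponential_mean(theta, n):
    """Large-sample variance from (2) for an exponential mean, no censoring.

    Assumed case: L(theta) = -n*log(theta) - sum(t)/theta, so
    E[-d^2 L / d theta^2] = n / theta^2 and the variance factor is theta^2.
    """
    return theta ** 2 / n


def simulated_var_of_mle(theta, n, reps, seed=533):
    """Monte Carlo variance of the ML estimator (here, the sample mean)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # expovariate takes the rate, i.e. 1 / mean
        sample = [rng.expovariate(1.0 / theta) for _ in range(n)]
        estimates.append(statistics.fmean(sample))
    return statistics.variance(estimates)


theta, n, reps = 10.0, 50, 2000  # hypothetical values for illustration
approx = avar_exponential_mean(theta, n)
simulated = simulated_var_of_mle(theta, n, reps)
print(f"(2) gives {approx:.3f}; simulation gives {simulated:.3f}")
```

Under these assumptions the two numbers should be close, and doubling n halves the value returned by `avar_exponential_mean`, which is the 1/n behavior on the right-hand side of (2).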