Probability and Statistics Exercises

T-EXERCISES

This file might be adapted during the course.

1. Is there an interval [a, b] such that f(x) = sin(x) for a ≤ x ≤ b is a probability density function?

2. The edge X of a cube is a random variable with probability density function

       f_X(x) = 2(3 − x)   if 2 < x < 3,
       f_X(x) = 0          elsewhere.

   a) What is the probability that the edge is longer than 2.5?
   b) What is the expected value of the edge of the cube?
   c) What is the expected value of the volume of the cube?
   d) Let W be the volume of the cube. What is the cumulative distribution function of W?

3. A random variable X denotes the time between two crashes at the stock exchange and has the following density:

       f_X(x) = (b − 1)/x^b   for 1 ≤ x < ∞,
       f_X(x) = 0             for x < 1,

   where b > 1.
   a) Check that f_X is a probability density function.
   b) Compute the expected value of X. Assume b > 2.
   c) Consider b = 4 and suppose we have ten independent random variables X_1, ..., X_10 with probability density function f_X. Use the Central Limit Theorem to approximate the probability that the average (X_1 + X_2 + ... + X_10)/10 is larger than 2. You may use the fact that V(X_i) = 3/4.
   d) Is it possible to approximate this probability if b = 3? Why?

4. Correct or wrong? The exponential distribution is a good model for the lifetime of a car.

5. Correct or wrong? The uniform distribution on the interval 0 ≤ x ≤ 1 has the lack-of-memory property: P(X < t + t_0 | X > t_0) = P(X < t).

6. Correct or wrong? If X is exponentially distributed, then 2X is also exponentially distributed.

7. Philips sells lamps and claims that the lifetime is at least one year with a probability of more than 0.95. Assume that the lamps do not wear out (but fail because of incidents).
   a) Show that the expected lifetime of such a lamp is at least −1/ln(0.95) ≈ 19.5 years.
   b) Now assume that the expected lifetime is 20 years. Also assume that a lamp burns continuously and that it is immediately replaced when it fails.
      Is it sufficient to have four lamps to guarantee that one has light for more than 3 years with a probability of at least 0.95?

8. Consider a supermarket with pay desks. The probability that one has to wait longer than 5 minutes before paying the bill at a pay desk equals 0.2. A man and a woman go shopping and choose different pay desks at the supermarket. They arrive at the same time at their pay desks.
   a) What is the probability that the man and the woman both have to wait less than 5 minutes? Assume independence.
   Assume that the waiting time is exponentially distributed with parameter λ.
   b) Show that λ = 0.32 (to two decimal places).
   c) The man has been waiting for two minutes. What is the (conditional) probability that the (total) waiting time (including these two minutes) is more than 5 minutes?

9. You call a helpdesk. You hear: there are three people waiting before you. Give a probability distribution that can be used as a model for the waiting time before it is your turn.

10. Correct or wrong? If X has an Erlang distribution with parameters (r, λ) and Y is independent of X and also has an Erlang distribution with parameters (r′, λ′), then the sum X + Y is also Erlang distributed.

11. The time between two arrivals of patients at a clinic in a hospital is an important variable. So is the treatment time of the patient. Very often it is assumed that both times are exponentially distributed.
    Assume that the mean time between two arrivals is 4 minutes.
    a) What is the probability that the time between two arrivals is less than one minute?
    b) What is the distribution of the number of arrivals in 3 hours? Give also the parameter(s) of the distribution.
    c) Give (using the normal distribution) an approximation for the probability that fewer than 60 patients arrive in 3 hours.
    Assume that the mean treatment time is 5 minutes. Melvin arrives and two people are waiting.
    d) What is the expected value of the time Melvin has to wait before it is his turn?

12. Two employees (A and B) have flexible working hours. They have to work at least 6 hours a day, but they can choose to work 7, 8 or 9 hours. So, they can choose to work 0, 1, 2 or 3 hours extra. The number of extra hours that A works is denoted by X and the number of extra hours that B works by Y. The joint probability mass function of the discrete random variables X and Y is given by

       X \ Y     0       1       2       3
       0         1/8     1/8     0       0
       1         1/8     1/16    1/16    0
       2         0       1/16    1/16    1/8
       3         0       0       1/8     1/8

    So, for example, P(X = 0, Y = 1) = 1/8.
    a) Show that the marginal probability distribution of X is uniform.
    b) It is given that on a specific day B works more than 7 hours (so 8 or 9). What is the probability that A works more than 7 hours (so 8 or 9)?
    c) Compute the covariance between X and Y.
    d) Compute the correlation between X and Y.

13. Correct or not? The variance of the sum of two random variables is equal to the sum of the variances.

14. Correct or not? If X and Y are normally distributed and independent with µ = 1, σ² = 1, then X − Y is normally distributed with µ = 0, σ² = 2.

15. A mechanical assembly used in an automobile engine contains four major components. The weights of the components are independent and normally distributed with the following means and standard deviations (in ounces).

       Component            Mean    Standard deviation
       Left case            4       0.4
       Right case           5.5     0.5
       Bearing assembly     10      0.2
       Bolt assembly        8       0.5

    a) What is the probability that the weight of an assembly exceeds 28.5 ounces?
    b) What is the probability that the total weight of 8 independent assemblies exceeds 8 · 28.5 = 228 ounces?
    Now assume that the weights of the 'Left case' and the 'Right case' are dependent. The correlation is equal to −0.5. The other independence assumptions still hold. The weight of an assembly is normally distributed.
    c) What is the probability that the weight of an assembly exceeds 28.5 ounces?

16. Consider n uniformly distributed random variables on the interval [0, 1]. The random variables are independent.
    a) What is the cumulative distribution function of the maximum of the n random variables? What is the expected value?
    b) What is the cumulative distribution function of the minimum of the n random variables? What is the expected value?
    Application of item a): a production process where a product has to pass n stations on a production line. So, at the same moment n products are on the line (one product in each station). The process can continue once the operation has been completed at all stations. Let X_i be the operation time in station i. Let T be the time it takes before the production process can continue. Then T = max{X_1, ..., X_n}.

17.

18. A company sells 25 kg bags of special sand used for making cement. The company purchases these bags for 1.50 euro per bag. The company sells the bags for 2.50 euro, but gives some discount if a customer buys more bags. See the table below.

       Number of bags i    Price per bag    Profit per bag w(i)    Profit per customer y
       1                   2.50             1.00                   1.00
       2                   2.45             0.95                   1.90
       3                   2.40             0.90                   2.70
       4                   2.36             0.86                   3.44
       5                   2.32             0.82                   4.10
       6                   2.28             0.78                   4.68
       7                   2.25             0.75                   5.25
       8                   2.22             0.72                   5.76
       9                   2.19             0.69                   6.21
       10 or more          2.16             0.66                   i · 0.66

    The company wants to know what the expected profit per customer is. A random sample of size 76 is taken and one makes a list of the number of bags each customer buys.

       Number of bags i              1    2    3    4    5    6    7    8    9    10   12
       Number of customers a(i)      13   18   9    6    9    7    6    2    3    2    1

    Use the empirical distribution function to answer the questions below.
    a) Estimate the expected value of the number of bags that a customer buys.
    b) The answer for a) is 304/76 = 4. Somebody argues: 'Now we also know an estimate for the expected profit per customer: it equals 4 · 0.86 = 3.44', where 0.86 is the profit per bag if a customer buys 4 bags. Why is this not correct?
    c) Give a proper estimate for the expected profit per customer.

19. Consider the density f_X(x) = (b − 1)/x^b for 1 ≤ x < ∞, and f_X(x) = 0 for x < 1. Assume b > 1.
    a) Verify that f_X is a density.
    b) Compute the expected value of X. Assume b > 2.
    c) Now assume b = 4 and consider 10 independent random variables X_1, ..., X_10 with density f_X. Use the Central Limit Theorem to approximate the probability that the average (X_1 + X_2 + ... + X_10)/10 is larger than 2. Use the fact that V(X_i) = 3/4.

20. The number of telephone calls at call center A follows a Poisson process with expected value µ per hour. The number of calls at call center B follows a Poisson process with expected value 2µ. One has n observations at call center A (X_1, X_2, ..., X_n) and n observations at call center B (Y_1, Y_2, ..., Y_n). All observations are independent. Consider the following estimators for µ, where X̄ and Ȳ denote the sample means:

       W_1 = (X̄ + Ȳ)/2,   W_2 = (2X̄ + Ȳ)/4.

    a) Are W_1 and W_2 unbiased estimators for µ?
    b) Which of the two estimators is better with respect to the MSE?

21. Let X_1, X_2, ..., X_n be a random sample from a uniform distribution on the interval [0, θ] with θ unknown. The density of the distribution is

       f(x) = 1/θ   (0 ≤ x ≤ θ).

    An estimator θ̂ for the unknown θ is

       θ̂ = 2X̄ = 2 (X_1 + X_2 + ... + X_n)/n.

    a) Show that the estimator θ̂ is unbiased for θ.
    b) Determine the Mean Squared Error (MSE).

22. The waiting time for a ticket office is exponentially distributed on the interval [θ, ∞). The density is

       f(x) = e^{−(x−θ)}   (x ≥ θ).

    This means that the waiting time is the sum of a threshold θ and an exponentially distributed waiting time on the interval [0, ∞). The parameter θ is unknown. One has a random sample of size 15 (X_1, X_2, ..., X_15) to estimate θ. It is known that a good estimator is

       Y = min(X_1, X_2, ..., X_15).

    For the density f(y) of Y we have

       f(y) = 15 e^{−15(y−θ)}   (y ≥ θ).

    This means: if we define V = Y − θ, then V is exponentially distributed with parameter λ = 15.
    a) Show that this estimator is biased and that the bias is 1/15.
    In the following question it can be used that

       E(Y²) = 2/15² + 2θ/15 + θ².

    b) Determine the MSE of the estimator.

23. With a computer, 15 random samples of size 4 are taken from a normal distribution with µ = 11 and σ = 1. These are used to construct 15 90% confidence intervals (CIs) for µ, where it is assumed that one does not know µ, but σ is known.
    a) Is the width (upper bound minus lower bound) of these CIs the same?
    b) Do these 15 intervals all contain the value 11? Why?
    c) What is the probability that the 15 random samples will be generated in such a way that the resulting intervals all contain the value 11?
    d) Consider the first random sample. What is the probability that it will be generated in such a way that its interval contains the value 12?
    e) The upper bound of the third CI is the realisation of a random variable; we denote it by R_3. Compute E(R_3).
    We construct 15 CIs as described above, but now we have σ = 2.
    f) What do you expect? Will there be more or fewer CIs that contain the value 11?

24. In a lake one investigates the mercury (Dutch: kwik) contamination using a random sample of 53 fish. A normal probability plot gives

       [Figure: normal probability plot of the mercury data, not reproduced here.]

    a) Can the normal distribution be used to find a confidence interval on µ (the population mean)? Give an argument.
    A statistical package gives the following output.

       Summary Statistics for mercury (kwik)
       Count = 53
       Average = 0.524981
       Variance = 0.12154
       Standard deviation = 0.348625
       Minimum = 0.04
       Maximum = 1.33
       Range = 1.29

    b) Give a 95% confidence interval for µ.
    c) Is the following statement correct? If one has many observations, then 95% of the observations will be in the interval from b). Give an argument.

25. The waiting time X in minutes at a service desk is exponentially distributed with unknown parameter λ > 0. So the expected value µ equals 1/λ. A random sample of size 20 was selected and the mean sample waiting time was 10 minutes. A good estimator for the expected value µ is X̄, the average of the sample. For the variance of X̄ we have

       V(X̄) = 1/(nλ²) = µ²/n.

    a) Construct a 95% two-sided confidence interval for the expected value µ of the waiting time using the normal approximation.
    It is also possible to construct an exact confidence interval using the χ²-distribution. It is known that the quantity

       2nX̄/µ                                                        (1)

    has a χ²-distribution with 2n degrees of freedom.
    b) Construct the 95% confidence interval by use of (1). Hint: first find values l and u such that

       P(l < 2nX̄/µ < u) = 0.95.

26. A manufacturer wants to know the mean of the lifetime of the batteries he produces. He wants to have a 95% confidence interval on the expected lifetime. It is known that the lifetime has a normal distribution with standard deviation 10 hours.
    a) What should the sample size be if the width of the interval should not exceed 2?
    A random sample of size 20 gives as results

       Σ_{i=1}^{20} x_i² = 47399.2,   Σ_{i=1}^{20} x_i = 956.

    b) Give a 95% lower confidence bound (of the form (l, ∞)) for the expected lifetime. Assume that the standard deviation is unknown.
    Assume that the interval in b) equals (47, ∞). The manufacturer claims: 'If you buy a battery at our company, the probability that the lifetime is more than 47.0 hours is 0.95.'
    c) Do you support this claim?

27. In a complicated production process the proportion of defective products is high (on average 10%). One wants to investigate if the proportion has become even higher using a random sample of size 20. The null hypothesis is rejected if the number of defective products is equal to 4 or more. (Remark: answer the following questions without using the normal approximation.)
    a) What is the significance level (or α-error or type I error probability) of this test?
    b) The sample yields 3 defective products. What is the p-value?
    c) What is the type II error probability if the true proportion is 20%?

28. Consider 500 g packs of butter. The weight has a normal distribution with population mean µ = 500. One wants to test the null hypothesis H0: µ = 500.
    The standard deviation is known (but not given here yet) and a random sample is taken. The 95% two-sided confidence interval, based on this random sample, is (501.1, 510.9). Consider the test of µ = 500 against the alternative µ ≠ 500 with significance level α = 0.05.
    a) Is the null hypothesis rejected?
    b) What is the acceptance region for this test?
    Now assume that the standard deviation equals σ = 10 g and that the random sample has size n = 25. In that case the acceptance region for the mentioned test (with significance level 0.05) is (496.08, 503.92).
    c) What is the power of the test if the true value of µ equals 495?
    d) What sample size do we need to have a power of 0.90 if µ = 495?

29. The number of orders a company has each week can be modeled with a Poisson process with expected value µ. Until now it is assumed that µ equals 2. One wants to test if the number of orders is larger than it used to be and one wants to have a strong conclusion. One decides to use the number of orders during 5 weeks to test the null hypothesis H0: µ = 2. The null hypothesis is rejected if the total number is 16 or more.
    Remark: the sum of n independent Poisson distributed random variables with expected value µ is also Poisson distributed (with expected value n · µ).
    a) What is the significance level (or α-error or type I error probability) of this test?
    b) What is the type II error probability if the true value of µ is 3?
    c) The number of orders that the company got in the 5 weeks was 14. What is the p-value?

30. The weight (x) of 110 randomly chosen men is measured. They are classified as follows:

       Weight (x) in kg     Number of men in this class
       x ≤ 60               10
       60 < x ≤ 70          20
       70 < x ≤ 80          46
       80 < x ≤ 90          27
       x > 90               7

    One wants to test if the weight of men is normally distributed with σ = 10. The expected value of the normal distribution is estimated with the sample. The sample mean is 76.
    a) What is the probability distribution of the test statistic (give also the parameter(s) of the distribution)?
    b) What is the expected number of men with a weight less than or equal to 60 kg?
    c) The value of the test statistic is 3.52. Is the null hypothesis rejected (α = 0.05)?

31. A company sells vegetables in cans. Some of the cans do not meet the specifications. There are several reasons for this: a stain (1), a dent (2), the eye to open it is in the wrong place (3), there is no eye to open it (4), other (5). We call these deviations. One takes a random sample and classifies the defective cans.

       Class      (1)    (2)    (3)    (4)    (5)    Sum
       Number     89     145    58     54     29     375

    One wants to investigate if the numbers of deviations for the 5 classes follow the pattern 2:3:2:2:1.
    a) Formulate the null hypothesis in terms of probabilities and compute the expected number of cans for each class when the null hypothesis is true.
    b) The value of the test statistic is 23.7. Is the null hypothesis rejected (α = 0.05)? Why?
    c) Give a 95% two-sided confidence interval for the proportion of cans with a dent.

32. Consider crates with 20 bottles of beer. Define X as the number of bottles that do not contain enough beer. The number of bottles with not enough beer is counted for 50 crates and the results are given below.

       Value of X           0     1     2     3
       Number of crates     15    22    11    2

    a) One wants to investigate if the binomial model is a good model for the number of bottles with not enough beer. Estimate the expected number of crates in class 0 (so all bottles contain enough beer) if the binomial model is a good model.
    b) The value of the test statistic is 2.09. Give the critical value for a significance level of α = 0.05. Is the null hypothesis rejected? Why (not)?

33. A company operates four machines in three shifts. From production records the following data on the number of breakdowns are collected.

                    Machines
       Shift     A     B     C     D
       1         41    20    12    16
       2         31    11    9     14
       3         15    17    16    10

    One wants to test the null hypothesis that the breakdowns are independent of the shift.
    a) Compute the expected number of breakdowns for machine B for shift 2 if the hypothesis is true.
    b) The value of the test statistic is 11.65. Is the null hypothesis rejected? Why (not)? Use α = 0.05.

34. One adds a certain ingredient to a laundry detergent to try to improve the effect of the detergent. At random 10 dirty overalls are chosen and cut in two parts. One part is washed with the detergent with the ingredient and the other part is washed with the detergent without the ingredient. One measures in some unit how clean the (parts of the) overalls are. The observations are

       Overall    Detergent without ('OUD', old)    Detergent with ('NIEUW', new)
       1          5                                 7
       2          5                                 9
       3          12                                17
       4          29                                36
       5          10                                8
       6          33                                40
       7          33                                29
       8          17                                27
       9          2                                 5
       10         8                                 19

    With a statistical package two analyses are done (one for paired observations and one for independent observations). The results are given below.

       [Statistical output not reproduced here.]

    a1) Which of the two methods (paired or independent) should be applied here for the analysis? Explain your choice.
    a2) Explain the difference in p-values for the two methods.
    b) Perform a one-sided test of whether the new detergent (with the ingredient) is better. Use α = 0.05.

35. Consider simple linear regression and the data

       x     −2    −1    0     1     2
       y     −5    0     −1    3     3

    a) Give the model and assumptions.
    b) Compute the estimates for the intercept (β0) and slope (β1) of the model.
    It is given that the estimate for the variance of the error term is equal to 2.6333.
    c) Give a 95% prediction interval for Y if x = 0.5.

36. One wants to investigate the relation between blood pressure Y and noise x. The following 20 observations are available.

       y     x          y     x
       1     60         5     85
       0     63         4     89
       1     65         6     90
       2     70         8     90
       5     70         4     90
       1     70         5     90
       4     80         7     94
       6     90         9     100
       2     80         7     100
       3     80         6     100

    Here y is the increase of the blood pressure (in mm Hg) and x is the level of the noise in decibels.
    A statistical package gives the following results.

       Regression Analysis - Linear model: Y = a + b*X
       ---------------------------------------------------------------------
       Dependent variable: pressure
       Independent variable: noise
       ---------------------------------------------------------------------
                                    Standard        T
       Parameter      Estimate      Error           Statistic     P-Value
       ---------------------------------------------------------------------
       Intercept      ????????      1.9949          -5.07872      0.0001
       Slope          0.174294      0.0238286       7.31447       0.0000
       ---------------------------------------------------------------------

    a) Give the estimate for the intercept of the model.
    b) Give a 95% confidence interval for the coefficient of x in this model.
    c) Someone wants to know the increase of the blood pressure if x has the value 120. What is the answer according to the linear model? Would you give this answer if someone asked you this question? Why (not)?

37. Corrosion of iron in reinforced concrete is a problem for its sustainability. For a number of constructions the strength y (in MPa) is measured and also the so-called 'depth of carbonification' x (in mm), which is an important variable for corrosion. A statistical package gives as result

       Regression Analysis - Linear model: Y = a + b*X
       -----------------------------------------------------------------------------
       Dependent variable: y
       Independent variable: x
       -----------------------------------------------------------------------------
                                    Standard        T
       Parameter      Estimate      Error           Statistic     P-Value
       -----------------------------------------------------------------------------
       Intercept      27.1829       1.65135         16.4611       0.0000
       Slope          -0.297561     0.0411642       ???????       ??????
       -----------------------------------------------------------------------------

       Analysis of Variance
       -----------------------------------------------------------------------------
       Source           Sum of Squares    Df    Mean Square    F-Ratio    P-Value
       -----------------------------------------------------------------------------
       Model            428.615           1     ???????        ?????      ?????
       Residual         131.242           16    ???????
       -----------------------------------------------------------------------------
       Total (Corr.)    559.858           17

       Correlation Coefficient = -0.874974
       R-squared = 76.5579 percent
       R-squared (adjusted for d.f.) = 75.0928 percent
       Standard Error of Est. = 2.86403

    a) What is the number of observations n?
    b) Compute, using the above results, the sum of squares Sxx = Σ (x_i − x̄)².
    c) Give the 95% confidence interval for the intercept.
    d) Test the null hypothesis β1 = 0. Use α = 0.05.

38. The height of a tree y (in cm) is measured during a period of 10 years. The years x are coded with 1, 2, 3, ..., 10. The table is (partially) given here.

       x_i (jaar)       1      2      3      4     5     6     7     8     9     10
       y_i (hoogte)     150    161    167    ??    ??    ??    ??    ??    ??    208

    Regression analysis (based on these 10 observations) gives the following results ('hoogte' = height and 'jaar' = year).

       [Regression output not reproduced here.]

    a) Give a two-sided 95% confidence interval for σ².
    b) Give an estimate of the height of the tree in the next year (year 11).
    c) Give a 95% prediction interval for the height of the tree in the next year (year 11).
    The next picture shows the residuals against the number of the year.

       [Figure: residuals plotted against year, not reproduced here.]

    d) Use the picture to comment on the model assumptions. What are the consequences for the prediction of the height in the next year (11)?

39. One wants to optimize a production process. Two factors are important: temperature and the type of the machine. In an experiment the temperature has two values (60 °C and 70 °C). There are three types of machines. Observations are made for all possible combinations. The (coded) observations are

       Temp    Machine    Observation
       60      A          −2
       60      B          0
       60      C          0
       60      A          −2
       60      B          1
       60      C          1
       70      A          1
       70      B          1
       70      C          0
       70      A          1
       70      B          0
       70      C          1

    Formulate this problem as a regression model (including cross products between temperature and machine).

40. One wants to relate the quality of wine to a number of variables: 'Clarity' (x1 or A), 'Aroma' (x2 or B), 'Body' (x3 or C), 'Flavor' (x4 or D) and 'Oakiness' (x5 or E); 38 samples of wine have been tested. The dependent variable y is a score which is a measure for the quality. Consider the model

       Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5 + ε,

    with the usual assumptions for ε. A statistical package gives the following output.

       Multiple Regression Analysis
       Dependent variable: Quality
       ----------------------------------------------------------------
                                    Standard        T
       Parameter      Estimate      Error           Statistic     P-Value
       ----------------------------------------------------------------
       CONSTANT       3.99686       2.23177         1.79089       0.0828
       Clarity        2.33945       1.73483         1.34852       0.1870
       Aroma          0.482551      0.272447        1.77117       0.0861
       Body           0.273161      0.332561        0.821388      0.4175
       Flavor         1.16832       0.304481        3.8371        0.0006
       Oakiness       -0.68401      0.271193        -2.52223      0.0168
       ----------------------------------------------------------------

       Analysis of Variance
       ----------------------------------------------------------------
       Source           Sum of Squares    Df    Mean Square    F-Ratio    P-Value
       ----------------------------------------------------------------
       Model            111.54            5     22.3081        16.51      ??????
       Residual         43.248            32    1.3515
       ----------------------------------------------------------------
       Total (Corr.)    154.788           37

    And for a model with only three variables:

       Multiple Regression Analysis
       ---------------------------------------------------------------
       Dependent variable: Quality
       ---------------------------------------------------------------
                                    Standard        T
       Parameter      Estimate      Error           Statistic     P-Value
       ---------------------------------------------------------------
       CONSTANT       6.46719       1.33279         4.85238       0.0000
       Aroma          0.58012       0.262185        2.21264       0.0337
       Flavor         1.19969       0.274881        4.36441       0.0001
       Oakiness       -0.602325     0.264401        -2.27807      0.0291
       ---------------------------------------------------------------

       Analysis of Variance
       ---------------------------------------------------------------
       Source           Sum of Squares    Df    Mean Square    F-Ratio    P-Value
       ---------------------------------------------------------------
       Model            108.935           3     36.3117        26.92      0.0000
       Residual         45.8534           34    1.34863
       ---------------------------------------------------------------
       Total (Corr.)    154.788           37

    a) Test the hypothesis β1 = β2 = β3 = β4 = β5 = 0.
    b) Test the hypothesis β1 = β3 = 0.

    All possible 32 models (using the 5 regressors) are considered.
    This gives the following output.

       Regression Model Selection
       Dependent variable: Quality
       Independent variables: A=Clarity B=Aroma C=Body D=Flavor E=Oakiness
       Number of models fit: 32

       Model Results
       ---------------------------------------------------
                                  Adjusted      Included
       MSE          R-Squared     R-Squared     Variables
       ---------------------------------------------------
       4.18347      0.0           0.0
       4.18347      2.7027        0.0           A
       2.14852      50.0308       48.6427       B
       3.00516      30.1074       28.1659       C
       1.61593      62.4174       61.3735       D
       4.18347      2.7027        0.0           E
       2.20886      50.0544       47.2004       AB
       2.9001       34.4245       30.6773       AC
       1.62128      63.3404       61.2456       AD
       4.18347      5.40541       0.0           AE
       2.04696      53.7151       51.0703       BC
       1.51006      65.8552       63.904        BD
       2.04406      53.7807       51.1396       BE
       1.65123      62.6632       60.5296       CD
       3.01392      31.8508       27.9566       CE
       1.49874      66.1112       64.1747       DE
       2.08516      54.1984       50.1571       ABC
       1.53649      66.2504       63.2725       ABD
       2.10258      53.8158       49.7407       ABE
       1.63487      64.0893       60.9207       ACD
       2.82341      37.9826       32.5105       ACE
       1.45631      68.0116       65.1891       ADE
       1.55192      65.9114       62.9036       BCD
       1.91841      57.8612       54.143        BCE
       1.34863      70.3767       67.7629       BDE
       1.52707      66.4572       63.4975       CDE
       1.57108      66.5054       62.4454       ABCD
       1.91353      59.2046       54.2597       ABCE
       1.33818      71.4708       68.0128       ABDE
       1.43902      69.3209       65.6022       ACDE
       1.38502      70.4721       66.893        BCDE
       1.3515       72.0599       67.6943       ABCDE
       ---------------------------------------------------

    c) If you have to choose one of these models, which model will you choose? Give arguments.
    One uses stepwise regression (where we can use all 5 variables).
    d) Which variable is the first variable that will be chosen in the model? Why?
    e) Must another variable be added to the model? Give arguments.
    Consider the model with only the variables 'Aroma' (x2), 'Flavor' (x4) and 'Oakiness' (x5). The output is

       Multiple Regression Analysis
       Dependent variable: Quality
       ------------------------------------
       Parameter      Estimate
       ------------------------------------
       CONSTANT       6.46719
       Aroma          0.58012
       Flavor         1.19969
       Oakiness       -0.602325
       ------------------------------------

       Regression Results for Quality with 'Aroma'=5, 'Flavor'=5, 'Oakiness'=4.
       -----------------------------------------------------------------------------------
                                      Lower        Upper        Lower        Upper
       Fitted      Stnd. Error        95.0% CL     95.0% CL     95.0% CL     95.0% CL
       Value       for Forecast       Forecast     Forecast     for Mean     for Mean
       -----------------------------------------------------------------------------------
       ?????       1.17966            ?????        ???????      12.5357      13.3782
       -----------------------------------------------------------------------------------

    f) You buy a bottle of wine with 'Aroma' = 5, 'Flavor' = 5, 'Oakiness' = 4. Give a 95% prediction interval for the quality of the wine.
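
For readers who want to check their arithmetic for item f) of exercise 40, the calculation can be sketched in a few lines of Python. This is only an illustration, not an official solution: the coefficients and the forecast standard error are copied from the output above, while the t-quantile 2.032 (34 residual degrees of freedom, taken from the ANOVA table of the three-variable model) is an assumed table value and the variable names are made up for this sketch.

```python
# Sketch for exercise 40 f): 95% prediction interval from the reported output.
# Coefficients of the three-variable model, copied from the package output.
coef = {"CONSTANT": 6.46719, "Aroma": 0.58012, "Flavor": 1.19969, "Oakiness": -0.602325}
wine = {"Aroma": 5, "Flavor": 5, "Oakiness": 4}

# Fitted value: intercept plus coefficient times regressor value.
fitted = coef["CONSTANT"] + sum(coef[name] * value for name, value in wine.items())

se_forecast = 1.17966  # "Stnd. Error for Forecast" from the output
t_quantile = 2.032     # assumption: t-table value t(0.025; 34)

half_width = t_quantile * se_forecast
lower, upper = fitted - half_width, fitted + half_width

print(f"fitted value: {fitted:.3f}")
print(f"95% prediction interval: ({lower:.2f}, {upper:.2f})")
```

A quick plausibility check: the fitted value should lie at the midpoint of the reported 95% limits for the mean, (12.5357 + 13.3782)/2.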
