Uploaded by memes.hubz0

T-exercises

advertisement
T-EXERCISES
This file might be adapted during the course.
1. Is there an interval [a, b] such that f (x) = sin(x) for a ≤ x ≤ b is a probability density
function?
2. The edge X of a cube is a random variable with probability density function
2(3 − x) if 2 < x < 3,
fX (x) =
0
elsewhere.
a) What is the probability that the edge is longer than 2.5?
b) What is the expected value of the edge of the cube?
c) What is the expected value of the volume of the cube?
d) Let W be the volume of the cube. What is the cumulative distribution function of
W?
3. A random variable X denotes the time between two crashes at the stock exchange and
has the following density: fX (x) = (b − 1)/xb for 1 ≤ x < ∞, and fX (x) = 0 for x < 1.
(We have b > 1).
a) Check that fX is a probability density function.
b) Compute the expected value of X. Assume b > 2.
c) Consider b = 4 and suppose we have ten independent random variables X1 . . . , X10
with probability density function fX . Use the Central Limit Theorem to approximate the probability that the average (X1 + X2 + . . . + X10 )/10 is larger than 2.
You may use the fact that V (Xi ) = 3/4.
d) Is it possible to approximate this probability if b = 3? Why?
4. Correct or wrong? For the life time of a car the exponential distribution is a good
model.
5. Correct or wrong? The uniform distribution on the interval 0 ≤ x ≤ 1 has the lack of
memory property: P (X < t + t0 |t > t0 ) = P (X < t).
6. Correct or wrong? If X is exponentially distributed, then 2X is also exponentially
distributed.
7. Philips sells lamps and Philips claims that the life time is at least one year with a
probability of more than 0.95. Assume that the lamps do not wear out (but fail because
of incidents).
a) Show that the expected life time of such a lamp is at least −1/ ln(0.95) ' 19.49
year.
b) Now assume that the expected life time is 20 year. Now assume that a lamps burns
continuously and that it is immediately replaced when it fails. Is it sufficient to
have four lamps to guarantee that one has light for more than 3 years with a
probability of at least 0.95?
1
8. Consider a supermarket with pay desks. The probability that one has to wait longer
than 5 minutes before paying the bill at a pay desk equals 0.2. A man and a woman go
shopping and choose different pay desks at the super market. They arrive at the same
time at their pay desk.
a) What is the probability that the man and the woman both have to wait less than
5 minutes? Assume independency.
Assume that the waiting time is exponentially distributed with parameter λ.
b) Show that λ = 0.32 (in two digits).
c) The man is waiting two minutes. What is the (conditional) probability that the
(total) waiting time (including these two minutes) is more than 5 minutes?
9. You call a helpdesk. You hear: there are three people waiting before you. Give a
probability distribution that can be used as a model for the waiting time before it is
your turn.
10. Correct or wrong? If X has an Erlang distribution with parameters (r, λ) and Y is
independent from X and has also an Erlang distribution with parameters (r0 , λ0 ), then
the sum X + Y is also Erlang distributed.
11. The time between two arrivals of patients at a clinic in a hospital is an important
variable. So is the treatment time of the patient. Very often it is assumed that both
times are exponentially distributed. Assume that the mean time between two arrivals
is 4 minutes.
a) What is the probability that the time between two arrivals is less than one minute?
b) What is the distribution of the number of arrivals in 3 hour? Give also the parameter(s) of the distribution.
c) Give (using the normal distribution) an approximation for the probability that
less than 60 patients arrive in 3 hours.
Assume that the mean treatment time is 5 minutes. Melvin arrives and two people are
waiting.
d) What is the expected value of the time Melvin has to wait before it is his turn?
2
12. Two employees (A and B) have flexible work-hours. They have to work at least 6 hours
on a day, but they can choose to work 7, 8 or 9 hours. So, they can choose to work
0, 1, 2 or 3 hours extra. The number of hours that A works extra is denoted by X and
the number of hours that B works extra with Y . The joint probability mass function
of the discrete random variables X and Y is given by
X/Y
0
1
2
3
0
1/8
1/8
0
0
1
1/8
1/16
1/16
0
2
0
1/16
1/16
1/8
3
0
0
1/8
1/8
So, for example P (X = 0, Y = 1) = 1/8.
a) Show that the marginal probability distribution of X is uniform.
b) It is given that on a specific day B works more than 7 hours (so 8 or 9). What is
the probability that A works more than 7 hours (so 8 or 9)?
c) Compute the covariance between X and Y .
d) Compute the correlation between X and Y .
13. Correct or not? The variance of the sum of two random variables is equal to the sum
of the variances.
14. Correct or not? If X, Y are normally distributed and independent with µ = 1, σ 2 = 1,
then X − Y is normally distributed with µ = 0, σ 2 = 2.
15. A mechanical assembly used in an automobile engine contains four major components.
The weights of the components are independent and normally distributed with the
following means and standard deviations (in grams).
Component
Left case
Right case
Bearing assembly
Bolt assembly
Mean
4
5.5
10
8
Standaard Deviation
0.4
0.5
0.2
0.5
a) What is the probability that the weight of an assembly exceeds 28.5 ounces?
b) What is the probability that the total weight of 8 independent assemblies exceeds
8 · 28.5 = 228 ounces?
Now assume that the weights of the ’Left case’ and the ’Right case’ are dependent. The
correlation is equal to −0.5. The other independencies still hold. The weight of an
assembly is normally distributed.
c) What is the probability that the weight of an assembly exceeds 28.5 ounces?
3
16. Consider n uniformly distributed random variables on the interval [0, 1]. The random
variables are independent.
a) What is the cumulative distribution function of the maximum of the n random
variables? What is the expected value?
b) What is the cumulative distribution function of the minimum of the n random
variables? What is the expected value?
Application of item a): a production process where a product has to pass n stations on
a production line. So, at the same moment n products are on the line (in each station
one product). The process can continue if on all the stations the operation has been
completed. Let Xi be the operation time in station i. Let T be the time it takes before
the production process can continue. Then T = max{X1 , · · · , Xn }.
17. 18. A company sells bags of 25 kg special sand used for making cement. The company
purchases these bags for 1.50 euro per sack. The company sells the bags for 2.50, but
gives some discount if a customer buys more bags. See the table below.
Number of bags
the customer buys
i
1
2
3
4
5
6
7
8
9
10 or more
Price
per bag
2.50
2.45
2.40
2.36
2.32
2.28
2.25
2.22
2.19
2.16
Profit
per bag
w(i)
1.00
0.95
0.90
0.86
0.82
0.78
0.75
0.72
0.69
0.66
Profit per
customer
y
1.00
1.90
2.70
3.44
4.10
4.68
5.25
5.76
6.21
i · 0.66
The company wants to know what the expected profit per customer is. A random
sample of size 76 is taken and one makes a list of the number of bags each customer
buys.
4
Number of
bags
i
1
2
3
4
5
6
7
8
9
10
12
Number
of customers
a(i)
13
18
9
6
9
7
6
2
3
2
1
Use the empirical distribution function to answer the questions below.
a) Estimate the expected value of the number of bags that a customer buys.
b) The answer for a) is 304/76 = 4. Somebody argues: ’now we also know an estimate
for the expected profit per customer’; it equals 4 · 0.86 = 3.44’, where 0.86 is the
profit per bag if a customer buys 4 bags. Why is this not correct?
c) Give a proper estimate for the expected profit per customer.
19. Consider the density fX (x) = (b − 1)/xb for 1 ≤ x < ∞, and fX (x) = 0 for x < 1.
Assume b > 1.
a) Verify that fX is a density.
b) Compute the expected value of X. Assume b > 2.
c) Now assume b = 4 and and consider 10 independent random variables X1 . . . , X10
with density fX . Use the Central Limit Theorem to approximate the probability
that the average (X1 + X2 + . . . + X10 )/10 is larger than 2. Use the fact that
V (Xi ) = 3/4.
20. The number of telephone calls at call center A follow a Poisson-process with expected
value µ per hour. The number of calls at call center B follow a Poisson-process with
expected value 2µ. One has n observations at call center A (X1 , X2 , · · · , Xn ) and n
observations at call center B (Y1 , Y2 , · · · , Yn ). All observations are independent.
Consider the following estimators for µ
W1 = (X + Y )/2,
W2 = (2X + Y )/4.
a) Are W1 and W2 unbiased estimators for µ?
b) Which of two estimators is better with respect to the MSE?
21. Let X1 , X2 , · · · , Xn a random sample from a uniform distribution on the interval [0, θ]
with θ unknown. The density of the distribution is
f (x) =
1
θ
(0 ≤ x ≤ θ) .
5
An estimator θb for the unknown θ is
Pn
Xi
θb = 2X = 2 i=1
.
n
a) Show that the estimator θb is unbiased for θ.
b) Determine the Mean Squared Error (MSE).
22. The waiting time for a ticket office is exponentially distributed on the interval [θ, ∞).
The density is
f (x) = e−(x−θ) ,
(x ≥ θ) .
This means that the waiting time is the sum of a threshold θ and an exponentially
distributed waiting time on the interval [0, ∞). The parameter θ is unknown. One has
a random sample of size 15 (X1 , X2 , · · · , X15 ) to estimate θ. It is known that a good
estimator is Y =min(X1 , X2 , · · · , X15 ) . For the density f (y) we have
f (y) = 15e−15(y−θ) ,
(y ≥ θ) .
This means: if we define V = Y −θ, then V is exponentially distributed with parameter
λ = 15.
a) Show that this estimator is biased and that the bias is
1
15 .
In the following question it can be used that
E(Y 2 ) =
2
2θ
+ θ2 .
+
2
15
15
b) Determine the MSE of the estimator
23. With a computer 15 random samples of size 4 are taken from a normal distribution
with µ = 11 en σ = 1. These are used to construct 15 90%-confidence intervals (CI)
for µ, where it is assumed that one does not know µ, but σ is known.
a) Is the width (upper bound minus lower bound) of these CI’s the same?
b) Do these 15 intervals all contain the value 11? Why.
c) What is the probability that the 15 random samples will be generated in such a
way that they all will contain the value 11?
d) Consider the first random sample. What is the probability that it will be generated
in such a way that it will contain the value 12?
e) The upper bound of the third CI is the realisation of a random variable; we notate
it with R3 . Compute E(R3 ).
We construct 15 CI’s as described above, but now we have σ = 2.
f) What do you expect? Will there be more or less CI’s that will contain the value
11?
6
24. In a lake one investigates the mercury (Dutch: kwik) contamination using a random
sample of fish with size 53. A normal probability plot gives
a) Can the normal distribution be used to find a confidence interval on µ (the population mean)? Give an argument.
A statistical package gives the following output.
Summary Statistics for mercury (kwik)
Count = 53
Average = 0.524981
Variance = 0.12154
Standard deviation = 0.348625
Minimum = 0.04
Maximum = 1.33
Range = 1.29
b) Give a 95% confidence interval for µ.
c) Is the following statement correct? If one has many observations, then 95% of the
observations will be in interval from b). Give an argument.
7
25. The waiting time X in minutes at a service desk is
unknown parameter λ > 0. So the expected value µ
of size 20 was selected and the mean sample waiting
estimator for the expected value µ is X, the average of
X we have
V (X) =
exponentially distributed with
equals 1/λ. A random sample
time was 10 minutes. A good
the sample. For the variance of
µ2
1
=
.
nλ2
n
a) Construct a 95% two-sided confidence interval for the expected value µ of the
waiting time using the normal approximation.
It is also possible to construct an exact confidence interval using the χ2 -distribution.
It is known that the quantity
2nX
µ
(0.1)
has a χ2 -distribution with 2n degrees of freedom.
b) Construct the 95% confidence interval by use of (1).
Hint: first find values l and u such that
2nX
< u = 0.95.
P l<
µ
26. A manufacturer wants to know the mean of the lifetime of the batteries he produces.
He wants to have a 95% confidence interval on the expected lifetime. It is known that
the lifetime has a normal distribution with standard deviation 10 hour.
a) What should the sample size be if the width of the interval should not exceed 2?
A random sample of size 20 gives as results
20
X
i=1
x2i = 47399.2
20
X
xi = 956 .
i=1
b) Give a 95-% lower-confidence bound (of the form (l, ∞) for the expected lifetime.
Assume that the standard deviation is unknown.
Assume that the interval in b) equals (47, ∞). The manufacturer claims ’If you buy
a battery at our company, the probability that the lifetime is more than 47.0 hour is
0.95’.
c) Do you support this claim?
8
27. In a complicated production process the proportion of defective products is high (on
average 10%). One wants to investigate if the proportion has become even higher using
a random sample of size 20. The null hypothesis is rejected if the number of defect
products is equal to 4 or more.
(Remark: Answer the following questions without using the normal approximation)
a) What is the significance level (or α-error or type I error probability) of this test?
b) The sample yields 3 defective products. What is the p-value?
c) What is the type II error probability if the true proportion is 20%?
28. Consider 500g packs of butter. The weight has a normal distribution with population
mean µ = 500. One wants to test the null hypothesis H0 : µ = 500. The standard
deviation is known (but not given here yet) and a random sample is taken. The 95%
two-sided confidence interval, based on this random sample, is (501.1, 510.9). Consider
the test µ = 500 against the alternative µ 6= 500 with confidence level α = 0.05.
a) Is the null hypothesis rejected?
b) What is the acceptance region for this test?
Now assume that the standard deviation equals σ = 10 g and that the random sample
has size n = 25. In that case the acceptance region for the mentioned test (with
confidence level 0.05) is (496.08, 503.92).
c) What is the power for the test if the true value of µ equals 495?
d) What sample size do we need to have a power of 0.90 if µ = 495?
29. The number of orders a company has each week can be modeled with a Poisson process
with expected value µ. Until now it is assumed that µ equals 2. One wants to test
if the number of orders is larger than it used to be and one wants to have a strong
conclusion. One decides to use the number of orders during 5 weeks to test the null
hypothesis H0 : µ = 2. The null hypothesis is rejected if the total number is 16 or more.
Remark: the sum of n independent Poisson distributed random variables with expected
value µ is also Poisson distributed (with expected value n · µ).
a) What is the significance level (or α-error or type I error probability) of this test?
b) What is the type II error probability if the true proportion of µ is 3?
c) The number of orders that the company got in the 5 weeks was 14. What is the
p-value?
30. The weight (x) of 110 randomly chosen men is measured. They are classified as follows:
Weight (x) in kg
x ≤ 60
60 < x ≤ 70
70 < x ≤ 80
80 < x ≤ 90
x > 90
Number of men in this class
10
20
46
27
7
One wants to test if the weight of men is normally distributed with σ = 10. The
expected value of the normal distribution is estimated with the sample. The sample
mean is 76.
9
a) What is the probability distribution of the test statistic (give also the parameter(s)
of the distribution)?
b) What is the expected number of men with a weight less or equal than 60 kg?
c) The value of the test statistic is 3.52. Is the null hypothesis rejected (α = 0.05)?
31. A company sells vegetables in cans. Some of the cans do not meet the specifications.
There are several reasons for this: a stain(1), a dint(2), the eye to open it is on the
wrong place(3), there is no eye to open it (4), other (5). We call these deviations. One
takes a random sample and classifies the wrong scans.
(1)
89
(2)
145
(3)
58
(4)
54
(5)
29
Sum
375
One wants to investigate if the number of deviations for the 5 classes follow the pattern
2:3:2:2:1.
a) Formulate the null hypothesis in terms of probabilities and compute the number of
expected cans for each class when the null hypothesis is true.
b) The value of the test statistic is 23.7. Is the null hypothesis rejected (α = 0.05)?
Why?
c) Give a 95% two-sided confidence interval for the proportion of tins with a dint.
32. Consider crates with 20 bottles with beer. Define X as the number of bottles that
contain not enough beer. The number of bottles with not enough beer is counted for
50 crates and the results are given below.
Value of X
Number of crates
0
15
1
22
2
11
3
2
a) One wants to investigate if the binomial model is a good model for the number of
bottles with not enough beer. Estimate the expected number of crates in class 0
(so all bottles contain enough beer) if the binomial model is a good model.
b) The value of the test statistic is 2.09. Give the critical value for a confidence level
of α = 0.05? Is the null hypothesis rejected? Why (not)?
33. A company operates four machines in three shifts. From production records the following data on the number of breakdowns are collected.
Shift
1
2
3
A
41
31
15
Machines
B C
20 12
11 9
17 16
D
16
14
10
One wants to test the null hypothesis that the breakdowns are independent of the shift.
a) Compute the expected number of breakdowns for machine B for shift 2 if the
hypothesis is true.
10
b) The value of the test statistic is 11.65. Is the null hypothesis rejected? Why (not)?
Use α = 0.05.
34. One adds a certain ingredient to an laundry detergent to try to improve the effect of
the detergent. At random 10 dirty overalls are chosen and cut in two parts. One part
is washed with the detergent with the ingredient and the other part is washed with the
detergent without the ingredient. One measures in some unit how clean the (parts of)
the overalls are. The observations are
Overall
Detergent without (’OUD’)
Detergent with (’NIEUW’)
1
5
7
2
5
9
3
12
17
4
29
36
5
10
8
6
33
40
7
33
29
8
17
27
9
2
5
10
8
19
With a statistical package two analyses are done (one for paired observations and one
for independent observations). The results are given below
a1) Which of the two methods (paired or independent) should be applied here for the
analysis? Explain your choice.
a2) Explain the difference in p-values for the two methods.
b) Test ONE-SIDED the null hypothesis that the new detergent (with the ingredient)
is better. Use α = 0.05.
11
35. Consider simple linear regression and the data.
x
y
−2 −5
−1
0
0 −1
1
3
2
3
a) Give the model and assumptions.
b) Compute the estimates for the intercept (β0 ) and slope (β1 ) of the model.
It is given that the estimate for the variance of the error term is equal to 2.6333.
c) Give a 95% prediction interval for Y if x = 0.5.
36. One wants to investigate the relation between blood pressure Y and noise x. The
following 20 observations are available.
y
x
y
x
1
60
5
85
0
63
4
89
1
65
6
90
2
70
8
90
5
70
4
90
1
70
5
90
4
80
7
94
6
90
9
100
2
80
7
100
3
80
6
100
Here y is the increase of the blood pressure (in mm Hg) and x is the level of the noise
in decibel. A statistical package gives the following results
Regression Analysis - Linear model: Y = a + b*X
--------------------------------------------------------------------Dependent variable: pressure
Independent variable: noise
--------------------------------------------------------------------Standard
T
Parameter
Estimate
Error
Statistic
P-Value
--------------------------------------------------------------------Intercept
????????
1.9949
-5.07872
0.0001
Slope
0.174294
0.0238286
7.31447
0.0000
---------------------------------------------------------------------
a) Give the estimate for the intercept of the model.
b) Give a 95%-confidence interval for the coefficient of x in this model.
c) Someone wants to know the increase of the blood pressure if x has the value 120.
What is the answer according to the linear model. Would you give this answer if
someone poses you this question? Why (not) ?
12
37. Corrosion of iron in reinforced concrete is a problem for its sustainability. For a number
of constructions the strength y (in MPa) is measured and also the so called ’depth of
carbonification’ x (in mm) which is an important variable for corrosion. A statistical
package gives as result
Regression Analysis - Linear model: Y = a + b*X
----------------------------------------------------------------------------Dependent variable: y
Independent variable: x
----------------------------------------------------------------------------Standard
T
Parameter
Estimate
Error
Statistic
P-Value
----------------------------------------------------------------------------Intercept
27.1829
1.65135
16.4611
0.0000
Slope
-0.297561
0.0411642
???????
??????
-----------------------------------------------------------------------------
Analysis of Variance
----------------------------------------------------------------------------Source
Sum of Squares
Df Mean Square
F-Ratio
P-Value
----------------------------------------------------------------------------Model
428.615
1
???????
?????
?????
Residual
131.242
16
???????
----------------------------------------------------------------------------Total (Corr.)
559.858
17
Correlation Coefficient = -0.874974
R-squared = 76.5579 percent
R-squared (adjusted for d.f.) = 75.0928 percent
Standard Error of Est. = 2.86403
a) What is the number of observations n?
b) Compute, using the above results, the sum of squares Sxx =
c) Give the 95%-confidence interval for the intercept.
d) Test the null hypothesis β1 = 0. Use α = 0.05.
13
P
(xi − x)2 .
38. The height of a tree y (in cm) is measured during a period of 10 year. The years x are
coded with 1, 2, 3, · · · , 10. The table is (partially) given here.
yi (hoogte)
xi (jaar)
150
1
161
2
167
3
??
4
??
5
??
6
??
7
??
8
??
9
208
10
Regression analysis (based on this 10 observations) gives the following results (’hoogte’=height
and ’jaar’=year).
a) Give a two sided 95%-confidence interval for σ 2 .
b) Give an estimate of the height of the tree in the next year (year 11).
c) Give a 95% prediction interval for the height of the tree in the next year (year 11).
The next picture shows the residuals against the number of the year.
d) Use the picture to comment on the model assumptions.
What are the consequences for the prediction of the height in the next year (11)?
14
39. One wants to optimize a production process. Two factors are important: temperature
and the type of the machine. In an experiment the temperature has two values (60o C
en 70o C). There are three types of machines. An observation is done for all possible
combinations. The (coded) observations are
Temp
60
60
60
60
60
60
70
70
70
70
70
70
Machine
A
B
C
A
B
C
A
B
C
A
B
C
Waarneming
−2
0
0
−2
1
1
1
1
0
1
0
1
Formulate this problem as a regression model (including cross products between temperature and machine).
40. One wants to relate the quality of wine to a number of variables: ’Clarity’ (x1 or A),
’Aroma’ (x2 or B), ’Body’ (x3 or C), ’Flavor’ (x4 or D) and ’Oakiness’ (x5 or E);
38 samples of wine have been tested. The dependent variable y is a score which is a
measure for the quality. Consider the model
Y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5 + ε ,
with the usual assumptions for ε.
A statistical package gives the following output.
Multiple Regression Analysis Dependent variable: Quality
---------------------------------------------------------------Standard
T
Parameter
Estimate
Error
Statistic
P-Value
---------------------------------------------------------------CONSTANT
3.99686
2.23177
1.79089
0.0828
Clarity
2.33945
1.73483
1.34852
0.1870
Aroma
0.482551
0.272447
1.77117
0.0861
Body
0.273161
0.332561
0.821388
0.4175
Flavor
1.16832
0.304481
3.8371
0.0006
Oakiness
-0.68401
0.271193
-2.52223
0.0168
----------------------------------------------------------------
15
Analysis of Variance
---------------------------------------------------------------Source
Sum of Squares
Df Mean Square
F-Ratio P-Value
---------------------------------------------------------------Model
111.54
5
22.3081
16.51
??????
Residual
43.248
32
1.3515
---------------------------------------------------------------Total (Corr.
154.788
37
And for a model with only three variables
Multiple Regression Analysis
--------------------------------------------------------------Dependent variable: Quality
--------------------------------------------------------------Standard
T
Parameter
Estimate
Error
Statistic
P-Value
--------------------------------------------------------------CONSTANT
6.46719
1.33279
4.85238
0.0000
Aroma
0.58012
0.262185
2.21264
0.0337
Flavor
1.19969
0.274881
4.36441
0.0001
Oakiness
-0.602325
0.264401
-2.27807
0.0291
--------------------------------------------------------------Analysis of Variance
--------------------------------------------------------------Source
Sum of Squares
Df Mean Square
F-Ratio P-Value
--------------------------------------------------------------Model
108.935
3
36.3117
26.92 0.0000
Residual
45.8534
34
1.34863
--------------------------------------------------------------Total (Corr.)
154.788
37
a) Test the hypothesis β1 = β2 = β3 = β4 = β5 = 0.
b) Test the hypothesis β1 = β3 = 0.
See next page
16
All possible 32 models (using the 5 regressors) are considered. This gives the following
output
Regression Model Selection
Dependent variable: Quality
Independent variables:
A=Clarity
B=Aroma
C=Body
D=Flavor
E=Oakiness
Number of models fit: 32
Model Results
------------------------------------------Adjusted
Included
MSE
R-Squared R-Squared
Variables
------------------------------------------4.18347
0.0
0.0
4.18347
2.7027
0.0
A
2.14852
50.0308
48.6427
B
3.00516
30.1074
28.1659
C
1.61593
62.4174
61.3735
D
4.18347
2.7027
0.0
E
2.20886
50.0544
47.2004
AB
2.9001
34.4245
30.6773
AC
1.62128
63.3404
61.2456
AD
4.18347
5.40541
0.0
AE
2.04696
53.7151
51.0703
BC
1.51006
65.8552
63.904
BD
2.04406
53.7807
51.1396
BE
1.65123
62.6632
60.5296
CD
3.01392
31.8508
27.9566
CE
1.49874
66.1112
64.1747
DE
2.08516
54.1984
50.1571
ABC
1.53649
66.2504
63.2725
ABD
2.10258
53.8158
49.7407
ABE
1.63487
64.0893
60.9207
ACD
2.82341
37.9826
32.5105
ACE
1.45631
68.0116
65.1891
ADE
1.55192
65.9114
62.9036
BCD
1.91841
57.8612
54.143
BCE
1.34863
70.3767
67.7629
BDE
1.52707
66.4572
63.4975
CDE
1.57108
66.5054
62.4454
ABCD
1.91353
59.2046
54.2597
ABCE
1.33818
71.4708
68.0128
ABDE
1.43902
69.3209
65.6022
ACDE
1.38502
70.4721
66.893
BCDE
1.3515
72.0599
67.6943
ABCDE
------------------------------------------
17
c) If you have to choose one of these models, which model will you choose? Give
arguments.
One uses stepwise regression (where we can use all 5 variables).
d) Which variable is the first variable that will be chosen in the model? Why?
e) Must another variable be added to the model? Give arguments.
Consider the model with only the variables ’Aroma’ (x2 ), ’Flavor’ (x4 ) and ’Oakiness’
(x5 ). The output is
Multiple Regression Analysis Dependent variable: Quality
-----------------------------------Parameter
Estimate
-----------------------------------CONSTANT
6.46719
Aroma
0.58012
Flavor
1.19969
Oakiness
-0.602325
-----------------------------------Regression Results for Quality with ’Aroma’=5, ’Flavor’=5, ’Oakiness’=4.
----------------------------------------------------------------------------------Lower
Upper
Lower
Upper
Fitted Stnd. Error 95.0% CL
95.0% CL 95.0% CL
95.0% CL
Value for Forecast Forecast
Forecast for Mean
for Mean
----------------------------------------------------------------------------------?????
1.17966
?????
???????
12.5357
13.3782
-----------------------------------------------------------------------------------
f) You buy a bottle of wine with ’Aroma’= 5, ’Flavor’= 5, ’Oakiness’= 4. Give a 95%
prediction interval for the quality of the wine.
18
Download