LinReg2

From the Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/
© Education Queensland, 1997
Pricing Diamond Rings
Pricing diamond rings in Singapore can be viewed as an interesting exercise in statistical modelling. The price of a ring
is the sum of the current market value of its gold content, a craftsmanship fee and the cost of the diamond. The price of a
diamond depends upon the four Cs: caratage, cut, colour and clarity. For jewelry intended for the mass market, it can be
assumed that the major factor in determining price is the caratage, i.e. the size of the diamond.
On February 29, 1992 a full-page ad was placed in the Singapore Straits Times newspaper. The advertisement contained
pictures of diamond rings and listed their prices, the caratage of the diamond and the gold purity. The data below are for
ladies' rings in 20 carat gold, each mounted with a single diamond.
As only the caratage of the diamond and the purity of the gold were stated in the ad, it can be assumed that the amount of
gold in each ring and the craftsmanship fee varied little between rings, and that the cut, colour and clarity of the
diamonds were also similar. Hence we will consider only the size of the diamond in each ring and the price of the ring,
and attempt to find the mathematical function that best fits these data.
There were 48 rings of varying designs, with the diamonds weighing from 0.12 to 0.35 carats (one carat = 0.2 gram) and
priced between $223 and $1086. The data are given below.
Carats  Cost($)    Carats  Cost($)    Carats  Cost($)    Carats  Cost($)
0.17    355        0.16    336        0.17    353        0.23    595
0.16    328        0.20    498        0.18    438        0.23    553
0.17    350        0.23    595        0.17    318        0.17    345
0.18    325        0.29    860        0.18    419        0.33    945
0.25    642        0.12    223        0.17    346        0.25    655
0.16    342        0.26    663        0.15    315        0.35    1086
0.15    322        0.25    750        0.17    350        0.18    443
0.19    485        0.27    720        0.32    918        0.25    678
0.21    483        0.18    468        0.32    919        0.25    675
0.15    323        0.16    345        0.15    298        0.15    287
0.18    462        0.17    352        0.16    339        0.26    693
0.28    823        0.16    332        0.16    338        0.15    316
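To work with the data programmatically, the 48 pairs can be entered as two Python lists. The sketch below is illustrative only: the names `CARATS`, `PRICES` and `summarize` are our own, not part of the original exercise. It checks the figures quoted in the text (48 rings, 0.12 to 0.35 carats, $223 to $1086).

```python
# The 48 (carat, price) pairs from the data table; the order of the rings is irrelevant.
CARATS = [
    0.17, 0.16, 0.17, 0.18, 0.25, 0.16, 0.15, 0.19, 0.21, 0.15, 0.18, 0.28,
    0.16, 0.20, 0.23, 0.29, 0.12, 0.26, 0.25, 0.27, 0.18, 0.16, 0.17, 0.16,
    0.17, 0.18, 0.17, 0.18, 0.17, 0.15, 0.17, 0.32, 0.32, 0.15, 0.16, 0.16,
    0.23, 0.23, 0.17, 0.33, 0.25, 0.35, 0.18, 0.25, 0.25, 0.15, 0.26, 0.15,
]
PRICES = [
    355, 328, 350, 325, 642, 342, 322, 485, 483, 323, 462, 823,
    336, 498, 595, 860, 223, 663, 750, 720, 468, 345, 352, 332,
    353, 438, 318, 419, 346, 315, 350, 918, 919, 298, 339, 338,
    595, 553, 345, 945, 655, 1086, 443, 678, 675, 287, 693, 316,
]


def summarize(carats, prices):
    """Return the sample size and the ranges of diamond size and ring price."""
    if len(carats) != len(prices):
        raise ValueError("each ring needs both a size and a price")
    return {
        "n": len(carats),
        "carat_range": (min(carats), max(carats)),
        "price_range": (min(prices), max(prices)),
    }


summary = summarize(CARATS, PRICES)
print(summary)
# {'n': 48, 'carat_range': (0.12, 0.35), 'price_range': (223, 1086)}
```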
Find a linear model for these data and discuss it. What is the y-intercept of your
model? Is this sensible? Does it help if we force the regression line through the origin?
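For the last question, a minimal sketch (assuming plain Python and illustrative helper names) fits both lines and compares their residual sums of squares (SSE). Forcing the line through the origin gives the closed-form slope b = Σxy / Σx². Because the model with an intercept includes the through-origin line as a special case, its SSE can never be larger; how much worse the constrained fit is, and whether its residuals show a pattern, is what the discussion should weigh.

```python
# The 48 (carat, price) pairs from the data table; the order of the rings is irrelevant.
CARATS = [
    0.17, 0.16, 0.17, 0.18, 0.25, 0.16, 0.15, 0.19, 0.21, 0.15, 0.18, 0.28,
    0.16, 0.20, 0.23, 0.29, 0.12, 0.26, 0.25, 0.27, 0.18, 0.16, 0.17, 0.16,
    0.17, 0.18, 0.17, 0.18, 0.17, 0.15, 0.17, 0.32, 0.32, 0.15, 0.16, 0.16,
    0.23, 0.23, 0.17, 0.33, 0.25, 0.35, 0.18, 0.25, 0.25, 0.15, 0.26, 0.15,
]
PRICES = [
    355, 328, 350, 325, 642, 342, 322, 485, 483, 323, 462, 823,
    336, 498, 595, 860, 223, 663, 750, 720, 468, 345, 352, 332,
    353, 438, 318, 419, 346, 315, 350, 918, 919, 298, 339, 338,
    595, 553, 345, 945, 655, 1086, 443, 678, 675, 287, 693, 316,
]


def fit_with_intercept(x, y):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def fit_through_origin(x, y):
    """Least squares fit of y = b*x with no intercept: b = sum(x*y) / sum(x^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def sse(x, y, a, b):
    """Residual sum of squares of the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


a, b = fit_with_intercept(CARATS, PRICES)
b0 = fit_through_origin(CARATS, PRICES)
sse_full = sse(CARATS, PRICES, a, b)
sse_origin = sse(CARATS, PRICES, 0.0, b0)
print(f"with intercept: price = {a:.2f} + {b:.2f} carat, SSE = {sse_full:.0f}")
print(f"through origin: price = {b0:.2f} carat, SSE = {sse_origin:.0f}")
```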
References
Singfat Chu (1996). Diamond Ring Pricing Using Linear Regression. Journal of Statistics Education, v.4, n.3.
http://exploringdata.cqu.edu.au/dia_asn.htm
Solution:
X = size of the diamond (carats) = independent (predictor) variable: given the size, we can predict the price.
Y = price of the ring ($) = dependent (response) variable: the price depends on the size.
The scatter diagram of price against carat for the 48 rings is shown below.
[Figure: scatter diagram of price ($) against carat]
The scatter diagram shows that the price of a ring increases as the size (carat) of its diamond
increases, and the trend looks linear. Based on the scatter diagram, we decide to fit a linear
equation to the data.
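The strength of the linear trend can be put into a number with the Pearson correlation coefficient r = Sxy / √(Sxx·Syy). A minimal Python sketch, with an illustrative function name: a value of r close to +1 supports fitting a straight line, and in simple linear regression r² equals the R-Sq that Minitab reports.

```python
import math

# The 48 (carat, price) pairs from the data table; the order of the rings is irrelevant.
CARATS = [
    0.17, 0.16, 0.17, 0.18, 0.25, 0.16, 0.15, 0.19, 0.21, 0.15, 0.18, 0.28,
    0.16, 0.20, 0.23, 0.29, 0.12, 0.26, 0.25, 0.27, 0.18, 0.16, 0.17, 0.16,
    0.17, 0.18, 0.17, 0.18, 0.17, 0.15, 0.17, 0.32, 0.32, 0.15, 0.16, 0.16,
    0.23, 0.23, 0.17, 0.33, 0.25, 0.35, 0.18, 0.25, 0.25, 0.15, 0.26, 0.15,
]
PRICES = [
    355, 328, 350, 325, 642, 342, 322, 485, 483, 323, 462, 823,
    336, 498, 595, 860, 223, 663, 750, 720, 468, 345, 352, 332,
    353, 438, 318, 419, 346, 315, 350, 918, 919, 298, 339, 338,
    595, 553, 345, 945, 655, 1086, 443, 678, 675, 287, 693, 316,
]


def pearson_r(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(CARATS, PRICES)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```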
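Our regression model is y = A + Bx + ε, where ε is the error term.
The values of A and B are unknown. We estimate them from the sample data using the least
squares method, which chooses the line that minimizes the sum of squared deviations Σ(y - ŷ)².
For the tests that follow, we also assume that the errors are independent with ε ~ N(0, σe).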
Let a and b be the sample estimates of A and B; the fitted regression equation is then
ŷ = a + bx
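The least squares estimates have a closed form: b = Sxy / Sxx and a = ȳ - b·x̄, where Sxx = Σ(x - x̄)² and Sxy = Σ(x - x̄)(y - ȳ). The sketch below is a plain-Python illustration of those formulas (the function name is our own), not Minitab's implementation; it should reproduce the coefficients in the Minitab output, a ≈ -259.63 and b ≈ 3721.02.

```python
# The 48 (carat, price) pairs from the data table; the order of the rings is irrelevant.
CARATS = [
    0.17, 0.16, 0.17, 0.18, 0.25, 0.16, 0.15, 0.19, 0.21, 0.15, 0.18, 0.28,
    0.16, 0.20, 0.23, 0.29, 0.12, 0.26, 0.25, 0.27, 0.18, 0.16, 0.17, 0.16,
    0.17, 0.18, 0.17, 0.18, 0.17, 0.15, 0.17, 0.32, 0.32, 0.15, 0.16, 0.16,
    0.23, 0.23, 0.17, 0.33, 0.25, 0.35, 0.18, 0.25, 0.25, 0.15, 0.26, 0.15,
]
PRICES = [
    355, 328, 350, 325, 642, 342, 322, 485, 483, 323, 462, 823,
    336, 498, 595, 860, 223, 663, 750, 720, 468, 345, 352, 332,
    353, 438, 318, 419, 346, 315, 350, 918, 919, 298, 339, 338,
    595, 553, 345, 945, 655, 1086, 443, 678, 675, 287, 693, 316,
]


def least_squares(x, y):
    """Least squares estimates (a, b) for y-hat = a + b*x.

    b = Sxy / Sxx and a = y_bar - b * x_bar, where
    Sxx = sum((x - x_bar)^2) and Sxy = sum((x - x_bar) * (y - y_bar)).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(CARATS, PRICES)
print(f"price = {a:.2f} + {b:.2f} carat")
```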
Regression Analysis (from Minitab)

The regression equation is
price = - 260 + 3721 carat

Predictor       Coef    StDev       T      P
Constant     -259.63    17.32  -14.99  0.000
carat        3721.02    81.79   45.50  0.000

S = 31.84    R-Sq = 97.8%    R-Sq(adj) = 97.8%
a = -259.63 ≈ -260, b = 3721.02 ≈ 3721
The p-value for H0: A = 0 is very small (0.000 to three decimal places), so we reject H0 and
conclude that the intercept A is significantly different from zero.
The p-value for H0: B = 0 is also very small, so we reject H0 and conclude that B is
significantly different from zero, i.e. carat is a significant predictor of price.
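The StDev and T columns of the Minitab output can be reproduced from the standard formulas, assuming the N(0, σe) error model: s = √(SSE/(n - 2)) estimates σe, SE(b) = s/√Sxx, SE(a) = s·√(1/n + x̄²/Sxx), and each T value is the coefficient divided by its standard error, referred to a t distribution with n - 2 = 46 degrees of freedom. This plain-Python sketch (names illustrative) stops at the t statistics; the P column would additionally need the t distribution's tail probabilities, for example from `scipy.stats.t.sf`.

```python
import math

# The 48 (carat, price) pairs from the data table; the order of the rings is irrelevant.
CARATS = [
    0.17, 0.16, 0.17, 0.18, 0.25, 0.16, 0.15, 0.19, 0.21, 0.15, 0.18, 0.28,
    0.16, 0.20, 0.23, 0.29, 0.12, 0.26, 0.25, 0.27, 0.18, 0.16, 0.17, 0.16,
    0.17, 0.18, 0.17, 0.18, 0.17, 0.15, 0.17, 0.32, 0.32, 0.15, 0.16, 0.16,
    0.23, 0.23, 0.17, 0.33, 0.25, 0.35, 0.18, 0.25, 0.25, 0.15, 0.26, 0.15,
]
PRICES = [
    355, 328, 350, 325, 642, 342, 322, 485, 483, 323, 462, 823,
    336, 498, 595, 860, 223, 663, 750, 720, 468, 345, 352, 332,
    353, 438, 318, 419, 346, 315, 350, 918, 919, 298, 339, 338,
    595, 553, 345, 945, 655, 1086, 443, 678, 675, 287, 693, 316,
]


def coefficient_table(x, y):
    """Return S and (coef, standard error, t) for the intercept and the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # estimate of sigma_e on n - 2 degrees of freedom
    se_b = s / math.sqrt(sxx)
    se_a = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return {"S": s, "Constant": (a, se_a, a / se_a), "carat": (b, se_b, b / se_b)}


table = coefficient_table(CARATS, PRICES)
print(f"S = {table['S']:.2f}")
for name in ("Constant", "carat"):
    coef, se, t = table[name]
    print(f"{name:<9}{coef:>10.2f}{se:>8.2f}{t:>8.2f}")
```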
R-Sq = 97.8% means that about 98% of the variation in price is explained by the linear
relationship with carat (ŷ = a + bx); the remaining 2% is unexplained (error) variation.
Thus we conclude that this is an adequate model for the given data.
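R-Sq comes from splitting the total variation into explained and unexplained parts: SST = Σ(y - ȳ)², SSE = Σ(y - ŷ)², SSR = SST - SSE and R² = SSR/SST. The adjusted value divides each sum of squares by its degrees of freedom, R²(adj) = 1 - [SSE/(n - 2)] / [SST/(n - 1)]. A minimal sketch with an illustrative function name:

```python
# The 48 (carat, price) pairs from the data table; the order of the rings is irrelevant.
CARATS = [
    0.17, 0.16, 0.17, 0.18, 0.25, 0.16, 0.15, 0.19, 0.21, 0.15, 0.18, 0.28,
    0.16, 0.20, 0.23, 0.29, 0.12, 0.26, 0.25, 0.27, 0.18, 0.16, 0.17, 0.16,
    0.17, 0.18, 0.17, 0.18, 0.17, 0.15, 0.17, 0.32, 0.32, 0.15, 0.16, 0.16,
    0.23, 0.23, 0.17, 0.33, 0.25, 0.35, 0.18, 0.25, 0.25, 0.15, 0.26, 0.15,
]
PRICES = [
    355, 328, 350, 325, 642, 342, 322, 485, 483, 323, 462, 823,
    336, 498, 595, 860, 223, 663, 750, 720, 468, 345, 352, 332,
    353, 438, 318, 419, 346, 315, 350, 918, 919, 298, 339, 338,
    595, 553, 345, 945, 655, 1086, 443, 678, 675, 287, 693, 316,
]


def r_squared(x, y):
    """Return (R^2, adjusted R^2) for the least squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # total variation
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    ssr = sst - sse                                              # explained by the line
    r2 = ssr / sst
    r2_adj = 1 - (sse / (n - 2)) / (sst / (n - 1))
    return r2, r2_adj


r2, r2_adj = r_squared(CARATS, PRICES)
print(f"R-Sq = {100 * r2:.1f}%, R-Sq(adj) = {100 * r2_adj:.1f}%")
```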
Further analysis (for STAT 315)
[Figure: Residuals Versus the Fitted Values (response is price): residual against fitted value]
The residual plot shows the residuals scattered randomly around 0 with no visible pattern,
so the assumption that the error terms are random (independent, with constant variance) appears reasonable.
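The points in the residual plot are the pairs (ŷ, e) with ŷ = a + bx and e = y - ŷ. The sketch below (illustrative names; plotting is left out to avoid depending on a library such as matplotlib) computes them. Note that Σe = 0 and Σe·ŷ = 0 hold automatically for least squares with an intercept, so those sums cannot validate the model; it is the absence of any pattern in the plot that supports the assumptions.

```python
# The 48 (carat, price) pairs from the data table; the order of the rings is irrelevant.
CARATS = [
    0.17, 0.16, 0.17, 0.18, 0.25, 0.16, 0.15, 0.19, 0.21, 0.15, 0.18, 0.28,
    0.16, 0.20, 0.23, 0.29, 0.12, 0.26, 0.25, 0.27, 0.18, 0.16, 0.17, 0.16,
    0.17, 0.18, 0.17, 0.18, 0.17, 0.15, 0.17, 0.32, 0.32, 0.15, 0.16, 0.16,
    0.23, 0.23, 0.17, 0.33, 0.25, 0.35, 0.18, 0.25, 0.25, 0.15, 0.26, 0.15,
]
PRICES = [
    355, 328, 350, 325, 642, 342, 322, 485, 483, 323, 462, 823,
    336, 498, 595, 860, 223, 663, 750, 720, 468, 345, 352, 332,
    353, 438, 318, 419, 346, 315, 350, 918, 919, 298, 339, 338,
    595, 553, 345, 945, 655, 1086, 443, 678, 675, 287, 693, 316,
]


def fitted_and_residuals(x, y):
    """Return (fitted values, residuals) for the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid


fitted, resid = fitted_and_residuals(CARATS, PRICES)
# Both sums are zero up to rounding error, by construction of least squares.
print(f"sum of residuals: {sum(resid):.2e}")
print(f"sum of residual * fitted: {sum(e * f for e, f in zip(resid, fitted)):.2e}")
# The (fitted, residual) pairs, ordered by fitted value, are what the plot shows.
for f, e in sorted(zip(fitted, resid))[:5]:
    print(f"{f:8.1f} {e:8.1f}")
```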
[Figure: Normal Probability Plot of the Residuals (response is price): normal score against residual]
In the normal probability plot the points lie close to a straight line, so the residuals are
approximately normal, which supports the normality assumption about the error terms.
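A normal probability plot pairs the sorted residuals with the expected values of standard normal order statistics. The sketch below approximates those normal scores with Blom's plotting positions (i - 3/8)/(n + 1/4); this choice is our assumption, and Minitab's exact convention may differ. The correlation between sorted residuals and normal scores summarizes the plot: the closer it is to 1, the straighter the line. Names are illustrative.

```python
import math
from statistics import NormalDist

# The 48 (carat, price) pairs from the data table; the order of the rings is irrelevant.
CARATS = [
    0.17, 0.16, 0.17, 0.18, 0.25, 0.16, 0.15, 0.19, 0.21, 0.15, 0.18, 0.28,
    0.16, 0.20, 0.23, 0.29, 0.12, 0.26, 0.25, 0.27, 0.18, 0.16, 0.17, 0.16,
    0.17, 0.18, 0.17, 0.18, 0.17, 0.15, 0.17, 0.32, 0.32, 0.15, 0.16, 0.16,
    0.23, 0.23, 0.17, 0.33, 0.25, 0.35, 0.18, 0.25, 0.25, 0.15, 0.26, 0.15,
]
PRICES = [
    355, 328, 350, 325, 642, 342, 322, 485, 483, 323, 462, 823,
    336, 498, 595, 860, 223, 663, 750, 720, 468, 345, 352, 332,
    353, 438, 318, 419, 346, 315, 350, 918, 919, 298, 339, 338,
    595, 553, 345, 945, 655, 1086, 443, 678, 675, 287, 693, 316,
]


def least_squares_residuals(x, y):
    """Residuals y - (a + b*x) of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def normal_scores(n):
    """Approximate expected standard normal order statistics (Blom's positions)."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    svv = sum((vi - v_bar) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)


ordered = sorted(least_squares_residuals(CARATS, PRICES))
scores = normal_scores(len(ordered))
r_plot = correlation(ordered, scores)
print(f"probability plot correlation = {r_plot:.3f}")
```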