From the Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/
© Education Queensland, 1997

Pricing Diamond Rings

Pricing diamond rings in Singapore can be viewed as an interesting exercise in statistical modelling. The price of a ring is the sum of the current market value of its gold content, a craftsmanship fee and the cost of the diamond. The price of a diamond depends upon the four Cs: caratage, cut, colour and clarity. For jewellery intended for the mass market it can be assumed that the major factor in determining price is the caratage, i.e. the size of the diamond.

On February 29, 1992 a full page ad was placed in the Singapore Straits Times newspaper. The advertisement contained pictures of diamond rings and listed their prices, the caratage of the diamond and the gold purity. The data below are for 20 carat gold ladies' rings, each mounted with a single diamond. As only the caratage of the diamond and the purity of the gold were stated in the ad, it can be assumed that the amount of gold in each ring and the craftsmanship fee varied little between rings, and that the cut, colour and clarity of the diamonds were also similar. Hence we will consider only the size of the diamond in the ring and its cost, and attempt to find the mathematical function that best fits these data.

There were 48 rings of varying designs, with the diamonds weighing from 0.12 to 0.35 carats (one carat = 0.2 gram) and priced between $223 and $1086. The data are given below.
Carats  Cost($)    Carats  Cost($)    Carats  Cost($)    Carats  Cost($)
0.17     355       0.17     353       0.21     483       0.32     919
0.16     328       0.18     438       0.15     323       0.15     298
0.17     350       0.17     318       0.18     462       0.16     339
0.18     325       0.18     419       0.28     823       0.16     338
0.25     642       0.17     346       0.16     336       0.23     595
0.16     342       0.15     315       0.20     498       0.23     553
0.15     322       0.17     350       0.23     595       0.17     345
0.19     485       0.32     918       0.29     860       0.33     945
0.12     223       0.25     655       0.26     663       0.35    1086
0.25     750       0.18     443       0.27     720       0.25     678
0.18     468       0.25     675       0.16     345       0.15     287
0.17     352       0.26     693       0.16     332       0.15     316

1. Find a linear model for this data. Discuss. What is the y-intercept of your model? Is this sensible? Does it help if we force the data through the origin?

References
Singfat Chu (1996). Diamond Ring Pricing Using Linear Regression, Journal of Statistics Education, v.4, n.3.
http://exploringdata.cqu.edu.au/dia_asn.htm

Solution:

X = size of the diamond (carats) = independent variable, also known as the predictor variable (given the size, we can predict the price).
Y = price of the diamond ring ($) = dependent variable, also known as the response variable (the price depends on the size).

[Figure: scatter diagram of price (vertical axis, 200 to 1100) against carat (horizontal axis, 0.10 to 0.35).]

The scatter diagram shows that the price increases with the size (carat) of the diamond, and that the trend is roughly linear. On this basis we decide to fit a linear equation to the data.

Our regression model is

y = A + Bx + ε

where ε is the error term. The values of A and B are unknown. We estimate them from the sample data using the least squares method.
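As an illustration of the least squares step, the following minimal Python sketch computes the estimates of A and B from the 48 rings in the data table using the usual closed-form formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The names rings and least_squares are chosen here for illustration only and are not part of the original handout; the solution itself uses Minitab.

```python
# Carat weight and price ($) for the 48 rings, transcribed from the data table.
# Each tuple is (carats, cost).
rings = [
    (0.17, 355), (0.17, 353), (0.21, 483), (0.32, 919),
    (0.16, 328), (0.18, 438), (0.15, 323), (0.15, 298),
    (0.17, 350), (0.17, 318), (0.18, 462), (0.16, 339),
    (0.18, 325), (0.18, 419), (0.28, 823), (0.16, 338),
    (0.25, 642), (0.17, 346), (0.16, 336), (0.23, 595),
    (0.16, 342), (0.15, 315), (0.20, 498), (0.23, 553),
    (0.15, 322), (0.17, 350), (0.23, 595), (0.17, 345),
    (0.19, 485), (0.32, 918), (0.29, 860), (0.33, 945),
    (0.12, 223), (0.25, 655), (0.26, 663), (0.35, 1086),
    (0.25, 750), (0.18, 443), (0.27, 720), (0.25, 678),
    (0.18, 468), (0.25, 675), (0.16, 345), (0.15, 287),
    (0.17, 352), (0.26, 693), (0.16, 332), (0.15, 316),
]


def least_squares(points):
    """Return (a, b, r_squared) for the line y = a + b*x fitted by least squares."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Corrected sums of squares and cross-products.
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    b = s_xy / s_xx              # estimated slope
    a = y_bar - b * x_bar        # estimated intercept
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return a, b, r_squared


a, b, r_squared = least_squares(rings)
print(f"price = {a:.2f} + {b:.2f} carat, R-sq = {100 * r_squared:.1f}%")
```

The printed coefficients should agree, up to rounding, with the Minitab output reported later in the solution.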
In the least squares method we assume that ε ~ N(0, σe). Let a and b be the sample estimates of A and B; then the fitted regression model is

ŷ = a + bx

Regression Analysis (from Minitab)

The regression equation is
price = - 260 + 3721 carat

Predictor      Coef   StDev       T      P
Constant    -259.63   17.32  -14.99  0.000
carat       3721.02   81.79   45.50  0.000

S = 31.84   R-Sq = 97.8%   R-Sq(adj) = 97.8%

Thus a = -260 and b = 3721.

The p-value for H0: A = 0 is very small, so we reject H0 and conclude that the intercept A plays a significant role in the regression model.

The p-value for H0: B = 0 is very small, so we reject H0 and conclude that the slope B plays a significant role in the regression model.

The value R-Sq = 97.8% means that about 98% of the variation in price is explained by the model ŷ = a + bx, and the remaining 2% is due to the error terms.

Thus we conclude that this is an adequate model for the given dataset.

Further analysis (for STAT 315)

[Figure: Residuals Versus the Fitted Values (response is price); residual (vertical axis, -100 to 100) against fitted value (horizontal axis, 200 to 1100).]

The graph above shows that the residuals are randomly scattered around 0, hence the assumption of randomness of the error terms is valid.

[Figure: Normal Probability Plot of the Residuals (response is price); normal score (vertical axis, -2 to 2) against residual (horizontal axis, -100 to 100).]

The graph above shows that the residuals are approximately normally distributed, which supports the normality assumption about the error terms.
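The StDev, T and S entries of the Minitab table can be checked by hand. The sketch below is one way to do so in plain Python, assuming the standard simple linear regression formulas: S = sqrt(SSE / (n - 2)), standard error of the slope = S / sqrt(Sxx), standard error of the intercept = S * sqrt(1/n + x̄² / Sxx), and T = Coef / StDev. The variable names are illustrative and do not come from the handout. The residuals it computes are the same quantities shown in the two residual plots.

```python
import math

# Carat weight and price ($) for the 48 rings, transcribed from the data table.
rings = [
    (0.17, 355), (0.17, 353), (0.21, 483), (0.32, 919),
    (0.16, 328), (0.18, 438), (0.15, 323), (0.15, 298),
    (0.17, 350), (0.17, 318), (0.18, 462), (0.16, 339),
    (0.18, 325), (0.18, 419), (0.28, 823), (0.16, 338),
    (0.25, 642), (0.17, 346), (0.16, 336), (0.23, 595),
    (0.16, 342), (0.15, 315), (0.20, 498), (0.23, 553),
    (0.15, 322), (0.17, 350), (0.23, 595), (0.17, 345),
    (0.19, 485), (0.32, 918), (0.29, 860), (0.33, 945),
    (0.12, 223), (0.25, 655), (0.26, 663), (0.35, 1086),
    (0.25, 750), (0.18, 443), (0.27, 720), (0.25, 678),
    (0.18, 468), (0.25, 675), (0.16, 345), (0.15, 287),
    (0.17, 352), (0.26, 693), (0.16, 332), (0.15, 316),
]

n = len(rings)
x_bar = sum(x for x, _ in rings) / n
y_bar = sum(y for _, y in rings) / n
s_xx = sum((x - x_bar) ** 2 for x, _ in rings)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in rings)

# Least squares estimates of the slope and intercept.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residuals: observed price minus fitted price for each ring.
residuals = [y - (a + b * x) for x, y in rings]
sse = sum(e ** 2 for e in residuals)

# Residual standard deviation, using n - 2 degrees of freedom.
s = math.sqrt(sse / (n - 2))

# Standard errors and t-ratios of the two coefficients.
se_b = s / math.sqrt(s_xx)
se_a = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)
t_a = a / se_a
t_b = b / se_b

print(f"S = {s:.2f}")
print(f"Constant: Coef = {a:.2f}, StDev = {se_a:.2f}, T = {t_a:.2f}")
print(f"carat:    Coef = {b:.2f}, StDev = {se_b:.2f}, T = {t_b:.2f}")
print(f"largest absolute residual = {max(abs(e) for e in residuals):.1f}")
```

Both t-ratios are far from zero, which is why the corresponding p-values in the Minitab output are reported as 0.000.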