Studies in Hedonic Resale Housing Price Indexes∗

Wenzheng Li, Statistics Canada
Marc Prud’homme, Statistics Canada
Kam Yu, Lakehead University

May 17, 2006

∗ Paper presented to the Canadian Economic Association 40th Annual Meetings, May 26–28, 2006, Concordia University, Montréal.

Abstract

Hedonic analysis is gaining acceptance as a tool for quality adjustment of goods and services in official statistics and academic research. Computers and houses are by far the two most popular products on which hedonic research has concentrated. Few comparative studies have, however, looked at the sensitivity of the price indexes obtained when different regression approaches (e.g., pooled regression, adjacent period regression, and separate regression) are applied to the data. This paper attempts to establish how sensitive the hedonic results are to the regression approach used. Furthermore, the paper explores how functional forms affect results. The data used are prices for resale houses in the Ottawa area from the Multiple Listing Service (MLS) for the period 1996 to 2005. The database contains a large number of explanatory variables and observations.

1 Introduction

The main reason to conduct research on Resale Housing Price Indexes (RHPI) is to test whether new and resale housing prices show similar movements. The New Housing Price Index (NHPI) series plays an important role in the consumer price index (CPI) and the national accounts. For the CPI, the NHPI (exclusive of land) is used in the calculation of the replacement cost index and the index of homeowners’ insurance for owned accommodation; the NHPI (including land) is used to compute mortgage interest for owned accommodation. For the National Accounts, the NHPI (excluding land) is used to deflate the value of residential building construction and the value of the national housing stock.
The matched model index used in the NHPI, however, suffers from small sample sizes and from the difficulty of properly adjusting for quality change. Also, the NHPI covers only new houses, which are only a part of the housing market. The RHPI samples both new and resale houses and is therefore more comprehensive. The hedonic approach has been shown to be a better method for computing price indexes that allow for quality change. Due to the difficulty of separating land value from house prices, the hedonic RHPI obtained in this paper can only be compared with the NHPI including land value. Also, the RHPI cannot replace the NHPI within Statistics Canada because the NHPI is used as a deflator for national accounts purposes. A hedonic RHPI can, however, be used to test whether the NHPI properly tracks new house price movements and to provide some methodological support for computing a future quality-adjusted NHPI. Our approach is an extension of previous work carried out in Prices Division, Statistics Canada (see Prud’homme et al., 2004).

2 Overview of Three Approaches

Quality change is one of the well-known issues in constructing price indexes. For products with physical specifications and characteristics that change frequently, a quality adjustment procedure is necessary to avoid biases. The common method used by statistical agencies is the matched model method. Studies have shown that matched-model indexes often miss price changes when new models of products are introduced. A hedonic analysis, on the other hand, can theoretically capture the pure price change in the presence of quality changes. Here we briefly discuss three commonly used methods in housing price measurement.

2.1 Average and Median Sales Price Approach

The change in the median (and sometimes the average) sale price of existing homes is often used to measure price changes. These statistics are often cited because they are easily understood and available for most geographical areas in Canada through regular releases by an area real estate board.
Such statistics are, however, misleading indicators of house price appreciation. The average price treats housing as a homogeneous product and ignores any quality changes. The resulting bias in the price index can be more severe than that of the matched-model index. The underlying index formula, which is effectively the ratio of the arithmetic means, is the so-called Dutot index. This index is known to be sensitive to units of measurement and quality changes (see Diewert, 2004). The advantage of the median price is that it is robust to outliers resulting from measurement errors. For example, the median price remains unchanged even when prices in the upper quantile or the lower quantile experience large changes, as long as the ranking of the median-priced house is unaffected. This property, however, becomes problematic in situations where there are large price movements in either quantile, because the median is unable to reflect them.

2.2 Repeat Sales Approach

The matched model method that is frequently applied to merchandise goods is often called the repeat sales approach in the housing market. It estimates price trends from transactions for properties that have been sold more than once over the sample period. The idea was originally proposed by Bailey et al. (1963).

The main advantage of the repeat sales approach is reproducibility, i.e., different statisticians given the same data on the sales of housing units will come up with the same estimate of quality-adjusted price change. The main disadvantage of the repeat sales approach is that it does not use all of the available information on housing unit sales; it uses only information on housing units that sold more than once during the sample period. Second, it cannot deal adequately with depreciation of the housing structure. Third, it cannot deal adequately with housing units that have undergone major repairs or renovations.
Finally, it does not allow for changes in the implicit price of particular housing attributes over time. In fact, it is likely that each attribute has its own price determined by the demand for and supply of that attribute (see Diewert, 2003a, p. 38). The repeat sales approach is also subject to sample selectivity bias: houses sold repeatedly tend to be of inferior quality. In other words, they are not representative of the entire population of properties sold.

2.3 Hedonic Approach

The hedonic model can be considered an equilibrium model of product differentiation. The product price is assumed to be a function of a set of characteristics. When the product price is expressed as a linear function of the characteristics, the estimated coefficients of the characteristics can be interpreted as their implicit prices. The part of the overall price change from one period to another that is not accounted for by changes in characteristics is then interpreted as pure price change. Classic studies in hedonic analysis include Griliches (1961) on automobiles and Chow (1967) on computers.

Compared with the repeat sales approach, the hedonic regression model has the following advantages. First, it uses all of the information on housing sales in each sample period and not just the data that can be matched. Second, it can adjust for the effects of depreciation if the age of the structure is known at the time of sale. Third, it can adjust for the effects of renovations and repairs if expenditures on renovation and extensions are known at the time of sale (Diewert, 2003a, p. 37).

Generally, there are three ways to carry out a hedonic regression: 1) the time dummy variable method; 2) the characteristics price index method; and 3) the hedonic price imputation method. A description of these three methods is given in section 5. See Triplett (2004) for further discussion.

The U.S.
Bureau of the Census started to produce a price index for single-family homes under construction in 1968 (Moulton, 2001). It was the first hedonic index produced within the context of a regular statistical program and is still ongoing today. According to the Statistics Norway website, sub-indexes for two different housing types are calculated using the hedonic method in Norway. The housing types are 1) detached houses, semi-detached houses, row houses, linked houses and houses with three or four dwellings, and 2) apartment buildings with five or more dwellings.

3 Features of This Study

The distinguishing features of this study are:

1. MLS data provide a large number of house characteristics including living area, number of bedrooms, number of bathrooms, number of garages, number of fireplaces, number of appliances, location, age, etc. See table 1 for variable descriptions.

2. Three approaches are tested: the pooled dummy variable method, the adjacent period dummy variable method, and the characteristics price index method. In most studies, the pooled dummy variable method is used because of its simplicity and its convenience for dealing with historical data. In the presence of structural change across time, the adjacent period dummy variable method is preferred to the pooled regression. The characteristics price index method is generally regarded as an ideal method for the following reasons: 1) it uses the implicit characteristics prices to compute the price index; 2) it is not necessary to test for structural change across time; and 3) traditional price index formulae can be applied.

3. Various functional forms for hedonic regressions are compared. In most studies, only three popular functional forms are used: the linear model, the semilog model and the log-linear model. In this paper, the Box-Cox model is also tested.

The limitations of this study are as follows:

1.
Ideally, location classes should be small enough to justify a hedonic interpretation. Conniffe and Duffy (1999) argue that there are difficulties in defining and measuring the underlying causal factors for location variables. Some research has gone to great efforts to tie location to access to amenities like schools, shopping, recreational facilities etc., but with rather little success. Alternatively, a separate regression can be performed for each location, to offset the lack of information on neighbourhood environment, crime rates, convenience to schools, and other amenities. In Conniffe and Duffy (1999) the income status and stage of life cycle of the residents are measured instead of direct location variables. In this study we use location dummy variables instead of running a regression for each location.

2. House types should be classified at a more detailed level, such as two-storey detached houses, three-storey detached houses, etc., because only houses of the same type can be treated as homogeneous products. An overall housing price index can then be obtained by aggregating the price indexes for each house type. In this study, all the detached houses are pooled into one regression for convenience.

3. The MLS data do not provide any information on maintenance and renovation expenditures. Also, information on the quality of materials used in construction or on construction techniques is not available.

4. The benchmark NHPI covers the Ottawa-Gatineau region. The MLS data for Gatineau are not available. Since housing construction in Gatineau represents only a very small percentage of the Ottawa-Gatineau region, it is still appropriate to use the NHPI for Ottawa-Gatineau as a benchmark for our hedonic RHPI.

4 Data and Variables

The Ottawa Real Estate Board is a trade association of over 1800 registered brokers and salespeople in the Ottawa area.
The MLS is a co-operative marketing system used by the Board’s members to ensure maximum exposure of properties listed for sale, lease or rent on the Board’s computer system. The following list gives a brief description of the original data and the process of data construction.

1. Housing type: Only detached houses are included for the hedonic RHPI since the NHPI only covers single detached houses. Moreover, since structural differences across house types may be present, it is not appropriate to pool detached, semi-detached, row or condominium units in the same regression. In the data set, residential detached houses account for 70% of the total, semi-detached houses account for 9%, and row units account for 21%.

2. Location classification:

(a) Ottawa city: includes Downtown Core, Ottawa south, Ottawa east, Ottawa west and Far west.

(b) Inner suburb: includes Orleans, Metcalfe, Nepean, Manotick, Stittsville and Kanata.

(c) Outer suburb: includes Arnprior, Rockland, Winchester, Alexandria, Kemptville, Westport, Carleton Place, Almonte, and Carp.

Since the NHPI for Ottawa-Gatineau does not cover the outer suburb of Ottawa, the outer suburb is not included in our analysis. Farm houses in these areas should also be treated separately.

In all the models tested here, there are five location dummy variables: Ottawasouth, Ottawaeast, Ottawawest, Farwest and suburb. The variable suburb refers to the inner suburb. The variable Downtowncore is omitted since it is used as the base category. In the data set, 69% of resale detached houses are located in the inner suburb, and these share similar location characteristics. Therefore only one location dummy variable, suburb, is used for these houses. On the other hand, prices of resale detached houses located in the Ottawa city vary significantly with location. Therefore, four location dummy variables are used to distinguish these locations.
In the data set, 14% of houses are located in Ottawa south, 3% in Ottawa east, 4.% in Ottawa west, 7% in far west, 3% in downtown core, and the rest in the inner suburb.

3. Age squared: Usually house price has a negative relationship with the age of the unit. However, the relationship between house price and age may also be quadratic, since older houses may have a better location and some other advantages. Thus both age and age squared are used.

4. Continuous variables: There are 9 continuous variables, namely, living area, lot area, number of bedrooms, number of bathrooms, number of garages, number of fireplaces, number of appliances, age and age squared. All of these variables are assumed to closely follow the house price movement.

5. Dummy variables: Besides the location dummy variables, there are 13 other dummy variables for various features and environmental amenities. They are: brick exterior finish, new house, hardwood floor, natural gas as heating fuel, corner lot, cul-de-sac, shopping nearby, patio, central/built-in vacuum, pool, whirlpool bath, sauna, and air conditioning. The corresponding omitted categories for all these dummy variables are used as bases.

6. Data construction: Inspection of the data set revealed some outliers. The data are cleaned by the following procedure.

(a) Create a new data set which includes only detached houses located in the City of Ottawa and the inner suburb.

(b) Drop observations if the sold price is less than CAD $65,000 or greater than CAD $800,000.

(c) Drop observations if living area is less than 500 square feet or greater than 5000 square feet.

(d) Drop observations if lot area is below 500 square feet or greater than 40000 square feet.

(e) Drop observations if the number of bedrooms is equal to zero or greater than 10.

(f) Drop observations if the total number of bathrooms is equal to zero or greater than 7, i.e., exclude large values such as 50.
(g) Drop observations if the number of garages is greater than 7, i.e., exclude large values such as 44.

(h) Drop observations if they are old houses with the year built missing.

(i) Drop observations if they are mobile homes, since mobile homes are not representative of detached houses.

(j) Drop observations if information on exterior finish is missing.

The data span 1996 to 2005, with a total of 33,595 observations. See tables 1, 2 and 3 for the detailed description of variables and sample information.

5 Methodology

5.1 The pooled time dummy variable method

The observations for all the periods are pooled into one regression. Only the intercept is allowed to change across the periods in this regression. The coefficients for characteristics are constrained to remain the same across the periods. For example, for the semilog model,

$$\log p = \beta_0 + \sum_{j=1}^{K} \beta_j X_j + \sum_{i=2}^{t} \gamma_i D_i + \varepsilon, \tag{1}$$

where $p$ denotes the prices of a product for all the periods, $\beta_j$ measures the implicit price of characteristic $j$ in terms of the logarithm of price, and $X_j$ denotes the quantity of characteristic $j$. $D_i$ is the time dummy variable, which takes the value 1 if the transaction occurs in period $i$, and 0 otherwise. The coefficient $\gamma_i$ measures the logarithm of the price index for period $i$, with the first period as the base period. The price indexes are obtained by taking the antilog of the coefficient $\gamma_i$ of each time dummy variable.

5.2 The adjacent period time dummy variable method

Only the observations for two adjacent periods are pooled into one regression. For the adjacent periods, the model can be written as:

$$\log p = \beta_0 + \sum_{j=1}^{K} \beta_j X_j + \gamma D + \varepsilon. \tag{2}$$

The interpretation of the model is similar to that of the pooled regression. The only difference is that $p$ denotes the prices of a product for the two adjacent periods, and $D$ denotes the time dummy variable for the comparison period. The price index for each period can be obtained by taking the antilog of $\gamma$.
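To make the mechanics of equation (2) concrete, the following is a minimal sketch of an adjacent-period time dummy regression in plain Python. It is an illustration only, not the estimation code used in this study: the data are invented and noise-free, a single characteristic (living area, here in thousands of square feet) stands in for the full set, and the helper names `solve_linear_system`, `ols`, `adjacent_period_index` and `make_sale` are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def adjacent_period_index(sales):
    """Price index for period 1 relative to period 0.

    sales: list of (price, living_area, period) with period equal to 0 or 1.
    Regresses log price on a constant, living area and the time dummy,
    then returns the antilog of the time dummy coefficient.
    """
    rows = [[1.0, area, float(period)] for _, area, period in sales]
    log_prices = [math.log(price) for price, _, _ in sales]
    _, _, gamma = ols(rows, log_prices)
    return math.exp(gamma)


def make_sale(area, period):
    """Hypothetical sale generated with a true time effect of 0.05."""
    return (math.exp(11.5 + 0.4 * area + 0.05 * period), area, period)


# Houses sold in period 1 are larger on average, so the quality mix changes.
sales = [make_sale(a, 0) for a in (1.0, 1.5, 2.0, 2.5)]
sales += [make_sale(a, 1) for a in (1.2, 1.8, 2.4, 3.0)]

print(round(adjacent_period_index(sales), 4))  # about 1.0513, i.e. exp(0.05)
```

Because the invented data contain no error term, the regression recovers the time effect exactly even though the houses sold in the second period are larger; a simple ratio of average prices would mix that size difference into the measured price change.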
Between these two time dummy variable methods, the adjacent period regression is usually preferred because of the possible presence of structural change across time. In that case the assumption of parameter stability across time does not hold, so the pooled regression approach should not be used. Structural breaks usually occur under rapid changes in technology or tastes. Since house-building technology does not develop so fast, structural change may not pose a problem for running a pooled regression for housing prices. The Chow test can be used to test whether structural change is present. If the test result is above the critical value, the adjacent period regression is not justified, let alone the pooled regression for all the periods.

The dummy variable method stands apart from the traditional practice in official statistics, where price indexes are computed by “formulae”, such as Laspeyres, Paasche, Fisher, and so forth.

5.3 The characteristics price index method

The motivation for the characteristics price index method comes from the interpretation of hedonic function coefficients (Triplett, 2004). To construct the characteristics price index, a regression is carried out for each period. Both the intercept and the coefficients of characteristics are allowed to change across the periods:

$$p_t = c_{t,0} + \sum_{j=1}^{K} c_{t,j} X_{t,j} + \varepsilon_t, \tag{3}$$

where $p_t$ denotes the price in period $t$, the subscript $j$ indexes the characteristics with values $X_{t,j}$, and $c_{t,j}$ represents the implicit price of characteristic $j$ in period $t$. The intercept term $c_{t,0}$ can be interpreted as capturing a group of characteristics not included in the regression.

The Laspeyres price index for period $t$ can be written as:

$$P_L = \frac{c_{t,0} + \sum_{j=1}^{K} c_{t,j} X_{t-1,j}}{c_{t-1,0} + \sum_{j=1}^{K} c_{t-1,j} X_{t-1,j}}. \tag{4}$$

The Paasche price index for period $t$ can be written as:

$$P_P = \frac{c_{t,0} + \sum_{j=1}^{K} c_{t,j} X_{t,j}}{c_{t-1,0} + \sum_{j=1}^{K} c_{t-1,j} X_{t,j}}. \tag{5}$$

The Fisher price index for period $t$ can be written as:

$$P_F = (P_L \cdot P_P)^{1/2} = \left( \frac{c_{t,0} + \sum_{j=1}^{K} c_{t,j} X_{t-1,j}}{c_{t-1,0} + \sum_{j=1}^{K} c_{t-1,j} X_{t-1,j}} \cdot \frac{c_{t,0} + \sum_{j=1}^{K} c_{t,j} X_{t,j}}{c_{t-1,0} + \sum_{j=1}^{K} c_{t-1,j} X_{t,j}} \right)^{1/2}. \tag{6}$$

Here traditional price index formulae are combined with the characteristics index method. Triplett (2004) remarks that the price index for characteristics permits breaking the connection between hedonic functional form and index number functional form. This is a theoretical as well as a practical advantage.

5.4 The hedonic price imputation method

This method is a blend of the hedonic regression approach and the matched model approach. When the matched model breaks down, the hedonic regression can be used to impute a missing price or estimate a hedonic quality adjustment, and the matched model approach is then applied. Theoretically, this method gives the same results as the “pure” hedonic methods described above if the same data set is used (Diewert, 2003b). For this reason this method will not be employed here.

6 Functional Forms

Choosing the functional form is another important issue in hedonic studies. Typically, analysts use measures of “goodness of fit”, including $R^2$, the standard error of the regression, and so forth, to choose among functional forms.

6.1 Functional forms

The functional forms considered in the study are as follows.

1. Linear Model:

$$Y = \beta_0 + \sum_{i=1}^{K} \beta_i X_i + u.$$

2. Semilog Model:

$$\log Y = \beta_0 + \sum_{i=1}^{K} \beta_i X_i + u.$$

3. Log-linear Model:

$$\log Y = \beta_0 + \sum_{i=1}^{K} \beta_i \log X_i + u.$$

4. Box-Cox (BC) Model:

$$Y^{(\lambda)} = \beta_0 + \sum_{i=1}^{K} \beta_i X_i + u,$$

where the Box-Cox transformation is defined as

$$Y^{(\lambda)} = \begin{cases} \dfrac{Y^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\ \log Y & \text{if } \lambda = 0. \end{cases}$$

The linear, the semilog and the log-linear models are nested by the Box-Cox model. The linear model results if $\lambda$ equals 1, while a log-linear or semilog model (depending on how the $X_i$ are measured) results if $\lambda$ equals 0. If $\lambda$ equals $-1$, the equation will involve the reciprocal of $Y$. Except for the polar cases of $\lambda$ equal to $-1$, 0 or 1, it is hard to conceive of situations in which a particular value would be specified a priori (Greene, 1993, p. 239).
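As an illustration of how the Box-Cox transformation nests the special cases just listed, here is a small sketch in plain Python. The function name `box_cox` and the sample values are ours, chosen only for illustration; the sketch transforms a single positive value and does not estimate $\lambda$.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y for parameter lam."""
    if y <= 0:
        raise ValueError("the Box-Cox transformation requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


price = 200000.0  # hypothetical sale price

# lam = 1: the transform is price - 1, so the model is linear in price.
print(box_cox(price, 1))

# lam = 0: the transform is log(price), giving the semilog model
# (or the log-linear model if the regressors are also in logs).
print(box_cox(price, 0))

# lam = -1: the transform is 1 - 1/price, i.e. it involves the reciprocal.
print(box_cox(price, -1))

# As lam approaches 0 the transform approaches log(price) continuously.
print(abs(box_cox(5.0, 1e-8) - math.log(5.0)) < 1e-6)
```

The last line checks numerically that the $\lambda \neq 0$ branch converges to the logarithm as $\lambda$ goes to zero, which is why the two branches of the definition fit together.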
6.2 Choosing among functional forms

In hedonic studies, the semilog model and the log-linear model are widely used for the following reasons. First, they usually generate a better goodness of fit than the linear model, based on adjusted $R^2$. Second, their coefficients for characteristics are easier to interpret than those of the Box-Cox model.

The Box-Cox (1964) model nests the three popular functional forms and has gained popularity. Tests based on it usually reject the linear, the semilog and the log-linear functional forms (Triplett, 2004). Our test results also confirm this point. Little is known about whether the rejection of the three functional forms results from misspecified variables, omitted variables, or nonlinearity of the functional form.

In constructing a price index from the results of yearly regressions, we are in effect performing out-of-sample prediction. The $R^2$ and adjusted $R^2$ are measures of goodness of fit, and are particularly useful for evaluating the fit of the model within sample. When measuring out-of-sample goodness of fit, other measures, such as Akaike’s Information Criterion (AIC) and the Schwarz Criterion (SC), are better. The lower the AIC and SC, the better the model.

There are many circumstances in which one is forced to trade off the bias and the variance of estimators. For example, an estimator with very low variance and some bias may be more desirable than an unbiased estimator with high variance. One criterion which is useful in this regard is the goal of minimizing mean square error.

Triplett (2004, p. 187) argues that choosing a functional form to reduce heteroskedasticity is not a good idea, for two reasons. First, heteroskedasticity does not bias the coefficient estimates, only their standard errors. Methods for dealing with heteroskedasticity in regression analysis exist. Accordingly, avoiding heteroskedasticity need not be a factor in choosing hedonic functional forms.
Second, a hedonic function estimates the relation between the prices of product varieties and the characteristics embedded in them, and gives us estimated implicit prices for the characteristics. Those implicit prices are our major interest. Choosing an empirically inappropriate functional form biases our estimates of the hedonic coefficients, and thus biases the hedonic price index as well. Rosen (1974) shows that in theory the functional form of a hedonic function is purely an empirical issue, to be determined from the analysis of the data.

7 Comparative Analysis of Results

In this section the correlations among the variables are examined. Then the Box-Cox test is applied to the annual estimates of the hedonic price equation to find the best functional form. Criteria for comparing functional forms include the signs of coefficients, the values of coefficients, adjusted $R^2$, root MSE, F-test, AIC, and SC. The Chow test is used to check for structural changes across adjacent years and across all years for the semilog model. Some econometric issues are also discussed. Finally, hedonic RHPIs are computed and compared with the NHPI.

7.1 Correlations

Tables 4, 5, and 6 show the correlations among sprice, livarea, lotarea, bedroom, totalbath, numgarag and fireplace. The dependent variable sprice is included so that we can see how closely these independent variables follow the price movement. We suspect that lot area may play a different role in the price movement in different locations. Our observations include:

1. The correlation coefficient between lotarea and livarea is modest: 23.8% for the Ottawa city, and 28.2% for the inner suburb. The correlation coefficient between lotarea and numgarag is 19.0% for the Ottawa city, and 14.4% for the inner suburb.

2. lotarea has no appreciable correlation with any of the other independent variables. Even within the inner suburb alone, lotarea is almost unrelated to all the other independent variables.
3. The correlation coefficient between lotarea and sprice is 7.2% for the Ottawa city, and 15.4% for the inner suburb.

4. In the regression results, although the t-statistic for lotarea is significant at any reasonable significance level, its coefficient is very small for all years and all models.

5. The lot area problems may come from the fact that both the location variables and the lot area variable are used in the same regression, and location matters more than lot area for house prices.

Multicollinearity does not seem to pose a serious problem in this study. First, the correlation coefficients among the independent variables are not large. The highest value, 46.4%, is between livarea and bedroom. A possible reason is the large number of observations in the data set, which reduces the multicollinearity among the independent variables. Second, the $R^2$ values for all years and all models are moderate, while the individual t-statistics are all significant for the pooled regression at any reasonable significance level, and significant for most variables in the annual regressions.

7.2 Specification test for functional forms

Since only three variables, namely sprice, livarea, and lotarea, are continuous variables with very large values, we experiment with different functional forms applied to these variables. The rest enter the equation linearly. Tables 7 and 8 report the specification test and the Box-Cox test results for the annual regressions. Highlights of the results are:

1. Based on root MSE, the Box-Cox model performs the best, and the semilog model and the log-linear model are superior to the linear model.

2. Based on AIC and SC, the Box-Cox model is the best. The log-linear model is slightly better than the semilog model. The linear model gives the worst performance.

3.
Based on the Box-Cox test, the linear, semilog and log-linear models are all rejected, i.e., the likelihood ratio statistics $\chi^2_1$ are large enough to reject $\lambda = 0$, $\lambda = 1$ and $\lambda = -1$ for all years, at the 5% critical value of 3.84. As mentioned before, the Box-Cox test usually rejects the other nested models.

Table 9 shows the results of the pooled regression for the semilog model. All the coefficients have the expected signs. All the location dummy variables have negative signs because Downtowncore is used as the base category. Houses located in the inner suburb have the lowest prices, which is reasonable. All the time dummy variables have positive signs since house prices appreciated during these periods relative to the base year 1996. All the individual t-statistics are significant at the 5% significance level (the critical value for 36 degrees of freedom is 2.021). The values of the coefficients are also as expected, except for lotarea. Heteroskedasticity-robust standard errors of the estimated coefficients are obtained, and they are reasonably small compared with the values of the coefficients. The joint F-statistic is equal to 3,170, well above the critical value at any reasonable significance level.

Regression results for the semilog model in 2005 are shown in table 10. Most coefficients have the expected signs, such as those of livarea, lotarea, bedroom, and totalbath. Again all the location dummy variables have negative signs.

7.3 Specification test for structural change across time

The parameter stability test is based on the semilog model because the tests for functional forms indicate that it gives reasonable results and the Chow test is simple to perform. For the Chow test, we run an adjacent-year regression which includes one time dummy variable that distinguishes the two periods. For the pooled regression, there are a total of 9 time dummy variables.
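For readers who want to see the arithmetic behind an F test of this kind, the sketch below computes a Chow-type statistic from residual sums of squares. The exact variant used for Table 11 is not spelled out here, so this is the generic textbook form under stated assumptions: the restricted model is the adjacent-year regression with common coefficients, the unrestricted model is a separate regression for each year, and the caller supplies the number of restrictions. The function name `chow_f_statistic` and the numbers in the example are hypothetical.

```python
def chow_f_statistic(rss_restricted, rss_year1, rss_year2,
                     n_obs, n_restrictions, n_params_unrestricted):
    """Textbook F statistic comparing a restricted and an unrestricted fit.

    rss_restricted: residual sum of squares of the two-year regression
        that forces coefficients to be equal across the two years.
    rss_year1, rss_year2: residual sums of squares of the separate
        regressions for each year (together, the unrestricted model).
    n_obs: total number of observations in the two years.
    n_restrictions: number of coefficients constrained to be equal.
    n_params_unrestricted: total parameters in the two separate regressions.
    """
    rss_unrestricted = rss_year1 + rss_year2
    df_denominator = n_obs - n_params_unrestricted
    if n_restrictions <= 0 or df_denominator <= 0:
        raise ValueError("degrees of freedom must be positive")
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_denominator
    return numerator / denominator


# Invented numbers: 120 sales, 10 slope restrictions, 2 x 10 parameters.
print(chow_f_statistic(110.0, 50.0, 50.0, 120, 10, 20))  # prints 1.0
```

A statistic above the relevant critical value of the F distribution indicates that the coefficients differ between the two years, which is how structural change is read from the test.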
Based on the Chow test (F test) results in Table 11, structural change is present even for adjacent years. The highest F value is 2.75 for 1996–1997 and the lowest is 1.23 for 1998–1999. The 1% critical value is 1.00 for very large degrees of freedom in both numerator and denominator. Thus all the test results are slightly above the critical value. It is not surprising that the test result for the pooled regression is higher than those for all the adjacent years: since structural break is present even for the adjacent-year regressions, it is all the more so for the pooled regression covering all the years. The test results suggest that the characteristics price index method, i.e. the separate regression, is preferred. Both the pooled and the adjacent-year regressions are inferior to it due to the presence of structural change. Nevertheless, the F values are not high for the adjacent-year regressions, which indicates that structural changes in consecutive years are not serious.

7.4 Heteroskedasticity

Table 12 shows that heteroskedasticity is present for every year for every functional form except the Box-Cox model from 1997 to 2005. The 5% critical value for $\chi^2_1$ is 3.841. Although the presence of heteroskedasticity does not bias the estimated coefficients, it does affect the standard errors and therefore the t-statistics. Thus heteroskedasticity-robust inference after OLS estimation is applied. Table 9 and table 10 show the regression results with robust standard errors.

7.5 Comparing hedonic RHPIs with NHPI and median RHPI

Figure 1 shows the trends of the NHPI, the median RHPI and the hedonic RHPIs obtained with the pooled regression approach, the adjacent-year regression approach and the characteristics price index approach. For the characteristics price index method, the Laspeyres, Paasche, and Fisher price indexes are computed. All five hedonic RHPIs show a pattern of price movement similar to that of the NHPI for Ottawa, with the NHPI being the lower bound.
The main reason may be that the NHPI is only for new houses built in the suburb. Since 31% of resale houses are located in the City of Ottawa, with houses appreciating faster than new houses built in the suburb, i.e, land value appreciates faster in the city than in the suburb. All of these three hedonic methods produce the almost identical results. However, we 19 can still see the slight difference among these indexes: 1) Laspeyres price index imposes the upper limit; Paasche price index imposes the lower limit; 2) Fisher price index, the pooled regression price index and the adjacent year regression price index are all between Laspeyres price index and Paasche price index. These results are consistent with the index number literature. The median RHPI gives the most rapid price increase. One possible explanation is that prices in the high end house market increase less than those of the low end market. 8 Conclusions Using a data set for the Ottawa area, we have constructed quantity-adjusted price indexes using the hedonic method. The Chow test results indicate that structural changes between adjacent years are mild though statistically significant. The pooled regression for the semi-log model, however, results in a price index that closed matched those from separate regressions on the annual base. In fact the hedonic price indexes are insensitive to structural changes over the years and to the differences in the Laspeyres and Paasche types formulation. The Box-Cox analysis rejects the linear, semilog, and log-linear functional forms. It also suggests that the problem of heteroskedasticity can be mitigated by choosing the more correct functional form. The next step in this project is to compute the price indexes with the Box-Cox regressions and test the sensitivity of the price index with respect to the functional form. References Anglin, Paul M. 
and Ramazan Gencay (1996) ‘Semiparametric Estimation of a Hedonic Price Function’, Journal of Applied Econometrics, Vol. 11, No. 6, 633-648.
Bailey, M.J., R.F. Muth and H.O. Nourse (1963) ‘A Regression Method for Real Estate Price Index Construction’, Journal of the American Statistical Association, 58, 933-944, December.
Baldwin, Andrew and Emad Mansour (2003) ‘Different Perspectives on the Rate of Inflation, 1982-2000: The Impact of Homeownership Costs’, Research Paper, Statistics Canada.
Berndt, Ernst, Ellen R. Dulberger and Neal J. Rappaport (2000) ‘Price and Quality of Desktop and Mobile Personal Computers: A Quarter Century of History’.
Berndt, Ernst, Zvi Griliches and Neal J. Rappaport ‘Econometric Estimates of Price Indexes for Personal Computers in the 1990s’, Journal of Econometrics 68, 243-268.
Berndt, Ernst R. (1991) ‘The Practice of Econometrics: Classic and Contemporary’, Addison-Wesley Publishing Company, Reading, Massachusetts.
Box, G.E.P. and D.R. Cox (1964) ‘An Analysis of Transformations’, Journal of the Royal Statistical Society, Series B (Methodological), 26(2), 211-52.
Brachinger, H.W. (2002) ‘Statistical Theory of Hedonic Price Indices’, Working Paper, Department of Quantitative Economics, University of Freiburg/Fribourg, Switzerland.
Chow, Gregory C. (1967) ‘Technological Change and the Demand for Computers’, American Economic Review, 57(5), 1117-1130.
Conniffe, Denis and David Duffy (1999) ‘Irish House Price Indices - Methodological Issues’, working paper in The Economic and Social Review, Vol. 30, No. 4, 403-423.
Diewert, Erwin (2003a) ‘The Treatment of Owner Occupied Housing and Other Durables in a Consumer Price Index’.
Diewert, Erwin (2003b) ‘Hedonic Regressions: A Consumer Theory Approach’, in Scanner Data and Price Indexes, Conference on Research in Income and Wealth, Volume 64, Robert C. Feenstra and Matthew D. Shapiro (eds.), National Bureau of Economic Research, The University of Chicago Press, 317-348.
Diewert, W. Erwin (2004) ‘Elementary Indices’, in Consumer Price Index Manual: Theory and Practice, Geneva: International Labour Office, Chapter 20, 355-371.
Englund, Peter (1998) ‘Improved Price Indexes for Real Estate: Measuring the Course of Swedish Housing Prices’, Journal of Urban Economics 44, 171-196.
Fleming, M. and J.G. Nellis (1992) ‘Development of Standardized Indices for Measuring House Price Inflation Incorporating Physical and Locational Characteristics’, Applied Economics 24, 1067-1085.
Greene, William (1993) ‘Econometric Analysis’, A Simon & Schuster Company.
Griliches, Zvi (1961) ‘Hedonic Price Indexes for Automobiles: An Econometric Analysis of Quality Change’, hearings in the U.S. Congress.
Gudnason, Rosmundur (2004) ‘Market Price Approach to Simple User Cost’, Statistical Journal of the United Nations, ECE 21, 147-155.
Hwang, Yoon ‘Resale Housing Price Index’, Prices Division, Statistics Canada.
MacDonald, Larry ‘The Hedonic Price Index Approach: A Pilot Study of the Ottawa-Carleton Region’, Prices Division, Statistics Canada.
MacDonald, Larry (1986) ‘Hedonic Models of Housing: An Examination With Reference to New Housing in the Ottawa Area’, Prices Division, Statistics Canada.
McDonald, John (1980) ‘The Use of Proxy Variables in Housing Price Analysis’, Journal of Urban Economics 7, 75-83.
Moulton, Brent (2001) ‘The Expanding Role of Hedonic Methods in the Official Statistics of the United States’, U.S. Bureau of Economic Analysis.
Poole, Robert (2005) ‘Treatment of Owner-Occupied Housing in the CPI’, U.S. Bureau of Labor Statistics.
Prud’Homme, Marc, Dimitri Sanga and Holly Shum (2004) ‘From Average Price to Hedonic Price Indexes: A “Preliminary” Investigation into Various Measures of Trends in Existing House Prices Using MLS Data for Ottawa’, Prices Division, Statistics Canada.
Ribe, Martin (2004) ‘Swedish Re-considerations of User-Cost Approaches to Owner Occupied Housing’, Statistical Journal of the United Nations, ECE 21, 139-146.
Rosen, Sherwin (1974) ‘Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition’, Journal of Political Economy, 82(1) (January-February), 34-55.
Tong, Zhong Yi and John L. Glasscock (2000) ‘Price Dynamics of Owner-Occupied Housing in the Baltimore-Washington Area: Does Structure Type Matter?’, Journal of Housing Research, Volume 11, Issue 1.
Triplett, Jack (2004) ‘Handbook on Hedonic Indexes and Quality Adjustments in Price Indexes’, OECD Publishing.
Yu, Kam and Marc Prud’Homme (2005) ‘Econometric Issues in Hedonic Price Indices: The Case of Internet Service Providers’.

Table 1: Variable Description

Variable      Description                                             Percentage
sprice        Sale price
livarea       Total square footage of living area in the unit
lotarea       Total square footage of lot area
bedroom       Number of reported bedrooms
totalbath     Number of reported bathrooms
numgarage     Number of garages
fireplace     Number of fireplaces
totalappl     Number of total appliances
age           Age of a unit
age2          Age squared
brick         1 if exterior finish is brick; 0 otherwise              80.54
newhouse      1 if unit's age is zero; 0 otherwise                     2.04
hardwd        1 if unit has hardwood; 0 otherwise                     59.66
natgas        1 if heating fuel is natural gas; 0 otherwise           81.99
corner        1 if unit is at a corner; 0 otherwise                    6.92
culdesc       1 if unit is on a cul-de-sac; 0 otherwise                5.65
patio         1 if unit has a patio; 0 otherwise                      18.96
shopnrb       1 if a shopping center is nearby; 0 otherwise           59.05
centvac       1 if unit has central/built-in vacuum; 0 otherwise      23.01
pool          1 if unit has an indoor or outdoor pool; 0 otherwise     3.80
whirlbath     1 if unit has a whirlbath; 0 otherwise                   9.33
sauna         1 if unit has a sauna; 0 otherwise                       0.64
aircon        1 if unit has air conditioning; 0 otherwise             76.48
Downtowncore  1 if unit is located in the downtown core; 0 otherwise   3.28
Ottawasouth   1 if unit is located in Ottawa south; 0 otherwise       14.32
Ottawaeast    1 if unit is located in Ottawa east; 0 otherwise         3.02
Ottawawest    1 if unit is located in Ottawa west; 0 otherwise         4.14
Farwest       1 if unit is located in the far west; 0 otherwise        6.52
suburb        1 if unit is located in an inner suburb; 0 otherwise    68.72

Note: The variable Downtowncore is used as a base category for location.

Table 2: Sample Information I: Number of Observations

Year    Observations
1996     2564
1997     2855
1998     2831
1999     3305
2000     3316
2001     3020
2002     3192
2003     4042
2004     4279
2005     4191
Total   33595

Table 3: Sample Information II: Variable Summary

Variable     Mean        Std. dev.   Minimum   Maximum
sprice       234397.10   91661.71    68000     799000
livarea      1416.87     471.11      506.57    4904.66
lotarea      6983.42     5571.27     501.60    39972.03
bedroom      3.51        0.72        1         9
totalbath    2.62        0.81        1         6
numgarag     1.34        0.75        0         6
fireplace    0.93        0.53        0         6
totalappl    2.74        2.01        0         11
age          24.76       20.62       0         190

Table 4: Correlation Matrix (Ottawa area) (obs=33595)

            sprice    livarea   lotarea   bedroom   totalbath  numgarag  fireplace
sprice      1.0000
livarea     0.5124    1.0000
lotarea     0.1051    0.2679    1.0000
bedroom     0.3825    0.4640    0.0640    1.0000
totalbath   0.4039    0.4466   -0.0788    0.4016    1.0000
numgarag    0.3322    0.3664    0.1901    0.2543    0.5241     1.0000
fireplace   0.3361    0.3492    0.0593    0.1942    0.3561     0.3030    1.0000

Table 5: Correlation Matrix (Ottawa City) (obs=10510)

            sprice    livarea   lotarea   bedroom   totalbath  numgarag  fireplace
sprice      1.0000
livarea     0.5087    1.0000
lotarea     0.0716    0.2381    1.0000
bedroom     0.3673    0.4619    0.0687    1.0000
totalbath   0.4721    0.5663    0.0167    0.4492    1.0000
numgarag    0.3106    0.3830    0.1960    0.2087    0.5050     1.0000
fireplace   0.3673    0.4054    0.1328    0.2019    0.3647     0.3019    1.0000

Table 6: Correlation Matrix (Inner Suburb) (obs=23085)

            sprice    livarea   lotarea   bedroom   totalbath  numgarag  fireplace
sprice      1.0000
livarea     0.5370    1.0000
lotarea     0.1542    0.2819    1.0000
bedroom     0.4042    0.4679    0.0638    1.0000
totalbath   0.4366    0.3846   -0.1754    0.3817    1.0000
numgarag    0.4642    0.3672    0.1436    0.2986    0.4584     1.0000
fireplace   0.3358    0.3131    0.0227    0.1852    0.3306     0.2799    1.0000

Table 7:
Model Selection Statistics

Year  Criterion  Linear     Semilog    Log-linear  Box-Cox
1996  R2         0.60       0.66       0.66        0.66
      Root MSE   29941      0.14       0.14        2.0e-05
      AIC        60158.45   -2748.70   -2774.70    -47701.06
      SC         60322.23   -2584.91   -2610.92    -47548.98
1997  R2         0.61       0.67       0.67        0.66
      Root MSE   35846      0.16       0.16        4.2e-05
      AIC        68010.76   -2422.15   -2464.84    -49432.71
      SC         68177.55   -2255.36   -2298.05    -49271.87
1998  R2         0.62       0.67       0.67        0.66
      Root MSE   35407      0.16       0.16        0.00032
      AIC        67369.41   -2270.38   -2298.38    -37513.40
      SC         67535.97   -2103.83   -2131.83    -37346.84
1999  R2         0.60       0.66       0.66        0.66
      Root MSE   38525      0.17       0.16        0.00024
      AIC        79202.47   -2497.96   -2538.43    -45694.65
      SC         79373.36   -2327.07   -2367.54    -45523.76
2000  R2         0.59       0.64       0.64        0.64
      Root MSE   46474      0.18       0.18        0.00032
      AIC        80710.06   -1807.01   -1827.68    -43871.63
      SC         80881.04   -1636.03   -1656.70    -43700.65
2001  R2         0.60       0.64       0.65        0.64
      Root MSE   46463      0.17       0.17        0.00168
      AIC        73506.64   -2062.09   -2086.29    -29992.62
      SC         73675.01   -1893.72   -1917.92    -29824.26
2002  R2         0.63       0.66       0.66        0.65
      Root MSE   47787      0.16       0.16        0.00257
      AIC        77870.78   -2636.26   -2647.65    -28998.58
      SC         78040.7    -2466.34   -2477.73    -28828.66
2003  R2         0.56       0.61       0.61        0.61
      Root MSE   55257      0.17       0.17        0.00055
      AIC        99773.93   -2887.83   -2895.11    -49108.3
      SC         99950.45   -2711.30   -2718.58    -48931.78
2004  R2         0.60       0.63       0.63        0.63
      Root MSE   58025      0.17       0.17        0.00048
      AIC        106040.8   -3166.71   -3209.84    -53252.43
      SC         106218.9   -2988.59   -3031.72    -53074.31
2005  R2         0.59       0.62       0.63        0.62
      Root MSE   60822      0.17       0.17        0.00034
      AIC        104255     -3044.39   -3074.02    -54968.25
      SC         104432.6   -2866.85   -2896.48    -54790.72

Table 8: Box-Cox Test for Functional Forms: LR Statistic, χ²(1)

Year  H0: λ = -1  H0: λ = 0 (semilog)  H0: λ = 1 (linear)  H0: λ = 0* (log-linear)  Estimated λ
1996  33.75       262.71               1666.9              45.10                    -0.73
1997  61.58       312.01               2089.9              26.89                    -0.68
1998  148.89      180.38               1653.68             55.21                    -0.51
1999  170.76      250.52               2148.06             53.38                    -0.54
2000  177.51      219.07               1975.65             82.34                    -0.52
2001  281.07      102.39               1390                41.37                    -0.37
2002  291.52      77.98                1337.04             119.21                   -0.33
2003  256         185.81               1916.01             56.50                    -0.46
2004  256.26      195.24               1946.37             39.79                    -0.64
2005  219.55      208.77               1950.68             34.66                    -0.49

Note: * For the log-linear test, we first take logs of the independent variables and transform the dependent variable, and then test whether λ = 0, i.e., whether the log-linear model results.

Table 9: Regression Results from the Pooled Semilog Model

Robust standard errors. Number of obs = 33595; F(36, 33558) = 3170.93; Prob > F = 0.0000; R2 = 0.7793; Root MSE = 0.1673. Dependent variable: log-sprice.

Variable      Coef.       Robust Std. Err.  t        P>|t|   95% Conf. Interval
livarea        .0001604   3.01e-06           53.20   0.000    .0001545    .0001663
lotarea        3.38e-06   2.51e-07           13.45   0.000    2.89e-06    3.87e-06
bedroom        .0431089   .001757            24.54   0.000    .039665     .0465527
totalbath      .0655097   .0019035           34.42   0.000    .0617788    .0692407
numgarag       .0864505   .0020116           42.98   0.000    .0825077    .0903934
fireplace      .0764679   .0022625           33.80   0.000    .0720333    .0809026
totalappl      .0051004   .0005139            9.92   0.000    .0040931    .0061077
age           -.0028966   .0002045          -14.16   0.000   -.0032975   -.0024957
age2           .0000302   2.64e-06           11.44   0.000    .0000251    .0000354
brick          .0144303   .0027302            5.29   0.000    .0090791    .0197816
newhouse       .1221403   .0072365           16.88   0.000    .1079564    .1363242
hardwd         .0648301   .0019529           33.20   0.000    .0610023    .0686578
natgas         .0442921   .0032885           13.47   0.000    .0378466    .0507377
corner        -.0250885   .0038048           -6.59   0.000   -.0325459   -.017631
culdesc        .0199133   .0043107            4.62   0.000    .0114641    .0283624
patio          .0147051   .0023288            6.31   0.000    .0101407    .0192696
shopnrb       -.0084993   .0019344           -4.39   0.000   -.0122907   -.0047079
centvac        .0139118   .0021992            6.33   0.000    .0096014    .0182222
pool          -.0620345   .003773           -16.44   0.000   -.0694297   -.0546393
whirlbath      .0526016   .0035742           14.72   0.000    .0455959    .0596072
sauna          .0539008   .0147481            3.65   0.000    .024994     .0828076
aircon         .0469569   .0025036           18.76   0.000    .0420497    .0518641
Ottawasouth   -.3249044   .0093408          -34.78   0.000   -.3432126   -.3065961
Ottawaeast    -.2384961   .0129861          -18.37   0.000   -.2639492   -.2130429
Ottawawest    -.184406    .0100027          -18.44   0.000   -.2040116   -.1648003
Farwest       -.2556048   .0096883          -26.38   0.000   -.2745941   -.2366154
suburb        -.41067     .0093715          -43.82   0.000   -.4290384   -.3923017
d1997          .0151599   .0041171            3.68   0.000    .0070902    .0232296
d1998          .0236411   .0041721            5.67   0.000    .0154636    .0318186
d1999          .0528763   .0040781           12.97   0.000    .0448832    .0608695
d2000          .1530157   .0043156           35.46   0.000    .144557     .1614743
d2001          .2694625   .0042921           62.78   0.000    .2610499    .2778751
d2002          .3736802   .0040883           91.40   0.000    .365667     .3816934
d2003          .4350159   .0040016          108.71   0.000    .4271726    .4428591
d2004          .488358    .0039488          123.67   0.000    .4806182    .4960978
d2005          .5115928   .0040203          127.25   0.000    .5037128    .5194727
cons           11.53436   .0119598          964.43   0.000    11.51092    11.5578

Table 10: Regression Results from the Semilog Model, 2005

Robust standard errors. Number of obs = 4191; F(27, 4163) = 219.39; Prob > F = 0.0000; R2 = 0.6272; Root MSE = 0.16772. Dependent variable: log-sprice.

Variable      Coef.       Robust Std. Err.  t        P>|t|   95% Conf. Interval
livarea        .0001478   8.30e-06           17.79   0.000    .0001315    .0001641
lotarea        4.55e-06   6.57e-07            6.92   0.000    3.26e-06    5.84e-06
bedroom        .0343227   .0041877            8.20   0.000    .0261125    .0425328
totalbath      .0662561   .0051032           12.98   0.000    .0562511    .076261
numgarag       .0646113   .0059326           10.89   0.000    .0529802    .0762424
fireplace      .0684599   .0062454           10.96   0.000    .0562155    .0807042
totalappl      .0058      .0013483            4.30   0.000    .0031566    .0084435
age           -.0035642   .0004198           -8.49   0.000   -.0043872   -.0027411
age2           .0000336   4.86e-06            6.91   0.000    .0000241    .0000431
brick          .0100405   .0073325            1.37   0.171   -.0043351    .0244161
newhouse       .1284248   .0203979            6.30   0.000    .0884341    .1684156
hardwd         .072675    .0054843           13.25   0.000    .0619229    .0834271
natgas         .0395919   .0098599            4.02   0.000    .0202612    .0589226
corner        -.0307366   .0111669           -2.75   0.006   -.0526297   -.0088436
culdesc        .0322124   .0151644            2.12   0.034    .0024821    .0619426
patio          .0233781   .0067789            3.45   0.001    .0100879    .0366683
shopnrb        .0112075   .0062142            1.80   0.071   -.0009756    .0233906
centvac        .0128648   .0059579            2.16   0.031    .0011842    .0245454
pool          -.0645834   .0110921           -5.82   0.000   -.0863298   -.042837
whirlbath      .0105023   .0097272            1.08   0.280   -.0085682    .0295727
sauna          .040411    .0367587            1.10   0.272   -.0316556    .1124776
aircon         .0469674   .0086054            5.46   0.000    .0300963    .0638385
Ottawasouth   -.3920661   .0216062          -18.15   0.000   -.4344259   -.3497064
Ottawaeast    -.2812344   .0305785           -9.20   0.000   -.3411845   -.2212842
Ottawawest    -.2199742   .0245603           -8.96   0.000   -.2681255   -.1718228
Farwest       -.2970853   .0230612          -12.88   0.000   -.3422976   -.251873
suburb        -.4899332   .0218399          -22.43   0.000   -.5327511   -.4471152
cons           12.19424   .0315485          386.52   0.000    12.13239    12.2561

Table 11: Chow Test (F test) for Structural Change

Period   Semilog
96-97    2.75
97-98    1.80
98-99    1.23
99-00    1.46
00-01    2.09
01-02    2.60
02-03    2.36
03-04    1.55
04-05    1.39
Pooled   4.10

Table 12: Breusch-Pagan Test/Cook-Weisberg Test, χ²(1), for Heteroskedasticity

Year  Linear  Semilog  Log-linear  Box-Cox
1996  2314    329      320         12.04
1997  2208    285      241         2.88
1998  1580    147      131         0.31
1999  2082    200      197         0.09
2000  1610    159      151         0.08
2001  1061    70       66          0.16
2002  1242    59       57          0.02
2003  1415    115      105         1.24
2004  1483    170      165         2.46
2005  1391    149      133         0.12

Table 13: Housing Price Indexes (Semilog Model), 1996=100

Year  NHPI    Median  Pooled  Adjacent  Laspeyres  Paasche  Fisher
1996  100.00  100.00  100.00  100.00    100.00     100.00   100.00
1997  100.60  102.15  101.53  101.61    101.60     101.66   101.63
1998  101.31  104.72  102.39  102.50    102.55     102.49   102.52
1999  103.92  108.58  105.43  105.56    105.70     105.45   105.58
2000  111.57  118.86  116.53  116.75    116.83     116.70   116.77
2001  124.45  137.94  130.93  131.11    131.24     131.01   131.13
2002  134.10  154.19  145.31  145.43    145.71     145.17   145.44
2003  139.13  164.79  154.50  154.90    155.54     154.06   154.79
2004  148.29  176.68  162.96  163.38    163.98     162.59   163.28
2005  155.13  183.10  166.79  167.11    167.75     166.25   167.00

Table 14: Housing Price Indexes (Semilog Model), year to year, %

Year  NHPI   Median  Pooled  Adjacent  Laspeyres  Paasche  Fisher
1997  0.60   2.15    1.53    1.61      1.60       1.66     1.63
1998  0.70   2.52    0.85    0.88      0.94       0.81     0.88
1999  2.58   3.68    2.97    2.99      3.07       2.89     2.98
2000  7.36   9.47    10.53   10.60     10.53      10.67    10.60
2001  11.54  16.05   12.35   12.30     12.33      12.26    12.30
2002  7.76   11.78   10.98   10.92     11.02      10.81    10.91
2003  3.75   6.87    6.33    6.51      6.75       6.12     6.43
2004  6.58   7.21    5.48    5.48      5.43       5.54     5.48
2005  4.61   3.64    2.35    2.28      2.30       2.25     2.28

Figure 1: Housing Price Indexes (Semilog Model), 1996=100

Figure 2: Housing Price Indexes (Semilog Model), Year to Year
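As a worked check on how Tables 9, 13 and 14 relate, the sketch below recomputes the pooled index from the time-dummy coefficients of the pooled semilog regression. It assumes the index level is 100·exp(δ_t), with 1996 as the omitted base year; the paper does not state whether any further adjustment is applied, so this is an illustration and not necessarily the authors' exact procedure. The Fisher index is computed as the geometric mean of the Laspeyres and Paasche indexes. The function names (`pooled_index`, `fisher`, `year_to_year`) are hypothetical.

```python
import math

# Time-dummy coefficients d1997..d2005 from the pooled semilog regression (Table 9).
# Assumption: 1996 is the omitted base year, so its coefficient is taken as zero.
TIME_DUMMIES = {
    1996: 0.0,
    1997: 0.0151599,
    1998: 0.0236411,
    1999: 0.0528763,
    2000: 0.1530157,
    2001: 0.2694625,
    2002: 0.3736802,
    2003: 0.4350159,
    2004: 0.488358,
    2005: 0.5115928,
}

# Laspeyres and Paasche index levels for 2005 (Table 13).
LASPEYRES_2005 = 167.75
PAASCHE_2005 = 166.25


def pooled_index(dummies, base=100.0):
    """Index level per year as base * exp(time-dummy coefficient)."""
    return {year: base * math.exp(coef) for year, coef in dummies.items()}


def fisher(laspeyres, paasche):
    """Fisher index: geometric mean of the Laspeyres and Paasche indexes."""
    return math.sqrt(laspeyres * paasche)


def year_to_year(levels):
    """Percentage change of each year's index level over the previous year."""
    years = sorted(levels)
    return {
        cur: 100.0 * (levels[cur] / levels[prev] - 1.0)
        for prev, cur in zip(years, years[1:])
    }


index = pooled_index(TIME_DUMMIES)
changes = year_to_year(index)

for year in sorted(index):
    change = changes.get(year)
    suffix = "" if change is None else f"  ({change:+.2f}%)"
    print(f"{year}: {index[year]:7.2f}{suffix}")

print(f"Fisher 2005: {fisher(LASPEYRES_2005, PAASCHE_2005):.2f}")
```

Under this assumption the computed levels agree with the Pooled column of Table 13 to within rounding, and the 2005 Fisher value agrees with the Fisher column, which suggests the published pooled index is consistent with a simple exponentiation of the time dummies.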