Studies in Hedonic Resale Housing Price Indexes∗
Wenzheng Li Statistics Canada
Marc Prud’homme Statistics Canada
Kam Yu Lakehead University
May 17, 2006
Abstract
Hedonic analysis is gaining acceptance as a tool for quality adjustment in goods
and services in official statistics and academic research. Computers and houses are
by far the two products on which most hedonic research has
concentrated. Few comparative studies have, however, looked at the sensitivity
of the price indexes that are obtained when different regression approaches (e.g.,
pooled regression, adjacent period regression, and separate regression) are applied
to the data. This paper attempts to provide answers as to the sensitivity of the
hedonic results to the regression approach used. Furthermore, the paper explores
how functional forms affect results. The data used are prices for resale houses
in the Ottawa area from the Multiple Listing Service (MLS) for the period 1996
to 2005. Characteristics of the database include a large number of explanatory
variables and observations.
1 Introduction
The main reason to conduct research on Resale Housing Price Indexes (RHPI) is to test
whether new and resale housing prices show similar movements. The New Housing Price
Index (NHPI) series plays an important role in the consumer price index (CPI) and the
national accounts. For the CPI, the NHPI (exclusive of land) is used in the calculation of
the replacement cost index and the index of homeowners’ insurance for owned accommodation;
the NHPI (including land) is used to compute mortgage interest for owned accommodation.
For the National Accounts, the NHPI (excluding land) is used to deflate the value
of residential building construction and the value of the national housing stock. The
matched model index used in the NHPI, however, suffers from small sample sizes and from
the difficulty of adjusting properly for quality change. Also, the NHPI covers only new
houses, which are only a part of the housing market. The RHPI samples both new and resale
houses and is therefore more comprehensive.
∗ Paper presented to the Canadian Economic Association 40th Annual Meetings, May 26–28, 2006,
Concordia University, Montréal.
The hedonic approach has been proved to be a better method to compute price
indexes allowing for quality change. Due to the difficulty in separating out land value
from house prices, hedonic RHPI obtained in this paper can only be compared with
the NHPI including land value. Also, RHPI cannot replace NHPI within Statistics
Canada because NHPI is used as a deflator for national account purposes. A hedonic
RHPI can be used to test if NHPI properly tracks the new house price movement and
to provide some methodological support for computing a future quality-adjusted NHPI.
Our approach is an extension of previous work carried out in Prices Division, Statistics
Canada.1
2 Overview of Three Approaches
Quality change is one of the well-known issues in constructing price indexes. For products
with physical specifications and characteristics that change frequently, a quality adjustment
procedure is necessary to avoid biases. The common method used by statistical
agencies is the matched model method. Studies have shown that matched-model indexes often
miss price changes when new models of products are introduced. A hedonic analysis,
on the other hand, can in theory isolate the pure price change in the presence of quality
changes. Here we briefly discuss three commonly used methods in housing price
measurement.
1 See Prud’homme et al (2004).
2.1 Average and Median Sales Price Approach
The change in the median (and sometimes the average) sale price of existing homes is
often used to measure price changes. These statistics are often cited because they are
easily understood and available for most geographical areas in Canada through regular
releases by an area real estate board. Such statistics are however misleading indicators
of house price appreciation.
The average price treats housing as a homogeneous product and ignores any quality
changes. The resulting bias in the price index can be more severe than that of the
matched-model index. The underlying index formula, which is effectively the ratio of the
arithmetic means, is the so-called Dutot index. This index is known to be sensitive to
units of measurement and quality changes.2
The advantage of the median price is that it is robust to outliers resulting from
measurement errors. For example, the median price remains unchanged even when prices
in the upper or the lower quantile experience large changes, as long as the ranking of
the median-priced house is unaffected. This property, however, becomes problematic
in situations where there are large movements in either quantile and the median is unable
to reflect the price changes.
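The contrast between the two statistics can be illustrated with a small sketch. The prices below are hypothetical and are not taken from the MLS data; the function names are ours. Only the most expensive house changes price, so the ratio of means (the Dutot formula) moves while the ratio of medians does not.

```python
from statistics import mean, median

def dutot_index(base_prices, current_prices):
    """Ratio of arithmetic mean prices (the Dutot formula)."""
    return mean(current_prices) / mean(base_prices)

def median_index(base_prices, current_prices):
    """Ratio of median prices."""
    return median(current_prices) / median(base_prices)

# Hypothetical sale prices (CAD thousands) in two periods.
base = [150, 180, 200, 220, 400]
# Only the most expensive house changes price: 400 -> 600.
current = [150, 180, 200, 220, 600]

dutot = dutot_index(base, current)     # mean moves from 230 to 270
med = median_index(base, current)      # median stays at 200
print(f"Dutot index:  {dutot:.4f}")
print(f"Median index: {med:.4f}")
```

Neither statistic controls for the mix of houses sold, which is the point made above about quality change.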
2.2 Repeat Sales Approach
The matched model method that is frequently applied to merchandise goods is often
called the repeat sales approach in the housing market. It estimates price trends from
transactions for properties that have been sold more than once over the sample period.
The idea was originally proposed by Bailey et al (1963).
2 See Diewert (2004).
The main advantage of the repeat sales approach is reproducibility, i.e., different
statisticians given the same data on the sales of housing units will come up with the
same estimate of quality adjusted price change.
The main disadvantage of the repeat sales approach is that it does not use all of the
available information on housing unit sales; it uses only information on housing units that
sold more than once during the sample period. Second, it cannot deal adequately with
depreciation of the housing structure. Third, it cannot deal adequately with housing
units that have undergone major repairs or renovations. Finally, it does not allow for
changes in the implicit price of particular housing attributes over time. In fact, it is
likely that each attribute has its own price determined by the demand for and supply of
that attribute.3
Also the repeat sales approach is subject to sample selectivity bias. Usually houses
sold repeatedly tend to have inferior quality. In other words, they are not representative
of the entire population of properties that sold.
2.3 Hedonic Approach
The hedonic model can be considered as an equilibrium model of product differentiation.
The product price is assumed to be a function of a set of characteristics. When the
price is expressed as a linear function of the characteristics, the estimated
coefficients of the characteristics can be interpreted as their implicit prices. The part of
the overall price change from one period to another that is not accounted for by changes
in characteristics is then interpreted as pure price change. Classic studies in hedonic
analysis include Griliches (1961) on automobiles and Chow (1967) on computers.
Compared with the repeat sales approach, the hedonic regression model has the
following advantages. First, it uses all of the information on housing sales in each
sample period and not just the data that can be matched. Second, it can adjust for the
effects of depreciation if the age of the structure is known at the time of sale. Third, it
can adjust for the effects of renovations and repairs if expenditures on renovation and
extensions are known at the time of sale (Diewert, 2003a, p. 37).
3 See Diewert (2003a), page 38.
Generally, there are three ways to carry out a hedonic regression: 1) the
time dummy variable method; 2) the characteristics price index method; and 3) the hedonic
price imputation method. A description of these three methods is given in Section 5.
See Triplett (2004) for further discussion.
The U.S. Bureau of Census started to produce a price index for single-family homes
under construction in 1968 (Moulton, 2001). It was the first hedonic index produced
within the context of a regular statistical program and is still ongoing today.
According to the Statistics Norway website, sub-indexes for two different housing types
are calculated using the hedonic method in Norway. The housing types are 1) detached
houses, semi-detached houses, row houses, linked houses and houses with three or four
dwellings and 2) apartment buildings with five or more dwellings.
3 Features of This Study
The distinguishing features of this study are:
1. MLS data provide a large number of house characteristics including living area,
number of bedrooms, number of bathrooms, number of garages, number of fireplaces, number of appliances, location, age, etc. See table 1 for variable descriptions.
2. Three approaches, namely the pooled dummy variable method, the adjacent period
dummy variable method, and the characteristics price index method, will be
tested. In most studies, the pooled dummy variable method is used because of its
simplicity and convenience in dealing with historical data. In the presence of
structural change across time, the adjacent period dummy variable method is preferred
to the pooled regression. The characteristics price index method is generally
regarded as ideal for the following reasons: 1) it uses the implicit characteristics
prices to compute the price index; 2) it is not necessary to test
for structural change across time; and 3) traditional price index formulae can be
applied.
3. Various functional forms for hedonic regressions are compared. In most studies,
only three popular functional forms including the linear model, the semi-log model
and the log linear model are utilized. In this paper, the Box-Cox model is also
tested.
The limitations of this study are as follows:
1. Ideally, location classification should be small enough to justify hedonic interpretation. Conniffe (1999) argues that there are difficulties in defining and measuring the
underlying causal factors for location variables. Some research has gone to great
efforts to tie location to access to amenities like schools, shopping, recreational facilities etc., but with rather little success. Alternatively, a separate regression can
be performed for each location, to offset the lack of information on neighbourhood
environment, crime rates, convenience to schools, and other amenities. In Conniffe
(1999) the income status and stage of life cycle of the residents are measured instead of direct location variables. In this study we use location dummy variables
instead of running a regression for each location.
2. House types should be classified at a more detailed level, such as two-storey
detached houses, three-storey detached houses, etc., because only houses of the
same type can be treated as homogeneous products. An overall housing price
index can then be obtained by aggregating the price indexes for each house type. In this
study, all the detached houses are pooled into one regression for convenience.
3. The MLS data do not provide any information on maintenance and renovation
expenditures. Also, information on the quality of materials used in construction
or on construction techniques is not available.
4. The benchmark NHPI includes the Ottawa-Gatineau region. The MLS data for
Gatineau is not available. Since housing construction in Gatineau only represents
a very small percentage in the Ottawa-Gatineau region, it is still appropriate to
use the NHPI for Ottawa-Gatineau as a benchmark for our hedonic RHPI.
4 Data and Variables
The Ottawa Real Estate Board is a trade association of over 1800 registered brokers and
salespeople in the Ottawa area. The MLS is a co-operative marketing system used by
the Board’s members to ensure maximum exposure of properties listed for sale, lease or
rent on the Board’s computer system. The following list gives a brief description of the
original data and the process of data construction.
1. Housing type: Only detached houses are included for the hedonic RHPI since NHPI
only covers single detached houses. Moreover, since structural changes across house
types may be present, it is not appropriate to pool detached, semi-detached, row or
condominium in the same regression. In the data set, residential detached houses
account for 70% of the total, semi detached houses account for 9%, and row units
account for 21%.
2. Location classification:
(a) Ottawa city: includes Downtown Core, Ottawa south, Ottawa east, Ottawa
west and Far west.
(b) Inner suburb: includes Orleans, Metcalfe, Nepean, Manotick, Stittsville and
Kanata.
(c) Outer suburb: includes Arnprior, Rockland, Winchester, Alexandria, Kemptville,
Westport, Carleton Place, Almonte, and Carp.
Since the NHPI for Ottawa-Gatineau does not cover the outer suburb in Ottawa,
it is not included in our analysis. Farm houses in these areas should also be treated
separately. In all the models tested here, there are five location dummy variables,
which are Ottawasouth, Ottawaeast, Ottawawest, Farwest and suburb. The variable
suburb refers to the inner suburb. The variable Downtowncore is omitted since it is
used as the base category. In the data set, 69% of resale detached houses are located
in the inner suburb, and these share similar location characteristics. Therefore
only one location dummy variable, suburb, is used for these houses. On the other
hand, prices of resale detached houses located in the Ottawa city vary significantly
with location. Therefore, four location dummy variables are used
to distinguish these locations. In the data set, 14% of houses are located in Ottawa
south, 3% in Ottawa east, 4% in Ottawa west, 7% in Far west, 3% in the downtown
core, and the rest in the inner suburb.
3. Age squared: House price usually has a negative relationship with the age of the unit.
However, the relationship may be quadratic rather than linear,
since older houses may have a better location and some other advantages. Thus
both age and age squared are used as variables.
4. Continuous variables: There are 9 continuous variables, namely, living area, lot
area, number of bedrooms, number of bathrooms, number of garages, number of
fireplaces, number of appliances, age and age squared. All of these variables are
assumed to closely follow the house price movement.
5. Dummy variables: Besides location dummy variables, there are 13 other dummy
variables for various features and environmental amenities. They include brick for
exterior finish, new house, hard wood floor, natural gas for heating fuel, corner,
Cul-de-Sac, shopping nearby, patio, central/built in vacuum, pool, whirl bath,
sauna, and air conditioning. The corresponding omitted categories for all these
dummy variables are used as bases.
6. Data construction: By inspection some outliers in the data set are discovered.
Data are cleaned by the following procedure.
(a) Create a new data set which includes only detached houses located in the
City of Ottawa and the inner suburb.
(b) Drop observations if sold price is less than CAD $65,000 or greater than CAD
$800,000.
(c) Drop observations if living area is less than 500 square feet or greater than
5000 square feet.
(d) Drop observations if lot area is below 500 square feet or greater than 40000
square feet.
(e) Drop observations if the number of bedrooms is equal to zero or greater than 10.
(f) Drop observations if the total number of bathrooms is equal to zero or greater
than 7, i.e., exclude large values such as 50.
(g) Drop observations if the number of garages is greater than 7, i.e., exclude large
values such as 44.
(h) Drop observations if they are old houses with the year built missing.
(i) Drop observations if they are mobile homes, since mobile homes are not representative of detached houses.
(j) Drop observations if information on exterior finish is missing.
The data span from 1996 to 2005, with a total of 33,595 observations. See tables 1, 2
and 3 for the detailed description of variables and sample information.
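The cleaning rules (a) to (j) can be sketched as a single record filter. This is only an illustration: the field names (house_type, region, sold_price and so on) are hypothetical and do not come from the MLS files, the three records are made up, and rule (h) is simplified to dropping any record with a missing year built.

```python
def keep_observation(rec):
    """Return True if a sale record passes cleaning rules (a) to (j)."""
    return (
        rec["house_type"] == "detached"                       # (a) detached only
        and rec["region"] in {"ottawa_city", "inner_suburb"}  # (a) city or inner suburb
        and 65_000 <= rec["sold_price"] <= 800_000            # (b) price in CAD
        and 500 <= rec["living_area"] <= 5_000                # (c) square feet
        and 500 <= rec["lot_area"] <= 40_000                  # (d) square feet
        and 1 <= rec["bedrooms"] <= 10                        # (e)
        and 1 <= rec["bathrooms"] <= 7                        # (f)
        and rec["garages"] <= 7                               # (g)
        and rec["year_built"] is not None                     # (h) simplified
        and not rec["mobile_home"]                            # (i)
        and rec["exterior_finish"] is not None                # (j)
    )

# Hypothetical records: one valid, one below the price floor, one with a garage typo.
good = {
    "house_type": "detached", "region": "inner_suburb", "sold_price": 250_000,
    "living_area": 1_800, "lot_area": 5_000, "bedrooms": 3, "bathrooms": 2,
    "garages": 2, "year_built": 1985, "mobile_home": False,
    "exterior_finish": "brick",
}
too_cheap = dict(good, sold_price=40_000)
garage_typo = dict(good, garages=44)

records = [good, too_cheap, garage_typo]
cleaned = [r for r in records if keep_observation(r)]
print(len(cleaned), "of", len(records), "records kept")
```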
5 Methodology
5.1 The pooled time dummy variable method
The observations for all the periods are pooled into one regression. Only the intercept is
allowed to change across the periods in this regression. The coefficients for characteristics
are constrained to remain the same across the periods. For example, for the semi-log
model,
$$\log p = \beta_0 + \sum_{j=1}^{K} \beta_j X_j + \sum_{i=2}^{t} \gamma_i D_i + \varepsilon, \qquad (1)$$
where p denotes the prices of a product for all the periods, βj measures the implicit
price of characteristic j in logarithmic terms, and Xj denotes the quantity of characteristic j.
Di is the time dummy variable, which takes the value 1 if the transaction occurs in
period i and 0 otherwise. The coefficient γi measures the logarithm of the price
index for period i with the first period as the base period. The price indexes are
obtained by taking the antilog of the coefficient γi of each time dummy variable.
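As a minimal sketch of the pooled time dummy method, the following fits the semilog regression by ordinary least squares and converts the time dummy coefficients into index numbers. It assumes NumPy is available; the data are synthetic and noise-free, and the function and variable names are ours, not those of the study.

```python
import numpy as np

def time_dummy_index(log_price, X, period):
    """Pooled semilog time-dummy regression; returns {period: index}, first period = 1."""
    log_price = np.asarray(log_price, dtype=float)
    X = np.asarray(X, dtype=float)
    period = np.asarray(period)
    periods = np.unique(period)  # sorted; periods[0] is the base period
    # One dummy column per period except the base period.
    D = (period[:, None] == periods[None, 1:]).astype(float)
    design = np.column_stack([np.ones(len(log_price)), X, D])
    coef, *_ = np.linalg.lstsq(design, log_price, rcond=None)
    gamma = coef[1 + X.shape[1]:]  # time dummy coefficients
    index = {periods[0].item(): 1.0}
    for p, g in zip(periods[1:], gamma):
        index[p.item()] = float(np.exp(g))  # antilog of the dummy coefficient
    return index

# Synthetic data: two characteristics and three years.
rng = np.random.default_rng(0)
n = 300
area = rng.uniform(1.0, 3.0, n)   # living area, thousands of square feet
beds = rng.integers(2, 6, n)      # number of bedrooms
year = rng.choice([1996, 1997, 1998], n)
true_gamma = {1996: 0.0, 1997: 0.05, 1998: 0.12}
log_p = 11.0 + 0.30 * area + 0.04 * beds + np.array([true_gamma[int(y)] for y in year])

idx = time_dummy_index(log_p, np.column_stack([area, beds]), year)
print(idx)
```

Because the synthetic prices contain no noise, the estimated indexes reproduce the antilogs of the assumed γ values; with real data they would be estimates with sampling error.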
5.2 The adjacent period time dummy variable method
Only the observations for two adjacent periods are pooled into one regression. For the
adjacent periods, the model can be written as:
$$\log p = \beta_0 + \sum_{j=1}^{K} \beta_j X_j + \gamma D + \varepsilon. \qquad (2)$$
The interpretation of the model is similar to that of the pooled regression. The only
difference is that p denotes the prices of a product for the two adjacent periods and D
denotes the time dummy variable for the comparison period. The price index for each period
can be obtained by taking the antilog of γ.
Between these two time dummy variable methods, the adjacent period regression is
usually preferred because of the possible presence of structural change across time. In
that case the assumption of parameter stability across time does not hold, so the pooled
regression approach should not be used. Structural breaks usually occur under rapid
technological or taste changes. Since house-building technology does not develop so fast,
structural change may not pose a problem for running a pooled regression for housing prices.
The Chow test can be used to test whether structural change is present. If the test statistic
for two adjacent periods is above the critical value, even the adjacent period regression is
not justified, let alone the pooled regression for all the periods.
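For illustration, the textbook form of the Chow F statistic compares the residual sum of squares of a regression pooled over two periods with the sum from two separate regressions. The sketch below assumes NumPy and uses synthetic data; it tests equality of all coefficients, including the intercept, which may differ from the exact variant used in this study (where a time dummy is kept in the adjacent year regression). All names are ours.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def chow_f(y1, X1, y2, X2):
    """Classic Chow F statistic for equal coefficients in two periods."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    X1 = np.asarray(X1, dtype=float).reshape(len(y1), -1)
    X2 = np.asarray(X2, dtype=float).reshape(len(y2), -1)
    k = X1.shape[1] + 1  # number of parameters, including the intercept
    rss_pooled = rss(np.concatenate([y1, y2]), np.vstack([X1, X2]))
    rss_split = rss(y1, X1) + rss(y2, X2)
    df2 = len(y1) + len(y2) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / df2)

# Synthetic example: same coefficients versus a changed slope in period 2.
rng = np.random.default_rng(1)
x1 = rng.uniform(1.0, 3.0, 200)
x2 = rng.uniform(1.0, 3.0, 200)
y1 = 11.0 + 0.30 * x1 + rng.normal(0.0, 0.05, 200)
y2_same = 11.0 + 0.30 * x2 + rng.normal(0.0, 0.05, 200)
y2_diff = 11.0 + 0.60 * x2 + rng.normal(0.0, 0.05, 200)

f_same = chow_f(y1, x1, y2_same, x2)
f_diff = chow_f(y1, x1, y2_diff, x2)
print(f"F (stable coefficients): {f_same:.2f}")
print(f"F (changed slope):       {f_diff:.2f}")
```

The statistic is compared with the F critical value with k and n1 + n2 − 2k degrees of freedom.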
The dummy variable method stands apart from the traditional practice in official
statistics, where price indexes are computed by “formulae”, such as Laspeyres, Paasche,
Fisher, and so forth.
5.3 The characteristics price index method
The motivation for the characteristics price index method comes from the interpretation
of hedonic function coefficients (Triplett, 2004). To construct the characteristics price
index, a regression is carried out for each period. Both the intercept and coefficients of
characteristics are allowed to change across the periods.
$$p_t = c_{t,0} + \sum_{j=1}^{K} c_{t,j} X_{t,j} + \varepsilon_t. \qquad (3)$$
Here pt denotes the price in period t, the subscript j indexes the characteristics, Xt,j is
the value of characteristic j in period t, and ct,j is its implicit price. The intercept term
ct,0 can be interpreted as capturing a group of characteristics not included in the
regression. The Laspeyres price index for period t can be written as:
$$P_L = \frac{c_{t,0} + \sum_{j=1}^{K} c_{t,j} X_{t-1,j}}{c_{t-1,0} + \sum_{j=1}^{K} c_{t-1,j} X_{t-1,j}}. \qquad (4)$$
The Paasche price index for period t can be written as:
$$P_P = \frac{c_{t,0} + \sum_{j=1}^{K} c_{t,j} X_{t,j}}{c_{t-1,0} + \sum_{j=1}^{K} c_{t-1,j} X_{t,j}}. \qquad (5)$$
The Fisher price index for period t can be written as:
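$$P_F = (P_L \cdot P_P)^{1/2} = \left( \frac{c_{t,0} + \sum_{j=1}^{K} c_{t,j} X_{t-1,j}}{c_{t-1,0} + \sum_{j=1}^{K} c_{t-1,j} X_{t-1,j}} \cdot \frac{c_{t,0} + \sum_{j=1}^{K} c_{t,j} X_{t,j}}{c_{t-1,0} + \sum_{j=1}^{K} c_{t-1,j} X_{t,j}} \right)^{1/2}. \qquad (6)$$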
Here traditional price index formulae are combined with the characteristics index method.
Triplett (2004) remarks that the price index for characteristics permits breaking the connection between hedonic functional form and index number functional form. This is a
theoretical as well as a practical advantage.
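The three formulae can be sketched in a few lines once the per-period regression coefficients are available. In this illustration the characteristic quantities are assumed to be period averages, and the coefficient values are invented for the example; they are not estimates from this study.

```python
from math import sqrt

def predicted_price(coefs, x):
    """coefs = [c_0, c_1, ..., c_K]; x = [X_1, ..., X_K]."""
    return coefs[0] + sum(c * xj for c, xj in zip(coefs[1:], x))

def characteristics_indexes(c_prev, c_curr, x_prev, x_curr):
    """Laspeyres, Paasche and Fisher indexes from two sets of implicit prices."""
    laspeyres = predicted_price(c_curr, x_prev) / predicted_price(c_prev, x_prev)
    paasche = predicted_price(c_curr, x_curr) / predicted_price(c_prev, x_curr)
    return laspeyres, paasche, sqrt(laspeyres * paasche)

# Hypothetical implicit prices: intercept, price per square foot, price per bedroom.
c_prev = [50_000, 60, 8_000]   # period t-1
c_curr = [52_000, 66, 8_200]   # period t
# Hypothetical average characteristics: living area (sq ft), bedrooms.
x_prev = [1_800, 3]
x_curr = [1_900, 3]

p_l, p_p, p_f = characteristics_indexes(c_prev, c_curr, x_prev, x_curr)
print(f"Laspeyres: {p_l:.4f}  Paasche: {p_p:.4f}  Fisher: {p_f:.4f}")
```

The Laspeyres index values both sets of implicit prices at the base period characteristics, the Paasche index at the current period characteristics, and the Fisher index is their geometric mean.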
5.4 The hedonic price imputation method
This method is a blend of the hedonic regression approach and the matched model
approach. When the matched model breaks down, the hedonic regression can be used to
impute the missing price or to estimate a hedonic quality adjustment, after which the
matched model approach is applied. Theoretically, this method gives the same results as
the “pure” hedonic methods described above if the same data set is used (Diewert, 2003b).
For this reason this method will not be employed here.
6 Functional Forms
Choosing the functional form is another important issue in hedonic studies. Typically,
analysts use measures of “goodness of fit”, including R², the standard error of the
regression, and so forth, for choosing among functional forms.
6.1 Functional forms
The functional forms considered in the study are listed as follows.
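1. Linear Model:
$$Y = \beta_0 + \sum_{i=1}^{K} \beta_i X_i + u.$$
2. Semilog Model:
$$\log Y = \beta_0 + \sum_{i=1}^{K} \beta_i X_i + u.$$
3. Log-linear Model:
$$\log Y = \beta_0 + \sum_{i=1}^{K} \beta_i \log X_i + u.$$
4. Box-Cox (BC) Model:
$$Y^{(\lambda)} = \beta_0 + \sum_{i=1}^{K} \beta_i X_i + u,$$
where the Box-Cox transformation is defined as
$$Y^{(\lambda)} = \begin{cases} \dfrac{Y^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\ \log Y & \text{if } \lambda = 0. \end{cases}$$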
The linear, the semilog and the log-linear models are nested by the Box-Cox model.
The linear model results if λ equals 1, while a log-linear or semilog model (depending
on how Y is measured) results if λ equals 0. If λ equals −1, the equation will involve
the reciprocal of Y . Except for the polar cases of λ equal to −1, 0 or 1, it is hard to
conceive of situations in which a particular value would be specified a priori (Greene,
1993, p. 239).
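The nesting can be checked numerically with a small sketch of the transformation defined above. The function name and the test value are ours; the code only illustrates the definition and is not the estimation routine used for the tables.

```python
from math import log

def box_cox(y, lam):
    """Box-Cox transformation of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("the Box-Cox transformation requires y > 0")
    if lam == 0:
        return log(y)
    return (y ** lam - 1.0) / lam

y = 5.0
linear_case = box_cox(y, 1.0)       # y - 1: linear in y
log_case = box_cox(y, 0.0)          # log y
near_zero = box_cox(y, 1e-6)        # approaches log y as lam tends to 0
reciprocal_case = box_cox(y, -1.0)  # 1 - 1/y: involves the reciprocal of y
print(linear_case, log_case, near_zero, reciprocal_case)
```

With λ = 1 the transformed variable is Y − 1, so the model is linear in Y up to a shift in the intercept; as λ approaches 0 the transformation converges to log Y.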
6.2 Choosing among functional forms
In hedonic studies, the semilog model and the log-linear model are widely used for the
following reasons. First, the semilog model and the log-linear model usually generate a
better goodness of fit than the linear model based on adjusted R². Second, their
coefficients are easier to interpret than those of the Box-Cox model.
The Box-Cox (1964) model nests the three popular functional forms and has gained
popularity. It usually rejects the linear, the semilog and the log-linear functional forms
(Triplett, 2004). Our test results also confirm this point. It is not known whether the
rejection of the three functional forms results from misspecified variables, omitted
variables, or nonlinearity of the functional form.
In constructing a price index from the results of yearly regressions, we are in effect
performing out-of-sample prediction. The R² and adjusted R² are measures of goodness of
fit, and are particularly useful for evaluating the fit of the model within the sample.
When measuring out-of-sample goodness of fit, other measures, such as Akaike’s Information
Criterion (AIC) and the Schwarz Criterion (SC), are better. The lower the AIC and
SC are, the better the model.
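As an illustration, one common form of the two criteria for a Gaussian linear regression can be computed from the residual sum of squares. The sketch below drops additive constants that are identical across models with the same sample, and the numbers are invented. It is only meaningful for comparing models that share the same dependent variable; comparing, say, a linear and a semilog model additionally requires a Jacobian adjustment to the likelihood, which is not shown here.

```python
from math import log

def aic_sc(rss, n, k):
    """AIC and SC (up to a common constant) for an OLS fit.

    rss: residual sum of squares, n: observations, k: estimated parameters.
    """
    fit_term = n * log(rss / n)
    aic = fit_term + 2 * k
    sc = fit_term + k * log(n)
    return aic, sc

# Hypothetical comparison: adding three regressors lowers the RSS slightly.
aic_small, sc_small = aic_sc(rss=50.0, n=100, k=5)
aic_large, sc_large = aic_sc(rss=49.0, n=100, k=8)
print(f"small model: AIC={aic_small:.2f}, SC={sc_small:.2f}")
print(f"large model: AIC={aic_large:.2f}, SC={sc_large:.2f}")
```

In this made-up case both criteria favour the smaller model, because the gain in fit does not offset the penalty for the extra parameters; SC penalizes them more heavily than AIC whenever log n exceeds 2.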
There are many circumstances in which one is forced to trade off the bias and the variance
of estimators. For example, an estimator with very low variance and some bias may be
more desirable than an unbiased estimator with high variance. One criterion which is
useful in this regard is the goal of minimizing mean square error.
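The mean square error criterion combines both considerations. For an estimator $\hat{\theta}$ of a parameter $\theta$ (our notation, not used elsewhere in the paper), the standard decomposition is:

```latex
\mathrm{MSE}(\hat{\theta})
  = E\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \big[E(\hat{\theta}) - \theta\big]^2 .
```

A slightly biased estimator can therefore have a lower mean square error than an unbiased one if its variance is sufficiently smaller.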
Triplett (2004, p. 187) argues that choosing a functional form to reduce heteroscedasticity
is not a good idea, for two reasons. First, heteroscedasticity does not bias the
coefficient estimates, only their standard errors, and methods for dealing with
heteroscedasticity in regression analysis exist. Accordingly, avoiding heteroscedasticity
need not be a factor in choosing hedonic functional forms.
Second, a hedonic function estimates the relation between the prices of product
varieties and the characteristics embedded in them, and gives us estimated implicit
prices for the characteristics. Those implicit prices are our major interest. Choosing an
empirically inappropriate functional form biases our estimates of the hedonic coefficients,
and thus biases the hedonic price index as well.
Rosen (1974) shows that in theory the functional form of a hedonic function is purely
an empirical issue, to be determined from the analysis of the data.
7 Comparative Analysis of Results
In this section the correlations among the variables are examined first. Then the
Box-Cox test is applied to the annual estimates of the hedonic price equation to find the
best functional form. Criteria for comparing functional forms include the signs of the
coefficients, the values of the coefficients, adjusted R², root MSE, the F-test, AIC, and
SC. The Chow test is used to check for structural changes across adjacent years and across
all years for the semilog model. Some econometric issues are also discussed. Finally,
hedonic RHPIs are computed and compared with the NHPI.
7.1 Correlations
Tables 4, 5, and 6 show the correlation among sprice, livarea, lotarea, bedroom, totalbath,
numgarag and fireplace. The dependent variable sprice is included so that we can see
how closely these independent variables follow the price movement. We suspect that
lot area may play a different role in the price movement for different location. Our
observations include:
1. The correlation coefficient between lotarea and livarea is moderate:
23.8% for the Ottawa city, and 28.2% for the inner suburb. The correlation coefficient
between lotarea and numgarag is 19.0% for the Ottawa city, and 14.4% for
the inner suburb.
2. lotarea is only weakly correlated with the other independent variables. Even within
the inner suburb alone, lotarea has almost nothing to do with the
other independent variables.
3. The correlation coefficient between lotarea and sprice is 7.2% for the Ottawa city,
and 15.4% for the inner suburb.
4. In the regression results, the t-statistic for lotarea is significant at any reasonable
significance level, but the coefficient is very small for all years and all models.
5. The lot area problems may come from the fact that both the location variables and the
lot area variable are used in the same regression, and location matters more than lot
area for house prices.
Multicollinearity does not seem to pose a serious problem in this study. First, the
correlation coefficients among the independent variables are not large. The highest
value, 46.4%, is between livarea and bedroom. A possible reason is that there
are a large number of observations in the data set, which reduces the multicollinearity
among the independent variables. Second, the R² values for all years and all models are
moderate, while the individual t-statistics are all significant for the pooled regression
at any reasonable significance level, and significant for most variables in the annual
regressions.
7.2 Specification test for functional forms
Since only three variables, namely, sprice, livarea, and lotarea, are continuous variables
with very large values, we experiment with different functional forms applied to these
variables. The rest appear in the equation as linear variables. Tables 7 and 8 report the
specification test and the Box-Cox test results for the annual regressions. Highlights of
the result are:
1. Based on root MSE, the Box-Cox model performs the best, and the semilog model
and the log linear model are superior to the linear model.
2. Based on AIC and SC, the Box-Cox model is the best. The log linear model is
slightly better than the semilog model. The linear model gives the worst performance.
3. Based on the Box-Cox test, the linear model, the semilog model and the log
linear model are all rejected, i.e., the likelihood ratio statistics, distributed as χ²(1),
are large enough to reject λ = 0, λ = 1 and λ = −1 for all years at the 5% critical value
of 3.84. As mentioned before, the Box-Cox test usually rejects the other nested models.
Table 9 shows the results for the pooled regression for the semilog model. All the
coefficients have the expected signs. All the location dummy variables have negative signs
because Downtowncore is used as the base category. Houses located in the inner suburb
have the lowest prices, which is reasonable. All the time dummy variables have positive
signs since house prices had been appreciating during these periods compared with the base
year 1996. All the individual t-statistics are significant at the 5% significance level
(the critical value for 36 degrees of freedom is 2.021). The values of the coefficients are
also as expected except for lotarea. Robust standard errors of the estimated coefficients
accounting for heteroskedasticity are obtained, and they are reasonably small compared
with the values of the coefficients. The joint F-statistic is equal to 3,170, well above the
critical value for any reasonable significance level.
Regression results for the semilog model in 2005 are shown in table 10. Most coefficients have the expected signs, such as livarea, lotarea, bedroom, and totalbath. Again
all the location dummy variables have negative signs.
7.3 Specification test for structural change across time
The parameter stability test is based on the semilog model because the tests for functional
forms indicate that it gives reasonable results and the Chow test is simple to perform. For
the Chow test, we run an adjacent year regression which includes one time dummy variable
that distinguishes the two periods. For the pooled regression, there are a total of 9 time
dummy variables.
Based on the Chow test (F test) results in Table 11, structural change is present
even for adjacent years. The highest F value is 2.75 for 1996–1997 and the lowest is
1.23 for 1998–1999. The 1% critical value is 1.00 for very large degrees of
freedom in both the numerator and the denominator. Thus all the test results
are slightly above the critical value. It is not surprising that the test result for the
pooled regression is higher than those for all the adjacent years: since a structural break
is present in the adjacent year regressions, it is all the more present in the pooled
regression for all the years.
The test results suggest that the characteristics price index method, i.e. the separate
regression, is preferred. Both the pooled and the adjacent year regressions are inferior
to it due to the presence of structural change. Nevertheless, the F values are not high
for the adjacent year regressions, which indicates that structural changes in consecutive
years are not serious.
7.4 Heteroskedasticity
Table 12 shows that heteroskedasticity is present for every year for every functional
form except the Box-Cox model from 1997 to 2005. The 5% critical value for χ²(1) is 3.841.
Although the presence of heteroskedasticity does not bias the estimated coefficients, it
does affect the standard errors and therefore the t-statistics. Thus
heteroskedasticity-robust inferences after OLS estimation are applied. Table 9 and table 10
show the regression results with robust standard errors.
7.5 Comparing hedonic RHPIs with NHPI and median RHPI
Figure 1 shows the trends of the NHPI, median RHPI and the hedonic RHPIs by using
the pooled regression approach, the adjacent year regression approach and the characteristics price index approach. For the characteristics price index method, the Laspeyres,
Paasche, and Fisher price indexes are computed.
All five hedonic RHPIs show a pattern of price movement similar to that of the NHPI for
Ottawa, with the NHPI being the lower bound. The main reason may be that the NHPI
covers only new houses built in the suburb, whereas 31% of resale houses are located in the
City of Ottawa, where houses appreciate faster than new houses built in the suburb,
i.e., land value appreciates faster in the city than in the suburb.
All three hedonic methods produce almost identical results. However, we
can still see slight differences among these indexes: 1) the Laspeyres price index forms
the upper limit and the Paasche price index the lower limit; 2) the Fisher price index, the
pooled regression price index and the adjacent year regression price index all lie between
the Laspeyres and Paasche price indexes. These results are consistent with the
index number literature.
The median RHPI gives the most rapid price increase. One possible explanation is
that prices in the high end house market increase less than those of the low end market.
8 Conclusions
Using a data set for the Ottawa area, we have constructed quality-adjusted price indexes using the hedonic method. The Chow test results indicate that structural changes
between adjacent years are mild though statistically significant. The pooled regression
for the semilog model, however, results in a price index that closely matches those from
separate regressions on an annual basis. In fact, the hedonic price indexes are insensitive
to structural changes over the years and to the differences between the Laspeyres- and
Paasche-type formulations.
The Box-Cox analysis rejects the linear, semilog, and log-linear functional forms. It
also suggests that the problem of heteroskedasticity can be mitigated by choosing a
more appropriate functional form. The next step in this project is to compute the price
indexes with the Box-Cox regressions and to test the sensitivity of the price indexes with
respect to the functional form.
References
Anglin, Paul M. and Ramazan Gencay (1996) ‘Semiparametric Estimation of a Hedonic
Price Function’, Journal of Applied Econometrics, Vol 11, No.6, 633-648.
Baldwin, Andrew and Emad Mansour (2003) ‘Different Perspectives on the Rate of Inflation, 1982-2000: The Impact of Homeownership Costs’, Research Paper, Statistics
Canada.
Bailey, M.J., R.F. Muth and H.O. Nourse (1963) ‘A Regression Method for Real Estate
Price Index Construction’, Journal of the American Statistical Association, 58,
933-944, December.
Berndt, Ernst, Ellen R. Dulberger and Neal J. Rappaport (2000) ‘Price and Quality of
Desktop and Mobile Personal Computers: A Quarter Century of History’
Berndt, Ernst, Zvi Griliches, Neal J. Rappaport ‘Econometric Estimates of Price Indexes
for Personal Computers in the 1990s’, Journal of Econometrics 68, 243-268.
Berndt, Ernst R (1991) ‘The Practice of Econometrics: Classic and Contemporary’,
Addison-Wesley Publishing Company, Reading Massachusetts.
Box, G.E.P. and D.R. Cox (1964), ‘An Analysis of Transformations’, Journal of the
Royal Statistical Society, Series B (Methodological), 26(2), 211-52.
Brachinger, H.W. (2002) ‘Statistical Theory of Hedonic Price Indices’. Working Paper from Department of Quantitative Economics, University of Freiburg/Fribourg
Switzerland.
Chow, Gregory C. (1967) ‘Technological Change and the Demand for Computers,’ American Economic Review, 57(5), 1117-1130.
Conniffe, Denis and David Duffy (1999) ‘Irish House Price Indices - Methodological
Issues’, The Economic and Social Review, Vol. 30, No. 4, 403-423.
Diewert, Erwin (2003a) ‘The Treatment of Owner Occupied Housing and Other Durables
in a Consumer Price Index’.
Diewert, Erwin (2003b) ‘Hedonic Regressions: A Consumer Theory Approach,’ in Scanner Data and Price Indexes, Conference on Research in Income and Wealth, Volume 64, Robert C. Feenstra and Matthew D. Shapiro (eds.), National Bureau of
Economic Research, The University of Chicago Press, 317-348.
Diewert, W. Erwin (2004) ‘Elementary Indices,’ in Consumer Price Index Manual: Theory and Practice, Geneva: International Labour Office, Chapter 20, 355-371.
Englund, Peter (1998) ‘Improved Price Indexes for Real Estate: Measuring the Course
of Swedish Housing Prices’, Journal of Urban Economics 44, 171-196.
Fleming, M and J.G.Nellis (1992) ‘Development of Standardized Indices for Measuring House Price Inflation Incorporating Physical and Locational Characteristics’,
Applied Economics 24, 1067-1085.
Greene, William (1993) ‘Econometric Analysis’, A Simon & Schuster Company.
Griliches, Zvi (1961) ‘Hedonic Price Indexes for Automobiles: An Econometric Analysis
of Quality Change’, hearings in the U.S. Congress.
Gudnason, Rosmundur (2004) ‘Market Price Approach to Simple User Cost’, Statistical
Journal of the United Nations, ECE 21, 147-155.
Hwang, Yoon ‘Resale Housing Price Index’, Prices Division, Statistics Canada.
MacDonald, Larry, ‘The Hedonic Price Index Approach: A Pilot Study of the Ottawa-Carleton Region’, Prices Division, Statistics Canada.
MacDonald, Larry (1986) ‘Hedonic Models of Housing: An Examination With Reference
to New Housing in the Ottawa Area’, Prices Division, Statistics Canada.
McDonald, John (1980) ‘The Use of Proxy Variables in Housing Price Analysis,’ Journal
of Urban Economics 7, 75-83.
Moulton, Brent (2001) ‘The Expanding Role of Hedonic Methods in the Official Statistics of the United States’, U.S. Bureau of Economic Analysis.
Poole, Robert (2005) ‘Treatment of Owner-Occupied Housing in the CPI’, U.S. Bureau
of Labor Statistics.
Prud’Homme, Marc, Dimitri Sanga and Holly Shum (2004) ‘From Average Price to
Hedonic Price Indexes: A “Preliminary” Investigation into Various Measures of
Trends in Existing House Prices Using MLS Data for Ottawa’, Prices Division,
Statistics Canada.
Ribe, Martin (2004) ‘Swedish Re-considerations of User-Cost Approaches to Owner Occupied Housing’, Statistical Journal of the United Nations, ECE 21, 139-146.
Rosen, Sherwin (1974) ‘Hedonic Prices and Implicit Markets: Product Differentiation
in Pure Competition’, Journal of Political Economy, 82(1) (January-February),
34-55.
Tong, Zhong Yi and John L. Glasscock (2000) ‘Price Dynamics of Owner-Occupied
Housing in the Baltimore-Washington Area: Does Structure Type Matter?’, Journal of Housing Research. Volume 11, Issue 1.
Triplett, Jack (2004) ‘Handbook on Hedonic Indexes and Quality Adjustments in Price
Indexes’, OECD Publishing.
Yu, Kam and Marc Prud’Homme (2005) ‘Econometric Issues in Hedonic Price Indices:
The Case of Internet Service Providers’.
Table 1: Variable Description

| Variables | Description | Percentage |
|---|---|---|
| sprice | Sale price | |
| livarea | Total square footage of living area in the unit | |
| lotarea | Total square footage of lot area | |
| bedroom | Number of reported bedrooms | |
| totalbath | Number of reported bathrooms | |
| numgarage | Number of garages | |
| fireplace | Number of fireplaces | |
| totalappl | Total number of appliances | |
| age | Age of a unit | |
| age2 | Age squared | |
| brick | If exterior finish is brick then brick = 1; otherwise = 0 | 80.54 |
| newhouse | If unit’s age is zero then newhouse = 1; otherwise = 0 | 2.04 |
| hardwd | If unit has hardwood then hardwd = 1; otherwise = 0 | 59.66 |
| natgas | If heating fuel is natural gas then natgas = 1; otherwise = 0 | 81.99 |
| corner | If unit is on a corner then corner = 1; otherwise = 0 | 6.92 |
| culdesc | If unit is on a cul-de-sac then culdesc = 1; otherwise = 0 | 5.65 |
| patio | If unit has patio then patio = 1; otherwise = 0 | 18.96 |
| shopnrb | If shopping center is nearby then shopnrb = 1; otherwise = 0 | 59.05 |
| centvac | If unit has Central/Built-In Vacuum then centvac = 1; otherwise = 0 | 23.01 |
| pool | If unit has an indoor or outdoor pool then pool = 1; otherwise = 0 | 3.80 |
| whirlbath | If unit has whirlbath then whirlbath = 1; otherwise = 0 | 9.33 |
| sauna | If unit has sauna then sauna = 1; otherwise = 0 | 0.64 |
| aircon | If unit has air conditioning then aircon = 1; otherwise = 0 | 76.48 |
| Downtowncore | If unit is located at downtown core then Downtowncore = 1; otherwise = 0 | 3.28 |
| Ottawasouth | If unit is located at Ottawa south then Ottawasouth = 1; otherwise = 0 | 14.32 |
| Ottawaeast | If unit is located at Ottawa east then Ottawaeast = 1; otherwise = 0 | 3.02 |
| Ottawawest | If unit is located at Ottawa west then Ottawawest = 1; otherwise = 0 | 4.14 |
| Farwest | If unit is located at far west then Farwest = 1; otherwise = 0 | 6.52 |
| suburb | If unit is located at inner suburb then suburb = 1; otherwise = 0 | 68.72 |

Note: The variable Downtowncore is used as a base category for location.
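Table 2: Sample Information I: Number of Observations

| Year | Number of observations |
|---|---|
| Total | 33595 |
| 1996 | 2564 |
| 1997 | 2855 |
| 1998 | 2831 |
| 1999 | 3305 |
| 2000 | 3316 |
| 2001 | 3020 |
| 2002 | 3192 |
| 2003 | 4042 |
| 2004 | 4279 |
| 2005 | 4191 |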
Table 3: Sample Information II: Variable Summary

| Variables | Mean | Std. dev. | Minimum | Maximum |
|---|---|---|---|---|
| sprice | 234397.10 | 91661.71 | 68000 | 799000 |
| livarea | 1416.87 | 471.11 | 506.57 | 4904.66 |
| lotarea | 6983.42 | 5571.27 | 501.60 | 39972.03 |
| bedroom | 3.51 | .72 | 1 | 9 |
| totalbath | 2.62 | .81 | 1 | 6 |
| numgarag | 1.34 | .75 | 0 | 6 |
| fireplace | .93 | .53 | 0 | 6 |
| totalappl | 2.74 | 2.01 | 0 | 11 |
| age | 24.76 | 20.62 | 0 | 190 |
Table 4: Correlation Matrix (Ottawa area) (obs=33595)

| | sprice | livarea | lotarea | bedroom | totalbath | numgarag | fireplace |
|---|---|---|---|---|---|---|---|
| sprice | 1.0000 | | | | | | |
| livarea | 0.5124 | 1.0000 | | | | | |
| lotarea | 0.1051 | 0.2679 | 1.0000 | | | | |
| bedroom | 0.3825 | 0.4640 | 0.0640 | 1.0000 | | | |
| totalbath | 0.4039 | 0.4466 | -0.0788 | 0.4016 | 1.0000 | | |
| numgarag | 0.3322 | 0.3664 | 0.1901 | 0.2543 | 0.5241 | 1.0000 | |
| fireplace | 0.3361 | 0.3492 | 0.0593 | 0.1942 | 0.3561 | 0.3030 | 1.0000 |
Table 5: Correlation Matrix (Ottawa City) (obs=10510)

| | sprice | livarea | lotarea | bedroom | totalbath | numgarag | fireplace |
|---|---|---|---|---|---|---|---|
| sprice | 1.0000 | | | | | | |
| livarea | 0.5087 | 1.0000 | | | | | |
| lotarea | 0.0716 | 0.2381 | 1.0000 | | | | |
| bedroom | 0.3673 | 0.4619 | 0.0687 | 1.0000 | | | |
| totalbath | 0.4721 | 0.5663 | 0.0167 | 0.4492 | 1.0000 | | |
| numgarag | 0.3106 | 0.3830 | 0.1960 | 0.2087 | 0.5050 | 1.0000 | |
| fireplace | 0.3673 | 0.4054 | 0.1328 | 0.2019 | 0.3647 | 0.3019 | 1.0000 |
Table 6: Correlation Matrix (Inner Suburb) (obs=23085)

| | sprice | livarea | lotarea | bedroom | totalbath | numgarag | fireplace |
|---|---|---|---|---|---|---|---|
| sprice | 1.0000 | | | | | | |
| livarea | 0.5370 | 1.0000 | | | | | |
| lotarea | 0.1542 | 0.2819 | 1.0000 | | | | |
| bedroom | 0.4042 | 0.4679 | 0.0638 | 1.0000 | | | |
| totalbath | 0.4366 | 0.3846 | -0.1754 | 0.3817 | 1.0000 | | |
| numgarag | 0.4642 | 0.3672 | 0.1436 | 0.2986 | 0.4584 | 1.0000 | |
| fireplace | 0.3358 | 0.3131 | 0.0227 | 0.1852 | 0.3306 | 0.2799 | 1.0000 |
Table 7: Model Selection Statistics

| Year | Criterion | Linear | Semilog | Log-linear | Box-Cox |
|---|---|---|---|---|---|
| 1996 | R² | 0.60 | 0.66 | 0.66 | 0.66 |
| 1996 | Root MSE | 29941 | 0.14 | 0.14 | 2.0e-05 |
| 1996 | AIC | 60158.45 | -2748.70 | -2774.70 | -47701.06 |
| 1996 | SC | 60322.23 | -2584.91 | -2610.92 | -47548.98 |
| 1997 | R² | 0.61 | 0.67 | 0.67 | 0.66 |
| 1997 | Root MSE | 35846 | 0.16 | 0.16 | 4.2e-05 |
| 1997 | AIC | 68010.76 | -2422.15 | -2464.84 | -49432.71 |
| 1997 | SC | 68177.55 | -2255.36 | -2298.05 | -49271.87 |
| 1998 | R² | 0.62 | 0.67 | 0.67 | 0.66 |
| 1998 | Root MSE | 35407 | 0.16 | 0.16 | 0.00032 |
| 1998 | AIC | 67369.41 | -2270.38 | -2298.38 | -37513.40 |
| 1998 | SC | 67535.97 | -2103.83 | -2131.83 | -37346.84 |
| 1999 | R² | 0.60 | 0.66 | 0.66 | 0.66 |
| 1999 | Root MSE | 38525 | 0.17 | 0.16 | 0.00024 |
| 1999 | AIC | 79202.47 | -2497.96 | -2538.43 | -45694.65 |
| 1999 | SC | 79373.36 | -2327.07 | -2367.54 | -45523.76 |
| 2000 | R² | 0.59 | 0.64 | 0.64 | 0.64 |
| 2000 | Root MSE | 46474 | 0.18 | 0.18 | 0.00032 |
| 2000 | AIC | 80710.06 | -1807.01 | -1827.68 | -43871.63 |
| 2000 | SC | 80881.04 | -1636.03 | -1656.70 | -43700.65 |
| 2001 | R² | 0.60 | 0.64 | 0.65 | 0.64 |
| 2001 | Root MSE | 46463 | 0.17 | 0.17 | 0.00168 |
| 2001 | AIC | 73506.64 | -2062.09 | -2086.29 | -29992.62 |
| 2001 | SC | 73675.01 | -1893.72 | -1917.92 | -29824.26 |
| 2002 | R² | 0.63 | 0.66 | 0.66 | 0.65 |
| 2002 | Root MSE | 47787 | 0.16 | 0.16 | 0.00257 |
| 2002 | AIC | 77870.78 | -2636.26 | -2647.65 | -28998.58 |
| 2002 | SC | 78040.7 | -2466.34 | -2477.73 | -28828.66 |
| 2003 | R² | 0.56 | 0.61 | 0.61 | 0.61 |
| 2003 | Root MSE | 55257 | 0.17 | 0.17 | 0.00055 |
| 2003 | AIC | 99773.93 | -2887.83 | -2895.11 | -49108.3 |
| 2003 | SC | 99950.45 | -2711.30 | -2718.58 | -48931.78 |
| 2004 | R² | 0.60 | 0.63 | 0.63 | 0.63 |
| 2004 | Root MSE | 58025 | 0.17 | 0.17 | 0.00048 |
| 2004 | AIC | 106040.8 | -3166.71 | -3209.84 | -53252.43 |
| 2004 | SC | 106218.9 | -2988.59 | -3031.72 | -53074.31 |
| 2005 | R² | 0.59 | 0.62 | 0.63 | 0.62 |
| 2005 | Root MSE | 60822 | 0.17 | 0.17 | 0.00034 |
| 2005 | AIC | 104255 | -3044.39 | -3074.02 | -54968.25 |
| 2005 | SC | 104432.6 | -2866.85 | -2896.48 | -54790.72 |
Table 8: Box-Cox Test for Functional Forms: LR Statistic, χ²(1)

| Test H0 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
|---|---|---|---|---|---|---|---|---|---|---|
| λ = -1 | 33.75 | 61.58 | 148.89 | 170.76 | 177.51 | 281.07 | 291.52 | 256 | 256.26 | 219.55 |
| λ = 0 (semilog) | 262.71 | 312.01 | 180.38 | 250.52 | 219.07 | 102.39 | 77.98 | 185.81 | 195.24 | 208.77 |
| λ = 1 (linear) | 1666.9 | 2089.9 | 1653.68 | 2148.06 | 1975.65 | 1390 | 1337.04 | 1916.01 | 1946.37 | 1950.68 |
| λ = 0∗ (log-linear) | 45.10 | 26.89 | 55.21 | 53.38 | 82.34 | 41.37 | 119.21 | 56.50 | 39.79 | 34.66 |
| λ (estimated) | -0.73 | -0.68 | -0.51 | -0.54 | -0.52 | -0.37 | -0.33 | -0.46 | -0.64 | -0.49 |

Note: ∗ We first take logs of the independent variables and transform the dependent variable, then test whether λ = 0,
i.e., whether the log-linear model results.
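For readers less familiar with the transformation being tested in Table 8, the following is a minimal sketch of the standard Box-Cox (1964) transform of a positive variable. It is an assumption that the regressions use exactly this parameterisation; the function name is illustrative. The special cases λ = 1, λ = 0 and λ = -1 correspond to the linear, logarithmic and reciprocal forms of the dependent variable.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform of a positive value y for parameter lam."""
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if lam == 0:
        # Limiting case of (y**lam - 1) / lam as lam approaches 0.
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Mean sale price from Table 3, transformed for several values of lambda;
# -0.49 is the 2005 estimate reported in the last row of Table 8.
mean_price = 234397.10
for lam in (-1.0, -0.49, 0.0, 1.0):
    print(lam, box_cox(mean_price, lam))
```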
Table 9: Regression Results from the Pooled Semilog Model
Robust standard errors. Dependent variable: log-sprice.
Number of obs = 33595
F(36, 33558) = 3170.93
Prob > F = 0.0000
R² = 0.7793
Root MSE = 0.1673

| Variable | Coef. | Robust Std. Err. | t | p-value | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|---|---|
| livarea | .0001604 | 3.01e-06 | 53.20 | 0.000 | .0001545 | .0001663 |
| lotarea | 3.38e-06 | 2.51e-07 | 13.45 | 0.000 | 2.89e-06 | 3.87e-06 |
| bedroom | .0431089 | .001757 | 24.54 | 0.000 | .039665 | .0465527 |
| totalbath | .0655097 | .0019035 | 34.42 | 0.000 | .0617788 | .0692407 |
| numgarag | .0864505 | .0020116 | 42.98 | 0.000 | .0825077 | .0903934 |
| fireplace | .0764679 | .0022625 | 33.80 | 0.000 | .0720333 | .0809026 |
| totalappl | .0051004 | .0005139 | 9.92 | 0.000 | .0040931 | .0061077 |
| age | -.0028966 | .0002045 | -14.16 | 0.000 | -.0032975 | -.0024957 |
| age2 | .0000302 | 2.64e-06 | 11.44 | 0.000 | .0000251 | .0000354 |
| brick | .0144303 | .0027302 | 5.29 | 0.000 | .0090791 | .0197816 |
| newhouse | .1221403 | .0072365 | 16.88 | 0.000 | .1079564 | .1363242 |
| hardwd | .0648301 | .0019529 | 33.20 | 0.000 | .0610023 | .0686578 |
| natgas | .0442921 | .0032885 | 13.47 | 0.000 | .0378466 | .0507377 |
| corner | -.0250885 | .0038048 | -6.59 | 0.000 | -.0325459 | -.017631 |
| culdesc | .0199133 | .0043107 | 4.62 | 0.000 | .0114641 | .0283624 |
| patio | .0147051 | .0023288 | 6.31 | 0.000 | .0101407 | .0192696 |
| shopnrb | -.0084993 | .0019344 | -4.39 | 0.000 | -.0122907 | -.0047079 |
| centvac | .0139118 | .0021992 | 6.33 | 0.000 | .0096014 | .0182222 |
| pool | -.0620345 | .003773 | -16.44 | 0.000 | -.0694297 | -.0546393 |
| whirlbath | .0526016 | .0035742 | 14.72 | 0.000 | .0455959 | .0596072 |
| sauna | .0539008 | .0147481 | 3.65 | 0.000 | .024994 | .0828076 |
| aircon | .0469569 | .0025036 | 18.76 | 0.000 | .0420497 | .0518641 |
| Ottawasouth | -.3249044 | .0093408 | -34.78 | 0.000 | -.3432126 | -.3065961 |
| Ottawaeast | -.2384961 | .0129861 | -18.37 | 0.000 | -.2639492 | -.2130429 |
| Ottawawest | -.184406 | .0100027 | -18.44 | 0.000 | -.2040116 | -.1648003 |
| Farwest | -.2556048 | .0096883 | -26.38 | 0.000 | -.2745941 | -.2366154 |
| suburb | -.41067 | .0093715 | -43.82 | 0.000 | -.4290384 | -.3923017 |
| d1997 | .0151599 | .0041171 | 3.68 | 0.000 | .0070902 | .0232296 |
| d1998 | .0236411 | .0041721 | 5.67 | 0.000 | .0154636 | .0318186 |
| d1999 | .0528763 | .0040781 | 12.97 | 0.000 | .0448832 | .0608695 |
| d2000 | .1530157 | .0043156 | 35.46 | 0.000 | .144557 | .1614743 |
| d2001 | .2694625 | .0042921 | 62.78 | 0.000 | .2610499 | .2778751 |
| d2002 | .3736802 | .0040883 | 91.40 | 0.000 | .365667 | .3816934 |
| d2003 | .4350159 | .0040016 | 108.71 | 0.000 | .4271726 | .4428591 |
| d2004 | .488358 | .0039488 | 123.67 | 0.000 | .4806182 | .4960978 |
| d2005 | .5115928 | .0040203 | 127.25 | 0.000 | .5037128 | .5194727 |
| cons | 11.53436 | .0119598 | 964.43 | 0.000 | 11.51092 | 11.5578 |
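The year-dummy coefficients in Table 9 can be turned into index levels with the usual time-dummy conversion for a semilog model, 100 × exp(coefficient), with 1996 as the omitted base year. That this is the conversion used for the pooled index is an assumption, although it reproduces the Pooled column of Table 13 to within rounding. The names below are illustrative.

```python
import math

# Year-dummy coefficients (d1997 ... d2005) from the pooled semilog regression in Table 9.
YEAR_DUMMIES = {
    1997: 0.0151599, 1998: 0.0236411, 1999: 0.0528763,
    2000: 0.1530157, 2001: 0.2694625, 2002: 0.3736802,
    2003: 0.4350159, 2004: 0.4883580, 2005: 0.5115928,
}


def pooled_index(coefficients, base_year=1996):
    """Convert log-price year-dummy coefficients into index levels (base year = 100)."""
    index = {base_year: 100.0}
    for year, coef in coefficients.items():
        index[year] = 100.0 * math.exp(coef)
    return index


index = pooled_index(YEAR_DUMMIES)
for year in sorted(index):
    print(year, round(index[year], 2))
```

No bias correction is applied to the exponentiated coefficients in this sketch; with standard errors as small as those in Table 9, such a correction would change the index only marginally.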
Table 10: Regression Results from the Semilog Model, 2005
Robust standard errors. Dependent variable: log-sprice.
Number of obs = 4191
F(27, 4163) = 219.39
Prob > F = 0.0000
R² = 0.6272
Root MSE = 0.16772

| Variable | Coef. | Robust Std. Err. | t | p-value | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|---|---|
| livarea | .0001478 | 8.30e-06 | 17.79 | 0.000 | .0001315 | .0001641 |
| lotarea | 4.55e-06 | 6.57e-07 | 6.92 | 0.000 | 3.26e-06 | 5.84e-06 |
| bedroom | .0343227 | .0041877 | 8.20 | 0.000 | .0261125 | .0425328 |
| totalbath | .0662561 | .0051032 | 12.98 | 0.000 | .0562511 | .076261 |
| numgarag | .0646113 | .0059326 | 10.89 | 0.000 | .0529802 | .0762424 |
| fireplace | .0684599 | .0062454 | 10.96 | 0.000 | .0562155 | .0807042 |
| totalappl | .0058 | .0013483 | 4.30 | 0.000 | .0031566 | .0084435 |
| age | -.0035642 | .0004198 | -8.49 | 0.000 | -.0043872 | -.0027411 |
| age2 | .0000336 | 4.86e-06 | 6.91 | 0.000 | .0000241 | .0000431 |
| brick | .0100405 | .0073325 | 1.37 | 0.171 | -.0043351 | .0244161 |
| newhouse | .1284248 | .0203979 | 6.30 | 0.000 | .0884341 | .1684156 |
| hardwd | .072675 | .0054843 | 13.25 | 0.000 | .0619229 | .0834271 |
| natgas | .0395919 | .0098599 | 4.02 | 0.000 | .0202612 | .0589226 |
| corner | -.0307366 | .0111669 | -2.75 | 0.006 | -.0526297 | -.0088436 |
| culdesc | .0322124 | .0151644 | 2.12 | 0.034 | .0024821 | .0619426 |
| patio | .0233781 | .0067789 | 3.45 | 0.001 | .0100879 | .0366683 |
| shopnrb | .0112075 | .0062142 | 1.80 | 0.071 | -.0009756 | .0233906 |
| centvac | .0128648 | .0059579 | 2.16 | 0.031 | .0011842 | .0245454 |
| pool | -.0645834 | .0110921 | -5.82 | 0.000 | -.0863298 | -.042837 |
| whirlbath | .0105023 | .0097272 | 1.08 | 0.280 | -.0085682 | .0295727 |
| sauna | .040411 | .0367587 | 1.10 | 0.272 | -.0316556 | .1124776 |
| aircon | .0469674 | .0086054 | 5.46 | 0.000 | .0300963 | .0638385 |
| Ottawasouth | -.3920661 | .0216062 | -18.15 | 0.000 | -.4344259 | -.3497064 |
| Ottawaeast | -.2812344 | .0305785 | -9.20 | 0.000 | -.3411845 | -.2212842 |
| Ottawawest | -.2199742 | .0245603 | -8.96 | 0.000 | -.2681255 | -.1718228 |
| Farwest | -.2970853 | .0230612 | -12.88 | 0.000 | -.3422976 | -.251873 |
| suburb | -.4899332 | .0218399 | -22.43 | 0.000 | -.5327511 | -.4471152 |
| cons | 12.19424 | .0315485 | 386.52 | 0.000 | 12.13239 | 12.2561 |
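The columns of Table 10 are linked by simple arithmetic: the t-statistic is the coefficient divided by its robust standard error, and the 95% confidence interval is the coefficient plus or minus a critical value times the standard error. The sketch below uses the normal approximation 1.96 as the critical value, which is an assumption; the reported intervals presumably rest on the t distribution with 4163 degrees of freedom, so agreement is only to about five decimal places. The function name is illustrative.

```python
def confidence_interval(coef, std_err, critical_value=1.96):
    """Symmetric confidence interval around a coefficient estimate."""
    half_width = critical_value * std_err
    return coef - half_width, coef + half_width


# aircon row of Table 10 (semilog model, 2005).
coef, std_err = 0.0469674, 0.0086054

t_stat = coef / std_err
lower, upper = confidence_interval(coef, std_err)

print(round(t_stat, 2), round(lower, 4), round(upper, 4))
```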
Table 11: Chow Test (F test) for Structural Change

| Model | 96-97 | 97-98 | 98-99 | 99-00 | 00-01 | 01-02 | 02-03 | 03-04 | 04-05 | Pooled |
|---|---|---|---|---|---|---|---|---|---|---|
| Semilog | 2.75 | 1.80 | 1.23 | 1.46 | 2.09 | 2.60 | 2.36 | 1.55 | 1.39 | 4.10 |
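For reference, the textbook Chow F statistic compares the residual sum of squares (RSS) of a regression pooled over two periods with the sum of the RSS from the two separate regressions. The sketch below implements that textbook form; whether the statistics in Table 11 were computed exactly this way (for example, with or without a year dummy in the pooled regression) is an assumption. The sample sizes are the 1996 and 1997 counts from Table 2 and k = 28 is the number of coefficients in Table 10, but the three RSS values are made-up numbers for illustration only, not results from the paper.

```python
def chow_f_statistic(rss_pooled, rss_first, rss_second, n_first, n_second, k):
    """Textbook Chow F statistic for equality of all k coefficients across two samples."""
    rss_unrestricted = rss_first + rss_second
    numerator = (rss_pooled - rss_unrestricted) / k
    denominator = rss_unrestricted / (n_first + n_second - 2 * k)
    return numerator / denominator


# Hypothetical residual sums of squares (illustrative only, not from the paper).
f_stat = chow_f_statistic(
    rss_pooled=123.0, rss_first=50.0, rss_second=72.0,
    n_first=2564, n_second=2855, k=28,
)
print(round(f_stat, 2))
```

The statistic is compared with the critical value of the F distribution with k and n_first + n_second - 2k degrees of freedom.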
Table 12: Breusch-Pagan/Cook-Weisberg Test, χ²(1), for Heteroskedasticity

| Model | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
|---|---|---|---|---|---|---|---|---|---|---|
| Linear | 2314 | 2208 | 1580 | 2082 | 1610 | 1061 | 1242 | 1415 | 1483 | 1391 |
| Semilog | 329 | 285 | 147 | 200 | 159 | 70 | 59 | 115 | 170 | 149 |
| Log-linear | 320 | 241 | 131 | 197 | 151 | 66 | 57 | 105 | 165 | 133 |
| Box-Cox | 12.04 | 2.88 | 0.31 | 0.09 | 0.08 | 0.16 | 0.02 | 1.24 | 2.46 | 0.12 |
Table 13: Housing Price Indexes (Semilog Model), 1996=100

| Year | NHPI | Median | Pooled | Adjacent | Laspeyres | Paasche | Fisher |
|---|---|---|---|---|---|---|---|
| 1996 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| 1997 | 100.60 | 102.15 | 101.53 | 101.61 | 101.60 | 101.66 | 101.63 |
| 1998 | 101.31 | 104.72 | 102.39 | 102.50 | 102.55 | 102.49 | 102.52 |
| 1999 | 103.92 | 108.58 | 105.43 | 105.56 | 105.70 | 105.45 | 105.58 |
| 2000 | 111.57 | 118.86 | 116.53 | 116.75 | 116.83 | 116.70 | 116.77 |
| 2001 | 124.45 | 137.94 | 130.93 | 131.11 | 131.24 | 131.01 | 131.13 |
| 2002 | 134.10 | 154.19 | 145.31 | 145.43 | 145.71 | 145.17 | 145.44 |
| 2003 | 139.13 | 164.79 | 154.50 | 154.90 | 155.54 | 154.06 | 154.79 |
| 2004 | 148.29 | 176.68 | 162.96 | 163.38 | 163.98 | 162.59 | 163.28 |
| 2005 | 155.13 | 183.10 | 166.79 | 167.11 | 167.75 | 166.25 | 167.00 |
Table 14: Housing Price Indexes (Semilog Model), year-to-year change, %

| Year | NHPI | Median | Pooled | Adjacent | Laspeyres | Paasche | Fisher |
|---|---|---|---|---|---|---|---|
| 1997 | 0.60 | 2.15 | 1.53 | 1.61 | 1.60 | 1.66 | 1.63 |
| 1998 | 0.70 | 2.52 | 0.85 | 0.88 | 0.94 | 0.81 | 0.88 |
| 1999 | 2.58 | 3.68 | 2.97 | 2.99 | 3.07 | 2.89 | 2.98 |
| 2000 | 7.36 | 9.47 | 10.53 | 10.60 | 10.53 | 10.67 | 10.60 |
| 2001 | 11.54 | 16.05 | 12.35 | 12.30 | 12.33 | 12.26 | 12.30 |
| 2002 | 7.76 | 11.78 | 10.98 | 10.92 | 11.02 | 10.81 | 10.91 |
| 2003 | 3.75 | 6.87 | 6.33 | 6.51 | 6.75 | 6.12 | 6.43 |
| 2004 | 6.58 | 7.21 | 5.48 | 5.48 | 5.43 | 5.54 | 5.48 |
| 2005 | 4.61 | 3.64 | 2.35 | 2.28 | 2.30 | 2.25 | 2.28 |
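The year-to-year figures in Table 14 follow from the index levels in Table 13 as simple percentage changes. The sketch below recomputes them for the pooled index; the names are illustrative, and differences of about 0.01 percentage points from the published values can arise because the levels used here are already rounded to two decimals.

```python
# Pooled hedonic index levels (1996 = 100), copied from Table 13.
POOLED_INDEX = {
    1996: 100.00, 1997: 101.53, 1998: 102.39, 1999: 105.43, 2000: 116.53,
    2001: 130.93, 2002: 145.31, 2003: 154.50, 2004: 162.96, 2005: 166.79,
}


def year_to_year_change(index):
    """Percentage change of an index between consecutive years."""
    years = sorted(index)
    return {
        current: 100.0 * (index[current] / index[previous] - 1.0)
        for previous, current in zip(years, years[1:])
    }


changes = year_to_year_change(POOLED_INDEX)
for year in sorted(changes):
    print(year, round(changes[year], 2))
```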
Figure 1: Housing Price Indexes (Semilog Model), 1996=100
Figure 2: Housing Price Indexes (Semilog Model), Year to Year