A Critical Examination of Hedonic Analysis of a Regression Model (HARM) and META-ANALYSIS
Albert R. Wilson BSSE, MBA, CRE (Ret)

Regression Model
A model intended to allow an exploration of the hypothetical relationship between possible explanatory variables and the sales price

Regression Model
• Reflection of reality
• The touchstone of that reality? Actual market participants

“Estimated” versus “Predicted”
• Estimated = Sale IN database
• Predicted = Sale NOT IN database

Predicted Sales Prices
At the mean, the variance of a predicted sales price is larger than the variance of an estimated sales price by σ² (the variance in the data)

Mean Confidence Intervals (MCI)
Estimated and Predicted
MCI FOR PREDICTED 4.38 TIMES MCI FOR ESTIMATED

DATABASE EDITING
GARBAGE IN => GARBAGE OUT (GIGO)

Case Example
Influence of the Removal of “Flipping Transactions” on the Predicted Prices for 33 Properties

PREDICTED SALES PRICES
PROPERTY NO.      AS PRESENTED    FLIPS REMOVED    CHANGE
SUM               5,069,239       4,018,112        (1,051,127)
n                 391             379              -12
Adj. R-squared    0.7684          0.7593           -0.0091

Editing and Confirmation of Data
STEP 1: Edit to identify obvious issues (the desk edit)
Case Example
Assessor’s Data: 4,325; R-Squared 0.79
  Removed: 747 (17.3%); R-Squared 0.83
MLS Data: 1,888
  Removed: 779 (44.3%)

Editing and Confirmation of Data
STEP 2: Identify sales that are not appropriate to the analysis

Editing and Confirmation of Data
STEP 3: Sales confirmation
• A values-neutral interview of sale participants
• OBJECT: to elicit the primary factors motivating the conclusion of the sale price
MUST NOT INTRODUCE ANALYST OPINION
THIS IS THE ONLY MEANS OF IDENTIFYING/CONFIRMING THE REASONS FOR A CONCLUDED PRICE

Regression Model Considerations
Faithfully represent:
• Identified concerns of actual market participants
• Restrictions imposed by the data
Estimates of prices are the ONLY VERIFIABLE OUTPUT

Coefficient Calculation
Result of iterative calculations designed to provide the most accurate estimates of sales prices in the database

Coefficient Calculation
Goodness of Fit
• Measures of the Goodness of Fit apply only to the relationship between the estimated and actual sales prices in the database
• They do not apply to the coefficients

Most commonly-cited Goodness-of-Fit Measure
R-Squared (Coefficient of Determination)

R-Squared
• Generally-applied interpretation:
  – R-Squared is the amount of variance “explained” by the model

Low R-Squared Models
Mathematically, as the R-Squared approaches 0.30, it becomes more likely that the model is only measuring random effects

The Omitted and Additional Variable Problem
• Omitting a variable generally increases the magnitude and statistical significance of the remaining coefficients
• Adding a variable generally decreases the magnitude and statistical significance of the remaining coefficients

Illustration of Omitting or Adding a Variable

              Base Model        Added Variable–APN           Omitted Variable–Pool
Variable      Coeff.    t-stat  Coeff.    t-stat  % Change   Coeff.    t-stat  % Change
Intercept     67,370    17.52   -663,632  -8.14   -1085.06%  66,293    17.14   -1.60%
APN                             .023      8.98
Fixtures      2,653     5.39    2,511     5.15    -5.35%     2,886     5.84    8.74%
NoPatio       (12,801)  -7.77   (5,036)   -2.73   -60.66%    (13,451)  -8.13   5.08%
SqFt          40.79     29.23   42.80     30.61   4.93%      41.59     29.72   1.96%
Pool          8,366     6.77    8,908     7.28    6.48%
Garage        19,382    12.90   20,153    13.54   3.98%      19,980    13.24   3.09%
Middle Ring   (16,141)  -11.24  (11,230)  -7.38   -30.43%    (15,276)  -10.61  -5.36%
Inner Ring    (8,875)   -4.52   (7,114)   -3.64   -19.84%    (8,012)   -4.06   -9.72%
2000          207       0.08    1,787     -0.67   763.29%    271       0.10    30.92%
2001          (2,017)   -0.76   665       0.258   -132.97%   (2,028)   -0.76   0.55%
2002          (719)     -0.25   3,976     1.36    -652.99%   (615)     -0.21   -14.46%
2003          7,213     2.67    7,647     2.86    6.02%      7,258     2.71    0.62%
2004          41,149    15.50   40,380    15.37   -1.87%     40,901    15.31   -0.60%
2005          132,077   51.04   130,662   50.93   -1.07%     131,129   50.43   -0.72%
2006          160,367   45.29   159,842   45.63   -0.33%     159,897   44.89   -0.29%
R-Squared     0.83              0.83                         0.83

Consequences of Variable Selection
Including the Assessor’s Parcel Number (APN)

APN Coefficient Value    0.023
t-statistic              8.98
Mean Value               30,834,360
R-Squared                0.83
Mean Sale Price          $211,000

Results in an incremental increase in the sales price of
0.023 (APN Coef.) x 30,834,360 (Mean Value) = $709,190 (Incremental Increase)

Consequences of Variable Selection
Omission of a Variable:
• Removal of “Pool”; present in 38% of properties
  – SQFT Coefficient changed from $40.79 to $41.79
  – Approximately the same t-statistic
• Removal of “Fixtures”; present in 100% of properties
  – SQFT Coefficient changed from $40.79 to $46.50
  – T-statistic = 50.94

Coefficients
Coefficients are simply multipliers for the explanatory variable

Causation in Real Estate
From the Real Estate Appraiser’s perspective:
1. Causation is demonstrated through sales confirmation interviews.
2. Causation is NEVER proven through a regression.

Strengths and Weaknesses
• Can never be better than the data
• Requires significant amount of data: five to 15 or more sales
• Upper limit to the amount of data: too much may be worse than too little
• Guide: Are the sales competitive to the subject?
• Estimate of sales prices most accurate at the mean value of the data
• Variance of a predicted sales price larger than variance of estimated
• Thousands of possible regression models

Further Considerations
• Absent standards, the “Rubber Ruler” may apply
• When recognized and published standards are not used, author must demonstrate the accuracy and reliability of his/her work

Hedonic Analysis
The Hedonic Assumption
The coefficient accurately and only represents the contribution of the declared meaning of the explanatory variable to the sale price

Hedonic Analysis
The validity of the hedonic assumption must be demonstrated

“Revealed Preference”
Idea cannot be supported for real estate

Supporting Literature
Not a single paper demonstrated the validity of the hedonic assumption
PLUS
• NO indication of confirmation of raw data
• NO indication of adherence to any recognized / published standards
• NO indication of confirmation of results with the normal or typical market participant
THE RUBBER RULER EFFECT IS MUCH IN EVIDENCE.

Regression Model Accuracy
If the regression model is inaccurate, then there is no reason to expect the coefficients to be accurate or meaningful. Therefore the HARM cannot be accurate.

CASE EXAMPLE
TO POOL OR NOT TO POOL
• Using the data from the previous case.
• Does a pool influence value?
• By how much?
• The Hedonic Approach: the coefficient is the marginal contribution to value.
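
The hedonic reading of a coefficient can be written out explicitly. What follows is a minimal sketch that assumes a linear additive regression; the symbols P, β, x, and ε are notation introduced here for illustration and do not appear in the slides.

```latex
% Assumed linear model for the sale price of property i
\[
P_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i ,
\qquad
\frac{\partial P_i}{\partial x_{ij}} = \beta_j .
\]
% For a 0/1 pool indicator, with all other variables held fixed
\[
\hat{P}(x_{\text{pool}} = 1) - \hat{P}(x_{\text{pool}} = 0) = \beta_{\text{pool}} .
\]
```

Under this reading the pool coefficient by itself is taken as the answer to “By how much?”. That holds only if the hedonic assumption holds, which is what the case example examines.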
COMBINED POOL AND NO POOLS
Variable                    COEFFICIENT    MEAN VALUES    EXPECTED VALUES
Intercept                   54,089.83      1              54,090
ORIG_FIXTURES               2,805.33       8.73           24,491
ORIG_NOPATIO                -14,116.47     0.34           -4,800
ORIG_POOL                   9,161.98       0.38           3,482
ORIG_SQF                    41.52          2283.62        94,815
ORIG_X_3GARAGE              16,212.83      0.4            6,485
SY2000                      5,980.33       1              5,980
EXPECTED MEAN SALE PRICE                                  184,543
Adj R2                                                    0.8816

COMBINED POOL AND NO POOLS, POOL COEFFICIENT SET TO ZERO
Variable                    COEFFICIENT    MEAN VALUES    EXPECTED VALUES
Intercept                   54,089.83      1              54,090
ORIG_FIXTURES               2,805.33       8.73           24,491
ORIG_NOPATIO                -14,116.47     0.34           -4,800
ORIG_POOL                   9,161.98       0              0
ORIG_SQF                    41.52          2283.62        94,815
ORIG_X_3GARAGE              16,212.83      0.4            6,485
SY2000                      5,980.33       1              5,980
EXPECTED MEAN SALE PRICE                                  181,061
Adj R2                                                    0.8816

TO POOL OR NOT TO POOL (CONT.)
• What are the coefficients if there is no pool?

COMBINED WITH NO POOL VARIABLE
Variable                    COEFFICIENT    MEAN VALUES    EXPECTED VALUES
Intercept                   52788.1063     1              52,788
ORIG_FIXTURES               3,087.8801     8.73           26,957
ORIG_NOPATIO                -14,724.7843   0.34           -5,006
ORIG_SQF                    42.3986        2283.62        96,822
ORIG_X_3GARAGE              16,924.691     0.4            6,770
SY2000                      5,727.7462     1              5,728
EXPECTED MEAN SALE PRICE                                  184,059
Adj R2                                                    0.8790

Comparison
                  POOL VARIABLE INCLUDED    NO POOL VARIABLE
• Orig Fixt       2,805                     3,088
• Orig-nopatio    -14,116                   -14,725
• Orig-pool       9,162                     NA
• Orig-sqf        41.52                     42.40
• Orig-garage     16,213                    16,925
• SY2000          5,980                     5,728
• ESP             $184,543                  $184,059
• R-sq            0.88                      0.88

TO POOL OR NOT TO POOL (CONT.)
• WHAT HAPPENS IF WE CONSIDER A DATABASE WITH POOLS, AND SEPARATELY A DATABASE WITHOUT POOLS?
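
The EXPECTED VALUES columns in the combined tables are each coefficient multiplied by its mean value, and the expected mean sale price is their sum. The sketch below reproduces that arithmetic from the figures on the slides; the dictionary layout and the function name `expected_mean_price` are assumptions made for illustration and are not part of the original analysis.

```python
# Coefficients and mean values transcribed from the
# "COMBINED POOL AND NO POOLS" table: name -> (coefficient, mean value).
COMBINED = {
    "Intercept": (54089.83, 1.0),
    "ORIG_FIXTURES": (2805.33, 8.73),
    "ORIG_NOPATIO": (-14116.47, 0.34),
    "ORIG_POOL": (9161.98, 0.38),
    "ORIG_SQF": (41.52, 2283.62),
    "ORIG_X_3GARAGE": (16212.83, 0.40),
    "SY2000": (5980.33, 1.0),
}


def expected_mean_price(model, overrides=None):
    """Sum coefficient * mean value, optionally replacing some mean values."""
    overrides = overrides or {}
    return sum(
        coef * overrides.get(name, mean)
        for name, (coef, mean) in model.items()
    )


# Expected mean sale price with the pool term at its mean (0.38).
combined_price = expected_mean_price(COMBINED)

# Same model with the pool term contributing nothing.
pool_zeroed_price = expected_mean_price(COMBINED, {"ORIG_POOL": 0.0})

# The gap equals the pool coefficient times the share of properties with pools.
pool_contribution = combined_price - pool_zeroed_price

print(f"combined:    {combined_price:,.0f}")
print(f"pool zeroed: {pool_zeroed_price:,.0f}")
print(f"difference:  {pool_contribution:,.0f}")
```

The results land within a dollar or two of the slide totals (184,543 and 181,061), because the mean values shown on the slides are rounded. The same function can be applied to the other coefficient tables by swapping in their figures.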
WITH POOL ON PROPERTY
Variable                    COEFFICIENT    MEAN VALUES    EXPECTED VALUES
Intercept                   65,957.89      1.00           65,958
ORIG_FIXTURES               2,505.59       9.65           24,179
ORIG_NOPATIO                -15,415.46     0.22           -3,391
ORIG_POOL                   NA             NA             NA
ORIG_SQF                    41.63          2,586.79       107,690
ORIG_X_3GARAGE              15,768.93      0.40           6,308
SY2000                      4,211.37       1.00           4,211
EXPECTED MEAN SALE PRICE                                  204,954
Adj R2                                                    0.8711

WITHOUT POOL ON PROPERTY
Variable                    COEFFICIENT    MEAN VALUES    EXPECTED VALUES
Intercept                   54,993.78      1.00           54,994
ORIG_FIXTURES               2,784.14       8.16           22,719
ORIG_NOPATIO                -14,838.47     0.41           -6,084
ORIG_POOL                   NA             NA             NA
ORIG_SQF                    41.46          2,097.20       86,956
ORIG_X_3GARAGE              16,308.32      0.31           5,056
SY2000                      7,209.87       1.00           7,210
EXPECTED MEAN SALE PRICE                                  170,850
Adj R2                                                    0.8895

POOLS AND NO POOLS SEPARATELY
• ESTIMATED SALE PRICE WITH POOL $204,954
  – R-SQUARED 0.87
• ESTIMATED SALE PRICE W/O POOL $170,850
  – R-SQUARED 0.89

The Coefficient – What Counts?
ALL THAT STATISTICAL SIGNIFICANCE CAN TELL US IS THAT FOR THIS MODEL AND DATABASE THE COEFFICIENT IS A SIGNIFICANT (OR INSIGNIFICANT) MULTIPLIER FOR THE EXPLANATORY VARIABLE. NOTHING MORE.

The Appropriate Standard: Economic Significance
For us, economic significance is determined by what the normal or typical participant considers important to the conclusion of the transaction.

A Criticality:
NOT ONE hedonic analysis encountered to date has actually asked this question:
“What was important to you in concluding your transaction?”

Hedonic Analysis of a Regression Model (HARM) is:
• A highly inaccurate and unreliable method
• Not appropriate for appraisal work
Observations apply to hedonic analysis, NOT regression models!