Quantile Regression
ISQS 5349 – Regression Analysis, Spring 2014
Laurie Corradino, Daniela Sanchez
March 13, 2014

What is Quantile Regression?
• A form of regression analysis designed to estimate models for the conditional median or other conditional quantile functions of the response variable (Y) given the covariates (X’s).
• Allows different slopes/rates of change (β’s) for different quantiles of the response variable (Y) distribution.

Background
• Boscovich proposed median regression in the 18th century.
• Laplace and Edgeworth further investigated that idea.
• Mosteller and Tukey (1977) first stated that functions could be fitted to describe parts of the response variable (Y) distribution other than the mean.
• Quantile regression (other than median) is the work of Roger Koenker and Gilbert Bassett (1978) – University of Illinois.

What is a Quantile?

OLS vs. Quantile Regression
(Hao and Naiman, 2007; Koenker, 2000)

OLS vs. Quantile Regression

Characteristic                    OLS Regression                         Quantile Regression
Assumed distribution for errors   Normal                                 No distribution assumption
Variance assumption               Constant variance (homoscedasticity)   Non-constant variance (heteroscedasticity) accommodated
Linearity assumption              Mean is a linear function of X         Quantile is a linear function of X
Uncorrelated errors assumption    Necessary, but adjustments available   Necessary, but adjustments available

(Cade and Noon, 2003; Hao and Naiman, 2007)

Quantile Regression
Graph adapted from Fitzenberger (2012)

Quantile Regression – March Madness Example

March Madness Example Continued

Why Quantile Regression?
• Teams differ in consistency (different variances).
• Teams’ performance is non-symmetric (non-normal distributions).
• Very high- and low-scoring games occur (outliers).
• Certain gambling opportunities may require predictions for parts of the score distribution other than the mean.
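To make the "What is a Quantile?" idea concrete, here is a minimal sketch of the check (pinball) loss that underlies quantile regression in Koenker and Bassett's formulation. It is written in plain Python purely for illustration and does not use the quantreg package. The function names and the score values are hypothetical, not data from the slides. The point is that the value minimizing the total check loss at level tau is a sample quantile, just as the mean minimizes squared error.

```python
def check_loss(u, tau):
    """Check (pinball) loss for a residual u at quantile level tau.

    Positive residuals are weighted by tau, negative ones by (1 - tau).
    """
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(data, q, tau):
    """Total check loss when q is used as the tau-th quantile of data."""
    return sum(check_loss(y - q, tau) for y in data)


def sample_quantile(data, tau):
    """Return a data point minimizing the total check loss.

    A minimizer can always be found among the observations themselves,
    so a search over the data points is enough for this sketch.
    """
    return min(sorted(data), key=lambda q: total_loss(data, q, tau))


# Hypothetical game scores, made up for illustration only.
scores = [55, 61, 64, 66, 70, 71, 75, 80, 98]

median_score = sample_quantile(scores, 0.5)   # tau = 0.5 targets the median
upper_score = sample_quantile(scores, 0.9)    # tau = 0.9 targets the upper tail
print(median_score, upper_score)
```

With tau = 0.5 the loss is proportional to absolute error, so the minimizer is the median. Moving tau toward 0 or 1 tilts the weights and pulls the minimizer into the lower or upper tail.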
Caveats later controlled for:
• Positive/negative momentum (correlated/dependent errors).
• Single-game scores for both teams are usually similar (dependent errors).

March Madness Example Implementation
• Data on 2,940 games for 232 Division I NCAA teams
• 199 quantiles calculated for each team
• Using past data, score predictions made for each pair of teams in the tournament at each of the 199 quantiles
Note: this model assumes independence of errors, which is unlikely in reality. The paper uses more advanced statistical and quantile regression techniques, as well as survival analysis, to deal with such issues.

R-Code
Labeled parts of the call: Formula, Tau, Method
Method options:
• “br” = simplex method – Barrodale and Roberts (1974)
• “fn” = interior point method – Frisch-Newton (1997)
• “pfn” = Frisch-Newton with pre-processing
• “fnc” = enables linear inequality constraints on the fitted coefficients
• “lasso” = penalized method using the lasso penalty
• “scad” = penalized method using Fan and Li’s smoothly clipped absolute deviation penalty
(He and Wei, 2005); Quantile Regression - R

Comparison of More Common Algorithm Methods
“br”
• Default
• Good for up to several thousand observations
“fn”
• Good for larger problems
“pfn”
• Good for much larger problems
• Similar to “fn” but quicker
(He and Wei, 2005); Quantile Regression – R; Susmel

Methods of Calculating Standard Errors
summary.rq(object, se = " ", …) or summary(object, se = " ", …)
“iid”
• Direct estimation / sparsity estimation
• Computes an estimate of the asymptotic covariance matrix
• Assumes iid errors
“rank”
• Inversion of rank tests
• Assumes iid errors by default, but non-iid errors can be accommodated
• For non-iid errors, set option iid=FALSE
“boot”
• Bootstrap methods
• Pairwise (xy-pair) bootstrap (non-iid allowed)
• Parzen, Wei, and Ying (non-iid allowed)
• Markov Chain Marginal Bootstrap (MCMB)
For a discussion of the methods and their relative advantages/disadvantages, see http://www.econ.uiuc.edu/~roger/research/rqci/rqci.pdf
(He and Wei, 2005); Quantile Regression – R; Susmel

Other Quantile Regression
Applications

Applications
Engineering:
• Building energy consumption vs. temperature/weather and varying levels of end uses (NREL) - Henze et al. (2014)
• Upper and lower control limits desired
Marketing:
• Tourist spending patterns vs. various spending stimuli (e.g., length of stay, job type, gender, age) - Lew and Ng (2012)
• Market segmentation desired
Accounting/Finance:
• Earnings vs. firm size, financial leverage, and R&D expenditures - Li and Hwang (2011)
• Prior research inconclusive regarding the effect of these factors on earnings

On a Practical Note
Is CEO total compensation associated with firm size?
I examine CEO Total Compensation as a function of Total Assets.
• Y = CEO Total Compensation, S&P 1500 firms
• X = Total Assets (size proxy)
• Merged 2012 data downloaded from COMPUSTAT and EXECUCOMP.
• Total Compensation data are in thousands
• Total Assets data are in millions

Quantile Regression
(Koenker and Hallock, 2001)

Quantile Regression: tau = .50

Intercept, tau = .50

Centercept, tau = .50
• When Total Assets is centered at its mean, the intercept becomes a “centercept”: at each quantile, it estimates the conditional quantile of Total CEO Compensation at mean Total Assets.

Interpreting Coefficients?
• Interpreted the same way as ordinary regression coefficients, except that each coefficient describes the change in the given conditional quantile of Y rather than in the conditional mean.
• The quantile regression coefficients on Total Assets are positive, i.e., Total Assets is positively associated with Total Compensation.

Conclusions

References
Cade, B. S., & Noon, B. R. (2003). A gentle introduction to quantile regression for ecologists. Frontiers in Ecology and the Environment, 1(8), 412-420. http://www.fort.usgs.gov/products/publications/21137/21137.pdf
Fitzenberger, B. (2012). Quantile Regression. Universität Linz. http://www.econ.jku.at/members%5CDerntl%5Cfiles%5CPHD%5CFitzenberger_QuantileRegression.pdf
Hao, L., & Naiman, D. Q. (2007). Quantile regression (No. 149). Sage. http://www.sagepub.com/upm-data/14855_Chapter3.pdf
He, X., & Wei, W. (2005). Tutorial on Quantile Regression.
Cached page: http://webcache.googleusercontent.com/search?q=cache:IugoWaFOXoJ:epi.univparis1.fr/servlet/com.univ.collaboratif.utils.LectureFichiergw%3FID_FICHE%3D27872%26OBJET%3D0008%26ID_FICHIER%3D8337 9+&cd=1&hl=en&ct=clnk&gl=us
Koenker, R., & Bassett Jr, G. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society, 33-50.
Koenker, R. W. (2000). Quantile Regression, article prepared for the statistics section of the International Encyclopedia of the Social Sciences. University of Illinois: Urbana-Champaign, IL. http://www.econ.uiuc.edu/~roger/research/rq/rq.pdf
Koenker, R., & Hallock, K. (2001). Quantile regression. Journal of Economic Perspectives, 15(4), 143-156. http://www.econ.uiuc.edu/~roger/research/rq/QRJEP.pdf
Koenker, R., & Bassett Jr, G. W. (2010). March Madness, Quantile Regression Bracketology, and the Hayek Hypothesis. Journal of Business & Economic Statistics, 28(1). http://www.econ.uiuc.edu/~roger/research/bracketology/MM.pdf
Koenker, R. (2011). “Quantile Regression: A Gentle Introduction.” University of Illinois Urbana-Champaign. http://www.econ.uiuc.edu/~roger/courses/RMetrics/L1.pdf
Quantile Regression – R Documentation for Package ‘quantreg’ version 4.30. http://svitsrv25.epfl.ch/Rdoc/library/quantreg/html/rq.html
Susmel, Rauli. “Lecture 10 Robust and Quantile Regression.” Bauer College of Business, University of Houston. http://www.bauer.uh.edu/rsusmel/phd/ec1-25.pdf

References for Noted Discipline-Specific Applications
Henze, G. P., Pless, S., Petersen, A., Long, N., & Scambos, A. T. (2014). Control Limits for Building Energy End Use Based on Engineering Judgment, Frequency Analysis, and Quantile Regression. http://www.nrel.gov/docs/fy14osti/60020.pdf
Lew, A. A., & Ng, P. T. (2012). Using quantile regression to understand visitor spending. Journal of Travel Research, 51(3), 278-288. http://jtr.sagepub.com.lib-e2.lib.ttu.edu/content/51/3/278.full.pdf+html
Li, M., & Hwang, N. (2011).
Effects of Firm Size, Financial Leverage and R&D Expenditures on Firm Earnings: An Analysis Using Quantile Regression Approach. Abacus, 47(2), 182-204. doi:10.1111/j.1467-6281.2011.00338.x http://eds.a.ebscohost.com.lib-e2.lib.ttu.edu/ehost/pdfviewer/pdfviewer?sid=91bf3ebd6f4d-42dd-bb3b-e4818335144b%40sessionmgr4005&vid=2&hid=4110

Questions?
Thank You!
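Supplementary sketch: the opening slides note that quantile regression allows different slopes at different quantiles and accommodates non-constant variance. The plain-Python example below illustrates this on a made-up heteroscedastic data set. It is a brute-force illustration under stated assumptions, not the “br” simplex algorithm or any other quantreg method. It relies on the standard linear-programming fact that, for a one-predictor model, some optimal line passes through at least two observations, so trying every pair of points is enough for tiny data sets. The function name `fit_quantile_line` and the data are hypothetical.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check (pinball) loss for a residual u at quantile level tau."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_quantile_line(xs, ys, tau):
    """Return (intercept, slope) of a line minimizing the total check loss.

    Brute force: evaluate every line through two observations and keep
    the best one. Cost grows quickly with n, so this is only suitable
    for very small illustrative data sets.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line: cannot be written as y = a + b*x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(check_loss(y - (intercept + slope * x), tau)
                   for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Made-up heteroscedastic data: at each x the responses are x, 2x, and 3x,
# so the spread of y grows with x.
xs = [x for x in range(1, 7) for _ in (1, 2, 3)]
ys = [k * x for x in range(1, 7) for k in (1, 2, 3)]

for tau in (0.1, 0.5, 0.9):
    intercept, slope = fit_quantile_line(xs, ys, tau)
    print(f"tau={tau}: intercept={intercept:.2f}, slope={slope:.2f}")
```

Because the spread of y increases with x in this toy data, the fitted slope rises with tau, whereas a single mean regression line would report only one slope.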