Advanced Topics in Regression
• Quantile Regression
• Analysis of Causality
• Mediation Analysis
• Hierarchical Linear Modeling
Compiled by Nick Evangelopoulos, 2013

Part 1: Quantile Regression

Motivation for Quantile Regression
Problem
• ANOVA and regression provide information only about the conditional mean.
• More knowledge about the distribution of the response may be important.
• The covariates may shift not only the location or scale of the distribution; they may affect its shape as well.
Solution
• Quantile regression models the relationship between X and the conditional quantiles of Y given X = x.

Quantile Definition
• Definition: Given p ∈ [0, 1], a pth quantile of a random variable Z is any number ζp such that Pr(Z < ζp) ≤ p ≤ Pr(Z ≤ ζp). A solution always exists, but it need not be unique.
• Example: Suppose Z is equally likely to be any of the eight listed values {3, 4, 7, 9, 9, 11, 17, 21} and p = 0.5. Then Pr(Z < 9) = 3/8 ≤ 1/2 ≤ Pr(Z ≤ 9) = 5/8, so the 50th percentile is equal to 9.

Quantile Regression
• A family of conditional quantiles of Y given X = x.
• The median regression line (p = 0.5) is the least absolute deviations (LAD) fit. It generally differs from the OLS regression line, which models the conditional mean.
• Each quantile function, the median included, is the solution to a linear programming problem.
[Figure: conditional quantile lines of Y against x at 90%, 75%, 50%, 25%, and 10%]

Quantile Regression
Daily High Temperature
A scatter of daily high temperature in Sydney. The red line is the 45-degree line.
[Figure: scatter plot, today's temperature (vertical axis, 0 to 50) against yesterday's temperature (horizontal axis, 0 to 50)]

Quantile Regression
Cool Yesterday (n=259)
[Figure: histogram of Temperature Today (horizontal axis) by Frequency (vertical axis)]

Quantile Regression
Hot Yesterday (n=259)
[Figure: histogram of Temperature Today (horizontal axis) by Frequency (vertical axis)]

Quantile Regression
Quantiles at 0.90, 0.75, 0.50, 0.25, and 0.10.
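As a quick check on the quantile definition given earlier in Part 1, the following minimal sketch tests the inequality Pr(Z < ζ) ≤ p ≤ Pr(Z ≤ ζ) directly on the example values. The helper names `is_pth_quantile` and `sample_quantiles` are hypothetical, and the sketch assumes each listed value is equally likely. Exact fractions are used so that the comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction


def is_pth_quantile(sample, zeta, p):
    """Check Pr(Z < zeta) <= p <= Pr(Z <= zeta) under the empirical distribution."""
    n = len(sample)
    below = Fraction(sum(z < zeta for z in sample), n)
    at_or_below = Fraction(sum(z <= zeta for z in sample), n)
    return below <= p <= at_or_below


def sample_quantiles(sample, p):
    """Return the distinct sample values that qualify as a pth quantile."""
    return [z for z in sorted(set(sample)) if is_pth_quantile(sample, z, p)]


# Example values from the Quantile Definition slide (assumed equally likely).
Z = [3, 4, 7, 9, 9, 11, 17, 21]

print(sample_quantiles(Z, Fraction(1, 2)))  # [9]: the median is unique here
print(sample_quantiles(Z, Fraction(1, 4)))  # [4, 7]: the 0.25 quantile is not unique
```

The second call illustrates the "need not be unique" clause: every number between 4 and 7 also satisfies the inequality for p = 0.25.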
Given yesterday’s temperature, today’s temperature has a conditional distribution that is not symmetric.
[Figure: Temperature Quantiles; today's temperature (vertical axis) against yesterday's temperature (horizontal axis), with the fitted quantile lines]

Quantile Regression Estimation
• The quantile regression coefficients are the solution to
  \min_{\beta} \frac{1}{n} \sum_{i=1}^{n} \left[ p - \tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{sgn}(y_i - x_i'\beta) \right] (y_i - x_i'\beta)    (1)
• The k first-order conditions are
  \frac{1}{n} \sum_{i=1}^{n} \left[ p - \tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{sgn}(y_i - x_i'\hat{\beta}_p) \right] x_i = 0    (2)

Quantile Regression Coefficient Interpretation
  \frac{\partial Q_p(y_i \mid x_i)}{\partial x_{ij}}
• The marginal change in the pth conditional quantile due to a marginal change in the jth element of x.
• There is no guarantee that the ith person will remain in the same quantile after her x is changed.

Quantile Regression Bibliography
• Koenker and Hallock (2001), “Quantile Regression,” Journal of Economic Perspectives, Vol. 15, pp. 143-156.
• Buchinsky (1998), “Recent Advances in Quantile Regression Models,” Journal of Human Resources, Vol. 33, pp. 88-126.
• www.econ.uiuc.edu/~roger
• http://Lib.stat.cmu.edu/R/CRAN

Quantile Regression in SAS
Optional Reading: Colin (Lin) Chen, An Introduction to Quantile Regression and the QUANTREG Procedure, SUGI 30, Paper 213-30.

Part 2: Analysis of Causality
For more information: BUSI 6280
The material presented here is based on a paper by Josef Brüderl (University of Mannheim, Germany).

Panel Data
• Methods for analysis of causality exploit multi-dimensional longitudinal data, a structure typically described in the statistics and econometrics literature as panel data.
• Panel data are defined as a combination of cross-section data, where data on one or more variables are collected at the same point in time, and time-series data, where data are collected at regular time intervals.
• Analysis of panel data will be performed using the TSCSREG procedure in the statistical package SAS (Allison 2005; Mohd Nor & Maarof 2007) and the xtreg command in the statistical package Stata (Brüderl 2005).

References
Allison, P.D. (2005).
Fixed Effects Regression Methods for Longitudinal Data Using SAS. SAS Press.
Brüderl, J. (2005). Panel Data Analysis. University of Mannheim, http://www2.sowi.uni-mannheim.de/lsssm/veranst/Panelanalyse.pdf (accessed October 15, 2012).
Mohd Nor, A. H. S., & Maarof, F. (2007). Panel Data Analysis Using SAS. Proceedings of the 21st Annual SAS Malaysia Forum, 5th September 2007, Kuala Lumpur.
Halaby, C. (2004). Panel Models in Sociological Research. Annual Review of Sociology, 30, 507-544.
Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press.
Wooldridge, J. (2003). Introductory Econometrics: A Modern Approach. Thomson. Chapters 13, 14.

Part 3: Mediation Analysis
For more information: BUSI 6280, EPSY 6270
The material presented here is based on Wikipedia.

Mediation Models
• Mediation is a hypothesized causal chain in which one variable affects a second variable that, in turn, affects a third variable.
• The intervening variable, M, is the mediator. It “mediates” the relationship between a predictor, X, and an outcome, Y.
• a and b: direct effects of X on M and of M on Y, respectively.
• c’: direct effect of X on Y after accounting for M.
[Diagram: X -> M (path a), M -> Y (path b), X -> Y (path c’)]

Baron and Kenny steps
The Baron and Kenny (1986) approach is not the best, but many researchers are still using it.
STEP 1: Conduct a simple regression analysis with X predicting Y to test for path c alone.
c is the total effect of X on Y, without taking M into account. This is not the same as c’ on the previous slide!
[Diagram: X -> Y (path c); M is not in the model]

Baron and Kenny steps
STEP 2: Conduct a simple regression analysis with X predicting M to test the significance of path a alone.
[Diagram: X -> M (path a)]

Baron and Kenny steps
STEP 3: Conduct a simple regression analysis with M predicting Y to test the significance of path b alone.
The purpose of Steps 1-3 is to establish that zero-order relationships among the variables exist.
If one or more of these relationships are non-significant, researchers usually conclude that mediation is not possible or not likely. Assuming there are significant relationships from Steps 1 through 3, proceed to Step 4.
[Diagram: M -> Y (path b)]

Baron and Kenny steps
STEP 4: Conduct a multiple regression analysis with X and M predicting Y.
In Step 4, some form of mediation is supported if the effect of M (path b) remains significant after controlling for X. If X is no longer significant when M is controlled, the finding supports full mediation. If X is still significant, the finding supports partial mediation.
[Diagram: X -> Y (path c’), M -> Y (path b)]

Sobel steps
STEP 1: Conduct a multiple regression analysis with X and M predicting Y:
  Y = b0 + b1 X + b2 M + e
[Diagram: X -> Y (path c’ = b1), M -> Y (path b = b2)]
STEP 2: Conduct a simple regression analysis with X predicting M:
  M = b3 + b4 X + u
[Diagram: X -> M (path a = b4)]
STEP 3: Compute the indirect effect as b_indirect = (b2)(b4).
Significance is best determined using bootstrapping.

SEM approach
• The Structural Equation Modeling (SEM) approach is considered the best for testing mediation effects.
• In SEM, a single mediation model is tested.
• Full mediation and partial mediation models can be compared by fitting both as alternative models. The model with the better fit statistics is the more appropriate.
[Diagrams: full mediation (paths a and b only); partial mediation (paths a, b, and c’)]

References
Baron, R.M. & Kenny, D.A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.
MacKinnon, D.P. (2008). Introduction to statistical mediation analysis. Mahwah, NJ: Erlbaum.
Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. In S. Leinhardt (Ed.), Sociological Methodology (pp. 290-312). Washington DC: American Sociological Association.
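The Sobel steps in Part 3 can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the data are simulated (true a = 0.5, b = 0.7, c’ = 0.3 are made-up values), the helper names are hypothetical, and the interval is a plain percentile bootstrap, which is one of several bootstrap variants and not necessarily the one intended on the slide.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_predictor_ols(x, m, y):
    """Return (b0, b1, b2) for y = b0 + b1*x + b2*m via centered normal equations."""
    n = len(y)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    det = sxx * smm - sxm ** 2
    b1 = (smm * sxy - sxm * smy) / det
    b2 = (sxx * smy - sxm * sxy) / det
    return my - b1 * mx - b2 * mm, b1, b2


def indirect_effect(x, m, y):
    """Sobel Steps 1-3: b_indirect = b2 * b4."""
    _, _, b2 = two_predictor_ols(x, m, y)  # Step 1: Y = b0 + b1 X + b2 M + e
    _, b4 = simple_ols(x, m)               # Step 2: M = b3 + b4 X + u
    return b2 * b4                         # Step 3


def bootstrap_ci(x, m, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect (resampling cases)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated data (assumption): a = 0.5, b = 0.7, c' = 0.3, so a*b = 0.35.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(300)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.7 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

print("indirect effect:", indirect_effect(x, m, y))
print("95% percentile bootstrap interval:", bootstrap_ci(x, m, y))
```

If the interval excludes zero, the indirect effect would usually be read as significant at the chosen level. In OLS the total effect from Baron and Kenny's Step 1 equals c’ plus the indirect effect exactly, which makes a convenient sanity check on the arithmetic.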
Part 4: Hierarchical Linear Modeling
For more information: BUSI 6480, EPSY 6230 (EPSY offered at the UNT College of Education)

Multilevel Models
• Multilevel models are particularly appropriate for research designs where the data for participants are organized at more than one level.
• Analysis of Covariance (ANCOVA)
• Examples include nested designs:
  - Individuals nested within groups
  - Companies nested within industries
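To make "organized at more than one level" concrete, one common way to write a two-level model is the random-intercept form below. The notation is an assumption introduced here for illustration (it does not come from the slides): i indexes individuals (or companies), j indexes groups (or industries), and the group-level term u_{0j} lets each group have its own intercept.

```latex
% Level 1 (individual i within group j):
Y_{ij} = \beta_{0j} + \beta_{1} X_{ij} + e_{ij}

% Level 2 (group j):
\beta_{0j} = \gamma_{00} + u_{0j}

% Combined model:
Y_{ij} = \gamma_{00} + \beta_{1} X_{ij} + u_{0j} + e_{ij}
```

Because every observation in group j shares the same u_{0j}, observations within a group are correlated, which is what an ordinary single-level regression would ignore.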