Handout #9: Understanding the Influence of Individual Observations

Example 9.1: In this example, we will again consider a subset of a more substantial dataset so that we can more easily understand the ideas presented. For this example, we will investigate the Price of used Chevrolets from our CarPrices dataset.

Linear Regression Setup
• Model to be fit using used Chevrolet vehicles only, i.e., Make = Chevrolet, New = No.
• Response Variable: Price
• Predictor Variable: Miles
• Assume the following structure for the mean and variance functions:
  o E(Price | Miles, Make = Chevrolet, New = No) = \beta_0 + \beta_1 \cdot Miles
  o Var(Price | Miles, Make = Chevrolet, New = No) = \sigma^2

An initial plot using the Graph Builder functionality in JMP. As expected, as Miles increases, Price decreases, and as Year increases, so does Price.

Regression Output for E(Price | Miles) = \beta_0 + \beta_1 \cdot Miles

[Scatterplot with simple linear regression line]
[Standard regression output]

Next, consider a plot of the residuals.

Discussion: The functional form of the mean function does not appear to be correct. Discuss.

Something Extra - A Significance Test for Curvature

Cook and Weisberg (1999) discuss a simple statistical test to determine whether or not curvature (statistically) exists in a residual plot. The following outlines this procedure.
• Step #1: Save the predicted values and residuals into the dataset.
• Step #2: Plot the residuals against the predicted values.
• Step #3: Fit a quadratic mean function to investigate possible curvature.

Note: Even with the extreme observation on the left removed, possible issues with curvature may still exist. This is shown in the following plot.

Concepts of Leverage, Outliers, and Influence

In this section, we will consider the potential for an observation to impact our estimated mean function. Continuing with our investigation above, we will consider the effect of the observation at the far right of the graph presented above. This observation had a high number of miles relative to the other used cars in our dataset.

A very simple (and maybe somewhat naïve) approach would be to compare the estimated mean function when this observation is included in the analysis to the estimated mean function when this observation is excluded. This can easily be done in JMP. To exclude an observation in JMP, first mark the row as excluded; then, from the drop-down menu, select Script > Redo Analysis. (A scripted version of this comparison in R appears after the questions below.)

[Regression output including Observation #14]    [Regression output not including Observation #14]

Questions
1. What was the impact of removing Observation #14 on the estimated slope of the regression line? What about the impact on the y-intercept?
2. What was the impact of removing Observation #14 on the estimated variance or standard deviation?
3. Do you think removing this observation has significantly impacted our model?
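The same with/without comparison can be scripted in R. The following is a minimal sketch, under the assumption that the used-Chevrolet subset lives in a data frame named chev with columns Price and Miles, and that the high-mileage car occupies row 14 (matching the JMP row order); these names are illustrative, not part of the handout's dataset.

> fit.all = lm(Price ~ Miles, data = chev)            # fit using all observations
> fit.drop14 = lm(Price ~ Miles, data = chev[-14, ])  # refit without Observation #14
> coef(fit.all)                                       # intercept and slope, all data
> coef(fit.drop14)                                    # intercept and slope, Observation #14 removed
> summary(fit.all)$sigma                              # sigma-hat, all data
> summary(fit.drop14)$sigma                           # sigma-hat, Observation #14 removed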
Concept of Leverage

Observation #14 is an outlier because of its miles, not because of its price. This notion is captured by a concept called leverage. The current Wiki entry for leverage is presented here:

Wiki entry for Leverage: http://en.wikipedia.org/wiki/Leverage_(statistics)

I would guess that the term leverage as commonly used in modeling was borrowed from the concept of a lever in physics.

A visual depiction of leverage within the context of regression is shown next.

[Concept of leverage with one predictor]    [Concept of leverage with two predictor variables]

The formula for leverage when a single predictor is involved can be written out using summation notation. Matrix representation is usually used when one or more predictors are involved in the mean function.

Formula for leverage for a single predictor:

    h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}

Formula for leverage in matrix notation:

    H = X(X'X)^{-1}X'

Comments:
• The matrix H is commonly referred to as the hat matrix.
• The h_i values above are the diagonal elements of H.

The leverage values, i.e., the diagonal elements of H, can be obtained in JMP by selecting Hats from the Save Columns menu. The leverage values for all observations are added as a column in your dataset.

Plotting the diagonal elements of the hat matrix against Miles clearly shows which observations have high leverage and which do not.
• In a model with a single predictor, leverage increases as the distance from the center (\bar{x}) increases.
• In a model with multiple predictors, leverage increases as the distance from the centroid increases.

Getting the leverage values in R can easily be done using the matrix notation.

> x = cbind(rep(1, 38), Miles)                 # design matrix: intercept column and Miles
> xprime = t(x)                                # X'
> xprimex = xprime %*% x                       # X'X
> xprimex.inv = solve(xprimex, diag(2))        # (X'X)^{-1}
> hat.matrix = x %*% xprimex.inv %*% xprime    # H = X(X'X)^{-1}X'
> diag(hat.matrix)                             # the leverage values h_i

Belsley (1980) suggests that observations with

    h_i > \frac{2 \cdot (\text{number of parameters in model})}{n}

be considered high leverage observations. Such observations may have an adverse effect on your estimated mean function, and one should proceed cautiously in this case.

Concept of Outlier

We have already considered a crude approach to identifying outliers in a regression situation.

Crude Outlier Rule: graphically, an observation more than 2 standard deviations away from 0, i.e.,

    residual > 2\hat{\sigma}   or   residual < -2\hat{\sigma}

A more thorough consideration of what constitutes an outlier is presented here. First, consider the following facts for the estimated residual vector:

    E(\hat{\epsilon}) = 0

and

    Var(\hat{\epsilon}) = Var(Y - X\hat{\beta})
                        = Var(Y - X(X'X)^{-1}X'Y)
                        = Var((I - H)Y)
                        = (I - H) \, Var(Y) \, (I - H)'
                        = (I - H) \, \sigma^2

Comments:
• The above expectation holds assuming the model includes an intercept term.
• The last equality is true because (I - H) is a symmetric idempotent matrix. Idempotent implies that when the matrix is multiplied by itself, you retain the same matrix.
• It makes sense that the variability in the estimated residuals should be a function of leverage: as the distance in the x-direction(s) increases, there is more variability in the estimated mean function, and this is reflected in the variability of the estimated residuals.

Task: Verify that (I - H) is indeed a symmetric and idempotent matrix for the model we have been working with; a sketch of this check in R follows.
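One way to carry out this check is to continue from the hat.matrix computed in the R code above (a sketch; it assumes the n = 38 design matrix from that code is still in the workspace):

> ident = diag(38)           # 38 x 38 identity matrix, matching n
> M = ident - hat.matrix     # the (I - H) matrix
> all.equal(M, t(M))         # TRUE if M is symmetric (equals its transpose)
> all.equal(M %*% M, M)      # TRUE if M is idempotent (M times itself returns M)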
Studentized Residuals

Certainly, the determination of whether or not an observation would be considered an outlier depends on the scale of the response, i.e., depends on \hat{\sigma}. The crude approach above simply multiplied this quantity by 2 to identify an outlier. A more traditional approach to the identification of outliers is to standardize the residuals.

Concept of Standardizing a Measurement

To standardize a measurement implies the following transformation:

    \frac{\text{measurement} - \text{mean of measurement}}{\sqrt{\text{variance of measurement}}}

Standardized measurements have the following properties:
• Expectation = 0
• Variance = 1
• Outlier rules:
  o If normal theory holds, |standardized value| > 2, or
  o use the more conservative |standardized value| > 3 for nonnormal situations (via Chebyshev's Inequality).

A studentized residual is computed as follows:

    studres_i = \frac{\hat{\epsilon}_i - 0}{\sqrt{\hat{\sigma}^2 (1 - h_{ii})}}

Comments:
• Any observation with an absolute studentized residual larger than 2 would be considered an outlier.
• This type of residual is sometimes referred to as an internally studentized residual. This is in contrast to an externally studentized residual (or deleted studentized residual), which is discussed below.

Getting studentized residuals in JMP: the output from JMP is placed into a new column.

Questions:
1. Show the calculations for at least one of the studentized residuals in the above dataset. Would this observation be considered an outlier?
2. Which observations would be considered statistical outliers?
3. Do the outliers identified here agree with the outliers identified by the crude approach of 2\hat{\sigma}?

Deleted Studentized Residuals

Deleted studentized residuals give a more holistic perspective of error. In particular, a deleted studentized residual for a particular observation is computed from a model that is estimated with this observation deleted from the dataset. If an observation is withheld in the fitting process, then its residual may reflect a more pure notion of error in the prediction.

A deleted studentized residual is computed as follows:

    delstudres_{(-i)} = \frac{\hat{\epsilon}_i - 0}{\sqrt{\hat{\sigma}^2_{(-i)} (1 - h_{ii})}}

Comments:
• Any observation with an absolute deleted studentized residual larger than 2 would be considered an outlier.
• The deleted studentized residual is sometimes called the externally studentized residual, as the estimate of error is computed externally, i.e., without the observation being investigated.
• The deleted studentized residuals cannot be easily obtained in JMP; however, Beckman and Trussell (1974) provide a way to get from internally to externally studentized residuals via the following relationship. This relationship suggests that if n is substantially large compared to the number of predictors, the externally and internally studentized residuals are similar. In the following, n = number of observations and k = number of predictors in the model:

    delstudres_{(-i)} = studres_i \cdot \sqrt{\frac{(n - 1) - (k + 1)}{n - (k + 1) - (studres_i)^2}}

Cook's Distance

Cook (1977) developed a separate measure of the effect of an individual observation by combining the magnitude of the internally studentized residual with the magnitude of the leverage for this observation. This statistic is simply referred to as Cook's Distance or Cook's D:

    D_i = \frac{(studres_i)^2}{(k + 1)} \cdot \frac{h_i}{(1 - h_i)}

where the first factor captures the residual and the second the leverage, and
• studres_i = internally studentized residual
• h_i = leverage
• k = number of predictors in model

Suggested Rules for Cook's Distance
• An observation whose Cook's Distance is substantially larger than the others should be investigated.
• Cook suggests it is always important to investigate observations whose Cook's D > 1.
• Others have suggested observations with Cook's D > 4/n should be investigated further.

To obtain Cook's Distance values in JMP, select Save Columns > Cook's D Influence from the red drop-down menu in the Fit Model output window.
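For reference, all of the diagnostics in this handout are also built into R's stats package. The sketch below reuses the hypothetical fit.all object from the earlier sketch:

> rstandard(fit.all)             # internally studentized residuals
> rstudent(fit.all)              # externally (deleted) studentized residuals
> hatvalues(fit.all)             # leverage values, the diagonal of H
> d = cooks.distance(fit.all)    # Cook's Distance for each observation
> which(d > 4 / length(d))       # flag observations using the D > 4/n rule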