Handout #9: Understanding the Influence of Individual Observations
Example 9.1: In this example, we will again consider a subset of a more substantial dataset so that we can more easily understand the ideas presented. For this example, we will investigate the Price of a used Chevrolet from our CarPrices dataset.
Linear Regression Setup
• Model to be fit using used Chevrolet vehicles only, i.e., Make = Chevrolet, New = No.
• Response Variable: Price
• Predictor Variable: Miles
• Assume the following structure for the mean and variance functions:
  o $E(Price \mid Miles, Make = Chevrolet, New = No) = \beta_0 + \beta_1 \cdot Miles$
  o $Var(Price \mid Miles, Make = Chevrolet, New = No) = \sigma^2$
An initial plot was made using the Graph Builder functionality in JMP. As expected, Price decreases as Miles increases, and Price increases as Year increases.
Regression Output for $E(Price \mid Miles) = \beta_0 + \beta_1 \cdot Miles$
Scatterplot with simple linear regression line.
Standard Regression Output
Next, consider a plot of the residuals.
Discussion: The functional form of the mean function does not appear to be correct. Discuss.
Something Extra - A Significance Test for Curvature
Cook and Weisberg (1999) discuss a simple statistical test to determine whether statistically significant curvature exists in a residual plot. The following outlines this procedure.
• Step #1: Save the predicted values and residuals into the dataset.
• Step #2: Plot the residuals against the predicted values.
• Step #3: Fit a quadratic mean function to investigate possible curvature.
Note: Even with the extreme observation on the left removed, possible issues with curvature may still exist. This is shown in the following plot.
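For readers who want to try this test in R rather than JMP, here is a minimal sketch of the three steps above. The data frame name CarPrices and the column names Price and Miles are assumptions based on this example; a statistically significant quadratic term in the final fit signals curvature.

> fit = lm(Price ~ Miles, data = CarPrices)    # simple linear regression fit
> CarPrices$pred = fitted(fit)                 # Step 1: save predicted values
> CarPrices$res = resid(fit)                   # Step 1: save residuals
> plot(res ~ pred, data = CarPrices)           # Step 2: residual plot
> quad.fit = lm(res ~ pred + I(pred^2), data = CarPrices)   # Step 3: quadratic fit
> summary(quad.fit)                            # test the quadratic coefficient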
Concepts of Leverage, Outliers, and Influence
In this section, we will consider the potential for an observation to impact our estimated mean function.
Continuing with our investigation above, we will consider the effect of the observation at the far right
of the graph presented above. This observation had a high number of miles relative to other used cars
in our dataset.
A very simple (and maybe somewhat naïve) approach would be to simply compare the estimated mean
function when this observation is included in the analysis to the estimated mean function when this
observation is not included in the analysis. This can easily be done in JMP.
To exclude an observation in JMP: from the drop-down menu, select Script > Redo Analysis.
Including Observation #14
Not including Observation #14
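The same with/without comparison can be sketched in R. The data frame CarPrices (restricted to used Chevrolets) and the assumption that Observation #14 sits in row 14 are illustrative naming choices, not part of the original analysis.

> fit.all = lm(Price ~ Miles, data = CarPrices)           # all observations
> fit.wo14 = lm(Price ~ Miles, data = CarPrices[-14, ])   # observation #14 removed
> coef(fit.all); coef(fit.wo14)       # compare y-intercepts and slopes
> sigma(fit.all); sigma(fit.wo14)     # compare estimated standard deviations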
Questions
1. What was the impact of removing Observation #14 on the estimated slope of the regression
line? What about the impact on the y-intercept?
2. What was the impact of removing Observation #14 on the estimated variance or standard
deviation?
3. Do you think removing this observation has significantly impacted our model?
Concept of Leverage
Observation #14 is an outlier because of its miles, not because of its price. This notion is captured by a concept called leverage. The current Wikipedia entry for leverage is linked here.
Wiki entry for Leverage
http://en.wikipedia.org/wiki/Leverage_(statistics)
I would guess that the term leverage as commonly used in modeling was borrowed from the concept of
a lever in physics.
A visual depiction of leverage within the context of regression is shown next.
Concept of leverage with one predictor
Concept of leverage with 2 predictor variables
The formula for leverage when a single predictor is involved can be written using summation notation. Matrix representation is typically used when more than one predictor is involved in the mean function.
Formula for leverage for a single predictor:

$$h_i = \left( \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)$$

Formula for leverage in matrix notation:

$$\boldsymbol{H} = \boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'$$

Comments:
• The matrix $\boldsymbol{H}$ is commonly referred to as the hat matrix.
• The $h_i$ values presented above are the diagonal elements of $\boldsymbol{H}$.
The leverage values, i.e., the diagonal elements of $\boldsymbol{H}$, can be obtained in JMP by selecting Hats from the Save Columns menu.
The leverage values for all observations are added as a column in your dataset.
Plotting the diagonal elements of the hat matrix against Miles clearly shows which observations have
high leverage and which do not.
In a model with a single predictor, leverage increases as the distance from the center increases.
In a model with multiple predictors, leverage increases as the distance from the centroid increases.
Getting the leverage values in R can easily be done using matrix notation.

> x = cbind(rep(1, 38), Miles)             # design matrix: column of 1's and Miles
> xprime = t(x)                            # X'
> xprimex = xprime %*% x                   # X'X
> xprimex.inv = solve(xprimex, diag(2))    # (X'X)^(-1)
> hat.matrix = x %*% xprimex.inv %*% xprime    # H = X(X'X)^(-1)X'
> diag(hat.matrix)                         # the leverage values h_i
Belsley (1980) suggests that observations with

$$h_i > 2 \cdot \frac{\#\ \text{of parameters in model}}{n}$$

be considered high leverage observations. Such observations may have an adverse effect on your estimated mean function, and one should proceed cautiously in this case.
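R's built-in hatvalues() function returns the leverage values directly from a fitted model, so Belsley's rule can be sketched as follows (the CarPrices data frame is again an assumption):

> fit = lm(Price ~ Miles, data = CarPrices)
> h = hatvalues(fit)            # diagonal elements of the hat matrix
> p = length(coef(fit))         # number of parameters in the model
> n = nobs(fit)                 # number of observations
> which(h > 2 * p / n)          # observations flagged as high leverage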
Concept of Outlier
We have already considered a crude approach to identifying outliers in a regression situation.
Crude Outlier Rule
Graphically, more than 2 standard deviations away from 0:

$$Residual > 2\hat{\sigma} \quad \text{or} \quad Residual < -2\hat{\sigma}$$
A more thorough consideration of what constitutes an outlier is presented here. First, consider the following facts about the estimated residual vector.
𝐸(πœΊΜ‚) = 𝟎
and
π‘‰π‘Žπ‘Ÿ(πœΊΜ‚) =
=
=
=
=
Μ‚)
π‘‰π‘Žπ‘Ÿ(𝒀 − π‘Ώπœ·
π‘‰π‘Žπ‘Ÿ(𝒀 − 𝑿(𝑿′ 𝑿)−𝟏 𝑿′ 𝒀)
π‘‰π‘Žπ‘Ÿ( (𝑰 − 𝑯) 𝒀 )
(𝑰 − 𝑯) ∗ 𝑽𝒂𝒓(𝒀) ∗ (𝑰 − 𝑯)′
(𝑰 − 𝑯) ∗ 𝜎 2
Comments:
• The above expectation holds assuming the model includes an intercept term.
• The last equality is true because $(\boldsymbol{I} - \boldsymbol{H})$ is a symmetric idempotent matrix. Idempotent implies that when the matrix is multiplied by itself, you retain the same matrix.
• It makes sense that the variability in the estimated residuals should be a function of leverage: as the distance in the x-direction(s) increases, there is more variability in the estimated mean function, and this is reflected in the variability of the estimated residuals.
Task: Verify that $(\boldsymbol{I} - \boldsymbol{H})$ is indeed a symmetric and idempotent matrix for the model we have been working with.
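A sketch of this verification in R, reusing the hat.matrix object computed in the matrix code above; both checks should return TRUE up to rounding error.

> IminusH = diag(nrow(hat.matrix)) - hat.matrix   # I - H
> all.equal(IminusH, t(IminusH))                  # symmetric?
> all.equal(IminusH %*% IminusH, IminusH)         # idempotent?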
Studentized Residuals
Certainly, the determination of whether or not an observation would be considered an outlier depends on the scale of the response, i.e., it depends on $\hat{\sigma}$. The crude approach above simply multiplied this quantity by 2 to identify an outlier. A more traditional approach to the identification of outliers is to standardize the residuals.
Concept of Standardizing a Measurement
To standardize a measurement implies the following transformation:

$$\frac{Measurement - Mean\ of\ Measurement}{\sqrt{Variance\ of\ Measurement}}$$
Standardized measurements have the following properties:
• Expectation = 0
• Variance = 1
• Outlier Rules
  o If normal theory holds, flag $|Standardized\ Value| > 2$, or
  o use the more conservative $|Standardized\ Value| > 3$ for nonnormal situations (via Chebyshev's Inequality).
A studentized residual is computed as follows.
$$Studentized\ Residual_i = student\ r_i = \frac{\hat{e}_i - 0}{\sqrt{\hat{\sigma}^2 (1 - h_{ii})}}$$
Comments:
• Any observation with an absolute studentized residual larger than 2 would be considered an outlier.
• This type of residual is sometimes referred to as an internally studentized residual. This is in contrast to an externally studentized residual (or deleted studentized residual), which is discussed below.
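In R, the internally studentized residuals are returned by the built-in rstandard() function; a brief sketch (CarPrices assumed as before):

> fit = lm(Price ~ Miles, data = CarPrices)
> student.r = rstandard(fit)        # internally studentized residuals
> which(abs(student.r) > 2)         # observations flagged as outliers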
Getting Studentized residuals in JMP
The output from JMP is placed into a new column.
Questions:
1. Show the calculations for at least one of the studentized residuals in the above dataset. Would this observation be considered an outlier?
2. Which observations would be considered statistical outliers?
3. Do the outliers identified here agree with the outliers identified by the crude approach of $2\hat{\sigma}$?
Deleted Studentized Residuals
Deleted studentized residuals give a more holistic perspective of error. In particular, the deleted studentized residual for a particular observation is computed from a model that is estimated with this observation deleted from the dataset. If an observation is withheld from the fitting process, then its residual may reflect a purer notion of the error in prediction.
A deleted studentized residual is computed as follows.
$$Deleted\ studentized\ residual_{(-i)} = deleted\ student\ r_{(-i)} = \frac{\hat{e}_i - 0}{\sqrt{\hat{\sigma}^2_{(-i)} (1 - h_{ii})}}$$
Comments:
• Any observation with an absolute deleted studentized residual larger than 2 would be considered an outlier.
• The deleted studentized residual is sometimes called the externally studentized residual, as the estimate of error is computed externally, i.e., without the observation being investigated.
• The deleted studentized residuals cannot be easily obtained in JMP; however, Beckman and Trussell (1974) provide a way to get from internally to externally studentized residuals via the following relationship. This relationship suggests that if n is substantially large compared to the number of predictors, then the externally and internally studentized residuals are similar. In the following, n = number of observations and k = number of predictors in the model.
$$deleted\ student\ r_{(-i)} = student\ r_i \cdot \sqrt{\frac{(n-1)-(k+1)}{n-(k+1)-(student\ r_i)^2}}$$
Cook's Distance
Cook (1977) developed a separate measure of the effect of an individual observation by combining the magnitude of the internally studentized residual with the magnitude of the leverage for this observation. This statistic is simply referred to as Cook's Distance or Cook's D.
πΆπ‘œπ‘œπ‘˜ ′ 𝑠 𝐷𝑖 =
(𝑠𝑑𝑒𝑑𝑒𝑛𝑑 π‘Ÿπ‘– )2
β„Žπ‘–
∗
(1 − β„Žπ‘– )
⏟ (π‘˜ + 1)
⏟
π‘Ÿπ‘’π‘ π‘–π‘‘π‘’π‘Žπ‘™
π‘™π‘’π‘£π‘’π‘Ÿπ‘Žπ‘”π‘’
where
ο‚·
ο‚·
ο‚·
π‘Ÿπ‘– = π‘–π‘›π‘‘π‘’π‘Ÿπ‘›π‘Žπ‘™π‘™π‘¦ 𝑠𝑑𝑒𝑑𝑒𝑛𝑑𝑖𝑧𝑒𝑑 π‘Ÿπ‘’π‘ π‘–π‘‘π‘’π‘Žπ‘™
β„Žπ‘– = π‘™π‘’π‘£π‘’π‘Ÿπ‘Žπ‘”π‘’
π‘˜ = π‘›π‘’π‘šπ‘π‘’π‘Ÿ π‘œπ‘“ π‘π‘Ÿπ‘’π‘‘π‘–π‘π‘‘π‘œπ‘Ÿπ‘  𝑖𝑛 π‘šπ‘œπ‘‘π‘’π‘™
Suggested Rules for Cook's Distance
• An observation whose Cook's Distance is substantially larger than the others should be investigated.
• Cook suggests it is always important to investigate observations whose Cook's D > 1.
• Others have suggested observations with Cook's D > 4/n should be investigated further.
To obtain Cook's Distance values in JMP, select Save Columns > Cook's D Influence from the red triangle drop-down menu in the Fit Model output window.
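In R, Cook's Distance values come from the built-in cooks.distance() function; a minimal sketch using the fit from earlier:

> d = cooks.distance(fit)        # Cook's D for each observation
> which(d > 4 / nobs(fit))       # rule of thumb: investigate D > 4/n
> plot(d, type = "h")            # index plot of the Cook's D values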