Centering, Scaling, and Standardizing Data and the Change of Coordinates for Model Parameters

Centering and standardizing are common data transformations. Changes to the coordinate system of the data space induce a change in the coordinate system of the parameter space that is fairly simple where the parameter space is a tensor product of the components of the data space. New parameters can be calculated as a linear transformation of the original parameters.

Recentering a variable

When we recenter data, we move the zero coordinate:

$$x^c = x - c$$

where the superscript $c$ indicates that $x$ has been recentered, and $c$ is some arbitrary constant. Very often we will simply center $x$ at its mean, $\mu_x$:

$$\Delta x = x - \mu_x$$

Rescaling a variable

Similarly, when we rescale data, we change the quantity considered a unit of measure:

$$x^s = x / s$$

where again $s$ is an arbitrary constant. Very often we rescale in units of standard deviation, $\sigma_x$.

Standardizing a variable

Because the concept of standard deviation is tied to the mean, we will very often combine recentering and rescaling as standardizing a variable:

$$x^z = \frac{\Delta x}{\sigma_x} = \frac{x - \mu_x}{\sigma_x}$$

Centering and Standardizing Additive Models

Suppose we are modeling a regression outcome, $y$, as a function of a variable $x_1$, so

$$y = \beta_0 + \beta_1 x_1 + \epsilon$$

If we center the data in $x_1$, so that $\Delta x_1 = x_1 - c_1$, then we can model

$$y = \beta_0^c + \beta_1^c \Delta x_1 + \epsilon^c = (\beta_0^c - \beta_1^c c_1) + \beta_1^c x_1 + \epsilon^c$$

With the second model expressed in terms of $x_1$, we see that

$$\epsilon = \epsilon^c, \qquad \beta_1 = \beta_1^c, \qquad \beta_0 + \beta_1 c_1 = \beta_0^c$$

The change in parameters from the first model to the second is a linear transformation, and can be expressed in matrix form as

$$\begin{bmatrix} \beta_0^c \\ \beta_1^c \end{bmatrix} = \begin{bmatrix} 1 & c_1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$$

Let's call the transition matrix $C_{x_1} = \begin{bmatrix} 1 & c_1 \\ 0 & 1 \end{bmatrix}$. It is worth noting that we could "center" our data at any arbitrary point on the $x_1$ scale by substituting any constant for $c_1$.

We can easily extend this to models with more independent variables. The model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$

can be centered as

$$y = (\beta_0^c - \beta_1^c c_1 - \beta_2^c c_2) + \beta_1^c x_1 + \beta_2^c x_2 + \epsilon^c$$

with

$$\begin{bmatrix} \beta_0^c \\ \beta_1^c \\ \beta_2^c \end{bmatrix} = \begin{bmatrix} 1 & c_1 & c_2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}$$

This form for the centering transition matrix is characteristic of additive models: the intercept is shifted, and all other coefficients remain unchanged.

Returning to our simple model with one independent variable, we can also standardize $x_1$, so that $x_1^z = \Delta x_1 / s_1$:

$$y = \beta_0^z + \beta_1^z x_1^z + \epsilon^z = \beta_0^z + \beta_1^z \frac{\Delta x_1}{s_1} + \epsilon^z$$

which eventually gives us

$$\begin{bmatrix} \beta_0^z \\ \beta_1^z \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & s_1 \end{bmatrix} \begin{bmatrix} \beta_0^c \\ \beta_1^c \end{bmatrix}$$

Calling this new transition matrix $S_{x_1}$, we see that

$$\boldsymbol{\beta}^z = S_{x_1} C_{x_1} \boldsymbol{\beta}$$

Centering and Standardizing Models with Interactions

Now let's consider a slightly more complicated model where we add a second variable $x_2$ and consider also the interaction effect of $x_1$ and $x_2$, formed in the usual way as $x_1 x_2$, that is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon$$

Here, the parameter space for our expanded model is formed as a tensor product of the parameter spaces for $x_1$ and $x_2$. If we wish to center both $x_1$ and $x_2$ we then have two transition matrices, $C_{x_1}$ and $C_{x_2}$, and the transition matrix for the parameter space is now the direct product of the two [citation]:

$$C_{x_2} \otimes C_{x_1} = \begin{bmatrix} 1 & c_2 \\ 0 & 1 \end{bmatrix} \otimes \begin{bmatrix} 1 & c_1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & c_1 & c_2 & c_1 c_2 \\ 0 & 1 & 0 & c_2 \\ 0 & 0 & 1 & c_1 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

In other words (and expressed in matrix form),

$$\boldsymbol{\beta}^c = (C_{x_2} \otimes C_{x_1}) \boldsymbol{\beta}$$

And this may be extended to include more variables in the natural way.
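To make the Kronecker-product relationship concrete, here is a minimal numerical sketch in Python. The simulated data, the centering constants, and the helper `fit` are illustrative assumptions, and NumPy's `lstsq` simply stands in for whatever regression routine one prefers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(10, 2, n)
x2 = rng.normal(-3, 5, n)
y = 1.0 + 0.5 * x1 - 2.0 * x2 + 0.25 * x1 * x2 + rng.normal(0, 1, n)

def fit(*cols):
    """OLS with an intercept; returns the coefficient vector (b0, b1, ...)."""
    X = np.column_stack([np.ones(n), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Fit the interaction model on the original scales: beta = (b0, b1, b2, b3)
beta = fit(x1, x2, x1 * x2)

# Center each variable at an arbitrary constant and refit
c1, c2 = 10.0, -3.0
beta_c = fit(x1 - c1, x2 - c2, (x1 - c1) * (x2 - c2))

# Per-variable centering transition matrices
C_x1 = np.array([[1.0, c1], [0.0, 1.0]])
C_x2 = np.array([[1.0, c2], [0.0, 1.0]])

# The parameter-space transition matrix is the Kronecker (direct) product
C = np.kron(C_x2, C_x1)

print(np.allclose(beta_c, C @ beta))  # True: beta^c = (C_x2 kron C_x1) beta
```

The same kind of check works for the purely additive model, where the transition matrix reduces to the intercept-shifting form shown above.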
Standardizing variables works in the same way. We start with

$$x_1^z = \frac{x_1 - c_1}{s_1} = \frac{x_1^c}{s_1}$$

If we take the transition matrix

$$S_{x_1} = \begin{bmatrix} 1 & 0 \\ 0 & s_1 \end{bmatrix}$$

(again, note that we could scale our parameter by any arbitrary constant), then

$$\boldsymbol{\beta}^z = S_{x_1} C_{x_1} \boldsymbol{\beta}$$

or, more fully,

$$\begin{bmatrix} \beta_0^z \\ \beta_1^z \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & s_1 \end{bmatrix} \begin{bmatrix} 1 & c_1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$$

For the two-variable model we can form the full transition matrix in either of two equivalent ways,

$$(S_{x_2} \otimes S_{x_1})(C_{x_2} \otimes C_{x_1}) = (S_{x_2} C_{x_2}) \otimes (S_{x_1} C_{x_1})$$

where

$$S_{x_2} \otimes S_{x_1} = \begin{bmatrix} 1 & 0 \\ 0 & s_2 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & s_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & s_1 & 0 & 0 \\ 0 & 0 & s_2 & 0 \\ 0 & 0 & 0 & s_1 s_2 \end{bmatrix}$$

Again, this generalizes to more variables.

Variable-standardization or term-standardization

We may standardize each variable:

$$y = \beta_0^z + \beta_1^z x_1^z + \beta_2^z x_2^z + \beta_3^z x_1^z x_2^z + \epsilon^z$$

or standardize each term:

$$y = \beta_0^z + \beta_1^z x_1^z + \beta_2^z x_2^z + \beta_3^z (x_1 x_2)^z + \epsilon^z$$

In the first case we have

$$x_1^z x_2^z = \frac{(x_1 - \mu_{x_1})(x_2 - \mu_{x_2})}{\sigma_{x_1} \sigma_{x_2}}$$

In the latter case we have

$$(x_1 x_2)^z = \frac{x_1 x_2 - \mu_{x_1 x_2}}{\sigma_{x_1 x_2}}$$

Consider five models, which differ in how the independent variables are centered and scaled.

(1) Original scales: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon$

(2) Centered, variable-wise: $y = \beta_0^{cv} + \beta_1^{cv} x_1^c + \beta_2^{cv} x_2^c + \beta_3^{cv} x_1^c x_2^c + \epsilon^c$

(3) Centered, term-wise: $y = \beta_0^{ct} + \beta_1^{ct} x_1^c + \beta_2^{ct} x_2^c + \beta_3^{ct} (x_1 x_2)^c + \epsilon^c$

(4) Standardized, variable-wise: $y = \beta_0^{zv} + \beta_1^{zv} x_1^z + \beta_2^{zv} x_2^z + \beta_3^{zv} x_1^z x_2^z + \epsilon^z$

(5) Standardized, term-wise: $y = \beta_0^{zt} + \beta_1^{zt} x_1^z + \beta_2^{zt} x_2^z + \beta_3^{zt} (x_1 x_2)^z + \epsilon^z$

As it turns out, these models are all equivalent. Although they have different values of $\beta$, they produce the same predicted values for $y$, and the residuals are the same, that is $\epsilon = \epsilon^c = \epsilon^z$. This means that the overall fit, $R^2$, of the models is the same. Additionally, all of the t-statistics and p-values for $\beta$s representing the same effect are equal for the highest-order effects; i.e., centering and scaling do not change the statistical significance of the highest-order effects (but will in general change t-values and p-values for lower-order, included effects, and intercepts). Given the correct centering and scaling constants, each set of $\beta$s can be calculated as a linear transformation of any of the other sets, without any need to transform the data. The real differences, then, are in ease of calculation, use, and interpretation (these claims are checked numerically in the sketch below).

For ease of calculation it is hard to beat term-wise centering. Except for the intercept, each $\beta_i^{ct} = \beta_i$, and the intercept is $\beta_0^{ct} = \mu_y$. More explicitly,

$$\begin{bmatrix} \beta_0^{ct} \\ \beta_1^{ct} \\ \beta_2^{ct} \\ \beta_3^{ct} \end{bmatrix} = \begin{bmatrix} 1 & \mu_{x_1} & \mu_{x_2} & \mu_{x_1 x_2} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}$$

This transformation is like the one for an additive model. The one wrinkle here is to note that $\mu_{x_1 x_2} = \mu_{x_1} \mu_{x_2} + \mathrm{cov}(x_1, x_2)$. Because of this, unless $\mathrm{cov}(x_1, x_2) = 0$ there is a constant effect hidden in the interaction term, and there is no simple interpretation of the first-order effects. Another way to say this is that when $\mathrm{cov}(x_1, x_2) \neq 0$ there is no easily interpreted point in the data space where both the $\beta_2^{ct}$ and $\beta_3^{ct}$ terms cancel out, leaving only the effect of $\beta_1^{ct}$.
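Before chasing down where those terms cancel, the equivalence of the five parameterizations, and the convenience of term-wise centering, can be checked numerically. Here is a minimal sketch with simulated, deliberately correlated data; the helper `fit_and_predict`, the constants, and the seed are illustrative assumptions rather than anything prescribed above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(4, 1.5, n)
x2 = 0.6 * x1 + rng.normal(0, 2, n)      # correlated on purpose, so cov(x1, x2) != 0
y = 2.0 - 1.0 * x1 + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(0, 1, n)

def fit_and_predict(*cols):
    """OLS with an intercept; returns (coefficients, fitted values)."""
    X = np.column_stack([np.ones(n), *cols])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b, X @ b

x12 = x1 * x2
z1 = (x1 - x1.mean()) / x1.std()
z2 = (x2 - x2.mean()) / x2.std()
z12 = (x12 - x12.mean()) / x12.std()     # term-wise standardized interaction

b_orig, yhat1 = fit_and_predict(x1, x2, x12)        # (1) original scales
b_zv, yhat4 = fit_and_predict(z1, z2, z1 * z2)      # (4) standardized, variable-wise
b_zt, yhat5 = fit_and_predict(z1, z2, z12)          # (5) standardized, term-wise

# Different betas, identical predictions, hence identical residuals and R^2
print(np.allclose(yhat1, yhat4), np.allclose(yhat1, yhat5))  # True True

# Term-wise centering: intercept becomes mean(y); other coefficients unchanged
b_ct, _ = fit_and_predict(x1 - x1.mean(), x2 - x2.mean(), x12 - x12.mean())
print(np.isclose(b_ct[0], y.mean()), np.allclose(b_ct[1:], b_orig[1:]))  # True True
```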
Suppose we seek the point at which both the $\beta_2^{ct}$ and $\beta_3^{ct}$ terms cancel out. For the $\beta_2^{ct}$ term to vanish, we need $x_2 = \mu_{x_2}$. Then the product term is

$$(x_1 \mu_{x_2})^c = x_1 \mu_{x_2} - (\mu_{x_1} \mu_{x_2} + \mathrm{cov}(x_1, x_2))$$

and for this to be zero,

$$x_1 = \frac{\mu_{x_1} \mu_{x_2} + \mathrm{cov}(x_1, x_2)}{\mu_{x_2}}$$

In other words, the parameter $\beta_1^{ct}$ is the effect of $x_1^c$ at the point

$$(x_1, x_2) = \left( \frac{\mu_{x_1} \mu_{x_2} + \mathrm{cov}(x_1, x_2)}{\mu_{x_2}},\ \mu_{x_2} \right)$$

Rewritten in terms of our centered variables, this is the point

$$(x_1^c, x_2^c) = \left( \frac{\mathrm{cov}(x_1, x_2)}{\mu_{x_2}},\ 0 \right)$$

Because this is a specific point, we can further calculate the predicted value of $y$:

$$\hat{y} = \mu_y + \beta_1^{ct} \left( \frac{\mathrm{cov}(x_1, x_2)}{\mu_{x_2}} \right)$$

A parallel derivation, exchanging the roles of $x_1$ and $x_2$, isolates the parameter $\beta_2^{ct}$.

Now consider the predicted value of $y$ where $x_1^c = x_2^c = 0$, i.e. at $x_1 = \mu_{x_1}$ and $x_2 = \mu_{x_2}$:

$$\hat{y} = \beta_0^{ct} + \beta_1^{ct} \cdot 0 + \beta_2^{ct} \cdot 0 + \beta_3^{ct} (x_1 x_2)^c = \mu_y + \beta_3^{ct} (x_1 x_2)^c$$

But

$$(x_1 x_2)^c = (\mu_{x_1} \mu_{x_2})^c = \mu_{x_1} \mu_{x_2} - \mu_{x_1 x_2} = \mu_{x_1} \mu_{x_2} - (\mu_{x_1} \mu_{x_2} + \mathrm{cov}(x_1, x_2)) = -\mathrm{cov}(x_1, x_2)$$

Returning to the predicted value of $y$, we have

$$\hat{y} = \mu_y - \beta_3^{ct} \, \mathrm{cov}(x_1, x_2)$$

and our hidden constant makes its appearance.

Interestingly, we see that all three parameters take on their individual meaning at specific points, and not as general "effects" across sets of points. We also note that $\beta_0^{ct}$ never appears in isolation if $\mathrm{cov}(x_1, x_2) \neq 0$.

But if these are not effects, what would be the effect of $x_1^c$ or $x_2^c$? Because there is an interaction term, both effects are conditional. Suppose we let $x_2^c = a$ and pursue the effect of $x_1^c$ (recall that $\beta_i^{ct} = \beta_i$ for the non-intercept terms):

$$\hat{y} = \mu_y + \beta_1 x_1^c + \beta_2 a + \beta_3 \big( (x_1^c + \mu_{x_1})(a + \mu_{x_2}) \big)^c$$

$$= \mu_y + \beta_1 x_1^c + \beta_2 a + \beta_3 \big( x_1^c (a + \mu_{x_2}) + \mu_{x_1} (a + \mu_{x_2}) - \mu_{x_1 x_2} \big)$$

$$= \beta_0 + \beta_1 \mu_{x_1} + \beta_2 (a + \mu_{x_2}) + \beta_3 \big( \mu_{x_1} (a + \mu_{x_2}) \big) + \big( \beta_1 + \beta_3 (a + \mu_{x_2}) \big) x_1^c$$

where the last step substitutes $\mu_y = \beta_0 + \beta_1 \mu_{x_1} + \beta_2 \mu_{x_2} + \beta_3 \mu_{x_1 x_2}$. Then the effect of $x_1^c$ at $x_2^c = a$ will be $\beta_1 + \beta_3 (a + \mu_{x_2})$.
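Both observations, the hidden constant at the means and the conditional effect $\beta_1 + \beta_3(a + \mu_{x_2})$, can be verified numerically. Here is a minimal sketch with simulated correlated data; the helper `predict`, the constants, and the seed are illustrative assumptions, and the covariance is computed in its population form so that $\mu_{x_1 x_2} = \mu_{x_1}\mu_{x_2} + \mathrm{cov}(x_1, x_2)$ holds exactly in the sample.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(5, 2, n)
x2 = 0.8 * x1 + rng.normal(0, 3, n)      # cov(x1, x2) != 0 on purpose
y = 1.0 + 0.4 * x1 - 0.7 * x2 + 0.2 * x1 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

mu1, mu2 = x1.mean(), x2.mean()
cov12 = np.mean(x1 * x2) - mu1 * mu2     # population covariance: mu_{x1 x2} = mu1*mu2 + cov12

def predict(a1, a2):
    """Fitted value at the point (x1, x2) = (a1, a2)."""
    return b0 + b1 * a1 + b2 * a2 + b3 * a1 * a2

# Hidden constant: the prediction at the means is mu_y - b3 * cov(x1, x2)
print(np.isclose(predict(mu1, mu2), y.mean() - b3 * cov12))  # True

# Conditional effect of x1 at x2^c = a, i.e. at x2 = a + mu2
a = 1.5
slope = predict(mu1 + 1.0, a + mu2) - predict(mu1, a + mu2)  # one-unit change in x1
print(np.isclose(slope, b1 + b3 * (a + mu2)))                # True
```

Because the non-intercept coefficients are identical under term-wise centering, the check uses the original-scale fit directly.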