Transformations

Transformation (re-expression) of a Variable
• Transformation of a variable can change its distribution from a skewed distribution to a normal distribution (bell-shaped, symmetric about its centre).
• A very useful transformation is the natural log transformation:
  x_new = transformed x = ln(x)
• For any value of x, ln(x) can be:
  – Looked up in tables
  – Calculated by most calculators
  – Calculated by most statistical packages

Graph of ln(x)
[Figure: x_new = ln(x) (vertical axis, 0 to 6) plotted against x (horizontal axis, 0 to 180)]

The effect of the transformation
[Figure: the same graph of x_new = ln(x) against x]

The effect of the ln transformation
• It spreads out values that are close to zero
• It compacts values that are large
[Figure: distribution of x and distribution of x_new = ln(x)]

Transforming data to a normal distribution allows one to use powerful statistical procedures (discussed later on) that assume the data are normally distributed.

Transformations to Linearity
• Many non-linear curves can be put into a linear form by appropriate transformations of either
  – the dependent variable Y, or
  – the independent variable X,
  – or both.
• This leads to the wide utility of the linear model.
• Another use of trans

Intrinsically Linear (Linearizable) Curves

1. Hyperbolas
y = x/(ax - b)
Linear form: 1/y = a - b(1/x), or Y = b0 + b1 X
Transformations: Y = 1/y, X = 1/x, b0 = a, b1 = -b
[Figure: hyperbolas y = x/(ax - b) with positive curvature (b > 0) and negative curvature (b < 0); the values b/a and 1/a are marked on the axes]

2. Exponential
y = a e^(bx) = a B^x
Linear form: ln y = ln a + b x = ln a + (ln B) x, or Y = b0 + b1 X
Transformations: Y = ln y, X = x, b0 = ln a, b1 = b = ln B
[Figure: exponential curves for B < 1 and B > 1, with the values a (at x = 0) and aB (at x = 1) marked]

3. Power Functions
y = a x^b
Linear form: ln y = ln a + b ln x, or Y = b0 + b1 X
Transformations: Y = ln y, X = ln x, b0 = ln a, b1 = b
[Figure: power functions with b > 0 (b > 1, b = 1, 0 < b < 1) and power functions with b < 0 (-1 < b < 0, b = -1, b < -1)]

Summary
Transformations can be useful for:
1. Changing data from a skewed distribution to a normal (bell-shaped) distribution
2. Straightening out non-linear data
A common transformation is the natural log transformation ln(x).

Example: Motor Vehicle Data
The data is in an Excel file, MtrVeh.xls
Dependent = mpg
Independent = engine size, horsepower and weight

The data in an SPSS file
[Figure: the data in the SPSS spreadsheet]

We will try to fit a model predicting mpg from Engine (engine size).
First, a scatter plot:
[Figure: the SPSS dialog box for selecting the variables]

The scatter plot
[Figure: scatter plot of MPG (vertical axis) against ENGINE (horizontal axis)]

Similar to:
2. Exponential
y = a e^(bx) = a B^x
Linear form: ln y = ln a + b x = ln a + (ln B) x, or Y = b0 + b1 X
Transformations: Y = ln y, X = x, b0 = ln a, b1 = b = ln B

• To perform a ln transformation in SPSS:
  – Go to the menu Transform -> Compute
  – In this dialogue box you define the transformation
  – Press OK and the transformation will be performed
• The new variable has been added to the SPSS spreadsheet
• The scatter plot shows a better fit to a straight line using the new variable lnmpg.
[Figure: scatter plot of lnmpg (vertical axis, 2.00 to 4.00) against ENGINE (horizontal axis)]

Transformations summary
• Transformations can be used to convert non-normal data to normally distributed (bell-shaped) data, allowing the use of the more powerful techniques that assume normality.
• Transformations can be used to convert non-linear data to linear (straight-line) data.

Next topic
Probability
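
Supplementary sketch: effect of ln(x) on a skewed distribution
The claim that the ln transformation can turn a skewed distribution into a roughly bell-shaped one can be checked numerically. The following is a minimal sketch in Python, not part of the original lecture. It assumes synthetic right-skewed (log-normal) data, and the helper name skewness, the seed and the parameters 3.0 and 0.8 are all made up for illustration. A skewness near zero indicates a roughly symmetric distribution.

```python
import math
import random
import statistics

def skewness(values):
    """Skewness as the mean cubed z-score (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

random.seed(0)

# Synthetic right-skewed data (an assumption; not the lecture's data).
x = [random.lognormvariate(3.0, 0.8) for _ in range(5000)]

# The transformation from the slides: x_new = ln(x).
x_new = [math.log(v) for v in x]

skew_before = skewness(x)
skew_after = skewness(x_new)
print(f"skewness of x:     {skew_before:.2f}")
print(f"skewness of ln(x): {skew_after:.2f}")
```

On data like these, the skewness of x should be clearly positive, while the skewness of ln(x) should be close to zero. Real data will rarely behave this cleanly.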
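
Supplementary sketch: fitting the three linearizable curves
The linear forms listed under "Intrinsically Linear (Linearizable) Curves" suggest a simple recipe: transform the data, fit a straight line Y = b0 + b1 X by least squares, then convert b0 and b1 back to a and b. The sketch below is one way to illustrate that recipe in Python; it is not the SPSS procedure used in the lecture. The function names (fit_line, fit_hyperbola, fit_exponential, fit_power) and the parameter values are hypothetical, and the data are noise-free synthetic values rather than the motor vehicle data.

```python
import math

def fit_line(X, Y):
    """Ordinary least squares for Y = b0 + b1 X; returns (b0, b1)."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    sxx = sum((x - mean_x) ** 2 for x in X)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def fit_hyperbola(x, y):
    """y = x/(ax - b): regress 1/y on 1/x, then a = b0 and b = -b1."""
    b0, b1 = fit_line([1 / v for v in x], [1 / v for v in y])
    return b0, -b1

def fit_exponential(x, y):
    """y = a e^(bx): regress ln y on x, then a = exp(b0) and b = b1."""
    b0, b1 = fit_line(x, [math.log(v) for v in y])
    return math.exp(b0), b1

def fit_power(x, y):
    """y = a x^b: regress ln y on ln x, then a = exp(b0) and b = b1."""
    b0, b1 = fit_line([math.log(v) for v in x], [math.log(v) for v in y])
    return math.exp(b0), b1

# Noise-free synthetic data with made-up parameters (assumptions).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
hyper = fit_hyperbola(xs, [v / (2.0 * v - 0.5) for v in xs])
expo = fit_exponential(xs, [3.0 * math.exp(-0.4 * v) for v in xs])
power = fit_power(xs, [1.5 * v ** 0.7 for v in xs])

print("hyperbola   (a, b):", hyper)
print("exponential (a, b):", expo)
print("power       (a, b):", power)
```

Because the synthetic data lie exactly on each curve, the fitted values recover the parameters used to generate them, up to rounding. With real data, least squares on the transformed scale minimizes error in Y (for example ln y), not in y itself, so the fitted curve can differ from a direct non-linear fit.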