Transformations and Re

Transformations
Transformation (re-expression) of
a Variable
• Transformation of a variable can change its
distribution from a skewed distribution to a
normal distribution (bell-shaped, symmetric
about its centre)
• A very useful transformation is the natural log
transformation:
$x_{\text{new}} = \text{transformed } x = \ln(x)$
• For any positive value of x, ln(x) can be:
• Looked up in tables
• Calculated by most calculators
• Calculated by most statistical packages
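As a minimal sketch of the last option, the natural log can also be computed in a general-purpose language such as Python. The input values below are hypothetical and chosen only for illustration; `math.log` with one argument is the natural log.

```python
import math

# Hypothetical example values; ln(x) is defined only for x > 0.
values = [1.0, math.e, 10.0, 100.0]

# math.log(x) with a single argument returns the natural log ln(x).
x_new = [math.log(x) for x in values]

for x, t in zip(values, x_new):
    print(f"x = {x:8.3f}   ln(x) = {t:.4f}")
```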
Graph of ln(x)
[Figure: graph of $x_{\text{new}} = \ln(x)$ against x, with x running from 0 to 180 and ln(x) from 0 to 6]
The effect of the transformation
[Figure: graph of $x_{\text{new}} = \ln(x)$ against x, repeated from the previous slide]
The effect of the ln transformation
• It spreads out values that are close to zero
• It compresses values that are large
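A small sketch of this effect, using hypothetical pairs of values: the pairs (1, 2) and (100, 200) are 1 and 100 units apart on the original scale, yet they end up the same distance apart after taking logs, because ln depends only on the ratio of the two values.

```python
import math

def log_gap(lo, hi):
    """Distance between lo and hi after the ln transformation."""
    return math.log(hi) - math.log(lo)

# Hypothetical pairs with the same ratio (2:1) but very different raw gaps.
small_gap = log_gap(1, 2)      # raw gap of 1, close to zero
large_gap = log_gap(100, 200)  # raw gap of 100, far from zero

print(small_gap, large_gap)
```

Relative to the raw scale, the small values are pulled apart and the large values are pushed together.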
[Figure: two panels showing the distribution of x and the distribution of $x_{\text{new}} = \ln(x)$]
Transforming data to a normal distribution
allows one to use powerful statistical
procedures (discussed later on) that assume
the data are normally distributed.
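The sketch below illustrates the idea on simulated data. The right-skewed sample is an assumption (drawn with `random.lognormvariate`, not taken from the slides), and the `skewness` helper is a hypothetical name introduced here; it measures asymmetry, with values near 0 indicating a symmetric distribution.

```python
import math
import random

def skewness(data):
    """Sample skewness: mean cubed deviation over the cubed standard deviation."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5

random.seed(0)  # fixed seed so the sketch is reproducible

# Simulated right-skewed data (an assumption, not data from the slides).
x = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]
x_new = [math.log(v) for v in x]

skew_before = skewness(x)      # strongly positive: long right tail
skew_after = skewness(x_new)   # close to 0: roughly symmetric

print(skew_before, skew_after)
```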
Transformations to Linearity
• Many non-linear curves can be put into a linear
form by appropriate transformations of either
– the dependent variable Y or
– the independent variable X
– or both.
• This leads to the wide utility of the Linear model.
• Another use of trans
Intrinsically Linear (Linearizable) Curves
1. Hyperbolas
$y = \dfrac{x}{ax - b}$
Linear form: $\dfrac{1}{y} = a - b\,\dfrac{1}{x}$ or $Y = b_0 + b_1 X$
Transformations: $Y = 1/y$, $X = 1/x$, $b_0 = a$, $b_1 = -b$
[Figure: two panels of y = x/(ax - b), one with positive curvature (b > 0) and one with negative curvature (b < 0); the values 1/a and b/a are marked in each panel]
2. Exponential
$y = a e^{bx} = a B^{x}$
Linear form: $\ln y = \ln a + b x = \ln a + (\ln B)\, x$ or $Y = b_0 + b_1 X$
Transformations: $Y = \ln y$, $X = x$, $b_0 = \ln a$, $b_1 = b = \ln B$
[Figure: two panels, Exponential (B < 1) and Exponential (B > 1), plotting y against x; the curve passes through y = a at x = 0 and y = aB at x = 1]
3. Power Functions
$y = a x^{b}$
Linear form: $\ln y = \ln a + b \ln x$ or $Y = b_0 + b_1 X$
Transformations: $Y = \ln y$, $X = \ln x$, $b_0 = \ln a$, $b_1 = b$
[Figure: two panels of power functions. Panel "b > 0" shows curves for b > 1, b = 1 and 0 < b < 1; panel "b < 0" shows curves for -1 < b < 0, b = -1 and b < -1]
Summary
Transformations can be useful for:
1. Changing data from a skewed distribution to a
Normal (bell-shaped) distribution
2. Straightening out non-linear data
A common transformation is the natural log
transformation ln(x).
Example – Motor Vehicle Data
The data is in an Excel file – MtrVeh.xls
Dependent = mpg
Independent = Engine size, horsepower
and weight
The data in an SPSS file
We will try to fit a model predicting mpg
with Engine (engine size).
First a scatter plot:
The dialog box selecting the variables:
The scatter-plot
[Figure: scatter plot of MPG (vertical axis, 0 to 50) against ENGINE (horizontal axis, 100 to 500)]
Similar to curve 2 above, the exponential:
$y = a e^{bx} = a B^{x}$
Linear form: $\ln y = \ln a + b x$ or $Y = b_0 + b_1 X$, with $Y = \ln y$ and $X = x$
• To perform a ln transformation in SPSS:
• Go to the menu Transform->Compute
• In this dialogue box you define the
transformation
• Press OK and the transformation will be
performed
• The new variable is added to the
SPSS spreadsheet
• The scatterplot using the new variable lnmpg
shows a better fit to a straight line.
[Figure: scatter plot of lnmpg (vertical axis, 2.00 to 4.00) against ENGINE (horizontal axis, 0 to 500)]
Transformations: summary
• Transformations can be used to convert
non-normal data to normally distributed
(bell-shaped) data (allowing for the use of
the more powerful techniques that assume
normality)
• Transformations can be used to convert
non-linear data to linear (straight-line) data.
Next topic
Probability