Stat 101L: Lecture 15 Re-expressing Data Chapter 6 – Normal Model –What if data do not follow a Normal model? Chapters 8 & 9 – Linear Model –What if a relationship between two variables is not linear? 1 Re-expressing Data Re-expression is another name for changing the scale of (transforming) the data. Usually we re-express the response variable, Y. 2 Goals of Re-expression Goal 1 – Make the distribution of the re-expressed data more symmetric. Goal 2 – Make the spread of the re-expressed data more similar across groups. 3 1 Stat 101L: Lecture 15 Goals of Re-expression Goal 3 – Make the form of a scatter plot more linear. Goal 4 – Make the scatter in the scatter plot more even across all values of the explanatory variable. 4 Ladder of Powers Power: 2 2 Re-expression: y Comment: Use on left skewed data. 5 Ladder of Powers Power: 1 Re-expression: y Comment: No re-expression. Do not re-express the data if they are already well behaved. 6 2 Stat 101L: Lecture 15 Ladder of Powers Power: ½ y Re-expression: Comment: Use on count data or when scatter in a scatter plot tends to increase as the explanatory variable increases. 7 Ladder of Powers Power: “0” Re-expression: log y Comments: Not really the “0” power. Use on right skewed data. Measurements cannot be negative or zero. 8 Ladder of Powers Power: –½, –1 1 1 , Re-expression: y y Comments: Use on right skewed data. Measurements cannot be negative or zero. Use on ratios. 9 3 Stat 101L: Lecture 15 Goal 1 - Symmetry Data are obtained on the time between nerve pulses along a nerve fiber. Time is rounded to the nearest half unit where a unit is 1 50 of a second. th – 30.5 represents 30.5 50 0.61 sec 3 .99 2 .95 .90 .75 .50 1 0 .25 .10 .05 .01 Normal Quantile Plot 10 -1 -2 -3 40 Count 60 20 0 10 20 30 40 th 50 60 70 Time ( 1 50 sec) 11 Time – Nerve Pulses Distribution is skewed right. Sample mean (12.305) is much larger than the sample median (7.5). Many potential outliers. Data not from a Normal model. 12 4 3 .99 2 .95 .90 1 .75 0 .50 .25 Normal Quantile Plot Stat 101L: Lecture 15 -1 .10 .05 -2 .01 -3 40 20 Count 30 10 0 1 2 3 4 5 6 7 8 9 13 3 .99 2 .95 .90 1 .75 0 .50 .25 Normal Quantile Plot Sqrt(Time) -1 .10 .05 -2 .01 -3 20 Count 30 10 -1 0 1 2 3 4 5 14 Log(Time) Summary Time – Highly skewed to the right. Sqrt(Time) – Still skewed right. Log(Time) –Fairly symmetric and mounded in the middle. – Could have come from a Normal model. 15 5