Data Handling & Analysis
BD7054
Normality
Andrew Jackson a.jackson@tcd.ie
Making assumptions
Each group is normally distributed
The residuals off the line are normally distributed
Distributions are where numbers come from
• The binomial distribution tells us how systems like a coin toss behave
• It tells us how many events are likely to occur given repeated attempts
• The event has a fixed probability of occurring each time
[Figure: binomial distribution of the number of heads; x-axis: Number of Heads, 0 to 10]
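As a minimal sketch of the coin-toss case, the probability of each number of heads can be computed directly from the binomial formula. The values `n_tosses = 10` and `p_heads = 0.5` are assumptions chosen to match the 0 to 10 axis and a fair coin; the function name `binomial_pmf` is illustrative only.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k events in n attempts, each with fixed probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_tosses = 10   # assumed: number of repeated attempts (coin tosses)
p_heads = 0.5   # assumed: fixed probability of a head on each toss

# Probability of observing 0, 1, ..., 10 heads
pmf = [binomial_pmf(k, n_tosses, p_heads) for k in range(n_tosses + 1)]

for k, prob in enumerate(pmf):
    print(f"{k:2d} heads: {prob:.4f}")
```

Under these assumptions the most likely outcome is 5 heads, with probability 252/1024 (about 0.246), and the probabilities over all outcomes sum to 1.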
The normal distribution
• Normal or Gaussian distribution
• “the bell-shaped curve”
• Defined by mean and a variance (or standard deviation)
• The PDF (Probability Density Function) of the normal distribution is shown on the right
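To make the link between the two parameters and the curve concrete, here is a small sketch that evaluates the normal PDF from its mean and standard deviation and cross-checks it against Python's built-in `statistics.NormalDist`. The mean of 0 and standard deviation of 1 are illustrative assumptions, and `normal_pdf` is a hypothetical helper name.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))

mean, sd = 0.0, 1.0             # assumed illustrative parameters
reference = NormalDist(mean, sd)  # standard-library implementation for comparison

for x in (-2, -1, 0, 1, 2):
    print(f"x = {x:+d}: density = {normal_pdf(x, mean, sd):.4f} "
          f"(NormalDist: {reference.pdf(x):.4f})")
```

The density is highest at the mean and symmetric around it; changing `sd` stretches or narrows the bell without changing its shape.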
Origins of the Normal Distribution
• Assume that an individual’s weight or height (or whatever we are measuring) is affected by thousands of small +/- effects such as genes or environment
• Add those effects up for each individual, and lo and behold…
• The character will display a normal distribution
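The argument above can be checked with a quick simulation. This is only a sketch under stated assumptions: each individual's trait is the sum of 500 independent small effects drawn uniformly between -1 and +1 (the number of effects, their distribution, and the sample of 2000 individuals are all arbitrary choices for illustration).

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

def simulate_trait(n_effects=500):
    """Sum many small positive or negative effects for one individual."""
    return sum(random.uniform(-1, 1) for _ in range(n_effects))

traits = [simulate_trait() for _ in range(2000)]

m, s = mean(traits), stdev(traits)
# Fraction of individuals within 1 and 2 standard deviations of the mean
within_1sd = sum(abs(t - m) <= s for t in traits) / len(traits)
within_2sd = sum(abs(t - m) <= 2 * s for t in traits) / len(traits)

print(f"mean = {m:.2f}, sd = {s:.2f}")
print(f"within 1 sd: {within_1sd:.3f}, within 2 sd: {within_2sd:.3f}")
```

For a normal distribution these two fractions are about 0.68 and 0.95, so the simulated values should land close to those figures even though no single effect is normally distributed.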
Return to our brain/body data
• We need to test whether each group is normally distributed
• Equivalent to asking if the residuals are normally distributed
• Residuals are the difference between an observed value and its predicted value
– Which is the mean value in each group in this case
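A minimal sketch of this calculation follows. The two groups and their values are hypothetical numbers made up for illustration; they are not the brain/body data used in the course.

```python
from statistics import mean

# Hypothetical measurements for two groups (illustrative values only)
groups = {
    "A": [4.1, 5.0, 5.6, 4.7, 5.1],
    "B": [6.9, 7.4, 8.2, 7.7, 7.3],
}

residuals = {}
for name, values in groups.items():
    group_mean = mean(values)  # the predicted value for every member of the group
    # residual = observed value minus predicted value
    residuals[name] = [v - group_mean for v in values]
    print(name, f"mean = {group_mean:.2f}",
          [round(r, 2) for r in residuals[name]])

# Pool the residuals from all groups so they can be examined together
pooled = [r for group in residuals.values() for r in group]
```

Because each residual is measured from its own group mean, the residuals in every group sum to zero, and the pooled list can be inspected as a single sample.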
Exploring Residuals from boxplots
[Figures: a simple histogram; a Q-Q (quantile-quantile) plot]
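For readers who want to see what a Q-Q plot actually contains, here is a sketch that computes its points by hand: the ordered, standardised residuals paired with the quantiles expected from a standard normal distribution. The residuals are simulated, `qq_points` is a hypothetical helper, and the plotting positions (i + 0.5) / n are one common convention among several.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(values)
    m, s = mean(values), stdev(values)
    # Observed quantiles: standardise the values, then sort them
    observed = sorted((v - m) / s for v in values)
    # Theoretical quantiles: where each ordered value is expected to fall
    # if the data came from a standard normal distribution
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

random.seed(42)
residuals = [random.gauss(0, 1) for _ in range(200)]  # simulated residuals

points = qq_points(residuals)
for t, o in points[::40]:  # print every 40th point as a spot check
    print(f"theoretical = {t:+.2f}, observed = {o:+.2f}")
```

If the residuals are approximately normal, the points fall close to the straight line observed = theoretical; systematic curvature away from that line suggests skew or heavy tails.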
Return to our scatter plot
• We need to test whether our residuals off the line are normally distributed
• Also need to check that there is no trend in the deviation of the residuals along the line
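One way to sketch both checks in code is shown here: fit a least-squares line, take the residuals off it, and then look for a trend in the size of those residuals along x. The data are simulated, the helper names are hypothetical, and correlating the absolute residuals with x is just one rough check assumed for illustration; plotting residuals against x is the more usual first step.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = (sum((ai - ma) ** 2 for ai in a)
           * sum((bi - mb) ** 2 for bi in b)) ** 0.5
    return num / den

random.seed(7)
x = [i / 10 for i in range(1, 101)]                      # simulated x values
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]    # simulated y values

intercept, slope = fit_line(x, y)
# Residual = observed y minus the value predicted by the fitted line
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Rough check for a trend in the spread of the residuals along the line
trend_in_spread = pearson_r(x, [abs(r) for r in residuals])
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"correlation of |residual| with x = {trend_in_spread:.2f}")
```

A correlation near zero is consistent with residuals of similar size all along the line; a clearly positive or negative value suggests the spread grows or shrinks with x.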
Exploring residuals from scatter plot
[Figures: histogram of residuals; Q-Q plot of residuals]
Testing for a trend in the data
What to do if residuals are not normal?
• Transforming the data is often the solution
• Taking the log of the response variable (y) is the first port of call
– For scatter-plot-type data, we can also take the log of the explanatory (x) variable
– We will do this next time we meet
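As a small preview of why the log transform often helps, this sketch measures the skewness of a right-skewed response before and after taking logs. The data are simulated from a log-normal distribution purely for illustration, and `skewness` is a hypothetical helper; note that the log is only defined for strictly positive values.

```python
import math
import random
from statistics import mean

def skewness(values):
    """Sample skewness: about 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(3)
# Simulated, strictly positive, right-skewed response variable
y = [random.lognormvariate(0, 0.8) for _ in range(500)]

# The transformation: natural log of every response value
log_y = [math.log(v) for v in y]

print(f"skewness of y:      {skewness(y):.2f}")
print(f"skewness of log(y): {skewness(log_y):.2f}")
```

The raw values should show strong positive skew, while the logged values should be roughly symmetric, which is the behaviour we hope for when residuals fail the normality checks.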