Close resemblance to how the researcher thinks.
Easy visualisation and interpretation of data.
More information is analysed simultaneously, giving greater power.
Relationship between variables is understood better.
Focus shifts from individual factors taken singly to relationships among variables.
The Independent (or Explanatory or Predictor) variable is always on the X axis.
The Dependent (or Outcome or Response) variable is always on the Y axis.
In OBSERVATIONAL studies the researcher observes the effects of the explanatory variables on the outcome.
In INTERVENTION studies the researcher manipulates the explanatory variable (e.g. dose of a drug) to influence the outcome.
A scatter plot helps to visualise the relationship between two variables.
The figure shows a scatter plot with a regression line. For a given value of X there is a spread of Y values.
The regression line represents the mean value of Y at each value of X.
[Figure: Scatterplot of deuterium against test weighing, with the fitted regression line and its 95% CI. Fitted equation: Deuterium = -67.3413 + 1.16186 × Test weigh; S = 234.234, R-Sq = 59.3%, R-Sq(adj) = 56.0%.]
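As an illustration of how such a plot can be produced, here is a minimal sketch in Python using numpy and matplotlib. The deuterium and test-weigh data behind the figure are not available, so the values below are simulated purely for illustration.

```python
# Minimal sketch: scatter plot with a least-squares regression line.
# The x and y values are simulated stand-ins, not the slide's deuterium data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(500, 2500, size=40)                  # explanatory variable (X axis)
y = -67.3 + 1.16 * x + rng.normal(0, 230, size=40)   # outcome with scatter about the line

slope, intercept = np.polyfit(x, y, deg=1)           # least-squares estimates

plt.scatter(x, y, label="observations")
xs = np.sort(x)
plt.plot(xs, intercept + slope * xs, color="red",
         label="regression line (mean of Y for each X)")
plt.xlabel("Test weigh")
plt.ylabel("Deuterium")
plt.legend()
plt.show()
```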
Definitions - III
INTERCEPT is the value of Y for X = 0. It denotes the point where the regression line meets the Y axis.
SLOPE is a measure of the change in the value of Y for a unit change in X.
[Diagram: a regression line on the X and Y axes, with the intercept and the slope labelled.]
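In symbols, the line can be written as below. The α and β notation is the one used later in these notes; the hatted form with a and b is an assumed notation for the fitted (estimated) line.

```latex
% Straight-line regression model: alpha is the intercept, beta the slope.
\[
  Y = \alpha + \beta X + \varepsilon ,
  \qquad
  \hat{Y} = a + b X ,
\]
% where a estimates alpha (the value of Y when X = 0) and
% b estimates beta (the change in the fitted Y for a one-unit increase in X).
```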
Basic Assumptions
Y increases or decreases linearly with increase or decrease in X.
For any given value of X the values of Y are distributed Normally.
The variance of Y at any given value of X is the same for all values of X.
The deviation in any one value of Y has no effect on the other values of Y for any given X (i.e. the observations are independent).
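These assumptions are usually checked informally from the residuals. The sketch below, again with simulated data rather than the data from the slides, plots residuals against fitted values (for linearity and equal variance) and a Normal Q-Q plot (for Normality).

```python
# Minimal sketch: informal checks of the regression assumptions.
# Simulated data; in practice x and y would be the real observations.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)

slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Linearity and constant variance: residuals should scatter evenly
# around zero, with no trend or funnel shape across the fitted values.
ax1.scatter(fitted, residuals)
ax1.axhline(0, color="grey")
ax1.set_xlabel("Fitted value")
ax1.set_ylabel("Residual")

# Normality: the points should lie close to the straight reference line.
stats.probplot(residuals, dist="norm", plot=ax2)

plt.tight_layout()
plt.show()
```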
The Residuals
The difference between the observed value of Y and the value on the regression line (the fitted value) is the residual.
The statistical programme minimizes the sum of the squares of the residuals. In a good fit the data points are all crowded around the regression line.
[Diagram: a residual shown as the vertical distance between an observed point and the regression line.]
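The least-squares idea can be seen numerically: the fitted line has a smaller residual sum of squares than any other line through the data. A minimal sketch with simulated data:

```python
# Minimal sketch: the least-squares line minimises the residual sum of squares.
# Simulated data for illustration only.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=30)

def rss(intercept, slope):
    """Sum of squared residuals (observed Y minus fitted Y) for a given line."""
    return float(np.sum((y - (intercept + slope * x)) ** 2))

slope, intercept = np.polyfit(x, y, deg=1)       # least-squares estimates

print("RSS at the least-squares fit :", rss(intercept, slope))
print("RSS with the slope nudged    :", rss(intercept, slope + 0.1))
print("RSS with the intercept nudged:", rss(intercept + 0.5, slope))
# Both 'nudged' lines give a larger RSS than the least-squares fit.
```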
The variation of the Y values around the regression line is a measure of how X and Y relate to each other.
The method of quantifying the variation is analysis of variance, presented as an Analysis of Variance table.
The total sum of squares (S_yy) represents the total variation of the Y values around their mean.
The Total Sum of Squares (S_yy) is made up of two parts:
(i) the part explained by the regression, and
(ii) the Residual Sum of Squares.
Sum of Squares ÷ its degrees of freedom = Mean Sum of Squares (MSS).
The ratio MSS due to regression ÷ MSS Residual = F ratio.
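A minimal numerical sketch of this decomposition, using simulated data; in simple linear regression the degrees of freedom are 1 for the regression and n − 2 for the residual.

```python
# Minimal sketch: analysis-of-variance decomposition for a simple regression.
# Simulated data, not the slides' deuterium data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=25)
y = 1.0 + 0.8 * x + rng.normal(0, 1.5, size=25)
n = len(y)

slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x

ss_total = np.sum((y - y.mean()) ** 2)     # S_yy: total variation of Y about its mean
ss_residual = np.sum((y - fitted) ** 2)    # Residual Sum of Squares
ss_regression = ss_total - ss_residual     # part explained by the regression

ms_regression = ss_regression / 1          # 1 degree of freedom for the slope
ms_residual = ss_residual / (n - 2)        # n - 2 residual degrees of freedom

f_ratio = ms_regression / ms_residual
p_value = stats.f.sf(f_ratio, 1, n - 2)    # upper tail of the F distribution

print(f"F = {f_ratio:.2f}, p = {p_value:.4f}, R^2 = {ss_regression / ss_total:.3f}")
```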
Regression Equation
Residual Sum of Squares (RSS)
Values of α and β
R²
S (standard deviation)
Testing for β ≠ 0
Analysis of Variance Table
F test
Outliers (remote from the rest)
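Most of these quantities appear together in the output of any regression routine. As an illustration (statsmodels and simulated data are used here, not the package behind the slides):

```python
# Minimal sketch: typical computer output for a simple linear regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=30)

X = sm.add_constant(x)            # adds the intercept (alpha) column
model = sm.OLS(y, X).fit()

# The summary shows the regression equation (estimated alpha and beta),
# R-squared, the residual standard deviation, t-tests of beta = 0,
# and the F statistic from the analysis of variance.
print(model.summary())

# Studentised residuals help to flag outliers remote from the rest.
print(model.get_influence().resid_studentized_internal)
```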