• residual: the prediction error for an observation, which is
the difference y − ŷ between the actual value and the
predicted value of the response variable.
• Residual sum of squares:
Residual sum of squares = Σ(residual)² = Σ(y − ŷ)²
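As a minimal sketch of these two definitions, the residuals and their sum of squares can be computed directly. The function names and the three data points below are made up for illustration; they are not part of the course material.

```python
def residuals(y, y_hat):
    """Return the residuals y - y_hat, one per observation."""
    return [yi - yhi for yi, yhi in zip(y, y_hat)]

def residual_sum_of_squares(y, y_hat):
    """Return the sum of the squared residuals."""
    return sum(e ** 2 for e in residuals(y, y_hat))

# Hypothetical data: observed responses and the predictions from some line.
y = [2.0, 4.0, 5.0]
y_hat = [2.5, 3.5, 5.0]

print(residuals(y, y_hat))                # [-0.5, 0.5, 0.0]
print(residual_sum_of_squares(y, y_hat))  # 0.5
```

A positive residual means the line predicts too low for that observation; a negative residual means it predicts too high.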
• Least Squares Method
Among the possible lines that can go through data points in a
scatterplot, this method gives the regression line that has the
smallest value for the residual sum of squares when using
ŷ = a + bx to predict y.
The least squares line has these properties:
• It has some positive residuals and some negative
residuals, but the sum (and the mean) of the residuals
equals 0.
• It passes through the point (x̄, ȳ).
• The slope:
b = r (s_y / s_x), where s_x and s_y are the sample standard
deviations of x and y
The y-intercept:
a = ȳ − bx̄
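The following sketch fits a least squares line and checks the two properties listed above. It computes the slope as Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², a standard form that is algebraically equal to r times s_y / s_x. The function name and the four data points are assumptions chosen for illustration.

```python
from statistics import mean

def least_squares_line(x, y):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y-intercept
    return a, b

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]

a, b = least_squares_line(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(round(a, 3), round(b, 3))                # 0.5 1.4
print(abs(sum(resid)) < 1e-9)                  # True: residuals sum to 0
print(abs((a + b * mean(x)) - mean(y)) < 1e-9) # True: line passes through (x̄, ȳ)
```

The checks use a small tolerance rather than exact equality because floating-point arithmetic can leave a rounding error of the order of 1e-16.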
• r-squared (r²):
Interpretation: the proportion of the variation in the y-values
that is accounted for by the linear relationship of y with x.
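One way to make this interpretation concrete is to compare the variation of y around its mean, Σ(y − ȳ)², with the variation left over around the least squares line, Σ(y − ŷ)². The sketch below uses that comparison, which for the least squares line gives the same number as squaring the correlation r. The function name and data are hypothetical.

```python
from statistics import mean

def r_squared(x, y):
    """Proportion of the variation in y accounted for by the least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Variation around the mean of y, and variation left around the line.
    total = sum((yi - y_bar) ** 2 for yi in y)
    leftover = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return (total - leftover) / total

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]

print(round(r_squared(x, y), 2))  # 0.98
```

Here about 98% of the variation in the y-values is accounted for by the linear relationship with x.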
Some Cautions in Analyzing Associations
• Extrapolation is dangerous. Extrapolation refers to using a
regression line to predict y values for x values outside the
observed range of data.
• Be cautious of influential outliers.
• Correlation does not imply causation.
• lurking variable: a variable, usually unobserved, that
influences the association between the variables of primary
interest.
• A lurking variable may be a common cause of both the
explanatory and response variable.
• A change in the response variable may be due to multiple causes.
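The common-cause case can be illustrated with a small simulation. In this sketch, which rests entirely on made-up assumptions (normally distributed variables, a sample of 2000, a fixed seed), a lurking variable z drives both x and y, while x has no effect on y at all.

```python
import random
from statistics import mean

def correlation(u, v):
    """Sample correlation r between two equal-length lists."""
    u_bar, v_bar = mean(u), mean(v)
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# z is the lurking variable; x and y each depend on z but not on each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x_noise = [rng.gauss(0, 0.5) for _ in range(n)]
y_noise = [rng.gauss(0, 0.5) for _ in range(n)]
x = [zi + e for zi, e in zip(z, x_noise)]
y = [zi + e for zi, e in zip(z, y_noise)]

r_xy = correlation(x, y)                    # association created by z
r_adjusted = correlation(x_noise, y_noise)  # what remains once z is removed

print(round(r_xy, 2), round(r_adjusted, 2))
```

Under these assumptions x and y come out strongly correlated, yet the correlation nearly vanishes once the contribution of z is subtracted, so the association reflects the common cause rather than any effect of x on y.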
• experiment: assigning subjects to certain experimental
conditions and then observing outcomes on the response
variable.
• treatments: the experimental conditions, which correspond
to assigned values of the explanatory variable.
• observational study (nonexperimental study):
observing values of the response variable and explanatory
variables for the sampled subjects, without anything being
done to the subjects (such as imposing a treatment).
Advantage of Experiments over Observational Studies
• In an experiment, using some form of random selection to
determine which subjects receive each treatment “balances” the
effects of lurking variables; that is, the treatment groups have
similar distributions on other variables.
Thus, we can study the effect of an explanatory variable on a
response variable more accurately with an experiment than
with an observational study.
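The balancing effect of random assignment can be sketched with a simulation. Everything here is an illustrative assumption: age stands in for a lurking variable, there are 1,000 subjects, and the non-random contrast simply puts the oldest half in the treatment group.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical lurking variable: age for 1,000 subjects.
ages = [rng.gauss(40, 10) for _ in range(1000)]

# Randomized experiment: shuffle the subjects, then split them in half.
shuffled = ages[:]
rng.shuffle(shuffled)
random_gap = mean(shuffled[:500]) - mean(shuffled[500:])

# Non-random contrast: the oldest half ends up in the treatment group.
by_age = sorted(ages)
selected_gap = mean(by_age[500:]) - mean(by_age[:500])

print(round(random_gap, 2), round(selected_gap, 2))
```

With random assignment the two groups differ in mean age only by chance, so the gap is small. In the non-random split the gap is large, and any difference in the response could be due to age rather than to the treatment.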
• Why bother to do observational studies?
Data Types in Observational Studies
• anecdotal evidence
• sample survey: selecting a sample of subjects from a
population and collecting data from them
• census: a survey of the whole population; often expensive,
time-consuming, or impossible
• sampling frame: the list of subjects in the population from
which the sample is taken
Ideally, the sampling frame lists the entire population.
In practice, it’s usually hard to identify every subject in the