ch 2 Regression part..

Introduction to Regression
Dependent variable (response variable)
Measures an outcome of a study
Dependent variable = Mean (expected value)
+ random error
GRE scores
y = E(y) + ε
If y is normally distributed, know the mean
and the standard deviation, we can make a
probability statement
Probability statement
Let’s say the mean cholesterol level for
graduate students= 250
Standard deviation= 50 units
What does this distribution look like?
“the probability that ____’s cholesterol will fall
within 2 standard deviations of the mean is
Independent variables (predictor
explains or causes changes in the response
(The effect of the IV on the DV)
(Predicting the DV based on the IV)
What independent variables might help us
predict cholesterol levels?
The effect of a reading intervention
program on student achievement in
Predict state revenues
Predict GPA based on SAT
predict reaction time from blood alcohol
Regression Analysis
Build a model that can be used to predict one
variable (y) based on other variables (x1, x2,
x3,… xk,)
Model: a prediction equation relating y to x1, x2,
x3,… xk,
Predict with a small amount of error
Typical Strategy for Regression Analysis
Conduct exploratory data analysis
Develop one or more tentative models
Identify most suitable model
Make inferences based on model
Fitting the Model: Least Squares Method
Model: an equation that describes the
relationship between variables
Let’s look at the persistence example
Method of Least Squares
Let’s look at the persistence example
̂1 
SS xy
SS xx
Finding the Least Squares Line
ˆ1 
SS xy
SS xx
Intercept: ˆ0  y  ˆ1 x
The line that makes the vertical distances of the
data points from the line as small as possible
The SE [Sum of Errors (deviations from the line, residuals)]
equals 0
The SSE (Sum of Squared Errors) is smaller than for any
other straight-line model with SE=0.
Regression Line
Has the form y = a + bx
b is the slope, the amount by which y changes
when x increases by 1 unit
a is the y-intercept, the value of y when x = 0 (or
the point at which the line cuts through the x-axis)
Simplest of the probabilistic models:
Straight-Line Regression Model
First order linear model
 Equation: y = β0 + β1x + ε
 Where
y = dependent variable
x = independent variable
β0 = y-intercept
β1 = slope of the line
ε = random error component
Let’s look at the relationship between two
variables and construct the line of best fit
Minitab example: Beers and BAC