# Ch 2 Regression part..

Introduction to Regression Analysis
Dependent variable (response variable)

Measures an outcome of a study



Dependent variable = mean (expected value) + random error

Examples: income, GRE scores

y = E(y) + ε
If y is normally distributed and we know its mean and standard deviation, we can make a probability statement about y.
Probability statement




Let’s say the mean cholesterol level for …
Standard deviation = 50 units
What does this distribution look like?
“The probability that ____’s cholesterol will fall within 2 standard deviations of the mean is approximately .95.”
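To check the .95 figure numerically, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. The standard deviation of 50 units comes from the slide; the mean of 200 is a hypothetical placeholder (the slide's value is missing), and the probability within k standard deviations does not depend on the mean anyway. The exact probability within 2 standard deviations is about .954, and the multiplier that captures exactly 95% is about 1.96.

```python
from statistics import NormalDist

def prob_within_k_sd(mean: float, sd: float, k: float) -> float:
    """P(mean - k*sd <= y <= mean + k*sd) when y ~ Normal(mean, sd)."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

# SD = 50 units as on the slide; mean = 200 is a HYPOTHETICAL placeholder.
p_two_sd = prob_within_k_sd(mean=200.0, sd=50.0, k=2)
print(round(p_two_sd, 4))  # 0.9545

# Multiplier that captures exactly 95% of a normal distribution
z_95 = NormalDist().inv_cdf(0.975)
print(round(z_95, 2))  # 1.96
```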
Independent variables (predictor variables)

Explain or cause changes in the response variable
(The effect of the IV on the DV)
(Predicting the DV based on the IV)
What independent variables might help us predict cholesterol levels?
Examples




The effect of a reading intervention program on student achievement in …
Predict state revenues
Predict GPA based on SAT scores
Predict reaction time from blood alcohol level
Regression Analysis

Build a model that can be used to predict one variable (y) based on other variables (x1, x2, x3, …, xk)

Model: a prediction equation relating y to x1, x2, x3, …, xk
Predict y with a small amount of error
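One way to make "a prediction equation relating y to x1, …, xk" concrete: the sketch below fits y = β0 + β1x1 + … + βkxk by least squares, solving the normal equations XᵀXb = Xᵀy with plain Gaussian elimination, and then uses the fitted equation to predict. It is a pure-Python illustration under stated assumptions, not the course's Minitab procedure; the helper names are hypothetical, and the data are made up and generated without noise from y = 1 + 2x1 − 0.5x2, so the fit should recover exactly those coefficients.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_linear_model(xs, ys):
    """Least squares (b0, b1, ..., bk) from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1 for the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(coefs, x):
    """Evaluate the prediction equation b0 + b1*x1 + ... + bk*xk."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))

# HYPOTHETICAL data, k = 2 predictors, generated from y = 1 + 2*x1 - 0.5*x2
xs = [(1, 4), (2, 1), (3, 5), (4, 2), (5, 3)]
ys = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in xs]
coefs = fit_linear_model(xs, ys)
print([round(b, 6) for b in coefs])      # [1.0, 2.0, -0.5]
print(round(predict(coefs, (6, 2)), 6))  # 12.0
```

With real data there is random error, so the fitted coefficients only approximate the underlying ones and predictions carry some error.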
Typical Strategy for Regression Analysis
Start
Conduct exploratory data analysis
Develop one or more tentative models
Identify most suitable model
Make inferences based on model
Stop
Fitting the Model: Least Squares Method


Model: an equation that describes the relationship between variables
Let’s look at the persistence example
Method of Least Squares
Let’s look at the persistence example
$\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}}$
Finding the Least Squares Line



Slope: $\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}}$
Intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
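A minimal Python sketch of the slope and intercept formulas, using the usual definitions $SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $SS_{xx} = \sum (x_i - \bar{x})^2$. The toy data are hypothetical (not the persistence or Beers/BAC data); for them SS_xy = 7 and SS_xx = 10, so the fitted line is ŷ = −0.1 + 0.7x.

```python
def least_squares_line(x, y):
    """Return (b0_hat, b1_hat) for the straight-line model y = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    if ss_xx == 0:
        raise ValueError("x values must not all be equal")
    b1_hat = ss_xy / ss_xx           # slope
    b0_hat = y_bar - b1_hat * x_bar  # intercept
    return b0_hat, b1_hat

# HYPOTHETICAL toy data
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0_hat, b1_hat = least_squares_line(x, y)
print(round(b0_hat, 4), round(b1_hat, 4))  # -0.1 0.7
```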
The least squares line is the line that makes the vertical distances of the data points from the line as small as possible:

The SE [sum of errors (deviations from the line, i.e., residuals)] equals 0.
The SSE (sum of squared errors) is smaller than for any other straight-line model with SE = 0.
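The two properties can be checked numerically. This sketch uses hypothetical toy data (x = 1, …, 5 and y = 1, 1, 2, 2, 4) and computes SE and SSE for the least squares line and for a few other lines through the point (x̄, ȳ). Every line through that point also has SE = 0, but each of the alternatives has a larger SSE. The function names are illustrative only.

```python
def fit_line(x, y):
    """Least squares intercept and slope: b1 = SS_xy / SS_xx, b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1

def se_and_sse(x, y, b0, b1):
    """Sum of errors (residuals) and sum of squared errors for the line y = b0 + b1*x."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(residuals), sum(r * r for r in residuals)

x = [1, 2, 3, 4, 5]  # HYPOTHETICAL toy data
y = [1, 1, 2, 2, 4]
b0, b1 = fit_line(x, y)
se_ls, sse_ls = se_and_sse(x, y, b0, b1)
print(f"least squares line: SE ~ 0? {abs(se_ls) < 1e-9}, SSE = {sse_ls:.2f}")

# Any line through (x_bar, y_bar) also has SE = 0; compare SSE for other slopes.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
alt_sse = {}
for slope in (0.5, 0.6, 0.8, 1.0):
    intercept = y_bar - slope * x_bar
    se_alt, alt_sse[slope] = se_and_sse(x, y, intercept, slope)
    print(f"slope {slope}: SE ~ 0? {abs(se_alt) < 1e-9}, SSE = {alt_sse[slope]:.2f}")
```

For these data the least squares SSE is 1.10, versus 1.50, 1.20, 1.20 and 2.00 for slopes 0.5, 0.6, 0.8 and 1.0.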
Regression Line

Has the form y = a + bx

b is the slope: the amount by which y changes when x increases by 1 unit
a is the y-intercept: the value of y when x = 0 (the point at which the line crosses the y-axis)
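Both interpretations follow directly from the form y = a + bx; a short derivation:

```latex
\begin{aligned}
y(x+1) - y(x) &= \bigl[a + b(x+1)\bigr] - \bigl[a + bx\bigr] = b, \\
y(0) &= a + b \cdot 0 = a.
\end{aligned}
```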
Simplest of the probabilistic models:
Straight-Line Regression Model
First-order linear model
Equation: y = β0 + β1x + ε
Where
y = dependent variable
x = independent variable
β0 = y-intercept
β1 = slope of the line
ε = random error component
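To see what the random error component does, this sketch simulates data from y = β0 + β1x + ε with ε drawn from a normal distribution with mean 0 (an assumption made here for illustration), then fits the least squares line. The "true" values β0 = 3, β1 = 1.5 and σ = 1 are hypothetical. With 201 points the estimates should land close to the true values but not exactly on them, because of ε.

```python
import random

def simulate_straight_line_model(beta0, beta1, sigma, xs, seed=0):
    """Draw y = beta0 + beta1*x + eps for each x, with eps ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

def least_squares_line(x, y):
    """Least squares estimates (b0_hat, b1_hat) using SS_xy / SS_xx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = ss_xy / ss_xx
    return y_bar - b1_hat * x_bar, b1_hat

# HYPOTHETICAL "true" parameters, chosen only for illustration
BETA0, BETA1, SIGMA = 3.0, 1.5, 1.0
xs = [i / 20 for i in range(201)]  # x from 0 to 10 in steps of 0.05
ys = simulate_straight_line_model(BETA0, BETA1, SIGMA, xs, seed=42)
b0_hat, b1_hat = least_squares_line(xs, ys)
print(f"beta0_hat = {b0_hat:.3f} (true {BETA0}), beta1_hat = {b1_hat:.3f} (true {BETA1})")
```

E(y) = β0 + β1x is the deterministic part; re-running with a different seed changes ε and therefore the estimates slightly.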

Let’s look at the relationship between two variables and construct the line of best fit

Minitab example: Beers and BAC