Uploaded by Charles De Saint Etienne

2 Introduction to Regression Analysis (1)

advertisement
2 Introduction to Regression Analysis
Basic Concept
1. Regression Analysis is a process of estimating the functional relationship between a random
variable Y (called dependent variable) and one or more other non-random variables Xs (called
independent variables).
2. In regression analysis, we assume that the relationship between the dependent variable Y and the
independent variables Xs is linear, as in the following form:
Y  0  1 X1  ...   k X k  
(1)
3. If equation (1) above involves only one independent variable (i.e., Y   0  1 X 1   ), it is called
a simple regression model. If it involves more than one independent variable, it is referred to as a
multiple regression model.
4. In equation (1) above, we assume that all Xs are independent (uncorrelated) and, in addition, the
values of Xs are given constants (not random variables). The random errors, ’s (epsilons), are
assumed to be independently and normally distributed with mean equal to zero and a constant (but
unknown) variance 2. With these assumptions, the expected value of Y is:
E (Y )  E (0  1 X1  ...   k X k   )  0  1 X1  ...   k X k
(2)
5. Since the coefficients (’s) of equation (1) are unknown, we need to make estimations based on
the data collected for X and Y. In Regression Analysis, the method used to estimate these
coefficients is called “the least squares method” – so called because the resulting mathematical
function (called the least squares equation, least squares line, or simply regression line) has the
smallest sum of squared estimation errors (minimum SSE). The least squares equation can be
written as:
yˆ  ˆ0  ˆ1 x1 
where
 ˆk xk
(3)
ŷ (the fitted value of y) is used to estimate E(y), and
ˆi (the fitted values of i) is used to estimate i.
Note that ˆi are obtained by solving the following k + 1 equations (called normal equations)
simultaneously:

ˆ
n
 ( yi  yˆi )2 
i i 1

ˆ
n
( y
i i 1
i
 ˆ0  ˆ1 x1 
 ˆk xk ) 2  0
i  0,..., k
(4)
6. If all assumptions mentioned above hold, the sample statistics such as ˆi will hold some
important properties that allow us to perform statistical analysis on the regression model obtained.
1
Simple Regression – An Example
We will demonstrate how to do simple regression by using the following example. Suppose a
company wants to find out how its sales revenue (Y) is related to advertising expenditure (X). The
following data were collected.
X
1
2
3
4
5
Y
1
1
2
2
4
To visualize the relationship between Y and X, the following scatterplot is produced.
5
4
Y
3
2
1
0
0
1
2
3
4
5
6
X
We want to find the straight line that can “best” represent the relationship between Y and X.
2
Download