2 Introduction to Regression Analysis

Basic Concept

1. Regression analysis is the process of estimating the functional relationship between a random variable Y (called the dependent variable) and one or more non-random variables X (called the independent variables).

2. In regression analysis, we assume that the relationship between the dependent variable Y and the independent variables X is linear, of the following form:

   $$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \varepsilon \qquad (1)$$

3. If equation (1) involves only one independent variable (i.e., $Y = \beta_0 + \beta_1 X_1 + \varepsilon$), it is called a simple regression model. If it involves more than one independent variable, it is referred to as a multiple regression model.

4. In equation (1), we assume that the Xs are independent of one another (uncorrelated) and that their values are given constants (not random variables). The random errors $\varepsilon$ (epsilons) are assumed to be independently and normally distributed with mean zero and a constant (but unknown) variance $\sigma^2$. Because $E(\varepsilon) = 0$ under these assumptions, the expected value of Y is:

   $$E(Y) = E(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \varepsilon) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k \qquad (2)$$

5. Since the coefficients ($\beta$'s) of equation (1) are unknown, we need to estimate them from the data collected for X and Y. In regression analysis, the method used to estimate these coefficients is called "the least squares method", so called because the resulting mathematical function (called the least squares equation, least squares line, or simply the regression line) has the smallest sum of squared estimation errors (minimum SSE). The least squares equation can be written as:

   $$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_k x_k \qquad (3)$$

   where $\hat{y}$ (the fitted value of y) is used to estimate E(y), and $\hat{\beta}_i$ (the estimate of $\beta_i$) is used to estimate $\beta_i$. Note that the $\hat{\beta}_i$ are obtained by solving the following k + 1 equations (called the normal equations) simultaneously, where j indexes the n observations:

   $$\frac{\partial}{\partial \hat{\beta}_i} \sum_{j=1}^{n} (y_j - \hat{y}_j)^2 = \frac{\partial}{\partial \hat{\beta}_i} \sum_{j=1}^{n} (y_j - \hat{\beta}_0 - \hat{\beta}_1 x_{1j} - \dots - \hat{\beta}_k x_{kj})^2 = 0, \qquad i = 0, \dots, k \qquad (4)$$

6.
If all the assumptions mentioned above hold, sample statistics such as $\hat{\beta}_i$ have important properties that allow us to perform statistical inference on the regression model obtained.

Simple Regression – An Example

We will demonstrate how to do simple regression using the following example. Suppose a company wants to find out how its sales revenue (Y) is related to its advertising expenditure (X). The following data were collected.

   X    1    2    3    4    5
   Y    1    1    2    2    4

To visualize the relationship between Y and X, the following scatterplot is produced.

[Figure: scatterplot of Y (vertical axis) against X (horizontal axis)]

We want to find the straight line that can "best" represent the relationship between Y and X.
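
As a minimal sketch of what "best" means in the least squares sense, the Python snippet below fits a straight line to the five data points of this example. With a single independent variable, the two normal equations have the standard closed-form solution: the slope is $\sum (x_j - \bar{x})(y_j - \bar{y}) / \sum (x_j - \bar{x})^2$ and the intercept is $\bar{y} - \hat{\beta}_1 \bar{x}$. The function name `fit_simple_regression` and the variable names are illustrative choices, not part of the original notes.

```python
def fit_simple_regression(x, y):
    """Least squares fit of y_hat = b0 + b1 * x.

    Returns (b0, b1, sse), where sse is the sum of squared
    estimation errors of the fitted line.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of cross-products and squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be identical")

    # Closed-form solution of the two normal equations (k = 1).
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # Sum of squared errors of the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse


# Data from the example: advertising expenditure (X), sales revenue (Y).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

b0, b1, sse = fit_simple_regression(x, y)
print(f"fitted line: y_hat = {b0:.2f} + {b1:.2f} x, SSE = {sse:.2f}")
```

For these data the sketch gives $\hat{y} = -0.10 + 0.70x$ with SSE = 1.10; any other straight line through the same points has a larger sum of squared errors.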