Introduction to Regression Analysis

• We use sample data to
  • estimate a population mean ($\mu$) or a difference of means ($\mu_1 - \mu_2$)
  • estimate a population proportion ($p$) or a difference of proportions ($p_1 - p_2$)
  • test hypotheses about $\mu$ or $\mu_1 - \mu_2$
  • test hypotheses about $p$ or $p_1 - p_2$.
• Now we want to use sample data to investigate the relationships among a group of variables and to build a mathematical model that can be used to predict the future value of one of them.
• The process of finding the mathematical model (equation) that best fits the data is known as regression analysis.
• The variable to be predicted (or modeled), $y$, is called the dependent variable.
• The variables used to predict (or model) $y$ are called independent variables and are denoted by the symbols $x_1, x_2, x_3$, etc.
• General form of the probabilistic model in regression:
  $$y = \mu_{y|x_1, x_2, \ldots, x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
  where
  $y$ = dependent variable
  $\mu_{y|x_1, x_2, \ldots, x_k}$ = mean or expected value of $y$ (the deterministic component)
  $\varepsilon$ = unexplainable, or random, error component
• Estimation/prediction equation:
  $$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

Form of the Simple Linear Regression Model

$$y = \mu_{y|x} + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$$
• $\mu_{y|x} = \beta_0 + \beta_1 x$ is the mean value of the dependent variable $y$ when the value of the independent variable is $x$.
• $\beta_0$ is the y-intercept, the mean of $y$ when $x$ is 0 (meaningful only when the sample includes observed values of $x$ near 0).
• $\beta_1$ is the slope, the change in the mean of $y$ per unit change in $x$ (over the range of the sample x-values).
• $\varepsilon$ is an error term that describes the effect on $y$ of all factors other than $x$.

[Figure: The Simple Linear Regression Model Illustrated]

Regression Terms
• $\beta_0$ and $\beta_1$ are called regression parameters.
• $\beta_0$ is the y-intercept and $\beta_1$ is the slope.
• We do not know the true values of these parameters.
• So, we must use sample data to estimate them.
• $b_0$ is the estimate of $\beta_0$ and $b_1$ is the estimate of $\beta_1$.

The Least Squares Point Estimates
• Estimation/prediction equation: $\hat{y} = b_0 + b_1 x$
• Slope: $b_1 = \dfrac{SS_{xy}}{SS_{xx}}$
• y-intercept: $b_0 = \bar{y} - b_1 \bar{x}$
  where
  $\bar{x} = \dfrac{\sum x_i}{n}$, $\bar{y} = \dfrac{\sum y_i}{n}$, $n$ = sample size
  $SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y}$
  $SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$
• MS Excel: =SLOPE(y range, x range) and =INTERCEPT(y range, x range)

An Estimator of $\sigma^2$

$$s^2 = \frac{SSE}{n - 2}$$
where
  $SSE = \sum (y_i - \hat{y}_i)^2 = SS_{yy} - b_1 SS_{xy} = \sum y_i^2 - n\bar{y}^2 - b_1 SS_{xy}$
  $n$ = sample size
  $s = \sqrt{s^2}$ = standard deviation of the error = standard error of estimate

A $100(1-\alpha)\%$ Confidence Interval for the Simple Linear Regression Slope $\beta_1$

$$b_1 \pm t_{\alpha/2}\, s_{b_1}, \qquad \text{where } s_{b_1} = \frac{s}{\sqrt{SS_{xx}}}$$
$t_{\alpha/2}$ is based on $n - 2$ degrees of freedom.

Testing the Significance of the Slope
• Test statistic: $t = \dfrac{b_1}{s_{b_1}}$
• One-tailed test: $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 < 0$ (reject $H_0$ if $t < -t_\alpha$) or $H_a: \beta_1 > 0$ (reject $H_0$ if $t > t_\alpha$), where $t_\alpha$ is based on $n - 2$ degrees of freedom.
• Two-tailed test: $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$ (reject $H_0$ if $|t| > t_{\alpha/2}$), where $t_{\alpha/2}$ is based on $n - 2$ degrees of freedom.

The $100(1-\alpha)\%$ Confidence Interval for the Mean Value of $y$ at $x = x_p$

$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$$
where $\hat{y} = b_0 + b_1 x_p$ and $t_{\alpha/2}$ is based on $n - 2$ degrees of freedom.

The $100(1-\alpha)\%$ Prediction Interval for an Individual $y$ at $x = x_p$

$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$$
where $\hat{y} = b_0 + b_1 x_p$ and $t_{\alpha/2}$ is based on $n - 2$ degrees of freedom.

Simple Coefficient of Determination

$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$
About $100(r^2)\%$ of the sample variation in $y$ can be explained by using $x$ to predict $y$ in the simple linear regression model.

[Figure: total variation ($y_i - \bar{y}$) split into explained variation ($\hat{y}_i - \bar{y}$) and unexplained variation ($y_i - \hat{y}_i$) at a point $x_i$]

The Coefficient of Correlation

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}, \qquad \text{where } SS_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - n\bar{y}^2$$
• $r$ is used for a sample and $\rho$ (rho) for a population.
• $-1 \le r \le 1$
• $r > 0$ means that $y$ increases as $x$ increases.
• $r < 0$ means that $y$ decreases as $x$ increases.
• $r \approx 0$ means little or no linear relationship between $y$ and $x$.
• The closer $r$ is to 1 or -1, the stronger the linear relationship.
• High correlation does not imply causality; it shows only that a linear trend exists between $x$ and $y$.
• $r = +\sqrt{r^2}$ when $b_1 > 0$ and $r = -\sqrt{r^2}$ when $b_1 < 0$.

Exercise
• What is the range of values that the coefficient of determination can assume? ___
• If the value of $r$ is -0.96, what does this indicate about the dependent variable as the independent variable increases? ___
• If the correlation between sales and advertising is +0.6, what percent of the variation in sales can be attributed to advertising? ___
• What does the coefficient of determination equal if $r = 0.89$? ___
• In the regression equation, what does the letter "b" represent?
• What is the null hypothesis to test the significance of the slope in a regression equation?
• The regression equation is $\hat{Y} = 29.29 - 0.96X$, the sample size is 8, and the standard error of the slope is 0.22. What is the test statistic to test the significance of the slope?
• Page 488 no. 26
• Page 494 no. 31
• Page 500 no. 38
• Page 502 no. 46
• Page 506 no. 56
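The least squares point estimates ($b_1 = SS_{xy}/SS_{xx}$, $b_0 = \bar{y} - b_1\bar{x}$) can be checked by hand with a short sketch. The data below ($x = 1, \ldots, 5$, $y = 2, 4, 5, 4, 5$) are a hypothetical toy sample, not from the slides, and the function names `least_squares` and `shortcut_sums` are illustrative. The second function confirms that the definitional and computational forms of $SS_{xy}$ and $SS_{xx}$ agree.

```python
# Least squares point estimates for simple linear regression:
# b1 = SSxy / SSxx and b0 = ybar - b1 * xbar.


def least_squares(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Definitional forms of the sums of squares
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    if ss_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def shortcut_sums(x, y):
    """Computational forms: SSxy = sum(x*y) - n*xbar*ybar, SSxx = sum(x^2) - n*xbar^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    ss_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    return ss_xy, ss_xx


# Hypothetical illustration data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.1f} + {b1:.1f}x")  # y_hat = 2.2 + 0.6x
```

Entering the same two columns in Excel and using =SLOPE(y range, x range) and =INTERCEPT(y range, x range) should reproduce these two values.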
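The estimator $s^2 = SSE/(n-2)$ can be sketched the same way, computing SSE both directly from the residuals, $\sum (y_i - \hat{y}_i)^2$, and from the shortcut $SS_{yy} - b_1 SS_{xy}$ to show that they match. This is a minimal sketch on a hypothetical toy sample; the function name `error_variance` is illustrative only.

```python
import math


def error_variance(x, y):
    """Return (SSE, SSE via shortcut, s^2, s), with s^2 = SSE / (n - 2)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points so that n - 2 > 0")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    # Direct form: sum of squared residuals
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Shortcut form: SSyy - b1 * SSxy
    sse_shortcut = ss_yy - b1 * ss_xy
    s2 = sse / (n - 2)
    return sse, sse_shortcut, s2, math.sqrt(s2)


# Hypothetical illustration data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sse, sse_shortcut, s2, s = error_variance(x, y)
print(f"SSE = {sse:.2f}, s^2 = {s2:.2f}, s = {s:.4f}")  # SSE = 2.40, s^2 = 0.80, s = 0.8944
```

The divisor is $n - 2$ rather than $n$ because two parameters ($b_0$ and $b_1$) were estimated from the same data.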
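The confidence interval for the slope, $b_1 \pm t_{\alpha/2} s_{b_1}$, and the two-tailed test statistic $t = b_1/s_{b_1}$ can be sketched as below. Python's standard library has no t distribution, so this sketch assumes the caller looks up $t_{\alpha/2}$ with $n-2$ degrees of freedom in a t table; here $t_{0.025} = 3.182$ for 3 degrees of freedom. The function name `slope_inference` and the toy data are hypothetical.

```python
import math


def slope_inference(x, y, t_crit):
    """Return (b1, s_b1, t_stat, (lower, upper)) for the slope beta1.

    t_crit is t_{alpha/2} with n - 2 degrees of freedom, supplied by the
    caller from a t table (it is not computed here).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = ss_xy / ss_xx
    s = math.sqrt((ss_yy - b1 * ss_xy) / (n - 2))  # standard error of estimate
    s_b1 = s / math.sqrt(ss_xx)                     # standard error of the slope
    t_stat = b1 / s_b1                              # statistic for H0: beta1 = 0
    margin = t_crit * s_b1
    return b1, s_b1, t_stat, (b1 - margin, b1 + margin)


# Hypothetical illustration data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_025_DF3 = 3.182  # t_{0.025} with n - 2 = 3 degrees of freedom, from a t table
b1, s_b1, t_stat, ci = slope_inference(x, y, T_025_DF3)
print(f"t = {t_stat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")  # t = 2.121, 95% CI = (-0.300, 1.500)
# Two-tailed rejection region: |t| > t_{alpha/2}
print("reject H0" if abs(t_stat) > T_025_DF3 else "do not reject H0")  # do not reject H0
```

With only five toy points the slope is not significant at $\alpha = 0.05$, which is consistent with the 95% interval containing 0. For a one-tailed test, compare $t$ with $t_\alpha$ instead of $t_{\alpha/2}$.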
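The confidence interval for the mean value of $y$ and the prediction interval for an individual $y$ at $x = x_p$ differ only by the extra 1 under the square root. The sketch below computes both from the same point estimate $\hat{y} = b_0 + b_1 x_p$, assuming as before that $t_{\alpha/2}$ (here 3.182 for 3 degrees of freedom) comes from a t table. The function name `intervals_at`, the choice $x_p = 4$, and the data are hypothetical.

```python
import math


def intervals_at(x, y, xp, t_crit):
    """Return (y_hat, CI for mean y, PI for individual y) at x = xp.

    t_crit is t_{alpha/2} with n - 2 degrees of freedom, supplied by the caller.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt((ss_yy - b1 * ss_xy) / (n - 2))  # standard error of estimate
    y_hat = b0 + b1 * xp                            # point estimate at x = xp
    h = 1 / n + (xp - x_bar) ** 2 / ss_xx           # 1/n + (xp - xbar)^2 / SSxx
    ci_half = t_crit * s * math.sqrt(h)             # half-width for the mean of y
    pi_half = t_crit * s * math.sqrt(1 + h)         # half-width for an individual y
    ci = (y_hat - ci_half, y_hat + ci_half)
    pi = (y_hat - pi_half, y_hat + pi_half)
    return y_hat, ci, pi


# Hypothetical illustration data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, ci, pi = intervals_at(x, y, xp=4, t_crit=3.182)
print(f"y_hat = {y_hat:.2f}")
print(f"95% CI for mean y:       ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% PI for individual y: ({pi[0]:.3f}, {pi[1]:.3f})")
```

Both intervals are centered at $\hat{y}$; the prediction interval is always wider because an individual $y$ also carries the error term $\varepsilon$. Both widen as $x_p$ moves away from $\bar{x}$.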
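The coefficient of determination and the coefficient of correlation can be computed side by side, which also checks the rule that $r = \pm\sqrt{r^2}$ with the sign of $b_1$. In this sketch $r^2$ is taken as explained variation over total variation, $\sum(\hat{y}_i - \bar{y})^2 / \sum(y_i - \bar{y})^2$, and $r$ as $SS_{xy}/\sqrt{SS_{xx} SS_{yy}}$. The function name `r_squared_and_r` and the toy data are hypothetical.

```python
import math


def r_squared_and_r(x, y):
    """Return (r^2, r, b1) for a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    # Explained variation: sum of (y_hat_i - y_bar)^2
    explained = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    r2 = explained / ss_yy
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return r2, r, b1


# Hypothetical illustration data (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, r, b1 = r_squared_and_r(x, y)
print(f"r^2 = {r2:.2f}, r = {r:.4f}")  # r^2 = 0.60, r = 0.7746
# r carries the sign of b1: here b1 > 0, so r = +sqrt(r^2)
print(math.isclose(r, math.copysign(math.sqrt(r2), b1)))  # True
```

For this toy sample, about 60% of the sample variation in $y$ is explained by using $x$ in the model.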
