Business Statistics: A First Course Fifth Edition Chapter 12 Simple Linear Regression 简单线性回归 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc. Chap 12-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based on an independent variable 如何利用回归分析在给定 一个自变量(解释变量)的前提下来预测因变量的值? The meaning of the regression coefficients b0 and b1回归系数的解释 How to evaluate the assumptions of regression analysis and know what to do if the assumptions are violated 如何对回归分析的条件假 设进行检验并掌握在假设条件不满足时的处理方法 To make inferences about the slope and correlation coefficient 对回 归方程中的斜率估计b1和决定自变量与因变量关系的相关系数进行统 计推断 To estimate mean values and predict individual values 去估计期望值 和预测单个个体的因变量值 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-2 Correlation vs. Regression 相关分析和回归分析 A scatter plot can be used to show the relationship between two variables 散点图可以简单刻画两个变量之间的关系 Correlation analysis is used to measure the strength of the association (linear relationship) between two variables 相关分析用来度量两个变量之间的联系(特指线性关系)强度 Correlation is only concerned with strength of the relationship 相关分析只是度量关系的强度 No causal effect is implied with correlation 不带任何因果性分析 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-3 Introduction to Regression Analysis Regression analysis is used to: Predict the value of a dependent variable based on the value of at least one independent variable 给定自变量值下,预测应变量的值 Explain the impact of changes in an independent variable on the dependent variable 可以解释自变量变动对因变量的影响程度 Dependent variable: the variable we wish to predict or explain 因变量也是我们研究感兴趣的对象,希望去预测和解释它 Independent variable: the variable used to predict or explain the dependent variable 自变量是用来预测和解释因变量的 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-4 Simple Linear Regression Model 简单线性回归模型 Only one independent variable, X 只考虑一个 自变量 Relationship between X and Y is described by a linear function 假定自变量X和因变量Y之间 的关系是线性的 Changes in Y are assumed to be related to changes in X 模型假定Y的变化可以由X的变化 引起 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-5 Types of Relationships Linear relationships 线性关系 Y Curvilinear relationships 曲线关系 Y X Y X Y X Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. X Chap 12-6 Types of Relationships (continued) Weak relationships 弱线性相关 Strong relationships 强线性相关 Y Y X X Y Y X Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. X Chap 12-7 Types of Relationships (continued) No relationship 线性不相关 Y X Y X Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-8 Simple Linear Regression Model Population Slope Coefficient 总体的斜 率 Population Y intercept 总体的截距项 Dependent Variable因变 量 Independent Variable 自变量 Random Error term 随机误差项 Yi β0 β1Xi εi Linear component 线性部分 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Random Error component 随机扰动部分 Chap 12-9 Simple Linear Regression Model (continued) Y Yi β0 β1Xi εi Observed Value of Y for Xi Y的实 际观测值 εi Predicted Value of Y for Xi 给定Xi 下,Y的预测值 Slope = β1斜率 Random Error for this Xi value 给定Xi预测Y值时可能的随机扰动 Intercept = β0 截距项 Xi Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. X Chap 12-10 Simple Linear Regression Equation (Prediction Line) 简单线性回归方程(预测直线) The simple linear regression equation provides an estimate of the population regression line 简单线性回归方程是总体回 归线的一个估计 Estimated (or predicted) Y value for observation i 第i个个体的Y值的估 计或者预测 Estimate of the regression intercept 回归截 距项的估计 Estimate of the regression slope 回归斜率的估计 ˆ Yi b0 b1Xi Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Value of X for observation i 第i个观测的 解释变量X的 值 Chap 12-11 The Least Squares Method 最小二乘方法 b0 and b1 are obtained by finding the values of that minimize the sum of the squared differences between Y and Yˆ : 通过拟合残差平方和最小的方法来得到b0和b1, 这里拟合残差的定义就是Y- Yˆ 2 2 ˆ min (Yi Yi ) min (Yi (b0 b1Xi )) Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-12 Finding the Least Squares Equation The coefficients b0 and b1 , and other regression results in this chapter, will be found using Excel or Minitab 本章所有求解回归系数b0和b1都是通过Excel或者 Minitab得到 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-13 Interpretation of the Slope and the Intercept b0 is the estimated mean value of Y when the value of X is zero b0 是当自变量X值为零时,因变量Y的均值的估计 b1 is the estimated change in the mean value of Y as a result of a one-unit change in X b1是指当自变量X变化一个单位时,Y的均值变化 的估计值 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-14 Simple Linear Regression Example 例题 A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet) 一个房地产经纪想调查房屋销 售价格和房子本身大小的关系 A random sample of 10 houses is selected 一个包含10 个房屋的随机样本被抽取 Dependent variable (Y) = house price in $1000s 因变量Y为房屋的价格(单位:千美元) Independent variable (X) = square feet 自变量X为平方英尺 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-15 Simple Linear Regression Example: Data 例题的数据 House Price in $1000s (Y) Square Feet (X) 245 1400 312 1600 279 1700 308 1875 199 1100 219 1550 405 2350 324 2450 319 1425 255 1700 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-16 Simple Linear Regression Example: Scatter Plot 例题数据对应的散点图 House price model: Scatter Plot House Price ($1000s) 450 400 350 300 250 200 150 100 50 0 0 500 1000 1500 2000 2500 3000 Square Feet Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-17 Simple Linear Regression Example: Using Excel Excel中简单线性回归实现 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-18 Simple Linear Regression Example: Excel Output Regression Statistics Multiple R 0.76211 R Square 0.58082 Adjusted R Square 0.52842 Standard Error The regression equation is 回归方程如下: house price 98.24833 0.10977 (squarefeet) 41.33032 Observations 10 ANOVA df SS MS F 11.0848 Regression 1 18934.9348 18934.9348 Residual 8 13665.5652 1708.1957 Total 9 32600.5000 Coefficients Intercept Square Feet Standard Error t Stat P-value Significance F 0.01039 Lower 95% Upper 95% 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-19 Simple Linear Regression Example: Minitab Output Minitab中简单线性回归实现 The regression equation is: The regression equation is Price = 98.2 + 0.110 Square Feet Predictor Coef SE Coef Constant 98.25 58.03 Square Feet 0.10977 0.03297 T P 1.69 0.129 3.33 0.010 house price = 98.24833 + 0.10977 (square feet) S = 41.3303 R-Sq = 58.1% R-Sq(adj) = 52.8% Analysis of Variance Source Regression Residual Error Total DF 1 8 9 SS MS F P 18935 18935 11.08 0.010 13666 1708 32600 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-20 Simple Linear Regression Example: Graphical Representation 图形表示 House price model: Scatter Plot and Prediction Line 房屋定价的简单线性回归模型 House Price ($1000s) 450 Intercept = 98.248 400 350 Slope = 0.10977 300 250 200 150 100 50 0 0 500 1000 1500 2000 2500 3000 Square Feet house price 98.24833 0.10977 (squarefeet) Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-21 Simple Linear Regression Example: Interpretation of bo house price 98.24833 0.10977 (squarefeet) b0 is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values) b0 值是给定X值为零时的Y均值的估计值(一般要求观察 到的X值应包含X=0) Because a house cannot have a square footage of 0, b0 has no practical application 考虑到实际中不可能有房屋为0平方英尺,故b0 在这里无 实际应用意义 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-22 Simple Linear Regression Example: Interpreting b1 house price 98.24833 0.10977 (squarefeet) b1 estimates the change in the mean value of Y as a result of a one-unit increase in X b1是 对X变化一个单位时引起Y均值的变化的估计 Here, b1 = 0.10977 tells us that the mean value of a house increases by 0.10977($1000) = $109.77, on average, for each additional one square foot of size 每增加一平方英尺,房屋平均价格增加109.77 美元 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-23 Simple Linear Regression Example: Making Predictions Predict the price for a house with 2000 square feet:预测一个2000平方英尺的房屋的价格 house price 98.25 0.1098 (sq.ft.) 98.25 0.1098(2000) 317.85 The predicted price for a house with 2000 square feet is 317.85($1,000s) = $317,850 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-24 Simple Linear Regression Example: Making Predictions When using a regression model for prediction, only predict within the relevant range of data 回归方程在预测 时,只对那些X取值在观测区间的有效 Relevant range for interpolation X观测值区间 House Price ($1000s) 450 400 350 300 250 200 Do not try to extrapolate beyond the range of observed X’s 150 100 50 0 0 500 1000 1500 2000 2500 3000 不能对超出X观测范围的回归直线 的外沿部分进行预测 Square Feet Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-25 Measures of Variation 随机扰动的度量 Total variation is made up of two parts: SST Total Sum of Squares总平方和 SST ( Yi Y)2 SSR Regression Sum of Squares回归平方和 ˆ Y)2 SSR ( Y i SSE Error Sum of Squares 残差平方和 ˆ )2 SSE ( Yi Y i where: Y = Mean value of the dependent variable 因变量观测值的均值 Yi = Observed value of the dependent variable因变量第i个观测值 Yˆi = Predicted value of Y for the given Xi value因变量对应的估计值 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-26 Measures of Variation (continued) SST = total sum of squares Measures the variation of the Yi values around their mean Y 度量所有应变量观测值围绕均值的变差 SSR = regression sum of squares (Explained Variation) (Total Variation) Variation attributable to the relationship between X and Y 由X解释掉的(通过回归方程)Y的变差 SSE = error sum of squares (Unexplained Variation) Variation in Y attributable to factors other than X X未能解释掉的Y的变差 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-27 Measures of Variation (continued) Y Yi SSE = (Yi - Yi )2 Y _ Y SST = (Yi - Y)2 _ SSR = (Yi - Y)2 _ Y Xi Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. _ Y X Chap 12-28 Coefficient of Determination, 决定系数 r2 The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable 决定系数指应变量的总变差(总平方和)中被自变量 解释掉的(回归平方和)部分的比例 The coefficient of determination is also called r-squared and is denoted as r2也称为r平方 SSR regression sum of squares r SST total sum of squares 2 note: 0 r 1 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. 2 Chap 12-29 Examples of r2 Values Y r2 = 1 r2 = 1 X Y r2 =1 Perfect linear relationship between X and Y: X和Y是完全的线性关系 100% of the variation in Y is explained by variation in X 则Y的总变差完全被X100%的 解释掉 X Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-30 Examples of r2 Values Y 0 < r2 < 1 X Weaker linear relationships between X and Y: X和Y的线 性关系相对较弱 X Some but not all of the variation in Y is explained by variation in X X可以解释但不能完全解释掉 Y的总变差 Y Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-31 Examples of r2 Values r2 = 0 No linear relationship between X and Y: X和Y完全没 有线性相关关系 Y r2 = 0 X Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. The value of Y does not depend on X. (None of the variation in Y is explained by variation in X)Y的变化与X无关 (X不能解释Y的任何变化) Chap 12-32 Simple Linear Regression Example: Coefficient of Determination, r2 in Excel SSR 18934.9348 r 0.58082 SST 32600.5000 2 Regression Statistics Multiple R 0.76211 R Square 0.58082 Adjusted R Square 0.52842 Standard Error 58.08% of the variation in house prices is explained by variation in square feet 房屋面积的变动能够解释掉房价变化的 58.08% 41.33032 Observations 10 ANOVA df SS MS F 11.0848 Regression 1 18934.9348 18934.9348 Residual 8 13665.5652 1708.1957 Total 9 32600.5000 Coefficients Intercept Square Feet Standard Error t Stat P-value Significance F 0.01039 Lower 95% Upper 95% 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-33 Simple Linear Regression Example: Coefficient of Determination, r2 in Minitab The regression equation is Price = 98.2 + 0.110 Square Feet Predictor Coef SE Coef Constant 98.25 58.03 Square Feet 0.10977 0.03297 T P 1.69 0.129 3.33 0.010 r2 SSR 18934.9348 0.58082 SST 32600.5000 S = 41.3303 R-Sq = 58.1% R-Sq(adj) = 52.8% Analysis of Variance Source Regression Residual Error Total DF 1 8 9 SS MS F P 18935 18935 11.08 0.010 13666 1708 32600 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. 58.08% of the variation in house prices is explained by variation in square feet 房屋面积的变动能够解释掉房 价变化的58.08% Chap 12-34 Standard Error of Estimate 估计的标准差 The standard deviation of the variation of observations around the regression line is estimated by 观察值回绕回 归直线变动的标准差的估计为 n S YX SSE n2 (Yi Yˆi ) 2 i 1 n2 Where SSE = error sum of squares 残差平方和 n = sample size 样本量 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-35 Simple Linear Regression Example: Standard Error of Estimate in Excel Regression Statistics Multiple R 0.76211 R Square 0.58082 Adjusted R Square 0.52842 Standard Error SYX 41.33032 41.33032 Observations 10 ANOVA df SS MS F 11.0848 Regression 1 18934.9348 18934.9348 Residual 8 13665.5652 1708.1957 Total 9 32600.5000 Coefficients Intercept Square Feet Standard Error t Stat P-value Significance F 0.01039 Lower 95% Upper 95% 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-36 Simple Linear Regression Example: Standard Error of Estimate in Minitab The regression equation is Price = 98.2 + 0.110 Square Feet Predictor Coef SE Coef Constant 98.25 58.03 Square Feet 0.10977 0.03297 T P 1.69 0.129 3.33 0.010 SYX 41.33032 S = 41.3303 R-Sq = 58.1% R-Sq(adj) = 52.8% Analysis of Variance Source Regression Residual Error Total DF 1 8 9 SS MS F P 18935 18935 11.08 0.010 13666 1708 32600 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-37 Comparing Standard Errors SYX is a measure of the variation of observed Y values from the regression line SYX 度量了所有观测到的Y围绕回归直线变化的程度 Y Y small SYX X large SYX X The magnitude of SYX should always be judged relative to the size of the Y values in the sample data 判断SYX 的大小要参照Y本身度量的大小 i.e., SYX = $41.33K is moderately small relative to house prices in the $200K - $400K range Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-38 Assumptions of Regression L.I.N.E 回归模型中的假设 Linearity 线性关系 The relationship between X and Y is linear X和Y之间具有 线性关系 Independence of Errors 独立性 Error values are statistically independent 随机误差项之间 是相互独立的 Normality of Error 随机扰动的正态性 Error values are normally distributed for any given value of X 给定X下,误差项是服从正态分布的 Equal Variance (also called homoscedasticity) 等方差性 The probability distribution of the errors has constant variance 所有误差项的分布具有相同方差 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-39 Residual Analysis 残差分析 ˆ ei Yi Y i The residual for observation i, ei, is the difference between its observed and predicted value 残差就是观测值和预测值之间的差异 Check the assumptions of regression by examining the residuals 通过对残差的分析和检验,可以判断回归模型的假设是否成立 Examine for linearity assumption Evaluate independence assumption Evaluate normal distribution assumption Examine for constant variance for all levels of X (homoscedasticity) Graphical Analysis of Residuals 常用残差图分析来实现 Can plot residuals vs. X 可以画残差 vs. X的散点图 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-40 Residual Analysis for Linearity 线性假设的残差分析 Y Y x x Not Linear Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. residuals residuals x x Linear Chap 12-41 Residual Analysis for Independence 独立性假设的残差分析 Not Independent X Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. residuals residuals X residuals Independent X Chap 12-42 Checking for Normality 正态性假设的检验 Examine the Stem-and-Leaf Display of the Residuals 分析残差的茎叶图 Examine the Boxplot of the Residuals 分析残差 的盒子图 Examine the Histogram of the Residuals分析残 差的柱状图 Construct a Normal Probability Plot of the Residuals构造残差的正态概率图 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-43 Residual Analysis for Normality When using a normal probability plot, normal errors will approximately display in a straight line 当用正态概率图进 行判断时,如果点近似在一直线附近,则认为正态性假设 成立 100 Percent 0 -3 -2 -1 0 1 2 3 Residual Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-44 Residual Analysis for Equal Variance 方差齐性的残差分析 Y Y x x Non-constant variance Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. residuals residuals x x Constant variance Chap 12-45 Simple Linear Regression Example: Excel Residual Output RESIDUAL OUTPUT Residuals 1 251.92316 -6.923162 2 273.87671 38.12329 3 284.85348 -5.853484 4 304.06284 3.937162 5 218.99284 -19.99284 80 60 40 Residuals Predicted House Price House Price Model Residual Plot 20 0 6 268.38832 -49.38832 -20 7 356.20251 48.79749 -40 8 367.17929 -43.17929 -60 9 254.6674 64.33264 10 284.85348 -29.85348 0 1000 2000 3000 Square Feet Does not appear to violate any regression assumptions Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. 似乎没有违反任何回归模型的假设 Chap 12-46 Inferences About the Slope 斜率的统计推断 The standard error of the regression slope coefficient (b1) is estimated by 回归直线斜率估 计b1的标准差的估计为 S YX Sb1 SSX S YX (X X) 2 i where: Sb1= Estimate of the standard error of the slope 斜率的标准差的估计 S YX SSE = Standard error of the estimate 模型估计的标准误 n2 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-47 Inferences About the Slope: t Test 斜率统计推断的t检验 t test for a population slope 总体斜率的t检验 Null and alternative hypotheses 对应的统计原假设和备择 假设 Is there a linear relationship between X and Y? X和Y之间有线性 关系吗 H0: β1 = 0 H1: β1 ≠ 0 (no linear relationship 无限性关系) (linear relationship does exist 线性关系确实存在) Test statistic 检验统计量 t STAT b1 β1 Sb 1 d.f. n 2 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. where: b1 = regression slope coefficient 回归斜率的估计 β1 = hypothesized slope 假设的回归斜率值 Sb1 = standard error of the slope 回归斜率的估计的标准误 Chap 12-48 Inferences About the Slope: t Test Example House Price in $1000s (y) Square Feet (x) 245 1400 312 1600 279 1700 308 1875 199 1100 219 1550 405 2350 324 2450 319 1425 255 1700 Estimated Regression Equation: Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. house price 98.25 0.1098 (sq.ft.) The slope of this model is 0.1098 Is there a relationship between the square footage of the house and its sales price? 模型中斜率项的估计为0.1098,则 房屋的大小(平方英尺)和售价之 间有线性关系吗? Chap 12-49 Inferences About the Slope: t Test Example H0: β1 = 0 H1: β1 ≠ 0 From Excel output: Coefficients Intercept Square Feet Standard Error t Stat P-value 98.24833 58.03348 1.69296 0.12892 0.10977 0.03297 3.32938 0.01039 From Minitab output: b1 Predictor Coef SE Coef Constant 98.25 58.03 Square Feet 0.10977 0.03297 T P 1.69 0.129 3.33 0.010 b1 Sb1 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Sb1 t STAT b1 β 1 Sb 0.10977 0 3.32938 0.03297 1 Chap 12-50 Inferences About the Slope: t Test Example Test Statistic: tSTAT = 3.329 H0: β1 = 0 H1: β1 ≠ 0 d.f. = 10- 2 = 8 a/2=.025 Reject H0 a/2=.025 Do not reject H0 -tα/2 -2.3060 0 Reject H0 tα/2 2.3060 3.329 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Decision: Reject H0 There is sufficient evidence that square footage affects house price 有足够的证据 认为房屋的大小影响到房 屋的售价 Chap 12-51 Inferences About the Slope: H :β =0 t Test Example 0 1 H1: β1 ≠ 0 From Excel output: Coefficients Intercept Square Feet Standard Error t Stat P-value 98.24833 58.03348 1.69296 0.12892 0.10977 0.03297 3.32938 0.01039 From Minitab output: Predictor Coef SE Coef Constant 98.25 58.03 Square Feet 0.10977 0.03297 T P 1.69 0.129 3.33 0.010 p-value Decision: Reject H0, since p-value < α There is sufficient evidence that square footage affects house price. 用p-值同样的结论可以得到 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-52 F Test for Significance 显著性的F检验 MSR F Test statistic: F STAT MSE where SSR 1 SSE MSE n2 MSR where FSTAT follows an F distribution with 1 numerator and (n – 2) denominator degrees of freedom F统计量服从F分布,其分子自由度为 1,分母自由度为n-2 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-53 F-Test for Significance Excel Output Regression Statistics Multiple R 0.76211 R Square 0.58082 Adjusted R Square 0.52842 Standard Error MSR 18934.9348 FSTAT 11.0848 MSE 1708.1957 41.33032 Observations 10 With 1 and 8 degrees of freedom p-value for the F-Test ANOVA df SS MS F 11.0848 Regression 1 18934.9348 18934.9348 Residual 8 13665.5652 1708.1957 Total 9 32600.5000 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Significance F 0.01039 Chap 12-54 F-Test for Significance Minitab Output Analysis of Variance Source Regression Residual Error Total DF 1 8 9 SS MS F P 18935 18935 11.08 0.010 13666 1708 32600 With 1 and 8 degrees of freedom Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. FSTAT p-value for the F-Test MSR 18934.9348 11.0848 MSE 1708.1957 Chap 12-55 F Test for Significance (continued) H0: β1 = 0 H1: β1 ≠ 0 a = .05 df1= 1 df2 = 8 Critical Value: Fa = 5.32 a = .05 0 Do not reject H0 Reject H0 Test Statistic: FSTAT MSR 11.08 MSE Decision: Reject H0 at a = 0.05 Conclusion: There is sufficient evidence that house size affects selling price 有足够的理由认 F 为房子的面积影响到房子的销售价格 F.05 = 5.32 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-56 Confidence Interval Estimate for the Slope 斜率估计的置信区间 Confidence Interval Estimate of the Slope:斜率的置信区间估计 b1 tα / 2Sb d.f. = n - 2 1 Excel Printout for House Prices: Intercept Square Feet Coefficients Standard Error t Stat P-value 98.24833 0.10977 Lower 95% Upper 95% 58.03348 1.69296 0.12892 -35.57720 232.07386 0.03297 3.32938 0.01039 0.03374 0.18580 At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858) Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-57 Confidence Interval Estimate for the Slope (continued) Intercept Square Feet Coefficients Standard Error t Stat P-value 98.24833 0.10977 Lower 95% Upper 95% 58.03348 1.69296 0.12892 -35.57720 232.07386 0.03297 3.32938 0.01039 0.03374 0.18580 Since the units of the house price variable is $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per square foot of house size 可以认为房屋大小增加一平方英尺,平均带来售价的增长在 33.74和185.80之间 This 95% confidence interval does not include 0. Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance 因为该区间不包括0,我们认为在0.05水平上,房屋售价和大 小的线性关系显著存在 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-58 Estimating Mean Values and Predicting Individual Values 估计均值和预测个体值 Goal: Form intervals around Y to express uncertainty about the value of Y for a given Xi 目标是在给定Xi下构造Y的置信区间来刻画Y 的不确定性 Confidence Interval for the mean of Y, given Xi Y Y Y = b0+b1Xi Prediction Interval for an individual Y, given Xi Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Xi X Chap 12-59 Confidence Interval for the Average Y, Given X Confidence interval estimate for the mean value of Y given a particular Xi 给定某个Xi时,对应Y的均值的置信区间估计公 式如下: Confidenceint ervalfor μ Y|X X i : Yˆ ta / 2SYX hi Size of interval varies according to distance away from mean, X 区间的大小随着每一个X 离开均值距离大小不同而不同 1 (X i X)2 1 (X i X)2 hi n SSX n (X i X)2 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-60 Prediction Interval for an Individual Y, Given X Prediction interval estimate for an Individual value of Y given a particular Xi 给定某个Xi下,该个体Y值的预测区间 估计公式如下: Prediction interval for YXX : i Yˆ t α / 2 S YX 1 hi This extra term adds to the interval width to reflect the added uncertainty for an individual case 相对于Y平均值的置信区间,这个而外多出来的部 分反映了个体本身不确定性带来的可能变动 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-61 Estimation of Mean Values: Example Confidence Interval Estimate for μY|X=X i Find the 95% confidence interval for the mean price of 2,000 square-foot houses Predicted Price Yi = 317.85 ($1,000s) 1 ˆ t Y S 0.025 YX n (X i X) 2 (X i X) 2 317.85 37.12 The confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-62 Estimation of Individual Values: Example Prediction Interval Estimate for YX=X i Find the 95% prediction interval for an individual house with 2,000 square feet Predicted Price Yi = 317.85 ($1,000s) 1 ˆ t Y S 1 0.025 YX n (X i X) 2 (X i X) 2 317.85 102.28 The prediction interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070 Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-63 Finding Confidence and Prediction Intervals in Excel (continued) Input values Y Confidence Interval Estimate for μY|X=Xi Prediction Interval Estimate for YX=Xi Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-64 Finding Confidence and Prediction Intervals in Minitab Confidence Interval Estimate for μY|X=Xi Predicted Values for New Observations New Obs Fit SE Fit 95% CI 95% PI 1 317.8 16.1 (280.7, 354.9) (215.5, 420.1) Y Values of Predictors for New Observations New Square Obs Feet 1 2000 Input values Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Prediction Interval Estimate for YX=Xi Chap 12-65 Chapter Summary Introduced types of regression models Reviewed assumptions of regression and correlation Discussed determining the simple linear regression equation Described measures of variation Discussed residual analysis Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-66 Chapter Summary (continued) Described inference about the slope Discussed correlation -- measuring the strength of the association Addressed estimation of mean values and prediction of individual values Discussed possible pitfalls in regression and recommended strategies to avoid them Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.. Chap 12-67