PROBABILITY & STATISTICS Empirical models Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Chapter 11: Simple Linear Regression and Correlation Learning objectives 1. Empirical Models 2. Simple Linear Regression 3. Correlation Correlation Summary 19/05/2021 Department of Mathematics 1/40 Empirical Models Empirical models models Empirical Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation • Many problems in engineering and science involve exploring the relationships between two or more variables. • Regression analysis is a statistical technique that is very useful for these types of problems. • For example, in a chemical process, suppose that the yield of the product is related to the process-operating temperature. • Regression analysis can be used to build a model to predict yield at a given temperature level. Summary 19/05/2021 Department of Mathematics 2/40 Empirical Models Empirical models models Empirical Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 Department of Mathematics 3/40 Empirical Models Empirical models models Empirical Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 Figure 11-1 Scatter Diagram of oxygen purity versus hydrocarbon level from Table 11-1. Department of Mathematics 4/40 Empirical Models Empirical models models Empirical Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to x by the following straight-line relationship: E(Y | x) = µY | x = β0 + β1 x regression coefficients. The simple linear regression model is given by Y = β 0 + β1 x + ε Summary random error 19/05/2021 Department of Mathematics 5/40 Empirical Models Empirical models models Empirical Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 Suppose that the mean and variance of ε are 0 and σ2, respectively, then E(Y | x) = E(β0 + β1x + ε ) = β0 + β1x + E(ε ) = β0 + β1x The variance of Y given x is V (Y | x) = V (β0 + β1x +ε ) =V (β0 + β1x) +V (ε ) = 0 +σ 2 = σ 2 The true regression model is a line of mean values: µY | x = β 0 + β1 x Department of Mathematics 6/40 Empirical Models Empirical models models Empirical Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 Figure 11-2 The distribution of Y for a given value of x for the oxygen purity-hydrocarbon data. Department of Mathematics 7/40 Simple Linear Regression Empirical models Simple Linear Linear Simple Regression Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 • The case of simple linear regression considers a single regressor or predictor x and a dependent or response variable Y. • The expected value of Y at each level of x is a random variable: E (Y | x) = β 0 + β1 x • We assume that each observation, Y, can be described by the model Y = β 0 + β1 x + ε Department of Mathematics 8/40 Simple Linear Regression Empirical models Simple Linear Linear Simple Regression Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 Suppose that we have n pairs of observations (x1, y1), (x2, y2), …, (xn, yn): yi = β0 + β1xi +εi , i =1,...,n Figure 11-3 Deviations of the data from the estimated regression model. Department of Mathematics 9/40 Simple Linear Regression Empirical models Simple Linear Linear Simple Regression Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 The method of least squares is used to estimate the parameters, β0 and β1 by minimizing the sum of the squares of the vertical deviations in Figure 11-3. Figure 11-3 Deviations of the data from the estimated regression model. Department of Mathematics 10/40 Simple Linear Regression Empirical models Simple Linear Linear Simple Regression Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy The sum of the squares of the deviations of the observations from the true regression line is n n i =1 i =1 L = ∑ ε i2 = ∑ ( yi − β 0 − β1 x) 2 Correlation Summary 19/05/2021 Department of Mathematics 11/40 Simple Linear Regression Empirical models Simplifying these two equations yields Simple Linear Linear Simple Regression Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 βˆ0 , βˆ1 = ? Notation n n ∑ xi ∑ yi n n S xy = ∑ ( yi − y )( xi − x ) = ∑ xi yi − i =1 i =1 n i =1 i =1 Department of Mathematics 12/40 Simple Linear Regression Empirical models Simple Linear Linear Simple Regression Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Theorem The least squares estimates of the intercept and slope in the simple linear regression model are βˆ 0 = y − βˆ1 x S xy ˆ β1 = S xx Estimated regression line is yˆ = βˆ0 + βˆ1 x Summary 19/05/2021 Department of Mathematics 13/40 Simple Linear Regression Empirical models Simple Linear Linear Simple Regression Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 Department of Mathematics 14/40 Simple Linear Regression Empirical models Simple Linear Linear Simple Regression Regression Example We will fit a simple linear regression model to the oxygen purity data in Table 11-1. The following quantities may be computed: Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 Department of Mathematics 15/40 Simple Linear Regression Empirical models Simple Linear Linear Simple Regression Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Therefore, the least squares estimates of the slope and intercept are Correlation Summary 19/05/2021 Department of Mathematics 16/40 Simple Linear Regression Empirical models The fitted simple linear regression model is Simple Linear Linear Simple Regression Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 Department of Mathematics 17/40 Simple Linear Regression Empirical models Simple Linear Regression Estimating σσ22 Estimating Hypothesis tests Confidence intervals Prediction Adequacy Estimating σ2 Estimating σ2 The error sum of squares is We have E(SSE) = (n – 2)σ2. SSE = SST − β̂1Sxy Correlation Summary 19/05/2021 Department of Mathematics 18/40 Estimating σ2 Simple Linear Regression Empirical models Simple Linear Regression Estimating σσ22 Estimating Hypothesis tests Confidence intervals Prediction Adequacy Estimating σ2 Theorem An unbiased estimator of σ2 is σˆ 2 = where SS E n−2 σˆ Standard error SSE = SST − β̂1Sxy Correlation Summary 19/05/2021 Department of Mathematics 19/40 Hypothesis Tests in Simple Linear Regression Empirical models Test on the β1 Simple Linear Regression Estimating σ2 Hypothesis Hypothesis tests tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 Test statistic has the t distribution with n - 2 degrees of freedom. If |t0| > tα/2, n-2 : reject H0 If |t0| < tα/2, n-2 : fail to reject H0 Department of Mathematics 20/40 Hypothesis Tests in Simple Linear Regression Empirical models Simple Linear Regression Estimating σ2 Hypothesis Hypothesis tests tests Confidence intervals Prediction Adequacy Correlation Test on the β1 An important special case These hypotheses relate to the significance of regression. Failure to reject H0 is equivalent to concluding that there is no linear relationship between x and Y. Summary 19/05/2021 Department of Mathematics 21/40 Hypothesis Tests in Simple Linear Regression Empirical models Test on the β1 Simple Linear Regression Estimating σ2 Hypothesis Hypothesis tests tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 Figure 11-5 The hypothesis H0: β1 = 0 is not rejected. Department of Mathematics 22/40 Hypothesis Tests in Simple Linear Regression Empirical models Test on the β1 Simple Linear Regression Estimating σ2 Hypothesis Hypothesis tests tests Confidence intervals Prediction Adequacy Correlation Summary 19/05/2021 Figure 11-6 The hypothesis H0: β1 = 0 is rejected. Department of Mathematics 23/40 Hypothesis Tests in Simple Linear Regression Empirical models Simple Linear Regression Estimating σ2 Hypothesis Hypothesis tests tests Confidence intervals Prediction Adequacy Example We will test for significance of regression using the model for the oxygen purity data from Table 11-1. The hypotheses are and we will use α = 0.01. Recall Correlation Summary 19/05/2021 Test statistic Department of Mathematics 24/40 Hypothesis Tests in Simple Linear Regression Empirical models Simple Linear Regression Estimating σ2 Hypothesis Hypothesis tests tests Confidence intervals Prediction Adequacy tα/2, n-2 = t0.005,18 = 2.88 < |t0| Reject H0 Test on the β0 If |t0| > tα/2, n-2 : reject H0 If |t0| < tα/2, n-2 : fail to reject H0 Test statistic Correlation Summary 19/05/2021 Department of Mathematics 25/40 Confidence Intervals Empirical models Confidence Intervals on the Slope and Intercept Simple Linear Regression Under the assumption that the observations are normally and independently distributed, a 100(1-α)% confidence interval on the slope β1 in simple linear regression is Estimating σ2 Hypothesis tests Confidence Confidence intervals intervals Prediction Adequacy Correlation Summary 19/05/2021 βˆ1 − tα / 2 , n − 2 σˆ 2 S xx ≤ β 1 ≤ βˆ1 + tα / 2 , n − 2 σˆ 2 S xx Similarly, a 100(1-α)% confidence interval on the intercept β0 is βˆ0 − tα / 2,n −2 2 1 x2 1 x 2 ˆ +t ˆ σˆ + β β σ ≤ ≤ + α / 2, n − 2 0 0 n S n S xx xx 2 Department of Mathematics 26/40 Confidence Intervals Empirical models Simple Linear Regression Estimating σ2 Hypothesis tests Confidence Confidence intervals intervals Example We will find a 95% confidence interval on the slope of the regression line using the data in Table 11-1. Recall CI 95% for β1 : Prediction Adequacy Correlation Summary 19/05/2021 Department of Mathematics 27/40 Confidence Intervals Empirical models Confidence Interval on the Mean Response Simple Linear Regression Estimating σ2 Hypothesis tests Confidence Confidence intervals intervals Prediction Adequacy Correlation A 100(1-α)% confidence interval about the mean response at the value of x=x0 is given by µˆY|x − tα / 2,n−2 0 2 1 ( ) x x − 2 0 σˆ + S n xx ≤ µY|x0 ≤ µˆY |x0 + tα / 2,n−2 2 1 ( ) x x − 2 0 σˆ + n S xx Summary 19/05/2021 Department of Mathematics 28/40 Confidence Intervals Empirical models Simple Linear Regression Estimating σ2 Hypothesis tests Confidence Confidence intervals intervals Example We will find a 95% confidence interval about the mean response for the data in Table 11-1. The fitted model is If we are interested in predicting mean oxygen purity when x0 = 100% then Prediction Adequacy Correlation CI 95% on Summary 19/05/2021 Department of Mathematics 29/40 Confidence Intervals Empirical models Simple Linear Regression Estimating σ2 Hypothesis tests Confidence Confidence intervals intervals Prediction Adequacy Correlation Scatter diagram of oxygen purity with fitted regression line and 95% confidence limits on µY|x0. Summary 19/05/2021 Department of Mathematics 30/40 Prediction of New Observations Empirical models Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Prediction Adequacy A 100(1-α)% prediction interval on a future observation Y0 at the value x0 is given by yˆ 0 − tα / 2,n − 2 1 ( x0 − x ) 2 σˆ 1 + + n S xx 2 ≤ Y0 ≤ yˆ 0 + tα / 2,n − 2 1 ( x0 − x ) 2 σˆ 1 + + n S xx 2 Return to Table 11.1, confidence interval 95% on Y0 at x0 = 100% Correlation Summary 19/05/2021 Department of Mathematics 31/40 Prediction of New Observations Empirical models Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Prediction Adequacy Correlation Summary 19/05/2021 Scatter diagram of oxygen purity data from Table 11.1 with fitted regression line, 95% prediction limits, and 95% confidence limits on µY|x0. Department of Mathematics 32/40 Adequacy of the Regression Model Empirical models Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Adequacy Correlation • Fitting a regression model requires several assumptions. 1. Errors are uncorrelated random variables with mean zero; 2. Errors have constant variance; and, 3. Errors be normally distributed. • The analyst should always consider the validity of these assumptions to be doubtful and conduct analyses to examine the adequacy of the model Summary 19/05/2021 Department of Mathematics 33/40 Adequacy of the Regression Model Empirical models Coefficient of Determination (R2) Simple Linear Regression • The quantity Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Adequacy Correlation Summary 19/05/2021 is called the coefficient of determination and is often used to judge the adequacy of a regression model. • 0 ≤ R2 ≤ 1; • We often refer to R2 as the amount of variability in the data explained or accounted for by the regression model. Department of Mathematics 34/40 Adequacy of the Regression Model Empirical models Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Adequacy Example For the oxygen purity regression model, R2 = SSR/SST = 152.13/173.38 = 0.877 Thus, the model accounts for 87.7% of the variability in the data. Correlation Summary 19/05/2021 Department of Mathematics 35/40 Correlation Empirical models Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Definition The sample correlation coefficient n R= ∑(X i =1 n ∑(X i =1 i i − X )(Yi − Y ) − X) = n 2 ∑ (Y − Y ) i =1 2 S XY S XX SST i Note that Correlation Correlation Summary 19/05/2021 We may also write: Department of Mathematics 36/40 Correlation Empirical models Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Properties of the Linear Correlation Coefficient r 1. –1 ≤ r ≤ 1 2. The value of r does not change if all values of either variable are converted to a different scale. 3. The value of r is not affected by the choice of x and y. Interchange all x- and y-values and the value of r will not change. 4. r measures strength of a linear relationship. Correlation Correlation Summary 19/05/2021 Department of Mathematics 37/40 Correlation y y r = −0.91 Empirical models r = 0.88 Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy x x Strong negative correlation y Strong positive correlation y r = 0.07 r = 0.42 Correlation Correlation Summary x x Weak positive correlation 19/05/2021 Nonlinear Correlation Department of Mathematics 38/40 Correlation Empirical models Test on the ρ Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Correlation Correlation Test statistic T0 = R n−2 1− R2 has the t distribution with n - 2 degrees of freedom. If |t0| > tα/2, n-2 : reject H0 If |t0| < tα/2, n-2 : fail to reject H0 Summary 19/05/2021 Department of Mathematics 39/40 Correlation Empirical models Test on the ρ Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy Test statistic Z 0 = (arctanh R − arctanh ρ0 ) n − 3 If |t0| > zα/2 : reject H0 If |t0| < zα/2 : fail to reject H0 Correlation Correlation Summary 19/05/2021 Department of Mathematics 40/40 SUMMARY Empirical models Simple Linear Regression Estimating σ2 Hypothesis tests Confidence intervals Prediction Adequacy We have studied: 1. Empirical Models 2. Simple Linear Regression 3. Correlation Homework: Read slides of the next lecture. Correlation Summary Summary 19/05/2021 Department of Mathematics 41/40