Chapter 17: Least-Squares Regression

Where substantial error is associated with data, polynomial interpolation is inappropriate and may yield unsatisfactory results when used to predict intermediate values. Experimentally derived data are often of this type. For example, figure (a) shows seven experimentally derived data points exhibiting significant variability. The data indicate that higher values of y are associated with higher values of x. If a sixth-order interpolating polynomial is fitted to these data (figure (b)), it passes exactly through all of the points. However, because of the variability in the data, the curve oscillates widely in the intervals between the points. In particular, the interpolated values at x = 1.5 and x = 6.5 appear to be well beyond the range suggested by the data.

A more appropriate strategy is to derive an approximating function that fits the general shape or trend of the data without necessarily matching the individual points. Figure (c) illustrates how a straight line can be used to characterize the trend of the data without passing through any particular point.

One way to determine the line in figure (c) is to look at the plotted data and sketch a "best" line through the points. Such an approach is inadequate because it is arbitrary. That is, unless the points define a perfect straight line (in which case interpolation would be appropriate), different analysts would draw different lines. To remove this subjectivity, some criterion must be devised to establish a basis for the fit. One way to do this is to derive a curve that minimizes the discrepancy between the data points and the curve. A technique for doing this is called least-squares regression.

17.1 Linear Regression

The simplest example of a least-squares approximation is fitting a straight line to a set of paired observations: $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. The mathematical expression for the straight line is

$y = a_0 + a_1 x + e$

where $a_0$ and $a_1$ are coefficients representing the intercept and the slope, respectively, and $e$ is the error, or residual, between the model and the observations, which can be represented by rearranging the equation as

$e = y - a_0 - a_1 x$

Thus, the error is the discrepancy between the true value of y and the approximate value, $a_0 + a_1 x$, predicted by the linear equation.
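As a quick illustration of the residual defined above, the short Python sketch below evaluates $e_i = y_i - (a_0 + a_1 x_i)$ for a trial line. The data points and coefficients are made up for illustration; they are not taken from the chapter's tables.

```python
# Residuals of a trial straight line y = a0 + a1*x against sample data.
# The points and coefficients below are illustrative only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.8, 2.1, 2.9, 4.2, 4.8]

a0, a1 = 0.1, 0.95          # trial intercept and slope

residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
print(residuals)            # e_i = y_i - (a0 + a1*x_i) for each point
```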
17.1.1 Criteria for the "Best" Fit

One strategy for fitting a "best" line through the data would be to minimize the sum of the residual errors for all the available data, as in

$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)$

where n is the total number of points. However, this is an inadequate criterion, as illustrated by the next figure, which shows the fit of a straight line to two points. Obviously, the best fit is the line connecting the points. However, any straight line passing through the midpoint of the connecting line results in a value of zero for this sum, because the positive and negative errors cancel.

Therefore, another logical criterion might be to minimize the sum of the absolute values of the discrepancies, as in

$\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$

Figure (b) demonstrates why this criterion is also inadequate. For the four points shown, any straight line falling within the dashed lines will minimize the sum of the absolute values. Thus, this criterion also does not yield a unique best fit.

A third strategy for fitting a best line is the minimax criterion. In this technique, the line is chosen that minimizes the maximum distance that an individual point falls from the line. As shown in figure (c), this strategy is ill-suited for regression because it gives undue influence to an outlier, that is, a single point with a large error.

A strategy that overcomes the shortcomings of the previous approaches is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,\mathrm{measured}} - y_{i,\mathrm{model}})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$

This criterion has a number of advantages, including the fact that it yields a unique line for a given set of data.
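To make the comparison of criteria concrete, the following sketch (with an illustrative two-point data set, not the text's figure data) shows that a horizontal line through the midpoint cancels the residuals just as well as the exact line through the points, while the sum of the squared residuals does distinguish the two.

```python
# Two data points that lie exactly on the line y = x.
pts = [(1.0, 1.0), (3.0, 3.0)]

def sum_residuals(a0, a1):
    return sum(y - (a0 + a1 * x) for x, y in pts)

def sum_sq_residuals(a0, a1):
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in pts)

# Exact line y = x versus a horizontal line through the midpoint (2, 2).
for a0, a1 in [(0.0, 1.0), (2.0, 0.0)]:
    print(a0, a1, sum_residuals(a0, a1), sum_sq_residuals(a0, a1))
# Both lines give a residual sum of 0, but only y = x gives S_r = 0.
```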
17.1.2 Least-Squares Fit of a Straight Line

To determine values of $a_0$ and $a_1$, $S_r$ is differentiated with respect to each coefficient:

$\dfrac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i)$

$\dfrac{\partial S_r}{\partial a_1} = -2 \sum \left[(y_i - a_0 - a_1 x_i) x_i\right]$

Note that we have simplified the summation symbols; unless otherwise indicated, all summations are from i = 1 to n. Setting these derivatives equal to zero will result in a minimum $S_r$:

$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$

$0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2$

Now, realizing that $\sum a_0 = n a_0$, we can express these equations as a set of two simultaneous linear equations with two unknowns ($a_0$ and $a_1$):

$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$   (17.4)

$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$

These are called the normal equations. They can be solved simultaneously for

$a_1 = \dfrac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$

This result can then be used in conjunction with Eq. (17.4) to solve for

$a_0 = \bar{y} - a_1 \bar{x}$

where $\bar{y}$ and $\bar{x}$ are the means of y and x, respectively.

Example 17.1: Linear Regression

Problem Statement: Fit a straight line to the x and y values in the first two columns of the accompanying table.

Solution: The following quantities can be computed:

n = 7,  $\sum x_i = 28$,  $\sum y_i = 24$,  $\sum x_i y_i = 119.5$,  $\sum x_i^2 = 140$

$\bar{x} = 28/7 = 4$,  $\bar{y} = 24/7 = 3.428571$

Using the previous two equations,

$a_1 = \dfrac{7(119.5) - 28(24)}{7(140) - (28)^2} = 0.8392857$

$a_0 = 3.428571 - 0.8392857(4) = 0.07142857$

Therefore, the least-squares fit is

$y = 0.07142857 + 0.8392857 x$

The line, along with the data, is shown in the first figure (c).
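A direct transcription of these formulas into Python might look like the sketch below; the helper name fit_line is arbitrary, and the final lines simply plug in the summations quoted in Example 17.1 (the raw data table itself is not reproduced here).

```python
# Least-squares straight line: a sketch of Eq. (17.4) and its solution.
def fit_line(x, y):
    """Return (a0, a1) of the least-squares line y = a0 + a1*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi * xi for xi in x)
    a1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a0 = sy / n - a1 * sx / n          # a0 = ybar - a1*xbar
    return a0, a1

# The same formulas applied to the summations quoted in Example 17.1:
n, sx, sy, sxy, sx2 = 7, 28.0, 24.0, 119.5, 140.0
a1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
a0 = sy / n - a1 * sx / n
print(a0, a1)    # approximately 0.07142857 and 0.8392857
```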
17.1.3 Quantification of Error of Linear Regression

Any line other than the one computed in the previous example results in a larger sum of the squares of the residuals. Thus, the line is unique and, in terms of our chosen criterion, is a "best" line through the points. A number of additional properties of this fit can be explained by examining more closely the way in which the residuals were computed. Recall that the sum of the squares is defined as

$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$

Notice the similarity between this equation and the total sum of the squares around the mean,

$S_t = \sum (y_i - \bar{y})^2$

The similarity can be extended further for cases where (1) the spread of the points around the line is of similar magnitude along the entire range of the data and (2) the distribution of these points about the line is normal. It can be demonstrated that if these criteria are met, least-squares regression will provide the best (that is, the most likely) estimates of $a_0$ and $a_1$. In addition, if these criteria are met, a "standard deviation" for the regression line can be determined as

$s_{y/x} = \sqrt{\dfrac{S_r}{n-2}}$

where $s_{y/x}$ is called the standard error of the estimate. The subscript notation "y/x" designates that the error is for a predicted value of y corresponding to a particular value of x. Also, notice that we now divide by n − 2 because two data-derived estimates, $a_0$ and $a_1$, were used to compute $S_r$; thus, we have lost two degrees of freedom. Another justification for dividing by n − 2 is that there is no such thing as the "spread of data" around a straight line connecting two points.

The standard error of the estimate quantifies the spread of the data. However, $s_{y/x}$ quantifies the spread around the regression line, as shown in the next figure (b), in contrast to the original standard deviation $s_y$, which quantified the spread around the mean (figure (a)).

The above concepts can be used to quantify the "goodness" of our fit. This is particularly useful for comparing several regressions (next figure). To do this, we return to the original data and determine the total sum of the squares around the mean for the dependent variable (in our case, y). This quantity is designated $S_t$. It is the magnitude of the residual error associated with the dependent variable prior to regression. After performing the regression, we can compute $S_r$, the sum of the squares of the residuals around the regression line. This characterizes the residual error that remains after the regression. It is, therefore, sometimes called the unexplained sum of the squares.

The difference between the two quantities, $S_t - S_r$, quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value. Because the magnitude of this quantity is scale-dependent, the difference is normalized to $S_t$ to yield

$r^2 = \dfrac{S_t - S_r}{S_t}$   (17.10)

where $r^2$ is called the coefficient of determination and r is the correlation coefficient ($= \sqrt{r^2}$). For a perfect fit, $S_r = 0$ and $r = r^2 = 1$, signifying that the line explains 100 percent of the variability of the data. For $r = r^2 = 0$, $S_r = S_t$ and the fit represents no improvement.
An alternative formulation for r that is more convenient for computer implementation is

$r = \dfrac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\;\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$

Example 17.2: Estimation of Errors for the Linear Least-Squares Fit

Problem Statement: Compute the total standard deviation, the standard error of the estimate, and the correlation coefficient for the data in Example 17.1.

Solution: The required summations are tabulated in the previous example's table. The standard deviation is

$s_y = \sqrt{\dfrac{S_t}{n-1}} = \sqrt{\dfrac{22.7143}{7-1}} = 1.9457$

and the standard error of the estimate is

$s_{y/x} = \sqrt{\dfrac{S_r}{n-2}} = \sqrt{\dfrac{2.9911}{7-2}} = 0.7735$

Thus, because $s_{y/x} < s_y$, the linear regression model has merit. The extent of the improvement is quantified by

$r^2 = \dfrac{S_t - S_r}{S_t} = \dfrac{22.7143 - 2.9911}{22.7143} = 0.868$

or

$r = \sqrt{0.868} = 0.932$

These results indicate that 86.8 percent of the original uncertainty has been explained by the linear model.
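The error measures of Example 17.2 are straightforward to script. The following is a minimal Python sketch: the helper shows how $S_t$, $S_r$, $s_y$, $s_{y/x}$, and $r^2$ would be computed from raw data and fitted coefficients, and the last lines re-derive the example's quoted figures from its tabulated totals (the underlying data table is not reproduced here).

```python
import math

def regression_stats(x, y, a0, a1):
    """Return (St, Sr, s_y, s_y/x, r^2) for the fitted line y = a0 + a1*x."""
    n = len(x)
    ybar = sum(y) / n
    st = sum((yi - ybar) ** 2 for yi in y)                        # spread about the mean
    sr = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))  # spread about the line
    return st, sr, math.sqrt(st / (n - 1)), math.sqrt(sr / (n - 2)), (st - sr) / st

# Re-deriving Example 17.2's figures from its quoted totals (St, Sr, n):
St, Sr, n = 22.7143, 2.9911, 7
print(math.sqrt(St / (n - 1)))   # s_y   ~ 1.9457
print(math.sqrt(Sr / (n - 2)))   # s_y/x ~ 0.7735
print((St - Sr) / St)            # r**2  ~ 0.868
```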
17.1.5 Linearization of Nonlinear Relationships

Linear regression provides a powerful technique for fitting a best line to data. However, it is predicated on the fact that the relationship between the dependent and independent variables is linear. This is not always the case, and the first step in any regression analysis should be to plot and visually inspect the data to determine whether a linear model applies. For example, the next figure shows some data that is obviously curvilinear. In some cases, techniques such as polynomial regression are appropriate. For others, transformations can be used to express the data in a form that is compatible with linear regression.

One example is the exponential model

$y = \alpha_1 e^{\beta_1 x}$   (17.12)

where $\alpha_1$ and $\beta_1$ are constants. As shown in the next figure, this equation represents a nonlinear relationship (for $\beta_1 \neq 0$) between x and y.

Another example of a nonlinear model is the simple power equation

$y = \alpha_2 x^{\beta_2}$   (17.13)

where $\alpha_2$ and $\beta_2$ are constant coefficients. As shown in the previous figure, this equation (for $\beta_2 \neq 0$ or 1) is nonlinear.

A third example of a nonlinear model is the saturation-growth-rate equation

$y = \alpha_3 \dfrac{x}{\beta_3 + x}$   (17.14)

where $\alpha_3$ and $\beta_3$ are constant coefficients. This model also represents a nonlinear relationship between y and x that levels off as x increases.

Nonlinear regression techniques are available to fit such equations to data directly. A simpler alternative, however, is to use mathematical manipulations to transform the equations into a linear form. Then, simple linear regression can be employed to fit the equations to the data.

Equation (17.12) can be linearized by taking its natural logarithm:

$\ln y = \ln \alpha_1 + \beta_1 x \ln e$

But because ln e = 1,

$\ln y = \ln \alpha_1 + \beta_1 x$

Thus, a plot of ln y versus x will yield a straight line with a slope of $\beta_1$ and an intercept of $\ln \alpha_1$ (figure (d)).

Equation (17.13) is linearized by taking its base-10 logarithm to give

$\log y = \beta_2 \log x + \log \alpha_2$

Thus, a plot of log y versus log x will yield a straight line with a slope of $\beta_2$ and an intercept of $\log \alpha_2$ (figure (e)).

Equation (17.14) is linearized by inverting it to give

$\dfrac{1}{y} = \dfrac{\beta_3}{\alpha_3}\,\dfrac{1}{x} + \dfrac{1}{\alpha_3}$

Thus, a plot of 1/y versus 1/x will be linear, with a slope of $\beta_3/\alpha_3$ and an intercept of $1/\alpha_3$ (figure (f)).

In their transformed forms, these models can be fitted with linear regression to evaluate the constant coefficients. They can then be transformed back to their original form and used for predictive purposes. Example 17.4 illustrates this procedure for Eq. (17.13).

Example 17.4: Linearization of a Power Equation

Problem Statement: Fit Eq. (17.13) to the data in the accompanying table using a logarithmic transformation of the data.

Solution: The next figure (a) is a plot of the original data in its untransformed state, and figure (b) shows the plot of the transformed data. A linear regression of the log-transformed data yields the result

$\log y = 1.75 \log x - 0.300$

Thus, the intercept, $\log \alpha_2$, equals −0.300, and therefore, by taking the antilogarithm, $\alpha_2 = 10^{-0.3} = 0.5$. The slope is $\beta_2 = 1.75$. Consequently, the power equation is

$y = 0.5 x^{1.75}$

This curve, as plotted in figure (a), indicates a good fit.
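The log-log procedure of Example 17.4 can be scripted as follows. Because the example's data table is not reproduced here, the check at the end uses points generated exactly from the fitted curve $y = 0.5x^{1.75}$ rather than measured values; fit_power is an arbitrary helper name.

```python
import math

def fit_power(x, y):
    """Fit y = alpha2 * x**beta2 by linear regression on (log10 x, log10 y)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(x)
    sx, sy = sum(lx), sum(ly)
    sxy = sum(a * b for a, b in zip(lx, ly))
    sx2 = sum(a * a for a in lx)
    beta2 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)   # slope of the log-log line
    alpha2 = 10 ** (sy / n - beta2 * sx / n)            # 10**intercept
    return alpha2, beta2

# Synthetic check: points generated from the fitted curve of Example 17.4.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.5 * xi ** 1.75 for xi in x]
print(fit_power(x, y))    # ~(0.5, 1.75)
```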
17.1.6 General Comments on Linear Regression

We have focused on the simple derivation and practical use of equations to fit data. Some statistical assumptions that are inherent in the linear least-squares procedures are:

1. Each x has a fixed value; it is not random and is known without error.
2. The y values are independent random variables and all have the same variance.
3. The y values for a given x must be normally distributed.

Such assumptions are relevant to the proper derivation and use of regression. For example, the first assumption means that (1) the x values must be error-free and (2) the regression of y versus x is not the same as the regression of x versus y.

17.2 Polynomial Regression

Some engineering data, although exhibiting a marked pattern, is poorly represented by a straight line. For these cases, a curve would be better suited to fit the data. One method to accomplish this objective is to use transformations. Another alternative is to fit polynomials to the data using polynomial regression.

The least-squares procedure can be readily extended to fit the data to a higher-order polynomial. For example, suppose that we fit a second-order polynomial, or quadratic:

$y = a_0 + a_1 x + a_2 x^2 + e$

For this case the sum of the squares of the residuals is

$S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$

Following the procedure of the previous section, we take the derivative of this equation with respect to each of the unknown coefficients of the polynomial, as in

$\dfrac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)$

$\dfrac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2)$

$\dfrac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2)$

These equations can be set equal to zero and rearranged to develop the following set of normal equations:

$n a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i$

$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i$

$\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i$

where all summations are from i = 1 through n. Note that these three equations are linear and have three unknowns: $a_0$, $a_1$, and $a_2$. The coefficients of the unknowns can be calculated directly from the observed data. Thus, the problem of determining a least-squares second-order polynomial is equivalent to solving a system of three simultaneous linear equations.
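In code, the three normal equations amount to assembling and solving a 3-by-3 linear system. The sketch below uses NumPy's general linear solver in place of a hand-coded Gauss elimination; the helper name fit_quadratic and its interface are choices made for this illustration.

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares quadratic y = a0 + a1*x + a2*x**2 via the normal equations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Coefficient matrix and right-hand side of the three normal equations.
    A = np.array([[n,              x.sum(),        (x**2).sum()],
                  [x.sum(),        (x**2).sum(),   (x**3).sum()],
                  [(x**2).sum(),   (x**3).sum(),   (x**4).sum()]])
    b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])
    return np.linalg.solve(A, b)      # [a0, a1, a2]

# Usage (with whatever data arrays are at hand):
# a0, a1, a2 = fit_quadratic(x_data, y_data)
```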
Example: Polynomial Regression

Problem Statement: Fit a second-order polynomial to the data in the first two columns of the accompanying table.

Solution: From the given data,

m = 2,  n = 6,  $\sum x_i = 15$,  $\sum y_i = 152.6$,  $\sum x_i y_i = 585.6$,  $\sum x_i^2 = 55$,  $\sum x_i^2 y_i = 2488.8$,  $\sum x_i^3 = 225$,  $\sum x_i^4 = 979$,  $\bar{x} = 2.5$,  $\bar{y} = 25.433$

Therefore, the simultaneous linear equations are

$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}$

Solving these equations with a technique such as Gauss elimination gives $a_0 = 2.47857$, $a_1 = 2.35929$, and $a_2 = 1.86071$. Therefore, the least-squares quadratic equation for this case is

$y = 2.47857 + 2.35929 x + 1.86071 x^2$

The standard error of the estimate based on the regression polynomial is

$s_{y/x} = \sqrt{\dfrac{S_r}{n - (m+1)}} = \sqrt{\dfrac{3.74657}{6-3}} = 1.12$

The coefficient of determination is

$r^2 = \dfrac{S_t - S_r}{S_t} = \dfrac{2513.39 - 3.74657}{2513.39} = 0.99851$

and the correlation coefficient is r = 0.99925. These results indicate that 99.851 percent of the original uncertainty has been explained by the model. This result supports the conclusion that the quadratic equation represents an excellent fit, as is also evident from the next figure.
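The worked numbers above can be verified with a few lines of Python. This is only a check against the example's tabulated sums, $S_r$, and $S_t$; it does not start from the raw data.

```python
import numpy as np

# Coefficient matrix and right-hand side assembled in the example above.
A = np.array([[ 6.0,  15.0,  55.0],
              [15.0,  55.0, 225.0],
              [55.0, 225.0, 979.0]])
b = np.array([152.6, 585.6, 2488.8])
print(np.linalg.solve(A, b))          # ~[2.47857, 2.35929, 1.86071]

# Error measures from the quoted Sr and St (n = 6, m = 2).
Sr, St, n, m = 3.74657, 2513.39, 6, 2
print(np.sqrt(Sr / (n - (m + 1))))    # standard error s_y/x ~ 1.12
print((St - Sr) / St)                 # coefficient of determination ~ 0.99851
```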
17.3 Multiple Linear Regression

A useful extension of linear regression is the case where y is a linear function of two or more independent variables. For example, y might be a linear function of $x_1$ and $x_2$, as in

$y = a_0 + a_1 x_1 + a_2 x_2 + e$

Such an equation is particularly useful when fitting experimental data where the variable being studied is often a function of two other variables. For this two-dimensional case, the regression "line" becomes a "plane" (next figure).

As with the previous cases, the "best" values of the coefficients are determined by setting up the sum of the squares of the residuals,

$S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})^2$

and differentiating with respect to each of the unknown coefficients:

$\dfrac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})$

$\dfrac{\partial S_r}{\partial a_1} = -2 \sum x_{1i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})$

$\dfrac{\partial S_r}{\partial a_2} = -2 \sum x_{2i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})$

The coefficients yielding the minimum sum of the squares of the residuals are obtained by setting the partial derivatives equal to zero and expressing the result in matrix form as

$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix}$

Example 17.6: Multiple Linear Regression

Problem Statement: The following data was calculated from the equation $y = 5 + 4x_1 - 3x_2$. Use multiple linear regression to fit this data.

Solution: The summations required to develop the normal equations are tabulated in the accompanying table. The result is

$\begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 54 \\ 243.5 \\ 100 \end{Bmatrix}$

which can be solved using a method such as Gauss elimination for

$a_0 = 5, \quad a_1 = 4, \quad a_2 = -3$

which is consistent with the original equation from which the data was derived.

The foregoing two-dimensional case can easily be extended to m dimensions, as in

$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + e$

where the standard error is formulated as

$s_{y/x} = \sqrt{\dfrac{S_r}{n - (m+1)}}$

and the coefficient of determination is computed as in Eq. (17.10).

Although there may be certain cases where a variable is linearly related to two or more other variables, multiple linear regression has additional utility in the derivation of power equations of the general form

$y = a_0 x_1^{a_1} x_2^{a_2} \cdots x_m^{a_m}$

Such equations are extremely useful when fitting experimental data. To use multiple linear regression, the equation is transformed by taking its logarithm to yield

$\log y = \log a_0 + a_1 \log x_1 + a_2 \log x_2 + \cdots + a_m \log x_m$

This transformation is similar in spirit to the one used to fit a power equation when y is a function of a single variable x.
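For an arbitrary number of variables, the normal equations can be assembled compactly from a matrix of the observations with a leading column of ones; the product of that matrix's transpose with itself reproduces, term by term, the same summations shown above. The sketch below is one way to code this (the helper name and interface are illustrative), followed by a check against the system quoted in Example 17.6.

```python
import numpy as np

def fit_multilinear(X, y):
    """Least-squares fit of y = a0 + a1*x1 + ... + am*xm.

    X is an (n, m) array whose columns are the independent variables.
    Returns the coefficient vector [a0, a1, ..., am].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), X])   # prepend the constant column
    # Normal equations: (Z^T Z) a = Z^T y
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

# Check against the normal-equation system quoted in Example 17.6:
A = np.array([[ 6.0, 16.5 , 14.0],
              [16.5, 76.25, 48.0],
              [14.0, 48.0 , 54.0]])
b = np.array([54.0, 243.5, 100.0])
print(np.linalg.solve(A, b))    # ~[ 5.,  4., -3.], as in Example 17.6
```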
Problem 17.5

Use least-squares regression to fit a straight line to:

x:  6   7  11  15  17  21  23  29  29  37  39
y: 29  21  29  14  21  15   7   7  13   0   3

Compute the standard error of the estimate and the correlation coefficient. Plot the data and the regression line. If someone made an additional measurement of x = 10, y = 10, would you suspect that the measurement was valid or faulty? Justify your conclusion.

Solution: The results can be summarized as

$y = 31.0589 - 0.78055 x$   ($s_{y/x} = 4.476306$, $r = 0.901489$)

At x = 10, the best-fit equation gives 23.2543. The line and data can be plotted along with the point (10, 10). The value of 10 is nearly three standard errors away from the line,

$23.2543 - 10 = 13.2543 \approx 3(4.476306)$

Thus, we can conclude that the value is probably erroneous.

Problem 17.13

An investigator has reported the data tabulated below for an experiment to determine the growth rate of bacteria k (per d) as a function of oxygen concentration c (mg/L). It is known that such data can be modeled by the following equation:

$k = \dfrac{k_{max} c^2}{c_s + c^2}$

where $c_s$ and $k_{max}$ are parameters. Use a transformation to linearize this equation. Then use linear regression to estimate $c_s$ and $k_{max}$ and predict the growth rate at c = 2 mg/L.

c: 0.5  0.8  1.5  2.5  4
k: 1.1  2.4  5.3  7.6  8.9

Solution: The equation can be linearized by inverting it to yield

$\dfrac{1}{k} = \dfrac{c_s}{k_{max}}\,\dfrac{1}{c^2} + \dfrac{1}{k_{max}}$

Consequently, a plot of 1/k versus 1/c² should yield a straight line with an intercept of $1/k_{max}$ and a slope of $c_s/k_{max}$.

c (mg/L)   k (/d)   1/c²        1/k        (1/c²)(1/k)   (1/c²)²
0.5        1.1      4.000000    0.909091   3.636364      16.000000
0.8        2.4      1.562500    0.416667   0.651042       2.441406
1.5        5.3      0.444444    0.188679   0.083857       0.197531
2.5        7.6      0.160000    0.131579   0.021053       0.025600
4          8.9      0.062500    0.112360   0.007022       0.003906
Sum                 6.229444    1.758375   4.399338      18.668444

The slope and the intercept can be computed as

$a_1 = \dfrac{5(4.399338) - 6.229444(1.758375)}{5(18.668444) - (6.229444)^2} = 0.202489$

$a_0 = \dfrac{1.758375}{5} - 0.202489\,\dfrac{6.229444}{5} = 0.099396$

Therefore, $k_{max} = 1/0.099396 = 10.06074$ and $c_s = 10.06074(0.202489) = 2.037189$, and the fit is

$k = \dfrac{10.06074\, c^2}{2.037189 + c^2}$

This equation can be plotted together with the data and used to compute the growth rate at c = 2 mg/L:

$k = \dfrac{10.06074 (2)^2}{2.037189 + (2)^2} = 6.666$
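The arithmetic in the solution above can be reproduced directly from the tabulated (c, k) values. The following short Python check recovers the reported slope, intercept, $k_{max}$, $c_s$, and the predicted growth rate at c = 2 mg/L.

```python
# Problem 17.13 check: linearize k = kmax*c**2/(cs + c**2) as
#   1/k = (cs/kmax)*(1/c**2) + 1/kmax
# and fit a straight line to the transformed points.
c = [0.5, 0.8, 1.5, 2.5, 4.0]
k = [1.1, 2.4, 5.3, 7.6, 8.9]

u = [1.0 / ci ** 2 for ci in c]    # transformed abscissa, 1/c^2
v = [1.0 / ki for ki in k]         # transformed ordinate, 1/k

n = len(u)
su, sv = sum(u), sum(v)
suv = sum(ui * vi for ui, vi in zip(u, v))
su2 = sum(ui * ui for ui in u)

a1 = (n * suv - su * sv) / (n * su2 - su ** 2)   # slope     = cs/kmax ~ 0.202489
a0 = sv / n - a1 * su / n                        # intercept = 1/kmax  ~ 0.099396

kmax = 1.0 / a0                                  # ~10.06
cs = kmax * a1                                   # ~2.04
print(kmax, cs, kmax * 2.0 ** 2 / (cs + 2.0 ** 2))   # k at c = 2 mg/L ~ 6.67
```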