Gen. Math. Notes, Vol. 21, No. 1, March 2014, pp. 118-127
ISSN 2219-7184; Copyright © ICSRS Publication, 2014
www.i-csrs.org
Available free online at http://www.geman.in

Comparison between Models With and Without Intercept

Sameera Abdulsalam Othman

Department of Mathematics, Faculty of Educational Science, School of Basic Education, University of Duhok, Duhok-Iraq
E-mail: samira.a@hotmail.com

(Received: 8-1-14 / Accepted: 25-2-14)

Abstract

The aim of this paper is to compare regression models with and without an intercept, to state which one is better, and to apply the leverage-point method when a new point is added to the original data. We test the significance of the intercept by using the t-test.

Keywords: Intercept, Hypothesis, Significant, Original, Regression.

1.1 Introduction

Multiple linear regression (MLR) is a method used to model the linear relationship between a dependent variable and one or more independent variables. The dependent variable is sometimes also called the predictand, and the independent variables the predictors. The aim of this paper is to compare models with and without an intercept and to state which one is better. The paper contains the important definitions of regression, the most important relationships, and the equations used to solve an example of multiple linear regression by least squares, together with estimation and tests of hypotheses for the parameters. The most important application is to blood pressure (dependent variable Y) and height, weight, age, sugar, sex, hereditary factor and social status (independent variables).

2.1 Linear Regression Models and Its Types

a. Linear Regression Model with Intercept

A linear regression model has an intercept if the regression line intersects the Y axis at a point other than the origin.
It means that, mathematically, $B_0 \neq 0$; that is, the regression line intersects the Y axis away from the origin:

$Y_i = B_0 + B_1 X_i + e_i, \quad i = 1, 2, 3, \cdots, n$   (1)

where
$Y_i$ = dependent variable
$X_i$ = independent variable
$B_0, B_1$ = regression parameters
$e_i$ = value of the random error

b. Linear Regression Model without Intercept [4], [3]

The linear regression model is without intercept when the regression line passes through the origin. It means that, mathematically, $B_0 = 0$. We can write the simple linear regression model as

$Y_i = B_1 X_i + e_i$   (2)

The parameters $B_0$ and $B_1$ are usually unknown and are estimated by the least squares method, from (3) and (4):

$\hat{B}_1 = \dfrac{S_{XY}}{S_{XX}}$   (3)

$\hat{B}_0 = \bar{Y} - \hat{B}_1 \bar{X}$   (4)

where
$S_{XY}$: covariation between X and Y
$S_{XX}$: variation of X

But for linear regression without intercept,

$\hat{B}_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$   (5)

2.2 Leverage Point and Regression through the Origin

Leverage Point: This interpretation can be used to understand the difference between the full fit and the fit forced through the origin. It can be shown that regression through the origin is equivalent to fitting the full model to a new data set. This new data set is composed of the original observations plus one added point. Evaluating the leverage possessed by this new point is equivalent to evaluating whether $B_0 = 0$ in the full model.

2.3 Augmenting the Data Set

Let $(X_1, Y_1), (X_2, Y_2), \cdots, (X_n, Y_n)$ be $n$ data points observed according to

$Y_i = b_0 + b_1 X_i + \epsilon_i$

where the experimental errors $\epsilon_i$ are independently normally distributed with mean 0 and variance $\sigma^2$. The least squares estimates of $b_0$ and $b_1$ are

$\hat{b}_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$

where $\bar{X} = \sum X_i / n$ and $\bar{Y} = \sum Y_i / n$. [1], [10]

If the regression is forced through the origin, then it is assumed that the data are observed according to $Y_i = B_1 X_i + e_i$, and the least squares estimate of $B_1$ in this model is

$\hat{B}_1 = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$

Suppose the original data set is augmented with a new observation

$(X_{n+1}, Y_{n+1}) = (n^* \bar{X}, n^* \bar{Y}), \qquad \text{where } n^* = \dfrac{n}{\sqrt{n+1} - 1}$   (6)

Then fitting the full model to the augmented data set is equivalent to forcing the original regression through the origin. This follows from the easily verified identities

$\sum_{i=1}^{n+1} (X_i - \bar{X}_{n+1})(Y_i - \bar{Y}_{n+1}) = \sum_{i=1}^{n} X_i Y_i$

$\sum_{i=1}^{n+1} (X_i - \bar{X}_{n+1})^2 = \sum_{i=1}^{n} X_i^2, \qquad \sum_{i=1}^{n+1} (Y_i - \bar{Y}_{n+1})^2 = \sum_{i=1}^{n} Y_i^2$

where

$\bar{X}_{n+1} = \dfrac{\sum_{i=1}^{n+1} X_i}{n+1}, \qquad \bar{Y}_{n+1} = \dfrac{\sum_{i=1}^{n+1} Y_i}{n+1}$

The position of the point $(n^* \bar{X}, n^* \bar{Y})$ relative to the other points determines whether the new point has high or low leverage. The leverage of the new point can be used to decide whether regression through the origin is more appropriate than a model with an intercept term. [5]

2.4 Assessing the Leverage of a Data Point

The leverage, $h_{ij}$, of a data point $Y_j$ is the amount of influence it has on a predicted value, say $\hat{Y}_i$, which can be written as

$\hat{Y}_i = \sum_{j=1}^{n} h_{ij} Y_j$

where

$h_{ij} = \dfrac{1}{n} + \dfrac{(X_i - \bar{X})(X_j - \bar{X})}{\sum_{k=1}^{n} (X_k - \bar{X})^2}$   (7)

The $h_{ij}$ show how each observation $Y_j$ affects the predicted value $\hat{Y}_i$. More importantly, however, $h_{ii}$ shows how $Y_i$ affects $\hat{Y}_i$, and is quite useful in the detection of influential points. The relative size of $h_{ii}$ can give us information on the potential influence $Y_i$ has on the fit. For purposes of comparison, it is fortunate that the values $h_{ii}$ have a built-in scale. The matrix $H = [h_{ij}]$ is a projection matrix, and from the properties of projection matrices it can be verified that $0 \le h_{ii} \le 1$ and $\sum_{i=1}^{n} h_{ii} = p$, where $p$ is the number of coefficients to be estimated. Thus, on average, $h_{ii}$ should be approximately equal to $p/n$. Hoaglin and Welsch suggest, as a rough guideline, paying attention to any data points having $h_{ii} > 3p/n$. The $h_{ij}$ values depend only on the experimental design (the $X_i$'s), and not on the results of the experiment (the $Y_i$'s); hence a data point with high leverage may not necessarily have an adverse effect on the fit. [7]

2.5 The Leverage of the Augmented Point

We are concerned here with the influence of the $(n+1)$st data point $(n^* \bar{X}, n^* \bar{Y})$, and its leverage is straightforward to calculate:

$h_{n+1} = \dfrac{1}{n+1} \left[ 1 + \dfrac{n^2 \bar{X}^2}{\sum_{i=1}^{n} x_i^2} \right]$   (8)

2.6 The Leverage

There is also a straightforward generalization to multiple linear regression: the original data are augmented with the point $(n^* \bar{X}, n^* \bar{Y})$, where $\bar{X}$ is a vector containing the means of the independent variables. The impact of the augmented data point on the fit becomes clearer when $h_{n+1}$ is examined; it is, perhaps, more instructive to write $h_{n+1}$ in the equivalent form

$h_{n+1} = \dfrac{1}{n+1} \left[ 1 + \dfrac{n \bar{X}^2}{\hat{\sigma}_x^2 + \bar{X}^2} \right]$   (9)

where

$\hat{\sigma}_x^2 = \dfrac{\sum (X_i - \bar{X})^2}{n}$   (10)

Thus the impact of the augmented data point increases with $(\bar{X}/\hat{\sigma}_x)^2$, and we can expect the greatest discrepancy between the full fit and the fit through the origin when $\bar{X}^2$ is large compared to $\hat{\sigma}_x^2$. The augmented data set will seem to be composed of two distinct clusters: one composed of the original data and one composed of $(n^* \bar{X}, n^* \bar{Y})$. [2]

2.7 Analysis of Variance (ANOVA)

ANOVA enables us to draw inferences about whether the samples have been drawn from populations having the same mean. It is used to test for differences among the means of the populations by examining the amount of variation within each of the samples, relative to the amount of variation between the samples. In terms of variation within a given population, it is assumed that the values differ from the mean of that population only because of random effects. The test statistic used for decision making is

F = (estimate of population variance based on between-samples variance) / (estimate of population variance based on within-samples variance) [8].
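The equivalence claimed in Sections 2.3-2.5 can be checked numerically. The following is a minimal sketch in Python with NumPy (not the software used in the paper); the simulated x and y values are illustrative assumptions, not the paper's data:

```python
import numpy as np

# Illustrative simulated data (not the paper's blood-pressure data).
rng = np.random.default_rng(0)
n = 20
x = rng.uniform(5.0, 15.0, size=n)
y = 3.0 * x + rng.normal(0.0, 2.0, size=n)

# Slope of the no-intercept (through-the-origin) fit, equation (5).
b1_origin = np.sum(x * y) / np.sum(x * x)

# Augment the data with (n* xbar, n* ybar), n* = n / (sqrt(n+1) - 1), eq. (6).
n_star = n / (np.sqrt(n + 1.0) - 1.0)
xa = np.append(x, n_star * x.mean())
ya = np.append(y, n_star * y.mean())

# Slope of the full (with-intercept) fit on the augmented data.
b1_full_aug = (np.sum((xa - xa.mean()) * (ya - ya.mean()))
               / np.sum((xa - xa.mean()) ** 2))

# Section 2.3: the two slopes agree.
assert np.isclose(b1_origin, b1_full_aug)

# Leverage of the augmented point: equations (8) and (9) agree with the
# hat-matrix diagonal for the (n+1)st point computed directly.
h_direct = 1.0 / (n + 1) + (xa[-1] - xa.mean()) ** 2 / np.sum((xa - xa.mean()) ** 2)
h_eq8 = (1.0 / (n + 1)) * (1.0 + n ** 2 * x.mean() ** 2 / np.sum(x * x))
sigma2 = np.sum((x - x.mean()) ** 2) / n
h_eq9 = (1.0 / (n + 1)) * (1.0 + n * x.mean() ** 2 / (sigma2 + x.mean() ** 2))
assert np.isclose(h_direct, h_eq8) and np.isclose(h_eq8, h_eq9)
```

The same check extends to multiple regression, with $\bar{X}$ taken as the vector of column means as in Section 2.6.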
2.8 ANOVA Table

For a one-way ANOVA, the table looks like:

Source      df     SS          MS     F         p-value
Treatment   k−1    SST         MST    MST/MSE   p
Error       N−k    SSE         MSE
Total       N−1    SS(Total)

[6], [4]

2.9 Testing of Hypotheses

We test the significance of the intercept by using the t-test:

H0: β0 = 0
H1: β0 ≠ 0

$t_0 = \dfrac{\hat{\beta}_0}{S(\hat{\beta}_0)}$   (11)

where $S(\hat{\beta}_0)$ is the standard deviation of $\hat{\beta}_0$. [9]

The result of the test is to reject the null hypothesis if

$|t_0| \ge t_{\left(\frac{\alpha}{2},\, n-k-1\right)}$   (12)

3.1 Application

This section contains an application of the theoretical part presented in section two. The data for the practical application were obtained from health centers. We study blood pressure (dependent variable Y) against height (X1), weight (X2), age (X3), sugar (X4), sex (X5: 0 denotes male, 1 female), hereditary factor (X6: 0 if the hereditary factor is present, 1 if not) and social status (X7: 0 if married, 1 if single) (independent variables) for 100 persons, as shown in Table (1).

Table (1): Blood pressure and height, weight, age, sugar, sex, hereditary factor for (100) persons

No. Y X1 X2 X3 X4 X5 X6 X7 No.
Y X1 X2 X3 X4 X5 X6 X7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 130 140 140 130 150 190 110 80 120 140 140 130 170 80 240 145 152 158 172 161 160 159 152 165 157 161 165 160 154 151 41 79 83 83 67 72 48 75 54 69 43 83 110 45 54 24 49 34 48 40 46 21 44 46 31 45 48 41 27 50 178 100 270 223 82 160 170 82 147 150 117 165 335 90 314 1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 1 1 0 0 0 1 1 0 0 0 1 0 0 1 0 1 1 0 0 0 1 1 1 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 80 70 210 240 140 140 170 180 80 210 200 130 200 170 80 153 163 171 151 157 161 164 152 168 153 156 172 166 169 150 45 48 116 54 69 43 60 61 67 35 87 83 90 85 45 27 25 40 50 31 45 54 45 37 58 45 48 40 27 28 370 339 310 255 295 244 280 296 320 195 190 210 216 269 270 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 124 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 100 180 160 120 100 140 230 70 190 210 200 240 250 117 230 145 145 190 190 160 170 160 175 150 180 180 170 150 160 150 118 80 170 180 150 Sameera Abdulsalam Othman 180 167 154 172 152 162 174 163 155 151 156 151 172 154 174 155 150 156 160 160 158 157 160 168 176 159 156 149 154 154 168 152 160 160 162 69 91 70 59 53 43 72 48 92 80 87 54 90 55 72 69 65 93 72 64 55 70 83 111 81 67 70 60 70 69 76 75 64 60 68 32 40 100 55 62 80 60 25 70 50 45 50 65 28 60 33 35 45 46 50 55 54 44 50 60 50 47 45 60 40 60 44 45 32 40 151 190 264 133 90 100 82 270 350 250 234 310 126 260 245 130 144 249 320 227 179 234 230 300 420 255 289 176 156 195 188 173 143 128 342 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 0 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 66 67 68 69 70 71 72 73 74 75 76 89 90 91 92 93 94 95 96 97 98 99 100 89 90 91 92 93 94 95 96 97 98 99 100 180 80 150 170 220 170 170 230 180 200 180 170 200 160 180 250 210 117 150 160 170 90 180 170 200 160 
180 250 210 117 150 160 170 90 180 166 150 161 173 171 156 167 174 152 171 164 158 160 172 154 172 151 154 172 160 160 154 152 158 160 172 154 172 151 154 172 160 160 154 152 69 55 54 79 82 53 70 72 45 60 60 65 65 70 65 90 80 55 96 70 110 50 85 65 65 70 65 90 80 55 96 70 110 50 85 41 25 40 59 60 16 55 60 60 43 60 60 52 70 52 65 50 28 35 69 41 25 60 60 52 70 52 65 50 28 35 69 41 25 60 378 399 369 315 345 214 195 377 430 377 384 258 432 170 185 265 250 128 235 170 174 175 280 258 432 170 185 265 250 128 235 170 174 175 280 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 1 0 0 1 0 0 1 1 0 1 0 1 1 0 0 1 1 1 1 1 0 1 0 1 1 0 0 1 1 1

By using the Minitab software (version 13.2), we compare the linear regression models with and without intercept.

3.2 The Statistical Analysis

We obtain the linear regression model from the data analysis. The regression equation is

y = 214 - 1.08 X1 + 0.631 X2 + 1.39 X3 + 0.0839 X4 - 34.1 X5 + 4.93 X6 - 11.2 X7

Table (2): Analysis of Variance by using one-way ANOVA table

Source          DF    SS       MS     F     P
Regression       7    76753    10965  8.91  0.000
Residual Error  92    113278   1231
Total           99    190031

For the purpose of applying the leverage-point method, the point $(n^* \bar{X}, n^* \bar{Y})$ is added to the original data, so that we can determine whether the new point affects the importance of the intercept in the original data:

$n^* = 11.049$
$n^* \bar{Y} = 1787.94$
$n^* \bar{X}_1 = 1772.149$
$n^* \bar{X}_2 = 756.414$
$n^* \bar{X}_3 = 521.291$
$n^* \bar{X}_4 = 2625.684$
$n^* \bar{X}_5 = 1.546$
$n^* \bar{X}_6 = 8.287$
$n^* \bar{X}_7 = 7.403$

When we add the new point to the original data, the data become (n = 101). We obtain the linear regression model from the analysis of the augmented data.
The regression equation is

y = 0.36 + 0.238 X1 + 0.526 X2 + 1.45 X3 + 0.0850 X4 - 27.7 X5 + 2.40 X6 + 1.10 X7

Table (3): Analysis of Variance

Source          DF     SS        MS      F       P
Regression       7     2687881   383983  297.01  0.000
Residual Error  93     120235    1293
Total           100    2808116

We calculate the leverage of the new point by equation (9):

$h_{n+1} = h_{101} = 0.836$

The leverage of this point is large when compared with the cutoff value $3p/n = 0.237$. Since $h_{n+1} > 3p/n$, we conclude that the new point has high leverage; its effect forces the fitted regression through the origin. It means that in the original data the intercept is significant ($B_0 \neq 0$).

If we test the significance of the intercept ($B_0$) in the original data, we get the results

t0 = 2.38, t(0.025, 92) ≈ 2.00

Since t0 > t(0.025, 92), the intercept is very necessary in this model. We have thus shown in two ways that the intercept is significant, having confirmed that the new point, when added to the original data, forces the fitted regression through the origin.

We test the significance of the intercept in the augmented data by using the t-test in (11):

t0 = 0.07, t(0.025, 93) ≈ 2.00

Since t0 < t(0.025, 93), the intercept in this model is insignificant; that is, the new leverage point, when added to the original data, has forced the fitted regression through the origin.

3.3 Comparison between Models with and without Intercept

The analysis of variance shows that the mean square error in Table (2), MSE = 1231, is less than the MSE in Table (3), which is 1293. It means that the original data give a better fit than the data augmented with the new point. When we fit models with no intercept, note that many of the usual statistics (such as R2 and the model F) are not comparable between the with-intercept and without-intercept models, because deleting the intercept in a linear regression increases the value of R2 (it is then computed from the uncorrected total sum of squares), which in turn inflates the value of F.
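The non-comparability of R2 between intercept and no-intercept fits can be demonstrated with a short numerical sketch. This uses Python with NumPy rather than the paper's Minitab, and the simulated x and y values are illustrative assumptions, not the blood-pressure data:

```python
import numpy as np

# Illustrative simulated data from a model whose true intercept is nonzero.
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(10.0, 20.0, size=n)
y = 25.0 + 2.0 * x + rng.normal(0.0, 4.0, size=n)

# With-intercept fit: R^2 is measured against the corrected total SS.
X1 = np.column_stack([np.ones(n), x])
beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)
sse1 = np.sum((y - X1 @ beta1) ** 2)
r2_with = 1.0 - sse1 / np.sum((y - y.mean()) ** 2)

# No-intercept fit: software conventionally measures R^2 against the
# UNcorrected total SS, so the two R^2 values are on different scales.
b_origin = np.sum(x * y) / np.sum(x * x)
sse0 = np.sum((y - b_origin * x) ** 2)
r2_without = 1.0 - sse0 / np.sum(y ** 2)

assert sse0 >= sse1          # the origin fit never beats the full fit in SSE
assert r2_without > r2_with  # yet its conventional R^2 looks "better"
```

Because the no-intercept R2 is measured against the uncorrected sum Σy2 rather than Σ(y − ȳ)2, it can exceed the with-intercept R2 even though the residual sum of squares is larger, which is why R2 and F cannot be compared across the two models.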
Conclusion

When comparing models with and without an intercept, we showed that the model with an intercept is better than the model without one when forecasting increases or decreases in blood pressure. We note from the application that the data themselves determine the style of analysis to be used in order to obtain acceptable results.

References

[1] B.R. Kirkwood, Medical Statistics, (1988), www.blackwell-science.com.
[2] G. Casella, Leverage and regression through the origin, The American Statistician, 37(2) (1983), 147-152.
[3] J. Cohen, P. Cohen, S.G. West and L.S. Aiken, Applied Multiple Regression/Correlation Analysis for the Behavioural Sciences (3rd Edition), Mahwah, NJ: Lawrence Erlbaum Associates, (2003).
[4] D.R. Helsel and R.M. Hirsch, Techniques of Water-Resources Investigations of the United States Geological Survey, Book 4: Hydrologic Analysis and Interpretation, Reston, VA, USA, (1991).
[5] D.L. Farnsworth, The effect of a single point on correlation and slope, Internat. J. Math. & Math. Sci., 13(4) (1990), 799-806.
[6] D.R. Helsel and R.M. Hirsch, Statistical Methods in Water Resources: Techniques of Water-Resources Investigations, U.S. Geological Survey, Book 4, Chapter A3, (2002), 168 pages.
[7] D.C. Hoaglin and R.E. Welsch, The hat matrix in regression and ANOVA, The American Statistician, 32(1) (1978), 17-22.
[8] J.O. Rawlings, Maneesha and P. Bajpai, Multiple regression analysis using ANCOVA in university model, International Journal of Applied Physics and Mathematics, 3(5) (2013), 336-340.
[9] R.A. Johnson, Probability and Statistics for Engineers (6th ed.), Pearson Education, (2003).
[10] Gujarati, Basic Econometrics (4th Edition), The McGraw-Hill Companies, (2004).