The graphical analysis of the ANOVA and regression models parameters significance

Irina Arhipova

Faculty of Information Technologies, Latvia University of Agriculture, Liela street 2, LV-3001, Jelgava, Latvia

Abstract

Based on experience at Latvia University of Agriculture, a graphical approach to teaching selected statistics topics to undergraduate students in economics is presented. When learning ANOVA and regression analysis with dummy variables, students usually have problems interpreting the significance of the model parameters, classifying the statistical methods, and interpreting their results. This paper discusses the teaching tasks that use graphical analysis in different statistical methods.

Keywords: ANOVA, regression analysis, dummy variables.

Interrelation of ANOVA and regression models

All statistical methods are closely interrelated, and when teaching a new topic it is necessary to show the link between the methods already acquired and the new topic. For example, suppose you need to compare real private consumption per year (EUR) in two state regions for two groups with different real income per year (EUR). In this case, two-way analysis of variance (ANOVA) is appropriate. The null hypothesis states that the factor (region or income) is not significant, and the alternative hypothesis states that the factor is significant.

Now let us consider income as a quantitative factor. In this case, the significance of the income factor must be analyzed by correlation analysis or one-factor regression analysis. Let us add a new variable, state region, to the database. A new hypothesis is thus set up to verify whether real private consumption depends on real income and state region. In this particular case, the independent variables include both qualitative and quantitative variables.
The chosen hypothesis can be verified by means of CANOVA (analysis of covariance, more commonly abbreviated ANCOVA), which is a combination of regression analysis and analysis of variance. The same task can equally be solved as a regression analysis with dummy variables.

This teaching approach can be generalized to a wider class of models and situations. The students are introduced to univariate and multivariate methods in statistics. Depending on the number of independent and dependent variables, as well as the type of the variables, the students classify the statistical methods for solving practical problems. If there is one dependent and one independent variable in the model, then the method is univariate; otherwise it is multivariate. The type and measurement scale of the variables determine the specific statistical method. For example, for one qualitative independent variable and one quantitative dependent variable, the t-test is the appropriate method, whereas for more than one qualitative independent variable and one quantitative dependent variable, the appropriate method is analysis of variance (ANOVA). For one quantitative independent variable and one quantitative dependent variable, the appropriate method is regression analysis, and for more than one quantitative independent variable and one quantitative dependent variable, it is multivariate regression analysis. When the independent variables include both qualitative and quantitative variables, the problem can be solved by CANOVA. If the purpose is to verify whether more than one quantitative variable depends on a qualitative variable, then the appropriate method is multivariate analysis of variance (MANOVA), which combines the results of several ANOVAs [Sharma 1996].
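The classification rules described above can be summarized as a small decision function. This is only a sketch under stated assumptions: the function name `choose_method` and its three count arguments are hypothetical and not part of the original course material, all dependent variables are assumed to be quantitative, and the t-test branch assumes that the single qualitative variable has two levels.

```python
def choose_method(n_dependent, n_quant_independent, n_qual_independent):
    """Suggest a method name following the classification in the text.

    Hypothetical helper: the arguments are counts of quantitative dependent
    variables, quantitative independent variables and qualitative
    independent variables.
    """
    if n_dependent < 1 or n_quant_independent + n_qual_independent < 1:
        raise ValueError("need at least one dependent and one independent variable")

    if n_dependent > 1:
        # Several quantitative dependent variables, qualitative factor(s) only.
        if n_qual_independent >= 1 and n_quant_independent == 0:
            return "MANOVA"
        return "not covered by the classification in the text"

    # From here on there is exactly one quantitative dependent variable.
    if n_quant_independent >= 1 and n_qual_independent >= 1:
        return "CANOVA"  # mixed qualitative and quantitative predictors
    if n_qual_independent == 1:
        return "t-test"  # assumes the qualitative variable has two levels
    if n_qual_independent > 1:
        return "ANOVA"
    if n_quant_independent == 1:
        return "regression analysis"
    return "multivariate regression analysis"


# The consumption example: one dependent variable (consumption),
# one quantitative predictor (income), one qualitative predictor (region).
print(choose_method(1, 1, 1))
```

Writing the rules as ordered branches makes the hierarchy explicit: the number of dependent variables is checked first, and the mixed case is checked before the purely qualitative or purely quantitative cases.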
All of these models can be extended to their most general case, the General Linear Model (GLM). The GLM looks the same as the two-variable model, but the difference is that each of the variables in the GLM represents not a single variable but a set of variables.

Examples of regression models

To illustrate, suppose that $Y_t$ represents the real private consumption (EUR) in month $t$, and that real private consumption depends primarily on the real income (EUR) $X_t$ in month $t$ and on the state region. Let us consider only two regions, B1 and B2. The model representing the relationship between real private consumption and real income depending on region may be written as eq. 1:

$$Y_t = \alpha_0 + (\gamma_0 - \alpha_0) D_t + \alpha_1 X_t + (\gamma_1 - \alpha_1)(D_t \cdot X_t) + \varepsilon_t \qquad (1)$$

where $X_t$ is real income and $D_t$ equals 0 for region B1 and 1 for region B2.

Students usually have difficulties in classifying the significance of the model parameters. During the class, the combinations of coefficient significance shown in Table 1 are offered.

Table 1. The possible combinations of the model coefficient significance

Nr.  $\alpha_0$  $\gamma_0-\alpha_0$  $\alpha_1$  $\gamma_1-\alpha_1$
1    yes         yes                  yes         yes
2    yes         yes                  yes         no
3    yes         yes                  no          yes
4    yes         yes                  no          no
5    yes         no                   yes         yes
6    yes         no                   yes         no
7    yes         no                   no          yes
8    yes         no                   no          no
9    no          yes                  yes         yes
10   no          yes                  yes         no
11   no          yes                  no          yes
12   no          yes                  no          no
13   no          no                   yes         yes
14   no          no                   yes         no
15   no          no                   no          yes
16   no          no                   no          no

yes: the coefficient is significant at the 0.05 level; no: the coefficient is not significant at the 0.05 level.

Students are asked to write out the equations according to the significance of the model coefficients. For example, eq. 1 corresponds to model 1 in Table 1; this model evaluates not only the effects of the quantitative factor $X_t$ and the qualitative factor $D_t$ but also their interaction effect. As a result, there are really two equations (eq. 2):

$$Y_t = \alpha_0 + \alpha_1 X_t + \varepsilon_t \ \text{ when } D_t = 0 \quad \text{and} \quad Y_t = \gamma_0 + \gamma_1 X_t + \varepsilon_t \ \text{ when } D_t = 1 \qquad (2)$$

Eq. 2 can take a number of different shapes depending on the signs and absolute values of the coefficients.
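The step from eq. 1 to the two regional lines of eq. 2 can be checked numerically. The sketch below is illustrative only: the coefficient values, the sample size, the noise level and the simulated data are all assumptions made for the example, not data from the course. It fits eq. 1 by ordinary least squares with NumPy and then recovers the intercept and slope of each regional line.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed (hypothetical) true coefficients for the two regions.
alpha0, alpha1 = 100.0, 0.6  # region B1: intercept and slope
gamma0, gamma1 = 150.0, 0.8  # region B2: intercept and slope

n = 200
X = rng.uniform(500.0, 1500.0, size=n)  # simulated real income (EUR)
D = rng.integers(0, 2, size=n)          # dummy: 0 for B1, 1 for B2
eps = rng.normal(0.0, 5.0, size=n)      # random error term

# Simulate consumption according to eq. 1.
Y = alpha0 + (gamma0 - alpha0) * D + alpha1 * X + (gamma1 - alpha1) * D * X + eps

# Design matrix of eq. 1: constant, dummy, income, dummy-by-income interaction.
design = np.column_stack([np.ones(n), D, X, D * X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
b_const, b_dummy, b_income, b_inter = coef

# Eq. 2: the single model splits into one line per region.
line_b1 = (b_const, b_income)                      # estimates of alpha0, alpha1
line_b2 = (b_const + b_dummy, b_income + b_inter)  # estimates of gamma0, gamma1

print("B1 intercept and slope:", line_b1)
print("B2 intercept and slope:", line_b2)
```

Because the model contains both the dummy and the interaction term, the two recovered lines coincide with separate least-squares fits within each region. Changing the assumed coefficient values changes the intercepts and slopes of the two recovered lines.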
As a result, slope dummies can be used to model a wide variety of relationships [Studenmund 1997]. Analogously, eq. 3 corresponds to model 2 in Table 1, where only the main effects of the two factors can be evaluated, given that the interaction effect is not significant.

$$Y_t = \alpha_0 + (\gamma_0 - \alpha_0) D_t + \alpha_1 X_t + \varepsilon_t \qquad (3)$$

This means that eq. 3 can be rewritten as the equations of two parallel lines (eq. 4).

$$Y_t = \alpha_0 + \alpha_1 X_t + \varepsilon_t \ \text{ when } D_t = 0 \quad \text{and} \quad Y_t = \gamma_0 + \alpha_1 X_t + \varepsilon_t \ \text{ when } D_t = 1 \qquad (4)$$

The graphical summary of all models is shown in Fig. 1, where the black line corresponds to region B1 and the dotted line to region B2.

Fig. 1. The graphical interpretation of the regression models according to the coefficient significance in Table 1

Let us consider model 4 from Table 1, where only the coefficients $\alpha_0$ and $(\gamma_0 - \alpha_0)$ are significant, so that the corresponding regression model is the following (eq. 5).

$$Y_t = \alpha_0 + (\gamma_0 - \alpha_0) D_t + \varepsilon_t \qquad (5)$$

Analogously, eq. 5 can also be rewritten as two equations in which both lines are parallel to the X-axis, with the corresponding intercepts $\alpha_0$ and $\gamma_0$ (eq. 6).

$$Y_t = \alpha_0 + \varepsilon_t \ \text{ when } D_t = 0 \quad \text{and} \quad Y_t = \gamma_0 + \varepsilon_t \ \text{ when } D_t = 1 \qquad (6)$$

Students should interpret this model: the dependent variable depends only on the qualitative factor, and in this case the regression model reduces to an ANOVA model.

Examples of ANOVA models

Suppose that real private consumption (EUR) $Y_t$ in month $t$ is given for two state regions, B1 and B2, and for two groups, A1 and A2, with different real income per year (EUR). For each combination of factor levels, the mean value of real private consumption can be calculated. These mean values can be represented graphically to interpret the significance of the factors. Let us consider the main ways in which the factors and their interaction effect can influence the mean value of real private consumption (Table 2).

Table 2.
The possible combinations of the factor significance

Nr.  A    B    A*B
1    no   no   no
2    no   no   yes
3    no   yes  no
4    no   yes  yes
5    yes  no   no
6    yes  no   yes
7    yes  yes  no
8    yes  yes  yes

A: the factor of real income; B: the factor of state region; A*B: the interaction effect of the two factors. yes: the factor or the interaction is significant at the 0.05 level; no: the factor or the interaction is not significant at the 0.05 level.

The graphical summary of all models is shown in Fig. 2, where the solid line corresponds to region B1 and the dotted line to region B2 [Christensen, 1996].

After completing the theoretical course, the students are ready to use the acquired knowledge to solve a problem with ready-made statistical software. The general linear model (GLM) is a flexible statistical model that incorporates analyses involving normally distributed dependent variables and combinations of categorical and continuous factor variables. The SPSS GLM procedure can accommodate univariate models (one dependent variable) involving ANOVA, regression and CANOVA [Arhipova, Balina, 2004].

Fig. 2. The graphical interpretation of the ANOVA models according to the factor significance in Table 2

Conclusions

The application of graphical analysis allows the students to obtain a clear interpretation of the statistical models and helps them better understand the interrelation between the statistical hypotheses. Only once the conclusions of the statistical hypotheses have been mastered can statistical software packages be used in problem solving, because the purely formal use of applied statistical packages hinders students' deep acquisition of statistics.

References

Arhipova I, Balina S (2004) The problem of choosing statistical hypotheses in applied statistics. COMPSTAT 2004. Proceedings in Computational Statistics. 16th Symposium Prague/Czech Republic. Physica-Verlag, Heidelberg, pp. 629-636
Christensen R (1996) Analysis of Variance, Design and Regression.
Applied statistical methods. Chapman & Hall
Studenmund AH (1997) Using Econometrics: A Practical Guide. Addison-Wesley
Sharma S (1996) Applied Multivariate Techniques. John Wiley & Sons, Inc