Nov. 14, 2002 LEC #12 ECON 240A L. Phillips
Bivariate Normal Distribution: Isodensity Curves

I. Introduction

Economists rely heavily on regression to investigate the relationship between a dependent variable, y, and one or more independent variables, x, w, etc. As we have seen, graphical analysis often provides insight into these bivariate relationships and can reveal non-linear dependence, outliers, and other features that may complicate the analysis.

There are other methodologies for examining bivariate relations, and we have examined some of them. Correlation analysis, using the correlation coefficient, $\rho$, is one method, as discussed in Lecture Eight. Another method is contingency table analysis, which we will discuss shortly.

First we turn to the bivariate normal distribution, which provides a useful visual model for bivariate relationships, just as the univariate normal distribution provides a useful probability model for a single variable. It helps to have a mental model for bivariate relationships, and the isodensity lines, or contour lines, of the bivariate normal provide a visual representation.

The bivariate normal distribution of two variables, y and x, is a joint density function, f(x, y). If the variables are jointly normal, then the marginal densities, f(x) and f(y), are each normal. In addition, the conditional densities, e.g. the density of y given x, f(y/x), are normal as well.

The isodensity lines, i.e. the loci where f(x, y) is constant, are circles around the origin if both x and y have mean zero and variance one, i.e. are standardized normal variates, and are not correlated. If x and y have nonzero means, $\mu_x$ and $\mu_y$, respectively, then these contour lines are circles around the point $(\mu_x, \mu_y)$. If x has a larger variance than y, then the contour lines are ellipses with the long axis in the x direction. If x and y are correlated, then these ellipses are slanted.

II. Bivariate Normal Density

The density function, f(x, y), for two jointly normal variables, x and y, where x has mean $\mu_x$ and variance $\sigma_x^2$, y has mean $\mu_y$ and variance $\sigma_y^2$, and the correlation coefficient is $\rho$, is:

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2 \right] \right\} \tag{1}$$

A. Case 1: correlation is zero, means are zero, and variances are one

$$f(x, y) = \frac{1}{2\pi} \exp\left\{ -\frac{1}{2}\left[x^2 + y^2\right] \right\} \tag{2}$$

For an isodensity, where f(x, y) is a constant, k, taking logarithms gives

$$\ln[2\pi f(x, y)] = -\frac{1}{2}\left[x^2 + y^2\right], \quad \text{or} \quad x^2 + y^2 = -2\ln[2\pi f(x, y)] = -2\ln[2\pi k]. \tag{3}$$

Recall that $x^2 + y^2 = r^2$ is the equation of a circle around the origin, (0, 0), with radius r. Here $r^2 = -2\ln[2\pi k]$, which is positive for any density level $k < 1/(2\pi)$. The circles are illustrated in Figure 1.

Figure 1: Isodensity Circles About the Origin (horizontal axis x, vertical axis y)

Note that if x and y are independent, then the correlation coefficient, $\rho$, is zero and the joint density function, f(x, y), is the product of the marginal density functions for x and y, i.e.

$$f(x, y) = f(x)\,f(y) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}x^2\right] \cdot \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}y^2\right] \tag{4}$$

where x and y have mean zero and variance one.

B. Case 2: correlation is zero, variances are one, means $\mu_x$ and $\mu_y$

In this case, the origin is translated to the point of the means, $(\mu_x, \mu_y)$. The bivariate density function is:

$$f(x, y) = \frac{1}{2\pi} \exp\left\{ -\frac{1}{2}\left[(x - \mu_x)^2 + (y - \mu_y)^2\right] \right\}. \tag{5}$$

For a density equal to k:

$$(x - \mu_x)^2 + (y - \mu_y)^2 = -2\ln[2\pi f(x, y)] = -2\ln[2\pi k] \tag{6}$$

This is illustrated in Figure 2.

Figure 2: Isodensity Lines About the Point of Means, Bivariate Normal (horizontal axis x, vertical axis y, center at $(\mu_x, \mu_y)$)

C.
Case 3: correlation is zero, variance of x > variance of y

If the variance of x exceeds the variance of y, then the isodensity lines are ellipses about the point of the means with the semi-major axis in the x direction:

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left\{ -\frac{1}{2}\left( \left[\frac{x-\mu_x}{\sigma_x}\right]^2 + \left[\frac{y-\mu_y}{\sigma_y}\right]^2 \right) \right\} \tag{7}$$

Note that if x and y are independent, then the correlation coefficient is zero and the joint density is the product of the marginal densities:

$$f(x, y) = f(x)\,f(y) = \frac{1}{\sigma_x\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left[\frac{x-\mu_x}{\sigma_x}\right]^2\right\} \cdot \frac{1}{\sigma_y\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left[\frac{y-\mu_y}{\sigma_y}\right]^2\right\}$$

For a constant isodensity, f(x, y) = k, from Eq. (7) we have

$$\left[\frac{x-\mu_x}{\sigma_x}\right]^2 + \left[\frac{y-\mu_y}{\sigma_y}\right]^2 = -2\ln(2\pi\sigma_x\sigma_y f(x, y)) = -2\ln(2\pi\sigma_x\sigma_y k) \tag{8}$$

Recall that the equation of an ellipse about the origin with semi-major axis a and semi-minor axis b is:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \tag{9}$$

Dividing Eq. (8) through by its right-hand side, $c^2 = -2\ln(2\pi\sigma_x\sigma_y k)$, puts it in the form of Eq. (9), centered at $(\mu_x, \mu_y)$, with $a = c\,\sigma_x$ and $b = c\,\sigma_y$, so the axis lengths are proportional to the standard deviations. Elliptical isodensity lines around the point of the means are illustrated for Eq. (7) in Figure 3.

D. Case 4: correlation is nonzero

The joint density function is given by Eq. (1) above, and the isodensity lines are tilted ellipses around the point of the means, as illustrated in Figure 4 for positive correlation.

Figure 3: Isodensity Lines About the Point of the Means, Var x > Var y (horizontal axis x, vertical axis y, center at $(\mu_x, \mu_y)$)

Figure 4: Isodensity Lines, x and y Correlated (horizontal axis x, vertical axis y, center at $(\mu_x, \mu_y)$)

III. Marginal Density Functions

If x and y are jointly normal, then x and y each have a normal marginal density. For example, the marginal density of x, f(x), is:

$$f(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \frac{1}{\sigma_x\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left[\frac{x-\mu_x}{\sigma_x}\right]^2\right\} \tag{10}$$

and similarly for y.

IV.
Conditional Density Function

The density of y conditional on a particular value of x, x = x*, is a vertical slice of the isodensity curve plot at that value of x and, if x and y are jointly normal, is also normal. It can be obtained by dividing the joint density function by the marginal density and simplifying:

$$f(y/x) = \frac{f(x, y)}{f(x)} = \frac{1}{\sigma_y\sqrt{2\pi}\,(1-\rho^2)^{1/2}} \exp\left\{ -\frac{1}{2(1-\rho^2)\sigma_y^2} \left[ y - \mu_y - \rho(x-\mu_x)\frac{\sigma_y}{\sigma_x} \right]^2 \right\} \tag{11}$$

where the mean of the conditional distribution is $\mu_y + \rho(x-\mu_x)(\sigma_y/\sigma_x)$, i.e. this is the expected value of y for a given value of x, such as x*:

$$E[y/x = x^*] = \mu_y + \rho(x^* - \mu_x)(\sigma_y/\sigma_x) \tag{12}$$

So, if x is at its mean, $\mu_x$, then the expected value of y is its mean, $\mu_y$. If x is above its mean and the correlation is positive, then the expected value of y conditional on x is greater than $\mu_y$. This is called the regression of y on x, with intercept $\mu_y - \rho\mu_x(\sigma_y/\sigma_x)$ and slope $\rho(\sigma_y/\sigma_x)$. Of course, if x and y are not correlated, then the slope is zero and the intercept is $\mu_y$. The variance of the conditional distribution is:

$$Var[y/x = x^*] = \sigma_y^2(1 - \rho^2) \tag{13}$$

The isodensity lines and the regression line, the mean of y conditional on x, are illustrated in Figure 5 for the case where x and y are positively correlated and the variance of x is greater than the variance of y.

Figure 5: The Expected Value of y Conditional on x (horizontal axis x, vertical axis y, regression line through $(\mu_x, \mu_y)$)

V. Example: Rates of Return for a Stock and the Market

In Lab Six we look at the data file XR17-34 for 48 monthly rates of return to the General Electric (GE) stock and the Standard and Poor's Composite Index. Neither of these variables is significantly different from normal in its marginal distribution. An example is the histogram and statistics for the rate of return for GE, shown in Figure 6.

Figure 6: Histogram of the rate of return for GE (horizontal axis from -0.05 to 0.10)
Series: GE, Sample 1993:01 1996:12, Observations 48
Mean          0.022218
Median        0.019524
Maximum       0.117833
Minimum      -0.058824
Std. Dev.     0.043669
Skewness      0.064629
Kurtosis      2.231861
Jarque-Bera   1.213490
Probability   0.545122

The coefficient of skewness, S, is a measure of non-symmetry:

$$S = \frac{1}{n}\sum_{j=1}^{n}\left\{\frac{y(j) - \bar{y}}{\hat{\sigma}}\right\}^3 \tag{14}$$

where $\hat{\sigma}$ is s, the sample standard deviation. For the normal distribution, the coefficient of skewness is zero, since the cubed deviations from the mean sum to zero, with the negative values offset by the positive ones because of symmetry. The coefficient of kurtosis, K, is a measure of how peaked or how flat the density is, capturing the weight in the tails:

$$K = \frac{1}{n}\sum_{j=1}^{n}\left\{\frac{y(j) - \bar{y}}{\hat{\sigma}}\right\}^4 \tag{15}$$

For the normal distribution, the coefficient of kurtosis is three. The Jarque-Bera statistic, JB, combines these two coefficients:

$$JB = \frac{n-k}{6}\left[S^2 + \frac{1}{4}(K - 3)^2\right] \tag{16}$$

where k is the number of estimated parameters, such as the sample mean and sample standard deviation, needed to calculate the statistics. If S is zero and K is 3, then the JB statistic will be zero. Large values of JB indicate a deviation from normality, and can be tested using the Chi-Square distribution with two degrees of freedom.

The descriptive statistics for GE and the Index are given in Table 1. The estimated correlation coefficient is 0.636. These estimates can be used to implement Eq. (12):

$$E[y/x = x^*] = [\mu_y - \rho\mu_x(\sigma_y/\sigma_x)] + \rho\,x^*(\sigma_y/\sigma_x)$$

$$E[GE/Index] = [0.0222 - 0.636 \times 0.0144 \times (0.0437/0.0254)] + 0.636 \times 1.720 \times Index$$

$$E[GE/Index] = 0.0064 + 1.094 \times Index \tag{13}$$

For comparison, the estimated regression is reported in Table 2. The coefficients are nearly identical, so the regression can be interpreted as the expected value of y for a given value of x. A plot of the rates of return for GE and the stock Index is shown in Figure 6.
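The arithmetic behind the conditional-mean line and the Jarque-Bera statistic of Eq. (16) can be checked from the reported summary statistics. The following is a minimal Python sketch, not the Lab Six procedure: the variable names are illustrative, the inputs are the Table 1 values and the reported correlation of 0.636, and setting k = 0 in Eq. (16) for a raw data series is an assumption.

```python
import math

# Summary statistics as reported in Table 1 (GE plays the role of y, Index of x).
n = 48
mean_ge, sd_ge, skew_ge, kurt_ge = 0.022218, 0.043669, 0.064629, 2.231861
mean_ix, sd_ix = 0.014361, 0.025430
rho = 0.636  # estimated correlation coefficient reported in the text

# Eq. (12) rearranged: E[y/x] = [mu_y - rho*mu_x*(s_y/s_x)] + rho*(s_y/s_x)*x
slope = rho * sd_ge / sd_ix
intercept = mean_ge - slope * mean_ix


def jarque_bera(n_obs, skew, kurt, k=0):
    """Eq. (16). k = 0 is an assumption made here for a raw data series."""
    return (n_obs - k) / 6.0 * (skew ** 2 + 0.25 * (kurt - 3.0) ** 2)


jb_ge = jarque_bera(n, skew_ge, kurt_ge)
# For a Chi-Square variable with 2 degrees of freedom, P(X > x) = exp(-x/2).
p_ge = math.exp(-jb_ge / 2.0)

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"JB = {jb_ge:.4f}, p-value = {p_ge:.4f}")
```

Carrying more digits than the hand calculation moves the slope and intercept slightly closer to the least-squares estimates in Table 2, and with k = 0 the JB statistic and its probability agree with the values reported for GE.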
Table 1
Sample: 1993:01 1996:12

               GE          INDEX
Mean           0.022218    0.014361
Median         0.019524    0.017553
Maximum        0.117833    0.076412
Minimum       -0.058824   -0.044581
Std. Dev.      0.043669    0.025430
Skewness       0.064629   -0.453474
Kurtosis       2.231861    3.222043
Jarque-Bera    1.213490    1.743715
Probability    0.545122    0.418174
Observations   48          48

Table 2
Dependent Variable: GE
Method: Least Squares

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          0.006526      0.005659     1.153229      0.2548
INDEX      1.092674      0.195328     5.594046      0.0000

R-squared             0.404865    Mean dependent var       0.022218
Adjusted R-squared    0.391927    S.D. dependent var       0.043669
S.E. of regression    0.034053    Akaike info criterion   -3.881039
Sum squared resid     0.053341    Schwarz criterion       -3.803072
Log likelihood        95.14493    F-statistic              31.29335
Durbin-Watson stat    2.442439    Prob(F-statistic)        0.000001

Figure 6: Rates of Return for GE Stock and S&P Composite Index (scatter plot; horizontal axis INDEX, vertical axis GE)

VI. Discriminating Between Two Populations

As an example, we will use the data file XR18-58 on lottery expenditure as a percent of income, introduced in Lab Six. Twenty-three individuals did not gamble. The means for their age, number of children, years of education, and income are shown in Table 3. For comparison, the means of the 77 individuals who did gamble are shown in Table 4. The question is whether these explanatory variables can predict who will and who will not buy lottery tickets. The means for number of children and age are fairly similar for the two groups. Those who do not buy lottery tickets are better educated, with higher incomes, than those who participate in the lottery. The correlation between education and income is 0.65 for ticket buyers, and 0.74 for the entire sample.
Table 3
Sample: 1 23

               AGE         CHILDREN    EDUCATION   INCOME      LOTTERY
Mean           40.43478    1.782609    15.56522    47.56522    0.000000
Median         41.00000    2.000000    16.00000    42.00000    0.000000
Maximum        54.00000    4.000000    20.00000    95.00000    0.000000
Minimum        23.00000    0.000000    7.000000    18.00000    0.000000
Std. Dev.      8.805092    1.277658    3.368653    22.51631    0.000000
Skewness      -0.446250    0.014659   -0.919721    0.518080    NA
Kurtosis       2.308389    1.985475    3.156800    2.097295    NA
Jarque-Bera    1.221762    0.987199    3.266130    1.809815    NA
Probability    0.542872    0.610425    0.195330    0.404579    NA
Observations   23          23          23          23          23

Table 4
Sample: 24 100

               AGE         CHILDREN    EDUCATION   INCOME      LOTTERY
Mean           44.19481    1.779221    11.94805    28.54545    7.000000
Median         43.00000    2.000000    11.00000    27.00000    7.000000
Maximum        82.00000    6.000000    17.00000    64.00000    13.00000
Minimum        21.00000    0.000000    7.000000    11.00000    1.000000
Std. Dev.      12.70727    1.343830    2.887797    9.423578    2.695025
Skewness       0.466514    0.506085    0.293006    1.304264   -0.308533
Kurtosis       3.189937    3.149919    1.918891    5.036654    2.741336
Jarque-Bera    2.908734    3.359008    4.851659    35.13888    1.436299
Probability    0.233548    0.186466    0.088405    0.000000    0.487654
Observations   77          77          77          77          77

The conceptual framework is provided in Figure 7, which shows isodensity curves for the two populations for the explanatory variables income and education.

Figure 7: Discriminating Between Those Who Play the Lottery and Those Who Don't (horizontal axis X = income, vertical axis Y = education; isodensity curves for Lottery Players and Lottery Avoiders, separated by the Decision Rule Line)

Using a single variable, we could test for a difference in sample means for education or for a difference in the sample means for income.
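As a rough illustration of such single-variable tests, the sketch below computes a two-sample t statistic for education and for income from the means, standard deviations, and group sizes in Tables 3 and 4. The choice of Welch's unequal-variance form is an assumption made here, not necessarily the test used in the course, and the function name is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic without assuming equal variances (Welch form)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Group 1: the 23 non-players (Table 3). Group 2: the 77 players (Table 4).
t_education = welch_t(15.56522, 3.368653, 23, 11.94805, 2.887797, 77)
t_income = welch_t(47.56522, 22.51631, 23, 28.54545, 9.423578, 77)

print(f"education: t = {t_education:.2f}")
print(f"income:    t = {t_income:.2f}")
```

Both statistics are well above conventional critical values, consistent with the observation that non-players are better educated and have higher incomes.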
But why not use both variables and, instead of a decision rule classifying individuals as gamblers if x < x* or y < y*, use a decision rule line that separates the two populations? This is called discriminant function analysis.

Another approach is to use a probability model. A linear probability model can be estimated with regression, using a dependent variable coded one for those who buy tickets and zero for those who do not (designated Bern, for Bernoulli), and regressing it against education and income. The results are shown in Table 7, with a plot of the actual values, fitted values, and residuals following in Figure 8. Since income is very skewed, it is better to use the natural logarithm of income, which is more bell shaped.

Using the same coding for the dependent variable, non-linear estimation of the logit probability model is possible using EViews, which avoids some problems that occur with the linear probability model.

Table 7
Dependent Variable: BERN
Method: Least Squares
Sample: 1 100
Included observations: 100

Variable    Coefficient   Std. Error   t-Statistic   Prob.
EDUCATION   -0.021597     0.016017     -1.348392     0.1807
INCOME      -0.010462     0.003430     -3.049569     0.0030
C            1.390402     0.148465      9.365178     0.0000

R-squared             0.277095    Mean dependent var      0.770000
Adjusted R-squared    0.262190    S.D. dependent var      0.422953
S.E. of regression    0.363299    Akaike info criterion   0.842358
Sum squared resid     12.80264    Schwarz criterion       0.920513
Log likelihood       -39.11792    F-statistic             18.59045
Durbin-Watson stat    0.651758    Prob(F-statistic)       0.000000

Figure 8: Actual, Fitted and Residuals from Linear Probability Model (horizontal axis: observations 1 to 100; series: Residual, Actual, Fitted)

The linear probability model can be interpreted from the perspective of decision theory, and used to come up with a decision rule or discriminant function. The expected cost of misclassification is the sum of the expected costs of two kinds of misclassification: (1) labeling a non-player a player, and (2) labeling a player a non-player. For example, take the cost of labeling a non-player a player, C(P/N), multiply it by the conditional probability, P(P/N), of incorrectly classifying this non-player a player, given this individual's values for income and education, and multiply by the probability of observing non-players in the population, P(N). This gives the first component of misclassification: C(P/N)*P(P/N)*P(N). Adding the other expected cost of misclassification, we have the total expected cost, E(C), of misclassification:

$$E(C) = C(P/N)\,P(P/N)\,P(N) + C(N/P)\,P(N/P)\,P(P). \tag{14}$$

If the two costs of misclassification are equal, i.e. C(P/N) = C(N/P), then, noting that there are 23 non-players, or about one in four in the population, the expected costs are

$$E(C) = C(P/N)\,P(P/N)\,(1/4) + C(N/P)\,P(N/P)\,(3/4). \tag{15}$$

We could weight the expected costs of misclassification equally by setting the probability of classifying a non-player as a player (players are coded one in the linear probability model) to ¾, i.e. setting $\hat{P}(P/N) = 3/4$ and hence $\hat{P}(N/P) = 1/4$, so that

$$E(C) = C(P/N)\,(3/4)(1/4) + C(N/P)\,(1/4)(3/4). \tag{16}$$

This is equivalent to setting the fitted value of Bern to ¾, and classifying an individual as a player if the individual's fitted probability is greater than ¾, i.e. if $\widehat{Bern} > 3/4$, where

$$Bern = 3/4 = 1.390 - 0.0216 \times education - 0.0105 \times income, \tag{17}$$

drawing on Table 7. Thus the discriminant function, or decision rule line, in education-income space is, rearranging Eq. (17):

$$Education = 29.63 - 0.486 \times income, \tag{18}$$

which is illustrated in Figure 9.
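The rearrangement from Eq. (17) to Eq. (18) can be reproduced for any cutoff. The following is a minimal Python sketch using the rounded Table 7 coefficients as they appear in Eq. (17); the function names and the two example individuals (the median education and income of each group from Tables 3 and 4) are illustrative choices, not part of the lab.

```python
# Rounded linear probability model coefficients, as used in Eq. (17).
CONST, B_EDUCATION, B_INCOME = 1.390, -0.0216, -0.0105


def fitted_bern(education, income):
    """Fitted value of Bern from the linear probability model."""
    return CONST + B_EDUCATION * education + B_INCOME * income


def decision_line(cutoff):
    """Return (intercept, slope) of education = intercept + slope * income,
    the line along which the fitted value of Bern equals the cutoff."""
    intercept = (cutoff - CONST) / B_EDUCATION
    slope = -B_INCOME / B_EDUCATION
    return intercept, slope


def classify_as_player(education, income, cutoff=0.75):
    """Decision rule: label as a player if fitted Bern exceeds the cutoff."""
    return fitted_bern(education, income) > cutoff


intercept, slope = decision_line(0.75)
print(f"Education = {intercept:.2f} + ({slope:.3f}) * income")

# Median non-player (16 years, $42 thousand) and median player (11 years, $27 thousand).
print(classify_as_player(16, 42), classify_as_player(11, 27))
```

Calling `decision_line(0.5)` gives the same slope with a larger intercept, i.e. a parallel line shifted toward higher education and income.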
Note that five non-players are misclassified as well as fourteen players, for a total of nineteen. You could shift the line to the right, misclassifying fewer players but more non-players. If Bern were set to 0.5, shifting the line to the right, one player would be misclassified, but thirteen non-players would be misclassified, for a total of fourteen.

Figure 9: Lottery: Players and Non-Players vs. Education & Income (horizontal axis: Income ($000), 0 to 100; vertical axis: Education (Years), 0 to 25; points for non-players and players, with the mean of each group marked on both axes; discriminant function or decision rule line: Bern = ¾ = 1.39 – 0.0216*education – 0.0105*income)
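The misclassification counts quoted above can be plugged into the expected cost formula E(C) = C(P/N)*P(P/N)*P(N) + C(N/P)*P(N/P)*P(P), estimating each probability by its sample proportion. This is a minimal Python sketch under that assumption; the function name and the unequal cost of 3 in the last lines are hypothetical, chosen only to show how the ranking of the two cutoffs can change.

```python
N_NONPLAYERS, N_PLAYERS = 23, 77
N_TOTAL = N_NONPLAYERS + N_PLAYERS


def expected_cost(nonplayers_wrong, players_wrong, cost_pn=1.0, cost_np=1.0):
    """Sample analogue of E(C): each probability is replaced by a proportion."""
    p_pn = nonplayers_wrong / N_NONPLAYERS  # P(P/N): non-player labeled player
    p_np = players_wrong / N_PLAYERS        # P(N/P): player labeled non-player
    p_n = N_NONPLAYERS / N_TOTAL            # P(N)
    p_p = N_PLAYERS / N_TOTAL               # P(P)
    return cost_pn * p_pn * p_n + cost_np * p_np * p_p


# Equal costs: the counts reported for the 3/4 cutoff and the 0.5 cutoff.
cost_three_quarters = expected_cost(5, 14)
cost_one_half = expected_cost(13, 1)
print(f"cutoff 3/4: {cost_three_quarters:.2f}, cutoff 1/2: {cost_one_half:.2f}")

# Hypothetical unequal costs: labeling a non-player a player is three times as costly.
print(f"{expected_cost(5, 14, cost_pn=3.0):.2f} vs {expected_cost(13, 1, cost_pn=3.0):.2f}")
```

With equal unit costs the expected cost is simply the share of the 100 individuals who are misclassified, so the 0.5 cutoff does better; when mislabeling non-players is made sufficiently more costly, the ¾ cutoff becomes the cheaper rule.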