Lecture 12

Nov. 14, 2002 LEC #12
ECON 240A-1
L. Phillips
Bivariate Normal Distribution: Isodensity Curves
I. Introduction
Economists rely heavily on regression to investigate the relationship between a
dependent variable, y, and one or more independent variables, x, w, etc. As we have seen,
graphical analysis often provides insight into these bivariate relationships and can reveal
non-linear dependence, outliers, and other features that may complicate the analysis.
There are other methodologies for examining bivariate relations. We have
examined some of them. For example, correlation analysis, using the correlation
coefficient, , is one method, as discussed in Lecture Eight. Another method is
contingency table analysis. We will discuss the latter shortly. First we turn to the
bivariate normal distribution, which provides a useful visual model for bivariate
relationships just as the univariate normal distribution provides a useful probability
model for a single variable.
It is useful to have a mental model in mind for bivariate relationships, and the isodensity lines, or contour lines, of the bivariate normal provide a visual representation. The
bivariate normal distribution of two variables, y and x, is a joint density function, f(x,y),
and if the variables are jointly normal, then the marginal densities, e.g. f(x) and f(y), are
each normal. In addition, the conditional densities, y given x, f(y/x), are normal as well.
The isodensity lines, i.e. the locus where f(x,y) is constant, form a circle around the
origin for the bivariate normal if both x and y have mean zero and variance one, i.e. are
standardized normal variates, and are not correlated. If x and y have nonzero means, μx
and μy, respectively, then these contour lines are circles around the point (μx, μy).
If x has a larger variance than y, then the contour lines are ellipses with the long
axis in the x direction. If x and y are correlated, then these ellipses are slanted.
II. Bivariate Normal Density
The density function, f(x,y), for two jointly normal variables, x and y, where, for
example, x has mean μx and variance σx², and the correlation coefficient is ρ, is:
f(x, y) = 1/[2π σx σy √(1-ρ²)] exp{(-1/[2(1-ρ²)])([(x-μx)/σx]² - 2ρ[(x-μx)/σx][(y-μy)/σy] + [(y-μy)/σy]²)}.
(1)
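Eq. (1) is easy to check numerically. The following Python sketch (added for illustration; it is not part of the original notes, and the parameter values are arbitrary) implements the density and verifies two properties used below: the density peaks at the point of the means, and with ρ = 0 it factors into the product of the marginals.

```python
import math

def bvn_density(x, y, mux, muy, sx, sy, rho):
    """Bivariate normal density of Eq. (1)."""
    zx = (x - mux) / sx
    zy = (y - muy) / sy
    coef = 1.0 / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho**2))
    return coef * math.exp(-(zx**2 - 2.0 * rho * zx * zy + zy**2)
                           / (2.0 * (1.0 - rho**2)))

def normal_density(v, mu, s):
    """Univariate normal density."""
    return math.exp(-0.5 * ((v - mu) / s)**2) / (s * math.sqrt(2.0 * math.pi))

# The density peaks at the point of the means ...
assert bvn_density(1.0, 2.0, 1.0, 2.0, 1.5, 0.8, 0.3) > \
       bvn_density(1.5, 2.5, 1.0, 2.0, 1.5, 0.8, 0.3)
# ... and with rho = 0 it factors into the product of the marginals (Eq. 4).
f_joint = bvn_density(0.4, -0.2, 1.0, 2.0, 1.5, 0.8, 0.0)
f_prod = normal_density(0.4, 1.0, 1.5) * normal_density(-0.2, 2.0, 0.8)
assert abs(f_joint - f_prod) < 1e-12
```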
A. Case 1: correlation is zero, means are zero, and variances are one
f(x, y) = 1/[2 ] exp{(-1/2)[ x2 + y2 ]}
(2)
and for an isodensity, where f(x,y) is a constant, k, taking logarithms,
ln [2 f(x, y)] = -1/2 [x2 + y2 ],
or [x2 + y2 ] = -2 ln [2 f(x, y)] = -2ln [2 k].
(3)
Recall [x² + y²] = r² is the equation of a circle around the origin, (0, 0), with radius
r, as illustrated in Figure 1.
Figure 1: Isodensity Circles About the Origin
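Eq. (3) can also be verified directly. A short Python sketch (illustrative, with an arbitrary radius): points on a common circle share one density value, and the radius can be recovered from that value.

```python
import math

def std_bvn(x, y):
    """Standard bivariate normal density with rho = 0, Eq. (2)."""
    return math.exp(-0.5 * (x**2 + y**2)) / (2.0 * math.pi)

# Every point on the circle of radius r has the same density k,
# so the isodensity contour is a circle about the origin.
r = 1.5
densities = [std_bvn(r * math.cos(t), r * math.sin(t))
             for t in (0.0, 0.7, 1.9, 3.1, 5.2)]
assert max(densities) - min(densities) < 1e-15

# Eq. (3): r^2 = -2 ln(2*pi*k) recovers the radius from the density value.
k = densities[0]
assert abs(math.sqrt(-2.0 * math.log(2.0 * math.pi * k)) - r) < 1e-12
```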
Note that if x and y are independent, then the correlation coefficient, , is zero
and the joint density function, f(x, y), is the product of the marginal density
functions for x and y, i.e.
f(x, y) = f(x) f(y) = 1/ 2 exp [-1/2 x2 ] 1/ 2 exp [-1/2 y2 ]
(4)
where x and y have mean zero and variance one.
B. Case 2: correlation is zero, variances are one, means μx and μy
In this case, the origin is translated to the point of the means, (μx, μy). The
bivariate density function is:
f(x, y) = 1/(2π) exp{(-1/2)[(x - μx)² + (y - μy)²]}.
(5)
For a density equal to k:
[(x - x)2 + (y - y)2 ] = -2 ln [2 f(x,y)] = -2 ln[2k]
(6)
This is illustrated in Figure 2.
Figure 2: Isodensity Lines About the Point of Means, Bivariate Normal
C. Case 3: correlation is zero, variance of x > variance of y
If the variance of x exceeds the variance of y, then the isodensity lines are ellipses
about the point of the means with the semi-major axis in the x direction:
f(x,y) = 1/(2 x y ) exp{ (-1/2) ([(x-x)/x]2 + [(x-y)/y]2 )}
(7)
Note that if x and y are independent, then the correlation coefficient is zero and
the joint density is the product of the marginal densities:
f(x, y) = f(x) f(y) = 1/(x
2 ) exp[-1/2[(x- x)/x]2 1/(y 2 ) exp[-1/2[(y- y)/y]2
For a constant isodensity, f(x, y) = k, from Eq. (7) we have,
([(x-x)/x]2 + [(x-y)/y]2 = -2 ln (2 x y f(x, y)) = -2 ln (2 x y k) (8)
Recall the equation of an ellipse about the origin with semi-major axis a
and semi-minor axis b is:
x²/a² + y²/b² = 1
(9)
Elliptical isodensity lines around the point of the means are illustrated for Eq. (7)
in Figure 3.
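The elliptical contours of Eq. (7) can be checked numerically as well. A Python sketch (illustrative, with arbitrary parameters): parametrize the ellipse and confirm every point on it has the same density.

```python
import math

def bvn_zero_rho(x, y, mux, muy, sx, sy):
    """Bivariate normal density with rho = 0, Eq. (7)."""
    zx = (x - mux) / sx
    zy = (y - muy) / sy
    return math.exp(-0.5 * (zx**2 + zy**2)) / (2.0 * math.pi * sx * sy)

# Parametrize the ellipse [(x-mux)/sx]^2 + [(y-muy)/sy]^2 = c^2 as
# x = mux + c*sx*cos(t), y = muy + c*sy*sin(t): every such point has
# the same density, so the contour is an ellipse about the means with
# semi-axes proportional to sx and sy (here sx > sy, long axis in x).
mux, muy, sx, sy, c = 3.0, 1.0, 2.0, 0.5, 1.2
vals = [bvn_zero_rho(mux + c * sx * math.cos(t),
                     muy + c * sy * math.sin(t), mux, muy, sx, sy)
        for t in (0.0, 0.8, 2.1, 4.5)]
assert max(vals) - min(vals) < 1e-15
```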
D. Case 4: correlation is nonzero
The joint density function is given by Eq. (1) above, and the isodensity lines are
tilted ellipses around the point of the means, as illustrated in Figure 4 for positive
correlation.
Figure 3: Isodensity Lines About the Point of the Means, Var x > Var y
Figure 4: Isodensity Lines, x and y Correlated
III. Marginal Density Functions
If x and y are jointly normal, then x and y each have a normal marginal density
function. For example, the marginal density of x, f(x), is:
f(x) = ∫_{-∞}^{∞} f(x, y) dy = [1/(σx√(2π))] exp[(-1/2)[(x - μx)/σx]²]
(10)
and similarly for y.
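As a numerical check on Eq. (10), one can integrate the joint density of Eq. (1) over y on a fine grid and compare the result with the normal marginal. A Python sketch (added for illustration; the parameter values are arbitrary):

```python
import math

def bvn(x, y, mux, muy, sx, sy, rho):
    """Bivariate normal density, Eq. (1)."""
    zx, zy = (x - mux) / sx, (y - muy) / sy
    coef = 2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho**2)
    return math.exp(-(zx**2 - 2.0 * rho * zx * zy + zy**2)
                    / (2.0 * (1.0 - rho**2))) / coef

# Integrate f(x0, y) over y on a fine grid (covering +/- 4 sd of y)
# and compare with the normal marginal f(x0) of Eq. (10).
mux, muy, sx, sy, rho, x0 = 1.0, -2.0, 1.5, 0.8, 0.6, 0.5
n = 2000
ys = [muy - 4.0 * sy + 8.0 * sy * i / n for i in range(n + 1)]
dy = 8.0 * sy / n
integral = sum(bvn(x0, y, mux, muy, sx, sy, rho) for y in ys) * dy
marginal = math.exp(-0.5 * ((x0 - mux) / sx)**2) / (sx * math.sqrt(2.0 * math.pi))
assert abs(integral - marginal) < 1e-5
```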
IV. Conditional Density Function
The density of y conditional on a particular value of x, x = x*, is a vertical slice of
the joint density at that value of x, and if x and y are jointly normal, it is also
normal. It can be obtained by dividing the joint density function by the marginal density
and simplifying:
f(y/x) = f(x, y)/f(x) = 1/[σy√(2π) √(1-ρ²)] exp{(-1/[2(1-ρ²)σy²])[y - μy - ρ(x - μx)(σy/σx)]²}
(11)
where the mean of the conditional distribution is μy + ρ(x - μx)(σy/σx), i.e. this is the
expected value of y for a given value of x, such as x*:
E[y/x=x*] = μy + ρ(x* - μx)(σy/σx)
(12)
So, if x is at its mean, μx, then the expected value of y is its mean, μy. If x is above its
mean, and the correlation is positive, then the expected value of y conditional on x is
greater than μy. This is called the regression of y on x, with intercept μy - ρμx(σy/σx) and
slope ρ(σy/σx). Of course, if x and y are not correlated, then the slope is zero, and the
intercept is μy. The variance of the conditional distribution is:
Var[y/x=x*] = σy²(1 - ρ²)
(13)
The isodensity lines and the regression line, the mean of y conditional on x, are
illustrated in Figure 5, for the case where x and y are positively correlated and the
variance of x is greater than the variance of y.
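The conditional mean and variance of Eqs. (12) and (13) translate directly into code. A Python sketch (added for illustration; the parameter values are arbitrary):

```python
def conditional_mean(x_star, mux, muy, sx, sy, rho):
    """E[y | x = x*] from Eq. (12): muy + rho*(x* - mux)*(sy/sx)."""
    return muy + rho * (x_star - mux) * (sy / sx)

def conditional_var(sy, rho):
    """Var[y | x = x*] from Eq. (13): sy^2 * (1 - rho^2)."""
    return sy**2 * (1.0 - rho**2)

mux, muy, sx, sy, rho = 2.0, 5.0, 1.5, 0.9, 0.4
# At x* = mux the conditional mean is just muy ...
assert conditional_mean(mux, mux, muy, sx, sy, rho) == muy
# ... and with rho > 0, x* above its mean pulls E[y | x*] above muy.
assert conditional_mean(3.0, mux, muy, sx, sy, rho) > muy
# Conditioning on x never increases the variance of y.
assert conditional_var(sy, rho) <= sy**2
```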
Figure 5: The Expected Value of y Conditional on x
[isodensity ellipses about the point of the means, with the regression line, the expected value of y conditional on x, running through them]
V. Example: Rates of Return for a Stock and the Market
In Lab Six we look at the data file XR17-34 for 48 monthly rates of return to the
General Electric (GE) stock and the Standard and Poor's Composite Index. Neither of
these variables is significantly different from normal in its marginal distribution. An
example is the histogram and summary statistics for the rate of return for GE, shown in Figure 6.
Figure 6: Histogram and Statistics for the GE Rate of Return
Series: GE, Sample 1993:01 1996:12, Observations 48
Mean 0.022218, Median 0.019524, Maximum 0.117833, Minimum -0.058824,
Std. Dev. 0.043669, Skewness 0.064629, Kurtosis 2.231861,
Jarque-Bera 1.213490, Probability 0.545122
The coefficient of skewness, S, is a measure of non-symmetry:
S = (1/n) Σ_{j=1}^{n} {[y(j) - ȳ]/σ̂}³
(14)
where σ̂ is s, the sample standard deviation. For the normal distribution, the coefficient
of skewness is zero, since the cube of deviations from the mean sum to zero with the
negative values offset by the positive ones because of symmetry.
The coefficient of kurtosis, K, is a measure of how peaked or how flat the density
is, capturing the weight in the tails.
K = (1/n) Σ_{j=1}^{n} {[y(j) - ȳ]/σ̂}⁴
(15)
For the normal distribution, the coefficient of kurtosis is three.
The Jarque-Bera statistic, JB, combines these two coefficients:
JB = [(n - k)/6][S² + (1/4)(K - 3)²]
(16)
where k is the number of estimated parameters, such as the sample mean and sample
standard deviation, needed to calculate the statistics. If S is zero and K is 3, then the JB
statistic will be zero. Large values of JB indicate a deviation from normality, and can be
tested using the Chi-Square distribution with two degrees of freedom.
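Eqs. (14) to (16) are straightforward to compute directly. A Python sketch (added for illustration, not part of the EViews output; `k_params` follows the lecture's Eq. (16)):

```python
import math

def skewness_kurtosis_jb(sample, k_params=2):
    """Sample skewness (Eq. 14), kurtosis (Eq. 15), and Jarque-Bera (Eq. 16)."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample standard deviation s (divisor n - 1), the lecture's sigma-hat
    s = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    S = sum(((v - mean) / s) ** 3 for v in sample) / n
    K = sum(((v - mean) / s) ** 4 for v in sample) / n
    JB = ((n - k_params) / 6.0) * (S ** 2 + 0.25 * (K - 3.0) ** 2)
    return S, K, JB

# A perfectly symmetric sample has zero skewness, so its JB statistic
# reflects only the departure of kurtosis from 3.
S, K, JB = skewness_kurtosis_jb([-2.0, -1.0, 0.0, 1.0, 2.0])
assert abs(S) < 1e-12
assert JB >= 0.0
```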
The descriptive statistics for GE and the Index are given in Table 1. The estimated
correlation coefficient is 0.636. These estimates can be used to implement Eq. (12):
E[y/x=x*] = [y - x(y/x)] + x*(y/x)
E[GE/Index] = [0.0222 – 0.636*0.0144*(0.0437/0.0254)] + 0.636*1.720*Index
E[GE/Index] = 0.0064 + 1.094*Index
(13)
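The arithmetic above is easy to reproduce. The following Python sketch (added for illustration) uses the full-precision sample moments from Table 1 and the estimated correlation 0.636; the resulting intercept and slope are close to the least-squares estimates in Table 2.

```python
# Sample moments from Table 1 and the estimated correlation 0.636.
mean_ge, sd_ge = 0.022218, 0.043669
mean_idx, sd_idx = 0.014361, 0.025430
rho = 0.636

slope = rho * sd_ge / sd_idx            # rho * (sigma_y / sigma_x), Eq. (12)
intercept = mean_ge - slope * mean_idx  # mu_y - slope * mu_x

# Close to the least-squares estimates in Table 2 (0.006526 and 1.092674).
assert abs(slope - 1.092674) < 0.01
assert abs(intercept - 0.006526) < 0.001
```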
For comparison, the estimated regression is reported in Table 2. The coefficients are
nearly identical. So the regression can be interpreted as the expected value of y for a
given value of x. A plot of the rates of return for GE against the stock Index is shown in
Figure 6.
Table 1
Sample: 1993:01 1996:12

               GE          INDEX
Mean           0.022218    0.014361
Median         0.019524    0.017553
Maximum        0.117833    0.076412
Minimum       -0.058824   -0.044581
Std. Dev.      0.043669    0.025430
Skewness       0.064629   -0.453474
Kurtosis       2.231861    3.222043
Jarque-Bera    1.213490    1.743715
Probability    0.545122    0.418174
Observations   48          48
Table 2
Dependent Variable: GE
Method: Least Squares

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           0.006526      0.005659     1.153229      0.2548
INDEX       1.092674      0.195328     5.594046      0.0000

R-squared            0.404865    Mean dependent var       0.022218
Adjusted R-squared   0.391927    S.D. dependent var       0.043669
S.E. of regression   0.034053    Akaike info criterion   -3.881039
Sum squared resid    0.053341    Schwarz criterion       -3.803072
Log likelihood       95.14493    F-statistic             31.29335
Durbin-Watson stat   2.442439    Prob(F-statistic)        0.000001
Figure 6: Rates of Return for GE Stock and S&P Composite Index
[scatter plot: GE on the vertical axis against INDEX on the horizontal axis]
VI. Discriminating Between Two Populations
As an example, we will use the data file XR18-58 on lottery expenditure as a
percent of income, introduced in Lab Six. Twenty-three individuals did not gamble. The
means for their age, number of children, years of education, and income are shown in
Table 3. For comparison, the means of the 77 individuals who did gamble are shown in
Table 4. The question is: can these explanatory variables predict who will and who will
not buy lottery tickets?
The means for number of children and age are fairly similar for the two groups.
Those who do not buy lottery tickets are better educated with higher incomes than those
who participate in the lottery. The correlation between education and income is 0.65 for
ticket buyers, and 0.74 for the entire sample.
Table 3
Sample: 1 23

               AGE        CHILDREN   EDUCATION  INCOME     LOTTERY
Mean           40.43478   1.782609   15.56522   47.56522   0.000000
Median         41.00000   2.000000   16.00000   42.00000   0.000000
Maximum        54.00000   4.000000   20.00000   95.00000   0.000000
Minimum        23.00000   0.000000   7.000000   18.00000   0.000000
Std. Dev.      8.805092   1.277658   3.368653   22.51631   0.000000
Skewness      -0.446250   0.014659  -0.919721   0.518080   NA
Kurtosis       2.308389   1.985475   3.156800   2.097295   NA
Jarque-Bera    1.221762   0.987199   3.266130   1.809815   NA
Probability    0.542872   0.610425   0.195330   0.404579   NA
Observations   23         23         23         23         23
Table 4
Sample: 24 100

               AGE        CHILDREN   EDUCATION  INCOME     LOTTERY
Mean           44.19481   1.779221   11.94805   28.54545   7.000000
Median         43.00000   2.000000   11.00000   27.00000   7.000000
Maximum        82.00000   6.000000   17.00000   64.00000   13.00000
Minimum        21.00000   0.000000   7.000000   11.00000   1.000000
Std. Dev.      12.70727   1.343830   2.887797   9.423578   2.695025
Skewness       0.466514   0.506085   0.293006   1.304264  -0.308533
Kurtosis       3.189937   3.149919   1.918891   5.036654   2.741336
Jarque-Bera    2.908734   3.359008   4.851659   35.13888   1.436299
Probability    0.233548   0.186466   0.088405   0.000000   0.487654
Observations   77         77         77         77         77
The conceptual framework is provided in Figure 7, which shows isodensity curves
for the two populations for the explanatory variables income and education.
Figure 7: Discriminating Between Those Who Play the Lottery and Those Who Don't
[isodensity ellipses for lottery players and lottery avoiders in (X = income, Y = education) space, separated by a decision rule line]
Using a single variable, we could test for a difference in sample means for
education or for a difference in sample means for income. But why not use both
variables and, instead of a decision rule classifying individuals as gamblers if x < x* or
y < y*, use a decision rule line that separates the two populations? This is called
discriminant function analysis.
Another approach is to use a probability model. A linear probability model can be
estimated with regression, using a dependent variable coded one for those who buy tickets
and zero for those who do not (designated BERN, for Bernoulli), and regressing it against
education and income. The results are shown in Table 7, with a plot of actual, fitted, and
residual values following. Since income is very skewed, it would be better to use the
natural logarithm of income, which is more bell-shaped.
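The mechanics of a linear probability model can be sketched in a few lines of Python. The data below are made up purely for illustration (the lecture's model regresses BERN on education and income; here a single hypothetical education regressor and closed-form OLS are used):

```python
# A minimal sketch of a linear probability model with a single regressor,
# fitted by closed-form OLS. The data are hypothetical.
y = [1, 1, 1, 0, 1, 0, 0, 1]          # 1 = buys lottery tickets, 0 = does not
x = [10, 11, 9, 16, 12, 18, 17, 10]   # years of education (made up)

n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar

# Fitted values are interpreted as probabilities of playing; they are not
# guaranteed to lie in [0, 1], one weakness the logit model avoids.
fitted = [intercept + slope * xi for xi in x]
assert slope < 0  # in this toy data, more education predicts less play
```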
Using the same coding for the dependent variable, non-linear estimation of the
logit probability model is possible using EViews, which avoids some problems that occur
with the linear probability model.
Table 7
Dependent Variable: BERN
Method: Least Squares
Sample: 1 100
Included observations: 100

Variable    Coefficient   Std. Error   t-Statistic   Prob.
EDUCATION   -0.021597     0.016017     -1.348392     0.1807
INCOME      -0.010462     0.003430     -3.049569     0.0030
C            1.390402     0.148465      9.365178     0.0000

R-squared            0.277095    Mean dependent var      0.770000
Adjusted R-squared   0.262190    S.D. dependent var      0.422953
S.E. of regression   0.363299    Akaike info criterion   0.842358
Sum squared resid    12.80264    Schwarz criterion       0.920513
Log likelihood      -39.11792    F-statistic             18.59045
Durbin-Watson stat   0.651758    Prob(F-statistic)       0.000000
Figure 8: Actual, Fitted and Residuals from Linear Probability Model
[plot of the actual, fitted, and residual series over observations 1 to 100]
The linear probability model can be interpreted from the perspective of decision
theory, and used to come up with a decision rule or discriminant function. The expected
cost of misclassification is the sum of the expected costs of two kinds of
misclassification: (1) labeling a non-player a player, and (2) labeling a player a
non-player. For example, if we take the cost of labeling a non-player a player, C(P/N),
multiply it by the conditional probability, P(P/N), of incorrectly classifying this
non-player as a player, given this individual's values for income and education, and
multiply by the probability of observing non-players in the population, P(N), we have the
first component of misclassification: C(P/N)*P(P/N)*P(N). Adding the other expected
cost of misclassification, we have the total expected cost, E(C), of misclassification:
E(C) = C(P/N)*P(P/N)*P(N) + C(N/P)*P(N/P)*P(P).
(14)
If the two costs of misclassification are equal, i.e. C(P/N) = C(N/P), then, noting
that there are 23 non-players, or about one in four in the sample, the expected cost is
E(C) = C(P/N)*P(P/N)*(1/4) + C(N/P)*P(N/P)*(3/4),
(15)
We could weight the two expected costs of misclassification equally by setting the
probability of classifying a non-player as a player to ¾, i.e. setting the estimated
P(P/N) = ¾, so that
E(C) = C(P/N)*(3/4)*(1/4) + C(N/P)*(1/4)*(3/4).
(16)
This is equivalent to setting the fitted value of BERN to ¾, and classifying an individual
as a player if the individual's fitted probability is greater than ¾, i.e. if fitted BERN > ¾,
where
BERN = ¾ = 1.390 - 0.0216*education - 0.0105*income,
(17)
drawing on Table 7. Thus the discriminant function, or decision rule line, in
education-income space is, rearranging Eq. (17):
Education = 29.63 – 0.486*income,
(18)
which is illustrated in Figure 9.
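The rearrangement from Eq. (17) to Eq. (18) is simple algebra, and can be checked in a few lines of Python (added for illustration):

```python
# Coefficients from Table 7 and the 3/4 cutoff for the fitted value of BERN.
c, b_edu, b_inc, cutoff = 1.390, -0.0216, -0.0105, 0.75

# Solve cutoff = c + b_edu*education + b_inc*income for education (Eq. 18):
intercept = (cutoff - c) / b_edu   # about 29.63 years
slope = -b_inc / b_edu             # about -0.486

assert abs(intercept - 29.63) < 0.05
assert abs(slope - (-0.486)) < 0.005
```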
Note that five non-players are misclassified, as well as fourteen players, for a total
of nineteen. You could shift the line to the right, misclassifying fewer players but more
non-players. If the BERN cutoff were set to 0.5, shifting the line to the right, one player
would be misclassified, but thirteen non-players would, for a total of fourteen.
Figure 9: Lottery Players and Non-Players vs. Education and Income
[scatter of non-players and players in income ($000, horizontal axis, 0 to 100) and education (years, vertical axis, 0 to 25) space, with the group means marked and the decision rule line Bern = ¾ = 1.39 - 0.0216*education - 0.0105*income]