II. The Multivariate Normal Distribution
“…it is not enough to know that a sample could have
come from a normal population; we must be clear that
it is at the same time improbable that it has come from
a population differing so much from the normal as to
invalidate the use of the ‘normal theory’ tests in further
handling of the material.”
E. S. Pearson, 1930 (quoted on page 1 in Tests of
Normality, Henry C. Thode, Jr., 2002)
A. Review of the Univariate Normal Distribution
Normal Probability Distribution - expresses the probabilities of outcomes for a continuous random variable x with a particular symmetric and unimodal distribution. This density function is given by

f(x) = [1 / (√(2π) σ)] e^( -(x - μ)² / (2σ²) )

where μ = mean
σ = standard deviation
π = 3.14159…
e = 2.71828…
but the probability is given by

P(a ≤ x ≤ b) = ∫[a,b] f(x) dx = ∫[a,b] [1 / (√(2π) σ)] e^( -(x - μ)² / (2σ²) ) dx
This looks like a difficult integration problem! Will I
have to integrate this function every time I want to
calculate probabilities for some normal random
variable?
Characteristics of the normal probability distribution are:
- there are an infinite number of normal distributions, each defined by their unique combination of the mean μ and standard deviation σ
- μ determines the central location and σ determines the spread or width
- the distribution is symmetric about μ
- it is unimodal
- μ = Md = Mo
- it is asymptotic with respect to the horizontal axis
- the area under the curve is 1.0
- it is neither platykurtic nor leptokurtic
- it follows the empirical rule:
P(μ - 1σ ≤ x ≤ μ + 1σ) ≈ 0.68
P(μ - 2σ ≤ x ≤ μ + 2σ) ≈ 0.95
P(μ - 3σ ≤ x ≤ μ + 3σ) ≈ 0.997
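The empirical-rule probabilities are easy to verify numerically. The following is a minimal Python sketch (not part of the original notes) that computes them from the normal CDF; the mean and standard deviation used are arbitrary illustrative values.

from scipy.stats import norm

mu, sigma = 100.0, 15.0                # any mean and standard deviation work here
for k in (1, 2, 3):
    p = norm.cdf(mu + k * sigma, mu, sigma) - norm.cdf(mu - k * sigma, mu, sigma)
    print(k, round(p, 4))              # prints 0.6827, 0.9545, 0.9973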
Normal distributions with the same mean but different standard deviations:
[Figure: several normal density curves f(x) centered at the same μ but with different spreads]
Normal distributions with the same standard deviation but different means:
[Figure: several normal density curves f(x) with the same spread centered at different values of μ]
The Standard Normal Probability Distribution - the probability distribution associated with any normal random variable (usually denoted z) that has μ = 0 and σ = 1.
There are tables that can be used to obtain the results of the integration

P(a ≤ z ≤ b) = ∫[a,b] f(z) dz = ∫[a,b] [1/√(2π)] e^(-z²/2) dz

for the standard normal random variable.
Some of the tables work from the cumulative standard normal probability distribution (the probability that a random value selected from the standard normal random variable falls between -∞ and some given value b > 0, i.e., P(-∞ ≤ z ≤ b))

P(-∞ ≤ z ≤ b) = ∫[-∞,b] f(z) dz = ∫[-∞,b] [1/√(2π)] e^(-z²/2) dz
There are tables that give the results of the integration
(Table 1 of the Appendices in J&W).
Cumulative Standard Normal Distribution (J&W Table 1)

z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
Let's focus on a small part of the Cumulative Standard Normal Probability Distribution Table.
Example: for a standard normal random variable z, what is the probability that z is between -∞ and 0.43?

z      0.00    0.01    0.02    0.03    0.04
0.0   0.5000  0.5040  0.5080  0.5120  0.5160
0.1   0.5398  0.5438  0.5478  0.5517  0.5557
0.2   0.5793  0.5832  0.5871  0.5910  0.5948
0.3   0.6179  0.6217  0.6255  0.6293  0.6331
0.4   0.6554  0.6591  0.6628  0.6664  0.6700
0.5   0.6915  0.6950  0.6985  0.7019  0.7054
0.6   0.7257  0.7291  0.7324  0.7357  0.7389

Reading across the z = 0.4 row to the 0.03 column gives P(-∞ ≤ z ≤ 0.43) = 0.6664.
Example: for a standard normal random variable z, what is the probability that z is between 0 and 2.0?
[Figure: standard normal density f(z) with the area between z = 0 and z = 2.0 shaded]
Again, looking at a small part of the Cumulative Standard Normal Probability Distribution Table, we find the probability that a standard normal random variable z is between -∞ and 2.00:

z      0.00    0.01    0.02    0.03    0.04
1.3   0.9032  0.9049  0.9066  0.9082  0.9099
1.4   0.9192  0.9207  0.9222  0.9236  0.9251
1.5   0.9332  0.9345  0.9357  0.9370  0.9382
1.6   0.9452  0.9463  0.9474  0.9484  0.9495
1.7   0.9554  0.9564  0.9573  0.9582  0.9591
1.8   0.9641  0.9649  0.9656  0.9664  0.9671
1.9   0.9713  0.9719  0.9726  0.9732  0.9738
2.0   0.9772  0.9778  0.9783  0.9788  0.9793
2.1   0.9821  0.9826  0.9830  0.9834  0.9838

so P(-∞ ≤ z ≤ 2.00) = 0.9772.
Example: for a standard normal random variable z, what is the probability that z is between 0 and 2.0?
[Figure: standard normal density f(z) with the area below z = 0 (probability 0.5000) and the area between z = 0 and z = 2.0 (probability 0.9772 - 0.5000 = 0.4772) marked]

P(0 ≤ z ≤ 2) = P(-∞ ≤ z ≤ 2) - P(-∞ ≤ z ≤ 0) = 0.9772 - 0.5000 = 0.4772
What is the probability that z is at least 2.0?
[Figure: standard normal density f(z) with the upper tail beyond z = 2.0 (area 1.0000 - 0.9772 = 0.0228) marked]

P(z ≥ 2) = P(-∞ ≤ z ≤ ∞) - P(-∞ ≤ z ≤ 2) = 1.0000 - 0.9772 = 0.0228
What is the probability that z is between -1.5 and 2.0?
[Figure: standard normal density f(z) with the area between z = -1.5 and z = 2.0 marked]
Again, looking at the same excerpt of the Cumulative Standard Normal Probability Distribution Table as above, we find the probability that a standard normal random variable z is between -∞ and 1.50: P(-∞ ≤ z ≤ 1.50) = 0.9332.
What is the probability that z is between -1.5 and 2.0?
[Figure: standard normal density f(z) with the area from -1.5 to 0 (0.5000 - 0.0668 = 0.4332) and the area from 0 to 2.0 (0.4772) marked]

P(-1.5 ≤ z ≤ 2.0) = P(-1.5 ≤ z ≤ 0.0) + P(0.0 ≤ z ≤ 2.0)
= [P(-∞ ≤ z ≤ 0.0) - P(-∞ ≤ z ≤ -1.5)] + 0.4772
= (0.5000 - 0.0668) + 0.4772 = 0.9104
Notice we could find the probability that z is between -1.5 and 2.0 another way!
[Figure: standard normal density f(z) with the area below z = 2.0 (0.9772) and the area below z = -1.5 (1.0000 - 0.9332 = 0.0668) marked]

P(-1.5 ≤ z ≤ 2.0) = P(-∞ ≤ z ≤ 2.0) - P(-∞ ≤ z ≤ -1.5)
= 0.9772 - [P(-∞ ≤ z ≤ ∞) - P(-∞ ≤ z ≤ 1.5)]
= 0.9772 - (1.0000 - 0.9332) = 0.9104
There are often multiple ways to use the Cumulative
Standard Normal Probability Distribution Table to
find the probability that a standard normal random
variable z is between two given values!
How do you decide which to use?
- Do what you understand (make yourself comfortable)
and
- DRAW THE PICTURE!!!
Notice we could also calculate the probability that z is between -1.5 and 2.0 yet another way!
[Figure: standard normal density f(z) with the area from -1.5 to 0 (0.9332 - 0.5000 = 0.4332) and the area from 0 to 2.0 (0.4772) marked]

P(-1.5 ≤ z ≤ 2.0) = P(-1.5 ≤ z ≤ 0.0) + P(0.0 ≤ z ≤ 2.0)
= [P(-∞ ≤ z ≤ 1.5) - P(-∞ ≤ z ≤ 0.0)] + 0.4772   (by symmetry, P(-1.5 ≤ z ≤ 0.0) = P(0.0 ≤ z ≤ 1.5))
= (0.9332 - 0.5000) + 0.4772 = 0.9104
What is the probability that z is between -2.0 and -1.5?
[Figure: standard normal density f(z) with the area from -2.0 to 0 (0.5000 - 0.0228 = 0.4772), the area from -1.5 to 0 (0.4332), and their difference (0.4772 - 0.4332 = 0.0440) marked]

P(-2.0 ≤ z ≤ -1.5) = P(-2.0 ≤ z ≤ 0.0) - P(-1.5 ≤ z ≤ 0.0)
= [P(-∞ ≤ z ≤ 0.0) - P(-∞ ≤ z ≤ -2.0)] - 0.4332
= (0.5000 - 0.0228) - 0.4332 = 0.0440
What is the probability that z is exactly 1.5?
[Figure: standard normal density f(z) with the area below z = 1.5 (0.9332) marked]

P(z = 1.5) = P(1.5 ≤ z ≤ 1.5)
= P(-∞ ≤ z ≤ 1.5) - P(-∞ ≤ z ≤ 1.5)
= 0.9332 - 0.9332 = 0.0000
(why?)
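All of the table-lookup calculations above can be reproduced directly from the standard normal CDF. The following short Python sketch (not part of the original notes) does so with SciPy.

from scipy.stats import norm

print(round(norm.cdf(2.0) - norm.cdf(0.0), 4))    # P(0 <= z <= 2)       = 0.4772
print(round(1.0 - norm.cdf(2.0), 4))              # P(z >= 2)            = 0.0228
print(round(norm.cdf(2.0) - norm.cdf(-1.5), 4))   # P(-1.5 <= z <= 2.0)  = 0.9104
print(round(norm.cdf(-1.5) - norm.cdf(-2.0), 4))  # P(-2.0 <= z <= -1.5) ~ 0.0440
print(round(norm.cdf(1.5) - norm.cdf(1.5), 4))    # P(z = 1.5)           = 0.0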
Other tables work from the half standard normal probability distribution (the probability that a random value selected from the standard normal random variable falls between 0 and some given value b > 0, i.e., P(0 ≤ z ≤ b))

P(0 ≤ z ≤ b) = ∫[0,b] f(z) dz = ∫[0,b] [1/√(2π)] e^(-z²/2) dz

There are tables that give the results of this integration as well.
Standard Normal Distribution

z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
Let's focus on a small part of the Standard Normal Probability Distribution Table.
Example: for a standard normal random variable z, what is the probability that z is between 0 and 0.43?

z      0.00    0.01    0.02    0.03    0.04
0.0   0.0000  0.0040  0.0080  0.0120  0.0160
0.1   0.0398  0.0438  0.0478  0.0517  0.0557
0.2   0.0793  0.0832  0.0871  0.0910  0.0948
0.3   0.1179  0.1217  0.1255  0.1293  0.1331
0.4   0.1554  0.1591  0.1628  0.1664  0.1700
0.5   0.1915  0.1950  0.1985  0.2019  0.2054
0.6   0.2257  0.2291  0.2324  0.2357  0.2389

Reading across the z = 0.4 row to the 0.03 column gives P(0 ≤ z ≤ 0.43) = 0.1664.
Example: for a standard normal random variable z, what is the probability that z is between 0 and 2.0?
[Figure: standard normal density f(z) with the area between z = 0 and z = 2.0 shaded]
Again, looking at a small part of the Standard Normal Probability Distribution Table, we find the probability that a standard normal random variable z is between 0 and 2.00:

z      0.00    0.01    0.02    0.03    0.04
1.6   0.4452  0.4463  0.4474  0.4484  0.4495
1.7   0.4554  0.4564  0.4573  0.4582  0.4591
1.8   0.4641  0.4649  0.4656  0.4664  0.4671
1.9   0.4713  0.4719  0.4726  0.4732  0.4738
2.0   0.4772  0.4778  0.4783  0.4788  0.4793
2.1   0.4821  0.4826  0.4830  0.4834  0.4838
2.2   0.4861  0.4864  0.4868  0.4871  0.4875

so P(0 ≤ z ≤ 2.00) = 0.4772.
Example: for a standard normal random variable z, what is the probability that z is between 0 and 2.0?
[Figure: standard normal density f(z) with the area between z = 0 and z = 2.0 (probability 0.4772) marked]

P(0 ≤ z ≤ 2) = 0.4772
What is the probability that z is at least 2.0?
[Figure: standard normal density f(z) with the upper tail beyond z = 2.0 (area 0.5000 - 0.4772 = 0.0228) marked]

P(z ≥ 2) = P(0 ≤ z ≤ ∞) - P(0 ≤ z ≤ 2) = 0.5000 - 0.4772 = 0.0228
What is the probability that z is between -1.5 and 2.0?
[Figure: standard normal density f(z) with the area between z = -1.5 and z = 2.0 marked]
Again, looking at a small part of the Standard Normal Probability Distribution Table, we find the probability that a standard normal random variable z is between -1.50 and 0, which by symmetry equals P(0 ≤ z ≤ 1.50):

z      0.00    0.01    0.02    0.03    0.04
1.3   0.4032  0.4049  0.4066  0.4082  0.4099
1.4   0.4192  0.4207  0.4222  0.4236  0.4251
1.5   0.4332  0.4345  0.4357  0.4370  0.4382
1.6   0.4452  0.4463  0.4474  0.4484  0.4495
1.7   0.4554  0.4564  0.4573  0.4582  0.4591
1.8   0.4641  0.4649  0.4656  0.4664  0.4671

so P(-1.50 ≤ z ≤ 0) = P(0 ≤ z ≤ 1.50) = 0.4332.
What is the probability that z is between -1.5 and 2.0?
[Figure: standard normal density f(z) with the area from -1.5 to 0 (0.4332) and the area from 0 to 2.0 (0.4772) marked]

P(-1.5 ≤ z ≤ 2.0) = P(-1.5 ≤ z ≤ 0.0) + P(0.0 ≤ z ≤ 2.0) = 0.4332 + 0.4772 = 0.9104
What is the probability that z is between -2.0 and -1.5?
[Figure: standard normal density f(z) with the area from -2.0 to 0 (0.4772), the area from -1.5 to 0 (0.4332), and their difference (0.4772 - 0.4332 = 0.0440) marked]

P(-2.0 ≤ z ≤ -1.5) = P(-2.0 ≤ z ≤ 0.0) - P(-1.5 ≤ z ≤ 0.0) = 0.4772 - 0.4332 = 0.0440
What is the probability that z is exactly 1.5?
[Figure: standard normal density f(z) with the area between z = 0 and z = 1.5 (0.4332) marked]

P(z = 1.5) = P(1.5 ≤ z ≤ 1.5)
= P(0.0 ≤ z ≤ 1.5) - P(0.0 ≤ z ≤ 1.5)
= 0.4332 - 0.4332 = 0.0000
(why?)
z-Transformation - mathematical means by which any normal random variable with a mean μ and standard deviation σ can be converted into a standard normal random variable.
- to make the mean equal to 0, we simply subtract μ from each observation in the population
- to then make the standard deviation equal to 1, we divide the results of the first step by σ
The resulting transformation is given by

z = (x - μ) / σ
Example: for a normal random variable x with a mean of 5 and a standard deviation of 3, what is the probability that x is between 5.0 and 7.0?
[Figure: normal density f(x) centered at μ = 5 with the area between x = 5.0 and x = 7.0 shaded]

Using the z-transformation, we can restate the problem in the following manner:

P(5.0 ≤ x ≤ 7.0) = P[ (5.0 - 5.0)/3.0 ≤ (x - μ)/σ ≤ (7.0 - 5.0)/3.0 ] = P(0.0 ≤ z ≤ 0.67)

then use the standard normal probability table to find the ultimate answer:

P(0.0 ≤ z ≤ 0.67) = 0.2486

which graphically looks like this:
[Figure: normal density with μ = 5 and the interval from x = 5.0 to x = 7.0 (z = 0.0 to z = 0.67) shaded, area of probability = 0.2486]
Why is the normal probability distribution considered
so important?
- many random variables are naturally normally
distributed
- many distributions, such as the Poisson and the
binomial, can be approximated by the normal
distribution (Central Limit Theorem)
- the distributions of many statistics, such as the sample mean and the sample proportion, are approximately normal if the sample is sufficiently large (also Central Limit Theorem)
B. The Multivariate Normal Distribution
The univariate normal distribution has a generalized form in p dimensions – the p-dimensional normal density function is

f(x) = [1 / ((2π)^(p/2) |Σ|^(1/2))] e^( -(x - μ)'Σ⁻¹(x - μ)/2 )

where -∞ < xi < ∞, i = 1,…,p. The quadratic form (x - μ)'Σ⁻¹(x - μ) in the exponent is the squared generalized distance from x to μ.
This p-dimensional normal density function is denoted by Np(μ, Σ) where

μ = [μ1, μ2, …, μp]'   and   Σ =
[ σ11  σ12  …  σ1p ]
[ σ21  σ22  …  σ2p ]
[  …    …   …   …  ]
[ σp1  σp2  …  σpp ]
The simplest multivariate normal distribution is the bivariate (2-dimensional) normal distribution, which has the density function

f(x) = [1 / (2π |Σ|^(1/2))] e^( -(x - μ)'Σ⁻¹(x - μ)/2 )

where -∞ < xi < ∞, i = 1, 2, and the exponent again contains the squared generalized distance from x to μ.
This 2-dimensional normal density function is denoted by N2(μ, Σ) where

μ = [μ1, μ2]'   and   Σ =
[ σ11  σ12 ]
[ σ21  σ22 ]
We can easily find the inverse of the covariance matrix (by using Gauss-Jordan elimination or some other technique):

Σ⁻¹ = [1 / (σ11σ22 - σ12²)] ·
[  σ22   -σ12 ]
[ -σ21    σ11 ]

Now we use the previously established relationship σ12 = ρ12 √σ11 √σ22 to establish that

σ11σ22 - σ12² = σ11σ22 (1 - ρ12²)
By substitution we can now write the squared distance as

(x - μ)'Σ⁻¹(x - μ)
= [1 / (σ11σ22(1 - ρ12²))] [x1 - μ1, x2 - μ2]
[  σ22             -ρ12√σ11√σ22 ] [ x1 - μ1 ]
[ -ρ12√σ11√σ22      σ11         ] [ x2 - μ2 ]

= [ σ22(x1 - μ1)² + σ11(x2 - μ2)² - 2ρ12√σ11√σ22 (x1 - μ1)(x2 - μ2) ] / [ σ11σ22(1 - ρ12²) ]

= [1 / (1 - ρ12²)] { [(x1 - μ1)/√σ11]² + [(x2 - μ2)/√σ22]² - 2ρ12 [(x1 - μ1)/√σ11][(x2 - μ2)/√σ22] }
which means that we can rewrite the bivariate normal probability density function as

f(x) = [1 / (2π |Σ|^(1/2))] e^( -(x - μ)'Σ⁻¹(x - μ)/2 )

= [1 / (2π √(σ11σ22(1 - ρ12²)))] exp{ -[1 / (2(1 - ρ12²))] ( [(x1 - μ1)/√σ11]² + [(x2 - μ2)/√σ22]² - 2ρ12 [(x1 - μ1)/√σ11][(x2 - μ2)/√σ22] ) }
Graphically, the bivariate normal probability density function looks like this:
[Figure: bivariate normal response surface f(X1, X2) over the (X1, X2) plane, with its elliptical contours shown beneath the surface]
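A small Python sketch (not part of the original notes) confirms that the ρ-form of the bivariate normal density derived above matches SciPy's multivariate normal density; all parameter values below are arbitrary illustrative choices.

import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2 = 1.0, -2.0
s11, s22, rho = 4.0, 9.0, 0.5                 # variances and correlation
s12 = rho * np.sqrt(s11 * s22)                # covariance
Sigma = np.array([[s11, s12], [s12, s22]])

def f_rho(x1, x2):
    # density written in terms of rho12, sigma11, sigma22 (the formula above)
    z1 = (x1 - mu1) / np.sqrt(s11)
    z2 = (x2 - mu2) / np.sqrt(s22)
    q = (z1**2 + z2**2 - 2 * rho * z1 * z2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * np.sqrt(s11 * s22 * (1 - rho**2)))

x = (2.0, -1.0)
print(f_rho(*x))
print(multivariate_normal(mean=[mu1, mu2], cov=Sigma).pdf(x))   # same value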
All points of equal density are called a contour, defined for p dimensions as all x such that

(x - μ)'Σ⁻¹(x - μ) = c²

The contours (x - μ)'Σ⁻¹(x - μ) = c² form concentric ellipsoids centered at μ with axes ±c√λi ei, where Σei = λiei for i = 1, …, p.
[Figure: a constant-density contour of f(X1, X2) for constant c – an ellipse centered at μ = [μ1, μ2]' with half-axes ±c√λ1 e1 and ±c√λ2 e2]
The general form of contours for a bivariate normal probability distribution where the variables have equal variance (σ11 = σ22) is relatively easy to derive.
First we need the eigenvalues of Σ:

0 = |Σ - λI| =
| σ11 - λ   σ12     |
| σ12       σ11 - λ |
= (σ11 - λ)² - σ12² = (λ - σ11 - σ12)(λ - σ11 + σ12)

so λ1 = σ11 + σ12 and λ2 = σ11 - σ12.
Next we need the eigenvectors of Σ, i.e., Σei = λiei. For λ1,

[ σ11  σ12 ] [ e1 ]          [ e1 ]
[ σ12  σ11 ] [ e2 ]  =  λ1   [ e2 ]

or σ11e1 + σ12e2 = (σ11 + σ12)e1 and σ12e1 + σ11e2 = (σ11 + σ12)e2,
which implies e1 = e2, so the first eigenvector is e1 = [1/√2, 1/√2]';
λ2 = σ11 - σ12 similarly leads to e2 = [1/√2, -1/√2]'.
- for a positive covariance 12, the first eigenvalue and
its associated eigenvector lie along the 450 line running
through the centroid :
~
X2
f(X1,
contour for
constant
c=
X2)
c σ11 - σ12
x   Σ x  
'
-1
f(X1, X2)
c σ11 + σ12
X1
What do you suppose happens when the covariance is
negative? Why?
- for a negative covariance 12, the second eigenvalue
and its associated eigenvector lie at right angles to the
450 line running through the centroid :
~
X2
f(X1,
contour for
constant
c=
X2)
c σ11 - σ12
x   Σ x  
'
-1
f(X1, X2)
c σ11 + σ12
X1
What do you suppose happens when the covariance is
zero? Why?
What do you suppose happens when the two random variables X1 and X2 are uncorrelated (i.e., ρ12 = 0)?

f(x) = [1 / (2π |Σ|^(1/2))] e^( -(x - μ)'Σ⁻¹(x - μ)/2 )

= [1 / (2π √(σ11σ22(1 - ρ12²)))] exp{ -[1 / (2(1 - ρ12²))] ( [(x1 - μ1)/√σ11]² + [(x2 - μ2)/√σ22]² - 2ρ12 [(x1 - μ1)/√σ11][(x2 - μ2)/√σ22] ) }

= [1 / (2π √(σ11σ22))] exp{ -(1/2) ( [(x1 - μ1)/√σ11]² + [(x2 - μ2)/√σ22]² ) }

= { [1/√(2πσ11)] e^( -(x1 - μ1)²/(2σ11) ) } · { [1/√(2πσ22)] e^( -(x2 - μ2)²/(2σ22) ) }
= f(x1) · f(x2)

i.e., when ρ12 = 0 the joint density factors into the product of the two marginal (univariate normal) densities.
- for covariance 12 of zero the two eigenvalues and
eigenvectors are equal (except for signs) - one runs
along the 450 line running through the centroid  and
~
the other is perpendicular:
X2
contour for
constant
c=
c σ11 - σ12
x   Σ x  
'
-1
f(X1, X2)
c σ11 + σ12
X1
What do you suppose happens when the covariance is
zero? Why?
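A quick NumPy check (not part of the original notes) of the equal-variance eigen results above: for Σ = [[σ11, σ12], [σ12, σ11]] the eigenvalues are σ11 ± σ12 and the eigenvectors point along the ±45° directions; the numbers used are illustrative.

import numpy as np

s11, s12 = 4.0, 1.5
Sigma = np.array([[s11, s12], [s12, s11]])
vals, vecs = np.linalg.eigh(Sigma)       # eigh returns eigenvalues in ascending order
print(vals)                              # [2.5, 5.5] = [s11 - s12, s11 + s12]
print(vecs)                              # columns proportional to (1, -1)/sqrt(2) and (1, 1)/sqrt(2)
# with s12 < 0 the larger eigenvalue becomes s11 - s12, so the long axis of the
# contour ellipse flips to the direction perpendicular to the 45-degree line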
Contours also have an important probability interpretation – the solid ellipsoid of x values satisfying

(x - μ)'Σ⁻¹(x - μ) ≤ χ²p(α)   (also written χ²p,α)

has probability 1 - α, i.e.,

Pr[ (x - μ)'Σ⁻¹(x - μ) ≤ χ²p(α) ] = 1 - α
C. Properties of the Multivariate Normal Distribution
For any multivariate normal random vector X:
1. The density

f(x) = [1 / ((2π)^(p/2) |Σ|^(1/2))] e^( -(x - μ)'Σ⁻¹(x - μ)/2 )

has its maximum value at

μ = [μ1, μ2, …, μp]'

i.e., the mean is equal to the mode!
2. The density is symmetric along its constant density contours and is centered at μ, i.e., the mean is equal to the median!
3. Linear combinations of the components of X are normally distributed.
4. All subsets of the components of X have a (multivariate) normal distribution.
5. Zero covariance implies that the corresponding components of X are independently distributed.
6. Conditional distributions of the components of X are (multivariate) normal.
D. Some Important Results Regarding the Multivariate Normal Distribution
1. If X ~ Np(μ, Σ), then any linear combination

a'X = Σ(i=1 to p) ai Xi ~ N(a'μ, a'Σa)

Furthermore, if a'X ~ N(a'μ, a'Σa) for every a, then X ~ Np(μ, Σ).
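A small simulation sketch (not part of the original notes) of Result 1: for X ~ Np(μ, Σ), the combination a'X is univariate normal with mean a'μ and variance a'Σa. The vectors and matrix below are illustrative.

import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])
a = np.array([2.0, -1.0, 0.5])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
y = X @ a                                      # the linear combination a'X for each draw
print(y.mean(), a @ mu)                        # sample mean vs a'mu
print(y.var(ddof=1), a @ Sigma @ a)            # sample variance vs a'Sigma a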
2. If X ~ Np(,), then any set of q linear combinations
~
~ ~
p

  a1i X i 
 i=1

p

  a 2i X i 
'
'
'
A X =  i=1
~
N
A
μ,A
ΣA
q



p

 a X 
qi i 

 i=1



Furthermore, if d is a conformable vector of constants,
then X
d ~ Np(
~ +~
~ + d,)
~~
3. If X ~ Np(,), then all subsets of X are (multivariate)
~
~ ~
~
normally distributed, i.e., for any partition
 X1 
 1 
  qx1 
 qx1 
X =  ___  ,  =  ___  ,
 px1
 px1
 X2 
 2 
  p-q  x1 
  p-q  x1 




 Σ11
  qxq 
Σ = 

 pxp 
 Σ21
 p-q  xq 
then X1 ~ Nq(1, 11), X2 ~ Np-q(2, 22)
~
~
~
~
~
~

 qx  p-q  


Σ22 
 p-q  x  p-q  
Σ12
4. If X1 ~ Nq1(1,11) and X2 ~ Nq2(2,22) are independent,
~
~ ~
~
~ ~
then Cov(X1, X2) = 12 = 0
~
~
~
~
and if
 X1 
 X  ~ N q1 + q 2
 2
 1  Σ11 Σ12 
  ,
  2  Σ21 Σ22 
 
then X1 and X2 are independent iff 12 = 0
~
~
~
~
and if X1 ~ Nq1(1,11) and X2 ~ Nq2(2,22) and are
~
~ ~
~
~ ~
independent, then
 X1 
 X  ~ N q1 + q 2
 2
 1  Σ11 0 
  ,
  2   0 Σ22 
 
5. If X ~ Np(,) and || > 0, then
~

x  
~ ~

'
~


Σ-1 x   ~ χ p
and
the Np(,) distribution assigns probability 1 – a to the
~ ~
solid ellipsoid


 
'


x : x   Σ-1 x    χ 2p  α 
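A Monte Carlo sketch (not part of the original notes) of Result 5 and of the contour probability statement above: the ellipsoid (x - μ)'Σ⁻¹(x - μ) ≤ χ²p(α) should capture about 100(1 - α)% of draws from Np(μ, Σ). The mean, covariance, and α below are arbitrary illustrative values.

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.4],
                  [0.0, 0.4, 1.5]])
p, alpha, n = 3, 0.10, 100_000

X = rng.multivariate_normal(mu, Sigma, size=n)
Sinv = np.linalg.inv(Sigma)
d2 = np.einsum('ij,jk,ik->i', X - mu, Sinv, X - mu)   # squared generalized distances
cutoff = chi2.ppf(1 - alpha, df=p)                    # chi2_p(alpha): the upper-alpha critical value
print((d2 <= cutoff).mean())                          # close to 1 - alpha = 0.90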
6. Let Xj ~ Np(μj, Σ), j = 1,…,n be mutually independent (note the common covariance matrix Σ). Then

V1 = Σ(j=1 to n) cj Xj ~ Np( Σ(j=1 to n) cj μj,  [Σ(j=1 to n) cj²] Σ )

Furthermore, V1 and

V2 = Σ(j=1 to n) bj Xj ~ Np( Σ(j=1 to n) bj μj,  [Σ(j=1 to n) bj²] Σ )

are jointly multivariate normal with covariance matrix

[ (Σ cj²) Σ    (b'c) Σ   ]
[ (b'c) Σ      (Σ bj²) Σ ]

so V1 and V2 are independent if b'c = 0!
E. Sampling From a Multivariate Normal Distribution and Maximum Likelihood Estimation
Let Xj ~ Np(μ, Σ), j = 1,…,n represent a random sample. Since the Xj's are mutually independent and each has distribution Np(μ, Σ), their joint density is the product of their marginal densities, i.e., the joint density of X1, X2, …, Xn is

Π(j=1 to n) [1 / ((2π)^(p/2) |Σ|^(1/2))] e^( -(xj - μ)'Σ⁻¹(xj - μ)/2 )
= [1 / ((2π)^(np/2) |Σ|^(n/2))] e^( -Σ(j=1 to n) (xj - μ)'Σ⁻¹(xj - μ)/2 )

As a function of μ and Σ, this is the likelihood for fixed observations xj, j = 1,…,n.
Maximum Likelihood Estimation – estimation of
parameter values by finding estimates that maximize
the likelihood of the sample data on which they are
based (select estimated values for parameters that best
explain the sample data collected)
Maximum Likelihood Estimates – the estimates of
parameter values that maximize the likelihood of the
sample data on which they are based
For a multivariate normal distribution, we would like to obtain the maximum likelihood estimates of the parameters μ and Σ given the sample data X we have collected. To simplify our efforts we will need to utilize some properties of the trace to rewrite the likelihood function in another form.
For a k x k symmetric matrix A and a k x 1 vector x:
- x'Ax = tr(x'Ax) = tr(Axx')
- tr(A) = Σ(i=1 to k) λi, where the λi, i = 1,…,k are the eigenvalues of A
These two results can be used to simplify the joint density of n mutually independent random observations Xj, each having distribution Np(μ, Σ) – we first rewrite

(xj - μ)'Σ⁻¹(xj - μ) = tr[ (xj - μ)'Σ⁻¹(xj - μ) ] = tr[ Σ⁻¹(xj - μ)(xj - μ)' ]

Then we rewrite

Σ(j=1 to n) (xj - μ)'Σ⁻¹(xj - μ) = Σ(j=1 to n) tr[ (xj - μ)'Σ⁻¹(xj - μ) ]
= Σ(j=1 to n) tr[ Σ⁻¹(xj - μ)(xj - μ)' ]   (since the trace of the sum of matrices is equal to the sum of their individual traces)
= tr[ Σ⁻¹ Σ(j=1 to n) (xj - μ)(xj - μ)' ]
We can further state that

Σ(j=1 to n) (xj - μ)(xj - μ)' = Σ(j=1 to n) (xj - x̄ + x̄ - μ)(xj - x̄ + x̄ - μ)'
= Σ(j=1 to n) (xj - x̄)(xj - x̄)' + n(x̄ - μ)(x̄ - μ)'

because the cross-product terms Σ(j=1 to n) (xj - x̄)(x̄ - μ)' and Σ(j=1 to n) (x̄ - μ)(xj - x̄)' are both matrices of zeros.
Substitution of these two results yields an alternative expression for the joint density of a random sample from a p-dimensional normal population:

f(x1,…,xn) = [1 / ((2π)^(np/2) |Σ|^(n/2))] exp{ -(1/2) tr[ Σ⁻¹ ( Σ(j=1 to n) (xj - x̄)(xj - x̄)' + n(x̄ - μ)(x̄ - μ)' ) ] }
Substitution of the observed values x1,…,xn into the joint density yields the likelihood function for the corresponding sample X, which is often denoted as L(μ, Σ).
So for observed values x1,…,xn that comprise a random sample X drawn from a p-dimensional normally distributed population, the likelihood function is

L(μ, Σ) = [1 / ((2π)^(np/2) |Σ|^(n/2))] exp{ -(1/2) tr[ Σ⁻¹ ( Σ(j=1 to n) (xj - x̄)(xj - x̄)' + n(x̄ - μ)(x̄ - μ)' ) ] }
Finally, note that we can express the exponent of the likelihood function in many ways – one particular alternate expression will be particularly convenient:

tr[ Σ⁻¹ ( Σ(j=1 to n) (xj - x̄)(xj - x̄)' + n(x̄ - μ)(x̄ - μ)' ) ]
= tr[ Σ⁻¹ Σ(j=1 to n) (xj - x̄)(xj - x̄)' ] + n tr[ Σ⁻¹(x̄ - μ)(x̄ - μ)' ]
= tr[ Σ⁻¹ Σ(j=1 to n) (xj - x̄)(xj - x̄)' ] + n (x̄ - μ)'Σ⁻¹(x̄ - μ)

which, by another substitution, yields the likelihood function

L(μ, Σ) = [1 / ((2π)^(np/2) |Σ|^(n/2))] exp{ -(1/2) [ tr( Σ⁻¹ Σ(j=1 to n) (xj - x̄)(xj - x̄)' ) + n(x̄ - μ)'Σ⁻¹(x̄ - μ) ] }
Again, keep in mind that we are pursuing estimates of μ and Σ that maximize the likelihood function L(μ, Σ) for a given random sample X.
The following result will be helpful in deriving the maximum likelihood estimates of μ and Σ.
For a p x p symmetric positive definite matrix B and scalar b > 0, it follows that

(1/|Σ|^b) e^( -tr(Σ⁻¹B)/2 ) ≤ (1/|B|^b) (2b)^(pb) e^(-pb)

for all positive definite Σ of dimension p x p, with equality holding only for

Σ = [1/(2b)] B
Now we are ready for maximum likelihood estimation of μ and Σ.
For a random sample X1,…,Xn from a normal population with mean μ and covariance Σ, the maximum likelihood estimators μ̂ and Σ̂ of μ and Σ are

μ̂ = X̄,   Σ̂ = (1/n) Σ(j=1 to n) (Xj - X̄)(Xj - X̄)' = [(n - 1)/n] S
Their observed values for observed data x1,…,xn,

x̄   and   (1/n) Σ(j=1 to n) (xj - x̄)(xj - x̄)'

are the maximum likelihood estimates of μ and Σ.
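A brief NumPy sketch (not part of the original notes) of these estimates: μ̂ is the sample mean vector and Σ̂ rescales the unbiased sample covariance S by (n - 1)/n. The simulated data are purely illustrative.

import numpy as np

rng = np.random.default_rng(2)
mu = np.array([0.0, 3.0])
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=50)     # n = 50 observations, p = 2
n = X.shape[0]

mu_hat = X.mean(axis=0)                              # x_bar
S = np.cov(X, rowvar=False)                          # unbiased S (divisor n - 1)
Sigma_hat = (n - 1) / n * S                          # the MLE divides by n instead
print(mu_hat)
print(Sigma_hat)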
Note that the maximum of the likelihood is achieved at

L(μ̂, Σ̂) = [1 / (2π)^(np/2)] (1/|Σ̂|^(n/2)) e^(-np/2)

and since

|Σ̂| = [(n - 1)/n]^p |S|

we have that

L(μ̂, Σ̂) = [1 / (2πe)^(np/2)] · 1 / ( [(n - 1)/n]^p |S| )^(n/2)

where the first factor is a constant and |S| is the generalized variance.
It can be shown that maximum likelihood estimators (or MLEs) possess an invariance property – if θ̂ is the MLE of θ, then the MLE of f(θ) is f(θ̂). Thus we can say

- the MLE of μ'Σ⁻¹μ is μ̂'Σ̂⁻¹μ̂ = X̄' [ ((n - 1)/n) S ]⁻¹ X̄

- the MLE of √σii is √σ̂ii, where

σ̂ii = (1/n) Σ(j=1 to n) (Xij - X̄i)²

is the MLE of σii = Var(Xi).
It can also be shown that x̄ and (n - 1)S (or S) are sufficient for the multivariate normal joint density

f(x1,…,xn) = [1 / ((2π)^(np/2) |Σ|^(n/2))] exp{ -(1/2) tr[ Σ⁻¹ ( Σ(j=1 to n) (xj - x̄)(xj - x̄)' + n(x̄ - μ)(x̄ - μ)' ) ] }

i.e., the density depends on the entire set of observations x1,…,xn only through x̄ and (n - 1)S (or S). Thus we refer to X̄ and S as the sufficient statistics for the multivariate normal distribution.
Sufficient statistics contain all information necessary to evaluate a particular density for a given sample.
F. The Sampling Distributions of X̄ and S
The assumption that X1,…,Xn constitute a random sample with mean μ and covariance Σ completely determines the sampling distributions of X̄ and S.
For a univariate normal distribution, X̄ is normal with mean μ and variance σ²/n (the population variance divided by the sample size).
Analogously, for the multivariate (p ≥ 2) case (i.e., X is normal with mean μ and covariance Σ), X̄ is normal with mean μ and covariance matrix (1/n)Σ.
Similarly, for a random sample X1,…,Xn from a univariate normal distribution with mean μ and variance σ²,

(n - 1)s² = Σ(j=1 to n) (Xj - X̄)²

is distributed as σ²·χ²(n-1), i.e., as Σ(j=1 to n-1) Zj², where the Zj ~ N(0, σ²), j = 1,…,n - 1, are independent.
Analogously, for the multivariate (p ≥ 2) case (i.e., X is normal with mean μ and covariance Σ), (n - 1)S has a Wishart distribution with n - 1 degrees of freedom, denoted Wn-1(·|Σ), where

Wm(·|Σ) = Wishart distribution with m degrees of freedom
        = the distribution of Σ(j=1 to m) Zj Zj'  (with the Zj independent Np(0, Σ) vectors)
Some important properties of the Wishart distribution:
- The Wishart distribution exists only if n > p
- If A1 ~ Wm1(A1|Σ) independently of A2 ~ Wm2(A2|Σ) (note the common covariance matrix Σ), then

A1 + A2 ~ Wm1+m2(A1 + A2|Σ)

- and CA1C' ~ Wm1(CA1C'|CΣC')
- When it exists, the Wishart distribution has the density

Wn-1(A|Σ) = |A|^((n-p-2)/2) e^( -tr(AΣ⁻¹)/2 ) / [ 2^(p(n-1)/2) π^(p(p-1)/4) |Σ|^((n-1)/2) Π(i=1 to p) Γ((n - i)/2) ]

for a symmetric positive definite matrix A.
G. Large Sample Behavior of X̄ and S
- The (Univariate) Central Limit Theorem – suppose that X = Σ(i=1 to n) Vi, where the Vi have approximately equivalent variability. Then the distribution of X becomes relatively normal as the sample size increases, no matter what form the underlying population distribution takes.
- Convergence in Probability – a random variable X is said to converge in probability to a given constant value c if, for any prescribed accuracy ε, P[-ε < X - c < ε] approaches 1 as n → ∞
- The Law of Large Numbers – let Y1,…,Yn constitute independent observations from a population with mean E[Y] = μ. Then

Ȳ = [ Σ(i=1 to n) Yi ] / n

converges in probability to μ as n increases without bound, i.e., P[-ε < Ȳ - μ < ε] approaches 1 as n → ∞
Multivariate implications of the Law of Large Numbers include

P[-ε < X̄ - μ < ε] approaches 1 as n → ∞

and

P[-ε < S - Σ < ε] approaches 1 as n → ∞ (or similarly with Sn in place of S)

These convergences happen very quickly!
These statements are sometimes written as

P[-ε < X̄ - μ < ε] → 1 as n → ∞

and

P[-ε < S - Σ < ε] → 1 as n → ∞ (or similarly P[-ε < Sn - Σ < ε] → 1 as n → ∞)
- These results can be used to support the (Multivariate) Central Limit Theorem – let X1,…,Xn constitute independent observations from any population with mean μ and finite (nonsingular) covariance Σ. Then

X̄ ~ approximately Np(μ, (1/n)Σ)

for n large relative to p. This can be restated as

√n (X̄ - μ) ~ approximately Np(0, Σ)

again for n large relative to p.
Because the sample covariance matrix S (or Sn) converges to the population covariance matrix Σ so quickly (i.e., at relatively small values of n - p), we often substitute the sample covariance for the population covariance with little concern for the ramifications – so we have

X̄ ~ approximately Np(μ, (1/n)Sn)

for n large relative to p. This can be restated as

√n (X̄ - μ) ~ approximately Np(0, Sn)

again for n large relative to p.
One final important result due to the CLT – by substitution

n (X̄ - μ)'S⁻¹(X̄ - μ) ~ approximately χ²p

for n large relative to p.
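A Monte Carlo sketch (not part of the original notes) of this last result, using a deliberately non-normal (exponential) parent population so that only the central limit effect is at work; all sample sizes and settings below are illustrative.

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
p, n, reps = 3, 200, 5000
mu = np.ones(p)                                  # mean of Exp(1) components

stats = np.empty(reps)
for r in range(reps):
    X = rng.exponential(scale=1.0, size=(n, p))  # independent exponential columns
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    d = xbar - mu
    stats[r] = n * d @ np.linalg.inv(S) @ d

# compare the empirical 95th percentile with the chi-square critical value
print(np.quantile(stats, 0.95), chi2.ppf(0.95, df=p))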
H. Assessing the Assumption of Normality
There are two general circumstances in multivariate statistics under which the assumption of multivariate normality is crucial:
- the technique to be used relies directly on the raw observations Xj
- the technique to be used relies directly on the sample mean vector X̄ (including those which rely on distances of the form n(X̄ - μ)'S⁻¹(X̄ - μ))
In either of these situations, the quality of the inferences to be made depends on how closely the true parent population resembles the assumed multivariate normal form!
Based on the properties of the Multivariate Normal
Distribution, we know
- all linear combinations of the individual normal variables are normal
- the contours of the multivariate normal density are
concentric ellipsoids
These facts suggest investigation of the following
questions (in one or two dimensions):
- Do the marginal distributions of the elements of X appear normal? What about a few linear combinations?
- Do the bivariate scatterplots appear ellipsoidal?
- Are there any unusual looking observations (outliers)?
Tools frequently used for assessing univariate normality include
- the empirical rule:
P(μ - 1σ ≤ x ≤ μ + 1σ) ≈ 0.68
P(μ - 2σ ≤ x ≤ μ + 2σ) ≈ 0.95
P(μ - 3σ ≤ x ≤ μ + 3σ) ≈ 0.997
- dot plots (for small sample sets) and histograms or stem & leaf plots (for larger samples)
- goodness-of-fit tests such as the Chi-Square GOF Test and the Kolmogorov-Smirnov Test
- the test developed by Shapiro and Wilk [1965], called the Shapiro-Wilk test
- Q-Q plots (of the sample quantiles against the expected quantile for each observation given normality)
Example – suppose we had the following fifteen (ordered) sample observations on some random variable X:

Ordered observations x(j): 1.43, 1.62, 2.46, 2.48, 2.97, 4.03, 4.47, 5.76, 6.61, 6.68, 6.79, 7.46, 7.88, 8.92, 9.42

Do these data support the assertion that they were drawn from a normal parent population?
In order to assess normality by the empirical rule, we need to compute the generalized distance from the centroid (convert the data to standard normal values) – for our data we have x̄ = 5.26 and σ = 2.669, so the corresponding standard normal values are:

x(j)    z(j)
1.43   -1.436
1.62   -1.367
2.46   -1.050
2.48   -1.045
2.97   -0.860
4.03   -0.461
4.47   -0.299
5.76    0.185
6.61    0.504
6.68    0.530
6.79    0.570
7.46    0.822
7.88    0.981
8.92    1.371
9.42    1.556

Nine of the observations (or 60%) lie within one standard deviation of the mean, and all fifteen of the observations lie within two standard deviations of the mean – does this support the assertion that they were drawn from a normal parent population?
A simple dot plot could look like this:
[Figure: dot plot of the fifteen observations on a scale from roughly -1 to 11]
This doesn't seem to tell us much (of course, fifteen data points isn't much to go on).
How about a histogram?
[Figure: absolute frequency histogram of the data with classes 0-2, 2-4, 4-6, 6-8, and 8-10]
This doesn't seem to tell us much either!
We could use SAS to calculate the Shapiro-Wilk test
statistic and corresponding p-value:
DATA stuff;
INPUT x;
LABEL x='Observed Values of X';
CARDS;
1.43
1.62
2.46
2.48
2.97
4.03
4.47
5.76
6.61
6.68
6.79
7.46
7.88
8.92
9.42
;
PROC UNIVARIATE DATA=stuff NORMAL;
TITLE4 'Using PROC UNIVARIATE for tests of univariate normality';
VAR x;
RUN;
Tests for Normality

Test                   Statistic              p Value
Shapiro-Wilk           W      0.935851        Pr < W      0.3331
Kolmogorov-Smirnov     D      0.159493        Pr > D     >0.1500
Cramer-von Mises       W-Sq   0.058767        Pr > W-Sq  >0.2500
Anderson-Darling       A-Sq   0.362615        Pr > A-Sq  >0.2500

[The PROC UNIVARIATE output also includes a stem-and-leaf display, a boxplot, and a normal probability plot for x]
Or a Q-Q plot:
- put the observed values in ascending order - call these the x(j)
- calculate the continuity-corrected cumulative probability level (j - 0.5)/n for the sample data
- find the standard normal quantiles (values of the N(0,1) distribution) that have a cumulative probability of level (j - 0.5)/n - call these the q(j), i.e., find q(j) such that

P[z ≤ q(j)] = ∫[-∞, q(j)] [1/√(2π)] e^(-z²/2) dz = pj = (j - 0.5)/n

- plot the pairs (q(j), x(j)). If the points lie on/near a straight line, the observations support the contention that they could have been drawn from a normal parent population.
The results of the calculations for the Q-Q plot look like this:

Ordered            Adjusted              Standard normal
observations x(j)  probability (j-0.5)/n quantiles q(j)
1.43               0.033                 -1.834
1.62               0.100                 -1.282
2.46               0.167                 -0.967
2.48               0.233                 -0.728
2.97               0.300                 -0.524
4.03               0.367                 -0.341
4.47               0.433                 -0.168
5.76               0.500                  0.000
6.61               0.567                  0.168
6.68               0.633                  0.341
6.79               0.700                  0.524
7.46               0.767                  0.728
7.88               0.833                  0.967
8.92               0.900                  1.282
9.42               0.967                  1.834
…and the resulting Q-Q plot looks like this:
[Figure: Q-Q plot of the observed values x(j) against the standard normal quantiles q(j), with a straight reference line]
There don't appear to be great departures from the straight line drawn through the points, but it doesn't fit terribly well, either…
Looney & Gulledge [1985] suggest calculating Pearson's correlation coefficient between the q(j) and the x(j) (a test has even been developed) – the formula for the correlation coefficient is

rQ = Σ(j=1 to n) (x(j) - x̄)(q(j) - q̄) / √[ Σ(j=1 to n) (x(j) - x̄)² · Σ(j=1 to n) (q(j) - q̄)² ]

Critical points for the test of normality are given in Table 4.2 (page 182) of J&W (note we reject the hypothesis of normality if rQ is less than the critical value).
For our previous example, the intermediate calculations are given in the table below:

x(j) - x̄   (x(j) - x̄)²   q(j) - q̄   (q(j) - q̄)²   (x(j) - x̄)(q(j) - q̄)
-3.83        14.697        -1.834       3.363           7.031
-3.65        13.314        -1.282       1.642           4.676
-2.80         7.855        -0.967       0.936           2.711
-2.79         7.777        -0.728       0.530           2.030
-2.30         5.270        -0.524       0.275           1.204
-1.23         1.514        -0.341       0.116           0.419
-0.80         0.637        -0.168       0.028           0.134
 0.49         0.244         0.000       0.000           0.000
 1.35         1.810         0.168       0.028           0.226
 1.41         2.001         0.341       0.116           0.482
 1.52         2.315         0.524       0.275           0.798
 2.19         4.808         0.728       0.530           1.596
 2.62         6.860         0.967       0.936           2.534
 3.66        13.387         1.282       1.642           4.689
 4.15        17.235         1.834       3.363           7.614
-----        ------        ------      ------          ------
 0.00        99.724         0.000      13.781          36.143
Evaluation of Pearson's correlation coefficient between q(j) and x(j) yields

rQ = 36.143 / √(99.724 × 13.781) = 0.9749513

The sample size is n = 15, so critical points for the test of normality are 0.9503 at α = 0.10, 0.9389 at α = 0.05, and 0.9216 at α = 0.01. Since rQ exceeds even the α = 0.10 critical point, we do not reject the hypothesis of normality at any of these levels.
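The quantiles q(j) and the correlation rQ are easy to reproduce; the following Python sketch (not part of the original notes) does so for the fifteen observations above.

import numpy as np
from scipy.stats import norm

x = np.array([1.43, 1.62, 2.46, 2.48, 2.97, 4.03, 4.47, 5.76,
              6.61, 6.68, 6.79, 7.46, 7.88, 8.92, 9.42])     # already ordered
n = x.size
j = np.arange(1, n + 1)
q = norm.ppf((j - 0.5) / n)                  # standard normal quantiles q(j)
r_Q = np.corrcoef(x, q)[0, 1]                # Pearson correlation of (q(j), x(j))
print(np.round(q, 3))
print(round(r_Q, 4))                         # about 0.975, above the 0.9503 critical point at alpha = 0.10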
When addressing the issue of multivariate normality,
these tools aid in assessment of normality for the
univariate marginal distributions. However, we should
also consider bivariate marginal distributions (each of
which should be normal if the overall joint distribution
is multivariate normal).
The methods most commonly used for assessing
bivariate normality are
- scatter plots
- Chi-Square Plots
Example – suppose we had the following fifteen (ordered) sample observations on some random variables X1 and X2:

xj1     xj2
1.43   -0.69
1.62   -5.00
2.46   -1.13
2.48   -5.20
2.97   -6.39
4.03    2.87
4.47   -7.88
5.76   -3.97
6.61    2.32
6.68   -3.24
6.79   -3.56
7.46    1.61
7.88   -1.87
8.92   -6.60
9.42   -7.64

Do these data support the assertion that they were drawn from a bivariate normal parent population?
The scatter plot of the pairs (x1, x2) supports the assertion that these data were drawn from a bivariate normal distribution (and that they have little or no correlation).
[Figure: scatter plot of X2 against X1]
To create a Chi-Square plot, we will need to calculate the squared generalized distance from the centroid for each observation xj:

dj² = (xj - x̄)'S⁻¹(xj - x̄),  j = 1,…,n

For our bivariate data we have

x̄ = [ 5.26, -3.09 ]'

S = [  7.12  -0.67 ]       S⁻¹ = [ 0.1411  0.0076 ]
    [ -0.67  12.43 ]             [ 0.0076  0.0808 ]
…so the squared generalized distances from the centroid are

xj1     xj2     dj²
1.43   -0.69   2.400
1.62   -5.00   2.279
2.46   -1.13   1.336
2.48   -5.20   1.548
2.97   -6.39   1.739
4.03    2.87   2.976
4.47   -7.88   2.005
5.76   -3.97   0.090
6.61    2.32   2.737
6.68   -3.24   0.281
6.79   -3.56   0.333
7.46    1.61   2.622
7.88   -1.87   1.138
8.92   -6.60   2.686
9.42   -7.64   3.819

If we order the observations relative to their squared generalized distances, we get:
xj1     xj2     dj²
5.76   -3.97   0.090
6.68   -3.24   0.281
6.79   -3.56   0.333
7.88   -1.87   1.138
2.46   -1.13   1.336
2.48   -5.20   1.548
2.97   -6.39   1.739
4.47   -7.88   2.005
1.62   -5.00   2.279
1.43   -0.69   2.400
7.46    1.61   2.622
8.92   -6.60   2.686
6.61    2.32   2.737
4.03    2.87   2.976
9.42   -7.64   3.819
We then find the corresponding 100[(j - 0.5)/n]th percentile of the Chi-Square distribution with p degrees of freedom.
xj1     xj2     dj²     (j-0.5)/n   qc,2[(j-0.5)/n]
5.76   -3.97   0.090    0.033       0.068
6.68   -3.24   0.281    0.100       0.211
6.79   -3.56   0.333    0.167       0.365
7.88   -1.87   1.138    0.233       0.531
2.46   -1.13   1.336    0.300       0.713
2.48   -5.20   1.548    0.367       0.914
2.97   -6.39   1.739    0.433       1.136
4.47   -7.88   2.005    0.500       1.386
1.62   -5.00   2.279    0.567       1.672
1.43   -0.69   2.400    0.633       2.007
7.46    1.61   2.622    0.700       2.408
8.92   -6.60   2.686    0.767       2.911
6.61    2.32   2.737    0.833       3.584
4.03    2.87   2.976    0.900       4.605
9.42   -7.64   3.819    0.967       6.802
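A NumPy sketch (not part of the original notes) that reproduces the squared generalized distances dj² and the chi-square quantiles in the table above.

import numpy as np
from scipy.stats import chi2

x1 = np.array([1.43, 1.62, 2.46, 2.48, 2.97, 4.03, 4.47, 5.76,
               6.61, 6.68, 6.79, 7.46, 7.88, 8.92, 9.42])
x2 = np.array([-0.69, -5.00, -1.13, -5.20, -6.39, 2.87, -7.88, -3.97,
               2.32, -3.24, -3.56, 1.61, -1.87, -6.60, -7.64])
X = np.column_stack([x1, x2])
n, p = X.shape

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
Sinv = np.linalg.inv(S)
D = X - xbar
d2 = np.einsum('ij,jk,ik->i', D, Sinv, D)        # one squared distance per observation
quantiles = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
print(np.round(np.sort(d2), 3))                  # 0.090, 0.281, ..., 3.819
print(np.round(quantiles, 3))                    # 0.068, 0.211, ..., 6.802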
Now we create a scatter plot of the pairs (d²(j), qc,2[(j-0.5)/n]). If these points lie on a straight line, the data support the assertion that they were drawn from a bivariate normal parent population.
These data don't seem to support the assertion that they were drawn from a bivariate normal parent population…
[Figure: Chi-Square plot of the ordered d²(j) against the quantiles qc,2[(j-0.5)/n]; the points bend away from a straight line, and the largest distances are flagged as possible outliers!]
Some suggest also looking to see if roughly half the squared distances dj² are less than or equal to qc,p(0.50) (i.e., lie within the ellipsoid containing 50% of all potential p-dimensional observations).
For our example, 7 of the fifteen observations (about 46.67%) are less than qc,p(0.50) = 1.386 standardized units from the centroid (i.e., lie within the ellipsoid containing 50% of all potential p-dimensional observations).
Note that the Chi-Square plot can easily be extended to p > 2 dimensions.
Note also that some researchers calculate the correlation between the d²(j) and the qc,p[(j-0.5)/n]. For our example this is 0.8952.
I. Outlier Detection
Detecting outliers (extreme or unusual observations) in p ≥ 2 dimensions is very tricky. Consider the following situation:
[Figure: a 90% confidence ellipsoid for (X1, X2) shown together with the marginal 90% confidence intervals for X1 and X2]
A strategy for multivariate outlier detection:
- Look for univariate outliers
• standardized values
• dot plots, histograms, stem & leaf plots
• Shapiro-Wilk test, GOF Tests
• Q-Q Plots and correlation
- Look for bivariate outliers
• generalized square distances
• scatter plots (perhaps a scatter plot matrix)
• Chi-Square plots and correlation
- Look for p-dimensional outliers
• generalized square distances
• Chi-Square plots and correlation
Note that NO STRATEGY guarantees detection of
outliers!
Here are the calculated standardized values (zji's) and squared generalized distances (dj²'s) for our previous data:

xj1     zj1      xj2     zj2      dj²
5.76    0.185   -3.97   -0.250   0.090
6.68    0.530   -3.24   -0.043   0.281
6.79    0.570   -3.56   -0.131   0.333
7.88    0.981   -1.87    0.347   1.138
2.46   -1.050   -1.13    0.556   1.336
2.48   -1.045   -5.20   -0.599   1.548
2.97   -0.860   -6.39   -0.936   1.739
4.47   -0.299   -7.88   -1.359   2.005
1.62   -1.367   -5.00   -0.541   2.279
1.43   -1.436   -0.69    0.681   2.400
7.46    0.822    1.61    1.333   2.622
8.92    1.371   -6.60   -0.994   2.686
6.61    0.504    2.32    1.536   2.737
4.03   -0.461    2.87    1.691   2.976
9.42    1.556   -7.64   -1.291   3.819

(The last observation, with dj² = 3.819, looks a little unusual in p = 2 space.)
J. Transformations to Near Normality
Transformations to make nonnormal data approximately normal are usually suggested by
- theory
- the raw data
Some common transformations include

Original Scale     Transformed Scale
Counts y           √y
Proportions p̂     logit(p̂) = (1/2) log[ p̂ / (1 - p̂) ]
Correlations r     Fisher's z(r) = (1/2) log[ (1 + r) / (1 - r) ]
For continuous random variables, an appropriate transformation can usually be found among the family of power transformations – Box and Cox [1964] suggest an approach to finding an appropriate transformation from this family.
Box and Cox consider the slightly modified family of power transformations

x^(λ) = (x^λ - 1)/λ   for λ ≠ 0
x^(λ) = ln(x)         for λ = 0
For observations x1,…,xn, the Box-Cox choice of the appropriate power λ for the normalizing transformation is the value that maximizes

l(λ) = -(n/2) ln[ (1/n) Σ(j=1 to n) ( xj^(λ) - x̄^(λ) )² ] + (λ - 1) Σ(j=1 to n) ln(xj)

where

xj^(λ) = (xj^λ - 1)/λ  for λ ≠ 0   (and ln(xj) for λ = 0)

and

x̄^(λ) = (1/n) Σ(j=1 to n) xj^(λ) = (1/n) Σ(j=1 to n) (xj^λ - 1)/λ

We then evaluate l(λ) at many points on a short interval (say [-1, 1] or [-2, 2]), plot the pairs (λ, l(λ)), and look for a maximum point.
[Figure: plot of l(λ) against λ, with the maximum l(λ*) marked at the maximizing value λ*]
Often a logical value of λ near λ* is chosen.
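A Python sketch (not part of the original notes) that carries out this grid search for the fifteen positive observations used earlier; SciPy's boxcox_llf computes the same criterion, so it is printed alongside the hand-coded l(λ) as a check.

import numpy as np
from scipy.stats import boxcox_llf

x = np.array([1.43, 1.62, 2.46, 2.48, 2.97, 4.03, 4.47, 5.76,
              6.61, 6.68, 6.79, 7.46, 7.88, 8.92, 9.42])

def box_cox(x, lam):
    # the modified power transformation x^(lambda) from the notes
    return np.log(x) if abs(lam) < 1e-12 else (x**lam - 1) / lam

def ell(lam):
    # the criterion l(lambda) from the notes
    xl = box_cox(x, lam)
    return -0.5 * len(x) * np.log(np.mean((xl - xl.mean())**2)) + (lam - 1) * np.log(x).sum()

grid = np.linspace(-2, 2, 401)
vals = np.array([ell(l) for l in grid])
lam_star = grid[vals.argmax()]
print(round(lam_star, 2))                                            # maximizing lambda on the grid
print(round(ell(lam_star), 3), round(boxcox_llf(lam_star, x), 3))    # same criterion value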
Unfortunately, x^(λ) is very volatile as λ changes (which creates some other analytic problems to overcome). Thus we consider another transformation to avoid this additional problem:

yj(λ) = (xj^λ - 1) / ( λ [ Π(i=1 to n) xi ]^((λ-1)/n) ) = (xj^λ - 1) / ( λ ẋ^(λ-1) )   for λ ≠ 0
yj(λ) = ẋ ln(xj)   for λ = 0

where

ẋ = [ Π(i=1 to n) xi ]^(1/n)

is the geometric mean of the responses and is frequently calculated as the antilog of

ln(ẋ) = (1/n) Σ(i=1 to n) ln(xi)

and ẋ^(λ-1) is the nth root of the appropriate Jacobian of the transformation (which converts the responses xi into the yj(λ)'s).
From this point forward, proceed by substituting the yj(λ)'s for the xj^(λ)'s in the previous analysis.
The λ that results in the minimum variance of this transformed variable also maximizes our previous criterion

l(λ) = -(n/2) ln[ (1/n) Σ(j=1 to n) ( xj^(λ) - x̄^(λ) )² ] + (λ - 1) Σ(j=1 to n) ln(xj)
Note that:
- the value of λ generated by the Box-Cox transformation is only optimal in a mathematical sense – use something close that has some meaning
- an approximate confidence interval for λ can be found
- other means for estimating λ exist
- if we are dealing with a response variable, transformations are often used to 'stabilize' the variance
- for a p-dimensional sample, transformations are considered independently for each of the p variables
- while the Box-Cox methodology may help convert each marginal distribution to near normality, it does not guarantee the resulting transformed set of p variables will have a multivariate normal distribution.