Lecture 2
Statistical Principles
Probability
• A random variable X is a variable whose numerical value is
determined by chance, the outcome of a random phenomenon
– A discrete random variable has a countable number of possible
values, such as 0, 1, and 2
– A continuous random variable, such as time and distance, can
take on any value in an interval
• A probability distribution P[Xi] for a discrete random variable X
assigns probabilities to the possible values X1, X2, and so on
• For example, when a fair six-sided die is rolled, there are six
equally likely outcomes, each with a 1/6 probability of occurring.
Mean, Variance, and
Standard Deviation
• The expected value (or mean) of a discrete random variable X is a
weighted average of all possible values of X, using the probability of
each X value as weights:
\mu_X = E[X] = \sum_{i=1}^{N} X_i P[X_i]
• When all weights are equal (as in our example of throwing a die), this simplifies to:
\mu_X = E[X] = \frac{1}{N} \sum_{i=1}^{N} X_i
Mean, Variance, and
Standard Deviation
• The variance of a discrete random variable X is a weighted average, for all
possible values of X, of the squared difference between X and its expected
value, using the probability of each X value as weights:
\sigma_X^2 = E[(X - \mu_X)^2] = \sum_{i=1}^{N} (X_i - \mu_X)^2 P[X_i]
• When all weights are equal (as in our example of throwing a die), this simplifies to:
\sigma_X^2 = E[(X - \mu_X)^2] = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu_X)^2
Standardized Variables
• To standardize a random variable X, we subtract its mean µ_X and
then divide by its standard deviation σ_X:
Z_i = \frac{X_i - \mu_X}{\sigma_X}
• No matter what the initial units of X, the standardized random
variable Z has a mean of 0 and a standard deviation of 1.
• The standardized variable Z measures how many standard
deviations X is above or below its mean:
– If X is equal to its mean, Z is equal to 0
– If X is one standard deviation above its mean, Z is equal to 1
– If X is two standard deviations below its mean, Z is equal to –2
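A minimal sketch (illustrative only, not from the slides) that standardizes the six faces of a fair die and confirms that the resulting Z values have mean 0 and standard deviation 1:

# Standardize the faces of a fair die: Z_i = (X_i - mu_X) / sigma_X.
values = [1, 2, 3, 4, 5, 6]
n = len(values)
mu = sum(values) / n
sigma = (sum((x - mu) ** 2 for x in values) / n) ** 0.5

z = [(x - mu) / sigma for x in values]
print([round(v, 2) for v in z])  # [-1.46, -0.88, -0.29, 0.29, 0.88, 1.46]

# The standardized values have mean ~0 and standard deviation ~1.
z_mean = sum(z) / n
z_sd = (sum((v - z_mean) ** 2 for v in z) / n) ** 0.5
print(round(z_mean, 10), round(z_sd, 10))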
Example: The throw of a die
Example: The throw of a die (cont.)
Probability Distribution (Density Curve) for
10 six-sided dice, using standardized Z
Now, let X be the sum of the numbers when rolling 10 six-sided dice. The next
Figure illustrates the standardized random variable Z for that case:
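As a rough stand-in for that figure, here is a small simulation sketch (the 100,000-draw sample size is an arbitrary assumption of these notes) that sums 10 dice many times and standardizes the sums:

import random

# Simulate X = sum of 10 six-sided dice, then standardize each sum.
random.seed(0)
n_draws = 100_000
sums = [sum(random.randint(1, 6) for _ in range(10)) for _ in range(n_draws)]

mu = sum(sums) / n_draws
sigma = (sum((s - mu) ** 2 for s in sums) / n_draws) ** 0.5
z = [(s - mu) / sigma for s in sums]

# Theoretical values: mean 35, standard deviation sqrt(10 * 35/12), about 5.40.
print(round(mu, 2), round(sigma, 2))
# A histogram of z already looks close to the bell-shaped normal curve.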
The Normal Distribution
• The density curve of Z for many rolls of a dice approaches the normal
distribution (graphed in the next Figure).
• The central limit theorem (CLT) states:
“If Z is a standardized sum of N independent, identically distributed random
variables with a finite, nonzero standard deviation, then the probability
distribution of Z approaches the normal distribution as N increases.”
• In other words, the CLT says that the sum (or mean) of many random
variables is approximately normally distributed (see the simulation sketch after this list), e.g.:
– The weights of humans, dogs, and tomatoes
– Scores on IQ, SAT, and GRE tests
– Many economic variables.
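To illustrate the CLT with a variable that is itself far from normal, here is a sketch (the exponential(1) distribution and the helper standardized_sums are assumptions of these notes, not from the slides) showing that standardized sums look more and more normal as N grows:

import random

def standardized_sums(n, draws=20_000):
    """Standardized sums of n i.i.d. exponential(1) random variables."""
    sums = [sum(random.expovariate(1.0) for _ in range(n)) for _ in range(draws)]
    mu = sum(sums) / draws
    sd = (sum((s - mu) ** 2 for s in sums) / draws) ** 0.5
    return [(s - mu) / sd for s in sums]

random.seed(1)
for n in (1, 5, 30):
    z = standardized_sums(n)
    share = sum(abs(v) < 1 for v in z) / len(z)
    # The share inside +/- 1 approaches the normal value of about 0.68 as n grows.
    print(n, round(share, 3))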
The Normal Distribution
The Normal Distribution (cont.)
• Special feature of the normal distribution: The probability that the
value of Z will be in a specified interval is given by the
corresponding area under the density curve
– These areas can be determined using statistical software (such as
GRETL) or a table (such as Table B-7 in Appendix B of the textbook).
– As a rule of thumb (checked numerically in the sketch after this list):
P[-1 < Z < 1] = 0.6826
P[-2 < Z < 2] = 0.9544
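A short check of these areas, assuming SciPy is available (GRETL or Table B-7 give the same numbers):

from scipy.stats import norm

# Area under the standard normal density between -k and +k.
for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)
    print(k, round(area, 4))
# Prints approximately 0.6827, 0.9545, and 0.9973,
# matching the rule-of-thumb values above up to rounding.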
Moments of a Distribution
• The Mean of a distribution is called the first (central) moment of a
distribution. The Variance is called the second moment of a
distribution. When a statistical distribution has strong central
tendency, it is useful to characterize it by its moments.
• The third and fourth moments of a distribution are skewness and
kurtosis.
• Skewness is a measure of asymmetry of a distribution:
\text{Skewness} = \frac{E[(X - \mu_X)^3]}{\sigma_X^3}
– Skewness = 0: the distribution is symmetric
– Skewness > (<) 0: the distribution has a long right (left) tail (see the sketch below)
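A small sketch (the helper skewness and the example samples are assumptions for illustration, not from the slides) that estimates skewness from simulated data:

import random

def skewness(xs):
    """Plug-in estimate of E[(X - mu_X)^3] / sigma_X^3."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
    return sum((x - mu) ** 3 for x in xs) / (n * sigma ** 3)

random.seed(2)
symmetric = [random.gauss(0, 1) for _ in range(100_000)]          # normal: skewness ~ 0
right_tailed = [random.expovariate(1.0) for _ in range(100_000)]  # exponential: skewness ~ 2

print(round(skewness(symmetric), 2), round(skewness(right_tailed), 2))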
Moments of a Distribution
• Kurtosis measures the mass in the tails of a distribution; it is a
measure of the probability of large values:
\text{Kurtosis} = \frac{E[(X - \mu_X)^4]}{\sigma_X^4}
– Kurtosis = 3: normal distribution
– Kurtosis > 3: heavy tails
• The kurtosis of a distribution is a measure of how much mass is in
the tails and, therefore, is a measure of how much of the variance of
X arises from extreme values (outliers).
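Similarly, a sketch (the helper kurtosis and the mixture example are assumptions for illustration, not from the slides) comparing the kurtosis of a normal sample with a heavy-tailed one:

import random

def kurtosis(xs):
    """Plug-in estimate of E[(X - mu_X)^4] / sigma_X^4."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
    return sum((x - mu) ** 4 for x in xs) / (n * sigma ** 4)

random.seed(3)
normal = [random.gauss(0, 1) for _ in range(100_000)]
# Mixture of a narrow and a wide normal: same mean, but heavy tails.
heavy_tailed = [random.gauss(0, random.choice([0.5, 3.0])) for _ in range(100_000)]

print(round(kurtosis(normal), 2))        # close to 3
print(round(kurtosis(heavy_tailed), 2))  # well above 3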
Moments of a Distribution