Review of Probability Theory & Distributions

Intro to Econometrics/Econ 526

Fall 2014 / Manopimoke

I.

Probability Theory & Distributions

Probability theory originated with gambling: gamblers wanted to understand how likely they were to win a game of chance.

A.

Even though chance is common in the vernacular, it’s difficult to define.

B.

Keeping the idea of gambling in mind, let's define chance and build up to a definition of probability.

1.

The chance of something happening gives the percentage of time it’s expected to happen, when the process is done over and over again, independently under the same conditions.

• The chance of an event therefore lies between 0% and 100%, or between 0 and 1.
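The "over and over again" definition can be illustrated with a small simulation. This is a minimal sketch, assuming a fair coin simulated with Python's `random` module; the function name `relative_frequency` is ours, for illustration only. The share of heads is always between 0 and 1 and settles near 0.5 as the number of tosses grows.

```python
import random

def relative_frequency(trials: int, seed: int = 0) -> float:
    """Toss a simulated fair coin `trials` times and return the share of heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```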

C.

A Random Experiment is the process of observing the outcome of a chance event.

D.

Elementary Outcomes: all the possible results of a random experiment. In the book, elementary outcomes are also referred to as events and simple events.

E.

The Sample Space: the set of all the elementary outcomes. Now an example.

1.

The Random experiment: The coin toss.

2.

The Elementary outcomes: _________________

3.

The Sample Space: ______

4.

Dice:

 elementary outcomes: _____________

 sample space:_______________




F.

Random Variable: numerical summary of a random outcome

• Discrete random variable: a random variable that takes on a discrete set of values, such as 0, 1, 2, …

• Continuous random variable: a random variable that takes on a continuum of possible values (an interval of values)

G.

Probability: a numerical weight that measures the likelihood of each elementary outcome, with 0 ≤ P(X) ≤ 1.

H.

Examples of Classical Probability:

1.

For a Coin:

2.

For a Die:

I.

General Rules about probability:



• P(X) ≥ 0: probabilities are non-negative.

• The probability of the entire sample space is 1.

J.

Events: Sets of elementary outcomes. The probability of an event is the sum of the probabilities of elementary outcomes in the set. Consider the roll of 2 dice (black and white):

Event Description   | Event's Elementary Outcomes | Probability
--------------------|-----------------------------|------------
Dice add to 3       |                             |
Dice add to 6       |                             |
White die shows 1   |                             |
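The rule "the probability of an event is the sum of the probabilities of its elementary outcomes" can be checked by enumeration. A minimal sketch, assuming two fair dice recorded as (white, black) pairs so that all 36 outcomes are equally likely; the helper `prob` is a hypothetical name used only here.

```python
from fractions import Fraction
from itertools import product

# Sample space for one roll of a white and a black die: 36 equally likely (white, black) pairs.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """Sum of the (equal) probabilities of the elementary outcomes in the event."""
    outcomes = [o for o in sample_space if event(o)]
    return Fraction(len(outcomes), len(sample_space))

events = {
    "Dice add to 3": lambda o: o[0] + o[1] == 3,
    "Dice add to 6": lambda o: o[0] + o[1] == 6,
    "White die shows 1": lambda o: o[0] == 1,
}
for name, event in events.items():
    print(name, [o for o in sample_space if event(o)], prob(event))
```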

I.



Working with Probabilities

1.

To find the chance that at least one of two things will happen, check to see if they are mutually exclusive. If they are, add the chances.



Example: A card is dealt from a shuffled deck. What is the chance that the card is either a heart or a spade?
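The addition rule for mutually exclusive events can be checked against a direct count. A minimal sketch, assuming a standard 52-card deck built as (rank, suit) pairs; the encoding is ours, for illustration.

```python
from fractions import Fraction

# A standard 52-card deck: 4 suits x 13 ranks.
suits = ["hearts", "spades", "diamonds", "clubs"]
deck = [(r, s) for s in suits for r in range(1, 14)]

p_heart = Fraction(sum(s == "hearts" for _, s in deck), len(deck))
p_spade = Fraction(sum(s == "spades" for _, s in deck), len(deck))
# Hearts and spades are mutually exclusive, so the chances add.
p_heart_or_spade = p_heart + p_spade

# Cross-check by counting the union directly.
direct = Fraction(sum(s in ("hearts", "spades") for _, s in deck), len(deck))
print(p_heart_or_spade, direct)
```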

2.

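Throw a pair of dice. What is the chance that at least one die shows 1? (The events "white die shows 1" and "black die shows 1" are not mutually exclusive, so their chances cannot simply be added.)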
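Enumerating the 36 equally likely outcomes shows why adding the chances fails here. A minimal sketch, assuming fair dice recorded as (white, black) pairs: the naive sum 1/6 + 1/6 double-counts the outcome (1, 1), and subtracting it once recovers the direct count.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (white, black), equally likely

naive_sum = Fraction(1, 6) + Fraction(1, 6)  # treats the events as mutually exclusive
at_least_one = Fraction(sum(1 in o for o in outcomes), len(outcomes))
both = Fraction(sum(o == (1, 1) for o in outcomes), len(outcomes))  # counted twice above

print(naive_sum, at_least_one)
```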

B.

Conditional Probability: The probability that one event occurs given the condition that another event has already occurred. P(A|C)

1.

A Card Game: 52 card deck (no jokers). 4 suits of 13 cards each.

Shuffle the deck of cards and the two top cards are put down on the table. You win $1 if the second card is a queen of hearts.








What is the probability of winning the dollar?

You turn over the first card and it's a seven of clubs. Now what is your chance of winning the $1?

2.

The second question asks for a conditional chance: given that the first card is not the one we want (it is the seven of clubs), what is the probability that the second card is the queen of hearts?
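Both card-game questions can be answered by counting ordered two-card deals. A minimal sketch, assuming every ordered pair of distinct cards is equally likely after shuffling; the encoding (rank 12 = queen, suit letters "HSDC") is ours, for illustration.

```python
from fractions import Fraction
from itertools import permutations

deck = [(r, s) for s in "HSDC" for r in range(1, 14)]  # hypothetical encoding: 12 = queen
QUEEN_HEARTS = (12, "H")
SEVEN_CLUBS = (7, "C")

# Every ordered (first, second) pair of distinct cards is equally likely.
pairs = list(permutations(deck, 2))

p_win = Fraction(sum(second == QUEEN_HEARTS for _, second in pairs), len(pairs))

# Condition on the first card being the seven of clubs.
given = [p for p in pairs if p[0] == SEVEN_CLUBS]
p_win_given = Fraction(sum(second == QUEEN_HEARTS for _, second in given), len(given))

print(p_win, p_win_given)
```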

3.

A new experiment: we roll the two dice, and the white die shows 1. Now what's the probability that the 2 dice sum to 3?

4.

A Formal definition of Conditional Probability:

P(E|F) = P(E and F) / P(F), provided P(F) > 0.



Asides: P(E|E)=1; P(E|F)=0 when E and F are mutually exclusive.
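The formal definition can be applied directly to the dice example and to the two asides. A minimal sketch, assuming fair dice recorded as (white, black) pairs; `P` and `P_given` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (white, black)

def P(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def P_given(e, f):
    """P(E|F) = P(E and F) / P(F); only defined when P(F) > 0."""
    pf = P(f)
    if pf == 0:
        raise ValueError("P(F) must be positive")
    return P(lambda o: e(o) and f(o)) / pf

sum_is_3 = lambda o: sum(o) == 3
sum_is_6 = lambda o: sum(o) == 6
white_is_1 = lambda o: o[0] == 1
print(P(sum_is_3), P_given(sum_is_3, white_is_1))
```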

C.

The Multiplication Rule: P(E and F) = P(E|F)*P(F)

1.

Another experiment: using the same card game above, what is the chance that the first card is the seven of clubs and the second card is the queen of hearts?

2.

Another experiment: a deck of cards is shuffled and two cards are dealt. What is the chance that they are both aces?






In both examples above, the chance of the second event was affected by the outcome of the first event. In other words, the probability that both events occur depends on the conditional probability of the second event given the first.



The chance that two things will both happen equals the chance that the first will happen, multiplied by the chance that the second will happen given that the first has happened (the conditional probability of the second multiplied by the probability of the first).
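Both multiplication-rule examples reduce to a product of a probability and a conditional probability. A minimal sketch using exact fractions, with a brute-force count over all ordered two-card deals as a cross-check for the aces; the deck encoding (rank 1 = ace) is ours, for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication rule: P(E and F) = P(F) * P(E | F)
p_seven_then_queen = Fraction(1, 52) * Fraction(1, 51)
p_two_aces = Fraction(4, 52) * Fraction(3, 51)  # 3 aces left among 51 cards

# Cross-check the aces by brute force over all ordered two-card deals (rank 1 = ace).
deck = [(rank, suit) for suit in "HSDC" for rank in range(1, 14)]
deals = list(permutations(deck, 2))
p_two_aces_direct = Fraction(sum(a[0] == 1 and b[0] == 1 for a, b in deals), len(deals))

print(p_seven_then_queen, p_two_aces, p_two_aces_direct)
```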

3.

An example of Independence: A coin is tossed twice. What is the chance of a head followed by a tail?

4.

Independence: Two events are independent if the occurrence of one has no influence on the probability of the other.





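In terms of conditional probability: when two events A and B are independent, P(A|B) = P(A), so they have a special multiplication rule: P(A and B) = P(A) × P(B).

An example: event C = white die shows 1; event D = black die shows 1.

P(C|D) = P(C and D) / P(D) = (1/36) / (1/6) = 1/6 = P(C), so C and D are independent.

But the white die showing 1 does affect the probability of the dice summing to 3: P(sum = 3) = 2/36, while P(sum = 3 | white die = 1) = 1/6. Those two events are not independent.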
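The independence check P(A and B) = P(A) × P(B) can be run on both pairs of events. A minimal sketch, assuming fair dice recorded as (white, black) pairs; `independent` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (white, black)

def P(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(a, b):
    """Events are independent when P(A and B) = P(A) * P(B)."""
    return P(lambda o: a(o) and b(o)) == P(a) * P(b)

C = lambda o: o[0] == 1     # white die shows 1
D = lambda o: o[1] == 1     # black die shows 1
S3 = lambda o: sum(o) == 3  # dice sum to 3

print(independent(C, D), independent(C, S3))
```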

5.

More intuition on independence: Say we have 3 tickets in a box: red, white and blue. 2 tickets are drawn with replacement.

Replacement means you put the first ticket drawn back into the box. What is the chance of drawing the red ticket and then the white?

6.

Suppose instead you draw 2 tickets without replacement (there’s a conditional probability here). What is the chance of drawing the red and then the white?
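The two ticket questions differ only in which ordered draws are possible. A minimal sketch that lists every equally likely ordered draw in each case and counts the (red, white) outcome.

```python
from fractions import Fraction
from itertools import permutations, product

tickets = ["red", "white", "blue"]

# With replacement: all 3 x 3 ordered draws are equally likely.
with_repl = list(product(tickets, repeat=2))
p_with = Fraction(with_repl.count(("red", "white")), len(with_repl))

# Without replacement: the second draw comes from the 2 remaining tickets (3 x 2 ordered draws).
without_repl = list(permutations(tickets, 2))
p_without = Fraction(without_repl.count(("red", "white")), len(without_repl))

print(p_with, p_without)
```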

I.

Preliminaries on Probability Distributions:

A.

Counting: suppose we can get one of two outcomes from a random experiment (a coin toss), or one of 6 outcomes if we roll a die. How many possible combinations of one coin flip and one die roll can we get?

1.

In general, let m = 2 be the number of possible outcomes for the coin toss.

2.

Let n = 6 be the number of possible outcomes for the die.

3.

Total combinations = m × n (here 2 × 6 = 12).
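The m × n rule can be confirmed by listing every (coin, die) pair. A minimal sketch using `itertools.product`.

```python
from itertools import product

coin = ["H", "T"]         # m = 2 outcomes
die = [1, 2, 3, 4, 5, 6]  # n = 6 outcomes

combos = list(product(coin, die))  # every (coin, die) pair
print(len(combos), len(coin) * len(die))
```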

B.

Factorial Rule--given n different items, they can be arranged in n! different ways. 5! = 5*4*3*2*1=120




C.

Permutations: the number of sequences of k items selected from n total without replacement, when order is important, is given by:

$$ {}_nP_k = \frac{n!}{(n-k)!} $$

D.

Combinations: the number of combinations of k items selected from n total without replacement, when order is unimportant, is given by:

$$ {}_nC_k = \binom{n}{k} = \frac{n!}{(n-k)!\,k!} $$

(Also known as the binomial coefficient)
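The factorial, permutation, and combination formulas translate directly into code. A minimal sketch: the two helper names are ours, and the comparison uses `math.perm` and `math.comb`, which exist in Python 3.8 and later.

```python
import math

def permutations_count(n: int, k: int) -> int:
    """n! / (n - k)!: ordered selections of k from n without replacement."""
    return math.factorial(n) // math.factorial(n - k)

def combinations_count(n: int, k: int) -> int:
    """n! / ((n - k)! k!): unordered selections (the binomial coefficient)."""
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

print(math.factorial(5))                          # factorial rule
print(permutations_count(5, 2), math.perm(5, 2))  # matches the standard library
print(combinations_count(5, 2), math.comb(5, 2))
```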

II.

Examples:

1.

A PIN code at your bank is made up of 4 digits. How many combinations are possible?

2.

There are 10 entries in a contest. Only three will win: 1st, 2nd, or 3rd prize. What are the possible results?

3.

A softball league has 7 teams. What are the possible ways of ranking the teams?



4.

From a group of 4 people, 3 are selected to form a committee. How many combinations are there?

5.

How many different combinations of 3 girls and 4 boys can you get?
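Examples 1 to 4 can be worked with the counting rules above. A minimal sketch, assuming that PIN digits may repeat and that each contest entry can win at most one prize; example 5 is left out because the group sizes it draws from are not given.

```python
import math

pin_codes = 10 ** 4               # 4 positions, 10 digits each, repeats allowed
prize_results = math.perm(10, 3)  # ordered 1st/2nd/3rd from 10 entries
rankings = math.factorial(7)      # order all 7 teams
committees = math.comb(4, 3)      # unordered choice of 3 from 4 people
print(pin_codes, prize_results, rankings, committees)
```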

III.

Quick Mathematics Review

A.

Some of the rules (let k be a constant, and let X and Y be variables with multiple observations indexed by i = 1, 2, …, N; $\bar X$ and $\bar Y$ denote their means):

Rule 1:

$$\sum_{i=1}^{N} kX_i = k\sum_{i=1}^{N} X_i$$

Rule 2:

$$\sum_{i=1}^{N} (X_i + Y_i) = \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} Y_i$$

Rule 3:

$$\sum_{i=1}^{N} k = kN$$

Rule 4:

$$\sum_{i=1}^{N} (X_i - \bar X) = 0$$

Rule 5:

$$\sum_{i=1}^{N} (X_i - \bar X)(Y_i - \bar Y) = \sum_{i=1}^{N} X_i Y_i - N\bar X \bar Y$$

Rule 6:

$$\sum_{i=1}^{N} (X_i - \bar X)^2 = \sum_{i=1}^{N} X_i^2 - N\bar X^2$$

Rule 7:

$$\sum_{i=1}^{N}\sum_{j=1}^{N} X_i Y_j = \left(\sum_{i=1}^{N} X_i\right)\left(\sum_{j=1}^{N} Y_j\right)$$

Rule 8:

$$\sum_{i=1}^{N}\sum_{j=1}^{N} (X_{ij} + Y_{ij}) = \sum_{i=1}^{N}\sum_{j=1}^{N} X_{ij} + \sum_{i=1}^{N}\sum_{j=1}^{N} Y_{ij}$$

These are important for doing the math involved in econometrics.
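Rules 1 through 7 can be sanity-checked numerically on random data. A minimal sketch, assuming arbitrary simulated values for X and Y (the seed and sizes are ours); each pair holds the left- and right-hand sides of one rule, compared up to floating-point rounding.

```python
import math
import random

rng = random.Random(1)
N = 50
X = [rng.gauss(0, 1) for _ in range(N)]
Y = [rng.gauss(0, 1) for _ in range(N)]
k = 3.0
xbar, ybar = sum(X) / N, sum(Y) / N

checks = {
    "rule 1": (sum(k * x for x in X), k * sum(X)),
    "rule 2": (sum(x + y for x, y in zip(X, Y)), sum(X) + sum(Y)),
    "rule 3": (sum(k for _ in range(N)), k * N),
    "rule 4": (sum(x - xbar for x in X), 0.0),
    "rule 5": (sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)),
               sum(x * y for x, y in zip(X, Y)) - N * xbar * ybar),
    "rule 6": (sum((x - xbar) ** 2 for x in X), sum(x * x for x in X) - N * xbar ** 2),
    "rule 7": (sum(x * y for x in X for y in Y), sum(X) * sum(Y)),
}
for name, (lhs, rhs) in checks.items():
    print(name, math.isclose(lhs, rhs, abs_tol=1e-9))
```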

IV.

Probability distributions

A.

Definitions

1.

Random Variable-- a variable that has a single numerical value for each outcome of an experiment



• Discrete: a countable number of possible values (finite, or countably infinite such as 0, 1, 2, …)

• Continuous: infinitely many possible values, e.g. any value in an interval

2.

Probability Distribution: the probability of each value of the random variable (similar to a relative frequency)

• $\sum_x P(x) = 1$

• $0 \le P(x) \le 1$ for every value x

3.

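Each probability distribution has a mean and a variance (the standard deviation is the square root of the variance), given by the following formulas:

$$\mu = \sum_x x\,P(x) \quad \text{(the mean)}$$

$$\sigma^2 = \sum_x (x - \mu)^2\,P(x) \quad \text{(the variance)}$$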

4.

The expected value of a random variable is, in general, the long-run average value of the random variable over many repeated trials. It is computed as a weighted average of the possible outcomes of that random variable, where the weights are the probabilities of those outcomes:

$$E(x) = \mu_x = \sum_x x\,P(x), \quad \text{where } \mu_x \text{ is the mean}$$

5.

The variance of a random variable x, denoted Var(x), is the expected value of the square of the deviation of x from its mean:

$$\mathrm{Var}(x) = E[(x - \mu_x)^2] = \sum_x (x - \mu_x)^2\,P(x)$$
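The mean and variance formulas can be evaluated exactly for a simple distribution. A minimal sketch, assuming one roll of a fair die so that P(x) = 1/6 for x = 1, …, 6.

```python
from fractions import Fraction

# Probability distribution of one fair die roll: P(x) = 1/6 for x = 1..6.
dist = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(dist.values()) == 1  # probabilities sum to 1

mu = sum(x * p for x, p in dist.items())               # E(x) = sum x P(x)
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # Var(x) = sum (x - mu)^2 P(x)
sd = float(var) ** 0.5
print(mu, var, sd)
```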

I.

Probability distributions may be summarized by measures of central tendency and spread.

A.

Mean/Median:

B.

Variance:

C.

Algebra of Expectations:

1.

Let x and y be random variables.

2.

Let a and b be constants.

3.

The following are properties of the expectation operator:

$$E(a) = a$$

$$E(ax) = aE(x)$$

$$E(ax + b) = aE(x) + b$$

$$E(x + y) = E(x) + E(y)$$
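The expectation properties can be checked exactly on a small joint distribution. A minimal sketch, assuming x and y are two fair dice (36 equally likely pairs) and the constants a = 2, b = 3 are arbitrary choices of ours.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two fair dice x (white) and y (black): 36 equally likely pairs.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def E(g):
    """Expectation of g(x, y) under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

a, b = 2, 3
Ex, Ey = E(lambda x, y: x), E(lambda x, y: y)
print(E(lambda x, y: a), E(lambda x, y: a * x), E(lambda x, y: a * x + b), E(lambda x, y: x + y))
```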

D.

Expectation Operators for the Variance:

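1.

$\mathrm{Var}(x) = E[(x - E(x))^2] = \sigma_x^2$

2.

Standard deviation of x = $\sigma_x = \sqrt{\sigma_x^2}$

3.

Var(a) = 0: a constant, a, has no variance.

Proof: $\mathrm{Var}(a) = E[(a - E(a))^2] = E[(a - a)^2] = 0$.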

E.

Properties of the Variance using Expectation Operators

1.

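Var(x + a) = Var(x):

$$\mathrm{Var}(x + a) = E\big[\big((x + a) - E(x + a)\big)^2\big] = E\big[\big((x - E(x)) + (a - a)\big)^2\big] = E\big[(x - E(x))^2\big] = \mathrm{Var}(x)$$

2.

Var(ax) = a² Var(x):

$$\mathrm{Var}(ax) = E\big[(ax - E(ax))^2\big] = E\big[a^2 (x - E(x))^2\big] = a^2\,\mathrm{Var}(x)$$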
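The variance properties can be verified exactly for one fair die. A minimal sketch; the constant a = 4 is an arbitrary choice of ours, and `E`/`Var` are hypothetical helper names.

```python
from fractions import Fraction

dist = {x: Fraction(1, 6) for x in range(1, 7)}  # one fair die

def E(g):
    return sum(g(x) * p for x, p in dist.items())

def Var(g):
    """Var(g(x)) = E[(g(x) - E(g(x)))^2]."""
    m = E(g)
    return E(lambda x: (g(x) - m) ** 2)

a = 4
var_x = Var(lambda x: x)
print(var_x, Var(lambda x: x + a), Var(lambda x: a * x), Var(lambda x: a))
```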



F.

Covariance measures linear dependence between x and y, i.e. the extent to which two random variables move together.

1.

Covariance: $\sigma_{xy}$

2.

$$\mathrm{Cov}(x, y) = E[(x - E(x))(y - E(y))] \quad \text{or} \quad E(XY) - E(X)E(Y)$$

3.

The variance of the sum of two random variables x and y is the sum of their variances plus two times their covariance.

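$$\mathrm{Var}(x + y) = \sigma_x^2 + \sigma_y^2 + 2\,\mathrm{Cov}(x, y)$$

$$\begin{aligned}
\mathrm{Var}(x + y) &= E\big[\big((x + y) - E(x + y)\big)^2\big] \\
&= E\big[\big((x - E(x)) + (y - E(y))\big)^2\big] \\
&= E\big[(x - E(x))^2\big] + E\big[(y - E(y))^2\big] + 2E\big[(x - E(x))(y - E(y))\big]
\end{aligned}$$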

Note that if x and y are independent, Var(x + y) = Var(x) + Var(y).

In general, E(XY) ≠ E(X)E(Y).

Random variables are said to be independent if the value taken by one has no effect on the probability distribution of the other.

When two variables are independent:

(1) x is orthogonal to y;

(2) E(XY) = E(X)E(Y).
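The covariance definition and the variance-of-a-sum formula can be checked on two dice. A minimal sketch, assuming fair white and black dice: the independent pair has zero covariance, while the white die and the total are dependent, so the 2 Cov term is needed.

```python
from fractions import Fraction
from itertools import product

joint = {(w, b): Fraction(1, 36) for w, b in product(range(1, 7), repeat=2)}  # (white, black)

def E(g):
    return sum(g(w, b) * p for (w, b), p in joint.items())

def Var(g):
    m = E(g)
    return E(lambda w, b: (g(w, b) - m) ** 2)

def Cov(f, g):
    """Cov = E[(f - E f)(g - E g)]."""
    mf, mg = E(f), E(g)
    return E(lambda w, b: (f(w, b) - mf) * (g(w, b) - mg))

white = lambda w, b: w
black = lambda w, b: b
total = lambda w, b: w + b

# Independent dice: covariance 0, so variances add.
print(Cov(white, black), Var(total), Var(white) + Var(black))
# Dependent pair (white die and the total): the covariance term matters.
lhs = Var(lambda w, b: white(w, b) + total(w, b))
rhs = Var(white) + Var(total) + 2 * Cov(white, total)
print(lhs, rhs)
```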

4.

The correlation coefficient is a function of the covariance:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}, \qquad \sigma_x \sigma_y = \sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}$$

$-1 \le \rho \le 1$: the correlation coefficient lies between −1 and 1.

• ρ = 1: perfect positive linear association

• ρ = −1: perfect negative linear association

• ρ = 0: no linear association

5.

The Conditional Expectation of Y given X (also called the conditional mean), E(Y | X = x), is the mean value of Y when X takes on the value x.
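The correlation coefficient from item 4 can be computed for the two-dice example. A minimal sketch, assuming fair dice; the white die and the total of both dice should have ρ = 1/√2, and a variable is perfectly correlated with itself (ρ = 1) and with its negative (ρ = −1).

```python
import math
from fractions import Fraction
from itertools import product

joint = {(w, b): Fraction(1, 36) for w, b in product(range(1, 7), repeat=2)}

def E(g):
    return sum(g(w, b) * p for (w, b), p in joint.items())

def Cov(f, g):
    return E(lambda w, b: f(w, b) * g(w, b)) - E(f) * E(g)

def corr(f, g):
    """rho = Cov(f, g) / sqrt(Var(f) Var(g))."""
    return float(Cov(f, g)) / math.sqrt(float(Cov(f, f) * Cov(g, g)))

white = lambda w, b: w
total = lambda w, b: w + b
neg_white = lambda w, b: -w
print(corr(white, total), corr(white, white), corr(white, neg_white))
```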

6.

Law of Iterated Expectations



$$E(Y) = E[E(Y \mid X)]$$

The mean of Y is equal to the weighted average of the conditional expectations of Y given each value of X, where the weights are the probabilities of those values:

$$E(Y) = E[E(Y \mid X)] = \sum_{i=1}^{n} E(Y \mid X = x_i)\,\Pr(X = x_i)$$
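The conditional mean and the law of iterated expectations can be checked with two dice. A minimal sketch, assuming X is the white die and Y is the sum of both fair dice; `pr_x` and `cond_mean_y` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

# X = white die, Y = sum of the two dice; 36 equally likely (white, black) outcomes.
joint = {(w, b): Fraction(1, 36) for w, b in product(range(1, 7), repeat=2)}

def pr_x(x):
    return sum(p for (w, _), p in joint.items() if w == x)

def cond_mean_y(x):
    """E(Y | X = x): probability-weighted average of Y over outcomes with X = x."""
    return sum((w + b) * p for (w, b), p in joint.items() if w == x) / pr_x(x)

EY = sum((w + b) * p for (w, b), p in joint.items())
iterated = sum(cond_mean_y(x) * pr_x(x) for x in range(1, 7))  # E[E(Y|X)]
print(cond_mean_y(1), EY, iterated)
```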

I.

Moments of a random variable: measures of the shape of a distribution

A.

$E[x^r]$: the r-th moment, r > 0

B.

Mean = $E[x^1]$: the first moment

C.

Variance = $E[(x - E(x))^2]$: the 2nd (central) moment

D.

Skewness = $\dfrac{E[(x - E(x))^3]}{\sigma_x^3}$: the standardized 3rd moment

E.

Kurtosis = $\dfrac{E[(x - E(x))^4]}{\sigma_x^4}$: the standardized 4th moment
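The moment formulas can be evaluated exactly for one fair die. A minimal sketch; `moment` is a hypothetical helper. A symmetric distribution such as the die has zero skewness, and its flat shape gives a kurtosis below the normal distribution's value of 3.

```python
from fractions import Fraction

dist = {x: Fraction(1, 6) for x in range(1, 7)}  # one fair die

def moment(r, center=0):
    """E[(x - center)^r]; center=0 gives the raw r-th moment."""
    return sum((x - center) ** r * p for x, p in dist.items())

mean = moment(1)
var = moment(2, mean)
skewness = float(moment(3, mean)) / float(var) ** 1.5
kurtosis = moment(4, mean) / var ** 2
print(mean, var, skewness, kurtosis, float(kurtosis))
```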

All mean 0 or variance 1



II.

The Normal Distribution: a very useful probability distribution. According to G. Lippmann, a French physicist: "Everybody believes in the [normal approximation], the experimenters because they think it is a mathematical theorem, the mathematicians because they think it's an experimental fact."



A.

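Properties: bell shaped; symmetric; a continuous distribution based on a formula (that you won't be required to know or understand), shown here for the standard normal variable z:

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

As |z| gets large, f(z) approaches zero; f(z) takes its maximum at z = 0.

1.

For a normal random variable $x \sim N(\mu_x, \sigma_x^2)$: $E(x) = \mu_x$ and $\mathrm{Var}(x) = \sigma_x^2$.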

B.

The Standard Normal Distribution

1.

A random variable with a N(0, 1) distribution is denoted Z and is said to have the standard normal distribution:

$$Z \sim N(0, 1): \quad \mu_Z = 0, \quad \sigma_Z^2 = 1$$

2.

Where we define Z as follows:

$$Z = \frac{X - \mu_X}{\sigma_X} \quad \Longleftrightarrow \quad X = \mu_X + \sigma_X Z$$

3.

X is standardized by subtracting the mean and dividing by the standard deviation. E.g., what is Pr(X ≤ 2) when X ~ N(1, 4)?

4.

The sum of two independent normal random variables is normally distributed.
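The standardization step in item 3 can be carried out numerically. A minimal sketch, assuming N(1, 4) means mean 1 and variance 4 (SD 2), consistent with the N(μ, σ²) notation used here; `std_normal_cdf` is our hypothetical helper, written with `math.erf`, and `statistics.NormalDist` (which takes the SD) serves as a cross-check.

```python
import math
from statistics import NormalDist

def std_normal_cdf(z: float) -> float:
    """Phi(z) = Pr(Z <= z) for Z ~ N(0, 1), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, variance = 1.0, 4.0  # X ~ N(1, 4): second argument read as the variance
sigma = math.sqrt(variance)
z = (2.0 - mu) / sigma   # standardize: subtract mean, divide by SD
print(z, std_normal_cdf(z), NormalDist(mu, sigma).cdf(2.0))
```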

Density curves: the graph of a continuous probability distribution, satisfying the following properties:

• The total area underneath the graph equals 1.

• Every point has a height ≥ 0.

• Probability density function (pdf): the area under the pdf between two points is the probability that the random variable falls between those two points.
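The density-curve properties can be checked for the standard normal pdf by approximating areas numerically. A minimal sketch, assuming a midpoint-rule integral (our choice of method) and treating the tails beyond |z| = 10 as negligible; the area between −1 and 1 is compared with the exact value from `math.erf`.

```python
import math

def normal_pdf(z: float) -> float:
    """Standard normal density f(z) = exp(-z^2 / 2) / sqrt(2 pi)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def area(f, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the area under f between a and b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

total = area(normal_pdf, -10.0, 10.0)  # tails beyond |z| = 10 are negligible
within_one_sd = area(normal_pdf, -1.0, 1.0)
print(total, within_one_sd)
```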





• Cumulative distribution function (cdf, Φ): the probability that a random variable is less than or equal to a particular value, i.e.

Pr(Z ≤ c) = Φ(c)

5.

A normally distributed continuous random variable has a symmetric, bell-shaped distribution (but not every symmetric, bell-shaped distribution is normal; the t-distribution is one example).

C.

Why do we focus on the Normal Distribution? Data influenced by many small and unrelated random effects are approximately normally distributed.

This fact is the key to the rest of the course. Assuming random errors and large samples, the statistics we work with (such as sample averages) are approximately normally distributed. Thus we can use the normal distribution (a theoretical concept) to examine the statistical relationship between data [what we'll be doing for the remainder of the course], because as the sample size grows, the sampling distribution of the sample average approaches the normal distribution (the Central Limit Theorem).

D.

Facts about the normal distribution:

1.

It's a bell-shaped curve; symmetric; the area underneath the curve sums to 1; the expected value (average) pins the center of the probability histogram to the horizontal axis, and the standard error fixes its spread.

E.

We will be working with the Standard Normal distribution: a very special normal distribution with mean 0 and SD 1. Any normal random variable can be converted into a standard normal random variable, which lets us use the standard normal distribution:

1.

The z transformation:

$$z = \frac{x - \mu}{\sigma}$$

Subtract the mean from the random variable and divide by the standard deviation. This converts every normal distribution into a standard normal distribution. (You'll be using this method for the remainder of the semester.)

2.

Suppose we have a normal distribution, then we can convert it to the standard normal as long as we know the mean and the standard deviation. Once we’ve converted it to the standard normal, we can determine probabilities by looking them up in a standard normal probability table.



• The probability table gives us the z-score and Φ(z), the probability that a standard normal random variable is less than or equal to z (see handout).

• The formula for this: Φ(a) = Pr(Z ≤ a).

• What about the area to the right of z = a?

• We can find the probability of Z being in any interval a ≤ Z ≤ b.

3.

Rule of thumb: when in doubt, draw a picture of what you're trying to find!

4.

Examples: Let's assume the weights of students are normally distributed with mean = 150 lbs and standard deviation = 20 lbs. What's the probability of weighing more than 170 lbs?



5.

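The general rule for computing normal probabilities, for x ~ N(μ, σ²):

$$\Pr(a \le x \le b) = \Phi\!\left(\frac{b - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)$$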
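The weights example and the general rule can be computed with Python's `statistics.NormalDist` (available from Python 3.8). A minimal sketch; `normal_prob` is a hypothetical helper name, and the interval 130 to 170 lbs is an extra illustration of ours.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

def normal_prob(a: float, b: float, mu: float, sigma: float) -> float:
    """Pr(a <= x <= b) for x ~ N(mu, sigma^2), via the z transformation."""
    return Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

# Weights example: mean 150 lbs, SD 20 lbs.
z = (170 - 150) / 20
p_over_170 = 1 - Z.cdf(z)  # area to the right of z
p_130_to_170 = normal_prob(130, 170, 150, 20)
print(z, p_over_170, p_130_to_170)
```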

F.

Confidence Intervals from the normal distribution are given by:

I.

Chi-Squared Distribution

A.

Given the following sequence of independent random variables:

$$\{X_i,\ i = 1, \dots, n\} \sim N(0, 1)$$

B.

Then

$$\sum_{i=1}^{n} X_i^2 \sim \chi^2(n)$$

with n degrees of freedom.

1.

The sum of squares of N independently distributed standard normal random variables is distributed χ² with N degrees of freedom.

2.

Useful for testing hypotheses that deal with variances.

3.

We can show that the sample variance, suitably scaled, is distributed chi-squared with N − 1 degrees of freedom. For independent draws $x_i \sim N(\mu, \sigma^2)$,

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar x)^2, \qquad \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \left(\frac{x_i - \bar x}{\sigma}\right)^2 \sim \chi^2(n-1)$$

where the second equation has the form of a standardized normal variable, squared and summed. So the chi-square is a sum of squared standard normal variables; estimating the mean by $\bar x$ uses up one degree of freedom, which is why there are n − 1 rather than n.
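Both chi-square statements can be illustrated by simulation. A minimal sketch, assuming n = 5 and a N(10, 2²) population for part (b); the seed, sample sizes, and tolerances are our choices. A χ²(n) variable has mean n and variance 2n, and the scaled sample variance should average close to n − 1 rather than n.

```python
import random
import statistics

rng = random.Random(2014)  # seed chosen arbitrarily for reproducibility
reps, n = 50_000, 5

# (a) Sum of squares of n independent N(0, 1) draws: chi-square(n) has mean n and variance 2n.
chi2_draws = [sum(rng.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(reps)]
chi2_mean = statistics.fmean(chi2_draws)
chi2_var = statistics.variance(chi2_draws)

# (b) (n - 1) S^2 / sigma^2 for samples of size n from a N(mu, sigma^2) population.
mu, sigma = 10.0, 2.0
scaled = []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # sample variance S^2
    scaled.append((n - 1) * s2 / sigma ** 2)
scaled_mean = statistics.fmean(scaled)  # chi-square(n - 1) has mean n - 1 = 4

print(chi2_mean, chi2_var, scaled_mean)
```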



II.

T-distribution

A.

Let Z ~ N(0, 1) and W ~ χ²(N), where Z and W are independent random variables. Then

1.

$$\frac{Z}{\sqrt{W/N}} \sim t(N)$$

The ratio has a t-distribution with N degrees of freedom. The t-distribution has the same symmetric bell shape as the standard normal but fatter tails: its variance, N/(N − 2) for N > 2, is larger than 1, and it approaches the standard normal as N grows.
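The construction of t(N) can be simulated directly from its definition. A minimal sketch, assuming N = 10 degrees of freedom and chi-square draws built as sums of squared standard normals (the seed and sample size are our choices). The simulated variance should be near N/(N − 2) = 1.25, and more than 5% of draws should fall beyond ±1.96, reflecting the fatter tails.

```python
import math
import random
import statistics

rng = random.Random(526)
N = 10  # degrees of freedom
reps = 50_000

def chi2(df: int) -> float:
    """Chi-square(df) draw built as a sum of df squared independent N(0, 1) draws."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(df))

# t = Z / sqrt(W / N) with Z ~ N(0, 1) independent of W ~ chi-square(N).
t_draws = [rng.gauss(0, 1) / math.sqrt(chi2(N) / N) for _ in range(reps)]
t_var = statistics.fmean([t * t for t in t_draws])  # mean is 0, so E[t^2] = Var(t)
share_beyond = sum(abs(t) > 1.96 for t in t_draws) / reps
print(t_var, N / (N - 2), share_beyond)
```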

III.

F-Distribution:

A.

X and Y are two independent random variables distributed χ²(n₁) and χ²(n₂); then their ratio, each divided by its degrees of freedom, is distributed F with n₁, n₂ degrees of freedom:

$$\frac{X/n_1}{Y/n_2} \sim F(n_1, n_2).$$

1.

The relationship between the F and t distributions is given as follows:

$$[t(N)]^2 \sim F(1, N).$$
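The relationship between t and F can be illustrated by building F(1, N) two ways and comparing tail shares. A minimal sketch, assuming N = 10 and a cutoff of 4.96, which is roughly the 95th percentile of F(1, 10) (about 2.228 squared); both constructions should put roughly 5% of draws above it.

```python
import math
import random

rng = random.Random(1527)
N = 10
reps = 50_000

def chi2(df: int) -> float:
    """Chi-square(df) draw built as a sum of df squared independent N(0, 1) draws."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(df))

# F(1, N) built directly as (W1 / 1) / (W2 / N) with independent chi-squares ...
f_draws = [(chi2(1) / 1) / (chi2(N) / N) for _ in range(reps)]
# ... and as the square of a t(N) draw.
t_sq = [(rng.gauss(0, 1) / math.sqrt(chi2(N) / N)) ** 2 for _ in range(reps)]

cutoff = 4.96  # roughly the 95th percentile of F(1, 10)
share_f = sum(f > cutoff for f in f_draws) / reps
share_t = sum(t > cutoff for t in t_sq) / reps
print(share_f, share_t)
```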

Random Sampling and Distribution of Mean

III.

Importance of Random Samples

1.

N independent draws from the same population

2.

iid = independent and identically distributed.

• Most of the theoretical results in econometrics and statistics rely on the i.i.d. assumption.

• A sequence (collection) of random variables is i.i.d. if the random variables are mutually independent and each has the same probability distribution.

3.

In simple random sampling, n objects are drawn at random from a population, and each object has an equal probability of being selected.

IV.

Characteristics of a Sampling Distribution

A.

The difference between an estimator and an estimate

1.

Estimator: the general rule for getting an "estimate" (some number) from data drawn from a population

2.

Estimate: the number an estimator produces when it is applied to a particular sample

3.

Example: the sample average is an estimator:

$$\bar X = \frac{1}{N}\sum_{i=1}^{N} X_i$$

• Note: $\bar X$ is a function of the random variables $X_i$, and is also a random variable, since the value of $\bar X$ differs from one sample to the next.

• The sample mean is an estimator of the population mean.

Since it is a random variable, it has a probability distribution, which is called the sampling distribution.



The mean and variance of the sampling distribution of $\bar X$

• $\{x_1, x_2, x_3, \dots, x_n\}$ are i.i.d. draws from the same population, i.e. each $x_i$ has the same marginal distribution.

• The sample mean $\bar X$ is denoted as

$$\bar X = \frac{1}{n}(x_1 + x_2 + \dots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i$$

• Since $\bar X$ is a random variable, it has a sampling distribution. What is it?
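The estimator/estimate distinction and the sampling distribution of the mean can be seen in a simulation. A minimal sketch, assuming a fair-die population (μ = 3.5, σ² = 35/12), samples of size n = 25, and an arbitrary seed; the function `sample_mean` is the estimator, each value it returns is an estimate, and the estimates should average near μ with variance near σ²/n.

```python
import random
import statistics

def sample_mean(xs):
    """The estimator: a rule that maps any sample to a number."""
    return sum(xs) / len(xs)

rng = random.Random(20)
n, reps = 25, 20_000
population = range(1, 7)  # a fair die: mu = 3.5, sigma^2 = 35/12

# Each sample yields a different estimate.
estimates = [sample_mean(rng.choices(population, k=n)) for _ in range(reps)]
print(estimates[:3])
print(statistics.fmean(estimates), 3.5)
print(statistics.variance(estimates), (35 / 12) / n)
```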



V.

Large Sample Approximations of Sampling Distribution.

For small sample sizes, the distribution of $\bar Y$ is complicated, but if n is large, the sampling distribution is simple!

1.

As n increases, the distribution of $\bar Y$ becomes more tightly centered around $\mu_Y$ (the Law of Large Numbers).

2.

Moreover, the distribution of $\bar Y - \mu_Y$, suitably standardized, becomes normal (the Central Limit Theorem).

These two theorems provide enormous simplifications in empirical analysis.



The Law of Large Numbers :

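An estimator is consistent if the probability that it falls within any given interval around the true population value tends to one as the sample size increases.

If $(Y_1, \dots, Y_n)$ are i.i.d. and $\sigma_Y^2 < \infty$, then $\bar Y$ is a consistent estimator of $\mu_Y$; that is, for any $\varepsilon > 0$,

$$\Pr\big[\,|\bar Y - \mu_Y| < \varepsilon\,\big] \to 1 \quad \text{as } n \to \infty,$$

which can be written $\bar Y \xrightarrow{p} \mu_Y$ ("$\bar Y$ converges in probability to $\mu_Y$").

(The math: as $n \to \infty$, $\mathrm{var}(\bar Y) = \sigma_Y^2 / n \to 0$, which implies that $\Pr[\,|\bar Y - \mu_Y| < \varepsilon\,] \to 1$.)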
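The Law of Large Numbers can be watched in action by estimating Pr[|Ȳ − μ_Y| < ε] at increasing sample sizes. A minimal sketch, assuming a fair-die population (μ_Y = 3.5), ε = 0.1, and 1,000 simulated samples per sample size; the seed and `prob_within` are our choices.

```python
import random

rng = random.Random(1689)
mu, eps, reps = 3.5, 0.1, 1_000  # fair die; epsilon band of +/- 0.1

def prob_within(n: int) -> float:
    """Share of simulated samples of size n whose mean lies within eps of mu."""
    hits = 0
    for _ in range(reps):
        ybar = sum(rng.choices(range(1, 7), k=n)) / n
        hits += abs(ybar - mu) < eps
    return hits / reps

results = {n: prob_within(n) for n in (10, 100, 1_000)}
print(results)
```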



The Central Limit Theorem (CLT):

If $(Y_1, \dots, Y_n)$ are i.i.d. and $0 < \sigma_Y^2 < \infty$, then when n is large the distribution of $\bar Y$ is well approximated by a normal distribution.

• $\bar Y$ is approximately distributed $N(\mu_Y, \sigma_Y^2/n)$ ("normal distribution with mean $\mu_Y$ and variance $\sigma_Y^2/n$")

• $\sqrt{n}\,(\bar Y - \mu_Y)/\sigma_Y$ is approximately distributed N(0, 1) (standard normal)

• The "standardized" $\bar Y$,

$$\frac{\bar Y - E(\bar Y)}{\sqrt{\mathrm{var}(\bar Y)}} = \frac{\bar Y - \mu_Y}{\sigma_Y/\sqrt{n}},$$

is approximately distributed N(0, 1)

• The larger is n, the better is the approximation. How large does n have to be?
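The CLT can be illustrated even for a skewed population. A minimal sketch, assuming an exponential population with mean 1 and SD 1, samples of size n = 50, and 20,000 repetitions (all our choices). If the N(0, 1) approximation is good, about 95% of standardized sample means should fall within ±1.96.

```python
import math
import random

rng = random.Random(1789)
n, reps = 50, 20_000
mu = sigma = 1.0  # exponential(1) population: skewed, mean 1, SD 1

standardized = []
for _ in range(reps):
    ybar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    standardized.append((ybar - mu) / (sigma / math.sqrt(n)))

share_inside = sum(abs(z) <= 1.96 for z in standardized) / reps
print(share_inside)  # the N(0, 1) approximation predicts about 0.95
```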



Summary: The Sampling Distribution of $\bar Y$

For $Y_1, \dots, Y_n$ i.i.d. with $0 < \sigma_Y^2 < \infty$:

• The exact (finite sample) sampling distribution of $\bar Y$ has mean $\mu_Y$ ("$\bar Y$ is an unbiased estimator of $\mu_Y$") and variance $\sigma_Y^2/n$.

• Other than its mean and variance, the exact distribution of $\bar Y$ is complicated and depends on the distribution of Y (the population distribution).

• When n is large, the sampling distribution simplifies:

  o $\bar Y \xrightarrow{p} \mu_Y$ (Law of large numbers)

  o $\dfrac{\bar Y - E(\bar Y)}{\sqrt{\mathrm{var}(\bar Y)}}$ is approximately N(0, 1) (CLT)
