Slides for Session #11

Statistics for Social
and Behavioral Sciences
Session #11:
Random Variable, Expectations
(Agresti and Finlay, Chapter 4)
Prof. Amine Ouazad
Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN
Week 1
PART II. DESCRIBING DATA
Weeks 2-4
PART III. DRAWING CONCLUSIONS FROM DATA:
INFERENTIAL STATISTICS
Weeks 5-9
Firenze or Lebanese Express?
PART IV. CORRELATION AND CAUSATION:
REGRESSION ANALYSIS
This is where we talk
about Zmapp and Ebola!
Weeks 10-14
Last Session
• Four rules of probability distributions
1. P(not A) = 1 – P(A)
2. P(A or B) = P(A) + P(B) when P(A and B)=0
3. P(A and B)=P(A) P(B given A)
Beware of the inverse probability fallacy,
P(B given A) is not P(A given B)
3’. P(A and B)=P(A) P(B) when A and B are independent
• Inverse Probability Fallacy:
– P(A|B) is not P(B|A).
– We have a formula P(A|B) = (P(B|A) P(A)) / P(B)
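To see how far apart P(A|B) and P(B|A) can be, here is a minimal Python sketch that applies the rules above. The three input probabilities are hypothetical, chosen only for illustration; they do not come from the course.

```python
# Hypothetical numbers, chosen only for illustration.
p_a = 0.01               # P(A)
p_b_given_a = 0.95       # P(B given A)
p_b_given_not_a = 0.10   # P(B given not A)

# Rule 1: P(not A) = 1 - P(A)
p_not_a = 1 - p_a

# Rules 2 and 3: P(B) = P(A and B) + P(not A and B),
# because "A and B" and "not A and B" cannot both happen.
p_b = p_a * p_b_given_a + p_not_a * p_b_given_not_a

# Formula from the slide: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(f"P(B given A) = {p_b_given_a:.3f}")
print(f"P(A given B) = {p_a_given_b:.3f}")
```

With these inputs P(B given A) is large while P(A given B) is small, which is exactly the gap the inverse probability fallacy ignores.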
Outline
1. Random Variable
Probability distribution of a random variable
Expectation of a random variable
2. The normal distribution
3. Polls and normal distributions
Next time:
Probability Distributions (continued)
Chapter 4 of A&F
Random variable
A random variable is a variable whose value is not known ex-ante: it can take one of several possible values, and which one is revealed only ex-post.
• Example:
– X is a random variable that, before the coin is tossed (ex-ante),
can take values « Heads » or « Tails ». Once the coin is tossed
(ex-post), the value of X is known, it is either « Heads » or
« Tails ».
– Y is a random variable that can take values 1, 2, 3, 4, 5, or 6
depending on the roll of a die. Before the die is rolled,
the value is not known. After the die is rolled, we know the
value of Y.
Probability distribution
of a random variable
• Take all possible values of a random variable Y:
– Example: 1,2,3,4,5,6
– In general: y1, y2, y3, …, yK.
• The probability of the event that the random variable Y equals
yk is denoted P(Y=yk) or simply P(yk).
• The probability distribution of random variable Y is the list of
all values of P(Y=yk).
• Example: for a fair die, the
probability distribution of Y is the
list of values P(Y=1), P(Y=2), P(Y=3), …
which is {1/6,1/6,1/6,1/6,1/6,1/6}
All throughout the course
we consider either discrete
quantitative random
variables or categorical
random variables.
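As a small illustration, the probability distribution of the die example can be written as a mapping from each value yk to P(Y=yk). This is only a sketch; the variable names are made up for the example.

```python
from fractions import Fraction

# Probability distribution of Y, the outcome of a fair six-sided die:
# each value yk is mapped to P(Y=yk).
distribution = {y: Fraction(1, 6) for y in range(1, 7)}

for y, p in distribution.items():
    print(f"P(Y={y}) = {p}")

# The probabilities over all possible values add up to one.
total = sum(distribution.values())
print("Sum of probabilities:", total)
```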
Expected value of a random variable
What are your expected gains when playing the coin game?
• Gain is a random variable, equal to +10 AED when getting
heads, and -10 AED when getting tails.
E(gain) = Gain when getting heads x Probability of heads
+ Gain when getting tails x Probability of tails.
In general, for a random variable Y, the expected value of Y is:
• E(Y) = Σ yk P(Y=yk), where the sum runs over all K possible values
Also note that probabilities sum to one.
Σ P(Y=yk) = 1
Should I play this game at all?
What is my expected gain?
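The expected gain of the coin game can be worked out with a few lines of Python. This sketch assumes the coin is fair (probability 0.5 for each side), which the slide does not state explicitly; the helper name `expected_value` is hypothetical.

```python
def expected_value(distribution):
    """Return E(Y) = sum of yk * P(Y=yk) for a {value: probability} mapping."""
    return sum(value * prob for value, prob in distribution.items())

# Coin game: +10 AED on heads, -10 AED on tails.
# Assumption: the coin is fair, so each outcome has probability 0.5.
gain = {10: 0.5, -10: 0.5}

print("E(gain) =", expected_value(gain), "AED")
```

Under the fair-coin assumption the gains and losses cancel, so the expected gain is zero. Changing the probabilities in `gain` shows how a biased coin would tilt the game.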
Expected Earnings?
Hmm, how much will I earn?
• « Your annual earnings right after NYU Abu
Dhabi » is a random variable…
– The variable has not been realized yet.
Let’s give it a name
Y = « Your annual earnings right after NYU Abu
Dhabi ».
• E(earnings) = E(Y) = Σ yk P(Y=yk)
Takes potentially K values.
• Problemo: We don’t observe earnings in the
future!!!
Expected Earnings?
Hmm, how much will I earn?
An approximation is to use the earnings distribution of current
graduates as a substitute for the probabilities P(Y=yk) that we do
not know, for each k.
• Suppose the earnings of K current graduates take K distinct values,
i.e. no two graduates earn exactly the same annual wage, so each
observed value gets probability 1/K.
• Hence an approximation of expected earnings is
E(Y) ≈ Σ yk × (1/K)
• The average earnings of current graduates…
• But that’s only an approximation !! What could be
wrong?
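The approximation above can be sketched in Python. The earnings figures below are made up purely for illustration and are not real graduate data.

```python
# Hypothetical annual earnings (in AED) of K current graduates.
# These figures are invented for illustration; they are not real data.
earnings = [95_000, 120_000, 150_000, 180_000, 240_000]
K = len(earnings)

# Give each observed value yk the probability 1/K.
approx_expected = sum(y * (1 / K) for y in earnings)

# This is the same as the ordinary average of the observed earnings.
average = sum(earnings) / K

print("Approximate E(Y):", round(approx_expected))
print("Average earnings:", round(average))
```

The two numbers coincide, which is why the average earnings of current graduates serves as the approximation. Whether current graduates resemble future ones is a separate question that the code cannot answer.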
Properties of the Expectation
The expectation of a sum (or difference) is the sum (or difference) of the expectations:
• E(earnings – debt) = E(earnings) – E(debt)
The expectation of a constant x the random variable is the
constant x the expectation:
• E( Constant x Earnings ) = Constant x E(Earnings)
E.g. E(Earnings in AED) = 3.6 x E(Earnings in USD)
Beware !!!
• E( X Y ) is not E(X) E(Y) in general.
• When X and Y are independent, E( X Y ) = E(X) E(Y).
• Law of iterated expectations: E(X) = E(E(X|Z))
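These properties can be checked exactly on a small example. The sketch below uses two independent fair dice X and Y, which is an illustrative choice rather than an example from the slides; the helper `expect` is hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 36)  # two independent fair dice: each pair (x, y) has probability 1/36

def expect(f):
    """E(f(X, Y)) over the joint distribution of two independent fair dice."""
    return sum(f(x, y) * p for x, y in product(faces, faces))

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)

# Linearity: E(X + Y) = E(X) + E(Y) and E(c X) = c E(X).
assert expect(lambda x, y: x + y) == e_x + e_y
assert expect(lambda x, y: 3 * x) == 3 * e_x

# Independence: E(X Y) = E(X) E(Y).
assert expect(lambda x, y: x * y) == e_x * e_y

# Dependence: multiply X by itself. E(X X) is not E(X) E(X).
e_x_squared = expect(lambda x, y: x * x)
print("E(X) E(X) =", e_x * e_x, "but E(X X) =", e_x_squared)

# Iterated expectations with Z = parity of X (0 = even, 1 = odd):
# E(X|Z) averages the three faces with that parity, and P(Z=z) = 1/2.
e_x_given_z = {z: Fraction(sum(x for x in faces if x % 2 == z), 3) for z in (0, 1)}
e_iterated = sum(Fraction(1, 2) * e_x_given_z[z] for z in (0, 1))
assert e_iterated == e_x
```

Fractions are used instead of floats so that the equalities hold exactly rather than up to rounding.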
A particular distribution
• Some random variables have a particular “bell-shaped”
distribution:
– Individuals’ height.
• What is the distribution of height at age 20? P(height)
• What height can I expect for my child? E(height)
– Individuals’ weight.
• What is the distribution of weight at age 35? P(weight)
• What weight can I expect at age 35? E(weight)
– The logarithm of income.
• What is the distribution of the log of income after graduation?
P(log(income))
• What log income can I expect after graduation?
• The “bell-shaped” distribution will now be called a
“normal” distribution.
The normal distribution
• “The normal distribution is symmetric, bell
shaped, and characterized by its mean μ and
standard deviation σ.
• The probability within any particular number
of standard deviations of μ is the same for all
normal distributions.”
P(μ - σ < height < μ + σ) = 0.68 or 68%
P(μ - 2σ < height < μ + 2σ) = 0.95 or 95%
P(μ - 3σ < height < μ + 3σ) = 0.997 or 99.7%
All of these are “events”
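The three probabilities above can be reproduced with Python's standard library. The height distribution used here (mean 170 cm, standard deviation 10 cm) is a hypothetical choice for illustration; the point of the sketch is that the answer does not depend on it.

```python
from statistics import NormalDist

def prob_within(dist, k):
    """P(mean - k sd < Y < mean + k sd) for a normal distribution."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

# Hypothetical height distribution: mean 170 cm, standard deviation 10 cm.
height = NormalDist(170.0, 10.0)
# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist(0.0, 1.0)

for k in (1, 2, 3):
    print(
        f"within {k} sd: height {prob_within(height, k):.3f}, "
        f"standard normal {prob_within(standard, k):.3f}"
    )
```

Both columns agree for every k, which is the sense in which the probability within a given number of standard deviations is the same for all normal distributions.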
The normal distribution
• Draw a histogram with a very small bin size… so that the
little stairs disappear… and a curve appears.
Comparing test scores across colleges
“Early paleontology in Indianapolis”
Test scores have a normal
distribution with mean 3
and standard deviation 4.
“Hip hop in the Middle East”
Test scores have a normal
distribution with mean 4
and standard deviation 1.
• Problem: how do I compare Marina’s test score of 3.6 in the paleontology course
with a test score of 4.1 in “Hip hop in the Middle East”?
Z-score !
• Take a student’s paleontology test score at the
end of the semester. This is a random variable.
– Its probability distribution has a mean of μ=3 with
a standard deviation of σ=4.
– Now consider the “z-scored” paleontology test
score:
z-scored paleontology = (paleontology test score - μ) / σ
– The z-scored paleontology test score has a mean
of 0, and a standard deviation of 1.
Standard Normal Distribution
• Is simply the normal distribution with mean 0
and standard deviation 1.
• A z-score of 3 means that the student's score is three
standard deviations (of the original test scores) above
the mean.
So who has a better grade, Marina
or Slavoj?
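The comparison can be sketched in Python using the means and standard deviations given for the two courses. The function name `z_score` is hypothetical, and the sketch only compares the two scores mentioned in the problem (3.6 in paleontology, 4.1 in hip hop).

```python
def z_score(score, mean, sd):
    """Number of standard deviations by which a score lies above its mean."""
    return (score - mean) / sd

# Paleontology: mean 3, standard deviation 4. Score of 3.6.
z_paleo = z_score(3.6, mean=3, sd=4)

# Hip hop in the Middle East: mean 4, standard deviation 1. Score of 4.1.
z_hiphop = z_score(4.1, mean=4, sd=1)

print(f"z-score of 3.6 in paleontology: {z_paleo:.2f}")
print(f"z-score of 4.1 in hip hop:      {z_hiphop:.2f}")
```

Although 4.1 is the higher raw score, the 3.6 in paleontology has the slightly higher z-score, so it sits a little further above its own course mean in standard-deviation units.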
Who will win the midterm
elections in the US?
• Midterm elections are held two years after
the presidential elections in the United States.
• They take place in early November 2014.
• A question: what fraction of the voters will
vote for a Democrat in Colorado?
Wrap up
• A random variable is a variable whose value has not been
realized.
• The expectation of a random variable Y is:
E(Y) = Σ yk P(Y=yk)
Also, E(X+Y) = E(X) + E(Y),
and E(c X)=c E(X), and E(E(X|Z))=E(X)
• Typically the probability distribution P is not known, but we
approximate it….
– Using the distribution for past values of Y (example: earnings of
previous graduates)
– Using polls, to ask individuals for example how they will vote.
• The normal distribution is a ubiquitous distribution that is
symmetric and bell shaped. It is characterized by its mean μ and its
standard deviation σ.
• The standard normal distribution has mean 0 and standard
deviation 1.
Coming up:
Readings:
• Chapter 4 entirely – full of interesting examples and super relevant.
• Online quiz tonight.
• Go to the website
http://www.realclearpolitics.com/epolls/2014/senate/co/colorado_senate_gardner_vs_udall-3845.html
and prepare one or two slides to present the race in Colorado.
– Who do you think will win?
– What is MoE?
– What is the likely distribution of the “fraction of voters who will vote for Gardner”?
For help:
• Amine Ouazad
Office 1135, Social Science building
amine.ouazad@nyu.edu
Office hour: Tuesday from 5 to 6.30pm.
• GAF: Irene Paneda
Irene.paneda@nyu.edu
Sunday recitations.
At the Academic Resource Center, Monday from 2 to 4pm.