Statistics for Social and Behavioral Sciences
Session #11: Random Variable, Expectations (Agresti and Finlay, Chapter 4)
Prof. Amine Ouazad

Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)
PART II. DESCRIBING DATA (Weeks 2-4)
PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)
   Firenze or Lebanese Express?
PART IV. CORRELATION AND CAUSATION: REGRESSION ANALYSIS (Weeks 10-14)
   This is where we talk about Zmapp and Ebola!

Last Session
• Four rules of probability
  1. P(not A) = 1 – P(A)
  2. P(A or B) = P(A) + P(B) when P(A and B) = 0
  3. P(A and B) = P(A) P(B given A)
     Beware of the inverse probability fallacy: P(B given A) is not P(A given B).
  3’. P(A and B) = P(A) P(B) when A and B are independent
• Inverse probability fallacy:
  – P(A|B) is not P(B|A).
  – We have a formula: P(A|B) = P(B|A) P(A) / P(B)

Outline
1. Random variable
   – Probability distribution of a random variable
   – Expectation of a random variable
2. The normal distribution
3. Polls and normal distributions
Next time: Probability Distributions (continued), Chapter 4 of A&F

Random variable
A random variable is a variable whose value is not known ex ante (before the draw): it can take several possible values, and only one of them is realized ex post (after the draw).
• Example:
  – X is a random variable that, before the coin is tossed (ex ante), can take the values « Heads » or « Tails ». Once the coin is tossed (ex post), the value of X is known: it is either « Heads » or « Tails ».
  – Y is a random variable that can take the values 1, 2, 3, 4, 5, or 6, depending on the roll of a die. Before the die is rolled, the value is not known. After the die is rolled, we know the value of Y.

Probability distribution of a random variable
• Take all possible values of a random variable Y:
  – Example: 1, 2, 3, 4, 5, 6
  – In general: $y_1, y_2, y_3, \ldots, y_K$.
• The probability of the event that the random variable Y equals $y_k$ is written $P(Y = y_k)$, or simply $P(y_k)$.
• The probability distribution of the random variable Y is the list of all the values $P(Y = y_k)$, for $k = 1, \ldots, K$.
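The definition above can be sketched in Python, under the assumption that a probability distribution is stored as a dictionary mapping each value $y_k$ to $P(Y = y_k)$. The helper names `is_valid_distribution` and `realize` are hypothetical, chosen for this illustration only; `realize` mimics the move from ex ante (only the probabilities are known) to ex post (one value is drawn).

```python
import random

# Assumption: a distribution is a dict {y_k: P(Y = y_k)}.
# Example: a fair six-sided die.
die = {k: 1 / 6 for k in range(1, 7)}


def is_valid_distribution(dist, tol=1e-9):
    """True if every probability lies in [0, 1] and the probabilities sum to one."""
    probs = list(dist.values())
    return all(0 <= p <= 1 for p in probs) and abs(sum(probs) - 1) < tol


def realize(dist, rng=random):
    """Draw one value of the random variable (the ex post outcome)."""
    values = list(dist)
    weights = list(dist.values())
    return rng.choices(values, weights=weights, k=1)[0]


print(is_valid_distribution(die))  # True
print(realize(die))                # one of 1, ..., 6; changes from run to run
```

Floating-point sums of 1/6 are not exactly 1, which is why the check uses a small tolerance `tol` rather than strict equality.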
• Example: for a fair die, the probability distribution of Y is the list of values P(Y = 1), P(Y = 2), …, P(Y = 6), which is {1/6, 1/6, 1/6, 1/6, 1/6, 1/6}.
Throughout the course, we consider either discrete quantitative random variables or categorical random variables.

Expected value of a random variable
What are your expected gains when playing the coin game?
• Gain is a random variable, equal to +10 AED when getting heads and -10 AED when getting tails.
  E(gain) = (gain when getting heads) × P(heads) + (gain when getting tails) × P(tails).
In general, for a random variable Y, the expected value of Y is:
• $E(Y) = \sum_{k=1}^{K} y_k \, P(Y = y_k)$
Also note that the probabilities sum to one: $\sum_{k=1}^{K} P(Y = y_k) = 1$.
Should I play this game at all? What is my expected gain??

Expected earnings? Hmm, how much will I earn??
• « Your annual earnings right after NYU Abu Dhabi » is a random variable…
  – The variable has not been realized yet. Let’s give it a name: Y = « Your annual earnings right after NYU Abu Dhabi ».
• $E(\text{earnings}) = E(Y) = \sum_{k=1}^{K} y_k \, P(Y = y_k)$, where Y potentially takes K values.
• Problemo: we don’t observe future earnings!!!

Expected earnings? Hmm, how much will I earn??
An approximation is to use the distribution of earnings of current graduates, to substitute for our lack of knowledge of $P(Y = y_k)$ for each k.
• Suppose there are K current graduates and no two of them earn exactly the same annual wage: earnings then take K distinct values, each observed once, i.e. with weight 1/K.
• Hence an approximation of expected earnings is $E(Y) \approx \sum_{k=1}^{K} y_k \times \frac{1}{K}$
• This is the average earnings of current graduates…
• But that’s only an approximation!! What could be wrong?

Properties of the expectation
The expectation of a sum (or difference) is the sum (or difference) of the expectations:
• E(earnings – debt) = E(earnings) – E(debt)
The expectation of a constant times a random variable is the constant times the expectation:
• E(constant × earnings) = constant × E(earnings)
  E.g. E(earnings in AED) = 3.6 × E(earnings in USD)
Beware!!!
• E(X Y) is not E(X) E(Y) in general.
• When X and Y are independent, E(X Y) = E(X) E(Y).
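A minimal sketch of the expectation formula and of the properties listed above, using Python's exact `fractions.Fraction` so that equalities hold exactly. It assumes a fair coin and a fair die; the helper `expectation` is our own, and the four earnings figures are hypothetical placeholders, not data on actual graduates.

```python
from fractions import Fraction
from itertools import product


def expectation(dist):
    """E(Y) = sum over k of y_k * P(Y = y_k), for a dict {y_k: P(Y = y_k)}."""
    return sum(y * p for y, p in dist.items())


# Coin game: +10 AED on heads, -10 AED on tails (assumes a fair coin).
gain = {10: Fraction(1, 2), -10: Fraction(1, 2)}

# Fair die, using exact fractions so that equality checks are exact.
die = {k: Fraction(1, 6) for k in range(1, 7)}

# Approximation with K observed values, each weighted 1/K (hypothetical numbers).
observed = [50_000, 62_000, 71_000, 58_000]
approx = sum(y * Fraction(1, len(observed)) for y in observed)

# E(c Y) = c E(Y), with c = 3.6 as in the AED/USD example.
c = Fraction(18, 5)
scaled = {c * y: p for y, p in die.items()}

# Two independent dice: P(X = x and Y = y) = P(X = x) P(Y = y).
joint = {(x, y): die[x] * die[y] for x, y in product(die, die)}
e_sum = sum((x + y) * p for (x, y), p in joint.items())   # E(X + Y)
e_prod = sum(x * y * p for (x, y), p in joint.items())    # E(X Y)

# Not independent: the die times itself, i.e. E(X X) = E(X^2).
e_square = sum(x * x * p for x, p in die.items())

print(expectation(gain))                             # 0
print(expectation(die))                              # 7/2
print(approx)                                        # 60250 (the plain average)
print(expectation(scaled) == c * expectation(die))   # True
print(e_sum == 2 * expectation(die))                 # True
print(e_prod == expectation(die) ** 2)               # True
print(e_square, expectation(die) ** 2)               # 91/6 49/4
```

The last line is the "Beware" case: a die is not independent of itself, so E(X X) = 91/6 differs from E(X) E(X) = 49/4.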
• Law of iterated expectations: E(X) = E(E(X | Z))

Outline: 2. The normal distribution

A particular distribution
• Some random variables have a particular “bell-shaped” distribution:
  – Individuals’ height.
    • What is the distribution of height at age 20? P(height)
    • What height can I expect for my child? E(height)
  – Individuals’ weight.
    • What is the distribution of weight at age 35? P(weight)
    • What weight can I expect at age 35? E(weight)
  – The logarithm of income.
    • What is the distribution of the log of income after graduation? P(log(income))
    • What log income can I expect after graduation? E(log(income))
• The “bell-shaped” distribution will now be called a “normal” distribution.

The normal distribution
• “The normal distribution is symmetric, bell shaped, and characterized by its mean $\mu$ and standard deviation $\sigma$.
• The probability within any particular number of standard deviations of $\mu$ is the same for all normal distributions.”
  $P(\mu - \sigma < \text{height} < \mu + \sigma) \approx 0.68$, or 68%
  $P(\mu - 2\sigma < \text{height} < \mu + 2\sigma) \approx 0.95$, or 95%
  $P(\mu - 3\sigma < \text{height} < \mu + 3\sigma) \approx 0.997$, or 99.7%
  All of these are “events”.

The normal distribution
• Draw a histogram with a very small bin size… so that the little stairs disappear… and a curve appears.

Comparing test scores across colleges
• “Early paleontology in Indianapolis”: test scores have a normal distribution with mean 3 and standard deviation 4.
• “Hip hop in the Middle East”: test scores have a normal distribution with mean 4 and standard deviation 1.
• Problem: how do I compare Marina’s test score of 3.6 in the paleontology course with a test score of 4.1 in Hip hop in the Middle East?

Z-score!
• Take a student’s paleontology test score at the end of the semester. This is a random variable.
  – Its probability distribution has a mean of $\mu = 3$ and a standard deviation of $\sigma = 4$.
  – Now consider the “z-scored” paleontology test score:
    $z = \frac{\text{paleontology test score} - \mu}{\sigma}$
  – The z-scored paleontology test score has a mean of 0 and a standard deviation of 1.

Standard normal distribution
• This is simply the normal distribution with mean 0 and standard deviation 1.
• A z-score of 3 means that the student’s score is three standard deviations (of the original test scores) above the mean.
So who has a better grade, Marina or Slavoj?

Outline: 3. Polls and normal distributions

Who will win the midterm elections in the US?
• Midterm elections are held two years after the presidential election in the United States.
• They take place in early November 2014.
• A question: what fraction of the voters will vote for a Democrat in Colorado?

Wrap up
• A random variable is a variable whose value has not been realized yet.
• The expectation of a random variable Y is: $E(Y) = \sum_{k=1}^{K} y_k \, P(Y = y_k)$
  Also, E(X + Y) = E(X) + E(Y), E(cX) = c E(X), and E(E(X | Z)) = E(X).
• Typically the probability distribution P is not known, but we approximate it…
  – using the distribution of past values of Y (example: earnings of previous graduates);
  – using polls, for example to ask individuals how they will vote.
• The normal distribution is a ubiquitous distribution that is symmetric and bell shaped. It is characterized by its mean $\mu$ and its standard deviation $\sigma$.
• The standard normal distribution has mean 0 and standard deviation 1.

Coming up
• Readings: Chapter 4 entirely (full of interesting examples and super relevant).
• Online quiz tonight.
• Go to the website http://www.realclearpolitics.com/epolls/2014/senate/co/colorado_senate_gardner_vs_udall-3845.html and prepare one or two slides to present the race in Colorado.
  – Who do you think will win?
  – What is the MoE (margin of error)?
  – What is the likely distribution of the “fraction of voters who will vote for Gardner”?

For help:
• Amine Ouazad
  Office 1135, Social Science building
  amine.ouazad@nyu.edu
  Office hour: Tuesday from 5 to 6.30pm.
• GAF: Irene Paneda
  Irene.paneda@nyu.edu
  Sunday recitations.
  At the Academic Resource Center, Monday from 2 to 4pm.
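The test-score comparison and the 68/95/99.7 figures from the normal-distribution slides can be checked with a short sketch, assuming Python 3.8 or later for the standard-library `statistics.NormalDist`. The helper `z_score` and the variable names are our own. Because both courses' scores are assumed normal, ranking students by z-score and ranking them by the share of their class scoring below them (the normal CDF) give the same answer.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """How many standard deviations x lies above (positive) or below (negative) mu."""
    return (x - mu) / sigma


# Course distributions as stated on the slides.
paleo = NormalDist(mu=3, sigma=4)    # Early paleontology in Indianapolis
hiphop = NormalDist(mu=4, sigma=1)   # Hip hop in the Middle East

z_marina = z_score(3.6, paleo.mean, paleo.stdev)
z_hiphop = z_score(4.1, hiphop.mean, hiphop.stdev)
print(round(z_marina, 2), round(z_hiphop, 2))

# Share of each class scoring below each student (the normal CDF).
# For normal scores this ranks students exactly as the z-scores do.
print(round(paleo.cdf(3.6), 3), round(hiphop.cdf(4.1), 3))

# 68-95-99.7 rule on the standard normal (mean 0, standard deviation 1):
# P(mu - k*sigma < X < mu + k*sigma) for k = 1, 2, 3.
std = NormalDist()
for k in (1, 2, 3):
    print(k, round(std.cdf(k) - std.cdf(-k), 4))  # about 0.6827, 0.9545, 0.9973
```

Comparing `z_marina` with `z_hiphop` answers the "Marina or Slavoj?" question on the slides.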