2.1 Chapter 2: Probability

A random variable (r.v.) is a variable whose value is unknown until it is observed. The value of a random variable results from an experiment. Experiments can be either controlled (laboratory) or uncontrolled (observational). Most economic variables are random and are the result of uncontrolled experiments.

2.2 Random Variables

A discrete random variable can take on only a finite number of values, such as:
• The number of visits to a doctor's office
• The number of children in a household
• The flip of a coin
• A dummy (binary) variable: D = 0 if male, D = 1 if female

A continuous random variable can take any real value (not just whole numbers) in an interval on the real number line, such as:
• Gross Domestic Product next year
• The price of a share in Microsoft
• The interest rate on a 30-year mortgage

2.3 Probability Distributions of Random Variables

• All random variables have probability distributions that describe the values the random variable can take on and the probabilities associated with those values.
• Knowing the probability distribution of a random variable gives us some indication of the value the r.v. may take on.

2.4 Probability Distribution for a Discrete Random Variable

A discrete distribution can be expressed as a table, a graph, or a function.

1. Suppose X = the number of tails when a coin is flipped twice. X can take on the values 0, 1, or 2, with associated probabilities f(x):

   x     f(x)
   0     0.25
   1     0.50
   2     0.25

[Bar graph of f(x) against x: probability is represented as height on the bar graph.]

2.5

2. Suppose X is a binary variable that can take on two values, 0 or 1, with P(X = 1) = p and P(X = 0) = 1 - p.

Function: P(X = x) = f(x) = p^x (1 - p)^(1 - x) for x = 0, 1

Table:
   x     f(x)
   0     1 - p
   1     p

Suppose p = 0.10. Then X takes on the value 0 with probability 0.90 and the value 1 with probability 0.10.

2.6 Facts about Discrete Probability Distribution Functions

1. Each probability P(X = x) = f(x) must lie between 0 and 1: 0 ≤ f(x) ≤ 1.
2. The probabilities must sum to 1. If X can take on n different values, then f(x₁) + f(x₂) + ... + f(xₙ) = 1.

2.7 Probability Distribution (Density) for Continuous Random Variables

Expressed as a function or a graph. A continuous r.v. can take on an infinite number of values in a given interval, so a table is not appropriate for expressing its pdf.

Example:
   f(x) = 2x for 0 ≤ x ≤ 1
        = 0 otherwise

2.8

Because a continuous random variable has an uncountably infinite number of values, the probability of any single value occurring is zero: P(X = a) = 0. Instead, we ask, "What is the probability that X is between a and b?": P[a < X < b] = ? In an experiment, the probability P[a < X < b] is the proportion of the time, over many repetitions of the experiment, that X falls between a and b.

2.9

Probability is represented as area under the function, and the total area must be 1.0. For f(x) = 2x on [0, 1], the region under f(x) is a triangle with base 1 and height 2, so its area is ½ · 1 · 2 = 1.0. The probability that X lies between 0 and 1/2 is the area of the triangle with base 1/2 and height f(1/2) = 1:

   P[0 ≤ X ≤ 1/2] = ½ · (1/2) · 1 = 0.25

[The area of any triangle is ½ · base · height.]

2.10 Uniform Random Variable

u is distributed uniformly between a and b. Its pdf is a horizontal line of height 1/(b - a) between a and b:

   f(u) = 1/(b - a) for a ≤ u ≤ b
        = 0 otherwise

Example: Spin the dial on a clock, so a = 0 and b = 12, and f(u) = 1/12. The probability that u lies between 1 and 2 is the area of a rectangle with base 1 and height 1/12:

   P[1 ≤ u ≤ 2] = 1 · (1/12) = 1/12

2.11

In calculus, the integral of a function gives the area under it:

   P[a ≤ X ≤ b] = ∫ₐᵇ f(x) dx

For continuous random variables it is the area under f(x), and not f(x) itself, that defines the probability of an event. We will NOT be integrating functions; when necessary we use tables and/or computers to calculate the required probability (integral).
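As an illustration of letting the computer do the integral, here is a minimal Python sketch (our own addition, using scipy.stats, which the slides do not mention) that reproduces the two probabilities above:

```python
from scipy import stats

# Clock-dial example: u ~ Uniform(0, 12); scale is the interval width b - a
u = stats.uniform(loc=0, scale=12)
p_dial = u.cdf(2) - u.cdf(1)      # P[1 <= u <= 2] = 1/12, about 0.0833

# Triangle example: f(x) = 2x on [0, 1] is the Beta(2, 1) density
x = stats.beta(2, 1)
p_half = x.cdf(0.5)               # P[0 <= X <= 1/2] = 0.25

print(p_dial, p_half)
```

The cdf method returns the accumulated area under the pdf up to its argument, so differencing two cdf values gives exactly the area P[a ≤ X ≤ b] described above.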
2.12 Rules of Summation

Rule 1: Σᵢ₌₁ⁿ xᵢ = x₁ + x₂ + ... + xₙ
Rule 2: Σᵢ₌₁ⁿ a = na
Rule 3: Σᵢ₌₁ⁿ axᵢ = a Σᵢ₌₁ⁿ xᵢ
Rule 4: Σᵢ₌₁ⁿ (xᵢ + yᵢ) = Σᵢ₌₁ⁿ xᵢ + Σᵢ₌₁ⁿ yᵢ

2.13 Rules of Summation (continued)

Rule 5: Σᵢ₌₁ⁿ (axᵢ + byᵢ) = a Σᵢ₌₁ⁿ xᵢ + b Σᵢ₌₁ⁿ yᵢ
Rule 6: x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ = (x₁ + x₂ + ... + xₙ)/n

From Rule 6, we can prove (in class) that Σᵢ₌₁ⁿ (xᵢ - x̄) = 0.

2.14 Rules of Summation (continued)

Rule 7: Σᵢ₌₁ⁿ f(xᵢ) = f(x₁) + f(x₂) + ... + f(xₙ)

Notation: Σ_x f(xᵢ) = Σᵢ f(xᵢ) = Σᵢ₌₁ⁿ f(xᵢ)

Rule 8: Σᵢ₌₁ⁿ Σⱼ₌₁ᵐ f(xᵢ, yⱼ) = Σᵢ₌₁ⁿ [f(xᵢ, y₁) + f(xᵢ, y₂) + ... + f(xᵢ, yₘ)]

The order of summation does not matter:

   Σᵢ₌₁ⁿ Σⱼ₌₁ᵐ f(xᵢ, yⱼ) = Σⱼ₌₁ᵐ Σᵢ₌₁ⁿ f(xᵢ, yⱼ)

2.15 The Mean of a Random Variable

The mean of a random variable is its mathematical expectation, or expected value. For a discrete random variable, this is

   E(X) = Σᵢ xᵢ f(xᵢ) = x₁f(x₁) + x₂f(x₂) + ... + xₙf(xₙ)

where n is the number of values X can take on. It is a probability-weighted average of the possible values the random variable X can take on. This is a sum for discrete r.v.'s and an integral for continuous r.v.'s.

2.16

• E(X) tells us the "long-run" average value for X. It is not necessarily a value one would expect X to take on.
• If you were to randomly draw values of X from its pdf an infinite number of times and average these values, you would get E(X).
• E(X) = μ. This Greek letter "mu" is not used in your text but is commonly used to denote the mean of X.

2.17 Example: Roll a fair die

   E(X) = Σᵢ₌₁⁶ xᵢ f(xᵢ) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 21/6 = 3.5

Interpretation: In a large number of rolls of a fair die, one-sixth of the values will be 1's, one-sixth of the values will be 2's, etc., and the average of these values will be 3.5.

2.18 Mathematical Expectation

• Think of E(·) as an operator that requires you to weight any expression inside the parentheses by the probabilities, and then sum:

   E(g(X)) = Σᵢ g(xᵢ)f(xᵢ) = g(x₁)f(x₁) + g(x₂)f(x₂) + ... + g(xₙ)f(xₙ)

2.19 Rules of Mathematical Expectation

• E(c) = c, where c is a constant
• E(cX) = cE(X), where c is a constant and X is a random variable
• E(a + cX) = a + cE(X), where a and c are constants and X is a random variable

2.20 Variance of a Random Variable

• Like the mean, the variance of a r.v. is an expected value, but it is the expected value of the squared deviations from the mean.
• Let g(x) = (x - E(X))².
• Variance: σ² = Var(X) = E[(X - E(X))²] = Σᵢ g(xᵢ)f(xᵢ) = Σᵢ (xᵢ - E(X))² f(xᵢ)
• It measures the amount of dispersion in the possible values for X.

2.21 About Variance

• The unit of measurement is X's units squared.
• When we create a new random variable as a linear transformation of X, y = a + cx, we know that E(y) = a + cE(x), but Var(y) = c²Var(x) (proof in class).
• This property tells us that the amount of variation in y is determined by the amount of variation in x and the constant c. The additive constant a in no way alters the amount of variation in the values of x.

2.22 About Variance (con't)

• E[(x - E(x))²] = E[x² - 2E(x)x + E(x)²] = E(x²) - 2E(x)E(x) + E(x)² = E(x²) - 2E(x)² + E(x)² = E(x²) - E(x)²
• Run the E(·) operator through, pulling out constants and stopping on random variables. Remember that E(x) is itself a constant, so E(E(x)) = E(x).

2.23 Standard Deviation

• Because variance is in squared units of the r.v., we take the square root of the variance to obtain the standard deviation:

   σ = √σ² = √Var(x)

Be sure to take the square root after you square and sum the deviations from the mean.
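To make the E(·) operator concrete, here is a minimal numpy sketch (our own illustration, not from the slides) of the die example: it computes the mean as a probability-weighted sum and the variance via the shortcut Var(X) = E(X²) - E(X)² from slide 2.22:

```python
import numpy as np

x = np.arange(1, 7)          # faces of a fair die
f = np.full(6, 1 / 6)        # equal probabilities, f(x) = 1/6

mean = np.sum(x * f)                  # E(X) = 3.5
var = np.sum(x**2 * f) - mean**2      # E(X^2) - E(X)^2, about 2.9167
sd = np.sqrt(var)                     # about 1.7078

print(mean, var, sd)
```

Note that the square root is taken only after the squared deviations have been weighted and summed, exactly as the standard-deviation slide warns.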
2.24 Joint Probability

• An experiment can randomly determine the outcome of more than one variable.
• When there are 2 random variables of interest, we study the joint probability density function.
• When there are more than 2 random variables of interest, we study the multivariate probability density function.

2.25

For a discrete joint pdf, probability is expressed in a matrix. Let X = return on stocks and Y = return on bonds:

            X = -10   X = 0   X = 10   X = 20   f(y)
   Y = 6      0        0       0.10     0.10    0.20
   Y = 8      0        0.10    0.30     0.20    0.60
   Y = 10     0.10     0.10    0        0       0.20
   f(x)       0.10     0.20    0.40     0.30

P(X = x, Y = y) = f(x,y), e.g. P(X = 10, Y = 8) = 0.30.

2.26 About Joint pdf's

• Marginal Probability Distribution: What is the probability distribution for X regardless of what values Y takes on? f(x) = Σ_y f(x,y). What is the probability distribution for Y regardless of what values X takes on? f(y) = Σ_x f(x,y).

2.27

• Conditional Probability Distribution: What is the probability distribution for X given that Y takes on a particular value? f(x|y) = f(x,y)/f(y). What is the probability distribution for Y given that X takes on a particular value? f(y|x) = f(x,y)/f(x).

2.28

• Covariance: a measure that summarizes the joint probability distribution between two random variables:

   cov(X,Y) = E[(X - E(X))(Y - E(Y))] = Σ_x Σ_y (xᵢ - E(X))(yⱼ - E(Y)) f(x,y)

2.29 About Covariance

It measures the joint association between 2 random variables. Try asking: "When X is large, is Y more or less likely to also be large?" If the answer is that Y is likely to be large when X is large, then we say X and Y have a positive relationship: cov(X,Y) > 0. If the answer is that Y is likely to be small when X is large, then we say that X and Y have a negative relationship: cov(X,Y) < 0.

A useful shortcut:

   cov(X,Y) = E[(X - E(X))(Y - E(Y))]
            = E[XY - E(X)Y - XE(Y) + E(X)E(Y)]
            = E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y)
            = E(XY) - E(X)E(Y)

2.30

• Correlation: Covariance has awkward units of measurement. Correlation removes all units of measurement by dividing covariance by the product of the standard deviations:

   ρ_xy = cov(X,Y)/(σ_x σ_y), and -1 ≤ ρ_xy ≤ 1

2.31 What does correlation look like?

[Scatter plots for ρ = 0, ρ = .3, ρ = .7, and ρ = .9.]

2.32 Statistical Independence

Two random variables are statistically independent if knowing the value that one will take on does not reveal anything about what value the other may take on:

   f(x|y) = f(x) or f(y|x) = f(y)

This implies that f(x,y) = f(x)f(y) if X and Y are independent. If 2 r.v.'s are independent, then their covariance will necessarily be equal to 0.

2.33 Functions of More than One Random Variable

Suppose that X and Y are two random variables. If we combine them linearly, we create a new random variable with the following mean and variance:

   Z = aX + bY
   E(Z) = E(aX + bY) = aE(X) + bE(Y)
   Var(Z) = Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab·cov(X,Y)

If X and Y are independent:

   Var(Z) = Var(aX + bY) = a²Var(X) + b²Var(Y)

See page 31.
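To tie these formulas together, here is a minimal numpy sketch (our own illustration, not part of the slides) that recovers the marginals, covariance, correlation, and a portfolio variance from the stocks-and-bonds table of slide 2.25:

```python
import numpy as np

x = np.array([-10, 0, 10, 20])             # stock returns (columns)
y = np.array([6, 8, 10])                   # bond returns (rows)
f = np.array([[0.00, 0.00, 0.10, 0.10],    # joint pdf f(x, y)
              [0.00, 0.10, 0.30, 0.20],
              [0.10, 0.10, 0.00, 0.00]])

fx = f.sum(axis=0)                         # marginal f(x): sum over y
fy = f.sum(axis=1)                         # marginal f(y): sum over x

ex, ey = x @ fx, y @ fy                    # E(X) = 9, E(Y) = 8
cov = y @ f @ x - ex * ey                  # E(XY) - E(X)E(Y) = -8

sx = np.sqrt(x**2 @ fx - ex**2)            # sd of X
sy = np.sqrt(y**2 @ fy - ey**2)            # sd of Y
rho = cov / (sx * sy)                      # correlation, about -0.67

# Variance of a portfolio Z = X + Y (slide 2.33 with a = b = 1)
var_z = sx**2 + sy**2 + 2 * cov

print(fx, fy, cov, rho, var_z)
```

The negative covariance says that, in this table, years with high stock returns tend to be years with low bond returns, and the 2ab·cov(X,Y) term then reduces the variance of the combined portfolio.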
2.34 Normal Probability Distribution

• Many random variables tend to have a normal distribution (a well-known bell shape).
• Theoretically, x ~ N(β, σ²), where E(x) = β and Var(x) = σ². The probability density function is

   f(x) = (1/√(2πσ²)) exp[-(x - β)²/(2σ²)], -∞ < x < ∞

[Graph: bell-shaped curve with the area between a and b shaded.]

2.35 Normal Distribution (con't)

• A family of distributions, each with its own mean and variance. The mean anchors the distribution's center and the variance captures the spread of the bell-shaped curve.
• Finding the area under the curve would require integrating the pdf, which is too complicated. A computer-generated table gives all the probabilities we need for a normal r.v. that has mean 0 and variance 1. To use the table (pg. 389), we take a normal random variable x ~ N(β, σ²) and transform it by subtracting the mean and dividing by the standard deviation. This is a linear transformation of x that creates a new random variable with mean 0 and variance 1:

   Z = (x - β)/σ, where Z ~ N(0,1)

2.36 Statistical Inference: drawing conclusions about a population based on a sample

Each population quantity has a sample counterpart, computed from a sample of T observations:

   Population mean: E(X) = μ_x = Σᵢ xᵢ f(xᵢ)
   Sample mean: x̄ = (1/T) Σₜ₌₁ᵀ xₜ

   Population variance: Var(X) = σ_x² = E[(X - E(X))²] = E(X²) - μ_x²
   Sample variance: s_x² = Σₜ (xₜ - x̄)² / (T - 1)

   Population standard deviation: σ_x = √Var(X)
   Sample standard deviation: s_x = √s_x²

   Population covariance: cov(X,Y) = E[(X - μ_x)(Y - μ_y)] = E(XY) - μ_x μ_y
   Sample covariance: S_xy = (1/(T - 1)) Σₜ (xₜ - x̄)(yₜ - ȳ)

   Population correlation: ρ_xy = cov(X,Y) / √(Var(X) Var(Y))
   Sample correlation: r = S_xy / √(s_x² s_y²) = Σₜ (xₜ - x̄)(yₜ - ȳ) / √(Σₜ (xₜ - x̄)² Σₜ (yₜ - ȳ)²)
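Here is a minimal Python sketch of the standardization step (our own illustration with made-up numbers; scipy.stats.norm plays the role of the table on pg. 389):

```python
from scipy import stats

beta, sigma = 3.0, 2.0            # hypothetical example: x ~ N(3, 4)
a, b = 4.0, 6.0                   # want P(4 < x < 6)

# Standardize the endpoints: Z = (x - beta) / sigma ~ N(0, 1)
za = (a - beta) / sigma           # 0.5
zb = (b - beta) / sigma           # 1.5

z = stats.norm(0, 1)              # the standard normal distribution
p = z.cdf(zb) - z.cdf(za)         # P(0.5 < Z < 1.5) = P(4 < x < 6)

print(p)                          # about 0.2417
```

Because the transformation is linear, the rules E(a + cX) = a + cE(X) and Var(a + cX) = c²Var(X) from slides 2.19 and 2.21 guarantee that Z has mean 0 and variance 1.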