Random Variables

Experiment: Let X be a whole number chosen at random from 1 to 10. X is called a random variable. The range of X, written Range(X), is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.

Experiment: Flip a coin. Suppose that you win $1 for heads and lose $1 for tails. Let Y be the amount that you win or lose. Y, too, is a random variable. Range(Y) = {-1, 1}.

Experiment: Flip a coin 100 times. Let Z be the number of times out of 100 that heads appears. Z is a random variable. Range(Z) = {0, 1, 2, 3, ..., 97, 98, 99, 100}.

In general, a random variable X is a variable that can assume various values with various probabilities.

A more technical definition might be: A random variable associated with a sample space S is a function whose domain is S and whose range is a set of real numbers.

So, for example: Flip a coin and win/lose $1. The sample space is {H, T}. The random variable Y is simply the function

    H → 1
    T → -1

For our purposes, the first, more intuitive, informal definition will suffice.

There are two distinct types of random variables: discrete and continuous.

A discrete random variable is one whose
1. range is finite, e.g. {1, 2, 3, 4, 5, 6}, or
2. range is infinite but can be explicitly listed, e.g. {1, 2, 3, 4, ...} or {2, 4, 6, 8, 10, ...} or perhaps {-1, 1, -2, 2, -3, 3, -4, 4, ...}.

A continuous random variable is one whose range contains an entire interval, e.g. (-∞, +∞) or (-1, 1). The range of a discrete random variable contains no complete interval of real numbers.

For now, we will be concerned with discrete random variables.

Example: A card is selected from a deck. If the card is an Ace you lose $100. If it is a picture card you win $20. If it is any other card you win $5. Let X be the amount you win or lose. X is a discrete random variable. Range(X) = {-100, 5, 20} (finite). Would you play this game?

Example: Let X be the number of hits on some website which occur in a given month. Range(X) = {0, 1, 2, 3, 4, 5, 6, ...} (infinite, no limit). X is a discrete random variable.
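As a small illustration of the more technical definition given earlier (a random variable as a function on a sample space), here is a minimal Python sketch of the coin-flip game. The names `sample_space`, `Y`, and `range_of` are chosen only for this illustration and are not part of the notes.

```python
# Sample space for a single coin flip.
sample_space = ["H", "T"]

def Y(outcome):
    """Random variable for the coin game: payoff in dollars for each outcome."""
    return 1 if outcome == "H" else -1

def range_of(random_variable, space):
    """Set of values the random variable takes over the sample space."""
    return {random_variable(outcome) for outcome in space}

# Range(Y) is the set containing -1 and 1.
print(range_of(Y, sample_space))
```

The same `range_of` helper would work for any random variable defined on a finite sample space that can be listed explicitly.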
Notation: Uppercase letters are usually used to denote random variables.

Every discrete random variable has:
1. a probability distribution
2. an expected value or mean value
3. a variance and standard deviation

Definition: Let X be a discrete random variable with range \( \{x_1, x_2, x_3, \ldots\} \). The probability distribution (or probability mass function [pmf] or probability function [pf]) of X is a function p such that

\[ p(x_i) = \mathrm{Prob}(X = x_i) \quad \text{for each } x_i \text{ in Range}(X). \]

We often describe the probability distribution of a random variable using a table or graph. An example should clarify the definition.

Example: Roll a die. Let X be the number of spots. Range(X) = {1, 2, 3, 4, 5, 6}.

Probability function:

    x       1    2    3    4    5    6
    p(x)   1/6  1/6  1/6  1/6  1/6  1/6

Graphically: [Figure: graph of p(x)]

Or: p(x) = 1/6 for x = 1, 2, 3, 4, 5, 6.

The probability distribution of a random variable tells us what to "expect" of the random variable. Theoretically, we expect to roll a 1 one-sixth of the time, a 2 one-sixth of the time, and so on. You might think of the probability distribution of a random variable as a description of its theoretical behavior.

Expected value or mean of a random variable:

Suppose that we roll a die many times and compute the average of all the values rolled. Theoretically, what would we expect? We would expect to roll:

    1: 1/6 of the time
    2: 1/6 of the time
    3: 1/6 of the time
    4: 1/6 of the time
    5: 1/6 of the time
    6: 1/6 of the time

So the "average value" or "expected value" might be:

\[ 1\cdot\tfrac16 + 2\cdot\tfrac16 + 3\cdot\tfrac16 + 4\cdot\tfrac16 + 5\cdot\tfrac16 + 6\cdot\tfrac16 = 3.5 \]

This is a weighted average.

For example, if we rolled the die 60,000 times, in theory we expect each of the values 1, 2, 3, 4, 5, and 6 to appear 10,000 times. The average of these 60,000 numbers is 3.5.

Thus, if the random variable X is the number of spots that appear on a roll of a single die, we say that the expected value or mean of X is 3.5, i.e. \( E(X) = 3.5 \) or \( \mu = 3.5 \) or \( \mu_X = 3.5 \).

You can think of the expected value (i.e.
mean) of a random variable as a theoretical average or a long-run average. If we roll a die 100,000,000 times, in theory the values should average out to 3.5.

Definition: If X is a discrete random variable with probability function p(x) and Range(X) = \( \{x_1, x_2, x_3, x_4, \ldots\} \), then E(X) is defined as

\[ E(X) = \mu = x_1 p(x_1) + x_2 p(x_2) + x_3 p(x_3) + x_4 p(x_4) + \cdots \]

Notice that each value in Range(X) is weighted by its probability.

Example: Flip a fair coin. Win $1 for heads, lose $1 for tails. Let X be the amount that you win or lose. Range(X) = {-1, 1}.

Probability distribution of X:

    x      -1    1
    p(x)   1/2  1/2

\[ E(X) = \mu = 1\cdot\tfrac12 + (-1)\cdot\tfrac12 = 0 \]

So, in the long run we expect to break even.

Example: Roulette. Rules: Bet $1 on Red. If a red number appears you win $1; otherwise you lose $1. Let X be the amount you win or lose. Range(X) = {-1, 1}.

Note: There are 18 red numbers, 18 black numbers, and also 0 and 00, so on average you win 18 times out of 38 and lose 20 times out of 38.

Probability function:

    x       -1      1
    p(x)   20/38  18/38

\[ E(X) = 1\cdot\tfrac{18}{38} + (-1)\cdot\tfrac{20}{38} = -\tfrac{2}{38} \approx -0.05 \]

The game is not fair. On average you would expect to lose about 5 cents per game, so the casino takes about a nickel for every dollar bet in this way.

A fair game is one for which the expected value is 0.

Example: A card is selected from a deck. If the card is an Ace you lose $100. If it is a picture card you win $20. If it is any other card you win $5. Let X be the amount you win or lose. X is a discrete random variable. Range(X) = {-100, 5, 20}.

The probability function:

    x      -100     5     20
    p(x)   4/52  36/52  12/52

\[ E(X) = -100\cdot\tfrac{4}{52} + 20\cdot\tfrac{12}{52} + 5\cdot\tfrac{36}{52} = -\tfrac{400}{52} + \tfrac{420}{52} = \tfrac{20}{52} \approx 0.38 \]

The expected value is positive. The game favors the player. (If, instead, you lost $110 for an Ace, E(X) would be about -0.38, which is unfavorable.)

Variance of a random variable:

The variance of a random variable X is a measure of the average amount of (squared) deviation from the mean μ (or E(X)).
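If X is a random variable with \( E(X) = \mu \) and Range(X) = \( \{x_1, x_2, x_3, \ldots\} \), then

\[ \mathrm{Var}(X) = (x_1-\mu)^2 p(x_1) + (x_2-\mu)^2 p(x_2) + (x_3-\mu)^2 p(x_3) + (x_4-\mu)^2 p(x_4) + \cdots \]

The standard deviation of a random variable X is the square root of Var(X).

Notation: the variance Var(X) is written \( \sigma^2 \) and the standard deviation is written \( \sigma \). Another way to denote variance is

\[ \mathrm{Var}(X) = \sigma^2 = E[(X-\mu)^2] \]

Example: Roll a die. Let X be the number of spots. Range(X) = {1, 2, 3, 4, 5, 6}.

Probability function:

    x       1    2    3    4    5    6
    p(x)   1/6  1/6  1/6  1/6  1/6  1/6

\( E(X) = \mu = 3.5 \)

\[ \mathrm{Var}(X) = \sigma^2 = (1-3.5)^2\cdot\tfrac16 + (2-3.5)^2\cdot\tfrac16 + (3-3.5)^2\cdot\tfrac16 + (4-3.5)^2\cdot\tfrac16 + (5-3.5)^2\cdot\tfrac16 + (6-3.5)^2\cdot\tfrac16 = \tfrac{17.5}{6} \approx 2.92 \]

\[ \sigma = \sqrt{2.92} \approx 1.708 \]

Shortcut formula for variance: With some ordinary algebra you can derive the following shortcut formula for \( \sigma^2 \):

\[ \mathrm{Var}(X) = \sigma^2 = E(X^2) - [E(X)]^2 \]

Example using this version:

    x       1    2    3    4    5    6
    x^2     1    4    9   16   25   36
    p(x)   1/6  1/6  1/6  1/6  1/6  1/6

\[ E(X) = 1\cdot\tfrac16 + 2\cdot\tfrac16 + 3\cdot\tfrac16 + 4\cdot\tfrac16 + 5\cdot\tfrac16 + 6\cdot\tfrac16 = \tfrac{21}{6} = 3.5 \]

\[ E(X^2) = 1\cdot\tfrac16 + 4\cdot\tfrac16 + 9\cdot\tfrac16 + 16\cdot\tfrac16 + 25\cdot\tfrac16 + 36\cdot\tfrac16 = \tfrac{91}{6} \approx 15.167 \]

\[ \mathrm{Var}(X) = \sigma^2 = E(X^2) - [E(X)]^2 = 15.167 - (3.5)^2 \approx 2.92 \]

Summary: If X is a discrete random variable with
  o Range(X) = \( \{x_1, x_2, x_3, \ldots\} \) and
  o probability function p(x)
then

\[ E(X) = \mu = x_1 p(x_1) + x_2 p(x_2) + x_3 p(x_3) + x_4 p(x_4) + \cdots \]

\[ \mathrm{Var}(X) = \sigma^2 = (x_1-\mu)^2 p(x_1) + (x_2-\mu)^2 p(x_2) + (x_3-\mu)^2 p(x_3) + (x_4-\mu)^2 p(x_4) + \cdots \]

or

\[ \mathrm{Var}(X) = \sigma^2 = E(X^2) - [E(X)]^2 \]

Example: Roll 2 dice. X is the total number of spots. Range(X) = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

Probability function:

    x        2     3     4     5     6     7     8     9    10    11    12
    p(x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Picture: [Figure: graph of p(x)]

    x        2     3     4     5     6     7     8     9    10    11    12
    x^2      4     9    16    25    36    49    64    81   100   121   144
    p(x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

\[ E(X) = 2\cdot\tfrac{1}{36} + 3\cdot\tfrac{2}{36} + 4\cdot\tfrac{3}{36} + 5\cdot\tfrac{4}{36} + 6\cdot\tfrac{5}{36} + 7\cdot\tfrac{6}{36} + 8\cdot\tfrac{5}{36} + 9\cdot\tfrac{4}{36} + 10\cdot\tfrac{3}{36} + 11\cdot\tfrac{2}{36} + 12\cdot\tfrac{1}{36} = \tfrac{252}{36} = 7 \]

\[ E(X^2) = 4\cdot\tfrac{1}{36} + 9\cdot\tfrac{2}{36} + 16\cdot\tfrac{3}{36} + 25\cdot\tfrac{4}{36} + 36\cdot\tfrac{5}{36} + 49\cdot\tfrac{6}{36} + \cdots + 144\cdot\tfrac{1}{36} = \tfrac{1974}{36} \approx 54.83 \]

\[ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 54.83 - 7^2 = 5.83 \quad \text{(using the shortcut formula)} \]

Thus \( \mu_X = 7 \), \( \sigma_X^2 = 5.83 \), and \( \sigma_X = \sqrt{5.83} \)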
\( \approx 2.41 \).

An important property of E(X):

Suppose X is a whole number chosen at random between 1 and 10 inclusive. Range(X) = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.

Probability distribution: p(x) = 1/10 for all x in Range(X), i.e.

    x        1     2     3     4     5     6     7     8     9    10
    p(x)   1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10

So

\[ E(X) = 1(0.1) + 2(0.1) + 3(0.1) + 4(0.1) + 5(0.1) + 6(0.1) + 7(0.1) + 8(0.1) + 9(0.1) + 10(0.1) = 5.5 \]

Now define another random variable Y = X + 5. Range(Y) = {6, 7, 8, 9, 10, 11, 12, 13, 14, 15}.

The probability function for Y is

    y        6     7     8     9    10    11    12    13    14    15
    p(y)   1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10

Calculate E(Y):

\[ E(Y) = 6(0.1) + 7(0.1) + 8(0.1) + 9(0.1) + 10(0.1) + 11(0.1) + 12(0.1) + 13(0.1) + 14(0.1) + 15(0.1) = 10.5 \]

Notice that E(Y) = E(X + 5) = 10.5, which is E(X) + 5.

Similarly, let Z = 10X. Then Range(Z) = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}. The probabilities are still 1/10 for each value in the range.

\[ E(Z) = 10(0.1) + 20(0.1) + 30(0.1) + \cdots + 100(0.1) = 55 \]

Thus E(Z) = E(10X) = 55, which is 10·E(X).

In general: If X is a random variable and a and b are real numbers, then

\[ E(aX + b) = aE(X) + b \]

Example: Roll two dice. X is the total number of spots. Range(X) = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}; E(X) = 7. Let Y = 2X + 4. (Roll the dice, multiply the total by 2 and add 4.) Range(Y) = {8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28}.

\[ E(Y) = E(2X + 4) = 2E(X) + 4 = 2\cdot 7 + 4 = 18 \]

Variance:

\[ \mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X) \]

Adding b shifts every value and the mean by the same amount, so the deviations from the mean do not change. Multiplying by a multiplies every deviation by a, and therefore every squared deviation by \( a^2 \). It is the standard deviation, not the variance, that is multiplied by |a|.

Example: Let X be a number chosen at random from 1 to 6 (roll a die). We saw E(X) = 3.5 and Var(X) = 2.92. Let Y = 3X + 5. (Roll a die, multiply by 3 and add 5.) Range(Y) = {8, 11, 14, 17, 20, 23}.

\[ E(Y) = E(3X + 5) = 3E(X) + 5 = 3(3.5) + 5 = 15.5 \]

\[ \mathrm{Var}(Y) = \mathrm{Var}(3X + 5) = 3^2\,\mathrm{Var}(X) = 9\cdot\tfrac{35}{12} = 26.25 \]

Here 35/12 ≈ 2.92 is the exact value of Var(X).

Example: A family has 4 children. Let X be the number of girls. Find:
a. Range(X)
b. the distribution of X
c. E(X)
d. Var(X)

a.
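Range(X) = {0, 1, 2, 3, 4}

b. Probability distribution: Here is the sample space (16 outcomes, taken to be equally likely):

    BBBB
    BBBG BBGB BGBB GBBB
    BBGG BGBG BGGB GGBB GBGB GBBG
    GGGB GGBG GBGG BGGG
    GGGG

Counting the outcomes with 0, 1, 2, 3, and 4 girls gives:

    x        0     1     2     3     4
    p(x)   1/16  4/16  6/16  4/16  1/16

c.
\[ E(X) = 0\cdot\tfrac{1}{16} + 1\cdot\tfrac{4}{16} + 2\cdot\tfrac{6}{16} + 3\cdot\tfrac{4}{16} + 4\cdot\tfrac{1}{16} = \tfrac{32}{16} = 2 \]

d. To calculate Var(X), first get \( E(X^2) \):

\[ E(X^2) = 0\cdot\tfrac{1}{16} + 1\cdot\tfrac{4}{16} + 4\cdot\tfrac{6}{16} + 9\cdot\tfrac{4}{16} + 16\cdot\tfrac{1}{16} = \tfrac{80}{16} = 5 \]

\[ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 5 - 2^2 = 1 \]

i.e. \( \sigma^2 = 1 \) and \( \sigma = \sqrt{1} = 1 \).

Summary: If Y = aX + b, then

\[ E(Y) = E(aX + b) = aE(X) + b \]

\[ \mathrm{Var}(Y) = \mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X) \]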
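The calculations in these notes can be checked numerically. The following is a minimal Python sketch, not a definitive implementation: it stores a probability function as a dictionary from values to probabilities and uses exact fractions to avoid rounding. The helper names `expectation`, `variance`, and `transform` are assumptions made for this illustration. It also checks that E(aX + b) = aE(X) + b and that the variance of aX + b is a² times Var(X).

```python
from fractions import Fraction
from itertools import product

def expectation(pmf):
    """E(X): each value weighted by its probability."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X): probability-weighted squared deviation from the mean."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def transform(pmf, a, b):
    """Probability function of Y = aX + b (assumes a != 0, so values stay distinct)."""
    return {a * x + b: p for x, p in pmf.items()}

# One fair die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Total of two fair dice, built by listing all 36 equally likely outcomes.
two_dice = {}
for d1, d2 in product(range(1, 7), repeat=2):
    two_dice[d1 + d2] = two_dice.get(d1 + d2, 0) + Fraction(1, 36)

print(expectation(die), variance(die))            # 7/2 and 35/12 (about 2.92)
print(expectation(two_dice), variance(two_dice))  # 7 and 35/6 (about 5.83)

# Y = 3X + 5 for one die.
y = transform(die, 3, 5)
print(expectation(y) == 3 * expectation(die) + 5)  # True
print(variance(y) == 3 ** 2 * variance(die))       # True
```

Using `Fraction` rather than floating point keeps results such as 35/12 exact, which makes the equality checks in the last two lines reliable.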