MDM4U
Probability Distributions

Probability experiments often have numeric outcomes that are counted or measured. A random variable is a variable whose possible values are the numerical outcomes of a probability experiment. Random variables are usually denoted by upper-case (capital) letters, and their possible values by the corresponding lower-case letters, so that we talk about events of the form [X = x]. A random variable, X, takes a single value, x, for each outcome of the experiment.

Example
Show a random variable for tossing a coin.
Random variable X = “number of heads in one coin toss”.
Possible values of X: x ∈ {0, 1}.

Example
Show a random variable for rolling a die.
Random variable X = “number showing on the top face of the die”.
Possible values of X: x ∈ {1, 2, 3, 4, 5, 6}.

Types of random variables
There are two types of random variables: discrete and continuous.
1. A discrete random variable can take only a countable number of values, so all of its possible values can be listed. Although this set of values may be infinite (but countable!), for many discrete random variables it is finite; these are called finite random variables.
2. A continuous random variable can take any real value in some interval on the number line. The set of all possible values of a continuous random variable cannot be listed.

Example
Classify the following random variables as discrete or continuous.
1. The length of time you stay in a class.
2. The number of classes you attend in a day.

Random variables are described by their probabilities. For a discrete random variable, the probability distribution lists each possible value together with the probability of that value:

x      P[X = x]
x₁     p₁
x₂     p₂
x₃     p₃
…      …
xₙ     pₙ

Example
Probability distribution for the number of tattoos each student has in a population of students:

Tattoos (x)   Probability P[X = x]
0             0.850
1             0.120
2             0.015
3             0.010
4             0.005

Note: Each individual probability must be between 0 and 1, inclusive, and the probabilities across the whole distribution must total 1:
0.850 + 0.120 + 0.015 + 0.010 + 0.005 = 1.

The notation P(x) is often used for P[X = x]. In this example, P(4) = P[X = 4] = 0.005.

Example
Probability distribution for the number of heads in 4 flips of a coin:

Heads (x)   Probability P(x)
0           1/16
1           4/16 = 1/4
2           6/16 = 3/8
3           4/16 = 1/4
4           1/16

P(0) = (1/2)⁴ = 1/16
P(1) = C(4, 1)(1/2)⁴ = 4 ∙ 1/16 = 1/4
P(2) = C(4, 2)(1/2)⁴ = 6 ∙ 1/16 = 3/8
P(3) = C(4, 3)(1/2)⁴ = 4 ∙ 1/16 = 1/4
P(4) = (1/2)⁴ = 1/16

Check: 1/16 + 1/4 + 3/8 + 1/4 + 1/16 = 1.

The cumulative probabilities are given by
F(x) = P[X ≤ x],
the probability that X takes a value less than or equal to x. The function F is called the cumulative distribution function (CDF).

For example, consider a random variable X with probabilities

x     P(x)
0     0.05
1     0.10
2     0.20
3     0.40
4     0.15
5     0.10

For this example,
F(3) = P[X ≤ 3] = P(0) + P(1) + P(2) + P(3) = 0.05 + 0.10 + 0.20 + 0.40 = 0.75.

One can of course list all the values of the CDF easily by taking cumulative sums:

x     P(x)    F(x)
0     0.05    0.05
1     0.10    0.15
2     0.20    0.35
3     0.40    0.75
4     0.15    0.90
5     0.10    1.00

The values of F never decrease, and the final value is 1.

Expected Value
The expected value of X is the weighted average of the possible outcomes, with each value weighted by its probability. It is denoted either E(X) or μ, and is defined as
E(X) = Σ x ∙ P(x),
where the sum runs over all possible values x. The calculation for this example is
E(X) = 0 × 0.05 + 1 × 0.10 + 2 × 0.20 + 3 × 0.40 + 4 × 0.15 + 5 × 0.10
     = 0.00 + 0.10 + 0.40 + 1.20 + 0.60 + 0.50
     = 2.80.
E(X) is also said to be the mean of the probability distribution of X.
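The cumulative sums and the expected value above can be checked with a short script. This is only a minimal sketch, not part of the original notes; the lists simply restate the example table for x = 0 to 5, and the variable names are illustrative.

```python
# Minimal sketch: check the CDF values and expected value for the example
# distribution with x = 0..5 and P(x) = 0.05, 0.10, 0.20, 0.40, 0.15, 0.10.

values = [0, 1, 2, 3, 4, 5]
probs = [0.05, 0.10, 0.20, 0.40, 0.15, 0.10]

assert abs(sum(probs) - 1) < 1e-9          # probabilities must total 1

# Cumulative distribution: F(x) = P[X <= x] is a running total of P(x)
running = 0.0
for x, p in zip(values, probs):
    running += p
    print(f"F({x}) = {running:.2f}")       # F(3) prints 0.75

# Expected value: E(X) = sum of x * P(x)
mean = sum(x * p for x, p in zip(values, probs))
print(f"E(X) = {mean:.2f}")                # 2.80
```

Running it reproduces F(3) = 0.75 and E(X) = 2.80 from the tables above.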
The probability distribution of X also has a standard deviation, but one usually first defines the variance. The variance of X, denoted Var(X) or σ², is
Var(X) = Σ (x – μ)² ∙ P(x).
This is the expected square of the difference between X and its expected value, μ. We can calculate this for our example:

x               0       1       2       3       4       5
x – 2.8         –2.8    –1.8    –0.8    0.2     1.2     2.2
(x – 2.8)²      7.84    3.24    0.64    0.04    1.44    4.84
P(x)            0.05    0.10    0.20    0.40    0.15    0.10
(x – 2.8)²P(x)  0.392   0.324   0.128   0.016   0.216   0.484

The variance is the sum of the final row: 1.560. This is not the way one usually calculates the variance, but it does illustrate the meaning of the formula. There is a simplified method, based on the result
Var(X) = Σ x² ∙ P(x) – μ².
This is easier because we have already found μ, and the sum is fairly easy to calculate. For our example, the sum is
0² × 0.05 + 1² × 0.10 + 2² × 0.20 + 3² × 0.40 + 4² × 0.15 + 5² × 0.10 = 9.40.
Then
Var(X) = 9.40 – 2.80² = 9.40 – 7.84 = 1.56.
This is the same number as before, although obtained with rather less effort.

The standard deviation of X is determined from the variance:
σ = √Var(X).
In this example, σ = √1.56 ≈ 1.25.

Example
Given the following probability distribution, determine the expected value and standard deviation.

x     P(x)
2     0.4
4     0.1
6     0.5

Solution
E(X) = 2 ∙ 0.4 + 4 ∙ 0.1 + 6 ∙ 0.5 = 0.8 + 0.4 + 3 = 4.2
The expected value is 4.2.
Var(X) = 2² ∙ 0.4 + 4² ∙ 0.1 + 6² ∙ 0.5 – 4.2² = 1.6 + 1.6 + 18 – 17.64 = 3.56
σ = √3.56 ≈ 1.9
The standard deviation is about 1.9.

Uniform Discrete Distribution
A uniform discrete distribution is a probability distribution in which all outcomes are equally likely.

Example
Determine the probability distribution, expected value, and standard deviation for the following random variable: the number rolled on a die.

Solution
The probability distribution is uniform:

x     1     2     3     4     5     6
P(x)  1/6   1/6   1/6   1/6   1/6   1/6

E(X) = 1 ∙ 1/6 + 2 ∙ 1/6 + 3 ∙ 1/6 + 4 ∙ 1/6 + 5 ∙ 1/6 + 6 ∙ 1/6 = 21/6 = 7/2 = 3.5

Var(X) = 1 ∙ 1/6 + 4 ∙ 1/6 + 9 ∙ 1/6 + 16 ∙ 1/6 + 25 ∙ 1/6 + 36 ∙ 1/6 – (7/2)² = 91/6 – 49/4 = 35/12

σ = √(35/12) ≈ 1.7

Example
Determine the probability distribution and expected value for the following random variable: the sum of the numbers rolled on two dice.

Solution

x     2     3     4     5     6     7     8     9     10    11    12
P(x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

E(X) = 2 ∙ 1/36 + 3 ∙ 2/36 + 4 ∙ 3/36 + 5 ∙ 4/36 + 6 ∙ 5/36 + 7 ∙ 6/36 + 8 ∙ 5/36 + 9 ∙ 4/36 + 10 ∙ 3/36 + 11 ∙ 2/36 + 12 ∙ 1/36 = 252/36 = 7
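As a quick check on the dice examples above, here is a minimal sketch, not part of the original notes, that computes the expected value, variance, and standard deviation of a finite probability distribution using the shortcut Var(X) = Σ x² ∙ P(x) – μ². The helper name summarize is just illustrative.

```python
from fractions import Fraction
from math import sqrt

def summarize(dist):
    """Return (E(X), Var(X), sigma) for a dict mapping value -> probability."""
    mean = sum(x * p for x, p in dist.items())                  # E(X) = sum of x * P(x)
    var = sum(x * x * p for x, p in dist.items()) - mean ** 2   # shortcut: E(X^2) - mu^2
    return mean, var, sqrt(var)

# Single die: uniform distribution on 1..6
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(summarize(die))       # E(X) = 7/2, Var(X) = 35/12, sigma ~ 1.7

# Sum of two dice: count the 36 equally likely ordered outcomes
two_dice = {}
for a in range(1, 7):
    for b in range(1, 7):
        two_dice[a + b] = two_dice.get(a + b, 0) + Fraction(1, 36)
print(summarize(two_dice))  # E(X) = 7
```

Using Fraction keeps the arithmetic exact, so the output matches the hand calculations (7/2 and 35/12 for one die, and a mean of 7 for the sum of two dice) rather than rounded decimals.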