3 Discrete Random Variables and Probability Distributions
Copyright © Cengage Learning. All rights reserved.

Population versus sample
Population: the entire group of individuals in which we are interested but usually cannot assess directly. Examples: all humans, all working-age people in California, all crickets.
Sample: the part of the population we actually examine and for which we do have data. How well the sample represents the population depends on the sample design.

Parameters and Statistics
As we begin to use sample data to draw conclusions about a wider population, we must be clear about whether a number describes a sample or a population.
A parameter is a number that describes some characteristic of the population. In statistical practice, the value of a parameter is not known, because we cannot examine the entire population.
A statistic is a number that describes some characteristic of a sample. The value of a statistic can be computed directly from the sample data. We often use a statistic to estimate an unknown parameter.
We write µ (the Greek letter mu) for the population mean and σ for the population standard deviation. We write x̄ (x-bar) for the sample mean and s for the sample standard deviation.

Discrete random variables
A random variable is a variable whose value is a numerical outcome of a random phenomenon.
A discrete random variable X has a finite or countably infinite number of possible values.
Example: A bottle cap is tossed three times. We define the random variable X as the number of times the cap drops with the open side up. X is a discrete random variable: it can only take the values 0, 1, 2, or 3.
Probability distribution for discrete random variables
The probability distribution of a random variable X lists the values and their probabilities. The probabilities pᵢ must add up to 1.
Definition: the probability distribution or probability mass function (pmf) of a discrete rv X is
p(x) = P(X = x) = P(all s ∈ S : X(s) = x), for every number x.
Example: A bottle cap is tossed three times. We define the random variable X as the number of times the cap drops with the open side up. The open side is lighter than the closed side, so it will probably land up well over 50% of the tosses. Suppose on each toss the probability it drops with the open side up (U) is 0.7. The tosses are independent, so, for example,
P(UUU) = P(U) · P(U) · P(U) = (.7)(.7)(.7) = .343
Listing the eight outcomes (DDD; UDD, DUD, DDU; UUD, UDU, DUU; UUU) and adding their probabilities gives the distribution of X:

Value of X:    0     1     2     3
Probability: .027  .189  .441  .343

The probability of any event is the sum of the probabilities pᵢ of the values of X that make up the event.
What is the probability that the cap lands with the open side up at least two times (“at least two” means “two or more”)?
P(X ≥ 2) = P(X = 2) + P(X = 3) = .441 + .343 = .784
What is the probability that the cap lands with the open side up fewer than three times?
P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = .027 + .189 + .441 = .657
or P(X < 3) = 1 − P(X = 3) = 1 − .343 = .657

A Parameter of a Probability Distribution
The pmf of the Bernoulli rv X was p(0) = .8 and p(1) = .2 because 20% of all purchasers selected a desktop computer. At another store, it may be the case that p(0) = .9 and p(1) = .1. More generally, the pmf of any Bernoulli rv can be expressed in the form p(1) = α and p(0) = 1 − α, where 0 < α < 1.
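The bottle-cap distribution above can be checked by brute-force enumeration of all 2³ toss sequences. The following is a minimal Python sketch (not part of the original slides; variable names are ours):

```python
from itertools import product

p_up = 0.7  # probability the cap lands open side up on one toss
pmf = {x: 0.0 for x in range(4)}

# Enumerate all 8 outcomes of three independent tosses and
# accumulate each outcome's probability under its count of U's.
for outcome in product("UD", repeat=3):
    prob = 1.0
    for side in outcome:
        prob *= p_up if side == "U" else 1 - p_up
    pmf[outcome.count("U")] += prob

# Event probabilities from the slides
p_at_least_2 = pmf[2] + pmf[3]   # P(X >= 2) = .784
p_less_3 = 1 - pmf[3]            # P(X < 3)  = .657
```

The enumeration reproduces the table (.027, .189, .441, .343) and both event probabilities.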
Because the pmf depends on the particular value of α, we often write p(x; α) rather than just p(x):

p(x; α) = 1 − α if x = 0;  α if x = 1;  0 otherwise   (3.1)

Example: Starting at a fixed time, we observe the gender of each newborn child at a certain hospital until a boy (B) is born. Let p = P(B), assume that successive births are independent, and define the rv X by x = number of births observed. Then
p(1) = P(X = 1) = P(B) = p
p(2) = P(X = 2) = P(GB) = P(G) · P(B) = (1 − p)p
p(3) = P(X = 3) = P(GGB) = P(G) · P(G) · P(B) = (1 − p)²p
Continuing in this way, a general formula emerges:

p(x) = (1 − p)^(x−1) · p  for x = 1, 2, 3, …;  p(x) = 0 otherwise

The parameter p can assume any value between 0 and 1.

The Cumulative Distribution Function
For some fixed value x, we often wish to compute the probability that the observed value of X will be at most x. Suppose, for example, that X takes only the values 0, 1, and 2, with p(0) = .500 and p(1) = .167. The probability that X is at most 1 is then
P(X ≤ 1) = p(0) + p(1) = .500 + .167 = .667
In this example, X ≤ 1.5 if and only if X ≤ 1, so
P(X ≤ 1.5) = P(X ≤ 1) = .667
Similarly, P(X ≤ 0) = P(X = 0) = .5 and P(X ≤ .75) = .5. In fact, for any x satisfying 0 ≤ x < 1, P(X ≤ x) = .5.
The largest possible X value is 2, so
P(X ≤ 2) = 1, P(X ≤ 3.7) = 1, P(X ≤ 20.5) = 1, and so on.
Notice that P(X < 1) < P(X ≤ 1), since the latter includes the probability of the X value 1, whereas the former does not. More generally, when X is discrete and x is a possible value of the variable, P(X < x) < P(X ≤ x).
Definition: the cumulative distribution function (cdf) F(x) of a discrete rv X with pmf p(x) is
F(x) = P(X ≤ x) = Σ_{y : y ≤ x} p(y), for every number x.
For any number x, F(x) is the probability that the observed value of X will be at most x.
Example: A store carries flash drives with either 1 GB, 2 GB, 4 GB, 8 GB, or 16 GB of memory.
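The births-until-first-boy pmf and its cdf can be sketched directly from the general formula. A short Python illustration (ours, not from the slides; the closed-form cdf 1 − (1 − p)^x follows from summing the geometric series):

```python
def p_first_boy(x, p):
    """pmf of X = number of births observed until the first boy:
    x - 1 girls, each with probability 1 - p, then one boy."""
    return (1 - p) ** (x - 1) * p

def F(x, p):
    """cdf F(x) = P(X <= x), by direct summation over k = 1..x."""
    return sum(p_first_boy(k, p) for k in range(1, x + 1))
```

For example, with p = 0.5 the chance the first boy arrives within three births is F(3, 0.5) = .5 + .25 + .125 = .875.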
The accompanying table gives the distribution of Y = the amount of memory in a purchased drive:

y:     1    2    4    8    16
p(y): .05  .10  .35  .40  .10

Let’s first determine F(y) for each of the five possible values of Y:
F(1) = P(Y ≤ 1) = P(Y = 1) = p(1) = .05
F(2) = P(Y ≤ 2) = P(Y = 1 or 2) = p(1) + p(2) = .15
F(4) = P(Y ≤ 4) = P(Y = 1 or 2 or 4) = p(1) + p(2) + p(4) = .50
F(8) = P(Y ≤ 8) = p(1) + p(2) + p(4) + p(8) = .90
F(16) = P(Y ≤ 16) = 1
Now for any other number y, F(y) will equal the value of F at the closest possible value of Y to the left of y. For example,
F(2.7) = P(Y ≤ 2.7) = P(Y ≤ 2) = F(2) = .15
F(7.999) = P(Y ≤ 7.999) = P(Y ≤ 4) = F(4) = .50
If y is less than 1, F(y) = 0 [e.g., F(.58) = 0], and if y is at least 16, F(y) = 1 [e.g., F(25) = 1].
A graph of this cdf (the cdf of Example 3.13) is shown in Figure 3.5.
For X a discrete rv, the graph of F(x) will have a jump at every possible value of X and will be flat between possible values. Such a graph is called a step function.
Proposition: For any two numbers a and b with a ≤ b,
P(a ≤ X ≤ b) = F(b) − F(a−)
where “a−” represents the largest possible X value that is strictly less than a.

Expected value of a random variable
The expected value of a random variable X is also called the mean of X.
The mean x̄ of a set of observations is their arithmetic average. The mean µ of a random variable X is a weighted average of the possible values of X, reflecting the fact that all outcomes might not be equally likely.
Example: A bottle cap is tossed three times. We define the random variable X as the number of times the cap drops with the open side up (“U”). The outcomes are DDD; UDD, DUD, DDU; UUD, UDU, DUU; UUU, giving the distribution

Value of X:    0     1     2     3
Probability: .027  .189  .441  .343

Definition: Let X be a discrete random variable with set of possible values D and pmf p(x).
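The step-function cdf of the flash-drive example is easy to express in code: F(y) is the sum of p over all possible values at or below y. A minimal Python sketch (ours, not from the slides):

```python
# pmf of Y = memory (GB) of a purchased flash drive, from the table above
pmf = {1: .05, 2: .10, 4: .35, 8: .40, 16: .10}

def F(y):
    """Step-function cdf: P(Y <= y), flat between possible values,
    jumping by p(val) at each possible value val."""
    return sum(p for val, p in pmf.items() if val <= y)
```

This reproduces F(2.7) = .15, F(7.999) = .50, F(.58) = 0, and F(25) = 1 from the example.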
The expected value or mean value of X, denoted by E(X) or µ_X or just µ, is
E(X) = µ_X = Σ_{x ∈ D} x · p(x)

Mean of a discrete random variable
For a discrete random variable X with probability distribution p(x), the mean µ of X is found by multiplying each possible value of X by its probability, and then adding the products:
µ_X = x₁p₁ + x₂p₂ + ··· + x_k p_k = Σ xᵢpᵢ
Example: A bottle cap is tossed three times. We define the random variable X as the number of times the cap drops with the open side up.

Value of X:    0     1     2     3
Probability: .027  .189  .441  .343

The mean µ of X is
µ = (0)(.027) + (1)(.189) + (2)(.441) + (3)(.343) = 2.1

Variance of a random variable
The variance and the standard deviation are the measures of spread that accompany the choice of the mean to measure center.
The variance σ²_X of a random variable is a weighted average of the squared deviations (X − µ_X)² of the variable X from its mean µ_X. Each outcome is weighted by its probability in order to take into account outcomes that are not equally likely.
The larger the variance of X, the more scattered the values of X on average. The positive square root of the variance gives the standard deviation σ of X.
Definition: Let X have pmf p(x) and expected value µ. Then the variance of X, denoted by V(X) or σ²_X, or just σ², is
V(X) = σ²_X = Σ_{x ∈ D} (x − µ)² · p(x) = E[(X − µ)²]
The standard deviation (SD) of X is σ_X = √σ²_X.
For a discrete random variable X with probability distribution p(x) and mean µ_X, the variance σ² of X is found by multiplying each squared deviation of X by its probability and then adding all the products:
σ²_X = (x₁ − µ_X)²p₁ + (x₂ − µ_X)²p₂ + ··· + (x_k − µ_X)²p_k = Σ (xᵢ − µ_X)²pᵢ
Example: A bottle cap is tossed three times. We define the random variable X as the number of times the cap drops with the open side up; µ_X = 2.1.
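The weighted-average computation of the mean is one line of code. A Python sketch (ours, not from the slides):

```python
# pmf of X = number of open-side-up drops in three tosses, from the table above
pmf = {0: .027, 1: .189, 2: .441, 3: .343}

# mean = sum of value * probability over all possible values
mu = sum(x * p for x, p in pmf.items())  # 2.1 (up to float rounding)
```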
Value of X:    0     1     2     3
Probability: .027  .189  .441  .343

The variance σ² of X is
σ² = .027(0 − 2.1)² + .189(1 − 2.1)² + .441(2 − 2.1)² + .343(3 − 2.1)²
   = .11907 + .22869 + .00441 + .27783 = .63

A Shortcut Formula for σ²
The number of arithmetic operations necessary to compute σ² can be reduced by using an alternative formula.
Proposition: V(X) = σ² = [Σ_{x ∈ D} x² · p(x)] − µ² = E(X²) − [E(X)]²
In using this formula, E(X²) is computed first without any subtraction; then E(X) is computed, squared, and subtracted (once) from E(X²).

Rules for means and variances
If X is a random variable and a and b are fixed numbers, then
µ_{a+bX} = a + bµ_X
σ²_{a+bX} = b²σ²_X
If X and Y are two independent random variables, then
µ_{X+Y} = µ_X + µ_Y
σ²_{X+Y} = σ²_X + σ²_Y

Investment
You invest 20% of your funds in Treasury bills and 80% in an “index fund” that represents all U.S. common stocks. Your rate of return over time is proportional to that of the T-bills (X) and of the index fund (Y), such that R = 0.2X + 0.8Y. Based on annual returns between 1950 and 2003:
Annual return on T-bills: µ_X = 5.0%, σ_X = 2.9%
Annual return on stocks: µ_Y = 13.2%, σ_Y = 17.6%
Assume that X and Y are independent.
µ_R = 0.2µ_X + 0.8µ_Y = (0.2)(5) + (0.8)(13.2) = 11.56%
σ²_R = σ²_{0.2X} + σ²_{0.8Y} = (0.2)²σ²_X + (0.8)²σ²_Y = (0.2)²(2.9)² + (0.8)²(17.6)² = 198.58
σ_R = √198.58 = 14.09%
The portfolio has a smaller mean return than an all-stock portfolio, but it is also less risky.

Binomial distributions
Binomial distributions are models for some categorical variables, typically representing the number of successes in a series of n trials. The observations must meet these requirements:
- The total number of observations n is fixed in advance.
- Each observation falls into just one of two categories: success and failure.
- The outcomes of all n observations are statistically independent.
- All n observations have the same probability of “success,” p.
Example: We record the next 50 births at a local hospital. Each newborn is either a boy or a girl; each baby is either born on a Sunday or not.
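The variance calculations above (definition, shortcut formula, and the portfolio rule) can all be verified numerically. A Python sketch (ours, not from the slides):

```python
import math

# Bottle-cap distribution from the example above
xs = [0, 1, 2, 3]
ps = [.027, .189, .441, .343]

mu = sum(x * p for x, p in zip(xs, ps))                      # E(X) = 2.1
var_def = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))     # definition: 0.63
var_short = sum(x * x * p for x, p in zip(xs, ps)) - mu ** 2 # shortcut: E(X^2) - mu^2

# Portfolio rule for R = 0.2X + 0.8Y with independent X, Y:
# var(R) = (0.2)^2 var(X) + (0.8)^2 var(Y)
var_R = 0.2 ** 2 * 2.9 ** 2 + 0.8 ** 2 * 17.6 ** 2
sd_R = math.sqrt(var_R)  # about 14.09
```

Both routes give σ² = 0.63 for the bottle cap, and the portfolio standard deviation matches the 14.09% in the example.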
We express a binomial distribution for the count X of successes among n observations as a function of the parameters n and p: B(n, p).
- The parameter n is the total number of observations.
- The parameter p is the probability of success on each observation.
- The count of successes X can be any whole number between 0 and n.
Example: A coin is flipped 10 times. Each outcome is either a head or a tail. The variable X is the number of heads among those 10 flips, our count of “successes.” On each flip, the probability of success, “head,” is 0.5. The number X of heads among 10 flips has the binomial distribution B(n = 10, p = 0.5).

Applications for binomial distributions
Binomial distributions describe the possible number of times that a particular event will occur in a sequence of observations. They are used when we want to know about the occurrence of an event, not its magnitude.
- In a clinical trial, a patient’s condition may improve or not. We study the number of patients who improved, not how much better they feel.
- Is a person ambitious or not? The binomial distribution describes the number of ambitious persons, not how ambitious they are.
- In quality control we assess the number of defective items in a lot of goods, irrespective of the type of defect.

Binomial distribution in statistical sampling
A population contains a proportion p of successes. If the population is much larger than the sample, the count X of successes in a sample of size n has approximately the binomial distribution B(n, p).
The n observations will be nearly independent when the size of the population is much larger than the size of the sample. As a rule of thumb, the binomial distribution can be used when the population is at least 20 times as large as the sample.

Binomial mean and standard deviation
The center and spread of the binomial distribution for a count X are defined by the mean µ and standard deviation σ:
µ = np
σ = √(np(1 − p)) = √(npq)
We often write q for 1 − p.
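For the coin-flip example B(n = 10, p = 0.5), the mean and standard deviation follow directly from these formulas. A Python sketch (ours, not from the slides):

```python
import math

# B(n = 10, p = 0.5): number of heads in 10 fair coin flips
n, p = 10, 0.5
mu = n * p                           # mean = np = 5
sigma = math.sqrt(n * p * (1 - p))   # sd = sqrt(npq), about 1.58
```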
[Figure: binomial probability histograms of P(X = x) versus number of successes, showing the effect of changing p when n is fixed: a) n = 10, p = 0.25; b) n = 10, p = 0.5; c) n = 10, p = 0.75.]
For small samples, binomial distributions are skewed when p is different from 0.5.

Color blindness
The frequency of color blindness (dyschromatopsia) in the Caucasian American male population is estimated to be about 8%. We take a random sample of size 25 from this population.
The population is definitely larger than 20 times the sample size, so we can approximate the sampling distribution by B(n = 25, p = 0.08).
What are the mean and standard deviation of the count of color blind individuals in the SRS of 25 Caucasian American males?
µ = np = (25)(0.08) = 2
σ = √(np(1 − p)) = √((25)(0.08)(0.92)) = 1.36

Calculations for binomial probabilities
The binomial coefficient counts the number of ways in which k successes can be arranged among n observations. The binomial probability P(X = k) is this count multiplied by the probability of any specific arrangement of the k successes:
P(X = k) = nCk · p^k (1 − p)^(n−k),  for k = 0, 1, …, n

X      P(X)
0      nC0 p^0 q^n = q^n
1      nC1 p^1 q^(n−1)
2      nC2 p^2 q^(n−2)
…      …
k      nCk p^k q^(n−k)
…      …
n      nCn p^n q^0 = p^n
Total  1

The probability that a binomial random variable takes any range of values is the sum of the probabilities of getting exactly that many successes in n observations:
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)

Binomial formulas
The number of ways of arranging k successes in a series of n observations (with constant probability p of success) is the number of possible combinations (unordered sequences). This can be calculated with the binomial coefficient:
nCk = n! / (k!(n − k)!),  where k = 0, 1, 2, …, or n
The binomial coefficient “n choose k” uses the factorial notation “!”. The factorial for any strictly positive whole number n is:
n!
= n × (n − 1) × (n − 2) × ··· × 3 × 2 × 1
For example: 5! = 5 × 4 × 3 × 2 × 1 = 120
Note that 0! = 1.

Color blindness
The frequency of color blindness (dyschromatopsia) in the Caucasian American male population is estimated to be about 8%. We take a random sample of size 25 from this population. What is the probability that exactly five individuals in the sample are color blind?
P(X = 5) = (n! / (k!(n − k)!)) p^k (1 − p)^(n−k) = (25! / (5! 20!)) (0.08)⁵ (0.92)²⁰
25! / (5! 20!) = (21 × 22 × 23 × 24 × 25) / (1 × 2 × 3 × 4 × 5) = 53,130
P(X = 5) = 53,130 × 0.0000033 × 0.1887 = 0.03285
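The binomial probability formula and the color-blindness calculation can be checked with Python's built-in binomial coefficient. A sketch (ours, not from the slides):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Color-blindness example: exactly 5 color-blind men in a sample of 25
p5 = binom_pmf(5, 25, 0.08)   # about 0.0328

# Sanity check: the pmf sums to 1 over k = 0..n
total = sum(binom_pmf(k, 25, 0.08) for k in range(26))
```

math.comb(25, 5) returns 53,130, matching the hand computation above.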