Applied Statistics I Liang Zhang Department of Mathematics, University of Utah June 17, 2008 Liang Zhang (UofU) Applied Statistics I June 17, 2008 1 / 22 Random Variables Definition A dicrete random variable is an rv whose possible values either constitute a finite set or else can be listed in an infinite sequence in which there is a first element, a second element, and so on (“countably” infinite). A random variable is continuous if both of the following apply: 1. Its set of possible values consists either of all numbers in a single interval on the number line (possibly infinite in extent, e.g., (−∞, ∞) ) or all numbers in a disjoint union of such intervals (e.g., [0, 10] ∪ [20, 30]). 2. No possible value of the variable has positive probability, that is, P(X = c) = 0 for any possible value c. Examples Liang Zhang (UofU) Applied Statistics I June 17, 2008 2 / 22 Random Variables Examples: 1. Assume we toss a coin. Then S = {H, T}. We can define a rv X by X (H) = 1 and X (T) = 0 2. A techincian is going to check the quality of 10 prodcuts. For each product the outcome is either successful (S) or defective (D). Then we can define a rv Y by ( 1, successful Y = 0, defective Definition Any random variable whose only possible values are 0 abd 1 is called a Bernoulli random variable. Liang Zhang (UofU) Applied Statistics I June 17, 2008 3 / 22 Random Variables More examples: 3. (Example 3.3) We are investigating two gas stations. Each has six gas pumps. Consider the experiment in which the number of pumps in use at a particular time of day is determined for each of the stations. Define rv’s X , Y and U by X = the total number of pumps in use at the two stations Y = the difference between the number of pumps in use at station 1 and the number in use at station 2 U = the maximum of the numbers of pumps in use at the two stations If this experiment is performed and s = (3, 4) results, then X ((3, 4)) = 3 + 4 = 7, so we say that the observed value of X was x = 7. Similarly, the observed value of Y would be y = 3 − 4 = −1, and the observed value of U would be u = max(3, 4) = 4. Liang Zhang (UofU) Applied Statistics I June 17, 2008 4 / 22 Random Variables More examples: 4. Assume we toss a coin until we get a Head. Then the sample space would be S = {H, TH, TTH, TTTH, . . . } If we define a rv X by X X = the number we totally tossed Then X ({H}) = 1, X ({TH}) = 2, X ({TTH}) = 3, . . . , and so on. In this case, the random variable X can be any positive integer, which in all is infinite. 5. Assume we are going to measure the length of 100 desks. Define the rv Y by Y = the length of a particular desk Y can also assume infinitly possible values. Liang Zhang (UofU) Applied Statistics I June 17, 2008 5 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 6 / 22 Probability Distributions for Discrete RV An example: Assume we toss a coin 3 times and record the outcomes. Let Xi be a random variable defined by ( 1, if the i th outcome is Head; Xi = 0, if the i th outcome is Tail; Let X be the random variable such that X = X1 + X2 + X3 , then X represents the total number of Heads we could get from the experiment. Liang Zhang (UofU) Applied Statistics I June 17, 2008 6 / 22 Probability Distributions for Discrete RV An example: Assume we toss a coin 3 times and record the outcomes. Let Xi be a random variable defined by ( 1, if the i th outcome is Head; Xi = 0, if the i th outcome is Tail; Let X be the random variable such that X = X1 + X2 + X3 , then X represents the total number of Heads we could get from the experiment. If the probability for getting a Head for each toss is 0.7, then the probabilities for all the outcomes are tabulated as following: s x p(x) HHH 3 0.343 HHT 2 0.147 Liang Zhang (UofU) HTH 2 0.147 HTT 1 0.063 THH 2 0.147 Applied Statistics I THT 1 0.063 TTH 1 0.063 TTT 0 0.027 June 17, 2008 6 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 7 / 22 Probability Distributions for Discrete RV Example continued: HHH HHT s x 3 2 p(x) 0.343 0.147 Liang Zhang (UofU) HTH 2 0.147 HTT 1 0.063 THH 2 0.147 Applied Statistics I THT 1 0.063 TTH 1 0.063 TTT 0 0.027 June 17, 2008 7 / 22 Probability Distributions for Discrete RV Example continued: HHH HHT HTH HTT THH s x 3 2 2 1 2 p(x) 0.343 0.147 0.147 0.063 0.147 We can re-tabulate it only for the x values: x 0 1 2 3 p(x) 0.027 0.189 0.441 0.343 Liang Zhang (UofU) Applied Statistics I THT 1 0.063 TTH 1 0.063 TTT 0 0.027 June 17, 2008 7 / 22 Probability Distributions for Discrete RV Example continued: HHH HHT HTH HTT THH s x 3 2 2 1 2 p(x) 0.343 0.147 0.147 0.063 0.147 We can re-tabulate it only for the x values: x 0 1 2 3 p(x) 0.027 0.189 0.441 0.343 Now we can answer various questions. Liang Zhang (UofU) Applied Statistics I THT 1 0.063 TTH 1 0.063 TTT 0 0.027 June 17, 2008 7 / 22 Probability Distributions for Discrete RV Example continued: HHH HHT HTH HTT THH THT s x 3 2 2 1 2 1 p(x) 0.343 0.147 0.147 0.063 0.147 0.063 We can re-tabulate it only for the x values: x 0 1 2 3 p(x) 0.027 0.189 0.441 0.343 Now we can answer various questions. The probability that there are at most 2 Heads is TTH 1 0.063 TTT 0 0.027 P(X ≤ 2) = P(x = 0 or 1 or 2) = p(0) + p(1) + p(2) = 0.657 Liang Zhang (UofU) Applied Statistics I June 17, 2008 7 / 22 Probability Distributions for Discrete RV Example continued: HHH HHT HTH HTT THH THT s x 3 2 2 1 2 1 p(x) 0.343 0.147 0.147 0.063 0.147 0.063 We can re-tabulate it only for the x values: x 0 1 2 3 p(x) 0.027 0.189 0.441 0.343 Now we can answer various questions. The probability that there are at most 2 Heads is TTH 1 0.063 TTT 0 0.027 P(X ≤ 2) = P(x = 0 or 1 or 2) = p(0) + p(1) + p(2) = 0.657 The probability that the number of Heads are is strictly between 1 and 3 is P(1 < X < 3) = P(X = 2) = p(2) = 0.441 Liang Zhang (UofU) Applied Statistics I June 17, 2008 7 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 8 / 22 Probability Distributions for Discrete RV Definition The probability distribution or probability mass function (pmf) of a discrete rv is defined for every number x by p(x) = P(X = x) = P(all s ∈ S : X (s) = x). Liang Zhang (UofU) Applied Statistics I June 17, 2008 8 / 22 Probability Distributions for Discrete RV Definition The probability distribution or probability mass function (pmf) of a discrete rv is defined for every number x by p(x) = P(X = x) = P(all s ∈ S : X (s) = x). In words, for every possible value x of the random variable, the pmf specifies the probability of observing that value P when the experiment is performed. (The conditions p(x) ≥ 0 and all possible x p(x) = 1 are required for any pmf.) Liang Zhang (UofU) Applied Statistics I June 17, 2008 8 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 9 / 22 Probability Distributions for Discrete RV Example 3.8 Six lots of components are ready to be shipped by a certain supplier. The number of defective components in each lot is as follows: Lot Number of defectives 1 0 2 2 3 0 4 1 5 2 6 0 One of these lots is to be randomly selected for shipment to a particular customer. Let X be the number of defectives in the selected lot. Liang Zhang (UofU) Applied Statistics I June 17, 2008 9 / 22 Probability Distributions for Discrete RV Example 3.8 Six lots of components are ready to be shipped by a certain supplier. The number of defective components in each lot is as follows: Lot Number of defectives 1 0 2 2 3 0 4 1 5 2 6 0 One of these lots is to be randomly selected for shipment to a particular customer. Let X be the number of defectives in the selected lot. The three possible X values are 0, 1 and 2. The pmf for X is 3 p(0) = P(X = 0) = P(lot 1 or 3 or 6 is selected) = = 0.500 6 1 p(1) = P(X = 1) = P(lot 4 is selected) = = 0.167 6 2 p(2) = P(X = 2) = P(lot 2 or 5 is selected) = = 0.333 6 Liang Zhang (UofU) Applied Statistics I June 17, 2008 9 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 10 / 22 Probability Distributions for Discrete RV Example 3.10: Consider a group of five potential blood donors — a, b, c, d, and e — of whom only a and b have type O+ blood. Five blood smaples, one from each individual, will be typed in random order until an O+ individual is identified. Let the rv Y = the number of typings necessary to identify an O+ individual. Then what is the pmf of Y ? Liang Zhang (UofU) Applied Statistics I June 17, 2008 10 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 11 / 22 Probability Distributions for Discrete RV Example: Consider whether the next customer coming to a certain gas station buys gasoline or diesel. Let ( 1, if the customer purchases gasoline X = 0, if the customer purchases diesel If 30% of all customers in one month purchase diesel, then the pmf for X is p(0) = P(X = 0) = P(nextcustomerbuysdiesel) = 0.3 p(1) = P(X = 1) = P(nextcustomerbuysgasoline) = 0.7 p(x) = P(X = x) = 0 for x 6= 0 or 1 Liang Zhang (UofU) Applied Statistics I June 17, 2008 11 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 12 / 22 Probability Distributions for Discrete RV Example: Consider whether the next customer coming to a certain gas station buys gasoline or diesel. Let ( 1, if the customer purchases gasoline X = 0, if the customer purchases diesel If 100α% of all customers in one month purchase diesel, then the pmf for X is p(0) = P(X = 0) = P(nextcustomerbuysdiesel) = α p(1) = P(X = 1) = P(nextcustomerbuysgasoline) = 1 − α p(x) = P(X = x) = 0 for x 6= 0 or 1 here α is between 0 and 1. Liang Zhang (UofU) Applied Statistics I June 17, 2008 12 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 13 / 22 Probability Distributions for Discrete RV Definition Suppose p(x) depends on a quantity that can be assigned any one of a number of possible values, with each different value determining a different probability distribution. Such a quantity is called a parameter of the distribution. The collection of all probability distributions for different values of the parameter is called a family of probability distribution. Liang Zhang (UofU) Applied Statistics I June 17, 2008 13 / 22 Probability Distributions for Discrete RV Definition Suppose p(x) depends on a quantity that can be assigned any one of a number of possible values, with each different value determining a different probability distribution. Such a quantity is called a parameter of the distribution. The collection of all probability distributions for different values of the parameter is called a family of probability distribution. For the previous example, the quantity α is a parameter. Each different value of α between 0 and 1 determines a different member of a family of distributions; two such members are 0.3 if x = 0 0.25 if x = 0 p(x) = 0.7 if x = 1 p(x) = 0.75 if x = 1 0 otherwise 0 otherwise Liang Zhang (UofU) Applied Statistics I June 17, 2008 13 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 14 / 22 Probability Distributions for Discrete RV Example: Assume we are drawing cards from a 100 well-shuffled cards with replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e. there are 100 · p ♠’s. Assume the successive drawings are independent and define X = the number of drawings. Then p(1) = P(X = 1) = P({♠}) = p p(2) = P(X = 2) = P({♠♠}) = (1 − p) · p p(3) = P(X = 3) = P({♠♠♠}) = (1 − p) · (1 − p) · p ... Liang Zhang (UofU) Applied Statistics I June 17, 2008 14 / 22 Probability Distributions for Discrete RV Example: Assume we are drawing cards from a 100 well-shuffled cards with replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e. there are 100 · p ♠’s. Assume the successive drawings are independent and define X = the number of drawings. Then p(1) = P(X = 1) = P({♠}) = p p(2) = P(X = 2) = P({♠♠}) = (1 − p) · p p(3) = P(X = 3) = P({♠♠♠}) = (1 − p) · (1 − p) · p ... A general formula would be ( (1 − p)x−1 · p p(x) = 0 Liang Zhang (UofU) Applied Statistics I x = 1, 2, 3, . . . otherwise June 17, 2008 14 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 15 / 22 Probability Distributions for Discrete RV Example: Assume we are drawing cards from a 100 well-shuffled cards with replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e. there are 100 · p ♠’s. Assume the successive drawings are independent and define X = the number of drawings. Liang Zhang (UofU) Applied Statistics I June 17, 2008 15 / 22 Probability Distributions for Discrete RV Example: Assume we are drawing cards from a 100 well-shuffled cards with replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e. there are 100 · p ♠’s. Assume the successive drawings are independent and define X = the number of drawings. If we know that there are 20 ♠’s, i.e. p = 0.2, then what is the probability for us to draw at most 3 times? More than 2 times? Liang Zhang (UofU) Applied Statistics I June 17, 2008 15 / 22 Probability Distributions for Discrete RV Example: Assume we are drawing cards from a 100 well-shuffled cards with replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e. there are 100 · p ♠’s. Assume the successive drawings are independent and define X = the number of drawings. If we know that there are 20 ♠’s, i.e. p = 0.2, then what is the probability for us to draw at most 3 times? More than 2 times? P(X ≤ 3) = p(1) + p(2) + p(3) = 0.2 + 0.2 · 0.8 + 0.2 · (0.8)2 = 0.488 Liang Zhang (UofU) Applied Statistics I June 17, 2008 15 / 22 Probability Distributions for Discrete RV Example: Assume we are drawing cards from a 100 well-shuffled cards with replacement. We keep drawing until we get a ♠. Let p = P({♠}), i.e. there are 100 · p ♠’s. Assume the successive drawings are independent and define X = the number of drawings. If we know that there are 20 ♠’s, i.e. p = 0.2, then what is the probability for us to draw at most 3 times? More than 2 times? P(X ≤ 3) = p(1) + p(2) + p(3) = 0.2 + 0.2 · 0.8 + 0.2 · (0.8)2 = 0.488 P(X > 2) = p(3)+p(4)+p(5)+· · · = 1−p(1)−p(2) = 1−0.2−0.2·0.8 = 0.64 Liang Zhang (UofU) Applied Statistics I June 17, 2008 15 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 16 / 22 Probability Distributions for Discrete RV Definition The cumulative distribution function (cdf) F (x) of a discrete rv X with pmf p(x) is defined for every number x by X F (x) = P(X ≤ x) = p(y ) y :y ≤x For any number x, F(x) is the probability that the observed value of X will be at most x. Liang Zhang (UofU) Applied Statistics I June 17, 2008 16 / 22 Probability Distributions for Discrete RV Definition The cumulative distribution function (cdf) F (x) of a discrete rv X with pmf p(x) is defined for every number x by X F (x) = P(X ≤ x) = p(y ) y :y ≤x For any number x, F(x) is the probability that the observed value of X will be at most x. F (x) = P(X ≤ x) = P(X is less than or equal to x) p(x) = P(X = x) = P(X is exactly equal to x) Liang Zhang (UofU) Applied Statistics I June 17, 2008 16 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 17 / 22 Probability Distributions for Discrete RV Example 3.10 (continued): 0 0.4 F (y ) = 0.7 0.9 1 Liang Zhang (UofU) if if if if if y <1 1≤y <2 2≤y <3 3≤y <4 y ≥2 Applied Statistics I June 17, 2008 17 / 22 Probability Distributions for Discrete RV Example 3.10 (continued): 0 0.4 F (y ) = 0.7 0.9 1 Liang Zhang (UofU) if if if if if y <1 1≤y <2 2≤y <3 3≤y <4 y ≥2 Applied Statistics I June 17, 2008 17 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 18 / 22 Probability Distributions for Discrete RV Example: Assume we are drawing cards from a 100 well-shuffled cards with replacement. We keep drawing until we get a ♠. Let α = P({♠}), i.e. there are 100 · α ♠’s. Assume the successive drawings are independent and define X = the number of drawings. The pmf would be ( (1 − α)x−1 · α x = 1, 2, 3, . . . p(x) = 0 otherwise Liang Zhang (UofU) Applied Statistics I June 17, 2008 18 / 22 Probability Distributions for Discrete RV Example: Assume we are drawing cards from a 100 well-shuffled cards with replacement. We keep drawing until we get a ♠. Let α = P({♠}), i.e. there are 100 · α ♠’s. Assume the successive drawings are independent and define X = the number of drawings. The pmf would be ( (1 − α)x−1 · α x = 1, 2, 3, . . . p(x) = 0 otherwise Then for any positive interger x, we have F (x) = X y ≤x p(y ) = x x−1 X X (1 − α)(y −1) · α = α (1 − α)y y =1 ( 1 − (1 − α)x = 0 Liang Zhang (UofU) y =0 x ≥1 x <1 Applied Statistics I June 17, 2008 18 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 19 / 22 Probability Distributions for Discrete RV Example: Assume we are drawing cards from a 100 well-shuffled cards with replacement. We keep drawing until we get a ♠. Let α = P({♠}), i.e. there are 100 · α ♠’s. Assume the successive drawings are independent and define X = the number of drawings. The pmf would be Liang Zhang (UofU) Applied Statistics I June 17, 2008 19 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 20 / 22 Probability Distributions for Discrete RV pmf =⇒ cdf: F (x) = P(X ≤ x) = X p(y ) y :y ≤x Liang Zhang (UofU) Applied Statistics I June 17, 2008 20 / 22 Probability Distributions for Discrete RV pmf =⇒ cdf: F (x) = P(X ≤ x) = X p(y ) y :y ≤x It is also possible cdf =⇒ pmf: Liang Zhang (UofU) Applied Statistics I June 17, 2008 20 / 22 Probability Distributions for Discrete RV pmf =⇒ cdf: F (x) = P(X ≤ x) = X p(y ) y :y ≤x It is also possible cdf =⇒ pmf: p(x) = F (x) − F (x−) where “x−” represents the largest possible X value that is strictly less than x. Liang Zhang (UofU) Applied Statistics I June 17, 2008 20 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 21 / 22 Probability Distributions for Discrete RV Proposition For any two numbers a and b with a ≤ b, P(a ≤ X ≤ b) = F (b) − F (a−) where “a−” represents the largest possible X value that is strictly less than a. In particular, if the only possible values are integers and if a and b are integers, then P(a ≤ X ≤ b) = P(X = a or a + 1 or . . . or b) = F (b) − F (a − 1) Taking a = b yields P(X = a) = F (a) − F (a − 1) in this case. Liang Zhang (UofU) Applied Statistics I June 17, 2008 21 / 22 Probability Distributions for Discrete RV Liang Zhang (UofU) Applied Statistics I June 17, 2008 22 / 22 Probability Distributions for Discrete RV Example (Problem 23): A consumer organization that evaluates new automobiles customarily reports the number of major defects in each car examined. Let X denote the number of major defects in a randomly selected car of a certain type. The cdf of X is as follows: x <0 0 0.06 0 ≤ x < 1 0.19 1 ≤ x < 2 0.39 2 ≤ x < 3 F (x) = 0.67 3 ≤ x < 4 0.92 4 ≤ x < 5 0.97 5 ≤ x < 6 1 x ≤6 Calculate the following probabilities directly from the cdf: (a)p(2), (b)P(X > 3) and (c)P(2 ≤ X < 5). Liang Zhang (UofU) Applied Statistics I June 17, 2008 22 / 22