Chapter 7

Please pick up an assignment sheet and notes packet

Random Variable

A grocery store manager might be interested in the number of broken eggs in each carton (dozen of eggs). An environmental scientist might be interested in the amount of ozone in an air sample. Since these values change and are subject to some uncertainty, these are examples of random variables.

• A numerical variable whose value depends on the outcome of a chance experiment
OR
• Associates a numerical value with each outcome of a chance experiment
• Two types of random variables
  – Discrete
  – Continuous

In this chapter, we will look at different distributions of discrete and continuous random variables.

Two Types of Random Variables:
• Discrete – its set of possible values is a collection of isolated points along a number line. This is typically a "count" of something.
• Continuous – its set of possible values includes an entire interval on a number line. This is typically a "measure" of something.

Identify the following variables as discrete or continuous
1. The number of broken eggs in each carton: Discrete
2. The amount of ozone in samples of air: Continuous
3. The weight of a pineapple: Continuous
4. The amount of time a customer spends in a store: Continuous
5. The number of gas pumps in use: Discrete

Probability Distributions for Discrete Random Variables

A probability distribution is a model that describes the long-run behavior of a variable.

In Wolf City (a fictional place), regulations prohibit more than five dogs or cats per household.
Let x = the number of dogs and cats in a randomly selected household in Wolf City
Is this variable discrete or continuous? What are the possible values for x?

The Department of Animal Control has collected data over the course of several years. They have estimated the long-run probabilities for the values of x.

x      0     1     2     3     4     5
P(x)   .26   .31   .21   .13   .06   .03

This is called a discrete probability distribution. It can also be displayed in a histogram with the probability on the vertical axis. What do you notice about the sum of these probabilities?

[Probability histogram; horizontal axis: Number of Pets]

Discrete Probability Distribution
1) Gives the probabilities associated with each possible x value
2) Each probability is the long-run relative frequency of occurrence of the corresponding x-value when the chance experiment is performed a very large number of times
3) Usually displayed in a table, but can be displayed with a histogram or formula

Properties of Discrete Probability Distributions
1) For every possible x value, 0 ≤ P(x) ≤ 1.
2) For all values of x, Σ P(x) = 1.

Dogs and Cats Revisited . . .
Let x = the number of dogs or cats per household in Wolf City

x      0     1     2     3     4     5
P(x)   .26   .31   .21   .13   .06   .03

What is the probability that a randomly selected household in Wolf City has at most 2 pets? (What does "at most" mean?)
Just add the probabilities for 0, 1, and 2.
P(x ≤ 2) = .26 + .31 + .21 = .78

• Finish the dog and cat probability problems on the second page of your notes

Dogs and Cats Revisited . . .
What is the probability that a randomly selected household in Wolf City has less than 2 pets? (What does "less than" mean?)
P(x < 2) = .26 + .31 = .57
Notice that this probability does NOT include 2!

Dogs and Cats Revisited . . .
When calculating probabilities for discrete random variables, you MUST pay close attention to whether certain values are included (≤ or ≥) or not included (< or >) in the calculation.
What is the probability that a randomly selected household in Wolf City has more than 1 but no more than 4 pets? (What does this mean?)
P(1 < x ≤ 4) = .21 + .13 + .06 = .40

Suppose that each of four randomly selected customers purchasing a hot tub at a certain store chooses either an electric (E) or a gas (G) model. Assume that these customers make their choices independently of one another and that 40% of all customers select an electric model. This implies that for any particular one of the four customers P(E) = 0.40 and P(G) = 0.60. One possible experimental outcome is EGGE, where the first and fourth customers select electric models and the other two choose gas models.

Because the customers make their choices independently, the multiplication rule for independent events implies that
P(EGGE) = P(1st chooses E AND 2nd chooses G AND 3rd chooses G AND 4th chooses E)
        = P(E)P(G)P(G)P(E) = (0.4)(0.6)(0.6)(0.4) = 0.0576
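The multiplication rule used for P(EGGE) extends to every possible sequence of four purchases. As a minimal sketch (the function and variable names are illustrative and not part of the notes), the following enumerates all 16 outcomes, multiplies the per-customer probabilities, and groups the results by x = the number of electric models chosen:

```python
from itertools import product

# Per-customer probabilities from the example: P(E) = 0.40, P(G) = 0.60
P_CHOICE = {"E": 0.40, "G": 0.60}

def outcome_probability(outcome):
    """Multiply the individual probabilities (valid because choices are independent)."""
    prob = 1.0
    for choice in outcome:
        prob *= P_CHOICE[choice]
    return prob

# Build the distribution of x = number of electric models among four customers
distribution = {x: 0.0 for x in range(5)}
for outcome in product("EG", repeat=4):
    distribution[outcome.count("E")] += outcome_probability(outcome)

for x, p in distribution.items():
    print(f"P(x = {x}) = {p:.4f}")
```

Summing the probabilities of outcomes that share the same count is what turns the list of outcomes into a discrete probability distribution for x.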
Outcomes and Probabilities for Hot Tub Models (x = # of electric models sold)

Outcome  Probability  x      Outcome  Probability  x
GGGG     0.1296       0      GEEG     0.0576       2
EGGG     0.0864       1      GEGE     0.0576       2
GEGG     0.0864       1      GGEE     0.0576       2
GGEG     0.0864       1      GEEE     0.0384       3
GGGE     0.0864       1      EGEE     0.0384       3
EEGG     0.0576       2      EEGE     0.0384       3
EGEG     0.0576       2      EEEG     0.0384       3
EGGE     0.0576       2      EEEE     0.0256       4

P(x = 0) = 0.1296
P(x = 1) = 0.3456
P(x = 2) = 0.3456
P(x = 3) = 0.1536
P(x = 4) = 0.0256

P(2 ≤ x ≤ 4) = 0.3456 + 0.1536 + 0.0256 = 0.5248
P(x ≤ 3) = 0.1296 + 0.3456 + 0.3456 + 0.1536 = 0.9744

[Probability histogram: Probability of Selling X Electric Hot Tubs per Four Customers; horizontal axis: Number of Electric Hot Tubs Purchased Per Four Customers (0 to 4); vertical axis: Relative Probability (0 to 0.4)]

Probability Distributions for Continuous Random Variables

Consider the random variable:
x = the weight (in pounds) of a full-term newborn child

Suppose that weight is reported to the nearest pound. What type of variable is this? The following probability histogram displays the distribution of weights. What is the sum of the areas of all the rectangles? The area of the rectangle centered over 7 pounds represents the probability 6.5 < x < 7.5.

Now suppose that weight is reported to the nearest 0.1 pound. This would be the probability histogram. Notice that the rectangles are narrower and the histogram begins to have a smoother appearance. The shaded area represents the probability 6 < x < 8.

If weight is measured with greater and greater accuracy, the histogram approaches a smooth curve. This is an example of a density curve.

Probability Distributions for Continuous Variables
• Is specified by a curve called a density curve.
• The function that describes this curve is denoted by f(x) and is called the density function.
• The probability of observing a value in a particular interval is the area under the curve and above the given interval.

Properties of continuous probability distributions
1. f(x) ≥ 0 (the curve cannot dip below the horizontal axis)
2. The total area under the density curve equals one.

Let x denote the amount of gravel sold (in tons) during a randomly selected week at a particular sales facility. Suppose that the density curve has a height f(x) above the value x, where

$$f(x) = \begin{cases} 2(1 - x) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

The density curve is shown in the figure:
[Figure: density curve falling linearly from height 2 at x = 0 to height 0 at x = 1; horizontal axis: Tons; vertical axis: Density]

Gravel problem continued . . .
What is the probability that at most ½ ton of gravel is sold during a randomly selected week?
The probability would be the shaded area under the curve and above the interval from 0 to 0.5.
This area can be found by using the formula for the area of a trapezoid, $A = \tfrac{1}{2}(b_1 + b_2)h$, OR, more easily, by finding the area of the triangle, $A = \tfrac{1}{2}bh$, and subtracting that area from 1.
P(x ≤ ½) = 1 – ½(0.5)(1) = .75

Gravel problem continued . . .
What is the probability that exactly ½ ton of gravel is sold during a randomly selected week?
The probability would be the area under the curve and above 0.5. How do we find the area of a line segment? Since a line segment has NO area, the probability that exactly ½ ton is sold equals 0.
P(x = ½) = 0

Gravel problem continued . . .
What is the probability that less than ½ ton of gravel is sold during a randomly selected week?
P(x < ½) = P(x ≤ ½) = 1 – ½(0.5)(1) = .75
Does the probability change whether the ½ is included or not? This is different than discrete probability distributions, where it does change the probability whether a value is included or not!

Suppose x is a continuous random variable defined as the amount of time (in minutes) taken by a clerk to process a certain type of application form. Suppose x has a probability distribution with density function:

$$f(x) = \begin{cases} 0.5 & 4 < x < 6 \\ 0 & \text{otherwise} \end{cases}$$

The following is the graph of f(x), the density curve:
[Figure: horizontal density curve at height 0.5 between 4 and 6; horizontal axis: Time (in minutes); vertical axis: Density]

Application Problem Continued . . .
What is the probability that it takes more than 5.5 minutes to process the application form?
Find the probability by calculating the area of the shaded region (base × height).
P(x > 5.5) = .5(.5) = .25
When the density is constant over an interval (resulting in a horizontal density curve), the probability distribution is called a uniform distribution.

Other Density Curves
Some density curves are not made up of straight-line segments. Integral calculus is used to find the area under these curves. Don't worry – we will use tables (with the values already calculated). We can also use calculators or statistical software to find the area.

The probability that a continuous random variable x lies between a lower limit a and an upper limit b is
P(a < x < b) = (cumulative area to the left of b) – (cumulative area to the left of a)
P(a < x < b) = P(x < b) – P(x < a)
This will be useful later in this chapter!

Means and Standard Deviations of Probability Distributions
• The mean value of a random variable x, denoted by μx, describes where the probability distribution of x is centered.
• The standard deviation of a random variable x, denoted by σx, describes variability in the probability distribution.

Mean and Variance for Discrete Probability Distributions
• Mean is sometimes referred to as the expected value (denoted E(x)).
  $\mu_x = \sum x \, p(x)$
• Variance is calculated using
  $\sigma_x^2 = \sum (x - \mu_x)^2 \, p(x)$
• Standard deviation is the square root of the variance.

Dogs and Cats Revisited . . .
Let x = the number of dogs and cats in a randomly selected household in Wolf City

x       0     1     2     3     4     5
P(x)    .26   .31   .21   .13   .06   .03
xP(x)   0     .31   .42   .39   .24   .15

What is the mean number of pets per household in Wolf City?
First multiply each x-value times its corresponding probability. Next find the sum of these values.
μx = 0 + .31 + .42 + .39 + .24 + .15 = 1.51 pets

Dogs and Cats Revisited . . .
What is the standard deviation of the number of pets per household in Wolf City?
First find the deviation of each x-value from the mean. Then square these deviations. Next multiply by the corresponding probability. Then add these values.
σx² = (0 − 1.51)²(.26) + (1 − 1.51)²(.31) + (2 − 1.51)²(.21) + (3 − 1.51)²(.13) + (4 − 1.51)²(.06) + (5 − 1.51)²(.03) = 1.7499
This is the variance – take the square root of this value.
σx = 1.323 pets

Mean and Variance for Continuous Random Variables
For continuous probability distributions, μx and σx can be defined and computed using methods from calculus.
• The mean value μx locates the center of the continuous distribution.
• The standard deviation, σx, measures the extent to which the continuous distribution spreads out around μx.

A company receives concrete of a certain type from two different suppliers. Let
x = compression strength of a randomly selected batch from Supplier 1
y = compression strength of a randomly selected batch from Supplier 2
Suppose that
μx = 4650 pounds/inch²   σx = 200 pounds/inch²
μy = 4500 pounds/inch²   σy = 275 pounds/inch²
The first supplier is preferred to the second both in terms of mean value and variability.
[Figure: density curves for the two suppliers; horizontal axis from 4300 to 4900 with μy and μx marked]

Suppose Wolf City Grocery had a total of 14 employees. The following are the monthly salaries of all the employees.

3500 1300 1200 1500 1900 1700 1400
2300 2100 1200 1800 1400 1200 1300

The mean and standard deviation of the monthly salaries are
μx = $1700 and σx = $603.56

Suppose business is really good, so the manager gives everyone a $100 raise per month. The new mean and standard deviation would be?
μ = $1800 and σ = $603.56
Let's graph boxplots of these monthly salaries to see what happens to the distributions . . .
We see that the distribution just shifts to the right 100 units but the spread is the same.
What happened to the means? What happened to the standard deviations?
What would happen to the mean and standard deviation if we had to deduct $100 from everyone's salary because of business being bad?

Wolf City Grocery Continued . . .
μx = $1700 and σx = $603.56
Suppose the manager gives everyone a 20% raise - the new mean and standard deviation would be
μ = $2040 and σ = $724.27
Let's graph boxplots of these monthly salaries to see what happens to the distributions . . .
Notice that multiplying by a constant stretches the distribution, thus changing the standard deviation.
Notice that both the mean and standard deviation increased by a factor of 1.2.

Mean and Standard Deviation of Linear Functions
If x is a random variable with mean μx and standard deviation σx, and a and b are numerical constants, and the random variable y is defined by
$y = a + bx$
then
$\mu_y = \mu_{a+bx} = a + b\mu_x$
$\sigma_y^2 = \sigma_{a+bx}^2 = b^2\sigma_x^2$ or $\sigma_y = |b|\,\sigma_x$

Consider the chance experiment in which a customer of a propane gas company is randomly selected. Let x be the number of gallons required to fill a propane tank. Suppose that the mean and standard deviation are 318 gallons and 42 gallons, respectively. The company is considering the pricing model of a service charge of $50 plus $1.80 per gallon. Let y be the random variable of the amount billed.
What is the equation for y?
y = 50 + 1.8x
What are the mean and standard deviation for the amount billed?
μy = 50 + 1.8(318) = $622.40
σy = 1.8(42) = $75.60

Suppose we are going to play a game called Stat Land! Players spin the two spinners below and move the sum of the two numbers.
[Figure: Spinner A with sections 1, 2, 3, 4; Spinner B with sections 1, 2, 3, 4, 5, 6]
Here are the mean and standard deviation for each spinner.
μA = 2.5   μB = 3.5
σA = 1.118   σB = 1.708
Find the mean and standard deviation for these sums.
List all the possible sums (A + B).

        B=1  B=2  B=3  B=4  B=5  B=6
A=1      2    3    4    5    6    7
A=2      3    4    5    6    7    8
A=3      4    5    6    7    8    9
A=4      5    6    7    8    9   10

μA+B = 6
σA+B = 2.041
Notice that the mean of the sums is the sum of the means!
How are the standard deviations related? Not sure – let's think about it and return in just a few minutes!

Stat Land Continued . . .
Suppose one variation of the game had players move the difference of the spinners.
μA = 2.5   μB = 3.5
σA = 1.118   σB = 1.708
Find the mean and standard deviation for these differences.
List all the possible differences (B - A).

        A=1  A=2  A=3  A=4
B=1      0   -1   -2   -3
B=2      1    0   -1   -2
B=3      2    1    0   -1
B=4      3    2    1    0
B=5      4    3    2    1
B=6      5    4    3    2

μB-A = 1
σB-A = 2.041
Notice that the mean of the differences is the difference of the means!
WOW – this is the same value as the standard deviation of the sums!
How do we find the standard deviation for the sums or differences?

Mean and Standard Deviations for Linear Combinations
If x1, x2, …, xn are random variables with means μ1, μ2, …, μn and variances σ1², σ2², …, σn², respectively, and
$y = a_1x_1 + a_2x_2 + \dots + a_nx_n$
then
$\mu_y = a_1\mu_{x_1} + a_2\mu_{x_2} + \dots + a_n\mu_{x_n}$
This result is true regardless of whether the x's are independent.
$\sigma_y^2 = a_1^2\sigma_{x_1}^2 + a_2^2\sigma_{x_2}^2 + \dots + a_n^2\sigma_{x_n}^2$
This result is true ONLY if the x's are independent.

A commuter airline flies small planes between San Luis Obispo and San Francisco. For small planes the baggage weight is a concern. Suppose it is known that the variable x = weight (in pounds) of baggage checked by a randomly selected passenger has a mean and standard deviation of 42 and 16, respectively. Consider a flight on which 10 passengers, all traveling alone, are flying.
The total weight of checked baggage, y, is
y = x1 + x2 + … + x10

Airline Problem Continued . . .
μx = 42 and σx = 16
The total weight of checked baggage, y, is y = x1 + x2 + … + x10
What is the mean total weight of the checked baggage?
μy = μ1 + μ2 + … + μ10 = 42 + 42 + … + 42 = 420 pounds

Airline Problem Continued . . .
μx = 42 and σx = 16
What is the standard deviation of the total weight of the checked baggage?
Since the 10 passengers are all traveling alone, it is reasonable to think that the 10 baggage weights are unrelated and therefore independent.
σy² = σx1² + σx2² + … + σx10² = 16² + 16² + … + 16² = 2560 pounds²
To find the standard deviation, take the square root of this value.
σy = 50.596 pounds

The Attila Barbell Company makes bars for weight lifting. The weights of the bars are independent and are normally distributed with a mean of 720 ounces (45 pounds) and a standard deviation of 4 ounces. The bars are shipped 10 in a box to the retailers. The weights of the empty boxes are normally distributed with a mean of 320 ounces and a standard deviation of 8 ounces. The weights of the boxes filled with 10 bars are expected to be normally distributed with a mean of 7520 ounces and a standard deviation of:

$\sigma_{x+y} = \sqrt{\sigma_x^2 + \sigma_y^2}$
For the box filled with 10 bars:
$\sigma_{\text{bars and box}} = \sqrt{10(4)^2 + 8^2} = \sqrt{224} \approx 14.97$ ounces

Number of   Probability  (Number of Courses)  Mean   Deviation  Deviation^2  (Deviation^2)
Courses                  *(Probability)                                      *(Probability)
1           0.02         0.02                 4.66   -3.66      13.3956      0.267912
2           0.03         0.06                 4.66   -2.66       7.0756      0.212268
3           0.09         0.27                 4.66   -1.66       2.7556      0.248004
4           0.25         1.00                 4.66   -0.66       0.4356      0.108900
5           0.40         2.00                 4.66    0.34       0.1156      0.046240
6           0.16         0.96                 4.66    1.34       1.7956      0.287296
7           0.05         0.35                 4.66    2.34       5.4756      0.273780
Sum         1            4.66                                                1.4444

Mean = 4.66
Variance = 1.4444
Standard Deviation = 1.20

P(# of courses > μx) = p(# of courses > 4.66) = p(5) + p(6) + p(7) = 0.61

P(μx − 2σx < # of courses < μx + 2σx) = p(4.66 − 2.40 < # of courses < 4.66 + 2.40) = p(2.26 < # of courses < 7.06) = p(3) + p(4) + p(5) + p(6) + p(7) = 0.95

P(# of courses < μx − 2σx OR # of courses > μx + 2σx) = p(# of courses < 2.26 OR # of courses > 7.06) = p(1) + p(2) = 0.05

Special Distributions
Two Discrete Distributions: Binomial and Geometric
One Continuous Distribution: Normal Distributions

Suppose we decide to record the gender of the next 25 newborns at a particular hospital. These questions can be answered using a binomial distribution.

Properties of a Binomial Experiment
1. There are a fixed number of trials. (We use n to denote the fixed number of trials.)
2. Each trial results in one of two mutually exclusive outcomes. (success/failure)
3. Outcomes of different trials are independent.
4. The probability that a trial results in success is the same for all trials.

The binomial random variable x is defined as
x = the number of successes observed when a binomial experiment is performed

Are these binomial distributions?
1) Toss a coin 10 times and count the number of heads: Yes
2) Deal 10 cards from a shuffled deck and count the number of red cards: No, probability does not remain constant
3) The number of tickets sold to children under 12 at a movie theater in a one hour period: No, no fixed number of trials

Binomial Probability Formula:
Let
n = number of independent trials in a binomial experiment
p = constant probability that any trial results in a success

$P(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$

Where:
${}_nC_x = \binom{n}{x} = \frac{n!}{x!\,(n-x)!}$

Appendix Table 9 can be used to find binomial probabilities. Technology, such as calculators and statistical software, will also perform this calculation.

Instead of recording the gender of the next 25 newborns at a particular hospital, let's record the gender of the next 5 newborns at this hospital.
Is this a binomial experiment? Yes, if the births were not multiple births (twins, etc).
What is the probability of "success"?
Define the random variable of interest.
x = the number of females born out of the next 5 births
What are the possible values of x?
x: 0 1 2 3 4 5
Will a binomial random variable always include the value of 0?
What will the largest value of the binomial random variable be?

Newborns Continued . . .
What is the probability that exactly 2 girls will be born out of the next 5 births?
$P(x = 2) = {}_5C_2\,(0.5)^2(0.5)^3 = .3125$

What is the probability that less than 2 girls will be born out of the next 5 births?
$P(x < 2) = p(0) + p(1) = {}_5C_0\,(.5)^0(.5)^5 + {}_5C_1\,(.5)^1(.5)^4 = .1875$

Newborns Continued . . .
Let's construct the discrete probability distribution table for this binomial random variable:

x      0        1        2       3       4        5
p(x)   .03125   .15625   .3125   .3125   .15625   .03125

What is the mean number of girls born in the next five births?
Since this is a discrete distribution, we could use $\mu_x = \sum x \, p(x)$:
μx = 0(.03125) + 1(.15625) + 2(.3125) + 3(.3125) + 4(.15625) + 5(.03125) = 2.5
Notice that this is the same as multiplying n × p.

Formulas for mean and standard deviation of a binomial distribution
$\mu_x = np$
$\sigma_x = \sqrt{np(1-p)}$

Newborns Continued . . .
How many girls would you expect in the next five births at a particular hospital?
$\mu_x = np = 5(.5) = 2.5$
What is the standard deviation of the number of girls born in the next five births?
$\sigma_x = \sqrt{np(1-p)} = \sqrt{5(.5)(.5)} = 1.118$

Remember, in binomial distributions, trials should be independent. However, when we sample, we typically sample without replacement, which would mean that the trials are not independent. In this case, the number of successes observed would not have a binomial distribution but rather a hypergeometric distribution.
The calculation of probabilities in a hypergeometric distribution is even more tedious than the binomial formula! But when the sample size, n, is small and the population size, N, is large, probabilities calculated using binomial distributions and hypergeometric distributions are VERY close!
When sampling without replacement, if n is at most 5% of N, then the binomial distribution gives a good approximation to the probability distribution of x.

Suppose a particular breed of dog gives birth to a male dog 59% of the time and gives birth to a female dog 41% of the time.
Let M = event that a male pup is born
Let F = event that a female pup is born
Let x = the number of male pups born in a litter of four pups
Fill in the following table:

Outcome  Probability  Number of Male Pups (x)    Outcome  Probability  Number of Male Pups (x)
FFFF     0.0283       0                          FMMF     0.0585       2
MFFF     0.0407       1                          FMFM     0.0585       2
FMFF     0.0407       1                          FFMM     0.0585       2
FFMF     0.0407       1                          MMMF     0.0842       3
FFFM     0.0407       1                          MMFM     0.0842       3
MMFF     0.0585       2                          MFMM     0.0842       3
MFMF     0.0585       2                          FMMM     0.0842       3
MFFM     0.0585       2                          MMMM     0.1212       4

[Probability histogram: Probability of Getting a Number of Male Pups in a Litter of Four Pups; horizontal axis: Number of Male Pups in a Litter of Four (0 to 4); vertical axis: Probability (0 to 0.4)]

Newborns Revisited . . .
Suppose we were not interested in the number of females born out of the next five births, but which birth would result in the first female being born?
How is this question different from a binomial distribution?

Properties of Geometric Distributions:
• There are two mutually exclusive outcomes that result in a success or failure
• Each trial is independent of the others
• The probability of success is the same for all trials.

A geometric random variable x is defined as
x = the number of trials UNTIL the FIRST success is observed (including the success).
So what are the possible values of x?
x: 1 2 3 4 . . .
How far will this go? To infinity

Probability Formula for the Geometric Distribution
Let
p = constant probability that any trial results in a success
$p(x) = (1-p)^{x-1}\,p$
Where x = 1, 2, 3, …

Suppose that 40% of students who drive to campus at your school or university carry jumper cables. Your car has a dead battery and you don't have jumper cables, so you decide to stop students as they are headed to the parking lot and ask them whether they have a pair of jumper cables.
Let x = the number of students stopped until one with a pair of jumper cables is found (counting that student)
Is this a geometric distribution? Yes

Jumper Cables Continued . . .
Let x = the number of students stopped until one with a pair of jumper cables is found (counting that student)
p = .4
What is the probability that the third student stopped will be the first student to have jumper cables?
P(x = 3) = (.6)²(.4) = .144
What is the probability that at most three students are stopped before finding one with jumper cables?
P(x ≤ 3) = P(1) + P(2) + P(3) = (.6)⁰(.4) + (.6)¹(.4) + (.6)²(.4) = .784

Welcome back!
Please pick up:
• Notes Packet
• Assignment Sheet
• t-score table

Normal Distributions
• Continuous probability distribution
• Symmetrical bell-shaped (unimodal) density curve defined by μ and σ
• Area under the curve equals 1
• Probability of observing a value in a particular interval is calculated by finding the area under the curve
• As σ increases, the curve flattens & spreads out
• As σ decreases, the curve gets taller and thinner
How is this done mathematically? To overcome the need for calculus, we rely on technology or on a table of areas for the standard normal distribution.

[Figure: two normal curves, A and B, with 6 marked on the horizontal axis and σ marked on each curve]
Do these two normal curves have the same mean? If so, what is it? YES
Which normal curve has a standard deviation of 3? B
Which normal curve has a standard deviation of 1? A

Notice that the normal curve is curving downwards from the center (mean) to points that are one standard deviation on either side of the mean. At those points, the normal curve begins to turn upward.

Standard Normal Distribution
• Is a normal distribution with μ = 0 and σ = 1
• It is customary to use the letter z to represent a variable whose distribution is described by the standard normal curve (or z curve).
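Areas under the standard normal curve can also be computed directly with technology. The sketch below is one way to do it, assuming only Python's standard library; the function names are illustrative. It uses the identity Φ(z) = ½[1 + erf(z/√2)] for the area to the left of z, and finds an interval probability for any normal curve by converting the endpoints to z values.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal (z) curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval_probability(a, b, mu=0.0, sigma=1.0):
    """P(a < x < b) for a normal distribution with mean mu and standard deviation sigma."""
    z_a = (a - mu) / sigma  # convert each endpoint to a z value
    z_b = (b - mu) / sigma
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)

# Half of the total area lies to the left of the mean
print(round(standard_normal_cdf(0.0), 4))
# Area within one standard deviation of the mean
print(round(normal_interval_probability(-1.0, 1.0), 4))
```

Because the total area under the curve is 1, the area to the right of any z is 1 minus the value returned by `standard_normal_cdf`.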
Using the Table of Standard Normal (z) Curve Areas
• For any number z*, from -3.89 to 3.89 and rounded to two decimal places, Appendix Table 2 gives the area under the z curve and to the left of z*.
  P(z < z*) = P(z ≤ z*)
  Where the letter z is used to represent a random variable whose distribution is the standard normal distribution.
To use the table:
• Find the correct row and column (see the following example)
• The number at the intersection of that row and column is the probability

Suppose we are interested in the probability that z is less than -1.62.
In the table of areas:
• Find the row labeled -1.6
• Find the column labeled 0.02
• Find the intersection of the row and column

z*      .00     .01     .02     …
…
-1.7    .0446   .0436   .0427   …
-1.6    .0548   .0537   .0526   …
-1.5    .0668   .0655   .0643   …
…

P(z < -1.62) = .0526

Suppose we are interested in the probability that z is less than 2.31.

z*      .00     .01     .02     …
…
2.2     .9861   .9864   .9868   …
2.3     .9893   .9896   .9898   …
2.4     .9918   .9920   .9922   …
…

P(z < 2.31) = .9896

Suppose we are interested in the probability that z is greater than 2.31.
The Table of Areas gives the area to the LEFT of the z*. To find the area to the right, subtract the value in the table from 1.
P(z > 2.31) = 1 - .9896 = .0104

Suppose we are interested in finding the z* for the smallest 2%.
To find z*: P(z < z*) = .02
Look for the area .0200 in the body of the Table. Follow the row and column back out to read the z-value. Since .0200 doesn't appear in the body of the Table, use the value closest to it.

z*      …   .04     .05     .06     …
…
-2.1    …   .0162   .0158   .0154   …
-2.0    …   .0207   .0202   .0197   …
-1.9    …   .0262   .0256   .0250   …
…

z* = -2.05

Suppose we are interested in finding the z* for the largest 5%.
P(z > z*) = .05
Remember the Table of Areas gives the area to the LEFT of z*, so first find
1 – (area to the right of z*) = 1 – .05 = .95
Then look up this value in the body of the table.

z*      …   .04     .05     .06     …
…
1.5     …   .9382   .9394   .9406   …
1.6     …   .9495   .9505   .9515   …
1.7     …   .9591   .9599   .9608   …
…

Since .9500 is exactly between .9495 and .9505, we can average the z* for each of these.
z* = 1.645

Finding Probabilities for Other Normal Curves
• To find the probabilities for other normal curves, standardize the relevant values and then use the table of z areas.
• If x is a random variable whose behavior is described by a normal distribution with mean μ and standard deviation σ, then
  P(x < b) = P(z < b*)
  P(x > a) = P(z > a*)
  P(a < x < b) = P(a* < z < b*)
  Where z is a variable whose distribution is standard normal and
  $a^* = \frac{a - \mu}{\sigma} \qquad b^* = \frac{b - \mu}{\sigma}$

Data on the length of time to complete registration for classes using an on-line registration system suggest that the distribution of the variable
x = time to register
for students at a particular university can well be approximated by a normal distribution with mean μ = 12 minutes and standard deviation σ = 2 minutes.

Registration Problem Continued . . .
x = time to register
μ = 12 minutes and σ = 2 minutes
What is the probability that it will take a randomly selected student less than 9 minutes to complete registration?
Standardize this value.
$b^* = \frac{9 - 12}{2} = -1.5$
Look this value up in the table.
P(x < 9) = .0668

Registration Problem Continued . . .
x = time to register
μ = 12 minutes and σ = 2 minutes
What is the probability that it will take a randomly selected student more than 13 minutes to complete registration?
Standardize this value.
$a^* = \frac{13 - 12}{2} = .5$
Look this value up in the table and subtract from 1.
P(x > 13) = 1 - .6915 = .3085

Registration Problem Continued . . .
x = time to register
Standardize these values.
μ = 12 minutes and σ = 2 minutes
What is the probability that it will take a randomly selected student between 7 and 15 minutes to complete registration?
$a^* = \frac{15 - 12}{2} = 1.5 \qquad b^* = \frac{7 - 12}{2} = -2.5$
Look these values up in the table and subtract: (value for a*) – (value for b*)
P(7 < x < 15) = .9332 - .0062 = .9270

Registration Problem Continued . . .
x = time to register
μ = 12 minutes and σ = 2 minutes
Because some students do not log off properly, the university would like to log off students automatically after some time has elapsed. It is decided to select this time so that only 1% of students will be automatically logged off while still trying to register.
What time should the automatic log off be set at?
P(x > a*) = .01
Look up the area to the left of a* (.99) in the table. Use the formula for standardizing to find x.
$2.33 = \frac{x - 12}{2}$
a* = 16.66 minutes

Ways to Assess Normality
Some of the most frequently used statistical methods are valid only when x1, x2, …, xn has come from a population distribution that at least is approximately normal. One way to see whether an assumption of population normality is plausible is to construct a normal probability plot of the data.
A normal probability plot is a scatterplot of (normal score, observed values) pairs.
What should happen if our data set is normally distributed?

Consider a random sample with n = 5.
To find the appropriate normal scores for a sample of size 5, divide the standard normal curve into 5 equal-area regions. Each region has an area equal to 0.2.
Why are these regions not the same width?

Consider a random sample with n = 5.
Next – find the median z-score for each region. These are the normal scores that we would plot our data against.
We use technology (calculators or statistical software) to compute these normal scores.
Why is the median not in the "middle" of each region?
Normal scores for n = 5: -1.28, -.524, 0, .524, 1.28

Ways to Assess Normality
A strong linear pattern in a normal probability plot suggests that population normality is plausible.
On the other hand, systematic departure from a straight-line pattern (such as curvature, which would indicate skewness in the data, or outliers) indicates that it is not reasonable to assume that the population is normal.

Let’s construct a normal probability plot.
The following data represent egg weights (in grams) for a sample of 10 eggs.
53.04 53.50 52.53 53.00 53.07 52.86 52.66 53.23 53.26 53.16
Since the values of the normal scores depend on the sample size n, the normal scores when n = 10 are below:
-1.539 -1.001 -0.656 -0.376 -0.123 0.123 0.376 0.656 1.001 1.539
Sketch a scatterplot by pairing the smallest normal score with the smallest observation from the data set, and so on.
[Figure: normal probability plot of egg weight (grams) against normal score]
Since the normal probability plot is approximately linear, it is plausible that the distribution of egg weights is approximately normal.

Using the Correlation Coefficient to Assess Normality
• The correlation coefficient, r, can be calculated for the n (normal score, observed value) pairs.
• If r is too much smaller than 1, then normality of the underlying distribution is questionable.
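The pairing and the correlation calculation for the egg data can be sketched as follows. The data and the n = 10 normal scores are taken from the slides. The helper name `correlation` is illustrative, and the formula is the ordinary Pearson correlation coefficient written out by hand.

```python
from math import sqrt

# Normal scores for n = 10 and the egg weights (grams), as given in the slides.
normal_scores = [-1.539, -1.001, -0.656, -0.376, -0.123,
                 0.123, 0.376, 0.656, 1.001, 1.539]
egg_weights = [53.04, 53.50, 52.53, 53.00, 53.07,
               52.86, 52.66, 53.23, 53.26, 53.16]


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Pair the smallest normal score with the smallest observation, and so on.
pairs = list(zip(normal_scores, sorted(egg_weights)))

r = correlation([score for score, _ in pairs], [weight for _, weight in pairs])
print(round(r, 3))  # 0.986
```

Sorting the observations before pairing is the step that turns the raw sample into (normal score, observed value) points. Without it, r would measure nothing useful.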
How much smaller is “too much smaller than 1”?

Values to Which r Can be Compared to Check for Normality
n           5     10    15    20    25    30    40    50    60    75
Critical r  .832  .880  .911  .929  .941  .949  .960  .966  .971  .976

Consider these points from the weight of eggs data:
(-1.539, 52.53) (-1.001, 52.66) (-.656, 52.86) (-.376, 53.00) (-.123, 53.04)
(.123, 53.07) (.376, 53.16) (.656, 53.23) (1.001, 53.26) (1.539, 53.50)
Calculate the correlation coefficient for these points.
r = .986
For n = 10 the critical r is .880. Since r > critical r, it is plausible that the sample of egg weights came from a distribution that was approximately normal.

Transforming Data to Achieve Normality
• When the data are not normal, it is common to use a transformation of the data.
• For data that show strong positive skewness (long upper tail), a logarithmic transformation is usually applied.
• Square root, cube root, and other transformations can also be applied to the data to determine which transformation best normalizes the data.

Consider the data set in Table 7.4 (page 463) about plasma and urinary AGT levels.
A histogram of the urinary AGT levels is strongly positively skewed.
A logarithmic transformation is applied to the data.
The histogram of the log urinary AGT levels is more symmetric.

Using the Normal Distribution to Approximate a Discrete Distribution
Suppose the probability distribution of a discrete random variable x is displayed in the histogram below.
[Figure: probability histogram of x with a normal curve superimposed]
Often, a probability histogram can be well approximated by a normal curve. If so, it is customary to say that x has an approximately normal distribution.
The probability of a particular value is the area of the rectangle centered at that value.
Suppose a bar is centered at x = 6. The bar actually begins at 5.5 and ends at 6.5. These endpoints will be used in the calculations. This is called a continuity correction.
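Written out as a formula, the continuity correction for the bar at x = 6 and for a general range of whole-number values looks like the following. The symbol y is an assumption introduced here for the approximating normal variable (same mean and standard deviation as x). It does not appear in the slides.

```latex
\[
P(x = 6) \approx P(5.5 < y < 6.5),
\qquad
P(a \le x \le b) \approx P(a - 0.5 < y < b + 0.5)
\]
```

Each whole-number value contributes the full width of its bar, which is why the lower endpoint moves down by 0.5 and the upper endpoint moves up by 0.5.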
Normal Approximation to a Binomial Distribution
Let x be a binomial random variable based on n trials and success probability p, so that:
$\mu = np$ and $\sigma = \sqrt{np(1 - p)}$
If n and p are such that:
np ≥ 10 and n(1 - p) ≥ 10
then x has an approximately normal distribution.

Premature babies are born before 37 weeks, and those born before 34 weeks are most at risk. A study reported that 2% of births in the United States occur before 34 weeks.
Suppose that 1000 births are randomly selected and that the number of these births that occurred prior to 34 weeks, x, is to be determined.
Can the distribution of x be approximated by a normal distribution?
np = 1000(.02) = 20 > 10
n(1 - p) = 1000(.98) = 980 > 10
Since both are greater than 10, the distribution of x can be approximated by a normal distribution.
Find the mean and standard deviation for the approximating normal distribution.
$\mu = np = 1000(.02) = 20$
$\sigma = \sqrt{np(1 - p)} = \sqrt{1000(.02)(.98)} = 4.427$

Premature Babies Continued . . .
μ = 20 and σ = 4.427
What is the probability that the number of babies in the sample of 1000 born prior to 34 weeks will be between 10 and 25 (inclusive)?
To find the shaded area, standardize the endpoints, using the continuity correction (9.5 and 25.5):
$a^* = \frac{9.5 - 20}{4.427} = -2.37$ and $b^* = \frac{25.5 - 20}{4.427} = 1.24$
Look up these values in the table and subtract the probabilities.
P(10 ≤ x ≤ 25) = .8925 - .0089 = .8836
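The premature-babies calculation can be reproduced with a short sketch. It assumes the standard-library `statistics.NormalDist` replaces the printed table, so the z-scores are not rounded to two decimals and the answer can differ slightly from the table-based .8836. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.02  # sample size and proportion born before 34 weeks (from the slides)

# Rule of thumb for using the normal approximation.
normal_ok = n * p >= 10 and n * (1 - p) >= 10

# Mean and standard deviation of the approximating normal distribution.
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma)

# Continuity correction: "between 10 and 25 inclusive" covers the bars
# from 9.5 up to 25.5.
prob = approx.cdf(25.5) - approx.cdf(9.5)

print(normal_ok)        # True
print(round(sigma, 3))  # 4.427
print(round(prob, 3))   # about .884, close to the table-based .8836
```

Without the continuity correction (using 10 and 25 as the endpoints), the approximation would leave out half of the bar at 10 and half of the bar at 25.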