Binomial setting and distributions

Binomial distributions are models for some categorical variables, typically representing the number of successes in a series of n independent trials. The observations must meet these requirements:

• the total number of observations n is fixed in advance
• each observation falls into just one of two categories: success and failure
• the outcomes of all n observations are statistically independent
• all n observations have the same probability p of “success”

Applications for binomial distributions

Binomial distributions describe the possible number of times that a particular event will occur in a sequence of observations.

• In a clinical trial, a patient’s condition may improve or not. The binomial distribution describes the number of patients who improved (not how much better they feel) among the study participants.
• Is a child obese or not (based on their body mass index)? The binomial distribution describes the number of obese children in a random sample of school-age children.
• In a quality control study, we assess the number of defective items in a lot of goods, irrespective of the type of defect.

Binomial parameters

We express a binomial distribution for the count X of successes among n observations as a function of the parameters n and p: X ~ B(n, p).

• The parameter n is the total number of observations.
• The parameter p is the probability of success on each observation.
• The count of successes X can be any whole number between 0 and n.

Example: The CDC estimates that a third of adult men are obese. In a random sample of 10 adult men, each man is either obese or not. The variable X is the number of obese men among those 10 men sampled, our count of “successes.” For each man, the probability of success, “obese,” is 1/3. The number X of obese men among 10 men has the binomial distribution B(n = 10, p = 1/3).
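As a rough illustration of the obesity example above, the sketch below tabulates the B(n = 10, p = 1/3) distribution in R. It relies only on the base R function dbinom; the variable names n, p, k and probs are chosen here for illustration.

```r
# Sketch: distribution of X ~ B(n = 10, p = 1/3), the number of obese men
# in a random sample of 10 adult men (values taken from the example above).
n <- 10
p <- 1 / 3
k <- 0:n                                  # possible counts of "successes"
probs <- dbinom(k, size = n, prob = p)    # P(X = k) for each k

# One row per possible value of X, with its probability
print(data.frame(k = k, probability = round(probs, 4)))

# The probabilities over all possible counts add up to 1
print(sum(probs))
```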
Binomial probabilities

The number of ways of arranging k successes in a series of n observations (with constant probability p of success) is the number of possible combinations (unordered sequences). This can be calculated with the binomial coefficient:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad k = 0, 1, 2, \ldots, n$$

R: choose(n,k)

The binomial coefficient “n choose k” uses the factorial notation “!”. The factorial n! for any strictly positive whole number n is:

n! = n × (n − 1) × (n − 2) × … × 3 × 2 × 1

By convention, 0! = 1.

The binomial coefficient counts the number of ways in which k successes can be arranged among n observations. The binomial probability P(X = k) is this count multiplied by the probability of any specific arrangement of the k successes:

$$P(X = k) = \binom{n}{k}\, p^k (1-p)^{n-k}$$

Writing q = 1 − p, the full distribution is:

X        P(X)
0        $\binom{n}{0} p^0 q^n = q^n$
1        $\binom{n}{1} p^1 q^{n-1}$
2        $\binom{n}{2} p^2 q^{n-2}$
…        …
k        $\binom{n}{k} p^k q^{n-k}$
…        …
n        $\binom{n}{n} p^n q^0 = p^n$
Total    1

The probability that a binomial random variable takes any range of values is the sum of each probability for getting exactly that many successes in n observations. For example:

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)

Example: The frequency of color blindness (dyschromatopsia) in the Caucasian American male population is estimated to be about 8%. In a group of 25 Caucasian American males, what is the probability that exactly five are color blind?

• By hand:

$$P(X = 5) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k} = \frac{25!}{5!\,20!}\, (0.08)^5 (0.92)^{20} = \frac{21 \times 22 \times 23 \times 24 \times 25}{1 \times 2 \times 3 \times 4 \times 5}\, (0.08)^5 (0.92)^{20}$$

$$= 53{,}130 \times 0.0000033 \times 0.1887 = 0.03285$$

• Use technology:

> dbinom(5,25,.08)
[1] 0.03285083

The incidence of major depression in adults is about 10%. A random sample of 50 adults will be tested for depression. The variable X is the number of individuals diagnosed with depression among all 50 and has the binomial distribution Bin(n = 50, p = 0.1). The probability that exactly 2 adults in the sample have depression is:
A) 0.010   B) 0.020   C) 0.078   D) 0.100   E) 0.112

Binomial mean and variance

The center and spread of the binomial distribution for a count X are defined by the mean μ and standard deviation σ (the variance is σ² = np(1 − p)):

$$\mu = np, \qquad \sigma = \sqrt{np(1-p)}$$

Example: The incidence of major depression in adults is about 10%. A random sample of 50 adults will be tested for depression. The variable X is the number of individuals diagnosed with depression among all 50 and has the binomial distribution Bin(n = 50, p = 0.1). Thus,

$$\mu = np = 50 \times 0.1 = 5, \qquad \sigma = \sqrt{np(1-p)} = \sqrt{50 \times 0.1 \times 0.9} = \sqrt{4.5} \approx 2.12$$

Effect of changing p when n is fixed

Binomial distributions are skewed when p is close to 0 or close to 1 (especially if the sample is small).

[Figure: bar charts of P(X = x) against X for B(5, 0.1), B(5, 0.3), B(5, 0.5), B(5, 0.7), and B(5, 0.9).]

Effect of changing n for a fixed value of p

[Figure: bar charts of P(X = x) against X for B(5, 0.15), B(10, 0.15), B(15, 0.15), and B(20, 0.15).]

Normal approximation to binomial

A binomial distribution can be approximated by a Normal distribution when both np ≥ 10 and n(1 − p) ≥ 10:

$$B\left(\mu = np,\ \sigma = \sqrt{np(1-p)}\right) \approx N\left(\mu = np,\ \sigma = \sqrt{np(1-p)}\right)$$

The approximation can be improved by using a continuity correction to take into account the fact that the Normal distribution is continuous. Hint: the binomial probability P(X = x) is approximated by the Normal probability P(x − 0.5 ≤ X ≤ x + 0.5).

Example: The incidence of major depression in adults is about 10%.

Count of adults diagnosed with depression in a sample of 20 adults, Bin(n = 20, p = 0.1).

[Figure: Binomial, n = 20, p = 0.1; vertical axis: Probability.]

No Normal approximation. Why?
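One way to approach the “Why?” above is to check the rule of thumb numerically for the two sample sizes used in this depression example (20 and 100 adults). The sketch below is only an illustration: normal_ok is a hypothetical helper name, not a base R function, and it simply encodes the criterion np ≥ 10 and n(1 − p) ≥ 10 stated earlier.

```r
# Hypothetical helper (not part of base R): applies the rule of thumb
# np >= 10 and n(1 - p) >= 10 for using the Normal approximation.
normal_ok <- function(n, p) {
  (n * p >= 10) && (n * (1 - p) >= 10)
}

p <- 0.1   # incidence of major depression, from the example

# Sample of 20 adults: np = 2, which is below 10, so the rule fails
print(c(np = 20 * p, nq = 20 * (1 - p)))
print(normal_ok(20, p))

# Sample of 100 adults: np = 10 and n(1 - p) = 90, so the rule is met
print(c(np = 100 * p, nq = 100 * (1 - p)))
print(normal_ok(100, p))
```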
Count of adults diagnosed with depression in a sample of 100 adults, Bin(n = 100, p = 0.1).

[Figure: Binomial, n = 100, p = 0.1; Probability against count of adults with depression.]

Normal approximation OK. Why?

Example: The frequency of color blindness (dyschromatopsia) in the Caucasian American male population is about 8%. We take a random sample of size 125 from this population. What is the probability that 6 individuals or fewer in the sample are color blind?

• Distribution of the count X: B(n = 125, p = 0.08), so np = 10.
  P(X ≤ 6) = pbinom(6,size=125,prob=.08) in R
  [1] 0.1198136, or about 12%
• Normal approximation: N(np = 10, √(np(1 − p)) = 3.033)
  P(X ≤ 6) = pnorm(6, mean=10, sd=3.033) = 0.0936, or about 9%
  Or z = (x − µ)/σ = (6 − 10)/3.033 = −1.32, so P(X ≤ 6) = 0.0934 from Table B

The Normal approximation is reasonable, but 9% is not especially close to the exact 12%. Here p = .08 is not close to 0.5, but np = 10 just meets the criterion. Using a continuity correction greatly improves the approximation:

• P(X ≤ 6) ≈ Normal P(X ≤ 6.5) = pnorm(6.5, mean=10, sd=3.033) = 0.1243

Distributions for the color blindness example.

[Figure: binomial distribution and Normal approximation, P(X = x) against count of successes, for n = 50, n = 125, and n = 1000.]

The larger the sample size, the better the Normal approximation fits the binomial distribution.

The Poisson distributions

A Poisson distribution describes the count X of occurrences of an event in fixed, finite intervals of time or space when

• occurrences are all independent,
• and the probability of an occurrence is the same over all possible intervals.
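To make the two conditions above concrete, here is a small simulation sketch in R. It is an illustration under assumed numbers, not a derivation: the values of n_cells and n_events and the seed are arbitrary choices made here. Occurrences are placed independently, each equally likely to land in any of the equal-sized intervals, and the observed share of intervals holding 0, 1, 2, … occurrences is compared with the probabilities returned by the base R function dpois, which it should match approximately.

```r
set.seed(42)        # arbitrary seed so the run is reproducible
n_cells  <- 2000    # assumed number of equal-sized intervals
n_events <- 3000    # assumed total number of occurrences

# Each occurrence lands in one interval, independently and uniformly
cell   <- sample.int(n_cells, size = n_events, replace = TRUE)
counts <- tabulate(cell, nbins = n_cells)   # occurrences per interval

mu <- mean(counts)  # average per interval: 3000 / 2000 = 1.5

# Observed proportion of intervals with 0..6 occurrences vs. Poisson(mu)
k        <- 0:6
observed <- sapply(k, function(x) mean(counts == x))
expected <- dpois(k, lambda = mu)
print(round(rbind(k, observed, expected), 3))
```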
Think of the Poisson distribution as describing the number of items in containers.

Items                    Containers
Radioactive decays       Second
Weeds                    Acre of farm land
Fleas                    Dog
Cardiovascular deaths    County / year

If we divide a natural lawn into 1 ft² quadrants, we can count how many dandelions are in each quadrant. Dandelion seeds are wind-spread. The probabilities of a quadrant containing 0, 1, 2, 3, … dandelions are given by a Poisson distribution:

(i) independence of dandelions: the presence of one dandelion in a quadrant does not make the presence of another more or less likely.
(ii) homogeneity of quadrants: each quadrant is equally likely to contain dandelions.

Poisson probabilities

If μ is the population mean number of occurrences for a specified interval of time or space, then the Poisson probability of observing k occurrences (k = 0, 1, 2, …) at constant μ (> 0) is:

$$P(X = k) = \frac{e^{-\mu}\mu^k}{k!}$$

The Poisson distribution has mean μ and standard deviation σ:

$$\sigma = \sqrt{\mu}$$

Effect of changing μ:

[Figure: Poisson distributions with mean 1.5, 3.5, 7, and 15; Probability against X.]

The Poisson distribution is skewed when μ < 5.

Example: The number of deer crossing a road at night during mating season in a particular rural area can be modeled with a Poisson distribution. A local survey conducted over 4 nights found a total of 20 deer crossings. Based on this information, what is the probability that fewer than three deer would cross on a given night during mating season in this area?

$$P(X = k) = \frac{e^{-\mu}\mu^k}{k!}, \qquad k = 0, 1, 2, \ldots, \text{ for some } \mu > 0$$

To compute this probability using the Poisson distribution, we need to know μ. In this case μ = 20 / 4 = 5 deer crossings per night.
> ppois(2,lambda=5)
[1] 0.124652

$$P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{e^{-5}\,5^0}{0!} + \frac{e^{-5}\,5^1}{1!} + \frac{e^{-5}\,5^2}{2!} = e^{-5}(1 + 5 + 12.5) = 0.1247$$

Example: Historical records over 20 years in a particular town indicate an average of 4 severe rainstorms per year. Modeling the occurrences of severe rainstorms with the Poisson distribution:

Probability of no severe rainstorm next year:
P(X = 0) = 4^0 e^(−4) / 0! = 0.018

Probability of 5 severe rainstorms next year:
P(X = 5) = 4^5 e^(−4) / 5! = 0.156

Probability of 1 or more severe rainstorms next year:
P(X ≥ 1) = 1 − P(X = 0) = 1 − 0.018 = 0.982

Probability of more than 5 severe rainstorms next year:
P(X > 5) = 1 − P(X ≤ 5) = 1 − 0.785 = 0.215

x     P(X=x)     P(X≤x)
0     1.832%     1.832%
1     7.326%     9.158%
2     14.653%    23.810%
3     19.537%    43.347%
4     19.537%    62.884%
5     15.629%    78.513%
6     10.420%    88.933%
7     5.954%     94.887%
8     2.977%     97.864%
9     1.323%     99.187%
10    0.529%     99.716%
11    0.192%     99.908%
12    0.064%     99.973%
13    0.020%     99.992%
14    0.006%     99.998%
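The rainstorm figures above can be reproduced in R. This is a sketch using the base R functions dpois and ppois with the mean of 4 storms per year from the example; the variable names are illustrative.

```r
mu <- 4   # average number of severe rainstorms per year (from the example)

p_none      <- dpois(0, lambda = mu)        # P(X = 0)
p_five      <- dpois(5, lambda = mu)        # P(X = 5)
p_at_least1 <- 1 - p_none                   # P(X >= 1) = 1 - P(X = 0)
p_more_5    <- 1 - ppois(5, lambda = mu)    # P(X > 5)  = 1 - P(X <= 5)

print(round(c(p_none, p_five, p_at_least1, p_more_5), 3))

# Table of P(X = x) and P(X <= x) for x = 0, ..., 14, in percent
x <- 0:14
print(data.frame(
  x       = x,
  pmf_pct = round(100 * dpois(x, lambda = mu), 3),
  cdf_pct = round(100 * ppois(x, lambda = mu), 3)
))
```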
