📚 Chapter 4: Common distributions of random variables 4.4 Common discrete distributions For discrete random variables, we will consider the following distributions: Discrete uniform distribution Bernoulli distribution Binomial distribution Poisson distribution 4.4.1 Discrete uniform distribution Suppose a random variable X has k possible values 1, 2, …., k. X has a discrete uniform distribution if all these values have the same probability, i.e. if: p(x) = P(X = x) = {1/K for x = 1, 2, …, k; 0 otherwise Mean and variance of a discrete uniform distribution Chapter 4: Common distributions of random variables 1 4.4.2 Bernoulli distribution A Bernoulli trial is an experiment with only two possible outcomes. We will number these outcomes 1 and 0, and refer to them as ‘success’ and ‘failure’, respectively The Bernoulli distribution is the distribution of the outcome of a single Bernoulli trial. This is the distribution of a random variable X with the following probability function Therefore, P(X = 1) = π and P(X = 0) = 1 - P(X = 1) = 1 - π, and no other values are possible. Such a random variable X has a Bernoulli distribution with (probability) parameter π X ~ Bernoulli(π) If X ~ Bernoulli (π), then The moment generating function is: 4.4.3 Binomial distribution Suppose we carry out n Bernoulli trials such that: at each trial, the probability of success is π different trials are statistically independent events Chapter 4: Common distributions of random variables 2 Let X denote the total number of successes in these n trials. X follows a binomial distribution with parameters n and π, where n ≥ 1 is a known integer and 0 ≤ π ≤ 1. This is often written as: X ~ Bin(n, π) Example: Binomial distribution probability function If X ~ Bin(n, π), then: E(X) = nπ and Var(X) = nπ(1 - π) These values can also be obtained from the moment generating function: Chapter 4: Common distributions of random variables 3 Mx (t) = ((1 − π) + πet )n 4.4.4 Poisson distribution The possible values of the Poisson distribution are the non-negative integers 0, 1, 2, … Poisson distribution probability function If a random variable X has a Poisson distribution with parameter λ, this is often denoted by: X ~ Poisson (λ) or X ~ Pois(λ) If X ~ Poisson (λ), then: E(X) = λ and Var(X) = λ These can be obtained from the moment generating function: t Mx (t) = eλ(e −1) Poisson distribution are used for counts of occurrences of various kinds. To give a formal motivation, suppose that we consider the number of occurrences of some phenomenon in time, and that the process which generates the occurrences satisfies the following conditions: 1. The number of occurrences in any two disjoint intervals of time are independent of each other 2. The probability of two or more occurrences at the same time is negligibly small 3. The probability of one occurrence in any short time interval of length t is λt for some constant λ > 0 In essence, these state that individual occurrences should be independent, sufficiently rare, and happen at a constant rate λ per unit of time. A process like this is a Poisson process Chapter 4: Common distributions of random variables 4 If occurrences are generated by a Poisson process, then the number of occurrences in a random selected time interval of length t = 1, X, follows a Poisson distribution with mean λ, i.e. X ~ Poisson (λ) The single parameter λ of the Poisson distribution, is, therefore, the rate of occurrences per unit of time Example: 4.4.6 Poisson approximation of the binomial distribution Suppose that: X ~ Bin (n, π) n is large and π is small Chapter 4: Common distributions of random variables 5 Under such circumstances, the distributions of X is well-approximated by a Poisson (λ) distribution with λ = nπ The connection is exact at the limit, i.e. Bin(n, π) → Poisson (λ) if n → ∞ and π → 0 in such a way that nπ = λ remains constant This ‘law of small numbers’ provides another motivation for the Poisson distribution Example Chapter 4: Common distributions of random variables 6 4.4.7 Some other discrete distributions Geometric (π) distribution Distribution of the number of failures in Bernoulli trials before the first success π is the probability of success at each trial The sample space is 0, 1, 2, … Negative binomial (r, π) distribution Distribution of the number of failures in Bernoulli trials before r successes occur π is the probability of success at each trial The sample is 0, 1, 2, …. Negative binomial (1, π) is the same as Geometric (π) Chapter 4: Common distributions of random variables 7 Hypergeometric (n, A, B) distribution Experiment where initially A + B objects are available for selection, and A of them represent ‘success’ n objects are selected at random, without replacement Hypergeometric is then the distribution of the number of successes The sample space is the integers x where max{0, n - B} ≤ x ≤ min {n, A} If the selection was with replacement, the distribution of the number of successes would be Bin(n, A/(A+B)) Multinomial (n, π1 , π2 , ..., πk ) distribution Here π1 + π2 + ... + πk = 1 and the πi s are the probabilities of the values 1, 2, …, k If n =1, the sample space is 1, 2, …, k. This is essentially a generalisation of the discrete uniform distribution, but with non-equal probabilities πi If n > 1, the sample space is the vectors (n1 , n2 ...nk ) where ni ≥ 0 for all i, and n1 + n2 + ... + nk = n. This is essentially a generalisation of the binomial to the case where each trial has k ≥ 2 possible outcomes, and the random variable records the numbers of each outcome in n trials. Note that with k =2, Multinomial(n, π1 , π2 ) is essentially the same as Bin(n, π) with π = π2 (or with π = π1 ) When n > 1, the multinomial distribution is the distribution of a multivariate random variable 4.5 Common continuous distributions For continuous random variables, we will consider the following distributions: Uniform distribution Exponential distribution Normal distribution 4.5.1 The (continuous) uniform distribution The uniform distribution has non-zero probabilities only on an interval [a, b], where a < b are given numbers Chapter 4: Common distributions of random variables 8 The probability that its value is in an interval within [a, b] is proportional to the length of the interval All intervals within [a, b] which have the same length have the same probability The probability of an interval [x1 , x2 ], where a ≤ x1 < x2 ≤ b is: P (x1 ≤ X ≤ x2 ) = F (x2 ) − F (x1 ) = (x2 − x1 )/b − a The probability depends only on the length of the interval, x2 − x1 If X ~ Uniform [a, b], we have: E(X) = a + b/2 = median of X Var(X) = (b − a)2 /12 The mean and median also follow from the fact that the distribution is symmetric about (a +b)/2 - the midpoint of the interval [a, b] Chapter 4: Common distributions of random variables 9 4.5.2 Exponential distribution For X ~ Exp(λ), we have: E(X) = 1/λ Var(X) = 1/λ2 The median of this distribution is: m = ln2/λ = (ln2) ∗ 1/λ = (ln2)E(X) = 0.69 ∗ E(X) Chapter 4: Common distributions of random variables 10 Note that the median is always smaller than the mean, because the distribution is skewed to the right The moment generating function of the exponential distribution: MX (t) = λ/(λ − t) Uses of the exponential distribution The exponential is a basic distribution of waiting times of various kinds This arises from a connection between the Poisson distribution and the exponential If the number of event per unit of time has a Poisson distribution with parameter λ, the time interval (measured in the same units of time) between two successive events has an exponential distribution with the same parameter λ Note that the expected values of these behave as we would expect: E(X) = λ for Pois(λ), a large λ means many events per unit of time, on average E(X) = 1/λ for Exp(λ), a large λ means short waiting times between successive events, on average Chapter 4: Common distributions of random variables 11 4.5.3 Two other distributions Beta (α, β) distribution Generalising the uniform distribution, these are distributions for a closed interval, which is take to be [0,1] Therefore, the sample space {x | 0≤ x ≤ 1} Unlike for the uniform distribution, the pdf is generally not flat Beta (1, 1) is the same as Uniform[0, 1] Chapter 4: Common distributions of random variables 12 Gamma (α, β) distribution Generalising the exponential distribution, this is a two-parameter family of skewed distributions for positive values The sample space is {x | x > 0} Gamma (1, β) is the same as Exp(β) Chapter 4: Common distributions of random variables 13 4.5.4 Normal (Gaussian) distribution The normal distribution is the most important probability distribution in statistics. This is for three broad reasons: Many variables have distributions which are approximately normal, for example heights of humans or animals, and weights of various products The normal distribution has extremely convenient mathematical properties, which make it a useful default choice of distribution in many contexts Even when a variable is not itself even approximately normally distributed, functions of several observations of the variable (’sampling distributions’) are often approximately normal, due to the central limit theorem. Because of this, the normal distribution has a crucial role in statistical inference Chapter 4: Common distributions of random variables 14 If X ~ N (µ, σ 2 ) E(X) = µ Var(X) = σ 2 It uses the moment generating function of the normal distribution: Mx (t) 2 2 = exp(µt + ((σ t /2)) The mean can also be inferred from the observation that the normal pdf is symmetric about µ. This also implies that the median of the normal distribution is µ The normal density is a ‘bell-curve’. The two parameters affect it as follows The mean µ determines the location of the curve The variance σ 2 determines the dispersion (spread) of the curve Chapter 4: Common distributions of random variables 15 Linear transformations of the normal distribution Suppose X is a random variable, and we consider the linear transformation Y = aX + b, where a and b are constants Whatever the distribution of X, it is true that E(Y) =aE(X) + b and also Var(Y) = a2 Var(X) Furthermore, if X is normally distributed, then so is Y If X ~ N (µ, σ 2 ), then Y = aX + b ~ N(aµ + b, a2 σ 2 ) This type of result is not true in general. For other families of distributions, the distribution of Y = aX + b is not always in the same family as X Chapter 4: Common distributions of random variables 16 The transformed variable Z = (X - µ)/ σ is known as a standardised variable or a z - score The distribution of the z-score is N(0, 1) - a normal distribution with mean = 0 and variance = 1 (and therefore a standard deviation of 1) This is known as the standard normal distribution Its density function: The cumulative distribution function of the normal distribution: In the special case of the standard normal distribution, the cdf is: Such integrals cannot be evaluated in a closed form, so we use statistical tables of them The statistical table can be used to calculate probabilities of any intervals for any normal distribution but how? The table seems incomplete 1. It is only for N(0, 1), not N (µ, σ 2 ) for any other values of the mean and variance 2. Even for N(0, 1), it only shows probabilities for z ≥ 0 The key to using tables is that the standard normal distribution is symmetric about 0. This means that for an interval in one tail, its ‘mirror image’ in the other tail has the same probability Another way to justify these results is that if Z ~ N(0, 1), then -Z ~ N(0, 1) Chapter 4: Common distributions of random variables 17 Chapter 4: Common distributions of random variables 18 Probabilities for any normal distribution How about a normal distribution X ~ N (µ, σ 2 ), for any other µ and σ 2 What if we wanted to calculate, for any a < b, P(a < X ≤ b) = F(b) - F(a) Chapter 4: Common distributions of random variables 19 Some probabilities around the mean 4.5.5 Normal approximation of the binomial distribution For 0 < π < 1, the binomial distribution Bin(n, π) tends to the normal distribution N(nπ, nπ(1 - π)) as n tends to infinity Less formally, the binomial distribution is well-approximated by the normal distribution when the number of trials n is reasonably large For a given n, the approximation is best when π is not close to 0 or 1 Chapter 4: Common distributions of random variables 20 One rule of thumb is that the approximation is good enough when nπ > 5 and n(1 - π) > 5 When the normal distribution is appropriate, we can calculate probabilities for X ~ Bin(n, π) using Y ~ N(nπ, nπ(1 - π)) and the statistical table There is one caveat. The binomial distribution is discrete, but the normal distribution is continuous, so there may be ‘gaps’ in the probability mass for this distribution The accepted way to circumvent the problem is by using continuity correction which corrects for the effects of the transition from discrete Bin (n, π) distribution to a continuous N(nπ, nπ(1 - π)) distribution Chapter 4: Common distributions of random variables 21 Chapter 4: Common distributions of random variables 22