Chapter 4: Common distributions of random variables

4.4 Common discrete distributions
For discrete random variables, we will consider the following distributions:
Discrete uniform distribution
Bernoulli distribution
Binomial distribution
Poisson distribution
4.4.1 Discrete uniform distribution
Suppose a random variable X has k possible values 1, 2, …, k.
X has a discrete uniform distribution if all these values have the same
probability, i.e. if:
p(x) = P(X = x) = 1/k for x = 1, 2, …, k, and p(x) = 0 otherwise
Mean and variance of a discrete uniform distribution
If X has a discrete uniform distribution on 1, 2, …, k, then E(X) = (k + 1)/2 and Var(X) = (k^2 − 1)/12
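As a numerical check (a minimal Python sketch, not part of the original notes), the mean and variance can be computed directly from the pmf p(x) = 1/k; for a fair die (k = 6) this gives E(X) = 7/2 and Var(X) = 35/12:

```python
from fractions import Fraction

def discrete_uniform_moments(k):
    """Mean and variance of the discrete uniform distribution on 1..k,
    computed directly from the pmf p(x) = 1/k."""
    p = Fraction(1, k)
    mean = sum(x * p for x in range(1, k + 1))
    var = sum((x - mean) ** 2 * p for x in range(1, k + 1))
    return mean, var

mean, var = discrete_uniform_moments(6)   # a fair die: 7/2 and 35/12
```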
4.4.2 Bernoulli distribution
A Bernoulli trial is an experiment with only two possible outcomes. We will number
these outcomes 1 and 0, and refer to them as ‘success’ and ‘failure’, respectively
The Bernoulli distribution is the distribution of the outcome of a single Bernoulli trial.
This is the distribution of a random variable X with the following probability function:
p(x) = π^x (1 − π)^(1−x) for x = 0, 1, and p(x) = 0 otherwise
Therefore, P(X = 1) = π and P(X = 0) = 1 − P(X = 1) = 1 − π, and no other values are
possible. Such a random variable X has a Bernoulli distribution with (probability)
parameter π:
X ~ Bernoulli(π)
If X ~ Bernoulli(π), then E(X) = π and Var(X) = π(1 − π)
The moment generating function is: M_X(t) = (1 − π) + πe^t
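A short Python sketch (an illustration only; π = 0.3 is an arbitrary value) verifies E(X) = π and Var(X) = π(1 − π) directly from the probability function:

```python
def bernoulli_pmf(x, pi):
    # p(x) = pi**x * (1 - pi)**(1 - x) for x in {0, 1}
    if x not in (0, 1):
        return 0.0
    return pi ** x * (1 - pi) ** (1 - x)

pi = 0.3
mean = sum(x * bernoulli_pmf(x, pi) for x in (0, 1))               # = pi
var = sum((x - mean) ** 2 * bernoulli_pmf(x, pi) for x in (0, 1))  # = pi * (1 - pi)
```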
4.4.3 Binomial distribution
Suppose we carry out n Bernoulli trials such that:
at each trial, the probability of success is π
different trials are statistically independent events
Let X denote the total number of successes in these n trials. X follows a binomial
distribution with parameters n and π, where n ≥ 1 is a known integer and 0 ≤ π ≤ 1.
This is often written as: X ~ Bin(n, π)
Example:
Binomial distribution probability function: p(x) = C(n, x) π^x (1 − π)^(n−x) for x = 0, 1, …, n
If X ~ Bin(n, π), then: E(X) = nπ and Var(X) = nπ(1 - π)
These values can also be obtained from the moment generating function:
M_X(t) = ((1 − π) + πe^t)^n
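The identities E(X) = nπ and Var(X) = nπ(1 − π) can be checked numerically from the probability function; the following Python sketch uses n = 10 and π = 0.4 as illustrative values:

```python
from math import comb

def binomial_pmf(x, n, pi):
    """p(x) = C(n, x) * pi**x * (1 - pi)**(n - x)."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

n, pi = 10, 0.4
support = range(n + 1)
total = sum(binomial_pmf(x, n, pi) for x in support)                  # probabilities sum to 1
mean = sum(x * binomial_pmf(x, n, pi) for x in support)               # n * pi = 4
var = sum((x - mean) ** 2 * binomial_pmf(x, n, pi) for x in support)  # n * pi * (1 - pi) = 2.4
```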
4.4.4 Poisson distribution
The possible values of the Poisson distribution are the non-negative integers 0, 1, 2, …
Poisson distribution probability function: p(x) = (e^(−λ) λ^x)/x! for x = 0, 1, 2, …
If a random variable X has a Poisson distribution with parameter λ, this is often
denoted by: X ~ Poisson (λ) or X ~ Pois(λ)
If X ~ Poisson (λ), then: E(X) = λ and Var(X) = λ
These can be obtained from the moment generating function:
M_X(t) = e^(λ(e^t − 1))
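The moments E(X) = λ and Var(X) = λ can be checked numerically from the probability function (a Python sketch; λ = 2.5 is an arbitrary illustrative value):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """p(x) = exp(-lam) * lam**x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.5
# Truncate the infinite support; for lam = 2.5 the tail beyond 100 is negligible
mean = sum(x * poisson_pmf(x, lam) for x in range(100))               # close to lam
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in range(100))  # close to lam
```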
The Poisson distribution is used for counts of occurrences of various kinds. To give a
formal motivation, suppose that we consider the number of occurrences of some
phenomenon in time, and that the process which generates the occurrences
satisfies the following conditions:
1. The numbers of occurrences in any two disjoint intervals of time are independent
of each other
2. The probability of two or more occurrences at the same time is negligibly small
3. The probability of one occurrence in any short time interval of length t is
approximately λt for some constant λ > 0
In essence, these state that individual occurrences should be independent,
sufficiently rare, and happen at a constant rate λ per unit of time. A process like this
is a Poisson process
If occurrences are generated by a Poisson process, then the number of occurrences
X in a randomly selected time interval of length t = 1 follows a Poisson distribution
with mean λ, i.e. X ~ Poisson(λ)
The single parameter λ of the Poisson distribution is, therefore, the rate of
occurrences per unit of time
Example:
4.4.6 Poisson approximation of the binomial distribution
Suppose that:
X ~ Bin (n, π)
n is large and π is small
Under such circumstances, the distribution of X is well-approximated by a
Poisson(λ) distribution with λ = nπ
The connection is exact in the limit, i.e. Bin(n, π) → Poisson(λ) as n → ∞ and π → 0
in such a way that nπ = λ remains constant
This ‘law of small numbers’ provides another motivation for the Poisson distribution
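The quality of the approximation can be checked numerically. This Python sketch (with illustrative values n = 100 and π = 0.03, so λ = 3) compares the two pmfs pointwise:

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, pi):
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

n, pi = 100, 0.03   # n large, pi small
lam = n * pi        # = 3
# Largest pointwise gap between the two pmfs over a range covering almost all the mass
max_gap = max(abs(binomial_pmf(x, n, pi) - poisson_pmf(x, lam)) for x in range(20))
```

The gap is already below 0.01 here, and it shrinks further as n grows with nπ held fixed.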
Example
4.4.7 Some other discrete distributions
Geometric (π) distribution
Distribution of the number of failures in Bernoulli trials before the first success
π is the probability of success at each trial
The sample space is 0, 1, 2, …
Negative binomial (r, π) distribution
Distribution of the number of failures in Bernoulli trials before r successes occur
π is the probability of success at each trial
The sample space is 0, 1, 2, …
Negative binomial (1, π) is the same as Geometric (π)
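The relationship between the two distributions can be checked numerically. The sketch below (π = 0.25 is an arbitrary illustrative value) compares the pmf of the negative binomial with r = 1 against the geometric pmf:

```python
from math import comb

def geometric_pmf(x, pi):
    """P(x failures before the first success) = (1 - pi)**x * pi."""
    return (1 - pi) ** x * pi

def negbin_pmf(x, r, pi):
    """P(x failures before the r-th success) = C(x + r - 1, x) * pi**r * (1 - pi)**x."""
    return comb(x + r - 1, x) * pi ** r * (1 - pi) ** x

# Negative binomial with r = 1 coincides with the geometric distribution
agree = all(abs(negbin_pmf(x, 1, 0.25) - geometric_pmf(x, 0.25)) < 1e-15
            for x in range(50))
```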
Hypergeometric (n, A, B) distribution
Experiment where initially A + B objects are available for selection, and A of
them represent ‘success’
n objects are selected at random, without replacement
Hypergeometric is then the distribution of the number of successes
The sample space is the integers x where max{0, n - B} ≤ x ≤ min {n, A}
If the selection was with replacement, the distribution of the number of
successes would be Bin(n, A/(A+B))
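A Python sketch (with illustrative values n = 5, A = 30, B = 70) computes the hypergeometric pmf and shows it is close to the Bin(n, A/(A + B)) pmf when the population is much larger than the sample:

```python
from math import comb

def hypergeom_pmf(x, n, A, B):
    """Number of successes when n objects are drawn without replacement
    from A 'success' and B 'failure' objects."""
    return comb(A, x) * comb(B, n - x) / comb(A + B, n)

def binomial_pmf(x, n, pi):
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

# Without replacement vs. with replacement (Bin(n, A/(A+B)))
n, A, B = 5, 30, 70
hyper = [hypergeom_pmf(x, n, A, B) for x in range(n + 1)]
binom = [binomial_pmf(x, n, A / (A + B)) for x in range(n + 1)]
```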
Multinomial (n, π1, π2, ..., πk) distribution
Here π1 + π2 + ... + πk = 1 and the πi are the probabilities of the values 1, 2, …, k
If n = 1, the sample space is 1, 2, …, k. This is essentially a generalisation of the
discrete uniform distribution, but with non-equal probabilities πi
If n > 1, the sample space is the vectors (n1, n2, …, nk) where ni ≥ 0 for all i, and
n1 + n2 + ... + nk = n. This is essentially a generalisation of the binomial to
the case where each trial has k ≥ 2 possible outcomes, and the random variable
records the numbers of each outcome in n trials. Note that with k = 2,
Multinomial(n, π1, π2) is essentially the same as Bin(n, π) with π = π2 (or with
π = π1)
When n > 1, the multinomial distribution is the distribution of a multivariate
random variable
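The reduction to the binomial at k = 2 can be verified numerically; in this Python sketch the values n = 10 and π2 = 0.3 are arbitrary illustrations:

```python
from math import comb, factorial, prod

def multinomial_pmf(counts, probs):
    """P(N1 = n1, ..., Nk = nk) = n! / (n1! ... nk!) * prod(pi_i ** n_i)."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(c) for c in counts)
    return coef * prod(p ** c for c, p in zip(counts, probs))

# With k = 2, Multinomial(n, pi1, pi2) matches Bin(n, pi2) for the second count
n, pi = 10, 0.3
match = all(
    abs(multinomial_pmf((n - x, x), (1 - pi, pi))
        - comb(n, x) * pi ** x * (1 - pi) ** (n - x)) < 1e-12
    for x in range(n + 1)
)
```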
4.5 Common continuous distributions
For continuous random variables, we will consider the following distributions:
Uniform distribution
Exponential distribution
Normal distribution
4.5.1 The (continuous) uniform distribution
The uniform distribution has non-zero probabilities only on an interval [a, b],
where a < b are given numbers
The probability that its value is in an interval within [a, b] is proportional to the
length of the interval
All intervals within [a, b] which have the same length have the same
probability
The probability of an interval [x1, x2], where a ≤ x1 < x2 ≤ b, is:
P(x1 ≤ X ≤ x2) = F(x2) − F(x1) = (x2 − x1)/(b − a)
The probability depends only on the length of the interval, x2 − x1
If X ~ Uniform[a, b], we have:
E(X) = (a + b)/2 = median of X
Var(X) = (b − a)^2/12
The mean and median also follow from the fact that the distribution is symmetric
about (a + b)/2, the midpoint of the interval [a, b]
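A minimal Python sketch of the interval-probability formula (the endpoints are arbitrary illustrative values); note that two intervals of equal length get equal probability:

```python
def uniform_cdf(x, a, b):
    """F(x) for Uniform[a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def interval_prob(x1, x2, a, b):
    # P(x1 <= X <= x2) = (x2 - x1) / (b - a) for a <= x1 < x2 <= b
    return uniform_cdf(x2, a, b) - uniform_cdf(x1, a, b)

p = interval_prob(2.0, 3.5, 0.0, 10.0)   # length 1.5 out of 10, so 0.15
```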
4.5.2 Exponential distribution
The pdf of the exponential distribution with parameter λ > 0 is f(x) = λe^(−λx) for x > 0, and 0 otherwise
For X ~ Exp(λ), we have:
E(X) = 1/λ
Var(X) = 1/λ^2
The median of this distribution is: m = (ln 2)/λ = (ln 2)E(X) ≈ 0.69E(X)
Note that the median is always smaller than the mean, because the distribution is
skewed to the right
The moment generating function of the exponential distribution is: M_X(t) = λ/(λ − t) for t < λ
Uses of the exponential distribution
The exponential is a basic distribution of waiting times of various kinds
This arises from a connection between the Poisson distribution and the
exponential
If the number of events per unit of time has a Poisson distribution with parameter
λ, the time interval (measured in the same units of time) between two
successive events has an exponential distribution with the same parameter λ
Note that the expected values of these behave as we would expect:
E(X) = λ for Pois(λ), a large λ means many events per unit of time, on
average
E(X) = 1/λ for Exp(λ), a large λ means short waiting times between
successive events, on average
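This connection can be illustrated by simulation (a Python sketch; the rate λ = 4 and observation window are arbitrary choices): summing Exp(λ) waiting times generates a Poisson process, whose event count per unit of time is close to λ while the average gap between events is close to 1/λ:

```python
import random

random.seed(42)
lam = 4.0      # event rate per unit of time
T = 10000      # total observation time

# Simulate a Poisson process by summing exponential inter-arrival gaps
t, n_events = 0.0, 0
gaps = []
while True:
    gap = random.expovariate(lam)   # Exp(lam) waiting time
    if t + gap > T:
        break
    t += gap
    n_events += 1
    gaps.append(gap)

events_per_unit = n_events / T       # close to lam
mean_gap = sum(gaps) / len(gaps)     # close to 1 / lam
```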
4.5.3 Two other distributions
Beta (α, β) distribution
Generalising the uniform distribution, these are distributions on a closed interval,
which is taken to be [0, 1]
Therefore, the sample space is {x | 0 ≤ x ≤ 1}
Unlike for the uniform distribution, the pdf is generally not flat
Beta (1, 1) is the same as Uniform[0, 1]
Gamma (α, β) distribution
Generalising the exponential distribution, this is a two-parameter family of
skewed distributions for positive values
The sample space is {x | x > 0}
Gamma (1, β) is the same as Exp(β)
4.5.4 Normal (Gaussian) distribution
The normal distribution is the most important probability distribution in statistics. This
is for three broad reasons:
Many variables have distributions which are approximately normal, for example
heights of humans or animals, and weights of various products
The normal distribution has extremely convenient mathematical properties,
which make it a useful default choice of distribution in many contexts
Even when a variable is not itself even approximately normally distributed,
functions of several observations of the variable (’sampling distributions’) are
often approximately normal, due to the central limit theorem. Because of this,
the normal distribution has a crucial role in statistical inference
If X ~ N(µ, σ^2), then:
E(X) = µ
Var(X) = σ^2
These can also be obtained from the moment generating function of the normal
distribution: M_X(t) = exp(µt + σ^2 t^2/2)
The mean can also be inferred from the observation that the normal pdf is
symmetric about µ. This also implies that the median of the normal distribution is µ
The normal density is a ‘bell curve’. The two parameters affect it as follows:
The mean µ determines the location of the curve
The variance σ^2 determines the dispersion (spread) of the curve
Linear transformations of the normal distribution
Suppose X is a random variable, and we consider the linear transformation
Y = aX + b, where a and b are constants
Whatever the distribution of X, it is true that E(Y) = aE(X) + b and
Var(Y) = a^2 Var(X)
Furthermore, if X is normally distributed, then so is Y
If X ~ N(µ, σ^2), then Y = aX + b ~ N(aµ + b, a^2 σ^2)
This type of result is not true in general. For other families of
distributions, the distribution of Y = aX + b is not always in the same
family as X
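The result can be checked numerically: for X ~ N(µ, σ^2) and a > 0, the event {Y ≤ y} equals {X ≤ (y − b)/a}, so both routes must give the same probability. A Python sketch with arbitrary illustrative parameters:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma2):
    """CDF of N(mu, sigma2) via the error function."""
    return 0.5 * (1 + erf((x - mu) / sqrt(2 * sigma2)))

mu, sigma2 = 5.0, 4.0
a, b = 3.0, -2.0

# P(Y <= y) computed directly from Y ~ N(a*mu + b, a**2 * sigma2),
# and via the equivalent event {X <= (y - b) / a} (valid for a > 0)
y = 10.0
p_direct = normal_cdf(y, a * mu + b, a ** 2 * sigma2)
p_via_x = normal_cdf((y - b) / a, mu, sigma2)
```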
The transformed variable Z = (X − µ)/σ is known as a standardised variable or a
z-score
The distribution of the z-score is N(0, 1), a normal distribution with mean 0
and variance 1 (and therefore a standard deviation of 1)
This is known as the standard normal distribution
Its density function is: φ(z) = (1/√(2π)) e^(−z^2/2)
The cumulative distribution function of the normal distribution is:
F(x) = ∫_(−∞)^x (1/(σ√(2π))) e^(−(t − µ)^2/(2σ^2)) dt
In the special case of the standard normal distribution, the cdf is: Φ(z) = ∫_(−∞)^z φ(t) dt
Such integrals cannot be evaluated in a closed form, so we use statistical tables
of them
The statistical table can be used to calculate probabilities of any interval for any
normal distribution, but how? The table seems incomplete:
1. It is only for N(0, 1), not N(µ, σ^2) for any other values of the mean and
variance
2. Even for N(0, 1), it only shows probabilities for z ≥ 0
The key to using tables is that the standard normal distribution is symmetric
about 0. This means that for an interval in one tail, its ‘mirror image’ in the other
tail has the same probability
Another way to justify these results is that if Z ~ N(0, 1), then -Z ~ N(0, 1)
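The standard normal cdf can be evaluated with the error function, which makes the symmetry argument easy to verify numerically (a Python sketch; z = 1.96 is the familiar 95% value):

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Phi(z) for the standard normal N(0, 1), using the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Symmetry about 0: the left tail below -z mirrors the right tail above z
z = 1.96
left_tail = std_normal_cdf(-z)
right_tail = 1 - std_normal_cdf(z)
central = std_normal_cdf(z) - std_normal_cdf(-z)   # roughly 0.95 for z = 1.96
```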
Probabilities for any normal distribution
How about a normal distribution X ~ N(µ, σ^2), for any other µ and σ^2? What if we
wanted to calculate, for any a < b, P(a < X ≤ b) = F(b) − F(a)? The key is
standardisation: Z = (X − µ)/σ ~ N(0, 1), so P(a < X ≤ b) = Φ((b − µ)/σ) − Φ((a − µ)/σ)
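A Python sketch of this calculation via standardisation (the N(100, 15^2) parameters are arbitrary illustrative values):

```python
from math import erf, sqrt

def std_normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_interval_prob(a, b, mu, sigma):
    """P(a < X <= b) for X ~ N(mu, sigma**2), via standardisation."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)

# e.g. X ~ N(100, 15**2): probability of a value within one sigma of the mean
p = normal_interval_prob(85, 115, 100, 15)   # roughly 0.683
```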
Some probabilities around the mean: P(µ − σ < X ≤ µ + σ) ≈ 0.683, P(µ − 2σ < X ≤ µ + 2σ) ≈ 0.954, and P(µ − 3σ < X ≤ µ + 3σ) ≈ 0.997
4.5.5 Normal approximation of the binomial distribution
For 0 < π < 1, the binomial distribution Bin(n, π) tends to the normal distribution
N(nπ, nπ(1 - π)) as n tends to infinity
Less formally, the binomial distribution is well-approximated by the normal
distribution when the number of trials n is reasonably large
For a given n, the approximation is best when π is not close to 0 or 1
One rule of thumb is that the approximation is good enough when nπ > 5 and
n(1 - π) > 5
When the normal distribution is appropriate, we can calculate probabilities for X ~
Bin(n, π) using Y ~ N(nπ, nπ(1 - π)) and the statistical table
There is one caveat. The binomial distribution is discrete, but the normal distribution
is continuous, so there may be ‘gaps’ in the probability mass for this distribution
The accepted way to circumvent the problem is to use a continuity correction, which
corrects for the effects of the transition from the discrete Bin(n, π) distribution to the
continuous N(nπ, nπ(1 − π)) distribution
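The effect of the continuity correction can be seen numerically. This Python sketch (with illustrative values n = 20 and π = 0.5) compares the exact binomial probability P(X ≤ 12) with the normal approximation, with and without the correction:

```python
from math import comb, erf, sqrt

def binomial_cdf(k, n, pi):
    return sum(comb(n, x) * pi ** x * (1 - pi) ** (n - x) for x in range(k + 1))

def std_normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

n, pi = 20, 0.5
mu, sigma = n * pi, sqrt(n * pi * (1 - pi))

exact = binomial_cdf(12, n, pi)
# Continuity correction: P(X <= 12) is approximated by P(Y <= 12.5)
approx = std_normal_cdf((12.5 - mu) / sigma)
no_correction = std_normal_cdf((12 - mu) / sigma)
```

The corrected approximation lands much closer to the exact value than the uncorrected one.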