5  Joint Probability Distributions and Random Samples
5.1  Jointly Distributed Random Variables
Two Discrete Random Variables
The probability mass function (pmf) of a single discrete rv X
specifies how much probability mass is placed on each
possible X value.
The joint pmf of two discrete rv’s X and Y describes how
much probability mass is placed on each possible pair of
values (x, y).
Definition
Let X and Y be two discrete rv’s defined on the sample
space of an experiment. The joint probability mass
function p(x, y) is defined for each pair of numbers (x, y)
by
p(x, y) = P(X = x and Y = y)
It must be the case that p(x, y) ≥ 0 and Σx Σy p(x, y) = 1.
Now let A be any set consisting of pairs of (x, y) values
(e.g., A = {(x, y): x + y = 5} or {(x, y): max(x, y) ≤ 3}).
Then the probability P[(X, Y) ∈ A] is obtained by summing
the joint pmf over pairs in A:

P[(X, Y) ∈ A] = Σ Σ_((x, y) ∈ A) p(x, y)
Example 1
A large insurance agency services a number of customers
who have purchased both a homeowner’s policy and an
automobile policy from the agency. For each type of policy,
a deductible amount must be specified.
For an automobile policy, the choices are $100 and $250,
whereas for a homeowner’s policy, the choices are 0, $100,
and $200.
Suppose an individual with both types of policy is selected
at random from the agency’s files. Let X = the deductible
amount on the auto policy and Y = the deductible amount
on the homeowner’s policy.
Possible (X, Y) pairs are then (100, 0), (100, 100),
(100, 200), (250, 0), (250, 100), and (250, 200); the joint
pmf specifies the probability associated with each one of
these pairs, with any other pair having probability zero.
Suppose the joint pmf is given in the accompanying joint
probability table:

p(x, y)    y = 0    y = 100    y = 200
x = 100     .20       .10        .20
x = 250     .05       .15        .30
Then p(100, 100) = P(X = 100 and Y = 100) = P($100
deductible on both policies) = .10.
The probability P(Y ≥ 100) is computed by summing
probabilities of all (x, y) pairs for which y ≥ 100:

P(Y ≥ 100) = p(100, 100) + p(250, 100) + p(100, 200)
           + p(250, 200)
           = .75
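The summation above is easy to mechanize: store the joint pmf as a dictionary keyed by (x, y) pairs and sum over the pairs in the event. A minimal sketch; the cell values are reconstructed from the probabilities and totals quoted in this example.

```python
# Joint pmf of (X, Y); cell values reconstructed from the totals
# quoted in the text (p(100, 100) = .10, P(Y >= 100) = .75, etc.).
joint_pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

# A joint pmf must be nonnegative and sum to 1.
assert abs(sum(joint_pmf.values()) - 1.0) < 1e-9

# P[(X, Y) in A] = sum of p(x, y) over pairs in A; here A = {(x, y): y >= 100}.
p_y_at_least_100 = sum(p for (x, y), p in joint_pmf.items() if y >= 100)
print(round(p_y_at_least_100, 2))  # 0.75
```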
Definition
The marginal probability mass function of X, denoted by
pX(x), is given by

pX(x) = Σy p(x, y)    for each possible value x

Similarly, the marginal probability mass function of Y is

pY(y) = Σx p(x, y)    for each possible value y.
Example 2
Example 1 continued…
The possible X values are x = 100 and x = 250, so
computing row totals in the joint probability table yields
pX(100) = p(100, 0) + p(100, 100) + p(100, 200) = .50
and
pX(250) = p(250, 0) + p(250, 100) + p(250, 200) = .50
The marginal pmf of X is then pX(x) = .5 for x = 100 or
x = 250, and pX(x) = 0 otherwise.
Similarly, the marginal pmf of Y is obtained from column
totals as pY(0) = .25, pY(100) = .25, and pY(200) = .50,
so P(Y ≥ 100) = pY(100) + pY(200) = .75 as before.
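Row and column totals of the same table give the marginal pmf's directly. A sketch, with the joint table again reconstructed from the values quoted in Examples 1 and 2:

```python
from collections import defaultdict

# Joint pmf reconstructed from the totals quoted in Examples 1 and 2.
joint_pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

# pX(x) = sum over y of p(x, y) (row totals);
# pY(y) = sum over x of p(x, y) (column totals).
p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in joint_pmf.items():
    p_x[x] += p
    p_y[y] += p

print({x: round(p, 2) for x, p in p_x.items()})  # {100: 0.5, 250: 0.5}
print({y: round(p, 2) for y, p in p_y.items()})  # {0: 0.25, 100: 0.25, 200: 0.5}
```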
Two Continuous Random Variables
The probability that the observed value of a continuous rv X
lies in a one-dimensional set A (such as an interval) is
obtained by integrating the pdf f(x) over the set A.
Similarly, the probability that the pair (X, Y) of continuous
rv’s falls in a two-dimensional set A (such as a rectangle) is
obtained by integrating a function called the joint density
function.
Definition
Let X and Y be continuous rv's. A joint probability density
function f(x, y) for these two variables is a function
satisfying f(x, y) ≥ 0 and

∫₋∞^∞ ∫₋∞^∞ f(x, y) dx dy = 1

Then for any two-dimensional set A,

P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy
In particular, if A is the two-dimensional rectangle
{(x, y): a ≤ x ≤ b, c ≤ y ≤ d}, then

P[(X, Y) ∈ A] = P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫ₐᵇ ∫_c^d f(x, y) dy dx
We can think of f(x, y) as specifying a surface at height
f(x, y) above the point (x, y) in a three-dimensional
coordinate system.
Then P[(X, Y) ∈ A] is the volume underneath this surface
and above the region A, analogous to the area under a
curve in the case of a single rv.
This is illustrated in Figure 5.1.

[Figure 5.1: P[(X, Y) ∈ A] = volume under density surface above A]
Example 3
A bank operates both a drive-up facility and a walk-up
window. On a randomly selected day, let X = the proportion
of time that the drive-up facility is in use and Y = the
proportion of time that the walk-up window is in use.
Then the set of possible values for (X, Y) is the rectangle
D = {(x, y): 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}.
Suppose the joint pdf of (X, Y) is given by

f(x, y) = 1.2(x + y²)    for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 (and 0 otherwise)

To verify that this is a legitimate pdf, note that f(x, y) ≥ 0
and

∫₀¹ ∫₀¹ 1.2(x + y²) dx dy = 1.2(1/2 + 1/3) = 1
The probability that neither facility is busy more than
one-quarter of the time is

P(0 ≤ X ≤ 1/4, 0 ≤ Y ≤ 1/4) = ∫₀^(1/4) ∫₀^(1/4) f(x, y) dx dy
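The double integral can be checked numerically with a midpoint Riemann sum. The specific density is an assumption here: the slide's formula did not survive extraction, so the code uses f(x, y) = 1.2(x + y²) on the unit square, the density commonly paired with this bank example.

```python
# ASSUMED joint pdf (the slide's displayed formula was lost):
# f(x, y) = 1.2 * (x + y^2) on the unit square, 0 elsewhere.
def f(x, y):
    return 1.2 * (x + y * y) if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

# Midpoint Riemann sum for P(0 <= X <= 1/4, 0 <= Y <= 1/4).
m = 400        # subintervals per axis
h = 0.25 / m   # cell width over [0, 1/4]
prob = sum(
    f((i + 0.5) * h, (j + 0.5) * h) * h * h
    for i in range(m)
    for j in range(m)
)
print(round(prob, 4))  # 0.0109 under the assumed density
```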
The marginal pdf of each variable can be obtained in a
manner analogous to what we did in the case of two
discrete variables.
The marginal pdf of X at the value x results from holding x
fixed in the pair (x, y) and integrating the joint pdf over y.
Integrating the joint pdf with respect to x gives the marginal
pdf of Y.
Definition
The marginal probability density functions of X and Y,
denoted by fX(x) and fY(y), respectively, are given by

fX(x) = ∫₋∞^∞ f(x, y) dy    for −∞ < x < ∞

fY(y) = ∫₋∞^∞ f(x, y) dx    for −∞ < y < ∞
Independent Random Variables
In many situations, information about the observed value of
one of the two variables X and Y gives information about
the value of the other variable.
In Example 1, the marginal probability of X at x = 250
was .5, as was the probability that X = 100. If, however, we
are told that the selected individual had Y = 0, then X = 100
is four times as likely as X = 250.
Thus there is a dependence between the two variables.
Earlier, we pointed out that one way of defining
independence of two events is via the condition
P(A ∩ B) = P(A) · P(B).
Here is an analogous definition for the independence of two
rv’s.
Definition
Two random variables X and Y are said to be independent
if for every pair of x and y values

p(x, y) = pX(x) · pY(y)    when X and Y are discrete
or                                                        (5.1)
f(x, y) = fX(x) · fY(y)    when X and Y are continuous

If (5.1) is not satisfied for all (x, y), then X and Y are said to
be dependent.
The definition says that two variables are independent if
their joint pmf or pdf is the product of the two marginal
pmf’s or pdf’s.
Intuitively, independence says that knowing the value of
one of the variables does not provide additional information
about what the value of the other variable might be.
Example 6
In the insurance situation of Examples 1 and 2,
p(100, 100) = .10 ≠ (.5)(.25) = pX(100) · pY(100)
so X and Y are not independent.
Independence of X and Y requires that every entry in the
joint probability table be the product of the corresponding
row and column marginal probabilities.
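Checking independence from a joint table is mechanical: every cell must equal the product of its row and column marginals. A sketch using the reconstructed insurance table:

```python
# Joint pmf and marginals reconstructed from the totals quoted in the text.
joint_pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}
p_x = {100: 0.50, 250: 0.50}
p_y = {0: 0.25, 100: 0.25, 200: 0.50}

# X and Y are independent iff p(x, y) = pX(x) * pY(y) for EVERY pair.
independent = all(
    abs(p - p_x[x] * p_y[y]) < 1e-9 for (x, y), p in joint_pmf.items()
)
print(independent)  # False: p(100, 100) = .10 but pX(100) * pY(100) = .125
```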
Independence of two random variables is most useful when
the description of the experiment under study suggests that
X and Y have no effect on one another.
Then once the marginal pmf’s or pdf’s have been specified,
the joint pmf or pdf is simply the product of the two
marginal functions. It follows that
P(a ≤ X ≤ b, c ≤ Y ≤ d) = P(a ≤ X ≤ b) · P(c ≤ Y ≤ d)
5.2  Expected Values, Covariance, and Correlation
Proposition
Let X and Y be jointly distributed rv’s with pmf p(x, y) or
pdf f(x, y) according to whether the variables are discrete
or continuous.
Then the expected value of a function h(X, Y), denoted by
E[h(X, Y)] or μ_h(X, Y), is given by

E[h(X, Y)] = Σx Σy h(x, y) · p(x, y)                 if X and Y are discrete

E[h(X, Y)] = ∫₋∞^∞ ∫₋∞^∞ h(x, y) · f(x, y) dx dy     if X and Y are continuous
Example 13
Five friends have purchased tickets to a certain concert. If
the tickets are for seats 1–5 in a particular row and the
tickets are randomly distributed among the five, what
is the expected number of seats separating any particular
two of the five?
Let X and Y denote the seat numbers of the first and
second individuals, respectively. Possible (X, Y) pairs are
{(1, 2), (1, 3), . . . , (5, 4)}, and the joint pmf of (X, Y) is

p(x, y) = 1/20    x = 1, . . . , 5; y = 1, . . . , 5; x ≠ y
p(x, y) = 0       otherwise
The number of seats separating the two individuals is
h(X, Y) = |X – Y| – 1.
The accompanying table gives h(x, y) for each possible
(x, y) pair.
Thus

E[h(X, Y)] = Σ Σ_(x ≠ y) (|x − y| − 1) · 1/20 = 1
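The expectation can be verified by brute-force enumeration of the 20 equally likely ordered pairs:

```python
from itertools import permutations

# All ordered seat pairs (x, y) with x != y; each has probability 1/20.
pairs = list(permutations(range(1, 6), 2))
assert len(pairs) == 20

# h(x, y) = |x - y| - 1 = number of seats separating the two individuals.
expected_separation = sum(abs(x - y) - 1 for x, y in pairs) / len(pairs)
print(expected_separation)  # 1.0
```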
Covariance
When two random variables X and Y are not independent,
it is frequently of interest to assess how strongly they are
related to one another.
Definition
The covariance between two rv's X and Y is

Cov(X, Y) = E[(X − μX)(Y − μY)]

          = Σx Σy (x − μX)(y − μY) p(x, y)                      X, Y discrete

          = ∫₋∞^∞ ∫₋∞^∞ (x − μX)(y − μY) f(x, y) dx dy          X, Y continuous
That is, since X − μX and Y − μY are the deviations of the
two variables from their respective mean values, the
covariance is the expected product of deviations. Note
that Cov(X, X) = E[(X − μX)²] = V(X).
The rationale for the definition is as follows.
Suppose X and Y have a strong positive relationship to one
another, by which we mean that large values of X tend to
occur with large values of Y and small values of X with
small values of Y.
Then most of the probability mass or density will be
associated with (x − μX) and (y − μY) either both positive
(both X and Y above their respective means) or both
negative, so the product (x − μX)(y − μY) will tend to be
positive.

Thus for a strong positive relationship, Cov(X, Y) should be
quite positive.

For a strong negative relationship, the signs of (x − μX) and
(y − μY) will tend to be opposite, yielding a negative
product.
Thus for a strong negative relationship, Cov(X, Y) should
be quite negative.
If X and Y are not strongly related, positive and negative
products will tend to cancel one another, yielding a
covariance near 0.
Figure 5.4 illustrates the different possibilities. The
covariance depends on both the set of possible pairs and
the probabilities. In Figure 5.4, the probabilities could be
changed without altering the set of possible pairs, and this
could drastically change the value of Cov(X, Y).
[Figure 5.4: p(x, y) = 1/10 for each of ten pairs corresponding to the
indicated points: (a) positive covariance; (b) negative covariance;
(c) covariance near zero]
Example 15
The joint and marginal pmf's for
X = automobile policy deductible amount and
Y = homeowner policy deductible amount in Example 1
were given in the joint probability table, from which
μX = Σ x pX(x) = 175 and μY = 125.
Therefore,

Cov(X, Y) = Σ Σ_(x, y) (x − 175)(y − 125) p(x, y)
          = (100 − 175)(0 − 125)(.20) + . . .
          + (250 − 175)(200 − 125)(.30)
          = 1875
The following shortcut formula for Cov(X, Y) simplifies the
computations.
Proposition

Cov(X, Y) = E(XY) − μX · μY

According to this formula, no intermediate subtractions are
necessary; only at the end of the computation is μX · μY
subtracted from E(XY). The proof involves expanding
(X − μX)(Y − μY) and then taking the expected value of each
term separately.
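Both the defining formula and the shortcut are straightforward to compute from a joint table; the sketch below (insurance table reconstructed from the text's totals) confirms they agree:

```python
# Joint pmf reconstructed from the totals quoted in the text.
joint_pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}
mu_x = sum(x * p for (x, y), p in joint_pmf.items())  # 175.0
mu_y = sum(y * p for (x, y), p in joint_pmf.items())  # 125.0

# Defining formula: E[(X - muX)(Y - muY)].
cov_def = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint_pmf.items())

# Shortcut: Cov(X, Y) = E(XY) - muX * muY.
cov_shortcut = sum(x * y * p for (x, y), p in joint_pmf.items()) - mu_x * mu_y

print(round(cov_def, 6), round(cov_shortcut, 6))  # 1875.0 1875.0
```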
Correlation
Definition
The correlation coefficient of X and Y, denoted by
Corr(X, Y), ρX,Y, or just ρ, is defined by

ρX,Y = Cov(X, Y) / (σX · σY)
Example 17
It is easily verified that in the insurance scenario of
Example 15, E(X²) = 36,250, σX² = 36,250 − (175)² = 5625,
σX = 75, E(Y²) = 22,500, σY² = 6875, and σY = 82.92.

This gives

ρ = 1875 / [(75)(82.92)] = .301
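The same moments give ρ directly; computing with unrounded standard deviations (joint table reconstructed from the text's totals):

```python
import math

# Joint pmf reconstructed from the totals quoted in the text.
joint_pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}
mu_x = sum(x * p for (x, y), p in joint_pmf.items())                   # 175
mu_y = sum(y * p for (x, y), p in joint_pmf.items())                   # 125
var_x = sum(x * x * p for (x, y), p in joint_pmf.items()) - mu_x**2    # 5625
var_y = sum(y * y * p for (x, y), p in joint_pmf.items()) - mu_y**2    # 6875
cov = sum(x * y * p for (x, y), p in joint_pmf.items()) - mu_x * mu_y  # 1875

rho = cov / (math.sqrt(var_x) * math.sqrt(var_y))
print(round(rho, 4))  # 0.3015; the text reports .301 using the rounded 82.92
```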
The following proposition shows that ρ remedies the defect
of Cov(X, Y) and also suggests how to recognize the
existence of a strong (linear) relationship.

Proposition
1. If a and c are either both positive or both negative,

   Corr(aX + b, cY + d) = Corr(X, Y)

2. For any two rv's X and Y, −1 ≤ Corr(X, Y) ≤ 1.
If we think of p(x, y) or f(x, y) as prescribing a mathematical
model for how the two numerical variables X and Y are
distributed in some population (height and weight, verbal
SAT score and quantitative SAT score, etc.), then ρ is a
population characteristic or parameter that measures how
strongly X and Y are related in the population.
We will consider taking a sample of pairs (x1, y1), . . . , (xn, yn)
from the population.
The sample correlation coefficient r will then be defined and
used to make inferences about ρ.
The correlation coefficient ρ is actually not a completely
general measure of the strength of a relationship.

Proposition
1. If X and Y are independent, then ρ = 0, but ρ = 0 does
   not imply independence.

2. ρ = 1 or −1 iff Y = aX + b for some numbers a and b with
   a ≠ 0.
This proposition says that ρ is a measure of the degree of
linear relationship between X and Y, and only when the
two variables are perfectly related in a linear manner will
ρ be as positive or negative as it can be.

A ρ less than 1 in absolute value indicates only that the
relationship is not completely linear, but there may still be a
very strong nonlinear relation.
Also, ρ = 0 does not imply that X and Y are independent,
but only that there is a complete absence of a linear
relationship. When ρ = 0, X and Y are said to be
uncorrelated.

Two variables could be uncorrelated yet highly dependent
because there is a strong nonlinear relationship, so be
careful not to conclude too much from knowing that ρ = 0.
A value of ρ near 1 does not necessarily imply that
increasing the value of X causes Y to increase. It implies
only that large X values are associated with large Y values.
For example, in the population of children, vocabulary size
and number of cavities are quite positively correlated, but it
is certainly not true that cavities cause vocabulary
to grow.
Instead, the values of both these variables tend to increase
as the value of age, a third variable, increases.
5.3  Statistics and Their Distributions
Definition
A statistic is any quantity whose value can be calculated
from sample data. Prior to obtaining data, there is
uncertainty as to what value of any particular statistic will
result. Therefore, a statistic is a random variable and will be
denoted by an uppercase letter; a lowercase letter is used
to represent the calculated or observed value of the
statistic.
Thus the sample mean, regarded as a statistic (before a
sample has been selected or an experiment carried out), is
denoted by X̄; the calculated value of this statistic is x̄.

Similarly, S represents the sample standard deviation
thought of as a statistic, and its computed value is s.

If samples of two different types of bricks are selected and
the individual compressive strengths are denoted by
X1, . . . , Xm and Y1, . . . , Yn, respectively, then the statistic
X̄ − Ȳ, the difference between the two sample mean
compressive strengths, is often of great interest.
The probability distribution of a statistic is sometimes
referred to as its sampling distribution to emphasize that
it describes how the statistic varies in value across all
samples that might be selected.
Random Samples
Definition
The rv’s X1, X2, . . . , Xn are said to form a (simple) random
sample of size n if
1. The Xi’s are independent rv’s.
2. Every Xi has the same probability distribution.
Conditions 1 and 2 can be paraphrased by saying that the
Xi’s are independent and identically distributed (iid).
If sampling is either with replacement or from an infinite
(conceptual) population, Conditions 1 and 2 are satisfied
exactly.
These conditions will be approximately satisfied if sampling
is without replacement, yet the sample size n is much
smaller than the population size N.
In practice, if n/N ≤ .05 (at most 5% of the population is
sampled), we can proceed as if the Xi’s form a random
sample.
The virtue of this sampling method is that the probability
distribution of any statistic can be more easily obtained
than for any other sampling method.
There are two general methods for obtaining information
about a statistic’s sampling distribution. One method
involves calculations based on probability rules, and the
other involves carrying out a simulation experiment.
Simulation Experiments
The following characteristics of an experiment must be
specified:

1. The statistic of interest (X̄, S, a particular trimmed
   mean, etc.)

2. The population distribution (normal with μ = 100 and
   σ = 15, uniform with lower limit A = 5 and upper limit
   B = 10, etc.)

3. The sample size n (e.g., n = 10 or n = 50)

4. The number of replications k (number of samples to be
   obtained)
Then use appropriate software to obtain k different random
samples, each of size n, from the designated population
distribution.
For each sample, calculate the value of the statistic and
construct a histogram of the k values. This histogram gives
the approximate sampling distribution of the statistic.
The larger the value of k, the better the approximation will
tend to be (the actual sampling distribution emerges as
k → ∞). In practice, k = 500 or 1000 is usually sufficient if
the statistic is “fairly simple.”
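Such an experiment takes only a few lines. A sketch under assumed choices for the four specifications above: statistic = the sample mean, population uniform on [5, 10], n = 10, and k = 1000.

```python
import random
import statistics

random.seed(1)  # reproducible
n, k = 10, 1000

# k samples of size n from the uniform(5, 10) population; record x-bar for each.
xbars = [statistics.mean(random.uniform(5, 10) for _ in range(n))
         for _ in range(k)]

# A histogram of xbars approximates the sampling distribution of X-bar;
# its center should be near the population mean (5 + 10) / 2 = 7.5.
print(round(statistics.mean(xbars), 2))
```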
The final aspect of the histograms to note is their spread
relative to one another.
The larger the value of n, the more concentrated is the
sampling distribution about the mean value. This is why the
histograms for n = 20 and n = 30 are based on narrower
class intervals than those for the two smaller sample sizes.
For the larger sample sizes, most of the x̄ values are quite
close to 8.25. This is the effect of averaging. When n is
small, a single unusual x value can result in an x̄ value far
from the center.
With a larger sample size, any unusual x values, when
averaged in with the other sample values, still tend to yield
an x̄ value close to μ.

Combining these insights yields a result that should appeal
to your intuition: X̄ based on a large n tends to be closer
to μ than does X̄ based on a small n.
5.4  The Distribution of the Sample Mean
Copyright © Cengage Learning. All rights reserved.
The importance of the sample mean X̄ springs from its use
in drawing conclusions about the population mean μ. Some
of the most frequently used inferential procedures are
based on properties of the sampling distribution of X̄.

A preview of these properties appeared in the calculations
and simulation experiments of the previous section, where
we noted relationships between E(X̄) and μ and also
among V(X̄), σ², and n.
Proposition
Let X1, X2, . . . , Xn be a random sample from a distribution
with mean value μ and standard deviation σ. Then

1. E(X̄) = μX̄ = μ

2. V(X̄) = σX̄² = σ²/n and σX̄ = σ/√n

In addition, with To = X1 + . . . + Xn (the sample total),
E(To) = nμ, V(To) = nσ², and σTo = √n σ.
The sampling distribution of X̄ is centered precisely at the
mean μ of the population.

The distribution of X̄ becomes more concentrated about μ
as the sample size n increases.

The distribution of To becomes more spread out as n
increases.

Averaging moves probability in toward the middle, whereas
totaling spreads probability out over a wider and wider
range of values.

The standard deviation σX̄ = σ/√n is often called the
standard error of the mean.
Example 24
In a notched tensile fatigue test on a titanium specimen, the
expected number of cycles to first acoustic emission (used
to indicate crack initiation) is μ = 28,000, and the standard
deviation of the number of cycles is σ = 5000.

Let X1, X2, . . . , X25 be a random sample of size 25, where
each Xi is the number of cycles on a different randomly
selected specimen.

Then the expected value of the sample mean number of
cycles until first emission is E(X̄) = μ = 28,000, and the
expected total number of cycles for the 25 specimens is
E(To) = nμ = 25(28,000) = 700,000.
The standard deviations of X̄ and of To are

σX̄ = σ/√n = 5000/√25 = 1000    (the standard error of the mean)

σTo = √n σ = √25 (5000) = 25,000
If the sample size increases to n = 100, E(X̄) is unchanged,
but σX̄ = 500, half of its previous value (the sample size
must be quadrupled to halve the standard deviation of X̄).
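The two standard errors follow directly from σX̄ = σ/√n:

```python
import math

sigma = 5000  # population standard deviation of cycles

se_25 = sigma / math.sqrt(25)    # n = 25
se_100 = sigma / math.sqrt(100)  # n = 100: quadrupling n halves the standard error
print(se_25, se_100)  # 1000.0 500.0
```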
The Case of a Normal Population Distribution
Proposition
Let X1, X2, . . . , Xn be a random sample from a normal
distribution with mean μ and standard deviation σ. Then for
any n, X̄ is normally distributed (with mean μ and standard
deviation σ/√n), as is To (with mean nμ and standard
deviation √n σ).

We know everything there is to know about the X̄ and To
distributions when the population distribution is normal. In
particular, probabilities such as P(a ≤ X̄ ≤ b) and
P(c ≤ To ≤ d) can be obtained simply by standardizing.
Figure 5.14 illustrates the proposition.

[Figure 5.14: A normal population distribution and X̄ sampling distributions]
Example 25
The time that it takes a randomly selected rat of a certain
subspecies to find its way through a maze is a normally
distributed rv with μ = 1.5 min and σ = .35 min. Suppose five
rats are selected.
Let X1, . . . , X5 denote their times in the maze. Assuming the
Xi’s to be a random sample from this normal distribution,
what is the probability that the total time To = X1 + . . . + X5
for the five is between 6 and 8 min?
By the proposition, To has a normal distribution with mean
μTo = nμ = 5(1.5) = 7.5 and variance
σTo² = nσ² = 5(.1225) = .6125, so σTo = .783.

To standardize To, subtract μTo and divide by σTo:

P(6 ≤ To ≤ 8) = P((6 − 7.5)/.783 ≤ Z ≤ (8 − 7.5)/.783)
             = P(−1.92 ≤ Z ≤ .64)
             = Φ(.64) − Φ(−1.92) = .7115
Determination of the probability that the sample average
time X̄ (a normally distributed variable) is at most 2.0 min
requires μX̄ = μ = 1.5 and σX̄ = σ/√n = .35/√5 = .1565.
Then

P(X̄ ≤ 2.0) = P(Z ≤ (2.0 − 1.5)/.1565) = P(Z ≤ 3.19) = .9993
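Both probabilities can be reproduced with the exact normal cdf (via the error function) instead of a table lookup:

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 1.5, 0.35, 5
mu_to, sd_to = n * mu, math.sqrt(n) * sigma   # 7.5 and ~0.783

p_total = phi((8 - mu_to) / sd_to) - phi((6 - mu_to) / sd_to)
print(round(p_total, 3))  # 0.711

sd_xbar = sigma / math.sqrt(n)                # ~0.1565
p_mean = phi((2.0 - mu) / sd_xbar)
print(round(p_mean, 4))  # 0.9993
```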
The Central Limit Theorem
When the Xi’s are normally distributed, so is X̄ for every
sample size n.
Even when the population distribution is highly nonnormal,
averaging produces a distribution more bell-shaped than
the one being sampled.
A reasonable conjecture is that if n is large, a suitable
normal curve will approximate the actual distribution of X̄.
The formal statement of this result is the most important
theorem of probability.
Theorem
The Central Limit Theorem (CLT)
Let X1, X2, . . . , Xn be a random sample from a distribution
with mean μ and variance σ². Then if n is sufficiently large,
X̄ has approximately a normal distribution with μX̄ = μ and
σX̄² = σ²/n, and To also has approximately a normal
distribution with μTo = nμ and σTo² = nσ². The larger the
value of n, the better the approximation.
Figure 5.15 illustrates the Central Limit Theorem.

[Figure 5.15: The Central Limit Theorem illustrated]
Example 26
The amount of a particular impurity in a batch of a certain
chemical product is a random variable with mean value 4.0 g
and standard deviation 1.5 g.
If 50 batches are independently prepared, what is the
(approximate) probability that the sample average amount of
impurity is between 3.5 and 3.8 g?
According to the rule of thumb to be stated shortly, n = 50 is
large enough for the CLT to be applicable.
X̄ then has approximately a normal distribution with mean
value μX̄ = 4.0 and σX̄ = σ/√n = 1.5/√50 = .2121, so

P(3.5 ≤ X̄ ≤ 3.8) = P((3.5 − 4.0)/.2121 ≤ Z ≤ (3.8 − 4.0)/.2121)
                = Φ(−.94) − Φ(−2.36) = .1645
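With the exact normal cdf rather than table lookups:

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 4.0, 1.5, 50
se = sigma / math.sqrt(n)  # ~0.2121

p = phi((3.8 - mu) / se) - phi((3.5 - mu) / se)
print(round(p, 3))  # 0.164 (table lookup with rounded z values gives .1645)
```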
The CLT provides insight into why many random variables
have probability distributions that are approximately
normal.
For example, the measurement error in a scientific
experiment can be thought of as the sum of a number of
underlying perturbations and errors of small magnitude.
A practical difficulty in applying the CLT is in knowing when
n is sufficiently large. The problem is that the accuracy of
the approximation for a particular n depends on the shape
of the original underlying distribution being sampled.
If the underlying distribution is close to a normal density
curve, then the approximation will be good even for a small
n, whereas if it is far from being normal, then a large n will
be required.
Rule of Thumb
If n > 30, the Central Limit Theorem can be used.
There are population distributions for which even an n of 40
or 50 does not suffice, but such distributions are rarely
encountered in practice.
On the other hand, the rule of thumb is often conservative;
for many population distributions, an n much less than 30
would suffice.
For example, in the case of a uniform population
distribution, the CLT gives a good approximation for n ≥ 12.
5.5  The Distribution of a Linear Combination
The sample mean X̄ and sample total To are special cases
of a type of random variable that arises very frequently in
statistical applications.

Definition
Given a collection of n random variables X1, . . . , Xn and
n numerical constants a1, . . . , an, the rv

Y = a1X1 + a2X2 + . . . + anXn = Σᵢ aᵢXᵢ        (5.7)

is called a linear combination of the Xi's.
For example, 4X1 – 5X2 + 8X3 is a linear combination of X1,
X2, and X3 with a1 = 4, a2 = –5, and a3 = 8.
Taking a1 = a2 = . . . = an = 1 gives Y = X1 + . . . + Xn = To,
and a1 = a2 = . . . = an = 1/n yields Y = X̄.
Proposition
Let X1, X2, . . . , Xn have mean values μ1, . . . , μn,
respectively, and variances σ1², . . . , σn², respectively.

1. Whether or not the Xi's are independent,

   E(a1X1 + a2X2 + . . . + anXn) = a1E(X1) + a2E(X2) + . . . + anE(Xn)
                                = a1μ1 + . . . + anμn            (5.8)

2. If X1, . . . , Xn are independent,

   V(a1X1 + a2X2 + . . . + anXn) = a1²V(X1) + . . . + an²V(Xn)
                                = a1²σ1² + . . . + an²σn²        (5.9)
and

σ(a1X1 + . . . + anXn) = √(a1²σ1² + . . . + an²σn²)              (5.10)

3. For any X1, . . . , Xn,

   V(a1X1 + . . . + anXn) = Σᵢ Σⱼ aᵢaⱼ Cov(Xᵢ, Xⱼ)               (5.11)
Example 29
A gas station sells three grades of gasoline: regular, extra,
and super.
These are priced at $3.00, $3.20, and $3.40 per gallon,
respectively.
Let X1, X2, and X3 denote the amounts of these grades
purchased (gallons) on a particular day.
Suppose the Xi’s are independent with μ1 = 1000, μ2 = 500,
μ3 = 300, σ1 = 100, σ2 = 80, and σ3 = 50.
The revenue from sales is Y = 3.0X1 + 3.2X2 + 3.4X3, and

E(Y) = 3.0μ1 + 3.2μ2 + 3.4μ3 = 3.0(1000) + 3.2(500) + 3.4(300)
     = $5620
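Rules (5.8) and (5.9) reduce the whole computation to two dot products; a sketch for this example:

```python
import math

# Example 29: Y = 3.0*X1 + 3.2*X2 + 3.4*X3 with independent Xi's.
a = [3.0, 3.2, 3.4]      # prices per gallon
mu = [1000, 500, 300]    # mean gallons sold
sigma = [100, 80, 50]    # standard deviations

e_y = sum(ai * mi for ai, mi in zip(a, mu))             # rule (5.8)
var_y = sum(ai**2 * si**2 for ai, si in zip(a, sigma))  # rule (5.9), independence
print(round(e_y, 2), round(math.sqrt(var_y), 2))  # 5620.0 429.46
```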
The Difference Between Two Random Variables
An important special case of a linear combination results
from taking n = 2, a1 = 1, and a2 = –1:
Y = a1X1 + a2X2 = X1 – X2
We then have the following corollary to the proposition.
Corollary
E(X1 – X2) = E(X1) – E(X2) for any two rv’s X1 and X2.
V(X1 – X2) = V(X1) + V(X2) if X1 and X2 are
independent rv’s.
Example 30
A certain automobile manufacturer equips a particular
model with either a six-cylinder engine or a four-cylinder
engine.
Let X1 and X2 be fuel efficiencies for independently and
randomly selected six-cylinder and four-cylinder cars,
respectively. With μ1 = 22, μ2 = 26, σ1 = 1.2, and σ2 = 1.5,

E(X1 − X2) = μ1 − μ2 = 22 − 26 = −4

V(X1 − X2) = σ1² + σ2² = (1.2)² + (1.5)² = 3.69
If we relabel so that X1 refers to the four-cylinder car, then
E(X1 – X2) = 4, but the variance of the difference is
still 3.69.
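The corollary in code, confirming that the variance of the difference is the sum of the variances and does not depend on which car is labeled first:

```python
mu1, mu2 = 22, 26         # six-cylinder, four-cylinder mean mpg
sigma1, sigma2 = 1.2, 1.5

# Expectation changes sign under relabeling; the variance does not.
e_diff = mu1 - mu2                 # -4
var_diff = sigma1**2 + sigma2**2   # variances ADD for a difference
print(e_diff, round(var_diff, 2))  # -4 3.69
```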
The Case of Normal Random Variables
When the Xi’s form a random sample from a normal
distribution, X̄ and To are both normally distributed. Here is
a more general result concerning linear combinations.
Proposition
If X1, X2, . . . , Xn are independent, normally distributed rv’s
(with possibly different means and/or variances), then any
linear combination of the Xi’s also has a normal distribution.
In particular, the difference X1 – X2 between two
independent, normally distributed variables is itself
normally distributed.
The CLT can also be generalized so it applies to certain
linear combinations. Roughly speaking, if n is large and no
individual term is likely to contribute too much to the overall
value, then Y has approximately a normal distribution.