Chapter 3 – Random Variables and Probability Distributions

Defn: An experiment is a test or series of tests in which purposeful changes are made to the input
variables of a process or system so that we may observe and identify reasons for changes in the output
response.
Defn: A random experiment is one whose outcome cannot be predicted with certainty.
Example: To determine optimum conditions for a plating bath, the effects of sulfone concentration and
bath temperature on the reflectivity of the plated metal are studied. Two levels of sulfone
concentration (in grams/liter) and five levels of temperature (degrees F) were used, with three
replications. (Example from Miller & Freund’s Probability & Statistics for Engineers, by R. A.
Johnson). In this case, there are two experimental factors – concentration and temperature – with two
levels of concentration and five levels of temperature. The (random) response variable is reflectivity.
Defn: A random variable is a numerical variable whose measured value is determined by chance.
Note: We will denote a random variable with an uppercase letter, such as X, and a measured value of
the random variable with a lowercase letter, such as x.
Example: In the experiment described above, reflectivity is affected by concentration and temperature, but it is also affected by other factors not explicitly included in the experiment. Hence, reflectivity is a random variable.
A random variable is called continuous if the set of possible values is some interval(s) of the real
numbers. A random variable is called discrete if the set of possible values is either finite or countably
infinite.
Example:
Continuous random variables – electric current, reflectivity (above example), temperature, pressure.
Discrete random variables – number of defective parts in a shipment of parts, number of people in a
poll who support a particular candidate for President, number of accidents happening at the
intersection of Beach Boulevard and Kernan Boulevard in a month.
Probability
There are two primary interpretations of probability:
1) Subjective approach: Probability values are assigned based on educated guesses about the relative
likelihoods of the different possible outcomes of our random experiment. This approach involves
advanced concepts and principles, such as entropy.
2) Relative frequency approach: In this approach to assigning probabilities to events, we look at the
long-run proportion of occurrences of particular outcomes, when the random experiment is performed
many times. This long-run proportion tells us the approximate probability of occurrence of each
outcome.
Example: If we flip a coin once, what is the likelihood that the outcome is a head? Why? For a single coin flip, we cannot say with certainty what the outcome will be. However, if we flip a coin 1,000,000 times, we are fairly sure that approximately one-half of the outcomes will be heads.
This approach is based on the Law of Large Numbers, which says, in particular, that the relative
frequency of occurrence of a particular outcome of a random experiment approaches a specific limiting
number between 0 and 1 if we perform the experiment a very large number of times.
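A quick way to see the Law of Large Numbers at work is to simulate coin flips and watch the proportion of heads settle near one-half. The following is a minimal sketch using Python's standard library; the function name and flip counts are purely illustrative.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def head_proportion(n_flips):
    """Flip a fair coin n_flips times and return the proportion of heads."""
    heads = sum(random.randint(0, 1) for _ in range(n_flips))  # 1 = head, 0 = tail
    return heads / n_flips

# The long-run proportion of heads approaches the probability 0.5.
for n in (100, 10_000, 1_000_000):
    print(n, head_proportion(n))
```

The proportion fluctuates noticeably at 100 flips but is very close to 0.5 by a million flips, which is exactly the relative-frequency interpretation of probability.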
Defn: A set is a collection of elements.
Defn: Given a set Ω, another set A is called a subset of Ω, denoted A ⊆ Ω, if every element of A is also an element of Ω.
Defn: Given a set Ω, and two sets A ⊆ Ω and B ⊆ Ω, we define the union of A and B, denoted by A ∪ B, to be the set of all elements of Ω that are either elements of A or elements of B or elements of both A and B.
Note: If Ω is the sample space of a random experiment, then A and B are events, and A ∪ B is the event that either A or B (or both A and B) happens when we perform the experiment.
Defn: Given a set Ω, and two sets A ⊆ Ω and B ⊆ Ω, we define the intersection of A and B, denoted by A ∩ B, to be the set of all elements of Ω that are elements of both A and B.
Note: If Ω is the sample space of a random experiment, then A and B are events, and A ∩ B is the event that both A and B happen when we perform the experiment.
Defn: The empty set, or null set, ∅, is the set that contains no elements.
Note: The null set is a subset of every set.
Defn: Two sets A, B are said to be mutually exclusive if A ∩ B = ∅.
Defn: The complement A′ of a set A ⊆ ℝ is A′ = {x ∈ ℝ : x ∉ A}, where ℝ is the set of real numbers.
Basic Laws of Probability (Kolmogorov's Axioms, in terms of a random variable X):
1) P(X ∈ ℝ) = 1.
2) 0 ≤ P(X ∈ E) ≤ 1, for any E ⊆ ℝ.
3) If E1, E2, E3, ..., En ⊆ ℝ are mutually exclusive, then
P(X ∈ E1 ∪ E2 ∪ E3 ∪ ... ∪ En) = P(X ∈ E1) + P(X ∈ E2) + P(X ∈ E3) + ... + P(X ∈ En).
Note: Laws 1, 2, and 3 imply the complement rule: For any set E, P(E′) = 1 − P(E).
Note: We may generalize Kolmogorov's Axioms from subsets of real numbers to subsets of the set of all possible outcomes of a random experiment. For example, if our random experiment is to flip a fair coin twice, the set of all possible outcomes, called the sample space of the experiment, is Ω = {HH, HT, TH, TT}. Any subset of a sample space is called an event. The following 3 laws apply when we consider all 16 subsets of the sample space of this experiment:
1) P(Ω) = 1,
2) For any A ⊆ Ω, 0 ≤ P(A) ≤ 1.
3) If A, B ⊆ Ω such that A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).
Example: From handout.
Example: p. 53, Exercise 3-13.
Continuous Random Variables
Defn: The probability distribution of a random variable X is the set {(A, P(X ∈ A)) : A ⊆ ℝ}.
Mathematically, the two types of random variables – continuous and discrete – must be handled
differently.
Under certain simple conditions, we may describe the distribution for a continuous random variable
using a probability density function.
Defn: If the distribution of a continuous random variable has a probability density function, f(x), then for any interval (a, b), we have P(a ≤ X ≤ b) = ∫_a^b f(x) dx. The probability density function (p.d.f.) has the following properties, which follow from Kolmogorov's Axioms:
1) f(x) ≥ 0 everywhere;
2) ∫_{−∞}^{+∞} f(x) dx = 1.
Note: If X is a continuous r.v., then P(X = x) = 0 for any x. (Think about this.)
Note: As a result, we have P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b).
Example: p. 59, Exercise 3-19
Defn: The cumulative distribution function (or c.d.f.) for a continuous r.v. X is given by F(x) = P(X ≤ x) = ∫_{−∞}^x f(t) dt, for all x ∈ ℝ. If the distribution does not have a p.d.f., we may still define the c.d.f. for any x as the probability that X takes on a value no greater than x.
Note: The c.d.f. for the distribution of a r.v. is unique, and completely describes the distribution.
Example: p. 59, Exercise 3-19.
Mean and Variance
Defn: The mean, or expected value, or expectation, of a continuous r.v. X with p.d.f. f(x) is given by μ = E(X) = ∫_{−∞}^{+∞} x f(x) dx.
Example: p. 59, Exercise 3-19.
Note: We interpret the mean in terms of relative frequency. If we were to repeatedly take measurements of the random variable X, recording all of our measurements and calculating the average after each measurement, the value of the average would approach a limit as we continued to take measurements, and this limit is the expectation of X.
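This running-average interpretation can be sketched with the hypothetical density f(x) = 2x on (0, 1): its c.d.f. is x², so by inverse-c.d.f. sampling the measurement X = √U with U ~ Uniform(0, 1) has this density, and E(X) = ∫_0^1 x · 2x dx = 2/3. The sample sizes below are arbitrary.

```python
import random

random.seed(1)  # reproducible run

# X = sqrt(U) with U ~ Uniform(0, 1) has p.d.f. f(x) = 2x on (0, 1),
# so the running average of repeated measurements should approach E(X) = 2/3.
total = 0.0
for n in range(1, 200_001):
    total += random.random() ** 0.5
    if n in (100, 10_000, 200_000):
        print(n, total / n)  # the average settles near 0.6667
```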
Defn: Let X be a continuous r.v. with p.d.f. f(x), and mean μ. The variance of X, or the variance of the distribution of X, is given by σ² = V(X) = E[(X − μ)²] = ∫_{−∞}^{+∞} (x − μ)² f(x) dx. The standard deviation of X is just the square root of the variance.
Note: In practice, it is easier to use the computational formula for the variance, rather than the defining formula:
σ² = E(X²) − μ² = ∫_{−∞}^{+∞} x² f(x) dx − μ².
Example: p. 59, Exercise 3-19
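Both formulas can be checked numerically for the hypothetical density f(x) = 2x on (0, 1), whose mean is 2/3 and whose variance works out to 1/18; midpoint Riemann sums stand in for the integrals.

```python
def f(x):
    """Hypothetical p.d.f.: f(x) = 2x on (0, 1)."""
    return 2.0 * x

def integrate(g, a, b, n=100_000):
    """Midpoint Riemann sum approximation of the integral of g over (a, b)."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

mu = integrate(lambda x: x * f(x), 0.0, 1.0)                      # E(X) = 2/3
var_def = integrate(lambda x: (x - mu) ** 2 * f(x), 0.0, 1.0)     # defining formula
var_comp = integrate(lambda x: x * x * f(x), 0.0, 1.0) - mu ** 2  # computational formula
print(mu, var_def, var_comp)  # both variance values equal 1/18, about 0.0556
```

The two variance computations agree, as the algebra E[(X − μ)²] = E(X²) − μ² guarantees.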
Defn: The kth moment of the distribution of X is E(X^k).
The Uniform Distribution
Consider a continuous r.v. X whose distribution has p.d.f. f(x) = 1/(b − a), for a < x < b, and f(x) = 0, otherwise. We say that X has a uniform distribution on the interval (a, b), abbreviated X ~ Uniform(a, b). If we take a measurement of X, we are equally likely to obtain any value within the interval. Hence, for any subinterval (c, d) ⊆ (a, b), we have
P(c ≤ X ≤ d) = ∫_c^d 1/(b − a) dx = (d − c)/(b − a).
The mean of the uniform distribution is
μ = ∫_{−∞}^{+∞} x f(x) dx = ∫_a^b x/(b − a) dx = (1/(b − a)) [x²/2]_a^b = (a + b)/2,
the midpoint of the interval (a, b).
The second moment of the distribution is
E(X²) = ∫_a^b x² f(x) dx = ∫_a^b x²/(b − a) dx = (b³ − a³)/(3(b − a)) = (b² + ab + a²)/3.
Then the variance is
σ² = E(X²) − μ² = (b² + ab + a²)/3 − (b² + 2ab + a²)/4 = (b − a)²/12,
and the standard deviation is
σ = (b − a)/(2√3).
Note: the longer the interval (a, b), the larger the values of the variance and standard deviation.
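These formulas are easy to sanity-check by sampling; the endpoints a = 2 and b = 10 below are arbitrary illustrative values.

```python
import random

random.seed(7)  # reproducible run

# Compare sample statistics of Uniform(a, b) draws with the theoretical
# mean (a + b)/2 and variance (b - a)^2 / 12.
a, b = 2.0, 10.0
n = 200_000
xs = [random.uniform(a, b) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(mean)  # close to (a + b)/2 = 6
print(var)   # close to (b - a)^2 / 12 = 16/3, about 5.33
```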
The Normal Distribution
The normal distribution is a special type of bell-shaped curve.
Defn: A random variable X is said to be normally distributed or to have a normal distribution if its p.d.f. has the form
f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²)), for −∞ < x < ∞, −∞ < μ < ∞, and σ > 0.
Here μ and σ are the parameters of the distribution; μ = the mean of the random variable X (or of the probability distribution); and σ = the standard deviation of X.
Note: The normal distribution is not just a single distribution, but rather a family of distributions; each member of the family is characterized by a particular pair of values of μ and σ.
The graph of the p.d.f. has the following characteristics:
1) It is a bell-shaped curve;
2) It is symmetric about μ;
3) The inflection points are at μ − σ and μ + σ.
The normal distribution is very important in statistics for the following reasons:
1) Many phenomena occurring in nature or in industry have normal, or approximately normal,
distributions.
Examples: a) heights of people in the general population of
adults; b) for a particular species of pine tree in a forest, the
trunk diameter at a point 3 feet above the ground; c) fill
weights of 12-oz. cans of Pepsi-Cola; d) IQ scores in the
general population of adults; e) diameters of metal shafts used
in disk drive units.
2) Under general conditions (independence of members of a
sample), the possible values of the sample mean for samples of
a given (large) size have an approximate normal distribution (Central Limit Theorem).
The Empirical Rule:
For the normal distribution,
1) The probability that X will be found to have a value in the interval (μ − σ, μ + σ) is approximately 0.6827;
2) The probability that X will be found to have a value in the interval (μ − 2σ, μ + 2σ) is approximately 0.9545;
3) The probability that X will be found to have a value in the interval (μ − 3σ, μ + 3σ) is approximately 0.9973.
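These three probabilities can be reproduced from the normal c.d.f.; the Python standard library's statistics.NormalDist (Python 3.8+) makes this a one-liner per interval.

```python
from statistics import NormalDist

# P(mu - k*sigma < X < mu + k*sigma) for a normal r.v. depends only on k,
# so the standard normal c.d.f. suffices.
Z = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    print(k, Z.cdf(k) - Z.cdf(-k))  # about 0.6827, 0.9545, 0.9973
```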
Unfortunately, the p.d.f. of the normal distribution does not have a closed-form anti-derivative.
Probabilities must be calculated using numerical integration methods. This difficulty is the reason for
the importance of a particular member of the family of normal distributions, the standard normal
distribution, which has p.d.f.
f(z) = (1/√(2π)) e^(−z²/2), for −∞ < z < ∞.
Note: For shorthand, we will write X ~ Normal(μ, σ) to mean that the continuous r.v. X has a normal distribution with mean μ and standard deviation σ.
The c.d.f. of the standard normal distribution will be denoted by
Φ(z) = P(Z ≤ z) = ∫_{−∞}^z (1/√(2π)) e^(−w²/2) dw.
Values of this function have been tabulated in Table 1 of Appendix A.
Examples: p. 64
The reason that the standard normal distribution is so important is that, if X ~ Normal(μ, σ), then
Z = (X − μ)/σ ~ Normal(0, 1).
Example: p. 74, Exercise 3-38 a, b.
In statistical inference, we will have occasion to reverse the above procedure. Rather than finding the
probability associated with a given interval, we will want to find the end point of an interval
corresponding to a given tail probability for a normal distribution.
I.e., we will want to find percentiles of the normal distribution, by inverting the distribution function Φ(z).
Examples:
a) Find the 90th percentile of the standard normal distribution.
b) Find the 95th percentile of the standard normal distribution.
c) Find the 97.5th percentile of the standard normal distribution.
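Instead of reading Table 1 in reverse, the inverse c.d.f. can be computed directly; a sketch using statistics.NormalDist from the Python standard library:

```python
from statistics import NormalDist

# The 100p-th percentile of the standard normal distribution is the value z
# with Phi(z) = p, i.e. the inverse of the c.d.f.
Z = NormalDist()  # standard normal: mean 0, standard deviation 1
for p in (0.90, 0.95, 0.975):
    print(p, round(Z.inv_cdf(p), 4))  # about 1.2816, 1.6449, 1.96
```

These three values (1.28, 1.645, 1.96) recur constantly in confidence intervals and hypothesis tests.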
Example: p. 74, Exercise 3-40 c.
Note: Mean and standard deviation of the Normal(μ, σ) distribution. Although the p.d.f. cannot be integrated in closed form, the mean and variance may easily be found by integration.
Lognormal Distribution
Defn: We say that a continuous r.v. X has a lognormal distribution with parameters μ and σ if the natural logarithm of X has a normal distribution. The p.d.f. of X is
f(x) = (1/(xσ√(2π))) exp(−(ln(x) − μ)²/(2σ²)), for 0 < x < ∞, and f(x) = 0, for x ≤ 0.
The mean and variance of X are E(X) = e^(μ + σ²/2) and V(X) = e^(2μ + σ²)(e^(σ²) − 1). These may easily be seen by using a change of variable and the results for the mean and variance of the normal distribution. The parameters μ and σ² are the mean and variance of the r.v. W = ln(X).
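The mean formula can be checked by simulation: exponentiate normal draws and compare the sample average with e^(μ + σ²/2). The parameter values below are arbitrary illustrations.

```python
import math
import random

random.seed(3)  # reproducible run

# If W = ln(X) ~ Normal(mu, sigma), then X = e^W is lognormal with
# E(X) = exp(mu + sigma^2 / 2).
mu, sigma = 1.0, 0.5
n = 200_000
sample_mean = sum(math.exp(random.gauss(mu, sigma)) for _ in range(n)) / n
print(sample_mean)                    # sample mean of the lognormal draws
print(math.exp(mu + sigma ** 2 / 2))  # theoretical mean, about 3.0802
```

Note that the mean is larger than e^μ ≈ 2.72: exponentiating skews the distribution to the right, which is why the σ²/2 term appears.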
We write X ~ lognormal(μ, σ) to denote that X has a lognormal distribution with parameters μ and σ.
Note: The c.d.f. for X is given by
F(x) = P(X ≤ x) = P(W ≤ ln(x)) = P(Z ≤ (ln(x) − μ)/σ) = Φ((ln(x) − μ)/σ),
for x > 0, and F(x) = 0, for x ≤ 0. Hence, we may find probabilities associated with X by using Table 1 in Appendix A.
Note: This distribution is often applied to model the lifetimes of systems that degrade over time.
Example: p. 75, Exercise 3-50
Gamma Distribution
Defn: The gamma function is defined by the integral Γ(r) = ∫_0^{+∞} t^(r−1) e^(−t) dt, for r > 0.
It may be shown using integration by parts that Γ(r) = (r − 1)Γ(r − 1). Hence, in particular, if r is a positive integer, Γ(r) = (r − 1)!. We also have Γ(0.5) = √π.
Defn: A continuous r.v. X is said to have a gamma distribution with parameters r > 0 and λ > 0 if the p.d.f. of X is
f(x) = (λ^r/Γ(r)) x^(r−1) e^(−λx), for x > 0, and f(x) = 0, for x ≤ 0.
The mean and variance of X are given by μ = E(X) = r/λ and σ² = V(X) = r/λ².
We write X ~ gamma(r, λ) to denote that X has a gamma distribution with parameters r and λ.
It may be easily shown that the integral of the gamma p.d.f. over the interval (0, +∞) is 1, using the definition of the gamma function.
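A sampling sanity check of the mean and variance formulas; note that random.gammavariate in the Python standard library is parameterized by shape and scale, so the scale argument is 1/λ in the notation above. The values r = 3 and λ = 2 are arbitrary illustrations.

```python
import random

random.seed(11)  # reproducible run

# gamma(r, lam) as defined in these notes has mean r/lam and variance r/lam**2.
# random.gammavariate(shape, scale) uses scale = 1/lam.
r, lam = 3.0, 2.0
n = 200_000
xs = [random.gammavariate(r, 1.0 / lam) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(mean)  # close to r / lam = 1.5
print(var)   # close to r / lam**2 = 0.75
```

Mixing up rate and scale parameterizations is a common source of off-by-a-factor errors when moving between textbooks and software, so the conversion is worth stating explicitly.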
The gamma distribution is very important in statistical inference, both in its own right and because it is the basis for constructing some other distributions useful in inference. For example, the "signal-to-noise" ratio statistic that we will use in analyzing the results of scientific experiments is based on a ratio of random variables which have gamma distributions of a particular form.
The graphs of some gamma p.d.f.’s are shown on p. 72.
Defn: A continuous r.v. X is said to have a chi-squared distribution with k degrees of freedom if X ~ gamma(k/2, 0.5).
Weibull Distribution
Defn: A continuous r.v. X is said to have a Weibull distribution with parameters δ > 0 (scale) and β > 0 (shape) if the p.d.f. of X is
f(x) = (β/δ)(x/δ)^(β−1) exp(−(x/δ)^β), for x > 0, and f(x) = 0, for x ≤ 0.
The mean and variance of X are
μ = E(X) = δ Γ(1 + 1/β) and σ² = V(X) = δ² Γ(1 + 2/β) − δ² [Γ(1 + 1/β)]².
We write X ~ Weibull(δ, β).
The c.d.f. for a Weibull(δ, β) distribution is given by F(x) = 1 − exp(−(x/δ)^β), for x > 0, and F(x) = 0, for x ≤ 0.
The Weibull distribution is used to model the reliability of many different types of physical systems.
Different combinations of values of the two parameters lead to models with either a) increasing failure
rates over time, b) decreasing failure rates over time, or c) constant failure rates over time.
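The closed-form c.d.f. makes a quick simulation check easy. The sketch below assumes the usual scale (delta) and shape (beta) parameterization, which is also the argument order of random.weibullvariate in the Python standard library; the parameter values and the point x0 are arbitrary illustrations.

```python
import math
import random

random.seed(5)  # reproducible run

# Compare the empirical proportion of Weibull draws at or below x0 with the
# closed-form c.d.f. F(x) = 1 - exp(-(x/delta)**beta).
delta, beta = 2.0, 1.5  # scale and shape; arbitrary illustrative values
n = 200_000
x0 = 1.8
below = sum(random.weibullvariate(delta, beta) <= x0 for _ in range(n))
print(below / n)                                # empirical P(X <= x0)
print(1.0 - math.exp(-((x0 / delta) ** beta)))  # theoretical, about 0.5742
```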
Example: p. 75, Exercise 3-53.