Probability, Mean, and Median

In the last section, we considered (probability) density functions. We went on to
discuss their relationship with cumulative distribution functions. The goal of this section
is to take a closer look at densities, introduce some common distributions and discuss the
mean and median.
Recall, we define probabilities as follows:
$$\text{Proportion of population for which } x \text{ is between } a \text{ and } b \;=\; \text{Area under the graph of } p(x) \text{ between } a \text{ and } b \;=\; \int_a^b p(x)\,dx$$
The cumulative distribution function gives the proportion of the population that has
values below t. That is,
$$P(t) = \int_{-\infty}^{t} p(x)\,dx \;=\; \text{Proportion of population having values of } x \text{ below } t$$
When answering some questions involving probabilities, both the density function and
the cumulative distribution can be used, as the next example illustrates.
Example 1:
Consider the graph of the function p(x).
[Figure 1: The graph of the function p(x). Horizontal axis: x, from 0 to 10. Vertical axis: p(x), from 0 to 0.2.]
a. Explain why the function is a probability density function.
b. Use the graph to find P(X < 3)
c. Use the graph to find P(3 ≤ X ≤ 8)
Solution:
a. Recall, a function is a probability density function if the area under the curve is
equal to 1 and all of the values of p(x) are non-negative. It is immediately clear
that the values of p(x) are non-negative. To verify that the area under the curve is
equal to 1, we recognize that the graph above can be viewed as a triangle. Its base
is 10 and its height is 0.2. Thus its area is equal to $\frac{1}{2} \cdot 10 \cdot 0.2 = 1$.
b. There are two ways that we can solve this problem. Before we get started, though,
we begin by drawing the shaded region.
[Graph of p(x) with the region under the curve between x = 0 and x = 3 shaded.]
The first approach is to recognize that we can determine the area under the curve
from 0 to 3 immediately. The shaded area is another triangle, with a base of 3 and
a height of 0.1. Thus, the area is equal to 0.15.
A second approach would be to find the equations of the two lines that form p(x) and
use the integral formula given above.
For the first line, notice that the line passes through the points (0, 0) and (6, 0.2).
Using the point-slope formula, we see that the line is given by p(x) = (1/30)x.
The second line passes through the points (6, 0.2) and (10, 0). Again, using the
point-slope formula, we see that the line is given by p(x) = -(1/20)x + 1/2.
So, we have that
$$p(x) = \begin{cases} \dfrac{1}{30}x & \text{if } 0 \le x \le 6 \\[4pt] -\dfrac{1}{20}x + \dfrac{1}{2} & \text{if } 6 < x \le 10 \\[4pt] 0 & \text{otherwise} \end{cases}$$
Returning to the original question, we have that P(X < 3) is given by the integral
$$\int_0^3 p(x)\,dx = P(3) - P(0).$$
On [0, 3), p(x) = (1/30)x. Notice that P(t) = (1/60)t^2 on this interval. So, we have that
$$P(X < 3) = P(3) - P(0) = \frac{1}{60}(3)^2 - \frac{1}{60}(0)^2 = \frac{9}{60} = 0.15.$$
c. Again, there are two ways that we can approach this problem. As before, we start by
drawing the shaded region.
[Graph of p(x) with the region under the curve between x = 3 and x = 8 shaded.]
If we want to use triangles, it is easiest to use the fact that the area under the curve
is equal to 1. The area of the shaded region is thus equal to one minus the areas of the
two triangles on the sides.
In (b), we found the area of the left triangle is equal to 0.15. The right triangle has a
base of 2 and a height of p(8) = 0.1, so its area is equal to 0.1. So, the area of the
shaded region is 1 - 0.15 - 0.1 = 0.75.
If instead we were to use integrals, notice that p(x) changes functions at x = 6.
Thus, in order to compute the integral $\int_3^8 p(x)\,dx$, we need to split it into two pieces.
That is,
$$\int_3^8 p(x)\,dx = \int_3^6 p(x)\,dx + \int_6^8 p(x)\,dx.$$
$$\int_3^6 p(x)\,dx = \int_3^6 \frac{1}{30}x\,dx = \left[\frac{1}{60}x^2\right]_3^6 = 0.45.$$
$$\int_6^8 p(x)\,dx = \int_6^8 \left(-\frac{1}{20}x + \frac{1}{2}\right)dx = \left[-\frac{1}{40}x^2 + \frac{1}{2}x\right]_6^8 = 0.3.$$
So, we see that the shaded area is equal to 0.45 + 0.3 = 0.75, which agrees with
the answer we found the other way.
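The answers in Example 1 can also be checked numerically. The following is a minimal sketch, not part of the original example: the function names `p` and `integrate` are hypothetical, and the midpoint rule is simply one convenient way to approximate an integral.

```python
def p(x):
    """Density from Example 1: lines through (0, 0), (6, 0.2) and (10, 0)."""
    if 0 <= x <= 6:
        return x / 30
    if 6 < x <= 10:
        return -x / 20 + 0.5
    return 0.0


def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


total_area = integrate(p, 0, 10)    # part (a): should be close to 1
prob_below_3 = integrate(p, 0, 3)   # part (b): should be close to 0.15
prob_3_to_8 = integrate(p, 3, 8)    # part (c): should be close to 0.75

print(total_area, prob_below_3, prob_3_to_8)
```

Because p(x) is piecewise linear, the midpoint rule is essentially exact here; for a curved density, a larger `n` may be needed.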
Often, we are concerned with finding the "average" value of a distribution.
There are two common measures that are used: the mean and the median.
The Mean
If a quantity has a density function p(x), then we define the mean value of the
quantity as
$$\int_{-\infty}^{\infty} x\,p(x)\,dx.$$
Example 2:
Returning to the density function given in Example 1, compute its mean.
Solution:
Notice that p(x) changes functions at x = 6. Thus, in order to compute the integral
$\int_0^{10} x\,p(x)\,dx$, we will need to again split it into two pieces. Thus, we have that the mean is
equal to
$$\begin{aligned}
\int_0^{10} x\,p(x)\,dx &= \int_0^6 x \cdot \frac{1}{30}x\,dx + \int_6^{10} x\left(-\frac{1}{20}x + \frac{1}{2}\right)dx \\
&= \left[\frac{1}{90}x^3\right]_0^6 + \left[-\frac{1}{60}x^3 + \frac{1}{4}x^2\right]_6^{10} \\
&= \frac{216}{90} + \frac{176}{60} = \frac{16}{3}.
\end{aligned}$$
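The arithmetic in Example 2 can be reproduced exactly with rational numbers. This is a small sketch under the assumption that the two antiderivatives are x^3/90 on [0, 6] and -x^3/60 + x^2/4 on [6, 10]; the helper names `first_piece` and `second_piece` are hypothetical.

```python
from fractions import Fraction


def first_piece(x):
    """Antiderivative of x * (x/30), i.e. x^3 / 90, used on [0, 6]."""
    return Fraction(x) ** 3 / 90


def second_piece(x):
    """Antiderivative of x * (-x/20 + 1/2), i.e. -x^3/60 + x^2/4, used on [6, 10]."""
    return -(Fraction(x) ** 3) / 60 + Fraction(x) ** 2 / 4


left = first_piece(6) - first_piece(0)       # equals 216/90
right = second_piece(10) - second_piece(6)   # equals 176/60
mean = left + right

print(left, right, mean)  # prints 12/5 44/15 16/3 (fractions in lowest terms)
```

Using `Fraction` avoids rounding, so the result matches the hand computation exactly.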
The Median
A median of a quantity x distributed through a population is a value T such
that half of the population has values of x less than T and half the population has
values of x greater than T. That is, T satisfies the equation
$$\int_{-\infty}^{T} p(x)\,dx = \frac{1}{2}$$
where p(x) is the density function of the quantity. In words, we have that half
the area under the graph of p(x) lies to the left of T (and half lies to the right of T).
Example 3:
Returning to the density function given in Example 1, compute its median.
Solution:
Looking at Figure 1, notice that more than half of the area lies in the left part of the
triangle: the area under the curve from 0 to 6 is (1/2)(6)(0.2) = 0.6. Thus, the median
will be a number between 0 and 6.
Since p(x) is given by a single formula on the interval [0, 6], we have that
$$\int_0^T \frac{1}{30}x\,dx = \frac{1}{2}, \quad \text{that is,} \quad \frac{T^2}{60} = \frac{1}{2}.$$
Solving for T, we see that $T = \sqrt{30} \approx 5.48$.
Note: We did not use $-\sqrt{30}$ for T, since we know that T is a positive number.
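When the equation for the median cannot be solved by hand, it can be solved numerically. The sketch below uses bisection on the cumulative distribution function of Example 1, restricted to [0, 6], where P(t) = t^2/60. The names `cdf_left` and `bisect_median` are hypothetical, and bisection is one of several root-finding methods that would work.

```python
import math


def cdf_left(t):
    """P(t) = t^2 / 60, valid for 0 <= t <= 6 (density from Example 1)."""
    return t * t / 60


def bisect_median(cdf, lo, hi, tol=1e-10):
    """Find T in [lo, hi] with cdf(T) = 1/2, assuming cdf is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid   # median lies to the right of mid
        else:
            hi = mid   # median lies at or to the left of mid
    return (lo + hi) / 2


median = bisect_median(cdf_left, 0, 6)
print(median, math.sqrt(30))
```

The two printed values should agree to many decimal places, matching the exact answer found above.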
There are a number of important distributions that arise in a variety of situations.
Below, we list three such distributions as well as associated properties.
The first important distribution we shall consider is the uniform distribution. We
introduced this distribution in the previous section. The graph of the density function is
constant on the interval [a, b] and zero elsewhere.
[Figure 2: The density of the uniform distribution on [a, b]. The graph is a horizontal line at height 1/(b - a) between x = a and x = b.]
Uniform Distribution
The density of the uniform distribution is given by
$$p(x) = \frac{1}{b-a}, \quad \text{for } a \le x \le b$$
The cumulative distribution function is given by
$$P(t) = \int_a^t p(x)\,dx = \frac{t-a}{b-a}, \quad \text{for } a \le t \le b$$
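As an illustration, the two formulas for the uniform distribution can be written as short functions. This is only a sketch: the function names and the endpoints a = 2, b = 10 are hypothetical, and the values 0 for t < a and 1 for t > b are filled in from the fact that the density is zero outside [a, b].

```python
def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(t, a, b):
    """Cumulative distribution: (t - a)/(b - a) on [a, b], 0 before a, 1 after b."""
    if t < a:
        return 0.0
    if t > b:
        return 1.0
    return (t - a) / (b - a)


a, b = 2, 10  # hypothetical endpoints, chosen only for illustration

print(uniform_pdf(5, a, b))             # height of the density, 1/(b - a)
print(uniform_cdf(4, a, b))             # proportion of the population below 4
print(uniform_cdf((a + b) / 2, a, b))   # proportion below the midpoint of [a, b]
```

The last line shows that half of the population lies below the midpoint (a + b)/2, so the midpoint is the median of a uniform distribution.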
Another important distribution we shall consider is the exponential distribution. The
graph of the density function is characterized by an exponential decay.
[Figure 3: The density of the exponential distribution for c > 0.]
Exponential Distribution
The density of the exponential distribution is given by
$$p(x) = ce^{-cx}, \quad \text{for } x \ge 0 \text{ and any constant } c > 0$$
The cumulative distribution function is given by
$$P(t) = \int_0^t p(x)\,dx = 1 - e^{-ct}, \quad \text{for } t \ge 0$$
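One way to see that the cumulative distribution function really is the integral of the density is to compare the closed form with a numerical integral. The sketch below does this for hypothetical values c = 0.5 and t = 3; the function names and the midpoint rule are illustrative choices, not part of the original notes.

```python
import math


def exp_pdf(x, c):
    """Density of the exponential distribution: c * e^(-c x) for x >= 0."""
    return c * math.exp(-c * x) if x >= 0 else 0.0


def exp_cdf(t, c):
    """Cumulative distribution: 1 - e^(-c t) for t >= 0."""
    return 1 - math.exp(-c * t) if t >= 0 else 0.0


def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


c, t = 0.5, 3.0  # hypothetical values, chosen only for illustration
numeric = integrate(lambda x: exp_pdf(x, c), 0, t)
closed_form = exp_cdf(t, c)

print(numeric, closed_form)
```

The two printed numbers should agree to several decimal places.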
Example 4:
Suppose that the probability density function for the wait time in line at a counter is
given by
$$p(x) = \begin{cases} 0 & \text{if } x < 0 \\ ke^{-x/5} & \text{if } x \ge 0 \end{cases}$$
a. What is the value of the constant k?
b. Determine the probability that a person will wait at least 3 minutes.
c. What is the mean wait time?
Solution:
a. Comparing the form of the density function with that given in the box above, we
see that c = 1/5. Thus, we must have that k = 1/5. Another way to see this would be
to do the integration and solve for k.
$$1 = \int_0^{\infty} ke^{-x/5}\,dx = \lim_{b \to \infty} \int_0^b ke^{-x/5}\,dx = \lim_{b \to \infty} \left[-5ke^{-x/5}\right]_0^b = \lim_{b \to \infty} \left(5k - 5ke^{-b/5}\right) = 5k$$
Dividing both sides by 5, we see that k = 1/5.
b. The probability that a person will wait at least 3 minutes is given by
$$\int_3^{\infty} p(x)\,dx = \lim_{b \to \infty} \int_3^b p(x)\,dx = \lim_{b \to \infty} \left(P(b) - P(3)\right) = 1 - P(3) = 1 - \left(1 - e^{-3/5}\right) = e^{-3/5}.$$
Here, we used the fact that $\lim_{b \to \infty} P(b) = 1$ to simplify the above expression.
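c. The mean wait time is given by $\int_0^{\infty} \frac{x}{5}e^{-x/5}\,dx$. Using integration by parts, we have:
$$\begin{aligned}
\int_0^{\infty} \frac{x}{5}e^{-x/5}\,dx &= \lim_{b \to \infty} \int_0^b \frac{x}{5}e^{-x/5}\,dx = \lim_{b \to \infty} \left( \left[-xe^{-x/5}\right]_0^b + \int_0^b e^{-x/5}\,dx \right) \\
&= \lim_{b \to \infty} \left( \left[-xe^{-x/5}\right]_0^b - \left[5e^{-x/5}\right]_0^b \right) \\
&= \lim_{b \to \infty} \left( -be^{-b/5} - 5e^{-b/5} + 5 \right) \\
&= 5
\end{aligned}$$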
Note: In general, if $p(x) = ce^{-cx}$ for x ≥ 0, then
$$\int_0^{\infty} x\,p(x)\,dx = \frac{1}{c}.$$
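The three answers in Example 4 can be checked numerically. In the sketch below, the improper integrals are approximated by stopping at a large upper limit; the value `UPPER = 200.0`, the function names, and the midpoint rule are all assumptions made for illustration, relying on the fact that e^(-x/5) is negligibly small beyond that point.

```python
import math


def p(x, k=1 / 5):
    """Wait-time density from Example 4: k * e^(-x/5) for x >= 0, with k = 1/5."""
    return k * math.exp(-x / 5) if x >= 0 else 0.0


def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


UPPER = 200.0  # stands in for infinity (assumption: the tail beyond this is negligible)

total = integrate(p, 0, UPPER)                        # part (a): close to 1 when k = 1/5
at_least_3 = integrate(p, 3, UPPER)                   # part (b): close to e^(-3/5)
mean_wait = integrate(lambda x: x * p(x), 0, UPPER)   # part (c): close to 5

print(total, at_least_3, math.exp(-3 / 5), mean_wait)
```

The mean agrees with the general rule in the note above: with c = 1/5, the mean is 1/c = 5.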
The final distribution which we shall examine is the normal distribution. The graph
of its density function is a bell-shaped curve which peaks at its mean, denoted by μ. The
width of the curve is determined by the standard deviation, denoted by σ.
[Figure 4: The density of the normal distribution with parameters μ and σ. The curve peaks at x = μ, with σ marked on either side.]
Normal Distribution
The density of the normal distribution is given by
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad \text{for } -\infty < x < \infty$$
where μ is the mean of the distribution and σ is the standard deviation.
It is beyond the scope of this course to verify that $\int_{-\infty}^{\infty} p(x)\,dx = 1$. However, we can see
that p(x) > 0 for all x, since $e^{-(x-\mu)^2/(2\sigma^2)}$ will always be positive (and at most 1) and
$\frac{1}{\sigma\sqrt{2\pi}}$ is a positive scalar. Consequently, $0 < p(x) \le \frac{1}{\sigma\sqrt{2\pi}}$ for all x.
The antiderivative of the normal density function is not an elementary function. That is, it
cannot be written in closed form. But, as Figure 4 above illustrates, there is still area under the
curve. To evaluate the integral, we use a calculator or a table of values.
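In Python, the standard library function `math.erf` (the error function, which is not discussed in these notes) can play the role of the calculator or table. The sketch below relies on the standard identity P(t) = (1/2)(1 + erf((t - μ)/(σ√2))); the function names and the values μ = 0, σ = 1 are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(t, mu, sigma):
    """P(t) for a normal distribution, written in terms of the error function."""
    return 0.5 * (1 + math.erf((t - mu) / (sigma * math.sqrt(2))))


def normal_prob(a, b, mu, sigma):
    """Proportion of the population with values between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


mu, sigma = 0.0, 1.0  # hypothetical parameters
within_one_sd = normal_prob(mu - sigma, mu + sigma, mu, sigma)

print(round(within_one_sd, 4))  # prints 0.6827
```

The output reflects the familiar fact that roughly 68% of a normally distributed population lies within one standard deviation of the mean.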
Example 5:
Lengths of human pregnancies are normally distributed with mean 268 days and
standard deviation 15 days. What percentage of pregnancies last between 250 days and
280 days?
Solution:
Using the fact that μ = 268 and σ = 15, we have that the density function is given by
$$p(x) = \frac{1}{15\sqrt{2\pi}}\, e^{-(x-268)^2/(2(15)^2)}.$$
Finding the integral numerically, we have:
$$\text{Proportion of pregnancies lasting between 250 days and 280 days} = \int_{250}^{280} \frac{1}{15\sqrt{2\pi}}\, e^{-(x-268)^2/(2(15)^2)}\,dx \approx 0.673.$$
That is, about 67.3% of pregnancies last between 250 days and 280 days.
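The numerical integration in Example 5 can be carried out with a short program instead of a calculator. The sketch below uses composite Simpson's rule, which is one possible method and not necessarily the one a given calculator uses; the function names are hypothetical.

```python
import math

MU, SIGMA = 268.0, 15.0  # mean and standard deviation from Example 5


def p(x):
    """Normal density with mean 268 and standard deviation 15."""
    return math.exp(-((x - MU) ** 2) / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))


def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


proportion = simpson(p, 250, 280)
print(round(proportion, 3))  # prints 0.673
```

The result agrees with the value 0.673 quoted in the solution.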