Probability Distributions

QBM117
Business Statistics
Probability Distributions
Random variables and probability distributions
Objectives
• To define a random variable.
• To define the probability distribution for a random
variable.
• To distinguish between a discrete random variable
and a continuous random variable.
• To introduce discrete probability distributions.
• To calculate the mean, variance and standard deviation
of a discrete probability distribution.
Random Variables
• A random variable is a variable whose numerical
value is determined by the outcome of a random
experiment.
• It is random because the value it assumes depends
on chance.
Examples of Random Variables
• Imagine drawing a student at random from the
student body.
• The student’s height, weight, weekly income and
grade point average are all numerical values
describing properties of the randomly selected
student.
• They are all random variables.
Random Experiment
Draw a student at random from the student body
Random Variable
Height (meters) of the randomly selected student
Possible values for the random variable
Any value between about 1.5 m and 2 m
Random Experiment
Toss two coins
Random Variable
The number of heads
Possible values for the random variable
0, 1 or 2
Random Experiment
Audit 50 tax returns
Random Variable
The number of returns containing errors
Possible values for the random variable
0, 1, 2,…,50
Random Experiment
Weigh a shipment of goods
Random Variable
The weight of the shipment
Possible values for the random variable
Any value greater than or equal to 0
Notation
• We distinguish between a random variable
and the values it can assume by following the
convention of using capital letters such as X and Y
to denote random variables, and lower-case
letters such as x and y to denote their values.
Discrete and Continuous Random Variables
• There are two types of random variables
- discrete
- continuous
• They are distinguished from one another by the
number of possible values they can assume.
Discrete Random Variables
• A discrete random variable has a countable number of
possible values: either finitely many, or a sequence
such as 0, 1, 2, … that can be listed.
• For example
- the number of defective items in a production
batch
- the number of telephone calls received in a given
hour
- the number of customers served in a hotel
reception on a given day
Continuous Random Variables
• A continuous random variable can assume any value in
an interval, so its possible values are infinitely many
and cannot be listed.
• For example
- the duration of long-distance telephone calls
- the lifetime of a certain brand of tyres
- the total annual sales of a firm
- the rate of return of a particular stock
Examples revisited
Random Experiment
Draw a student at random from the student body
Random Variable
Height (meters) of the randomly selected student
Possible values for the random variable
Any value between about 1.5 m and 2 m
Continuous or Discrete?
Continuous
Random Experiment
Toss two coins
Random Variable
The number of heads
Possible values for the random variable
0, 1 or 2
Continuous or Discrete?
Discrete
Random Experiment
Audit 50 tax returns
Random Variable
The number of returns containing errors
Possible values for the random variable
0, 1, 2,…,50
Continuous or Discrete?
Discrete
Random Experiment
Weigh a shipment of goods
Random Variable
The weight of the shipment
Possible values for the random variable
Any value greater than or equal to 0
Continuous or Discrete?
Continuous
Probability Distributions
• A probability distribution of a random variable X tells
us what the possible values of X are and the
associated probabilities P(X=x) or p(x).
• There are two types of probability distributions
- discrete probability distribution
- continuous probability distribution
Discrete Probability Distributions
• The probability distribution of a discrete random
variable is a table, formula or graph that lists all the
possible values of the random variable and their
associated probabilities.
X        x1   x2   …   xn
P(X=x)   p1   p2   …   pn
Requirements of Discrete Probability Distributions
If a discrete random variable X can take values
x1, x2,…, xn with probabilities p(x1), p(x2),…, p(xn) , the
probabilities must satisfy two requirements:
1. Every probability p(xi) is a number between 0 and 1:
$0 \le p(x_i) \le 1$ for $i = 1, 2, \ldots, n$
2. The probabilities must add to 1:
$\sum_{i=1}^{n} p(x_i) = 1$
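The two requirements can be checked mechanically. The helper below is a minimal sketch; the function name `is_valid_distribution` and the tolerance `tol` (used to absorb floating-point rounding) are assumptions made for illustration.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    each_in_range = all(0.0 <= p <= 1.0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one

print(is_valid_distribution([0.25, 0.50, 0.25]))   # True
print(is_valid_distribution([0.60, 0.60, -0.20]))  # False: a negative "probability"
```

The second list adds to 1 but still fails, which shows that both requirements are needed.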
Example 1
Consider a study of 300 households in a town on the
coast of Queensland. As part of this study, data
were collected showing the number of children in
each household. The following results were obtained:
54 of the households had no children, 117 had 1
child, 72 had 2 children, 42 had 3 children, 12 had 4
children, and 3 had 5 children.
Consider the experiment of randomly selecting one of
these households to participate in a follow-up study.
Let X = number of children in the household selected.
The possible values of X are 0, 1, 2, 3, 4, and 5.
The probability that the selected household has no
children is 54/300 = 0.18.
Hence P(X=0) = 0.18
The probability that the selected household has 1
child is 117/300 = 0.39.
Hence P(X=1) = 0.39
The probability that the selected household has 2
children is 72/300 = 0.24
Hence P(X=2) = 0.24
The probability that the selected household has 3
children is 42/300 = 0.14
Hence P(X=3) = 0.14
The probability that the selected household has 4
children is 12/300 = 0.04
Hence P(X=4) = 0.04
The probability that the selected household has 5
children is 3/300 = 0.01
Hence P(X=5) = 0.01
The probability distribution of X can be presented in
tabular form.
X        0     1     2     3     4     5
P(X=x)   0.18  0.39  0.24  0.14  0.04  0.01
Note that each of the probabilities is between 0 and 1,
and that the probabilities add to 1.
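The step from household counts to probabilities is a relative-frequency calculation. A minimal sketch, using the counts from Example 1 (the variable names are illustrative):

```python
# Household counts from Example 1: number of children -> number of households.
households = {0: 54, 1: 117, 2: 72, 3: 42, 4: 12, 5: 3}

total = sum(households.values())  # 300 households in the study

# P(X = x) is the relative frequency of households with x children.
p = {x: count / total for x, count in households.items()}

for x, prob in p.items():
    print(f"P(X={x}) = {prob:.2f}")

# The two requirements: each probability in [0, 1], and they add to 1.
assert all(0 <= prob <= 1 for prob in p.values())
assert abs(sum(p.values()) - 1) < 1e-9
```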
The probability distribution of X can also be
presented in terms of the following formula
$$p(x) = \begin{cases} 0.18 & \text{if } x = 0 \\ 0.39 & \text{if } x = 1 \\ 0.24 & \text{if } x = 2 \\ 0.14 & \text{if } x = 3 \\ 0.04 & \text{if } x = 4 \\ 0.01 & \text{if } x = 5 \end{cases}$$
It can also be presented in the form of a graph.
[Figure: graph of P(X=x) (vertical axis, 0 to 0.45) against X (horizontal axis, 0 to 5)]
Using a Probability Distribution
• A primary advantage of defining a random variable
and its probability distribution is that once the
probability distribution is known, it is relatively easy to
determine the probability of a variety of events that
may be of interest to a decision maker.
• We interpret the probabilities the same way we did
last week when we were looking at probability.
Consider Example 1:
P(X=4) = 0.04 implies that the probability that a
randomly selected household has 4 children is 0.04
• We can also apply the addition rule for mutually
exclusive events.
Consider Example 1:
The values of X are mutually exclusive; a household
can have 0, 1, 2, 3, 4 or 5 children.
The probability that a randomly selected household
has 3 or more children is
$P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5)$
$= 0.14 + 0.04 + 0.01$
$= 0.19$
Example 2
Using historical records, the personnel manager of a
plant has determined the probability distribution of X,
the number of employees absent per day. It is
X        0      1      2      3      4      5      6      7
P(X=x)   0.005  0.025  0.310  0.340  0.220  0.080  0.019  0.001
What is the probability that there are no absent
employees on any given day?
What is the probability that there are no more than 2
employees absent on any given day?
What is the probability that there are no absent
employees on any given day?
P(X=0) = 0.005
What is the probability that there are at most 2 absent
employees on any given day?
$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$
$= 0.005 + 0.025 + 0.310$
$= 0.34$
Expected Value and Variance
• In Topic 1 we calculated sample and population
means and variances for frequency distributions.
• A probability distribution is the distribution of a
population.
• We can calculate the population mean and variance
for probability distributions.
Expected Value
The mean, or expected value, of a discrete random
variable X is obtained by
• multiplying each possible value of X by its
associated probability
• and then summing the resulting products.
$\mu = E(X) = \sum_{i=1}^{n} x_i \, p(x_i)$
Example 1 revisited
X        0     1     2     3     4     5
P(X=x)   0.18  0.39  0.24  0.14  0.04  0.01
The expected number of children per household is
$\mu = E(X)$
$= 0(0.18) + 1(0.39) + 2(0.24) + 3(0.14) + 4(0.04) + 5(0.01)$
$= 1.5$
Variance
The variance of a discrete random variable X is found
by
• subtracting the mean from each value and
squaring this difference,
• multiplying each squared difference by the associated
probability,
• and then summing the resulting products.
$\sigma^2 = V(X) = \sum_{i=1}^{n} (x_i - \mu)^2 \, p(x_i)$
• A more computationally efficient method of
calculating the variance of a discrete random variable
is to use the following formula
$\sigma^2 = V(X) = \sum_{i=1}^{n} x_i^2 \, p(x_i) - \mu^2$
• This is just a rearrangement of the formula on the
previous slide.
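The rearrangement can be written out step by step. It uses only two facts already stated: the probabilities add to 1, and the mean is the probability-weighted sum of the values.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \mu)^2 \, p(x_i)
  &= \sum_{i=1}^{n} x_i^2 \, p(x_i)
     - 2\mu \sum_{i=1}^{n} x_i \, p(x_i)
     + \mu^2 \sum_{i=1}^{n} p(x_i) \\
  &= \sum_{i=1}^{n} x_i^2 \, p(x_i) - 2\mu \cdot \mu + \mu^2 \cdot 1 \\
  &= \sum_{i=1}^{n} x_i^2 \, p(x_i) - \mu^2
\end{align*}
```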
Example 1 revisited
X        0     1     2     3     4     5
P(X=x)   0.18  0.39  0.24  0.14  0.04  0.01
The variance of the number of children per household is
$\sigma^2 = V(X)$
$= 0^2(0.18) + 1^2(0.39) + 2^2(0.24) + 3^2(0.14) + 4^2(0.04) + 5^2(0.01) - 1.5^2$
$= 3.5 - 1.5^2$
$= 1.25$
Standard Deviation
• Following on from Topic 1, the standard deviation can
be found by taking the square root of the variance
Example 1 revisited
X        0     1     2     3     4     5
P(X=x)   0.18  0.39  0.24  0.14  0.04  0.01
The standard deviation of X, the number of children
per household, is
$\sigma = \sqrt{1.25} = 1.12$ (2 d.p.)
Example 2 revisited
X = the number of employees absent per day
X        0      1      2      3      4      5      6      7
P(X=x)   0.005  0.025  0.310  0.340  0.220  0.080  0.019  0.001
Determine the mean and standard deviation of the
number of employees absent per day.
The mean number of employees absent per day is
$\mu = 0(0.005) + 1(0.025) + 2(0.310) + 3(0.340) + 4(0.220) + 5(0.080) + 6(0.019) + 7(0.001)$
$= 3.066$
The variance of the number of employees absent per day is
$\sigma^2 = 0^2(0.005) + 1^2(0.025) + 2^2(0.310) + 3^2(0.340) + 4^2(0.220) + 5^2(0.080) + 6^2(0.019) + 7^2(0.001) - (3.066)^2$
$= 10.578 - (3.066)^2$
$= 1.178$ (3 d.p.)
The standard deviation is therefore
$\sigma = \sqrt{1.178} = 1.09$ (2 d.p.)
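With eight terms the hand arithmetic is easy to slip on, so it can help to confirm the sums exactly. The sketch below uses Python's `fractions.Fraction` to avoid rounding in the intermediate sums; the variable names are illustrative.

```python
from fractions import Fraction
import math

values = range(8)  # X = 0, 1, ..., 7 employees absent
# Probabilities entered as strings so that Fraction stores them exactly.
probs = [Fraction(s) for s in
         ["0.005", "0.025", "0.310", "0.340", "0.220", "0.080", "0.019", "0.001"]]

mean = sum(x * p for x, p in zip(values, probs))
mean_of_squares = sum(x ** 2 * p for x, p in zip(values, probs))
variance = mean_of_squares - mean ** 2
std_dev = math.sqrt(variance)

print(float(mean))                # 3.066
print(float(mean_of_squares))     # 10.578
print(round(float(variance), 3))  # 1.178
print(round(std_dev, 2))          # 1.09
```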
Example 3 (Exercise 5.19)
The owner of a small firm has just purchased a
personal computer, which she expects will serve her
for the next two years. The owner has just been told
that she must buy a surge suppressor to provide
protection for her new hardware against possible
surges or variations in the electrical current. Her son
David, a recent university graduate, advises that an
inexpensive suppressor could be purchased that
would provide protection against one surge only. He
notes that the amount of damage done without a
suppressor would depend on the extent of the surge.
David conservatively estimates that, over the next
two years, there is a 1% chance of incurring $400
damage and a 2% chance of incurring $200 damage.
But the probability of incurring $100 damage is 0.1.
1. How much should the owner be willing to pay for a
surge suppressor?
2. Determine the standard deviation of the possible
amounts of damage.
To answer these questions we need to construct the
probability distribution for the amount of damage
incurred.
Let X = the amount of damage incurred.
X        0     100   200   400
P(X=x)   0.87  0.10  0.02  0.01
1. To determine how much the owner should be willing
to pay for a surge suppressor we need to work out
the expected amount of damage to be incurred.
$E(X) = 0(0.87) + 100(0.10) + 200(0.02) + 400(0.01)$
$= 18$
The expected amount of damage to be incurred is
$18, therefore the owner should be willing to pay up
to $18.
2. To determine the standard deviation of the possible
amounts of damage we need to calculate the
variance and then take the square root of the
variance to obtain the standard deviation.
$V(X) = 0^2(0.87) + 100^2(0.10) + 200^2(0.02) + 400^2(0.01) - 18^2$
$= 3400 - 18^2$
$= 3076$
$\sigma = \sqrt{3076}$
$= 55.46$
Hence the standard deviation of the possible
amounts of damage is $55.46.
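Example 3 can be reproduced end to end as a short sketch. The helper `worth_buying` is hypothetical: it encodes the rule used above, that the owner should pay no more than the expected damage. The prices of $15 and $25 are made-up illustrations, not figures from the exercise.

```python
import math

# Damage amount in dollars -> probability, as tabulated in Example 3.
damage = {0: 0.87, 100: 0.10, 200: 0.02, 400: 0.01}

expected_damage = sum(x * p for x, p in damage.items())
variance = sum(x ** 2 * p for x, p in damage.items()) - expected_damage ** 2
std_dev = math.sqrt(variance)

def worth_buying(price):
    """Hypothetical rule: buy only if the price does not exceed expected damage."""
    return price <= expected_damage

print(round(expected_damage, 2))  # 18.0
print(round(std_dev, 2))          # 55.46
print(worth_buying(15), worth_buying(25))  # True False
```

The standard deviation is about three times the mean, which reflects how rarely any damage occurs and how large it is when it does.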
Reading for next lecture
• Chapter 5 Section 5.4
Exercises
• 5.1
• 5.5
• 5.11
• 5.22 a and b only