Probability distributions:
part 2
BSAD 30
Dave Novak
Source: Anderson et al., 2015
Quantitative Methods for Business 12th
edition – some slides are directly from
J. Loucks © 2013 Cengage Learning
Last class

Random Variables
Discrete
 Continuous


Discrete Probability Distributions
Uniform Probability Distribution
 Binomial Probability Distribution
 Poisson Probability Distribution

2
Overview

Continuous Probability Distributions
Uniform Probability Distribution
 Normal Probability Distribution
 Exponential Probability Distribution


Link to examples of types of continuous
distributions
• http://www.epixanalytics.com/modelassist/AtRisk/Model_Assist.htm
3
Overview

We will briefly look at three commonly
observed continuous probability distributions
Uniform
 Normal
 Exponential


4
In real-world applications, it is fairly common
to find instances of random variables that
follow a continuous uniform, Normal, or
Exponential probability distribution
Overview
Uniform
 Normal
 Exponential

[Figure: sketches of the uniform, normal, and exponential density curves, each plotting f(x) against x]
Review

A random variable (RV) is a numerical
description of the outcome of an experiment

If an RV can take on ANY value within a
range, it is continuous (measured)
• Any value between 0 and 100 (e.g., 95.67, 23.541,
etc.)

If an RV can take on only specific, separated values
within a range, it is discrete (counted)
• Must be an even integer between 0 and 100 (e.g.,
4, 28, 92, etc.)
6
Review
Just like RVs, probability distributions are
classified as discrete or continuous
 Probability distributions: graphical, tabular,
and/or mathematical representations that
show the relationship between the possible
outcomes of a statistical experiment, and
the probability that each of those outcomes
will occur

7
Probability distributions

Probability distributions are typically defined
in terms of the probability density function
(pdf)
For a discrete RV, the pdf (more precisely, the
probability mass function) gives the probability
that the RV takes on a particular value, x
 For a continuous RV, the area under the pdf
between two values gives the probability that
the RV falls between those two values

8
Probability distributions

pdf for discrete RV
P(x): the probability that the RV is equal to the value x
pdf for continuous RV
P(a ≤ x ≤ b): the probability that x lies between a
(lower bound) and b (upper bound)
 The probability that a continuous RV is
exactly equal to a particular value is zero!
 Why?

9
Continuous probability
distributions

A continuous RV can assume any value in
an interval

It is not possible to talk about the probability
of a continuous RV assuming a specific
value
• Given the range 0 ≤ x ≤ 1, what is the probability
of x = 0.254?
10
Continuous probability
distributions

Instead, we talk about the probability of the
random variable assuming a value within a
given interval or range
• Given the range 0 ≤ x ≤ 1, what is the probability
that 0.5 ≤ x ≤ 0.75?
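The slides do not say how x is distributed on 0 ≤ x ≤ 1. As a minimal sketch, assume x is uniform on that range (an assumption made only for illustration; the function name is hypothetical). The probability of an interval is then its length, and it shrinks toward zero as the interval narrows around x = 0.254.

```python
def interval_probability(lo, hi):
    """P(lo <= x <= hi), ASSUMING x is uniform on [0, 1]."""
    # Clip the interval to the range of x, then return its length.
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    return max(hi - lo, 0.0)


# The question from the slide: P(0.5 <= x <= 0.75)
print(interval_probability(0.5, 0.75))

# Narrow the interval around 0.254: the probability falls with the width,
# and a single point (width 0) has probability 0.
for half_width in (0.1, 0.01, 0.001, 0.0):
    p = interval_probability(0.254 - half_width, 0.254 + half_width)
    print(f"half-width {half_width}: probability {p:.4f}")
```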
11
Probability distributions
In both the discrete and continuous case,
the cumulative distribution function (cdf)
gives us the probability that x is less than or
equal to a particular value
 pdf and cdf provide a different visual
representation of the same variable, x

10
Discrete probability
distributions

Example of discrete uniform pdf for 6-sided
die – mathematical representation
Set of possible values X = {1, 2, 3, 4, 5, 6}
xϵX
 Probability of value x: P(x)

11
x      1    2    3    4    5    6
P(x)  1/6  1/6  1/6  1/6  1/6  1/6
Discrete probability
distributions

12
Example of discrete uniform pdf for 6-sided
die – graphical representation
Discrete probability
distributions

13
Example of discrete uniform cdf for 6-sided
die
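The pdf and cdf for a fair 6-sided die can be tabulated in a few lines. This is a sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]

# pdf: each face is equally likely.
pmf = {x: Fraction(1, 6) for x in faces}

# cdf: running total of the pdf, P(X <= x).
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

for x in faces:
    print(f"x = {x}   P(x) = {pmf[x]}   P(X <= x) = {cdf[x]}")
```

The cdf climbs in equal steps of 1/6 and reaches 1 at x = 6.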
Discrete probability
distributions

Example of discrete pdf for two 6-sided dice
– mathematical representation
14
Source: https://www.me.utexas.edu/~jensen/ORMM/computation/unit/rvadd/discrete_dist/dist.img/disc_example.gif
Discrete probability
distributions

Example of discrete pdf for two 6-sided dice
– graphical representation
15
Source: http://wiki.ubc.ca/images/thumb/2/21/MATH105DiceDistPDF.png/300px-MATH105DiceDistPDF.png
Discrete probability
distributions

Example of cdf for two 6-sided dice
16
Source: http://stevestedman.com/wp-content/uploads/analytics_CUME_DIST_dice1.png
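The two-dice pdf and cdf shown in the figures can be rebuilt by listing all 36 equally likely outcomes and counting how often each total occurs. This is a sketch that assumes two fair, independent dice.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Count how many of the 36 outcomes produce each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
totals = sorted(counts)

pmf = {s: Fraction(counts[s], 36) for s in totals}
cdf = dict(zip(totals, accumulate(pmf[s] for s in totals)))

for s in totals:
    print(f"total = {s:2d}   P(total) = {pmf[s]}   P(total <= {s}) = {cdf[s]}")
```

Unlike the single die, this distribution is not uniform: a total of 7 is the most likely outcome.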
Continuous probability
distributions

17
Example of normal pdf
Continuous probability
distributions

18
Example of normal cdf
Continuous probability
distributions

Examples of continuous random variables
include the following:
The number of ounces of soup contained in
a can labeled “8 oz.”
 The flight time of an airplane traveling from
Chicago to New York
 The drilling depth required to reach oil in an
offshore drilling operation

21
Continuous probability
distributions

The probability of the random variable
assuming a value within a given interval
from a to b is defined to be the area under
the graph of the pdf between a and b
[Figure: uniform, normal, and exponential pdfs, f(x) against x, each with the interval from a to b marked]
Continuous Uniform
probability distributions

A continuous RV is uniformly distributed
when the probability that the variable will
assume a value in any interval of equal
length is the same for each interval

The uniform probability density function is
f(x) = 1/(b – a)   for a ≤ x ≤ b
f(x) = 0           elsewhere
where: a = smallest value the variable can assume
       b = largest value the variable can assume
23
Flight time example



Let x denote the flight time of an airplane traveling from
Chicago to New York
Assume that the minimum flight time is 2 hours and
that the maximum flight time is 2 hours 20 minutes
Assume that flight data are available to conclude that
the probability of a flight time between 120 and 121
minutes is the same as the probability of a flight time
within any other 1-minute interval up to and including
140 minutes

24
The probability of a flight time between 2 hours 2 minutes
and 2 hours 3 minutes is the same as the probability of a
flight time between 2 hours 10 minutes and 2 hours 11 minutes
Flight time example
Uniform pdf (mathematical representation)
f(x) = 1/20   for 120 ≤ x ≤ 140
f(x) = 0      elsewhere
where: x = flight time in minutes
Uniform pdf (graphical representation)
We are subdividing this area
into 20 time intervals of 1
minute each
The probability that the flight
time falls within an interval is the
same for all 20 of those intervals
Flight time example
Question: What is the probability that a flight will take
between 135 and 140 minutes?
P(135 < x < 140) = (1/20)(5) = .25
[Figure: uniform pdf of height 1/20 over Flight Time (mins.) from 120 to 140, with the interval from 135 to 140 marked]
26
Flight time example
What is the probability that a flight will take
between 121 and 128 minutes?
P(121 < x < 128) = (1/20)(7) = .35
[Figure: uniform pdf of height 1/20 over Flight Time (mins.) from 120 to 140, with the interval from 121 to 128 marked]
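Both flight-time answers are the area of a rectangle: the interval width times the height 1/(b – a). A minimal sketch follows; the function name is hypothetical, and the limits a = 120 and b = 140 come from the example.

```python
def uniform_probability(lo, hi, a, b):
    """P(lo < x < hi) for x uniformly distributed between a and b."""
    if not a < b:
        raise ValueError("a must be smaller than b")
    # Only the part of the interval inside [a, b] carries probability.
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)


# Flight time is uniform between 120 and 140 minutes.
print(uniform_probability(135, 140, a=120, b=140))  # 5 of the 20 minutes
print(uniform_probability(121, 128, a=120, b=140))  # 7 of the 20 minutes
```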
27
Normal probability
distributions
The normal probability distribution is the
most important distribution for describing a
continuous RV
 It is widely used in statistical inference as
the assumption of normality underlies many
standard statistical tests

28
Normal probability
distributions

What does the assumption of normality
mean in practice?
Many statistical tests employ the assumption
of normality
 Deviations from normally distributed data will
likely render those tests inaccurate
 Tests that rely on the assumption of
normality are called PARAMETRIC tests
 Parametric tests tend to be very powerful
and accurate in testing variability in data

29
Normal probability
distributions

What does this mean in practice?


You can TEST the normality assumption

30
You SHOULD NOT use statistical tests that
assume a normal distribution if the data you
are analyzing do not follow a normal
distribution (at least approximately)
If data are not assumed to be normally
distributed, you will likely need to use
NONPARAMETRIC tests that make no
distributional assumptions
Parametric vs
nonparametric

Describe two broad classifications of
statistical procedures


31
A very well known definition of
nonparametric begins “A precise and
universally acceptable definition of the term
‘nonparametric’ is presently not available”
(Handbook of Nonparametric Statistics,
1962, p. 2)
Thanks! That’s not at all helpful…
Parametric vs
nonparametric
In general, nonparametric procedures do
NOT rely on the shape of the probability
distribution from which they were drawn
 Parametric procedures do rely on
assumptions about the shape of the
probability distribution

It is assumed to be a normal distribution
 All parameter estimates (mean, standard
deviation) assume the data come from an
underlying normally distributed population

32
Parametric vs
nonparametric
Analysis | Parametric | Nonparametric
1) Compare means between two distinct/independent groups | Two-sample t-test | Wilcoxon rank-sum test
2) Compare two quantitative measurements taken from the same individual | Paired t-test | Wilcoxon signed-rank test
3) Compare means between three or more distinct/independent groups | Analysis of variance (ANOVA) | Kruskal-Wallis test
4) Estimate the degree of association between two quantitative variables | Pearson coefficient of correlation | Spearman’s rank correlation
Source: Hoskin (not dated) “Parametric and Nonparametric: Demystifying the Terms”
33
Normal probability
distributions

Why should you care?
You want to know which set of tests
(parametric vs. nonparametric) is
appropriate for the data you have
 Use of an inappropriate statistical test
yields inaccurate or meaningless results

34
Normal probability
distributions

Why should you care?

35
It’s not a matter of being “a little wrong” –
you either use an appropriate statistical test
correctly and have something meaningful to
say about the data OR you use an
inappropriate statistical test (or use it
incorrectly), and have nothing that can
accurately be said about the data
Normal probability
distributions

The normal distribution is used in a wide
range of “real world” applications
Height of people
 Test scores
 Amount of rainfall
 Scientific tests

36
Normal probability
distributions

The normal PDF
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$
where:
µ = mean
σ = standard deviation
π = 3.14159
e = 2.71828
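The normal density can be evaluated directly from its formula. The sketch below writes it out term by term and compares it with Python's built-in `statistics.NormalDist`. The mean of 15 and standard deviation of 6 are illustrative values only.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


mu, sigma = 15.0, 6.0  # illustrative values, not taken from this slide
for x in (3.0, 15.0, 20.0):
    by_formula = normal_pdf(x, mu, sigma)
    by_library = NormalDist(mu, sigma).pdf(x)
    print(f"x = {x:5.1f}   formula = {by_formula:.6f}   library = {by_library:.6f}")
```

The value returned is the height of the curve, not a probability. Probabilities come from areas under the curve.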
37
Normal probability
distributions

Characteristics of normal PDF

38
Symmetric and is bell-shaped
Normal probability
distributions

Characteristics of normal PDF
Family of normal distributions defined by
mean, µ, and standard deviation, σ
 Highest point is at the mean, which is also
the median and mode
[Figure: normal curve over x with the mean µ marked at the peak]
Measures of location
Summarize sample data using a single value
 Mean
 Median
 Mode
Symmetric distributions
 Mean = median = mode
39
Measures of location
Relationship between mean, median, and
mode provides valuable information about the
probability distribution
 Most appropriate measure of location
depends on the data and the intended use of
the summary information
 Choosing a measure of location that is
most favorable to one’s point of view is a very
common way to mislead people with statistics

39
Measures of location

Housing prices
Median or mean?
 A few highly priced homes will increase the
mean, but will have little effect on the median
 Skewed right (heavy right tail)

• Mean > median

Skewed left (heavy left tail)
• Mean < median
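A small numeric sketch of the housing-price point. The prices below are hypothetical figures (in thousands of dollars) made up for illustration; they are not data from the slides.

```python
from statistics import mean, median

# Hypothetical home prices in thousands of dollars (illustrative only).
prices = [210, 225, 240, 250, 265, 280, 300]
prices_with_mansion = prices + [2500]  # add one very expensive home

for label, data in (("without mansion", prices),
                    ("with mansion", prices_with_mansion)):
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data):.2f}")
```

The single expensive home pulls the mean far to the right while the median barely moves, which is the mean > median pattern of a right-skewed distribution.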
39
Normal probability
distributions

Characteristics of normal PDF

Mean can be any numerical value including
negative, positive, or zero
[Figure: normal curves centered at means of -10, 0, and 20 on the x axis]
Normal probability
distributions

Characteristics of normal PDF

44
Standard deviation determines the width of
the curve: larger σ results in wider, flatter
curves
Normal probability
distributions

Characteristics of normal PDF
Approximately 68% of all values of a
normally distributed RV are within (+/-) 1σ
of the mean
 Approximately 95.4% of all values of a
normally distributed RV are within (+/-) 2σ
of the mean
 Approximately 99.7% of all values of a
normally distributed RV are within (+/-) 3σ
of the mean
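These percentages can be checked from the standard normal cdf. This is a sketch using Python's `statistics.NormalDist`; the helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k):
    """Share of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {share_within(k):.4f}")
```

Because every normal distribution has the same shape once measured in standard deviations, the same shares hold for any mean and standard deviation.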

42
Normal probability
distributions

46
Characteristics of normal PDF
Normal probability
distributions

Characteristics of normal PDF
Probabilities for the normal random variable
are given by areas under the curve
 The total area under the curve is 1 (.5 to the
left of the mean and .5 to the right)

[Figure: normal curve over x with an area of .5 on each side of the mean]
Normal probability
distributions

Percentile ranking
If a student scores 1 standard deviation
above the mean on a test, then the student
performed better than 84% of the class (0.5
+ 0.34 = 0.84)
 If a student scores 2 standard deviations
above the mean on a test, then the student
performed better than 98% of the class (0.5
+ 0.477 = 0.977)

45
Normal probability
distributions
An RV with a normal distribution with mean,
µ, = 0, and standard deviation, σ, = 1
follows a standard normal distribution
 The letter z is used to refer to a variable that
follows the standard normal distribution

$z = \frac{x - \mu}{\sigma}$

We can think of z as a measure of the number of
standard deviations a given variable, x, is from the mean, µ
46
Standard normal
distribution

No naturally measured variable has this
distribution, so why do we care about it?
ALL normal distributions are equivalent
to this distribution when the unit of
measurement is changed to measure
standard deviations from the mean
 It’s important because ALL normal
distributions can be “converted” to standard
normal, and then we can use the standard
normal table to find needed information

47
Normal probability
distributions
51
Auto parts store example
Pep Zone sells auto parts and supplies, including a
popular multi-grade motor oil. When the on-hand
inventory of oil drops to 20 gallons, a replenishment order
is placed.
The manager is concerned that sales are being lost due
to stockouts (running out of a product) while waiting for
the replenishment order to be filled. It’s estimated that
customer demand during replenishment lead-time (the
time between when the order is placed and the order
arrives) is normally distributed with a mean of 15 gallons
and a standard deviation of 6 gallons.
What is the probability of a stockout, P(x > 20)?
48
Auto parts store example
stockout
Convert x = 20 to a standard normal value:
z = (x – µ)/σ = (20 – 15)/6 = .83
Use the probability table for SND
Area between the mean (z = 0) and z = .83 is 0.2967
Area to the left of the mean is 0.5
So, P(x > 20) = P(z > .83) = 1 – (0.5 + 0.2967) = 1 – 0.7967 = .2033
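The stockout probability can also be computed without a table. This is a sketch using Python's `statistics.NormalDist` with the mean (15 gallons), standard deviation (6 gallons), and reorder point (20 gallons) from the example; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 15.0, 6.0   # lead-time demand in gallons (from the example)
reorder_point = 20.0    # current reorder point in gallons

# Number of standard deviations the reorder point sits above the mean.
z = (reorder_point - mu) / sigma

# Stockout happens when demand exceeds the reorder point: P(x > 20).
p_exact = 1.0 - NormalDist().cdf(z)

# Table lookups round z to two decimals (0.83), which shifts the answer slightly.
p_table = 1.0 - NormalDist().cdf(round(z, 2))

print(f"z = {z:.4f}")
print(f"P(stockout), unrounded z: {p_exact:.4f}")
print(f"P(stockout), z rounded to 0.83: {p_table:.4f}")
```

The small gap between the two results comes only from rounding z before the lookup. Either way, the stockout probability is about 20%.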
52
Auto parts store example
reorder point
If the manager wants the probability of a stockout to be
no more than 0.05 (5%), what is the appropriate reorder
point? The manager wants to minimize the risk of
stocking out – which is currently 20%
If the manager sets the stockout probability threshold at
5%, what is the new reorder point? The existing reorder
point is 20 gallons, so what should our ideal reorder
point be if we want to reduce the probability of a stockout
from 20% to 5%?
53
Auto parts store example
reorder point
Area = .5 to the left of the mean, area = .4500 between the mean and z, area = .05 in the upper tail
From the standard normal table, the z value with an upper-tail area of .05 is z = 1.645
Converting back to gallons: x = µ + zσ = 15 + 1.645(6) = 24.87
By increasing the reorder point from 20
gallons to 25 gallons, the probability of a
stockout can be decreased from about .20
to .05 (20% to less than 5%)
 This is a significant decrease in the
probability that the store will be out of stock
and unable to meet customer demand
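The reorder point can be found by running the calculation in reverse: start from the allowed stockout probability, find the matching z value, and convert back to gallons. This is a sketch using `statistics.NormalDist.inv_cdf`; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 15.0, 6.0            # lead-time demand in gallons (from the example)
max_stockout_probability = 0.05  # manager's target

# z value that leaves 5% of the area in the upper tail.
z = NormalDist().inv_cdf(1.0 - max_stockout_probability)

# Convert z back to gallons: x = mu + z * sigma.
reorder_point = mu + z * sigma

print(f"z = {z:.3f}")
print(f"reorder point = {reorder_point:.2f} gallons")

# Check the rounded-up reorder point of 25 gallons.
p_at_25 = 1.0 - NormalDist(mu, sigma).cdf(25.0)
print(f"P(stockout) with a 25-gallon reorder point = {p_at_25:.4f}")
```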

58
Auto parts store example
reorder point

An obvious related question would be, what
have stockouts cost the store to date?
How many direct sales $ has the store lost
due to stockouts?
 How many indirect sales $ has the store lost
due to stockouts? Not just lost sales
because the product the customer wants to
purchase is not in stock, but how many of
those customers would have also made
other purchases or never come back at all?

59
Exponential probability
distributions
The exponential probability distribution is
also an important distribution for describing
a continuous RV
 It is useful in describing the time it takes to
complete a task, how much time elapses
before an event occurs, distance between
events, etc.:
• Time between arrivals at a check out
• Time between arrivals at a toll booth
• Time required to complete a questionnaire
• Distance between potholes in a roadway
Similarity to Poisson
distribution

The Poisson distribution provides an
appropriate description of the number of
occurrences per interval
• Discrete and CAN BE COUNTED

The exponential distribution provides an
appropriate description of the length of the
interval (time, distance, etc.) between
occurrences
• Continuous and MUST BE MEASURED
Exponential probability
distributions

Exponential density function
$f(x) = \frac{1}{\mu} e^{-x/\mu}$ for x ≥ 0, µ > 0
where:
µ = mean
e = 2.71828
62
Exponential probability
distributions

Cumulative distribution function
$P(x \le x_0) = 1 - e^{-x_0/\mu}$
where:
x0 = some specific value of x
63
Fueling example
The time between arrivals of cars at Al’s full-service
gas pump follows an exponential
probability distribution with a mean time between
arrivals of 3 minutes
Al would like to know the probability that the time
between any two successive arrivals will be 2
minutes or less
64
Fueling example
P(x ≤ 2) = 1 – 2.71828^(–2/3) = 1 – .5134 = .4866
[Figure: exponential pdf f(x) against Time Between Successive Arrivals (mins.), with the area for x ≤ 2 marked]
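The fueling calculation is a direct use of the exponential cumulative probability, 1 – e^(–x0/µ). A minimal sketch follows; the function name is hypothetical, and the mean of 3 minutes and cutoff of 2 minutes come from the example.

```python
import math


def exponential_cdf(x0, mean):
    """P(x <= x0) for an exponential distribution with the given mean."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    if x0 < 0:
        return 0.0  # waiting times and distances cannot be negative
    return 1.0 - math.exp(-x0 / mean)


# Mean time between arrivals is 3 minutes; probability the gap is 2 minutes or less.
p = exponential_cdf(2.0, mean=3.0)
print(f"P(x <= 2) = {p:.4f}")
```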
66
Summary

Examples of continuous probability
distributions
Uniform
 Normal
 Exponential

71