Notes 12 - Wharton Statistics Department

Statistics 510: Notes 12
Reading: Sections 4.8-4.9
Schedule:
I will e-mail review problems for midterm by Thursday
night.
Friday, 10/21, 5 pm: Homework 5 due. Office hours by
appointment.
Monday, 10/24, class: Chapters 5.1-5.2
Monday, 10/24, evening (time, location TBA): question
and answer session for midterm
Tuesday, 10/25: Office hours: 1-2, 4:45-6:45, by
appointment.
Wednesday, 10/26: Midterm.
Midterm info:
Covers lectures 1-12 on Chapters 1-4.
The best way to prepare is to review the homework problems
and class notes.
The exam is closed book but you are allowed two 8.5 x 11
sheets of notes, front and back. Bring a calculator.
I. Review: Poisson Distribution
Arises in two settings:
(1) Poisson distribution provides an approximation to the
binomial distribution when n is large, p is small and
  np is moderate.
(2) Poisson distribution is used to model the number of
events that occur in a time period t when
(a) the probability of an event occurring in a given small
time period t' is approximately proportional to t'
(b) the probability of two or more events occurring in a
given small time period t' is much smaller than t'
(c) the numbers of events occurring in two non-overlapping
time periods are independent
When (a), (b) and (c) are satisfied, the number of events
occurring in a time period t has a Poisson(λt) distribution.
The parameter λ is called the rate of the Poisson
distribution. The mean number of events that occur is
λt and the variance of the number of events is also λt.
Sketch of proof for Poisson distribution under (a)-(c):
For a large value of n, we can divide the time period t into
n non-overlapping intervals of length t/n. The number of
events occurring in time period t is then approximately
Binomial(n, λt/n). Using the Poisson approximation to
the binomial, the number of events occurring in time period
t is approximately Poisson(n · λt/n) = Poisson(λt).
Taking the limit as n → ∞ yields the result.
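The convergence in the sketch can be checked numerically. A minimal sketch, with assumed illustrative values λ = 2 and t = 3 (not from the notes): the largest pointwise gap between the Binomial(n, λt/n) and Poisson(λt) pmfs shrinks as n grows.

```python
from math import comb, exp, factorial

# Assumed illustrative values (not from the notes): rate lam = 2, period t = 3.
lam, t = 2.0, 3.0
mu = lam * t  # mean of the limiting Poisson distribution

def binom_pmf(n, p, k):
    # P{Binomial(n, p) = k}
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(mu, k):
    # P{Poisson(mu) = k}
    return exp(-mu) * mu**k / factorial(k)

# As n grows, the Binomial(n, lam*t/n) pmf approaches the Poisson(lam*t) pmf.
gaps = {}
for n in (10, 100, 10000):
    p = mu / n
    gaps[n] = max(abs(binom_pmf(n, p, k) - poisson_pmf(mu, k)) for k in range(11))
print(gaps)
```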
Number of events occurring in space: The Poisson
distribution also applies to the number of events occurring
in space. Instead of intervals of length t, we have domains
of area or volume t. Assumptions (a)-(c) become:
(a’) the probability of an event occurring in a given small
region of area or volume t is approximately proportional to t
(b’) the probability of two or more events occurring in a
given small region of area or volume t is much smaller than t
(c’) the numbers of events occurring in two non-overlapping
regions are independent
The parameter λ for a Poisson distribution for the number
of events occurring in space is called the intensity.
Example 1: Bacteria are distributed throughout a volume of
liquid according to assumptions (a’), (b’) and (c’) with an
intensity of λ = 0.6 organisms per mm³. A measuring
device counts the number of bacteria in a 10 mm³ volume
of the liquid. What is the probability that more than two
bacteria are in this measured volume?
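Example 1 can be worked directly from the stated assumptions: the count in the 10 mm³ sample is Poisson with mean λt = 0.6 × 10 = 6, so P{X > 2} = 1 − e⁻⁶(1 + 6 + 6²/2). A quick numerical check:

```python
from math import exp, factorial

# Example 1: mean number of bacteria in the 10 mm^3 sample
mu = 0.6 * 10  # intensity * volume = 6

def poisson_pmf(mu, k):
    # P{Poisson(mu) = k}
    return exp(-mu) * mu**k / factorial(k)

# P{X > 2} = 1 - P{X = 0} - P{X = 1} - P{X = 2}
p_more_than_two = 1 - sum(poisson_pmf(mu, k) for k in range(3))
print(round(p_more_than_two, 4))  # 0.938
```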
II. Geometric Random Variable (Section 4.8.1)
Suppose that independent trials, each having a probability
p, 0 < p < 1, of being a success, are performed until a
success occurs. Let X be the random variable that denotes
the number of trials required. The probability mass
function of X is
P{X = n} = (1 − p)^(n−1) p,  n = 1, 2, …   (1.1)
The pmf follows because in order for X to equal n, it is
necessary and sufficient that the first n-1 trials are failures
and the nth trial is a success.
A random variable that has the pmf (1.1) is called a
geometric random variable with parameter p.
The expected value and variance of a geometric(p) random
variable are
E(X) = 1/p,  Var(X) = (1 − p)/p².
Example 2: A fair die is tossed. What is the probability
that the first six occurs on the fourth roll? What is the
expected number of tosses needed to toss the first six?
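A worked check of Example 2: the number of tosses X until the first six is geometric with p = 1/6, so P{X = 4} = (5/6)³(1/6) and E(X) = 1/p = 6.

```python
# Example 2: number of tosses until the first six is Geometric(p = 1/6)
p = 1 / 6
p_fourth = (1 - p) ** 3 * p  # P{X = 4}: three failures, then a success
expected_tosses = 1 / p      # E(X) = 1/p
print(round(p_fourth, 4), round(expected_tosses, 1))  # 0.0965 6.0
```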
III. Negative Binomial Distribution (Section 4.8.2)
Suppose that independent trials, each having a probability
p, 0 < p < 1, of being a success, are performed until r
successes occur. Let X be the random variable that denotes
the number of trials required. The probability mass
function of X is
P{X = n} = C(n − 1, r − 1) p^r (1 − p)^(n−r),  n = r, r + 1, …   (1.2)
where C(n − 1, r − 1) denotes the binomial coefficient
"n − 1 choose r − 1".
A random variable whose pmf is given by (1.2) is called a
negative binomial random variable with parameters (r, p).
Note that the geometric random variable is a negative
binomial random variable with parameters (1, p) .
The expected value and variance of a negative binomial
random variable are
E(X) = r/p,  Var(X) = r(1 − p)/p².
The pmf follows because in order for X to equal n, it is
necessary and sufficient that the first n − 1 trials contain
exactly r − 1 successes (in any order) and the nth trial is a
success; the binomial coefficient counts the arrangements of
those r − 1 successes among the first n − 1 trials.
Example 3: Suppose that an underground military
installation is fortified to the extent that it can withstand up
to four direct hits from air-to-surface missiles and still
function. Enemy aircraft can score direct hits with these
particular missiles with probability 0.7. Assume all firings
are independent. What is the probability that a plane will
require fewer than 8 shots to destroy the installation? What
is the expected number of shots required to destroy the
installation?
IV. Hypergeometric Random Variables (Section 4.8.3)
Suppose that a sample of size n is to be chosen randomly
(without replacement) from an urn containing N balls, of
which m are white and N  m are black. If we let X be the
random variable that denotes the number of white balls
selected, then
P{X = i} = C(m, i) C(N − m, n − i) / C(N, n),  i = 0, 1, …, n   (1.4)
A random variable X whose pmf is given by (1.4) is said to
be a hypergeometric random variable with parameters
n, N , m .
The expected value and variance of a hypergeometric
random variable with parameters n, N, m are
E(X) = nm/N,  Var(X) = np(1 − p)[1 − (n − 1)/(N − 1)], where p = m/N.
Example 4: A Scrabble set consists of 54 consonants and
44 vowels. What is the probability that your initial draw
(of seven letters) will be all consonants? six consonants
and one vowel? five consonants and two vowels?
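Example 4 amounts to three hypergeometric probabilities with n = 7, N = 98, m = 54; a direct computation:

```python
from math import comb

# Example 4: the number of consonants among 7 tiles drawn from
# 54 consonants + 44 vowels is Hypergeometric(n = 7, N = 98, m = 54).
N, m, n = 98, 54, 7

def hyper_pmf(i):
    # P{X = i} = C(m, i) C(N-m, n-i) / C(N, n)
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)

for i in (7, 6, 5):  # all consonants; six consonants; five consonants
    print(i, round(hyper_pmf(i), 4))
```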
V. Zeta (or Zipf) distribution
A random variable is said to have a zeta (sometimes called
the Zipf) distribution with parameter α if its probability
mass function is given by
P{X = k} = C / k^(α+1),  k = 1, 2, …
for some value of α > 0.
Since the sum of the foregoing probabilities must equal 1, it
follows that
C = [ Σ from k = 1 to ∞ of (1/k)^(α+1) ]^(−1)
Consider a population of objects that are grouped into
categories, such as all the words in a book (grouped into
distinct words) or the people living in urban areas in a
country (grouped into cities). Let X = k denote the event
that a randomly chosen object belongs to the kth largest
group. The Zipf distribution has been found to accurately
describe P{X = k} in settings such as the words in a book
and the cities people live in.
Rank   City                  Population (1990)   Expected population under
                                                 Zipf's distribution with α = 0
 1     New York              7,322,564           10,000,000
 7     Detroit               1,027,974            1,428,571
13     Baltimore               736,014              769,231
19     Washington, D.C.        606,900              526,316
25     New Orleans             496,938              400,000
31     Kansas City, Mo.        434,829              322,581
37     Virginia Beach, Va.     393,089              270,270
49     Toledo                  332,943              204,082
61     Arlington, Texas        261,721              163,934
73     Baton Rouge, La.        219,531              136,986
85     Hialeah, Fla.           188,008              117,647
97     Bakersfield, Calif.     174,820              103,093
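The table's last column follows from the model: with α = 0, P{X = k} is proportional to 1/k, so the kth largest city's expected population is proportional to 1/k, scaled so that rank 1 matches 10,000,000. A quick reproduction:

```python
# Reproduce the "expected population" column: proportional to 1/rank,
# scaled so that rank 1 equals 10,000,000 (alpha = 0).
top = 10_000_000
ranks = (1, 7, 13, 19, 25, 31, 37, 49, 61, 73, 85, 97)
expected = {k: round(top / k) for k in ranks}
print(expected[7], expected[31], expected[97])  # 1428571 322581 103093
```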
VI. Properties of the Cumulative Distribution Function
(Section 4.9)
Recall that the cumulative distribution function (CDF) of a
random variable X is the function F(b) = P(X ≤ b).
All probability questions about X can be answered in terms
of the cdf F. For example,
P(a < X ≤ b) = F(b) − F(a) for all a < b.
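As a small check of this identity, using an assumed Poisson(2) example (not from the notes):

```python
from math import exp, factorial

# Assumed example: X ~ Poisson(2); verify P(a < X <= b) = F(b) - F(a).
mu = 2.0

def pmf(k):
    # P{X = k} for X ~ Poisson(mu)
    return exp(-mu) * mu**k / factorial(k)

def F(b):
    # CDF at integer b: F(b) = P(X <= b)
    return sum(pmf(k) for k in range(b + 1))

a, b = 1, 4
direct = sum(pmf(k) for k in range(a + 1, b + 1))  # P(a < X <= b) by summation
print(abs(direct - (F(b) - F(a))) < 1e-12)  # True
```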