Math 52 — Notes on Probability

(These are sketchy notes to accompany our discussion in lecture and section, and the
text pages 361-2. Please let us know of any typos.)
1. Random Variables and PDFs in R¹.
We say a function p(x) is a PDF (probability density/distribution function) if p(x) ≥ 0
for all x, and ∫_R p(x) dx = 1.
Consider a random occurrence with an outcome we call X, some real number. We say
“X is a random variable with distribution p(x)” if for every a, b the probability that X lies
between a and b is
Prob(a ≤ X ≤ b) = ∫_a^b p(x) dx.
The interval [a, b] ⊂ R is also called an event. Notice that the only events we measure are
sets (usually ranges) of values of X, not individual values of X.
[Think of the x-axis as the set of possible values of the random variable X, and the
graph of p(x) as the continuous analogue of a histogram. This falls in line with p(x) having
an interpretation as a probability density, as the density principle would then dictate that
the probability that X falls in a small range of width ∆x near X = x0 is approximately
p(x0 )∆x; furthermore, probability is an aggregating quantity (like mass or area), so we may
add together disjoint measurements and approximate the total probability via a Riemann
sum. In the limit as ∆x → 0, the Riemann sum approaches the above integral.]
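As a quick numerical illustration (not part of the original notes), the Python sketch below assumes the exponential-type PDF from Example 1b with λ = 1, approximates Prob(a ≤ X ≤ b) by a midpoint Riemann sum, and checks that p(x₀)∆x is already a good approximation over a narrow interval.

    import math

    # Sample PDF assumed for illustration: exponential-type with lambda = 1,
    # i.e. p(x) = e^(-x) for x >= 0 and p(x) = 0 for x < 0.
    def p(x):
        return math.exp(-x) if x >= 0 else 0.0

    # Midpoint Riemann sum approximating Prob(a <= X <= b) = integral of p from a to b.
    def prob(a, b, n=10000):
        dx = (b - a) / n
        return sum(p(a + (i + 0.5) * dx) * dx for i in range(n))

    print(prob(1.0, 2.0))                      # close to the exact value e^(-1) - e^(-2)

    # Over a narrow interval of width dx around x0, p(x0) * dx is already close.
    x0, dx = 1.0, 0.01
    print(prob(x0 - dx / 2, x0 + dx / 2), p(x0) * dx)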
Expectation. The mean value or expected value of X, written E(X), is defined to be
the first moment of x with respect to p(x):
E(X) = ∫_R x p(x) dx.
(Note this corresponds to a weighted average of x, since E(X) = (∫_R x p(x) dx)/(∫_R p(x) dx), and the denominator equals 1.)
In general, the expected value of some function f of X is the first moment of f(x), i.e.
E(f(X)) = ∫_R f(x) p(x) dx.
(For example, if X is the random variable giving the noontime temperature, in degrees
Fahrenheit, on a January day in Palo Alto, then E(X) is the expected value of this temperature, and E((5/9)(X − 32)) is the expected value of this temperature measured in degrees
Celsius.)
Class exercise. Show that expectation is linear, i.e. that for constants c and d,
E(cf(X) + dg(X)) = cE(f(X)) + dE(g(X)).
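A numerical sanity check is no substitute for the proof, but it can be reassuring. The sketch below (not from the notes) assumes the exponential-type PDF with λ = 1 and the arbitrary choices f(x) = x², g(x) = sin x, c = 2, d = 3, and compares both sides by numerical integration.

    import math
    from scipy.integrate import quad

    p = lambda x: math.exp(-x)                       # assumed PDF: exponential-type, lambda = 1, x >= 0
    E = lambda f: quad(lambda x: f(x) * p(x), 0, math.inf)[0]   # E(f(X)) by numerical integration

    f = lambda x: x ** 2
    g = lambda x: math.sin(x)
    c, d = 2.0, 3.0

    lhs = E(lambda x: c * f(x) + d * g(x))
    rhs = c * E(f) + d * E(g)
    print(lhs, rhs)                                  # the two numbers agree to numerical precision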
Variance and Standard Deviation. The variance of a random variable X is the
second moment of X − E(X), i.e.
Var(X) = E((X − E(X))²).
The variance is also called the “second moment of X about the mean.” The standard deviation of X is the square root of Var(X).
Class exercise. Show that Var(X) = E(X²) − E(X)².
Examples of PDFs
1a. For constants σ > 0 and µ, let
p(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)),
which is called the normal (Gaussian, bell-shaped) distribution with center µ and width σ.
(You can use single-variable calculus to verify that the graph of p is symmetric about x = µ,
has one local max at x = µ, and has two inflection points at x = µ ± σ.)
By calculating the area under the curve using a trick, we proved in class last week that
in the case µ = 0, σ = 1, p(x) is indeed a PDF. (See also Problem 33 on page 361.)
Exercise: verify this for arbitrary values of µ, σ.
Other facts: if X is a random variable with the above PDF, then E(X) = µ and Var(X) =
σ². Prove this! (Hint: to compute the integrals, first make the substitution t = (x − µ)/σ.)
Furthermore, anyone who’s taken a stats course knows that
Prob(µ − σ ≤ X ≤ µ + σ) ≈ 0.68,
and
Prob(µ − 2σ ≤ X ≤ µ + 2σ) ≈ 0.95.
(You need a calculator to get these approximations, but how can you use the integrals to
tell that the quantities don’t depend on µ and σ?)
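Here is one way to reproduce these numbers (not part of the notes): integrate the normal PDF numerically for a couple of different parameter choices, and compare with the closed form erf(k/√2) that the substitution t = (x − µ)/σ produces. The specific values µ = 60, σ = 7.5 below are arbitrary.

    from math import erf, exp, pi, sqrt
    from scipy.integrate import quad

    def normal_pdf(x, mu, sigma):
        return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

    # Prob(mu - k*sigma <= X <= mu + k*sigma), by direct numerical integration.
    def prob_within(k, mu, sigma):
        return quad(normal_pdf, mu - k * sigma, mu + k * sigma, args=(mu, sigma))[0]

    # Two different parameter choices give the same answers, since t = (x - mu)/sigma
    # turns both integrals into the same standard-normal integral.
    for mu, sigma in [(0.0, 1.0), (60.0, 7.5)]:
        print(prob_within(1, mu, sigma), prob_within(2, mu, sigma))   # about 0.6827 and 0.9545

    print(erf(1 / sqrt(2)), erf(2 / sqrt(2)))        # the same two numbers in closed form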
1b. For a constant λ > 0, let
p(x) = (1/λ) e^(−x/λ) for x ≥ 0,   and   p(x) = 0 for x < 0.
This is usually called the exponential-type distribution with width λ. It is often used to
model the time spent waiting in a queue, or perhaps the lifespan of a light bulb.
Exercises: show that p(x) is a PDF! What are the mean and the variance?
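If you want to sanity-check your answers to these exercises, a Monte Carlo simulation is a quick way to do it (this is not part of the notes). The sketch assumes λ = 3; NumPy's scale parameter is exactly the λ of this parameterization.

    import numpy as np

    lam = 3.0                                          # arbitrary choice of lambda
    rng = np.random.default_rng(0)
    # NumPy's "scale" is the lambda in p(x) = (1/lambda) e^(-x/lambda).
    samples = rng.exponential(scale=lam, size=1_000_000)

    print(samples.mean(), samples.var())               # compare with the mean and variance you computed by hand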
1c. For any a, b, with a < b, the uniform distribution on [a, b] is
p(x) = 1/(b − a) for a ≤ x ≤ b,   and   p(x) = 0 otherwise.
You should check that p(x) is a PDF, and compute the mean and variance. If X is a random
variable with this distribution, what is the probability that X lies in [a, b]? outside this
interval?
2. Random Variables in R² and Joint PDFs.
A two-variable function p(x, y) is a joint PDF (or 2-dim PDF) if p(x, y) ≥ 0 for all (x, y),
and ∫∫_{R²} p(x, y) dA = 1.
Consider a random occurrence with an outcome some ordered pair X⃗ ∈ R². We say X⃗
is a random variable with distribution p(x, y) if for every region D ⊂ R²,
Prob(X⃗ ∈ D) = ∫∫_D p(x, y) dA.
Analogously to the R¹ case, the region D ⊂ R² is also called an event.
For a random variable X⃗ = (X, Y) with joint distribution p(x, y), the expectation (mean)
value of some function f of X and Y is computed analogously to the R¹ case, as the first
moment of f:
E(f(X, Y)) = ∫∫_{R²} f(x, y) p(x, y) dA.
In particular, one could ask for the expected value of X, or of Y , alone, etc.
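As a concrete (and assumed) illustration, not from the notes: take the joint PDF that is uniform on the unit square, p(x, y) = 1 for 0 ≤ x, y ≤ 1 and 0 otherwise, and evaluate the double integral numerically with SciPy's dblquad.

    from scipy.integrate import dblquad

    # Joint PDF assumed for illustration: uniform on the unit square.
    p = lambda x, y: 1.0

    # E(f(X, Y)) = double integral of f(x, y) p(x, y) over the square.
    # dblquad's integrand takes (y, x); the outer limits are for x, the inner for y.
    def E(f):
        val, _ = dblquad(lambda y, x: f(x, y) * p(x, y), 0, 1, 0, 1)
        return val

    print(E(lambda x, y: x * y))    # E(XY) = 1/4 for this PDF
    print(E(lambda x, y: x))        # E(X) = 1/2, the expected value of X alone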
Examples (via Text Problems)
2a. Throw a dart at a dartboard; what are the (x, y) coordinates of the spot where
the dart lands? (See also Problem 41: given a joint distribution on (x, y), compute the
probability that the dart lands inside a given region in R2 , etc.)
2b. Two lightbulbs, A and B; say the lifespan of bulb A is X hours and the lifespan of
bulb B is Y hours, each of which is a random variable. We could package this information
together into a single 2-dimensional random variable X⃗ = (X, Y). See also Problem 42: if
a single bulb has PDF p(x), which is an exponential-type distribution with λ = 2000, under
certain circumstances (independence; see below) it makes sense to say that the joint PDF
of two bulbs is p(x, y) = p(x)p(y). What is the probability that both bulbs fail within 2000
hours? What is the probability that bulb A fails before bulb B, but both fail within 1000
hours?
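The questions above can be checked numerically once a joint PDF is fixed. The sketch below (not the text's solution) assumes, as in Problem 42, that the two lifespans are independent with the exponential-type PDF and λ = 2000, so the joint PDF is p(x)p(y); the second probability involves a region whose inner limit depends on x.

    from math import exp
    from scipy.integrate import dblquad

    lam = 2000.0
    p = lambda t: exp(-t / lam) / lam                  # single-bulb PDF for t >= 0
    joint = lambda x, y: p(x) * p(y)                   # independence assumed, as in Problem 42

    # Prob(both bulbs fail within 2000 hours): integrate over the square [0, 2000] x [0, 2000].
    both_within_2000, _ = dblquad(lambda y, x: joint(x, y), 0, 2000, 0, 2000)

    # Prob(bulb A fails before bulb B and both fail within 1000 hours):
    # the region is 0 <= x <= y <= 1000, so the inner (y) limits run from x up to 1000.
    a_first_both_within_1000, _ = dblquad(lambda y, x: joint(x, y), 0, 1000, lambda x: x, 1000)

    print(both_within_2000, a_first_both_within_1000)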
Marginal Probabilities and Independence
Given a random variable X⃗ = (X, Y) in R², it makes sense that we might want to focus
our attention on X alone, or on Y alone, as examples of random variables in R¹. The basic
question is, given the joint PDF p(x, y) for X⃗, what would be the PDF for either X alone
or Y alone? These are the marginal distributions: let
p_x(x) = ∫_R p(x, y) dy,   and   p_y(y) = ∫_R p(x, y) dx.
Then p_x (sometimes written p₁) is the single-variable PDF for X alone, and p_y (sometimes
written p₂) is the single-variable PDF for Y alone. (Why are p_x and p_y necessarily PDFs?)
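As a numerical illustration (not in the notes), the sketch below assumes the joint PDF p(x, y) = x + y on the unit square (and 0 elsewhere), computes the marginal p_x by integrating out y, and checks that both the joint PDF and the marginal integrate to 1.

    from scipy.integrate import dblquad, quad

    # Joint PDF assumed for illustration: p(x, y) = x + y on the unit square, 0 elsewhere.
    p = lambda x, y: x + y

    # It is a joint PDF: nonnegative on the square and integrates to 1 over it.
    total, _ = dblquad(lambda y, x: p(x, y), 0, 1, 0, 1)
    print(total)                                       # 1.0

    # Marginal PDF of X: integrate out y. (Here p_x(x) works out to x + 1/2.)
    p_x = lambda x: quad(lambda y: p(x, y), 0, 1)[0]

    print(quad(p_x, 0, 1)[0])                          # 1.0, so p_x is itself a PDF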
We say the two random variables X and Y in R¹ are independent if the joint PDF p(x, y)
for the random variable X⃗ = (X, Y) satisfies the property that
p(x, y) = p_x(x) p_y(y),
where p_x and p_y are the marginal distributions. (See also Problem 42.)
Where does this idea come from? Well, intuitively, we’d like the random variables X
and Y to be called “independent” if, for any rectangle D = {a ≤ x ≤ b, c ≤ y ≤ d}, the
probability that (X, Y ) lies in D can be computed by separately calculating Prob(a ≤ X ≤ b)
and Prob(c ≤ Y ≤ d), and multiplying these two probabilities together. It turns out this
condition is satisfied just when the above property holds. To see that the property agrees
with intuition, notice that if p(x, y) = p_x(x) p_y(y), then
Prob(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫∫_D p(x, y) dA
                            = ∫_a^b ∫_c^d p_x(x) p_y(y) dy dx
                            = (∫_a^b p_x(x) dx) (∫_c^d p_y(y) dy)
                            = Prob(a ≤ X ≤ b) · Prob(c ≤ Y ≤ d).
An Advanced Topic: Non-Independent Random Variables
(We don’t plan to talk about this in lecture, but it may make for an interesting discussion
in a future section.)
You may have noticed that essentially every problem in this section of the text features
a joint PDF that can be written as a product of two functions, one involving x alone and
one involving y alone; thus, in all these examples, X and Y are independent. But this is not
the case in real life!
Suppose X⃗ = (X, Y) is a random variable for the noontime temperatures on some January day in Palo Alto and San Francisco, respectively. Intuitively, we wouldn’t expect X
and Y to be independent of each other. In fact, since we expect X − Y to be relatively small,
and (X + Y)/2 to be relatively close to 60, here’s a guess at the joint distribution on (X, Y):
p(x, y) = C e^(−(x−y)² − ((x+y)/2 − 60)²),
where C is an appropriate constant chosen to make sure that ∫∫_{R²} p dA = 1.
Exercises (hard, but not impossible): Find the value of C. Show that E(X − Y) = 0 and
E((1/2)(X + Y)) = 60. Compute the marginal distributions p_x and p_y, i.e. the PDFs for the
temperatures in Palo Alto and San Francisco respectively. Is p(x, y) = p_x(x) p_y(y)?
The covariance of X and Y is defined to be
Cov(X, Y) = E((X − E(X))(Y − E(Y))).
It is a fact that if X and Y are independent, then Cov(X, Y ) = 0. (Can you show this?)
What is Cov(X, Y ) for the above example?
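A crude numerical attack on these exercises (not in the notes) is to approximate the integrals by Riemann sums over a fine grid; since the exponent makes p negligible far from the point (60, 60), the box [50, 70] × [50, 70] assumed below captures essentially all of the probability.

    import numpy as np

    # Grid approximation for p(x, y) = C exp(-(x - y)^2 - ((x + y)/2 - 60)^2).
    # The box [50, 70] x [50, 70] is assumed wide enough that the tails outside it are negligible.
    xs = np.linspace(50, 70, 2001)
    ys = np.linspace(50, 70, 2001)
    dx = xs[1] - xs[0]
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    unnorm = np.exp(-(X - Y) ** 2 - ((X + Y) / 2 - 60) ** 2)

    C = 1.0 / (unnorm.sum() * dx * dx)                 # normalizing constant
    p = C * unnorm

    EX = (X * p).sum() * dx * dx                       # E(X); E(Y) is the same by symmetry
    EY = (Y * p).sum() * dx * dx
    cov = ((X - EX) * (Y - EY) * p).sum() * dx * dx    # Cov(X, Y)

    print(C, EX, EY, cov)                              # compare with your hand computations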