Math 52 — Notes on Probability

(These are sketchy notes to accompany our discussion in lecture and section, and the text pages 361-2. Please let us know of any typos.)

1. Random Variables and PDFs in R¹.

We say a function p(x) is a PDF (probability density/distribution function) if p(x) ≥ 0 for all x, and ∫_R p(x) dx = 1.

Consider a random occurrence with an outcome we call X, some real number. We say “X is a random variable with distribution p(x)” if for every a, b the probability that X lies between a and b is

    Prob(a ≤ X ≤ b) = ∫_a^b p(x) dx.

The interval [a, b] ⊂ R is also called an event. Notice that the only events we measure are sets (usually ranges) of values of X, not individual values of X.

[Think of the x-axis as the set of possible values of the random variable X, and the graph of p(x) as the continuous analogue of a histogram. This falls in line with p(x) having an interpretation as a probability density, as the density principle would then dictate that the probability that X falls in a small range of width Δx near X = x₀ is approximately p(x₀)Δx; furthermore, probability is an aggregating quantity (like mass or area), so we may add together disjoint measurements and approximate the total probability via a Riemann sum. In the limit as Δx → 0, the Riemann sum approaches the above integral.]

Expectation. The mean value or expected value of X, written E(X), is defined to be the first moment of x with respect to p(x):

    E(X) = ∫_R x p(x) dx.

(Note this corresponds to a weighted average of x, since ∫_R p(x) dx = 1 gives E(X) = (∫_R x p(x) dx) / (∫_R p(x) dx).)

In general, the expected value of some function f of X is the first moment of f(x), i.e.

    E(f(X)) = ∫_R f(x) p(x) dx.

(For example, if X is the random variable giving the noontime temperature, in degrees Fahrenheit, on a January day in Palo Alto, then E(X) is the expected value of this temperature, and E((5/9)(X − 32)) is the expected value of this temperature measured in degrees Celsius.)

Class exercise. Show that expectation is linear, i.e. that for constants c and d,

    E(cf(X) + dg(X)) = cE(f(X)) + dE(g(X)).

Variance and Standard Deviation. The variance of a random variable X is the second moment of X − E(X), i.e.

    Var(X) = E((X − E(X))²).

The variance is also called the “second moment of X about the mean.” The standard deviation of X is the square root of Var(X).

Class exercise. Show that Var(X) = E(X²) − E(X)².

Examples of PDFs

1a. For constants σ > 0 and µ, let

    p(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)),

which is called the normal (Gaussian, bell-shaped) distribution with center µ and width σ. (You can use single-variable calculus to verify that the graph of p is symmetric about x = µ, has one local max at x = µ, and has two inflection points at x = µ ± σ.) By calculating the area under the curve using a trick, we proved in class last week that in the case µ = 0, σ = 1, p(x) is indeed a PDF. (See also Problem 33 on page 361.) Exercise: verify this for arbitrary values of µ, σ.

Other facts: if X is a random variable with the above PDF, then E(X) = µ and Var(X) = σ². Prove this! (Hint: to compute the integrals, first make the substitution t = (x − µ)/σ.) Furthermore, anyone who’s taken a stats course knows that

    Prob(µ − σ ≤ X ≤ µ + σ) ≈ 0.68   and   Prob(µ − 2σ ≤ X ≤ µ + 2σ) ≈ 0.95.

(You need a calculator to get these approximations, but how can you use the integrals to tell that the quantities don’t depend on µ and σ?)
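If you would like to double-check these numbers without doing the integrals by hand, here is a small Python sketch (not part of the course) that approximates each integral by a Riemann sum on a fine grid, in the spirit of the histogram picture above. The particular values µ = 10, σ = 3 are arbitrary choices for the check; any others should behave the same way.

```python
import numpy as np

# Riemann-sum sanity check for the normal PDF with (arbitrary) mu = 10, sigma = 3.
mu, sigma = 10.0, 3.0
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200_001)   # the tails beyond 10 sigma are negligible
dx = x[1] - x[0]
p = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

total = np.sum(p) * dx                    # should be ~ 1 (p is a PDF)
mean = np.sum(x * p) * dx                 # should be ~ mu
var = np.sum((x - mean) ** 2 * p) * dx    # should be ~ sigma**2

one_sigma = np.sum(p[np.abs(x - mu) <= sigma]) * dx        # ~ 0.68
two_sigma = np.sum(p[np.abs(x - mu) <= 2 * sigma]) * dx    # ~ 0.95

print(total, mean, var, one_sigma, two_sigma)
```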
1b. For a constant λ > 0, let

    p(x) = (1/λ) e^(−x/λ)   for x ≥ 0,
    p(x) = 0                for x < 0.

This is usually called the exponential-type distribution with width λ. It is often used to model the time spent waiting in a queue, or perhaps the lifespan of a light bulb. Exercises: show that p(x) is a PDF! What are the mean and the variance?

1c. For any a, b with a < b, the uniform distribution on [a, b] is

    p(x) = 1/(b − a)   for a ≤ x ≤ b,
    p(x) = 0           otherwise.

You should check that p(x) is a PDF, and compute the mean and variance. If X is a random variable with this distribution, what is the probability that X lies in [a, b]? outside this interval?

2. Random Variables in R² and Joint PDFs.

A two-variable function p(x, y) is a joint PDF (or 2-dim PDF) if p(x, y) ≥ 0 for all (x, y), and ∬_{R²} p(x, y) dA = 1.

Consider a random occurrence with an outcome some ordered pair \vec{X} ∈ R². We say \vec{X} is a random variable with distribution p(x, y) if for every region D ⊂ R²,

    Prob(\vec{X} ∈ D) = ∬_D p(x, y) dA.

Analogously to the R¹ case, the region D ⊂ R² is also called an event.

For a random variable \vec{X} = (X, Y) with joint distribution p(x, y), the expectation (mean) value of some function f of X and Y is computed analogously to the R¹ case, as the first moment of f:

    E(f(X, Y)) = ∬_{R²} f(x, y) p(x, y) dA.

In particular, one could ask for the expected value of X, or of Y, alone, etc.

Examples (via Text Problems)

2a. Throw a dart at a dartboard; what are the (x, y) coordinates of the spot where the dart lands? (See also Problem 41: given a joint distribution on (x, y), compute the probability that the dart lands inside a given region in R², etc.)

2b. Two lightbulbs, A and B; say the lifespan of bulb A is X hours and the lifespan of bulb B is Y hours, each of which is a random variable. We could package this information together into a single 2-dimensional random variable \vec{X} = (X, Y). See also Problem 42: if a single bulb has PDF p(x), which is an exponential-type distribution with λ = 2000, under certain circumstances (independence; see below) it makes sense to say that the joint PDF of two bulbs is p(x, y) = p(x)p(y). What is the probability that both bulbs fail within 2000 hours? What is the probability that bulb A fails before bulb B, but both fail within 1000 hours?

Marginal Probabilities and Independence

Given a random variable \vec{X} = (X, Y) in R², it makes sense that we might want to focus our attention on X alone, or on Y alone, as examples of random variables in R¹. The basic question is, given the joint PDF p(x, y) for \vec{X}, what would be the PDF for either X alone or Y alone? These are the marginal distributions: let

    p_x(x) = ∫_R p(x, y) dy,   and   p_y(y) = ∫_R p(x, y) dx.

Then p_x (sometimes written p_1) is the single-variable PDF for X alone, and p_y (sometimes written p_2) is the single-variable PDF for Y alone. (Why are p_x and p_y necessarily PDFs?)
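As a concrete illustration, here is a Python sketch for the two-bulb joint PDF of Example 2b (and a way to check your Problem 42 answer numerically). It assumes the product joint PDF described above with λ = 2000 and approximates the integrals by 2-D Riemann sums; the grid spacing and the 20λ cutoff are arbitrary choices that keep the neglected probability tiny.

```python
import numpy as np

# Two-bulb example: joint PDF p(x, y) = p(x) p(y), with p the exponential-type
# distribution of width lam = 2000.  All integrals are 2-D Riemann sums on a
# grid of cell midpoints; lifetimes beyond 20*lam contribute negligibly.
lam = 2000.0
dt = 20.0
t = (np.arange(2000) + 0.5) * dt            # midpoints covering [0, 40000]
X, Y = np.meshgrid(t, t, indexing="ij")
joint = (np.exp(-X / lam) / lam) * (np.exp(-Y / lam) / lam)

# Marginal PDF of X: integrate out y.  It should itself integrate to ~ 1.
p_x = joint.sum(axis=1) * dt
print(p_x.sum() * dt)                       # ~ 1

# Prob(both bulbs fail within 2000 hours): integrate the joint PDF over the
# event D = {0 <= x <= 2000, 0 <= y <= 2000}.
D = (X <= 2000) & (Y <= 2000)
print(joint[D].sum() * dt * dt)             # compare with your hand computation
```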
We say the two random variables X and Y in R¹ are independent if the joint PDF p(x, y) for the random variable \vec{X} = (X, Y) satisfies the property that

    p(x, y) = p_x(x) p_y(y),

where p_x and p_y are the marginal distributions. (See also Problem 42.)

Where does this idea come from? Well, intuitively, we’d like the random variables X and Y to be called “independent” if, for any rectangle D = {a ≤ x ≤ b, c ≤ y ≤ d}, the probability that (X, Y) lies in D can be computed by separately calculating Prob(a ≤ X ≤ b) and Prob(c ≤ Y ≤ d), and multiplying these two probabilities together. It turns out this condition is satisfied just when the above property holds.

To see that the property agrees with intuition, notice that if p(x, y) = p_x(x) p_y(y), then

    Prob(a ≤ X ≤ b, c ≤ Y ≤ d) = ∬_D p(x, y) dA
                               = ∫_a^b ∫_c^d p_x(x) p_y(y) dy dx
                               = (∫_a^b p_x(x) dx)(∫_c^d p_y(y) dy)
                               = Prob(a ≤ X ≤ b) · Prob(c ≤ Y ≤ d).

An Advanced Topic: Non-Independent Random Variables

(We don’t plan to talk about this in lecture, but it may make for an interesting discussion in a future section.)

You may have noticed that essentially every problem in this section of the text features a joint PDF that can be written as a product of two functions, one involving x alone and one involving y alone; thus, in all these examples, X and Y are independent. But this is often not the case in real life!

Suppose \vec{X} = (X, Y) is a random variable for the noontime temperatures on some January day in Palo Alto and San Francisco, respectively. Intuitively, we wouldn’t expect X and Y to be independent of each other. In fact, since we expect X − Y to be relatively small, and (X + Y)/2 to be relatively close to 60, here’s a guess at the joint distribution on (X, Y):

    p(x, y) = C e^(−(x−y)² − ((x+y)/2 − 60)²),

where C is an appropriate constant chosen to make sure that ∬_{R²} p dA = 1.

Exercises (hard, but not impossible): Find the value of C. Show that E(X − Y) = 0 and E((X + Y)/2) = 60. Compute the marginal distributions p_x and p_y, i.e. the PDFs for the temperatures in Palo Alto and San Francisco respectively. Is p(x, y) = p_x(x) p_y(y)?

The covariance of X and Y is defined to be

    Cov(X, Y) = E((X − E(X))(Y − E(Y))).

It is a fact that if X and Y are independent, then Cov(X, Y) = 0. (Can you show this?) What is Cov(X, Y) for the above example?
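If you want to experiment before attempting the exercises, here is a rough Python sketch that estimates these quantities numerically by 2-D Riemann sums. It is only a sanity check, not a substitute for the exercises; the box [40, 80] × [40, 80] is an arbitrary choice large enough that the PDF is negligible outside it.

```python
import numpy as np

# Temperature example: p(x, y) = C * exp(-(x - y)**2 - ((x + y)/2 - 60)**2).
# Estimate C, E(X - Y), E((X + Y)/2), and Cov(X, Y) by Riemann sums on a grid;
# the PDF is negligibly small outside the box [40, 80] x [40, 80].
s = np.linspace(40.0, 80.0, 801)
ds = s[1] - s[0]
X, Y = np.meshgrid(s, s, indexing="ij")
q = np.exp(-(X - Y) ** 2 - ((X + Y) / 2 - 60.0) ** 2)

C = 1.0 / (q.sum() * ds * ds)   # normalizing constant, so total probability is ~ 1
p = C * q

def E(f):
    # Expectation of a function of (X, Y): Riemann sum of f * p over the grid.
    return (f * p).sum() * ds * ds

print(C)
print(E(X - Y))                         # should come out ~ 0
print(E((X + Y) / 2))                   # should come out ~ 60
print(E((X - E(X)) * (Y - E(Y))))       # Cov(X, Y) -- is it zero?
```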