Stat 643 Review of Probability Results (Cressie)

Probability Space: $(\Omega,\mathcal{A},P)$
$\Omega$ is the set of outcomes.
$\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$.
$P$ is a probability measure mapping from $\mathcal{A}$ into $[0,1]$.

Measurable Space: $(\Omega,\mathcal{A})$.

Random Variable: Suppose $(\Omega,\mathcal{A},P)$ is a probability space and let $X: \Omega \to \mathbb{R}$ be measurable (i.e., $\{\omega \in \Omega: X(\omega) \le \alpha\} \in \mathcal{A}$ for all $\alpha \in \mathbb{R}$). Then $X$ is said to be a random variable (r.v.).

Integral of a Measurable Function: Suppose $(\Omega,\mathcal{A},\mu)$ is a measure space (i.e., $\mu$ maps from $\mathcal{A}$ into $[0,\infty]$) and $f$ is a measurable mapping. Then $\int f\,d\mu$ is defined as a limit of integrals of simple (i.e., step) functions. Write $f = f^+ - f^-$, where $f^+, f^- \ge 0$. If $\int f^+\,d\mu < \infty$ and $\int f^-\,d\mu < \infty$, then the integral is said to be finite. If $\int f^+\,d\mu = \infty = \int f^-\,d\mu$, then the integral is said not to exist; otherwise it is said to exist. The measurable function $f$ is said to be integrable if $\int f\,d\mu$ exists and is finite.

Notation: If $X$ is a r.v. on $(\Omega,\mathcal{A},P)$, write $E(X)$ for $\int X\,dP$.

Important Convergence Theorems

Let $(\Omega,\mathcal{A},\mu)$ be a measure space and $(\mathbb{R},\mathcal{B}_1)$ be the measurable space of real numbers with the Borel $\sigma$-algebra $\mathcal{B}_1$. In the following, $g$, $f$, $\{f_n\}_{n \ge 1}$, and $\{g_n\}_{n \ge 1}$ denote measurable functions from $(\Omega,\mathcal{A})$ into $(\mathbb{R},\mathcal{B}_1)$.

Fatou's Lemma: If $f_n \ge 0$ a.s. ($\mu$), for all $n \ge 1$, then
$$\int \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty} \int f_n\,d\mu.$$

Monotone Convergence Theorem: Suppose that a.s. ($\mu$), $0 \le f_n \uparrow f$. Then $\int f_n\,d\mu \uparrow \int f\,d\mu$.

Dominated Convergence Theorem: Suppose that a.s. ($\mu$), $f_n \to f$ as $n \to \infty$ and $|f_n| \le g$ for all $n \ge 1$. If $\int g\,d\mu < \infty$, then $f$ is integrable and
$$\lim_{n\to\infty} \int f_n\,d\mu = \int f\,d\mu.$$

Extended Dominated Convergence Theorem: Suppose that
(i) $f_n \to f$ a.s. ($\mu$), $g_n \to g$ a.s. ($\mu$),
(ii) $|f_n| \le g_n$ a.s. ($\mu$) and $\int g_n\,d\mu < \infty$, for all $n \ge 1$,
(iii) $\lim_{n\to\infty} \int g_n\,d\mu = \int g\,d\mu < \infty$.
Then,
$$\lim_{n\to\infty} \int f_n\,d\mu = \int f\,d\mu < \infty.$$
Note: Dominated convergence is a special case with $g_n = g$ for all $n \ge 1$.

Scheffe's Theorem: Suppose $f_n \ge 0$ and $f \ge 0$ a.s. ($\mu$). Let $\nu_n(A) \equiv \int_A f_n\,d\mu$ and $\nu(A) \equiv \int_A f\,d\mu$ be measures on $(\Omega,\mathcal{A})$
with $\nu_n(\Omega) = \nu(\Omega) < \infty$, for all $n \ge 1$. If $f_n \to f$ a.s. ($\mu$), then
(i) $\sup\{|\nu_n(A) - \nu(A)|: A \in \mathcal{A}\} \to 0$, as $n \to \infty$, and
(ii) $\int |f_n - f|\,d\mu \to 0$, as $n \to \infty$.

Uniform Integrability

The sequence of measurable functions $\{f_n\}_{n \ge 1}$ is called uniformly integrable (w.r.t. $\mu$) if
$$\lim_{\lambda\to\infty} \sup_{n \ge 1} \int_{\{|f_n| > \lambda\}} |f_n|\,d\mu = 0.$$

Theorem: Suppose that $\mu(\Omega) < \infty$, $f_n \to f$ a.s. ($\mu$), and the sequence $\{f_n\}_{n \ge 1}$ is uniformly integrable. Then $\{f_n: n \ge 1\}$, $f$ are all integrable and $\int f_n\,d\mu \to \int f\,d\mu$.

Various Forms of Convergence for r.v.'s

Almost sure convergence: $X_n \xrightarrow{a.s.} X$ if $P(\lim_{n\to\infty} X_n = X) = 1$.

Convergence in probability: $X_n \xrightarrow{P} X$ if $\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0$, $\forall\, \epsilon > 0$.

Convergence in $L^p$: $X_n \xrightarrow{L^p} X$ if, for $\int |X_n|^p\,dP < \infty$ and $\int |X|^p\,dP < \infty$, $\lim_{n\to\infty} \int |X_n - X|^p\,dP = 0$;
e.g., $p = 1$ corresponds to convergence in the mean,
$p = 2$ corresponds to convergence in mean square.

Jensen's Inequality: Let $X \equiv (X_1,\ldots,X_d)'$ be a random vector (i.e., a measurable mapping from $(\Omega,\mathcal{A},P)$ to $(\mathbb{R}^d,\mathcal{B}_d)$, where $\mathcal{B}_d$ is the $\sigma$-algebra of Borel sets in $\mathbb{R}^d$). Suppose $X \in D$ a.s., where $D$ is a convex set in $\mathbb{R}^d$, and $E(|X|) < \infty$. (Recall that $|X| \equiv (X_1^2 + \cdots + X_d^2)^{1/2}$.) Define $E(X) \equiv (E(X_1),\ldots,E(X_d))'$. Let $\varphi: D \to \mathbb{R}$, where $\varphi$ is convex (i.e., $\varphi(\alpha x + (1-\alpha)y) \le \alpha\varphi(x) + (1-\alpha)\varphi(y)$ for all $x, y \in D$ and $\alpha \in [0,1]$). Then $E(\varphi(X)) \ge \varphi(E(X))$.

Corollary: If $\psi$ is concave then $E(\psi(X)) \le \psi(E(X))$ ($\psi$ is concave iff $-\psi$ is convex).

Radon-Nikodym Theorem

Definition: A signed measure $\mu$ on a measurable space $(\Omega,\mathcal{A})$ is a mapping $\mu: \mathcal{A} \to (-\infty,\infty]$ such that (s.t.) $\mu(\emptyset) = 0$ and
$$\mu\Big(\bigcup_{i \in I} A_i\Big) = \sum_{i \in I} \mu(A_i), \qquad (*)$$
where $I$ is countable and $\{A_i\}$ are disjoint. The equality in (*) is taken to mean that the summation converges absolutely if $\mu(\bigcup_{i \in I} A_i)$ is finite, and diverges otherwise.

Jordan Decomposition: A signed measure $\mu$ can be written as $\mu = \mu^+ - \mu^-$, where $\mu^+$ and $\mu^-$ are measures.

Definition: $|\mu| \equiv \mu^+ + \mu^-$.

Example: Let $X$ be a r.v. s.t. $\int X^-\,dP < \infty$. Then $\mu(A) \equiv \int_A X\,dP$; $A \in \mathcal{A}$, is a signed measure.

Definition: A measure $\mu$ is a $\sigma$-finite measure if $\exists$ disjoint $\{A_i\}$ s.t.
$\Omega = \bigcup_{i=1}^{\infty} A_i$ and $\mu(A_i) < \infty$; $i = 1,2,\ldots$.

Definition: A [signed] measure $\mu$ is said to be absolutely continuous (a.c.) with respect to a [signed] measure $\nu$ if $\nu(A) = 0 \Rightarrow \mu(A) = 0$ [$|\nu|(A) = 0 \Rightarrow |\mu|(A) = 0$]. Write $\mu \ll \nu$.

Definition: Two measurable mappings $f$ and $g$ on $(\Omega,\mathcal{A},\mu)$ are said to be equivalent if $\mu(f \ne g) = 0$.

Radon-Nikodym (R-N) Theorem

(1) Let $(\Omega,\mathcal{A},P)$ be a probability space and $\mu$ a signed measure s.t. $|\mu| \ll P$. Then $\exists$ a r.v. $X$, unique up to equivalence ($P$), s.t.
$$\mu(A) = \int_A X\,dP, \quad \forall\, A \in \mathcal{A},$$
where $\int X^-\,dP < \infty$.

(2) Let $(\Omega,\mathcal{A},\nu)$ be a measure space, where $\nu$ is $\sigma$-finite, and let $\mu$ be a signed measure s.t. $|\mu| \ll \nu$. Then $\exists$ a measurable mapping $f$, unique up to equivalence ($\nu$), s.t.
$$\mu(A) = \int_A f\,d\nu, \quad \forall\, A \in \mathcal{A},$$
where $\int f^-\,d\nu < \infty$.

Notes:
(i) When $(\Omega,\mathcal{A},P)$ is a probability space, $P$ is trivially $\sigma$-finite and so (1) is just a special case of (2).
(ii) When $\mu$ is a measure, $f$ in (2) (and thus $X$ in (1)) is nonnegative.

Notation: $X$ in (1) or $f$ in (2) is called the Radon-Nikodym derivative and is denoted as $d\mu/dP$ in (1) or $d\mu/d\nu$ in (2).

Conditional Expectations

Let $(\Omega,\mathcal{A},P)$ be a probability space and $X$ a r.v. s.t. $X^-$ is integrable. Let $\mathcal{C} \subset \mathcal{A}$ be a sub $\sigma$-algebra (i.e., $\mathcal{C}$ is a $\sigma$-algebra contained in $\mathcal{A}$). Define a signed measure $\mu$ on $(\Omega,\mathcal{C})$ by:
$$\mu(C) \equiv \int_C X\,dP; \quad C \in \mathcal{C}.$$

Definition: Let $P_{\mathcal{C}}$ be the probability measure on $\mathcal{C}$ given by $P_{\mathcal{C}}(C) = P(C)$; $\forall\, C \in \mathcal{C}$. Then, $\mu \ll P_{\mathcal{C}}$ and for any $\mathcal{C}$-measurable r.v. $Y$ (i.e., $Y^{-1}((-\infty,\alpha]) \in \mathcal{C}$, $\forall\, \alpha \in \mathbb{R}$),
$$\int Y\,dP_{\mathcal{C}} = \int Y\,dP.$$

Definition: A function $g: \Omega \to \mathbb{R}$ is called (a version of) the conditional expectation of $X$ given $\mathcal{C}$ if
(i) $g$ is $\mathcal{C}$-measurable,
(ii) $\int_C g\,dP = \int_C X\,dP$, $\forall\, C \in \mathcal{C}$.

Note: The R-N Theorem guarantees the existence of the conditional expectation because $\mu(C) = \int_C g\,dP_{\mathcal{C}}$, where $g$ is the R-N derivative $d\mu/dP_{\mathcal{C}}$; i.e., $\mu(C) = \int_C g\,dP$. The r.v. $g$ on $\mathcal{C}$ is unique up to equivalence.

Notation: $g = E(X \mid \mathcal{C})$.

Notes:
(i) If $X$ is $\mathcal{C}$-measurable (i.e., $X^{-1}((-\infty,\alpha]) \in \mathcal{C}$, $\forall\, \alpha \in \mathbb{R}$) then $E(X \mid \mathcal{C}) = X$ a.s. ($P$).
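(ii) If $\mathcal{C} = \{\emptyset,\Omega\}$, then $E(X \mid \mathcal{C}) = E(X)$.
(iii) Suppose $Y$ is another r.v. on $(\Omega,\mathcal{A},P)$ and define $\mathcal{B}(Y) \equiv \{Y^{-1}(B): B \in \mathcal{B}_1\}$ to be the $\sigma$-algebra generated by $Y$. Then write $E(X \mid \mathcal{B}(Y))$ as $E(X \mid Y)$.
(iv) The conditional expectation of $X$ "smooths" the r.v. $X$. Suppose $X$ is $\mathcal{A}$-measurable on $(\Omega,\mathcal{A},P)$. Then:
Sub $\sigma$-algebra: $\{\emptyset,\Omega\} \subset \mathcal{C} \subset \mathcal{A}$
$E(X \mid \cdot)$: $E(X)$ ("smoothest"), $E(X \mid \mathcal{C})$ ("smooth"), $X$ a.s. ("roughest")

Conditional Monotone Convergence Theorem: If $X_n \ge 0$, $X_n \uparrow X$ a.s. ($P$), then $E(X_n \mid \mathcal{C}) \uparrow E(X \mid \mathcal{C})$ a.s. ($P$).

Conditional Dominated Convergence Theorem: If $X_n \to X$ a.s. ($P$), $|X_n| \le Y$ a.s. ($P$) and $E(|Y|) < \infty$, then
$$\lim_{n\to\infty} E(X_n \mid \mathcal{C}) = E(X \mid \mathcal{C}) \quad \text{a.s. } (P).$$

Conditional Jensen's Inequality: Let $\varphi: D \to \mathbb{R}$, where $\varphi$ is convex and $D$ is a convex subset of $\mathbb{R}$. Let $X$ be a r.v. on $(\Omega,\mathcal{A},P)$ s.t. $X \in D$ a.s. ($P$). Suppose $E(|\varphi(X)|) < \infty$. Then
$$E(\varphi(X) \mid \mathcal{C}) \ge \varphi(E(X \mid \mathcal{C})) \quad \text{a.s. } (P).$$

Regular Conditional Probability

Let $A \in \mathcal{A}$ and $I_A$ be the indicator function of $A$. Then $E(I_A \mid \mathcal{C})$ is a $\mathcal{C}$-measurable r.v. that has some properties of a probability on sets $A \in \mathcal{A}$. Notice that $P(\omega,A) \equiv E(I_A \mid \mathcal{C})(\omega)$ is a mapping from $\Omega \times \mathcal{A}$ into $[0,1]$.

Definition: Given a probability space $(\Omega,\mathcal{A},P)$, $\psi: \Omega \times \mathcal{A} \to [0,1]$ is called a regular conditional probability on $\mathcal{A}$ given $\mathcal{C}$ if
(i) for each fixed $\omega \in \Omega$, $\psi(\omega,\cdot)$ is a probability measure on $(\Omega,\mathcal{A})$,
(ii) for each fixed $A \in \mathcal{A}$, $\psi(\cdot,A)$ is $\mathcal{C}$-measurable, and
(iii) for every $A \in \mathcal{A}$, $\int_C \psi(\omega,A)\,dP(\omega) = P(A \cap C)$, $\forall\, C \in \mathcal{C}$.

Note: Although $E(I_A \mid \mathcal{C})$ satisfies (ii) and (iii), it does not necessarily satisfy (i).

Theorem: If $(\Omega,\mathcal{A},P) = (\mathbb{R}^d,\mathcal{B}_d,P)$, then a regular conditional probability on $\mathcal{B}_d$ given $\mathcal{C}$ exists and is given by (a version of) $\psi(\omega,A) = E(I_A \mid \mathcal{C})(\omega)$.

Change of Variable Theorem: Let $f: (\Omega,\mathcal{A},\mu) \to (\Omega^*,\mathcal{A}^*,\nu)$ and $g: (\Omega^*,\mathcal{A}^*,\nu) \to (\mathbb{R},\mathcal{B}_1)$ be two measurable functions, where $\nu(A^*) \equiv \mu(f^{-1}(A^*))$ is the measure induced by $f$ on $(\Omega^*,\mathcal{A}^*)$. Then
$$\int_{A^*} g\,d\nu = \int_{f^{-1}(A^*)} g \circ f\,d\mu = \int_{f^{-1}(A^*)} g(f(\omega))\,d\mu(\omega)$$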
for all $A^* \in \mathcal{A}^*$, in the sense that if one of the integrals exists then so does the other and they are equal.

Probability versus Statistics

Probability is concerned with r.v.'s $X: (\Omega,\mathcal{A},P) \to (\mathbb{R},\mathcal{B}_1)$. Now $X$ induces a probability measure $P^X$ on $(\mathbb{R},\mathcal{B}_1)$ through
$$P^X(B) \equiv P(X^{-1}(B)), \quad \forall\, B \in \mathcal{B}_1.$$
Statistics focuses on the triple $(\mathbb{R},\mathcal{B}_1,P^X)$ and essentially forgets about $(\Omega,\mathcal{A},P)$. More generally, $X$ does not have to be a r.v. on $(\mathbb{R},\mathcal{B}_1)$ but could be some more general random quantity. Then data $X$ could be thought of simply as a measurable mapping into $(\mathcal{X},\mathcal{B},P^X)$.

Example: $\mathcal{X} = \mathbb{R}^d$ and $\mathcal{B} = \mathcal{B}_d$: Then $X$ is a random vector. More complicated $\mathcal{X}$ and $\mathcal{B}$ are needed when $X$ is, say, a random set.

Definition: A statistic $T$ is a measurable mapping $T: (\mathcal{X},\mathcal{B},P^X) \to (\mathcal{T},\mathcal{F})$.

Note: $T$ induces a probability measure $P^T$ on $(\mathcal{T},\mathcal{F})$: $P^T(F) \equiv P^X(T^{-1}(F))$, $\forall\, F \in \mathcal{F}$.

Notation: $\mathcal{B}_0 \equiv \mathcal{B}(T) \equiv \{T^{-1}(F): F \in \mathcal{F}\}$, the $\sigma$-algebra generated by $T$. Then $\mathcal{B}_0 \subset \mathcal{B}$ is a sub $\sigma$-algebra.

Lehmann's Theorem (TSH, p. 42): A real-valued $\mathcal{B}$-measurable function $\varphi$ is $\mathcal{B}_0$-measurable iff $\exists$ a real-valued $\mathcal{F}$-measurable function $\psi: (\mathcal{T},\mathcal{F}) \to (\mathbb{R},\mathcal{B}_1)$ s.t.
$$\varphi(x) = \psi(T(x)), \quad \forall\, x \in \mathcal{X},$$
where $T$ is a statistic mapping into $(\mathcal{T},\mathcal{F})$ and $\mathcal{B}_0 = \mathcal{B}(T)$.

Partitioning the Sample Space

Suppose $\mathcal{X}$ is the sample space and $T$ is a statistic. Then $T$ defines a partition of $\mathcal{X}$ as follows: For $x, y \in \mathcal{X}$, $x$ and $y$ are in the same member of the partition (write $x \sim y$) iff $T(x) = T(y)$. Notice that two "different" statistics can generate the same partition, e.g.,
$$T_1(x) = \sum_{i=1}^{d} x_i \quad \text{and} \quad T_2(x) = \sum_{i=1}^{d} x_i / d.$$
It is tempting to characterize a statistic's behavior via its partition of $\mathcal{X}$ but, for technical reasons, the partition does not necessarily generate the $\sigma$-algebra of interest, namely $\mathcal{B}(T)$. In general, if $\mathcal{B}(T_1) = \mathcal{B}(T_2)$ then we say the two statistics $T_1, T_2$ are the same.

Change of Variable Formula Involving a Statistic

Suppose
$$(\mathcal{X},\mathcal{B},P^X) \xrightarrow{\;T\;} (\mathcal{T},\mathcal{F},P^T) \xrightarrow{\;g\;} (\mathbb{R},\mathcal{B}_1).$$
Then,
$$\int_{T^{-1}(F)} g(T(x))\,dP^X(x) = \int_F g(t)\,dP^T(t); \quad F \in \mathcal{F},$$
in the sense that if one integral exists then so does the other and they are equal.

Proof: Assume $F = \mathcal{T}$, the whole space. First let $g = I_{F_1}$ for some $F_1 \in \mathcal{F}$. Then
$$g(T(x)) = I_{F_1}(T(x)) = I_{T^{-1}(F_1)}(x).$$
Thus, since $\mathcal{X} = T^{-1}(\mathcal{T})$,
$$\int_{\mathcal{X}} g(T(x))\,dP^X(x) = \int_{\mathcal{X}} I_{T^{-1}(F_1)}(x)\,dP^X(x) = P^X(T^{-1}(F_1)) = P^T(F_1) = \int_{\mathcal{T}} I_{F_1}(t)\,dP^T(t) = \int_{\mathcal{T}} g(t)\,dP^T(t).$$
The same result is obtained when $F$ is not $\mathcal{T}$. Therefore, the result is true for indicator functions, so true for simple functions, and hence true for limits of simple functions.

Probability Version of Change of Variables Formula

Suppose
$$(\Omega,\mathcal{A},P) \xrightarrow{\;X\;} (\mathcal{X},\mathcal{B},P^X) \xrightarrow{\;g\;} (\mathbb{R},\mathcal{B}_1).$$
Then,
$$\int_{\Omega} g(X(\omega))\,dP(\omega) = \int_{\mathcal{X}} g(x)\,dP^X(x),$$
which is known as the law of the unconscious statistician.

Change of Variables and the R-N Derivative

Suppose $P^X \ll \mu$, where $\mu$ is some $\sigma$-finite measure on $(\mathcal{X},\mathcal{B})$. Define
$$f(x) \equiv (dP^X/d\mu)(x); \quad x \in \mathcal{X},$$
which, recall, is a $\mathcal{B}$-measurable mapping; i.e., $P^X(B) = \int_B f\,d\mu$. Then, for $g$ s.t. $\int |g(x)|\,dP^X(x) < \infty$, the change of variable formula gives
$$E(g(X)) \equiv \int_{\Omega} g(X(\omega))\,dP(\omega) = \int_{\mathcal{X}} g(x)\,dP^X(x).$$
Further,
$$\int_{\mathcal{X}} g(x)\,dP^X(x) = \int_{\mathcal{X}} g(x) f(x)\,d\mu(x).$$
This last equality is true for $g(\cdot) = I_B(\cdot)$, because
$$\int g\,dP^X = \int I_B\,dP^X = P^X(B) = \int_B f\,d\mu = \int I_B f\,d\mu = \int g f\,d\mu.$$
Hence it is true for simple functions, and so it is true for limits of simple functions.

Example: $\mu$ is Lebesgue measure. Write "$dx$" as shorthand for $d\mu(x)$. For example, if $\int x^2\,dP^X(x) < \infty$, then
$$E(X^2) = \int x^2 f(x)\,dx,$$
where $f(x)$ is the R-N derivative of $P^X$ w.r.t. $\mu$; $f$ is commonly called the probability density function.
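
Numerical illustration: The last example can be checked numerically. The sketch below is not part of the original notes. It assumes, purely for illustration, that X is Exponential(1), so the density with respect to Lebesgue measure is exp(-x) on [0, infinity) and the exact value of E(X^2) is 2. The function names, the truncation point 50, the step count, and the seed are all hypothetical choices. One side approximates the integral of x^2 f(x) dx by the midpoint rule; the other averages g(X) over simulated draws, which stands in for integrating g(X(omega)) against P.

```python
import math
import random


def density(x: float) -> float:
    """Density of an Exponential(1) r.v. w.r.t. Lebesgue measure (assumed example)."""
    return math.exp(-x) if x >= 0.0 else 0.0


def g(x: float) -> float:
    """The integrand of interest, g(x) = x^2."""
    return x * x


def integral_against_density(upper: float = 50.0, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of g(x) f(x) dx over [0, upper]."""
    h = upper / steps
    return h * sum(g((k + 0.5) * h) * density((k + 0.5) * h) for k in range(steps))


def monte_carlo_expectation(n: int = 200_000, seed: int = 0) -> float:
    """Sample average of g(X) over n simulated Exponential(1) draws."""
    rng = random.Random(seed)
    return sum(g(rng.expovariate(1.0)) for _ in range(n)) / n


density_side = integral_against_density()
sample_side = monte_carlo_expectation()
# Both numbers should be close to the exact value E(X^2) = 2.
print(density_side, sample_side)
```

The quadrature side uses only the density f, which mirrors how statistics works with the induced measure and forgets about the underlying probability space. The simulation side is only accurate up to Monte Carlo error, so the two numbers agree approximately rather than exactly.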