Summary of Stat 531/532
Taught by Dr. Dennis Cox
Text: A work-in-progress by Dr. Cox, supplemented by Statistical Inference by Casella & Berger

Chapter 1: Probability Theory and Measure Spaces

1.1: Set Theory (Casella-Berger)
The set, $S$, of all possible outcomes of a particular experiment is called the sample space for the experiment.
An event is any collection of possible outcomes of an experiment (i.e. any subset of $S$).
$A$ and $B$ are disjoint (or mutually exclusive) if $A \cap B = \emptyset$. The events $A_1, A_2, \dots$ are pairwise disjoint if $A_i \cap A_j = \emptyset$ for all $i \neq j$.
If $A_1, A_2, \dots$ are pairwise disjoint and $\bigcup_{i=1}^{\infty} A_i = S$, then the collection $A_1, A_2, \dots$ forms a partition of $S$.

1.2: Probability Theory (Casella-Berger)
A collection of subsets of $S$ is called a Borel field (or sigma algebra), denoted by $\mathcal{B}$, if it satisfies the following three properties:
1. $\emptyset \in \mathcal{B}$
2. $A \in \mathcal{B} \Rightarrow A^c \in \mathcal{B}$
3. $A_1, A_2, \dots \in \mathcal{B} \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \mathcal{B}$
i.e. the empty set is contained in $\mathcal{B}$, $\mathcal{B}$ is closed under complementation, and $\mathcal{B}$ is closed under countable unions.
Given a sample space $S$ and an associated Borel field $\mathcal{B}$, a probability function is a function $P$ with domain $\mathcal{B}$ that satisfies:
1. $P(A) \geq 0$ for all $A \in \mathcal{B}$
2. $P(S) = 1$
3. If $A_1, A_2, \dots \in \mathcal{B}$ are pairwise disjoint, then $P\big(\bigcup_{i=1}^{\infty} A_i\big) = \sum_{i=1}^{\infty} P(A_i)$.
Axiom of Finite Additivity: If $A$ and $B$ are disjoint, then $P(A \cup B) = P(A) + P(B)$.

1.1: Measures (Cox)
A measure space is denoted $(\Omega, \mathcal{F}, \mu)$, where $\Omega$ is an underlying set, $\mathcal{F}$ is a sigma-algebra of subsets of $\Omega$, and $\mu$ is a measure, meaning that it satisfies:
1. $\mu(A) \geq 0$ for all $A \in \mathcal{F}$
2. $\mu(\emptyset) = 0$
3. If $A_1, A_2, \dots$ is a sequence of disjoint elements of $\mathcal{F}$, then $\mu\big(\bigcup_{i=1}^{\infty} A_i\big) = \sum_{i=1}^{\infty} \mu(A_i)$.
A measure $P$ with $P(\Omega) = 1$ is called a probability measure.
A measure on $(\mathbb{R}, \mathcal{B})$ is called a Borel measure.
The measure $\#$, where $\#(A)$ = number of elements in $A$, is called counting measure. Counting measure may be written in terms of unit point masses (UPM) as $\# = \sum_{i=1}^{\infty} \delta_{x_i}$, where $\delta_x(A) = 1$ if $x \in A$ and $0$ if $x \notin A$. A UPM measure is also a probability measure.
There is a unique Borel measure, $m$ (Lebesgue measure), satisfying $m([a,b]) = b - a$ for every finite closed interval $[a,b]$, $-\infty < a \leq b < \infty$.
Properties of Measures:
a) (monotonicity) If $A \subset B$, then $\mu(A) \leq \mu(B)$.
b) (subadditivity) $\mu\big(\bigcup_{i=1}^{\infty} A_i\big) \leq \sum_{i=1}^{\infty} \mu(A_i)$, where $A_1, A_2, \dots$ is any sequence of measurable sets.
c) (continuity from above) If $A_i$, $i = 1, 2, \dots$ is a decreasing sequence of measurable sets (i.e. $A_1 \supset A_2 \supset \cdots$) and $\mu(A_1) < \infty$, then $\lim_{i \to \infty} \mu(A_i) = \mu\big(\bigcap_{i=1}^{\infty} A_i\big)$.

1.2: Measurable Functions and Integration (Cox)
Properties of the Integral:
a) If $\int f \, d\mu$ exists and $a \in \mathbb{R}$, then $\int a f \, d\mu$ exists and equals $a \int f \, d\mu$.
b) If $\int f \, d\mu$ and $\int g \, d\mu$ both exist and $\int f \, d\mu + \int g \, d\mu$ is defined (i.e. not of the form $\infty - \infty$), then $\int (f + g) \, d\mu$ is defined and equals $\int f \, d\mu + \int g \, d\mu$.
c) If $f(\omega) \leq g(\omega)$ for all $\omega$, then $\int f \, d\mu \leq \int g \, d\mu$, provided the integrals exist.
d) $\left| \int f \, d\mu \right| \leq \int |f| \, d\mu$, provided $\int f \, d\mu$ exists.
A statement $S(\omega)$ holds $\mu$-almost everywhere ($\mu$-a.e.) iff there is a set $N \in \mathcal{F}$ with $\mu(N) = 0$ such that $S(\omega)$ is true for all $\omega \notin N$.
Consider extended Borel functions on $(\Omega, \mathcal{F}, \mu)$:
Monotone Convergence Theorem: If $f_n$ is an increasing sequence of nonnegative functions (i.e. $0 \leq f_1 \leq f_2 \leq \cdots \leq f_n \leq f_{n+1} \leq \cdots$) and $f(\omega) = \lim_n f_n(\omega)$, then $\lim_n \int f_n \, d\mu = \int f \, d\mu$.
Lebesgue's Dominated Convergence Theorem: If $f_n \to f$ $\mu$-a.e. as $n \to \infty$ and there is an integrable function $g$ with $|f_n| \leq g$ a.e. for all $n$, then $\lim_n \int f_n \, d\mu = \int f \, d\mu$. $g$ is called the dominating function.
Change of Variables Theorem: Suppose $f : (\Omega, \mathcal{F}, \mu) \to (\Lambda, \mathcal{G})$ and $g : (\Lambda, \mathcal{G}) \to \mathbb{R}$. Then $\int_{\Omega} g(f(\omega)) \, d\mu(\omega) = \int_{\Lambda} g(\lambda) \, d(\mu \circ f^{-1})(\lambda)$.
Interchange of Differentiation and Integration Theorem: For each fixed $\theta$, consider a function $g(\omega, \theta)$. If
1. the integral $\int g(\omega, \theta) \, d\mu(\omega)$ is finite,
2. the partial derivative $\partial g(\omega, \theta) / \partial \theta$ exists, and
3. the absolute value of the partial derivative is bounded by an integrable function $G(\omega)$ not depending on $\theta$,
then $\partial g(\omega, \theta) / \partial \theta$ is integrable w.r.t. $\mu$ and $\frac{d}{d\theta} \int g(\omega, \theta) \, d\mu(\omega) = \int \frac{\partial}{\partial \theta} g(\omega, \theta) \, d\mu(\omega)$.

1.3: Measures on Product Spaces (Cox)
A measure space $(\Omega, \mathcal{F}, \mu)$ is called $\sigma$-finite iff there is a sequence $\Omega_1, \Omega_2, \dots \in \mathcal{F}$ such that (i) $\mu(\Omega_i) < \infty$ for each $i$, and (ii) $\bigcup_{i=1}^{\infty} \Omega_i = \Omega$.
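The convergence theorems above are easiest to internalize with a small numerical check. Below is a minimal sketch (my own illustration, not from the notes) of the Monotone Convergence Theorem: $f(x) = 1/\sqrt{x}$ on $(0,1]$ is unbounded but Lebesgue integrable with $\int f\,dm = 2$, and the truncations $f_n = \min(f, n)$ increase to $f$, so their integrals $2 - 1/n$ should increase to 2.

```python
# Minimal numerical sketch of the Monotone Convergence Theorem (illustration only):
# f(x) = 1/sqrt(x) on (0, 1] truncated at height n; integrals of f_n increase to 2.
import numpy as np
from scipy.integrate import quad

def f(x):
    return 1.0 / np.sqrt(x)

for n in [1, 2, 5, 10, 100, 1000]:
    # f_n = min(f, n); the kink at x = 1/n^2 is passed to quad as a breakpoint
    fn = lambda x: min(f(x), n)
    integral, _ = quad(fn, 0.0, 1.0, points=[1.0 / n**2])
    print(f"n = {n:5d}   integral of f_n = {integral:.6f}   (limit = 2)")
```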
Product Measure Theorem: Let $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$ be $\sigma$-finite measure spaces. Then there is a unique product measure $\mu_1 \times \mu_2$ on $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \otimes \mathcal{F}_2)$ such that for all $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$, $(\mu_1 \times \mu_2)(A_1 \times A_2) = \mu_1(A_1)\,\mu_2(A_2)$.

Fubini's Theorem: Let $\Omega = \Omega_1 \times \Omega_2$, $\mathcal{F} = \mathcal{F}_1 \otimes \mathcal{F}_2$, and $\mu = \mu_1 \times \mu_2$, where $\mu_1$ and $\mu_2$ are $\sigma$-finite. If $f$ is a Borel function on $\Omega$ whose integral w.r.t. $\mu$ exists, then
$\int_{\Omega} f(\omega)\, d\mu(\omega) = \int_{\Omega_2} \left[ \int_{\Omega_1} f(\omega_1, \omega_2)\, d\mu_1(\omega_1) \right] d\mu_2(\omega_2)$.
Conclude that $\int_{\Omega_1} f(\omega_1, \omega_2)\, d\mu_1(\omega_1)$ exists $\mu_2$-a.e. and defines a Borel function of $\omega_2$ whose integral w.r.t. $\mu_2$ exists.

A function $X : (\Omega, \mathcal{F}) \to (\mathbb{R}^n, \mathcal{B}^n)$ is called an n-dimensional random vector (a.k.a. random n-vector). The distribution of $X$ is the same as the law of $X$ (i.e. $\mathrm{Law}(X) = P \circ X^{-1}$). $\mathrm{Law}(X) = \prod_{i=1}^{n} \mathrm{Law}(X_i)$ iff the $X_i$'s are independent.

1.4: Densities and The Radon-Nikodym Theorem (Cox)
Let $\nu$ and $\mu$ be measures on $(\Omega, \mathcal{F})$. $\nu$ is absolutely continuous w.r.t. $\mu$ (written $\nu \ll \mu$) iff for all $A \in \mathcal{F}$, $\mu(A) = 0 \Rightarrow \nu(A) = 0$. This can also be said as: $\mu$ dominates $\nu$ (i.e. $\mu$ is a dominating measure for $\nu$). If both measures dominate one another, then they are equivalent.
Radon-Nikodym Theorem: Let $(\Omega, \mathcal{F}, \mu)$ be a $\sigma$-finite measure space and suppose $\nu \ll \mu$. Then there is a nonnegative Borel function $f$ such that $\nu(A) = \int_A f \, d\mu$ for all $A \in \mathcal{F}$. Furthermore, $f$ is unique $\mu$-a.e.

1.5: Conditional Expectation (Cox)
Theorem 1.5.1: Suppose $Y : (\Omega, \mathcal{F}) \to (\Lambda, \mathcal{G})$ and $Z : (\Omega, \mathcal{F}) \to (\mathbb{R}^n, \mathcal{B}^n)$. Then $Z$ is $\sigma(Y)$-measurable iff there is a Borel function $h : (\Lambda, \mathcal{G}) \to (\mathbb{R}^n, \mathcal{B}^n)$ such that $Z = h(Y)$.
Let $X$ be a random variable on $(\Omega, \mathcal{F}, P)$ with $E|X| < \infty$, and suppose $\mathcal{G}$ is a sub-$\sigma$-field of $\mathcal{F}$. Then the conditional expectation of $X$ given $\mathcal{G}$, denoted $E[X \mid \mathcal{G}]$, is the essentially unique random variable $X^*$ satisfying
(i) $X^*$ is $\mathcal{G}$-measurable;
(ii) $\int_G X^* \, dP = \int_G X \, dP$ for all $G \in \mathcal{G}$.
Proposition 1.5.4: Suppose $X : (\Omega, \mathcal{F}) \to (\Lambda_1, \mathcal{G}_1)$ and $Y : (\Omega, \mathcal{F}) \to (\Lambda_2, \mathcal{G}_2)$ are random elements and $\mu_i$ is a $\sigma$-finite measure on $(\Lambda_i, \mathcal{G}_i)$ for $i = 1, 2$ such that $\mathrm{Law}(X, Y) \ll \mu_1 \times \mu_2$. Let $f_{X,Y}(x, y)$ denote the corresponding joint density, and let $g(x, y)$ be any Borel function with $E|g(X, Y)| < \infty$. Then
$E[g(X, Y) \mid Y] = \dfrac{\int g(x, Y)\, f_{X,Y}(x, Y)\, d\mu_1(x)}{\int f_{X,Y}(x, Y)\, d\mu_1(x)}$, a.s.
The conditional density of $X$ given $Y$ is $f_{X \mid Y}(x, y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)}$.
Proposition 1.5.5: Suppose the assumptions of Proposition 1.5.4 hold. Then the following are true: (i) the family of regular conditional distributions $\mathrm{Law}(X \mid Y = y)$ exists; (ii) $\mathrm{Law}(X \mid Y = y) \ll \mu_1$ for $\mathrm{Law}(Y)$-almost all values of $y$; (iii) the Radon-Nikodym derivative is given by $\dfrac{d\,\mathrm{Law}(X \mid Y = y)}{d\mu_1}(x) = f_{X \mid Y}(x, y)$, $\mu_1 \times \mu_2$-a.e.

Theorem 1.5.7 - Basic Properties of Conditional Expectation: Let $X$, $X_1$, and $X_2$ be integrable r.v.'s on $(\Omega, \mathcal{F}, P)$, and let $\mathcal{G}$ be a fixed sub-$\sigma$-field of $\mathcal{F}$.
(a) If $X = k$ a.s., $k$ a constant, then $E[X \mid \mathcal{G}] = k$ a.s.
(b) If $X_1 \leq X_2$ a.s., then $E[X_1 \mid \mathcal{G}] \leq E[X_2 \mid \mathcal{G}]$ a.s.
(c) If $a_1, a_2 \in \mathbb{R}$, then $E[a_1 X_1 + a_2 X_2 \mid \mathcal{G}] = a_1 E[X_1 \mid \mathcal{G}] + a_2 E[X_2 \mid \mathcal{G}]$ a.s.
(d) (Law of Total Expectation) $E\big\{ E[X \mid \mathcal{G}] \big\} = E[X]$.
(e) $E[X \mid \{\emptyset, \Omega\}] = E[X]$, i.e. conditioning on the trivial $\sigma$-field gives the ordinary expectation.
(f) If $X$ is $\mathcal{G}$-measurable, then $E[X \mid \mathcal{G}] = X$ a.s.
(g) (Law of Successive Conditioning) If $\mathcal{G}_0$ is a sub-$\sigma$-field of $\mathcal{G}$, then $E\big\{ E[X \mid \mathcal{G}] \mid \mathcal{G}_0 \big\} = E\big\{ E[X \mid \mathcal{G}_0] \mid \mathcal{G} \big\} = E[X \mid \mathcal{G}_0]$ a.s.
(h) If $X_1$ is $\mathcal{G}$-measurable and $X_1 X_2$ is integrable, then $E[X_1 X_2 \mid \mathcal{G}] = X_1 E[X_2 \mid \mathcal{G}]$ a.s.

Theorem 1.5.8 - Convergence Theorems for Conditional Expectation: Let $X, X_1, X_2, \dots$ be integrable r.v.'s on $(\Omega, \mathcal{F}, P)$, and let $\mathcal{G}$ be a fixed sub-$\sigma$-field of $\mathcal{F}$.
(a) (Monotone Convergence Theorem) If $0 \leq X_i \uparrow X$ a.s., then $E[X_i \mid \mathcal{G}] \uparrow E[X \mid \mathcal{G}]$ a.s.
(b) (Dominated Convergence Theorem) Suppose there is an integrable r.v. $Y$ such that $|X_i| \leq Y$ a.s. for all $i$, and suppose that $X_i \to X$ a.s. Then $E[X_i \mid \mathcal{G}] \to E[X \mid \mathcal{G}]$ a.s.

Two-Stage Experiment Theorem: Let $(\Lambda, \mathcal{G})$ be a measurable space and suppose $p : \mathcal{B}^n \times \Lambda \to [0, 1]$ satisfies the following:
(i) $p(B, \cdot)$ is Borel measurable for each fixed $B \in \mathcal{B}^n$;
(ii) $p(\cdot, y)$ is a Borel p.m. on $(\mathbb{R}^n, \mathcal{B}^n)$ for each fixed $y \in \Lambda$.
Let $\nu$ be any p.m. on $(\Lambda, \mathcal{G})$. Then there is a unique p.m. $P$ on $(\mathbb{R}^n \times \Lambda, \mathcal{B}^n \otimes \mathcal{G})$ with $P(B \times C) = \int_C p(B, y)\, d\nu(y)$ for all $B \in \mathcal{B}^n$ and $C \in \mathcal{G}$.
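The two-stage construction and the Law of Total Expectation (Theorem 1.5.7(d)) can be seen in a small simulation. The sketch below is my own example, not from the notes: draw $Y$ from a Gamma(2, scale 3) first-stage distribution, then $X \mid Y = y \sim$ Poisson$(y)$; since $E[X \mid Y] = Y$, the overall mean of $X$ should match $E[Y] = 6$.

```python
# Simulation sketch of the two-stage experiment and E[E[X|Y]] = E[X] (illustration only).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

y = rng.gamma(shape=2.0, scale=3.0, size=n)   # stage 1: Y ~ nu
x = rng.poisson(lam=y)                        # stage 2: X | Y = y ~ p(., y)

print("E[Y] ~", y.mean())                     # ~ 6
print("E[X] ~", x.mean())                     # ~ 6, matching E[E[X|Y]] = E[Y]
```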
Theorem 1.5.12 - Bayes Formula: Suppose $\Theta : (\Omega, \mathcal{F}) \to (\Lambda_2, \mathcal{G}_2)$ is a random element and let $\nu$ be a $\sigma$-finite measure on $(\Lambda_2, \mathcal{G}_2)$ with $\mathrm{Law}(\Theta) \ll \nu$. Denote the corresponding density by $\pi = \dfrac{d\,\mathrm{Law}(\Theta)}{d\nu}$. Let $\mu$ be a $\sigma$-finite measure on $(\Lambda_1, \mathcal{G}_1)$. Suppose that for each $\theta$ there is a given pdf w.r.t. $\mu$ denoted $f(\cdot \mid \theta)$, and let $X$ be a random element taking values in $\Lambda_1$ with $\dfrac{d\,\mathrm{Law}(X \mid \Theta = \theta)}{d\mu} = f(\cdot \mid \theta)$. Then there is a version of $\mathrm{Law}(\Theta \mid X = x)$ given by
$\dfrac{d\,\mathrm{Law}(\Theta \mid X = x)}{d\nu}(\theta) = \dfrac{f(x \mid \theta)\, \pi(\theta)}{\int_{\Lambda_2} f(x \mid \theta')\, \pi(\theta')\, d\nu(\theta')}$.

Chapter 2: Transformations and Expectations

2.1: Moments and Moment Inequalities (Cox)
A moment refers to an expectation of a random variable or a function of a random variable.
- The first moment is the mean.
- The kth moment is $E[X^k]$.
- The kth central moment is $E[(X - E[X])^k] = E[(X - \mu)^k]$.
- Finite second moments imply finite first moments (i.e. $E[X^2] < \infty \Rightarrow E|X| < \infty$).
Markov's Inequality: Suppose $X \geq 0$ a.s. Then for all $\epsilon > 0$, $P(X \geq \epsilon) \leq \dfrac{E[X]}{\epsilon}$.
Corollary to Markov's Inequality: $P(|X| \geq \epsilon) \leq \dfrac{E[X^2]}{\epsilon^2}$.
Chebyshev's Inequality: Suppose $X$ is a random variable with $E[X^2] < \infty$. Then for any $k > 0$, $P(|X - E[X]| \geq k\sigma_X) \leq \dfrac{1}{k^2}$.
Jensen's Inequality: Let $f$ be a convex function on a convex set $C \subset \mathbb{R}^n$ and suppose $X$ is a random n-vector with $E\|X\| < \infty$ and $X \in C$ a.s. Then $E[X] \in C$ and $f(E[X]) \leq E[f(X)]$. If $f$ is concave, the inequality reverses. If $f$ is strictly convex/concave (and $X$ is not a.s. constant), the inequality is strict.
Cauchy-Schwarz Inequality: $(E[XY])^2 \leq E[X^2]\, E[Y^2]$.
Theorem 2.1.6 (Spectral Decomposition of a Symmetric Matrix): Let $A$ be a symmetric matrix. Then there is an orthogonal matrix $U$ and a diagonal matrix $\Lambda$ such that $A = U \Lambda U^t$.
- The diagonal entries of $\Lambda$ are the eigenvalues of $A$.
- The columns of $U$ are the corresponding eigenvectors.

2.2: Characteristic and Moment Generating Functions (Cox)
The characteristic function (chf) of a random n-vector $X$ is the complex-valued function $\phi_X : \mathbb{R}^n \to \mathbb{C}$ given by $\phi_X(u) = E[e^{i u^t X}] = E[\cos(u^t X)] + i\, E[\sin(u^t X)]$.
The moment generating function (mgf) is defined as $M_X(u) = E[e^{u^t X}]$.
Theorem 2.2.1: Let $X$ be a random n-vector with chf $\phi_X$ and mgf $M_X$.
(a) (Continuity) $\phi_X$ is uniformly continuous on $\mathbb{R}^n$, and $M_X$ is continuous at every point $u$ such that $M_X(v) < \infty$ for all $v$ in a neighborhood of $u$.
(b) (Relation to moments) If $X$ is integrable, then the gradient $\nabla \phi_X = (\partial \phi_X / \partial u_1, \dots, \partial \phi_X / \partial u_n)^t$ is defined at $u = 0$ and equals $i\,E[X]$. Also, $X$ has finite second moments iff the Hessian $D^2 \phi_X(u)$ exists at $u = 0$, and then $D^2 \phi_X(0) = -E[X X^t]$.
(c) (Linear Transformation Formulae) Let $Y = AX + b$ for some $m \times n$ matrix $A$ and some m-vector $b$. Then for all $u \in \mathbb{R}^m$, $\phi_Y(u) = e^{i u^t b}\, \phi_X(A^t u)$ and $M_Y(u) = e^{u^t b}\, M_X(A^t u)$.
(d) (Uniqueness) If $Y$ is a random n-vector and $\phi_X(u) = \phi_Y(u)$ for all $u \in \mathbb{R}^n$, then $\mathrm{Law}(X) = \mathrm{Law}(Y)$. If both $M_X(u)$ and $M_Y(u)$ are defined and equal in a neighborhood of $0$, then $\mathrm{Law}(X) = \mathrm{Law}(Y)$.
(e) (Chf for Sums of Independent Random Variables) Suppose $X$ and $Y$ are independent random p-vectors and let $Z = X + Y$. Then $\phi_Z(u) = \phi_X(u)\, \phi_Y(u)$.
The cumulant generating function is $K(u) = \ln M_X(u)$.

2.3: Common Distributions Used in Statistics (Cox)
Location-Scale Families
- A family is a location family if $\mathrm{Law}_a(X) = \mathrm{Law}_0(X + a)$, i.e. $f_a(x) = f(x - a)$.
- A family is a scale family if $\mathrm{Law}_a(X) = \mathrm{Law}_1(aX)$ for $a > 0$, i.e. $f_a(x) = \frac{1}{a} f\!\left(\frac{x}{a}\right)$.
- A family is a location-scale family if $\mathrm{Law}_{a,b}(X) = \mathrm{Law}_{0,1}(aX + b)$, i.e. $f_{a,b}(x) = \frac{1}{a} f\!\left(\frac{x - b}{a}\right)$.
A class of transformations $T$ on $(\Lambda, \mathcal{G})$ is called a transformation group iff the following hold:
(i) Every $g \in T$ is measurable, $g : (\Lambda, \mathcal{G}) \to (\Lambda, \mathcal{G})$.
(ii) $T$ is closed under composition, i.e. if $g_1$ and $g_2$ are in $T$, then so is $g_1 \circ g_2$.
(iii) $T$ is closed under taking inverses, i.e. $g \in T \Rightarrow g^{-1} \in T$.
Let $T$ be a transformation group and let $\mathcal{P}$ be a family of probability measures on $(\Lambda, \mathcal{G})$. Then $\mathcal{P}$ is T-invariant iff $P \circ g^{-1} \in \mathcal{P}$ for every $P \in \mathcal{P}$ and every $g \in T$.

2.4: Distributional Calculations (Cox)
Jensen's Inequality for Conditional Expectation: if $g(\cdot, y)$ is a convex function for each $y$, then $g(E[X \mid Y = y], y) \leq E[g(X, y) \mid Y = y]$ for $\mathrm{Law}(Y)$-almost all $y$.

2.1: Distribution of Functions of a Random Variable (Casella-Berger)
Transformation for Monotone Functions:
Theorem 2.1.1 (for cdf): Let $X$ have cdf $F_X(x)$, let $Y = g(X)$, and let $\mathcal{X}$ and $\mathcal{Y}$ be defined as $\mathcal{X} = \{x : f_X(x) > 0\}$ and $\mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}$.
1. If $g$ is an increasing function on $\mathcal{X}$, then $F_Y(y) = F_X(g^{-1}(y))$ for $y \in \mathcal{Y}$.
2. If $g$ is a decreasing function on $\mathcal{X}$ and $X$ is a continuous random variable, then $F_Y(y) = 1 - F_X(g^{-1}(y))$ for $y \in \mathcal{Y}$.
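As a quick sanity check of Theorem 2.1.1 (my own example, not from the notes): with $X \sim$ Exponential(1) and the increasing map $g(x) = \sqrt{x}$, the cdf of $Y = g(X)$ should satisfy $F_Y(y) = F_X(g^{-1}(y)) = F_X(y^2)$.

```python
# Simulation sketch of Theorem 2.1.1 for an increasing transformation (illustration only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=100_000)
y = np.sqrt(x)                                  # Y = g(X), g increasing

for yy in [0.5, 1.0, 1.5]:
    empirical = np.mean(y <= yy)                # empirical F_Y(yy)
    theoretical = stats.expon.cdf(yy**2)        # F_X(g^{-1}(yy)) = F_X(yy^2)
    print(f"y = {yy}:  empirical F_Y = {empirical:.4f},  F_X(y^2) = {theoretical:.4f}")
```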
Theorem 2.1.2 (for pdf): Let $X$ have pdf $f_X(x)$ and let $Y = g(X)$, where $g$ is a monotone function. Let $\mathcal{X}$ and $\mathcal{Y}$ be defined as above. Suppose $f_X(x)$ is continuous on $\mathcal{X}$ and that $g^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$. Then the pdf of $Y$ is given by
$f_Y(y) = f_X\big(g^{-1}(y)\big) \left| \dfrac{d}{dy} g^{-1}(y) \right|$ for $y \in \mathcal{Y}$, and $f_Y(y) = 0$ otherwise.

The Probability Integral Transformation:
Theorem 2.1.4: Let $X$ have a continuous cdf $F_X(x)$ and define the random variable $Y = F_X(X)$. Then $Y$ is uniformly distributed on $(0, 1)$, i.e. $P(Y \leq y) = y$, $0 < y < 1$. (Interpretation: if $X$ is continuous and $Y$ is the cdf of $X$ evaluated at $X$, then $Y \sim U(0,1)$.)

2.2: Expected Values (Casella-Berger)
The expected value or mean of a random variable $g(X)$, denoted $E\,g(X)$, is
$E\,g(X) = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx$ if $X$ is continuous, and $E\,g(X) = \sum_x g(x) f_X(x)$ if $X$ is discrete.

2.3: Moments and Moment Generating Functions (Casella-Berger)
For each integer $n$, the nth moment of $X$ is $\mu_n' = E[X^n]$. The nth central moment of $X$ is $\mu_n = E[(X - \mu)^n]$.
- Mean = 1st moment of $X$
- Variance = 2nd central moment of $X$
The moment generating function of $X$ is $M_X(t) = E[e^{tX}]$.
NOTE: The nth moment is equal to the nth derivative of the mgf evaluated at $t = 0$, i.e. $E[X^n] = M_X^{(n)}(0) = \frac{d^n}{dt^n} M_X(t) \big|_{t=0}$.
Useful relationship: the Binomial Formula is $\sum_{x=0}^{n} \binom{n}{x} u^x v^{n-x} = (u + v)^n$.
Lemma 2.3.1: Let $a_1, a_2, \dots$ be a sequence of numbers converging to $a$. Then $\lim_{n \to \infty} \left(1 + \frac{a_n}{n}\right)^n = e^a$.

Chapter 3: Fundamental Concepts of Statistics

3.1: Basic Notions of Statistics (Cox)
Statistics is the art and science of gathering, analyzing, and making inferences from data.
Four parts to statistics:
1. Models
2. Inference
3. Decision Theory
4. Data analysis, model checking, robustness, study design, etc.

Statistical Models
A statistical model consists of three things:
1. a measurable space $(\Omega, \mathcal{F})$,
2. a collection of probability measures $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ on $(\Omega, \mathcal{F})$, and
3. a collection of possible observable random vectors.

Statistical Inference
(i) Point Estimation: Goal - estimate the true $\theta$ from the data.
(ii) Hypothesis Testing: Goal - choose between two hypotheses: the null or the alternative.
(iii) Interval & Set Estimation: Goal - find a set $S(x)$ such that it is likely that $\theta \in S(x)$.
$\mathrm{Bias}_\theta(\hat{\theta}) = E_\theta[\hat{\theta}] - \theta$; $\mathrm{MSE}_\theta(\hat{\theta}) = E_\theta[(\hat{\theta} - \theta)^2] = \mathrm{Bias}_\theta^2(\hat{\theta}) + \mathrm{Var}_\theta(\hat{\theta})$.

Decision Theory
Nature vs. the Statistician: Nature picks a value of $\theta$ and generates $X$ according to $P_\theta$. There is an action space $\mathcal{A}$ of allowable decisions/actions, and the Statistician must choose a decision rule $\delta$ from a class $\mathcal{D}$ of allowable decision rules. There is also a loss function $L(\theta, a)$ based on the Statistician's chosen action and Nature's true value.
Risk is expected loss, denoted $R(\theta, \delta) = E_\theta[L(\theta, \delta(X))]$.
A $\mathcal{D}$-optimal decision rule $\delta^*$ is one that satisfies $R(\theta, \delta^*) \leq R(\theta, \delta)$ for all $\theta$ and all $\delta \in \mathcal{D}$.

Bayesian Decision Theory
Suppose Nature chooses the parameter as well as the data at random. We know the distribution Nature uses for selecting $\theta$; it is $\pi$, the prior distribution. The goal is to minimize the Bayes risk:
$r(\pi, \delta) = \int R(\theta, \delta)\, d\pi(\theta) = \int\!\!\int L(\theta, \delta(x))\, f(x \mid \theta)\, d\mu(x)\, d\pi(\theta)$.
Define the posterior expected loss $\rho(x, a) = \int L(\theta, a)\, d\pi(\theta \mid x)$, where $a$ is any allowable action. We find $\delta^*$ by minimizing $\rho(x, a)$ pointwise: $\delta^*(x)$ satisfies $\rho(x, \delta^*(x)) \leq \rho(x, a)$ for all $a \in \mathcal{A}$. This then implies that $r(\pi, \delta^*) \leq r(\pi, \delta)$ for any other decision rule $\delta$.

3.2: Sufficient Statistics (Cox)
A statistic $T(X)$ is sufficient for $\theta$ iff the conditional distribution of $X$ given $T = t$ does not depend on $\theta$. Intuitively, a sufficient statistic contains all the information about $\theta$ that is in $X$.
A loss function $L(\theta, a)$ is convex if (i) the action space $\mathcal{A}$ is a convex set and (ii) for each $\theta$, $L(\theta, \cdot) : \mathcal{A} \to [0, \infty)$ is a convex function.
Rao-Blackwell Theorem: Let $X_1, X_2, \dots$ be iid random variables with pdf $f(x; \theta)$. Let $T$ be a sufficient statistic for $\theta$ and $U$ an unbiased estimator of $\theta$ which is not a function of $T$ alone. Set $\hat{\theta}(t) = E[U \mid T = t]$. Then we have that:
(1) The random variable $\hat{\theta} = \hat{\theta}(T)$ is a function of the sufficient statistic $T$ alone.
(2) $\hat{\theta}$ is an unbiased estimator of $\theta$.
(3) $\mathrm{Var}(\hat{\theta}) < \mathrm{Var}(U)$, provided $E[U^2] < \infty$.
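A simulation makes the Rao-Blackwell variance reduction concrete. The sketch below is my own example: for $X_1, \dots, X_n$ iid Bernoulli($p$), $U = X_1$ is unbiased for $p$, $T = \sum X_i$ is sufficient, and $E[U \mid T] = T/n$ (the sample mean); Rao-Blackwellizing should keep the mean at $p$ while shrinking the variance from $p(1-p)$ to $p(1-p)/n$.

```python
# Simulation sketch of the Rao-Blackwell theorem for Bernoulli data (illustration only).
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.3, 10, 100_000

x = rng.binomial(1, p, size=(reps, n))   # reps independent samples of size n
u = x[:, 0]                              # crude unbiased estimator U = X_1
rb = x.mean(axis=1)                      # Rao-Blackwellized estimator E[U|T] = T/n

print("mean of U      :", u.mean(),  "  var:", u.var(),  " (theory:", p*(1-p), ")")
print("mean of E[U|T] :", rb.mean(), "  var:", rb.var(), " (theory:", p*(1-p)/n, ")")
```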
Factorization Theorem: $T(X)$ is a sufficient statistic for $\theta$ iff $f(x \mid \theta) = g(T(x) \mid \theta)\, h(x)$ for some functions $g$ and $h$.
If you can put the distribution in the exponential-family form $f(x \mid \theta) = \exp\!\left[\sum_{i=1}^{k} \eta_i(\theta)\, T_i(x) - B(\theta)\right] h(x)$, then the $T_i(X)$'s are sufficient statistics.

Minimal Sufficient Statistic
If $m$ is the smallest number for which $T = (T_1, \dots, T_m)$, with $T_j = T_j(X_1, \dots, X_n)$, $j = 1, \dots, m$, is a sufficient statistic for $\theta = (\theta_1, \dots, \theta_m)$, then $T$ is called a minimal sufficient statistic for $\theta$.
Alternative definition: If $T$ is a sufficient statistic for a family $\mathcal{P}$, then $T$ is minimal sufficient iff for any other sufficient statistic $S$, $T = h(S)$ $\mathcal{P}$-a.s. for some function $h$; i.e. the minimal sufficient statistic can be written as a function of any other sufficient statistic.
Proposition 4.2.5: Let $\mathcal{P}$ be an exponential family on a Euclidean space with densities $f_\theta(x) = \exp[\eta(\theta)^t T(x) - B(\theta)]\, h(x)$, where $\eta(\theta)$ and $T(x)$ are p-dimensional. Suppose there exist $\theta_0, \theta_1, \dots, \theta_p$ such that the vectors $\eta(\theta_i) - \eta(\theta_0)$, $1 \leq i \leq p$, are linearly independent in $\mathbb{R}^p$. Then $T$ is minimal sufficient for $\theta$. In particular, if the exponential family is full rank, then $T$ is minimal sufficient.

3.3: Complete Statistics and Ancillary Statistics (Cox)
A statistic $T(X)$ is complete if for every function $g$, $E_\theta[g(T)] = 0$ for all $\theta$ implies $P_\theta(g(T) = 0) = 1$ for all $\theta$.
Example: Consider the Poisson family, $\mathcal{F} = \{f(\cdot\,;\lambda) : f(x; \lambda) = e^{-\lambda} \lambda^x / x!,\ \lambda > 0\}$, with sample space $\{0, 1, 2, \dots\}$. If $E_\lambda[g(X)] = e^{-\lambda} \sum_{x=0}^{\infty} g(x) \lambda^x / x! = 0$ for all $\lambda > 0$, then every coefficient of the power series must vanish, so $g(x) = 0$ for all $x$. So $\mathcal{F}$ is complete.
A statistic $V$ is ancillary iff $\mathrm{Law}_\theta(V)$ does not depend on $\theta$, i.e. $\mathrm{Law}_\theta(V) = \mathrm{Law}_{\theta_0}(V)$ for all $\theta$.
Basu's Theorem: Suppose $T$ is a complete and sufficient statistic and $V$ is ancillary for $\theta$. Then $T$ and $V$ are independent.
A statistic $V = V(X_1, \dots, X_n)$ is:
- Location invariant iff $V(X_1 + a, \dots, X_n + a) = V(X_1, \dots, X_n)$ for all $a$
- Location equivariant iff $V(X_1 + a, \dots, X_n + a) = V(X_1, \dots, X_n) + a$ for all $a$
- Scale invariant iff $V(aX_1, \dots, aX_n) = V(X_1, \dots, X_n)$ for all $a > 0$
- Scale equivariant iff $V(aX_1, \dots, aX_n) = a\,V(X_1, \dots, X_n)$ for all $a > 0$
PPN: If the pdf is a location family and $T$ is location invariant, then $T$ is also ancillary.
PPN: If the pdf is a scale family with iid observations and $T$ is scale invariant, then $T$ is also ancillary.
A sufficient statistic has ALL the information with respect to a parameter; an ancillary statistic has NO information with respect to a parameter.

3.1: Discrete Distributions (Casella-Berger)
Uniform, U($N_0$, $N_1$): $P(X = x \mid N_0, N_1) = \dfrac{1}{N_1 - N_0 + 1}$, $x = N_0, N_0 + 1, \dots, N_1$.
- puts equal mass on each outcome (i.e. x = 1, 2, ..., N when $N_0 = 1$, $N_1 = N$)
Hypergeometric: $P(X = x \mid N, M, K) = \dfrac{\binom{M}{x}\binom{N - M}{K - x}}{\binom{N}{K}}$, $x = 0, 1, \dots, K$.
- sampling without replacement; example: N balls, M of which are one color and N - M another, and you select a sample of size K
- restriction: $\max(0, K - (N - M)) \leq x \leq \min(M, K)$
Bernoulli: $X = 1$ with probability $p$ and $X = 0$ with probability $1 - p$, $0 \leq p \leq 1$.
- has only two possible outcomes
Binomial: $P(Y = y \mid n, p) = \binom{n}{y} p^y (1 - p)^{n - y}$, $y = 0, 1, 2, \dots, n$.
- the binomial is the sum of n iid Bernoulli trials, $Y = \sum_{i=1}^{n} X_i$
- counts the number of successes in a fixed number of Bernoulli trials
Binomial Theorem: For any real numbers $x$ and $y$ and integer $n \geq 0$, $(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}$.
Poisson: $P(X = x \mid \lambda) = \dfrac{e^{-\lambda} \lambda^x}{x!}$, $x = 0, 1, \dots$
- Assumption: for a small time interval, the probability of an arrival is proportional to the length of waiting time (i.e. waiting for a bus or customers); the longer one waits, the more likely it is a bus will show up
- other applications: spatial distributions (i.e. fish in a lake)
Poisson Approximation to the Binomial Distribution: If n is large and p is small, the Binomial(n, p) pmf is well approximated by the Poisson pmf with $\lambda = np$.
Negative Binomial: $P(X = x \mid r, p) = \binom{x - 1}{r - 1} p^r (1 - p)^{x - r}$, $x = r, r + 1, \dots$
- X = trial at which the rth success occurs
- counts the number of Bernoulli trials necessary to get a fixed number of successes (i.e. the number of trials needed to get r successes)
- built from independent Bernoulli(p) trials: there must be r - 1 successes in the first x - 1 trials, and then the rth success on trial x
- can also be viewed as Y, the number of failures before the rth success, where Y = X - r
- NB(r, p) converges to a Poisson($\lambda$) distribution as $r \to \infty$ and $p \to 1$ with $r(1 - p) \to \lambda$
Geometric: $P(X = x \mid p) = p(1 - p)^{x - 1}$, $x = 1, 2, \dots$
- same as NB(1, p)
- X = trial at which the first success occurs
- the distribution is "memoryless": $P(X > s \mid X > t) = P(X > s - t)$
  - Interpretation: the probability of needing more than s trials, given that the first t trials were all failures, is the same as the probability of needing more than s - t trials starting from scratch.
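The memoryless property is easy to verify numerically. The sketch below is my own example using scipy's geometric distribution, for which $P(X > k) = (1 - p)^k$, so $P(X > s \mid X > t)$ should equal $P(X > s - t)$.

```python
# Numerical sketch of the memoryless property of the geometric distribution (illustration only).
import numpy as np
from scipy import stats

p, t, s = 0.3, 4, 9
sf = stats.geom(p).sf                    # survival function P(X > k) = (1 - p)^k

conditional = sf(s) / sf(t)              # P(X > s | X > t)
shifted = sf(s - t)                      # P(X > s - t)
print("P(X > s | X > t) =", conditional)
print("P(X > s - t)     =", shifted)
print("equal:", np.isclose(conditional, shifted))
```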
3.2: Continuous Distributions (Casella-Berger)
Uniform, U(a, b): $f(x \mid a, b) = \dfrac{1}{b - a}$ if $x \in [a, b]$, and 0 otherwise.
- spreads mass uniformly over an interval
Gamma: $f(x \mid \alpha, \beta) = \dfrac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}$, $0 < x < \infty$, $\alpha > 0$, $\beta > 0$.
- $\alpha$ = shape parameter; $\beta$ = scale parameter
- Gamma Function: $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1} e^{-t}\, dt$; $\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha)$ for $\alpha > 0$; $\Gamma(n) = (n - 1)!$ for any integer $n > 0$; $\Gamma(1) = 1$; $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$
- if $\alpha$ is an integer, the gamma is related to the Poisson via $P(X \leq x) = P(Y \geq \alpha)$, where $Y \sim$ Poisson($x/\beta$)
- $\chi^2_p \sim$ Gamma$(\tfrac{p}{2}, 2)$
- Exponential($\beta$) ~ Gamma(1, $\beta$)
- If $X \sim$ exponential($\beta$), then $X^{1/\gamma} \sim$ Weibull($\gamma$, $\beta$)
Normal: $f(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}$, $-\infty < x < \infty$.
- symmetric, bell-shaped
- often used to approximate other distributions
Beta: $f(x \mid \alpha, \beta) = \dfrac{1}{B(\alpha, \beta)}\, x^{\alpha - 1}(1 - x)^{\beta - 1}$, $0 < x < 1$, $\alpha > 0$, $\beta > 0$.
- used to model proportions (since the domain is (0,1))
- Beta(1, 1) = U(0, 1)
Cauchy: $f(x \mid \theta) = \dfrac{1}{\pi}\,\dfrac{1}{1 + (x - \theta)^2}$, $-\infty < x < \infty$.
- symmetric, bell-shaped
- similar to the normal distribution, but the mean does not exist
- $\theta$ is the median
- the ratio of two independent standard normals is Cauchy
Lognormal: $f(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma\, x}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}$, $0 < x < \infty$.
- used when the logarithm of a random variable is normally distributed
- used in applications where the variable of interest is right-skewed
Double Exponential: $f(x \mid \mu, \sigma) = \dfrac{1}{2\sigma}\, e^{-|x - \mu|/\sigma}$, $-\infty < x < \infty$.
- reflects the exponential around its mean
- symmetric with "fat" tails
- has a peak (sharp, nondifferentiable point) at $x = \mu$

3.3: Exponential Families (Casella-Berger)
A pdf is an exponential family if it can be expressed as $f(x \mid \theta) = h(x)\, c(\theta) \exp\!\left[\sum_{i=1}^{k} w_i(\theta)\, t_i(x)\right]$.
This can also be written in natural (canonical) form as $f(x \mid \eta) = h(x)\, c^*(\eta) \exp\!\left[\sum_{i=1}^{k} \eta_i\, t_i(x)\right]$.
The set $\mathcal{H} = \left\{\eta = (\eta_1, \dots, \eta_k) : \int h(x) \exp\!\left[\sum_{i=1}^{k} \eta_i t_i(x)\right] dx < \infty\right\}$ is called the natural parameter space. $\mathcal{H}$ is convex.

3.4: Location and Scale Families (Casella-Berger)
SEE COX 2.3 FOR DEFINITIONS AND EXAMPLE PDFS FOR EACH FAMILY.
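To make the exponential-family form concrete, here is a quick check (my own example) that the $N(\mu, \sigma^2)$ density fits it with $h(x) = 1/\sqrt{2\pi}$, $t_1(x) = x^2$, $t_2(x) = x$, $w_1 = -1/(2\sigma^2)$, $w_2 = \mu/\sigma^2$, and $c(\theta) = e^{-\mu^2/(2\sigma^2)}/\sigma$.

```python
# Check that the normal pdf matches its exponential-family representation (illustration only).
import numpy as np
from scipy import stats

mu, sigma = 1.5, 2.0
x = np.linspace(-5, 8, 7)

h = 1.0 / np.sqrt(2 * np.pi)
c = np.exp(-mu**2 / (2 * sigma**2)) / sigma
w1, w2 = -1.0 / (2 * sigma**2), mu / sigma**2

expfam_pdf = h * c * np.exp(w1 * x**2 + w2 * x)
direct_pdf = stats.norm.pdf(x, loc=mu, scale=sigma)
print("max abs difference:", np.max(np.abs(expfam_pdf - direct_pdf)))  # ~ 0
```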
Chapter 4: Multiple Random Variables

4.1: Joint and Marginal Distributions (Casella-Berger)
An n-dimensional random vector is a function from a sample space $S$ into $\mathbb{R}^n$, n-dimensional Euclidean space.
Let $(X, Y)$ be a discrete bivariate random vector. Then the function $f(x, y)$ from $\mathbb{R}^2$ to $\mathbb{R}$ defined by $f(x, y) = f_{X,Y}(x, y) = P(X = x, Y = y)$ is called the joint probability mass function, or joint pmf, of $(X, Y)$.
Let $(X, Y)$ be a discrete bivariate random vector with joint pmf $f_{X,Y}(x, y)$. Then the marginal pmfs of $X$ and $Y$ are $f_X(x) = \sum_y f_{X,Y}(x, y)$ and $f_Y(y) = \sum_x f_{X,Y}(x, y)$.
On the continuous side, the joint probability density function, or joint pdf, is the function $f(x, y)$ satisfying $P((X, Y) \in A) = \int\!\!\int_A f(x, y)\, dx\, dy$. The continuous marginal pdfs are $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$ and $f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx$, $-\infty < x, y < \infty$.
Bivariate Fundamental Theorem of Calculus: $\dfrac{\partial^2}{\partial x\, \partial y} F(x, y) = f(x, y)$.

4.2: Conditional Distributions and Independence (Casella-Berger)
The conditional pmf/pdf of $Y$ given that $X = x$ is the function of $y$ denoted by $f(y \mid x)$ and defined as $f(y \mid x) = P(Y = y \mid X = x) = \dfrac{f(x, y)}{f_X(x)}$.
The expected value of $g(X, Y)$ is $E[g(X, Y)] = \int\!\!\int g(x, y) f(x, y)\, dx\, dy$ (with sums replacing integrals in the discrete case).
The conditional expected value of $g(Y)$ given $X = x$ is $E[g(Y) \mid x] = \sum_y g(y) f(y \mid x)$ in the discrete case and $E[g(Y) \mid x] = \int g(y) f(y \mid x)\, dy$ in the continuous case.
The conditional variance of $Y$ given $X = x$ is $\mathrm{Var}(Y \mid x) = E[Y^2 \mid x] - \big(E[Y \mid x]\big)^2$.
$f(x, y) = f_X(x)\, f_Y(y)$ for all $(x, y)$ iff $X$ and $Y$ are independent.
Theorem 4.2.1: Let $X$ and $Y$ be independent random variables. Then
(a) For any sets $A$ and $B$, $P(X \in A, Y \in B) = P(X \in A)\, P(Y \in B)$; i.e. $\{X \in A\}$ and $\{Y \in B\}$ are independent events.
(b) Let $g(x)$ be a function only of $x$ and $h(y)$ be a function only of $y$. Then $E[g(X)\, h(Y)] = E[g(X)]\, E[h(Y)]$.
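A quick simulation (my own example) of Theorem 4.2.1(b): for independent $X \sim N(0,1)$ and $Y \sim$ Exponential(1), with $g(x) = x^2$ and $h(y) = e^{-y}$, both sides should be about $E[X^2]\,E[e^{-Y}] = 1 \cdot \tfrac{1}{2} = 0.5$.

```python
# Simulation sketch of E[g(X)h(Y)] = E[g(X)]E[h(Y)] for independent X and Y (illustration only).
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.standard_normal(n)
y = rng.exponential(scale=1.0, size=n)

gx, hy = x**2, np.exp(-y)
print("E[g(X)h(Y)]     ~", np.mean(gx * hy))
print("E[g(X)] E[h(Y)] ~", np.mean(gx) * np.mean(hy))   # both ~ 0.5
```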
4.3: Bivariate Transformations (Casella-Berger)
Theorem 4.3.1: If $X \sim$ Poisson($\theta$) and $Y \sim$ Poisson($\lambda$) and $X$ and $Y$ are independent, then $X + Y \sim$ Poisson($\theta + \lambda$).
Jacobian: $J = \det\begin{pmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{pmatrix} = \dfrac{\partial x}{\partial u}\dfrac{\partial y}{\partial v} - \dfrac{\partial x}{\partial v}\dfrac{\partial y}{\partial u}$.
Bivariate Transformation: $f_{U,V}(u, v) = f_{X,Y}\big(h_1(u, v), h_2(u, v)\big)\, |J|$, where $x = h_1(u, v)$ and $y = h_2(u, v)$ is the inverse transformation.
Theorem 4.3.2: Let $X$ and $Y$ be independent random variables. Let $g(x)$ be a function only of $x$ and $h(y)$ be a function only of $y$. Then $U = g(X)$ and $V = h(Y)$ are independent.

4.4: Hierarchical Models and Mixture Distributions (Casella-Berger)
A hierarchical model is a series of nested distributions. A random variable $X$ has a mixture distribution if it depends on another random variable which itself has a distribution.
Useful Expectation Identities: $E[X] = E\big[E[X \mid Y]\big]$ and $\mathrm{Var}(X) = E\big[\mathrm{Var}(X \mid Y)\big] + \mathrm{Var}\big(E[X \mid Y]\big)$.

4.5: Covariance and Correlation (Casella-Berger)
The covariance of $X$ and $Y$ is $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$.
The correlation (or correlation coefficient) of $X$ and $Y$ is $\rho_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$.
Theorem 4.5.2: If $X$ and $Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$.
Theorem 4.5.3: If $X$ and $Y$ are random variables and $a$ and $b$ are constants, then $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$.
Definition of the bivariate normal pdf:
$f(x, y) = \dfrac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\!\left\{ -\dfrac{1}{2(1 - \rho^2)} \left[ \left(\dfrac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\dfrac{x - \mu_X}{\sigma_X}\right)\!\left(\dfrac{y - \mu_Y}{\sigma_Y}\right) + \left(\dfrac{y - \mu_Y}{\sigma_Y}\right)^2 \right] \right\}$.
Properties of the bivariate normal pdf:
1. The marginal distribution of $X$ is $N(\mu_X, \sigma_X^2)$.
2. The marginal distribution of $Y$ is $N(\mu_Y, \sigma_Y^2)$.
3. The correlation between $X$ and $Y$ is $\rho_{XY} = \rho$.
4. For any constants $a$ and $b$, the distribution of $aX + bY$ is $N(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho\sigma_X\sigma_Y)$.

4.6: Multivariate Distributions (Casella-Berger)
Joint pmf (discrete): $f(\mathbf{x}) = f(x_1, \dots, x_n) = P(X_1 = x_1, \dots, X_n = x_n)$ for each $(x_1, \dots, x_n) \in \mathbb{R}^n$; then $P(\mathbf{X} \in A) = \sum_{\mathbf{x} \in A} f(\mathbf{x})$.
Joint pdf (continuous): $P(\mathbf{X} \in A) = \int \cdots \int_A f(\mathbf{x})\, d\mathbf{x} = \int \cdots \int_A f(x_1, \dots, x_n)\, dx_1 \cdots dx_n$.
The marginal pmf or pdf of any subset of the coordinates $(X_1, \dots, X_n)$ can be computed by integrating or summing the joint pmf or pdf over all possible values of the other coordinates.
Conditional pdf or pmf: $f(x_{k+1}, \dots, x_n \mid x_1, \dots, x_k) = \dfrac{f(x_1, \dots, x_n)}{f(x_1, \dots, x_k)}$.
Let $n$ and $m$ be positive integers and let $p_1, \dots, p_n$ be numbers satisfying $0 \leq p_i \leq 1$, $i = 1, \dots, n$, and $\sum_{i=1}^{n} p_i = 1$. Then $(X_1, \dots, X_n)$ has a multinomial distribution with m trials and cell probabilities $p_1, \dots, p_n$ if the joint pmf (i.e. DISCRETE) is
$f(x_1, \dots, x_n) = \dfrac{m!}{x_1! \cdots x_n!}\, p_1^{x_1} \cdots p_n^{x_n} = m! \prod_{i=1}^{n} \dfrac{p_i^{x_i}}{x_i!}$
on the set of $(x_1, \dots, x_n)$ such that each $x_i$ is a nonnegative integer and $\sum_{i=1}^{n} x_i = m$.
Multinomial Theorem: Let $m$ and $n$ be positive integers and let $A$ be the set of vectors $(x_1, \dots, x_n)$ such that each $x_i$ is a nonnegative integer and $\sum_{i=1}^{n} x_i = m$. Then for any real numbers $p_1, \dots, p_n$,
$(p_1 + \cdots + p_n)^m = \sum_{\mathbf{x} \in A} \dfrac{m!}{x_1! \cdots x_n!}\, p_1^{x_1} \cdots p_n^{x_n}$.
Recall: when $\sum_i p_i = 1$, the Multinomial Theorem gives $1^m = 1$, confirming the multinomial pmf sums to 1.
Note: A linear combination of independent normal random variables is normally distributed.

4.7: Inequalities and Identities (Casella-Berger)
NUMERICAL INEQUALITIES:
Lemma 4.7.1: Consider $a > 0$ and $b > 0$ and let $p > 1$ and $q > 1$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Then $\frac{a^p}{p} + \frac{b^q}{q} \geq ab$, with equality only if $a^p = b^q$.
Hölder's Inequality: Let $X$ and $Y$ be random variables and let $p$ and $q$ satisfy Lemma 4.7.1. Then $E|XY| \leq \big(E|X|^p\big)^{1/p} \big(E|Y|^q\big)^{1/q}$.
When $p = q = 2$, this yields the Cauchy-Schwarz Inequality: $E|XY| \leq \sqrt{E[X^2]\, E[Y^2]}$.
Liapunov's Inequality: $\big(E|X|^r\big)^{1/r} \leq \big(E|X|^s\big)^{1/s}$ for $1 < r < s < \infty$.
NOTE: Expectations are not necessary; these inequalities can also be applied to numerical sums, such as this version of Hölder's Inequality: $\sum_i |a_i b_i| \leq \big(\sum_i |a_i|^p\big)^{1/p} \big(\sum_i |b_i|^q\big)^{1/q}$.
FUNCTIONAL INEQUALITIES:
Convex functions "hold water" (i.e. are bowl-shaped), whereas concave functions "spill water". Equivalently, if you connect two points on the graph with a line segment, a convex function lies below the segment and a concave function lies above it.
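A small simulation sketch (my own example) of Hölder's inequality with $p = 3$ and $q = 3/2$ (so $1/p + 1/q = 1$): $E|XY| \leq (E|X|^p)^{1/p}(E|Y|^q)^{1/q}$. The variables are deliberately made dependent; the inequality holds regardless.

```python
# Simulation sketch of Holder's inequality (illustration only).
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
x = rng.standard_normal(n)
y = np.abs(x) + rng.exponential(scale=1.0, size=n)   # dependent on x

p, q = 3.0, 1.5
lhs = np.mean(np.abs(x * y))
rhs = np.mean(np.abs(x)**p)**(1/p) * np.mean(np.abs(y)**q)**(1/q)
print("E|XY|                        ~", lhs)
print("(E|X|^p)^(1/p)(E|Y|^q)^(1/q) ~", rhs)
print("Holder holds:", lhs <= rhs)
```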
Jensen's Inequality: For any random variable $X$, if $g(x)$ is a convex (bowl-shaped) function, then $E[g(X)] \geq g(E[X])$.
Covariance Inequality: Let $X$ be a random variable and $g(x)$ and $h(x)$ any functions such that $E[g(X)]$, $E[h(X)]$, and $E[g(X)h(X)]$ exist. Then
(a) If $g(x)$ is a nondecreasing function and $h(x)$ is a nonincreasing function, then $E[g(X)h(X)] \leq E[g(X)]\, E[h(X)]$.
(b) If $g(x)$ and $h(x)$ are either both nondecreasing or both nonincreasing, then $E[g(X)h(X)] \geq E[g(X)]\, E[h(X)]$.
PROBABILITY INEQUALITY
Chebychev's Inequality: Let $X$ be a random variable and let $g(x)$ be a nonnegative function. Then for any $r > 0$, $P(g(X) \geq r) \leq \dfrac{E[g(X)]}{r}$. Taking $g(x) = (x - \mu)^2/\sigma^2$ and $r = k^2$ gives the familiar form $P(|X - \mu| \geq k\sigma) \leq \dfrac{1}{k^2}$.
IDENTITIES
Recursive Formula for the Poisson Distribution: $P(X = x) = \dfrac{\lambda}{x}\, P(X = x - 1)$, $x = 1, 2, \dots$, with $P(X = 0) = e^{-\lambda}$.
Theorem 4.7.1: Let $X_{\alpha,\beta}$ denote a gamma($\alpha$, $\beta$) random variable with pdf $f(x \mid \alpha, \beta)$, where $\alpha > 1$. Then for any constants $a$ and $b$,
$P(a < X_{\alpha,\beta} < b) = \beta\big(f(a \mid \alpha, \beta) - f(b \mid \alpha, \beta)\big) + P(a < X_{\alpha - 1, \beta} < b)$.
Stein's Lemma: Let $X \sim N(\mu, \sigma^2)$ and let $g$ be a differentiable function satisfying $E|g'(X)| < \infty$. Then $E[g(X)(X - \mu)] = \sigma^2\, E[g'(X)]$.
Theorem 4.7.2: Let $\chi^2_p$ denote a chi-squared random variable with p degrees of freedom. For any function $h(x)$, $E\big[h(\chi^2_p)\, \chi^2_p\big] = p\, E\big[h(\chi^2_{p+2})\big]$, provided the expectations exist.
Theorem 4.7.3 (Hwang): Let $g(x)$ be a function with $-\infty < E[g(X)] < \infty$ and $-\infty < g(-1) < \infty$. Then
(a) If $X \sim$ Poisson($\lambda$), then $E[\lambda\, g(X)] = E[X\, g(X - 1)]$.
(b) If $X \sim$ Negative Binomial($r$, $p$), then $E[(1 - p)\, g(X)] = E\!\left[\dfrac{X}{r + X - 1}\, g(X - 1)\right]$.
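Stein's Lemma is easy to check by simulation. The sketch below is my own example with $X \sim N(\mu, \sigma^2)$ and $g(x) = x^3$, so both sides should be approximately $\sigma^2\, E[3X^2]$ (about 60 for the parameters used).

```python
# Simulation sketch of Stein's lemma E[g(X)(X - mu)] = sigma^2 E[g'(X)] (illustration only).
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=2_000_000)

lhs = np.mean(x**3 * (x - mu))          # E[g(X)(X - mu)] with g(x) = x^3
rhs = sigma**2 * np.mean(3 * x**2)      # sigma^2 E[g'(X)]
print("E[g(X)(X-mu)]    ~", lhs)        # both ~ 60 for these parameters
print("sigma^2 E[g'(X)] ~", rhs)
```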