Mathematical Statistics Review

Summary of Stat 531/532
Taught by Dr. Dennis Cox
Text: A work-in-progress by Dr. Cox
Supplemented by Statistical Inference by Casella & Berger
Chapter 1: Probability Theory and Measure Spaces
1.1: Set Theory (Casella-Berger)
The set, S, of all possible outcomes of a particular experiment is called the sample space for the
experiment.
An event is any collection of possible outcomes of an experiment (i.e. any subset of S).
$A$ and $B$ are disjoint (or mutually exclusive) if $A \cap B = \emptyset$. The events $A_1, A_2, \ldots$ are pairwise disjoint if $A_i \cap A_j = \emptyset$ for all $i \neq j$.

If $A_1, A_2, \ldots$ are pairwise disjoint and $\bigcup_{i=1}^{\infty} A_i = S$, then the collection $A_1, A_2, \ldots$ forms a partition of $S$.
1.2: Probability Theory (Casella-Berger)
A collection of subsets of $S$ is called a Borel field (or sigma algebra), denoted by $\mathcal{B}$, if it satisfies the following three properties:
1. $\emptyset \in \mathcal{B}$
2. $A \in \mathcal{B} \Rightarrow A^c \in \mathcal{B}$
3. $A_1, A_2, \ldots \in \mathcal{B} \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \mathcal{B}$
i.e. the empty set is contained in $\mathcal{B}$, $\mathcal{B}$ is closed under complementation, and $\mathcal{B}$ is closed under countable unions.
Given a sample space $S$ and an associated Borel field $\mathcal{B}$, a probability function is a function $P$ with domain $\mathcal{B}$ that satisfies:
1. $P(A) \geq 0$ for all $A \in \mathcal{B}$
2. $P(S) = 1$
3. If $A_1, A_2, \ldots \in \mathcal{B}$ are pairwise disjoint, then $P\left( \bigcup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} P(A_i)$.
Axiom of Finite Additivity: If $A \in \mathcal{B}$ and $B \in \mathcal{B}$ are disjoint, then $P(A \cup B) = P(A) + P(B)$.
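As a quick illustration (mine, not part of the original notes), the axioms are easy to verify numerically on a finite sample space; the sketch below builds a probability function from equal point masses (the fair-die setup is a hypothetical example) and checks finite additivity.

```python
# A minimal sketch: a probability function on a finite sample space,
# represented by point masses that sum to 1 (fair six-sided die).
S = {1, 2, 3, 4, 5, 6}                 # sample space
mass = {s: 1 / 6 for s in S}           # equal point masses

def P(A):
    """Probability of an event A (any subset of S)."""
    return sum(mass[s] for s in A)

A, B = {1, 2}, {5, 6}                  # disjoint events
assert abs(P(A | B) - (P(A) + P(B))) < 1e-12   # finite additivity
assert abs(P(S) - 1.0) < 1e-12                 # P(S) = 1
```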
1.1: Measures (Cox)
A measure space is denoted as $(\Omega, \mathcal{F}, \mu)$, where $\Omega$ is an underlying set, $\mathcal{F}$ is a sigma-algebra, and $\mu$ is a measure, meaning that it satisfies:
1. $0 \leq \mu(A) \leq \infty$ for all $A \in \mathcal{F}$
2. $\mu(\emptyset) = 0$
3. If $A_1, A_2, \ldots$ is a sequence of disjoint elements of $\mathcal{F}$, then $\mu\left( \bigcup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} \mu(A_i)$.
A measure $P$ with $P(\Omega) = 1$ is called a probability measure.
A measure on $(\mathbb{R}, \mathcal{B})$ is called a Borel measure.
A measure, #, where #(A)=number of elements in A, is called a counting measure.
A counting measure may be written in terms of unit point masses (UPM) as $\# = \sum_i \delta_{x_i}$, where
$\delta_x(A) = 1$ if $x \in A$ and $\delta_x(A) = 0$ if $x \notin A$.
A UPM is itself a probability measure.
There is a unique Borel measure $m$ (Lebesgue measure) satisfying $m([a, b]) = b - a$ for every finite closed interval $[a, b]$, $-\infty < a \leq b < \infty$.
Properties of Measures:
a) (monotonicity) If $A \subseteq B$, then $\mu(A) \leq \mu(B)$.
b) (subadditivity) $\mu\left( \bigcup_{i=1}^{\infty} A_i \right) \leq \sum_{i=1}^{\infty} \mu(A_i)$, where $A_i$ is any sequence of measurable sets.
c) (continuity from above) If $A_i$, $i = 1, 2, \ldots$ is a decreasing sequence of measurable sets (i.e. $A_1 \supseteq A_2 \supseteq \cdots$) and if $\mu(A_1) < \infty$, then $\mu\left( \bigcap_{i=1}^{\infty} A_i \right) = \lim_{i \to \infty} \mu(A_i)$.
1.2: Measurable Functions and Integration (Cox)
Properties of the Integral:
a) If $\int f \, d\mu$ exists and $a \in \mathbb{R}$, then $\int a f \, d\mu$ exists and equals $a \int f \, d\mu$.
b) If $\int f \, d\mu$ and $\int g \, d\mu$ both exist and $\int f \, d\mu + \int g \, d\mu$ is defined (i.e. not of the form $\infty - \infty$), then $\int (f + g) \, d\mu$ is defined and equals $\int f \, d\mu + \int g \, d\mu$.
c) If $f(\omega) \leq g(\omega)$ for all $\omega \in \Omega$, then $\int f \, d\mu \leq \int g \, d\mu$, provided the integrals exist.
d) $\left| \int f \, d\mu \right| \leq \int |f| \, d\mu$, provided $\int f \, d\mu$ exists.
A statement $S$ holds $\mu$-almost everywhere ($\mu$-a.e.) iff there is a set $N \in \mathcal{F}$ with $\mu(N) = 0$ such that $S(\omega)$ is true for all $\omega \notin N$.
Consider all extended Borel functions on $(\Omega, \mathcal{F}, \mu)$:
Monotone Convergence Theorem: If $f_n$ is an increasing sequence of nonnegative functions (i.e. $0 \leq f_1 \leq f_2 \leq \cdots \leq f_n \leq f_{n+1} \leq \cdots$) and $f(\omega) = \lim_{n \to \infty} f_n(\omega)$, then $\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu$.
Lebesgue's Dominated Convergence Theorem: If $f_n \to f$ a.e. as $n \to \infty$ and there exists an integrable function $g$ such that for all $n$, $|f_n| \leq g$ a.e., then $\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu$.
$g$ is called the dominating function.
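A numerical illustration (mine, not the notes'): take $f_n(x) = x^{-1/2}\,1\{x > 1/n\}$ on $(0,1)$, which increases to $f(x) = x^{-1/2}$; the integrals $\int f_n \, dm = 2(1 - 1/\sqrt{n})$ climb to $\int f \, dm = 2$, as the Monotone Convergence Theorem requires. The scipy dependency is an assumption.

```python
# Monotone convergence check: f_n(x) = x^(-1/2) on (1/n, 1), zero elsewhere,
# increases to f(x) = x^(-1/2) on (0, 1); integrals should approach 2.
from scipy.integrate import quad

def f(x):
    return x ** -0.5

for n in [10, 100, 10_000]:
    integral_fn, _ = quad(f, 1 / n, 1)   # integral of f_n over (0, 1)
    print(n, integral_fn)                # 2(1 - 1/sqrt(n)) -> 2
```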
Change of Variables Theorem: Suppose $f : (\Omega, \mathcal{F}, \mu) \to (\Lambda, \mathcal{G})$ and $g : (\Lambda, \mathcal{G}) \to (\overline{\mathbb{R}}, \overline{\mathcal{B}})$. Then
$\int_{\Omega} g \circ f(\omega) \, d\mu(\omega) = \int_{\Lambda} g(\lambda) \, d(\mu \circ f^{-1})(\lambda)$.
Interchange of Differentiation and Integration Theorem: For each fixed $\theta$, consider a function $g(\omega, \theta)$. If
1. the integral $\int g(\omega, \theta) \, d\mu(\omega)$ is finite,
2. the partial derivative $\partial g(\omega, \theta) / \partial \theta$ exists, and
3. the absolute value of the partial derivative is bounded by an integrable function $G(\omega)$,
then $\partial g(\omega, \theta) / \partial \theta$ is integrable w.r.t. $\mu$ and
$\dfrac{d}{d\theta} \int g(\omega, \theta) \, d\mu(\omega) = \int \dfrac{\partial g(\omega, \theta)}{\partial \theta} \, d\mu(\omega)$.
1.3: Measures on Product Spaces (Cox)
A measure space $(\Omega, \mathcal{F}, \mu)$ is called $\sigma$-finite iff there is a sequence $\Omega_1, \Omega_2, \ldots \in \mathcal{F}$ such that
(i) $\mu(\Omega_i) < \infty$ for each $i$, and
(ii) $\Omega = \bigcup_{i=1}^{\infty} \Omega_i$.
Product Measure Theorem: Let $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$ be $\sigma$-finite measure spaces. Then there is a unique product measure $\mu_1 \times \mu_2$ on $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \otimes \mathcal{F}_2)$ such that for all $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$,
$(\mu_1 \times \mu_2)(A_1 \times A_2) = \mu_1(A_1)\, \mu_2(A_2)$.
Fubini's Theorem: Let $\Omega = \Omega_1 \times \Omega_2$, $\mathcal{F} = \mathcal{F}_1 \otimes \mathcal{F}_2$, and $\mu = \mu_1 \times \mu_2$, where $\mu_1$ and $\mu_2$ are $\sigma$-finite. If $f$ is a Borel function on $\Omega$ whose integral w.r.t. $\mu$ exists, then
$\int_{\Omega} f(\omega) \, d\mu(\omega) = \int_{\Omega_1 \times \Omega_2} f(\omega_1, \omega_2) \, d(\mu_1 \times \mu_2)(\omega_1, \omega_2) = \int_{\Omega_2} \int_{\Omega_1} f(\omega_1, \omega_2) \, d\mu_1(\omega_1) \, d\mu_2(\omega_2)$.
Conclude that $\int_{\Omega_1} f(\omega_1, \omega_2) \, d\mu_1(\omega_1)$ exists $\mu_2$-a.e. and defines a Borel function on $\Omega_2$ whose integral w.r.t. $\mu_2$ exists.
A function $X : (\Omega, \mathcal{F}, P) \to \mathbb{R}^n$ is called an n-dimensional random vector (a.k.a. random n-vector). The distribution of $X$ is the same as the law of $X$ (i.e. $P_X = \mathrm{Law}[X] = P \circ X^{-1}$).
$\mathrm{Law}[X] = \prod_{i=1}^{n} \mathrm{Law}[X_i] \iff$ the $X_i$'s are independent.
1.4: Densities and The Radon-Nikodym Theorem (Cox)
Let $\mu$ and $\nu$ be measures on $(\Omega, \mathcal{F})$. $\nu$ is absolutely continuous w.r.t. $\mu$ (i.e. $\nu \ll \mu$) iff for all $A \in \mathcal{F}$, $\mu(A) = 0 \Rightarrow \nu(A) = 0$. This can also be said as: $\mu$ dominates $\nu$ (i.e. $\mu$ is a dominating measure for $\nu$). If both measures dominate one another, then they are equivalent.
Radon-Nikodym Theorem: Let $(\Omega, \mathcal{F}, \mu)$ be a $\sigma$-finite measure space and suppose $\nu \ll \mu$. Then there is a nonnegative Borel function $f$ such that $\nu(A) = \int_A f \, d\mu$ for all $A \in \mathcal{F}$. Furthermore, $f$ is unique $\mu$-a.e.
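A concrete instance (not worked in the notes): the standard normal law $\nu$ is absolutely continuous w.r.t. Lebesgue measure $m$, with Radon-Nikodym derivative the normal density $\varphi$; the sketch below checks $\nu((a,b]) = \int_a^b \varphi \, dm$ on one interval, assuming scipy is available.

```python
# The standard normal law nu << Lebesgue measure m, with dnu/dm = phi.
# Check nu((a, b]) = integral of phi over (a, b] for one interval.
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1.0, 2.0
lhs = norm.cdf(b) - norm.cdf(a)      # nu((a, b]) via the cdf
rhs, _ = quad(norm.pdf, a, b)        # integral of the density
assert abs(lhs - rhs) < 1e-10
```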
1.5: Conditional Expectation (Cox)
Theorem 1.5.1: Suppose $Y : (\Omega, \mathcal{F}) \to (\Lambda, \mathcal{G})$ and $Z : (\Omega, \mathcal{F}) \to (\mathbb{R}^n, \mathcal{B}^n)$. Then $Z$ is $\sigma(Y)$-measurable iff there is a Borel function $h : (\Lambda, \mathcal{G}) \to (\mathbb{R}^n, \mathcal{B}^n)$ such that $Z = h(Y)$.
Let $X : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ be a random variable with $E|X| < \infty$, and suppose $\mathcal{G}$ is a sub-$\sigma$-field of $\mathcal{F}$. Then the conditional expectation of $X$ given $\mathcal{G}$, denoted $E[X \mid \mathcal{G}]$, is the essentially unique random variable $X^*$ (where $X^* = E[X \mid \mathcal{G}]$) satisfying
(i) $X^*$ is $\mathcal{G}$-measurable, and
(ii) $\int_G X^* \, dP = \int_G X \, dP$ for all $G \in \mathcal{G}$.
Proposition 1.5.4: Suppose $X : (\Omega, \mathcal{F}, P) \to (\Lambda_1, \mathcal{G}_1)$ and $Y : (\Omega, \mathcal{F}, P) \to (\Lambda_2, \mathcal{G}_2)$ are random elements and $\nu_i$ is a $\sigma$-finite measure on $(\Lambda_i, \mathcal{G}_i)$ for $i = 1, 2$ such that $\mathrm{Law}[X, Y] \ll \nu_1 \times \nu_2$. Let $f(x, y)$ denote the corresponding joint density. Let $g(x, y)$ be any Borel function with $E|g(X, Y)| < \infty$. Then
$E[g(X, Y) \mid Y] = \dfrac{\int g(x, Y)\, f(x, Y) \, d\nu_1(x)}{\int f(x, Y) \, d\nu_1(x)}$, a.s.
The conditional density of $X$ given $Y$ is denoted by $f_{X|Y}(x, y) = \dfrac{f(x, y)}{f_Y(y)}$.
Proposition 1.5.5: Suppose the assumptions of Proposition 1.5.4 hold. Then the following are true:
(i) the family of regular conditional distributions $\mathrm{Law}[X \mid Y = y]$ exists;
(ii) $\mathrm{Law}[X \mid Y = y] \ll \nu_1$ for $\mathrm{Law}[Y]$-almost all values of $y$;
(iii) the Radon-Nikodym derivative is given by
$\dfrac{d\,\mathrm{Law}[X \mid Y = y]}{d\nu_1}(x) = f_{X|Y}(x, y)$, $\nu_1 \times \nu_2$ a.e.
Theorem 1.5.7 - Basic Properties of Conditional Expectation: Let $X$, $X_1$, and $X_2$ be integrable r.v.'s on $(\Omega, \mathcal{F}, P)$, and let $\mathcal{G}$ be a fixed sub-$\sigma$-field of $\mathcal{F}$.
(a) If $X = k$ a.s., $k$ a constant, then $E[X \mid \mathcal{G}] = k$ a.s.
(b) If $X_1 \leq X_2$ a.s., then $E[X_1 \mid \mathcal{G}] \leq E[X_2 \mid \mathcal{G}]$ a.s.
(c) If $a_1, a_2 \in \mathbb{R}$, then $E[a_1 X_1 + a_2 X_2 \mid \mathcal{G}] = a_1 E[X_1 \mid \mathcal{G}] + a_2 E[X_2 \mid \mathcal{G}]$ a.s.
(d) (Law of Total Expectation) $E\big[ E[X \mid \mathcal{G}] \big] = E[X]$.
(e) $E[X \mid \{\emptyset, \Omega\}] = E[X]$.
(f) If $\sigma(X) \subseteq \mathcal{G}$, then $E[X \mid \mathcal{G}] = X$ a.s.
(g) (Law of Successive Conditioning) If $\mathcal{G}_0$ is a sub-$\sigma$-field of $\mathcal{G}$, then
$E\big[ E[X \mid \mathcal{G}] \mid \mathcal{G}_0 \big] = E\big[ E[X \mid \mathcal{G}_0] \mid \mathcal{G} \big] = E[X \mid \mathcal{G}_0]$ a.s.
(h) If $\sigma(X_1) \subseteq \mathcal{G}$ and $E|X_1 X_2| < \infty$, then $E[X_1 X_2 \mid \mathcal{G}] = X_1\, E[X_2 \mid \mathcal{G}]$ a.s.
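A Monte Carlo sanity check of property (d) (my example; the Exp/Normal hierarchy and seed are hypothetical choices): in the model $Y \sim \mathrm{Exp}(1)$, $X \mid Y \sim N(Y, 1)$, we have $E[X \mid Y] = Y$ exactly, so both $E[X]$ and $E\big[E[X \mid Y]\big]$ should come out near $E[Y] = 1$.

```python
# Law of total expectation, E[E[X | Y]] = E[X], checked by simulation.
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(1.0, size=10**6)       # Y ~ Exp(1)
x = rng.normal(loc=y, scale=1.0)           # X | Y ~ N(Y, 1)

print(x.mean())      # E[X], approximately 1
print(y.mean())      # E[E[X | Y]] = E[Y], approximately 1
```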
Theorem 1.5.8 - Convergence Theorems for Conditional Expectation: Let $X, X_1, X_2, \ldots$ be integrable r.v.'s on $(\Omega, \mathcal{F}, P)$, and let $\mathcal{G}$ be a fixed sub-$\sigma$-field of $\mathcal{F}$.
(a) (Monotone Convergence Theorem) If $0 \leq X_i \uparrow X$ a.s., then $E[X_i \mid \mathcal{G}] \uparrow E[X \mid \mathcal{G}]$ a.s.
(b) (Dominated Convergence Theorem) Suppose there is an integrable r.v. $Y$ such that $|X_i| \leq Y$ a.s. for all $i$, and suppose that $X_i \to X$ a.s. Then $E[X_i \mid \mathcal{G}] \to E[X \mid \mathcal{G}]$ a.s.
Two-Stage Experiment Theorem: Let $(\Lambda, \mathcal{G})$ be a measurable space and suppose $p : \mathcal{B}^n \times \Lambda \to [0, 1]$ satisfies the following:
(i) $p(B, \cdot)$ is Borel measurable for each fixed $B \in \mathcal{B}^n$;
(ii) $p(\cdot, y)$ is a Borel p.m. for each fixed $y \in \Lambda$.
Let $\nu$ be any p.m. on $(\Lambda, \mathcal{G})$. Then there is a unique p.m. $P$ on $(\mathbb{R}^n \times \Lambda, \mathcal{B}^n \otimes \mathcal{G})$ such that
$P(B \times C) = \int_C p(B, y) \, d\nu(y)$ for all $B \in \mathcal{B}^n$ and $C \in \mathcal{G}$.
Theorem 1.5.12 - Bayes Formula: Suppose $\Theta : (\Omega, \mathcal{F}, P) \to (\Lambda_2, \mathcal{G}_2)$ is a random element and let $\nu$ be a $\sigma$-finite measure on $(\Lambda_2, \mathcal{G}_2)$ with $\mathrm{Law}[\Theta] \ll \nu$. Denote the corresponding density by
$\pi(\theta) = \dfrac{d\,\mathrm{Law}[\Theta]}{d\nu}(\theta)$.
Let $\mu$ be a $\sigma$-finite measure on $(\Lambda_1, \mathcal{G}_1)$. Suppose that for each $\theta$ there is a given pdf w.r.t. $\mu$ denoted $f(\cdot \mid \theta)$. Denote by $X$ a random element taking values in $\Lambda_1$ with
$\dfrac{d\,\mathrm{Law}[X \mid \Theta = \theta]}{d\mu} = f(\cdot \mid \theta)$. Then there is a version of $\mathrm{Law}[\Theta \mid X = x]$ given by
$\pi(\theta \mid x) = \dfrac{d\,\mathrm{Law}[\Theta \mid X = x]}{d\nu}(\theta) = \dfrac{f(x \mid \theta)\, \pi(\theta)}{\int f(x \mid \theta')\, \pi(\theta') \, d\nu(\theta')}$.
Chapter 2: Transformations and Expectations
2.1: Moments and Moment Inequalities (Cox)
A moment refers to an expectation of a random variable or a function of a random variable.
- The first moment is the mean, $E[X]$.
- The kth moment is $E[X^k]$.
- The kth central moment is $E[(X - E[X])^k] = E[(X - \mu)^k]$.
Finite second moments $\Rightarrow$ finite first moments (i.e. $E[X^2] < \infty \Rightarrow E|X| < \infty$).
Markov's Inequality: Suppose $X \geq 0$ a.s. Then for all $\epsilon > 0$, $P(X \geq \epsilon) \leq \dfrac{E[X]}{\epsilon}$.
Corollary to Markov's Inequality: $P(|X| \geq \epsilon) \leq \dfrac{E[X^2]}{\epsilon^2}$.
Chebyshev's Inequality: Suppose $X$ is a random variable with $E[X^2] < \infty$. Then for any $k > 0$,
$P\left( |X - E[X]| \geq k\sigma \right) \leq \dfrac{1}{k^2}$, where $\sigma^2 = \mathrm{Var}(X)$.
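Both bounds are easy to see in simulation (an illustrative sketch; the Exp(1) choice is mine): for $X \sim \mathrm{Exp}(1)$, $P(X \geq 3) \approx 0.050$ against the Markov bound $1/3$, and the Chebyshev bound is similarly conservative.

```python
# Monte Carlo check of Markov's and Chebyshev's inequalities for X ~ Exp(1).
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(1.0, size=10**6)   # mean 1, sd 1

eps, k = 3.0, 3.0
print((x >= eps).mean(), "<=", x.mean() / eps)            # Markov
print((np.abs(x - x.mean()) >= k * x.std()).mean(),
      "<=", 1 / k**2)                                     # Chebyshev
```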
Jensen's Inequality: Let $f$ be a convex function on a convex set $C \subseteq \mathbb{R}^n$ and suppose $X$ is a random n-vector with $E|X| < \infty$ and $X \in C$ a.s. Then $E[X] \in C$ and $f(E[X]) \leq E[f(X)]$.
- If $f$ is concave, the inequality reverses.
- If $f$ is strictly convex/concave and $X$ is nondegenerate, the inequality is strict.
Cauchy-Schwarz Inequality: $\left( E|XY| \right)^2 \leq E[X^2]\, E[Y^2]$.
Theorem 2.1.6 (Spectral Decomposition of a Symmetric Matrix): Let $A$ be a symmetric matrix. Then there is an orthogonal matrix $U$ and a diagonal matrix $\Lambda$ such that $A = U \Lambda U^t$.
- The diagonal entries of $\Lambda$ are the eigenvalues of $A$.
- The columns of $U$ are the corresponding eigenvectors.
2.2: Characteristic and Moment Generating Functions (Cox)
The characteristic function (chf) of a random n-vector $X$ is the complex-valued function $\phi_X : \mathbb{R}^n \to \mathbb{C}$ given by
$\phi_X(u) = E\left[ e^{i u^t X} \right] = E\left[ \cos(u^t X) \right] + i\, E\left[ \sin(u^t X) \right]$.
The moment generating function (mgf) is defined as $M_X(u) = E\left[ e^{u^t X} \right]$.
Theorem 2.2.1: Let $X$ be a random n-vector with chf $\phi_X$ and mgf $M_X$.
(a) (Continuity) $\phi_X$ is uniformly continuous on $\mathbb{R}^n$, and $M_X$ is continuous at every point $u$ such that $M_X(v) < \infty$ for all $v$ in a neighborhood of $u$.
(b) (Relation to moments) If $X$ is integrable, then the gradient $\nabla \phi_X = \left( \dfrac{\partial \phi_X}{\partial u_1}, \dfrac{\partial \phi_X}{\partial u_2}, \ldots, \dfrac{\partial \phi_X}{\partial u_n} \right)$ is defined at $u = 0$ and equals $i\,E[X]$. Also, $X$ has finite second moments iff the Hessian $D^2 \phi_X(u)$ of $\phi_X$ exists at $u = 0$, and then $D^2 \phi_X(0) = -E[X X^t]$.
(c) (Linear Transformation Formulae) Let $Y = AX + b$ for some $m \times n$ matrix $A$ and some m-vector $b$. Then for all $u \in \mathbb{R}^m$,
$\phi_Y(u) = e^{i u^t b}\, \phi_X(A^t u)$ and $M_Y(u) = e^{u^t b}\, M_X(A^t u)$.
(d) (Uniqueness) If $X$ and $Y$ are random n-vectors and $\phi_X(u) = \phi_Y(u)$ for all $u \in \mathbb{R}^n$, then $\mathrm{Law}[X] = \mathrm{Law}[Y]$. If both $M_X(u)$ and $M_Y(u)$ are defined and equal in a neighborhood of $0$, then $\mathrm{Law}[X] = \mathrm{Law}[Y]$.
(e) (Chf for Sums of Independent Random Variables) Suppose $X$ and $Y$ are independent random p-vectors and let $Z = X + Y$. Then $\phi_Z(u) = \phi_X(u)\, \phi_Y(u)$.
The cumulant generating function is $K(u) = \ln M_X(u)$.
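A numeric check of part (b) in one dimension (my sketch, with hypothetical parameter values): for $X \sim N(\mu, \sigma^2)$ the mgf is $M_X(t) = e^{\mu t + \sigma^2 t^2/2}$, and a central difference of the Monte Carlo mgf at $t = 0$ recovers $E[X] = \mu$.

```python
# mgf of a normal: Monte Carlo vs. closed form, and M'(0) = E[X].
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.5, 2.0
x = rng.normal(mu, sigma, size=10**6)

t = 0.3
print(np.exp(t * x).mean())                          # Monte Carlo E[e^(tX)]
print(np.exp(mu * t + 0.5 * sigma**2 * t**2))        # closed form

h = 1e-4                                             # derivative at t = 0
print((np.exp(h * x).mean() - np.exp(-h * x).mean()) / (2 * h))  # ~ mu
```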
2.3: Common Distributions Used in Statistics (Cox)
Location-Scale Families
A family of distributions is a location family if $\mathrm{Law}_a[X] = \mathrm{Law}_0[X + a]$,
- i.e. $f_a(x) = f(x - a)$.
A family is a scale family if $\mathrm{Law}_a[X] = \mathrm{Law}_0[aX]$,
- i.e. $f_a(x) = \frac{1}{a} f\left( \frac{x}{a} \right)$.
A family is a location-scale family if $\mathrm{Law}_{a,b}[X] = \mathrm{Law}_0[aX + b]$,
- i.e. $f_{a,b}(x) = \frac{1}{a} f\left( \frac{x - b}{a} \right)$.
A class of transformations $T$ on $(\Omega, \mathcal{G})$ is called a transformation group iff the following hold:
(i) Every $g \in T$ is measurable, $g : (\Omega, \mathcal{G}) \to (\Omega, \mathcal{G})$.
(ii) $T$ is closed under composition, i.e. if $g_1$ and $g_2$ are in $T$, then so is $g_1 \circ g_2$.
(iii) $T$ is closed under taking inverses, i.e. $g \in T \Rightarrow g^{-1} \in T$.
Let $T$ be a transformation group and let $\mathcal{P}$ be a family of probability measures on $(\Omega, \mathcal{G})$. Then $\mathcal{P}$ is $T$-invariant iff $P \circ g^{-1} \in \mathcal{P}$ for every $P \in \mathcal{P}$ and $g \in T$.
2.4: Distributional Calculations (Cox)
Jensen's Inequality for Conditional Expectation: If $g(\cdot, y)$ is a convex function, then $g\left( E[X \mid Y = y],\, y \right) \leq E\left[ g(X, y) \mid Y = y \right]$, $\mathrm{Law}[Y]$-a.s.
2.1: Distribution of Functions of a Random Variable (Casella-Berger)
Transformation for Monotone Functions:
Theorem 2.1.1 (for cdf): Let $X$ have cdf $F_X(x)$, let $Y = g(X)$, and let $\mathcal{X}$ and $\mathcal{Y}$ be defined as
$\mathcal{X} = \{ x : f_X(x) > 0 \}$ and $\mathcal{Y} = \{ y : y = g(x) \text{ for some } x \in \mathcal{X} \}$.
1. If $g$ is an increasing function on $\mathcal{X}$, then $F_Y(y) = F_X(g^{-1}(y))$ for $y \in \mathcal{Y}$.
2. If $g$ is a decreasing function on $\mathcal{X}$ and $X$ is a continuous random variable, then $F_Y(y) = 1 - F_X(g^{-1}(y))$ for $y \in \mathcal{Y}$.
Theorem 2.1.2 (for pdf): Let $X$ have pdf $f_X(x)$ and let $Y = g(X)$, where $g$ is a monotone function. Let $\mathcal{X}$ and $\mathcal{Y}$ be defined as above. Suppose $f_X(x)$ is continuous on $\mathcal{X}$ and that $g^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$. Then the pdf of $Y$ is given by
$f_Y(y) = \begin{cases} f_X\left( g^{-1}(y) \right) \left| \frac{d}{dy} g^{-1}(y) \right| & \text{for } y \in \mathcal{Y} \\ 0 & \text{otherwise.} \end{cases}$
The Probability Integral Transformation:
Theorem 2.1.4: Let $X$ have a continuous cdf $F_X(x)$ and define the random variable $Y$ as $Y = F_X(X)$. Then $Y$ is uniformly distributed on $(0, 1)$, i.e. $P(Y \leq y) = y$, $0 < y < 1$.
(Interpretation: if $X$ is continuous and $Y$ is the cdf of $X$ evaluated at $X$, then $Y \sim U(0, 1)$.)
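The transform is easy to demonstrate by simulation (an illustrative sketch assuming scipy): for $X \sim \mathrm{Exp}(1)$, $F_X(x) = 1 - e^{-x}$, and $Y = F_X(X)$ should pass a Kolmogorov-Smirnov test against $U(0, 1)$.

```python
# Probability integral transform: X ~ Exp(1) gives Y = 1 - exp(-X) ~ U(0, 1).
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=10**5)
y = 1.0 - np.exp(-x)                 # F_X(X) for the Exp(1) cdf

print(kstest(y, "uniform"))          # large p-value: consistent with U(0, 1)
```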
2.2: Expected Values (Casella-Berger)
The expected value or mean of a random variable $g(X)$, denoted $E\,g(X)$, is
$E\,g(X) = \begin{cases} \int_{-\infty}^{\infty} g(x)\, f_X(x) \, dx & \text{if } X \text{ is continuous} \\ \sum_x g(x)\, f_X(x) & \text{if } X \text{ is discrete.} \end{cases}$
2.3: Moments and Moment Generating Functions (Casella-Berger)
For each integer $n$, the nth moment of $X$, $\mu_n'$, is $\mu_n' = E[X^n]$. The nth central moment of $X$ is $\mu_n = E\left[ (X - \mu)^n \right]$.
- Mean = 1st moment of $X$
- Variance = 2nd central moment of $X$
The moment generating function of $X$ is $M_X(t) = E\left[ e^{tX} \right]$.
NOTE: The nth moment is equal to the nth derivative of the mgf evaluated at $t = 0$, i.e.
$M_X^{(n)}(0) = \dfrac{d^n}{dt^n} M_X(t) \Big|_{t=0} = E[X^n]$.
Useful relationship: the Binomial Formula is $\sum_{x=0}^{n} \binom{n}{x} u^x v^{n-x} = (u + v)^n$.
Lemma 2.3.1: Let $a_1, a_2, \ldots$ be a sequence of numbers converging to $a$. Then
$\lim_{n \to \infty} \left( 1 + \dfrac{a_n}{n} \right)^n = e^a$.
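A quick numeric check (mine; $a_n = 2 + 1/n$ is an arbitrary sequence converging to $a = 2$):

```python
# Lemma 2.3.1 numerically: (1 + a_n/n)^n -> e^a with a_n = 2 + 1/n -> 2.
import math

for n in [10, 1_000, 100_000]:
    a_n = 2 + 1 / n
    print(n, (1 + a_n / n) ** n)     # approaches e^2

print(math.exp(2))                   # 7.389...
```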
Chapter 3: Fundamental Concepts of Statistics
3.1: Basic Notations of Statistics (Cox)
Statistics is the art and science of gathering, analyzing, and making inferences from data.
Four parts to statistics:
1. Models
2. Inference
3. Decision Theory
4. Data Analysis, model checking, robustness, study design, etc…
Statistical Models
A statistical model consists of three things:
1. a measurable space $(\Omega, \mathcal{F})$,
2. a collection of probability measures $\mathcal{P}$ on $(\Omega, \mathcal{F})$, and
3. a collection of possible observable random vectors $X$.
Statistical Inference
(i) Point Estimation: Goal – estimate the true $\theta$ from the data.
(ii) Hypothesis Testing: Goal – choose between two hypotheses: the null or the alternative.
(iii) Interval & Set Estimation: Goal – find $S(x)$ such that it's likely true that $\theta \in S(x)$.
$\mathrm{MSE}(\theta, \hat{\theta}) = E_\theta\left[ (\hat{\theta} - \theta)^2 \right]$
$\mathrm{Bias}(\theta, \hat{\theta}) = E_\theta[\hat{\theta}] - \theta$
$\mathrm{MSE}(\theta, \hat{\theta}) = \mathrm{Bias}^2(\theta, \hat{\theta}) + \mathrm{Var}(\hat{\theta})$

Decision Theory
Nature vs. the Statistician:
Nature picks a value of $\theta$ and generates $X \sim P_\theta$.
There exists an action space of allowable decisions/actions, and the Statistician must choose a decision rule $\delta$ from a class $\mathcal{D}$ of allowable decision rules.
There is also a loss function $L(\theta, a)$ based on the Statistician's chosen action and Nature's true value.
Risk is expected loss, denoted $R(\theta, \delta) = E_\theta\left[ L(\theta, \delta(X)) \right]$.
A $\mathcal{D}$-optimal decision rule $\delta^*$ is one that satisfies $R(\theta, \delta^*) \leq R(\theta, \delta)$ for all $\theta$ and all $\delta \in \mathcal{D}$.
Bayesian Decision Theory
Suppose Nature chooses the parameter as well as the data at random.
We know the distribution Nature uses for selecting $\theta$: it is $\pi$, the prior distribution.
The goal is to minimize the Bayes risk: $r(\delta) = \int R(\theta, \delta) \, d\pi(\theta)$. Expanding,
$r(\delta) = \int R(\theta, \delta) \, d\pi(\theta) = \int \int L(\theta, \delta(x))\, f(x \mid \theta) \, d\mu(x) \, d\pi(\theta) = \int \rho(x, \delta(x)) \, dm(x)$,
where $m$ is the marginal distribution of $X$ and $\rho(x, a)$ is the posterior expected loss of taking action $a$ ($a$ = any allowable action).
We want to find $\delta^*$ by minimizing $\rho(x, a)$ pointwise in $x$: choose $\delta^*(x)$ so that $\rho(x, \delta^*(x)) \leq \rho(x, a)$ for every allowable action $a$.
This then implies that $r(\delta^*) \leq r(\delta)$ for any other decision rule $\delta$.
3.2: Sufficient Statistics (Cox)
A statistic $T = T(X)$ is sufficient for $\theta$ iff the conditional distribution of $X$ given $T = t$ does not depend on $\theta$. Intuitively, a sufficient statistic contains all the information about $\theta$ that is in $X$.
A loss function $L(\theta, a)$ is convex if (i) the action space $A$ is a convex set and (ii) for each $\theta$, $L(\theta, \cdot) : A \to [0, \infty)$ is a convex function.
Rao-Blackwell Theorem: Let $X_1, X_2, \ldots$ be iid random variables with pdf $f(\cdot\,; \theta)$. Let $T$ be a sufficient statistic for $\theta$ and $U$ an unbiased estimator of $\theta$ which is not a function of $T$ alone. Set $\phi(t) = E[U \mid T = t]$. Then we have that:
(1) The random variable $\phi(T)$ is a function of the sufficient statistic $T$ alone.
(2) $\phi(T)$ is an unbiased estimator for $\theta$.
(3) $\mathrm{Var}(\phi(T)) < \mathrm{Var}(U)$ for all $\theta$, provided $E[U^2] < \infty$.
Factorization Theorem: $T = T(X)$ is a sufficient statistic for $\theta$ iff $f(x \mid \theta) = g(T(x) \mid \theta)\, h(x)$ for some $g$ and $h$.
If you can put the distribution in the form
$f(x \mid \theta) = \exp\left[ \sum_{i=1}^{n} \eta_i(\theta)\, T_i(x) - B(\theta) \right] h(x)$,
then the $T_i$'s are sufficient statistics.
Minimal Sufficient Statistic
If $m$ is the smallest number for which $T = \left( T_1(X_1, \ldots, X_n), \ldots, T_m(X_1, \ldots, X_n) \right)$ is a sufficient statistic for $\theta = (\theta_1, \ldots, \theta_m)$, then $T$ is called a minimal sufficient statistic for $\theta$.
Alternative definition: If $T$ is a sufficient statistic for a family $\mathcal{P}$, then $T$ is minimal sufficient iff for any other sufficient statistic $S$, $T = h(S)$ $\mathcal{P}$-a.s.; i.e. the minimal sufficient statistic can be written as a function of any other sufficient statistic.
Proposition 4.2.5: Let $\mathcal{P}$ be an exponential family on a Euclidean space with densities $f_\eta(x) = \exp\left[ \eta^t T(x) - B(\eta) \right] h(x)$, where $\eta$ and $T$ are p-dimensional. Suppose there exist $\eta_0, \eta_1, \ldots, \eta_p \in H$ (the natural parameter space) such that the vectors $\eta_i - \eta_0$, $1 \leq i \leq p$, are linearly independent in $\mathbb{R}^p$. Then $T$ is minimal sufficient for $\eta$. In particular, if the exponential family is full rank, then $T$ is minimal sufficient.
3.3: Complete Statistics and Ancillary Statistics (Cox)
A statistic $T$ is complete if for every $g$, $E_\theta[g(T)] = 0$ for all $\theta$ $\Rightarrow$ $g(T) = 0$ a.s. for all $\theta$.
Example: Consider the Poisson family, $\mathcal{F} = \left\{ f(\cdot\,; \theta) : f(x; \theta) = \dfrac{e^{-\theta} \theta^x}{x!},\ \theta > 0 \right\}$, with support $A = \{0, 1, 2, \ldots\}$. Then
$E_\theta[g(X)] = e^{-\theta} \sum_{x=0}^{\infty} g(x) \dfrac{\theta^x}{x!} = 0$ for all $\theta > 0$
forces $g(x) = 0$ for all $x$ (a power series that vanishes identically has all zero coefficients). So $\mathcal{F}$ is complete.
A statistic $V$ is ancillary iff $\mathrm{Law}_\theta[V]$ does not depend on $\theta$, i.e. $\mathrm{Law}_\theta[V] = \mathrm{Law}_{\theta_0}[V]$ for all $\theta$.
Basu's Theorem: Suppose $T$ is a complete and sufficient statistic and $V$ is ancillary for $\theta$. Then $T$ and $V$ are independent.
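The textbook illustration of Basu (a simulation sketch; sample sizes and seed are hypothetical): for iid $N(\mu, 1)$, $\bar{X}$ is complete and sufficient for $\mu$ while $S^2$ is ancillary, so the two are independent.

```python
# Basu's theorem: for N(mu, 1) samples, Xbar and S^2 are independent.
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(3.0, 1.0, size=(100_000, 8))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)

print(np.corrcoef(xbar, s2)[0, 1])            # approximately 0
print(np.corrcoef(xbar, np.sqrt(s2))[0, 1])   # still approximately 0
```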
A statistic $V = V(X)$ is:
- Location invariant iff $V(X + a) = V(X)$ for all $a$
- Location equivariant iff $V(X + a) = V(X) + a$ for all $a$
- Scale invariant iff $V(aX) = V(X)$ for all $a > 0$
- Scale equivariant iff $V(aX) = a\,V(X)$ for all $a > 0$
PPN: If the pdf is a location family and $T$ is location invariant, then $T$ is also ancillary.
PPN: If the pdf is a scale family with iid observations and $T$ is scale invariant, then $T$ is also ancillary.
- A sufficient statistic has ALL the information with respect to a parameter.
- An ancillary statistic has NO information with respect to a parameter.
3.1: Discrete Distributions (Casella-Berger)
Uniform, U($N_0$, $N_1$): $P(X = x \mid N_0, N_1) = \dfrac{1}{N_1 - N_0 + 1}$, $x = N_0, N_0 + 1, \ldots, N_1$
- puts equal mass on each outcome (i.e. x = 1, 2, ..., N when $N_0 = 1$, $N_1 = N$)
Hypergeometric: $P(X = x \mid N, M, K) = \dfrac{\binom{M}{x} \binom{N - M}{K - x}}{\binom{N}{K}}$, $x = 0, 1, \ldots, K$
- sampling without replacement
- example: $N$ balls, $M$ of which are one color, $N - M$ another, and you select a sample of size $K$.
- restriction: the pmf is positive only for $\max(0, K - (N - M)) \leq x \leq \min(M, K)$.
Bernoulli: $X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$, $0 \leq p \leq 1$
- has only two possible outcomes
Binomial: $P(Y = y \mid n, p) = \binom{n}{y} p^y (1 - p)^{n - y}$, $y = 0, 1, 2, \ldots, n$
- $Y = \sum_{i=1}^{n} X_i$ (a binomial is the sum of iid Bernoulli trials)
- counts the number of successes in a fixed number of Bernoulli trials
Binomial Theorem: For any real numbers $x$ and $y$ and integer $n \geq 0$, $(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}$.
Poisson: $P(X = x \mid \lambda) = \dfrac{e^{-\lambda} \lambda^x}{x!}$, $x = 0, 1, \ldots$
- Assumption: for a small time interval, the probability of an arrival is proportional to the length of waiting time (e.g. waiting for a bus or for customers). Obviously, the longer one waits, the more likely it is a bus will show up.
- other applications: spatial distributions (e.g. fish in a lake)
Poisson Approximation to the Binomial Distribution: If $n$ is large and $p$ is small, then Binomial$(n, p) \approx$ Poisson$(\lambda)$ with $\lambda = np$.
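A numeric look at the approximation (illustrative parameter values, assuming scipy):

```python
# Poisson approximation to the binomial: n large, p small, lam = n * p.
from scipy.stats import binom, poisson

n, p = 500, 0.01
lam = n * p
for x in range(5):
    print(x, binom.pmf(x, n, p), poisson.pmf(x, lam))   # close agreement
```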
Negative Binomial: $P(X = x \mid r, p) = \binom{x - 1}{r - 1} p^r (1 - p)^{x - r}$, $x = r, r + 1, \ldots$
- $X$ = trial at which the rth success occurs
- counts the number of Bernoulli trials necessary to get a fixed number of successes (i.e. number of trials to get $r$ successes)
- independent Bernoulli trials with a common success probability $p$
- there must be $r - 1$ successes in the first $x - 1$ trials, and then the rth success
- can also be viewed as $Y$, the number of failures before the rth success, where $Y = X - r$
- as $r \to \infty$ and $p \to 1$ with $r(1 - p) \to \lambda$, NB$(r, p) \to$ Poisson$(\lambda)$
Geometric: $P(X = x \mid p) = p (1 - p)^{x - 1}$, $x = 1, 2, \ldots$
- same as NB(1, p)
- $X$ = trial at which the first success occurs
- the distribution is "memoryless", i.e. $P(X > s \mid X > t) = P(X > s - t)$
  - Interpretation: the probability of getting $s$ failures after already getting $t$ failures is the same as getting $s - t$ failures right from the start.
3.2: Continuous Distributions (Casella-Berger)
 1
, if x  a, b

Uniform, U(a,b): f x | a, b    b  a
 0, otherwise
 spreads mass uniformly over an interval
Gamma: $f(x \mid \alpha, \beta) = \dfrac{1}{\Gamma(\alpha)\, \beta^\alpha}\, x^{\alpha - 1} e^{-x/\beta}$, $0 < x < \infty$, $\alpha > 0$, $\beta > 0$
- $\alpha$ = shape parameter; $\beta$ = scale parameter
- $\Gamma(\alpha + 1) = \alpha\, \Gamma(\alpha)$, $\alpha > 0$
- $\Gamma(n) = (n - 1)!$ for any integer $n > 0$
- $\Gamma(1) = 1$; $\Gamma\left( \frac{1}{2} \right) = \sqrt{\pi}$
- $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1} e^{-t} \, dt$ = the Gamma Function
- if $\alpha$ is an integer, then the gamma is related to the Poisson via $\lambda = x/\beta$
- $\chi_p^2 \sim$ Gamma$\left( \frac{p}{2}, 2 \right)$
- Exponential$(\beta)$ ~ Gamma$(1, \beta)$
- If $X \sim$ exponential$(\beta)$, then $X^{1/\gamma} \sim$ Weibull$(\gamma, \beta)$
Normal: $f(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\, \sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, $-\infty < x < \infty$
- symmetric, bell-shaped
- often used to approximate other distributions
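The Gamma-function facts above can be spot-checked directly (a small sketch using only the standard library):

```python
# Spot checks of the Gamma-function facts listed above.
import math

print(math.gamma(0.5), math.sqrt(math.pi))      # Gamma(1/2) = sqrt(pi)
print(math.gamma(5), math.factorial(4))         # Gamma(n) = (n - 1)!
print(math.gamma(3.7), 2.7 * math.gamma(2.7))   # Gamma(a + 1) = a Gamma(a)
```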
     1
 1
x 1  x  , 0  x  1,   0,   0
  
used to model proportions (since domain is (0,1))
Beta() = U(0,1)
Beta: f x |  ,   


Cauchy: $f(x \mid \theta) = \dfrac{1}{\pi}\, \dfrac{1}{1 + (x - \theta)^2}$, $-\infty < x < \infty$, $-\infty < \theta < \infty$
- symmetric, bell-shaped
- similar to the normal distribution, but the mean does not exist
- $\theta$ is the median
- the ratio of two standard normals is Cauchy


Lognormal: $f(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\, \sigma x}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}$, $0 < x < \infty$
- used when the logarithm of a random variable is normally distributed
- used in applications where the variable of interest is right skewed
Double Exponential: $f(x \mid \mu, \sigma) = \dfrac{1}{2\sigma}\, e^{-\frac{|x - \mu|}{\sigma}}$, $-\infty < x < \infty$
- reflects the exponential around its mean
- symmetric with "fat" tails
- has a peak (sharp point = nondifferentiable) at $x = \mu$
3.3: Exponential Families (Casella-Berger)
A pdf is an exponential family if it can be expressed as
$f(x \mid \theta) = h(x)\, c(\theta) \exp\left( \sum_{i=1}^{k} w_i(\theta)\, t_i(x) \right)$.
This can also be written in the natural parameterization as
$f(x \mid \eta) = h(x)\, c^*(\eta) \exp\left( \sum_{i=1}^{k} \eta_i\, t_i(x) \right)$.
The set $H = \left\{ \eta = (\eta_1, \ldots, \eta_k) : \int h(x) \exp\left( \sum_{i=1}^{k} \eta_i\, t_i(x) \right) dx < \infty \right\}$ is called the natural parameter space. $H$ is convex.
3.4: Location and Scale Families (Casella-Berger)
SEE COX §2.3 FOR DEFINITIONS AND EXAMPLE PDFS FOR EACH FAMILY.
Chapter 4: Multiple Random Variables
4.1: Joint and Marginal Distributions (Casella-Berger)
An n-dimensional random vector is a function from a sample space S into Rn, n-dimensional
Euclidean space.
Let (X,Y) be a discrete bivariate random vector. Then the function f(x,y) from R2 to R defined
by f(x,y)=fX,Y(x,y)=P(X=x,Y=y) is called the joint probability mass function or the joint pmf of
(X,Y).
Let (X, Y) be a discrete bivariate random vector with joint pmf $f_{X,Y}(x, y)$. Then the marginal pmfs of $X$ and $Y$ are $f_X(x) = \sum_y f_{X,Y}(x, y)$ and $f_Y(y) = \sum_x f_{X,Y}(x, y)$.
On the continuous side, the joint probability density function or joint pdf is the function $f(x, y)$ satisfying
$P\left( (X, Y) \in A \right) = \iint_A f(x, y) \, dx \, dy$.
The continuous marginal pdfs are $f_X(x) = \int_{-\infty}^{\infty} f(x, y) \, dy$ and $f_Y(y) = \int_{-\infty}^{\infty} f(x, y) \, dx$, $-\infty < x, y < \infty$.
Bivariate Fundamental Theorem of Calculus: $\dfrac{\partial^2 F(x, y)}{\partial x\, \partial y} = f(x, y)$.
4.2: Conditional Distributions and Independence (Casella-Berger)
The conditional pmf/pdf of $Y$ given that $X = x$ is the function of $y$ denoted by $f(y \mid x)$ and defined as
$f(y \mid x) = P(Y = y \mid X = x) = \dfrac{f(x, y)}{f_X(x)}$.
The expected value of $g(X, Y)$ is $E\, g(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f(x, y) \, dx \, dy$.
The conditional expected value of $g(Y)$ given $X = x$ is $E[g(Y) \mid x] = \sum_y g(y)\, f(y \mid x)$ in the discrete case and $E[g(Y) \mid x] = \int_{-\infty}^{\infty} g(y)\, f(y \mid x) \, dy$ in the continuous case.

The conditional variance of $Y$ given $X = x$ is $\mathrm{Var}(Y \mid x) = E[Y^2 \mid x] - \left( E[Y \mid x] \right)^2$.
f x, y   f X x  f Y  y   X and Y are independent.
Theorem 4.2.1: Let $X$ and $Y$ be independent random variables. Then
(a) For any $A \subseteq \mathbb{R}$ and $B \subseteq \mathbb{R}$, $P(X \in A, Y \in B) = P(X \in A)\, P(Y \in B)$; i.e. $\{X \in A\}$ and $\{Y \in B\}$ are independent events.
(b) Let $g(x)$ be a function only of $x$ and $h(y)$ be a function only of $y$. Then
$E[g(X)\, h(Y)] = E[g(X)]\, E[h(Y)]$.
4.3: Bivariate Transformations (Casella-Berger)
Theorem 4.3.1: If $X \sim$ Poisson$(\theta)$ and $Y \sim$ Poisson$(\lambda)$ and $X$ and $Y$ are independent, then $X + Y \sim$ Poisson$(\theta + \lambda)$.
Jacobian: $J = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix} = \dfrac{\partial x}{\partial u} \dfrac{\partial y}{\partial v} - \dfrac{\partial x}{\partial v} \dfrac{\partial y}{\partial u}$
Bivariate Transformation: $f_{U,V}(u, v) = f_{X,Y}\left( h_1(u, v), h_2(u, v) \right) |J|$, where $x = h_1(u, v)$ and $y = h_2(u, v)$.
Theorem 4.3.2: Let $X$ and $Y$ be independent random variables. Let $g(x)$ be a function only of $x$ and $h(y)$ be a function only of $y$. Then $U = g(X)$ and $V = h(Y)$ are independent.
4.4: Hierarchical Models and Mixture Distributions (Casella-Berger)
A hierarchical model is a series of nested distributions.
A random variable $X$ has a mixture distribution if its distribution depends on another random variable that itself has a distribution.
Useful Expectation Identities:
- $E[X] = E\big[ E[X \mid Y] \big]$
- $\mathrm{Var}(X) = \mathrm{Var}\big( E[X \mid Y] \big) + E\big[ \mathrm{Var}(X \mid Y) \big]$
4.5: Covariance and Correlation (Casella-Berger)
The covariance of $X$ and $Y$ is denoted by $\mathrm{Cov}(X, Y) = E\left[ (X - \mu_X)(Y - \mu_Y) \right] = E[XY] - \mu_X \mu_Y$.
The correlation or correlation coefficient of $X$ and $Y$ is denoted by $\rho_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$.
Theorem 4.5.2: If $X$ and $Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$.
Theorem 4.5.3: If $X$ and $Y$ are random variables and $a$ and $b$ are constants, then
$\mathrm{Var}(aX + bY) = a^2\, \mathrm{Var}(X) + b^2\, \mathrm{Var}(Y) + 2ab\, \mathrm{Cov}(X, Y)$.
Definition of the bivariate normal pdf:
$f(x, y) = \dfrac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left( -\dfrac{1}{2(1 - \rho^2)} \left[ \left( \dfrac{x - \mu_X}{\sigma_X} \right)^2 - 2\rho \left( \dfrac{x - \mu_X}{\sigma_X} \right) \left( \dfrac{y - \mu_Y}{\sigma_Y} \right) + \left( \dfrac{y - \mu_Y}{\sigma_Y} \right)^2 \right] \right)$
Properties of the bivariate normal pdf:
1. The marginal distribution of $X$ is $N(\mu_X, \sigma_X^2)$.
2. The marginal distribution of $Y$ is $N(\mu_Y, \sigma_Y^2)$.
3. The correlation between $X$ and $Y$ is $\rho_{XY} = \rho$.
4. For any constants $a$ and $b$, the distribution of $aX + bY$ is
$N\left( a\mu_X + b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho\,\sigma_X\sigma_Y \right)$.
4.6: Multivariate Distributions (Casella-Berger)
- joint pmf (discrete): $f(\mathbf{x}) = f(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n)$ for each $(x_1, \ldots, x_n) \in \mathbb{R}^n$.
  This can also be written as $P(X \in A) = \sum_{\mathbf{x} \in A} f(\mathbf{x})$.
- joint pdf (continuous): $P(X \in A) = \int_A f(\mathbf{x}) \, d\mathbf{x} = \int \cdots \int_A f(x_1, \ldots, x_n) \, dx_1 \cdots dx_n$.
The marginal pmf or pdf of any subset of the coordinates $(X_1, \ldots, X_n)$ can be computed by integrating or summing the joint pmf or pdf over all possible values of the other coordinates.
conditional pdf or pmf: $f(x_{k+1}, \ldots, x_n \mid x_1, \ldots, x_k) = \dfrac{f(x_1, \ldots, x_n)}{f(x_1, \ldots, x_k)}$.
Let $n$ and $m$ be positive integers and let $p_1, \ldots, p_n$ be numbers satisfying $0 \leq p_i \leq 1$, $i = 1, \ldots, n$, and $\sum_{i=1}^{n} p_i = 1$. Then $X = (X_1, \ldots, X_n)$ has a multinomial distribution with $m$ trials and cell probabilities $p_1, \ldots, p_n$ if the joint pmf (i.e. DISCRETE) is
$f(x_1, \ldots, x_n) = \dfrac{m!}{x_1! \cdots x_n!}\, p_1^{x_1} \cdots p_n^{x_n} = m! \prod_{i=1}^{n} \dfrac{p_i^{x_i}}{x_i!}$
on the set of $(x_1, \ldots, x_n)$ such that each $x_i$ is a nonnegative integer and $\sum_{i=1}^{n} x_i = m$.
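A sampling sketch (the $m$ and cell probabilities are hypothetical) showing the cell means $m p_i$ and the sum constraint:

```python
# Multinomial draws: cell means are m * p_i and each draw sums to m.
import numpy as np

rng = np.random.default_rng(7)
m, p = 30, [0.2, 0.5, 0.3]
draws = rng.multinomial(m, p, size=100_000)

print(draws.mean(axis=0))        # approximately [6, 15, 9] = m * p
print(draws.sum(axis=1)[:5])     # every draw sums to m = 30
```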
Multinomial Theorem: Let $m$ and $n$ be positive integers. Let $A$ be the set of vectors $(x_1, \ldots, x_n)$ such that each $x_i$ is a nonnegative integer and $\sum_{i=1}^{n} x_i = m$. Then for any real numbers $p_1, \ldots, p_n$,
$(p_1 + \cdots + p_n)^m = \sum_{\mathbf{x} \in A} \dfrac{m!}{x_1! \cdots x_n!}\, p_1^{x_1} \cdots p_n^{x_n}$.
Recall: $\sum_{i=1}^{n} p_i = 1$, so the Multinomial Theorem gives $1^m = 1$, i.e. the multinomial pmf sums to 1.
Note: A linear combination of independent normal random variables is normally distributed.
4.7: Inequalities and Identities (Casella-Berger)
NUMERICAL INEQUALITIES:
Lemma 4.7.1: Consider $a > 0$ and $b > 0$, and let $p > 1$ and $q > 1$ satisfy $\dfrac{1}{p} + \dfrac{1}{q} = 1$. Then
$\dfrac{1}{p} a^p + \dfrac{1}{q} b^q \geq ab$, with equality only if $a^p = b^q$.
Holder ' s Inequality : Let X and Y be random variables and p and q satisfy Lemma 4.7.1. Then

      
p
    
1
q
p
1
q
When p=q=2, this yields the Cauchy  Schwarz Inequality ,
    
2
2
      
r
Liapunov’s Inequality:
     
1
r
s
1
.
s
1<r<s<  .
NOTE: Expectations are not necessary. These inequalities can also be applied to numerical
sums, such as this version of Holder ' s Inequality :
FUNCTIONAL INEQUALITIES:
Convex functions “hold water” (i.e. bowl-shaped) whereas concave functions “spill water”. It is
also said that if you connect two points with a line segment, then a convex function will lie
below the segment and a concave function above.
Jensen's Inequality: For any random variable $X$, if $g(x)$ is a convex (bowl-shaped) function, then
$E[g(X)] \geq g(E[X])$.
Covariance Inequality: Let $X$ be a random variable and $g(x)$ and $h(x)$ any functions such that $E\,g(X)$, $E\,h(X)$, and $E\left( g(X)\, h(X) \right)$ exist. Then
(a) If $g(x)$ is a nondecreasing function and $h(x)$ is a nonincreasing function, then
$E\left[ g(X)\, h(X) \right] \leq E[g(X)]\, E[h(X)]$.
(b) If $g(x)$ and $h(x)$ are either both nondecreasing or both nonincreasing, then
$E\left[ g(X)\, h(X) \right] \geq E[g(X)]\, E[h(X)]$.
PROBABILITY INEQUALITY
Chebychev's Inequality: Let $X$ be a random variable and let $g(x)$ be a nonnegative function. Then for any $r > 0$,
$P\left( g(X) \geq r \right) \leq \dfrac{E\,g(X)}{r}$, e.g. $P\left( |X - \mu| \geq k\sigma \right) \leq \dfrac{1}{k^2}$.
IDENTITIES
Recursive Formula for the Poisson Distribution: $P(X = x + 1) = \dfrac{\lambda}{x + 1}\, P(X = x)$; $P(X = 0) = e^{-\lambda}$.
Theorem 4.7.1: Let $X_{\alpha,\beta}$ denote a gamma$(\alpha, \beta)$ random variable with pdf $f(x \mid \alpha, \beta)$, where $\alpha > 1$. Then for any constants $a$ and $b$,
$P(a < X_{\alpha,\beta} < b) = \beta\left( f(a \mid \alpha, \beta) - f(b \mid \alpha, \beta) \right) + P(a < X_{\alpha - 1,\beta} < b)$.
Stein's Lemma: Let $X \sim N(\mu, \sigma^2)$ and let $g$ be a differentiable function satisfying $E|g'(X)| < \infty$. Then
$E\left[ g(X)(X - \mu) \right] = \sigma^2\, E\left[ g'(X) \right]$.
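A Monte Carlo check (my sketch, with $g(x) = x^3$ and hypothetical $\mu, \sigma$): Stein's lemma predicts $E[X^3(X - \mu)] = \sigma^2\, E[3X^2]$.

```python
# Stein's lemma with g(x) = x^3: E[g(X)(X - mu)] = sigma^2 E[g'(X)].
import numpy as np

rng = np.random.default_rng(8)
mu, sigma = 1.0, 0.5
x = rng.normal(mu, sigma, size=10**6)

lhs = (x**3 * (x - mu)).mean()
rhs = sigma**2 * (3 * x**2).mean()
print(lhs, rhs)                      # agree up to Monte Carlo error
```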
Theorem 4.7.2: Let $\chi_p^2$ denote a chi-squared random variable with $p$ degrees of freedom. For any function $h(x)$,
$E\left[ h(\chi_p^2) \right] = p\, E\left[ \dfrac{h(\chi_{p+2}^2)}{\chi_{p+2}^2} \right]$, provided the expectations exist.
Theorem 4.7.3 (Hwang): Let $g(x)$ be a function with $-\infty < E\,g(X) < \infty$ and $-\infty < g(-1) < \infty$. Then:
(a) If $X \sim$ Poisson$(\lambda)$, $E\left[ \lambda\, g(X) \right] = E\left[ X\, g(X - 1) \right]$.
(b) If $X \sim$ Negative Binomial$(r, p)$, $E\left[ (1 - p)\, g(X) \right] = E\left[ \dfrac{X}{r + X - 1}\, g(X - 1) \right]$.