Unit 2: Random Variables and Their Distributions
Wenyaw Chan
Division of Biostatistics
School of Public Health
University of Texas Health Science Center at Houston
Random Variable
• Random Variable:
– A numeric function that assigns probabilities to different
events in a sample space.
• Discrete Random Variable:
– A random variable that assumes only a finite or
denumerable number of values.
– The probability mass function of a discrete random
variable X that assumes values x1, x2,… is p(x1), p(x2), ….,
where p(xi)=Pr[X= xi].
• Continuous Random Variable:
– A random variable whose possible values cannot be
enumerated.
Example: Flip a coin 3 times
• Random Variable
– X = # of heads in the 3 coin tosses
• Probability Mass Function
– P(X=3) = P{HHH} = 1/8
– P(X=2) = P{HHT, HTH, THH} = 3/8
– P(X=1) = P{HTT, THT, TTH} = 3/8
– P(X=0) = P{TTT} = 1/8
• X is a discrete random variable with probability (mass)
function

x        0    1    2    3
P(X=x)   1/8  3/8  3/8  1/8
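The table can be reproduced by brute-force enumeration. The sketch below assumes a fair coin (all 8 outcomes equally likely); the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**3 equally likely outcomes of three fair coin tosses.
outcomes = list(product("HT", repeat=3))

# X = number of heads in an outcome; count how many outcomes give each value.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability mass function: P(X = x) = (# outcomes with x heads) / 8.
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(f"P(X={x}) = {p}")
```

Exact fractions are used so the results can be compared directly with 1/8 and 3/8.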
Random Variable
Expected value of X:
$E(X) \equiv \mu = \sum_{i=1}^{k} x_i \Pr(X = x_i)$
Variance of X:
$\mathrm{Var}(X) \equiv \sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 \Pr(X = x_i)$
Standard deviation of X:
$\sigma = \sqrt{\mathrm{Var}(X)}$
Random Variable
• Note:
$\mathrm{Var}(X) = E[(X - \mu)^2] = E(X^2) - [E(X)]^2$
• Cumulative distribution function of X: F(x) = Pr(X ≤ x)
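As a worked check, the sketch below applies these definitions to the coin-toss example (X = number of heads in 3 tosses of a fair coin). It computes the variance both from the definition and from the shortcut formula; function and variable names are illustrative.

```python
from fractions import Fraction

# PMF of X = number of heads in 3 fair coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# E(X) = sum of x_i * Pr(X = x_i)
mean = sum(x * p for x, p in pmf.items())

# Var(X) from the definition: sum of (x_i - mu)^2 * Pr(X = x_i)
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Var(X) from the shortcut: E(X^2) - [E(X)]^2
var_short = sum(x**2 * p for x, p in pmf.items()) - mean**2


def cdf(x):
    """F(x) = Pr(X <= x): add up the PMF over all values not exceeding x."""
    return sum(p for value, p in pmf.items() if value <= x)


print("E(X)   =", mean)
print("Var(X) =", var_def, "(definition),", var_short, "(shortcut)")
print("F(1)   =", cdf(1))
```

Both variance formulas give the same value, as the note states.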
Binomial Distribution
• Examples of the binomial distribution have a
common structure:
– n independent trials
– each trial has only two possible outcomes, called
“success” and “failure”.
– Pr (success) = p for all trials
Binomial Distribution
• If X = # of successful trials in these n trials, then X
has a binomial distribution:
$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$
• k = 0, 1, 2, …, n
• where
$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$
• Example: Flip a coin 10 times
Properties of Binomial Distribution
• If X~ Binomial (n, p), then
E(X) = np
Var (X) = np(1-p)
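A minimal sketch for the 10-flip example, assuming a fair coin (p = 0.5, which the slide does not state explicitly). It evaluates the binomial probabilities and checks the mean and variance numerically against np and np(1-p).

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # 10 flips; p = 0.5 is an assumption (fair coin)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the PMF.
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(f"P(X=5) = {pmf[5]:.4f}")
print(f"mean = {mean:.4f}, np = {n * p:.4f}")
print(f"var  = {var:.4f}, np(1-p) = {n * p * (1 - p):.4f}")
```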
Poisson Distribution
$\Pr(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$
k = 0, 1, 2, …
If X ~ Poisson(λ), then E(X) = λ and Var(X) = λ
Poisson Process
• Assumption 1:
– Pr{1 event occurs in a very small time interval [0, t)} ≈ λt
– Pr{0 events occur in a very small time interval [0, t)} ≈ 1 − λt
– Pr{more than one event occurs in a very small time interval [0, t)} ≈ 0
• Assumption 2:
– The probability of a given number of events occurring per unit time is the
same throughout the entire time interval
• Assumption 3:
– Pr {one event in [t1,t2) | one event in [t0, t1)}
= Pr {one event in [t1, t2)}
Poisson Distribution
• X = the number of events occurring in a time period of length t
for the above process with parameter λ. Then
mean = λt and
$\Pr(X = k) = \frac{e^{-\lambda t} (\lambda t)^{k}}{k!}$
where k = 0, 1, 2, …
and e ≈ 2.71828
E(X) = Var(X) = λt
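A small sketch of the Poisson probabilities for a process observed over a period of length t. The rate of 2 events per unit time and t = 3 are hypothetical values chosen only for illustration; the infinite sum is truncated at k = 99, where the remaining tail is negligible for this mean.

```python
from math import exp, factorial


def poisson_pmf(k, rate, t=1.0):
    """P(X = k) when events occur at `rate` per unit time over a period t."""
    mu = rate * t  # mean number of events in the period
    return exp(-mu) * mu**k / factorial(k)


rate, t = 2.0, 3.0  # hypothetical values, so the mean is 6
pmf = [poisson_pmf(k, rate, t) for k in range(100)]

# Mean and variance computed from the (truncated) PMF.
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(f"P(X=0) = {pmf[0]:.5f}")
print(f"mean = {mean:.5f}, var = {var:.5f}")
```

The computed mean and variance both come out equal to the product of rate and time, matching E(X) = Var(X).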
Poisson approximation to Binomial
• If X ~ Binomial(n, p), n is large and p is small,
then
$P(X = k) \approx \frac{e^{-np} (np)^{k}}{k!}$
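To see how close the approximation is, the sketch below compares exact binomial probabilities with their Poisson counterparts. The values n = 1000 and p = 0.002 are hypothetical, picked to satisfy "n large, p small".

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    """Poisson probability with mean mu."""
    return exp(-mu) * mu**k / factorial(k)


n, p = 1000, 0.002  # hypothetical: large n, small p, so np = 2

# Largest absolute gap between exact and approximate values for k = 0..10.
max_gap = 0.0
for k in range(11):
    exact = binomial_pmf(k, n, p)
    approx = poisson_pmf(k, n * p)
    max_gap = max(max_gap, abs(exact - approx))
    if k <= 4:
        print(f"k={k}: binomial={exact:.5f}  poisson={approx:.5f}")

print(f"largest gap for k = 0..10: {max_gap:.6f}")
```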
Continuous Probability Distributions
• Probability density function (p.d.f.) of a
random variable:
– a curve f(x) such that the area under the curve
between any two points a and b equals
$\Pr(a \le X \le b) = \int_{a}^{b} f(x)\,dx$
(Figure: density curve with the area between a and b shaded, labeled Pr(a<=X<=b).)
Continuous Probability Distributions
• Cumulative distribution function: F(a) = Pr(X ≤ a)
(Figure: density curve with the area to the left of a shaded, labeled Pr(X<=a).)
Continuous Probability Distributions
• The expected value of a continuous random
variable X is
$E(X) = \int x f(x)\,dx$, where f(x) is the p.d.f. of X.
• The definition for the variance of a
continuous random variable is the same as
that of a discrete random variable, i.e.
$\mathrm{Var}(X) = E(X^2) - (EX)^2 = \int (x - \mu)^2 f(x)\,dx$, where
µ = E(X).
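These integrals can be approximated numerically. The sketch below uses a simple midpoint rule and a hypothetical density f(x) = 2x on [0, 1], which is not from the slides and is chosen only because its exact answers are easy to verify by hand (E(X) = 2/3, Var(X) = 1/18).

```python
def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f(x):
    """Hypothetical p.d.f.: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x


total = integrate(f, 0.0, 1.0)                      # total area, should be 1
prob = integrate(f, 0.2, 0.5)                       # Pr(0.2 <= X <= 0.5)
mean = integrate(lambda x: x * f(x), 0.0, 1.0)      # E(X)
var_def = integrate(lambda x: (x - mean) ** 2 * f(x), 0.0, 1.0)
var_short = integrate(lambda x: x**2 * f(x), 0.0, 1.0) - mean**2

print(f"area = {total:.6f}, Pr(0.2 <= X <= 0.5) = {prob:.6f}")
print(f"E(X) = {mean:.6f}")
print(f"Var(X) = {var_def:.6f} (definition), {var_short:.6f} (shortcut)")
```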
The Normal Distribution
(The Gaussian distribution)
• The p.d.f. of a normal distribution:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x - \mu)^2\right], \quad -\infty < x < \infty$
The Normal Distribution
(Figure: a bell-shaped curve symmetric about µ, with points of inflection at µ − σ and µ + σ.)
• Notation: X ~ N(µ, σ²)
– µ : mean
– σ² : variance
The Normal Distribution
• N(0,1) is the standard normal distribution
• If X ~ N(0,1), then
$\Phi(x) = \Pr(X \le x)$
– ~ : “is distributed as”
– Φ : c.d.f. for the standard normal r.v.
• Note:
– A point of inflection is a point where the curve changes
concavity, i.e. its slope stops increasing and starts decreasing, or vice versa.
Properties of the N(0,1)
• 1. Φ(−x) = 1 − Φ(x)
• 2.
– About 68% of the area under the standard normal
curve lies between –1 and 1.
– About 95% of the area under the standard normal
curve lies between –2 and 2.
– About 99% of the area under the standard normal
curve lies between –2.5 and 2.5.
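Both properties can be checked numerically. The sketch below builds the standard normal c.d.f. from the error function in Python's math module; the function name phi is illustrative.

```python
from math import erf, sqrt


def phi(x):
    """Standard normal c.d.f., Pr(X <= x) for X ~ N(0,1)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# Property 1: symmetry of the c.d.f. about 0.
x = 1.3  # arbitrary test point
print(f"phi(-x) = {phi(-x):.6f}, 1 - phi(x) = {1 - phi(x):.6f}")

# Property 2: area under the curve between -z and z.
areas = {z: phi(z) - phi(-z) for z in (1.0, 2.0, 2.5)}
for z, area in areas.items():
    print(f"area between -{z} and {z}: {area:.4f}")
```

The three areas come out close to 0.68, 0.95 and 0.99, in line with the rounded figures quoted above.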
Properties of the N(0,1)
• If X ~ N(0,1) and P(X < Zu) = u, 0 ≤ u ≤ 1,
then Zu is called the 100u-th percentile of the
standard normal distribution.
• 95th percentile = 1.645, 97.5th percentile = 1.96, 99th percentile = 2.33
(Figure: standard normal curve with area u to the left of Zu.)
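The quoted percentiles can be recovered by inverting the standard normal c.d.f. This sketch relies on statistics.NormalDist from the Python standard library (Python 3.8 or later); the helper name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def percentile(u):
    """Return Zu, the value with Pr(X < Zu) = u for X ~ N(0,1)."""
    return standard_normal.inv_cdf(u)


z95, z975, z99 = percentile(0.95), percentile(0.975), percentile(0.99)
print(f"95th: {z95:.3f}, 97.5th: {z975:.3f}, 99th: {z99:.3f}")
```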
Properties of the N(0,1)
• If X ~ N(µ, σ²), then
$\frac{X - \mu}{\sigma} \sim N(0,1)$
• This property allows us to calculate the
probability of a non-standard normal random
variable.
$\Pr(a < X < b) = \Pr\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$
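A short sketch of the standardization step. The parameters (mean 100, standard deviation 15) and the interval (85, 130) are hypothetical numbers used only for illustration; the result is compared with the c.d.f. of the non-standard normal evaluated directly.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical mean and standard deviation
a, b = 85.0, 130.0       # hypothetical interval endpoints

standard_normal = NormalDist()  # N(0,1)

# Standardize the endpoints, then use the standard normal c.d.f.
z_a = (a - mu) / sigma
z_b = (b - mu) / sigma
prob_standardized = standard_normal.cdf(z_b) - standard_normal.cdf(z_a)

# Same probability computed directly from N(mu, sigma^2), as a cross-check.
x_dist = NormalDist(mu, sigma)
prob_direct = x_dist.cdf(b) - x_dist.cdf(a)

print(f"z_a = {z_a}, z_b = {z_b}")
print(f"Pr(a < X < b) = {prob_standardized:.4f} (standardized)")
print(f"Pr(a < X < b) = {prob_direct:.4f} (direct)")
```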
Other Distributions--t distribution
• Let X1, ….Xn be a random sample from a
normal population N(µ, σ²), with sample mean X̄ and sample standard deviation s.
Then
$\frac{\bar{X} - \mu}{s / \sqrt{n}}$
has a t distribution with n-1 degrees of
freedom (df).
Other Distributions--Chi-square distribution
• Let X1, ….Xn be a random sample from a
normal population N(0, 1).
Then
$\sum_{i=1}^{n} X_i^2$
has a chi-square distribution with n degrees of
freedom (df).
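The construction can be illustrated by simulation. The sketch below draws sums of squared standard normal values and compares their average with the degrees of freedom, using the known fact (not stated on the slide) that a chi-square distribution with n df has mean n. The df value, seed and number of draws are arbitrary choices.

```python
import random


def chi_square_draw(rng, df):
    """One draw: the sum of df squared independent N(0,1) values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


rng = random.Random(0)  # fixed seed so the run is reproducible
df = 5                  # hypothetical degrees of freedom
draws = [chi_square_draw(rng, df) for _ in range(20_000)]

sample_mean = sum(draws) / len(draws)
print(f"sample mean of {len(draws)} draws: {sample_mean:.3f} (df = {df})")
```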
Other Distributions--F distribution
• Let U and V be independent random variables
having chi-square distributions with p
and q degrees of freedom, respectively.
Then
$\frac{U/p}{V/q}$
has an F distribution with p and q degrees of
freedom (df).
Covariance and Correlation
• The covariance between two random
variables is defined by
Cov(X,Y)=E[(X-µX)(Y-µY)].
• The correlation coefficient between two
random variables is defined by
ρ = Corr(X,Y) = Cov(X,Y)/(σX σY), where σX and σY are the standard deviations of X and Y.
Variance of a Linear Combination
• Var(c1X1 + c2X2)
$= c_1^2 \mathrm{Var}(X_1) + c_2^2 \mathrm{Var}(X_2) + 2 c_1 c_2 \mathrm{Cov}(X_1, X_2)$
$= c_1^2 \mathrm{Var}(X_1) + c_2^2 \mathrm{Var}(X_2) + 2 c_1 c_2 \sigma_{X_1} \sigma_{X_2} \mathrm{Corr}(X_1, X_2)$
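A numerical check of both forms of the identity. The sketch treats five made-up (x1, x2) pairs as the equally likely values of a joint discrete distribution; the data and the constants c1, c2 are hypothetical.

```python
from math import sqrt
from statistics import fmean

# Hypothetical paired values, each pair taken as equally likely.
x1 = [2.0, 4.0, 4.0, 7.0, 9.0]
x2 = [1.0, 3.0, 2.0, 6.0, 5.0]
c1, c2 = 3.0, -2.0


def var(values):
    """Variance E[(V - mean)^2] under equal weights."""
    m = fmean(values)
    return fmean([(v - m) ** 2 for v in values])


def cov(u, v):
    """Covariance E[(U - mean_U)(V - mean_V)] under equal weights."""
    mu_u, mu_v = fmean(u), fmean(v)
    return fmean([(a - mu_u) * (b - mu_v) for a, b in zip(u, v)])


# Left-hand side: variance of the combination computed directly.
combo = [c1 * a + c2 * b for a, b in zip(x1, x2)]
lhs = var(combo)

# Right-hand side, covariance form.
rhs_cov = c1**2 * var(x1) + c2**2 * var(x2) + 2 * c1 * c2 * cov(x1, x2)

# Right-hand side, correlation form.
sd1, sd2 = sqrt(var(x1)), sqrt(var(x2))
corr = cov(x1, x2) / (sd1 * sd2)
rhs_corr = c1**2 * var(x1) + c2**2 * var(x2) + 2 * c1 * c2 * sd1 * sd2 * corr

print(f"direct: {lhs:.6f}, covariance form: {rhs_cov:.6f}, correlation form: {rhs_corr:.6f}")
```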