Unit 2: Random Variables and their Distributions
Wenyaw Chan
Division of Biostatistics, School of Public Health
University of Texas - Health Science Center at Houston

Random Variable
• Random Variable:
  – A numeric function that assigns probabilities to different events in a sample space.
• Discrete Random Variable:
  – A random variable that assumes only a finite or denumerable number of values.
  – The probability mass function of a discrete random variable X that assumes values x1, x2, … is p(x1), p(x2), …, where p(xi) = Pr[X = xi].
• Continuous Random Variable:
  – A random variable whose possible values cannot be enumerated.

Example: Flip a fair coin 3 times
• Random Variable
  – X = # of heads in the 3 coin tosses
• Probability Mass Function
  – P(X=3) = P{HHH} = 1/8
  – P(X=2) = P{HHT, HTH, THH} = 3/8
  – P(X=1) = P{HTT, THT, TTH} = 3/8
  – P(X=0) = P{TTT} = 1/8
• X is a discrete random variable with probability (mass) function

  x         0     1     2     3
  P(X=x)    1/8   3/8   3/8   1/8

Random Variable
• Expected value of X:
  $$E(X) = \mu = \sum_{i=1}^{k} x_i \Pr(X = x_i)$$
• Variance of X:
  $$Var(X) = \sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 \Pr(X = x_i)$$
• Standard deviation of X:
  $$\sigma = \sqrt{Var(X)}$$
• Note:
  $$\sigma^2 = Var(X) = E(X - \mu)^2 = E(X^2) - [E(X)]^2$$
• Cumulative distribution function of X: $F(x) = \Pr(X \le x)$

Binomial Distribution
• Examples of the binomial distribution have a common structure:
  – n independent trials
  – each trial has only two possible outcomes, called "success" and "failure"
  – Pr(success) = p for all trials
• If X = # of successful trials in these n trials, then X has a binomial distribution:
  $$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, 2, \ldots, n,$$
  where
  $$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$$
• Example: Flip a coin 10 times

Properties of Binomial Distribution
• If X ~ Binomial(n, p), then E(X) = np and Var(X) = np(1-p).

Poisson Distribution
• $$\Pr(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$
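The binomial and Poisson mass functions above can be evaluated directly. The following is a minimal sketch, not part of the original slides: the function names `binomial_pmf` and `poisson_pmf`, the fair-coin value p = 0.5 for the 10-flip example, and the rate `lam = 3.0` are assumptions chosen for illustration. It computes the mean and variance of the binomial from the definitions of E(X) and Var(X) so they can be compared with np and np(1-p).

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Pr(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Flip a coin 10 times; p = 0.5 (a fair coin) is an assumption.
n, p = 10, 0.5
binom = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance from the discrete definitions.
binom_mean = sum(k * pk for k, pk in enumerate(binom))
binom_var = sum((k - binom_mean) ** 2 * pk for k, pk in enumerate(binom))

# Hypothetical rate; the infinite support is truncated at k = 59,
# beyond which the remaining probability is negligible for this rate.
lam = 3.0
poisson = [poisson_pmf(k, lam) for k in range(60)]
poisson_total = sum(poisson)

print(binom_mean, n * p)            # both should be 5
print(binom_var, n * p * (1 - p))   # both should be 2.5
print(poisson_total)                # should be very close to 1
```

Summing over the whole support is only practical for small n; it is used here because it mirrors the definitions term by term.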
• If X ~ Poisson(λ), then E(X) = λ and Var(X) = λ.

Poisson Process
• Assumption 1:
  – Pr{1 event occurs in a very small time interval [0, Δt)} ≈ λΔt
  – Pr{0 events occur in a very small time interval [0, Δt)} ≈ 1 - λΔt
  – Pr{more than one event occurs in a very small time interval [0, Δt)} ≈ 0
• Assumption 2:
  – The probability distribution of the number of events per unit time is the same throughout the entire time interval.
• Assumption 3:
  – Pr{one event in [t1, t2) | one event in [t0, t1)} = Pr{one event in [t1, t2)}

Poisson Distribution
• X = the number of events occurring in a time period of length t for the above process with parameter λ. Then the mean is μ = λt and
  $$\Pr(X = k) = \frac{e^{-\lambda t} (\lambda t)^k}{k!}, \quad k = 0, 1, 2, \ldots,$$
  where e ≈ 2.71828.
• E(X) = Var(X) = λt

Poisson approximation to Binomial
• If X ~ Binomial(n, p), n is large and p is small, then
  $$P(X = k) \approx \frac{e^{-np} (np)^k}{k!}$$

Continuous Probability Distributions
• Probability density function (p.d.f.) of a random variable:
  – a curve f(x) such that the area under the curve between any two points a and b equals
    $$\Pr(a \le X \le b) = \int_a^b f(x)\,dx$$
  [Figure: area under the density curve between a and b, labeled Pr(a<=X<=b)]
• Cumulative distribution function: $F(a) = \Pr(X \le a)$
  [Figure: area under the density curve to the left of a, labeled Pr(X<=a)]
• The expected value of a continuous random variable X is $\int x f(x)\,dx$, where f(x) is the p.d.f. of X.
• The definition of the variance of a continuous random variable is the same as that of a discrete random variable, i.e.
  $$Var(X) = E(X^2) - (EX)^2 = \int (x - \mu)^2 f(x)\,dx, \quad \text{where } \mu = E(X).$$

The Normal Distribution (The Gaussian distribution)
• The p.d.f. of a normal distribution:
  $$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right], \quad -\infty < x < \infty$$
  [Figure: a bell-shaped curve symmetric about μ, with points of inflection at μ-σ and μ+σ]
• Notation: X ~ N(μ, σ²)
  – μ: mean
  – σ²: variance
• N(0,1) is the standard normal distribution.
• If X ~ N(0,1), then $\Phi(x) = \Pr(X \le x)$
  – ~ : "is distributed as"
  – Φ : c.d.f. for the standard normal r.v.
• Note:
  – A point of inflection is a point where the curvature of the curve changes sign, i.e. the curve switches between concave and convex.
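The definitions of area, expected value and variance for a continuous random variable can be checked numerically on the normal p.d.f. given above. This is a rough sketch under stated assumptions: the helper `integrate` is a plain midpoint rule written for this example, the values `mu = 2.0` and `sigma = 1.5` are hypothetical, and the infinite range is truncated at ten standard deviations on each side of the mean.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)


def integrate(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


mu, sigma = 2.0, 1.5                     # hypothetical parameters
lo, hi = mu - 10 * sigma, mu + 10 * sigma  # truncated integration range

# Total area under the density, E(X), and Var(X) from their integral definitions.
area = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
var = integrate(lambda x: (x - mean) ** 2 * normal_pdf(x, mu, sigma), lo, hi)

# Pr(a <= X <= b) as the area between a = mu - sigma and b = mu + sigma.
prob_one_sd = integrate(lambda x: normal_pdf(x, mu, sigma), mu - sigma, mu + sigma)

print(area, mean, var, prob_one_sd)
```

The printed values should be close to 1, μ, σ² and roughly 0.68 respectively. A library integrator would be the usual choice in practice; the hand-written rule is used only to keep the example self-contained.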
Properties of the N(0,1)
• 1. $\Phi(-x) = 1 - \Phi(x)$
• 2.
  – About 68% of the area under the standard normal curve lies between -1 and 1.
  – About 95% of the area under the standard normal curve lies between -2 and 2.
  – About 99% of the area under the standard normal curve lies between -2.5 and 2.5.
• If X ~ N(0,1) and $P(X < z_u) = u$, $0 \le u \le 1$, then $z_u$ is called the 100u-th percentile of the standard normal distribution.
  – 95th percentile = 1.645, 97.5th percentile = 1.96, 99th percentile = 2.33
  [Figure: standard normal curve with the area u to the left of z_u]
• If X ~ N(μ, σ²), then
  $$\frac{X - \mu}{\sigma} \sim N(0, 1)$$
• This property allows us to calculate probabilities for a non-standard normal random variable:
  $$\Pr(a < X < b) = \Pr\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$$

Other Distributions: t distribution
• Let X1, …, Xn be a random sample from a normal population N(μ, σ²). Then
  $$\frac{\bar{X} - \mu}{s / \sqrt{n}}$$
  has a t distribution with n-1 degrees of freedom (df), where $\bar{X}$ is the sample mean and s is the sample standard deviation.

Other Distributions: Chi-square distribution
• Let X1, …, Xn be a random sample from a normal population N(0, 1). Then
  $$\sum_{i=1}^{n} X_i^2$$
  has a chi-square distribution with n degrees of freedom (df).

Other Distributions: F distribution
• Let U and V be independent random variables having chi-square distributions with p and q degrees of freedom, respectively. Then
  $$\frac{U / p}{V / q}$$
  has an F distribution with p and q degrees of freedom (df).

Covariance and Correlation
• The covariance between two random variables is defined by
  $$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)].$$
• The correlation coefficient between two random variables is defined by
  $$\rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}.$$

Variance of a Linear Combination
• $$Var(c_1 X_1 + c_2 X_2) = c_1^2 Var(X_1) + c_2^2 Var(X_2) + 2 c_1 c_2 Cov(X_1, X_2)$$
  $$= c_1^2 Var(X_1) + c_2^2 Var(X_2) + 2 c_1 c_2 \sigma_{X_1} \sigma_{X_2} Corr(X_1, X_2)$$
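The standardization property and the percentiles of N(0,1) stated earlier can be illustrated with Python's standard library. This is a small sketch, not taken from the slides: the helper name `normal_interval_prob` and the values `mu = 100.0`, `sigma = 15.0`, `a = 85.0`, `b = 115.0` are assumptions made for the example. `statistics.NormalDist` (Python 3.8+) supplies Φ through `cdf` and the percentiles through `inv_cdf`.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1); its cdf plays the role of Phi


def normal_interval_prob(a, b, mu, sigma):
    """Pr(a < X < b) for X ~ N(mu, sigma^2), computed by standardizing."""
    return std_normal.cdf((b - mu) / sigma) - std_normal.cdf((a - mu) / sigma)


mu, sigma = 100.0, 15.0  # hypothetical mean and standard deviation
a, b = 85.0, 115.0       # hypothetical interval: one standard deviation either side

# Probability via Phi((b - mu)/sigma) - Phi((a - mu)/sigma).
prob = normal_interval_prob(a, b, mu, sigma)

# Same probability computed on the non-standard distribution directly.
direct = NormalDist(mu, sigma).cdf(b) - NormalDist(mu, sigma).cdf(a)

# Percentiles z_u of the standard normal distribution.
z_95 = std_normal.inv_cdf(0.95)
z_975 = std_normal.inv_cdf(0.975)
z_99 = std_normal.inv_cdf(0.99)

print(prob, direct)
print(z_95, z_975, z_99)
```

The two probabilities should agree, and the three percentiles should match the rounded values 1.645, 1.96 and 2.33 quoted on the slide.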