Bioinformatics Tea Seminar: Statistical Methods in Bioinformatics
Chapter 1. Probability Theory (i): One Random Variable
06/05/2008, Jae Hyun Kim (jaekim@ku.edu)

Contents
- Discrete Random Variables
- Discrete Probability Distributions
- Probability Generating Functions
- Continuous Random Variables
- Probability Density Functions
- Moment Generating Functions

Discrete Random Variable
- A discrete random variable is a numerical quantity that, in some experiment involving a degree of randomness, takes one value from a discrete set of possible values.
- Sample space: the set of all possible outcomes of an experiment (or observation). For example: flipping a coin, {H, T}; tossing a die, {1, 2, 3, 4, 5, 6}; the sum of two dice, {2, 3, ..., 12}.
- Event: any subset of the sample space.

Discrete Probability Distributions
- The probability distribution of a discrete random variable is the set of values the variable can take, together with their associated probabilities.
- Example: Y = total number of heads when a coin is flipped twice, so P(Y = 0) = 1/4, P(Y = 1) = 1/2, P(Y = 2) = 1/4.
- The probability distribution function gives P(Y = y); the cumulative distribution function gives F(y) = P(Y ≤ y).

One Bernoulli Trial
- A Bernoulli trial is a single trial with two possible outcomes, "success" or "failure".
- Probability of success = p, so P(Y = 1) = p and P(Y = 0) = 1 − p.

The Binomial Distribution
- The binomial random variable is the number of successes in a fixed number n of independent Bernoulli trials, each with the same probability of success p: P(Y = y) = C(n, y) p^y (1 − p)^(n − y), y = 0, 1, ..., n.
- Requirements: each trial must result in one of two possible outcomes; the trials must be independent; the probability of success must be the same on all trials; the number n of trials must be fixed in advance.

Bernoulli Trial and Binomial Distribution: Comments
- A single Bernoulli trial is the special case n = 1 of the binomial distribution.
- The probability p is often an unknown parameter.
- There is no simple formula for the cumulative distribution function of the binomial distribution.
- There is no unique "binomial distribution," but rather a family of distributions indexed by n and p.

The Hypergeometric Distribution
- Setup: N objects (n red, N − n white); m objects are taken at random, without replacement; Y = the number of red objects taken, so P(Y = y) = C(n, y) C(N − n, m − y) / C(N, m).
- Biological example: N lab mice (n male, N − n female), of which m carry a mutation; the number Y of mutant males has a hypergeometric distribution.

The Uniform/Geometric Distribution
- The (discrete) uniform distribution: each of the k possible values in the range has the same probability, 1/k.
- The geometric distribution: Y is the number of successful Bernoulli trials before, but not including, the first failure, so P(Y = y) = p^y (1 − p), with cumulative distribution function F(y) = 1 − p^(y + 1), y = 0, 1, 2, ...

The Poisson Distribution
- The Poisson distribution describes events that occur randomly in time or space, for example the number of phone calls arriving in a fixed time interval: P(Y = y) = e^(−λ) λ^y / y!.
- Approximation of the binomial distribution: when n is large, p is small, and np is moderate, Binomial(n, p, x) ≈ Poisson(λ, x) with λ = np (see the numerical sketch after the probability-generating-function section below).

Mean
- The mean (expected value) of a discrete random variable: μ = E[Y] = Σ_y y P_Y(y).
- Expected value of g(Y): E[g(Y)] = Σ_y g(y) P_Y(y).
- Example: for Y = number of heads in two coin flips, E[Y] = 0·(1/4) + 1·(1/2) + 2·(1/4) = 1.
- Linearity property: E[aY + b] = a E[Y] + b. In general, however, E[g(Y)] ≠ g(E[Y]).

Variance
- Definition: σ² = Var(Y) = E[(Y − μ)²] = E[Y²] − μ².

Summary
- Binomial(n, p): mean np, variance np(1 − p).
- Hypergeometric(N, n, m): mean mn/N.
- Geometric (successes before the first failure, success probability p): mean p/(1 − p), variance p/(1 − p)².
- Poisson(λ): mean λ, variance λ.

General Moments
- The r-th moment of the probability distribution about zero is E[Y^r]; the mean is the first moment (r = 1).
- The r-th moment about the mean is E[(Y − μ)^r]; the variance is the case r = 2.

Probability-Generating Function
- The PGF, P(t) = E[t^Y] = Σ_y t^y P_Y(y), is used to derive moments: mean = P′(1), variance = P″(1) + P′(1) − [P′(1)]².
- If two random variables X and Y have identical probability-generating functions, they are identically distributed.
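The binomial-to-Poisson approximation mentioned above can be checked numerically. The following is a minimal sketch in Python, assuming numpy and scipy are available; the parameter values (n = 1000, p = 0.003) are illustrative choices, not taken from the slides.

```python
# Compare the Binomial(n, p) pmf with its Poisson(np) approximation
# when n is large, p is small, and np is moderate.
import numpy as np
from scipy import stats

n, p = 1000, 0.003          # illustrative values: large n, small p
lam = n * p                 # lambda = np is moderate (here 3.0)

x = np.arange(0, 15)
binom_pmf = stats.binom.pmf(x, n, p)
poisson_pmf = stats.poisson.pmf(x, lam)

# The maximum absolute difference between the two pmfs should be small.
print("max |Binomial - Poisson| =", np.max(np.abs(binom_pmf - poisson_pmf)))
```

Increasing n while holding λ = np fixed makes the two pmfs agree ever more closely.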
Continuous Random Variable
- A continuous random variable X is described by its probability density function f(x).
- Probability: P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
- Cumulative distribution function: F(x) = P(X ≤ x) = ∫_{−∞}^x f(u) du.

Mean and Variance
- Mean: μ = E[X] = ∫ x f(x) dx.
- Variance: σ² = E[(X − μ)²] = ∫ (x − μ)² f(x) dx.
- Mean value of the function g(X): E[g(X)] = ∫ g(x) f(x) dx.

Chebyshev's Inequality
- For any random variable with mean μ and variance σ², and any k > 0: P(|X − μ| ≥ kσ) ≤ 1/k².
- Proof idea: the variance integral is bounded from below by restricting it to the region where |x − μ| ≥ kσ, on which (x − μ)² ≥ k²σ².

The Uniform Distribution
- Pdf: f(x) = 1/(b − a) for a ≤ x ≤ b.
- Mean (a + b)/2, variance (b − a)²/12.

The Normal Distribution
- Pdf: f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)).
- Mean μ, variance σ².

Approximation
- Normal approximation to the binomial: when n is large, Binomial(n, p, x) ≈ Normal(μ = np, σ² = np(1 − p), x), with a continuity correction, e.g. P(Y ≤ y) ≈ Φ((y + 0.5 − np)/√(np(1 − p))).
- Normal approximation to the Poisson: when λ is large, Poisson(λ, x) ≈ Normal(μ = λ, σ² = λ, x).

The Exponential Distribution
- Pdf: f(x) = λ e^(−λx), x ≥ 0.
- Cdf: F(x) = 1 − e^(−λx).
- Mean 1/λ, variance 1/λ².

The Gamma Distribution
- Pdf: f(x) = λ^k x^(k − 1) e^(−λx) / Γ(k), x ≥ 0.
- Mean k/λ, variance k/λ².

The Moment-Generating Function
- Definition: m(t) = E[e^(tX)].
- Useful for deriving moments: m′(0) = E[X], m″(0) = E[X²], and in general m^(n)(0) = E[X^n].
- For a discrete random variable, the mgf and pgf are related by m(t) = P(e^t).

Conditional Probability
- Conditional probability: P(A | B) = P(A ∩ B) / P(B).
- Bayes' formula: P(A | B) = P(B | A) P(A) / P(B).
- Independence: A and B are independent if P(A ∩ B) = P(A) P(B).
- Memoryless property (of the exponential and geometric distributions): P(X > s + t | X > s) = P(X > t).

Entropy
- Definition: H(Y) = −Σ_y P_Y(y) log P_Y(y), which can be considered a function of the distribution P_Y(y).
- Entropy is a measure of how close to uniform that distribution is, and thus, in a sense, of the unpredictability of any observed value of a random variable having that distribution.
- Entropy vs. variance: both measure, in some sense, the uncertainty of the value of a random variable having that distribution. Entropy is a function of the probabilities alone, whereas the variance also depends on the values the random variable takes.
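To illustrate the entropy definition above, here is a minimal Python sketch (assuming numpy is installed) comparing the entropy of a uniform distribution on a six-element sample space with that of a non-uniform one; the "loaded die" probabilities are made-up illustrative values.

```python
# Entropy H(Y) = -sum_y P_Y(y) * log P_Y(y) for two discrete distributions
# on {1, ..., 6}: a fair die (uniform) and a loaded die.
import numpy as np

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability vector."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]          # convention: 0 * log 0 = 0
    return -np.sum(probs * np.log(probs))

fair_die   = np.full(6, 1 / 6)
loaded_die = np.array([0.5, 0.1, 0.1, 0.1, 0.1, 0.1])

print("H(fair die)   =", entropy(fair_die))    # log 6 ≈ 1.792
print("H(loaded die) =", entropy(loaded_die))  # smaller, ≈ 1.498
```

The uniform distribution attains the maximum possible entropy, log 6, consistent with the description of entropy as a measure of how close a distribution is to uniform. Note also that the entropy depends only on the probabilities, not on the numerical values the random variable takes, which is exactly the contrast with the variance drawn on the last slide.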