Probability Theory: Random Variables and Distributions
Rob Nicholls
MRC LMB Statistics Course 2014

Introduction
Significance magazine (December 2013), Royal Statistical Society

Contents
• Set Notation
• Intro to Probability Theory
• Random Variables
• Probability Mass Functions
• Common Discrete Distributions

Set Notation
A set is a collection of objects, written using curly brackets {}.
If A is the set of all outcomes, then:
A = {heads, tails}
A = {one, two, three, four, five, six}
A set does not have to comprise the full set of outcomes.
E.g. if A is the set of dice outcomes no higher than three, then:
A = {one, two, three}

If A and B are sets, then:
A′      Complement – everything but A
A ∪ B   Union (or)
A ∩ B   Intersection (and)
A \ B   Difference – in A but not in B
∅       Empty set

Venn diagram example, with outcomes {one, two, three, four, five, six}:
A = {two, three, four, five}
B = {four, five, six}
A ∩ B = {four, five}
A ∪ B = {two, three, four, five, six}
(A ∪ B)′ = {one}
A \ B = {two, three}
(A \ B)′ = {one, four, five, six}

Probability Theory
To consider probabilities, we need three things:
1. Sample space: Ω – the set of all possible outcomes
   Ω = {heads, tails}
   Ω = {one, two, three, four, five, six}
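Sample spaces and events are just sets, so the notation above maps directly onto Python's built-in set type; a minimal sketch using the Venn diagram example (the variable names are mine):

```python
# Set operations from the Venn diagram example, using Python's built-in sets.
omega = {"one", "two", "three", "four", "five", "six"}  # all outcomes
A = {"two", "three", "four", "five"}
B = {"four", "five", "six"}

print(A & B)            # intersection A ∩ B = {four, five}
print(A | B)            # union A ∪ B = {two, three, four, five, six}
print(omega - (A | B))  # complement (A ∪ B)′ = {one}
print(A - B)            # difference A \ B = {two, three}
```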
Event space: F – the set of all possible events
   e.g. for Ω = {heads, tails}:
   F = {{heads, tails}, {heads}, {tails}, ∅}
3. Probability measure: P
   P : F → [0, 1]
   P must satisfy two axioms:
   • P(Ω) = 1 – the probability that some outcome occurs is 1 (100% chance)
   • P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + … whenever A1, A2, … are disjoint
   e.g. for a fair die:
   P({one, two}) = P({one}) + P({two}) = 1/6 + 1/6 = 1/3

As such, a Probability Space is the triple (Ω, F, P), i.e. we need to know:
1. The set of potential outcomes;
2. The set of potential events that may occur; and
3. The probabilities associated with the occurrence of those events.

Notable properties of a Probability Space (Ω, F, P):

P(A′) = 1 − P(A)
e.g. A = {one, two}, A′ = {three, four, five, six}: P(A) = 1/3, P(A′) = 2/3

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
e.g. A = {one, two}, B = {two, three}:
A ∪ B = {one, two, three}, A ∩ B = {two}
P(A) = 1/3, P(B) = 1/3, P(A ∪ B) = 1/2, P(A ∩ B) = 1/6
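These properties can be checked by brute force for the fair die; a small sketch with exact fractions (the uniform counting measure P here is my construction, not from the slides):

```python
from fractions import Fraction

# Uniform probability measure for a fair six-sided die: P(E) = |E| / |Ω|.
omega = {"one", "two", "three", "four", "five", "six"}

def P(event):
    return Fraction(len(event), len(omega))

A = {"one", "two"}
B = {"two", "three"}

print(P(omega - A) == 1 - P(A))            # complement rule holds
print(P(A | B) == P(A) + P(B) - P(A & B))  # inclusion-exclusion holds
print(P(A | B))                            # 1/2
```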
If A ⊆ B then P(A) ≤ P(B), and P(B \ A) = P(B) − P(A)
e.g. A = {one, two}, B = {one, two, three}:
P(A) = 1/3, P(B) = 1/2
B \ A = {three}, P(B \ A) = 1/6

P(∅) = 0

So where's this all going? These examples are trivial!
Suppose there are three bags, B1, B2 and B3, each of which contains a number of coloured balls:
• B1 – 2 red and 4 white
• B2 – 1 red and 2 white
• B3 – 5 red and 4 white
A ball is randomly removed from one of the bags. The bags were selected with probability:
• P(B1) = 1/3
• P(B2) = 5/12
• P(B3) = 1/4
What is the probability that the ball came from B1, given that it is red?

Conditional probability:
P(A | B) = P(A ∩ B) / P(B)

Partition Theorem:
P(A) = ∑i P(A ∩ Bi), if the events Bi partition Ω

Bayes' Theorem:
P(A | B) = P(B | A) P(A) / P(B)
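Putting these results together answers the bag puzzle; a sketch with exact fractions (the dictionary names are mine, the numbers are from the example above):

```python
from fractions import Fraction

# The three-bag example: compute P(B1 | red) via Bayes' theorem.
# Priors on which bag was chosen, and P(red | bag) from the ball counts.
prior = {"B1": Fraction(1, 3), "B2": Fraction(5, 12), "B3": Fraction(1, 4)}
p_red = {"B1": Fraction(2, 6), "B2": Fraction(1, 3), "B3": Fraction(5, 9)}

# Partition theorem: P(red) = Σ P(red | Bi) P(Bi) over the bags
total = sum(prior[b] * p_red[b] for b in prior)
print(total)      # 7/18

# Bayes' theorem: P(B1 | red) = P(red | B1) P(B1) / P(red)
posterior = p_red["B1"] * prior["B1"] / total
print(posterior)  # 2/7
```

So a red ball came from B1 with probability 2/7.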
Random Variables
A Random Variable is an object whose value is determined by chance, i.e. by random events.
It maps elements of Ω onto real numbers, with corresponding probabilities as specified by P.

Formally, a Random Variable is a function:
X : Ω → ℝ

Probability that the random variable X adopts a particular value x:
P({ω ∈ Ω : X(ω) = x})
Shorthand: P(X = x)

Example: toss a coin. If the result is heads then WIN – X takes the value 1; if the result is tails then LOSE – X takes the value 0.
Ω = {heads, tails}
X : Ω → {0, 1}
P(X = 1) = P({heads}), P(X = 0) = P({tails})
so P(X = x) = 1/2 for x ∈ {0, 1}

Example: Ω = {one, two, three, four, five, six}.
Win £20 on a six, nothing on four/five, lose £10 on one/two/three.
X : Ω → {−10, 0, 20}
P(X = 20) = P({six}) = 1/6
P(X = 0) = P({four, five}) = 1/3
P(X = −10) = P({one, two, three}) = 1/2
Note – we are considering the probabilities of events in F.

Probability Mass Functions
Given a random variable X : Ω → A, the Probability Mass Function is defined as:
pX(x) = P(X = x)
(defined only for discrete random variables)
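The die-winnings variable makes these definitions concrete; a sketch treating X as an ordinary function on Ω, recovering pX from it, and computing the mean E(X) = Σ x pX(x) defined shortly (function names are mine):

```python
from fractions import Fraction

# The die-winnings random variable X : Ω → {-10, 0, 20}.
omega = ["one", "two", "three", "four", "five", "six"]

def X(outcome):
    if outcome == "six":
        return 20
    if outcome in ("four", "five"):
        return 0
    return -10

def pmf(x):
    # pX(x) = P({ω ∈ Ω : X(ω) = x}) under the uniform measure
    return Fraction(sum(1 for w in omega if X(w) == x), len(omega))

print(pmf(20), pmf(0), pmf(-10))              # 1/6 1/3 1/2
print(sum(x * pmf(x) for x in {-10, 0, 20}))  # mean E(X) = -5/3
```

On average the game loses about £1.67 per roll.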
Example: win £20 on a six, nothing on four/five, lose £10 on one/two/three:
pX(20) = P({six}) = 1/6
pX(0) = P({four, five}) = 1/3
pX(−10) = P({one, two, three}) = 1/2

Notable properties of Probability Mass Functions:
pX(x) ≥ 0
∑x∈A pX(x) = 1
Interesting note: if p(·) is some function that has the above two properties, then it is the mass function of some random variable…

For a random variable X : Ω → A:
Mean: E(X) = ∑x∈A x pX(x)
Compare with the sample mean: (1/n) ∑i=1..n xi
Median: any m such that ∑x≤m pX(x) ≥ 1/2 and ∑x≥m pX(x) ≥ 1/2
Mode: argmaxx pX(x) – the most likely value

Common Discrete Distributions

The Bernoulli Distribution: X ~ Bern(p)
p : success probability
X : Ω → {0, 1}
pX(x) = p if x = 1; 1 − p if x = 0
Example: a fair coin toss.
X : {heads, tails} → {0, 1}, pX(x) = 1/2 for x ∈ {0, 1}
Therefore X ~ Bern(1/2).

The Binomial Distribution: X ~ Bin(n, p), with E(X) = np
n : number of independent trials
p : success probability
X : Ω → {0, 1, …, n}
pX(x) = C(n, x) p^x (1 − p)^(n−x)
where C(n, x) = n! / (x!(n − x)!) is the binomial coefficient.
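A sketch evaluating this PMF numerically and checking two facts from the slides – that the masses sum to 1 and that E(X) = np; Python's math.comb supplies the binomial coefficient:

```python
from math import comb

# Binomial PMF: pX(x) = C(n, x) p^x (1 - p)^(n - x)
def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5
probs = [binom_pmf(x, n, p) for x in range(n + 1)]

print(abs(sum(probs) - 1) < 1e-12)   # masses sum to 1
mean = sum(x * q for x, q in enumerate(probs))
print(abs(mean - n * p) < 1e-12)     # E(X) = np
print(binom_pmf(1, 1, 0.3))          # n = 1: reduces to Bern(p), giving p = 0.3
```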
In pX(x) = C(n, x) p^x (1 − p)^(n−x):
• pX(x) : probability of getting x successes out of n trials
• p^x : probability of x successes
• (1 − p)^(n−x) : probability of (n − x) failures
• C(n, x) = n! / (x!(n − x)!) : number of ways to achieve x successes and (n − x) failures (binomial coefficient)

Setting n = 1:
pX(x) = p^x (1 − p)^(1−x) = p if x = 1; 1 − p if x = 0
i.e. X ~ Bin(1, p) ⟺ X ~ Bern(p)

[Plots: Binomial PMF for n = 10, p = 0.5; for n = 100, p = 0.5; and for n = 100, p = 0.8.]

Example: number of heads in n fair coin toss trials.
X : Ω → {0, 1, …, n}
For n = 2: Ω = {heads:heads, heads:tails, tails:heads, tails:tails}
In general: |Ω| = 2^n
Notice: X ~ Bin(n, 1/2), so pX(x) = C(n, x) 0.5^n and E(X) = n/2.

The Poisson Distribution: X ~ Pois(λ), with E(X) = λ
Used to model the number of occurrences of an event within a particular interval of time and/or space.
λ : average number of counts (controls rarity of events)
X : Ω → {0, 1, …}
pX(x) = λ^x e^(−λ) / x!
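The Poisson masses can be checked the same way; a sketch verifying that they sum to (essentially) 1 and that the mean is λ – truncating the infinite support at 100 terms is my shortcut, harmless for small λ:

```python
from math import exp, factorial

# Poisson PMF: pX(x) = λ^x e^(-λ) / x!
def pois_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

lam = 2.0
probs = [pois_pmf(x, lam) for x in range(100)]  # truncate the infinite support

print(abs(sum(probs) - 1) < 1e-12)  # masses sum to ~1
mean = sum(x * q for x, q in enumerate(probs))
print(abs(mean - lam) < 1e-9)       # E(X) = λ
```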
Motivation for the Poisson distribution:
• We want to know the distribution of the number of occurrences of an event ⇒ Binomial?
• However, we don't know how many trials are performed – could be infinite!
• But we do know the average rate of occurrence:
  X ~ Bin(n, p) ⇒ E(X) = np, so set λ = np, i.e. p = λ/n

Substituting p = λ/n into the Binomial PMF:
pX(x) = n!/(x!(n − x)!) · (λ/n)^x (1 − λ/n)^(n−x)
      = (λ^x/x!) · n!/(n^x (n − x)!) · (1 − λ/n)^n / (1 − λ/n)^x
As n → ∞, both n!/(n^x (n − x)!) → 1 and (1 − λ/n)^x → 1, while (1 − λ/n)^n → e^(−λ), so
pX(x) = lim n→∞ (λ^x/x!)(1 − λ/n)^n = (λ^x/x!) e^(−λ)  ∎

So the Poisson distribution is the limit of the Binomial distribution as n → ∞ with λ = np fixed:
if Xn ~ Bin(n, p) then Xn →d Pois(np).
If n is large and p is small, the Binomial distribution can therefore be approximated by the Poisson distribution, which is often more computationally convenient. This is referred to as the:
• "Poisson Limit Theorem"
• "Poisson Approximation to the Binomial"
• "Law of Rare Events"

References
Countless books + online resources!
Probability theory and distributions:
• Grimmett and Stirzaker (2001) Probability and Random Processes. Oxford University Press.
General comprehensive introduction to (almost) everything in mathematics (including a bit of probability theory):
• Garrity (2002) All the Mathematics You Missed: But Need to Know for Graduate School. Cambridge University Press.