STAT6110/STAT3110 Statistical Inference
Topic 1: Probability and random samples
Jun Ma
Semester 1, 2020

Details
Unit Convenor: Jun Ma
- Location: 526 Level 5, 12 Wally's Walk
- Phone: 9850 8548
- Email: jun.ma@mq.edu.au
- Consultation: Monday 1-3pm
Tutor: Sophia Shen
- Email: Sophia.Shen@mq.edu.au

Unit Outline
Topic 1: Probability and random samples
Topic 2: Large sample probability concepts
Topic 3: Estimation concepts
Topic 4: Likelihood
Topic 5: Estimation methods
Topic 6: Hypothesis testing concepts
Topic 7: Hypothesis testing methods
Topic 8: Bayesian inference

Statistical inference
This unit is about the theory behind statistical inference.
Statistical inference is the science of drawing conclusions on the basis of numerical information that is subject to randomness.
The core principle is that information about a population can be obtained using a “representative” sample from that population.
A “representative” sample requires that the sample has been taken at random from the population.
To model variability in random samples we use probability models, which means we need probability concepts to study statistical inference.

Population and Sample
[Diagram: a sample (e.g. 300 adults chosen at random) is drawn from a population (e.g. all adults in a population of interest); inferences based on the sample (e.g. at least one-third of adults have high cholesterol) are extrapolated back to the population.]

Topic 1 Outline: Probability and random samples
- Populations and random samples
- Probability and relative frequency
- Probability and set theory
- Probability axioms
- Random variables and probability distributions
- Joint probability distributions
- Independence
- Common probability distributions including the normal distribution
- Sampling variation and statistical inference

Probability and random samples
We usually interpret probability as the long-run frequency with which an event occurs in repeated trials.
We can then model random variation in our sample using the probabilistic variation in repeated samples from the population.
This leads to the Frequentist approach to statistical inference, which is the most common approach and will be our main focus in this unit.
There is also another approach, called Bayesian statistical inference, based on a different interpretation of probability (we will do one lecture on this later in the unit).

Relative frequency
Consider N “samples” taken in identical fashion from a population of interest, and an event of interest that could possibly occur in each of these samples.
Let f_N be the number of samples in which the event occurred.
Then f_N / N is called the relative frequency with which the event occurred.
The probability of the event is then the limit of this relative frequency:

    probability = lim_{N→∞} f_N / N
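This limit can be illustrated by simulation. Below is a minimal Python sketch (the fair-coin event with probability 0.5, the seed and the values of N are illustrative choices, not from the slides) showing the relative frequency f_N / N settling towards the true probability as N grows:

    import numpy as np

    rng = np.random.default_rng(1)          # fixed seed so the run is reproducible
    for N in (10, 100, 1000, 100000):
        occurred = rng.random(N) < 0.5      # each trial: the event occurs with probability 0.5
        f_N = occurred.sum()                # number of trials in which the event occurred
        print(N, f_N / N)                   # relative frequency f_N / N, approaching 0.5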
Set theory
A rigorous description of probability theory uses concepts from set theory.
A set is a collection of objects; an element of a set is a member of this collection.
If ω is an element of a set Ω we write ω ∈ Ω.
A is a subset of a set Ω, written A ⊂ Ω, if ω ∈ A implies ω ∈ Ω.

Set operations
The union A ∪ B is the set such that ω ∈ A ∪ B means ω ∈ A or ω ∈ B.
The intersection A ∩ B is the set such that ω ∈ A ∩ B means ω ∈ A and ω ∈ B.
The complement of A, written A^c (or Ā), is the set such that ω ∈ A^c means ω ∈ Ω but ω ∉ A.

Outcomes, sample spaces and events
Random samples involve uncertainty or variability.
The term outcome refers to a given realisation of this sampling process.
The set of all possible outcomes is referred to as the sample space.
A subset of outcomes in the sample space is called an event.
An event can be interpreted as an observation that could occur in our sample, e.g. a coin toss yields a head.
Each event has a probability assigned to it reflecting the “chance” that it will occur.

Example
Suppose our sample consists of two individuals for whom we record whether a particular infection is present or absent.
Denote presence or absence of the infection by 1 and 0, respectively.
One possible outcome is that both individuals have the infection, denoted by (1, 1).
The sample space is the set of all possible outcomes, that is, all possible pairs of infection statuses for the two individuals:

    Ω = {(0, 0), (0, 1), (1, 0), (1, 1)}

The event “there is exactly one infected individual in the sample” is denoted by the subset {(0, 1), (1, 0)} of the sample space.

Probability and sets
Since events are defined mathematically as sets, we can use set operations to construct new events from existing events.
Consider two events E1 and E2. The new event E1 ∩ E2 is interpreted as the event that both E1 and E2 occur.
The new event E1 ∪ E2 is interpreted as the event that either E1 or E2 or both occur.
The new event E1^c is interpreted as the event that E1 does not occur.
The empty set ∅ is interpreted as an impossible event.
If E1 ∩ E2 = ∅ then E1 and E2 are called mutually exclusive events, with the interpretation that the two events cannot both occur.

Example (cont.)
The event “either 1 or 2 individuals in the sample are infected” corresponds to the union

    {(0, 1), (1, 0)} ∪ {(1, 1)} = {(0, 1), (1, 0), (1, 1)}

The event “both exactly 1 and exactly 2 individuals in the sample are infected” corresponds to the intersection

    {(0, 1), (1, 0)} ∩ {(1, 1)} = ∅

which is impossible since these two events are mutually exclusive.
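Because events are sets of outcomes, the set operations above map directly onto Python's built-in set type. A small sketch for the two-individual infection example (the variable names are ours):

    # Sample space for two individuals: 1 = infected, 0 = not infected
    omega = {(0, 0), (0, 1), (1, 0), (1, 1)}

    E1 = {(0, 1), (1, 0)}          # exactly one individual infected
    E2 = {(1, 1)}                  # exactly two individuals infected

    print(E1 | E2)                 # union: {(0, 1), (1, 0), (1, 1)}
    print(E1 & E2)                 # intersection: set(), the empty set
    print(E1 & E2 == set())        # True: E1 and E2 are mutually exclusive
    print(omega - E1)              # complement of E1 within omega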
Valid probabilities
Probability is a function of events, that is, a function of subsets of the sample space.
Consider an event E that is a subset of the sample space Ω; then Pr(E) denotes the probability that event E will occur.
The function “Pr” is allowed to be any function of subsets of the sample space that satisfies certain requirements making it a valid probability.
Any valid probability must satisfy the following intuitively natural requirements, called axioms.

Axioms of probability
1. The probability of any event E is a number between 0 and 1 inclusive. That is, 0 ≤ Pr(E) ≤ 1.
2. The probability of a certain event is 1 and the probability of an impossible event is 0. That is, Pr(Ω) = 1 and Pr(∅) = 0.
3. If two events E1 and E2 are mutually exclusive, so that they cannot both occur, the probability that either event occurs is the sum of their respective probabilities. That is, if E1 ∩ E2 = ∅ then Pr(E1 ∪ E2) = Pr(E1) + Pr(E2).

Probability properties
Many properties follow from the probability axioms. For example:
1. If A ⊂ B, then Pr(A) ≤ Pr(B)
2. Pr(A^c) = 1 − Pr(A)
3. Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
These properties can be illustrated using Venn diagrams like those used for the set operations above (see also tutorial).

Example (cont.) - 3 probability assignments

    Event                      probability 1   probability 2   probability 3
    ∅                          0               0               0
    {(0,0)}                    0.9025          0.3025          0.3000
    {(0,1)}                    0.0475          0.2475          0.3000
    {(1,0)}                    0.0475          0.2475          0.3000
    {(1,1)}                    0.0025          0.2025          0.3000
    {(0,0), (0,1)}             0.9500          0.5500          0.6000
    {(0,0), (1,0)}             0.9500          0.5500          0.6000
    {(0,0), (1,1)}             0.9050          0.5050          0.6000
    {(0,1), (1,0)}             0.0950          0.4950          0.6000
    {(0,1), (1,1)}             0.0500          0.4500          0.6000
    {(1,0), (1,1)}             0.0500          0.4500          0.6000
    {(0,0), (0,1), (1,0)}      0.9975          0.7975          0.9000
    {(0,0), (0,1), (1,1)}      0.9525          0.7525          0.9000
    {(0,0), (1,0), (1,1)}      0.9525          0.7525          0.9000
    {(0,1), (1,0), (1,1)}      0.0975          0.6975          0.9000
    Ω                          1               1               1
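Which of these assignments is valid can be checked mechanically: applying axiom 3 repeatedly to the four mutually exclusive singleton events forces Pr(Ω) to equal the sum of the four singleton probabilities. A Python sketch (the singleton probabilities are read from the table above):

    # Singleton probabilities of {(0,0)}, {(0,1)}, {(1,0)}, {(1,1)} under each assignment
    assignments = {
        1: [0.9025, 0.0475, 0.0475, 0.0025],
        2: [0.3025, 0.2475, 0.2475, 0.2025],
        3: [0.3000, 0.3000, 0.3000, 0.3000],
    }
    for label, probs in assignments.items():
        total = sum(probs)      # axiom 3 forces Pr(Omega) to equal this sum
        print(label, total, "valid" if abs(total - 1) < 1e-12 else "violates the axioms")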
Example (cont.)
The probability axioms are only satisfied by probability assignments 1 and 2.
Probability assignment 3 is invalid because

    Pr(Ω) = Pr({(0,0), (0,1), (1,0), (1,1)}) = 1
          ≠ Pr({(0,0)}) + Pr({(0,1)}) + Pr({(1,0)}) + Pr({(1,1)}) = 1.2

Consider the event E1 “exactly one individual is infected” and the event E2 “the first individual is infected”:

    E1 = {(0,1), (1,0)}    E2 = {(1,0), (1,1)}    E1 ∩ E2 = {(1,0)}

Using property 3 from the probability properties above, Pr(E1 ∪ E2) is

    0.0950 + 0.0500 − 0.0475 = 0.0975  under assignment 1, or
    0.4950 + 0.4500 − 0.2475 = 0.6975  under assignment 2

In each case the calculation agrees with the probability assigned to the event {(0,1), (1,0), (1,1)} in the table.

Random variables
A random variable is a function of outcomes in the sample space.
It can be thought of as an uncertain quantity that takes on values with particular probabilities.
A random variable that can take on only a discrete set of values is referred to as a discrete random variable.
A random variable that can take on a continuum of values is referred to as a continuous random variable.
For example: the gender of a randomly sampled individual is discrete, while their cholesterol level is continuous.

Random variables and probabilities
Statements about a random variable taking on a particular value, or having a value in a particular range, are events.
For a random variable X and a given number x, statements such as X = x and X ≤ x are events.
We can therefore assign probabilities Pr(X = x) and Pr(X ≤ x) to such events.
A general convention is that random variables are denoted by upper-case letters, while the values that they can take on are denoted by lower-case letters.
This distinction will be important in subsequent lectures.

Probability distributions
The probability distribution of a random variable is a rule for assigning a probability to any event stating that the random variable takes on a specific value or lies in a specific range.
There are various ways to specify the probability distribution of a random variable. We will use three functions:
1. Cumulative distribution function (or simply called distribution function)
2. Probability function
3. Probability density function
These are not the only functions that can be used to specify a probability distribution, but they are the only ones we will use.

Cumulative distribution function
The cumulative distribution function of a random variable X is the function F_X(x) such that

    F_X(x) = Pr(X ≤ x)  for any value x

Any valid cumulative distribution function must therefore satisfy the following three properties:

    (i)   lim_{x→∞} F_X(x) = 1
    (ii)  lim_{x→−∞} F_X(x) = 0
    (iii) F_X(x1) ≥ F_X(x2) whenever x1 ≥ x2

Probability function
For a discrete random variable X, the probability function gives the probability that the random variable will equal any specific value:

    f_X(x) = Pr(X = x)

where x is any number in the set of possible values that X can take on.
For any discrete random variable, Σ_x f_X(x) = 1, where the summation is taken over all possible values that X can take on.
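For standard discrete distributions the probability function and cumulative distribution function are available directly in scipy. A Python sketch with a binomial random variable; the choice n = 2, p = 0.05 is ours, made because it reproduces the singleton probabilities of assignment 1 (0.9025 = 0.95² and so on):

    from scipy.stats import binom

    n, p = 2, 0.05                        # two individuals, each infected with probability 0.05
    for x in range(n + 1):
        print(x, binom.pmf(x, n, p), binom.cdf(x, n, p))    # f_X(x) and F_X(x)

    # The probability function sums to 1 over all possible values
    print(sum(binom.pmf(x, n, p) for x in range(n + 1)))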
Probability density function
For a continuous random variable X, the probability density function is the derivative of the cumulative distribution function:

    f_X(x) = d F_X(x) / dx

It specifies the probability that a continuous random variable will fall into any given range through the relationship

    Pr(l ≤ X ≤ u) = ∫_l^u f_X(x) dx

It must therefore always integrate to 1 over the range (−∞, ∞).
The probability density function does not give the probability that a continuous random variable is equal to a specific value: for a continuous random variable it is always the case that Pr(X = x) = 0 (can you think why?).

Attributes of probability distributions: Expectation
The probability distribution of a random variable has various attributes that summarise the way the random variable tends to behave.
The expectation, or mean, of a random variable is the average value that the random variable takes on.
For discrete random variables the expectation is

    E(X) = Σ_x x f_X(x)

where the summation is over all possible values of the random variable X.
For continuous random variables the expectation is

    E(X) = ∫_{−∞}^{∞} x f_X(x) dx

Because summation and integration are linear operations, expectation possesses an important linearity property: for constants c0 and c1,

    E(c0 + c1 X) = c0 + c1 E(X)

Attributes of probability distributions: Variance
The variance of a random variable is a measure of the degree of variation that the random variable exhibits.
It is defined, for both continuous and discrete random variables, as

    Var(X) = E[(X − E(X))²] = E(X²) − E(X)²

Unlike expectation, variance does not have the linearity property; it is replaced by the equally important property

    Var(c0 + c1 X) = c1² Var(X)

Attributes of probability distributions: Percentiles
Other important attributes are percentiles.
For α ∈ (0, 1), the α-percentile of a probability distribution is the point below which 100α% of the distribution falls.
The α-percentile of a probability distribution with cumulative distribution function F_X(x) is the point p_α that satisfies F_X(p_α) = α.
For example, the 0.5 percentile, called the median, is the point below which half of the probability distribution lies.
The 0.25 and 0.75 percentiles, called quartiles, specify the points below which one-quarter and three-quarters of the distribution lie.
Other percentiles of a probability distribution will also be of interest, particularly when we come to discuss confidence intervals in subsequent topics.
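Expectation, variance and percentiles can be computed numerically for any density. A Python sketch for an exponential distribution with mean 2 (an illustrative choice, not from the slides), comparing direct integration against scipy's closed forms:

    import numpy as np
    from scipy import integrate
    from scipy.stats import expon

    scale = 2.0                                    # exponential with E(X) = 2

    def f(x):
        return expon.pdf(x, scale=scale)           # density f_X(x)

    mean, _ = integrate.quad(lambda x: x * f(x), 0, np.inf)       # E(X)
    ex2, _ = integrate.quad(lambda x: x**2 * f(x), 0, np.inf)     # E(X^2)
    var = ex2 - mean**2                                           # Var(X) = E(X^2) - E(X)^2

    print(mean, expon.mean(scale=scale))           # both 2.0
    print(var, expon.var(scale=scale))             # both 4.0
    print(expon.ppf(0.5, scale=scale))             # median p_0.5 = 2 log 2, about 1.386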
Example (cont.)
Define a random variable T to be the number infected in the sample of 2 people.
T is a discrete random variable since its possible values are 0, 1 and 2.
The table gives the value of T for each outcome in the sample space, together with the probability distribution of T under probability assignment 1 discussed earlier.

    t   Event T = t        f_T(t)    F_T(t)
    0   {(0,0)}            0.9025    0.9025
    1   {(0,1), (1,0)}     0.0950    0.9975
    2   {(1,1)}            0.0025    1

    E(T) = 0 × 0.9025 + 1 × 0.0950 + 2 × 0.0025 = 0.1
    Var(T) = (0² × 0.9025 + 1² × 0.0950 + 2² × 0.0025) − 0.1² = 0.095

Conditional probability
The probability of an event might change once we know that some other event has occurred; when it does, the event depends on the other event.
For two events E1 and E2, the conditional probability that E1 occurs given that E2 has occurred is denoted Pr(E1 | E2) and is defined as

    Pr(E1 | E2) = Pr(E1 ∩ E2) / Pr(E2)

This is defined only for events E2 that are not impossible, so that Pr(E2) ≠ 0 in the denominator.
It does not make sense for us to condition on the occurrence of an impossible event.
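Conditional probabilities can also be estimated by simulation: generate many samples, keep those in which the conditioning event occurred, and compute the relative frequency of the event of interest within that subset. A Python sketch of Pr(first individual infected | exactly one infected) for the infection example, assuming (as probability assignment 1 implies, and as a later slide confirms) that the two individuals are independently infected with probability 0.05:

    import numpy as np

    rng = np.random.default_rng(2)
    N = 1_000_000
    x1 = rng.random(N) < 0.05            # infection status of individual 1
    x2 = rng.random(N) < 0.05            # infection status of individual 2

    exactly_one = x1 ^ x2                # the conditioning event: exactly one infected
    # relative frequency of "individual 1 infected" among samples where the condition holds
    print((x1 & exactly_one).sum() / exactly_one.sum())     # approximately 0.5

The ratio of the two frequencies mirrors the definition Pr(E1 ∩ E2) / Pr(E2), with probabilities replaced by relative frequencies.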
Independence
Independence is a property that applies to both events and random variables.
Using the definition of conditional probability, two events E1 and E2 are independent events if

    Pr(E1 | E2) = Pr(E1)

The occurrence of the event E2 does not affect the probability of occurrence of the event E1 (and vice versa).
We can re-express this definition by saying that E1 and E2 are independent events if they satisfy the multiplicative property

    Pr(E1 ∩ E2) = Pr(E1) Pr(E2)

Independent random variables
Statistical inference makes more use of the concept of independence when applied to random variables.
Consider two random variables X1 and X2, with cumulative distribution functions F1(x1) and F2(x2).
X1 and X2 are said to be independent random variables if

    Pr(X1 ≤ x1 | X2 ≤ x2) = Pr(X1 ≤ x1) = F1(x1)
    Pr(X2 ≤ x2 | X1 ≤ x1) = Pr(X2 ≤ x2) = F2(x2)

where x1 and x2 are in the range of possible values of X1 and X2.
Knowing the value of one random variable does not affect the probability distribution of the other.

Independent random variables (cont.)
Like independence of events, independence of random variables can be defined using the multiplicative property

    Pr({X1 ≤ x1} ∩ {X2 ≤ x2}) = Pr(X1 ≤ x1) Pr(X2 ≤ x2) = F1(x1) F2(x2)

We can see from this form that independence of random variables is defined in terms of independence of the two events X1 ≤ x1 and X2 ≤ x2.

Joint probability distributions
The above discussion introduces the concept of the joint probability distribution of two random variables.
It generalises the definition of a probability distribution for a single random variable to two or more random variables.
The joint probability distribution of two random variables is a rule for assigning probabilities to any event stating that the two random variables simultaneously take on specific values or lie in specific ranges.
Like the probability distribution of a single random variable, the joint probability distribution can be characterised by various functions.

Joint cumulative distribution function
The first such function is a generalisation of the cumulative distribution function.
Using the shorthand notation

    Pr(X1 ≤ x1, X2 ≤ x2) ≡ Pr({X1 ≤ x1} ∩ {X2 ≤ x2})

the joint cumulative distribution function of two random variables X1 and X2 is the function of two variables

    F_{X1,X2}(x1, x2) = Pr(X1 ≤ x1, X2 ≤ x2)

So independence of two random variables is equivalent to their joint cumulative distribution function factoring into the product of their individual cumulative distribution functions.

Joint probability function
The joint probability function of two discrete random variables X1 and X2 is the function of two variables

    f_{X1,X2}(x1, x2) = Pr(X1 = x1, X2 = x2)

The multiplicative property for independence of two discrete random variables can equivalently be expressed in terms of their joint probability function.
That is, two discrete random variables X1 and X2 are independent if

    f_{X1,X2}(x1, x2) = f1(x1) f2(x2)

where f1(x1) and f2(x2) are the probability functions of X1 and X2.
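This factorisation can be checked directly when the joint probability function is stored as a table: recover the marginal probability functions by summing rows and columns, then compare the joint table with their outer product. A Python sketch using the infection example under assignment 1 (the independence result is confirmed formally on a later slide):

    import numpy as np

    # Rows: x1 = 0, 1 (individual 1); columns: x2 = 0, 1 (individual 2)
    joint = np.array([[0.9025, 0.0475],
                      [0.0475, 0.0025]])

    f1 = joint.sum(axis=1)               # marginal probability function of X1
    f2 = joint.sum(axis=0)               # marginal probability function of X2
    print(np.allclose(joint, np.outer(f1, f2)))    # True: X1 and X2 are independent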
Joint probability density function
The joint probability density function of two continuous random variables X1 and X2 is the function of two variables

    f_{X1,X2}(x1, x2) = ∂² F_{X1,X2}(x1, x2) / (∂x1 ∂x2)

where the symbol ∂ denotes partial differentiation of a multivariable function, in place of the symbol d used in univariable differentiation.
The joint probability density function specifies the probability that the two continuous random variables will simultaneously fall into any two given ranges through the relationship

    Pr(l1 ≤ X1 ≤ u1, l2 ≤ X2 ≤ u2) = ∫_{l1}^{u1} ∫_{l2}^{u2} f_{X1,X2}(x1, x2) dx2 dx1

Correlation and covariance
The covariance of X and Y is defined as

    Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

We say that X and Y are uncorrelated when Cov(X, Y) = 0, i.e. when E(XY) = E(X)E(Y).
Being uncorrelated random variables is a weaker property than being independent random variables: independent implies uncorrelated, but not vice versa.
Covariance is a generalisation of variance: Cov(X, X) = Var(X).

Correlation and covariance (cont.)
A measure of the extent to which two random variables depart from being uncorrelated is the correlation

    Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))

Correlation is scaled so that it always lies between −1 and 1, with 0 corresponding to being uncorrelated.
It is important in studying the linear relationship between two variables, with the extremes of −1 and 1 corresponding to a perfect negative and positive linear relationship, respectively.
Although being uncorrelated implies that there is no linear relationship between two variables, it does not preclude that some other relationship exists. This is another reason why independence is a stronger property than being uncorrelated.

Correlation example
Suppose (X, Y) can be either

    (2, 2)    with 10% probability,
    (−1, 1)   with 40% probability,
    (1, −1)   with 40% probability,
    (−2, −2)  with 10% probability.

The random variables X and Y are certainly dependent, since if we know what one of them is, we can figure out what the other one is too.
[Figure: the four support points plotted in the (X, Y) plane.]

Correlation example (cont.)
On the other hand, E[XY], E[X] and E[Y] are all zero; for instance,

    E[XY] = 10% × 2 × 2 + 40% × (−1) × 1 + 40% × 1 × (−1) + 10% × (−2) × (−2)
          = 0.4 − 0.4 − 0.4 + 0.4 = 0

so the correlation between X and Y is zero.
X and Y are uncorrelated but not independent.
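A Python sketch verifying this calculation numerically (the arrays hold the four support points and their probabilities):

    import numpy as np

    pts = np.array([(2, 2), (-1, 1), (1, -1), (-2, -2)], dtype=float)
    p = np.array([0.10, 0.40, 0.40, 0.10])

    x, y = pts[:, 0], pts[:, 1]
    ex, ey = (p * x).sum(), (p * y).sum()          # E(X) = E(Y) = 0
    cov = (p * x * y).sum() - ex * ey              # E(XY) - E(X)E(Y) = 0
    varx = (p * x**2).sum() - ex**2
    vary = (p * y**2).sum() - ey**2
    print(cov / np.sqrt(varx * vary))              # Corr(X, Y) = 0: uncorrelated
    # ...yet dependent: given X = 2, Y must equal 2, so knowing X pins down Y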
Independent random samples
The main use of the concept of independence in this unit is for modelling a random sample from a population.
We will often use a collection of n random variables to represent n observations in a random sample and assume that these observations are independent.
For a random sample, independence means that one observation does not affect the probability distribution of another observation.
The n random variables X = (X1, ..., Xn) are (mutually) independent if their joint cumulative distribution function factors into the product of their n individual cumulative distribution functions, and likewise for the joint density or probability function:

    F_X(x) = Pr(X1 ≤ x1, ..., Xn ≤ xn) = ∏_{i=1}^{n} F_i(x_i)    and    f_X(x) = ∏_{i=1}^{n} f_i(x_i)

Independence example
Random variable T0 is 1 if exactly one individual is infected and 0 otherwise.
Random variable T1 is 1 if the first individual is infected and 0 otherwise.
Random variable T2 is 1 if the second individual is infected and 0 otherwise.
Consider the events T0 = 1, T1 = 1 and T2 = 1, denoted E0, E1 and E2.
E0 = {(0,1), (1,0)} and Pr(E0) = 0.095 under probability assignment 1 in the table above.
Likewise we have E1 = {(1,0), (1,1)} with Pr(E1) = 0.05, as well as E2 = {(0,1), (1,1)} with Pr(E2) = 0.05.
Conditional probability:

    Pr(E1 | E0) = Pr(E1 ∩ E0) / Pr(E0) = Pr({(1,0)}) / 0.095 = 0.0475 / 0.095 = 0.5

Independence example (cont.)
Thus, given we know exactly one person is infected, it is equally likely to be individual 1 or individual 2.
E0 and E1 are not independent events, since Pr(E1 | E0) ≠ Pr(E1): knowledge that there is one infected individual provides information about whether individual 1 is infected.
On the other hand, T1 = 1 and T2 = 1 are independent events:

    Pr(E1 ∩ E2) = Pr({(1,1)}) = 0.0025 = 0.05 × 0.05 = Pr(E1) Pr(E2)

The same process can be followed for any other values of the random variables T1 and T2 to show that

    Pr(T1 = t1, T2 = t2) = Pr(T1 = t1) Pr(T2 = t2),    t1 = 0, 1;  t2 = 0, 1

That is, the random variables T1 and T2 are independent random variables.

Common probability distributions
Probability distributions commonly used in statistical inference are based on a simple and flexible function for f_X(x) or F_X(x).
In subsequent lectures we will use many common probability distributions.
All of these are summarised in the accompanying document “Common Probability Distributions” (which will be reviewed in the lecture).
Common discrete distributions include: the binomial, Poisson, geometric, negative binomial and hypergeometric distributions.
Common continuous distributions include: the normal, exponential, gamma, uniform, beta, t, χ² and F distributions.

Normal distribution
The normal distribution is the most important distribution for statistical inference: in large samples it unifies many statistical inference tools.
The large sample concepts will be considered in Topics 2 and 3; for now we will simply review some of the key features.
Consider a continuous random variable X with µ = E(X) and σ² = Var(X).
X has a normal distribution, written X ∼ N(µ, σ²), if the probability density function of X has the form

    f_X(x) = (1 / (σ√(2π))) exp(−(x − µ)² / (2σ²)),    x ∈ (−∞, ∞)

Standard normal distribution
The cumulative distribution function F_X(x) has no convenient closed form and needs to be calculated numerically.
This is done using a special case, called the standard normal distribution, which is the N(0, 1) distribution.
Let the standard normal cumulative distribution function be

    Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−u²/2) du

Then the cumulative distribution function associated with any other normal distribution is

    F_X(x) = Φ((x − µ) / σ)

The α-percentile of the standard normal distribution is z_α, where

    Φ(z_α) = α    or equivalently    z_α = Φ^{-1}(α)

[Figure: the standard normal density with percentiles marked, alongside a N(µ, σ²) density with µ − 2σ, µ and µ + 2σ marked on the x-axis.]
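Φ and Φ^{-1} are available in scipy as norm.cdf and norm.ppf. A short Python sketch, including the standardisation identity F_X(x) = Φ((x − µ)/σ) with illustrative values of µ and σ:

    from scipy.stats import norm

    # Percentiles of the standard normal: z_alpha = Phi^{-1}(alpha)
    print(norm.ppf(0.975))                 # about 1.96
    print(norm.cdf(norm.ppf(0.975)))       # 0.975, since Phi(z_alpha) = alpha

    # F_X(x) for X ~ N(mu, sigma^2) via standardisation
    mu, sigma, x = 10.0, 2.0, 13.0
    print(norm.cdf(x, loc=mu, scale=sigma))        # direct evaluation
    print(norm.cdf((x - mu) / sigma))              # Phi((x - mu) / sigma), identical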
Bivariate normal distribution
The bivariate normal distribution is a joint probability distribution.
Consider two normally distributed random variables X and Y with Corr(X, Y) = ρ.
We call µ the mean vector and Σ the variance-covariance matrix:

    µ = (µ_X, µ_Y)^T    and    Σ = [ σ_X²        ρ σ_X σ_Y ]
                                   [ ρ σ_X σ_Y   σ_Y²      ]

Then X and Y have a bivariate normal distribution, written X ∼ N2(µ, Σ) where X = (X, Y)^T, if their joint probability density function is of the form

    f_{X,Y}(x, y) = (1 / (2π σ_X σ_Y √(1 − ρ²))) exp(−(1/2) (x − µ)^T Σ^{−1} (x − µ))

where x = (x, y)^T.

Multivariate normal distribution
This is a generalisation of the normal distribution, giving the joint distribution of a k × 1 vector of random variables X = (X1, ..., Xk)^T.
The joint probability density function is

    f_X(x) = ((2π)^k det(Σ))^{−1/2} exp(−(1/2) (x − µ)^T Σ^{−1} (x − µ)),    x ∈ ℝ^k

where det(Σ) is the matrix determinant of Σ.
µ = (µ1, ..., µk)^T is called the mean vector.
The k × k matrix Σ is called the variance-covariance matrix and must be a non-negative definite matrix.
Its main use in this unit is as the distribution of estimators in large samples; more on this in later topics.

Inference example
We will now consider how to use a probability model for the sampling variation in a simple introductory example.
Example: assessment of disease prevalence in a population.
- We are interested in the proportion of a population that has a particular disease, called θ.
- We sample n individuals at random from the population and observe the number of individuals who have the disease.
- We assume our sample is truly random and not biased, i.e. we have not systematically over- or under-sampled diseased individuals.
- How would we use the sample to make inferences about θ?

Inference about the population
The population prevalence θ is considered to be a fixed constant.
Our goal is to use the sample to estimate this unknown constant and also to place some appropriate uncertainty limits around our estimate.
The starting point is the natural estimate of the unknown population prevalence: the observed proportion in our sample.
By using the observed sample prevalence to make inferences about the disease prevalence in the population, we are extrapolating from the sample to the population.
Such sampling and extrapolation is necessary because we cannot assess the entire population.

Sampling variation
How much do we “trust” the observed sample prevalence as an estimate of the population prevalence?
The answer depends on the sampling variation: the extent to which the sample prevalence tends to vary from sample to sample.
If our sample included n = 1000 individuals we would “trust” the observed sample prevalence more than if our sample included n = 100 individuals.
Consider a plot of repeated samples with different sample sizes.

Figure 1: Results from 10 prevalence studies with sample size 100, and 10 prevalence studies with sample size 1000. [Figure: dot plots of sample prevalence (%) for the two sample sizes, on a scale of roughly 5 to 25%.]
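The pattern in Figure 1 is easy to reproduce by simulation. A Python sketch (the true prevalence θ = 0.15, the seed and the 10 studies per sample size are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(3)
    theta = 0.15                                   # true population prevalence
    for n in (100, 1000):
        x = rng.binomial(n, theta, size=10)        # 10 studies, each of size n
        prev = 100 * x / n                         # sample prevalences in %
        print(n, np.round(prev, 1), prev.std())    # the spread shrinks as n grows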
Probability model
In order to quantify our “trust” in the sample prevalence, we need some way of describing its variability.
This can be done using a probability model.
In this example the binomial distribution provides a natural model for the way the sampling has been carried out, assuming:
- n is fixed, not random;
- individuals are sampled independently.
We then have a probability model for the observed number of diseased individuals X and the sample prevalence P = X / n.

Binomial model

    Pr(X = x) = [n! / (x! (n − x)!)] θ^x (1 − θ)^{n−x},    x = 0, ..., n

or, in terms of the sample prevalence,

    Pr(P = p) = [n! / ((pn)! (n − pn)!)] θ^{pn} (1 − θ)^{n−pn},    pn = 0, ..., n

We can use this distribution to quantify our trust in the sample prevalence as an indication of the population prevalence, particularly using the distribution's mean and variance.
We can also use this model to calculate a confidence interval, which is an important summary of our “trust” in the sample.
We will come back to this in Topic 3, after discussing the large sample normal approximation to the binomial distribution and some key estimation concepts.
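A Python sketch of the quantities this model provides, using scipy's binomial distribution (θ = 0.15 and n = 100 are illustrative choices):

    from scipy.stats import binom

    theta, n = 0.15, 100
    print(binom.pmf(20, n, theta))         # Pr(X = 20) under the model

    # Mean and variance of the sample prevalence P = X / n
    print(binom.mean(n, theta) / n)        # E(P) = theta
    print(binom.var(n, theta) / n**2)      # Var(P) = theta * (1 - theta) / n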