Statistics Key Points

1.1.1 Discrete data can only take on certain values. Continuous data can take on any value, possibly within a limited range.

1.1.2 For ungrouped data, the mean is $\bar{x} = \frac{\Sigma x}{n}$. For grouped data, the mean is $\bar{x} = \frac{\Sigma xf}{\Sigma f}$.

1.1.3 For ungrouped coded data, the mean is $\bar{x} = \frac{1}{a}\left(\frac{\Sigma(ax - b)}{n} + b\right)$. For grouped coded data, the mean is $\bar{x} = \frac{1}{a}\left(\frac{\Sigma(ax - b)f}{\Sigma f} + b\right)$. These formulae can be summarised by writing $\bar{x} = \frac{1}{a} \cdot [\text{mean}(ax - b) + b]$.

1.1.4 Interquartile range = upper quartile − lower quartile, or $IQR = Q_3 - Q_1$.

1.1.5 Standard deviation = $\sqrt{\text{Variance}}$.
For ungrouped data, $\sigma = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n}} = \sqrt{\frac{\Sigma x^2}{n} - \bar{x}^2}$, where $\bar{x} = \frac{\Sigma x}{n}$.
For grouped data, $\sigma = \sqrt{\frac{\Sigma(x - \bar{x})^2 f}{\Sigma f}} = \sqrt{\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2}$, where $\bar{x} = \frac{\Sigma xf}{\Sigma f}$.

1.2.1 A probability distribution shows all the possible values of a variable, and the sum of the probabilities is $\Sigma p = 1$.

1.2.2 The expectation of a discrete random variable (DRV) is $E(X) = \Sigma xp = \Sigma[x \cdot P(X = x)]$.

1.3.1 If $X \sim B(n, p)$ then the probability of $r$ successes is $p_r = \binom{n}{r} p^r (1 - p)^{n - r}$.

1.3.2 The mean and variance of $X \sim B(n, p)$ are given, respectively, by $\mu = np$ and $\sigma^2 = np(1 - p) = npq$.

1.4.1 A random variable $X$ that has a geometric distribution is denoted by $X \sim \mathrm{Geo}(p)$, and the probability that the first success occurs on the $r$th trial is $p_r = p(1 - p)^{r - 1}$ for $r = 1, 2, 3, \ldots$

1.4.2 When $X \sim \mathrm{Geo}(p)$ and $q = 1 - p$, then
• $P(X \le r) = 1 - q^r$
• $P(X > r) = q^r$

1.4.3 The mode of all geometric distributions is 1.

1.5.1 $X \sim N(\mu, \sigma^2)$ describes a normally distributed random variable. We read this as "$X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$".

1.5.2 The standard normal variable is $Z \sim N(0, 1)$.

1.5.3 When $X \sim N(\mu, \sigma^2)$ then $Z = \frac{X - \mu}{\sigma}$ has a normal distribution. [Refer to 2.2.3] A standardised value $z = \frac{x - \mu}{\sigma}$ tells us how many standard deviations $x$ is from the mean.
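The binomial, geometric and standardisation formulae in 1.3.1 to 1.5.3 can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names and the example numbers (n = 10, p = 0.3, and a value 70 taken from a distribution with mean 60 and standard deviation 5) are illustrative assumptions, not part of the key points.

```python
from math import comb
from statistics import NormalDist


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p), as in 1.3.1."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def geometric_pmf(r: int, p: float) -> float:
    """P(X = r) for X ~ Geo(p): first success on trial r, as in 1.4.1."""
    return p * (1 - p) ** (r - 1)


def geometric_cdf(r: int, p: float) -> float:
    """P(X <= r) = 1 - q^r with q = 1 - p, as in 1.4.2."""
    return 1 - (1 - p) ** r


def standardise(x: float, mu: float, sigma: float) -> float:
    """z = (x - mu) / sigma, as in 1.5.3."""
    return (x - mu) / sigma


# Assumed example values, chosen only for illustration.
n, p = 10, 0.3
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

total = sum(pmf)                                   # 1.2.1: probabilities sum to 1
mean = sum(r * pr for r, pr in enumerate(pmf))     # 1.3.2: should agree with np
var = sum(r * r * pr for r, pr in enumerate(pmf)) - mean**2  # should agree with npq

z = standardise(70, 60, 5)          # 70 is two standard deviations above 60
prob_below = NormalDist().cdf(z)    # P(Z < z) for the standard normal variable

print(total, mean, var)
print(geometric_pmf(1, p), geometric_cdf(3, 0.5))
print(z, prob_below)
```

Summing the binomial probabilities directly is a useful sanity check on 1.2.1 and 1.3.2, since the computed mean and variance should match $np$ and $npq$ up to rounding.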
1.5.4 $X \sim B(n, p)$ can be approximated by $N(\mu, \sigma^2)$, where $\mu = np$ and $\sigma^2 = npq$, provided that $n$ is large enough to ensure that $np > 5$ and $nq > 5$.

1.5.5 Continuity corrections must be made when a discrete distribution is approximated by a continuous distribution.

2.1.1 A Poisson distribution can be used to model a discrete probability distribution in which the events occur singly, at random and independently, in a given interval of space or time. The mean and variance of a Poisson distribution are equal; hence, a Poisson distribution has only one parameter.

2.1.2 When modelling data using a Poisson distribution:
• Work out the mean and variance and check if they are approximately equal.
• If the mean and variance are not approximately equal, the Poisson distribution is not a suitable model to use with the data.
• Use the mean to calculate probabilities and expected frequencies.
• Compare expected frequencies with observed frequencies.

2.1.3 If the random variable $X$ has a Poisson distribution with parameter $\lambda$, where $\lambda > 0$, we write $X \sim \mathrm{Po}(\lambda)$ and:
• $P(X = r) = e^{-\lambda} \cdot \frac{\lambda^r}{r!}$ where $r = 0, 1, 2, \ldots$
• $E(X) = \lambda$
• $\mathrm{Var}(X) = \lambda$

2.1.4 In a Poisson distribution, events occur at a constant rate; the mean number of events in a given interval is proportional to the size of that interval.

2.2.1 For a random variable $X$ and constants $a$ and $b$: $E(aX \pm b) = aE(X) \pm b$ and $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.

2.2.2 For two independent random variables $X$ and $Y$ and constants $a$ and $b$: $E(aX \pm bY) = aE(X) \pm bE(Y)$ and $\mathrm{Var}(aX \pm bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$. These results can be extended to any number of independent random variables.

2.2.3 If a continuous random variable $X$ has a normal distribution, then $aX + b$, where $a$ and $b$ are constants, also has a normal distribution. If continuous random variables $X$ and $Y$ have independent normal distributions, then $aX + bY$, where $a$ and $b$ are constants, has a normal distribution.
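The Poisson results in 2.1.3 and 2.1.4 and the linear-combination results in 2.2.2 and 2.2.3 can be illustrated with a short Python sketch. The rate of 2 events per hour, the distributions $X \sim N(10, 4)$ and $Y \sim N(4, 9)$, and the helper names are all assumptions made up for this example.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam), as in 2.1.3."""
    return exp(-lam) * lam**r / factorial(r)


# 2.1.4: the mean scales with the interval. Assumed rate: 2 events per hour,
# so a 3-hour interval has parameter 6.
rate_per_hour = 2.0
lam = rate_per_hour * 3

# Truncate the infinite sum at r = 59; the remaining tail is negligible here.
terms = [poisson_pmf(r, lam) for r in range(60)]
pois_mean = sum(r * pr for r, pr in enumerate(terms))
pois_var = sum(r * r * pr for r, pr in enumerate(terms)) - pois_mean**2


def combine(a, mean_x, var_x, b, mean_y, var_y):
    """Mean and variance of aX + bY for independent X and Y, as in 2.2.2."""
    return a * mean_x + b * mean_y, a**2 * var_x + b**2 * var_y


# Assumed example: W = 2X - Y with X ~ N(10, 4) and Y ~ N(4, 9), independent.
# Note that the variances add even though Y is subtracted.
w_mean, w_var = combine(2, 10, 4, -1, 4, 9)

# 2.2.3: W is itself normal, so probabilities follow from its mean and variance.
W = NormalDist(mu=w_mean, sigma=sqrt(w_var))
p_w = W.cdf(21)   # P(W < 21), one standard deviation above the mean

print(pois_mean, pois_var)
print(w_mean, w_var, p_w)
```

The computed Poisson mean and variance should both be close to the parameter, which is the equality that 2.1.2 asks you to check against data.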
2.3.1 The function $f(x)$ whose graph represents a continuous random variable is the probability density function (PDF). The PDF has the following properties:
• It cannot be negative since you cannot have a negative probability; $f(x) \ge 0$.
• Total probability of all outcomes = 1; hence, $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
In many situations, the data are defined across a specified interval or across specified intervals, outside of which $f(x) = 0$.

2.3.2
• With continuous random variables, each individual value has zero probability of occurring. For a continuous random variable with PDF $f(x)$, $P(X = a) = 0$.
• Because we cannot find the probability of an exact value, when finding the probability in a given interval it does not matter whether you use $<$ or $\le$: $P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b)$.
Note that this does not imply that $X$ cannot take the value $a$; it just means the probability of the exact value $a$ is zero.

2.3.3 The probability of $X$ lying in the interval $(a, b)$ is given by the area under the graph between $a$ and $b$. That is: $P(a < X < b) = \int_a^b f(x)\,dx$.

2.3.4 The median, $m$, of a continuous random variable is that value for which $P(X < m) = \int_{-\infty}^{m} f(x)\,dx = \frac{1}{2}$.

2.3.5 The $k$th percentile, $t$, of a continuous random variable is that value for which $P(X < t) = \int_{-\infty}^{t} f(x)\,dx = \frac{k}{100}$, where $0 < k < 100$.

2.3.6 For continuous random variables with PDF $f(x)$: $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ and $\mathrm{Var}(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \left(\int_{-\infty}^{\infty} x f(x)\,dx\right)^2$.
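The integrals in 2.3.1 to 2.3.6 can be approximated numerically when working by hand is inconvenient. The sketch below assumes the example PDF $f(x) = 2x$ on $[0, 1]$ (zero elsewhere), which is not taken from the key points, and uses a simple midpoint rule with bisection; the helper names `integrate` and `percentile` are hypothetical.

```python
def integrate(g, lo, hi, steps=10_000):
    """Midpoint-rule approximation to the integral of g from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


def f(x):
    """Assumed example PDF: f(x) = 2x on [0, 1] and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


# f is zero outside [LO, HI], so the infinite limits reduce to this interval.
LO, HI = 0.0, 1.0

total = integrate(f, LO, HI)                        # 2.3.1: should be close to 1
prob = integrate(f, 0.2, 0.5)                       # 2.3.3: P(0.2 < X < 0.5)
mean = integrate(lambda x: x * f(x), LO, HI)        # 2.3.6: E(X)
var = integrate(lambda x: x * x * f(x), LO, HI) - mean**2  # 2.3.6: Var(X)


def percentile(k, tol=1e-9):
    """Bisection search for t with P(X < t) = k/100, as in 2.3.5."""
    lo, hi = LO, HI
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if integrate(f, LO, mid, steps=2_000) < k / 100:
            lo = mid   # not enough area yet: move the lower bound up
        else:
            hi = mid   # too much area: move the upper bound down
    return (lo + hi) / 2


median = percentile(50)   # 2.3.4: the median is the 50th percentile

print(total, prob, mean, var, median)
```

For this assumed PDF the exact values are $E(X) = \frac{2}{3}$, $\mathrm{Var}(X) = \frac{1}{18}$ and $m = \frac{1}{\sqrt{2}}$, so the printed results can be compared against them.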