Statistics Key Formulas & Concepts

Statistics Key Points

1.1.1 Discrete data can only take on certain values. Continuous data can take on any value, possibly within a limited range.

1.1.2 For ungrouped data, the mean is
\[ \bar{x} = \frac{\sum x}{n} \]
For grouped data, the mean is
\[ \bar{x} = \frac{\sum xf}{\sum f} \]

1.1.3 For ungrouped coded data, the mean is
\[ \bar{x} = \frac{1}{a}\left( \frac{\sum (ax - b)}{n} + b \right) \]
For grouped coded data,
\[ \bar{x} = \frac{1}{a}\left( \frac{\sum (ax - b)f}{\sum f} + b \right) \]
These formulae can be summarised by writing
\[ \bar{x} = \frac{1}{a} \cdot \left[ \operatorname{mean}(ax - b) + b \right] \]

1.1.4 Interquartile range = upper quartile − lower quartile, or $IQR = Q_3 - Q_1$

1.1.5 Standard deviation = $\sqrt{\text{Variance}}$
For ungrouped data,
\[ \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} \quad \text{where } \bar{x} = \frac{\sum x}{n} \]
For grouped data,
\[ \sigma = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}} = \sqrt{\frac{\sum x^2 f}{\sum f} - \bar{x}^2} \quad \text{where } \bar{x} = \frac{\sum xf}{\sum f} \]

1.2.1 A probability distribution shows all the possible values of a variable and the sum of the probabilities is $\sum p = 1$

1.2.2 The expectation of a discrete random variable (DRV) is
\[ E(X) = \sum xp = \sum \left[ x \cdot P(X = x) \right] \]

1.3.1 If $X \sim B(n, p)$ then the probability of $r$ successes is
\[ p_r = \binom{n}{r} p^r (1 - p)^{n - r} \]

1.3.2 The mean and variance of $X \sim B(n, p)$ are given, respectively, by $\mu = np$ and $\sigma^2 = np(1 - p) = npq$

1.4.1 A random variable $X$ that has a geometric distribution is denoted by $X \sim Geo(p)$, and the probability that the first success occurs on the $r$th trial is
\[ p_r = p(1 - p)^{r - 1} \quad \text{for } r = 1, 2, 3, \ldots \]

1.4.2 When $X \sim Geo(p)$ and $q = 1 - p$, then
• $P(X \le r) = 1 - q^r$
• $P(X > r) = q^r$

1.4.3 The mode of all geometric distributions is 1.

1.5.1 $X \sim N(\mu, \sigma^2)$ describes a normally distributed random variable. We read this as “$X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$”

1.5.2 The standard normal variable is $Z \sim N(0, 1)$

1.5.3 When $X \sim N(\mu, \sigma^2)$ then
\[ Z = \frac{X - \mu}{\sigma} \]
has a normal distribution. [Refer to 2.2.3]
A standardised value
\[ z = \frac{x - \mu}{\sigma} \]
tells us how many standard deviations $x$ is from the mean.
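The grouped-data formulas in 1.1.2 and 1.1.5 and the binomial results in 1.2.1, 1.2.2, 1.3.1 and 1.3.2 can be checked numerically. The following is a minimal Python sketch, not part of the original sheet; the data values, the parameters n = 10 and p = 0.3, and the function names are illustrative assumptions.

```python
from math import comb, sqrt

def grouped_mean_sd(values, freqs):
    """Mean and both forms of the standard deviation for grouped data."""
    total_f = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total_f
    # Definition form: sqrt( sum (x - mean)^2 f / sum f )
    sd_definition = sqrt(
        sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total_f
    )
    # Shortcut form: sqrt( sum x^2 f / sum f - mean^2 )
    sd_shortcut = sqrt(
        sum(x * x * f for x, f in zip(values, freqs)) / total_f - mean ** 2
    )
    return mean, sd_definition, sd_shortcut

def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p), as in key point 1.3.1."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Hypothetical grouped data: values 1 to 4 with frequencies 2, 5, 2, 1.
mean, sd_def, sd_short = grouped_mean_sd([1, 2, 3, 4], [2, 5, 2, 1])

# Hypothetical binomial example: X ~ B(10, 0.3).
n, p = 10, 0.3
probs = [binomial_pmf(n, p, r) for r in range(n + 1)]
total = sum(probs)                                        # 1.2.1: sum of p is 1
expectation = sum(r * pr for r, pr in enumerate(probs))   # 1.2.2: E(X) = sum of xp
variance = sum(r * r * pr for r, pr in enumerate(probs)) - expectation ** 2

print(mean, sd_def, sd_short)
print(total, expectation, n * p)          # expectation should match np
print(variance, n * p * (1 - p))          # variance should match npq
```

Both forms of the standard deviation should agree up to rounding, and the expectation and variance computed directly from the probabilities should match np and npq.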
1.5.4 $X \sim B(n, p)$ can be approximated by $N(\mu, \sigma^2)$, where $\mu = np$ and $\sigma^2 = npq$, provided that $n$ is large enough to ensure that $np > 5$ and $nq > 5$

1.5.5 Continuity corrections must be made when a discrete distribution is approximated by a continuous distribution.

2.1.1 A Poisson distribution can be used to model a discrete probability distribution in which the events occur singly, at random and independently, in a given interval of space or time. The mean and variance of a Poisson distribution are equal; hence, a Poisson distribution has only one parameter.

2.1.2 When modelling data using a Poisson distribution:
• Work out the mean and variance and check if they are approximately equal.
• If the mean and variance are not approximately equal, the Poisson distribution is not a suitable model to use with the data.
• Use the mean to calculate probabilities and expected frequencies.
• Compare expected frequencies with observed frequencies.

2.1.3 If the random variable $X$ has a Poisson distribution with parameter $\lambda$, where $\lambda > 0$, we write $X \sim Po(\lambda)$ and:
• $P(X = r) = e^{-\lambda} \cdot \dfrac{\lambda^r}{r!}$ where $r = 0, 1, 2, \ldots$
• $E(X) = \lambda$
• $Var(X) = \lambda$

2.1.4 In a Poisson distribution, events occur at a constant rate; the mean number of events in a given interval is proportional to the size of that interval.

2.2.1 For a random variable $X$ and constants $a$ and $b$:
\[ E(aX \pm b) = aE(X) \pm b \quad \text{and} \quad Var(aX + b) = a^2 Var(X) \]

2.2.2 For two independent random variables $X$ and $Y$ and constants $a$ and $b$:
\[ E(aX \pm bY) = aE(X) \pm bE(Y) \quad \text{and} \quad Var(aX \pm bY) = a^2 Var(X) + b^2 Var(Y) \]
These results can be extended to any number of independent random variables.

2.2.3 If a continuous random variable $X$ has a normal distribution, then $aX + b$, where $a$ and $b$ are constants, also has a normal distribution. If continuous random variables $X$ and $Y$ have independent normal distributions, then $aX + bY$, where $a$ and $b$ are constants, has a normal distribution.
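As an illustration of key points 1.5.4, 1.5.5, 2.1.3 and 2.1.4, the sketch below compares an exact binomial probability with its normal approximation (with and without a continuity correction) and checks the Poisson mean and variance. It is a minimal Python sketch under stated assumptions: the parameters (n = 40, p = 0.4, a rate of 3 events per hour over 2 hours) and the helper names are hypothetical, and the normal CDF is computed from the standard library's error function rather than from tables.

```python
from math import comb, erf, exp, factorial, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def poisson_pmf(lam, r):
    """P(X = r) for X ~ Po(lam), as in key point 2.1.3."""
    return exp(-lam) * lam ** r / factorial(r)

# Hypothetical binomial: X ~ B(40, 0.4), so np = 16 > 5 and nq = 24 > 5.
n, p = 40, 0.4
q = 1 - p
mu, sigma = n * p, sqrt(n * p * q)

# Exact P(X <= 18) from the binomial formula.
exact = sum(comb(n, r) * p ** r * q ** (n - r) for r in range(19))
# Normal approximation with the continuity correction: P(X <= 18) -> P(Y < 18.5).
approx_corrected = normal_cdf(18.5, mu, sigma)
# Without the correction, for comparison only.
approx_uncorrected = normal_cdf(18, mu, sigma)

# Hypothetical Poisson: 3 events per hour, observed over 2 hours.
# By 2.1.4 the mean scales with the interval, so lam = 3 * 2 = 6.
rate_per_hour, hours = 3, 2
lam = rate_per_hour * hours
pois_probs = [poisson_pmf(lam, r) for r in range(61)]  # tail beyond 60 is negligible
pois_total = sum(pois_probs)
pois_mean = sum(r * pr for r, pr in enumerate(pois_probs))
pois_var = sum(r * r * pr for r, pr in enumerate(pois_probs)) - pois_mean ** 2

print(exact, approx_corrected, approx_uncorrected)
print(pois_total, pois_mean, pois_var)
```

In this example the corrected approximation should lie noticeably closer to the exact value than the uncorrected one, and the Poisson mean and variance should both come out close to 6.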
2.3.1 The function $f(x)$, whose graph represents a continuous random variable, is the probability density function (PDF). The PDF has the following properties:
• It cannot be negative since you cannot have a negative probability; $f(x) \ge 0$.
• Total probability of all outcomes = 1; hence,
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1 \]
In many situations, the data are defined across a specified interval or across specified intervals, outside of which $f(x) = 0$.

2.3.2
• With continuous random variables, each individual value has zero probability of occurring. For a continuous random variable with PDF $f(x)$, $P(X = a) = 0$.
• Because the probability of any exact value is zero, when finding the probability in a given interval it does not matter whether you use $<$ or $\le$.
\[ P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b) \]
Note that this does not imply that $X$ cannot take the value $a$; it just means the probability of the exact value $a$ is zero.

2.3.3 The probability of $X$ lying in the interval $(a, b)$ is given by the area under the graph between $a$ and $b$. That is:
\[ P(a < X < b) = \int_{a}^{b} f(x)\,dx \]

2.3.4 The median, $m$, of a continuous random variable is that value for which
\[ P(X < m) = \int_{-\infty}^{m} f(x)\,dx = \frac{1}{2} \]

2.3.5 The $r$th percentile, $k$, of a continuous random variable is that value for which
\[ P(X < k) = \int_{-\infty}^{k} f(x)\,dx = \frac{r}{100} \quad \text{where } 0 < r < 100 \]

2.3.6 For continuous random variables with PDF $f(x)$:
\[ E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \]
and
\[ Var(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \left( \int_{-\infty}^{\infty} x f(x)\,dx \right)^2 \]
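The integrals in 2.3.1 to 2.3.6 can be approximated numerically. The following is a minimal Python sketch, assuming a hypothetical PDF f(x) = 2x on [0, 1] (zero elsewhere); the midpoint-rule integrator and the bisection search for percentiles are illustrative choices, not methods prescribed by these notes.

```python
def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def f(x):
    """Hypothetical PDF: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def percentile(r, lower=0.0, upper=1.0, tol=1e-6):
    """Find k with P(X < k) = r/100 by bisection over the support [lower, upper]."""
    target = r / 100
    lo, hi = lower, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # f is zero below `lower`, so the integral can start there
        # instead of at minus infinity.
        if integrate(f, lower, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

total = integrate(f, 0, 1)                          # 2.3.1: total probability
prob = integrate(f, 0.25, 0.5)                      # 2.3.3: P(0.25 < X < 0.5)
mean = integrate(lambda x: x * f(x), 0, 1)          # 2.3.6: E(X)
var = integrate(lambda x: x * x * f(x), 0, 1) - mean ** 2   # 2.3.6: Var(X)
median = percentile(50)                             # 2.3.4: 50th percentile

print(total, prob, mean, var, median)
```

For this particular PDF the exact answers are available by hand (total 1, P(0.25 < X < 0.5) = 3/16, E(X) = 2/3, Var(X) = 1/18, median = 1/√2), so the printed values can be compared against them.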
