Statistics 2 Formula AQA

advertisement
Stats 2 (AQA) Formula
(everything in blue in formula book)
Expectation (mean) & Variance of Random Variable from Probability Distribution
π‘‰π‘Žπ‘Ÿ(𝑋) = 𝜎 2 = ∑ π‘₯𝑖 2 𝑝𝑖 − πœ‡ 2
𝐸(𝑋) = πœ‡ = ∑ π‘₯𝑖 𝑝𝑖
= 𝐸(𝑋 2 ) − πœ‡ 2
(  ο‚Ή x .  is theoretical mean, x is sample mean.)
π‘‰π‘Žπ‘Ÿ(π‘Žπ‘‹ + 𝑏) = π‘‰π‘Žπ‘Ÿ(π‘Žπ‘‹) = π‘Ž2 π‘‰π‘Žπ‘Ÿ(𝑋)
𝐸(π‘Žπ‘‹ + 𝑏) = 𝐸(π‘Žπ‘‹) + 𝑏 = π‘ŽπΈ(𝑋) + 𝑏
Poisson distribution (a discrete distribution)
𝑃(𝑋 = π‘₯) = 𝑒 −πœ†
𝑋~π‘ƒπ‘œ(πœ†)
𝑝π‘₯ =
πœ†π‘₯
π‘₯!
πœ‡ = πœ†, 𝜎 2 = πœ†
πœ†
𝑝
π‘₯ π‘₯−1
If 𝑋~π‘ƒπ‘œ(πœ†1 ) and π‘Œ~π‘ƒπ‘œ(πœ†2 ), then the distribution of (𝑋 + π‘Œ)~π‘ƒπ‘œ(πœ†1 + πœ†2 ).
Continuous Random Variables
Probability Density Function (pdf) = f
Cumulative Distribution Function (cdf) = F
𝑑
𝑓(π‘₯) = 𝐹(π‘₯)
𝑑π‘₯
𝐹(π‘₯) = ∫ 𝑓(𝑑) 𝑑𝑑
π‘₯
−∞
(replace lower limit with lower limit of function when calculating)
𝐹(−∞) = 0
𝐹(+∞) = 1
Three methods to find specific probabilities of X:
𝑑
1. 𝑃(𝑐 < 𝑋 < 𝑑) = ∫𝑐 𝑓(π‘₯) 𝑑π‘₯ (ie integrate to find area between two values)
2. Having integrated, evaluate between limits (similar to finding normal distribution
probabilities).
𝑃(𝑐 < 𝑋 < 𝑑) = 𝐹(𝑑) − 𝐹(𝑐)
𝐹(𝑋 > 𝑑) = 1 − 𝐹(𝑑)
3. Use geometry of graph of 𝑓(π‘₯) (where possible, splitting it into triangles, trapezia etc).
To find median value, m, solve for m either:
π‘š
or
0.5 = ∫−∞ 𝑓(π‘₯)𝑑π‘₯
0.5 = 𝐹(π‘š)
To find, for example, the value at 95th percentile, solve for d;
𝑑
or
0.95 = ∫−∞ 𝑓(π‘₯)𝑑π‘₯
Mean and variance…
0.95 = 𝐹(𝑑)
∞
𝐸(𝑋) = πœ‡ = ∫ π‘₯𝑓(π‘₯) 𝑑π‘₯
−∞
∞
2
𝐸(𝑋 2 ) = ∫ π‘₯ 𝑓(π‘₯) 𝑑π‘₯
−∞
∞
𝐸(𝑔(𝑋)) = ∫ 𝑔(π‘₯)𝑓(π‘₯) 𝑑π‘₯
−∞
If distribution is symmetrical then 𝐸(𝑋) = π‘π‘’π‘›π‘‘π‘Ÿπ‘Žπ‘™ π‘£π‘Žπ‘™π‘’π‘’.
∞
π‘‰π‘Žπ‘Ÿ(𝑋) = 𝐸(𝑋 2 ) − πœ‡ 2 = ∫ π‘₯ 2 𝑓(π‘₯) 𝑑π‘₯ − πœ‡ 2
−∞
Rectangular (Uniform) Distribution
𝑓(π‘₯) =
1
𝐸(𝑋) = (π‘Ž + 𝑏)
2
1
𝑏−π‘Ž
π‘‰π‘Žπ‘Ÿ(𝑋) =
𝐹(π‘₯) =
1
(𝑏 − π‘Ž)2
12
π‘₯−π‘Ž
𝑏−π‘Ž
Estimation
The t distribution (two random variables, 𝑋̅ and 𝑆, i.e. unknown population variance).
𝑇=
𝑋̅ − πœ‡
𝑆
√𝑛
𝑇~π‘‘πœˆ or 𝑇~𝑑𝑛−1
𝜈 = π‘›π‘’π‘šπ‘π‘’π‘Ÿ π‘œπ‘“ π‘‘π‘’π‘”π‘Ÿπ‘’π‘’π‘  π‘œπ‘“ π‘“π‘Ÿπ‘’π‘’π‘‘π‘œπ‘š = 𝑛 − 1
(parameter 𝜈 pronounced ‘nu’)
Confidence intervals given by
π‘₯Μ… ± "c"
𝑠
√𝑛
𝑠2
π‘₯Μ… ± "c"√
𝑛
Hypothesis Testing
Null Hypothesis
𝐻0 : πœ‡ = 435
Alternative Hypothesis
𝐻1 : πœ‡ < 435
One tailed
𝐻1 : πœ‡ ≠ 435
Two tailed
Acceptance of null hypothesis does not mean it is true, rather that the data provides no
evidence to prefer the alternative.
Type 1 error – to reject H0 (and accept H1) when H0 is actually true.
Type 2 error – to reject H1 (and accept H0) when H1 is actually true.
𝑃(𝑇𝑦𝑝𝑒 1 π‘’π‘Ÿπ‘Ÿπ‘œπ‘Ÿ) = π‘ π‘–π‘”π‘›π‘–π‘“π‘–π‘π‘Žπ‘›π‘π‘’ 𝑙𝑒𝑣𝑒𝑙
Test Procedure:
1. Write down the two hypotheses.
2. Identify an appropriate test statistic and the distribution of the corresponding random
variable.
3. Identify the significance level (usually given). This is also 𝑃(𝑇𝑦𝑝𝑒 1 π‘’π‘Ÿπ‘Ÿπ‘œπ‘Ÿ).
4. Determine the critical region (should be done before collecting data).
5. Calculate the value of the test statistic.
6. Determine and clarify in context the outcome of the test.
𝝌𝟐 Contingency Tables Tests
Part A – The 𝝌𝟐 Distribution
𝑋 2 ~πœ’πœˆ2
𝜎 2 = 2𝜈
(where 𝜐 = π‘š − 1)
πœ‡=𝜐
π‘š
(𝑂𝑖 − 𝐸𝑖 )2
𝑋 =∑
𝐸𝑖
2
𝑖=1
(where π‘š = number of different possible outcomes)
All expected frequencies must be greater than 5
Part B – Contingency Tables
Associated vs independent
To calculate expected frequencies from observed frequencies use
π‘Ÿπ‘œπ‘€ π‘‘π‘œπ‘‘π‘Žπ‘™ × π‘π‘œπ‘™π‘’π‘šπ‘› π‘‘π‘œπ‘‘π‘Žπ‘™
π‘”π‘Ÿπ‘Žπ‘›π‘‘ π‘‘π‘œπ‘‘π‘Žπ‘™
𝜐 = (π‘Ÿ − 1)(𝑐 − 1)
Yates’s correction for a 2x2 table (𝜈 = 1 ):
4
𝑋𝐢2
=∑
1
(|𝑂𝐼 − 𝐸𝑖 | − 0.5)2
𝐸𝑖
𝑁
𝑁 2
=
(|π‘Žπ‘‘ − 𝑏𝑐| − )
π‘šπ‘›π‘Ÿπ‘ 
2
Where…
a
c
r
b
d
s
m
n
N
Download