Review
• Variables: Categorical vs. Quantitative
• Graphs for distributional information: Pie chart, Bar graph, Histogram, Stemplot, Timeplot, Boxplot
• Overall pattern of a graph: Symmetric/Skewed, Center, Spread, Outlier, Trend
• Measure of center: Mean/Median
• Measure of variability: Quartiles (Q1, Q2, Q3), Range, IQR, 1.5×IQR rule, Outlier, Variance, Standard deviation
• Five-number summary, Boxplot
• Density curve
• Normal distributions / Normal curves
• z-score, Standard normal distribution
• 68−95−99.7 rule, Probabilities for normal distributions
• Explanatory variable / Response variable
• Scatterplot: Direction (Positive/Negative), Form (Linear/Nonlinear), Strength, Outlier
• Correlation
• Linear regression: ŷ = a + bx; Slope b, Intercept a, Prediction
• Correlation and regression, r², Residual
• Cautions for regression: Influential observations, Extrapolation, Lurking variables
• Sample / Population
• Random sampling designs: Simple random sample (SRS), Stratified random sample, Multistage sample
• Bad samples: Voluntary response sample, Convenience sample
• Observational studies & Experimental studies (experiments)
• Treatments / Factors
• Design of experiments: control (comparison, placebo); randomization (table of random digits, double-blind); matched pairs design / block design
• Probability: Sample space (S) & Events
• Rules for a probability model:
  1. for any event A, 0 ≤ P(A) ≤ 1
  2. for the sample space S, P(S) = 1
  3. if two events A and B are disjoint, then P(A or B) = P(A) + P(B)
  4. for any event A, P(A does not occur) = 1 − P(A)
• Discrete probability models / Continuous probability models
• Random variables / Distributions
• Population / Sample; Parameters / Statistics: µ / x̄, σ / s, p / p̂
• Statistics are random variables
• Sampling distribution of the sample mean x̄ for an SRS:
  ∗ the mean of x̄ equals the population mean µ
  ∗ the standard deviation of x̄ equals σ/√n, where σ is the population standard deviation and n is the sample size
  ∗ if the population has a normal distribution, then x̄ ∼ N(µ, σ/√n)
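The z-score, the 68−95−99.7 rule, and the σ/√n shrinkage of the sampling distribution can be sketched in Python with the standard library's `NormalDist` (the numbers below are invented for illustration):

```python
from statistics import NormalDist

# Sketch: z-scores and normal probabilities for N(mu, sigma),
# illustrating the 68-95-99.7 rule. Values are made up for illustration.
mu, sigma = 100.0, 15.0
dist = NormalDist(mu, sigma)

x = 130.0
z = (x - mu) / sigma                     # z-score: SDs above the mean

p_within_1sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)          # ~0.68
p_within_2sd = dist.cdf(mu + 2 * sigma) - dist.cdf(mu - 2 * sigma)  # ~0.95

# Sampling distribution of x̄ for an SRS of size n: sd shrinks to sigma/sqrt(n)
n = 25
xbar_dist = NormalDist(mu, sigma / n ** 0.5)

print(z)                       # 2.0
print(round(p_within_1sd, 3))  # 0.683
print(round(p_within_2sd, 3))  # 0.954
print(xbar_dist.stdev)         # 3.0
```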
  ∗ central limit theorem: if the sample size is large (n ≥ 30), then x̄ is approximately normal, i.e. x̄ ∼ N(µ, σ/√n) approximately
• Inference about µ with known σ — z-procedures (confidence interval & test of significance)
• Confidence intervals:
  ∗ form: estimate ± margin of error / interpretation
  ∗ (x̄ − z*·σ/√n, x̄ + z*·σ/√n)
  ∗ z* is determined by the confidence level C — the z-score corresponding to the upper tail probability (1 − C)/2
• Test of significance:
  ∗ hypotheses: H0 vs. Ha / H0: µ = µ0
  ∗ test statistic: z = (x̄ − µ0)/(σ/√n)
  ∗ P-value:
    ? Ha: µ > µ0 — upper tail probability corresponding to z
    ? Ha: µ < µ0 — lower tail probability corresponding to z
    ? Ha: µ ≠ µ0 — twice the upper tail probability corresponding to |z|
  ∗ significance level α and conclusion
• Assumptions for z-procedures:
  ∗ the sample is an SRS
  ∗ the population has a normal distribution
  ∗ the population standard deviation σ is known
• The margin of error of a confidence interval is affected by C, σ, and n; to get a level C confidence interval with margin of error m, we need an SRS of size n = (z*σ/m)²
• The significance of a test is also affected by the sample size
• Inference about µ with unknown σ — t-procedures (confidence interval & test of significance)
• Standard error: s/√n
• t-distribution; degrees of freedom (n − 1)
• Confidence intervals:
  ∗ (x̄ − t*·s/√n, x̄ + t*·s/√n)
  ∗ t* is determined by the confidence level C — the t-score corresponding to the upper tail probability (1 − C)/2
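The z confidence interval and the sample-size formula n = (z*σ/m)² can be sketched as follows, assuming σ is known (all data values are invented for illustration):

```python
from statistics import NormalDist
import math

# Sketch: level-C z confidence interval for µ with known sigma,
# plus the sample size needed for a target margin of error m.
xbar, sigma, n = 268.0, 15.0, 36
C = 0.95

# z* is the z-score with upper tail area (1 - C)/2
z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)
margin = z_star * sigma / n ** 0.5
ci = (xbar - margin, xbar + margin)

# Required sample size for margin m, rounded UP to a whole number
m = 3.0
n_needed = math.ceil((z_star * sigma / m) ** 2)

print(round(z_star, 3))  # 1.96
print(round(margin, 2))  # 4.9
print(n_needed)          # 97
```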
• Test of significance:
  ∗ hypotheses: H0 vs. Ha / H0: µ = µ0
  ∗ test statistic: t = (x̄ − µ0)/(s/√n)
  ∗ P-value:
    ? Ha: µ > µ0 — upper tail probability corresponding to t
    ? Ha: µ < µ0 — lower tail probability corresponding to t
    ? Ha: µ ≠ µ0 — twice the upper tail probability corresponding to |t|
  ∗ significance level α and conclusion
• Inference about two means — µ1 − µ2
• Standard error for x̄1 − x̄2: √(s1²/n1 + s2²/n2)
• Confidence interval for µ1 − µ2:
  ∗ ((x̄1 − x̄2) − t*·√(s1²/n1 + s2²/n2), (x̄1 − x̄2) + t*·√(s1²/n1 + s2²/n2))
  ∗ t* is determined by the confidence level C — the t-score corresponding to the upper tail probability (1 − C)/2
  ∗ degrees of freedom: the smaller of n1 − 1 and n2 − 1
• Test of significance:
  ∗ hypotheses: H0 vs. Ha / H0: µ1 = µ2 (µ1 − µ2 = 0)
  ∗ test statistic: t = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2)
  ∗ P-value:
    ? degrees of freedom: the smaller of n1 − 1 and n2 − 1
    ? Ha: µ1 > µ2 — upper tail probability corresponding to t
    ? Ha: µ1 < µ2 — lower tail probability corresponding to t
    ? Ha: µ1 ≠ µ2 — twice the upper tail probability corresponding to |t|
  ∗ significance level α and conclusion
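The two-sample t statistic and the conservative degrees of freedom can be sketched as follows (the sample summaries are invented; the resulting t would then be compared against a t table with the stated df):

```python
import math

# Sketch: two-sample t statistic with conservative df
# (the smaller of n1-1 and n2-1). Summaries are invented for illustration.
xbar1, s1, n1 = 85.0, 8.0, 16
xbar2, s2, n2 = 80.0, 10.0, 25

se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of x̄1 - x̄2
t = (xbar1 - xbar2) / se                     # test statistic for H0: µ1 = µ2
df = min(n1 - 1, n2 - 1)                     # conservative degrees of freedom

print(round(se, 3))  # 2.828
print(round(t, 3))   # 1.768
print(df)            # 15
```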
• Inference about the population proportion p — z-procedures (confidence interval & test of significance)
• Sampling distribution of the sample proportion p̂ for an SRS:
  ∗ the mean of p̂ equals the population proportion p
  ∗ the standard deviation of p̂ equals √(p(1 − p)/n)
  ∗ if the sample size is large, p̂ is approximately normal, i.e. p̂ ∼ N(p, √(p(1 − p)/n)) approximately
• Standard error of p̂: √(p̂(1 − p̂)/n)
• Large-sample confidence intervals:
  ∗ (p̂ − z*·√(p̂(1 − p̂)/n), p̂ + z*·√(p̂(1 − p̂)/n))
  ∗ z* is determined by the confidence level C — the z-score corresponding to the upper tail probability (1 − C)/2
  ∗ use it only when np̂ ≥ 15 and n(1 − p̂) ≥ 15
• Plus four confidence intervals:
  ∗ (p̃ − z*·√(p̃(1 − p̃)/(n + 4)), p̃ + z*·√(p̃(1 − p̃)/(n + 4)))
  ∗ p̃ = (number of successes in the sample + 2)/(n + 4)
  ∗ use it when the confidence level is at least 90% and the sample size n is at least 10
• Test of significance:
  ∗ hypotheses: H0 vs. Ha / H0: p = p0
  ∗ test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n)
  ∗ P-value:
    ? Ha: p > p0 — upper tail probability corresponding to z
    ? Ha: p < p0 — lower tail probability corresponding to z
    ? Ha: p ≠ p0 — twice the upper tail probability corresponding to |z|
  ∗ significance level α and conclusion
  ∗ use this test when np0 ≥ 10 and n(1 − p0) ≥ 10
• Inference about two proportions — p1 − p2
• Sampling distribution of p̂1 − p̂2:
  ∗ the mean of p̂1 − p̂2 is p1 − p2
  ∗ the standard deviation of p̂1 − p̂2 is √(p1(1 − p1)/n1 + p2(1 − p2)/n2)
  ∗ if the sample sizes are large, p̂1 − p̂2 is approximately normal
• Standard error of p̂1 − p̂2: √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
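The one-sample proportion intervals above (large-sample and plus four) can be sketched as follows, with invented counts:

```python
from statistics import NormalDist

# Sketch: large-sample and plus-four confidence intervals for a proportion.
successes, n = 60, 100
C = 0.95
z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)

# Large-sample interval (use only when n*p̂ >= 15 and n*(1-p̂) >= 15)
p_hat = successes / n
se = (p_hat * (1 - p_hat) / n) ** 0.5
large_sample_ci = (p_hat - z_star * se, p_hat + z_star * se)

# Plus four interval: act as if there were 2 extra successes and 2 extra failures
p_tilde = (successes + 2) / (n + 4)
se4 = (p_tilde * (1 - p_tilde) / (n + 4)) ** 0.5
plus_four_ci = (p_tilde - z_star * se4, p_tilde + z_star * se4)

print(round(p_hat, 3))                # 0.6
print(round(large_sample_ci[0], 3))   # 0.504
print(round(p_tilde, 3))              # 0.596
```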
• Large-sample confidence intervals:
  ∗ ((p̂1 − p̂2) − z*·SE, (p̂1 − p̂2) + z*·SE), where SE is the standard error of p̂1 − p̂2: SE = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
  ∗ z* is determined by the confidence level C — the z-score corresponding to the upper tail probability (1 − C)/2
  ∗ use it only when ni·p̂i ≥ 10 and ni(1 − p̂i) ≥ 10 for each sample
• Plus four confidence intervals:
  ∗ ((p̃1 − p̃2) − z*·SE, (p̃1 − p̃2) + z*·SE), where SE = √(p̃1(1 − p̃1)/(n1 + 2) + p̃2(1 − p̃2)/(n2 + 2))
  ∗ p̃i = (number of successes in the i-th sample + 1)/(ni + 2), i = 1, 2
  ∗ use it when n1 ≥ 5 and n2 ≥ 5
• Test of significance:
  ∗ hypotheses: H0 vs. Ha / H0: p1 = p2 (p1 − p2 = 0)
  ∗ pooled sample proportion p̂: p̂ = (number of successes in both samples combined)/(number of individuals in both samples combined)
  ∗ test statistic: z = (p̂1 − p̂2)/√(p̂(1 − p̂)(1/n1 + 1/n2))
  ∗ P-value:
    ? Ha: p1 − p2 > 0 — upper tail probability corresponding to z
    ? Ha: p1 − p2 < 0 — lower tail probability corresponding to z
    ? Ha: p1 − p2 ≠ 0 — twice the upper tail probability corresponding to |z|
  ∗ significance level α and conclusion
  ∗ use this test when the counts of successes and failures are each 5 or more in both samples
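The pooled two-proportion z test can be sketched end to end, including a two-sided P-value from the standard normal distribution (counts are invented for illustration):

```python
from statistics import NormalDist

# Sketch: pooled two-proportion z test for H0: p1 = p2, two-sided Ha.
x1, n1 = 45, 100   # successes / sample size, group 1
x2, n2 = 30, 100   # successes / sample size, group 2

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                       # pooled sample proportion
se = (p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)) ** 0.5
z = (p1_hat - p2_hat) / se

# Two-sided P-value: twice the upper tail probability of |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_pool, 3))   # 0.375
print(round(z, 3))        # 2.191
print(round(p_value, 3))  # 0.028
```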
• Chi-square test for a two-way table
• Hypotheses: H0: there is no relationship between the two variables (the row variable and the column variable) vs. Ha: there is some relationship
• Compares the observed counts in the cells of the two-way table with the counts that would be expected if H0 were true:
  expected count = (row total × column total)/(table total)
• Chi-square test statistic: χ² = Σ (observed count − expected count)²/expected count
• Degrees of freedom of χ²: (r − 1)(c − 1), where r is the number of rows and c is the number of columns
• P-value: the area under the chi-square density curve to the right of the value of the test statistic
• Chi-square test for goodness of fit
• Null hypothesis: H0: p1 = p10, p2 = p20, . . ., pk = pk0
• Compares the observed count in each category with the count that would be expected if H0 were true:
  expected count for category i = n·pi0
• Chi-square test statistic: χ² = Σ (observed count − expected count)²/expected count
• Degrees of freedom of χ²: k − 1, where k is the number of categories
• P-value: the area under the chi-square density curve to the right of the value of the test statistic
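The expected counts and the χ² statistic for a two-way table can be computed directly from the formulas above; a minimal sketch for an invented 2×2 table (P-value lookup against the chi-square table is omitted):

```python
# Sketch: chi-square statistic for a 2x2 two-way table, using
# expected count = row total * column total / table total.
observed = [[30, 20],
            [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / table_total
        chi2 += (obs - expected) ** 2 / expected

r, c = len(observed), len(observed[0])
df = (r - 1) * (c - 1)

print(round(chi2, 3))  # 16.667
print(df)              # 1
```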
• One-way analysis of variance (ANOVA) compares the means of several populations
• Hypotheses for the ANOVA F-test: H0: all the populations have the same mean vs. Ha: not all the means are the same
• F = (variation among the sample means)/(variation among individuals within the same sample); degrees of freedom for the numerator: I − 1; degrees of freedom for the denominator: N − I, where I is the number of populations and N is the total number of observations in the I samples
• Conditions for using ANOVA: independent SRSs from each population; each population is Normally distributed; all populations have the same standard deviation
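The F ratio can be sketched from its definition, computing the between-group and within-group mean squares by hand (the three small samples are invented for illustration):

```python
from statistics import mean

# Sketch: ANOVA F statistic = (variation among sample means, df = I-1)
#                           / (variation within samples, df = N-I).
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [5.0, 6.0, 7.0],
]

I = len(groups)                      # number of populations
N = sum(len(g) for g in groups)      # total number of observations
grand_mean = mean(x for g in groups for x in g)

# Numerator: variation among the sample means
ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
msg = ssg / (I - 1)

# Denominator: variation among individuals within the same sample
sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
mse = sse / (N - I)

F = msg / mse
print(round(F, 3))   # 7.0
print(I - 1, N - I)  # 2 6
```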