SNOTES 4

Chapter 8 Sampling Methods and Distributions
Objectives:
Note: Sampling distributions should be used whenever information about a sample is used to make inferences about the population.

Sampling error: the difference between a sample statistic and its corresponding population parameter. Page 269
Sampling error comes from:
1. Sample size
2. Sample bias
   1. Characteristics of the sample are not the same as those of the population
   2. Cognition issues (see pages 8-9 of SNOTES 3)
How do you reduce sampling bias?
1. Random samples
2. Systematic sampling
3. Cluster sampling
4. Stratified sampling

Sampling distribution of sample means: a probability distribution of all possible sample means of a given sample size. A sampling distribution includes every possible sample statistic of a certain sample size that can be drawn from a population. Pg 259
Number of samples in a sampling distribution: a combination, nCx (the number of different samples of the given size that can be drawn from the population).

                      Population      Samples    Sampling
                      Distribution               Distribution
MEAN                  μ               x̄          E(x̄) = μ
STANDARD DEVIATION    σ               Sx         σ_x̄ (standard deviation of the mean)

Excel: Tools/Sampling

STANDARD DEVIATION OF THE MEAN, Formula 8-1, Pg 280
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Rule for Distribution of Sample Means for a Normal Population: If the population for X is normally distributed with mean μ and standard deviation σ, the sample mean x̄ is normally distributed with mean μ and standard deviation $\sigma/\sqrt{n}$ (Formula 8-1).

Population standard deviation known, Formula 8-2, page 282
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Central Limit Theorem: If all samples of a particular size are selected from any population, the sampling distribution of the sample means is approximately a normal distribution. This approximation improves with larger samples. Pg 274

Chapter 9 Estimation and Confidence Intervals
1. The student will be able to explain and interpret a confidence interval and confidence level. Pg 294-296
2. The student will be able to compute and interpret a confidence interval when the population standard deviation is unknown. Pg 302
3.
The student will be able to estimate a confidence interval for proportions. Pg 309
4. The student will be able to estimate the sample size needed for a desired confidence level, interval width, and population standard deviation. Pg 315-317

Confidence interval: a range of values constructed from sample data so that the population parameter is likely to occur within that range at a specified probability. The specified probability is called the level of confidence.

Confidence interval for μ, σ known, Formula 9-1, Pg 298
$\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$
The value 1 - α is termed the confidence coefficient, and the value 100(1 - α)% is known as the confidence level. α is the probability of a Type I error.

Population standard deviation σ unknown, Formula 9-2, page 302
$\bar{x} \pm t\frac{s}{\sqrt{n}}$
Excel: fx (Paste Function)/Statistics/CONFIDENCE

USING A SAMPLING DISTRIBUTION: Z VALUES
Average height of males in the U.S.: μ = 70", N = 70,000,000, n = 10,000

Sample proportion, Formula 9-3, page 310
$p = \frac{x}{n}$

Confidence interval for a population proportion, Formula 9-4
$p \pm z\sqrt{\frac{p(1-p)}{n}}$

Sample size for estimating the population mean, Pg 316 (E is the maximum allowable error)
$n = \left(\frac{z\sigma}{E}\right)^2$

Sample size for estimating a population proportion p, Pg 317
$n = p(1-p)\left(\frac{z}{E}\right)^2$

PROPORTIONS: N = 100,000,000, n = 1,000, 200 smokers

Chapter 10 HYPOTHESIS TESTING
1. The student should know the five steps of hypothesis testing. Pg 332
2. The student will be able to define Type I and Type II errors and p-values. Pg 334-335, 356-357
3. The student will be able to perform and interpret a one- or two-sided test using a large population. Pg 337-343
4. The student will be able to perform and interpret the results of a one- or two-sided hypothesis test for a population mean or proportion. Pg 353

The goal in estimation is to estimate the value of some population parameter (μ). The goal in significance testing is to decide whether a claim about a population parameter is true. Hypothesis testing involves using sample data to test statements, claims, or assumptions about population parameters.

Steps of Hypothesis Testing, Pg 332
1. Formulate the null hypothesis Ho.
   Formulate the alternative hypothesis Ha in statistical terms.
2. Set the level of significance α and the sample size n.
3. Select the appropriate test statistic and rejection rule.
4. Collect the data and calculate the test statistic.
5. If the calculated value of the test statistic falls in the rejection region, reject Ho. If it does not fall in the rejection region, do not reject Ho.

Testing a mean, σ known, Formula 10-1, page 335
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

t distribution, testing a mean, σ unknown, Formula 10-2, page 345
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

NULL HYPOTHESIS - OVERHEAD OF ERRORS
The null hypothesis usually states that the difference between the sample statistic and its claimed population parameter is due to chance variation in sampling.
Ho: B = 0
H1: B ≠ 0

CRITICAL VALUES - OVERHEAD FOR HYPOTHESIS TESTING
Type I error occurs if we reject Ho when Ho is true. Pg 468
Type II error occurs if we do not reject Ho when Ho is false.

In our legal system the null hypothesis is that the individual is innocent (presumed innocent). The prosecution gathers a sample of evidence and presents it to a judge or jury. From this sample of evidence the legal system attempts to disprove the null hypothesis. If the null hypothesis is rejected, the court accepts the alternative hypothesis that the individual is not innocent.
Ho: the individual is innocent
Ha: the individual is not innocent

α = P(rejecting Ho when Ho is true) = P(Type I error)
β = P(not rejecting Ho when Ho is false) = P(Type II error)
[What is the cost of a Type I error to society?]
[What is the cost of a Type II error to society?]

TTEST mu=K C

ONE-SIDED TESTS
ENGINES - PISTON CLEARANCE
Ho: μ ≥ .001
H1: μ < .001
TYPE I ERROR: THE COST IS TO TEAR DOWN AND RE-ENGINEER ALL ENGINES
TYPE II ERROR: WARRANTY WORK
IF YOU ACCEPT THE NULL HYPOTHESIS, THE FIRM IS NOT SURE WHETHER THE ENGINE IS ADEQUATE OR NOT.
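The five steps and Formulas 10-1 and 10-2 can be sketched in Python. This is a minimal illustration under stated assumptions, not the textbook's procedure. The function names, the `tail` argument and all numbers (Ho: μ = 70, σ = 3, n = 36, sample mean 71.2) are hypothetical. Critical z values come from the standard library's `statistics.NormalDist`. The standard library has no t distribution, so the t version only computes the statistic, and its critical value would still come from a t table. The last function estimates α by simulation. It draws many samples with Ho true and counts how often Ho is wrongly rejected, which should come out close to the chosen α.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_statistic(x_bar, mu0, sigma, n):
    """Formula 10-1: z = (x_bar - mu0) / (sigma / sqrt(n)), sigma known."""
    return (x_bar - mu0) / (sigma / sqrt(n))


def t_statistic(sample, mu0):
    """Formula 10-2: t = (x_bar - mu0) / (s / sqrt(n)), s = sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


def z_test_decision(z, alpha=0.05, tail="two"):
    """Steps 3 and 5: build the rejection region for alpha and compare z to it.

    tail="two"   -> H1: mu != mu0 (reject if |z| > z_(alpha/2))
    tail="upper" -> H1: mu > mu0  (reject if z > z_alpha)
    tail="lower" -> H1: mu < mu0  (reject if z < -z_alpha)
    """
    std_normal = NormalDist()
    if tail == "two":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "upper":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "lower":
        reject = z < std_normal.inv_cdf(alpha)
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return "reject Ho" if reject else "do not reject Ho"


def simulated_type_i_rate(mu0=70.0, sigma=3.0, n=25, alpha=0.05, trials=20_000, seed=1):
    """Share of samples drawn with Ho true (mean really is mu0) in which Ho is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        if z_test_decision(z_statistic(x_bar, mu0, sigma, n), alpha) == "reject Ho":
            rejections += 1
    return rejections / trials


# Hypothetical numbers: Ho: mu = 70, sigma = 3, n = 36, sample mean 71.2.
z = z_statistic(71.2, 70, 3, 36)
print(round(z, 2), z_test_decision(z))  # two-sided test at alpha = .05
print(simulated_type_i_rate())          # should land close to alpha = 0.05
```

Here z = (71.2 - 70)/(3/6) = 2.4. That is beyond the two-sided critical value of about 1.96, so the sketch rejects Ho at α = .05. For a lower-tailed alternative such as the engine example's H1: μ < .001, the call would pass tail="lower".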
PERFUME: IF THE SOLUTION IS GREATER THAN 1%, IT CAUSES SKIN IRRITATION
Ho: μ ≤ .01
H1: μ > .01
TYPE I ERROR: THROW OUT AND START OVER
TYPE II ERROR: LAWSUITS

[Test of the painkiller Vioxx for heart problems: 45 of the 1,287 Vioxx users in the test group had a heart attack. The placebo group had 25 heart attacks out of 1,299. Is there a statistically significant difference? WSJ 2/7/05]

Test of hypothesis, one proportion, Formula 10-3, page 354 (4/2/11), where π is the hypothesized population proportion
$z = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}$

"Making a Stat Less Significant," WSJ, A3

Reference:
7/28/93 "Statisticians Occupy Front Lines in Battle Over Passive Smoking," WSJ
Confidence intervals:
2/25/00 "Nielsen Ratings Spark a Battle Over Just Who Speaks Spanish," WSJ
10/24/06 "Counting War Dead Is Difficult-Therefore, Let's Not Exaggerate," WSJ, A19
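The proportion formulas in these notes can be sketched in Python and applied to the smokers and Vioxx numbers. This covers Formulas 9-3 and 9-4, the sample-size formula for proportions, and Formula 10-3. It is a hedged illustration, not the textbook's code. The function names and the claim Ho: π = 0.25 are hypothetical. z values come from Python's `statistics.NormalDist`, and the normal approximation is assumed adequate for these sample sizes. The Vioxx question compares two samples, so Formula 10-3 (one proportion against a fixed π) does not fit it directly. The sketch therefore swaps in the standard pooled two-sample z test for proportions, which is not one of the formulas in these notes.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided z value for a confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def ci_proportion(x, n, confidence=0.95):
    """Formulas 9-3 and 9-4: p = x / n, interval p +/- z * sqrt(p(1 - p) / n)."""
    p = x / n
    margin = z_critical(confidence) * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def sample_size_proportion(p, E, confidence=0.95):
    """n = p(1 - p)(z / E)^2, rounded up to the next whole observation."""
    return ceil(p * (1 - p) * (z_critical(confidence) / E) ** 2)


def z_one_proportion(x, n, pi0):
    """Formula 10-3: z = (p - pi0) / sqrt(pi0(1 - pi0) / n), where p = x / n."""
    p = x / n
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)


def z_two_proportions(x1, n1, x2, n2):
    """Pooled two-sample z for Ho: pi1 = pi2 (not one of the formulas in these notes)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Smokers: n = 1000, 200 smokers -> 95% interval for the population proportion.
low, high = ci_proportion(200, 1000)
print(f"smokers: {low:.3f} to {high:.3f}")

# Hypothetical claim Ho: pi = 0.25 tested against the same smokers sample.
print(f"one-proportion z = {z_one_proportion(200, 1000, 0.25):.2f}")

# Vioxx: 45 of 1287 heart attacks vs 25 of 1299 in the placebo group.
z = z_two_proportions(45, 1287, 25, 1299)
print(f"Vioxx z = {z:.2f}, two-sided p-value = {two_sided_p_value(z):.4f}")
```

With these counts, the 95% interval for smokers runs from roughly 0.175 to 0.225. The Vioxx z is about 2.46, with a two-sided p-value near 0.014. Under this sketch's assumptions, the difference would be called statistically significant at α = .05 but not at α = .01.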