Review for Final Exam

• Two statistical inference methods:
  Confidence interval: Estimator ± Margin of Error
  Hypothesis testing: Hypotheses (H0 vs. Ha), test statistic, P-value, conclusion

[Flowchart: choosing a procedure from the question. Branches: parameter (μ or p), method (C.I. or test), one-sample (1-S) or two-sample (2-S).]

• Inference about population proportion p
  Confidence interval: A level C confidence interval for p is given by
  $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$,
  where $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error, z* is the z-critical value corresponding to the confidence level C, n is the sample size, and $\hat{p}$ is the sample proportion.

  Sample size: The level C confidence interval for a population proportion p will have margin of error approximately equal to a specified value m when the sample size is
  $n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$,
  where p* is a guessed value for the sample proportion. The margin of error will be at most m if p* is taken to be 0.5.

  Hypothesis testing:
  Hypotheses: H0: p = p0 vs. Ha: p > p0, p < p0, or p ≠ p0
  Test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
  P-value:
    P-value = 1 - Φ(z), for Ha: p > p0
    P-value = Φ(z), for Ha: p < p0
    P-value = 2(1 - Φ(|z|)), for Ha: p ≠ p0
  Here z is the value of the test statistic and Φ(z) is the probability from the normal table corresponding to z.
  Conclusion: Reject H0 if P-value < α. Do not reject H0 if P-value > α.

• Inference about population mean μ
  Confidence interval: A level C confidence interval for μ is given by
  $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$,
  where $s/\sqrt{n}$ is the standard error, t* is the t-critical value corresponding to n - 1 degrees of freedom and the confidence level C, n is the sample size, s is the sample standard deviation, and $\bar{x}$ is the sample mean.

  Hypothesis testing:
  Hypotheses: H0: μ = μ0 vs.
  Ha: μ > μ0, μ < μ0, or μ ≠ μ0
  Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
  The test statistic follows a t-distribution with n - 1 degrees of freedom.
  P-value:
    P-value = T_df(t), for Ha: μ > μ0
    P-value = T_df(-t), for Ha: μ < μ0
    P-value = 2 T_df(|t|), for Ha: μ ≠ μ0
  Here T_df(t) is the upper-tail probability found by looking up the test statistic t in the t-Critical Values Table with df degrees of freedom.
  Conclusion: Reject H0 if P-value < α. Do not reject H0 if P-value > α.

• Interpretation about hypothesis testing
  The P-value is the probability, assuming the null hypothesis is true, that the test statistic will take a value as extreme or more extreme (meaning favoring the alternative hypothesis Ha) than that actually observed.
  Caution: The P-value is NOT the probability that the null hypothesis is wrong.

  Type I error: reject H0 while H0 is true.
  Type II error: do not reject H0 while H0 is false.
  The significance level α is our tolerance for the probability of making a Type I error.
  The P-value is the smallest significance level at which our sample would lead us to reject the null hypothesis.
  If the consequences of rejecting the null hypothesis are very serious, we want to be conservative about rejecting H0. Therefore, we should choose a small α.

• Practice: In a survey conducted by a firm, 12 of 60 families in two-story houses were found to own their houses. Let p denote the population proportion of families in two-story houses who own their house.
  Find a 95% confidence interval for p.
  The firm came up with a confidence interval (0.1406, 0.2594) for p. What confidence level did the firm use?
  Assume nothing is known about p. The firm requires a 95% confidence interval with margin of error at most 0.034 for p. What is the required sample size?
  Suppose that a previous survey indicates that p is 0.28. The firm requires a 95% confidence interval with margin of error at most 0.034 for p.
  What is the required sample size?

• Solution: Find a 95% confidence interval (C.I.) for p.
  In general, a level C C.I. for p is given by
  $\left(\hat{p} - z^*\sqrt{\hat{p}(1-\hat{p})/n},\ \hat{p} + z^*\sqrt{\hat{p}(1-\hat{p})/n}\right)$.
  In this case, $\hat{p}$ = 12/60 = 0.2, n = 60, and z* = 1.96 (for the 95% confidence level). Thus a 95% C.I. for p is
  $\left(0.2 - 1.96\sqrt{0.2(1-0.2)/60},\ 0.2 + 1.96\sqrt{0.2(1-0.2)/60}\right) = (0.0988, 0.3012)$.

• Solution: The firm came up with a confidence interval (0.1406, 0.2594) for p. What confidence level did the firm use?
  A confidence interval for p can also be written as $\hat{p} \pm ME$, where ME is the margin of error: $ME = z^*\sqrt{\hat{p}(1-\hat{p})/n}$.
  In this case, ME = 0.2594 - 0.2 = 0.0594.
  The standard error is $SE = \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.2(1-0.2)/60} = 0.0516$.
  Then z* = ME/SE = 0.0594/0.0516 = 1.15, which corresponds to confidence level 75%.

• Solution: Assume nothing is known about p. The firm requires a 95% C.I. with margin of error at most 0.034 for p. What is the required sample size?
  The required sample size for a level C (corresponding to z*) C.I. for p with margin of error approximately equal to m is
  $n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$.
  In this case: z* = 1.96, p* = 0.5, m = 0.034. Then
  $n = \left(\frac{1.96}{0.034}\right)^2 \cdot 0.5(1-0.5) = 830.8 \approx 831$.

• Solution: Suppose that a previous survey indicates that p is 0.28. The firm requires a 95% C.I. with margin of error at most 0.034 for p. What is the required sample size?
  Using the same sample size formula with z* = 1.96, p* = 0.28, m = 0.034:
  $n = \left(\frac{1.96}{0.034}\right)^2 \cdot 0.28(1-0.28) = 669.95 \approx 670$.

• Practice: To target the right age group, a marketing consultant must find which age group purchases from home-shopping channels on TV more frequently.
According to the management of TeleSell24/7, a home-shopping store on TV, about 40% of the online music downloaders are in their fifties, but the marketing consultant does not believe that figure. To test this he selects a random sample of 205 online music downloaders and finds that 71 of them are in their fifties.
  What are the hypotheses in this case?
  What is the value of the test statistic?
  What is the P-value of the test?
  What is your conclusion at α = 5%?

• Solution: The sample: $\hat{p}$ = 71/205 = 0.346, n = 205.
  What are the hypotheses in this case?
  H0: p = 0.4 vs. Ha: p ≠ 0.4
  What is the value of the test statistic?
  $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.346 - 0.40}{\sqrt{0.40(1-0.40)/205}} = -1.58$

• Solution: What is the P-value of the test?
  According to Ha: p ≠ 0.4, P-value = 2(1 - Φ(|-1.58|)) = 0.1141.
  What is your conclusion?
  Since P-value > α (= 5%), we do not reject the null hypothesis.
  If we concluded that 40% of the online music downloaders are in their fifties while in fact this proportion is 35%, then
    we made a Type I error.
    we made a Type II error.
    we made a correct decision.

• Practice: The safety management of an offshore oil-mining corporation believes that the true average escape time would be at most 340 min. A sample of 28 offshore oil workers took part in a simulated escape exercise. The sample yielded an average escape time of 347.68 min and a standard deviation of 26.95 min. Does this data contradict the management's claim?
  What are the hypotheses in this case?
  What is the value of the test statistic?
  What is the P-value of the test?
  What is your conclusion at α = 5%?
  What is a 98% confidence interval of the average escape time?

• Solution: The sample: $\bar{x}$ = 347.68, s = 26.95, n = 28.
  What are the hypotheses in this case?
  H0: μ = 340 vs. Ha: μ > 340
  What is the value of the test statistic?
  $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{347.68 - 340}{26.95/\sqrt{28}} = 1.508$
  The test statistic follows a t-distribution with 28 - 1 = 27 degrees of freedom.
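• Sketch (Python): The two test statistics above can be checked numerically. This is a minimal sketch that assumes only the Python standard library; the function names are illustrative and not part of the course material. The sample proportion is rounded to 0.346 as on the slide. Because the code does not round z to -1.58 before the lookup, its two-sided P-value can differ from the table-based 0.1141 in the later decimals.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p_hat, p0, n):
    """z statistic for H0: p = p0, using the standard error under H0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def two_sided_p_value(z):
    """P-value = 2(1 - Phi(|z|)) for a two-sided alternative."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def one_sample_t(x_bar, mu0, s, n):
    """t statistic for H0: mu = mu0; it has n - 1 degrees of freedom."""
    return (x_bar - mu0) / (s / sqrt(n))


# Music-downloader example: 71 of 205 in their fifties, H0: p = 0.4.
p_hat = round(71 / 205, 3)  # rounded to 0.346 as on the slide
z = one_proportion_z(p_hat, 0.40, 205)
p_value = two_sided_p_value(z)

# Escape-time example: x-bar = 347.68, s = 26.95, n = 28, H0: mu = 340.
t = one_sample_t(347.68, 340, 26.95, 28)

print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
print(f"t = {t:.3f} with df = {28 - 1}")
```

The P-value for the t statistic is left to the t-Critical Values Table, since the standard library has no t-distribution.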
• Solution: What is the P-value of the test?
  According to Ha: μ > 340, the P-value is between 0.05 and 0.10.

• Solution: What is your conclusion?
  Since P-value > α (= 5%), we do not reject the null hypothesis.
  If we concluded that the management's claim is correct while in fact the average escape time is 340 min, then
    we made a Type I error.
    we made a Type II error.
    we made a correct decision.

• Solution: What is a 98% confidence interval of the average escape time?
  A level C confidence interval for μ is given by $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$.
  We have t* = 2.473 (corresponding to 27 degrees of freedom and the confidence level 98%), n = 28, s = 26.95, and $\bar{x}$ = 347.68. So a 98% confidence interval of the average escape time is
  $347.68 \pm 2.473 \cdot \frac{26.95}{\sqrt{28}} = (335.0848, 360.2752)$.

• Practice: In a test of hypothesis, if we insist on very strong evidence against the null hypothesis we should
    choose α to be very small
    choose α to be larger than the P-value
    choose α to be very large
    choose α to be smaller than the P-value

• Practice: Based on a random sample of 50 students from among 40,000, a 91 percent confidence interval on the mean height of all 40,000 students was found to be the interval from 66 inches to 69.2 inches. Select the correct statement below:
    About 91 percent of all 40,000 students have heights between 66 and 69.2.
    About 91 percent of the heights in the sample should be between 66 and 69.2.
    The probability that the mean height is between 66 and 69.2 is 91 percent.
    About 91 percent of all samples would produce intervals containing μ.

• Practice: In a test of hypotheses, data are deemed to be significant at level α = 0.05, but not significant at level α = 0.01. Which of the following is true about the P-value associated with this test?
    P-value is greater than 0.05.
    P-value is between 0.01 and 0.05.
    P-value is less than 0.01.
    Nothing can be said.

• Sample / Population
• Statistics / Parameters
• Random sampling design
  Simple random sample (SRS)
  Stratified random sample
  Cluster sample
  Multistage sample
• Use random digits to draw simple random samples

• Law of large numbers
• Probability: Sample space / Events
• Rules for probability model:
  1. For any event A, 0 ≤ P(A) ≤ 1.
  2. For sample space S, P(S) = 1.
  3. If two events A and B are disjoint, then P(A or B) = P(A) + P(B).
  4. For any event A, P(A does not occur) = 1 - P(A).
  5. For two independent events A and B, P(A and B) = P(A) × P(B).
• Venn diagram

• General Addition Rule: For two events A and B, P(A or B) = P(A) + P(B) - P(A and B).
• General Multiplication Rule: For two events A and B, P(A and B) = P(B|A) × P(A).
• Conditional probability: $P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$
• Independence: P(B|A) = P(B).

• Random variable: A random variable is a variable whose value is a numerical outcome of a random phenomenon.
• Distribution: The probability distribution (distribution) of a random variable tells us what values this random variable can take and how to assign probabilities to those values.

• Statistics are random variables.
  Sample proportion
  Sample mean
• Central limit theorem
• Sampling distributions of statistics

• Sampling distribution of the sample proportion $\hat{p}$ for an SRS of size n:
  the mean of $\hat{p}$ equals the population proportion p;
  the standard deviation of $\hat{p}$ equals $\sqrt{p(1-p)/n}$;
  if the sample size is large, then $\hat{p}$ is approximately Normal, that is, $\hat{p} \sim N\left(p, \sqrt{p(1-p)/n}\right)$.

• Sampling distribution of the sample mean $\bar{x}$ for an SRS of size n:
  the mean of $\bar{x}$ equals the population mean μ;
  the standard deviation of $\bar{x}$ equals $\sigma/\sqrt{n}$, where σ is the population standard deviation;
  if the sample size is large, then $\bar{x}$ is approximately Normal, that is, $\bar{x} \sim N\left(\mu, \sigma/\sqrt{n}\right)$;
  if the population has a Normal distribution, then the approximation is exact.

• Practice: Motor vehicles sold to individuals are classified as either cars or light trucks (including SUVs) and as either domestic or imported. In a recent year, 69% of vehicles sold were light trucks, 78% were domestic, and 55% were domestic light trucks. For a randomly selected vehicle, what is the probability that
  the vehicle is a car?
  the vehicle is either domestic or a light truck or both?
  the vehicle is an imported light truck?
  the vehicle is domestic if we know it is a car?

• Practice: 56% of all American workers have a workplace retirement plan, 66% have health insurance, and 73% have at least one of the benefits. We select a worker at random.
  What is the probability that he has both health insurance and a retirement plan?
  What is the probability that he has neither health insurance nor a retirement plan?
  What is the probability that he only has a retirement plan?
  Knowing that he has a retirement plan, what is the probability that he has health insurance?

• Solution: Let A be the event that he has a retirement plan. Let B be the event that he has health insurance. Then P(A) = 0.56, P(B) = 0.66, and P(A or B) = 0.73.
  [Venn diagram of events A and B]

• Solution: What is the probability that he has both health insurance and a retirement plan? P(A and B) = ?
  General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
  Therefore, P(A and B) = P(A) + P(B) - P(A or B) = 0.56 + 0.66 - 0.73 = 0.49.

• Solution: What is the probability that he has neither health insurance nor a retirement plan?
  The probability that he has at least one benefit is 0.73. Therefore, the probability that he has neither health insurance nor a retirement plan is 1 - 0.73 = 0.27.

• Solution: What is the probability that he only has a retirement plan?
  "Only has a retirement plan" means he has a retirement plan but no health insurance (not both). Therefore,
  P(he only has a retirement plan) = P(A) - P(A and B) = 0.56 - 0.49 = 0.07.

• Solution: Knowing that he has a retirement plan, what is the probability that he has health insurance?
  $P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)} = \frac{0.49}{0.56} = 0.875$.

• Practice: Spell-checking software catches "nonword errors" that result in a string of letters that is not a word, as when "the" is typed as "teh." When undergraduates are asked to type a 250-word essay (without spell-checking), the number X of nonword errors has the following distribution:

  X            0    1    2    3    ≥4
  Probability  0.1  0.2  0.3  0.3  ?

• For a randomly selected student, what is the probability that
  he made 4 or more errors?
  he made at most 1 error?
• For four randomly selected students, what is the probability that
  each of them made no more than 2 errors?
  at least one of them made an error?

• Practice: In a large Statistics lecture, the professor reports that 52% of the students enrolled have never taken a Calculus course, 34% have taken only one semester of Calculus, and the rest have taken two or more semesters of Calculus. The professor randomly assigns students to groups of three to work on a project for the course.
  What is the probability that the first group member you meet has studied some Calculus?
  What is the probability that the first group member you meet has studied no more than one semester of Calculus?
  What is the probability that both of your two group members have studied exactly one semester of Calculus?
  What is the probability that at least one of your group members has had more than one semester of Calculus?

• Solution:
  Let A denote the event that a student has never taken a Calculus course.
  Let B denote the event that a student has taken only one semester of Calculus.
  Let C denote the event that a student has taken two or more semesters of Calculus.
  [Diagram: the sample space split into the three disjoint events A, B, and C]

• Solution: First, we can find the probability that a student has taken two or more semesters of Calculus:
  P(C) = 1 - P(A) - P(B) = 1 - 0.52 - 0.34 = 0.14.
  What is the probability that the first group member you meet has studied some Calculus?
  {Some Calculus} = B or C
  P(Some Calculus) = P(B or C) = P(B) + P(C) = 0.34 + 0.14 = 0.48.

• Solution: What is the probability that the first group member you meet has studied no more than one semester of Calculus?
  C = {a student has taken two or more semesters of Calculus}
  $C^c$ = {a student has studied no more than one semester of Calculus}
  P(no more than one semester of Calculus) = P($C^c$) = 1 - P(C) = 1 - 0.14 = 0.86.

• Solution: What is the probability that both of your two group members have studied exactly one semester of Calculus?
  The two events
  A1 = {first member has studied exactly one semester of Calculus}
  A2 = {second member has studied exactly one semester of Calculus}
  are independent. Thus,
  P(both members have studied exactly one semester of Calculus) = P(A1 and A2) = P(A1) × P(A2) = 0.34 × 0.34 = 0.1156.

• Solution: What is the probability that at least one of your group members has had more than one semester of Calculus?
  Let
  E = {at least one of your group members has had more than one semester of Calculus}
  $E^c$ = {neither of your group members has had more than one semester of Calculus}
  E1 = {first member has not had more than one semester of Calculus}
  E2 = {second member has not had more than one semester of Calculus}
  P($E^c$) = P(E1 and E2) = P(E1) × P(E2) = $(1 - 0.14)^2$.
  P(E) = 1 - P($E^c$) = $1 - (1 - 0.14)^2$ = 0.2604.

• Practice: A North American roulette wheel has 38 slots, of which 18 are red, 18 are black, and 2 are green. If you bet on red, the probability of winning is 18/38 = 0.4737. The probability 0.4737 represents
  (A) nothing important, since every spin of the wheel results in one of three outcomes (red, black, or green).
  (B) the proportion of times this event will occur in a very long series of individual bets on red.
  (C) the fact that you're more likely to win betting on red than you are to lose.
  (D) the fact that if you make 100 wagers on red, you'll have 47 or 48 wins.

• Practice: A company has developed a new battery, but the average lifetime is unknown. In order to estimate this average, a sample of 100 batteries is tested and the average lifetime of this sample is found to be 250 hours.
  Here the population of interest is:
  100 batteries, which were tested / average of 250 hours / all newly developed batteries by the company / lifetime of newly developed batteries
  Here the sample is:
  100 batteries, which were tested / lifetime of newly developed batteries / average of 250 hours / not in the list
  What is the parameter of interest in this case?
  average lifetime of 100 batteries tested / average of all newly developed batteries by the company / 100 batteries sampled and tested / no parameter is involved in this problem
  The 250 hours is the value of:
  parameter / statistic / sample / variable

• Practice: There are 30 problems in Ch12 in 4 pages and 45 problems in Ch13 in another set of 4 pages. In order to make up a homework set based on chapters 12 and 13 the instructor considers the following different schemes. Identify the sampling scheme employed.
  Method 1: Label the 75 problems from 1 through 75, draw 10 numbers at random, and choose the corresponding problems. Simple Random Sampling
  Method 2: Pick 4 problems from the 30 in chapter 12 and pick 6 problems from the 45 in chapter 13. Stratified Random Sampling
  Method 3: Pick two pages at random and assign all the problems on those pages. Cluster Sampling
  Method 4: Pick two pages at random and pick 5 problems at random from each of those two pages. Multistage Sampling

• Practice: A student group has 8 members:
  1. Barrett
  2. Chen
  3. DeRoos
  4. Maceli
  5. Pagliarulo
  6. Smithson
  7. Williams
  8. Zachary
  Three of them will be selected to participate in a national conference. If we use the following random digits (start from the left) to select a simple random sample of size 3, then who will attend the conference?
  2023967 8523610 4317063 5689043 5463038 9406022
  A. Barrett, Chen, DeRoos
  B. Chen, Chen, DeRoos
  C. Chen, DeRoos, Smithson
  D. Chen, Pagliarulo, Williams

• Data / Data table
• Cases
• Variables (Categorical / Quantitative)
• Display Categorical Variables
  Frequency Table / Relative Frequency Table
  Bar Chart / Relative Frequency Bar Chart / Pie Chart

• Graphic techniques for displaying quantitative variables:
  Histograms
  Stem-and-leaf displays
• Shape of distributions:
  Unimodal / Bimodal / Multimodal / Uniform
  Symmetric / Skewed to the left / Skewed to the right
  Outlier

• Numerical descriptions for the distribution of a quantitative variable:
  The center of a distribution: Mean, Median
  The spread of a distribution: Standard deviation, Interquartile Range (IQR)
  Five number summary / Outlier (1.5 IQR rule)
  Boxplot

• Shifting and rescaling of quantitative variables
• Standardization of quantitative variables (z-score): $z = \frac{x - \bar{x}}{s}$
• The Normal model
  Mean and standard deviation
  68-95-99.7 rule
  Two types of problems: Find percentage / Find percentiles

• Scatterplot for two quantitative variables
  Direction: positive / negative
  Form: linear / curved / no pattern
  Strength: strong / moderate / weak
• Correlation coefficient r

• Linear models: $\hat{y} = b_0 + b_1 x$
• Least squares regression line: $b_1 = r \frac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$
• Predictions and residuals

• Practice: The mean height of American women in their early twenties is about 64.5 inches and the standard deviation is about 2.5 inches. The mean height of men the same age is about 68.5 inches, with standard deviation about 2.7 inches. If the correlation between the heights of husbands and wives is about r = 0.5, what is the equation of the regression line of the husband's height on the wife's height in young couples?
  Predict the height of the husband of a woman who is 67 inches tall.
  What percentage of variation in husbands' height is explained by wives' height?
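• Sketch (Python): The least squares formulas b1 = r·sy/sx and b0 = ȳ - b1·x̄ can be applied directly to the summary statistics in the heights problem. This is a minimal sketch, not an official solution; the function name `regression_from_summary` is an illustrative assumption. Here x is the wife's height and y is the husband's height.

```python
def regression_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (b0, b1) of the least squares line y-hat = b0 + b1 * x."""
    b1 = r * s_y / s_x          # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Wives: mean 64.5, sd 2.5. Husbands: mean 68.5, sd 2.7. Correlation r = 0.5.
r = 0.5
b0, b1 = regression_from_summary(64.5, 2.5, 68.5, 2.7, r)

predicted = b0 + b1 * 67        # predicted husband's height for a 67-inch wife
r_squared = r ** 2              # fraction of variation in y explained by x

print(f"y-hat = {b0:.2f} + {b1:.2f} x")
print(f"prediction at x = 67: {predicted:.2f} inches")
print(f"r^2 = {r_squared:.2f}")
```

Under these inputs the slope works out to 0.54 and the intercept to about 33.67, so the prediction for a 67-inch wife is about 69.85 inches, and r² = 0.25 means 25% of the variation is explained.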
• Practice: Michigan State University researchers want to investigate how rainfall affects the yield of crops in East Lansing. The researchers found that the average amount of rainfall over the past 20 years is about 230 inches and the standard deviation is about 10 inches. The average yield of crops in East Lansing is about 280 tons with a standard deviation of 20 tons. The correlation between the amount of rainfall and yield of crops is about 0.4.
  1) What is the slope of the regression line of yield of crop on amount of rainfall?
  2) What is the intercept of the appropriate regression line?
  3) What is the predicted value of the yield of crop when the amount of rainfall is 240 inches? If the actual yield of crop of the year with rainfall 240 inches is 280, what is the residual?
  4) What percentage of variation in crop yield is explained by the rainfall?

• Solution:
  1) What is the slope of the regression line of yield of crop on amount of rainfall?
  The slope is given by $b_1 = r \frac{s_y}{s_x}$. Here r = 0.4, $s_x$ = 10, $s_y$ = 20. Thus the slope is
  $b_1 = 0.4 \cdot \frac{20}{10} = 0.8$.
  2) What is the intercept of the appropriate regression line?
  The intercept is given by $b_0 = \bar{y} - b_1 \bar{x}$. Here $\bar{x}$ = 230, $\bar{y}$ = 280, $b_1$ = 0.8. Thus the intercept is
  $b_0 = 280 - 230(0.8) = 96$.

• Solution:
  3) What is the predicted value of the yield of crop when the amount of rainfall is 240 inches? If the actual yield of crop of the year with rainfall 240 inches is 280, what is the residual?
  The predicted value is $\hat{y} = 96 + 0.8x = 96 + 0.8(240) = 288$.
  The residual is $y - \hat{y} = 280 - 288 = -8$.
  4) What percentage of variation in crop yield is explained by the rainfall?
  The quantity $r^2$ is the fraction of the variation in the response variable that is explained by the explanatory variable. In this case, $r^2 = 0.4^2 = 0.16$, so 16% of the variation in crop yield is explained by rainfall.

• Practice: In a population of couples the average height of wives was 65.2 inches and that of husbands was 68.2 inches.
You use the regression line to make predictions of the wife's height from the husband's height. Suppose a husband has height 68.2 inches. What would be the predicted height of the wife?
• Solution: The regression line passes through the point of means, that is, $\bar{y} = b_0 + b_1 \bar{x}$. Since the husband's height (68.2 inches) is the same as the average height of husbands, the predicted height of the wife is the average height of wives, that is, 65.2 inches.

• Practice: A regression study on obesity shows that doing more physical exercise reduces weight. In this study, time spent in physical exercise explained 16% of the total sample variation in weight among obese people. What is the correlation between "time spent in physical exercise" and "weight"?
• Solution: The quantity $r^2$ is the fraction of the variation in the response variable that is explained by the explanatory variable. In this case, $r^2$ = 0.16, so |r| = 0.4. Since more exercise goes with lower weight, the association is negative and the correlation is r = -0.4.

• Practice: Suppose that in families with 5 children X is the number of boys and Y is the number of girls. What is the correlation between X and Y?
• Solution: Since X + Y = 5, or equivalently Y = 5 - X, X and Y are perfectly linearly related with a negative slope. Therefore, the correlation between X and Y is -1.

• Practice: Which scatterplot has correlation near zero?

• Practice: In a photographic process, the developing times of prints are approximately Normal with mean 15.4 seconds and standard deviation 0.4 seconds.
  1) What proportion of prints will take at least 14.64 sec to develop?
  2) What proportion of prints will take 14.64 sec to 16.00 sec to develop?
  3) How many seconds are needed at most for the quickest 10%?

• Solution:
  1) What proportion of prints will take at least 14.64 sec to develop?
  The z-score corresponding to 14.64 is
  $z = \frac{x - \mu}{\sigma} = \frac{14.64 - 15.4}{0.4} = -1.9$.
  The probability corresponding to z-score -1.9 is 0.0287.
  Therefore, the proportion of prints that will take at least 14.64 sec to develop is 1 - 0.0287 = 0.9713.

• Solution:
  2) What proportion of prints will take 14.64 sec to 16.00 sec to develop?
  The z-score corresponding to 16 is
  $z = \frac{x - \mu}{\sigma} = \frac{16 - 15.4}{0.4} = 1.5$.
  The probability corresponding to z-score 1.5 is 0.9332. Therefore, the proportion of prints that will take 14.64 sec to 16.00 sec to develop is 0.9332 - 0.0287 = 0.9045.

• Solution:
  3) How many seconds are needed at most for the quickest 10%?
  The quickest 10% corresponds to the smallest 10% (less time). The z-score corresponding to probability 0.1 is -1.28. Therefore, the time needed at most for the quickest 10% is
  $x = \mu + z\sigma = 15.4 + (-1.28)(0.4) = 14.888$ seconds.

• Practice: [Boxplot]
• Which seems to be the likely value of Q1 (the first quartile)? 22
• Which seems to be the likely value of the median? 48
• What percentage of the observations is lying outside the box? 50%
• What is the approximate value of the range? 110 - 5 = 105

• Practice: The following stem-and-leaf display shows the number of patients attended by a house physician in 15 randomly selected weeks:

  Stem | Leaf
  -----+--------------
     0 | 8 9
     1 | 3 4 6 6 6 8 8
     2 | 0 1 2 4
     3 | 0 6

  Here 0|8 means 8, 1|3 means 13, etc. (i.e., the stem represents tens and the leaf represents units).
  1) Which observation occurred most? 16
  2) In how many weeks did the physician attend between 15 and 25 patients? 9
  3) What are the median, Q1, and Q3? Median: 18; Q1: 14; Q3: 22
  4) What is the IQR? IQR = Q3 - Q1 = 22 - 14 = 8
  5) Are there any outliers? 36 is an outlier.

• Practice: What are the mean and standard deviation of the data set {34, 40, 43, 55}?
• Solution:
  Mean: $\bar{x} = \frac{34 + 40 + 43 + 55}{4} = 43$.
  Standard deviation:

  x     x - x̄   (x - x̄)²
  34    -9       81
  40    -3       9
  43    0        0
  55    12       144
  sum            234

  $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{234}{4 - 1}} = 8.832$.

• Practice: An airline company keeps track of the delay in its flights.
Generally most flights have small delays but there are a few flights with very long delays. A consumer group claims that the "average" delay is 740 minutes while the airline company claims that the average is only 260 minutes. Why the difference?
• Solution: The consumer group refers to the mean while the company refers to the median. The distribution is skewed to the right, so the mean is larger than the median.

• Practice: To decide whether to provide electrical power using overhead lines or underground lines, the state administration has to consider the total lengths of street (measured in miles) in each subdivision of the respective state. Below is the histogram of street lengths of 47 subdivisions in a state.
  [Histogram of total street length (miles) for 47 subdivisions]
• What is plotted along the Y-axis (the vertical axis)? Number of subdivisions
• How many subdivisions have total length of street between 2000 and 4000 miles? 10 + 7 = 17
• What percent of subdivisions have total length less than 1000 miles? 12/47 = 25.5%
• Which seems more likely to be true? Mean = Median; Mean < Median; Mean > Median
• Which class will the median street length be in? The median is the 24th observation.

• Practice: In order to plan transportation and parking needs, the administration of a private high school asked students how they get to school. Some rode a school bus, some rode in with parents or friends, and others used "personal" transportation: bikes, skateboards, or just walking. The following table summarizes the responses from boys and girls.

        Boy   Girl
  Bus   35    32
  Ride  35    47

  1) How many students took part in the survey?
  2) What percentage of students surveyed are girls?
  3) What percentage of students take the school bus?
  4) What percent of the students are girls who ride the bus?
  5) What percent of girls ride the bus?
  6) What percent of bus riders are girls?
• Solution:

        Boy   Girl
  Bus   35    32
  Ride  35    47

  1) How many students took part in the survey? 35 + 35 + 32 + 47 = 149.
  2) What percentage of students surveyed are girls? (32 + 47)/149 = 53.0%.
  3) What percentage of students take the school bus? (35 + 32)/149 = 45.0%.
  4) What percent of the students are girls who take the bus? 32/149 = 21.5%.
  5) What percent of girls ride the bus? 32/(32 + 47) = 40.5%.
  6) What percent of bus riders are girls? 32/(32 + 35) = 47.8%.
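• Sketch (Python): The six answers above differ only in which count goes in the numerator and which total goes in the denominator. The following minimal sketch makes that choice explicit; the dictionary layout and the helper name `pct` are illustrative assumptions, not part of the original slides.

```python
# Two-way table from the survey: rows are transport mode, columns are gender.
counts = {
    "Bus": {"Boy": 35, "Girl": 32},
    "Ride": {"Boy": 35, "Girl": 47},
}


def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


total = sum(sum(row.values()) for row in counts.values())   # all students
girls = sum(row["Girl"] for row in counts.values())         # column total
bus_riders = sum(counts["Bus"].values())                    # row total
girls_on_bus = counts["Bus"]["Girl"]                        # single cell

print("1) total students:", total)
print("2) girls among all students:", pct(girls, total))
print("3) bus riders among all students:", pct(bus_riders, total))
print("4) girls who ride the bus, among all students:", pct(girls_on_bus, total))
print("5) bus riders among girls:", pct(girls_on_bus, girls))
print("6) girls among bus riders:", pct(girls_on_bus, bus_riders))
```

Questions 4) to 6) share the same numerator (32); only the denominator changes, from the grand total to the column total to the row total.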