Final Exam Outline

The first 10 questions are multiple choice. Each question is worth 2 points.
1. Power of a test
2. Significance level
3. Interpretation of a confidence level (repeated sampling)
4. Binomial distribution
5. Probability question on P(A|B), P(A∪B)...
6. Sample from a normal distribution: sample mean and sample variance. Find the distribution of one of them
7. Expectations
8. Hypergeometric distribution
9. Bayes' theorem
10. Conditional probability

Questions 11-14 are problems with 4 parts. Each part is worth 5 points.
11. a) Testing equal variances
b) Test equality of 2 means
c) Find the p-value associated with the test in part b)
d) Interpretive question between part a) and b) (means are equal and variances are equal)
12. a) Given certain information, find the sample size
b) Suppose we change some of the parameters given in a). Will the sample size get bigger or smaller, and why?
c) When you calculate the sample size you may be given the population variance or the sample variance. Which one would you choose to calculate the sample size, and why?
d) The fundamental difference between population variance and sample variance
13. a) Matched pairs: confidence interval for the difference of the means
b) What does that confidence mean?
c) On what assumptions is the confidence interval built?
d) Test between the 2 means, and how is it linked to the confidence interval?
14. a) Given data, calculate the sample mean and sample variance
b) Confidence interval for the mean
c) Confidence interval for the variance
d) Explain why normality is an important assumption

Part One: Multiple Choice Questions

1. Power of a test and 2. Significance level
We start from the hypothesis test:
The Null Hypothesis, H0
The Alternative Hypothesis, H1 or HA
Hypothesis Testing Process
Reason for Rejecting H0
Level of Significance, α
Note: The significance level and the p-value are closely related. We will see how later.
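The link mentioned in the note can be sketched numerically: H0 is rejected whenever the p-value falls below α, which for an upper-tail test is the same as the test statistic exceeding the critical value. The snippet below is only a minimal illustration for an upper-tail Z test; the values of ALPHA and z_observed are hypothetical and do not come from these notes.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def upper_tail_p_value(z: float) -> float:
    """Return P(Z > z) for a standard normal Z."""
    return 1.0 - STD_NORMAL.cdf(z)

ALPHA = 0.05        # hypothetical significance level
z_observed = 2.10   # hypothetical value of the test statistic

p_value = upper_tail_p_value(z_observed)
z_critical = STD_NORMAL.inv_cdf(1.0 - ALPHA)  # cutoff with area ALPHA to its right

# The two decision rules agree: p-value < alpha exactly when z > z_critical.
reject_by_p_value = p_value < ALPHA
reject_by_critical_value = z_observed > z_critical

print(f"p-value = {p_value:.4f}, critical z = {z_critical:.3f}")
print("Reject H0:", reject_by_p_value)
```

Lowering ALPHA moves the critical value further into the tail, so the same z_observed may no longer lead to rejection.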
Level of Significance and the Rejection Region
Errors in Making Decisions (Important)
Outcomes and Probabilities
Type I & II Error Relationship
Power of the Test

3. Interpretation of a confidence level (repeated sampling)
Confidence Interval and Confidence Level
In other words, we are 100(1 − α)% confident that the population (true) value lies within the confidence interval.

4. Binomial distribution
Binomial Distribution Formula
\( P(x) = \frac{n!}{x!\,(n-x)!} P^x (1-P)^{n-x} \)

Example: There are 20 professors in the Psychology Department. 15 of them received good evaluations from students last year, while 5 received poor evaluations. You are planning to take four courses in the Psychology Department next semester. (15 points)
a. What is the probability that all of your professors next semester have received good evaluations? (4 points)
Let A be the event of a good evaluation and A′ the event of a poor evaluation. We have P(A) = 15/20 = 3/4 and P(A′) = 5/20 = 1/4. The number of professors with good evaluations is treated as binomial. Let X be the number of your professors who received a good evaluation.
\( P(X=4) = \frac{4!}{4!\,0!} (3/4)^4 (1/4)^0 = 0.3164 \)
b. What is the probability that all of your professors next semester have received poor evaluations? (3 points)
\( P(X=0) = \frac{4!}{0!\,4!} (3/4)^0 (1/4)^4 = 0.0039 \)
c. What is the probability that at least one of your professors next semester has received a poor evaluation? (4 points)
P(at least one professor has received a poor evaluation) = P(X ≤ 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3) = 1 − P(X=4) = 1 − 0.3164 = 0.6836
d. What is the probability that the majority of your professors next semester have received good evaluations? (4 points)
We take a majority to mean X > 2.
\( P(X \ge 3) = P(X=3) + P(X=4) = \frac{4!}{3!\,1!} (3/4)^3 (1/4)^1 + 0.3164 = 0.4219 + 0.3164 = 0.7383 \)

The Accounting Department has established a number of core courses that have to be passed each year before advancing to the next year-level within a four-year program.
It has been estimated that the proportion of students passing their courses within the expected time is 80% at all year-levels (each year's performance is independent of that of other years). (20 points)
a. What is the probability that a new entrant will finish his studies in the regular time of 4 years? (4 points)
Let X be the number of year-levels (out of 4) that a student passes on time, with P(pass) = 0.8 and P(not pass) = 0.2.
\( P(X=4) = \frac{4!}{4!\,0!} (0.8)^4 (0.2)^0 = 0.4096 \)
b. Assuming that a student has already finished the coursework of the first two years, what is the probability that he will graduate in the next two academic years? (4 points)
\( P(3 \le X \le 4) = P(X=3) + P(X=4) = \frac{4!}{3!\,1!} (0.8)^3 (0.2)^1 + 0.4096 = 0.4096 + 0.4096 = 0.8192 \)
c. What is the probability that a new entrant will face some difficulties in finishing his coursework at any time throughout his studies? (4 points)
The probability that a student will have a problem finishing his studies = 1 − P(X=4) = 1 − 0.4096 = 0.5904
d. Assuming that a student has already finished two years of coursework, what is the probability that he will not be able to graduate within the next two years? (4 points)
P(2 ≤ X ≤ 3) = P(X=2) + P(X=3) = 0.1536 + 0.4096 = 0.5632
e. Consider two randomly chosen students who have already finished the coursework of the first two years. What is the probability that at least one of them will have some difficulty with his graduation in the next two years? (4 points)
P(2 ≤ X ≤ 3) = 0.5632
P(Person 1 will not graduate ∪ Person 2 will not graduate)
= P(Person 1 will not graduate) + P(Person 2 will not graduate) − P(Person 1 will not graduate ∩ Person 2 will not graduate)
= 0.5632 + 0.5632 − 0.5632 × 0.5632 = 0.8092
where P(Person 1 will not graduate ∩ Person 2 will not graduate) = 0.5632 × 0.5632, since person 1 and person 2 are statistically independent.

5. Probability question on P(A|B), P(A∪B)...
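A short numerical sketch of the addition, complement and conditional-probability rules covered in this section may help. It uses the three probabilities given in the contractor example of this section; the variable names are illustrative only.

```python
# Probabilities given in the contractor example (projects A and B).
p_a = 0.60        # P(A): the contractor gets project A
p_b = 0.75        # P(B): the contractor gets project B
p_a_or_b = 0.85   # P(A ∪ B): the contractor gets at least one project

# Addition rule rearranged: P(A ∩ B) = P(A) + P(B) - P(A ∪ B)
p_a_and_b = p_a + p_b - p_a_or_b

# Remaining joint probabilities, obtained from the marginals.
p_a_not_b = p_a - p_a_and_b      # P(A ∩ not B)
p_not_a_b = p_b - p_a_and_b      # P(not A ∩ B)
p_neither = 1.0 - p_a_or_b       # complement rule: P(not A ∩ not B)

# Conditional probability: P(not A | B) = P(not A ∩ B) / P(B)
p_not_a_given_b = p_not_a_b / p_b

# Exactly one project: the two events are mutually exclusive, so they add.
p_exactly_one = p_a_not_b + p_not_a_b

print(f"P(A and B)    = {p_a_and_b:.2f}")
print(f"P(neither)    = {p_neither:.2f}")
print(f"P(not A | B)  = {p_not_a_given_b:.3f}")
print(f"P(exactly one) = {p_exactly_one:.2f}")
```

The four joint probabilities computed here are the cells of the joint probability table, and they sum to 1.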
Probability Rules
A Probability Table
Conditional Probability

The rules for detecting the information from questions:
(a) Joint probability: both, all;
(b) Union probability: either/or, at least, or;
(c) Conditional probability: where, while, when, given, of, among, if, etc.

Example: A general contractor has submitted two bids for two projects, A and B. The probability of getting project A is 0.60. The probability of getting project B is 0.75. The probability of getting at least one of the projects is 0.85. (15 points)
a. Construct the table or tree of joint probabilities. (3 points)

                         B       B′      Marginal Probability
A                        0.50    0.10    0.60
A′                       0.25    0.15    0.40
Marginal Probability     0.75    0.25    1

b. What is the probability that this contractor will not get any project? (3 points)
We need to find P(A′ ∩ B′), where A′ is the event that the contractor does not get project A and B′ is the event that the contractor does not get project B.
P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 0.6 + 0.75 − 0.85 = 0.5
P(A ∩ B′) = P(A) − P(A ∩ B) = 0.6 − 0.5 = 0.1
P(A′ ∩ B) = P(B) − P(A ∩ B) = 0.75 − 0.5 = 0.25
P(A′ ∩ B′) = P(A′) − P(A′ ∩ B) = 0.4 − 0.25 = 0.15
c. If this contractor gets project B, what is the probability that he will not get project A? (3 points)
We need the conditional probability of not getting A given that he gets B.
P(A′ | B) = P(A′ ∩ B) / P(B) = 0.25 / 0.75 = 0.333
d. If there is a limit on the number of projects awarded to each contractor, what is the probability that this contractor will get only one project (either one)? (3 points)
We need the probability of the union of two events: the contractor gets B but not A, or gets A but not B.
P[(A′ ∩ B) ∪ (A ∩ B′)] = P(A′ ∩ B) + P(A ∩ B′) − P[(A′ ∩ B) ∩ (A ∩ B′)] = 0.25 + 0.1 − 0 = 0.35
e.
If the probabilities of getting these projects are the same for all contractors who bid for them, what is the probability that two randomly chosen contractors who are bidding for them will get one contract each? (3 points)

6. Sample of a normal distribution: sample mean and sample variance. Find the distribution of one of them
Sampling and Sampling Distributions
If the Population is Normal
Finite Population Correction (it is important if your prof taught it in class)

Example:
1. A publisher wants to estimate the average number of words per page in a 6-page document on the basis of the findings in a sample of 3 pages. The exact number of words found in each of the six pages is as follows: 206, 195, 188, 209, 212, and 190. (20 points)
[The calculations for (b), (c), and (d) below must be done on Excel and pasted in the spaces provided here.]
a. How many samples (possible groups) of three out of those six pages can be made? (3 points)
\( C_3^6 = \frac{6!}{3!\,3!} = 20 \) possible samples (groups) of three pages
b. List all possible samples (three-page groups) with their mean number of words. (3 points)
c. Find the probability function of the sampling distribution of the sample mean and calculate its mean and variance. (4 points)
From the distribution of the sample means, we get
\( E(\bar X) = \sum \bar X_i \, P(\bar X_i) = 200 \)
We also get
\( \sigma^2_{\bar X} = \sum (\bar X_i - \mu_{\bar X})^2 \, P(\bar X_i) = 17.67 \), so \( \sigma_{\bar X} = 4.20 \)
d. Calculate the population mean (mean number of words of all pages) and the population standard deviation. (3 points)
The mean of the population distribution is
\( \mu = \frac{\sum X_i}{N} = \frac{1200}{6} = 200 \)
and the population variance is
\( \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{530}{6} = 88.333 \), so \( \sigma = 9.40 \)
e. Compare the mean of the population (found in d) and the mean of the sampling distribution of the sample mean (found in c). Is it what you were expecting? Explain. (3 points)
According to the theory, the mean of the sampling distribution of the sample means (the expected value of the sample mean) is equal to the mean of the population from which the samples have been drawn: \( E(\bar X) = \mu = 200 \).
f. Compare the standard deviation of the population (found in d) and the standard error of the sampling distribution of the sample mean (found in c). Is it what you were expecting? Explain. (4 points)
The population here is finite, so the relationship between the variance of the population and the variance of the sample mean includes the finite population correction factor:
\( \sigma^2_{\bar X} = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} \)
Plugging the numbers from (d) into the formula that links the population standard deviation with the standard error of the sampling distribution, we get
\( \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{9.40}{\sqrt{3}} \sqrt{\frac{6-3}{6-1}} = \frac{9.40}{1.732} \sqrt{0.6} = 5.4273 \times 0.7746 = 4.20 \)
which is equal to what we found in (c) above.

Population Proportions, P
Sample Variance
Sampling Distribution of Sample Variances

Examples
1. The dollar amount of monthly purchases during the Christmas season at The Bay is normally distributed with a mean of $480 per card account and a standard deviation of $80. (35 points)
a. What is the probability that a randomly selected cardholder has purchased items of a value of less than $450 during the Christmas season? (3 points)
Here, μ = 480 and σ = 80.
\( P(X_i < 450) = P\left(Z < \frac{450 - 480}{80}\right) = P(Z < -0.375) = 0.5 - 0.1461 = 0.3539 \) or 35.4%
The probability that a randomly selected cardholder has purchased items of a value of less than $450 during the Christmas season is 35.4%.
b. What is the probability that two randomly (and independently) selected cardholders have both purchased items of a value of less than $450 during the Christmas season? (3 points)
P(< 450 ∩ < 450) = 0.3539 × 0.3539 = 0.1252 or 12.52%
The probability that two randomly selected cardholders have both purchased items of a value of less than $450 during the Christmas season is 12.52%.
c.
What is the probability that out of two randomly (and independently) selected cardholders, at least one has purchased items of a value of less than $450 during the Christmas season? (3 points)
We know that P(< 450) = 0.3539. Therefore, P(> 450) = 0.6461.
Using the binomial formula \( P(x) = \frac{n!}{x!\,(n-x)!} P^x (1-P)^{n-x} \) we get
\( P(1 \le X \le 2) = 1 - P(X=0) = 1 - \frac{2!}{0!\,(2-0)!} (0.3539)^0 (0.6461)^2 = 1 - 0.4174 = 0.5826 \)
The probability that at least one of two randomly chosen cardholders has purchased items of a value of less than $450 during the Christmas season is 58.3%.
d. What is the probability that the mean spending amount of a random sample of 16 cardholders is less than $450? (3 points)
Here μ = 480, σ = 80, n = 16, and \( \bar X = 450 \).
\( P(\bar X < 450) = P\left(Z < \frac{450 - 480}{80/\sqrt{16}}\right) = P\left(Z < \frac{-30}{20}\right) = P(Z < -1.5) = 0.5 - 0.4332 = 0.0668 \) or 6.7%
The probability that the mean spending amount of a random sample of 16 cardholders is less than $450 is 6.7%.
e. If a sample of 36 cardholders had been taken (instead of 16), would the probability of a sample mean of spending amounts being less than $450 be smaller than, larger than, or the same as the correct answer to part (d) above? Along with your calculations, sketch a graph to illustrate your reasoning. (4 points)
A sample of 36 is larger than the sample of 16 used in (d) above. This implies that the distribution of the sample means of samples of size 36 is less dispersed. Therefore, the probability of falling beyond a given value away from the mean (the tail area under the curve) becomes smaller:
\( P\left(Z < \frac{450 - 480}{80/\sqrt{36}}\right) = P(Z < -2.25) = 0.5 - 0.4878 = 0.0122 < 0.0668 \) found in (d) above.
[Graph: sampling distributions of the sample mean, curves labeled n = 36 and n = 25]
f. What is the probability that the mean spending amount of a random sample of 25 cardholders is between $450 and $500? (3 points)
Here μ = 480, σ = 80, n = 25, \( \bar X_1 = 450 \), and \( \bar X_2 = 500 \).
\( P\left(\frac{450 - 480}{80/\sqrt{25}} < Z < \frac{500 - 480}{80/\sqrt{25}}\right) = P(-1.875 < Z < 1.25) = 0.4696 + 0.3944 = 0.8640 \) or 86.4%
g. There is a 10% probability that the mean spending amount of a random sample of 25 cardholders is higher than what amount?
(4 points)
P(Z > 1.28) = 0.1, so
\( 1.28 = \frac{\bar X - 480}{80/\sqrt{25}} \), which gives \( \bar X = 500.48 \) dollars
There is a 10% probability that the mean spending amount of a random sample of 25 cardholders is higher than $500.48.
h. To reward client loyalty, the management of The Bay decided to offer exclusive rebates for purchases during St. Valentine’s to the 20% of its clients that purchased the most during the Christmas period. What is the least amount a cardholder must have purchased to qualify for the rebate? (4 points)
P(Z > 0.84) = 0.20, so
\( 0.84 = \frac{X_i - 480}{80} \), which gives \( X_i = 547.20 \)
A cardholder should have spent at least $547.20 during Christmas in order to qualify for a rebate on his/her St. Valentine’s purchases.
i. The probability is .05 that the sample standard deviation of the amounts of purchases of a sample of 25 is bigger than what number? (4 points)
\( P(\chi^2_{24} > 36.42) = 0.05 \), so
\( \frac{(25-1)\,s_x^2}{(80)^2} = 36.42 \), which gives \( s_x^2 = 9712 \) and \( s_x = 98.55 \)
j. The probability is .05 that the sample standard deviation of the amounts of purchases of a sample of 25 is less than what number? (4 points)
\( P(\chi^2_{24} < 13.85) = 0.05 \), so
\( \frac{(25-1)\,s_x^2}{(80)^2} = 13.85 \), which gives \( s_x^2 = 3693.33 \) and \( s_x = 60.773 \)

2. The Taxation Department has estimated that 75% of all tax returns lead to a refund. A sample of 300 tax returns is taken. (15 points)
a. What is the probability that 72% of them, or less, would lead to a refund? (3 points)
\( Z = \frac{\hat p - P}{\sqrt{P(1-P)/n}} = \frac{0.72 - 0.75}{\sqrt{0.75 \times 0.25 / 300}} = \frac{-0.03}{\sqrt{0.000625}} = \frac{-0.03}{0.025} = -1.2 \)
and P(Z < −1.2) = 0.1151
The probability that a proportion of 72% or less (out of the 300 tax returns) would lead to a refund is 11.5%.
b. There is an 80% chance that the sample proportion of tax returns leading to a refund will be bigger than what amount? (4 points)
P(Z > −0.84) = 0.8, so
\( -0.84 = \frac{\hat p - 0.75}{\sqrt{0.75 \times 0.25 / 300}} \), which gives \( \hat p = 0.75 - 0.021 = 0.729 \)
There is an 80% probability that the proportion of taxpayers who would get a refund out of a sample of 300 will be above (or at least) 72.9%.
c. There is a 90% chance that the sample proportion of tax returns leading to a refund will be less than what amount?
(4 points)
P(Z < 1.28) = 0.9, so
\( 1.28 = \frac{\hat p - 0.75}{\sqrt{0.75 \times 0.25 / 300}} \), which gives \( \hat p = 0.75 + 0.032 = 0.782 \)
There is a 90% probability that the proportion of taxpayers who would get a refund out of a sample of 300 will be below (or up to) 78.2%.
d. The probability is .75 that the sample proportion will differ from the population proportion (assumed by the Taxation Department) by up to a maximum of how much? (4 points)
P(−1.15 < Z < 1.15) = 0.75, so
\( 1.15 = \frac{\text{Difference}}{\sqrt{0.75 \times 0.25 / 300}} \), which gives Difference = 0.02875
There is a 75% probability that the proportion of taxpayers who would get a refund out of a sample of 300 will differ from the population proportion by no more than 2.875 percentage points.

3. A random sample of 30 boxes of cereal of a certain brand is taken from the production line. Assume that fill amounts follow a normal distribution. (10 points)
a. What is the probability that the sample variance is up to 45% larger than the population variance? (3 points)
\( P(s_x^2 \le 1.45\,\sigma_x^2) = P\left(\frac{(n-1)\,s_x^2}{\sigma_x^2} \le 29 \times 1.45\right) = P(\chi^2_{29} \le 42.05) \approx 0.95 \)
The probability that the sample variance is up to 45% larger than the population variance is about 95%.
b. What is the probability that the population variance is up to 80% larger than the sample variance? (3 points)
\( P(\sigma_x^2 \le 1.80\,s_x^2) = P\left(\frac{(n-1)\,s_x^2}{\sigma_x^2} \ge \frac{29}{1.80}\right) = P(\chi^2_{29} \ge 16.11) \approx 0.975 \)
The probability that the population variance is up to 80% larger than the sample variance is about 97.5%.
c. The probability is 0.10 that the sample variance is up to what percentage of the population variance? (4 points)
\( P(\chi^2_{29} < 19.77) = 0.10 \), so
\( \frac{(n-1)\,s_x^2}{\sigma_x^2} = 29\,\frac{s_x^2}{\sigma_x^2} = 19.77 \), which gives \( \frac{s_x^2}{\sigma_x^2} = \frac{19.77}{29} = 0.6817 \)
There is a 10% probability that the sample variance is 68.2% or less of the population variance.

7.
Expectations

(a) Expected Value for discrete distribution
Linear Functions of Random Variables
\( E(a) = a \) and \( E(bX) = b\mu_X \)
\( Var(a) = 0 \) and \( Var(bX) = b^2 \sigma_X^2 \)
Binomial Distribution Mean and Variance
Portfolio Analysis

(b) Expectations for Continuous Random Variables
\( \mu_X = E(X) \)
\( \sigma_X^2 = E[(X - \mu_X)^2] \)
\( Cov(X, Y) = E[(X - \mu_x)(Y - \mu_y)] \)
\( Cov(X, Y) = E(XY) - \mu_x \mu_y \)
\( \rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} \)
Differences Between Two Random Variables
\( E(X - Y) = \mu_X - \mu_Y \)
If the covariance between X and Y is 0, then the variance of their difference is
\( Var(X - Y) = \sigma_X^2 + \sigma_Y^2 \)

Example: Your uncle has asked you to analyze his stock portfolio, which contains 5 shares of stock A and 10 shares of stock B. The joint probability distribution of the stock prices is shown below: (30 points)
a. Find the probability that the price of stock A will be $40. (2 points)
P(A=40) = 0.00 + 0.00 + 0.10 + 0.20 = 0.30
b. Find the probability that the price of stock A will be $40 while the price of stock B is $55. (2 points)
P(A=40 | B=55) = P(A=40 ∩ B=55) / P(B=55) = 0.10 / 0.25 = 0.40
c. Find the probability that either of the two stock prices will be at its maximum price level. (3 points)
P(B=60 ∪ A=70) = P(B=60) + P(A=70) − P(B=60 ∩ A=70) = 0.35 + 0.25 − 0.00 = 0.60
d. Find the probability that both stock prices will be up to $60. (3 points)
P(B≤60 ∩ A≤60)
= P(A=40 ∩ B=45) + P(A=50 ∩ B=45) + P(A=60 ∩ B=45)
+ P(A=40 ∩ B=50) + P(A=50 ∩ B=50) + P(A=60 ∩ B=50)
+ P(A=40 ∩ B=55) + P(A=50 ∩ B=55) + P(A=60 ∩ B=55)
+ P(A=40 ∩ B=60) + P(A=50 ∩ B=60) + P(A=60 ∩ B=60)
= 0.00 + 0.05 + 0.05 + 0.00 + 0.05 + 0.05 + 0.10 + 0.05 + 0.05 + 0.20 + 0.10 + 0.05 = 0.75
e. Find the average price and variance of stock B. (3 points)
See calculations in table above:
\( E(Y) = \mu_y = \sum y_i \, P(y_i) = 53.75 \)
\( Var(Y) = \sigma_y^2 = E[(Y - \mu_y)^2] = \sum (y_i - \mu_y)^2 P(y_i) = 52.44 \) and \( \sigma_y = \sqrt{\sigma_y^2} = \sqrt{52.44} = 7.24 \)
f. Find the average price and variance of stock A. (3 points)
See calculations in table above:
\( E(X) = \mu_x = \sum x_i \, P(x_i) = 54.00 \)
\( Var(X) = \sigma_x^2 = E[(X - \mu_x)^2] = \sum (x_i - \mu_x)^2 P(x_i) = 135.00 \) and \( \sigma_x = \sqrt{\sigma_x^2} = \sqrt{135.0} = 11.62 \)
g.
Can you tell which of the two stocks offers the greatest potential for capital gains? (3 points)
Stock A has a larger variance. Therefore, it offers a greater opportunity for capital gains as well as capital losses.
h. Find the covariance and correlation coefficient of the prices of the two stocks. (4 points)
\( Cov(X, Y) = E[(X - \mu_x)(Y - \mu_y)] = E(XY) - \mu_x \mu_y = \sum_x \sum_y x\,y\,P(x, y) - \mu_x \mu_y = 2862.50 - (53.75 \times 54.00) = -40.0 \)
\( r_{x,y} = \frac{Cov(X, Y)}{\sigma_x \sigma_y} = \frac{-40.0}{11.62 \times 7.24} = \frac{-40.0}{84.1288} = -0.4755 \)
This implies that there is a moderate negative relationship between the prices of the two stocks.
i. Find the average value of the entire portfolio. (3 points)
This requires that we create a new variable (a combination of the two) as follows: W = 5X + 10Y
\( \mu_W = a\mu_x + b\mu_y = 5 \times 54.00 + 10 \times 53.75 = 807.5 \)
j. Find the standard deviation of the value of the entire portfolio and calculate a 68% range of its value. (4 points)
\( Var(W) = \sigma_w^2 = Var(5X + 10Y) = Var(5X) + Var(10Y) + 2\,Cov(5X, 10Y) = 5^2 \sigma_x^2 + 10^2 \sigma_y^2 + 2 \times 5 \times 10 \times Cov(X, Y) \)
= (25 × 135) + (100 × 52.44) + [100 × (−40.00)] = 4619
Standard deviation = \( \sqrt{4619} = 67.963 \)
A 68% range is given by \( \mu_w \pm 1\sigma_w \), or 807.5 ± 67.963, which means that there is a 68% chance that the value of the entire portfolio will be between $739.54 and $875.46.

8. Hypergeometric distribution
"n" trials in a sample taken from a finite population of size N
Sample taken without replacement
Outcomes of trials are dependent
Concerned with finding the probability of "X" successes in the sample where there are "S" successes in the population
\( P(x) = \frac{C_x^S \, C_{n-x}^{N-S}}{C_n^N} = \frac{\dfrac{S!}{x!\,(S-x)!} \times \dfrac{(N-S)!}{(n-x)!\,(N-S-n+x)!}}{\dfrac{N!}{n!\,(N-n)!}} \)

Example: 3 different computers are checked from 10 in the department. 4 of the 10 computers have illegal software loaded. What is the probability that 2 of the 3 selected computers have illegal software loaded?
Answer: With N = 10, S = 4, n = 3 and x = 2,
\( P(X=2) = \frac{C_2^4 \, C_1^6}{C_3^{10}} = \frac{(6)(6)}{120} = 0.3 \)

1. The mid-term exam of a course in introductory statistics has 8 multiple-choice questions where the correct answer must be picked among 4 choices.
A student who has not studied at all decides to proceed by picking the answers at random. What is the probability that he will pick the correct answers to all questions?
Question: is this one binomial?
2. It is known that a statistics professor has a pool of 16 questions on probability distributions from which he is planning to pick five for next week’s quiz. Four out of the 16 questions refer to the binomial probability distribution. What is the probability that all of the available questions on the binomial distribution will be included in next week’s quiz?
Question: is this one binomial?

9. Bayes' theorem
\( P(E_i \mid A) = \frac{P(A \mid E_i)\,P(E_i)}{P(A)} = \frac{P(A \mid E_i)\,P(E_i)}{P(A \mid E_1)P(E_1) + P(A \mid E_2)P(E_2) + \dots + P(A \mid E_k)P(E_k)} \)
where:
Ei = ith event of k mutually exclusive and collectively exhaustive events
A = new event that might impact P(Ei)

Example: A drilling company has estimated a 40% chance of striking oil for their new well. A detailed test has been scheduled for more information. Historically, 60% of successful wells have had detailed tests, and 20% of unsuccessful wells have had detailed tests. Given that this well has been scheduled for a detailed test, what is the probability that the well will be successful?
Solution:
Let S = successful well, U = unsuccessful well
P(S) = .4, P(U) = .6 (prior probabilities)
Define the detailed test event as D
Conditional probabilities: P(D|S) = .6, P(D|U) = .2
Goal is to find P(S|D):
\( P(S \mid D) = \frac{P(D \mid S)\,P(S)}{P(D \mid S)\,P(S) + P(D \mid U)\,P(U)} = \frac{(.6)(.4)}{(.6)(.4) + (.2)(.6)} = \frac{.24}{.24 + .12} = .667 \)

10. Conditional probability
See the example in part (5). No repeat.

Questions 11-14 are problems with 4 parts. Each part is worth 5 points

11. a) Testing equal variances
Hypothesis Tests for Two Variances
Decision Rules: Two Variances
You are a financial analyst for a brokerage firm. You want to compare dividend yields between stocks listed on the NYSE & NASDAQ.
You collect the following data:

            NYSE    NASDAQ
Number      21      25
Mean        3.27    2.53
Std dev     1.30    1.16

Is there a difference in the variances between the NYSE & NASDAQ at the α = 0.10 level?
F Test: Example Solution

b) Test equality of 2 means
(i) σx² and σy² Known
Assumptions:
Samples are randomly and independently drawn
Both population distributions are normal
Population variances are known
\( \sigma^2_{\bar X - \bar Y} = \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y} \)
\( Z = \frac{(\bar x - \bar y) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}} \)
Decision Rules

(ii) σx² and σy² Unknown, Assumed Equal
Forming interval estimates:
The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate σ
Use a t value with (nx + ny − 2) degrees of freedom
\( s_p^2 = \frac{(n_x - 1)\,s_x^2 + (n_y - 1)\,s_y^2}{n_x + n_y - 2} \)
\( t = \frac{(\bar x - \bar y) - (\mu_x - \mu_y)}{\sqrt{s_p^2 \left(\dfrac{1}{n_x} + \dfrac{1}{n_y}\right)}} \)

Example: The following data represent weights (in pounds) for two random samples of men approximately 5 feet 10 inches tall and of medium build. The only difference is that the first group consists of athletic persons and the second of non-athletic ones.
Athletic men: 152, 148, 156, 155, 157, 162, 159, 168, 150, 173.
Non-athletic men: 155, 157, 169, 170, 171, 161, 181, 165, 183. (15 points)
a. Calculate the means and variances of the two samples. (5 points)
The two samples are independent. We calculate that
\( \bar X_A = \frac{\sum X_A}{n_A} = \frac{1580}{10} = 158 \) and \( S_A^2 = \frac{\sum (X_A - \bar X_A)^2}{n_A - 1} = \frac{556}{10 - 1} = 61.78 \), so \( S_A = 7.86 \)
\( \bar X_N = \frac{\sum X_N}{n_N} = \frac{1512}{9} = 168 \) and \( S_N^2 = \frac{\sum (X_N - \bar X_N)^2}{n_N - 1} = \frac{756}{9 - 1} = 94.5 \), so \( S_N = 9.72 \)
b. Can we conclude at the 5% significance level that the population mean weight of athletic men is lower than the population mean weight of non-athletic men? (5 points)
The hypotheses are set as follows:
H0: μA − μN = 0 This implies that μA = μN
H1: μA − μN < 0 This implies that μA < μN
Reject H0 if the estimated t < −t17,0.05 = −1.74
We assume here that the variances of the two populations are equal.
The pooled variance is an average of the two sample variances weighted by their degrees of freedom:
\( s_p^2 = \frac{(n_A - 1)\,s_A^2 + (n_N - 1)\,s_N^2}{n_A + n_N - 2} = \frac{(10-1) \times 61.78 + (9-1) \times 94.5}{10 + 9 - 2} = \frac{556 + 756}{17} = 77.1765 \)
\( t = \frac{(\bar X_A - \bar X_N) - (\mu_A - \mu_N)}{\sqrt{s_p^2 \left(\dfrac{1}{n_A} + \dfrac{1}{n_N}\right)}} = \frac{(158 - 168) - 0}{\sqrt{77.1765 \left(\dfrac{1}{10} + \dfrac{1}{9}\right)}} = \frac{-10}{\sqrt{16.293}} = \frac{-10}{4.03644} = -2.4774 < -1.74 \)
Therefore, we reject the null hypothesis and conclude that there is enough evidence to infer that, on average, the athletic men weigh less than the non-athletic ones.
c. At the 5% level of significance, can we infer that the population mean weight of athletic men is lower than the population mean weight of non-athletic men by more than 2 pounds? (5 points)
H0: μA − μN = −2 This implies that μA = μN − 2
H1: μA − μN < −2 This implies that μA < μN − 2
Reject H0 if the estimated t < −t17,0.05 = −1.74
\( t = \frac{(\bar X_A - \bar X_N) - (\mu_A - \mu_N)}{\sqrt{s_p^2 \left(\dfrac{1}{n_A} + \dfrac{1}{n_N}\right)}} = \frac{(158 - 168) - (-2)}{\sqrt{77.1765 \left(\dfrac{1}{10} + \dfrac{1}{9}\right)}} = \frac{-8}{\sqrt{16.293}} = \frac{-8}{4.03644} = -1.982 \)
and −1.982 < −1.74. Therefore, we reject the null hypothesis and conclude that there is enough evidence to infer that, on average, the athletic men weigh more than 2 pounds less than the non-athletic ones.

(iii) Two Population Proportions
Assumptions: Both sample sizes are large, nP(1 − P) > 9
\( z = \frac{\hat p_x - \hat p_y}{\sqrt{\dfrac{\hat p_0 (1 - \hat p_0)}{n_x} + \dfrac{\hat p_0 (1 - \hat p_0)}{n_y}}} \), where \( \hat p_0 = \frac{n_x \hat p_x + n_y \hat p_y}{n_x + n_y} \)

c) Find the p-value associated with testing in part b)
(1) Z distribution
Z Test for Proportion
A marketing company claims that it receives 8% responses from its mailing. To test this claim, a random sample of 500 were surveyed with 25 responses. Test at the α = .05 significance level.
Check: Our approximation for P is \( \hat p = 25/500 = .05 \), and nP(1 − P) = (500)(.05)(.95) = 23.75 > 9
Z Test for Proportion: Solution

(2) t distribution
Example:
1. At the beginning of the course of Statistical Methods I, the instructor recommended that students devote 3 hours per week for the duration of the 13-week semester, for a total of 39 hours.
It is known that the times spent on studying statistics follow a normal distribution. Throughout the course’s duration, students were claiming that they were following the instructor’s recommendation. At the course’s completion, a random sample of 10 students enrolled in the course was drawn, and each student was asked how many hours he or she spent doing homework in statistics. The data are listed below: (30 points)
45 38 37 40 44 38 46 37 42 43
a. Calculate the sample mean and sample standard deviation. (5 points)
Sample mean: \( \bar X = \frac{\sum X_i}{n} = \frac{410}{10} = 41 \)
Sample variance: \( s^2 = \frac{\sum (X_i - \bar X)^2}{n - 1} = \frac{106}{10 - 1} = 11.778 \)
Sample standard deviation: \( s = \sqrt{s^2} = \sqrt{11.778} = 3.4319 \)
b. Formulate a suitable null and alternative hypothesis to test the students’ claim that they followed their instructor’s recommendation at a 5% level of significance and interpret your findings. (4 points)
H0: μ ≤ 39 and H1: μ > 39
Rule: Reject H0 if the estimated t > t9,0.05 = 1.833
\( t = \frac{\bar X - \mu_0}{s/\sqrt{n}} = \frac{41 - 39}{3.4319/\sqrt{10}} = \frac{2}{1.0853} = 1.843 > 1.833 \)
We reject the null hypothesis. Therefore, we conclude at the 5% level of significance that the students have spent more than the recommended amount of time studying statistics.
c. Estimate and interpret the p-value of the calculated test statistic in (b) above. (4 points)
P-value = P(t9 > 1.843) ≈ 0.049. The p-value is the smallest significance level at which we could reject the null hypothesis: about 4.9%, only just below the 5% used in (b) above. This implies that our evidence that the students have spent more than the recommended study time is not very strong.
d. If the same sample results had been obtained from a random sample of 16 students, could the students’ claim be accepted at a lower level of significance than in part (c)? (4 points)
\( t = \frac{\bar X - \mu_0}{s/\sqrt{n}} = \frac{41 - 39}{3.4319/\sqrt{16}} = \frac{2}{0.8578} = 2.331 \)
Yes. If the same result had come from a sample of 16 students, the estimated t-ratio would be 2.331, which corresponds to a p-value of about 1.7% (with 15 degrees of freedom). This means that we could reject the null hypothesis at about the 1.7% level of significance instead of the 5% used in (b) or the 4.9% found in (c) above. In such a case, the conclusion of the test would be much stronger.
e. While dealing with 16 students, suppose that the alternative hypothesis had been one-sided in the other direction, set as Ha: μ < 39. Make a graph to visualize the problem and state, without doing the calculations, whether the p-value of the test (the level of significance needed to reject the null hypothesis) would be higher than, lower than, or the same as found in (d). (5 points)
H0: μ = 39
H1: μ < 39
Reject H0 if the estimated t < −1.753
[Graph: t-distribution with the p-value area marked]
The alternative hypothesis is on the left (negative) side of the t-distribution, which means that the rejection area is also on that side. On the other hand, our present sample mean lies on the right (positive) side of the distribution of sample means. Therefore, the p-value of our present mean must be greater than 50%.
f. Using the original data, test at the 5% significance level that the population standard deviation of studying times is greater than 4 hours. (4 points)
H0: σ² ≤ 16 or σ ≤ 4
H1: σ² > 16 or σ > 4
Reject H0 if the estimated χ² > χ²9,0.05 = 16.92
\( \chi^2 = \frac{(n-1)\,s^2}{\sigma_0^2} = \frac{(10-1) \times 3.4319^2}{4^2} = \frac{9 \times 11.778}{16} = 6.625 \)
Since 6.625 < 16.92, we cannot reject H0 at the 5% level of significance. There is not enough evidence to conclude that the population standard deviation σ of studying time is greater than 4 hours.
g. Using the original data, estimate the population standard deviation with 90% confidence. (4 points)
This question is about building a confidence interval for the population variance and subsequently the population standard deviation.
The two limits will be set with
Lower limit = \( \frac{(n-1)\,s^2}{\chi^2_{n-1,\,\alpha/2}} \) and Upper limit = \( \frac{(n-1)\,s^2}{\chi^2_{n-1,\,1-\alpha/2}} \), that is,
\( \frac{(n-1)\,s^2}{\chi^2_{n-1,\,\alpha/2}} < \sigma^2 < \frac{(n-1)\,s^2}{\chi^2_{n-1,\,1-\alpha/2}} \)
for \( \chi^2_{9,\,0.95} = 3.33 \) and \( \chi^2_{9,\,0.05} = 16.92 \):
Lower limit = \( \frac{(10-1) \times 3.432^2}{16.92} = \frac{9 \times 11.778}{16.92} = 6.265 \)
Upper limit = \( \frac{(10-1) \times 3.432^2}{3.33} = \frac{9 \times 11.778}{3.33} = 31.832 \)
Therefore, 6.265 < σ² < 31.832 and 2.503 < σ < 5.642 hours

d) Interpretive question between part a) and b) (means are equal and variances are equal)

12. a) Given certain information, find sample size
Sample Size Determination
Margin of Error
The required sample size can be found to reach a desired margin of error (ME) with a specified level of confidence (1 − α)
The margin of error is also called sampling error:
the amount of imprecision in the estimate of the population parameter
the amount added and subtracted to the point estimate to form the confidence interval
To determine the required sample size for the mean, you must know:
The desired level of confidence (1 − α), which determines the \( z_{\alpha/2} \) value
The acceptable margin of error (sampling error), ME
The standard deviation, σ
The sample and population proportions, \( \hat p \) and P, are generally not known (since no sample has been taken yet)
P(1 − P) = 0.25 generates the largest possible margin of error (so guarantees that the resulting sample size will meet the desired level of confidence)
To determine the required sample size for the proportion, you must know:
The desired level of confidence (1 − α), which determines the critical \( z_{\alpha/2} \) value
The acceptable sampling error (margin of error), ME
Estimate P(1 − P) = 0.25
Example: How large a sample would be necessary to estimate the true proportion defective in a large population within ±3%, with 95% confidence?
With \( z_{\alpha/2} = 1.96 \), ME = 0.03 and P(1 − P) = 0.25: \( n = \frac{0.25\, z_{\alpha/2}^2}{ME^2} = \frac{(0.25)(1.96)^2}{(0.03)^2} = 1067.11 \), so use n = 1068.

b) Suppose we change some of the parameters given in a), will the sample size get bigger or smaller, why?
c) When you calculate the sample size you may be given the population variance or the sample variance. Which one would you choose to calculate the sample size, and why?
Population variance, since the sample size determination for the mean requires the population standard deviation σ.

d) The fundamental difference between population variance and sample variance

Confidence Intervals for the Population Variance
Hypothesis Tests of one Population Variance

13. a) Matched pairs, confidence interval between the difference of the means

Tests Means of 2 Related Populations
- Paired or matched samples
- Repeated measures (before/after)
- Use difference between paired values: $d_i = x_i - y_i$

The test statistic for the mean difference is a t value, with n - 1 degrees of freedom:

$t = \frac{\bar{d} - D_0}{s_d / \sqrt{n}}$

b) What does that confidence mean?

If P(a < θ < b) = 1 - α, then the interval from a to b is called a 100(1 - α)% confidence interval of θ.
The quantity (1 - α) is called the confidence level of the interval (α between 0 and 1).
In repeated samples of the population, the true value of the parameter θ would be contained in 100(1 - α)% of intervals calculated this way.
The confidence interval calculated in this manner is written as a < θ < b with 100(1 - α)% confidence.

c) In what assumptions is the confidence interval built?

Point Estimate ± (Reliability Factor)(Standard Error)

There are two basic assumptions that must be true for a confidence interval to "work." First, the data used to develop the statistic for which the confidence interval is created must be mostly "symmetric," meaning they can't have too much skew. Second, there must be enough data used to develop the statistic for which the confidence interval is created.

How much uncertainty is associated with a point estimate of a population parameter?
An interval estimate provides more information about a population characteristic than does a point estimate. Such interval estimates are called confidence intervals.

An interval gives a range of values:
- Takes into consideration variation in sample statistics from sample to sample
- Based on observation from 1 sample
- Gives information about closeness to unknown population parameters
- Stated in terms of level of confidence
- Can never be 100% confident

d) Test between the 2 means and how that'd linked up with the confidence interval?

The tests were described before; here are the corresponding confidence intervals.

Matched pairs:
$\bar{d} - t_{n-1,\alpha/2} \frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\alpha/2} \frac{s_d}{\sqrt{n}}$

Independent samples, known population variances:
$(\bar{x} - \bar{y}) - z_{\alpha/2} \sqrt{\frac{\sigma_X^2}{n_x} + \frac{\sigma_Y^2}{n_y}} < \mu_X - \mu_Y < (\bar{x} - \bar{y}) + z_{\alpha/2} \sqrt{\frac{\sigma_X^2}{n_x} + \frac{\sigma_Y^2}{n_y}}$

Independent samples, unknown population variances assumed equal:
$(\bar{x} - \bar{y}) - t_{n_x+n_y-2,\alpha/2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} < \mu_X - \mu_Y < (\bar{x} - \bar{y}) + t_{n_x+n_y-2,\alpha/2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$

where the pooled variance is $s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$.

The link can be found from graphs: a two-sided test at significance level α rejects H0 exactly when the hypothesized value of the difference falls outside the 100(1 - α)% confidence interval.

14. a) Given data, calculate the sample mean and sample variance
b) Confidence interval for the mean
c) Confidence interval for the variance
d) Explain why normality is an important assumption

Example: The amount of time spent (in minutes) for the completion of the 3rd assignment in Statistics by a random sample of 10 students gave the following results: 215, 182, 193, 208, 210, 176, 197, 188, 218, 213. (20 points)

a. Calculate the sample mean and sample standard deviation. [The necessary calculations of the sample mean and sample variance for the confidence interval must be done on Excel and pasted below.] (4 points)

$\bar{X} = \frac{\sum X_i}{n} = \frac{2000}{10} = 200$

$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{1984}{10-1} = 220.44$, so $s = 14.85$

b. Find a 95% confidence interval for the population mean time spent by students on the 3rd Assignment. (4 pts)

We must assume a normally distributed population. Thus, in the place of the unknown population variance, we use the estimated sample variance and follow the t-distribution for the limits of the confidence interval. For 10 - 1 = 9 degrees of freedom and a 95% confidence interval, the t-cutoff points are ±2.262.
$\bar{X} - t_{n-1,\alpha/2} \frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,\alpha/2} \frac{s}{\sqrt{n}}$

or $200 \pm 2.262 \times \frac{14.85}{\sqrt{10}}$
or $200 \pm 2.262 \times (14.85 / 3.1623)$
or $200 \pm 10.62$
or 189.38 < μ < 210.62

The mean time spent for the 3rd Assignment by all students should be anywhere between 189.38 and 210.62 minutes with 95% confidence.

c. Find a 90% confidence interval for the population mean time spent by students on the 3rd Assignment. (4 pts)

$200 \pm 1.833 \times \frac{14.85}{\sqrt{10}}$
or $200 \pm 1.833 \times (14.85 / 3.1623)$
or $200 \pm 8.61$
or 191.39 < μ < 208.61

The mean time spent for the 3rd Assignment by all students should be anywhere between 191.39 and 208.61 minutes with 90% confidence.

d. Compare the two findings and offer an explanation to this effect. (3 points)

The confidence interval in (c) above is narrower than that in (b) since the t-score is smaller for a 90% confidence interval (1.833) than for a 95% confidence interval (2.262).

e. Find a 90% confidence interval for the population variance of time spent on the 3rd Assignment. (5 points)

n = 10, s² = 220.44, $\chi^2_{9,0.05}$ = 16.92, $\chi^2_{9,0.95}$ = 3.33

$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$

or $\frac{(10-1) \times 220.44}{16.92} < \sigma^2 < \frac{(10-1) \times 220.44}{3.33}$
or 117.255 < σ² < 595.784
and 10.828 < σ < 24.409

The variance of the time spent on the 3rd Assignment by all students should be anywhere between 117.3 and 595.8 (minutes squared) with 90% confidence.
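
The calculations in this example can be checked with a short script. The following is a minimal sketch in Python using only the standard library; it is not part of the original solution. The critical values (2.262, 1.833, 16.92, 3.33) are copied from the worked answers above rather than computed, and the helper names `mean_ci` and `variance_ci` are hypothetical names chosen for illustration.

```python
import math
import statistics

# Completion times (minutes) for the 10 sampled students, from the example.
times = [215, 182, 193, 208, 210, 176, 197, 188, 218, 213]

n = len(times)
mean = statistics.mean(times)      # sample mean
var = statistics.variance(times)   # sample variance (divides by n - 1)
s = math.sqrt(var)                 # sample standard deviation

# Critical values for 9 degrees of freedom, taken from the worked
# solution above (table lookups, not computed here).
T_9_0025 = 2.262     # t cutoff for a 95% interval
T_9_005 = 1.833      # t cutoff for a 90% interval
CHI2_9_005 = 16.92   # upper 5% point of chi-square
CHI2_9_095 = 3.33    # lower 5% point of chi-square


def mean_ci(mean, s, n, t_crit):
    """Return (lower, upper) limits: mean +/- t * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


def variance_ci(var, n, chi2_upper, chi2_lower):
    """Return (lower, upper) limits for the population variance."""
    return (n - 1) * var / chi2_upper, (n - 1) * var / chi2_lower


ci95 = mean_ci(mean, s, n, T_9_0025)
ci90 = mean_ci(mean, s, n, T_9_005)
var_ci90 = variance_ci(var, n, CHI2_9_005, CHI2_9_095)
sd_ci90 = tuple(math.sqrt(limit) for limit in var_ci90)

print(f"mean = {mean}, s^2 = {var:.2f}, s = {s:.2f}")
print(f"95% CI for mean: {ci95[0]:.2f} to {ci95[1]:.2f}")
print(f"90% CI for mean: {ci90[0]:.2f} to {ci90[1]:.2f}")
print(f"90% CI for variance: {var_ci90[0]:.2f} to {var_ci90[1]:.2f}")
print(f"90% CI for std dev: {sd_ci90[0]:.2f} to {sd_ci90[1]:.2f}")
```

Because the script keeps the unrounded variance (1984/9) instead of the rounded 220.44, the variance limits may differ from the hand calculation in the second decimal place. The limits for the mean agree with the hand calculation to two decimals.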