STAT 3011 Fall 2018 Exam 1 (A)
Time Limit: 120 Minutes
Name (Print): SOLUTION
Student ID:

Instructions:
• Do not begin or turn this page until you are instructed.
• Enter all requested information on the top and bottom of this page, and put your initials on the top of every page, in case the pages become separated.
• This exam contains 17 pages (including this cover page and the multiple choice answer sheet). Check to see if any pages are missing. There are 14 multiple choice problems and 4 short answer problems with multiple parts.
• The exam is closed book. You may not use your books or any wireless device on this exam.
• You may use a calculator and one sheet of paper (size A4 or 8.5” by 11”) with formulas or other notes on both sides. You may not share calculators or notes!
• Show all your work on each problem for full credit, except on multiple choice problems. The following rules apply:
  – Organize your work, in a reasonably neat and coherent way, in the space provided. Work scattered all over the page without a clear ordering will receive very little credit.
  – Mysterious or unsupported answers will not receive full credit for short answer problems. A correct answer unsupported by calculations, explanation, or algebraic work will not receive full credit; an incorrect answer supported by substantially correct calculations and explanation may still receive partial credit.
  – If you need more space, use the back of the pages; clearly indicate when you have done this.

Honesty Statement and Pledge: I have not given or received any aid or assistance to or from any other student in this course during the exam period. Everything I have written on this exam represents my own work and knowledge. I sign this knowing that infringements on the University’s Academic Honesty policy may result in failure or expulsion.
Signed By:
Date:

Problem I.
(40 points) Multiple Choice
Choose the ONLY ONE correct answer for each question. Circle your answers to all questions in the answer sheet provided. (NO explanation is needed.)

1. (1 point) Did you remember to put your name on the front page?
*** (A) No, but I will now.
*** (B) Yes.
*** (C) Yes.
*** (D) Yes.

2. (3 points) For which of the following variables can you expect its histogram to be skewed to the right?
(A) The scores of students (out of 100 points) on a very easy exam in which most score perfectly or nearly so, but only a few unprepared students score poorly.
*** (B) Income of all adults in the United States, where median household income is about $60,000 and average household income is about $117,000.
(C) Heights of all female adults in Minnesota measured in centimeters.
(D) All of the above.

3. (3 points) In a survey of 1300 American high school students, 32% of the respondents reported that someone had bullied them in school. Which term best describes the value 32%?
(A) Population
(B) Parameter
*** (C) Statistic
(D) Sample
32% is a value computed from a sample (the 1300 high school students observed), so it is a statistic.

4. (3 points) Suppose the weights of newborn baby girls have a distribution with mean of 7.7 pounds and standard deviation 1.2 pounds. Courtney is a newborn baby girl. Her weight has a z-score of 0.45. Which of the following is the best interpretation of her z-score?
*** (A) Courtney’s weight is 0.45 standard deviations above average.
(B) Courtney’s weight is 0.45 pounds above average.
(C) About 45% of newborn baby girls weigh less than Courtney.
(D) About 45% of newborn baby girls weigh more than Courtney.

5. (3 points) Histograms of X1, X2 and X3 are given below. It is known that they have the same mean. Compare their standard deviations, where σX1 represents the standard deviation of X1.
(A) σX1 < σX2 < σX3
(B) σX2 < σX1 < σX3
(C) σX3 < σX1 = σX2
*** (D) σX3 < σX2 < σX1
Standard deviation measures the typical (“average”) distance of the values from the center of the distribution.

6. (3 points) A Reuters story (April 2, 2003) reported that “The number of heart attack victims fell by almost 60% at one hospital six months after a smoke-free ordinance went into effect in the area (Helena, Montana), a study showed, reinforcing concerns about second-hand smoke.” The number of hospital admissions for heart attack dropped from just under seven per month to four a month during the six months after the smoking ban. Select the true statement(s).
  i. One can conclude that the smoking ban caused the number of heart attack victims to drop.
  ii. The number of heart attack victims is the response variable of this study.
(A) Both i and ii are true.
(B) i is true and ii is false.
*** (C) i is false and ii is true.
(D) Both i and ii are false.
We can establish a ‘causal relationship’ only from a randomized experiment. This is an observational study, so statement i is false.

7. (3 points) You plan to purchase dental insurance for your three remaining years in school. The insurance makes a one-time payment of $1,000 in case of a major dental repair (such as an implant) or $100 in case of a minor repair (such as a cavity). If you don’t need dental repair over the next 3 years, the insurance expires and you receive no payout. You estimate the chance of requiring a major repair over the next 3 years at 5%, a minor repair at 60% and no repair at 35%. Let X = payout of dental insurance. Which of the following statements is true?
(A) Average payout is (1000 + 100 + 0)/3 = $366.67.
(B) X is normally distributed.
(C) The probability you get at least $100 payout is 0.6.
*** (D) None of the above.
(A) Average payout E(X) = 1000(0.05) + 100(0.6) + 0(0.35) = 110
(B) X is discrete with only three possible values, so it is not normally distributed.
(C) P(X ≥ 100) = 0.05 + 0.6 = 0.65

8.
(3 points) The distribution of adult male weights is bell-shaped with mean 80 kilograms. If approximately 95% of the weights are between 70 kilograms and 90 kilograms, what is the variance of this distribution?
(A) 5 kilograms
(B) 10 kilograms²
*** (C) 25 kilograms²
(D) 40 kilograms²
By the 68-95-99.7 rule, about 95% of the weights lie within 2 standard deviations of the mean, so 2σ = 10 and the standard deviation is σ = 5. Variance σ² = 25.

9. (3 points) Suppose 40% of Minneapolis citizens support Candidate B. Now assume in a sample of 100 people, 55 people responded that they support Candidate B. What is the z-score of this statistic? (Hint: What is the sampling distribution of the sample proportion?)
*** (A) 3.06
(B) 1.02
(C) -1.02
(D) -3.06
P̂ is approximately N(0.4, √((0.4)(0.6)/100)).
Z-score of p̂ = 55/100: z = (0.55 − 0.4) / √((0.4)(0.6)/100) = 3.06

10. (3 points) Which of the following statements is not true about the sampling distribution of a sample proportion?
(A) The mean does not change as sample size increases.
*** (B) As sample size increases, standard deviation also increases.
(C) If np and n(1 − p) are both at least 15, then the distribution is approximately normal.
(D) All of the above are not true.

11. (3 points) Let x̄ denote the sample mean from a sample of size n = 9 from a population with mean µ = 50 and variance σ² = 9. Which of the following is true regarding the sampling distribution of x̄?
*** (A) x̄ has a mean of 50 and a standard deviation of 1
(B) x̄ is normally distributed with a mean of 50 and a standard deviation of 1
(C) Both A and B
(D) Neither A nor B

12. (3 points) Suppose that the probability of event A is 0.2 and the probability of event B is 0.4. Suppose that the two events are independent. Then P(A|B) is:
*** (A) P(A) = 0.2
(B) P(A)/P(B) = 0.5
(C) P(A) × P(B) = 0.08
(D) None of the above.

13. (3 points) At a high school with 200 students, 32 play soccer, 18 play basketball, and 8 play both sports.
If a student is selected at random, find the probability that the student plays {neither soccer nor basketball}.
*** (A) 79/100 = 0.79
(B) 1/4 = 0.25
(C) 21/25 = 0.84
(D) 4/5 = 0.8
P(Soccer ∪ Basketball) = (32 + 18 − 8)/200 = 0.21
Hence P(neither soccer nor basketball) = 1 − 0.21 = 0.79

14. (3 points) In probability theory, events which can never occur together are classified as
*** (A) Disjoint events
(B) independent events
(C) dependent events
(D) None of the above.

Problem II. (15 points) Be sure to show all work for full credit.
Dataset Three Cars contains prices (in thousands of dollars) of used cars for sale at an internet website. There are three types of car makers: BMW, Jaguar, and Porsche. Use the boxplot below and the R output to answer the questions.

1. (4 points) Which car type is the most expensive overall? Give the name of the statistic your answer is based on.
Porsche, because it has the highest median.

2. (4 points) Based on the R output given below,

  > summary(Price[CarType=="BMW"])
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    12.00   24.18   27.90   30.23   32.85   70.50
  > summary(Price[CarType=="Jaguar"])
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    12.90   19.90   25.45   31.96   46.58   70.00
  > summary(Price[CarType=="Porsche"])
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    16.00   40.65   51.90   50.54   58.65   83.00

(a) which used car type has the largest variability in price based on IQR? (b) what is its IQR?
Jaguar has the largest IQR. IQR = 46.58 − 19.90 = 26.68 (or $26,680)

3. (3 points) (Multiple choice question) Which of the following is most likely to be the correct standard deviation (in thousands of dollars) of the price of Porsche? Briefly explain why the other numbers can’t be the answer.
A. -10
B. 0
C. 15
D. 60
C. 15 (in $1,000s), or $15,000
A: Standard deviation can’t be negative.
B: Since the used car prices vary, the sample standard deviation of Porsche prices can’t be 0.
D: The range of Porsche prices is only slightly over 60, so the standard deviation can’t be 60 (too big).

4. (4 points) The R output below shows information on all cars for sale on the website. What percentage of used cars cost more than $60,000?

  > stem(Price)

    The decimal point is 1 digit(s) to the right of the |

    1 | 23334445678
    2 | 000002233334455566777778889
    3 | 000113445666678
    4 | 00035567778
    5 | 00002233455789
    6 | 5555789
    7 | 0013
    8 | 3

  > table(CarType)
  CarType
      BMW  Jaguar Porsche
       30      30      30

The stems 6, 7 and 8 hold 7 + 4 + 1 = 12 of the 90 cars, so 12/90 × 100% = about 13% of used cars are more expensive than $60,000.

Problem III. (20 points) Be sure to show all work for full credit.
For Questions 1 - 4: A medical practice of 5 doctors tracks whether their patients get a flu shot so they can better understand which demographic group to target the flu shot message to. Within a given day, each of the 5 doctors sees 40 patients, for a total of 200 patients. The office staff examined the medical records of the 200 patients on a given day and noted whether they received a flu shot (or said they received a flu shot) and their age. They reported the data below:

                           Received a flu shot?
  Age                        Yes      No     Total
  6 months - 18 years         35      15        50
  19 years - 39 years         20      20        40
  40 years - 64 years         10       5        15
  65 years and above          75      20        95
  Total                      140      60       200

1. (2 points) What is the probability that a randomly selected patient did not receive a flu shot?
P(did not get flu shot) = 60/200 = 0.30

2. (4 points) What is the probability that a randomly selected patient {is between 19 years - 39 years or received a flu shot}?
P(19-39 years old ∪ flu shot) = (40 + 140 − 20)/200 = 160/200 = 0.8

3.
(4 points) What is the probability that a randomly selected person {is (40 years or above) and (did not receive a flu shot)}?
P(40 and above ∩ no flu shot) = (5 + 20)/200 = 25/200 = 0.125

4. (3 points) (a) Which age group (6 months - 18 years, 19 years - 39 years, 40 years - 64 years, or 65 years and above) is least likely to get a flu shot? (b) What proportion of this age group did NOT get a flu shot?
19 years - 39 years. 50% (20 of 40) of patients aged 19 - 39 years did not receive a flu shot.

For Questions 5 and 6: do NOT use any information from the table above.
Let A be the event where a randomly selected person received a flu shot and B be the event where a randomly selected person caught the flu. It is known that (i) 70% of the general population received a flu shot and (ii) 4% of the general population had the flu. It is also known that among those who received flu shots, only (iii) 1% of them had the flu.

5. (4 points) Based on the information above, which of the three probabilities ((i) 70%, (ii) 4%, (iii) 1%) refers to a conditional probability? Use the correct notation to represent the conditional probability (e.g. P(A|B) = 0.5).
(iii): P(B|A) = 0.01

6. (3 points) Based on the information, what is the probability a randomly selected person {has received a flu shot and did not have the flu}? Show your work.
A: received a flu shot and B: caught the flu.
P(A ∩ B^c) = P(A) P(B^c|A) = P(A)(1 − P(B|A)) = 0.7 × (1 − 0.01) = 0.693
OR
P(A ∩ B^c) = P(A) − P(A ∩ B), where P(A ∩ B) = P(A) P(B|A) = 0.7 × 0.01 = 0.007. Hence P(A ∩ B^c) = P(A) − P(A ∩ B) = 0.7 − 0.007 = 0.693
OR
P(A ∩ B^c) = P(A ∪ B) − P(B) = (0.7 + 0.04 − 0.007) − 0.04 = 0.733 − 0.04 = 0.693

Problem IV. (12 points) Be sure to show all work for full credit.
Let X be a random variable denoting the height (in inches) of a University of Minnesota student and let us assume X is normally distributed with a mean of 68 inches and a standard deviation of 6 inches.

1. (3 points) What is the probability that a randomly selected student is taller than 62 inches and shorter than 80 inches? Write this as a probability statement and calculate. (Hint: Use XX-XX-XX.X Rule)
P(62 < X < 80) = P(62 < X < 74) + P(74 < X < 80) = 0.68 + (0.95 − 0.68)/2 = 0.815
or
P(62 < X < 80) = P(62 < X < 68) + P(68 < X < 80) = 0.68/2 + 0.95/2 = 0.815
by the 68-95-99.7% rule

2. (4 points) What height must a student exceed in order to be in the tallest 10% of all students at the University of Minnesota? You should use one of the following R outputs to help you answer the question.

  > pnorm(0.1)
  [1] 0.54
  > qnorm(0.1)
  [1] -1.28
  > pnorm(0.9)
  [1] 0.82
  > qnorm(0.9)
  [1] 1.28

We need to find x such that P(X > x) = 0.1. Equivalently, we can find x such that P(X < x) = 0.9. We do this by finding the z-score corresponding to the 90th percentile, z = 1.28, which is given by the final line of R output. Then we know that x = µ + z × σ = 68 + 1.28 × 6 = 75.68 inches.

3. (2 points) What is the probability of a randomly selected student being exactly 6 feet tall? (Note: 1 foot = 12 inches)
P(X = 72) = 0 since X is a continuous random variable.

4. (3 points) What is the probability that the average height of 100 randomly selected students is greater than 68.3 inches? You should use one of the following R outputs to help you answer the question.

  > pnorm(0.5)
  [1] 0.69
  > pnorm(2)
  [1] 0.98
  > pnorm(5)
  [1] 0.99
  > pnorm(0.05)
  [1] 0.52

X̄ ∼ N(68, 6/√100)
P(X̄ > 68.3) = P((X̄ − 68)/(6/√100) > (68.3 − 68)/(6/√100)) = P(Z > 0.5) = 1 − P(Z < 0.5) = 1 − 0.69 = 0.31

Problem V. (13 points) Be sure to show all work for full credit.

1.
(4 points) Let Y be a discrete random variable with the following probability distribution. We can show that the expected value of random variable Y (denoted E(Y)) is equal to 2.1. Calculate the standard deviation.

  y             0      1      2      3      4
  P(Y = y)    0.2    0.1    0.3    0.2    0.2

σ² = Var(Y) = 0.2 × (0 − 2.1)² + 0.1 × (1 − 2.1)² + 0.3 × (2 − 2.1)² + 0.2 × (3 − 2.1)² + 0.2 × (4 − 2.1)² = 1.89.
Therefore, the standard deviation is equal to σ = √1.89 ≈ 1.37.

2. (6 points) Let X be a discrete random variable with the following probability distribution (specified below). Suppose we generate a sample from random variable X with sample size n = 2 (we will assume x1 and x2 are independent). (a) List all possible samples and (b) fill in the remaining entries in the sampling distribution of sample mean table for x̄.

  x             0      2      4
  P(X = x)    0.5    0.2    0.3

  x̄              0      1      2      3      4
  Probability   ____    0.2   ____   ____   ____

Possible samples: {(0, 0), (0, 2), (2, 0), (4, 0), (0, 4), (2, 2), (2, 4), (4, 2), (4, 4)}
From the sample (0, 0), P(X̄ = 0) = 0.5 × 0.5 = 0.25.
From the samples (0, 2) and (2, 0), P(X̄ = 1) = (0.5)(0.2) + (0.2)(0.5) = 0.2, etc.
Probabilities for the table (x̄ = 0, 1, 2, 3, 4): {0.25, 0.2, 0.34, 0.12, 0.09}

3. (3 points) Assume the same probability distribution of X as in question 2 (the distribution of X is copied below). Suppose instead the sample generated has a sample size n = 100. Write down the (approximate) sampling distribution of X̄ using statistical notation. Be sure to specify the mean and standard deviation of this distribution.

  x             0      2      4
  P(X = x)    0.5    0.2    0.3

µ = E(X) = 0 × 0.5 + 2 × 0.2 + 4 × 0.3 = 1.6
σ² = Var(X) = 0.5 × (0 − 1.6)² + 0.2 × (2 − 1.6)² + 0.3 × (4 − 1.6)² = 3.04
Therefore, by the CLT, X̄ is approximately N(1.6, √(3.04/100)), so that 1.6 is the mean and √(3.04/100) ≈ 0.174 is the standard deviation.
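
The three calculations in Problem V can be checked numerically. The following is a minimal Python sketch and is not part of the original exam, which uses R. The helper names mean_and_variance and sampling_distribution_of_mean are hypothetical, introduced only for illustration. It assumes the draws are independent, as the problem states, and enumerates every ordered sample.

```python
from collections import defaultdict
from itertools import product
from math import sqrt

# Distributions copied from Problem V (question 1 for Y, questions 2-3 for X).
pmf_y = {0: 0.2, 1: 0.1, 2: 0.3, 3: 0.2, 4: 0.2}
pmf_x = {0: 0.5, 2: 0.2, 4: 0.3}


def mean_and_variance(pmf):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum(p * (x - mu) ** 2 for x, p in pmf.items())
    return mu, var


def sampling_distribution_of_mean(pmf, n):
    """Exact distribution of the sample mean for n independent draws.

    Every ordered sample is enumerated; its probability is the product of
    the individual probabilities (this is where independence is assumed).
    """
    dist = defaultdict(float)
    for sample in product(pmf, repeat=n):
        prob = 1.0
        for x in sample:
            prob *= pmf[x]
        dist[sum(sample) / n] += prob
    return dict(sorted(dist.items()))


# Question 1: standard deviation of Y.
mu_y, var_y = mean_and_variance(pmf_y)
sd_y = sqrt(var_y)

# Question 2: sampling distribution of the sample mean for n = 2.
xbar_dist = sampling_distribution_of_mean(pmf_x, 2)

# Question 3: mean and standard deviation of the sample mean for n = 100.
mu_x, var_x = mean_and_variance(pmf_x)
sd_xbar_100 = sqrt(var_x / 100)

print("E(Y), Var(Y), SD(Y):", round(mu_y, 2), round(var_y, 2), round(sd_y, 2))
print("P(x-bar) for n = 2:", {k: round(v, 2) for k, v in xbar_dist.items()})
print("mean and SD of x-bar for n = 100:", round(mu_x, 2), round(sd_xbar_100, 3))
```

Under these assumptions the script agrees with the solutions above: Var(Y) = 1.89 with σ ≈ 1.37, probabilities 0.25, 0.2, 0.34, 0.12, 0.09 for x̄ = 0, 1, 2, 3, 4, and a standard deviation of about 0.174 for X̄ when n = 100. Exact enumeration is only practical for small n. For n = 100 the sketch uses the formula σ/√n rather than listing samples.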
STAT 3011 Exam 1 - Multiple Choice Answer Sheet
Fall 2018
Name:
Lecture Section:

Lecture time: (Circle One)
  001    9:05 am    Galloway
  005    8:00 am    Adepoju
  009   12:20 pm    Park
  013   10:10 am    Park
  017    5:10 pm    Nilakanta

  Question    Answer
   1          A  B  C  D
   2          A  B  C  D
   3          A  B  C  D
   4          A  B  C  D
   5          A  B  C  D
   6          A  B  C  D
   7          A  B  C  D
   8          A  B  C  D
   9          A  B  C  D
  10          A  B  C  D
  11          A  B  C  D
  12          A  B  C  D
  13          A  B  C  D
  14          A  B  C  D

Please do NOT write in the following table. This is for grading purposes only!

  Question    I (40)    II (15)    III (20)    IV (12)    V (13)    100
  Score