Math 140 Review Chapter 1-3 KEY

Make sure you write your answers in complete sentences and in context when applicable.

1. We collect these data from 50 male students. Which variable is categorical (C) and which is quantitative (Q)?
a. eye color: C
b. head circumference: Q
c. marital status: C
d. number of cigarettes smoked daily: Q
e. number of TV sets at home: Q
f. temperatures in Southern California for the past year: Q
g. weather conditions in Southern California in the past year: C

2. A survey asked 200 people if they thought women in the armed forces should be permitted to participate in combat. The following table summarizes the responses.

          Yes    No    Total
Male       72    28     100
Female      8    92     100
Total      80   120     200

a) What percent of the females answered yes? 8/100 = 8% of the females answered yes.
b) What percent of the males answered yes? 72/100 = 72% of the males answered yes.
c) Does there appear to be a difference in gender regarding opinion on whether women should be permitted to participate in combat? Explain. Yes, a greater percentage of men (72%) than women (8%) thought women should be permitted to participate in combat.

3. A survey of an introductory statistics class in the Fall of 2003 asked students whether or not they ate breakfast the morning of the survey. Results are as follows:

          Breakfast
Sex       Yes    No    Total
Male       66    67     133
Female    125    74     199
Total     191   141     332

a. Was this a categorical or quantitative study? Categorical (gender and whether they ate breakfast).
b. What is the variable (variables)? Gender and whether they ate breakfast.
c. What percent of females ate breakfast? 125/199 = 62.8% of the females ate breakfast.
d. What percent of males ate breakfast? 66/133 = 49.6% of the males ate breakfast.
e. Does there appear to be a difference in gender regarding eating breakfast? Explain. Yes, there is a difference of about 13 percentage points (62.8% versus 49.6%). More females tended to eat breakfast compared to males. However, the question is: is this a significant difference?
A 13 percentage point difference is not that significant in this context.

4. The population in 2000 of the U.S. was about 282,000,000. Of these 282,000,000 people, 34992 were of age 65 or older. What percent of the U.S. population was considered elderly (65 or older) in 2000? 34992/282000000 ≈ 0.0001, so about 0.01% of the U.S. population was considered elderly in 2000. (This is less than 1% of the population.)

5. Identify the following research studies as observational or a controlled experiment. Explain why.
a. Data from the Motorcycle Industry Council stated that “Motorcycle owners are getting older and richer.” Data were collected on the ages and incomes of motorcycle owners for the years 1980 and 1998 and then compared. The findings showed considerable differences in the ages and incomes of motorcycle owners for the two years. Observational: no treatment was assigned.
b. A study conducted at Virginia Polytechnic Institute and presented by Psychology Today divided female athletes into two groups and had the students perform as many sit-ups as possible in 90 seconds. The first group was told only to ‘do your best,’ while the second group was told to try to increase the number of sit-ups they did each day by 10%. After 4 days, the first group averaged 43 sit-ups while the second group averaged 56 sit-ups. The conclusion was that athletes who were given specific goals performed better than those who were not given specific goals. Experimental: a treatment was assigned (one group was encouraged to increase its number by 10%).
c. A recent study showed that eating garlic can lower blood pressure. Researchers prescribed garlic pills to high blood pressure patients and monitored their results over a 6 month period. These results were then compared to high blood pressure patients who had received a placebo. The doctors administering the pills were not aware of which patients had received the treatment. Experimental: a treatment was assigned (garlic pills).

6. In problem 5a above, what might be a confounding variable?
Answers may vary. Example: Older people tend to be more economically stable.

7. In problem 5c above, why is it important that the patients and doctors not know who received the treatment? What is the name of this technique? One wants to avoid lurking variables (patients or doctors changing their behavior and therefore influencing the outcome). This is called double blind, since neither the patients nor the doctors were aware of who received the treatment.

8. The dot plot below shows the ages for about 108 people in three community college math classes.
a. Any age 26 and over is considered unusually high for this sample. How many student ages are considered unusual for this sample? 21 students had ages that were unusual for this group.
b. What percent of the sample was this? 21/108 = 19.4% of the students' ages are unusual.

9. Answer the following questions given the distribution of the following exam scores.

[Figure: Histogram of Chapter 3 Exam. x-axis: Chapter 3 Exam (40 to 110); y-axis: Frequency (0 to 16)]

a) How many students took the chapter 3 exam? 37 students.
b) What is the shape of the distribution of exam scores? Roughly symmetric (or slightly skewed right).
c) What was a typical score for this class (center)? Around 75 points.
d) What was the typical spread for this class? Around 65 to 85.
e) How many students got at least an 80 on the exam? 8 + 2 + 1 = 11 students.
f) What percentage of students got at least an 80 on the exam? 11/37 = 29.7% of students got at least an 80 on the exam.
g) How many students scored less than an 80 on the exam? 7 + 4 + 15 = 26 students.
h) What percentage of students scored less than an 80 on the exam? 26/37 = 70.3% of students received less than 80 on the exam.
i) What percentage of students scored below 70 on the exam? 7 + 4 = 11; 11/37 = 29.7% of students scored below 70 on the exam.
j) Approximately what percentage of students scored from 70 to 90 on the exam? 15 + 8 = 23; 23/37 = 62.2% scored from 70 to 90 on the exam.

10. Which is true of the data whose distribution is shown?
I.
The distribution is skewed to the right. T
II. The mean is smaller than the median. F
III. We should summarize with mean and standard deviation. F

11. Answer the following questions given the distribution of salaries of a random company.

[Figure: histogram titled Salary. x-axis: Salary (In U.S. Dollars), 40000 to 100000; y-axis: Relative Frequency (%), 0 to 40]

a) What percentage of employees made a salary of less than $35,000? 25%
b) What percentage of employees made a salary of more than $80,000? 5%
c) 60% of employees made a salary of less than $45,000.
d) How many employees made a salary of less than $35,000? Cannot be determined. The number of employees is not given.

12. All students in the physical education class completed a basketball free-throw shooting event and the highest number of shots made was 32. The next day, the PE teacher realized that he had made a mistake. The best student had actually made 38 shots (not 32). Indicate whether changing the student’s score made each of these summary statistics increase, decrease, or stay about the same:
a) Mean: increase
b) Median: about the same
c) Range: increase
d) IQR: about the same

13. The mean and median scores of a recent Math 075 exam were close to 68%. The instructor decided not to count one score of zero that was from an absent student to get a better representation of the class average and then recalculated the new mean and median.
a) Will the new mean increase, decrease or remain about the same? Explain. Since a very low score was dropped, the new mean will now be higher (class average will go up). The mean is sensitive to outliers.
b) Will the new median increase, decrease or remain about the same? Explain. The new median will be roughly the same since the person with the middle score is roughly in the same position. (The only time it would increase would be if the two middle people had scores that were far apart from each other.) The median is not sensitive to outliers.
c) True or false: The overall range increased.
False. Since the minimum changes from zero to the next lowest score in the class, the overall range will get smaller. The variability will decrease.
d) True or false: The IQR remained about the same. True, the scores at the 25th and 75th percentile are roughly in the same position. Therefore, the IQR will be close to the same amount.

14. The following boxplots compare the ages of all the Oscar Winners from 1970 to 2001. Use this to answer the following questions.

Consider the distributions of ages for Oscar winning actors and actresses.
a. Around 50% of winners were below what age? Actor: 43 Actress: 35
b. 75% of winners were below what age? Actor: 51 Actress: 42
c. 75% of winners were above what age? Actor: 37 Actress: 32
d. 25% of winners were above what age? Actor: 51 Actress: 42

Actor 5 Number Summary: 31, 37.25, 42.5, 50.25, 76
Actress 5 Number Summary: 21, 32, 35, 41.5, 80

a. How many outliers are there for each gender and what are they? Actor: 1, around 75 years old. Actress: 3, around 60, 73, and 80 years old.
b. What are the shapes of the distributions? Actor: right skewed. Actress: right skewed.
c. Did a typical actor or actress win at a younger age? Explain. The typical age for an actor to win an Oscar is around 43 years old, versus around 35 years old for an actress. Therefore, actresses tend to win Oscars at a younger typical age.
d. What are the IQRs for actors and actresses? Interpret these IQRs. Actors IQR = 51 - 37 = 14 years, Actress IQR = 42 - 32 = 10 years. There is more variability in typical ages for men compared to women. The typical age to win an Oscar for women is more consistent.
e. Based on the IQRs, did actors or actresses win at a younger age? Explain. Typical ages to win an Oscar for men are around 37 to 51 years old. Typical ages for women are around 32 to 42. This shows that women tend to win this award at a younger age.
f. Which data set is more consistent and why?
The female group is more consistent since the IQR is smaller. This means that it is easier to predict a typical age for the female group.
g. Did actors or actresses win at a younger age? Utilize percentages from the boxplot of the distributions above to support your answer. Typical ages to win an Oscar for men are around 37 to 51 years old. Typical ages for women are around 32 to 42. This shows that women tend to win this award at a younger age. Half (50%) of the male winners were below 43 years old compared to half of the female winners who were below 35 years old.

15. The following data represent the annual chocolate sales (rounded to nearest billions of dollars) for a sample of seven countries in the world. Round answers to nearest tenths.

2, 5, 7, 2, 5, 3, 18

a. Find the mean for the data. Write the answer in a complete sentence in context. The average annual chocolate sales was 6 billion dollars.
b. Calculate the standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$. Write the answer in a complete sentence in context. s = 5.6 billion dollars. Typical chocolate sales are 6 billion dollars ± 5.6 billion dollars.
c. Using this standard deviation, one could then expect typical annual chocolate sales to be between which two values? Typical annual chocolate sales for these countries were around 0.4 to 11.6 billion dollars.

16. Answer the following questions with a letter I, II, III, or IV. Explain your choice in complete sentences for each question. Histograms can be used more than once and some questions might have more than one answer.
A. Which graph would represent a distribution of the ages of Math 075 students where there is a high percentage of students who recently graduated high school and very few students who are over 50? Explain.
II - Most of the data will be clustered on the lower end (left) and very little data will be on the higher end (right).
B. Name all graphs where the mean would be chosen as the best measure of center. Explain. III - The mean is a good measure of center for symmetric graphs only.
C. Name all graphs where the IQR (interquartile range) would be chosen as the best measure of spread. Explain. I, II, & IV - The IQR is a good measure of spread for non-symmetric graphs since it is not sensitive to outliers.
D. Which graph would represent a distribution for the heights of koala bears? Explain. III - Measurements of species tend to be symmetric. Most fall within a typical range with fewer high and low values.

17. The ten top grossing Pixar Animated movies for the US box office up to June 2010 are shown below, in millions of dollars.

Movie             $Millions
Toy Story            192
A Bug’s Life         163
Toy Story 2          246
Monsters, Inc.       256
Finding Nemo         340
The Incredibles      261
Cars                 244
Ratatouille          206
WALL-E               224
Up                   293

a. Find the median. Sorted, the two middle values are 244 and 246, so the median is 245. A typical Pixar movie made about 245 million dollars.
b. Find the interquartile range (IQR). Taking Q1 and Q3 as the medians of the lower and upper halves, Q1 = 206 and Q3 = 261, so IQR = 261 - 206 = 55 million dollars.
c. Interpret the meaning of the IQR in context. Examples: The typical spread of revenue for Pixar movies was 55 million dollars. This means that the spread between the middle 50% of the revenues was 55 million dollars. Pixar movies typically made from 206 to 261 million dollars.

18. The following graphs show the distributions of the ages in years of Math 075 students in the Fall of 2014.
[Figure: Histogram of Ages. x-axis: Ages; y-axis: Frequency]
[Figure: Dotplot of Ages. Each symbol represents up to 5 observations.]
[Figure: Boxplot of Ages. Note: Age 26 is the first outlier.]

Descriptive Statistics: Ages

Variable   Mean     StDev   Minimum   Q1     Median   Q3     Maximum   IQR
Ages       21.128   6.812   15.0      18.0   19.0     21.0   98.0      3.0

a. Was this a categorical or quantitative study? Quantitative (ages).
b. What is the variable (variables)? Ages.
c. What is the shape of the distribution in the ages? Right skewed.
d. Which measure of typical center is best to use? Mean or Median? Explain. The median would be a better representation of the typical center since the graph is skewed.
e. Which measure of typical spread is best to use? Standard Deviation or IQR? Explain. The IQR would be a better representation of the typical spread since the graph is skewed.
f. What is the typical center? Complete sentence in context. Using the provided descriptive statistics, the median is 19.0. This means that a typical age for a Math 075 student was 19 years old.
g. What is the typical spread? Complete sentence in context. The IQR is given as 3. This means that the spread between the middle 50% of the ages was only 3 years.
h. What ages are considered unusual for this group? Were there any students that were unusually younger or older for this sample? It was given that 26 was the first outlier, so any student 26 and older is considered unusual for this group. There were many outliers in this group (too many to count), the oldest being close to 100.

19. According to the data above for the ages, the mean was 21.1 years with a standard deviation of 6.8 years. The following question is to practice standard deviation. In reality, since the graph was skewed, these values are not a good representation of what was typical for this group. The median and IQR would be used instead. But for practice:
a.
What is the range of ages from one standard deviation below the mean to one standard deviation above the mean? 14.3 to 27.9 years old. (Typical ages)
b. What is the range of ages from two standard deviations below the mean to two standard deviations above the mean? 7.5 to 34.7 years old. (Anyone over 34.7 years old is unusual for this group.)
c. What is the range of ages from three standard deviations below the mean to three standard deviations above the mean? 0.7 to 41.5 years old. (Anyone over 41.5 years old was extremely unusual for this group.)
d. Is the age of 25 years more than one standard deviation above the mean? Show by converting to a z score using the formula $z = \frac{x - \bar{x}}{s}$. $z = \frac{25 - 21.1}{6.8} \approx 0.6$. No, it is not more than one standard deviation above the mean. The z score is 0.6. This means a 25 year old was typical for this group.
e. There was a 98 year old student, which is unusual. How unusual is she, highly unusual (z-score above 2) or extremely unusual (z-score above 3)? $z = \frac{98 - 21.1}{6.8} \approx 11.3$. She was extremely unusual, with a z score of 11.3! This is much higher than a z score of 3, meaning this is extremely rare and highly unlikely to happen again.

20. A dietitian is interested in comparing the sodium content of real cheese with the sodium content of a cheese substitute in milligrams and asks you (the statistician) to provide data that supports her belief that cheese substitutes typically contain more sodium. You collect the sodium content of several real cheeses and cheese substitutes. Using computer technology, you provide the following box plots and sample statistics. Using the following statistics and graphs, decide whether the dietitian’s belief is correct. Support your decision with the statistics provided. (Include discussion of the shapes, any outliers and the best measures of center and spread to support your decision).
                    N   Mean       SD         Minimum   Q1         Median   Q3         Maximum
real cheese         8   193.1 mg   133.2 mg   40 mg     56.3 mg    200 mg   292.5 mg   420 mg
cheese substitute   8   253.8 mg   68.6 mg    130 mg    197.5 mg   265 mg   305 mg     340 mg

[Figure: Boxplot of real cheese and cheese substitute. y-axis: Data (0 to 400)]

Answers will vary, but the conclusion should be that: The typical sodium content for real cheese was between 56.3 and 292.5 mg. (Half of the samples fell within this range.) The typical sodium content for cheese substitute was between 197.5 and 305 mg. (Half of the samples fell within this range.) Although there was more variability in the real cheese, a typical sample had lower sodium in general. Note, however, that the maximum typical value of sodium was about the same for both. (Even though real cheese is the better choice, we can note that the upper 25% of the real cheese samples were higher in sodium content compared to the substitute cheese, due to real cheese having more variability.)

21. In the real cheese/cheese substitute boxplots, which type had more variability? (Using the descriptive statistics) The typical spread (Q3 - Q1) for real cheese was 236.2 mg. The typical spread for cheese substitute was 107.5 mg. Thus the real cheese had more variability. The cheese substitute was more consistent.

22. The mean for each pair of graphs is given just above each histogram. For each pair of graphs presented below indicate whether one of the graphs has a larger standard deviation than the other or if the two graphs have the same standard deviation. Try to identify the characteristics of the graphs that make the standard deviation larger or smaller.
Pair 1: B, since it is skewed.
Pair 2: B, since there is more variability.
Pair 3: Both have the same. The distributions in the graphs are identical; the mean is just higher for the second one. The data are spread out the same for both graphs.

23. Which would have a larger standard deviation?
The mile times of the male high school track teams in the U.S. or the mile times of the male participants in the last Olympics? High school teams, since their times would be more spread out (more variability).

24. In 2007, the mean property crime rate (per 100,000 people) for the 26 states east of the Mississippi River was 409 with a standard deviation of 193. Assume the distribution was roughly symmetric and unimodal.
a. Between which two values would you expect to find about 68% of the rates? About 68% of the rates fall within one standard deviation of the mean, so between 216 and 602 crimes per 100,000 people.
b. Between which two values would you expect to find about 95% of the rates? About 95% of the rates fall within two standard deviations of the mean, so between 23 and 795 crimes per 100,000 people.
c. If an eastern state had a property crime rate of 503 crimes per 100,000 people, would you consider this unusual? Explain. No, 503 crimes falls within one standard deviation of the mean (216 to 602). 503 crimes is within the typical values.

25. When would you choose the median as the best measure of center? Median is appropriate for nonsymmetrical graphs. (Mean would be appropriate for symmetrical graphs.)

26. When would you choose the standard deviation as the best measure of spread? Standard deviation is appropriate for symmetrical graphs. (IQR would be appropriate for non-symmetrical graphs.)
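
The arithmetic in problems 15, 19, and 24 can be double-checked with a short script. The following is a minimal Python sketch and not part of the original key. The helper names z_score and interval are made up for illustration, and the data values are copied from the problems above.

```python
from statistics import mean, stdev


def z_score(x, center, spread):
    """Number of standard deviations that x lies from the mean (hypothetical helper)."""
    return (x - center) / spread


def interval(center, spread, k):
    """Values from k standard deviations below the mean to k above (hypothetical helper)."""
    return (center - k * spread, center + k * spread)


# Problem 15: annual chocolate sales in billions of dollars.
sales = [2, 5, 7, 2, 5, 3, 18]
sales_mean = mean(sales)
sales_sd = stdev(sales)  # sample standard deviation, divides by n - 1

# Problem 19: ages with mean 21.1 and standard deviation 6.8.
z_25 = z_score(25, 21.1, 6.8)
z_98 = z_score(98, 21.1, 6.8)

# Problem 24: crime rates with mean 409 and standard deviation 193.
about_68 = interval(409, 193, 1)
about_95 = interval(409, 193, 2)

print(f"Problem 15: mean = {sales_mean:.1f}, s = {sales_sd:.1f}")
print(f"Problem 19: z(25) = {z_25:.1f}, z(98) = {z_98:.1f}")
print(f"Problem 24: about 68% in {about_68}, about 95% in {about_95}")
```

Rounded to one decimal place, the script gives a mean of 6.0 and s of about 5.6 for problem 15, z scores of about 0.6 and 11.3 for problem 19, and the intervals 216 to 602 and 23 to 795 for problem 24.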