Math 140 Review I

PART I Multiple Choice

1. Which of the following measurements is likely to have the least variation?
a. The individual weights, in ounces, of oranges in a randomly selected five-pound bag of oranges at the market.
b. The individual masses, in grams, of quarters in a randomly selected ten-dollar roll of quarters.
c. The individual heights, in inches, of children in a randomly selected class of sixth-grade students.

2. The marital status of each member of a randomly selected group of adults is an example of what type of variable?
a. Numerical variable
b. Categorical variable
c. Neither

3. Is the following an observational study or a controlled experiment? “People with diabetes are at higher risk for certain cancers than those without the blood sugar disease, suggests a new study based on a telephone survey of nearly 400,000 adults.”
a. Observational study
b. Controlled experiment

A fitness instructor measured the heart rates of the participants in a yoga class at the conclusion of the class. The data are summarized in the histogram below. Fifteen people between the ages of 25 and 45 participated in the class. Use the histogram to answer questions (4) and (5).

4. How many participants had a heart rate between 120 and 130 bpm?
a. 2   b. 4   c. 3   d. 5

5. What percentage of the participants had a heart rate greater than 130 bpm?
a. 13%   b. 27%   c. 33%   d. 53%

6. Determine whether the variable would best be modeled as numerical or categorical: the temperature of a greenhouse at a certain time of day.
a. Numerical
b. Categorical

7. Determine whether the variable would best be modeled as numerical or categorical: the number of tomatoes harvested each week from a greenhouse tomato plant.
a. Numerical
b. Categorical

8. Below are the standard deviations of extreme 10k finish times for randomly selected groups of women and men. Choose the statement that best summarizes the meaning of the standard deviations.
Women: s = …   Men: s = …
a. On average, men’s finish times will be 0.21 hours faster than the overall average finish time.
b. On average, women’s finish times will be 0.17 hours less than men’s finish times.
c. The distribution of men’s finish times is less varied than the distribution of women’s finish times.
d. The distribution of women’s finish times is less varied than the distribution of men’s finish times.

9. In simple regression analysis, the quantity that gives the amount by which Y (the dependent variable) changes for a unit change in X (the independent variable) is called the
A) Coefficient of determination
B) Slope of the regression line
C) Y-intercept of the regression line
D) Correlation coefficient
E) Standard error

10. The correlation coefficient may assume any value between
A) 0 and 1
B) -∞ and ∞
C) 0 and ∞
D) -1 and 1
E) -1 and 0

11. In simple regression analysis, if the correlation coefficient is a positive value, then
A) The Y-intercept must also be a positive value.
B) The coefficient of determination can be either positive or negative, depending on the value of the slope.
C) The least squares regression equation could have either a positive or a negative slope.
D) The slope of the regression line must also be positive.
E) The standard error of estimate can have either a positive or a negative value.

12. The strength of the relationship between two quantitative variables can be measured by the:
A) slope of a simple linear regression equation
B) Y-intercept of the simple linear regression equation
C) coefficient of correlation

PART II (MUST SHOW ALL YOUR WORK)

13. The waiting times (in minutes) to be served at a bank for a simple random sample of 22 customers are:
8.35 3.82 10.49 8.37 5.64 8.02 6.17 9.66 5.47 5.90 5.79 2.54 4.23 1.45 4.90 5.41 4.08 8.01 3.00 3.96 2.24 1.00
A. Complete a frequency table; you choose the intervals. Include columns for frequency and relative frequency.

Interval   Frequency   Relative Frequency
[0,2)      2           0.09
[2,4)      5           0.23
Total

B. Create a histogram for these waiting times.
C.
Describe the distribution. Use as much description as possible, and make sure to include descriptions of shape, center, and spread in context.

14. For each of the following variables, state whether it is qualitative or quantitative. If the variable is quantitative, say whether it is discrete or continuous.
A. Total amount of snowfall in New York City in a year
B. Number of females in state prisons in 2003
C. Ethnicity of students at SBCC
D. Number of different zip codes in California counties

15. How much do users pay for Internet fax providers? Here are the monthly fees (in dollars) paid by a random sample of 6 users of commercial Internet fax service providers in February 2010.
14 10 13 10 15 4
a) What is the standard deviation of the monthly fees paid? (MUST do it by hand, showing all your work, to receive credit.)
b) What does this number tell you about the monthly fees paid?

16. The following data give the ages (in numerical order) at which a sample of 35 American mothers first gave birth.
14 16 16 16 17 17 18 18 18 19 19 19 20 20 20 20 20 21 21 21 22 23 23 24 24 24 24 26 27 28 28 31 32 33 50
a) FIND THE FIVE-NUMBER SUMMARY
b) CONSTRUCT A BOX PLOT
c) IDENTIFY OUTLIERS

17. The World Almanac and Book of Facts 2004 reported the percent of people not covered by health insurance in the 50 states and Washington, D.C., for the year 2002. Computer output gives these summaries for the percent of people not covered by health insurance.
a) Is there an outlier in the data?
b) Looking at the histogram, what measures of center and spread would you use? Explain.

18. Data were collected on the handgrip strength of adults. The histogram below summarizes the data. Which statement is true about the distribution of the data shown in the graph? Describe the graph using measures of center, shape, and spread in context, and explain what you think may be the reason for this shape.

19.
The side-by-side boxplots below show cumulative GPAs for sophomores, juniors, and seniors taking an intro stats course in Autumn 2003. Compare the groups; use as much description as possible.

20. Line segments with slopes 2, 1, 1/2, 0, -1/2, -1, -2, and undefined are shown. Match the slopes with their corresponding lines.

21. Use the following formulas to compute the slope $b_1 = r \cdot \frac{s_y}{s_x}$ and the y-intercept $b_0 = \bar{y} - b_1 \bar{x}$ for the regression line. Then give the equation of the regression line, $\hat{y} = b_0 + b_1 x$.

r = 0.847
                       Mean     Standard Deviation
Explanatory Variable   11.6     2.35
Response Variable      54.52    7.21

22. Use the following formulas to compute the slope and y-intercept for the regression line. Then give the equation of the regression line. (This problem is worth 5 points)
$\hat{y} = a + bx$, where $b = r \cdot \frac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$.

r = 0.746
                       Mean     Standard Deviation
Explanatory Variable   23.7     7.2
Response Variable      97.5     13.4

23. The following scatterplot, correlation coefficient, and regression line describe the relationship between the year (x) and the millions of dollars spent on Halloween candy (y) in the U.S. (This problem is worth 15 points)
r = 0.941
The regression equation: $\hat{y} = -106 + 0.0542x$
[Figure: Scatterplot of Halloween Candy Sales (Millions $ vs Years); y-axis: Halloween Candy Sales (in Millions of $); x-axis: Years, 1994 to 2006]
a) What is the slope of the regression line? What does the slope mean in this context?
b) How well does the regression line fit these data? How confident would you be in making predictions with the regression line?
c) Use the regression equation to predict how much was spent on candy in 2004. (Hint: Since the x variable is in actual years, plug 2004 into the regression equation.)
d) Can we use this regression equation to predict candy sales in the year 2037? Why or why not?

24. The following scatterplot and $r^2$ describe the relationship between the year and the millions of dollars spent on Halloween candy in the U.S.
Tell whether each of the following statements is a valid or invalid interpretation of $r^2$. (This problem is worth 5 points)
$r^2 = 0.885$
[Figure: Scatterplot of Halloween Candy Sales (Millions $ vs Years); y-axis: Halloween Candy Sales (in Millions of $); x-axis: Years, 1994 to 2006]
a) Since the $r^2$ value is 88.5%, this indicates that time causes candy sales to increase.
b) There is an 88.5% chance that we can accurately predict the candy sales for a given year between 1994 and 2006.
c) 88.5% of the variability in candy sales can be attributed to the linear relationship with the year.
d) There was an average of 88.5 million dollars in candy sales.

Complete the following problems.

25. The calculated correlation coefficient values are -0.977, -0.487, 0.006, and 0.777. Match each correlation coefficient value with its scatterplot.
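One way to check hand computations for the Part II arithmetic (problems 13, 15, 16, 21, 22, and 23) is a short Python script like the sketch below. It is only a checking aid under stated assumptions: the 2-minute bin width for problem 13 is an assumed choice (the problem lets you pick intervals), the quartiles use the median-of-halves convention (software may use a different quartile rule and give slightly different values), and every function name here is a hypothetical helper written for this example. Problem 15 still has to be done by hand to receive credit.

```python
import math
import statistics

# Data copied from the worksheet.
waits = [8.35, 3.82, 10.49, 8.37, 5.64, 8.02, 6.17, 9.66, 5.47, 5.90, 5.79,
         2.54, 4.23, 1.45, 4.90, 5.41, 4.08, 8.01, 3.00, 3.96, 2.24, 1.00]  # Q13
fees = [14, 10, 13, 10, 15, 4]  # Q15
ages = [14, 16, 16, 16, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20, 21,
        21, 21, 22, 23, 23, 24, 24, 24, 24, 26, 27, 28, 28, 31, 32, 33, 50]  # Q16


def frequency_table(data, start, width, n_bins):
    """Hypothetical helper: count values in half-open bins [lo, lo + width)."""
    rows = []
    for k in range(n_bins):
        lo, hi = start + k * width, start + (k + 1) * width
        count = sum(1 for x in data if lo <= x < hi)
        rows.append((lo, hi, count, round(count / len(data), 2)))
    return rows


def sample_sd(data):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))


def median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(data):
    """Min, Q1, median, Q3, max. Assumed convention: Q1 and Q3 are medians of
    the lower and upper halves, excluding the overall median when n is odd."""
    v = sorted(data)
    half = len(v) // 2
    lower = v[:half]
    upper = v[half + 1:] if len(v) % 2 else v[half:]
    return v[0], median(lower), median(v), median(upper), v[-1]


def iqr_outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3 (the 1.5 x IQR rule)."""
    _, q1, _, q3, _ = five_number_summary(data)
    step = 1.5 * (q3 - q1)
    return [x for x in data if x < q1 - step or x > q3 + step]


def regression_line(r, x_mean, x_sd, y_mean, y_sd):
    """Return (slope, intercept) from b1 = r * sy / sx and b0 = ybar - b1 * xbar."""
    slope = r * y_sd / x_sd
    return slope, y_mean - slope * x_mean


if __name__ == "__main__":
    for lo, hi, count, rel in frequency_table(waits, 0, 2, 6):
        print(f"[{lo},{hi})  frequency={count}  relative={rel}")
    print("Q15 s =", sample_sd(fees), "| statistics.stdev:", statistics.stdev(fees))
    print("Q16 five-number summary:", five_number_summary(ages))
    print("Q16 outliers:", iqr_outliers(ages))
    print("Q21 (b1, b0):", regression_line(0.847, 11.6, 2.35, 54.52, 7.21))
    print("Q22 (b, a):", regression_line(0.746, 23.7, 7.2, 97.5, 13.4))
    # Q23: the intercept -106 is rounded to a whole number, so this prediction
    # is sensitive to that rounding.
    print("Q23 predicted y for 2004:", -106 + 0.0542 * 2004)
```

The sample standard deviation is written out explicitly to mirror the by-hand steps (mean, squared deviations, divide by n - 1, square root) and is compared against `statistics.stdev` as a cross-check. Swapping the bin width or start value in `frequency_table` shows how a different choice of intervals changes the table.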