Sociology 360
Practice exam
ID:

Structure of actual exam – total 100 points: 5-10 true/false, 10-20 multiple choice, balance free answer.

This mock exam has two goals: (1) to give you an experience similar to (but not identical to) that of taking the actual exam, and (2) to give you some review questions. A mock exam may have more questions than the actual exam. There is no guarantee that the actual exam will cover only material on the mock exam or cover it in the same way.

Instructions:
• Write all your answers – and any parts of your work that you want considered for partial credit – in the blue book provided.
• Write your ID on this exam booklet and on your blue book. Put the exam booklet inside the blue book when you are done.
• Sign the back of the blue book.
• You may do the sections in any order you wish, but make sure that the question number is clearly indicated.
• Use this exam booklet as “scratch paper.”
• Answer all problems. If you do not know how to do a problem, you may want to skip it and come back to it at the end.
• Use your calculator and a copy of the formula card that we supply.
• Point values for each problem are stated in the exam.
• You do not need to show the details of calculations, unless you want to receive partial credit. If you wish your answer to be considered for partial credit you must show your calculations in the blue book. Partial credit will only be given for work that is seen as a step toward the correct answer. Random facts relating to elements of the question will not get partial credit.
• Make clear what is the final answer to the question. If there is any ambiguity, circle the number that is the final answer.
• Points will be based primarily on correct use and interpretation of statistics.
• You will lose points if the presentation of results and text are difficult to follow.
• Be complete, but do not include irrelevant or unnecessary information.
  Such irrelevant information may lead us to think that you do not understand the problem or do not know what among the information you provided is actually the answer asked for by the problem.

NOTES (If relevant):
• On this exam you do not need to check whether or not you can use the normal approximation of the sampling distribution to compute a confidence interval or do a hypothesis test except when the problem explicitly asks you to do so.
• Unless the problem explicitly states how the sample was gathered, assume that all samples are generated through simple random sampling.
• Problems may give very large test statistics.

Good luck!

Practice Exam 1

Part I: True/False and Multiple Choice (2 points each) (12%)

True/False questions. Circle true or false following each statement.

1. The median of a density curve is always the point that divides the area under the curve in half. True False

2. Simpson’s paradox results from a variable omitted in the pooled table acting as a lurking variable. True False

3. A low value of r² indicates that a large proportion of the variation in y remains unexplained by the regression. True False

4. Lurking variables are one reason we cannot infer causality from an association among two variables. True False

5. The height of a density curve for a range of values gives the proportion of observations that fall under the density curve for that range of values. True False

6. If a histogram has a bar that is taller than the other bars then this is suggestive of a skewed distribution. True False

Part II: Multiple Choice (2 points each) (24%)

Circle the correct answer below. There is one correct answer to each question. Also, note that many questions have “all of the above” or “none of the above” choices.

7. As part of a survey of college students a researcher is interested in the number of cigarettes smoked per day.
She records a 1 if the student does not smoke, a 2 if the student smokes at least once a week but not every day, a 3 if the student smokes at least one cigarette per day, and a 4 if the student smokes more than a pack a day. This variable is
a) ordered categorical
b) quantitative
c) unordered categorical
d) All of the above.

8. A description of different houses on the market includes the following three variables. Which of the variables is quantitative?
a) The square footage of the house
b) The monthly electric bill
c) The monthly gas bill
d) All of the above.

9. When drawing a histogram it is important to
a. have a separate bin for each observation to get the most informative plot.
b. make sure the heights of the bars exceed the widths of the bins so that the bars are true rectangles.
c. label the vertical axis so that the reader can determine the counts or percent in each bin.
d. make certain the mean and median are contained in the same bin interval, so that the correct type of skewness can be identified.

Use the following to answer questions 10-12: The following histogram represents the distribution of acceptance rates (percent accepted) among 25 business schools in 1997. In each bin, the left endpoint is included but not the right.

10. What percent of the schools have an acceptance rate of under 20%?
a) 0.16%
b) 4%
c) 12%
d) 16%

11. What is the approximate width of each bin in this graph?
a. 10
b. 5
c. 3
d. none of the above could plausibly be the width of the bin.

12. Which of the following intervals include the median of this distribution?
a. 30 to 40
b. 20 to 30
c. 15 to 25
d. cannot be determined from the information given
____________________________________________________________

Use the following box plot of the exam scores in a statistics class to answer questions 13-15. The boxplot is drawn per Moore (i.e., not per StataQuest).

[Box plot of exam scores; vertical axis ticks at 30, 45, 60, 75, 90]

13. Approximately 25% of the students scored below
a) 90
b) 65
c) 75
d) 60

14.
The interquartile range of the exam scores is approximately
a) 15
b) 55
c) 65
d) 5

15. The maximum exam score is approximately
a) 75
b) 60
c) 65
d) 90
____________________________________________________________

16. Using data from the fifty states, a researcher calculates the correlation coefficient between the infant mortality rate (deaths per 1000) in the state in 1990 (X) and the percent of 18-year-olds in the state in 1990 who graduated from high school (Y). The correlation between X and Y is r = -0.54. If, instead of plotting these variables for each of the fifty states, we plotted the values of these variables for each county in the United States, we would expect the value of the correlation r to be
a. exactly the same
b. closer to zero
c. +0.54 (the magnitude is the same, but the sign should change)
d. closer to -1
____________________________________________________________

17. In a statistics class with 136 students, the professor records how much money each student has in her or his possession during the first class of the semester. The histogram below shows the data collected.

[Histogram: vertical axis Frequency (ticks up to 50); horizontal axis Amount of Money in $ (ticks up to 100)]

From the histogram, which of the following is true?
a. The mean is larger than the median.
b. The mean is smaller than the median.
c. The mean and the median are approximately equal.
d. It is impossible to compare the mean and the median for these data.
____________________________________________________________

18. X and Y are two categorical variables. The best way to determine if there is a relation between them is to
a. calculate the correlation between X and Y.
b. draw a scatterplot of the X and Y values
c. make a two-way table of the X and Y values
d. all of the above

Part III. Free Response (64%)

Answer all questions. In some cases, we will award partial credit for correct parts of a problem even if the final answer is incorrect.
Partial credit will only be given for work that is seen as a step toward the (correct) final answer. Random facts relating to the problem will not get partial credit. To get partial credit, you need to show your work.

1. Below are data from Fortune Magazine on the number of research centers in 10 American cities.

      City             No. of research centers
  1.  Memphis           85
  2.  Denver           302
  3.  Indianapolis      69
  4.  Los Angeles      515
  5.  Phoenix          121
  6.  San Francisco    345
  7.  Detroit          361
  8.  Minneapolis      235
  9.  Seattle          153
 10.  Orlando           33

Use these data to answer the questions below (7%):
A. What is the five number summary for these data? (3 points)
B. What is the interquartile range? (2 points)
C. If we were to delete Los Angeles from the data, which would change more, the standard deviation of the variable or the interquartile range? (2 points)

2. Below is a stem-and-leaf plot of the percentage of the population that is Christian among states in the Northeastern region of the United States. (6%)

  3|69
  4|
  5|599
  6|1

a. What is the mean for these data? (2 points)
b. What is the median? (2 points)
c. What is the standard deviation for these data? (2 points)

3. Normal distribution problems (18%)
a. What proportion of the area under the standard normal curve falls to the right of -0.5? (3 points)
   z =
b. What proportion of the observations of a standard normal distribution falls between -1 and 1 standard deviations from the mean? (3 points)
c. A social psychologist has developed a test to measure gregariousness. The test is normed so that it has a mean of 70 and a standard deviation of 20, and the gregariousness scores are normally distributed. What percentage of scores are above 105? (4 points)
d. Scores on the California test of basic skills are normally distributed with mean 50 and standard deviation 25. What is the lowest score you would need on the California test of basic skills to be in the top 20% of all scores? (4 points)
e.
On the California test of basic skills (mean 50 and standard deviation 25) what percentage of scores fall between 35 and 50? (4 points)
____________________________________________________________

4. The graph below is a histogram drawn in StataQuest of age at first marriage for 296 married persons in the General Social Survey. Each bin is two years wide. (6%)

[Histogram: vertical axis Fraction (ticks at 0, .1, .2); horizontal axis Age at first marriage (ticks at 10, 20, 30, 40, 50)]

a. What proportion of persons in the sample were married at the ages of either 20 or 21? (Give your best guess based on the graph. Close will get full credit. 2 points)
b. The mean age at first marriage in this sample is 21.8. Will the median age of marriage for the sample be greater than, less than, or equal to 21.8? (2 points)
c. How would you describe the shape of this distribution? (2 points)
____________________________________________________________

5. A researcher regresses years of education (response or dependent) on number of siblings (explanatory or independent) using data on individuals from a large survey. She gets the following regression equation (16%):

Ŷ = -.227x + 13.48

a. Explain in one or two sentences what the slope says about the relationship between number of siblings and years of education. (3 points)
b. A statistics professor has 5 siblings and 20 years of education. What is the residual for the statistics professor? (3 points)
c. If the standard deviation of the number of siblings variable is 3.0, and the standard deviation of the years of education variable is 3.15, what is the correlation between education and number of siblings? (3 points)
d. Draw the regression line on the graph axes below. (3 points)

[Blank graph axes: vertical axis Years of education (ticks at 5, 10, 15, 20); horizontal axis Number of brothers and sisters (ticks at 5, 10, 15)]

e. Place an “x” on the graph above to show where the statistics professor (of part b) would appear if graphed on the scatterplot.
Then put an “o” on the graph to show the predicted value for the professor based on the regression. (2 points)
f. When we delete the statistics professor from the regression, the slope of education changes to -.231. Is the statistics professor acting as an influential observation? Why or why not? (2 points)
____________________________________________________________

6. Below is a crosstabulation based on data from the 1986 General Social Survey. The two variables are belief in life after death (based on a survey question) and the education of the respondent in three categories (0 to 11 years of education, 12 years of education, and 13 or more years of education) (11%).

Do you believe in |        education
life after death? |   0/11     12    13+ |  Total
------------------+----------------------+-------
              yes |    ___    380    436 |   1116
               no |     86     74    ___ |    246
------------------+----------------------+-------
            Total |    386    ___    522 |   1362

a. Fill in the missing (blank) frequencies in the table above. (3 points)
b. Percentage the conditional distributions assuming that education is the independent (or explanatory) variable and belief in life after death is the dependent (or response) variable. Write the percentages below the corresponding frequencies above. (4 points)
c. Describe in words the association of the independent (or explanatory) and dependent (or response) variable (mention both the direction and strength of relationship). (4 points)
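
For checking your work on question 6 after you have attempted it by hand, the steps in parts a and b can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the exam: the variable names and the helper fill_blank are made up for this example, and the counts are the ones printed in the table. It fills each blank from the row or column total, then percentages down the columns because education is the explanatory variable.

```python
# Observed cells from the question 6 crosstab; None marks a blank cell.
# All names here (columns, yes, no, fill_blank, ...) are illustrative only.
columns = ["0/11", "12", "13+"]
yes = [None, 380, 436]
no = [86, 74, None]
yes_total, no_total = 1116, 246
column_totals = [386, None, 522]
grand_total = 1362


def fill_blank(cells, total):
    """Fill the single None in a list so that the cells sum to total."""
    known = sum(c for c in cells if c is not None)
    return [total - known if c is None else c for c in cells]


# Part a: each blank is the marginal total minus the known cells.
yes = fill_blank(yes, yes_total)
no = fill_blank(no, no_total)
column_totals = fill_blank(column_totals, grand_total)

# Consistency check: each column's cells should add up to its column total.
assert [y + n for y, n in zip(yes, no)] == column_totals

# Part b: conditional distribution of belief given education,
# so every cell is divided by its column (education) total.
percent_yes = [round(100 * y / t, 1) for y, t in zip(yes, column_totals)]
percent_no = [round(100 * n / t, 1) for n, t in zip(no, column_totals)]

for name, p_yes, p_no in zip(columns, percent_yes, percent_no):
    print(f"{name:>5}: yes {p_yes}%  no {p_no}%")
```

Dividing by column totals rather than row totals is the design choice that matters here: it lets you compare the percent answering yes across education groups of different sizes.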