FULL NAME: ________________________ AS IT APPEARS ON ALBERT

STATISTICS FOR SOCIAL & BEHAVIORAL SCIENCES
MID TERM 1 – September 30, 2014

This midterm is made of 5 problems on 9 numbered pages, for a total of 26 points. You have 70 minutes (one hour and 10 minutes) to complete this midterm.

Write your name at the top of this sheet and on every page. Circle the one right answer. There is exactly one right answer unless otherwise indicated (first question of Problem 3).

Non-communicating calculators are recommended (no 3G, Wi-Fi, LTE, 2G, EDGE, even in airplane mode). The calculator's memory should not contain the course's statistical formulas.

Please complete this midterm in silence. If stuck on one question, move on! There are easy and tough questions on this midterm, so keep going!

This is a closed book midterm. Notes are not allowed. If in doubt about a question, write on the answer sheet provided and move on. The rule is that I will not provide clarifications during the exam.

This midterm should be enjoyable: good luck!

Problem 1 – New York City Crime (4 points)

A researcher in Mayor Bloomberg's data team (Paul) performs a linear regression of neighborhood crime (y, in number of reported crimes) on neighborhood median income (x, in thousands of dollars) only. He observes crime yi and income xi for each neighborhood i. Paul finds that the linear relationship is yi = 2000 - 30 xi + ei. He finds that the R squared of the linear regression is 0.82 (or 82%).

Which of the following statements is true?
a. An increase in neighborhood median income by $1,000 is associated with a reduction in crime of 30 crimes.
b. An increase in neighborhood median income by $1,000 is associated with a reduction in crime of 30,000 crimes.
c. An increase in neighborhood median income by $1,000 is associated with an increase in crime of 30,000 crimes.
d. An increase in neighborhood median income by $1,000 is associated with a decrease in crime of 30,000 crimes.

Which of the following statements is true?
a. The predicted number of crimes for a neighborhood with a median income of $40,000 is 80.
b. The predicted number of crimes for a neighborhood with a median income of $40,000 is 800.
c. The predicted number of crimes for a neighborhood with a median income of $40,000 is 8.
d. The predicted number of crimes for a neighborhood with a median income of $40,000 is 2000.

Which of the following statements is true?
a. The correlation between neighborhood crime and neighborhood median income is 1.
b. The correlation between neighborhood crime and neighborhood median income is -1.
c. The correlation between neighborhood crime and neighborhood median income is positive or zero.
d. The correlation between neighborhood crime and neighborhood median income is negative (but not zero).

Which of the following statements is true?
a. The Total Sum of Squares is equal to 82% of the Explained Sum of Squares.
b. The Explained Sum of Squares is equal to 82% of the Total Sum of Squares.
c. The Sum of Squared Errors is equal to 82% of the Total Sum of Squares.
d. The Total Sum of Squared Errors is equal to 82% of the Sum of Squared Errors.

Problem 2 – Singapore Connection (5 points)

The Economic Development Board of Singapore (EDB) collected data on individuals' education level. It collected data for a sample of N=1,219 individuals, and observed, for each individual, the number of years of education from age 6. We denote this variable xi, measured from 0 years of education (did not go to school, for one individual in the sample) to the maximum of 26 years of education for the most highly educated individual in the sample. The mean of x in the sample is 16 years of education.

Which of the following statements is true?
a. x is a categorical nominal variable.
b. x is a categorical ordinal variable.
c. x is a categorical continuous variable.
d. x is a quantitative nominal variable.
e. x is a quantitative discrete variable.

Which of the following statements is true?
a. The interquartile range of x is strictly higher than 26.
b. The interquartile range of x is equal to 26.
c. The interquartile range of x is lower than or equal to 26.
d. None of the above.

The EDB then finds that the distribution is right-skewed. Which of the following statements is true?
a. The standard deviation of x is higher than 2.
b. The median of x is strictly greater than 16.
c. The median of x is strictly lower than 16.

The population of Singapore is 5.4M inhabitants. The sample was drawn by simple random sampling. Therefore, which of the following is true?
a. The sample does not suffer from response bias.
b. The sample does not suffer from nonresponse bias.
c. The sample does not suffer from sampling bias.
d. The sample does not suffer from sampling error.

The mean of x in the sample is:
a. a parameter
b. a residual
c. a statistic
d. a sum of squares

Problem 3 – Friendships at NYU Prague (5 points)

Anton Dzerkovic, a professor of Sociology, spent 2 years at NYU Prague. He collected data for a sample of 324 students willing to participate. For each student i, Prof. Dzerkovic asked the student how many friends he had (xi). Prof. Dzerkovic could not verify the answers' accuracy. All students of the sample answered.

From the information provided in this text, which biases affect Prof. Dzerkovic's study (tick all that apply, +1 for each correct answer, -1 for each wrong answer)?
◻︎ Nonresponse bias.
◻︎ Response bias.
◻︎ Sampling bias.

Students had an average (i.e. mean) number of friends of 12, with a standard deviation of 4 (social life is intense at NYU Prague!). The distribution of the number of friends is bell-shaped.
Therefore, applying the empirical rule, which is the right statement?
a. Almost no student had more than 16 friends.
b. Almost no student had more than 22 friends.
c. Almost no student had more than 21 friends.
d. Almost no student had more than 24 friends.
e. Almost no student had more than 20 friends.
f. Almost no student had more than 17 friends.

Also, which is the right statement?
a. 63% of students had between 8 and 16 friends.
b. 67% of students had between 8 and 16 friends.
c. 68% of students had between 8 and 16 friends.
d. 59% of students had between 8 and 16 friends.
e. 95% of students had between 8 and 16 friends.

Prof. Dzerkovic also collected data on each student's time spent studying yi. He found that the correlation between the time spent studying per week (in hours) and the number of friends is -0.7, and that the standard deviation of the number of hours spent studying per week is 3h. Therefore, in the linear relationship yi = a + b xi + ei, the value of the slope b is:
a. 0.425
b. -0.425
c. 0.881
d. -0.525
e. 0.525
f. -0.881

Also, a one standard deviation increase in the number of friends is associated with a …… standard deviation decline in the number of hours studied per week:
a. 0.525
b. 0.881
c. 0.7
d. 0.425
e. 3

Problem 4 – Rats in Space (6 points)

Prof. Gandhi is based in Bangalore. He sent 7 rats in a satellite to Mars, and the satellite sends data on the rats' stress level in the satellite. Each rat i=1,2,3,4,…,7 has a stress level xi measured as 0,1,2,3,4. 4 is the highest level of stress. Prof. Gandhi drew the following contingency table based on his data:

Stress level      0   1   2   3   4
Number of rats    0   2   3   1   1

The median stress level is:
a. 0
b. 0.5
c. 1
d. 1.5
e. 2
f. 2.5
g. 3
h. 3.5
i. 4
j. 4.5

The mode is:
a. 0
b. 1
c. 2
d. 3
e. 4
f. 5

The variable xi is:
a. categorical nominal
b. quantitative continuous
c. quantitative discrete

The mean of xi is:
a. 0.777
b. 0.987
c. 0.811
d. 1.875
e. 2.143
f. 3.101
g. 4.129

The standard deviation of xi is:
a. 0.990
b. 0.777
c. 0.117
d. 0.228
e. 0.617
Please round to the closest number with 3 digits after the decimal. E.g. for 0.1230, 0.1231, 0.1232, 0.1233, 0.1234 write 0.123. For 0.1235 and above, write 0.124.

The variance of xi is:
a. 0.980
b. 0.999
c. 0.920
Use the rounded number of the previous question for your calculation, and round the obtained variance in the same way.

Problem 5 – Age and weekly earnings (6 points)

Statisticians at the U.S. Census Bureau collected data on a sample of 645 individuals with their weekly earnings in $ and their age in years. They draw the following scatterplot and plot the regression line. They find the following linear relationship:

yi = 276 + 7 xi + ei

The Total Sum of Squares was 74,213,916 and the Sum of Squared Errors was 68,948,223. Therefore, the R squared of the regression is:
a. 92.9%
b. 7.1%
c. 0.5%
d. 86.3%

The standard deviation of age is 23.8 years. Given the information provided in this exercise, the standard deviation of weekly earnings is:
a. $5,817.98
b. $625.24
c. $90.11
d. $2,201.12

In the scatterplot, consider the only observation that has an age above 80. For this observation,
a. the residual is positive
b. the residual is negative
c. the residual is zero

The regression line is the line that:
a. Makes sure there is an equal number of points above and below the regression line.
b. Maximizes the absolute deviation between the observations yi and the regression line.
c. Minimizes the absolute deviation between the observations yi and the predicted observations y with a hat.
d. Minimizes the sum of the squared residuals.
e. Minimizes the total sum of squares.

The standard deviation of the errors ei is:
a. 623.590
b. 771.190
c. 326.950
d. 901.180
Rounding rule as in Problem 4. Use N as the denominator in the formulas, as in class.

(Slightly more difficult question, attempt if time remains, and do not get bogged down.) If the researcher had regressed age on income instead of income on age, the slope would have been:
a. -1
b. 5
c. 0.010
d. -,10

END OF MIDTERM
Thank you for your answers.

DRAFT – WRITE HERE – ONLY ANSWERS ON THE MCQ FORM ITSELF WILL BE CONSIDERED (FIRST 9 PAGES)