STATISTICS FOR THE SOCIAL AND BEHAVIORAL SCIENCES
MIDTERM 1
Answer key

Problem 1

1.1 a) Note that x is neighborhood median income measured in thousands of dollars, whereas y is simply the number of reported crimes. Thus, a one-unit increase in x means an increase of one thousand dollars in neighborhood median income, which is associated with a decrease of 30 (the slope, b) in y (number of reported crimes).

1.2 b) We substitute x = 40 into our linear equation to find the value of y (predicted number of crimes): y = 2000 - 30(40) = 800.

1.3 d) Taking the square root of R squared we find √0.82 ≈ ±0.906. Since b is negative, we know that the relationship between y and x is negative, and thus our correlation must be negative. Thus, the correlation between y (neighborhood crime) and x (neighborhood median income) is negative (but not zero).

1.4 b) R squared tells us the percentage of variation in y that is explained by our model. We know that R squared = Explained Sum of Squares / Total Sum of Squares. Thus, the Explained Sum of Squares is equal to 82% of the Total Sum of Squares.

Problem 2

2.1 e) X is the number of years of education. Thus, it is a quantitative variable. It is discrete because it is measured in completed years of education: there is no 1.5 years of education. An individual either completed 2 years or, if they did not complete the second year, is coded as having completed 1 year.

2.2 c) The IQR is the 75th percentile minus the 25th percentile. Since the range of the education variable is [0, 26], the IQR cannot be higher than 26; it must be lower than or equal to 26.

2.3 c) If the distribution is right skewed, the median is lower than the mean (the mean is pulled towards the skew, that is, towards the longer tail).

2.4 c) If the sample was drawn via random sampling, the sample does not suffer from sampling bias. Response bias could still be present if subjects are not honest about their level of education.
Nonresponse bias could still be present if individuals drawn in the sample do not respond. Sampling error is present whenever we use a sample to make inferences about an entire population.

2.5 c) The mean of x is indeed a statistic, since it describes a sample, not a population.

Problem 3

3.1 b) and c) Nonresponse bias is the only bias that does not affect the study, since all students in the sample answered.

3.2 d) Of all the options, d) is the best answer. Having 24 friends is 3 standard deviations above the mean, and thus we can safely say that almost no student had more than that.

3.3 c) We know that about 68% of the observations fall within the interval from 1 standard deviation below the mean (8 friends) to 1 standard deviation above the mean (16 friends).

3.4 d) We know that $r = (s_x / s_y)\,b$, thus $b = (s_y / s_x)\,r = -0.525$.

3.5 c) An increase of $s_x$ (one standard deviation) in x (number of friends) is associated with a decline of $|s_x b / s_y|$ standard deviations in y (number of hours studied per week): $|s_x b / s_y| = 0.7$.

Problem 4

4.1 e) The observations are {1, 1, 2, 2, 2, 3, 4}. The median is 2.

4.2 c) The mode is the stress level with the highest frequency (highest number of rats with that stress level). That is 2 (3 rats).

4.3 c) Stress level is measured in numbers from 0 to 4. Thus, it is a quantitative discrete variable.

4.4 e) $\text{mean} = \frac{(1 \times 2) + (2 \times 3) + (3 \times 1) + (4 \times 1)}{7} = 2.143$

4.5 a) We use the formula for the standard deviation:
$SD = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2}$
$= \sqrt{\frac{1}{7}\left[(1 - 2.143)^2 + (1 - 2.143)^2 + (2 - 2.143)^2 + (2 - 2.143)^2 + (2 - 2.143)^2 + (3 - 2.143)^2 + (4 - 2.143)^2\right]}$
$= \sqrt{\frac{1}{7}(2.613 + 0.061 + 0.734 + 3.448)} = 0.990$

4.6 a) We square the standard deviation and find $\text{var} = (0.990)^2 = 0.980$.

Problem 5

5.1 b) $R^2 = (TSS - SSE) / TSS = 0.071$

5.2 b) $S_y = (S_x / r)\,b = (23.8 / \sqrt{0.071}) \times 7 = 625.24$

5.3 b) Residual = $y_i - \hat{y}_i$. We can see that the only observation above 80 is below the regression line ($y_i < \hat{y}_i$), and thus the residual will be negative.

5.4 d)

5.5 c) The Sum of Squared Errors (SSE) is 68,948,223.
Note that the formula is $\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$. If we divide it by N and take the square root, we will have the standard deviation of the error. Thus, the right answer is $\sqrt{68{,}948{,}223 / N} = 326.95$.

5.6 c) We want to find $b_{xy}$. We know that $r(x, y) = (S_x / S_y)\,b_{yx} = (S_y / S_x)\,b_{xy}$. Thus, $b_{xy} = (S_x / S_y)^2\,b_{yx} = (23.8 / 625.24)^2 (7) = 0.010$.
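
As a rough check on the arithmetic in Problems 4 and 5, the short Python sketch below recomputes the mean, standard deviation, and variance from 4.4 to 4.6 and the values in 5.2 and 5.6. The variable names are illustrative only, and the sketch assumes the positive root of R squared for r, consistent with the positive slope of 7 used in 5.2. Answer 5.5 is left out because the value of N is not given in the answer text.

```python
import math

# Problem 4: stress levels of the seven rats, as listed in 4.1.
stress = [1, 1, 2, 2, 2, 3, 4]
n = len(stress)

mean = sum(stress) / n
# Divide by N (not N - 1), matching the SD formula used in 4.5.
variance = sum((y - mean) ** 2 for y in stress) / n
sd = math.sqrt(variance)

# Problem 5: quantities quoted in 5.1, 5.2 and 5.6.
s_x = 23.8          # standard deviation of x
b_yx = 7.0          # slope from regressing y on x
r_squared = 0.071   # from 5.1
# Assumption: take the positive root, since the slope b_yx is positive.
r = math.sqrt(r_squared)

# 5.2: r = (s_x / s_y) * b_yx, solved for s_y.
s_y = s_x * b_yx / r
# 5.6: slope from regressing x on y.
b_xy = (s_x / s_y) ** 2 * b_yx

print(f"mean = {mean:.3f}, SD = {sd:.3f}, variance = {variance:.3f}")
print(f"s_y = {s_y:.2f}, b_xy = {b_xy:.3f}")
```

Computing the variance directly from the data, before rounding the standard deviation, avoids carrying rounding error from 4.5 into 4.6; to three decimals the result agrees with the value obtained by squaring 0.990.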