SOCY2200 Statistics
Instructor: Natasha Sarkisian
Practice Midterm

Problem 1. Distributions
10 people reported the following number of meals/snacks they had the previous day: 5, 5, 3, 7, 2, 4, 5, 4, 5, 5. Construct a frequency distribution (display both counts and percentages). Draw a bar chart; display percentages and label the axes.

Problem 2. Descriptive Statistics
Calculate all the measures of central tendency and variability that you know in order to describe the number of meals per day (variable X) reported in Problem 1.

Problem 3. Correlation Analysis
Assume the following values of X (number of hours a student watched TV the previous day) and Y (number of hours that student spent studying):
X: 1, 3, 2, 4, 2, 0, 2, 3, 1, 1
Y: 2, 0, 2, 1, 0, 3, 2, 2, 4, 3
a. Calculate the correlation coefficient, and then the coefficients of determination and alienation.
b. Explain in words what each of these coefficients tells you about the relationship between time spent watching TV and time spent studying.

Problem 4. Regression Analysis
Using the data from the previous problem, find and write out the regression equation for the regression of Y on X. (10 points)
a. Does this regression line have a positive or a negative slope?
b. In words, what does the slope mean?
c. In words, what does the intercept mean?
d. Using this equation, calculate the predicted value of Y for the first student (X = 1, Y = 2).
e. Calculate the error of estimate for this student.
f. In words, what does this number mean?

Problem 5. Using SPSS and the gss2012.sav dataset, construct the frequency distribution for the variable CHLDIDEL. Answer the following questions:
a. The largest percentage of people think that the ideal number of children is ________.
b. _______ % think that 7 or more children is ideal.
c. Based on this frequency distribution, what is the level of measurement of this variable?

Problem 6.
Using SPSS and the gss2012.sav dataset, calculate measures of central tendency and variability to describe the variable MAEDUC. Answer the following questions.
a. The mean education of respondents’ mothers is ________. The median education of mothers is ________.
b. The standard deviation for the respondents’ mothers’ number of years of education is _______.
c. The mode for the respondents’ mothers’ education is ________.

Problem 7. Using SPSS and the gss2012.sav dataset, construct pie charts for the variables HAPPY and HEALTH. Make sure to include percentage labels in the charts. Does a larger percentage of people report being “very happy” or report being in excellent health?

Problem 8. Using SPSS and the gss2012.sav dataset, construct a histogram with a normal curve for the variable MAEDUC. Is the distribution of MAEDUC symmetric (normal) or skewed, and if skewed, what is the direction of that skew?

Problem 9. Using SPSS and the gss2012.sav dataset, calculate the correlation coefficient for the variables MAEDUC and EDUC. Answer the following questions.
a. Correlations can vary from _____ to _____, with _______ meaning there is no relationship.
b. The value of the correlation coefficient between MAEDUC and EDUC is _______.
c. What does this number tell you about the relationship between these two variables?

Problem 10. Using SPSS and the gss2012.sav dataset, regress REALINC (family income) on EDUC. Create a scatterplot for these two variables and display the regression line on the scatterplot. What is the predicted change in family income if education increases by one year?

Multiple Choice Questions:

1. If a distribution has a few very high scores, the best measure of central tendency will be:
a. mode
b. median
c. mean
d. range
e. interquartile range

2. The sum of deviation scores, Σ(X - X̄):
a. will equal zero only if there is no variation in the variable
b. will always equal zero
c. will equal zero only if the distribution is symmetric
d. will always equal one
e. will equal one only if the standard deviation equals one

3. For the nominal level of measurement, the best measure of central tendency is:
a. mode
b. median
c. mean
d. variance
e. range

4. The distribution in this histogram looks:
[Histogram: Income (horizontal axis) by Frequency (vertical axis)]
a. positively skewed
b. negatively skewed
c. positively kurtotic
d. negatively kurtotic
e. normal

5. This scatterplot indicates that there is:
[Scatterplot: Variable Y (vertical axis) against Variable X (horizontal axis)]
a. no relationship between X and Y
b. strong positive correlation between X and Y
c. strong negative correlation between X and Y
d. weak positive correlation between X and Y
e. weak negative correlation between X and Y

6. If Y is regressed on X and the slope of the regression line equals 4:
a. when X increases by 4, Y will increase by 4
b. when X decreases by 4, Y will decrease by 4
c. when X increases by 1, Y will increase by 4
d. when Y increases by 1, X will increase by 4
e. when X increases by 1, Y will decrease by 4

7. The coefficient of determination:
a. ranges from -1 to 1
b. cannot be zero
c. shows the proportion of unique variance in X
d. cannot exceed 1
e. is calculated by squaring the coefficient of alienation

8. If a value has a z-score of -2:
a. it is two variances away from the mean of the normal distribution
b. it is two standard deviations away from the mean of the normal distribution
c. its range is two
d. it has two degrees of freedom
e. it is non-existent because z-scores cannot be negative

9. The mean of a normal distribution is also:
a. its variance
b. its standard deviation
c. its median and mode
d. its z-score
e. its range

10. Based on the following frequency distribution, what is the level of measurement for this variable?
a. Ordinal
b. Normal
c. Interval
d. Leptokurtic
e. Nominal
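The hand calculations in Problems 1 through 4 can be checked with a short script. The following is a minimal Python sketch using only the standard library. It assumes sample (N - 1) formulas for the variance and standard deviation, which is what SPSS reports, and also computes the population (N) versions; it takes the coefficient of alienation to be 1 - r^2 (some textbooks define it as sqrt(1 - r^2) instead). The helper names frequency_table, describe, and sums_of_squares are hypothetical, introduced only for this illustration.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, multimode, pstdev, pvariance, stdev, variance

# Problem 1 data: meals/snacks reported by 10 people
meals = [5, 5, 3, 7, 2, 4, 5, 4, 5, 5]


def frequency_table(values):
    """Hypothetical helper: return (value, count, percent) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, 100 * c / n) for v, c in sorted(counts.items())]


def describe(values):
    """Hypothetical helper: central tendency and variability (Problem 2).

    Variance and SD are given in sample (N - 1) and population (N) versions.
    """
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": multimode(values),
        "range": max(values) - min(values),
        "variance_sample": variance(values),
        "variance_population": pvariance(values),
        "sd_sample": stdev(values),
        "sd_population": pstdev(values),
    }


# Problem 3 data: X = hours of TV watched, Y = hours spent studying
x = [1, 3, 2, 4, 2, 0, 2, 3, 1, 1]
y = [2, 0, 2, 1, 0, 3, 2, 2, 4, 3]


def sums_of_squares(xs, ys):
    """Hypothetical helper: sum of cross-products SP and sums of squares SSx, SSy."""
    mx, my = mean(xs), mean(ys)
    sp = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    ssx = sum((a - mx) ** 2 for a in xs)
    ssy = sum((b - my) ** 2 for b in ys)
    return sp, ssx, ssy


sp, ssx, ssy = sums_of_squares(x, y)
r = sp / sqrt(ssx * ssy)   # Pearson correlation coefficient
r2 = r ** 2                # coefficient of determination
alienation = 1 - r2        # assumption: alienation defined as 1 - r^2

# Problem 4: least-squares regression of Y on X, Y' = a + bX
slope = sp / ssx                         # b = SP / SSx
intercept = mean(y) - slope * mean(x)    # a = mean(Y) - b * mean(X)
y_hat_first = intercept + slope * x[0]   # predicted Y for the first student (X = 1)
error_first = y[0] - y_hat_first         # error of estimate: Y - Y'

if __name__ == "__main__":
    for value, count, pct in frequency_table(meals):
        print(f"{value}: {count} ({pct:.0f}%)")
    print(describe(meals))
    print(f"r = {r:.3f}, r^2 = {r2:.3f}, 1 - r^2 = {alienation:.3f}")
    print(f"Y' = {intercept:.3f} + ({slope:.3f})X")
    print(f"first student: Y' = {y_hat_first:.3f}, error = {error_first:.3f}")
```

Running the script prints each result; the intent is to compare these against the worked steps done by hand, not to replace them.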