```
SOCY2200 Statistics
Instructor: Natasha Sarkisian
Practice Midterm
Problem 1. Distributions
10 people reported the following number of meals/snacks they had the previous day: 5, 5, 3, 7,
2, 4, 5, 4, 5, 5. Construct a frequency distribution (display both counts and percentages). Draw
a bar chart; display percentages and label the axes.
Problem 2. Descriptive Statistics
Calculate all the measures of central tendency and variability that you know in order to
describe the number of meals per day data (variable X) reported in problem 1.
Problem 3. Correlation Analysis
Assume the following values of X (number of hours a student watched TV the previous day)
and Y (number of hours that student spent studying): X: 1, 3, 2, 4, 2, 0, 2, 3, 1, 1. Y: 2, 0, 2, 1,
0, 3, 2, 2, 4, 3.
a. Calculate the correlation coefficient, and then coefficients of determination and
alienation.
b. Explain in words what each of these coefficients tells you about the relationship
between time spent watching TV and studying.
Problem 4. Regression Analysis
Using the data from the previous problem, find and write out the regression equation for the
regression of Y on X. (10 points)
a. Does this regression line have a positive or a negative slope?
b. In words, what does the slope mean?
c. In words, what does the intercept mean?
d. Using this equation, calculate the predicted value of Y for the first student (X=1, Y=2).
e. Calculate the error of estimate for this student.
f. In words, what does this number mean?
Problem 5. Using SPSS and the gss2012.sav dataset, construct the frequency distribution for
the variable CHLDIDEL. Answer the following questions:
a. The largest percentage of people think that the ideal number of children is ________.
b. _______ % think that 7 or more children is ideal.
c. Based on this frequency distribution, what is the level of measurement of this variable?
Problem 6. Using SPSS and the gss2012.sav dataset, calculate measures of central tendency
and variability to describe the variable MAEDUC. Answer the following questions.
a. On average, the mean education of respondents’ mothers is ________. The median education
of mothers is ________.
b. The standard deviation for the respondents’ mothers’ number of years of education is
_______.
c. The mode for the respondents’ mothers’ education is ________.
Problem 7. Using SPSS and the gss2012.sav dataset, construct pie charts for the variables
HAPPY and HEALTH. Make sure to include percentage labels in the charts. Does a larger
percentage of people report being “very happy” or report being in excellent health?
Problem 8. Using SPSS and the gss2012.sav dataset, construct a histogram with a normal curve
for the variable MAEDUC. Is the distribution of MAEDUC symmetric (normal) or skewed,
and if skewed, what is the direction of that skew?
Problem 9. Using SPSS and the gss2012.sav dataset, calculate the correlation coefficient for the
variables MAEDUC and EDUC. Answer the following questions.
a. Correlations can vary from _____ to _____, with _______ meaning there is no relationship.
b. The value of the correlation coefficient between MAEDUC and EDUC is _______.
c. What does this number tell you about the relationship between these two variables?
Problem 10. Using SPSS and the gss2012.sav dataset, regress REALINC (family income) on
EDUC. Create a scatterplot for these two variables and display the regression line on the
scatterplot. What is the predicted change in family income if education increases by one year?
Multiple Choice Questions:
1. If a distribution has a few very high scores, the best measure of central tendency will be:
a. mode
b. median
c. mean
d. range
e. interquartile range
2. The sum of deviation scores, (X − X̄):
a. will equal zero only if there is no variation in the variable
b. will always equal zero
c. will equal zero only if the distribution is symmetric
d. will always equal one
e. will equal one only if the standard deviation equals one
3. For the nominal level of measurement, the best
measure of central tendency is:
a. mode
b. median
c. mean
d. variance
e. range
4. The distribution on this histogram looks:
[Figure: histogram; x-axis: Income, y-axis: Frequency]
a. positively skewed
b. negatively skewed
c. positively kurtotic
d. negatively kurtotic
e. normal
5. This scatterplot indicates that there is:
[Figure: scatterplot; x-axis: Variable X, y-axis: Variable Y]
a. no relationship between X and Y
b. strong positive correlation between X and Y
c. strong negative correlation between X and Y
d. weak positive correlation between X and Y
e. weak negative correlation between X and Y
6. If Y is regressed on X and the slope of a regression line equals 4:
a. when X increases by 4, Y will increase by 4
b. when X decreases by 4, Y will decrease by 4
c. when X increases by 1, Y will increase by 4
d. when Y increases by 1, X will increase by 4
e. when X increases by 1, Y will decrease by 4
7. Coefficient of determination:
a. ranges from -1 to 1
b. cannot be zero
c. shows the proportion of unique variance in X
d. cannot exceed 1
e. is calculated by squaring the coefficient of
alienation
8. If a value has a z-score of -2:
a. it is two variances away from the mean of the normal distribution
b. it is two standard deviations away from the mean of the normal distribution
c. its range is two
d. it has two degrees of freedom
e. it is non-existent because z-scores cannot be negative
9. The mean of a normal distribution is also:
a. its variance
b. its standard deviation
c. its median and mode
d. its z-score
e. its range
10. Based on the following frequency distribution, what is the level of measurement for this
variable?
a. Ordinal
b. Normal
c. Interval
d. Leptokurtic
e. Nominal
```
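
For checking hand calculations on Problems 1 and 2, the frequency distribution and the basic descriptive measures can be sketched in Python. This is a minimal sketch, not part of the exam: the names `meals`, `frequency_table`, and `summary` are hypothetical, and the variance and standard deviation use the sample (N − 1) formulas, which is an assumption; if the course divides by N, `statistics.pvariance` and `statistics.pstdev` would apply instead.

```python
from collections import Counter
import statistics

# Data from Problem 1: meals/snacks reported by 10 people.
meals = [5, 5, 3, 7, 2, 4, 5, 4, 5, 5]


def frequency_table(values):
    """Return (value, count, percentage) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], 100 * counts[v] / n) for v in sorted(counts)]


table = frequency_table(meals)
print(f"{'Value':>5} {'Count':>5} {'Percent':>8}")
for value, count, pct in table:
    print(f"{value:>5} {count:>5} {pct:>7.1f}%")

# Measures of central tendency and variability (Problem 2).
# Assumption: sample formulas (divide by N - 1) for variance and SD.
summary = {
    "mean": statistics.mean(meals),
    "median": statistics.median(meals),
    "mode": statistics.mode(meals),
    "range": max(meals) - min(meals),
    "sample_variance": statistics.variance(meals),
    "sample_sd": statistics.stdev(meals),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

The percentages in `table` are the heights to use for the bar chart requested in Problem 1.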
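
The calculations in Problems 3 and 4 can likewise be checked with a short script. The sketch below computes Pearson's r from sums of squares and cross-products, then the least-squares line for Y regressed on X. Two conventions are assumptions here, since the exam does not state them: the coefficient of alienation is taken as 1 − r² (some texts define it as the square root of that quantity), and the error of estimate is taken as observed Y minus predicted Y. All variable names are hypothetical.

```python
import math

# Data from Problem 3.
tv_hours = [1, 3, 2, 4, 2, 0, 2, 3, 1, 1]      # X
study_hours = [2, 0, 2, 1, 0, 3, 2, 2, 4, 3]   # Y

n = len(tv_hours)
mean_x = sum(tv_hours) / n
mean_y = sum(study_hours) / n

# Sums of squared deviations and the sum of cross-products.
ss_x = sum((x - mean_x) ** 2 for x in tv_hours)
ss_y = sum((y - mean_y) ** 2 for y in study_hours)
sp_xy = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(tv_hours, study_hours))

# Problem 3: correlation, determination, alienation.
r = sp_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2
alienation = 1 - r_squared  # assumption: defined as 1 - r^2

# Problem 4: regression of Y on X, Y' = intercept + slope * X.
slope = sp_xy / ss_x
intercept = mean_y - slope * mean_x

# Predicted value and error of estimate for the first student (X=1, Y=2).
predicted = intercept + slope * tv_hours[0]
error = study_hours[0] - predicted  # assumption: observed minus predicted

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, alienation = {alienation:.4f}")
print(f"Y' = {intercept:.4f} + ({slope:.4f}) * X")
print(f"predicted = {predicted:.4f}, error = {error:.4f}")
```

Computing the slope as the cross-product sum divided by the sum of squares for X gives the same value as r multiplied by the ratio of the standard deviations of Y and X, so either hand formula can be checked against this output.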