Soc 5811 Lab #4
I. Welcome
1. Third problem set due Monday, October 11 at start of class
2. Review second problem set
II. Objectives
1. Learn how to draw sub-samples from the GSS dataset.
2. Begin inferential statistics and confidence intervals.
III. Sampling and Inferential Statistics (review from last week)
1. Inferential statistics involves making generalizations about a population using
information from a sample along with statistical laws. For example, information
in the General Social Survey is used to make broader generalizations about the
American population. We will draw random samples of the 2002 General Social
Survey to get a “hands-on” feel for how samples and populations are related.
2. First, a review on population and sampling distributions…
a. What is the difference between a sample and a population? What is a
random sample?
b. What notation do we use for population parameters and sample statistics?
c. Recall that the sampling distribution of the mean is made up of mean
estimates from all possible samples of a fixed size. Because we rarely
know the sampling distribution, we often think of it as a probability
distribution. We can then use the probability distribution to judge how
close our population estimate is likely to be. (There will be more on this
next week.)
2. For now, we will treat the GSS as our population. Calculate the mean and
standard deviation for variable hrs1, or the number of hours the respondent
worked last week, for the entire GSS.
3. Next, draw a random sub-sample of 1% of cases, and recalculate the mean and
standard deviation for hours worked last week. Record the mean and standard
deviation. We will need it later.
4. Repeat this process nine times, selecting a new sub-sample each time. Then,
open up a new SPSS data file and create two new variables, mean1 and stdev1.
Enter the 10 values for each mean and standard deviation for each sub-sample.
While this is not an actual sampling distribution because we have not taken all
possible 10-case samples, it does give us a flavor for how a sampling distribution works.
5. Calculate the mean and standard deviation of the sub-sample means and
standard deviations. How close are they to the actual mean of the survey?
6. Repeat steps 3-5 with random sub-samples of 10% of cases. Enter the new
variables, mean2 and stdev2, into your new dataset. Compare the means and
standard deviations of the two sub-sample sizes. What did you find? Which set
of mean estimates is more spread out? Which is likely to provide a better
estimate of the population mean?
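The sub-sampling exercise in steps 2-6 can also be sketched outside SPSS. The Python example below is only an illustration under stated assumptions: the "population" is 2,000 made-up values standing in for hrs1 (not GSS data), and the helper name subsample_means is hypothetical. It draws ten 1% sub-samples and ten 10% sub-samples, then reports the mean and standard deviation of each set of sub-sample means so the two spreads can be compared with the population mean.

```python
import random
import statistics

random.seed(5811)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for hrs1: 2,000 made-up "hours worked" values.
# These are NOT GSS data; they only illustrate the procedure.
population = [min(89, max(1, round(random.gauss(42, 14)))) for _ in range(2000)]

def subsample_means(pop, fraction, draws=10):
    """Draw `draws` random sub-samples of the given fraction; return their means."""
    n = max(2, round(len(pop) * fraction))
    return [statistics.mean(random.sample(pop, n)) for _ in range(draws)]

pop_mean = statistics.mean(population)
means_1pct = subsample_means(population, 0.01)    # plays the role of mean1
means_10pct = subsample_means(population, 0.10)   # plays the role of mean2

# Spread of each set of sub-sample means
spread_1pct = statistics.stdev(means_1pct)
spread_10pct = statistics.stdev(means_10pct)

print(f"population mean: {pop_mean:.2f}")
print(f"1% samples:  mean of means {statistics.mean(means_1pct):.2f}, "
      f"sd of means {spread_1pct:.2f}")
print(f"10% samples: mean of means {statistics.mean(means_10pct):.2f}, "
      f"sd of means {spread_10pct:.2f}")
```

Changing the seed or the number of draws gives a different set of sub-samples, much like repeating the Select cases step in SPSS.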
IV. Confidence Intervals
1. A confidence interval is the “range of values around a point estimate that
makes it possible to state the probability that an interval contains the population
parameter between its lower and upper bounds” (Bohrnstedt & Knoke p.90). In
short, a confidence interval is the range of values in which our population
parameter is likely to fall. We calculate a confidence interval using our sample
mean and our best estimate of the standard error.
a. What is the formula for a 95% confidence interval?
2. When we have a large sample size, we can use a Z-table to determine the
number of standard errors associated with the probability of our confidence
interval. What is the general formula for confidence intervals with a large N?
3. When our N is small, we cannot assume that our sampling distribution is
normal. We can then use the Student’s T-Distribution to determine the critical
values necessary to create the confidence interval. What is the formula for
calculating a confidence interval for a small sample?
4. Find the mean and standard deviation for variable hrs1, the number of hours
worked last week. Calculate the 95% confidence interval by finding the correct
critical value in the Student’s T-Distribution table. Also, make a visual
representation of the interval by drawing the band around the sample mean in
which the population mean is likely to fall.
5. Next, calculate the confidence interval using SPSS (see instructions). A quick
preview of hypothesis testing: what if we hypothesized that the population mean
for hours worked is 40 hours? Can we be confident in this estimate?
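As a rough check on hand calculations, the interval arithmetic in this section can be sketched in Python. This is an illustration only: the ten values are made up (not GSS data), and the t critical value of 2.262 is the table entry for a two-tailed alpha of .05 with 9 degrees of freedom. The sketch builds the interval as the sample mean plus or minus a critical value times the estimated standard error, once with a Z critical value and once with a t critical value.

```python
import math
import statistics

# Hypothetical small sample of hours worked last week (made up, not GSS data).
hours = [40, 45, 38, 50, 42, 36, 40, 55, 30, 44]

n = len(hours)
mean = statistics.mean(hours)
sd = statistics.stdev(hours)      # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)            # estimated standard error of the mean

# Large-N approach: critical value from the standard normal (Z) distribution.
z_crit = statistics.NormalDist().inv_cdf(0.975)   # about 1.96 for 95%

# Small-N approach: critical value read from a Student's t table,
# two-tailed alpha = .05, df = n - 1 = 9.
t_crit = 2.262

z_interval = (mean - z_crit * se, mean + z_crit * se)
t_interval = (mean - t_crit * se, mean + t_crit * se)

print(f"mean = {mean}, sd = {sd:.3f}, se = {se:.3f}")
print(f"95% CI using Z: {z_interval[0]:.2f} to {z_interval[1]:.2f}")
print(f"95% CI using t: {t_interval[0]:.2f} to {t_interval[1]:.2f}")
```

With only 10 cases the t interval is the appropriate one; the Z interval is shown only to make the difference in width visible.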
V. Problem Solving
1. Suppose President Bush plans to provide tax breaks for people who make
below the mean income in the United States (I know, I know…just run with it).
His policy claims that the mean income in the United States is $18,000, but he
needs you to test whether this is a close estimate using data from the GSS. Calculate
and draw a 95% confidence interval around the sample mean for rincom98. Can
we be confident in Bush’s estimate?
2. Suppose President Bush also supports prayer in schools, arguing that
Americans don’t engage in enough religious activity (i.e., only 1 or 2 times a
month). Calculate and draw a 95% confidence interval around the sample mean
for variable relig80. Can we be confident in Bush’s assessment of religious
activity in the United States?
VI. Confidence Intervals and Test Values (if time allows)
1. Suppose we have reason to believe that the average number of hours per week
worked in the United States is 40. We can use this value as a test value when we
calculate the confidence interval (see instructions). SPSS then gives us the
t-value for our test value as well as the test value’s deviation from the upper and
lower limits of the confidence interval. We can then determine if our population
parameter estimate is close enough to the sample mean. Calculate the 95%
confidence interval for hrs1 with a test value of 40. What did you find?
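The test-value calculation can be sketched by hand in Python as well. The example below is a hedged illustration, not SPSS output: the data are made up (not GSS data), the t critical value of 2.262 comes from a t table for df = 9, and the variable names are hypothetical. It computes the t-value for a test value of 40 and a 95% interval around the difference between the sample mean and the test value. If that interval contains 0, the test value is consistent with the sample at the 95% level.

```python
import math
import statistics

# Hypothetical sample (made up, not GSS data), laid out like an hrs1 column.
hours = [40, 45, 38, 50, 42, 36, 40, 55, 30, 44]
test_value = 40      # hypothesized population mean from the lab text
t_crit = 2.262       # t table, two-tailed alpha = .05, df = 9

n = len(hours)
mean = statistics.mean(hours)
se = statistics.stdev(hours) / math.sqrt(n)   # estimated standard error

mean_diff = mean - test_value                 # sample mean minus test value
t_value = mean_diff / se                      # one-sample t statistic

# 95% confidence interval around the difference
diff_lower = mean_diff - t_crit * se
diff_upper = mean_diff + t_crit * se

# The test value is consistent with the sample if 0 lies inside the interval.
consistent = diff_lower <= 0 <= diff_upper

print(f"t = {t_value:.3f}, mean difference = {mean_diff}")
print(f"95% CI of the difference: {diff_lower:.2f} to {diff_upper:.2f}")
print("test value inside the interval" if consistent else "test value outside the interval")
```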
I. Select cases
1. Go to Data, choose Select cases.
2. There are a few different options (select a random sample, create a filter
variable, etc.). We will be selecting cases based on our desired sample size.
3. Click on Random sample of cases button.
4. Select exactly 10 cases (or however many cases are going to be in your sample).
Be sure to put the total number of cases in the second box.
5. Click Continue, Paste, and then Run.
6. To select all cases, go back to the Select cases window and click on All cases.
II. Calculating confidence intervals
1. Click on Analyze, Compare Means, One-Sample T-test.
2. Put variables for which you want to construct confidence intervals into the box.
3. Click on Options and select the confidence level for your interval.
4. To see if a particular estimate falls within the confidence interval, type the
estimate into the Test value box.
5. Confidence intervals can also be calculated in the Explore command under the
Descriptive Statistics drop-down window.