Soc. 504, Lecture 11: Introduction to Statistical Inference
A. Two ways we use statistics
1. describe the distribution of a variable or the relationship among variables within a population or a
sample
2. draw inferences about the distribution of a variable or the relationships between variables in the
population from data for a sample drawn from that population
a. infer the value of population parameter from sample statistic (confidence intervals)
b. test hypotheses about the value of the population parameter based on sample statistics
B. Statistical inference and randomness (chance)
1. We can make inferences or test hypotheses about the value of population parameters from sample
statistics only if some random process can influence the values of the sample data
a. probability sample: a sample in which we know the probability of every case in the
population being included in the sample
(1) randomization (random assignment)--experiments
(2) with observational data, the simplest type of probability sample is a simple random
sample in which every case in the population has an equal probability of being selected.
(a) stratified random sample
(b) different sampling fractions for different groups and weighting
(3) some researchers use statistical inference ritualistically even when no random process
could affect the values of their statistics because “statistical significance” has come to imply that
a result is noteworthy
(a) statistical inference is meaningless with “snowball” or other “convenience” samples
b. random measurement or coding error
2. statistical inference requires some random process that could affect the values of our variables
that allows us to use the laws of probability to assess the likelihood that the value of an observed
statistic is due to chance
C. Terminology, notation for doing statistical inference
1. Sample statistics versus population parameters
a. we refer to summary values for population data as parameters
(1) we typically denote them with Greek letters (e.g., σ for the standard deviation, σ² for the variance, μ
for the mean, ρ (“rho”) for the correlation)
(2) we sometimes denote them with capital letters (N)
b. we refer to summary values for sample data as statistics
(1) we typically denote them with letters of the Roman alphabet (e.g., X̄ for the mean of the Xi, s for the standard
deviation, s² for the variance, r for the correlation coefficient)
(2) we sometimes use lower-case letters (n for sample size)
2. Distributions
a. frequency and percentage distributions for variables (based on sample or population data)
b. normal and standard normal distributions
c. probability distributions
d. sampling distributions
D. Probability and probability distributions
1. Probability is the long-run relative frequency with which an outcome occurs.
2. pi = fi/N
3. A probability distribution is analogous to a frequency or a percentage distribution: it lists every
possible outcome of an event along with its likelihood
a. we can describe a probability distribution by its mean, variance, and shape
(1) the mean of a probability distribution is its expected value
(a) E(Y) = Σ piYi
(2) variance of a probability distribution:
σ² = Σ pi[Yi − E(Y)]² or Σ pi(Yi − μ)²
(3) standard deviation of a probability distribution:
σ = √( Σ pi[Yi − E(Y)]² )
b. e.g.: the probability distribution of the outcomes of throwing an honest (6-sided) die
(1) its mean (or expected value) = Σ piYi = 21/6 = 3.5
(2) its variance = the sum of the probability-weighted squared deviations of the scores from the expected value: σ² = Σ pi[Yi − E(Y)]² = 2.92
and σ = √2.92 = 1.7 (checked in the sketch below)
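As a quick check, here is a minimal Python sketch (my addition, not part of the original notes) that applies the formulas in 3.a to the die example:

    # Probability distribution of one throw of an honest six-sided die
    outcomes = [1, 2, 3, 4, 5, 6]
    probs = [1/6] * 6  # each outcome is equally likely

    # Expected value: E(Y) = sum of pi * Yi
    mean = sum(p * y for p, y in zip(probs, outcomes))

    # Variance: sum of pi * (Yi - E(Y))**2, and its square root
    variance = sum(p * (y - mean) ** 2 for p, y in zip(probs, outcomes))
    sd = variance ** 0.5

    print(mean, round(variance, 2), round(sd, 2))  # 3.5 2.92 1.71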
E. The effect of sample size on probability distributions
1. The bigger your sample, the closer the value of any sample statistic will be to its corresponding
population parameter. Our textbook calls this relationship the “empirical rule”. It is usually called
the “law of large numbers” (“law of large samples”)
2. Consider the proportion of females [p(f)] in a sample and how it changes with the size of the
sample we take from the population.
3. Like other sample statistics, p(f) is a random variable: given random sampling, its value will vary
across samples
4. Example: Consider the probability distribution of the number of females in a random sample from a population in
which p = .5 as the size of the sample gets larger
a. If we randomly draw a one-person sample from this pop., we get the following probability
distribution:

Y    pf    pf·Yf
0    ½     0
1    ½     0
Σ    1     ½

The expected number of females in a one-person sample, E(Y) = Σ pf·Yf, is ½
b. the probability distribution if we randomly draw a 2-person sample
(1) four possible outcomes: FF, FM, MF, MM, each with a 1/4 probability
(2) the probability distribution looks like this:

#f    pf    pf·yf
0     ¼     0
1     ½     ½
2     ¼     ½
Σ     1     1

(3) In a 2-person sample we’d expect to get 1 female
c. What if we randomly draw a three-person sample?
(1) 8 possible 3-person samples (2³)
(2) probability distribution for a 3-person sample from a population in which pf = .5:

#f    pf     pf·yf
0     1/8    0
1     3/8    3/8
2     3/8    6/8
3     1/8    3/8
Σ     1      12/8 = 1.5

(3) In a 3-person sample we’d expect to get 1.5 females
d. probability distribution for the number of females from a random sample of 4 people
(1) 16 possible 4-person samples (2⁴):

#f    pf      pf·yf
0     1/16    0
1     4/16    4/16
2     6/16    12/16
3     4/16    12/16
4     1/16    4/16
Σ     16/16   32/16 = 2

(2) we would expect to get 2 females if n = 4
e. Notice that the probability of getting no females declines (1/2, 1/4, 1/8, 1/16) as the sample size increases (1, 2, 3, 4).
f. the formula for the likelihood of getting no females when pf = .5 is P(#f = 0) = .5ⁿ
(1) this says that the probability of this outcome depends only on the sample size
(2) the probability of getting no females in a 5-person sample = .5⁵ = 1/32.
(3) the probability of getting no females in a 10-person sample = .5¹⁰ = 1/1024.
(4) The probability of getting no females in a sample of 100 = .5¹⁰⁰
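A short Python sketch (my addition) reproducing these figures with the binomial formula, P(#f = k) = C(n, k)·pᵏ·(1 − p)ⁿ⁻ᵏ, of which P(#f = 0) = .5ⁿ is the special case when p = .5:

    from math import comb

    p = 0.5  # population proportion of females

    # Probability of drawing exactly k females in an n-person sample
    def prob_k_females(n, k):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    # Reproduces the 4-person table: 1/16, 4/16, 6/16, 4/16, 1/16
    for k in range(5):
        print(k, prob_k_females(4, k))

    # Probability of no females: .5**n, shrinking as n grows
    for n in (1, 2, 3, 4, 5, 10, 100):
        print(n, prob_k_females(n, 0))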
F. The normal distribution is bell-shaped, which means it is symmetrical with a single mode that equals the
mean and median
1. Shape of normal distribution
a. symmetrical
b. single mode
c. mean and median = mode
2. There are a large number of normal distributions with different means and different standard
deviations
3. A standard(ized) normal distribution is a normal distribution with a mean of 0 and a standard
deviation of 1.
a. the Z-score distribution is a standard normal distribution.
4. If Y is normally distributed, a fixed proportion of the area under the normal curve corresponds to each
z-score for Y
a. a z-score of –1.96 is always at the 2.5th percentile of the normal distribution and a z-score of
+1.96 is always at the 97.5th percentile, so the middle 95% of the area lies between them.
b. This means that we can assess the likelihood of any sample value by transforming it into a z-score and then using Table A to determine what proportion of the area under the normal curve
corresponds to that z-score
5. When we use Table A to find what proportion of the area under the normal curve corresponds to a
standard score, we are essentially using a nonlinear transformation
6. Review how Table A is set up
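For readers without Table A at hand, the same areas can be computed directly; here is a small Python sketch (my addition) using the standard normal cumulative distribution function:

    from math import erf, sqrt

    # Area under the standard normal curve to the left of z,
    # the quantity Table A tabulates (modulo the table's layout)
    def normal_cdf(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(normal_cdf(-1.96))                     # ≈ .025 (2.5th percentile)
    print(normal_cdf(1.96))                      # ≈ .975 (97.5th percentile)
    print(normal_cdf(1.96) - normal_cdf(-1.96))  # ≈ .95, the middle 95%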
G. Sampling distribution of a sample statistic
1. A sampling distribution of a sample statistic is a theoretical distribution of the values of a sample
statistic that we would get if we drew an infinite number of probability samples of a fixed size from
the same population.
2. A sampling distribution for a statistic is a probability distribution that shows all possible
values of a sample statistic for some variable along with the probability of getting each value.
3. You can think of a sampling distribution as a probability distribution for a sample statistic for
some variable (e.g., Ȳ, s, p, r, b, a)
4. You would get a sampling distribution for a statistic if you drew an infinite number of samples
from the population, calculated the statistic we’re interested in for each sample, kept track
of them, made a frequency distribution of the values of all the sample stats, and then
calculated the relative frequency (i.e., the probability) of each value. The resulting sampling
distribution would tell you the probability of drawing another sample in which the statistic of interest
would have a particular value
5. Statistical theory about the sampling distrib. tells us what the distribution of a statistic would look
like if we drew all possible samples, calculated a statistic for some variable Y for each sample, and
constructed the distribution for that statistic.
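Points 4 and 5 can be approximated by brute force with a large but finite number of samples; here is a simulation sketch (my addition, with an invented population) for the statistic from section E, the number of females in a 4-person sample:

    import random
    from collections import Counter

    random.seed(1)

    # Invented population: 0 = male, 1 = female, so pf = .5
    population = [0, 1] * 5000

    # Draw many samples of size 4; record the statistic (# of females)
    draws = 100_000
    counts = Counter()
    for _ in range(draws):
        counts[sum(random.sample(population, 4))] += 1

    # Relative frequencies approximate the sampling distribution in E.4.d:
    # 1/16, 4/16, 6/16, 4/16, 1/16
    for k in sorted(counts):
        print(k, counts[k] / draws)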
6. We need to know mean, variability and shape of the sampling distributions of a statistic
a. A sampling distrib. involves the same principle as a frequency distrib. Whereas a freq distrib.
presents the frequency of each of a set of possible values, a sampling distrib. for a statistic
like the mean or the proportion shows the probability of drawing a sample with each
possible value of the statistic in which you’re interested
7. We can describe the sampling distrib. of a normally distributed statistic by its mean and standard
deviation
a. the standard deviation of a sampling distribution for a statistic is called its standard error
(e.g., “the standard error of the proportion,” “the standard error of the mean”)
b. According to the Law of Large Numbers, the larger the size of the sample (n), the closer the
value of the sample statistic (p, Ȳ) will be to the population parameter (π, μ)
8. Central limit theorem: as n increases, the shape of the sampling distribution of a statistic
approaches a normal distribution with expected value E(Ȳ) = μ and a std error σȲ = σ/√n
[transparency]
a. Sample must have at least 30 observations for the sampling distrib. of the mean to be normal w/
E(Ȳ) = μ and std error σȲ = σ/√n
b. If n ≥ 30, the sampling distrib. of the mean will be normal, even for variables like income
whose distribution is not normal
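A simulation sketch of point b (my addition; the income figures are invented) showing that means of 30-person samples from a right-skewed, income-like variable still pile up symmetrically around μ:

    import random
    import statistics

    random.seed(2)

    # Invented right-skewed, income-like population (mean ≈ 40,000)
    population = [random.expovariate(1 / 40_000) for _ in range(100_000)]
    mu = statistics.mean(population)

    # Approximate the sampling distribution of the mean for n = 30
    sample_means = [statistics.mean(random.sample(population, 30))
                    for _ in range(20_000)]

    # The sample means center on mu, and mean ≈ median signals rough
    # symmetry, even though income itself is badly skewed
    print(mu, statistics.mean(sample_means))
    print(statistics.median(sample_means))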
9. Example: the sampling distribution of the mean
a. mean of the sampling distrib. of the mean: μȲ = μ
(1) any given sample mean Ȳ may be larger or smaller than the population mean, μ, but the
average sample mean across many random samples will equal the population mean μ
b. variance of sampling distrib. of the mean: σ²Ȳ = σ²/n
c. std dev of sampling distrib. of the mean: σȲ = σ/√n
e.g., μ = E(Ȳ) = 58, σ = 3.5, n = 125
(1) σȲ = 3.5/√125 = 3.5/11.18 = .313
(2) the standard error of the mean of the sampling distribution, σȲ, is much less than the
standard dev. in our sample.
d. the sample mean Ȳ obeys the law of large numbers: the larger the n, the smaller the standard
error
e. How big must a sample be before its sampling distribution is normal with E(Ȳ) = μ? Rule of
thumb: 30 cases [transparency]
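The arithmetic in 9.c, checked in Python (numbers taken from the example above):

    from math import sqrt

    sigma = 3.5  # population standard deviation, from 9.c
    n = 125      # sample size, from 9.c

    print(round(sigma / sqrt(n), 3))  # 0.313, the standard error of the mean

    # The law of large numbers at work: the standard error shrinks with n
    for n in (30, 125, 500, 2000):
        print(n, round(sigma / sqrt(n), 3))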
10. If a sampling distribution is normal, we know its mean and standard deviation, so we can use the
normal curve in Table A to determine the likelihood of getting a sample statistic of any particular
value.
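Putting points 9 and 10 together, a sketch (my addition; the sample mean of 58.6 is an invented value) of how to assess the likelihood of a particular sample mean:

    from math import erf, sqrt

    def normal_cdf(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    mu, sigma, n = 58.0, 3.5, 125  # population values from the example in 9.c
    se = sigma / sqrt(n)           # standard error of the mean, ≈ .313

    # How likely is a sample mean of 58.6 or higher? (58.6 is invented)
    z = (58.6 - mu) / se
    print(round(z, 2), round(1 - normal_cdf(z), 3))  # z ≈ 1.92, area ≈ .028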
11. Estimating a population parameter from a sample statistic
a. the statistical concepts of error and bias
b. a statistic is an unbiased estimator of a pop parameter if the expected value of the statistic
equals the value of the pop parameter; e.g., E(Ȳ) = μ; E(p) = π; but E(s) ≠ σ
c. even if a statistic is an unbiased estimator of a parameter, however, that doesn’t mean that it
is an error-free measure of the parameter.
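A simulation sketch of point b (my addition, with an invented normal population): the sample mean is unbiased, but s underestimates σ on average, especially in small samples:

    import random
    import statistics

    random.seed(3)

    mu, sigma, n, reps = 50.0, 10.0, 5, 50_000  # invented population values

    means, sds = [], []
    for _ in range(reps):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        sds.append(statistics.stdev(sample))  # s, with the n - 1 denominator

    print(statistics.mean(means))  # ≈ 50: E(Ybar) = mu, unbiased
    print(statistics.mean(sds))    # ≈ 9.4, below sigma = 10: E(s) ≠ sigma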