Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
Review of Statistics
(2)
Policy question: What is the effect on test scores (or some other outcome measure) of reducing class size by one student per class? by 8 students/class?
We must use data to find out
The California Test Score Data Set
All K-6 and K-8 California school districts ( n = 420)
Population
The group or collection of all possible entities of interest
We will think of populations as infinitely large (
is an approximation to “very big”)
Random Variable Y
(1)
Numerical summary of a random outcome
1
Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
Population distribution of Y
The probabilities of different values of Y that occur in the population
or: The probabilities of sets of these values
Moments of a population distribution:
Two random variables X and Z have a joint distribution.
cov( X , Z ) = E [( X –
X
)( Z –
Z
)] =
XZ
The covariance is a measure of the linear association between X and Z ; its units are units of X
units of Z
cov( X , Z ) > 0 means a positive relation between X and Z
If X and Z are independently distributed, then cov( X , Z ) = 0
The covariance of a r.v. with itself is its variance: cov( X , X ) = E [( X –
X
)( X –
X
)] = E [( X –
X
) 2 ] =
2
X
2
Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
The correlation coefficient is defined in terms of the covariance: corr( X , Z ) = cov( , ) var( X Z
X
XZ
Z
=
ρ
XZ
–1
corr( X , Z )
1
corr( X , Z ) = 1 mean perfect positive linear association
corr( X , Z ) = –1 means perfect negative linear association
corr( X , Z ) = 0 means no linear association
The correlation coefficient measures linear association
3
Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
Conditional distributions and conditional means
Conditional distributions
The distribution of Y , given value(s) of some other random variable, X
Conditional expectations and conditional moments
conditional mean = mean of conditional distribution
= E ( Y | X = x )
conditional variance = variance of conditional distribution
The difference in means is the difference between the means of two conditional distributions:
4
Initial look at the data:
Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
This table doesn’t tell us anything about the relationship between test scores and the STR .
5
Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
Do districts with smaller classes have higher test scores?
Scatterplot of test score vs. student-teacher ratio for n =420
What does this figure show?
6
Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
We need to get some numerical evidence on whether districts with low STRs have higher test scores – but how?
1.
Estimation -
2.
Hypothesis testing –
3.
Confidence Interval -
7
Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
Initial data analysis: Compare districts with “small” (STR < 20) and “large” (STR ≥ 20) class sizes:
Class
Size
Average score ( Y ) Standard deviation
( s B
Y B
) n
Small
Large
657.4
650.0
19.4
17.9
238
182
1.
Estimation of
= difference between group means
2.
Test the hypothesis that
= 0
3.
Construct a confidence interval for
1. Estimation
Is this a large difference in a real-world sense?
Is this a big enough difference to be important for school reform discussions, for parents, or for a school committee?
8
Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
2. Hypothesis testing
Difference-in-means test: compute the t -statistic, t
Y s
Y l s
2 s n s
s l
2 n l
Y
( s s
Y
l
Y l
) where SE (
Y s
–
Y l
) is the “standard error” of
Y – s
Y , the subscripts s l and l refer to “small” and “large” STR districts, and s s
2
n s
1
1 i n s
1
( Y i
Y s
) 2
Compute the difference-of-means t -statistic:
Size Stdev n small large
Y
657.4
650.0
19.4
17.9
238
182
9
Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
3. Confidence interval
A 95% confidence interval for the difference between the means is,
(
Y s
–
Y l
)
1.96
SE ( Y s
– Y ) l
Two equivalent statements:
10
Estimation and the Population Mean
Y is the natural estimator of the mean. But:
Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
(a)
(b) Why should we use Y rather than some other estimator?
Which is the best estimator?
Desirable properties of estimators
A.
Unbiasedness: the expected value of the estimator is equal to the actual parameter.
1.
We want to estimate
, the true population parameter.
2.
Instead we end up with q ˆ
the estimator
3.
E ( ˆ )
= q Þ unbiasedness
B.
Efficient: The best use of information available to us. This means the estimator has the smallest variance possible
11
Intro to Econometrics/Econ 526
Fall 2014/ Manopimoke
C.
Consistency - Estimator converges to the true parameter
1.
estimator of
2.
A formal statement of consistency: as n
,
>0 . This is a result of the law of large numbers.
Is 𝑌̅ a desirable estimator of µ
Y
? that the average squared differences between the observations and
Y are the smallest among all possible estimators
Y is the “least squares” estimator of
Y
Y sum of squared “residuals”
12