9 Introduction to the t-statistic

Introduction to the t Statistic
What were the formulae used for the z statistic?
 z = (M – μ) / σm
 σm = σ/√n
 σ = √(SS/N), where N is the number of scores in the population
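As a minimal sketch, the z formulae can be written in Python. The function name and the numbers passed in are hypothetical, chosen only to exercise the formula; they do not come from the slides.

```python
import math

def z_statistic(sample_mean, mu, sigma, n):
    """z = (M - mu) / sigma_M, where sigma_M = sigma / sqrt(n)."""
    sigma_m = sigma / math.sqrt(n)  # standard error of the mean
    return (sample_mean - mu) / sigma_m

# Hypothetical values: M = 105, mu = 100, sigma = 15, n = 9
z = z_statistic(105, 100, 15, 9)
print(round(z, 2))
```

Note that the function needs σ as an input, which is exactly the value we rarely have.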
What Do You Notice About These Formulae?

They are based on population parameters
 How often do you think these population parameters are known?




 Well, we have said that the sample mean is usually a good estimate of the population mean
 Therefore finding μ is not generally a problem we worry about
What about σ and σm?
These are usually unknown as well, so the z formula cannot be used as it stands
So, What Do We Do?
 When we do not know the population variability, we use t tests
The Story…
The t statistic was introduced by William Sealy Gosset for
cheaply monitoring the quality of beer brews. "Student"
was his pen name. Gosset was a statistician for the
Guinness brewery in Dublin, Ireland, and was hired due
to Claude Guinness's innovative policy of recruiting the
best graduates from Oxford and Cambridge to apply
biochemistry and statistics to Guinness' industrial
processes. Gosset published the t test in Biometrika in
1908, but was forced to use a pen name by his employer
who regarded the fact that they were using statistics as a
trade secret. In fact, Gosset's identity was unknown not
only to fellow statisticians but to his employer—the
company insisted on the pseudonym so that it could turn
a blind eye to the breach of its rules.
From Wikipedia
… more on Gosset…
 Gosset was a chemist and was
responsible for developing procedures for
ensuring the similarity of batches of
Guinness. The t-test was developed as a
way of measuring how closely the yeast
content of a particular batch of beer
corresponded to the brewery's standard.
From http://ccnmtl.columbia.edu/projects/qmss/t_about.html
Why t-test?
Student's distribution arises when (as in nearly all practical
statistical work) the population standard deviation is
unknown and has to be estimated from the data.
Textbook problems treating the standard deviation as if it
were known are of two kinds:


(1) those in which the sample size is so large that one may treat
a data-based estimate of the variance as if it were certain, and
(2) those that illustrate mathematical reasoning, in which the
problem of estimating the standard deviation is temporarily
ignored because that is not the point that the author or instructor
is then explaining.
From Wikipedia
The t Statistic

The t statistic is used to test hypotheses about
an unknown population mean (μ) when the value
of σ is unknown. The formula for the t statistic
has the same structure as the z-score formula,
except that the t statistic uses the estimated
standard error in the denominator:
 t = (M – μ)/sm
 sm = s/√n = √(s²/n)
 s = √[SS/(n – 1)] = √(SS/df)
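The three formulae above chain together naturally. Here is a minimal Python sketch; the function name and the sample scores are hypothetical and used only for illustration.

```python
import math

def t_statistic(scores, mu):
    """Return (t, df) for a one-sample t test of H0: population mean = mu."""
    n = len(scores)
    m = sum(scores) / n                      # sample mean M
    ss = sum((x - m) ** 2 for x in scores)   # sum of squared deviations SS
    df = n - 1                               # degrees of freedom
    s = math.sqrt(ss / df)                   # sample standard deviation s
    s_m = s / math.sqrt(n)                   # estimated standard error sm
    return (m - mu) / s_m, df

# Hypothetical scores tested against a hypothetical mu = 5
t, df = t_statistic([4, 6, 8, 10, 12], mu=5)
print(round(t, 2), df)
```

Unlike the z formula, everything in the denominator is computed from the sample itself.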
What is the Estimated Standard Error (sm)?
 The estimated standard error (sm) is used
as an estimate of the real standard error
(σm) when the value of σ is unknown. It is
computed from the sample variance or
sample standard deviation and provides
an estimate of the standard distance
between a sample mean M and the
population mean μ.
What Are The Degrees of Freedom (df)?
 Degrees of freedom describe the number
of scores in a sample that are independent
and free to vary. Because the sample
mean places a restriction on the value of
one score in the sample, there are n-1
degrees of freedom for the sample.
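One way to see the restriction: once the sample mean and n – 1 of the scores are fixed, the last score has no freedom left. A small Python sketch with hypothetical numbers:

```python
def last_score(free_scores, sample_mean):
    """Given n-1 freely chosen scores and the sample mean, return the forced nth score."""
    n = len(free_scores) + 1
    # The scores must sum to n * M, so the last one is whatever is left over.
    return n * sample_mean - sum(free_scores)

# Hypothetical sample: n = 4, M = 5, first three scores chosen freely
forced = last_score([3, 7, 4], 5)
print(forced)  # 6
```

Three scores were free to vary and the fourth was determined, so df = 4 – 1 = 3.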
Describe the Shape of the t Distribution

The t distribution is leptokurtic (heavier in the tails than the normal curve), but as df gets larger, it more
closely resembles the normal curve
 This is due to the fact that sm more closely
estimates σm when the df gets very large
 Once df is sufficiently large, t is approximately distributed as z
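To put a number on "leptokurtic", the sketch below uses the standard result that the excess kurtosis of the t distribution is 6/(df – 4) for df > 4 (the normal curve has excess kurtosis 0). This formula is not stated on the slides; it is brought in here only as an illustration.

```python
def t_excess_kurtosis(df):
    """Excess kurtosis of Student's t; the formula 6 / (df - 4) holds only for df > 4."""
    if df <= 4:
        raise ValueError("excess kurtosis is finite only for df > 4")
    return 6 / (df - 4)

# The value shrinks toward 0 (the normal curve) as df grows
for df in (5, 10, 30, 100):
    print(df, round(t_excess_kurtosis(df), 3))
```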

What is the two-tailed critical value (tcrit) for α = .05 and df = 6?
• 2.447

One-tailed?
• 1.943
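Critical values are normally read from a t table. As a rough check, the sketch below recovers them numerically using only the Python standard library: it integrates the t density with Simpson's rule and finds the cut-off by bisection. The function names, step count, and search range are assumptions made for this sketch, not a standard routine.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0: one half minus the Simpson's-rule area on [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_crit(alpha, df, tails=2):
    """Critical t by bisection; the 0..100 search range suits moderate df and alpha."""
    target = alpha / tails
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > target:
            lo = mid   # too much area remains in the tail, move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_crit(0.05, 6), 3))            # two-tailed, df = 6
print(round(t_crit(0.05, 6, tails=1), 3))   # one-tailed, df = 6
print(round(t_crit(0.05, 1000), 3))         # large df, close to the z value 1.96
```

The first two results should agree with the tabled 2.447 and 1.943 to about three decimals, and the large-df value shows t approaching z.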
How Did Gosset Use His Test?

He had to find out if the beer that was brewed
met the brewery standards for the yeast content
 First, he would take samples of the beer from
each vat and determine the yeast content
 With this data he would know the desired yeast
content (μ) as set by factory standards, the
mean yeast content for the samples (M), and the
sample standard deviation (s) for yeast content
Let's See What Gosset Might Have Seen…
 What if there are usually 15 grams of yeast per bottle of Guinness?
 We take nine samples of beer from a vat and we get readings of {7, 12, 11, 15, 7, 8, 15, 9, 6}
 Does this vat have a significantly different (α = .05) level of yeast than what Guinness wants?
Step 1: State Your Hypotheses
 Null: H0: μ = 15
 Alternative: H1: μ ≠ 15
 State your alpha: α = .05
Step 2: Find tcrit
 First find the df: df = n – 1 = 9 – 1 = 8
 Find the two-tailed critical t value for df = 8 and α = .05: tcrit = ±2.306
Step 3: Sample Data and Test Statistics
 Mean = 10
 SS = 94
 s² = 11.75
 s = 3.43
 sm = 1.14
 t = -4.39
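The Step 3 values can be reproduced from the nine readings with a few lines of Python. This is a sketch using the slide's own data; the variable names are mine. One thing to notice: the slide's t = -4.39 comes from dividing by the rounded sm = 1.14, while carrying full precision gives about -4.38.

```python
import math

scores = [7, 12, 11, 15, 7, 8, 15, 9, 6]   # yeast readings from the vat
mu = 15                                     # brewery standard (H0: mu = 15)

n = len(scores)
m = sum(scores) / n                         # sample mean M
ss = sum((x - m) ** 2 for x in scores)      # sum of squares SS
df = n - 1                                  # degrees of freedom
s2 = ss / df                                # sample variance
s = math.sqrt(s2)                           # sample standard deviation
s_m = s / math.sqrt(n)                      # estimated standard error
t = (m - mu) / s_m                          # t statistic, unrounded intermediates

print(m, ss, s2, round(s, 2), round(s_m, 2), round(t, 2))
```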
Step 4: Make a Decision
 Is our observed t (tobs) greater than, or less than, the critical value for t (tcrit)?
 |tobs| = 4.39 is greater than tcrit = 2.306 (two-tailed, df = 8, α = .05), therefore we make the decision to reject H0
 t(8) = -4.39, p < .05
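The decision rule for a two-tailed test can be sketched as a one-line comparison. The function name is hypothetical; 2.306 is the tabled two-tailed critical value for df = 8 and α = .05.

```python
def decide(t_obs, t_crit):
    """Two-tailed rule: reject H0 when |t_obs| exceeds the critical value."""
    return "reject H0" if abs(t_obs) > t_crit else "fail to reject H0"

decision = decide(-4.39, 2.306)
print(decision)  # reject H0
```

The sign of t is ignored here because a two-tailed test looks for a difference in either direction.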
Measuring Effect Size
 How did we measure effect size before?
 Mean difference over standard deviation
 Therefore…
 Here, estimated d = mean difference / sample standard deviation
Measuring Effect Size (Take Two!)
 We can measure effect size by looking at the proportion of variance accounted for
 This is sometimes called PRE, or Proportional Reduction in Error
 Two ways of calculating this
1. Variability accounted for / total variability
2. r² = t²/(t² + df)
Effect Size
Cohen’s d = mean difference/standard deviation
 5/3.43 = 1.46

r² = Variability accounted for / total variability
 r² = t²/(t² + df)
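Both effect-size measures can be computed for the beer data in a short Python sketch (variable names are mine; unrounded intermediate values are used throughout).

```python
import math

scores = [7, 12, 11, 15, 7, 8, 15, 9, 6]   # yeast readings from the vat
mu = 15                                     # brewery standard

n = len(scores)
m = sum(scores) / n
df = n - 1
s = math.sqrt(sum((x - m) ** 2 for x in scores) / df)
t = (m - mu) / (s / math.sqrt(n))

d = abs(m - mu) / s            # estimated Cohen's d
r2 = t ** 2 / (t ** 2 + df)    # proportion of variance accounted for

print(round(d, 2), round(r2, 2))
```

For this sample d is about 1.46 and r² is about 0.71, so the effect is large by either measure.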
Confidence Intervals
 Point Estimate
 Interval Estimate
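The slide names the two kinds of estimate without giving a formula. A common choice, assumed here, is to use M as the point estimate and M ± tcrit × sm as the interval estimate. The sketch applies that to the beer data, with the tabled two-tailed tcrit = 2.306 for df = 8 giving a 95% interval.

```python
import math

scores = [7, 12, 11, 15, 7, 8, 15, 9, 6]   # yeast readings from the vat
t_crit = 2.306                              # tabled value: df = 8, alpha = .05, two-tailed

n = len(scores)
m = sum(scores) / n                                      # point estimate of mu
s = math.sqrt(sum((x - m) ** 2 for x in scores) / (n - 1))
s_m = s / math.sqrt(n)                                   # estimated standard error

lower = m - t_crit * s_m                                 # interval estimate, lower bound
upper = m + t_crit * s_m                                 # interval estimate, upper bound

print(m, round(lower, 2), round(upper, 2))
```

The interval runs from roughly 7.37 to 12.63 and does not contain 15, which agrees with rejecting H0 in the two-tailed test.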
Directional Hypotheses
 When is a directional hypothesis justified?
 When there is clear theoretical support for a one-tailed test.
 This is done through a literature review of past findings, not simply well thought out logic
 What are examples of directional hypotheses?
 How do we use directional hypotheses?
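As one possible illustration, suppose past findings gave reason to predict that a vat has less yeast than the standard (H1: μ < 15). This directional hypothesis is hypothetical and is not taken from the slides. The whole α then sits in one tail, so the tabled one-tailed critical value for df = 8 and α = .05 is 1.860, and the sign of t now matters.

```python
def one_tailed_decision(t_obs, t_crit, direction):
    """direction is 'less' (H1: mu < mu0) or 'greater' (H1: mu > mu0)."""
    if direction == "less":
        reject = t_obs < -t_crit    # only large negative t counts as evidence
    elif direction == "greater":
        reject = t_obs > t_crit     # only large positive t counts as evidence
    else:
        raise ValueError("direction must be 'less' or 'greater'")
    return "reject H0" if reject else "fail to reject H0"

print(one_tailed_decision(-4.39, 1.860, "less"))
print(one_tailed_decision(-4.39, 1.860, "greater"))
```

The same t leads to rejection when the prediction is "less" but not when it is "greater", which is why the direction has to be justified before looking at the data.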