Statistics - Amine Ouazad

Statistics for Social
and Behavioral Sciences
Session #17:
Hypothesis Testing:
The Confidence Interval Method and the T-Statistic Method
(Agresti and Finlay, from Chapter 5 to Chapter 6)
Prof. Amine Ouazad
Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN
Week 1
PART II. DESCRIBING DATA
Weeks 2-4
PART III. DRAWING CONCLUSIONS FROM DATA:
INFERENTIAL STATISTICS
Weeks 5-9
Firenze or Lebanese Express’s ratings are within a MoE of each other!
PART IV. CORRELATION AND CAUSATION:
REGRESSION ANALYSIS
This is where we talk
about Zmapp and Ebola!
Weeks 10-14
Last Session
Hypothesis testing is the foundation of (social) sciences.
• Three typical types of hypothesis:
– A parameter is equal to …
– A parameter is greater than …
– A parameter is lower than …
• Null hypothesis (to be rejected), and alternative hypothesis.
• We provide evidence to reject a null hypothesis.
– We might not have evidence to reject the null hypothesis.
For a test on the population mean m:
• Confidence interval method for the test of H0 : m = v. Ha: m ≠ v.
– Reject H0 at the 5% significance level if the 95% confidence interval around the
sample mean does not include v.
– Reject H0 at the 10% significance level if the 90% confidence interval around
the sample mean does not include v.
Today
• Hypothesis testing in Statistics:
– The Confidence Interval method of testing m=v.
• An equivalent way of testing m = v:
– The t test (also invented by Mr Student).
Outline
1. Testing hypotheses
using the confidence interval method
(continued)
2. Testing hypotheses
using the t-test (absolutely equivalent)
Next time: one-sided t test of mean and proportion
Chapter 6 of A&F
Testing H0: m=v using confidence intervals
• H0: “The fraction of men in Abu Dhabi is 50%.” equivalently “m = 0.5”.
• By simple random sampling, gather N observations Xi=0,1.
• Build a confidence interval around the sample mean of the Xi.
– Same methods as seen in previous sessions.
• If the null hypothesis is true, only 5% of the 95% confidence intervals built
from repeated samples will not include 0.5.
• Thus if the null hypothesis is true, there is only a 5% probability that my
confidence interval will not include 0.5.
☞ Reject the null hypothesis if the confidence interval for m does not
include v.
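The 5% claim above can be checked by simulation. The sketch below is only an illustration: the sample size, number of repetitions, and seed are arbitrary assumptions (not from the slides), and the interval uses the large-sample formula m ± 1.96 * SE with SE = √(m(1 – m)/N) for 0/1 data.

```python
import math
import random

def ci_excludes(sample, v, z=1.96):
    """Return True if the 95% confidence interval built from `sample` excludes v."""
    n = len(sample)
    m = sum(sample) / n                  # sample mean (here: a sample proportion)
    se = math.sqrt(m * (1 - m) / n)      # large-sample standard error for 0/1 data
    return not (m - z * se < v < m + z * se)

rng = random.Random(17)                  # fixed, arbitrary seed for reproducibility
N, REPS, TRUE_P = 400, 2000, 0.5         # illustrative values; H0 is true by construction

rejections = 0
for _ in range(REPS):
    # Simple random sample of N observations Xi = 0 or 1
    sample = [1 if rng.random() < TRUE_P else 0 for _ in range(N)]
    if ci_excludes(sample, v=0.5):
        rejections += 1

rejection_rate = rejections / REPS
print(f"Share of intervals excluding 0.5 when H0 is true: {rejection_rate:.3f}")
```

The printed share should be close to 0.05: when the null hypothesis is true, the confidence interval method rejects it by mistake about 5% of the time.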
[Figure: Proportion Female of Juilliard Graduates, Total and By Section: 1947 to 1995]
[Figure: Female Share of New Hires in Four Orchestras, 1950s to 1990s]
Do Orchestras Prefer Hiring Men?
• Orchestras in the US are
overwhelmingly male.
At the Royal Festival Hall
We know the rate at which women are
hired in orchestras (the data is
surprisingly good):
• Women reach the first stage of
recruitment at a 17.1% rate.
• Women reach the second stage of
recruitment at a 56.8% rate.
• Women reach the finals at an 8.7% rate.
Overall, from the pool of all applicants,
women are hired at a 1.7% rate.
Conducting a little experiment…
• What if we were auditioning musicians for
hiring… behind a curtain, with a carpet, and
no talking allowed??
• Would that lead to a rate of hiring that is different from the
usual rate of hiring? (1.7%)
Orchestrating Impartiality: The Impact of “Blind” Auditions on
Female Musicians, National Bureau of Economic Research, January
1997.
Prof. Cecilia Rouse
Princeton
University
The data collected
Stage          | Rate of advancement | Sample size | Rate of advancement for women in all orchestras (known v)
Preliminaries  | 21.6%               | 222         | 17.1%
Semi-Finals    | 38.5%               | 65          | 56.8%
Finals         | 23.5%               | 17          | 8.7%
Hired          | 2.7%                | 445         | 1.7%
Building the confidence interval
• The confidence interval is written:
[ m – z0.05 * SE , m + z0.05 * SE ]
or
[ m – t0.05 * SE , m + t0.05 * SE ]
• The standard error SE = sX/√N.
• m: sample mean (known).
• sX: sample standard deviation (known).
• t0.05 or z0.05: from Table 5.1.
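As a rough illustration, the formula can be applied to the four rows of the data collected. The sketch below makes two assumptions that are not in the slides: it uses the large-sample critical value 1.96 for every row, and it takes sX = √(m(1 – m)), the standard deviation of a 0/1 variable. With only 17 observations, the Finals row is a weak fit for the large-sample approximation.

```python
import math

Z_95 = 1.96  # assumed large-sample critical value for a 95% confidence interval

# (stage, sample rate of advancement m, sample size N, known rate v in all orchestras)
stages = [
    ("Preliminaries", 0.216, 222, 0.171),
    ("Semi-Finals",   0.385,  65, 0.568),
    ("Finals",        0.235,  17, 0.087),
    ("Hired",         0.027, 445, 0.017),
]

def confidence_interval(m, n, crit=Z_95):
    """Return (low, high) for [m - crit*SE, m + crit*SE] with SE = sX / sqrt(N)."""
    s_x = math.sqrt(m * (1 - m))   # standard deviation of a 0/1 variable (assumption)
    se = s_x / math.sqrt(n)
    return m - crit * se, m + crit * se

results = {}
for stage, m, n, v in stages:
    low, high = confidence_interval(m, n)
    reject = not (low < v < high)  # reject H0: m = v if v falls outside the interval
    results[stage] = reject
    print(f"{stage:<14} CI = [{low:.3f}, {high:.3f}]  v = {v:.3f}  reject H0: {reject}")
```

Under these assumptions only the Semi-Finals interval excludes the known rate v, so that is the only row where H0 is rejected at the 5% level.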
z or t ?
• We use the notation z when using the Central
Limit Theorem:
– Sample size is large, data was collected by simple
random sampling.
• We use the notation t when using the t
distribution:
– Distribution of X is normal (applies to height, weight,
but not to superstar distributions).
• t approaches z as the sample size grows (they coincide when df = ∞).
– Thus t is encountered more frequently than z.
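To see how the t critical value approaches the z critical value, the sketch below computes two-sided 95% critical values for a few degrees of freedom. It is a standard-library approximation written for illustration (Simpson's rule plus bisection); the function names and the chosen df values are assumptions, and a printed t table or a statistics package would normally be used instead.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def central_area(x, df, steps=1000):
    """Approximate P(-x < T < x) using Simpson's rule on [0, x] and symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_critical(df, confidence=0.95):
    """Find c such that P(-c < T < c) = confidence, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_critical = NormalDist().inv_cdf(0.975)   # two-sided 95% value for the normal
critical_values = {df: t_critical(df) for df in (5, 10, 30, 100, 1000)}

for df, crit in critical_values.items():
    print(f"df = {df:>4}: t critical value = {crit:.3f}")
print(f"df =  inf: z critical value = {z_critical:.3f}")
```

The t critical values shrink toward the z value as df increases, which is why the two notations give nearly the same interval in large samples.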
t Table
Outline
1. Testing hypotheses
using the confidence interval method
2. Testing hypotheses
using the t-test (absolutely equivalent)
Next time: one-sided t test of mean and proportion
Chapter 6 of A&F
From the confidence interval method… to
the t-test
Null hypothesis: m = v.
• We do not reject the null hypothesis H0 at the
5% significance level if the 95% confidence
interval around the sample mean m includes v.
Do not reject H0 at 95% if:
m – t0.05 * SE < v < m + t0.05 * SE
• Notice that this is equivalent to:
Do not reject H0 if:
-t0.05 < (m-v)/SE < t0.05
• t0.05 is the 95% critical value for the t statistic.
• (m-v)/SE is the t statistic.
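The equivalence follows in three steps: subtract m from each part of the inequality, multiply by –1 (which flips the inequalities), then divide by SE, which is positive. The sketch below checks that the two rules give the same decision over a grid of hypothesized values v. The numbers for m, SE, and the critical value are illustrative assumptions, and the function names are hypothetical.

```python
def ci_method_rejects(m, v, se, crit):
    """Reject H0: m = v if v lies outside [m - crit*SE, m + crit*SE]."""
    return not (m - crit * se < v < m + crit * se)

def t_method_rejects(m, v, se, crit):
    """Reject H0: m = v if the t statistic (m - v)/SE lies outside [-crit, crit]."""
    t_stat = (m - v) / se
    return not (-crit < t_stat < crit)

# Illustrative values only: sample mean, standard error, large-sample critical value
m, se, crit = 0.385, 0.0604, 1.96

# Compare the two decisions for every hypothesized value v = 0.000, 0.001, ..., 1.000
candidates = [i / 1000 for i in range(1001)]
methods_agree = all(
    ci_method_rejects(m, v, se, crit) == t_method_rejects(m, v, se, crit)
    for v in candidates
)
print("Both methods give the same decision for every v:", methods_agree)
```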
Graphically…
Sampling distribution of the t statistic
df = N-1
On this graph, indicate for which
values of t we should reject the null
hypothesis…
• With 95% confidence.
• With 90% confidence.
And also with 99% confidence ?
Under the null hypothesis (m=v):
• (m-v)/SE follows a standard normal distribution if the sample size is large.
• (m-v)/SE follows a t distribution
if (i) the sample size is small and (ii) X is normally distributed.
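For the large-sample case, where the statistic follows a standard normal distribution under the null hypothesis, the rejection thresholds at each confidence level can be computed directly. This is a minimal sketch using Python's standard library; the function name is hypothetical, and for small samples the t table values would replace these.

```python
from statistics import NormalDist

def two_sided_critical_value(confidence):
    """Value c such that a standard normal falls in [-c, c] with the given probability."""
    alpha = 1 - confidence                  # significance level
    return NormalDist().inv_cdf(1 - alpha / 2)

rejection_thresholds = {c: two_sided_critical_value(c) for c in (0.90, 0.95, 0.99)}

for confidence, crit in rejection_thresholds.items():
    print(f"{confidence:.0%} confidence: reject H0 if the statistic is below "
          f"{-crit:.3f} or above {crit:.3f}")
```

The higher the confidence level, the further out the rejection region starts, so a result rejected with 99% confidence is also rejected with 95% and 90% confidence.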
Hypothesis testing
• Hypothesis: an empirical statement about a population
parameter. Usually of the shape:
– “The parameter is equal to a given value” (this session)
– “The parameter is greater than a given value” (next session)
– “The parameter is lower than a given value” (next session)
• Almost all scientific/sociological/economic statements
can be reduced to one of these three types.
– “The population proportion of voters for Cory Gardner is
greater than 50%.” (second type of hypothesis)
– “The impact of ZMapp on Ebola patients’ condition is zero.”
(first type of hypothesis)
Exercise 6.20: Literary Analysis
The authorship of an old document is in doubt.
A historian hypothesizes that the author was a journalist named Jacalyn Levine.
Upon a thorough investigation of Levine’s known works, it is observed that one
unusual feature of her writing was that she consistently began 6% of her
sentences with the word whereas. To test the historian’s hypothesis, it is
decided to count the number of sentences in the disputed document that begin
with whereas. Out of the 300 sentences, none do.
Let π denote the probability that any one sentence written by the unknown
author of the document begins with whereas.
Test H0: “π = 0.06” against Ha: “π is not equal to 0.06.”
What assumptions are needed for your conclusion to be valid?
(F. Mosteller and D. L. Wallace conducted this type of investigation to determine
whether Alexander Hamilton or James Madison authored 12 of the Federalist
Papers. See Inference and Disputed Authorship: The Federalist, Addison-Wesley,
1964.)
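One possible way to set up the computation is sketched below. It assumes that the 300 sentences behave like a simple random sample of the author's sentences, and that the sample is large enough for the normal approximation (under H0 the expected counts are 18 and 282). Because the sample proportion is 0, the sample standard deviation is also 0, so this sketch uses the standard error implied by the null value π = 0.06 instead; that choice is an assumption of the sketch, not something stated in the slides.

```python
import math
from statistics import NormalDist

n = 300          # sentences in the disputed document
count = 0        # sentences beginning with "whereas"
pi_0 = 0.06      # value of pi under H0

pi_hat = count / n                              # sample proportion
se_0 = math.sqrt(pi_0 * (1 - pi_0) / n)         # standard error assuming H0 is true
z_stat = (pi_hat - pi_0) / se_0                 # test statistic (pi_hat - pi_0)/SE

print(f"Sample proportion: {pi_hat:.3f}")
print(f"Standard error under H0: {se_0:.4f}")
print(f"Test statistic: {z_stat:.2f}")

# Compare |statistic| with the two-sided large-sample critical values
decisions = {}
for level in (0.10, 0.05, 0.01):
    crit = NormalDist().inv_cdf(1 - level / 2)
    decisions[level] = abs(z_stat) > crit
    print(f"Significance level {level:.0%}: critical value {crit:.3f}, "
          f"reject H0: {decisions[level]}")
```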
Wrap up
Confidence interval method for the test of H0 : m = v. Ha: m ≠ v.
– Reject H0 at the 1% significance level if the 99% confidence interval around the
sample mean does not include v.
– Reject H0 at the 5% significance level if the 95% confidence interval around the
sample mean does not include v.
– Reject H0 at the 10% significance level if the 90% confidence interval around
the sample mean does not include v.
t test method for the test of H0 : m = v. Ha: m ≠ v.
– Build the t statistic (m-v)/SE
– Reject the H0 with significance level 1%
if the t statistic is outside the range [-t0.01 , t0.01]
– Reject the H0 with significance level 5%
if the t statistic is outside the range [-t0.05 , t0.05]
– Reject the H0 with significance level 10%
if the t statistic is outside the range [-t0.10 , t0.10]
Coming up:
Readings:
• Midterm on Tuesday, November 25.
– Coverage: up to Chapter 6 inclusive.
• Online quiz due Tuesday at 9am.
• Deadlines are strict and attendance is recorded.
For help:
• Amine Ouazad
Office 1135, Social Science building
amine.ouazad@nyu.edu
Office hour: Tuesday from 5 to 6.30pm.
• GAF: Irene Paneda
Irene.paneda@nyu.edu
Sunday recitations.
At the Academic Resource Center, Monday from 2 to 4pm.