Statistics for Social and Behavioral Sciences
Session #17: Hypothesis Testing: The Confidence Interval Method and the t-Statistic Method
(Agresti and Finlay, Chapters 5 and 6)
Prof. Amine Ouazad

Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)
PART II. DESCRIBING DATA (Weeks 2-4)
PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)
Firenze or Lebanese Express's ratings are within a MoE of each other!
PART IV. CORRELATION AND CAUSATION: REGRESSION ANALYSIS (Weeks 10-14)
This is where we talk about ZMapp and Ebola!

Last Session
Hypothesis testing is the foundation of the (social) sciences.
• Three typical types of hypotheses:
– A parameter is equal to …
– A parameter is greater than …
– A parameter is lower than …
• A null hypothesis (the one we try to reject) and an alternative hypothesis.
• We provide evidence to reject a null hypothesis.
– We might not have enough evidence to reject the null hypothesis.
For a test on the population mean μ, using the sample mean m:
• Confidence interval method for the test of H0: μ = v against Ha: μ ≠ v.
– Reject H0 at the 5% significance level if the 95% confidence interval for the population mean μ does not include v.
– Reject H0 at the 10% significance level if the 90% confidence interval for the population mean μ does not include v.

Today
• Hypothesis testing in statistics:
– The confidence interval method of testing μ = v.
• An equivalent way of testing μ = v:
– The t test (also invented by "Student", the pen name of William Sealy Gosset).

Outline
1. Testing hypotheses using the confidence interval method (continued)
2. Testing hypotheses using the t test (absolutely equivalent)
Next time: one-sided t tests of a mean and a proportion (Chapter 6 of A&F)

Testing H0: μ = v using confidence intervals
• H0: "The fraction of men in Abu Dhabi is 50%," or equivalently "μ = 0.5."
• By simple random sampling, gather N observations Xi = 0 or 1.
• Build a confidence interval for the population mean μ, centered on the sample mean m of the Xi.
– Same methods as seen in previous sessions.
• If the null hypothesis is true, then across repeated samples only about 5 out of every 100 such 95% confidence intervals will fail to include 0.5.
• Thus, if the null hypothesis is true, there is only a 5% probability that my confidence interval will not include 0.5.
☞ Reject the null hypothesis if the confidence interval for μ does not include v.

[Figure: Proportion Female of Juilliard Graduates, Total and By Section: 1947 to 1995]
[Figure: Female Share of New Hires in Four Orchestras, 1950s to 1990s]

Do Orchestras Prefer Hiring Men?
• Orchestras in the US are overwhelmingly male.
[Photo: At the Royal Festival Hall]
We know the rate at which women advance through orchestra auditions (the data is surprisingly good):
• At the first stage of recruitment (preliminaries), women advance at a 17.1% rate.
• At the second stage of recruitment (semi-finals), women advance at a 56.8% rate.
• In the finals, women advance at an 8.7% rate.
Overall, women are hired at a 1.7% rate from the pool of all applicants.

Conducting a little experiment…
• What if we auditioned musicians for hiring behind a curtain, on a carpet, with no talking allowed?
• Would that lead to a hiring rate that is different from the usual hiring rate (1.7%)?
Orchestrating Impartiality: The Impact of "Blind" Auditions on Female Musicians, National Bureau of Economic Research, January 1997.
Prof. Cecilia Rouse, Princeton University

The data collected
Round           Rate of advancement   Sample size   Rate of advancement for women
                (sample)                            in all orchestras (known v)
Preliminaries   21.6%                 222           17.1%
Semi-finals     38.5%                 65            56.8%
Finals          23.5%                 17            8.7%
Hired           2.7%                  445           1.7%

Building the confidence interval
• The confidence interval is written:
[ m - z0.05 × SE , m + z0.05 × SE ]
or
[ m - t0.05 × SE , m + t0.05 × SE ]
• SE = sX/√N is the standard error.
• m: sample mean (computed from the data).
• sX: sample standard deviation (computed from the data).
• t0.05 or z0.05: critical value for a 95% confidence level, from Table 5.1.

z or t?
• We use the notation z when relying on the Central Limit Theorem:
– The sample size is large and the data were collected by simple random sampling.
• We use the notation t when relying on the t distribution:
– The distribution of X is normal (this applies to height or weight, but not to superstar distributions).
• The t critical values approach the z critical values as the sample size grows, and equal them when df = ∞.
– Thus t tables can be used in both cases, and t is encountered more frequently than z.

[Figure: t table]

Outline
1. Testing hypotheses using the confidence interval method
2. Testing hypotheses using the t test (absolutely equivalent)
Next time: one-sided t tests of a mean and a proportion (Chapter 6 of A&F)

From the confidence interval method… to the t test
Null hypothesis: μ = v.
• We do not reject H0 at the 5% significance level if the 95% confidence interval for the population mean μ includes v.
Do not reject H0 at the 5% level if: m - t0.05 × SE < v < m + t0.05 × SE
• Subtracting m from each side, dividing by SE, and multiplying by -1 (which flips the inequalities), this is equivalent to:
Do not reject H0 if: -t0.05 < (m - v)/SE < t0.05
• t0.05 is the 95% critical value for the t statistic.
• (m - v)/SE is the t statistic, where m is the sample mean.

Graphically…
[Figure: Sampling distribution of the t statistic, df = N - 1]
On this graph, indicate for which values of t we should reject the null hypothesis:
• At the 5% significance level (95% confidence).
• At the 10% significance level (90% confidence).
• And at the 1% significance level (99% confidence)?
Under the null hypothesis (μ = v):
• (m - v)/SE approximately follows a standard normal distribution if the sample size is large.
• (m - v)/SE follows a t distribution with N - 1 degrees of freedom if (i) the sample size is small and (ii) X is normally distributed.

Hypothesis testing
• Hypothesis: an empirical statement about a population parameter, usually of the form:
– "The parameter is equal to a given value" (this session)
– "The parameter is greater than a given value" (next session)
– "The parameter is lower than a given value" (next session)
• Almost all scientific, sociological, and economic statements can be reduced to one of these three types.
– "The population proportion of voters for Cory Gardner is greater than 50%." (second type of hypothesis)
– "The impact of ZMapp on Ebola patients' condition is zero." (first type of hypothesis)

Exercise 6.20: Literary Analysis
The authorship of an old document is in doubt.
A historian hypothesizes that the author was a journalist named Jacalyn Levine. Upon a thorough investigation of Levine's known works, it is observed that one unusual feature of her writing was that she consistently began 6% of her sentences with the word whereas. To test the historian's hypothesis, it is decided to count the number of sentences in the disputed document that begin with whereas. Out of the 300 sentences, none do. Let π denote the probability that any one sentence written by the unknown author of the document begins with whereas. Test H0: π = 0.06 against Ha: π ≠ 0.06. What assumptions are needed for your conclusion to be valid?
(F. Mosteller and D. L. Wallace conducted this type of investigation to determine whether Alexander Hamilton or James Madison authored 12 of the Federalist Papers. See Inference and Disputed Authorship: The Federalist, Addison-Wesley, 1964.)

Wrap up
Confidence interval method for the test of H0: μ = v against Ha: μ ≠ v:
– Reject H0 at the 1% significance level if the 99% confidence interval for the population mean μ does not include v.
– Reject H0 at the 5% significance level if the 95% confidence interval for the population mean μ does not include v.
– Reject H0 at the 10% significance level if the 90% confidence interval for the population mean μ does not include v.
t test method for the test of H0: μ = v against Ha: μ ≠ v:
– Build the t statistic (m - v)/SE, where m is the sample mean.
– Reject H0 at the 1% significance level if the t statistic is outside the range [-t0.01, t0.01].
– Reject H0 at the 5% significance level if the t statistic is outside the range [-t0.05, t0.05].
– Reject H0 at the 10% significance level if the t statistic is outside the range [-t0.10, t0.10].

Coming up
• Midterm on Tuesday, November 25.
– Coverage: up to Chapter 6 inclusive.
• Online quiz due Tuesday at 9am.
Deadlines are strict and attendance is taken.
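The wrap-up states that the confidence interval method and the t test always reach the same decision. The minimal Python sketch below checks this on the four rounds of the orchestra table. It assumes each sample rate is a proportion of 0/1 outcomes, so that SE = √(m(1 - m)/N), and it uses the large-sample critical value 1.96 for a 5% significance level; the helper names (standard_error, ci_rejects, t_rejects) are hypothetical. With only 17 observations, the finals row is too small for the large-sample approximation to be trusted.

```python
import math

# Rows from "The data collected": (round, sample rate m, sample size N, known rate v).
# Assumption: each sample rate is a proportion of 0/1 outcomes, so SE = sqrt(m(1 - m) / N).
ROWS = [
    ("Preliminaries", 0.216, 222, 0.171),
    ("Semi-finals", 0.385, 65, 0.568),
    ("Finals", 0.235, 17, 0.087),
    ("Hired", 0.027, 445, 0.017),
]

Z_95 = 1.96  # large-sample (z) critical value for a 95% confidence level


def standard_error(m: float, n: int) -> float:
    """Standard error of a sample proportion m based on n observations."""
    return math.sqrt(m * (1 - m) / n)


def ci_rejects(m: float, n: int, v: float, crit: float = Z_95) -> bool:
    """Confidence interval method: reject H0 if v lies outside [m - crit*SE, m + crit*SE]."""
    se = standard_error(m, n)
    low, high = m - crit * se, m + crit * se
    return not (low <= v <= high)


def t_rejects(m: float, n: int, v: float, crit: float = Z_95) -> bool:
    """t statistic method: reject H0 if |(m - v) / SE| exceeds the critical value."""
    t_stat = (m - v) / standard_error(m, n)
    return abs(t_stat) > crit


results = {}
for name, m, n, v in ROWS:
    t_stat = (m - v) / standard_error(m, n)
    decision_ci = ci_rejects(m, n, v)
    decision_t = t_rejects(m, n, v)
    results[name] = (decision_ci, decision_t)
    print(f"{name:13s} t = {t_stat:6.2f}  CI rejects: {decision_ci}  t test rejects: {decision_t}")
```

Under these assumptions, only the semi-finals row leads to rejecting H0 at the 5% level (t ≈ -3.03); in the other three rows both methods fail to reject. The two decisions coincide row by row because |(m - v)/SE| > 1.96 is exactly the condition for v to fall outside [m - 1.96 × SE, m + 1.96 × SE].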
For help:
• Amine Ouazad
Office 1135, Social Science building
amine.ouazad@nyu.edu
Office hours: Tuesday from 5 to 6:30pm.
• GAF: Irene Paneda
Irene.paneda@nyu.edu
Sunday recitations.
At the Academic Resource Center, Monday from 2 to 4pm.
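For Exercise 6.20, here is a minimal Python sketch of the two-sided test at the 10%, 5%, and 1% levels. One assumption needs flagging: with 0 of 300 sentences beginning with whereas, the sample standard deviation is 0, so the sample-based SE would make the statistic undefined. The sketch instead computes the standard error under H0, SE0 = √(0.06 × 0.94 / 300), as is usual for significance tests on a proportion, and takes critical values from the standard normal distribution via Python's statistics.NormalDist.

```python
import math
from statistics import NormalDist

# Exercise 6.20: 0 of 300 sentences begin with "whereas"; H0: pi = 0.06, Ha: pi != 0.06.
n = 300
successes = 0
pi_0 = 0.06

p_hat = successes / n  # sample proportion

# The sample-based SE sqrt(p_hat * (1 - p_hat) / n) is 0 here, so the statistic would be undefined.
# Assumption: compute the SE under H0, as is usual for significance tests on a proportion.
se_0 = math.sqrt(pi_0 * (1 - pi_0) / n)
z_stat = (p_hat - pi_0) / se_0

decisions = {}
for alpha in (0.10, 0.05, 0.01):
    # Two-sided critical value: the (1 - alpha/2) quantile of the standard normal distribution.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    decisions[alpha] = abs(z_stat) > crit
    print(f"alpha = {alpha:.2f}: critical value {crit:.3f}, reject H0: {decisions[alpha]}")

print(f"z statistic = {z_stat:.2f}")
```

The statistic is about -4.38, well beyond the 1% critical value of about 2.58, so H0: π = 0.06 is rejected at all three levels under these assumptions. The conclusion still rests on the assumptions the exercise asks about, for example that the document's sentences behave like independent draws from the author's writing.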