FULL NAME: ________________________ AS IT APPEARS ON ALBERT

STATISTICS FOR SOCIAL & BEHAVIORAL SCIENCES
MIDTERM 2 – November 25, 2014

This midterm consists of 7 problems on 10 numbered pages, for a total of 54 points. You have 70 minutes (one hour and 10 minutes) to complete it. Write your name at the top of this sheet and on every page.

Follow each problem’s instructions. Tick only one box per question unless otherwise indicated. Write each numerical answer legibly on the line provided.

Non-communicating calculators are recommended (no 3G, Wi-Fi, LTE, 2G, or EDGE connectivity, even in airplane mode). The calculator’s memory should not contain the course’s statistical formulas.

Please take this midterm in silence. If you are stuck on a question, move on! There are both easy and tough questions on this midterm, so keep going! This is a closed-book midterm; notes are not allowed. If in doubt about a question, write your doubt on the answer sheet and move on. The rule is that I will not provide clarifications during the exam.

This midterm should be enjoyable: good luck!

The table of t scores is provided on page 10.

Problem 1 – Sampling distribution of the t statistic (10 points)

A researcher studying Hepatitis B has a sample of 419,221 individuals and would like to test the null hypothesis that the fraction of individuals with Hepatitis B in the population is equal to 3%. In his sample, 2.8% of individuals have Hepatitis B. He takes a sheet of paper to remind himself of the sampling distribution of the t statistic.
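As a numerical companion to Problem 1, the sketch below computes the researcher’s t statistic and the two-sided t scores for the three confidence levels. It is a minimal illustration under stated assumptions, not the official solution: the standard error is taken as sqrt(p̂(1 − p̂)/N) using the sample proportion, and the standard normal stands in for the t distribution, which is reasonable with N = 419,221. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_t_statistic(p_hat: float, p_null: float, n: int) -> float:
    """t statistic for H0: population proportion == p_null.

    Assumption: the standard error is estimated from the sample
    proportion, SE = sqrt(p_hat * (1 - p_hat) / n).
    """
    se = sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - p_null) / se


def two_sided_t_score(confidence: float) -> float:
    """Critical value for a two-sided test at the given confidence level.

    Assumption: the standard normal approximates the t distribution,
    which is accurate when the degrees of freedom are very large.
    """
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    t = proportion_t_statistic(p_hat=0.028, p_null=0.03, n=419_221)
    print(f"t statistic: {t:.2f}")
    for level in (0.90, 0.95, 0.99):
        score = two_sided_t_score(level)
        # By symmetry, (1 - level) / 2 of t statistics lie above +score,
        # and the same fraction lies below -score.
        tail = (1 - level) / 2
        print(f"{level:.0%}: t score = {score:.3f}, fraction above = {tail:.1%}")
```

With these inputs the script prints a t statistic of about -7.85, far beyond ±2.58, and t scores of about 1.645, 1.960 and 2.576; at the 95% level, 2.5% of t statistics lie above the t score and 2.5% lie below minus the t score.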
On the diagram below of the sampling distribution of the t statistic, draw/mark:
▪ the bell-shaped sampling distribution of the t statistic
▪ the label of the horizontal axis
▪ the label of the vertical axis
▪ the mean of the sampling distribution of the t statistic
▪ the standard deviation of the sampling distribution of the t statistic (mark a segment)
▪ the t score (and minus the t score) for confidence levels of 90%, 95%, and 99%
▪ the fraction of t statistics that are above the t score at the 95% confidence level
▪ the fraction of t statistics that are below minus the t score at the 95% confidence level

The sampling distribution of the t statistic:
☐ is right skewed
☐ is symmetric
☐ is left skewed
☐ none of the above

Problem 2 – True or False? (6 points)

An NYU Abu Dhabi student did not take “statistics for social sciences” but read the book independently. He tends to make some mistakes. Which of his assertions are true? For each assertion, tick the only answer that applies.

1. For a test at 95% confidence that the population mean is equal to 0.4, the probability of Type II error is 5%.
☐ True.
☐ False, the probability of Type II error is 95%.
☐ False, the probability of Type II error is unknown.

2. As the sample size increases, for a test at a given significance level, the probability of Type I error decreases.
☐ True.
☐ False, the probability of Type I error increases, as the standard error decreases.
☐ False, the probability of Type I error is constant.

3. For a t test, the null hypothesis is that the sample mean is equal to a given value (say 0.5).
☐ True.
☐ False, the null hypothesis is that the population mean is equal to a given value.
☐ False, the null hypothesis is that the population mean is different from a given value.
☐ False, the null hypothesis is that the sample mean is different from a given value.

4. For a large sample size, the standard deviation of the sampling distribution of the t statistic is 1.
☐ True.
☐ False, the standard deviation of the sampling distribution of the t statistic is 1.96.
☐ False, the standard deviation of the sampling distribution of the t statistic is 2 times 1.96.

5. As the confidence level increases, the confidence interval for the sample mean gets larger.
☐ True.
☐ False.

6. The standard error of the sample mean of X is always smaller than or equal to the standard deviation of the variable X.
☐ True.
☐ False, it depends on the sample.
☐ False, it is always larger.

7. The sampling distribution of the t statistic has thinner tails than the normal distribution. In other words, the probability of extremely large values of the t statistic is smaller in the t distribution than in the normal distribution.
☐ True.
☐ False.

8. The larger the number of degrees of freedom of a t distribution, the smaller the t score at 95%.
☐ True.
☐ False, the larger the t score.
☐ False, it depends on the standard deviation of the sample.

Problem 3 – Ancient manuscript (6 points)

You have discovered an old statistics manuscript written in 1966. Its author, John Enroe, says that he collected a sample by simple random sampling from the population of Gaborone residents, recording the variable “age” for each individual in the sample.

In the manuscript, John Enroe says that he can reject the null hypothesis that the population mean age is equal to 30 years at the 95% confidence level, but not at 99%. In his sample, the average age was 27 years, and the number of observations was N = 341. The original data have been lost, and the manuscript was typed on an old typewriter. We would like to recover the standard deviation of age in Gaborone.
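The two claims in the manuscript bound the sample standard deviation s, because t = (27 − 30)/(s/√N) shrinks in absolute value as s grows. The sketch below turns “rejected at 95%” into |t| > 1.96 and “not rejected at 99%” into |t| ≤ 2.58, then solves each inequality for s. It assumes the normal critical values 1.96 and 2.58; with 340 degrees of freedom the exact t scores are slightly larger, which would lower both bounds slightly. The function name `sd_bounds` is hypothetical.

```python
from math import sqrt
from typing import Tuple


def sd_bounds(sample_mean: float, null_mean: float, n: int,
              z_rejected: float = 1.96,
              z_not_rejected: float = 2.58) -> Tuple[float, float]:
    """Range of sample standard deviations consistent with the manuscript.

    Rejected at 95%:      |mean - null| / (s / sqrt(n)) >  z_rejected
        =>  s <  |mean - null| * sqrt(n) / z_rejected
    Not rejected at 99%:  |mean - null| / (s / sqrt(n)) <= z_not_rejected
        =>  s >= |mean - null| * sqrt(n) / z_not_rejected
    """
    gap = abs(sample_mean - null_mean)
    lower = gap * sqrt(n) / z_not_rejected
    upper = gap * sqrt(n) / z_rejected
    return lower, upper


low, high = sd_bounds(sample_mean=27, null_mean=30, n=341)
print(f"{low:.2f} <= s < {high:.2f}")
```

Under these assumptions the script prints `21.47 <= s < 28.26`.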
Write down the formula for the margin of error as a function of the sample size N, the standard deviation of age, and the z score.
________________________________________________________________

The manuscript says that the null hypothesis is rejected at 95%. Therefore, the standard deviation in the sample is smaller than:
________________________________________________________________

The manuscript says that the null hypothesis is not rejected at 99%. Thus, the standard deviation in the sample is larger than:
________________________________________________________________

Problem 4 – Bayesian doctor (9 points)

A doctor in South Africa has read in a study that, out of 300 patients whose medical condition had improved, 100 had taken MedPlus, while 200 had not taken MedPlus. Out of 200 patients whose condition had not improved, 100 had taken MedPlus, while 100 had not taken MedPlus. The overall proportion of patients whose condition improved is 60%. The doctor is ultimately interested in the probability that a patient improves conditional on having taken MedPlus.

1. Write down Rule #3 of probabilities for two events that are not independent:

2. How would you qualify the event A: “Patient taking MedPlus” and the event B: “Patient improving”? Tick all the boxes that apply:
☐ A and B are unrelated
☐ A and B are overlapping
☐ A and B are mutually exclusive
☐ A and B are independent
☐ A and B are related

3. What is the probability that a patient improves conditional on having taken MedPlus?

Problem 5 – Textual analysis (12 points)

In a library, an old series of speeches by a US president is unattributed: the president who wrote those speeches is unknown. A political scientist, Alfred Moolb, would like to find out who wrote them.
In total, those speeches contain 4,000 words, and their author uses the following words with the following frequencies:

Word        until   whereas   task   bravery
Frequency   7%      3%        5%     0.5%

We suspect that the speeches were authored by John Fitzgerald Kennedy. We collect the writings of John Fitzgerald Kennedy and find that he uses these words with the following frequencies:

Word        until   whereas   task   bravery
Frequency   6%      3.4%      4%     1%

We would like to test the null hypothesis that ‘the fraction of uses of the word “until” in the unknown president’s speeches is equal to the fraction of uses of the word “until” by John Fitzgerald Kennedy.’ The fraction of uses of the word “until” by John Fitzgerald Kennedy is known exactly.

1. Give a 99% confidence interval for the fraction of uses of the word “until” in the unknown president’s speeches: [ ________ , ________ ]

2. Give the t statistic for the null hypothesis:

3. At what significance levels can we reject the null hypothesis? Tick all that apply (points are added for each right answer and removed for each wrong answer).
☐ None
☐ At 1%
☐ At 5%
☐ At 10%

4. Now the political scientist Alfred Moolb builds a t statistic for many null hypotheses. Each null hypothesis is ‘the fraction of uses of the word “XXXXX” in the unknown president’s speeches is equal to the fraction of uses of the word “XXXXX” in John Fitzgerald Kennedy’s speeches’. He writes this null hypothesis for 500 different words XXXXX, including until, whereas, task, bravery, and 496 other words, and builds a t statistic for each word.
Under the null hypothesis, how many of the 500 t statistics would you expect to be strictly greater than 2.58 (t > 2.58)?
Under the null hypothesis, how many of the 500 t statistics would you expect to be strictly less than -2.58 (t < -2.58)?
Under the null hypothesis, how many of the 500 t statistics would you expect to be strictly less than -1.65 (t < -1.65)?
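The calculations behind questions 1 to 4 of Problem 5 can be sketched as follows. The assumptions are: each of the 4,000 words is treated as an independent draw, the standard error uses the observed 7% share of “until”, Kennedy’s 6% is the exactly known null value, and the large-sample critical values 2.58, 1.96 and 1.645 are used because N is large. The constant and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

N_WORDS = 4_000        # total words in the unattributed speeches
P_UNTIL_SAMPLE = 0.07  # observed share of "until" in those speeches
P_UNTIL_JFK = 0.06     # Kennedy's share of "until", treated as exactly known


def proportion_se(p_hat: float, n: int) -> float:
    # Assumption: words are independent draws, so the standard error of a
    # sample proportion is sqrt(p_hat * (1 - p_hat) / n).
    return sqrt(p_hat * (1 - p_hat) / n)


se = proportion_se(P_UNTIL_SAMPLE, N_WORDS)

# 99% confidence interval with the rounded critical value 2.58.
ci = (P_UNTIL_SAMPLE - 2.58 * se, P_UNTIL_SAMPLE + 2.58 * se)

# t statistic for H0: share of "until" equals Kennedy's share.
t_stat = (P_UNTIL_SAMPLE - P_UNTIL_JFK) / se

# Two-sided tests: reject when |t| exceeds the critical value.
critical = {"1%": 2.58, "5%": 1.96, "10%": 1.645}
rejected_at = [level for level, c in critical.items() if abs(t_stat) > c]

# Expected number of t statistics beyond each cutoff among 500 true nulls,
# using the standard normal as the large-sample distribution of t.
std_normal = NormalDist()
n_tests = 500
above_258 = n_tests * (1 - std_normal.cdf(2.58))
below_minus_258 = n_tests * std_normal.cdf(-2.58)
below_minus_165 = n_tests * std_normal.cdf(-1.65)

print(f"SE = {se:.4f}, 99% CI = [{ci[0]:.4f}, {ci[1]:.4f}], t = {t_stat:.2f}")
print("Reject H0 at:", rejected_at or "none")
print(f"Expected counts: {above_258:.1f}, {below_minus_258:.1f}, {below_minus_165:.1f}")
```

Under these assumptions the interval is roughly [0.0596, 0.0804] and t ≈ 2.48, so H0 is rejected at 5% and 10% but not at 1%. The expected counts come out near 2.5, 2.5 and 25 (printed as 2.5, 2.5 and 24.7, because 2.58 and 1.65 are rounded cutoffs).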
Problem 6 – Pakistan Travel (5 points)

Anthony is a research assistant for the Clinton Foundation in Pakistan. He has collected observations on samples in different cities. In each city, he tests a null hypothesis for the mean of a variable. The variable has a normal distribution. Each line of this table corresponds to a separate sample of size N and a separate null hypothesis. Tick all boxes that apply in the last three columns.

Sample      t statistic   Sample size N   Reject H0 at 90%?   Reject H0 at 95%?   Reject H0 at 99%?
Lahore          3.2            23                ☐                   ☐                   ☐
Islamabad      -1.02           11                ☐                   ☐                   ☐
Sukkur          4.5             2                ☐                   ☐                   ☐
Larkana        -6.4             2                ☐                   ☐                   ☐
Peshawar       -7.8           120                ☐                   ☐                   ☐
Hyderabad       2.1            31                ☐                   ☐                   ☐
Karachi        -2.5            29                ☐                   ☐                   ☐

Problem 7 – Correlation and Slope

The World Bank argues that a higher fraction of women with a college degree causes a lower rate of violence. They found that the average fraction of women with a college degree across countries is 25%. They also found that the standard deviation of violent acts is 10. In the same report, the correlation between the fraction of women with a college degree and violent acts is -0.4.

Give the formula relating slope and correlation.

What is the slope of the relationship between the fraction of women with a college degree (explanatory variable) and violence (dependent variable)?
________________________________________________________________

Thus:
☐ The higher the fraction of women with a college degree, the higher the number of violent acts in a country.
☐ The higher the fraction of women with a college degree, the lower the number of violent acts in a country.

A 1 percentage point increase in the fraction of women with a college degree leads to a ________ (increase/decrease) in the number of violent acts.

Table of t Scores

END OF MIDTERM
Thank you for your answers.
DRAFT – WRITE HERE – ONLY ANSWERS ON THE ANSWER FORM ITSELF WILL BE CONSIDERED (FIRST 9 PAGES)