Answer key


FULL NAME: ________________________ AS IT APPEARS ON ALBERT

STATISTICS FOR SOCIAL & BEHAVIORAL SCIENCES

MID TERM 2 – November 25, 2014

This midterm is made of 7 problems on 10 numbered pages, for a total of 54 points.

You have 70 minutes (one hour and 10 minutes) to complete this midterm.

Write your name at the top of this sheet and on every page.

Follow each problem’s instructions. There is one box to tick, unless indicated. Provide your numerical answer on the line provided in clear handwriting.

Non-communicating calculators are recommended (no 3G, Wi-Fi, LTE, 2G, EDGE, even in airplane mode). The calculator’s memory should not contain the course’s statistical formulas.

Please perform this mid term 2 in silence.

If stuck on one question, move on! There are easy and tough questions on this midterm, so keep going!

This is a closed book midterm. Notes are not allowed.

If in doubt about a question, write on the answer sheet provided and move on. The rule is that I will not provide clarifications during the exam.

This midterm should be enjoyable: good luck!

The table of t scores is provided on page 10.


Problem 1 – Sampling distribution of the t statistic (10 points)

A researcher on Hepatitis B has a sample of 419,221 individuals, and he would like to test the null hypothesis that the fraction of individuals with Hepatitis B is equal to 3%. In his sample, 2.8% of individuals have Hepatitis B.

He takes a sheet of paper to remind himself of the sampling distribution of the t statistic.

On the diagram below of the sampling distribution of the t statistic, draw/mark:

▪ the bell-shaped sampling distribution of the t statistic

▪ the label of the horizontal axis

▪ the label of the vertical axis

▪ the mean of the sampling distribution of the t statistic

▪ the standard deviation of the sampling distribution of the t statistic (mark a segment)

▪ the t score (and minus the t score) for confidence levels 90%, 95%, 99%

▪ the fraction of t statistics that are above the t score at 95% confidence level

▪ the fraction of t statistics that are below minus the t score at 95% confidence level

The sampling distribution of the t statistic:

☐ is right skewed

☐ is symmetric

☐ is left skewed

☐ none of the above
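The t scores and tail fractions to mark on the diagram can be checked numerically. The sketch below assumes that, with a sample of 419,221 individuals, the t distribution can be treated as standard normal; the function name `two_sided_score` is only illustrative.

```python
from statistics import NormalDist

# Assumption: for such a large sample the t distribution is treated as
# a standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

def two_sided_score(confidence: float) -> float:
    """Return the positive score c such that P(-c < T < c) = confidence."""
    return std_normal.inv_cdf(1.0 - (1.0 - confidence) / 2.0)

# Scores to mark on the horizontal axis for 90%, 95% and 99% confidence.
scores = {level: round(two_sided_score(level), 3) for level in (0.90, 0.95, 0.99)}

# Fraction of t statistics above the 95% score and below minus that score.
upper_tail_95 = 1.0 - std_normal.cdf(two_sided_score(0.95))
lower_tail_95 = std_normal.cdf(-two_sided_score(0.95))

print(scores)
print(round(upper_tail_95, 3), round(lower_tail_95, 3))
```

The two tails are equal because the distribution is symmetric around its mean of 0.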


Problem 2 – True or False? (8 points)

An NYU Abu Dhabi student did not take “statistics for social sciences” but read the book independently. He tends to make some mistakes.

Which assertions are true? Tick the only answer that applies.

1. For a test at 95% confidence that the population mean is equal to 0.4, the probability of Type II error is 5%.

☐ True.

☐ False, the probability of Type II error is 95%.

☐ False, the probability of Type II error is unknown.

It is the probability of a Type I error that equals the significance level (5%), not the probability of a Type II error. The probability of a Type II error does not have a single value, because the alternative hypothesis contains a range of possible values for the population mean.

2. As sample size increases, for a test at given significance level, the probability of Type I error decreases.

☐ True.

☐ False, the probability of Type I error increases, as the standard error decreases.

☐ False, the probability of Type I error is constant.

The probability of Type I error is the significance level (page 160). It is set when we pick our confidence level; sample size does not affect it.

3. For a t test, the null hypothesis is that the sample mean is equal to a given value (say 0.5).

☐ True

☐ False, the null hypothesis is that the population mean is equal to a given value.

☐ False, the null hypothesis is that the population mean is different from a given value.

☐ False, the null hypothesis is that the sample mean is different from a given value.


4. For a large sample size, the standard deviation of the sampling distribution of the t statistic is 1.

☐ True

☐ False, the standard deviation of the sampling distribution of the t statistic is 1.96.

☐ False, the standard deviation of the sampling distribution of the t statistic is 2 times 1.96.

Look at figure 5.4 on page 119 of your textbook. For a large sample size, the t distribution coincides with the z distribution (z = t), which is standard normal and has mean 0 and standard deviation 1.

5. As the confidence level increases, the confidence interval for the sample mean gets larger.

☐ True

☐ False

Recall the confidence interval is (m – t*se, m + t*se), where t is the t score for the chosen confidence level. When we pick a larger confidence level, the t score is larger, and thus the interval gets wider.
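A short numerical check of this widening, using the interval formula (m – t*se, m + t*se). The values of `m` and `se` are hypothetical, and the score is taken from the standard normal distribution as a large-sample approximation of the t score.

```python
from statistics import NormalDist

def confidence_interval(m: float, se: float, confidence: float) -> tuple[float, float]:
    """Return (m - score*se, m + score*se) for a two-sided confidence level."""
    # Large-sample assumption: the t score is approximated by the z score.
    score = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return (m - score * se, m + score * se)

m, se = 0.5, 0.1  # hypothetical sample mean and standard error

widths = {}
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(m, se, level)
    widths[level] = high - low
    print(level, round(low, 3), round(high, 3))
```

The width grows with the confidence level because only the score changes; `m` and `se` stay fixed.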

6. The standard error of a sample mean of X is always smaller than or equal to the standard deviation of the variable X.

☐ True.

☐ False, it depends on the sample.

☐ False, it is always larger.

The standard error = standard deviation of X / square root of n. That means that the standard error will be smaller than the standard deviation if n > 1, or equal to it if n = 1.
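The formula can be illustrated with a few sample sizes. The standard deviation of 10 used here is a hypothetical value chosen only for the illustration.

```python
from math import sqrt

def standard_error(sd: float, n: int) -> float:
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / sqrt(n)

sd = 10.0  # hypothetical standard deviation of X

# The standard error equals sd when n = 1 and shrinks as n grows.
errors = {n: standard_error(sd, n) for n in (1, 4, 25, 100)}
print(errors)
```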

7. The sampling distribution of the t statistic has thinner tails than the normal distribution. In other words, the probability of extremely large values of the t statistic is smaller in the t distribution than in the normal distribution.

☐ True

☐ False

Look at figure 5.4 on page 119. The sampling distribution of the t statistic has thicker tails.


8. The larger the number of degrees of freedom of a t distribution, the smaller the t score at 95%.

☐ True

☐ False, the larger the t score.

☐ False, it depends on the standard deviation of the sample.

Look at Table A at the end of your textbook. The t values get smaller as the degrees of freedom increase. This is because as degrees of freedom increase (as sample sizes increase), the tails of the t distribution get thinner, and thus the probability of extremely large values of the t statistic is smaller.

Problem 3 – Ancient manuscript (6 points)

You discovered an old statistics manuscript written in 1966. The author (John Enroe) said that he collected a sample by simple random sampling in the population of Gaborone residents, with a variable “age” for each individual of the sample.

John Enroe in the manuscript says that he can reject the null hypothesis that the population mean age is equal to 30 years old at 95%, but not at 99%. In his sample, the average age was 27 years old, and the number of observations in the sample was N=341.

The original data has been lost, and the manuscript was typed on an old typewriter.

We would like to recover the standard deviation of age in Gaborone.

Write down the formula for the Margin of Error as a function of the sample size N, the standard deviation of age, and the z statistic.

$\mathrm{MoE} = z \times \dfrac{s_x}{\sqrt{n}}$

________________________________________________________________

The manuscript says that the null hypothesis is rejected at 95%, therefore the standard deviation in the sample is smaller than 28.26.

Suppose the t statistic were exactly equal, in absolute value, to the z score (the minimum value for the null hypothesis to be rejected at 95%, z = 1.96). With m the sample mean (27), v the value under the null hypothesis (30) and n = 341:

$z = \dfrac{|m - v|}{s / \sqrt{n}}$

$s = \dfrac{|m - v| \sqrt{n}}{z} = 28.26$

If the null hypothesis was rejected at 95%, the standard deviation should be smaller than 28.26. Notice that the greater the z (the higher the confidence level), the smaller the standard deviation must be if we are to reject the null hypothesis at that level.

The manuscript says that the null hypothesis is not rejected at 99%. Thus the standard deviation in the sample is larger than 21.47.

Suppose the t statistic were exactly equal, in absolute value, to the z score (the minimum value needed for the null hypothesis to be rejected at 99%, z = 2.58):

$z = \dfrac{|m - v|}{s / \sqrt{n}}$

$s = \dfrac{|m - v| \sqrt{n}}{z} = 21.47$

Since the null hypothesis is not rejected at 99%, we know that s should be larger than 21.47.
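The two bounds can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses the z scores 1.96 and 2.58 and takes the absolute value of m − v, since only the distance between the sample mean and the hypothesized mean matters for a two-sided test; the function name `sd_threshold` is illustrative.

```python
from math import sqrt

m = 27.0   # sample mean age from the manuscript
v = 30.0   # population mean age under the null hypothesis
n = 341    # number of observations

def sd_threshold(z: float) -> float:
    """Standard deviation at which |t| equals z exactly: |m - v| * sqrt(n) / z."""
    return abs(m - v) * sqrt(n) / z

upper_bound = sd_threshold(1.96)  # rejected at 95%      -> s is below this value
lower_bound = sd_threshold(2.58)  # not rejected at 99%  -> s is above this value

print(round(lower_bound, 2), "< s <", round(upper_bound, 2))
```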

Problem 4 – Bayesian doctor (9 points)

A doctor in South Africa has read in a study that out of 300 patients whose medical condition had improved, 100 had taken Medplus, while 200 patients had not taken Medplus. Out of 200 patients whose condition had not improved, 100 had taken Medplus, while 100 had not taken Medplus.

The overall proportion of patients whose condition has improved is 60%.

The doctor is ultimately interested in the probability that a patient improves conditional on having taken Medplus.

1. Write down Rule #3 of probabilities for two events that are not independent:

P(A and B) = P(A|B)*P(B)

2. Can you qualify the event A: “Patient taking Medplus” and the event B: “Patient improving”? Tick only the boxes that apply:

☐ A and B are unrelated

☐ A and B are overlapping

☐ A and B are mutually exclusive

☐ A and B are independent

☐ A and B are related

3. What is the probability that a patient improves conditional on having taken Medplus?

P(improvement) = P(I) = 0.6

P(taking Medplus) = P(M) = P(M|I)P(I) + P(M|not I)P(not I) = 0.4

P(M|I) = 1/3

P(not M|I) = 2/3

P(M|not I) = 1/2

P(not M|not I) = 1/2

$P(I|M) = \dfrac{P(I \,\&\, M)}{P(M)} = \dfrac{P(M|I)\,P(I)}{P(M)} = \dfrac{(1/3) \times 0.6}{0.4} = 1/2$
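The same calculation can be carried out with exact fractions, starting from the counts given in the study. The variable names below are illustrative.

```python
from fractions import Fraction

# Counts from the study described in the problem.
improved_took, improved_not_took = 100, 200
not_improved_took, not_improved_not_took = 100, 100

p_i = Fraction(6, 10)  # overall proportion of patients who improved (60%)

# Conditional probabilities of taking Medplus.
p_m_given_i = Fraction(improved_took, improved_took + improved_not_took)
p_m_given_not_i = Fraction(not_improved_took, not_improved_took + not_improved_not_took)

# Total probability of taking Medplus, then the conditional probability P(I|M).
p_m = p_m_given_i * p_i + p_m_given_not_i * (1 - p_i)
p_i_given_m = p_m_given_i * p_i / p_m

print(p_m_given_i, p_m_given_not_i, p_m, p_i_given_m)
```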

Problem 5 – Textual analysis (12 points)

In a library, an old series of speeches of a US president is unattributed – the actual president who wrote those speeches is unknown. A political scientist, Alfred Moolb, would like to find out who wrote those speeches.

In total, those speeches contain 4,000 words, and their author uses the following words with the following frequencies:

Word      Frequency
until     7%
whereas   3%
task      5%
bravery   0.5%

We suspect that the speeches have been authored by John Fitzgerald Kennedy.

We collect the writings of John Fitzgerald Kennedy, and find he uses these words with the following frequency:

Word      Frequency
until     6%
whereas   3.4%
task      4%
bravery   1%


We would like to test the null hypothesis that ‘the fraction of uses of the word “until” in the unknown president’s speeches is equal to the fraction of uses of the word “until” for John Fitzgerald Kennedy.’ The fraction of uses of the word “until” for John Fitzgerald Kennedy is exactly known.

1. Give a 99% confidence interval for the fraction of uses of the word “until” in the unknown president’s speeches:

$[0.07 - 2.58 \times (3.75 \times 10^{-3}),\ 0.07 + 2.58 \times (3.75 \times 10^{-3})] = [0.0603,\ 0.0797]$

2. Give the t statistic for the null hypothesis:

$t = \dfrac{0.07 - 0.06}{3.75 \times 10^{-3}} = 2.67$

3. At what significance levels can we reject the null hypothesis? Tick all that apply (points added for right answer, removed for wrong answer).

☐ None

☐ At 1%

☐ At 5%

☐ At 10%
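Questions 1 to 3 can be checked together. The sketch below takes the standard error of 3.75*10^-3 as given in the answers above (it is close to the square root of 0.06*0.94/4000, which is an assumption about how it was obtained) and uses the usual large-sample two-sided scores 1.65, 1.96 and 2.58.

```python
p_hat = 0.07    # fraction of "until" in the unattributed speeches
p_null = 0.06   # fraction of "until" for John Fitzgerald Kennedy
se = 3.75e-3    # standard error as used in the answers above

# 99% confidence interval around the observed fraction.
z99 = 2.58
ci_99 = (p_hat - z99 * se, p_hat + z99 * se)

# t statistic for the null hypothesis p = p_null.
t_stat = (p_hat - p_null) / se

# Two-sided scores for each significance level (large-sample values).
critical = {"10%": 1.65, "5%": 1.96, "1%": 2.58}
rejected_at = [level for level, score in critical.items() if abs(t_stat) > score]

print([round(bound, 4) for bound in ci_99])
print(round(t_stat, 2))
print(rejected_at)
```

The null hypothesis is rejected at a given significance level whenever the absolute value of the t statistic exceeds the score for that level.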

4. Now the political scientist Alfred Moolb builds a t statistic for many null hypotheses. Each null hypothesis is ‘the fraction of uses of the word “XXXXX” in the unknown president’s speeches is equal to the fraction of uses of the word “XXXXX” in John Fitzgerald Kennedy’s speeches’. He writes the null hypothesis for 500 different words XXXXX, including until, whereas, task, bravery, and 496 other words. He builds a t statistic for each word as well.

Under the null hypothesis, how many of the 500 t statistics will be strictly greater than 2.58 (t>2.58)?

0.5%*(500) = 2.5

Under the null hypothesis, how many of the 500 t statistics will be strictly lower than -2.58 (t<-2.58)?

0.5%*(500) = 2.5


Under the null hypothesis, how many of the 500 t statistics will be strictly lower than -1.65 (t<-1.65)?

5%*500 = 25
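These expected counts are the number of tests multiplied by a tail probability. The sketch below recomputes the tail probabilities from the standard normal distribution, assuming the sample is large enough for the t statistic to be treated as normal; the results are close to the rounded values 0.5% and 5% used above.

```python
from statistics import NormalDist

n_tests = 500
std_normal = NormalDist()  # large-sample approximation of the t distribution

# Expected number of t statistics in each tail when every null hypothesis is true.
expected_above_258 = n_tests * (1.0 - std_normal.cdf(2.58))
expected_below_minus_258 = n_tests * std_normal.cdf(-2.58)
expected_below_minus_165 = n_tests * std_normal.cdf(-1.65)

print(round(expected_above_258, 1))
print(round(expected_below_minus_258, 1))
print(round(expected_below_minus_165, 1))
```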


Problem 6 – Pakistan Travel (5 points)

Anthony is a research assistant for the Clinton Foundation in Pakistan. He has collected observations on samples in different cities. In each city, he tests a null hypothesis for the mean of a variable. The variable has a normal distribution.

Each line of this table corresponds to a separate sample of size N, and a separate null hypothesis. Tick all boxes that apply in the last three columns.

Sample      t statistic   Sample size N   Reject H0 at 90%?   Reject H0 at 95%?   Reject H0 at 99%?
Lahore      3.2           23              ☐                   ☐                   ☐
Islamabad   -1.02         11              ☐                   ☐                   ☐
Sukkur      4.5           2               ☐                   ☐                   ☐
Larkana     -6.4          2               ☐                   ☐                   ☐
Peshawar    -7.8          120             ☐                   ☐                   ☐
Hyderabad   2.1           31              ☐                   ☐                   ☐
Karachi     -2.5          29              ☐                   ☐                   ☐
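One way to fill in the last three columns is to compare the absolute value of each t statistic with the two-sided t score for N − 1 degrees of freedom. The sketch below assumes a one-sample t test and hard-codes the scores from a standard t table; the row for 119 degrees of freedom reuses the 120 row as an approximation.

```python
# Two-sided t scores (90%, 95%, 99%) from a standard t table, by degrees of freedom.
T_CRITICAL = {
    1: (6.314, 12.706, 63.657),
    10: (1.812, 2.228, 3.169),
    22: (1.717, 2.074, 2.819),
    28: (1.701, 2.048, 2.763),
    30: (1.697, 2.042, 2.750),
    119: (1.658, 1.980, 2.617),  # approximation: values of the df = 120 row
}

# (sample, t statistic, sample size N) as listed in the table.
SAMPLES = [
    ("Lahore", 3.2, 23),
    ("Islamabad", -1.02, 11),
    ("Sukkur", 4.5, 2),
    ("Larkana", -6.4, 2),
    ("Peshawar", -7.8, 120),
    ("Hyderabad", 2.1, 31),
    ("Karachi", -2.5, 29),
]

def reject_flags(t_stat: float, n: int) -> tuple[bool, ...]:
    """Return (reject at 90%, reject at 95%, reject at 99%) with df = n - 1."""
    return tuple(abs(t_stat) > score for score in T_CRITICAL[n - 1])

decisions = {city: reject_flags(t_stat, n) for city, t_stat, n in SAMPLES}
for city, flags in decisions.items():
    print(city, flags)
```

Note how the two samples of size 2 need very large t statistics to reject, because a t distribution with 1 degree of freedom has very thick tails.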


Problem 7 – Correlation and Slope (6 points)

The World Bank argues that a higher fraction of women with a college degree causes a lower rate of violence.

They found that the average fraction of women with a college degree across countries is 25%. They also found that the standard deviation of violent acts is 10. In the same report, the correlation between the fraction of women with a college degree and violent acts is -0.4.

Give the formula relating slope and correlation.

$r(x,y) = b \times \dfrac{s_x}{s_y}$

What is the slope of the relationship between the fraction of women with a college degree (explanatory variable) and violence (dependent variable)?

$-0.4 = b \times \dfrac{0.433}{10}$, so $b = -9.23$

Note that here x is given as a percentage (fraction of women), but y is the number of violent acts. Therefore, to calculate s_x you must use p = 0.25 and not 25. This is because the two variables are NOT both percentages! Thus:

$s_x = \sqrt{0.25 \times (1 - 0.25)} = 0.433$

$s_y = 10$

x = fraction of women with a college degree (the unit is one percentage point)

y = number of violent acts (the unit is one violent act)

y = a – 9.23x

__________________________________________________________

Thus…

☐ The higher the fraction of women with a college degree, the higher the number of violent acts in a country.

☐ The higher the fraction of women with a college degree, the lower the number of violent acts in a country.

A 1 percentage point increase in the fraction of women with a college degree leads to a 9.23 decrease in the number of violent acts.
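The slope calculation can be reproduced as follows. This sketch follows the answer's own assumption that the standard deviation of x is the square root of p(1 − p) with p = 0.25; the unrounded slope is about -9.24, which the answer above reports as -9.23.

```python
from math import sqrt

p = 0.25     # average fraction of women with a college degree
r = -0.4     # correlation between x and y
s_y = 10.0   # standard deviation of violent acts

# Assumption taken from the answer above: s_x = sqrt(p * (1 - p)).
s_x = sqrt(p * (1.0 - p))

# Rearranging r = b * s_x / s_y gives b = r * s_y / s_x.
b = r * s_y / s_x

print(round(s_x, 3))
print(round(b, 4))
```

The sign of the slope is always the sign of the correlation, since both standard deviations are positive.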


Table of t Scores

END OF MIDTERM

Thank you for your answers.


DRAFT – WRITE HERE – ONLY ANSWERS ON THE ANSWER FORM ITSELF WILL BE CONSIDERED (FIRST 9 PAGES)
