Problems: ESTIMATION

1. Last year 55% of the American public agreed that "U.S. foreign policy is
misguided." The U.S. Secretary of State has asked you to find out if Americans'
views of U.S. foreign policy have improved since then. However, when you begin
collecting your data, you learn that your budget has been cut. You only have
enough money to sample 8 people!
a. What is the probability that everyone in a sample this size would agree that
U.S. foreign policy is misguided, despite the fact that Americans' evaluations
of U.S. foreign policy have remained unchanged (i.e., at 55%) since last year?
b. How large would your sample have to be to ensure that if less than 50% of
your sample agree that U.S. foreign policy is misguided, a statistically
significant effect would be detected at the .05 level of significance?
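A minimal Python sketch for checking the arithmetic in parts a and b. It assumes that part a is a plain binomial probability with p = .55, and that part b is a one-tailed test of a decrease from 55% using the normal approximation to the binomial with the null-hypothesis standard error. The function names are hypothetical helpers, not library calls.

```python
import math
from statistics import NormalDist


def prob_all_agree(n: int, p0: float) -> float:
    """Binomial P(X = n): every one of n sampled people agrees (part a)."""
    return p0**n  # C(n, n) = 1 and (1 - p0)**0 = 1


def min_sample_size(p0: float, p_obs: float, alpha: float) -> int:
    """Smallest n for which p_obs lies at least z_crit standard errors below p0.

    Uses the normal approximation with standard error sqrt(p0 * (1 - p0) / n)
    and a one-tailed critical value (part b).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)
    n = z_crit**2 * p0 * (1 - p0) / (p0 - p_obs) ** 2
    return math.ceil(n)


p_a = prob_all_agree(8, 0.55)
n_b = min_sample_size(0.55, 0.50, 0.05)
print(f"part a: {p_a:.5f}")
print(f"part b: n = {n_b}")
```

Rounding n up is deliberate: any smaller sample leaves a 50% result short of the critical value.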
2. Iowa State University has recently been reported to have one of the largest
student ratios of men-to-women of any university in the United States. (I.e., there
are many more male than female students at ISU.) If the true proportion of
women in the U.S. population is .52, would a finding of two women in a random
sample of 10 ISU students provide evidence at the .05 level of significance that
the proportion of women at ISU is less than it is in the U.S. population? Show
how you arrived at your answer.
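The probability behind problem 2 can be checked with an exact binomial tail sum. This sketch assumes a one-tailed test (fewer women at ISU) and that the 10 sampled students are independent draws with p = .52; `binom_cdf` is a hypothetical helper, not a library function.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Probability of 2 or fewer women among 10 students if the true proportion is .52.
p_value = binom_cdf(2, 10, 0.52)
print(f"one-tailed p-value = {p_value:.4f}")
print("significant at .05" if p_value < 0.05 else "not significant at .05")
```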
3. A recent psychological theory posits that the right hemisphere of the brain
processes intuitive thoughts (related to art and creativity) and that the left
hemisphere of the brain processes rational thoughts (related to logic and
mathematics). Moreover, previous research has provided considerable evidence
that the right brain hemisphere both inputs stimuli (such as sound waves) from
the left side of the body and controls behavior (such as hand motions) of the left
side of the body. Likewise, this research has provided evidence that the left brain
hemisphere both inputs stimuli from the right side of the body and controls
behavior of the right side of the body. Furthermore (the theory continues), as
their brains develop people tend to input and express information more by either
their right or their left hemisphere (but not both). (This is, for example, why [right
hemispheric] artistic people are likely to be left-handed and [left hemispheric]
rational people are likely to be right-handed.)
To test the theory, you randomly sample 18 students from a list of all left-handed
ISU undergraduates and 20 students from a list of all right-handed ISU
undergraduates. Each of these 38 subjects is asked to sit in a soundproof room
with earphones on. At 5-second intervals, different numbers are simultaneously
spoken into each of the respondent's ears. (For example, the subject might hear
"fifteen" spoken in his/her right ear at the same time that "thirty-one" is spoken
into his/her left ear.) This continues for 10 pairs of numbers. (NOTE: Each of the
20 numbers is an integer in the range from 1 to 99, and no two numbers are the
same.) Each subject is then asked to write down 10 numbers that they remember
having been spoken through the earphones. The 38 subjects are then classified
into one of three groups: (a) The subject recalled more numbers spoken into
his/her left ear than into his/her right ear, (b) the subject recalled more numbers
spoken into his/her right ear than into his/her left ear, or (c) the subject recalled
just as many numbers spoken into his/her right ear as were spoken into his/her
left ear. Your data are as follows:
Table 1: Ear That Heard Most Written-Down Numbers by Left- versus Right-Handedness.

                           Handedness
Ear that heard most      Left    Right
Left                        4       11
Right                       9        2
Neither                     5        7
a. Give the estimated conditional probability that a subject remembers
numbers spoken into his/her right ear more often than numbers spoken into
his/her left ear (i.e., that a subject has a value of "right" on the "ear that heard
most" variable) given that he/she is right-handed. (Show your work!)
b. Is there evidence that ISU undergraduates' right- or left-handedness is
statistically independent of whether they are more likely to remember numbers
spoken into their left ear, their right ear, or neither ear? (Use the .05 level of
significance.)
c. Do your data support the psychological theory? (Explain your answer!)
d. Assume that you gather a different sample. In particular, you randomly
sample 7 left-handed ISU undergraduates, from a population in which 50% of
all left-handed ISU undergraduates are more likely to remember numbers
spoken into their right ears than numbers spoken simultaneously into their left
ears. What is the probability that 2 of these 7 subjects would remember
numbers spoken into their right ears more often than numbers simultaneously
spoken into their left ears?
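Parts a, b, and d reduce to a conditional probability, a Pearson chi-square test of independence, and a binomial probability. A minimal sketch in plain Python, assuming the 38 subjects are independent observations; the helper name is hypothetical. The p-value uses the fact that a chi-square distribution with 2 degrees of freedom has upper tail exp(-x/2), which avoids needing a statistics library.

```python
import math
from math import comb

# Table 1 counts. Rows: ear that heard most (left, right, neither).
# Columns: handedness (left-handed, right-handed).
observed = [[4, 11], [9, 2], [5, 7]]


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Part a: P(right ear | right-handed) = cell count / right-handed column total.
right_handed_total = sum(row[1] for row in observed)
p_right_given_right_handed = observed[1][1] / right_handed_total

# Part b: test of independence.
chi2, df = chi_square_independence(observed)
assert df == 2  # the closed form below only holds for 2 degrees of freedom
p_value = math.exp(-chi2 / 2)

# Part d: P(exactly 2 of 7) when p = 0.5.
p_two_of_seven = comb(7, 2) * 0.5**2 * 0.5**5

print(p_right_given_right_handed, round(chi2, 3), df, round(p_value, 4), p_two_of_seven)
```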
4. A few years ago Newsweek proclaimed Eastham Prison in Texas to be
"America's toughest prison." You have obtained permission from Eastham's
warden to investigate the tranquilizing effects of marijuana on violence in the
prison. Your data come from 15 prison inmates who volunteered to participate in
your research. During the month prior to beginning your research, the inmates in
your sample committed an average of twenty violent acts at the prison and the
standard deviation among the number of inmates' violent acts was six.
a. You plan to provide unlimited marijuana to the fifteen inmates for a one
month period and to note the violent acts committed by each inmate during
this month. If during this month the average inmate's violent behavior is not
at least five violent acts fewer than during the month prior to your research,
the warden is not interested in using marijuana as a means of decreasing
violence in the prison. Is your sample large enough for you to detect an
estimated decrease of 5 violent acts as statistically significant at the .05 level
of significance? (Show your work.)
b. You complete your research and provide unlimited marijuana to the fifteen
inmates for a one month period, during which time you note the violent acts
committed by each inmate. The numbers of each inmate's violent acts are
listed below:
14 27 16 10 12 21 9 15 30 18 14 13 16 5 20
Give a 95% confidence interval for the number of violent acts committed by
the inmates during the month that they were provided unlimited marijuana.
c. Parts a and b make two important assumptions—one about how the 15
inmates were selected for analysis and another about the distribution of
inmates' numbers of violent acts within the prison. What are these
assumptions? Explain why these assumptions are (or are not) justifiable in
this analysis.
d. In your report to the warden, what conclusions do you make about the
substantive importance (in the warden's opinion) and the statistical
significance of your findings? (Justify your conclusions.)
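A short sketch of the arithmetic for parts a and b. It assumes part a is a one-tailed t test that uses the prior month's standard deviation (6) as the estimate of spread, and it hardcodes t-table critical values for 14 degrees of freedom rather than computing them; both choices are assumptions, not part of the problem statement.

```python
import math
from statistics import mean, stdev

# Critical values copied from a standard t table for 14 degrees of freedom
# (assumed inputs, not computed here).
T_ONE_TAILED_05_DF14 = 1.761
T_TWO_TAILED_05_DF14 = 2.145

# Part a: is a decrease of 5 acts detectable with n = 15 and s = 6?
n, s_prior, decrease = 15, 6.0, 5.0
t_a = decrease / (s_prior / math.sqrt(n))
detectable = t_a > T_ONE_TAILED_05_DF14

# Part b: 95% confidence interval from the month with marijuana.
acts = [14, 27, 16, 10, 12, 21, 9, 15, 30, 18, 14, 13, 16, 5, 20]
m = mean(acts)
se = stdev(acts) / math.sqrt(len(acts))  # stdev() uses the n - 1 denominator
ci = (m - T_TWO_TAILED_05_DF14 * se, m + T_TWO_TAILED_05_DF14 * se)

print(f"part a: t = {t_a:.3f}, detectable = {detectable}")
print(f"part b: mean = {m}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```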
5. The United States Food and Drug Administration (FDA) has just given your
research organization permission to administer your new experimental drug,
kilumbicyanide (KUBC), to ten human subjects. The drug is supposed to improve
the memories of people suffering from Alzheimer’s disease. Two hours before
and two hours after taking the drug you administer a ten-item recall (or memory)
test to a random sample of ten people afflicted with Alzheimer’s disease. Scores
on the test range from 0=no recall of any of the items to 10=perfect recall of
each item. For each subject you then calculate the amount of improvement in
memory by subtracting "his/her recall score before taking the drug" from "his/her
recall score after taking the drug." The improvement measure thus ranges from
–10 = least improvement (i.e., the subject had perfect recall before taking the
drug and no recall after taking it) to 10 = most improvement (i.e., the subject
had no recall before taking the drug and perfect recall after taking it).
Subjects' scores on this improvement measure are as follows:
–1 2 4 –2 7 0 1 2 6 –1
a. Find estimates of improvement scores' central tendency and dispersion
among the population of all Alzheimer’s patients. (Show how your estimates
were derived and give units for each estimate.)
b. Give a 95% confidence interval for subjects' improvement scores. (Show
your work.)
c. Do you have statistically significant evidence that KUBC improves the
memories of Alzheimer’s patients? (Use the .05 level of significance and
explain your answer by interpreting your statistics in words.)
d. Parts b and c make two important assumptions—one about how the 10
patients were selected for analysis and another about the distribution of their
improvement scores. What are these assumptions? Explain why these
assumptions are (or are not) justifiable in this analysis.
e. You wish to ask the FDA's permission to replicate your research using more
subjects, but you must first tell them how many you will need. For the
purpose of estimating the population variance, assume that the above data are
representative of all Alzheimer's patients. How large a sample would you need
to detect an average improvement score of 1 as significantly larger than zero
at the .01 level of significance? (Show your work.)
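The estimates and tests in parts a, b, c, and e can be sketched as follows. Assumptions to flag: part c is treated as a one-tailed test (improvement only), t critical values are hardcoded from a t table, and part e uses the normal approximation instead of iterating on t, so a t-based answer would likely come out somewhat larger.

```python
import math
from statistics import NormalDist, mean, stdev

improvement = [-1, 2, 4, -2, 7, 0, 1, 2, 6, -1]
n = len(improvement)

# Part a: point estimates (mean and SD in improvement points,
# variance in squared improvement points).
m = mean(improvement)
s = stdev(improvement)  # n - 1 denominator
se = s / math.sqrt(n)

# t-table critical values for 9 degrees of freedom (assumed inputs).
T_TWO_TAILED_05_DF9 = 2.262
T_ONE_TAILED_05_DF9 = 1.833

# Part b: 95% confidence interval.
ci = (m - T_TWO_TAILED_05_DF9 * se, m + T_TWO_TAILED_05_DF9 * se)

# Part c: one-sample t statistic against a mean improvement of 0.
t_c = m / se

# Part e: n needed to detect a mean of 1 at the .01 level (one-tailed),
# using the normal approximation and the sample SD in place of sigma.
z_01 = NormalDist().inv_cdf(0.99)
n_e = math.ceil((z_01 * s / 1.0) ** 2)

print(f"mean = {m}, variance = {s**2:.3f}, SD = {s:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), t = {t_c:.3f} vs {T_ONE_TAILED_05_DF9}")
print(f"n needed for part e = {n_e}")
```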
6. Five years ago the Iowa State Government changed its hiring procedures from
recruitment done by the Iowa Central Personnel Agency (ICPA) to recruitment
done by the specific agencies within which hirees would be working. The old
procedure for hiring someone into a job involved two steps: First, job seekers
would send applications to the ICPA. Second, the ICPA would send the 6 "best
applicants" to be interviewed at the specific agency (SA) within which the job was
located. In contrast, according to the new procedure all applications are sent to
the SA, and officials at the SA choose which are the 6 best applicants to be
interviewed.
When the change in hiring procedures was made, the Director of the ICPA
justified her decision with the argument that officials at the ICPA were less
qualified than those in the SAs to evaluate who the best applicants for a specific
job were. Now that her decision has been in effect for 5 years, she has hired you
to evaluate whether or not state employees hired since her decision (under the
new hiring procedure) are better workers than those hired prior to her decision
(under the old hiring procedure). She gives you unlimited access to the ICPA's
files on all of Iowa's 3,637,936 past and present state employees. Although most
information in these files can NOT be accessed electronically, you are able to
compile the following table from the ICPA's main computer:
Table 1. Hiring outcomes of applicants for Iowa state jobs during 2 time periods.

                      How long ago a hire was made
Hiring outcome        6+ years ago    0-5 years ago
hired                    3,547,891           90,045
not hired               14,191,564          359,955
Be sure that you interpret the data in this table correctly. For example, the
3,547,891 in the table indicates the number of applicants who were hired for Iowa
state jobs 6 or more years ago.
Your first concern is that one of the two time periods might have had more
applicants-per-job than the other. This concerns you because you believe that
applicants hired during the last 5 years may be better workers simply because in
comparison to previous years more people may have applied for their jobs (or on
the other hand, may be worse workers because of fewer applicants). (For
example, one would generally expect that the best 6 from among 100 applicants
would be worse than the best 6 from among 150 applicants.) Parts a and b of this
question deal with this concern.
a. What is the conditional probability that an applicant for an Iowa state job
was hired when the old hiring procedure was being used (i.e., given that the
hire was made 6+ years ago)?
b. Using the .05 significance level, evaluate whether during the last 5 years
there was a significantly different number of applicants-per-job than during
prior years. Was there a substantively different number of applicants-per-job
during the last 5 years than during prior years? How can you tell? (Hints: Do
NOT use chi-square in your answer. Instead, use the answer in part a as the
"no effects value" of the parameter being evaluated. You may find it useful to
think here of "employees hired from an applicant pool of a particular size"
rather than of applicants-per-job.)
c. If you were to randomly sample 5 applicants who applied for an Iowa state
job when the old hiring procedure was being used, what is the probability that
all five were hired?
d. Of course, at this point you have not yet considered anything related to
whether or not "better workers" have been hired during the past 5 years. In
doing this you first identify pairs of employees (comprised of one of the
3,547,891 "pre-decision employees" hired before the ICPA's change in hiring
procedures and one of the 90,045 "post-decision employees" hired after the
change) who have identical characteristics on a variety of variables (e.g., age,
gender, marital status, job type, etc.). You then randomly sample 49 of
these matched pairs of state employees.
Your data on each pair come from the first-year evaluations written by each
employee's supervisor. First, you count the number of positive statements in
each of these evaluations. Second, you subtract the number of positive
statements for the pre-decision employee from the number of positive
statements for the post-decision employee in each matched pair. Thus a
score of 2 on the resulting variable indicates that the post-decision employee
in a matched pair had two more positive statements in her/his first-year
evaluation than did the pre-decision employee in the pair. This variable has a
mean of 3 and a variance of 100.
Do you have statistically significant evidence (at the .05 level) that the ICPA
Director's decision resulted in the hiring of better workers?
e. In your final report you recommend that someone should replicate (i.e.,
repeat) your analysis after another 5 years. If such a replication were to find
the same sample mean (i.e., 3) and variance (i.e., 100) as you found, what
would be the smallest possible sample with which you could conclude (at the
.05 significance level) that the ICPA Director's decision resulted in the hiring of
better workers?
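The quantities in parts a through e can be checked with a few lines of Python. This is a sketch under stated assumptions: part b is a one-sample z test of the recent hiring proportion against the part-a value, parts d and e are one-tailed z tests on the matched-pair differences (variance 100, so SD 10), and part c treats the five sampled applicants as independent draws. A t-based calculation would push the part-e sample size slightly higher.

```python
import math
from statistics import NormalDist

hired_old, not_hired_old = 3_547_891, 14_191_564   # 6+ years ago
hired_new, not_hired_new = 90_045, 359_955         # 0-5 years ago

# Part a: P(hired | old procedure).
p_old = hired_old / (hired_old + not_hired_old)

# Part b: z test of the recent hiring proportion against p_old.
n_new = hired_new + not_hired_new
p_new = hired_new / n_new
z_b = (p_new - p_old) / math.sqrt(p_old * (1 - p_old) / n_new)

# Part c: all 5 of 5 independently sampled old-procedure applicants hired.
p_all_five = p_old**5

# Part d: matched-pair differences with n = 49, mean 3, variance 100.
z_d = 3 / (math.sqrt(100) / math.sqrt(49))

# Part e: smallest n for which the same mean and variance give z >= z_crit.
z_crit = NormalDist().inv_cdf(0.95)
n_e = math.ceil((z_crit * math.sqrt(100) / 3) ** 2)

print(f"a: {p_old:.4f}  b: p_new = {p_new:.4f}, z = {z_b:.3f}")
print(f"c: {p_all_five:.5f}  d: z = {z_d:.2f}  e: n = {n_e}")
```

Note how large n_new is: even a difference of 0.0001 in hiring proportions gets a precise estimate here, which is why parts b's statistical and substantive questions are worth separating.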
7. Between 1778 and 1868 a total of 368 treaties were signed between the United
States and various tribes of Native Americans. As early as 1873 Edward Smith
(U.S. Commissioner of Indian Affairs) was able to look back on these 9 decades
of treaty-making and observe, "We have in theory over sixty-five independent
nations within our borders, with whom we have entered into treaty relations as
being sovereign people; and at the same time the white agent is sent to control
and supervise these foreign powers, and care for them as wards of the
Government. The double condition of sovereignty and wardship involves
increasing difficulties and absurdities." During the 1900s, U.S. government
policies toward Native Americans were commonly based on legal
arguments that ignored Native Americans' sovereignty but that promoted U.S.
business interests in the name of the U.S. government's paternalistic role over its
Native American wards. For example, given the terms of the 1842 treaty with the
Chippewa Nation and its mineral rights in Wisconsin, BHP Billiton (an Australian
mining company) is currently fighting for its legal right to use its cyanide-leaching process for mining gold in Crandon Mine, and in so doing to pollute the
Chippewa lands where the mine is located. The company's legal staff argues that
the mine will provide jobs for Native Americans, and thus will (paternalistically)
help them.
You believe that the shift from sovereign to paternalistic legal treatment of Native
Americans dates back to the 91-year period when treaties between Native
Americans and the United States were first being signed. In operationalizing your
key concepts, you note that sovereignty is usually conveyed when a Native
American is referred to in a treaty as the grammatical subject of a sentence (e.g.,
"If any citizen of the United States . . . shall attempt to settle on any of the lands
hereby allotted to the Indians to live and hunt on, . . . the Indians may punish him
or not as they please." [Treaty of Hopewell, Art. 4, Jan. 3, 1786]). On the other
hand, paternalism is conveyed whenever a Native American is referred to in a
treaty as the grammatical object in a sentence (e.g., "The United States bind
themselves to protect the aforesaid Indian nations against the commission of all
depredations by the people of the said United States." [Treaty of Fort Laramie,
Art. 3, Sep. 17, 1851]). Taking the texts of the 368 treaties as your data source,
you find 15,251 instances of the words "Indian" or "Indians" in these texts.
a. You randomly sample 64 out of the 15,251 instances for a preliminary
analysis of your data, and note the date of the treaty in which each appears.
The average among these 64 dates is 1820, and the standard deviation among
the dates is 48. Describe the sampling distribution of average dates among
samples of 64 instances, and give point estimates of this distribution's
parameters (i.e., its mean and variance).
b. Your analysis would be greatly simplified if just as many of the 15,251
instances appeared in later treaties (1825 or after) as appeared in earlier
treaties (before 1825). Referring to the average of 1820 reported in part a, what
is the probability that an average this far or further from 1825 is due to
sampling error? (Hint: In answering this part of the question, assume that the
average treaty-date among all instances equals 1825.)
c. Based on the probability (sometimes called a "p-value") calculated in part b,
do you have statistically significant evidence (at α = .05) that the average of
1820 is different from 1825? Explain your answer.
The next parts of this problem deal with a second data set. In particular, the next
step in your analysis is to sample 20 instances within each of six time periods
during the 91 years when the treaties were signed. Your data are as follows:
Table 1. Use of "Indian" or "Indians" as subject or object in U.S. treaties with
Native Americans for six time periods between 1778 and 1868.

                                       Period when treaty was signed
Grammatical usage   1778-1794  1795-1809  1810-1824  1825-1839  1840-1854  1855-1868
subject                    15          8          7          9          5          4
object                      5         12         13         11         15         16
Be sure that you interpret the data in this table correctly. For example, the 15 in
the table's upper-left cell indicates how many of the 20 instances of "Indian" or
"Indians" sampled from treaties signed between 1778 and 1794 were the
grammatical subjects of sentences within these treaties.
d. Does Table 1 provide statistically significant evidence (at the .05 level of
significance) that grammatical usage of the words "Indian" or "Indians" within
a treaty is dependent on the period when the treaty was signed?
e. Using conditional probabilities, evaluate whether the data in Table 1 are
consistent with your belief that "the shift from sovereign to paternalistic legal
treatment of Native Americans dates back to the 91-year period when treaties
between Native Americans and the United States were first being signed."
Explain your answer!
f. A random sample of how many instances of the words "Indian" or "Indians"
would be needed at the .05 significance level to obtain an estimate within 5
percentage points of the percent of such instances used as subjects in
treaties signed between 1855 and 1868? Justify your choice of a variance
estimate. (Hint: Assume that all instances of the words are used
grammatically as either subject or object.)
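A sketch of the calculations behind parts a, b, d, e, and f. It treats the 64 sampled instances as a simple random sample and ignores the finite population correction (64 of 15,251 is under 0.5%), uses the normal approximation for the sample mean and sample proportion, and hardcodes the .05 chi-square critical value for 5 degrees of freedom from a table (an assumed input). Part f is computed two ways because the problem asks you to justify the variance estimate: once from the 1855-1868 sample proportion and once from the conservative p = .5.

```python
import math
from statistics import NormalDist

Z = NormalDist()

# Parts a-b: sampling distribution of the mean date for n = 64.
n, xbar, s = 64, 1820, 48
se = s / math.sqrt(n)            # estimated standard error (years)
var_of_mean = se**2              # estimated variance (squared years)
z = (xbar - 1825) / se
p_two_tailed = 2 * (1 - Z.cdf(abs(z)))

# Part d: chi-square test on Table 1 (2 usages x 6 periods).
subject = [15, 8, 7, 9, 5, 4]
obj = [5, 12, 13, 11, 15, 16]
table = [subject, obj]
row_tot = [sum(r) for r in table]
col_tot = [a + b for a, b in zip(subject, obj)]
grand = sum(row_tot)
chi2 = 0.0
for i, row in enumerate(table):
    for j, count in enumerate(row):
        expected = row_tot[i] * col_tot[j] / grand
        chi2 += (count - expected) ** 2 / expected
CHI2_CRIT_05_DF5 = 11.070        # from a chi-square table (assumed input)

# Part e: P(subject | period) for each of the six periods.
p_subject = [sub / tot for sub, tot in zip(subject, col_tot)]

# Part f: n for a 95% margin of error of 5 percentage points.
z975 = Z.inv_cdf(0.975)


def n_for_margin(p: float, margin: float = 0.05) -> int:
    return math.ceil(z975**2 * p * (1 - p) / margin**2)


n_from_sample = n_for_margin(subject[-1] / col_tot[-1])  # p-hat = 4/20
n_conservative = n_for_margin(0.5)

print(f"a: mean 1820, variance {var_of_mean}  b: z = {z:.3f}, p = {p_two_tailed:.3f}")
print(f"d: chi-square = {chi2:.3f} (critical {CHI2_CRIT_05_DF5})")
print(f"e: {p_subject}  f: {n_from_sample} or {n_conservative}")
```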
8. In 1922 William Ogburn was the first social scientist to point out that in modern
societies technological progress is often met with dissatisfaction. Ogburn used
the term "cultural lag" to refer to the tendency for people's acceptance of (or
satisfaction with) technological change to occur later than the time when the
technology is introduced. Cultural lag is problematic insofar as popular
dissatisfaction slows down the pace at which technological improvements are
implemented. One of the most recent innovations in contemporary societies
involves a change from "traditional-learning" to "distance-learning." Students
need no longer attend classes but can communicate over the Internet with
teachers and classmates using e-mail and chat rooms. Such online learning
leads to greater efficiency in the efforts of both teachers (e.g., as "streaming
videos" free up their time for more teacher-student interaction online), and
students (e.g., as studying and interaction occur at times convenient for those
with full-time employment).
Virtual University (vu.org)—the self-proclaimed "oldest and largest online
learning community"—has hired you to evaluate cultural lag among its 947
students and 28 faculty. You develop a "Cultural Lag Scale" (CLS) consisting of
20 agree/disagree questions that measure respondents' dissatisfaction with
online learning. For example, the last question on the scale is as follows: "Do
you agree or disagree that students learn at least as much from online video
lectures as they would from classroom lectures?" An agree response indicates
satisfaction with online learning, whereas a disagree response indicates
dissatisfaction with online learning. CLS scores are obtained by assigning a
weight of 0 to each progressive (i.e., satisfaction) response and of 1 to each lag
(i.e., dissatisfaction) response, and then summing these weights. Thus each
respondent's CLS score can range from 0 to 20, where 20 lag-responses signifies
greatest cultural lag and 0 lag-responses signifies most cultural progressiveness.
(A score of 10 signifies equivalence in the respondent's cultural lag and her
progressiveness.)
a. Based on short telephone interviews with a small randomly sampled group
of Virtual University (VU) students, you believe that the variance in students'
CLS scores equals 12.25 squared lag-responses. Assuming that your belief is
accurate, how large a sample would be needed to ensure that an average CLS
score would be within 1 lag-response of the true mean in 99 out of 100
samples of this size?
b. You randomly sample 200 students' names out of the 947 students currently
enrolled at Virtual University (VU). You then mail out questionnaires to these
200 students as well as to all 28 VU faculty. Whereas all faculty return their
completed questionnaires to you, only 124 of the students return theirs.
Estimate the joint probability that a questionnaire was both mailed out to a
student and returned.
c. Do you have evidence at the .001 significance level that a person's returning
of her/his questionnaire is statistically independent of whether s/he is a
student or a faculty member?
d. Based on the data presented in part b, calculate a point estimate for the
proportion of VU students (i.e., NOT faculty) who would not return your
questionnaire if you were to mail one to them.
e. Find a 95% confidence interval for the proportion calculated in part d.
f. You decide to use the CLS scores from the 28 VU faculty to evaluate whether
all distance-learning faculty (i.e., all faculty who, like VU faculty, teach online)
exhibit more progressiveness than cultural lag. CLS scores among the 28 VU
faculty are 10.77 lag-responses on average, with a standard deviation of 3.1
lag-responses. If the average CLS score among all distance-learning faculty
equals 10 lag-responses (i.e., equal progressiveness and cultural lag) with a
standard deviation of 3.1 lag-responses (i.e., the same found in your sample),
what is the probability that a random sample of 28 faculty from this population
(i.e., from the population of all distance-learning faculty everywhere) would
have an average CLS score of 10.77 or more progressive than this?
g. What two assumptions must be made before you may legitimately claim that
the probability obtained in part f can be generalized to all distance-learning
faculty everywhere?
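Parts a through f involve a sample-size formula, a joint probability, a 2 x 2 chi-square test, and two normal-theory calculations. The sketch below assumes simple random sampling, ignores the finite population correction for the 947 students, and takes the upper-tail area beyond 10.77 in part f (the lower-tail area is one minus this value). For 1 degree of freedom it uses the exact identity P(χ² > x) = erfc(sqrt(x/2)), so no statistics library is needed for the chi-square p-value.

```python
import math
from statistics import NormalDist

Z = NormalDist()

# Part a: n so that the sample mean falls within 1 lag-response of the true
# mean in 99 of 100 samples, with sigma^2 = 12.25.
sigma = math.sqrt(12.25)
n_a = math.ceil((Z.inv_cdf(0.995) * sigma / 1.0) ** 2)

# Part b: counts from the mailing.
students_returned, students_mailed = 124, 200
faculty_returned, faculty_mailed = 28, 28
total_mailed = students_mailed + faculty_mailed
p_joint = students_returned / total_mailed  # P(student and returned)

# Part c: 2 x 2 chi-square (rows: student, faculty; columns: returned, not).
table = [
    [students_returned, students_mailed - students_returned],
    [faculty_returned, faculty_mailed - faculty_returned],
]
row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
chi2 = sum(
    (table[i][j] - row_tot[i] * col_tot[j] / total_mailed) ** 2
    / (row_tot[i] * col_tot[j] / total_mailed)
    for i in range(2)
    for j in range(2)
)
p_value_c = math.erfc(math.sqrt(chi2 / 2))  # exact upper tail for df = 1

# Parts d-e: proportion of students not returning, with a 95% interval.
p_not = (students_mailed - students_returned) / students_mailed
se = math.sqrt(p_not * (1 - p_not) / students_mailed)
z975 = Z.inv_cdf(0.975)
ci = (p_not - z975 * se, p_not + z975 * se)

# Part f: z for a sample mean of 10.77 when mu = 10, sigma = 3.1, n = 28.
z_f = (10.77 - 10) / (3.1 / math.sqrt(28))
p_upper = 1 - Z.cdf(z_f)

print(n_a, round(p_joint, 4), round(chi2, 2), p_value_c)
print(p_not, [round(x, 4) for x in ci], round(z_f, 3), round(p_upper, 4))
```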
9. On May 2nd, 1994, the African National Congress (ANC) party was declared to
have won majority control of South Africa's parliament within the country's first
all-race election. Soon thereafter Nelson Mandela was sworn in as the country's
first black president. During the decade leading up to this election, increasingly
violent unrest perpetrated primarily by youths in black townships was recognized
by Mandela and others as legitimate anti-apartheid protest. Yet after 1994 youth
unrest recurred, as poor blacks realized that their situation had not improved. In
fact, the South African economy has worsened dramatically since 1994. For
example, Statistics South Africa reports that between February and September
2001 unemployment increased from 37.0% to 41.5% of the economically active
labor force, and that there was 11.6% inflation in the 12 months following August
2001.
Since the 1999 election, the ANC has controlled over 58% of the seats in
Parliament. (Parties dominated by whites control less than 17%.) In this new
political climate, youth unrest is no longer recognized as legitimate by the ANC.
Instead, unrest is understood as lawless delinquency that is undermining the
prosperous state that the new black leadership is striving to build. Given these
concerns, the ANC hires you to investigate how widespread delinquent acts are
among South Africa's youth. In particular, they want you to estimate the
average number of delinquent acts perpetrated during the past month by South
African youths between the ages of 15 and 25.
a. You begin your research with a small pilot study of 25 youths (randomly
sampled from among all South African youths between 15 and 25 years of
age). After lengthy interviews with the youths, you determine that they
perpetrated an average of 10 delinquent acts during the past month, and that
the standard deviation among these acts was 21 acts. Find a 95% confidence
interval for this average.
b. After obtaining the confidence interval in part a, a colleague points out to
you that 20 out of the 25 youths in your pilot study perpetrated no (i.e., zero)
delinquent acts during the past month. She then says, "Given that so few of
the youths participated in any delinquent acts, and that so many delinquent
acts were perpetrated by so few of them, you should not calculate a
confidence interval in this case." Why is this? (Hints: Of course, the median
would be a less misleading measure of central tendency than the mean here.
Yet nonetheless, there is nothing illegitimate about estimating a confidence
interval for such a misleading measure just because your population variance
is large due to skewed data like those described in this case.)
c. Of course, the pilot study was not entirely worthless. It has provided you
with an estimate of the population standard deviation among the youths'
delinquent acts. Given this, how large a sample size would you need to
estimate the average number of delinquent acts perpetrated (during the same
month) by South African youths (aged 15-25) to within 1 delinquent act?
(Hints: Use the .05 significance level, and show your work.)
d. After collecting data from interviews with 225 South African youths (aged
15-25) you obtain precisely the same mean (i.e., 10 acts) and standard
deviation (i.e., 21 acts) as you did in the pilot study. What is the probability
that this mean is within 1 delinquent act of the true average number of deviant
acts perpetrated during the past month by South African youths between the
ages of 15 and 25?
After reexamining transcripts of your interviews with the youths, you realize that
youths involved in delinquent acts were chronically unemployed, whereas those
not involved had expressed some hope of future employment. In preparing your
report for the ANC, you wish to incorporate parts of the following data, obtained
from Statistics South Africa:
Table 1: South African Labor Market Trends (in millions of people).

                         Economically active        Economically
                       Employed    Unemployed           inactive
February 1, 2001         11.837         6.961              8.323
September 1, 2001        10.833         7.698              8.834
(Hint: Be sure that you read this table correctly. For example, on February 1,
2001, economically active South Africans consisted of 11.837 million
employed persons and 6.961 million unemployed persons.)
e. What is the marginal probability (i.e., regardless of date) that a South
African is economically active (i.e., employed or unemployed)?
f. What is the joint probability that a South African was employed and that this
employment was on February 1, 2001?
g. Using numbers in Table 1 calculate two conditional probabilities to evaluate
whether or not there was an increase in South African unemployment from
February to September 2001. (Hint: In this problem you should consider
unemployed South Africans as neither employed nor economically inactive.)
h. To see if the increase described in part g is statistically significant, you
calculate a chi-square from Table 1, and find χ² = .095647 to be its value.
Based on this value of chi-square, do you have evidence that South Africans'
employment status is dependent on the time in 2001 when it was considered?
(Hints: Use α = .001, and be sure to show how you arrive at your answer!)
i. Your savvy colleague looks over your shoulder once again and comments,
"Duuuuhhhh. The data in Table 1 are in millions, silly. It wasn't 11.837 people
who were employed on February 1, 2001; it was 11,837,000 people!!! You have
to change the table's cell frequencies to millions before calculating chi-square." Without recalculating chi-square (i.e., making use of the chi-square
value given to you in part h), what is the correct value of chi-square for
Table 1?
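A sketch of the arithmetic in parts a and c through i, assuming simple random sampling of youths, a t-table critical value of 2.064 for 24 degrees of freedom (an assumed input), and the normal approximation in parts c and d. The last lines illustrate the point behind part i: every expected count scales with the observed counts, so Pearson's chi-square scales by the same factor when each cell is multiplied by 1,000,000. For 2 degrees of freedom the .001 critical value is exactly -2 ln(.001).

```python
import math
from statistics import NormalDist

Z = NormalDist()

# Part a: pilot study, n = 25, mean 10, s = 21.
T_975_DF24 = 2.064  # t-table value (assumed input)
se_pilot = 21 / math.sqrt(25)
ci_pilot = (10 - T_975_DF24 * se_pilot, 10 + T_975_DF24 * se_pilot)

# Part c: n to estimate the mean within 1 act at 95% confidence.
n_c = math.ceil((Z.inv_cdf(0.975) * 21 / 1) ** 2)

# Part d: P(|sample mean - mu| <= 1) when n = 225 and s = 21.
se_225 = 21 / math.sqrt(225)
p_within_1 = 2 * Z.cdf(1 / se_225) - 1

# Parts e-i: Table 1 in millions; columns are employed, unemployed, inactive.
labor = {"Feb": [11.837, 6.961, 8.323], "Sep": [10.833, 7.698, 8.834]}
rows = list(labor.values())
grand = sum(sum(r) for r in rows)
p_active = sum(r[0] + r[1] for r in rows) / grand                        # part e
p_employed_and_feb = labor["Feb"][0] / grand                             # part f
p_unemployed_given_date = {d: r[1] / sum(r) for d, r in labor.items()}  # part g


def pearson_chi_square(table):
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return sum(
        (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
        / (row_tot[i] * col_tot[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )


chi2_millions = pearson_chi_square(rows)                                    # part h
chi2_people = pearson_chi_square([[x * 1_000_000 for x in r] for r in rows])  # part i
crit_001_df2 = -2 * math.log(0.001)  # exact .001 critical value for df = 2

print([round(x, 2) for x in ci_pilot], n_c, round(p_within_1, 4))
print(round(p_active, 4), round(p_employed_and_feb, 4), p_unemployed_given_date)
print(round(chi2_millions, 6), round(chi2_people), round(crit_001_df2, 3))
```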
10. Native American settlements only became prevalent in Iowa during the Late
Archaic Period (2500 - 500 B.C.). These Indians were primarily hunters, who
stalked their prey in groups using bows and arrows. During the Woodland Period
(500 B.C. - A.D. 1000) Iowa Indians' diet was expanded to include fish as well as
vegetables that they had farmed (instead of gathered) themselves.
Archaeologists suspect that another important difference between the Native
American populations of Iowa during these two periods is that Indians traveled
more during the later than during the earlier period.
You are doing archaeological work at Buchanan Bog (a large peat bog northeast
of Ames, near McFarland Park). The site is perfect for your research because it
contains many artifacts from both the Late Archaic and Woodland Periods.
These artifacts include bone tools, flint arrowheads, and some ceramics.
Of central importance in your research is the type of flint that the arrowheads
from this site are made of. Most of these arrowheads are made of flint that is
easily found near Buchanan Bog. However, other arrowheads' flint comes from
locations as far away as Illinois. After training yourself on the geology of Iowa and its
surrounding states, you assemble a data set by sampling (in as random a fashion
as possible) 52 arrowheads from the site, and by assigning two pieces of
information to each of these arrowheads: the period (i.e., Late Archaic versus
Woodland) and the distance (in miles) from Buchanan Bog to the likely location
where the arrowhead came from. Your thinking is that more traveling will have
occurred in sites with arrowheads made of flint from relatively distant origins. As
it turns out, the same number (i.e., 26) of arrowheads in your sample come from
each period. Data on the means and variances of the distances (separately for
each period as well as for the combined data on all 52 arrowheads) are as
follows:
Table 1. Means and variances for distances (in miles) from Buchanan Bog to likely
locations of origin for arrowheads from Late Archaic and Woodland Periods.

Period                                   Sample Size   Mean   Variance
Late Archaic (2500 - 500 B.C.)                    26     29       1518
Woodland (500 B.C. - A.D. 1000)                   26     25       1128
Combined Data (2500 B.C. - A.D. 1000)             52     27       1297
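Several of the parts below (b, c, and e) can be computed directly from Table 1 and the sample counts. A minimal sketch, assuming the 52 arrowheads behave like a random sample, using an exact binomial tail for part c, and using a t-table critical value of about 2.008 for 51 degrees of freedom in part e (an assumed input). `binom_sf` is a hypothetical helper, not a library function.

```python
import math
from math import comb

# Sample counts and Table 1 values (distances in miles, variances in squared miles).
late_n = 26
total_n, mean_all, var_all = 52, 27, 1297

# Part b: marginal probability that a sampled arrowhead is Late Archaic.
p_late = late_n / total_n


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Part c: probability of 26 or more Late Archaic arrowheads if the true share is 35%.
p_tail = binom_sf(late_n, total_n, 0.35)

# Part e: 95% confidence interval for the combined mean distance.
T_975_DF51 = 2.008  # approximate t-table value (assumed input)
se = math.sqrt(var_all / total_n)
ci = (mean_all - T_975_DF51 * se, mean_all + T_975_DF51 * se)

print(f"part b: {p_late}")
print(f"part c: {p_tail:.4f}")
print(f"part e: ({ci[0]:.2f}, {ci[1]:.2f}) miles")
```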
a. What are the units associated with the variances (i.e., the values 1518, 1128,
and 1297) listed in the table?
b. What is the marginal probability that one of the arrowheads in your sample
is from the Late Archaic Period? Show your work!
c. Your supervisor is surprised that half of the arrowheads in your sample are
from the Late Archaic Period. There were many fewer Indians in Iowa during
this earlier period, and at all other archaeological sites in Iowa only 35% of
arrowheads have been from this period whereas 65% have been from the
Woodland Period. If Buchanan Bog is actually just like these other Iowa sites
in that the true percentage of Late Archaic Period arrowheads there is 35%,
what is the probability of finding 26 Late Archaic Period arrowheads or more
among 52 arrowheads randomly sampled from the bog? Show your work!
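One way to check the part c probability is an exact binomial tail sum. This is a minimal sketch that treats the 52 draws as independent with a fixed success probability of .35 (an assumption that holds only approximately, when the bog holds many more arrowheads than are sampled); the helper name `binomial_upper_tail` is illustrative. A course solution may instead use a normal approximation, which gives a similar but not identical value.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p), assuming independent draws."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Part c: 26 or more Late Archaic arrowheads among 52 if the true share is .35
p_part_c = binomial_upper_tail(52, 26, 0.35)

# The same helper covers part i: 4 or more among 5
p_part_i = binomial_upper_tail(5, 4, 0.35)

print(f"part c: {p_part_c:.4f}")
print(f"part i: {p_part_i:.4f}")
```

For part i the sum has only two terms, 5(.35)^4(.65) + (.35)^5, which comes to about .054.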
d. As expected, most of the 52 arrowheads in your sample are made of flint
that is easily found near Buchanan Bog. In creating your data matrix, you
assigned a distance score of 0 miles to all such arrowheads. To all other
arrowheads you assigned scores ranging from 25 miles to 116 miles. Given
this method of assigning distance scores, what is the arrowheads' distance
score at the 50th percentile? Be sure to explain your answer. (Hint: Note the
word "most" in the first sentence of this problem.)
e. Calculate the 95% confidence interval for the average distance score for the
combined data on all 52 arrowheads. Show your work!
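A sketch of the part e arithmetic, assuming the large-sample normal approximation with z = 1.96 and treating the combined-sample variance (1297 square miles) as the estimate of the population variance. With 51 degrees of freedom a t critical value (about 2.01) would widen the interval slightly. The function name `mean_ci` is hypothetical.

```python
import math


def mean_ci(mean: float, variance: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Large-sample interval: mean +/- z * sqrt(variance / n)."""
    half_width = z * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


# Combined data from Table 1: n = 52, mean = 27 miles, variance = 1297 square miles
low, high = mean_ci(27, 1297, 52)
print(f"95% CI: {low:.2f} to {high:.2f} miles")
```

Under these assumptions the interval runs from roughly 17.2 to 36.8 miles.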
f. Wishing to be helpful, your research assistant calculates 95% confidence
intervals separately for each period's average distance score (i.e., one
confidence interval for the Late Archaic arrowheads' mean of 29 miles and one
for the Woodland arrowheads' mean of 25 miles). Why is it illegitimate for the
research assistant to use confidence intervals in obtaining interval estimates
for these two means?
g. Your supervisor wants to be sure that your estimate of the average distance
from Buchanan Bog is within 5 miles of the true average distance for all
arrowheads at the site. She suggests that you return to the bog and randomly
sample enough additional arrowheads to ensure this degree of accuracy when
you recalculate your 95% confidence interval using a larger sample (i.e., the
original 52 arrowheads plus additional ones you will soon be sampling). To
obtain such accuracy, how many arrowheads would be needed in addition to
the original 52?
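For part g, the required sample size follows from setting the half-width z·s/√n equal to 5 miles and solving for n, which gives n = z²s²/Δ². The sketch below assumes z = 1.96 and reuses the combined variance of 1297 as the variance estimate; a different variance estimate would change the answer. The helper name `n_for_mean` is illustrative.

```python
import math


def n_for_mean(variance: float, delta: float, z: float = 1.96) -> int:
    """Smallest n with z * sqrt(variance / n) <= delta."""
    return math.ceil(z**2 * variance / delta**2)


n_total = n_for_mean(1297, 5)   # combined-sample variance used as the estimate
n_additional = n_total - 52     # beyond the original 52 arrowheads
print(n_total, n_additional)
```

Under these assumptions the target is 200 arrowheads in total, or 148 beyond the original 52.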
h. Based on the data in Table 1, do you have evidence (at the .05 significance
level) that Native American Indians traveled more during the Woodland Period
than during the Late Archaic Period? Explain your answer.
i. Assume again, as in part c, that Buchanan Bog is just like other Iowa sites in
that the true percentage of Late Archaic Period arrowheads there is 35%.
However, this time calculate the probability of finding 4 Late Archaic Period
arrowheads or more among 5 arrowheads randomly sampled from the bog.
Show your work!
11. Minnesota is like Iowa in that much of its agricultural industry is controlled by
large corporations. These corporations leave most of the state's small farmers
unable to produce food at prices at or below the prices charged for the same
agricultural products produced on the corporations' massive farms. A common
survival strategy for small farmers is to produce food for "niche markets" (for
example, free-range pigs, organically grown vegetables, etc.). You have identified
what you believe may be a new niche market for Minnesota's small farmers.
The world's largest Somali community outside of Somalia is located in Minnesota
in the vicinity of the twin cities, Minneapolis and St. Paul. On a recent trip there
you discovered that upon immigrating to Minnesota, Somalis gave up a food that
was central to their diet before leaving Somalia: goat meat. Yet Somalis are an
Islamic people, and according to Islamic law it is a sin to eat meat from a goat
that was not slaughtered in an Islamically proper manner.
To evaluate the feasibility of small Minnesota farmers' profitable production of
goat meat, you travel numerous times to Minneapolis to conduct face-to-face
interviews with a random sample of Somali immigrants there. You have been
unable to obtain funding for this research because no foundation is willing to
support "a mere feasibility study." Moreover, on your graduate student's income
you cannot afford to make too many of these trips. Your interview schedule is
lengthy, and you can only complete 5 interviews per trip. Your money runs out
after 6 trips, leaving you with data from 30 completed interviews. You begin your
analysis by examining data from the following two questions asked of the 30
Somali immigrants that you interviewed:
IMPROPER
"Would you be willing to eat meat from a goat that was not
slaughtered according to the prescriptions of Islamic law?"
(Responses were 0="No" and 1="Yes".)
HOWMUCH
"How many dollars would be the most you would pay for a pound of
fresh, Islamically-proper goat meat?" (Responses were recorded in
dollars and cents.)
On the variable, IMPROPER, 20 Somalis responded "No" and 10 responded "Yes."
The mean value on HOWMUCH is $4.60 and its standard deviation (a population
s.d. estimate) equals $1.40. The complete data on HOWMUCH are as follows:
Table 1. Grouped data on the most Somalis are willing to pay for goat meat.

Dollars:   2   3   4   5   6   7
Somalis:   2   5   7   8   5   3

Be sure that you read this table correctly. Its last column should be read as
indicating, "Three Somali respondents indicated that seven dollars was the most
they would pay for a pound of fresh, Islamically-proper goat meat."
a. Give an estimate of the marginal probability that a Somali immigrant would
be willing to eat meat from a goat that was not slaughtered according to the
prescriptions of Islamic law.
b. Calculate a 95% confidence interval for the proportion of Somali immigrants
that would be willing to eat meat from a goat that was not slaughtered
according to the prescriptions of Islamic law.
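A minimal sketch of a large-sample (Wald) interval for part b, assuming z = 1.96 and the sample proportion 10/30 from the IMPROPER item. With only 10 "Yes" responses the normal approximation is rough, so treat the result as approximate. The function name `proportion_ci` is hypothetical.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = proportion_ci(10, 30)   # 10 of 30 answered "Yes" on IMPROPER
print(f"95% CI: {low:.3f} to {high:.3f}")
```

This gives an interval of about .165 to .502, which illustrates why the farmers in part d find it too wide.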
c. A local small farmer is not impressed with your findings. She says, "I have
three friends from the Somali community, and all three of them tell me that
they would refuse to eat improper goat meat." You explain to her that it would
not be at all improbable that three people sampled at random from the
community would refuse to eat improper goat meat even if a third (i.e., 33%) of
this community were willing to do so. Calculate the probability that this would
happen (i.e., the probability associated with the statement in italics within the
previous sentence).
d. Until an Islamically-proper slaughterhouse is built, many small Minnesota
farmers must be convinced that enough Somali immigrants will be willing to
eat "improper goat meat" for them to sell any goats they might produce. The
confidence interval obtained in part b is too wide for them to be convinced of
this. They tell you that they are not interested in raising goats unless they can
be sure that at least 25% of the Somalis are willing to eat improper goat meat.
You decide to reassure these farmers using data from a larger sample. You
seek funding for a large enough sample to provide you with a precision of .02
at the .05 level of significance. In applying for this funding, how big a sample
would you say that you needed? Be sure to justify your choice of a variance
estimate.
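Part d turns on the variance estimate p(1 - p) in n = z²p(1 - p)/Δ². This sketch compares two common choices under the assumption z = 1.96: the conservative p = .5 (where p(1 - p) is largest) and the part a estimate of 1/3. The helper name `n_for_proportion` is illustrative, and the rounding step only guards against floating-point noise pushing an exact integer up by one.

```python
import math


def n_for_proportion(p: float, delta: float, z: float = 1.96) -> int:
    """Smallest n with z * sqrt(p * (1 - p) / n) <= delta."""
    raw = z**2 * p * (1 - p) / delta**2
    return math.ceil(round(raw, 6))  # round first so float noise cannot add 1


n_conservative = n_for_proportion(0.5, 0.02)      # p(1 - p) is largest at p = .5
n_from_sample = n_for_proportion(10 / 30, 0.02)   # plug in the part a estimate
print(n_conservative, n_from_sample)
```

The conservative choice asks for 2401 respondents, while the sample-based estimate asks for 2135; the first guarantees the precision whatever the true proportion turns out to be.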
PLEASE TAKE NOTE: Parts e and f deal with the variable, HOWMUCH.
e. Returning to your original data on 30 Somalis, determine how many dollars
was "the most a Somali would pay for a pound of fresh, Islamically-proper
goat meat" at the 40th percentile.
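Percentile conventions differ between textbooks. This sketch assumes one common convention for grouped data: the percentile is the smallest value whose cumulative percentage reaches the target. The helper name `grouped_percentile` is hypothetical; the counts come from the grouped table above.

```python
def grouped_percentile(values: list[int], counts: list[int], pct: float) -> int:
    """Smallest value whose cumulative percentage is at least pct."""
    total = sum(counts)
    cumulative = 0
    for value, count in zip(values, counts):
        cumulative += count
        if 100 * cumulative / total >= pct:
            return value
    raise ValueError("pct must be between 0 and 100")


dollars = [2, 3, 4, 5, 6, 7]
somalis = [2, 5, 7, 8, 5, 3]
print(grouped_percentile(dollars, somalis, 40))
```

Under this convention the cumulative percentages are 6.7, 23.3, and 46.7 at $2, $3, and $4, so the 40th percentile falls at $4.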
f. Now imagine that the most Somali immigrants in Minnesota would pay for
fresh, Islamically-proper goat meat is actually (i.e., the variable HOWMUCH
has a population mean equal to) $3.97 per pound. For any 30 Somalis
randomly sampled from your population of Minnesota Somali immigrants,
what is the probability that this sample's mean value on the HOWMUCH
variable is as large as or larger than the amount you obtained in your sample
(namely, $4.60)? (Hint: Use the data from your sample to estimate the
population standard deviation.)
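A sketch of the part f calculation, following the hint by using the sample s.d. of $1.40 as the population estimate and assuming the sampling distribution of the mean is approximately normal for n = 30. The upper-tail probability uses the identity P(Z ≥ z) = erfc(z/√2)/2; the helper name `upper_tail_z` is illustrative.

```python
import math


def upper_tail_z(z: float) -> float:
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


standard_error = 1.40 / math.sqrt(30)   # sample s.d. as the population estimate
z = (4.60 - 3.97) / standard_error
p = upper_tail_z(z)
print(f"z = {z:.2f}, P = {p:.4f}")
```

Under these assumptions z is about 2.46 and the probability is roughly .007.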
12. You are part of a small team of anthropologists working on a “dig site” just
east of Carlisle, Iowa, near the Des Moines River. From approximately 1150 until
1450 AD this location was where a small village of Oneota Native American Indians
lived. Your research interests are in understanding more about the Indians’ diet.
You know that the meat in their diet consisted of deer, elk, and bison. However,
you believe that meat on the flanks or the internal organs of these animals were
usually not eaten by the Oneota Indians.
Your reasoning is that after killing their prey, the Indians did not carry entire
carcasses back to their village. Instead, you speculate, they butchered animals in
the field and only took the animals’ limbs to the village (leaving head and trunk
behind). Since Oneota Indians commonly used dogs as beasts of burden, this
would make sense because their dogs could hardly have carried an entire deer,
elk, or bison carcass.
At the site your data consist of bones left over from meals consumed centuries
ago. After careful examination you determine that these bones are the remains of
16 deer, 4 elk, and 2 bison. You place the bones in 22 bins—one bin containing
the remains of each animal. Your research task is to determine if (of all bones in
each bin) the percent of limb-bones is greater than this percent would be if bones
of the entire animals were left behind. The percent of deer bones that are limb
bones is 30% (i.e., 3 pounds of limb bones for every 7 pounds of other bones).
For elk this percent is 25%, and for bison the percent is 20%. You also know that
in the case of each of these three species the standard deviation in their percent
of all bones that are limb bones is 13%. Please note that if an entire deer were
brought back to the village and eaten there, one would expect that 30% of the
leftover bones would be limb-bones and that 70% of the bones would be other
bones (e.g., skull, backbone, rib, etc.) from the deer.
a. You begin by analyzing the 16 sets of deer bones (i.e., your sample size is
16 at this point). For each set of bones you calculate the percent of all bones
in the set that are limb bones. What is the level of measurement of this just-calculated measure? (Hint: It may be helpful to write down what a data matrix
with these calculated measures might look like.)
b. Your colleagues will only believe that “Oneota Indians tended (1) to butcher
animals in the field and (2) to only bring these animals’ limbs back to the
village” if in your data the percent of limb bones is more than half (50%).
Initially, you interpret this to mean that the average percent of limb bones
among your 16 sets of deer bones should be greater than 50%. Is your sample
of 16 sets of deer bones sufficiently large to ensure that sampling error would
not be a probable explanation (at the .05 significance level) for a finding of
such an increase (i.e., 50% minus 30%) over what one would expect if entire
deer were brought back to the village and eaten there?
c. You calculate the percent of limb bones among all bones in each of your 16
bins of deer bones, and find that the average among these percents is 40%.
Give a 99% confidence interval for this finding.
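A sketch of the part c interval, assuming the stated s.d. of 13 percentage points can be treated as a known population value and using the two-sided 99% critical value z = 2.576.

```python
import math

z_99 = 2.576        # two-sided 99% critical value of the standard normal
sigma = 13          # s.d. of percent limb bones, treated as known
n = 16              # sets of deer bones
sample_mean = 40    # average percent of limb bones

half_width = z_99 * sigma / math.sqrt(n)
low, high = sample_mean - half_width, sample_mean + half_width
print(f"99% CI: {low:.1f}% to {high:.1f}%")
```

Under these assumptions the interval runs from about 31.6% to 48.4%.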
d. Is the finding in part c a substantively important one? Explain your answer.
(Hint: In formulating your explanation, you may find it useful to consult part b.)
e. You take a different approach when analyzing the data on your four sets of
elk bones. Instead of recording the percent of each bin’s bones that are limb-bones, you only record whether or not this percent is above 50% (because
50% is the minimum percent that your colleagues will find convincing). Given
that the percent of elk bones that are limb bones is 25%, what is the
probability that all four of your sets of elk bones would have a percent of limb-bones greater than 50%?
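Part e requires an assumption about how the percent of limb bones varies from one elk to the next. This sketch assumes it is normally distributed with mean 25 and s.d. 13, and that the four bins are independent; since percentages are bounded between 0 and 100, the normal model is only a rough approximation. The helper name `upper_tail_normal` is illustrative.

```python
import math


def upper_tail_normal(x: float, mean: float, sd: float) -> float:
    """P(Y > x) for Y ~ Normal(mean, sd)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


p_one_bin = upper_tail_normal(50, 25, 13)   # one set of elk bones above 50%
p_all_four = p_one_bin ** 4                 # four independent sets
print(f"one bin: {p_one_bin:.4f}, all four: {p_all_four:.2e}")
```

Under these assumptions a single bin exceeds 50% with probability of about .027, so all four doing so is on the order of one in two million.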
f. Imagine that you do find that all four of your sets of elk bones have a
percent of limb-bones greater than 50%. Does this finding provide statistically
significant evidence at the .05 significance level that “Oneota Indians tended
to butcher elk in the field and to only bring these animals’ limbs back to the
village”? Explain your answer.
g. Is the finding in part f a substantively important one? Explain your answer.
(Hint: As in part d, you may find it useful to consult part b.)
13. Since the US Surgeon General’s 1972 report on the topic, social psychologists
in the United States have extensively researched the effects of the mass media on
violent behavior. Yet relatively little research has been done on the effects that
the US media have on the quality of people’s marriages. Like media-violence
researchers, who hypothesize that violent television-content leads to viewers’
subsequent violent behaviors, you hypothesize that television-content depicting
dysfunctional (i.e., bad) marriages leads to viewers’ subsequent dysfunction in
their own marriages. However, your theoretical argument for why others’
dysfunction causes one’s own dysfunction has nothing to do with the “modeling
argument” used by those who study the effects of media-violence. Unlike them,
you do not believe that people simply imitate the marriages that they see on
television—be they good (i.e., well-functioning) or bad ones. Instead, your theory
is as follows:
Marriages are relationships, and (like all relationships) they require work.
Witnessing someone else’s poor marriage leads one to believe that one’s own
marriage is relatively good. If a person believes her- or himself to have a good
marriage, she or he will work less on that marriage. When people work less on
their marriages, these marriages will worsen. In contrast, when one witnesses
others’ good marriages, one comes to believe one’s own marriage to be relatively
bad. As a consequence, one becomes motivated to work more on one’s marriage
which, in turn, results in marital improvements.
To test your theory you randomly assign 25 married Psychology 101 students to
each of two groups:
Group 1 is shown a 15-minute video from an episode of “Mad About You” in
which Helen Hunt (Jamie) and Paul Reiser (Paul) depict an exceptionally
good marriage.
Group 2 is shown a 15-minute video from an episode of “The Young and the
Restless” in which Melody Scott (Nikki) and Eric Braeden (Victor) depict
an exceptionally bad marriage.
After having watched one of these videos, students are asked whether they
strongly disagree, disagree, agree, or strongly agree with the statement,
“Relatively speaking, my marriage is just about perfect.” Your table-of-results is
as follows:
“Relatively speaking, my marriage is just about perfect.”

Video watched    strongly disagree   disagree   agree   strongly agree
Mad about you                   12          1      10                2
Young & rest.                    2         11       3                9

a. The above table-of-results contains data on two variables. For each
variable, calculate its appropriate measure of central tendency and its
appropriate measure of dispersion. Be sure to show (or explain) how you
obtained each measure.
b. Find the conditional probability that a student who was shown the “Mad
about you” video strongly agrees that her or his marriage is “just about
perfect.” In addition, find the conditional probability that a student who was
shown the “Young & restless” video strongly agrees that her or his marriage
is “just about perfect.” Are these two conditional probabilities consistent or
inconsistent with your theory about the relation between media-content and
marital dysfunction? (Explain your answer!)
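A sketch of the part b conditional probabilities, dividing each cell by its row total. The counts are read from the table-of-results so that each row totals 25 and the column totals match those given in part c (14, 12, 13, and 11); the dictionary layout and function name are illustrative.

```python
# Counts from the table-of-results (rows: video watched; columns: response)
counts = {
    "Mad about you": {"strongly disagree": 12, "disagree": 1, "agree": 10, "strongly agree": 2},
    "Young & rest.": {"strongly disagree": 2, "disagree": 11, "agree": 3, "strongly agree": 9},
}


def conditional_probability(video: str, response: str) -> float:
    """P(response | video) = cell count / row total."""
    row = counts[video]
    return row[response] / sum(row.values())


p_mad = conditional_probability("Mad about you", "strongly agree")
p_young = conditional_probability("Young & rest.", "strongly agree")
print(f"{p_mad:.2f} vs. {p_young:.2f}")
```

This gives .08 for the "Mad about you" group and .36 for the "Young & rest." group.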
c. Your professional colleagues regard the only substantive difference
among responses to the “perfect marriage” question as the one between agreement
and disagreement. To accommodate this constraint, you collapse the
variable’s four attributes into the two attributes of “agree” (indicated by 24
students, namely the 11 students who indicated “strong agreement” plus the
13 students who indicated “agreement”) versus “disagree” (indicated by 26
students, namely the 14 students who indicated “strong disagreement” plus
the 12 students who indicated “disagreement”). Note that this collapsing
changes the above table-of-results from a 2x4 table into a 2x2 table. In the
four dashed boxes provided below, fill in the four cell frequencies that result
from this collapsing. (Hint: The table’s marginal frequencies are provided for
you.)
“Relatively speaking, my marriage is just about perfect.”

Video watched    disagree   agree   total
Mad about you      ____      ____      25
Young & rest.      ____      ____      25
total                26        24      50

d. Compute chi-square for the 2x2 table that you filled in for part c. Explain
what this chi-square statistic indicates about the relation between the video
the students watched and their agreement or disagreement with the perfect-marriage statement. (Use the .05 significance level, and be sure to show your
work!)
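A sketch of the Pearson chi-square computation for part d, summing (observed - expected)²/expected with expected = row total × column total / grand total. It applies no continuity (Yates) correction, and the collapsed cells are built by adding the part a columns as part c describes. With 1 degree of freedom the .05 critical value is 3.84. The function name `chi_square` is illustrative.

```python
def chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: Mad about you, Young & rest.; columns: disagree, agree
collapsed = [[12 + 1, 10 + 2],
             [2 + 11, 3 + 9]]
print(chi_square(collapsed))
```

Because both rows collapse to the same counts, every observed cell equals its expected value and the statistic is 0. The same function gives 20 for a perfectly associated table such as [[10, 0], [0, 10]].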
e. You expand your analysis using different data from a questionnaire
submitted to a random sample of 22 married adults from your local
community. You include the same statement (namely, “Relatively speaking,
my marriage is just about perfect.”) in your questionnaire. However, this time
you only ask respondents if they “agree” or “disagree” with this statement.
(That is, you do not ask them whether or not their agreement or disagreement
is strong.) Your findings are that 15 of the adults agree with the statement and
7 disagree with it. Calculate the appropriate point estimate for the variable
associated with these findings.
f. Obtain a 95% confidence interval for the point estimate calculated in part e.
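A sketch for parts e and f, assuming the point estimate is the sample proportion who agree and the interval is the large-sample (Wald) interval with z = 1.96. With only 7 respondents disagreeing, the normal approximation is rough.

```python
import math

agree, n = 15, 22
p_hat = agree / n                                    # part e: sample proportion
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - half_width, p_hat + half_width   # part f: 95% interval
print(f"p_hat = {p_hat:.3f}, 95% CI: {low:.3f} to {high:.3f}")
```

Under these assumptions the estimate is about .682 and the interval runs from about .487 to .877.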
g. How large a sample from your local community would be needed to
ensure that a newly calculated 95% confidence interval would have a precision
of ± .1 (i.e., that Δ=.1)? (Hint: Be sure to justify your choice of a variance
estimate.)
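Part g uses the same sample-size formula as for any proportion, n = z²p(1 - p)/Δ². This sketch assumes z = 1.96 and contrasts the conservative variance estimate (p = .5) with the part e estimate of 15/22. The helper name `n_for_proportion` is illustrative, and the rounding only guards against floating-point noise.

```python
import math


def n_for_proportion(p: float, delta: float, z: float = 1.96) -> int:
    """Smallest n with z * sqrt(p * (1 - p) / n) <= delta."""
    raw = z**2 * p * (1 - p) / delta**2
    return math.ceil(round(raw, 6))  # round first so float noise cannot add 1


n_conservative = n_for_proportion(0.5, 0.1)    # largest possible p(1 - p)
n_from_sample = n_for_proportion(15 / 22, 0.1) # plug in the part e estimate
print(n_conservative, n_from_sample)
```

Under these assumptions the conservative choice calls for 97 adults and the sample-based estimate for 84.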
14. Iowa currently has 350,000 acres of publicly held
lands in a system of state and locally owned parks,
forests, and preserves. Lakes and streams represent an
additional 324,000 surface acres of water in Iowa’s 132
lakes, 180,000 acres of wetlands, and hundreds of miles
of interior rivers. Moreover, there are nearly 900 miles of
multi-purpose trails in Iowa for biking, hiking, and cross-country skiing. The primary way that the State of Iowa
obtains additional public lands is through charitable
contributions. Donated land is then set aside for future generations, and its
state-sponsored preservation helps prevent soil erosion and protect Iowa’s
streams, lakes, and wildlife.
A committee within the Iowa State Legislature is drafting a bill (House File 2080)
that provides Iowa landowners with charitable tax credits if they donate land for
preservation and conservation. This committee has commissioned a group of
ISU researchers to investigate how effective the bill would be if it were to become
law. As a member of this committee you send a questionnaire to a random
sample of large Iowan landowners (i.e., Iowans who own at least 100 acres of
land). One of the items on this questionnaire measures their “willingness to
donate land” as follows:
“If charitable tax credits were provided for land donated to the State of Iowa for
preservation and conservation, would you donate some of your own land?”
(Possible responses to this question are 1 = yes and 0 = no.)
Unfortunately, only 30 of those surveyed returned the questionnaire. Since the
State of Iowa has provided you with data on how many acres EVERY Iowan owns,
you are able to calculate that the 30 Iowans in your sample own an average of 200
acres of land, with an estimated population standard deviation of 100 acres.
a. Find a 95% confidence interval for the average number of acres owned by
large Iowan landowners.
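A sketch of the part a arithmetic, assuming z = 1.96 and that the sampling distribution of the mean is close enough to normal for n = 30, with the estimated population s.d. of 100 acres used in the standard error.

```python
import math

n, mean_acres, sd_acres = 30, 200, 100
half_width = 1.96 * sd_acres / math.sqrt(n)
low, high = mean_acres - half_width, mean_acres + half_width
print(f"95% CI: {low:.1f} to {high:.1f} acres")
```

Under these assumptions the interval runs from about 164.2 to 235.8 acres.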
b. When obtaining the confidence interval requested in part a, what
assumption did you make about the distribution of “acres of land owned”
among the Iowans in your sample? Explain how you might evaluate whether
or not this assumption is justified. (Hint: See the underlined text just prior to
part a.)
c. You generate the following contingency table from your data. Using (i.e.,
not collapsing) this table, explain why it would be inappropriate for you to use
chi-square in testing whether or not the amount of land a large Iowan
landowner owns is independent of her or his willingness to donate land to the
State of Iowa.
                         Number of acres owned
Would you donate?   100-150   151-250   more than 250
Yes                       1         6               8
No                        7         4               4

d. Using data from the table in part c, compute two conditional probabilities
that might be used to support the argument that “the more land one owns, the
more likely one is willing to donate some of this land to the State of Iowa.”
Write a sentence in the space below in which you explain how the probabilities
provide support for this argument. (Hint: Please show your work! And,
incidentally, you should ignore whether or not the probabilities are
significantly different from each other.)
e. Using data from the table in part c, what is the “number of acres owned” at
the 60th percentile among all 30 landowners in your sample? Be sure to state
your answer in a complete English sentence!
f. Using data from the table in part c, find an estimate of the joint probability
that a large Iowan landowner both “owns more than 250 acres” and “indicates
a willingness to donate land to the State of Iowa.” (Be sure to show your
work!)
g. Using the .05 significance level and data from the table in part c, find an
interval estimate for the proportion of large Iowan landowners who indicated a
willingness to donate land to the State of Iowa.
15. In the mid-19th century many Germans
immigrated to the US to escape economic
and political turmoil in their homeland. By
1890 a third of a million of them had settled
in Illinois where nearly half took up farming
in rural areas—locations with access to
outside markets (primarily in Chicago)
supplied by an expanding network of railroads. The German farmers brought
with them a tradition of family labor that lent itself to the production of small
grains (flax, barley, wheat). Women and children typically joined the men in
working the fields.
To study whether or not this tradition resulted in the Germans’ resistance to
increasing demand for large grains (especially, corn), you select the town of
Schaumburg, Illinois (about 25 miles northwest of Chicago), as the site for your
research. In the late 1800s over half of this town consisted of German
immigrants, with the rest consisting of native-born US citizens. Noting that
machinery developed at the time for corn planting and harvesting could be run
solely by male farmers without their wives’ or children’s help, you believe that
Schaumburg’s German immigrants will be less likely than its native-born
residents to switch from small grain to corn production.
Your data are from the 1870 US census. Unfortunately, some of the records from
this census were destroyed in the Great Chicago Fire of 1871, and you must make
do with what remains. Fortunately, all of the 1870 data on native-born (i.e., non-German) Schaumburg farmers survived the fire. Your data on Schaumburg
Germans in 1870 are only for 70 farmers, however. Thus you have data on . . .

• all native-born Schaumburg farmers in 1870, and
• 70 German Schaumburg farmers in 1870.
NOTE: This means that you can obtain parameters for the first subpopulation, but
that you only have a (what you may assume to be random) sample from the
second subpopulation.
Table 1: Numbers of Schaumburg farmers who did vs. did not produce small
grains in 1870

                                 Schaumburg farmers in 1870
                                 native-born farmers   German farmers
                                 (census)              (sample)
Produced small grains            924                   63
Did not produce small grains     176                   7

a. Using data in Table 1, find the conditional probability that in 1870 a
Schaumburg farmer produced small grains if he was native-born.
b. Using the data in Table 1, obtain a point estimate of the proportion of 1870
German Schaumburg farmers who produced small grains.
c. Obtain a 95% confidence interval for the proportion that you calculated in
part b.
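A sketch of parts a through c using the Table 1 counts. Part a is an exact census proportion; part c assumes the large-sample (Wald) interval with z = 1.96, which is only rough here because just 7 German farmers fall in the "did not produce" category.

```python
import math

# Part a: conditional probability from the complete native-born census counts
p_native = 924 / (924 + 176)

# Part b: point estimate from the sample of 70 German farmers
p_german = 63 / (63 + 7)

# Part c: large-sample 95% interval for the German proportion
half_width = 1.96 * math.sqrt(p_german * (1 - p_german) / 70)
low, high = p_german - half_width, p_german + half_width
print(f"{p_native:.2f}, {p_german:.2f}, 95% CI {low:.3f} to {high:.3f}")
```

This gives .84 for the native-born farmers, .90 for the German sample, and an interval of about .830 to .970.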
For parts d and e below: To simplify phrasing on these two parts of the
question, let’s refer to “Schaumburg’s farmers in 1870” as “SF1870s.” Now
imagine that your sample of 70 German SF1870s is in fact not representative
of the population of all German SF1870s. Instead, assume that the proportion
of all German SF1870s who produced small grains was identical (i.e., equal) to
the proportion of all native-born SF1870s who produced small grains. (Hint:
Use the data in Table 1 on native-born SF1870s to calculate the true value that
each of these proportions equals. Also, you may find your answer in part a to
be of use here.)
d. What is the probability, in a random sample of 70 (seventy) German SF1870s,
of finding a proportion of farmers producing small grains as large as or larger
than the proportion that you calculated in part b?
e. What is the probability, in a random sample of 7 (seven) German SF1870s,
of finding a proportion of farmers producing small grains as large as or larger
than the proportion that you calculated in part b?
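A sketch of parts d and e as exact binomial tail sums, assuming independent draws with success probability .84 (the native-born census proportion). The count threshold is the smallest number of small-grain producers whose proportion reaches 63/70 = .90, computed with integer ceiling division to avoid floating-point rounding. A normal approximation, which a course solution might use for part d, gives a somewhat different value. The helper names are illustrative.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def min_count(n: int) -> int:
    """Smallest count x with x / n >= 63 / 70, via integer ceiling division."""
    return -(-63 * n // 70)


p_true = 924 / 1100                                         # native-born proportion (.84)
p_part_d = binomial_upper_tail(70, min_count(70), p_true)   # needs 63 or more of 70
p_part_e = binomial_upper_tail(7, min_count(7), p_true)     # needs all 7 of 7
print(f"part d: {p_part_d:.4f}")
print(f"part e: {p_part_e:.4f}")
```

Part e reduces to (.84)^7, about .295, because 6 of 7 (about .857) falls short of .90.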
For parts f, g, and h below: At this point, you stop considering the data in
Table 1, and begin working with a new variable: family size. You believe that
in 1870 the size of Schaumburg’s German farmers’ families was larger than the
size of its native-born farm families. Based on new data from the 1870 census,
you find that the average number of children among all native-born farmers’
families is five. Using your sample of 70 German farmers, you find that their
average number of children is six (with a standard deviation of 2 children).
f. Obtain estimates of the mean and variance of the sampling distribution of
the average number of children in Schaumburg’s German farmers’ families in
1870. (Hint: The sample size associated with this sampling distribution is 70.)
g. Do you have evidence at the .05 significance level that in 1870 the size of
Schaumburg’s German farmers’ families was larger than the size of its native-born farmers’ families? (Hint: Be sure to make use of the estimated variance
of the sampling distribution referred to in part f.)
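A sketch of parts f and g, assuming the sample s.d. of 2 children stands in for the population s.d. and that the sampling distribution of the mean is approximately normal for n = 70. Because the hypothesis is directional, the test is one-sided, with a .05 critical value of z = 1.645.

```python
import math

n = 70              # German farmers in the sample
sample_mean = 6     # average number of children (German sample)
sample_sd = 2       # sample standard deviation
native_mean = 5     # census value for all native-born families

# Part f: estimated mean and variance of the sampling distribution of the mean
sampling_mean = sample_mean
sampling_variance = sample_sd**2 / n

# Part g: one-sided z test against the native-born census mean
z = (sample_mean - native_mean) / math.sqrt(sampling_variance)
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
print(f"variance = {sampling_variance:.4f}, z = {z:.2f}, p = {p_one_sided:.2g}")
```

Under these assumptions the sampling variance is about .057 and z is about 4.18, well beyond 1.645.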
h. How large a sample would you need to estimate, at the .01 significance
level, the average size of Schaumburg’s German farmers’ families in 1870 to
within one quarter (i.e., ¼ or .25) of a child?
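A sketch of part h using n = (z·s/Δ)², assuming the sample s.d. of 2 children as the variance estimate and the two-sided .01 critical value z = 2.576.

```python
import math

z_01 = 2.576    # two-sided .01 critical value of the standard normal
s = 2           # sample s.d. of number of children, used as the estimate
delta = 0.25    # desired precision: one quarter of a child

n_required = math.ceil((z_01 * s / delta) ** 2)
print(n_required)
```

Under these assumptions the required sample is 425 German farm families.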