Problems: ESTIMATION

1. Last year 55% of the American public agreed that "U.S. foreign policy is misguided." The U.S. Secretary of State has asked you to find out if Americans' views of U.S. foreign policy have improved since then. However, when you begin collecting your data, you learn that your budget has been cut. You only have enough money to sample 8 people!
a. What is the probability that everyone in a sample this size would agree that U.S. foreign policy is misguided, despite the fact that Americans' evaluations of U.S. foreign policy have remained unchanged (i.e., at 55%) since last year?
b. How large would your sample have to be to ensure that if less than 50% of your sample agree that U.S. foreign policy is misguided, a statistically significant effect would be detected at the .05 level of significance?

2. Iowa State University has recently been reported to have one of the largest ratios of male to female students of any university in the United States. (I.e., there are many more male than female students at ISU.) If the true proportion of women in the U.S. population is .52, would a finding of two women in a random sample of 10 ISU students provide evidence at the .05 level of significance that the proportion of women at ISU is less than it is in the U.S. population? Show how you arrived at your answer.

3. A recent psychological theory posits that the right hemisphere of the brain processes intuitive thoughts (related to art and creativity) and that the left hemisphere of the brain processes rational thoughts (related to logic and mathematics). Moreover, previous research has provided considerable evidence that the right brain hemisphere both inputs stimuli (such as sound waves) from the left side of the body and controls behavior (such as hand motions) of the left side of the body.
Likewise, this research has provided evidence that the left brain hemisphere both inputs stimuli from the right side of the body and controls behavior of the right side of the body. Furthermore (the theory continues), as their brains develop people tend to input and express information more by either their right or their left hemisphere (but not both). (This is, for example, why [right hemispheric] artistic people are likely to be left-handed and [left hemispheric] rational people are likely to be right-handed.)

To test the theory you randomly sample 18 students from a list of all left-handed ISU undergraduates and 20 students from a list of all right-handed ISU undergraduates. Each of these 38 subjects is asked to sit in a soundproof room with earphones on. At 5-second intervals, different numbers are simultaneously spoken into each of the respondent's ears. (For example, the subject might hear "fifteen" spoken in his/her right ear at the same time that "thirty-one" is spoken into his/her left ear.) This continues for 10 pairs of numbers. (NOTE: Each of the 20 numbers is an integer in the range from 1 to 99 and no two numbers are the same.) Each subject is then asked to write down 10 numbers that they remember having been spoken through the earphones. The 38 subjects are then classified into one of three groups: (a) The subject recalled more numbers spoken into his/her left ear than into his/her right ear, (b) the subject recalled more numbers spoken into his/her right ear than into his/her left ear, or (c) the subject recalled just as many numbers spoken into his/her right ear as were spoken into his/her left ear. Your data are as follows:

Table 1: Ear That Heard Most Written-Down Numbers by Left- versus Right-Handedness.

                           Handedness
Ear that heard most      Left     Right
Left                        4        11
Right                       9         2
Neither                     5         7

a. Give the estimated conditional probability that a subject remembers numbers spoken into his/her right ear more often than numbers spoken into his/her left ear (i.e., that a subject has a value of "right" on the "ear that heard most" variable) given that he/she is right-handed. (Show your work!)
b. Is there evidence that ISU undergraduates' right- or left-handedness is statistically independent of whether they are more likely to remember numbers spoken into their left ear, their right ear, or neither ear? (Use the .05 level of significance.)
c. Do your data support the psychological theory? (Explain your answer!)
d. Assume that you gather a different sample. In particular, you randomly sample 7 left-handed ISU undergraduates, from a population in which 50% of all left-handed ISU undergraduates are more likely to remember numbers spoken into their right ears than numbers spoken simultaneously into their left ears. What is the probability that 2 of these 7 subjects would remember numbers spoken into their right ears more often than numbers simultaneously spoken into their left ears?

4. A few years ago Newsweek proclaimed Eastham Prison in Texas to be "America's toughest prison." You have obtained permission from Eastham's warden to investigate the tranquilizing effects of marijuana on violence in the prison. Your data come from 15 prison inmates who volunteered to participate in your research. During the month prior to beginning your research, the inmates in your sample committed an average of twenty violent acts at the prison and the standard deviation among the number of inmates' violent acts was six.
a. You plan to provide unlimited marijuana to the fifteen inmates for a one-month period and to note the violent acts committed by each inmate during this month.
If during this month the average inmate's violent behavior is not at least five violent acts fewer than during the month prior to your research, the warden is not interested in using marijuana as a means of decreasing violence in the prison. Is your sample large enough for you to detect an estimated decrease of 5 violent acts as statistically significant at the .05 level of significance? (Show your work.)
b. You complete your research and provide unlimited marijuana to the fifteen inmates for a one-month period, during which time you note the violent acts committed by each inmate. The numbers of each inmate's violent acts are listed below:

14 27 16 10 12 21 9 15 30 18 14 13 16 5 20

Give a 95% confidence interval for the number of violent acts committed by the inmates during the month that they were provided unlimited marijuana.
c. Parts a and b make two important assumptions: one about how the 15 inmates were selected for analysis and another about the distribution of inmates' numbers of violent acts within the prison. What are these assumptions? Explain why these assumptions are (or are not) justifiable in this analysis.
d. In your report to the warden, what conclusions do you make about the substantive importance (in the warden's opinion) and the statistical significance of your findings? (Justify your conclusions.)

5. The United States Food and Drug Administration (FDA) has just given your research organization permission to administer your new experimental drug, kilumbicyanide (KUBC), to ten human subjects. The drug is supposed to improve the memories of people suffering from Alzheimer's disease. Two hours before and two hours after taking the drug you administer a ten-item recall (or memory) test to a random sample of ten people afflicted with Alzheimer's disease. Scores on the test range from 0 = no recall of any of the items to 10 = perfect recall of each item.
For each subject you then calculate the amount of improvement in memory by subtracting "his/her recall score before taking the drug" from "his/her recall score after taking the drug." The range on this improvement measure is thus between -10 = least improvement (i.e., the subject had perfect recall before taking the drug, then no recall after taking it) and 10 = most improvement (i.e., the subject had no recall before taking the drug, then perfect recall after taking it). Subjects' scores on this improvement measure are as follows:

-1 2 4 -2 7 0 1 2 6 -1

a. Find estimates of improvement scores' central tendency and dispersion among the population of all Alzheimer's patients. (Show how your estimates were derived and give units for each estimate.)
b. Give a 95% confidence interval for subjects' improvement scores. (Show your work.)
c. Do you have statistically significant evidence that KUBC improves the memories of Alzheimer's patients? (Use the .05 level of significance and explain your answer by interpreting your statistics in words.)
d. Parts b and c make two important assumptions: one about how the 10 patients were selected for analysis and another about the distribution of their improvement scores. What are these assumptions? Explain why these assumptions are (or are not) justifiable in this analysis.
e. You wish to ask the FDA's permission to replicate your research using more subjects, but you must first tell them how many you will need. For the purpose of estimating the population variance, assume that the above data are representative of all Alzheimer's patients. How large a sample would you need to detect an average improvement score of 1 as significantly larger than zero at the .01 level of significance? (Show your work.)

6. Five years ago the Iowa State Government changed its hiring procedures from recruitment done by the Iowa Central Personnel Agency (ICPA) to recruitment done by the specific agencies within which hirees would be working.
The old procedure for hiring someone into a job involved two steps: First, job seekers would send applications to the ICPA. Second, the ICPA would send the 6 "best applicants" to be interviewed at the specific agency (SA) within which the job was located. In contrast, according to the new procedure all applications are sent to the SA, and officials at the SA choose which are the 6 best applicants to be interviewed. When the change in hiring procedures was made, the Director of the ICPA justified her decision with the argument that officials at the ICPA were less qualified than those in the SAs to evaluate who the best applicants for a specific job were. Now that her decision has been in effect for 5 years, she has hired you to evaluate whether or not state employees hired since her decision (under the new hiring procedure) are better workers than those hired prior to her decision (under the old hiring procedure). She gives you unlimited access to the ICPA's files on all of Iowa's 3,637,936 past and present state employees. Although most information in these files can NOT be accessed electronically, you are able to compile the following table from the ICPA's main computer:

Table 1. Hiring outcomes of applicants for Iowa state jobs during 2 time periods.

                     How long ago a hire was made
Hiring outcome      6+ years ago     0-5 years ago
hired                  3,547,891            90,045
not hired             14,191,564           359,955

Be sure that you interpret the data in this table correctly. For example, the 3,547,891 in the table indicates the number of applicants who were hired for Iowa state jobs 6 or more years ago.

Your first concern is that one of the two time periods might have had more applicants-per-job than the other. This concerns you because you believe that applicants hired during the last 5 years may be better workers simply because in comparison to previous years more people may have applied for their jobs (or on the other hand, may be worse workers because of fewer applicants).
(For example, one would generally expect that the best 6 from among 100 applicants would be worse than the best 6 from among 150 applicants.) Parts a and b of this question deal with this concern.
a. What is the conditional probability that an applicant for an Iowa state job was hired when the old hiring procedure was being used (i.e., given that the hire was made 6+ years ago)?
b. Using the .05 significance level, evaluate whether during the last 5 years there was a significantly different number of applicants-per-job than during prior years. Was there a substantively different number of applicants-per-job during the last 5 years than during prior years? How can you tell? (Hints: Do NOT use chi-square in your answer. Instead, use the answer in part a as the "no effects value" of the parameter being evaluated. You may find it useful to think here of "employees hired from an applicant pool of a particular size" rather than of applicants-per-job.)
c. If you were to randomly sample 5 applicants who applied for an Iowa state job when the old hiring procedure was being used, what is the probability that all five were hired?
d. Of course, at this point you have not yet considered anything related to whether or not "better workers" have been hired during the past 5 years. In doing this you first identify pairs of employees (comprised of one of the 3,547,891 "pre-decision employees" hired before the ICPA's change in hiring procedures and one of the 90,045 "post-decision employees" hired after the change) who have identical characteristics on a variety of variables (e.g., age, gender, marital status, job-type, etc.). You then randomly sample 49 of these matched pairs of state employees. Your data on each pair come from the first-year evaluations written by each employee's supervisor. First, you count the number of positive statements in each of these evaluations.
Second, you subtract the number of the positive statements for the pre-decision employee from the number of the positive statements for the post-decision employee in each matched pair. Thus a score of 2 on the resulting variable indicates that the post-decision employee in a matched pair had two more positive statements in her/his first-year evaluation than did the pre-decision employee in the pair. This variable has a mean of 3 and a variance of 100. Do you have statistically significant evidence (at the .05 level) that the ICPA Director's decision resulted in the hiring of better workers?
e. In your final report you recommend that someone should replicate (i.e., repeat) your analysis after another 5 years. If such a replication were to find the same sample mean (i.e., 3) and variance (i.e., 100) as you found, what would be the smallest possible sample with which you could conclude (at the .05 significance level) that the ICPA Director's decision resulted in the hiring of better workers?

7. Between 1778 and 1868 a total of 368 treaties were signed between the United States and various tribes of Native Americans. As early as 1873 Edward Smith (U.S. Commissioner of Indian Affairs) was able to look back on these 9 decades of treaty-making and observe, "We have in theory over sixty-five independent nations within our borders, with whom we have entered into treaty relations as being sovereign people; and at the same time the white agent is sent to control and supervise these foreign powers, and care for them as wards of the Government. The double condition of sovereignty and wardship involves increasing difficulties and absurdities." During the 1900s U.S. government policies toward Native Americans have commonly been based on legal arguments that ignored Native Americans' sovereignty but that promoted U.S. business interests in the name of the U.S. government's paternalistic role over its Native American wards.
For example, given the terms of the 1842 treaty with the Chippewa Nation and its mineral rights in Wisconsin, BHP Billiton (an Australian mining company) is currently fighting for its legal right to use its cyanide-leaching process for mining gold in Crandon Mine, and in so doing to pollute the Chippewa lands where the mine is located. The company's legal staff argues that the mine will provide jobs for Native Americans, and thus will (paternalistically) help them.

You believe that the shift from sovereign to paternalistic legal treatment of Native Americans dates back to the 91-year period when treaties between Native Americans and the United States were first being signed. In operationalizing your key concepts, you note that sovereignty is usually conveyed when a Native American is referred to in a treaty as the grammatical subject of a sentence (e.g., "If any citizen of the United States . . . shall attempt to settle on any of the lands hereby allotted to the Indians to live and hunt on, . . . the Indians may punish him or not as they please." [Treaty of Hopewell, Art. 4, Jan. 3, 1786]). On the other hand, paternalism is conveyed whenever a Native American is referred to in a treaty as the grammatical object in a sentence (e.g., "The United States bind themselves to protect the aforesaid Indian nations against the commission of all depredations by the people of the said United States." [Treaty of Fort Laramie, Art. 3, Sep. 17, 1851]). Taking the texts of the 368 treaties as your data source, you find 15,251 instances of the words "Indian" or "Indians" in these texts.
a. You randomly sample 64 out of the 15,251 instances for a preliminary analysis of your data, and note the date of the treaty in which each appears. The average among these 64 dates is 1820, and the standard deviation among the dates is 48.
Describe the sampling distribution of average dates among samples of 64 instances, and give point estimates of this distribution's parameters (i.e., its mean and variance).
b. Your analysis would be greatly simplified if just as many of the 15,251 instances appeared in later treaties (1825 or after) as appeared in earlier treaties (before 1825). Referring to the average of 1820 reported in part a, what is the probability that an average this far or further from 1825 is due to sampling error? (Hint: In answering this part of the question, assume that the average treaty-date among all instances equals 1825.)
c. Based on the probability (sometimes called a "p-value") calculated in part b, do you have statistically significant evidence (at α = .05) that the average of 1820 is different from 1825? Explain your answer.

The next parts of this problem deal with a second data set. In particular, the next step in your analysis is to sample 20 instances within each of six time periods during the 91 years when the treaties were signed. Your data are as follows:

Table 1. Use of "Indian" or "Indians" as subject or object in U.S. treaties with Native Americans for six time periods between 1778 and 1868.

                                 Period when treaty was signed
Grammatical usage   1778-1794  1795-1809  1810-1824  1825-1839  1840-1854  1855-1868
subject                    15          8          7          9          5          4
object                      5         12         13         11         15         16

Be sure that you interpret the data in this table correctly. For example, the 15 in the table's upper-left cell indicates how many of the 20 instances of "Indian" or "Indians" sampled from treaties signed between 1778 and 1794 were the grammatical subjects of sentences within these treaties.
d. Does Table 1 provide statistically significant evidence (at the .05 level of significance) that grammatical usage of the words "Indian" or "Indians" within a treaty is dependent on the period when the treaty was signed?
e. Using conditional probabilities, evaluate whether the data in Table 1 are consistent with your belief that "the shift from sovereign to paternalistic legal treatment of Native Americans dates back to the 91-year period when treaties between Native Americans and the United States were first being signed." Explain your answer!
f. A random sample of how many instances of the words "Indian" or "Indians" would be needed at the .05 significance level to obtain an estimate within 5 percentage points of the percent of such instances used as subjects in treaties signed between 1855 and 1868? Justify your choice of a variance estimate. (Hint: Assume that all instances of the words are used grammatically as either subject or object.)

8. In 1922 William Ogburn was the first social scientist to point out that in modern societies technological progress is often met with dissatisfaction. Ogburn used the term "cultural lag" to refer to the tendency for people's acceptance of (or satisfaction with) technological change to occur later than the time when the technology is introduced. Cultural lag is problematic insofar as popular dissatisfaction slows down the pace at which technological improvements are implemented. One of the most recent innovations in contemporary societies involves a change from "traditional-learning" to "distance-learning." Students need no longer attend classes but can communicate over the Internet with teachers and classmates using e-mail and chat rooms. Such online learning leads to greater efficiency in the efforts of both teachers (e.g., as "streaming videos" free up their time for more teacher-student interaction online) and students (e.g., as studying and interaction occur at times convenient for those with full-time employment). Virtual University (vu.org), the self-proclaimed "oldest and largest online learning community," has hired you to evaluate cultural lag among its 947 students and 28 faculty.
You develop a "Cultural Lag Scale" (CLS) consisting of 20 agree/disagree questions that measure respondents' dissatisfaction with online learning. For example, the last question on the scale is as follows: "Do you agree or disagree that students learn at least as much from online video lectures as they would from classroom lectures?" An agree response indicates satisfaction with online learning, whereas a disagree response indicates dissatisfaction with online learning. CLS scores are obtained by assigning a weight of 0 to each progressive (i.e., satisfaction) response and of 1 to each lag (i.e., dissatisfaction) response, and then summing these weights. Thus each respondent's CLS score can range from 0 to 20, where 20 lag-responses signifies greatest cultural lag and 0 lag-responses signifies most cultural progressiveness. (A score of 10 signifies equivalence in the respondent's cultural lag and her progressiveness.)
a. Based on short telephone interviews with a small randomly sampled group of Virtual University (VU) students, you believe that the variance in students' CLS scores equals 12.25 squared lag-responses. Assuming that your belief is accurate, how large a sample would be needed to ensure that an average CLS score would be within 1 lag-response of the true mean in 99 out of 100 samples of this size?
b. You randomly sample 200 students' names out of the 947 students currently enrolled at Virtual University (VU). You then mail out questionnaires to these 200 students as well as to all 28 VU faculty. Whereas all faculty return their completed questionnaires to you, only 124 of the students return theirs. Estimate the joint probability that a questionnaire was both mailed out to a student and returned.
c. Do you have evidence at the .001 significance level that a person's returning of her/his questionnaire is statistically independent of whether s/he is a student or a faculty member?
d. Based on the data presented in part b, calculate a point estimate for the proportion of VU students (i.e., NOT faculty) who would not return your questionnaire if you were to mail one to them.
e. Find a 95% confidence interval for the proportion calculated in part d.
f. You decide to use the CLS scores from the 28 VU faculty to evaluate whether all distance-learning faculty (i.e., all faculty who, like VU faculty, teach online) exhibit more progressiveness than cultural lag. CLS scores among the 28 VU faculty are 10.77 lag-responses on average, with a standard deviation of 3.1 lag-responses. If the average CLS score among all distance-learning faculty equals 10 lag-responses (i.e., equal progressiveness and cultural lag) with a standard deviation of 3.1 lag-responses (i.e., the same found in your sample), what is the probability that a random sample of 28 faculty from this population (i.e., from the population of all distance-learning faculty everywhere) would have an average CLS score of 10.77 or more progressive than this?
g. What two assumptions must be made before you may legitimately claim that the probability obtained in part f can be generalized to all distance-learning faculty everywhere?

9. On May 2nd, 1994, the African National Congress (ANC) party was declared to have won majority control of South Africa's parliament within the country's first all-race election. Soon thereafter Nelson Mandela was sworn in as the country's first black president. During the decade leading up to this election, increasingly violent unrest perpetrated primarily by youths in black townships was recognized by Mandela and others as legitimate anti-apartheid protest. Yet after 1994 youth unrest recurred, as poor blacks realized that their situation had not improved. In fact, the South African economy has worsened dramatically since 1994.
For example, Statistics South Africa reports that between February and September 2001 unemployment increased from 37.0% to 41.5% of the economically active labor force, and that there was 11.6% inflation in the 12 months following August 2001. Since the 1999 election, the ANC has controlled over 58% of the seats in Parliament. (Parties dominated by whites control less than 17%.) In this new political climate, youth unrest is no longer recognized as legitimate by the ANC. Instead, unrest is understood as lawless delinquency that is undermining the prosperous state that the new black leadership is striving to build. Given these concerns, the ANC hires you to investigate how widespread delinquent acts are among South Africa's youth. In particular, they want you to estimate the average number of delinquent acts perpetrated during the past month by South African youths between the ages of 15 and 25.
a. You begin your research with a small pilot study of 25 youths (randomly sampled from among all South African youths between 15 and 25 years of age). After lengthy interviews with the youths, you determine that they perpetrated an average of 10 delinquent acts during the past month, and that the standard deviation among these acts was 21 acts. Find a 95% confidence interval for this average.
b. After obtaining the confidence interval in part a, a colleague points out to you that 20 out of the 25 youths in your pilot study perpetrated no (i.e., zero) delinquent acts during the past month. She then says, "Given that so few of the youths participated in any delinquent acts, and that so many delinquent acts were perpetrated by so few of them, you should not calculate a confidence interval in this case." Why is this? (Hints: Of course, the median would be a less misleading measure of central tendency than the mean here.
Yet nonetheless, there is nothing illegitimate about estimating a confidence interval for such a misleading measure just because your population variance is large due to skewed data like those described in this case.)
c. Of course, the pilot study was not entirely worthless. It has provided you with an estimate of the population standard deviation among the youths' delinquent acts. Given this, how large a sample size would you need to estimate the average number of delinquent acts perpetrated (during the same month) by South African youths (aged 15-25) to within 1 delinquent act? (Hints: Use the .05 significance level, and show your work.)
d. After collecting data from interviews with 225 South African youths (aged 15-25) you obtain precisely the same mean (i.e., 10 acts) and standard deviation (i.e., 21 acts) as you did in the pilot study. What is the probability that this mean is within 1 delinquent act of the true average number of delinquent acts perpetrated during the past month by South African youths between the ages of 15 and 25?

After reexamining transcripts of your interviews with the youths, you realize that youths involved in delinquent acts were chronically unemployed, whereas those not involved had expressed some hope of future employment. In preparing your report for the ANC, you wish to incorporate parts of the following data, obtained from Statistics South Africa:

Table 1: South African Labor Market Trends (in millions of people).

                         Economically active
                       Employed    Unemployed    Economically inactive
February 1, 2001         11.837         6.961                    8.323
September 1, 2001        10.833         7.698                    8.834

(Hint: Be sure that you read this table correctly. For example, on February 1, 2001, economically active South Africans were comprised of 11.837 million employed persons and 6.961 million unemployed persons.)
e. What is the marginal probability (i.e., regardless of date) that a South African is economically active (i.e., employed or unemployed)?
f. What is the joint probability that a South African was employed and that this employment was on February 1, 2001?
g. Using numbers in Table 1 calculate two conditional probabilities to evaluate whether or not there was an increase in South African unemployment from February to September 2001. (Hint: In this problem you should consider unemployed South Africans as neither employed nor economically inactive.)
h. To see if the increase described in part g is statistically significant, you calculate a chi-square from Table 1, and find χ² = .095647 to be its value. Based on this value of chi-square, do you have evidence that South Africans' employment status is dependent on the time in 2001 when it was considered? (Hints: Use α = .001, and be sure to show how you arrive at your answer!)
i. Your savvy colleague looks over your shoulder once again and comments, "Duuuuhhhh. The data in Table 1 are in millions, silly. It wasn't 11.837 people who were employed on February 1, 2001; it was 11,837,000 people!!! You have to change the table's cell frequencies to millions before calculating chi-square." Without recalculating chi-square (i.e., making use of the chi-square value given to you in part h), what is the correct value of chi-square for Table 1?

10. Native American settlements only became prevalent in Iowa during the Late Archaic Period (2500 - 500 B.C.). These Indians were primarily hunters, who stalked their prey in groups using bows and arrows. During the Woodland Period (500 B.C. - A.D. 1000) Iowa Indians' diet was expanded to include fish as well as vegetables that they had farmed (instead of gathered) themselves. Archaeologists suspect that another important difference between the Native American populations of Iowa during these two periods is that Indians traveled more during the later than during the earlier period. You are doing archaeological work at Buchanan Bog (a large peat bog northeast of Ames, near McFarland Park).
The site is perfect for your research because it contains many artifacts from both the Late Archaic and Woodland Periods. These artifacts include bone tools, flint arrowheads, and some ceramics. Of central importance in your research is the type of flint that the arrowheads from this site are made of. Most of these arrowheads are made of flint that is easily found near Buchanan Bog. However, other arrowheads' flint comes from locations as far away as Illinois. After training yourself on the geology of Iowa and its surrounding states, you assemble a data set by sampling (in as random a fashion as possible) 52 arrowheads from the site, and by assigning two pieces of information to each of these arrowheads: the period (i.e., Late Archaic versus Woodland) and the distance (in miles) from Buchanan Bog to the likely location where the arrowhead came from. Your thinking is that more traveling will have occurred at sites with arrowheads made of flint from relatively distant origins. As it turns out, the same number (i.e., 26) of arrowheads in your sample come from each period. Data on the means and variances of the distances (separately for each period as well as for the combined data on all 52 arrowheads) are as follows:

Table 1. Means and variances for distances (in miles) from Buchanan Bog to likely locations of origin for arrowheads from Late Archaic and Woodland Periods.

Period                                   Sample Size    Mean    Variance
Late Archaic (2500 - 500 B.C.)                    26      29        1518
Woodland (500 B.C. - A.D. 1000)                   26      25        1128
Combined Data (2500 B.C. - A.D. 1000)             52      27        1297

a. What are the units associated with the variances (i.e., the values 1518, 1128, and 1297) listed in the table?
b. What is the marginal probability that one of the arrowheads in your sample is from the Late Archaic Period? Show your work!
c. Your supervisor is surprised that half of the arrowheads in your sample are from the Late Archaic Period.
There were many fewer Indians in Iowa during this earlier period, and at all other archaeological sites in Iowa only 35% of arrowheads have been from this period whereas 65% have been from the Woodland Period. If Buchanan Bog is actually just like these other Iowa sites in that the true percentage of Late Archaic Period arrowheads there is 35%, what is the probability of finding 26 Late Archaic Period arrowheads or more among 52 arrowheads randomly sampled from the bog? Show your work!

d. As expected, most of the 52 arrowheads in your sample are made of flint that is easily found nearby Buchanan Bog. In creating your data matrix, you assigned a distance score of 0 miles to all such arrowheads. To all other arrowheads you assigned scores ranging from 25 miles to 116 miles. Given this method of assigning distance scores, what is the arrowheads' distance score at the 50th percentile? Be sure to explain your answer. (Hint: Note the word "most" in the first sentence of this problem.)

e. Calculate the 95% confidence interval for the average distance score for the combined data on all 52 arrowheads. Show your work!

f. Wishing to be helpful, your research assistant calculates 95% confidence intervals separately for each period's average distance score (i.e., one confidence interval for the Late Archaic arrowheads' mean of 29 miles and one for the Woodland arrowheads' mean of 25 miles). Why is it illegitimate for the research assistant to use confidence intervals in obtaining interval estimates for these two means?

g. Your supervisor wants to be sure that your estimate of the average distance from Buchanan Bog is within 5 miles of the true average distance for all arrowheads at the site. She suggests that you return to the bog and randomly sample enough additional arrowheads to ensure this degree of accuracy when you recalculate your 95% confidence interval using a larger sample (i.e., the original 52 arrowheads plus additional ones you will soon be sampling).
To obtain such accuracy, how many arrowheads would be needed in addition to the original 52?

h. Based on the data in Table 1, do you have evidence (at the .05 significance level) that Native American Indians traveled more during the Woodland Period than during the Late Archaic Period? Explain your answer.

i. Assume again, as in part c, that Buchanan Bog is just like other Iowa sites in that the true percentage of Late Archaic Period arrowheads there is 35%. However, this time calculate the probability of finding 4 Late Archaic Period arrowheads or more among 5 arrowheads randomly sampled from the bog. Show your work!

11. Minnesota is like Iowa in that much of its agricultural industry is controlled by large corporations. These corporations leave most of the state's small farmers unable to produce food at prices at or below the prices charged for the same agricultural products produced on the corporations' massive farms. A common survival strategy for small farmers is to produce food for "niche markets" (for example, free-range pigs, organically grown vegetables, etc.). You have identified what you believe may be a new niche market for Minnesota's small farmers. The world's largest Somali community outside of Somalia is located in Minnesota in the vicinity of the twin cities, Minneapolis and St. Paul. On a recent trip there you discovered that upon immigrating to Minnesota, Somalis gave up a food that was central to their diet before leaving Somalia: goat meat. Yet Somalis are an Islamic people, and according to Islamic law it is a sin to eat meat from a goat that was not slaughtered in an Islamically proper manner. To evaluate the feasibility of small Minnesota farmers' profitable production of goat meat, you travel numerous times to Minneapolis to conduct face-to-face interviews with a random sample of Somali immigrants there. You have been unable to obtain funding for this research because no foundation is willing to support "a mere feasibility study."
Moreover, on your graduate student's income you cannot afford to make too many of these trips. Your interview schedule is lengthy, and you can only complete 5 interviews per trip. Your money runs out after 6 trips, leaving you with data from 30 completed interviews. You begin your analysis by examining data from the following two questions asked of the 30 Somali immigrants that you interviewed:

IMPROPER  "Would you be willing to eat meat from a goat that was not slaughtered according to the prescriptions of Islamic law?" (Responses were 0="No" and 1="Yes".)

HOWMUCH   "How many dollars would be the most you would pay for a pound of fresh, Islamically-proper goat meat?" (Responses were recorded in dollars and cents.)

On the variable, IMPROPER, 20 Somalis responded "No" and 10 responded "Yes." The mean value on HOWMUCH is $4.60 and its standard deviation (a population s.d. estimate) equals $1.40. The complete data on HOWMUCH are as follows:

Table 1. Grouped data on the most Somalis are willing to pay for goat meat.

Dollars:   2   3   4   5   6   7
Somalis:   2   5   7   8   5   3

Be sure that you read this table correctly. Its last column should be read as indicating, "Three Somali respondents indicated that seven dollars was the most they would pay for a pound of fresh, Islamically-proper goat meat."

a. Give an estimate of the marginal probability that a Somali immigrant would be willing to eat meat from a goat that was not slaughtered according to the prescriptions of Islamic law.

b. Calculate a 95% confidence interval for the proportion of Somali immigrants that would be willing to eat meat from a goat that was not slaughtered according to the prescriptions of Islamic law.

c. A local small farmer is not impressed with your findings. She says, "I have three friends from the Somali community, and all three of them tell me that they would refuse to eat improper goat meat."
You explain to her that it would not be at all improbable that three people sampled at random from the community would refuse to eat improper goat meat even if a third (i.e., 33%) of this community were willing to do so. Calculate the probability that this would happen (i.e., the probability associated with the statement in italics within the previous sentence).

d. Until an Islamically-proper slaughterhouse is built, many small Minnesota farmers must be convinced that enough Somali immigrants will be willing to eat "improper goat meat" for them to sell any goats they might produce. The confidence interval obtained in part b is too large for them to be convinced of this. They tell you that they are not interested in raising goats unless they can be sure that at least 25% of the Somalis are willing to eat improper goat meat. You decide to reassure these farmers using data from a larger sample. You seek funding for a large enough sample to provide you with a precision of .02 at the .05 level of significance. In applying for this funding, how big a sample would you say that you needed? Be sure to justify your choice of a variance estimate.

PLEASE TAKE NOTE: Parts e and f deal with the variable, HOWMUCH.

e. Returning to your original data on 30 Somalis, determine how many dollars was "the most a Somali would pay for a pound of fresh, Islamically-proper goat meat" at the 40th percentile.

f. Now imagine that the most Somali immigrants in Minnesota would pay for fresh, Islamically-proper goat meat is actually (i.e., the variable HOWMUCH has a population mean equal to) $3.97 per pound. For any 30 Somalis randomly sampled from your population of Minnesota Somali immigrants, what is the probability that this sample's mean value on the HOWMUCH variable is as large or larger than the amount you obtained in your sample (namely, $4.60)? (Hint: Use the data from your sample to estimate the population standard deviation.)

12.
You are part of a small team of anthropologists working on a “dig site” just east of Carlisle, Iowa, near the Des Moines River. From approximately 1150 until 1450 AD this location is where a small village of Oneota Native American Indians lived. Your research interests are in understanding more about the Indians’ diet. You know that the meat in their diet consisted of deer, elk, and bison. However, you believe that meat from the flanks and the internal organs of these animals was usually not eaten by the Oneota Indians. Your reasoning is that after killing their prey, the Indians did not carry entire carcasses back to their village. Instead, you speculate, they butchered animals in the field and only took the animals’ limbs to the village (leaving head and trunk behind). Since Oneota Indians commonly used dogs as beasts of burden, this would make sense because their dogs could hardly have carried an entire deer, elk, or bison carcass.

At the site your data consist of bones left over from meals consumed centuries ago. After careful examination you determine that these bones are the remains of 16 deer, 4 elk, and 2 bison. You place the bones in 22 bins, one bin containing the remains of each animal. Your research task is to determine if (of all bones in each bin) the percent of limb-bones is greater than this percent would be if bones of the entire animals were left behind. The percent of deer bones that are limb bones is 30% (i.e., 3 pounds of limb bones for every 10 pounds of bones). For elk this percent is 25%, and for bison the percent is 20%. You also know that in the case of each of these three species the standard deviation in their percent of all bones that are limb bones is 13%. Please note that if an entire deer were brought back to the village and eaten there, one would expect that 30% of the leftover bones would be limb-bones and that 70% of the bones would be other bones (e.g., skull, backbone, rib, etc.) from the deer.

a.
You begin by analyzing the 16 sets of deer bones (i.e., your sample size is 16 at this point). For each set of bones you calculate the percent of all bones in the set that are limb bones. What is the level of measurement of this just-calculated measure? (Hint: It may be helpful to write down what a data matrix with these calculated measures might look like.)

b. Your colleagues will only believe that “Oneota Indians tended (1) to butcher animals in the field and (2) to only bring these animals’ limbs back to the village” if in your data the percent of limb bones is more than half (50%). Initially, you interpret this to mean that the average percent of limb bones among your 16 sets of deer bones should be greater than 50%. Is your sample of 16 sets of deer bones sufficiently large to ensure that sampling error would not be a probable explanation (at the .05 significance level) for a finding of such an increase (i.e., 50% minus 30%) over what one would expect if entire deer were brought back to the village and eaten there?

c. You calculate the percent of limb bones among all bones in each of your 16 bins of deer bones, and find that the average among these percents is 40%. Give a 99% confidence interval for this finding.

d. Is the finding in part c a substantively important one? Explain your answer. (Hint: In formulating your explanation, you may find it useful to consult part b.)

e. You take a different approach when analyzing the data on your four sets of elk bones. Instead of recording the percent of each bin’s bones that are limb-bones, you only record whether or not this percent is above 50% (because 50% is the minimum percent that your colleagues will find convincing). Given that the percent of elk bones that are limb bones is 25%, what is the probability that all four of your sets of elk bones would have a percent of limb-bones greater than 50%?

f. Imagine that you do find that all four of your sets of elk bones have a percent of limb-bones greater than 50%.
Does this finding provide statistically significant evidence at the .05 significance level that “Oneota Indians tended to butcher elk in the field and to only bring these animals’ limbs back to the village”? Explain your answer.

g. Is the finding in part f a substantively important one? Explain your answer. (Hint: As in part d, you may find it useful to consult part b.)

13. Since the US Surgeon General’s 1972 report on the topic, social psychologists in the United States have extensively researched the effects of the mass media on violent behavior. Yet relatively little research has been done on the effects that the US media have on the quality of people’s marriages. Like media-violence researchers, who hypothesize that violent television-content leads to viewers’ subsequent violent behaviors, you hypothesize that television-content depicting dysfunctional (i.e., bad) marriages leads to viewers’ subsequent dysfunction in their own marriages. However, your theoretical argument for why others’ dysfunction causes one’s own dysfunction has nothing to do with the “modeling argument” used by those who study the effects of media-violence. Unlike them, you do not believe that people simply imitate the marriages that they see on television, be they good (i.e., well-functioning) or bad ones. Instead, your theory is as follows: Marriages are relationships, and (like all relationships) they require work. Witnessing someone else’s poor marriage leads one to believe that one’s own marriage is relatively good. If a person believes her- or himself to have a good marriage, she or he will work less on that marriage. When people work less on their marriages, these marriages will worsen. In contrast, when one witnesses others’ good marriages, one comes to believe one’s own marriage to be relatively bad. As a consequence, one becomes motivated to work more on one’s marriage which, in turn, results in marital improvements.
To test your theory you randomly assign 25 married Psychology 101 students to each of two groups: Group 1 is shown a 15 minute video from an episode of “Mad About You” in which Helen Hunt (Jaime) and Paul Reiser (Paul) depict an exceptionally good marriage. Group 2 is shown a 15 minute video from an episode of “The Young and the Restless” in which Melody Scott (Nikki) and Eric Braeden (Victor) depict an exceptionally bad marriage. After having watched one of these videos, students are asked whether they strongly disagree, disagree, agree, or strongly agree with the statement, “Relatively speaking, my marriage is just about perfect.” Your table-of-results is as follows:

                 “Relatively speaking, my marriage is just about perfect.”
Video watched    strongly disagree   disagree   agree   strongly agree
Mad about you                   12          1      10                2
Young & rest.                    2         11       3                9

a. The above table-of-results contains data on two variables. For each variable, calculate its appropriate measure of central tendency and its appropriate measure of dispersion. Be sure to show (or explain) how you obtained each measure.

b. Find the conditional probability that a student who was shown the “Mad about you” video strongly agrees that her or his marriage is “just about perfect.” In addition, find the conditional probability that a student who was shown the “Young & restless” video strongly agrees that her or his marriage is “just about perfect.” Are these two conditional probabilities consistent or inconsistent with your theory about the relation between media-content and marital dysfunction? (Explain your answer!)

c. Among your professional colleagues, the only substantive difference among responses to the “perfect marriage” question is between agreement and disagreement.
To accommodate this constraint, you collapse the variable’s four attributes into the two attributes of “agree” (indicated by 24 students, namely the 11 students who indicated “strong agreement” plus the 13 students who indicated “agreement”) versus “disagree” (indicated by 26 students, namely the 14 students who indicated “strong disagreement” plus the 12 students who indicated “disagreement”). Note that this collapsing changes the above table-of-results from a 2x4 table into a 2x2 table. In the four dashed boxes provided below, fill in the four cell frequencies that result from this collapsing. (Hint: The table’s marginal frequencies are provided for you.)

                 “Relatively speaking, my marriage is just about perfect.”
Video watched    disagree   agree   Total
Mad about you      ____      ____      25
Young & rest.      ____      ____      25
Total                26        24

d. Compute chi-square for the 2x2 table that you filled in in part c. Explain what this chi-square statistic indicates about the relation between the video the students watched and their agreement or disagreement with the perfect-marriage statement. (Use the .05 significance level, and be sure to show your work!)

e. You expand your analysis using different data from a questionnaire submitted to a random sample of 22 married adults from your local community. You include the same statement (namely, “Relatively speaking, my marriage is just about perfect.”) in your questionnaire. However, this time you only ask respondents if they “agree” or “disagree” with this statement. (That is, you do not ask them whether or not their agreement or disagreement is strong.) Your findings are that 15 of the adults agree with the statement and 7 disagree with it. Calculate the appropriate point estimate for the variable associated with these findings.

f. Obtain a 95% confidence interval for the point estimate calculated in part e.

g. How large a sample from your local community would be needed to ensure that a newly calculated 95% confidence interval would have a precision of ± .1 (i.e., that Δ=.1)?
(Hint: Be sure to justify your choice of a variance estimate.)

14. Iowa currently has 350,000 acres of publicly held lands in a system of state and locally owned parks, forests, and preserves. Lakes and streams represent an additional 324,000 surface acres of water in Iowa’s 132 lakes, 180,000 acres of wetlands, and hundreds of miles of interior rivers. Moreover, there are nearly 900 miles of multi-purpose trails in Iowa for biking, hiking, and cross-country skiing. The primary way that the State of Iowa obtains additional public lands is through charitable contributions. Donated land is then set aside for future generations, and its state-sponsored preservation helps prevent soil erosion and protect Iowa’s streams, lakes, and wildlife. A committee within the Iowa State Legislature is drafting a bill (House File 2080) that provides Iowa landowners with charitable tax credits if they donate land for preservation and conservation. This committee has commissioned a group of ISU researchers to investigate how effective the bill would be if it were to become law. As a member of this group you send a questionnaire to a random sample of large Iowan landowners (i.e., Iowans who own at least 100 acres of land). One of the items on this questionnaire measures their “willingness to donate land” as follows: “If charitable tax credits were provided for land donated to the State of Iowa for preservation and conservation, would you donate some of your own land?” (Possible responses to this question are 1 = yes and 0 = no.) Unfortunately, only 30 of those surveyed returned the questionnaire. Since the State of Iowa has provided you with data on how many acres EVERY Iowan owns, you are able to calculate that the 30 Iowans in your sample own an average of 200 acres of land, with an estimated population standard deviation of 100 acres.

a. Find a 95% confidence interval for the average number of acres owned by large Iowan landowners.

b.
When obtaining the confidence interval requested in part a, what assumption did you make about the distribution of “acres of land owned” among the Iowans in your sample? Explain how you might evaluate whether or not this assumption is justified. (Hint: See the underlined text just prior to part a.)

c. You generate the following contingency table from your data. Using (i.e., not collapsing) this table, explain why it would be inappropriate for you to use chi-square in testing whether or not the amount of land a large Iowan landowner owns is independent of her or his willingness to donate land to the State of Iowa.

                     Number of acres owned
Would you donate?    100-150   151-250   more than 250
Yes                        1         6               8
No                         7         4               4

d. Using data from the table in part c, compute two conditional probabilities that might be used to support the argument that “the more land one owns, the more likely one is willing to donate some of this land to the State of Iowa.” Write a sentence in the space below in which you explain how the probabilities provide support for this argument. (Hint: Please show your work! And, incidentally, you should ignore whether or not the probabilities are significantly different from each other.)

e. Using data from the table in part c, what is the “number of acres owned” at the 60th percentile among all 30 landowners in your sample? Be sure to state your answer in a complete English sentence!

f. Using data from the table in part c, find an estimate of the joint probability that a large Iowan landowner both “owns more than 250 acres” and “indicates a willingness to donate land to the State of Iowa.” (Be sure to show your work!)

g. Using the .05 significance level and data from the table in part c, find an interval estimate for the proportion of large Iowan landowners who indicated a willingness to donate land to the State of Iowa.

15. In the mid-19th century many Germans immigrated to the US to escape economic and political turmoil in their homeland.
By 1890 a third of a million of them had settled in Illinois where nearly half took up farming in rural areas, locations with access to outside markets (primarily in Chicago) supplied by an expanding network of railroads. The German farmers brought with them a tradition of family labor that lent itself to the production of small grains (flax, barley, wheat). Women and children typically joined the men in working the fields. To study whether or not this tradition resulted in the Germans’ resistance to increasing demand for large grains (especially, corn), you select the town of Schaumburg, Illinois (about 25 miles northwest of Chicago), as the site for your research. In the late 1800s over half of this town consisted of German immigrants, with the rest consisting of native-born US citizens. Noting that machinery developed at the time for corn planting and harvesting could be run solely by male farmers without their wives’ or children’s help, you believe that Schaumburg’s German immigrants will be less likely than its native-born residents to switch from small grain to corn production.

Your data are from the 1870 US census. Unfortunately, some of the records from this census were destroyed in the Great Chicago Fire of 1871, and you must make do with what remains. Fortunately, all of the 1870 data on native-born (i.e., non-German) Schaumburg farmers survived the fire. Your data on Schaumburg Germans in 1870 are only for 70 farmers, however. Thus you have data on . . .

all native-born Schaumburg farmers in 1870, and
70 German Schaumburg farmers in 1870.

NOTE: This means that you can obtain parameters for the first subpopulation, but that you only have a (what you may assume to be random) sample from the second subpopulation.

Table 1: Numbers of Schaumburg farmers who did vs. did not produce small grains in 1870

                                Schaumburg farmers in 1870
                                native-born farmers   German farmers
                                (census)              (sample)
Produced small grains                    924                  63
Did not produce small grains             176                   7

a.
Using data in Table 1, find the conditional probability that in 1870 a Schaumburg farmer produced small grains if he was native-born.

b. Using the data in Table 1, obtain a point estimate of the proportion of 1870 German Schaumburg farmers who produced small grains.

c. Obtain a 95% confidence interval for the proportion that you calculated in part b.

For parts d and e below: To simplify phrasing on these two parts of the question, let’s refer to “Schaumburg’s farmers in 1870” as “SF1870s.” Now imagine that your sample of 70 German SF1870s is in fact not representative of the population of all German SF1870s. Instead, assume that the proportion of all German SF1870s who produced small grains was identical (i.e., equal) to the proportion of all native-born SF1870s who produced small grains. (Hint: Use the data in Table 1 on native-born SF1870s to calculate the true value that each of these proportions equals. Also, you may find your answer in part a to be of use here.)

d. What is the probability in a random sample of 70 (seventy) German SF1870s of finding as large or larger a proportion of farmers producing small grains than the proportion that you calculated in part b?

e. What is the probability in a random sample of 7 (seven) German SF1870s of finding as large or larger a proportion of farmers producing small grains than the proportion that you calculated in part b?

For parts f, g, and h below: At this point, you stop considering the data in Table 1, and begin working with a new variable: family size. You believe that in 1870 the size of Schaumburg’s German farmers’ families is larger than the size of its native-born farm families. Based on new data from the 1870 census, you find that the average number of children among all native-born farmers’ families is five. Using your sample of 70 German farmers, you find that their average number of children is six (with a standard deviation of 2 children).

f.
Obtain estimates of the mean and variance of the sampling distribution of the average number of children in Schaumburg’s German farmers’ families in 1870. (Hint: The sample size associated with this sampling distribution is 70.)

g. Do you have evidence at the .05 significance level that in 1870 the size of Schaumburg’s German farmers’ families was larger than the size of its native-born farmers’ families? (Hint: Be sure to make use of the estimated variance of the sampling distribution referred to in part f.)

h. How large a sample size would you need to estimate at the .01 significance level 1870 Schaumburg’s German farmers’ families’ average size to within one quarter (i.e., ¼ or .25) child?
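
NOTE ON SAMPLE-SIZE QUESTIONS: Several parts of this problem set (for example, part h of problem 15) ask how large a sample must be for an estimated mean to fall within a chosen distance Δ of the true mean. These questions rest on the relation Δ = z·s/√n, which rearranges to n = (z·s/Δ)², rounded up to the next whole number. The Python sketch below is only an illustration under stated assumptions: it assumes a two-sided critical value from the standard normal distribution and treats the sample standard deviation s as a stand-in for the population value. The function name sample_size_for_mean is hypothetical and is not part of the problem set.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sd: float, margin: float, alpha: float) -> int:
    """Smallest n for which z * sd / sqrt(n) <= margin.

    Assumptions (not from the problem set): a two-sided normal critical
    value is used, and sd is treated as the population standard deviation.
    """
    if sd <= 0 or margin <= 0 or not 0 < alpha < 1:
        raise ValueError("sd and margin must be positive; alpha must lie in (0, 1)")
    # Two-sided critical value, e.g. about 1.96 when alpha = .05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Solve margin = z * sd / sqrt(n) for n, then round up.
    return math.ceil((z * sd / margin) ** 2)


if __name__ == "__main__":
    # Values taken from part h of problem 15: s = 2 children, margin = .25, alpha = .01.
    print(sample_size_for_mean(sd=2.0, margin=0.25, alpha=0.01))
```

A critical value read from a printed table (for instance, 2.58 rather than the unrounded value) can shift the result by a unit or two, so a hand calculation may differ slightly from what this sketch prints.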