Geog 2000    Instructor: Paul C. Sutton    Spring 2009
Name: ________________________________

Second Exam
Geog 2000 Intro to Geographic Statistics (100 pts total)

True / False (20 Points)

Mark a “T” or an “F” in the blank space to the left of each of the following statements.

1) _________ The standard error diminishes as the square root of ‘N’ increases.
2) _________ All possible samples of the entities in a sampling frame must have equal probability of being sampled to ensure simple random sampling.
3) _________ A t-distribution with 3 degrees of freedom is identical to a standard normal distribution.
4) _________ Consider the four member set of Bob and Carol and Ted and Alice. This set has six permutations of two members and twelve combinations of two members.
5) _________ A woman has a 10 pound baby. In three months it weighs 20 lbs. She predicts it will double three more times in the coming year to a 1 year old weight of 160 pounds. This inappropriate reasoning is called interpolation.
6) _________ A map showing the dominant language of various countries is displaying nominal data.
7) _________ The range of Z-scores from -1.96 to +1.96 on a standard normal distribution function captures 68% of the total area under the curve.
8) _________ The mean of all t-distributions (regardless of their number of degrees of freedom) is 0.
9) _________ Opinion polls are typically modeled with a Poisson distribution function.
10) ________ On average, one in twenty 95% confidence intervals for a population mean will not include the true population mean.

Short Answer (20 Points)

#1) Explain the Type I and Type II errors associated with statistical hypothesis testing, using a murder trial as an analogy and testing the ‘fairness’ of a coin by flipping it ‘N’ times as a specific example. (5 points)

#2) In your own words explain the central limit theorem. Be sure to include a definition of the concept of ‘the sampling distribution of the mean’ in your answer.
(5 Points)

#3) Suppose you are standing on the Driscoll bridge looking at cars passing underneath you. Assume you have the magical ability to ascertain and record both the age of the cars in years and the market value of the cars in dollars. Draw two reasonable hypothetical histograms of both the age and value of a large number of cars (N = ~2000 ?). Compare and contrast the two histograms. Which histogram would you expect to be more skewed? (5 Points)

#4) In the blank space below draw two rectangles that represent the State of Colorado. In the one on the left draw a point pattern of a phenomenon that you would expect to exhibit complete spatial randomness. In the other draw a point pattern of the final resting place of myriad volcanic bombs (hot rocks spewed from a volcano). Assume this hypothetical volcano is in the very center of the state. Also assume that Colorado is a true rectangle (e.g. no narrowing of your coordinate system as you head north). In addition provide a probability distribution function that you think would be a reasonable model for generating both the latitude and longitude coordinates of these phenomena. (5 Points)

Statistical Problem Solving (40 Points)

#1) Polls, Pols, and Probability (10 Points)

Suppose you are a Pollster working for the Obama presidential campaign. Your mission is to identify ‘swing’ states within striking distance of Obama having a chance to win. Your first challenge is the state of Ohio. The campaign staff has decided that a state is worth fighting for (e.g. is ‘in play’) if 45% or more of the electorate says yes to the question: “Will you vote for Barack Obama in the November Presidential Election?”. You survey 1,000 randomly selected registered voters. 415 of them answer yes to the above question.

Answer the following questions (10 Points):

1) What probability model and statistical test will you use to answer this question?
2) What are your estimators for this statistical test? (i.e. the formulas you will use)
3) What assumptions are you making with respect to this particular statistical test?
4) What is your estimate of the fraction of registered voters who will vote for Obama?
5) What is your 95% confidence interval around this estimate?
6) Draw a diagram describing the sampling distribution of your estimated parameter.
7) What do you tell the campaign team as to the ‘in play’ or ‘not in play’ status of Ohio?

#2) Pet Rabbits, “Smart Pills”, and CSAP Scores (10 Points)

A devious yet entrepreneurial kid at a local middle school has a pet rabbit that produces a lot of little rabbit feces that look like pills. He decides to market them to fellow classmates as “smart pills”, making the claim that they will increase the intelligence and CSAP test scores of those who ingest them. Ten of his classmates fork over their milk money to buy and consume these “smart pills” and ten don’t. Given the data in the table below, perform an appropriate statistical test to see if the students who consumed the “smart pills” performed significantly better than the students who did not. Use a 95% confidence level for this test. Comment on and interpret the results of your test.

Name               CSAP score   Smart Pill (Y/N)
Sharry Goffin      72           Yes
Ethan Kline        88           Yes
Noah Kienholz      63           Yes
Phillip Campos     73           Yes
Michael Bay        37           Yes
Chris Crain        78           Yes
Chris Earl         55           Yes
John Templeton     24           Yes
Dana Miller        92           Yes
Jana Eisenberg     92           Yes
Paul Sutton        65           No
Bernard Wright     88           No
Steve Luftman      84           No
Missy Clark        54           No
Nick Basich        89           No
Robbie Whitt       85           No
Pete Wistort       54           No
Aaron Schleiffer   89           No
Jono Mann          94           No
Martha Collins     98           No

Be sure to answer the following questions:

What statistical test do you use?
What are the mean CSAP scores of students that did and did not take the smart pill?
What are the 95% confidence intervals about those means?
Are these means significantly different at a 95% confidence level?
Comment on and interpret the results of your test.
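
One possible way to set up the smart pill comparison is sketched below in Python. This is an illustration under stated assumptions, not an answer key: it assumes a pooled (equal-variance) two-sample t-test, a one-sided alternative that pill takers score higher, and critical values copied from a standard t table. The function and variable names are hypothetical.

```python
import math
import statistics

# CSAP scores transcribed from the table in the problem.
pill = [72, 88, 63, 73, 37, 78, 55, 24, 92, 92]      # took the "smart pill"
no_pill = [65, 88, 84, 54, 89, 85, 54, 89, 94, 98]   # did not

# Critical values copied from a standard t table (assumed, not computed here).
T_CRIT_DF9_TWO_SIDED = 2.262    # 95% CI for a single mean, df = 9
T_CRIT_DF18_ONE_SIDED = 1.734   # one-sided test at alpha = 0.05, df = 18


def mean_ci(sample, t_crit):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m, m - t_crit * se, m + t_crit * se


def pooled_t(a, b):
    """Two-sample t statistic, assuming equal population variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se_diff


pill_mean, pill_lo, pill_hi = mean_ci(pill, T_CRIT_DF9_TWO_SIDED)
no_mean, no_lo, no_hi = mean_ci(no_pill, T_CRIT_DF9_TWO_SIDED)
t_stat = pooled_t(pill, no_pill)

# H1 is "pill takers score higher", so H0 is rejected only for large positive t.
pill_group_better = t_stat > T_CRIT_DF18_ONE_SIDED

print(f"pill:    mean={pill_mean:.1f}  95% CI=({pill_lo:.1f}, {pill_hi:.1f})")
print(f"no pill: mean={no_mean:.1f}  95% CI=({no_lo:.1f}, {no_hi:.1f})")
print(f"t = {t_stat:.3f}, significantly better: {pill_group_better}")
```

With equal group sizes the pooled statistic equals the unequal-variance (Welch) statistic; only the degrees of freedom, and therefore the critical value, would differ.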

#3) The strange ‘False Positive’ Paradox (10 Points)

Suppose AIDS is a relatively rare disease and 1% of the American Population has it. Suppose there is a blood test that tests positive 99% of the time for people that actually do have the disease. However, it also tests positive 1% of the time for people that don’t have the disease. You have just tested positive for AIDS. What is the probability that you actually have AIDS? What should you do if you did test positive given these numbers? SHOW YOUR WORK.

#4) Drinking Beer and playing ‘Whack a Mole!’ (10 Points)

Suppose you arrange for twenty volunteers to engage in a test of the effects of drinking beer on the ability to play ‘Whack a Mole!’. (Whack a mole is a kind of game involving hand-eye coordination and reaction time to smash robotic moles with a hammer.) Your twenty subjects play a game of whack a mole in which you record how many moles they successfully whack in 2 minutes. They then consume a six-pack of beer within an hour and play whack a mole again. Given the data below perform an appropriate statistical test to see if drinking a six pack of beer changes one’s ability to play ‘Whack a Mole!’.

Name              # Moles Whacked Sober   # Moles Whacked w/ Beer
Tom Cova          15                      5
Peter Brick       11                      4
Violet Gray       8                       11
Dan Montello      33                      21
John Cloud        14                      7
John Fraser       22                      15
Mike Taylor       14                      15
Matthew Kerwin    32                      22
Hilary Anderson   37                      16
Becky Hamann      23                      20
Anne Geddes       14                      17
Paul Ehrlich      7                       5
Chris Elvidge     9                       5
Karen Arther      14                      17
Sherlock Holmes   15                      10
Mick Jagger       45                      33
Dave Matthews     24                      31
Tom Waits         55                      32
Elton John        27                      13
Brittany Spears   4                       0

Be sure to answer the following questions:

1) What is the appropriate statistical test given this experimental design?
2) Do you use a standard normal or a t-distribution for this test?
3) What are the mean numbers of moles whacked with and without beer?
4) What are the 95% confidence intervals around those means?
5) Does this data suggest that drinking beer impairs one’s ability to play Whack a Mole?
6) Comment on any methodological problems you might see with this experimental design.

Using the Standard Normal Table Problems (10 Points)

Use the Standard Normal Table stapled to your test to solve these problems.
Note: Draw diagrams and pictures to explain your reasoning for partial credit.

#1) Paul Sutton is a relatively heavy man. He weighs 225 lbs. Assume that adult male Americans have weights that are distributed Normally with mean 175 and standard deviation of 25 lbs. What fraction of adult American men weigh more than Paul Sutton? (5 Points)

#2) Consider the standard Normal curve below. What is the area under the curve from a Z-value of -1.37 to a Z-value of +2.05? Draw this area on the figure and calculate its value. (5 Points)

“Lies!, Damn Lies!, and Statistics!” - Mark Twain (aka Samuel Clemens)

You have now read most of Darrell Huff’s “How to Lie with Statistics”. Provide an example, either directly from the book or analogous to an example in the book, of how to lie with statistics. (5 Points)

The Sampling Distribution of the Mean (5 Points)

Assume each histogram below was derived from 100 random samples from each of the following types of distributions respectively: Uniform(0,1), Normal(0,1), Poisson(4), and a Bimodal with modes at 5 and 15 and variance = 144. In the blank line below each histogram match each of the four following distributions to the sampling distributions of the mean of each histogram - remember N=100.

Note: The mean and variance of a Poisson are both equal to lambda; the variance of a Uniform distribution is (b-a)^2 / 12.

N(0, 0.1)    N(4, 0.4)    N(10, 1.2)    N(0.5, 0.028)

Match the numerical characterization of probability distribution functions above to the histograms drawn below. Remember you are matching the above to the sampling distribution of the means of the histograms below based on N=100.

________________    _________________    _________________    _________________
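
The relationship this matching problem relies on, that sample means of size N are approximately Normal with mean mu and standard deviation sigma / sqrt(N), can be sketched in Python as follows. This is a minimal sketch under assumptions: the bimodal mean of 10 is assumed from the symmetric modes at 5 and 15, the second parameter of each N(., .) choice is read as a standard deviation, and the names `parents` and `sampling_distribution` are illustrative only.

```python
import math

N = 100  # sample size behind each sample mean, as stated in the problem

# (population mean, population variance) for each parent distribution.
parents = {
    "Uniform(0,1)": (0.5, (1 - 0) ** 2 / 12),  # variance (b-a)^2 / 12
    "Normal(0,1)": (0.0, 1.0),
    "Poisson(4)": (4.0, 4.0),                   # mean and variance both lambda
    "Bimodal": (10.0, 144.0),                   # mean 10 is an assumption
}


def sampling_distribution(mu, var, n):
    """Return (mean, standard error) of the sample mean for sample size n."""
    return mu, math.sqrt(var / n)


results = {name: sampling_distribution(mu, var, N)
           for name, (mu, var) in parents.items()}

for name, (mu, se) in results.items():
    print(f"{name:13s} -> mean of sample means = {mu:4.1f}, "
          f"standard error = {se:.4f}")
```

The computed standard error for Poisson(4) is 0.2, so matching each histogram on its mean first is the more dependable route.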