Second Exam Spring 2010

Geog 2000
Instructor: Paul C. Sutton
Spring 2009
Name: ________________________________
Second Exam Geog 2000 Intro to Geographic Statistics (100 pts total)
True / False (20 Points) Mark a “T” or an “F” in the blank space to the left of each of
the following statements.
1) _________ The standard error diminishes as the square root of ‘N’ increases.
2) _________ All possible samples of the entities in a sampling frame must have equal
probability of being sampled to ensure simple random sampling.
3) _________ A t-distribution with 3 degrees of freedom is identical to a standard normal
distribution.
4) _________ Consider the four member set of Bob and Carol and Ted and Alice. This set
has six permutations of two members and twelve combinations of two
members.
5) _________ A woman has a 10 pound baby. In three months it weighs 20 lbs. She
predicts it will double three more times in the coming year to a 1 year old
weight of 160 pounds. This inappropriate reasoning is called
interpolation.
6) _________ A map showing the dominant language of various countries is displaying
nominal data.
7) _________ The range of Z-scores from -1.96 to +1.96 on a standard normal distribution
function captures 68% of the total area under the curve.
8) _________ The mean of all t-distributions (regardless of their number of degrees of
freedom) is 0.
9) _________ Opinion polls are typically modeled with a Poisson distribution function.
10) ________ On average, one in twenty 95% confidence intervals for a population mean
will not include the true population mean.
Short Answer (20 Points)
#1 ) Explain the Type I and Type II errors associated with statistical hypothesis testing,
using a murder trial as an analogy and a test of the ‘fairness’ of a coin by flipping it ‘N’
times as a specific example. (5 points)
#2 ) In your own words explain the central limit theorem. Be sure to include a definition of
the concept of ‘the sampling distribution of the mean’ in your answer. (5 Points)
#3 ) Suppose you are standing on the Driscoll bridge looking at cars passing underneath
you. Assume you have the magical ability to ascertain and record both the age of the cars
in years and the market value of the cars in dollars. Draw two reasonable hypothetical
histograms of both the age and value of a large number of cars (N = ~2000 ?). Compare
and contrast the two histograms. Which histogram would you expect to be more skewed?
(5 Points)
#4 ) In the blank space below draw two rectangles that represent the State of Colorado. In
the one on the left draw a point pattern of a phenomenon that you would expect to exhibit
complete spatial randomness. In the other draw a point pattern of the final resting place of
myriad volcanic bombs (hot rocks spewed from a volcano). Assume this hypothetical
volcano is in the very center of the state. Also assume that Colorado is a true rectangle (e.g.
no narrowing of your coordinate system as you head north). In addition provide a
probability distribution function that you think would be a reasonable model for
generating both the latitude and longitude coordinates of these phenomena. (5 Points)
Statistical Problem Solving (40 Points)
#1) Polls, Pols, and Probability (10 Points)
Suppose you are a Pollster working for the Obama presidential campaign. Your mission is
to identify ‘swing’ states within striking distance of Obama having a chance to win. Your
first challenge is the state of Ohio. The campaign staff has decided that a state is worth
fighting for (e.g. is ‘in play’) if 45% or more of the electorate says yes to the question: “Will
you vote for Barack Obama in the November Presidential Election?”. You survey 1,000
randomly selected registered voters. 415 of them answer yes to the above question. Answer
the following questions (10 Points):
1) What probability model and statistical test will you use to answer this question?
2) What are your estimators for this statistical test? (i.e. the formulas you will use)
3) What assumptions are you making with respect to this particular statistical test?
4) What is your estimate of the fraction of registered voters that will vote for Obama?
5) What is your 95% confidence interval around this estimate?
6) Draw a diagram describing the sampling distribution of your estimated parameter.
7) What do you tell the campaign team as to the ‘in play’ or ‘not in play’ status of Ohio?
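One way to check the arithmetic for this problem is sketched below. It assumes the usual normal approximation to the binomial for a sample proportion; the variable names are illustrative and not part of the exam.

```python
import math
from statistics import NormalDist

n = 1000          # registered voters surveyed
yes = 415         # number answering "yes"
threshold = 0.45  # campaign's 'in play' cutoff

# Point estimate and standard error of a sample proportion
p_hat = yes / n
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Two-sided 95% interval using the standard normal quantile (about 1.96)
z = NormalDist().inv_cdf(0.975)
ci_low = p_hat - z * se
ci_high = p_hat + z * se

# Is the 45% threshold inside the interval?
threshold_in_ci = ci_low <= threshold <= ci_high

print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}")
print(f"95% CI = ({ci_low:.4f}, {ci_high:.4f})")
print("45% threshold inside CI:", threshold_in_ci)
```

Whether the threshold falls inside the interval is only one input to the ‘in play’ judgment; the written answer still has to state the model and its assumptions.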
#2) Pet Rabbits, “Smart Pills”, and CSAP Scores (10 Points)
A devious yet entrepreneurial kid at a local middle school has a pet rabbit that produces a
lot of little rabbit feces that look like pills. He decides to market them to fellow classmates
as “smart pills” making the claim that they will increase the intelligence and CSAP test
scores of those who ingest them. Ten of his classmates fork over their milk money to buy
and consume these “smart pills” and ten don’t. Given the data in the table below, perform
an appropriate statistical test to see if the students who consumed the “smart pills”
performed significantly better than the students that did not. Use a 95% confidence level
for this test. Comment on and interpret the results of your test.
Name               CSAP score   Smart Pill (Y/N)
Sharry Goffin      72           Yes
Ethan Kline        88           Yes
Noah Kienholz      63           Yes
Phillip Campos     73           Yes
Michael Bay        37           Yes
Chris Crain        78           Yes
Chris Earl         55           Yes
John Templeton     24           Yes
Dana Miller        92           Yes
Jana Eisenberg     92           Yes
Paul Sutton        65           No
Bernard Wright     88           No
Steve Luftman      84           No
Missy Clark        54           No
Nick Basich        89           No
Robbie Whitt       85           No
Pete Wistort       54           No
Aaron Schleiffer   89           No
Jono Mann          94           No
Martha Collins     98           No
Be sure to answer the following questions:
What statistical test do you use?
What are the mean CSAP scores of students that did and did not take the smart pill?
What are the 95% confidence intervals about those means?
Are these means significantly different at a 95% confidence level?
Comment on and interpret the results of your test.
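A possible sketch of the calculations for this problem follows. It assumes an independent two-sample (pooled-variance) t-test with a one-sided alternative that the pill group scores higher. The critical values are hard-coded from a standard t table because the Python standard library has no t distribution, and all names are illustrative.

```python
import math
from statistics import mean, stdev

pill = [72, 88, 63, 73, 37, 78, 55, 24, 92, 92]
no_pill = [65, 88, 84, 54, 89, 85, 54, 89, 94, 98]

# Assumed table values: t(0.975, df=9) and t(0.95, df=18)
T_975_DF9 = 2.262
T_95_DF18 = 1.734

def mean_ci(xs, t_crit):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    m = mean(xs)
    half_width = t_crit * stdev(xs) / math.sqrt(len(xs))
    return m, m - half_width, m + half_width

pill_mean, pill_lo, pill_hi = mean_ci(pill, T_975_DF9)
no_mean, no_lo, no_hi = mean_ci(no_pill, T_975_DF9)

# Pooled-variance two-sample t statistic (equal variances assumed)
n1, n2 = len(pill), len(no_pill)
pooled_var = ((n1 - 1) * stdev(pill) ** 2
              + (n2 - 1) * stdev(no_pill) ** 2) / (n1 + n2 - 2)
t_stat = (pill_mean - no_mean) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-sided test: did the pill group score significantly higher?
pill_group_better = t_stat > T_95_DF18

print(f"pill:    mean {pill_mean:.1f}, 95% CI ({pill_lo:.1f}, {pill_hi:.1f})")
print(f"no pill: mean {no_mean:.1f}, 95% CI ({no_lo:.1f}, {no_hi:.1f})")
print(f"t = {t_stat:.3f}, pill group significantly better: {pill_group_better}")
```

A Welch (unequal-variance) test would be a reasonable alternative; with equal group sizes it gives the same t statistic and differs only in the degrees of freedom.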
#3) The strange ‘False Positive’ Paradox (10 Points)
Suppose AIDS is a relatively rare disease and 1% of the American Population has it.
Suppose there is a blood test that tests positive 99% of the time for people that actually do
have the disease. However, it also tests positive 1% of the time for people that don’t have
the disease. You have just tested positive for AIDS. What is the probability that you
actually have AIDS? What should you do if you did test positive given these numbers?
SHOW YOUR WORK.
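The numbers in this problem can be run through Bayes' rule as in the sketch below. The function name and the second, repeated test are illustrative assumptions; in particular the retest is assumed to be independent of the first test, which the problem does not state.

```python
prevalence = 0.01           # P(disease)
sensitivity = 0.99          # P(positive | disease)
false_positive_rate = 0.01  # P(positive | no disease)

def posterior(prior, sens, fpr):
    """P(disease | positive test) by Bayes' rule."""
    true_positive = prior * sens
    false_positive = (1 - prior) * fpr
    return true_positive / (true_positive + false_positive)

# Probability of disease after one positive result
after_one = posterior(prevalence, sensitivity, false_positive_rate)

# Hypothetical: a second, independent positive result uses the
# first posterior as the new prior
after_two = posterior(after_one, sensitivity, false_positive_rate)

print(f"after one positive test:  {after_one:.3f}")
print(f"after two positive tests: {after_two:.3f}")
```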
#4) Drinking Beer and playing ‘Whack a Mole!’ (10 Points)
Suppose you arrange for twenty volunteers to engage in a test of the effects of drinking
beer on the ability to play ‘Whack a Mole!’. (Whack a mole is a kind of game involving
hand-eye coordination and reaction time to smash robotic moles with a hammer). Your
twenty subjects play a game of whack a mole in which you record how many moles they
successfully whack in 2 minutes. They then consume a six-pack of beer within an hour and
play whack a mole again. Given the data below perform an appropriate statistical test to
see if drinking a six pack of beer changes one’s ability to play ‘Whack a Mole!’.
Name               # Moles Whacked Sober   # Moles Whacked w/ Beer
Tom Cova           15                      5
Peter Brick        11                      4
Violet Gray        8                       11
Dan Montello       33                      21
John Cloud         14                      7
John Fraser        22                      15
Mike Taylor        14                      15
Matthew Kerwin     32                      22
Hilary Anderson    37                      16
Becky Hamann       23                      20
Anne Geddes        14                      17
Paul Ehrlich       7                       5
Chris Elvidge      9                       5
Karen Arther       14                      17
Sherlock Holmes    15                      10
Mick Jagger        45                      33
Dave Matthews      24                      31
Tom Waits          55                      32
Elton John         27                      13
Brittany Spears    4                       0
Be sure to answer the following questions:
1) What is the appropriate statistical test given this experimental design?
2) Do you use a standard normal or a t-distribution for this test?
3) What is the mean number of moles whacked with and without beer?
4) What are the 95% confidence intervals around those means?
5) Does this data suggest that drinking beer impairs one’s ability to play Whack a Mole?
6) Comment on any methodological problems you might see with this experimental design.
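Because each volunteer is measured twice, one plausible approach is a paired t-test on the per-person differences. The sketch below assumes that choice and a two-sided alternative; the critical value is hard-coded from a standard t table, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

sober = [15, 11, 8, 33, 14, 22, 14, 32, 37, 23,
         14, 7, 9, 14, 15, 45, 24, 55, 27, 4]
beer = [5, 4, 11, 21, 7, 15, 15, 22, 16, 20,
        17, 5, 5, 17, 10, 33, 31, 32, 13, 0]

# Assumed table value: t(0.975, df=19)
T_975_DF19 = 2.093

# Per-person change (sober minus beer); positive means fewer moles after beer
diffs = [s - b for s, b in zip(sober, beer)]
mean_diff = mean(diffs)
se_diff = stdev(diffs) / math.sqrt(len(diffs))

# Paired t statistic and 95% interval for the mean difference
t_stat = mean_diff / se_diff
ci_low = mean_diff - T_975_DF19 * se_diff
ci_high = mean_diff + T_975_DF19 * se_diff
significant = abs(t_stat) > T_975_DF19

print(f"mean sober = {mean(sober):.2f}, mean beer = {mean(beer):.2f}")
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}")
print(f"95% CI for difference = ({ci_low:.2f}, {ci_high:.2f})")
print("significant at the 5% level:", significant)
```

The test itself says nothing about design problems such as practice effects from playing the game a second time, which question 6 asks about separately.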
Using the Standard Normal Table Problems (10 Points)
Use the Standard Normal Table Stapled to your test to solve these problems
Note: Draw diagrams and pictures to explain your reasoning for partial credit
#1) Paul Sutton is a relatively heavy man. He weighs 225 lbs. Assume that adult male
Americans have weights that are distributed Normally with mean 175 and standard
deviation of 25 lbs. What fraction of adult American men weigh more than Paul Sutton?
(5 Points)
#2) Consider the standard Normal curve below. What is the area under the curve from a
Z-value of -1.37 to a Z-Value of +2.05? Draw this area on the figure and calculate its value.
(5 Points)
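Answers read from the printed table for the two problems above can be cross-checked numerically. The sketch below uses `NormalDist` from the Python standard library in place of the stapled table; the variable names are illustrative.

```python
from statistics import NormalDist

# Problem 1: weights ~ Normal(mean 175 lbs, sd 25 lbs)
weights = NormalDist(mu=175, sigma=25)
z_paul = (225 - 175) / 25             # Z-score for 225 lbs
frac_heavier = 1 - weights.cdf(225)   # upper-tail area beyond that Z

# Problem 2: area under the standard normal curve between two Z-values
std_normal = NormalDist()
area = std_normal.cdf(2.05) - std_normal.cdf(-1.37)

print(f"Z for 225 lbs = {z_paul:.2f}")
print(f"fraction heavier = {frac_heavier:.4f}")
print(f"area from -1.37 to 2.05 = {area:.4f}")
```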
“Lies!, Damn Lies!, and Statistics!” - Mark Twain (aka Samuel Clemens)
You have now read most of Darrell Huff’s “How to Lie with Statistics”. Provide an
example either directly from the book or analogous to an example in the book on how to
lie with statistics. (5 Points)
The Sampling Distribution of the Mean (5 Points)
Assume each histogram below was derived from 100 random samples from each of the following types of
distributions respectively: Uniform(0,1), Normal(0,1), Poisson(4), and a Bimodal with modes at 5 and 15 and
variance = 144. In the blank line below each histogram match each of the four following distributions to the
sampling distributions of the mean of each histogram - remember N=100. Note: The mean and variance of a
Poisson are both equal to lambda, the variance of a Uniform distribution is (b-a)^2 / 12.
N(0, 0.1)
N(4, 0.4)
N(10, 1.2)
N(0.5, .028)
Match the Numerical characterization of probability distribution functions above to
the histograms drawn below. Remember you are matching the above to the
sampling distribution of the means of the histograms below based on N=100
________________
_________________
_________________
_________________
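The matching above rests on the fact that the sampling distribution of the mean is approximately Normal with the parent's mean and a standard error of sigma / sqrt(N). The sketch below computes those values, assuming the bimodal distribution is symmetric (so its mean is 10) and reading the second parameter of each listed option as a standard deviation. Under that reading the Poisson value comes out near 0.2 rather than the listed 0.4, so treat the parameter convention as uncertain; the means alone already separate the four options.

```python
import math

N = 100  # sample size behind each mean

# (mean, variance) of each parent distribution
parents = {
    "Uniform(0,1)": (0.5, (1 - 0) ** 2 / 12),
    "Normal(0,1)": (0.0, 1.0),
    "Poisson(4)": (4.0, 4.0),
    "Bimodal": (10.0, 144.0),  # mean of 10 assumes symmetric modes at 5 and 15
}

# Sampling distribution of the mean: same mean, sd = sqrt(variance / N)
sampling = {
    name: (mu, math.sqrt(var / N))
    for name, (mu, var) in parents.items()
}

for name, (mu, se) in sampling.items():
    print(f"{name:>13}: mean = {mu:g}, standard error = {se:.4f}")
```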