Why Statistics is worth the Stigma

Letters and Science Faculty Forum
23 April 2001
P.B. Stark
stark@stat.berkeley.edu
http://www.stat.berkeley.edu/~stark
How to end a Conversation
• “I’m a Statistician.”
• “I’ve wanted to be a Statistician ever since I was 5.”
But
• Being a statistician is license to dabble.
• Some of the smartest people in widely different fields
take time to explain to me what they do.
Two Ideas from Statistics
• Hypothesis Testing
• Interpolation
Hypothesis Testing
• Choice between two “theories” about the world:
null hypothesis, alternative hypothesis.
• Decision: Reject null hypothesis or not?
• Two kinds of error:
– Type I: reject null when null is true
– Type II: don’t reject null when null is false
Tradeoff between Errors
• Airport metal detector
• Dental exam
• Legal system
Can characterize the difference between conservatives and
liberals as a preference for different errors in different
circumstances.
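The tradeoff can be made concrete with a small sketch of the metal detector. Every number here is an assumption for illustration, not something from the talk: readings for passengers without metal are taken to be normal with mean 0, readings for passengers with metal normal with mean 2, both with standard deviation 1. The null hypothesis is "no metal", and the detector rejects it when the reading exceeds a threshold.

```python
from statistics import NormalDist

# Hypothetical reading distributions (assumed, not from the talk).
NO_METAL = NormalDist(mu=0.0, sigma=1.0)  # null hypothesis true
METAL = NormalDist(mu=2.0, sigma=1.0)     # null hypothesis false


def error_rates(threshold):
    """Return (Type I rate, Type II rate) for the rule 'alarm if reading > threshold'."""
    type_1 = 1.0 - NO_METAL.cdf(threshold)  # alarm although there is no metal
    type_2 = METAL.cdf(threshold)           # no alarm although there is metal
    return type_1, type_2


if __name__ == "__main__":
    for threshold in (0.0, 1.0, 2.0, 3.0):
        t1, t2 = error_rates(threshold)
        print(f"threshold={threshold:.1f}  Type I={t1:.3f}  Type II={t2:.3f}")
```

Raising the threshold lowers the Type I rate and raises the Type II rate; no threshold makes both small unless the two distributions are far apart.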
Earthquake Prediction
• Method is proposed. Some predictions are followed by
earthquakes. Does the method work?
• Often formulated as a hypothesis test.
Null hypothesis: method does not work
• Greek VAN group proposed prediction method using
electrical signals. Different scientists came to opposite
conclusions about VAN efficacy.
Null Hypothesis in Earthquake Prediction
• The null hypothesis “method does not work” is not precise
enough to test.
• Need a chance model for the data under the assumption that
the method doesn’t work.
• Most common model: earthquakes occur at random,
according to a particular stochastic law.
• Conclusions differed because earthquake models differed.
Conclusions depend on Earthquake Model
• Tests held predictions fixed, compared success rate on actual
seismicity with success rate on random seismicity.
• Crazy, IMHO:
– No sane seismologist would ignore previous seismicity in
making predictions, so why hold predictions constant
when changing quakes?
– Rejecting might mean only that the model for seismicity
is bad, not that the predictions are good.
Different Approach
• Seismicity is fixed as observed.
• Compare success rate of tested predictions with success rate
of similar predictions.
Rules for comparison predictions
1. Can only use the past, not the future.
2. Can use observed seismicity and extra randomness, but
nothing else.
Straw-Man Prediction Rule
• After every earthquake, toss a coin.
– Heads: predict new earthquake within 20 days.
– Tails: don’t predict.
Compare success rate of this method in repeated trials (strings
of coin tosses) with success rate of VAN.
If better much of the time, conclude VAN not helpful.
If worse, no conclusion.
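The straw-man rule might be simulated along the following lines. This is only a sketch: the earthquake days in `QUAKE_DAYS` are made up for illustration, the function names are hypothetical, and "success rate" is taken here to mean the fraction of alarms followed by an earthquake inside the alarm window.

```python
import random

# Hypothetical earthquake days (assumed for illustration, not real seismicity).
QUAKE_DAYS = [3, 10, 45, 50, 120, 125, 130, 300, 310, 500]


def coin_rule_success_rate(quake_days, window=20, p_heads=0.5, rng=None):
    """Success rate of one string of coin tosses.

    After each quake a coin is tossed; heads raises an alarm lasting `window`
    days. An alarm succeeds if a later quake falls inside it. Returns None
    when every toss came up tails (no alarms).
    """
    rng = rng or random.Random()
    alarms = [t for t in quake_days if rng.random() < p_heads]
    if not alarms:
        return None
    hits = sum(any(t < q <= t + window for q in quake_days) for t in alarms)
    return hits / len(alarms)


def fraction_of_trials_beating(reference_rate, quake_days, trials=1000, seed=0):
    """Fraction of coin-toss strings whose success rate exceeds `reference_rate`."""
    rng = random.Random(seed)
    rates = [coin_rule_success_rate(quake_days, rng=rng) for _ in range(trials)]
    rates = [r for r in rates if r is not None]  # drop strings with no alarms
    return sum(r > reference_rate for r in rates) / len(rates)


if __name__ == "__main__":
    # 0.4 is an arbitrary reference rate, standing in for the method under test.
    print(fraction_of_trials_beating(0.4, QUAKE_DAYS))
```

The rule uses only past seismicity and extra randomness, so it obeys both rules for comparison predictions. If the coin beats the tested method in a large share of trials, the tested method is doing no better than a rule that exploits nothing but clustering.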
Results for VAN
Method     Success rate   False alarm rate   Predictions   Alarm years
VAN (a)    0.38           0.62               23            1.44
Coin (b)   0.49 (c)       0.30 (d)           10            0.95 (e)

(a) VAN predictions reported in Varotsos et al., 1996, vs. PDE for 1987-1989, 39 events with mb ≥ 4.7.
(b) Coin test: 23-day alarm with probability 23/39 after each event.
(c) 90th percentile of 1000. (d) median. (e) mean.
Interpolation and Missing Data
• Filling in missing data depends at least as much on
the method as on the data.
• “Stiff” interpolator can give biggest structure where
there is no datum.
• Errors in seismology (topography of the core-mantle
boundary) and cosmology (cosmic microwave
background).
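A toy version of the second point, as a sketch only. A single high-degree polynomial through all the data stands in for a "stiff" interpolator; that choice is an assumption of this example, not necessarily the method used in the seismology or cosmology studies. The data are hypothetical: values of size 0.1 at eight points, with no datum between x = 3 and x = 7.

```python
# Hypothetical data: small values at the sample points, gap between x=3 and x=7.
XS = [0.0, 1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 10.0]
YS = [0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1, 0.1]


def lagrange(xs, ys, x):
    """Evaluate at x the unique polynomial passing through all (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def largest_excursion(xs, ys, steps=1000):
    """Return (x, value) where |interpolant| is largest on a grid over the data range."""
    lo, hi = min(xs), max(xs)
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    peak_x = max(grid, key=lambda x: abs(lagrange(xs, ys, x)))
    return peak_x, lagrange(xs, ys, peak_x)


if __name__ == "__main__":
    peak_x, peak = largest_excursion(XS, YS)
    print(f"largest |value| {abs(peak):.2f} at x={peak_x:.2f}; largest datum 0.10")
```

The interpolant matches every datum, yet its largest swing falls inside the gap and is several times bigger than any observed value. Moving the gap moves the swing, which is the sense in which the structure belongs to the geometry and the method.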
Topography of Core-Mantle Boundary
• Fit observations of time it takes waves to travel from
earthquakes to receivers with smooth functions.
• Conclude reality is like the picture.
• Biggest structure is in gaps where there is no datum.
• Algorithms find structure when there is none, just
like metal and plastic interpolators. Property of geometry
and method, not of Earth or data.
Cosmic Microwave Background
• Fit observations of sky temperature with smooth
functions.
• Conclude reality is like picture.
• Biggest structure is in gaps where there is no datum.
• Algorithms find structure when there is none, just
like metal and plastic interpolators. Property of geometry
and method, not of big bang or data.
Fun Consulting Projects
• U.S. Department of Justice
Child Online Protection Act: how much porn is on the
internet; how easily and how often do minors find it?
• Federal Trade Commission
Sampling to test Jenny Craig’s advertising claims.
• U.S. Commodity Futures Trading Commission
Indirect bucketing by T-bond traders.
• New York City Law Department
Evaluating commercial real estate tax assessments.
Other Projects
• employment discrimination
• water treatment
• trade secret litigation
• targeted web advertising
• legislation to close CA commercial abalone fisheries
• oil exploration
• toxic tort litigation
• insurance litigation
• quality control of IC mask manufacturing equipment
Capture-Recapture
How to estimate #fish in a pond?
• Catch 100 fish, tag and release.
• Wait for fish to mix with the others.
• Catch another 100.
• Count # with tags.
The Estimate
fraction caught 1st time ≈ (#tagged in 2nd catch) / 100

Total = (#caught 1st time) / (fraction caught 1st time)
      ≈ (100 × 100) / (#tagged in 2nd catch)
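The arithmetic of the estimate, as a minimal sketch. The function name and the count of 20 tagged fish are hypothetical; this form is often called the Lincoln-Petersen estimator.

```python
def estimate_total(caught_first, caught_second, tagged_in_second):
    """Capture-recapture estimate of the population size.

    The tagged share of the second catch estimates the fraction of the whole
    population caught the first time, so
    total ≈ caught_first * caught_second / tagged_in_second.
    """
    if tagged_in_second <= 0:
        raise ValueError("need at least one tagged fish in the second catch")
    return caught_first * caught_second / tagged_in_second


if __name__ == "__main__":
    # Hypothetical outcome: 20 of the second 100 fish carry tags.
    print(estimate_total(100, 100, 20))  # 500.0
```

With 20 tagged fish in the second catch, the first catch is estimated to have been 20% of the pond, which gives 100 / 0.20 = 500 fish.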
Assumptions
• 2nd catch like a random sample from pond
• Fish don’t enter, leave, hatch, or die between
catches
• Tagged and untagged fish equally hard to
catch
• Tags don’t fall off; impossible to misread.
Census Errors
• Fails to count a person where they should be counted: gross omission.
• Counts a person in the wrong place, counts a fictitious person,
or double-counts: erroneous enumeration.
• Historically, gross omissions exceed erroneous
enumerations: net undercount.
• The 2000 census seems to have an overcount.
Census Adjustment
• Take Census; take sample of blocks later.
• Use match rate within demographic groups to
estimate the rate at which people are missed in each group.
• Synthesize population in each block by adjusting
counts in each group
Assumptions
• Participation in census doesn’t affect participation in
sample
• Can match sample records against census perfectly.
• Undercount constant within demographic groups
across geography
Simpson’s Paradox
Gender bias in graduate admissions, UCB.
In 1973 8,442 men and 4,321 women applied.
44% of men and 35% of women were admitted.
Which department(s) discriminated, prima facie?
1973 UCB Graduate Admits: 6 biggest Departments
             Men                    Women
Dept   Applied   %Admitted    Applied   %Admitted
A        825        62          108        82
B        560        63           25        68
C        325        37          593        34
D        417        33          375        35
E        191        28          393        24
F        373         6          341         7
The Paradox
What’s true for the parts isn’t true for the whole.
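The reversal can be checked directly from the six-department figures. This is a sketch: the variable and function names are hypothetical, and because the published percentages are rounded, the admit counts it reconstructs are approximate.

```python
# (applied, percent admitted) per department, from the six-department table.
ADMISSIONS = {
    "A": {"men": (825, 62), "women": (108, 82)},
    "B": {"men": (560, 63), "women": (25, 68)},
    "C": {"men": (325, 37), "women": (593, 34)},
    "D": {"men": (417, 33), "women": (375, 35)},
    "E": {"men": (191, 28), "women": (393, 24)},
    "F": {"men": (373, 6), "women": (341, 7)},
}


def pooled_rate(group):
    """Admission rate for `group` with all six departments lumped together."""
    applied = sum(dept[group][0] for dept in ADMISSIONS.values())
    admitted = sum(dept[group][0] * dept[group][1] / 100 for dept in ADMISSIONS.values())
    return admitted / applied


def departments_favoring_women():
    """Departments whose admission rate for women exceeds the rate for men."""
    return [name for name, dept in ADMISSIONS.items()
            if dept["women"][1] > dept["men"][1]]


if __name__ == "__main__":
    print(departments_favoring_women())
    print(f"pooled: men {pooled_rate('men'):.1%}, women {pooled_rate('women'):.1%}")
```

Women have the higher admission rate in four of the six departments, yet the pooled rate is higher for men. The pooled figures are dominated by where each group applied: most men applied to A and B, which admit a large share of applicants, while most women applied to C through F, which admit few.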