Data Mining Disasters A Report Mary McGlohon

Data Mining
Disasters
A Report
Mary McGlohon
SIGBOVIK Commission for Workplace Safety
Data Mining Safety
• Data mining disasters are a hazard to
the progress of scientific research.
• We will review some common mining
disasters and make recommendations
for prevention.
Numeric Overflow
“In 2007, numeric floods were responsible for
over $600 million in property damages.”
- Department of Made-Up Statistics
Numeric Overflow
ERROR::NUMERICOVERFLOW
Nobody expected the breach of the levees.
Numeric Overflow
• Also caused loss of several hundred
nerd-hours.
• 1 nerd-hour = 1 grad-student-hour =
0.25 faculty-hours = 6 undergrad-hours
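The exchange rates above can be sketched as a small converter. This is only an illustration: the function name `convert_hours` and the unit keys are hypothetical, and the only numbers used are the rates stated on the slide.

```python
# Units per one nerd-hour, copied from the slide:
# 1 nerd-hour = 1 grad-student-hour = 0.25 faculty-hours = 6 undergrad-hours.
# The dictionary keys are illustrative names, not an official taxonomy.
UNITS_PER_NERD_HOUR = {
    "nerd": 1.0,
    "grad_student": 1.0,
    "faculty": 0.25,
    "undergrad": 6.0,
}


def convert_hours(amount, from_unit, to_unit):
    """Convert an amount of hours between units by going through nerd-hours."""
    nerd_hours = amount / UNITS_PER_NERD_HOUR[from_unit]
    return nerd_hours * UNITS_PER_NERD_HOUR[to_unit]


if __name__ == "__main__":
    # One faculty-hour expressed in undergrad-hours.
    print(convert_hours(1, "faculty", "undergrad"))
```

Under these rates, one faculty-hour is worth four nerd-hours, so the loss of "several hundred nerd-hours" corresponds to a quarter as many faculty-hours.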
Numeric Overflow
• Recommendation: A drowning
researcher’s best bet is to grab onto a
floating log.
Power Law Failures
• Occurs when confusing heavy-tailed
distributions such as:
  • Power Law (incl. Pareto, Zipf)
  • Lognormal
  • Weibull
  • Burr
  • Log-gamma
  • Log-Log-Log-Log-Mushroom-Mushroom
Power Law Failures
• Many natural phenomena have heavy
tails.
  • Magnitude of earthquakes
  • Size of human settlements
  • Degree distribution of “real” graphs
  • Time-to-response in CS professors' email
  • Your mom
• However, confusing heavy-tailed
Power Law Failures
• Related danger: Statisticians, computer
scientists, and physicists wasting
valuable nerd-hours in religious
arguments over which heavy-tailed
distribution is being followed.
Power Law Failures
• Statisticians get mean when they get
religious. (SIGBOVIK07)
• Recommendation: Calm the hell down.
Decision Tree Forest Fires
• Pruning is used to prevent overfitting.
• When overpruning occurs, trees are
burned to stumps.
• This spreads, torching entire forests. :(
(Aww...)
Decision Tree Forest Fires
• Recommendation: Researchers should
obtain a burning permit before pruning
with fire.
• Smoking while researching is not
recommended; if you choose to do so,
make sure your “butts are out”.
Voting Fraud by One-Armed Bandits
• Cascading failures from other fields
may cause disasters in data mining.
• Fatal mistake: combining the related
subfields of voting mechanisms and
one-armed bandit problems.
Voting Fraud by One-Armed Bandits
• One-armed bandits commit voting fraud
by:
  • Impersonating real voting machines.
  • Cramming cake into voting machines.
  • (The cake is a lie.)
Other safety measures
• Cool mining helmets
Conclusion
• The Commission for Workplace Safety
hopes this has raised awareness of
potential data mining disasters.
• When faced with data-mining disasters:
  • Remain Calm. :)
  • Blame it on one-off errors, lack of rigor in
  proofs of correctness, or whatever
  government agency is funding the project.