Data Mining Disasters: A Report
Mary McGlohon
SIGBOVIK Commission for Workplace Safety

Data Mining Safety
• Data mining disasters are a hazard to the progress of scientific research.
• We will review some common mining disasters and make recommendations for prevention.

Numeric Overflow
"In 2007, numeric floods were responsible for over $600 million in property damages."
- Department of Made-Up Statistics

ERROR::NUMERICOVERFLOW
Nobody expected the breach of the levees.

• Also caused loss of several hundred nerd-hours.
• 1 nerd-hour = 1 grad-student-hour = 0.25 faculty-hours = 6 undergrad-hours
• Recommendation: A drowning researcher's best bet is to grab onto a floating log.

Power Law Failures
• Occurs when confusing heavy-tailed distributions such as:
  • Power Law (incl. Pareto, Zipf)
  • Lognormal
  • Weibull
  • Burr
  • Log-gamma
  • Log-Log-Log-Log-Mushroom-Mushroom
• Many natural phenomena have heavy tails.
  • Magnitude of earthquakes
  • Size of human settlements
  • Degree distribution of "real" graphs
  • Time-to-response in CS professors' email
  • Your mom
• However, confusing heavy-tailed
• Related danger: Statisticians, computer scientists, and physicists wasting valuable nerd-hours in religious arguments over which heavy-tailed distribution is being followed.
• Statisticians get mean when they get religious. (SIGBOVIK07)
• Recommendation: Calm the hell down.

Decision Tree Forest Fires
• Pruning is used to prevent overfitting.
• When overpruning occurs, trees are burned to stumps.
• This spreads, torching entire forests. :( (Aww...)
• Recommendation: Researchers should obtain a burning permit before pruning with fire.
• Smoking while researching is not recommended. If you choose to do so, make sure your "butts are out".

Voting Fraud by One-Armed Bandits
• Cascading failures from other fields may cause disasters in data mining.
• Fatal mistake: combining the related subfields of voting mechanisms and one-armed bandit problems.
• One-armed bandits commit voting fraud by:
  • Impersonating real voting machines.
  • Cramming cake into voting machines.
  • (The cake is a lie.)

Other safety measures
• Cool mining helmets

Conclusion
• The Commission for Workplace Safety hopes this has raised awareness of potential data mining disasters.
• When faced with data-mining disasters:
  • Remain calm. :)
  • Blame it on one-off errors, lack of rigor in proofs of correctness, or whatever government agency is funding the project.
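
Backup slide: one possible reading of the "floating log" recommendation from the Numeric Overflow slides is to do arithmetic in log space. That reading is an assumption on our part, since the slides do not say so. The sketch below uses a hypothetical helper, `log_sum_exp`, to add up numbers that are stored as logarithms without ever exponentiating anything large enough to flood.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v))) for the given logs, shifting by the max to avoid overflow.

    Hypothetical helper for illustration; not taken from the original report.
    """
    m = max(log_values)
    if math.isinf(m):
        # All values are -inf (sum is 0) or some value is +inf (sum is inf).
        return m
    # After subtracting m, every exponent is <= 0, so exp() cannot overflow.
    return m + math.log(sum(math.exp(v - m) for v in log_values))


big = [1000.0, 1000.0]

# Naive approach: exp(1000.0) exceeds the double range and raises OverflowError.
try:
    naive = math.log(sum(math.exp(v) for v in big))
except OverflowError:
    naive = None

# Log-space approach: stays afloat.
safe = log_sum_exp(big)
```

Here `naive` ends up as `None` because the levee breaks, while `safe` equals `1000 + log(2)`. Libraries such as SciPy ship an equivalent routine; the hand-rolled version is shown only to keep the example self-contained.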