Proceedings of the Fourth Workshop on Data Mining Case Studies

advertisement
Data Mining Case Studies IV
Proceedings of the Fourth International Workshop on
Data Mining Case Studies and Practice Prize
Part of the Eleventh IEEE International Conference on Data Mining in Vancouver, Canada
Edited by
Brendan Kitts, MarketScale
Wei Ding, University of Massachusetts Boston
Gabor Melli, PredictionWorks
ISBN XXX-X-XXXXX-XXX-X
Page: 1
Contents
Detection of Anomalous Particles from Deepwater Horizon Oil Spill using
SIPPER3 Underwater Imaging Platform
Sergiy Fefilatyev, Kurt Kramer, Lawrence Hall, Dmitry Goldgof, Rangachar
Kasturi, Andrew Remsen, Kendra Daly
UNIVERSITY OF SOUTH FLORIDA ........................................................................................ 11
Analysis of Treatment Patterns: A Case Study on Carpal Tunnel Syndrome
Ching Lien and Silvia Figueira
SANTA CLARA UNIVERSITY ............................................................................................... 23
Dynamic Loan Service Monitoring using Segmented Hidden Markov Models
Haengju Lee, Nathan Gnanasambandam, Raj Minhas, Shi Zhao
XEROX RESEARCH CENTER WEBSTER ............................................................................... 36
Evaluation & extension to Duckworth Lewis method: An application of data
mining techniques
Viraj Phanse and Sourabh Deorah
UNIVERSITY OF CALIFORNIA, LOS ANGELES ...................................................................... 44
Evolutionary algorithms for selecting the architecture of a MLP Neural
Network: A Credit Scoring Case
Alejandro Correa Bahnsen and Andres F. Gonzalez
COLPATRIA BANK - GE CAPITAL JV, COLOMBIA ............................................................... 62
A/B Testing Using the Negative Binomial Distribution in an Internet Search
Application
Saharon Rosset and Slava Borodovsky
TEL-AVIV UNIVERSITY AND SWEETIM INC. ....................................................................... 74
Page: 2
An Augmented Vector Space Information Retrieval for Recovering
Requirement Traceability
Yoshihisa Udagawa
TOKYO POLYTECHNIC UNIVERSITY .................................................................................... 99
"The 100 Most Influential Persons in History”: A Data Mining Perspective
Noora Al-Naimi and Khaled Shaban
QATAR UNIVERSITY ......................................................................................................... 114
Classification Rule Mining Supported by Ontology for Discrimination
Discovery
Thanh Binh Luong, Franco Turini, and Salvatore Ruggieri
INSTITUTE FOR ADVANCED STUDIES LUCCA AND UNIVERSITÀ DI PISA ............................ 124
Crime Forecasting Using Data Mining Techniques
Chung-Hsien Yu, Max Ward, Melissa Morabito, and Wei Ding
UNIVERSITY OF MASSACHUSETTS BOSTON ...................................................................... 125
Identification of Human Behavior Patterns Based on the GSP_M Algorithm
Hector Gomez and Rafael Martinez
UNIVERSIDAD TÉCNICA PARTICULAR DE LOJA, ECUADOR AND UNIVERSIDAD
NACIONAL DE ESTUDIOS A DISTANCIA, SPAIN ................................................................. 126
Integrating Collaborative Filtering and Search-based Techniques for
Personalized Online Product Recommendation
Yue Xu, Noraswaliza Abdullah, and Shlomo Geva
QUEENSLAND UNIVERSITY OF TECHNOLOGY ................................................................... 134
Television Ad Targeting and ROI Measurement
Brendan Kitts, Dyng Au, Ryan Brooks
MARKETSCALE INC. ......................................................................................................... 142
Page: 3
Organizers
Chairs
Brendan Kitts, MarketScale
Wei Ding, PhD., University of Massachusetts, Boston
Gabor Melli, PhD., PredictionWorks
Program Committee
Gregory Piatetsky-Shapiro, PhD., KDNuggets
Karl Rexer, PhD., Rexer Analytics
Gang Wu, PhD. Microsoft
Dean Abbott, PhD., Abbott Analytics
Jing Ying Zhang, PhD. Microsoft
Richard Bolton, PhD., KnowledgeBase Marketing
Robert Grossman, PhD., University of Illinois at Chicago and Open Data Group
Peter van der Putten, PhD., Leiden University and Pegasystems
Wesley Brandi, PhD., Microsoft
Ricardo Vilalta, PhD., University of Houston
Dyng Au, MarketScale
Parameshvyas Laxminarayan, iProspect
Sponsors
The IEEE Computer Society
PredictionWorks
MarketScale
Page: 4
Participants
Kendra Daly
Alejandro Correa Bahnsen
Andres F. Gonzalez
Andrew Remsen
Ching Lien
Chung-Hsien Yu
Dmitry Goldgof
Dyng Au
Franco Turini
Haengju Lee
Hector Gomez
Khaled Shaban
Kurt Kramer
Lawrence Hall
Max Ward
Melissa Morabito
Nathan Gnanasambandam
Noora Al-Naimi
Noraswaliza Abdullah
Rafael Martinez
Raj Minhas
Rangachar Kasturi
Ryan Brooks
Saharon Rosset
Salvatore Ruggieri
Sergiy Fefilatyev
Shi Zhao
Shlomo Geva
Silvia Figueira
Slava Borodovsky
Sourabh Deorah
Thanh Binh Luong
Viraj Phanse
Yoshihisa Udagawa
Yue Xu
University of South Florida
Colpatria Bank - GE Capital JV, Colombia
Colpatria Bank - GE Capital JV, Colombia
University of South Florida
Santa Clara University
University of Massachusetts Boston
University of South Florida
MarketScale Inc
Università di Pisa
Xerox Research Center Webster
Universidad Técnica Particular de Loja, Ecuador
Qatar University
University of South Florida
University of South Florida
University of Massachusetts Boston
University of Massachusetts Boston
Xerox Research Center Webster
Qatar University
Queensland University of Technology
Universidad Nacional de Estudios a Distancia, Spain
Xerox Research Center Webster
University of South Florida
MarketScale Inc
Tel-Aviv University
Università di Pisa
University of South Florida
Xerox Research Center Webster
Queensland University of Technology
Santa Clara University
SweetIM Inc
University of California, Los Angeles
Institute for Advanced Studies Lucca
University of California, Los Angeles
Tokyo Polytechnic University
Queensland University of Technology
Page: 5
The Data Mining Case Studies Workshop
From its inception the field of Data Mining has been guided by the need to solve practical problems. Yet
few articles describing working, end-to-end, real-world case studies exist in our literature. Success
stories can capture the imagination and inspire researchers to do great things. The benefits of good case
studies include:
1. Education: Success stories help to build understanding.
2. Inspiration: Success stories inspire future data mining research.
3. Public Relations: Applications that are socially beneficial, and even those that are just interesting,
help to raise awareness of the positive role that data mining can play in science and society.
4. Media Coverage: The media is more likely to report on completed data mining applications, than
they are on isolated algorithms. We have an opportunity to present positive success stories to the
wider community.
5. Problem Solving: Success stories demonstrate how whole problems can be solved. Often 90% of
the effort is spent solving non-prediction algorithm related problems.
6. Connections to Other Scientific Fields: Completed data mining systems often exploit methods
and principles from a wide range of scientific areas. Fostering connections to these fields will
benefit data mining academically, and will assist practitioners to learn how to harness these fields
to develop successful applications.
The Data Mining Case Studies Workshop was established in 2005 to showcase the very best in data
mining case studies. We also established that Data Mining Practice Prize to attract the best submissions,
and to provide an incentive for commercial companies to come into the spotlight.
The first workshop was followed up by a SIGKDD Explorations Special Issue on Real-world
Applications of Data Mining edited by Osmar Zaiane in 2006 and Data Mining Case Studies 2 and 3 in
at KDD 2007 and 2009. It is our pleasure to continue the work of highlighting significant industrial
deployments, and return to the IEEE ICDM Conference where we first launched Data Mining Case
Studies with the Fourth Data Mining Case Studies Workshop in 2011.
Like its predecessors, the mission of Data Mining Case Studies 2011 is to highlight data mining
implementations that have been responsible for a significant and measurable improvement in business
operations, or an equally important scientific discovery, or some other benefit to humanity. Data Mining
Case Studies papers were allowed greater latitude in (a) range of topics - authors may touch upon areas
such as optimization, operations research, inventory control, and so on, (b) scope - more complete
context, problem and solution descriptions will be encouraged, (c) prior publication - if the paper was
published in part elsewhere, it may still be considered if the new article is substantially more detailed,
(d) novelty - often successful data mining practitioners utilize well established techniques to achieve
successful implementations and allowance for this will be given.
Page: 6
The Data Mining Practice Prize
Introduction
The Data Mining Practice Prize is awarded to work that has had a significant and quantitative impact in
the application in which it was applied, or has significantly benefited humanity. All papers submitted to
Data Mining Case Studies are eligible for the Data Mining Practice Prize, with the exception of
members of the Prize Committee. Eligible authors consent to allowing the Practice Prize Committee to
contact third parties and their deployment client in order to independently validate their claims.
Award
Winners receive an impressive array of honors including
a.
b.
c.
d.
Plaque
Prize money $200
Awards Dinner with organizers and prize winners.
The Honor of winning the Practice Prize competition
We wish to thank IEEE and MarketScale for making our competition and workshop possible.
Page: 7
The IEEE Computer Society
The IEEE Computer Society is the world’s leading organization of computing professionals with over
85,000 members. Founded in 1946, and the largest of IEEE’s 38 societies, the Computer Society is
dedicated to advancing the theory and application of computing and information technology. The IEEE
Computer Society Digital Library (CSDL) provides access to more than 330,000 articles and papers
from more than 3,500 conference proceedings and 27 Computer Society periodicals. The Conference
Publishing Services division produces more than 250 conference proceedings, CD-ROMs, and
multimedia each year. CS Press publishes full-length technical books on cutting-edge topics through a
partnership with John Wiley and Sons, as well as electronic products (ReadyNotes) and curated article
collections (EssentialSets) under its own imprint. With about 40 percent of its members living and
working outside the United States, the Computer Society fosters international communication,
cooperation, and information exchange. It monitors and evaluates curriculum accreditation guidelines
through its ties with the US Computing Sciences Accreditation Board and the Accreditation Board for
Engineering and Technology.
Page: 8
Page: 9
Page: 10
Page: 11
Download