KDD 2003

Program
for the
Ninth ACM SIGKDD
International Conference
on
Knowledge Discovery
and
Data Mining
KDD-2003
Washington, DC, USA
August 24-27, 2003
Program Highlights
Invited Talks
 On-Line Science: The World-Wide Telescope as a
Prototype for the New Computational Science, Jim
Gray, Microsoft Research
 Statistical Learning from Relational Data, Daphne
Koller, Stanford University
 Analyzing Customer Behavior at Amazon.com, Andreas
Weigend, Chief Scientist, Amazon.com
Research and Industrial/Government Tracks
 34 research papers divided into two tracks with nine
sessions in total
 13 industrial/government papers in four sessions
 36 research posters
 10 industrial/government posters
 2 panels
 7 tutorials:
o Data Mining for Computer Security
o Data Mining for Machine Learners
o Information Extraction from the World Wide Web
o Multi-Relational Data Mining
o Privacy-Preserving Data Mining
o Sequence Data Mining Techniques and
Applications
o The Top 10 Data Mining Mistakes and How to
Avoid Them
 9 workshops:
o BIOKDD03: Data Mining in Bioinformatics
o Data Cleaning, Record Linkage and Object
Consolidation
o Data Mining Standards, Services and Platforms
o Fractals and Self Similarity in Data Mining: Issues
and Approaches
o Link Analysis
o MDM/KDD 2003: Integrated Media Mining
o MRDM 2003: Multi-relational Data Mining
o Operational Text Classification
o WebKDD2003: WebMining as a Premise to
Intelligent and Effective Web Applications
Summarized Technical Program
Sunday







SIGKDD 2003 Opening
Awards Ceremony
Innovation Award Talk
KDD Cup 2003
Joint KDD/ICML Invited Talk
2 Joint KDD/ICML Sessions
1 Tutorial
Monday







Invited Talk
Research Track
o Clustering and Pattern Discovery (4 papers)
o Temporal Data (4 papers)
o Classification and Contrast Sets (3 papers)
Industrial/Government Track
o IT (4 papers)
o Science (3 papers)
1 Panel
1 Tutorial
Poster Highlights
Poster Session
Tuesday


Invited Talk
Research Track
o Relational and Graph Data (3 papers)
o Data Streams and Sequential Data (3 papers)
o Web Mining and Data Cubes (3 papers)
o Distance-based Methods (3 papers)
o Frequent Sets (5 papers)
o Data Reduction and Visualization (3 papers)
 Industrial/Government Track
o Healthcare (3 papers)
o Systems (3 papers)
 1 Panel
 1 Tutorial
Wednesday
9 Workshops
4 Tutorials
Saturday, August 23
16:00-20:00 (Concourse)
Registration
Sunday, August 24
9:00-18:00 (Concourse)
Registration
10:00-10:15 (International Ballroom – Center)
Opening Remarks
Ted Senator, General Chair
Pedro Domingos, Christos Faloutsos, Program Chairs
10:15-10:30 (International Ballroom – Center)
Award Presentations
Chairs: Mark Craven, Daryl Pregibon
10:30-11:30 (International Ballroom – Center)
Award Talk
Chair: Gregory Piatetsky-Shapiro
Innovation Award Talk by Heikki Mannila
11:30-12:30 (International Ballroom – Center)
KDD Cup Awards
Chairs: Johannes Gehrke, Paul Ginsparg, Jon Kleinberg
12:30-14:00
Lunch (on your own)
14:00-15:00 (International Ballroom – Center)
Joint KDD/ICML Invited Talk
Chair: Pedro Domingos
Statistical Learning from Relational Data
Daphne Koller, Stanford University
Much of the data in the world is relational in nature,
involving multiple objects, related to each other in a variety
of ways. Examples include structured databases such
as customer transaction data, semi-structured data such
as hyperlinked pages on the world-wide web or networks
of interacting genes, and unstructured data such as text. In
this talk, I will describe a statistical framework for learning
from relational data. The approach is based on
probabilistic models, which have been applied with great
success to a variety of machine learning tasks. Generally,
this framework has been applied to data represented as
fixed-length attribute-value vectors, or to sequence data. I
will describe the language of probabilistic relational models
(PRMs), which extend probabilistic graphical models with
the expressive power of object-relational languages. PRMs
model the uncertainty over the attributes of objects in the
domain as well as uncertainty over the existence of
relations between objects. I will present techniques for
automatically learning PRMs directly from a relational data
set, and applications of these techniques to various tasks,
such as: collective classification of an entire set of related
entities; clustering a set of linked entities into coherent
groups; and even predicting the existence of links between
entities. The talk will demonstrate the applicability of the
techniques on several domains, such as web data and
biological data. We discuss some recent trends and
events, e.g., the dot com meltdown, and some ways for
the field to respond to the challenges, and the
opportunities.
15:00-16:00 (International Ballroom – Center)
Joint KDD/ICML Session I
Chair: Pedro Domingos
BEST RESEARCH PAPER AWARD
Maximizing the Spread of Influence through a Social
Network
David Kempe, Jon Kleinberg, Eva Tardos
Bayesian Network Anomaly Pattern Detection for Disease
Outbreaks
Weng-Keen Wong, Andrew Moore, Gregory Cooper,
Michael Wagner
15:00-18:30 (Georgetown Room)
Tutorial: The Top 10 Data Mining Mistakes and How to
Avoid Them
John F. Elder, Elder Research, USA
16:00-16:30
Coffee Break
16:30-18:30 (International Ballroom – Center)
Joint KDD/ICML Session II
Chair: Tom Fawcett
XRules: An Effective Structural Classifier for XML Data
Mohammed Zaki, Charu Aggarwal
Learning on the Test Data: Leveraging "Unseen" Features
Ben Taskar, Ming Fai Wong, Daphne Koller
Information-Theoretic Co-clustering
Inderjit Dhillon, Subramanyam Mallela, Dharmendra
Modha
ICML BEST STUDENT PAPER AWARD
A Kernel between Sets of Vectors
Risi Kondor, Tony Jebara
Monday, August 25
7:30-8:30
Continental Breakfast
8:00-18:00 (Concourse)
Registration
8:00-17:00 (Exhibit Hall)
Exhibits
8:30-9:30 (International Ballroom – Center)
Invited Talk
Chair: Christos Faloutsos
On-Line Science: The World-Wide Telescope as a
Prototype for the New Computational Science
Jim Gray, Microsoft Research
Computational science has historically meant simulation,
but there is an increasing role for analysis and mining of
online scientific data. As a case in point, half of the world's
astronomy data is public. The astronomy community is
putting all that data on the Internet so that the Internet
becomes the world's best telescope: it has the whole sky,
in many bands, and in detail as good as the best 2-year-old
telescopes. It is useable by all astronomers everywhere. This is
the vision of the virtual observatory, also called the World Wide
Telescope (WWT). As one step
along that path I have been working with the Sloan Digital
Sky Survey (especially Alex Szalay of Johns Hopkins) and
CalTech to federate their data in web services on the
Internet, and to make it easy to ask questions of the
database (see http://skyserver.sdss.org). This talk explains
the rationale for the WWT, discusses how we designed the
database, and talks about some data mining tasks. It also
describes computer science challenges of publishing,
federating, and mining scientific data, and argues that XML
web services are key to federating diverse data sources.
9:30-10:00
Coffee Break
10:00-12:00 Research Track 1 (Monroe Room)
Clustering and Pattern Discovery
Chair: Gregory Piatetsky-Shapiro
Privacy-Preserving K-Means Clustering over Vertically
Partitioned Data
Jaideep Vaidya, Chris Clifton
Assessment and Pruning of Hierarchical Model Based
Clustering
Jeremy Tantrum, Alejandro Murua, Werner Stuetzle
Generative Model-Based Clustering of Directional Data
Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, Suvrit
Sra
An Alternative Hypothesis-Testing Strategy for Pattern
Discovery
Richard Bolton, Niall Adams
10:00-12:00 Research Track 2 (Military Room)
Temporal Data
Chair: Sunita Sarawagi
Indexing Multi-Dimensional Time-Series with Support for
Multiple Distance Measures
Michail Vlachos, Marios Hadjieleftheriou, Dimitrios
Gunopulos, Eamonn Keogh
Translation-Invariant Mixture Models for Curve Clustering
Darya Chudova, Scott Gaffney, Eric Mjolsness, Padhraic
Smyth
Generating English Summaries of Time Series Data Using
the Gricean Maxims
Somayajulu Sripada, Ehud Reiter, Jim Hunter, Jin Yu
To Buy or Not to Buy: Mining Airline Fare Data to Minimize
Ticket Purchase Price
Oren Etzioni, Craig Knoblock, Rattapoon Tuchinda,
Alexander Yates
10:00-12:00 Industrial/Govt. Track (Georgetown Room)
IT
Chair: Michael Pazzani
Passenger-Based Predictive Modeling of Airline No-show
Rates
Richard D. Lawrence, Se J. Hong, Jacques Cherrier
The Data Mining Approach to Automated Software Testing
Mark Last, Menahem Friedman, Abraham Kandel
Critical Event Prediction for Proactive Management in
Large-scale Computer Clusters
R. K. Sahoo, A. J. Oliner, I. Rish, M. Gupta, J. E. Moreira,
S. Ma
Information Awareness: A Prospective Technical
Assessment
David Jensen, Matt Rattigan, and Hannah Blau
SIGKDD-2003 Program Committee, cont.
Kai Ming Ting, Monash University, Australia
Hannu Toivonen, University of Helsinki, Finland
Alexander Tuzhilin, New York University, USA
Geoff Webb, Monash University, Australia
Stefan Wrobel, Fraunhofer AIS and University of Bonn,
Germany
Yiming Yang, Carnegie Mellon University, USA
Philip Yu, IBM T. J. Watson Research Center, USA
Osmar Zaiane, University of Alberta, Canada
Ruben Zamar, University of British Columbia, Canada
Zijian Zheng, Microsoft Corporation, USA
Industrial/Government Track Program Committee
Scott Bennett, SRA International, USA
Eric Bloedorn, MITRE, USA
John Elder, Elder Research, USA
Herb Edelstein, Two Crows, USA
Ronen Feldman, ClearForest, USA
Steve Gallant, Xchange, USA
Monte Hancock, CSI, USA
Richard Lathrop, University of California – Irvine, USA
Brian Lent, Intelligent Results, USA
Chris Merz, Mastercard, USA
Claudia Pearce, NSA, USA
Dorian Pyle, Data Miners, USA
Bharat Rao, Siemens, Germany
Neal Rothleder, digiMine, USA
Joseph Sirosh, Fair Isaac, USA
Ming Tan, RulesPower, USA
Ramasamy Uthurusamy, General Motors, USA
Best Paper Awards Committee
Corinna Cortes, AT&T Labs - Research, USA
Charles Elkan, University of California, San Diego, USA
H.V. Jagadish, University of Michigan, USA
David Madigan, Rutgers University, USA
Raymond Ng, University of British Columbia, Canada
Padhraic Smyth, University of California, Irvine, USA
Alexander Tuzhilin, New York University, USA
ACM SIGKDD Chair
Won Kim, Cyber Database Solutions, USA
SIGKDD-2003 Program Committee, cont.
Daniel Keim, University of Konstanz, Germany
Eamonn Keogh, University of California, Riverside, USA
Masaru Kitsuregawa, University of Tokyo, Japan
Jon Kleinberg, Cornell University, USA
Ron Kohavi, Blue Martini Software, USA
Nick Koudas, AT&T Labs – Research, USA
Hans-Peter Kriegel, University of Munich, Germany
Vipin Kumar, University of Minnesota, USA
Diane Lambert, Bell Labs, USA
Nada Lavrac, Jozef Stefan Institute, Slovenia
Wenke Lee, Georgia Institute of Technology, USA
David Lin, University of Memphis, USA
Sheng Ma, IBM T. J. Watson Research Center, USA
Dragos Margineantu, The Boeing Company, USA
Brij Masand, Data Miners, Inc., USA
Llew Mason, Blue Martini Software, USA
Andrew McCallum, University of Massachusetts, Amherst,
USA
Vasileios Megalooikonomou, Temple University, USA
Marina Meila, University of Washington, USA
Dunja Mladenic, Jozef Stefan Institute, Slovenia
Raymond Mooney, University of Texas, Austin, USA
Katharina Morik, University of Dortmund, Germany
Rajeev Motwani, Stanford University, USA
Richard Muntz, University of California, Los Angeles, USA
Raymond Ng, University of British Columbia, Canada
William Stafford Noble, University of Washington, USA
Stephen North, AT&T Labs – Research, USA
David Page, University of Wisconsin, Madison, USA
Dmitry Pavlov, NEC Research Institute, USA
Jian Pei, State University of New York at Buffalo, USA
David Pennock, Overture Services, Inc., USA
Gregory Piatetsky-Shapiro, KDnuggets, USA
Foster Provost, New York University, USA
Raghu Ramakrishnan, University of Wisconsin, Madison,
USA
Pat Riddle, University of Auckland, New Zealand
Greg Ridgeway, RAND, USA
Mehran Sahami, Google and Stanford University, USA
Lorenza Saitta, University of Piemonte Orientale, Italy
Joerg Sander, University of Alberta, Canada
Sunita Sarawagi, IIT Bombay, India
Dale Schuurmans, University of Waterloo, Canada
Steven L. Scott, University of Southern California, USA
Ken Sevcik, University of Toronto, Canada
Jude Shavlik, University of Wisconsin, Madison, USA
Arno Siebes, Utrecht University, Netherlands
Simeon Simoff, University of Technology Sydney, Australia
Myra Spiliopoulou, Otto-von-Guericke-Universitaet
Magdeburg, Germany
Jaideep Srivastava, University of Minnesota, USA
Werner Stuetzle, University of Washington, USA
Latanya Sweeney, Carnegie Mellon University, USA
Monday, August 25
12:00-13:30 (International Ballroom – Center)
Lunch
13:30-15:00 Research Track 1 (Military Room)
Classification and Contrast Sets
Chair: Lorenza Saitta
Classifying Large Data Sets Using SVMs with Hierarchical
Clusters
Hwanjo Yu, Jiong Yang, Jiawei Han
Cross-Training: Learning Probabilistic Mappings Between
Topics
Sunita Sarawagi, Soumen Chakrabarti, Shantanu Godbole
On Detecting Differences Between Groups
Geoff Webb, Shane Butler, Douglas Newlands
13:30-15:00 Industrial/Govt. Track (Monroe Room)
Science
Chair: Bharat Rao
Capturing Best Practice for Microarray Gene Expression
Data Analysis
Gregory Piatetsky-Shapiro, Tom Khabaza, Sridhar
Ramaswamy
Frequent-Subsequence-Based Prediction of Outer
Membrane Proteins
Rong She, Fei Chen, Ke Wang, Martin Ester, Jennifer L.
Gardy, Fiona S. L. Brinkman
Discovery of Climate Indices using Clustering
Michael Steinbach, Pang-Ning Tan, Vipin Kumar, Steven
Klooster, Christopher Potter
13:30-17:00 (Georgetown Room)
Tutorial: Multi-Relational Data Mining
Luc De Raedt, Albert-Ludwigs-University Freiburg,
Germany
Saso Dzeroski, Jozef Stefan Institute, Slovenia
15:00-15:30
Coffee Break
15:30-17:00 (International Ballroom – Center)
Panel: Privacy and Data Mining: Friends or Foes?
Chair: Rakesh Agrawal, IBM Almaden Research Center
The explosive progress in networking, storage, and
processor technologies has created an unprecedented
capability to collect, store, and process massive amounts
of data. Data mining, with its promise of efficiently
discovering valuable, non-obvious information from large
databases, is posing an interesting dilemma. Applications
abound where data mining could do enormous good.
However, in misguided hands, in conjunction with
other advanced technologies, it could be vulnerable to
misuse. Indeed, of late, data mining has come to be
portrayed by some as a potential threat to civil liberties and
privacy.
The goal of this panel is to debate and
understand the concerns with data mining and to identify
research directions that may address those concerns.
Panelists will address the following specific questions:
1. Perceived concerns with data mining
2. How real are those concerns
3. What the data mining community is doing to address
those concerns
4. What more needs to be done
Panelists:
Christopher Clifton, Purdue University
Lawrence Cox, National Center for Health Statistics
James Dempsey, Center for Democracy & Technology
Mike Gurski, Information & Privacy Commission,
Ontario, Canada
Bhavani Thuraisingham, National Science Foundation
Jeff Ullman, Stanford University
17:00-18:30 (International Ballroom – Center)
Poster Highlights
Chair: Usama Fayyad
18:30-20:30 (Exhibit Hall)
Poster Session and Reception
SIGKDD-2003 Program Committee
Niall Adams, Imperial College, UK
Deepak K. Agarwal, AT&T Labs – Research, USA
Mihael Ankerst, The Boeing Company, USA
Chid Apte, IBM T. J. Watson Research Center, USA
Lars Asker, Stockholm University, Sweden
Daniel Barbara, George Mason University, USA
Roberto Bayardo, IBM Almaden Research Center, USA
Kristin Bennett, Rensselaer Polytechnic Institute, USA
Michael Berthold, Tripos, Inc., USA
Richard Bolton, Imperial College, UK
Pavel Brazdil, University of Porto, Portugal
Carla Brodley, Purdue University, USA
Wray Buntine, Helsinki Institute for Information
Technology, Finland
Rich Caruana, Cornell University, USA
Soumen Chakrabarti, IIT Bombay, India
Philip Chan, MIT/FIT, USA
Surajit Chaudhuri, Microsoft Research, USA
Ken Church, AT&T Labs – Research, USA
Chris Clifton, Purdue University, USA
William Cohen, Carnegie Mellon University, USA
David Cohn, Google, USA
Mark Craven, University of Wisconsin, Madison, USA
Tamraparni Dasu, AT&T Labs – Research, USA
Umeshwar Dayal, Hewlett-Packard Laboratories, USA
Luc De Raedt, Albert-Ludwigs-University Freiburg,
Germany
Thomas G. Dietterich, Oregon State University, USA
Susan Dumais, Microsoft Research, USA
William DuMouchel, AT&T Labs – Research, USA
Jennifer Dy, Northeastern University, USA
Saso Dzeroski, Jozef Stefan Institute, Slovenia
Charles Elkan, University of California, San Diego, USA
Martin Ester, Simon Fraser University, Canada
Usama Fayyad, DMX Group, USA
Doug Fisher, Vanderbilt University, USA
Gary William Flake, Overture Services, Inc., USA
Takeshi Fukuda, IBM Tokyo Laboratory, Japan
Minos Garofalakis, Bell Labs, USA
Johannes Gehrke, Cornell University, USA
Lee Giles, Pennsylvania State University, USA
Henry Goldberg, NASD, USA
Marko Grobelnik, Jozef Stefan Institute, Slovenia
Dimitrios Gunopulos, University of California, Riverside,
USA
Jiawei Han, University of Illinois at Urbana, USA
David Heckerman, Microsoft Research, USA
Haym Hirsh, Rutgers University, USA
Piotr Indyk, MIT, USA
Yannis Ioannidis, University of Athens, Greece
H.V. Jagadish, University of Michigan, USA
David Jensen, University of Massachusetts, Amherst, USA
Thorsten Joachims, Cornell University, USA
SIGKDD-2003 Organizing Committee
General Chair:
Ted Senator, DARPA, USA
Associate General Chair:
Hillol Kargupta, University of Maryland, Baltimore
County, USA
Program Chairs:
Pedro Domingos, University of Washington, USA
Christos Faloutsos, Carnegie Mellon University, USA
Industrial/Government Track Chairs:
Paul Bradley, Microsoft Research, USA
Michael Pazzani, University of California, Irvine, USA
Best Paper Awards Chair:
Daryl Pregibon, AT&T Labs - Research, USA
Exhibits Chairs:
Kirk Borne, Raytheon and NASA Goddard Space Flight
Ctr, USA
David Vennergrund, SRA International Inc., USA
Government Relations Chairs:
Eric Bloedorn, MITRE Corp., USA
Ashok Srivastava, RIACS/NASA Ames Research Ctr,
USA
KDD Cup Chairs:
Johannes Gehrke, Cornell University, USA
Paul Ginsparg, Cornell University, USA
Jon Kleinberg, Cornell University, USA
Local Arrangements Chair:
Tim Oates, University of Maryland, Baltimore County,
USA
Local Publicity Chair:
Lisa Singh, Georgetown University, USA
Panels Chair:
Steve Lawrence, Google, USA
Proceedings Chair:
Lise Getoor, University of Maryland, College Park, USA
Publicity Chair:
Osmar R. Zaïane, University of Alberta, Canada
Registration Chairs:
Rita Doerr, Department of Defense, USA
Anupam Joshi, University of Maryland, Baltimore
County, USA
Sponsorship Chairs:
Herb Edelstein, Two Crows Corp., USA
John F. Elder IV, Elder Research Inc., USA
Student Awards Chair:
Mark Craven, University of Wisconsin, Madison, USA
Treasurer:
Henry Goldberg, NASD, USA
Tutorials Chair:
Ramakrishnan Srikant, IBM Almaden Research Ctr, USA
Webmaster:
Osmar R. Zaïane, University of Alberta, Canada
Workshops Chair:
Charu Aggarwal, IBM T. J. Watson Research Ctr, USA
Poster Papers – Research Track
Stylistic Mining of Electronic Messages for Multiple
Authorship Discrimination: First Results
Shlomo Argamon, Marin Saric, Sterling Stein
Mining High Dimensional Data for Classifier
Knowledge
Raj Bhatnagar, Goutham Kurra, Wen Niu
Finding Recent Frequent Itemsets Adaptively over
Online Data Streams
Joong Hyuk Chang, Won Suk Lee
Probabilistic Discovery of Time Series Motifs
Bill Chiu, Eamonn Keogh, Stefano Lonardi
Understanding Captions in Biomedical Publications
William Cohen, Richard Wang, Robert Murphy
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Wenliang Du, Zhijun Zhan
Applications of Sampling and Fractional Factorial
Designs to Model-Free Data Squashing
William DuMouchel, Deepak K. Agarwal
Experiments with Random Projections for Machine
Learning
Dmitriy Fradkin, David Madigan
Accurate Decision Trees for Mining High-Speed Data
Streams
Joao Gama, Ricardo Rocha, Pedro Medas
Correlating Synchronous and Asynchronous Data
Streams
Sudipto Guha, Dimitrios Gunopulos, Nick Koudas
A Web Page Prediction Model Based On Click-stream
Tree Representation of User Behavior
Sule Gunduz, M. Tamer Ozsu
Natural Communities in Large Linked Networks
John Hopcroft, Omar Khan, Brian Kulis, Bart Selman
Navigating Massive Data Sets via Local Clustering
Michael E. Houle
Mining Viewpoint Patterns in Image Databases
Wynne Hsu, Jing Dai, Mong Li Lee
Playing Hide-And-Seek with Correlations
Christopher Jermaine
Interactive Exploration of Coherent Patterns in Time-series Gene Expression Data
Daxin Jiang, Jian Pei, Aidong Zhang
Poster Papers – Research Track, cont.
Efficient Decision Tree Construction on Streaming
Data
Ruoming Jin, Gagan Agrawal
A Bag-of-Paths Model for Representing Document
Structure with Application to Web Mining
Sachindra Joshi, Neeraj Agrawal, Raghu
Krishnapuram, Sumit Negi
Nantonac Collaborative Filtering: Recommendation
Based on Order Responses
Toshihiro Kamishima
A Two-Way Visualization Method for Clustered Data
Yehuda Koren, David Harel
Empirical Comparisons of Various Voting Schemes in
Boosting and Bagging
Kelvin Leung, D. Stott Parker
Mining Data Records in Web Pages
Bing Liu, Robert Grossman, Yanhong Zhai
On Computing, Storing and Querying Frequent
Patterns
Guimei Liu, Hongjun Lu, Wenwu Lou, Jeffrey Xu Yu
Online Novelty Detection on Temporal Sequences
Junshui Ma, Simon Perkins
Distributed Cooperative Mining for Information
Consortia
Satoshi Morinaga, Kenji Yamanishi, Jun-ichi
Takeuchi
Learning Relational Probability Trees
Jennifer Neville, David Jensen, Lisa Friedland,
Michael Hay
Graph-Based Anomaly Detection
Caleb Noble, Diane Cook
CARPENTER: Finding Closed Patterns in Long
Biological Datasets
Feng Pan, Gao Cong, Anthony K. H. Tung, Jiong
Yang, Mohammed Zaki
New Unsupervised Clustering Algorithm for Large
Datasets
William Peter, John Chiochetti
Improving Spatial Locality of Programs via Data Mining
Karlton Sequeira, Mohammed Zaki, Boleslaw Szymanski,
Christopher Carothers
Acknowledgements
The SIGKDD 2003 Conference gratefully acknowledges
the contributions of the following institutions:
Gold Sponsors
Silver Sponsors
Bronze Sponsors
Sponsoring Organizations
Poster Papers – Research Track, cont.
Mining Phenotypes and Informative Genes from Gene
Expression Data
Chun Tang, Aidong Zhang, Jian Pei
Weighted Association Rule Mining Using Weighted
Support and Significance Framework
Feng Tao, Fionn Murtagh, Mohsen Farid
PaintingClass: Interactive Construction, Visualization
and Exploration of Decision Trees
Soon Tee Teoh, Kwan-Liu Ma
Time and Sample Efficient Discovery of Markov
Blankets and Direct Causal Relations
Ioannis Tsamardinos, Constantin F. Aliferis,
Alexander Statnikov
Distributed Multivariate Regression Based on
Influential Observations
Hang Yu, Ee-Chien Chang
Efficiently Handling Feature Redundancy in
High-Dimensional Data
Lei Yu, Huan Liu
Wednesday, August 27
8:00-12:00 (Concourse)
Registration
8:30-17:00
Full Day Workshops:
BIOKDD03: Data Mining in Bioinformatics (Monroe East)
Data Cleaning, Record Linkage and Object Consolidation
(Georgetown West)
Fractals and Self Similarity in Data Mining: Issues and
Approaches (Map Room – terrace level)
Link Analysis (Georgetown East)
MDM/KDD 2003: Integrated Media Mining (Caucus Room
– terrace level)
MRDM 2003: Multi-relational Data Mining (Monroe West)
Operational Text Classification (Hemisphere Room)
WebKDD2003: WebMining as a Premise to Intelligent and
Effective Web Applications (Military Room)
8:30-12:00
Half Day Workshop:
Data Mining Standards, Services and Platforms
(Conservatory – terrace level)
8:30-12:00
Tutorial: Data Mining for Computer Security (Lincoln
West)
Carla Brodley, Purdue University
Philip Chan, MIT/FIT
Tutorial: Data Mining for Machine Learners
(Thoroughbred Room)
Johannes Gehrke, Cornell University
Jiawei Han, University of Illinois at Urbana
12:00-13:30
Lunch (on your own)
13:30-17:00
Tutorial: Privacy-Preserving Data Mining (Lincoln
West)
Chris Clifton, Purdue University
Tutorial: Sequence Data Mining Techniques and
Applications (Thoroughbred Room)
Mark Craven, University of Wisconsin, Madison
Sunita Sarawagi, IIT Bombay
Poster Papers – Industrial/Government Track
An Adaptive Nearest Neighbor Search for a Parts
Acquisition ePortal
Rafael Alonso, Jeffrey A. Bloom, Hua Li, Chumki
Basu
Architecting a Knowledge Discovery Engine for Military
Commanders Utilizing Massive Runs of Simulations
Philip Barry, Jianping Zhang, Mary McDonald
Data Quality through Knowledge Engineering
Tamraparni Dasu, Gregg T. Vesonder, Jon R. Wright
Similarity Analysis on Government Regulations
Gloria T. Lau, Kincho H. Law, Gio Wiederhold
Experimental Design for Solicitation Campaigns
Uwe F. Mayer, Armand Sarkissian
Towards NIC-based Intrusion Detection
M. Otey, S. Parthasarathy, A. Ghoting, G. Li, S.
Narravula
Data-Driven Validation, Completion and Construction
of Event Relation Networks
Chang-Shing Perng, David Thoenen, Sheng Ma,
Genady Grabarnik, Joseph Hellerstein
Visualizing Concept Drift
Kevin B. Pratt, Gleb Tschapek
Poster Papers – Industrial/Government Track, cont.
Experimental Study of Discovering Essential
Information from Customer Inquiry
Keiko Shimazu, Atsuhito Momma, Koichi Furukawa
Applying Data Mining in Investigating Money
Laundering Crimes
Zhongfei Zhang, John J. Salerno, Philip S. Yu
Tuesday, August 26
16:00-18:30 Research Track 1 (Monroe Room)
Frequent Sets
Chair: Geoff Webb
Screening and Interpreting Multi-item Associations Based
on Log-linear Modeling
Xintao Wu, Daniel Barbara, Yong Ye
Fast Vertical Mining Using Diffsets
Mohammed Zaki, Karam Gouda
CLOSET+: Searching for the Best Strategies for Mining
Frequent Closed Itemsets
Jianyong Wang, Jiawei Han, Jian Pei
Inverted Matrix: Efficient Discovery of Frequent Items in
Large Datasets in the Context of Interactive Mining
Mohammad El-Hajj, Osmar R. Zaiane
Mining Unexpected Rules by Pushing User Dynamics
Ke Wang, Yuelong Jiang, Laks Lakshmanan
16:00-17:30 Research Track 2 (International Ballroom –
Center)
Data Reduction and Visualization
Chair: Mihael Ankerst
Efficient Data Reduction with EASE
Hervé Brönnimann, Bin Chen, Manoranjan Dash, Peter
Haas, Peter Scheuermann
PROXIMUS: A Framework for Analyzing Very High
Dimensional Discrete-Attributed Datasets
Mehmet Koyuturk, Ananth Grama
Visualizing Changes in the Structure of Data for
Exploratory Feature Extraction
Elias Pampalk, Werner Goebl, Gerhard Widmer
17:30-18:30 (International Ballroom – Center)
Panel: Data Mining: The Next 10 Years
Chair: Usama Fayyad, President, DMX Group
After nearly a decade and a half of KDD conferences
and a significant growth in demand for data mining
technology driven by a glut in data, data mining has
grown as a healthy research community. However, we
still struggle on two important fronts: the scientific and
the commercial. On the scientific front, Data Mining
still needs to reach a stronger level of attracting steady
contributions from the related fields. On the
commercial front, the huge opportunity has not yet
been met with adequate tools and solutions. This
panel will attempt to address the possible future directions
for Data Mining and KDD. Will we continue a healthy
evolution to being a scientific field of study with a healthy
contributing community? Will we go more down the path
of systems and engineering? What are the next challenge
problems? What are the milestones that define healthy
growth and significant advances? Is data mining destined
to continue to be a visible area of focus and research, or
will it evolve towards embedded technology studied as part
of other systems? The presence of a significant set of
research challenge problems against which measurable
progress can be made is a crucial component for the
growth of a scientific field. What will these challenge
problems look like for KDD and Data Mining over the next
10 years and beyond?
Panelists:
Rakesh Agrawal, IBM Almaden Research
Gregory Piatetsky-Shapiro, KDnuggets
Daryl Pregibon, AT&T Research
Raghu Ramakrishnan, University of Wisconsin, Madison
Ramasamy Uthurusamy, General Motors
18:30-19:30 (Adams Room)
Transfer meeting - KDD 2003 and KDD 2004 organizing
committees
19:30-22:00
Program committee dinner (by invitation only)
Tuesday, August 26
7:30-8:30
Continental Breakfast
8:00-18:00 (Concourse)
Registration
8:00-17:00 (Exhibit Hall)
Exhibits
8:30-9:30 (International Ballroom – Center)
Invited Talk
Chair: Paul Bradley
Analyzing Customer Behavior at Amazon.com
Andreas Weigend, Chief Scientist, Amazon.com
The first part of the talk gives an overview of the different
kinds of data available at Amazon.com, emphasizing that
data mining needs to drive actions such as emails,
coupons, and recommendations of products, product
groups, or site features. The scope of the actions ranges
from the individual customer, through pre-computed customer
segments, to the entire customer base.
The second part presents joint work with Bruce
D'Ambrosio (Cleverset, Inc.) on probabilistic relational
models for customer behavior, both for discovering static
customer attributes, and for dynamically predicting the
intention of the customer and the outcome of a session.
The third part outlines current research problems, such as
modeling and eventually influencing the long-term
behavior of customers. In addition to the importance of
machine learning, it shows the central role that principles of
behavioral economics, judgment, and decision making play
in computational marketing.
9:30-10:00
Coffee Break
10:00-11:30 Research Track 1 (Monroe Room)
Relational and Graph Data
Chair: Ray Mooney
Aggregation-Based Feature Invention and Relational
Concept Classes
Claudia Perlich, Foster Provost
Algorithms for Estimating Relative Importance in Networks
Scott White, Padhraic Smyth
CloseGraph: Mining Closed Frequent Graph Patterns
Xifeng Yan, Jiawei Han
10:00-11:30 Research Track 2 (Georgetown Room)
Data Streams and Sequential Data
Chair: Johannes Gehrke
Mining Concept-Drifting Data Streams using Ensemble
Classifiers
Haixun Wang, Wei Fan, Philip Yu, Jiawei Han
Efficient Elastic Burst Detection in Data Streams
Yunyue Zhu, Dennis Shasha
Fragments of Order
Aristides Gionis, Teija Kujala, Heikki Mannila
10:00-11:30 Industrial/Govt. Track (Military Room)
Healthcare
Chair: Eric Bloedorn
Mining Hepatitis Data with Temporal Abstraction
Tu B. Ho, Trong Dung Nguyen, S. Kawasaki, S. Q. Le, H.
Yokoi, K. Takabayashi
Clinical and Financial Outcomes Analysis with Existing
Hospital Patient Records
R. Bharat Rao, Radu S. Niculescu, Colin Germond,
Harsha Rao
BEST APPLICATION PAPER AWARD
Empirical Bayesian Data Mining for Discovering Patterns
in Post-Marketing Drug Safety
David M. Fram, June S. Almenoff, William DuMouchel
11:45-13:45 (International Ballroom – Center)
SIGKDD Business Lunch
14:00-15:30 Research Track 1 (Monroe Room)
Web Mining and Data Cubes
Chair: Ronny Kohavi
Eliminating Noisy Information in Web Pages for Data
Mining
Lan Yi, Bing Liu, Xiaoli Li
SEWeP: Using Site Semantics and a Taxonomy to
Enhance the Web Personalization Process
Magdalini Eirinaki, Michalis Vazirgiannis, Iraklis Varlamis
Extracting Semantics from Data Cubes using Cube
Transversals and Closures
Alain Casali, Rosine Cicchetti, Lotfi Lakhal
14:00-15:30 Research Track 2 (Georgetown Room)
Distance-Based Methods
Chair: Martin Ester
Towards Systematic Design of Distance Functions for
Data Mining Applications
Charu Aggarwal
Mining Distance-Based Outliers in Near Linear Time with
Randomization and a Simple Pruning Rule
Stephen Bay, Mark Schwabacher
Adaptive Duplicate Detection Using Learnable String
Similarity Measures
Mikhail Bilenko, Raymond Mooney
14:00-15:30 Industrial/Govt. Track (Military Room)
Systems
Chair: Monte Hancock
Knowledge-Based Data Mining
Sholom M. Weiss, Stephen J. Buckley, Shubir Kapoor,
Søren Damgaard
The Anatomy of a Multimodal Information Filter
Yi-Leh Wu, King-Shy Goh, Beitao Li, Huaxing You,
Edward Y. Chang
Golden Path Analyzer: Using Divide-and-Conquer to
Cluster Web Clickstreams
Kamal Ali, Steven P. Ketchpel
15:30-16:00
Coffee Break
15:45-18:45 (Georgetown Room)
Tutorial: Information Extraction from the World Wide
Web
William Cohen, Carnegie Mellon University
Andrew McCallum, University of Massachusetts, Amherst