Abstract - Computer Science & Engineering

advertisement
COLLOQUIUM
Department of Computer Science and Engineering
University of South Carolina
Topic Detection and Tracking Using
Nonnegative Matrix and Tensor Factorizations
Michael W. Berry
Department of Electrical Engineering and Computer Science
University of Tennessee
Date: January 24, 2008
Time: 1400-1500
Place: 300 Main B103
Abstract
Automated approaches for the identification and clustering of semantic features or topics
are highly desired for text mining applications. Using a low rank non-negative matrix
factorization (NNMF) algorithm to retain natural data non-negativity, we eliminate the
need to use subtractive basis vector and encoding calculations present in techniques such
as principal component analysis for semantic feature abstraction. Using non-negative
tensor factorization (NNTF), temporal and semantic proximity can be exploited to enable
tracking of focused discussions as well as latent (unknown) communication patterns.
Demonstrations of NNMF and NNTF algorithms for topic (or discussion) detection and
tracking using the Enron Email Collection and documents from the Airline Safety
Reporting System (ASRS) are provided.
Michael W. Berry holds the title of Full Professor and Coordinator for Computer Science in the newly
formed Department of Electrical Engineering and Computer Science at the University of Tennessee,
Knoxville. He received the BS degree in Mathematics from the University of Georgia in 1981 and the
MS degree in Applied Mathematics from North Carolina State University in 1983. He worked in the
Communications Product Division of IBM in Raleigh, NC for about one year before accepting a research
staff position in the Center for Supercomputing Research and Development at the University of Illinois at
Urbana-Champaign. In 1990, he received the PhD degree in Computer Science from the University of
Illinois at Urbana-Champaign. Prof. Berry is the co-author of Templates for the Solution of Linear
Systems: Building Blocks for Iterative Methods (SIAM, 1994) and Understanding Search Engines:
Mathematical Modeling and Text Retrieval, Second Edition (SIAM, 2005) and editor of Computational
Information Retrieval (SIAM, 2001), Survey of Text Mining: Clustering, Classification, and Retrieval
(Springer, 2003, 2007), and Lecture Notes in Data Mining (World Scientific, 2006). He has published
over 100 refereed journal and conference publications. His research interests include information
retrieval, data and text mining, computational ecology, bioinformatics, and parallel computing. Prof.
Berry is currently supported by grants and contracts from the National Science Foundation, National
Institutes of Health, and the National Aeronautics and Space Administration..
Download