Clustering Massive Datasets: Applying Classical Problems in Probability Theory to the K-Medoids Problem

DEPARTMENT OF COMPUTER AND INFORMATION SCIENCES
DEPARTMENT OF MATHEMATICS
Dr. Juan R. Iglesias, Chair
PPOHA Grant Invited Speaker Series
Guest Speaker: Esteban Rangel
PhD Student at Center for Ultra-scale Computing and Information Security
Department EECS
Northwestern University
Wednesday, September 12, 2012
1:30-2:30 PM
SETB 3rd Floor Conference Room
Title: Clustering Massive Datasets: Applying Classical Problems in
Probability Theory to the K-Medoids Problem
Abstract: K-medoid methods for clustering data have many desirable properties such as robustness and the ability to
use non-numerical values, but their typically high computational complexity has made their application to large data
sets difficult. I will discuss AGORAS, a stochastic algorithm for the k-medoids problem that is especially well-suited to
clustering massive data sets. The approach draws a sequence of uniformly sampled sets and uses a heuristic to
determine the sample size and to identify potential cluster medoids among the sampled items. As a result, computing
the final solution only involves solving k trivial sub-problems of centrality, which can be done much more efficiently
on large data sets than searching a combinatorial space for the optimal value of an objective function. The complexity
of AGORAS is effectively independent of the full data size, and it can scale to arbitrarily large data sets. Parallel
implementations for shared and distributed memory architectures will be discussed along with general optimizations.
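To make the "sub-problem of centrality" mentioned in the abstract concrete, here is a minimal Python sketch of finding the medoid of a small group of items under an arbitrary dissimilarity function. It is not the AGORAS algorithm: the abstract does not specify the sample-size heuristic or how candidate medoids are identified, so the names `medoid`, `hamming`, and `sample_medoid` are hypothetical and only illustrate the general idea of working on a uniform sample instead of the full data set.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def medoid(items: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Return the item with the smallest total dissimilarity to the group.

    This is a brute-force O(n^2) search, which is only practical because
    the group is assumed to be small (for example, a sample).
    """
    if not items:
        raise ValueError("medoid of an empty group is undefined")
    return min(items, key=lambda c: sum(dist(c, x) for x in items))


def hamming(a: str, b: str) -> int:
    """Count differing positions between two equal-length strings."""
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def sample_medoid(
    data: Sequence[T],
    dist: Callable[[T, T], float],
    sample_size: int,
    rng: random.Random,
) -> T:
    """Estimate a medoid from a uniform random sample of the data.

    Assumption: sample_size is supplied by the caller; the heuristic the
    talk uses to choose it is not described in the abstract.
    """
    sample = rng.sample(list(data), min(sample_size, len(data)))
    return medoid(sample, dist)


if __name__ == "__main__":
    # Numerical items with absolute difference as the dissimilarity.
    print(medoid([1, 2, 3, 4, 100], lambda a, b: abs(a - b)))
    # Non-numerical items: a medoid is always one of the items themselves.
    print(medoid(["aaaa", "aaab", "aabb", "zzzz"], hamming))
```

Because a medoid is always an actual member of the group and only a pairwise dissimilarity is required, the same function works for strings and other non-numerical values, and the cost of `sample_medoid` depends on the sample size rather than on the size of the full data set.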
Bio: I work in the field of high-performance analytics. My research is focused on developing next-generation
data mining algorithms for the increasing size and complexity of data. To this end, my interests are in parallel
computing for shared and distributed memory architectures and GPUs, and in approximation algorithms and
stochastic methods. In a broad sense, I am interested in discovering relationships between accuracy, power
consumption, and computing time for data mining tasks. After completing my MS in Computer Science from the
University of Texas at Brownsville, I joined the Center for Ultra-scale Computing and Information Security (CUCIS) at
Northwestern University in the Electrical Engineering and Computer Science Department. In 2011, I was awarded a
Northwestern University Fellowship.
80 Fort Brown • Your Location • Brownsville, Texas 78520 • 956-882-6605 • Fax 956-882-6604 • utb.edu