Toward Knowledge Discovery in Databases Attached to Grids
Peter Brezany
Institute for Software Science
University of Vienna
E-mail: brezany@par.univie.ac.at
P.Brezany Institut für Softwarewissenschaft - Universität Wien

Media That Radically Influenced Society
• 1500s Printing Press
• 1840s Penny Post
• 1850s Telegraph
• 1920s Telephone
• 1930s Radio
• 1950s TV
• 1990s Web
• 20xx Grid

Talk Outline
• Data Mining on the Grid: Background Information
• Application Examples
• Architecture of a Traditional Data Mining System
• GridMiner: A Framework for Data Mining on the Grid
• GridMiner Architecture
• Functional and Data Access Model
• Conclusions

Data Mining on the Grid
• Data mining on the Grid (DMG): finding unknown data patterns in an environment with geographically distributed data and computation.
• The data may be highly heterogeneous and frequently updated.
• A good DMG algorithm analyzes data in a distributed fashion with modest data communication overhead.
• A typical DMG algorithm involves local data analysis followed by the generation of a global data model.

Application Examples
• Finding out how the emergence of hepatitis C depends on weather patterns: this requires access to a large hepatitis C DB at one location and an environmental DB at another location.
• Two major financial organizations want to cooperate. They need to share data patterns relevant to the data mining task, but they do not want to share the data itself because it is sensitive, so combining the databases may not be feasible.
• Federating Brain Data Project: integrating several neuroscience DBs.
• A major multinational corporation wants to analyze its customer transaction records to quickly develop successful business strategies.
  - It has thousands of establishments throughout the world.
  - Collecting all the data in a centralized data warehouse, followed by analysis using existing commercial data mining software, takes too long.

Telemedical Applications
[Figure: AMG (Austrian Medical Grid). Labels: Database, Raw Medical Data, Derived Medical Data, Database, Reconstructed Medical Data, Web.]

Telemedical Collaboration - Example
• A patient living in a remote village has a heart problem.
• An EEG is taken by the local doctor, and all the patient's details are stored in the doctor's PC-based telemedical system.
• MRI and CT scans are taken in different departments of a general hospital and stored in the telemedical DB.
• A consultant compiles a report and saves it in the DB.
• If necessary, a 3D ultrasound scan is taken in a specialized clinic and a further report is compiled.
• If complicated surgery is required, an external specialist using Virtual Reality techniques defines how the surgery should be planned.
• The resulting operation is recorded on video, e.g., for education.
• Data mining support/assistance is needed.

Architecture of a Data Mining System
[Figure: layered architecture. Labels: Graphical user interface; Pattern evaluation; Knowledge base; Data mining engine; Database or data warehouse server; Data cleaning, data integration; Filtering; Database; Data warehouse.]

On Line Analytical Mining (OLAM)

GridMiner: A Framework for Data Mining on Grids
System requirements:
- Algorithm and data publishing and integration
- Compatibility with Grid infrastructure and Grid awareness
- Openness
- Scalability
- Security and data privacy
Functionality requirements:
- Mining different kinds of knowledge in databases
- Incremental data mining algorithms
- Interactive mining of knowledge at multiple levels of abstraction

GridMiner (Layered) Architecture
(Based on K.F. Jeffery's idea)

Functional and Data Access Model
[Figure. Label: MDS.]

Example: Mining Patterns for Data Classification and Associations
Classification:
    use database dat1, dat2
    mine classifications
    analyze credit_rating
    using g_parsimony
    display as tree
Associations:
    use database DBs attributes
    mine associations
    using method attributes
    display as rules

Knowledge Grid Architecture Layers
• High level layer: Data Access Service, Tools and Algorithms Access Service, Execution Plan Management, Result Presentation Service
• Core layer: Knowledge Directory Service, Resource Allocation Execution Management
• Generic Grid and Data Grid Services

Conclusions
• Grid data mining is a relevant research topic.
• The GridMiner approach may contribute to this research domain.
• Collaborations are needed.
• IPG (Information Power Grid) is the only Grid project that aims to address knowledge discovery issues.
• Looking for pilot application(s).
• Open issue: which basic Grid technology to use (Globus, DataGrid, Jini, JXTA)?

Data Storage and the Components
[Figure: Sites A, B, C, and D each perform Preprocessing followed by Local DM; the local results feed the Construction of the Global Model. Further labels: GUI, Site E.]
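
The flow on this last slide (preprocessing and local DM at each site, then construction of a global model) can be illustrated with a minimal Python sketch. It is not GridMiner code: the function names, the sample data, and the choice of frequent-itemset counting as the mining task are all assumptions made for illustration. Each site reports only itemset counts and its record count, so the raw records never leave the site and communication stays modest.

```python
from collections import Counter
from itertools import combinations


def preprocess(transactions):
    """Local preprocessing (hypothetical): normalise item names, drop empty records."""
    cleaned = []
    for record in transactions:
        items = frozenset(item.strip().lower() for item in record if item.strip())
        if items:
            cleaned.append(items)
    return cleaned


def local_mining(transactions, max_size=2):
    """Local DM step: count every itemset of up to max_size items at one site."""
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(items), size):
                counts[itemset] += 1
    return counts, len(transactions)


def build_global_model(local_results, min_support):
    """Global step: sum the per-site counts and keep itemsets that are frequent overall."""
    total_counts = Counter()
    total_records = 0
    for counts, n_records in local_results:
        total_counts.update(counts)  # Counter.update adds counts together
        total_records += n_records
    return {
        itemset: count / total_records
        for itemset, count in total_counts.items()
        if count / total_records >= min_support
    }


# Made-up data held at two sites; only counts are sent to the global step.
site_a = [["bread", "milk"], ["bread", "butter"], ["milk"]]
site_b = [["Bread", "milk "], ["bread", "milk", "butter"], []]

local_results = [local_mining(preprocess(site)) for site in (site_a, site_b)]
model = build_global_model(local_results, min_support=0.5)
```

Because every site reports counts for all of its itemsets, not only the locally frequent ones, the summed supports equal those a centralized computation would give for itemsets of up to `max_size` items. A realistic system would prune candidates to reduce the amount of data exchanged.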