Applications of Data Mining – Dr. K. Sekar IISc

K. SEKAR, Ph.D.
Bioinformatics Centre
Supercomputer Education and Research Centre
Indian Institute of Science
Bangalore 560 012, India
sekar@physics.iisc.ernet.in
Voice: (91)-80-2932469
Fax: (91)-80-3600683, (91)-80-3601409, (91)-80-3600551

Bioinformatics Centre & Supercomputer Education and Research Centre

APPLICATIONS OF DATA MINING

Abstract
Bioinformatics is one of the fastest-growing interdisciplinary areas in the biological sciences. It has grown so explosively that powerful tools are now needed to organize and analyze the data. This talk gives an overview of the general features of data mining tools and techniques and of their applications.

Bioinformatics is the fashionable new name for the field previously called computational biology. Many prefer this name because it puts the emphasis on data storage and analysis rather than on the biology, and the field is genuinely data driven.

The term bioinformatics is used to cover almost all computer applications in the biological sciences. It was originally coined in the mid-1980s for the analysis of biological sequence data. There is far more known sequence data than protein structural data, and because of the genome projects, sequence databases are doubling in size every year. A key challenge of bioinformatics is to analyze this wealth of sequence data in order to understand the accumulated information in terms of protein structure, function and evolution.
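The statement that sequence databases double in size every year has a simple arithmetic consequence. A database of size S0 grows to S0 × 2^n after n years, so one decade of doubling multiplies it by 2^10 = 1024.

The sketch below works this through. It assumes a constant yearly doubling, which real growth only approximates. The function names and the starting size of one million entries are hypothetical.

```python
import math


def projected_size(current_size: float, years: int) -> float:
    """Size after `years` years, assuming (hypothetically) exact doubling every year: S0 * 2**n."""
    return current_size * 2 ** years


def years_to_reach(current_size: float, target_size: float) -> int:
    """Smallest whole number of yearly doublings needed to grow from current_size to target_size."""
    if target_size <= current_size:
        return 0
    return math.ceil(math.log2(target_size / current_size))


if __name__ == "__main__":
    start = 1_000_000  # hypothetical starting number of sequence entries
    for n in (1, 5, 10):
        print(n, projected_size(start, n))
    print(years_to_reach(start, 1_000_000_000))  # 10
```

Under this assumption, a thousandfold increase takes only about ten years. This is why the talk stresses tools that scale with the data.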
Wherever possible, a range of different methods should be used, and the results should be combined with all available biological information.

In this sense, bioinformatics:
- refers to database-like activities involving persistent sets of data that are maintained in a consistent state over essentially indefinite periods of time;
- encompasses the use of algorithmic tools to facilitate biological database analyses;
- comprises the entire collection of information management systems, analysis tools and communication networks supporting biology.

DATA MINING
Data mining is defined as "exploration and analysis, by automatic and semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules".

The central challenge is to extract the maximum information from this wealth of data. This can be achieved by establishing and maintaining databases and by providing search and analysis tools to interpret the data.

DATABASE
A database is simply a collection of quantitative data resulting from experimental measurements or observations in various fields of science. Interest in databases has recently been kindled by international efforts to organize and analyze the data and to update our knowledge.

A database is essentially just a store of information. It often takes the form of simple files, for example a single flat file. You can put information into this store or retrieve it from the store.
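The flat-file store described above can be sketched in a few lines. So can the "search and analysis tools" that the central challenge calls for. This is a hypothetical illustration, not one of the Centre's packages. The tab-separated record layout, the class FlatFileStore, the function frequent_kmers and the toy sequences are all assumptions.

Counting recurring short substrings (k-mers) is used here as one very simple instance of automatically discovering patterns in stored sequence data.

```python
import os
import tempfile
from collections import Counter


class FlatFileStore:
    """Hypothetical flat-file database: one record per line, stored as 'id<TAB>sequence'."""

    def __init__(self, path):
        self.path = path

    def store(self, record_id, sequence):
        # Put information into the store: append one record to the end of the file.
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(f"{record_id}\t{sequence}\n")

    def records(self):
        # Stream every (id, sequence) pair back out; a missing file means an empty store.
        try:
            with open(self.path, encoding="utf-8") as fh:
                for line in fh:
                    record_id, sequence = line.rstrip("\n").split("\t", 1)
                    yield record_id, sequence
        except FileNotFoundError:
            return

    def retrieve(self, record_id):
        # Retrieve from the store: a flat file has no index, so scan it line by line.
        for rid, seq in self.records():
            if rid == record_id:
                return seq
        return None


def frequent_kmers(store, k=3, top=3):
    """Count every length-k substring across all stored sequences and return the most common."""
    counts = Counter()
    for _, seq in store.records():
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy_sequences.tsv")  # hypothetical file name
    db = FlatFileStore(path)
    db.store("seq1", "MKVLAAGIV")  # made-up toy sequences
    db.store("seq2", "GIVMKVLAA")
    print(db.retrieve("seq2"))  # GIVMKVLAA
    print(frequent_kmers(db, k=3, top=5))
```

With these two toy sequences, every 3-mer the sequences share (for example MKV and GIV) is counted twice, so the shared k-mers rank at the top. Lookup is a linear scan because a plain flat file has no index. That simplicity is what makes flat files easy to create, and it is also what makes them slow to search once they grow large.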
Derived Database
One of the greatest challenges in database research is to analyze a database in depth and to create derived databases that meet users' needs and demands without compromising the sustainability and quality of the existing database. A derived database is expected to dramatically reduce the workload of the user community and to serve as a highly focused resource.

Packages developed at the Bioinformatics Centre
Raman Building, Indian Institute of Science, Bangalore 560 012
Principal Investigator: Dr. K. Sekar
E-mail: sekar@physics.iisc.ernet.in

Search Engines
- Protein Sequence Search Tool: 144.16.71.10/psst
- Biomolecules Segment Display Device: 144.16.71.2/bsdd
- Motif Search in Genome Sequence: 144.16.71.10/msgs
- Secondary Structural Elements in Protein: 144.16.71.2/ssep
Programmers: S. Saravanan, A. Ajmal Khan, C.K. Rajesh, T. Kamaraj, P. Selvarani, V. Shanthi, S. Sirajuddin Sheik

Databases with Search Facility
- Lipase Structural Database: 144.16.71.2/lsdb
- Lysozyme Structural Database: 144.16.71.2/lysdb
- 3D-Amylase Database: 144.16.71.2/asdb
- Globin Structural Database: 144.16.71.2/gsdb
Programmers: C.K. Rajesh, T. Kamaraj, P. Sundararajan, P. Selvarani, V. Shanthi, A.S. Zahir Hussain, S. Sirajuddin Sheik

Software for Structure Analysis & Manipulation
- Conformation Angles Package: 144.16.71.146/cap
- Ramachandran Plot: 144.16.71.146/rp
- Water Analysis Package: 144.16.71.146/wap
- Symmetry Equivalent Molecules Generator: 144.16.71.146/sem
- PDBGOODIES: 144.16.71.146/pdbgoodies
- Geometrical Parameters for Small Molecules: 144.16.71.10/gpsm
- Measurability of Bijvoet Difference: 144.16.71.146/mbd
- Distribution of Temperature Factor: 144.16.71.146/dtf
Programmers: C.K. Rajesh, T. Kamaraj, P. Sundararajan, P. Selvarani, V. Shanthi, S. Sirajuddin Sheik

Present Programmers
S.S. Sheik, S. Das, V.G. Vijay, J.J. Lakshmi, Ch. K. Kumar, C.C. Lingam, K.S. Mohan, S.A. Fernando, S.K. Raja

Publications
- Protein Sequence Search Tool (PSST 1.1). S. Saravanan, A. Ajmul Khan & K. Sekar. Curr. Science (2000) 550–552.
- PDB Goodies: a web-based GUI to manipulate Protein Data Bank files. A.S.Z. Hussain, V. Shanthi, S.S. Sheik, J. Jeyakanthan, P. Selvarani & K. Sekar. Acta Cryst. (2002) D58, 1385–1386.
- Ramachandran Plot (RP). S. Sheik, P. Sundararajan, A.S.Z. Hussain & K. Sekar. Bioinformatics (2002) 18, 1548–1549.
- Water Analysis Package (WAP). V. Shanthi, C.K. Rajesh, J. Jayalakshmi, V.G. Vijay & K. Sekar. J. Appl. Cryst. (2003) 36, 167–168.
- CADB: Conformation Angles DataBase of proteins. S.S. Sheik, P. Ananthalakshmi, G. Ramya Bhargavi & K. Sekar. Nucl. Acids Res. (2003) 448–451.
- SSEP: Secondary Structural Elements of Proteins. V. Shanthi, P. Selvarani, Ch. K. Kumar, C.S. Mohire & K. Sekar. Nucl. Acids Res. (2003) (in press).
- SEM: Symmetry Equivalent Molecules, a web-based GUI to generate and visualize the macromolecules. A.S.Z. Hussain, V. Shanthi, Ch. K. Kumar, C.K. Rajesh, S.S. Sheik & K. Sekar. Nucl. Acids Res. (2003) (in press).
- Biomolecules Segment Display Device (BSDD). P. Selvarani, V. Shanthi, C.K. Rajesh, S. Saravanan & K. Sekar. J. Mol. Graphics & Modelling (2003) (in press).

Acknowledgements
Department of Biotechnology, Ministry of Science & Technology, Govt. of India
Jai Vigyan National Science Foundation, Govt. of India

Professor M. Vijayan, Professor N. Balakrishnan, Professor S.M. Rao, Professor S. Ramakumar, and colleagues and friends

The President of Nagoya University, Professor Takashi Yamane and other members of Biotechnology & Biomaterial Science