LECTURE 2: DATA MINING

WHAT IS DATA MINING?

DATA MINING AND DATA WAREHOUSES
• Data mining came into being as the science of databases evolved
• Database → Data warehouses → Data mining
• The process is similar to how science itself evolved
• Data mining and data analytics are among the fastest-growing disciplines worldwide, with plenty of jobs

EVOLUTION OF SCIENCES
• Before 1600: empirical science
  • Science from practical experience
  • Arab and Muslim scientists were at the forefront
• 1600-1950s: theoretical science
  • Emphasized the importance of theory or hypothesis
  • The next stage involved proving the theory empirically
• 1950s-1990s: computational science
  • Over the last 50 years, most disciplines have grown a third, computational branch (e.g., empirical, theoretical, and computational ecology, or physics, or linguistics)
  • Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
• 1990-now: data science
  • The flood of data from new scientific instruments and simulations
  • The ability to economically store and manage petabytes of data online
  • Scientific information management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes. Data mining is a major new challenge!
• Source: Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Comm. ACM, 45(11): 50-54, Nov. 2002

EVOLUTION OF DATABASE TECHNOLOGY
• 1960s:
  • Data collection, database creation, IMS and network DBMS
• 1970s:
  • Relational data model, relational DBMS implementation
• 1980s:
  • RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
  • Application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s:
  • Data mining, data warehousing, multimedia databases, and Web databases
• 2000s:
  • Stream data management and mining
  • Data mining and its applications
  • Web technology (XML, data integration) and global information systems

CLASS WORK
1. How did data mining evolve from databases?
   Ans. Database → Data warehouses → Data mining
2. Why has data science been possible since the 1990s?
   Ans. The availability of huge amounts of data in various types of databases.
3. During which years did computational science become a branch of most other sciences?
   Ans. Between the 1950s and the 1990s.
4. Arab and Muslim scientists were at the forefront of which kind of science?
   Ans. Empirical science.

DATA MINING DEFINITION
• Data mining (knowledge discovery from data)
  • Extraction of interesting patterns or knowledge from huge amounts of data
  • Data mining: a misnomer?
• Alternative names
  • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• Watch out: Is everything "data mining"? The following are not:
  • Simple search and query processing
  • (Deductive) expert systems

CLASS WORK
• Define data mining.
  Ans. Extraction of information that is implicit, previously unknown, potentially useful, and non-trivial.
• State whether each of the following is simple search (SS) or data mining (DM).
1. Retrieving the names of students from a database.
   Ans. SS
2. Finding whether a first-year student will complete their studies or not.
   Ans. DM
3. Looking up which customers buy dates.
   Ans. SS
4. Investigating whether a customer who buys dates buys milk as well.
   Ans. DM
5. Investigating which cars have been fined for speeding today.
   Ans. SS
6. Predicting the amount of fines a student will receive in a year.
   Ans. DM
7. Finding which cars compete with the Toyota Corolla in KSA.
   Ans. SS
8. Identifying which models sold by Toyota are likely to be popular in different regions of Saudi Arabia next year.
   Ans. DM
9. Finding which products were bought by a customer in a month.
   Ans. SS
10. Identifying which customers would be highly valuable.
   Ans. DM
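
The difference between simple search and data mining in the class work can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer names and baskets are made up, the function names `customers_who_buy` and `rule_stats` are hypothetical, and the support and confidence measures come from standard association-rule analysis rather than from this lecture. Looking up which customers buy dates is a plain query over stored records, whereas asking whether date buyers tend to buy milk as well requires measuring a pattern across all transactions.

```python
# Toy transaction data (made up for illustration; not from the lecture).
transactions = {
    "Ali": {"dates", "milk", "bread"},
    "Sara": {"dates", "milk"},
    "Omar": {"dates", "coffee"},
    "Huda": {"milk", "bread"},
    "Noor": {"bread", "coffee"},
}


def customers_who_buy(item, data):
    """Simple search (SS): return the customers whose basket contains `item`."""
    return sorted(name for name, basket in data.items() if item in basket)


def rule_stats(antecedent, consequent, data):
    """Data mining (DM) flavour: support and confidence of antecedent -> consequent.

    support    = fraction of all baskets containing both items
    confidence = fraction of baskets with the antecedent that also hold the consequent
    """
    total = len(data)
    with_antecedent = [b for b in data.values() if antecedent in b]
    with_both = [b for b in with_antecedent if consequent in b]
    support = len(with_both) / total if total else 0.0
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# SS: a direct lookup, the answer is already stored in the data.
date_buyers = customers_who_buy("dates", transactions)

# DM: a pattern that has to be measured across all transactions.
support, confidence = rule_stats("dates", "milk", transactions)

print(date_buyers)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data, two of the three date buyers also buy milk, so the rule "dates → milk" has a confidence of about 0.67. A real study would need far more transactions before treating such a pattern as meaningful.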