Lect2

advertisement
LECTURE 2: DATA MINING
WHAT IS DATA MINING?
2
DATA MINING AND DATA WAREHOUSES?
• It evolved in to being as the science of databases
evolved
• Database  Datawarehouses  Data Mining
• Process similar to how science evolved
• Data Mining and Data Analytics is the fastest growing
discipline worldwide with plenty of jobs
3
EVOLUTION OF SCIENCES
• Before 1600, empirical science
• Science from practical experience
• Arab and muslim scientists were at the forefront
• 1600-1950s, theoretical science
• Underpinned the importance of theory or hypothesis
• Next stage involved proving it empirically
• 1950s-1990s, computational science
• Over the last 50 years, most disciplines have grown a third,
computational branch (e.g. empirical, theoretical, and computational
ecology, or physics, or linguistics.)
• Computational Science traditionally meant simulation. It grew out of our
inability to find closed-form solutions for complex mathematical models.
4
EVOLUTION OF SCIENCES
• 1990-now, data science
• The flood of data from new scientific instruments and
simulations
• The ability to economically store and manage petabytes
of data online
• Scientific info. management, acquisition, organization,
query, and visualization tasks scale almost linearly with
data volumes. Data mining is a major new challenge!
• Jim Gray and Alex Szalay, The World Wide Telescope: An
Archetype for Online Science, Comm. ACM, 45(11): 50-54,
Nov. 2002
5
EVOLUTION OF DATABASE TECHNOLOGY
• 1960s:
• Data collection, database creation, IMS and network DBMS
• 1970s:
• Relational data model, relational DBMS implementation
• 1980s:
• RDBMS, advanced data models (extended-relational, OO,
deductive, etc.)
• Application-oriented DBMS (spatial, scientific, engineering,
etc.)
6
• 1990s:
• Data mining, data warehousing, multimedia databases, and Web
databases
• 2000s
• Stream data management and mining
• Data mining and its applications
• Web technology (XML, data integration) and global information
systems
7
CLASS WORK
1. How did data mining evolve from databases?
Ans. Database  Datawarehouses  Data Mining
2. Why is Data Science possible today i.e. 1990s onwards?
Ans. Availability of a huge amount of data in various types of
databases.
3. When did or in which years did computation science become
a branch of most other sciences?
Ans. Between 1950s and 1990s.
4. Arabs and muslim scientists are considered the founders of
which kind of sciences?
Ans. Empirical Sciences.
8
DATA MINING DEFINITION
• Data mining (knowledge discovery from data)
• Extraction of interesting patterns or knowledge from
huge amount of data
• Data mining: a misnomer?
• Alternative names
• Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, data dredging, information harvesting,
business intelligence, etc.
• Watch out: Is everything “data mining”?
• Simple search and query processing
• (Deductive) expert systems
9
CLASS WORK
•
Define what is Data Mining?
•
•
Extraction of information i.e. implicit, previously unknown, potentially useful and non-trivial.
State whether the following is simple search (SS) and/or data mining (DM).
1.
Retrieving name of students from database.
•
2.
SS
Finding whether a first year student will complete studies or not.
•
3.
DM
Looking up which customer buys dates.
•
4.
SS
Investigating whether a customer buying dates buys milk as well or not.
•
5.
DM
Investigating which cars have been fined for high speed today.
•
6.
SS
Predicting what amount of fines a student will receive in an year.
•
7.
DM
Finding which cars compete with Toyota Corolla in KSA.
•
8.
SS
Identifying which models sold by Toyota are likely to be popular in different regions of Saudia in the
next year.
•
9.
DM
Finding which products were bought by a customer in a month.
•
10.
SS
Identify which customer would be highly valuable.
•
DM
Download