Unit IV: Intelligent Databases - ADVANCED DATA BASE MANAGEMENT

Dr. K. Raghava Rao
Professor of CSE
Dept. of MCA, KL University
krraocse@gmail.com
http://advdbms.blog.com
Generalized Model for Active Databases and Oracle Triggers
Triggers are executed when a specified condition occurs during an insert, delete, or update
Triggers are actions that fire automatically based on these conditions

Triggers follow an Event-Condition-Action (ECA) model
• Event:
  - Database modification
  - E.g., insert, delete, update
• Condition:
  - Any true/false expression
  - Optional: if no condition is specified, the condition is always true
• Action:
  - Sequence of SQL statements that will be automatically executed

When a new employee is added to a department, modify the
Total_sal of the department to include the new employee's
salary:
CREATE TRIGGER Total_sal1
AFTER INSERT ON Employee
FOR EACH ROW
WHEN (NEW.Dno is NOT NULL)
UPDATE DEPARTMENT
SET Total_sal = Total_sal + NEW.Salary
WHERE Dno = NEW.Dno;
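The Total_sal1 trigger can be tried out with a small sketch. This one uses Python's built-in sqlite3 module rather than Oracle, so two things differ from the slide: SQLite requires the action to be wrapped in BEGIN ... END, and the table layouts and sample rows are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schemas; the slide does not give the table definitions.
    CREATE TABLE Department (Dno INTEGER PRIMARY KEY, Total_sal REAL NOT NULL DEFAULT 0);
    CREATE TABLE Employee (Name TEXT, Salary REAL, Dno INTEGER);

    -- Same event, condition, and action as Total_sal1 on the slide.
    CREATE TRIGGER Total_sal1
    AFTER INSERT ON Employee
    FOR EACH ROW
    WHEN NEW.Dno IS NOT NULL
    BEGIN
        UPDATE Department
        SET Total_sal = Total_sal + NEW.Salary
        WHERE Dno = NEW.Dno;
    END;

    INSERT INTO Department (Dno, Total_sal) VALUES (5, 0);
""")

# Two employees join department 5; the third has no department,
# so the WHEN condition is false and the trigger does not fire.
conn.execute("INSERT INTO Employee VALUES ('Asha', 30000, 5)")
conn.execute("INSERT INTO Employee VALUES ('Ravi', 25000, 5)")
conn.execute("INSERT INTO Employee VALUES ('Temp', 10000, NULL)")

total = conn.execute("SELECT Total_sal FROM Department WHERE Dno = 5").fetchone()[0]
print(total)  # 55000.0
```

Only the two inserts with a non-NULL Dno change Total_sal, which is the CONDITION part of the ECA model at work.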

From the Total_sal1 trigger we can see that:
• We CREATE a TRIGGER and name it Total_sal1
  - This trigger will execute AFTER INSERT ON the Employee table (EVENT)
  - It will do the following FOR EACH ROW
  - WHEN NEW.Dno is NOT NULL (CONDITION)
  - The trigger will UPDATE DEPARTMENT (ACTION)
  - By SETting the new Total_sal to be the sum of the old Total_sal and NEW.Salary
  - WHERE the Dno matches NEW.Dno
CREATE TRIGGER <name>
• Creates a trigger if one does not exist
ALTER TRIGGER <name>
• Alters a trigger if one does exist
  - Works in both cases, whether a trigger exists or not
Trigger timing relative to the event can be one of the following:
AFTER
• Executes after the event
BEFORE
• Executes before the event
INSTEAD OF
• Executes instead of the event
  - Note that the event itself does not execute in this case
  - INSTEAD OF triggers are used for modifying views
 Triggers can be
• Row-level
 FOR EACH ROW specifies a row-level trigger
 Executed separately for each affected row
 NEW and OLD keywords can be used here
• Statement-level
  - Default (when FOR EACH ROW is not specified)
  - Executes once for the whole SQL statement, regardless of how many rows it affects
Condition: any true/false condition that controls whether a trigger is activated or not
• Absence of a condition means that the trigger will always execute for the event (the condition is taken as true)
• Otherwise, the condition is evaluated
  - before the event for a BEFORE trigger
  - after the event for an AFTER trigger
Action can be
• One SQL statement
• A sequence of SQL statements enclosed between a BEGIN and an END
Action specifies the relevant modifications
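As a rough illustration of an INSTEAD OF trigger whose action is a sequence of statements between BEGIN and END, the sketch below again uses Python's sqlite3 module. The view Dept5_emp, the Audit_log table, and the sample row are hypothetical names introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema for the illustration.
    CREATE TABLE Employee (Name TEXT PRIMARY KEY, Salary REAL, Dno INTEGER);
    CREATE TABLE Audit_log (Note TEXT);
    CREATE VIEW Dept5_emp AS SELECT Name, Salary FROM Employee WHERE Dno = 5;

    -- The insert on the view never executes itself; the two statements
    -- between BEGIN and END run in its place, once per inserted row.
    CREATE TRIGGER Dept5_emp_insert
    INSTEAD OF INSERT ON Dept5_emp
    FOR EACH ROW
    BEGIN
        INSERT INTO Employee (Name, Salary, Dno) VALUES (NEW.Name, NEW.Salary, 5);
        INSERT INTO Audit_log (Note) VALUES ('added ' || NEW.Name);
    END;
""")

conn.execute("INSERT INTO Dept5_emp (Name, Salary) VALUES ('Asha', 30000)")
rows = conn.execute("SELECT Name, Salary, Dno FROM Employee").fetchall()
log = conn.execute("SELECT Note FROM Audit_log").fetchall()
print(rows)  # [('Asha', 30000.0, 5)]
print(log)   # [('added Asha',)]
```

The view itself is not directly updatable; the trigger supplies the missing Dno value and routes the row to the base table.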
Design and Implementation Issues for Active Databases
An active database allows users to make the following changes to triggers (rules)
• Activate
• Deactivate
• Drop
Once an event occurs, the rule condition can be considered in three ways:
Immediate consideration:
• Part of the same transaction, and can be one of the following depending on the situation
  - Before
  - After
  - Instead of
Deferred consideration:
• Condition is evaluated at the end of the transaction
Detached consideration:
• Condition is evaluated in a separate transaction
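The three consideration modes can be sketched with a toy rule engine. This is only a simulation under stated assumptions: the Rule and Transaction classes are hypothetical, "detached" is approximated by running the rule after the commit instead of in a truly separate transaction, and the log exists only to show the ordering.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    mode: str  # "immediate", "deferred", or "detached"


@dataclass
class Transaction:
    rules: List[Rule]
    log: List[str] = field(default_factory=list)
    deferred: List[tuple] = field(default_factory=list)
    detached: List[tuple] = field(default_factory=list)

    def event(self, row: dict) -> None:
        """Signal an event; only immediate rules are considered right away."""
        self.log.append(f"event {row['id']}")
        for rule in self.rules:
            if rule.mode == "immediate":
                self._consider(rule, row)
            elif rule.mode == "deferred":
                self.deferred.append((rule, row))
            else:
                self.detached.append((rule, row))

    def _consider(self, rule: Rule, row: dict) -> None:
        # Evaluate the condition; the "action" here is just a log entry.
        if rule.condition(row):
            self.log.append(f"{rule.name} fired for {row['id']}")

    def commit(self) -> None:
        # Deferred rules are considered at the end of the transaction.
        for rule, row in self.deferred:
            self._consider(rule, row)
        self.log.append("commit")
        # Detached rules run outside the triggering transaction (simulated).
        for rule, row in self.detached:
            self._consider(rule, row)


positive_salary = lambda row: row["salary"] > 0
txn = Transaction([
    Rule("imm", positive_salary, "immediate"),
    Rule("def", positive_salary, "deferred"),
    Rule("det", positive_salary, "detached"),
])
txn.event({"id": 1, "salary": 10})
txn.commit()
print(txn.log)
# ['event 1', 'imm fired for 1', 'def fired for 1', 'commit', 'det fired for 1']
```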
Potential Applications for Active Databases
Notification
• Automatic notification when a certain condition occurs
Enforcing integrity constraints
• Triggers can express checks and corrective actions that declarative constraints cannot
Maintenance of derived data
• Automatically update derived data and avoid anomalies due to redundancy
• E.g., the trigger that updates Total_sal in the earlier example
Triggers in SQL-99
Can alias the OLD and NEW row variables inside the REFERENCING clause
Trigger examples in SQL-99
Time Representation, Calendars, and Time Dimensions
Time is considered an ordered sequence of points in some granularity
• The term chronon is used instead of point to describe the minimum granularity
A calendar organizes time into different time units for convenience
• Accommodates various calendars
  - Gregorian (western), Chinese, Islamic, etc.
Point events
• Single time point event
  - E.g., bank deposit
• A series of point events can form time series data
Duration events
• Associated with a specific time period
  - A time period is represented by a start time and an end time
Transaction time
• The time when the information from a certain transaction becomes current in the database
Bitemporal database
• A database dealing with two time dimensions (valid time and transaction time)
Incorporating Time in Relational Databases Using Tuple Versioning
 Add to every tuple
• Valid start time
• Valid end time
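A minimal sketch of tuple versioning, assuming half-open validity periods and using date.max as a stand-in for "still valid" (both are conventions chosen for this example, as are the field names vst/vet and the sample employee). An update never overwrites a tuple: it closes the current version and appends a new one.

```python
from datetime import date

NOW = date.max  # assumed marker for a version that is still valid


def update_salary(versions, name, new_salary, change_date):
    """Close the current version of `name` and append a new version."""
    for v in versions:
        if v["name"] == name and v["vet"] == NOW:
            v["vet"] = change_date  # old version ends where the new one starts
            versions.append({**v, "salary": new_salary, "vst": change_date, "vet": NOW})
            return
    raise KeyError(name)


def salary_on(versions, name, day):
    """Return the salary valid on `day`, using the period [vst, vet)."""
    for v in versions:
        if v["name"] == name and v["vst"] <= day < v["vet"]:
            return v["salary"]
    return None


versions = [{"name": "Smith", "salary": 25000, "vst": date(2002, 6, 15), "vet": NOW}]
update_salary(versions, "Smith", 30000, date(2003, 6, 1))

print(len(versions))                                   # 2
print(salary_on(versions, "Smith", date(2003, 1, 1)))  # 25000
print(salary_on(versions, "Smith", date(2004, 1, 1)))  # 30000
```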
Incorporating Time in Object-Oriented Databases Using Attribute Versioning
A single complex object stores all temporal changes of the object
Time-varying attribute
• An attribute that changes over time
  - E.g., age
Non-time-varying attribute
• An attribute that does not change over time
  - E.g., date of birth
Declarative Language
• Language to specify rules
Inference Engine (Deduction Mechanism)
• Can deduce new facts by interpreting the rules
• Related to logic programming
  - Prolog language (Prolog => Programming in logic)
  - Uses backward chaining to evaluate
  - Top-down application of the rules
Specification consists of:
• Facts
  - Similar to relation specification without the necessity of including attribute names
• Rules
  - Similar to relational views (virtual relations that are not stored)
Predicate has
• a name
• a fixed number of arguments
  - Convention: constants are numeric or character strings
  - Variables start with upper case letters
• E.g., SUPERVISE(Supervisor, Supervisee)
  - States that Supervisor SUPERVISE(s) Supervisee
 Rule
• Is of the form head :- body
  - where “:-” is read as “if” (the head holds whenever the body holds)
• E.g., SUPERIOR(X,Y) :- SUPERVISE(X,Y)
• E.g., SUBORDINATE(Y,X) :- SUPERVISE(X,Y)
 Query
• Involves a predicate symbol followed by some variable
arguments to answer the question
• E.g., SUPERIOR(james,Y)?
• E.g., SUBORDINATE(james,X)?
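The facts, rules, and queries above can be mimicked with Python sets. This is only a sketch, not a Prolog or Datalog engine: the SUPERVISE facts are invented for the illustration (only the constant james appears in the slides), and the query helper with None as an unbound variable is a hypothetical convenience.

```python
# Facts: SUPERVISE(Supervisor, Supervisee), stored as a set of tuples.
# These particular facts are assumptions made up for the example.
supervise = {("james", "franklin"), ("james", "jennifer"), ("franklin", "john")}

# Rules from the slide, evaluated as set comprehensions:
#   SUPERIOR(X,Y)    :- SUPERVISE(X,Y)
#   SUBORDINATE(Y,X) :- SUPERVISE(X,Y)
superior = {(x, y) for (x, y) in supervise}
subordinate = {(y, x) for (x, y) in supervise}


def query(relation, pattern):
    """Return the tuples matching `pattern`; None plays the role of a variable."""
    return sorted(
        t for t in relation
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


# SUPERIOR(james,Y)?
print(query(superior, ("james", None)))
# [('james', 'franklin'), ('james', 'jennifer')]

# SUBORDINATE(james,X)?  (empty here: nobody supervises james in these facts)
print(query(subordinate, ("james", None)))
# []
```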
(a) Prolog notation (b) Supervisory tree
• Program is built from atomic formulas
  - Atomic formulas have the form p(a1, a2, …, an) where
  - p is the predicate name
  - n is the number of arguments
  - Built-in predicates are included
  - E.g., <, <=, =, etc.
• A literal is either
  - An atomic formula
  - An atomic formula preceded by not
A formula can have quantifiers
• Universal
• Existential
In clausal form, a formula must be transformed into another formula with the following characteristics
• All variables are universally quantified
• The formula is made of a number of clauses, where each clause is made up of literals connected by logical ORs only
• Clauses themselves are connected by logical ANDs only
Any formula can be converted into clausal form
• A specialized case of clausal form is Horn clauses, which can contain no more than one positive literal
Datalog programs are made up of Horn clauses
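The Horn-clause restriction is easy to check mechanically. In the sketch below a clause is represented as a list of (atom, is_positive) pairs joined by OR; that representation and the function name is_horn are assumptions made for the example.

```python
def is_horn(clause):
    """A clause is Horn if it has at most one positive literal."""
    return sum(1 for _atom, positive in clause if positive) <= 1


# SUPERIOR(X,Y) :- SUPERVISE(X,Y) in clausal form is
#   SUPERIOR(X,Y) OR NOT SUPERVISE(X,Y)
rule = [("SUPERIOR(X,Y)", True), ("SUPERVISE(X,Y)", False)]

# P(X) OR Q(X) has two positive literals, so it is not a Horn clause.
disjunction = [("P(X)", True), ("Q(X)", True)]

print(is_horn(rule))         # True
print(is_horn(disjunction))  # False
```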
 There
are two main alternatives for interpreting rules:
• Proof-theoretic
• Model-theoretic
Proof-theoretic
• Facts and rules are axioms
• Ground axioms contain no variables
• Rules are deductive axioms
• Deductive axioms can be used to construct new facts from existing facts
• This process is known as theorem proving
Model-theoretic
• Given a finite or infinite domain of constant values, we assign to each predicate every possible combination of values as arguments and state whether the predicate is true or false for that combination
• If this is done for every predicate, it is called an interpretation
Model
• An interpretation under which a specific set of rules is always true
Model-theoretic proofs
• Whenever a particular substitution to the variables in the rules is applied, if all the predicates in the body are true under the interpretation, the predicate at the head of the rule must also be true
Minimal model
• Cannot change any fact from true to false and still get a model for these rules
A program is safe if it generates a finite set of facts
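One way to picture a minimal model and a safe program is a naive bottom-up evaluation that applies the rules until no new facts appear. The sketch assumes a recursive rule, SUPERIOR(X,Y) :- SUPERVISE(X,Z), SUPERIOR(Z,Y), which is not on the slides and is added here only to make the iteration interesting; the facts are likewise invented.

```python
def least_model(supervise):
    """Naive fixpoint for
         SUPERIOR(X,Y) :- SUPERVISE(X,Y)
         SUPERIOR(X,Y) :- SUPERVISE(X,Z), SUPERIOR(Z,Y)   (assumed rule)
    Returns the smallest SUPERIOR relation satisfying both rules."""
    superior = set()
    while True:
        derived = set(supervise)  # first rule
        derived |= {              # second rule: join on the shared variable Z
            (x, y)
            for (x, z) in supervise
            for (z2, y) in superior
            if z == z2
        }
        if derived <= superior:   # nothing new: fixpoint reached
            return superior
        superior |= derived


facts = {("james", "franklin"), ("franklin", "john")}
print(sorted(least_model(facts)))
# [('franklin', 'john'), ('james', 'franklin'), ('james', 'john')]
```

The loop terminates because every derived tuple is built from constants that already occur in the facts, so only finitely many tuples can ever be generated; in that sense this small program is safe.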
 Two main methods of defining truth values
• Fact-defined predicates (or relations)
  - Listing all combinations of values that make a predicate true
• Rule-defined predicates (or views)
  - Head (LHS) of one or more Datalog rules, for example SUPERIOR in SUPERIOR(X,Y) :- SUPERVISE(X,Y)
Many operations of relational algebra can be defined in the form of Datalog rules that define the result of applying these operations on database relations (fact predicates)
• Relational queries and views can be easily specified in Datalog
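As an informal illustration, selection, projection, and join can each be written as a single rule, shown here in comments with a Python set comprehension playing the role of the rule body. The EMPLOYEE and DEPARTMENT relations, their attribute order, and all the rows are hypothetical.

```python
# Fact predicates (hypothetical data): EMPLOYEE(Name, Salary, Dno), DEPARTMENT(Dno, Dname)
employee = {("asha", 30000, 5), ("ravi", 25000, 5), ("mei", 40000, 4)}
department = {(5, "research"), (4, "admin")}

# Selection:  HIGH_PAID(N,S,D) :- EMPLOYEE(N,S,D), S > 28000
high_paid = {(n, s, d) for (n, s, d) in employee if s > 28000}

# Projection: EMP_NAME(N) :- EMPLOYEE(N,S,D)
emp_name = {n for (n, s, d) in employee}

# Join:       WORKS_IN(N,Dname) :- EMPLOYEE(N,S,D), DEPARTMENT(D,Dname)
works_in = {
    (n, dname)
    for (n, s, d) in employee
    for (d2, dname) in department
    if d == d2
}

print(sorted(high_paid))  # [('asha', 30000, 5), ('mei', 40000, 4)]
print(sorted(emp_name))   # ['asha', 'mei', 'ravi']
print(sorted(works_in))   # [('asha', 'research'), ('mei', 'admin'), ('ravi', 'research')]
```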

Define an inference mechanism based on relational database
query processing concepts
• Example: predicate dependencies
Knowledge databases
KDD: A Definition
KDD is the automatic extraction of non-obvious,
hidden knowledge from large volumes of data.
10^6-10^12 bytes: we never see the whole data set, so we put it in the memory of computers.
Then run data mining algorithms.
What is the knowledge? How to represent and use it?
Data, Information, Knowledge
We often see data as a string of bits, or numbers
and symbols, or “objects” which we collect daily.
Information is data stripped of redundancy, and
reduced to the minimum necessary to characterize
the data.
Knowledge is integrated information, including facts
and their relations, which have been perceived,
discovered, or learned as our “mental pictures”.
Knowledge can be considered data at
a high level of abstraction and generalization.
From Data to Knowledge
Medical Data by Dr. Tsumoto, Tokyo Med. & Dent. Univ., 38 attributes
...
10, M, 0, 10, 10, 0, 0, 0, SUBACUTE, 37, 2, 1, 0,15,-,-, 6000, 2, 0, abnormal, abnormal,-, 2852, 2148,
712, 97, 49, F,-,multiple,,2137, negative, n, n, ABSCESS,VIRUS
12, M, 0, 5, 5, 0, 0, 0, ACUTE, 38.5, 2, 1, 0,15, -,-, 10700,4,0,normal, abnormal, +, 1080, 680, 400, 71,
59, F,-,ABPC+CZX,, 70, negative, n, n, n, BACTERIA, BACTERIA
15, M, 0, 3, 2, 3, 0, 0, ACUTE, 39.3, 3, 1, 0,15, -, -, 6000, 0,0, normal, abnormal, +, 1124, 622, 502, 47,
63, F, -,FMOX+AMK, , 48, negative, n, n, n, BACTE(E), BACTERIA
16, M, 0, 32, 32, 0, 0, 0, SUBACUTE, 38, 2, 0, 0, 15, -, +, 12600, 4, 0,abnormal, abnormal, +, 41, 39,
2, 44, 57, F, -, ABPC+CZX, ?, ? ,negative, ?, n, n, ABSCESS, VIRUS
...
[Annotations on the data: numerical attribute, categorical attribute, missing values, class labels]
IF cell_poly <= 220 AND Risk = n AND Loc_dat = + AND Nausea > 15
THEN Prediction = VIRUS [87.5%]
[confidence, predictive accuracy]
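To make the bracketed figure concrete, the sketch below computes the confidence of the IF-THEN rule as the fraction of covered records whose class label matches the prediction. The five records are invented for the illustration (they are not Dr. Tsumoto's data), and the dictionary keys are assumed names for the attributes the rule mentions.

```python
def rule_matches(r):
    """IF cell_poly <= 220 AND Risk = n AND Loc_dat = + AND Nausea > 15."""
    return (
        r["cell_poly"] <= 220
        and r["risk"] == "n"
        and r["loc_dat"] == "+"
        and r["nausea"] > 15
    )


def confidence(records, label="VIRUS"):
    """Share of records covered by the rule that carry the predicted label."""
    covered = [r for r in records if rule_matches(r)]
    if not covered:
        return None
    return sum(r["diag"] == label for r in covered) / len(covered)


# Hypothetical toy records, made up for this example only.
records = [
    {"cell_poly": 100, "risk": "n", "loc_dat": "+", "nausea": 20, "diag": "VIRUS"},
    {"cell_poly": 200, "risk": "n", "loc_dat": "+", "nausea": 16, "diag": "VIRUS"},
    {"cell_poly": 220, "risk": "n", "loc_dat": "+", "nausea": 30, "diag": "VIRUS"},
    {"cell_poly": 150, "risk": "n", "loc_dat": "+", "nausea": 18, "diag": "BACTERIA"},
    {"cell_poly": 900, "risk": "n", "loc_dat": "-", "nausea": 0, "diag": "BACTERIA"},
]

print(confidence(records))  # 0.75
```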
Data Rich Knowledge Poor
People gather and store so much data because they think
some valuable assets are implicitly coded within it.
Raw data is rarely of direct benefit.
Its true value depends on the ability to extract information
useful for decision support.
Impractical Manual Data Analysis
How to acquire knowledge for knowledge-based systems
remains the main difficult and crucial problem.
[Figure: a knowledge base and an inference engine, with "?" marking where the knowledge comes from]
Tradition: via knowledge engineers
New trend: via automatic programs
Benefits of Knowledge Discovery
[Figure: Value plotted against Volume; labels: Rapid Response, Generate, Disseminate; systems: EDP, MIS, DSS]
EDP: Electronic Data Processing
MIS: Management Information Systems
DSS: Decision Support Systems
Lecture 1: Overview of KDD
1. What is KDD and why?
2. The KDD Process
3. KDD Applications
4. Data Mining Methods
5. Challenges for KDD
The KDD process
The non-trivial process of identifying valid, novel,
potentially useful, and ultimately understandable
patterns in data - Fayyad, Piatetsky-Shapiro, Smyth (1996)
• non-trivial process: multiple process
• valid: justified patterns/models
• novel: previously unknown
• useful: can be used
• understandable: by human and machine
The Knowledge Discovery Process
1. Understand the domain and define problems
2. Collect and preprocess data
3. Data mining: extract patterns/models
4. Interpret and evaluate discovered knowledge
5. Put the results to practical use
Data mining is a step in the KDD process consisting of methods that produce useful patterns or models from the data, under some acceptable computational efficiency limitations
KDD is inherently interactive and iterative
The KDD Process
[Figure: the KDD process in five numbered stages; boxes listed in reading order]
• Data organized by function; data warehousing
• Create/select target database
• Select sampling technique and sample data
• Supply missing values
• Eliminate noisy data
• Normalize values
• Transform values
• Create derived attributes
• Find important attributes & value ranges
• Select DM task(s)
• Transform to different representation
• Select DM method(s)
• Extract knowledge
• Test knowledge
• Refine knowledge
• Query & report generation; aggregation & sequences; advanced methods
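Two of the preprocessing boxes in this figure, supplying missing values and normalizing values, can be sketched in a few lines. The slides do not say which techniques are meant, so mean imputation and min-max scaling are assumptions chosen for simplicity, and the sample temperatures are made up.

```python
def supply_missing(values):
    """Replace None with the mean of the observed values (one simple choice)."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def normalize(values):
    """Min-max scale the values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


temperatures = [37.0, None, 39.0, 38.0]  # hypothetical attribute with a gap
filled = supply_missing(temperatures)
scaled = normalize(filled)

print(filled)  # [37.0, 38.0, 39.0, 38.0]
print(scaled)  # [0.0, 0.5, 1.0, 0.5]
```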
Main Contributing Areas of KDD
Databases
• Store, access, search, update data (deduction)
• [data warehouses: integrated data]
• [OLAP: On-Line Analytical Processing]
Statistics
• Infer info from data (deduction & induction, mainly numeric data)
Machine Learning
• Computer algorithms that improve automatically through experience (mainly induction, symbolic data)
Potential Applications
Business information
- Marketing and sales data analysis
- Investment analysis
- Loan approval
- Fraud detection
- etc.
Manufacturing information
- Controlling and scheduling
- Network management
- Experiment result analysis
- etc.
Scientific information
- Sky survey cataloging
- Biosequence Databases
- Geosciences: Quakefinder
- etc.
Personal information
KDD: Opportunity and Challenges
[Figure: forces converging on KDD]
• Competitive pressure
• Data rich, knowledge poor (the resource)
• Data mining technology mature
• Enabling technology (interactive MIS, OLAP, parallel computing, Web, etc.)
KDD: A New and Fast Growing Area
KDD workshops: since 1989.
International conferences: KDD (USA), first in 1995;
PAKDD (Asia), first in 1997; PKDD (Europe), first in 1997.
ML’04/PKDD’04 (in Pisa, Italy)
Industry interests and competition: IBM, Microsoft,
Silicon Graphics, Sun, Boeing, NASA, SAS, SPSS, …
About 80% of the Fortune 500 companies are involved in
data mining projects or using data mining systems.
JAPAN: FGCS Project (logic programming and reasoning).
“Knowledge Discovery is the most desirable end-product of computing”.
Wiederhold, Stanford Univ.
Primary Tasks of Data Mining
• Classification: finding the description of several predefined classes and classifying a data item into one of them.
• Regression: mapping a data item to a real-valued prediction variable.
• Deviation and change detection: discovering the most significant changes in the data.
• Clustering: identifying a finite set of categories or clusters to describe the data.
• Dependency modeling: finding a model which describes significant dependencies between variables.
• Summarization: finding a compact description for a subset of data.
Classification
“What factors determine cancerous cells?”
[Figure: examples (cancerous cell data) are fed to a data mining algorithm, which produces general patterns]
Classification algorithms
- Rule Induction
- Decision tree
- Neural Network
Classification: Rule Induction
“What factors determine a cell is cancerous?”
If Color = light and Tails = 1 and Nuclei = 2
Then Healthy Cell (certainty = 92%)
If Color = dark and Tails = 2 and Nuclei = 2
Then Cancerous Cell (certainty = 87%)
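Once induced, rules like these are applied by simple matching. The sketch below encodes the two rules from this slide as data, assuming the certainties pair with the rules in the order listed (92% with the first, 87% with the second); the function name classify_by_rules and the dictionary keys are made up for the example.

```python
# Each rule: (conditions, predicted class, certainty).
# The certainty pairing follows the slide's listing order, which is an assumption.
RULES = [
    ({"color": "light", "tails": 1, "nuclei": 2}, "healthy", 0.92),
    ({"color": "dark", "tails": 2, "nuclei": 2}, "cancerous", 0.87),
]


def classify_by_rules(cell):
    """Return (label, certainty) of the first rule whose conditions all hold."""
    for conditions, label, certainty in RULES:
        if all(cell.get(attr) == value for attr, value in conditions.items()):
            return label, certainty
    return None  # no rule covers this cell


print(classify_by_rules({"color": "dark", "tails": 2, "nuclei": 2}))
# ('cancerous', 0.87)
print(classify_by_rules({"color": "light", "tails": 1, "nuclei": 2}))
# ('healthy', 0.92)
print(classify_by_rules({"color": "dark", "tails": 1, "nuclei": 1}))
# None
```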
Classification: Decision Trees
Color = dark
  #nuclei=1
    #tails=1: healthy
    #tails=2: cancerous
  #nuclei=2: cancerous
Color = light
  #nuclei=1: healthy
  #nuclei=2
    #tails=1: healthy
    #tails=2: cancerous
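A decision tree is applied by walking from the root to a leaf. The function below follows one reading of the tree on this slide (the original figure was flattened, so the exact branch layout is an assumption): test the color first, then the number of nuclei, then the number of tails where needed.

```python
def classify_by_tree(color, nuclei, tails):
    """Walk the (reconstructed) decision tree and return the leaf label."""
    if color == "dark":
        if nuclei == 1:
            return "healthy" if tails == 1 else "cancerous"
        return "cancerous"  # dark with 2 nuclei
    # color == "light"
    if nuclei == 1:
        return "healthy"
    return "healthy" if tails == 1 else "cancerous"  # light with 2 nuclei


print(classify_by_tree("dark", 2, 2))   # cancerous
print(classify_by_tree("light", 2, 1))  # healthy
print(classify_by_tree("dark", 1, 1))   # healthy
```

Under this reading the tree agrees with both rules on the rule-induction slide: a light cell with one tail and two nuclei is healthy, and a dark cell with two tails and two nuclei is cancerous.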
Classification: Neural Networks
“What factors determine a cell is cancerous?”
[Figure: a neural network with inputs Color = dark, # nuclei = 1, …, # tails = 2 and outputs Healthy, Cancerous]
End of Unit-iv