DISTRIBUTED DATA
MANAGEMENT
Prof. Fabio A. Schreiber
Dipartimento di Elettronica e Informazione
Politecnico di Milano
COMPANY TELECOMMUNICATIONS
1st STAGE (< ‘80)
• TELEPHONE (POTS), TELEX AND DATA TRANSMISSION RUN
INDEPENDENTLY ON SEPARATE NETWORKS
2nd STAGE (‘80 - ‘95)
• DESIGN AND IMPLEMENTATION OF LARGE DIGITAL
COMMUNICATION NETWORKS
• DEVELOPMENT OF LOCAL NETWORKS OF PCs AND
WORKSTATIONS
3rd STAGE (> 1995)
• INTEGRATION AND MANAGEMENT OF LARGE
HETEROGENEOUS WANs AND OF LOCAL NETWORKS
Fabio A. Schreiber
Distributed I.S. 1
ISO-OSI REFERENCE MODEL
[FIGURE: TWO TERMINAL NODES, EACH WITH THE FULL STACK LISTED BELOW, CONNECTED
THROUGH A RELAY NODE. INTERFACES LINK ADJACENT LEVELS WITHIN A NODE;
PROTOCOLS LINK PEER LEVELS ACROSS NODES.]
• APPLICATION LEVEL
• PRESENTATION LEVEL
• SESSION LEVEL
• TRANSPORT LEVEL
• NETWORK LEVEL
• DATA LINK LEVEL (INCLUDING THE MEDIA ACCESS SUBLEVEL)
• PHYSICAL LEVEL
DISTRIBUTED SYSTEMS APPLICATIONS
• ACCESS TO REMOTE RESOURCES
connection to different computers and computing centers is
made possible using the same terminal (e.g. TELNET, FTP, …
PROTOCOLS)
• DISTRIBUTED COMPUTING
complex systems are built in which the application process
uses several remote computers and/or data sets, through
telecommunication networks (e.g. distributed information
systems, HPC, …)
• TELEMATIC APPLICATIONS
– electronic mail
– teleconference
– …….
DATA MANAGEMENT ON THE NETWORK
DATA MANAGEMENT ON THE NETWORK :
INDEPENDENT DATA BASES
DATA MANAGEMENT ON THE NETWORK :
COMMON COMMAND LANGUAGE
DATA MANAGEMENT ON THE NETWORK :
DISTRIBUTED DATA BASE
DDBMS
MODERN INFORMATION SYSTEMS
TELECOMMUNICATION NETWORKS BECOME AN
ESSENTIAL COMPONENT FOR A GOOD
ECONOMICAL AND FUNCTIONAL OPERATION OF
THE ORGANIZATION
THE AVAILABILITY OF EFFECTIVE
TELECOMMUNICATION SYSTEMS ALLOWS THE
DEVELOPMENT OF NEW BUSINESS TYPES
VERTICAL FUNCTION PARTITIONING OF AN I.S.
[FIGURE: THE EDP CENTER AND ONE LAN PER FUNCTION ARE INTERCONNECTED THROUGH
GATEWAYS AND A BACKBONE (FDDI OR MAN 802.6); A PABX AND ISDN ARE ALSO ATTACHED.]
• TECHNICAL DEPT.: 802.5
• FACTORY: 802.4
• WAREHOUSE: 802.3
• ADMINISTRATIVE DEPT.: 802.3
HORIZONTAL FUNCTION PARTITIONING OF AN I.S.
[FIGURE: THE EDP CENTER AND FOUR ADMINISTRATIVE DEPT. LANs (802.3) ARE
INTERCONNECTED THROUGH GATEWAYS AND A BACKBONE (FDDI OR MAN 802.6);
A PABX AND ISDN ARE ALSO ATTACHED.]
DATA MANAGEMENT
1st STAGE (< ‘70)
• SPARSE FILES
2nd STAGE ( ‘70 - ‘90)
• LARGE CENTRALIZED DATA BASES
3rd STAGE (> 1990)
• DISTRIBUTED DATA MANAGEMENT
SYSTEM ARCHITECTURE:
A DIALECTIC PROCESS
• THE THESIS: A MAINFRAME IS THE BEST!
• THE ANTITHESIS: 10, 100, 1000 DECENTRALIZED MINICOMPUTERS
• THE SYNTHESIS: DISTRIBUTED INFORMATICS FOR MAXIMAL FLEXIBILITY
DISTRIBUTED DATA MANAGEMENT:
FUNCTIONAL GOALS
• AVAILABILITY
• LOAD SHARING
• RESOURCE SHARING
• QUALITY OF SERVICE TO THE USER
FUNCTIONAL GOALS: AVAILABILITY
• REDUNDANT HW/SW RESOURCES CAN BE USED TO OBTAIN
HIGHER OVERALL SYSTEM AVAILABILITY
- FAULT TOLERANT SYSTEMS
- SOFT DEGRADATION SYSTEMS
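The gain from redundancy can be illustrated with a small calculation. This is a minimal sketch, assuming n replicas that fail independently of each other, each one available with probability a, and a system that is up as long as at least one replica is up. The independence assumption and the function name are illustrative and are not taken from the slides.

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant replicas, assuming independent failures.

    The system is unavailable only when all n replicas are down together,
    which happens with probability (1 - a) ** n.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be a probability between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - a) ** n


if __name__ == "__main__":
    # Each extra replica removes most of the remaining downtime.
    for replicas in (1, 2, 3):
        print(replicas, round(parallel_availability(0.9, replicas), 6))
```

Under these assumptions, two replicas that are each available 90% of the time give roughly 99% overall availability. Correlated failures (shared power, shared network) would reduce the benefit.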
FUNCTIONAL GOALS: LOAD SHARING
IT ALLOWS BALANCED DEVELOPMENT OF RESOURCES
[FIGURE: TWO WORKLOAD CURVES, BOTH LABELLED WKL1, WITH TIME AXES MONTH AND HOUR]
DISTRIBUTED DATA MANAGEMENT:
FUNCTIONAL GOALS
• RESOURCE SHARING
– SPECIALIZED OR UNIQUE RESOURCES CAN BE SHARED FROM
ANY NODE
• QUALITY OF SERVICE IMPROVEMENT
– LOCAL PROCESSING CAPABILITIES
– RESPONSE TIME REDUCTION
– USER FRIENDLY INTERFACE
DISTRIBUTED DATA MANAGEMENT:
ECONOMICAL AND ORGANIZATIONAL FACTORS
• THE PROS
– LOCAL SYSTEMS EFFECTIVENESS ALLOWS A TIGHTER
CONNECTION BETWEEN USERS AND SYSTEM
– DISTRIBUTION OF THE “POWER” (ORGANIZATIONAL,
PSYCHOLOGICAL, POLITICAL, …) ASSOCIATED WITH
INFORMATION OWNERSHIP
– ORGANIZATIONAL ACTIVITIES INTEGRATION IN
GEOGRAPHICALLY DISTRIBUTED COMPANIES
– COST/BENEFIT ???
• THE CONS
– THE NEED FOR GENERAL COORDINATION AND COLLABORATION
REQUIRES A “CULTURAL ATTITUDE” THAT IS NOT EASY TO FIND
– COST/BENEFIT???
DISTRIBUTED SYSTEMS
SYSTEMS IN WHICH MESSAGE TRANSMISSION
TIME IS NOT NEGLIGIBLE WITH RESPECT TO
THE TIME BETWEEN TWO EVENTS IN A SINGLE
PROCESS
DISTRIBUTED DATA MANAGEMENT:
ARCHITECTURES
• NARROW BANDWIDTH, LOOSELY COUPLED SYSTEMS
– Hw/Sw ARCHITECTURE: GEOGRAPHICALLY DISTRIBUTED COMPUTER NETWORKS
– DATA MANAGEMENT: DISTRIBUTED DATA BASE MANAGEMENT SYSTEM (DDBMS)
• WIDE BANDWIDTH, LOOSELY COUPLED SYSTEMS
– Hw/Sw ARCHITECTURE: LOCAL NETWORKS, FUNCTIONALLY DISTRIBUTED SYSTEMS
– DATA MANAGEMENT: BACK-END PROCESSOR
• WIDE BANDWIDTH, TIGHTLY COUPLED SYSTEMS
– Hw/Sw ARCHITECTURE: MULTIPROCESSOR SYSTEMS, ASSOCIATIVE MEMORIES, ...
– DATA MANAGEMENT: DATABASE MACHINE
DDSS SPACE
(DISTRIBUTED DATA SHARING SYSTEMS)
[FIGURE: A SPACE WITH THREE AXES: DISTRIBUTION, AUTONOMY, HETEROGENEITY.
THE LABELLED POINTS ARE LISTED BELOW.]
• HOMOGENEOUS INTEGRATED DBMS
• HOMOGENEOUS DISTRIBUTED DBMS
• HETEROGENEOUS INTEGRATED DBMS
• HETEROGENEOUS DISTRIBUTED DBMS
• FEDERATED DISTRIBUTED DBMS
• HETEROGENEOUS FEDERATED LOCAL DBMS
• HETEROGENEOUS FEDERATED DISTRIBUTED DBMS
• MULTIDATABASE SYSTEMS
• MULTIDATABASE DISTRIBUTED SYSTEMS
• HETEROGENEOUS MULTIDATABASE SYSTEMS
• HETEROGENEOUS DISTRIBUTED MULTIDATABASE SYSTEMS
DDSS TAXONOMY
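FOR EACH SYSTEM TYPE: WHAT THE GLOBAL SYSTEM HAS ACCESS TO, WHAT THE TYPICAL
LOCAL NODES ARE, WHETHER GLOBAL DB FUNCTIONALITY IS OFFERED (Y/N), AND HOW
GLOBAL INFORMATION IS DEALT WITH
• DISTRIBUTED DATA BASE
– HAS ACCESS TO: DBMS INTERNAL FUNCTIONS
– LOCAL NODES: HOMOGENEOUS DATA BASES
– GLOBAL DB FUNCTIONALITY: Y
– GLOBAL INFORMATION: NAME SPACE AND GLOBAL SCHEMA
• FEDERATED DATA BASE WITH A GLOBAL SCHEMA
– HAS ACCESS TO: DBMS USER INTERFACE
– LOCAL NODES: HETEROGENEOUS DATA BASES
– GLOBAL DB FUNCTIONALITY: Y
– GLOBAL INFORMATION: GLOBAL SCHEMA
• FEDERATED DATA BASE
– HAS ACCESS TO: DBMS USER INTERFACE
– LOCAL NODES: HETEROGENEOUS DATA BASES
– GLOBAL DB FUNCTIONALITY: Y
– GLOBAL INFORMATION: PARTIAL GLOBAL SCHEMATA
• MULTIDATABASE WITH A SYSTEM LANGUAGE
– HAS ACCESS TO: DBMS USER INTERFACE
– LOCAL NODES: HETEROGENEOUS DATA BASES
– GLOBAL DB FUNCTIONALITY: Y
– GLOBAL INFORMATION: ACCESS FUNCTIONS IN THE LANGUAGE
• HOMOGENEOUS MULTIDATABASE WITH A SYSTEM LANGUAGE
– HAS ACCESS TO: DBMS USER INTERFACE AND SOME INTERNAL FUNCTIONS
– LOCAL NODES: HOMOGENEOUS DATA BASES
– GLOBAL DB FUNCTIONALITY: Y
– GLOBAL INFORMATION: ACCESS FUNCTIONS IN THE LANGUAGE
• INTEROPERABLE SYSTEMS
– HAS ACCESS TO: APPLICATION PROGRAMS ABOVE DBMS
– LOCAL NODES: ANY DATA SOURCE SATISFYING THE INTERFACE PROTOCOL
– GLOBAL DB FUNCTIONALITY: N
– GLOBAL INFORMATION: DATA EXCHANGE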
DISTRIBUTED DATA MANAGEMENT
• CLIENT-SERVER ARCHITECTURE
– MANY CLIENTS REFER TO A SINGLE SERVER
– MAINLY USED FOR OLTP ON LOCAL NETWORKS
• DISTRIBUTED DATABASE
– MANY CLIENTS REFER TO MANY SERVERS
– MAINLY USED FOR OLTP ON LOCAL AND WIDE AREA
NETWORKS
• DATA WAREHOUSE
– DATA COLLECTION FROM MANY DIFFERENT DATA
SOURCES
– USED IN DSS ON LOCAL AND WIDE AREA NETWORKS
DDSS IN A CLIENT-SERVER ARCHITECTURE
• THE CLIENT
– IS MANAGED BY THE APPLICATION PROGRAMMER
– SHOWS A FRIENDLY INTERFACE TO THE FINAL USER
– USES EITHER STATIC OR DYNAMIC SQL
• THE SERVER
– IS MANAGED BY THE DATABASE ADMINISTRATOR
– ITS SIZE DEPENDS ON THE WORKLOAD AND ON THE
SERVICES TO BE DELIVERED
– MANAGES THE OPTIMIZATION PROCEDURES
DDSS IN A CLIENT-SERVER ARCHITECTURE
USING DDBMS PRIMITIVES
[FIGURE: AT THE CLIENT SITE THE APPLICATION PROGRAM ISSUES DATABASE ACCESS
PRIMITIVES TO DDBMS1; DDBMS2 AT THE SERVER SITE ACCESSES THE DATABASE AND
RETURNS THE RESULTS.]
DDSS IN A CLIENT-SERVER ARCHITECTURE
USING AUXILIARY PROGRAMS AND RPC
[FIGURE: AT THE CLIENT SITE THE APPLICATION PROGRAM, RUNNING ON DBMS1, CALLS
VIA RPC AN AUXILIARY PROGRAM AT THE SERVER SITE; THE AUXILIARY PROGRAM ISSUES
DATABASE ACCESS PRIMITIVES TO DBMS2, WHICH ACCESSES THE DATABASE, AND THE
RESULTS ARE SENT BACK.]
DDSS IN A CLIENT-SERVER ARCHITECTURE
• USING DDBMS PRIMITIVES
– THE DDBMS LOCAL COMPONENT ROUTES THE QUERY TO
THE SERVER WHICH ACCESSES THE DATABASE AND
SENDS BACK THE RESULTS
– HIGH DISTRIBUTION TRANSPARENCY THANKS TO GLOBAL
FILE NAMES
– LOW EFFICIENCY SINCE THE ANSWER TRAVELS ONE TUPLE
AT A TIME
• USING AUXILIARY PROGRAMS AND RPC
– THE APPLICATION ASKS THE AUXILIARY PROGRAM TO
EXECUTE ON THE SERVER AND TO SEND BACK THE
RESULT
– THE AUXILIARY PROGRAM ASSEMBLES TUPLES INTO
RESULT SETS IMPROVING TRANSMISSION EFFICIENCY
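The efficiency difference between the two approaches can be sketched by counting the messages needed to ship one answer. This is only an illustration: the network is simulated by Python generators, and the function names, the batch size of 100 and the 250-tuple answer are assumptions, not part of the slides.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]


def send_tuple_at_a_time(rows: Iterable[Row]) -> Iterator[List[Row]]:
    """DDBMS-primitive style: every tuple travels in its own message."""
    for row in rows:
        yield [row]


def send_result_sets(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Auxiliary-program style: tuples are assembled into result sets."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the last, possibly partial, result set
        yield batch


def count_messages(messages: Iterable[List[Row]]) -> int:
    """Number of network messages needed to deliver the whole answer."""
    return sum(1 for _ in messages)


# Hypothetical answer of 250 tuples.
answer = [(i, f"supplier-{i}") for i in range(1, 251)]
print(count_messages(send_tuple_at_a_time(answer)))   # 250 messages
print(count_messages(send_result_sets(answer, 100)))  # 3 messages
```

Each message carries a fixed transmission overhead, so fewer and larger messages reduce the total cost even though the same tuples are delivered.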
DATA WAREHOUSE (DW)
• A TECHNIQUE FOR CORRECTLY ASSEMBLING AND MANAGING
DATA COMING FROM DIFFERENT SOURCES TO OBTAIN A
DETAILED VIEW OF AN ECONOMIC SYSTEM
• IT IS AN
– INTEGRATED
– PERMANENT
– TIME VARIANT
– TOPIC ORIENTED
DATA COLLECTION TO SUPPORT MANAGERIAL DECISIONS
• IT IS THE SEPARATION ELEMENT BETWEEN OLTP AND DSS
WORKLOADS
DW ARCHITECTURE
[FIGURE: DATA FLOW FROM THE SOURCES TO THE USERS]
• SOURCES: DATABASE 1, DATABASE 2, LEGACY DATABASE, SPARSE FILES
• WAREHOUSE CONSTRUCTION
• COMPANY WAREHOUSE
• REPLICATION AND PROPAGATION
• DATA MARTS
• INFORMATION ACCESS AND MANAGEMENT; KNOWLEDGE DISCOVERY AND DATA MINING
1 + 1 = 3 ??
MAIN PROBLEMS IN A DW
• VIEW AND METADATA MAINTENANCE
• REPLICATION MANAGEMENT
• CONSISTENCY MANAGEMENT
• APPLICATIONS IMPLEMENTATION
A DISTRIBUTED DATA BASE
IS A SET OF FILES, STORED IN DIFFERENT
NODES OF A DISTRIBUTED SYSTEM, WHICH
ARE LOGICALLY CORRELATED WITH
FUNCTIONAL RELATIONSHIPS OR WHICH ARE
REPLICAS OF THE SAME FILE, IN SUCH A WAY
AS TO CONSTITUTE A SINGLE DATA
COLLECTION
A DISTRIBUTED DATA BASE ...
• IS A DATA BASE
– AN INTEGRATED ACCESS MODE TO DATA MUST EXIST
– IT MUST BE PROTECTED AGAINST INCONSISTENCIES AND
FAILURES IN SUCH A WAY AS TO GUARANTEE DATA
INTEGRITY
• IS DISTRIBUTED
– PHYSICAL DATA DISTRIBUTION MUST BE TRANSPARENT TO
THE END USER
SOME DESIGN PROBLEMS
• GENERAL SYSTEM ARCHITECTURE
– DESIGN FROM SCRATCH
– SYSTEM RESTRUCTURING
– SYSTEM AND DATA HETEROGENEITY
• LOGICAL RELATIONS FRAGMENTATION
• REPLICATION AND ALLOCATION
– HOW MANY COPIES AND WHERE
• ACCESS TO AND PROCESSING OF RELATIONS
• INTEGRITY AND PRIVACY
• RELIABILITY
DATA INDEPENDENCE
THE NOTION OF DATA INDEPENDENCE MUST BE EXTENDED
TO ENCOMPASS THE FOLLOWING CASES
• LOGICAL
THE DB ADMINISTRATOR NEEDS TO RESHAPE THE GLOBAL
SCHEMA IN ORDER TO MEET THE REQUIREMENTS OF A VERY
LARGE, HETEROGENEOUS AND DYNAMIC SET OF USERS
• PHYSICAL
IMMUNITY TO (DYNAMIC) NETWORK CONFIGURATION
CHANGES (SITES CONNECTION/DISCONNECTION)
INTEGRATED DDBMS
[FIGURE: SCHEMA LEVELS, TOP TO BOTTOM]
SITE INDEPENDENT (DDBMS)
• EXTERNAL SCHEMAS (SE1, SE2)
• GLOBAL CONCEPTUAL SCHEMA
• FRAGMENTATION SCHEMA
• LOCATION SCHEMA
PER SITE (LDBMS)
• LOCAL MAPPING SCHEMAS (SML 1, SML 2, SML m)
• LOCAL CONCEPTUAL SCHEMAS (SCL 1, SCL 2, SCL m)
• LOCAL INTERNAL SCHEMAS (SIL 1, SIL 2, SIL m)
FEDERATED DDSS
[FIGURE: SCHEMA LEVELS, TOP TO BOTTOM]
SITE INDEPENDENT (DDBMS)
• EXTERNAL SCHEMAS (SE1, SE2, SEn)
• GLOBAL CONCEPTUAL SCHEMA
• FRAGMENTATION SCHEMA
• LOCATION SCHEMA
PER SITE (LDBMS)
• SEL1
• LOCAL MAPPING SCHEMAS (SML 1, SML 2, SML m)
• LOCAL CONCEPTUAL SCHEMAS (SCL 1, SCL 2, SCL m)
• LOCAL INTERNAL SCHEMAS (SIL 1, SIL 2, SIL m)
GLOBAL SCHEMA DESIGN
• SIMILAR TO THE VIEW INTEGRATION PROBLEM
• STRUCTURAL CONFLICTS (different schemas)
• SEMANTIC CONFLICTS (even with similar schemas)
– HOMONYMY (same name, different meaning)
– SYNONYMY (different name, same meaning)
– FORMAT (NUMERIC VS. ALPHANUMERIC, ...)
– AGGREGATIONS (PART-OF, …)
– INTRINSIC DATA SEMANTICS (DIFFERENT MEASUREMENT UNITS, DIFFERENT
GRANULARITY, DIFFERENT JUDGEMENT, …)
• UPDATING LIMITED TO LOCAL ACTIVITY
SEMANTIC CONFLICTS SOLUTION
• LOCAL DATA BASE RESTRUCTURING (NOT
FEASIBLE)
• EXPLICIT ADDITION TO THE SCHEMAS OF
SEMANTIC INFORMATION ABOUT DATA, TO
ALLOW APPLICATION PROGRAMS TO BEHAVE
ACCORDINGLY
LOGICAL RELATIONS FRAGMENTATION
• HORIZONTAL
– ALL THE FRAGMENTS SHARE THE SAME SCHEMA
– TUPLES BELONG TO FRAGMENTS ACCORDING TO A
SELECTION PREDICATE CORRESPONDING TO A
DISTRIBUTION CRITERION
• VERTICAL
– EACH FRAGMENT SCHEMA IS A PROJECTION OF THE
GLOBAL RELATION SCHEMA
• SCHEMAS WITH A NON-EMPTY INTERSECTION
• DISJOINT SCHEMAS
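Both kinds of fragmentation can be sketched on a toy relation. The example below is a minimal illustration: the SUPPLIER relation with attributes SNUM, NAME and CITY, its contents, and the helper names are all hypothetical. Horizontal fragments are produced by a selection predicate and recombined by union; vertical fragments are projections that both keep the key SNUM (a non-empty schema intersection) so that a join recovers the global relation.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]

# Hypothetical global relation SUPPLIER(SNUM, NAME, CITY).
SUPPLIER: List[Row] = [
    {"SNUM": 1, "NAME": "Rossi", "CITY": "MILANO"},
    {"SNUM": 2, "NAME": "Bianchi", "CITY": "ROMA"},
    {"SNUM": 3, "NAME": "Verdi", "CITY": "MILANO"},
]


def horizontal_fragment(relation: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Select the tuples that satisfy the distribution predicate."""
    return [dict(row) for row in relation if predicate(row)]


def vertical_fragment(relation: List[Row], attributes: Sequence[str]) -> List[Row]:
    """Project every tuple onto a subset of the global schema."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on(key: str, left: List[Row], right: List[Row]) -> List[Row]:
    """Rebuild tuples from two vertical fragments that share the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


# Horizontal: same schema, tuples split by a predicate on CITY.
supplier_1 = horizontal_fragment(SUPPLIER, lambda r: r["CITY"] == "MILANO")
supplier_2 = horizontal_fragment(SUPPLIER, lambda r: r["CITY"] != "MILANO")
rebuilt_h = sorted(supplier_1 + supplier_2, key=lambda r: r["SNUM"])

# Vertical: two projections that overlap on the key SNUM.
names = vertical_fragment(SUPPLIER, ["SNUM", "NAME"])
cities = vertical_fragment(SUPPLIER, ["SNUM", "CITY"])
rebuilt_v = join_on("SNUM", names, cities)

print(rebuilt_h == SUPPLIER, rebuilt_v == SUPPLIER)
```

With disjoint vertical schemas there is no shared key to join on, so reconstruction would have to rely on something else, for example tuple identifiers.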
LOGICAL RELATIONS FRAGMENTATION
[FIGURE: A RELATION SHOWN AS THE COMBINATION OF TWO PARTS IN THREE CASES:
HORIZONTAL, VERTICAL, DUPLICATION]
DATA REPLICATION
• PERMANENCE
– A COPY OF A DATA ELEMENT (RELATION OR FRAGMENT) IS
PERMANENT IF IT EVOLVES IN TIME UNDER THE DDBMS
MANAGEMENT
– IT IS TEMPORARY IF IT IS CREATED ONLY FOR SOME
SPECIFIC OPERATION (IN A WORK AREA) AND THEN IT IS
CANCELLED OR REFRESHED, ON DEMAND, FROM THE
MASTER COPY
DATA REPLICATION
• CONSISTENCY
– STRONG CONSISTENCY
AT EVERY INSTANT EACH COPY OF EACH DATA
ELEMENT MUST HAVE IDENTICAL VALUES
– WEAK CONSISTENCY
UPDATES MADE ON A COPY ARE PROPAGATED TO THE
OTHER COPIES LATER ON
– INDEPENDENCE
UPDATES ON DIFFERENT COPIES ARE UNCORRELATED
(OFTEN USED WITH TEMPORARY COPIES)
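The difference between strong and weak consistency can be sketched with a toy replicated item. This is a single-process illustration only: the class and method names are hypothetical, and a real DDBMS would need a commit protocol across sites to make the strong update atomic.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class ReplicatedItem:
    """One data element kept in several copies, one per site."""

    def __init__(self, sites: List[str], value: int) -> None:
        self.copies: Dict[str, int] = {site: value for site in sites}
        # Updates applied to one copy and not yet sent to the others.
        self.pending: Deque[Tuple[str, int]] = deque()

    def write_strong(self, value: int) -> None:
        """Strong consistency: every copy changes as part of the write."""
        for site in self.copies:
            self.copies[site] = value

    def write_weak(self, site: str, value: int) -> None:
        """Weak consistency: update one copy now, the others later on."""
        self.copies[site] = value
        self.pending.append((site, value))

    def propagate(self) -> None:
        """Deferred propagation of queued updates to the other copies."""
        while self.pending:
            origin, value = self.pending.popleft()
            for site in self.copies:
                if site != origin:
                    self.copies[site] = value

    def is_consistent(self) -> bool:
        return len(set(self.copies.values())) == 1


item = ReplicatedItem(["SITE 1", "SITE 2", "SITE 3"], 10)
item.write_weak("SITE 1", 20)
print(item.is_consistent())  # False: SITE 2 and SITE 3 still hold 10
item.propagate()
print(item.is_consistent())  # True: all copies now hold 20
```

Between `write_weak` and `propagate`, a reader at another site sees the old value; that window is the price paid for not blocking the writer.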
FRAGMENTATION AND REPLICATION
[FIGURE: THE GLOBAL RELATION R IS SPLIT INTO THE FRAGMENTS R1, R2, R3, R4;
COPIES OF THE FRAGMENTS ARE STORED AS PHYSICAL IMAGES AT SITE 1, SITE 2 AND
SITE 3, AND FRAGMENT R2 APPEARS AT MORE THAN ONE SITE.]
TRANSPARENCY LEVELS
• TO FRAGMENTATION
– APPLICATION PROGRAMS REFER TO GLOBAL RELATIONS
AND IGNORE FRAGMENTATION
• TO LOCATION
– APPLICATION PROGRAMS ARE INDEPENDENT OF
REPLICATION AND OF PHYSICAL DATA LOCATION, BUT
THEY PERCEIVE THE CHANGES IN THE FRAGMENTATION
SCHEMA
• TO LOCAL MAPPING
– APPLICATION PROGRAMS USE THE GLOBAL NAMES OF THE
OBJECTS (DATA FRAGMENTS OR ACCESS PRIMITIVES),
BUT THEY MUST SPECIFY THE LOCATION SITE
• NO TRANSPARENCY
– THE APPLICATION PROGRAMMER MUST WRITE AN
ACCESS MODULE FOR EACH LOCAL DBMS; THE DDBMS ONLY
ACTIVATES THESE REMOTE MODULES
TRANSPARENCY LEVELS: QUERIES
fragmentation transparency
read (terminal, $SNUM);
select NAME into $NAME
from SUPPLIER
where SNUM = $SNUM;
write (terminal, $NAME)
[FIGURE: THE DDBMS RESOLVES SUPPLIER INTO FRAGMENT SUPPLIER 1 AT SITE 1 AND
FRAGMENT SUPPLIER 2, STORED AT BOTH SITE 2 AND SITE 3]
TRANSPARENCY LEVELS: QUERIES
location transparency
read (terminal, $SNUM);
select NAME into $NAME
from SUPPLIER_1
where SNUM = $SNUM;
if not #FOUND then
select NAME into $NAME
from SUPPLIER_2
where SNUM = $SNUM;
write (terminal, $NAME)
[FIGURE: THE DDBMS FINDS SUPPLIER 1 AT SITE 1 AND SUPPLIER 2 AT SITE 2 OR
SITE 3]
TRANSPARENCY LEVELS: QUERIES
local mapping transparency
read (terminal, $SNUM);
select NAME into $NAME
from SUPPLIER_1 AT SITE_1
where SNUM = $SNUM;
if not #FOUND then
select NAME into $NAME
from SUPPLIER_2 AT SITE_3
where SNUM = $SNUM;
write (terminal, $NAME)
[FIGURE: THE DDBMS AND THE SUPPLIER FRAGMENTS STORED AT SITE 1, SITE 2 AND
SITE 3]
TRANSPARENCY LEVELS: QUERIES
no transparency
SUPPINQRY:
read (terminal, $SNUM);
execute $SUPIMS ($SNUM, $FOUND, $NAME) at site 1;
if not $FOUND then
execute $SUPCODASYL ($SNUM, $FOUND, $NAME) at site 3;
write (terminal, $NAME)
SUPIMS (SNUM, FOUND, NAME)
get unique SUPPLIER_SEGMENT
…………….
SUPCODASYL (SNUM, FOUND, NAME)
find SUPPLIER_RECORD
……………..
[FIGURE: THE DDBMS ACTIVATES SUPIMS ON THE LOCAL DBMS (IMS) WITH ITS IMS
DATABASE, AND SUPCODASYL ON THE LOCAL DBMS (CODASYL) WITH ITS CODASYL
DATABASE]
TRANSPARENCY LEVELS: UPDATES
• FOR FRAGMENTATION TRANSPARENCY
– IF AN ATTRIBUTE BELONGING TO AN HORIZONTAL
FRAGMENTATION PREDICATE IS UPDATED THE TUPLE
SHALL BE MOVED BETWEEN THE FRAGMENTS
• FOR LOCATION AND REPLICATION TRANSPARENCY
– ALL THE COPIES MUST BE UPDATED SIMULTANEOUSLY
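The first point can be sketched as follows. This is a minimal illustration, assuming a SUPPLIER relation horizontally fragmented on a hypothetical CITY attribute; the fragment names, predicates and data are invented for the example. Updating CITY through the global relation may make the tuple violate the predicate of its current fragment, so it has to be moved.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical horizontal fragmentation of SUPPLIER on the CITY attribute.
PREDICATES: Dict[str, Callable[[Row], bool]] = {
    "SUPPLIER_1": lambda row: row["CITY"] == "MILANO",
    "SUPPLIER_2": lambda row: row["CITY"] != "MILANO",
}

fragments: Dict[str, List[Row]] = {
    "SUPPLIER_1": [{"SNUM": 1, "NAME": "Rossi", "CITY": "MILANO"}],
    "SUPPLIER_2": [{"SNUM": 2, "NAME": "Bianchi", "CITY": "ROMA"}],
}


def update_global(snum: int, attribute: str, value: object) -> None:
    """Update SUPPLIER as a global relation; move the tuple if needed."""
    for name, rows in fragments.items():
        for row in rows:
            if row["SNUM"] == snum:
                row[attribute] = value
                if not PREDICATES[name](row):
                    # The tuple no longer belongs here: migrate it.
                    rows.remove(row)
                    target = next(n for n, p in PREDICATES.items() if p(row))
                    fragments[target].append(row)
                return
    raise KeyError(f"no supplier with SNUM={snum}")


update_global(1, "CITY", "ROMA")
print([row["SNUM"] for row in fragments["SUPPLIER_1"]])  # []
print([row["SNUM"] for row in fragments["SUPPLIER_2"]])  # [2, 1]
```

If the fragments are also replicated, the delete at the old fragment and the insert at the new one must each reach every copy, which is the second point on the slide.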
MULTIDATABASE
[FIGURE: SCHEMA LEVELS, TOP TO BOTTOM]
• EXTERNAL SCHEMAS (SE1, SE2, SEn)
• NETWORK CATALOG
PER SITE (LDBMS)
• LOCAL CONCEPTUAL SCHEMAS (SCL 1, SCL 2, SCL m)
• LOCAL INTERNAL SCHEMAS (SIL 1, SIL 2, SIL m)
MULTIBASE
[FIGURE: MULTIBASE ARCHITECTURE]
• THE USER SENDS A GLOBAL QUERY TO THE GLOBAL DATA MANAGER AND RECEIVES
THE QUERY RESULT DATA
• THE GLOBAL DATA MANAGER USES THE GLOBAL SCHEMA (DAPLEX), THE AUXILIARY
SCHEMA (DAPLEX) AND THE AUXILIARY DATABASE
• EACH LOCAL DATABASE INTERFACE RELATES A LOCAL SCHEMA (DAPLEX) TO THE
LOCAL HOST SCHEMA AND DATA MANAGED BY A HOST DBMS
MEDIATORS
A MEDIATOR IS A SOFTWARE MODULE (AT ISO-OSI LEVEL 7)
WHICH USES IMPLICIT KNOWLEDGE OF SOME DATA SETS
OR SUBSETS TO CREATE KNOWLEDGE FOR A HIGHER
APPLICATION LAYER (WIEDERHOLD)
ITS MAIN FUNCTION IS OBJECT FUSION
– TO GROUP INFORMATION ABOUT THE SAME ENTITY OF THE
REAL WORLD
– TO REMOVE REDUNDANCIES AMONG DIFFERENT SOURCES
– TO SOLVE INCONSISTENCIES AMONG DIFFERENT SOURCES
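Object fusion can be sketched with two small sources that describe overlapping companies. The data, the attribute names and the conflict policy (the source listed first wins) are assumptions made for this example; a real mediator would apply domain knowledge to decide which value to trust.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Two hypothetical sources describing overlapping real-world entities.
source_a: List[Record] = [
    {"company": "IBM", "country": "USA", "sector": None},
]
source_b: List[Record] = [
    {"company": "IBM", "country": "US", "sector": "IT"},
    {"company": "NTT", "country": "JPN", "sector": "TELECOM"},
]


def fuse(sources: List[List[Record]], key: str) -> Dict[str, Record]:
    """Merge records that refer to the same entity, identified by `key`.

    Redundant values are stored once; when two sources disagree, the
    value from the source listed first is kept (an assumed policy).
    """
    fused: Dict[str, Record] = {}
    for source in sources:
        for record in source:
            entity = fused.setdefault(str(record[key]), {})
            for attribute, value in record.items():
                if value is not None:
                    entity.setdefault(attribute, value)
    return fused


result = fuse([source_a, source_b], key="company")
print(result["IBM"])
print(result["NTT"])
```

Here the IBM entity takes its country from the first source and its sector from the second, so the fused record is more complete than either input.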
MEDIATORS
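[FIGURE: THE MEDIATOR SITS BETWEEN THE USER WORKSTATION AND THE DATA SOURCE.
THE WORKSTATION SENDS QUERIES AND INSPECTIONS AND RECEIVES USEFUL ANSWERS;
THE MEDIATOR SENDS FORMATTED QUERIES TO THE DATA SOURCE AND RECEIVES RAW
ANSWERS AND TRIGGER EVENTS. AN ABSTRACTION MECHANISM TURNS DATA INTO
KNOWLEDGE.]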
WRAPPER
WRAPPERS TRANSLATE QUERIES INTO ONE OR MORE COMMANDS/
QUERIES UNDERSTANDABLE BY THE SPECIFIC SOURCE
THEY CAN EXTEND THE QUERYING POWER OF A SPECIFIC
SOURCE
THEY CONVERT NATIVE FORMAT RESULTS TO A FORMAT
UNDERSTANDABLE BY THE APPLICATION
THEIR WRITING INVOLVES A LARGE IMPLEMENTATION
EFFORT, BUT SOME TOOLS CAN EASE THE TASK (e.g. THE
TSIMMIS TOOLKIT)
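The three wrapper tasks can be sketched around a very limited source. This is an illustration, not the TSIMMIS interface: the class, the semicolon-separated text source and its single "full scan" command are all hypothetical. The wrapper translates a query into the one native command, adds the filtering the source cannot do, and converts the native rows into dictionaries for the application.

```python
import csv
import io
from typing import Dict, List

# Hypothetical native content of a source that can only be read as a whole.
NATIVE_DATA = "snum;name;city\n1;Rossi;MILANO\n2;Bianchi;ROMA\n"


class CsvWrapper:
    """Wrapper around a source whose only native command is a full scan."""

    def __init__(self, text: str) -> None:
        self._text = text

    def _native_scan(self) -> List[List[str]]:
        """The only command the source understands: return every line."""
        return list(csv.reader(io.StringIO(self._text), delimiter=";"))

    def query(self, attribute: str, value: str) -> List[Dict[str, str]]:
        """Answer an equality query that the source cannot evaluate itself."""
        header, *lines = self._native_scan()
        # Convert the native format into the format the application expects.
        rows = [dict(zip(header, line)) for line in lines]
        # Extend the querying power of the source with a selection.
        return [row for row in rows if row[attribute] == value]


wrapper = CsvWrapper(NATIVE_DATA)
print(wrapper.query("city", "ROMA"))
```

A wrapper for a richer source, such as an RDBMS, would instead push the selection down to the source and only convert the results.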
EXAMPLE OF SEMANTIC CONFLICT
CONTEXT C1 (r1):
• MONEY AMOUNTS IN ORIGINAL CURRENCY
• MONEY AMOUNTS SCALE 1:1, EXCEPT FOR YEN, WHICH IS SCALED 1:1000
CONTEXT C2 (r2):
• MONEY AMOUNTS IN USD, SCALE 1:1

r1
COMPANY   REVENUE     COUNTRY
IBM       1 000 000   USA
NTT       1 000 000   JPN

r2
COMPANY   EXPENSES
IBM       1 500 000
NTT       5 000 000

r3
FROMCURRENCY   TOCURRENCY   EXCHRATE
USD            JPY          104.0
JPY            USD          .0096

r4
COUNTRY   CURRENCY
USA       USD
JPN       JPY

SCALE
CURRENCY   SCALE
ALL        1:1
JPY        1:1000

SOURCE: C. H. GOH et al.

select r1.company, r1.revenue
from r1, r2
where r1.company = r2.company
and r1.revenue > r2.expenses
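The conflict becomes visible when the query is evaluated twice: once on the raw values and once after converting r1 revenues into context C2. The sketch below copies the data from the example; the Python structures and function names are illustrative, and the scale 1:1000 is read as "stored values are thousands of yen".

```python
from typing import Dict, List, Tuple

# Data copied from the example; the Python layout is an assumption.
r1: List[Tuple[str, int, str]] = [("IBM", 1_000_000, "USA"), ("NTT", 1_000_000, "JPN")]
r2: List[Tuple[str, int]] = [("IBM", 1_500_000), ("NTT", 5_000_000)]
r3: Dict[Tuple[str, str], float] = {("USD", "JPY"): 104.0, ("JPY", "USD"): 0.0096}
r4: Dict[str, str] = {"USA": "USD", "JPN": "JPY"}
SCALE_C1: Dict[str, int] = {"JPY": 1000}  # every other currency is scaled 1:1


def revenue_in_c2(revenue: int, country: str) -> float:
    """Convert a C1 revenue into context C2 (USD, scale 1:1)."""
    currency = r4[country]
    amount = float(revenue * SCALE_C1.get(currency, 1))
    if currency != "USD":
        amount *= r3[(currency, "USD")]
    return amount


def naive_query() -> List[Tuple[str, int]]:
    """Compare the stored numbers directly, ignoring their contexts."""
    expenses = dict(r2)
    return [(company, rev) for company, rev, _ in r1 if rev > expenses[company]]


def mediated_query() -> List[Tuple[str, float]]:
    """Compare revenues and expenses after conversion to context C2."""
    expenses = dict(r2)
    converted = [(company, revenue_in_c2(rev, country)) for company, rev, country in r1]
    return [(company, usd) for company, usd in converted if usd > expenses[company]]


print(naive_query())     # no company qualifies on the raw numbers
print(mediated_query())  # NTT qualifies once its revenue is expressed in USD
```

On the raw numbers neither company has revenue above expenses. After conversion, the NTT revenue is about 9.6 million USD, which exceeds its 5 million USD of expenses, so the answer changes.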
AN EXAMPLE OF ARCHITECTURE WITH MEDIATORS
(TSIMMIS)
[FIGURE: APPLICATION 1 AND APPLICATION 2 QUERY MEDIATOR 1 AND MEDIATOR 2,
WHICH REACH THE DATA SOURCES THROUGH WRAPPERS]
• WRAPPER 1: DATA SOURCE 1 (RDBMS)
• WRAPPER 2: DATA SOURCE 2 (LOTUS NOTES)
• WRAPPER 3: DATA SOURCE 3 (WWW)
NETWORK CATALOGUE
FULLY REPLICATED
• FAST ACCESS
• DISK OCCUPATION
• DATA CONSISTENCY
NO CATALOGUE AT ALL
• BROADCASTING NEED
• ACCESS OVERHEAD
NETWORK CATALOGUE
FULLY CENTRALIZED
• ACCESS BOTTLENECK
• DISK OCCUPATION
• DATA CONSISTENCY
PARTIALLY REPLICATED (A MASTER CATALOGUE PLUS LOCAL CATALOGUES)
• A GOOD COMPROMISE
BETWEEN OVERHEAD AND
ACCESS EFFICIENCY
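One way to picture the partially replicated option is a local catalogue that keeps only the entries its site has already used and falls back to the master catalogue on a miss. This is a sketch under that assumption; the class names, the caching policy and the sample entries are hypothetical and are not described on the slides.

```python
from typing import Dict, Optional


class MasterCatalogue:
    """Holds the location of every object; each lookup is a remote access."""

    def __init__(self, entries: Dict[str, str]) -> None:
        self._entries = dict(entries)
        self.remote_lookups = 0

    def locate(self, name: str) -> Optional[str]:
        self.remote_lookups += 1
        return self._entries.get(name)


class LocalCatalogue:
    """Holds a partial replica: only the entries this site has looked up."""

    def __init__(self, master: MasterCatalogue) -> None:
        self._master = master
        self._cache: Dict[str, str] = {}

    def locate(self, name: str) -> Optional[str]:
        if name in self._cache:
            return self._cache[name]       # fast local access
        site = self._master.locate(name)   # remote access on a miss
        if site is not None:
            self._cache[name] = site
        return site


master = MasterCatalogue({"SUPPLIER_1": "SITE 1", "SUPPLIER_2": "SITE 3"})
local = LocalCatalogue(master)
local.locate("SUPPLIER_2")
local.locate("SUPPLIER_2")
print(master.remote_lookups)  # 1: the second lookup is served locally
```

The sketch ignores invalidation: if an object moves, the cached entry becomes stale, which is the data consistency cost listed for the replicated options.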
OVERALL ARCHITECTURE
[FIGURE: THREE SITES CONNECTED BY THE DATA TRANSMISSION NETWORK. EACH SITE
COMPRISES A CATALOGUE, A LOCAL DB, AN LDBMS, AN INTERFACE, A TRANSPORT
COMPONENT AND AN OPERATING SYSTEM.]
A MODEL FOR DISTRIBUTED AND HETEROGENEOUS
INFORMATION SERVICES
• APPLICATIONS: DW, EDI, INTRANET, ...
• IS ARCHITECTURE: TSIMMIS, Workflow, CoopWARE, ...
• IS INFRASTRUCTURE: CORBA, HTTP, ...
• DATA SUPPORT: DBMS, browser HTML, ...
• TELECOMMUNICATIONS SUPPORT: TCP/IP, ...