DISTRIBUTED DATA MANAGEMENT
Prof. Fabio A. Schreiber
Dipartimento di Elettronica e Informazione
Politecnico di Milano

COMPANY TELECOMMUNICATIONS
1st STAGE (< '80)
• TELEPHONE (POTS), TELEX AND DATA TRANSMISSION ARE INDEPENDENT, ON SEPARATE NETWORKS
2nd STAGE ('80 - '95)
• DESIGN AND IMPLEMENTATION OF LARGE DIGITAL COMMUNICATION NETWORKS
• DEVELOPMENT OF LOCAL NETWORKS OF PCs AND WORKSTATIONS
3rd STAGE (> 1995)
• INTEGRATION AND MANAGEMENT OF LARGE HETEROGENEOUS WANs AND OF LOCAL NETWORKS

Fabio A. Schreiber Distributed I.S.

ISO-OSI REFERENCE MODEL
[Figure: two TERMINAL NODEs, each with the stack APPLICATION LEVEL, PRESENTATION LEVEL, SESSION LEVEL, TRANSPORT LEVEL, NETWORK LEVEL, DATA LINK LEVEL (with its MEDIA ACCESS SUBLEVEL) and PHYSICAL LEVEL, plus a RELAY NODE; the drawing also labels the INTERFACE and PROTOCOLS relations.]

DISTRIBUTED SYSTEMS APPLICATIONS
• ACCESS TO REMOTE RESOURCES
  connection to different computers and computing centers is made possible using the same terminal (e.g. TELNET, FTP, … PROTOCOLS)
• DISTRIBUTED COMPUTING
  complex systems are built in which the application process uses several remote computers and/or data sets, through telecommunication networks (e.g. distributed information systems, HPC, …)
• TELEMATIC APPLICATIONS
  – electronic mail
  – teleconference
  – …

DATA MANAGEMENT ON THE NETWORK
[Figure]

DATA MANAGEMENT ON THE NETWORK: INDEPENDENT DATA BASES
[Figure]

DATA MANAGEMENT ON THE NETWORK: COMMON COMMAND LANGUAGE
[Figure]

DATA MANAGEMENT ON THE NETWORK: DISTRIBUTED DATA BASE (DDBMS)
[Figure]

MODERN INFORMATION SYSTEMS
TELECOMMUNICATION NETWORKS BECOME AN ESSENTIAL COMPONENT FOR A GOOD ECONOMICAL AND FUNCTIONAL OPERATION OF THE ORGANIZATION
THE AVAILABILITY OF EFFECTIVE TELECOMMUNICATION SYSTEMS ALLOWS THE DEVELOPMENT OF NEW BUSINESS TYPES

VERTICAL FUNCTION PARTITIONING OF AN I.S.
[Figure: an EDP CENTER and the department LANs, with gateways to a BACKBONE (FDDI or MAN 802.6): TECHNICAL DEPT. (802.5), FACTORY (802.4), WAREHOUSE (802.3), ADMINISTRATIVE DEPT. (802.3); a PABX and ISDN are also shown.]

HORIZONTAL FUNCTION PARTITIONING OF AN I.S.
[Figure: an EDP CENTER and four ADMINISTRATIVE DEPT LANs (802.3), with gateways to a BACKBONE (FDDI or MAN 802.6); a PABX and ISDN are also shown.]

DATA MANAGEMENT
1st STAGE (< '70)
• SPARSE FILES
2nd STAGE ('70 - '90)
• LARGE CENTRALIZED DATA BASES
3rd STAGE (> 1990)
• DISTRIBUTED DATA MANAGEMENT

SYSTEM ARCHITECTURE: A DIALECTIC PROCESS
• THE THESIS: A MAINFRAME IS THE BEST!
• THE ANTITHESIS: 10, 100, 1000 DECENTRALIZED MINICOMPUTERS
• THE SYNTHESIS: DISTRIBUTED INFORMATICS FOR MAXIMAL FLEXIBILITY

DISTRIBUTED DATA MANAGEMENT: FUNCTIONAL GOALS
• AVAILABILITY
• LOAD SHARING
• RESOURCE SHARING
• QUALITY OF SERVICE TO THE USER

FUNCTIONAL GOALS: AVAILABILITY
• REDUNDANT HW/SW RESOURCES CAN BE USED TO OBTAIN A HIGHER OVERALL SYSTEM AVAILABILITY
  – FAULT TOLERANT SYSTEMS
  – SOFT DEGRADATION SYSTEMS

FUNCTIONAL GOALS: LOAD SHARING
IT ALLOWS A BALANCED DEVELOPMENT OF RESOURCES
[Figure: workload curves (WKL1) plotted against MONTH and against HOUR]

DISTRIBUTED DATA MANAGEMENT: FUNCTIONAL GOALS
• RESOURCE SHARING
  SPECIALIZED OR UNIQUE RESOURCES CAN BE SHARED FROM ANY NODE
• QUALITY OF SERVICE IMPROVEMENT
  – LOCAL PROCESSING CAPABILITIES
  – RESPONSE TIME REDUCTION
  – USER FRIENDLY INTERFACE

DISTRIBUTED DATA MANAGEMENT: ECONOMICAL AND ORGANIZATIONAL FACTORS
• THE PROS
  – THE EFFECTIVENESS OF LOCAL SYSTEMS ALLOWS A TIGHTER CONNECTION BETWEEN USERS AND SYSTEM
  – DISTRIBUTION OF THE “POWER” (ORGANIZATIONAL, PSYCHOLOGICAL, POLITICAL, …) ASSOCIATED WITH INFORMATION OWNERSHIP
  – INTEGRATION OF ORGANIZATIONAL ACTIVITIES IN GEOGRAPHICALLY DISTRIBUTED COMPANIES
  – COST/BENEFIT ???
• THE CONS
  – THE GENERAL NEED FOR COORDINATION AND COLLABORATION REQUIRES A “CULTURAL ATTITUDE” THAT IS NOT EASY TO FIND
  – COST/BENEFIT ???

DISTRIBUTED SYSTEMS
SYSTEMS IN WHICH MESSAGE TRANSMISSION TIME IS NOT NEGLIGIBLE WITH RESPECT TO THE TIME BETWEEN TWO EVENTS IN A SINGLE PROCESS

DISTRIBUTED DATA MANAGEMENT: ARCHITECTURES
Hw/Sw ARCHITECTURE | DATA MANAGEMENT
narrow bandwidth loosely coupled systems (geographically distributed computer networks) | Distributed Data Base Management System (DDBMS)
wide bandwidth loosely coupled systems (local networks, functionally distributed systems) | back-end processor
wide bandwidth tightly coupled systems (multiprocessor systems, associative memories, ...) | database machine

DDSS SPACE (DISTRIBUTED DATA SHARING SYSTEMS)
[Figure: a three-axis space with axes distribution, autonomy and heterogeneity. The points labelled in the drawing are:]
• Homogeneous integrated DBMS
• Heterogeneous integrated DBMS
• Homogeneous distributed DBMS
• Heterogeneous distributed DBMS
• Federated distributed DBMS
• Heterogeneous Federated distributed DBMS
• Heterogeneous Federated local DBMS
• Multidatabase systems
• Heterogeneous multidatabase systems
• Multidatabase distributed systems
• Heterogeneous distributed multidatabase systems

DDSS TAXONOMY
For each SYSTEM TYPE the table gives: what THE GLOBAL SYSTEM HAS ACCESS TO, what the TYPICAL LOCAL NODES ARE, whether GLOBAL DB FUNCTIONALITY is offered (Y/N), and HOW GLOBAL INFORMATION IS DEALT WITH.
• DISTRIBUTED DATA BASE
  – ACCESS TO: DBMS INTERNAL FUNCTIONS
  – LOCAL NODES: HOMOGENEOUS DATA BASES
  – GLOBAL DB FUNCTIONALITY: Y
  – GLOBAL INFORMATION: NAME SPACE AND GLOBAL SCHEMA
• FEDERATED DATA BASE WITH A GLOBAL SCHEMA
  – ACCESS TO: DBMS USER INTERFACE
  – LOCAL NODES: HETEROGENEOUS DATA BASES
  – GLOBAL DB FUNCTIONALITY: Y
  – GLOBAL INFORMATION: GLOBAL SCHEMA
• FEDERATED DATA BASE
  – ACCESS TO: DBMS USER INTERFACE
  – LOCAL NODES: HETEROGENEOUS DATA BASES
  – GLOBAL DB FUNCTIONALITY: Y
  – GLOBAL INFORMATION: PARTIAL GLOBAL SCHEMATA
• MULTIDATABASE WITH A SYSTEM LANGUAGE
  – ACCESS TO: DBMS USER INTERFACE
  – LOCAL NODES: HETEROGENEOUS DATA BASES
  – GLOBAL DB FUNCTIONALITY: Y
  – GLOBAL INFORMATION: ACCESS FUNCTIONS IN THE LANGUAGE
• HOMOGENEOUS MULTIDATABASE WITH A SYSTEM LANGUAGE
  – ACCESS TO: DBMS USER INTERFACE AND SOME INTERNAL FUNCTIONS
  – LOCAL NODES: HOMOGENEOUS DATA BASES
  – GLOBAL DB FUNCTIONALITY: Y
  – GLOBAL INFORMATION: ACCESS FUNCTIONS IN THE LANGUAGE
• INTEROPERABLE SYSTEMS
  – ACCESS TO: APPLICATION PROGRAMS ABOVE DBMS
  – LOCAL NODES: ANY DATA SOURCE SATISFYING THE INTERFACE PROTOCOL
  – GLOBAL DB FUNCTIONALITY: N
  – GLOBAL INFORMATION: DATA EXCHANGE

DISTRIBUTED DATA MANAGEMENT
• CLIENT-SERVER ARCHITECTURE
  – MANY CLIENTS REFER TO A SINGLE SERVER
  – MAINLY USED FOR OLTP ON LOCAL NETWORKS
• DISTRIBUTED DATABASE
  – MANY CLIENTS REFER TO MANY SERVERS
  – MAINLY USED FOR OLTP ON LOCAL AND WIDE AREA NETWORKS
• DATA WAREHOUSE
  – DATA COLLECTION FROM MANY DIFFERENT DATA SOURCES
  – USED IN DSS ON LOCAL AND WIDE AREA NETWORKS

DDSS IN A CLIENT-SERVER ARCHITECTURE
• THE CLIENT
  – IS MANAGED BY THE APPLICATION PROGRAMMER
  – SHOWS A FRIENDLY INTERFACE TO THE FINAL USER
  – USES EITHER STATIC OR DYNAMIC SQL
• THE SERVER
  – IS MANAGED BY THE DATABASE ADMINISTRATOR
  – ITS SIZE DEPENDS ON THE WORKLOAD AND ON THE SERVICES TO BE DELIVERED
  – MANAGES THE OPTIMIZATION PROCEDURES

DDSS IN A CLIENT-SERVER ARCHITECTURE USING DDBMS PRIMITIVES
[Figure: at the CLIENT SITE the APPLICATION PROGRAM issues DATABASE ACCESS PRIMITIVES to DDBMS1; at the SERVER SITE DDBMS2 accesses the DATABASE and sends back the RESULTS.]

DDSS IN A CLIENT-SERVER ARCHITECTURE USING AUXILIARY PROGRAMS AND RPC
[Figure: at the CLIENT SITE the APPLICATION PROGRAM (on DBMS1) calls, through RPC, an AUXILIARY PROGRAM at the SERVER SITE; the auxiliary program issues DATABASE ACCESS PRIMITIVES to DBMS2 on the DATABASE and sends back the RESULTS.]

DDSS IN A CLIENT-SERVER ARCHITECTURE
• USING DDBMS PRIMITIVES
  – THE DDBMS LOCAL COMPONENT ROUTES THE QUERY TO THE SERVER, WHICH ACCESSES THE DATABASE AND SENDS BACK THE RESULTS
  – HIGH DISTRIBUTION TRANSPARENCY THANKS TO GLOBAL FILE NAMES
  – LOW EFFICIENCY SINCE THE ANSWER TRAVELS ONE TUPLE AT A TIME
• USING AUXILIARY PROGRAMS AND RPC
  – THE APPLICATION ASKS THE AUXILIARY PROGRAM TO EXECUTE ON THE SERVER AND TO SEND BACK THE RESULT
  – THE AUXILIARY PROGRAM ASSEMBLES TUPLES INTO RESULT SETS, IMPROVING TRANSMISSION EFFICIENCY

DATA WAREHOUSE (DW)
• A TECHNIQUE FOR CORRECTLY ASSEMBLING AND MANAGING DATA COMING FROM DIFFERENT SOURCES TO OBTAIN A DETAILED VIEW OF AN ECONOMIC SYSTEM
• IT IS AN
  – INTEGRATED
  – PERMANENT
  – TIME VARIANT
  – TOPIC ORIENTED
  DATA COLLECTION TO SUPPORT MANAGERIAL DECISIONS
• IT IS THE SEPARATION ELEMENT BETWEEN OLTP AND DSS WORKLOADS

DW ARCHITECTURE
[Figure: components shown, from the sources to the users: DATABASE 1, DATABASE 2, LEGACY DATABASE and SPARSE FILES; WAREHOUSE CONSTRUCTION; COMPANY WAREHOUSE; REPLICATION AND PROPAGATION; three DATA MARTs; KNOWLEDGE DISCOVERY AND DATA MINING; INFORMATION ACCESS AND MANAGEMENT. The drawing carries the remark "1 + 1 = 3 ??".]

MAIN PROBLEMS IN A DW
• VIEW AND METADATA MAINTENANCE
• REPLICATION MANAGEMENT
• CONSISTENCY MANAGEMENT
• APPLICATIONS IMPLEMENTATION

A DISTRIBUTED DATA BASE IS A SET OF FILES, STORED IN DIFFERENT NODES OF A DISTRIBUTED SYSTEM, WHICH ARE LOGICALLY CORRELATED WITH FUNCTIONAL RELATIONSHIPS OR WHICH ARE REPLICAS OF THE SAME FILE, IN SUCH A WAY AS TO CONSTITUTE A SINGLE DATA COLLECTION

A DISTRIBUTED DATA BASE ...
• IS A DATA BASE
  – AN INTEGRATED ACCESS MODE TO DATA MUST EXIST
  – IT MUST BE PROTECTED AGAINST INCONSISTENCIES AND FAILURES IN SUCH A WAY AS TO GUARANTEE DATA INTEGRITY
• IS DISTRIBUTED
  – PHYSICAL DATA DISTRIBUTION MUST BE TRANSPARENT TO THE END USER

SOME DESIGN PROBLEMS
• GENERAL SYSTEM ARCHITECTURE
  – DESIGN FROM SCRATCH
  – SYSTEM RESTRUCTURING
  – SYSTEM AND DATA HETEROGENEITY
• LOGICAL RELATIONS FRAGMENTATION
• REPLICATION AND ALLOCATION
  – HOW MANY COPIES AND WHERE
• ACCESS TO AND PROCESSING OF RELATIONS
• INTEGRITY AND PRIVACY
• RELIABILITY

DATA INDEPENDENCE
THE NOTION OF DATA INDEPENDENCE MUST BE EXTENDED TO ENCOMPASS THE FOLLOWING CASES
• LOGICAL: THE DB ADMINISTRATOR NEEDS TO RESHAPE THE GLOBAL SCHEMA IN ORDER TO MEET THE REQUIREMENTS OF A VERY LARGE, HETEROGENEOUS AND DYNAMIC SET OF USERS
• PHYSICAL: IMMUNITY TO (DYNAMIC) NETWORK CONFIGURATION CHANGES (SITES CONNECTION/DISCONNECTION)

INTEGRATED DDBMS
[Figure: schema levels, from top to bottom]
• external schemas: SE1, SE2
• global conceptual schema
• fragmentation schema
• location schema
  (these levels are labelled "site independent" and "DDBMS" in the drawing)
• local mapping schemas: SML 1, SML 2, ..., SML m
• local conceptual schemas: SCL 1, SCL 2, ..., SCL m
• local internal schemas: SIL 1, SIL 2, ..., SIL m
  (the local levels are labelled "LDBMS" in the drawing)

FEDERATED DDSS
[Figure: schema levels, from top to bottom]
• external schemas: SE1, SE2, ..., SEn
• global conceptual schema
• fragmentation schema
• location schema
  (these levels are labelled "site independent" and "DDBMS" in the drawing)
• local mapping schemas: SML 1, SML 2, ..., SML m
• local conceptual schemas: SCL 1, SCL 2, ..., SCL m
• local internal schemas: SIL 1, SIL 2, ..., SIL m
  (the local levels are labelled "LDBMS" in the drawing, which also shows SEL1 at the local level)

GLOBAL SCHEMA DESIGN
• SIMILAR TO THE VIEW INTEGRATION PROBLEM
• STRUCTURAL CONFLICTS (different schemas)
• SEMANTIC CONFLICTS (even with similar schemas)
  – HOMONYMY (same name, different meaning)
  – AMBIGUITY (different name, same meaning)
  – FORMAT (NUMERIC, ALPHANUMERIC, ...)
  – AGGREGATIONS (PART-OF, …)
  – INTRINSIC DATA SEMANTICS (DIFFERENT MEASUREMENT UNITS, DIFFERENT GRANULARITY, DIFFERENT JUDGEMENT, …)
• UPDATING LIMITED TO LOCAL ACTIVITY

SEMANTIC CONFLICTS SOLUTION
• LOCAL DATA BASE RESTRUCTURING (NOT FEASIBLE)
• EXPLICIT ADDITION TO THE SCHEMAS OF SEMANTIC INFORMATION ABOUT DATA, TO ALLOW APPLICATION PROGRAMS TO BEHAVE ACCORDINGLY

LOGICAL RELATIONS FRAGMENTATION
• HORIZONTAL
  – ALL THE FRAGMENTS SHARE THE SAME SCHEMA
  – TUPLES BELONG TO FRAGMENTS ACCORDING TO A SELECTION PREDICATE CORRESPONDING TO A DISTRIBUTION CRITERION
• VERTICAL
  – EACH FRAGMENT SCHEMA IS A PROJECTION OF THE GLOBAL RELATION SCHEMA
    • SCHEMAS WITH A NON-EMPTY INTERSECTION
    • DISJOINT SCHEMAS

LOGICAL RELATIONS FRAGMENTATION
[Figure: three "= ... + ..." decompositions of a relation, labelled HORIZONTAL, VERTICAL and DUPLICATION]

DATA REPLICATION
• PERMANENCE
  – A COPY OF A DATA ELEMENT (RELATION OR FRAGMENT) IS PERMANENT IF IT EVOLVES IN TIME UNDER THE DDBMS MANAGEMENT
  – IT IS TEMPORARY IF IT IS CREATED ONLY FOR SOME SPECIFIC OPERATION (IN A WORK AREA) AND THEN IT IS CANCELLED OR REFRESHED, ON DEMAND, FROM THE MASTER COPY

DATA REPLICATION
• CONSISTENCY
  – STRONG CONSISTENCY: AT EVERY INSTANT ALL THE COPIES OF EACH DATA ELEMENT MUST HAVE IDENTICAL VALUES
  – WEAK CONSISTENCY: UPDATES MADE ON A COPY ARE PROPAGATED TO THE OTHER COPIES LATER ON
  – INDEPENDENCE: UPDATES ON DIFFERENT COPIES ARE UNCORRELATED (OFTEN USED WITH TEMPORARY COPIES)

FRAGMENTATION AND REPLICATION
[Figure: a GLOBAL RELATION R is split into FRAGMENTS R1, R2, R3, R4, whose PHYSICAL IMAGES (copies) are allocated at SITE 1, SITE 2 and SITE 3.]

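The fragmentation rules above can be illustrated with a minimal Python sketch. The toy SUPPLIER relation, its values, the helper names and the site names are all hypothetical; the vertical case shown is the one with a non-empty intersection, under the common assumption that the shared attribute is the key (SNUM) so that the global relation can be rebuilt by a join.

```python
# Toy global relation SUPPLIER(SNUM, NAME, CITY); all values are invented.
SUPPLIER = [
    {"SNUM": 1, "NAME": "Rossi", "CITY": "Milano"},
    {"SNUM": 2, "NAME": "Bianchi", "CITY": "Roma"},
    {"SNUM": 3, "NAME": "Verdi", "CITY": "Milano"},
]

def horizontal(relation, predicate):
    """Horizontal fragment: the tuples that satisfy a selection predicate."""
    return [t for t in relation if predicate(t)]

def vertical(relation, attributes):
    """Vertical fragment: projection of every tuple on a subset of attributes."""
    return [{a: t[a] for a in attributes} for t in relation]

# Horizontal fragmentation on CITY: same schema, disjoint sets of tuples.
h1 = horizontal(SUPPLIER, lambda t: t["CITY"] == "Milano")
h2 = horizontal(SUPPLIER, lambda t: t["CITY"] != "Milano")

# Vertical fragmentation: the two schemas intersect on the key SNUM only.
v1 = vertical(SUPPLIER, ["SNUM", "NAME"])
v2 = vertical(SUPPLIER, ["SNUM", "CITY"])

def rebuild_horizontal(*fragments):
    """Union of the horizontal fragments, ordered by key."""
    return sorted((t for f in fragments for t in f), key=lambda t: t["SNUM"])

def rebuild_vertical(left, right, key="SNUM"):
    """Join of two vertical fragments on the shared key."""
    by_key = {t[key]: t for t in right}
    return [{**t, **by_key[t[key]]} for t in left]

# Replication: fragment h1 has two physical images, one per site.
allocation = {
    "SITE_1": {"h1": [dict(t) for t in h1]},
    "SITE_2": {"h1": [dict(t) for t in h1], "h2": [dict(t) for t in h2]},
}
```

Both `rebuild_horizontal(h1, h2)` and `rebuild_vertical(v1, v2)` give back the original relation, which is the property a fragmentation schema has to preserve.
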
TRANSPARENCY LEVELS
• TO FRAGMENTATION
  – APPLICATION PROGRAMS REFER TO GLOBAL RELATIONS AND IGNORE FRAGMENTATION
• TO LOCATION
  – APPLICATION PROGRAMS ARE INDEPENDENT OF REPLICATION AND OF PHYSICAL DATA LOCATION, BUT THEY PERCEIVE THE CHANGES IN THE FRAGMENTATION SCHEMA
• TO LOCAL MAPPING
  – APPLICATION PROGRAMS USE THE GLOBAL NAMES OF THE OBJECTS (DATA FRAGMENTS OR ACCESS PRIMITIVES), BUT THEY MUST SPECIFY THE LOCATION SITE
• NO TRANSPARENCY
  – THE APPLICATION PROGRAMMER MUST WRITE THE ACCESS MODULE FOR EACH DBMS; THE DDBMS ONLY ACTIVATES THE REMOTE MODULES

TRANSPARENCY LEVELS: QUERIES
fragmentation transparency
    read (terminal, $SNUM);
    select NAME into $NAME
      from SUPPLIER
      where SNUM = $SNUM;
    write (terminal, $NAME)
[Figure: the DDBMS accesses SUPPLIER 1 at SITE 1 and SUPPLIER 2 at SITE 2 and SITE 3.]

TRANSPARENCY LEVELS: QUERIES
location transparency
    read (terminal, $SNUM);
    select NAME into $NAME
      from SUPPLIER_1
      where SNUM = $SNUM;
    if not #FOUND then
      select NAME into $NAME
        from SUPPLIER_2
        where SNUM = $SNUM;
    write (terminal, $NAME)
[Figure: the DDBMS accesses SUPPLIER 1 at SITE 1 and SUPPLIER 2 at SITE 2 OR SITE 3.]

TRANSPARENCY LEVELS: QUERIES
local mapping transparency
    read (terminal, $SNUM);
    select NAME into $NAME
      from SUPPLIER_1 AT SITE_1
      where SNUM = $SNUM;
    if not #FOUND then
      select NAME into $NAME
        from SUPPLIER_2 AT SITE_3
        where SNUM = $SNUM;
    write (terminal, $NAME)
[Figure: the DDBMS accesses SUPPLIER 1 at SITE 1 and SUPPLIER 2 at SITE 2 and SITE 3.]

TRANSPARENCY LEVELS: QUERIES
no transparency
    SUPPINQRY:
    read (terminal, $SNUM);
    execute $SUPIMS ($SNUM, $FOUND, $NAME) at site 1;
    if not $FOUND then
      execute $SUPCODASYL ($SNUM, $FOUND, $NAME) at site 3;
    write (terminal, $NAME)

    SUPIMS (SNUM, FOUND, NAME)
      get unique SUPPLIER_SEGMENT
      …………….

    SUPCODASYL (SNUM, FOUND, NAME)
      find SUPPLIER_RECORD
      ……………..
[Figure: the DDBMS activates SUPIMS on the local DBMS (IMS) over the IMS DATABASE, and SUPCODASYL on the local DBMS (Codasyl) over the Codasyl DATABASE.]

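The first three transparency levels differ only in how much of the name resolution is left to the application. The Python sketch below shows one possible layering, under stated assumptions: the dictionaries stand in for the fragmentation and location schemas, the supplier names are invented, and the function names are hypothetical rather than part of any DDBMS interface.

```python
# Physical images: site -> fragment name -> {SNUM: NAME}. Contents are invented.
SITES = {
    "SITE_1": {"SUPPLIER_1": {1: "Rossi", 2: "Bianchi"}},
    "SITE_2": {"SUPPLIER_2": {3: "Verdi"}},
    "SITE_3": {"SUPPLIER_2": {3: "Verdi"}},
}

# Catalogue data consulted by the DDBMS, not by the application.
FRAGMENTS = {"SUPPLIER": ["SUPPLIER_1", "SUPPLIER_2"]}      # fragmentation schema
LOCATION = {"SUPPLIER_1": ["SITE_1"],                       # location schema
            "SUPPLIER_2": ["SITE_2", "SITE_3"]}

def read_at_site(fragment, site, snum):
    """Local mapping transparency: the caller names both fragment and site."""
    return SITES[site][fragment].get(snum)

def read_fragment(fragment, snum):
    """Location transparency: the caller names the fragment; any copy will do."""
    site = LOCATION[fragment][0]
    return read_at_site(fragment, site, snum)

def read_global(relation, snum):
    """Fragmentation transparency: the caller names only the global relation."""
    for fragment in FRAGMENTS[relation]:
        name = read_fragment(fragment, snum)
        if name is not None:
            return name
    return None
```

An application written against `read_global` is unaffected when SUPPLIER is fragmented differently, while one written against `read_fragment` must change with the fragmentation schema but not when a copy moves to another site.
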
TRANSPARENCY LEVELS: UPDATES
• FOR FRAGMENTATION TRANSPARENCY
  – IF AN ATTRIBUTE BELONGING TO A HORIZONTAL FRAGMENTATION PREDICATE IS UPDATED, THE TUPLE SHALL BE MOVED BETWEEN THE FRAGMENTS
• FOR LOCATION AND REPLICATION TRANSPARENCY
  – ALL THE COPIES MUST BE UPDATED SIMULTANEOUSLY

MULTIDATABASE
[Figure: schema levels, from top to bottom]
• external schemas: SE1, SE2, ..., SEn
• NETWORK CATALOG
• local conceptual schemas: SCL 1, SCL 2, ..., SCL m
• local internal schemas: SIL 1, SIL 2, ..., SIL m
  (the local levels are labelled "LDBMS" in the drawing)

MULTIBASE
[Figure: the USER sends a GLOBAL QUERY to the GLOBAL DATA MANAGER and receives the QUERY RESULT (DATA). The GLOBAL DATA MANAGER works on the GLOBAL SCHEMA (DAPLEX) and on an AUXILIARY SCHEMA (DAPLEX) with its AUXILIARY DATABASE. Each LOCAL DATABASE INTERFACE maps a LOCAL SCHEMA (DAPLEX) onto the LOCAL HOST SCHEMA AND DATA managed by a HOST DBMS.]

MEDIATORS
A MEDIATOR IS A SOFTWARE MODULE (AT ISO-OSI LEVEL 7) WHICH USES AN IMPLICIT KNOWLEDGE OF SOME DATA SETS OR SUBSETS TO CREATE KNOWLEDGE FOR A HIGHER APPLICATION LAYER (WIEDERHOLD)
ITS MAIN FUNCTION IS OBJECT FUSION
  – TO GROUP INFORMATION ABOUT THE SAME ENTITY OF THE REAL WORLD
  – TO REMOVE REDUNDANCIES AMONG DIFFERENT SOURCES
  – TO SOLVE INCONSISTENCIES AMONG DIFFERENT SOURCES

MEDIATORS
[Figure: the MEDIATOR sits between the DATA SOURCE and the USER WORKSTATION. Labels on the user side: QUERY, USEFUL ANSWERS, INSPECTIONS. Labels on the source side: FORMATTED QUERY, RAW ANSWERS, TRIGGER EVENTS. Inside the mediator: DATA ABSTRACTION MECHANISM, KNOWLEDGE.]

WRAPPERS
• THEY TRANSLATE QUERIES INTO ONE OR MORE COMMANDS/QUERIES UNDERSTANDABLE BY THE SPECIFIC SOURCE
• THEY CAN EXTEND THE QUERYING POWER OF A SPECIFIC SOURCE
• THEY CONVERT NATIVE FORMAT RESULTS TO A FORMAT UNDERSTANDABLE BY THE APPLICATION
• WRITING THEM INVOLVES A LARGE IMPLEMENTATION EFFORT, BUT SOME TOOLS CAN EASE THE TASK (e.g. THE TSIMMIS TOOLKIT)

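As an illustration of a wrapper converting native results and of a mediator solving an inconsistency among sources, the Python sketch below uses the figures of the semantic-conflict example on the following slide (contexts C1 and C2, source: Goh et al.). It is only a sketch under stated assumptions: the function names are hypothetical, the USD-to-USD rate of 1.0 is added for convenience and is not in the source tables, and this is not the mechanism used by TSIMMIS or by Goh et al.

```python
# Figures copied from the semantic-conflict example (source: Goh et al.).
R1 = [("IBM", 1_000_000, "USA"), ("NTT", 1_000_000, "JPN")]  # context C1: revenue
R2 = [("IBM", 1_500_000), ("NTT", 5_000_000)]                # context C2: expenses
CURRENCY_OF = {"USA": "USD", "JPN": "JPY"}
SCALE_C1 = {"JPY": 1000}                    # in C1 every other currency is 1:1
RATE_TO_USD = {"JPY": 0.0096, "USD": 1.0}   # the USD entry is an assumption

def wrap_r1():
    """Wrapper for r1: converts native C1 amounts to context C2 (USD, 1:1)."""
    for company, revenue, country in R1:
        currency = CURRENCY_OF[country]
        scale = SCALE_C1.get(currency, 1)
        yield company, revenue * scale * RATE_TO_USD[currency]

def mediated_query():
    """Mediator: joins converted revenues with r2 and applies the predicate."""
    expenses = dict(R2)
    return [(c, rev) for c, rev in wrap_r1() if rev > expenses[c]]

def naive_query():
    """The same join on the raw values, ignoring the two contexts."""
    expenses = dict(R2)
    return [(c, rev) for c, rev, _ in R1 if rev > expenses[c]]
```

On the raw values no company satisfies revenue > expenses, so `naive_query()` returns an empty list. After conversion the NTT revenue is 1 000 000 × 1000 JPY × 0.0096, about 9 600 000 USD, which exceeds its 5 000 000 USD of expenses, so `mediated_query()` returns NTT only.
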
EXAMPLE OF SEMANTIC CONFLICT
SOURCE: C. H. GOH et Al.

CONTEXT C1:
• MONEY AMOUNTS IN ORIGINAL CURRENCY
• MONEY AMOUNTS SCALE 1:1 EXCEPT FOR YEN, WHICH IS SCALED 1:1000

r1
COMPANY | REVENUE | COUNTRY
IBM | 1 000 000 | USA
NTT | 1 000 000 | JPN

CONTEXT C2:
• MONEY AMOUNTS IN USD, SCALE 1:1

r2
COMPANY | EXPENSES
IBM | 1 500 000
NTT | 5 000 000

r3
FROMCURREN | TOCURRENCY | EXCHRATE
USD | JPY | 104.0
JPY | USD | .0096

r4
COUNTRY | CURRENCY
USA | USD
JPN | JPY

SCALE FACTORS
CURRENCY | SCALE
ALL | 1:1
JPY | 1:1000

    select r1.company, r1.revenue
      from r1, r2
      where r1.company = r2.company
        and r1.revenue > r2.expenses

AN EXAMPLE OF ARCHITECTURE WITH MEDIATORS (TSIMMIS)
[Figure: APPLICATION 1 and APPLICATION 2 use MEDIATOR 1 and MEDIATOR 2, which reach the sources through wrappers: WRAPPER 1 on DATA SOURCE 1 (RDBMS), WRAPPER 2 on DATA SOURCE 2 (LOTUS NOTES), WRAPPER 3 on DATA SOURCE 3 (WWW).]

NETWORK CATALOGUE
FULLY REPLICATED
• FAST ACCESS
• DISK OCCUPATION
• DATA CONSISTENCY
[Figure: several copies of the CATALOGUE]
NO CATALOGUE AT ALL
• BROADCASTING NEED
• ACCESS OVERHEAD

NETWORK CATALOGUE
FULLY CENTRALIZED
• ACCESS BOTTLENECK
• DISK OCCUPATION
• DATA CONSISTENCY
[Figure: a single CATALOGUE]
PARTIALLY REPLICATED
• A GOOD COMPROMISE BETWEEN OVERHEAD AND ACCESS EFFICIENCY
[Figure: LOCAL CATALOGUE 1, LOCAL CATALOGUE 2 and a MASTER CATALOGUE]

OVERALL ARCHITECTURE
[Figure: three nodes, each with a CATALOGUE, a LOCAL DB, an LDBMS, a TRANSPORT INTERFACE and an OPERATING SYSTEM, connected by the DATA TRANSMISSION NETWORK.]

A MODEL FOR DISTRIBUTED AND HETEROGENEOUS INFORMATION SERVICES
• APPLICATIONS: DW, EDI, INTRANET, ...
• IS ARCHITECTURE: TSIMMIS, Workflow, CoopWARE, ...
• IS INFRASTRUCTURE: CORBA, HTTP, ...
• DATA SUPPORT: DBMS, browser HTML, ...
• TELECOMMUNICATIONS SUPPORT: TCP/IP, ...