2007-03-20 TDDB48 Lecture 1: Introduction

Database system (figure): users 1 to 4 send queries and updates and receive answers. The database is a model of the world. The database management system (DBMS) processes the queries and updates and provides access to the stored data in the physical database.

Query processing (figure):
SQL query: SELECT ORDER_ID, ENTRY_DATE FROM ORDER WHERE ENTRY_DATE > '2001-08-30'
→ Parsing & validating
→ Intermediate form of query: π ORDER_ID, ENTRY_DATE (σ ENTRY_DATE > '2001-08-30' (ORDER))
→ Query optimizer
→ Execution plan (access plan)
→ Query code generator
→ Code to execute the query
→ Runtime DB processor
→ Query result
The system catalogue/data dictionary (DD) holds metadata (application schema naming and structure information); the stored database holds the application data.

Basic Definitions
• Database: A collection of related data.
• Data: Known facts that can be recorded and have an implicit meaning.
• Mini-world: Some part of the real world about which data is stored in a database. For example, student grades and transcripts at a university.
• Database Management System (DBMS): A software package/system to facilitate the creation and maintenance of a computerized database.
• Database System: The DBMS software together with the data itself. Sometimes, the applications are also included.

Typical DBMS Functionality
• Define a database: in terms of data types, structures and constraints
• Construct or load the database on a secondary storage medium
• Manipulate the database: querying, generating reports, insertions, deletions and modifications to its content
• Concurrent processing and sharing by a set of users and programs, while keeping all data valid and consistent

Typical DBMS Functionality (continued)
Other features:
– Protection or security measures to prevent unauthorized access
– "Active" processing to take internal actions on data
– Presentation and visualization of data

Information retrieval (IR) on the Internet
1. Locate document collections
2. Formulate query
3. Judge relevance
Traditional IR research and development has concentrated on steps 2 and 3. The Internet (the web) requires step 1 as well.

IRS, DBMS, and AI
                  IRS                         DBMS                         AI
Data object       document                    table                        logic expressions
Basic function    retrieval (probabilistic)   retrieval (deterministic)    inference
Database size     small to very large         small to very large          usually small

DBMS
• Deterministic:
    SQL> select * from kund where nummer = 17;
• Meets the exact information need of the user
• Compare: searching for a memory stick in discussion forums

Lab policy
• You are expected to do the lab assignments by yourself. Merely copying others' solutions will not be tolerated, even if you make cosmetic changes to the code/solution. If we suspect that this, or any other form of cheating, has happened, we will report it to the disciplinary board of the university.
• Be prepared to be asked questions by your lab assistant about detailed and specific code, and about why you have selected a specific solution. This applies to all lab group members.
• If you have problems meeting a deadline, it is much better to talk to the instructor about it than to cheat. (It is a shame that we have to say these things. They should be obvious.)

LiU: Disciplinary actions
• Any kind of academic dishonesty, such as cheating, plagiarism, use of unauthorized assistance, fraud and failure to comply with University examination rules, may result in the filing of a complaint to the University Disciplinary Committee. The potential penalties include expulsion, suspension, and revocation of a previously earned grade or degree.
• LiU Rules and regulations

Historical Development of Database Technology
• Early database applications: The hierarchical and network models were introduced in the mid-1960s and dominated during the seventies. A bulk of the worldwide database processing still occurs using these models.
• Relational model based systems: The model, originally introduced in 1970, was heavily researched and experimented with at IBM and in the universities. Relational DBMS products emerged in the 1980s.

Historical Development of Database Technology (continued)
• Object-oriented applications: OODBMSs were introduced in the late 1980s and early 1990s to cater to the need for complex data processing in CAD and other applications. Their use has not taken off much.
• Data on the Web and e-commerce applications: The web contains data in HTML (Hypertext Markup Language) with links among pages. This has given rise to a new set of applications, and e-commerce is using new standards like XML (Extensible Markup Language).

Why database management systems?
Why DBMS: Simple example, a customer register (kund = customer, nummer = number, namn = name, adress = address)
SQL:
    create table kund (nummer integer, namn char(50), adress char(50));
    select namn, adress from kund where nummer = 17;
C:
    struct kund {
        int nummer;
        char namn[50 + 1];    /* +1 for the terminating '\0' */
        char adress[50 + 1];
        struct kund* nextp;   /* next customer in the linked list */
    };

Why DBMS: Powerful
    select * from kund where namn like 'S%' order by adress;
    select namn from kund where adress = 'Vägen 8' and namn like 'S%';
    select adress, count(*) from kund where namn = 'Anders' group by adress;

Why DBMS: Flexible
    alter table kund add telefon char(10);
    create index foo on kund(namn);

More: Why DBMS?
• Multiple simultaneous users
• Data independence
• Persistence on failure
• Data modeling
Multiple simultaneous users (figure): Pelle sums up the salary costs while Kalle updates the salaries of 1000 employees in the same database.

Persistence on failure (figure): a power outage strikes while Kalle is updating the salaries of 1000 employees in the database.

DBMS: Summary of advantages
– Control of redundant information
– Data access
– Persistent data storage
– Allows queries and analysis
– Allows multiple users
– Represents multiple users
– Efficient storage of data
– Integrity constraints
– Backup and recovery

Categories of data models
• Conceptual (high-level, semantic)
• Implementation (representational)
• Physical (low-level, internal)
• The data model implies the schema, which determines what type of data can be stored

History of Data Models
• Network Model: the first one to be implemented, by Honeywell in 1964-65 (IDS System). Adopted heavily due to the support by CODASYL (CODASYL DBTG report of 1971). Later implemented in a large variety of systems: IDMS (Cullinet, now CA), DMS 1100 (Unisys), IMAGE (H.P.), VAX-DBMS (Digital Equipment Corp.).
• Hierarchical Data Model: implemented in a joint effort by IBM and North American

History of Data Models (continued)
• Object-oriented Data Model(s): several models have been proposed for implementation in a database system since the 1980s. One set comprises models of persistent OO programming languages such as C++ (e.g., in OBJECTSTORE or VERSANT) and Smalltalk (e.g., in GEMSTONE). Additionally, there are systems like O2, ORION (at MCC, then ITASCA) and IRIS (at H.P., used in Open OODB).
• Object-Relational Models: Started with Informix Universal Server in the 1990s. Exemplified in the latest versions of Oracle 10g, DB2, SQL Server and other systems.
• XML-based Models in the 2000s

Network Model
• ADVANTAGES:
  • Able to model complex relationships and represent the semantics of add/delete on the relationships.
  • Can handle most modeling situations using record types and relationship types.
  • Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET, etc. Programmers can do optimal navigation through the database.
• DISADVANTAGES:
  • Navigational and procedural nature of processing
  • Database contains a complex array of pointers that thread through a set of records.
  • Little scope for automated "query optimization"

Hierarchical Model
• ADVANTAGES:
  • Simple to construct and operate on
  • Corresponds to a number of naturally hierarchically organized domains, e.g., assemblies in manufacturing, personnel organization in companies
  • Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, etc.
• DISADVANTAGES:
  • Navigational and procedural nature of processing
  • Database is visualized as a linear arrangement of records
  • Little scope for "query optimization"

The relational model
• Data is stored as tables
• Theoretical model
• Standardized query language
• In the beginning, however, these databases were slow; the hierarchical databases were faster.

The three-schema architecture
• Different schemas at different levels
• Data independence between the levels
(figure: several views, the conceptual level and the physical level)

Database users and roles
• Database administrator
• Database designer
• End user
• Application programmer
• DBMS designer
• Tool developer
• Operator, maintenance

Database languages
• Data Definition Language (DDL)
  – Specifies the conceptual schema
• Data Manipulation Language (DML)
  – Stores and retrieves data
• Data Control Language (DCL)
  – Controls the execution of the database
• Host language
  – An extension to a programming language

Data models today
• Relational databases are the most common
• Hierarchical databases still exist (mainly in the aviation industry)
• Object-oriented and object-relational databases make up a small share
• XML databases are new.

ER modeling (figure): example attributes Personnummer (Swedish personal identity number), Namn (name), Telefon (phone), Adress (address), E-post (e-mail) and Ålder (age).

ER diagrams
• A structured way to model data
• Independent of the type of database
• Documentation of your data structure

Symbols in ER diagrams (figure): the entity Anställd (employee) with candidate keys such as PNummer; ordinary attributes such as E-post and AnstÅr (year of employment); a composite attribute Namn made up of FNamn (first name) and ENamn (last name); a multivalued attribute; and a derived attribute Age.
Relationships (figure): Anställd (employee) and Avdelning (department), with the key attributes Anum and Pnum, connected by the relationship Arbetar på (works at).
"Employees work at departments."

Total participation
"Every department must have at least one employee."

Total participation (continued)
"Every employee must work at a department."

Cardinality: restrictions on numbers (1:1)
"Every department has exactly one employee and every employee works at exactly one department."

Restrictions on numbers (continued), N:1
"Every department can have many employees, but every employee can only work at one department."

Restrictions on numbers (continued), M:N
"Every department can have many employees and every employee can work at several departments."

Restrictions on numbers (continued), (min,max) notation: structural constraints (1,100) on Avdelning and (1,1) on Anställd in Arbetar på.
"Every department can have up to 100 employees, but every employee can only work at one department."

Weak entities
"Employees are identified by their department, e.g. 'Kalle in sales'."

SUMMARY OF NOTATION FOR ER SCHEMAS (figure; the symbol column is not recoverable)
• Entity type
• Weak entity type
• Relationship type
• Identifying relationship type
• Attribute
• Key attribute
• Multivalued attribute
• Composite attribute
• Derived attribute
• Total participation of E2 in R
• Cardinality ratio 1:N for E1:E2 in R
• Structural constraint (min, max) on participation of E in R

Example
• Students study in study programmes and take a number of courses. Each course is identified by a course code and gives the student a number of earned credits.

Problem with ER notation
The entity-relationship model in its original form did not support the specialization/generalization abstractions.

Extended Entity-Relationship (EER) Model
• Incorporates set-subset relationships
• Incorporates specialization/generalization hierarchies
How the ER model can be extended with set-subset relationships and specialization/generalization hierarchies, and how to display them in EER diagrams.

Example: Two types of employees (figures): first, Tekniker (technician) with the attributes ANummer, Telefon, Lön and System, and Administratör (administrator) with ANummer, Telefon, Lön and Språk, modeled as two separate entity types; then the generalization Anställd (employee) with the shared attributes ANummer, Telefon and Lön, specialized through a disjointness circle "d" into Tekniker (System) and Administratör (Språk).

Disjoint specialization (figures, marked "d")
• "Employees must be either technicians or (XOR) administrators."
• "Employees can be technicians or (XOR) administrators."

Overlapping specialization (figures, marked "o")
• "There may be employees who are both technicians and administrators."
• The second figure also contains an entity type AdmTekn with the attribute Procent.

Example
• At the university there are two types of students, doctoral students and undergraduate students, and one cannot belong to both categories. Depending on which category one belongs to, different courses are allowed. Some courses are only for doctoral students, some only for undergraduate students and some for all types of students.

UML Example for Displaying Specialization / Generalization (figure)

Alternative Diagrammatic Notations (figures): symbols for entity type / class, attribute and relationship; displaying attributes; displaying cardinality ratios; various (min, max) notations; notations for displaying specialization / generalization.