Databases Unplugged Industry Consolidation & Evolution Cheryl Stepney Microsoft Corporation Agenda Core Components Database Models Modeling the Database Job Roles & Opportunities Database Vendors Industry Convergence - XML What is a Database An organized set of data Have discrete fields with datatype definitions Ensure data accuracy via validation rules Be able to easily query the data using the definitions A core component of every computer application in the world today Database Types Flat Files 1970’s – present Structured, non-flexible, hard to change schema Example: IBM’s IMS Segments of Customer, Order and Product linked by keys held until reorganization Relational Order No., Customer Name, Customer Location, Product A name, Product A price, Product B name, Order Total, end record Hierarchical 1960’s – present Comma/tab delimited, no structure? 1980’s – present Flexible, links based on data values, primary and foreign keys Tables are linked by the existence of data in a row Separate Tables: Order, Customer, Location, Product, Order Line Detail, Contact at Location Object Oriented 1990’s – present Subject oriented, slow to gain adoption, slow performance Objects: Order and Customer Database Terminology Logical Design Physical Design – User View Entities – things about which information needs to be known or held Relationships – Connectors between appropriate data Tables - Columns and Rows Keys - Primary and Foreign Tables are linked by Keys Major Core Data Model Structured Query Language (SQL) Computational Model Query Optimizer Extensible Markup Language (XML) Database Components Major Core Functionality Data Model Structured Query Language (SQL) Computational Model Query Optimizer Extensible Markup Language (XML) Security Module Components Tables Constraints – eg. Zip code must be 5 integers, mandatory Defaults – eg. blank or null on Middle Initial Indexes – Table and View User-defined data types Keys Views User-defined functions Triggers Stored procedures Database Career Roles Database Designer DBA – Database Administrator Data Architect Database Modeler Intended responsibilities Current Role Definition Business User Business Intelligence Data Analyst System Development Lifecycle Where Data and Code Interact Strategic Analysis Data Model Functional Decomposition Detailed Analysis Design Code Test Production Cost of Making A Change Strategic Analysis $ 1 xn Data Model Functional Decomposition Detailed Analysis Design Code Test Production $ 5 $ 50 $ 100 $ 500 $1000 xn xn xn xn xn Data Modeling HIPO Charts IBM – Hierarchical Input/Output Diagrams Gane-Sarson – DFD (Data Flow Diagram) Entity / Relationship Modeling IDEFIX Standards Express-G System Architect Oracle’s CASE Method Microsoft’s Visio Standard for Exchange of Product Model Data Object Role Modeling – ORM Microsoft’s Visio Relational Model What is 1st, 2nd, 3rd Normal Form? What is Normalization? • Remove repeating Groups Order • Remove dependencies • Cater for Time Customer Product Normalization First normal form (1NF) Second normal form (2NF) Each column in a table that is not a determiner of the contents of another column must itself be a function of the other columns in the table For example, in a table with three columns containing customer ID, product sold, and price of the product when sold, the price would be a function of the customer ID (entitled to a discount) and the specific product Third normal form (3NF) It contains two-dimensional tables with rows and columns. Each column corresponds to a sub-object or an attribute of the object represented by the entire table Each row represents a unique instance of that sub-object or attribute and must be different in some way from any other row (that is, no duplicate rows are possible). All entries in any column must be of the same kind For example, using the customer table just cited, removing a row describing a customer purchase (because of a return perhaps) will also remove the fact that the product has a certain price In the third normal form, these tables would be divided into two tables so that product pricing would be tracked separately. Domain/key normal form (0NF) A key uniquely identifies each row in a table. A domain is the set of permissible values for an attribute. By enforcing key and domain restrictions, the database is assured of being freed from modification anomalies Relational Model Example supplied via for Order Detail on Order place by Product of responsible for Contact subject of with Customer employer of Product Type Object Oriented Modeling Shapes: entity value • Objects • Entity • Value • Constraints • Connectors • Mandatory • Uniqueness • Predicates • Roles Database Vendors Flat Files 1970’s – present IBM’s IMS is still in use Relational (RDBMS) 1980’s – present All file systems start out as flat Hierarchical 1960’s – present IBM – DB/2, UDB (Universal Database), Informix Oracle – Oracle 7.x Microsoft – SQL Server 2000 Sybase – Dynamic SQL Computer Associates – OpenIngres Object Oriented (ODBMS) 1990’s – present Computer Associates - Jasmine Gemstone O2 Object Store Objectivity Versant ODBMS IBM – Informix Illustra Former Relational Database Vendors Ingres Informix Unify Cullinet Dec Digital – RDB Verity Natural Language Relational Market Share Gartner perspective 2002 Based on Revenue IBM 25% 29% 50% 5% 3% 5% 2% SQL Server 2000 Sybase 30% Oracle 7.x Microsoft 20% DB/2, UDB (Universal Database) Acquired Informix to gain lead Oracle 31% Units Dynamic SQL Computer Associates OpenIngres Relational Database Vendors IBM IMS – 1960’s, transactional, still in use DB/2 – implemented around 1990 Initially, Mainframe based Data Management On Version 8.1 AS/400 – UDB DB2 for Linux Different code one each platform Market share: 31% Relational Database Vendors Oracle IPO 1986, founded in 1977 Project Oracle to get funding (CIA) Implemented IBM’s “System R” Paper Core to their business applications Multi-Platform is business goal Market share: 30% Several product offerings to buy Unix, Linux, Mainframe, Windows, etc. Oracle 9i SQL Plus, Report Writer, Discoverer, Oracle Developer Suite Applications Oracle Financials - Oracle 11i Oracle Collaboration Suite E-Business Suite Microsoft SQL Server History Not a Database Company at IPO in 1986 History 1992: Beginning of SQL Server on Windows® 1996: SQL Server 6.5 Ships 1998: SQL Server 7.0 released – complete rewrite 2000: SQL Server 2000 w. Data Warehousing 2001: SQL Server Wins Numerous Awards Scaleable from pocket pc to Intel Mainframe – but only on Windows Market share: 25% based on Revenue All components in one box for single price Transactional, OLAP, Data Transformation Notification Services, Reporting Services Data Warehouse First to support XML – no extra charge Retail Price Comparison # of CPUs ORACLE Enterprise SQL Server Enterprise IBM Enterprise 4 $320,000 $79,996 $82,000 8 $640,000 $159,992 $164,000 16 $1,280,000 $319,984 $328,000 32 $2,560,000 * $639,968 ** $656,000 *** * Oracle - Additional for Reports, Data Warehouse, ** Microsoft - All Services in one price *** IBM - Different pricing depending on platform Market Innovation The Big 3 Oracle Corporation IBM Microsoft Transactional Databases Data Warehouses Data Analysis – Business Intelligence XML support Full Text Search XML Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of largescale electronic publishing XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere Extensible Markup Language (XML) 1.0 (Second Edition) W3C Recommendation 6 October 2000 Leading the Web to Its Full Potential... Working Groups XML Coordination Group XML Core Working Group XSL Working Group Extensible Stylesheet Language XML Linking Working Group XML Query Working Group XML Schema Working Group http://www.w3.org/ A SQL Query Select ACO.Name 'Owner Name', ACC.Name 'Region Name', ACB.Name 'Site Name' From AtriumComponent ACO Join AtriumComponent ACC On ACC.ContainerKey = ACO.ComponentId Join AtriumComponent ACB On ACB.ContainerKey = ACC.ComponentId Where ACO.ContainerKey = -1 Order By ACO.Name, ACC.Name, ACB.Name For XML Auto Query Select ACO.Name, ACC.Name, ACB.Name From AtriumComponent ACO Join AtriumComponent ACC On ACC.ContainerKey = ACO.ComponentId Join AtriumComponent ACB On ACB.ContainerKey = ACC.ComponentId Where ACO.ContainerKey = -1 Order By ACO.Name, ACC.Name, ACB.Name For XML AUTO Query Results Owner Name ------------------------------------------------------------------------------Atrium Music Stores 38 Atrium Music Stores 39 Atrium Music Stores 41 Atrium Music Stores 34 Atrium Music Stores 37 Atrium Music Stores 42 Atrium Music Stores 44 (7 row(s) affected) Region Name Site Name -------------------------------------------------- -------------------Eastern Region Atrium Music Store Eastern Region Atrium Music Store Eastern Region Atrium Music Store Western Region Atrium Music Store Western Region Atrium Music Store Western Region Atrium Music Store Western Region Atrium Music Store