Information Resources Management
April 24, 2001

Agenda
  Administrivia
  Object-Oriented & Databases
  Data Warehousing
  Data Mining
  SQL Extensions
  XML

Administrivia
  Homework #8
  Homework #9
  Current Scores
  Final Review Session?

OODBMS vs. ORDBMS
  OODBMS - Object-Oriented
  ORDBMS - Object-Relational
  Appendix A

OODBMS
  Persistent Objects
    By class
    By creation
    By marking
    By reference
  Storage/Retrieval
  Methods

OODBMS - Benefits
  Match Programming Methodology
  Data types & structures
  Ease of programming
  Inheritance

OODBMS - Challenges
  Standards
    ODMG - Object Database Management Group
  Performance
  Database vs. persistent language
  Loss of integrity, queries
  Storage Space
  Maturity

ORDBMS
  Extensions to relational model
    Complex data types
    Inheritance
    References
  Migration path
    Use existing applications and knowledge base

ORDBMS - Benefits
  SQL
  Existing Systems
  Vendors

ORDBMS - Challenges
  Standards
  “Fit” with the development language
  Programming Complexity

Storing data from an object-oriented system in a database has been likened to parking your car in your garage. With an OODBMS, you simply park the car in the garage. With an (O)RDBMS, you must first completely disassemble the car and put each part in its specific location on a shelf. This process must then be reversed the next time you want to go for a drive.

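Example: the garage analogy in code

The analogy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real OODBMS or ORDBMS: the Car and Wheel classes, the in-memory object_store dictionary (a toy stand-in for an object database), and the car and wheel tables are all hypothetical names, and SQLite stands in for the relational side.

```python
import copy
import sqlite3


class Wheel:
    def __init__(self, position, diameter_in):
        self.position = position
        self.diameter_in = diameter_in


class Car:
    def __init__(self, vin, model, wheels):
        self.vin = vin
        self.model = model
        self.wheels = wheels


# Object-style storage: the whole car goes in as one unit.
object_store = {}  # hypothetical stand-in for an OODBMS, keyed by VIN


def park_whole(car):
    object_store[car.vin] = copy.deepcopy(car)


def drive_whole(vin):
    return copy.deepcopy(object_store[vin])


# Relational-style storage: disassemble the car, one row per part.
def park_disassembled(conn, car):
    conn.execute("CREATE TABLE IF NOT EXISTS car (vin TEXT PRIMARY KEY, model TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wheel ("
        "vin TEXT REFERENCES car(vin), position TEXT, diameter_in INTEGER)"
    )
    conn.execute("INSERT INTO car VALUES (?, ?)", (car.vin, car.model))
    conn.executemany(
        "INSERT INTO wheel VALUES (?, ?, ?)",
        [(car.vin, w.position, w.diameter_in) for w in car.wheels],
    )


def drive_reassembled(conn, vin):
    # Reverse the process: query each table and rebuild the object.
    car_vin, model = conn.execute(
        "SELECT vin, model FROM car WHERE vin = ?", (vin,)
    ).fetchone()
    rows = conn.execute(
        "SELECT position, diameter_in FROM wheel WHERE vin = ? ORDER BY rowid",
        (vin,),
    ).fetchall()
    return Car(car_vin, model, [Wheel(p, d) for p, d in rows])


car = Car("VIN123", "Sedan", [Wheel("front-left", 16), Wheel("front-right", 16)])

park_whole(car)
same_car = drive_whole("VIN123")

conn = sqlite3.connect(":memory:")
park_disassembled(conn, car)
rebuilt_car = drive_reassembled(conn, "VIN123")
```

Both paths return an equivalent car. The relational path needs explicit mapping code in each direction, which is the “Programming Complexity” challenge listed for ORDBMS.
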
OODBMS/ORDBMS Products
  Vendor                                               Product
  Computer Associates (www.cai.com/products/jasmine)   Jasmine
  Franz (www.franz.com)                                AllegroStore
  Fujitsu Software (www.fsc.fujitsu.com)               Jasmine
  Gemstone Systems (www.gemstone.com)                  GemStone/S
  Matisse Software (www.matisse.com)                   ADB
  O2 Technology (www.o2tech.com)                       O2
  Object Design (www.odi.com)                          ObjectStore

OODBMS/ORDBMS Products
  Vendor                                               Product
  Objectivity (www.objectivity.com)                    Objectivity/DB
  Object Systems (www.iprolink.ch/ibex.com)            ITASCA
  Ontos (www.ontos.com)                                Ontos Integrator
  Persistence (www.persistence.com)                    Persistence Live Object Server
  Poet Software (www.poet.com)                         Poet Object Server
  Unisys (www.osmos.com)                               Osmos
  Versant (www.versant.com)                            Versant ODBMS

Other Links
  Object Database Management Group
    www.odmg.org
  Object Database Newsgroup
    comp.databases.object

Data Mining
  Corporations have colossal amounts of data
  Usually only used for very specific purposes (operations)
  Automated attempt to learn from the data
  Find statistical rules and patterns in the data
  Example: Giant Eagle Advantage Card

Goals of Data Mining
  Explanatory - Why?
  Confirmatory - Is it?
  Exploratory - ???

Approaches to Data Mining
  Classification
    identify rules that create groups
  Association
    find related conditions or events
  Correlation
    relationships between values
  User Guided
    hypothesis driven
  Automatic
    data driven - AI based

Data Warehouse
  A subject-oriented, integrated, time-variant, nonvolatile collection of data
  Usually all data for a corporation
  Multidimensional database

Data Warehousing
  Single location
  Long-term storage
  Greater availability
  Separate “data” processing from day-to-day operations (performance)
  All data is historical
  Support data mining, et al.

Data Warehousing Questions
  What data needs to be kept?
  Where is it from?
  How good is it?
  How long should it be kept?
  Can it be summarized?
  When?
  Will it make sense?
  What is the schema?
  When is it updated?

Data Warehousing - Benefits
  Support for decision making tools
    DSS, EIS, Data Mining
  Separation of information and day-to-day processing
  Unification - Centralization
  Improved quality and consistency

Data Warehousing Challenges
  Costs: Storage, Setup, Maintenance
  Historical data issues
  Defining the warehouse schema
  Doing the conversion
    Implementation & every time
  Keeping up with operational system changes
  Answering the questions

Multidimensional Databases
  Two views
    Multidimensional tables
    Star schema
  Multidimensional table
    each cell is an attribute
    dimensions are “interesting” categories

Multidimensional Table
  Cell - sales
  Dimensions
    day
    person
    store
    item

Star Schema
  Multiple tables
  Central table - data item (cell)
  Surrounding tables - information about each category (dimensions)

Star Schema
  (diagram: central Sales table surrounded by Person, Day, Item, and Store)

Star Schema
  Sales (Day, Person, Store, Item, sales)
  Day (Day, day info)
  Person (Person, person info)
  Store (Store, store info)
  Item (Item, item info)

Building/Maintaining a Data Warehouse
  1. Capture
  2. Scrub
  3. Transform
  4. Load and Index

Data Marts
  Making specific data available
  Different ones for different needs
  (diagram labels: Operational Systems, DW, DM1, DM2)

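Example: the Sales star schema as tables

The Sales (Day, Person, Store, Item, sales) layout from the Star Schema slides can be sketched with SQLite through Python's sqlite3 module. This is a minimal sketch under assumptions, not a warehouse design: the column names (day_id, calendar_date, city, description, and so on) and every row of sample data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: information about each category.
    CREATE TABLE day    (day_id    INTEGER PRIMARY KEY, calendar_date TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE store  (store_id  INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE item   (item_id   INTEGER PRIMARY KEY, description TEXT);

    -- Central table: one row per cell of the multidimensional table.
    CREATE TABLE sales (
        day_id    INTEGER REFERENCES day(day_id),
        person_id INTEGER REFERENCES person(person_id),
        store_id  INTEGER REFERENCES store(store_id),
        item_id   INTEGER REFERENCES item(item_id),
        sales     REAL,
        PRIMARY KEY (day_id, person_id, store_id, item_id)
    );
    """
)

# Made-up sample data.
conn.executemany("INSERT INTO day VALUES (?, ?)", [(1, "2001-04-23"), (2, "2001-04-24")])
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO store VALUES (?, ?)", [(1, "Pittsburgh"), (2, "Cleveland")])
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "Milk"), (2, "Bread")])
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, 3.0),
        (1, 2, 1, 2, 2.0),
        (2, 1, 1, 1, 4.0),
        (2, 2, 2, 1, 5.0),
        (2, 1, 2, 2, 1.5),
    ],
)

# Summarize the cells along two of the four dimensions (store and item).
totals = conn.execute(
    """
    SELECT s.city, i.description, SUM(f.sales)
    FROM sales AS f
    JOIN store AS s ON s.store_id = f.store_id
    JOIN item  AS i ON i.item_id  = f.item_id
    GROUP BY s.city, i.description
    ORDER BY s.city, i.description
    """
).fetchall()

for city, description, total in totals:
    print(city, description, total)
```

The central table holds only keys and the measured value. Descriptive details live once in each dimension table, so a query joins in only the dimensions it needs.
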
Data Mining - Benefits
  Use data
  Learn new things
  Improve decision making

Data Mining - Challenges
  Time (human and/or computer)
  Spurious results
    Separating the wheat from the chaff
  Availability of data
  Amount of data
  Changes in tools and technologies
  Validity over time

Enhanced Data Analysis
  Beyond SUM, COUNT, and AVG
  SQL extensions (suggested)
    GROUP BY … AS PERCENTILE
      Specific percentiles
    GROUP BY … WITH CUBE
      Cross-tabulations
  Statistical package interface
    SAS, S++, others

Enhanced Data Analysis Benefits
  Greater functionality
  Improved decision making

Enhanced Data Analysis Challenges
  Lack of standards
  Understandability
  Processing requirements
    Cost of poorly written queries
    “ad hoc” queries aren’t reviewed

Extending Relational DBs
  Spatial and Geographic Databases
  Multimedia Databases
  Changing the data stored while retaining the benefits of relational databases

Spatial & Geographic DBs
  Spatial - CAD
  Geographic - GIS
  Similar issue
    How to store and retrieve such data

Spatial Databases
  Geometric objects (2 or 3 dimensions)
    Locations
    Connections
  Nonspatial information about each object
  Substructures
  Spatial integrity constraints
    Two things can’t occupy the same space

GIS Databases
  Raster Data (fractal data)
    Pictures - possibly over time
    Maps
  Vector Data
    Locations
    Connections
  Nongeographic information

Spatial & Geographic DB Benefits
  DBMS
  Specialized queries
    Spatial & Geographic Data
    “Standard” Data
    Mix of the two
  Integrity constraints

Spatial & Geographic DB Challenges
  Space requirements
    Level of detail
  Understandability - Complexity
  Processing requirements
  Compatibility between systems
    Lack of standards

Multimedia Databases
  Images, Audio, Video
  Nonmultimedia data (text) about each
  Database Enhancements
    BLOBs (Binary Large Objects)
    Similarity-based queries
    Guaranteed steady rate
    Synchronization of audio and video

Multimedia Databases Benefits
  DBMS
  Greater compression may be possible
  “Paperless” office - document imaging
    Workflow redesign - improvements
  Greater availability

Multimedia Databases Challenges
  STORAGE
  Specialized DBMS
  Unity of database and network
    Usually requires ATM
  Specialized hardware
    “juke boxes”
    optical disks

XML
  What is it?
  What isn’t it?
  What are the goals?
  Who controls it?
  Who’s using it?
  Beyond XML

What is XML?
  eXtensible Markup Language
  Markup language for “structured information”
    “structured” - content & role of that content
    markup - identify structures
  “meta language for describing markup languages”

Huh?
  Storing structured data in a text file
    spreadsheet, address book, transactions (think EDI)
  Looks like HTML, <tags>, but isn’t
  Text is universal, but not efficient
    Does disk space matter?
    What about network capacity?
  XML is license-free & platform-independent

What XML isn’t
  HTML
  SGML - Standard Generalized Markup Language - printing
  Limited to current definitions (tags)
    XML is the way to add new definitions
  A relational database management system
  A database, or is it?

Goals of XML
  Easy to use over Internet
  Wide variety of applications
  Compatible with SGML (subset)
  Easy to write programs that use XML documents
  No (or few) optional features
  Human-legible if necessary

Goals of XML (2)
  Standards developed quickly
  Formal and concise
  Easy to create documents
  No need for “shortcuts”

Who Controls XML?
  W3 Consortium
    www.w3.org/XML
  XML 1.0 specification

Who’s Using XML?
  Financial Products Markup Language
    FpML
    FpML.org
    “A standard for financial derivatives business-to-business e-Commerce”
  Others?

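Example: an address book as XML

The “storing structured data in a text file” idea from the slides can be made concrete with the address book case. This is a small sketch that uses Python's standard xml.etree.ElementTree module. The element names (addressBook, contact, name, email, phone) and the sample contacts are made up for illustration and do not come from any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Looks like HTML, but the tags describe the role of the content,
# not how it should be displayed.
ADDRESS_BOOK_XML = """<?xml version="1.0"?>
<addressBook>
  <contact id="1">
    <name>Ann Example</name>
    <email>ann@example.com</email>
  </contact>
  <contact id="2">
    <name>Bob Example</name>
    <email>bob@example.com</email>
  </contact>
</addressBook>
"""

# Parse the text into a tree of elements.
root = ET.fromstring(ADDRESS_BOOK_XML)

# Read the structure back out as ordinary Python data.
contacts = [
    {"id": c.get("id"), "name": c.findtext("name"), "email": c.findtext("email")}
    for c in root.findall("contact")
]

# XML is not limited to a fixed set of tags: add a new one where needed.
phone = ET.SubElement(root.find("contact"), "phone")
phone.text = "555-0100"

# Serialize the modified tree back to plain text.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

The file stays plain text and readable by a person, at the cost of being larger than a binary format. That is the disk space and network capacity trade-off the slides raise.
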
Beyond XML
  XLink - hyperlinks in XML
  XPointer & XFragments - point to parts of an XML document
  CSS - style sheet language for XML and HTML
  XSL - advanced language for style sheets
  XSLT - XSL transformation language

Beyond XML (2)
  DOM - standard function calls for manipulating XML (and HTML) from programs
  XML Namespaces - link a URL with every tag and attribute
  XML Schemas 1 & 2 - help in precisely defining your own XML-based formats

Homework #10
  Last One! (No HW #11)
  Research and evaluate products
  100 points

Final
  Next Tuesday, 5/1
  Approximately 1/3 from 4/3 - 4/24
  Remainder - comprehensive

Thank You