CHAPTER SIX
Databases and Data Warehouses

Information Granularity
- Refers to the level of detail of information
  - Detailed (POS transaction)
  - Coarse (global sales totals)

Transactional vs. Analytical Information
- Transactional information comes from a business process
  - A bank deposit
  - A credit card charge
- Analytical information uses transactional data for the purposes of decision making
  - Account balance trends
  - Using credit card history to detect fraud

Information Dimensions
- Information timeliness
  - Obsolete information is useless
  - Today’s information needs to be provided in real time or near real time
- Information quality
  - Wrong information is useless
  - Redundant information can be the cause of errors
  - Information must be complete
  - Data inconsistency and data integrity

Database Management Characteristics
- Complex
- Databases are often spread across multiple servers
- Databases are often spread across multiple physical disks
- Fault tolerance is critical
- Databases may be distributed

Database Vendors
- The industry has consolidated
  - IBM: DB2 Universal
  - Oracle
  - Microsoft: SQL Server, Access
  - Sun (MySQL): is now Oracle

Database Performance
- The Transaction Processing Performance Council (TPC) provides standard benchmarks
  - TPC-C: Online transaction processing
  - TPC-E: Online brokerage transactions
  - TPC-H: Ad hoc decision support
  - TPC-W: Web / e-commerce

Database Performance (TPC-C)
- Multiple transaction types
- Independent of software and hardware
- Scalable
- Basis is online transaction processing (OLTP)

1960s Data Management
- These are legacy systems
- Characterized by traditional file processing
  - Data processing was sequential
  - Batch processing
  - Not possible to directly locate a particular file record
- Data were dependent on the programs that used the data
  - Program-data dependence

1970s Data Management
- Batch processing gives way to online transaction processing
- Technologies
  - Files stored on disk rather than tape
  - Any record can be located in the same amount of time
  - Indexed Sequential Access Method (ISAM)
  - Virtual Storage Access Method (VSAM)
  - Direct access files
    - Use a hashing function on the record key to derive the record’s location

1980s Data Management
- Databases are becoming commonplace
- Personal computer databases are evolving
  - dBASE
  - R:BASE

1990s Data Management
- Huge data stores and transaction processing capabilities
- Distributed databases
- Object-oriented databases
- 6 Million+ transactions per second

Realities of a DBMS
- Data centric rather than application centric
- Can be a repository for all of an organization’s data
- Databases tend to be centralized
- Queries get data from a DBMS
  - SQL is the standard query language
- Report generators create printed and Web-based reports
- Applications interface with the DBMS

Types of Databases
- Database models include:
  - Hierarchical database model: a tree-based structure
  - Network database model: mathematically, a directed graph
  - Relational database model: stores information in the form of logically related two-dimensional tables
  - Object-oriented databases

Elements of a Database
- Logical view and physical view
  - Users see and work with the logical view
  - The physical view is controlled by the database management system itself

Entities and Attributes
- Relational databases store information in tables (entities)
  - Customer / order / product
- Tables contain fields (attributes)
  - Customer name, address

Keys
- Each table has a primary key that uniquely identifies each record
  - Natural keys have some meaning (stock symbol)
  - Artificial keys have no intrinsic meaning (your R number)
- Foreign keys are used to link tables in one-to-many relationships

Database Interaction

Advantages of an RDBMS (Scalability)
- A database can scale to the terabyte or petabyte range
  - NSA maintains 1.9 trillion telephone call records
- Large databases can span several servers and storage devices

Advantages of an RDBMS (Redundancy)
- Databases can be configured to write duplicate (redundant) information
  - Citibank
- Journaling and checkpointing are supported

Advantages of an RDBMS (Integrity)
- Relational integrity constraints are rules that apply to the relationships between tables
- Business integrity constraints enforce business rules
  - Not really a part of the DBMS itself

Advantages of an RDBMS (Information Security)
- A DBMS supports advanced access rights
  - By table and fields
  - By time of day
  - By location
  - By row information

Data-driven Web Sites
- Nearly all transactional Web sites rely on a database
  - Amazon
  - Your bank
  - Any shopping cart application
  - eBay or Craigslist
  - Facebook and YouTube

Database Integration
- Databases often need to be integrated
  - Because of mergers and acquisitions
  - Because of organizational changes
- We are referring to connections to multiple databases

Data Warehouses (Introduction)
- Central source for clean data
- May contain internal or external data
- Used to spot hidden patterns in data
- May be integrated with the operational database
- Parts of a data warehouse are called data marts
- Data warehouses contain an analytical component

Cleansing Data
- Data is often obtained from a myriad of sources
  - External lists
  - Internal databases
  - Other databases
- This data must be cleansed and sanitized to remove
  - Redundancy / errors / etc.

Data Warehouses (Illustration)

Multidimensional Analysis
- Data are often analyzed as three-dimensional cubes
- Cubes are then ‘sliced and diced’ to look at various layers

Multidimensional Analysis (Illustration)

The Cost of Perfect Information

Database Design (Introduction)
- In the systems process, we design before we implement
  - Requirements specification
  - Conceptual design
  - Logical design
  - Physical design

Database Design Tools
- Unified Modeling Language (UML)
  - Visio
  - Rational Rose
- Entity relationship diagrams describe relationships between data
- Normalization eliminates redundant data

Database Management
- HR
  - Database administrators
  - Data managers
  - Programmers and systems analysts
- Data security

BUSINESS INTELLIGENCE / DATA MINING

Business Intelligence (Introduction)
- Simply put, it’s internal and external data used to support better decision making
- It’s challenging to sift through the mountains of data
- It requires cross-functional collaboration between systems
- We use ERP systems to improve business intelligence (more in the next chapter)

Business Intelligence (Industries)
- BI applies to all industries
  - Retail and sales
    - Understanding procurement and distribution (SCM) / customers (CRM)
  - Banking
    - Understand credit worthiness / fraud behavior
  - Insurance
    - Forecast claim risk and understand at-risk customers

Business Intelligence (Industries, continued)
- Airlines
  - Routing planes / minimize turnaround time (Southwest)
- Marketing
  - Demographics
  - Sell based on known customer behavior (Harrah’s)
  - Amazon

Business Intelligence (Levels)
- Operational
  - Day-to-day operations (building a Dell)
- Tactical
  - Short term (Dell ordering supplies)
- Strategic
  - Long-term organizational goals
- The systems that provide BI typically do so at all levels

BI Levels (Illustration)

BI and Latency
- From the time of acquisition, how long does it take to analyze the data (analysis latency)
- Time to make a decision based on the analysis
- E-transactions significantly reduce latency

Data Mining (Introduction)
- Data contained in a data warehouse or data mart gets mined (analyzed)
- Specialized tools are used to analyze data for ‘interesting nuggets’
- Ways to mine
  - Drill down (general to specific)
  - Drill up (specific to general)

Data Mining (Clustering)
- Cluster analysis groups data by trait or traits
- Examples
  - Don’t drink the water in Fallon
  - Segment customers by zip codes

Data Mining (Association)
- Answers the question “What traits are associated with other traits?”
  - When I stay at Harrah’s, I gamble
  - I eat at the Sage Room
  - When I stay in Vegas, I gamble more

Data Mining (Statistical Analysis)
- It’s basic statistics
  - Analysis of variance
  - Correlation coefficients
  - Etc.

BI Benefits
- We can understand what’s happening inside and outside a department
  - Sales knows about product inventory levels and production schedules
  - Production knows about sales and sales forecasts
  - Finance knows about the sales forecasts too
- This information is provided in near real time

Quantifying BI
- Some benefits can be clearly quantified
  - Costs went down
  - Productivity increased
  - Inventory levels were optimized by 10%
- Some are indirectly quantified
- Some benefits are intangible
- Sometimes, we get unexpected results
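
Worked example (keys, integrity, and analytical queries)

The points made earlier in this chapter about primary keys, foreign keys, relational integrity, and turning transactional records into analytical information can be tried out with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database. The table names (customer, purchase), the column names, and the sample rows are assumptions made up for illustration; they do not come from the slides.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two tables linked one-to-many."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,   -- artificial primary key
            name        TEXT NOT NULL
        );
        CREATE TABLE purchase (
            purchase_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                        REFERENCES customer(customer_id),  -- foreign key
            amount      REAL NOT NULL
        );
        """
    )
    # Hypothetical sample rows: each purchase is one transactional record.
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO purchase VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0)],
    )
    conn.commit()
    return conn


def totals_by_customer(conn):
    """Analytical view: roll detailed purchases up to one total per customer."""
    return conn.execute(
        """
        SELECT c.name, SUM(p.amount)
        FROM customer AS c
        JOIN purchase AS p ON p.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.name
        """
    ).fetchall()


def orphan_insert_rejected(conn):
    """Return True if a purchase for a nonexistent customer is refused."""
    try:
        conn.execute("INSERT INTO purchase VALUES (99, 999, 1.0)")
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    db = build_demo_db()
    print(totals_by_customer(db))
    print(orphan_insert_rejected(db))
```

The join follows the foreign key from purchase back to customer, and the GROUP BY moves from fine granularity (individual purchases) to coarse granularity (a total per customer). The rejected insert shows a relational integrity constraint enforced by the DBMS itself rather than by application code.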