Turban, Aronson, and Liang
Decision Support Systems and Intelligent Systems, Seventh Edition

Chapter 5
Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition, Turban, Aronson, and Liang

Learning Objectives
• Describe the issues in management of data.
• Understand the concepts and use of DBMS.
• Learn about data warehousing and data marts.
• Explain business intelligence/business analytics.
• Examine how decision making can be improved through data manipulation and analytics.
• Understand the interaction between the Web and database technologies.
• Explain how database technologies are used in business analytics.
• Understand the impact of the Web on business intelligence and analytics.

Data, Information, Knowledge
• Data
  – Items that are the most elementary descriptions of things, events, activities, and transactions
  – May be internal or external
• Information
  – Organized data that has meaning and value
• Knowledge
  – Processed data or information that conveys understanding or learning applicable to a problem or activity

Data
• Raw data collected manually or by instruments
• Quality is critical
  – Quality determines usefulness
    • Contextual data quality
    • Intrinsic data quality
    • Accessibility data quality
    • Representation data quality
  – Often neglected or casually handled
  – Problems exposed when data is summarized

Data
• Cleanse data
  – When populating warehouse
  – Data quality action plan
  – Best practices for data quality
  – Measure results
• Data integrity issues
  – Uniformity
  – Version
  – Completeness check
  – Conformity check

Data
• Data integration
• Access needed to multiple sources
  – Often enterprise-wide
  – Disparate and heterogeneous databases
  – XML becoming language standard

• Describe the role of the Internet in MSS data management and business intelligence.
The role of the Internet in MSS data management and business intelligence is increasing. Database vendors now provide Web hooks that allow their databases to deliver data directly in HTML or XML format, and Web browsers are used to access databases. Most business intelligence tools permit access to data warehouses via the Internet or a company intranet.

• List the major categories of data sources for an MSS/BI.
Internal sources (usually the reporting systems of the functional areas), external sources (commercial databases, government and industry reports, etc.), and personal data.

• Describe the benefits of commercial databases.
They provide external data in a timely manner and at a reasonable cost. Because of economies of scale, such services are comprehensive and inexpensive.
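
As a rough illustration of the Internet slide's point that databases can deliver data directly in XML, the sketch below serializes the rows of a relational table as XML. It is not any vendor's actual Web hook: the `customers` table, its sample rows, and the `rows_to_xml` helper are hypothetical, and Python's built-in sqlite3 module stands in for a production DBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, table):
    """Serialize every row of `table` as an XML string (illustrative helper)."""
    # NOTE: `table` is interpolated directly, so only pass trusted table names.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        record = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            ET.SubElement(record, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
xml_text = rows_to_xml(conn, "customers")
print(xml_text)
```

Because the output is plain XML, a consumer needs no knowledge of the source DBMS, which is what makes XML attractive for integrating disparate and heterogeneous databases.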

Database Management Systems
• Supplements operating system
• Manages data
• Queries data and generates reports
• Data security
• Combines with modeling language for construction of DSS

Database Models
• Hierarchical
  – Top down, like inverted tree
  – Fields have only one “parent”; each “parent” can have multiple “children”
  – Fast
• Network
  – Relationships created through linked lists, using pointers
  – “Children” can have multiple “parents”
  – Greater flexibility, substantial overhead
• Relational
  – Flat, two-dimensional tables with multiple access queries
  – Examines relations between multiple tables
  – Flexible, quick, and extendable with data independence
• Object oriented
  – Data analyzed at conceptual level
  – Inheritance, abstraction, encapsulation

Database Models, continued
• Multimedia based
  – Multiple data formats
    • JPEG, GIF, bitmap, PNG, sound, video, virtual reality
  – Requires specific hardware for full feature availability
• Document based
  – Document storage and management

• Define document management.
Document management involves managing what were once paper documents in a firm.
It is generally a computerized system that provides access to the most recent versions of important documents (policies, methods, etc.), restricts access to appropriate employees, allows updates by key people, and performs archiving.

• Define object-oriented database management.
It is based on object-oriented programming: using symbols and icons, it can handle very complex data structures and show hierarchies and complex relationships.

• What is SQL? Why is it important?
SQL (Structured Query Language) is a nonprocedural language for data manipulation in a relational DBMS. It can be used to query a database, to exercise DBMS operations, and to perform database administration functions. It is a standard used by database vendors to permit access to relational databases.

• What is the difference between a database and a data warehouse?
Technically, a data warehouse is a database; however, a data warehouse is an integrated, time-variant, nonvolatile, subject-oriented repository of detail and summary data used for decision support and business analytics within an organization. The term “database” typically describes operational data stores, which are transactional in structure. As a result, databases are usually highly normalized, whereas data warehouses are highly denormalized.

Data Warehouse
• A data warehouse is a database that is physically separate from a company’s operational environments.
Its purpose is to provide decision support: its data repository makes operational data accessible in a form that is readily acceptable for decision support and other user applications. Data warehousing is the process of taking internal data, cleansing it, and storing it in a data warehouse where it can be accessed by various decision makers in the decision-making process. External information is also brought into the data warehouse.

Data Warehouse
• Subject oriented
• Scrubbed so that data from heterogeneous sources are standardized
• Nonvolatile
  – Read only
• Summarized
• Not normalized; may be redundant
• Data from both internal and external sources is present
• Metadata included
  – Data about data
    • Business metadata
    • Semantic metadata

Architecture
• May have one or more tiers
  – Determined by warehouse, data acquisition (back end), and client (front end)
• One tier, where all run on same platform, is rare
• Two tier usually combines DSS engine (client) with warehouse
  – More economical
• Three tier separates these functional parts

Migrating Data
• Business rules
  – Stored in metadata repository
  – Applied to data warehouse centrally
• Data extracted from all relevant sources
  – Loaded through data-transformation tools or programs
  – Separate operation and decision support environments
• Correct problems in quality before data stored
  – Cleanse and organize in consistent manner
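
The migration steps above (extract from all relevant sources, correct quality problems, then load) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a real data-transformation tool: the two source lists, the `cleanse` and `load` functions, and the `sales` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical extracts from two operational sources with inconsistent formats.
SALES_SYSTEM = [
    {"customer": " acme ", "amount": "100.50", "region": "east"},
    {"customer": "Globex", "amount": "n/a", "region": "WEST"},
]
WEB_ORDERS = [
    {"customer": "ACME", "amount": "20", "region": "East"},
]


def cleanse(record):
    """Apply simple business rules; return None if the record cannot be repaired."""
    try:
        amount = float(record["amount"])
    except ValueError:
        return None  # reject rows whose amount is not numeric
    return (record["customer"].strip().title(), amount, record["region"].strip().upper())


def load(conn, sources):
    """Extract from every source, cleanse each record, and load the good ones."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, region TEXT)")
    rejected = 0
    for source in sources:
        for record in source:
            row = cleanse(record)
            if row is None:
                rejected += 1
            else:
                conn.execute("INSERT INTO sales VALUES (?, ?, ?)", row)
    conn.commit()
    return rejected


warehouse = sqlite3.connect(":memory:")  # kept separate from the operational sources
rejected_count = load(warehouse, [SALES_SYSTEM, WEB_ORDERS])
loaded_rows = warehouse.execute(
    "SELECT customer, amount, region FROM sales ORDER BY amount"
).fetchall()
print(loaded_rows, rejected_count)
```

Keeping the rules in one `cleanse` function mirrors the idea of applying business rules centrally: every source passes through the same standardization before anything is stored.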

Data Warehouse Design
• Dimensional modeling
  – Retrieval based
  – Implemented by star schema
    • Central fact table
    • Dimension tables

Star Schema
• A star schema is a technique used to define the structure of a data warehouse. It consists of two components: dimension tables (which define the criteria by which data will be retrieved, e.g., location, product, time) and fact tables (the data that is of interest to the organization). Facts can be highly summarized or detail data.

• Describe the role that a data warehouse can play in MSS. List its benefits.
The data contained in a data warehouse has been cleansed and thus has little redundancy and a higher level of integrity. This gives a higher level of confidence in the decisions made based on the data contained in the warehouse. Benefits include a common storage format, quick access to data for strategic use, and accurate data.

Data Marts
A data mart is a small data warehouse designed for a strategic business unit (SBU) or a department. Data marts can be either dependent or independent. They are important because they can be a cost-effective way to determine the benefits of a data warehouse to an organization.
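
To make the star schema described earlier in this section concrete, here is a small sketch with one central fact table and three dimension tables (product, location, time), followed by a retrieval that summarizes facts by two of the dimensions. All table names, column names, and sample rows are invented for illustration, and SQLite (via Python's sqlite3 module) stands in for a warehouse platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
-- Central fact table: one row per sale, keyed by the three dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    units INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_location VALUES (1, 'Boston'), (2, 'Denver');
INSERT INTO dim_time VALUES (1, 2004, 1), (2, 2004, 2);
INSERT INTO fact_sales VALUES
    (1, 1, 1, 10, 100.0),
    (1, 2, 1, 5, 50.0),
    (2, 1, 2, 3, 90.0),
    (1, 1, 2, 4, 40.0);
""")

# Retrieval: join the fact table to the dimensions that define the criteria.
revenue_by_product_quarter = conn.execute("""
    SELECT p.name, t.quarter, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_time AS t ON t.time_id = f.time_id
    GROUP BY p.name, t.quarter
    ORDER BY p.name, t.quarter
""").fetchall()
print(revenue_by_product_quarter)
```

The fact table holds only keys and measures; descriptive attributes live in the dimension tables, so a new retrieval criterion usually means joining one more dimension rather than restructuring the facts.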

Data Marts
• Dependent
  – Created from warehouse
  – Replicated
    • Functional subset of warehouse
• Independent
  – Scaled down, less expensive version of data warehouse
  – Designed for a department or SBU
  – Organization may have multiple data marts
    • Difficult to integrate

Business Intelligence and Analytics
• Business intelligence
  – Acquisition of data and information for use in decision-making activities
• Business analytics
  – Models and solution methods
• Data mining
  – Applying models and methods to data to identify patterns and trends

OLAP
• OLAP is the “online analytical processing” of data. It allows a user to tap into raw data and perform detailed and complex analysis directly on the client machine, without resorting to back-end processing.

OLAP
• Activities performed by end users in online systems
  – Specific, open-ended query generation
    • SQL
  – Statistical analysis
  – Building DSS applications
• Modeling and visualization capabilities
• Special class of tools
  – DSS/BI front ends
  – Data access front ends
  – Database front ends
  – Visual information access systems

Data Mining
• Organizes and employs information and knowledge from databases
• Statistical, mathematical, artificial intelligence, and machine-learning techniques
• Automatic and fast
• Tools look for patterns
  – Simple models
  – Intermediate models
  – Complex models
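
As one possible example of a "simple model" that looks for patterns, the sketch below counts which pairs of items appear together in purchase transactions and keeps the pairs that occur at least a minimum number of times. The technique (pairwise co-occurrence counting) is chosen here purely for illustration and is not taken from the slides; the transaction data and the `frequent_pairs` function are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each is the set of items in one purchase.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs appearing together in at least `min_support` transactions."""
    counts = Counter()
    for items in transactions:
        # sorted() gives each pair a canonical order so (a, b) and (b, a) match.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(TRANSACTIONS, min_support=2)
print(pairs)
```

Real data mining tools apply far richer statistical and machine-learning methods, but the shape is the same: scan the data automatically and surface patterns that clear a threshold.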

• Differentiate data mining, text mining, and Web mining.
Text mining involves analyzing vast amounts of textual data to determine patterns or correlations within the text. Data mining is a broader subject encompassing all types of information contained within an organization. Web mining extends data mining to include Web resources in the determination of correlations or patterns with organizational data.

Data Visualization
• Technologies supporting visualization and interpretation
  – Digital imaging, GIS, GUI, tables, multidimensions, graphs, VR, 3D, animation
  – Identify relationships and trends
• Data manipulation allows real-time look at performance data

Multidimensionality
• Data organized according to business standards, not analysts
• Conceptual
• Factors
  – Dimensions
  – Measures
  – Time
• Significant overhead and storage
• Expensive
• Complex

Analytic Systems
• Real-time queries and analysis
• Real-time decision making
• Real-time data warehouses updated daily or more frequently
  – Updates may be made while queries are active
  – Not all data updated continuously
• Deployment of business analytic applications

GIS
• Computerized system for managing and manipulating data with digitized maps
  – Geographically oriented
  – Geographic spreadsheet for models
  – Software allows Web access to maps
  – Used for modeling and simulations

• It is said that a relational database is the best for DSS (as compared to hierarchical and network structures). Explain why.
Because of its tabular structure, it is easy to build tables that DSS users like. It is friendly software that allows multiple access queries. It is also relatively easy to convert from one relational system to another.

• Describe the major dimensions of data quality.
• Intrinsic DQ: accuracy, objectivity, and reputation
• Accessibility DQ: accessibility and access security
• Representation DQ: ease of understanding, consistent representation
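
Two of the quality ideas in this chapter lend themselves to simple automated checks: the completeness and conformity checks listed under data integrity, and the consistent representation named in the last bullet above. The following is a minimal sketch, assuming hypothetical records and an ISO-style date format as the agreed representation; the function names `completeness` and `nonconforming` are illustrative only.

```python
import re

# Hypothetical records; None marks a missing value.
RECORDS = [
    {"id": 1, "email": "a@example.com", "date": "2005-01-31"},
    {"id": 2, "email": None, "date": "2005-02-15"},
    {"id": 3, "email": "c@example.com", "date": "15/02/2005"},
]

# Assumed standard representation for dates: YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def completeness(records, field):
    """Fraction of records in which `field` is present (not None)."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def nonconforming(records, field, pattern):
    """IDs of records whose `field` value does not fully match `pattern`."""
    return [
        r["id"]
        for r in records
        if r.get(field) is not None and not pattern.fullmatch(r[field])
    ]


email_completeness = completeness(RECORDS, "email")
bad_dates = nonconforming(RECORDS, "date", ISO_DATE)
print(email_completeness, bad_dates)
```

Checks like these are typically run while populating the warehouse, so that quality problems are measured and corrected before the data is summarized.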