Data Administration
Data Warehouse Environment (DWE) Implementation
8/19/04

Original Plan for DWE (diagram, 8/11/2004)
• Enterprise Source Data: IDMS, Oracle, flat files.
• External Data: census data, benchmark, salary surveys, economic data.
• Data Staging Area (extract data, transform data, quality assurance, create metadata) and Metadata.
• Ad Hoc Query Repository (Operational): copy of source data, daily updates, all elements, minimum number of years of data.
• WebFOCUS (Reporting): ad hoc and operational reports.
• OLAP Server.
• Data Warehouse (Strategic): cleansed subset of detail data, subset of summary data, multiple years of data, periodic updates.
• SAS Data Mining Server.
• Data Mart #1 Course Management and Data Mart #2 Resource Management (Tactical): subset of DW, summarized in a specific manner.
• EIS: “Tell me everything I need to know and what is important, but do it quickly and easily!”
Other user questions shown on the diagram: “Tell me what happened?” (ad hoc and operational reports); “Tell me what happened and why?”; “Tell me what may happen, or what is interesting?”; “Give me information to help me achieve specific goals!”
Legend: 1) Wide black border indicates physical servers. 2) Narrow black border indicates that no decision has been made on whether it will be a separate physical server. 3) Gray background indicates BI or analytical software servers.

DWE Terms
• Source Data: Operational data from internal systems, such as IDMS (FES, FRS, HRS, SIS), Oracle, etc.
• External Data: Data from systems external to the University, such as economic and census data collected by the government.
• Data Staging Area: Storage and processing area for data extracted from the internal and external systems before it is loaded into the Data Warehouse, Data Marts or Ad Hoc Query Repository. Some of the data will remain uncleansed, an exact replica of the data in the online systems, for subsequent loading into the Ad Hoc Query Repository. Other data will be cleansed and transformed before being moved to the Data Warehouse and Data Marts for analysis. Some data will be stored in multiple places, in multiple forms and aggregations. (Also known as an ETL, or Extract, Transform and Load, server.)
• Metadata: Data that describes or specifies other data. It defines all of the characteristics of data required to build databases and applications and to support knowledge workers and information producers, including data element name, meaning, format, domain values, business integrity rules, relationships, owner, etc.
• Ad Hoc Query Repository: A collection of enterprise data from multiple sources, used for ad hoc and operational reporting where the most current, unstandardized source data is required. The Repository will typically contain only the most recent one or two years of data, unless regulatory or statutory requirements dictate otherwise. (Also known as an Operational Data Store, or ODS.)
• Data Warehouse: An enterprise-wide, cross-functional, cross-organizational database, typically composed of data extracted, cleansed and/or summarized from multiple online transaction processing systems and other data stores (Purdue University; Stanford University). It is designed for query and analysis, typically contains historical data, and is used to present information that supports decision making and tactical and strategic business processes. A data warehouse tends to start from an analysis of what data already exists and how it can be collected so that the data can later be used. In general, a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need. (Improving Data Warehouse and Business Information Quality, Larry P. English, 1999.)
• Data Mart: A subset of enterprise data from the Data Warehouse that is summarized and stored in an optimal fashion for analysis and presentation of information to support trend analysis and tactical decisions and processes. Data Marts are typically designed, based on an analysis of user needs, to answer specific questions in the pursuit of specific goals. The scope can be that of a complete data subject, such as Student, or of a particular business area or line of business, such as Enrollment. (Improving Data Warehouse and Business Information Quality, Larry P. English, 1999.)
• Enterprise Reporting: A category of software technology that enables the development, organization, sharing, execution, delivery and scheduling of reports via a web platform.
• On-Line Analytical Processing (OLAP): A category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various “what-if” data model scenarios. This is achieved through the use of an OLAP Server. (http://www.moulton.com/olap/olap.glossary.html) Functionality includes multidimensional analysis, slicing, drill-down and rotation.
• Data Mining: A class of database applications that look for hidden patterns in a group of data. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that presents data in new ways. True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data. (http://www.webopedia.com/TERM/d/data_mining.html)
• Executive Information System (EIS): An application, such as a dashboard, developed to give senior management direct access to information relevant to an organization’s goals and performance. These applications gather, analyze and integrate internal and external data to give management insight into key performance indicators, potential problems and changes in the environment. Typical features include extensive use of graphics, simple navigational controls, automatic replacement of report contents, drill-down analysis, trend analysis capabilities, exception reporting or alerts, graphical charts with links to underlying reports, provision of data from multiple sources, and the highlighting of information an executive feels is critical. (The Data Warehouse Lifecycle Toolkit, Ralph Kimball, et al.)

Components of a Decision Support System (Covansys)
What is a Decision Support System?
• EIS: high-level summarized data for top executives (“pre-programmed DASHBOARD”).
• Data Mart: addresses a specific subject area.
• Data Warehouse: collection of integrated, subject-oriented databases (historical).
• Operational Data Store: time-current, integrated databases (tactical; power users).

Current DWE (diagram)
Components shown: Enterprise Source Data (IDMS, Oracle, flat files); External Data (census data, benchmark, salary surveys, economic data); Data Staging Area (extract data, transform data, quality assurance, create metadata) and Metadata; Ad Hoc Query Repository (Operational: copy of source data, daily updates, all elements, minimum number of years of data); WebFOCUS (Reporting); Data Warehouse (Strategic); Data Mart #1 Course Management (Tactical); SAS Data Mining Server. The EIS, OLAP Server and Data Mart #2 Resource Management from the original plan do not appear.
Legend: 1) Wide black border indicates physical servers. 2) Narrow border indicates that no decision has been made on whether it will be a separate physical server. 3) Red border indicates a component under development. 4) Gray background indicates BI or analytical software servers.
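The Data Staging Area steps named in both diagrams (extract data, transform data, quality assurance, create metadata) can be illustrated with a toy pipeline. This is a minimal sketch, assuming a small CSV flat-file extract with hypothetical field names (STU_ID, CRS_CD, ENRL_DT) and made-up rows; the actual ETL runs in Informatica PowerCenter, and nothing here reflects that product's API. The metadata step records only each element's name and type, a small fraction of what the Metadata definition above calls for.

```python
import csv
import io
from datetime import datetime

# Hypothetical flat-file extract; field names and rows are illustrative only.
RAW_EXTRACT = """STU_ID,CRS_CD,ENRL_DT
1001,MA 113,2004-08-23
1002, CS 115 ,2004-08-24
1003,,2004-08-24
"""

def extract(text):
    """Extract: read rows from a flat-file export into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace and convert the date text to date objects."""
    out = []
    for row in rows:
        clean = {key: value.strip() for key, value in row.items()}
        clean["ENRL_DT"] = datetime.strptime(clean["ENRL_DT"], "%Y-%m-%d").date()
        out.append(clean)
    return out

def quality_check(rows, required=("STU_ID", "CRS_CD")):
    """Quality assurance: reject rows with an empty required field."""
    accepted, rejected = [], []
    for row in rows:
        (accepted if all(row[field] for field in required) else rejected).append(row)
    return accepted, rejected

def create_metadata(rows):
    """Create metadata: map each element name to its Python type name."""
    if not rows:
        return {}
    return {name: type(value).__name__ for name, value in rows[0].items()}

rows = transform(extract(RAW_EXTRACT))
accepted, rejected = quality_check(rows)
metadata = create_metadata(accepted)
```

Rows that fail quality assurance are kept in a separate list rather than discarded, so they can be reported back to the data owners, in line with the data quality focus later in the deck.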
8/17/2004

DWE Current Resources
– Query Repository
  Production: PowerEdge 6650, 4 x 2.8GHz CPUs, 4GB RAM, 1.2TB storage, Windows Server 2003
  Development: PowerEdge 2650, 1 x 3.0GHz CPU, 2GB RAM, 252GB storage, Windows Server 2003
  Software: Oracle Enterprise
– ETL
  Production: Dell PowerEdge 6650, 4 x 2.0GHz CPUs, 2TB storage, Windows 2000 Advanced Server
  Development: Dell PowerEdge 6650, 2 x 2.0GHz CPUs, 1TB storage, Windows 2000 Advanced Server
  Software: Informatica PowerCenter
– Enterprise Reporting
  Production: PowerEdge 2650, 2 x 2.8GHz CPUs, 4GB RAM, 291GB storage, Windows Server 2003 Standard
  Development: PowerEdge 2550, 2 x 1.27GHz CPUs, 1GB RAM, 220GB storage, Windows 2000 Server
  Software: WebFOCUS
– Statistical Analysis
  Server: Dell PowerEdge 2550, 2 x 1.4GHz CPUs, 4GB RAM, 144GB storage, Windows 2000
  Software: SAS Enterprise Miner, Enterprise Guide, etc.

DWE Tasks
– DBA (1-2 FTE): design the Oracle DB, write/run ETL jobs, and provide production support (e.g., monitor system and DB performance, enforce security, schedule backups).
– Data Administration (2-3 FTE): interface with users, develop requirements documents for all DW projects and new views, evaluate data quality, develop specialized reports, test, train users and coordinate projects.
– Reporting (1-2 FTE): develop enterprise reports.
– All: infrastructure design (with Systems staff) and tool evaluation (ETL, OLAP and desktop reporting), with help from the C/S group.

Implementation Strategy – Educate Users
• Basics: “What is a Data Warehouse?” Create a “single source of truth.” “What it’s not!” (It is not all the data, with daily updates and online storage.)
• Change in culture: “Let’s make better decisions based on objective analysis of data.”
• Set realistic expectations: there is no silver bullet. It can help you make better decisions, but you are still responsible for implementing those decisions.
• Focus on institutional goals: “What is it we need to achieve? What metrics do we need to evaluate our progress in attaining goals?”
• Importance of business sponsors: make timely business decisions and support requests for necessary resources.

Implementation Strategy – Requirements
• Develop the DWE in a phased approach.
• Develop detailed requirements documents with users and institutional administrators for applications within the DWE (DW/DM and reports).

Course Management (I.V.C.)
Business Functions and Goals
• Optimize course offerings to meet student need.
Improvement Opportunities
• Increase the number of high-demand courses/sections.
• Increase maximum enrollment in sections.
• Eliminate or reduce the frequency of low-demand courses.
• Improve course meeting patterns and delivery mode.
Performance Measures
• # and % decrease of students who do not get any section of the course requested
• # and % decrease of low-demand courses
• # and % increase in enrollment
• % usage of classroom capacity
• % decrease in length of time to graduate
• # and % increase in courses taught through preferred mode
Business Questions
• What are the characteristics of high/low-demand courses?
• What characteristics of the student are related to demand?
• What courses can be eliminated?
• Which courses should/can be moved to smaller/larger facilities?
• What impact do meeting time and location have on demand?
• What improvements can be made with/without additional money?

Data Model (diagram; American Management Systems, Inc.)
Labels shown: College, Budgets, Degree Reqs., Student, Facilities, Course Demand, Courses Available, Faculty, Enrollment, Economic Data and Data Mart/Warehouse, with one relationship labeled “Defines.”

Implementation Strategy – Data Quality
• Focus on improving data quality and on establishing standards for data view names, element names and data content.

Implementation Strategy – Enterprise Reports
• Gather user input on the most important reports required by many users, and develop these reports with an enterprise reporting tool that allows us to deliver predefined, parameter-driven reports via the web.
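A predefined, parameter-driven report keeps the query fixed and lets users supply only parameter values. The sketch below uses Python's sqlite3 module as a stand-in for the Oracle Query Repository and the reporting tool; the view name V_ENRL (created here as a plain table), its columns, the term codes and the sample rows are all hypothetical, and this is not WebFOCUS syntax.

```python
import sqlite3

# Predefined report: the SQL is written once by report developers.
# V_ENRL, its columns and the term codes are hypothetical examples.
ENROLLMENT_BY_COURSE = """
    SELECT CRS_CD, COUNT(*) AS ENRL_CNT
    FROM V_ENRL
    WHERE TERM_CD = :term
    GROUP BY CRS_CD
    ORDER BY CRS_CD
"""

def run_enrollment_report(conn, term):
    """Run the predefined report for one term. The term is passed as a bind
    parameter, so user input never becomes part of the SQL text."""
    return conn.execute(ENROLLMENT_BY_COURSE, {"term": term}).fetchall()

def build_sample_db():
    """Create an in-memory database holding a few made-up enrollment rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE V_ENRL (STU_ID TEXT, CRS_CD TEXT, TERM_CD TEXT)")
    conn.executemany(
        "INSERT INTO V_ENRL VALUES (?, ?, ?)",
        [
            ("1001", "CS 115", "200408"),
            ("1002", "CS 115", "200408"),
            ("1003", "MA 113", "200408"),
            ("1004", "MA 113", "200401"),
        ],
    )
    return conn

if __name__ == "__main__":
    conn = build_sample_db()
    for course, count in run_enrollment_report(conn, "200408"):
        print(course, count)
```

Binding the parameter instead of concatenating it into the SQL keeps one reviewed query serving every user and means a malicious parameter value simply matches no rows.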
2001-2002: Infrastructure and Planning
1) Create IDMS data dump to Oracle
2) Implement WebFOCUS
3) Purchase data mining tools and server for IR
4) Create views for Query Repository (ad hoc reporting repository)
5) Establish enterprise standards for key data – analysis and recommendations are ongoing
6) Identify and prioritize data mart development – Course Management Data Mart is the top priority for Data Stewards
7) Initiate GASB – Phase I
8) Initiate data quality projects
9) Review desktop reporting tools – ongoing review and testing of:
  • Brio
  • Crystal Reports
  • SAS
  • WebFOCUS

2002-2003: Data Mart Development, etc.
1) Complete GASB – Phase I
2) Implement SAS data mining server
3) Conduct data quality projects – vendor, facilities, FRS, TA data
4) Select and purchase ETL tool
5) Begin requirements on Course Management DM
6) Define standards for data view and element names

2003-2004: DWE Upgrades and User Support
1) Implement ETL tool
2) Upgrade database servers
3) Create Metadata application – “Data about data”
4) Conduct SAS data mining project on freshmen data
5) Provide user and technical training on reporting tools; support listservs and web page
6) Purchase enterprise reporting tool and develop reports
7) Create new data views with standardized names
8) Complete GASB – Phase II
9) Continue development of the Course Management DM requirements
10) Initiate development of the requirements for the Resource Management DM

2004-2005: SAP, etc.
1) Complete standardization of remaining data views
2) Create additional enterprise reports
3) Evaluate SAP Business Warehouse (BW)
4) Conduct extensive data quality analysis for SAP

Reporting Web Site and Metadata
1) Reporting URL: https://reporting.uky.edu/
2) Metadata URL: http://iweb.uky.edu/RptDataDesc/
3) Metadata directions URL: http://www.uky.edu/IS/DataAdmin/DOCS/metadata/MetadataDirections.pdf
4) Data element standards URL: http://www.uky.edu/IS/DataAdmin/DOCS/ware/IUUN 0020-QRVE/QRVENamingStds/DataElementNamingStds.pdf
5) Data Administration URL: http://www.uky.edu/IT/DataAdmin/

Naming Standards
1) All data view names start with “V_”.
2) All standard element names are composed of words:
  – Prime (required): describes the subject area of the data (e.g., account, student, department, course).
  – Qualifier (optional): further defines and distinguishes the “prime” and “class” words (e.g., gender, ethnic, first).
  – Class (required): describes the major classifications or types of data (e.g., name, date, code, amount).
3) Standard name: “Prime”_“Qualifier”_“Class”, using standard abbreviations.
4) Examples: view V_POSTN; element POSTN_BEG_DT.

Current Query Repository Data
1) UKFRS_FOC and UKHRS_FOC: to be used by WebFOCUS.
2) UKFRS_SYB: will be removed within 3-4 months.
3) GASB: nonstandard views used by OC in producing institutional financial statements.
4) UKFRS_RPT, UKHRS_RPT, UKSIS_RPT and UKSIS_FAMSBR: standardized views will be created over the next couple of months, and old views will be removed 90 days after the new views are available. Purchasing views in UKFRS_RPT are in development. UKHRS_RPT also contains standard Labor Distribution views.
5) UKHRS_STAT_RPT: HRS Stat File standard views are currently in development and being tested.

DWE/SAP Issues
1. How does the SAP Business Warehouse functionality compare to what we originally planned for the DWE?
2. Will the SAP BW replace our Data Warehouse/Marts?
3. Should we continue our plans for the historical legacy data in the DWE, and use the SAP BW for data “from this point forward”?
4. Can we “merge/join” historical data with the new data in SAP, and if so, how?
5. What are our options to “interface” the SAP BW with our DWE (API, etc.)?
6. Should the SAP BW feed our DWE or vice versa?
7. How much data (in years) should we load into the SAP OLTP system?
8. How much data (in years) should we load directly into the SAP BW?
9. What level of detail data should be loaded into the SAP BW if the corresponding data is not available in the OLTP system?
10. Should we continue with the “data mart” concept within the SAP environment?
11. How easy is it to add new functionality to the SAP BW (data, reports, “cubes”, etc.)?

Data Administration
QUESTIONS?
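The Naming Standards slide describes names that are regular enough to check mechanically. This is a minimal sketch, assuming names are uppercase abbreviations separated by underscores, element names follow Prime_Class or Prime_Qualifier_Class exactly, and the class word comes from a hypothetical list of approved abbreviations (DT, CD, NM, AMT). The official abbreviation list is in the data element standards document and is not reproduced here.

```python
import re

# Hypothetical approved class-word abbreviations (date, code, name, amount);
# the real list is defined in the data element standards document.
CLASS_WORDS = {"DT", "CD", "NM", "AMT"}

# Uppercase alphanumeric words separated by single underscores.
WORD_PATTERN = re.compile(r"^[A-Z0-9]+(_[A-Z0-9]+)*$")

def is_standard_view_name(name):
    """A data view name must start with "V_" followed by at least one word."""
    return name.startswith("V_") and bool(WORD_PATTERN.match(name[2:]))

def split_element_name(name):
    """Split an element name into (prime, qualifier, class word).

    Accepts Prime_Class or Prime_Qualifier_Class; the qualifier is None
    when absent. Raises ValueError for names that do not fit the standard.
    """
    if not WORD_PATTERN.match(name):
        raise ValueError(f"{name!r} is not uppercase words separated by underscores")
    words = name.split("_")
    if len(words) == 2:
        prime, qualifier, class_word = words[0], None, words[1]
    elif len(words) == 3:
        prime, qualifier, class_word = words
    else:
        raise ValueError(f"{name!r} should be Prime_Class or Prime_Qualifier_Class")
    if class_word not in CLASS_WORDS:
        raise ValueError(f"{name!r} ends in {class_word!r}, not an approved class word")
    return prime, qualifier, class_word
```

With the slide's examples, is_standard_view_name("V_POSTN") is true and split_element_name("POSTN_BEG_DT") yields the prime POSTN, the qualifier BEG and the class word DT. A check like this could run when new standardized views are created, before old views are retired.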