Data Warehouses: They Aren’t Just For The Big Guys Anymore [1]

Bruce W. Johnson, M.S., PMP

Introduction:
Relatively recent advances in data warehousing software, combined with the continued reduction in the cost of information technologies, have made data mining and analysis affordable for most organizations. The benefits of implementing a data warehouse are many, but the bottom line is that it turns our raw data into valuable information that can be used for daily decision-making as well as long-range strategic planning. However, the products purchased and the design strategy employed will have a significant impact on cost and time to deployment.

Do We Really Need a Data Warehouse?
During this time of healthcare reform, managed care and reduced reimbursements, we are confronted with the reality of having to do more with less. The key to survival during these challenging times is to provide good value to our consumers and to manage our business and clinical operations effectively. This typically entails streamlining our workflow through the application of information technology. Most health care and social service organizations have long since automated their “mission critical” applications; however, we are finding that knowledge workers and executive management still need more and better information to be effective. Unlike the days when organizations operated with paper-based systems, today’s computer systems can “bury” us with transactional data and create even more paper. The problem is not a shortage of data, but making sense of the volumes of raw data that we do collect.

The application software vendors have addressed this need in part by creating canned reports and selling third-party report writer software such as MS Access and Crystal Reports. However, this approach is limited in the type of information it can provide, due to limitations of the report writers as well as the design of the application software.
During the 1990s, new tools and advances in information management techniques made it easier to extract data from our operational systems and turn it into useful information. This information can then become part of the organization’s knowledge base.

[1] Adapted by permission from Behavioral Health Management Magazine, January/February 2002, Volume 22, Number 1.

Operational Systems vs. Information Systems:
At this point it is instructive to distinguish between the two basic types of information systems: operational and informational. An operational system is designed to process transactions for a specific purpose. In a billing system, for example, individual charges are compiled into a claim and submitted to an insurance company for payment. These transaction-processing systems are designed to perform the day-to-day operations of the organization and are focused on specific tasks. Their limitation is that, even when the modules are integrated, they provide very limited cross-functional information that can be used for analysis, decision-making and planning. Additionally, the raw data are stored in many files, tables and fields. To further complicate the issue, the names of these objects are often rather cryptic, making it almost impossible for nontechnical staff to create meaningful reports. In most systems, the data are stored only for a finite period of time; after a year or two they are either archived or deleted. Consequently, transaction-processing systems are not an appropriate or convenient source of data for answering questions that require historical information on topics that span multiple functional areas of an organization. The solution to this problem has been to create databases or data warehouses that extract, transform, aggregate, summarize, organize and store data in a format that is more conducive to reporting and analysis.
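
The extract, transform and aggregate steps described above can be illustrated with a small sketch. This is only a hedged example under stated assumptions, not a description of any particular product: the table and column names (chg, chg_dt, dept_cd, amt, monthly_revenue), the department codes and the sample charges are all hypothetical, and an in-memory SQLite database stands in for both the billing system and the data mart.

```python
import sqlite3

# Hypothetical operational data: one row per billing charge, using the kind of
# cryptic column names found in transaction systems. All names are assumptions.
CHARGES = [
    # (chg_dt, dept_cd, amt)
    ("2001-01-15", "OP", 120.0),
    ("2001-01-20", "OP", 80.0),
    ("2001-01-22", "IP", 900.0),
    ("2001-02-03", "OP", 150.0),
]

# Transform step: translate department codes into names a manager can read.
DEPT_NAMES = {"OP": "Outpatient", "IP": "Inpatient"}


def build_mart(charges):
    """Load raw charges, then summarize them into a monthly revenue table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chg (chg_dt TEXT, dept_cd TEXT, amt REAL)")
    db.executemany("INSERT INTO chg VALUES (?, ?, ?)", charges)

    # Extract and aggregate: total charges per month and department code.
    rows = db.execute(
        "SELECT substr(chg_dt, 1, 7) AS month, dept_cd, SUM(amt) AS revenue "
        "FROM chg GROUP BY month, dept_cd ORDER BY month, dept_cd"
    ).fetchall()

    # Load: store the summarized rows under readable names.
    db.execute(
        "CREATE TABLE monthly_revenue (month TEXT, department TEXT, revenue REAL)"
    )
    db.executemany(
        "INSERT INTO monthly_revenue VALUES (?, ?, ?)",
        [(month, DEPT_NAMES.get(code, code), total) for month, code, total in rows],
    )
    return db


def monthly_report(db):
    """Return the summarized rows, ordered by month and department."""
    return db.execute(
        "SELECT month, department, revenue FROM monthly_revenue "
        "ORDER BY month, department"
    ).fetchall()


if __name__ == "__main__":
    for row in monthly_report(build_mart(CHARGES)):
        print(row)
```

A report writer pointed at the monthly_revenue table would see one readable row per month and department rather than individual charges, which is the practical benefit of the summarized format.
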
The raw transaction data is transformed and organized by subject matter, which may span multiple departments and/or functions.

Data Warehouses Defined:
A data warehouse is a repository of integrated information extracted from operational systems and stored in a standardized format. This standardization makes it much easier and more efficient to run queries, reports and analyses on data that originally may have come from a heterogeneous collection of systems, representing different applications running under different operating environments, and with different data formats. In general, there are two types of data warehouses: large enterprise-wide, centralized databases and smaller stand-alone or integrated data marts.

Centralized, Enterprise-Wide Data Warehouse:
The centralized data warehouse is typically designed to span multiple divisions, departments and functions throughout a complex organization. It serves to consolidate and integrate data from multiple operational systems. The data warehouse database is created with a very robust database management system like Oracle, Sybase or SQL Server.

Data Marts:
The difference between a centralized data warehouse and a data mart is generally one of size and focus. A data mart is typically (although not always) smaller than a centralized data warehouse and has a narrower focus. The information in a data mart can be derived directly from a transaction processing system, a central data warehouse and/or other smaller databases and spreadsheets. Data marts can be created using a database management system, e.g. MS Access or SQL Server, or various ETL (extraction, transformation and loading) software packages like Cognos DecisionStream and TableTrans. A data mart can be an add-on component of a large centralized, enterprise-wide data warehouse, or simply a smaller version of a data warehouse. In larger organizations, data marts are designed to extract data from a larger centralized data warehouse.
They are generally created to address the information requirements of an individual department or function, so they tend to have a narrower scope. They can also be set up to relieve some of the development and processing work of the IT department, since the queries, reports and analyses can be created without programming. In smaller organizations one or more data marts may be used as the entire data warehouse. A single data mart designed with add-on query and OLAP tools may be all that is required for the data warehouse.

For a larger organization a distributed data warehouse design may be employed. In this design strategy a series of integrated data marts is created to collect and manage your information. In this arrangement, the data marts are referred to as being integrated since there is consistency among the design, format and overall organization of the databases. The tables, fields, codes, and update protocols are all designed in a consistent fashion. This common design assures interoperability, so data can be aggregated across individual data marts. Additionally, most data marts are SQL and ODBC compliant, so the data can be accessed by most popular report writers. Without these design standards it is easy to create inconsistencies among your reports and analyses.

Advantages to Decentralized Data Marts:
The advantages of employing data marts as your data warehouse are that they typically cost less, can be deployed in less time, require less technical expertise to create and maintain, and are easier to use. A large centralized data warehouse can cost from $500,000 to several million dollars and take 2 to 3 years to implement. Such warehouses typically require a dedicated team of analysts and programmers and are the domain of the IT department. On the other hand, a series of data marts can be developed in a matter of months for $10,000 to $500,000.
Additionally, the data marts can be created by individual business units with limited support from IT professionals.

Data Mining and Decision Support Tools:
There are a number of tools sold by data warehouse software vendors that are used to access the information in the data mart and create the actual decision support system (DSS) and executive information system (EIS). In order of increasing ease-of-use they include:

1. Ad hoc report writers
2. OLAP (Online Analytical Processing) software
3. Web portals
4. Dashboards

The ad hoc report writers are query tools that are used to search data marts, central data warehouses or operational systems. They can be purchased as stand-alone products, e.g. Crystal Reports, or as an integrated component of a suite of business intelligence software, e.g. Cognos Query. They are appropriate when the user will be continually conducting new and different searches against a detailed data source.

When detailed information is required, but you would like to shield the user from having to access the native data files with a query tool, other data mining tools like Cognos Impromptu can be deployed. These products can be used to create catalogs of information, organized by subject matter that is relevant to the end user. The catalogs are linked to a data mart or operational data source. They are folders organized by content areas, e.g. admissions, billing, outcomes, and contain only the fields, prompts, and functions that are appropriate to the subject of the data mart. The folders, functions and field names of the catalogs are also set up with names that can be easily identified with the information or function they represent. The catalogs are then used to create various queries, reports, graphs and analyses that can be run and printed or published to a web page.

One of the more exciting products employed by management in a DSS or EIS is OLAP software.
This software is used to design and build multi-dimensional “cubes” of data from catalogs of information. Each cube can be thought of as a three-dimensional matrix in which each edge (or axis) represents a dimension. Each dimension can then have multiple (nested) levels arranged in a hierarchy. The cells, or intersections of the different dimensions, contain the measures being studied. A dimension might be location, with the levels of country, state, county, city, etc. A measure might be revenue. The OLAP tools typically work with summary information but provide the user with the ability to “drill down” into the detailed data using one of the other tools. An example of an OLAP program is Cognos PowerPlay.

In the better business intelligence software, the report writers and OLAP tools can be integrated with a Web portal and/or a “dashboard” to create a DSS or EIS. Reports can be scheduled for delivery to a personal Web page and can be run weekly, daily or even hourly. Visualization software is also available that is used to create dashboards with hyperlinks, icons and other graphics that represent various reports, analyses and events. A personalized dashboard may be created with a graphic, e.g. a bar graph, that represents various key performance indicators for your organization. The chart can be color coded so that emergencies are highlighted in red, areas of a less critical nature in yellow, etc. When you click any area of the chart, it drills down further to a table or chart that provides the user with additional summary information in a multi-dimensional report. The dimensions can be expanded for even more detail, or the user can drill down to the raw data. Some systems also employ a notification feature that will e-mail or page the user when a critical event has occurred.

Summary:
Contemporary managers and executives have grown to recognize that data warehouses can be used as a strategic asset to transform their raw data into valuable information.
Continual monitoring of key performance indicators and comparison with historical information can be used for day-to-day decision making as well as long-range planning. New technologies and implementation strategies have put these important management tools within reach of most organizations.

About the Author:
Bruce W. Johnson, MS, PMP is the CEO of Johnson Consulting Services, Inc. He is an information management consultant who specializes in working with social service, healthcare and government agencies. He can be reached at (800) 988-0934 or by e-mail at jcsinc@fuse.net.