Data Warehouse Terminology: - Johnson Consulting Services

Data Warehouses: They Aren’t Just For The Big Guys Anymore [1]
Bruce W. Johnson, M.S., PMP
Introduction:
Relatively recent advances in data warehousing software, combined with the continued
reduction in the cost of information technologies, have made data mining and analysis
affordable for most organizations. The benefits of implementing a data warehouse are
many but the bottom line is that they turn our raw data into valuable information that can
be used for daily decision-making as well as long-range strategic planning. However, the
products purchased and the design strategy employed will have a significant impact on
cost and time to deployment.
Do We Really Need a Data Warehouse?
During this time of healthcare reform, managed care and reduced reimbursements, we are
confronted with the reality of having to do more with less. The key to survival during
these challenging times is to provide a good value to our consumers and to manage our
business and clinical operations effectively. This typically entails streamlining our
workflow through the application of information technology. Most health care and social
service organizations have long since automated their “mission critical” applications;
however, we are finding that knowledge workers and executive management still need
more and better information to be effective. Unlike the days when organizations were
operating with a paper-based system, today our modern computer systems can “bury” us
with transactional data and create even more paper. The problem is not having enough
data, but making sense of the volumes of raw data that we do collect. The application
software vendors have addressed this need in part by creating canned reports and selling
third-party report writer software such as MS Access and Crystal Reports. However, this
approach is limited in the type of information that can be provided due to limitations of
the report writers as well as the design of the applications software. During the 1990s,
new tools and advances in information management techniques made it easier to extract
data from our operational systems and turn it into useful information. This information
can then become part of the organization's knowledge base.
Operational Systems vs Information Systems:
At this point it is probably instructive to distinguish between the two basic types of
information systems: operational and informational. An operational system is designed
to process transactions for a specific purpose. In a billing system, for example, individual
charges are compiled into a claim and submitted to an insurance company for payment.
These transaction-processing systems are designed to perform the day-to-day operations
of the organization. They are focused on specific tasks. The limitation of these systems
is that even though the modules are integrated, they provide very limited cross-functional
information that can be used for analysis, decision-making and planning. Additionally,
the raw data are stored in many files, tables and fields. To further complicate the issue,
the names of these objects are often rather cryptic, making it almost impossible for
nontechnical staff to create meaningful reports. In most systems, the data are stored for
only a finite period of time, and after a year or two they are either archived or deleted.
Consequently, transaction-processing systems are not an appropriate or convenient source
of data for answering questions that require historical information on topics that span
multiple functional areas of an organization.
[1] Adapted by permission from Behavioral Health Management Magazine, January/February
2002, Volume 22, Number 1.
The solution to this problem has been to create databases or data warehouses that extract,
transform, aggregate, summarize, organize and store data in a format that is more
conducive to reporting and analysis. The raw transaction data is transformed and
organized by subject matter, which may span multiple departments and/or functions.
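The extract, transform and summarize steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (PT_ID, SVC_DT, DEPT_CD, CHG_AMT), the department codes and the sample rows are all invented, and a real warehouse would read from the operational database rather than from a list in memory.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw rows as they might be extracted from a billing system.
# The cryptic field names and all values are invented for illustration.
raw_charges = [
    {"PT_ID": "1001", "SVC_DT": "2001-11-03", "DEPT_CD": "OP", "CHG_AMT": "120.00"},
    {"PT_ID": "1002", "SVC_DT": "2001-11-17", "DEPT_CD": "OP", "CHG_AMT": "80.50"},
    {"PT_ID": "1001", "SVC_DT": "2001-12-01", "DEPT_CD": "IP", "CHG_AMT": "950.00"},
]

# Assumed lookup table that turns department codes into readable names.
DEPARTMENT_NAMES = {"OP": "Outpatient", "IP": "Inpatient"}


def transform(row):
    """Convert one raw row into a typed record with readable field names."""
    service_date = date.fromisoformat(row["SVC_DT"])
    return {
        "client_id": row["PT_ID"],
        "month": service_date.strftime("%Y-%m"),
        "department": DEPARTMENT_NAMES[row["DEPT_CD"]],
        "charge": float(row["CHG_AMT"]),
    }


def summarize(records):
    """Aggregate charges by (month, department)."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["month"], rec["department"])] += rec["charge"]
    return dict(totals)


monthly_charges = summarize(transform(r) for r in raw_charges)
for (month, department), total in sorted(monthly_charges.items()):
    print(f"{month}  {department:<10}  {total:10.2f}")
```

The summary table, rather than the individual charge rows, is what would be stored in the warehouse for reporting.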
Data Warehouses Defined:
A data warehouse is a repository of integrated information extracted from operational
systems that is stored in a standardized format. This standardization makes it much
easier and more efficient to run queries, reports and analyses on data that originally may
have come from a heterogeneous collection of systems, representing different
applications running under different operating environments, and with different data
formats. In general, there are two types of data warehouses: large enterprise-wide,
centralized databases and smaller stand-alone or integrated data marts.
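As a rough illustration of what "standardized format" means in practice, the sketch below maps records from two source systems with different layouts onto one common record shape. Both source layouts, the field names and the sample values are assumptions made up for this example.

```python
from datetime import datetime


def from_billing(row):
    """Standardize a row from a hypothetical billing system
    (single-letter gender codes, MM/DD/YYYY dates, padded IDs)."""
    return {
        "client_id": row["PT_ID"].strip(),
        "gender": {"M": "male", "F": "female"}.get(row["SEX"], "unknown"),
        "admit_date": datetime.strptime(row["ADM_DT"], "%m/%d/%Y").date().isoformat(),
        "source": "billing",
    }


def from_clinical(row):
    """Standardize a row from a hypothetical clinical system
    (numeric IDs, spelled-out gender, ISO dates)."""
    return {
        "client_id": str(row["client"]),
        "gender": row["gender"].lower(),
        "admit_date": row["admitted"],
        "source": "clinical",
    }


# Invented sample rows, one per source system.
billing_rows = [{"PT_ID": " 1001 ", "SEX": "F", "ADM_DT": "11/03/2001"}]
clinical_rows = [{"client": 1002, "gender": "Male", "admitted": "2001-11-17"}]

# Every record in the warehouse now has the same fields and encodings.
warehouse = [from_billing(r) for r in billing_rows] + [
    from_clinical(r) for r in clinical_rows
]
for record in warehouse:
    print(record)
```

Once every source is converted this way, a single query or report can run across all of them without knowing where each record originated.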
Centralized, Enterprise-Wide Data Warehouse:
The centralized data warehouse is typically designed to span multiple divisions,
departments and functions throughout a complex organization. It serves to consolidate
and integrate data from multiple operational systems. The data warehouse database is
created with a very robust database management system like Oracle, Sybase or SQL
Server.
Data marts:
The difference between a centralized data warehouse and a data mart is generally one of
size and focus. A data mart is typically (although not always) smaller than a centralized
data warehouse and has a narrower focus. The information in a data mart can be
derived directly from a transaction processing system, a central data warehouse and/or
other smaller databases and spreadsheets. They can be created using a database
management system, e.g. MS Access or SQL Server, or various ETL (extraction,
transformation and loading) software packages like Cognos DecisionStream and
TableTrans.
A data mart can be an add-on component of a large centralized, enterprise-wide data
warehouse, or simply a smaller version of a data warehouse. In larger organizations, data
marts are designed to extract data from a larger centralized data warehouse. They are
generally created to address the information requirements of an individual department or
function, so they tend to have a narrower scope. They can also be set up to relieve
some of the development and processing work of the IT department since the queries,
reports and analyses can be created without programming.
In smaller organizations one or more data marts may be used as the entire data
warehouse. A single data mart designed with add-on query and OLAP tools may be all
that is required for the data warehouse. For a larger organization a distributed data
warehouse design may be employed. In this design strategy a series of integrated data
marts are created to collect and manage your information. In this arrangement, the data
marts are referred to as being integrated since there is consistency among the design,
format and overall organization of the databases. The tables, fields, codes, and update
protocols are all designed in a consistent fashion. This common design assures
interoperability so data can be aggregated across individual data marts. Additionally,
most data marts are SQL and ODBC compliant so the data can be accessed by most
popular report writers. Without these design standards it is easy to create inconsistencies
among your reports and analyses.
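The value of a common design can be shown with a small sketch that uses Python's built-in sqlite3 module. Two in-memory databases stand in for two data marts; because both use the same table layout, one SQL query can aggregate across them. The mart names, table layout and figures are invented for illustration and are not taken from any particular product.

```python
import sqlite3

# One connection with two attached in-memory databases standing in for
# two hypothetical data marts.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS outpatient_mart")
conn.execute("ATTACH DATABASE ':memory:' AS inpatient_mart")

# The shared design: identical table, fields and codes in every mart.
SCHEMA = "(client_id TEXT, service_month TEXT, program_code TEXT, revenue REAL)"
for mart in ("outpatient_mart", "inpatient_mart"):
    conn.execute(f"CREATE TABLE {mart}.services {SCHEMA}")

conn.executemany(
    "INSERT INTO outpatient_mart.services VALUES (?, ?, ?, ?)",
    [("1001", "2001-11", "OP", 120.0), ("1002", "2001-11", "OP", 80.5)],
)
conn.executemany(
    "INSERT INTO inpatient_mart.services VALUES (?, ?, ?, ?)",
    [("1001", "2001-11", "IP", 950.0), ("1003", "2001-12", "IP", 400.0)],
)

# Because the layouts match, the marts can be combined in a single query.
rows = conn.execute(
    """
    SELECT service_month, SUM(revenue) AS total_revenue
    FROM (
        SELECT * FROM outpatient_mart.services
        UNION ALL
        SELECT * FROM inpatient_mart.services
    ) AS all_services
    GROUP BY service_month
    ORDER BY service_month
    """
).fetchall()

for service_month, total_revenue in rows:
    print(service_month, total_revenue)
```

If the two marts had used different field names or month formats, the combined query would either fail or silently produce inconsistent totals, which is the risk the design standards are meant to prevent.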
Advantages to Decentralized Data Marts:
The advantages to employing data marts as your data warehouse are that they typically
cost less, can be deployed in less time, require less technical expertise to create and
maintain, and are easier to use. A large centralized data warehouse can cost from
$500,000 to several million dollars and take 2 to 3 years to implement. Such warehouses
typically require a dedicated team of analysts and programmers and are the domain of the IT
department. On the other hand, a series of data marts can be developed in a matter of
months for $10,000 to $500,000. Additionally, the data marts can be created by
individual business units with limited support from IT professionals.
Data Mining and Decision Support Tools:
There are a number of tools sold by data warehouse software vendors that are used to
access the information in the data mart and create the actual decision support system
(DSS) and executive information system (EIS). In order of increasing ease-of-use they
include:
1. Ad hoc report writers
2. OLAP (Online Analytical Processing) software
3. Web portals
4. Dashboards
The ad hoc report writers are query tools that are used to search data marts, central data
warehouses or operational systems. They can be purchased as stand-alone products, e.g.
Crystal Reports, or as an integrated component of a suite of business intelligence
software, e.g. Cognos Query. They are appropriate when the user will be continually
conducting new and different searches against a detailed data source.
When detailed information is required, but you would like to shield the user from having
to access the native data files with a query tool, other data mining tools like Cognos
Impromptu can be deployed. These products can be used to create catalogs of
information, organized by subject matter that is relevant to the end user. The catalogs are
linked to a data mart or operational data source. They consist of folders organized by content
areas, e.g. admissions, billing, outcomes, and contain only the fields, prompts, and
functions that are appropriate to the subject of the data mart. The folders, functions and
field names of the catalogs are also set up with names that can be easily identified with
the information or function they represent. The catalogs are then used to create various
queries, reports, graphs and analyses that can be run and printed or published to a web
page.
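The catalog idea can be pictured as a lookup layer between readable names and the underlying tables. The sketch below is a generic illustration, not how Cognos Impromptu or any other product is implemented; the folder, table and column names are all hypothetical.

```python
# A hypothetical catalog: each folder exposes only the fields relevant to
# its subject, under names a nontechnical user can recognize.
CATALOG = {
    "Admissions": {
        "table": "ADM_MSTR",
        "fields": {
            "Client ID": "PT_ID",
            "Admission Date": "ADM_DT",
            "Program": "PGM_CD",
        },
    },
}


def build_query(folder, field_labels):
    """Translate readable field labels into a SQL SELECT statement."""
    entry = CATALOG[folder]
    columns = []
    for label in field_labels:
        column = entry["fields"][label]  # KeyError if the field is not exposed
        columns.append(f'{column} AS "{label}"')
    return f'SELECT {", ".join(columns)} FROM {entry["table"]}'


query = build_query("Admissions", ["Client ID", "Admission Date"])
print(query)
```

The user picks "Admission Date" from a folder and never needs to know that the underlying column is called ADM_DT.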
One of the more exciting products employed by management in a DSS or EIS is OLAP
software. This software is used to design and build multi-dimensional “cubes” of data
from catalogs of information. Each cube can be thought of as a three-dimensional matrix,
where each edge (or axis) represents another dimension. Each dimension can then have
multiple (nested) levels arranged in a hierarchy. The cells, or intersections of the different
dimensions contain the measures being studied. A dimension might be location, with the
levels of country, state, county, city, etc. A measure might be revenue. The OLAP tools
typically work with summary information but provide the user with the ability to “drill
down” into the detailed data using one of the other tools. An example of an OLAP
program is Cognos PowerPlay.
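To make dimensions, levels and measures concrete, here is a minimal Python sketch of rolling a measure up to one level of a location hierarchy and then drilling down one level. It is a simplified model of the concept and does not reflect how PowerPlay or any other OLAP product stores its cubes; the place names and revenue figures are invented.

```python
from collections import defaultdict

# Invented fact rows: a location dimension (state > county > city), a time
# dimension (year) and one measure (revenue).
facts = [
    {"state": "Ohio", "county": "Hamilton", "city": "Cincinnati", "year": 2001, "revenue": 500.0},
    {"state": "Ohio", "county": "Hamilton", "city": "Norwood", "year": 2001, "revenue": 200.0},
    {"state": "Ohio", "county": "Franklin", "city": "Columbus", "year": 2001, "revenue": 300.0},
    {"state": "Kentucky", "county": "Kenton", "city": "Covington", "year": 2001, "revenue": 150.0},
]


def roll_up(rows, levels, measure="revenue", where=None):
    """Sum a measure for each combination of the requested levels.

    `where` optionally restricts the rows, which is how a drill-down
    into one member of a higher level is expressed here.
    """
    where = where or {}
    totals = defaultdict(float)
    for row in rows:
        if all(row[key] == value for key, value in where.items()):
            cell = tuple(row[level] for level in levels)
            totals[cell] += row[measure]
    return dict(totals)


# Summary view: revenue by state.
by_state = roll_up(facts, ["state"])
print(by_state)  # {('Ohio',): 1000.0, ('Kentucky',): 150.0}

# Drill down from Ohio to its counties.
ohio_by_county = roll_up(facts, ["county"], where={"state": "Ohio"})
print(ohio_by_county)  # {('Hamilton',): 700.0, ('Franklin',): 300.0}
```

Passing two levels, for example ["state", "year"], produces the cells at the intersection of two dimensions.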
In the better business intelligence software, the report writers and OLAP tools can be
integrated with a Web portal and/or a “dashboard” to create a DSS or EIS. Reports can
be scheduled for delivery to a personal Web page and can be run weekly, daily or even
hourly. Visualization software is also available that is used to create dashboards with
hyperlinks, icons and other graphics that represent various reports, analyses and events.
A personalized dashboard may be created with a graphic, e.g. a bar graph, that represents
various key performance indicators for your organization. The chart can be color coded
so that emergencies are highlighted in red, areas of a less critical nature in yellow, etc. When
you click any area of the chart, it drills down further to a table or chart that provides the
user with additional summary information in a multi-dimensional report. The dimensions
can be expanded for even more detail, or the user can drill down to the raw data. Some
systems also employ a notification feature that will e-mail or page the user when a critical
event has occurred.
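The color coding and notification behavior described above amounts to comparing each indicator with thresholds. The sketch below assumes that a higher value is worse for every indicator; the indicator names, thresholds and values are invented, and the "green" status for normal values is an assumption (the text names only red and yellow).

```python
def kpi_status(value, warning_at, critical_at):
    """Return a dashboard color for one indicator (higher value = worse)."""
    if value >= critical_at:
        return "red"
    if value >= warning_at:
        return "yellow"
    return "green"  # assumed color for indicators in the normal range


# Hypothetical key performance indicators: (value, warning_at, critical_at).
kpis = {
    "Days in accounts receivable": (72, 45, 60),
    "No-show rate (%)": (12, 10, 20),
    "Denied claims this week": (3, 5, 10),
}

statuses = {
    name: kpi_status(value, warning_at, critical_at)
    for name, (value, warning_at, critical_at) in kpis.items()
}

# Indicators that would trigger an e-mail or page in a system with a
# notification feature.
alerts = [name for name, status in statuses.items() if status == "red"]

for name, status in statuses.items():
    print(f"{status:<6}  {name}")
print("Notify:", alerts)
```

In a real deployment the thresholds would be set by management for each indicator, and the alert list would feed whatever e-mail or paging service the organization uses.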
Summary:
Contemporary managers and executives have grown to recognize that data warehouses
can be used as a strategic asset to transform their raw data into valuable information.
Continual monitoring of key performance indicators and comparison with historical
information can be used for day-to-day decision making as well as long-range planning.
New technologies and implementation strategies have put these important management
tools within reach of most organizations.
About the Author:
Bruce W. Johnson, MS, PMP is the CEO of Johnson Consulting Services, Inc. He is an
information management consultant who specializes in working with social service,
healthcare and government agencies. He can be reached at (800) 988-0934 or by e-mail
at jcsinc@fuse.net.