Data Warehouse Terminology, Johnson

advertisement
DATA WAREHOUSE TERMINOLOGY1
Bruce W. Johnson, M.S.
Ad Hoc Query:
A database search that is designed to extract specific information from a database. It is
ad hoc if it is designed at the point of execution as opposed to being a “canned” report.
Most ad hoc query software uses the structured query language (SQL).
Aggregation:
The process of summarizing or combining data.
Catalog:
A component of a data dictionary that describes and organizes the various aspects of a
database such as its folders, dimensions, measures, prompts, functions, queries and other
database objects. It is used to create queries, reports, analyses and cubes.
Cross Tab:
A type of multi-dimensional report that displays values or measures in cells created by
the intersection of two or more dimensions in a table format.
Dashboard:
A data visualization method and workflow management tool that brings together useful
information on a series of screens and/or web pages. Some of the information that may
be contained on a dashboard includes reports, web links, calendar, news, tasks, e-mail,
etc. When incorporated into a DSS or EIS key performance indicators may be
represented as graphics that are linked to various hyperlinks, graphs, tables and other
reports. The dashboard draws its information from multiple sources applications, office
products, databases, Internet, etc.
Cube:
A multi-dimensional matrix of data that has multiple dimensions (independent variables)
and measures (dependent variables) that are created by an Online Analytical Processing
System (OLAP). Each dimension may be organized into a hierarchy with multiple levels.
The intersection of two or more dimensional categories is referred to as a cell.
1
Adapted by permission from Behavioral Health Management, January/February 2002, Volume 22,
Number 1.
Page 1
Data-based Knowledge:
Factual information used in the decision making process that is derived from data marts
or warehouses using business intelligence tools. Data warehousing organizes information
into a format so that it represents an organizations knowledge with respect to a particular
subject area, e.g. finance or clinical outcomes.
Data Cleansing:
The process of cleaning or removing errors, redundancies and inconsistencies in the data
that is being imported into a data mart or data warehouse. It is part of the quality
assurance process.
Data Mart:
A database that is similar in structure to a data warehouse, but is typically smaller and is
focused on a more limited area. Multiple, integrated data marts are sometimes referred to
as an Integrated Data Warehouse. Data marts may be used in place of a larger data
warehouse or in conjunction with it. They are typically less expensive to develop and
faster to deploy and are therefore becoming more popular with smaller organizations.
Data Migration:
The transfer of data from one platform to another. This may include conversion from one
language, file structure and/or operating environment to another.
Data Mining:
The process of researching data marts and data warehouses to detect specific patterns in
the data sets. Data mining may be performed on databases and multi-dimensional data
cubes with ad hoc query tools and OLAP software. The queries and reports are typically
designed to answer specific questions to uncover trends or hidden relationships in the
data.
Data Scrubbing:
See Data Cleansing
Page 2
Data Transformation:
The modification of transaction data extracted from one or more data sources before it is
loaded into the data mart or warehouse. The modifications may include data cleansing,
translation of data into a common format so that is can be aggregated and compared,
summarizing the data, etc.
Data Warehouse:
An integrated, non-volatile database of historical information that is designed around
specific content areas and is used to answer questions regarding an organizations
operations and environment.
Database Management System:
The software that is used to create data warehouses and data marts. For the purposes of
data warehousing, they typically include relational database management systems and
multi-dimensional database management systems. Both types of database management
systems create the database structures, store and retrieve the data and include various
administrative functions.
Decision Support System (DSS):
A set of queries, reports, rule-based analyses, tables and charts that are designed to aid
management with their decision-making responsibilities. These functions are typically
“wrapped around” a data mart or data warehouse. The DSS tends to employ more
detailed level data than an EIS.
Dimension:
A variable, perspective or general category of information that is used to organize and
analyze information in a multi-dimensional data cube.
Drill Down:
The ability of a data-mining tool to move down into increasing levels of detail in a data
mart, data warehouse or multi-dimensional data cube.
Drill Up:
The ability of a data-mining tool to move back up into higher levels of data in a data
mart, data warehouse or multi-dimensional data cube.
Page 3
Executive Information Management System (EIS):
A type of decision support system designed for executive management that reports
summary level information as opposed to greater detail derived in a decision support
system.
Extraction, Transformation and Loading (ETL) Tool:
Software that is used to extract data from a data source like a operational system or data
warehouse, modify the data and then load it into a data mart, data warehouse or multidimensional data cube.
Granularity:
The level of detail in a data store or report.
Hierarchy:
The organization of data, e.g. a dimension, into a outline or logical tree structure. The
strata of a hierarchy are referred to as levels. The individual elements within a level are
referred to as categories. The next lower level in a hierarchy is the child; the next higher
level containing the children is their parent.
Legacy System:
Older systems developed on platforms that tend to be one or more generations behind the
current state-of-the-art applications. Data marts and warehouses were developed in large
part due to the difficulty in extracting data from these system and the inconsistencies and
incompatibilities among them.
Level:
A tier or strata in a dimensional hierarchy. Each lower level represents an increasing
degree of detail. Levels in a location dimension might include country, region, state,
county, city, zip code, etc.
Measure:
A quantifiable variable or value stored in a multi-dimensional OLAP cube. It is a value
in the cell at the intersection of two or more dimensions.
Page 4
Member:
One of the data points for a level of a dimension.
Meta Data:
Information in a data mart or warehouse that describes the tables, fields, data types,
attributes and other objects in the data warehouse and how they map to their data sources.
Meta data is contained in database catalogs and data dictionaries.
Multi-Dimensional Online Processing (MOLAP):
Software that creates and analyzes multi-dimensional cubes to store its information.
Non-Volatile Data:
Data that is static or that does not change. In transaction processing systems the data is
updated on a continual regular basis. In a data warehouse the database is added to or
appended, but the existing data seldom changes.
Normalization:
The process of eliminating duplicate information in a database by creating a separate
table that stores the redundant information. For example, it would be highly inefficient to
re-enter the address of an insurance company with every claim. Instead, the database
uses a key field to link the claims table to the address table. Operational or transaction
processing systems are typically “normalized”. On the other hand, some data warehouses
find it advantageous to de-normalize the data allowing for some degree of redundancy.
Online Analytical Processing (OLAP):
The process employed by multi-dimensional analysis software to analyze the data
resident in data cubes. There are different types of OLAP systems named for the type of
database employed to create them and the data structures produced.
Open Database Connectivity (ODBC):
A database standard developed by Microsoft and the SQL Access Group Consortium that
defines the “rules” for accessing or retrieving data from a database.
Page 5
Relational Database Management System:
Database management systems that have the ability to link tables of data through a
common or key field. Most databases today use relational technologies and support a
standard programming language called Structured Query Language (SQL).
Relational Online Analytical Processing (ROLAP):
OLAP software that employs a relational strategy to organize and store the data in its
database.
Replication:
The process of copying data from one database table to another.
Scalable:
The attribute or capability of a database to significantly expand the number of records
that it can manage. It also refers to hardware systems and their ability to be expanded or
upgraded to increase their processing speed and handle larger volumes of data.
Structured Query Language (SQL):
A standard programming language used by contemporary relational database
management systems.
Synchronization:
The process by which the data in two or more separate database are synchronized so that
the records contain the same information. If the fields and records are updated in one
database the same fields and records are updated in the other.
About the Author:
Bruce W. Johnson, MS, PMP is the CEO of Johnson Consulting Services, Inc. He is an
information management consultant who specializes in working with social service,
healthcare and government agencies. He can be reached at (800) 988-0934 or by e-mail
at jcsinc@fuse.net.
Page 6
Download