Glossary of Terms - Mount Holyoke College

MHC Data Warehouse Project
Glossary of Terms
Table of Contents
Basic Data Warehousing Terminology
SAP Business Objects
Universe Project Process Flow
Bibliography
Basic Data Warehousing Terminology
Attribute- An individual data element that is represented and stored in a dimension. Each attribute
contains data relating to that dimension.
Business Intelligence (BI) - The collection of one or more reports and analyses, using data from the data
warehouse, that provide insight into the performance of a business organization. These reports and
analyses are typically interactive to enable further understanding of specific areas of interest. They are
used to support business professionals in their decision-making processes.
Business measures- The complete set of facts, base and derived, that are defined and made available for
reporting and analysis.
Conformed Dimension – A dimension that is shared between two or more fact tables. It enables the
integration of data from different fact tables at query time. This is a foundational principle that enables
the longevity of a data warehousing environment. By using conformed dimensions, facts can be used
together, aligned along these common dimensions. The beauty of using conformed dimensions is that
facts that were designed independently of each other, perhaps over a number of years, can be
integrated. The use of conformed dimensions is the central technique for building an enterprise data
warehouse from a set of data marts.
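To illustrate how a conformed dimension lets independently designed facts be used together, here is a minimal Python sketch. It assumes two hypothetical fact tables (enrollment and billing) that both carry the key of a shared term dimension; every table name, key, and value is invented for illustration. Each fact table is summarized by the shared key, and the results are then aligned along the common dimension.

```python
from collections import defaultdict

# Hypothetical conformed dimension shared by both fact tables.
term_dim = {1: "Fall 2012", 2: "Spring 2013"}

# Two independently designed fact tables that both carry term_key.
enrollment_facts = [
    {"term_key": 1, "students": 120},
    {"term_key": 1, "students": 80},
    {"term_key": 2, "students": 150},
]
billing_facts = [
    {"term_key": 1, "amount": 5000.0},
    {"term_key": 2, "amount": 4200.0},
    {"term_key": 2, "amount": 800.0},
]


def total_by_term(rows, measure):
    """Sum one measure per term_key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["term_key"]] += row[measure]
    return totals


def drill_across():
    """Align both fact tables along the shared term dimension."""
    students = total_by_term(enrollment_facts, "students")
    billed = total_by_term(billing_facts, "amount")
    return {
        label: {"students": students[key], "billed": billed[key]}
        for key, label in term_dim.items()
    }


report = drill_across()
```

Because both fact tables use the same term keys, the two summaries line up without any translation step between them.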
Conformed Fact- A fact or measure whose definition is consistent across fact tables and data marts. A
conformed fact, such as revenue, can be correctly added and compared across different fact tables.
Dashboard (also called performance dashboard)- The presentation of key business measurements on a
single interface designed for quick interpretation, often using graphics. The most effective dashboards
are supported by a full data mart that enables drilling down into more detailed data to better
understand the indicators.
Data Architecture- Describes how data is organized and structured to support the development,
maintenance, and use of the data by application systems. This includes guidelines and
recommendations for historical retention of the data, and how the data is to be used and accessed.
Data Cleansing- The process of verifying and correcting data using a series of business rules for
validation, and specifying how to handle cases that fail the checks.
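As a rough sketch of data cleansing, the Python below applies two made-up business rules to incoming records and sets failing records aside together with the reasons they failed. The field names (student_id, state) and both rules are assumptions chosen only for illustration.

```python
def cleanse(records):
    """Split records into clean rows and rejects using simple business rules."""
    clean, rejects = [], []
    for rec in records:
        errors = []
        # Rule 1 (assumed): student_id is required.
        if not rec.get("student_id"):
            errors.append("missing student_id")
        # Rule 2 (assumed): state codes are stored trimmed and upper-case,
        # and must be exactly two characters long.
        state = (rec.get("state") or "").strip().upper()
        if len(state) != 2:
            errors.append("invalid state")
        if errors:
            # Failed rows are kept with their reasons, not silently dropped.
            rejects.append({"record": rec, "errors": errors})
        else:
            clean.append({**rec, "state": state})
    return clean, rejects


sample = [
    {"student_id": "A1", "state": " ma "},
    {"student_id": "", "state": "NY"},
    {"student_id": "A3", "state": "Mass."},
]
clean_rows, rejected_rows = cleanse(sample)
```

Keeping the rejected rows and their error messages is one way of specifying how to handle cases that fail the checks; other policies (default values, manual review queues) are equally possible.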
Data Dictionary- The place where information about data that exists in the organization is stored. This
should include both technical and business details about each element.
Data Element- The smallest unit of data that is named. The values are stored in a column or a field in a
database.
Data Governance- The practice of organizing and implementing policies, procedures, and standards for
the effective use of an organization’s structured or unstructured information assets.
Data Mart- Typically, a data model (star schema) that supports a particular business process or
workflow. A Data Warehouse is the collection of many data marts standardized by the use of Conformed
Dimensions.
Data Model- An abstraction of how individual data elements relate to each other. It visually depicts how
the data is to be organized and stored in a database. A data model provides the mechanism to
document and understand how data is organized.
Data quality- Assessment of the cleanliness, accuracy, and reliability of data.
Data Warehouse- A broad term for an integrated information repository that marries data from
several systems (Colleague, Lawson, PowerFaids, etc.). These data are loaded into a database
around specific subject area data models, or marts. The data warehouse is the physical layer, i.e.,
where the data actually live, that we then access with our front end BI tools like Business Objects,
Tableau, Excel, or any other tool.
Degenerate dimension- A single attribute dimension whereby the only attribute is a reference identifier
such as invoice number, P.O. number, or transaction ID. This is needed to support analysis of the
individual parts of a business transaction (individual line items) and the entire business transaction (the
whole purchase order).
Derived Attribute- An attribute that is created to facilitate the overall usefulness of the dimensional
model. This enables different attributes to be pulled together using a single identifier that can help the
technical implementation of the dimensional model. It is used when the underlying source systems do
not have a data element to uniquely define this case and is required to pull together unrelated
attributes of a junk dimension.
Derived Fact- A fact that is calculated on-the-fly and not stored in the database.
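A small Python sketch of a derived fact follows. It assumes two stored base facts, revenue and cost, and computes margin only when the rows are requested. The column names and values are hypothetical.

```python
# Stored (base) facts; the values are made up for illustration.
fact_rows = [
    {"order_id": 1, "revenue": 100.0, "cost": 60.0},
    {"order_id": 2, "revenue": 250.0, "cost": 200.0},
]


def with_margin(rows):
    """Return copies of the rows with a derived 'margin' fact added."""
    return [{**row, "margin": row["revenue"] - row["cost"]} for row in rows]


result = with_margin(fact_rows)
```

The stored rows are left unchanged, so margin never exists in the underlying data; it is recalculated each time it is requested.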
Dimension- Major business categories of information or groupings to describe business data.
Dimensions contain information used for constraining queries, report headings, and defining drill paths.
Within a dimension, specific attributes are the data elements that are used as row and column headers
on reports. Dimensional attributes are also considered to be reference data. When describing the need
to report information by region, by week, and by month, the attributes following “by” are dimensional
attributes. Each of these would be included in a dimension.
Dimensional Model- A data model organized for the purposes of user understandability and high
performance. In a relational database, a dimensional model is a star join schema characterized by a
central fact table with a multi-part key.
Dimensional Modeling- A formal data modeling technique that is used to organize and represent data
for analytical and reporting use. The focus is on the business perspective and the representation of
data.
Entity-relationship (ER) Model- A data model that is used to represent data in its purest form and to
define relationships between different entities. It is often the type of model used to design online
transaction processing systems. See also normalized model.
Extract, transform, and load (ETL)- The collection of processes that are used to prepare data for another
purpose. This is typically applied to data warehousing, whereby the extract processes collect data from
the appropriate underlying source systems. The transformation processes perform cleansing,
manipulation, and reorganization of the data in preparation for its intended use. Finally the load
processes put the data into the data structures where it is held for data delivery. While ETL processes
are regularly discussed in the context of building the data warehouse, these techniques can also be used
for moving and manipulating data for a variety of other purposes.
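The three ETL stages can be sketched in a few lines of Python using only the standard library. This is an illustrative outline, not a description of the project's actual ETL: the CSV content, the payment table, and the cleansing rules are all invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this comes from a source system.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,3
3,,7.25
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names, convert types, skip rows without a name."""
    out = []
    for row in rows:
        name = row["name"].strip().title()
        if not name:
            continue
        out.append((int(row["id"]), name, float(row["amount"])))
    return out


def load(conn, rows):
    """Load: write the prepared rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payment (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payment VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT id, name, amount FROM payment ORDER BY id").fetchall()
```

Keeping the stages as separate functions mirrors the definition above: each stage can be tested or replaced without touching the other two.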
Factless fact table- A fact table that captures the existence of business events that do not have an
associated quantitative measurement. The existence of the relationship is what is relevant.
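As a hedged illustration of a factless fact table, the Python below models class attendance, a hypothetical event that has dimension keys but no numeric measure. Analysis is done by counting the rows themselves.

```python
from collections import Counter

# Each row records only that an event happened: keys, no numeric measure.
# Hypothetical example: a student attended a class on a given date.
attendance_facts = [
    {"date_key": 20130401, "student_key": 1, "class_key": 10},
    {"date_key": 20130401, "student_key": 2, "class_key": 10},
    {"date_key": 20130402, "student_key": 1, "class_key": 10},
]


def attendance_by_date(rows):
    """Count events per date; the row's existence is the information."""
    return Counter(row["date_key"] for row in rows)


counts = attendance_by_date(attendance_facts)
```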
Facts- The fundamental measurements of the business. These are captured as specific information
about a business event or transaction. They are measured, monitored, and tracked over time. Facts are
typically the amounts and counts that show up as the body of reports. Facts are used for any and all
calculations that are performed.
Grain- The level of detail showing how data is stored and available for analysis.
Information Management- In its simplest form, this is the work associated with collecting, maintaining,
applying, and leveraging data across an organization.
Infrastructure- A basic foundation technology that all other initiatives in the organization can rely on and
use. This includes basic networking services such as providing shared network drives for storing the
group’s files (e.g., word processing documents, spreadsheets, and presentations). The existence of some
sort of computer for each user can also be considered infrastructure. Basic networking of computers, via
a local area or wireless network, is another example of infrastructure.
Junk Dimension- A dimension that brings together single attributes that may or may not have any true
relationship to each other in order to simplify the model, improve query performance, and/or reduce
data storage.
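One common way to build a junk dimension is to generate one row for every combination of the unrelated attributes and give each row a single key. The Python sketch below assumes three hypothetical low-cardinality flags; the names and values are invented for illustration.

```python
from itertools import product

# Hypothetical unrelated low-cardinality flags found on a transaction.
FLAGS = {
    "payment_type": ["cash", "card"],
    "is_gift": [False, True],
    "is_rush": [False, True],
}


def build_junk_dimension(flags):
    """Create one row per flag combination, each with a single key."""
    names = list(flags)
    rows = []
    for key, combo in enumerate(product(*flags.values()), start=1):
        rows.append({"junk_key": key, **dict(zip(names, combo))})
    return rows


junk_dim = build_junk_dimension(FLAGS)

# Lookup used while loading facts: flag values -> one junk_key.
lookup = {
    (r["payment_type"], r["is_gift"], r["is_rush"]): r["junk_key"]
    for r in junk_dim
}
```

A fact row then stores one junk_key instead of three separate flag columns or three tiny dimension keys.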
Master Data Management (MDM)- The processes and tools to help an organization consistently define
and manage core reference or descriptive data across the organization. This may involve providing a
centralized view of the data to ensure that its use for all business processes is consistent and accurate.
Multi-dimensional OLAP (MOLAP)- OLAP technology whereby the data is stored in proprietary array
structures called multi-dimensional cubes. See also OLAP.
Normalized Model- A data model organized to clarify pure data relationships and targeted at gaining
efficiencies in data storage and maintenance. This is used for the design of transaction processing
systems. There are specific rules for normalization. Depending upon the number of rules followed (for
different purposes) there are different “forms,” such as third normal form. See also entity-relationship
model.
Online Analytical Processing (OLAP)- A collection of common business analysis functions that are
difficult to perform directly with SQL. Some of the specific functions that fall under the OLAP umbrella
include time series comparison, ranking, ratios, penetration, thresholds, and contribution to report or
the whole data population. Most business intelligence tools provide this type of functionality. The
capabilities can be implemented in a variety of different data storage mechanisms. See also MOLAP,
ROLAP, HOLAP.
Online Transaction Processing (OLTP)- OLTP systems are the
fundamental systems used to run the business. These are also called operational systems or operational
applications. They are often used as sources of data for the data warehouse.
Operational data store (ODS)- A collection of data from operational systems, most often integrated
together, that is used for some operational purpose. The most critical characteristic here is that this is
used for some operational function. This operational dependency takes precedence and the ODS should
not be considered a central component of the data warehousing environment. An ODS can be a clean,
integrated source of data to be pulled into the data warehousing environment.
Query- The mechanism to get data out of a database. A query comprises constraints used to filter
the results, the data elements to be included in the result set, and possibly
some mathematical computation, grouping, or sorting of the data.
Relational OLAP (ROLAP)- OLAP technology that uses data stored in relational database management
systems. Data is usually organized dimensionally using a star or snowflake schema.
Role-Playing Dimension- A dimension that legitimately has more than one value for a given
business transaction, such as order date and shipped date. Each attribute within the dimension is uniquely
identified to enable easy differentiation between the different roles, such as Order Date, Order Quarter,
Shipped Date, and Shipped Quarter.
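A role-playing dimension is often implemented by joining one physical dimension table to the fact table once per role. The sketch below uses Python's built-in sqlite3 module with made-up tables (date_dim, order_fact) and sample rows; the aliases od and sd and the output column names are assumptions chosen to keep the two roles distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, full_date TEXT, quarter TEXT);
CREATE TABLE order_fact (order_id INTEGER, order_date_key INTEGER,
                         shipped_date_key INTEGER, amount REAL);
INSERT INTO date_dim VALUES (1, '2013-03-28', 'Q1'), (2, '2013-04-02', 'Q2');
INSERT INTO order_fact VALUES (100, 1, 2, 59.90);
""")

# The same physical date_dim plays two roles, distinguished by alias
# and by uniquely named output columns.
rows = conn.execute("""
SELECT f.order_id,
       od.full_date AS order_date,   od.quarter AS order_quarter,
       sd.full_date AS shipped_date, sd.quarter AS shipped_quarter
FROM order_fact AS f
JOIN date_dim AS od ON od.date_key = f.order_date_key
JOIN date_dim AS sd ON sd.date_key = f.shipped_date_key
""").fetchall()
```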
Scorecard (or performance scorecard)- An application that helps organizations measure and align the
strategic and tactical aspects of their businesses, comparing organizational and individual performance
to goals and targets.
Slowly Changing Dimension (SCD)- A dimension that accommodates changes to the reference data
over time. Several dimensional modeling techniques are used to determine how to handle changes to
the reference data stored in dimensions. This may be to retain only the current values (Type 1), to store
different versions of the reference data (Type 2), or to retain one previous version of changes made to
the entire dimension (Type 3).
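The three SCD types can be contrasted with a short Python sketch. It assumes a hypothetical customer dimension with a city attribute, and it simplifies in two ways that are assumptions of this example: it omits surrogate keys, and it models Type 3 as a single previous-value column.

```python
import copy


def apply_type1(rows, key, new_city):
    """Type 1: overwrite in place; history is lost."""
    for row in rows:
        if row["customer_id"] == key:
            row["city"] = new_city
    return rows


def apply_type2(rows, key, new_city, change_date):
    """Type 2: expire the current row and append a new version."""
    for row in rows:
        if row["customer_id"] == key and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = change_date
    rows.append({"customer_id": key, "city": new_city,
                 "start_date": change_date, "end_date": None,
                 "is_current": True})
    return rows


def apply_type3(rows, key, new_city):
    """Type 3: keep one previous value in a separate column."""
    for row in rows:
        if row["customer_id"] == key:
            row["previous_city"] = row["city"]
            row["city"] = new_city
    return rows


base = [{"customer_id": 7, "city": "Amherst",
         "start_date": "2010-01-01", "end_date": None, "is_current": True}]

t1 = apply_type1(copy.deepcopy(base), 7, "South Hadley")
t2 = apply_type2(copy.deepcopy(base), 7, "South Hadley", "2013-04-10")
t3 = apply_type3(copy.deepcopy(base), 7, "South Hadley")
```

After the same change, Type 1 leaves one row with only the new city, Type 2 leaves two rows (one expired, one current), and Type 3 leaves one row that carries both the new and the previous city.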
Snowflake schema- A variation of the star schema in which the business dimensions are implemented
as a set of normalized tables. The resulting diagram resembles a snowflake.
Source System- An operational system of records whose function it is to capture the transactions of the
business. Source systems are often large online transactional processing systems, but could also be
smaller departmental databases or spreadsheets that are maintained and used by members of the
business community. These are the origin of the data used to build the data warehouse.
Staging area- Place where data is stored while it is being prepared for use, typically where data used by
ETL processes is stored. This may encompass everything from where the data is extracted from its
original source until it is loaded into presentation servers for end user access. It may also be where data
is stored to prepare it for loading into a normalized data warehouse.
Star Schema- The implementation of a dimensional model in a relational database. The tables are
organized around a single central fact table possessing a multi-part key, and each surrounding
dimension table has its own primary key.
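A minimal star schema can be sketched with Python's built-in sqlite3 module. The tables (term_dim, dept_dim, enrollment_fact) and their rows are hypothetical and are not taken from the MHC warehouse; they only show a central fact table whose multi-part key is made up of the surrounding dimension keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables, each with its own primary key.
CREATE TABLE term_dim (term_key INTEGER PRIMARY KEY, term_name TEXT);
CREATE TABLE dept_dim (dept_key INTEGER PRIMARY KEY, dept_name TEXT);

-- Central fact table; its multi-part key is made of the dimension keys.
CREATE TABLE enrollment_fact (
    term_key INTEGER REFERENCES term_dim(term_key),
    dept_key INTEGER REFERENCES dept_dim(dept_key),
    enrolled INTEGER,
    PRIMARY KEY (term_key, dept_key)
);

INSERT INTO term_dim VALUES (1, 'Fall 2012'), (2, 'Spring 2013');
INSERT INTO dept_dim VALUES (1, 'Biology'), (2, 'History');
INSERT INTO enrollment_fact VALUES (1, 1, 40), (1, 2, 25), (2, 1, 35);
""")

# A typical star join: constrain and group by dimension attributes,
# aggregate the facts.
rows = conn.execute("""
SELECT t.term_name, SUM(f.enrolled)
FROM enrollment_fact AS f
JOIN term_dim AS t ON t.term_key = f.term_key
GROUP BY t.term_name
ORDER BY t.term_name
""").fetchall()
```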
Structured Query Language (SQL)- The programming language used to access data stored in a relational
database.
Technical architecture- Addresses the organization and the structure of the collection of hardware and
software technologies that are installed to support the development and delivery of the data
warehouse.
Third normal form (3NF)- The most common form of a normalized model. See also normalized model.
(Source: Reeves, Laura L. A manager's guide to data warehousing. Indianapolis, IN: Wiley Pub., 2009. Print.)
SAP Business Objects
MHC Business Objects Servers- 3 servers
• Panda (System Test)- https://panda.mtholyoke.edu/BOE/BI
• Whale (Development)- https://whale.mtholyoke.edu/BOE/BI
• Swan (Production)- https://swan.mtholyoke.edu/BOE/BI
Web Intelligence (Webi)- You perform data analysis with SAP Business Objects Web Intelligence by
creating reports based on data you want to analyze, or by opening pre-existing documents. Depending
on your licenses and security rights, you can then analyze the data in your reports by, for example,
filtering, drilling down to reveal more details, merging data from different data sources, displaying data
in charts, or adding formulas.
Web Intelligence has 3 interfaces:
• Web- also referred to as the DHTML interface; you launch this via BI Launch Pad
• Rich Internet Application- also referred to as the Java applet; you launch this via BI Launch Pad
• Web Intelligence Rich Client- a desktop application that you download and install via BI Launch Pad
Universes- Data comes from universes, which organize data from relational or OLAP databases into
objects or hierarchies, from personal data providers such as Microsoft Excel or CSV files, from
BEx queries based on SAP Info Cubes, from Web Services, or from Advanced Analysis
Workspaces. You build data providers to retrieve data from these data sources and you create
reports from the data in the data providers.
A universe is the semantic layer that one directly interfaces with in the Query Panel of Webi. This layer is
considered a “translation” layer, translating the database-level complexities (joins, technical
definitions, and implementation) into a business view of the data. Data are organized into
classes (folders) that contain objects called dimensions and measures. (Note: SAP BO uses the word
‘dimension’ in the same manner we often use ‘attribute’ in the DW.) Names and object organization
exist merely to support ease of use and navigation.
BI Platform- The overall BI system.
BI Launch Pad- BI Platform includes the BI Launch Pad, a web application that acts as a window to
business information about your college. In BI Launch Pad you can perform the following tasks:
• Access Crystal Reports, Web Intelligence documents, and other objects and organize them to
suit your needs
• View information in a web browser, export to other business applications (such as Microsoft
Excel and SAP StreamWork), and save it to a specified location
• Use analytical tools to explore the information in detail
>Object- an object is a document or file created by the BI platform or other software that is
stored and managed in the BI platform repository.
>Categories-a category is an organizational alternative to a folder. Use Categories to label
objects.
>Scheduling- scheduling is the process of automatically running an object at a specified time.
Scheduling refreshes dynamic content or data in the object, creates instances, and distributes
the instances to users or stores them locally.
>Events- an event is an object that represents an occurrence in the BI platform. Events can be
used for a variety of purposes, including:
• As scheduling dependencies that trigger actions after a scheduled job has run
• To trigger alert notifications
• To monitor BI platform performance
>Calendars- a calendar is a customized list of run dates for scheduling jobs.
>Instances- an instance is a snapshot of an object that contains data from the time the object was
run.
>Publishing- publishing is the process of making personalized dynamic content publicly
available for mass consumption.
>Profiles-a profile is an object that associates users and groups with personalization values.
Profiles are used with Publishing to create personalized content and distribute to recipients.
>Alerting- alerting is the process of notifying users and administrators when events occur in BI
Platform.
Universe Project Process Flow
Define Scope - Scope should be defined by the Business Owner. A scope statement should be created
and reviewed and signed off by all core team members.
Inventory- Once the project scope has been defined and agreed upon, the Business Owner will create an
inventory of the data elements and reports required to support the defined scope. The Business Owner
should try to identify the source system and data domains as well as the needs and the gaps of the
project.
Analyze- Review all of the data elements and the reports. Engage in detailed discussions to profile the
data and determine the business rules around the data. Identify any data issues, challenges, and
risks. Document and define the business and data requirements. Determine the security
requirements as well: who should be able to view the data, and who should not. The business
requirements should be reviewed and signed off by the core team so everyone is aware of the actual
requirements.
Design- Develop the Conceptual Data Design, the Logical Models, the Security structure, and the BOE
layout.
Prototype- Construct the physical database structures and load sample data. Build the BO Universe
prototype. (Note: this step skips a mature ETL step.)
Prototype Validation- Validate the data model with the customer by visually displaying the model in BO.
Determine whether the model works as expected and comprises all of the required data. Review the BO Universe
layout, design, and data quality with the customer to ensure it meets their expectations.
This validation is strictly for the design of the data model, and not the actual data. The data will be
validated during User Acceptance Testing (UAT).
Iterate/Refine- If the prototype is not approved by the customer, go back to the Analyze
step to ensure there is a common understanding of the business requirements. Repeat the following
steps in sequence until the prototype has been validated by the customer: Analyze, Design,
Prototype, Prototype Validation.
Construct-Business Definition and Data Source -The business owner will create business definitions for
all data elements that are being added to BO. The technical team will provide the data source. At a
minimum, both the business definition and the data source will be added in BO as a reference tool for
the users. The users will be able to hover over any data element to display the business definition and
data source.
Test- A separate test strategy document outlines the testing approach in further detail.
• Functional Testing- This testing is done by the Data Orchestra and the Data Modeler prior
to UAT. It is done to ensure that the system works as defined in the business requirements. The
specific test cases will be logged in the System Test Template and stored on the shared drive so
the team can review what has been tested.
• UAT (User Acceptance Testing)- This phase entails the user validating the data and business
requirements as outlined in the Data Inventory Document and the Requirements Document.
The user will use the UAT Test document to plan and document the test scenarios to ensure that
BO is functioning as required.
Go Live- Prepare the production environments; migrate BO content, reports, and universes. Migrate
ETL. The Go Live Checklist will be used to ensure all steps were done to support the move to production.
Post Production Support- This is a four-week post-production period to finish up non-critical
loose ends, and to address any unforeseen issues.
Bibliography
"BI Launch Pad Users Guide: SAP Business Objects Business Intelligence Platform 4.0 Support
Package 4." help.sap.com. SAP Business Objects, 10 Aug. 2012. Web. 10 Apr. 2013.
<http://help.sap.com/businessobject/product_guides/boexir4/en/xi4sp4_bip_iv_en.pdf>.
Reeves, Laura L. A manager's guide to data warehousing. Indianapolis, IN: Wiley Pub., 2009. Print.
"SAP Business Objects Web Intelligence Rich Client Users Guide; SAP Business Objects Business
Intelligence Suite 4.0 Feature Pack 3." docs.google.com. Mount Holyoke College file, 16 Mar. 2012. Web.
10 Apr. 2013.
<https://docs.google.com/a/mtholyoke.edu/file/d/0B6N9R5Vv6Tkzai1VNDN4VEJlYjA/edit>.
"SAP Business Objects Web Intelligence Users Guide; SAP Business Objects Business Intelligence Suite
4.0 Support Package 5.0." help.sap.com. SAP Business Objects, 11 Mar. 2013. Web. 10 Apr. 2013.
<http://help.sap.com/businessobject/product_guides/boexir4/en/xi4sp5_ia_en.pdf>.
(This one has document history for Users Guides.)