Financial Data Model Overview

Daniel Grieb
Lori Silvestri
December 4, 2008
Agenda
■ Reporting Solution
■ Star Schema Primer
■ Data Modeling Process
■ Finance Data Models
■ Design Challenges and Choices
■ Implementation
■ Conclusion
Finance Data Modeling Guidelines
■ The campus solution must use the CSU Finance Reporting Solution as its source
■ Replace existing reporting:
1. Revenue and Expense (P & L)
2. Trial Balance Reporting
3. Drill from Summary to Transaction
■ Need daily refresh of large data sets
■ Anticipate analytical reporting
Levels of Reporting
■ Analytics – Enterprise Data Warehouse
– Combines information from multiple source systems; current and historical information
– Much more sophisticated data structures to enable analysis: cubes and star schemas
■ Operational – Operational Reporting
– Tactical data from production systems that addresses operational needs
– Denormalized data structures with embedded business logic
■ Transactional – Transactional Reporting
– Supports day-to-day transactional users
– Requires knowledge of transactional data
REPORTING SOLUTION
CSU Reporting Solution
■ Attribute Tables
– one set for each Set ID
– XXCMP, XXCSU, XXGAP
■ Transaction Tables
– separate tables per Business Unit
■ Summary Table
– XXCMP and XXCSU
* Brothwell, Kist, and Yelland, “Finance 9.0 Reporting Solution Training” April, 2008
CSU Reporting Solution - Attributes
■ Attribute Tables – one set for each Set ID (XXCMP,
XXCSU, XXGAP)
– Fund: CSU_R_FUND_TBL
– Department: CSU_R_DEPT_TBL
– Account: CSU_R_ACCT_TBL
– Program: CSU_R_PRGM_TBL
– Project: CSU_R_PROJ_TBL
– Class: CSU_R_CLASS_TBL
■ Can be joined to transaction and summary tables
■ Department table contains a “flattened” version of the campus organization department tree
* Brothwell, Kist, and Yelland, “Finance 9.0 Reporting Solution Training” April, 2008
CSU Reporting Solution - Transactions
■ Transaction Tables – separate tables per
Business Unit
■ Campus Business Unit Transaction Tables
– Actuals: CSU_R_ACTDT_CMP
– Budgets: CSU_R_BUDDT_CMP
– Encumbrances: CSU_R_ENCDT_CMP
– Pre-Encumbrances: CSU_R_PREDT_CMP
■ CSU Business Unit Transaction Tables
■ GAP Business Unit Transaction Tables
* Brothwell, Kist, and Yelland, “Finance 9.0 Reporting Solution Training” April, 2008
CSU Reporting Solution - Summary
■ Summary Tables (XXCMP and XXCSU)
■ Campus Business Unit Summary Table
– CSU_R_SUMBL_CMP
■ CSU Business Unit Summary Table
– CSU_R_SUMBL_CSU
* Brothwell, Kist, and Yelland, “Finance 9.0 Reporting Solution Training” April, 2008
Benefits of the Reporting Solution
to the Dimensional Data Model
■ Validated independently
– Reporting solution was validated between
January and September 2008
– Finance was heavily invested in the reporting solution, helped design it, and trusted it
– Sped up data model validation because we could tie out to the reporting solution
» Finance validated within days, rather than
weeks
» Validated using the dashboards
Benefits of the Reporting Solution
to the Dimensional Data Model
■ Reporting solution is now used in parallel by Finance for internal querying and to fill ad hoc requests
– Phase one of the data models did not have
to incorporate all of the reporting solution
data
– Helped constrain project scope
STAR SCHEMA PRIMER
What Is a Star Schema?
The star schema is perhaps the simplest
data warehouse schema. It is called a
star schema because the diagram of this
schema resembles a star, with points
radiating from a central table. The center
of the star consists of a large fact table
and the points of the star are the
dimension tables.
Star Schema Database Design
■ Star schema – a data model that consists of one fact table and one or more dimension tables
■ Fact table – contains the facts or measures to be analyzed (e.g., amount, count) and foreign keys to the dimension tables
■ Dimension table – contains attributes describing a campus entity (e.g., department, account type, ledger)
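A minimal sketch of this structure, using Python's built-in sqlite3 module. All table and column names (department_dim, account_dim, actual_fact, monetary_amount) are hypothetical and are not the PolyData schema; the point is that the fact table holds only measures and foreign keys, while descriptive attributes live in the dimensions and are used for filtering and grouping.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department_dim (
    department_key   INTEGER PRIMARY KEY,  -- surrogate key
    department_code  TEXT,
    department_descr TEXT
);
CREATE TABLE account_dim (
    account_key  INTEGER PRIMARY KEY,
    account_code TEXT,
    account_type TEXT
);
-- The fact table holds the measure plus foreign keys to each dimension.
CREATE TABLE actual_fact (
    department_key  INTEGER REFERENCES department_dim(department_key),
    account_key     INTEGER REFERENCES account_dim(account_key),
    monetary_amount REAL
);
""")
conn.executemany("INSERT INTO department_dim VALUES (?, ?, ?)",
                 [(1, "D100", "Library"), (2, "D200", "Chemistry")])
conn.executemany("INSERT INTO account_dim VALUES (?, ?, ?)",
                 [(1, "A600", "Expense"), (2, "A500", "Revenue")])
conn.executemany("INSERT INTO actual_fact VALUES (?, ?, ?)",
                 [(1, 1, 250.0), (1, 1, 100.0), (2, 1, 75.0), (2, 2, 500.0)])

# Typical star query: join the fact to its dimensions, filter and group on attributes.
rows = conn.execute("""
    SELECT d.department_descr, SUM(f.monetary_amount)
    FROM actual_fact f
    JOIN department_dim d ON d.department_key = f.department_key
    JOIN account_dim a    ON a.account_key    = f.account_key
    WHERE a.account_type = 'Expense'
    GROUP BY d.department_descr
    ORDER BY d.department_descr
""").fetchall()
print(rows)  # [('Chemistry', 75.0), ('Library', 350.0)]
```

Every question is answered by the same pattern: one hop from the fact to each dimension it needs, which is what keeps the join count low.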
Star Schema
■ Fact tables, located at the center, contain process activity (quantitative data). Example facts are monetary amount, budget amount, and statistics amount.
■ Dimensions tell the story and provide the detail behind the facts: who, what, where, and when. Which department’s budget? When was the last transaction posted for a given account?
Star Schema Benefits
■ Data model is easy to understand
– Based on business process
■ Easy to define hierarchies
– City-State-Country
– Day-Accounting Period-Fiscal Year
■ Easy to navigate
– Number of table joins reduced
– Star schema recognized by leading query tools
■ Maintainable and Scalable
– Dimension tables shared between data models
– Can add new fact tables which use existing dimensions
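To illustrate the Day-Accounting Period-Fiscal Year hierarchy, here is a small sketch in which each day carries its parent levels as attributes, so rolling up is just grouping on a different column. It assumes, for illustration only, a fiscal year starting July 1, accounting period 1 = July, and fiscal years named for the calendar year in which they end; the sample amounts are made up.

```python
from collections import defaultdict
from datetime import date

# Assumption: fiscal year starts July 1; period 1 is July; FY named for its ending year.
def date_attributes(d: date) -> dict:
    period = (d.month - 7) % 12 + 1
    fiscal_year = d.year + 1 if d.month >= 7 else d.year
    return {"day": d, "accounting_period": period, "fiscal_year": fiscal_year}

# Hypothetical fact rows: (posting date, amount).
facts = [(date(2008, 6, 30), 100.0), (date(2008, 7, 1), 40.0),
         (date(2008, 7, 15), 60.0), (date(2008, 8, 2), 25.0)]

def roll_up(*levels):
    """Sum amounts grouped by the requested hierarchy levels."""
    totals = defaultdict(float)
    for day, amount in facts:
        attrs = date_attributes(day)
        totals[tuple(attrs[level] for level in levels)] += amount
    return dict(totals)

print(roll_up("fiscal_year"))
# {(2008,): 100.0, (2009,): 125.0}
print(roll_up("fiscal_year", "accounting_period"))
# {(2008, 12): 100.0, (2009, 1): 100.0, (2009, 2): 25.0}
```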
Why Star Schema for Cal Poly Finance?
1. Dimensions can easily be reused
■ across current and future finance models
2. Superior query performance for large
datasets
■ i.e., over 5 million rows
3. Usability
■ Understandable for users
■ Better support for unanticipated questions
4. Star schemas are extremely compatible with
business intelligence query tools such as
OBIEE.
DATA MODELING PROCESS
Data Modeling Process
■ Interactive/Iterative Process
■ Requirements Gathering
■ Domain research
■ Data profiling
■ Modeling tool
■ Design sessions with data steward
Data Modeling Process:
Requirements Gathering
■ Primarily done as part of Reporting Solution development
■ Our requirement – refashion the Reporting Solution into a dimensional model
– Performance
– Accessibility
Data Modeling Process: Research
■ Domain research
– Finance
– Cal Poly Financials
– Cal Poly Reports (nVision, Brio)
– Industry Finance Models (Kimball)
■ Data profiling
– Querying the reporting solution
– Correlating fields/values
– Matrix of attributes across document sources
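One way to approach "correlating fields/values" is to test whether one field functionally determines another: if every department code maps to exactly one description, the two likely belong in the same dimension. The sketch below assumes rows have already been pulled from the reporting solution as dictionaries; the field names and sample values are hypothetical.

```python
from collections import defaultdict

def determines(rows, key_field, dependent_field):
    """Return (holds, violations): does each key_field value map to exactly one
    dependent_field value? violations lists keys seen with more than one value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key_field]].add(row[dependent_field])
    violations = {k: sorted(v) for k, v in seen.items() if len(v) > 1}
    return not violations, violations

def distinct_count(rows, field):
    """Number of distinct values of a field (a basic cardinality profile)."""
    return len({row[field] for row in rows})

# Hypothetical sample pulled from a reporting-solution table.
sample = [
    {"DEPTID": "D100", "DEPT_DESCR": "Library",   "FUND": "F01"},
    {"DEPTID": "D100", "DEPT_DESCR": "Library",   "FUND": "F02"},
    {"DEPTID": "D200", "DEPT_DESCR": "Chemistry", "FUND": "F01"},
]

print(determines(sample, "DEPTID", "DEPT_DESCR"))  # (True, {})
print(determines(sample, "DEPTID", "FUND"))        # (False, {'D100': ['F01', 'F02']})
print(distinct_count(sample, "FUND"))              # 2
```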
Data Modeling Process: Design
■ Modeling tool
– Needed a tool to support efficient design
– Limitations of modeling tools like Visio
– Embarcadero ER Studio
■ Design sessions with data steward
– model reviews
» Validated groupings of attributes into dimensions
» New (non-reporting solution) sources (e.g., department, program, and project trees)
– prototyping dashboards
FINANCE DATA MODELS
Cal Poly Finance Data Models
■ 4 data models implemented to date
■ 22 Dimensions
– Reused across models
– Chart fields, Business unit, Ledger, etc.
■ 4 Fact tables
– Actual Transactions
– Budget Transactions
– Encumbrance Transactions
– Actual, Budget and Encumbrance Summary
High Level Finance Data Model Diagram
■ Fact tables: Actual, Budget, Encumbrance, Summary
■ Shared dimensions
– Who (Dept ID, Vendor, etc.)
– What (Account, Fund, etc.)
– When (Acctg. Period, Fiscal Year, etc.)
– Where (Business Unit, etc.)
Model Overview – Actual, Budget and Encumbrance Summary
Model Overview – Actual Transactions
Model Overview – Budget Transactions
Model Overview – Encumbrance Transactions
Closer Look at a Dimension
■ Department
– FINANCE_DEPARTMENT
■ Initial source was the CSU Reporting Solution Department Attribute table
– PS_CSU_R_DEPT_TBL
Closer Look at a Dimension
■ Source Department table
– Contains a “flattened” version of the campus organization department tree
– Ragged hierarchy
■ Added a second source – the Cal Poly department tree
– Non-ragged hierarchy
– Robust hierarchy for data exploration
– Supports reporting on department reorganization or renaming
– Cal Poly users are accustomed to using this tree
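The difference between a ragged and a non-ragged hierarchy shows up when a parent-child tree is flattened into fixed level columns: branches of different depths leave gaps. The sketch below pads shorter branches by repeating their lowest node so every row has a value at every level. That padding rule is one common convention, assumed here for illustration rather than taken from the PolyData implementation, and the tree itself is hypothetical.

```python
def path_to_root(node, parent):
    """Walk from a node up to the root; return the path root-first."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return list(reversed(path))

def flatten(leaves, parent, depth):
    """Flatten each leaf's path into exactly `depth` level columns.
    Short (ragged) paths are padded by repeating their last node."""
    rows = {}
    for leaf in leaves:
        path = path_to_root(leaf, parent)
        if len(path) > depth:
            raise ValueError(f"{leaf} is deeper than {depth} levels")
        rows[leaf] = path + [path[-1]] * (depth - len(path))
    return rows

# Hypothetical tree: child -> parent (None marks the root).
parent = {
    "University": None,
    "Academic Affairs": "University",
    "College of Science": "Academic Affairs",
    "Chemistry": "College of Science",
    "Library": "Academic Affairs",  # one level shallower: a ragged branch
}
flat = flatten(["Chemistry", "Library"], parent, depth=4)
for leaf, levels in flat.items():
    print(leaf, levels)
# Chemistry ['University', 'Academic Affairs', 'College of Science', 'Chemistry']
# Library ['University', 'Academic Affairs', 'Library', 'Library']
```

With every level populated, a report can drill from level 1 to level 4 without special handling for short branches.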
Closer Look at Department Dimension
■ Department Budget Specialist and Manager
– Reporting Solution provides a single manager field
■ Cal Poly needs primary and secondary budget specialists and managers
– Available for querying and display in reports
– Used for access control in Finance dashboards (filtering and ease of use)
■ Source – Excel spreadsheet
– Provided by Finance
– Updated weekly
– Plan to create a small web application to capture this data in the future
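A minimal sketch of how the primary/secondary specialist and manager columns could drive dashboard filtering: a user sees only departments where they hold one of the four roles. The column names, user IDs, and the "any role grants visibility" rule are assumptions for illustration; the slides do not describe the actual access-control logic.

```python
# Hypothetical department rows carrying the spreadsheet-sourced role columns.
departments = [
    {"department_code": "D100", "primary_specialist": "jdoe",
     "secondary_specialist": "asmith", "primary_manager": "bchan",
     "secondary_manager": None},
    {"department_code": "D200", "primary_specialist": "asmith",
     "secondary_specialist": None, "primary_manager": "mlee",
     "secondary_manager": "bchan"},
]

ROLE_COLUMNS = ("primary_specialist", "secondary_specialist",
                "primary_manager", "secondary_manager")

def visible_departments(user_id):
    """Assumed rule: a user may see any department where they hold one of the roles."""
    return [d["department_code"] for d in departments
            if any(d[col] == user_id for col in ROLE_COLUMNS)]

print(visible_departments("asmith"))  # ['D100', 'D200']
print(visible_departments("mlee"))    # ['D200']
```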
Department Dimension
Presentation of Data Models
Transactional vs. Summary Models
■ Dimensions in the summary model are a
subset of those in the transactional
models
– Allows for drill-across from summary to
transactional models
– “Feels like” a drill-down
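Because the summary model's dimensions are a subset of the transactional models' dimensions, the keys on any summary row can be used directly to filter a transaction fact. That is why the drill-across "feels like" a drill-down. A small sqlite3 sketch follows; the table names, the period_key encoding, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Summary fact: uses a subset of the transactional model's dimensions.
CREATE TABLE summary_fact (department_key INTEGER, account_key INTEGER,
                           period_key INTEGER, actual_amount REAL);
-- Transaction fact: the same shared keys plus transaction-level detail.
CREATE TABLE actual_txn_fact (department_key INTEGER, account_key INTEGER,
                              period_key INTEGER, vendor_key INTEGER,
                              document_id TEXT, monetary_amount REAL);
INSERT INTO summary_fact VALUES (1, 10, 200807, 300.0), (2, 10, 200807, 50.0);
INSERT INTO actual_txn_fact VALUES
    (1, 10, 200807, 7, 'INV001', 120.0),
    (1, 10, 200807, 8, 'INV002', 180.0),
    (2, 10, 200807, 7, 'INV003', 50.0);
""")

def drill_across(summary_row):
    """Use the summary row's shared dimension keys to fetch the detail rows."""
    dept, acct, period = summary_row[:3]
    return conn.execute(
        """SELECT document_id, monetary_amount FROM actual_txn_fact
           WHERE department_key = ? AND account_key = ? AND period_key = ?
           ORDER BY document_id""",
        (dept, acct, period)).fetchall()

first = conn.execute("SELECT * FROM summary_fact ORDER BY department_key").fetchone()
print(first)                # (1, 10, 200807, 300.0)
print(drill_across(first))  # [('INV001', 120.0), ('INV002', 180.0)]
```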
Design Challenges and Choices
Design Challenges
Challenge
■ Reporting solution is denormalized
– PolyData typically sources from normalized systems and manages denormalization itself
Solution
■ Took us a little outside of our comfort zone
■ Deconstructed the reporting tables into unique combinations of elements
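Deconstructing a denormalized table can be sketched as collecting each unique combination of descriptive attributes into a dimension with a surrogate key, then rewriting each row as a fact that references that key. The field names and sample rows below are hypothetical, and assigning keys in first-seen order is a simplifying assumption.

```python
def deconstruct(rows, dim_fields, measure_fields):
    """Split denormalized rows into a dimension of unique attribute combinations
    (with surrogate keys) and fact rows that reference those keys."""
    dimension = {}  # attribute tuple -> surrogate key
    facts = []
    for row in rows:
        combo = tuple(row[f] for f in dim_fields)
        if combo not in dimension:
            dimension[combo] = len(dimension) + 1  # next surrogate key
        facts.append((dimension[combo], *(row[m] for m in measure_fields)))
    return dimension, facts

# Hypothetical denormalized reporting rows: descriptive attributes repeat per row.
rows = [
    {"DEPTID": "D100", "DEPT_DESCR": "Library",   "AMOUNT": 10.0},
    {"DEPTID": "D200", "DEPT_DESCR": "Chemistry", "AMOUNT": 20.0},
    {"DEPTID": "D100", "DEPT_DESCR": "Library",   "AMOUNT": 5.0},
]
dim, facts = deconstruct(rows, ["DEPTID", "DEPT_DESCR"], ["AMOUNT"])
print(dim)    # {('D100', 'Library'): 1, ('D200', 'Chemistry'): 2}
print(facts)  # [(1, 10.0), (2, 20.0), (1, 5.0)]
```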
Design Challenges
Challenge
■ Attributes are “overloaded”
– For example, a document_id can represent an invoice
number, a PO number, a journal identifier, etc.
Solution
■ Preserved this concept in the dimensional
models because it is familiar to Finance
Design Challenges
Challenge
■ Uniqueness is not enforced in the reporting solution
Solution
■ Added an instance number to distinguish identical transactions
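An instance number can be assigned by counting repeats of each identical row, similar in spirit to a SQL `ROW_NUMBER() OVER (PARTITION BY <all columns>)`. The sketch below uses hypothetical transaction tuples. Note that the numbering depends on the order in which rows are read, so it is only stable across nightly loads if the source is read in a consistent order.

```python
from collections import Counter

def add_instance_numbers(rows):
    """Number identical rows 1, 2, 3, ... so each (row, instance) pair is unique."""
    seen = Counter()
    numbered = []
    for row in rows:
        seen[row] += 1
        numbered.append(row + (seen[row],))
    return numbered

# Hypothetical transaction rows: (journal_id, account, amount); two are identical.
txns = [("J001", "A600", 25.0), ("J001", "A600", 25.0), ("J002", "A600", 10.0)]
print(add_instance_numbers(txns))
# [('J001', 'A600', 25.0, 1), ('J001', 'A600', 25.0, 2), ('J002', 'A600', 10.0, 1)]
```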
Design Challenges
Challenge
■ Nightly rebuild of the reporting solution can delete rows
Solution
■ Effective-dated transactions in the fact tables
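A minimal sketch of effective-dating fact rows: when a transaction disappears from the nightly rebuild, the warehouse end-dates it instead of deleting it, so earlier reports can still be reproduced "as of" a given day. It assumes each transaction has a stable identifying key (hypothetical here) and uses a hypothetical open-ended end date; changed amounts on an existing key are not handled.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # hypothetical "still current" end date

def apply_nightly_load(fact, source_rows, load_date):
    """Rows missing from tonight's rebuild are end-dated rather than deleted;
    rows not already current are inserted as current."""
    source_keys = {r["key"] for r in source_rows}
    current = {f["key"] for f in fact if f["eff_to"] == OPEN_END}
    for f in fact:
        if f["eff_to"] == OPEN_END and f["key"] not in source_keys:
            f["eff_to"] = load_date  # expire instead of delete
    for r in source_rows:
        if r["key"] not in current:
            fact.append({**r, "eff_from": load_date, "eff_to": OPEN_END})
    return fact

def as_of(fact, day):
    """Keys of fact rows that were effective on the given day."""
    return sorted(f["key"] for f in fact if f["eff_from"] <= day < f["eff_to"])

fact = []
apply_nightly_load(fact, [{"key": "T1", "amount": 5.0},
                          {"key": "T2", "amount": 8.0}], date(2008, 10, 1))
apply_nightly_load(fact, [{"key": "T1", "amount": 5.0}], date(2008, 10, 2))  # T2 vanished
print(as_of(fact, date(2008, 10, 1)))  # ['T1', 'T2']
print(as_of(fact, date(2008, 10, 2)))  # ['T1']
```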
Design Challenges
Challenge
■ Transactional and summary reporting tables may not tie
– Journal vs. ledger sources
– Summing the detail may give the wrong answer
Solution
■ This is a known issue to which Finance is already accustomed
■ Opportunity for a dashboard integrity report
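An integrity report of the kind suggested here could sum the transaction detail per key and list every key whose total does not tie to the summary balance. The key structure (department, account, accounting period), the half-cent tolerance, and the sample data are assumptions for illustration; a production version would more likely use exact decimal arithmetic.

```python
from collections import defaultdict

def integrity_report(detail, summary, tolerance=0.005):
    """Compare summed transaction detail to summary balances per key.
    detail: iterable of (key, amount); summary: dict key -> amount.
    Returns {key: (detail_total, summary_total)} for keys that do not tie."""
    totals = defaultdict(float)
    for key, amount in detail:
        totals[key] += amount
    mismatches = {}
    for key in sorted(set(totals) | set(summary)):
        d, s = totals.get(key, 0.0), summary.get(key, 0.0)
        if abs(d - s) > tolerance:
            mismatches[key] = (d, s)
    return mismatches

# Hypothetical keys: (department, account, accounting period).
detail = [(("D100", "A600", 1), 120.0), (("D100", "A600", 1), 180.0),
          (("D200", "A600", 1), 40.0)]
summary = {("D100", "A600", 1): 300.0, ("D200", "A600", 1): 50.0}
print(integrity_report(detail, summary))
# {('D200', 'A600', 1): (40.0, 50.0)}
```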
Design Challenges - Naming
Challenge
■ Reporting Solution names did not conform to PolyData Warehouse standards
Solution
■ Data Warehouse standards
– Field and table names use full English words when possible, for usability
– Codes precede their corresponding descriptions (Code, Descr)
■ Used reporting solution names spelled out in full, adding ‘Code’ and ‘Descr’ where appropriate
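The renaming rule can be sketched as expanding abbreviated name parts from a lookup table and then appending a Code or Descr suffix where appropriate. The abbreviation map below is hypothetical; the slides do not list PolyData's actual abbreviations.

```python
# Hypothetical abbreviation map; the real PolyData list is not given in the slides.
ABBREVIATIONS = {"DEPT": "DEPARTMENT", "ACCT": "ACCOUNT", "PRGM": "PROGRAM",
                 "PROJ": "PROJECT", "AMT": "AMOUNT"}

def warehouse_name(source_name, kind=""):
    """Spell out abbreviated name parts; optionally append a CODE/DESCR suffix."""
    parts = [ABBREVIATIONS.get(p, p) for p in source_name.upper().split("_")]
    if kind in ("CODE", "DESCR") and parts[-1] != kind:
        parts.append(kind)
    return "_".join(parts)

print(warehouse_name("DEPT", "CODE"))   # DEPARTMENT_CODE
print(warehouse_name("DEPT", "DESCR"))  # DEPARTMENT_DESCR
print(warehouse_name("MONETARY_AMT"))   # MONETARY_AMOUNT
```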
Design Choices –
Slowly Changing Dimensions
■ The data steward determined that most dimensional attributes are slowly changing dimension Type 1 (SCD1)
■ Exception: Department table
– SCD1 attributes such as department description
– SCD2 department tree data
■ *IF* you need to track historical changes to
dimensions
– You may need to source dimensions from source system(s)
– Candidates include chart fields, vendors, customers
Design Choices – SCD Example
■ Cal Poly needs department tree history
– Department tree data
» Slowly Changing Dimension Type 2 - preserves
history
» Effective date rows (effective from and to dates)
» Add new row for each change
– All other department attributes
» Slowly Changing Dimension Type 1 –
overwrites history
» Replace old/outdated data with current
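The mixed treatment of the department dimension can be sketched as follows: a change to the description overwrites every row for that department (Type 1), while a change to tree data expires the current row and adds a new one (Type 2). The column names, the open-ended end date, and the tree node names are hypothetical.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # hypothetical open-ended "effective to" value

def apply_change(dim, dept_code, changes, change_date):
    """'descr' is handled as SCD Type 1 (overwrite on all rows for the department);
    'tree_node' is handled as SCD Type 2 (expire the current row, add a new one)."""
    rows = [r for r in dim if r["dept_code"] == dept_code]
    if "descr" in changes:  # Type 1: overwrites history
        for r in rows:
            r["descr"] = changes["descr"]
    if "tree_node" in changes:  # Type 2: preserves history
        current = next(r for r in rows if r["eff_to"] == HIGH_DATE)
        if current["tree_node"] != changes["tree_node"]:
            current["eff_to"] = change_date
            dim.append({**current, "tree_node": changes["tree_node"],
                        "eff_from": change_date, "eff_to": HIGH_DATE})
    return dim

dim = [{"dept_code": "D100", "descr": "Library", "tree_node": "Academic Affairs",
        "eff_from": date(2007, 7, 1), "eff_to": HIGH_DATE}]
apply_change(dim, "D100", {"tree_node": "Information Services"}, date(2008, 7, 1))
apply_change(dim, "D100", {"descr": "University Library"}, date(2008, 9, 1))
for r in dim:
    print(r["descr"], r["tree_node"], r["eff_from"], r["eff_to"])
# University Library Academic Affairs 2007-07-01 2008-07-01
# University Library Information Services 2008-07-01 9999-12-31
```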
Design Choices – New Sources
■ In design and prototyping sessions with end
users, it became apparent that additional
source data was needed
■ New non-reporting solution sources were needed to supplement the existing source
– Department tree
– Program tree
– Project tree
■ Design change from using only the reporting solution as the source
IMPLEMENTATION
Time and Resources
■ Modeling/Domain familiarization
– 2 data modelers
– June through August 2008
■ Source-to-Target analysis and
documentation
– 2 analysts
– July through September 2008
Time and Resources
■ Coding and system integration
– 4 ETL programmers
– August through October 2008
■ Total person-days
– July through October 2008
– Approximately 140 person-days
Time and Resources
■ Caveats
– Established documentation methods and
coding standards
– Slowly changing dimension logic developed or provided by the toolset
– 3 transactional models implemented
identically
Nightly Build
Job: Minutes (approximate)
■ Source pull: 30
■ Reporting solution build: 70
■ Data model build: 140
■ End user table refresh: 60
■ TOTAL: 300
Performance Tuning:
Nightly Build
■ Coordination with Finance on their builds
– Nightly processing
– Reporting solution (in transactional database)
■ Took approximately one month for timing to level out
– Tuning specific to the finance jobs
– Coordination with other PolyData warehouse jobs
Performance Tuning:
End-User Tables
■ Performance was reasonable prior to
indexing
– Largely due to the dimensional structure
■ Performance screamed after indexing
– Indexes on fields used in selection criteria
and drillable hierarchies
– Bitmap indexes on foreign keys in facts
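To show why bitmap indexes suit fact-table foreign keys, the sketch below builds one bitmap per distinct key value, using a Python integer as a bit set, and answers a two-dimension filter with a single bitwise AND. This only illustrates the idea. Database bitmap indexes, such as Oracle's, are compressed and maintained by the engine, and the key columns here are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Map each distinct value to an integer bitmap: bit i is set if row i has it."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return index

def rows_from_bitmap(bitmap):
    """Decode a bitmap back into the row positions whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical fact-table foreign-key columns (position i is fact row i).
department_key = [1, 2, 1, 3, 1, 2]
account_key    = [10, 10, 20, 10, 10, 20]

dept_idx = build_bitmap_index(department_key)
acct_idx = build_bitmap_index(account_key)

# Filtering on two dimensions becomes a bitwise AND of two bitmaps.
matches = dept_idx[1] & acct_idx[10]
print(rows_from_bitmap(matches))  # [0, 4]
```

Low-cardinality key columns produce few bitmaps, and combining filters from several dimensions costs only cheap bitwise operations before any fact rows are read.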
Implementation:
Interface with Front-End Developers
■ Joins should be fully documented
■ Front-end developers may need some training in interpreting models
■ We still have not come up with an ideal method for documenting hierarchies
■ Challenge – knowledge of hierarchies is shared among the
– data steward
– front-end developers
– modelers
CONCLUSION
Future Work
■ Labor Cost
■ GAAP Reporting
■ Management Dashboard/Analytics
■ Integration with HR and Student Data
Questions?
Daniel Grieb
Data Warehouse Architect, Analyst/Programmer
Lori Silvestri
Data Warehouse Analyst/Programmer
Contact
■ OBIEE Technical Conference:
http://polydata.calpoly.edu/dashboards/obiee_conf/index.html
■ Email: polydata@calpoly.edu