An Agile Approach to Building &
Managing Data Warehouses
A Briefing by WhereScape
Mary Edie Meredith, Sr. Technical Analyst
- maryedie@wherescape.com
Why do Data Warehouse Projects struggle?
1. Inaccurate business requirements - the #1 problem according to IDC
2. Poor development productivity
3. Slow development cycles
4. High cost of resources
5. High TCO
6. Poor documentation – usually the last thing that is considered and never up to date
7. Poor data quality
8. HIGH RISK
Gartner notes that over 50% of data warehouse projects fail
or go wildly over budget
2
Where did they go wrong?
– one real problem is the “Big Bang” project approach
“Incremental Data Warehouse Development –
The Only Way to Fly” Bill Inmon, Jan 8, 2009, (BeyeNetwork)
– “There are many reasons the ‘Big Bang’ approach doesn’t work … but at the heart is
inability of the development analyst to gather requirements in the manner prescribed
by the SDLC”
– “End users of analytical systems need to know what the possibilities are before they
can articulate the requirements.”
The goal is NOT to build a Data Warehouse, but rather…
– Deliver real value
– Create a solution that is adaptable because responding quickly to change brings
competitive advantage
– Create a process to develop and maintain the solution that is trustworthy and
sustainable
3
How would agile proponents approach the problem?
From the agile manifesto: //agile
• Early, frequent, and continuous test and delivery of valuable working
software (every 2 weeks to 2 months).
• Welcome changing requirements, even late in development.
• Business people and developers work together daily throughout the project.
• Build projects around motivated individuals. Give them the environment and
support they need, and trust them to get the job done.
• The most efficient, effective method of conveying information to and within a
development team is face-to-face conversation.
• Continuous attention to technical excellence and good design enhances
agility.
• Simplicity--the art of maximizing the amount of work not done--is essential.
• At regular intervals, the team reflects on how to become more effective, then
tunes and adjusts its behavior accordingly.
4
What is uncomfortable about this approach?
• The further out in time, the less a project team can say about what will
be accomplished.
• An agile approach can break the rules.
– Agile implementers sometimes wrongly assume you can break ANY rule.
– Shortcuts do not equal Quality Pragmatism
• Classic trade-offs for project managers - Schedule / Scope / Resources /
Quality – agile leaves little wiggle room.
• Does not lend itself to outsourcing or distributed teams.
• Having a close working relationship with business users does not solve
the difficulty of determining requirements.
And ….
5
If I could deliver something meaningful in weeks,
DON’T YOU THINK I WOULD HAVE ALREADY?
6
Agile Approach Versus Traditional Approach
Docs?
7
What really works using agile
“The WhereScape Way”
• A Governance structure
– Strategy, Architecture, Roadmap, Standards
– Goals, sponsors, infrastructure, data governance ….
• New Development Paradigm for delivering data - RED
– ETL tools are great for moving data, but RED can do DW part better.
– Integrated Development using one metadata driven tool.
– Do the data delivery in the database.
– Incorporate Business Rules into data delivery process
• Iterative workshops with business users
– Use REAL DATA for fleshing out requirements (RED enables this)
– Track all issues discovered, especially data quality
8
Agile in Operation
Business User Sessions
Live Data Workshop
• Integrate analysis, design, creation, data delivery, deployment, iteration
• Useful even if you just need to provide the presentation layer
• Feedback from business users on live data part of the development process
9
Speeding up the development by leveraging
metadata, embedding best practice methods
dim_customer_key
dss_update_time
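A minimal sketch of the metadata-driven idea, not RED's actual implementation: table definitions are held as column metadata, and the DDL is generated from it. The column names `dim_customer_key` and `dss_update_time` come from the slide; the other columns, the metadata layout, and the `generate_create_table` helper are assumptions made for illustration, and SQLite stands in for the warehouse database.

```python
import sqlite3

# Hypothetical column metadata for a customer dimension (illustrative only).
DIM_CUSTOMER_COLUMNS = [
    {"name": "dim_customer_key", "type": "INTEGER", "role": "surrogate_key"},
    {"name": "customer_code", "type": "TEXT", "role": "business_key"},
    {"name": "customer_name", "type": "TEXT", "role": "attribute"},
    {"name": "dss_update_time", "type": "TEXT", "role": "audit"},
]


def generate_create_table(table_name, columns):
    """Build a CREATE TABLE statement from column metadata."""
    lines = []
    for col in columns:
        # Only the surrogate key is marked as the primary key.
        suffix = " PRIMARY KEY" if col["role"] == "surrogate_key" else ""
        lines.append(f"    {col['name']} {col['type']}{suffix}")
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(lines) + "\n)"


ddl = generate_create_table("dim_customer", DIM_CUSTOMER_COLUMNS)

# Run the generated DDL against an in-memory database and read the columns back.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
created_columns = [row[1] for row in conn.execute("PRAGMA table_info(dim_customer)")]
print(ddl)
```

Because the table is described as data, changing a column means editing the metadata and regenerating, rather than hand-editing DDL.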
10
Data Warehouse Scenario – Build a Sales Fact
11
Star schema creation scenario – start with load table
Source: Native RDBMS, ODBC accessible, Files
Warehouse: Oracle, SQL Server, Teradata, DB2
12
RED Browser Mode
Actions
Drag and Drop Target Area
Choose connection and filtering
Metadata
Browsing
Connections
Results
13
For the Teradata shop -
14
Drag and Drop Example: load source data
16
Drag and Drop Example: load table properties
17
Drag and Drop Example: load table storage mapping
18
Drag and Drop Example: load table “create and load”
metadata
19
Drag and Drop Example: load table results
create
generated load script execution
20
Drag and Drop Example: load table results
Display Data
create
generated load script execution
21
Stage table creation scenario – the stage table
Source
Warehouse
Foreign dimension keys, lookups
Source table join
22
Stage table: start with load_order_header (Drag and Drop)
23
Add columns from load_order_line (Drag and Drop)
Load_order_header
Column metadata
24
Add columns from load_order_line (Drag and Drop)
prevents duplicate column names
Load_order_header
Column metadata
25
Add FK cols to Stage Table – Drag and Drop dim_*
Drag and drop Dimension table keys
26
Column Metadata easily altered
27
Column Transformations –
Business Rules, Computed Fields, String Manipulation, Type Conversion, Null handling, …
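The kinds of column transformation listed above can be sketched as plain SQL expressions evaluated in the database. This is an illustration only: the table name `load_order_line` comes from the slides, while its columns and sample rows are assumptions, and SQLite stands in for the warehouse platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical load table layout; only the table name appears in the slides.
conn.execute(
    "CREATE TABLE load_order_line ("
    "order_number TEXT, quantity TEXT, unit_price REAL, product_code TEXT)"
)
conn.executemany(
    "INSERT INTO load_order_line VALUES (?, ?, ?, ?)",
    [
        ("1001", "3", 2.5, " ab-12 "),
        ("1002", None, 4.0, "cd-34"),
    ],
)

transformed = conn.execute(
    """
    SELECT CAST(order_number AS INTEGER)           AS order_number,  -- type conversion
           COALESCE(CAST(quantity AS INTEGER), 0)  AS quantity,      -- null handling
           UPPER(TRIM(product_code))               AS product_code,  -- string manipulation
           COALESCE(CAST(quantity AS INTEGER), 0) * unit_price
                                                   AS line_total     -- computed field
    FROM load_order_line
    ORDER BY order_number
    """
).fetchall()

for row in transformed:
    print(row)
```

Keeping each transformation as an expression attached to a column means the rule lives with the column definition and is applied inside the database when the table is populated.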
28
Create the Stage Table (right click object)
29
Create the update procedure (object Properties)
30
…then select Procedure Type
31
… then specify the Join statement
Numerous joins supported
add appropriate clauses
32
…indicate the business key used to identify the surrogate key (SK) in the Dimension
Prompts if column names match
33
…indicate the join column if names are different
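A hedged sketch of what a stage-table update of this kind might do; it is not the code RED generates. It joins the two load tables, then uses the business key to look up the dimension's surrogate key, with the join column named differently on each side (`customer_code` versus `code`). All column names, the `stage_sales` table, and the convention that key 0 means "unknown" are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE load_order_header (order_number INTEGER, customer_code TEXT);
    CREATE TABLE load_order_line (order_number INTEGER, line_no INTEGER, quantity INTEGER);
    CREATE TABLE dim_customer (dim_customer_key INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE stage_sales (
        order_number INTEGER, line_no INTEGER, quantity INTEGER, dim_customer_key INTEGER
    );

    -- Key 0 is an assumed placeholder row for unknown customers.
    INSERT INTO dim_customer VALUES (0, 'UNKNOWN'), (1, 'C100');
    INSERT INTO load_order_header VALUES (1, 'C100'), (2, 'C999');
    INSERT INTO load_order_line VALUES (1, 1, 5), (2, 1, 7);
    """
)

with conn:  # commit on success, roll back on error
    conn.execute(
        """
        INSERT INTO stage_sales
        SELECT h.order_number,
               l.line_no,
               l.quantity,
               COALESCE(d.dim_customer_key, 0)      -- no match: fall back to key 0
        FROM load_order_header h
        JOIN load_order_line l ON l.order_number = h.order_number
        LEFT JOIN dim_customer d ON d.code = h.customer_code  -- business key lookup
        """
    )

stage_rows = conn.execute(
    "SELECT * FROM stage_sales ORDER BY order_number"
).fetchall()
print(stage_rows)
```

Order 2 refers to customer `C999`, which is not in the dimension, so its row is kept and pointed at key 0 instead of being dropped by the join.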
34
Procedure is created and compiled. Execute the procedure.
35
Display Data
36
Fact table creation scenario – Sales Fact table
Source
Warehouse
37
Create the Fact Table from the Stage table
38
Metadata leveraged to create the code
Dimension tables are created with “zero” row for unknowns
Transformation for quantity column
Join metadata
39
Auto generated stored procedure code …
• Keeps all the data movement in the database
• Provides consistent variable naming, coding best practices
• Utilizes custom parameters you can embed in metadata
• Includes error checking and rollbacks
• Preserves the metadata for easy modification
• Can augment with custom procedures
• Includes best-practice features for various object types
o Can handle slowly changing dimensions (all three types)
o Procedure provided to populate and update time dimension
o Handles code for surrogate keys, update and life-span dates
o Creates Unknown Row for each dimension table
o Accounts for missing dimension key matches in source data
• Lets advanced developers skip the mundane
• Allows less experienced developers to be productive
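To make the slowly-changing-dimension item concrete, here is a minimal Type 2 sketch with surrogate keys and life-span dates. It is an assumption-laden illustration, not RED's generated procedure: the column names (`dss_start_date`, `dss_end_date`, `dss_current_flag`), the open-ended date `2999-12-31`, and the `apply_type2_change` helper are all hypothetical, and only the Type 2 case is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (
        dim_customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_code    TEXT,                               -- business key
        city             TEXT,
        dss_start_date   TEXT,
        dss_end_date     TEXT,
        dss_current_flag TEXT
    );
    INSERT INTO dim_customer
        (customer_code, city, dss_start_date, dss_end_date, dss_current_flag)
    VALUES ('C100', 'Portland', '2009-01-01', '2999-12-31', 'Y');
    """
)


def apply_type2_change(conn, customer_code, new_city, change_date):
    """Expire the current row and insert a new version if the city changed."""
    current = conn.execute(
        "SELECT dim_customer_key, city FROM dim_customer "
        "WHERE customer_code = ? AND dss_current_flag = 'Y'",
        (customer_code,),
    ).fetchone()
    if current is not None and current[1] == new_city:
        return  # attribute unchanged, keep the current version
    with conn:  # both statements commit together, or roll back on error
        if current is not None:
            conn.execute(
                "UPDATE dim_customer SET dss_end_date = ?, dss_current_flag = 'N' "
                "WHERE dim_customer_key = ?",
                (change_date, current[0]),
            )
        conn.execute(
            "INSERT INTO dim_customer "
            "(customer_code, city, dss_start_date, dss_end_date, dss_current_flag) "
            "VALUES (?, ?, ?, '2999-12-31', 'Y')",
            (customer_code, new_city, change_date),
        )


apply_type2_change(conn, "C100", "Seattle", "2010-06-01")
history = conn.execute(
    "SELECT city, dss_start_date, dss_end_date, dss_current_flag "
    "FROM dim_customer ORDER BY dim_customer_key"
).fetchall()
print(history)
```

The old row is closed off rather than overwritten, so facts loaded before the change still join to the customer as it was at the time.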
40
Generated Procedures with version compares
41
Next Step – Business User review
Easy vehicles to show this to Business users:
Output table data to Excel
Stress test with SSAS cube
42
Create a SSAS Cube for Business User Eval
Drag and Drop Fact to OLAP Cube target
Creates OLAP measure group
Creates OLAP dimensions
43
Create a SSAS Cube for Business User Eval
Slice and Dice in Analysis Services
44
Capturing Metadata - Lineage information
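One way to picture lineage captured as metadata is a mapping from each target column to the column it was built from, which can then be walked back to the source. This is a sketch under assumptions: the mapping contents, the `stage_sales` and `fact_sales` names, and the `trace_lineage` helper are hypothetical and do not reflect how RED stores lineage.

```python
# Hypothetical column-level lineage metadata: target column -> source column.
COLUMN_SOURCES = {
    "fact_sales.quantity": "stage_sales.quantity",
    "stage_sales.quantity": "load_order_line.quantity",
    "load_order_line.quantity": "source.order_line.qty",
}


def trace_lineage(column, sources):
    """Follow a column back through its sources until none is recorded."""
    path = [column]
    while path[-1] in sources:
        upstream = sources[path[-1]]
        if upstream in path:
            raise ValueError(f"circular lineage at {upstream}")
        path.append(upstream)
    return path


lineage = trace_lineage("fact_sales.quantity", COLUMN_SOURCES)
print(" <- ".join(lineage))
```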
45
Leveraging Metadata: Reports
46
Ready to Deploy
47
Scheduler to manage objects and data flow
Run in parallel
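A rough sketch of dependency-aware parallel scheduling, assuming each job task lists the tasks it must wait for. It is not the RED scheduler: the dependency map, the `stage_sales` and `fact_sales` task names, and the `run_in_waves` helper are hypothetical. Tasks whose dependencies are all complete run together on a thread pool, wave by wave.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical job definition: task -> tasks that must finish first.
DEPENDENCIES = {
    "load_order_header": [],
    "load_order_line": [],
    "stage_sales": ["load_order_header", "load_order_line"],
    "fact_sales": ["stage_sales"],
}


def run_in_waves(dependencies, run_task):
    """Run tasks in waves; tasks within one wave execute in parallel."""
    done, waves = set(), []
    with ThreadPoolExecutor() as pool:
        while len(done) < len(dependencies):
            ready = sorted(
                task
                for task, deps in dependencies.items()
                if task not in done and all(dep in done for dep in deps)
            )
            if not ready:
                raise ValueError("circular dependency between tasks")
            # map() blocks until every task in this wave has finished.
            list(pool.map(run_task, ready))
            done.update(ready)
            waves.append(ready)
    return waves


executed = []
waves = run_in_waves(DEPENDENCIES, executed.append)
print(waves)
```

Here the two load tasks share a wave because neither depends on the other, while the stage and fact updates wait for their inputs.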
48
Diagrammatical View Example: Update Job
50
Application Files to promote to QA and Production
51
Leveraging Metadata: Auto Producing Documentation
52
User Documentation
53
Where RED fits
"WhereScape promised a lot and the product has
delivered. We are very happy with the amount of
time it is saving us in development, as well as the
documentation it is producing and the built-in
scheduler. I am very happy with the purchase."
"We estimate the development lifecycle is 20-25% of
what it was previously when we were hand-coding."
Dan Mosher, Director of Enterprise Data Warehousing
55
“WhereScape RED offers IPC a sophisticated Lifecycle Methodology that
guides us through the process of building our data warehouse. RED
creates integrated database objects such as tables, indexes,
procedures, etc; produces standard yet customizable T-SQL code and
auto-generated user and technical documentation.”
Maylee Sanchez,
Sr. Database Administrator
Some WhereScape Customers
57
Conclusion
• Build Your Data Warehouse Solution
– Way Faster
– Way Cheaper
– Ready for Change
• Get Full Documentation
– For Users
– For Techies
• And DO IT THE AGILE WAY
58
Tools and Reports
60
Additional CUBE Features
• Can add MDX calculations to the cube metadata for calculated
members
– Specify font, foreground/background colors, boldness, display format, non-empty behavior, order number,
client visibility
• Canned MDX calculations
– Month/Year to date, Moving Qtr/Year, same month previous year, previous year to date.
• Can specify Post Create or Post Update XML/A Scripts
– Allows features built outside of RED to be added to the scheduled cube processing (e.g. security roles added,
perspectives, translations)
• Cube properties include
– Processing modes for Cubes (Regular, Lazy Aggregation) and priority
– OLAP dimension processing (together or separately)
– Cube visibility to client applications
– Default Measure and estimated rows
• Can optionally drop Dimensions, Measure Groups, Cubes, and Cube
databases from within RED.
• Can manage KPIs, partitioning, and processing for measure groups
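For readers unfamiliar with the canned time calculations named above, the following sketch shows what three of them compute, written in plain Python rather than MDX. The sales figures and function names are made up for illustration and say nothing about how RED or Analysis Services implements them.

```python
# Hypothetical monthly sales keyed by (year, month).
MONTHLY_SALES = {
    (2008, 1): 100, (2008, 2): 120, (2008, 3): 90,
    (2009, 1): 110, (2009, 2): 130, (2009, 3): 95,
}


def year_to_date(sales, year, month):
    """Sum of sales from January through the given month of the same year."""
    return sum(v for (y, m), v in sales.items() if y == year and m <= month)


def same_month_previous_year(sales, year, month):
    """Sales for the same month one year earlier, or None if not recorded."""
    return sales.get((year - 1, month))


def previous_year_to_date(sales, year, month):
    """Year-to-date total for the same period of the previous year."""
    return year_to_date(sales, year - 1, month)


ytd = year_to_date(MONTHLY_SALES, 2009, 2)
smpy = same_month_previous_year(MONTHLY_SALES, 2009, 2)
pytd = previous_year_to_date(MONTHLY_SALES, 2009, 2)
print(ytd, smpy, pytd)
```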
61