Uploaded by SM Tabrez

Data Lifecycle Stages & Governance

Data lifecycle stages

Dataset discovery
• Dataset dictionary
• Register producers, consumers and use cases
• Used for data discovery
• Catalogue new datasets

Data sourcing
• Define the producer/consumer model
• Define SLA/SLO/SLE/volume
• Define the pipeline
• Quality checks and alerting mechanism

Data synthesis
• Enrich data: join with static data/market data
• Transform data: projections/derivations
• Model-to-model transformations

Data consumption
• Reporting
• Dashboarding
• Analytics
• Ad-hoc extracts

Redistribution
• Register clients for redistribution
• Provide data and information to downstream systems
• Real-time movement of data
• Entitlement controls and security for redistribution of data

Archival and purge
• Archive data based on regulatory needs
• Reproducibility based on archived data
• Purge obsolete data

Governance and control
• Entitlement and access management
• Ownership and audit control
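
The dataset discovery and sourcing events amount to keeping a dataset dictionary in which every dataset is registered with its producer, its attributes, and its consumers and use cases. A minimal in-memory sketch, assuming a catalogue is simply a map from dataset name to such a record; `DatasetEntry`, `Catalogue` and all dataset and system names are hypothetical and do not describe any platform named in this deck:

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical dataset-dictionary record: who produces it, what it holds, who uses it."""
    name: str
    producer: str
    attributes: set[str]
    consumers: set[str] = field(default_factory=set)
    use_cases: set[str] = field(default_factory=set)


class Catalogue:
    """Hypothetical in-memory catalogue keyed by dataset name."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Catalogue a new dataset; reject duplicates so each name has one owner.
        if entry.name in self._entries:
            raise ValueError(f"dataset {entry.name!r} is already catalogued")
        self._entries[entry.name] = entry

    def add_consumer(self, dataset: str, consumer: str, use_case: str) -> None:
        # Register a consumer together with the use case it needs the data for.
        entry = self._entries[dataset]
        entry.consumers.add(consumer)
        entry.use_cases.add(use_case)

    def find_by_attribute(self, attribute: str) -> list[str]:
        # Discovery: names of all datasets that carry the attribute, sorted.
        return sorted(n for n, e in self._entries.items() if attribute in e.attributes)

    def missing_attributes(self, dataset: str, required: set[str]) -> set[str]:
        # Which required attributes this dataset does not provide.
        return required - self._entries[dataset].attributes


# Example usage with made-up dataset and system names.
catalogue = Catalogue()
catalogue.register(DatasetEntry("trades", "trading-system", {"trade_id", "isin", "notional"}))
catalogue.register(DatasetEntry("positions", "risk-system", {"isin", "quantity"}))
catalogue.add_consumer("trades", "regulatory-reporting", "daily trade report")
print(catalogue.find_by_attribute("isin"))  # ['positions', 'trades']
print(catalogue.missing_attributes("positions", {"isin", "quantity", "price"}))  # {'price'}
```

A lookup such as `missing_attributes` answers, from the catalogue alone, which attributes a dataset fragment lacks, without inspecting each source system.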
[Figure: data lifecycle diagram: Dataset discovery/catalogue, Data purge, Sourcing, Data synthesis, Data archival, Data re-distribution]
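
The redistribution events pair client registration with entitlement controls, and governance adds ownership and audit control. One way to picture this, assuming entitlements are granted per client, per dataset and per field, is a single gate that refuses unregistered clients, projects rows down to the entitled fields, and records an audit entry. `Entitlement`, `RedistributionGate` and the client and dataset names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entitlement:
    """Hypothetical grant: a registered client may receive these fields of one dataset."""
    client: str
    dataset: str
    fields: frozenset[str]


class RedistributionGate:
    """Hypothetical control point for all data leaving to downstream clients."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, str], Entitlement] = {}
        self.audit_log: list[tuple[str, str, int]] = []

    def register_client(self, grant: Entitlement) -> None:
        self._grants[(grant.client, grant.dataset)] = grant

    def release(self, client: str, dataset: str, rows: list[dict]) -> list[dict]:
        grant = self._grants.get((client, dataset))
        if grant is None:
            # Unregistered clients receive nothing.
            raise PermissionError(f"{client!r} is not registered for {dataset!r}")
        # Keep only the fields this client is entitled to see.
        released = [{k: v for k, v in row.items() if k in grant.fields} for row in rows]
        # Record who received what, for ownership and audit control.
        self.audit_log.append((client, dataset, len(released)))
        return released


gate = RedistributionGate()
gate.register_client(Entitlement("downstream-risk", "trades", frozenset({"trade_id", "notional"})))
rows = [{"trade_id": 1, "notional": 5_000_000.0, "counterparty": "ACME"}]
print(gate.release("downstream-risk", "trades", rows))  # [{'trade_id': 1, 'notional': 5000000.0}]
```

Routing every release through one gate is what makes the audit trail complete; field-level projection is one possible granularity, and a real platform may entitle at row or dataset level instead.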
Where we are

Dataset discovery
• Undocumented, fragmented datasets: it is operationally expensive to understand which attributes are present versus missing in dataset fragments across systems.

Data sourcing
• Operational agreements between producers and consumers lead to low data quality in terms of availability, integrity and completeness. This can even have reputational impact.

Data synthesis
• Datasets are enriched from a limited set of golden sources, leading to accuracy issues.

Data consumption
• Inaccurate data may slip into reports because data quality checks may not be fully in place.
• Data is tightly coupled to the underlying storage (embedded SQL), which slows time to market for any requested change.

Redistribution
• Governance of downstream and upstream systems is weak.
• Time to market for changes suffers because every change requires manual checks, which increases operating cost.
• Quality suffers because there are no automated controls for SLO/SLA monitoring.

Archival and purge
• Operating cost is higher when we archive data that may already be archived by upstream systems. SLO/SLA contracts would help minimize this.

Governance and control
• Governance and control are limited because layers are not clearly defined, which leads to production issues.
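
Several of these issues trace back to SLO/SLA terms that are agreed operationally but never checked automatically. A minimal sketch of an automated check, assuming an SLO is expressed as a maximum staleness, a minimum row count and a set of mandatory fields (mapping loosely to availability, completeness and integrity). `Slo` and `check_delivery` are hypothetical names, and printing stands in for whatever alerting mechanism the pipeline actually uses:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO terms agreed between a producer and its consumers."""
    max_staleness: timedelta          # availability
    min_row_count: int                # completeness
    required_fields: frozenset[str]   # integrity


def check_delivery(slo: Slo, delivered_at: datetime, rows: list[dict], now: datetime) -> list[str]:
    """Return SLO breaches for one delivery; an empty list means it passed."""
    breaches: list[str] = []
    age = now - delivered_at
    if age > slo.max_staleness:
        breaches.append(f"availability: data is {age} old")
    if len(rows) < slo.min_row_count:
        breaches.append(f"completeness: {len(rows)} rows, expected at least {slo.min_row_count}")
    for i, row in enumerate(rows):
        # A field counts as present only if it exists and is not None.
        present = {k for k, v in row.items() if v is not None}
        missing = slo.required_fields - present
        if missing:
            breaches.append(f"integrity: row {i} missing {sorted(missing)}")
    return breaches


slo = Slo(timedelta(hours=2), 2, frozenset({"trade_id", "notional"}))
now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
rows = [{"trade_id": 1, "notional": 100.0}, {"trade_id": 2, "notional": None}]
for breach in check_delivery(slo, now - timedelta(hours=3), rows, now):
    print(breach)  # stand-in for the pipeline's alerting mechanism
```

Returning breaches as data rather than raising keeps the check usable both as a pipeline gate and as a monitoring job that reports every violation at once.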
Using modelled data
[Figure: modelled-data layers: Reporting; Dashboard/analytics; Conceptual model (e.g. SSDR (DQSL + LDM)); Physical storage (e.g. FMDP); Physical storage 2 (e.g. Cloud/EDMP)]
Logical model (e.g. Rosetta model)
• Application queries are insulated from the nuances of the underlying storage.
• Data analytics and governance work on a segregated, entity-level view of the data.
• Applications are shorter-lived than data, so they can be swapped out without impacting the data layer.
• Data modeling can reduce the cost of building apps by reusing entity views.
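
Insulating application queries from storage nuances can be illustrated with an interface that applications code against, while interchangeable physical stores implement it. In this sketch `TradeStore`, `FmdpStore` and `CloudStore` are hypothetical stand-ins; they only borrow the FMDP and Cloud/EDMP labels from the diagram and do not model those platforms' actual APIs:

```python
from typing import Protocol


class TradeStore(Protocol):
    """Logical view of trades; application code depends only on this interface."""

    def notional_by_book(self, book: str) -> float: ...


class FmdpStore:
    """Hypothetical physical store 1: raw (book, notional) rows, aggregated on read."""

    def __init__(self, rows: list[tuple[str, float]]) -> None:
        self._rows = rows

    def notional_by_book(self, book: str) -> float:
        return sum(n for b, n in self._rows if b == book)


class CloudStore:
    """Hypothetical physical store 2: totals pre-aggregated per book."""

    def __init__(self, totals: dict[str, float]) -> None:
        self._totals = totals

    def notional_by_book(self, book: str) -> float:
        return self._totals.get(book, 0.0)


def book_report(store: TradeStore, book: str) -> str:
    # Application logic: no embedded SQL and no knowledge of the physical layout.
    return f"{book}: {store.notional_by_book(book):,.2f}"


fmdp = FmdpStore([("rates", 100.0), ("rates", 50.0), ("fx", 10.0)])
cloud = CloudStore({"rates": 150.0, "fx": 10.0})
print(book_report(fmdp, "rates"))   # rates: 150.00
print(book_report(cloud, "rates"))  # rates: 150.00
```

Swapping `FmdpStore` for `CloudStore` changes nothing in `book_report`, which is the decoupling that embedded SQL prevents.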
Physical model
• Well-defined entities improve the accuracy of data queries.
• Relationships defined across entities allow data to be stitched together precisely.
• Better quality: data modeling can help ensure that data sources are well documented, with data catalogues and lineage.
• Producer-consumer models provide better contracts.
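
Declared relationships make stitching precise because the join key and its cardinality are explicit, so a missing or duplicated key can be rejected instead of silently producing a wrong result. A sketch assuming a hypothetical many-to-one relationship from trades to instruments keyed by ISIN; the entity classes, `stitch` and the sample ISIN values are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instrument:
    isin: str       # primary key
    currency: str


@dataclass(frozen=True)
class Trade:
    trade_id: int
    isin: str       # foreign key to Instrument.isin (many trades -> one instrument)
    notional: float


def stitch(trades: list[Trade], instruments: list[Instrument]) -> list[tuple[Trade, Instrument]]:
    """Join each trade to its instrument along the declared relationship."""
    by_isin = {i.isin: i for i in instruments}
    if len(by_isin) != len(instruments):
        # The "one" side of a many-to-one relationship must have unique keys.
        raise ValueError("duplicate ISIN in instrument data")
    orphans = [t.trade_id for t in trades if t.isin not in by_isin]
    if orphans:
        # Fail loudly instead of dropping rows and under-reporting.
        raise LookupError(f"trades with no matching instrument: {orphans}")
    return [(t, by_isin[t.isin]) for t in trades]


instruments = [Instrument("XS0000000001", "USD"), Instrument("XS0000000002", "EUR")]
trades = [Trade(1, "XS0000000001", 1e6), Trade(2, "XS0000000002", 2e6)]
for trade, instrument in stitch(trades, instruments):
    print(trade.trade_id, instrument.currency, trade.notional)
```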
Physical model (e.g. Cloud/EDMP)
• Better scalability: entities can be viewed through an m:n mapping between logical and physical models.
• Improved system performance: data modeling can help ensure that systems are the right size for the data needs of the business.
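
The m:n mapping between logical and physical models can be kept as plain data. The sketch below assumes hypothetical entity and table names (the `fmdp.` and `edmp.` prefixes only echo the platform labels above); it stores the logical-to-physical direction and derives the reverse, which tells you which logical views a change to one physical table would affect:

```python
from collections import defaultdict

# Hypothetical mapping: a logical entity may be served by several physical
# tables, and one physical table may serve several entities (m:n).
LOGICAL_TO_PHYSICAL: dict[str, set[str]] = {
    "Trade": {"fmdp.trades", "edmp.trade_events"},
    "Counterparty": {"fmdp.parties"},
    "Position": {"edmp.trade_events", "edmp.positions"},
}


def invert(mapping: dict[str, set[str]]) -> dict[str, set[str]]:
    """Physical table -> logical entities it backs (the other side of the m:n)."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for entity, tables in mapping.items():
        for table in tables:
            inverted[table].add(entity)
    return dict(inverted)


def impacted_entities(table: str, mapping: dict[str, set[str]]) -> set[str]:
    # Before resizing or migrating a physical table, list the logical views it affects.
    return invert(mapping).get(table, set())


print(sorted(impacted_entities("edmp.trade_events", LOGICAL_TO_PHYSICAL)))  # ['Position', 'Trade']
```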