Data lifecycle stages

Stage: Events

Dataset discovery
- Dataset dictionary
- Register producer, consumer and use cases
- Used for data discovery
- Catalogue new datasets

Data sourcing
- Define producer/consumer model
- Define SLA/SLO/SLE/volume
- Define pipeline
- Quality check and alerting mechanism

Data synthesis
- Enrich data: join with static data/market data
- Transform data: projections/derivations
- Model-to-model transformations

Data consumption
- Reporting
- Dashboarding
- Analytics
- Ad-hoc extracts

Redistribution
- Register client for redistribution
- Provide data and information to downstream systems
- Real-time movement of data
- Entitlement controls and security for redistribution of data

Archival and purge
- Archive data based on regulatory needs
- Reproducibility based on archived data
- Purge obsolete data

Governance and control
- Entitlement and access management
- Ownership and audit control

[Diagram: data lifecycle, with the labels Dataset discovery/catalogue, Data purge, Sourcing, Data synthesis, Data archival, Data re-distribution]

Where we are

Stage: Issues

Dataset discovery
- Undocumented, fragmented datasets. It is operationally expensive to understand which attributes are present and which are missing in dataset fragments across systems.

Data sourcing
- Operational agreements between producer and consumer lead to low data quality in terms of availability, integrity and completeness. This may even lead to reputational impact.

Data synthesis
- Datasets are enriched with limited golden sources of data, leading to accuracy issues.

Data consumption
- Inaccurate data may slip into reports because data quality checks might not be fully in place.
- Data is tightly coupled to the underlying storage (embedded SQL), which slows time to market for any requested change.

Redistribution
- Low governance of downstream and upstream systems.
- Time to market for changes is impacted because manual checks are needed for any change, increasing operating cost.
- Quality is impacted because no automated controls are present for SLO/SLA monitoring.

Archival and purge
- Higher operating cost if we archive data that might already be archived by upstream systems. SLO/SLA contracts will help minimize this.

Governance and control
- Limited governance and control because layers are not clearly defined, leading to production issues.

Using modelled data

[Diagram: Reporting and Dashboard/analytics; Conceptual model, e.g. SSDR (DQSL + LDM); Logical model, e.g. Rosetta model; Physical model, e.g. FMDP; Physical model, e.g. Cloud/EDMP; Physical storage, e.g. FMDP; Physical storage 2, e.g. Cloud/EDMP]

Conceptual and logical models
- Application queries are insulated from underlying storage nuances.
- Data analytics and governance operate on a segregated entity view of the data.
- Applications are shorter-lived than data and can be swapped without impacting the data layer.
- Data modeling can help reduce the cost of creating apps by reusing entity views.

Physical model
- Well-defined entities improve the accuracy of data queries.
- Relationships defined across entities allow for precise stitching of data.
- Better quality: data modeling can help ensure that data sources are well documented with data catalogues and lineage.
- Producer-consumer models provide better contracts.

Physical model, e.g. Cloud/EDMP
- Better scalability of viewing entities: m:n mapping between logical and physical models.
- Improved system performance: data modeling can help ensure that systems are the right size for the data needs of the business.
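
The idea that application queries are insulated from the underlying storage can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the `Trade` entity, the `RowStore` and `DocumentStore` classes, and the column and field names are assumptions, not the actual schemas of FMDP, EDMP or the Rosetta model. The sketch shows one logical entity mapped onto two differently shaped physical stores, with a single report query written only against the logical view.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class Trade:
    """Hypothetical logical entity; applications only ever see this shape."""
    trade_id: str
    notional: float
    currency: str


class TradeStore(Protocol):
    """Contract every physical mapping must satisfy."""
    def fetch_trades(self) -> Iterable[Trade]: ...


class RowStore:
    """Stand-in for a relational-style physical store (assumed column names)."""
    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows

    def fetch_trades(self) -> Iterable[Trade]:
        # Map physical columns onto the logical entity.
        for r in self._rows:
            yield Trade(r["TRD_ID"], float(r["NOTIONAL_AMT"]), r["CCY"])


class DocumentStore:
    """Stand-in for a document-style physical store (assumed field names)."""
    def __init__(self, docs: List[dict]) -> None:
        self._docs = docs

    def fetch_trades(self) -> Iterable[Trade]:
        # Same logical entity, different physical layout.
        for d in self._docs:
            amount = d["amount"]
            yield Trade(d["id"], float(amount["value"]), amount["currency"])


def total_notional_by_currency(store: TradeStore) -> Dict[str, float]:
    """A report query written purely against the logical model."""
    totals: Dict[str, float] = {}
    for trade in store.fetch_trades():
        totals[trade.currency] = totals.get(trade.currency, 0.0) + trade.notional
    return totals


# Illustrative sample data only.
sample_rows = [
    {"TRD_ID": "T1", "NOTIONAL_AMT": "100.5", "CCY": "USD"},
    {"TRD_ID": "T2", "NOTIONAL_AMT": "50", "CCY": "EUR"},
    {"TRD_ID": "T3", "NOTIONAL_AMT": "25", "CCY": "USD"},
]
sample_docs = [
    {"id": "T1", "amount": {"value": 100.5, "currency": "USD"}},
    {"id": "T2", "amount": {"value": 50, "currency": "EUR"}},
    {"id": "T3", "amount": {"value": 25, "currency": "USD"}},
]

print(total_notional_by_currency(RowStore(sample_rows)))
print(total_notional_by_currency(DocumentStore(sample_docs)))
```

Because `total_notional_by_currency` depends only on the logical entity, swapping one physical store for another does not require any change to the report code; only the mapping class changes. The same pattern would extend to an m:n mapping by letting one store class assemble an entity from several physical sources.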