A Strategy to Get Data Governance and Analytical Agility Combined
Peter Grolimund, Senior Industry Consultant
October 2015

Agenda
• Company
• Drivers and Changes
• Core of “our” business
• Data – Analytics – Governance
• Summary

© 2014 Teradata

Teradata Company Background
Corporate Vision: Enabling data-driven business
Mission: Providing the world’s best analytic data solutions to drive competitive advantage for our customers
• 2,600+ customers in 77 countries
• 10,000+ employees
• Top 10 public U.S. software company
• Member of S&P 500
• Financially strong and growing (revenue of $2,732M)

What Does This Mean For You?
1. End-to-End Solutions and Services
• Data warehousing
• Big data analytics
• Marketing applications
2. Industry Expertise and Experience
• Financial Services
• Communications
• Retail
• Manufacturing
• Healthcare
• Government
• Travel/Transportation
• Media/Entertainment
• Energy/Utilities
3. Data Analytics Leadership
• Deep expertise
• Analytic engines
• Advanced algorithms
• Industry acclaimed

Personalized medicine and ... data

Personalized medicine in the most extreme
(P. Grolimund)

And more...
• Traditional Patient records
• Sensors/apps
• Behavior
• Environment
• Genome

EHR...

A new system and a new syste...
New reporting analytics: Data need identified -> Reporting functionality identified -> Project Approval -> URS/FS/DS -> Build & Test -> Deploy
• New data categories (‘non’-structured)
• New analytical tools and methods (MapReduce, Spotfire, R, ...)
• New modeling tools (SAS, R, ...)
• Validation cycle takes time (several months)
• URS/FS/DS built on ‘ideas’ (documents)
• Data integration performed with different technologies and in different ways (semantics not aligned)
• Code lists and references in different versions (WHO, SNOMED, ...)
• Time-dependent analytics not always possible

[Diagram: Project A delivers System A, Project B delivers System B. Each system is a separate stack of Data feeds, Data integration, and Reporting environment; one stack includes Hadoop, the other a Data mart.]

• Data integration impacts those who do not profit
• Internal standards
• CRO connectivity and usage of analytics not easy to handle (access rights etc.)
• Transactional systems change and provide their own reporting tools
• Geographic analytics not always possible
• Public data integration performed several times (governance)

A new system and a new syste...
[Diagram: system landscape, labels as shown on the slide]
• Reporting tools: BI 1, BI 2, Statistics, BI 1
• Data marts: Mart 1.1, Mart 1.2, Mart 1.3, Mart 2.1
• Preclinical Safety: Standard reports, Performance reporting, Screening analytics, WinNonlin, Animal exp. results
• Research: Assays, experimental outcomes, High Throughput Screening, PK/PD, ELN/ELAB, ELN global, Screening SW, Genome analytics, Compound database, Compound registration and mgmt
• External data: codes, references etc.
Patent database, Genome repository, Images

UNIFIED DATA ARCHITECTURE
[Diagram labels]
• Integrated Data Warehouse: Teradata Database
• Integrated Discovery Platform: Teradata Aster Database
• Data Platform: Teradata Portfolio for Hadoop
• RESTful API (shown twice)
• Applications, App Framework, Listening Framework, Real Time Processing
• Security, Workload Management

User requirements in a nutshell
Process related
• Flexible governance of the analytical process(es), controlled by the business
• Reproducibility
• Collaboration in the process with externals
System related
• Holding any kind of data
• Growing / scalable with the needs
• Workbench of analytical tools and combinations of them
• Governance
  • Traceability
  • Data access control
  • Data version control
  • Performance control

New analytical approaches
Forecast of Dengue
Data:
• Wikipedia search
• Google Trends
• Meteorological data
• Twitter
• Existing case reporting
• Future: topography and population density

And visualizations

Process related
• Flexible governance of the analytical process(es), controlled by the business
• Reproducibility
• Collaboration in the process with externals
Ad hoc integration and analysis (try it out)
• Load experimental, untested data from external sources
• Rapid prototyping, exploratory and experimental analysis
• Easily join to production data

[Diagram: Internal data sources and External data sources feed the Data integration layer and The analytical layer; URS, FS, and DS documents are attached to each component.]

[Diagram: the same layers with a Data lab extension next to The analytical layer; a single URS is attached.]

• The URS is the result of a pilot
• The pilot is controlled by a business process
• This part should be provided by ‘standard’ qualified tools (we are not validating Word, Excel, ...)
[Diagram labels: URS, PQ, FS, OQ]

Across the organization with a well controlled environment

Access to any data: Teradata QueryGrid™
[Diagram: a Teradata Database connected to the following systems]
• Hadoop: push-down to Hadoop system
• Teradata Aster Database: SQL, SQL-MR, SQL-GR
• Teradata Database: multiple Teradata systems
• RDBMS databases: push-down to other database
• MongoDB database: push-down to NoSQL databases
• Compute cluster: run SAS, Perl, Ruby, Python, R

Agility by assured performance: in-database analytics, e.g. using SAS (storing data in a high-performing database)
System related
• Holding any kind of data
• Growing / scalable with the needs
• Workbench of analytical tools and combinations of them
• Governance
  • Traceability
  • Data access control
  • Data version control
  • Performance control

Agility by assured performance: In database
Run times per SAS log, SAS only versus SAS + Teradata ("-" marks a cell left empty on the slide)
# | Business Line | SAS Log Name | # of Steps | SAS Only Days | SAS Only Hours | SAS Only Minutes | SAS + Teradata Days | SAS + Teradata Hours | SAS + Teradata Minutes | % of SAS Only | X Times Faster
1 | oscar | oscar_mdcd_v3.log | 945 | 9.6 | 231.6 | 13,894.1 | - | 1.83 | 110.0 | 1% | 126.3
2 | GE | mk_text_observation_f_sort.log | 3 | 4.3 | 103.0 | 6,178.0 | - | - | 3.8 | 0% | 1,625.8
3 | ingenix | dcf ~ i3_qc.log | 3,401 | - | 15.1 | 908.2 | - | - | 45.8 | 5% | 19.8
4 | humana | humana_dups.log | 28 | - | 5.6 | 333.3 | - | - | 18.8 | 6% | 17.7
5 | ingenix | analysis ~ 100_indentifying_initial_patients.log | 12 | - | 1.7 | 99.4 | - | - | 1.5 | 2% | 66.3
6 | ingenix | analysis ~ 200_extracting_mx_claims.log | 11 | - | 1.1 | 68.1 | - | - | 1.0 | 1% | 68.1
7 | ingenix | analysis ~ 210_extracting_rx_claims.log | 12 | - | - | 28.5 | - | - | 0.4 | 1% | 71.3
8 | ingenix | dcf ~ mk_s2009_r12q2.log | 20 | - | 1.6 | 98.2 | - | - | 3.8 | 4% | 25.8
9 | ingenix | dcf ~ mk_s2010_r12q2.log | 20 | - | 1.5 | 87.8 | - | - | 3.6 | 4% | 24.4
10 | ingenix | dcf ~ mk_s2011_r12q2.log | 20 | - | 1.0 | 61.8 | - | - | 3.4 | 6% | 18.2
11 | ingenix | dcf ~ mk_m2011_r12q2.log | 20 | - | - | 56.8 | - | - | 2.3 | 4% | 24.7
12 | ingenix | dcf ~ mk_r2011_r12q2.log | 20 | - | - | 41.9 | - | - | 3.3 | 8% | 12.7
13 | pharmetrics | 130_af_all_claims.log | 12 | - | 1.7 | 101.2 | - | - | 4.7 | 5% | 21.5
14 | pharmetrics | 110_af_claims.log | 6 | - | - | 52.0 | - | - | 2.7 | 5% | 19.3
15 | pharmetrics | 183_table8d.log | 43 | - | - | 30.8 | - | - | 3.4 | 11% | 9.1
16 | pharmetrics | 183_table8b.log | 39 | - | - | 30.4 | - | - | 1.5 | 5% | 20.3
17 | pharmetrics | 162_table2b.log | 30 | - | - | 20.6 | - | - | 2.8 | 13% | 7.4
18 | pharmetrics | 182_table8d.log | 43 | - | - | 23.8 | - | - | 1.8 | 8% | 13.2

In-Database Analytic Tools and Partners

User requirements in a nutshell: process controlled by partner software
Process related
• Flexible governance of the analytical process(es), controlled by the business
• Reproducibility
• Collaboration in the process with externals
Traceability example: Output 1.1 was produced with Program 1.3, using Data V 1.1, by individual X, at date xy.
[Diagram: version chains]
• Data V.1, Data V.1.2, Data V.1.4
• Program V.1.1, Program V.1.3, Program V.1.4
• Output V.1.1, Output V.1.3, Output V.1.4

Agility in clinical: Become visionary...
TD offers the data model and the technology to expose the data set, stored once, in different formats.
The enterprise view
[Diagram labels: Disease Management, Finance, Sales, Physicians, Patients, HR, Marketing, Consumers, HR/Benefits, Contracts, Manufacturing, Patient/Consumer, Drug/Medical, Sales/Marketing, Government, HR/Benefits, Claims, Finance/GL, R&D, Hospitals, Disease Mgmt/Wellness, Call Center/Communication, Products/Services, Population Demographics, R&D]

Teradata LS-LDM Overview
Subject Areas Covered
Life Sciences: Activity, Adverse Event, Biologic Entity, Conceptual Model, Development, Document, Ethnicity, Fact, Material, Measurement, Minor Entities, Product, Project, Protocol, Race, Recruitment, Regulatory, Research, Standard Coding, Strategy, Study, Study Event, Genomic
General: Account Budget, Activity Based Costing, Advertisement, Bid, Channel, Claim, Contact, Contract, Customs, Demographics, Document, Equipment, Event, Forecast, Geography, General Ledger, Goods Receipt, Human Capital Management, Inventory, Invoice, Item, Legal Case Management, Location, Measurement, Multimedia Component, Party, Plan, Point Of Sale Register, Privacy, Procurement, Project Resource, Promotion, Return Management, RFID/Track And Trace, Sales, Shipment, Survey, Time Period, Trait, Warranty Management, Web, Work Order, Work Process

Empower the business
Foundation for agility
• The system should enable, wherever possible, business-process-based extensions rather than IT projects and system diversity
System related
• Holding any kind of data
• Growing / scalable with the needs
• Workbench of analytical tools and combinations of them
• Governance
  • Traceability
  • Data access control
  • Data version control
  • Performance control
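
The traceability and data version control items in the governance list ask that each output can be tied to the program version, data version, person, and date that produced it. A minimal Python sketch of such a record follows. It is an illustration only: the names `ProvenanceRecord`, `fingerprint`, `record_run`, and `describe` are hypothetical and do not come from the slides or from any Teradata or partner product, and the SHA-256 digest is an assumed way to support reproducibility checks.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """One traceability entry (hypothetical structure, not a product API)."""
    output_id: str
    program_version: str
    data_version: str
    user: str
    produced_at: datetime
    output_digest: str  # fingerprint of the output bytes


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest so a re-run can be compared to the original."""
    return hashlib.sha256(payload).hexdigest()


def record_run(output_id: str, program_version: str, data_version: str,
               user: str, output_bytes: bytes,
               when: Optional[datetime] = None) -> ProvenanceRecord:
    """Build an immutable record for one analytical run."""
    return ProvenanceRecord(
        output_id=output_id,
        program_version=program_version,
        data_version=data_version,
        user=user,
        produced_at=when or datetime.now(timezone.utc),
        output_digest=fingerprint(output_bytes),
    )


def describe(rec: ProvenanceRecord) -> str:
    """Render the record as a one-line traceability statement."""
    return (f"Output {rec.output_id} was produced with Program {rec.program_version} "
            f"using Data V {rec.data_version} by {rec.user} "
            f"on {rec.produced_at:%Y-%m-%d}")


if __name__ == "__main__":
    # Illustrative values only; the output bytes stand in for a real report.
    rec = record_run("1.1", "1.3", "1.1", "individual X", b"example output")
    print(describe(rec))
```

Freezing the dataclass keeps a record from being altered after the run. Re-running the same program version on the same data version and comparing digests is one simple way to check reproducibility.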