A Strategy to Get Data Governance and Analytical Agility Combined October 2015

Peter Grolimund, Senior Industry Consultant
October 2015
Agenda
• Company
• Drivers and Changes
• Core of “our” business
• Data – Analytics – Governance
• Summary
© 2014 Teradata
Teradata Company Background
Corporate Vision: Enabling data-driven business
Mission: Providing the world’s best analytic data solutions to drive competitive advantage for our customers
• 2,600+ customers in 77 countries
• 10,000+ employees
• Top 10 public U.S. software company
• Member of the S&P 500
• Financially strong and growing (revenue of $2,732M)
What Does This Mean For You?
1. End-to-End Solutions and Services
• Data warehousing
• Big data analytics
• Marketing applications
2. Industry Expertise and Experience
• Financial Services
• Communications
• Retail
• Manufacturing
• Healthcare
• Government
• Travel/Transportation
• Media/Entertainment
• Energy/Utilities
3. Data Analytics Leadership
• Deep expertise
• Analytic engines
• Advanced algorithms
• Industry acclaimed
Personalized medicine and ... data
Personalized medicine in the most extreme
P. Grolimund
And more...
• Traditional patient records
• Sensors/apps
• Behavior
• Environment
• Genome
EHR...
A new system and a new syste...
New reporting / analytics:
Data need identified → Reporting functionality identified → Project approval → URS/FS/DS → Build & Test → Deploy
• New data categories (‘non’ structured)
• New analytical tools and methods (MapReduce, Spotfire, R, ...)
• New modeling tools (SAS, R, ...)
• Validation cycle takes time (several months)
• URS/FS/DS built on ‘ideas’ (documents)
• Data integration performed with different technologies and in different ways (semantics not aligned)
• Code lists and references in different versions (WHO, SNOMED, ...)
• Time-dependent analytics not always possible
[Diagram: two parallel stacks. Project A delivers System A and Project B delivers System B, each with its own data feeds, data integration and reporting environment; further labels: Hadoop, Data mart.]
• Data integration impacts those who do not profit
• Internal standards
• CRO connectivity and usage of analytics not easy to handle (access rights etc.)
• Transactional systems change, provide their own reporting tools
• Geographic analytics not always possible
• Public data integration performed several times (governance)
A new system and a new syste...
[Diagram: landscape of separate systems. Labels, in slide order:]
• BI 1, BI 2, Statistics
• Mart 1.1, Mart 1.2, Mart 1.3, Mart 2.1
• Preclinical safety, standard reports, performance reporting, screening analytics, WinNonlin
• Animal exp. results, research assays experimental outcomes, High Throughput Screening, PK/PD, ELN/ELAB, ELN global, screening SW, genome analytics, compound database, compound registration and mgmt
• External data, codes, references etc.: patent database, genome repository, images
UNIFIED DATA ARCHITECTURE
• INTEGRATED DATA WAREHOUSE: TERADATA DATABASE
• INTEGRATED DISCOVERY PLATFORM: TERADATA ASTER DATABASE
• DATA PLATFORM: TERADATA PORTFOLIO FOR HADOOP
• Applications: APP FRAMEWORK, LISTENING FRAMEWORK, REAL TIME PROCESSING, RESTFUL API
• Security, Workload Management
User requirements in a nutshell
Process related
• Flexible governance of the analytical process(es), controlled by the business
• Reproducibility
• Collaboration in the process with externals
System related
• Holding any kind of data
• Growing / scalable with the needs
• Workbench of analytical tools and combinations of them
• Governance
  • Traceability
  • Data access control
  • Data version control
  • Performance control
New analytical approaches
Forecast of Dengue
Data
• Wikipedia search
• Google Trends
• Meteorological data
• Twitter
• Existing case reporting
• Future: topography and population density
And visualizations
Ad hoc integration and analysis (try it out)
• Load experimental, untested data from external sources
• Rapid prototyping, exploratory and experimental analysis
• Easily join to production data
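The ad hoc pattern above (load untested external data, then join it to production data) can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the analytical platform, and the table names `prod_cases` and `lab_search`, the columns and the sample rows are all hypothetical, not taken from the slides.

```python
import csv
import io
import sqlite3

# Hypothetical external file; in practice this would come from an outside source.
EXTERNAL_CSV = """region,week,search_index
north,1,40
north,2,55
south,1,10
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small 'production' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prod_cases (region TEXT, week INTEGER, cases INTEGER)")
    conn.executemany(
        "INSERT INTO prod_cases VALUES (?, ?, ?)",
        [("north", 1, 12), ("north", 2, 20), ("south", 1, 3)],
    )
    return conn


def load_into_lab(conn: sqlite3.Connection, csv_text: str) -> int:
    """Load untested external data into a separate 'lab' table; return the row count."""
    conn.execute(
        "CREATE TABLE lab_search (region TEXT, week INTEGER, search_index INTEGER)"
    )
    rows = [
        (r["region"], int(r["week"]), int(r["search_index"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO lab_search VALUES (?, ?, ?)", rows)
    return len(rows)


def join_lab_to_production(conn: sqlite3.Connection) -> list:
    """Join the experimental lab table to production data without modifying production."""
    return conn.execute(
        """SELECT p.region, p.week, p.cases, l.search_index
           FROM prod_cases AS p
           JOIN lab_search AS l ON l.region = p.region AND l.week = p.week
           ORDER BY p.region, p.week"""
    ).fetchall()
```

Keeping the experimental data in its own table means the production table is only read, never written, which is the property that lets such a try-out run under business control rather than as an IT project.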
[Diagram: internal data sources and external data sources, the data integration layer and the analytical layer, with URS, FS and DS documents attached to the layers.]
[Diagram: internal data sources, external data sources and the analytical layer with a "Data lab extension"; URS, FS and DS documents attached.]
[Diagram: internal data sources, external data sources, data integration layer and the analytical layer, with a single URS.]
The URS is the result of a pilot.
The pilot is controlled by a business process.
URS ↔ PQ, FS ↔ OQ
This part should be provided by ‘standard’ qualified tools (we are not validating Word, Excel, ...).
Across the organization with a well-controlled environment
Access to any data: Teradata QueryGrid™
From the Teradata Database:
• Hadoop: push-down to Hadoop system
• Teradata Aster Database: SQL, SQL-MR, SQL-GR
• Teradata Database: multiple Teradata systems
• RDBMS databases: push-down to other database
• MongoDB database: push-down to NoSQL databases
• Compute cluster: run SAS, Perl, Ruby, Python, R
Agility by assured performance: in-database analytics,
e.g. using SAS (storing the data in a high-performing database)
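Agility by assured performance: in-database
SAS Only vs. SAS + Teradata (run times; "-" marks a cell left empty on the slide)

#  | Business Line | SAS Log Name                                     | # of Steps | SAS Only Days | SAS Only Hours | SAS Only Minutes | SAS + Teradata Days | SAS + Teradata Hours | SAS + Teradata Minutes | % of SAS Only | X Times Faster
1  | oscar         | oscar_mdcd_v3.log                                | 945        | 9.6           | 231.6          | 13,894.1         | -                   | 1.83                 | 110.0                  | 1%            | 126.3
2  | GE            | mk_text_observation_f_sort.log                   | 3          | 4.3           | 103.0          | 6,178.0          | -                   | -                    | 3.8                    | 0%            | 1,625.8
3  | ingenix       | dcf ~ i3_qc.log                                  | 3,401      | -             | 15.1           | 908.2            | -                   | -                    | 45.8                   | 5%            | 19.8
4  | humana        | humana_dups.log                                  | 28         | -             | 5.6            | 333.3            | -                   | -                    | 18.8                   | 6%            | 17.7
5  | ingenix       | analysis ~ 100_indentifying_initial_patients.log | 12         | -             | 1.7            | 99.4             | -                   | -                    | 1.5                    | 2%            | 66.3
6  | ingenix       | analysis ~ 200_extracting_mx_claims.log          | 11         | -             | 1.1            | 68.1             | -                   | -                    | 1.0                    | 1%            | 68.1
7  | ingenix       | analysis ~ 210_extracting_rx_claims.log          | 12         | -             | -              | 28.5             | -                   | -                    | 0.4                    | 1%            | 71.3
8  | ingenix       | dcf ~ mk_s2009_r12q2.log                         | 20         | -             | 1.6            | 98.2             | -                   | -                    | 3.8                    | 4%            | 25.8
9  | ingenix       | dcf ~ mk_s2010_r12q2.log                         | 20         | -             | 1.5            | 87.8             | -                   | -                    | 3.6                    | 4%            | 24.4
10 | ingenix       | dcf ~ mk_s2011_r12q2.log                         | 20         | -             | 1.0            | 61.8             | -                   | -                    | 3.4                    | 6%            | 18.2
11 | ingenix       | dcf ~ mk_m2011_r12q2.log                         | 20         | -             | -              | 56.8             | -                   | -                    | 2.3                    | 4%            | 24.7
12 | ingenix       | dcf ~ mk_r2011_r12q2.log                         | 20         | -             | -              | 41.9             | -                   | -                    | 3.3                    | 8%            | 12.7
13 | pharmetrics   | 130_af_all_claims.log                            | 12         | -             | 1.7            | 101.2            | -                   | -                    | 4.7                    | 5%            | 21.5
14 | pharmetrics   | 110_af_claims.log                                | 6          | -             | -              | 52.0             | -                   | -                    | 2.7                    | 5%            | 19.3
15 | pharmetrics   | 183_table8d.log                                  | 43         | -             | -              | 30.8             | -                   | -                    | 3.4                    | 11%           | 9.1
16 | pharmetrics   | 183_table8b.log                                  | 39         | -             | -              | 30.4             | -                   | -                    | 1.5                    | 5%            | 20.3
17 | pharmetrics   | 162_table2b.log                                  | 30         | -             | -              | 20.6             | -                   | -                    | 2.8                    | 13%           | 7.4
18 | pharmetrics   | 182_table8d.log                                  | 43         | -             | -              | 23.8             | -                   | -                    | 1.8                    | 8%            | 13.2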
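The two derived columns of the benchmark can be reproduced from the run times in minutes: "X Times Faster" is the SAS-only time divided by the SAS + Teradata time, and "% of SAS Only" is the inverse ratio as a percentage. The sketch below checks this for the first four jobs; the function names are illustrative, and the rounding (one decimal, whole percent) is an assumption that happens to match the figures shown for these rows.

```python
def times_faster(sas_only_minutes: float, sas_teradata_minutes: float) -> float:
    """Speed-up factor: SAS-only run time divided by SAS + Teradata run time."""
    return round(sas_only_minutes / sas_teradata_minutes, 1)


def percent_of_sas_only(sas_only_minutes: float, sas_teradata_minutes: float) -> int:
    """SAS + Teradata run time as a whole-number percentage of the SAS-only run time."""
    return round(100 * sas_teradata_minutes / sas_only_minutes)


# (log name, SAS-only minutes, SAS + Teradata minutes) for jobs 1 to 4 of the table.
JOBS = [
    ("oscar_mdcd_v3.log", 13894.1, 110.0),
    ("mk_text_observation_f_sort.log", 6178.0, 3.8),
    ("dcf ~ i3_qc.log", 908.2, 45.8),
    ("humana_dups.log", 333.3, 18.8),
]

if __name__ == "__main__":
    for name, sas_only, sas_td in JOBS:
        print(name, times_faster(sas_only, sas_td), percent_of_sas_only(sas_only, sas_td))
```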
In-Database Analytic Tools and Partners
User requirements in a nutshell: process controlled by partner software
Process related
• Flexible governance of the analytical process(es), controlled by the business
• Reproducibility
• Collaboration in the process with externals
Traceability example:
Output 1.1 was produced with Program 1.3, using Data V 1.1, by individual X, at date xy
[Diagram: version chains]
• Data: V.1, V.1.2, V.1.4
• Program: V. 1.1, V. 1.3, V. 1.4
• Output: V. 1.1, V. 1.3, V. 1.4
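The traceability statement on this slide ("Output 1.1 was produced with Program 1.3, using Data V 1.1, by individual X") amounts to one lineage record per output. A minimal sketch of such a record and a lookup follows; the class names, field names and the example date are hypothetical and do not describe any partner product's actual data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RunRecord:
    """One immutable lineage entry: which program and data produced an output."""
    output_version: str
    program_version: str
    data_version: str
    produced_by: str
    produced_on: date


class LineageLog:
    """Append-only registry of run records, keyed by output version."""

    def __init__(self) -> None:
        self._records = {}

    def register(self, record: RunRecord) -> None:
        # Refuse to overwrite: an output version must map to exactly one run.
        if record.output_version in self._records:
            raise ValueError(f"output {record.output_version} is already registered")
        self._records[record.output_version] = record

    def trace(self, output_version: str) -> str:
        """Return the traceability statement for a registered output."""
        r = self._records[output_version]
        return (
            f"Output {r.output_version} was produced with Program {r.program_version} "
            f"using Data V {r.data_version} by {r.produced_by} "
            f"on {r.produced_on.isoformat()}"
        )
```

Making the record frozen and the log append-only is one simple way to support reproducibility: rerunning Program 1.3 on Data V 1.1 should regenerate Output 1.1, and the log states which pair to rerun.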
Agility in clinical: Become visionary...
TD offers the data model and the technology to expose the data set, stored once, in different formats.
The enterprise view
[Diagram labels: Disease Management, Finance, Sales, Physicians, Patients, HR, Marketing, Consumers, HR/Benefits, Contracts, Manufacturing, Patient/Consumer, Drug/Medical, Sales/Marketing, Government, Claims, Finance/GL, R&D, Hospitals, Disease Mgmt/Wellness, Call Center/Communication, Products/Services, Population Demographics]
Teradata LS-LDM Overview
Subject Areas Covered
Life Sciences Activity
Life Sciences Adverse Event
Life Sciences Biologic Entity
Life Sciences Conceptual Model
Life Sciences Development
Life Sciences Document
Life Sciences Ethnicity
Life Sciences Fact
Life Sciences Material
Life Sciences Measurement
Life Sciences Minor Entities
Life Sciences Product
Life Sciences Project
Life Sciences Protocol
Life Sciences Race
Life Sciences Recruitment
Life Sciences Regulatory
Life Sciences Research
Life Sciences Standard Coding
Life Sciences Strategy
Life Sciences Study
Life Sciences Study Event
Life Sciences Genomic
Account Budget
Activity Based Costing
Advertisement
Bid
Channel
Claim
Contact
Contract
Customs
Demographics
Document
Equipment
Event
Forecast
Geography
General Ledger
Goods Receipt
Human Capital Management
Inventory
Invoice
Item
Legal Case Management
Location
Measurement
Multimedia Component
Party
Plan
Point Of Sale Register
Privacy
Procurement
Project Resource
Promotion
Return Management
RFID/Track And Trace
Sales
Shipment
Survey
Time Period
Trait
Warranty Management
Web
Work Order
Work Process
Empower the business
Foundation for agility
• Wherever possible, the system should enable business-process-based extensions rather than IT projects and system diversity
System related
• Holding any kind of data
• Growing / scalable with the needs
• Workbench of analytical tools and combination of it
• Governance
  • Traceability
  • Data access control
  • Data version control
  • Performance control