Ensuring Compliance of Patient Data with Big Data and BI
Ayad Shammout & Denny Lee
April 10-12, Chicago, IL
Agenda
A Quick Big Data Primer
Healthcare and Big Data
Compliance and Auditing
SQL Compliance Project
Compliance and Auditing with Big Data and BI
Big Data: Unstructured Volumes of Data
Analytics: PowerPivot, Power View
What is Big Data?
Volume: Exceeds physical limits of vertical scalability
Velocity: Decision window is small compared to the data change rate
Variety: Many different formats make integration expensive
Variability: Many options or variable interpretations confound analysis
Data explosion: 10x increase every five years, 85% from new data types
Volume, Velocity, Variety
“By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.”
 – Gartner, Mark Beyer, “Information Management in the 21st Century”
Hadoop, Cloud
Big Data Business Value
140,000-190,000
15 out of 17
1.5 million
€250 billion
50-60%
$300 billion
Hadoop: The most visible face of Big Data
HDInsight: Visit HadoopOnAzure.com
Healthcare and Big Data
Healthcare and IT
Often the laggard in technology
Yet application of IT to healthcare can radically change what we can do
Genomic Sequencing
Proteomic sequencing
Incidence Prediction
Healthcare Big Data Example Scenarios
Clinical Trial Deviations
Originally, Viagra was developed to lower blood pressure and treat angina
Now it's used to help with newborn pulmonary hypertension and altitude sickness
Incidence Prediction
Patients who missed 4 or more visits are twice as likely to have an asthmatic incident
A particular cardiac monitor sine wave points to a high likelihood of heart attack
Campaigns
Social media and advertising campaigns to understand user behavior and sentiment
Patient Satisfaction
Social media and advertising campaigns to understand user behavior and sentiment
BIDMC Auditing Scenario
Auditing is a critical component of HIPAA in ensuring patient privacy
1 billion+ rows of audit data
146 mission critical clinical applications
Comprehensive audits yield 300-500k transactions/day
HIPAA requires audit system with 20 years of data
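As a rough sanity check on the volumes above, here is a back-of-the-envelope projection. It assumes a constant daily rate and 365 days per year; both are simplifying assumptions for illustration, not figures from the audit system.

```python
# Rough projection of audit rows over the retention period.
# Assumptions (not from the source): constant daily volume, 365 days/year.
DAYS_PER_YEAR = 365
RETENTION_YEARS = 20  # retention period stated on the slide


def projected_rows(transactions_per_day: int, years: int = RETENTION_YEARS) -> int:
    """Return the total rows accumulated at a constant daily rate."""
    return transactions_per_day * DAYS_PER_YEAR * years


low = projected_rows(300_000)
high = projected_rows(500_000)
print(f"{low:,} to {high:,} rows over {RETENTION_YEARS} years")
```

Under these assumptions the total lands in the low billions of rows, which is consistent with the 1 billion+ rows already reported.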
Auditing Project
Available to community as part of Compliance SDK
Updating for SQL Server 2012, HDInsight, Power View, and MobileBI*
“Creating an enterprise tool for consolidated storage, reporting and alerting of all application audit data - that's cool!”
John Halamka’s Cool Technology of the Week (Wellsphere Top Health Blogger, Health Impact Award)
BIDMC Compliance Project
Diagram components:
Audit Logs
SSIS: ETL logs to HDFS
HDInsight (Azure)
HDInsight (Windows)
SQL Server 2008/2012
SSAS (tabular)
Use Excel 2013 PowerPivot and Power View
Auditing Sensitive Information
Querying Audit Information
Use PowerPivot / Power View / Analysis Services to query the data.
Process Audit Information
Use SSIS to process SQL2008 All-Actions Audit Information and other CG application audit log data; potentially can use the Management Performance DW framework.
Caregroup Environment
SQL2008 All-Actions Audit Data
SSIS
SQL Audit
SQL 2008 / 2012 R2
Connect/Logic
File Server
SSRS 2008 / Power View
Policy Reports
Security Reports
Policy Analysis
Security Analysis
Policy Best Practices
Compliance Reports
Policy Information
CG Application Data
Intersystems Cache
Feedback Action Loop: Update systems to keep them compliant and secure
Oracle
Security Information
SQL2005
Storage Infrastructure
Audit Logs
Transfer files to ASV via AzCopy, CloudExplorer, etc.
Storage Infrastructure
Azure Storage Vault (ASV)
Azure Blob Storage
Azure Flat Network Storage
Hadoop on Azure
Compute Nodes (Medium VMs)
Storage Infrastructure
Stream data to compute
Push data back to storage
Azure Storage Vault (ASV)
Azure Blob Storage
Azure Flat Network Storage
map, sort, shuffle, reduce
Hadoop on Azure
Compute Nodes (Medium VMs)
http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/
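The map, sort, shuffle, and reduce stages named on this slide can be illustrated with a toy, single-process sketch. This is not how Hadoop is invoked; it only mirrors the data flow. The `(user, action)` record layout is hypothetical, since the real audit log format is not given here.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical audit records as (user, action) pairs; the real log layout
# is not given in the source.
records = [
    ("alice", "SELECT"),
    ("bob", "UPDATE"),
    ("alice", "SELECT"),
    ("alice", "DELETE"),
]


def map_phase(rows):
    """Emit a (key, 1) pair for every audit record."""
    for user, _action in rows:
        yield user, 1


def shuffle_sort(pairs):
    """Sort pairs by key and group the values that share a key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _key, value in group]


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped}


counts = reduce_phase(shuffle_sort(map_phase(records)))
print(counts)
```

In a real cluster the map and reduce functions run on the compute nodes while the input and output live in blob storage; here everything runs in one process purely to show the stages.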
SSIS to HDInsight
SSIS Processing
SSAS Tabular of HoA Audit Data
Hadoop / Auditing: File sizes
Currently testing gz vs. raw
E.g. 12MB raw text file vs. 633KB gz file (~20x compression)
Query                                        Duration (s)
select count(*) from sql_audit_asv_raw       56.066
select count(*) from sql_audit_asv_gz        58.994
20x smaller size, ~same query time
Approx same map / reduce task utilization
File Size is 250MB-1GB
SSIS package takes care of the size
Future testing: avro, protobuf
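The comparison above can be checked with a little arithmetic. The sketch below uses only the figures quoted on this slide; treating MB and KB as binary units (1 MB = 1024 KB) is an assumption.

```python
# Figures quoted on the slide; MB/KB are treated as binary units (an assumption).
raw_kb = 12 * 1024
gz_kb = 633
raw_seconds = 56.066
gz_seconds = 58.994

# Size ratio between the raw text file and its gz counterpart.
compression_ratio = raw_kb / gz_kb

# Relative increase in query time when reading the gz file.
slowdown_pct = (gz_seconds - raw_seconds) / raw_seconds * 100

print(f"compression ratio: {compression_ratio:.1f}x")
print(f"gz query slowdown: {slowdown_pct:.1f}%")
```

The ratio comes out a little under 20x and the gz query is roughly 5% slower, which matches the "20x smaller size, ~same query time" summary.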
Hadoop / Auditing: Formats
For ease of processing, replace carriage returns within embedded SQL statements, e.g.
select col1, col2
from tableA
to
select col1, col2 from tableA
This allows you to create a Hive table using the carriage return as the row delimiter (i.e. Hive does not have things like SQL quoted identifiers to protect embedded line breaks)
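The replacement described above might look like the following minimal sketch. The function name `flatten_sql` is hypothetical, and the source does not say where in the SSIS package this step actually runs; the sketch only shows the text transformation.

```python
import re


def flatten_sql(statement: str) -> str:
    """Collapse line breaks (and the whitespace around them) into single spaces."""
    return re.sub(r"\s*[\r\n]+\s*", " ", statement).strip()


multi_line = "select col1, col2\r\nfrom tableA"
print(flatten_sql(multi_line))
```

After flattening, each audit record occupies exactly one line, so a line-delimited Hive table can read it without splitting a statement across rows.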
BI Connectivity
Sqoop, Hive ODBC, Templeton, CSV, etc.
Big Data … Excel-lerated!
2 servers, 3 months: 110 GB binary files
SSIS extraction: 1.2GB of text, 120MB gz
Hadoop to PowerPivot: 6MB
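To put the sizes above in perspective, the reduction at each step can be computed directly. The stage names are shorthand for the labels on this slide, and treating 1 GB as 1000 MB is an assumption.

```python
# Sizes quoted on the slide, in MB; 1 GB is treated as 1000 MB (an assumption).
stages = [
    ("binary files", 110_000),
    ("extracted text", 1_200),
    ("gz", 120),
    ("PowerPivot", 6),
]

# Reduction factor between each consecutive pair of stages.
for (prev_name, prev_mb), (name, mb) in zip(stages, stages[1:]):
    print(f"{prev_name} -> {name}: {prev_mb / mb:.0f}x smaller")

# Reduction from the original binary files to the final workbook data.
overall = stages[0][1] / stages[-1][1]
print(f"overall: {overall:,.0f}x smaller")
```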
PowerPivot workbook of HoA Audit data
Power View of HoA Audit Data
Thank you!