J Interv Card Electrophysiol (2016) 47:51–59
DOI 10.1007/s10840-016-0104-y
The application of Big Data in medicine: current implications
and future directions
Christopher Austin 1 · Fred Kusumoto 1

Received: 28 October 2015 / Accepted: 11 January 2016 / Published online: 27 January 2016
© Springer Science+Business Media New York 2016
Abstract Since the mid-1980s, the world has experienced an unprecedented explosion in the capacity to produce, store, and communicate data, primarily in digital formats. Simultaneously, access to computing technologies in the form of the personal computer, smartphone, and other handheld devices has mirrored this growth. With
these enhanced capabilities of data storage and rapid
computation as well as real-time delivery of information
via the internet, the average daily consumption of data
by an individual has grown exponentially. Unbeknownst
to many, Big Data has silently crept into our daily routines and, with continued development of cheap data
storage and availability of smart devices both regionally
and in developing countries, the influence of Big Data
will continue to grow. This influence has also carried
over to healthcare. This paper will provide an overview
of Big Data, its benefits, potential pitfalls, and the
projected impact on the future of medicine in general
and cardiology in particular.
Keywords Big Data · Cardiology · Electrophysiology · Analytics · Data management
Institution where work was performed: Mayo Clinic Florida
1 Introduction
Since the mid-1980s the world has experienced an unprecedented explosion in the capacity to produce, store, and communicate data, primarily in digital formats [1].
Simultaneously, access to computing technologies in the form
of the personal computer, smartphone, and other handheld devices
has mirrored this growth. With these enhanced capabilities of
data storage and rapid computation as well as real-time delivery of information via the internet, the average daily consumption of data by an individual has grown exponentially. In
2007, the average human was presented with the data equivalent of 174 newspapers per day. This data explosion continues, driven by wireless networks providing internet access
in almost any imaginable locale.
Unbeknownst to many, Big Data has silently crept into our
daily routines and, with continued development of cheap data
storage and availability of smart devices both regionally and
in developing countries, the influence of Big Data will continue to grow. This influence has also carried over to
healthcare. Over the last 10 years, volumes of pharmaceutical
research, clinical trials data, and patient records have been
compiled and examined in an effort to reduce costs while
improving efficiency and advancing the practice of medicine.
This paper will provide an overview of Big Data, its benefits,
potential pitfalls, and the projected impact on the future of
medicine in general and cardiology in particular.
* Christopher Austin
austin.christopher@mayo.edu

Fred Kusumoto
kusumoto.fred@mayo.edu

1 Division of Cardiovascular Disease, Mayo Clinic Florida, Jacksonville, FL 32224, USA

2 The origins of Big Data
The term Big Data was originally coined by NASA scientists
in 1997 while attempting to describe the difficulty of
displaying data sets too large to be stored in a computer’s main
memory, limiting analysis of the data set as a whole [2].
Although numerous definitions of Big Data abound, most present it as a dataset that is too large to easily manipulate and
manage [3, 4]. Big Data also describes the activity of
collecting, storing, analyzing, and repurposing large volumes
of data [5]. Although the phraseology is relatively new, the
concept of large-scale data collection and analysis is not. US Navy Lieutenant Matthew Fontaine Maury's analysis of thousands of ships' logs and charts led to his publication of the Wind and Current Chart of the North Atlantic [6] in 1852, guiding sailors to trade winds and ocean currents and drastically reducing the length of ocean voyages. Although Lt. Maury's
analysis was a huge undertaking in the 1850s, the entirety of
data he reviewed would amount to a modest-size spreadsheet
easily analyzed on a modern home computer. At its most fundamental, the concept of Big Data is relative to the available
resources at a given point in time. Since the accumulation of
data is the driving force for innovation in storage techniques
and analytics, the concept of Big Data will persist despite technologic advances designed to address these new challenges.
3 The 3 Vs of Big Data
Despite its variable definitions, Doug Laney's description of the "3 Vs" (volume, variety, and velocity) has been widely accepted as the key data management challenges associated with Big Data [7] (Fig. 1).
3.1 Volume
The volume of new data being created annually is nearly unimaginable. Ninety percent of all data ever created was created
in the past 2 years [8]. The amount of data in the world is
projected to double every 2 years, leaving us with 50 times
more data (44 zettabytes, or 44 trillion gigabytes) in 2020 than
in 2011 [9]. This explosion is owing to the affordability of data
storage as the average price of a gigabyte of storage fell from
$437,500 in 1980 to $0.03 in 2015 [10]. As the mountains of
accumulated data grow, the desire to analyze and convert it
into business intelligence also grows.
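Taken together, those two price points imply an almost incomprehensible collapse in storage cost. A short back-of-the-envelope sketch in Python makes it concrete (the 1 TB dataset size is an arbitrary illustration, not a figure from the text):

```python
# Illustrative arithmetic only, using the per-gigabyte prices cited above [10].
GB_PRICE_1980 = 437_500.00   # USD per gigabyte in 1980
GB_PRICE_2015 = 0.03         # USD per gigabyte in 2015

dataset_gb = 1_000           # a hypothetical 1 TB clinical archive

cost_1980 = dataset_gb * GB_PRICE_1980
cost_2015 = dataset_gb * GB_PRICE_2015

print(f"Storing 1 TB in 1980: ${cost_1980:,.0f}")      # $437,500,000
print(f"Storing 1 TB in 2015: ${cost_2015:,.2f}")      # $30.00
print(f"Price ratio: {cost_1980 / cost_2015:,.0f}x")   # ~14,583,333x
```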
3.2 Variety
Historically, the majority of electronic data has been structured
and readily analyzed in spreadsheet or database formats.
Today, data is much less congruent and can be stored in countless forms including written text, streaming video, and sensor-derived information. It is widely accepted that between 80 and 90 % of generated data is unstructured. This includes an estimated 150 exabytes (161 billion gigabytes) of healthcare data stored on disk in 2011 [11]. This ever-increasing variety of information necessitates innovative
storage techniques and advanced tools and algorithms to analyze the flood of data.
3.3 Velocity
Data is being created, stored, and analyzed exponentially faster
than at any period in the history of mankind. YouTube (Google
Inc., Mountain View, CA) boasts 110,000 unique video views
per second [12] and 300 h of uploaded video content every
minute [13]. Whereas data was traditionally stored and analyzed in nightly or weekly batches, it is now generated and
accessed in real time, creating challenges for organizations
interested in analyzing such content. Due to the velocity of
data creation, an adequate Big Data solution must provide
high-throughput solutions with low latency for analytics.
Fig. 1 The 3 Vs of Big Data. Volume, variety, and velocity have been widely accepted as the key data management challenges associated with Big Data.

4 Anatomy of a Big Data solution
Big Data analysis of today’s large datasets allows a data owner
to look for a common thread that connects seemingly unrelated
data points, identifying associations that would otherwise go
unnoticed. Big Data does not explain the Bwhy^ or Bhow^ of
these associations, however it does alert the investigator potentially sparking further analysis or prospective studies to answer
questions previously unasked. The goal of the Big Data movement is to unlock the value of large datasets in an effort to
improve decision making, efficiency, outcomes, and time to
deliverables for the data owner. Accomplishing these goals
requires an infrastructure that can collect, store, access, analyze,
and manage data in various forms, turning volumes of simple
data points into high-value information capable of providing
intelligence to drive change and improve efficiency (Fig. 2).
Fig. 2 Anatomy of a Big Data solution. Infrastructure is designed to accept data in various formats, allowing real-time storage and analysis. Data is transformed, repackaged, and presented back to the data owner in various formats for consumption. ETL = extract, transform, load.
4.1 Data capture
With recent advances in technology, data can be collected
from almost any imaginable venue. Large volume, structured
transactional data such as internet search queries, purchase
histories, or mailing lists are easily captured and fed into relational databases. The collection of unstructured data found in
plain text, images, and streaming video can be more challenging. Emerging sources of data ready for capture include mobile applications, social networks and internet-connected sensors such as wearable devices and RFID. Access to, and the
ability to effectively acquire, data is the single most important
variable in the Big Data equation. Without data there is no Big
Data movement.
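As a simple illustration of the capture step, the sketch below wraps records arriving in different formats in a common envelope before staging; the source names and payloads are hypothetical and not drawn from any particular product:

```python
import json
from datetime import datetime, timezone

def capture(source: str, payload, structured: bool) -> str:
    """Wrap an incoming record in a common envelope for staging."""
    envelope = {
        "source": source,            # e.g., "purchase_history" or "physician_note"
        "received_at": datetime.now(timezone.utc).isoformat(),
        "structured": structured,    # downstream storage treats the two differently
        "payload": payload,
    }
    return json.dumps(envelope)

# A structured transactional record and an unstructured free-text note
print(capture("purchase_history", {"item": "ECG electrodes", "qty": 50}, True))
print(capture("physician_note", "Pt reports palpitations x2 weeks...", False))
```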
4.2 Data storage and analysis

After data is ingested, it must be stored, organized, and refined prior to analysis. Software frameworks such as Apache Hadoop (Apache Software Foundation, Forest Hill, MD) have been developed specifically to accomplish this complex charge. Volumes of information are warehoused, divided, parceled, and manipulated using shared network resources in parallel, allowing data to be processed faster than would be accomplished by a traditional supercomputer. Differing data types require variable amounts of processor cycles to accomplish the requisite task at hand. Unstructured data is resource expensive, while structured data is more easily organized and processed. The ultimate objective of the Big Data schema is to identify key information that can be readily utilized to meet the goals of the data owner. Once identified, this information can then be presented or parceled in various forms.
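Hadoop itself is a Java framework, but the divide-and-process-in-parallel pattern described above can be sketched in a few lines of Python. The toy example below counts words across document shards with a process pool; it illustrates only the map and reduce steps and does not use Hadoop's actual API:

```python
from collections import Counter
from multiprocessing import Pool

def map_shard(shard):
    """Map step: count words within one shard of documents."""
    counts = Counter()
    for doc in shard:
        counts.update(doc.lower().split())
    return counts

if __name__ == "__main__":
    # Four shards standing in for data blocks distributed across a cluster
    shards = [
        ["atrial fibrillation noted", "fibrillation resolved"],
        ["normal sinus rhythm", "rhythm irregular"],
        ["fibrillation recurred", "sinus rhythm restored"],
        ["no events", "rhythm normal"],
    ]
    with Pool(processes=4) as pool:
        partial = pool.map(map_shard, shards)   # shards processed in parallel
    total = sum(partial, Counter())             # reduce step: merge per-shard counts
    print(total.most_common(3))
```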
4.3 Data reporting

Upon completion of analysis, data is reorganized and repackaged for presentation or warehousing. This information may be utilized to drive real-time change in the data owner's business model by providing:

• Monitoring and improving performance (Business Intelligence)
• Delivery of new insight and capabilities (Informatics)
• Delivery of new tools and products (Data Mining)

The ability to collect, analyze, and report information in real time allows data owners to adapt to changing business environments, identify inefficiencies, and disseminate time-sensitive information to key stakeholders (Fig. 3).
4.4 Cloud computing
A discussion about Big Data would be incomplete without
acknowledgement of cloud computing. One of the major technologic advances leading to the Big Data movement is the
ability to store enormous volumes of data in real time to large-scale, internet-connected repositories. When these data repositories are packaged with platform, infrastructure, or software services, they are described as cloud computing or "The Cloud."
Fig. 3 The three dimensions of an analytics action space. The intersection of business intelligence, informatics, and data mining is where the strength of Big Data analytics is most apparent.

Cloud computing allows the customer to avoid the often expensive infrastructure investment in hardware and software, instead allowing resources to be shared amongst a cooperative, allowing even the smallest of operations to benefit from economies of scale. Cloud resources can be reallocated in real time, allowing maximal utilization resulting in shared overhead and lower operational costs. Fee models are frequently "pay as you go", allowing businesses to grow without worry of sizable upfront investments in technology.
5 Big Data in healthcare: a paradigm shift awaits

The effects of the Big Data revolution can already be felt in the medical field, and further shifting of the healthcare paradigm is anticipated. Healthcare data is abundant; however, one of the biggest obstacles to meaningful large-scale analysis is the plurality of stakeholders. Patient-specific information is often housed on the servers of individual healthcare providers, laboratories, hospitals, or insurance providers. Without stakeholder collaboration the data is effectively siloed, resulting in resource underutilization, redundancy, and inefficiency, ultimately contributing to the growing cost of healthcare which, in 2013, totaled $2.9 trillion or $9255 per person, equal to 17.4 % of the US Gross Domestic Product [14].

In response to these ever-increasing costs, many insurance payers have changed from a fee-for-service model to risk-sharing arrangements that prioritize patient outcomes. Facing the reality of decreasing reimbursements, many healthcare organizations have embraced Big Data analytics to become more efficient. Accountable care organizations have leveraged Big Data to not only measure successes but to identify wasteful practices leading to lower overheads. Large corporations and startups alike recognize the potential windfall to the winners of the Big Data race, spurring innovation of application development not only in traditional data management and storage platforms but also in emerging fields such as artificial intelligence and predictive analytics [15, 16].

6 Squeezing the last drop from electronic health records
Healthcare data abounds in various forms; however, it is most conventionally found in the electronic health record (EHR). The typical EHR includes structured data such as patient demographics, ICD-9 diagnosis codes, laboratory data, and vital signs. Unfortunately, structured data accounts for only one fifth of available healthcare information; the bulk of data is sequestered in unstructured physician notes and imaging studies. More recently, the Centers for Medicare & Medicaid Services (CMS) has implemented policies to incentivize the transition to, and meaningful use of, EHR data [17] with the goal of increasing the overall percentage of structured data in health records. However, as the amount of total accessible healthcare information continues to grow in various forms, the efforts of CMS are unlikely to have a substantial impact on data format and organization.
Advanced analytics are increasingly used to bridge this
gap by interrogating unstructured data and revealing
clinical keys that would otherwise be unrecognized.
Furthermore, these key data are used to identify practice
patterns that may enhance overall value and delivery of
care.
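Commercial clinical-NLP engines are far more sophisticated, but the core idea of interrogating unstructured text can be illustrated with simple pattern matching. The note, patterns, and field names below are hypothetical; real systems must also handle negation, abbreviations, and context:

```python
import re

note = ("72 yo M with HTN and DM2. BP 142/88, HR 61. "
        "EF 35% on echo. Started on metoprolol.")

# Toy patterns mapping fragments of free text to structured fields
patterns = {
    "age":    r"(\d{1,3})\s*yo",
    "bp":     r"BP\s*(\d{2,3}/\d{2,3})",
    "hr":     r"HR\s*(\d{2,3})",
    "ef_pct": r"EF\s*(\d{1,2})\s*%",
}

structured = {field: (m.group(1) if (m := re.search(rx, note)) else None)
              for field, rx in patterns.items()}
print(structured)   # {'age': '72', 'bp': '142/88', 'hr': '61', 'ef_pct': '35'}
```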
7 Insights from registry data: tomorrow's results today
As healthcare reimbursement becomes dependent on quality
and outcome metrics, registries to record and monitor disease-specific data have been increasingly utilized. Sweden, a country that started its first healthcare registry in 1975, now boasts
103 national registries which have led to such findings as the
association of smoking and rheumatoid arthritis [18, 19].
More recently, this data has been leveraged to improve quality
metrics and reduce Sweden’s growth in healthcare spending
by up to 4.7 % per year. In the USA, individual medical
societies have taken up the torch of registry creation and management. In 1997, the American College of Cardiology created the National Cardiovascular Data Registry (NCDR) in an
effort to formalize data collection and reporting of diagnostic
catheterization and/or percutaneous coronary intervention (PCI) [20]. Since then, the NCDR has
grown to include eight current and two future registries that
contain more than 15,000,000 unique patient records spanning
the gamut of cardiovascular care including coronary intervention, pulmonary vein ablation, implantation of various devices, and percutaneous valve replacement. Recently, the
ACC has expanded the scope of its registries to the outpatient
setting with the creation of two unique registries: PINNACLE
and the Diabetes Collaborative Registry [21]. PINNACLE is
cardiology’s largest outpatient quality improvement registry,
tracking data from more than 2500 physicians on coronary
artery disease, hypertension, heart failure, and atrial fibrillation and boasts an additional 15,000,000 patient records. Data
from the PINNACLE registry is qualified as "meaningful use"
and is automatically reported to the Physician Quality
Reporting System. The soon-to-be-opened Diabetes
Collaborative Registry aims to connect primary care and specialty physicians with the common goal of improving patient
care and treatment of diabetes mellitus. All NCDR databases
provide participating practices with detailed outcomes reports,
highlighting adherence to guideline-based care. Compiled
quarterly, these risk-adjusted reports allow for institution-to-institution comparison of performance and quality metrics.
Physicians participating in the CathPCI registry also have access to the Physician Dashboard which reports 40 processes
and quality metrics to reinforce guideline-based behaviors or
encourage practice change to "get in line" with peers.
Continued expansion and exploitation of registry data will
drive substantial change in the future of healthcare delivery,
providing the needed feedback to develop a more holistic,
outcome-based approach to patient care.
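Stripped of risk adjustment, an outcomes report of this kind reduces to grouped summary statistics. The sketch below uses synthetic cases and hypothetical column names to show the institution-to-institution comparison; real NCDR reports are risk-adjusted and far more detailed:

```python
import pandas as pd

# Synthetic registry extract: one row per PCI case (hypothetical fields)
cases = pd.DataFrame({
    "institution": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "guideline_med_rx": [1, 1, 0, 1, 1, 1, 0, 1],    # guideline meds at discharge
    "door_to_balloon_min": [62, 88, 95, 58, 71, 102, 84, 69],
})

report = cases.groupby("institution").agg(
    cases=("guideline_med_rx", "size"),
    adherence_rate=("guideline_med_rx", "mean"),
    median_d2b=("door_to_balloon_min", "median"),
)
print(report)   # quarterly-style comparison across participating sites
```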
8 Research in real-time
Imagine a patient you once saw in your practice who was
affected by a rare condition or unique set of comorbidities.
Perhaps you wondered if there were similar patients in your
medical system, hoping to gain insight into their disease progression or therapeutic outcomes. Big Data analytics carries
the potential to provide answers to these queries almost instantaneously, leading to an increased knowledge base and
potentially encouraging collaboration and access to lessons
learned. Highly specified queries can pinpoint de-identified
patients meeting inclusion criteria for randomized controlled trials during the assessment of feasibility of study design. These queries would also kick-start recruitment once institutional
review board approval is obtained. Indeed, access to registry
data has vastly improved research productivity. More than 360
articles have been published using NCDR data alone, with 56
of these original manuscripts making print in 2014 [22].
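Conceptually, such a feasibility query is a filter over de-identified records. The sketch below uses synthetic data and hypothetical inclusion criteria for an imagined atrial fibrillation trial:

```python
import pandas as pd

# De-identified patient table (synthetic); no direct identifiers retained
patients = pd.DataFrame({
    "patient_key": ["h1", "h2", "h3", "h4"],   # hashed surrogate keys
    "age": [67, 54, 78, 61],
    "dx_afib": [True, False, True, True],
    "on_anticoagulant": [False, False, True, False],
})

# Hypothetical inclusion criteria: AF, age >= 60, not yet anticoagulated
eligible = patients.query("dx_afib and age >= 60 and not on_anticoagulant")
print(f"{len(eligible)} candidate(s) identified for screening")
```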
In 2003, computing advances enabled the Human Genome
Project to finish DNA sequencing 2 years ahead of schedule.
Riding the momentum of this unprecedented success, the field
of human genomics research has exploded, buoyed by the
prospect of offering personalized medicine to the masses.
Amazingly, the cost of sequencing one human genome has
fallen from $100 million in 2001 to roughly $5000 in 2015
by using previously sequenced genomes as a roadmap for
subsequent genetic sequencing [23]. With more and more genomic data available, personalized medicine has moved to the
forefront of boardroom agendas. Physicians salivate over the
potential to offer patients targeted therapies with fewer side
effects and greater success rates. Insurance providers view
individualized medicine as a way to improve margins by identifying and treating disease earlier, lowering cost in the long
term. Furthermore, President Obama validated the importance
of Big Data in genomics when he announced the Precision
Medicine Initiative at the 2015 State of the Union address,
allocating $215 million for the development of multiple
shared databases in an effort to spur interdisciplinary collaboration, with a particular focus on genomics applications in
cancer [24]. It is anticipated that large capital investments
from governmental and private entities will result in increased
knowledge and application of genomics and, hence, act as a
positive feedback loop driving further investment, research,
and application of this promising field [25].
9 Emerging markets for data in medicine
and cardiology
Data collection from EHR is just the tip of the iceberg; wearable devices such as the Fitbit Surge (Fitbit, San Francisco,
CA) and Apple Watch (Apple, Cupertino, CA) are becoming
increasingly popular and have the potential to collect and
distribute vast amounts of data to both an individual healthcare
provider and a healthcare network. Other relevant data
may commonly be collected from internet usage, social media,
and GPS location, or less commonly from the use of telemedicine and genetic sequencing. Big Data analytics is being leveraged not only for the monitoring of individuals but also for population studies. The University of
California-San Francisco’s ambitious Health eHeart study aims
to identify predictive patterns for heart disease, identify causes
of atrial fibrillation, reduce heart failure hospitalizations, and
determine the effects of social media on heart health by analyzing up to 1 million participants over 10 years. The study will
use Big Data analytics to answer these questions via real-time
metrics acquired through patient worn sensors, mobile applications, social media, and a dedicated web portal [26].
Data in the form of web queries has been leveraged by
internet search providers to report public health trends. In
2008, Google Labs famously started the Google Flu Trends
web service to predict influenza activity based on internet
search terms such as "flu", "fever", and "cough" [27]. This
service was intended to be comparable to, yet more nimble
than, influenza activity reports from the Centers for Disease Control and Prevention (CDC). Although Google Flu Trends
was initially reported to be highly accurate (97 %) [28], further
analysis suggested that the algorithm consistently
overestimated flu prevalence, particularly in 2012–2013
[29]. Google has since abandoned the service but does provide
raw data to public health researchers interested in similar endeavors. Social media data streams such as Twitter are now
being used in Big Data analysis to track public health issues
like foodborne illness [30]. When combined with complementary data such as weather and air quality reports, tweets containing descriptors commonly associated with asthma exacerbations were noted to be predictive of emergency department
visits for asthma attacks [31]. Future iterations of algorithms
such as these may allow accurate forecasting of illness and
other public health events [32].
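At its simplest, this kind of surveillance begins by counting symptom-related terms in a message stream. The sketch below uses toy messages and an illustrative term list rather than a real Twitter feed; daily counts like these are what would be joined with weather and air-quality data in models like those of [31]:

```python
from collections import Counter

SYMPTOM_TERMS = {"wheezing", "inhaler", "asthma", "cough"}   # illustrative list

tweets = [
    "Forgot my inhaler again, wheezing all day",
    "Game night was great!",
    "This smog has my asthma acting up, cough cough",
]

def daily_signal(messages):
    """Count symptom-term mentions across one day's messages."""
    hits = Counter()
    for msg in messages:
        for word in msg.lower().replace(",", " ").split():
            if word in SYMPTOM_TERMS:
                hits[word] += 1
    return hits

print(daily_signal(tweets))   # e.g., Counter({'cough': 2, 'inhaler': 1, ...})
```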
The availability of healthcare-specific wearable sensors is
expanding as well. One example is the BodyGuardian Remote
Monitoring System (Preventice, Rochester, MN), a discreet
body-worn cardiac monitoring technology that allows physicians to monitor telemetry data in near real-time. The information is delivered to a cloud-based health platform that is
accessible to physicians, allowing them to monitor, change
event thresholds, and switch the device to one of three monitoring types—mobile cardiac telemetry, event monitoring,
and Holter monitor. It is FDA-cleared for the monitoring of
non-lethal arrhythmias in ambulatory patients [33].
Although randomized controlled trials have long been the
standard criterion for causality in clinical research, the use of
simple heuristics for risk stratification and diagnostic rule-out
has gained traction in medical decision making due to their
ease-of-use and surprising accuracy [34]. Large observational
data sets are often used to identify association but do not
perform well when adjudicating causation. Prediction models,
however, only require high goodness of fit which is often
achievable by analyzing large amounts of retrospective data
and identifying variables that increase statistical risk. Models
are derived using these variables and are then validated with a
separate cohort. Such models already exist for common entities like chest pain and stroke as well as less prevalent conditions such as pulmonary arterial hypertension and mitral stenosis [35–39]. Paramount to wide clinical acceptance and application is the ease of use of such models. A typical prediction tool easily implemented in daily practice limits data input
to fewer than eight variables. Robust tools such as the SYNTAX
score for coronary artery disease complexity have been criticized for being cumbersome, leading physicians to rely on
gestalt rather than analytics [40]. Implementation of data mining and predictive analytics may obviate this issue, allowing
the development of models that may have dozens to hundreds
of variables extracted directly from the EHR and directly
displayed to the physician, circumventing the additional work
for the physician [41]. Models developed from a system-wide
cohort could be high-powered and rich in detail, allowing the
identification of relationships that would otherwise seem
unintuitive. In a collaborative healthcare system with information sharing across hospitals, these models could be cross-validated through a unique but similar cohort. In addition to
prediction tools, machine-learning algorithms have shown potential to aid in the early diagnosis of myocardial infarction
[42]. Although not yet ready for prime time, artificial neural
networks may offer assistance to diagnosticians of the future.
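The derive-then-validate workflow described above can be sketched with scikit-learn. Synthetic data stands in for an EHR-extracted cohort, and a held-out split stands in for the separate validation cohort:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic risk factors loosely styled on an EHR extract (standardized values)
X = rng.normal(size=(n, 5))
logit = 1.2 * X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2]
y = rng.random(n) < 1 / (1 + np.exp(-logit))   # outcome drawn from a true model

# Derivation cohort vs. validation cohort
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_dev, y_dev)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"Validation AUC: {auc:.2f}")   # goodness of fit on the unseen cohort
```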
10 Electrophysiology and Big Data: a unique
opportunity
More so than any other medical discipline, electrophysiology
is uniquely positioned to be an early utilizer of Big Data analytics. The current generation of implantable electronic devices is
capable of self-interrogation, rhythm assessment and monitoring, and other novel services such as thoracic impedance monitoring; when combined with remote monitoring, these
devices provide the capability to garner near limitless amounts
of data for the clinician to utilize in a multitude of ways.
Remote monitoring of ICDs has been noted to reduce the
incidence of inappropriate shocks resulting in improved quality of life [43] and boasts 95 % sensitivity for detection of true
atrial fibrillation episodes with as many as 90 % of identified
episodes being asymptomatic [44]. Given the clinical implications of these findings, the Heart Rhythm Society currently recommends remote monitoring of all patients with cardiac implantable electronic devices [45]. Big Data analytics and remote monitoring were successfully paired in the ALTITUDE
(185,778 patients) and MERLIN (269,471 consecutive
patients) studies [46, 47]. These mega-cohort studies suggested that patients with remote monitoring strategies had significant survival benefit compared to non-remote monitored
patients. Further analysis of the available data has provided
insight into the interaction between atrial fibrillation and CRT-D function [48, 49]. Due to the passive nature of remote monitoring, it is easy to envision future studies of a similar nature
with millions of participants worldwide providing real-time
data from their implantable devices. Population-based analysis of these devices would likely lead to improvements in
future device design and battery performance while simultaneously alerting manufacturers to device malfunction and failure leading to timely advisory notifications. This utilization of
remote monitoring and Big Data analytics would presumably
result in better outcomes for the patient.
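On the analytics side, such a pipeline ultimately reduces to scanning incoming transmissions and flagging values that cross clinician-set thresholds. The transmission format and threshold values below are hypothetical and purely illustrative:

```python
# Hypothetical daily transmissions from implanted devices (synthetic values)
transmissions = [
    {"device_id": "icd-001", "af_burden_pct": 2.1, "battery_v": 3.1},
    {"device_id": "icd-002", "af_burden_pct": 38.4, "battery_v": 3.0},
    {"device_id": "icd-003", "af_burden_pct": 0.0, "battery_v": 2.5},
]

AF_BURDEN_ALERT_PCT = 10.0   # illustrative clinician-set threshold
BATTERY_ALERT_V = 2.6        # illustrative elective-replacement warning

for tx in transmissions:
    alerts = []
    if tx["af_burden_pct"] >= AF_BURDEN_ALERT_PCT:
        alerts.append("high AF burden")
    if tx["battery_v"] <= BATTERY_ALERT_V:
        alerts.append("low battery")
    if alerts:
        print(f"{tx['device_id']}: " + ", ".join(alerts))
```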
11 Concerns for Big Data: big but not perfect
The role of Big Data in the future of healthcare will continue
to expand as access to more information about individual patients and their activities becomes readily available.
Unfortunately, major drawbacks to the reliance on large-scale datasets to guide decision making in healthcare have been well described. As with all emerging technologies, growing pains are to be expected; however, given the potentially sensitive nature of the information being stored and analyzed, Big Data in healthcare poses a unique challenge.
11.1 Data security

As the medical community recognizes the value of large volumes of patient data to drive innovation, others find value for more nefarious reasons. Despite the protections afforded through the Health Insurance Portability and Accountability Act of 1996, security breaches of large magnitude have become commonplace in the past several years. Those affected by such breaches include insurance giant Anthem (80 million records at risk) [50], UCLA Health System (4.5 million records at risk) [51], and Healthcare.gov (test server, no records at risk) [52]. Oftentimes server-side data is neither de-identified nor encrypted and includes demographic information and social security numbers, creating a target for cybercriminals. Despite efforts to de-identify sensitive medical information for wide dissemination, the threat of data re-identification exists and has been demonstrated [53], although the likelihood of successful re-identification of an individual record may be less than 0.01 % [54].
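A minimal de-identification pass, sketched below, replaces direct identifiers with salted hashes and keeps analytic fields. This is illustrative only; HIPAA Safe Harbor enumerates eighteen identifier classes, and production pipelines do considerably more:

```python
import hashlib

SALT = b"rotate-and-protect-this-salt"    # must be kept secret and managed
DIRECT_IDENTIFIERS = {"name", "ssn", "mrn"}

def deidentify(record):
    """Replace direct identifiers with salted hashes; keep analytic fields."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            digest = hashlib.sha256(SALT + str(value).encode()).hexdigest()
            out[f"{key}_hash"] = digest[:16]   # surrogate key preserves linkage
        else:
            out[key] = value
    return out

record = {"name": "Jane Doe", "ssn": "000-00-0000", "mrn": "12345",
          "age": 67, "dx": "atrial fibrillation"}
print(deidentify(record))
```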
11.2 Data integrity

Robust datasets accumulated in registry or EHR form are subject to scrutiny due to concerns about data validity and loss of fidelity in the codification process. Although registries are designed to reflect real-world practice in an effort to drive diagnostic and therapeutic advancement, institutional participation is voluntary, leading to the possibility of representation bias [55]. Oftentimes these registries are incomplete [56] and/or lack validation, with only 18 % of registries indicating that they audit their data routinely [57]. Clearly, the ability of registry analytics to advance science and improve care is at stake if quality control is not enforced [58]. In particular, the use of traditional statistical analyses may result in type 2 error if incomplete or invalid data is used for modeling. Conversely, one of the major advantages of Big Data analytics is its ability to amplify signal and reduce noise by drowning out erroneous data, mitigating the impact of inaccurate or non-normalized data, and helping to identify the meaningful relationships that researchers seek. As the available registry data explodes over the next decade, Big Data analysis will have an ever-expanding role, helping to mute the impact of incomplete or imprecise datasets through large-volume analysis of complementary information. This, of course, assumes data errors are not systematic or widespread. Fortunately, leading organizations such as the National Heart, Lung, and Blood Institute have made recommendations regarding the collection and management of data in an effort to ensure the integrity and validity of scientific assumptions based on these data [59]. Ultimately, individual providers and health systems will be responsible for continued stewardship of the medical record, as large-scale data validation at a population level would be a monumental undertaking.

11.3 Patient and physician concerns

Shared decision making between patient and provider will be paramount for successful implementation of new predictive tools and treatment strategies based on analytics. A successful approach to patient care must always afford flexibility to the provider, as clinical tools cannot capture non-clinical variables such as patient preferences that impact decision making. Prior to initiation of a treatment such as anticoagulation for the prevention of stroke in atrial fibrillation or statin therapy for primary prevention of coronary artery disease, an open discussion regarding the risks, benefits, and applicability of the studied cohort should be had with the patient. Full disclosure of this information improves patient satisfaction, increases the likelihood of medication compliance, and may lead to improved quality outcome metrics [60].

12 Conclusion

The Big Data revolution in healthcare is well underway, driven by exponential growth in available data as collected in EHRs, registries, or wearable sensors. This data will be
collected, stored, and analyzed with the hope of unlocking
secrets leading to improved quality of life and cures for disease, all while reducing waste in healthcare. The continued success
of this movement is dependent on sustained technological
advancements in the fields of information technology and
computer architecture as well as seamless collaboration and
open exchange of data between physicians, insurance payers,
private industry, and government. Despite the very real challenges posed by its implementation, the possibilities of Big
Data application are nearly limitless and cannot be ignored.
Compliance with ethical standards

Financial support None.

Conflict of interest The authors declare that they have no competing interests.
References

1. Hilbert, M., & Lopez, P. (2011). The world's technological capacity to store, communicate, and compute information. Science, 332(6025), 60–65.
2. Cox, M., & Ellsworth, D. (1997). Application-controlled demand paging for out-of-core visualization. Proceedings of the 8th Conference on Visualization '97, 235–ff.
3. Oxford English Dictionary. http://www.oed.com/view/Entry/18833#eid301162177. Accessed 27 Sep 2015.
4. Press, G. (2015). 12 Big Data definitions: What's yours? Forbes. http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/. Accessed 27 Sep 2015.
5. Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. Boston: Eamon Dolan/Houghton Mifflin Harcourt.
6. Maury's wind and current chart, 3rd edition, 1852. http://collections.lib.uwm.edu/cdm/ref/collection/agdm/id/1717. Accessed 27 Sep 2015.
7. Laney, D. (2001). 3D data management: Controlling data volume, velocity, and variety. META Group. http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf. Accessed 27 Sep 2015.
8. Bringing big data to the enterprise. IBM. http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html. Accessed 27 Sep 2015.
9. The digital universe of opportunities: Rich data and the increasing value of the internet of things. EMC Digital Universe with Research & Analysis by IDC. (2014). http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm. Accessed 27 Sep 2015.
10. Amazon S3 pricing. https://aws.amazon.com/s3/pricing/. Accessed 27 Sep 2015.
11. Hughes, G. (2011). How big is 'big data' in healthcare? SAS Blogs. http://blogs.sas.com/content/hls/2011/10/21/how-big-is-big-data-in-healthcare/. Accessed 27 Sep 2015.
12. Internet live stats. http://www.internetlivestats.com/one-second/#youtube-band. Accessed 27 Sep 2015.
13. Statistics. YouTube. (2015). https://www.youtube.com/yt/press/statistics.html. Accessed 27 Sep 2015.
14. Hartman, M., et al. (2015). National health spending in 2013: Growth slows, remains in step with the overall economy. Health Affairs, 34(1), 150–160.
15. Baum, S. (2015). 4 ways healthcare is putting artificial intelligence, machine learning to use. MedCity News. http://medcitynews.com/2015/02/4-ways-healthcare-putting-artificial-intelligence-machine-learning-use/. Accessed 27 Sep 2015.
16. Winters-Miner, L. (2014). Seven ways predictive analytics can improve healthcare. Elsevier. http://www.elsevier.com/connect/seven-ways-predictive-analytics-can-improve-healthcare. Accessed 27 Sep 2015.
17. EHR Incentive Programs. CMS.gov. https://www.cms.gov/Regulations-and-Guidance/Legislation/EHRIncentivePrograms/index.html. Accessed 27 Sep 2015.
18. Emilsson, L., et al. (2015). Review of 103 Swedish healthcare quality registries. Journal of Internal Medicine, 277(1), 94–136.
19. Webster, P. C. (2014). Sweden's health data goldmine. CMAJ, 186(9), E310.
20. Weintraub, W. S. (1998). Development of the American College of Cardiology National Cardiovascular Data Registry. The Journal of Invasive Cardiology, 10(8), 489–491.
21. Oetgen, W. J., Mullen, J. B., & Mirro, M. J. (2011). Cardiologists, the PINNACLE registry, and the "meaningful use" of electronic health records. Journal of the American College of Cardiology, 57(14), 1560–1563.
22. Published manuscripts based on NCDR registries. National Cardiovascular Data Registry, American College of Cardiology. (2015). http://cvquality.acc.org/~/media/QII/NCDR/Published%20Research%20Page/Aug%202015%20NCDR%20Published%20Manuscripts%20by%20Registry.ashx. Accessed 27 Sep 2015.
23. Wetterstrand, K. DNA sequencing costs: Data from the NHGRI Genome Sequencing Program. http://www.genome.gov/sequencingcosts/. Accessed 27 Sep 2015.
24. FACT SHEET: President Obama's Precision Medicine Initiative. https://www.whitehouse.gov/the-press-office/2015/01/30/fact-sheet-president-obama-s-precision-medicine-initiative. Accessed 27 Sep 2015.
25. Chawla, N. V., & Davis, D. A. (2013). Bringing big data to personalized healthcare: A patient-centered framework. Journal of General Internal Medicine, 28(Suppl 3), S660–S665.
26. Health eHeart Study. University of California, San Francisco. https://www.health-eheartstudy.org/. Accessed 6 Oct 2015.
27. Google Flu Trends. http://www.google.org/flutrends/about/. Accessed 26 Dec 2015.
28. Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., & Brilliant, L. Detecting influenza epidemics using search engine query data. http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/papers/detecting-influenza-epidemics.pdf. Accessed 26 Dec 2015.
29. Lazer, D., et al. (2014). Big data. The parable of Google Flu: Traps in big data analysis. Science, 343(6176), 1203–1205.
30. Kuehn, B. M. (2014). Agencies use social media to track foodborne illness. JAMA, 312(2), 117–118.
31. Ram, S., et al. (2015). Predicting asthma-related emergency department visits using big data. IEEE Journal of Biomedical and Health Informatics, 19(4), 1216–1223.
32. Kuehn, B. M. (2015). Twitter streams fuel Big Data approaches to health forecasting. JAMA, 314(19), 2010–2012.
33. BodyGuardian system. Preventice. http://www.preventice.com/index.html. Accessed 6 Oct 2015.
34. Marewski, J. N., & Gigerenzer, G. (2012). Heuristic decision making in medicine. Dialogues in Clinical Neuroscience, 14(1), 77–89.
35. Abascal, V. M., et al. (1988). Echocardiographic evaluation of mitral valve structure and function in patients followed for at least 6 months after percutaneous balloon mitral valvuloplasty. Journal of the American College of Cardiology, 12(3), 606–615.
36. Benza, R. L., et al. (2012). The REVEAL registry risk score calculator in patients newly diagnosed with pulmonary arterial hypertension. Chest, 141(2), 354–362.
37. Conway Morris, A., et al. (2006). TIMI risk score accurately risk
stratifies patients with undifferentiated chest pain presenting to an
emergency department. Heart, 92(9), 1333–1334.
38. Lip, G. Y., et al. (2010). Refining clinical risk stratification for
predicting stroke and thromboembolism in atrial fibrillation using
a novel risk factor-based approach: the euro heart survey on atrial
fibrillation. Chest, 137(2), 263–272.
39. Wilkins, G. T., et al. (1988). Percutaneous balloon dilatation of the
mitral valve: an analysis of echocardiographic variables related to
outcome and the mechanism of dilatation. British Heart Journal,
60(4), 299–308.
40. Serruys, P. W., et al. (2009). Percutaneous coronary intervention
versus coronary-artery bypass grafting for severe coronary artery
disease. The New England Journal of Medicine, 360(10), 961–972.
41. Janke, A. T., et al. (2015). Exploring the potential of predictive analytics and Big Data in emergency care. Annals of Emergency Medicine.
doi:10.1016/j.annemergmed.2015.06.024.
42. Baxt, W. G. (1992). Analysis of the clinical variables driving decision in an artificial neural network trained to identify the presence of
myocardial infarction. Annals of Emergency Medicine, 21(12),
1439–1444.
43. Hindricks, G., et al. (2014). Quarterly vs. yearly clinical follow-up
of remotely monitored recipients of prophylactic implantable
cardioverter-defibrillators: results of the REFORM trial. European
Heart Journal, 35(2), 98–105.
44. Ricci, R. P., et al. (2013). Effectiveness of remote monitoring of
CIEDs in detection and treatment of clinical and device-related
cardiovascular events in daily practice: the HomeGuide Registry.
Europace, 15(7), 970–977.
45. Slotwiner, D., et al. (2015). HRS expert consensus statement on
remote interrogation and monitoring for cardiovascular implantable
electronic devices. Heart Rhythm, 12(7), e69–e100.
46. Saxon, L. A., et al. (2010). Long-term outcome after ICD and CRT
implantation and influence of remote device follow-up: the
ALTITUDE survival study. Circulation, 122(23), 2359–2367.
47. Varma, N., et al. (2015). The relationship between level of adherence to automatic wireless remote monitoring and survival in pacemaker and defibrillator patients. Journal of the American College of
Cardiology, 65(24), 2601–2610.
48. Hayes, D. L., et al. (2011). Cardiac resynchronization therapy and the relationship of percent biventricular pacing to symptoms and survival. Heart Rhythm, 8(9), 1469–1475.
49. Gilliam, F. R., et al. (2011). Real world evaluation of dual-zone ICD
and CRT-D programming compared to single-zone programming:
the ALTITUDE REDUCES study. Journal of Cardiovascular
Electrophysiology, 22(9), 1023–1029.
50. Health insurer Anthem struck by massive data breach. Forbes. (2015). http://www.forbes.com/sites/gregorymcneal/2015/02/04/massive-data-breach-at-health-insurer-anthem-reveals-social-security-numbers-and-more/. Accessed 27 Sep 2015.
51. UCLA Health System data breach affects 4.5 million patients. Los Angeles Times. (2015). http://www.latimes.com/business/la-fi-ucla-medical-data-20150717-story.html. Accessed 27 Sep 2015.
52. Hacker breached HealthCare.gov insurance site. (2014). The Wall Street Journal. http://www.wsj.com/articles/hacker-breached-healthcare-gov-insurance-site-1409861043. Accessed 27 Sep 2015.
53. Ohm, Paul. (2009). Broken promises of privacy: responding to the
surprising failure of anonymization. UCLA Law Review, Vol. 57,
p. 1701, 2010; U of Colorado Law Legal Studies Research Paper
No. 9–12. Available at SSRN: http://ssrn.com/abstract=1450006.
54. Benitez, K., & Malin, B. (2010). Evaluating re-identification risks
with respect to the HIPAA privacy rule. Journal of the American
Medical Informatics Association, 17(2), 169–177.
55. Xian, Y., Hammill, B. G., & Curtis, L. H. (2013). Data sources for
heart failure comparative effectiveness research. Heart Failure
Clinics, 9(1), 1–13.
56. Dunlay, S. M., et al. (2008). Medical records and quality of care in
acute coronary syndromes: results from CRUSADE. Archives of
Internal Medicine, 168(15), 1692–1698.
57. Lyu, H., et al. (2015). Prevalence and data transparency of national
clinical registries in the United States. Journal for Healthcare
Quality.
58. Roger, V. L. (2015). Of the importance of motherhood and
apple pie. Circulation. Cardiovascular Quality and Outcomes,
8(4), 329–331.
59. Roger, V. L., et al. (2015). Strategic transformation of population
studies: recommendations of the working group on epidemiology
and population sciences from the National Heart, Lung, and Blood
Advisory Council and Board of External Experts. American
Journal of Epidemiology, 181(6), 363–368.
60. Brown, M. T., & Bussell, J. K. (2011). Medication adherence:
WHO cares? Mayo Clinic Proceedings, 86(4), 304–314.