J Interv Card Electrophysiol (2016) 47:51–59
DOI 10.1007/s10840-016-0104-y

The application of Big Data in medicine: current implications and future directions

Christopher Austin · Fred Kusumoto
Division of Cardiovascular Disease, Mayo Clinic Florida, Jacksonville, FL 32224, USA
Correspondence: austin.christopher@mayo.edu; kusumoto.fred@mayo.edu

Received: 28 October 2015 / Accepted: 11 January 2016 / Published online: 27 January 2016
© Springer Science+Business Media New York 2016

Abstract Since the mid 1980s, the world has experienced an unprecedented explosion in the capacity to produce, store, and communicate data, primarily in digital formats. Simultaneously, access to computing technologies in the form of the personal PC, smartphone, and other handheld devices has mirrored this growth. With these enhanced capabilities of data storage and rapid computation as well as real-time delivery of information via the internet, the average daily consumption of data by an individual has grown exponentially. Unbeknownst to many, Big Data has silently crept into our daily routines and, with continued development of cheap data storage and availability of smart devices both regionally and in developing countries, the influence of Big Data will continue to grow. This influence has also carried over to healthcare. This paper will provide an overview of Big Data, its benefits, potential pitfalls, and the projected impact on the future of medicine in general and cardiology in particular.

Keywords Big Data · Cardiology · Electrophysiology · Analytics · Data management

Institution where work was performed: Mayo Clinic Florida

1 Introduction

Since the mid 1980s, the world has experienced an unprecedented explosion in the capacity to produce, store, and communicate data, primarily in digital formats [1]. Simultaneously, access to computing technologies in the form of the personal PC, smartphone, and other handheld devices has mirrored this growth. With these enhanced capabilities of data storage and rapid computation as well as real-time delivery of information via the internet, the average daily consumption of data by an individual has grown exponentially. In 2007, the average human was presented with the data equivalent of 174 newspapers per day. This data explosion continues, driven by wireless networks providing internet access in almost any imaginable locale. Unbeknownst to many, Big Data has silently crept into our daily routines and, with continued development of cheap data storage and availability of smart devices both regionally and in developing countries, the influence of Big Data will continue to grow. This influence has also carried over to healthcare. Over the last 10 years, volumes of pharmaceutical research, clinical trials data, and patient records have been compiled and examined in an effort to reduce costs while improving efficiency and advancing the practice of medicine.
This paper will provide an overview of Big Data, its benefits, potential pitfalls, and the projected impact on the future of medicine in general and cardiology in particular.

2 The origins of Big Data

The term Big Data was originally coined by NASA scientists in 1997 while attempting to describe the difficulty of displaying data sets too large to be stored in a computer's main memory, limiting analysis of the data set as a whole [2]. Although numerous definitions of Big Data abound, most present it as a dataset that is too large to easily manipulate and manage [3, 4]. Big Data also describes the activity of collecting, storing, analyzing, and repurposing large volumes of data [5]. Although the phraseology is relatively new, the concept of large-scale data collection and analysis is not. US Navy Lieutenant Matthew Fontaine Maury's analysis of thousands of ships' logs and charts led to his publication of the Wind and Current Chart of the North Atlantic [6] in 1852, guiding sailors to trade winds and ocean currents and drastically reducing the length of ocean voyages. Although Lt. Maury's analysis was a huge undertaking in the 1850s, the entirety of data he reviewed would amount to a modest-size spreadsheet easily analyzed on a modern home computer. At its most fundamental, the concept of Big Data is relative to the available resources at a given point in time. Since the accumulation of data is the driving force for innovation in storage techniques and analytics, the concept of Big Data will persist despite technologic advances designed to address these new challenges.

3 The 3 Vs of Big Data

Despite its variable definitions, Doug Laney's description of the "3 Vs"—volume, variety, and velocity—has been widely accepted as capturing the key data management challenges associated with Big Data [7] (Fig. 1).

Fig. 1 The 3 Vs of Big Data. Volume, variety, and velocity have been widely accepted as the key data management challenges associated with Big Data

3.1 Volume

The volume of new data being created annually is nearly unimaginable. Ninety percent of all data ever created was created in the past 2 years [8]. The amount of data in the world is projected to double every 2 years, leaving us with 50 times more data (44 zettabytes, or 44 trillion gigabytes) in 2020 than in 2011 [9]. This explosion is owing in part to the affordability of data storage: the average price of a gigabyte of storage fell from $437,500 in 1980 to $0.03 in 2015 [10]. As the mountains of accumulated data grow, the desire to analyze and convert it into business intelligence also grows.

3.2 Variety

Historically, the majority of electronic data has been structured and readily analyzed in spreadsheet or database formats. Today, data is much less congruent and can be stored in countless forms including written text, streaming video, and sensor-derived information. It is widely accepted that between 80 and 90 % of generated data is unstructured. This includes an estimated 150 exabytes (161 billion gigabytes) of healthcare data stored on disk in 2011 [11]. This ever-increasing variety of information necessitates innovative storage techniques and advanced tools and algorithms to analyze the flood of data.

3.3 Velocity

Data is being created, stored, and analyzed exponentially faster than at any period in the history of mankind.
YouTube (Google Inc., Mountain View, CA) boasts 110,000 unique video views per second [12] and 300 h of uploaded video content every minute [13]. Whereas data was traditionally stored and analyzed in nightly or weekly batches, it is now generated and accessed in real time, creating challenges for organizations interested in analyzing such content. Due to the velocity of data creation, an adequate Big Data solution must provide high-throughput solutions with low latency for analytics.

4 Anatomy of a Big Data solution

Big Data analysis of today's large datasets allows a data owner to look for a common thread that connects seemingly unrelated data points, identifying associations that would otherwise go unnoticed. Big Data does not explain the "why" or "how" of these associations; however, it does alert the investigator, potentially sparking further analysis or prospective studies to answer questions previously unasked. The goal of the Big Data movement is to unlock the value of large datasets in an effort to improve decision making, efficiency, outcomes, and time to deliverables for the data owner. Accomplishing these goals requires an infrastructure that can collect, store, access, analyze, and manage data in various forms, turning volumes of simple data points into high-value information capable of providing intelligence to drive change and improve efficiency (Fig. 2).

Fig. 2 Anatomy of a Big Data solution. Infrastructure is designed to accept data in various formats, allowing real-time storage and analysis. Data is transformed, repackaged, and presented back to the data owner in various formats for consumption. ETL extract, transform, load

4.1 Data capture

With recent advances in technology, data can be collected from almost any imaginable venue. Large-volume, structured transactional data such as internet search queries, purchase histories, or mailing lists are easily captured and fed into relational databases. The collection of unstructured data found in plain text, images, and streaming video can be more challenging. Emerging sources of data ready for capture include mobile applications, social networks, and internet-connected sensors such as wearable devices and RFID. Access to, and the ability to effectively acquire, data is the single most important variable in the Big Data equation. Without data there is no Big Data movement.

4.2 Data storage and analysis

After data is ingested, it must be stored, organized, and refined prior to analysis. Software frameworks such as Apache Hadoop (Apache Software Foundation, Forest Hill, MD) have been developed specifically to accomplish this complex charge.
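As a rough illustration of the divide-and-recombine pattern such frameworks implement, the following minimal sketch uses Python's built-in multiprocessing module as a stand-in for a true distributed cluster; the record format and diagnosis codes are hypothetical:

```python
# Toy illustration of the map/reduce pattern popularized by frameworks such
# as Hadoop: split a dataset, process the pieces in parallel, merge results.
from collections import Counter
from multiprocessing import Pool

# Pretend this is a large file of one-line clinical records (hypothetical).
records = [
    "I48.91 atrial fibrillation",
    "I50.9 heart failure",
    "I48.91 atrial fibrillation",
    "I25.10 coronary artery disease",
] * 1000

def map_count(chunk):
    """Map step: count diagnosis codes within one chunk of records."""
    return Counter(line.split()[0] for line in chunk)

def chunks(data, n):
    """Split the dataset into n roughly equal pieces."""
    size = len(data) // n
    return [data[i * size:(i + 1) * size] for i in range(n - 1)] + [data[(n - 1) * size:]]

if __name__ == "__main__":
    with Pool(4) as pool:                      # four parallel workers
        partials = pool.map(map_count, chunks(records, 4))
    totals = sum(partials, Counter())          # reduce step: merge partial counts
    print(totals.most_common(3))
```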
Volumes of information are warehoused, divided, parceled, and manipulated using shared network resources in parallel, allowing data to be processed faster than would be accomplished by a traditional supercomputer. Differing data types require variable amounts of processor cycles to accomplish the requisite task at hand. Unstructured data is resource-expensive, while structured data is more easily organized and processed. The ultimate objective of the Big Data schema is to identify key information that can be readily utilized to meet the goals of the data owner. Once identified, this information can then be presented or parceled in various forms.

4.3 Data reporting

Upon completion of analysis, data is reorganized and repackaged for presentation or warehousing. This information may be utilized to drive real-time change in the data owner's business model by providing:

- Monitoring and improvement of performance (business intelligence)
- Delivery of new insight and capabilities (informatics)
- Delivery of new tools and products (data mining)

The ability to collect, analyze, and report information in real time allows data owners to adapt to changing business environments, identify inefficiencies, and disseminate time-sensitive information to key stakeholders (Fig. 3).

Fig. 3 The three dimensions of an analytics action space. The intersection of business intelligence, informatics, and data mining is where the strength of Big Data analytics is most apparent

4.4 Cloud computing

A discussion about Big Data would be incomplete without acknowledgement of cloud computing. One of the major technologic advances leading to the Big Data movement is the ability to store enormous volumes of data in real time in large-scale, internet-connected repositories. When these data repositories are packaged with platform, infrastructure, or software services, they are described as cloud computing or The Cloud. Cloud computing allows the customer to avoid the often expensive infrastructure investment in hardware and software, instead allowing resources to be shared amongst a cooperative, allowing even the smallest of operations to benefit from economies of scale. Cloud resources can be reallocated in real time, allowing maximal utilization and resulting in shared overhead and lower operational costs. Fee models are frequently "pay as you go", allowing businesses to grow without worry of sizable upfront investments in technology.
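As a small illustration of this model, the sketch below uses the AWS boto3 SDK to move a file in and out of commodity cloud storage; the bucket and file names are hypothetical placeholders, and AWS credentials are assumed to already be configured:

```python
# Minimal sketch of "pay as you go" cloud storage with the AWS boto3 SDK.
# Storage is billed per gigabyte-month instead of requiring upfront hardware.
import boto3

s3 = boto3.client("s3")

# Upload a local export of de-identified data to a (hypothetical) bucket.
s3.upload_file("telemetry_export.csv", "example-research-bucket",
               "incoming/telemetry_export.csv")

# Retrieve it later from any internet-connected analysis environment.
s3.download_file("example-research-bucket",
                 "incoming/telemetry_export.csv", "local_copy.csv")
```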
5 Big Data in healthcare: a paradigm shift awaits

The effects of the Big Data revolution can already be felt in the medical field, and further shifting of the healthcare paradigm is anticipated. Healthcare data is abundant; however, one of the biggest obstacles to meaningful large-scale analysis is the plurality of stakeholders. Patient-specific information is often housed on the servers of individual healthcare providers, laboratories, hospitals, or insurance providers. Without stakeholder collaboration the data is effectively siloed, resulting in resource underutilization, redundancy, and inefficiency, ultimately contributing to the growing cost of healthcare which, in 2013, totaled $2.9 trillion or $9255 per person, equal to 17.4 % of the US Gross Domestic Product [14]. In response to these ever increasing costs, many insurance payers have changed from a fee-for-service model to risk-sharing arrangements that prioritize patient outcomes. Facing the reality of decreasing reimbursements, many healthcare organizations have embraced Big Data analytics to become more efficient. Accountable care organizations have leveraged Big Data not only to measure successes but to identify wasteful practices, leading to lower overheads. Large corporations and startups alike recognize the potential windfall to the winners of the Big Data race, spurring innovation of application development not only in traditional data management and storage platforms but also in emerging fields such as artificial intelligence and predictive analytics [15, 16].

6 Squeezing the last drop from electronic health records

Healthcare data abounds in various forms; however, it is most conventionally found in the electronic health record (EHR). The typical EHR includes structured data such as patient demographics, ICD-9 diagnosis codes, laboratory data, and vital signs. Unfortunately, structured data account for only one fifth of available healthcare information; the bulk of data is sequestered in unstructured physician notes and imaging studies. More recently, the Centers for Medicare & Medicaid Services has implemented policies to incentivize the transition to, and meaningful use of, EHR data [17] with the goal of increasing the overall percentage of structured data in health records. However, as the amount of total accessible healthcare information continues to grow in various forms, the efforts of the Centers for Medicare & Medicaid Services are unlikely to have a substantial impact on data format and organization. Advanced analytics are increasingly used to bridge this gap by interrogating unstructured data and revealing clinical keys that would otherwise be unrecognized. Furthermore, these key data are used to identify practice patterns that may enhance overall value and delivery of care.
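As a toy illustration of this kind of unstructured-data interrogation, the sketch below uses a regular expression to pull an ejection fraction out of free-text note language; production clinical natural language processing is far more sophisticated, and the note text and pattern are illustrative assumptions:

```python
# Minimal sketch: turn unstructured note text into a structured data point,
# here extracting a left ventricular ejection fraction from free text.
import re

note = ("Echo today. LV systolic function mildly reduced, "
        "EF 40-45%. Continue guideline-directed therapy.")

# Match "EF" (or "ejection fraction") followed by a number or a range.
pattern = re.compile(
    r"(?:EF|ejection fraction)\s*(?:of\s*)?(\d{1,2})(?:\s*-\s*(\d{1,2}))?\s*%",
    re.IGNORECASE)

match = pattern.search(note)
if match:
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    print({"ef_low": low, "ef_high": high})   # {'ef_low': 40, 'ef_high': 45}
```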
7 Insights from registry data: tomorrow's results today

As healthcare reimbursement becomes dependent on quality and outcome metrics, registries to record and monitor disease-specific data have been increasingly utilized. Sweden, a country that started its first healthcare registry in 1975, now boasts 103 national registries, which have led to such findings as the association of smoking and rheumatoid arthritis [18, 19]. More recently, this data has been leveraged to improve quality metrics and reduce Sweden's growth in healthcare spending by up to 4.7 % per year. In the USA, individual medical societies have taken up the torch of registry creation and management. In 1997, the American College of Cardiology created the National Cardiovascular Data Registry (NCDR) in an effort to formalize data collection and reporting of diagnostic catheterization and/or PCI [20]. Since then, the NCDR has grown to include eight current and two future registries that contain more than 15,000,000 unique patient records spanning the gamut of cardiovascular care, including coronary intervention, pulmonary vein ablation, implantation of various devices, and percutaneous valve replacement. Recently, the ACC has expanded the scope of its registries to the outpatient setting with the creation of two unique registries: PINNACLE and the Diabetes Collaborative Registry [21]. PINNACLE is cardiology's largest outpatient quality improvement registry, tracking data from more than 2500 physicians on coronary artery disease, hypertension, heart failure, and atrial fibrillation, and boasts an additional 15,000,000 patient records. Data from the PINNACLE registry is qualified as "meaningful use" and is automatically reported to the Physician Quality Reporting System. The soon-to-be-opened Diabetes Collaborative Registry aims to connect primary care and specialty physicians with the common goal of improving patient care and treatment of diabetes mellitus. All NCDR databases provide participating practices with detailed outcomes reports, highlighting adherence to guideline-based care. Compiled quarterly, these risk-adjusted reports allow for institution-to-institution comparison of performance and quality metrics. Physicians participating in the CathPCI registry also have access to the Physician Dashboard, which reports 40 process and quality metrics to reinforce guideline-based behaviors or encourage practice change to "get in line" with peers. Continued expansion and exploitation of registry data will drive substantial change in the future of healthcare delivery, providing the needed feedback to develop a more holistic, outcome-based approach to patient care.

8 Research in real-time

Imagine a patient you once saw in your practice who was affected by a rare condition or unique set of comorbidities. Perhaps you wondered if there were similar patients in your medical system, hoping to gain insight into their disease progression or therapeutic outcomes. Big Data analytics carries the potential to provide answers to these queries almost instantaneously, leading to an increased knowledge base and potentially encouraging collaboration and access to lessons learned. Highly specified queries can pinpoint de-identified patients meeting inclusion criteria for randomized controlled trials during the assessment of feasibility of study design. These queries would also kick-start recruitment once institutional review board approval is obtained. Indeed, access to registry data has vastly improved research productivity. More than 360 articles have been published using NCDR data alone, with 56 of these original manuscripts making print in 2014 [22]. In 2003, computing advances enabled the Human Genome Project to finish DNA sequencing 2 years ahead of schedule. Riding the momentum of this unprecedented success, the field of human genomics research has exploded, buoyed by the prospect of offering personalized medicine to the masses. Amazingly, the cost of sequencing one human genome has fallen from $100 million in 2001 to roughly $5000 in 2015 by using previously sequenced genomes as a roadmap for subsequent genetic sequencing [23]. With more and more genomic data available, personalized medicine has moved to the forefront of boardroom agendas. Physicians salivate over the potential to offer patients targeted therapies with fewer side effects and greater success rates. Insurance providers view individualized medicine as a way to improve margins by identifying and treating disease earlier, lowering cost in the long term. Furthermore, President Obama validated the importance of Big Data in genomics when he announced the Precision Medicine Initiative at the 2015 State of the Union address, allocating $215 million for the development of multiple shared databases in an effort to spur interdisciplinary collaboration, with a particular focus on genomics applications in cancer [24]. It is anticipated that large capital investments from governmental and private entities will result in increased knowledge and application of genomics and, hence, act as a positive feedback loop driving further investment, research, and application of this promising field [25].

9 Emerging markets for data in medicine and cardiology

Data collection from the EHR is just the tip of the iceberg; wearable devices such as the Fitbit Surge (Fitbit, San Francisco, CA) and Apple Watch (Apple, Cupertino, CA) are becoming increasingly popular and have the potential to collect and distribute vast amounts of data to both an individual healthcare provider as well as a healthcare network. Other relevant data may commonly be collected from internet usage, social media, and GPS location or, less commonly, from the use of telemedicine and genetic sequencing.
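A minimal sketch of what such collection might look like on the receiving end, assuming a hypothetical JSON payload format (vendor APIs differ):

```python
# Ingest a batch of wearable heart-rate samples and flag values for review.
# The payload structure and the 100 bpm threshold are illustrative assumptions.
import json

payload = json.loads("""
[{"ts": "2016-01-27T09:00:00", "hr": 72},
 {"ts": "2016-01-27T09:01:00", "hr": 118},
 {"ts": "2016-01-27T09:02:00", "hr": 64}]
""")

THRESHOLD = 100  # resting-tachycardia cutoff, for demonstration only
for sample in payload:
    if sample["hr"] > THRESHOLD:
        print(f"Review: HR {sample['hr']} bpm at {sample['ts']}")
```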
Big Data analytics are being leveraged not only for the monitoring of individuals but for population studies as well. The University of California, San Francisco's ambitious Health eHeart study aims to identify predictive patterns for heart disease, identify causes of atrial fibrillation, reduce heart failure hospitalizations, and determine the effects of social media on heart health by analyzing up to 1 million participants over 10 years. The study will use Big Data analytics to answer these questions via real-time metrics acquired through patient-worn sensors, mobile applications, social media, and a dedicated web portal [26]. Data in the form of web queries has been leveraged by internet search providers to report public health trends. In 2008, Google Labs famously started the Google Flu Trends web service to predict influenza activity based on internet search terms such as "flu", "fever", and "cough" [27]. This service was intended to be comparable to, yet more nimble than, influenza activity reports from the Centers for Disease Control and Prevention (CDC). Although Google Flu Trends was initially reported to be highly accurate (97 %) [28], further analysis suggested that the algorithm consistently overestimated flu prevalence, particularly in 2012–2013 [29]. Google has since abandoned the service but does provide raw data to public health researchers interested in similar endeavors. Social media data streams such as Twitter are now being used in Big Data analysis to track public health issues like foodborne illness [30]. When combined with complementary data such as weather and air quality reports, tweets containing descriptors commonly associated with asthma exacerbations were noted to be predictive of emergency department visits for asthma attacks [31]. Future iterations of algorithms such as these may allow accurate forecasting of illness and other public health events [32]. The availability of healthcare-specific wearable sensors is expanding as well. One example is the BodyGuardian Remote Monitoring System (Preventice, Rochester, MN), a discreet body-worn cardiac monitoring technology that allows physicians to monitor telemetry data in near real time. The information is delivered to a cloud-based health platform that is accessible to physicians, allowing them to monitor, change event thresholds, and switch the device to one of three monitoring types—mobile cardiac telemetry, event monitoring, and Holter monitoring. It is FDA-cleared for the monitoring of non-lethal arrhythmias in ambulatory patients [33]. Although randomized controlled trials have long been the standard criterion for causality in clinical research, the use of simple heuristics for risk stratification and diagnostic rule-out has gained traction in medical decision making due to their ease of use and surprising accuracy [34]. Large observational data sets are often used to identify association but do not perform well when adjudicating causation. Prediction models, however, only require high goodness of fit, which is often achievable by analyzing large amounts of retrospective data and identifying variables that increase statistical risk. Models are derived using these variables and are then validated with a separate cohort. Such models already exist for common entities like chest pain and stroke as well as less prevalent conditions such as pulmonary arterial hypertension and mitral stenosis [35–39].
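A minimal sketch of this derive-then-validate workflow, using scikit-learn logistic regression; synthetic data stands in for EHR-extracted variables, and the variable count and cohort sizes are illustrative assumptions:

```python
# Fit a risk model on a derivation cohort, then check discrimination on a
# held-out validation cohort (the "separate cohort" described in the text).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 6))                    # e.g., age, BP, labs (standardized)
logit = 0.8 * X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2]
y = rng.random(n) < 1 / (1 + np.exp(-logit))   # simulated binary outcome

# Split retrospective data into derivation and validation cohorts.
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_dev, y_dev)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"Validation C-statistic: {auc:.2f}")    # goodness of fit on unseen patients
```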
Paramount to wide clinical acceptance and application is the ease of use of such models. A typical prediction tool easily implemented in daily practice limits data input to fewer than eight variables. Robust tools such as the SYNTAX score for coronary artery disease complexity have been criticized for being cumbersome, leading physicians to rely on gestalt rather than analytics [40]. Implementation of data mining and predictive analytics may obviate this issue, allowing the development of models that may have dozens to hundreds of variables extracted directly from the EHR and displayed directly to the physician, circumventing additional work for the physician [41]. Models developed from a system-wide cohort could be high-powered and rich in detail, allowing the identification of relationships that would otherwise seem unintuitive. In a collaborative healthcare system with information sharing across hospitals, these models could be cross-validated through a unique but similar cohort. In addition to prediction tools, machine-learning algorithms have shown potential to aid in the early diagnosis of myocardial infarction [42]. Although not yet ready for prime time, artificial neural networks may offer assistance to diagnosticians of the future.

10 Electrophysiology and Big Data: a unique opportunity

More so than any other medical discipline, electrophysiology is uniquely positioned to be an early utilizer of Big Data analytics. The current generation of implantable electronic devices is capable of self-interrogation, rhythm assessment and monitoring, and other novel services such as thoracic impedance monitoring. When combined with remote monitoring, these devices provide the capability to garner near limitless amounts of data for the clinician to utilize in a multitude of ways. Remote monitoring of ICDs has been noted to reduce the incidence of inappropriate shocks, resulting in improved quality of life [43], and boasts 95 % sensitivity for detection of true atrial fibrillation episodes, with as many as 90 % of identified episodes being asymptomatic [44]. Given the clinical implications of these findings, the Heart Rhythm Society currently recommends remote monitoring of all patients with cardiac implantable electronic devices [45]. Big Data analytics and remote monitoring were successfully paired in the ALTITUDE (185,778 patients) and MERLIN (269,471 consecutive patients) studies [46, 47]. These mega-cohort studies suggested that patients with remote monitoring strategies had a significant survival benefit compared to patients who were not remotely monitored. Further analysis of the available data has provided insight into the interaction between atrial fibrillation and CRT-D function [48, 49]. Due to the passive nature of remote monitoring, it is easy to envision future studies of a similar nature with millions of participants worldwide providing real-time data from their implantable devices. Population-based analysis of these devices would likely lead to improvements in future device design and battery performance while simultaneously alerting manufacturers to device malfunction and failure, leading to timely advisory notifications. This utilization of remote monitoring and Big Data analytics would presumably result in better outcomes for the patient.
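As a conceptual illustration of the atrial fibrillation detection discussed above (not a validated device algorithm), the sketch below flags a window of remotely transmitted RR intervals as possibly fibrillatory using a simple irregularity threshold; the coefficient-of-variation cutoff is an assumption for demonstration only:

```python
# Screen a window of RR intervals (ms) for the irregularly irregular rhythm
# characteristic of atrial fibrillation, using RR variability as a proxy.
import statistics

def possible_af(rr_intervals_ms, cv_threshold=0.12):
    """Flag a window whose RR-interval coefficient of variation is high."""
    mean_rr = statistics.mean(rr_intervals_ms)
    cv = statistics.stdev(rr_intervals_ms) / mean_rr
    return cv > cv_threshold

sinus = [800, 810, 795, 805, 800, 798]          # regular rhythm
irregular = [620, 940, 710, 1050, 580, 860]     # irregularly irregular

print(possible_af(sinus))       # False
print(possible_af(irregular))   # True -> queue episode for clinician review
```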
11 Concerns for Big Data: big but not perfect

The role of Big Data in the future of healthcare will continue to expand as access to more information about individual patients and their activities becomes readily available. Unfortunately, major drawbacks to the reliance on large-scale datasets to guide decision making in healthcare have been well described. As with all emerging technologies, growing pains are to be expected; however, given the potentially sensitive nature of the information being stored and analyzed, Big Data in healthcare poses a unique challenge.

11.1 Data security

As the medical community recognizes the value of large volumes of patient data to drive innovation, others find value for more nefarious reasons. Despite the protections afforded through the Health Insurance Portability and Accountability Act of 1996, security breaches of large magnitude have become commonplace in the past several years. Those affected by such breaches include the insurance giant Anthem (80 million records at risk) [50], the UCLA Health System (4.5 million records at risk) [51], and Healthcare.gov (test server, no records at risk) [52]. Oftentimes server-side data is neither de-identified nor encrypted and includes demographic information and social security numbers, creating a target for cybercriminals. Despite efforts to de-identify sensitive medical information for wide dissemination, the threat of data re-identification exists and has been demonstrated [53], although the likelihood of successful re-identification of an individual record may be less than 0.01 % [54].

11.2 Data integrity

Robust datasets accumulated in registry or EHR form are subject to scrutiny due to concerns about data validity and loss of fidelity in the codification process. Although registries are designed to reflect real-world practice in an effort to drive diagnostic and therapeutic advancement, institutional participation is voluntary, leading to the possibility of representation bias [55]. Oftentimes these registries are incomplete [56] and/or lack validation, with only 18 % of registries indicating that they audit their data routinely [57]. Clearly, the ability of registry analytics to advance science and improve care is at stake if quality control is not enforced [58]. In particular, the use of traditional statistical analyses may result in type 2 error if incomplete or invalid data is used for modeling. Conversely, one of the major advantages of Big Data analytics is its ability to amplify signal and reduce noise by drowning out erroneous data, mitigating the impact of inaccurate or non-normalized data, and helping to identify the meaningful relationships that researchers seek. As the available registry data explodes over the next decade, Big Data analysis will have an ever-expanding role, helping to mute the impact of incomplete or imprecise datasets through large-volume analysis of complementary information. This, of course, assumes data errors are not systematic or widespread. Fortunately, leading organizations such as the National Heart, Lung, and Blood Institute have made recommendations regarding the collection and management of data in an effort to ensure the integrity and validity of scientific assumptions based on these data [59]. Ultimately, individual providers and health systems will be responsible for continued stewardship of the medical record, as large-scale data validation at a population level would be a monumental undertaking.
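A minimal sketch of the kind of routine audit the text argues registries need, checking field completeness and physiologic plausibility with pandas; the field names and limits are illustrative assumptions:

```python
# Audit a small registry extract for missingness and implausible values
# before the records are used for modeling.
import pandas as pd

registry = pd.DataFrame({
    "age":         [67, 54, None, 81, 249],    # 249 is a deliberate entry error
    "systolic_bp": [132, None, 118, 90, 145],
    "ef_percent":  [55, 40, 35, None, 60],
})

# Completeness: share of non-missing values per field.
print(registry.notna().mean())

# Plausibility: flag ages outside a 0-120 year range for manual review.
plausible = registry["age"].between(0, 120)
bad_ages = registry.loc[~plausible & registry["age"].notna(), "age"]
print("Implausible ages:", bad_ages.tolist())
```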
11.3 Patient and physician concerns

Shared decision making between patient and provider will be paramount for successful implementation of new predictive tools and treatment strategies based on analytics. A successful approach to patient care must always afford flexibility to the provider, as clinical tools cannot capture non-clinical variables, such as patient preferences, that impact decision making. Prior to initiation of a treatment such as anticoagulation for the prevention of stroke in atrial fibrillation or statin therapy for primary prevention of coronary artery disease, an open discussion regarding the risks, benefits, and applicability of the studied cohort should be held with the patient. Full disclosure of this information improves patient satisfaction, increases the likelihood of medication compliance, and may lead to improved quality outcome metrics [60].

12 Conclusion

The Big Data revolution in healthcare is well underway, driven by exponential growth in available data as collected in EHRs, registries, or wearable sensors. This data will be collected, stored, and analyzed with the hope of unlocking secrets leading to improved quality of life and cure of disease, all while reducing waste in healthcare. The continued success of this movement is dependent on sustained technological advancements in the fields of information technology and computer architecture as well as seamless collaboration and open exchange of data between physicians, insurance payers, private industry, and government. Despite the very real challenges posed by its implementation, the possibilities of Big Data application are nearly limitless and cannot be ignored.

Compliance with ethical standards

Financial support None.

Conflict of interest The authors declare that they have no competing interests.

References

1. Hilbert, M., & Lopez, P. (2011). The world's technological capacity to store, communicate, and compute information. Science, 332(6025), 60–65.
2. Cox, M., & Ellsworth, D. (1997). Application-controlled demand paging for out-of-core visualization. Proceedings of the 8th Conference on Visualization '97, 235-ff.
3. Oxford English Dictionary. http://www.oed.com/view/Entry/18833#eid301162177. Accessed 27 Sep 2015.
4. Press, G. (2015). 12 Big Data definitions: What's yours? Forbes. http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/. Accessed 27 Sep 2015.
5. Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. Boston: Eamon Dolan/Houghton Mifflin Harcourt.
6. Maury's wind and current chart, 3rd edition, 1852. http://collections.lib.uwm.edu/cdm/ref/collection/agdm/id/1717. Accessed 27 Sep 2015.
7. Laney, D. (2001). 3D data management: Controlling data volume, velocity, and variety. META Group. http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf. Accessed 27 Sep 2015.
8. Bringing big data to the enterprise. IBM. http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html. Accessed 27 Sep 2015.
9. The digital universe of opportunities: Rich data and the increasing value of the internet of things. EMC Digital Universe with Research & Analysis by IDC. (2014). http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm. Accessed 27 Sep 2015.
10. Amazon S3 pricing. https://aws.amazon.com/s3/pricing/. Accessed 27 Sep 2015.
11. Hughes, G. (2011). How big is 'big data' in healthcare? SAS Blogs.
http://blogs.sas.com/content/hls/2011/10/21/how-big-is-big-data-in-healthcare/. Accessed 27 Sep 2015.
12. Internet live stats. http://www.internetlivestats.com/one-second/#youtube-band. Accessed 27 Sep 2015.
13. Statistics. YouTube. (2015). https://www.youtube.com/yt/press/statistics.html. Accessed 27 Sep 2015.
14. Hartman, M., et al. (2015). National health spending in 2013: Growth slows, remains in step with the overall economy. Health Affairs, 34(1), 150–160.
15. Baum, S. (2015). 4 ways healthcare is putting artificial intelligence, machine learning to use. MedCity News. http://medcitynews.com/2015/02/4-ways-healthcare-putting-artificial-intelligence-machine-learning-use/. Accessed 27 Sep 2015.
16. Winters-Miner, L. (2014). Seven ways predictive analytics can improve healthcare. Elsevier. http://www.elsevier.com/connect/seven-ways-predictive-analytics-can-improve-healthcare. Accessed 27 Sep 2015.
17. EMR Incentive Programs. CMS.gov. https://www.cms.gov/Regulations-and-Guidance/Legislation/EHRIncentivePrograms/index.html. Accessed 27 Sep 2015.
18. Emilsson, L., et al. (2015). Review of 103 Swedish healthcare quality registries. Journal of Internal Medicine, 277(1), 94–136.
19. Webster, P. C. (2014). Sweden's health data goldmine. CMAJ, 186(9), E310.
20. Weintraub, W. S. (1998). Development of the American College of Cardiology National Cardiovascular Data Registry. The Journal of Invasive Cardiology, 10(8), 489–491.
21. Oetgen, W. J., Mullen, J. B., & Mirro, M. J. (2011). Cardiologists, the PINNACLE registry, and the "meaningful use" of electronic health records. Journal of the American College of Cardiology, 57(14), 1560–1563.
22. Published manuscripts based on NCDR registries. National Cardiovascular Data Registry, American College of Cardiology. (2015). http://cvquality.acc.org/~/media/QII/NCDR/Published%20Research%20Page/Aug%202015%20NCDR%20Published%20Manuscripts%20by%20Registry.ashx. Accessed 27 Sep 2015.
23. Wetterstrand, K. DNA sequencing costs: Data from the NHGRI Genome Sequencing Program. http://www.genome.gov/sequencingcosts/. Accessed 27 Sep 2015.
24. FACT SHEET: President Obama's Precision Medicine Initiative. https://www.whitehouse.gov/the-press-office/2015/01/30/fact-sheet-president-obama-s-precision-medicine-initiative. Accessed 27 Sep 2015.
25. Chawla, N. V., & Davis, D. A. (2013). Bringing big data to personalized healthcare: A patient-centered framework. Journal of General Internal Medicine, 28(Suppl 3), S660–S665.
26. Health eHeart Study. University of California, San Francisco. https://www.health-eheartstudy.org/. Accessed 6 Oct 2015.
27. Google flu trends. http://www.google.org/flutrends/about/. Accessed 26 Dec 2015.
28. Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., & Brilliant, L. Detecting influenza epidemics using search engine query data. http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/papers/detecting-influenza-epidemics.pdf. Accessed 26 Dec 2015.
29. Lazer, D., et al. (2014). Big data. The parable of Google Flu: Traps in big data analysis. Science, 343(6176), 1203–1205.
30. Kuehn, B. M. (2014). Agencies use social media to track foodborne illness. JAMA, 312(2), 117–118.
31. Ram, S., et al. (2015). Predicting asthma-related emergency department visits using big data. IEEE Journal of Biomedical and Health Informatics, 19(4), 1216–1223.
32. Kuehn, B. M. (2015). Twitter streams fuel Big Data approaches to health forecasting. JAMA, 314(19), 2010–2012.
33. BodyGuardian system. Preventice Medical Systems.
http://www.preventice.com/index.html. Accessed 6 Oct 2015.
34. Marewski, J. N., & Gigerenzer, G. (2012). Heuristic decision making in medicine. Dialogues in Clinical Neuroscience, 14(1), 77–89.
35. Abascal, V. M., et al. (1988). Echocardiographic evaluation of mitral valve structure and function in patients followed for at least 6 months after percutaneous balloon mitral valvuloplasty. Journal of the American College of Cardiology, 12(3), 606–615.
36. Benza, R. L., et al. (2012). The REVEAL registry risk score calculator in patients newly diagnosed with pulmonary arterial hypertension. Chest, 141(2), 354–362.
37. Conway Morris, A., et al. (2006). TIMI risk score accurately risk stratifies patients with undifferentiated chest pain presenting to an emergency department. Heart, 92(9), 1333–1334.
38. Lip, G. Y., et al. (2010). Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: The Euro Heart Survey on atrial fibrillation. Chest, 137(2), 263–272.
39. Wilkins, G. T., et al. (1988). Percutaneous balloon dilatation of the mitral valve: An analysis of echocardiographic variables related to outcome and the mechanism of dilatation. British Heart Journal, 60(4), 299–308.
40. Serruys, P. W., et al. (2009). Percutaneous coronary intervention versus coronary-artery bypass grafting for severe coronary artery disease. The New England Journal of Medicine, 360(10), 961–972.
41. Janke, A. T., et al. (2015). Exploring the potential of predictive analytics and Big Data in emergency care. Annals of Emergency Medicine. doi:10.1016/j.annemergmed.2015.06.024.
42. Baxt, W. G. (1992). Analysis of the clinical variables driving decision in an artificial neural network trained to identify the presence of myocardial infarction. Annals of Emergency Medicine, 21(12), 1439–1444.
43. Hindricks, G., et al. (2014). Quarterly vs. yearly clinical follow-up of remotely monitored recipients of prophylactic implantable cardioverter-defibrillators: Results of the REFORM trial. European Heart Journal, 35(2), 98–105.
44. Ricci, R. P., et al. (2013). Effectiveness of remote monitoring of CIEDs in detection and treatment of clinical and device-related cardiovascular events in daily practice: The HomeGuide Registry. Europace, 15(7), 970–977.
45. Slotwiner, D., et al. (2015). HRS expert consensus statement on remote interrogation and monitoring for cardiovascular implantable electronic devices. Heart Rhythm, 12(7), e69–e100.
46. Saxon, L. A., et al. (2010). Long-term outcome after ICD and CRT implantation and influence of remote device follow-up: The ALTITUDE survival study. Circulation, 122(23), 2359–2367.
47. Varma, N., et al. (2015). The relationship between level of adherence to automatic wireless remote monitoring and survival in pacemaker and defibrillator patients. Journal of the American College of Cardiology, 65(24), 2601–2610.
48. Hayes, D. L., et al. (2011). Cardiac resynchronization therapy and the relationship of percent biventricular pacing to symptoms and survival. Heart Rhythm, 8(9), 1469–1475.
49. Gilliam, F. R., et al. (2011). Real world evaluation of dual-zone ICD and CRT-D programming compared to single-zone programming: The ALTITUDE REDUCES study. Journal of Cardiovascular Electrophysiology, 22(9), 1023–1029.
50. Health insurer Anthem struck by massive data breach. Forbes. (2015). http://www.forbes.com/sites/gregorymcneal/2015/02/04/massive-data-breach-at-health-insurer-anthem-reveals-social-security-numbers-and-more/. Accessed 27 Sep 2015.
51. UCLA Health System data breach affects 4.5 million patients. Los Angeles Times. (2015). http://www.latimes.com/business/la-fi-ucla-medical-data-20150717-story.html. Accessed 27 Sep 2015.
52. Hacker breached HealthCare.gov insurance site. (2014). The Wall Street Journal. http://www.wsj.com/articles/hacker-breached-healthcare-gov-insurance-site-1409861043. Accessed 27 Sep 2015.
53. Ohm, P. (2009). Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review, 57, 1701 (2010); U of Colorado Law Legal Studies Research Paper No. 9–12. Available at SSRN: http://ssrn.com/abstract=1450006.
54. Benitez, K., & Malin, B. (2010). Evaluating re-identification risks with respect to the HIPAA privacy rule. Journal of the American Medical Informatics Association, 17(2), 169–177.
55. Xian, Y., Hammill, B. G., & Curtis, L. H. (2013). Data sources for heart failure comparative effectiveness research. Heart Failure Clinics, 9(1), 1–13.
56. Dunlay, S. M., et al. (2008). Medical records and quality of care in acute coronary syndromes: Results from CRUSADE. Archives of Internal Medicine, 168(15), 1692–1698.
57. Lyu, H., et al. (2015). Prevalence and data transparency of national clinical registries in the United States. Journal for Healthcare Quality.
58. Roger, V. L. (2015). Of the importance of motherhood and apple pie. Circulation: Cardiovascular Quality and Outcomes, 8(4), 329–331.
59. Roger, V. L., et al. (2015). Strategic transformation of population studies: Recommendations of the working group on epidemiology and population sciences from the National Heart, Lung, and Blood Advisory Council and Board of External Experts. American Journal of Epidemiology, 181(6), 363–368.
60. Brown, M. T., & Bussell, J. K. (2011). Medication adherence: WHO cares? Mayo Clinic Proceedings, 86(4), 304–314.