De-Identified Health Data Have Characteristics of Quasi-Public Goods

By: Douglas S. McNair, M.D., Ph.D., President, Cerner Math, Inc.

Introduction

A variety of collaborative databases containing de-identified, interoperable health information from many millions of individuals exist today, and some of them have been curated—several by private-sector commercial sponsors and others by non-profit, university-based, or public-sector organizations—for more than 15 years. The number of such large-scale data warehouses is now growing rapidly [Botsis 2010; Boyd 2007; Dhir 2008; El Fadly 2010; Elkin 2010; Goodby 2010; Guadamuz 2006; Kheterpal 2011; Liu 2009; Powell 2005; Prokosch 2009; Weiner 2007]. In the U.S., that growth has been accelerated by the American Recovery and Reinvestment Act of 2009 (ARRA), which provides assistance for state-wide Health Information Exchanges (HIEs)—organizations that serve as focal points for interoperability or that may themselves construct and curate large-scale, multi-contributor de-identified data warehouses.

To date, most attention has been directed to the privacy and confidentiality-protection aspects of such multi-contributor databases. This essay, however, draws attention to several new ideas in ethics and philosophy that I believe have an important contribution to make to policy-making concerning data warehouses derived from electronic health records (EHRs), genomics, or phenotypic or other personal information. The ideas have primarily to do with understanding the social and moral nature of these assets, and with what it means for individuals to "opt out"—to decline to consent to allow their information to be de-identified, or to decline to permit secondary uses of de-identified information about their health and health care for public health, research, or other valuable and beneficial purposes, regardless of whether compensation is provided for participating.

In important respects, de-identified EHR-derived data warehouses resemble a 'quasi-public good', a kind of resource or asset in which many individuals and organizations play a part and hold certain rights and privileges. Such resources share qualities with other, more familiar types of public goods like clean air and clean water [Barrett 2010; Geuss 2003; Kaul 2003; Minow 2003]. Public goods can be impaired by some ways of exercising private, individual liberty. For example, low-cost or free immunizations and the prevention of epidemics are a public good. But immunizations are only effective if a large majority of the susceptible population is vaccinated and develops sufficient immunity to ward off infections. That is the rationale underlying compulsory immunization policies for school-age children attending public schools. If enough individuals decline to be vaccinated, then the good of everyone will be diminished, including those who did get vaccinated.

Other quasi-public goods whose direct value to an individual is 'contingent' include competent and safe health services. If one never gets sick, then the contingency under which the existence and accessibility of high-quality health services would have direct value to the individual never materializes. Law enforcement, fire departments, and FEMA-type disaster-recovery services are likewise public goods—ones whose direct value is conveyed to an individual on a mostly contingent basis, when a situation or event for which the services are pertinent arises for the individual.
Single-payer health care systems have some of these same properties.

De-Identified Health Data Are a Raw-Materials Asset, Enabling Discovery of New Assets and Inventions That Are Not 'Derivative Works'

De-identified data are nothing more nor less than a kind of raw material or resource. They are extracted from the source information systems of participating organizations; cleaned and, if necessary, mapped to a standardized ontology or nomenclature scheme; and checked to confirm that they are complete, accurate, and free of duplicated items. They are then de-identified, transferred, stored, and curated, generally for months or years after their collection, in a physically secure computer system that is entirely separate from the original source systems—often in premises known only to a few individuals, located many hundreds of kilometers away from the source systems, in facilities to which no one who might know of a particular patient's comings and goings to a source health care institution on a particular date could plausibly gain access. In other words, de-identified data warehouses tend to be highly safe and secure. The chance that de-identified information could be re-identified and compromise someone's privacy is vanishingly small.

De-identified data are a resource with many valuable potential secondary uses. Some of these uses produce scientific discoveries that constitute public goods: observational and translational research that can improve the health of current and future populations; comparative-effectiveness studies; continuous safety, quality-improvement, surveillance, and benchmarking processes; and epidemiologic research, particularly on conditions or topics that would be cost-prohibitive or unethical (for lack of equipoise) to study by traditional means. At least part of the data is produced with the expenditure of public funds, and part of its subsequent use serves public purposes—one reason why the data have many characteristics of a public good [Blumenthal 2008].

But some secondary uses of the data produce individual goods in the form of personalized health care. For example, rapid pattern-matching by decision-support and expert systems can, in seconds, identify a large cohort of previously treated patients whose pre-treatment attributes were similar to those of the patient at hand and for whom sufficient time has elapsed post-treatment. The temporal sequences of treatments and outcomes that have materialized for those previous patients can then contribute to reliable, personalized decision-making about which treatment is optimal for the current patient, balancing the risks and likely benefits of each. Still other secondary uses of large data warehouses produce what amount to private commercial goods, such as statistical resources with which to enhance the safety and efficacy of existing medications or medical devices, or to discover new medical diagnostics and therapeutics, thereby creating new intellectual property that has commercial value.
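To make the personalized decision-support use described above more concrete, here is a minimal sketch of similar-patient cohort retrieval. It is not Cerner's (or anyone's) actual implementation; the record fields, the distance scaling, and the outcome measure are illustrative assumptions only.

```python
# Minimal sketch of similar-patient cohort retrieval from a de-identified warehouse.
# Field names, scaling weights, and the Euclidean metric are illustrative assumptions.
import math

# Hypothetical de-identified records: pre-treatment attributes, the treatment given,
# and the outcome eventually observed.
warehouse = [
    {"age": 64, "ejection_fraction": 0.35, "creatinine": 1.4, "treatment": "A", "readmitted_90d": False},
    {"age": 71, "ejection_fraction": 0.30, "creatinine": 2.1, "treatment": "B", "readmitted_90d": True},
    {"age": 58, "ejection_fraction": 0.40, "creatinine": 1.1, "treatment": "A", "readmitted_90d": False},
    {"age": 67, "ejection_fraction": 0.33, "creatinine": 1.6, "treatment": "B", "readmitted_90d": False},
    # ... in practice, millions of rows
]

FEATURES = ["age", "ejection_fraction", "creatinine"]
SCALES = {"age": 20.0, "ejection_fraction": 0.15, "creatinine": 1.0}

def distance(a, b):
    """Scaled Euclidean distance over the pre-treatment attributes."""
    return math.sqrt(sum(((a[f] - b[f]) / SCALES[f]) ** 2 for f in FEATURES))

def similar_cohort(index_patient, k=3):
    """Return the k previously treated patients most similar to the patient at hand."""
    return sorted(warehouse, key=lambda rec: distance(index_patient, rec))[:k]

def outcomes_by_treatment(cohort):
    """Summarize observed outcomes within the matched cohort, per treatment."""
    summary = {}
    for rec in cohort:
        stats = summary.setdefault(rec["treatment"], {"n": 0, "readmitted": 0})
        stats["n"] += 1
        stats["readmitted"] += int(rec["readmitted_90d"])
    return summary

if __name__ == "__main__":
    new_patient = {"age": 66, "ejection_fraction": 0.34, "creatinine": 1.5}
    print(outcomes_by_treatment(similar_cohort(new_patient, k=3)))
```

The point of the sketch is only the sequence that gives the warehouse its value for personalized care: pre-treatment similarity, then observed treatments, then outcomes that have had time to materialize. A production system would add risk adjustment, much richer attributes, and statistical controls.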
These diverse public and private secondary uses of data in research are all governed by federal privacy regulations and human research subjects regulations—in the U.S., these include the Health Insurance Portability and Accountability Act (HIPAA), the Patient Safety and Quality Improvement Act (PSQIA), 21 CFR Parts 50 and 56, 45 CFR 46, and other regulations. The Department of Health and Human Services (HHS) Office for Human Research Protections (OHRP) provides oversight and enforcement for the protection of human subjects in studies supervised by Institutional Review Boards (IRBs) and centralized review boards (CRBs). Additionally, in the U.S. a variety of state regulations further protect the confidentiality of patient data. Regulations implementing HIPAA require informed consent of the patient and approval of the Institutional Review Board to use identifiable data for research purposes. However, the requirement for informed consent can be waived if data are de-identified, which, under the HIPAA "Safe Harbor" rule, requires that 18 categories of identifiers (elements of Protected Health Information, or PHI) be expunged.

Data warehouses that have been engineered so that they cannot receive or store any PHI are, by definition, what we mean when we speak of de-identified data warehouses. They are essentially impossible to re-link with other databases: they contain no PHI with which to re-establish the identity of the persons the records originally came from, and none with which to create a linkage to any other public or private database that does contain PHI. Some organizations (including Cerner) that maintain HIPAA-waivered, de-identified data warehouses go further still to prevent re-identification or linkage with other personal information. They do this by a variety of techniques: by randomly offsetting the date-time coordinates of items in the de-identified dataset, so that no one who knows where a patient was at a particular date or time can use that knowledge to find or re-identify the information; by injecting a small amount of spurious information into each de-identified case record, so that it no longer exactly matches the original source record from which it was derived; and by other methods. Additionally, some organizations subject their data warehouses to ongoing testing to measure the statistical robustness of the de-identification, using t-closeness, k-anonymity, l-diversity, or other metrics [Li 2007; Sweeney 2002].

Yet despite these protections, some highly publicized violations have occurred over the years. The violations of regulations governing the review or conduct of clinical research studies involved databases containing personally identifiable PHI, not the de-identified EHR-derived data warehouses that I am addressing in this essay. Nonetheless, the violations have elevated public concern regarding the integrity of the clinical research process and the integrity of de-identified data warehouses. Public trust in the integrity of research involving observational data is critical not only for funding and participation in de-identified clinical EHR-derived data warehouses but also for confidence in the accuracy and reliability of public policies and comparative-effectiveness or other therapeutics analyses that are based on such data.
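The date-offsetting and k-anonymity testing mentioned above can be illustrated with a small sketch. The record layout, the offset window, and the choice of quasi-identifiers are assumptions made for illustration; real de-identification pipelines and robustness metrics (t-closeness, l-diversity, and so on) are considerably more elaborate.

```python
# Minimal sketch of two de-identification steps: a per-patient random date offset,
# and a k-anonymity check over a set of quasi-identifiers.
# Field names, the offset window, and the quasi-identifier set are illustrative assumptions.
import random
from collections import Counter
from datetime import date, timedelta

def shift_dates(records, max_offset_days=365, seed=None):
    """Apply one random offset per patient so that all of that patient's dates shift
    together; intervals between a patient's events are preserved for analysis."""
    rng = random.Random(seed)
    offsets = {}
    shifted = []
    for rec in records:
        pid = rec["patient"]
        if pid not in offsets:
            offsets[pid] = timedelta(days=rng.randint(-max_offset_days, max_offset_days))
        new_rec = dict(rec)
        new_rec["event_date"] = rec["event_date"] + offsets[pid]
        shifted.append(new_rec)
    return shifted

def k_anonymity(records, quasi_identifiers=("birth_year", "zip3", "sex")):
    """k = size of the smallest group of records sharing the same combination of
    quasi-identifier values; larger k means re-identification is harder."""
    groups = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return min(groups.values())

if __name__ == "__main__":
    records = [
        {"patient": 1, "birth_year": 1950, "zip3": "641", "sex": "F",
         "event_date": date(2011, 3, 2), "diagnosis": "heart failure"},
        {"patient": 1, "birth_year": 1950, "zip3": "641", "sex": "F",
         "event_date": date(2011, 3, 9), "diagnosis": "follow-up"},
        {"patient": 2, "birth_year": 1950, "zip3": "641", "sex": "F",
         "event_date": date(2011, 4, 1), "diagnosis": "diabetes"},
    ]
    deid = shift_dates(records, seed=42)
    print("k =", k_anonymity(deid))  # tiny here; a real warehouse would target a much larger k
```

Shifting every event for a patient by the same offset is what defeats "I know she was admitted on June 3rd" attacks while keeping the data analytically useful; the k-anonymity check then measures, on an ongoing basis, how well the remaining quasi-identifiers resist re-identification.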
The questions raised by these violations present an important opportunity (a) to re-evaluate the human research subjects protection oversight system so as to enhance the integrity of health research and the privacy of those who assent to the storage and secondary uses of their de-identified information for observational and translational research, and (b) to educate the public about the fact that the violations did not involve de-identified data warehouses. To continue to assail all data warehouses as evil or risky, as though they were all alike, is wrong—factually baseless and a logical fallacy.

Private Property and Public Goods

Copyright and patenting issues regarding de-identified EHR-derived data warehouses are well summarized by Fitzgerald and Pappalardo [2007, pp. 117 ff]. Basically, the de-identified information can be applied in ways such that it yields, together with other information, expertise, and inventive steps, one or more inventions capable of being patented or protected by the trade-secret mechanism for intellectual property. For researchers who intend to seek patent protection for inventions derived from their research, a concern is whether they will be able to obtain a patent, and whether in the meantime publishing, disclosing, or sharing their inventions or data with other researchers could prevent them from obtaining one. Universities or other non-profit operators of, or contributors to, data warehouses who intend to seek patent protection are concerned with the allocation of ownership rights in the patents. The Bayh-Dole Act of 1980 was, in fact, meant to foster practical application of the results of publicly funded and university-based research at a time when there was no effective way to commercialize the resulting intellectual property. On the other hand, researchers or data-warehouse operators who intend only to create or enhance public health or public goods, and who do not intend to patent anything, are concerned mainly with whether someone else could secure a patent over an invention that encompasses the researchers' or operator's data and/or the researcher's own discoveries, such that the researcher or operator could be restricted or prevented (by the patent-holder) from practicing her discoveries or creating other derivative works that would be public goods.

By contrast to researchers or data-warehouse operators, individual consumers' main concern is property rights—that is, their right to control the secondary uses of their data and to be fairly compensated, in the form of lump-sum payments and royalties, for the value that their contributed personal data generates (after being de-identified and combined with the data of many other individuals into a data warehouse) through commercial enterprise or public-sector activity.

As stated above, the situation of EHR-derived data warehouses—containing de-identified information whose production cost was defrayed in part by public-sector funding and some of whose secondary uses benefit the public at large—resembles other, more familiar situations involving public goods. The government produces many types of public goods that are non-controversial. For example, the Navy's aircraft carriers will not be used by any individual, and yet each individual's protection is enhanced by the carriers' existence.
The use of aircraft carriers to support one community—in hurricane or earthquake or tsunami relief, say—does not preclude the general use of the fleet for other purposes, such as defense-related military ones, nor does it diminish the fleet's value to other communities that have never yet been struck by a natural disaster.

Besides 'pure' public goods, there are also things called 'quasi-public' goods, and in my view these are more like de-identified health care data. A quasi-public good is one whose production or consumption generates (or might generate) externalities—financial or other positive or negative effects felt by third parties. The effects on the third parties are not reflected in the market transactions between the individuals or organizations who participate in private buying and selling, or producing and consuming, of the quasi-public good. Energy production and energy consumption are good examples. Another is the synthesis and refining of chemicals that are fundamental to a nation's economy and which, despite best efforts (such as those of the ACS Green Chemistry Institute), do adversely affect the quality of air, water, soil, and animal and plant life to some degree—externalities that are eventually borne by the public, the third parties who are not participants in the chemical enterprise. On account of these conceptual similarities and externalities, it is my belief that principles commonly applied to quasi-public goods of other types should help inform policy-making regarding de-identified health data warehouses as a particular new type of quasi-public good.

De-Identified Health Data Warehouses, Big Data, Costs, Social Contracts and Collective Intentionality

As is by now quite clear [Manyika 2011], large-scale data can create new opportunities for private gain as well as public gain—particularly data sets with sufficient sample size to statistically power research that compares tiny differences between two or more factors or outcomes, or that involves conditions or patterns that occur quite rarely, or that measures outcome endpoints that ordinarily take a long time to occur in any one individual, or that otherwise requires very large cohorts of people to study. Some data, for instance, may create competitive advantage for one health care organization by enabling it to learn from its own and others' experience, perhaps to achieve better outcomes than competitor institutions—for example, learning how to manage a certain disease better or how to perform a certain surgical procedure better. Such organizations might be willing to sell clinical knowledge or technique, but usually are not willing to share it for free. Even if they do not patent their innovations, this is tantamount to using 'trade secret' intellectual-property protection for superior medical practices and methods. It is routine, entirely legal, and has been so since time immemorial. Remember, too, that health care is inherently a local market: while it is true that nearby institutions will compete with the new regimen you discovered for treating heart failure, cardiologists in Berlin or Beijing probably could not compete with you.
You might be inclined to protect the intellectual property of your discovery and out-license it for a fee, and that fee might differ for licensees in Beijing and Berlin, and differ again from the license fees charged to direct competitors in your own city. The relevant market, or community, or 'collective' is predominantly local, because health services delivery is mostly local ('medical tourism' notwithstanding).

When we discuss large de-identified clinical databases, the marginal cost of using the database is very low. The information is there whether it is used or not. When it is used, it is not used up; it can be re-used infinitely many times, for different purposes and to make different discoveries. But the marginal cost of use is not the only cost—it is only a minuscule percentage of the total net cost. There is the capital-budget cost of construction and of the version-to-version upgrades that must be made over the years, amortized over the effective life of each version of the data warehouse. And there is the operating-budget cost of the services and expenses incurred in keeping the system running: staff with expertise in interoperability and inter-system nomenclature mapping, quality assurance, privacy services, regulatory compliance, auditing, and reporting. Those costs run to many millions each year for each data warehouse. Thus, the most controversial issues mainly concern policies and law relating to privately maintained databases that incur private costs and produce private value, in addition to whatever public costs and values are associated with them: databases whose construction and curation entail expending large amounts of money, which the constructors and curators therefore tend not to share for free out of altruism, and for which large externalities exist.

The philosopher John Searle has a recent book [Searle 2010] that can illuminate the ethics of operating procedures, policies, and regulations for quasi-public goods, including de-identified data warehouses—clinical ones and other kinds as well. In it he addresses the concept of collective intentionality. As contrasted with ordinary first-person-singular individual intentionality ("I want X"; "I am going to do whatever I can to be cured of cancer"), in life in any community or society there are first-person-plural forms of intentionality ("We want X"; "We are going to cure cancer"). Collective intentionality differs in important ways from individual intentionality. For example, cooperation is required for collectively causing or preventing an outcome. With a collective activity, it is harder to say who comprises the 'we', to allocate credit or blame, and to decide who should therefore be exposed to what proportion of the rewards or penalties associated with it. Moreover, the content of what I am doing must often differ in some substantial way from the content of what you and others are doing in order to achieve a 'collective' result.

Ethics of De-Identified Health Data Warehouses as Massively Multi-Player Ensembles and Performances

One simple example is playing a duet on the piano.
Playing a duet illustrates how each partner's contribution is, and necessarily must be, different in order to arrive at the beautiful and valuable thing we would call a 'performance of a duet'. It is not important that our contributions be identifiable: we can be anonymous if we choose to be; the audio recording of our performance may carry no PHI, no traceable attribution of who played which part, nor even any indication that either of us was involved in the duet performance at all. But if one of us reneges on our agreement to participate, there will no longer be something valuable created—nothing that could be called a duet.

The same is true for flocks of birds or butterflies or other social animals that migrate in large groups. The individual that opts out of participating faces distinctly different survival odds than those that opt in, but by opting out it also diminishes the odds of the flock or herd. The more unusual the abilities or valuable attributes of the opter-outers, the greater the loss experienced by the flock. In the case of de-identified EHR-derived data warehouses, people who opt out may be statistically different from people who opt in. The opter-outers may be healthier or less healthy than the opter-inners; they may engage in healthier or less healthy behaviors and lifestyles; they may have polymorphisms in the AVPR1a or OXTR or other genes that not only lead them to be less altruistic than others [Israel 2009; Knafo 2008] but that are also associated with certain health outcomes in a manner different from other people; and there may be other differences that are the subject of ongoing research. These confounding factors may cause the resulting databases—from which such people are systematically absent or depleted in prevalence compared with their frequency in the overall population—to be statistically biased. The biases in turn lead to inaccurate interpretations and erroneous decisions that affect many other people, or perhaps everyone in the society. The more unusual the abilities or valuable attributes of the opter-outers, the greater the detriment to the good of the community. What this means is that opting out is not morally 'neutral'. Opting out is a decision—a performative act that can seriously harm the overall society and diminish the value of the [quasi-]public good.

Likewise, in social networks or massively multi-player online role-playing games (MMORPGs), players who enter into play for a while when it suits them—but who then capriciously withdraw from the game, or periodically withhold their cooperation and stubbornly refuse to negotiate with others, or extort concessions from others—damage the very fabric of the experience for everyone. They hold the whole game hostage [Bainbridge 2010; Corneliussen 2008; Cuddy 2009; Sicart 2011; Taylor 2009]. By accepting the utility and benefits of belonging to the group, they have entered into an implied social contract: they have knowingly entered into it as players or citizens of the MMORPG community, and they are breaching the terms of that agreement. MMORPG team leaders worry about anarchists like this.

I am reminded of analogous worries in other fields, including ones that arise in music and the arts. From time to time there are massive musical performances involving hundreds of musicians playing simultaneously and [mostly] anonymously. My friend, Lisa Bielawa, composed and produced one such music project at Tempelhof Airfield in Berlin this summer [Bielawa 2011].
The Tempelhof performance came off beautifully, but it was not without worries in advance. You can decline to participate, and we will all be a bit worse off for your musical opting-out. But if you opt in initially, commit to participating in the performance, and derive some benefits as a musician from your initial commitment to the collaboration—and then later retract your decision or, worse, show up at the performance but refuse to play, or stand around amongst the other musicians, interfering with their playing and shouting at them that their decisions to participate were wrong—then we would all be a lot worse off for your far-from-neutral "opting-out." Yet some of what passes for 'debate' today regarding secondary uses of de-identified health data amounts to just this: fear-mongering, 'hostage-taking' by continually stipulating impractical or untenable requirements that obstruct progress for all, and repeatedly spreading misinformation about the nature, risks, and benefits of the data warehouses. The term 'debate' is a cover for the axe the people who behave this way are grinding. It obfuscates their real aim, which is to interfere with others' consenting and freely deciding to play.

Conclusion

Over the past 20 years numerous books and journal articles addressing the ethics and legal issues of public and quasi-public goods have appeared. Likewise, there are now numerous scholarly publications on the online cultures and societies embodied by MMORPGs and social networks, illustrating how to do ethics in such virtual societies. And yet there has to date been a conspicuous failure to apply three important, well-established principles of bioethics—namely beneficence, non-maleficence, and social justice [Beauchamp 2008]—to matters concerning de-identified data warehouses. Instead, the emphasis has so far been almost entirely on the fourth "leg" of the bioethics "stool," individual autonomy.

In light of the many current and potential future public health benefits of thoughtful, ethical, regulation-compliant uses of de-identified EHR-derived data warehouses—enabling translational research and discovery, supporting optimized quantitative design of clinical trials, facilitating personalized-medicine decision-support algorithms, improving safety, reducing costs and enhancing health services efficiency, measuring and comparing the quality of different treatments, and many other valuable and worthwhile purposes—it is vital that the objectives and beneficiaries of data-centered endeavors be understood in their full breadth and depth. It is vital to illuminate the benefits to all health care constituents of using comprehensive data stores, and not one-sidedly or misleadingly illuminate only the potential hazards that may be associated with some of them, mainly poorly designed or poorly managed ones. Ultimately, though, 'trust'—denoted by 'embodied co-presence' among mutually trusting individuals and groups and by the durable social commitments they enter into—may be enhanced through online environments and information sharing, as shown in the book by Charles Ess and May Thorseth [2011]. They note how withholding commitments toward others is corrosive to the public good we call trust.
Applying these ideas to data warehouses of de-identified health information, we can say that the usual emphasis on individuals' "rights" (to opt out of partnerships that are predominantly aimed at public or quasi-public goods; or to demand arbitrary rents as payment for consenting to participate in the partnerships; or for each individual to set his or her own price)—an emphasis frequently not counter-balanced by emphasis on individuals' duties to contribute to these resources, that is, their duties as members of the community benefiting from the raw-materials secondary-use resources and from the other assets those resources make possible—similarly undermines trust and the quality of the community as a whole. For the good of society, this imbalance must change. For the growth, too, of public- and private-sector revenue, as well as of compensation to the individuals whose data are collated with others' data to generate that revenue and growth, we hope that the imbalance will change.

Readings and Resources

Bainbridge W. The Warcraft Civilization: Social Science in a Virtual World. MIT, 2010.

Barrett S. Why Cooperate?: The Incentive to Supply Global Public Goods. Oxford Univ, 2010.

Beauchamp T, Childress J. Principles of Biomedical Ethics. 6e. Oxford Univ, 2008.

Bielawa L. In Berlin, moved by music, place and memory. NY Times, 15-JUN-2011. (http://opinionator.blogs.nytimes.com/2011/06/15/in-berlin-moved-by-music-place-and-memory/; http://www.unitedstatesartists.org/project/tempelhof_broadcast; http://www.youtube.com/watch?v=YYz1ohOWdLo)

Blumenthal D. 'Characteristics of a public good and how they are applied to healthcare data' in Law and Bioethics, Johnson S, et al., eds. Routledge, 2008, pp. 139-42.

Botsis T, et al. Secondary use of EHR: Data quality issues and informatics opportunities. AMIA Summits Transl Sci Proc 2010;2010:1-5.

Boyd A, et al. An 'Honest Broker' mechanism to maintain privacy for patient care and academic medical research. Int J Med Inform. 2007;76:407-11.

Centre for Computing and Social Responsibility (CCSR). http://www.ccsr.cse.dmu.ac.uk/

Computer Ethics Institute (CEI). http://www.computerethicsinstitute.org/

Corneliussen H, Rettberg J, eds. Digital Culture, Play, and Identity: A World of Warcraft Reader. MIT, 2008.

Cuddy L, Nordlinger J, eds. World of Warcraft and Philosophy. Open Court, 2009.

Dhir R, et al. A multidisciplinary approach to 'Honest Broker' services for tissue banks and clinical data: A pragmatic and practical model. Cancer. 2008;113:1705-15.

El Fadly A, et al. The REUSE project: EHR as single datasource for biomedical research. Stud Health Technol Inform. 2010;160(Pt 2):1324-8.

Elkin P, et al. Secondary use of clinical data. Stud Health Technol Inform. 2010;155:14-29.

Ess C, Thorseth M. Trust and Virtual Worlds. Peter Lang, 2011.

Fitzgerald A, Pappalardo K. Building the infrastructure for data access and reuse in collaborative research: An analysis of the legal context. Working Paper, OAK Law Project, Queensland University of Technology, Brisbane, 2007. (http://eprints.qut.edu.au/8865/1/8865.pdf)

Frey R. Medicine, animal experimentation, and the moral problem of unfortunate humans. Social Philos & Policy 1996;13:181-210.

Geuss R. Public Goods, Private Goods. Princeton Univ, 2003.

Goodby A, et al. Clinical Data as the Basic Staple of Health Learning: Creating and Protecting a Public Good. National Acad Sciences Press, Inst of Medicine (IOM), 2010.
Guadamuz A. Open Science: Open-source licenses in scientific research. North Carolina J Law & Technol 2006;7:33346.

Himma K, Tavani H, eds. Handbook of Information and Computer Ethics. Wiley, 2008.

van der Hoek W, ed. Information, Interaction, and Agency. Springer, 2010.

International Center for Information Ethics (ICIE). http://icie.zkm.de/

Israel S, et al. The oxytocin receptor (OXTR) contributes to prosocial fund allocations in the dictator game and the social value orientations task. PLoS One 2009;4:e5535-44.

Kaul I, et al., eds. Providing Global Public Goods, Managing Globalization. Oxford Univ, 2003.

Kheterpal S. Clinical research using an information system: The multicenter perioperative outcomes group. Anesthesiol Clin. 2011;29(3):377-88.

Kizza J. Ethical and Social Issues in the Information Age. 4e. Springer, 2010.

Knafo A, et al. Individual differences in allocation of funds in the dictator game associated with length of the arginine vasopressin 1a receptor RS3 promoter region and correlation between RS3 length and hippocampal mRNA. Genes Brain Behav. 2008;7:266-75.

Li N, et al. t-Closeness: Privacy beyond k-anonymity and l-diversity. Proc 23rd Intl Conf on Data Engineering (ICDE), 2007.

Liu J, et al. Toward a fully de-identified biomedical information warehouse. AMIA Annu Symp Proc. 2009;9:370-4.

Lobach D, Detmer D. Research challenges for electronic health records. Am J Prev Med. 2007;32(5 Suppl):S104-11.

Manyika J, Chui M. Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute, 2011.

Meystre S, et al. Automatic de-identification of textual documents in the electronic health record: A review of recent research. BMC Med Res Methodol. 2010;10:70.

Minow M. Partners Not Rivals: Privatization and the Public Good. Beacon, 2003.

Powell J, Buchan I. Electronic health records should support clinical research. J Med Internet Res. 2005;7:e4.

Powell W, Clemens E, eds. Private Action and the Public Good. Yale Univ, 1998.

Prokosch H, Ganslandt T. Perspectives for medical informatics: Reusing the electronic medical record for clinical research. Methods Inf Med. 2009;48:38-44.

Quinn M. Ethics for the Information Age. 3e. Addison-Wesley, 2008.

Searle J. Making the Social World: The Structure of Human Civilization. Oxford Univ, 2010.

Seltzer W. The promise and pitfalls of data mining: Ethical issues. Am Stat Association JSM 2005:1441-5.

Sicart M. The Ethics of Computer Games. MIT, 2011.

Sweeney L. k-Anonymity: A model for protecting privacy. Intl J Uncertainty, Fuzziness and Knowledge-based Sys 2002;10:557-70.

Taylor T. Play Between Worlds: Exploring Online Game Culture. MIT, 2009.

Terry N. 'What's wrong with health privacy?' in Law and Bioethics, Johnson S, et al., eds. Routledge, 2008, pp. 68-94.

Weiner M, et al. Electronic health records: High-quality electronic data for higher-quality clinical research. Inform Prim Care. 2007;15:121-7.

Wylie J, Mineau G. Biomedical databases: Protecting privacy and promoting research. Trends Biotechnol 2003;21:113-6.