Trust and Security in Biological Databases: Security When Collaborating
Gio Wiederhold
Depts. of Computer Science, Electrical Engineering, and Medicine, Stanford University, CA
www-db.stanford.edu/people/gio.html

Four related points will be made. They are not primarily technological, since the majority of the failures we experience in protecting privacy are caused by misunderstandings of settings and objectives.
• Protection of privacy requires checking what goes OUT.
• Access control mechanisms only keep bad guys from getting IN.
• In bioinformatics and medicine there are many types of collaborators.
• Collaborators are allowed in, but what they take out must be controlled.

1 & 3. Protection of privacy requires checking what goes OUT.
Privacy requires that data considered private do not fall into inappropriate or public hands. Private data reside in a variety of systems used by a variety of collaborators:
  a. Medical record systems (holistic): caregivers and researchers
  b. Drug toxicity and effectiveness studies: researchers and pharmas
  c. Hospital and clinic admission records: caregivers and managers
  d. Financial records: managers and accountants
  e. Billing and payment information: accountants and payors
[Diagram: patient information flowing among Patient, Physician, Clinics, Inpatient and Ward staff, Laboratory and Laboratory staff, Pharmacy, Accounting, Billing, Insurance Carriers, Accreditation, CDC, etc.]
The complexity of usage is such that imposing a fine-grained cell structure is not practical for the information providers. Those participants have legitimate access rights, but they do not have the right to reveal the information they need, nor the right to read related information.

2. Access control mechanisms only keep bad guys from getting IN.
Current solution:
• Keep bad guys OUT
  – Access control requires authentication and authorization
  – Collaborators and customers get into authorized areas
• Once they are IN, no further checking occurs in computer systems
  – Further checking is done when physical assets are protected
  – Example: warehouses, even warehouse stores, combine access control over the collected contents with a release filter at the exit; that release-filter step is not done now in computer systems
Why this omission? Privacy is entrusted to security specialists and surrogates:
• Cryptographers: provide important tools, but these serve binary, all-or-nothing settings
• Database administrators: valued for making data available
• Network administrators: valued for keeping systems accessible to users

4. Collaborators are allowed in, but what they take out must be controlled.
Solution: symmetric checking, i.e., check access to information systems and also the subsequent release of their contents; act like a warehouse store.
• Check for and/or remove restricted topics in outgoing documents, i.e., check specific contents for each collaborating and authorized role:
  a. Researchers: names, employers, addresses, emails, . . .
  b. Payors: other incidents, prior diseases, admissions, . . .
  c. . . .
• Better: check that all terms in outgoing documents are acceptable
  – Use a topic-specific inclusive word/phrase list and filter out everything else (see the sketch after this slide)
  – Paranoia is safest, and the cost is bearable: most application/usage areas use fewer than 3,000 terms
  – Trapped documents can be released by a security officer
• Extract text from images, such as x-rays, and then check those texts
  – Many media contain unexpected private or identifying data
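To make the inclusive-list idea above concrete, here is a minimal sketch of a release filter, assuming plain-text documents and a small approved-term list. The term list, the tokenizer, and the trap handling are illustrative assumptions, not the TIHI implementation cited at the end of this document.

```python
# Minimal sketch of an inclusive-list release filter (illustration only;
# the approved-term list and tokenizer are hypothetical placeholders,
# and phrase handling is omitted).
import re

APPROVED_TERMS = {            # topic-specific inclusive word list
    "glucose", "insulin", "dosage", "mg", "trial", "cohort",
    "baseline", "patient", "elevated",
    # ... typically fewer than 3,000 terms for one application area
}

def tokenize(text: str) -> list[str]:
    """Split a document into lowercase word tokens."""
    return re.findall(r"[a-zA-Z][a-zA-Z'-]*", text.lower())

def release_filter(document: str) -> tuple[bool, set[str]]:
    """Return (releasable, unrecognized_terms).

    A document is released only if every term appears in the inclusive
    list; otherwise it is trapped for review by a security officer.
    """
    unrecognized = {tok for tok in tokenize(document) if tok not in APPROVED_TERMS}
    return (len(unrecognized) == 0, unrecognized)

if __name__ == "__main__":
    for doc in ("Baseline glucose elevated; dosage 40 mg",
                "Patient John Smith, baseline glucose elevated"):
        ok, leftovers = release_filter(doc)
        print("release" if ok else f"trap for security officer: {sorted(leftovers)}")
```

Documents that fail the check are not lost; as the slide notes, a security officer can still review and release them.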
Release checking can also protect privacy in commercial domains.
• Customers are collaborators: you want customers IN, not OUT.
• In simple and perfect systems they cannot access private areas, but:
  – System failures (trap doors, etc.) abound
    o Release checking provides a backstop and intrusion detection
  – Updates for customer convenience create unexpected interactions
    o Helpful query modification broadens access
  – New usages were not foreseen during design partitioning
    o Customer access to inventory for rapid supply-line verification
    o New, unthought-of collaborators -- the Russians in Kosovo
• Techniques: much content has signatures that are (nearly) unique (see the sketch after this slide)
  o Check to stop credit card numbers in outgoing data, as from music sites
  o Check to stop email addresses in outgoing reports
• Don't rely exclusively on access control when the objective is to protect against the release of private information!
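To illustrate the signature-based checks listed above, here is a minimal sketch that scans outgoing text for e-mail addresses and plausible credit card numbers. The patterns, the Luhn test, and the blocking policy are illustrative assumptions, not the speaker's implementation.

```python
# Minimal sketch of signature-based release checking for outgoing data
# (illustration only; patterns and thresholds are assumptions).
import re

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")   # candidate card-number digit runs

def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to separate real card numbers from random digits."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def outgoing_allowed(payload: str) -> bool:
    """Block responses that carry e-mail addresses or plausible card numbers."""
    if EMAIL_RE.search(payload):
        return False
    for match in CARD_RE.finditer(payload):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 16 and luhn_ok(digits):
            return False
    return True

if __name__ == "__main__":
    print(outgoing_allowed("Track 7, 3.2 MB, bitrate 192 kbps"))        # True
    print(outgoing_allowed("card 4111 1111 1111 1111 exp 09/27"))       # False
```

A check like this acts as the backstop mentioned above: a trapdoor that leaks thousands of card numbers produces a very different signature than an MP3 download does.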
Abstract: Security when Collaborating
Panel presentation on "Trust and Security in Biological Databases"; Gio Wiederhold, Ph.D., Stanford University, CA

Traditional security mechanisms have focused on access control, assuming that we can distinguish the good guys from the bad guys and can label any data collection as being accessible to the good guys. If those assumptions hold, the technology is conceptually simple and only made hard by technical faults. However, there are many practical situations where such sharp distinctions cannot be made, so that the technologies developed for access control become inadequate. In medicine, but also in many commercial data collections, we find unstructured data. Such data are collected and stored without the submitter being fully aware of their future use, and hence the submitter cannot consider all future access needs. A complementary technology to augment access control is result filtering: inspecting the contents of documents before they leave the boundary of the protected system.

I will briefly cite the issue in two settings, one simple and one more complex. Military documents have long been classified into mandatory and discretionary categories, and legitimate accessors are identified with respect to those categories. But when a new situation arises, the old labels are inadequate. When we had to share information with the Russians in Kosovo, no adequate labeling existed, and relabeling all stored documents was clearly impractical. A filter can be written to check the text for limited, locally relevant contents and make those available. Any document containing unrecognized noun phrases would be withheld, or could be handed over to a security officer for manual processing.

More complex situations occur when we have statistical data, as in a census or, in bioinformatics, phenotypic and genomic data. We want to prevent the release of statistical summaries for cells that have fewer than, say, 10 instances, to reduce the likelihood of inference back to an individual. If we use access control, we have to precompute the minima for columns and rows and aggregate their categorizations for access in order to prevent release. However, the distributions over those cells are very uneven. If instead we check the actual contents at the time of release, we can allow much smaller categories to be used for access and only omit or aggregate the cells that are too small (a sketch of such a check appears at the end of this document).

Checking the results being released can also provide a barrier against credit card theft and the like. If a person who masquerades as a customer locates a trapdoor and removes 10,000 credit card numbers instead of an MP3 tune, that can easily be recognized, since those data have very different signatures.

In summary, many of our accessors are collaborators or customers, although we know little about them. We want to give them the best possible service and still protect our property, or the privacy that individuals are trusting us to keep. Focusing only on access control, and then not checking what is released, is an inadequate, even naive, approach for systems involving collaboration.

Research leading to these concepts and supporting technologies was supported by NSF under the HPCC and DL2 programs.

Trust and Security in Biological Databases: Brief Biography
Security when Collaborating; Gio Wiederhold, Stanford University, CA

Gio Wiederhold is an emeritus professor of Computer Science, Medicine, and Electrical Engineering at Stanford University. Since 1976 he has supervised 33 PhD theses in these departments. Currently Gio is continuing part time at Stanford and consulting. He still teaches seminars on Business on the Internet and on Genome Databases. Research being disseminated includes privacy protection in collaborative settings, large-scale software composition, and enabling interoperation of semantically heterogeneous information systems, including simulations for projecting outcomes. His consulting now focuses on the valuation of intellectual property inherent in software.

Gio Wiederhold was born in Italy, received a degree in Aeronautical Engineering in Holland in 1957, and received a PhD in Medical Information Science from the University of California at San Francisco in 1976. Prior to his academic career he spent 16 years in the software industry. Wiederhold has authored and coauthored more than 350 publications and reports on computing and medicine. He spent 1991-1994 in Washington as a program manager at DARPA. Wiederhold has been elected a fellow of the ACMI, the IEEE, and the ACM. His web page is http://www-db.stanford.edu/people/gio.html. Information about protecting against the release of private information can be found at http://www-db.stanford.edu/pub/gio/TIHI/TIHI.html.
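The abstract above describes checking statistical summaries at release time and omitting or aggregating cells that are too small. The following is a minimal sketch of such a check, assuming the summary is a simple mapping from category to instance count; the threshold of 10, the pooling rule, and the example counts are illustrative assumptions, not the implementation used in the cited research.

```python
# Minimal sketch of a release-time check for statistical summaries
# (illustration only; threshold and pooling rule are assumptions).

MIN_CELL_SIZE = 10   # cells with fewer instances risk inference back to an individual

def filter_summary(cell_counts: dict[str, int],
                   min_cell_size: int = MIN_CELL_SIZE) -> dict[str, int]:
    """Release only cells with at least min_cell_size instances; pool the
    remaining cells into a single 'other' cell, and suppress even that
    pool if it is still too small."""
    released = {cat: n for cat, n in cell_counts.items() if n >= min_cell_size}
    pooled = sum(n for n in cell_counts.values() if n < min_cell_size)
    if pooled >= min_cell_size:
        released["other (pooled)"] = pooled
    return released

if __name__ == "__main__":
    genotype_counts = {"AA": 124, "AG": 57, "GG": 9, "GT": 3}   # made-up counts
    print(filter_summary(genotype_counts))
    # {'AA': 124, 'AG': 57, 'other (pooled)': 12}
```

Because the check runs at release time, the access-side categories can stay fine-grained; only the cells that actually turn out to be too small are withheld or pooled.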