Translating Imaging Science to the Emerging Grid Infrastructure
Jeffrey S. Grethe, BIRN
University of California, San Diego
Imaging, Medical Analysis and Grid Environments (IMAGE)
May 31, 2016

"We speak piously of taking measurements and making small studies that will add another brick to the temple of science. Most such bricks just lie around the brickyard."
Platt, J.R. (1964) Strong Inference. Science. 146: 347-353.

Objectives
• Establish a stable, high-performance network linking key Biotechnology Centers and General Clinical Research Centers
• Establish distributed and linked data collections with partnering groups, creating a “Data GRID” for the BIRN
• Facilitate the use of "grid-based" computational infrastructure and integrate BIRN with other Grid middleware projects
• Enable data mining across multiple data collections or databases on neuroimaging and bioinformatics
• Build a stable software and hardware infrastructure that allows centers to coordinate efforts and accumulate larger studies than can be carried out at a single site
[Figure: Challenges. Neuroscience, High Speed Network, Computation, Mouse BIRN, User Access, FIRST BIRN, Distributed Data, Data Integration, Morphometry BIRN, Informatics, Community, Policies, Best Practices, IRB, HIPAA, Governance]

Creating BIRN Test-Bed Partnerships
• Three research projects have been assembled as "application test beds" to shape BIRN and guide infrastructure development:
  • Multi-scale Mouse BIRN: animal models of disease, studied at multiple scales with multiple methods. Examples: an MS mouse, the DAT KOM (a mouse model relevant to schizophrenia and of broader interest), and a Parkinson's disease mouse
  • Brain Morphometrics (Human Structure BIRN): targets neuroanatomical correlates of neuropsychiatric illness (unipolar depression, mild Alzheimer's disease (AD), and mild cognitive impairment (MCI))
  • Functional Imaging BIRN: develops a common functional magnetic resonance imaging (fMRI) protocol and uses it to study regional brain dysfunction related to the progression and treatment of schizophrenia, as an attack on the underlying cause of the disease

A National Collaboratory: Science Drives the Infrastructure
• Use application science "pull" to guide development of the next-generation cyberinfrastructure
• Craft a plan to achieve an important scientific goal that requires developing and implementing innovative computational infrastructure.
• Articulate a Grand Challenge and define the work needed to achieve it with increasing levels of specificity.
• Bring application scientists and computer scientists together in projects at each level to build elements of the new infrastructure.
User Access to Grid Resources
• An application environment is being developed to provide centralized access to BIRN tools, applications, and resources with a single login from any Internet-capable location
• Provides simple, intuitive access to Grid resources for data storage, distributed computation, and visualization

Interfacing the Desktop with the Grid
• Developed a Java Grid Interface (JGI) that provides a wrapper for applications on a user's desktop
• Brokers communication and data transfer between the application and BIRN resources (e.g., the Storage Resource Broker, SRB)
• Applications: LONI Pipeline, 3D Slicer, FreeSurfer, and ImageJ
• Continue to extend and develop the JGI
  • OGSA compliance

Distribution of a Bioinformatics Toolbox
• Package and deploy test-bed-specific software through the distribution of the BIRN bioinformatics toolbox
• Use ROCKS (http://www.rocksclusters.org) as the distribution mechanism
• The bioinformatics toolbox can be made available to any researcher interested in a robust package of neuroimaging applications.
• First release to occur this fall using the new ROCKS distribution model.
[Figure: BIRN ROCKS Distribution. Components: ROCKS Core; Grid Roll; BIRN Roll with FreeSurfer, AFNI, AIR, FSL, ... and Grid Wrappers]

Scientific Workflow
• A sequence of steps (utilities, applications, pipelines) required to acquire, process, visualize, and extract useful information from scientific data.
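A scientific workflow of this kind can be sketched as an ordered list of steps in which each step's output becomes the next step's input. The Python sketch below is a minimal illustration under stated assumptions: `Step`, `run_workflow`, the toy text-processing steps, and the provenance fields are hypothetical names chosen for this example, not part of the BIRN Portal, the LONI Pipeline, or the SRB. Recording a content hash of every input and output is one simple way to make each result traceable back through the chain of steps.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, List, Tuple


# Hypothetical step type: a name, a tool version, and a function that
# transforms the data produced by the previous step.
@dataclass
class Step:
    name: str
    version: str
    func: Callable[[bytes], bytes]


def digest(data: bytes) -> str:
    """Content hash used to identify each intermediate result."""
    return hashlib.sha256(data).hexdigest()


def run_workflow(steps: List[Step], data: bytes) -> Tuple[bytes, List[dict]]:
    """Run steps in order, passing each output to the next step and
    recording one provenance entry (tool, version, input/output hashes) per step."""
    provenance = []
    for step in steps:
        output = step.func(data)
        provenance.append({
            "step": step.name,
            "version": step.version,
            "input_sha256": digest(data),
            "output_sha256": digest(output),
        })
        data = output
    return data, provenance


if __name__ == "__main__":
    # Toy stand-ins for processing stages (not real imaging tools).
    steps = [
        Step("normalize", "0.1", lambda d: d.lower()),
        Step("strip", "0.1", lambda d: d.strip()),
    ]
    result, prov = run_workflow(steps, b"  RAW SCAN DATA  ")
    print(result)  # b'raw scan data'
    print(json.dumps(prov, indent=2))
```

Because each entry stores the hash of its input, consecutive entries chain together: the `output_sha256` of one step equals the `input_sha256` of the next, so any result can be traced back to the original input.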
• Advantages of managing the workflow within the Portal:
  • Progress through the workflow can be organized and tracked
  • Automated and transparent mechanisms move data from one step to the next using the SRB
  • Tools are centralized and presented with uniform GUIs to improve usability
  • The administrative burden of each step (or group of steps) is eliminated
  • Flexibility to enhance each process through direct, transparent access to the grid

Interactive Scientific Workflows
Provide researchers with transparent access to a computing environment that supports their natural working paradigm while taking advantage of the evolving grid infrastructure.
Data curation requires determining data quality and validity.

Workflow Considerations
• Provide full provenance for data within the BIRN environment
  • Morphometry BIRN is modifying tools to provide proper provenance information
  • Data provenance is being taken into account in the human imaging database
• Workflow optimization
  • Take advantage of resource discovery services being deployed
  • Use data provenance information
  • Global versus run-time optimizations
• Incorporation of legacy applications
  • LONI Pipeline (UCLA)
  • Standard install
  • Incorporation into the Portal
  • Advisement on future Grid enhancements to the Pipeline

Governance
• Incorporating processes for multi-site studies and sharing of human data
  • HIPAA compliance
  • Patient confidentiality
  • Institutional Review Board (IRB) approvals
• Developing guidelines for data sharing and authorship
• Breaking down the barriers
  • Mistrust
  • Open sharing of information
  • Who gets credit
  • Commercial products
  • Governance
• Integrating new participants

IRB Working Group
• One member from each BIRN site is required to participate
• Each member is required to review BIRN consents, waivers, and procedures with local IRBs
• Regular video conferences among members to coordinate information and activities
• Produce BIRN template language for subject consent, the IRB waiver for data upload, and the IRB waiver for data download
• Interact with the Data Sharing Task Force

What Regulations Apply?
It depends! Applicable rules may include institutional policy, HIPAA, the Common Rule, state law, IRB interpretation, and local policy.

Data Sharing Task Force
• Produce guidelines and procedures for data sharing across institutions, taking into account the Common Rule, HIPAA, and state regulations
• Develop procedures to allow for longitudinal studies within BIRN
• Examine policies relevant to BIRN (e.g., revised policies being drafted for tissue banks and data banks)
• Interact with the Architecture working groups to help define security and subject-confidentiality infrastructure and policy
  • Data replication
  • Certificate policies
  • Registration Authority policies
  • Local access control
  • Auditing and activity logs

EU Privacy Directives
• EU Directive 95/46/EC, Article 8
  • Member states shall prohibit the processing of personal data concerning health or sex life.
• Recommendation No. R (97) 5: exceptions
  • Diagnostic and therapeutic reasons
  • Public health reasons, public interest
  • Criminal offenses
  • Fulfilment of specific contractual obligations
  • Legal claims
  • Consent for specific purposes

Data Classifications
Table 1: Data Characteristics (adapted from Masys et al. 2002). Columns: PHI = Protected Health Information, RHI = Research Health Information, LDS = Limited Data Set, De-ID = De-Identified Data.
• Individually identifiable, i.e., meets the HIPAA definition of individually identifiable health information: PHI Yes; RHI Yes; LDS Varies; De-ID No
• Used to support clinical decision making for an individual, or for payment or operations: PHI Yes; RHI No; LDS Varies; De-ID Varies
• Associated with a healthcare service event: PHI Yes; RHI No; LDS Varies; De-ID Varies
• Need-to-know, minimum necessary access control: PHI Yes; RHI Yes; LDS Yes; De-ID No
• Separation of person-identifiable and non-person-identifiable data elements wherever feasible: PHI No; RHI Yes; LDS Yes; De-ID N/A
• Individual authorization (consent) for creation and use of data: PHI Varies; RHI Yes; LDS Varies; De-ID No
• Business Partner agreements for disclosures: PHI Yes; RHI No; LDS No; De-ID No
• Logs and audit trails of use and disclosure: PHI Yes; RHI, LDS, and De-ID current best practice for research records
• Right to request amendment of records: PHI Yes; RHI, LDS, and De-ID at discretion of investigator

Anonymization vs. De-Identification
• Both require deletion of direct identifiers.
• Anonymized data cannot have a link field (de-identified data can).
• Anonymization makes a protocol eligible for exemption from IRB review.
• De-identification makes data exempt from HIPAA regulations.
• De-identification with a link field does NOT exempt data from IRB review.

EU Data Definitions
• Recommendation R (97) 5 on the protection of medical data
  • Personal data covers any information relating to an identified or identifiable individual.
  • An individual shall not be regarded as "identifiable" if identification requires an unreasonable amount of time and manpower.
  • In cases where the individual is not identifiable, the data are referred to as anonymous.

Identifiable Health Information
• High-resolution structural images can be used as an identifier.
  • A face reconstructed from raw anatomical data could potentially be used to identify the subject.
• Some members of the scientific community require or desire unaltered raw data.
  • Providing both raw and skull-stripped data is allowed.
  • Approval from the local IRB is needed to allow sharing of raw anatomical data.
  • Users wishing to access the data also require IRB approval.
• Is there a scalable and distributed solution for researchers to access identifiable health information?
[Figure: raw vs. skull-stripped structural image]

Data Sharing Infrastructure
• Security-related metadata
  • All data uploaded within BIRN must have associated metadata: data classification, IRB agreements, subject consent, longitudinal data
  • Data sharing permissions depend on this metadata; for example, de-identified data cannot be shared with all users
• A secure environment is required for the storage of protected information
  • Linkage of BIRN ID with original subject ID
  • Protected data
• Auditing of data access and movement is required
  • HIPAA
  • Internal security
  • Data usage statistics
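The principle that sharing permissions are derived from security-related metadata, with every access attempt audited, can be illustrated with a small policy check. This Python sketch is hypothetical: the `DataClass`, `Dataset`, `User`, `AuditLog`, `can_access`, and `request_download` names and the specific rules are assumptions for illustration, loosely based on the classifications in Table 1 and the IRB requirements for raw anatomical data; they are not BIRN's actual access-control implementation. In this sketch, de-identified data go only to registered users rather than to everyone, and protected data are never released.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List


class DataClass(Enum):
    """Hypothetical classification labels attached to uploaded data."""
    DE_IDENTIFIED = "de-identified"
    LIMITED_DATA_SET = "limited data set"
    PROTECTED = "protected health information"


@dataclass
class Dataset:
    birn_id: str                   # BIRN ID only; this sketch never stores the original subject ID
    classification: DataClass
    consent_allows_sharing: bool   # from subject-consent metadata
    site_irb_allows_sharing: bool  # from IRB-agreement metadata
    raw_anatomical: bool = False   # unaltered structural images may identify a subject


@dataclass
class User:
    name: str
    is_registered: bool
    has_irb_approval: bool


@dataclass
class AuditLog:
    entries: List[dict] = field(default_factory=list)

    def record(self, user: User, ds: Dataset, action: str, allowed: bool) -> None:
        # Append one entry per access attempt, whether granted or denied.
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user.name,
            "dataset": ds.birn_id,
            "action": action,
            "allowed": allowed,
        })


def can_access(user: User, ds: Dataset) -> bool:
    """Hypothetical policy: the decision depends only on metadata and user attributes."""
    if ds.classification is DataClass.PROTECTED:
        return False  # protected data are never released by this check
    if not ds.consent_allows_sharing:
        return False
    if ds.raw_anatomical or ds.classification is DataClass.LIMITED_DATA_SET:
        # Potentially identifiable data: require both the site IRB agreement and user IRB approval.
        return ds.site_irb_allows_sharing and user.has_irb_approval
    # De-identified data: registered users only, not all users.
    return user.is_registered


def request_download(user: User, ds: Dataset, log: AuditLog) -> bool:
    allowed = can_access(user, ds)
    log.record(user, ds, "download", allowed)
    return allowed


if __name__ == "__main__":
    log = AuditLog()
    alice = User("alice", is_registered=True, has_irb_approval=False)
    stripped = Dataset("BIRN-0001", DataClass.DE_IDENTIFIED, True, True)
    raw = Dataset("BIRN-0001-raw", DataClass.LIMITED_DATA_SET, True, True, raw_anatomical=True)
    print(request_download(alice, stripped, log))  # True
    print(request_download(alice, raw, log))       # False: no user IRB approval
    print(len(log.entries))                        # 2
```

Keeping the policy in a single function of the metadata makes the rules easy to review alongside IRB agreements, and routing every request through `request_download` means denied attempts are logged as well as granted ones.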