Requirements for Complex Interactive Workflows in Biomedical Research
Jeffrey S. Grethe, BIRN-CC, University of California, San Diego
e-Science Workflow Services, December 3, 2003

Scientific Workflows
• Laboratory information and study management: procedural flow
• Data acquisition and analysis: data flow
• [Diagram] Stages: Subject Management, Experimental Protocol, Data Collection, Data Preparation and Validation, Data Analysis, Data Deposition

Telescience
• A combination of several independent technologies integrated for the application of biological tomography in a way that fosters collaboration
• [Diagram] Components: Computation, Visualization, Training and Dissemination, Databases & Digital Libraries, Partnership, Network Connectivity, Remote Instrumentation

Telescience Architecture
• The Telescience Portal centralizes the layers of the architecture and presents them to the user as a single sign-on, web-based environment

Telescience Portal-Enabled Tomography Workflow
• The tomography workflow is composed of the sequence of steps required to acquire, process, visualize, and extract useful information from a 3D volume.
• Problems with the non-Portal “traditional” workflow:
  • Applications are heterogeneous and platform specific
  • Spectrum of applications is extremely varied (~20)
    • Simple shell scripts
    • Parallel, Grid-enabled software
    • Commercial software
  • Administration is the responsibility of the user
  • Manual tracking and handling of data
• Advantages of a workflow managed by the Telescience Portal:
  • Applications are centralized behind a common interface
  • Automatic and transparent data management eases deposition into the database
  • Appropriate tools have been merged into single applications

Biomedical Informatics Research Network
Enable new understanding of neurological disease by integrating data across multiple scales, from macroscopic brain function to its molecular and cellular underpinnings
• Federate distributed multiscale brain data
• Accommodate the associated large-scale computational challenges
• Provide infrastructure for a next-generation collaboratory
[Figure: Scales of NS, from Maryann Martone]

What Has BIRN Been Building?
• A stable, robust, shared network and distributed database environment, tailored to the pioneering BIRN collaborations
• Generalizable and extensible tools and IT infrastructure
• Project- and test bed-specific software and scientific workflows
• An interdisciplinary community of BIRN investigators
• Processes to help govern large-scale collaborations and resource sharing

The BIRN Site Map as of October 2003
[Figure: site map]
Shared biomedical IT infrastructure to hasten the derivation of new understanding and treatment of disease through the use of distributed knowledge

Brain Morphometry BIRN
• Examining neuroanatomical correlates of neuropsychiatric illnesses, including unipolar depression, mild Alzheimer’s disease (AD), and mild cognitive impairment (MCI)
• Sites: Harvard (MGH and BWH), Duke, UCLA, UC San Diego, Johns Hopkins

Human Subjects Considerations
• High-resolution structural images can be used as an identifier.
• Reconstruction of the face from raw anatomical data could potentially be used to identify the subject
• Some members of BIRN require or desire unaltered raw data
• Need to be able to provide both sets of data and handle them properly within the system
• BIRN must conform to multiple overlapping regulations
  • Common Rule
  • HIPAA
  • State law
  • International law

BIRN De-identification and Upload Pipeline (BIRNDUP)
• Create sharable output of images from diverse input MRI data using a common data entry software package
• [Diagram] Inputs: DICOM files from GE, Siemens, Picker, and Philips scanners; retrospective data archives (various formats) after conversion to DICOM
• [Diagram] Stages on the local desktop: Sort Images, De-identify, Go/NoGo, Upload
• [Diagram] Step labels: standard directory hierarchy, identify deface series, deface or mask, clean DICOM header, render movies, display movies, QA approval of defacing, extract metadata, optimize for SRB
• [Diagram] Sites: BWH/MGH, Duke, UCI, UCSD
• [Diagram] Destination: Human Image Database (data and provenance)
• Data cannot be exported prior to de-identification and validation

Human Subjects Protection and Workflows
• Security-related metadata
  • All data uploaded within BIRN must have security-related metadata
    • Data classification
    • IRB agreements
    • Subject consent
    • Longitudinal data
  • Access to data is dependent on metadata and access privileges
    • For example, de-identified data cannot be shared with all users
  • Secure environment required for the storage of protected information
• Trust in targeted computation resources
  • Compliance with privacy regulations (e.g. auditing)
  • Ability to trust the actual applications/services accessed
• Auditing of data access and movement required
  • HIPAA
  • Internal security
  • Can distributed auditing/logging meet the above requirement?
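
The requirement that access depends on both security metadata and user privileges, with every access attempt audited, can be illustrated with a minimal Python sketch. This is not BIRN code: the class names (`SecurityMetadata`, `User`, `AuditLog`), the classification labels (`"raw"`, `"defaced"`), and the decision rules are all hypothetical, chosen only to mirror the metadata fields listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SecurityMetadata:
    """Hypothetical security metadata attached to an uploaded dataset."""
    classification: str   # assumed labels: "raw" or "defaced"
    irb_agreement: str    # identifier of the governing IRB agreement
    subject_consent: bool


@dataclass
class User:
    """Hypothetical user record carrying access privileges."""
    name: str
    irb_agreements: set
    may_view_raw: bool = False


@dataclass
class AuditLog:
    """In-memory audit trail; a real system would need tamper-evident storage."""
    records: list = field(default_factory=list)

    def record(self, user, dataset_id, action, allowed):
        self.records.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user.name,
            "dataset": dataset_id,
            "action": action,
            "allowed": allowed,
        })


def can_access(user, meta):
    """Assumed policy: consent, a matching IRB agreement, and a raw-data privilege."""
    if not meta.subject_consent:
        return False
    if meta.irb_agreement not in user.irb_agreements:
        return False
    if meta.classification == "raw" and not user.may_view_raw:
        return False
    return True


def request_access(user, dataset_id, meta, log):
    """Decide on a read request and audit the attempt whether or not it succeeds."""
    allowed = can_access(user, meta)
    log.record(user, dataset_id, "read", allowed)
    return allowed
```

Logging denied requests as well as granted ones is a deliberate choice in this sketch, since an audit of data access is only useful if failed attempts are visible too. Whether such a log can be kept consistently across distributed sites is exactly the open question posed on the slide.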
Morphometry BIRN
[Figure panels]
• Harvard-MGH: Surface-based coordinate system
• Harvard-BWH: Model-based three-dimensional medical image segmentation
• UCLA-LONI: Dynamics of gray matter loss rates, mapped in a schizophrenia population
• Harvard-BWH: 3D-Slicer, an integrated visualization system for surgical planning and guidance using image fusion and interventional imaging

Morphometry Analysis Workflow
Provide researchers with transparent access to a computing environment that supports their natural working paradigm while taking advantage of the evolving grid infrastructure
• Expert users required for some interactive processing
• Data curation requires determination of data quality and validity

Global versus Local Optimization
• Long-running workflows may need to be reoptimized during execution

MIRIAD Project
Study of Major Depression in Late Life
• The MIRIAD collaboration offers the promise of BWH and UCLA tools that provide reduced variance and access to atlas-driven lobar and regional analysis
• [Diagram] Acquisition site and de-identification; BIRN Data Grid; MGH Freesurfer (cortical and subcortical segmentations); JHU Linear Deformation Metric Mapping (shape analysis of segmented structures); BWH 3D Slicer (visualization)

MIRIAD Plan/Data Flow
• Use anonymization at Duke to avoid IRB delays
  • Select 50 depression subjects, baseline and year 2 MRI
  • Select 50 age-comparable normal subjects, baseline and year 2 MRI
  • Select metadata variables that will be needed for analysis
  • Anonymize data, retaining a new BIRN number that links MRI and metadata but cannot be traced to the original subject
• UCLA processing
  • Atlas preparation and orientation/registration (rigid body) of all subjects (also compute HO deformation parameters for segmentation)
  • Registration of BWH atlas to common data set
  • After BWH segmentation, perform atlas-driven lobar analysis
• BWH processing
  • EM segmentation for gray, white, CSF
  • Atlas-driven regional segmentation

LONI Pipeline Environment
• Legacy application from one of the BIRN test bed sites
• Designed by domain scientists for their needs
• LONI Pipeline is not currently designed with the Grid in mind
  • Client-server model where the workflow control client resides on the user’s desktop
  • No means for authentication through user certificates or proxies
  • Does not use standard grid transfer protocols
  • Client must remain running even for extended jobs
  • Does not utilize resource discovery and monitoring to schedule jobs
• Scientists’ requirements
  • Easy to use (nice GUI)
  • Very straightforward way to “wrap” new applications
  • User selects a specific application (version, host)
  • Easy to view status

Function BIRN
• Developing a common fMRI protocol to study regional brain dysfunction related to the progression and treatment of schizophrenia
• Correlating functional data with anatomical data acquired from the Morphometry test bed to study whether there are neuroanatomical correlates of cognitive dysfunction across disorders
• Sites: UCLA, UC San Diego, UC Irvine, Harvard (MGH and BWH), Stanford, Minnesota, Iowa, New Mexico, Duke/U. North Carolina

Function BIRN
[Figure panels]
• Harvard-MGH: Patients with schizophrenia show abnormal modulation of temporal and frontal regions during semantic processing
• Stanford (Lucas Center): Inferior frontal/temporal neocortical network mediates semantic processing of sentences in healthy individuals
• fMRI response at 3T and 1.5T with identical software and hardware platforms (GE SIGNA)

Functional MRI Analysis Workflow
• [Diagram] Labels: MR Scanner, Scanner Parameters, K-Space Images, Reconstruction, Functional Images, Slice Time Correction, Motion Correction, Motion Parameters, Anatomical Image, Anatomical Co-Registration, Data Validation, Valid Data, Anatomical Template, Spatial Normalization, Normalized Anatomical Image, Normalized Functional Images, Experimental Paradigm, Single Subject Statistics, Statistical Map, Results Overlay, Multiple Subject’s Data, Group Statistics
• How do I re-analyze 120 subjects?

Real-Time Experimental Control
• [Diagram] Labels: Scanner Parameters, K-Space Images, Reconstruction, Functional Images, Slice Time Correction, Motion Correction, Anatomical Image, Experimental Paradigm, Single Subject Statistics, Statistical Map, Results Overlay, Experiment Control

http://www.nbirn.net
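
One way to approach the question raised on the Functional MRI Analysis Workflow slide, "How do I re-analyze 120 subjects?", is to record each processing step together with its parameters and to reuse cached upstream results when only a later step changes. The Python sketch below is an illustration under stated assumptions, not the BIRN or LONI Pipeline implementation: the step order is one reading of the diagram labels, the steps are placeholders that perform no image processing, and `WorkflowRunner`, `make_step`, and `analyze_all` are hypothetical names.

```python
import json

# Assumed step order, read off the diagram labels; the real workflow may differ.
STEP_NAMES = [
    "slice_time_correction",
    "motion_correction",
    "anatomical_coregistration",
    "spatial_normalization",
    "single_subject_statistics",
]


def make_step(name):
    """Build a placeholder step that only records what would have been run."""
    def step(state, params):
        entry = (name, json.dumps(params, sort_keys=True))
        # Return a new state; the input state is never mutated.
        return {"subject": state["subject"], "history": state["history"] + [entry]}
    return step


PIPELINE = [(name, make_step(name)) for name in STEP_NAMES]


class WorkflowRunner:
    """Runs the pipeline per subject, caching each step by its full upstream history."""

    def __init__(self, pipeline):
        self.pipeline = pipeline
        self.cache = {}
        self.executed = 0  # number of steps actually executed (cache misses)

    def run(self, subject, params):
        state = {"subject": subject, "history": []}
        key = (subject,)
        for name, func in self.pipeline:
            step_params = params.get(name, {})
            # The key grows with every step, so changing one step's parameters
            # invalidates that step and everything downstream, nothing upstream.
            key = key + ((name, json.dumps(step_params, sort_keys=True)),)
            if key not in self.cache:
                self.cache[key] = func(state, step_params)
                self.executed += 1
            state = self.cache[key]
        return state


def analyze_all(runner, subjects, params):
    """Apply the same parameter set to every subject."""
    return {subject: runner.run(subject, params) for subject in subjects}
```

With 120 subject identifiers, a first call to `analyze_all` executes every step for every subject; a second call that changes only the `single_subject_statistics` parameters re-executes just that final step per subject. The recorded history doubles as simple provenance for each result.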