caBIG Data Structures
CS584 Lecture on 4/6/2007
Patrick McConnell, Duke Comprehensive Cancer Center
patrick.mcconnell@duke.edu

Agenda
• caBIG background (5 min, 8 slides)
  • Goals, program structure, organizations
• caTRIP background (5 min, 6 slides)
  • Background, use cases, architecture
• caBIG compatibility (30 min, 21 slides + demonstration)
  • Interoperability, compatibility, syntactics, and semantics
• Building caBIG compatible systems (10 min, 7 slides)
  • Interoperability, compatibility, syntactics, and semantics
• caGrid (10 min, 8 slides)
  • Background, service creation, metadata
• caTRIP demonstration (10 min, 2 slides + demo)
  • Demonstration
• Discussion/questions (5 min + throughout)

caBIG Background
Goals, program structure, organizations

caBIG background: Biomedical information tsunami
• Overwhelming volume of data
• Multitude of sources

caBIG background: Informatics tower of Babel
• Each cancer research community speaks its own scientific “dialect”
• Integration is critical to achieve the promise of molecular medicine

caBIG background: Goals and principles
• 50 Cancer Centers are working with the National Cancer Institute Center for Bioinformatics (NCICB) towards a common goal: integrated data, tools, and methodologies that accelerate cancer research, delivered through the cancer Biomedical Informatics Grid (caBIG™)
• The goal of caBIG™ is to create a virtual web of interconnected data, individuals, and organizations which will redefine how:
  • research is conducted
  • care is provided
  • patients / participants interact with the biomedical research enterprise
• The principles driving caBIG™ are:
  • Open Source
  • Open Access
  • Open Development
  • Federated Model

caBIG background: caBIG facilitates sharing

caBIG background: Workspaces
Domain Workspace 1
Clinical Trial Management Systems addresses the need for consistent, open, and comprehensive tools for clinical trials management.
Domain Workspace 2
Integrative Cancer Research provides tools and systems to enable integration and sharing of information.
Domain Workspace 3
Tissue Banks & Pathology Tools provides for the integration, development, and implementation of tissue and pathology tools.
Domain Workspace 4
Imaging provides for the sharing and analysis of in vivo imaging data.
Cross-Cutting Workspace 1
Vocabularies & Common Data Elements is responsible for evaluating, developing, and integrating systems for vocabulary and ontology content, standards, and software systems for content delivery.
Cross-Cutting Workspace 2
Architecture is responsible for developing architectural standards and the architecture necessary for other workspaces.

caBIG background: Communities
9Star Research; Albert Einstein; Ardais; Argonne National Laboratory; Burnham Institute; California Institute of Technology-JPL; City of Hope; Clinical Trial Information Service (CTIS); Cold Spring Harbor; Columbia University-Herbert Irving; Consumer Advocates in Research and Related Activities (CARRA); Dartmouth-Norris Cotton; Data Works Development; Department of Veterans Affairs; Drexel University; Duke University; EMMES Corporation; First Genetic Trust; Food and Drug Administration; Fox Chase; Fred Hutchinson; GE Global Research Center; Georgetown University-Lombardi; IBM; Indiana University; Internet 2; Jackson Laboratory; Johns Hopkins-Sidney Kimmel; Lawrence Berkeley National Laboratory; Massachusetts Institute of Technology; Mayo Clinic; Memorial Sloan Kettering; Meyer L. Prentis-Karmanos; New York University; Ohio State University-Arthur G. James/Richard Solove; Oregon Health and Science University; Roswell Park Cancer Institute; St Jude Children's Research Hospital; Thomas Jefferson University-Kimmel; Translational Genomics Research Institute; Tulane University School of Medicine; University of Alabama at Birmingham; University of Arizona; University of California Irvine-Chao Family; University of California, San Francisco; University of California-Davis; University of Chicago; University of Colorado; University of Hawaii; University of Iowa-Holden; University of Michigan; University of Minnesota; University of Nebraska; University of North Carolina-Lineberger; University of Pennsylvania-Abramson; University of Pittsburgh; University of South Florida-H. Lee Moffitt; University of Southern California-Norris; University of Vermont; University of Wisconsin; Vanderbilt University-Ingram; Velos; Virginia Commonwealth University-Massey; Virginia Tech; Wake Forest University; Washington University-Siteman; Wistar; Yale University; Northwestern University-Robert H. Lurie

caBIG background: Duke’s role in caBIG
• Pankaj Agarwal
• Bob Annechiarico
• Bill Banks
• Vijaya Chadaram
• Jamie Cuticchia
• Raj Dash
• Mohammad Farid
• Seth Fehrs
• Patrick McConnell
• Salvatore Mungal
• Mark Peedin

• CALGB
• CCR
• Coalition of Cooperative Groups
• Dana Farber
• Georgetown
• Mayo
• Oregon Health Sciences University
• SemanticBits LLC
• University of Pennsylvania
• Wake Forest
• Yale

• Integrative Cancer Research
  • Workspace participant
  • RProteomics developer
  • caTRIP developer
• Architecture
  • Workspace participant
  • caGrid developer
  • caGrid scientific liaison
  • Guide to Mentors
• Vocabularies and Common Data Elements
  • Workspace participant
  • Guide to Mentors
• Clinical Trials Management Systems
  • Workspace participant
  • C3PR developer
  • CTMS Interoperability architect
  • C3D developer
• Tissue Banking and Pathology Tools
  • Workspace participant
  • caTissue adopter
• Strategic Planning
  • Workspace participant

The Cancer Translational Research Informatics Platform (caTRIP)
Background, use cases, architecture

caTRIP: Who is involved?
• Duke Bioinformatics
  • Jamie Cuticchia (PI)
  • Patrick McConnell (lead architect)
• Duke Information Systems
  • Bob Annechiarico (PM)
  • Wilma Stanley (developer)
  • Mark Peedin (developer)
  • Mohamad Farid (DBA)
  • Jeff Allred (IT manager)
• Duke Pathology
  • Raj Dash (domain expert)
  • Chris Hubbard (developer)
• Duke Oncology
  • Kelley Marcom (domain expert)
  • Gretchen Kimmick (domain expert)
  • Kimberly Blackwell (domain expert)
  • Lee Wilke (domain expert)
• Duke CALGB
  • Kimberly Johnson (DataMart liaison)
• SemanticBits
  • Ram Chilukuri (lead developer)
  • Srini Akkala (developer)
  • Sanjeev Agarwal (developer)
• 5 AM Solutions
  • Bill Mason (developer)
• NCI
  • Julie Klemm (ICR WS lead)
  • Carl Shaefer (NCI rep)
  • Subha Madhavan NCI/BAH (caIntegrator PM)
• BAH
  • Curtis Lockshin
  • Mehul Shah (tech support)
Role groups called out on the slide: Managers and Architects; Database Developers and IT; Domain Experts; Software Developers

caTRIP: What is translational research?
• Bench-to-Bedside
• Wikipedia (the source of all knowledge): "Translational medicine is a branch of medical research that attempts to more directly connect basic research to patient care."
  • Basic research occurs in the lab
  • Patient care occurs in the clinic
• Translational research broadened: "Translational medicine can also have a much broader definition, referring to the development and application of new technologies in a patient driven environment - where the emphasis is on early patient testing and evaluation."
• "…facilitate the interaction between basic research and clinical medicine, particularly in clinical trials."

caTRIP: Initial focus
• Our initial focus will be on connecting existing data systems, including basic science data, to enhance patient care
• Initial problem scenario: outcomes analysis
  • Use data from existing patients to inform the treatment of another patient
  • Leverage clinical, pathology, tissue, and basic science data
• Scenario: Patient A enters the clinic. What treatments were applied with success on other patients with similar characteristics (race, sex, symptoms, pathology results, adverse events, biomarkers)?

caTRIP: Broadened focus: scientific use cases
• Find available tumor tissue
  • What are all the tissue specimens from HER2/neu positive patients that have a primary tumor in the breast and are BRCA1 positive?
• Find factors of survival
  • What are all the ER positive patients that have survived breast cancer after radiation treatment?
• Find patients for trials
  • What are all the patients that are triple negative (ER, PR, and HER2/neu negative)?
• Determine the distribution of disease factors over time
  • Does a change in pathology biomarkers over time contribute to recurrence or death?
• Determine correlation of factors pre and post surgery
  • Does a change in ER or PR status before and after surgery correlate with other factors?
• Find pathology reports of interest
  • Show me all of the pathology reports for HER2/neu positive patients with a lobular carcinoma.
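
Each use-case question above boils down to a predicate over patient records. A minimal sketch of the "triple negative" question, assuming a hypothetical flattened record: the class `PatientRecord`, its field names, and the sample values are invented for illustration and are not the caTRIP data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical flattened patient record (not the caTRIP schema)."""
    mrn: str        # medical record number
    er: str         # estrogen receptor status: "positive" or "negative"
    pr: str         # progesterone receptor status
    her2_neu: str   # HER2/neu status


def is_triple_negative(patient: PatientRecord) -> bool:
    """True when ER, PR, and HER2/neu are all negative."""
    return all(
        status == "negative"
        for status in (patient.er, patient.pr, patient.her2_neu)
    )


def find_triple_negative(patients):
    """Return the MRNs of all triple negative patients, in input order."""
    return [p.mrn for p in patients if is_triple_negative(p)]


# Invented sample data, for illustration only.
SAMPLE_PATIENTS = [
    PatientRecord("MRN-001", "positive", "negative", "negative"),
    PatientRecord("MRN-002", "negative", "negative", "negative"),
    PatientRecord("MRN-003", "negative", "negative", "positive"),
]

if __name__ == "__main__":
    print(find_triple_negative(SAMPLE_PATIENTS))
```

In practice the three statuses may live in different source systems, which is why the questions need more than a single-table filter.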

caTRIP: Connecting disparate data systems
[Figure: data systems connected around the MRN. Labels: CAE; Pathology Biomarkers; Tumor Registry; caTissue CORE; Diagnosis, Treatment, Recurrence, Follow-up; Tissue Bank; MRN; caIntegrator; caTIES; SNP Data; Pathology Reports]

caTRIP: Architecture overview
[Figure: caTRIP architecture. Labels: GUI; Distributed Query Engine; Domain Grid Services; Core Grid Services; IdP Service; Index Service; Grid Grouper; authorize; caTissue CORE; caTIES; CAE; CGEMS SNP; TR; Duke; caIntegrator; Domain Controller; MAW3; Tumor Registry; Illumina]

caBIG Compatibility
Interoperability, compatibility, syntactics, and semantics

caBIG compatibility: Interoperability defined (courtesy: Charlie Mead)
• Interoperability: ability of a system to access and use the parts or equipment of another system
• Syntactic interoperability
• Semantic interoperability

caBIG compatibility: How does this apply to caBIG?
• Connect scientists and practitioners through a shareable and interoperable infrastructure
• Develop standard rules and a common language to more easily share information (compatibility guidelines)
• Build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care
“The cancer community is united in its mission to eliminate suffering and death due to cancer. It is now connected by caBIG™.”

caBIG compatibility: What is compatibility in caBIG?
The four areas of the caBIG compatibility guidelines:
• Information Models - Individual types of data are rarely collected or presented in isolation. Rather, they are assembled into a contextual environment that includes closely and more distantly associated data and information. These associations and relationships can be presented in the form of an information model.
• CDEs - Data that is collected on a given study or trial must be defined and described such that remote users of that data can understand what it means. These metadata descriptions are referred to as data elements.
• Vocabularies and Ontologies - Biomedical information includes a substantial body of specialized concepts that are represented by terms. Agreement upon the basic concepts, terms, and definitions that are inherent in all biomedical information is essential for achieving semantic interoperability.
• Programming and Messaging Interfaces - Computer programs and the people who write them are able to access resources from other programs through programming and messaging interfaces. Each of these interfaces responds to a particular syntax for its communications. Agreement upon standards for these interfaces is necessary to overcome barriers to syntactic interoperability.

caBIG compatibility: Levels of compatibility
The four levels of the caBIG™ compatibility guidelines:
• Legacy - Implies no interoperability with an external system or resource. A system that was designed without awareness of or prior to the availability of these compatibility guidelines, and which does not meet any of the requirements for interoperability.
• Bronze - Classifies the minimum requirements that must be met to achieve a basic degree of interoperability.
• Silver - A rigorous set of requirements that, when met, significantly reduce the barrier to use of a resource by a remote party who was not involved in the development of that resource.
• Gold - Currently being defined by caBIG. Is expected to provide for a formalized grid architecture and data standards that will enable standardized advertising, discovery, and use of all federated caBIG resources.

caBIG compatibility: caBIG compatibility guidelines
[Figure: compatibility guidelines matrix. Labels: Syntactic; Semantic; Semantic & Syntactic]

caBIG compatibility: Syntactic interoperability
• The solution for syntactic interoperability in caBIG at the silver level of compatibility is for all systems to provide an object-oriented Application Programmer Interface (API).
• Object-oriented interfaces can be implemented in many programming languages.
• This interface can be connected to the caGrid so that the local data repository is globally accessible in a language-independent way.
• The interface is described by an information model, which acts as the junction between the syntactic components and the semantic components.
[Figure: UML class Gene with attributes name: String, hugoGeneSymbol: String, sequence: String]

caBIG compatibility: Programming and messaging interfaces
• Types of APIs
  • Client APIs in a programming language
  • Messaging APIs via a messaging protocol
• Types of systems
  • Data services provide access to an information model
    • Query method
    • Associations are “traversable”
  • Analytical services provide methods to manipulate data
  • Hybrid services provide methods to manipulate information models
  • Analytical tools consume silver-compatible data, but don’t produce it

caBIG compatibility: Programming and messaging interfaces details
• Legacy: No programmatic interfaces to the system are available. Only local data files in a custom format can be read. Data transfer mechanisms are implemented only on an ad hoc basis.
• Bronze: Programmatic access to data from an external resource is possible.
• Silver: Well-described APIs provide access to data in the form of data objects. Standards-based electronic data formats are supported for both input to and output from the system. Standards-based messaging protocols are supported wherever messaging is relevant.
• Gold: All features of Silver, plus: service-oriented components produce or consume resources in the form of grid services. Interoperable with the data grid architecture to be defined by caBIG.
• Examples (in slide order, Legacy to Gold): executables; proprietary API/data format; JavaDocs; XML, ASN.1; SOAP, CORBA; Globus; caGrid-based services

caBIG compatibility: caTRIP API
• Hyperlinks to caTRIP API docs

caBIG compatibility: caTRIP grid service WSDL
• Hyperlinks to caTRIP API WSDL

caBIG compatibility: caTRIP grid service WSDL
• Hyperlinks to caTRIP FQP UML
[Figure: UML logical model of the federated query processor. Recoverable classes and operations:]
  ResultAggregator and engine::FederatedQueryProcessor
    + processDCQLQueryPlan(DCQLQueryDocument) : CQLQuery
    + aggregateGroups(Group[]) : Group
    + buildGroup(List) : Group
    + processResults(CQLQueryResults) : List
  engine::FederatedQueryExecutor
    + executeCQLQuery(CQLQuery, String) : CQLQueryResults
  «interface» engine::FederatedQueryEngine
    + execute(Document) : CQLQueryResults
  Object
  ServiceClientFactory
    + getSeviceClient() : Object
  caGridDataService1Client
    + query(CQLQuery) : CQLQueryResults
  caGridDataService2Client
    + query(CQLQuery) : CQLQueryResults
  Association labels: executes / obtains; executes

caBIG compatibility: Semantic interoperability
• The solution for semantic interoperability lies in object-oriented UML design of the service, an unambiguous description of elements within the system, and storage of the description in a publicly accessible repository (metadata).
  • UML model
  • Use of publicly accessible terminologies/vocabularies/ontologies (EVS-NCI Thesaurus)
  • Use of publicly accessible metadata repository (caDSR)

caBIG compatibility: Common data element (CDE) details
• Legacy: No structured metadata is recorded.
• Bronze: Data element descriptions have sufficient detail for a subject matter expert to unambiguously interpret. Data elements are built using controlled terminology. Metadata is stored and publicized in an electronic format that is separate from the resource that is being described.
• Silver: Common Data Elements (CDEs) built from controlled terminologies and according to practices validated by the VCDE workspace are used throughout. CDEs are registered as ISO/IEC 11179 metadata components in the cancer Data Standards Repository (caDSR).
• Gold: All features of Silver, plus: Common Data Elements (CDEs) designated as caBIG Standards by the VCDE workspace are used. Metadata is advertised and discoverable via the caBIG grid services registry.
• Examples (in slide order, Legacy to Gold): free-text pathology reports; GeneOntology from GO website; NCI Thesaurus; GeneOntology registered in EVS; NCI Thesaurus

caBIG compatibility: Metadata stored in caDSR
• Storage of metadata
  • caDSR = cancer Data Standards Repository
  • Common Data Elements = CDEs
  • Enable end-users to access information about data and use these services without having to access human developers
  • = Fusion of UML models + Concepts/Definitions
Screenshot callouts:
• Navigation Menu: buttons to navigate to the CDE cart, Form Builder, or back to Home (that is, back to this page).
• caDSR Search Tree: Displays all the current caDSR Contexts. Users can search for groups of DEs by navigating the tree.
• Data Element Search Pane: This is the main search window. Users looking for Data Elements can enter a key word or phrase.
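
The "fusion of UML models + concepts/definitions" can be pictured as a small data structure. The sketch below follows the ISO/IEC 11179 composition (object class + property = data element concept; data element concept + value domain = data element) and reuses the Gene / entrezGeneID example that appears later in this deck. The Python class names, the `cde_from_uml` helper, and the `concept_code` placeholder values are assumptions made for illustration; this is not the caDSR API, and real concept codes come from EVS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    """Derived from a UML class; annotated with a concept code."""
    name: str
    concept_code: str


@dataclass(frozen=True)
class Property:
    """Derived from a UML attribute; annotated with a concept code."""
    name: str
    concept_code: str


@dataclass(frozen=True)
class ValueDomain:
    """Derived from the UML datatype."""
    datatype: str


@dataclass(frozen=True)
class DataElementConcept:
    """Object class + property."""
    object_class: ObjectClass
    prop: Property


@dataclass(frozen=True)
class CommonDataElement:
    """Data element concept + value domain."""
    concept: DataElementConcept
    value_domain: ValueDomain

    def long_name(self) -> str:
        dec = self.concept
        return f"{dec.object_class.name} {dec.prop.name} {self.value_domain.datatype}"


def cde_from_uml(class_label, attribute_label, datatype,
                 class_code="CODE-CLASS", attribute_code="CODE-ATTR"):
    """Hypothetical helper: assemble a CDE from curated UML labels.

    The default codes are placeholders, not real EVS concept codes.
    """
    dec = DataElementConcept(ObjectClass(class_label, class_code),
                             Property(attribute_label, attribute_code))
    return CommonDataElement(dec, ValueDomain(datatype))


# UML class Gene, attribute entrezGeneID (curated label shown), datatype String.
GENE_ID_CDE = cde_from_uml("Gene", "Entrez Gene Genomic Identifier", "java.lang.String")

if __name__ == "__main__":
    print(GENE_ID_CDE.long_name())
```

Keeping the concept codes on the object class and property, rather than on the data element, mirrors the idea that meaning is attached to the parts and the element inherits it.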

caBIG compatibility: caTRIP CDEs
• Hyperlinks to caTRIP CDEs

caBIG compatibility: Vocabulary/terminology details
• Legacy: Free text used throughout for data collection.
• Bronze: Use of publicly accessible controlled vocabularies as well as local terminologies. Terminologies must include definitions of terms that meet caBIG VCDE workspace guidelines.
• Silver: Terminologies reviewed and validated by the caBIG Vocabulary/Common Data Element (VCDE) Workspace used for all relevant data collection fields.
• Gold: All features of Silver, plus: full adoption of caBIG terminology standards as approved by the VCDE Workspace.
• Examples (in slide order, Legacy to Gold): free-text pathology reports; GeneOntology from GO website; NCI Thesaurus; GeneOntology registered in EVS; NCI Thesaurus

caBIG compatibility: Publicly accessible terminologies
Enterprise Vocabulary Services
• Controlled vocabulary resources for the cancer research community
• Vocabulary Products and Services
  • NCI Thesaurus
  • NCI Metathesaurus
  • External Vocabularies
• NCI Thesaurus - controlled vocabulary source for metadata
  • Has excellent coverage of cancer terminology
  • Expands based on needs for additional terminology
  • Based on concepts rather than terms
  • Each concept has a unique identifier (CUI) with definitions and synonyms
  • Housed by the Enterprise Vocabulary Service (EVS)
• LexBIG
  • A caBIG-funded vocabulary server to enable a federated vocabulary environment

caBIG compatibility: caTRIP CDEs
• Hyperlinks to a caTRIP concept

caBIG compatibility: Information model (UML) details
• Legacy: No model describing the system is available in electronic format.
• Bronze: Diagrammatic representation of the information model is available in electronic format.
• Silver: Information models are defined in UML as class diagrams and are reviewed and validated by the VCDE workspace.
• Gold: All features of Silver, plus: information models are harmonized across the caBIG Domain Workspaces.
• Examples: [Figure: a database diagram and the UML class diagram "StatML" (classes statml::Data, statml::Array, statml::Scalar, statml::List, statml::Null)]

caBIG compatibility: Domain information modeling
• Domain Information Models consist of ‘Classes’ that represent ‘things’ in the real world
• Classes contain ‘attributes’ that are characteristics of different instances of things in the real world
• Relationships between the classes are described by ‘associations’ and indicated by lines with directionality and cardinality
• Each class plus attribute creates one Common Data Element (CDE)
• A Domain Information Model is a representation of our understanding of an area of knowledge
[Figure: UML class diagram "Central Dogma". Gene (name: String, hugoGeneSymbol: String, sequence: String); Transcript (sequence: String, length: String); Protein (name: String, aminoAcidSequence: String, molecularWeight: double). Associations: gene 1 to transcriptCollection 1..*; transcript 1 to protein 1]

caBIG compatibility: Tumor Registry model
• Hyperlinks to caTRIP UML
[Figure: Tumor Registry UML model. Labels: Diagnosis; Participant; Collaborative Staging; Follow up and Recurrence; Treatment]

Building caBIG Compatible Systems

Building caBIG compatible systems: Steps for creating an analytical system
• Step 1: model and register metadata
  • Model the domain objects
  • Register metadata
• Step 2: implement the analytical system
  • Implement an interface
  • Map data objects to existing inputs
  • Plug-in analytics
• Step 3: create the data service
  • Create an XML Schema
  • Use the caGrid 1.0 Introduce toolkit to create a service
  • Configure the service
  • Deploy
• Step 4: invoke the service
  • Java-based client
  • Use caTRIP

Building caBIG compatible systems: Steps for creating a data system
• Step 1: model and register metadata
  • Model the domain objects
  • Register metadata
• Step 2: implement the information system
  • Model the databases (via scripts or EA)
  • Build the database
  • Generate Java beans
  • Create Hibernate mappings
  • Jar it all up
• Step 3: create the data service
  • Create an XML Schema
  • Use the caGrid 1.0 Introduce toolkit to create a service
  • Configure the service
  • Deploy
• Step 4: invoke the service
  • Java-based client
  • Use caTRIP

Building caBIG compatible systems: N-tier architecture
[Figure: N-tier architecture. Labels: advertise; Index Service; caGrid Data Service; caCORE SDK; CQL Query; Distributed Query Engine; CQL Engine; domain model]
[Figure labels, continued: object-relational mapping; database]

Building caBIG compatible systems: caCORE SDK
[Figure: caCORE SDK workflow. Labels: Vocabularies; Info Model; EVS; UML Model; XMI File; Semantic Integration Workbench (SIW); Fixed XMI; Verified EVS Report; Terminology Services; Using CodeGen? (yes/no); Successful Test? (yes/no); caDSR Services; Verified Annotated Fixed XMI; Load to Stage; Compatibility Review; Messaging Interfaces/API; Code Generator; UML Loader; Common Data Elements; Stage; Prod; Public APIs; Approved Annotated Fixed XMI; Metadata Retrieval; caDSR Production; caDSR STAGE]

caBIG compatibility: Mapping UML to CDEs
• UML class + attribute + datatype -> Common Data Element (CDE)
• UML class + attribute -> Data Element Concept (DEC)
• UML datatype -> Value Domain (VD)
• UML class -> Object Class (OC)
• UML attribute -> Property
• EVS Concept

caBIG compatibility: Mapping UML to CDEs example
• UML input: Class: Gene; Attribute: entrezGeneID; Datatype: String
• Created Data Element: Gene Entrez Gene Genomic Identifier java.lang.String

caBIG compatibility: Use SIW to designate existing CDEs

caGrid
Background, service creation, metadata

caGrid: What is caGrid?
• What is Grid?
  • Evolution of distributed computing to support sciences and engineering
  • Sharing of resources (computational, storage, data, etc.)
  • Secure access (global authentication, local authorization, policies, trust, etc.)
  • Open standards
  • Virtualization
• What is caGrid?
  • Development project of Architecture Workspace
  • Helping define and implement Gold Compliance
  • Implementation of Grid technology
  • Leverages open standards, community open source projects
  • No requirements on implementation technology necessary for compliance
  • Specifications will be created defining requirements for interoperability
  • caGrid provides core infrastructure and tooling to provide “a way” to achieve Gold compliance
  • Gold compliance creates the G in caBIG™
  • Gold => Grid => connecting Silver Systems

caGrid: Metadata infrastructure goals
• Support strongly typed grid
• Syntactic and semantic interoperability
• Programmatic!
• Smooth transition from Application to Grid and back
• Leverage wealth of existing metadata
• Enable service Advertisement and Discovery

caGrid: Service development process
• Service developers first create a service using a simple wizard to specify information (target directory, type of service, service name, etc.)
• Next, developers locate the data types they will use for inputs or outputs
  • Can be discovered from the caDSR, GME, file system, etc.
• Operations are then defined that take some number of the data types as input, and produce some number as output
• Metadata and Service Properties can be added and configured
• The service’s security can be completely configured
• Some or all of these steps may be automatically handled by extensions

caGrid: Introduce
• GUI for creating and manipulating a grid service
• Provides means of simple creation of service skeleton that a developer can then implement, build, and deploy
• Automatic code generation of complete caBIG compliant grid service which is configured to provide:
  • Advertisement
  • Standard Metadata
  • Security
  • Complete Client API

caGrid: caGrid data description infrastructure
• Client and service APIs are object oriented, and operate over well-defined and curated data types
• Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)
• Object definitions draw from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described
• XML serializations of objects adhere to XML schemas registered in the Global Model Exchange (GME)
[Figure: data description infrastructure. Labels: Core Services; GME (Global Model Exchange); Cancer Data Standards Repository; Enterprise Vocabulary Services; Registered In; Semantically Described In; Object Definitions; WSDL; Data Type Definitions; XSD; Service Definition; Client Uses; Validates Against; Service; Service API; Grid Service; Objects; Serialize To XML; Client; Grid Client; Client API]

caGrid: Metadata services
• Cancer Data Standards Repository (caDSR)
  • caBIG projects register their data models as Common Data Elements (CDEs), which are semantically harmonized and then centrally stored and managed in the caDSR
  • The caDSR grid service provides:
    • Model discovery and traversal
    • caGrid standard metadata generation capabilities
• Enterprise Vocabulary Services (EVS)
  • EVS is a set of services and resources that address the need for controlled vocabulary
  • The EVS grid service provides:
    • Query access to the data semantics and controlled vocabulary managed by the EVS
• Global Model Exchange (GME)
  • GME is a DNS-like data definition registry and exchange service that is responsible for storing and linking together data models in the form of XML schema
  • The GME grid service provides:
    • Access to the authoritative structural representation of data types on the grid
• Globus Information Services: Index Service
  • The Globus Information Services infrastructure provides a generic framework for aggregation of service metadata, a registry of running Grid services, and a dynamic data-generating and indexing node, suitable for use in a hierarchy or federation of services
  • The Index grid service provides:
    • Yellow and white pages for the grid

caGrid: caGrid production environment

The Cancer Translational Research Informatics Platform (caTRIP)
Demonstration

caTRIP: Clinical and research scenarios
• Clinical scenario for demonstration
  • A patient enters the clinic and is diagnosed with a lobular carcinoma
  • The HER2/neu biomarker test comes back positive
  • What are the treatments and outcomes of other patients with similar characteristics?
    • Query for diagnosis date, treatment, treatment date, survival, recurrence, and BRCA1 and BRCA2 status
    • Look for treatments given with success, and for correlation with BRCA status in case the test should be ordered
• Research scenario for demonstration
  • Is there a correlation between recurrence, mortality, histologic grade, and HER2/neu status for breast cancer patients diagnosed with lobular carcinoma?
    • Query caTRIP for recurrence type, date of death, histologic grade, and HER2/neu status for patients diagnosed with lobular carcinoma
    • Correlation is determined in Microsoft Excel
  • Investigate gene biomarkers that correlate with a HER2/neu status of negative and survival
    • Query caTRIP for all available tissue to order for microarray experiments
• Query sharing
  • What are all the triple negative patients?

caTRIP: Why the Simple GUI?
• What are all the tissue specimens from HER2/neu positive patients that have a primary tumor in the breast and are BRCA1 positive?
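
Answering this single question means combining several systems that share only the participant's medical record number. A rough sketch of the join a distributed query has to perform, assuming four hypothetical in-memory sources: the source names, field names, sample rows, and the assignment of each field to a source are all invented for illustration. The real services are queried over the grid (the deck's UML mentions CQL and DCQL), not through Python lists.

```python
def mrns_where(rows, field, value):
    """Return the set of MRNs whose row has the given field value."""
    return {row["mrn"] for row in rows if row.get(field) == value}


def specimens_for_question(biomarkers, registry, genotypes, specimens):
    """Specimen IDs for HER2/neu positive, breast primary, BRCA1 positive patients.

    Each source is filtered on its own, the matching MRNs are intersected,
    and the specimen source is then restricted to the surviving MRNs.
    """
    matching_mrns = (
        mrns_where(biomarkers, "her2_neu", "positive")
        & mrns_where(registry, "primary_site", "breast")
        & mrns_where(genotypes, "brca1", "positive")
    )
    return sorted(s["specimen_id"] for s in specimens if s["mrn"] in matching_mrns)


# Invented sample data, for illustration only.
BIOMARKERS = [
    {"mrn": "A", "her2_neu": "positive"},
    {"mrn": "B", "her2_neu": "positive"},
    {"mrn": "C", "her2_neu": "negative"},
]
REGISTRY = [
    {"mrn": "A", "primary_site": "breast"},
    {"mrn": "B", "primary_site": "lung"},
    {"mrn": "C", "primary_site": "breast"},
]
GENOTYPES = [
    {"mrn": "A", "brca1": "positive"},
    {"mrn": "B", "brca1": "positive"},
    {"mrn": "C", "brca1": "positive"},
]
SPECIMENS = [
    {"specimen_id": "S-1", "mrn": "A"},
    {"specimen_id": "S-2", "mrn": "B"},
    {"specimen_id": "S-3", "mrn": "C"},
    {"specimen_id": "S-4", "mrn": "A"},
]

if __name__ == "__main__":
    print(specimens_for_question(BIOMARKERS, REGISTRY, GENOTYPES, SPECIMENS))
```

Writing this by hand for every question is the burden a simple query GUI is meant to remove: the user states the criteria and the engine works out which sources to ask and how to join them.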
[Figure labels: caTissue CORE, CAE, Tumor Registry, CGEMS; Participant Medical Record Number]

Discussion/questions

Backup Slides

CTMS Interoperability Project
Goals, scope, BRIDG, architecture, demo

CTMSi: A collaborative effort
11 Organizations
• Booz Allen Hamilton
• Dana-Farber
• Duke University
• Ekagra
• Harvard University
• Mayo Clinic
• NCICB
• Nortel Government Solutions
• Northwestern University
• ScenPro
• SemanticBits
8 Locations
• Maryland
• Minnesota
• Virginia
• Georgia
• Massachusetts
• North Carolina
• Illinois
• France
35+ Team Members / 5 Applications
• Cancer Central Clinical Participant Registry (C3PR)
• Cancer Central Clinical Database (C3D)
• Patient Study Calendar (PSC)
• caXchange: LabViewer and the Clinical Trials Object Model (CTOM)
• Cancer Adverse Events Reporting System (caAERS)
8 Roles
• Analysts
• Architects
• Developers
• Project Director
• Project Manager
• Project Sponsor
• Project Tech Leads
• Subject Matter Experts

CTMSi: Credits
Project Director: Meg Gronvall (BAH), Charles N. Mead, M.D. (BAH)
NCICB CTMS Lead: Christo Andonyadis, D.Sc. (NCICB)
Project Manager: Edmond Mulaire (SemanticBits)
Project Architects: Patrick McConnell (Duke), Niket Parikh (BAH)
Analysts: Smita Hastak (ScenPro), Wendy Ver Hoef (ScenPro)
Subject Matter Experts: Sharon Elcombe (Mayo Clinic), Vijaya Chadaram (Duke), Jomol Mathew (Dana-Farber), Renee Webb (Northwestern)
NCICB Systems Support: Gavin Brennan (TerpSys), Vanessa Caldwell (TerpSys), Doug Kanoza (TerpSys), Wei Lu (TerpSys), Ralph Rutherford (TerpSys)
Project Technical Leads: Ram Chilukuri (SemanticBits), Charles Griffin (Ekagra), Vinay Kumar (SemanticBits), Stephen Reckford (Nortel Government Solutions), Rhett Sutphin (Northwestern), Sean Whitaker (Northwestern)
caAERS: Ram Chilukuri (SemanticBits), Krikor Krumlian (Akaza Research), Vinay Kumar (SemanticBits), Rhett Sutphin (Northwestern), Kulasekaran Sethumadhavan (SemanticBits), Sujith Thayylithodi (SemanticBits)
caGrid: Manav Kher (SemanticBits), Vinay Kumar (SemanticBits), Joshua Phillips (SemanticBits)
caXchange (Lab Viewer/CTOM): Charles Griffin (Ekagra), Smita Hastak (ScenPro), Mukesh Mediratta (Ekagra), Kunal Modi (Ekagra), Wendy Ver Hoef (ScenPro)
caXchange Extensions: Ekagra, SemanticBits
C3D: Srinivas Batchu (Ekagra), Patrick Conrad (Ekagra), Rangaraju Gadiraju (Ekagra), Stephen Reckford (Nortel)
C3PR: Kruttik Aggarwal (SemanticBits), Ram Chilukuri (SemanticBits), Ramakrishna Gundala (SemanticBits), Manav Kher (SemanticBits), Patrick McConnell (Duke), Priyatam Mudivarti (SemanticBits)
PSC: Rhett Sutphin (Northwestern), Sean Whitaker (Northwestern)

CTMSi: Goal
[Figure labels: Integrate; Lab Results, Participant Registration, Patient Scheduling, Adverse Events, Clinical Trials DB; caXchange, caGrid]

CTMSi: BRIDG extract
cd CTMS Interoperability BRIDG-Based Analysis Model for Data Exchange
Name: CTMS Interoperability BRIDG-Based Analysis Model for Data Exchange
Author: Smita Hastak
Version: 1.0
Created: 8/13/2001 12:00:00 AM
Updated: 1/12/2007
9:50:44 AM

[UML class diagram text, in extraction order]
Clinical Research Entities and Roles::Person + + - administrativeGenderCode: BRIDGCodedConcept dateOfBirth: dateTime ethnicGroup: string firstName: string lastName: string race: string
PersonRole In implementation: do NOT use identifier C3PR only uses Participation SubjectIdentifier
Clinical Research Activities and Participation::StudySubject
Clinical Research Entities and Roles::Participant ::Person + administrativeGenderCode: BRIDGCodedConcept + dateOfBirth: dateTime - ethnicGroup: string 1 - firstName: string - lastName: string - race: string ::Role + id: BRIDGID Subject 0..* + studySubjectIdentifier: BRIDGID ::Participation + endDate: dateTime + identifier: BRIDGID + startDate: dateTime = + status: BRIDGStatus 1 Identifier + + 0..*
BRIDG Shared Classes::Activity + + + 0..* + 1 Eligibility StudyParticipantEligibility + 0..* 1 + + Participation
Clinical Research Activities and Participation::StudySite OrganizationRole + + identifier: BRIDGID name: string
Clinical Research Entities and Roles::HealthCareSite Site 1 ::Organization + identifier: BRIDGID + name: string ::Role + id: BRIDGID ::Participation + endDate: dateTime 0..* + identifier: BRIDGID + startDate: dateTime = + status: BRIDGStatus
Clinical Research Activities and Participation::Study + + Adverse Events endDateTime: dateTime startDateTime: dateTime Study 1
Clinical Research Entities and Roles::Organization isEligible: boolean
Clinical Research Activities and Participation::PerformedActivity 0..* identifier: BRIDGID type: BRIDGCodedConcept codedDescription: BRIDGCodedConcept description: BRIDGDescription status: BRIDGStatus type: BRIDGCodedConcept +are performed at 1..* In implementation: do NOT use endDate, startDate, status +participate in 1 id: BRIDGID longTitle: string Observation
Observations::AdverseEvent - verbatimTerm: String ::Activity + codedDescription: BRIDGCodedConcept + description: BRIDGDescription + status: BRIDGStatus + type: BRIDGCodedConcept Labs
Clinical Research Activities and Participation::LabTest +labTest +labResult 1 0..1 ObjectiveResult QuantitativeMeasurement
Clinical Research Activities and Participation::LabResult

NOTES
Green notes mark classes where attributes inherited from the same superclass are inherited in two different subclasses but are not necessarily used in both.
Note to Implementers: This is an analysis model, not an implementation model, and therefore supplemental attributes may be required in your implementation model to support data exchange between applications (e.g. extra ids). Furthermore, it may be that not all attributes included here are required for data exchanges and may be eliminated from this model. It is also likely that an implementation based on this model may collapse associations to simplify the structure of data exchanges.
Disclaimer: BRIDG classes used in this model have been pared down to only what is needed for data exchange in the CTMS Interoperability project and this in no way indicates or suggests changes to the official BRIDG model.
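The note to implementers above can be made concrete with a small sketch. The Python data classes below are only an illustration under stated assumptions, not the project's implementation model: attribute names for Person and StudySubject come from the analysis model, the Participant role is collapsed into a direct reference, BRIDG data types are replaced by plain Python types, and `local_db_id` is a hypothetical supplemental identifier of the kind the note mentions.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional


@dataclass
class Person:
    # Attribute names follow the analysis model; plain Python types stand in
    # for BRIDG data types such as BRIDGCodedConcept (an assumption).
    first_name: str
    last_name: str
    date_of_birth: datetime
    administrative_gender_code: str
    ethnic_group: Optional[str] = None
    race: Optional[str] = None


@dataclass
class StudySubject:
    # The Person -> Participant -> StudySubject chain is collapsed into one
    # reference, which the note says an implementation may choose to do.
    study_subject_identifier: str
    person: Person
    # Supplemental id that is NOT part of the analysis model (hypothetical).
    local_db_id: Optional[int] = None


# Made-up example values, for illustration only.
subject = StudySubject(
    study_subject_identifier="SUBJ-001",
    person=Person("Jane", "Doe", datetime(1960, 1, 15), "F"),
)
person_field_names = [f.name for f in fields(Person)]
print(subject.study_subject_identifier, subject.person.last_name)
```

Keeping the supplemental id optional lets the same class carry data both before and after it has been stored by a receiving application.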
+ textResult: string ::QuantitativeMeasurement + numericResult: float + numericUnits: BRIDGCodedConcept + referenceRangeComment: string + referenceRangeHigh: int + referenceRangeLow: int

[UML class diagram: cd Comprehensive Logical Model. Name: Comprehensive Logical Model; Author: Fridsma; Version: 1.0; Created: 7/22/2005 2:53:51 PM; Updated: 7/29/2005 2:33:32 PM. Packages shown: Design Concepts, Protocol Concepts, BusinessObjects, BasicTypes, Statistical Concepts, Entities and Roles, Plans, OStudy Design and Data Collection, OProtocolStructure. Subject areas labeled on the diagram: Clinical Trial Design; Protocol Authoring and Documentation; Protocol activities and Safety monitoring (AE); Clinical Trial Registration; Eligibility Determination; Structured Statistical Analysis. Class attributes and associations are not recoverable from the extracted text.]

CTMSi: Architectural overview
[Architecture diagram labels: C3PR, C3D, PSC, LabViewer/CTOM, caAERS; Oracle and PostgreSQL databases; grid services and a web service; caXchange Enterprise Service Bus (Messages, Inbound Binding Component, Outbound Binding Component, Routing Rules); caGrid (Authentication, Trust, Authorization; Dorian, GTS, Grid Grouper)]

CTMSi: Demonstration
sd overview sequence
caExchange, C3PR, C3D WS, PSC, LabViewer, CTOM, caAERS, SME
User will create a new patient and
register the patient to a protocol, checking the eligibility status. The protocol is already prepopulated amongst all the systems.
  registerPatient
  registerPatient(Participant, StudySubject, StudySite, HealthCareSite)
  isValidProtocol(studyId)
  patientPositionId = getPatientPosition(site, studyId)
  registerPatient(Participant, StudySubject, StudySite, HealthCareSite)
  registerPatient(Participant, StudySubject, StudySite, HealthCareSite)
  registerPatient(Participant, StudySubject, StudySite, HealthCareSite)
  registerPatient(Participant, StudySubject, StudySite, HealthCareSite)
The user will have a hot-link from the C3PR interface to the PSC interface. The user will see the patient registered on the prepopulated protocol.
  viewSchedule
  scheduleActivity
The user will hot-link over to the Lab Viewer to view Lab activities.
  viewLabActivities
  viewLabData(Patient)
  viewLabData
  Lab[] = query(id[])
  loadLabData
caExchange (or some component hooked into caExchange) will load data into C3D.
  loadLabData(Participant, StudySubject, Study, LabTest, LabResult)
  loadLabData(mrn, studyId, lab, labTest)
We may not be able to hot-link to C3D, but the data should be propagated there and viewable from the C3D interface.
  viewPatient
  viewLabData
  selectLabForAE
  Lab[] = query(id[])
A new AE with some minimal information will be created and sent to caAERS through caExchange.
  newAE(Participant, StudySubject, Study, LabTest, LabResult)
  id = newAE(Participant, StudySubject, Study, LabTest, LabResult, AE)
The user will hot-link from the LabViewer to caAERS, where he can edit and submit the AE.
  editAE
  submitAE
  submitAE(Participant, StudySubject, Study, AE)
  flagAE(Participant, StudySubject, Study, AE)
The user hot-links from caAERS to PSC, where they will see the AE notification and make appropriate changes.
  login
  aeNotification
  modifySchedule

Service Metadata: All Services
• Common Service Metadata
  • Provided by all services
  • Details service’s capabilities, operations, contact information, hosting research center
  • Service operation’s inputs and outputs defined in terms of structure and semantics extracted from caDSR and EVS
  • Majority auto-generated by Introduce

Service Metadata: Service Security
• Service Security Metadata
  • Provided by all services
  • Details the service’s requirements on communication channel for each operation
  • Can be used by client to programmatically negotiate an acceptable means of communication
  • For example: Does operation X allow anonymous clients, or are credentials required?
  • Auto-generated by Introduce

Service Metadata: Data Service
• Data Service Metadata
  • Provided by all data services
  • Describes the Domain Model being exposed, in terms of a UML model linked to semantics
  • Provides information needed to formulate the Object-Oriented Query
  • As with common metadata, data types defined in terms of structure and semantics extracted from caDSR and EVS
  • Auto-generated by Introduce

caTRIP in-depth: Architecture
Security
[Diagram labels: authorization, User Grid Certificate, authentication, User Credentials, SAML Assertion, caGrid Authentication Service, Duke Authentication Plugin, Dorian, Grid Data Service, CSM, Trust Fabric, Grid Grouper, backend data, Duke Domain Controller, NT Security]

caTRIP in-depth: Data sharing
Challenges in data sharing
• Building data-oriented systems
  • Duke requires IRB approval to gain access to identifiable data
  • We worked around this by leveraging people already on IRB protocols
• Deidentifying data
  • Data is owned by different groups across the cancer center
  • Traditional deidentification: data manager deidentifies an entire dataset then throws away the key
  • Distributed deidentification: trusted service provider (TSP) deidentifies discrete values
  • Traditional approach is not scalable: it requires a middle-man
  • IRB approval required for distributed approach because it deviates from traditional deidentification (at Duke)

caTRIP in-depth: Data sharing
Distributed deidentification
[Diagram labels: Secure connection; MRN3, GHI789; Trusted Service Provider; "Has IRB approval to see identifiable data" (twice); "Has IRB approval to store identifiable data"]
PHI     DEID (randomly generated)
MRN1    ABC123
MRN2    DEF456
MRN3    GHI789
. . .   . . .

caTRIP in-depth: Architecture
Simple GUI configuration
[Diagram labels: Service A, Service B, Foreign Association; Target, Linking Object, Filter Object, Associated Classes, Associated Object Tree, Association Direction, Foreign Association Inbound Paths, Foreign Association Outbound Path; Join Condition: CDE (ex. MRN); example classes: BreastCancerBiomarkers, ParticipantMedicalIdentifier, SpecimenCharacteristics]

caTRIP in-depth: Architecture
caBIG compatibility
• Challenge
  • Silver-compatibility is in some ways (and for good reason) stringent
  • Grid technologies were still in development (caGrid 1.0 is now released)
• caTRIP is a silver-compatible application (in theory)
  • Compatibility submission package completed
  • Going through review now for silver-compatible data services
• caTRIP leverages caCORE technologies
  • Common Security Module (CSM) provides authorization
  • caCORE-SDK provides tooling to create Java classes from UML (XMI), XML schemas, and Castor mappings
• caTRIP leverages caGrid technologies
  • Index Service provides advertisement and discovery
  • Authentication Service provides
  • Dorian helps provide authentication
  • GTS provides trust fabrics

Next steps
• Aggregate data from multiple services of the same type
  • Scenario: caTissue Suite deployed at 13 cancer centers
• Add datasets and data types
  • CTMS, population sciences, basic science, etc.
• Add analytical services
  • Integrate with workflow
  • Add visualization components
• Enhanced reporting
  • Automate Excel pivot table
  • Data mining results
• Enhanced querying
  • Asynchronous, parallel querying
  • Querying multiple deployed distributed query services
• Continue refinement of user interface
  • Synchronization of advanced and simple GUI
  • Additional usability features

caGrid: caBIG Resources
• caBIG™ Website: http://cabig.cancer.gov/index.asp
• caBIG™ Compatibility Guidelines: https://cabig.nci.nih.gov/compatibility_guidelines_documentation/
• Cancer Common Ontologic Representation Environment (caCORE): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview
• Enterprise Vocabulary Services (EVS): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/vocabulary
• Cancer Data Standards Repository (caDSR): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/cadsr
• caCORE Software Developer’s Kit (caCORE SDK): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacoresdk
• caCORE Training: http://ncicb.nci.nih.gov/NCICB/training/cadsr_training
• Model Driven Architecture: http://www.omg.org/mda/
• UML Modeling: http://www.sparxsystems.com.au/UML_Tutorial.htm

caTRIP: Why can’t I just write DCQL?
• What are all the tissue specimens from Her2/Neu positive patients that have a primary tumor in the breast and are BRCA1 positive?
<DCQLQuery xmlns="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql">
  <TargetObject name="edu.wustl.catissuecore.domainobject.impl.TissueSpecimenImpl" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CaTissueCore">
    <Association name="edu.wustl.catissuecore.domainobject.impl.SpecimenCollectionGroupImpl" roleName="specimenCollectionGroup">
      <Association name="edu.wustl.catissuecore.domainobject.impl.ClinicalReportImpl" roleName="clinicalReport">
        <Association name="edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl" roleName="participantMedicalIdentifier">
          <Group logicRelation="AND">
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>edu.duke.catrip.cae.domain.general.ParticipantMedicalIdentifier</Object>
                  <Property>medicalRecordNumber</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="edu.duke.catrip.cae.domain.general.ParticipantMedicalIdentifier" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CAE">
                <Association name="edu.duke.catrip.cae.domain.general.Participant" roleName="participant">
                  <Association name="edu.pitt.cabig.cae.domain.general.AnnotationEventParameters" roleName="annotationEventParametersCollection">
                    <Association name="edu.pitt.cabig.cae.domain.breast.BreastCancerBiomarkers" roleName="annotationSetCollection">
                      <Attribute name="HER2Status" predicate="LIKE" value="POSITIVE%"/>
                    </Association>
                  </Association>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>edu.duke.cabig.tumorregistry.domain.PatientIdentifier</Object>
                  <Property>medicalRecordNumber</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="edu.duke.cabig.tumorregistry.domain.PatientIdentifier" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CaTRIPTumorRegistry">
                <Association name="edu.duke.cabig.tumorregistry.domain.Patient" roleName="patient">
                  <Association name="edu.duke.cabig.tumorregistry.domain.Diagnosis" roleName="diagnosisCollection">
                    <Attribute name="primarySite" predicate="LIKE" value="BREAST%"/>
                  </Association>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>gov.nih.nci.caintegrator.domain.study.bean.StudyParticipant</Object>
                  <Property>studySubjectIdentifier</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="gov.nih.nci.caintegrator.domain.study.bean.StudyParticipant" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CGEMS">
                <Association name="gov.nih.nci.caintegrator.domain.analysis.snp.bean.SNPAnalysisGroup" roleName="analysisGroupCollection">
                  <Attribute name="name" predicate="LIKE" value="BRCA1%"/>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
          </Group>
        </Association>
      </Association>
    </Association>
  </TargetObject>
</DCQLQuery>

Callouts on the query:
• Select tissue
• Foreign Join w/ CAE: HER2/NEU Positive
• Foreign Join w/ Tumor Registry: Primary Site Breast
• Foreign Join w/ CGEMS: BRCA1 Positive

caTRIP: Distributed query engine
[Diagram labels: DCQL; Distributed Query Engine; CQL (×3); caGrid data service (×3); database (×3); data objects]

CTMSi: BRIDG dynamic modeling
• *Process flow
• *story boards
• *Scenarios
• *Use cases
• *Text
• UML activity diagrams
• *Links to static structures
• Interaction diagrams (?)
• Sequence diagrams
• Collaboration diagrams (UML 2.0)

CTMSi
Patient registration message
[Figure: patient registration message flow. Labels: caAERS, User, ESB, Router, Grid BC, PSC, caAERS Grid Service, PSC Grid Service, Registration Message, Acknowledgement.]

caBIG compatibility
CDE Browser

caBIG compatibility
CDE Browser permissible values

caBIG compatibility
NCI Thesaurus
[Screenshot labels: Concept Code, Relationships, Preferred Name, Definition, Synonyms]

caGrid
caGrid community involvement
• caGrid itself provides no real “data” or “analysis” to caBIG
• It’s the enabling infrastructure which allows the community to do so
• Community members add value to the grid as applications, services, and processes (for example: shared workflows)
• caGrid provides the necessary core services, APIs, and tooling
• The real “value” of the grid comes from bringing this information to the “end user”
• Data Services: expose data to the grid in a unified way
• Analytical Services: expose analytical operations to the grid
• Community members develop end-user applications which consume the resources provided by the grid

caGrid
caGrid exposing silver systems
• Object-oriented APIs and data resources are developed using object types and information models registered in the caDSR
• These “silver systems” are grid-enabled by defining a grid service interface that specifies the functionality to be exposed to the grid
• The grid service interface uses the same object types as the existing system, but leverages a platform- and language-neutral representation (XML) of them
• The grid service implementation maps service invocations to API calls or queries into the existing system

caGrid
Federated Query Processor
• Provides a mechanism to perform basic distributed aggregations and joins of queries over multiple data services
• Because all caGrid data services use a uniform query language, CQL, the Federated Query Infrastructure can be used to express queries over any combination of caGrid data services
• Federated queries are expressed in DCQL, a query language that extends CQL to express such concepts as joins, aggregations, and target services
• Implemented as a stateful grid service, so queries may be executed asynchronously and results retrieved at a later time
• Supports secure deployments wherein result ownership is enforced
• Coupled with the semantic discovery capabilities of caGrid, provides a powerful framework for data discovery, mining, and integration

caGrid
Data service common query language
• Specifies a target object (result) type and selects the instances which satisfy the specified properties and nested object properties
• Allows path navigation
• Provides logical grouping
• Provides name/predicate/value filtering on properties of objects
• Recursively defined
• Can return full objects, a set of attributes, a count of results, or distinct attribute values

caGrid
Example CQL query
Return all Genes that have a symbol beginning with BRCA and an associated Taxon with a scientificName equal to “Homo sapiens”:

<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <Target name="gov.nih.nci.cabio.domain.Gene">
    <Group logicRelation="AND">
      <Attribute name="symbol" predicate="LIKE" value="BRCA%"/>
      <Association roleName="taxon" name="gov.nih.nci.cabio.domain.Taxon">
        <Attribute name="scientificName" predicate="EQUAL_TO" value="Homo sapiens"/>
      </Association>
    </Group>
  </Target>
</CQLQuery>

[Query annotations: symbol LIKE “BRCA%”; scientificName = “Homo sapiens”]

caBIG compatibility
Metadata and concepts example
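
Supplementary sketch: the example CQL query shown earlier (Genes with a symbol LIKE “BRCA%” and a Taxon named “Homo sapiens”) can also be assembled programmatically, which makes the recursive name/predicate/value structure of CQL easier to see. The Python below is a minimal illustration that uses only the standard library's xml.etree.ElementTree. The helper names attribute, association, and build_query are hypothetical and are not part of caGrid or any caBIG API; the element names, attribute names, and namespace are copied from the example query.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the example CQL query on the slide.
CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"


def attribute(name, predicate, value):
    """Build an <Attribute> name/predicate/value filter (hypothetical helper)."""
    return ET.Element(
        "Attribute", {"name": name, "predicate": predicate, "value": value}
    )


def association(role_name, class_name, child):
    """Build an <Association> that navigates to a related object and filters it.

    The child may itself be an attribute filter or another association,
    which is what makes the query structure recursive.
    """
    elem = ET.Element("Association", {"roleName": role_name, "name": class_name})
    elem.append(child)
    return elem


def build_query(target_class, children, logic="AND"):
    """Wrap the given filters in a <Group> under the query <Target>."""
    query = ET.Element("CQLQuery", {"xmlns": CQL_NS})
    target = ET.SubElement(query, "Target", {"name": target_class})
    group = ET.SubElement(target, "Group", {"logicRelation": logic})
    group.extend(children)
    return query


# Rebuild the slide's example: Genes whose symbol starts with BRCA
# and whose associated Taxon has scientificName "Homo sapiens".
gene_query = build_query(
    "gov.nih.nci.cabio.domain.Gene",
    [
        attribute("symbol", "LIKE", "BRCA%"),
        association(
            "taxon",
            "gov.nih.nci.cabio.domain.Taxon",
            attribute("scientificName", "EQUAL_TO", "Homo sapiens"),
        ),
    ],
)

cql_text = ET.tostring(gene_query, encoding="unicode")
print(cql_text)
```

The sketch only produces the query document; submitting it to a caGrid data service is outside its scope. A DCQL query could be assembled the same way by adding elements such as ForeignAssociation and JoinCondition, as in the DCQL listing earlier in the deck.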