caBIG Data Structures
CS584 Lecture on 4/6/2007
Patrick McConnell
Duke Comprehensive Cancer Center
patrick.mcconnell@duke.edu
Agenda
caBIG background (5 min, 8 slides)
• Goals, program structure, organizations
caTRIP background (5 min, 6 slides)
• Background, use cases, architecture
caBIG compatibility (30 min, 21 slides + demonstration)
• Interoperability, compatibility, syntactics, and semantics
Building caBIG compatible systems (10 min, 7 slides)
• Interoperability, compatibility, syntactics, and semantics
caGrid (10 min, 8 slides)
• Background, service creation, metadata
caTRIP demonstration (10 min, 2 slides + demo)
• Demonstration
Discussion/questions (5 min + throughout)
caBIG Background
Goals, program structure, organizations
caBIG background
Biomedical information tsunami
• overwhelming volume of data
• multitude of sources
caBIG background
Informatics tower of Babel
• Each cancer research community speaks its own scientific “dialect”
• Integration critical to achieve promise of molecular medicine
caBIG background
Goals and principles
50 Cancer Centers are working with the National Cancer Institute Center for Bioinformatics (NCICB) toward a common goal of integrated data, tools, and methodologies to accelerate cancer research: the cancer Biomedical Informatics Grid (caBIG™).
The goal of caBIG™ is to create a virtual web of interconnected data, individuals, and organizations that will redefine how:
• research is conducted
• care is provided
• patients / participants interact with the biomedical research enterprise
The principles driving caBIG™ are:
• Open Source
• Open Access
• Open Development
• Federated Model
caBIG background
caBIG facilitates sharing
caBIG background
Workspaces
DOMAIN WORKSPACE 1: Clinical Trial Management Systems
addresses the need for consistent, open and comprehensive tools for clinical trials management.
DOMAIN WORKSPACE 2: Integrative Cancer Research
provides tools and systems to enable integration and sharing of information.
DOMAIN WORKSPACE 3: Tissue Banks & Pathology Tools
provides for the integration, development, and implementation of tissue and pathology tools.
DOMAIN WORKSPACE 4: Imaging
provides for the sharing and analysis of in vivo imaging data.
CROSS CUTTING WORKSPACE 1: Vocabularies & Common Data Elements
responsible for evaluating, developing, and integrating systems for vocabulary and ontology content, standards, and software systems for content delivery.
CROSS CUTTING WORKSPACE 2: Architecture
developing architectural standards and architecture necessary for other workspaces.
caBIG background
Communities
9Star Research
Albert Einstein
Ardais
Argonne National Laboratory
Burnham Institute
California Institute of Technology-JPL
City of Hope
Clinical Trial Information Service (CTIS)
Cold Spring Harbor
Columbia University-Herbert Irving
Consumer Advocates in Research and Related Activities (CARRA)
Dartmouth-Norris Cotton
Data Works Development
Department of Veterans Affairs
Drexel University
Duke University
EMMES Corporation
First Genetic Trust
Food and Drug Administration
Fox Chase
Fred Hutchinson
GE Global Research Center
Georgetown University-Lombardi
IBM
Indiana University
Internet 2
Jackson Laboratory
Johns Hopkins-Sidney Kimmel
Lawrence Berkeley National Laboratory
Massachusetts Institute of Technology
Mayo Clinic
Memorial Sloan Kettering
Meyer L. Prentis-Karmanos
New York University
Ohio State University-Arthur G. James/Richard Solove
Oregon Health and Science University
Roswell Park Cancer Institute
St Jude Children's Research Hospital
Thomas Jefferson University-Kimmel
Translational Genomics Research Institute
Tulane University School of Medicine
University of Alabama at Birmingham
University of Arizona
University of California Irvine-Chao Family
University of California, San Francisco
University of California-Davis
University of Chicago
University of Colorado
University of Hawaii
University of Iowa-Holden
University of Michigan
University of Minnesota
University of Nebraska
University of North Carolina-Lineberger
University of Pennsylvania-Abramson
University of Pittsburgh
University of South Florida-H. Lee Moffitt
University of Southern California-Norris
University of Vermont
University of Wisconsin
Vanderbilt University-Ingram
Velos
Virginia Commonwealth University-Massey
Virginia Tech
Wake Forest University
Washington University-Siteman
Wistar
Yale University
Northwestern University-Robert H. Lurie
caBIG background
Duke’s role in caBIG
•Pankaj Agarwal
•Bob Annechiarico
•Bill Banks
•Vijaya Chadaram
•Jamie Cuticchia
•Raj Dash
•Mohammad Farid
•Seth Fehrs
•Patrick McConnell
•Salvatore Mungal
•Mark Peedin
•CALGB
•CCR
•Coalition of Cooperative Groups
•Dana Farber
•Georgetown
•Mayo
•Oregon Health Sciences University
•SemanticBits LLC
•University of Pennsylvania
•Wake Forest
•Yale
•Integrative Cancer Research
• Workspace participant
• RProteomics developer
• caTRIP developer
•Architecture
• Workspace participant
• caGrid developer
• caGrid scientific liaison
• Guide to Mentors
•Vocabularies and Common Data Elements
• Workspace participant
• Guide to Mentors
•Clinical Trials Management Systems
• Workspace participant
• C3PR developer
• CTMS Interoperability architect
• C3D developer
•Tissue Banking and Pathology Tools
• Workspace participant
• caTissue adopter
•Strategic Planning
• Workspace participant
The Cancer Translational Research
Informatics Platform (caTRIP)
Background, use cases, architecture
caTRIP
Who is involved?
Groups shown: Managers and Architects; Database Developers and IT; Domain Experts; Software Developers; NCI/BAH
•Duke Bioinformatics
• Jamie Cuticchia (PI)
• Patrick McConnell (lead architect)
•Duke Information Systems
• Bob Annechiarico (PM)
• Wilma Stanley (developer)
• Mark Peedin (developer)
• Mohamad Farid (DBA)
• Jeff Allred (IT manager)
•Duke Pathology
• Raj Dash (domain expert)
• Chris Hubbard (developer)
•Duke Oncology
• Kelley Marcom (domain expert)
• Gretchen Kimmick (domain expert)
• Kimberly Blackwell (domain expert)
• Lee Wilke (domain expert)
•Duke CALGB
• Kimberly Johnson (DataMart liaison)
•SemanticBits
• Ram Chilukuri (lead developer)
• Srini Akkala (developer)
• Sanjeev Agarwal (developer)
•5 AM Solutions
• Bill Mason (developer)
•NCI
• Julie Klemm (ICR WS lead)
• Carl Shaefer (NCI rep)
• Subha Madhavan (caIntegrator PM)
•BAH
• Curtis Lockshin
• Mehul Shah (tech support)
caTRIP
What is translational research?
• Bench-to-Bedside
• Wikipedia (the source of all knowledge):
Translational medicine is a branch of medical research that attempts to more directly connect basic research to patient care.
• Basic research occurs in the lab
• Patient care occurs in the clinic
• Translational research broadened…
Translational medicine can also have a much broader definition, referring to the development and application of new technologies in a patient driven environment - where the emphasis is on early patient testing and evaluation.
…facilitate the interaction between basic research and clinical medicine, particularly in clinical trials.
caTRIP
Initial focus
• Our initial focus will be on connecting existing data systems, including basic science data, to enhance patient care
Initial problem scenario: outcomes analysis
• Use data from existing patients to inform the treatment of another patient
• Leverage clinical, pathology, tissue, and basic science data
Scenario:
Patient A enters the clinic. What treatments were applied with success on other patients with similar characteristics (race, sex, symptoms, pathology results, adverse events, biomarkers)?
caTRIP
Broadened focus: scientific use cases
Find available tumor tissue
• What are all the tissue specimens from her2/neu positive patients that have a primary tumor in the breast and are BRCA1 positive?
Find factors of survival
• What are all the ER positive patients that have survived breast cancer after radiation treatment?
Find patients for trials
• What are all the patients that are triple negative (ER, PR, and HER2/NEU negative)?
Determine the distribution of disease factors over time
• Does a change in pathology biomarkers over time contribute to recurrence or death?
Determine correlation of factors pre and post surgery
• Does a change in ER or PR status before and after surgery correlate with other factors?
Find pathology reports of interest
• Show me all of the pathology reports for Her2/Neu positive patients with a lobular carcinoma.
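Each scientific use case above amounts to a filter over patient attributes. The following is a minimal sketch of the "find patients for trials" question, assuming a hypothetical in-memory record layout: the `Patient` class, its field names, and the sample records are invented for illustration and are not caTRIP's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    mrn: str
    er: str        # estrogen receptor status: "positive" or "negative"
    pr: str        # progesterone receptor status
    her2_neu: str  # HER2/neu status


def is_triple_negative(p: Patient) -> bool:
    """True when ER, PR, and HER2/neu are all negative."""
    return all(status == "negative" for status in (p.er, p.pr, p.her2_neu))


def find_trial_candidates(patients):
    """Return the MRNs of triple negative patients, in input order."""
    return [p.mrn for p in patients if is_triple_negative(p)]


# Made-up sample data.
PATIENTS = [
    Patient("MRN-001", "negative", "negative", "negative"),
    Patient("MRN-002", "positive", "negative", "negative"),
    Patient("MRN-003", "negative", "negative", "negative"),
]

print(find_trial_candidates(PATIENTS))
```

In a federated setting the three status values would likely come from different systems, which is why the later slides focus on linking records across sources.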
caTRIP
Connecting disparate data systems
[Figure: data systems linked through the medical record number (MRN)]
• CAE: Pathology Biomarkers
• Tumor Registry: Diagnosis, Treatment, Recurrence, Follow-up
• caTissue CORE: Tissue Bank
• caIntegrator: SNP Data
• caTIES: Pathology Reports
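The slide shows MRN as the common identifier across the data systems. As a rough sketch of what that linkage enables, the Python below joins three made-up row sets on an `mrn` key; the row layouts, key names, and helper functions are assumptions for illustration, not the actual schemas of these systems.

```python
from collections import defaultdict

# Made-up rows standing in for three of the systems; keys are illustrative.
TUMOR_REGISTRY = [
    {"mrn": "MRN-001", "diagnosis": "lobular carcinoma"},
    {"mrn": "MRN-002", "diagnosis": "ductal carcinoma"},
]
CAE_BIOMARKERS = [
    {"mrn": "MRN-001", "her2_neu": "positive"},
    {"mrn": "MRN-002", "her2_neu": "negative"},
]
TISSUE_BANK = [
    {"mrn": "MRN-001", "specimen_id": "S-17"},
    {"mrn": "MRN-001", "specimen_id": "S-18"},
]


def index_by_mrn(rows):
    """Group rows from one system by medical record number."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["mrn"]].append(row)
    return grouped


def specimens_for(diagnosis, her2_neu):
    """Specimen IDs for patients matching a diagnosis and HER2/neu status."""
    registry = index_by_mrn(TUMOR_REGISTRY)
    biomarkers = index_by_mrn(CAE_BIOMARKERS)
    tissue = index_by_mrn(TISSUE_BANK)
    result = []
    for mrn in sorted(registry):
        has_dx = any(r["diagnosis"] == diagnosis for r in registry[mrn])
        has_marker = any(b["her2_neu"] == her2_neu for b in biomarkers.get(mrn, []))
        if has_dx and has_marker:
            result.extend(t["specimen_id"] for t in tissue.get(mrn, []))
    return result


print(specimens_for("lobular carcinoma", "positive"))
```

Indexing each source by MRN first keeps the join linear in the number of rows rather than comparing every pair of records.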
caTRIP
Architecture overview
[Figure: caTRIP architecture overview. Labels: GUI; Distributed Query Engine; Domain Grid Services (caTissue CORE, caTIES, CAE, TR, caIntegrator, CGEMS SNP); Core Grid Services (IdP Service, Index Service, Grid Grouper, authorize); Duke (caTissue CORE, caTIES, CAE, TR); Domain Controller; MAW3; Tumor Registry; Illumina]
caBIG Compatibility
Interoperability, compatibility, syntactics, and semantics
caBIG compatibility
Interoperability defined
Interoperability: ability of a system to access and use the parts or equipment of another system
• Syntactic interoperability
• Semantic interoperability
Courtesy: Charlie Mead
caBIG compatibility
How does this apply to caBIG?
• Connect scientists and practitioners through a shareable and interoperable infrastructure
• Develop standard rules and a common language to more easily share information (compatibility guidelines)
• Build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care.
“The cancer community is united in its mission to eliminate suffering and death due to cancer. It is now connected by caBIG™.”
caBIG compatibility
What is compatibility in caBIG?
The four areas of the caBIG compatibility guidelines:
• Information Models - Individual types of data are rarely collected or presented in isolation. Rather, they are assembled into a contextual environment that includes closely and more distantly associated data and information. These associations and relationships can be presented in the form of an information model.
• CDEs - Data that is collected on a given study or trial must be defined and described such that remote users of that data can understand what it means. These metadata descriptions are referred to as data elements.
• Vocabularies and Ontologies - Biomedical information includes a substantial body of specialized concepts that are represented by terms. Agreement upon the basic concepts, terms and definitions that are inherent in all biomedical information is essential for achieving semantic interoperability.
• Programming and Messaging Interfaces - Computer programs and the people who write them are able to access resources from other programs through programming and messaging interfaces. Each of these interfaces responds to a particular syntax for its communications. Agreement upon standards for these interfaces is necessary to overcome barriers to syntactic interoperability.
caBIG compatibility
Levels of compatibility
The four levels of the caBIG™ compatibility guidelines:
• Legacy - Implies no interoperability with an external system or resource. A
system that was designed without awareness of or prior to the availability
of these compatibility guidelines, and which does not meet any of the
requirements for interoperability.
• Bronze - Classifies the minimum requirements that must be met to
achieve a basic degree of interoperability.
• Silver - A rigorous set of requirements that, when met, significantly reduce
the barrier to use of a resource by a remote party who was not involved in
the development of that resource.
• Gold - Currently being defined by caBIG. Is expected to provide for a
formalized grid architecture and data standards that will enable
standardized advertising, discovery, and use of all federated caBIG
resources.
caBIG compatibility
caBIG compatibility guidelines
[Figure: the compatibility guideline areas grouped as Syntactic, Semantic, and Semantic & Syntactic]
caBIG compatibility
Syntactic interoperability
• The solution for syntactic interoperability in caBIG at the silver level of compatibility is for all systems to provide an Object Oriented Application Programmer Interface (API).
• Object Oriented Interfaces can be implemented in many programming languages.
• This interface can be connected to the caGrid so that the local data repository is globally accessible in a language independent way.
• The interface is described by an information model, which acts as the junction between the syntactic components and the semantic components.
Gene
+ name: String
+ hugoGeneSymbol: String
+ sequence: String
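To make the idea of an object oriented API concrete, here is a small sketch in Python. The `Gene` attributes follow the class shown on the slide; the `GeneRepository` class, its method name, and the sample values are hypothetical and only illustrate an API that hands back data objects rather than raw files or rows.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Mirrors the Gene class on the slide; all attributes are strings."""
    name: str
    hugoGeneSymbol: str
    sequence: str


class GeneRepository:
    """Hypothetical local API that returns data as Gene objects."""

    def __init__(self, genes):
        self._by_symbol = {g.hugoGeneSymbol: g for g in genes}

    def find_by_symbol(self, symbol):
        """Return the matching Gene, or None if the symbol is unknown."""
        return self._by_symbol.get(symbol)


# Made-up sample values.
repo = GeneRepository([
    Gene(name="example gene", hugoGeneSymbol="BRCA1", sequence="ATGC"),
])
gene = repo.find_by_symbol("BRCA1")
print(gene.name, len(gene.sequence))
```

Because callers depend only on the `Gene` object and the lookup method, the same information model could in principle be offered from another language or behind a grid service.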
caBIG compatibility
Programming and messaging interfaces
Types of APIs
• Client APIs in a programming language
• Messaging APIs via a messaging protocol
Types of systems
• Data services provide access to an information model (query method; associations are “traversable”)
• Analytical services provide methods to manipulate data
• Hybrid services provide methods to manipulate information models
• Analytical tools are consumers of silver compatible data, but don’t produce it
caBIG compatibility
Programming and messaging interfaces details
Legacy
• No programmatic interfaces to the system are available. Only local data files in a custom format can be read
• Data transfer mechanisms implemented only on an ad hoc basis
Bronze
• Programmatic access to data from an external resource is possible.
Silver
• Well-described APIs provide access to data in the form of data objects.
• Standards-based electronic data formats are supported for both input to and output from the system.
• Standards-based messaging protocols are supported wherever messaging is relevant.
Gold
• All features of Silver, plus:
• Service-oriented components produce or consume resources in the form of grid services
• Interoperable with data grid architecture to be defined by caBIG
Examples (Legacy to Gold): Executables; Proprietary API/data format; JavaDocs; XML, ASN.1; SOAP, CORBA; Globus; caGrid-based services
caBIG compatibility
caTRIP API
Hyperlinks to caTRIP API docs
caBIG compatibility
caTRIP grid service WSDL
Hyperlinks to caTRIP API WSDL
caBIG compatibility
caTRIP grid service WSDL
cd Logical Model (Hyperlinks to caTRIP FQP UML)
Classes: ResultAggregator, engine::FederatedQueryProcessor
+ processDCQLQueryPlan(DCQLQueryDocument) : CQLQuery
+ aggregateGroups(Group[]) : Group
+ buildGroup(List) : Group
+ processResults(CQLQueryResults) : List
engine::FederatedQueryExecutor (executes / obtains)
+ executeCQLQuery(CQLQuery, String) : CQLQueryResults
«interface» engine::FederatedQueryEngine (executes)
+ execute(Document) : CQLQueryResults
ServiceClientFactory (Object)
+ getSeviceClient() : Object
caGridDataService1Client
+ query(CQLQuery) : CQLQueryResults
caGridDataService2Client
+ query(CQLQuery) : CQLQueryResults
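The UML above describes an engine that sends queries to several data service clients and aggregates what comes back. The sketch below illustrates that fan-out-and-aggregate pattern in Python under heavy simplification: plain Python predicates stand in for DCQL/CQL documents, and `StubDataServiceClient`, the Python `FederatedQueryExecutor`, and `aggregate_groups` are hypothetical stand-ins rather than the caTRIP or caGrid classes.

```python
class StubDataServiceClient:
    """Stand-in for a generated data service client with a query() method."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = rows

    def query(self, predicate):
        """Return the rows this service holds that satisfy the predicate."""
        return [row for row in self._rows if predicate(row)]


class FederatedQueryExecutor:
    """Sends one query to every client and tags results with their source."""

    def __init__(self, clients):
        self._clients = clients

    def execute(self, predicate):
        results = []
        for client in self._clients:
            for row in client.query(predicate):
                results.append({"source": client.name, **row})
        return results


def aggregate_groups(results, key):
    """Group federated results by a shared key such as an MRN."""
    groups = {}
    for row in results:
        groups.setdefault(row[key], []).append(row)
    return groups


# Made-up sample data for two services.
clients = [
    StubDataServiceClient("TumorRegistry", [
        {"mrn": "MRN-001", "diagnosis": "lobular carcinoma"},
    ]),
    StubDataServiceClient("CAE", [
        {"mrn": "MRN-001", "her2_neu": "positive"},
        {"mrn": "MRN-002", "her2_neu": "negative"},
    ]),
]
executor = FederatedQueryExecutor(clients)
results = executor.execute(lambda row: row["mrn"] == "MRN-001")
groups = aggregate_groups(results, "mrn")
print(sorted(r["source"] for r in groups["MRN-001"]))
```

Tagging each row with its source keeps provenance visible after aggregation, which matters when the same patient appears in several systems.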
caBIG compatibility
Semantic interoperability
• The solution for semantic interoperability lies in object oriented UML design of the service, an unambiguous description of elements within the system, and storage of the description in a publicly accessible repository (metadata).
• UML model
• Use of publicly accessible terminologies/vocabularies/ontologies (EVS-NCI Thesaurus)
• Use of publicly accessible metadata repository (caDSR)
caBIG compatibility
Common data element (CDE) details
Legacy
• No structured metadata is recorded
Bronze
• Data element descriptions have sufficient detail for a subject matter expert to unambiguously interpret
• Data elements are built using controlled terminology
• Metadata is stored and publicized in an electronic format that is separate from the resource that is being described
Silver
• Common Data Elements (CDEs) built from controlled terminologies and according to practices validated by the VCDE workspace are used throughout.
• CDEs are registered as ISO/IEC 11179 metadata components in the cancer Data Standards Repository (caDSR)
Gold
• All features of Silver, plus:
• Common Data Elements (CDEs) designated as caBIG Standards by the VCDE workspace are used.
• Metadata is advertised and discoverable via the caBIG grid services registry
Examples (Legacy to Gold): Free-text pathology reports; GeneOntology from GO website; NCI Thesaurus; GeneOntology registered in EVS; NCI Thesaurus
caBIG compatibility
Metadata stored in caDSR
Enterprise Vocabulary Services
• Storage of Metadata
• caDSR = cancer Data Standards Repository
• Common Data Elements = CDEs
• Enable end-users to access information about data and use these services without having to access human developers
• = Fusion of UML models + Concepts/Definitions
[Screenshot of the caDSR search page, with callouts:]
• Navigation Menu: buttons to navigate to the CDE cart, Form Builder, or back to Home (that is, back to this page)
• caDSR Search Tree: Displays all the current caDSR Contexts. Users can search for groups of DEs by navigating the tree.
• Data Element Search Pane: This is the main search window. Users looking for Data Elements can enter a key word or phrase.
caBIG compatibility
caTRIP CDEs
Hyperlinks to caTRIP CDEs
caBIG compatibility
Vocabulary/terminology details
Legacy
• Free text used throughout for data collection
Bronze
• Use of publicly accessible controlled vocabularies as well as local terminologies.
• Terminologies must include definitions of terms that meet caBIG VCDE workspace guidelines
Silver
• Terminologies reviewed and validated by the caBIG Vocabulary/Common Data Element (VCDE) Workspace used for all relevant data collection fields.
Gold
• All features of Silver, plus:
• Full adoption of caBIG terminology standards as approved by the VCDE Workspace.
Examples (Legacy to Gold): Free-text pathology reports; GeneOntology from GO website; NCI Thesaurus; GeneOntology registered in EVS; NCI Thesaurus
caBIG compatibility
Publicly accessible terminologies
Enterprise Vocabulary Services
• Controlled vocabulary resources for the cancer research community
• Vocabulary Products and Services: NCI Thesaurus, NCI Metathesaurus, External Vocabularies
NCI Thesaurus - controlled vocabulary source for metadata
• Has excellent coverage of cancer terminology
• Expands based on needs for additional terminology
• Based on concepts rather than terms
• Each concept has a unique identifier or CUI with definitions and synonyms
• Housed by the Enterprise Vocabulary Service (EVS)
LexBIG
• a caBIG-funded vocabulary server to enable a Federated Vocabulary environment.
caBIG compatibility
caTRIP CDEs
Hyperlinks to a caTRIP concept
caBIG compatibility
Information model (UML) details
Legacy
• No model describing the system is available in electronic format
Bronze
• Diagrammatic representation of the information model is available in electronic format.
Silver
• Information models are defined in UML as class diagrams and are reviewed and validated by the VCDE workspace.
Gold
• All features of Silver, plus:
• Information models are harmonized across the caBIG Domain Workspaces
Examples
[Figures: a database diagram and the "cd StatML" UML class diagram. Recoverable classes: statml::Data; statml::Array (base64Value: String, dimensions: String, name: String, type: String); statml::List (length: Integer, name: String, type: String); statml::Scalar (name: String, type: String, value: String); statml::Null. Association ends shown: +array 0..*, +scalar 0..*, +null 0..*, +list 0..*, +context 0..1]
caBIG compatibility
Domain information modeling
• Domain Information Models consist of ‘Classes’ that represent ‘things’ in the real world
• Classes contain ‘attributes’ that are characteristics of different instances of things in the real world.
• Relationships between the classes are described by ‘associations’ and indicated by lines with directionality and cardinality
• Each class plus attribute creates one Common Data Element (CDE)
• A Domain Information Model is a representation of our understanding of an area of knowledge.
cd Central Dogma
Gene
+ name: String
+ hugoGeneSymbol: String
+ sequence: String
Association: Gene (+gene, 1) to Transcript (+transcriptCollection, 1..*)
Transcript
+ sequence: String
+ length: String
Association: Transcript (+transcript, 1) to Protein (+protein, 1)
Protein
+ name: String
+ aminoAcidSequence: String
+ molecularWeight: double
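The Central Dogma model can be written down as plain classes to show how attributes, associations, and the "one CDE per class plus attribute" rule fit together. This is a sketch in Python dataclasses, not generated caCORE code; the `ASSOCIATIONS` set and the `class_attribute_pairs` helper are hypothetical, and the sample values are made up.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Protein:
    name: str
    aminoAcidSequence: str
    molecularWeight: float


@dataclass
class Transcript:
    sequence: str
    length: str  # typed String on the slide
    protein: Optional[Protein] = None  # association end +protein


@dataclass
class Gene:
    name: str
    hugoGeneSymbol: str
    sequence: str
    # association end +transcriptCollection (1..* on the slide)
    transcriptCollection: List[Transcript] = field(default_factory=list)


# Association ends are navigable links, not attributes that yield CDEs.
ASSOCIATIONS = {"protein", "transcriptCollection"}


def class_attribute_pairs(*classes):
    """List 'Class.attribute' pairs, skipping association ends."""
    return [
        f"{cls.__name__}.{f.name}"
        for cls in classes
        for f in fields(cls)
        if f.name not in ASSOCIATIONS
    ]


pairs = class_attribute_pairs(Gene, Transcript, Protein)
print(len(pairs), pairs[0])

# Traversing associations from a gene down to its protein (made-up values).
gene = Gene("example gene", "BRCA1", "ATGC", [
    Transcript("AUGC", "4", Protein("example protein", "MK", 0.3)),
])
print(gene.transcriptCollection[0].protein.name)
```

The eight class and attribute pairs correspond to the eight candidate CDEs this small model would contribute.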
caBIG compatibility
Tumor Registry model
Hyperlinks to caTRIP UML
[Figure: Tumor Registry model areas: Participant, Diagnosis, Collaborative Staging, Treatment, Follow up and Recurrence]
Building caBIG Compatible Systems
Building caBIG compatible systems
Steps for creating an analytical system
Step 1: model and register metadata
• Model the domain objects
• Register metadata
Step 2: implement the analytical system
• Implement an interface
• Map data objects to existing inputs
• Plug-in analytics
Step 3: create the data service
• Create an XML Schema
• Use the caGrid 1.0 Introduce toolkit to create a service
• Configure the service
• Deploy
Step 4: invoke the service
• Java-based client
• Use caTRIP
Building caBIG compatible systems
Steps for creating a data system
Step 1: model and register metadata
• Model the domain objects
• Register metadata
Step 2: implement the information system
• Model the databases (via scripts or EA)
• Build the database
• Generate Java beans
• Create Hibernate mappings
• Jar it all up
Step 3: create the data service
• Create an XML Schema
• Use the caGrid 1.0 Introduce toolkit to create a service
• Configure the service
• Deploy
Step 4: invoke the service
• Java-based client
• Use caTRIP
Building caBIG compatible systems
N-tier architecture
[Figure: N-tier architecture. Labels, top to bottom: Index Service (advertise); Distributed Query Engine; CQL Query; caGrid Data Service; CQL Engine; caCORE SDK: domain model, Object-relational mapping, database]
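The tiers in the figure suggest a query expressed against the domain model being translated, through an object-relational mapping, into a database query. The sketch below illustrates that flow with Python's standard `sqlite3` module. The dictionary-based query is a simplified stand-in for CQL (which is XML-based), and the `MAPPING` table, table and column names, and sample rows are all assumptions for illustration.

```python
import sqlite3

# Hypothetical mapping from domain class and attribute names to a table and columns.
MAPPING = {
    "Gene": {
        "table": "gene",
        "columns": {"name": "name", "hugoGeneSymbol": "hugo_symbol"},
    },
}


def to_sql(query):
    """Translate {'target', 'attribute', 'value'} into parameterized SQL."""
    entry = MAPPING[query["target"]]
    column = entry["columns"][query["attribute"]]
    cols = ", ".join(entry["columns"].values())
    sql = f"SELECT {cols} FROM {entry['table']} WHERE {column} = ?"
    return sql, (query["value"],)


def run(conn, query):
    """Execute the translated query and return results keyed by domain attribute."""
    sql, params = to_sql(query)
    attrs = list(MAPPING[query["target"]]["columns"])
    return [dict(zip(attrs, row)) for row in conn.execute(sql, params)]


# In-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (name TEXT, hugo_symbol TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [("example gene A", "BRCA1"), ("example gene B", "TP53")],
)
rows = run(conn, {"target": "Gene", "attribute": "hugoGeneSymbol", "value": "BRCA1"})
print(rows)
```

Table and column names come only from the trusted mapping, while the user-supplied value is passed as a bound parameter, so the caller never writes SQL directly.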
Building caBIG Compatible Systems
caCORE SDK
[Figure: caCORE SDK workflow. Labels: Info Model (UML Model, XMI File); Vocabularies (EVS, Terminology Services); Semantic Integration Workbench (SIW); EVS Report; Fixed XMI; Verified? (YES/NO); Verified Annotated Fixed XMI; Using CodeGen?; Code Generator; Successful Test? (YES/NO); Messaging Interfaces/API; Public APIs; UML Loader; Load to Stage; caDSR STAGE; Compatibility Review; Approved Annotated Fixed XMI; caDSR Production; caDSR Services; Common Data Elements; Metadata Retrieval]
caBIG compatibility
Mapping UML to CDEs
• UML Class + Attribute + Datatype maps to a Common Data Element (CDE)
• UML Class + Attribute maps to a Data Element Concept (DEC)
• UML Datatype maps to a Value Domain (VD)
• UML Class maps to an Object Class (OC)
• UML Attribute maps to a Property
• Object Class and Property are each tied to an EVS Concept
caBIG compatibility
Mapping UML to CDEs example
• Class: Gene maps to Object Class: Gene
• Attribute: entrezGeneID maps to Property: Entrez Gene Genomic Identifier
• Datatype: String maps to Value Domain: java.lang.String
Created Data Element: Gene Entrez Gene Genomic Identifier java.lang.String
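The mapping from a UML class, attribute, and datatype to a CDE can be sketched as a small composition of objects. In this Python illustration the class names follow the metadata terms on the slides, while the lookup tables `PROPERTY_NAMES` and `VALUE_DOMAINS` and the `build_cde` helper are hypothetical placeholders for the curated annotation that the real tooling and curators provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str  # derived from the UML class
    prop: str          # derived from the UML attribute


@dataclass(frozen=True)
class CommonDataElement:
    concept: DataElementConcept
    value_domain: str  # derived from the UML datatype

    def long_name(self):
        """Concatenate object class, property, and value domain names."""
        return f"{self.concept.object_class} {self.concept.prop} {self.value_domain}"


# Hypothetical lookup tables standing in for curated annotations.
PROPERTY_NAMES = {"entrezGeneID": "Entrez Gene Genomic Identifier"}
VALUE_DOMAINS = {"String": "java.lang.String"}


def build_cde(uml_class, attribute, datatype):
    """Compose a CDE from a UML class, one of its attributes, and its datatype."""
    dec = DataElementConcept(uml_class, PROPERTY_NAMES[attribute])
    return CommonDataElement(dec, VALUE_DOMAINS[datatype])


cde = build_cde("Gene", "entrezGeneID", "String")
print(cde.long_name())
```

Keeping the Data Element Concept separate from the Value Domain mirrors the slide: the same concept could be paired with a different representation without redefining its meaning.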
caBIG compatibility
Use SIW to designate existing CDEs
caGrid
Background, service creation, metadata
caGrid
What is caGrid?
What is Grid?
• Evolution of distributed computing to support sciences and engineering
• Sharing of resources (computational, storage, data, etc.)
• Secure Access (global authentication, local authorization, policies, trust, etc.)
• Open Standards
• Virtualization
What is caGrid?
• Development project of Architecture Workspace
• Helping define and implement Gold Compliance
• Implementation of Grid technology
• Leverages open standards, community open source projects
• No requirements on implementation technology necessary for compliance
• Specifications will be created defining requirements for interoperability
• caGrid provides core infrastructure, and tooling to provide “a way” to achieve Gold compliance
• Gold compliance creates the G in caBIG™
• Gold => Grid => connecting Silver Systems
caGrid
Metadata infrastructure goals
• Support strongly typed grid
• Syntactic and Semantic interoperability
• Programmatic!
• Smooth transition from Application to Grid and back
• Leverage wealth of existing metadata
• Enable service Advertisement and Discovery
caGrid
Service development process
• Service developers first create a service using a simple wizard to specify information (target directory, type of service, service name, etc.)
• Next, developers locate the data types they will use for inputs or outputs
• These can be discovered from the caDSR, GME, file system, etc.
• Operations are then defined that take some number of the data types as input, and produce some number as output
• Metadata and Service Properties can be added and configured
• The service’s security can be completely configured
• Some or all of these steps may be automatically handled by extensions
caGrid
Introduce
• GUI for creating and manipulating a grid service
• Provides means of simple creation of service skeleton that a developer can then implement, build, and deploy
• Automatic code generation of complete caBIG compliant grid service which is configured to provide: Advertisement, Standard Metadata, Security, Complete Client API
caGrid
caGrid data description infrastructure
• Client and service APIs are object oriented, and operate over well-defined and curated data types
• Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)
• Object definitions draw from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described
• XML serializations of objects adhere to XML schemas registered in the Global Model Exchange (GME)
[Figure: caGrid data description infrastructure. Labels: Core Services (GME / Global Model Exchange, Cancer Data Standards Repository, Enterprise Vocabulary Services); Service Definition (WSDL, XSD); Data Type Definitions; Object Definitions; Registered In; Semantically Described In; Client Uses; Validates Against; Objects Serialize To XML; Grid Service with Service API; Grid Client with Client API]
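The figure's "Objects Serialize To XML" and "Validates Against" steps can be illustrated with a round trip through Python's standard `xml.etree.ElementTree`. This is a sketch under stated assumptions: the namespace URI is hypothetical, the `Gene` fields are taken from the earlier slide, and the hand-written checks in `gene_from_xml` only stand in for real XML Schema validation, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace; a real one would come from the registered schema.
NS = "urn:example:central-dogma"
GENE_ATTRIBUTES = ("name", "hugoGeneSymbol", "sequence")


def gene_to_xml(gene):
    """Serialize a gene dict to an XML string with one XML attribute per field."""
    element = ET.Element(f"{{{NS}}}Gene", {k: gene[k] for k in GENE_ATTRIBUTES})
    return ET.tostring(element, encoding="unicode")


def gene_from_xml(text):
    """Parse the XML back, rejecting documents with the wrong element or fields."""
    element = ET.fromstring(text)
    if element.tag != f"{{{NS}}}Gene":
        raise ValueError(f"unexpected element {element.tag}")
    missing = [k for k in GENE_ATTRIBUTES if k not in element.attrib]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    return {k: element.attrib[k] for k in GENE_ATTRIBUTES}


# Made-up sample values.
original = {"name": "example gene", "hugoGeneSymbol": "BRCA1", "sequence": "ATGC"}
xml_text = gene_to_xml(original)
restored = gene_from_xml(xml_text)
print(restored == original)
```

Sharing one agreed structure for the XML is what lets a client written independently of the service still reconstruct the same object.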
caGrid
Metadata services
Cancer Data Standards Repository (caDSR)
• caBIG projects register their data models as Common Data Elements (CDEs) which are semantically harmonized and then centrally stored and managed in the caDSR
• The caDSR grid service provides model discovery and traversal, and caGrid standard metadata generation capabilities
Enterprise Vocabulary Services (EVS)
• EVS is a set of services and resources that address the need for controlled vocabulary
• The EVS grid service provides query access to the data semantics and controlled vocabulary managed by the EVS
Global Model Exchange (GME)
• GME is a DNS-like data definition registry and exchange service that is responsible for storing and linking together data models in the form of XML schema.
• The GME grid service provides access to the authoritative structural representation of data types on the grid
Globus Information Services: Index Service
• The Globus Information Services infrastructure provides a generic framework for aggregation of service metadata, a registry of running Grid services, and a dynamic data-generating and indexing node, suitable for use in a hierarchy or federation of services
• The Index grid service provides yellow and white pages for the grid
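The "yellow and white pages" role of the Index Service can be pictured as a registry where services advertise metadata and clients look them up either by capability or by address. The following is a toy in-memory sketch in Python; the `IndexService` class here, its method names, the metadata keys, and the example URLs are all hypothetical and do not reflect the Globus or caGrid APIs.

```python
class IndexService:
    """Toy in-memory registry: services advertise metadata, clients discover."""

    def __init__(self):
        self._entries = {}

    def advertise(self, url, metadata):
        """Register or refresh a service entry keyed by its URL."""
        self._entries[url] = dict(metadata)

    def discover_by_class(self, class_name):
        """'Yellow pages': sorted URLs of services exposing a domain class."""
        return sorted(
            url for url, meta in self._entries.items()
            if class_name in meta.get("domain_classes", ())
        )

    def lookup(self, url):
        """'White pages': metadata for a known service URL, or None."""
        return self._entries.get(url)


# Made-up services and metadata.
index = IndexService()
index.advertise("https://example.org/tumor-registry",
                {"domain_classes": ["Participant", "Diagnosis"]})
index.advertise("https://example.org/tissue-bank",
                {"domain_classes": ["Participant", "Specimen"]})

print(index.discover_by_class("Participant"))
print(index.lookup("https://example.org/tissue-bank"))
```

A query engine could use the capability lookup to decide which services to contact before sending any queries.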
caGrid
caGrid production environment
The Cancer Translational Research
Informatics Platform (caTRIP)
Demonstration
caTRIP
Clinical and research scenarios
•
•
•
Clinical scenario for demonstration
• A patient enters the clinic and is diagnosed with a lobular carcinoma
• The Her2/Neu biomarker test comes back positive
• What are the treatments and outcomes of other patients with similar
characteristics?
• Query for diagnosis date, treatment, treatment date, survival, recurrence, and
BRCA1 and BRCA2 status
• Look for treatments given with success and correlation between BRCA status in
case test should be ordered
Research scenario for demonstration
• Is there a correlation between recurrence, mortality, histologic grade, and
Her2/Neu status for breast cancer patients diagnosed with lobular
carcinoma?
• Query caTRIP for recurrence type, date of death, histologic grade, and Her2/Neu
status for patients diagnosed with lobular carcinoma
• Correlation is determined in Microsoft Excel
• Investigate gene biomarkers that correlate with a Her2/Neu status of negative
and survival
• Query caTRIP for all available tissue to order for microarray experiments
Query sharing
• Which patients are triple negative?
caTRIP
Why the Simple GUI?
• What are all the tissue specimens from her2/neu positive patients that have a
primary tumor in the breast and are BRCA1 positive?
[Diagram labels: caTissue CORE, CAE, Tumor Registry, CGEMS, Participant, Medical Record Number]
Discussion/questions
Backup Slides
CTMS Interoperability Project
Goals, scope, BRIDG, architecture, demo
CTMSi
A collaborative effort
11 Organizations
• Booz Allen Hamilton
• Dana-Farber
• Duke University
• Ekagra
• Harvard University
• Mayo Clinic
• NCICB
• Nortel Government Solutions
• Northwestern University
• ScenPro
• SemanticBits
8 Locations
• Maryland
• Minnesota
• Virginia
• Georgia
• Massachusetts
• North Carolina
• Illinois
• France
35+ Team Members / 5 Applications
• Cancer Central Clinical Participant Registry
(C3PR)
• Cancer Central Clinical Database (C3D)
• Patient Study Calendar (PSC)
• caXchange: LabViewer and the Clinical Trials
Object Model (CTOM)
• Cancer Adverse Events Reporting System
(caAERS)
8 Roles
• Analysts
• Architects
• Developers
• Project Director
• Project Manager
• Project Sponsor
• Project Tech Leads
• Subject Matter Experts
CTMSi
Credits
Project Director:
Meg Gronvall (BAH)
Charles N. Mead, M.D. (BAH)
NCICB CTMS Lead:
Christo Andonyadis, D.Sc. (NCICB)
Project Manager:
Edmond Mulaire (SemanticBits)
Project Architects:
Patrick McConnell (Duke)
Niket Parikh (BAH)
Analysts:
Smita Hastak (ScenPro)
Wendy Ver Hoef (ScenPro)
Subject Matter Experts:
Sharon Elcombe (Mayo Clinic)
Vijaya Chadaram (Duke)
Jomol Mathew (Dana-Farber)
Renee Webb (Northwestern)
NCICB Systems Support:
Gavin Brennan (TerpSys), Vanessa
Caldwell (TerpSys), Doug Kanoza (TerpSys),
Wei Lu (TerpSys), Ralph Rutherford (TerpSys)
Project Technical Leads:
Ram Chilukuri (SemanticBits)
Charles Griffin (Ekagra)
Vinay Kumar (SemanticBits)
Stephen Reckford (Nortel Government Solutions)
Rhett Sutphin (Northwestern)
Sean Whitaker (Northwestern)
caAERS: Ram Chilukuri (SemanticBits), Krikor Krumlian
(Akaza Research), Vinay Kumar (SemanticBits), Rhett
Sutphin (Northwestern), Kulasekaran Sethumadhavan
(SemanticBits), Sujith Thayylithodi (SemanticBits)
caGrid: Manav Kher (SemanticBits), Vinay Kumar
(SemanticBits), Joshua Phillips (SemanticBits)
caXchange (Lab Viewer/CTOM): Charles Griffin
(Ekagra), Smita Hastak (ScenPro), Mukesh Mediratta
(Ekagra), Kunal Modi (Ekagra), Wendy Ver Hoef
(ScenPro)
caXchange Extensions: Ekagra, SemanticBits
C3D: Srinivas Batchu (Ekagra), Patrick Conrad (Ekagra),
Rangaraju Gadiraju (Ekagra), Stephen Reckford (Nortel)
C3PR: Kruttik Aggarwal (SemanticBits), Ram Chilukuri
(SemanticBits), Ramakrishna Gundala (SemanticBits),
Manav Kher (SemanticBits), Patrick McConnell (Duke),
Priyatam Mudivarti (SemanticBits)
PSC: Rhett Sutphin (Northwestern), Sean Whitaker
(Northwestern)
CTMSi
Goal
Integrate
Lab Results
Participant Registration
caXchange
caGrid
Patient Scheduling
Adverse Events
Clinical Trials DB
CTMSi
BRIDG extract
[UML class diagram: CTMS Interoperability BRIDG-Based Analysis Model for Data Exchange. Author: Smita Hastak. Version: 1.0. Created: 8/13/2001. Updated: 1/12/2007.]
Classes shown: Person, Participant, StudySubject, StudySite, HealthCareSite, Organization, Study, Activity, PerformedActivity, StudyParticipantEligibility, AdverseEvent, LabTest, LabResult
Attributes recoverable from the diagram:
• Person: administrativeGenderCode (BRIDGCodedConcept), dateOfBirth (dateTime), ethnicGroup, firstName, lastName, race (string)
• Role: id (BRIDGID)
• StudySubject: studySubjectIdentifier (BRIDGID)
• Participation: endDate (dateTime), identifier (BRIDGID), startDate (dateTime), status (BRIDGStatus)
• Organization: identifier (BRIDGID), name (string)
• Study: id (BRIDGID), longTitle (string)
• Activity: codedDescription (BRIDGCodedConcept), description (BRIDGDescription), status (BRIDGStatus), type (BRIDGCodedConcept)
• StudyParticipantEligibility: isEligible (boolean)
• AdverseEvent: verbatimTerm (String)
• LabResult: textResult (string)
• QuantitativeMeasurement: numericResult (float), numericUnits (BRIDGCodedConcept), referenceRangeComment (string), referenceRangeHigh (int), referenceRangeLow (int)
• Other attributes shown: endDateTime, startDateTime (dateTime)
• Association labels: "are performed at", "participate in", labTest, labResult
Implementation notes on the diagram:
• In implementation: do NOT use identifier; C3PR only uses SubjectIdentifier
• In implementation: do NOT use endDate, startDate, status
NOTES
Green notes mark classes where attributes inherited from the same superclass
are inherited in two different subclasses but are not necessarily used in both.
Note to Implementers: This is an analysis model, not an implementation model, and
therefore supplemental attributes may be required in your implementation model to
support data exchange between applications (e.g. extra ids). Furthermore, it may be
that not all attributes included here are required for data exchanges and may be
eliminated from this model. It is also likely that an implementation based on this
model may collapse associations to simplify the structure of data exchanges.
Disclaimer: BRIDG classes used in this model have been pared down to only
what is needed for data exchange in the CTMS Interoperability project, and
this in no way indicates or suggests changes to the official BRIDG model.
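The analysis model above can be approximated in code. The following is a minimal Python sketch, assuming plain dataclasses stand in for the BRIDG types: BRIDGID and BRIDGCodedConcept are reduced to strings, which is a simplification and not how BRIDG defines them. Field names follow the diagram; the link from StudySubject to Person and the `is_out_of_range` helper are hypothetical additions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Person:
    # Attributes as listed for Person in the analysis model.
    first_name: str
    last_name: str
    date_of_birth: datetime
    administrative_gender_code: str  # BRIDGCodedConcept, simplified to str
    ethnic_group: Optional[str] = None
    race: Optional[str] = None


@dataclass
class StudySubject:
    # BRIDGID simplified to str; the direct link to Person is an assumption.
    study_subject_identifier: str
    person: Person


@dataclass
class LabResult:
    # Quantitative measurement fields from the diagram.
    numeric_result: float
    numeric_units: str  # BRIDGCodedConcept, simplified to str
    reference_range_low: int
    reference_range_high: int
    text_result: Optional[str] = None
    reference_range_comment: Optional[str] = None

    def is_out_of_range(self) -> bool:
        """Hypothetical helper: True when the result is outside the reference range."""
        return not (self.reference_range_low <= self.numeric_result <= self.reference_range_high)


subject = StudySubject("SUBJ-1", Person("Ann", "Lee", datetime(1960, 1, 2), "F"))
in_range = LabResult(15.2, "g/dL", 12, 16)
low = LabResult(9.0, "g/dL", 12, 16)
```

As the note to implementers says, a real implementation model would likely add identifiers and collapse associations; this sketch only shows how the attribute lists map onto typed records.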
[UML class diagram: BRIDG Comprehensive Logical Model. Author: Fridsma. Version: 1.0. Created: 7/22/2005. Updated: 7/29/2005.]
Subject areas labelled on the diagram: Clinical Trial Design; Protocol Authoring and Documentation; Protocol activities and Safety monitoring (AE); Clinical Trial Registration; Eligibility Determination; Structured Statistical Analysis
Packages shown: Protocol Concepts, Design Concepts, BusinessObjects, Statistical Concepts, Entities and Roles, BasicTypes, Plans, OStudy Design and Data Collection, OProtocolStructure
Representative classes: Study, Arm, Randomization, Masking, StudyObjective, Outcome, EligibilityCriterion, StudySchedule, ProtocolDocument, Hypothesis, Analysis, StatisticalModel, Person, Organization, AdverseEventPlan
CTMSi
Architectural overview
[Architecture diagram: five applications (C3PR, C3D, PSC, LabViewer/CTOM, caAERS), each backed by an Oracle or PostgreSQL database and exposed as a grid service or web service, exchange messages through the caXchange Enterprise Service Bus (inbound binding component, routing rules, outbound binding component). caGrid supplies the security services: Dorian (authentication), GTS (trust), and Grid Grouper (authorization).]
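The routing role of the service bus can be illustrated with a toy example. This is a minimal sketch, assuming a static table that maps a message type to destination services; `MessageBus`, its methods, and the payload field are hypothetical names, not the caXchange API, and the destinations are chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], str]


class MessageBus:
    """Toy stand-in for an enterprise service bus with static routing rules."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[str]] = defaultdict(list)
        self._services: Dict[str, Handler] = {}

    def register_service(self, name: str, handler: Handler) -> None:
        self._services[name] = handler

    def add_route(self, message_type: str, destination: str) -> None:
        self._routes[message_type].append(destination)

    def send(self, message_type: str, payload: dict) -> Dict[str, str]:
        # Deliver the payload to every destination configured for this
        # message type and collect one acknowledgement per destination.
        acks = {}
        for destination in self._routes.get(message_type, []):
            acks[destination] = self._services[destination](payload)
        return acks


bus = MessageBus()
for app in ("PSC", "caAERS"):
    # Bind `app` as a default argument so each handler keeps its own name.
    bus.register_service(app, lambda payload, app=app: f"{app} received {payload['studySubjectId']}")
    bus.add_route("registerPatient", app)

acks = bus.send("registerPatient", {"studySubjectId": "SUBJ-1"})
```

Keeping the routing table outside the applications means a new consumer can be added by registering one more route, without changing the sender.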
CTMSi
Demonstration
Sequence diagram: overview sequence
Lifelines: caExchange, C3PR, C3D WS, PSC, LabViewer, CTOM, caAERS, SME
The user will create a new patient and register the patient to a protocol, checking the eligibility status. The protocol is already prepopulated among all the systems.
• registerPatient
• registerPatient(Participant, StudySubject, StudySite, HealthCareSite)
• isValidProtocol(studyId)
• patientPositionId = getPatientPosition(site, studyId)
• registerPatient(Participant, StudySubject, StudySite, HealthCareSite) (four further calls with the same arguments)
The user will have a hot-link from the C3PR interface to the PSC interface. The user will see the patient registered on the prepopulated protocol.
• viewSchedule
• scheduleActivity
The user will hot-link over to the Lab Viewer to view lab activities.
• viewLabActivities
• viewLabData(Patient)
• viewLabData
• Lab[] = query(id[])
• loadLabData
caExchange (or some component hooked into caExchange) will load data into C3D.
• loadLabData(Participant, StudySubject, Study, LabTest, LabResult)
• loadLabData(mrn, studyId, lab, labTest)
We may not be able to hot-link to C3D, but the data should be propagated there and viewable from the C3D interface.
• viewPatient
• viewLabData
• selectLabForAE
• Lab[] = query(id[])
A new AE with some minimal information will be created and sent to caAERS through caExchange.
• newAE(Participant, StudySubject, Study, LabTest, LabResult)
• id = newAE(Participant, StudySubject, Study, LabTest, LabResult, AE)
The user will hot-link from the LabViewer to caAERS, where they can edit and submit the AE.
• editAE
• submitAE
• submitAE(Participant, StudySubject, Study, AE)
• flagAE(Participant, StudySubject, Study, AE)
The user hot-links from caAERS to PSC, where they will see the AE notification and make appropriate changes.
• login
• aeNotification
• modifySchedule
Service Metadata: All Services
Common Service Metadata
• Provided by all services
• Details service’s capabilities,
operations, contact information,
hosting research center
• Service operation’s inputs and
outputs defined in terms of
structure and semantics
extracted from caDSR and
EVS
• Majority auto-generated by
Introduce
Service Metadata: Service Security
Service Security Metadata
• Provided by all services
• Details the service’s
requirements on
communication channel for
each operation
• Can be used by client to
programmatically
negotiate an acceptable
means of communication
• For example: Does
operation X allow
anonymous clients, or are
credentials required?
• Auto-generated by
Introduce
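The negotiation described above can be sketched as follows. This is a minimal Python illustration, assuming the security metadata has already been fetched and parsed into a per-operation policy; `OperationSecurity`, `SERVICE_SECURITY`, `choose_channel`, the operation names, and the transport-encryption flag are all hypothetical and do not reflect the actual caGrid metadata schema.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OperationSecurity:
    anonymous_allowed: bool
    transport_encryption_required: bool  # assumed field, for illustration


# Hypothetical, already-parsed security metadata for one service.
SERVICE_SECURITY: Dict[str, OperationSecurity] = {
    "getServiceMetadata": OperationSecurity(anonymous_allowed=True, transport_encryption_required=False),
    "query": OperationSecurity(anonymous_allowed=False, transport_encryption_required=True),
}


def choose_channel(operation: str, has_credentials: bool) -> str:
    """Pick a communication mode the operation accepts, or raise if none fits."""
    policy = SERVICE_SECURITY[operation]
    if not policy.anonymous_allowed and not has_credentials:
        raise PermissionError(f"{operation} requires credentials")
    identity = "authenticated" if has_credentials else "anonymous"
    transport = "encrypted" if policy.transport_encryption_required else "plain"
    return f"{identity}/{transport}"
```

A client would call `choose_channel` before invoking an operation, so that a missing credential is detected locally instead of surfacing as a failed remote call.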
Service Metadata: Data Service
Data Service Metadata
• Provided by all data
services
• Describes the Domain
Model being exposed, in
terms of a UML model
linked to semantics
• Provides information
needed to formulate the
Object-Oriented Query
• As with common
metadata, data types
defined in terms of
structure and semantics
extracted from caDSR and
EVS
• Auto-generated by
Introduce
caTRIP in-depth: Architecture
Security
[Security architecture diagram showing an authentication path and an authorization path. Labels: User Credentials, caGrid Authentication Service, Duke Authentication Plugin, Duke Domain Controller (NT Security), SAML Assertion, Dorian, User Grid Certificate, Trust Fabric, Grid Data Service, CSM, Grid Grouper, backend data]
caTRIP in-depth: Data sharing
Challenges in data sharing
Building data-oriented systems
• Duke requires IRB approval to gain access to identifiable data
• We worked around this by leveraging people already on IRB protocols
Deidentifying data
• Data is owned by different groups across the cancer center
• Traditional deidentification: a data manager deidentifies an entire dataset,
then throws away the key
• Distributed deidentification: a trusted service provider (TSP) deidentifies
discrete values
• The traditional approach is not scalable because it requires a middleman
• IRB approval is required for the distributed approach because it deviates from
traditional deidentification (at Duke)
caTRIP in-depth: Data sharing
Distributed deidentification
[Diagram: over a secure connection, an identifier (MRN3) is exchanged with the Trusted Service Provider for its deidentified value (GHI789). Labels mark which parties have IRB approval to see identifiable data and which has IRB approval to store it.]
The Trusted Service Provider holds the mapping from PHI to randomly generated DEID values:
PHI    DEID
MRN1   ABC123
MRN2   DEF456
MRN3   GHI789
. . .
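The mapping kept by the trusted service provider can be sketched in a few lines. This is a minimal illustration, assuming an in-memory table and six-character random pseudonyms; `TrustedServiceProvider` and its methods are hypothetical names, and a real TSP would also need persistence, access control, and the secure transport shown on the slide.

```python
import secrets
import string
from typing import Dict, Set


class TrustedServiceProvider:
    """Toy TSP: maps each identifier to a stable, randomly generated pseudonym."""

    _ALPHABET = string.ascii_uppercase + string.digits

    def __init__(self) -> None:
        self._phi_to_deid: Dict[str, str] = {}
        self._issued: Set[str] = set()

    def deidentify(self, mrn: str) -> str:
        # Reuse the existing pseudonym so the same MRN always maps to the
        # same value, which keeps joins across services possible.
        if mrn not in self._phi_to_deid:
            self._phi_to_deid[mrn] = self._new_pseudonym()
        return self._phi_to_deid[mrn]

    def _new_pseudonym(self) -> str:
        # Draw random values until an unused one is found; the pseudonym is
        # not derived from the MRN in any way.
        while True:
            candidate = "".join(secrets.choice(self._ALPHABET) for _ in range(6))
            if candidate not in self._issued:
                self._issued.add(candidate)
                return candidate


tsp = TrustedServiceProvider()
first = tsp.deidentify("MRN1")
again = tsp.deidentify("MRN1")
other = tsp.deidentify("MRN2")
```

Because each data service asks the TSP about individual values, no single data manager has to deidentify a whole dataset up front, which is the scalability argument made above.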
caTRIP in-depth: Architecture
Simple GUI configuration
[Configuration diagram for joining Service A and Service B. Labels: Target, Linking Object, Join Condition (a CDE, for example MRN), Filter Object, Associated Classes, Associated Object Tree, Association Direction, Foreign Association, Foreign Association Inbound Paths, Foreign Association Outbound Path. Example classes shown: BreastCancerBiomarkers, ParticipantMedicalIdentifier, SpecimenCharacteristics]
caTRIP in-depth: Architecture
caBIG compatibility
Challenge
• Silver-compatibility is in some ways (and for good reason) stringent
• Grid technologies were still in development (caGrid 1.0 is now released)
caTRIP is a silver-compatible application (in theory)
• Compatibility submission package completed
• Going through review now for silver-compatible data services
caTRIP leverages caCORE technologies
• Common Security Module (CSM) provides authorization
• caCORE-SDK provides tooling to create Java classes from UML (XMI),
XML schemas, and castor mappings
caTRIP leverages caGrid technologies
• Index Service provides advertisement and discovery
• Authentication Service provides
• Dorian helps provide authentication
• GTS provides trust fabrics
Next steps
Aggregate data from multiple services of the same type
• Scenario: caTissue Suite deployed at 13 cancer centers
Add datasets and data types
• CTMS, population sciences, basic science, etc.
Add analytical services
• Integrate with workflow
• Add visualization components
Enhanced reporting
• Automate Excel pivot table
• Data mining results
Enhanced querying
• Asynchronous, parallel querying
• Querying multiple deployed distributed query services
Continue refinement of user interface
• Synchronization of advanced and simple GUI
• Additional usability features
caGrid
caBIG Resources
• caBIG™ Website: http://cabig.cancer.gov/index.asp
• caBIG™ Compatibility Guidelines: https://cabig.nci.nih.gov/compatibility_guidelines_documentation/
• Cancer Common Ontologic Representation Environment (caCORE): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview
• Enterprise Vocabulary Services (EVS): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/vocabulary
• Cancer Data Standards Repository (caDSR): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/cadsr
• caCORE Software Developer’s Kit (caCORE SDK): http://ncicb.nci.nih.gov/NCICB/infrastructure/cacoresdk
• caCORE Training: http://ncicb.nci.nih.gov/NCICB/training/cadsr_training
• Model Driven Architecture: http://www.omg.org/mda/
• UML Modeling: http://www.sparxsystems.com.au/UML_Tutorial.htm
caTRIP
Why can’t I just write DCQL?
• What are all the tissue specimens from her2/neu positive patients that
have a primary tumor in the breast and are BRCA1 positive?
<DCQLQuery xmlns="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql">
  <TargetObject name="edu.wustl.catissuecore.domainobject.impl.TissueSpecimenImpl" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CaTissueCore">
    <Association name="edu.wustl.catissuecore.domainobject.impl.SpecimenCollectionGroupImpl" roleName="specimenCollectionGroup">
      <Association name="edu.wustl.catissuecore.domainobject.impl.ClinicalReportImpl" roleName="clinicalReport">
        <Association name="edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl" roleName="participantMedicalIdentifier">
          <Group logicRelation="AND">
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>edu.duke.catrip.cae.domain.general.ParticipantMedicalIdentifier</Object>
                  <Property>medicalRecordNumber</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="edu.duke.catrip.cae.domain.general.ParticipantMedicalIdentifier" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CAE">
                <Association name="edu.duke.catrip.cae.domain.general.Participant" roleName="participant">
                  <Association name="edu.pitt.cabig.cae.domain.general.AnnotationEventParameters" roleName="annotationEventParametersCollection">
                    <Association name="edu.pitt.cabig.cae.domain.breast.BreastCancerBiomarkers" roleName="annotationSetCollection">
                      <Attribute name="HER2Status" predicate="LIKE" value="POSITIVE%"/>
                    </Association>
                  </Association>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>edu.duke.cabig.tumorregistry.domain.PatientIdentifier</Object>
                  <Property>medicalRecordNumber</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="edu.duke.cabig.tumorregistry.domain.PatientIdentifier" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CaTRIPTumorRegistry">
                <Association name="edu.duke.cabig.tumorregistry.domain.Patient" roleName="patient">
                  <Association name="edu.duke.cabig.tumorregistry.domain.Diagnosis" roleName="diagnosisCollection">
                    <Attribute name="primarySite" predicate="LIKE" value="BREAST%"/>
                  </Association>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
            <ForeignAssociation>
              <JoinCondition>
                <LeftJoin>
                  <Object>edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl</Object>
                  <Property>medicalRecordNumber</Property>
                </LeftJoin>
                <RightJoin>
                  <Object>gov.nih.nci.caintegrator.domain.study.bean.StudyParticipant</Object>
                  <Property>studySubjectIdentifier</Property>
                </RightJoin>
              </JoinCondition>
              <ForeignObject name="gov.nih.nci.caintegrator.domain.study.bean.StudyParticipant" serviceURL="http://152.16.96.114/wsrf/services/cagrid/CGEMS">
                <Association name="gov.nih.nci.caintegrator.domain.analysis.snp.bean.SNPAnalysisGroup" roleName="analysisGroupCollection">
                  <Attribute name="name" predicate="LIKE" value="BRCA1%"/>
                </Association>
              </ForeignObject>
            </ForeignAssociation>
          </Group>
        </Association>
      </Association>
    </Association>
  </TargetObject>
</DCQLQuery>
Query annotations:
• Select tissue
• Foreign Join w/ CAE: HER2/NEU Positive
• Foreign Join w/ Tumor Registry: Primary Site Breast
• Foreign Join w/ CGEMS: BRCA1 Positive
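Much of the query above is repeated structure, which is one reason to generate it instead of writing it by hand. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the element and attribute names shown on the slide; the `foreign_association` helper is hypothetical, and the DCQL namespace is omitted for brevity, so the output is a fragment and not a complete valid query.

```python
import xml.etree.ElementTree as ET


def foreign_association(left, right, service_url, path, attribute):
    """Build one <ForeignAssociation>: a join condition plus a LIKE filter.

    left, right: (class name, property name) pairs for the join condition.
    path: list of (class name, role name) pairs walked inside the foreign object.
    attribute: (attribute name, LIKE pattern) applied at the end of the path.
    """
    fa = ET.Element("ForeignAssociation")
    join = ET.SubElement(fa, "JoinCondition")
    for tag, (obj, prop) in (("LeftJoin", left), ("RightJoin", right)):
        side = ET.SubElement(join, tag)
        ET.SubElement(side, "Object").text = obj
        ET.SubElement(side, "Property").text = prop
    node = ET.SubElement(fa, "ForeignObject", name=right[0], serviceURL=service_url)
    for class_name, role_name in path:
        node = ET.SubElement(node, "Association", name=class_name, roleName=role_name)
    attr_name, pattern = attribute
    ET.SubElement(node, "Attribute", name=attr_name, predicate="LIKE", value=pattern)
    return fa


# Rebuild the Tumor Registry join from the slide.
registry_join = foreign_association(
    left=("edu.wustl.catissuecore.domainobject.impl.ParticipantMedicalIdentifierImpl", "medicalRecordNumber"),
    right=("edu.duke.cabig.tumorregistry.domain.PatientIdentifier", "medicalRecordNumber"),
    service_url="http://152.16.96.114/wsrf/services/cagrid/CaTRIPTumorRegistry",
    path=[("edu.duke.cabig.tumorregistry.domain.Patient", "patient"),
          ("edu.duke.cabig.tumorregistry.domain.Diagnosis", "diagnosisCollection")],
    attribute=("primarySite", "BREAST%"),
)
xml_text = ET.tostring(registry_join, encoding="unicode")
```

The three foreign joins in the slide differ only in the arguments passed here, which is roughly the information the Simple GUI collects from the user.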
caTRIP
Distributed query engine
[Diagram: a DCQL query is submitted to the Distributed Query Engine, which sends CQL queries to three caGrid data services, each backed by its own database; data objects are returned from each data service and from the engine.]
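The idea of a distributed join can be sketched in miniature. This is a toy Python illustration, assuming each data service is a function that filters in-memory rows by prefix (mimicking `LIKE 'X%'`) and that the join is a simple intersection on one shared key; it is not how the caGrid engine is implemented, and `make_service` and `distributed_query` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Record = Dict[str, str]
DataService = Callable[[Dict[str, str]], List[Record]]


def make_service(rows: List[Record]) -> DataService:
    """Wrap in-memory rows as a toy data service that filters by prefix."""
    def query(criteria: Dict[str, str]) -> List[Record]:
        return [r for r in rows
                if all(r.get(k, "").startswith(v) for k, v in criteria.items())]
    return query


def distributed_query(target: DataService, target_criteria: Dict[str, str],
                      foreign: List[Tuple[DataService, Dict[str, str]]],
                      join_key: str) -> List[Record]:
    """Return target records whose join key appears in every foreign result set."""
    allowed: Optional[Set[str]] = None
    for service, criteria in foreign:
        keys = {r[join_key] for r in service(criteria)}
        allowed = keys if allowed is None else allowed & keys
    results = target(target_criteria)
    if allowed is None:  # no foreign joins requested
        return results
    return [r for r in results if r[join_key] in allowed]


tissue = make_service([{"mrn": "MRN1", "specimen": "T-1"},
                       {"mrn": "MRN2", "specimen": "T-2"}])
cae = make_service([{"mrn": "MRN1", "HER2Status": "POSITIVE"},
                    {"mrn": "MRN2", "HER2Status": "NEGATIVE"}])
registry = make_service([{"mrn": "MRN1", "primarySite": "BREAST"},
                         {"mrn": "MRN2", "primarySite": "BREAST"}])

specimens = distributed_query(
    tissue, {},
    [(cae, {"HER2Status": "POSITIVE"}), (registry, {"primarySite": "BREAST"})],
    join_key="mrn",
)
```

Each foreign service is queried independently with its own criteria, and only the join keys travel between services, which mirrors the one-CQL-query-per-service picture in the diagram.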
CTMSi
BRIDG dynamic modeling
• Process flow
• Storyboards
• Scenarios
• Use cases
• Text UML activity diagrams
• Links to static structures
• Interaction diagrams (?)
• Sequence diagrams
• Collaboration diagrams (UML 2.0)
CTMSi
Patient registration message
[Message-flow diagram. Labels: User, caAERS, ESB, Router, Grid BC, Registration Message, Acknowledgement, caAERS Grid Service, PSC Grid Service, PSC]
caBIG compatibility
CDE Browser
caBIG compatibility
CDE Browser permissible values
caBIG compatibility
NCI Thesaurus
[Screenshot: an NCI Thesaurus concept entry, with callouts for Concept Code, Preferred Name, Definition, Synonyms, and Relationships.]
caGrid
caGrid community involvement
• caGrid itself provides no real “data” or “analysis” to caBIG
• It’s the enabling infrastructure which allows the community to do so
• Community members add value to the grid as applications,
services, and processes (for example: shared workflows)
• caGrid provides the necessary core services, APIs, and tooling
• The real “value” of the grid comes from bringing this information
to the “end user”
• Data Services: expose data to the grid in a unified way
• Analytical Services: expose analytical operations to the grid
• Community members develop end user applications which
consume the resources provided by the grid
caGrid
caGrid exposing silver systems
• Object-oriented APIs and data resources are developed
using object types and information models registered in
the caDSR
• These “silver systems” are grid-enabled by defining a grid
service interface that defines the functionality to be
exposed to the grid
• The grid service interface uses the same Object types as
the existing system, but leverages a platform and
language neutral representation (XML) of them
• The grid service implementation maps service
invocations to API calls or queries into the existing system
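
The wrapping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Diagnosis`, `ExistingRegistryApi`, and `RegistryGridService` are hypothetical names, the data is made up, and the real caGrid tooling generates its service interfaces differently. The sketch shows the shape of the idea: the service reuses the existing object type, maps an invocation to an API call, and returns a language-neutral XML representation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class Diagnosis:
    """Hypothetical domain object; in caBIG its type would be registered in the caDSR."""
    patient_id: str
    primary_site: str


class ExistingRegistryApi:
    """Stand-in for an existing object-oriented API (the "silver system")."""

    def __init__(self, rows):
        self._rows = list(rows)

    def find_by_site_prefix(self, prefix):
        return [d for d in self._rows if d.primary_site.startswith(prefix)]


class RegistryGridService:
    """Hypothetical grid-facing wrapper: same object types, XML on the wire."""

    def __init__(self, api):
        self._api = api

    def query_by_site_prefix(self, prefix):
        # Map the service invocation onto a call into the existing system.
        results = self._api.find_by_site_prefix(prefix)
        root = ET.Element("Results")
        for d in results:
            # Serialize each object as a language-neutral XML element.
            ET.SubElement(root, "Diagnosis", attrib=asdict(d))
        return ET.tostring(root, encoding="unicode")


api = ExistingRegistryApi([
    Diagnosis("P1", "BREAST, UPPER OUTER"),
    Diagnosis("P2", "LUNG"),
])
service = RegistryGridService(api)
xml_text = service.query_by_site_prefix("BREAST")
print(xml_text)
```

The existing API is untouched; only the thin wrapper knows about XML, which is what lets clients in other languages consume the same objects.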
caGrid
Federated Query Processor
• Provides a mechanism to perform basic distributed aggregations
and joins of queries over multiple data services
• As caGrid data services all use a uniform query language, CQL,
the Federated Query Infrastructure can be used to express
queries over any combination of caGrid data services
• Federated queries are expressed with a query language, DCQL,
which is an extension to CQL to express such concepts as joins,
aggregations, and target services
• Implemented as a stateful grid service, queries may be executed
asynchronously and results retrieved at a later time
• Supports secure deployments wherein result ownership is
enforced
• Coupled with semantic discovery capabilities of caGrid, provides
a powerful framework for data discovery, mining, and integration
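
A minimal sketch of the join idea, assuming two in-memory lists stand in for remote data services. The function names and sample records are hypothetical, and this is not how the Federated Query Processor is implemented; it only illustrates filtering one service with a LIKE-style predicate and then joining another service's objects on a pair of properties (here `medicalRecordNumber` and `studySubjectIdentifier`, the pair used in the DCQL example).

```python
def run_service(objects, attribute, prefix):
    """Stand-in for one CQL query against one data service (LIKE 'prefix%')."""
    return [o for o in objects if str(o.get(attribute, "")).startswith(prefix)]


def federated_join(left_objects, left_prop, right_objects, right_prop):
    """Keep left objects whose join property matches some right object."""
    right_keys = {o[right_prop] for o in right_objects}
    return [o for o in left_objects if o[left_prop] in right_keys]


# Hypothetical in-memory data standing in for two remote data services.
tissue_service = [
    {"medicalRecordNumber": "MRN-1", "specimen": "S-100"},
    {"medicalRecordNumber": "MRN-2", "specimen": "S-200"},
    {"medicalRecordNumber": "MRN-3", "specimen": "S-300"},
]
cgems_service = [
    {"studySubjectIdentifier": "MRN-1", "analysisGroup": "BRCA1 carriers"},
    {"studySubjectIdentifier": "MRN-2", "analysisGroup": "BRCA2 carriers"},
    {"studySubjectIdentifier": "MRN-3", "analysisGroup": "BRCA1 controls"},
]

# Step 1: query the foreign service with its own filter.
brca1_subjects = run_service(cgems_service, "analysisGroup", "BRCA1")
# Step 2: join the target objects against the foreign results.
joined = federated_join(tissue_service, "medicalRecordNumber",
                        brca1_subjects, "studySubjectIdentifier")
print([o["specimen"] for o in joined])  # ['S-100', 'S-300']
```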
caGrid
Data service common query language
• Specifies a target object (result) type and selects the
instances which satisfy the specified properties and
nested object properties
• Allows path navigation
• Provides logical grouping
• Provides name/predicate/value filtering on properties of
objects
• Recursively defined
• Can return full objects, a set of attributes, a count of
results, or distinct attribute values
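
The features listed above can be illustrated with a small recursive evaluator. This is a sketch under assumptions, not the caGrid implementation: the query is a plain Python dictionary rather than CQL XML, the node layout (`kind`, `role`, `child`, and so on) is invented for the example, objects are dictionaries, and LIKE handles only a trailing `%`.

```python
def matches(obj, node):
    """Recursively test one object against one query node."""
    kind = node["kind"]
    if kind == "attribute":
        # Name/predicate/value filtering on a property of the object.
        value = obj.get(node["name"])
        if node["predicate"] == "EQUAL_TO":
            return value == node["value"]
        if node["predicate"] == "LIKE":
            # Only trailing-% patterns are handled in this sketch.
            prefix = node["value"].rstrip("%")
            return isinstance(value, str) and value.startswith(prefix)
        raise ValueError("unsupported predicate: " + node["predicate"])
    if kind == "association":
        # Path navigation: follow the role name to the related object(s).
        related = obj.get(node["role"])
        if related is None:
            return False
        candidates = related if isinstance(related, list) else [related]
        return any(matches(r, node["child"]) for r in candidates)
    if kind == "group":
        # Logical grouping of nested conditions.
        results = [matches(obj, child) for child in node["children"]]
        return all(results) if node["logic"] == "AND" else any(results)
    raise ValueError("unknown node kind: " + kind)


def run_query(objects, node, count_only=False, distinct_attribute=None):
    """Return full objects, a count, or distinct values of one attribute."""
    hits = [o for o in objects if matches(o, node)]
    if count_only:
        return len(hits)
    if distinct_attribute is not None:
        return sorted({o[distinct_attribute] for o in hits})
    return hits


# Made-up sample data.
genes = [
    {"symbol": "BRCA1", "taxon": {"scientificName": "Homo sapiens"}},
    {"symbol": "BRCA2", "taxon": {"scientificName": "Homo sapiens"}},
    {"symbol": "BRCA1", "taxon": {"scientificName": "Mus musculus"}},
    {"symbol": "TP53", "taxon": {"scientificName": "Homo sapiens"}},
]
query = {"kind": "group", "logic": "AND", "children": [
    {"kind": "attribute", "name": "symbol", "predicate": "LIKE", "value": "BRCA%"},
    {"kind": "association", "role": "taxon", "child": {
        "kind": "attribute", "name": "scientificName",
        "predicate": "EQUAL_TO", "value": "Homo sapiens"}},
]}

print(run_query(genes, query, count_only=True))            # 2
print(run_query(genes, query, distinct_attribute="symbol"))  # ['BRCA1', 'BRCA2']
```

Because groups and associations contain further nodes, `matches` calls itself, which mirrors the "recursively defined" property of the language.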
caGrid
Example CQL query
Return all Genes with a symbol beginning with “BRCA” (symbol LIKE “BRCA%”)
that have an associated Taxon with a scientificName
equal to “Homo sapiens” (scientificName = “Homo sapiens”):
<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <Target name="gov.nih.nci.cabio.domain.Gene">
    <Group logicRelation="AND">
      <Attribute name="symbol" predicate="LIKE" value="BRCA%"/>
      <Association roleName="taxon" name="gov.nih.nci.cabio.domain.Taxon">
        <Attribute name="scientificName" predicate="EQUAL_TO" value="Homo sapiens"/>
      </Association>
    </Group>
  </Target>
</CQLQuery>
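
As a rough illustration of how a client might read such a query document, the sketch below parses a copy of the query with Python's standard `xml.etree.ElementTree` and prints its structure. The `describe` helper is hypothetical, and the embedded XML uses plain ASCII quotes so that it parses; nothing here is part of the caGrid client API.

```python
import xml.etree.ElementTree as ET

CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"

CQL_TEXT = """
<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
  <Target name="gov.nih.nci.cabio.domain.Gene">
    <Group logicRelation="AND">
      <Attribute name="symbol" predicate="LIKE" value="BRCA%"/>
      <Association roleName="taxon" name="gov.nih.nci.cabio.domain.Taxon">
        <Attribute name="scientificName" predicate="EQUAL_TO" value="Homo sapiens"/>
      </Association>
    </Group>
  </Target>
</CQLQuery>
"""


def describe(element, depth=0):
    """Return one indented line per query element, with the namespace removed."""
    tag = element.tag.split("}", 1)[-1]
    attrs = " ".join(f"{k}={v!r}" for k, v in element.attrib.items())
    lines = [("  " * depth + tag + " " + attrs).rstrip()]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(CQL_TEXT.strip())
# Elements live in the CQL namespace, so lookups need the {namespace}tag form.
target = root.find(f"{{{CQL_NS}}}Target")
print(target.get("name"))
print("\n".join(describe(root)))
```

The namespace declared on the root element applies to every child, which is why a bare `root.find("Target")` would return nothing.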
caBIG compatibility
Metadata and concepts example