XML/STDS PPT - University of Connecticut

XML, Standards, and Ontologies
CSE
5095
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
XML-STDS-1
Overview
- What is XML? How is it Used Today?
- XML Databases
- HL7 and CDA
- Other Standards
  - MeSH
  - Unified Medical Language System
  - ICD9 and ICD9-CM (Intl. Classification of Diseases)
  - ICD10 and ICD10-CM
  - SNOMED-CT (Clinical Terms)
  - National Drug Codes (NDC)
- Ontologies – Biomedical and Clinical
  - What are they?
  - How are they Used?
  - Can they be Improved?
XML-STDS-2
What is one Possible Solution?
- Standards and Usage of XML
- XML is Used in a Myriad of Contexts
  - Modeling and Information Exchange (XML Schemas and Instances)
  - XML Standards
    - XACML – eXtensible Access Control Markup Language
    - OWL – Web Ontology Language
    - HL7/CDA
- XML Databases
- What is/will be its Eventual Role in BMI?
XML-STDS-3
Overview of XML
- XML Overview, Tags, Schema
- XML Query Languages: XPath & XQuery
- XML Data Models
- Storage Strategy + XML DBMS:
  - Relational, CMS, Native
- Native XML DBMS: Pros/Cons
- Biomedical Information and Databases
- BMI Standards and Examples: HL7 and CDA
- Survey of Technology
XML-STDS-4
XML Overview
- eXtensible Markup Language
- Similar to HTML
- A meta-language that describes the content of the document (self-describing)
- XML is primarily used as a data storage and interchange medium
- XML exists in plain text format, although it may be compressed or otherwise altered for transfer
XML-STDS-5
XML Overview (cont.)
- XML itself has no predefined tags or grammar
- XML tags give an XML document structure and meaning
- The available tags are defined by a schema
- All tags in an XML document come in pairs, open and close (an empty element may use the self-closing form, e.g. <note/>)
- Tags must be properly nested, so there is no ambiguity in their order
XML-STDS-6
XML Tags
- XML tags may carry attributes, which store information or metadata within the tag
- Plain text can be placed between tags; it is treated as character data, not markup
- CDATA is character data
  - As an attribute type, it means that any string of non-markup characters is legal as the attribute value
- The ENTITY attribute type indicates that the attribute value names an external (unparsed) entity declared for the document
- The ID attribute type is used to specify a unique identifier for each element
XML-STDS-7
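The difference between attribute values and element text described on the tags slide can be seen with a few lines of Python. This is a minimal sketch using the standard-library `xml.etree.ElementTree`; the fragment and its values (`889923`, `John Smith`) are made up for illustration, loosely modeled on the shiporder example used on the following slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one attribute (orderid) and one child element with text.
doc = '<shiporder orderid="889923"><orderperson>John Smith</orderperson></shiporder>'

root = ET.fromstring(doc)

# Attribute: metadata stored inside the opening tag.
order_id = root.attrib["orderid"]

# Element text: character data placed between the open and close tags.
person = root.find("orderperson").text

print(order_id, person)
```

The parser exposes attributes as a dictionary on the element, while the text between tags is reached through the element's `text` field.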
XML Schema
- The structure of an XML document is defined by its schema
- Dozens of languages exist for defining an XML schema:
  - DTD
  - W3C XML Schema (XSD)
  - RELAX NG
- Any instance of an XML document can be validated against the schema file
- The schema also defines the allowable tags
XML-STDS-8
Sample XML Structure
- XML employs a tree-structured model for representing data (previous slide)

shiporder (attribute: orderid)
  orderperson
  shipto
    name
    address
    city
    country
  item
    title
    name
    quantity
    price
XML-STDS-9
Schema Example (XSD)
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="address" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="note" type="xs:string" minOccurs="0"/>
              <xs:element name="quantity" type="xs:positiveInteger"/>
              <xs:element name="price" type="xs:decimal"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="orderid" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
XML-STDS-10
Querying XML - XPath
- Many languages exist to query XML
  - XPath and XQuery are W3C standards
- XPath is a compact method of traversing the previous tree
  - Designed to facilitate use within URLs/URIs
  - /shiporder/item/name  ← view all items' names
  - Extensible with user-defined behaviors
  - Treats each tag as a node in the tree
XML-STDS-11
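A path query of the kind shown on the XPath slide can be tried with Python's standard-library `xml.etree.ElementTree`, which supports a limited XPath subset. The sketch below is only illustrative: the instance document and its values are invented, and it selects `title` (the element name declared in the XSD slide) rather than `name`.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance shaped like the shiporder schema; all values are made up.
SHIPORDER = """<shiporder orderid="889923">
  <orderperson>John Smith</orderperson>
  <shipto>
    <name>Ola Nordmann</name><address>Langgt 23</address>
    <city>Stavanger</city><country>Norway</country>
  </shipto>
  <item><title>Empire Burlesque</title><quantity>1</quantity><price>10.90</price></item>
  <item><title>Hide your heart</title><quantity>1</quantity><price>9.90</price></item>
</shiporder>"""

root = ET.fromstring(SHIPORDER)

# ElementTree paths are evaluated relative to the element being searched, so the
# absolute XPath /shiporder/item/title becomes ./item/title from the root element.
titles = [t.text for t in root.findall("./item/title")]

print(titles)
```

Each step of the path walks one level down the tree, which is why the slide describes every tag as a node.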
Querying XML - XQuery
- Functional extension of XPath
- The XML equivalent of SQL
- Navigates and manipulates document nodes
- Works on collections of documents, or even fragments

for $b in doc("bib.xml")//book
where $b/publisher = "Morgan Kaufmann"
  and $b/year = "1998"
return $b/title
XML-STDS-12
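For readers without an XQuery engine at hand, the same filter-and-project logic as the query above can be approximated in plain Python. This is a rough analogue, not XQuery itself; the `bib` document below is a made-up stand-in for `bib.xml`.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; titles and years are invented.
BIB = """<bib>
  <book><title>Book A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><title>Book B</title><publisher>Morgan Kaufmann</publisher><year>2001</year></book>
  <book><title>Book C</title><publisher>Other Press</publisher><year>1998</year></book>
</bib>"""

root = ET.fromstring(BIB)

titles = [
    book.findtext("title")                              # return $b/title
    for book in root.iter("book")                       # for $b in ...//book
    if book.findtext("publisher") == "Morgan Kaufmann"  # where ...
    and book.findtext("year") == "1998"                 # and ...
]

print(titles)
```

The comprehension mirrors the for / where / return clauses one to one, which is the sense in which XQuery plays the role of SQL for XML.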
XML Models
- Naively, there are two models of XML use:
  - Data-centric
  - Document-centric
- In reality, most XML use is a hybrid of the two
- More important is the database strategy used with XML:
  - Relational
  - Content Management
  - Native XML
XML-STDS-13
Data-Centric Model
- Information is generally stored in a relational database
- XML is the transport medium, nothing more
- It is irrelevant to the application that the data exists as XML for some period of time
- Characteristics:
  - Fine-grained data
  - Data relationship is insignificant
  - Need to transfer relational information
  - Means of storing new information
XML-STDS-14
Document-Centric Model
- XML is utilized solely as a document
  - This presentation in Open Office
- The documents, in part or in full, are stored and retrieved
- Does not originate from a relational database
- Document is used for human consumption
- Usually the information is written by hand in a format like PDF or RTF, then converted to XML
XML-STDS-15
Reality: Hybrid Model
- Most documents, like a PDF, will also contain fine-grained information (last edited date, character set)
- Data from a relational DB may even be a document, or require self-description
- Various database technologies support all models
- It is important to understand your data and choose the DB technology that is most compatible
XML-STDS-16
XML as Data Exchange Medium
- Widespread Usage Across Computing
- UML Tools have Standardized on XML Schema
  - Export Given UML Design to XML Instances
  - Track Both Design Data and Graphical Data
- Database Interactions via XML
  - Import from XML into a Relational Schema
  - Export from a Relational DB into XML Schema and Instances
- Web Services
  - Exchange of Information
  - SOAP, WSDL, and UDDI
- Facilitates Information Exchange and Portability
XML-STDS-17
Medical Data Model
- Medical data is non-homogeneous
- But there are general trends in medical data:
  - Fine-grained data such as dates, times, images
  - Documents and human-generated descriptions and observations
- Human interaction creates semi-structured data
- The ability to transfer information is essential
- Medical data fits into the hybrid model
XML-STDS-18
Data-Centric Comparison
- Advantages:
  - Utilizes existing database software (IBM, Oracle, SQL Server)
  - Quick (existing DBs are already fast)
  - Dual role (not limited only to XML)
  - Many even support XQuery
- Disadvantages:
  - More configuration (mapping relational -> XML)
  - Slower when creating complex XML files due to the middle step
XML-STDS-19
Document-Centric Comparison
- Advantages:
  - Good integration into workflow
  - Document management made easy
  - Collaboration and web publishing
- Disadvantages:
  - Not able to extract data from the document directly
  - Not designed for high-availability, high-load systems
  - Non-uniformity in implementations
XML-STDS-20
Storage Strategy: Relational
- Utilizing a relational database to store XML documents and data is very popular
- In a very data-centric application this approach is intuitive
- Most top-tier database applications support XML in some way
  - Oracle, SQL Server, IBM, etc.
- Software is highly supported and well developed
XML-STDS-21
XML Schema Mapping
- Using a relational DB requires mapping the XML schema to the DB schema
- Table-based:
  - Often implemented as a middleware layer
  - Schema structure must follow the row-column convention
- Object-relational:
  - XML is a tree of objects
  - Mapped to the DB using well-established OR methods
  - Natively supported in some DB applications
XML-STDS-22
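The table-based mapping described above can be sketched with Python's built-in `sqlite3` module. This is one possible hand-written mapping under stated assumptions, not a standard one: the table layout, column names, and sample order are all hypothetical, with the repeating `item` element becoming a child table keyed by `orderid`.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical shiporder instance; values are made up.
ORDER = """<shiporder orderid="889923">
  <orderperson>John Smith</orderperson>
  <item><title>Empire Burlesque</title><quantity>1</quantity><price>10.90</price></item>
  <item><title>Hide your heart</title><quantity>2</quantity><price>9.90</price></item>
</shiporder>"""

conn = sqlite3.connect(":memory:")
# One table per element that can repeat; the XML nesting becomes a foreign key.
conn.execute("CREATE TABLE shiporder (orderid TEXT PRIMARY KEY, orderperson TEXT)")
conn.execute(
    "CREATE TABLE item (orderid TEXT REFERENCES shiporder(orderid), "
    "title TEXT, quantity INTEGER, price REAL)"
)

root = ET.fromstring(ORDER)
orderid = root.attrib["orderid"]
conn.execute(
    "INSERT INTO shiporder VALUES (?, ?)", (orderid, root.findtext("orderperson"))
)
for item in root.findall("item"):
    conn.execute(
        "INSERT INTO item VALUES (?, ?, ?, ?)",
        (
            orderid,
            item.findtext("title"),
            int(item.findtext("quantity")),
            float(item.findtext("price")),
        ),
    )

# Once shredded into rows, ordinary SQL applies.
total_units = conn.execute(
    "SELECT SUM(quantity) FROM item WHERE orderid = ?", (orderid,)
).fetchone()[0]
print(total_units)
```

Rebuilding the original document from these rows requires the reverse mapping, which is the extra "middle step" noted earlier as a cost of the relational approach.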
Storage Strategy: CMS
- CMS – Content Management System
- Used exclusively in the document-centric model
- Various programs allow indexing, storage, manipulation, and publication of XML documents
- Application specific
- Numerous implementations, most recently Open Office and MS Word 2007
- Not very interesting or useful in the context of biomedical information
XML-STDS-23
Storage Strategy: Native
- Semi-structured data
  - Mapping to a relational DB causes inflation and null space
  - Need more functionality and granularity than a CMS
- Performance increase over a relational DB by avoiding joins
  - Assuming data is in the appropriate order on disk
- Only returns XML; results must be converted for non-XML manipulation
- Development still in its infancy as of Winter 2007
XML-STDS-24
Native XML Databases
- Definition:
  - "A database that has an XML document as its fundamental unit of (logical) storage and defines a (logical) model for an XML document, as opposed to the data in that document, and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order."
- Data types: no support in XML, need a mapping
  - Document or database schema can be used
  - External user-defined mapping
  - Not necessary when only transferring data
- No requirement on the underlying medium or implementation
- Two architectures: text-based and model-based
XML-STDS-25
Native: Text-based
- Use any DB
  - Rather than mapping schemas, store entire XML documents
  - Usually involves saving the entire document as a BLOB / Character LOB
  - Utilize various text field searches to retrieve info from the XML document
  - Some DB text search features are being made XML-aware
  - Speed: how the document is located on disk favors full or partial document retrieval
XML-STDS-26
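The text-based strategy above can be illustrated with a small sketch: whole documents go into a single text column, a coarse text search narrows the candidates, and only the hits are parsed. The table name, column names, and the two sample notes are hypothetical; `sqlite3` merely stands in for "any DB".

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# The entire XML document is stored as one character LOB per row.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")

# Made-up documents for illustration only.
docs = [
    "<note><patient>A</patient><text>persistent cough</text></note>",
    "<note><patient>B</patient><text>routine follow-up</text></note>",
]
conn.executemany("INSERT INTO docs (body) VALUES (?)", [(d,) for d in docs])

# Step 1: plain text search over the stored documents (not XML-aware).
rows = conn.execute(
    "SELECT body FROM docs WHERE body LIKE ?", ("%cough%",)
).fetchall()

# Step 2: parse only the matching documents to pull out structured values.
patients = [ET.fromstring(body).findtext("patient") for (body,) in rows]
print(patients)
```

Because the LIKE search is unaware of markup, it would also match the word inside a tag name or attribute, which is one reason text search features are being made XML-aware.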
Native: Model-based
- Internal object model of the document schema
- Store this model in a database
  - Relational / object-oriented database
  - Proprietary
- Performance similar to the chosen DB engine
- Still limited by the hierarchy of XML data
  - Retrieving all orderid's from hundreds of docs is slow
- Support for common XML query languages
  - XPath, XQuery, etc.
XML-STDS-27
Native XML: TLC
- In the traditional database world, transactions, locking, and concurrency are paramount
- Native XML databases aren't mature enough to support everything
- Most support transactions, but what about locking and concurrency?
  - Document-level locking is easy, but too coarse
  - Only a few implementations support node-level locking
- Commercial products generally support ACID; free ones are just starting to (2008)
  - Atomicity-Consistency-Isolation-Durability
XML-STDS-28
Native XML: APIs
- Ubiquity of ODBC interfaces
  - Still applies to native XML databases
- Most implementations provide their own interface for a variety of languages
- Industry standardization:
  - XML:DB API from XML:DB.org, programming language neutral
  - JSR 225: XQuery API for Java (XQJ). IBM and Oracle
XML-STDS-29
Native XML: The Rest
- Referential integrity is supported in an ad hoc manner at best
  - Database cannot enforce user-defined (via schema) integrity
  - Some standard mechanisms allow it
  - Eventually both mechanisms will be supported
- Currently relies heavily on the application for normalization and integrity
- Certainly a drawback for medical applications
XML-STDS-30
Native XML: Scalability
- A limitation of any DB is the time spent seeking on the hard disk
- An XML DB only needs to find the pointer to the head of the document
- Therefore an XML DB should scale well in the context of retrieving data
- The only caveat is when the retrieval breaks the document hierarchy
- More pointers must be followed, potentially slowing retrieval greatly
- Where there is money, there is a way
XML-STDS-31
Biomedical Information
- Overview of the field
- Data storage and transfer problem
- XML as a solution
- BMI XML examples
- Next section: Choosing a native DB
XML-STDS-32
BMI Overview
- The convergence of computation and biomedicine
- The NIH BMI Science and Tech Initiative:
  - Define biomedical computing as a science
  - Many sources of information:
    - Clinical, surgical, genetics, drug design, biology
- Standardization in software
  - Algorithm development, high speed computing
- All of this relies on efficient storage and transfer of information
XML-STDS-33
BMISTI: Databases
- "Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses." ~NIH Director
- Problems:
  - Standards. Terminology, syntax and semantics need to be defined and agreed upon to allow integration of data
  - Curation. Database submissions need to be checked and cross-referenced to avoid the transitive propagation of error
  - Interoperability. Data should be as consistent as possible across databases so that researchers can compare and contrast it
- Computational and systems issues:
  - Utilize and manipulate information
  - Process large volumes of information
XML-STDS-34
BMI: XML
- Data sharing and semantic interoperability
- Case study: Electronic Health Record
  - The development and use of an integrated health record for a patient
  - Heterogeneous data, e.g. clinical, clinical-trial, genomic data
- Primary obstacle: proprietary data formats
- Uniformity on a technical level: text file
- A step towards the semantic goal
XML-STDS-35
XML in Clinical Data
- HL7 standards organization
- V2: ASCII bar format. Example:
HL7V3|1|2.02
Message|2.16.840.1.113883.1122^CNTRL-3456|2002081614303516^- --->
06:00||3.0|2.16.840.1.113883^POLB_IN004410||P|I|ER|ER
respondTo|RSP|tel:555-555-5555^^WP
entityRsp|||{FAM^^Hippocrates~GIV^^Harold~GIV^^H~SFX^AC^MD}|tel:555-555-5555^^WP
sender|SND|nfs:127.127.127.255
device||2.16.840.1.113883.1122^GHH LAB|{GIV^^An Entity Name}^L|||tel:555-555-2005^^H
agencyFor
representedOrganization||\NOTH\
location|||2.16.840.1.113883.1122^ELAB-3|{^^GHH Lab}^TN
receiver|RCV|nfs:127.127.127.0
device|||2.16.840.1.113883.1122^GHH O E|{GIV^^An Entity Name}^L|||tel:555-555-2005^^H
agencyFor
representedOrganization|||2.16.840.1.113883.19.3.1001|{^^GHH Outpatient Clinic}^TN
location|||2.16.840.1.113883.1122^BLDG4|{^^GHH Outpatient Clinic}^TN

Awkward, inflexible, unclear meaning of values.
XML-STDS-36
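To see why the bar format is hard to read without a specification, the sketch below splits one line of the sample above into fields and components. It is a deliberately naive illustration: it assumes `|` separates fields and `^` separates components, and it ignores the repetition (`~`), grouping (`{}`), and escape (`\`) characters that also appear in the sample. The function name `parse_segment` is hypothetical.

```python
def parse_segment(line, field_sep="|", component_sep="^"):
    """Split one bar-delimited line into its name and a list of fields.

    Each field is itself a list of components. Meaning is purely positional:
    nothing in the data says what the second or third component represents.
    """
    name, *fields = line.split(field_sep)
    return name, [field.split(component_sep) for field in fields]


# One line taken from the sample message on the slide.
name, fields = parse_segment("respondTo|RSP|tel:555-555-5555^^WP")
print(name, fields)
```

The empty string between the two carets is a positional placeholder, which is exactly the kind of unlabeled value that self-describing XML tags are meant to replace.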
HL7 V3 Specification
- Built around the Reference Information Model (RIM):
  - Entity, Role, Participation, and Act
  - Utilizes dedicated vocabularies and data types
  - Every specification must begin from the RIM
- Clinical Document Architecture
  - Utilizes XML with tags like "observation", "code", "value", and "id"
<observation classCode="OBS" moodCode="EVN">
<id root="10.23.4573.15879"/>
<code code="313193002" codeSystem="2.16.840.1.113883.6.96"
codeSystemName="SNOMED CT" displayName="Peak flow"/>
<effectiveTime value="20000407"/>
<value xsi:type="RTO_PQ_PQ">
<numerator value="260" unit="l"/>
<denominator value="1" unit="min"/>
</value>
</observation>
XML-STDS-37
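In contrast to the bar format, the observation fragment above names every value it carries. The following sketch reads it with Python's standard library. One assumption is made so that the fragment parses on its own: an `xmlns:xsi` declaration is added to the root element, since the slide shows the `xsi:type` attribute without the enclosing document that would normally declare that prefix.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The observation from the slide, with an xsi namespace declaration added
# (an assumption) so the fragment is well-formed by itself.
OBSERVATION = f"""<observation xmlns:xsi="{XSI}" classCode="OBS" moodCode="EVN">
  <id root="10.23.4573.15879"/>
  <code code="313193002" codeSystem="2.16.840.1.113883.6.96"
        codeSystemName="SNOMED CT" displayName="Peak flow"/>
  <effectiveTime value="20000407"/>
  <value xsi:type="RTO_PQ_PQ">
    <numerator value="260" unit="l"/>
    <denominator value="1" unit="min"/>
  </value>
</observation>"""

obs = ET.fromstring(OBSERVATION)
code = obs.find("code")
value = obs.find("value")
numerator = value.find("numerator")
denominator = value.find("denominator")

# The value is a ratio of two physical quantities: 260 l per 1 min.
rate = float(numerator.get("value")) / float(denominator.get("value"))
summary = (
    f'{code.get("displayName")}: {rate:g} '
    f'{numerator.get("unit")}/{denominator.get("unit")}'
)
print(summary)
```

Every number is reached by element and attribute name rather than by position, and the `code` element ties the measurement to a SNOMED CT concept.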
XML in Clinical Trials
- Example: drug studies
- Utilizing XML would eliminate manual transcription when moving data from one system to another
- XML is a universal data format, as it stores everything as text
  - Therefore it can handle new technology seamlessly
- Clinical Data Interchange Standards Consortium (CDISC)
  - Industry standardization
XML-STDS-38
CDISC: ODM
- Operational Data Model:
  - XML based
  - Facilitates moving data from any collection system to the clinical trial sponsor
  - Addresses real-world issues:
    - Incomplete data
    - Partial data transfer
    - Versioning and branching
- ODM 1.1 is the current version
XML-STDS-39
ODM: Layout
XML-STDS-40
XML in Genomic Data
- Various groups export their data in XML
  - NCBI, EBI
- They do not follow the same schema, which allows only partial semantic interoperability
- Microarray Gene Expression Group (MAGE) publishes a schema
- MAGE files are often several gigabytes
- Illustrates the overhead of XML; however, researchers still use it because of interoperability
XML-STDS-41
XML Complexity
- Clinical Genomics Special Interest Group (HL7)
  - Use genomic data in the clinical environment
- Utilizes several models such as MAGE and BSML (for DNA sequences)
- Not all information in the raw models is necessary
  - "Bubbling up" analyzes large raw data sets and extracts useful information
  - Transfer useful information to a new schema / model
- Bottom line: complex workflows are needed to extract usable information
XML-STDS-42
XML BMI Issues
- Clinical information like a verbal description or advice is unstructured
  - How do you query this?
- Schemas and models are extremely complex, with nesting, recursion, and compound data types
  - Difficult mapping to relational databases
- XML instances may be gigabytes in size
  - What database solutions exist to handle such large files?
XML-STDS-43
XML BMI Examples
- A closer look at the Clinical Document Architecture
  - Mayo Clinic's implementation of CDA
- Case study using a native XML database to facilitate research based upon clinical texts
  - Tamino XML DB
  - Querying a native DB
- UCONN BMI, CSE 300 Spring 2008
XML-STDS-44
XML BMI: CDA
- A clinical document has these characteristics:
  - Persistence: exists for a defined time period
  - Stewardship: maintained by a designated caretaker
  - Potential for authentication: may be legally authenticated
  - It must be human readable in a standard web browser
  - Utilizes standard XML syntax
www.hl7.de/iamcda2004/finalmat/day1/Calvin%20Beebe%20CDA%20Update.pdf
XML-STDS-45
XML BMI: CDA
- Mayo Clinic's use of CDA:
www.hl7.de/iamcda2004/finalmat/day1/Calvin%20Beebe%20CDA%20Update.pdf
XML-STDS-46
Survey of Native XML DBMS
- Comprehensive list:
  - http://www.rpbourret.com/xml/XMLDatabaseProds.htm#native
- Commercial:
  - Tamino XML Server
    - Well developed, supported, many tools available
- Open source:
  - Sedna: fully supports ACID, XQuery
  - eXist: great management, documentation, indexing
XML-STDS-47
eXist
http://www.rpbourret.com/xml/ProdsNative.htm#exist
- Proprietary data store (B+ trees)
- Supports XQuery/XPath 2.0
- Full text searches
- XML:DB API
- Document-level concurrency
- Complete documentation
- Incomplete transaction support
XML-STDS-48
Sedna
http://www.rpbourret.com/xml/ProdsNative.htm#sedna
- Underlying data storage based on DataGuide
- Supports XQuery/XPath 2.0
- Full text searches
- Custom API for various languages
- Command line admin
- Transaction support
XML-STDS-49
XML References
- "Canonical XML Version 1.0". John Boyer. 15 March 2001. W3C
- "XML Path Language (XPath) 2.0". W3C Working Draft. 2 May 2003. W3C
- "XML Schema". XML Schema Working Group. 1 January 2008. W3C <http://www.w3.org/XML/Schema>
- "XML Schema: Formal Description". Brown, Fuchs, et al. 25 September 2001. W3C <http://www.w3.org/TR/xmlschema-formal/>
- "Extensible Markup Language (XML)". 1 January 2008. W3C <http://www.w3.org/XML/>
- http://www.25hoursaday.com/StoringAndQueryingXML.html
- http://www.nih.gov/about/director/060399.htm
- http://www.research.ibm.com/journal/sj/452/shabo.html
- "Overview of the CDISC Operational Data Model". 26 April 2002. CDISC
XML-STDS-50
What is one Possible Solution?
- Standards and Usage of XML
  - Consider CDA – Clinical Document Architecture
  - Standard for Clinical (Provider) Medical Record
- Clinical Record Organized as:
  - <patient_encounter> - location
  - <legal_authenticator> - MD
  - <originating_organization> and <provider>
  - <patient> - name, birthdate, gender
  - <body confidentiality="CONF1"> - note
    - History
    - Past Medical History
    - Medications
    - Allergies
    - Social History
    - Physical Exam
    - Vitals (BP, Resp, Temp, HR)
    - Etc.
XML-STDS-51
What is one Possible Solution?
- Let's Explore this in Greater Detail
- Starting with the CDA Header
<?xml version="1.0"?>
<!DOCTYPE levelone PUBLIC "-//HL7//DTD CDA Level One 1.0//EN" "levelone_1.0.dtd">
<levelone>
<clinical_document_header>
<id EX="a123" RT="2.16.840.1.113883.3.933"/>
<set_id EX="B" RT="2.16.840.1.113883.3.933"/>
<version_nbr V="2"/>
<document_type_cd V="11488-4" S="2.16.840.1.113883.6.1"
DN="Consultation note"/>
<origination_dttm V="2000-04-07"/>
<confidentiality_cd ID="CONF1" V="N" S="2.16.840.1.113883.5.1xxx"/>
<confidentiality_cd ID="CONF2" V="R" S="2.16.840.1.113883.5.1xxx"/>
<document_relationship>
<document_relationship.type_cd V="RPLC"/>
<related_document>
<id EX="a234" RT="2.16.840.1.113883.3.933"/>
<set_id EX="B" RT="2.16.840.1.113883.3.933"/>
<version_nbr V="1"/>
</related_document>
</document_relationship>
<fulfills_order>
<fulfills_order.type_cd V="FLFS"/>
<order><id EX="x23ABC" RT="2.16.840.1.113883.3.933"/></order>
<order><id EX="x42CDE" RT="2.16.840.1.113883.3.933"/></order>
</fulfills_order>
XML-STDS-52
CDA Example - Continued
[Slides 53 through 60 continue the CDA example; only their titles were captured in this text.]
XML-STDS-60
Other Relevant Standards of Note
- MeSH
- Unified Medical Language System
- ICD9 and ICD9-CM (Intl. Classification of Diseases)
- ICD10 and ICD10-CM
- SNOMED-CT (Clinical Terms)
- National Drug Codes (NDC)
XML-STDS-61
MeSH
- The Medical Subject Headings (MeSH®) thesaurus is a controlled vocabulary produced by the National Library of Medicine and used for indexing, cataloging, and searching for biomedical and health-related information and documents.
- 2011 MeSH includes the subject descriptors appearing in MEDLINE®/PubMed®, the NLM catalog database, and other NLM databases.
- Many synonyms, near-synonyms, and closely related concepts are included as entry terms to help users find the most relevant MeSH descriptor for the concept they are seeking.
- http://www.nlm.nih.gov/mesh/
XML-STDS-62
Descriptor Data Elements
XML-STDS-63
Qualifier Data Elements
XML-STDS-64
Supplementary Concepts
XML-STDS-65
MeSH in ASCII
*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA AD AE AG AI AN BI BL CF CH CL CS CT DU EC HI IM IP ME PD
PK PO RE SD ST TO TU UR
ENTRY = A-23187|T109|T195|LAB|NRW|NLM (1991)|900308|abbcdef
ENTRY = A23187|T109|T195|LAB|NRW|UNK (19XX)|741111|abbcdef
ENTRY = Antibiotic A23187|T109|T195|NON|NRW|NLM
(1991)|900308|abbcdef
ENTRY = A 23187
ENTRY = A23187, Antibiotic
MN = D03.438.221.173
PA = Anti-Bacterial Agents
PA = Ionophores
MH_TH = NLM (1975)
ST = T109
ST = T195
N1 = 4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2-oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S(6alpha(2S*,3S*),8beta(R*),9beta,11alpha))
RN = 52665-69-7
PI = Antibiotics (1973-1974)
PI = Carboxylic Acids (1973-1974)
XML-STDS-66
MeSH in ASCII
MS = An ionophorous, polyether antibiotic from Streptomyces
chartreusensis. It binds and transports cations across membranes
and uncouples oxidative phosphorylation while inhibiting ATPase of
rat liver mitochondria. The substance is used mostly as a
biochemical tool to study the role of divalent cations in various
biological systems.
OL = use CALCIMYCIN to search A 23187 1975-90
PM = 91; was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
HN = 91(75); was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
MED = *62
MED = 847
M90 = *299
M90 = 2405
M85 = *454
M85 = 2878
M80 = *316
M80 = 1601
M75 = *300
M75 = 823
M66 = *1
M66 = 3
ETC
XML-STDS-67
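The ASCII record above is a flat list of `KEY = value` lines in which keys such as ENTRY and PA repeat. A minimal reader might look like the sketch below. It is an illustration under simplifying assumptions, not NLM's parser: it uses a shortened copy of the Calcimycin record, does not interpret the `|`-separated subfields of ENTRY lines, and does not rejoin values that wrap across lines. The function name `parse_record` is hypothetical.

```python
from collections import defaultdict

# Shortened copy of the record shown on the slide.
RECORD = """*NEWRECORD
RECTYPE = D
MH = Calcimycin
ENTRY = A 23187
ENTRY = A23187, Antibiotic
MN = D03.438.221.173
PA = Anti-Bacterial Agents
PA = Ionophores
"""


def parse_record(text):
    """Collect KEY = value lines into a dict of lists (keys may repeat)."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if " = " not in line:  # skips the *NEWRECORD marker and blank lines
            continue
        key, value = line.split(" = ", 1)
        fields[key.strip()].append(value.strip())
    return dict(fields)


record = parse_record(RECORD)
print(record["MH"], record["PA"])
```

The two-letter keys carry no meaning without external documentation, which is the gap the XML version of MeSH on the next slides fills with descriptive element names.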
MeSH in XML - desc2011.dtd
<!-- MeSH DTD file for Descriptor records. desc2011.dtd -->
<!-- Author: MeSH -->
<!-- Effective: 09/01/2010 -->
<!-- #PCDATA: parseable character data = text
     occurrence indicators (default: required, not repeatable):
     ?: zero or one occurrence, i.e., at most one (optional)
     *: zero or more occurrences (optional, repeatable)
     +: one or more occurrences (required, repeatable)
     |: choice, one or the other, but not both
-->
<!ENTITY % DescriptorReference "(DescriptorUI, DescriptorName)">
<!ENTITY % normal.date "(Year, Month, Day)">
<!ENTITY % ConceptReference "(ConceptUI,ConceptName,ConceptUMLSUI?)">
<!ENTITY % QualifierReference "(QualifierUI, QualifierName)">
<!ENTITY % TermReference "(TermUI, String)">
XML-STDS-68
MeSH in XML - desc2011.dtd
<!ELEMENT DescriptorRecordSet (DescriptorRecord*)>
<!ATTLIST DescriptorRecordSet LanguageCode
          (cze|dut|eng|fin|fre|ger|ita|jpn|lav|por|scr|slv|spa) #REQUIRED>
<!ELEMENT DescriptorRecord (%DescriptorReference;,
          DateCreated,
          DateRevised?,
          DateEstablished?,
          ActiveMeSHYearList,
          AllowableQualifiersList?,
          Annotation?,
          HistoryNote?,
          OnlineNote?,
          PublicMeSHNote?,
          PreviousIndexingList?,
          EntryCombinationList?,
          SeeRelatedList?,
          ConsiderAlso?,
          PharmacologicalActionList?,
          RunningHead?,
          TreeNumberList?,
          RecordOriginatorsList,
          ConceptList) >
<!ATTLIST DescriptorRecord DescriptorClass (1 | 2 | 3 | 4) "1">
XML-STDS-69
MeSH in XML - desc2011.dtd
<!ELEMENT ActiveMeSHYearList (Year+)>
<!ELEMENT AllowableQualifiersList (AllowableQualifier+) >
<!ELEMENT AllowableQualifier (QualifierReferredTo,Abbreviation )>
<!ELEMENT Annotation (#PCDATA)>
<!ELEMENT ConsiderAlso (#PCDATA) >
<!ELEMENT Day (#PCDATA)>
<!ELEMENT DescriptorUI (#PCDATA) >
<!ELEMENT DescriptorName (String) >
<!ELEMENT DateCreated (%normal.date;) >
<!ELEMENT DateRevised (%normal.date;) >
<!ELEMENT DateEstablished (%normal.date;) >
<!ELEMENT DescriptorReferredTo (%DescriptorReference;) >
<!ELEMENT EntryCombinationList (EntryCombination+) >
<!ELEMENT EntryCombination
(ECIN,
ECOUT)>
<!ELEMENT ECIN (DescriptorReferredTo,QualifierReferredTo) >
<!ELEMENT ECOUT (DescriptorReferredTo,QualifierReferredTo? ) >
<!ELEMENT HistoryNote (#PCDATA)>
<!ELEMENT Month (#PCDATA)>
<!ELEMENT OnlineNote (#PCDATA)>
ETC
XML-STDS-70
MeSH in XML - Sample
<?xml version="1.0"?>
<!DOCTYPE DescriptorRecordSet SYSTEM "desc2011.dtd">
<DescriptorRecordSet LanguageCode = "eng">
<DescriptorRecord DescriptorClass = "1">
  <DescriptorUI>D000001</DescriptorUI>
  <DescriptorName>
    <String>Calcimycin</String>
  </DescriptorName>
  <DateCreated>
    <Year>1974</Year>
    <Month>11</Month>
    <Day>19</Day>
  </DateCreated>
  <DateRevised>
    <Year>2006</Year>
    <Month>07</Month>
    <Day>05</Day>
  </DateRevised>
  <DateEstablished>
    <Year>1984</Year>
    <Month>01</Month>
    <Day>01</Day>
  </DateEstablished>
XML-STDS-71
MeSH in XML - Sample
  <ActiveMeSHYearList>
    <Year>2007</Year>
    <Year>2008</Year>
    <Year>2009</Year>
    <Year>2011</Year>
  </ActiveMeSHYearList>
  <AllowableQualifiersList>
    <AllowableQualifier>
      <QualifierReferredTo>
        <QualifierUI>Q000008</QualifierUI>
        <QualifierName>
          <String>administration &amp; dosage</String>
        </QualifierName>
      </QualifierReferredTo>
      <Abbreviation>AD</Abbreviation>
    </AllowableQualifier>
    <AllowableQualifier>
      <QualifierReferredTo>
        <QualifierUI>Q000009</QualifierUI>
        <QualifierName>
          <String>adverse effects</String>
        </QualifierName>
      </QualifierReferredTo>
      <Abbreviation>AE</Abbreviation>
    </AllowableQualifier>
ETC
XML-STDS-72
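Reading the XML form of the same descriptor takes only a few path lookups. The sketch below works on a trimmed, self-contained copy of the sample above: only the elements needed for the example are kept, so it would not validate against desc2011.dtd, which requires DateCreated and other children. Note that the ampersand in "administration & dosage" must be written as `&amp;` for the document to be well-formed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample record; not DTD-valid, for illustration only.
SAMPLE = """<DescriptorRecord DescriptorClass="1">
  <DescriptorUI>D000001</DescriptorUI>
  <DescriptorName><String>Calcimycin</String></DescriptorName>
  <AllowableQualifiersList>
    <AllowableQualifier>
      <QualifierReferredTo>
        <QualifierUI>Q000008</QualifierUI>
        <QualifierName><String>administration &amp; dosage</String></QualifierName>
      </QualifierReferredTo>
      <Abbreviation>AD</Abbreviation>
    </AllowableQualifier>
    <AllowableQualifier>
      <QualifierReferredTo>
        <QualifierUI>Q000009</QualifierUI>
        <QualifierName><String>adverse effects</String></QualifierName>
      </QualifierReferredTo>
      <Abbreviation>AE</Abbreviation>
    </AllowableQualifier>
  </AllowableQualifiersList>
</DescriptorRecord>"""

record = ET.fromstring(SAMPLE)
name = record.findtext("DescriptorName/String")

# Map each two-letter abbreviation to its full qualifier name.
qualifiers = {
    q.findtext("Abbreviation"): q.findtext("QualifierReferredTo/QualifierName/String")
    for q in record.iter("AllowableQualifier")
}
print(name, qualifiers)
```

The abbreviations AD and AE are the same codes that appear on the AQ line of the ASCII record, now paired explicitly with their meanings.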
Unified Medical Language System
- UMLS, the acronym for Unified Medical Language System, was developed by the National Library of Medicine
- Disease is a semantic type with around 392 relations (109 semantic relations and 22 other relations)
- Pneumonia is categorized under one semantic type, Disease, but has hundreds of relations
XML-STDS-73
UMLS Concepts, Semantic Types/Relations
XML-STDS-74
ICD9 Respiratory Diseases
XML-STDS-75
ICD10 Respiratory Diseases
XML-STDS-76
SNOMED-CT
- SNOMED CT stands for Systematized Nomenclature of Medicine Clinical Terms. SNOMED-CT is the result of merging two ontologies: SNOMED-RT and Clinical Terms.
- http://www.ihtsdo.org/snomed-ct/
XML-STDS-77
SNOMED-CT
- Composed of Concepts, Terms, and Relationships
- Precisely Represent Clinical Information Across the Scope of Health Care
- Content Coverage Divided into Hierarchies
XML-STDS-78
SNOMED Example
XML-STDS-79
National Drug Codes
- Tracking of Drugs (Prescription and OTC)
- From Submittal Through Approval
- Keeps Track of Many Details on Medication
- Each Drug by Manufacturer has a Unique NDC Identifier
- See:
  - http://www.fda.gov/Drugs/InformationOnDrugs/ucm142438.htm
- Searchable Database:
  - http://www.accessdata.fda.gov/scripts/cder/ndc/default.cfm
XML-STDS-80
NDC Examples
XML-STDS-81
Biomedical & Clinical Ontologies
- Evolution of WWW
- Ontology
  - Definition and Description
  - Example
- Present Biomedical Ontology
- Need for Integration
- Application of Biomedical Ontology
  - Clinical Trials
  - OASIS: Integration Technique
  - Clinical Decision Support System
- Summary
- Presentation from Rishi Saripalle, Spring 2008
XML-STDS-82
Current Information Systems on WWW
- First Generation:
  - Raw data, largely hand-coded by the user, was published online
  - For example, static web pages
- Second Generation:
  - Dynamic content generation driven by MDA and databases
  - Machines generate the respective HTML
- Third Generation: Semantic Web
  - Generating machine-processable information whose content is machine understandable, enabling intelligent services such as information brokers, search agents, and information filters to process domain-related information
XML-STDS-83
What are Ontologies?
- Definition (from Philosophy):
  - Ontology is the study of being or existence and forms the basic subject matter of metaphysics. It seeks to describe the basic categories and relationships of being or existence to define entities and types of entities within its framework.
- Definition (from Computer Science):
  - In computer science, an ontology is a "specification of a conceptualization".
  - That is, "a data model that represents a set of concepts within a domain and the relationships between those concepts".
XML-STDS-84
Advantages of Ontology
- A semantic way of representing knowledge of the domain
- Intelligent systems can use reasoners to make inferences within the ontology
- To share the common structure of information
- To reuse ontologies from similar domains
XML-STDS-85
Development of Ontology
- Determine the domain and scope (range) of the knowledge
- Look for existing ontologies in a similar domain
- List all the terminology or concepts of the domain
- List all the classes and instances to be created in the ontology
- Create the properties that will relate these concepts in the ontology
XML-STDS-86
Example of Ontology
[Diagram labels]
- Class: Wine
- Individual: Australian Yellow Tail
- Properties: Color (Yellow), Flavor (Delicate), Maker (Australia, German)
XML-STDS-87
What are RDF and OWL?
- Researchers proposed the Semantic Web Stack, illustrating a hierarchy of languages in which each layer exploits and uses the capabilities of the layers below
- OWL and RDF belong to the family of knowledge representation languages
  - RDF: Resource Description Framework
    - http://www.w3.org/RDF/
  - OWL: Web Ontology Language
    - http://www.w3.org/TR/owl-features/
- RDF is reminiscent of Semantic Networks, which were popular in the 1970s
XML-STDS-88
Introduction to RDF / OWL
XML-STDS-89
RDF: Resource Description Framework
- RDF represents knowledge in triple format: Subject – Predicate – Object
- For example:
    Students  – registerTo  – Classes
    (Subject)   (Predicate)   (Object)
- One triple in RDF is referred to as a statement
- RDF is a grammar-based language with a syntax similar to XML
- RDFS (RDF Schema) has a syntax similar to RDF and provides a schema grammar for RDF. For example, rdfs:Class, rdfs:subClassOf, etc.
XML-STDS-90
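The triple model above is simple enough to mimic with plain Python tuples. The sketch below is not an RDF library, just an illustration of how statements can be stored and matched by pattern; the `match` helper and the use of bare strings instead of URIs are simplifications introduced here.

```python
# Each statement is a (subject, predicate, object) tuple.
triples = {
    ("Students", "registerTo", "Classes"),
    ("Students", "rdf:type", "rdfs:Class"),
    ("Classes", "rdf:type", "rdfs:Class"),
}


def match(subject=None, predicate=None, obj=None):
    """Return the statements matching a pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Which resources are declared as classes?
classes = [s for s, _, _ in match(predicate="rdf:type", obj="rdfs:Class")]

# What do we know about Students?
about_students = match(subject="Students")
print(classes, about_students)
```

Pattern matching over triples with wildcards is the basic idea behind RDF query languages, although real systems identify every resource by URI as the next slide describes.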
RDF: Resource Description Framework

CSE
5095
RDF syntax of the above example:
<rdfs:Class rdf:about="http://www.example.com/examle#Students"
rdfs:label="Students">
</rdfs:Class>
<rdfs:Class rdf:about="http://www.example.com/examle#Classes"
rdfs:label=“Classes">
</rdfs:Class>

Every concept described in RDF is identified
by a URI (e.g.,
http://www.example.com/examle#Students).

RDF can be viewed as a standardized framework for
attaching metadata to domain concepts.
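
As a rough illustration of how the URIs identify concepts, the sketch below reads the two class declarations with Python's standard `xml.etree.ElementTree` module. The enclosing `rdf:RDF` element and its namespace declarations are added here as an assumption so that the fragment parses on its own; the function name `class_uris` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# The slide's fragment, wrapped in an rdf:RDF root (assumed) so it is well formed.
DOC = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:rdfs="{RDFS_NS}">
  <rdfs:Class rdf:about="http://www.example.com/examle#Students"
              rdfs:label="Students"/>
  <rdfs:Class rdf:about="http://www.example.com/examle#Classes"
              rdfs:label="Classes"/>
</rdf:RDF>
""".strip()


def class_uris(rdf_xml):
    """Map each rdfs:label to the URI given in rdf:about."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for node in root.findall(f"{{{RDFS_NS}}}Class"):
        label = node.get(f"{{{RDFS_NS}}}label")
        result[label] = node.get(f"{{{RDF_NS}}}about")
    return result


print(class_uris(DOC))
```

This treats RDF/XML as ordinary XML, which is enough for a fixed fragment like this one; a dedicated RDF parser would be the safer choice for arbitrary documents.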
XML-STDS-91
OWL: Web Ontology Language

OWL sits above RDF, RDFS, and XML in the Semantic Web Stack,
building on the features offered by those layers
OWL's design has been influenced by description logics
and knowledge representation paradigms
 SHIQ, Semantic Networks, Frames, SHOE,
DAML, OIL, DAML+OIL.
OWL provides richer semantic capabilities than its
predecessor RDF
 For example, in the previous example, the
predicate registerTo is of type rdf:Property.
XML-STDS-92
OWL: Web Ontology Language

OWL differentiates between properties by defining
 owl:ObjectProperty – for connecting two concepts
(registerTo) and
 owl:DatatypeProperty – for connecting a concept
to a datatype (taken from XML Schema)
Both kinds of property are specializations of rdf:Property
OWL also defines owl:AnnotationProperty for
attaching metadata to classes, rules and axioms
The following slide illustrates the use of OWL, RDF
and RDFS (taken from a cardiac ontology built in
OWL using the Protégé tool)
XML-STDS-93
OWL: Web Ontology Language
<owl:Class rdf:ID="Veins">
<rdfs:subClassOf>
<owl:Class rdf:ID="Heart"/>
</rdfs:subClassOf>
</owl:Class>
<Veins rdf:ID="Pulmonary_Vein"/>
[Figure: hierarchy of Heart, Vein, and Pulmonary Vein]

Pulmonary_Vein is declared as an instance of the class Veins, which is a subclass of Heart.
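
A minimal sketch of the inference this hierarchy supports, following the OWL snippet as written (Pulmonary_Vein as an individual of Veins, and Veins as a subclass of Heart). The dictionaries and function names are illustrative assumptions; a real reasoner would work on the OWL file itself.

```python
# rdfs:subClassOf links: child class -> parent class.
SUBCLASS_OF = {"Veins": "Heart"}

# Class membership of individuals.
INSTANCE_OF = {"Pulmonary_Vein": "Veins"}


def ancestors(cls):
    """Follow rdfs:subClassOf links upward and return every superclass."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def types_of(individual):
    """Return the asserted class followed by all inferred superclasses."""
    direct = INSTANCE_OF[individual]
    return [direct] + ancestors(direct)


print(types_of("Pulmonary_Vein"))
```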

The next slide illustrates OWL properties and
OWL's ability to restrict the domain and
range values that these properties accept.
XML-STDS-94
OWL: Web Ontology Language
<owl:ObjectProperty rdf:ID="Complications">
<rdfs:domain rdf:resource="#Cardiology_Diseases"/>
<rdfs:range>
<owl:Class>
<owl:unionOf rdf:parseType="Collection">
<owl:Class rdf:about="#Cardiology_Complications"/>
<owl:Class rdf:about="#Cardiology_Diseases"/>
<owl:Class rdf:about="#Cardiology_Causes"/>
</owl:unionOf>
</owl:Class>
</rdfs:range>
</owl:ObjectProperty>
The object property “Complications” takes its domain
values from the class “Cardiology_Diseases” and its range
values from the union of the classes “Cardiology_Complications”,
“Cardiology_Diseases”, and “Cardiology_Causes”
OWL combined with RDF/RDFS provides an
environment for developing domain ontologies by
organizing and describing
the domain concepts
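
The domain and range of “Complications” can be sketched as a simple membership check. This is a simplification made for illustration: it treats domain and range as validation rules on directly asserted classes, whereas an OWL reasoner would typically use them to infer class membership. The function name `complications_ok` is hypothetical.

```python
# rdfs:domain of the Complications property.
DOMAIN = {"Cardiology_Diseases"}

# rdfs:range: the owl:unionOf of three classes.
RANGE = {"Cardiology_Complications", "Cardiology_Diseases", "Cardiology_Causes"}


def complications_ok(subject_class, object_class):
    """True if a Complications statement fits the declared domain and range."""
    return subject_class in DOMAIN and object_class in RANGE


print(complications_ok("Cardiology_Diseases", "Cardiology_Causes"))
print(complications_ok("Cardiology_Causes", "Cardiology_Diseases"))
```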
XML-STDS-95
Disease Ontology
[Figure: hierarchical organization of Cardiology Diseases, showing the instances of Mitral_Valve_Disorders]
XML-STDS-96
Disease Ontology
[Figure: representation of “Mitral_Valve_Prolapse” knowledge using properties and instances, with the defined property highlighted]
XML-STDS-97
Implemented Ontology in OWL Format
…………..
<Congenital_Heart_Disease rdf:ID="Atrial_septal_defect">
<Complications>
<Cardiac_Arrhythmias rdf:ID="Arrhythmia">
<Has_Intervention
rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>defibrillation</Has_Intervention>
<Have_Symptoms>
<Cardiology_Symptoms rdf:ID="Dyspnea"/>
</Have_Symptoms>
<Has_Diagnosis_Test>
<Cardiology_Diagnosis_Test
rdf:ID="Coronary_Angiography">
<Has_Synonyms
rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>coronary catheterization </Has_Synonyms>
………………..
XML-STDS-98
Bio-Medical Ontologies

Review a Wide Range of Available Ontologies and
Standards:
 OpenCyc
 WordNet
 GALEN
 UMLS
 SNOMED-CT
 FMA
 Gene Ontology
XML-STDS-99
OpenCyc

OpenCyc is an upper-level ontology developed by
Cycorp Inc.
OpenCyc contains 6,000 concepts and 60,000 hand-coded
assertions that capture common-sense knowledge, so that AI
algorithms can perform human-like reasoning
XML-STDS-100
Example of OpenCyc
XML-STDS-101
WordNet

WordNet is an electronic lexical database developed at
Princeton University that serves as a resource for
applications in natural language processing and
information retrieval.
cancer, malignant neoplastic disease: any malignant growth or tumor caused by
abnormal and uncontrolled cell division; it may spread to other parts of the body
through the lymphatic system or the blood stream
Cancer, Crab: (astrology) a person who is born while the sun is in Cancer
Cancer: a small zodiacal constellation in the northern hemisphere; between Leo and
Gemini
Cancer, Cancer the Crab, Crab: the fourth sign of the zodiac; the sun is in this sign
from about June 21 to July 22
Cancer, genus Cancer: type genus of the family Cancridae
XML-STDS-102
Unified Medical Language System

UMLS was developed by the National Library of
Medicine
Disease is a semantic type
with around 392 relations
(109 semantic relations
and 22 other relations).
Pneumonia is categorized
under a single semantic type,
Disease, but has
hundreds of relations.
XML-STDS-103
SNOMED-CT

SNOMED-CT stands for Systematized Nomenclature of
Medicine Clinical Terms. SNOMED-CT is the
result of merging two ontologies: SNOMED-RT and
Clinical Terms.
XML-STDS-104
Ontology Integration

All of these ontologies share a common aim:
describing domain knowledge
Integrating ontologies is becoming critical
 Applications tend to use multiple ontologies
 Concepts in the various ontologies overlap, or the
same concept is described in multiple ways.
For example, the concept “Blood” is described
differently:
 “Fluid” in one ontology
 “Substance” in another ontology
 “semi-solid” in a third ontology
Need to Reconcile these Differences When
Attempting to “Combine” data that Originates from
Different Ontologies
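
A minimal sketch of what reconciling the “Blood” example could look like, assuming a hand-written alignment table. The ontology names, the shared category `BodySubstance`, and the function `reconcile` are all hypothetical and serve only to illustrate the idea.

```python
# How each ontology categorizes a concept (values taken from the slide).
LOCAL_VIEW = {
    "ontology_1": {"Blood": "Fluid"},
    "ontology_2": {"Blood": "Substance"},
    "ontology_3": {"Blood": "semi-solid"},
}

# Hypothetical, hand-written alignment: local category -> shared category.
ALIGNMENT = {
    "Fluid": "BodySubstance",
    "Substance": "BodySubstance",
    "semi-solid": "BodySubstance",
}


def reconcile(concept):
    """Map every ontology's category for `concept` onto the shared vocabulary."""
    shared = set()
    for view in LOCAL_VIEW.values():
        local = view.get(concept)
        if local is not None:
            shared.add(ALIGNMENT[local])
    return shared


print(reconcile("Blood"))
```

When the result contains a single shared category, the three descriptions have been reconciled; more than one would signal a conflict that still needs attention.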
XML-STDS-105
Ontology Integration

Semantic vs. Structural Integration?
Integration difficulties arise when combining similar, identical, and
complementary ontologies.
[Figure: ontology integration diagram, including Ontology B]
XML-STDS-106
OASIS

Ontology Mapping and Integration Framework
XML-STDS-107
Application of Ontologies

Randomized Clinical Trials (RCTs) are one of the least
biased sources of clinical research evidence, and are
therefore a critical resource for the practice of
evidence-based medicine
The scientific community is trying to encode trial findings in a
computer-processable language
However, for evidence to be put into practice, the data has to
be analyzed. The canonical practice for trial
interpretation is called Systematic Reviewing.
Sources for Data Specification:
 Trial Reports
 Trial Databases.
XML-STDS-108
Life Cycle of Clinical Trials
Ontology Specifications
XML-STDS-109
Designing the Ontology

RCT ontology specifications are obtained from:
 Trial Reports
 Trial Databases - ClinicalTrials.gov, PDQ etc.


The ontology is created by dividing the task into Sub-Tasks and Methods. This recursive process is called
Competency Decomposition.
The RCT decomposition method combines Generic
Tasks and Competency Questions.
XML-STDS-110
Defining the Schema
[Figure: ontology schema with 188 frames and 601 slots; concepts shown include Trial, Intervention-Arm, Administrative Concept, Outcome Concept, Population, Excluded Population, and Analyzed Population]
XML-STDS-111
Matching Patient Records to Clinical Trials

Low participation in clinical trials is a major
problem in clinical and translational research.
Matching patient records to clinical trials is
presently a tedious manual procedure.
Need a Semantic Bridge between Clinical Ontologies
(SNOMED-CT, etc.) and raw patient data for
 retrieving matching patient records, clinical
guidelines and clinical decision support systems
(CDSS).
XML-STDS-112
Technical Challenges

Challenges faced in a real-world scenario:
 Knowledge Engineering
 Scalability
 Noisy or Incomplete Data
Knowledge Engineering
 The clinical ontology has the concept “Drug”, which
describes the active composition of the various drugs
 However, the patient record contains a list of vendor-specific drug names
The clinical ontology describes the cause of the disorder.
The patient records only specify the presence or
absence of the disorder and where the clinical
test was conducted.
XML-STDS-113
Architecture of Solution
[Figure: solution architecture with the components Clinical Trials, Patient Data, SNOMED-CT, Query, Ontology (TBox and ABox), and Reasoner]
XML-STDS-114
Implementation Approach

Mapping Patient Data Terminology to SNOMED-CT
 Using UMLS as an intermediate target
 NLP mapping techniques
 Manual mapping
Map the raw patient data to SNOMED-CT
terminology.
 Example: Cerner Drug: Lactulose Syrup 20G/30ml
 SNOMED-CT: administeredSubstance
Allow the user to specify which terms in the definition are to
be matched.
Last Bullet Means Ontology Matching NOT Fully
Automated!
This is a Real Problem for Interoperating Data!
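
The mapping step can be sketched as a lookup that falls back to a person when no entry exists, which is why the process is not fully automated. The table `MANUAL_MAP`, the normalization rule, and both function names are assumptions made for illustration; they are not how Cerner, UMLS, or SNOMED-CT expose their data.

```python
# Hypothetical, manually curated table: normalized vendor drug name -> substance.
MANUAL_MAP = {"lactulose syrup": "Lactulose"}


def normalize(vendor_name):
    """Lower-case the name and drop tokens that contain digits (dose, volume)."""
    tokens = vendor_name.lower().split()
    return " ".join(t for t in tokens if not any(ch.isdigit() for ch in t))


def administered_substance(vendor_name):
    """Return the mapped substance, or None when the term must be mapped by hand."""
    return MANUAL_MAP.get(normalize(vendor_name))


print(administered_substance("Lactulose Syrup 20G/30ml"))
print(administered_substance("Unknown Brand 5mg"))
```

A `None` result marks a vendor term that has no entry yet and would go to a reviewer for manual mapping.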
XML-STDS-115
Contrast in Representation
 Example:
 SNOMED-CT: Disease1
hasAgent Virus007
Infection due to Bacteria001
Infection due to MicroBacteria007
 Patient Record: Disease1 Positive.
 Because the patient record carries so little information,
the query reasoner cannot retrieve records that hold only
partial data.
XML-STDS-116
How are Observations Reconciled?
Clinical Trial    Description
NCT00084266       Patients with MRSA
NCT00288808       Patients on warfarin
NCT00298870       Patients on steroids
NCT00304382       Patients with Pneumonia, source of Blood or Sputum

∃ associatedObservation MRSA
∃ associatedObservation Pneumococcal Pneumonia ⊓ ∃ hasSpecimanSource Blood ⊔ Sputum
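
A minimal sketch of how the two eligibility expressions could be evaluated against a patient record. It reads the first as "has an associated observation of MRSA" and the second as "has an associated observation of Pneumococcal Pneumonia and a specimen source of Blood or Sputum". The record layout (`observations`, `specimen_sources`) and the function names are hypothetical.

```python
def matches_mrsa_trial(patient):
    """Some associated observation is MRSA."""
    return "MRSA" in patient["observations"]


def matches_pneumonia_trial(patient):
    """Pneumococcal Pneumonia observed AND specimen source is Blood OR Sputum."""
    has_observation = "Pneumococcal Pneumonia" in patient["observations"]
    has_source = bool({"Blood", "Sputum"} & set(patient["specimen_sources"]))
    return has_observation and has_source


# Hypothetical patient record used only to exercise the checks.
patient = {
    "observations": ["Pneumococcal Pneumonia"],
    "specimen_sources": ["Sputum"],
}

print(matches_mrsa_trial(patient))
print(matches_pneumonia_trial(patient))
```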
XML-STDS-117
Clinical Decision Support System

Clinical Decision Support Systems (CDSS) are
 Interactive computer programs
 Designed to assist physicians and other health
professionals with decision making tasks
Components of CDSS:
 Knowledge Base
 Rule Based Engine
 Case Base
 Business Models
XML-STDS-118
Example of Usage of Rules
IF
“RULE 1” & “RULE 2” & “RULE 3” … “RULE n”
THEN
“INTERVENTION 1 or RULE M”
IF
p.getGender() = “male”
& p.getAge() = 34 & p.getBP() < 140 &
p.getInsulinLevel() < 20
THEN
“Asthma Intervention Level 2”
Class Patient
hasGender “male” ⊓ hasAge
“34” ⊓ hasBP MoreThan 140 ⊓
hasInsulinLevel MoreThan 20
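
The IF/THEN form can be sketched as a list of conditions that must all hold before an intervention is returned. The sketch follows the IF rule as written on the slide (blood pressure below 140, insulin level below 20); the `Patient` dataclass, `CONDITIONS`, and `fire` are hypothetical names, not part of any rule engine.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    gender: str
    age: int
    bp: float
    insulin_level: float


# Conditions of the second rule on the slide, one predicate per clause.
CONDITIONS = [
    lambda p: p.gender == "male",
    lambda p: p.age == 34,
    lambda p: p.bp < 140,
    lambda p: p.insulin_level < 20,
]


def fire(patient, conditions, intervention):
    """Return the intervention if every condition holds, otherwise None."""
    return intervention if all(c(patient) for c in conditions) else None


p = Patient(gender="male", age=34, bp=120, insulin_level=15)
print(fire(p, CONDITIONS, "Asthma Intervention Level 2"))
```

Keeping each clause as its own predicate mirrors the “RULE 1” & “RULE 2” & … pattern, so clauses can be reused across rules.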
XML-STDS-119
Summary - Ontologies

Ontology
 Definition and Descriptions.
 Example.
Biomedical Ontology
 Open Cyc
 WordNet
 GALEN
 SNOMED - CT
Integration of Ontologies
Application of Biomedical Ontology
 Clinical Trials.
 OASIS: Integration Technique.
 Clinical Decision Support System.
XML-STDS-120
Concluding Remarks: XML/Standards

Explored Usage of XML Including:
 Basic XML Concepts
 XML Tools and Standards
 XML Databases
 Use of XML in BMI
Reviewed HL7 and CDA
Examined Numerous Standards
Reviewed Ontology Concepts
XML-STDS-121