cse4701chap12

advertisement
Chapter 12 6e Outline
 Structured, Semistructured,
and Unstructured Data
 XML Hierarchical (Tree) Data Model
 XML Documents and XML Schema
 Storing and Extracting XML Documents
from Databases
 XML Languages
 Extracting XML Documents from Relational
Databases
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
What is XML?
 Standards and Usage of XML
 XML Used in Myriad of Context
 Modeling and Information Exchange
(XML Schemas and Instances)
 XML Standards
• XACML – Access Control Markup Language
• OWL – Web Ontology Language
• HL7/CDA

XML Databases
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML: Extensible Markup Language
 Data sources

Database storing data for Internet applications
 Utilized to Store Information for Other Apps
 Emerging Standards across Domains
 Hypertext documents

Common method of specifying contents and
formatting of Web pages
 XML data model
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Structured, Semistructured,
and Unstructured Data
 Structured data

Represented in a strict format
 Example: information stored in databases
 Semistructured data

Has a certain structure
 Not all information collected will have identical
structure
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Structured, Semistructured,
and Unstructured Data

Schema information mixed in with data values
 Self-describing data
 May be displayed as a directed graph
• Labels or tags on directed edges represent:
•
•
•
•
Schema names
Names of attributes
Object types (or entity types or classes)
Relationships
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML tags
 XML tags may have an element field used to
store information within the tag or Meta-data
 Plain text can be placed between tags and this
text is not parsed
 CDATA is character data

This means that any string of non-markup characters is
legal as part of the attribute
 The ENTITY indicates that the attribute will
represent an external entity in the document itself
 The ID attribute type if you want to specify a
unique identifier for each element.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Schema
 The structure of an XML document is defined
by its schema.
 Dozens on languages to define XML
schema:

DTD
 W3C (XSD)
 NG - Relax
 This file can validate any instance of an XML
document against it self.
 This file, or schema also defines allowable
tags.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Structured, Semistructured,
and Unstructured Data (cont’d.)
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Sample XML Structure
 XML employees a tree structure model for
representing data (previous slide)
shiporder
shipto
orderperson
orderid
name
address
item
title
name
quantity
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
price
city
country
Schema Example (XSD)
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="shiporder">
<xs:complexType>
<xs:sequence>
<xs:element name="orderperson" type="xs:string"/>
<xs:element name="shipto">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="country" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="item" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="note" type="xs:string" minOccurs="0"/>
<xs:element name="quantity" type="xs:positiveInteger"/>
<xs:element name="price" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="orderid" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Structured, Semistructured,
and Unstructured Data
 Unstructured data

Limited indication of the of data document that
contains information embedded within it
 HTML tag

Text that appears between angled brackets:
<...>
 End tag

Tag with a slash: </...>
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Structured, Semistructured,
and Unstructured Data
 HTML uses a large number of predefined
tags
 HTML documents

Do not include schema information about type
of data
 Static HTML page

All information to be displayed explicitly spelled
out as fixed text in HTML file
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Hierarchical Data Model
 Elements and attributes

Main structuring concepts used to construct an
XML document
 Complex elements

Constructed from other elements hierarchically
 Simple elements

Contain data values
 XML tag names

Describe the meaning of the data elements in
the document
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Hierarchical Data Model
 Tree model or hierarchical model
 Main types of XML documents

Data-centric XML documents
 Document-centric XML documents
 Hybrid XML documents
 Schemaless XML documents

Do not follow a predefined schema of element
names and corresponding tree structure
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Hierarchical Data Model
 XML attributes

Describe properties and characteristics of the
elements (tags) within which they appear
 May reference another element in another
part of the XML document

Common to use attribute values in one element
as the references
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Documents and Schema
 Well formed

Has XML declaration
• Indicates version of XML being used as well as any
other relevant attributes

Every element must matching pair of start and
end tags
• Within start and end tags of parent element
 DOM (Document Object Model)

Manipulate resulting tree representation
corresponding to a well-formed XML document
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Documents and Schema
 SAX (Simple API for XML)

Processing of XML documents on the fly
• Notifies processing program through callbacks
whenever a start or end tag is encountered

Makes it easier to process large documents
 Allows for streaming
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Documents and Schema
 Valid

Document must be well formed
 Document must follow a particular schema
 Start and end tag pairs must follow structure
specified in separate XML DTD (Document
Type Definition) file or XML schema file
 DTD is Outmoded in Current Usage
• Schemas Dominate ..
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Schema
 XML schema language

Standard for specifying the structure of XML
documents
 Uses same syntax rules as regular XML
documents
• Same processors can be used on both
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Schema
 Identify specific set of XML schema
language elements (tags) being used

Specify a file stored at a Web site location
 XML namespace

Defines the set of commands (names) that can
be used
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Schema
 XML schema concepts:








Description and XML namespace
Annotations, documentation, language
Elements and types
First level element
Element types, minOccurs, and maxOccurs
Keys
Structures of complex elements
Composite attributes
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Storing and Extracting XML
Documents from Databases
 Most common approaches

Using a DBMS to store the documents as text
• Can be used if DBMS has a special module for
document processing

Using a DBMS to store document contents as
data elements
• Require mapping algorithms to design a database
schema that is compatible with XML document
structure
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Storing and Extracting XML
Documents from Databases

Designing a specialized system for storing
native XML data
• Called Native XML DBMSs

Creating or publishing customized XML
documents from preexisting relational
databases
• Use a separate middleware software layer to handle
conversions
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Languages
 Two query language standards

XPath
• Specify path expressions to identify certain nodes
(elements) or attributes within an XML document
that match specific patterns

XQuery
• Uses XPath expressions but has additional
constructs
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XPath: Specifying Path
Expressions in XML
 XPath expression

Returns a sequence of items that satisfy a
certain pattern as specified by the expression
 Either values (from leaf nodes) or elements or
attributes
 Qualifier conditions
• Further restrict nodes that satisfy pattern
 Separators used when specifying a path:

Single slash (/) and double slash (//)
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XPath: Specifying Path
Expressions in XML
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XPath: Specifying Path
Expressions in XML
 Attribute name prefixed by the @ symbol
 Wildcard symbol *

Stands for any element
 Example: /company/*
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XPath: Specifying Path
Expressions
in
XML
 Axes

Move in multiple directions from current node
in path expression
 Include self, child, descendent, attribute,
parent, ancestor, previous sibling, and next
sibling
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XPath: Specifying Path
Expressions
in
XML
 Main restriction of XPath path expressions

Path that specifies the pattern also specifies
the items to be retrieved
 Difficult to specify certain conditions on the
pattern while separately specifying which result
items should be retrieved
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Querying XML - XPath
 Many languages to query XML

XPath and XQuery are W3C standards
 Xpath is a compact method of traversing
previous tree

Designed to facilitate use via URL/URI's
 /shiporder/item/name
← view all items' names
 Extensible to add user defined behaviors
 Treats each tag as a node in the tree
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Querying XML - XQuery




Functional extension of XPath
XML equivalent of SQL
Navigate and manipulate document nodes.
Works on collections of documents, or even
fragments.
FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
AND $b/year = "1998"
RETURN $b/title
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XQuery: Specifying Queries in XML
 XQuery FLWR expression

Four main clauses of XQuery
 Form:
FOR <variable bindings to individual nodes
(elements)>
LET <variable bindings to collections of nodes
(elements)>
WHERE <qualifier conditions>
RETURN <query result specification>

Zero or more instances of FOR and LET
clauses
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XQuery: Specifying Queries in
XML
 XQuery contains powerful constructs to
specify complex queries
 www.w3.org

Contains documents describing the latest
standards related to XML and XQuery
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Other Languages and Protocols
Related
to
XML
 Extensible Stylesheet Language (XSL)

Define how a document should be rendered for
display by a Web browser
 Extensible Stylesheet Language for
Transformations (XSLT)

Transform one structure into different structure
 Web Services Description Language
(WSDL)

Description of Web Services in XML
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Other Languages and Protocols
Related to XML
 Simple Object Access Protocol (SOAP)

Platform-independent and programming
language-independent protocol for messaging
and remote procedure calls
 Resource Description Framework (RDF)

Languages and tools for exchanging and
processing of meta-data (schema) descriptions
and specifications over the Web
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Other Languages and Protocols
Related to XML
 OWL Web Ontology Language

Towards Representing Semantic/Content
 Three variants: OWL Lite, OWL DL, and OWL
Full
 eXtensible Access Control Markup
Language (XACML)

Ability to Define Security Policies
 Authentication Policies Defined in XML
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Extracting XML Documents from
Relational Databases
 Creating hierarchical XML views over flat or
graph-based data

Representational issues arise when converting
data from a database system into XML
documents
 UNIVERSITY database example
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Breaking Cycles to Convert
Graphs into Trees
 Complex subset with one or more cycles

Indicate multiple relationships among the
entities
 Difficult to decide how to create the document
hierarchies
 Can replicate the entity types involved to
break the cycles
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Other Steps for Extracting XML
Documents from Databases
 Create correct query in SQL to extract
desired information for XML document
 Restructure query result from flat relational
form to XML tree structure
 Customize query to select either a single
object or multiple objects into document
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Other Usages of XML
 Wide Range of Standards for Domains






health care (with HL7)
finance (with FpML and xBRL)
e-commerce (with cXML)
legal document exchange (with LegalXML)
law enforcement (with MMUCC)
commercial banking (with MISMO),
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Usage in BMI
 The convergence of computation and
biomedicine
 The NIH BMI Science and Tech Initiative:

Define biomedical computing as a science
 Many sources of information:
• Clinical, surgical, genetics, drug design, biology

Standardization in software
 Algorithm development, high speed computing
 All relies on efficient storage and transfer of
information
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML in Clinical Data
 HL7 standards organization.

V2: ASCII bar format. example:
HL7V3|1|2.02
Message|2.16.840.1.113883.1122^CNTRL-3456|2002081614303516^- --->
06:00||3.0|2.16.840.1.113883^POLB_IN004410||P|I|ER|ER
respondTo|RSP|tel:555-555-5555^^WP
entit yRsp|||{FAM^^Hippocrates~GIV^^Harold~GIV^^H~SFX^AC^MD}|tel:555-555-5555^^WP
sender|SND|nfs:127.127.127.255
device||2.16.840.1.113883.1122^GHH LAB|{GIV^^An Entit y Name}^L|||tel:555-555-2005^^H
agencyFor
representedOrganization||\NOTH\
location|||2.16.840.1.113883.1122^ELAB-3|{^^GHH Lab}^TN
receiver|RCV|nfs:127.127.127.0
device|||2.16.840.1.113883.1122^GHH O E|{GIV^^An Entit y Name}^L|||tel:555-555-2005^^H
agencyFor
representedOrganization|||2.16.840.1.113883.19.3.1001|{^^GHH Outpatient Clinic}^TN
location|||2.16.840.1.113883.1122^BLDG4|{^^GHH Outpatient Clinic}^TN

Awkward, inflexible, unclear meaning of values.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
HL7 V3 Specification
 Built around Reference Information Model:

Entity, Role, Participation, and Act
 Utilizes dedicated vocabularites and data
types.
 Every specification must begin from RIM.
 Clinical Document Architecture

Utilizes XML with tags like ”observation, code,
value and id”.
<observation classCode="OBS" moodCode="EVN">
<id root="10.23.4573.15879"/>
<code code="313193002" codeSystem="2.16.840.1.113883.6.96"
codeSystemName="SNOMED CT" displayName="Peak flow"/>
<effectiveTime value="20000407"/>
<value xsi:type="RTO_PQ_PQ">
<numerator value="260" unit="l"/>
<denominator value="1" unit="min"/>
</value>
</observation>
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
HL7’s CDA
 Clinical Document Architecture
 ANSI/HL7 R1-2000, R2-2005
 eDocuments for Interoperability
 Key component for local, regional,
national electronic health records
 Gentle on-ramp to information
exchange
• Everyone uses documents
• EMR compatible, no EMR required
• All types of clinical documents
Consult
Discharge
Radiology
Path
Summary
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
50
CDA
Health
chart
ASTM’s CCR
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
51
HL7 CDA and CCR
 Standards and Usage of XML
Consider CDA – Clinical Document
Architecture
 Standard for Clinical (Provider) Medical Recor

Copyright © 2011 Ramez Elmasri and Shamkant Navathe
HL7 CDA and CCR
 Clinical Record Organized as:

<patient_encounter>
- location

<legal_authenticator> - MD

<originating_organization> and <provider>

<patient> - name, birthdate, gender

<body_confidentiality-”CONF1”> - note
• History
• Past Medical History
• Medications
• Allergies
• Social History
• Physical Exam
• Vitals (BP, Resp, Temp, HR)
• Etc...
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
What is one Possible Solution?
 Let’s Explore this in Greater Detail
 Starting with the CDA Header
<?xml version="1.0"?>
<!DOCTYPE levelone PUBLIC "-//HL7//DTD CDA Level One 1.0//EN" "levelone_1.0.dtd">
<levelone>
<clinical_document_header>
<id EX="a123" RT="2.16.840.1.113883.3.933"/>
<set_id EX="B" RT="2.16.840.1.113883.3.933"/>
<version_nbr V="2"/>
<document_type_cd V="11488-4" S="2.16.840.1.113883.6.1"
DN="Consultation note"/>
<origination_dttm V="2000-04-07"/>
<confidentiality_cd ID="CONF1" V="N" S="2.16.840.1.113883.5.1xxx"/>
<confidentiality_cd ID="CONF2" V="R" S="2.16.840.1.113883.5.1xxx"/>
<document_relationship>
<document_relationship.type_cd V="RPLC"/>
<related_document>
<id EX="a234" RT="2.16.840.1.113883.3.933"/>
<set_id EX="B" RT="2.16.840.1.113883.3.933"/>
<version_nbr V="1"/>
</related_document>
</document_relationship>
<fulfills_order>
<fulfills_order.type_cd V="FLFS"/>
<order><id EX="x23ABC" RT="2.16.840.1.113883.3.933"/></order>
<order><id EX="x42CDE" RT="2.16.840.1.113883.3.933"/></order>
</fulfills_order>
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
CDA Example - Continued
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
CDA Example - Continued
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
CDA Example - Continued
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
CDA Example - Continued
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
CDA Example - Continued
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
CDA Example - Continued
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
CDA Example - Continued
CDA Example - Continued
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
CDA Example - Continued
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
CCR Schema
<xs:schema xmlns="urn:astm-org:CCR" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:ccr="urn:astm-org:CCR" targetNamespace="urn:astm-org:CCR"
elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:import namespace="http://www.w3.org/2000/09/xmldsig#"
schemaLocation="xmldsig-core-schema.xsd" />
<xs:element name="ContinuityOfCareRecord">
<xs:complexType>
<xs:sequence>
<xs:element name="CCRDocumentObjectID" type="xs:string" />
<xs:element name="Language" type="CodedDescriptionType" />
<xs:element name="Version" type="xs:string" />
<xs:element name="DateTime" type="DateTimeType" />
<xs:element name="Patient" maxOccurs="2">
<xs:complexType>
<xs:sequence>
<xs:element name="ActorID" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="From">
<xs:complexType>
<xs:sequence>
<xs:element name="ActorLink" type="ActorReferenceType"
maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
…Ramez Elmasri and Shamkant Navathe
Copyright © 2011
CCR Instance
<ContinuityOfCareRecord xmlns='urn:astm-org:CCR'>
<CCRDocumentObjectID>Doc</CCRDocumentObjectID>
<Language><Text>English</Text></Language>
<Version>V1.0</Version>
<DateTime><ExactDateTime>2008</ExactDateTime></DateTime>
<Patient><ActorID>Patient</ActorID></Patient>
<Body>
<Problems>
<Problem>
<DateTime>
<Type>
<Text>Start date</Text>
</Type>
<ExactDateTime>2007-04-04T07:00:00Z</ExactDateTime>
</DateTime>
<DateTime>
<Type>
<Text>Stop date</Text>
</Type>
<ExactDateTime>2008-07-20T07:00:00Z</ExactDateTime>
</DateTime>
<Description>
<Code>
<Value>346.80</Value>
<CodingSystem>ICD9</CodingSystem>
<Version>2004</Version>
</Code>
…Ramez Elmasri and Shamkant Navathe
Copyright © 2011
Standard and Databases
 Many Applications Have Standards

Finance (with FpML and xBRL), e-commerce
(with cXML), legal document exchange (with
LegalXML), law enforcement (with MMUCC),
commercial banking (with MISMO)
 DBs Apps must Interact with Standards

Standard Values and Codes Stored in DB
 Need to be Searchable by Various Codes
 Values/Codes Transmitted to other Systems
and Organizations
 Health Care has a Large Set of Standards
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Health Care Standards of Note
 MeSH
 Unified Medical Language System
 ICD9 and ICD9-CM (Intl. Classification
Diseases)
 ICD10 and ICD10-CM
 SNOMED-CT (Clinical Terms)
 National Drug Codes (NDC)
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
MeSH
 The Medical Subject Headings (MeSH®) thesaurus is a
controlled vocabulary produced by the National Library of
Medicine and used for indexing, cataloging, and searching
for biomedical and health-related information and
documents.
 2011 MeSH includes the subject descriptors appearing in
MEDLINE®/PubMed®, the NLM catalog database, and
other NLM databases.
 Many synonyms, near-synonyms, and closely related
concepts are included as entry terms to help users find
the most relevant MeSH descriptor for the concept they
are seeking.
 http://www.nlm.nih.gov/mesh/
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
MeSH in ASCII
*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA AD AE AG AI AN BI BL CF CH CL CS CT DU EC HI IM IP ME PD PK PO RE SD
ST TO TU UR
ENTRY = A-23187|T109|T195|LAB|NRW|NLM (1991)|900308|abbcdef
ENTRY = A23187|T109|T195|LAB|NRW|UNK (19XX)|741111|abbcdef
ENTRY = Antibiotic A23187|T109|T195|NON|NRW|NLM (1991)|900308|abbcdef
ENTRY = A 23187
ENTRY = A23187, Antibiotic
MN = D03.438.221.173
PA = Anti-Bacterial Agents
PA = Ionophores
MH_TH = NLM (1975)
ST = T109
ST = T195
N1 = 4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S(6alpha(2S*,3S*),8beta(R*),9beta,11alpha))RN = 52665-69-7
PI = Antibiotics (1973-1974)
PI = Carboxylic Acids (1973-1974)
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
MeSH in ASCII
MS = An ionophorous, polyether antibiotic from Streptomyces chartreusensis. It binds and
transports cations across membranes and uncouples oxidative phosphorylation while
inhibiting ATPase of rat liver mitochondria. The substance is used mostly as a biochemical
tool to study the role of divalent cations in various biological systems.
OL = use CALCIMYCIN to search A 23187 1975-90
PM = 91; was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
HN = 91(75); was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
MED = *62
MED = 847
M90 = *299
M90 = 2405
M85 = *454
M85 = 2878
M80 = *316
M80 = 1601
M75 = *300
M75 = 823
M66 = *1
M66 = 3
ETC
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
MeSH in XML - desc2011.dtd
<!-- MeSH DTD file for Descriptor records. desc2011.dtd -->
<!-- Author: MeSH
-->
<!-- Effective: 09/01/2010
-->
<!-- #PCDATA: parseable character data = text
occurence indicators (default: required, not repeatable):
?: zero or one occurrence, i.e., at most one (optional)
*: zero or more occurrences (optional, repeatable)
+: one or more occurrences (required, repeatable)
|: choice, one or the other, but not both
-->
<!ENTITY
<!ENTITY
<!ENTITY
<!ENTITY
<!ENTITY
% DescriptorReference "(DescriptorUI, DescriptorName)">
% normal.date "(Year, Month, Day)">
% ConceptReference "(ConceptUI,ConceptName,ConceptUMLSUI?)">
% QualifierReference "(QualifierUI, QualifierName)">
% TermReference "(TermUI, String)">
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
MeSH in XML - desc2011.dtd
<!ELEMENT DescriptorRecordSet (DescriptorRecord*)>
<!ATTLIST DescriptorRecordSet LanguageCode
(cze|dut|eng|fin|fre|ger|ita|jpn|lav|por|scr|slv|spa) #REQUIRED>
<!ELEMENT DescriptorRecord (%DescriptorReference;,
DateCreated,
DateRevised?,
DateEstablished?,
ActiveMeSHYearList,
AllowableQualifiersList?,
Annotation?,
HistoryNote?,
OnlineNote?,
PublicMeSHNote?,
PreviousIndexingList?,
EntryCombinationList?,
SeeRelatedList?,
ConsiderAlso?,
PharmacologicalActionList?,
RunningHead?,
TreeNumberList?,
RecordOriginatorsList,
ConceptList) >
<!ATTLIST DescriptorRecord DescriptorClass (1 | 2 | 3 | 4) "1">
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
MeSH in XML - desc2011.dtd
<!ELEMENT ActiveMeSHYearList (Year+)>
<!ELEMENT AllowableQualifiersList (AllowableQualifier+) >
<!ELEMENT AllowableQualifier (QualifierReferredTo,Abbreviation )>
<!ELEMENT Annotation (#PCDATA)>
<!ELEMENT ConsiderAlso (#PCDATA) >
<!ELEMENT Day (#PCDATA)>
<!ELEMENT DescriptorUI (#PCDATA) >
<!ELEMENT DescriptorName (String) >
<!ELEMENT DateCreated (%normal.date;) >
<!ELEMENT DateRevised (%normal.date;) >
<!ELEMENT DateEstablished (%normal.date;) >
<!ELEMENT DescriptorReferredTo (%DescriptorReference;) >
<!ELEMENT EntryCombinationList (EntryCombination+) >
<!ELEMENT EntryCombination (ECIN,
ECOUT)>
<!ELEMENT ECIN (DescriptorReferredTo,QualifierReferredTo) >
<!ELEMENT ECOUT (DescriptorReferredTo,QualifierReferredTo? ) >
<!ELEMENT HistoryNote (#PCDATA)>
<!ELEMENT Month (#PCDATA)>
<!ELEMENT OnlineNote (#PCDATA)>
ETC
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
dMeSH in XML - Sample
<?xml version="1.0"?>
<!DOCTYPE DescriptorRecordSet SYSTEM "desc2011.dtd">
<DescriptorRecordSet LanguageCode = "eng">
<DescriptorRecord DescriptorClass = "1">
<DescriptorUI>D000001</DescriptorUI>
<DescriptorName>
<String>Calcimycin</String>
</DescriptorName>
<DateCreated>
<Year>1974</Year>
<Month>11</Month>
<Day>19</Day>
</DateCreated>
<DateRevised>
<Year>2006</Year>
<Month>07</Month>
<Day>05</Day>
</DateRevised>
<DateEstablished>
<Year>1984</Year>
<Month>01</Month>
<Day>01</Day>
</DateEstablished>
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
dMeSH in XML - Sample
<ActiveMeSHYearList>
<Year>2007</Year>
<Year>2008</Year>
<Year>2009</Year>
<Year>2011</Year>
</ActiveMeSHYearList>
<AllowableQualifiersList>
<AllowableQualifier>
<QualifierReferredTo>
<QualifierUI>Q000008</QualifierUI>
<QualifierName>
<String>administration & dosage</String>
</QualifierName>
</QualifierReferredTo>
<Abbreviation>AD</Abbreviation>
</AllowableQualifier>
<AllowableQualifier>
<QualifierReferredTo>
<QualifierUI>Q000009</QualifierUI>
<QualifierName>
<String>adverse effects</String>
</QualifierName>
</QualifierReferredTo>
<Abbreviation>AE</Abbreviation>
</AllowableQualifier>
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Unifies Medical Language System

UMLS
acronym
for
was
developed
for

National Library of Medicine
Disease is semantic type
with around 392 relations
(109 semantic relations
and 22 other relations).
Pneumonia categorized
under one semantic type
Disease, but has
hundreds of relations.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
UMLS Concepts, Semantic
Types/Relations
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
ICD9 Respiratory Diseases
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
ICD10 Respiratory Diseases
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Interesting ICD10 Codes




ICD9 had 1
3,000, ICD has 68,000 Codes!
W61.12XA: Struck by macaw, initial encounter.
V97.33XA Sucked into jet engine, initial
encounter
 V97.33XD: Sucked into jet engine, subsequent
encounter.
 Spacecraft crash injuring occupant – V9542XA
 Hurt walking into a lamppost – W2202XA
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
SNOMED-CT
 SNOMED stands for Systemized Nomenclature
Of Medicine Clinical Terms. SNOMED-CT is the
result of merging two ontologies: SNOMED-RT
and Clinical Terms.
 http://www.ihtsdo.org/snomed-ct/
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
81
SNOMED-CT
 Composed of Concepts, Terms, and Relationships
 Precisely Represent Clinical Information Across Scope of
Health Care
 Content Coverage Divided into Hierarchies
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
82
SNOMED
Example
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
National Drug Codes




Tracking of Drugs (Prescription and OTC)
From Submittal Through Approach
Keeps Track of Many Details on Medication
Each Drug by Manufacturer has Unique
NDC Identifier
 See:

http://www.fda.gov/Drugs/InformationOnDrugs/ucm142438.htm
 Searchable Database:

http://www.accessdata.fda.gov/scripts/cder/ndc/default.
cfm
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
NDC Examples
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Models
 Naively there are two models of XML use:

Data-centric
 Document-centric
 In reality, most XML use is a hybrid of the
two
 More important is the database strategy
used with XML

Relational
 Content Managment
 Native XML
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Data – Centric Model
 Information is generally stored in a
relational database
 XML is transport medium, nothing more
 Irrelevant to application that data exists as
XML for some period of time
 Characteristics:

Fine grained data.
 Data relationship is insignificant.
 Need to transfer relational information.
 Means of storing new information.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Document – Centric Model
 When XML is utilized soley as a document

This pesentation in Open Office
 The documents in part, or in full are stored
and retrieved
 Does not originate from relational database
 Document used for human consumption
 Usually information written by hand in a
language like PDF, RTF then converted to
XML
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Reality: Hybrid Model
 Most documents like a PDF will also
contain small grained information (last
edited date, character set)
 Data from a relational DB may even be a
document, or require self description
 Various database technologies support all
models
 Important to understand your data, and
choose most compatible db technology
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML as Data Exchange Medium
 Widespread Usage Across Computing
 UML Tools have Standardized on XML

Export Given UML Design to XML Instances
 Track Both Design Data and Graphical Data
 Database Interactions via XML

Import from XML into a Relational Schema
 Export form a Relational DB into XML Schema
and Instances
 Web Services - SOAP, WSDL, and UDDI
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Medical Data Model
 Medical data is non-homogeneous
 But, there exists general trends in medical
data:

Fine grain data such as dates, times, images
 Documents and human generated descriptions
and observations
 Human interaction creates semi-structured data
 Ability to transfer information is esential
 Medical data fits into hybrid model
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Data – Centric Comparison
 Advantages:

Utlizes existing database software. (IBM, Oracle,
SqlServer)
 Quick ( existing db's are already fast)
 Dual role (not limited only to XML)
 Many even support XQuery
 Disadvantages:


More configuration (mapping relational -> XML)
Slower when creating complex XML files due to middle
step
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Document – Centric Comparison
 Advantages:

Good integration into workflow
 Document managment made easy
 Collaboration, and web publishing
 Disadvantages:

Not able to extract data from document directly
 Not designed for high availability, high load
systems
 Non-uniformity in implementations
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Storage Strategy: Relational
 Utilizing a relational database to store XML
documents and data is very popular
 In a very data – centric application this
approach is intuitive
 Most top tier database applications support
XML in some way

Oracle, SQL server, IBM, etc...
 Software is highly supported and well
developed.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
XML Shema mapping
 Using a relational DB requires mapping
XML schema to DB schema.
 Table based:

Often implemented as a middleware layer
 Schema structure must follow row-column
convention
 Object – relational:

XML is a tree of objects
 Mapped to DB using well established methods
 Natively supported in some DB apps
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Storage Strategy: CMS
 CMS – Content Management System
 Used in exclusively document-centric
 Various programs allow indexing, storage,
manipulation, and publication of XML
documents
 Application specific
 Numerous implementations, most recently
Open Office and MS Word 2007
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Native XML Databases
 Definition:

”A database that has an XML document as its
fundamental unit of (logical) storage and
defines a (logical) model for an XML document,
as opposed to the data in that document, and
stores and retrieves documents according to
that model. At a minimum, the model must
include elements, attributes, PCDATA, and
document order.”
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Native XML Databases
 Data types: No support in XML, need a
mapping

Document or database schema can be used
 External user defined mapping
 Not necessary when only transfering data
 No requirement on underlying medium or
implementation
 Two architectures; text and model based
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Native: Text-based
 Use any DB





Rather than mapping schemas, store entire
XML documents
Usually involves saving entire document as a
BLOB / Character LOB
Utilize various text field searches to retrieve
info from XML document
Some DB text searching are being made XML
aware
Speed: Document located on disk preferences
full or partial document retrieval
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Native: Model-based
 Internal object model of document schema
 Store this model in a database

Relational / object-oriented database
 Proprietary
 Performance similar to chosen DB engine
 Still limited by hierarchy of XML data

Retrieve all orderid's from hundreds of docs
slow
 Support for common XML query languages

XPath, XQuery, etc...
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Summary
 Three main types of data: structured, semistructured, and unstructured
 XML standard

Tree-structured (hierarchical) data model
 XML documents and the languages for
specifying the structure of these documents
 XPath and XQuery languages

Query XML data
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Download