Chapter 12 (6e)

Outline
  Structured, Semistructured, and Unstructured Data
  XML Hierarchical (Tree) Data Model
  XML Documents and XML Schema
  Storing and Extracting XML Documents from Databases
  XML Languages
  Extracting XML Documents from Relational Databases

Copyright © 2011 Ramez Elmasri and Shamkant Navathe

What is XML?
  Standards and Usage of XML
  XML Used in a Myriad of Contexts
  Modeling and Information Exchange (XML Schemas and Instances)
  XML Standards
    • XACML: Access Control Markup Language
    • OWL: Web Ontology Language
    • HL7/CDA
  XML Databases

XML: Extensible Markup Language
  Data sources
    Database storing data for Internet applications
    Utilized to Store Information for Other Apps
    Emerging Standards across Domains
  Hypertext documents
    Common method of specifying contents and formatting of Web pages
  XML data model

Structured, Semistructured, and Unstructured Data
  Structured data
    Represented in a strict format
    Example: information stored in databases
  Semistructured data
    Has a certain structure
    Not all information collected will have identical structure

Structured, Semistructured, and Unstructured Data
  Schema information mixed in with data values
  Self-describing data
  May be displayed as a directed graph
    • Labels or tags on directed edges represent:
      • Schema names
      • Names of attributes
      • Object types (or entity types or classes)
      • Relationships

XML tags
  XML tags may have attributes, used to store information within the tag or metadata
  Plain text can be placed between tags; inside a CDATA section this text is not parsed
  CDATA is character data
    As an attribute type, this means that any string of non-markup characters is legal as part of the attribute
  The ENTITY type indicates that the attribute will represent an external entity in the document itself
  Use the ID attribute type if you want to specify a unique identifier
for each element.

XML Schema
  The structure of an XML document is defined by its schema.
  Dozens of languages exist to define an XML schema:
    DTD
    W3C XML Schema (XSD)
    RELAX NG
  The schema file can be used to validate any instance of an XML document against it.
  The schema also defines the allowable tags.

[Figure: Structured, Semistructured, and Unstructured Data (cont'd.)]

Sample XML Structure
  XML employs a tree structure model for representing data (previous slide)
  [Figure: tree rooted at shiporder, with node labels orderid, orderperson, shipto, name, address, city, country, item, title, name, quantity, price]

Schema Example (XSD)
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="address" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="note" type="xs:string" minOccurs="0"/>
              <xs:element name="quantity" type="xs:positiveInteger"/>
              <xs:element name="price" type="xs:decimal"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="orderid" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Structured, Semistructured, and Unstructured Data
  Unstructured data
    Limited indication of the type of data
    Document that contains information embedded within it
  HTML tag
    Text that appears between angled brackets: <...>
  End tag
    Tag with a slash:
</...>

Structured, Semistructured, and Unstructured Data
  HTML uses a large number of predefined tags
  HTML documents
    Do not include schema information about type of data
  Static HTML page
    All information to be displayed explicitly spelled out as fixed text in HTML file

XML Hierarchical Data Model
  Elements and attributes
    Main structuring concepts used to construct an XML document
  Complex elements
    Constructed from other elements hierarchically
  Simple elements
    Contain data values
  XML tag names
    Describe the meaning of the data elements in the document

XML Hierarchical Data Model
  Tree model or hierarchical model
  Main types of XML documents
    Data-centric XML documents
    Document-centric XML documents
    Hybrid XML documents
  Schemaless XML documents
    Do not follow a predefined schema of element names and corresponding tree structure

XML Hierarchical Data Model
  XML attributes
    Describe properties and characteristics of the elements (tags) within which they appear
  May reference another element in another part of the XML document
    Common to use attribute values in one element as the references

XML Documents and Schema
  Well formed
    Has XML declaration
      • Indicates version of XML being used as well as any other relevant attributes
    Every element must include a matching pair of start and end tags
      • Within start and end tags of parent element
  DOM (Document Object Model)
    Manipulate resulting tree representation corresponding to a well-formed XML document

XML Documents and Schema
  SAX (Simple API for XML)
    Processing of XML documents on the fly
      • Notifies processing program through callbacks
        whenever a start or end tag is encountered
    Makes it easier to process large documents
    Allows for streaming

XML Documents and Schema
  Valid
    Document must be well formed
    Document must follow a particular schema
    Start and end tag pairs must follow structure specified in separate XML DTD (Document Type Definition) file or XML schema file
  DTD is outmoded in current usage
    • Schemas dominate

XML Schema
  XML schema language
    Standard for specifying the structure of XML documents
    Uses same syntax rules as regular XML documents
      • Same processors can be used on both

XML Schema
  Identify specific set of XML schema language elements (tags) being used
    Specify a file stored at a Web site location
  XML namespace
    Defines the set of commands (names) that can be used

XML Schema
  XML schema concepts:
    Description and XML namespace
    Annotations, documentation, language
    Elements and types
    First level element
    Element types, minOccurs, and maxOccurs
    Keys
    Structures of complex elements
    Composite attributes

Storing and Extracting XML Documents from Databases
  Most common approaches
    Using a DBMS to store the documents as text
      • Can be used if DBMS has a special module for document processing
    Using a DBMS to store document contents as data elements
      • Requires mapping algorithms to design a database schema that is compatible with XML document structure

Storing and Extracting XML Documents from Databases
    Designing a specialized system for storing native XML data
      • Called Native XML DBMSs
    Creating or publishing customized XML documents from preexisting relational databases
      • Use a separate middleware software layer to handle conversions
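
The last approach above (publishing XML from a preexisting relational database through a middleware layer) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the two tables, their columns, the sample rows, the wrapping `<orders>` element, and the function name `publish_orders` are all hypothetical, chosen to resemble the shiporder example; a real middleware layer would be driven by the actual database and XML schemas.

```python
import sqlite3
import xml.etree.ElementTree as ET


def publish_orders(conn):
    """Turn flat relational rows into a nested XML tree (hypothetical schema)."""
    root = ET.Element("orders")  # assumed wrapper element, not in the slides
    orders = conn.execute(
        "SELECT orderid, orderperson FROM shiporder ORDER BY orderid"
    ).fetchall()
    for orderid, orderperson in orders:
        # One parent row becomes one <shiporder> element with an attribute.
        order = ET.SubElement(root, "shiporder", orderid=orderid)
        ET.SubElement(order, "orderperson").text = orderperson
        # Child rows (joined on the foreign key) become nested <item> elements.
        items = conn.execute(
            "SELECT title, quantity, price FROM item "
            "WHERE orderid = ? ORDER BY title",
            (orderid,),
        )
        for title, quantity, price in items:
            item = ET.SubElement(order, "item")
            ET.SubElement(item, "title").text = title
            ET.SubElement(item, "quantity").text = str(quantity)
            ET.SubElement(item, "price").text = str(price)
    return root


# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shiporder (orderid TEXT PRIMARY KEY, orderperson TEXT);
    CREATE TABLE item (orderid TEXT, title TEXT, quantity INTEGER, price TEXT);
    INSERT INTO shiporder VALUES ('889923', 'John Smith');
    INSERT INTO item VALUES ('889923', 'Empire Burlesque', 1, '10.90');
    INSERT INTO item VALUES ('889923', 'Hide your heart', 1, '9.90');
    """
)
xml_text = ET.tostring(publish_orders(conn), encoding="unicode")
print(xml_text)
```

The design point is the restructuring step: the SQL results are flat, and the nesting of `<item>` under `<shiporder>` is created by the middleware, not by the database.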
XML Languages
  Two query language standards
    XPath
      • Specify path expressions to identify certain nodes (elements) or attributes within an XML document that match specific patterns
    XQuery
      • Uses XPath expressions but has additional constructs

XPath: Specifying Path Expressions in XML
  XPath expression
    Returns a sequence of items that satisfy a certain pattern as specified by the expression
    Either values (from leaf nodes) or elements or attributes
  Qualifier conditions
    • Further restrict nodes that satisfy pattern
  Separators used when specifying a path:
    Single slash (/) and double slash (//)

[Figure: XPath: Specifying Path Expressions in XML]

XPath: Specifying Path Expressions in XML
  Attribute name prefixed by the @ symbol
  Wildcard symbol *
    Stands for any element
    Example: /company/*

XPath: Specifying Path Expressions in XML
  Axes
    Move in multiple directions from current node in path expression
    Include self, child, descendant, attribute, parent, ancestor, previous sibling, and next sibling

XPath: Specifying Path Expressions in XML
  Main restriction of XPath path expressions
    Path that specifies the pattern also specifies the items to be retrieved
    Difficult to specify certain conditions on the pattern while separately specifying which result items should be retrieved

Querying XML - XPath
  Many languages to query XML
    XPath and XQuery are W3C standards
  XPath is a compact method of traversing the tree shown earlier
    Designed to facilitate use via URLs/URIs
    /shiporder/item/name ← view all items' names
    Extensible to add user-defined behaviors
    Treats each tag as a node in the tree

Querying XML
- XQuery
  Functional extension of XPath
    XML equivalent of SQL
    Navigate and manipulate document nodes.
    Works on collections of documents, or even fragments.
  for $b in doc("bib.xml")//book
  where $b/publisher = "Morgan Kaufmann" and $b/year = "1998"
  return $b/title

XQuery: Specifying Queries in XML
  XQuery FLWR expression
    Four main clauses of XQuery
    Form:
      for <variable bindings to individual nodes (elements)>
      let <variable bindings to collections of nodes (elements)>
      where <qualifier conditions>
      return <query result specification>
    Zero or more instances of FOR and LET clauses

XQuery: Specifying Queries in XML
  XQuery contains powerful constructs to specify complex queries
  www.w3.org
    Contains documents describing the latest standards related to XML and XQuery

Other Languages and Protocols Related to XML
  Extensible Stylesheet Language (XSL)
    Define how a document should be rendered for display by a Web browser
  Extensible Stylesheet Language for Transformations (XSLT)
    Transform one structure into a different structure
  Web Services Description Language (WSDL)
    Description of Web Services in XML

Other Languages and Protocols Related to XML
  Simple Object Access Protocol (SOAP)
    Platform-independent and programming language-independent protocol for messaging and remote procedure calls
  Resource Description Framework (RDF)
    Languages and tools for exchanging and processing of meta-data (schema) descriptions and specifications over the Web

Other Languages and Protocols Related to XML
  OWL Web Ontology Language
    Towards Representing Semantic/Content
    Three variants: OWL Lite, OWL DL, and OWL Full
  eXtensible Access Control Markup Language (XACML)
    Ability to Define Security
    Policies
    Authentication Policies Defined in XML

Extracting XML Documents from Relational Databases
  Creating hierarchical XML views over flat or graph-based data
    Representational issues arise when converting data from a database system into XML documents
  UNIVERSITY database example

Breaking Cycles to Convert Graphs into Trees
  Complex subset with one or more cycles
    Indicate multiple relationships among the entities
    Difficult to decide how to create the document hierarchies
  Can replicate the entity types involved to break the cycles

Other Steps for Extracting XML Documents from Databases
  Create correct query in SQL to extract desired information for XML document
  Restructure query result from flat relational form to XML tree structure
  Customize query to select either a single object or multiple objects into document

Other Usages of XML
  Wide Range of Standards for Domains
    health care (with HL7)
    finance (with FpML and XBRL)
    e-commerce (with cXML)
    legal document exchange (with LegalXML)
    law enforcement (with MMUCC)
    commercial banking (with MISMO)

XML Usage in BMI
  The convergence of computation and biomedicine
  The NIH BMI Science and Tech Initiative: Define biomedical computing as a science
  Many sources of information:
    • Clinical, surgical, genetics, drug design, biology
  Standardization in software
  Algorithm development, high speed computing
  All relies on efficient storage and transfer of information

XML in Clinical Data
  HL7 standards organization.
  V2: ASCII bar-delimited format.
  example:
    HL7V3|1|2.02
    Message|2.16.840.1.113883.1122^CNTRL-3456|2002081614303516^- ---> 06:00||3.0|2.16.840.1.113883^POLB_IN004410||P|I|ER|ER
    respondTo|RSP|tel:555-555-5555^^WP
    entityRsp|||{FAM^^Hippocrates~GIV^^Harold~GIV^^H~SFX^AC^MD}|tel:555-555-5555^^WP
    sender|SND|nfs:127.127.127.255
    device||2.16.840.1.113883.1122^GHH LAB|{GIV^^An Entity Name}^L|||tel:555-555-2005^^H
    agencyFor representedOrganization||\NOTH\
    location|||2.16.840.1.113883.1122^ELAB-3|{^^GHH Lab}^TN
    receiver|RCV|nfs:127.127.127.0
    device|||2.16.840.1.113883.1122^GHH O E|{GIV^^An Entity Name}^L|||tel:555-555-2005^^H
    agencyFor representedOrganization|||2.16.840.1.113883.19.3.1001|{^^GHH Outpatient Clinic}^TN
    location|||2.16.840.1.113883.1122^BLDG4|{^^GHH Outpatient Clinic}^TN
  Awkward, inflexible, unclear meaning of values.

HL7 V3 Specification
  Built around Reference Information Model:
    Entity, Role, Participation, and Act
    Utilizes dedicated vocabularies and data types.
    Every specification must begin from RIM.
  Clinical Document Architecture
    Utilizes XML with tags like "observation, code, value and id".
<observation classCode="OBS" moodCode="EVN">
  <id root="10.23.4573.15879"/>
  <code code="313193002" codeSystem="2.16.840.1.113883.6.96"
        codeSystemName="SNOMED CT" displayName="Peak flow"/>
  <effectiveTime value="20000407"/>
  <value xsi:type="RTO_PQ_PQ">
    <numerator value="260" unit="l"/>
    <denominator value="1" unit="min"/>
  </value>
</observation>

HL7's CDA
  Clinical Document Architecture
    ANSI/HL7 R1-2000, R2-2005
  eDocuments for Interoperability
    Key component for local, regional, national electronic health records
  Gentle on-ramp to information exchange
    • Everyone uses documents
    • EMR compatible, no EMR required
    • All types of clinical documents
  [Figure labels: Consult, Discharge, Radiology, Path, Summary]

[Figure: CDA, health chart, ASTM's CCR]

HL7 CDA and CCR
  Standards and Usage of XML
  Consider CDA: Clinical Document Architecture
    Standard for Clinical (Provider) Medical Record

HL7 CDA and CCR
  Clinical Record Organized as:
    <patient_encounter> - location
    <legal_authenticator> - MD
    <originating_organization> and <provider>
    <patient> - name, birthdate, gender
    <body confidentiality="CONF1"> - note
      • History
      • Past Medical History
      • Medications
      • Allergies
      • Social History
      • Physical Exam
      • Vitals (BP, Resp, Temp, HR)
      • Etc...

What is one Possible Solution?
  Let's Explore this in Greater Detail, Starting with the CDA Header
<?xml version="1.0"?>
<!DOCTYPE levelone PUBLIC "-//HL7//DTD CDA Level One 1.0//EN" "levelone_1.0.dtd">
<levelone>
  <clinical_document_header>
    <id EX="a123" RT="2.16.840.1.113883.3.933"/>
    <set_id EX="B" RT="2.16.840.1.113883.3.933"/>
    <version_nbr V="2"/>
    <document_type_cd V="11488-4" S="2.16.840.1.113883.6.1" DN="Consultation note"/>
    <origination_dttm V="2000-04-07"/>
    <confidentiality_cd ID="CONF1" V="N" S="2.16.840.1.113883.5.1xxx"/>
    <confidentiality_cd ID="CONF2" V="R" S="2.16.840.1.113883.5.1xxx"/>
    <document_relationship>
      <document_relationship.type_cd V="RPLC"/>
      <related_document>
        <id EX="a234" RT="2.16.840.1.113883.3.933"/>
        <set_id EX="B" RT="2.16.840.1.113883.3.933"/>
        <version_nbr V="1"/>
      </related_document>
    </document_relationship>
    <fulfills_order>
      <fulfills_order.type_cd V="FLFS"/>
      <order><id EX="x23ABC" RT="2.16.840.1.113883.3.933"/></order>
      <order><id EX="x42CDE" RT="2.16.840.1.113883.3.933"/></order>
    </fulfills_order>

[Figures: CDA Example - Continued (several slides)]

CCR Schema
<xs:schema xmlns="urn:astm-org:CCR"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ccr="urn:astm-org:CCR"
           targetNamespace="urn:astm-org:CCR"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:import namespace="http://www.w3.org/2000/09/xmldsig#"
             schemaLocation="xmldsig-core-schema.xsd" />
  <xs:element name="ContinuityOfCareRecord">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="CCRDocumentObjectID" type="xs:string" />
        <xs:element name="Language" type="CodedDescriptionType" />
        <xs:element name="Version" type="xs:string" />
        <xs:element name="DateTime" type="DateTimeType" />
        <xs:element name="Patient" maxOccurs="2">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ActorID" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="From">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ActorLink" type="ActorReferenceType" maxOccurs="unbounded" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        …

CCR Instance
<ContinuityOfCareRecord xmlns='urn:astm-org:CCR'>
  <CCRDocumentObjectID>Doc</CCRDocumentObjectID>
  <Language><Text>English</Text></Language>
  <Version>V1.0</Version>
  <DateTime><ExactDateTime>2008</ExactDateTime></DateTime>
  <Patient><ActorID>Patient</ActorID></Patient>
  <Body>
    <Problems>
      <Problem>
        <DateTime>
          <Type> <Text>Start date</Text> </Type>
          <ExactDateTime>2007-04-04T07:00:00Z</ExactDateTime>
        </DateTime>
        <DateTime>
          <Type> <Text>Stop date</Text> </Type>
          <ExactDateTime>2008-07-20T07:00:00Z</ExactDateTime>
        </DateTime>
        <Description>
          <Code>
            <Value>346.80</Value>
            <CodingSystem>ICD9</CodingSystem>
            <Version>2004</Version>
          </Code>
          …

Standards and Databases
  Many Applications Have Standards
    Finance (with FpML and XBRL), e-commerce (with cXML), legal document exchange (with LegalXML), law enforcement (with MMUCC), commercial banking (with MISMO)
  DB Apps Must Interact with Standards
    Standard Values and Codes Stored in DB
    Need to be Searchable by Various Codes
    Values/Codes Transmitted to other Systems and Organizations
  Health Care has a Large Set of Standards

Health Care Standards of Note
  MeSH
  Unified Medical Language System
  ICD9 and ICD9-CM
    (Intl. Classification of Diseases)
  ICD10 and ICD10-CM
  SNOMED-CT (Clinical Terms)
  National Drug Codes (NDC)

MeSH
  The Medical Subject Headings (MeSH®) thesaurus is a controlled vocabulary produced by the National Library of Medicine and used for indexing, cataloging, and searching for biomedical and health-related information and documents.
  2011 MeSH includes the subject descriptors appearing in MEDLINE®/PubMed®, the NLM catalog database, and other NLM databases.
  Many synonyms, near-synonyms, and closely related concepts are included as entry terms to help users find the most relevant MeSH descriptor for the concept they are seeking.
  http://www.nlm.nih.gov/mesh/

MeSH in ASCII
*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA AD AE AG AI AN BI BL CF CH CL CS CT DU EC HI IM IP ME PD PK PO RE SD ST TO TU UR
ENTRY = A-23187|T109|T195|LAB|NRW|NLM (1991)|900308|abbcdef
ENTRY = A23187|T109|T195|LAB|NRW|UNK (19XX)|741111|abbcdef
ENTRY = Antibiotic A23187|T109|T195|NON|NRW|NLM (1991)|900308|abbcdef
ENTRY = A 23187
ENTRY = A23187, Antibiotic
MN = D03.438.221.173
PA = Anti-Bacterial Agents
PA = Ionophores
MH_TH = NLM (1975)
ST = T109
ST = T195
N1 = 4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2-oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S(6alpha(2S*,3S*),8beta(R*),9beta,11alpha))-
RN = 52665-69-7
PI = Antibiotics (1973-1974)
PI = Carboxylic Acids (1973-1974)

MeSH in ASCII
MS = An ionophorous, polyether antibiotic from Streptomyces chartreusensis. It binds and transports cations across membranes and uncouples oxidative phosphorylation while inhibiting ATPase of rat liver mitochondria. The substance is used mostly as a biochemical tool to study the role of divalent cations in various biological systems.
OL = use CALCIMYCIN to search A 23187 1975-90
PM = 91; was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
HN = 91(75); was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
MED = *62
MED = 847
M90 = *299
M90 = 2405
M85 = *454
M85 = 2878
M80 = *316
M80 = 1601
M75 = *300
M75 = 823
M66 = *1
M66 = 3
ETC

MeSH in XML - desc2011.dtd
<!-- MeSH DTD file for Descriptor records. desc2011.dtd -->
<!-- Author: MeSH -->
<!-- Effective: 09/01/2010 -->
<!-- #PCDATA: parseable character data = text
     occurrence indicators (default: required, not repeatable):
     ?: zero or one occurrence, i.e., at most one (optional)
     *: zero or more occurrences (optional, repeatable)
     +: one or more occurrences (required, repeatable)
     |: choice, one or the other, but not both -->
<!ENTITY % DescriptorReference "(DescriptorUI, DescriptorName)">
<!ENTITY % normal.date "(Year, Month, Day)">
<!ENTITY % ConceptReference "(ConceptUI,ConceptName,ConceptUMLSUI?)">
<!ENTITY % QualifierReference "(QualifierUI, QualifierName)">
<!ENTITY % TermReference "(TermUI, String)">

MeSH in XML - desc2011.dtd
<!ELEMENT DescriptorRecordSet (DescriptorRecord*)>
<!ATTLIST DescriptorRecordSet LanguageCode (cze|dut|eng|fin|fre|ger|ita|jpn|lav|por|scr|slv|spa) #REQUIRED>
<!ELEMENT DescriptorRecord (%DescriptorReference;,
    DateCreated, DateRevised?, DateEstablished?,
    ActiveMeSHYearList, AllowableQualifiersList?,
    Annotation?, HistoryNote?, OnlineNote?, PublicMeSHNote?,
    PreviousIndexingList?, EntryCombinationList?, SeeRelatedList?,
    ConsiderAlso?, PharmacologicalActionList?, RunningHead?,
    TreeNumberList?, RecordOriginatorsList, ConceptList) >
<!ATTLIST DescriptorRecord DescriptorClass (1 | 2 | 3 | 4) "1">

MeSH in XML - desc2011.dtd
<!ELEMENT ActiveMeSHYearList (Year+)>
<!ELEMENT AllowableQualifiersList (AllowableQualifier+) >
<!ELEMENT AllowableQualifier (QualifierReferredTo,Abbreviation
)>
<!ELEMENT Annotation (#PCDATA)>
<!ELEMENT ConsiderAlso (#PCDATA) >
<!ELEMENT Day (#PCDATA)>
<!ELEMENT DescriptorUI (#PCDATA) >
<!ELEMENT DescriptorName (String) >
<!ELEMENT DateCreated (%normal.date;) >
<!ELEMENT DateRevised (%normal.date;) >
<!ELEMENT DateEstablished (%normal.date;) >
<!ELEMENT DescriptorReferredTo (%DescriptorReference;) >
<!ELEMENT EntryCombinationList (EntryCombination+) >
<!ELEMENT EntryCombination (ECIN, ECOUT)>
<!ELEMENT ECIN (DescriptorReferredTo,QualifierReferredTo) >
<!ELEMENT ECOUT (DescriptorReferredTo,QualifierReferredTo? ) >
<!ELEMENT HistoryNote (#PCDATA)>
<!ELEMENT Month (#PCDATA)>
<!ELEMENT OnlineNote (#PCDATA)>
ETC

MeSH in XML - Sample
<?xml version="1.0"?>
<!DOCTYPE DescriptorRecordSet SYSTEM "desc2011.dtd">
<DescriptorRecordSet LanguageCode = "eng">
  <DescriptorRecord DescriptorClass = "1">
    <DescriptorUI>D000001</DescriptorUI>
    <DescriptorName> <String>Calcimycin</String> </DescriptorName>
    <DateCreated> <Year>1974</Year> <Month>11</Month> <Day>19</Day> </DateCreated>
    <DateRevised> <Year>2006</Year> <Month>07</Month> <Day>05</Day> </DateRevised>
    <DateEstablished> <Year>1984</Year> <Month>01</Month> <Day>01</Day> </DateEstablished>

MeSH in XML - Sample
    <ActiveMeSHYearList>
      <Year>2007</Year> <Year>2008</Year> <Year>2009</Year> <Year>2011</Year>
    </ActiveMeSHYearList>
    <AllowableQualifiersList>
      <AllowableQualifier>
        <QualifierReferredTo>
          <QualifierUI>Q000008</QualifierUI>
          <QualifierName> <String>administration &amp; dosage</String> </QualifierName>
        </QualifierReferredTo>
        <Abbreviation>AD</Abbreviation>
      </AllowableQualifier>
      <AllowableQualifier>
        <QualifierReferredTo>
          <QualifierUI>Q000009</QualifierUI>
          <QualifierName> <String>adverse effects</String> </QualifierName>
        </QualifierReferredTo>
        <Abbreviation>AE</Abbreviation>
      </AllowableQualifier>

Unified Medical Language System
  UMLS
  is the acronym; it was developed for the National Library of Medicine
  Disease is a semantic type with around 392 relations (109 semantic relations and 22 other relations).
  Pneumonia is categorized under one semantic type, Disease, but has hundreds of relations.

[Figure: UMLS Concepts, Semantic Types/Relations]

[Figure: ICD9 Respiratory Diseases]

[Figure: ICD10 Respiratory Diseases]

Interesting ICD10 Codes
  ICD9 had 13,000 codes; ICD10 has 68,000!
  W61.12XA: Struck by macaw, initial encounter
  V97.33XA: Sucked into jet engine, initial encounter
  V97.33XD: Sucked into jet engine, subsequent encounter
  V95.42XA: Spacecraft crash injuring occupant
  W22.02XA: Hurt walking into a lamppost

SNOMED-CT
  SNOMED-CT stands for Systematized Nomenclature of Medicine Clinical Terms.
  SNOMED-CT is the result of merging two ontologies: SNOMED-RT and Clinical Terms.
  http://www.ihtsdo.org/snomed-ct/

SNOMED-CT
  Composed of Concepts, Terms, and Relationships
  Precisely Represent Clinical Information Across Scope of Health Care
  Content Coverage Divided into Hierarchies

[Figure: SNOMED Example]

National Drug Codes
  Tracking of Drugs (Prescription and OTC)
    From Submittal Through Approval
    Keeps Track of Many Details on Medication
  Each Drug by Manufacturer has Unique NDC Identifier
  See: http://www.fda.gov/Drugs/InformationOnDrugs/ucm142438.htm
  Searchable Database: http://www.accessdata.fda.gov/scripts/cder/ndc/default.cfm

[Figure: NDC Examples]

XML Models
  Naively there are two models of XML use:
    Data-centric
    Document-centric
  In reality, most XML use is a hybrid of the two
  More important is the database strategy used with XML
    Relational
    Content Management
    Native XML

Data-Centric Model
  Information is generally stored in a relational database
  XML is a transport medium, nothing more
  Irrelevant to the application that data exists as XML for some period of time
  Characteristics:
    Fine-grained data.
    Data relationship is insignificant.
    Need to transfer relational information.
    Means of storing new information.

Document-Centric Model
  When XML is utilized solely as a document
    Example: this presentation in OpenOffice
  The documents, in part or in full, are stored and retrieved
  Does not originate from a relational database
  Document used for human consumption
  Usually information written by hand in a format like PDF or RTF, then converted to XML

Reality: Hybrid Model
  Most documents like a PDF will also contain fine-grained information (last edited date, character set)
  Data from a relational DB may even be a document, or require self-description
  Various database technologies support all models
  Important to understand your data, and choose the most compatible DB technology

XML as Data Exchange Medium
  Widespread Usage Across Computing
  UML Tools have Standardized on XML
    Export Given UML Design to XML Instances
    Track Both Design Data and Graphical Data
  Database Interactions via XML
    Import from XML into a Relational Schema
    Export from a Relational DB into XML Schema and Instances
  Web Services - SOAP, WSDL, and UDDI

Medical Data Model
  Medical data is non-homogeneous
  But,
  there exist general trends in medical data:
    Fine-grained data such as dates, times, images
    Documents and human-generated descriptions and observations
    Human interaction creates semi-structured data
    Ability to transfer information is essential
  Medical data fits into the hybrid model

Data-Centric Comparison
  Advantages:
    Utilizes existing database software (IBM, Oracle, SQL Server)
    Quick (existing DBs are already fast)
    Dual role (not limited only to XML)
    Many even support XQuery
  Disadvantages:
    More configuration (mapping relational -> XML)
    Slower when creating complex XML files due to middle step

Document-Centric Comparison
  Advantages:
    Good integration into workflow
    Document management made easy
    Collaboration, and web publishing
  Disadvantages:
    Not able to extract data from document directly
    Not designed for high availability, high load systems
    Non-uniformity in implementations

Storage Strategy: Relational
  Utilizing a relational database to store XML documents and data is very popular
  In a very data-centric application this approach is intuitive
  Most top tier database applications support XML in some way
    Oracle, SQL Server, IBM, etc...
  Software is highly supported and well developed.

XML Schema Mapping
  Using a relational DB requires mapping XML schema to DB schema.
  Table based:
    Often implemented as a middleware layer
    Schema structure must follow row-column convention
  Object-relational:
    XML is a tree of objects
    Mapped to DB using well established methods
    Natively supported in some DB apps

Storage Strategy: CMS
  CMS: Content Management System
  Used exclusively in document-centric applications
  Various programs allow indexing, storage, manipulation, and publication of XML documents
  Application specific
  Numerous implementations, most recently Open Office and MS Word 2007

Native XML Databases
  Definition:
    "A database that has an XML document as its fundamental unit of (logical) storage and defines a (logical) model for an XML document, as opposed to the data in that document, and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order."

Native XML Databases
  Data types: No support in XML, need a mapping
    Document or database schema can be used
    External user-defined mapping
    Not necessary when only transferring data
  No requirement on underlying medium or implementation
  Two architectures: text-based and model-based

Native: Text-based
  Use any DB
  Rather than mapping schemas, store entire XML documents
  Usually involves saving entire document as a BLOB / Character LOB
  Utilize various text field searches to retrieve info from XML document
    Some DB text search facilities are being made XML-aware
  Speed: Document located on disk favors full or partial document retrieval

Native: Model-based
  Internal object model of document schema
  Store this model in a database
    Relational / object-oriented database
    Proprietary
  Performance similar to chosen DB engine
  Still limited by hierarchy of XML data
    Retrieving all orderid values from hundreds of
    docs is slow
  Support for common XML query languages
    XPath, XQuery, etc...

Summary
  Three main types of data: structured, semistructured, and unstructured
  XML standard
    Tree-structured (hierarchical) data model
  XML documents and the languages for specifying the structure of these documents
  XPath and XQuery languages
    Query XML data
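
To make the XPath ideas from the slides concrete (single slash, double slash, @attributes, qualifier conditions, and the * wildcard), here is a minimal sketch using Python's standard xml.etree.ElementTree module. Two caveats: ElementTree implements only a subset of XPath and no XQuery, and the sample document and its values are made up for illustration, modeled on the shiporder schema shown earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document shaped like the shiporder XSD.
DOC = """<shiporder orderid="889923">
  <orderperson>John Smith</orderperson>
  <shipto>
    <name>Ola Nordmann</name><address>Langgt 23</address>
    <city>Stavanger</city><country>Norway</country>
  </shipto>
  <item><title>Empire Burlesque</title><quantity>1</quantity><price>10.90</price></item>
  <item><title>Hide your heart</title><quantity>2</quantity><price>9.90</price></item>
</shiporder>"""

root = ET.fromstring(DOC)  # root is the <shiporder> element

# Single-slash steps: child of child, relative to the root element.
titles = [t.text for t in root.findall("item/title")]

# Double slash: any descendant named <name>, at any depth.
names = [n.text for n in root.findall(".//name")]

# Attribute (written @orderid in XPath) is read with .get() in ElementTree.
order_id = root.get("orderid")

# Qualifier condition: only items whose <quantity> child has the text "2".
bulk = [i.findtext("title") for i in root.findall("item[quantity='2']")]

# Wildcard: every direct child element of the root.
child_tags = [c.tag for c in root.findall("*")]

print(titles, names, order_id, bulk, child_tags)
```

Note how each path both selects the pattern and determines what is returned, which is the restriction of XPath that XQuery's separate where and return clauses address.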