XML in a SAS World
Mike Molter
d-Wise Technologies
www.d-Wise.com

Background
• Author: Mike Molter
• Company: d-Wise
• Committees: CDISC XML Technologies, PhUSE Working Group Best Practices
• Reason for presentation: Increasing prevalence of XML in our industry

Agenda
• What is XML?
• Comparison to HTML
• Purpose and use
• Examples of XML standards (schemas)
• Tools for working with XML (SAS and non-SAS)
• XML in the pharmaceutical industry

HTML
• Hypertext Markup Language
• Language of the web
• Provides instructions to web browsers for displaying content
• Pre-defined elements

    <html>
    <table>
      <tr>
        <th>Team</th>
        <th>Conference</th>
        <th>Division</th>
      </tr>
      <tr>
        <td>Red Wings</td>
        <td>Eastern</td>
        <td>Atlantic</td>
      </tr>
    </table>
    </html>

What is XML?
• eXtensible Markup Language
• A data container - used for structure, storage, and transport of data (w3schools.com)
• Like any other computer language…
    • textual gibberish
    • set of rules (structural, syntax)
    • vocabulary
        • elements
        • attributes
        • tags
        • schemas
• Unlike other computer languages…
    • no pre-defined elements (no keywords)
    • no processor

    <MyFamily>
      <Self eyecolor="brown" sex="M" htft="6" htin="1">Mike</Self>
      <Spouse eyecolor="hazel" sex="F" htft="5" htin="8" seqno="1">Teresa</Spouse>
      <Children>
        <Child eyecolor="hazel" sex="F" htft="5" htin="9" seqno="1" momseq="1">Lauren</Child>
        <Child eyecolor="brown" sex="M" htft="6" htin="0" seqno="2" momseq="1">Ryan</Child>
      </Children>
      <Pet type="dog" sex="F" color1="black" breed1="lab" breed2="unknown" seqno="1">Sydney</Pet>
    </MyFamily>

What is XML?

    <nhl>
      <team name="Red Wings">
        <conference>Eastern</conference>
        <division>Atlantic</division>
        <location>Detroit</location>
      </team>
      <team name="Flames">
        <conference>Western</conference>
        <division>Pacific</division>
        <location>Calgary</location>
      </team>
      <team name="Devils">
        <conference>Eastern</conference>
        <division>Metropolitan</division>
        <location>New Jersey</location>
      </team>
    </nhl>

XML Schema
• XML Schema (or Language, or Vocabulary) - A specific set of elements and attributes, along with a set of rules that govern their use
• An XML schema can be a combination of new elements along with other XML schemas (extensible)
• A schema file lays out the rules of an XML language.
• An XML schema language is a computer language in which schema files are written.
    • Examples: DTD, XSD
• An XML validator is a piece of software that uses the schema file to validate an XML file.
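
To make the schema and validator bullets concrete, here is a toy sketch in Python (one non-SAS tool) that checks an NHL document against a hand-written rule table. It is only an illustration under stated assumptions: `RULES` and `validate` are hypothetical names invented for this example, and a real validator reads a DTD or XSD schema file rather than a Python dictionary.

```python
import xml.etree.ElementTree as ET

NHL_XML = """<nhl>
  <team name="Red Wings">
    <conference>Eastern</conference>
    <division>Atlantic</division>
    <location>Detroit</location>
  </team>
</nhl>"""

# Hypothetical rule table standing in for a schema file:
# element name -> (required attributes, allowed child elements)
RULES = {
    "nhl": (set(), {"team"}),
    "team": ({"name"}, {"conference", "division", "location"}),
    "conference": (set(), set()),
    "division": (set(), set()),
    "location": (set(), set()),
}


def validate(xml_text):
    """Return a list of rule violations; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # A document that is not well-formed cannot be checked any further.
        return [f"not well-formed: {err}"]
    errors = []
    for elem in root.iter():  # visits the root and every descendant
        if elem.tag not in RULES:
            errors.append(f"unknown element <{elem.tag}>")
            continue
        required_attrs, allowed_children = RULES[elem.tag]
        for attr in sorted(required_attrs - set(elem.attrib)):
            errors.append(f"<{elem.tag}> is missing attribute '{attr}'")
        for child in elem:
            if child.tag not in allowed_children:
                errors.append(f"<{child.tag}> not allowed inside <{elem.tag}>")
    return errors


print(validate(NHL_XML))
print(validate("<nhl><team><city>Detroit</city></team></nhl>"))
```

The first call reports no violations; the second reports the missing `name` attribute and the unexpected `<city>` element. The point is the division of labor: the rules live in one place (the schema), and a general-purpose program (the validator) applies them to any document.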

XML Language Examples
• NHL (Ok, I made this one up)
• XSL (eXtensible Stylesheet Language, .xsl)
    • Transforms XML into something else
• XML Schema Definition (.xsd)
    • Validates an XML document
• XML Spreadsheet 2003 (.xml)
    • Read and displayed by Excel
• ODM, Define, Dataset-XML, Analysis Results Metadata, OpenCDISC
    • Clinical Trials data, metadata

Exporting XML
[Figure: Teams.sas7bdat]

Exporting XML with a DATA step

    filename xmlout4 'C:\teams_datastep.xml' ;
    data _null_ ;
      file xmlout4 ;
      set teams end=thatsit ;
      /* +(-1) backs up over the blank that list-style PUT writes after each variable value */
      if _n_ eq 1 then put '<nhl>' ;
      put '<team name="' name +(-1) '">' ;
      put '<conference>' conference +(-1) '</conference>' ;
      put '<division>' division +(-1) '</division>' ;
      put '<location>' location +(-1) '</location>' ;
      put '</team>' ;
      if thatsit then put '</nhl>' ;
    run;

Exporting XML with the LIBNAME statement

    libname xmlout xml 'C:\teams_generic.xml' ;
    data xmlout.xteams ;
      set teams ;
    run;

Exporting XML with the LIBNAME statement

    libname xmlout xml 'C:\teams_oracle.xml' xmltype=oracle ;
    data xmlout.xteams ;
      set teams ;
    run;

Exporting XML with the LIBNAME statement or ODS using tagsets

    libname xmlout xml 'C:\teams_tagset_libname.xml' tagset=<tagset-name> ;
    data xmlout.xteams ;
      set teams ;
    run;

    ods markup tagset=<tagset-name> file='C:\teams_tagset_ods.xml';
    proc print noobs data=teams ;
    run;
    ods markup close ;

Exporting XML with ODS using SAS's ExcelXP tagset

    ods markup tagset=excelxp file='C:\teams_excel.xml';
    proc print noobs data=teams ;
    run;
    ods markup close ;

Importing XML

Export

    libname xmlout xml 'C:\teams_generic.xml' ;
    data xmlout.xteams ;
      set teams ;
    run;

Import

    data sasteams ;
      set xmlout.xteams ;
    run;

NHL.XML

    <nhl>
      <team name="Red Wings">
        <conference>Eastern</conference>
        <division>Atlantic</division>
        <location>Detroit</location>
      </team>
      <team name="Flames">
        <conference>Western</conference>
        <division>Pacific</division>
        <location>Calgary</location>
      </team>
      <team name="Devils">
        <conference>Eastern</conference>
        <division>Metropolitan</division>
        <location>New Jersey</location>
      </team>
    </nhl>

    libname xmlin xml 'C:\teams_nhl.xml' ;
    data sasteam ;
      set xmlin.team ;
    run;

[Figure: SASTEAM.SAS7BDAT]

XML in Pharma
• Operational Data Model (ODM)
    • Collected clinical trial data, metadata, administrative data, reference data, audit information
• Define-XML
    • Metadata for submitted data in ODM structure
    • Value-level metadata is in the define extension
• Dataset-XML
    • Submission data in ODM structure

XML in Pharma
• Analysis Results Metadata
    • Metadata that describes the methods used for arriving at the results
• OpenCDISC
    • Extension of Define-XML
    • Describes validation checks applicable to each domain

ODM Conventions
• item
    • common element prefix
    • represents a variable
• def
    • common element suffix
    • represents a definition
• ref
    • common element suffix
    • represents a reference to a def
• oid
    • common attribute suffix
    • object identifier
    • represents a link to another part of the document

ODM
[Figures: ODM document sections labeled Clinical Data, ItemGroup (dataset-level) Metadata, Item (variable-level) Metadata, and Codelist Metadata (allowable values)]

Define-XML
[Figure: Define-XML]

Importing XML with an XML map
• XMLMap is an XML schema
• Provides instructions to the XML LIBNAME engine for reading XML
    • Name and Label for the data set
    • Which XML elements define observations
    • How to define variables (attributes and values)
• Uses XPath syntax to navigate the XML document and identify its components

    filename mymap 'C:\mymap.map' ;
    libname xmlin xml 'C:\nhl.xml' xmlmap=mymap;
    data sasteams ;
      /* the member name must match the TABLE name in the map */
      set xmlin.SASTeams ;
    run;

Importing XML with an XML map

    <?xml version="1.0" encoding="UTF-8"?>
    <SXLEMAP version="1.2">
      <!-- Name of data set to be created -->
      <TABLE name="SASTeams">
        <!-- Observation boundary -->
        <TABLE-PATH syntax="XPath">/nhl/team</TABLE-PATH>
        <!-- Variable definition (from an attribute) -->
        <COLUMN name="name">
          <PATH syntax="XPath">/nhl/team/@name</PATH>
          <TYPE>character</TYPE>
          <DATATYPE>string</DATATYPE>
          <LENGTH>20</LENGTH>
        </COLUMN>
        <!-- Variable definition (from an element value) -->
        <COLUMN name="conference">
          <PATH syntax="XPath">/nhl/team/conference</PATH>
          <TYPE>character</TYPE>
          <DATATYPE>string</DATATYPE>
          <LENGTH>20</LENGTH>
        </COLUMN>
      </TABLE>
    </SXLEMAP>

XML Mapper
[Figure: XML Mapper]

Extensible Stylesheet Language (XSL)
• XSLT - XSL Transformations - transforms XML into something else
• XSL is an XML schema
• An XSL processor reads through an XML document and generates text according to instructions in the stylesheet
• XSL processors:
    • SAS (PROC XSL)
    • Internet Explorer

Extensible Stylesheet Language (XSL)
SAS's PROC XSL creates an output file, given an input file and a stylesheet

    filename inxml 'C:\mysubmission\define.xml' ;
    filename outhtml 'C:\mysubmission\define.html' ;
    filename xslss 'C:\mysubmission\define.xsl' ;
    proc xsl in=inxml out=outhtml xsl=xslss ;
    run;

Extensible Stylesheet Language (XSL)
Internet Explorer renders XML as HTML

Define.xml via text editor:

    <?xml-stylesheet type="text/xsl" href="define.xsl"?>

Define.xml via Internet Explorer:
[Figure: Define.xml displayed in Internet Explorer]

HTML generated by XSL:

    <caption>Tabulation Datasets for Study CDISC01 (SDTM-IG 3.1.2)</caption>

Extensible Stylesheet Language (XSL)

Generated HTML:

    <caption>Tabulation Datasets for Study CDISC01 (SDTM-IG 3.1.2)</caption>

Stylesheet source that generates it:

    <caption>
      <xsl:value-of select="$g_ItemGroupDefPurpose"/> Datasets for Study
      <xsl:value-of select="/odm:ODM/odm:Study/odm:GlobalVariables/odm:StudyName"/> (
      <xsl:value-of select="$g_StandardName"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="$g_StandardVersion"/> )</caption>

Clinical Standards Toolkit (CST)
• A Base SAS framework for executing clinical data tasks such as verification of data compliance against standards and importing/exporting ODM and Define.xml.
• Contains all necessary files (SAS macros and driver programs, maps, property files, XSL stylesheets)
• Learning curve

Clinical Standards Toolkit (CST)
…or PROC XSL

References
• Using the SAS Clinical Standards Toolkit 1.5 to Import CDISC ODM Files, Lex Jansen, PharmaSUG 2013
• Using the SAS Clinical Standards Toolkit for Define.xml Creation, Lex Jansen, PharmaSUG 2011
• Accessing the Metadata from the Define.xml Using XSLT Transformation, Lex Jansen, PhUSE 2010

References
• A SAS Programmer's Guide to Generating Define.xml, Mike Molter, SAS Global Forum 2009

    ods markup tagset=mydefine file='define.xml' ;
    proc print noobs data=meta-dataset1; run;
    proc print noobs data=meta-dataset2; run;
    proc print noobs data=meta-dataset3; run;
    etc
    ods markup close ;

Other Resources
• LinkedIn Groups
    • CDISC XML Technologies
    • CDISC Define-XML
    • CDISC Dataset-XML
    • CDISC-SDTM Experts
• wiki.cdisc.org
• http://www.cdisc.org

In Summary…
• Options for Exporting XML
    • XML LIBNAME engine (XMLTYPE=, TAGSET= options)
    • ODS (SAS XML destinations or user-defined tagsets)
    • DATA step
    • XSL stylesheets
    • CST (clinical)
• Options for Importing XML
    • XML LIBNAME engine (XMLTYPE=, TAGSET= options)
    • XML maps
    • XSL stylesheets
    • CST (clinical)

In Summary…
So what do I need to know???
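
One thing worth knowing is how little machinery a path-based import really needs. The sketch below, in Python's standard library rather than SAS, reads the NHL document into rows using the same idea as an XML map: one path marks the observation boundary and one path per variable says where its value lives. `TABLE_PATH`, `COLUMNS`, and `read_table` are hypothetical names chosen for this illustration, the paths are written relative to the root element because that is what `ElementTree` expects, and none of this is meant to describe how the SAS XML LIBNAME engine is implemented.

```python
import xml.etree.ElementTree as ET

NHL_XML = """<nhl>
  <team name="Red Wings">
    <conference>Eastern</conference>
    <division>Atlantic</division>
    <location>Detroit</location>
  </team>
  <team name="Flames">
    <conference>Western</conference>
    <division>Pacific</division>
    <location>Calgary</location>
  </team>
  <team name="Devils">
    <conference>Eastern</conference>
    <division>Metropolitan</division>
    <location>New Jersey</location>
  </team>
</nhl>"""

# Observation boundary, relative to the root <nhl> (the map's /nhl/team).
TABLE_PATH = "team"

# Variable name -> where its value lives, relative to each <team>.
# A leading "@" marks an attribute, mirroring /nhl/team/@name in the map.
COLUMNS = {
    "name": "@name",
    "conference": "conference",
    "division": "division",
    "location": "location",
}


def read_table(xml_text):
    """Return one dictionary per observation, keyed by variable name."""
    root = ET.fromstring(xml_text)
    rows = []
    for obs in root.findall(TABLE_PATH):
        row = {}
        for column, path in COLUMNS.items():
            if path.startswith("@"):
                row[column] = obs.get(path[1:])    # attribute value
            else:
                row[column] = obs.findtext(path)   # child element text
        rows.append(row)
    return rows


teams = read_table(NHL_XML)
for team in teams:
    print(team)
```

Each printed dictionary corresponds to one observation in the imported data set, with the `name` variable taken from an attribute and the other three taken from element values.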