XML in a SAS World
Mike Molter
d-Wise Technologies
www.d-Wise.com
Background
• Author: Mike Molter
• Company: d-Wise
• Committees: CDISC XML Technologies, PhUSE Working Group Best Practices
• Reason for presentation: Increasing prevalence of XML in our industry
Agenda
• What is XML?
• Comparison to HTML
• Purpose and use
• Examples of XML standards (schemas)
• Tools for working with XML (SAS and non-SAS)
• XML in the pharmaceutical industry
HTML
• Hypertext Markup Language
• Language of the web
• Provides instructions to web browsers for displaying content
• Pre-defined elements
<html>
<table>
<tr>
<th>Team</th>
<th>Conference</th>
<th>Division</th>
</tr>
<tr>
<td>Red Wings</td>
<td>Eastern</td>
<td>Atlantic</td>
</tr>
</table>
</html>
What is XML?
• eXtensible Markup Language
• A data container - used for structure, storage, and transport of data (w3schools.com)
• Like any other computer language…
  • textual gibberish
  • set of rules (structural, syntax)
  • vocabulary
    • elements
    • attributes
    • tags
    • schemas
• Unlike other computer languages…
  • no pre-defined elements (no keywords)
  • no processor
<MyFamily>
<Self eyecolor="brown" sex="M" htft="6"
htin="1">Mike</Self>
<Spouse eyecolor="hazel" sex="F" htft="5" htin="8"
seqno="1">Teresa</Spouse>
<Children>
<Child eyecolor="hazel" sex="F" htft="5" htin="9"
seqno="1" momseq="1">Lauren</Child>
<Child eyecolor="brown" sex="M" htft="6" htin="0"
seqno="2" momseq="1">Ryan</Child>
</Children>
<Pet type="dog" sex="F" color1="black" breed1="lab"
breed2="unknown" seqno="1">Sydney</Pet>
</MyFamily>
What is XML?
<nhl>
<team name="Red Wings">
<conference>Eastern</conference>
<division>Atlantic</division>
<location>Detroit</location>
</team>
<team name="Flames">
<conference>Western</conference>
<division>Pacific</division>
<location>Calgary</location>
</team>
<team name="Devils">
<conference>Eastern</conference>
<division>Metropolitan</division>
<location>New Jersey</location>
</team>
</nhl>
XML Schema
• XML Schema (or Language, or Vocabulary) - A specific set of
elements and attributes, along with a set of rules that govern
their use
• An XML schema can be a combination of new elements along
with other XML schemas (extensible)
• A schema file lays out the rules of an XML language.
• An XML schema language is a computer language in which
schema files are written.
• Examples: DTD, XSD
• An XML validator is a piece of software that uses the schema
file to validate an XML file.
XML Language Examples
• NHL (Ok, I made this one up)
• XSL (eXtensible Stylesheet Language, .xsl)
• Transforms XML into something else
• XML Schema Definition (.xsd)
• Validates an XML document
• XML Spreadsheet 2003 (.xml)
• Read and displayed by Excel
• ODM, Define, Dataset-XML, Analysis Results Metadata, OpenCDISC
• Clinical Trials data, metadata
Exporting XML
Teams.sas7bdat
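The TEAMS data set appears on the slide only as a picture. A minimal sketch that recreates it is shown here, assuming four character variables (name, conference, division, location, the names used in the DATA step export code), an assumed length of 20, and only the three teams that appear in the NHL example. The real data set may have more rows or different attributes.

```sas
/* Sketch only: variable lengths and the three rows are assumptions */
data teams ;
  length name conference division location $20 ;
  infile datalines dsd ;   /* DSD: comma-delimited, embedded blanks are kept */
  input name conference division location ;
  datalines ;
Red Wings,Eastern,Atlantic,Detroit
Flames,Western,Pacific,Calgary
Devils,Eastern,Metropolitan,New Jersey
;
run;

/* Quick look at the result */
proc print data=teams noobs ;
run;
```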
Exporting XML with a DATA step
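filename xmlout4 'C:\teams_datastep.xml' ;
data _null_ ;
  file xmlout4 ;
  set teams end=thatsit ;
  /* Open the root element before the first observation */
  if _n_ eq 1 then put '<nhl>' ;
  /* List output writes a blank after each variable value; */
  /* +(-1) backs the pointer up so the blank is overwritten */
  put '<team name="' name +(-1) '">' ;
  put '<conference>' conference +(-1) '</conference>' ;
  put '<division>' division +(-1) '</division>' ;
  put '<location>' location +(-1) '</location>' ;
  put '</team>' ;
  /* Close the root element after the last observation */
  if thatsit then put '</nhl>' ;
run;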
Exporting XML with the LIBNAME statement
libname xmlout xml 'C:\teams_generic.xml' ;
data xmlout.xteams ;
set teams ;
run;
Exporting XML with the LIBNAME statement
libname xmlout xml 'C:\teams_oracle.xml' xmltype=oracle ;
data xmlout.xteams ;
set teams ;
run;
Exporting XML with the LIBNAME statement or ODS using tagsets
libname xmlout xml 'C:\teams_tagset_libname.xml'
tagset=<tagset-name> ;
data xmlout.xteams ;
set teams ;
run;
ods markup tagset=<tagset-name>
file='C:\teams_tagset_ods.xml';
proc print noobs data=teams ;
run;
ods markup close ;
Exporting XML with ODS using SAS's ExcelXP tagset
ods markup tagset=excelxp file='C:\teams_excel.xml';
proc print noobs data=teams ;
run;
ods markup close ;
Importing XML
Export
libname xmlout xml 'C:\teams_generic.xml' ;
data xmlout.xteams ;
set teams ;
run;
Import
data sasteams ;
set xmlout.xteams ;
run;
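One way to check the export/import round trip is to compare the imported data set with the original. The sketch below is self-contained and makes several assumptions that are not in the slides: the XML file is written to the WORK directory rather than to C:\, and TEAMS holds just the three teams from the NHL example.

```sas
/* Assumed sample data: the three teams from the NHL example */
data teams ;
  length name conference division location $20 ;
  infile datalines dsd ;
  input name conference division location ;
  datalines ;
Red Wings,Eastern,Atlantic,Detroit
Flames,Western,Pacific,Calgary
Devils,Eastern,Metropolitan,New Jersey
;
run;

/* Export: write generic XML into the WORK directory (assumed location) */
libname xmlout xml "%sysfunc(pathname(work))/teams_generic.xml" ;

data xmlout.xteams ;
  set teams ;
run;

/* Import: read the same file back through the same libref */
data sasteams ;
  set xmlout.xteams ;
run;

libname xmlout clear ;

/* Report any differences in values or attributes */
proc compare base=teams compare=sasteams ;
run;
```

PROC COMPARE may report attribute differences (variable lengths, for example) even when every value survives the trip, so the report is worth reading rather than only checking for errors.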
NHL.XML
<nhl>
<team name="Red Wings">
<conference>Eastern</conference>
<division>Atlantic</division>
<location>Detroit</location>
</team>
<team name="Flames">
<conference>Western</conference>
<division>Pacific</division>
<location>Calgary</location>
</team>
<team name="Devils">
<conference>Eastern</conference>
<division>Metropolitan</division>
<location>New Jersey</location>
</team>
</nhl>
libname xmlin xml 'C:\teams_nhl.xml' ;
data sasteam ;
set xmlin.team ;
run;
SASTEAM.SAS7BDAT
XML in Pharma
• Operational Data Model (ODM)
• Collected clinical trial data, metadata, administrative
data, reference data, audit information
• Define-XML
• Metadata for submitted data in ODM structure
• Value-level metadata is in the define extension
• Dataset-XML
• Submission data in ODM structure
XML in Pharma
• Analysis Results Metadata
• Metadata that describes the methods used for arriving
at the results
• OpenCDISC
• Extension of Define-XML
• Describes validation checks applicable to each domain
ODM Conventions
• item
• common element prefix
• represents a variable
• def
• common element suffix
• represents a definition
• ref
• common element suffix
• represents a reference to a def
• oid
• common attribute suffix
• object identifier
• represents a link to another part of the document
ODM
[Figure: ODM example with callouts for Clinical Data, ItemGroup (dataset-level) Metadata, Item (variable-level) Metadata, and Codelist Metadata (allowable values)]
Define-XML
Importing XML with an XML map
• XMLMap is an XML schema
• Provides instructions to the XML LIBNAME engine for reading XML
• Name and Label for the data set
• Which XML elements define observations
• How to define variables (attributes and values)
• Uses XPath syntax to navigate the XML document and identify its
components
filename mymap 'C:\mymap.map' ;
libname xmlin xml 'C:\nhl.xml' xmlmap=mymap;
data sasteams ;
set xmlin.sasteams ; /* member name comes from TABLE name="SASTeams" in the map */
run;
Importing XML with an XML map
<?xml version="1.0" encoding="UTF-8"?>
<SXLEMAP version="1.2">
  <!-- Name of data set to be created -->
  <TABLE name="SASTeams">
    <!-- Observation boundary -->
    <TABLE-PATH syntax="XPath">/nhl/team</TABLE-PATH>
    <!-- Variable definition: value read from an attribute -->
    <COLUMN name="name">
      <PATH syntax="XPath">/nhl/team/@name</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>20</LENGTH>
    </COLUMN>
    <!-- Variable definition: value read from element content -->
    <COLUMN name="conference">
      <PATH syntax="XPath">/nhl/team/conference</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>20</LENGTH>
    </COLUMN>
  </TABLE>
</SXLEMAP>
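To try the map without preparing any files by hand, the following sketch writes a one-team NHL document and a two-column map into the WORK directory and then reads the document through the map. The WORK-directory paths, the filerefs nhlxml and mymap, and the ACCESS=READONLY option are choices made for this illustration, not part of the original slides.

```sas
/* Assumed locations: both files go to the WORK directory */
%let dir = %sysfunc(pathname(work)) ;
filename nhlxml "&dir/nhl.xml" ;
filename mymap  "&dir/nhl.map" ;

/* Write a one-team version of the NHL document */
data _null_ ;
  file nhlxml ;
  put '<nhl>' ;
  put '<team name="Red Wings">' ;
  put '<conference>Eastern</conference>' ;
  put '<division>Atlantic</division>' ;
  put '<location>Detroit</location>' ;
  put '</team>' ;
  put '</nhl>' ;
run;

/* Write the XMLMap: one column from an attribute, one from element content */
data _null_ ;
  file mymap ;
  put '<?xml version="1.0" encoding="UTF-8"?>' ;
  put '<SXLEMAP version="1.2">' ;
  put '<TABLE name="SASTeams">' ;
  put '<TABLE-PATH syntax="XPath">/nhl/team</TABLE-PATH>' ;
  put '<COLUMN name="name">' ;
  put '<PATH syntax="XPath">/nhl/team/@name</PATH>' ;
  put '<TYPE>character</TYPE>' ;
  put '<DATATYPE>string</DATATYPE>' ;
  put '<LENGTH>20</LENGTH>' ;
  put '</COLUMN>' ;
  put '<COLUMN name="conference">' ;
  put '<PATH syntax="XPath">/nhl/team/conference</PATH>' ;
  put '<TYPE>character</TYPE>' ;
  put '<DATATYPE>string</DATATYPE>' ;
  put '<LENGTH>20</LENGTH>' ;
  put '</COLUMN>' ;
  put '</TABLE>' ;
  put '</SXLEMAP>' ;
run;

/* Read the document through the map; the member name is the TABLE name */
libname xmlin xml "&dir/nhl.xml" xmlmap=mymap access=readonly ;

proc print data=xmlin.SASTeams noobs ;
run;

libname xmlin clear ;
```

Adding division and location would follow the same pattern: one COLUMN element per variable, each with its own PATH.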
XML Mapper
Extensible Stylesheet Language (XSL)
• XSLT - XSL Transformations - transforms XML into
something else
• XSL is an XML schema
• An XSL processor reads through an XML document and
generates text according to instructions in the stylesheet
• XSL processors:
• SAS (PROC XSL)
• Internet Explorer
Extensible Stylesheet Language (XSL)
SAS's PROC XSL creates an output file,
given an input file and a stylesheet
filename inxml 'C:\mysubmission\define.xml' ;
filename outhtml 'C:\mysubmission\define.html' ;
filename xslss 'C:\mysubmission\define.xsl' ;
proc xsl in=inxml out=outhtml xsl=xslss ; run;
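The same three-fileref pattern can be tried on a small scale. The sketch below is an illustration only: it writes a one-team NHL document and a short XSLT 1.0 stylesheet to the WORK directory, then runs PROC XSL to produce an HTML table like the one on the HTML slide. The file names, the filerefs, and the stylesheet itself are assumptions made for this example and are unrelated to the define.xsl stylesheet used above.

```sas
/* Assumed locations: all three files go to the WORK directory */
%let dir = %sysfunc(pathname(work)) ;
filename nhlxml  "&dir/nhl.xml" ;
filename nhlxsl  "&dir/nhl.xsl" ;
filename nhlhtml "&dir/nhl.html" ;

/* Input: a one-team version of the NHL document */
data _null_ ;
  file nhlxml ;
  put '<nhl>' ;
  put '<team name="Red Wings">' ;
  put '<conference>Eastern</conference>' ;
  put '<division>Atlantic</division>' ;
  put '<location>Detroit</location>' ;
  put '</team>' ;
  put '</nhl>' ;
run;

/* Stylesheet: one HTML table row per team element */
data _null_ ;
  file nhlxsl ;
  put '<?xml version="1.0" encoding="UTF-8"?>' ;
  put '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">' ;
  put '<xsl:output method="html"/>' ;
  put '<xsl:template match="/nhl">' ;
  put '<html><table>' ;
  put '<tr><th>Team</th><th>Conference</th><th>Division</th></tr>' ;
  put '<xsl:for-each select="team">' ;
  put '<tr>' ;
  put '<td><xsl:value-of select="@name"/></td>' ;
  put '<td><xsl:value-of select="conference"/></td>' ;
  put '<td><xsl:value-of select="division"/></td>' ;
  put '</tr>' ;
  put '</xsl:for-each>' ;
  put '</table></html>' ;
  put '</xsl:template>' ;
  put '</xsl:stylesheet>' ;
run;

/* Transform: input file + stylesheet -> output file */
proc xsl in=nhlxml out=nhlhtml xsl=nhlxsl ;
run;
```

Opening nhl.html from the WORK directory in a browser should show the table; the WORK directory is deleted when the SAS session ends, so copy the file elsewhere to keep it.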
Extensible Stylesheet Language (XSL)
Internet Explorer renders XML as HTML
• Define.xml via text editor (stylesheet reference near the top of the file):
<?xml-stylesheet type="text/xsl" href="define.xsl"?>
• Define.xml via Internet Explorer (HTML generated by XSL):
<caption>Tabulation Datasets for Study CDISC01 (SDTM-IG 3.1.2)</caption>
Extensible Stylesheet Language (XSL)
<caption>Tabulation Datasets for Study CDISC01 (SDTM-IG 3.1.2)</caption>
<caption>
<xsl:value-of select="$g_ItemGroupDefPurpose"/>
Datasets for Study
<xsl:value-of
select="/odm:ODM/odm:Study/odm:GlobalVariables/odm:StudyName"/>
(
<xsl:value-of select="$g_StandardName"/>
<xsl:text> </xsl:text>
<xsl:value-of select="$g_StandardVersion"/>
)</caption>
Clinical Standards Toolkit (CST)
• A Base SAS framework for executing clinical data tasks
such as verification of data compliance against standards
and importing/exporting ODM and Define.xml.
• Contains all necessary files (SAS macros and driver
programs, maps, property files, XSL stylesheets)
• Learning curve
Clinical Standards Toolkit (CST)
…or PROC XSL
References
• Using the SAS Clinical Standards Toolkit 1.5 to Import CDISC ODM Files, Lex Jansen, PharmaSUG 2013
• Using the SAS Clinical Standards Toolkit for Define.xml Creation, Lex Jansen, PharmaSUG 2011
• Accessing the Metadata from the Define.xml Using XSLT Transformation, Lex Jansen, PhUSE 2010
References
• A SAS Programmer's Guide to Generating Define.xml, Mike Molter, SAS Global Forum 2009
ods markup tagset=mydefine file='define.xml' ;
proc print noobs data=meta-dataset1; run;
proc print noobs data=meta-dataset2; run;
proc print noobs data=meta-dataset3; run;
etc
ods markup close ;
Other Resources
• LinkedIn Groups
• CDISC XML Technologies
• CDISC Define-XML
• CDISC Dataset-XML
• CDISC-SDTM Experts
• wiki.cdisc.org
• http://www.cdisc.org
In Summary…
• Options for Exporting XML
• XML LIBNAME engine (XMLTYPE=, TAGSET= options)
• ODS (SAS XML destinations or user-defined tagsets)
• DATA step
• XSL stylesheets
• CST (clinical)
• Options for Importing XML
• XML LIBNAME engine (XMLTYPE=, TAGSET= options)
• XML maps
• XSL stylesheets
• CST (clinical)
In Summary…
So what do I need to know???