Improving the output capabilities of Stata with Open Document Format XML

Adam Jacobs
Dianthus Medical Limited

Stata’s 3-fold capabilities
- Statistics
- Graphics
- Data management

But there is a 4th...
Text output
A recent clinical study:
- 92 pages of raw data listings
- 124 pages of descriptive data tabulations
- 3 pages of statistical analysis
All from a study in 12 healthy volunteers

Stata’s text output

Problems with Stata’s text output
- No pagination
- No formatting (or limited formatting with SMCL)
- Variable labels not always shown
- No Unicode support
- No tables of contents
- etc etc

Some examples...

So how did I do it?

Open Document Format
- An open standard, approved by ISO
- XML based
- For a variety of office-type documents
- Used by the popular open-source office suite OpenOffice.org
- Here, we are just interested in word-processing documents

.odt files
- A .odt file is the native file format of OpenOffice.org Writer
- A zip file
- Contains various files, the most important of which is content.xml
- content.xml is simply a plain-text file
- Stata is good at writing plain-text files!

The Stata code
- Creates the content.xml file by writing data with appropriate XML tags
- Added to other files, zipped to .odt file
- .odt file can be opened directly with Writer

Some examples...
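The packaging step described above (write content.xml as plain text, add it to the other files, zip the result to .odt) can be sketched in a few lines. This is a minimal illustration in Python, not the author's Stata code: the function names `build_content` and `write_odt` are hypothetical, and the package is reduced to three entries (mimetype, META-INF/manifest.xml, content.xml). A real document would normally also carry styles.xml and meta.xml.

```python
import zipfile
from xml.sax.saxutils import escape

# Media type for ODF text documents; stored in the "mimetype" entry.
MIMETYPE = "application/vnd.oasis.opendocument.text"

# Manifest listing the files in the package (hypothetical minimal version).
MANIFEST = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<manifest:manifest '
    'xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" '
    'manifest:version="1.2">'
    '<manifest:file-entry manifest:full-path="/" '
    f'manifest:media-type="{MIMETYPE}"/>'
    '<manifest:file-entry manifest:full-path="content.xml" '
    'manifest:media-type="text/xml"/>'
    '</manifest:manifest>\n'
)


def build_content(paragraphs):
    """Return content.xml as a string, one <text:p> per input paragraph."""
    # escape() turns &, < and > into entities so the data cannot break the XML.
    body = "".join(f"<text:p>{escape(p)}</text:p>" for p in paragraphs)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<office:document-content '
        'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
        'office:version="1.2">'
        f'<office:body><office:text>{body}</office:text></office:body>'
        '</office:document-content>\n'
    )


def write_odt(path, paragraphs):
    """Zip the three package entries into an .odt file at `path`."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as odt:
        # The mimetype entry goes first and is stored uncompressed.
        odt.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        odt.writestr("META-INF/manifest.xml", MANIFEST)
        odt.writestr("content.xml", build_content(paragraphs))


if __name__ == "__main__":
    write_odt("demo.odt", ["Mileage (mpg)", "N = 52"])
```

Only `build_content` depends on the data. Everything else in the package is fixed text, which is why a tool that is good at writing plain-text files is enough to produce the document.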
Basics of XML

    <company name="Dianthus Medical Limited">
      <employee role="speaker">
        <firstname>Adam</firstname>
        <lastname>Jacobs</lastname>
      </employee>
      <employee role="delegate">
        <firstname>Flavia</firstname>
        <lastname>White</lastname>
      </employee>
    </company>

XML code for start of table

    <table:table table:style-name="Table42">
      <table:table-column table:style-name="TabCol13"/>
      <table:table-column table:style-name="TabCol9"/>
      <table:table-column table:style-name="TabCol8"/>
      <table:table-column table:style-name="TabCol8"/>

XML code for table cells

    <table:table-cell table:style-name="cell1211">
      <text:p text:style-name="Table_20_Contents">Mileage (mpg)</text:p>
    </table:table-cell>
    <table:table-cell table:style-name="cell1111">
      <text:p text:style-name="Table_20_Contents">N</text:p>
    </table:table-cell>
    <table:table-cell table:style-name="cell1111">
      <text:p text:style-name="Table_20_ContentsNumeric">52<text:s text:c="3"/></text:p>
    </table:table-cell>

Was this a lot of work?
- 123 kB of code
- 21 ado files
- 45 Mata functions
- And not finished yet!

Any questions?
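The table-cell markup shown earlier is regular enough to generate from data. The sketch below, in Python rather than the author's Stata and Mata code, builds the same cells as strings. The helper names `text_p`, `table_cell` and `table_row` are hypothetical. The style names are copied from the slides. The `table:table-row` wrapper does not appear on the slides and is taken from the ODF specification. The `pad` argument emits `<text:s text:c="n"/>`, which stands for n space characters; presumably the slides use it to line up numeric columns.

```python
from xml.sax.saxutils import escape, quoteattr


def text_p(text, style, pad=0):
    """Return a <text:p> element, optionally followed by `pad` spaces."""
    # <text:s text:c="n"/> represents n consecutive space characters.
    spacer = f'<text:s text:c="{pad}"/>' if pad > 0 else ""
    return (
        f"<text:p text:style-name={quoteattr(style)}>"
        f"{escape(text)}{spacer}</text:p>"
    )


def table_cell(text, cell_style, para_style="Table_20_Contents", pad=0):
    """Return one <table:table-cell> holding a single paragraph."""
    return (
        f"<table:table-cell table:style-name={quoteattr(cell_style)}>"
        f"{text_p(text, para_style, pad)}"
        "</table:table-cell>"
    )


def table_row(cells):
    """Wrap already-built cell strings in a <table:table-row>."""
    return "<table:table-row>" + "".join(cells) + "</table:table-row>"


if __name__ == "__main__":
    # The three cells from the slide: label, statistic name, padded value.
    row = table_row([
        table_cell("Mileage (mpg)", "cell1211"),
        table_cell("N", "cell1111"),
        table_cell("52", "cell1111", "Table_20_ContentsNumeric", pad=3),
    ])
    print(row)
```

Escaping the cell text matters in practice: a variable label containing `&` or `<` would otherwise produce a content.xml that the word processor refuses to open.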