uk09_jacobs

Improving the output capabilities of Stata with Open Document Format XML
Adam Jacobs
Dianthus Medical Limited
Stata’s 3-fold capabilities
 Statistics
 Graphics
 Data management
But there is a 4th...
Text output
 A recent clinical study:
– 92 pages of raw data listings
– 124 pages of descriptive data tabulations
– 3 pages of statistical analysis
 All from a study in 12 healthy volunteers
Stata’s text output
Problems with Stata’s text output
 No pagination
 No formatting (or limited formatting with SMCL)
 Variable labels not always shown
 No Unicode support
 No tables of contents
 etc etc
Some examples...
So how did I do it?
Open Document Format
 An open standard, approved by ISO
 XML based
 For a variety of office-type documents
 Used by the popular open-source office suite OpenOffice.org
 Here, we are just interested in word-processing documents
.odt files
 A .odt file is the native file format of OpenOffice.org Writer
 A zip file
 Contains various files, the most important of which is content.xml
 content.xml is simply a plain-text file
 Stata is good at writing plain-text files!
The Stata code
 Creates the content.xml file by writing data with appropriate XML tags
 content.xml is added to the other files and zipped into a .odt file
 .odt file can be opened directly with Writer
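The packaging step above can be sketched in a few lines. This is only an illustration in Python, not the talk's Stata/Mata code (which is not shown). The function name `build_odt`, the sample paragraph text, and the minimal `manifest.xml` are assumptions made for the example. A package saved by Writer also contains files such as styles.xml and meta.xml, which are left out here.

```python
import io
import zipfile

# Media type that identifies an ODF text document.
MIMETYPE = "application/vnd.oasis.opendocument.text"

# A deliberately tiny content.xml: one paragraph, no styles (illustrative only).
CONTENT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
  <office:body>
    <office:text>
      <text:p>Hello from a generated document</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""

# The manifest lists the files in the package and their media types.
MANIFEST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
    manifest:version="1.2">
  <manifest:file-entry manifest:full-path="/"
      manifest:media-type="application/vnd.oasis.opendocument.text"/>
  <manifest:file-entry manifest:full-path="content.xml"
      manifest:media-type="text/xml"/>
</manifest:manifest>
"""


def build_odt(content_xml: str = CONTENT_XML) -> bytes:
    """Zip content.xml with the supporting files and return the .odt bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # ODF packaging expects "mimetype" as the first entry, stored uncompressed.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", content_xml)
        zf.writestr("META-INF/manifest.xml", MANIFEST_XML)
    return buf.getvalue()


if __name__ == "__main__":
    # Write the package to disk; the file name is arbitrary.
    with open("example.odt", "wb") as fh:
        fh.write(build_odt())
```

The only part that changes from one report to the next is the string passed as `content_xml`. This is why a tool that writes plain text well is enough to produce formatted documents.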
Some examples...
Basics of XML
<company name="Dianthus Medical Limited">
  <employee role="speaker">
    <firstname>Adam</firstname>
    <lastname>Jacobs</lastname>
  </employee>
  <employee role="delegate">
    <firstname>Flavia</firstname>
    <lastname>White</lastname>
  </employee>
</company>
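To see how elements, attributes and text fit together, the example can be read back with a standard XML parser. The Python sketch below is illustrative only. The helper name `list_employees` is made up for the example, and the XML is typed with plain straight quotes, which XML requires around attribute values.

```python
import xml.etree.ElementTree as ET

COMPANY_XML = """
<company name="Dianthus Medical Limited">
  <employee role="speaker">
    <firstname>Adam</firstname>
    <lastname>Jacobs</lastname>
  </employee>
  <employee role="delegate">
    <firstname>Flavia</firstname>
    <lastname>White</lastname>
  </employee>
</company>
"""


def list_employees(xml_text):
    """Return (role, firstname, lastname) for each <employee> element."""
    root = ET.fromstring(xml_text)
    return [
        # .get() reads an attribute; .findtext() reads a child element's text.
        (emp.get("role"), emp.findtext("firstname"), emp.findtext("lastname"))
        for emp in root.findall("employee")
    ]


# The attribute on the root element.
company = ET.fromstring(COMPANY_XML).get("name")
employees = list_employees(COMPANY_XML)

print(company)
for role, first, last in employees:
    print(f"{first} {last} ({role})")
```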
XML code for start of table
<table:table table:style-name="Table42">
<table:table-column table:style-name="TabCol13"/>
<table:table-column table:style-name="TabCol9"/>
<table:table-column table:style-name="TabCol8"/>
<table:table-column table:style-name="TabCol8"/>
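Markup like this is regular enough to generate from a list of style names. The Python sketch below shows the idea. It is not the talk's Stata/Mata code, and the function name `table_start` is hypothetical. The style names are the ones on the slide and would have to be defined elsewhere in the document.

```python
from xml.sax.saxutils import quoteattr


def table_start(table_style, column_styles):
    """Return the opening <table:table> tag followed by one column element per style."""
    # quoteattr() escapes the value and wraps it in quotes.
    lines = [f"<table:table table:style-name={quoteattr(table_style)}>"]
    for style in column_styles:
        lines.append(f"<table:table-column table:style-name={quoteattr(style)}/>")
    return "\n".join(lines)


print(table_start("Table42", ["TabCol13", "TabCol9", "TabCol8", "TabCol8"]))
```

The fragment leaves the table open on purpose. The rows follow, and a closing `</table:table>` tag has to be written after the last row.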
XML code for table cells
<table:table-cell table:style-name="cell1211">
<text:p text:style-name="Table_20_Contents">Mileage (mpg)</text:p>
</table:table-cell>
<table:table-cell table:style-name="cell1111">
<text:p text:style-name="Table_20_Contents">N</text:p>
</table:table-cell>
<table:table-cell table:style-name="cell1111">
<text:p text:style-name="Table_20_ContentsNumeric">52<text:s text:c="3"/></text:p>
</table:table-cell>
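Each cell follows the same pattern, so a small helper can produce it from a value and two style names. The Python sketch below is an illustration, not the talk's Stata/Mata code. The names `encode_text` and `table_cell` are hypothetical. It assumes that only runs of two or more spaces need the `<text:s text:c="n"/>` encoding, because ODF readers collapse consecutive spaces. In a complete document the cells also sit inside `<table:table-row>` elements, which the slide omits.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def encode_text(value):
    """Escape &, < and >, then replace each run of 2+ spaces with <text:s text:c="n"/>."""
    escaped = escape(value)
    return re.sub(
        r" {2,}",
        lambda m: f'<text:s text:c="{len(m.group())}"/>',
        escaped,
    )


def table_cell(value, cell_style, para_style):
    """Return one <table:table-cell> holding a single styled paragraph."""
    return (
        f"<table:table-cell table:style-name={quoteattr(cell_style)}>\n"
        f"<text:p text:style-name={quoteattr(para_style)}>"
        f"{encode_text(value)}</text:p>\n"
        "</table:table-cell>"
    )


# The three cells from the slide; the numeric cell ends in three spaces.
cells = [
    table_cell("Mileage (mpg)", "cell1211", "Table_20_Contents"),
    table_cell("N", "cell1111", "Table_20_Contents"),
    table_cell("52   ", "cell1111", "Table_20_ContentsNumeric"),
]
print("\n".join(cells))
```

Escaping matters as soon as labels come from real data. A variable label containing `<` or `&` would otherwise make content.xml invalid, and the document would not open.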
Was this a lot of work?
 123 kB of code
 21 ado files
 45 Mata functions
 And not finished yet!
Any questions?