Diaries of a Desperate (XML|XProc) Hacker
James Fuller
Lead Engineer | MarkLogic

Background
• Engineer on MarkLogic API team (History meters, Management API, etc…)
• W3C XML Processing WG (XProc v2.0)
• 2001 started with XML tech (EXSLT), XML Prague, etc…
• Open source contrib.
• Thank you to the organisers of XProc XML London 2015

Agenda
1. XML Hacker Desperation
2. XMLCalabash & depify
3. Show & Tell
4. XProc Hacker Desperation
5. Summary
6. Goto pub
* Yes, I am going to ‘powerpoint’ you
* Raise your hand to ask question
Email !!!

The D.P.H.
xkcd.com - http://xkcd.com/208/ [xkcd-ref]

D.P.H. – a twinkling in SGML eye
• Desperate Perl Hacker
  – Paul Grosso 1997 xml-dev link
  – Google images ‘desperate perl hacker’ link
  – Etymological cousin of ‘Just Another Perl Hacker’ (JAPH)
  – Randal Schwartz aka Merlin
• What’s it all about ?
  – GSD
  – Opaque One liners (Perl Golf encouraged)
  – Even better if (regex|pipes|sed|awk) involved
  – Challenge: Be able to munge XML with Perl

Desperate XML Hacker
• GAD (Get it All Done) with XML Stack
• ‘clever’ (and|or) ‘clear’
• Highly productive, albeit marooned and anxious on ‘XML island’
• Working with xml means working with documents and that means working with document workflows

All programmers are desperate
[Word cloud: marklogic, emacs, ant, xml, xpath, json, xslt, java, xquery, gradle, bash, …..]
• Day 1 - transform an xml doc with XSLT
• Day 2 - run transform on set of docs
• Day 3 - generate multiple output formats
• Day 4 - read docs from database
• Day 5 - put results into database
• Day 6 - notify when it's done
• Day 7 - run assertions and validate results
• Day 8 - generate png from svg for each document
• Day 9 - zip up files and upload them (w/ oauth)
• Day 10 - create EPub
• And so forth …

Technology Selection
– XSLT
– XQuery
– Bash scripts
– Makefiles
– Ant
– Java
– All of the above ?
[Diagram: file system, xslt, xml doc, database; transform; result doc, result image; generate; zip; package; notify; upload]

Adhoc pipelines
[Diagram: the same workflow grouped into the stages TRANSFORM, GENERATE and PACKAGE, followed by notify and upload]

Pipelines manage complexity
[McGrath2004] Sean McGrath. Performing impossible feats of XML processing with pipelining, Proc. XML Open 2004.
• Transformation decomposition is the key to complexity management, just ask:
  – Henry Ford
  – Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”)
  – George Miller (7+/-2)
  – Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776)
  – Any electrical/chemical engineer
  – Michael A. Jackson
• Easy to build, test and reuse
• Segregation of business rules from grammar rules
• Enable group collaboration

Michael Kay, Balisage 2009 – ‘You Pull, I’ll Push: on the Polarity of Pipelines’
• ‘the code of each step in the pipeline is kept very simple’
• ‘very easy to assemble an application from a set of components, thus maximizing the potential for component reuse’
• ‘there is no requirement that each step in a pipeline should use the same technology; it's easy to mix XSLT, XQuery, Java and so on in different stages.’
http://www.balisage.net/Proceedings/vol3/html/Kay01/BalisageVol3-Kay01.html

Use all the XML technologies …

XML – The Good Parts
Columns: Modern XML Tier 1, Modern XML Tier 2
Core: XML 1.0, Namespaces, XPath 1.0/2.0/3.0, XML Canonicalization
Transform/Query: XSLT 1.0/2.0/3.0, XQuery 1.0/3.0, XSLT 1.0/2.0 (in browser)
Processing: SAX, DOM, XProc?, XOM
Other: XML Catalog, XForms
Schema: Schematron, XML Schema 1.0, RELAX NG, XML Schema 1.1
Semantics: RDF, OWL, SPARQL, SPARQL Update
Vocabularies*: SVG, ‘Office’ Doc ML, …., MathML, DocBook, DITA, XHTML
- Amended from XML Amsterdam 2012 Keynote

Dependency Adoption (technology selection)

Helter skelter
http://upload.wikimedia.org/wikipedia/commons/thumb/b/ba/Helter_skelter.jpg/440px-Helter_skelter.jpg

It's more like this

The right Tool

Obligatory Jedi slide

But it works!

Java and XML
xml:Father - “XML gives Java something to do.”
• XML, Java, and the future of the Web 1997, Jon Bosak - http://www.ibiblio.org/pub/suninfo/standards/xml/why/xmlapps.htm
• SAX, DOM
• Unicode support
• Distributed
• Care and feeding of the Java VM
• Invoke abstraction (classpath, jar fun)

Do Java and XML work better together?

Not enough time

Desire to be Productive

10x programmers are not a myth
• Augustine, N. R. 1979. "Augustine’s Laws and Major System Development Programs." Defense Systems Management Review: 50-76.
• Boehm, Barry W., and Philip N. Papaccio. 1988. "Understanding and Controlling Software Costs." IEEE Transactions on Software Engineering SE-14, no. 10 (October): 1462-77.
• Boehm, Barry, et al. 2000. Software Cost Estimation with Cocomo II. Boston, Mass.: Addison Wesley.
• Boehm, Barry W., T. E. Gray, and T. Seewaldt. 1984. "Prototyping Versus Specifying: A Multiproject Experiment." IEEE Transactions on Software Engineering SE-10, no. 3 (May): 290-303. Also in Jones 1986b.
• Card, David N. 1987. "A Software Technology Evaluation Program." Information and Software Technology 29, no. 6 (July/August): 291-300.
• Curtis, Bill. 1981. "Substantiating Programmer Variability." Proceedings of the IEEE 69, no. 7: 846.
• Curtis, Bill, et al. 1986. "Software Psychology: The Need for an Interdisciplinary Program." Proceedings of the IEEE 74, no. 8: 1092-1106.
• DeMarco, Tom, and Timothy Lister. 1985. "Programmer Performance and the Effects of the Workplace." Proceedings of the 8th International Conference on Software Engineering. Washington, D.C.: IEEE Computer Society Press, 268-72.
• DeMarco, Tom, and Timothy Lister. 1999. Peopleware: Productive Projects and Teams, 2d Ed. New York: Dorset House.
• Mills, Harlan D. 1983. Software Productivity. Boston, Mass.: Little, Brown.
• Sackman, H., W. J. Erikson, and E. E. Grant. 1968. "Exploratory Experimental Studies Comparing Online and Offline Programming Performance." Communications of the ACM 11, no. 1 (January): 3-11.
• Valett, J., and F. E. McGarry. 1989. "A Summary of Software Measurement Experiences in the Software Engineering Laboratory." Journal of Systems and Software 9, no. 2 (February): 137-48.
• Weinberg, Gerald M., and Edward L. Schulman. 1974. "Goals and Performance in Computer Programming." Human Factors 16, no. 1 (February): 70-77.

Except when it is a myth
• technical debt
  – Maintainable/Upgrade
  – Add new features
  – Enterprise requirements
• more bugs
• brittle code
Upfront design
Technology selection
Balancing trade-offs to achieve sum gain

reflection
• Desperate people do desperate things
  – Use all the XML technologies
  – Dependency adoption
  – Not the right tool
  – Not enough time
  – Being productive

avoid being a D.X.H.
• Careful technology selection
• Manage your dependencies
• Avoid distributing logic up/down/across tech stack (hint: don’t use bash, makefiles, ant, etc)
• Simplify interaction with Java (VM)
• Model pipelines (hint: XProc)

avoid being a D.X.H.
• Use XProc (XMLCalabash)
  – XProc is designed for XML processing pipelines
  – Extensible
  – Simplify and aggregate logic
• Use XProc extension steps (depify)
  – XProc w/o extension steps is half of XProc
  – Provide façade over other technologies

We use pipelines
• John Lumley – worked with DITA OT
• Sandro Cirulli – workflow (pull scm, push db, process)
• Nic Gibson – conversion workflows
• Philip Fearon – types of workflows (seq and concurrent) with XMLFlow
• Andrew Sales – schematron on word docs (used Ant)
• ….
• most talks mentioned workflow/pipeline
  – ~100 mentions in proceedings
  – guesstimate ~6 mentions per hour during the talks

Desperate XProc Hacker
• XProc learning curve
  – v1.0 verbose in places
  – XProc generic by design
  – Some ‘Batteries not included’
• XProc v2.0 addresses this
  – Simplify connecting steps
  – Simplify parameters (maps)
  – Flow control
  – Metadata
  – Anything ‘flows’
  – avt/tvt
  – Syntactic optimisations
• depify provides a way to distribute and reuse extension steps, which beats the problems that arise with the ‘hairball’ approach

XMLCalabash & depify
• XMLCalabash
  – XProc processor
  – Norm Walsh
  – http://xmlcalabash.com/
• depify
  – XProc dependency management
  – http://depify.com/

XMLCalabash extension steps

    package com.example.library;

    import com.xmlcalabash.library.DefaultStep;
    … elided …
    import com.xmlcalabash.runtime.XAtomicStep;

    // The annotation registers the class as the implementation of ex:hello-world.
    @XMLCalabash(
        name = "ex:hello-world",
        type = "{http://example.org/xmlcalabash/steps}hello-world")
    public class HelloWorld extends DefaultStep {
        private WritablePipe result = null;

        public HelloWorld(XProcRuntime runtime, XAtomicStep step) {
            super(runtime, step);
        }

        // Receives the pipe for the step's output port.
        public void setOutput(String port, WritablePipe pipe) {
            result = pipe;
        }

        public void reset() {
            result.resetWriter();
        }

        public void run() throws SaxonApiException {
            super.run();
            … elided …
            tree.addText("Hello World");
            … elided …
            // Write the constructed document to the result port.
            result.write(tree.getResult());
        }
    }

Library for the step

    <p:library version="1.0"
        xmlns:p="http://www.w3.org/ns/xproc"
        xmlns:c="http://www.w3.org/ns/xproc-step"
        xmlns:ex="http://example.org/xmlcalabash/steps">
      <p:declare-step type="ex:hello-world">
        <p:output port="result"/>
      </p:declare-step>
    </p:library>

library xpl included in jar

    M Filemode     Length  Date         Time      File
    - ----------  -------  -----------  --------  ----------------------------------------------------
      drwxr-xr-x        0   8-Mar-2015  10:43:38  META-INF/
      -rw-r--r--      843   8-Mar-2015  10:43:38  META-INF/MANIFEST.MF
      drwxr-xr-x        0   8-Mar-2015  10:43:38  com/
      drwxr-xr-x        0   8-Mar-2015  10:43:38  com/example/
      drwxr-xr-x        0   8-Mar-2015  10:43:38  com/example/library/
      -rw-r--r--     2062   8-Mar-2015  10:43:38  com/example/library/HelloWorld.class
      drwxr-xr-x        0   8-Mar-2015  10:43:38  META-INF/annotations/
      -rw-r--r--       31   8-Mar-2015  10:43:38  META-INF/annotations/com.xmlcalabash.core.XMLCalabash
      -rw-r--r--      294  19-Feb-2015  15:41:00  example-library.xpl
    - ----------  -------  -----------  --------  ----------------------------------------------------
                     3230                         9 files

depify
• depify.com
• depify client
• depify github

• Usage of XMLCalabash
• Usage of depify
• Develop your own step
• Distribute with depify

depify future
• Gradle plugin
• Depify into other repos to enable day zero bootstrap (w/ yum, etc)
• Integration (expath package management)
• More steps
• More steps
• More steps

Summary
• XProc extension steps provide reuse
• XProc v2.0 lets you work in broader context
• Pipelines manage complexity
• depify specifically built for XProc (XMLCalabash)
• Reuse with existing mechanisms (ex. Maven)

How to Become a Delighted XProc Hacker
• Stop using bash, makefiles, ant or bending XML tech to control main loop
• Stop making adhoc pipelines
• model pipelines with XProc (XMLCalabash)
• try out ext steps (depify)
• GSD
• reuse and distribute new steps (depify)
• goto pub

Thank you for your attention and time, questions ?
<pub/>
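
Sketch: modelling a pipeline with XProc
A minimal sketch of the ‘model pipelines with XProc’ advice, assuming XProc 1.0 as implemented by XMLCalabash. It chains a Day 1-style XSLT transform and a Day 7-style validation using standard steps only. The file names to-html.xsl and result.rng are hypothetical placeholders, not taken from the talk.

```xml
<p:declare-step version="1.0" name="publish"
    xmlns:p="http://www.w3.org/ns/xproc">

  <!-- The XML doc to process -->
  <p:input port="source"/>
  <!-- The transformed document, emitted only if it validates -->
  <p:output port="result"/>

  <!-- TRANSFORM: reads the pipeline's source port by default.
       to-html.xsl is a hypothetical stylesheet name. -->
  <p:xslt name="transform">
    <p:input port="stylesheet">
      <p:document href="to-html.xsl"/>
    </p:input>
    <!-- No stylesheet parameters are passed in this sketch -->
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- VALIDATE: reads the result of the previous step by default.
       p:validate-with-relax-ng is an optional step in XProc 1.0;
       result.rng is a hypothetical schema name. With the default
       assert-valid="true" an invalid document fails the pipeline. -->
  <p:validate-with-relax-ng name="check">
    <p:input port="schema">
      <p:document href="result.rng"/>
    </p:input>
  </p:validate-with-relax-ng>

</p:declare-step>
```

Each stage is a separate step joined by default port connections, so a further GENERATE or PACKAGE stage would be appended as another step rather than spread across bash, makefiles or ant.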