Validation of HL7 v3 instances
A post-mortem from the caCIS Implementation
Dan Kokotov, Todd Parnell, 5AM Solutions

Who We Are / Acknowledgements
- Enterprise Service Development (ESD) for caCIS project
  - 5AM one of three companies involved in ESD
  - Other disciplines on caCIS: A&A, QA, Deployment, Documentation, …
  - The content of this presentation is authored by 5AM – any errors or omissions are our sole responsibility
- Acknowledgements
  - Architecture & Analysis (A&A) – John Koisch, Paul Boyes, Jean-Henri Duteau, Lorraine Constable and others
  - Enterprise Service Development – SemanticBits and Agilex teams

Context – Architecture and Methodology
- HL7 v3 using R2 datatypes
  - CDA (and possibly R1) in scope for project but out of scope for our solution
  - Roughly 70 RMIMs
  - Project-specific datatype specification with roughly 50 custom datatype flavors
  - Terminology Worksheet with mix of explicit and referential definitions for roughly 35 vocabulary types
  - XML ITS used at implementation layer
- SOA infrastructure – SOAP services with WS-*
  - Roughly 40 WSDL interfaces in 13 functional areas
  - Project-specific contract/fault specification, governing reporting of business and system exceptional behavior, including contract tracing
- SAIF specification methodology
  - A&A delivered CIM/PSM specs, in the form of RMIM models, interface descriptions, and accompanying documentation
  - Also included XSDs and WSDLs (non-normative but implied by V3 tooling and choice of ITS)

Context – Technology
- Tech stack
  - JSE 1.6, JEE 1.5 (JTA only)
  - JAX-WS, as implemented by CXF
  - JAXB
  - JPA, as implemented by Hibernate
  - Spring 3.0
  - Tolven

Challenge
- Validate incoming messages for compliance to model
  - Structural
  - Base datatype rules
  - Flavors (later in the project)
  - Vocabulary (mostly beyond scope of this presentation)
- QA generated test cases based on PIM specs/models to validate compliance

Initial approach
- Architecture: AP, AO, CO, CS
  - “RIM-inspired” application data model
  - AP<->AO: ORM (JPA2), AO<->CO: Bean mapping (Dozer), CO<->CS: XML Serialization (JAXB)
- Validation
  - Schema validation – implicit from ITS, enforced via CXF interceptor
  - JPA Bean validation – “message-independent” invariants, enforced by JPA
    - e.g. “a patient must have a name”
  - “External” Bean validation – “message-specific” constraints, enforced by custom AOP interceptor
    - e.g. “order must have an identifier in the REPC_MT000001US RMIM”

Initial Approach – with pictures
[architecture diagram]

Initial approach – problems
- Uncertainty on how to decide when something could be promoted to a “message-independent invariant”
- Occasional duplication between JPA Bean validation and “External” bean validation
- Basic R2/ISO 21090 datatype validation
  - Would require extensive bean validation implementation
- Vocabulary compliance required definition of explicit enumerated lists from worksheet
  - Was not sufficient for referential definitions

R2/ISO 21090 datatypes – solution
- Leverage schematron definitions of constraints embedded in the official iso_21090_types.xsd schema
- To do so, we had to overcome several roadblocks and challenges:
  - Embedded schematron did not have any context, as the XML ITS / ISO 21090 schema only defines XSD ComplexTypes for each datatype, not a standard element
    - XSLT2 supports a schema type axis, but no open source Java XSLT processor implements this
    - Therefore, we have to define the context as an explicit OR of possible paths to the datatype from any message of a given SOAP service
    - Potential recursion in datatypes makes this very tricky
  - Embedded schematron did not use prefixes for element names, thus they were not bound to the HL7 namespace, and schematron does not permit binding the empty prefix to a namespace
    - Had to use a regular expression to inject a prefix into element names in schematron XPath expressions
  - Embedded schematron had a variety of typos/bugs
    - Fixed directly in the schema
  - Miscellaneous (ANY type, inheritance, bugs in Xerces’ XS Schema reader)

Meet ExtractSchematron.java
- Part of build-time toolchain to generate schematron for the ISO 21090 datatypes
- Pseudocode:
  - Walk the iso-21090 schema, extract schematron annotations
  - “Fix” the schematron by injecting the hl7: prefix into element names
  - Write the “abstract” schematron rule file with all the extracted schematron rules
    - Single sch:pattern called “abstract rules”
    - One sch:rule per datatype rule
  - Walk the service schemas, determine possible paths to a datatype
    - Must include paths to a datatype’s supertype, and account for abstract types which can have xsi:type declarations at runtime
  - Write the “concrete” schematron rule file which references the “abstract” rules
    - One sch:pattern per datatype rule, whose context is the OR of all possible paths to an element which is of that datatype

For the win – regexp for injecting hl7: prefix

    (^|or |and |::|/|\\(|\\|)([^@naocmspxt()&\\.\\[\\\\=*+>!\\-09]|n(?!ot[ \\(])|a(?!nd[ \\(])|o(?!r[ \\(])|c(?!ount\\()|m(?!atches\\()|s(?!tringlength\\(|elf|tarts-with\\()|t(?!ext\\()|p(?!lain')|x(?!si:))

ExtractSchematron – the output
- “Abstract”

    <sch:rule abstract="true" id="IVL_PQ-0">
      <sch:assert test="(@nullFlavor and not(hl7:any|hl7:low|hl7:high|hl7:width)) or (not(@nullFlavor) and (hl7:any|hl7:low|hl7:high|hl7:width))">
        null rules
      </sch:assert>
    </sch:rule>

- “Concrete”

    <sch:pattern name="concrete rules">
      <sch:rule context="
          ns0:buildTemplateResponse/responseEnvelope/hl7:subject2/hl7:sequenceNumber/hl7:uncertainty[@xsi:type and fn:resolve-QName(@xsi:type, self::node())=fn:QName('urn:hl7-org:v3', 'PQ')]
        | ns0:buildTemplateResponse/responseEnvelope/hl7:subject2/hl7:priorityNumber/hl7:uncertainty[@xsi:type and fn:resolve-QName(@xsi:type, self::node())=fn:QName('urn:hl7-org:v3', 'PQ')]
        | ns0:buildTemplate/templateParameter/hl7:parameterItem/hl7:value[@xsi:type and fn:resolve-QName(@xsi:type, self::node())=fn:QName('urn:hl7-org:v3', 'QTY')][@xsi:type and fn:resolve-QName(@xsi:type, self::node())=fn:QName('urn:hl7-org:v3', 'PQ')]
        | ns0:buildTemplate/templateParameter/hl7:parameterItem/hl7:value[@xsi:type and fn:resolve-QName(@xsi:type, self::node())=fn:QName('urn:hl7-org:v3', 'PQ')]">
        <sch:extends rule="PQ-0"/>
      </sch:rule>
    </sch:pattern>

Integrating into the SOAP Stack
- Applying the generated schematron at runtime
  - CXF Interceptor to apply the schematron
  - CXF Interceptor to detect if errors occurred and raise a fault
  - Need two interceptors because they work at different spots in the CXF processing chain

Results
- Were now able to successfully validate the built-in ISO 21090 datatype constraints
- With a shiny new schematron facility, decided to start using it for custom validation as well
- But not 100% rosy
  - Slow (ish)
  - Memory intensive
  - Can cause problems with Xalan/Saxon on the classpath

The monkey-wrench
- Architecture change – switch to Tolven backend
  - Now RP, RO, CO, CS (kind of), still using conversion for RO <-> CO
  - No more JPA Bean validation
  - Still use some “External” bean validation
- Datatype Specification added, project RMIMs start using flavored datatypes
- One sprint later, we had 200 QA bugs for flavor validation

Revised Architecture
[architecture diagram]

How to validate datatype flavors?
- Fully MIF-driven
  - We did not have time to build this
  - Some off-the-shelf stuff was available, but not on our platform
- Write all the rules by hand
  - Seemed painful
- What if we could leverage ExtractSchematron?
  - XML ITS does not have explicit types for flavors
  - But if we add them, we could annotate them with schematron rules and use ExtractSchematron to harvest them
- So this is the approach we took

The approach
- Each flavor derives from the base datatype by restriction
  - And we have to do it for container types as well
- Flavor definitions go into flavors.xsd, which imports iso-21090.xsd
- Each flavor type is then annotated with schematron, just like iso-21090.xsd
  - Still have to write the actual schematron rules by hand based on the Datatype specification
  - MIF representation not available, and OCL-schematron translation would be beyond our scope
- RMIM Schema modified to reference the flavor XSD types
  - Because the derivation is by restriction, this is fully backwards compatible – valid instances look the same
  - For abstract types and flavors, can use either xsi:type or flavorId (for backward compatibility) to specify the flavor

Putting it into practice
- Modify V3 Generator
  - StaticMifToXsd.xsl modified to use flavor names in RMIM Schemas
  - RimInfrastructureRootToXsd.xsl modified to add reference to flavors.xsd
  - Both changes conditional on a build-time parameter, so backward-compatible
- Modify implementation
  - JAXB now generates Java beans for flavor types
  - Have to update Dozer rules and other code accordingly
- Remaining challenges
  - Permanent home for V3 Generator changes
  - Possible divergence from official ITS spec
  - JAXB flavor beans cause a lot of overhead and over-tight coupling

Lessons Learned
- Schematron is a powerful tool but complex and has limits
  - Cannot do vocabulary
  - Suffers from lack of full implementations of XPath2 and XSLT2
- XML ITS for datatypes would be better off with a separate namespace and explicit element names
- In the end the HDF and HL7 modeling approach strongly requires an MDA-oriented implementation, with full MIF awareness. Everything else is a band-aid
  - Therefore must invest in high quality MIF-based toolchains
- Validation must distinguish between object model, document, and message perspectives of HL7 v3
  - Same constructs are used to address all three, but the intent and semantics are different
  - Validation strategies should adapt accordingly

Resources
- Source code: http://caehrorg.jira.com/svn/ESD/trunk
- Contact info
  - Dan Kokotov – dkokotov@5amsolutions.com
  - Todd Parnell – tparnell@5amsolutions.com
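
Backup: a sketch of hl7: prefix injection
The prefix-injection step that ExtractSchematron performs on the embedded schematron can be illustrated with a small token-based rewrite. This is a minimal sketch and not the regular expression from the project: the class name PrefixInjector, the method inject, and the token pattern are all hypothetical, and it only assumes simple XPath 1.0 style expressions such as the ones in the ISO 21090 schema annotations. It leaves string literals, attributes, variables, numbers, function calls, axes, and already-prefixed names alone, and prefixes everything else that looks like an element name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a simplified stand-in, not the regexp used in ExtractSchematron. */
public class PrefixInjector {

    // XPath 1.0 operator names; an element literally named like one of these
    // would be left unprefixed (a known limitation of this sketch).
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("and", "or", "div", "mod"));

    private static final Pattern TOKEN = Pattern.compile(
            "'[^']*'|\"[^\"]*\""                      // string literals: skipped
            + "|[@$][\\w:.\\-]+"                      // attributes and variables: skipped
            + "|\\d+(?:\\.\\d+)?"                     // numbers: skipped
            + "|([A-Za-z_][\\w.\\-]*)"                // group 1: a name
            + "(:(?:\\*|[A-Za-z_][\\w.\\-]*))?"       // group 2: local part if already prefixed
            + "(\\s*(?:\\(|::))?");                   // group 3: '(' (function) or '::' (axis)

    /** Returns xpath with every bare element name rewritten as prefix:name. */
    public static String inject(String xpath, String prefix) {
        Matcher m = TOKEN.matcher(xpath);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            boolean bareElement = name != null
                    && m.group(2) == null             // no namespace prefix yet
                    && m.group(3) == null             // not a function call or axis name
                    && !OPERATORS.contains(name);     // not an operator keyword
            String replacement = bareElement ? prefix + ":" + m.group() : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Unprefixed form of the null-flavor assertion shown in the "Abstract" output.
        String raw = "(@nullFlavor and not(any|low|high|width)) or "
                + "(not(@nullFlavor) and (any|low|high|width))";
        System.out.println(inject(raw, "hl7"));
    }
}
```

Matching whole tokens rather than single characters trades the compactness of one regular expression for readability: each skipped category is one alternative in the pattern, so adding a case (for example XPath 2.0 keywords) is a local change.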