Validation of HL7 v3 instances
A post-mortem from the caCIS Implementation
Dan Kokotov, Todd Parnell, 5AM Solutions
Who We Are / Acknowledgements
Enterprise Service Development for caCIS project
5AM is one of three companies involved in ESD
Other disciplines on caCIS: A&A, QA, Deployment, Documentation, …
The content of this presentation is authored by 5AM – any errors or
omissions are our sole responsibility
Architecture & Analysis – John Koisch, Paul Boyes, Jean-Henri Duteau,
Lorraine Constable and others
Enterprise Service Development – SemanticBits and Agilex teams
Context – Architecture and Methodology
HL7 v3 using R2 datatypes
CDA (and possibly R1) in scope for project but out of scope for our solution
Roughly 70 RMIMs
Project-specific datatype specification with roughly 50 custom datatype flavors
Terminology Worksheet with mix of explicit and referential definitions for roughly 35
vocabulary types
XML ITS used at implementation layer
SOA infrastructure – SOAP services with WS-*
Roughly 40 WSDL interfaces in 13 functional areas
Project-specific contract/fault specification, governing reporting of business and system
exceptional behavior, including contract tracing
SAIF specification methodology
A&A delivered CIM/PSM specs, in the form of RMIM models, Interface description, and
accompanying documentation
Also included XSDs and WSDLs (non-normative but implied by V3 tooling and choice of
Context - Technology
Tech Stack
JSE 1.6, JEE 1.5 (JTA only)
JAX-WS, as implemented by CXF
JPA, as implemented by Hibernate
Spring 3.0
Validate incoming messages for compliance to model
Base datatype rules
 Later in the project
 Mostly beyond scope of this presentation
QA generated test cases based on PIM specs/models to
validate compliance
Initial approach
Architecture: AP, AO, CO, CS
“RIM-inspired” application data model
AP<->AO: ORM (JPA2), AO<->CO: Bean mapping (Dozer), CO<->CS:
XML Serialization (JAXB)
Schema validation – implicit from ITS, enforced via CXF interceptor
JPA Bean validation – “message-independent” invariants, enforced by
 e.g. “a patient must have a name”
“External” Bean validation - “message-specific” constraints, enforced
by custom AOP interceptor
 E.g. “order must have an identifier in the REPC_MT000001US RMIM”
Initial Approach – with pictures
Initial approach - problems
Uncertainty on how to decide when something could be
promoted as “message-independent invariant”
Occasional duplication between JPA Bean validation
and “External” bean validation
Basic R2/ISO 21090 datatype validation
Would require extensive bean validation implementation
Vocabulary compliance required definition of explicit
enumerated lists from worksheet
Was not sufficient for referential definitions
R2/ISO 21090 datatypes - solution
Leverage schematron definitions of constraints embedded in
official iso_21090_types.xsd schema
To do so had to overcome several roadblocks and challenges:
Embedded schematron did not have any context, as XML ITS / ISO 21090 schema
only defines XSD ComplexTypes for each datatype, not a standard element
 XSLT2 supports a schema type axis, but no open source Java XSLT processor implements it
 Therefore, have to define context as explicit OR of possible paths to the datatype from
any message of a given SOAP service
 Potential recursion in datatypes makes this very tricky
Embedded schematron did not use prefixes for element names, thus they were not
bound to the HL7 namespace, and schematron does not permit binding the empty
prefix to a namespace
 Had to use a regular expression to inject a prefix to element names in schematron XPath
Embedded schematron had a variety of typos/bugs
 Fixed directly in the schema
Miscellaneous (ANY type, inheritance, bugs in Xerces’ XS Schema reader)
Part of build-time toolchain to generate schematron for the ISO 21090 datatypes
Walk the iso-21090 schema, extract schematron annotations
“Fix” the schematron by injecting hl7: prefix to element names
Write the “abstract” schematron rule file with all the extracted schematron rules
 Single sch:pattern called “abstract rules”
 One sch:rule per datatype rule
Walk the service schemas, determine possible paths to a datatype
 Must include paths to a datatype’s supertype, and account for abstract types which can have xsi:type
declarations at runtime
Write the “concrete” schematron rule file which references the “abstract” rules
 One sch:pattern per datatype rule, whose context is the OR of all possible paths to an element which
is of that datatype
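The first steps of this toolchain (walk the schema, harvest the embedded rules, write them out as abstract rules) can be sketched roughly as follows. This is a minimal illustration, not the project's ExtractSchematron: the class name, the sample schema fragment, the annotation layout inside xs:appinfo, and the use of the ISO Schematron namespace are all assumptions. Rule ids follow the TYPE-n convention visible in the generated output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of the "extract" and "write abstract rules" steps. */
class SchematronExtractor {
    static final String XS = "http://www.w3.org/2001/XMLSchema";
    // Assumption: the embedded rules use the ISO Schematron namespace.
    static final String SCH = "http://purl.oclc.org/dsdl/schematron";

    // Made-up schema fragment; the annotation layout is assumed, not copied.
    static final String SAMPLE_XSD =
        "<xs:schema xmlns:xs='" + XS + "' xmlns:sch='" + SCH + "'>"
        + "<xs:complexType name='IVL_PQ'><xs:annotation><xs:appinfo>"
        + "<sch:assert test='(@nullFlavor and not(any|low|high|width)) or "
        + "(not(@nullFlavor) and (any|low|high|width))'/>"
        + "</xs:appinfo></xs:annotation></xs:complexType>"
        + "<xs:complexType name='NoRules'/>"
        + "</xs:schema>";

    /** Maps each named complexType to the assert tests found beneath it. */
    static Map<String, List<String>> extract(String xsd) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xsd)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse schema", e);
        }
        Map<String, List<String>> testsByType = new LinkedHashMap<String, List<String>>();
        NodeList types = doc.getElementsByTagNameNS(XS, "complexType");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            String name = type.getAttribute("name");
            if (name.isEmpty()) {
                continue; // anonymous types are covered by their enclosing named type
            }
            NodeList asserts = type.getElementsByTagNameNS(SCH, "assert");
            List<String> tests = new ArrayList<String>();
            for (int j = 0; j < asserts.getLength(); j++) {
                tests.add(((Element) asserts.item(j)).getAttribute("test"));
            }
            if (!tests.isEmpty()) {
                testsByType.put(name, tests);
            }
        }
        return testsByType;
    }

    /** Renders one abstract sch:rule per extracted test, inside a single pattern. */
    static String toAbstractRules(Map<String, List<String>> testsByType) {
        StringBuilder out = new StringBuilder("<sch:pattern name=\"abstract rules\">\n");
        for (Map.Entry<String, List<String>> entry : testsByType.entrySet()) {
            for (int i = 0; i < entry.getValue().size(); i++) {
                out.append("  <sch:rule abstract=\"true\" id=\"")
                   .append(entry.getKey()).append('-').append(i).append("\">\n")
                   .append("    <sch:assert test=\"")
                   .append(escape(entry.getValue().get(i))).append("\"/>\n")
                   .append("  </sch:rule>\n");
            }
        }
        return out.append("</sch:pattern>\n").toString();
    }

    // Minimal escaping for use inside a double-quoted XML attribute.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.print(toAbstractRules(extract(SAMPLE_XSD)));
    }
}
```

The sketch leaves out prefix injection and path enumeration; in the real toolchain those steps run between extraction and writing the concrete rule file.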
For the win – regexp for injecting hl7: prefix
(^|or |and |::|/|\\(|\\|)([^@naocmspxt()&\\.\\[\\\\=*+>!\\-09]|n(?!ot[ \\(])|a(?!nd[ \\(])|o(?!r[
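The regexp above is cut off on the slide, so it is not reproduced or completed here. As a rough illustration of the same idea, the sketch below uses a much simpler pattern with a callback instead of one large lookahead expression. It is an assumption-laden stand-in: the class and method names are hypothetical, it only covers abbreviated XPath 1.0 syntax (explicit axes such as child::low are left untouched), and it treats and, or, div and mod as operators even where they could be element names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified stand-in for the prefix-injection step. */
class PrefixInjector {
    // XPath 1.0 operator names that are never treated as element names here.
    private static final Set<String> OPERATORS =
            new HashSet<String>(Arrays.asList("and", "or", "div", "mod"));

    private static final Pattern TOKEN = Pattern.compile(
            "'[^']*'|\"[^\"]*\""                      // string literals, copied through
            + "|(?<![@:$\\w.-])([A-Za-z_][\\w.-]*)"   // name not after @, :, $ or a name char
            + "(?![\\w.-]*\\s*[(:])");                // and not a function, axis or prefix

    /** Adds "prefix:" to every unprefixed element name in the expression. */
    static String injectPrefix(String xpath, String prefix) {
        Matcher m = TOKEN.matcher(xpath);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1); // null when a string literal matched
            String replacement = (name == null || OPERATORS.contains(name))
                    ? m.group()
                    : prefix + ":" + name;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String test = "(@nullFlavor and not(any|low|high|width)) or "
                + "(not(@nullFlavor) and (any|low|high|width))";
        System.out.println(injectPrefix(test, "hl7"));
    }
}
```

Handling keywords in a callback keeps the pattern readable, at the cost of the edge cases listed above; a single regexp has to encode every one of those exclusions inline.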
ExtractSchematron – the output
<sch:rule abstract="true" id="IVL_PQ-0">
<sch:assert test="(@nullFlavor and not(hl7:any|hl7:low|hl7:high|hl7:width)) or
(not(@nullFlavor) and (hl7:any|hl7:low|hl7:high|hl7:width))">
null rules
<sch:pattern name="concrete rules">
inty[@xsi:type and fn:resolve-QName(@xsi:type, self::node())=fn:QName('urn:hl7-org:v3', 'PQ')]
:type and fn:resolve-QName(@xsi:type, self::node())=fn:QName('urn:hl7-org:v3', 'PQ')] |
ns0:buildTemplate/templateParameter/hl7:parameterItem/hl7:value[@xsi:type and fn:resolve-QName(@xsi:type, self::node())=fn:QName('urn:hl7-org:v3', 'QTY')][@xsi:type and fn:resolve-QName(@xsi:type, self::node())=fn:QName('urn:hl7-org:v3', 'PQ')] |
ns0:buildTemplate/templateParameter/hl7:parameterItem/hl7:value[@xsi:type and fn:resolve-QName(@xsi:type, self::node())=fn:QName('urn:hl7-org:v3', 'PQ')]">
<sch:extends rule="PQ-0"/>
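A context like the one above is the union of every path that can reach an element of the datatype. A minimal sketch of how such a union could be assembled is shown below, assuming the paths have already been enumerated. The class name, the example paths and the declared types are hypothetical; only the xsi:type predicate is taken from the generated output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: build the "OR of all possible paths" context for a rule. */
class ContextBuilder {
    static final String HL7_NS = "urn:hl7-org:v3";

    /** Predicate that narrows a supertype-declared element to one runtime type. */
    static String typePredicate(String datatype) {
        return "[@xsi:type and fn:resolve-QName(@xsi:type, self::node())=fn:QName('"
                + HL7_NS + "', '" + datatype + "')]";
    }

    /**
     * declaredTypeByPath maps each candidate path to the type declared in the
     * schema. The caller is assumed to pass only paths whose declared type is
     * the datatype itself or one of its supertypes.
     */
    static String context(String datatype, Map<String, String> declaredTypeByPath) {
        List<String> alternatives = new ArrayList<String>();
        for (Map.Entry<String, String> entry : declaredTypeByPath.entrySet()) {
            if (entry.getValue().equals(datatype)) {
                alternatives.add(entry.getKey()); // declared as the datatype itself
            } else {
                // declared as a supertype: only matches when xsi:type selects the datatype
                alternatives.add(entry.getKey() + typePredicate(datatype));
            }
        }
        return String.join(" | ", alternatives);
    }

    /** Wraps the context in a concrete pattern that extends the abstract rule. */
    static String concretePattern(String ruleId, String context) {
        return "<sch:pattern name=\"" + ruleId + "\">\n"
                + "  <sch:rule context=\"" + context + "\">\n"
                + "    <sch:extends rule=\"" + ruleId + "\"/>\n"
                + "  </sch:rule>\n"
                + "</sch:pattern>\n";
    }

    public static void main(String[] args) {
        Map<String, String> paths = new LinkedHashMap<String, String>();
        paths.put("hl7:quantity", "PQ");               // hypothetical path
        paths.put("hl7:observation/hl7:value", "ANY"); // hypothetical path
        System.out.print(concretePattern("PQ-0", context("PQ", paths)));
    }
}
```

Enumerating the paths is the hard part (recursion, supertypes, abstract types) and is deliberately left outside this sketch.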
Integrating into the SOAP Stack
Applying the generated schematron at runtime
CXF Interceptor to apply the schematron
CXF Interceptor to detect if errors occurred and raise fault
 Need two interceptors because we work at different spots in the CXF
processing chain
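The two-step split (apply the rules, then inspect the result and fail) can be illustrated without CXF. The sketch below assumes the schematron has been compiled to XSLT that emits an SVRL report, which is the usual XSLT-based Schematron arrangement. Everything in it is hypothetical: the class name, the tiny hand-written stylesheet standing in for compiled rules, the sample messages, and the IllegalStateException standing in for a SOAP fault.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of "apply the schematron" and "detect errors" as two steps. */
class SchematronCheck {
    static final String SVRL = "http://purl.oclc.org/dsdl/svrl";

    // Hand-written stand-in for schematron compiled to XSLT (illustrative rule only).
    static final String COMPILED_RULES =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:svrl='" + SVRL + "' xmlns:hl7='urn:hl7-org:v3'>"
        + "<xsl:template match='/'><svrl:schematron-output>"
        + "<xsl:apply-templates select='//hl7:quantity'/>"
        + "</svrl:schematron-output></xsl:template>"
        + "<xsl:template match='hl7:quantity'>"
        + "<xsl:if test='not((@nullFlavor and not(@value)) or (not(@nullFlavor) and @value))'>"
        + "<svrl:failed-assert><svrl:text>null rules</svrl:text></svrl:failed-assert>"
        + "</xsl:if></xsl:template>"
        + "</xsl:stylesheet>";

    static final String VALID =
        "<hl7:order xmlns:hl7='urn:hl7-org:v3'><hl7:quantity value='5' unit='mg'/></hl7:order>";
    static final String INVALID =
        "<hl7:order xmlns:hl7='urn:hl7-org:v3'><hl7:quantity nullFlavor='NI' value='5'/></hl7:order>";

    /** Step 1: run the compiled rules over the message and keep the report. */
    static Document apply(String message) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(COMPILED_RULES)));
            DOMResult report = new DOMResult();
            transformer.transform(new StreamSource(new StringReader(message)), report);
            return (Document) report.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("Could not apply rules", e);
        }
    }

    /** Step 2a: collect the text of every failed assertion in the report. */
    static List<String> failures(Document report) {
        NodeList failed = report.getElementsByTagNameNS(SVRL, "failed-assert");
        List<String> messages = new ArrayList<String>();
        for (int i = 0; i < failed.getLength(); i++) {
            messages.add(failed.item(i).getTextContent().trim());
        }
        return messages;
    }

    /** Step 2b: turn failures into an error; a real service would raise a fault. */
    static void raiseIfInvalid(Document report) {
        List<String> messages = failures(report);
        if (!messages.isEmpty()) {
            throw new IllegalStateException("Validation failed: " + messages);
        }
    }

    public static void main(String[] args) {
        raiseIfInvalid(apply(VALID));
        System.out.println("Failures for invalid message: " + failures(apply(INVALID)));
    }
}
```

Keeping the report as an intermediate result is what allows the two steps to sit at different points in a processing chain.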
We were now able to successfully validate for the built-in
ISO 21090 datatype constraints
With shiny new schematron facility, decided to start
using it for custom validation as well
But not 100% rosy
Slow (ish)
Memory intensive
Can cause problems with Xalan/Saxon on the Classpath
The monkey-wrench
Architecture change – switch to Tolven backend
Now RP, RO, CO, CS (kind of), still using conversion for RO <-> CO
No more JPA Bean validation
Still use some “External” bean validation
Datatype Specification added, project RMIMs start
using flavored datatypes
One sprint later, we had 200 QA bugs for flavor validation
Revised Architecture
How to validate datatype flavors?
Fully MIF-driven
We did not have time to build this
Some off the shelf stuff was available, but not on our platform
Write all the rules by hand
Seemed painful
What if we could leverage ExtractSchematron
XML ITS does not have explicit types for flavors
But if we add them, we could annotate them with schematron rules
and use ExtractSchematron to harvest them
So this is the approach we took
The approach
Each flavor derives from the base datatype by restriction
And have to do it for container types as well
Flavor definitions go into flavors.xsd, which imports iso-21090.xsd
Each flavor type is then annotated with schematron, just like iso21090.xsd
 Still have to write the actual schematron rules by hand based on Datatype Specification
MIF representation not available, and OCL-schematron translation would be
beyond our scope
RMIM Schema modified to reference the flavor XSD types
Because the derivation is by restriction, this is fully backwards compatible
– valid instances look the same
For abstract types and flavors, can use either xsi:type or flavorId
(for backward compatibility) to specify flavor
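The restriction idea can be tried out with the JDK's built-in schema validator. The sketch below is illustrative only: the flavor name II.Example, the cut-down II type, and the element names are made up and are not taken from the project's flavors.xsd. It shows a flavor that restricts II by making root required, a flavored element that still accepts an ordinary instance, and xsi:type selecting the flavor on an element declared with the base type.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical sketch: a flavor derived from a base datatype by restriction. */
class FlavorRestrictionDemo {
    // Made-up, cut-down types; II.Example is not a real project flavor.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " xmlns='urn:hl7-org:v3' targetNamespace='urn:hl7-org:v3'"
        + " elementFormDefault='qualified'>"
        + "<xs:complexType name='II'>"
        + "<xs:attribute name='root' type='xs:string'/>"
        + "<xs:attribute name='extension' type='xs:string'/>"
        + "</xs:complexType>"
        + "<xs:complexType name='II.Example'><xs:complexContent>"
        + "<xs:restriction base='II'>"
        + "<xs:attribute name='root' type='xs:string' use='required'/>"
        + "</xs:restriction></xs:complexContent></xs:complexType>"
        + "<xs:element name='id' type='II.Example'/>"   // flavored element
        + "<xs:element name='anyId' type='II'/>"        // base-typed element
        + "</xs:schema>";

    static final String XSI = " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'";

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Demo schema does not compile", e);
        }
    }

    /** Returns true when the instance document is valid against the demo schema. */
    static boolean isValid(String instance) {
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false; // the validator reports schema violations as SAXException
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // An ordinary instance is still valid against the flavored element.
        System.out.println(isValid("<id xmlns='urn:hl7-org:v3' root='1.2.3'/>"));
        // The flavor's extra constraint is enforced.
        System.out.println(isValid("<id xmlns='urn:hl7-org:v3'/>"));
        // xsi:type selects the flavor on a base-typed element.
        System.out.println(isValid("<anyId xmlns='urn:hl7-org:v3'" + XSI
                + " xsi:type='II.Example'/>"));
    }
}
```

Plain XSD restriction only covers constraints such as required attributes; the richer flavor rules still need the schematron annotations described above.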
Putting it into practice
Modify V3 Generator
StaticMifToXsd.xsl modified to use flavor names in RMIM Schemas
RimInfrastructureRootToXsd.xsl modified to add reference to flavors.xsd
Both changes conditional on build-time parameter, so backward-compatible
Modify implementation
JAXB now generates Java beans for flavor types
Have to update Dozer rules and other code accordingly
Remaining challenges
Permanent home for V3 Generator changes
Possible divergence from official ITS spec
JAXB flavor beans cause a lot of overhead and over-tight coupling
Lessons Learned
Schematron is a powerful tool but complex and has limits
Cannot do vocabulary
Suffers from lack of full implementations of XPath2 and XSLT2
XML ITS for datatypes would be better off with a separate namespace
and explicit element names
In the end the HDF and HL7 modeling approach strongly require an
MDA-oriented implementation, with full MIF awareness.
Everything else is a band-aid
Therefore must invest in high quality MIF-based toolchains
Validation must distinguish between object model, document, and
message perspectives of HL7 v3
Same constructs are used to address all three, but the intent and semantics are different
Validation strategies should adapt accordingly
Source code:
Contact info
Dan Kokotov
Todd Parnell