Informatica XML Training | CIF CONSULT | Redouane BELBAHRI

Agenda
– XML Validation
– Validation using Java Transformation

XML Validation

Overview
– To check whether an XML document conforms to an XML Schema (XSD), the document must be validated against that schema.
– PowerCenter 9.x uses the SAX model for reading XML and an Informatica proprietary method for writing XML.
– PowerCenter uses the Apache Xerces XML 2.7 parser for reading and writing XML files.

What is a parser?
XML data, by definition, consists of tags and content, otherwise known as markup and character data. This predictability makes it readable by humans, who can write and read the data without assistance from an application. Using this data in an application involves processing it into either an in-memory structure of parents and children or a sequence of events that represent elements, attributes, and data. The parser performs this processing.

DOM XML Parser in Java
DOM stands for Document Object Model. It represents an XML document as a tree in which each element is a branch. A DOM parser builds an in-memory tree of the whole XML file before the application works with it, so it requires more memory, and it is advisable to increase the heap size to avoid java.lang.OutOfMemoryError: Java heap space. Parsing with DOM is quite fast when the XML file is small. A large XML file is likely to take a long time to read, or may fail to load completely, because building the DOM tree requires a lot of memory.
Java supports DOM parsing, so you can parse XML files in Java using a DOM parser. The DOM classes are in the org.w3c.dom package, while the DOM parser for Java is part of JAXP (Java API for XML Processing), in the javax.xml.parsers package.

SAX XML Parser in Java
SAX stands for Simple API for XML. It is an event-based way of parsing XML: the file is parsed step by step, which makes it well suited to large XML files.
A SAX parser fires an event each time it encounters an opening tag, an element, or an attribute, and the application handles each event as it arrives. A SAX parser is recommended for large XML files in Java because it does not need to load the whole XML file into memory and can read a big file in small parts. Java supports SAX parsing, so you can parse any XML file in Java using a SAX parser. One disadvantage is that reading an XML file with a SAX parser requires more code than with a DOM parser.

Difference between DOM and SAX XML Parser
Here are a few high-level differences between the DOM parser and the SAX parser in Java:
– The DOM parser loads the whole XML document into memory, while SAX keeps only a small part of the XML file in memory at a time.
– Once the document is loaded, DOM gives faster repeated or random access because the whole XML document is in memory; SAX reads the document once, sequentially.
– The SAX parser is better suited to large XML files than the DOM parser because it does not require much memory.
– The DOM parser works on the Document Object Model, while SAX is an event-based XML parser.

Validation using Java Transformation

Java SAX Parser Example
The Java SAX parser provides an API for parsing XML documents. A SAX parser differs from a DOM parser in that it does not load the complete XML into memory; it reads the XML document sequentially. javax.xml.parsers.SAXParser provides methods to parse an XML document using event handlers. This class wraps an implementation of the XMLReader interface and provides overloaded versions of the parse() method to read an XML document from a File, an InputStream, a SAX InputSource, or a String URI.
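The SAX parsing and XSD validation described above can be sketched in plain Java as follows. This is a minimal illustration, not the code from the original slides and not PowerCenter's Java transformation API. The class name SaxValidationSketch, the handler CollectingHandler, and the inline sample schema and documents are all assumptions made up for the example. Only standard JAXP classes are used (javax.xml.parsers, javax.xml.validation, org.xml.sax).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxValidationSketch {

    // Hypothetical sample schema, used only for this illustration.
    public static final String SAMPLE_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"order\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"id\" type=\"xs:int\"/>"
      + "<xs:element name=\"paid\" type=\"xs:boolean\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Records element names and schema validation errors as SAX events arrive. */
    public static class CollectingHandler extends DefaultHandler {
        public final List<String> elements = new ArrayList<>();
        public final List<String> errors = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements.add(localName); // fired once per opening tag
        }

        @Override
        public void error(SAXParseException e) {
            // Schema violations are reported here; record them and keep parsing.
            errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
        }
    }

    /** Parses the XML with SAX while validating it against the given XSD. */
    public static CollectingHandler parse(String xml, String xsd) throws Exception {
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required for schema validation
        factory.setSchema(schema);       // validate while parsing

        SAXParser parser = factory.newSAXParser();
        CollectingHandler handler = new CollectingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CollectingHandler ok = parse("<order><id>42</id><paid>true</paid></order>", SAMPLE_XSD);
        System.out.println("valid document, errors: " + ok.errors);

        CollectingHandler bad = parse("<order><id>abc</id><paid>true</paid></order>", SAMPLE_XSD);
        System.out.println("invalid document, errors: " + bad.errors);
    }
}
```

Malformed XML (as opposed to schema-invalid XML) still raises a SAXParseException, because DefaultHandler.fatalError rethrows by default. Inside a PowerCenter Java transformation, similar logic would presumably sit in the row-processing code, with the XML and XSD supplied through ports or files; that wiring is not shown here.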
XML KEDB (Known Error Database)

Prevent the creation of an empty XML file when the PowerCenter session writes no data to the target
To prevent the creation of an empty XML target file when the PowerCenter session writes no rows, disable XML document generation as follows:
– Set the following custom properties on the Integration Service:
    WriteNullXMLFile = "No"
    SuppressNilContentMethod = "ByTree"
– Optionally, set the following properties for the XML target in the session task, under the Mapping tab:
    Null Content Representation = "No Tag"
    Empty String Content Representation = "No Tag"
    Null Attribute Representation = "No Attribute"
    Empty String Attribute Representation = "No Attribute"

Empty parent tags are written to an XML target file when all child elements are null
Problem Description
– A PowerCenter session with an XML target writes empty parent tags to the XML file when all child elements are null.
– This may occur even when the Null Content Representation option is set to No Tag in the session properties.
Solution
– If the Null Content Representation attribute of the XML writer is set to No Tag, the transformation suppresses only leaf elements when there is no data.
– To suppress parent tags as well, set the custom property SuppressNilContentMethod to either "ByTree" or "ByView".

The string values "TRUE" or "FALSE" cannot be inserted into a boolean field in an XML target using PowerCenter
The XML boolean datatype is defined as a Small Integer transformation datatype in PowerCenter, so only the integer values 1 and 0 can be written to an XML target or read from an XML source. This restriction applies only to the PowerCenter built-in XML capabilities. The Advanced XML Option (B2B) allows you to use all the supported values (0, 1, false, true) for the XSD boolean type.
Solution
Boolean datatype in target
– To retain the same datatype for the column, add an Expression transformation to the mapping. In this transformation, convert the value from true or false to 1 or 0 respectively.
    Example: IIF(INPUT_STRING = 'TRUE', 1, 0)
Write "True" or "False" in target
– To write "True" or "False" string values to the XML target, change the datatype to CHAR in the XML target definition. If the source data is 1 or 0, use an expression to convert these values to True or False respectively.
    Example: IIF(INPUT_BOOLEAN = 1, 'TRUE', 'FALSE')

Avoid namespace prefixing to elements in the generated target XML
Description
– Elements in the target XML instance file generated during the session run are prefixed with their namespace so that elements belonging to one namespace can be told apart from elements belonging to another. When a single namespace is defined in a target definition, no differentiation is required, so no prefix is added.
– When the XML target definition is defined with more than one namespace, prefixing becomes unavoidable. However, consider a scenario in which all the target XML fields belong to the same namespace. Differentiation is not required here, but the prefix is still added. Declaring this namespace as the default namespace removes the prefix from the XML file generated during the session run.
Solution
– To make the namespace the default, select the XML target definition, go to Edit XML definition > Components > Edit Namespace, and select the check box next to the namespace.

XMLNillableValidation
Description
– An XML Generator is created using an XSD that contains an optional string element (minOccurs = 0), and validation is turned on.
– When running a PowerCenter session with an XML target, the following error occurs even though the element is set as nillable:
    Message Code: XMLW_31205
    Message: Warning: Failed to validate element [businessCommunicationID], group [X_CO], row [0], error [Value is NULL but element is not nillable.]. Discarding row.
Solution
– Add the XMLNillableValidation custom property to the session or to the Integration Service.
– Set the value of this property to "No".

Value is NULL but element is not nillable
[Screenshots: Target XSD, Mapping]

Value is NULL but element is not nillable
With session properties:
– Empty string content representation = Tag with Empty Content
With session properties:
– Empty string content representation = No Tag

Many thanks for your time today!