CIF – Informatica XML Validation

Informatica
XML
Training | CIF CONSULT | Redouane BELBAHRI
Agenda


XML Validation
Validation using Java Transformation
XML Validation
Overview

To check whether an XML document conforms to an XML Schema
(XSD), the document must be validated against that schema.
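
As an illustration of validating a document against an XSD, here is a minimal sketch using the standard javax.xml.validation API. The class name XsdValidationSketch, the validate helper, and the sample order schema are assumptions made for this example; they are not part of PowerCenter.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidationSketch {

    // Hypothetical sample schema: an <order> element holding one integer <id>.
    static final String ORDER_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='order'><xs:complexType><xs:sequence>"
      + "<xs:element name='id' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns null when the document is valid, otherwise the first error message.
    // A schema that fails to compile is reported the same way.
    static String validate(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the first validation error is thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException | IOException e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(validate(ORDER_XSD, "<order><id>42</id></order>"));
        System.out.println(validate(ORDER_XSD, "<order><id>abc</id></order>"));
    }
}
```

The first call should report no error, while the second should return a message about the non-integer id value.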

PowerCenter 9.X uses the SAX model for reading XML and an
Informatica proprietary method for writing XML.

PowerCenter uses the Apache Xerces XML 2.7 parser for
reading and writing XML files.
What is a parser?

XML data, by definition, consists of tags and content, otherwise
known as markup and character data. This predictability
makes it readable by humans, who can write and read the data
without assistance from an application.

Using this data in an application involves processing it to
provide an in-memory structure of parents and children or a
sequence of events that represent elements, attributes, and
data. The parser performs this processing.
DOM XML Parser in Java

DOM stands for Document Object Model. It represents an XML
document as a tree in which each element is a branch.

A DOM parser builds an in-memory tree of the whole XML file
before the application can use it, so it needs more memory, and
it is advisable to increase the heap size to avoid
java.lang.OutOfMemoryError: Java heap space. Parsing with DOM is
quite fast when the XML file is small, but a large XML file is
likely to take a long time or fail to load at all, simply
because building the DOM tree requires a lot of memory.

Java supports DOM parsing, so XML files can be parsed in Java
with a DOM parser. The DOM classes are in the org.w3c.dom
package, while the DOM parser for Java is part of JAXP (Java API
for XML Processing), in the javax.xml.parsers package.
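
A minimal sketch of DOM parsing with JAXP follows. The class DomParseSketch, the itemNames helper, and the item element name are assumptions chosen for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomParseSketch {

    // Loads the whole document into memory, then collects the text of every <item>.
    static List<String> itemNames(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                names.add(items.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            // Sketch only: wrap parser and I/O failures in an unchecked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(itemNames("<items><item>a</item><item>b</item></items>"));
    }
}
```

Because the tree is fully built before itemNames reads it, the nodes can be visited in any order and as often as needed, at the cost of holding the whole document in memory.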
SAX XML Parser in Java

SAX stands for Simple API for XML. It is an event-based parser
that reads the XML file sequentially, which makes it well suited
to large XML files. A SAX parser fires an event whenever it
encounters the start or end of an element or character data
(attributes are delivered with the start-element event), and the
application handles each event as it arrives.

A SAX parser is recommended for large XML files in Java because
it does not load the whole XML file into memory; it reads the
file in small parts. Java supports SAX parsing through JAXP, so
any XML file can be parsed in Java with a SAX parser.

One disadvantage of the SAX parser in Java is that reading an
XML file with it requires more code than with the DOM parser.
Difference between DOM and SAX XML Parser

Here are a few high-level differences between the DOM parser and
the SAX parser in Java:
– The DOM parser loads the whole XML document into memory, while SAX holds
only a small part of the XML file in memory at a time.
– DOM can be faster than SAX for small documents that are accessed repeatedly,
because the whole document is already in memory.
– The SAX parser is better suited than DOM to large XML files
because it does not require much memory.
– The DOM parser works on the Document Object Model, while SAX is an
event-based XML parser.
Validation using Java Transformation
Java SAX Parser Example

The Java SAX parser provides an API for parsing XML documents.
A SAX parser differs from a DOM parser in that it does not load
the complete XML into memory; it reads the XML document
sequentially.

javax.xml.parsers.SAXParser provides methods to parse an XML
document using event handlers. This class wraps an
org.xml.sax.XMLReader implementation and provides overloaded
versions of the parse() method that read an XML document from a
File, an InputStream, a SAX InputSource, or a String URI.
Code here
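
The original code for this slide is not included, so the following is only a minimal sketch of the SAXParser and event-handler pattern described above. The class SaxParseSketch and its EventLogger handler are hypothetical names; inside a PowerCenter Java transformation, similar logic would sit in the transformation's own code, which is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxParseSketch {

    // Records each SAX event in arrival order instead of building a tree.
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text node in several chunks.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }
    }

    static List<String> events(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EventLogger handler = new EventLogger();
            // parse(InputSource, DefaultHandler) is one of the overloaded parse() methods.
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            // Sketch only: wrap parser and I/O failures in an unchecked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        events("<order><id>42</id></order>").forEach(System.out::println);
    }
}
```

To validate while parsing, a compiled Schema can be passed to SAXParserFactory.setSchema() before the parser is created, and the handler's error() method can then collect validation errors.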
XML KEDB (Known Error Database)
Prevent the creation of an empty XML file when the
PowerCenter session writes no data to the target

To prevent the creation of an empty XML target file when the
PowerCenter session writes no rows, disable the XML document
generation as follows:
– Set the following custom properties on the Integration Service:

WriteNullXMLFile = "No"

SuppressNilContentMethod = "ByTree"
– Optionally, set the following properties for the XML target in the session
task under the Mapping tab.

Null Content Representation - "No Tag"

Empty String Content Representation - "No Tag"

Null Attribute Representation - "No Attribute"

Empty String Attribute Representation - "No Attribute"
Empty parent tags are written to an XML
target file when all child elements are null

Problem Description
– A PowerCenter session with an XML target writes empty parent tags to the
XML file when all child elements are null.
– This may occur even when the Null Content Representation option is set to
No Tag in the session properties.

Solution
– If the Null Content Representation attribute of the XML writer is set to No
Tag, the transformation suppresses only leaf elements when there is
no data.
– To suppress parent tags as well, set the custom property
SuppressNilContentMethod to either "ByTree" or "ByView".
The string values "TRUE" or "FALSE" cannot be inserted
into a boolean field in an XML target using PowerCenter

The XML boolean datatype is defined as a Small Integer
transformation datatype in PowerCenter, so only the Integer
values 1 and 0 can be written to an XML target or read from an
XML source.

This restriction only applies to the PowerCenter built-in XML
capabilities. The Advanced XML Option (B2B) allows you to use
all the supported values (0,1,false,true) for the XSD boolean
type.
Solution

Boolean Datatype in target
– To retain the same datatype for the column do the following:


Add an Expression transformation to the mapping.

In this transformation, convert the value from true or false to 1 or 0 respectively.

Example: IIF ( INPUT_STRING='TRUE',1 ,0 )
Write "True" or "False" in target
– To write "True" or "False" string values to the XML target do the following:

Change the datatype to CHAR in the XML Target definition.

If the source data is 0 or 1 use an expression to convert these values to True or False.

Example: IIF ( INPUT_BOOLEAN=1,'TRUE', 'FALSE')
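
The same two conversions can be sketched in plain Java, for example as a starting point for logic placed in a Java transformation. The class XmlBooleanSketch and both helper names are hypothetical. Unlike the IIF example, which maps every value other than 'TRUE' to 0, this sketch rejects unknown values; that stricter behaviour is an assumption, not something PowerCenter requires.

```java
import java.util.Locale;

class XmlBooleanSketch {

    // Maps "true"/"1" to 1 and "false"/"0" to 0, ignoring case and surrounding spaces.
    static int toSmallInt(String value) {
        String v = value == null ? "" : value.trim().toLowerCase(Locale.ROOT);
        switch (v) {
            case "true":
            case "1":
                return 1;
            case "false":
            case "0":
                return 0;
            default:
                throw new IllegalArgumentException("Not a boolean value: " + value);
        }
    }

    // Mirrors IIF(INPUT_BOOLEAN=1, 'TRUE', 'FALSE') for a CHAR target column.
    static String toText(int flag) {
        return flag == 1 ? "TRUE" : "FALSE";
    }

    public static void main(String[] args) {
        System.out.println(toSmallInt("TRUE"));
        System.out.println(toText(0));
    }
}
```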
Avoid namespace prefixing to elements in the
generated target XML

Description
– Elements in the target XML instance file generated during the session run are prefixed
with their namespace so that elements belonging to one namespace can be told apart from
elements of another namespace. When a single namespace is defined in a target
definition, no differentiation is required and no prefix is written.
– When the XML target definition is defined with more than one namespace, prefixing
becomes unavoidable. However, consider a scenario in which all the target XML fields belong
to the same namespace. Here differentiation is not required, but prefixing is still done.
Declaring this namespace as the default namespace removes the prefix from the
XML file generated during the session run.

Solution
– To make the namespace the default, open the XML target definition, go to Edit XML Definition >
Components > Edit Namespace, and select the check box next to the namespace.
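
Switching from a prefixed namespace to a default namespace changes only how the XML is written, not what it means to a namespace-aware consumer. The following minimal sketch checks this with the JAXP DOM parser; the class NamespaceSketch, the rootName helper, and the urn:example:orders namespace are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceSketch {

    // Returns the root element as "{namespaceURI}localName", ignoring any prefix.
    static String rootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace URIs and local names
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            // Sketch only: wrap parser and I/O failures in an unchecked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String prefixed = "<ns1:order xmlns:ns1='urn:example:orders'/>";
        String defaulted = "<order xmlns='urn:example:orders'/>";
        System.out.println(rootName(prefixed).equals(rootName(defaulted)));
    }
}
```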
XMLNillableValidation

Description
– This issue occurs when an XML Generator is created from an XSD that contains an optional
string element (minOccurs = 0) and validation is turned on.
– When a PowerCenter session with an XML target runs, the following error occurs even
though the element is set as nillable:


Message Code: XMLW_31205

Message: Warning: Failed to validate element [businessCommunicationID], group
[X_CO], row [0], error [Value is NULL but element is not nillable.]. Discarding row.
Solution
– To resolve this do the following:

Add the XMLNillableValidation custom property to the session or Integration Service.

Set the Value of this property to "No"
Value is NULL but element is not nillable

[Figures: target XSD and mapping]

With the session property:
– Empty string content representation = Tag with Empty Content

With the session property:
– Empty string content representation = No Tag
Many thanks for your time today!