Business Documents – Concepts and Techniques
Ulrike Greiner, SAP
© 2005-2006 The ATHENA Consortium.

Course Structure
1. Introduction
2. Business Document Standards
3. Business Document Modelling
4. Business Document Mapping

Introduction

Definition
• A business document is a set of information components that are interchanged as part of a business activity (definition from ebXML).
• Possible components are:
  • Information (data)
  • Meaning of that information (meta-data)
  • Presentation information (layout)
  • Links to other information components

Information Contained
• Information in business documents can be of different types:
  – Structured: e.g. XML documents or databases
  – Unstructured: e.g. text files, Word documents, emails, most Web pages
  – Semi-structured: Web pages with known fields of content (annotations)
[Figure: examples of structured information (<xml>…</xml>) and unstructured information]

Business Example
• Business documents represent the information exchanged in cross-organisational business processes.
• Goal of this course: show methods for efficient and easy management of business documents exchanged in a cross-organisational business process.
[Figure: SUPPLIER, MANUFACTURER, and RETAILER exchanging Request for Quotation, Quotation, Order, and Order Conf. documents]

Questions
1.1 A business document is: Set of information components; Set of characters; Exchanged as part of a business activity; Exchanged during a phone call
1.2 A business document consists of: Information; Layout; Meta-data; Process Information
1.3 Information can be: structured; unstructured; semi-structured
1.4 Unstructured information can be: Text files; Word document; XML document
1.5 Structured information can be: XML document; Data from relational database; Data from object-relational database
    Further options for 1.4 and 1.5 (row assignment not recoverable from the flattened table): Annotated web page; Image file
1.6 Information in cross-organizational business processes: Is represented in business documents; Is not represented in business documents; Is stored in Word documents; Is represented in XML documents

Course Navigation
Recommended next section:
● Business Document Standards
You can also continue with:
● Business Document Modeling
● Business Document Mapping

Business Document Standards

Classification Categories
• Collaboration Agreement:
  – agree on a document standard and how to implement it
• Collaboration:
  – exchange information and data between organisations, specified e.g. in protocols or cross-organisational business processes
• Business Process / Service Definition:
  – define organisation-internal business processes and business services
• Information Definition:
  – define business documents and data models
• Infrastructure Services:
  – specify the infrastructure necessary to model and exchange business documents

Classification of Standards
[Figure: classification of standards by category. Categories: Collaboration Agreement; Collaboration; Business Process / Service Def.; Information Def.; Infrastructure Services. Standards shown: ebXML CPPA, RosettaNet PIPs, ebXML BPSS, ebXML CCTS, EDI, STAR, OAGI, STEP, RosettaNet Data Dictionary, W3C transport protocols (HTTP, SOAP, etc.), WSDL, IEEE FIPA, OGSA, OGSI, WS-CDL, WS-BPEL, XPDL, UML, UBL. Further labels: Impl. Guide, Variant Problem, Discovery, standard product attributes.]

Selected Standards
• Detailed description and analysis of the following standards:
  • ebXML CCTS
  • RosettaNet data dictionary and schemas
  • STEP
  • OAGI
  • DFDL

ebXML CCTS (1)
• General information:
  – Core Components Technical Specification (CCTS) / Part 8 of the ebXML Framework
  – Defined and maintained by the United Nations Centre for Trade Facilitation and Electronic Business (UN/CEFACT)
  – CCTS is fixed; extensions and modifications are performed by UN/CEFACT
  – ebXML CCTS can be used in all industries
  – CCTS does not provide implementation guidelines
• Repository:
  – CCTS describes a repository structure that should be used to store CCTS-based business documents
  – No information about repository interfaces is provided

ebXML CCTS (2)
• Business document modeling:
  – Component-oriented approach to model business documents on the business level (i.e. business experts are involved in modelling), including different variants of one document
  – No transformation to more technical representations is specified
  – Business documents can be used for company-internal and -external communications
  – Specifications are done in a semantically standardized, syntax-neutral way
  – Normative rules in CCTS allow for checking correctness of business documents
• Transformations / Mapping:
  – CCTS defines a vocabulary for common concepts that are used in different business documents
  – No specification provided for mapping CCTS-based documents to other formats

RosettaNet (1)
• General information:
  – RosettaNet Business Dictionary (RNBD), RosettaNet Technical Dictionary (RNTD), RosettaNet Implementation Framework (RNIF)
  – Mainly developed by industrial member organizations of RosettaNet
  – Definitions follow the RosettaNet Standards Methodology (RSM)
  – Initially targeted at the high-tech industry, extended to other industries
  – Provides Excel-based tools to support implementation projects
• Repository:
  – No specifications for a repository are provided

RosettaNet (2)
• Business document modeling:
  – Component-oriented XML specifications for business documents on technical and execution level are provided
  – Business documents can be used for company-external information exchange
  – Variants of a document are supported through implementation guides describing which elements are generic and can be specialized to meet the specific needs of trading partners
  – Software programs are provided to test the validity of RosettaNet business documents
• Transformations / Mapping:
  – RosettaNet provides dictionaries for both business terms and technical terms that can be used to create documents
  – No specifications provided for mapping RosettaNet documents to other formats

STEP (1)
• General information:
  – Standard for the Exchange of Product Model Data
  – Defined by TC184/SC4 at ISO
  – STEP is fixed; extensions and modifications are performed by TC184/SC4
  – STEP is used in the manufacturing industry
  – Provides implementation guidelines for business documents
• Repository:
  – ISO 13584 specifies a repository structure, the Parts Library Structure
  – Also specifies how documents should be stored and retrieved

STEP (2)
• Business document modeling:
  – Component-oriented approach to specify technical level business documents for internal as well as external communication
  – Business documents are specified in EXPRESS
  – Variants of documents can be specified using specialization and generalization of entities
  – EXPRESS to XML transformations are described to generate execution level document representations
  – Validation process for STEP implementations supported by conformance testing methodology and framework
• Transformations / Mapping:
  – STEP defines a vocabulary / data dictionary for common concepts
  – No specifications provided for mapping STEP documents to other formats

OAGI (1)
• General information:
  – OAGIS = OAG Integration Standard
  – Defined by OAGi (Open Applications Group, Inc.), plus AIAG (Automotive Industry Action Group) and AAIA (Automotive Aftermarket Industry Association)
  – Standard is defined in ISO 10303 documents and can be extended or modified following a dedicated procedure
  – Standard is open for all industries
  – OAGi provides implementation guidelines and support services
• Repository:
  – OAGi does not describe a repository structure
  – Business documents are usually stored on standard but structured file systems

OAGI (2)
• Business document modeling:
  – Component-oriented specification of company-internal and -external business documents on technical and execution level
  – Business documents are specified using XML, XSD
  – No explicit support for handling variants of documents
  – XML schemas available to check correctness of business documents
• Transformations / Mapping:
  – No specifications provided for mapping OAG business documents to other formats

DFDL (1) – Why and Who
• Format description for non-XML data
  – Need for a mechanism bringing the benefits of formal schema definition to legacy or other non-XML formats
  – Description, rather than prescription, of formats, to allow use with existing technology alongside the definition of new formats
  – Uses in integration of new and legacy systems, creation of high performance formats, and mapping and transformation tooling
• Standard for use in implementing mapping tools
  – DFDL: Data Format Description Language
  – Something fulfilling this role already exists in many proprietary systems (e.g. WebSphere Message Broker, Microsoft BizTalk)
  – Common way of describing physical format desirable for interoperability
  – DFDL Working Group within Open Grid Forum developing specification
  – First revision to be available in near future

DFDL (2) – What and How
• Schema based approach
  – XML schema used to describe logical data format
  – Annotations contain physical format information, e.g.
      <xs:sequence dfdl:separator=",">
        <xs:element name="y" type="double" dfdl:initiator="baseQ" dfdl:tagSeparator="=" />
      </xs:sequence>
  – Use of XML Schema gives several benefits
    • Existing body of tooling
    • Can apply prior knowledge
    • Useful document model and implementation libraries
• Implementation and status
  – Provided properties should support description of a wide variety of formats
    • Support for fixed length formats, binary and text encodings, field delimiters
    • Support for 'variables', e.g. a field specifying the length of another
  – Parsers and serializers can make use of physical annotations to read and write data in the described format
  – Prototype making use of the current version of the specification available (within the Virtual XML Framework from IBM)

Questions
2.1 Which categories have been used to classify standards? Collaboration; Infrastructure services; Information Definition; Database definition
2.2 STEP belongs to the following categories: Collaboration; Business process Definition; Information Definition; Infrastructure services
2.3 STAR belongs to the following categories: Collaboration; Collaboration Agreement; Infrastructure services; Information Definition
2.4 UBL is related to: CCTS; STAR; OAGI
2.5 ISO stands for: International Standards Organization; Internal Standards Organization; International Sunshine Organization
2.6 Which might be suitable situations for applying DFDL: Designing a new XML based message exchange format; Designing a highly optimized (for size) RFQ format; Describing a legacy message format when interfacing with a new system; Describing the SOAP headers for a web service call
2.7 DFDL Annotations: Describe a format's logical structure; Describe a format's physical structure; Are embedded in the XML schema for the document; Are kept discrete / separate from the XML schema
2.8 DFDL Properties can support physical formats containing: Fixed length fields; Binary Data; Comma separated fields; Length Prefixed (variable 'fixed' length) fields

Course Navigation
Recommended next section:
● Business Document Modeling
You can also continue with:
● Business Document Mapping

Business Document Modelling

Modeling Requirements
• Requirements for modeling of business documents:
  – Re-use of model types that are modeled once and can then be used in different document models
  – Model representation targeted at business experts
    • Semi-automatic transformation to technical specification
  – Support for handling variants of business documents, which:
    • Share most of their data fields
    • Differ in a limited number of data fields that depend on the context in which the document is used
    • Example: a purchase order that differs slightly if used in different European countries

Modeling Approach
• Based on Core Components Technical Specification (CCTS)
• Component-based, thus supporting re-use
• Graphical representation to support business experts
• Export functionality to create e.g. XML representations
• Provides the concept of a business context:
  – Defines a specific context in which a document is used
  – Can be assigned to mark a particular variant of a business document

Types of Models
• Primitive Type Model
• Context Category Model
• Code List Model
• Core Component Type Model
• Core Component Model
• Business Context Model
• Data Type Model
• Business Information Entity Model

Relationships between Models
[Figure: relationships between the Business Information Entity Model, Core Component Model, Data Type Model, Business Context Model, Core Component Type Model, Code List Model, Primitive Type Model, and Context Category Model]

Primitive Type Model
• Models all primitive types
• Examples: string, integer, URL
• Represented by nodes
• Primitive type nodes can be connected by edges:
  – An edge from primitive type x to primitive type y means that x can be substituted by y
  – e.g. a URL can be substituted by a string
[Figures: primitive type integer; primitive types string and URL]
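
The substitution edges of the Primitive Type Model can be sketched as a small directed graph. This is a minimal illustration, not part of CCTS or of any ATHENA tool: the class name `PrimitiveTypeModel` and its methods are hypothetical, and treating substitution as transitive (following chains of edges) is an assumption the slides do not state.

```python
class PrimitiveTypeModel:
    """Primitive types as nodes; an edge x -> y means x can be substituted by y."""

    def __init__(self):
        # Maps each primitive type name to the set of types that may replace it.
        self._substitutes = {}

    def add_type(self, name):
        """Register a primitive type node (no edges yet)."""
        self._substitutes.setdefault(name, set())

    def add_substitution(self, x, y):
        """Add an edge stating that type x can be substituted by type y."""
        self.add_type(x)
        self.add_type(y)
        self._substitutes[x].add(y)

    def can_substitute(self, x, y):
        """Return True if x can be replaced by y, directly or via a chain of edges.

        Following chains of edges is an assumption made for this sketch.
        """
        if x == y:
            return x in self._substitutes
        seen, stack = set(), [x]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            for candidate in self._substitutes.get(current, ()):
                if candidate == y:
                    return True
                stack.append(candidate)
        return False


# The examples from the slide: integer stands alone, URL can be replaced by string.
model = PrimitiveTypeModel()
model.add_type("integer")
model.add_substitution("URL", "string")
```

With this model, `model.can_substitute("URL", "string")` holds, while the reverse direction does not, because the edge is directed.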

Core Component Type Model
• Specifies the data fields of business documents
• Groups multiple data fields, each represented by a primitive type:
  – exactly 1 content component: primary data field with the actual value
  – 1 to n supplementary components: describe the value
• Examples: Price, Text
[Figure: Core Component Type Price]

Core Component Model
• Represents a template of a business document:
  – contains all possible data fields
• Examples: order, quotation
• Aggregate Core Component (ACC) aggregates core components
• Association Core Component (ASCC) connects two ACCs
• Basic Core Component (BCC) connects ACC with CCT
• Property Terms specify the child CC
[Figure: Aggregate Core Component, Association Core Component, Basic Core Component, and Core Component Type]

Data Type Model
• Represents data fields of a business document
  – similar to CCTs but more restrictive
• Is based on a CCT or on a primitive type model
• Specifies a Data Type Restriction (DTR) for each content and supplementary component of a CCT
  – limits the possible values
• Several Data Types can be based on the same CCT
[Figure: Data Type A7_Number (based on CCT Number)]

Context Category Model
• Classifies the business circumstances that define a business context
• Examples: industry, geopolitical
• Represented by nodes
• Edges define a hierarchy of categories
[Figure: Context Category Geopolitical with two sub-categories]

Code List Model
• Provides values for business contexts
• Restricts the values of data types
• Example: country code
• Represented by nodes
• Code values of a code list are specified textually as an attribute value
• Code list authority: organization that defines code lists (e.g. ISO)
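
How a code list can restrict the values of a data type that is based on a core component type can be sketched as follows. This is only an illustration under stated assumptions: the class names, the component names `Amount` and `Currency`, the primitive type names, and the three currency codes are hypothetical and are not taken from CCTS or from an ISO code list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodeList:
    """A code list defined by a code list authority."""
    name: str
    authority: str
    codes: frozenset


@dataclass
class CoreComponentType:
    """One content component plus supplementary components, each a primitive type."""
    name: str
    content_component: tuple          # (component name, primitive type)
    supplementary_components: dict    # component name -> primitive type

    def component_names(self):
        return {self.content_component[0], *self.supplementary_components}


@dataclass
class DataType:
    """A data type based on a CCT, with optional restrictions per component."""
    name: str
    based_on: CoreComponentType
    restrictions: dict = field(default_factory=dict)  # component name -> CodeList

    def is_valid(self, component, value):
        """Check a value against the restriction of one component, if any."""
        if component not in self.based_on.component_names():
            raise KeyError(f"{component!r} is not a component of {self.based_on.name}")
        code_list = self.restrictions.get(component)
        return code_list is None or value in code_list.codes


# Hypothetical example data (illustrative subset, not a real ISO code list).
currency_codes = CodeList("CurrencyCode", "ISO", frozenset({"EUR", "USD", "GBP"}))
price = CoreComponentType("Price", ("Amount", "decimal"), {"Currency": "string"})
european_price = DataType("European_Price", price, {"Currency": currency_codes})
```

Here the `Currency` component only accepts values from the code list, while the unrestricted `Amount` component accepts any value. Several data types with different restrictions could be based on the same `price` object, which mirrors the re-use argument made on the slides.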
[Figure: code list authority ISO and four code lists defined by ISO]

Business Context Model
• Describes the business circumstances in which a variant of a business document is used
• Specified by an enumeration of context values
  – Context values are code values of a code list
  – All necessary code lists are put into a business context node
  – All required code values are selected
• Examples: geopolitical region
[Figure: business context CountryContext with a selected value from a code list]

Business Information Entity Model
• Represents a concrete business document used in a cross-organizational business process
• Is a variant of a Core Component
• Is created in three steps:
  – Assign a business context
  – Select the required data fields from the data fields of the core component
  – Add a qualifier
• Examples: quotation

Questions
3.1 Which of the following are requirements for business document modeling? Re-use of models; Representation targeted at business experts; Handling variants of business documents; Creating XML documents
3.2 Data Type models can be based on: Primitive type models; Core component type models; Context category model; Code list model
3.3 Business context models are based on: Primitive type models; Code list models; Business Information Entity Model
3.4 Basic core components connect: Aggregate core components and core component types; Aggregate core components and association core components; Primitive types and core component types
3.5 Aggregate core component: Is a template for a business document; Contains all possible value fields; Specifies the business context

Course Navigation
Recommended next section:
● Business Document Mapping
You can also continue with:
● Business Document Standards

Mapping Business Documents

Mapping Requirement
• Requirement for document mapping
  – Business processes and services are developed by different groups and use different interfaces.
  – Standards (ebXML, RosettaNet, etc.) are too complicated for applications to implement.
  – Document mapping bridges between the requester's service definition and the provider's service definition.
[Figure: Requester 1 ... Requester n, MAP, and Server, with Service Doc 1 and Service Doc 2]

Mapping Architecture (1)
• A mapping generator
• An optional automatic map generator
• A transformation generator
• A runtime that executes the transformation
[Figure: automatic matching and the Map Generator work on a Source Schema and a Target Schema and save Maps; the Transformation generator generates the Runtime Transformation (XQuery, XSLT, Java, proprietary), which turns a Source document into a Target document; each document conforms to its schema]

Mapping Architecture (2)
• A mapping generator
  – Is usually a graphical component that is used to define the relationship between the source and target schema
• An optional automatic map generator
  – Automatically populates the mapping generator based on computed similarities between source and target
• A transformation generator
  – Generates the runtime instantiation of the map in the target mapping language, for example XSLT, XQuery, Java, or SQL
• A runtime that executes the transformation against business documents

Automatic Map Generator
• Automatically discovers mappings between elements and attributes in the source and target schema using
  – Examples of source and target documents (instance level matching)
  – Names and structure defined in the schema only (schema level matching)
[Figure: Source element DeliveryAddress (AddrLine1, City, State) and Target element CustomerAdress (AddrLine1, City, State)]
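
Schema level matching on names alone can be sketched with a simple string-similarity comparison. The sketch below is an assumption-laden illustration, not the algorithm of any particular map generator: the function names and the 0.8 threshold are hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for whatever similarity measure a real tool uses. The element names are the ones shown in the figure.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive similarity of two element names, between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def propose_matches(source_names, target_names, threshold=0.8):
    """Propose (source, target, score) pairs whose names are similar enough.

    For every source name the best-scoring target name is chosen; the pair
    is kept only if the score reaches the (assumed) threshold.
    """
    proposals = []
    for source in source_names:
        best = max(target_names, key=lambda target: name_similarity(source, target))
        score = name_similarity(source, best)
        if score >= threshold:
            proposals.append((source, best, round(score, 2)))
    return proposals


# Element names taken from the figure on the slide.
source_elements = ["DeliveryAddress", "AddrLine1", "City", "State"]
target_elements = ["CustomerAdress", "AddrLine1", "City", "State"]
matches = propose_matches(source_elements, target_elements)
```

The identically named child elements are proposed as matches, whereas `DeliveryAddress` and `CustomerAdress` stay below the assumed threshold. Pairing the two parent elements would need additional evidence, for example the similar structure of their children.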

Schema Level Matching
• Schema level matching can use a number of matching algorithms or a combination of algorithms:
  – A lexical matcher looks for schema elements with equal or similar names
  – A thesaurus matcher makes use of an external, non-domain-specific thesaurus to find common synonyms and hyponyms
  – A type matcher makes use of the simple and complex types of the elements
  – A structure matcher looks for similar structures and sub-structures within the source and target
  – An ontology matcher makes use of an external ontology which provides a domain-specific vocabulary

Example
• Source Schema: Order
  – amount (float)
  – EAN (string)
  – dueDate (datetime)
  – accntId (string)
  – deliverAddress (address)
• Target Schema: PurchaseOrder
  – UPC (string)
  – Qty (float)
  – deliverydate (dateTime)
  – clientId (string)
  – deliveryAddr (address)
  – clientName (string)
[Figure: the two schemas connected by thesaurus matching, ontology matching, and lexical matching, with an excerpt of the ontology (PartNumber, EANCode, EAN 8, EAN 13, UPC); see ontology on next foil]

Ontology
[Figure: ontology graph with the nodes Due Date, Delivery Date, PartNumber, EANCode, EAN 8, EAN 13, UPC, NumberOfItems, and Quantity, connected by subClassOf, type, and EquivalentClass edges]

Questions
4.1 Which of the following are requirements for business document mapping? Match different service definitions; Match to standard documents; Match between XML and non-XML; Match between communication protocols
4.2 Map generator can be: A graphical interface; A text interface; Generate runtime transformation; Can map XML and non-XML documents
4.3 Runtime transformation language can be: XSLT; XQuery; Java; C
4.4 Lexical matching: Matches elements with the same name; Matches elements with similar names; Matches elements that are synonyms; Matches elements that are subclasses
4.5 An ontology can be used: To provide a domain vocabulary; To describe synonyms; To describe subclasses; Are defined outside a mapping system