Common Statistical Production Architecture (Version 1.1, December 2014) This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/. If you re-use all or part of this work, please attribute it to the United Nations Economic Commission for Europe (UNECE), on behalf of the international statistical community. 1 Contents I. The Problem Statement ........................................................................................................... 4 II. Common Statistical Production Architecture ......................................................................... 6 A. Scope of Architecture ....................................................................................................... 8 B. Service Oriented Architecture .......................................................................................... 8 C. Using CSPA ................................................................................................................... 10 D. Impact on organizations ................................................................................................. 11 III. Business Architecture ........................................................................................................ 12 A. Describing Statistical Production ................................................................................... 13 B. Business Architecture Principles .................................................................................... 14 IV. Information Architecture ................................................................................................... 17 A. Reference frameworks and their use .............................................................................. 18 B. CSPA implementation specifications ............................................................................. 18 C. Information Architecture Principles ............................................................................... 20 V. Application Architecture ....................................................................................................... 21 A. Statistical Service Definitions, Specifications and Implementation Descriptions ......... 21 B. Architectural Patterns ..................................................................................................... 25 C. Non-Functional Requirements ....................................................................................... 26 D. Implementing Protocols in a Statistical Service ............................................................. 28 E. Application Design Principles........................................................................................ 33 VI. A. VII. Technology Architecture ................................................................................................... 33 Communication Platform ............................................................................................... 34 Roles .................................................................................................................................. 36 VIII. Enablers .......................................................................................................................... 39 A. Catalogues ...................................................................................................................... 39 B. Governance..................................................................................................................... 41 C. Legal, Licensing and Financial Considerations ............................................................. 46 Annex 1: Describing “Statistical Production” .............................................................................. 48 Annex 2: Templates ...................................................................................................................... 57 Annex 3: Template Guidance ....................................................................................................... 59 Annex 4: CSPA Implementation Guidance .................................................................................. 66 Annex 5: Glossary......................................................................................................................... 78 2 List of Abbreviations BPMS: Business Process Management System CORE: COmmon Reference Environment CSPA: Common Statistical Production Architecture DDI: Data Documentation Initiative GSBPM: Generic Statistical Business Process Model GSIM: Generic Statistical Information Model HLG: High Level Group for the Modernisation of Statistical Production and Services JMS: Java Messaging Service MSMQ: Microsoft Message Queue OS: Operating System SDMX: Standard Data and Metadata eXchange SOA: Service Orient Architecture TOGAF: The Open Group Architectural Framework VLAN: Virtual Local Area Network 3 I. The Problem Statement 1. Many statistical organizations are facing common challenges. There are two major threats to the continued efficient and effective supply of core statistics that come from within statistical organizations. These are: 1) rigid processes and methods and 2) inflexible ageing technology environments 2. Over the years, through many iterations and technology changes, statistical organizations have built up their organizational structure, production process, enabling statistical infrastructure, and technology. The cost of maintaining this business model and the associated asset bases (process, statistical, technology) is becoming insurmountable and the model of delivery is not sustainable. 3. Historically, statistical organizations have developed their own business processes and IT-systems for producing statistical products. Therefore, although the products and the processes conceptually are very similar, the individual solutions are not (as represented by the different shapes in Figure 1). Every technical solution was built for a very specific purpose with little regard for ability to share information with other adjacent applications in the statistical cycle and with limited ability to handle similar but slightly different processes and tasks. This can be referred to as 'accidental architecture' as the process and solutions were not designed from a holistic view. Figure 1: Accidental Architectures 4. Often it is difficult to replace even one of the components supporting statistical production. Use of these processes, methods and an inflexible and aging technology environment mean that statistical organizations find it difficult to produce and share between systems data and information aligned to modern standards (for example, Data Documentation 4 Initiative (DDI) and Statistical Data and Metadata eXchange (SDMX)). Process and methodology changes are time consuming and expensive resulting in an inflexible, unresponsive statistical organization. 5. Many statistical organizations are modernizing and transforming their organizations using enterprise architecture to underpin their vision and change strategy. An enterprise architecture aims to create an environment which can change and support business goals. It shows what the business needs are, where the organization wants to be, and ensures that the IT strategy aligns with this. Enterprise architecture helps to remove silos, improves collaboration across an organization and ensures that the technology is aligned to the business needs. This work will enable them to standardize their organizations. This is shown in Figure 2 where, as opposed to Figure 1, the countries have standardized their components and interfaces. Figure 2: The result of standardization within an organization 6. Statistical organizations have attempted many times over the years to share their processes, methodologies and solutions, as it has long been believed that there is value in this. The mechanism for sharing has historically meant an organization taking a copy of a component and integrating it into their environment. Examples include CANCEIS (CANadian Census Edit and Imputation System) and Banff (an editing and imputation system for business surveys). However, most cases of sharing have involved significant work to integrate the component into a different processing and technology environment. 7. Figure 3 attempts to explain why the difficulty in sharing or reuse occurs. The figure assumes that the two statistical organizations in the figure develop all their business capability and supporting components and interfaces in a standard way (i.e. they have an Enterprise Architecture as shown in Figure 2). Each organization has standardized within their own organization but not in the same way as the other organization. As shown in Figures 2 and 3 where each country has a different shape of component - Canadian components have a zig zag shape and Sweden have components with slanted edges. If Sweden needs a new component, 5 ideally they need a component with a slanted edge. It can be seen in the third row of Figure 3 that while a component from Canada might support the same process and incorporate robust statistical methodologies, it will not be simple to integrate it into the Swedish environment. Figure 3: Why sharing /reuse is hard now II. Common Statistical Production Architecture 8. As part of the modernization effort, the High Level Group for the Modernization of Statistical Production and Services (HLG) want to take action in order to address the problems and issues described in the previous section. For this reason, HLG has put priority on the development of the Common Statistical Production Architecture (CSPA) and its implementation. 9. If the official statistical industry had greater alignment at the business, information and application levels, then sharing would be easier. CSPA assists statistical organizations to address these problems by providing a framework, including principles, processes and guidelines, to help reduce the cost of developing and maintaining processes and systems and improving the responsiveness of the development cycle. Sharing and reuse of process components will become easier - not only within organizations, but across the industry as a whole. 10. The value proposition of CSPA, in providing statistical organizations with a standard framework, is to: facilitate the process of modernization provide guidance for transformation within statistical organizations facilitate the reuse / sharing of solutions and services and the standardization of processes, and thus a reduction in costs of production 6 encourage interoperability of systems and processes provide a basis for flexible information systems to accomplish their mission and to respond to new challenges and opportunities enable international collaboration initiatives for building common infrastructures and services foster alignment with existing industry standards such as the Generic Statistical Business Process Model (GSBPM) and the Generic Statistical Information Model (GSIM) 11. CSPA is the industry architecture for the official statistics industry. An industry architecture is a set of agreed common principles and standards designed to promote greater interoperability within and between the different stakeholders that make up an "industry", where an industry is defined as a set of organizations with similar inputs, processes, outputs and goals (in this case official statistics). 12. CSPA provides a reference architecture for official statistics. It describes: What the official statistical industry wants to achieve – The goals and vision (or future state). How the industry can achieve this – The principles that guide decisions on strategic development and how statistics are produced. What the industry will have to do - Adopt an architecture that will require members to comply with CSPA. 13. A number of frameworks focusing on specific areas have already been developed. CSPA builds on and uses these existing frameworks, notably the GSBPM and GSIM, as the necessary shared industry vocabulary. Adoption of these frameworks by organizations in the industry will improve the common understanding and alignment necessary for joint development, sharing and reuse of components. 14. CSPA complements and uses these pre-existing frameworks by describing the mechanisms to design, build and share components with well-defined functionality that can be integrated in multiple processes easily. CSPA focuses on relating the strategic directions of the HLG to shared principles, practices and guidelines for defining, developing and deploying Statistical Services in order to produce statistics more efficiently. 15. CSPA brings together these existing frameworks and introduces the new frameworks related to Statistical Services (described in Section V. Application Architecture) to create an agreed top level description of the 'system' of producing statistics which is in alignment with the modernization initiative. 16. CSPA gives users an understanding of the different statistical production elements (i.e. processes, information, applications, services) that make up a statistical organization and how those elements relate to each other. It also provides a common vocabulary with which to discuss implementations, with the aim to stress commonality. It is an approach to enabling the vision and strategy of the statistical industry, by providing a clear, cohesive, and achievable picture of what is required to get there. 7 A. Scope of Architecture 17. CSPA is a reference architecture for the statistical industry. The scope of CSPA is statistical production across the processes defined by the GSBPM (i.e. it does not characterize a full enterprise architecture for a statistical organization). It is understood that statistical organizations may also have a more general Enterprise Architecture (for example an Enterprise Architecture used by all government agencies in a particular country). 18. CSPA is descriptive, rather than prescriptive, its focus is to support the facilitation, sharing and reuse of Statistical Services both across and within statistical organizations. CSPA is not a static reference architecture; it is designed to evolve further over time. 19. CSPA is designed for use by investment decision makers in developed statistical organizations. While developing organizations are not excluded, a reasonable level of Enterprise Architecture maturity and a modern technical environment is required for implementation. There are options for making Statistical Services created using CSPA available to developing statistical organizations, these will be outlined in future versions of the architecture. 20. An important concept in architecture is the "separation of concerns". For that reason, the architecture is separated into a number of "perspectives". These "perspectives" are: 21. Business Architecture which defines what the industry does and how it is done (statistics in our case), Information Architecture which describes the information, its flows and uses across the industry, and how that information is managed, Application Architecture which describes the set of practices used to select, define or design software components and their relationships, and Technology Architecture which describes the infrastructure technology underlying (supporting) the other architecture perspectives. CSPA includes: Motivations for constructing and using CSPA through the description of requirements Sufficient business and information architecture descriptions and principles as are necessary for CSPA's scope Application architecture and associated principles for the delivery of Statistical Services Technology architecture and principles - limited to the delivery of Statistical Services 22. It should be noted that CSPA does not include enterprise, business, application and technology architecture descriptions which are not directly aligned to CSPA scope, nor does it prescribe technology environments of statistical organizations. B. Service Oriented Architecture 8 23. The value of the architecture is that it enables collaboration in developing and using Statistical Services which will allow statistical organizations to create flexible business processes and systems for statistical production more easily. 24. The architecture is based on an architectural style called Service Oriented Architecture (SOA)1. This style focuses on Services (Statistical Services in this case). A service is a representation of a real world business activity with a specified outcome. It is self-contained and can be reused by a number of business processes (either within or across statistical organizations). 25. A Statistical Service will perform one or more tasks in the statistical process. Statistical Services will be at different levels of granularity. An atomic or fine grained Statistical Service encapsulates a small piece of functionality. An atomic service may, for example, support the application of a particular methodological option or a methodological step within a GSBPM sub process. Coarse grained or aggregate Statistical Services will encapsulate a larger piece of functionality, for example, a whole GSBPM sub process. These may be composed of a number of atomic services. 26. The granularity of Statistical Services should be based on a balanced consideration between the efficiency of the Statistical Service and the flexibility required for sharing purposes larger Statistical Services will usually enable greater efficiency, whereas a finer granularity will allow greater flexibility for supporting sharing and reuse. Services, regardless of their granularity, must meet the architectural requirements and be aligned with CSPA principles. 27. By adopting this common reference architecture, it will be easier for each organization to standardize and combine the components of statistical production, regardless of where the Statistical Services are built. As shown in Figure 4, Sweden could reuse a Statistical Service from Canada because they both use the same component. 28. CSPA will facilitate the sharing and reuse of Statistical Services both across and within statistical organizations. The Statistical Services that are shared or reused across statistical organizations might be new Statistical Services that are built to comply with CSPA or legacy/existing tools wrapped to be Statistical Services which comply with the architecture. This is shown in Figure 4 by the shapes inside the building blocks. 1 CSPA Guidance recommendation: To assess the SOA readiness of a statistical organization the following maturity assessment is available: The Open Group Service Integration Maturity Model (OSIMM) (accessed 22nd December 2014) 9 Figure 4: Making sharing and reuse easier C. Using CSPA 29. CSPA also provides a starting point for concerted developments of statistical infrastructure and shared investment across statistical organizations. CSPA is sometimes referred to as a "plug and play" architecture. The idea is that replacing Statistical Services should be as easy as pulling out a component and plugging another one in. There are a number of ways in which CSPA may be used by statistical organizations. These are outlined in the sections below. Strategic Planning 30. If statistical organizations are creating and using an industry strategy ("Industry Architecture") and this leads to projects/work programs, they could also integrate/streamline their investment strategies. Where a statistical organization plans to contribute to and or use CSPA in the future, they should modify and integrate their road maps to align with the CSPA framework. Each statistical organization needs to define a strategy to move from its current state to the common future state defined in their roadmap. Development within statistical organizations 31. When a statistical organization identifies the need for a new Statistical Service, there are a number of options they can pursue. In order to fill the gap, the statistical organization can look for Statistical Services that are available in the collaborative space (that is, in the Global Artefact Catalogue). 32. If an appropriate Statistical Service is not found in the CSPA Global Artefact Catalogue, the statistical organization can either: 10 start designing and developing a new Statistical Service, or modify an existing Statistical Service to meet new functional and/or non-functional requirements. 33. This could be done independently or in collaboration with other statistical organizations. This development work should be done in alignment with CSPA to ensure that the new Statistical Services can be added to the CSPA Global Artefact Catalogue for sharing and reuse by other statistical organizations. 34. Sharing means exchanging concepts, designs or software, where each user of a service creates and operates its own implementation of that service. There are levels of sharing. A limited form of sharing would be to provide another participant with the means to replicate (make a copy of) the asset (for example give the source code) (i.e. they share an aspect of the asset only). A more involved form of sharing would entail that the asset is made entirely common (in this case the asset is also reused). 35. Reuse means common use of a single implementation of a service, with only one organization acting as the service provider (the one who runs the service). 36. It is thought that in the current environment, it is more likely that statistical services will be shared across organizations rather than reused. Sharing between statistical organizations still leaves the option of reusing the various implementations internally within the environment of the individual statistical organization. Vendors 37. A statistical organization may choose to have a vendor develop a Statistical Service. A vendor in this case means either a third party commercial vendor or a statistical organization that is selling a product. In the case of a new Statistical Service, the statistical organization should request that it is built in accordance with CSPA. When the product already exists, statistical organizations should verify together if the product meets relevant community requirements. If it does not, statistical organizations can try to influence the vendor to meet requirements. If it meets the community requirements, statistical organizations would ask the vendor to register the Statistical Service implementation to the Global Artefact Catalogue. D. Impact on organizations 38. There will be a number of required changes for an organization implementing CSPA. Adoption of CSPA will require investment with a view to generating the long term benefits identified in the value proposition (see paragraph 10)2. 39. The main changes required at the organization level can be grouped as: A. People Changes Openness to international cooperation 2 An initial set of CSPA Implementation Guidance has been made available for CSPA V1.1. This guidance is available in Annex 4: CSPA Implementation Guidance 11 Building trust in international partners (especially as they may be building services for your organization) Sense of compromise (acceptance that nothing will be optimized for local use, rather it will be optimized for international or corporate use) Development of new functional roles to support use of the architecture (e.g. Assembler, Builder) B. Process Changes Adoption of an industry wide perspective Different approach to business process management and design Commitment to service (contract between different functional units) C. Technology Changes setting up an adequate middleware infrastructure (messaging, repositories) uplift of physical network capabilities (bandwidth, etc.) management of security features 40. In addition to the costs and the targeted benefits, an organization adopting CSPA will benefit from: III. a sustainable and efficient strategy to cope with legacy and phasing out of existing applications a cycle that enables cost saving from reduction in production costs to be reinvested in further infrastructure transformation a positive image both on national and international/industry scene Business Architecture 41. The Statistical Network3 project on Business Architecture. CSPA has adopted the outputs of this work. 42. The definition of Business Architecture being used by the Statistical Network is given below. "Business Architecture covers all the activities undertaken by a statistical organization, including those undertaken to conceptualize, design, build and maintain information and application assets used in the production of statistical outputs. Business Architecture drives the Information, Application and Technology architectures for a statistical organization." 43. CSPA focuses on architectural considerations associated with statistical production as bounded by GSBPM. Business concerns such as ensuring that the corporate work program for a statistical organization best addresses the needs of its external stakeholders, or 3 The Statistical Network is a collaboration group involving the National Statistics Organisations of Australia, Canada, Italy, New Zealand, Sweden and the United Kingdom. 12 recruiting, retaining and developing staff with relevant skills are not central to CSPA. Such concerns are, however, very important considerations in an organization-specific business architecture. 44. Organizations that have formally defined business architecture can reference CSPA when describing aspects of their business architecture which are fundamentally in common with other producers of official statistics. A. Describing Statistical Production 45. To enable efficient and consistent documentation and understanding of CSPA three related concepts have been adopted which are relevant for all readers in relation to "Statistical Production". These are Business Function, Business Process and Business Service. 46. The terms and definitions used in CSPA for these concepts are drawn from GSIM. The terminology and modelling of GSIM aligns with The Open Group Architectural Framework (TOGAF). TOGAF is widely known for defining architectural frameworks. 47. Following is a brief overview of these three concepts4, a more detailed discussion of these concepts and statistical production is contained in Annex 1. Business Function 48. GSIM defines Business Function as something an enterprise does, or needs to do, in order to achieve its objectives. This represents a simpler expression of the definition used in TOGAF. 49. When identifying Business Functions, the emphasis is on an enterprise level ('whole of business") perspective, recognizing that different parts of the business may have different detailed requirements in regard to a particular function. At the level of the Business Function, there is no implementation detail. Business Process 50. A Business Process is a set of process steps to perform on or more Business Functions to deliver a Statistical Program. Key aspects include: 4 a process consists of a series of steps (activities/tasks) there is sequencing (or "flow") between steps a business process is undertaken for a particular purpose what is represented (for the sake of simplicity and clarity) as a single step in a high level depiction of a process might – when viewed in more detail - comprise a lower level (sub)process consisting of multiple steps Descriptions are also available via the CSPA Glossary 13 Business Service 51. A Business Service is the means of accessing a Business Function. It will perform one or more Business Processes. It is the who - or what - will undertake the work associated with each function. Business services should be scoped to support flexible sequencing and configuration of Business Functions within different Business Processes. A Business Service has an explicitly defined interface that requires the knowledge of what the service will deliver (including in what time frame) given a particular set of inputs. A Statistical Service is a kind of Business Service. 52. A central aim for CSPA is to enable more efficient and flexible support for statistical production as described by the GSBPM. Future versions of CSPA will provide more guidance on how CSPA can be applied when designing, managing and performing statistical business processes. However, the initial focus in the development of CSPA has been to clearly and consistently define (both conceptually and in practice) the Statistical Services which support statistical production. The aim is that equivalent business functions (such as imputation) within many different statistical business processes5 will be able to reuse (or share) the same Statistical Service in the implementation of the business service. 53. Key reasons for the initial focus on Statistical Services include: Common Statistical Services provide the greatest opportunity for realizing savings through collaborative development, sharing and reuse While common statistical standards (or reference frameworks) have been agreed for business processes (GSBPM) and for statistical information (GSIM), there is not yet a common framework for Statistical Services, CSPA is filling this gap. Once a common approach to the definition and implementation of Statistical Services is agreed, this will support defining business processes which make use of the common Statistical Services in a manner that supports the delivery of a specific business service, within a given context and purpose (business process) aligned to the business function (GSBPM). B. Business Architecture Principles 54. Principles are high level decisions or guidelines that influence the way processes and systems are to be designed, built and governed. Principles are derived from the mission and values of the organization, taking into account the opportunities and threats that the organization faces. In CSPA, principles are used to express the high level design decisions that will shape the future statistical processes and systems. Decision Principles 55. Decision principles are guidelines to help decide on strategic development. They provide a basis for decision making and informing how the mission is fulfilled. They help enable sound 5 Whether these business processes relate to different subject matter domains in one statistical agency and/or equivalent subject matter domains in different agencies. 14 investment decisions. The following decision principles support the outcomes sought by the High Level Group and key elements of the United Nations Fundamental Principles for Official Statistics. These principles provide a basis for decision making and inform how a statistical organization sets about fulfilling its mission through strategic development. 56. A number of principles which are common to most organization's business architecture (whether formally defined or not) are being identified through other initiatives such as work within the Statistical Network. The following business architecture decision principles were jointly developed via the Statistical Network business architecture project and the CSPA project: 57. Principle: Capitalize on and influence national and international developments Statement: Collaborate nationally and internationally to leverage and influence statistical and technological developments which support the development of shared statistical services. 58. Principle: Deliver enterprise-wide benefits Statement: Design and implement new or improved statistical business processes in a way that maximizes their value at an enterprise level. 59. Principle: Increase the value of our statistical assets Statement: Add value to the statistical organization's statistical assets (either directly or indirectly) through improved accessibility and clarity, relevance, coherence and comparability, timeliness and punctuality, accuracy and reliability and interpretability. 60. Principle: Maintain community trust and information security Statement: Conduct all levels of business in a manner which build the community's trust. This includes the community's trust and confidence in the statistical organization's decision making and practices and their ability to preserve the integrity, quality, security and confidentiality of the information provided. 61. Principle: Maximize the use of existing data/Minimize respondent load Statement: Leverage existing data from all sources (e.g. statistical surveys or administrative records) before collecting it again. Statistical organizations are to choose the source considering quality, timeliness, cost and burden on respondents. Statistical authorities monitor the respondent burden and aim to reduce it over time 62. Principle: Sustain and grow the business 15 Statement: Focus investment and planning on long term sustainability and growth, both in terms of the organization's role and position within its own community as well as internationally. 63. Principle: Take a holistic and integrated view Statement: Ensure data, skills, knowledge, methods, processes, standards, frameworks, systems and other resources are consistent, reusable and interoperable across multiple business lines within a statistical organisation. Design Principles 64. CSPA aims to support organizations in realizing these decision principles in practice. The Business Architecture design principles have been identified for CSPA in conjunction with the Statistical Network Business Architecture project. 65. Principle: Consider all capability elements Statement: Consider all capability elements (e.g. methods, standards, processes, skills, and IT) to ensure the end result is well-integrated, measurable, and operationally effective. 66. Principle: Re-use existing before designing new Statement: Re-use and leverage existing data, metadata, products and capability elements wherever possible before designing new. 67. Principle: Design new for re-use and easy assembly Statement: Design and standardize all new data, metadata, products and capability elements for re-use, so they can be easily assembled and modified to accommodate changing user demands. 68. Principle: Processes are metadata driven Statement: Ensure the design, composition, operation and management of business processes, including all input and output interactions, are metadata driven and automated wherever possible. 69. Principle: Adopt available standards Statement: Aim to adopt open, industry recognized, and international standards where available. Statistical industry standards such as the Generic Statistics Business Process Model (GSBPM) and the Generic Statistical Information Model (GSIM) are examples of the standards to be used. 16 70. Principle: Designs are output driven Statement: Ensure the whole statistical process is output-driven. Output is the reference starting point; the statistical production process starts from the output desired, that is from required products, and goes backwards, defining the various aspects of the process. 71. Principle: Enable discoverability and accessibility Statement: Ensure data, metadata, products and capability elements are discoverable and accessible to achieve the benefits from sharing and reuse. 72. The Information and Application architecture design aspects of CSPA are directed by the CSPA Business Architecture design principles. IV. Information Architecture 73. The Statistical Network Business Architecture project provides the following definition of Information Architecture:6 Information Architecture (IA) classifies the information and knowledge assets gathered, produced and used within the Business Architecture. It also describes the information standards and frameworks that underpin the statistical information. IA facilitates discoverability and accessibility, leading to greater reuse and sharing. 74. In other words, Information Architecture connects information assets to the business processes that need them and the IT systems that use and manage them. 75. It includes relating the coherent and consistent definition of information assets at an enterprise level to the information needs of specific business processes and IT systems in practice. 76. As an industry architecture, the Information Architecture set out by CSPA must provide an agreed and actionable (rather than purely conceptual) connection between 77. the common information frameworks and implementation standards agreed within the industry (e.g. GSIM, SDMX, DDI)7, and the practical business goals and needs to be supported under CSPA, such as the ability to share and reuse Statistical Services It must support the needs of: 6 While the standards body responsible for TOGAF recognises the term "Information Architecture", the formal model underlying TOGAF refers to "Data Architecture". 7 The High Level Group Modernisation Committee on Standards validates the rationale for and approves GSIM implementation standards. The current agreed implementation standards (as at December 2014) are SDMX and DDI. 17 business leaders, planners and process designers who are seeking to apply the Business Architecture from CSPA and who need to understand the connection between processes and information at a business level application architects and developers who are seeking to apply the Application Architecture from CSPA and who need to understand how Statistical Services interact with information A. Reference frameworks and their use 78. The Information Architecture will identify common reference frameworks to be used for aligning communication and high level (conceptual) designs. GSBPM will be used as a common reference when recording information in regard to business processes. GSIM will be used as a common reference when defining the information input to, and output from, business processes. A common reference framework for recording information in regard to the definition of Statistical Services is being developed as part of CSPA (this is described in Section V. Application Architecture). A common reference framework to use when describing statistical methods is a gap at this stage.8 79. The completed Information Architecture will not only identify the reference frameworks which apply but also provide guidance on how they are applied, in combination, within CSPA. B. CSPA implementation specifications 80. A major barrier to effective collaboration within and between statistical organizations has been the lack of common terminology. Using GSIM as a common language will increase the ability to compare information within and between statistical organizations. It allows all processes that lead to the production of statistics to be described in one integrated information model. 81. Although GSIM can be used independently, it has been designed to work in conjunction with the GSBPM. It supports GSBPM and covers the whole statistical process. It is assumed in this document that an organization either uses the GSBPM or uses another business process model, which can be mapped to the GSBPM. 82. In order for interoperability and reuse to be supported in practice when applying CSPA, the industry needs to do more than align conceptual designs using common frameworks. While GSIM is a conceptual framework for describing Statistical Information, when it comes to describing information objects in the real world we need to describe them in terms of standards 8 The value agencies achieve through applying CSPA would be greater if such a framework existed. If a framework is agreed in future then it will be referenced in the Information Architecture. 18 for representing those objects physically (i.e. in practice) in a manner which is consistent with GSIM. 83. It is necessary to specify how a conceptual design is to be translated into an implementation which is consistent and readily sharable in practice. 84. This "standard" means of operationalizing conceptual designs can be referred to as an implementation specification (in this case, GSIM Implementation). To this end, the following has been agreed: No firm recommendation has yet been made on implementation specification for business processes.9 Depending on what information is being represented in practice, DDI and SDMX are expected to provide the primary basis for CSPA implementation specification in regard to statistical information (e.g. data and metadata). An implementation specification for Statistical Services is being developed as part of CSPA. 85. There is a need to do more than simply refer to relevant existing standards such as SDMX and DDI. The CSPA implementation specification for Statistical Services will specify: Whether SDMX, DDI or a custom schema should be used for representing a particular GSIM information object, and Exactly how the chosen schema will be applied for the particular purpose. In many instances there are multiple technically compliant means of achieving the same business purpose, the implementation specification will specify which should be used. 86. Implementation specifications mean CSPA is prescriptive in regard to some practical details. While it would be simpler to align with CSPA if it was less prescriptive, the practical value from alignment would be much less. It is often the case that two developments which have a "common conceptual basis", but were implemented using completely unrelated approaches, are difficult and expensive to make interoperable and/or sharable (if it is possible at all). 87. In addition, an organization which has already implemented a different standard, or a local specification, can "map" their existing approach to the relevant implementation specification – they are not required to "rebuild" from first principles. 88. CSPA implementation specifications specify approaches which will support maximum interoperability/sharability on a cost effective basis. In particular cases it may be difficult for an organization to fully comply with a CSPA implementation specification (due to operational constraints). In these cases, compliance to the extent practical will, generally, still realize significant benefits. In other words, while CSPA implementation specifications set the bar reasonably (but not unreasonably!) high, it is recognized not all implementations may be able to achieve them fully in practice. 9 The probable relevance of existing standards such as BPMN and BPEL is expected to be considered at a later stage of development of CSPA 19 C. Information Architecture Principles 89. A number of principles which are common to most organization's information architecture (whether formally defined or not) have been agreed. These are outlined below. 90. Principle: Manage information as an asset Statement: Information is an asset that has value to the organization and must be managed accordingly. 91. Principle: Manage the information lifecycle Statement: All information has a lifecycle and should be managed to provide reliable identification, versioning and all information should be managed independently and beyond the scope of a single service. 92. Principle: Protect information appropriately Statement: All personal, confidential and classified data should be protected and the data should be treated accordingly. 93. Principle: Use agreed models and standards Statement: All information used as inputs and outputs to Statistical Services should be described using a common, business-oriented, reference model. A single standard should be used to define the encoding of each type of information. 94. Principle: Capture information as early as possible Statement: Information should be captured in a standard structured manner at the earliest possible point in the statistical business process to ensure it can be used by all subsequent services. 95. Principle: Describe to ensure reuse Statement: All information should be described in a manner that ensures information is reusable between services. Reuse is intended to reduce duplication, additional human intervention and reduce errors. 96. Principle: Ensure there is an authoritative source Statement: Information consumed and produced by services should be sourced and updated from a single authoritative source. Information should be consistent across all relevant services. 20 97. Principle: Preserve information input into Statistical Services Statement: Information that is input into services must be preserved in the service output to ensure no information loss. 98. Principle: Describe information by metadata Statement: All information consumed and produced by services must be described by sufficient metadata. V. Application Architecture 99. The Statistical Network Business Architecture project provides the following definition of Application Architecture: Application Architecture (AA) classifies and hosts the individual applications describing their deployment, interactions, and relationships with the business processes of the organization (e.g. estimation, editing and seasonal adjustment tools, etc.). AA facilitates discoverability and accessibility, leading to greater reuse and sharing. 100. The CSPA Application Architecture is based on an architectural style called Service Oriented Architecture (SOA). This style focuses on Services (or Statistical Services in this case). A service is a representation of a real world business activity with a specified outcome. It is self-contained and can be reused by a number of business processes (either within or across statistical organizations). 101. Statistical Services are defined and have invokable interfaces that are called to perform business processes. SOA emphasizes the importance of loose coupling. Interactions between Statistical Services are independent, that is, they do not talk directly to each other. Organizations will need a technology solution to support communication between Statistical Services. This solution (for example a communication platform) will not affect the interfaces. It should be noted that SOA is not the same as Web Services, although they are often used in SOA. A. Statistical Service Definitions, Specifications and Implementation Descriptions 102. The level of reusability promised by the adoption of a SOA is dependent on standard definitions of the services. CSPA has three layers to the description of any service. These layers are described in the following paragraphs and in Figure 5. Statistical Service Definition 103. The Statistical Service Definition is at a conceptual level. In this document, the capabilities of a Statistical Service are described in terms of the GSBPM sub process that it relates to, the business function that it performs and GSIM information objects which are the inputs and outputs. 21 104. A template of a Statistical Service Definition can be found in Annex 2. Statistical Service Specification 105. The Statistical Service Specification is at a logical level. In this layer, the capabilities of a Statistical Service are fleshed out into business functions that have GSIM implementation level objects as inputs and outputs. This document also includes metrics and methodologies. 106. A template of a Statistical Service Specification can be found in Annex 2. Statistical Service Implementation Description 107. The Statistical Service Implementation Description is at an implementation (or physical) level. In this layer, the functions of the Statistical Service are refined into detailed operations whose inputs and outputs are GSIM implementation level objects. 108. This layer fully defines the service contract, including communications protocols, by means of the Service Implementation Description. It includes a precise description of all dependencies to the underlying infrastructure, non-functional characteristics and any relevant information about the configuration of the application being wrapped, when applicable. 109. A template of a Statistical Service Implementation Description can be found in Annex 2. 22 Figure 5: Service interfaces at different levels of abstraction 110. In general, there will be one Service Specification corresponding to a Service Definition, to ensure that standard data exchange can occur. However, it is recognised that there may be occasions where an additional Service Specification is required, it is likely that this will be associated with variations in the methodology encapsulated within the statistical service. At the implementation level, services may have different implementations (software dependencies, protocols, supported methodologies) reflecting the environment of the supplying organization. Each implementation must rigidly adhere to the data format specified in the Service Specification. The CSPA Architecture Working Group is the final approver for statistical service documentation (Service Definition, Service Specification and Service Implementation Description), and is responsible for publishing through the Global Artefact Catalogue. The role of the CSPA Architecture Working Group is detailed in the Governance section of this document. 23 111. There are a number of roles identified in CSPA (see Section VII) which are involved in the definition, specification, and implementation of Statistical Services. These roles can be considered as operating at different levels of abstraction. Figure 6 illustrates the relationship between these levels and roles for instances where there is one Service Specification for one Service Definition. Figure 7 illustrates the case where more than one Service Specification is required for one Service Definition. Figure 6: Minimal Linkages between Statistical Service Definition, Specification, and Implementation Figure 7: Possible Linkages between Statistical Service Definition, Specification, and Implementation 24 B. Architectural Patterns 112. In general terms, an architectural pattern is a description of a widely recognized, reusable, proven solution to a specific class of recurring design problems. There are also “antipatterns” which are examples of what should not be done. 113. The benefits of using architectural patterns can be described by using the analogy of an expert chess player. To play chess, you must learn the rules and the principles (for example the value of different pieces). However, to improve and become a really good player, you need to learn the patterns used by more experienced players and apply them to your game. In the same way, you could use the principles and non-functional requirements of CSPA, but to get the maximum benefit, it is worthwhile to learn the architectural patterns applicable to your field of work. There are many architectural patterns described in literature. Service Designers, Service Builders and Assemblers in particular can benefit from studying these patterns. 114. Specifically, CSPA Assemblers may want to use either Request/Response and/or Publish/Subscribe patterns. These patterns explain how Statistical Services can be combined to build processes. Observe that these two patterns belong to two rather different architectural styles. 115. Note that, irrespective of which of these patterns are used, the communication between the communication infrastructure and the services can be either synchronous (connection between communication infrastructure and service remains active until the service has finished processing the request and the response is received) or asynchronous (the connection is broken, the service continues processing and a new connection is established later to transfer the results). Request/Response pattern 116. The Request/Response pattern is the basic pattern. The infrastructure makes a call (Request) to a service on behalf of a particular process instance and returns the result (Response) received to that same process instance, where the orchestration of that instance decides which service(s) should be called next. 117. The Request/Response pattern for activating services implies a rather fixed routing of messages between services. The integration infrastructural platform implementing the process "orchestrates" the routing and executing of services. 118. An example of how this pattern could be applied in relation to collection is each questionnaire is stored in an entity service. The entity service exposes the operation to get the questionnaire through a service call. The indicators are computed and stored and made available using an entity service call. 119. The Request/Response pattern can be used if: A functional style and sequential flow is required It is known precisely which service interface should be called 25 120. This pattern leads to less flexibility and tighter coupling between services than the Publish/Subscribe pattern described below. Publish/Subscribe pattern 121. The Publish/Subscribe pattern belongs to the Event-driven architecture paradigm, which can complement service-oriented architecture (SOA) because services can be activated by triggers fired on incoming events. 122. In the Publish/Subscribe pattern, the output-message of a service is routed to any number of consumers, each of which has subscribed to this particular type of message (much like a user subscribing to an RSS-feed). In this case, there is no predefined, fixed sequence of steps that can be considered an orchestrated process; the process is what happens because of the messages generated and the actions that individual services decide to take on the basis of messages received. Such messages are considered Event Messages, rather than results of specific Requests. 123. An example of how this pattern could be applied in relation to collection is when each questionnaire completed publishes an event that is available for subscribers downstream. Early indicators can be produced by processing collection events straight through to aggregation. 124. The Publish/Subscribe pattern10 can be used if: All recipients that may be interested in the event should be notified It is not exactly known which and how many recipients are interested in the event It is not known how recipients respond to this event Different recipients respond differently to the same event Only one-way communication from the sender to the recipient is possible C. Non-Functional Requirements 125. In the context of CSPA, a non-functional requirement is a requirement that relates to the operation of a system. While functional requirements define what the services does (for example, error localization), the non-functional requirements describe a performance characteristic of a system (for example, authorization of who can access the resources and functions of the service). That is, non-functional requirements determine how a service behaves rather than what it should do. 126. Non-functional requirements are important to be captured in the design of the services. They have a significant influence on the software architecture of a service. The Designer11 of a Statistical Service should identify the non-functional requirements that are relevant to that 10 Wikipedia: This pattern provides greater network scalability and a more dynamic network topology, with a resulting decreased flexibility to modify the Publisher and the (structure of the) data published. Note: even in SOA, there is a dependency between “generators” and “consumers” of messages, both on the semantic and the technical levels. However, in SOA the relationships are generally better known. 11 NB: A description of all CSPA roles can be found in Section VII of this document 26 service when they are designing it. The implementation of a Statistical Service provides some functional value when assembled into a value chain within an organization. The non-functional requirements of a Statistical Service address other concerns or behaviours of the service such as performance, security, process metrics and error handling. This section provides some guidance on these concerns. Multilingual Support 127. It is necessary to provide multilingual support to enhance the usability and the capacity to share Statistical Services. All services must be documented at least in English in addition to the local language(s) of the organization developing the Statistical Service. It is highly recommended that organizations that have made translations of the documentation of a Statistical Services in additional languages make them available to the community. Security 128. For the purpose of this document, the security concern relates to controls that are put in place to mitigate the risk that a Statistical Service or the data it controls is misused. This section provides some basic guidance on some of these controls. However, in general it is strongly advised that each Statistical Service implementation complete a Risk Assessment and document a Risk Mitigation Plan for high and extreme risks identified in the assessment. Authentication and Authorization 129. For users interacting with a Statistical Service, the process of identifying who they are (authentication) and working out what resources and functions of the service they can use (authorization) will need to be managed as part of the service. Given that the security architecture and services is a local organization concern, our goal is to avoid excess complexity in these interactions. 130. Authentication should be accomplished through interaction with the communication platform's authentication function. 131. Authorization controls are the concern of the service implementation. Authorization will be controlled by either administrative interfaces on the service or via a GUI-based client for the specific service. 132. Future iterations of CSPA will look at options to 'design in' support for single sign on. Consideration will also be given in future CSPA versions to how the architecture can describe a common approach for the communication platform to pass authorization information along with other service context information. Data at rest 133. Data at rest is of particular interest when a Statistical Service needs to defer state (see discussion in "Service Statelessness" in section V D). Under this circumstance, the security (e.g. encryption requirements or access control) of the data are entirely the responsibility of the 27 Statistical Service. Where a Statistical Service already has a functional dependency on underlying technologies or platforms, it would be reasonable to make use of security functions available in those technologies. Data in transit 134. Security of data in transit (e.g. contained within a message flow as part of a service invocation) will be considered in future iterations of CSPA. Data Sensitivity 135. Sensitivity of statistical data varies amongst organizations and at this stage the architecture does not attempt to converge on a standard definition or treatment. Machine to machine certification 136. Guidance for this will come in a future iteration of CSPA. Organization specific implementations based on assembly time infrastructure can assure security for service communication (for example, use of a VLAN). Performance 137. No specific guidance is provided on the performance characteristics. However, they should be declared in the Statistical Service Implementation Description and it is recommended that examples of performance level are included. Process Metrics 138. A Statistical Service will generally capture metrics related to the function that it performs. To all intents and purposes, these process metrics are treated by the Statistical Service as just one of its outputs and should be reflected as such in the Statistical Service Specification. Error Handling 139. Error handling, in this case, relates to situations where the service fails. Error handling is left to the communication platform to handle as required. Generally there will be protocol specific requirements for flagging errors. The error codes and their meanings need to be documented in the Statistical Service Implementation Description. D. Implementing Protocols in a Statistical Service Service Statelessness 140. The principle of service statelessness is, in essence, that - An individual service should be called with all the information it needs to complete, the service shouldn't rely on previous execution. However there are cases where a Statistical Service needs to defer or persist some form of state. This principle trades off flexibility of the service to fit into a particular process with scalability of the service. 28 141. For CSPA, this means that there are certain situations when the Statistical Service needs to be able to keep information within the Service until a later time. 142. When designing and building a Statistical Service, the capability of deferring state information is important in two specific situations: When the Statistical Service needs to be used in a publish\subscribe design pattern. When the Statistical Service involves human-interaction and can therefore be considered long-running. 143. Statistical Services with the capability of deferring state needs to provide an endpoint to support enquiries about the deferred state. If the Statistical Service is invoked but it lacks some or all of the required information to perform the service invocation, the Statistical Service should handle this using error handling. Event Messaging and Request Response 144. Communication to and from a Statistical Service could be done using several communication patterns. The two main patterns of communication that CSPA supports are Request Response and Event Messaging, as described above. These patterns differ mainly in when the information is being transferred. 145. In the Request Response communication pattern, information is requested whenever a Statistical Service needs it. The information need could be triggered by either a humaninteraction with a Statistical Service or when an automated Statistical Service is called from a communication platform. 146. In Event Messaging, information is transferred as it is created. This means a Statistical Service does not need to request information as information is provided to all relevant Statistical Services at the time of creation. 147. The decision as to whether a Statistical Service is built to support both of these communication patterns is related to the environments in which the Statistical Service will work. Supporting both communication patterns is preferred as this allows the Statistical Service to function both in Statistical Organizations using request-response as the main pattern as well as in Statistical Organizations using Event Messaging as the main pattern. 148. From the point of view of a Statistical Service, there is no difference between the Request-Response and Event Messaging communication patterns, as the provision of information is done in the same manner. When a communication platform supplies the information that the Statistical Service needs, it does so by calling the relevant endpoint provided by the Statistical Service. 149. To support a Request Response communication pattern, the Statistical Service must provide some endpoint where an information request can be made. One possibility is that the Statistical Service is required to have an endpoint that can be supplied with parameters. These 29 parameters would describe what information should be part of the response. For example, the parameters could include being able to specify a timeframe or a context. 150. To support Event Messaging, the Statistical Service needs to be able to push information out from the Statistical Service when it is created. This could be done using a configurable endpoint or message queue provided by the communication platform. Configuring the endpoint should be done by the Service Assembler. How to invoke a service 151. A protocol is the technical implementation of a communication mechanism. It is used to invoke the Statistical Services deployed in a CSPA implementation. 152. A Statistical Service Implementation Description must specify one or more protocols. These protocols are associated with the following aspects of a Statistical Service: Making the Statistical Service reachable as an endpoint for invocation Accessing data that are declared to be passed by reference in the Statistical Service Specification 153. In the following, we provide a list of protocols that are accepted within Service implementations conforming to the CSPA specification. Protocols tagged as "recommended" should be considered in the first place because they represent established industry standards and as such they are likely to be supported in most organizations. Protocols that are accepted yet not recommended are listed for supporting legacy requirements of some organizations. Protocols for invoking service endpoints 154. 155. The protocols for invoking service endpoints which are recommended by CSPA are: SOAP Web Services – Service exposes a WSDL interface and is addressed by a http URI REST Web Services - Service exposes a REST interface and is addressed by a http URI There are also a number of other protocols which are acceptable. These are: Microsoft Message Queue - Service is a MSMQ consumer Java Messaging Service - Service is a JMS consumer File-based invocation – the service is "invoked" when a file is placed at a known location which results in an OS-level trigger to the service; alternatively, the service can poll the location for arrival of "message" files and treat them as service invocations Command line interface - Service is invoked by specifying a command line to be executed on operating system runtime accessible by the platform. 156. CSPA recognized that there are other future protocols. However, these will require further exploration: 30 Possible use of "stream control transfer protocol" (sctp), an efficient guaranteed delivery, connectionless, lightweight transfer protocol 157. In some instances, existing tools support database access. If the database is involved in transfer (and not merely as a local state storage for the service), we recommend that the database access be mediated through the http: protocol. 158. In general, the use of an "out of band" data transfer mechanism should be avoided wherever possible, and used only in circumstances involving the need to transfer large volumes of data. Its addition adds increased coupling between the architecture and services, so its use must be managed carefully. Protocols for passing data by reference 159. A data reference must guarantee, in the context of a specific protocol, that it can uniquely identify a dataset, file, etc. Passing data by reference requires the communication platform to support the resolution of an identifier through the specific protocol it refers to. Managing large data volumes Figure 8: Managing large data volume transfers using "pass by reference" 160. As a general rule, service invocation will involve the Statistical Service receiving a message via the organization-provided communication platform. This message will contain the necessary information objects as well as the requested service. 161. In certain circumstances, the service requires large data sets as inputs. Examples of this could include administrative data files or large survey response files. The problem is similar to a "pass by value" situation in that the input data is passed to the service via in-message approaches. Problem 162. There are a number of problems that can arise if we attempt to send these data sets via the messaging interface: 31 Dataset transfer time can be slow due to messaging overhead (packing / unpacking of data, message segmentation and reassembly, etc.) Communications platform performance may degrade due to the load of transporting the messages between services Service memory requirements can increase before required use (see State Deferral discussion) Solution 163. In order to address this problem, we suggest a "pass by reference" mechanism (see Figure 8) that avoids the need to use the communication platform messaging layer to transport these large data sets. 164. The approach is as follows: The data set being sent to the service is stored in a source location in a manner local to each organization – the location name is associated with a Uniform Resource Identifier (URI) The service consumer invokes the requested service by sending it a message containing the URI for the dataset The service provider receives the URI reference and when ready attempts to retrieve the dataset from repository or cache. If successful, it executes its service actions Upon completion, it may update or place a resulting dataset (if relevant) in the repository or cache 165. The implementation of a data source is local to each organization and may be implemented as part of the communications platform. Organizations may choose to implement a utility service, a repository, a file cache, or some other mechanism. URI management is also a part of local operation. 166. CSPA provides the following guidance for service input dataset retrieval protocols: Recommended protocols: Simple http: file transfer from data source to the service logic (without additional protocols such as REST) Acceptable protocols: ftp: file transfer from the data source to the service logic Use of network file system services (such as SMB, NFS) with appropriate file reference Not Recommended: Database retrieval using queries 32 E. Application Design Principles 167. The design principles have been selected to maximize the flexibility of the Statistical Services wrapped or developed in the context of CSPA. The flexibility of the Statistical Service directly impacts the level of reuse, the flexibility required of the industry vision and the ease with which a statistical organization can implement a Statistical Service. 168. Principle: Maintain independence between design and implementation Statement: The descriptions of Statistical Services are layered in conceptual (Statistical Service Definition), logical (Statistical Service Specification) and implementation (Statistical Service Implementation Description). 169. Principle: Use available standards Statement: The design of Statistical Services should align with, and harness, relevant existing standards and frameworks wherever possible. 170. Principle: Use architecture patterns Statement: Follow architecture patterns depending on best fit to requirements. 171. Principle: Implement using GSIM Statement: Manage standardized service contracts based on GSIM objects 172. Principle: Minimize coupling Statement: Enable services to be loosely coupled externally and be aware of internal coupling. 173. Principle: Maximize Service Autonomy Statement: Maximize service autonomy (completeness) to enable share-ability and reusability (External & Internal) 174. Principle: Include non functional requirements Statement: Non-functional requirements form a key input in design decisions. VI. Technology Architecture 175. The Statistical Network Business Architecture project provides the following definition of Technology Architecture: 33 "Technology Architecture (TA) describes the IT infrastructure required to support the deployment of business services, data services and applications services, including hardware, middleware, networks, platforms, etc." 176. Within each statistical organization, there needs to be an infrastructural environment in which the generic services can be combined and configured to run as element of organization specific processes. This environment is not part of CSPA. CSPA assumes that each statistical organization has such an environment and makes statements about the characteristics and capabilities that such a platform must have in order to be able to accept and run Statistical Services that comply with CSPA. 177. Platform for Service Communication: A communication platform provides the capability for communication between Statistical Services. It enables inter-service communication while allowing Statistical Services to remain autonomous and adds additional capabilities for monitoring and orchestrating the information flow. To assemble a built Statistical Service, the communication platform is updated to integrate with new services. There are multiple ways of establishing a communication platform. Examples of architectural components could be BPMS, ESB, Workflow Engines, Orchestration Engines, Message Queuing and Routing. 178. Platform for Configuring and Controlling Services and Processes: The Platform for Controlling Service and Process execution encompasses the functionalities and tools to support the management and maintenance of services metadata, artifacts and policies. Examples of how this mechanism could be achieved include Business Process Modelling System, Lifecycle Management, Service Monitoring and Management. 179. Platform for Reporting on Services and Processes: The Platform for reporting is responsible for enabling real-time monitoring and near-real-time presentation of user defined business key performance indicators (KPIs). Examples of how this mechanism could be achieved are Static Dashboard or Business Activity Monitoring (also generates alerts and notifications to user when these KPIs cross specified thresholds). A. Communication Platform 180. CSPA provides guidance on the way that organizations should go about building new or wrapping existing Statistical Services. When the time comes for an organization to use a Statistical Service that conforms to CSPA there are some organization specific technology approaches that also need consideration. 181. CSPA does not specify how organizations will coordinate the use of Statistical Services to implement a wider business process. Organizations will need a technology solution to support communication between Statistical Services since the Statistical Services are not to talk directly to each other. 182. Where the Statistical Service being used is largely independent, interfaces between that Statistical Service and others may be manually managed by a person. There may also be other 34 relatively trivial uses of Statistical Service where a bespoke solution to integrate them is developed. These are sub-optimal yet pragmatic ways of achieving reuse of Statistical Services. 183. Where the integration of Statistical Services is non-trivial, a communications platform of some sort will usually be required. The key functions of the communication platform are: Orchestration - managing the sequence of flow of invocations of the Statistical Services Error handling - where Statistical Services fail or where the output of services contain erroneous cases that require a different treatment Message payload translation - in particular where a Statistical Service does not support standard GSIM implementation objects - It is possible to offload this function to specialized Statistical Service Auditing, Logging, Activity Monitoring Performance Management Security 184. Figure 9 illustrates the relationship between the elements that are specified in CSPA and the underlying Communications Platform that is local to an Organization. Figure 9: Statistical Service Components and Communication Platforms 185. In this diagram, two Statistical Services (Edit, Coding 1) have been defined and specified in compliance with CSPA, and their implementations are communicating with each other within the environment of a statistical organization. The Statistical Service instances communicate with each other through the organization's communication platform – this may be a full SOA implementation (bus or broker), a CORE implementation, or some other more rudimentary platform (or no platform at all). 186. It is important to state that CSPA does not prescribe the capabilities and architecture of the underlying Communications Platform – it instead assumes that an organization's Assemblers and Configurers will be responsible for addressing how the platform supports the use of CSPAcompliant Statistical Services. This allows CSPA and its Statistical Services to be used by the widest possible community amongst statistical organizations, all of who may be in different stages of development and modernization. 35 187. In the diagram one can see that there is a second Coding Statistical Service (Coding 2) that doesn't (yet) implement the complete CSPA Statistical Service Specification due to a transitional state. An organization may optionally make use of some form of translation service to address differences between their interfaces (typically at the information encoding level). This is seen as a transitional state – the goal is to ensure that all services adhere to the Statistical Service Specification, while allowing for differences at the Statistical Service implementation level at the protocol level (and underlying platforms). VII. Roles 188. Using CSPA will create new functional roles within a statistical organization. These roles may already exist in some statistical organizations and may have particular people (for example, Chief Information Officer, Enterprise Architect) assigned to them. CSPA makes no recommendation about who in an organization should play these roles – this is left to each statistical organization to decide who performs that role. This section and Figure 10 explain the functional roles in the form of a user story. Figure 10: Roles in CSPA Investor Story 189. The Investor receives demand for a new data collection and needs to compare the cost of running a collection using traditional methods, with the cost of using a set of components as per CSPA. The Investor identifies existing Statistical Services to be used and identifies gaps where no Statistical Service already exists. The Investor weighs up creating a fully bespoke processing 36 solution for the collection against having to build a new Statistical Service that fits into a set of existing Services. This would be done in consultation with other roles. 190. One example would be a case where the government legislates the requirement that some data collections must support people completing their reporting obligations on-line using eForms. The investor reviews the impact of this change and identifies that there is a gap in the Services required to support rendering the design of an eForm for respondents to use. The investor assesses the relative cost and how quickly the new collection capability can be implemented. 191. To assess the cost the investor needs to review what existing Statistical Services are required to implement the new Statistical collection, review the Statistical Service catalogue and identify that an eForm capability is missing from the current catalogue. The requirements for the collection, e.g. the requirement to perform an early release of data after 30 days, could cause some functions to be run in a repeated fashion using coding and editing Services. Once decisions as to how the data collection is to be run are made, these are conveyed to the Designer for more detailed work. Technical Specialist roles Designer Story 192. The Designer has been given a set of business requirements at a high level from the Investor, describing what data is needed, and some parameters of the process. In order to determine what functionality is available, the Designer will consider internal and external capabilities. This will involve a search in the Statistical Services Catalogue, to determine if there are external candidates or catalogued internal candidates. When a possible internal candidate is found, there will be a decision made as to whether the existing functionality should be wrapped and exposed as a Statistical Service, or whether a new Statistical Service should be built. In the latter case, potential collaborators should be identified and negotiated with the Investor. There may not be internal existing functionality. This assumes an up-to-date Statistical Services Catalogue which is known and useable. Drivers would include lower cost, synergy with existing applications and other organizations, and a more generalized approach which will realize greater efficiencies. 193. Once development has been decided, or in the case where existing functionality must be heavily modified, for each needed Statistical Service the Designer will specify the needed functionality to meet requirements. The Statistical Service is defined on a conceptual and logical level by a Service Definition using GSIM information objects and a Service Specification using GSIM implementation objects as presented in Figure 5. 194. On an implementation level, decisions must be made about how to realize needed functionality using technology approaches and these are documented in a Service implementation definition by the Builder. 195. Service design across these levels includes information design, technology design, workflow, interface design, and other relevant aspects. The Designer will address Service 37 dependencies, Service contracts, extensibility, and all data, metadata and performance metrics. Capabilities for configuration by users will also be determined, as well as the degree of configuration to be implemented by the Builder. 196. An alternative scenario is one where Statistical Services are already available, having been found in the Statistical Services Catalogue, and meeting all identified requirements. In this case, the Designer specifies which Statistical Services are to be used, and specifies this for the Assembler to work with directly. Service Builder Story 197. The Service Builder receives Statistical Service definition and a Statistical Service specification from the Designer. The Statistical Service is then implemented by the Builder, by creating a Service implementation definition. The Builder will also implement features of the Service so that the Assembler can integrate it into the local environment so it can be deployed. The Builder tests the components to support the specified functionality. Service Assembler Story 198. The Assembler will take the Statistical Service and integrate it according to the understanding of the needed business process, as expressed in the design documentation. This might take place within a process management environment. There are two cases for the Assembler: one where the required Statistical Services would be entirely assembled within the local environment, which provides a high degree of confidence in their compatibility. The second scenario involves the use of external Statistical Services, which might require extension or modification. In this latter case, issues would be communicated to the Designer and Builder for further development. Non-Technical Specialist Roles Configurer Story 199. The Configurer takes the assembled process, and makes it suitable for use in the intended statistical domain. Parameters are specified according to the domain knowledge of the Configurer. Any issues with the assembled Service are communicated to the Designer, Builder, or Assembler. User Story 200. There is no single user but a chain of users along the Statistical Process. The user chain covers everyone from the designers of surveys, through the conduct of data collection operations, through to those who process the collected data. The User does not need to know where the data and metadata are stored - in particular the user does not need to actively manage how data flows between parts of the processing environment. 201. Once the survey has been designed and administered, the next User is alerted of the arrival of survey responses, including some data which will be auto-coded, and other data which needs manual coding. Once coded, the data would go through a series of edits, including data cleaning and validation, imputation, confidentialization, etc. the User should be able to perform these operations without experiencing a high degree of frustration or complexity. 38 VIII. Enablers A. Catalogues 202. A primary aim of CSPA is to support efficient sharing and reuse of process patterns, information and services at an organization and international level. 203. One key requisite in achieving this goal is an ability to reliably and efficiently discover what is available for reuse to support a particular business need. This includes an ability to efficiently assess whether a potentially reusable artefact is, in fact, "fit for purpose" in practice when it comes to supporting that particular business need. 204. Catalogues of reusable resources have a key role within CSPA. They provide lists and descriptions of standardized artefacts, and, where relevant, information on how to obtain and use them. The catalogues can be at many levels, from global to local. For example, it is envisaged that each statistical organization will have catalogues of processes, information objects and Statistical Services. 205. However, for the purposes of the CSPA Reference Architecture, it is the global level that is the primary interest. The global catalogue is called the Global Artefact Catalogue. The Global Artefact Catalogue provides information about resources and potential collaboration partners, helping to ensure that Statistical Services conform to the requirements of CSPA. Governance and support mechanisms and processes are being defined to ensure the continued relevance and utility of the Global Artefact Catalogue12. 206. The Global Artefact Catalogue supports the lifecycle management, governance and use of components and services, providing the right level of artefacts for service design, integration and transition to a production environment. 207. The Global Artefact Catalogue is not designed to be platform specific, and will not necessarily hold copies of executable code. It will, however, provide all necessary information about how to access the artefacts, including contact details for further information. 208. The Global Artefact Catalogue contains information about developments that are planned or in progress by statistical organizations. This information will facilitate the creation of a global roadmap of statistical organizations developments that can be consulted to see which organizations are developing what and when. This part of the Global Artefact Catalogue will be available during 2015. 209. The types of artefacts in scope for the Global Artefact Catalogue are: Frameworks including artefacts such as the GSBPM and the GSIM 12 The current governance responsibilities of the Architecture Working Group in relation to the Statistical Service elements of the Global Artefact Catalogue are outlined in the Governance section 39 Standards including GSIM Implementation, DDI and SDMX Policies supporting governance and use, for example "no changes to GSBPM or GSIM unless they are the result of a formal approval process" Guidelines concerning the practical use and integration of artefacts. For example the functionality of Statistical Services (especially interfaces) and descriptions of how to 'plug' component in Project plans about developments, in progress or planned, to promote collaboration at the earliest possible stage in the development of artefacts, e.g. "Statistics New Zealand aim to develop and make available component X by (date)". This will provide the basis for a global roadmap. Analysis and metrics of real life implementations of services or architectural principles Other knowledge assets could include architectural principles and information on how to use the layers of the Global Artefact Catalogue 210. The Information Architecture will guide the approach to defining, populating and using catalogues. This will include identifying and reusing appropriate reference frameworks related to registries. (These reference frameworks, unlike GSIM, are not specific to statistical production.) 211. The Global Artefact Catalogue is a composite conceptual catalogue representing 5 layers of information. Figure 11 below outlines the defined layers and their current implementation. The complete CSPA Global Artefact Catalogue Framework Overview is included in Annex 3. Figure 11: CSPA Global Artefact Catalogue Layers 212. Links to the current implementation of the layers of the Global Artefact Catalogue are provided below: 40 (Layer 1) Knowledge Base (Virtual Help Desk) - via the UNECE Confluence wiki (http://www1.unece.org/stat/platform/display/VSH) (Layer 2) Business Plans and Interest - an initial test implementation will be available in 2015 (Layer 3) Business Capabilities - an initial test implementation using the business capability work of the Statistical Network Business Architecture project and the Eurostat Enterprise Architecture will be available in 2015 (Layer 4) Statistical Services – A prototype catalogue was created on the UNECE Confluence wiki. This is in the process of being transferred to Eurostat for production implementation (Layer 5) Technical/Supporting Services - currently accessed via the Service Implementation Specification for the Statistical Service, the current repository is Github.com, an example can be viewed at https://github.com/edwindj/cspa_rest/tree/ea9d59b5afbb4764cca6b34f5d3120f3cf7c16e6 B. Governance 213. The exact details of the final governance arrangements of CSPA are still being finalised, the CSPA 2015 project will determine future governance arrangements and transition these to the HLG Modernisation Committee on Production and Methods. In the following paragraphs, the current state is outlined in terms of the groups involved and the role they play. High-Level Group for the Modernization of Statistical Production and Services (HLG) 214. CSPA is being developed as a series of projects overseen by the HLG. This group will be the custodians of the outputs. Whilst the architecture itself will be "owned" by the international official statistics community, it will be administered by the HLG, as top-level representatives of that community. HLG Executive Board 215. The HLG Executive Board is responsible for the strategic management of on-going HLG projects. It also prepares new project proposals for agreement by the HLG, and seeks support and resources from interested organizations. Following the completion of a project, the board monitors, and where necessary, coordinates the implementation or use of the project results. 216. In particular regard to CSPA, this group takes a strategic view on where investments into Statistical Services could be made internationally; this is a key focus of the 2015 CSPA project. For example, they may decide that the statistical industry has a particular gap around a capability or activity such as confidentiality. The group would look at what international collaboration could be undertaken to fill this gap. The Modernization Committee on Production and Methods provides input into these discussions. Modernization Committees 217. The UNECE, on behalf of the international statistical community provides leadership for 41 maintaining and extending CSPA to retain its relevance and value as an 'industry asset' through the Modernization Committees. 218. The development of the CSPA Reference Architecture, the Global Artefact Catalogue and initial Statistical Services are only the first part of the story. The CSPA Reference Architecture and the curating of catalogue content will require on-going maintenance and revision to ensure continued relevance. As the architecture is a common asset for the international official statistics community, there needs to be a clear and transparent process for change management. 219. For practical purposes, proposals for future changes will usually be discussed in the Modernization Committee on Production and Methods, or a specific task team under that Modernization Committee. These groups will then formulate recommendations for change, which will typically be bundled together as a package to be formally signed off. 220. The HLG, Executive Board and the Modernization Committee will need to carefully balance the advantages of change, in terms of increasing relevance and usefulness, against the costs of having to implement those changes within statistical organizations. A reasonable degree of stability over time is therefore a key requirement for the CSPA Reference Architecture and associated enablers. Modernization Committee on Production and Methods 221. The Modernization Committee on Production and Methods is the ultimate custodian for the CSPA Reference Architecture and the Global Artefact Catalogue. The 2015 CSPA project (work package 4) will result the effective transitioning of the day-to-day custodianship for these resources. 222. The key responsibilities to be assumed by the Modernization Committee on Production and Methods after the 2015 CSPA project will include: Oversight of the Global Artefact Catalogue hosted by Eurostat Monitoring of the on-going support model for CSPA (including the life-cycle management of statistical services) Working with NSOs to identify practical project opportunities to develop and implement CSPA services in full production and populate these in the Business Plans and Investment area of the Global Artefact Catalogue Formal CSPA governance and supporting the Architecture Working Group and the Technical Coordination Committee (full details of these roles and responsibilities will be developed during the 2015 CSPA project) Modernization Committee on Standards 223. The Modernization Committee on Standards is the custodian for several key international frameworks (GSBPM, GSIM and the soon to be released GAMSO – Generic Activity Model for Statistical Organizations). 42 224. In relation to CSPA, The Modernization Committee on Standards is the approval body for the GSIM implementation standards. The approved GSIM implementation standards are SDMX and DDI. It is recognised that there are certain scenarios where the use of DDI or SDMX is not appropriate for example; where these standards do not cover the type of information that is required, or, where these standards cover the type of information required but for implementation specific reasons they are unwieldy or overly inefficient. In such cases the use case and rationale for the inability to use DDI/SDMX, and the proposed new standard will be documented and submitted to the Modernization Committee on Standards for approval. The Modernization Committee on Standards will validate the rationale for using other standards and advise their decision. Where appropriate the relevant standards bodies/experts will be consulted to ensure alternative approaches do not exist. 225. Current advice from the Modernization Committee on Standards for GSIM implementation is: DDI/SDMX should be used for GSIM implementation where possible. All endeavours should be made to use these standards before others are considered CSPA Reference Architecture to adopt DDI V3.2 as a basis for their implementation, acknowledging that within the next 18 months to 2 years DDI V4 may present additional coverage and that this standard will be actively contributed to by HLG members and adopted for GSIM implementation if it meets needs. The DDI Alliance has confirmed there will be a straightforward migration path from DDI V3.2 to DDI V4. The use of DDI Fragments is encouraged, in preference to DDI Instances, as the method of expressing DDI3.2, acknowledging their intended use for web service implementation. The Modernization Committee on Standards will initiate work, in collaboration with the 2015 CSPA project and the Architecture Working Group, to develop best practice guidance for the use of DDI/SDMX to implement GSIM. This will include identification of key use cases requiring guidance (driven by CSPA requirements), description of which DDI constructs should be used to implement particular GSIM objects, development of DDI Profiles for use in secondary validation and enumeration of currently available tools. The Modernisation Committee on Standards will ensure that support people are identified to assist with DDI/SDMX implementation issues as they arise. The Modernisation Committee on Standards agrees to investigate the feasibility of a project to create Java and .NET development libraries and will raise the possibility with the standards bodies. Architecture Working Group 226. The Architecture Working Group (AWG) has operated since early 2013 under the CSPA projects. The members of the AWG are drawn from a range of backgrounds including IT, enterprise architecture, methodology and standards. 227. The AWG provides advice to statistical organizations who are planning, designing and developing (buy or build) CSPA compliant Statistical Services. The Architecture Working Group approves Service Definitions, Service Specifications and confirms that Service Implementation Descriptions reflect the agreed Service Definitions and Specifications. They also approve the addition of these into the Global Artefact Catalogue (Statistical Services layer). 43 228. The job of the AWG is to manage the integrity and robustness of the catalogue layer for statistical services, therefore the AWG: Approves requests for new Service Definitions Approves requests for additional Service Specifications where one already exists for a Service Definition, e.g. methodological differences Approves requests for additional Service Implementation Descriptions where a production implementation exists, this may include requests for additional GSIM implementation standards, e.g. a DDI implementation exists, but there is a case to be made for an SDMX implementation. 229. Additionally, the AWG guides the CSPA compliance of Statistical Services, therefore Service Definitions and Service Specifications are assessed taking into account: The granularity of the service The naming of the service The information in the templates is understandable/readable Whether the service is "clean" that is, it stands alone The methods The input The outputs The Non Functional Requirements 230. The bolded items are specifically checked in the Service Definition. All items are checked for the Service Specification. 231. A review of the 2014 CSPA project identified that the technical implementation governance is a significant area for improvement. The 2015 CSPA project (work package 1) will see the expansion of the role of the governance and support offered by AWG to cover implementation. The work package will focus on findings and lessons learned from the CSPA 2014 to extend the governance and support offered by the AWG to the implementation of CSPA compliant statistical services. This will be achieved through the following activities: Review of implementation issues and lessons learned during the 2014 project Identification of AWG activities required to support CSPA implementation i.e. Awareness, Research, Best Practices and Capacity and Capability Building Implementation of identified activities including management of CSPA Establish and provide ongoing assistance to CSPA Technical Coordination Committee and extend support for implementation projects in NSO’s 232. Figure 12 below outlines the proposed governance scope for the AWG to be implemented under the 2015 CSPA project. 44 Figure 12: Proposed governance scope of the Architecture Working Group 233. The output from this work package will take the form of CSPA implementation resources, consisting of guidelines, training material and processes, and CSPA compliance reviews. The task of maintaining the content after its initial formulation will be overseen by the HLG’s Modernization Committee on Production and Methods. Technical Coordination Committee 234. A review of the 2014 CSPA project has identified that the technical implementation support is a significant area for improvement. The 2015 CSPA project (work package 2) will see the establishment of a Technical Coordination Committee to support NSO’s who are developing or implementing CSPA compliant statistical services. This will be achieved through the following activities: Establishment of CSPA Technical Coordination Committee Review of implementation issues and lessons learned during the 2014 project Implementation of additional collaboration tools, testing environments and other implementation tools to support implementation efforts Support putting in place technical implementation communities (proposed to be short lived) which will allow fine grained focus and project management for the specific implementation Technical assistance to implementation projects in NSO’s including guidelines, templates, training and helpdesk Bridging the gap between SDMX / DDI and the needs for information flows between services 235. Figure 13 below outlines the proposed scope for the Technical Coordination Committee to be implemented under the 2015 CSPA project. 45 Figure 13: Proposed scope of the Technical Coordination Committee 236. Figure 14 summarizes these groups and roles. Figure 14: CSPA Governance structure C. Legal, Licensing and Financial Considerations 237. "The mission of the HLG is to oversee development of frameworks, and sharing of information, tools and methods, which support the modernisation of statistical organizations. The aim is to improve the efficiency of the statistical production process, and the ability to produce outputs that better meet user needs". CSPA provides an industry reference architecture enabling the development of statistical services for sharing and reuse, however, it does not provide a framework covering the legal, licensing and financial 46 considerations. Statistical organizations involved in HLG activities are now interested in collaborating more effectively to share, develop and enhance joint products. The legal, licensing and financial considerations are critical enablers of effective sharing and reuse of statistical services that have been developed by statistical organizations (either alone or in collaboration with others). It is noted that any CSPA compliant statistical services developed and owned by a 3rd party vendor or community will probably have specific legal, licensing and financial considerations. 238. The Modernisation Committee on Organisational Framework and Evaluation is leading work, on behalf of the High Level Group, to establish the framework for decisions related to the property rights for international collaboration and sharing efforts. Needs, considerations and questions have been identified and a review of all available literature and licensing models has been undertaken. The HLG will consider a proposal for a collaboration community based on a memorandum of understanding. 239. In the interim the following advice has been provided for Statistical Organizations contributing statistical services, noting that each organization must assess this advice against national legislation or policy. Intellectual Property Choose an appropriate license (See: www.choosealicense.com) For software like CSPA statistical services: Gnu Public Licence v3 (GPL v3) For other products like guidelines and other artefacts: Creative Common License or Public Domain (case-by-case decision) 47 Annex 1: Describing “Statistical Production” 1. To enable the documentation and understanding of CSPA to be efficient and consistent, it is necessary to define a number of core concepts which are relevant for all readers (e.g. managers, subject matter statisticians and technologists) in regard to "Statistical Production". 2. It is particularly important to have a common understanding of the concepts Business Function, Business Process and Business Service13. The terms and definitions used in CSPA for these concepts are drawn from GSIM. The terminology and modelling of GSIM aligns with The Open Group Architectural Framework (TOGAF). TOGAF is widely known for defining architectural frameworks. A traditional view of Statistical Production 3. Before GSBPM, it was common to conceptualize statistical production as a "value chain" (i.e. a chain of activities that an organization operating in a specific industry performs in order to deliver a valuable product or service for the market). 4. Different lines (e.g. different statistical domains) of production typically follow fundamentally similar value chains – although they are not identical in detail. This is commonly characterized along the lines shown in Figure 1. Statistical Domain Phase of Production Specify Design Build Collect Process Analyse Disseminate Evaluate Needs 1.1 Population 1.2 Labour … Figure 1: Simple "matrix" view of Statistical Production 5. When each statistical domain's line of production is designed and implemented largely in isolation, this is commonly referred to as a stove pipe approach to statistical production. 13 Terms noted in bold text are GSIM terms 48 6. This is recognized as highly inefficient. It means duplication in building and maintaining production infrastructure (for example tools, trained staff) to support production lines, as well as lost opportunities in regard to economies of scale and flexibility. 7. Within an organization, the line of production for an output such as national accounts is dependent on outputs from other domains – even if these lines of production are not well integrated in terms of internal operations. 8. Also, increasingly often, consumers are seeking an integrated view of the economy, society and/or environment rather than being interested only in the outputs of one line of production. The modernization view of Statistical Production 9. CSPA aims to facilitate a move away from stove piping within or across organizations, in order to realize the benefits associated with economies of scale and flexibility. 10. An enabler for this move is to think about statistical production in a manner which recognizes that each line of production (also known as a 'suite of Statistical Programs')14 is designed to address a specific set of statistical needs, and to produce a specific set of statistical outputs based on a specific set of inputs. 11. A specific team of subject matter statisticians will (typically) have overall accountability for ensuring the statistical needs are understood and addressed correctly in the design and operation of a particular suite of statistical programs and that the statistical outputs achieve the agreed level of quality. 12. It should also be recognized that maximum efficiency will be gained if the equivalent Business Function within each suite of statistical programs can be performed using common production infrastructure. 13. This approach complements a trend in many statistical organizations toward managing statistical production based on a matrix of horizontal (suite of statistical programs) and vertical (functional) roles, responsibilities and accountabilities (see Figure 10). Understanding Business Function, Business Process & Business Service Business Function 14. GSIM currently defines Business Function as something an enterprise does, or needs to do, in order to achieve its objectives. This represents a simpler, and more self-contained, expression of the definition used in TOGAF. 14 In GSIM a 'line of production' would be characterized as a suite of Statistical Programs related to the same statistical domain or the same grouping of statistical domains (e.g. Macroeconomic Statistics and Economic Accounts). The following text uses this GSIM terminology. 49 15. When identifying Business Functions, the emphasis is on an enterprise level ('whole of business") perspective. There is a recognition that different parts of the business may have different detailed requirements in regard to a particular function. These differences in detailed requirements can be one factor that can lead to equivalent business functions being performed by many different organizational units using many different tools (i.e. the stove pipe approach). CSPA aims to minimize the need for separate tools for equivalent Business Functions performed within different suites of Statistical Programs. 16. Not all Business Functions that statistical organizations need relate directly to Statistical Production (and CSPA). For example, statistical organizations need to pay their staff. At a high level, the function of paying staff (e.g. the "payroll function") for statistical organizations tends to be largely "in common" with other enterprises (government and private sector). Many statistical organizations already outsource some, or all, aspects of this because generic providers, with large economies of scale often offer a service which performs the function cost effectively for the statistical organization. Even where this is not the case, statistical organizations tend not to choose to invest in independently designing, building and maintaining tools capable of performing this function. They may choose to use (including integrate in their own environment) externally developed tools, including applying a greater or lesser degree of customization for the purposes of their specific organization. However, the forthcoming Generic Activity Model for Statistical Organisations (GAMSO) will provide an extension to the GSBPM to cover such nonstatistical activities where necessary. 17. CSPA can be seen as aiming to make such a paradigm of reuse of tools more viable for supporting Business Functions related to Statistical Production – such as those described in the GSBPM. 18. An example of a Business Function related to Statistical Production is "Impute missing or unreliable data". (This corresponds to 5.4 in the GSBPM.) As GSBPM suggests, Business Functions can be considered at different levels of detail. For example, a sub-function within "Impute" could be "flag values which have been imputed". 19. At the level of the Business Function, there is no detail of what method has been used to impute estimates what system or other resource (e.g. a staff member) will perform the work Business Process 20. A Business Process is a set of process steps to perform on or more Business Functions to deliver a Statistical Program. Key aspects include: a processes consists of a series of steps (activities/tasks) there is sequencing (or "flow") between steps. Some sequencing may be conditional (for example, if the output of Step A is an error message then we undertake Activity X, otherwise we proceed immediately to Step B a business process is undertaken for a particular purpose 50 what is represented (for the sake of simplicity and clarity) as a single step in a high level depiction of a process might – when viewed in more detail - comprise a lower level (sub)process consisting of multiple steps 21. Each Statistical Program (e.g. production of Retail Trade statistics in Australia) will be associated with a Business Process whose steps can be associated (at a high level) with Business Functions identified within GSBPM. 22. Each Statistical Program, however, is undertaken for a particular set of purposes (e.g. to produce and disseminate particular outputs, with particular levels of quality, that address particular statistical needs). For this reason, the Business Process associated with a Statistical Program: will be unique in terms of the specific inputs and outputs for the process overall, and individual steps within it is likely to be unique in terms of the exact sequence of process steps and process flows overall (e.g. from Collect to Disseminate). some "patterns" of process steps and flows (process patterns) are, however, likely to be in common with other Statistical Programs in regard to particular Business Functions 23. The following provides a simplified illustration of possible interactions between Business Process and Business Function in the context of different Statistical Programs. A. The Business Process for Statistical Program "X" may require that validation of certain variables takes place (GSBPM 5.3) before those variables are subject to Imputation (GSBPM 5.4). Validation of other variables might then take place, followed by imputation of those other variables if required. In this case the particular Business Process has cycled through the Business Function "Impute missing or unreliable data" twice, applying it to different content (different variables) each time. B. The Business Process for Statistical Program "Y" may not require a second "cycle" of imputation for a second set of variables. C. The Business Process for Statistical Program "Z" may impute all variables in a single cycle and then have a data quality assessment step. One of the possible outcomes of the data quality assessment step may be to repeat the imputation function using different imputation methods and/or different parameters/thresholds for the methods that were used the first time. The Business Process for Statistical Program "Z" potentially involves a second cycle of imputation, similarly to Statistical Program "X", but the two business processes differ in terms of the process flow that leads to a second cycle, and how the input settings differ between the first cycle and the second cycle 51 24. The Business Function is the same ("Impute missing or unreliable data") in all cases but the Business Process context for performing the function is different in all five instances (including 2x2 cycles). Business Service 25. Having identified what Business Functions need to be performed during a Business Process, it is necessary to identify who – or what – will undertake the work associated with each function. 26. TOGAF defines a Business Service as supporting delivery of a business capability (Business Function) through an explicitly defined interface. An explicitly defined interface requires the knowledge of what the service will deliver (including in what time frame) given a particular set of inputs. This is termed the "contract" for the service. A Statistical Service is a special kind of Business Service. 27. The definition means that not every Business Function in every Business Process is necessarily performed by a Business Service. Exceptions include: The (individual or organizational unit) owner of the "horizontal" Business Process performs all the activities associated with that process step (Business Function) rather than interfacing with a Business Service that assists them. Another agent (e.g. person, organizational unit and/or IT application) performs work on behalf of the process owner but either the outputs which will be delivered are unable to be specified in advance or, while desired outputs can be specified, there is not a high level of assurance the agent will deliver in accordance with those specifications. 28. The simplification, predictability and economy of scale associated with Business Services typically make them a highly desirable means of delivering Business Functions/capabilities. 29. From the perspective of the Business Process that uses the service ("the consumer") it doesn't matter how the outputs are produced as long as the service adheres to its "contract" (e.g. Service Level Agreement or similar) and delivers the specified business outputs. In other words, to the consumer, the service is a simple "black box", even if the service provider needs to undertake an elaborate process flow of their own in order to produce and deliver the outputs. 30. Taking a simple example, in the past the service provided by a "copy room" required users (consumers) of the service to provide the documents to be copied together with information about how many copies were required, what sort of paper to use, what form of binding was required, when the outputs were required and so on. Based on their specifications, the consumer would also know in advance the cost (if any) of receiving the service they had requested. 31. Having set the requirements, the service consumer did not need to concern themselves how the service providers in the "copy room" delivered the service as long as the outputs meet the agreed parameters in terms of time, cost and quality. If, for example, the agreed turn-around time 52 was 8 hours then it wouldn't matter, from the service consumer's point of view, whether the team in the copy room A. took the full 8 hours using a slow copier and undertaking extensive manual work to apply the required binding B. used an ultra-fast and ultra-sophisticated copier which completed the work, including binding, in 20 minutes C. outsourced the work to another company that could deliver the work within the required parameters 32. If the copy room upgraded its technology to move from A to B, it would remain possible to offer the same service (although, alternatively, they could choose to offer improved services). 33. Within many statistical organizations it is already common practice, for example, for the manager of a survey based Statistical Program to be able to draw on a Business Service provided by a corporate team of field staff to assist with collecting data rather than each program needing to maintain and manage its own team of interviewers. In these cases there is a series of inputs for which the manager of the Statistical Program is responsible such as a questionnaire suitable for administration by the interviewers, any additional instructions for the interviewers and briefings for the interviewers about the aims and design of the Statistical Program in case respondents ask questions of them. 34. It was identified in the previous subsection that different Business Processes will require equivalent Business Functions applied to different inputs, potentially using different statistical methods and/or different parameters and thresholds. In many case, therefore, it is highly desirable that, for purposes of reuse and economies of scale, any Business Services designed to perform a particular function are highly configurable to meet the needs of a wide range of Business Processes. 35. It is also highly desirable that services are scoped in a manner which allows them to support flexible sequencing and configuration of Business Functions within different Business Processes. 36. For example, the GSBPM identifies several sub-functions associated with imputation including A. the identification of potential errors and gaps; B. imputation using one or more pre-defined methods e.g. "hot-deck" or "cold-deck"; C. writing the imputed data back to the data set, and flagging them as imputed; 37. It would be possible to design a single Business Service that simply accepted a number of inputs and then performed all three of these functions in a set sequence. 38. It may be, however, that for a particular Statistical Program there was already a preferred alternative means of performing function A but a desire to use the new service for B and C 53 that for another Statistical Program there were unusual requirements (for sound business reasons) in regard to how data was flagged as imputed, but the new service would be ideal for A and B 39. Even if there is already a single underlying IT application which is performs A, B and C it may be possible to provide three different services which provide access to three different aspects of the application's functionality rather than accessing the full (IT) functionality through a single "all or nothing" service. Getting the verticals agreed 40. Business Functions and Business Services are essential, interrelated, "verticals" in this simple model. 41. If organizations are to be able to identify Business Services for reuse, they first need to be able to identify the Business Function they need performed and then they can identify existing Business Services capable of performing that function. 42. The fact that a Business Service is capable of performing a Business Function does not, in itself, guarantee it will be suitable for reuse in regard to a particular Business Process. For example the Business Service may not be capable of being configured to support the necessary inputs, methods or other non-negotiable requirements associated with the particular Business Function in the particular Business Process the Business Service may not be able to operate successfully in the local legal (e.g. IP/information security), organizational and/or IT environment 43. Nevertheless, without common agreement on Business Functions it is not possible to scope Business Services on a consistent functional basis and then to discover them for potential reuse to perform specific Business Functions. 44. As highlighted in examples provided above, GSBPM provides a good frame of reference for locating and discussing Business Functions. As per the discussion on sub-functions within imputation, however, there are cases where identification of individual functions in more detail than the GSBPM currently provides would be useful. 45. Having recognized this "gap", it is NOT recommended that as a first step, independently of any specific plan to develop shared services, there is an effort made to identify and agree a comprehensive set of lower level sub-functions within the GSBPM. 46. Instead it is proposed that possible sub-functions associated with a particular lowest level building block in the current GSBPM are identified and reviewed. Then agreed Business Services, aligned with CSPA, are being scoped associated with that lowest level building block. Designing and performing business processes 54 47. Business Processes are designed (the steps and their sequencing are decided) and then performed. 48. In cases where Business Processes have not been formally specified, it is possible some steps will have been performed before later steps are decided upon (designed). This is particularly common in the case of exploratory work when a broad plan exists but details of the process are decided based on the results of previous steps. 49. Even where a Business Process has been designed in detail, it is possible that exceptional conditions which arise while that process is being performed will require reconsideration of the design of some steps. 50. While recognizing not all aspects of design are necessarily fixed before a business process starts to be performed, delineation of "process design" and "process performance" is important within CSPA. 51. Each monthly instance of producing Retail Trade statistics involves performing a Business Process, but most aspects of the design of that Business Process remain the same each month. The more the design can be formalized as a repeatable specification, the more opportunities there are for automation and other efficiencies. In addition, there will be greater assurance of consistency of outputs from each monthly cycle. 52. Separate consideration of design is not only important for statistical activities which have regularly repeating cycles. During design it is possible to consider the "process patterns" which have been used by other, similar, programs of statistical production. This can lead to efficiency and consistency in the design of Business Processes. The approach highlights opportunities to reuse (with appropriate configuration) the Statistical Services used by existing programs of statistical production to perform particular process steps. Designing, Building, Assembling and Configuring Business Services 53. The first step for a new Business Service is to design it, including establishing the scope of the service. This includes identifying the Business Functions it will perform/support identifying the range of statistical methods (if more than one) for performing the Business Function which will be supported identifying the requirements of different Business Processes for that Business Function which will be supported in terms of configuration options 54. The service then needs to be built. This includes ensuring the service is capable of fulfilling its service contract under all of the configurations the service is designed to support. This may include building appropriate application services and components. It may also include establishing a team and providing its members with training and operating protocols so they can provide a service. 55 55. Once a Business Service has been built, an organization starts to achieve returns on investment when Business Processes start using it. A Business Process typically entails many steps and flows spanning many Business Functions. A Business Process, therefore, typically, uses many Business Services. Any early step in designing (or redesigning) a Business Process is identifying process patterns, and associated Business Services, it can reuse. This can be seen as "assembling" the range of business services that are fit for supporting the needs of the particular Business Process. 56. The assembly step is where the "vertical" interests of designers and builders of Business Services (obtaining as much reuse of each service as possible) meet the "horizontal" interests of the designers of statistical Business Processes (achieving the specific purposes, including specific statistical quality targets, of specific statistical programs in a cost effective and timely manner). 57. The benefits of CSPA in terms of increased productivity and increase flexibility require, to some degree, acceptance in some cases of reuse of Business Services which, while essentially "fit for purpose", are less than "ideal for purpose" given the specific aims of the specific Business Process some flexibility in the detailed design of Business Process so they are able to reuse available Business Services, on the proviso that "horizontal" quality, time and cost requirements are still be met 58. Having assembled the relevant set of Business Services they then need to be configured (and tested) based on the specific needs of the particular Business Process. 56 Annex 2: Templates Statistical Service Definition Template Name GSBPM Business Function Outcomes Restrictions GSIM Inputs GSIM Outputs Service dependencies Process Method(s) Statistical Service Specification Template Statistical Service Specification: Name of Statistical Service Protocol for Invoking the Service This service is invoked by calling a function called "Name of Statistical Service". Describe any parameters The protocol used to invoke this function should be in compliance with the guidance provided for developing Statistical Services by CSPA. Input Messages In GSIM terms, the inputs to this service are …… Describe specific inputs in terms to GSIM implementation Output Message The outputs of the service are …… Describe specific outputs in terms to GSIM implementation Applicable Methodologies Describe the statistical methods that may be implemented in this Statistical Service 57 Statistical Service Implementation Description Template Name A name that identifies the Statistical Service implementation. It must be unique in the Service catalogue. Version Version number Builder Organization The owner of the Statistical Service, i.e. the Service Builder's organization. Statistical Service Definition The link to the Statistical Service Definition document. Statistical Service Specification The link to the Statistical Service Specification document. Invocation protocols List of technical protocols supported by the service for communication. Accepted protocols are listed in this document. Mandated Service Interface Protocol-dependent specification of the information required to invoke the service. Definition of Service Interface (if service is Web service) or Environment settings (if service is local) Examples: WSDL interface for SOAP Web Service protocol List of HTTP request parameters for REST Web Service protocol Command line specification for Command Line protocol Add other examples for other supported protocols Data-by-Reference protocols For each input passed as reference, specify supported protocol(s). Accepted protocols are listed in this document. Technical dependencies Which methodologies listed in the Statistical Service Specification are supported by this Service Implementation Technical dependencies List of technical requirements of the service in terms of: Operating system(s) (specify version) Runtime platforms – any additional software that has to be installed on the machine the service is installed on (e.g. SAS, R, Java virtual machine, .net runtime, J2EE container, etc. – Specify version) Database(s) Other dependencies (libraries, packages etc.) Installation documentation Installation guide for the Service Assembler Additional information Any additional information for a Service Assembler which is deemed relevant by the Service Builder 58 Annex 3: Template Guidance A. General I. What is the scope of a Statistical Service (Nature of a Service), including the GSBPM element? 1. A Statistical Service will perform one or more tasks in the statistical process. Statistical Services will be at different levels of granularity. An atomic or fine grained Statistical Service encapsulates a small piece of functionality. An atomic service may, for example, support the application of a particular methodological option or a methodological step within a GSBPM sub process. Coarse grained or aggregate Statistical Services will encapsulate a larger piece of functionality, for example, a whole GSBPM sub process. These may be composed of a number of atomic services. The granularity of Statistical Services should be based on a balanced consideration between the efficiency of the Statistical Service and the flexibility required for sharing purposes - larger Statistical Services will usually enable greater efficiency, whereas a finer granularity will allow greater flexibility for supporting sharing and reuse. Services, regardless of their granularity, must meet the architectural requirements and be aligned with CSPA principles. 2. CSPA outlines that the structure of a service should align to GSBPM sub processes and that CSPA-services can cover a broader type of services with multiple “capabilities” and therefore multiple sets of related inputs and outputs. Further development of services will show if the current architecture and templates for services need adjustments before providing good support for statistical services that are larger in scope. "The Statistical Service Definition is at a conceptual level. In this document, the capabilities of a Statistical Service are described in terms of the GSBPM sub process that it relates to, the business function that it performs and GSIM information objects which are the inputs and outputs" CSPA Guidance 3. The GSBPM element in the Service Definition should be used to convey the context of the GSBPM the author believes this service is targeting/addressing. This is not intended to be a restrictive statement; it should not be interpreted as a limiting of the services possible use. However IF the service could increase the risk of information disclosure or decrease information accuracy for example; under such circumstances these scenarios should be listed to inform the user. 4. Note that CSPA will refer to GSBPM for context in order to improve understanding and allow greater alignment. However CSPA is not modelled directly within the scope of GSBPM steps and sub-steps and as such some services will operate with a larger scope than a single GSBPM step; and some services will operate in multiple GSBPM sub-steps. 59 Examples II. What is the relationship between GSBPM, and the CSPA Service Definition, Service Specification and Service Implementation Description? Guidance 5. Below is a layered model for the documentation and traceability of a CSPA Service. The lower numbered layers (presented at the top) describe more conceptual aspects of the service and each layer becomes more concrete moving down the diagram (toward higher numbered layers). 60 6. The top layer describes the broad business function in GSBPM terms. The second layer describes the functional aspects of the services for this business function. The third layer describes the methodological aspects of the services and the fourth layer describes the implementational aspects of the (methods implemented in the) services. 7. There are also cross-cutting definitions of non-functional aspects that can be applied at each layer. 8. It is also possible that a service supports multiple steps of the GSBPM, in such a case the definition will reference each step. It means that the documentation starts to resemble more of a graph than a static single focus document. Hence thought will need to be spent on the appropriate format, be it conventional with hyperlinks, a web based Wiki style or an alternative content managed representation. 9. The intention is where multiple services are targeting the same area and have similar characteristics, users should be able to easily identify this. 10. Effectively from a user’s perspective, all available CSPA services should be exposed via a catalogue and with extra drills up and down to allow greater understanding and ultimately selection. 11. Functional areas could be categorised broadly aligning with the types of data-centric activities that are performed for example: Orchestration Storage Manipulation Search & Discovery 61 Logging & Audit Visualisation Access 12. It is recommended that extra work be completed to ascertain the architectural styles in existence and then propose further perspectives to greater explain and contrast service granularities. III. Where do I include the Non Functional Requirements? Guidance 13. Please include non functional requirements in the Service Specification Template IV. Where do I go if I need additional help? Guidance 14. Please post an issue or question to the CSPA Architecture Working Group via the CSPA Discussion Forum on the UNECE wiki http://www1.unece.org/stat/platform/display/CSPA/CSPA+Discussion+Forum B. Service Definition Template I. GSIM Inputs and Outputs Guidance 15. When creating a Service Definition it is important to ensure that the use of input and output objects are clearly explained and justified. The documentation should annotate each input and/or output with the following details: Why that object has been chosen. What the service is (conceptually) intended to do with the object Whether that object carries or relays any potential context to steps of the process beyond this specific service. This helps with the lineage of information integration across processes; the implementer can then understand that such context is available if needed. 16. It is advised that for increased understanding and to increase multiple possible implementations, objects in the GSIM Concepts group are favoured over more specific objects in more technical groups. II. What is the current best practice example for the Service Definition? Guidance See Error Correction Service Definition 62 C. Service Specification Template I. What constitutes implementation (technical detail) that should not be in the Service Specification template? Guidance 17. The Service Specification should not include technical implementation detail, Service Specifications can recommend an implementation standard (e.g. DDI or SDMX), but should not require the use of a specific standard. While current examples of Service Specifications do include technical implementation detail, they have been developed at varying times as this advice has been developing. II. What is the current best practice example for the Service Specification? Guidance See Sample Selection Service Specification D. Service Implementation Description I. When do I use DDI, SDMX or 'other' GSIM Implementation Standards for Input and Output? Guidance 18. Please note the following guidance has been provided by Statistics Italy (Istat) in response to the AWG decision "AWG decision (22/5): All CSPA services should be implemented in a standardised way and maintain separation of data and metadata: -- explicitly separate data and metadata -- Where relevant express metadata in a standard model and format (e.g. SDMX, DDI, RDF Data Cube) IstatConsideration on CSPA Input-Output.docx II. Is local invocation of statistical services supported? Guidance 19. The preference is for remote invocation III. Which should I use WS-I or RESTful web service? Guidance 63 20. When designing a service, in addition to your own skills, product set, infrastructure and cost; consider the following broad guidelines: 1. Build services in a RESTful manner when the objects are largely accessed by external and wide audiences, those objects are not sensitive and the action is Read-Only. This improves portability, increases re-use and simplifies implementation and consumption. 2. Build services according to WS-I when the objects are more sensitive in nature and hence the transport is more likely to need encryption and the session need authentication. Also consider this standard when the actions move beyond read and those objects are created, deleted or modified during the interaction. Although this could increase development times, it makes for a more reliable and trustworthy service from the perspective of the implementer and as such will be less likely to result in the need to further wrap the service to meet local security architecture needs. Supplemental Information - this is not approved guidance it is provided for information 21. Format specification for RESTful interfaces. The use of SWAGGER http://swagger.io/ was trialled during the 2014 CSPA project, while the product still has gaps etc those involved in trialling it felt it was helpful. IV. Are there any defined common message parameters? Guidance 22. It is recognised that within each implementing organisation the role of process orchestration will exist in some form; be it an orchestration product or the configuration and coupling of services and components as integrated. 23. It is assumed that standard patterns will be followed when creating services, as such it is assumed that identifiers will be implemented within services to support orchestration needs such as tracking, logging/audit and correlation for result re-integration. 24. Generally services should allow the initiating component (requester) to provide, as input, an externally generated ID if required. This ID should be included in the response for any request; if the initial request results in multiple responses, this ID should be passed back as a correlation ID to the requester or onward to the next component. Where possible services should provide as much context as possible to facilitate the requester making efficient processing decisions about the result of the request. 25. Services should assume that they are running within a larger logical orchestration environment and as such support the needs of that orchestration role. 26. It is expected that specific implementation examples and patterns will be provided to further define these needs; currently this category of inputs and outputs to support orchestration 64 fall beyond the logical business activity and as such may be more explicitly grouped as such at a later stage. V. Should I encapsulate service orchestration (queuing and scheduling) within the statistical service? Guidance 27. It is important that there is separation of logic between the service and the calling environment (like an orchestration engine). Please refer to VI._Technology Architecture. VI. What do Service Builders and Service Assemblers need to understand about handling data transferred to/from the statistical service? Guidance 28. Ensure descriptions are clear on how each service handles data and the associated cleanup, exactly what it does (which should be minimal) and what requirements are put on the local environment (orchestration). 29. Ensure the statistical service receives, as part of its input, the location where it can deliver output data. VII. What is the current best practice example for the Service Implementation Descriptions? Guidance See Linear Rule Checking Service Implementation Description 65 Annex 4: CSPA Implementation Guidance 1. The set of frameworks and standards developed under the HLG auspices (including CSPA) are powerful tools to enable change across Statistical Organizations, particularly focussing on efficiency, speed and modernization. However, these frameworks and standards only enable change, enduring business change requires high level commitment, organizational buy-in and significant leadership and support. 2. The CSPA Statistical Reference Architecture has been developed over the past 2 years from an initial conceptual basis. CSPA is a key enabler of the HLG vision, and significantly advances the ability of Statistical Organizations (as an industry) to move from organizational statistical production systems to the collaborative development and sharing and reuse of statistical services. There has been significant progress and good successes to date, proving that CSPA is not just a conceptual theoretical model. 3. Effectively implementing CSPA compliant shared statistical services requires that the implementation management and support aspects (international and local) needed to implement them efficiently and reliably are in place and well understood. CSPA implementation guidance is starting to be developed, initially this guidance is in the form of training material and Use Case descriptions. 4. Following is the initial set of CSPA implementation guidance to support Statistical Organizations. This will be further developed through the 2015 CSPA project and the work of the HLG Modernization Committees. A. CSPA Training Resources B. CSPA Use Cases - CSPA Roles and Usage Scenarios C. Guidelines for the creation of content for the Technical /Supporting Services Catalogue A. CSPA Training Resources 5. The following material was prepared by Istat for a training course on CSPA delivered in June 2014, under the European Statistical Training Programme, sponsored by Eurostat. Please note the materials were developed against CSPA V1.0, you may notice some slight differences to the CSPA V1.1 content. 6. It is recommended that the training materials are presented by those with at least a foundational understanding of GSBPM, GSIM and CSPA. There is no scheduled CSPA training for 2015, if you would like support in delivering training in your organization please contact support.stat@unece.org 7. The full set of resources is available at http://www1.unece.org/stat/platform/display/CSPA/CSPA+Training+Course. B. CSPA Use Cases - CSPA Roles and Usage Scenarios 66 8. The following Use Cases have been developed by Statistics Netherlands (CBS) to assist their development and implementation of CSPA services. In particular they include internal resources, catalogues and roles which they have found useful. These Use Cases are provided to assist implementation efforts they are not formal CSPA artefacts. 9. These Use Cases describe which stakeholders are involved in the use of CSPA and which user scenarios are relevant to these stakeholders. These Use Cases reference the Statistical Services Catalogue (service catalogue), please refer to A. Catalogues for information on this catalogue. 1. Designing a new statistical process or redesigning an existing process 1.1 Purpose 10. To design a new statistical process or to redesign an existing process in terms of GSIM objects. 1.2 Primary stakeholder 11. Designer. 1.3 Secondary stakeholders 12. Service Assembler, Service Builder. 1.4 User story The Designer defines the output of the statistical process, which is the starting point for the process design; The Designer defines the inputs to the process; The Designer defines the business functions necessary to create output from input; The Designer translates the business functions into process steps; The Designer determines the statistical services that should perform the necessary process steps. This is accomplished by looking in the Service Catalogue: The Designer browses the Service Catalogue, looking for a specific statistical service, by using any of the following example search criteria: o Keywords in the name and (methodological) description of the statistical service; o The GSBPM categorization of the involved business function; o Design details of the statistical service (date designed, authors, involved agencies); o Keywords in the metadata of the inputs and outputs involved in the statistical service; o By known processes and business functions where the statistical service is in use The Service Catalogue shows the statistical services that match the search criteria; 67 1.5 The Designer studies these statistical services and determines which ones are best suited for performing the envisioned business function. Whether a statistical service is suited depends on how well its function and its inputs and outputs match with the business function. From here, there are three scenarios possible: o One of the existing statistical services found is exactly what is needed to perform the business function. This statistical service is selected; o One of the existing statistical services found is more or less what is needed. The Designer will try to adapt the envisioned statistical process and if necessary the related inputs and outputs to the available statistical service; o None of the existing statistical services is close to performing the required business function. The Designer will either adapt the envisioned statistical process fundamentally or will ask the Service Builder to create a new statistical service; The Designer repeats this process until the entire statistical process is mapped to business functions and statistical services; The Designer determines for each selected statistical service whether it is are already available locally (on the local infrastructure). If not, the Service Builder is requested to make the service available locally. After finalization of the process design, a Service Assembler is requested to realize the statistical process according to the design. This realization will probably involve multiple iterations that may involve adaptations and amendments to the designed process and required statistical services in collaboration with the Designer and Service Builder. Stakeholder implications 13. The way of working here described requires that the process designer "thinks" in terms of GSIM objects: outputs, business functions, process steps, etcetera. One of the major challenges is determining the granularity of process steps and distinguishing between "generic" aspects of a process and local aspects. Furthermore, this approach requires that existing statistical services can be implemented in a local environment and that new statistical services can be created if necessary. This requires availability of the Service Assembler and Service Builder functions. 2. 2.1 Assembling statistical services Purpose 14. To implement a designed statistical process by implementing its business functions and integrating its statistical services. The technical infrastructure to implement the statistical services is already available according to CSPA standards. Using this infrastructure, the execution of each of the involved statistical services and its access to inputs and outputs is realized. Also, the implementation of human activities is embedded in the organization. 2.2 Primary stakeholder 15. Service Assembler. 68 2.3 16. 2.4 Secondary stakeholders N/a. User stories 2.4.1 Assembling statistical services using an orchestrator The Service Assembler starts the orchestration tool and connects to the local Service Catalogue; The Service Assembler selects the statistical services specified by the service designer; For each service input, the Service Assembler defines a mapping to a service output. This is only allowed if the metadata on both ends of the mapping match. The Service Assembler specifies values for any static parameters of the statistical service; The Service Assembler maps dynamic input parameters onto dynamic output parameters; The Service Assembler stores the assembled statistical process; The orchestration tool generates the user interfaces required to execute the assembled process. 2.4.2 Assembling statistical services using custom software The Service Assembler adds an asynchronous web service call to a statistical service to the customer software. In this call, an input parameters package [1] has to be specified, which contains the following: o For each of the service inputs, a URL to a file in CSPA standardized format; o A value for each of the input parameters of the statistical service. The Service Assembler adds an event handler that is activated by a callback from the statistical service. This event handler accepts an information response package [1], which contains the following: o For each of the service results, a URL to a file in CSPA standardized format; o A value for each of the output parameters of the statistical service. At the transition of the custom software to the production environment, the used services (including version numbers) are mentioned in the application dependencies chapter of the custom software; Quality Control ensures that all used services are of the latest available version. 2.5 Stakeholder implications 17. The way of working greatly differs whether the Service Assembler uses an orchestration tool to "plug and play" a statistical process or whether custom software is developed. 18. In case of an orchestration tool, the technical environment must be very mature (as there is little room to "glue" services together). There will probably already be a lot of services available, because otherwise it will be impossible to create a complete statistical process. 19. In case of custom software, there is much more room for improvising. Especially on the roadmap towards a mature use of CSPA, this scenario is most realistic. This means that the Service Assemblers are software engineers that must be prepared to use statistical services as 69 much as possible, even though they may not be used to it and even if they do not offer the "most efficient" or "best performing" solution for a specific piece of software. 3. Building a new service and implementing it locally 3.1 10. Purpose To implement a new statistical service that has been designed by a Designer. 3.2 Primary stakeholder 11. Service Builder. 3.3 12. 3.4 Secondary stakeholders Designer. User story The Service Builder receives a statistical service definition from a designer. The Service Builder verifies that no implementation of the service definition already exists. The Service Builder builds a component that performs the required function based on the designed inputs, input parameters, results and output parameters, with the following guidelines: o The inputs and results are by definition URL's that refer to files in the CSPA standardized format; o The inputs are read-only; o The result files must not yet exist prior to calling the service; o One or more data adapters may be called by this service to preprocess the input files to intermediate files that the component can read, but it is preferred that the component reads from the input files immediately; o One or more data adapters may be called by this service to post process files generated by the component to create the service results, but it is preferred that the component writes the result files immediately; o The component is implemented in such a way that it is as independent as possible from other components that have to be available on the local environment it runs on and even as independent as possible from any specific local environment. This should make it as easy as possible to run the service with this specific implementation on any local environment. This involves the following considerations: Using platform independent programming languages; Using as least references components / libraries as possible; 70 If other components / libraries are necessary, choose components / libraries that are available on different operating systems; Not using inputs (files, databases, etcetera) and outputs other than the inputs and outputs defined by the service; 3.5 Comply with Multilingual Guidelines_v4.pdf and international coding standards; List to be further elaborated. The Service Builder creates an implementation manifest containing the following artifacts: o A description of how the statistical service was implemented; o Requirements on the executing machine (which operating systems are supported, hardware characteristics, …) o A deployment manual describing how to implement the service in a specific local environment. The Service Builder asks the Service Maintainer to create an entry in the Local Service Catalogue, handing over the statistical service definition, the implementation manifest and, if applicable, the source code and (references to) binary components. The Service Maintainer performs a quality check and, if passed, updates the local catalogue, CMDB, etc and implements it on a Statistical Server in the Production environment. Adding the service to the production environment automatically adds it to [[Service Availability Control]]; The Service Maintainer requests the International Catalogue Maintainer to add this new Services, including its implementation, to the Global Service Catalogue. Stakeholder implications 13. The Service Builders are software engineers or technical methodologists that will have to make components based on a precise statistical design in terms of GSIM objects. Furthermore, they will have to deal with the specific restraints of running services in a CSPA context, like using standardized CSPA files as inputs and results and having international deployability (platform and country independency) in mind. 4. 4.1 14. Locally implementing a service with an existing implementation Purpose To implement a globally available service in a local environment. 4.2 Primary stakeholder 15. Service Builder. 4.3 Secondary stakeholders 71 16. N/a. 4.4 User story If the service has more than one implementations, the Technical Catalogue Administrator chooses the implementation best suited for the local environment on which it has to run. The Technical Catalogue Administrator performs a Quality Check on the chosen implementation and, if the check is passed, adds it to the Local Catalogue: o The Technical Catalogue Administrator retrieves the implementation manifest of the globally available service; o The Technical Catalogue Administrator retrieves the binaries of the service component and any referenced components / libraries; o The Technical Catalogue Administrator updates the CMDB; o The Technical Catalogue Administrator implements the service and its components on a Statistical Server (in Development, Test, Acceptance) according to the manifest. Transition to Production is performed when the first system using the Services is moved to Production, where a Quality Check has already been performed on the Service Implementation. Adding the service to the production environment automatically adds it to [[Service Availability Control]]. 5. Executing statistical services 5.1 Purpose 17. To execute an entire statistical process assembled from statistical services, or to use custom software that makes use of statistical services. 5.2 18. 5.3 19. 5.4 Primary stakeholder User. Secondary stakeholders Service maintainer (not a formal CSPA role) User stories 5.4.1 Executing an assembled statistical process The statistician uses the user interfaces generated by an orchestration tool to execute a statistical process, given certain parameters (like: statistical period, etcetera). The user interfaces give feedback about the progress of the execution. In case of an error, the statistician browses the generated log files to determine the cause. In case of a technical failure, the statistician contacts the service maintainer. 5.4.2 Executing services using custom software 72 5.5 The statistician uses the custom software, which calls one or more statistical services "under the hood"; Any errors within services are handled by the custom software and if necessary reported to the statistician. In case of a technical failure, the statistician contacts the service maintainer. Stakeholder implications 20. The User, who is a statistician, must bear in mind that the statistical process is executing in a generic way, using generic components. This means that user requirements that are very specific to one country, one agency, one department or even one person cannot be fulfilled to ensure the design-once-execute-anywhere paradigm. 6. 6.1 21. 6.2 22. 6.3 23. 6.4 Functional Administration of a statistical process Purpose Functional ownership of the (local) Service Catalogue. Primary stakeholder Functional Service Maintainer. (not a formal CSPA role) Secondary stakeholders User, Designer, Service Assembler. User story 6.4.1 Ensuring continuity of the statistical production process (execution) The Functional Service Administrator inspects the orchestration logs / custom software logs of a specific statistical process to find errors that may have occurred. Each log record is of any of the following types: o Information (service-internal or service-external): messages that may be useful to a functional or technical maintainer, but that does not indicate a problem; o Warning (service-internal or service-external): messages regarding unexpected inputs or results of the statistical service, like: no observations found for a certain statistical period; o Error (service-internal\functional): messages that indicate that the service could not be successfully executed because of semantically incorrect input datasets or user input; o Error (service-internal\technical): messages that indicate that, while the service was successfully started, it could not be successfully completed because of a technical problem, like file access problems or memory problems; 73 o Error (service-external): messages that indicate that a service could not be successfully started because of a technical problem, like server unavailability or authorization problems; o List to be further elaborated. In case of (service-internal\functional) errors, the Service Assembler (in case of an orchestrated process) and/or User are contacted to investigate the cause of the incorrect service inputs; In case of (service-internal\technical) errors and (service-external\technical) errors, the Technical Service Maintainer is requested to solve any problems, giving references to relevant log entries. 6.4.2 Ensuring continuity of the statistical production process (design) Designer creates a Request for Change for an existing statistical service, possibly triggered by changes business needs; The Functional Service Maintainer approves or disapproves of the change; In case the change is approved, the implementation of the statistical process is changed according to the new design: o In case of an assembled statistical process, the Service Assembler assembles a new version of the process using other statistical services or newer versions of the same statistical services; o In case of custom software, the Service Assembler makes the necessary changes to the software and makes sure that other statistical services are called or that other versions of the same statistical services are called. 7. 7.1 Technical Maintenance of statistical services Purpose 24. Service lifecycle management, technical ownership of the local Statistical Server and the local Service Catalogue. To solve technical problems regarding the execution of assembled statistical processes or individual statistical services. 7.2 25. Primary stakeholder Technical service maintainer (Not a formal CSPA role) 7.3 26. Secondary stakeholders Functional Service Maintainer (Not a formal CSPA role), Service Builder. 7.4 User stories 7.4.1 Capacity Management The Technical Service Maintainer monitors performance of the statistical servers and disk usage on file servers. When any of these exceed set thresholds, a process is triggered to increase capacity or to discuss the cause with the Functional Service Maintainer. 74 7.4.2 Life Cycle Management 7.4.3 Issue / Incident Management The Functional Service Maintainer or Service Availability Control may report (serviceinternal\technical) errors and (service-external\technical) errors, giving references to relevant log entries; The Functional Service Administrator of the Service is alerted of the problem; If applicable, the Functional Service Maintainers of all applications using the malfunctioning service are alerted of the problem; The Technical Service Maintainer investigates the problems and identifies whether they are infrastructural, programming bugs or methodological bugs in the service component. In case of... o infrastructural problems, a server problem or network problem may have to be solved; o a service developed locally, in case of... programming bugs, the Service Builder is requested to analyse the problem and solve any bugs in a new version of the statistical service implementation. This new version gets a new Patch version, according to Semantic Versioning; methodological bugs The Functional Service Administrator is requested to supervise the functional correction of the design and specification of the Statistical Service; The Functional Service Administrator requests the Technical Service Administrator to make a new Minor version of the Service Implementation according to the changed specification. o a service not developed locally, the bug is reported (including requested resolution priority) in the global service catalogue issue registration, including our understanding of the nature of the bug. If an agreement can be reached on an acceptable resolution time, the resolution is awaited. Otherwise, a new version of the service is requested from the Local Service Builder, as if it were a locally developed service. o In case a new version of the service has been developed, it is updated in the global service catalogue References 1. Generic [[REST interface for CSPA services]]. Jan van der Laan & Edwin de Jonge. June 13, 2014. 2. Common Statistical Production Architecture. Version 1.0, December 2013. 75 C. Guidelines for the creation of content for the Technical /Supporting Services Catalogue 27. The current CSPA Technical/Supporting Services Catalogue is Github.com, an example can be viewed at: https://github.com/edwindj/cspa_rest/tree/ea9d59b5afbb4764cca6b34f5d3120f3cf7c16e6. 28. A Technical Repository contains one or more CSPA Service Implementations for exactly one CSPA Service. For example: a .NET implementation and a Java implementation. 29. 30. The Technical repository contains the following general artifacts: A Readme, like a README.md file on Github. For each of the implementations in the repository this Readme contains a reference to Implementation Description in the Global Service Catalogue. Also, it explains how to use the repository. A facility for issue management; A Deployment Manual, describing how to pick and deploy one of the implementations of the CSPA Service. This manual contains: o A description of the deployment “happy flow”, which is the default environment of the service to run on. For example: a VirtualBox 4.3.18 or higher environment where each virtual machine has Internet access and gets a static IP address. o Any number of appendices describing “alternative flows” for other implementation environments. For example: a Windows 8 environment with SQL Server 2008, Active Directory and without an Internet Connection. Users of the service may add additional Appendices to this manual as they implement the service on their specific environments. It contains, for each implementation: The software required to deploy the implementation; 76 31. The legal license information applicable to the implementation (like a LICENSE file on Github); Testware to verify whether a deployed service is actually working as it should. It contains, for each open source implementation: The source code. A Development Manual, describing how to contribute to the repository, like a CONTRIBUTING.md file on Github (https://github.com/blog/1184-contributingguidelines). This manual contains: o A description of (an example) development environment that should be used to develop the service further. For example: Microsoft Visual Studio 2012 or higher for a .NET-based implementation; o Specific source code conventions, if applicable; o Directions on how to organize the source code, file names, etcetera; o Directions on branching, the logical unit of check-ins and how to document check-ins. 77 Annex 5: Glossary Term Definition Application Architecture (AA) classifies and hosts the individual applications describing their deployment, interactions, and relationships with the business Application processes of the organization (e.g. estimation, editing and seasonal adjustment Architecture tools, etc.). AA facilitates discoverability and accessibility, leading to greater reuse and sharing. Source: Statistical Network BA definition The description of a recurring particular design problem which comes from Architectural different design contexts. The solution schema is specified by describing its Pattern components, its responsibilities its relations and the ways they collaborate. Source: CSPA Business Architecture (BA) covers all the activities undertaken by a statistical organization, including those undertaken to conceptualize, design, build and Business maintain information and application assets used in the production of statistical Architecture outputs. BA drives the Information, Application and Technology architectures for a statistical organization. Source: Statistical Network BA definition A business line within a statistical organization usually delivers a particular business outcome. Business lines allow the Business Architecture to be split into Business Line homogeneous areas of related activities. Business lines are defined in order to guarantee independence from reorganization of the current organizational structure. Source: Statistical Network BA definition Business Something an enterprise does, or needs to do, in order to achieve its objectives. Function Source: GSIM Business A set of process steps to perform one or more Business Functions to deliver a Process Statistical Program. Source: GSIM A defined interface for accessing business capabilities (an ability that an organization possesses, typically expressed in general and high level terms and Business requiring a combination of organization, people, processes and technology to Service achieve). Source: GSIM Capabilities provide the agency with the ability to undertake a specific activity. A capability is only achieved through the integration of all relevant capability Capability elements (e.g. methods, processes, standards and frameworks, IT systems and people skills). Source: Statistical Network BA definition Common A set of principles for increased interoperability within and between statistical Statistical organizations through the sharing of processes and components, to facilitate real Production collaboration opportunities, international decisions and investments and sharing Architecture of designs, knowledge and practices. Source: CSPA Enterprise Architecture is about understanding all of the different elements that go to make up the enterprise and how those elements interrelate. It is an approach to Enterprise enabling the vision and strategy of an organization, by providing a clear, Architecture cohesive, and achievable picture of what's required to get there. Source: Statistical Network BA definition 78 Global Artefact Catalogue A list and descriptions of standardized artefacts, and, where relevant, information on how to obtain and use them. Source: CSPA A set of agreed common principles and standards designed to promote greater interoperability within and between the different players that make up an "industry", where an industry is defined as a set of organizations with similar inputs, processes, outputs and goals. Source: CSPA Information Architecture (IA) classifies the information and knowledge assets gathered, produced and used within the Business Architecture. It also describes Information the information standards and frameworks that underpin the statistical Architecture information. IA facilitates discoverability and accessibility, leading to greater reuse and sharing. Source: Statistical Network BA definition A type of contract by which subsystems or component communicate. Source: Interface CSPA Non Non Functional Requirements are the overall factors that affect runtime Functional behaviour, system design, and user experience. They represent areas of concern Requirements that have the potential for application wide impact. Source: CSPA Principles are general rules and guidelines, intended to be enduring and seldom amended, that inform and support the way in which an organization sets about Principles fulfilling its mission and business objectives. Source: Statistical Network BA definition Formats and rules for exchanging messages in or between computing systems. Protocol Source: CSPA Reuse is the concept of using a common asset (implemented component, a component definition, a pattern...) repetitively in different (or similar) contexts Reuse (for example in different business processes), and/or by different participants, and/or overtime. Source: CSPA A service is a logical representation of a repeatable business activity that has a specified outcome and is self-contained, fulfils a business need for a customer Service (internal or external to the organization) and may be composed of other services. Source: Statistical Network BA definition (adapted from TOGAF G113) A service contract is comprised of one or more published documents (called service description documents) that express meta information about a service. The fundamental part of a service contract consists of the service description documents that express its technical interface. These form the technical service Service contract which essentially establishes an API into the functionality offered by the Contract service. A service contract can be further comprised of human-readable documents, such as a Service Level Agreement (SLA) that describes additional quality-of-service features, behaviours, and limitations. Source: http://serviceorientation.com/soaglossary/service_contract A service interface is the abstract boundary that a service exposes. It defines the Service types of messages and the message exchange patterns that are involved in Interface interacting with the service, together with any conditions implied by those messages. Source: http://www.w3.org/TR/ws-arch/#service_interface Industry Architecture 79 Service Oriented Architecture Share Statistical Service Technology Architecture Service-Oriented Architecture (SOA) is an architectural style that supports a way of thinking (Service Orientation) in terms of services and service-based development and the outcomes of services. The SOA architectural style has the following distinctive features: It is based on the design of the services – which mirror real-world business activities – comprising the enterprise (or inter-enterprise) business processes. Service representation utilizes business descriptions to provide context (i.e., business process, goal, rule, policy, service interface, and service component) and implements services using service orchestration. It places unique requirements on the infrastructure – it is recommended that implementations use open standards to realize interoperability and location transparency. Implementations are environment-specific – they are constrained or enabled by context and must be described within that context. It requires strong governance of service representation and implementation. It requires a "Litmus Test", which determines a "good service". Source: The Open Group http://www.opengroup.org/soa/source-book/soa/soa.htm Share is an ownership concept where an asset is made available to other participants for use. There are levels of sharing. A limited form of sharing would be to provide another participant with the means to replicate (make a copy) the asset (for example give the source code) (i.e. they share an aspect of the asset only). A more involved form of sharing would entail that asset is actually been made entirely common (in this case the asset is also reused). Source: CSPA A Statistical Service is a specialization of Service for the statistical industry. A Statistical Service represents a defined interface for accessing business capabilities (an ability that an organization possesses, typically expressed in general and high level terms and requiring a combination of organization, people, processes and technology to achieve). It is logical representation of a repeatable business activity that has a specified outcome and is self-contained, fulfils a business need for a customer (internal or external to the organization) and may be composed of other services Source: GSIM and Statistical Network BA definition (Service) Technology Architecture (TA) describes the IT infrastructure required to support the deployment of applications and IT services, including hardware, middleware, networks, platforms, etc. Source: Statistical Network BA definition 80