Fostering Interoperability in Official Statistics: Common Statistical Production Architecture (Version 0.1, April 2013) DRAFT FOR REVIEW Please note the development of the Common Statistical Production Architecture is a work in progress. This is not intended for official publication and reflects the thoughts and ideas gathered at the first Sprint on this topic held in Ottawa, Canada during April 2013. This document is aimed at readers who have some knowledge of Enterprise Architecture. Instructions for reviewers and a template for providing feedback is available at http://www1.unece.org/stat/platform/display/msis/CSPA+v0.1 This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/. If you re-use all or part of this work, please attribute it to the United Nations Economic Commission for Europe (UNECE), on behalf of the international statistical community. Contents I. Introduction ............................................................................................................................. 3 II. Common Statistical Production Architecture ......................................................................... 6 III. How the Architecture could be used. ................................................................................... 7 IV. Who uses the Architecture ................................................................................................... 8 V. Impact on organizations ........................................................................................................ 10 VI. A. Architecture Description .................................................................................................... 11 Principles ........................................................................................................................ 12 Decision Principles ............................................................................................................... 12 Design Principles .................................................................................................................. 12 Statistical Service Design Principles .................................................................................... 13 B. Requirements .................................................................................................................. 14 C. Statistical Services and "Plug and Play" ........................................................................ 14 Generic Statistical Business Process Model (GSBPM) ........................................................ 16 Generic Statistical Information Model (GSIM) .................................................................... 16 Quality Attributes/ Non-Functional Requirements ............................................................... 16 Architecture Patterns ............................................................................................................. 18 D. VII. Catalogues ...................................................................................................................... 21 Governance of the architecture .......................................................................................... 22 Annex 1: Quality Attributes / Non-Functional Requirements ...................................................... 24 Annex 2: Differences between “Classic” SOA and EDA ............................................................. 28 Annex 3: Glossary......................................................................................................................... 30 2 I. Introduction 1. Many statistical organizations are facing common challenges. They have built up their organizational structure, production process, enabling statistical infrastructure and technology over years, through many iterations and technology changes. This can be referred to as 'accidental architecture' as they were not designed from a holistic top down view. The cost of maintaining this business model and the associated asset bases (process, statistical, technology) is becoming insurmountable and the model of delivery is not sustainable. 2. There are two major threats to the continued efficient and effective supply of core statistics come from within statistical organizations. These are: 1) rigid processes and methods and 2) inflexible ageing technology environment. 3. Often it is difficult to replace even one of the components supporting statistical production. Use of these processes, methods and an inflexible and aging technology environment mean that statistical organizations find it difficult to produce data and information aligned to modern standards. Process and methodology changes are time consuming and expensive resulting in an inflexible, unresponsive statistical organization. 4. Historically, statistical organizations have developed their own business processes and IT-systems for producing statistical products. Therefore, although the products and the processes conceptually are very similar, the individual solutions are not. Statistical organizations have attempted many times over the years to share their processes, methodologies and solutions, as it has long been believed that there is value in this. The mechanism for sharing has historically meant an organization taking a copy of a component and integrating it into their own environment. Examples include CANCEIS (CANadian Census Edit and Imputation System) and Banff (an editing and imputation system for business surveys). However, most cases of sharing have involved significant work to integrate the component into a different processing and technology environment. 5. Many statistical organizations are modernizing and transforming using Enterprise Architecture to underpin their vision and change strategy. This work will enable the organizations to develop statistical services in a standard way within their organization. An enterprise architecture aims to create an environment which can change and support business goals. It shows what the business needs are, where the organization wants to be, and ensures that the IT strategy aligns with this. Enterprise architecture helps to remove silos, improves collaboration across an organization and ensures that the technology is aligned to the business needs. 6. Figure 1 attempts to explain why this difficulty occurs. The figure assumes that the two statistical organizations develop all their business capability and supporting components in a standard way (i.e they have an Enterprise Architecture). The first line of the figure shows that Canadian components have a zig zag shape and the second suggests that Sweden has components with slanted edges. If Sweden needs a new component, ideally they need a component with a 3 slanted edge. It can be seen in the third row that while a component from Canada might support the same process and incorporate robust statistical methodologies, it will not be simple to integrate it into the Swedish environment. Figure 1: Why sharing /reuse is hard now 7. The High Level Group for the Modernization of Statistical Production and Services (HLG) has put priority on the development of a Common Statistical Production Architecture (CSPA). 8. The CSPA is meant to be a generic architecture for statistical production. It will serve as an industry architecture for statistical organizations. By adopting this common reference architecture, it will be easier for each organization to standardize and combine the components of statistical production, regardless of where the statistical services are built. As shown in Figure 2, Sweden could easily reuse a statistical service from Canada because they both use the same “shape”. 9. The CSPA will facilitate the sharing and reuse of statistical services both across and within statistical organizations. The statistical services that are shared or reused across statistical organizations might be new statistical services or legacy/existing services which have been wrapped to comply with the architecture. This is show in Figure 2 by the shapes inside the Lego blocks. It also provides a starting point for concerted developments of statistical infrastructure and shared investment across statistical organizations. 4 Figure 2: how it could be easier with architecture 10. to: The goal of the CSPA is to provide statistical organizations with a standard framework facilitate the process of modernization provide guidance for operating change within statistical organizations provide statisticians with flexible information systems to accomplish their mission and to respond to new challenges and opportunities reduce costs of production through the reuse / sharing of solutions and services and the standardization of processes provide guidance for building reliable and high quality services to be shared and reused in a distributed environment (within and across statistical organizations) enable international collaboration initiatives for building common infrastructures and services foster alignment with existing industry standards such as the Generic Statistical Business Process Model (GSBPM) and the Generic Statistical Information Model (GSIM), and encourage interoperability of systems and processes 11. This document provides the first thoughts on what this CSPA will look like. It is the output from the Architecture Sprint held at Statistics Canada 8 - 12 April 2013. This output is not designed to be the final output of this project. It is intended to outline the first ideas on the architecture in order to seek feedback from the wider official statistics community. 5 II. Common Statistical Production Architecture 12. The CSPA aims to be the industry architecture for the official statistics industry. An industry architecture is a set of agreed common principles and standards designed to promote greater interoperability within and between the different stakeholders that make up an "industry", where an industry is defined as a set of organizations with similar inputs, processes, outputs and goals (in this case official statistics).The value of the architecture is that it enables collaboration in developing and using services which will allow statistical organizations to create flexible business processes and systems for statistical production more easily. The CSPA is sometimes referred to as "plug and play" architecture. The idea is replacing components could be as easily pulling out a component and plugging another one in. 13. The CSPA gives users an understanding of the different statistical production elements that make up a statistical organization and how those elements relate to each other. It also provides a common vocabulary with which to discuss implementations, often with the aim to stress commonality. It is an approach to enabling the vision and strategy of an industry, by providing a clear, cohesive, and achievable picture of what’s required to get there. 14. An important concept in architecture is the “separation of concerns”. For that reason, the architecture is separated into a number of “layers”. These “layers” are: 15. Business Architecture which defines and evolves understanding of what the industry does and how it is done (statistics in our case), Information Architecture which builds understanding of the information, its flows and uses across the industry, and how that information is managed, Application Architecture which is the set of practices used to select, define or design software components and their relationships, and Technology Architecture which describes the infrastructure technology underlying (supporting) the business and application layers. The CSPA provides a template architecture for official statistics. It describes: What the official statistical industry want to achieve – This is the goals and vision (or future state). How the industry can achieve this – This is the principles that guide decisions on strategic development and how statistics are produced. What the industry will have to do - The industry will need to adopt a service oriented architecture which will require them to comply with requirements, quality attributes / non-functional requirements and architecture patterns 16. One of the key enablers for the CSPA is Catalogues. It is envisaged that each statistical organization will have catalogues of processes, information objects and statistical services. There will be a Global Artefact Catalogue which contains the shareable/ reusable artefacts (i.e. processes, information objects and statistical services) from the statistical organizations. 6 17. The Global Artefact Catalogue will contain planning metadata. This will include information about developments that are planned or in progress. This information will facilitate the creation of a Global Roadmap of statistical organizations developments that can be consulted to see which organizations are developing what and when. 18. The architecture is described in more detail in Section VI. III. How the Architecture could be used. 19. The Architecture Sprint proposed eight ways in which CSPA may be used. These are outlined below: 20. Component redesign: A statistical organization uses a statistical service and would like to modify the statistical service to meet new functional and/or non-functional requirements. An approach to collaborative renewal of existing statistical service is required. For example, Statistics New Zealand and the Office of National Statistics (UK) would like new functionalities in CANCEIS 21. New component (strategic level or design level): The need for a new statistical service has been identified as a gap in a statistical organization’s Strategic Plan. To fill the gap the statistical organization starts looking for available solutions in the collaborative space (Global Artefact Catalogue). If not found, the statistical organization can either start designing and developing internally or seek collaboration with other statistical organizations. An example of this could be a need to have metadata driven questionnaire creation. 22. Vendor: The vendor has released a new product. Statistical organizations should verify together if the product is relevant to the community requirements. If it is not, statistical organizations can try to influence the vendor to meet requirements. If yes, statistical organizations ask the vendor to integrate the product in the CSPA. For example, a SAS component for Analytics. 23. Vendor sought: Statistical organizations identify new needs/solution and decide to investigate possibilities of vendors developing it. For example, the Australian Bureau of Statistics seeking SAS agreement to develop a new procedure for regression that protect confidentiality. 24. Roadmap: Statistical organizations need to modify and integrate their roadmaps to align them with the CSPA framework. They will also integrate/streamline their Investment strategies. 25. Strategy: Statistical organizations are creating and using an industry strategy ("Industry Architecture") and this leads to projects/work programs. For example, the Proof of Concept for CSPA. 26. Hosted services: One international organization hosts/deploys services available for all statistical organizations. For example, Eurostat delivers statistical services in the cloud. 7 27. Statistical organization cooperation: A cluster of statistical organizations have been working on a project and want to share the results. 28. Transition Strategy: Each statistical organization needs to define a strategy to move from its current state to the common future state defined in their roadmap. IV. Who uses the Architecture 29. Using the Architecture will create new functional roles within a statistical organization. These roles may already exist in some statistical organizations and may have particular people (for example, Chief Information Officer, Enterprise Architect) assigned to them. This section explains the functional roles and leaves it to each statistical organization to decide who performs that role in their organization. Figure 3: Roles in project lifecycle 30. An Investor is a person, or a group of people, who undertakes the strategic planning and investment decision making role to support statistical production in a statistical organization. In their decision making, they balance benefits, opportunities, risks and costs. To support this decision making, the Investor could consult a catalogue which includes processes, components and statistical services and evaluate the relevant components. 8 Technical Specialist roles 31. Once the need for a component is identified and the project is funded, a Designer from within or outside the organization will reuse, wrap or design methods and services to provide the essential capability needed by the organization or wider statistical community. There could be a number of types of Designer including Process Designers, Information Designers, Methodology Designers and Service Designers. (See Box 1 for a more detailed description of the Designer role). The Service Designer will consult the architecture to ensure that the service is consistent with it and also authorize the service to be added to the catalogue. 32. If a new statistical service has been designed, the Service Designer will give the work to a Service Builder. A Service Builder is a specialist or expert from within or outside of an organization who builds statistical services according to the specifications provided by Service Designers. 33. A Service Assembler is a specialist responsible for integrating or assembling relevant statistical services into an end-to-end process that delivers a critical business capability. This role requires the specialist skills of people experienced in integration and methodology. Non-Technical Specialist Roles 34. Services deliver generic business capability within the GSBPM framework. Services are then used in a specific context, within a specific statistical process. Services therefore must be capable of being configured or parameterized for this specific task. A Service Configurer is the role responsible for configuring either the end-to-end process or a specific process to achieve the business outcomes specified by the business area responsible for statistical production. People who fulfill this role would include Subject Matter Experts and Methodologists. Their role is to configure the parameters needed for a particular business process. For example, for predetermined capabilities, they would decide what is used and in what order. Their interactions with a service are only through a User Interface. 35. Finally, Users are the business area staff who manage and operate the configured end-toend process to achieve the business outcomes sought. Box 1: An illustration of the Designer Role Jose is a Service Designer in a statistical organization. He has been asked to design a solution to collect data for the Labour Force survey. Jose would look into the statistical requirements of all data collections in the organization and design a service which would meet not just the requirement of the Labour Force Survey, but also data collection approach for other surveys in the organization. These statistical requirements would include details on variables to be collected, on international standards and classification, sample techniques to be used, the questionnaire logic and the outputs to be obtained. Once the requirements are understood, Jose searches in the "GSIM aligned Information Object Catalogue" for the availability of data and classifications that can be used for the survey. 9 Using the "GSBPM aligned Process Catalogue", Jose selects appropriate processes according to the requirements for different phases of the Data Collection i.e. “Sample Selection", "Set up Collection", "Run Collection" and "Finalize Collection". Then Jose browses the organization’s artifact catalogue to search for suitable existing services he can use to support the survey methodology, processes and information flows. If multiple modules are available in the organization’s artifact catalogue for a single phase, Jose chooses according to the requirements and interfaces, possibly verifying with statisticians. If some of the needed modules are not available in the organization’s artifact catalogue, Jose searches in the Global Artefact Catalogue. If no suitable services are found there, Jose investigates various options to make an informed decision: internal legacy systems that are candidates to become services (by wrapping) planned initiatives on the global roadmap that meet the requirements and timelines (using Planning Metadata in the related Catalogue) existing services that would require extension following the collaborative change management process (internal and or external) vendors listed in the catalogues of the Global Artefact Catalogue that could partner or provide a solution Jose will need to consult with the Enterprise Architecture on his evaluations and the cost/benefit analysis of the options seeking confirmation that his proposals aligns with transitional planning. If there are no alternatives, Jose will design a new service that can be re-used both internally in his Statistical Organization and in other organizations. Following the planning metadata, frameworks, standards, policies, guidelines, and other knowledge assets in the related Catalogues to conform to the requirements of the CSPA. After designing the services to be used, he ensures that inputs and outputs required align with the "Service Interfaces" and integration specifications of the services selected. After verifying that all necessary components are available, Jose selects the appropriate "Architectural Patterns" and composes the Services package including defining the "Guidelines" to be used by "Service Builders" and "Service Assemblers" to meet all functional and non-functional requirements. The service package is then sent to the Enterprise Architect review and validation process for both internal and international re-use. V. Impact on organizations 36. There will be a number of required changes for an organization implementing the CSPA. Adoption of the CSPA will require investment with a view to generating the long term benefits identified in the value proposition. 10 37. The main changes required at the organization level can be grouped in layers: A. People Changes Openness to international cooperation Building trust in international partners (especially as they may be building services for your organization) Sense of compromise (acceptance that nothing will be optimized for local use, rather it will be optimized for international or corporate use) Development of new functional roles to support use of the architecture (e.g. Assembler, Builder) B. Process Changes Adoption of an industry wide perspective Different approach to business process management and design Commitment to service (contract between different functional units) C. Technology Changes setting up an adequate middle ware infrastructure (messaging, repositories) uplift of physical network capabilities (bandwidth, etc.) management of security features 38. In addition to the costs and the targeted benefits, an organization adopting the CSPA will benefit from: VI. a sustainable and efficient strategy to cope with legacy and phasing out of existing applications a cycle that enables cost saving from reduction in production costs to be reinvested in further infrastructure transformation a positive image both on national and international/industry scene Architecture Description 39. The CSPA has a number of design artefacts. These design artefacts describe the structure of statistical services, their interrelationships and the principles that govern their design. The first ideas for a number of design artefacts were produced at the Architecture Sprint. These include: Principles Requirements Statistical Services and Plug and Play o Generic Statistical Business Process Model (GSBPM) Alignment o GSIM Alignment o Quality Attributes / Non-functional Requirements o Architecture Patterns Catalogues 11 A. Principles 40. Principles are high level decisions or guidelines that influence the way processes and systems are to be designed, built and governed. Principles are derived from the mission and values of the organization, taking into account the opportunities and threats that the organization faces. In the CSPA, principles are used to express the high level design decisions that will shape the future statistical processes and systems. Decision Principles 41. Decision principles are guidelines to help decide on strategic development. They provide a basis for decision making and informing how the mission is fulfilled. They help enable sound investment decisions. The following decision principles have been selected for the CSPA. They represent the outcomes sought by the High Level Group and key elements of the United Nations principles for Official Statistics. These provide a basis for decision making through an enterprise and inform how that organization sets about fulfilling its mission. How we decide on strategic development 42. These principles are outlines below and are to be applied as a cohesive set: Increase the cost efficiency of the statistical organization Deliver enterprise-wide benefits Capitalize on or influence national or international developments Sharing and Collaboration Increase the value of the nation's statistical assets Maintain information security and community trust. Design Principles 43. The following design principles have been developed to provide an understanding of the goals of the CSPA and the sponsors of the architecture. The principles are provided across the architectural layers outlined earlier (business, information, application and technical) and inform design related decisions. Applying the principles (see Table 1) in a balanced way will ensure that statistical services will be delivered to meet the industry vision. 12 Table 1: Design Principles for the different “layers” of architecture and the Statistical Service Business Design Principles Information Design Principles Use available standards Information is an asset that has value to the Focus on efficiency organization and must be managed Support responsiveness to new needs accordingly. Organization wide awareness in design Information is consistent across all relevant Output driven architecture services. Empowering business to reach their goals Use available standards to maintain independently through metadata driven consistence of language across all services. processes Users/services have access to the necessary Re-use existing before creating new information to perform their duties. Create new for re-use and flexible, easy Human intervention at run time is assembly, plug and play minimized by automated population. Use recognized standards Changes to Information assets are managed Have the right delegations and governance rigorously to provide traceability and with supporting metrics versioning. Consider all capability elements e.g. Persistence is maintained within the service Methods, standards, processes, skills, IT and is not exposed to service consumers. Services do not provide persistence as a main (or sole) purpose. They support a GSBPM service Application Design Principles Technology Design Principles Use available standards Implementation agnostic Leverage learning from international application platforms (for example COmmon Reference Environment (CORE)) Follow both "classic" and "event-driven" SOA architecture patterns in our Plug and Play framework depending on best fit to requirements Automate as much as possible Statistical Service Design Principles Managed standardized service contracts based on GSIM objects. Enable services to be loosely coupled externally and be aware of internal coupling. Maximize service autonomy (completeness) to enable share-ability and reusability (External & Internal) Non-functional requirements (quality attributes) form a key input in design decisions. Granularity based on the level below a GSBPM sub phase Independence between design and implementation Statistical Service Design Principles 44. The design principles (see Table 1 above) have been selected to maximize the flexibility of the statistical services wrapped or developed in the context of the CSPA. The flexibility of the 13 statistical service directly impacts the level of reuse, the flexibility required of the industry vision and the ease with which a statistical organization can implement a statistical service. B. Requirements 45. A requirement is a specific functional or physical need that a process or system must fulfill. It is a statement that identifies a necessary attribute, capability, characteristic, or quality of a system for it to have value and utility to a customer, organization, internal user, or other stakeholder. In the context of the CSPA, the requirements shown in Figure 4, in addition to the principles, shape the way the architecture will be realized. Figure 4: Requirements for the CSPA C. Statistical Services and "Plug and Play" 46. The architecture is based on an architectural style called Service Oriented Architecture (SOA). This style focuses on Services (or Statistical Services in this case). A service is a representation of a real world business activity with a specified outcome. It is self-contained and 14 can be reused by a number of business processes (either within or across statistical organizations). 47. The level of reusability promised using an SOA is a direct result of the standardized definition of the service capabilities through the service contract. For CSPA statistical services these capabilities have been defined as: the GSBPM for the reusable functional context the GSIM for the information model (note this will be the implementation model) the quality attributes/non-functional requirements the architectural pattern(s) to be supported (traditional and event driven SOA), Figure 5: CSPA Service Features 48. A statistical service will perform a task in the statistical process. Statistical services will be at different levels of granularity. An atomic or fine grained statistical service encapsulates a small piece of functionality. An atomic service may, for example, support the application of a particular methodological option or a methodological step within a GSBPM sub phase. Coarse grained or aggregate statistical services will encapsulate a larger piece of functionality, for example, a whole GSBPM sub phase. These may be composed of a number of atomic services. 49. The CSPA is sometimes referred to as "plug and play" architecture. Plug and Play is a concept that was borrowed from the world of computer hardware. It is used to indicate that installing new hardware modules is easy: just plug it in and it will work. In the context of statistical processes, we use the term to indicate that installing a new "module" of functionality in an existing process is easy. In fact, the idea is that it should be possible to build a complete new process by stringing together a number of such modules, like building a wall from Lego-bricks. 15 50. Within the Plug and Play concept, the Service is the “pluggable” element. The granularity of statistical services should be based on a balanced consideration between the efficiency of the statistical service and the flexibility required for sharing purpose - larger statistical services will usually enable greater efficiency, whereas a finer granularity will allow greater flexibility for supporting sharing and reuse. It is envisaged that the statistical services will be shared or reused at one level below a GSBPM sub phase. Services, regardless of their granularity, must meet the architectural requirements and are aligned with the CSPA principles. Figure 5 shows that these statistical services must align to GSBPM and GSIM as well as adhere to the architectural patterns and quality attributes / non- functional requirements described by the architecture. Generic Statistical Business Process Model (GSBPM) 51. The GSBPM is a flexible tool to describe and define the set of business processes needed to produce official statistics. The GSBPM can also be utilized to harmonize statistical computing infrastructures, facilitating the sharing of software components, and providing a framework for process quality assessment and improvement. 52. For further information please refer to the GSBPM web site http://www1.unece.org/stat/platform/display/metis/The+Generic+Statistical+Business+Process+ Model Generic Statistical Information Model (GSIM) 53. The GSIM is the first internationally endorsed reference framework for statistical information. This overarching conceptual framework plays an important part in modernising, streamlining and aligning the standards and production associated with official statistics at both national and international levels. 54. GSIM is a reference framework of information objects, which enables generic descriptions of the definition, management and use of data and metadata throughout the statistical production process. It provides a set of standardized, consistently described information objects, which are the inputs and outputs in the design and production of statistics. As a reference framework, GSIM helps to explain significant relationships among the entities involved in statistical production, and can be used to guide the development and use of consistent implementation standards or specifications. 55. GSIM is one of the cornerstones for modernising official statistics and moving away from subject matter silos. It is a key element of the strategic vision prepared by the High-Level Group for the Modernization of Statistical Production and Services (HLG), and endorsed by the Conference of European Statisticians. 56. For further information please refer to the GSIM web site http://www1.unece.org/stat/platform/pages/viewpage.action?pageId=59703371 Quality Attributes/ Non-Functional Requirements 16 57. Quality Attributes / Non-Functional Requirements are important to be captured in the design of the services. They have a significant influence on the software architecture of a service 58. The Quality Attributes / Non-Functional Requirements listed in Table 2 are grouped into four specific areas linked to design, runtime, system and user qualities. Further explanations of these quality attributes including descriptions can be found in Annex 1. Design Qualities Table 2: Quality Attributes / Non-Functional Requirements Reusability (Exploitability) Legal and licensing issues or patent-infringement-avoidability Privacy Multilanguage support (internationalization and localization) Compliance Extensibility (adding features, and carry-forward of customizations at next major version upgrade) Maintainability Platform compatibility Run time Qualities System Qualities Failure management Quality (e.g. faults discovered, faults delivered, fault removal efficacy) Configuration management Deployment Operability Portability Scalability (horizontal, vertical) Backward Compatibility Supportability Testability User Qualities Documentation Usability by target user community Security Efficiency (resource consumption for given load) Interoperability Effectiveness (resulting performance in relation to effort) Performance / response time (performance engineering) Availability (see service level agreement) Fault tolerance (e.g. Operational System Monitoring, Measuring, and Management) Backup Robustness 17 Architecture Patterns 59. In simple terms, architecture patterns describe a re-usable solution to certain classes of problems. They explain how, when and why statistical services can be used, as well as the impact of using them in that way. They help a Service Assembler to identify combinations that have been used successfully in the past. Although not identified in this document, there are also anti-patterns which are examples of what should not be done. 60. The benefits of using architecture patterns can be described by using the analogy of an expert chess player. To play chess, you must learn the rules and the principles (for example the value of different pieces, squares). However, to improve and become a really good player, you need to learn the patterns used by more experienced players and apply them to your game. In the same way, you can use the requirements, principles and quality attributes of the CSPA, but to get the maximum benefit, Service Assemblers should learn the architecture patterns. 61. The CSPA will incorporate both "Classic" and Event Driven Service Oriented Architecture patterns. Both patterns provide capabilities to cope with the future challenges facing statistical organizations. Event Driven Service Oriented Architecture would be preferred in cases such as Big Data, new Dissemination processes, cross organization cooperation. 62. For other issues either it is indifferent (e.g. security) or it is still undetermined (e.g. future technology, loss of control of data collection). These aspects may be investigated on the basis of the Proof of Concept results including aspects like impact on granularity. 63. Further details comparing the two types can be found in Annex 2. "Classic" Service Oriented Architecture (SOA) 64. "Classic" SOA is a pattern that uses the Request/Response pattern for activating services. This implies a rather fixed routing of messages between services. The integration infrastructural platform implementing the process “orchestrates” the routing and executing of services. This pattern leads to less flexibility and tighter coupling between services that the Event Driven (Service Oriented) Architecture pattern described below. 65. "Classic" SOA can be used if: A functional style and sequential flow is required It is known precisely which service interface should be called The service should be called exactly once A response when service execution completes is expected A response is expected 66. An example of how this pattern could be applied in relation to collection is each questionnaire is stored in an entity service. The entity service exposes the operation to get the questionnaire through a service call. The indicators are computed and stored and made available using an entity service call. 18 Figure 6: Examples of “Classic” SOA Event Driven (Service Oriented) Architecture (EDA) 67. EDA is based on the publish/subscribe concepts and heavily utilizes many of SOA approaches and design patterns. Some consider EDA to be an asynchronous version of SOA. 68. SOA emphasizes the importance of loose coupling. EDA goes a step further. As an event is generated by an event source and is sent to the processing middleware. It is not known which functionality is triggered next. In SOA, the concrete service call would have been made, but this is not the case for Event Driven Architecture. For this reason, EDA talks about "decoupling" rather than loose coupling. 69. EDA can be used if: All recipients that may be interested in the event should be notified It is not exactly known which and how many recipients are interested in the event It is not known how recipients respond to this event Different recipients respond differently to the same event Only one-way communication from the sender to the recipient is possible 70. An example of how this pattern could be applied in relation to collection is when each questionnaire completed publishes an event when available for subscribers downstream. Early indicators can be produced by processing collection events straight through aggregation. 19 Figure 7: Examples of EDA Enabling Architectural Mechanisms 71. Within each statistical organization, there needs to be an infrastructural environment in which the generic services can be combined and configured to run as element of organization specific processes. This environment is not part of the CSPA. The CSPA assumes that each statistical organization has such an environment and makes statements about the characteristics and capabilities that such a platform must have in order to be able to accept and run statistical services that comply with CSPA. 72. Platform for Service Communication: A communication platform provides the capability for communication between statistical services. It enables inter-service communication while allowing statistical services to remain autonomous and add additional capabilities for monitoring and orchestrating the information flow. To assemble a built statistical service, the communication platform is updated to integrate with new services. There are multiple ways of establishing a platform communication. Examples of architectural components could be BPMS, ESB, Workflow Engines, Orchestration Engines, Message Queuing and Routing. 73. Platform for Configuring and Controlling Services and Processes: The Platform for Controlling Service and Process execution encompasses the functionalities and tools to support the management and maintenance of services metadata, artifacts and policies. Examples of how this mechanism could be achieved include Business Process Modelling System, Lifecycle Management, Service Monitoring and Management. 74. Platform for Reporting on Services and Processes: The Platform for reporting is responsible for enabling real-time monitoring and near-real-time presentation of user defined business key performance indicators (KPIs). Examples of how this mechanism could be achieved are Static Dashboard, Business Activity Monitoring (also generates alerts and notifications to user when these KPIs cross specified thresholds). 20 D. Catalogues 75. In the context of a CSPA, catalogues of various artefacts are seen as key enablers. They provide lists and descriptions of standardized artefacts, and, where relevant, information on how to obtain and use them. 76. The catalogues can be at many levels, from global to local. However, for the purposes of this project, it is the global level that is the primary interest. The global catalogue is called the Global Artefact Catalogue. The Catalogue will provide information about resources and potential collaboration partners, helping to ensure that the modified component conforms to the requirements of the CSPA. Governance and support mechanisms and processes will need to be defined to ensure the continued relevance and utility of the catalogues. 77. The Global Artefact Catalogue will include three sub catalogues. These are shown in Figure 8. Figure 8: Types of Catalogues 78. The catalogues should support the lifecycle management, governance and use of components and services, providing the right level of artefacts for service design, integration and transition to a production environment. 79. They are not meant to function as an "app store", as they are not designed to be platform specific, and will not necessarily hold copies of executable code. They will, however, provide all necessary information about how to access the artefacts, including contact details for further information. They should provide sufficient contents to support the CSPA. 21 Figure 9: Types of artefacts in the Catalogues 80. As shown in Figure 9, the types of artefacts in Catalogues are: Frameworks include artefacts such as the GSBPM and the GSIM Standards include DDI and SDMX Policies support governance and use, for example "no changes to GSBPM or GSIM unless they are the result of a formal approval process" Guidelines concern the practical use and integration of artefacts Other knowledge assets could include architectural principles and information on how to use the catalogues Planning metadata is intended as a repository for information about developments in progress or planned, to promote collaboration at the earliest possible stage in the development of artefacts, e.g. "Statistics New Zealand aim to develop and make available component X by (date)". This will provide the basis for a global roadmap. VII. Governance of the architecture 81. Creating a CSPA is only the first part of the story. It is likely that the architecture, or elements of it, will need to be revised and refined from time to time to ensure continued relevance. As the architecture will be a common asset for the international official statistics community, there will need to be a clear and transparent process for change management. 82. This process does not need to be invented from scratch. The CSPA is being developed as a project overseen by the High-Level Group for the Modernization of Statistical Production and Services (HLG) so it is assumed that this group will be the custodians of the outputs. Whilst the architecture itself will be “owned” by the international official statistics community, it will be administered by the HLG, as top-level representatives of that community. 83. However the HLG itself is unlikely to be involved in detailed discussions about specific elements of the architecture. Therefore, for practical purposes, proposals for changes will usually be discussed at a lower level. These discussions will take place either in one of the four modernization committees that are being established under the HLG, or in a specific task team. 22 These groups will then formulate recommendations for change, which will typically be bundled together as a package to be formally signed off by the HLG. 84. The HLG and the modernization committees will need to carefully balance the advantages of change, in terms of increasing relevance and usefulness, against the costs of having to implement those changes within statistical organizations. A reasonable degree of stability over time is therefore a key requirement for the architectural framework. 23 Annex 1: Quality Attributes / Non-Functional Requirements Table 3: Further details on Quality Attributes CATEGORY Quality Attribute Definition Notes Is the capability for components and subsystems to be suitable for use in other applications and other Design Reusability (Exploitability) scenarios. It minimizes the duplication of components Qualities and also the implementation time. Legal and licensing issues or Is the conformance of a system to legal, licensing and patent-infringement-avoidability patent regulations. Is the ability of a system to prevent the disclosure or Privacy Confidentiality loss of information. Multilanguage support Is the capability of a system to handle different (internationalization and languages and regional differences. localization) Is the degree in which a system conforms to a rule, such as a specification, an architecture description or reference, policy, standard or law. Extensibility (adding features, Is the ability of system to take into consideration and carry-forward of future growth? It is measured by the ability of a customizations at next major service to be extended and the level effect required to version upgrade) implement the extension. The ease with which a software system or component can be modified to correct faults, improve Maintainability performance, or other attributes, or adapt to a changed environment Is the extent to which a system needs to be adapted to Platform compatibility a given technological environment? Are the measures of the cost-benefit relation of a Price system to create a solution for any given problem. 24 FURPS Functionality Functionality Functionality Functionality Supportability Supportability Supportability Supportability Supportability Run time Qualities Is the capability of a system to prevent malicious or accidental actions outside of the designed usage, and Security to prevent disclosure or loss of information. A secure system aims to protect assets and prevent unauthorized modification of information. Is the degree in which a process uses the lowest Efficiency (resource amount of inputs to create the greatest amount of consumption for given load) outputs. It relates to the use of all inputs to produce any given output. Is the ability of diverse systems and organizations to work together (inter-operate). The term is often used in a technical systems engineering sense, or Interoperability alternatively in a broad sense, taking into account social, political, and organizational factors that impact system to system performance. Effectiveness (resulting Is the capability of a system to help a user to reach his performance in relation to effort) specific goals in a given context. Is an indication of the responsiveness of a system to execute any action within a given time interval. It can Performance / response time be measured in terms of latency or throughput. (performance engineering) Latency is the time taken to respond to any event. Throughput is the number of events that take place within a given amount of time. Is the proportion of time that the system is functional and working. It can be measured as a percentage of Availability (see service level the total system downtime over a predefined period. agreement) Availability will be affected by system errors, infrastructure problems, malicious attacks, and system load. Fault tolerance (e.g. Operational Is the ability of a system to continue its intended System Monitoring, Measuring, operation, possibly at a reduced level, rather than 25 Functionality Performance Functionality Performance Performance Reliability Reliability and Management) System Qualities failing completely, when some part of the system fails. Is the capacity of a system to copy and to archive Backup computer data so it may be used to restore the original after a data loss event. Is the ability of a computer system to cope with errors during execution or the ability of an algorithm to Robustness continue to operate despite abnormalities in input, calculations, etc. Are the anticipation, detection and resolution of Failure management programming, application, and communication errors. Quality (e.g. faults discovered, Is the capability of a system to meet the needs and faults delivered, fault removal concerns of the stakeholders for whom it was efficacy) developed. Is the capacity to instantiate and customize new Configuration management versions of the system. Deployment Is the ability to schedule software distribution Is the ability to keep a system in a reliable condition Operability according to predefined operational requirements Is the capacity of a system to be adapted to different specified environments without applying other actions Portability or means than those provided for this purpose for the software considered. Is the ability of a system, network, or process to handle a growing amount of work in a capable Scalability (horizontal, vertical) manner or its ability to be enlarged to accommodate that growth. Is the ability of a system to work with input generated Backward Compatibility by an older product or technology. If products Just standards designed for the new standard can receive, read, view 26 Reliability Reliability Reliability Reliability Supportability Supportability Supportability Supportability Supportability Supportability Supportability Testability User Qualities Documentation Usability by target user community or play older standards or formats, then the product is said to be backward-compatible Is the ability of the system to provide information helpful to identifying and resolving issues when it fails to work correctly. Is a measure of how easy it is to create test criteria for the system and its components, and to execute this tests in order to determine if the criteria are met. Is the degree in which a system is described in all its parts in order to get complete knowledge of how it has been developed and how must be used. Is the measure of how easily the system can be used by a given group of users who share certain characteristics. 27 Supportability Supportability Usability Usability Annex 2: Differences between “Classic” SOA and EDA 67. The following table summarizes the main differences between the architecture patterns "Classic" SOA and Event Driven Service Oriented Architecture. Table 4: Side by side comparison of the characteristics of "Classic" SOA and EDA Characteristics Architectural Style Service Interface Control Flow Request Request access point "Classic" SOA Event Driven SOA (EDA) Functional service style is required Based on the operation contracts, data contracts (for parameters) and fault contracts. Event driven service style (publishsubscribe) is required Sequential flow is required The service can be called exactly once. It is known precisely which service interface should be called Based on the lists of events that the service can publish. Sequential flow is possible by using additional platform functionalities. Calls are not made explicitly. Requests are not made directly to a service interface. A response when service execution completes cannot be explicitly expected A response when service unless there is an event associated to it. Response execution completes can be Different recipients can respond explicitly expected differently to the same event. It is not known how recipients respond to this event Only one-way communication from the Message Exchange Various MEPs available. publisher service to subscriber Pattern (MEPs) Synchronous and asynchronous (s) (publish subscribe pattern). All recipients listening to the event are Multiple consumers One at a time notified 28 Table 5: Side by side comparison of the potential adherence to the Statistical Service Design Principles of "Classic" SOA and EDA (No, Depends on, Yes-partially, Yes-fully) Service Design Principles Event Driven SOA (EDA) Yes-fully Managed standardized service Yes-fully Functionality need to be contracts based on GSIM Functionality explicitly described externally (textually). fragments/entities. defined in the service interface The event type data payload is (in the operation parameters). a contract based on GSIM. Enable services to be loosely coupled externally and be aware of internal coupling. Maximize service autonomy (completeness) to enable share-ability and reusability (External & Internal) Non-functional requirements (quality attributes) form a key input in design decisions. Granularity based on the level below a GSBPM sub phase Independence between design and implementation "Classic" SOA Yes-partially Loose coupling based on strict service contract is possible Yes-fully Very loose coupling. Provider is not aware of consumer's existence. Consumers is coupled to the event only (not the source of the event if a mediator is in place for subscription) Depends on implementation but potentially Yes-fully Depends on implementation but potentially Yes-fully Depends on implementation. Depends on implementation. Depends on design Depends on design Depends on implementation but potentially Yes-fully Depends on implementation but potentially Yes-fully 29 Annex 3: Glossary Table 6: Glossary Term Definition Application Architecture The set of practices used to select, define or design software components and their relationships. Source: Sprint Architectural Pattern The description of a recurring particular design problem which comes from different design contexts. The solution schema is specified by describing its components, its responsibilities its relations and the ways they collaborate. Business Architecture Defines and evolves understanding of what our organization does and how we do it (statistics in our case) Source: Statistics New Zealand Business Process A series of logically related activities or tasks performed together to produce a defined set of result. Capability Architecture 1. Start with business outcomes. To a CEO, the metrics — the business outcomes — that most determine the organization’s success boil down to the balance sheet, statement of operations, and statement of cash flow, combined with soft metrics that indicate the organization’s power, stature, and influence in the marketplace. 2. Direct the evolution of business capabilities. Business plans make a poor foundation for tech strategy. They will — and should — constantly change in response to competitive and political dynamics. A better foundation is an organization’s core capabilities — product design, customer service, etc. — which we build on and evolve to achieve better outcomes. But we need to do more than model and plan business capabilities, we need to implement them using an integrated, operational, measurable combination of people, processes, technology, and physical resources. Thus, Forrester places the notion of an evolving business capability implementation — a complete running part of a business — as the target for Business Capability Architecture. 3. Provide design implementation models oriented around business change. A business capability map is good, but if you can’t carry it through to implementation, it’s just a pretty picture. So Forrester’s Business Capability Architecture — by analyzing the ways that businesses change and evolve — provides design principles and implementation models for integrated, coherent, holistic implementation of business capabilities. This includes the notion of a business capability platform — a cohesive, integrated, multitechnology, business-focused platform — that gets past the single technology focus of our current day BPM apps, event-driven apps, and the like. Source: Forester 30 Term Definition Common Statistical Production Architecture A set of principles for increased interoperability within and between statistical organizations through the sharing of processes and components, to facilitate real collaboration opportunities, international decisions and investments and sharing of designs, knowledge and practices Source: Sprint Composition Assembly of Services that in itself is a service Enterprise Architecture Enterprise architecture is about understanding all of the different elements that go to make up the enterprise and how those elements interrelate. It is an approach to enabling the vision and strategy of an organization, by providing a clear, cohesive, and achievable picture of what’s required to get there. Source: Sprint Function A sequence of program instructions that perform a specific task, packaged as a unit. Global Platform roadmap An international level roadmap showing at what point in future new capabilities will be added to the Statistical Organization library GSBPM aligned process catalogue Collection of pre-assembled GSBPM sub processes. Each element listing preconditions, inputs, output, outcome GSIM aligned Information Object Catalog Collection of schemas, each schema describing one information object exchanged between services through an interface. Industry Architecture A set of agreed common principles and standards designed to promote greater interoperability within and between the different players that make up an "industry", where an industry is defined as a set of organizations with similar inputs, processes, outputs and goals. Source: Sprint Information Architecture Builds understanding of our information, its flows and uses across the organization, and how we manage that information. Source: Sprint Integration An Architecture specifically focusing on the integration, co-operation, etc. of certain elements, e.g. Components, 31 Term Definition Architecture Services or data. Source: Sprint Interface A type of contract by which subsystems or component communicate. Source: Sprint Message A model that describes how two different parts of a message passing system connect and communicate with each Exchange Pattern other. There are two major message exchange patterns - a request-response pattern and one-way pattern (also called fire-and-forget). For example the HTTP is a request-response pattern protocol and UDP is a one-way pattern. Metadata Driven An architecture that supports the design, composition, operation, and management of statistical business processing and its inputs and outputs through interaction with standard metadata - all of which is automated to the maximum degree possible. Orchestration A way of stringing together a number of services to build a statistical process. Orchestration composes services to a new service that has central control over the whole process. Orchestration vs Workflow The main difference between a workflow automation and orchestration is that work flows are processed and completed as processes within a single domain. (Erl) Orchestration is the connecting and automating of workflows when applicable to deliver a defined service. Principles General rules and guidelines, intended to be enduring and seldom amended, that inform and support design and decision making, and the way in which an organization sets about fulfilling its mission. Source: Sprint Protocol Formats and rules for exchanging messages in or between computing systems Quality attributes Quality attributes are the overall factors that affect runtime behavior, system design, and user experience. They represent areas of concern that have the potential for application wide impact. When designing applications to meet any of the quality attributes requirements, it is necessary to consider the potential impact on other requirements and to analyze the tradeoffs between them. The importance or priority of each quality attributes differs from system to system. Reuse Reuse is the concept of using a common asset (implemented component, a component definition, a pattern...) 32 Term Definition repetitively in different (or similar) contexts (for example in different business processes), and/or by different participants, and/or overtime. Source: Sprint Service A service is a logical representation of a repeatable business activity that has a specified outcome and is selfcontained, may be composed of other services and is a "black box" to consumers of the service. Source: TOGAF (G113): Service Contract A service contract is comprised of one or more published documents (called service description documents) that express meta information about a service. The fundamental part of a service contract consists of the service description documents that express its technical interface. These form the technical service contract which essentially establishes an API into the functionality offered by the service. A service contract can be further comprised of human-readable documents, such as a Service Level Agreement (SLA) that describes additional quality-of-service features, behaviors, and limitations. Source: http://serviceorientation.com/soaglossary/service_contract Service Design Principles A service design principle represents a highly recommended guideline for shaping solution logic in a certain way and with certain goals in mind. These goals are usually associated with establishing one or more specific design characteristics (as a result of applying the principle). Source: http://serviceorientation.com/index.php/soaglossary/design_principle Service Interface A service interface is the abstract boundary that a service exposes. It defines the types of messages and the message exchange patterns that are involved in interacting with the service, together with any conditions implied by those messages. Source: http://www.w3.org/TR/ws-arch/#service_interface Service Level Service Levels describes named categories of measurable increments of non-functional/quality attributes (like scalability, availability, security, transaction support…) packages offered by a service provider. For example, service levels for availability might be offered as GOLD (99,999%), SILVER (98%) and BRONZE (95% up time). Service Levels are used by the provider to negotiate with a service consumer a service-level agreement (SLA). Service Oriented Architecture Service-Oriented Architecture (SOA) is an architectural style that supports a way of thinking (Service Orientation) in terms of services and service-based development and the outcomes of services. 33 Term Definition The SOA architectural style has the following distinctive features: It is based on the design of the services – which mirror real-world business activities – comprising the enterprise (or inter-enterprise) business processes. Service representation utilizes business descriptions to provide context (i.e., business process, goal, rule, policy, service interface, and service component) and implements services using service orchestration. It places unique requirements on the infrastructure – it is recommended that implementations use open standards to realize interoperability and location transparency. Implementations are environment-specific – they are constrained or enabled by context and must be described within that context. It requires strong governance of service representation and implementation. It requires a “Litmus Test”, which determines a “good service”. Source: The Open Group http://www.opengroup.org/soa/source-book/soa/soa.htm Share Share is an ownership concept where an asset is made available to other participants for use. There are levels of sharing. A limited form of sharing would be to provide another participant with the means to replicate (make a copy) the asset (for example give the source code)(i.e. they share an aspect of the asset only). A more involved form of sharing would entail that asset is actually been made entirely common (in this case the asset is also reused). Source: Sprint Technology Architecture The architecture describing the infrastructure technology underlying (supporting) the business and application layers. Source: Sprint 34