Report of the Statistical Centre of Iran (SCI)
on Implementation of the Plan of Action for the Framework of Cooperation in Statistics

Prepared for the 2nd High Level Expert Group Meeting (HLEGM) on Statistics of the ECO National Statistical Offices

&

Detailed Proposal of the SCI on Establishment of the ECO Statistical Network

24-25 September 2009
Dushanbe, Tajikistan

SCI Report

Since the submission of the ECO Framework of Cooperation and Plan of Action in Statistics, the Statistical Centre of Iran (SCI) has regarded the objectives and contents of the Plan as a basis for intensive cooperation in the field of statistics among the ECO member states and has actively participated in the related events. In this line, the SCI has proposed and supported the use of effective modalities for implementing the Plan of Action, such as setting up the High Level Expert Group (HLEG) on Statistics as a modality for cooperation among the ECO National Statistical Offices (NSOs).

The following are some of the activities undertaken by the SCI to implement the Plan of Action in recent years:

1. Hosting ECO events in the field of statistics:

The Statistical Centre of Iran has shown its interest in and readiness for hosting ECO events in the field of statistics. The most important ECO events relating to statistical issues have been hosted by the SCI, namely:

1.1 The first meeting of the Heads of the ECO National Statistical Offices was hosted by the SCI on 28-29 January 2008 in Tehran. It was the first meeting of authorities and high-ranking officials of the ECO NSOs and the first step toward realization of the Plan of Action on Statistics in the region. The meeting concluded with the issuance of the Tehran Communiqué, which expressed the willingness of the ECO NSOs to cooperate actively in the field of statistics and proposed taking effective measures in this regard.
1.2 The first meeting of the ECO High Level Expert Group (HLEG) on Statistics was hosted by the SCI on 26-27 October 2008 in Tehran. This meeting was organized on the basis of the outcomes of the first meeting of the Heads of the ECO National Statistical Offices in January 2008. The HLEG on Statistics discussed the most important issues and problems of joint statistical activities in the region and put forward a number of proposals for consideration and further action by the ECO member states.

These two important events were initial steps toward promoting cooperation in the field of statistics in the ECO region in line with the Plan of Action.

2. Establishment of the ECO Statistical Network:

One of the main issues of the ECO Plan of Action for the Framework of Cooperation in Statistics is the creation of an institutional mechanism for this purpose. This issue was raised at the first meeting of the ECO Heads of NSOs in January 2008, where the SCI proposed the establishment of the ECO Statistical Network. Following that meeting, the ECO secretariat circulated the proposal among the member states to receive their comments and views. The proposal was discussed at the first meeting of the High Level Expert Group (HLEG) on Statistics in October 2008 and also at the 6th NFPs meeting on Economic Research and Statistics in November 2008 in Ankara. The 19th RPC meeting decided to put the proposal into operation subject to approval by the CPR. Finally, the 469th CPR meeting on 7 June 2009 approved the SCI's proposal for establishment of the ECO Statistical Network and allowed the Statistical Centre of Iran to realize it in cooperation with the ECO secretariat. Detailed information on the ECO Statistical Network has been prepared by the SCI and will be presented to this meeting (Annex I).

3. Capacity building and organizing regional training workshops and courses:

During the last two years, the Statistical Centre of Iran has organized and hosted a number of professional workshops and training courses for the ECO member states and other countries in the Asia and the Pacific region. The main details of these workshops and training courses are described below:

3.1 Sub-regional Course on Statistics for the Countries in Transition in Central Asia and the Caucasus: This course was organized by SIAP (Statistical Institute for Asia and the Pacific) and hosted by the Statistical Centre of Iran from 21 April to 2 May 2007 in Tehran. Representatives from 12 countries (including Armenia, Azerbaijan, Georgia, Kazakhstan, Kyrgyzstan, Tajikistan, Uzbekistan and Iran) participated in this course.

3.2 Workshop on Economic Statistics and Informal Sector: This workshop was organized for the ECO countries in cooperation with UNSD, UNESCAP and ECO on 10-13 November 2007. The Statistical Centre of Iran hosted the workshop, and representatives from 9 ECO member states participated. The resource persons/lecturers were from UNSD and UNESCAP.

3.3 Workshop on Geographical Information System (GIS): This workshop was designed and hosted by the Statistical Centre of Iran in cooperation with the ECO secretariat on 19-22 April 2009, based on the proposal made by the SCI at the 6th NFPs meeting in November 2008 in Ankara. The course was conducted by the Statistical Research and Training Centre (SRTC), and the lectures were delivered by experts from the Office of Map and Geospatial Information of the SCI. Representatives from 7 ECO member countries participated in the workshop. It was the first workshop designed on the basis of the capabilities and potential of experts from the member states for capacity building within the region.
3.4 Workshop on the System of National Accounts: Another workshop proposed by the SCI at the 6th NFPs meeting on Economic Research and Statistics in November 2008 in Ankara was held in the field of national accounts, in cooperation with the ECO secretariat, on 17-20 May 2009 in Tehran. The Office of Economic Accounts of the SCI was assigned to prepare the curriculum of the workshop, and the Statistical Research and Training Centre (SRTC) was responsible for conducting the event. Representatives from 7 ECO member countries participated in this workshop, as well as representatives from UNESCAP and the ECO Trade and Development Bank. The main aspects and structure of the System of National Accounts (SNA Rev. 1993) were presented and discussed in this workshop by an expert from the Office of Economic Accounts of the SCI. It was the second workshop organized for the ECO member states with the cooperation of experts from ECO NSOs for capacity building in the region.

As described above, the Statistical Research and Training Centre (SRTC) has contributed significantly to the training programs of the SCI for ECO member states. With its experience and good capacity (in terms of software and hardware) for organizing regional and international training courses and workshops, this centre can effectively assist the ECO secretariat and ECO NSOs as one of the statistical training centers for capacity building, improvement of statistical science and development of cooperation in the field of statistics in the region.

4. Preparation of data for National Economic Report:

For publication of the ECO Annual Report, the questionnaire of the ECO secretariat was completed and returned to the Secretariat, covering the requested data for 35 items for the period 2000-2007.

5. Providing Metadata for statistical items:

Harmonizing the concepts and definitions of statistical items and preparing metadata is one of the main issues in statistical cooperation and must be pursued by the ECO NSOs. This issue was also discussed at the first meeting of the ECO High Level Expert Group (HLEG) on Statistics, and the member states agreed to cooperate in this regard. At the request of the ECO secretariat, the SCI has prepared and completed the required metadata for the statistical items of the ECO key indicators and ECO socio-economic indicators and provided them to the Secretariat.

6. Roster of Leading Experts in Statistics:

At the first meeting of the ECO High Level Expert Group (HLEG) on Statistics, the SCI put forward a proposal for creating a Roster of Leading Experts from national statistical offices in the region, as a directory for the ECO secretariat to consult in capacity building, in organizing training courses and workshops, and in the exchange of experience and expertise within the region. The Statistical Centre of Iran has already prepared the list of its leading experts in the various fields of statistics (economic statistics and national accounts, population and labor force statistics, statistical survey design, data processing and data warehousing, etc.) and provided the Secretariat with the list for further measures.

Annex I
ECO Statistical Network

I. Introduction

This project aims at developing a comprehensive system of statistical data and information management for the ECO Member States, which the Statistical Centre of Iran has been selected to develop and manage. The ECO Statistical Network is a place through which member countries can access the following main facilities:

General Information: Users can access general information such as publications, events, general information about member countries, figures and charts.
Business intelligence and data warehouse capabilities

All these functionalities are described in the following sections.

II. Proposed Solution

The ECO Statistical Network uses a variety of information from member countries and international organizations, and it provides a suitable platform for storing data so that any kind of analytical query can be answered. This solution is based on data warehousing technology, so it is useful to review a few definitions in this area:

Data Warehouse: A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.

Data Warehouse features:

Subject-oriented: The data gives information about a particular subject (for example, Population or National Accounts) instead of about an organization's ongoing (day-to-day) operations.

Integrated: The data is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

Time-variant: All data in the data warehouse is identified with a particular time period.

Non-volatile: Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business.

Benefits of Data Warehousing: Some of the benefits that a data warehouse provides are as follows:

A data warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.

Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.

Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.
Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.

Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.

Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.

OLAP (Online Analytical Processing): OLAP allows users to analyze database information from multiple database systems at one time. While relational databases are considered to be two-dimensional, OLAP data is multidimensional, meaning the information can be compared in many different ways. For example, a company might compare its computer sales in June with sales in July, and then compare those results with the sales from another location, which might be stored in a different database.

For example, if a report shows that sales are trending lower than expected, business users need to be able to uncover the underlying issue easily by getting answers to questions such as: Is the problem with one product line, or with certain regions? What is different between underperforming products or regions and other combinations that are performing well? Is there a related problem with sales headcount? With marketing campaigns? Or with something else?

Main functions of the ECO Statistical Network are:

1- Data Integration (ETL)

Data is everywhere. Providing a consistent, single version of the truth across all sources of information is one of the biggest challenges faced by IT organizations today. The ECO Statistical Network's Data Integration delivers powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative approach.
The ease of use of our graphical, drag-and-drop design increases productivity, and our extensible, standards-based architecture ensures that you will never be forced to adopt proprietary methodologies in your ETL solution.

Extract: Most data warehousing projects consolidate data from different source systems. Each separate system may also use a different data organization/format. Common data source formats are relational databases and flat files.

Transform: The transform stage applies a series of rules or functions to the data extracted from the source to derive the data for loading into the end target. Some important functions of the Transform stage:

I. Selecting only certain columns to load
II. Translating coded values (e.g., if the source system stores 1 for male and 2 for female, but the warehouse stores M for male and F for female)
III. Encoding free-form values (e.g., mapping "Male" to "1" and "Mr" to M)
IV. Deriving a new calculated value (e.g., sale_amount = qty * unit_price)
V. Filtering
VI. Sorting
VII. Joining data from multiple sources (e.g., lookup, merge)
VIII. Aggregation (for example, rollup: summarizing multiple rows of data, such as total sales for each store and for each region)
IX. Transposing or pivoting (turning multiple columns into multiple rows or vice versa)
X. Applying any form of simple or complex data validation. If validation fails, it may result in a full, partial or no rejection of the data, and thus none, some or all of the data is handed over to the next step, depending on the rule design and exception handling. Many of the above transformations may result in exceptions, for example, when a code translation encounters an unknown code in the extracted data.

Load: The load phase loads the data into the end target, usually the data warehouse (DW).
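Several of the transform functions listed above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the ETL tool used by the ECO Statistical Network: the column names (`store`, `sex`, `qty`, `unit_price`, `note`), the code table `SEX_CODES`, the helper functions and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical code table: the source stores 1/2, the warehouse stores M/F.
SEX_CODES = {"1": "M", "2": "F"}


def transform(rows):
    """Apply a few transform rules and return (accepted, rejected) rows."""
    accepted, rejected = [], []
    for row in rows:
        sex = SEX_CODES.get(row["sex"])
        if sex is None:
            # Validation: an unknown code rejects this row only
            # (a partial rejection of the extracted data).
            rejected.append(row)
            continue
        accepted.append({
            # Select only the columns the target needs ("note" is dropped).
            "store": row["store"],
            # Translate the coded value.
            "sex": sex,
            # Derive a new calculated value.
            "sale_amount": int(row["qty"]) * float(row["unit_price"]),
        })
    return accepted, rejected


def total_by_store(rows):
    """Aggregate (roll up) sale_amount for each store."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["sale_amount"]
    return dict(totals)


# Made-up sample rows, as they might arrive from a flat file.
extracted = [
    {"store": "A", "sex": "1", "qty": "2", "unit_price": "1.5", "note": "x"},
    {"store": "A", "sex": "2", "qty": "1", "unit_price": "4.0", "note": "y"},
    {"store": "B", "sex": "9", "qty": "3", "unit_price": "2.0", "note": "z"},
]
accepted, rejected = transform(extracted)
totals = total_by_store(accepted)
print(totals)    # totals per store for the accepted rows
print(rejected)  # rows held back for exception handling
```

Rejecting only the offending row is one possible rule design; a stricter design could reject the whole batch when any row fails validation.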
2- Analysis

ECO Statistical Network Analysis Overview

Analysis puts rich analytic power in the hands of your knowledge workers, helping them operate with maximum effectiveness by gaining the insights and understanding they need to make optimal business decisions.

For example, if a report shows that sales are trending lower than expected, knowledge workers need to be able to uncover the underlying issue easily by getting answers to questions such as:

I. Is the problem with one product line, or with certain regions?
II. Is it all states within that region, or a combination of certain products in certain regions?
III. What is different between underperforming products or regions and other combinations that are performing on target?
IV. Is there a related problem with sales headcount? With marketing campaigns? Or with something else?

Analysis helps answer these kinds of business questions by:

I. Making it easy for users to freely explore business information by interactively drilling into and cross-tabulating data
II. Providing speed-of-thought response times to complex analytical queries
III. Presenting data multi-dimensionally and letting users select which dimensions and measures to explore

3- Ad-hoc reporting

All organizations use reporting in one form or another. As a result, reporting is considered a core Business Intelligence (BI) need and is frequently the first BI application deployed. ECO Statistical Network Reporting allows members to easily access, format, and distribute information to their users.
Flexible deployment, from standalone desktop reporting to embedded reporting and enterprise business intelligence

Broad data source support, including relational, OLAP, or XML-based data sources

Popular output options, including Adobe PDF, HTML, Microsoft Excel, Rich Text Format, or plain text

Web-based ad hoc query and reporting for business users

Enterprise Edition provides enhanced software functionality, comprehensive professional technical support, product expertise, certified software and software maintenance, and more

III. Deployment

To extract data and information, it is suggested to set up an FTP site for the entry of statistical data, indicators and items, from which the ECO Statistical Database can be fed.

Flowchart of the work process based on FTP Site (components shown: Members' DBs for Cnt 1, Cnt 2 and Cnt 3; FTP-Based ETL; Staging; SCI's Data Warehouse; OLAP; Reporting Tools; Development & Design Tools)

Finally, the FTP site will be developed and implemented, and log-in and data entry facilities will be available for each of the ECO Member States. A username and password are given to every member country for uploading data. As shown in the flowchart, after data entry (uploading) by the member states, the ETL (Extraction, Transformation, Loading) process extracts the data from this site and loads it into the staging (temporary) database. This operation is done automatically. Most importantly, the ETL system supports most common data formats (Microsoft Excel, XML, HTML, TSV/CSV and the like). After extraction, the data should be managed.
That is, some of the following activities may be carried out to unify and consolidate the display format and to manage inconsistencies and conflicts:

Definition of conditions and of contextual and thematic control rules
Selecting and transferring only certain columns and fields
Translating coded values (e.g., the source system stores code 1 for male and code 2 for female)
Cleaning up values
Joining data from multiple sources
Aggregation (for example, minimum, maximum, number of records, etc.)
….

The schemas above illustrate the ETL process. After these stages, the managed data is loaded into the Data Warehouse (DW). The data and information must be loaded in a format on which cubes can be defined. Finally, reports produced from the data cubes, which are designed on the dedicated site, will be released. The ECO Data Warehouse is implemented on the PostgreSQL RDBMS.

As mentioned before, launching the ECO Statistical Database requires the following stages.

Project's Progress

IV. Pilot

In the pilot stage, all statistical indicators were extracted from the ECO website (http://www.ecosecretariat.org) and then categorized into a few segments (such as Population, Financial Intermediation, etc.) based on the Iran Statistical Yearbook. In the next step, these data were transformed and loaded into a main database. After that, all data cubes were designed and published.

A few examples of ECO statistical data cubes are shown in figures 3 and 4. Figure 3 shows one indicator, External Public Debt-GDP, for two dimensions (year and country) along with its time series. Figure 4 shows three indicators of Financial Intermediation for two dimensions. As can be seen, the sum of each indicator is shown for all countries. Clicking the plus sign next to All Countries displays these indicators for each country.

Figure 3: a sample of time series

Figure 4: a sample of data cube

Figures 5 and 6 show a few charts and graphs.
Figure 5: Financial Intermediation and charts
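The cube behaviour described in the pilot, where an indicator is summed over All Countries and can be expanded to show each country, can be sketched in Python as follows. This is a minimal illustration and not the network's actual cube definition: the function names `build_cube` and `drill_down`, the country labels and the values are hypothetical placeholders, not ECO statistics.

```python
from collections import defaultdict

ALL = "All Countries"

# Placeholder facts as (year, country, value); not real ECO figures.
facts = [
    (2006, "Country A", 10.0),
    (2006, "Country B", 20.0),
    (2007, "Country A", 12.0),
    (2007, "Country B", 18.0),
]


def build_cube(rows):
    """Sum an indicator per (year, country) and per (year, All Countries)."""
    cube = defaultdict(float)
    for year, country, value in rows:
        cube[(year, country)] += value  # detail cell
        cube[(year, ALL)] += value      # rollup cell over the country dimension
    return dict(cube)


def drill_down(cube, year):
    """Expand the All Countries cell of one year into its per-country cells."""
    return {
        country: value
        for (cell_year, country), value in cube.items()
        if cell_year == year and country != ALL
    }


cube = build_cube(facts)
print(cube[(2007, ALL)])       # rolled-up value for 2007
print(drill_down(cube, 2007))  # what clicking the plus sign would reveal
```

A plain sum is used here because the pilot describes the sum of each indicator being shown for all countries; for ratio indicators a different aggregate might be more meaningful.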