Project / Subproject: IDA/EINRC framework 501998, Specific Agreement-7/EIONET
Authors: TietoEnator Corporation
Date issued / Version: 02 April 2004 / 1.1
Design Report
Client: European Commission DG Enterprise / B 6
Info: Detailed Design of Data Dictionary
Reference / Page: DD DDD v1.1.doc / 1 of 25

EUROPEAN COMMISSION
ENTERPRISE DIRECTORATE-GENERAL
IDA-Programme - Framework Contract No 501 998 - 7th Specific Agreement
Einrc - Eionet

DETAILED DESIGN
Data Dictionary
Detailed Design of Data Dictionary system EINRC/DD

MODIFICATION HISTORY
Date | Version | Author | Reason for modification
24.03.2003 | 0.1 | J.Heinlaid | First draft
09.04.2003 | 0.2 | J.Heinlaid | Updated incorporating changes requested by project leader
12.12.2003 | 0.3 | J.Heinlaid | Updates documenting developments done within SA-6
05.01.2004 | 0.4 | J.Heinlaid | Updates documenting last developments within SA-6
08.03.2004 | 1.0 | E.Palosuo | Status updated to final approved SA6 deliverable
02.04.2004 | 1.1 | J.Heinlaid | Updates documenting developments in the first increment under SA7

APPROVAL PROCEDURE
Reviewers and versions reviewed:
- Project Team Frm mgr: SA5/0.2, SA6/0.4, SA7/1.1
- Proj. Leader: SA5/0.2, SA6/0.4, SA7/1.1
- CMT review: SA6/1.0, SA7/x.x
Start dates: 28.03.03, 09.01.04, 11.06.04, 09.04.03, 28.01.04, 02.04.04, 28.01.04, xx.xx.04
Finish dates: 09.04.03, 28.01.04, 14.06.04, 11.04.03, 03.02.04, 02.04.02, 08.03.04
Required modification/Approval/Comments: Approved; DD for the first SA7 increment approved; Approved as SA5 deliverable; Updated version approved as SA6 deliverable; DD for the first increment approved; SA6 deliverable approved

Background material:
A1: 'Data Dictionary: Current Situation and User Requirements' document from SA-5
http://nmc.eionet.eu.int:8980/Members/irc/eionetnmc/einrc/library?l=/einrc_sa5_2002/dd/dd_ur_v1_2_doc/_EN_1.0_&a=d
A2: 'Data Dictionary: Software Requirements and Architectural Design' document from SA-5
http://nmc.eionet.eu.int:8980/Members/irc/eionetnmc/einrc/library?l=/einrc_sa5_2002/dd/dd_sr_ad_v1_doc/_EN_1.2_&a=d
A3: 'Data Dictionary - part of REPORTNET' document, created by Arvid Lillethun
http://nmc.eionet.eu.int:8980/Members/irc/eionetnmc/einrc/library?l=/einrc_sa5_2002/dd/021213_dictionary/_EN_1.0_&a=d
A4: 'Data Dictionary: Current Situation and User Requirements' document from SA-6
http://nmc.eionet.eu.int:8980/irc/DownLoad/cgut1gLBhpL0qe4m4KSm4qeqGZ0EHm1pU5R978bRspj2601dq5vk0j3lG4u1LfGpxAoMfGZ0o/DD%20UR%20031003v01.doc
A5: 'Data Dictionary: Current Situation and User Requirements' document from SA-7
http://nmc.eionet.eu.int:8980/irc/DownLoad/cjuy1ULHhoLtqu4f4TS34P-Cqc2CGFle3rjF4t4-ZUC784CfZ5SbNfZsTR6Fh-qMZq69-QXC4/DD%20SA-7%20UR%20v0.2.doc
A6: 'DD: Project Management Plan' document from SA-7
http://nmc.eionet.eu.int:8980/irc/DownLoad/c5uG1hLVhRLFqt444NS3456quCw0XLNZRc9YW4Gotp9pLkcH-jPb20s-pAhF5AjDs46S0Us-xHQK/DDPMP%20v1.0.doc
A7: Task lists for DD releases in the DD folder of the EINRC/NMC library:
http://nmc.eionet.eu.int:8980/Members/irc/eionetnmc/einrc/library?l=/einrc_sa7_2004/dd&vm=detailed&sb=Title

IDA-Programme - Framework Contract No 501 998 - EINRC/7th Specific Agreement    Print Date 12/02/2016 14:43:00    116102285

Table of Contents
FOREWORD
1. OBJECTIVES AND SCOPE
2. CONCEPTS
   2.1 DATA
   2.2 METADATA
3. FUNCTIONALITY
   3.1 MANAGEMENT OF DATA DEFINITIONS
   3.2 MANAGEMENT OF METADATA DEFINITIONS
   3.3 ALL USERS FUNCTIONALITY
   3.4 DATA DEFINERS AND ADMINISTRATORS FUNCTIONALITY
4. IMPLEMENTATION
   4.1 SYSTEM ARCHITECTURE
   4.2 DATA MODEL & BUSINESS LOGIC
      4.2.1 Model & business logic for basic concepts
      4.2.2 Model & business logic for classification schemes
      4.2.3 Unique identifiers
   4.3 IMPORT TOOL
   4.4 USER INTERFACE
      4.4.1 Client side
      4.4.2 Server side
   4.5 USER AUTHENTICATION & ACCESS CONTROL
   4.6 USED SOFTWARE
5. CONFIGURATION & INSTALLATION

Foreword
This document presents the detailed design of the Data Dictionary (DD) system. By the end of SA-6 the system reached its first operational stage, so no changes to its major design concepts, and hence to this document, are expected. However, since the development of DD continues in SA-7, this document is not final and will very probably be updated during SA-7 as well. It nevertheless documents the basic design principles and implementations that have not changed for some time and are unlikely to change substantially in the future. Most notably, it covers:
- the basic types of data (i.e. concepts) that DD handles;
- all the functionality that DD provides on top of them (this document only lists it; a more detailed description is given in a separate User Guide document);
- the data model framing the concepts and functionality;
- the basic principles and technical documentation of how it has all been implemented;
- a description of DD's main use cases and working sequences;
- the technologies used in implementing DD.
The document is closely related to the background material listed above:
- The basic concepts and the functionality that DD handles are mostly based on the user requirements set in A1 and A4-A7.
- A2 has been the basis for the general layout of DD's user interface, its data model and technical architecture, and for how it supports XML Schema formatted definitions and ISO 11179.
- A3 and A6 are the sole basis for how DD currently handles the concrete business needs of Reportnet, i.e. the functionality and concepts around datasets and tables, the Microsoft-based import/export formats and the selected pilot content.
In a way, this document can be seen as a consolidation of the above background material.

1. Objectives and scope
Objective
Data Dictionary is to become a web-based data semantics registry for Reportnet. It will hold definitions of the datasets, tables and data elements exchanged over Reportnet, whether by humans or by applications. It will provide functions to import such definitions in machine-readable form, to manage them through a web-based user interface, and to export them in both machine- and human-readable formats. An API allowing other applications to perform selected operations shall be included as well.
Data Dictionary shall follow the common standards, technologies and recommendations of the data semantics field as far as concrete business needs allow. The main such standard is ISO 11179.
Scope
The following scope has been implemented by the end of Specific Agreement-6:
A database model supporting about 70% of the data semantics functionality suggested by ISO 11179 (http://www.diffuse.org/meta.html - ISO11179):
- Part 1 (Framework for the specification of data elements)
- Part 2 (Classification for Data Elements)
- Part 3 (Basic Attributes of Data Elements)
- Part 5 (Naming and Identification Principles for Data Elements)
Part 4, "Rules and Guidelines for the Formulation of Data Definitions", is more about advising EEA on principles for formulating data definitions.
Part 6 deals with the registration of data elements, i.e. keeping track of administrative metadata, which is going to be implemented in the project's later phases.
NB! Note that the implemented parts listed above may not have been implemented to the fullest.
On top of the ISO 11179 compliant data model, a user interface has been implemented for:
- managing definitions of datasets,
- managing definitions of tables in datasets,
- managing definitions of data elements,
- managing definitions of attributes for datasets, tables and data elements,
- managing namespaces (i.e. contexts) in which the definitions have been made.
Transparently to the user, support for representing any of these definitions in XML Schema format has been implemented. However, only a fraction of the XML Schema specification is supported, because building a web interface for managing XML Schema to its full specification would be a huge amount of work. Supported are:
- declaration of elements
- element annotations
- value domain annotations
- simple types
- restrictions based on simple types
- complex types (but not attributes!)
- extension of complex types
  · sequences
  · choices
This XML Schema fraction is supported both in the UI and in the import/export function. Note that aggregate data elements are not supported. XML Schema complex types, sequences and choices have been used to represent the hierarchical structure of datasets > tables > data elements.
To make the data definition process comfortable, a simple Import tool was implemented. It is based on MS Access and allows definitions to be managed in a flat table format. The tool generates XML of whatever has been entered into its tables, and this XML is then imported into the DD database.
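The mapping of the dataset > table > data element hierarchy onto XML Schema complex types and sequences can be illustrated with a minimal Java sketch (Java being the language DD is written in). This is not DD's actual export code: the class, record and method names (XsdSketch, Dataset, Table, toXsd) are hypothetical, choices and annotations are omitted, and every data element is declared as xs:string for brevity, whereas a real definition would carry its own simple type or restriction.

```java
import java.util.List;

/**
 * Hypothetical sketch: renders a dataset > table > data element hierarchy
 * as an XML Schema fragment using nested complex types and sequences.
 * Names are illustrative only, not DD's actual code.
 */
class XsdSketch {

    record Table(String name, List<String> elements) {}

    record Dataset(String name, List<Table> tables) {}

    static String toXsd(Dataset ds) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n");
        // The dataset becomes an element with a complex type whose sequence lists its tables.
        sb.append("  <xs:element name=\"").append(ds.name()).append("\">\n");
        sb.append("    <xs:complexType>\n      <xs:sequence>\n");
        for (Table t : ds.tables()) {
            // Each table is a nested element whose complex type sequences its data elements.
            sb.append("        <xs:element name=\"").append(t.name()).append("\">\n");
            sb.append("          <xs:complexType>\n            <xs:sequence>\n");
            for (String el : t.elements()) {
                // Assumption: all data elements typed as xs:string in this sketch.
                sb.append("              <xs:element name=\"").append(el)
                  .append("\" type=\"xs:string\"/>\n");
            }
            sb.append("            </xs:sequence>\n          </xs:complexType>\n");
            sb.append("        </xs:element>\n");
        }
        sb.append("      </xs:sequence>\n    </xs:complexType>\n  </xs:element>\n");
        sb.append("</xs:schema>\n");
        return sb.toString();
    }
}
```

For example, a hypothetical dataset WaterQuality with one table Stations containing Country and Latitude yields a schema with one xs:sequence at the dataset level and one at the table level.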
Through several intensive workshops at the client, pilot datasets were identified and their definitions (together with their tables and data elements) were imported into DD through the Import tool.
A separate User Guide document describes how the UI and the Import tool can be used.

2. Concepts
Before going further, read through the following basic concepts around which the definition management processes in DD revolve.
2.1 Data
Dataset
A collection of data exchanged between applications or humans. In the context of Reportnet and the Data Dictionary, a dataset is a collection of tables containing the reported data. Often the "tables" will actually reduce to a single table. Datasets usually come as MS Access databases or MS Excel files. They are subject to certain data flows, and Reportnet players are obliged by legislation to report them.
Table
A table in the Data Dictionary's context is a table in a dataset. It can be either a data table or a lookup table for interpreting the data. A lookup table can, for example, hold country codes or any other code list. Columns in a table stand for data elements, rows for their values.
Data element
A data element in the Data Dictionary's context is a column in a table, for example 'Country', 'Latitude', 'Longitude', 'Unit', 'Value', etc. There are three types of data elements:
Data element with quantitative values
Sometimes also referred to as 'data element with measured values', 'characteristic 2' or 'characteristic of type 2'. This is a data element that can have any values.
Data element with fixed values
Sometimes also referred to as 'data element with allowable values', 'code list', 'characteristic 1' or 'characteristic of type 1'. This is a data element that can only have values from a predefined set. Usually those values are some kind of codes, hence the name 'code list'.
Aggregate data element
Sometimes also referred to as 'data element with a structure' or simply 'aggregate'. This is a data element that cannot have any values. Instead it has a structure consisting of other data elements, i.e. it is an aggregate of other data elements, which can in turn be of any of the three types.
2.2 Metadata
Attribute
An attribute in the Data Dictionary's context is an attribute of a definition of a dataset, table or data element. The most common attribute of all definitions is 'Name', standing for the name of the defined object. Other attributes could be, for example, 'Definition', 'Version', etc. DD recognizes two types of attributes:
Simple attribute
Every instance of such an attribute is a name/value pair. Simple attributes represent the same concept of attribute as ISO 11179 part 3. The attributes 'Name', 'Definition' and 'Version' mentioned above are good examples of simple attributes.
NB! Note that simple attributes can have predefined sets of allowable values, just like data elements of characteristic type 1. Again, such values are quite often some kind of codes.
Complex attribute
A complex attribute in the Data Dictionary's context is almost the same as a simple attribute.
The only distinction is that while a simple attribute is a name/value pair, a complex attribute has a structure of its own, consisting of several fields, each of which is a name/value pair. For example, a complex attribute could be RelatedSource, having two fields: RelatedSourceID and RelationType. An instance of a complex attribute is then a set of rows, each containing the values of the fields in their defined order. The complex attribute is specific to DD; it is not present in ISO 11179.

3. Functionality
In general, there are two views on the Data Dictionary's functionality:
1. Based on the general types of data it handles, the functionality can be divided into two logically separate systems:
- the system dealing with the management of data definitions (storing, searching, importing/exporting definitions of datasets, tables and data elements);
- the system dealing with the management of metadata used in data definitions (storing and searching the basic attributes of data definitions and the namespaces in which the definitions are seen).
2. Based on user groups, the functionality can be divided into three as follows:
- All users functionality (fig 3.1): available to all human users, i.e. administrators and public users.
- Data definers functionality (fig 3.2): available both to data definers and DD administrators.
- Administrators only functionality (fig 3.2): currently the functionality sets of data definers and administrators are equal.

Figure 3.1. Use cases for all users (actor: All users; use cases: search dataset, search tables, search element; browse list of datasets, tables and elements; browse dataset, table and element detailed views; browse allowable values; browse subelements)

Figure 3.2. Use cases for data definers and administrators (actor: Administrator; use cases: search datasets, tables and elements; browse lists and detailed views of datasets, tables and elements; add a new dataset, table, element or attribute; edit dataset, table, element, namespace, simple attribute, complex attribute and complex attribute fields; browse/edit tables within a dataset, elements within a table, complex attributes, allowable values and subelements; browse/edit the lists of namespaces and attributes; browse namespace and attribute detailed views; export/import)

NB! Note that this document only lists the functionality. Further guidelines on how to actually use it, including screenshots, are available in a separate User Guide document. This approach was taken for three reasons:
- The number of different user views and use cases in DD is very large.
- Documenting that many use cases in the Detailed Design document would, together with its other parts, make it uncomfortably long.
- Users are very likely to need the guidelines too, but they should not be given access to the detailed design.
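The division of functionality by user group can be sketched as a simple permission check. This is a hypothetical illustration, not DD's access control implementation: the Role and Operation enums and the isAllowed method are assumed names, and operations are grouped coarsely (search and browse for all users; add, edit, delete and export/import for data definers and administrators, whose rights are currently equal).

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical sketch of the user-group split: all users may search and
 * browse; data definers and administrators may additionally add, edit,
 * delete and export/import. Names are illustrative only.
 */
class AccessSketch {

    enum Role { PUBLIC_USER, DATA_DEFINER, ADMINISTRATOR }

    enum Operation { SEARCH, BROWSE, ADD, EDIT, DELETE, IMPORT_EXPORT }

    // Operations available to every human user (cf. figure 3.1).
    private static final Set<Operation> ALL_USERS =
        EnumSet.of(Operation.SEARCH, Operation.BROWSE);

    static boolean isAllowed(Role role, Operation op) {
        if (ALL_USERS.contains(op)) {
            return true;
        }
        // Data definers and administrators currently have equal rights (cf. figure 3.2).
        return role == Role.DATA_DEFINER || role == Role.ADMINISTRATOR;
    }
}
```

Keeping the all-users set separate means that, should the rights of data definers and administrators diverge later, only the final check would need to change.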
3.1 Management of data definitions
Grouped by the concepts handled, the functions are as follows:
Datasets
- search for, by selected criteria
- view search result list
- view a dataset definition, including all its attributes and the definitions of its tables
- add a new dataset (define its attributes and tables)
- edit an existing dataset (define its attributes and tables)
- delete an existing dataset
Tables
- search for, by selected criteria
- view search result list
- view a table's definition, including all its attributes and the definitions of its data elements
- add a new table (define its attributes and data elements)
- edit an existing table (define its attributes and data elements)
- delete an existing table
Data elements
- search for, by selected criteria
- view search result list
- view a data element's definition, including all its attributes and, if it is of the fixed-values type, its allowable values
- add a new data element (define its attributes and allowable values)
- edit an existing data element (define its attributes and allowable values)
- delete an existing data element
3.2 Management of metadata definitions
Grouped by the concepts handled, the functions are as follows:
Attributes (both simple and complex)
- search for, by selected criteria
- view search result list
- view an attribute definition, including its fields if it is of complex type
- add a new attribute (also defining its fields if it is of complex type)
- edit an existing attribute (also defining its fields if it is of complex type)
- delete an existing attribute
3.3 All users functionality
In principle, all users have access to the following functionality for the basic DD concepts (see also fig 3.1):
Datasets
- search for, by selected criteria
- view search result list
- view a dataset definition, including all its attributes and the definitions of its tables
Tables
- search for, by selected criteria
- view search result list
- view a table's definition, including all its attributes and the definitions of its data elements
Data elements
- search for, by selected criteria
- view search result list
- view a data element's definition, including all its attributes and, if it is of the fixed-values type, its allowable values
3.4 Data definers and administrators functionality
Since administrators also fall under all users, they obviously have access to the all-users functionality described above.
In addition, they can perform many more operations on many more concepts (see also fig 3.2):
Datasets
- add a new dataset (define its attributes and tables)
- edit an existing dataset (define its attributes and tables)
- delete an existing dataset
Tables
- add a new table (define its attributes and data elements)
- edit an existing table (define its attributes and data elements)
- delete an existing table
Data elements
- add a new data element (define its attributes and allowable values)
- edit an existing data element (define its attributes and allowable values)
- delete an existing data element
Attributes (both simple and complex)
- search for, by selected criteria
- view search result list
- view an attribute definition, including its fields if it is of complex type
- add a new attribute (also defining its fields if it is of complex type)
- edit an existing attribute (also defining its fields if it is of complex type)
- delete an existing attribute
Note that for simple attributes one can also specify allowable values.

4. Implementation
4.1 System architecture
The overall system architecture of the Data Dictionary is given in figure 4.1.
Figure 4.1 System architecture of the Data Dictionary
The picture is made up of three main parts:
Data Dictionary's software
This is the big box in the middle of the picture.
All the software has been written in the Java programming language and therefore runs in the Java Runtime Environment, marked with a dotted line.
Data Dictionary's database
Shown on the right side of the picture. It has to be JDBC (Java Database Connectivity) compliant, since all database access goes through a JDBC driver.
Outer world interacting with Data Dictionary
This is the WWW, represented as the free-form object on the right side of the picture. Currently DD recognizes two types of interaction:
- Web client (WWW browser)
- The previously mentioned MS Access-based data Import tool (note that communication between the tool and the DD software happens over XML).
Later, an API for applications will also be developed as the third type of interaction.

4.2 Data model & business logic
As mentioned in the scope chapter, the data model covers about 70% of the functionality suggested by ISO 11179. Note that this does not necessarily mean the model also looks like ISO's. Striving for ISO support in all its parts became a priority only after the project had started, and by then the data model and its implementation had advanced too far to go back. Concrete business needs have also restricted how generic the model can be. So whenever ISO support is discussed here, it means supporting ISO in terms of functionality, not in terms of the data model.
Since the model is quite big, it has been divided into two parts: the part covering basic concepts and XML Schema support (fig 4.2), and the part supporting classification schemes (fig 4.3).
4.2.1 Model & business logic for basic concepts
Figure 4.2 Model for basic concepts. The diagram contains the following tables:
DATASET (DATASET_ID : NUMBER, SHORT_NAME : VARCHAR, VERSION : VARCHAR, VISUAL : TEXT, DETAILED_VISUAL : TEXT, WORKING_USER : VARCHAR, WORKING_COPY : CHAR, REG_STATUS : VARCHAR, DATE : NUMBER, USER : VARCHAR, CORRESP_NS : NUMBER, DELETED : VARCHAR, IDENTIFIER : VARCHAR)
DST2TBL (DATASET_ID : NUMBER, TABLE_ID : NUMBER, POSITION : NUMBER)
DS_TABLE (TABLE_ID : NUMBER, SHORT_NAME : VARCHAR, WORKING_USER : VARCHAR, WORKING_COPY : CHAR, REG_STATUS : VARCHAR, VERSION : NUMBER, DATE : NUMBER, USER : VARCHAR, CORRESP_NS : NUMBER, PARENT_NS : NUMBER, IDENTIFIER : VARCHAR)
TBL2ELEM (TABLE_ID : NUMBER, DATAELEM_ID : NUMBER, POSITION : NUMBER)
DATAELEM (DATAELEM_ID : NUMBER, TYPE : VARCHAR, SHORT_NAME : VARCHAR, EXTENDS : NUMBER, WORKING_USER : VARCHAR, WORKING_COPY : CHAR, REG_STATUS : VARCHAR, VERSION : NUMBER, USER : VARCHAR, DATE : NUMBER, PARENT_NS : NUMBER, TOP_NS : NUMBER, IDENTIFIER : VARCHAR, GIS : VARCHAR)
FK_RELATION (REL_ID : NUMBER, A_ID : NUMBER, B_ID : NUMBER, A_CARDIN : CHAR, B_CARDIN : CHAR, DEFINITION : TEXT)
FXV (FXV_ID : NUMBER, OWNER_ID : NUMBER, OWNER_TYPE : VARCHAR, VALUE : VARCHAR, IS_DEFAULT : CHAR, DEFINITION : VARCHAR, SHORT_DESC : VARCHAR)
NAMESPACE (NAMESPACE_ID : VARCHAR, SHORT_NAME : VARCHAR, FULL_NAME : VARCHAR, DEFINITION : VARCHAR, PARENT_NS : NUMBER, WORKING_USER : VARCHAR)
M_ATTRIBUTE (M_ATTRIBUTE_ID : NUMBER, NAME : VARCHAR, SHORT_NAME : VARCHAR, OBLIGATION : CHAR, DEFINITION : TEXT, NAMESPACE_ID : VARCHAR, DISP_TYPE : VARCHAR, DISP_ORDER : NUMBER, DISP_WHEN : CHAR, DISP_WIDTH : NUMBER, DISP_HEIGHT : NUMBER, DISP_MULTIPLE : CHAR, INHERIT : CHAR)
ATTRIBUTE (M_ATTRIBUTE_ID : NUMBER, DATAELEM_ID : NUMBER, VALUE : TEXT, PARENT_TYPE : CHAR)
M_COMPLEX_ATTR (M_COMPLEX_ATTR_ID : NUMBER, NAME : VARCHAR, SHORT_NAME : VARCHAR, OBLIGATION : CHAR, DEFINITION : TEXT, NAMESPACE_ID : VARCHAR, DISP_ORDER : NUMBER, DISP_WHEN : CHAR, INHERIT : CHAR, HARVESTER_ID : VARCHAR)
M_COMPLEX_ATTR_FIELD (M_COMPLEX_ATTR_FIELD_ID : NUMBER, M_COMPLEX_ATTR_ID : NUMBER, NAME : VARCHAR, DEFINITION : TEXT, POSITION : NUMBER, PRIORITY : CHAR, HARV_ATTR_FLD_NAME : VARCHAR)
COMPLEX_ATTR_ROW (ROW_ID : VARCHAR, PARENT_ID : NUMBER, PARENT_TYPE : CHAR, M_COMPLEX_ATTR_ID : NUMBER, POSITION : NUMBER, HARV_ATTR_ID : VARCHAR)
COMPLEX_ATTR_FIELD (ROW_ID : VARCHAR, M_COMPLEX_ATTR_FIELD_ID : NUMBER, VALUE : TEXT)
HARV_ATTR (HARV_ATTR_ID : NUMBER, HARVESTER_ID : VARCHAR, HARVESTED : NUMBER, MD5KEY : VARCHAR, LOGICAL_ID : VARCHAR)
HARV_ATTR_FIELD (HARV_ATTR_MD5 : VARCHAR, FLD_NAME : VARCHAR, FLD_VALUE : VARCHAR)

Business logic for the model in figure 4.2:
Datasets, tables and data elements are also referred to simply as objects.
Note that attributes and the objects for which they can be specified are kept in separate tables. Attributes are defined in the so-called meta-tables M_ATTRIBUTE (for simple attributes), M_COMPLEX_ATTR (for complex attributes) and M_COMPLEX_ATTR_FIELD (for complex attribute fields).
The actual values of attributes (and of their fields, in the case of complex ones) are kept in tables ATTRIBUTE (for simple attributes), COMPLEX_ATTR_ROW (for complex attributes) and COMPLEX_ATTR_FIELD (for complex attribute fields).
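The separation of attribute definitions (M_ATTRIBUTE) from attribute values stored as rows (ATTRIBUTE) can be sketched in memory as follows. This is a hypothetical illustration, not DD's code: the records mirror only a few columns, the object ID is called parentId (the diagram shows the column as DATAELEM_ID), and the parent type code 'E' used in the usage note is an assumed value, since the actual PARENT_TYPE codes are not listed in this document.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory sketch of the attribute storage pattern:
 * definitions live in one structure (cf. M_ATTRIBUTE), values are stored
 * as rows (cf. ATTRIBUTE) rather than as columns of the object tables.
 */
class AttributeStoreSketch {

    // Cf. M_ATTRIBUTE: one row per attribute definition.
    record AttrDef(int id, String shortName) {}

    // Cf. ATTRIBUTE: one row per attribute value of one object.
    // parentType is an assumed code, e.g. 'E' for a data element.
    record AttrValue(int attrId, int parentId, char parentType, String value) {}

    /** Collects the simple attribute values of one object as name -> value. */
    static Map<String, String> attributesOf(int parentId, char parentType,
                                            List<AttrDef> defs, List<AttrValue> values) {
        Map<Integer, String> names = new LinkedHashMap<>();
        for (AttrDef d : defs) {
            names.put(d.id(), d.shortName());
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (AttrValue v : values) {
            // Only rows belonging to the requested object are picked up.
            if (v.parentId() == parentId && v.parentType() == parentType) {
                result.put(names.get(v.attrId()), v.value());
            }
        }
        return result;
    }
}
```

With a definition (1, "Name") and a value row (1, 10, 'E', "Latitude"), attributesOf(10, 'E', ...) returns a map with Name mapped to Latitude. Introducing a new attribute means adding one definition row; no column has to be added to DATAELEM, DS_TABLE or DATASET.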
Finally, the tables of objects whose definitions can have attributes are linked to the attribute value tables with 1-to-many relations, meaning that an object can have one or many attributes. In other words, attribute values are stored as rows, not as fields of the object's table. This separation ensures that new attributes can be added to the system at any time, and existing ones edited or deleted whenever needed, without changing the database structure. It also means that DD is not restricted to any fixed set of standardized attributes, which makes it very flexible.
Objects that can have attributes are stored in the following tables:
- DATASET for datasets. For each dataset there is exactly one row in this table.
- DS_TABLE for tables. Tables are always seen within a certain namespace, identified by NAMESPACE_ID. For each table there is exactly one row in this table.
- DATAELEM for data elements. Every data element is likewise present in some namespace, identified by NAMESPACE_ID. There is exactly one row for each data element in this table.
There is a restriction that each data element must have a parent table and each table must have a parent dataset! This restriction might be dropped in future developments, when DD is going to support harmonised content.
Datasets, tables and data elements in DD are versioned (see the VERSION columns in the corresponding tables). This means there can be several versions of a logically identical dataset, table or data element. Thus one can see two types of identifiers in the DD model:
- logical ID (for distinguishing logically different objects)
- versioned ID (for distinguishing different versions of a logically identical object)
Put simply, versioned ID = logical ID + version. Versioned IDs are kept in the DATAELEM_ID, TABLE_ID and DATASET_ID fields.
A logically unique data element can, and must, belong to exactly one logically unique table, and a logically unique table can, and must, belong to exactly one logically unique dataset.
Logical IDs are formed as follows:
- the logical ID of a dataset is its identifier (IDENTIFIER)
- the logical ID of a table is made up of its identifier (IDENTIFIER) and its parent dataset's logical ID (PARENT_NS, see below)
- the logical ID of a data element is made up of its identifier (IDENTIFIER) and its parent table's logical ID (PARENT_NS, see below)

Each logically unique dataset and each logically unique table stands for a unique context (also known as a namespace). These contexts are defined and stored in the NAMESPACE table, and namespace identifiers are stored in the NAMESPACE_ID column. Whenever a new logically unique dataset or table is created, DD automatically creates the corresponding namespace in the NAMESPACE table.

In DS_TABLE, a table's parent dataset's logical ID is represented by that dataset's namespace ID, PARENT_NS. In DATAELEM, a data element's parent table's logical ID is represented by that table's namespace ID, PARENT_NS, and the element's parent dataset's logical ID is represented by that dataset's namespace ID, TOP_NS.

Data element types are identified by the value of DATAELEM.TYPE, which is one of 'AGG', 'CH1' (fixed values) or 'CH2' (quantitative values).

The TBL2ELEM table resolves the many-to-many relationship between tables and data elements with unique versioned IDs. Likewise, DST2TBL resolves the many-to-many relationship between tables and datasets with unique versioned IDs.

The fields of complex attributes are defined so that each complex attribute in the M_COMPLEX_ATTR table can have one or many fields, defined in M_COMPLEX_ATTR_FIELD.
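The logical-ID rules listed above can be modelled as value types whose equality follows the logical key, with a versioned key adding the version on top. The sketch below is illustrative only: the record and class names are hypothetical, namespace IDs are allocated in memory here (in DD they live in the NAMESPACE table), and the identifiers "Sites" and "SiteCode" are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value types for logical IDs, following the rules above.
record DatasetKey(String identifier) {}               // DATASET.IDENTIFIER
record TableKey(String identifier, int parentNs) {}   // DS_TABLE.IDENTIFIER + PARENT_NS
record ElementKey(String identifier, int parentNs) {} // DATAELEM.IDENTIFIER + PARENT_NS
// Versioned ID = logical ID + version.
record VersionedKey<K>(K logical, String version) {}

// Hypothetical stand-in for the NAMESPACE table.
class NamespaceRegistry {
    private final Map<Object, Integer> namespaces = new HashMap<>();
    private int nextId = 1;

    // Returns the namespace ID of a logically unique dataset or table, allocating a new
    // one the first time that logical ID is seen (as DD creates a NAMESPACE row).
    int namespaceOf(Object logicalKey) {
        if (!(logicalKey instanceof DatasetKey) && !(logicalKey instanceof TableKey)) {
            throw new IllegalArgumentException("Only datasets and tables define namespaces");
        }
        return namespaces.computeIfAbsent(logicalKey, k -> nextId++);
    }
}
```

Because ElementKey embeds its parent table's namespace ID, an element with the same IDENTIFIER in a different table is a different logical element, while two versions of one element share the same logical key and differ only in their versioned key.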
Storage of the actual values of complex attributes follows the principle that each complex attribute value can be seen as a row, where the columns stand for the attribute's fields (in definition order) and the cells hold the values of those fields. To support this, there is a table for storing such rows, COMPLEX_ATTR_ROW. Each row in that table stands for exactly one complex attribute value row, and it has one or many field values stored in the COMPLEX_ATTR_FIELD table.

DD can represent foreign key relations between the defined tables. This is done with the FK_RELATION table, which creates a foreign key relation between two data elements in the DATAELEM table (naturally, the two elements must be in tables that differ both by logical ID and by versioned ID, but DD does not check this). The A_ID and B_ID columns hold the versioned IDs of the two elements, and A_CARDIN and B_CARDIN hold each side's cardinality (whether the relation is one-to-many, one-to-one, etc.).

DD can harvest complex attribute values from outside sources (normally an LDAP directory). Each harvested attribute value is represented by a row in the HARV_ATTR table, and the fields of those values are stored in HARV_ATTR_FIELD. The two tables are related through the MD5KEY and HARV_ATTR_MD5 columns. A row in HARV_ATTR can be linked with a row in COMPLEX_ATTR_ROW through the HARV_ATTR.LOGICAL_ID and COMPLEX_ATTR_ROW.HARV_ATTR_ID fields. If the latter is NULL, the corresponding row is not a harvested one. Harvested rows are preferred over non-harvested ones. Each complex attribute field in M_COMPLEX_ATTR_FIELD can be linked with only one FLD_NAME in HARV_ATTR_FIELD. There can be several complex attribute harvesters in DD; the attributes each of them has harvested into HARV_ATTR are identified by the HARVESTER_ID column. Each complex attribute defined in M_COMPLEX_ATTR can be linked with only one HARVESTER_ID.

The GIS field in the DATAELEM table indicates the data element's GIS (Geographical Information System) type.
If it is NULL, the element is not a GIS element. Tables that contain GIS elements are called GIS tables.

Fixed values of data elements and attributes are kept in the FXV table, where each value is uniquely identified by FXV_ID. OWNER_ID contains the ID of the value's owner and OWNER_TYPE indicates the owner's type (either 'elem' or 'attr'). The remaining fields are self-explanatory.

5.2.2 Model & business logic for classification schemes

It was a user requirement that DD should support abstract relations between data elements from different tables of a single dataset. To implement this, and to be prepared for other abstract relations in the future, DD follows the ISO 11179 recommendation for implementing classification schemes. In other words, abstract relations between data elements and other DD concepts are mapped into classification schemes, implemented according to ISO 11179. At the moment, classification schemes in DD are used only to store abstract relations between data elements. The model of classification schemes is given in figure 4.3.

[Figure 4.3: Model for classification schemes (entities CLSF_SCHEME, CS_ITEM and CSI_RELATION, linked to DATAELEM, DATASET, M_ATTRIBUTE, NAMESPACE, ATTRIBUTE and COMPLEX_ATTR_ROW)]

The model in figure 4.3 brings along some tables from the previous model, first of all DATAELEM, M_ATTRIBUTE, DATASET and NAMESPACE.
These are the tables representing concepts that can be classified according to this model:
- data elements
- simple attributes
- datasets
- namespaces

The model allows defining several classification schemes, stored in the CLSF_SCHEME table. In general, a classification scheme is an arrangement or division of objects into groups based on characteristics that the objects have in common. Examples are taxonomies, thesauri, etc. Abstract relations between data elements are just a specific case of this. Each classification scheme consists of classification scheme items, stored in the CS_ITEM table. In a taxonomy these would be the nodes; in DD's case they are the IDs of the related data elements. Any of the four tables mentioned above can have matching rows in the CS_ITEM table, i.e. the concepts they hold can be classified. Classification scheme items are grouped in the CSI_RELATION table, which holds the parent-child relations between the items. So this is the table where data elements are actually related to each other.

Note that two more tables are brought over from the previous model: ATTRIBUTE and COMPLEX_ATTR_ROW, each linked to the CS_ITEM table through a 1-to-many relationship. This supports the requirement that classification items may also have attributes that further define them.

5.2.3 Unique identifiers

Many of the tables described above have fields with names like XXX_ID. While these are primary keys in the database sense, they do not always represent primary keys in the business logic sense.
These fields are used because in most cases the logical key is formed by several fields, and referring to such a key from another table would mean copying all of those fields into that other table, which would be confusing. So the auto-generated XXX_ID fields let the database engine uniquely identify concepts. The logical identifiers of the most important objects (datasets, tables and data elements) were described in the section on the model for basic concepts. They are applied within the application code.

5.3 Import tool

Import tool is an MS Access-based utility for defining datasets, tables and data elements (but not attributes or namespaces) in flat tables and then loading the definitions into the DD database, using XML as an intermediate format. The tool exists because, when entering a large number of definitions, a flat table can be handier than the web-based user interface with its many different views and clicking back and forth between them.

As of the end of SA-6, the tool is still very simple, and there are no plans to extend it until a concrete and unavoidable need arises. This is because Import tool is used for only two purposes, neither of which requires much complexity:
- initial import of logically unique datasets (meaning only the first versions of datasets; subsequent versions are created by editing through the web UI)
- import of only the fixed values of a selected data element

The basic principle of how the tool works is presented in figure 4.4.
[Figure 4.4: Basic working principle of Import tool (content expert, Import tool in MS Access, generated XML file, DD client user interface, Data Dictionary database; steps 1 to 3)]

The actions in Import tool's working sequence are as follows:
1. The content expert enters the definitions into the tool's database, using the usual MS Access flat table editing view.
2. The content expert has the tool generate an XML file of the entered definitions.
3. The content expert uploads the generated XML file into DD, which parses the file and stores the data found into its database.

XML is used as an intermediate format so that the user can still benefit from the convenience of MS Access, while the definitions can be imported into a Data Dictionary running on any platform. This matters because DD normally runs on Linux, where MS Access files would be difficult to work with.

A sample working view of Import tool is given in figure 4.5.

[Figure 4.5: Screen shots of Import tool]

The upper part of figure 4.5 shows a snapshot of the view for working with data elements; the lower part shows the view for working with the fixed values of data elements. As can be seen, they are simple flat tables, giving a better overview than the web-based user interface would. Note that these two shots are just a small part of all the different views in Import tool. Mostly there is a separate table for each of the dictionary's basic concepts. The columns in such a table stand for the concept's fields and attributes, corresponding to the fields of the concept's table and to the rows of the attribute tables in the DD database.
This of course means that a row in such a table does not necessarily represent exactly one concept, because each new attribute requires a new row that repeats all the other attributes of the given concept. The views provide helper drop-down lists wherever possible.

Once the definitions have been entered, the user has the tool generate an XML representation of them. For that, the tool has a form where the user specifies the file the XML is to be written to and presses 'Generate'. Before doing so, the user also specifies which type of import he/she is going to perform:
- import of all the dataset definitions present in the tool
- import of only the fixed values of a selected data element

The layout of the generated XML file does not follow any common standard. It was designed by the developer solely for getting the data from Import tool into the DD database, and no parties other than the parser at the DD end need to work with this XML. A snapshot of a generated XML file is shown in figure 4.6.

Once the XML has been generated, the user uploads it into DD using the corresponding form in the DD user interface, and the import code transfers everything into the DD database.

NB! There are some important limitations to Import tool that you should know about:

The tool has no link to the DD database. This means that its model is not automatically updated when the model in the DD database changes. The most concrete example is that of attributes. In Import tool they are represented as columns in the tables worked with.
In other words, the attribute set in Import tool is hard-coded, whereas in the DD database attributes are defined in rows rather than in columns. Consequently, whenever a new attribute is added to the DD database, or an existing one is changed or deleted, the change is not automatically reflected in Import tool and must be made by hand. This sounds like quite a drawback, but luckily the changes do not concern the code: only a column has to be added to or removed from a selected table.

Second, whenever the user starts an XML generation, the entire content of the tool's database at that moment is generated into XML. This means that if the user has defined several datasets but wants to import only one of them into DD, he/she has to create a separate instance of the tool for that dataset and use that instance for entering and importing the definition of the desired dataset. So, before you start entering any definitions into Import tool, it is highly recommended to decide whether you actually want to import only some of them, and if so, to create a separate tool instance for those. While this might sound rather primitive, creating a separate instance of Import tool is luckily easy: Import tool is just an MS Access file (.mdb), so all you have to do is make a copy of it.
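On the DD side, the intermediate XML has to be read back into rows before it can be stored. The following minimal Java sketch assumes the element layout of the snapshot in figure 4.6 (import, RowSet, a concept element such as DATASET, Row, and one child element per column); ImportXmlReader and readRows are hypothetical names, and the sketch uses the JDK's built-in DOM parser rather than reproducing DD's actual import handler.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical reader for the intermediate XML produced by Import tool.
class ImportXmlReader {
    // Returns one column -> value map per <Row> found under elements named conceptTag
    // (e.g. "DATASET"); empty elements such as <Definition/> map to "".
    static List<Map<String, String>> readRows(String xml, String conceptTag) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The layout needs no DTD, so DOCTYPEs are refused (guards against entity expansion).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unreadable import XML", e);
        }
        List<Map<String, String>> rows = new ArrayList<>();
        NodeList concepts = doc.getElementsByTagName(conceptTag);
        for (int i = 0; i < concepts.getLength(); i++) {
            for (Node row = concepts.item(i).getFirstChild(); row != null; row = row.getNextSibling()) {
                if (row.getNodeType() != Node.ELEMENT_NODE || !"Row".equals(row.getNodeName())) {
                    continue; // skip whitespace text nodes and anything that is not a Row
                }
                Map<String, String> columns = new LinkedHashMap<>();
                for (Node col = row.getFirstChild(); col != null; col = col.getNextSibling()) {
                    if (col.getNodeType() == Node.ELEMENT_NODE) {
                        columns.put(col.getNodeName(), col.getTextContent().trim());
                    }
                }
                rows.add(columns);
            }
        }
        return rows;
    }
}
```

Every value is kept as a string because the layout itself carries no type information; in this sketch, any conversion to the types of the DD tables would be left to the code that stores the rows.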
```xml
<?xml version="1.0" encoding="UTF-8"?>
<import>
  <RowSet name="Dataset">
    <DATASET>
      <Row>
        <DATASET_ID>1</DATASET_ID>
        <SHORT_NAME>CDDA</SHORT_NAME>
        <Version>1.0</Version>
        <Name>CDDA</Name>
        <Identifier>CDDA</Identifier>
        <ShortDescription/>
        <Definition/>
        <DescriptionOfUse/>
        <Theme>CDDA</Theme>
        <Sub-theme/>
        <Keywords/>
        <SynNames_name_1/>
        <SynNames_name_2/>
        <EEAissue_name_1/>
        <EEAissue_name_2/>
        <EEAissue_name_3/>
        <EEAissue_name_4/>
        <ROD/>
        <Indicators/>
        <Methodology>Over the last few years, several requests for validation . . . </Methodology>
        <PlannedUpdFreq/>
        <References/>
        <RegistrationAuthority_name_1/>
        <RegistrationAuthority_name_2/>
        <Guidelines_url_1/>
        <Guidelines_url_2/>
        <Guidelines_description_1/>
        <Guidelines_description_2/>
      </Row>
      <Row>
        <DATASET_ID>3</DATASET_ID>
        <SHORT_NAME>Rivers</SHORT_NAME>
        <Version>1.0</Version>
        <Name>Rivers</Name>
        <Identifier>Rivers</Identifier>
        <ShortDescription/>
        <Definition/>
        . . .
```

Figure 4.6 Snapshot of the XML used between Import tool and DD

5.4 User interface

5.4.1 Client side

The client software for the DD user interface is basically any commonly used WWW browser, but it has been tested only with MS Internet Explorer 5.5 and Netscape 6.0. So it is highly recommended to use one of these or a later version. From the security point of view, the client side has to allow the use of JavaScript and cookies.
Any other potentially hazardous features can be disabled if the user wishes. The user interface layout format is HTML only; no other content types are produced by the server side.

5.4.2 Server side

The server side of the user interface implementation uses three types of software to produce the HTML output:
- static HTML pages
- JavaServer Pages (JSP)
- Java servlets

JavaServer Pages are used for almost all user functionality, except for the import functions. Data Dictionary's JSPs do not use any corporate or open-source frameworks; they are written from scratch and are quite simply based on creating a JDBC connection to the database, getting the result sets and rendering them in the UI right away.

Java servlets are used only for the import functionality. Currently there is only one servlet, which imports the XML generated by Import tool. It responds to HTTP POST requests only, expects the request content (content type text/xml) to follow the XML layout described above, and passes it on to the import handler, which then stores it all in the DD database.

5.5 User authentication & access control

Users of the DD application are authenticated against a configured LDAP server. For access restriction logic, DD uses the ACL mechanism developed in Reportnet's Application Integration Tool (AIT) project. User groups and their rights are described in the DD User Guide document, which also serves as the description of DD functionality. That is why a functional description is not included in this Detailed Design document.
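A common way to authenticate a user against an LDAP server from Java is a simple bind through JNDI: if the server accepts the user's DN and password, the user is authenticated. The sketch below is not DD's implementation; the LdapAuthenticator class, the provider URL and the user DN pattern are hypothetical, and the AIT ACL checks that would follow a successful login are not shown.

```java
import java.util.Hashtable;
import javax.naming.AuthenticationException;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical LDAP authenticator using a JNDI simple bind.
class LdapAuthenticator {
    private final String providerUrl;   // e.g. "ldap://ldap.example.org:389" (hypothetical)
    private final String userDnPattern; // e.g. "uid=%s,ou=Users,o=Example" (hypothetical)

    LdapAuthenticator(String providerUrl, String userDnPattern) {
        this.providerUrl = providerUrl;
        this.userDnPattern = userDnPattern;
    }

    // Builds the JNDI environment for a simple bind as the given user.
    Hashtable<String, String> environmentFor(String user, String password) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, String.format(userDnPattern, user));
        env.put(Context.SECURITY_CREDENTIALS, password);
        return env;
    }

    // Returns true if the LDAP server accepts the credentials, false if it rejects them.
    boolean authenticate(String user, String password) {
        if (user == null || user.isEmpty() || password == null || password.isEmpty()) {
            // An empty password would turn a simple bind into an unauthenticated one.
            return false;
        }
        try {
            DirContext ctx = new InitialDirContext(environmentFor(user, password));
            ctx.close();
            return true;
        } catch (AuthenticationException e) {
            return false; // wrong DN or password
        } catch (NamingException e) {
            throw new IllegalStateException("LDAP server unavailable", e);
        }
    }
}
```

A production version would also escape special characters in the user name before inserting it into the DN (RFC 4514) and would normally use an encrypted connection (ldaps or StartTLS).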
5.6 Used software

Some of the software used for creating and needed for running the Data Dictionary has already been mentioned above; a complete list is as follows:

Purpose | Used software
UI client | HTML-capable HTTP browser (Internet Explorer 5.5, Netscape 6.0, etc.)
UI server | HTTP server with a JSDK 2.2 compatible servlet engine (Apache 1.3.22, http://httpd.apache.org/; Resin 1.1.6, http://www.caucho.com/; etc.)
UI client-server protocol | HTTP on TCP/IP
Common Gateway Interface | JSDK 2.2 (Java Servlet Development Kit) compatible Java servlets and JavaServer Pages (JSP), accessed through the HTTP server's servlet engine
Application software | Java™ 2 platform (http://www.javasoft.com/j2se/1.3/)
Database | MySQL 4.0.1-alpha (http://www.mysql.com)
Database driver | JDBC, mm.mysql.jdbc-1.2b (http://mmmysql.sourceforge.net)
Operating system | Linux RedHat 7.1
XML parsing | Xerces Java Parser (http://xml.apache.org/xerces2-j/index.html)

6. Configuration & installation

Guidelines for installing and configuring Data Dictionary are provided in a separate DD Installation Guide document.