Detailed Design of Data Dictionary

Design Report
Project / Subproject: IDA/EINRC framework 501998 / Specific Agreement-7/EIONET
Authors: TietoEnator Corporation
Date issued / Version: 02 April 2004 / 1.1
Client: European Commission, DG Enterprise / B 6
Info: Detailed Design of Data Dictionary
Reference / Page: DD DDD v1.1.doc / 1 of 25
EUROPEAN COMMISSION
ENTERPRISE DIRECTORATE-GENERAL
IDA-Programme -Framework Contract No 501 998 - 7th Specific Agreement
Einrc - Eionet
DETAILED DESIGN
Data Dictionary
Detailed Design of Data Dictionary system
EINRC/DD
2(25)
MODIFICATION HISTORY
Date | Version | Author | Reason for modification
24.03.2003 | 0.1 | J.Heinlaid | First Draft
09.04.2003 | 0.2 | J.Heinlaid | Updated incorporating changes requested by project leader
12.12.2003 | 0.3 | J.Heinlaid | Updates documenting developments done within SA-6
05.01.2004 | 0.4 | J.Heinlaid | Updates documenting last developments within SA-6
08.03.2004 | 1.0 | E.Palosuo | Status updated to final approved SA6 deliverable
02.04.2004 | 1.1 | J.Heinlaid | Updates documenting developments in the first increment under SA7
APPROVAL PROCEDURE
Reviewer
Version
Project Team
Frm mgr
SA5/02
SA6/04
SA7/1.1
SA5/0.2
SA6/0.4
SA7/1.1
SA6/1.0
SA7/x.x
Proj. Leader
CMT review
Start
Date
28.03.03
09.01.04
11.06.04
09.04.03
28.01.04
02.04.04
28.01.04
xx.xx.04
Finish
Date
09.04.03
28.01.04
14.06.04
11.04.03
03.02.04
02.04.02
08.03.04
Required modification/Approval/Comments
Approved
DD for the first SA7/increment approved
Approved as SA5 deliverable
Updated version approved as SA6 deliverable
DD for the first increment approved
SA6 deliverable approved
Background material:
A1: 'Data Dictionary: Current Situation and User Requirements' document from SA-5
http://nmc.eionet.eu.int:8980/Members/irc/eionetnmc/einrc/library?l=/einrc_sa5_2002/dd/dd_ur_v1_2_doc/_EN_1.0_&a=d
A2: 'Data Dictionary: Software Requirements and Architectural Design' document from SA-5
http://nmc.eionet.eu.int:8980/Members/irc/eionetnmc/einrc/library?l=/einrc_sa5_2002/dd/dd_sr_ad_v1_doc/_EN_1.2_&a=d
A3: 'Data Dictionary - part of REPORTNET' document, created by Arvid Lillethun
http://nmc.eionet.eu.int:8980/Members/irc/eionetnmc/einrc/library?l=/einrc_sa5_2002/dd/021213_dictionary/_EN_1.0_&a=d
A4: 'Data Dictionary: Current Situation and User Requirements' document from SA-6
http://nmc.eionet.eu.int:8980/irc/DownLoad/cgut1gLBhpL0qe4m4KSm4qeqGZ0EHm1pU5R978bRspj2601dq5vk0j3lG4u1LfGpxAoMfGZ0o/DD%20UR%20031003v01.doc
A5: 'Data Dictionary: Current Situation and User Requirements' document from SA-7
http://nmc.eionet.eu.int:8980/irc/DownLoad/cjuy1ULHhoLtqu4f4TS34P-Cqc2CGFle3rjF4t4-ZUC784CfZ5SbNfZsTR6Fh-qMZq69-QXC4/DD%20SA-7%20UR%20v0.2.doc
A6: 'DD: Project Management Plan' document from SA-7
http://nmc.eionet.eu.int:8980/irc/DownLoad/c5uG1hLVhRLFqt444NS3456quCw0XLNZRc9YW4Gotp9pLkcH-jPb20s-pAhF5AjDs46S0Us-xHQK/DDPMP%20v1.0.doc
_________________________________________________________________________________________________________________________
IDA-Programme -Framework Contract No 501 998 - EINRC/7th Specific Agreement
Print Date 12/02/2016 14:43:00
116102285
2
A7: Task lists for DD releases in DD folder of EINRC/NMC library:
http://nmc.eionet.eu.int:8980/Members/irc/eionetnmc/einrc/library?l=/einrc_sa7_2004/dd&vm=detailed&sb=Title
Table of Contents
FOREWORD
1. OBJECTIVES AND SCOPE
2. CONCEPTS
2.1 DATA
2.2 METADATA
3. FUNCTIONALITY
3.1 MANAGEMENT OF DATA DEFINITIONS
3.2 MANAGEMENT OF METADATA DEFINITIONS
3.3 ALL USERS FUNCTIONALITY
3.4 DATA DEFINERS AND ADMINISTRATORS FUNCTIONALITY
4. IMPLEMENTATION
4.1 SYSTEM ARCHITECTURE
4.2 DATA MODEL & BUSINESS LOGIC
4.2.1 Model & business logic for basic concepts
4.2.2 Model & business logic for classification schemes
4.2.3 Unique identifiers
4.3 IMPORT TOOL
4.4 USER INTERFACE
4.4.1 Client side
4.4.2 Server side
4.5 USER AUTHENTICATION & ACCESS CONTROL
4.6 USED SOFTWARE
5. CONFIGURATION & INSTALLATION
Foreword
This document presents the detailed design of the Data Dictionary (DD) system. By the end of SA-6 the system reached its first operational stage, so no major changes to its design concepts, either in the system or in this document, are expected. However, since the development of DD continues in SA-7, this document is by no means final and will very probably change during SA-7 as well.
Still, it documents the basic design principles and implementation details that have not changed for quite some time and are unlikely to change substantially in the future. Most notably, it includes:
• the basic types of data (i.e. concepts) that DD handles,
• all the functionality that DD provides on top of them (note that this document only lists the functionality; it is described in more detail in a separate User Guide document),
• the data model framing the concepts and functionality,
• the basic principles and technical documentation of how it has all been implemented,
• a description of DD's main use cases and working sequences,
• the technologies used in implementing DD.
The document is closely related to the background material listed above:
• The basic concepts and the functionality that DD handles are mostly based on the user requirements set in A1 and A4 - A7.
• A2 has been the basis for the general layout of DD's user interface, its data model and technical architecture, and for how it supports XML Schema formatted definitions and ISO 11179.
• A3 and A6 are the sole basis for how DD currently handles the concrete business needs of Reportnet, i.e. the functionality and concepts built around datasets and tables, the import/export formats based on Microsoft products, and the selected pilot content.
In a way this document can be seen as a consolidation of the above background material.
1. Objectives and scope
Objective
The Data Dictionary is to become a web-based data semantics registry for Reportnet. It will hold definitions of the datasets, tables and data elements exchanged over Reportnet, whether by humans or by applications. It will provide functions to import such definitions in machine-readable form, to manage them through a web-based user interface, and to export them in both machine- and human-readable formats. An API allowing other applications to perform selected operations shall be included as well.
The Data Dictionary shall follow the common standards, technologies and recommendations of the data semantics field as far as concrete business needs allow. The main such standard is ISO 11179.
Scope
The following scope has been implemented by the end of Specific Agreement-6:
• A database model supporting about 70% of the data semantics functionality suggested by ISO 11179 (http://www.diffuse.org/meta.html - ISO11179):
  - Part 1 (Framework for the specification of data elements)
  - Part 2 (Classification for Data Elements)
  - Part 3 (Basic Attributes of Data Elements)
  - Part 5 (Naming and Identification Principles for Data Elements)
  Part 4, "Rules and Guidelines for the Formulation of Data Definitions", is rather about advising the EEA on principles for formulating data definitions. Part 6 deals with the registration of data elements, i.e. keeping track of administrative metadata, which is to be implemented in the project's later phases.
  NB! The parts listed above may not have been implemented to the fullest.
• On top of the ISO 11179 compliant data model, a user interface for
  - managing definitions of datasets,
  - managing definitions of tables in datasets,
  - managing definitions of data elements,
  - managing definitions of attributes for datasets, tables and data elements,
  - managing namespaces (i.e. contexts) in which the definitions have been made.
  Transparently to the user, support for representing any of these definitions in XML Schema format has been implemented. However, only a subset of the XML Schema specification is supported, because building a web interface for managing XML Schema to its full specification would be a huge amount of work. The supported features are:
  - declaration of elements
  - element annotations
  - value domain annotations
  - simple types
  - restrictions based on simple types
  - complex types (but not attributes!)
  - extension of complex types
    · sequences
    · choices
  This XML Schema subset is supported both in the UI and in the import/export functions. Note that aggregate data elements are not supported. XML Schema complex types, sequences and choices are used to represent the hierarchical structure of datasets > tables > data elements.
• To make the data definition process comfortable, a simple Import tool was implemented. It is based on MS Access and lets users manage definitions in a flat table format. The tool generates XML from whatever has been entered into its tables, and this XML is then imported into the DD database.
• In several intensive workshops at the client, pilot datasets were identified and their definitions (together with those of their tables and data elements) were imported into DD through the Import tool.
• A separate User Guide document describes how to use the UI and the Import tool.
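The scope above states that XML Schema complex types and sequences represent the dataset > table > data element hierarchy. The following Java sketch illustrates only that idea: a table becomes an element whose complex type is a sequence of its data elements, and a dataset becomes an element whose complex type is a sequence of its tables. The class and method names, the xs: prefix and the example names WaterQuality, Stations, Country and Value are assumptions; the schemas DD actually generates may be structured differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not DD's code: names are assumed to be valid XML names
// and are not escaped.
public class XsdSketch {

    /** A table becomes an element whose complex type is a sequence of its data elements. */
    static String tableToXsd(String tableName, Map<String, String> elementTypes) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:element name=\"").append(tableName).append("\">\n");
        sb.append("  <xs:complexType>\n");
        sb.append("    <xs:sequence>\n");
        for (Map.Entry<String, String> e : elementTypes.entrySet()) {
            // Each data element (column) is declared with a simple type.
            sb.append("      <xs:element name=\"").append(e.getKey())
              .append("\" type=\"").append(e.getValue()).append("\"/>\n");
        }
        sb.append("    </xs:sequence>\n");
        sb.append("  </xs:complexType>\n");
        sb.append("</xs:element>\n");
        return sb.toString();
    }

    /** A dataset becomes an element whose complex type is a sequence of its tables. */
    static String datasetToXsd(String datasetName, List<String> tableXsds) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:element name=\"").append(datasetName).append("\">\n");
        sb.append("  <xs:complexType>\n");
        sb.append("    <xs:sequence>\n");
        for (String table : tableXsds) {
            sb.append(table.indent(6)); // nest each table declaration inside the sequence
        }
        sb.append("    </xs:sequence>\n");
        sb.append("  </xs:complexType>\n");
        sb.append("</xs:element>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>(); // keeps the column order
        columns.put("Country", "xs:string");
        columns.put("Value", "xs:decimal");
        String table = tableToXsd("Stations", columns);
        System.out.println(datasetToXsd("WaterQuality", List.of(table)));
    }
}
```

A LinkedHashMap is used so that the order of the sequence follows the order in which the columns were defined.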
2. Concepts
Before going further, read through the following basic concepts around which the definition management processes in DD revolve.
2.1 Data
• Dataset
  A collection of data exchanged between applications or humans. In the context of Reportnet and the Data Dictionary, a dataset is a collection of tables containing the reported data. Often a dataset actually consists of a single table only. Datasets usually come as MS Access databases or MS Excel files. They belong to specific data flows and must be reported by Reportnet players as required by legislation.
• Table
  In the Data Dictionary's context, a table is a table in a dataset. It can be either a data table or a lookup table that helps to interpret the data. A lookup table can hold, for example, country codes or any other code list.
  The columns of a table stand for data elements and its rows for their values.
• Data element
  In the Data Dictionary's context, a data element is a column in a table, for example 'Country', 'Latitude', 'Longitude', 'Unit' or 'Value'.
  There are three types of data elements:
  - Data element with quantitative values
    Sometimes also referred to as 'data element with measured values', 'characteristic 2' or 'characteristic of type 2'.
    This is a data element that can take any value.
  - Data element with fixed values
    Sometimes also referred to as 'data element with allowable values', 'code list', 'characteristic 1' or 'characteristic of type 1'.
    This is a data element that can only take values from a predefined set. Usually those values are some kind of codes, hence the name 'code list'.
  - Aggregate data element
    Sometimes also referred to as 'data element with a structure' or simply 'aggregate'.
    This is a data element that cannot have any values. Instead, it has a structure consisting of other data elements, i.e. it is an aggregate of other data elements, which can in turn be of any of the three types.
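The three data element types differ mainly in which values they accept. A minimal Java sketch of that rule follows. The DataElementSketch class, its Kind enum and the example elements (a fixed-values Country with the codes EE and FI, an aggregate Location) are hypothetical illustrations, not DD's own classes.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the three data element types; not DD's actual classes.
public class DataElementSketch {

    enum Kind { QUANTITATIVE, FIXED_VALUES, AGGREGATE }

    record DataElement(String name, Kind kind, Set<String> allowableValues,
                       List<DataElement> subElements) {

        static DataElement quantitative(String name) {
            return new DataElement(name, Kind.QUANTITATIVE, Set.of(), List.of());
        }

        static DataElement fixedValues(String name, Set<String> codes) {
            return new DataElement(name, Kind.FIXED_VALUES, Set.copyOf(codes), List.of());
        }

        static DataElement aggregate(String name, List<DataElement> parts) {
            return new DataElement(name, Kind.AGGREGATE, Set.of(), List.copyOf(parts));
        }

        /** Whether a value can be stored for this element. */
        boolean accepts(String value) {
            return switch (kind) {
                case QUANTITATIVE -> true;                          // any value
                case FIXED_VALUES -> allowableValues.contains(value); // only listed codes
                case AGGREGATE -> false;                            // structure only, no values
            };
        }
    }

    public static void main(String[] args) {
        DataElement country = DataElement.fixedValues("Country", Set.of("EE", "FI"));
        DataElement value = DataElement.quantitative("Value");
        DataElement location = DataElement.aggregate("Location",
                List.of(DataElement.quantitative("Latitude"), DataElement.quantitative("Longitude")));
        System.out.println(country.accepts("EE"));   // true
        System.out.println(country.accepts("XX"));   // false
        System.out.println(value.accepts("12.5"));   // true
        System.out.println(location.accepts("any")); // false
    }
}
```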
2.2 Metadata
• Attribute
  In the Data Dictionary's context, an attribute is an attribute of a dataset, table or data element definition. The most common attribute of all definitions is 'Name', standing for the name of the defined object. Other attributes could be, for example, 'Definition' or 'Version'.
  DD recognizes two types of attributes:
  - Simple attribute
    Every instance of such an attribute is a name/value pair. Simple attributes represent the same concept of attribute as in ISO 11179 part 3. The 'Name', 'Definition' and 'Version' attributes mentioned above are good examples of simple attributes.
    NB! Simple attributes can have predefined sets of allowable values, just like data elements of characteristic type 1. Again, such values are quite often some kind of codes.
  - Complex attribute
    In the Data Dictionary's context, a complex attribute is almost the same as a simple attribute. The only difference is that while a simple attribute is a name/value pair, a complex attribute has a structure of its own, consisting of several fields, each of which is a name/value pair. For example, a complex attribute could be RelatedSource, with two fields: RelatedSourceID and RelationType. An instance of a complex attribute is then a set of rows, each containing the values of the fields in their order.
    Complex attributes are specific to DD; they are not present in ISO 11179.
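The difference between the two attribute types can be sketched in Java as follows. The class names, the convention that an empty set means "any value allowed" and the example RelatedSource rows (source-1, isPartOf and so on) are assumptions made for illustration; DD itself stores attributes in database tables, as described in the data model chapter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of DD's two attribute types; names are illustrative only.
public class AttributeSketch {

    /** Definition of a simple attribute; each instance is a single name/value pair. */
    record SimpleAttribute(String name, Set<String> allowableValues) {
        /** An empty set means that any value is allowed. */
        boolean accepts(String value) {
            return allowableValues.isEmpty() || allowableValues.contains(value);
        }
    }

    /** A complex attribute: named fields, and an instance made of rows of field values. */
    static final class ComplexAttribute {
        final String name;
        final List<String> fields;
        final List<List<String>> rows = new ArrayList<>();

        ComplexAttribute(String name, List<String> fields) {
            this.name = name;
            this.fields = List.copyOf(fields);
        }

        /** Adds one row; the values must follow the order of the fields. */
        void addRow(String... values) {
            if (values.length != fields.size()) {
                throw new IllegalArgumentException(
                        name + " expects " + fields.size() + " values per row");
            }
            rows.add(List.of(values));
        }

        /** Value of the named field in the given row. */
        String get(int row, String field) {
            int i = fields.indexOf(field);
            if (i < 0) {
                throw new IllegalArgumentException("unknown field " + field);
            }
            return rows.get(row).get(i);
        }
    }

    public static void main(String[] args) {
        SimpleAttribute name = new SimpleAttribute("Name", Set.of());
        System.out.println(name.accepts("Country")); // true

        ComplexAttribute related = new ComplexAttribute("RelatedSource",
                List.of("RelatedSourceID", "RelationType"));
        related.addRow("source-1", "isPartOf");
        related.addRow("source-2", "isBasedOn");
        System.out.println(related.get(1, "RelationType")); // isBasedOn
    }
}
```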
3. Functionality
In general, there are two views on the Data Dictionary's functionality:
1. Based on the general types of data it handles, the functionality can be divided into two logically separate systems:
  • the system dealing with management of data definitions (storing, searching, importing and exporting definitions of datasets, tables and data elements);
  • the system dealing with management of the metadata used in data definitions (storing and searching the basic attributes of data definitions and the namespaces in which the definitions are seen).
2. Based on user groups, the functionality can be divided into three sets:
  • All users functionality (fig 3.1)
    Available to all human users: administrators and public users.
  • Data definers functionality (fig 3.2)
    Available both to data definers and DD administrators.
  • Administrators only functionality (fig 3.2)
    Currently the functionality sets of data definers and administrators are equal.
Figure 3.1. Use cases for all users: search datasets, tables and elements; browse the lists of datasets, tables and elements; browse the dataset, table and element detailed views; browse allowable values; browse subelements.
Figure 3.2. Use cases for data definers and administrators: search datasets, tables and elements; browse the lists of datasets, tables and elements and their detailed views; add a new dataset, table, element or attribute; edit datasets, tables, elements and namespaces; browse/edit the tables within a dataset, the elements within a table, complex attributes, allowable values, subelements, and the lists of namespaces and attributes; browse namespace and attribute detailed views; edit simple attributes, complex attributes and complex attribute fields; export/import.
NB! This document only lists the functionality. Guidelines on how to actually use it, together with screenshots, are available in a separate User Guide document. This approach was taken for three reasons:
• The number of different user views and use cases in DD is very large.
• Documenting so many use cases in the Detailed Design Document would make it, together with its other parts, uncomfortably long.
• Users will very likely need the guidelines as well, but they should not be given access to the detailed design.
3.1 Management of data definitions
Grouped by the concepts they handle, the functions are as follows:
• Datasets
  - search for datasets by selected criteria
  - view the search result list
  - view a dataset definition, including all its attributes and the definitions of its tables
  - add a new dataset (define its attributes and tables)
  - edit an existing dataset (define its attributes and tables)
  - delete an existing dataset
• Tables
  - search for tables by selected criteria
  - view the search result list
  - view a table definition, including all its attributes and the definitions of its data elements
  - add a new table (define its attributes and data elements)
  - edit an existing table (define its attributes and data elements)
  - delete an existing table
• Data elements
  - search for data elements by selected criteria
  - view the search result list
  - view a data element definition, including all its attributes and, if it is of the fixed-values type, its allowable values
  - add a new data element (define its attributes and allowable values)
  - edit an existing data element (define its attributes and allowable values)
  - delete an existing data element
3.2 Management of metadata definitions
Grouped by the concepts they handle, the functions are as follows:
• Attributes (both simple and complex)
  - search for attributes by selected criteria
  - view the search result list
  - view an attribute definition, including its fields if it is of complex type
  - add a new attribute (also defining its fields if it is of complex type)
  - edit an existing attribute (also defining its fields if it is of complex type)
  - delete an existing attribute
3.3 All users functionality
In principle, all users have access to the following functions on the following basic DD concepts (see also fig 3.1):
• Datasets
  - search for datasets by selected criteria
  - view the search result list
  - view a dataset definition, including all its attributes and the definitions of its tables
• Tables
  - search for tables by selected criteria
  - view the search result list
  - view a table definition, including all its attributes and the definitions of its data elements
• Data elements
  - search for data elements by selected criteria
  - view the search result list
  - view a data element definition, including all its attributes and, if it is of the fixed-values type, its allowable values
3.4 Data definers and administrators functionality
Since data definers and administrators also belong to all users, they have access to the all-users functionality described above. In addition, they can perform many more operations on more concepts (see also fig 3.2):
• Datasets
  - add a new dataset (define its attributes and tables)
  - edit an existing dataset (define its attributes and tables)
  - delete an existing dataset
• Tables
  - add a new table (define its attributes and data elements)
  - edit an existing table (define its attributes and data elements)
  - delete an existing table
• Data elements
  - add a new data element (define its attributes and allowable values)
  - edit an existing data element (define its attributes and allowable values)
  - delete an existing data element
• Attributes (both simple and complex)
  - search for attributes by selected criteria
  - view the search result list
  - view an attribute definition, including its fields if it is of complex type
  - add a new attribute (also defining its fields if it is of complex type)
  - edit an existing attribute (also defining its fields if it is of complex type)
  - delete an existing attribute
  - note that for simple attributes one can also specify allowable values
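The user-group rules of this chapter can be expressed as a small permission check. The following Java sketch is only an illustration under stated assumptions: the Role, Concept and Operation names are hypothetical, namespaces are left out, and it encodes just two rules from the text. All users may search and view datasets, tables and data elements, while data definers and administrators, who currently have equal rights, may perform every listed operation, including the management of attributes.

```java
// Hypothetical sketch of the user-group rules listed above; not DD's actual
// access-control code (the role, concept and operation names are assumptions).
public class AccessSketch {

    enum Role { PUBLIC_USER, DATA_DEFINER, ADMINISTRATOR }

    enum Concept { DATASET, TABLE, DATA_ELEMENT, ATTRIBUTE }

    enum Operation { SEARCH, VIEW, ADD, EDIT, DELETE }

    static boolean isAllowed(Role role, Concept concept, Operation op) {
        if (role == Role.DATA_DEFINER || role == Role.ADMINISTRATOR) {
            // Currently data definers and administrators have the same rights.
            return true;
        }
        // All users: search and view datasets, tables and data elements only.
        boolean readOnly = op == Operation.SEARCH || op == Operation.VIEW;
        return readOnly && concept != Concept.ATTRIBUTE;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(Role.PUBLIC_USER, Concept.TABLE, Operation.VIEW));     // true
        System.out.println(isAllowed(Role.PUBLIC_USER, Concept.TABLE, Operation.EDIT));     // false
        System.out.println(isAllowed(Role.DATA_DEFINER, Concept.ATTRIBUTE, Operation.ADD)); // true
    }
}
```

Keeping the definer and administrator branch separate from the public one means that, should the two roles ever diverge, only that branch needs to change.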
4. Implementation
4.1 System architecture
The overall system architecture of the Data Dictionary is shown in figure 4.1.
Figure 4.1. System architecture of the Data Dictionary
The picture is made up of three main parts:
• Data Dictionary's software
  This is the big box in the middle of the picture. All the software has been written in the Java programming language and therefore runs in the Java Runtime Environment, which is marked with a dotted line.
• Data Dictionary's database
  Shown on the right side of the picture. It has to be JDBC (Java Database Connectivity) compliant, since DD accesses the database through a JDBC driver.
• Outer world interacting with Data Dictionary
  This is the WWW, represented as the free-form object on the right side of the picture. Currently DD recognizes two types of interaction:
  - a web client (WWW browser)
  - the previously mentioned MS Access-based Import tool (note that the tool and the DD software communicate over XML).
  An API for applications will later be developed as a third type of interaction.
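In this architecture the MS Access-based Import tool talks to the DD software over XML. This document does not specify that XML format, so the following Java sketch assumes a hypothetical dataset/table/element layout with identifier and type attributes, and shows only how such a file could be read with the JDK's standard DOM parser (javax.xml.parsers) before the definitions are stored through JDBC. The class, record and attribute names are illustrative, not the Import tool's real format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: reading data element definitions from import XML.
// The <dataset>/<table>/<element> layout is an assumption, not DD's format.
public class ImportSketch {

    record ElementDef(String table, String identifier, String type) {}

    static List<ElementDef> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations to avoid XML external entity attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            List<ElementDef> result = new ArrayList<>();
            NodeList tables = doc.getElementsByTagName("table");
            for (int i = 0; i < tables.getLength(); i++) {
                Element table = (Element) tables.item(i);
                NodeList elems = table.getElementsByTagName("element");
                for (int j = 0; j < elems.getLength(); j++) {
                    Element e = (Element) elems.item(j);
                    result.add(new ElementDef(table.getAttribute("identifier"),
                            e.getAttribute("identifier"), e.getAttribute("type")));
                }
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalArgumentException("cannot parse import XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<dataset identifier=\"WaterQuality\">"
                + "<table identifier=\"Stations\">"
                + "<element identifier=\"Country\" type=\"CH1\"/>"
                + "<element identifier=\"Value\" type=\"CH2\"/>"
                + "</table></dataset>";
        parse(xml).forEach(System.out::println);
    }
}
```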
4.2 Data model & business logic
As mentioned in the scope chapter, the data model covers about 70% of the functionality suggested by ISO 11179. Note that this does not necessarily mean that the model also looks like the ISO one. Striving for ISO support in all its parts became a priority only after the project had started, when the data model and its implementation were already too advanced to go back. Concrete business needs have also restricted how generic the model can be. So whenever this document talks about ISO support, it means support in terms of functionality, not of the data model.
Since the model is quite big, it has been divided into two parts: the part covering the basic concepts and XML Schema support (fig 4.2), and the part supporting classification schemes (fig 4.3).
4.2.1 Model & business logic for basic concepts
Figure 4.2. Model for basic concepts
NAMESPACE: NAMESPACE_ID : VARCHAR, SHORT_NAME : VARCHAR, FULL_NAME : VARCHAR, DEFINITION : VARCHAR, PARENT_NS : NUMBER, WORKING_USER : VARCHAR
FXV: FXV_ID : NUMBER, OWNER_ID : NUMBER, OWNER_TYPE : VARCHAR, VALUE : VARCHAR, IS_DEFAULT : CHAR, DEFINITION : VARCHAR, SHORT_DESC : VARCHAR
TBL2ELEM: TABLE_ID : NUMBER, DATAELEM_ID : NUMBER, POSITION : NUMBER
DST2TBL: DATASET_ID : NUMBER, TABLE_ID : NUMBER, POSITION : NUMBER
DS_TABLE: TABLE_ID : NUMBER, SHORT_NAME : VARCHAR, WORKING_USER : VARCHAR, WORKING_COPY : CHAR, REG_STATUS : VARCHAR, VERSION : NUMBER, DATE : NUMBER, USER : VARCHAR, CORRESP_NS : NUMBER, PARENT_NS : NUMBER, IDENTIFIER : VARCHAR
DATAELEM: DATAELEM_ID : NUMBER, TYPE : VARCHAR, SHORT_NAME : VARCHAR, EXTENDS : NUMBER, WORKING_USER : VARCHAR, WORKING_COPY : CHAR, REG_STATUS : VARCHAR, VERSION : NUMBER, USER : VARCHAR, DATE : NUMBER, PARENT_NS : NUMBER, TOP_NS : NUMBER, IDENTIFIER : VARCHAR, GIS : VARCHAR
FK_RELATION: REL_ID : NUMBER, A_ID : NUMBER, B_ID : NUMBER, A_CARDIN : CHAR, B_CARDIN : CHAR, DEFINITION : TEXT
DATASET: DATASET_ID : NUMBER, SHORT_NAME : VARCHAR, VERSION : VARCHAR, VISUAL : TEXT, DETAILED_VISUAL : TEXT, WORKING_USER : VARCHAR, WORKING_COPY : CHAR, REG_STATUS : VARCHAR, DATE : NUMBER, USER : VARCHAR, CORRESP_NS : NUMBER, DELETED : VARCHAR, IDENTIFIER : VARCHAR
ATTRIBUTE: M_ATTRIBUTE_ID : NUMBER, DATAELEM_ID : NUMBER, VALUE : TEXT, PARENT_TYPE : CHAR
COMPLEX_ATTR_ROW: ROW_ID : VARCHAR, PARENT_ID : NUMBER, PARENT_TYPE : CHAR, M_COMPLEX_ATTR_ID : NUMBER, POSITION : NUMBER, HARV_ATTR_ID : VARCHAR
COMPLEX_ATTR_FIELD: ROW_ID : VARCHAR, M_COMPLEX_ATTR_FIELD_ID : NUMBER, VALUE : TEXT
HARV_ATTR: HARV_ATTR_ID : NUMBER, HARVESTER_ID : VARCHAR, HARVESTED : NUMBER, MD5KEY : VARCHAR, LOGICAL_ID : VARCHAR
HARV_ATTR_FIELD: HARV_ATTR_MD5 : VARCHAR, FLD_NAME : VARCHAR, FLD_VALUE : VARCHAR
M_ATTRIBUTE: M_ATTRIBUTE_ID : NUMBER, NAME : VARCHAR, SHORT_NAME : VARCHAR, OBLIGATION : CHAR, DEFINITION : TEXT, NAMESPACE_ID : VARCHAR, DISP_TYPE : VARCHAR, DISP_ORDER : NUMBER, DISP_WHEN : CHAR, DISP_WIDTH : NUMBER, DISP_HEIGHT : NUMBER, DISP_MULTIPLE : CHAR, INHERIT : CHAR
M_COMPLEX_ATTR: M_COMPLEX_ATTR_ID : NUMBER, NAME : VARCHAR, SHORT_NAME : VARCHAR, OBLIGATION : CHAR, DEFINITION : TEXT, NAMESPACE_ID : VARCHAR, DISP_ORDER : NUMBER, DISP_WHEN : CHAR, INHERIT : CHAR, HARVESTER_ID : VARCHAR
M_COMPLEX_ATTR_FIELD: M_COMPLEX_ATTR_FIELD_ID : NUMBER, M_COMPLEX_ATTR_ID : NUMBER, NAME : VARCHAR, DEFINITION : TEXT, POSITION : NUMBER, PRIORITY : CHAR, HARV_ATTR_FLD_NAME : VARCHAR
Business logic for the model in figure 4.2:
• Datasets, tables and data elements are also referred to simply as objects.
• Note that attributes and the objects for which they can be specified are kept in separate tables. Attributes are defined in the so-called meta-tables M_ATTRIBUTE (for simple attributes), M_COMPLEX_ATTR (for complex attributes) and M_COMPLEX_ATTR_FIELD (for complex attribute fields).
The actual values of attributes (and their fields in case of complex ones) are kept
in tables ATTRIBUTE (for simple attributes), COMPLEX_ATTR_ROW (for complex
attributes) and COMPLEX_ATTR_FIELD (for complex attribute fields).
Finally, the tables for objects whose definitions can have attributes are linked to attribute values' tables with 1-to-many relations, meaning that an object can have 1
or many attributes.
So in other words the attribute values are stored as rows and not fields of table.
This kind of separation is the only way to ensure that new attributes can be added
to the system at any time, and existing ones can be edited or delete whenever
you'd like. It also means that DD is not restricted to any set of standardized attributes, making it very flexible.

Objects that can have attributes are stored in the following tables:

DATASET for datasets. For each dataset there is exactly one row in his table.

DS_TABLE for tables. They are always seen within a certain namespace,
identified by NAMESPACE_ID. For each table there is exactly one row in his
table.

DATAELEM for data elements. As the NAMESPACE_ID suggests, every data
element is also present in some namespace. There is exactly one row for
each data element in this table.

There’s a restriction that each data element must have a parent table and
each table must have a parent dataset! This restriction might become dropped in
future developments when DD is going to support harmonised content.

Datasets, tables and data elements in DD are versioned (see the VERSION columns in the corresponding tables). This means that there can be several versions of the same logical dataset, table or data element. Thus there are two types of concept identifiers in the DD model:
- logical ID (for distinguishing logically different objects)
- versioned ID (for distinguishing different versions of the same logical object)
In other words:
versioned ID = logical ID + version
Versioned IDs are kept in the DATAELEM_ID, TABLE_ID and DATASET_ID fields.
A logically unique data element must belong to exactly one logically unique table, and a logically unique table must belong to exactly one logically unique dataset.
Logical IDs are formed as follows:
- the logical ID of a dataset is its identifier (IDENTIFIER)
- the logical ID of a table is made up of its identifier (IDENTIFIER) and its parent dataset's logical ID (PARENT_NS, see below)
- the logical ID of a data element is made up of its identifier (IDENTIFIER) and its parent table's logical ID (PARENT_NS, see below)
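The identifier rules above can be expressed as small value types. This is a hypothetical Java sketch, not DD code (the rules are applied within DD's application code); the record names are assumptions, and each parent's logical ID is represented by its namespace ID, as described for PARENT_NS below.

```java
// Hypothetical value types for DD's two kinds of identifiers.
final class DdIds {

    interface LogicalId {}

    // Logical ID of a dataset: its IDENTIFIER.
    record DatasetLogicalId(String identifier) implements LogicalId {}

    // Logical ID of a table: IDENTIFIER + PARENT_NS (namespace of the parent dataset).
    record TableLogicalId(String identifier, long parentNs) implements LogicalId {}

    // Logical ID of a data element: IDENTIFIER + PARENT_NS (namespace of the parent table).
    record ElementLogicalId(String identifier, long parentNs) implements LogicalId {}

    // Versioned ID = logical ID + version.
    record VersionedId(LogicalId logicalId, String version) {}

    // Two versioned IDs denote versions of the same logical object exactly
    // when their logical parts are equal (records compare by value).
    static boolean sameLogicalObject(VersionedId a, VersionedId b) {
        return a.logicalId().equals(b.logicalId());
    }

    public static void main(String[] args) {
        VersionedId v1 = new VersionedId(new DatasetLogicalId("CDDA"), "1.0");
        VersionedId v2 = new VersionedId(new DatasetLogicalId("CDDA"), "2.0");
        System.out.println(sameLogicalObject(v1, v2)); // true
        System.out.println(v1.equals(v2));             // false: different versions
    }
}
```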
Each logically unique dataset, and each logically unique table, stands for a unique context (aka namespace). These contexts are defined and stored in the NAMESPACE table, and namespace identifiers are stored in the NAMESPACE_ID column. Whenever a new logically unique dataset or table is created, DD automatically creates the corresponding namespace in the NAMESPACE table.
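The automatic creation of namespaces can be pictured as a get-or-create lookup keyed by logical ID. In the hypothetical sketch below, the NamespaceRegistry class, the counter standing in for the auto-generated NAMESPACE_ID and the string form of the logical IDs are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the NAMESPACE table: logical ID -> namespace ID.
final class NamespaceRegistry {

    private final Map<String, Long> namespaceByLogicalId = new HashMap<>();
    private long nextId = 1; // stands in for the auto-generated NAMESPACE_ID

    // Returns the namespace of a logically unique dataset or table, creating it on first use.
    // A new version of an existing dataset or table has the same logical ID, so it reuses the namespace.
    long namespaceFor(String logicalId) {
        return namespaceByLogicalId.computeIfAbsent(logicalId, k -> nextId++);
    }

    public static void main(String[] args) {
        NamespaceRegistry reg = new NamespaceRegistry();
        long cdda = reg.namespaceFor("dataset:CDDA");
        long again = reg.namespaceFor("dataset:CDDA"); // e.g. when a new version is created
        long rivers = reg.namespaceFor("dataset:Rivers");
        System.out.println(cdda == again);  // true
        System.out.println(cdda != rivers); // true
    }
}
```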
In DS_TABLE, a table's parent dataset's logical ID is represented by the dataset's namespace ID (PARENT_NS).
In DATAELEM, a data element's parent table's logical ID is represented by the table's namespace ID (PARENT_NS), and the element's parent dataset's logical ID by the dataset's namespace ID (TOP_NS).

Data element types are identified by the value of DATAELEM.TYPE: one of 'AGG', 'CH1' (fixed values) or 'CH2' (quantitative values).
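For illustration, a DATAELEM row and its TYPE codes might be modelled in Java as follows. Only the column names and the three type codes come from the description above; the enum, the record and the example values are hypothetical, and since the meaning of 'AGG' is not spelled out in this section it is kept as a bare code.

```java
// Hypothetical Java view of a DATAELEM row; only the column names come from DD.
final class DataElemRow {

    // Values allowed in DATAELEM.TYPE.
    enum ElemType {
        AGG, // meaning not described in this section
        CH1, // element with fixed values
        CH2; // element with quantitative values

        // Parses a column value; throws IllegalArgumentException for unknown codes.
        static ElemType fromColumn(String value) {
            return ElemType.valueOf(value.trim().toUpperCase());
        }
    }

    record Row(long dataelemId, String identifier, ElemType type,
               long parentNs, // namespace of the parent table (its logical ID)
               long topNs) {} // namespace of the parent dataset

    public static void main(String[] args) {
        Row r = new Row(42, "SiteCode", ElemType.fromColumn("CH1"), 7, 3);
        System.out.println(r.type() + " in table namespace " + r.parentNs()); // CH1 in table namespace 7
    }
}
```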

The TBL2ELEM table resolves the many-to-many relationship between tables and data elements, each identified by a unique versioned ID. DST2TBL does the same for the many-to-many relationship between datasets and tables.
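A link table such as TBL2ELEM holds pairs of versioned IDs, which makes both directions of the many-to-many relationship easy to query. The following is a hypothetical in-memory sketch; the class, its methods and the example IDs are assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for a link table such as TBL2ELEM.
final class LinkTable {

    // One row of the link table: a versioned table ID paired with a versioned element ID.
    record Link(long tableId, long elemId) {}

    private final Set<Link> links = new LinkedHashSet<>();

    void link(long tableId, long elemId) {
        links.add(new Link(tableId, elemId)); // duplicate pairs are ignored by the set
    }

    // All element versions linked to the given table version.
    Set<Long> elementsOf(long tableId) {
        return links.stream().filter(l -> l.tableId() == tableId)
                .map(Link::elemId).collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // All table versions linked to the given element version.
    Set<Long> tablesOf(long elemId) {
        return links.stream().filter(l -> l.elemId() == elemId)
                .map(Link::tableId).collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        LinkTable t2e = new LinkTable();
        t2e.link(10, 100); // example: table version 10 contains element version 100
        t2e.link(11, 100); // example: table version 11 contains the same element version
        t2e.link(11, 101);
        System.out.println(t2e.elementsOf(11)); // [100, 101]
        System.out.println(t2e.tablesOf(100));  // [10, 11]
    }
}
```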

The fields of complex attributes are defined so that each complex attribute in the M_COMPLEX_ATTR table can have one or many fields defined in M_COMPLEX_ATTR_FIELD.

The actual values of complex attributes are stored on the principle that each complex attribute value can be seen as a row, where the columns stand for the attribute's fields (in definition order) and the cells for the values of those fields.
To support this, there is a table for storing such rows: COMPLEX_ATTR_ROW. Each row in that table stands for exactly one unique complex attribute value row, and has one or many field values stored in the COMPLEX_ATTR_FIELD table.
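A hypothetical sketch of this row/field principle: the field definitions (as in M_COMPLEX_ATTR_FIELD) fix the column order, and each value row (as in COMPLEX_ATTR_ROW) is rendered from its field values (as in COMPLEX_ATTR_FIELD). The class name, the example field names (loosely borrowed from the Guidelines_url/Guidelines_description columns of the Import tool XML) and the choice to show missing values as empty strings are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a complex attribute: field definitions plus value rows.
final class ComplexAttribute {

    private final String name;
    private final List<String> fieldNames; // definition order, as in M_COMPLEX_ATTR_FIELD
    private final List<Map<String, String>> rows = new ArrayList<>(); // one map per value row

    ComplexAttribute(String name, List<String> fieldNames) {
        this.name = name;
        this.fieldNames = List.copyOf(fieldNames);
    }

    // Adds one value row; keys are field names, values are the field values.
    void addRow(Map<String, String> fieldValues) {
        if (!fieldNames.containsAll(fieldValues.keySet())) {
            throw new IllegalArgumentException("unknown field in " + fieldValues.keySet());
        }
        rows.add(Map.copyOf(fieldValues));
    }

    // Returns every row as cells in field-definition order; missing values become "".
    List<List<String>> table() {
        List<List<String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            List<String> cells = new ArrayList<>();
            for (String field : fieldNames) {
                cells.add(row.getOrDefault(field, ""));
            }
            out.add(cells);
        }
        return out;
    }

    public static void main(String[] args) {
        ComplexAttribute g = new ComplexAttribute("Guidelines", List.of("url", "description"));
        g.addRow(Map.of("description", "Reporting guide"));
        System.out.println(g.name + " " + g.table()); // Guidelines [[, Reporting guide]]
    }
}
```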

DD can represent foreign key relations between the defined tables. This is done with the FK_RELATION table, which holds a foreign key relation between two data elements in the DATAELEM table (naturally, the two elements must be in tables that differ both by logical ID and by versioned ID, but DD does not check this). The A_ID and B_ID columns hold the versioned IDs of the two elements; A_CARDIN and B_CARDIN hold the cardinality of either side (whether it is one-to-many, one-to-one, etc.).
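Because DD itself does not verify that the two elements sit in different tables, the validate method below is a hypothetical extra check that only makes the intended constraint explicit. It relies on the fact that a data element's PARENT_NS is its table's logical ID: if the logical tables differ, the table versions differ too, so one comparison covers both conditions. The class, the map argument and the cardinality notation are assumptions.

```java
import java.util.Map;

// Hypothetical Java view of an FK_RELATION row between two data elements.
final class FkRelation {

    record Row(long aId, long bId,             // versioned IDs of the two DATAELEM rows
               String aCardin, String bCardin) {} // e.g. "1" and "*" (assumed notation)

    // NOT performed by DD: checks that the two elements are in different logical tables.
    // parentNsOfElem maps an element's versioned ID to its PARENT_NS (its table's logical ID).
    static void validate(Row fk, Map<Long, Long> parentNsOfElem) {
        Long tableA = parentNsOfElem.get(fk.aId());
        Long tableB = parentNsOfElem.get(fk.bId());
        if (tableA == null || tableB == null) {
            throw new IllegalArgumentException("unknown data element");
        }
        if (tableA.equals(tableB)) {
            throw new IllegalArgumentException("both elements are in table namespace " + tableA);
        }
    }

    public static void main(String[] args) {
        Map<Long, Long> parentNs = Map.of(100L, 10L, 200L, 20L);
        validate(new Row(100, 200, "1", "*"), parentNs); // passes silently
        System.out.println("ok");
    }
}
```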

DD can harvest complex attribute values from outside sources (normally an LDAP directory). Each harvested attribute value is represented by a row in the HARV_ATTR table; the fields of those values are stored in HARV_ATTR_FIELD. The two tables are related through the MD5KEY and HARV_ATTR_MD5 fields.
A row in HARV_ATTR can be linked with a row in COMPLEX_ATTR_ROW through the HARV_ATTR.LOGICAL_ID and COMPLEX_ATTR_ROW.HARV_ATTR_ID fields. If the latter is NULL, the corresponding row is not a harvested one. Harvested rows are preferred over non-harvested ones.
Each complex attribute field in M_COMPLEX_ATTR_FIELD can be linked with only one FLD_NAME in HARV_ATTR_FIELD.
There can be many complex attribute harvesters in DD; the attributes harvested by each are identified in HARV_ATTR by the HARVESTER_ID column. Each complex attribute defined in M_COMPLEX_ATTR can be linked with only one HARVESTER_ID.
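The document does not say exactly how the preference for harvested rows is applied. One possible reading, sketched hypothetically below, is to list harvested rows first; the class, the method and the String type assumed for HARV_ATTR.LOGICAL_ID are illustrations only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical view of COMPLEX_ATTR_ROW rows and their link to HARV_ATTR.
final class HarvestedRows {

    // harvAttrId mirrors COMPLEX_ATTR_ROW.HARV_ATTR_ID: null means "not harvested",
    // otherwise it holds the HARV_ATTR.LOGICAL_ID of the harvested value (type assumed).
    record AttrRow(long rowId, String harvAttrId) {
        boolean harvested() { return harvAttrId != null; }
    }

    // One reading of "harvested rows are preferred": list them first, keeping
    // the original order within each group (List.sort is stable).
    static List<AttrRow> preferHarvested(List<AttrRow> rows) {
        List<AttrRow> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((AttrRow r) -> !r.harvested()));
        return sorted;
    }

    public static void main(String[] args) {
        List<AttrRow> rows = List.of(new AttrRow(1, null), new AttrRow(2, "abc"), new AttrRow(3, null));
        System.out.println(preferHarvested(rows).stream().map(AttrRow::rowId).toList()); // [2, 1, 3]
    }
}
```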

The GIS field in the DATAELEM table indicates the data element's GIS (Geographical Information System) type. If it is NULL, the element is not a GIS element. Tables that contain GIS elements are called GIS tables.
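The GIS table definition above translates directly into a predicate over a table's elements. A hypothetical sketch (the class, the record and the placeholder GIS type value "X" are assumptions):

```java
import java.util.List;

// Hypothetical check derived from the definition above: a table is a GIS table
// if at least one of its data elements has a non-NULL DATAELEM.GIS value.
final class GisCheck {

    record Elem(String identifier, String gis) {} // gis == null: not a GIS element

    static boolean isGisTable(List<Elem> elementsOfTable) {
        return elementsOfTable.stream().anyMatch(e -> e.gis() != null);
    }

    public static void main(String[] args) {
        List<Elem> elems = List.of(new Elem("SiteCode", null), new Elem("Geometry", "X")); // "X": placeholder GIS type
        System.out.println(isGisTable(elems)); // true
    }
}
```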

Fixed values of data elements and attributes are kept in the FXV table, where each value is uniquely identified by FXV_ID. OWNER_ID contains the ID of the value's owner and OWNER_TYPE indicates the owner's type (one of 'elem' or 'attr'). The rest of the fields are self-explanatory.
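Because a fixed value's owner can be either a data element or an attribute, OWNER_ID is only meaningful together with OWNER_TYPE: IDs from different tables may coincide. A hypothetical in-memory sketch of such a lookup (the class, the enum and the example values are assumptions):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory view of the FXV table, whose rows belong either
// to a data element or to an attribute (OWNER_TYPE 'elem' or 'attr').
final class FixedValues {

    enum OwnerType { ELEM, ATTR }

    record Fxv(long fxvId, long ownerId, OwnerType ownerType, String value) {}

    private final List<Fxv> rows = new ArrayList<>();

    void add(Fxv fxv) { rows.add(fxv); }

    // OWNER_ID alone may be ambiguous (an element and an attribute can share a
    // numeric ID), so the lookup filters on OWNER_TYPE as well.
    List<String> valuesOf(long ownerId, OwnerType ownerType) {
        return rows.stream()
                .filter(f -> f.ownerId() == ownerId && f.ownerType() == ownerType)
                .map(Fxv::value)
                .toList();
    }

    public static void main(String[] args) {
        FixedValues fxv = new FixedValues();
        fxv.add(new Fxv(1, 5, OwnerType.ELEM, "yes"));
        fxv.add(new Fxv(2, 5, OwnerType.ATTR, "daily"));
        System.out.println(fxv.valuesOf(5, OwnerType.ELEM)); // [yes]
    }
}
```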
5.2.2 Model & business logic for classification schemes
It was a user requirement that DD should support abstract relations between data elements from different tables of a single dataset. To implement this, and to be prepared for other abstract relations in the future, DD follows the ISO 11179 recommendation for implementing classification schemes. In other words, abstract relations between data elements and other DD concepts are mapped into classification schemes, implemented according to ISO 11179.
At the moment, classification schemes in DD are used only to store abstract relations between data elements. The model of classification schemes is given in figure 4.3.
Figure 4.3 Model for classification schemes
Tables shown: DATAELEM, DATASET, M_ATTRIBUTE, NAMESPACE, ATTRIBUTE and COMPLEX_ATTR_ROW (from the previous model), plus:
- CLSF_SCHEME (CS_ID : NUMBER, CS_NAME : VARCHAR, CS_TYPE : VARCHAR, CS_VERSION : VARCHAR, CS_DESCRIPTION : TEXT)
- CS_ITEM (CSI_ID : NUMBER, CS_ID : NUMBER, CSI_TYPE : VARCHAR, CSI_VALUE : VARCHAR, COMPONENT_ID : NUMBER, COMPONENT_TYPE : VARCHAR, IS_DEFAULT : CHAR)
- CSI_RELATION (PARENT_CSI : NUMBER, CHILD_CSI : NUMBER, REL_TYPE : VARCHAR)
The model in figure 4.3 brings along some tables from the previous model: first of all DATAELEM, M_ATTRIBUTE, DATASET and NAMESPACE. These are the tables representing the concepts that can be classified according to this model:
- data elements
- simple attributes
- datasets
- namespaces

The model allows several classification schemes to be defined, stored in the CLSF_SCHEME table. In general, a classification scheme is an arrangement or division of objects into groups based on characteristics that the objects have in common. Examples are taxonomies, thesauri, etc. Abstract relations between data elements are just a specific case of this.

Each classification scheme consists of classification scheme items, stored in the CS_ITEM table. In a taxonomy, these would be its nodes. In DD's case they are the IDs of the related data elements.

Any of the four tables mentioned above can have matching rows in the CS_ITEM table, i.e. their rows can be classified.

Classification scheme items are grouped in the CSI_RELATION table, which holds the parent-child relations between the items. So this is actually the table where data elements get related to each other.
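The following hypothetical Java sketch shows how two data elements could be related through CS_ITEM and CSI_RELATION. Column names follow figure 4.3; the class, the counter used for CSI_ID, the COMPONENT_TYPE value "elem" and the REL_TYPE value are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of CS_ITEM and CSI_RELATION.
final class Classification {

    // CS_ITEM: an item of scheme csId pointing at a classified component,
    // e.g. COMPONENT_TYPE "elem" with COMPONENT_ID = a data element's ID (values assumed).
    record CsItem(long csiId, long csId, long componentId, String componentType) {}

    // CSI_RELATION: a parent-child link between two items.
    record CsiRelation(long parentCsi, long childCsi, String relType) {}

    private final List<CsItem> items = new ArrayList<>();
    private final List<CsiRelation> relations = new ArrayList<>();
    private long nextCsiId = 1; // stands in for the auto-generated CSI_ID

    CsItem addItem(long csId, long componentId, String componentType) {
        CsItem item = new CsItem(nextCsiId++, csId, componentId, componentType);
        items.add(item);
        return item;
    }

    void relate(CsItem parent, CsItem child, String relType) {
        relations.add(new CsiRelation(parent.csiId(), child.csiId(), relType));
    }

    // Component IDs (here: data element IDs) of the items that are children of the given item.
    List<Long> childComponents(CsItem parent) {
        List<Long> result = new ArrayList<>();
        for (CsiRelation rel : relations) {
            if (rel.parentCsi() != parent.csiId()) continue;
            for (CsItem item : items) {
                if (item.csiId() == rel.childCsi()) result.add(item.componentId());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Classification c = new Classification();
        long scheme = 1; // CS_ID of a scheme holding abstract relations (assumed)
        CsItem a = c.addItem(scheme, 100, "elem");
        CsItem b = c.addItem(scheme, 205, "elem"); // an element from another table of the same dataset
        c.relate(a, b, "related");
        System.out.println(c.childComponents(a)); // [205]
    }
}
```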

Note that a couple more tables are brought over from the previous model: ATTRIBUTE and COMPLEX_ATTR_ROW, linked to the CS_ITEM table through 1-to-many relationships. This supports the requirement that classification items may also have attributes, which help to further define them.
5.2.3 Unique identifiers
Many of the tables described above have fields with names like XXX_ID. While these are primary keys in the database sense, they do not always represent primary keys in the business logic sense. They are used because in most cases the logical key is formed by several fields, and referring to such a key from another table would mean copying all of those fields into that table, which would be confusing. So, to let the database engine uniquely identify concepts, there are auto-generated XXX_ID fields.
The logical identifiers of the most important objects (datasets, tables and data elements) were described in the chapter on the model for basic concepts. They are applied within the application code.
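The trade-off described above can be sketched as a mapping from a composite business key to a single surrogate number. In this hypothetical Java sketch the composite key of a data element version is assumed to be IDENTIFIER + PARENT_NS + VERSION (following the logical and versioned ID rules given earlier), and a counter stands in for the database's auto-generated ID.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of auto-generated XXX_ID fields: a composite business
// key is mapped once to a single surrogate number that other tables can refer to.
final class SurrogateKeys {

    // Composite key of a data element version (assumed: IDENTIFIER, PARENT_NS, VERSION).
    record ElemKey(String identifier, long parentNs, String version) {}

    private final Map<ElemKey, Long> idByKey = new HashMap<>();
    private long sequence = 0; // stands in for the database's auto-increment

    long idFor(ElemKey key) {
        return idByKey.computeIfAbsent(key, k -> ++sequence);
    }

    public static void main(String[] args) {
        SurrogateKeys keys = new SurrogateKeys();
        long id = keys.idFor(new ElemKey("SiteCode", 7, "1.0"));
        // A referencing row (e.g. in FK_RELATION) now needs only this one number.
        System.out.println(id + " " + (keys.idFor(new ElemKey("SiteCode", 7, "1.0")) == id)); // 1 true
    }
}
```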
5.3 Import tool
Import tool is an MS Access-based utility for defining datasets, tables and data elements (but not attributes or namespaces!) in a flat table and then populating the definitions into the DD database, using XML as an intermediate format. The reason for such a tool is simply that, when entering a large number of definitions, a flat table can be handier than the web-based user interface with its many different views and clicking back and forth between them.
By the end of SA-6 the tool is still very simple, and there are no plans to make it more sophisticated until a concrete and unavoidable need arises. This is because Import tool is really only used for two purposes, neither of which requires much complexity:
- initial import of logically unique datasets (meaning only the first versions of datasets; subsequent versions are created by editing through the web UI)
- import of only the fixed values of a selected data element
The basic principle of how this tool works is presented in figure 4.4.
Figure 4.4 Basic working principle of Import tool (content expert, MS Access-based Import tool database, generated XML, Data Dictionary client user interface and database)
The actions in Import tool's working sequence are as follows:
1. The content expert enters the definitions into the tool's database, using the usual MS Access flat table editing view.
2. The content expert makes the tool generate an XML file of the definitions entered.
3. The content expert takes the generated XML file and uploads it into DD, which then parses the file and stores the data found into its database.
XML is used as an intermediate format so that the user can benefit from the handiness of MS Access while the definitions can still be imported into a Data Dictionary running on any platform. DD would normally run on Linux, where MS Access files would be difficult to work with.
A sample working view of Import tool is given in figure 4.5.
Figure 4.5 Screen shots of Import tool
The upper part of figure 4.5 shows a snapshot of the view for working with data elements; the lower part shows the view for working with the fixed values of data elements. As you can see, they are simple flat tables, giving a better overview than the web-based user interface would allow. Note that these two shots are just a small part of all the different views in Import tool.
Mostly there is a separate table for each of the dictionary's basic concepts. The columns of such a table stand for the concept's fields and attributes, corresponding to the fields of the concept's table and to the rows of the attribute tables in the DD database. This of course means that a row in such a table does not necessarily represent exactly one concept, because for each new attribute one has to add a new row, repeating all the rest of the attributes for the given concept.
The views provide helpful drop-down lists wherever possible.
Once the definitions have been entered, the user makes the tool generate an XML representation of them. For this the tool has a form where the user specifies the file to generate the XML into, and then presses 'Generate' to perform the generation. Before doing so, the user can also specify exactly which type of import he/she is going to perform:
- import of all the dataset definitions present in the tool
- import of only the fixed values of a selected data element
The layout of the generated XML file does not follow any common standard. It was designed by the developer purely for getting the data from Import tool into the DD database, and no other parties need to work with this XML, except of course the parser at the DD end. A snapshot of a generated XML file is shown in figure 4.6 below.
Once the XML has been generated, the user uploads it into DD using the corresponding form in the DD user interface, and the importing code transfers it all into the DD database.
NB! There are some important limitations of Import tool that you should know about:
- The tool has no link to the DD database. This means that its model is not automatically updated when the model in the DD database changes. Attributes are the most concrete example. In Import tool they are represented as columns in the tables worked with; in other words, the attribute set in Import tool is hard-coded, whereas in the DD database attributes are defined in rows rather than in columns. So whenever a new attribute is added to the DD database, or an existing one is changed or deleted, this is not automatically reflected in Import tool, and the changes must be made by hand. This sounds like quite a drawback, but luckily the changes do not concern the code: only a column has to be added to or removed from the relevant table.
- Whenever the user triggers an XML generation, the entire contents of the tool's database at that moment are generated into XML. This means that if the user has defined several different datasets but wants to import only one of them into DD, he/she has to create a separate instance of the tool for that dataset and use that instance for entering and importing its definition. So before you start entering any definitions into Import tool, it is highly recommended to decide whether you actually want to import only some of them, and if so, to create a separate tool instance for those. While this might sound rather primitive, creating a separate instance of Import tool is luckily easy: Import tool is just another MS Access file (.mdb), so all you have to do is make a copy of it.
<?xml version="1.0" encoding="UTF-8"?>
<import>
  <RowSet name="Dataset">
    <DATASET>
      <Row>
        <DATASET_ID>1</DATASET_ID>
        <SHORT_NAME>CDDA</SHORT_NAME>
        <Version>1.0</Version>
        <Name>CDDA</Name>
        <Identifier>CDDA</Identifier>
        <ShortDescription/>
        <Definition/>
        <DescriptionOfUse/>
        <Theme>CDDA</Theme>
        <Sub-theme/>
        <Keywords/>
        <SynNames_name_1/>
        <SynNames_name_2/>
        <EEAissue_name_1/>
        <EEAissue_name_2/>
        <EEAissue_name_3/>
        <EEAissue_name_4/>
        <ROD/>
        <Indicators/>
        <Methodology>Over the last few years, several requests for
        validation . . . </Methodology>
        <PlannedUpdFreq/>
        <References/>
        <RegistrationAuthority_name_1/>
        <RegistrationAuthority_name_2/>
        <Guidelines_url_1/>
        <Guidelines_url_2/>
        <Guidelines_description_1/>
        <Guidelines_description_2/>
      </Row>
      <Row>
        <DATASET_ID>3</DATASET_ID>
        <SHORT_NAME>Rivers</SHORT_NAME>
        <Version>1.0</Version>
        <Name>Rivers</Name>
        <Identifier>Rivers</Identifier>
        <ShortDescription/>
        <Definition/>
        <!-- ... rest of the snapshot not shown ... -->
      </Row>
    </DATASET>
  </RowSet>
</import>
Figure 4.6 Snapshot of the XML used between Import tool and DD
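On the DD side, a file in this layout has to be parsed back into rows. The document states only that DD uses the Xerces Java parser; the sketch below uses the standard JAXP DOM API (which Xerces implements) instead, and the class and method names are hypothetical. It assumes the RowSet/table/Row/column nesting shown in figure 4.6.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical reader for the Import tool XML layout:
// <import><RowSet name="..."><TABLE><Row><COLUMN>value</COLUMN>...</Row>...</TABLE></RowSet></import>
final class ImportXmlReader {

    // Returns the rows of the RowSet with the given name as column -> text maps.
    // Empty elements such as <Definition/> map to "".
    static List<Map<String, String>> readRows(String xml, String rowSetName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse import XML", e);
        }
        List<Map<String, String>> result = new ArrayList<>();
        NodeList rowSets = doc.getElementsByTagName("RowSet");
        for (int i = 0; i < rowSets.getLength(); i++) {
            Element rowSet = (Element) rowSets.item(i);
            if (!rowSetName.equals(rowSet.getAttribute("name"))) continue;
            NodeList rows = rowSet.getElementsByTagName("Row");
            for (int j = 0; j < rows.getLength(); j++) {
                Map<String, String> columns = new LinkedHashMap<>();
                // Each child element of <Row> is one column of the row.
                for (Node n = rows.item(j).getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n.getNodeType() == Node.ELEMENT_NODE) {
                        columns.put(n.getNodeName(), n.getTextContent().trim());
                    }
                }
                result.add(columns);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<import><RowSet name=\"Dataset\"><DATASET>"
                + "<Row><DATASET_ID>1</DATASET_ID><Identifier>CDDA</Identifier><Definition/></Row>"
                + "</DATASET></RowSet></import>";
        System.out.println(readRows(xml, "Dataset")); // [{DATASET_ID=1, Identifier=CDDA, Definition=}]
    }
}
```

Keeping the columns in a LinkedHashMap preserves the order in which Import tool wrote them, which makes the result easy to compare with the generated file.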
5.4 User interface
5.4.1 Client side
The client software for the DD user interface is basically any commonly used WWW browser, but it has been tested only with MS Internet Explorer 5.5 and Netscape 6.0, so it is highly recommended to use one of these or a higher version.
From the security point of view, the client has to allow JavaScript and cookies. Any other potentially hazardous features can be disabled if the user wishes.
The user interface layout format is HTML only; the server side produces no other content types.
5.4.2 Server side
The server side of the user interface uses three types of software to produce the HTML output:
- static HTML pages
- JavaServer Pages (JSP)
- Java servlets
JavaServer Pages...
... are used for just about all user functionality, except for the import functions. Data Dictionary's JSPs do not build on any existing open-source framework; they are written from scratch and are simply based on creating a JDBC connection to the database, getting the result sets and rendering them in the UI right away.
Java servlets...
... are used only for the import functionality. Currently there is only one of them, importing the XML generated by Import tool. It responds to HTTP POST requests only, expects the request content to be of type text/xml in the layout described above, and passes it on to the import handler, which then stores it all in the DD database.
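In a servlet, such a rule would typically live in doPost, leaving other methods to the default HttpServlet error response. To stay self-contained without the servlet API, the hypothetical sketch below isolates only the acceptance rule as a plain function; the class and method names are assumptions.

```java
import java.util.Locale;

// Hypothetical request check mirroring the described import servlet contract:
// HTTP POST only, with a text/xml body (a charset parameter is tolerated).
final class ImportRequestCheck {

    static boolean accepts(String method, String contentType) {
        if (!"POST".equals(method) || contentType == null) {
            return false;
        }
        // Strip parameters such as "; charset=UTF-8" before comparing the media type.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("text/xml");
    }

    public static void main(String[] args) {
        System.out.println(accepts("POST", "text/xml; charset=UTF-8")); // true
        System.out.println(accepts("GET", "text/xml"));                 // false
    }
}
```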
5.5 User authentication & access control
Users of the DD application are authenticated against a configured LDAP server. For access restriction logic, DD uses the ACL mechanism developed in Reportnet's Application Integration Tool (AIT) project.
More about the different user groups and their rights is written in the DD User Guide document, which also describes DD's functionality. That is why a description of the functionality is missing from this Detailed Design document.
5.6 Used software
Some of the software used for creating and running Data Dictionary has already been mentioned above; the complete list (purpose: used software) is as follows:
- UI client: HTML-capable HTTP browser (Internet Explorer 5.5, Netscape 6.0, etc.)
- UI server: HTTP server with a JSDK 2.2 compatible servlet engine (Apache 1.3.22, http://httpd.apache.org/; Resin 1.1.6, http://www.caucho.com/; etc.)
- UI client-server protocol: HTTP on TCP/IP
- Common Gateway Interface: JSDK 2.2 (Java Servlet Development Kit) compatible Java servlets and JavaServer Pages (JSP), accessed through the HTTP server's servlet engine
- Application software: Java™ 2 platform (http://www.javasoft.com/j2se/1.3/)
- Database: MySQL 4.0.1-alpha (http://www.mysql.com)
- Database driver: JDBC, mm.mysql.jdbc-1.2b (http://mmmysql.sourceforge.net)
- Operating system: Linux RedHat 7.1
- XML parsing: Xerces Java Parser (http://xml.apache.org/xerces2-j/index.html)
6. Configuration & installation
Guidelines for installing and configuring Data Dictionary have been provided in a separate DD Installation Guide document.