Title: Documentation of the mapping of the result of 1.4 on the ‘ideal architecture’ framework
WP: 1
Deliverable: 1.6
Version: 1.0
Date: 30-6-2013
NSI: Statistics Estonia
Authors: Maia Ennok
ESS-NET ON MICRO DATA LINKING AND DATA WAREHOUSING IN PRODUCTION OF BUSINESS STATISTICS
ESSnet on Data Warehousing
Maia Ennok (Statistics Estonia)
INDEX
1. Introduction
1.1 Target audience
1.2 What is in this document?
2. Metadata examples by metadata subsets, functionality groups and layers of the S-DWH
2.1 Source layer
2.2 Integration layer
2.3 Interpretation and data analysis layer
2.4 Data access layer
3. Metadata in data flows of Functional Architecture of the S-DWH
3.1 General variable flow
    Source layer
    Integration layer
    Interpretation and data analysis layer
3.2 Data editing
3.3 Derivation of value
4. Conclusion
1. Introduction
This document focuses on mapping the deliverable Functionalities of metadata system to facilitate and support the operation of the S-DWH [1] (deliverable 1.4) onto the Functional Architecture of the S-DWH [2] (deliverable 3.3).
We want to make the audience more aware of metadata in the S-DWH. To that end we give metadata examples by metadata subset and functionality group in each layer of the S-DWH, and metadata examples by data flow in the functional architecture of the S-DWH.
1.1 Target audience
This document is intended as a tool to help S-DWH experts and users link the metadata of the S-DWH to the new functional architecture of the S-DWH, so that both the S-DWH and its metadata system can be implemented better.
1.2 What is in this document?
In this document we focus on the metadata subsets by metadata functionality and give examples of metadata in data flows:
- Metadata examples by metadata subsets, functionality groups and layers;
- Metadata examples in data flows of the Functional Architecture of the S-DWH.
[1] Ennok M. (2012) Functionalities of metadata system to facilitate and support the operation of the S-DWH. Deliverable 1.4
[2] Berglund B. (2013) Functional Architecture of the S-DWH. Deliverable 3.3
2. Metadata examples by metadata subsets, functionality groups and layers of the S-DWH
In this section we give examples of metadata by metadata subsets, functionality groups and layers.
The layers are defined in the S-DWH Business Architecture [3] document (deliverable 3.1), and the metadata subsets by layer are defined in the Metadata framework [4] (deliverable 1.1).
Examples of metadata by metadata functionality are described in the Functionalities of metadata system to facilitate and support the operation of the S-DWH [5] (deliverable 1.4).
[3] Laureti Palma A. (2012) S-DWH Business Architecture. Deliverable 3.1
[4] Lundell L.G. (2012) Metadata Framework for Statistical Data Warehousing. Deliverable 1.1
[5] Ennok M. (2012) Functionalities of metadata system to facilitate and support the operation of the S-DWH. Deliverable 1.4

2.1 Source layer

Metadata types covered: statistical, process, quality, technical and authorisation metadata.

Metadata creation

2 - Design
Sub-processes: 2.2 - design variable descriptions; 2.3 - design data collection methodology; 2.4 - design frame and sample methodology
- Creates metadata of IOR (Initial observation registry) variables
- Creates validation rules of data collection
- Creates frame, sample, stratum metadata

3 - Build
Sub-processes: 3.1 - build data collection instrument; 3.3 - configure workflows
- Creates technical metadata of data collection instruments (server etc.)
- Creates technical metadata (servers, databases, tables, columns etc.) of IOR
- Creates statistical metadata of workflow (how data collection is carried out)

4 - Collect
Sub-processes: 4.2 - set up collection; 4.3 - run collection; 4.4 - finalize collection
- Creates collection process metadata (process log: when (start time), who (started collection), where (link to technical metadata))
- Creates response rate / count of planned records / collected records
- Creates metadata of finalize collection (process log)

Metadata usages

3 - Build
Sub-processes: 3.3 - configure workflows
- Uses metadata of IOR variables
- Uses metadata of users, roles, privileges of source layers

4 - Collect
Sub-processes: 4.2 - set up collection; 4.3 - run collection; 4.4 - finalize collection
- Setting up collection uses 2.6 pre-fill metadata, 3.1 metadata of collection instruments, 4.1 select sample metadata
- Uses metadata of IOR variables for running and finalizing collection
- Setting up collection uses 3.1 data collection structure technical metadata (server, database, tables, columns etc.)

Metadata maintenance
- Maintain (create, update, delete, versioning) source metadata (data sources metadata, variables metadata, sample metadata etc.). Mostly managed manually; technical metadata is managed automatically.
- Source metadata can be stored in different meta models (managing meta models): a meta model for statistical metadata, for process metadata etc.
- All kinds of metadata and data are versioned.
- User rights are managed according to the S-DWH system process operations. The S-DWH has the following operations: read metadata, create data processing packages, access to delicate data, solve data processing tasks, schedule packages etc.

Metadata evaluation
Evaluate and assure quality metadata of source layer by:
- Fill-in controls of metadata validation
- Systematic built-in processes for managing the workflow of quality assurance of metadata
- Organizational processes of metadata validation
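As an illustration of the process and quality metadata created in the Collect phase of the source layer (process log: when, who, where; response rate from planned and collected records), a minimal Python sketch follows. It is only a sketch under assumptions: the class `CollectionLogEntry`, its field names, the function `response_rate` and all example values are hypothetical and are not taken from deliverable 1.4 or from any S-DWH implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CollectionLogEntry:
    """Process metadata of one collection run (hypothetical field names)."""
    started_at: datetime   # when: start time of the collection
    started_by: str        # who: person who started the collection
    technical_ref: str     # where: link to the technical metadata


def response_rate(planned_records: int, collected_records: int) -> float:
    """Quality metadata: share of planned records that were collected."""
    if planned_records <= 0:
        raise ValueError("planned_records must be positive")
    return collected_records / planned_records


entry = CollectionLogEntry(
    started_at=datetime(2013, 6, 30, 9, 0),
    started_by="collector_01",                # hypothetical user
    technical_ref="srv01/ior_db/survey_raw",  # hypothetical server/database/table
)
rate = response_rate(planned_records=200, collected_records=150)
print(entry.started_by, f"{rate:.0%}")  # prints: collector_01 75%
```

Keeping the "where" as a reference to technical metadata, rather than copying server and table names into every log entry, mirrors the "link to technical metadata" wording of the table.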
2.2 Integration layer

Metadata types covered: statistical, process, quality, technical and authorisation metadata.

Metadata creation

1 - Specify needs
Sub-processes: 1.3 - establish output objectives; 1.6 - prepare business case
- Creates output variable metadata; creates output metadata (which cube variables are in which cubes)
- Creates metadata of statistical activity

2 - Design
Sub-processes: 2.1 - design outputs; 2.2 - design variable descriptions; 2.5 - design statistical processing methodology; 2.6 - design production systems and workflow
- Builds output (output algorithms)
- Creates new variable metadata (creating algorithms) using available variables metadata
- Creates statistical metadata of workflow (which data processing sub-processes are used and in which order)
- Creates pre-fill metadata
- Creates output validation metadata (algorithms)
- Creates algorithms of statistical confidentiality
- Creates coding algorithms using classifiers and coding tables metadata
- Creates imputation algorithms (uses imputation methods)
- Creates dissemination metadata (publication calendar)
- Creates data processing algorithms and scheduling metadata
- Creates data quality variables metadata (edit failure rate, imputation rate)
- Creates data warehouse data model metadata (using design data model of raw data)
- Creates data staging area data model (using technical metadata of collection instrument)

3 - Build
Sub-processes: 3.3 - configure workflows
- Creates scheduling technical metadata

5 - Process
Sub-processes: 5.1 - integrate data; 5.2 - classify & code; 5.3 - review, validate & edit; 5.4 - impute; 5.5 - derive new variables and statistical units; 5.6 - calculate weights; 5.7 - calculate aggregate; 5.8 - finalize data files
- Creates process metadata of data processing subprocesses (process log)
- Creates data finalizing metadata (date, person etc.)
- Creates data loading to S-DWH metadata (logs, date, person etc.)
- Creates quality indicators for integration process

6 - Analyze
Sub-processes: 6.1 - prepare draft output
- Creates process metadata (logs etc.) of output/cube creation
- Creates output technical metadata

Metadata usages

1 - Specify needs
Sub-processes: 1.5 - check data availability
- Uses metadata of available administrative data

2 - Design
Sub-processes: 2.1 - design outputs; 2.2 - design variable descriptions; 2.5 - design statistical processing methodology; 2.6 - design production systems and workflow
- Uses cube/output variable metadata
- Uses variable metadata
- Uses available variables metadata in creation of new variable metadata (creating algorithms)
- Uses statistical processing metadata
- Uses classifiers and coding tables metadata in creation of coding algorithms
- Creates imputation algorithms using imputation methods

3 - Build
Sub-processes: 3.3 - configure workflows
- Uses scheduling statistical metadata

5 - Process
Sub-processes: 5.1 - integrate data; 5.2 - classify & code; 5.4 - impute; 5.5 - derive new variables and statistical units; 5.6 - calculate weights; 5.7 - calculate aggregate; 5.8 - finalize data files
- Uses sample metadata
- Uses variable metadata, classifiers metadata
- Uses coding algorithms, classifiers, coding tables' metadata
- Uses algorithms for integration
- Uses imputation algorithms
- Uses new variable creating algorithms
- Uses stratum and frame metadata
- Uses aggregation methods
- Uses tables/columns metadata
- Uses data quality variables metadata for data profiling
- Uses data model of raw data
- Uses data warehouse data model metadata

6 - Analyze
Sub-processes: 6.1 - prepare draft output
- Uses classifiers metadata
- Uses output/cube variable metadata in creation of output/cube

Metadata maintenance
- Maintain integration metadata (data processing algorithms, data warehouse data models etc.).
- Integration metadata can be stored in different meta models (managing meta models): a meta model for statistical metadata, for process metadata etc.
- All kinds of metadata and data are versioned.
- User rights are managed according to the S-DWH system process operations.

Metadata evaluation
Evaluate and assure quality metadata of integration layer by:
- Fill-in controls of metadata validation
- Systematic built-in processes for managing the workflow of quality assurance of metadata
- Organizational processes of metadata validation

2.3 Interpretation and data analysis layer

Metadata types covered: statistical, process, quality, technical and authorisation metadata.
Metadata creation

1 - Specify needs
Sub-processes: 1.3 - establish output objectives
- Creates output variable metadata; creates output metadata (which cube variables are in which cubes)

2 - Design
Sub-processes: 2.1 - design outputs; 2.2 - design variable description; 2.4 - design frame and sample methodology; 2.5 - design statistical processing methodology; 2.6 - design production systems and workflow
- Creates output validation metadata (algorithms), creates algorithms of statistical confidentiality, creates dissemination metadata (publication calendar); builds output (output algorithms)
- Creates new variable metadata
- Creates sample metadata
- Creates methodological rules for analysis
- Creates quality variables
- Creates statistical metadata of workflow

3 - Build
Sub-processes: 3.3 - configure workflows
- Creates technical metadata of workflow

4 - Collect
Sub-processes: 4.1 - select sample
- Creates process metadata of sample taking

5 - Process
Sub-processes: 5.1 - integrate data; 5.5 - derive new variables and statistical units; 5.6 - calculate weights; 5.7 - calculate aggregate
- Creates process metadata
- Creates quality indicators

6 - Analyze
Sub-processes: 6.1 - prepare draft output; 6.2 - validate outputs; 6.3 - scrutinize and explain; 6.4 - apply disclosure control; 6.5 - finalize outputs
- Creates process metadata of validation (process log)
- Creates process metadata
- Creates output technical metadata

7 - Disseminate
Sub-processes: 7.1 - update output systems
- Creates process metadata

9 - Evaluate
Sub-processes: 9.1 - gather evaluation inputs; 9.2 - conduct evaluation
- Creates process metadata (process log)

Metadata usages

1 - Specify needs
Sub-processes: 1.5 - check data availability
- Uses variable descriptions
- Uses variable data element descriptions

3 - Build
Sub-processes: 3.3 - configure workflows
- Uses 2.5 metadata for the Analyse process
- Uses methodological metadata

4 - Collect
Sub-processes: 4.1 - select sample
- Uses 2.4 metadata

5 - Process
Sub-processes: 5.1 - integrate data; 5.5 - derive new variables and statistical units
- Uses new variable creating algorithms

6 - Analyze
Sub-processes: 6.1 - prepare draft output; 6.2 - validate outputs; 6.3 - scrutinize and explain; 6.4 - apply disclosure control; 6.5 - finalize outputs
- Uses classifiers metadata
- Uses output metadata
- Uses validation algorithms; uses algorithms of statistical confidentiality
- Uses quality variables metadata

7 - Disseminate
Sub-processes: 7.1 - update output systems
- Uses output metadata

9 - Evaluate
Sub-processes: 9.1 - gather evaluation inputs; 9.2 - conduct evaluation
- Uses evaluation metadata

Metadata maintenance
- Maintain interpretation metadata (outputs, output variables, classifier, clarifying metadata). Manual, automated.
- All kinds of metadata and data are versioned.
- User rights are managed according to the S-DWH system process operations.

Metadata evaluation
Evaluate and assure quality metadata of interpretation and data analysis layer by:
- Fill-in controls of metadata validation
- Systematic built-in processes for managing the workflow of quality assurance of metadata
- Organizational processes of metadata validation

2.4 Data access layer

Metadata types covered: statistical, process, quality, technical and authorisation metadata.
Metadata creation

2 - Design
Sub-processes: 2.1 - design outputs; 2.6 - design production systems and workflow
- Builds/creates output
- Creates statistical metadata of workflow
- Creates technical metadata of output

3 - Build
Sub-processes: 3.3 - configure workflows
- Creates technical metadata of workflow

7 - Disseminate
Sub-processes: 7.1 - update output systems; 7.2 - produce dissemination; 7.3 - manage release of dissemination products; 7.4 - promote dissemination; 7.5 - manage user support
- Creates process metadata (when, who, etc.)
- Creates user metadata

Metadata usages

7 - Disseminate
Sub-processes: 7.1 - update output systems; 7.2 - produce dissemination; 7.3 - manage release of dissemination products; 7.4 - promote dissemination; 7.5 - manage user support
- Uses output metadata (output variable metadata, classifiers metadata)
- Uses publication calendar metadata and statistical activity metadata

Metadata maintenance
- Maintain access metadata (output, product, publication calendar, user support). Manual, automated.
- All kinds of metadata and data are versioned.
- User rights are managed according to the S-DWH system process operations.

Metadata evaluation
Evaluate and assure quality metadata of data access layer by:
- Fill-in controls of metadata validation
- Systematic built-in processes for managing the workflow of quality assurance of metadata
- Organizational processes of metadata validation
3. Metadata in data flows of Functional Architecture of the S-DWH
To understand the functional architecture of the S-DWH better, we have to look at the data flows through the S-DWH from the metadata perspective.
The examples of data flows through the warehouse are taken from Functional Architecture of the S-DWH [6] (deliverable 3.3).
Users and systems of the S-DWH have to understand how versions of variables differ from each other. These differences are described in metadata (both data and metadata are versioned).
3.1 General variable flow
Metadata in general data flow:
[6] Berglund B. (2013) Functional Architecture of the S-DWH. Deliverable 3.3
Source layer
Collecting new versions of a survey's variables is the same as collecting the initial variables. The only difference is in the identification of versions.
Examples of metadata of data collection:
- Statistical metadata: variable name and code
- Quality metadata: quality indicators
- Process metadata: process log (when, who and how data is collected); data validation rules (collecting instruments)
- Technical metadata: where the collected data is stored (server, database, table)
- Authorization metadata: who has access to data, who can collect data etc.
Data is versioned (version number, current version).
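The five metadata types listed above can be pictured as one record attached to a collected variable. The following Python sketch is an illustration only: the class `VariableMetadata`, its field names and the example values (variable `Turnover`, server `srv01` and so on) are assumptions made for the example and are not prescribed by the S-DWH deliverables.

```python
from dataclasses import dataclass, field


@dataclass
class VariableMetadata:
    """One collected variable described by the five metadata types (illustrative)."""
    statistical: dict = field(default_factory=dict)
    quality: dict = field(default_factory=dict)
    process: dict = field(default_factory=dict)
    technical: dict = field(default_factory=dict)
    authorization: dict = field(default_factory=dict)
    version: int = 1          # version number of the data
    is_current: bool = True   # whether this is the current version

    def missing_types(self) -> list:
        """Return the names of the metadata types that are still empty."""
        types = ("statistical", "quality", "process", "technical", "authorization")
        return [name for name in types if not getattr(self, name)]


turnover = VariableMetadata(
    statistical={"name": "Turnover", "code": "TURN"},   # hypothetical variable
    quality={"response_rate": 0.75},
    process={"collected_at": "2013-06-30", "collected_by": "collector_01"},
    technical={"server": "srv01", "database": "ior_db", "table": "survey_raw"},
)
print(turnover.missing_types())  # prints: ['authorization']
```

A check such as `missing_types` is one simple way a system could notice that data has been loaded without a complete metadata description.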
Integration layer
As with collection, processing new versions of a survey's variables is the same as processing the initial variables. The only difference is in the identification of versions.
Examples of metadata of data processing:
- Statistical metadata: variable name and code
- Quality metadata: quality indicators
- Process metadata: process log (when, who and how data is processed); data validation rules; data processing algorithms; data staging area (DSA) to data mart mapping algorithms
- Technical metadata: where the processed data is stored (data staging area, data mart) (server, database, table)
- Authorization metadata: who has access to data, who can process data etc.
Data is versioned (version number, current version).
Interpretation and data analysis layer
Loading a new value of a variable into the Interpretation layer also changes the metadata of the old value, which is then no longer the current version.
All versions of variables are identified in the data store, and there are links to the different kinds of metadata.
Data is versioned (version number, current version).
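The versioning rule described for the Interpretation layer (a newly loaded value becomes the current version and the old value stops being current) can be sketched as follows. This is a minimal illustration under assumptions: the names `VariableVersion` and `load_new_value`, and the use of a plain in-memory list as the data store, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariableVersion:
    variable_code: str
    value: float
    version: int             # version number
    is_current: bool = True  # current-version flag


def load_new_value(store: list, variable_code: str, value: float) -> VariableVersion:
    """Append a new version and mark earlier versions of the variable as not current."""
    previous = [v for v in store if v.variable_code == variable_code]
    for old in previous:
        old.is_current = False
    new = VariableVersion(variable_code, value, version=len(previous) + 1)
    store.append(new)
    return new


store = []
load_new_value(store, "TURN", 1200.0)
load_new_value(store, "TURN", 1250.0)
print([(v.version, v.value, v.is_current) for v in store])
# prints: [(1, 1200.0, False), (2, 1250.0, True)]
```

Old versions are kept rather than overwritten, so every version stays identifiable in the store.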
3.2 Data editing
Metadata in data editing:
If the data processing algorithms (process metadata) in the Integration layer are changed, a new generation of the variable is created.
This also means creating new process metadata (processing algorithms).
Metadata and data are both versioned (version number, current version).
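A minimal sketch of this rule: changing the processing algorithm yields a new version of the process metadata and, with it, a new generation of the variable. The class names, the `rule` field and the example edit rules are assumptions made for illustration and do not describe an actual S-DWH component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingAlgorithm:
    """Process metadata: one version of a data processing algorithm."""
    name: str
    version: int
    rule: str  # human-readable description of the edit rule (hypothetical)


@dataclass(frozen=True)
class VariableGeneration:
    variable_code: str
    generation: int
    algorithm: ProcessingAlgorithm


def change_algorithm(current: VariableGeneration, new_rule: str) -> VariableGeneration:
    """Create new process metadata and, from it, a new generation of the variable."""
    new_algorithm = ProcessingAlgorithm(
        name=current.algorithm.name,
        version=current.algorithm.version + 1,
        rule=new_rule,
    )
    return VariableGeneration(current.variable_code, current.generation + 1, new_algorithm)


first = VariableGeneration(
    "TURN", 1, ProcessingAlgorithm("edit_turnover", 1, "reject negative values")
)
second = change_algorithm(first, "reject negative values and values above 1e9")
print(second.generation, second.algorithm.version)  # prints: 2 2
```

The records are immutable (`frozen=True`), so the earlier generation and its algorithm remain available unchanged.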
3.3 Derivation of value
Metadata in derivation of value:
If a new variable (calculated variable) is created in the Integration layer, this also means creating new metadata for that variable: statistical, process, technical, quality and authorisation metadata.
Metadata and data are both versioned (version number, current version).
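To illustrate, the sketch below derives a calculated variable and creates one metadata entry for each of the five metadata types at the same time. Everything specific in it is hypothetical: the function names, the variable codes `TURN`, `EMP` and `TURN_PER_EMP`, the table name and the role name are assumptions chosen for the example.

```python
METADATA_TYPES = ("statistical", "process", "technical", "quality", "authorisation")


def turnover_per_employee(turnover: float, employees: int) -> float:
    """Example derivation rule (hypothetical)."""
    return turnover / employees


def derive_variable(code, formula, inputs, values):
    """Compute a derived value and return it with metadata of all five types."""
    value = formula(*(values[name] for name in inputs))
    metadata = {
        "statistical": {"code": code, "derived_from": list(inputs)},
        "process": {"algorithm": formula.__name__},
        "technical": {"table": "integration.derived"},   # hypothetical location
        "quality": {"inputs_used": len(inputs)},
        "authorisation": {"readers": ["analyst"]},        # hypothetical role
    }
    return value, metadata


value, metadata = derive_variable(
    "TURN_PER_EMP", turnover_per_employee, ("TURN", "EMP"), {"TURN": 1250.0, "EMP": 5}
)
print(value, sorted(metadata) == sorted(METADATA_TYPES))  # prints: 250.0 True
```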
4. Conclusion
Metadata is essential for understanding how the S-DWH works and for identifying the processes applied to data and metadata.
Data without metadata is not usable, so data must always be accompanied by metadata. The more thoroughly data is characterised by metadata (statistical, process, technical etc.), the easier it is for users and systems to identify and characterise the data. And the more completely metadata is described, the greater the chance of automating processes.