SDR-1

Construction of a
database
Per Weidenman
PAR AB
Database
•A collection of data
•It belongs together
•It models the ”world”
Database management system (DBMS)
•The database (a collection of interrelated data)
•Software to manage and access the data
DBMS requirements
[Diagram: Input (transactions) → DBMS → organised data (”Database”, Data Warehouse, etc.). The organised data is used by users (searching, reporting) and for statistical analysis.]
Database management systems (DBMS)
•Microsoft Access
•Microsoft SQL Server
•DB2
•Oracle
•MySQL
•FirebirdSQL
•etc.
SQL – Structured Query Language
A computer language to define and
search data
Relational databases
Tables containing data, organised in
rows and columns
Keys, used for linking data in different
tables
Example
Simple database for collecting and
organising statistical papers
Created in Microsoft Access
Paper name and details
Link to document (pdf file)
Authors
A database with four tables
Keys
One of the tables, containing paper name and details
Key
Rows containing paper name and other details
One paper on each row
The keys are used to link data in the four tables
Table ”artiklar”
Key artikel_id: 1, 2, 3, 4, 5, 6, 7, 8, 9

Table ”författare2”
Key artikel_id   Key person_id
1                1
1                2
1                3
…                …
5                4
6                4

Table ”personer2”
Key person_id: 1, 2, 3, 4
Persons: Aaaa, Bbbb, Cccc, Dddd

One paper having 3 authors
One person being the author of 2 papers

A query: the result of asking the database about papers and authors
One paper and the corresponding 3 authors
One author and the corresponding 2 papers
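The linking of the three tables through their keys can be sketched with Python's built-in sqlite3 module. This is only an illustration, not the Access database from the slides: the table and key names follow the slides, while the columns titel and namn and the paper titles are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table and key names follow the slides. The columns "titel" and "namn"
# and the paper titles are hypothetical, added only for illustration.
conn.executescript('''
    CREATE TABLE artiklar  (artikel_id INTEGER PRIMARY KEY, titel TEXT);
    CREATE TABLE personer2 (person_id  INTEGER PRIMARY KEY, namn  TEXT);
    CREATE TABLE "författare2" (
        artikel_id INTEGER REFERENCES artiklar(artikel_id),
        person_id  INTEGER REFERENCES personer2(person_id)
    );
    INSERT INTO artiklar  VALUES (1, 'Paper 1'), (5, 'Paper 5'), (6, 'Paper 6');
    INSERT INTO personer2 VALUES (1, 'Aaaa'), (2, 'Bbbb'), (3, 'Cccc'), (4, 'Dddd');
    INSERT INTO "författare2" VALUES (1, 1), (1, 2), (1, 3), (5, 4), (6, 4);
''')

# The query follows the keys: paper -> link table -> person.
rows = conn.execute('''
    SELECT a.titel, p.namn
    FROM artiklar AS a
    JOIN "författare2" AS f ON f.artikel_id = a.artikel_id
    JOIN personer2     AS p ON p.person_id  = f.person_id
    ORDER BY a.artikel_id, p.person_id
''').fetchall()

for titel, namn in rows:
    print(titel, "-", namn)
```

Paper 1 appears on three result rows (one per author), and Dddd appears on two rows (one per paper), which is the pattern the query slide describes.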
DBMS requirements
[Diagram: Input (transactions) → DBMS → organised data (”Database”, Data Warehouse, etc.), with the parties labelled: ”Business” users (searching, reporting), IT Department, and statistical analysis.]
DBMS requirements from a statistical / analytical viewpoint
•Data quality
•Data types
•Performance
•Maximum information
•Historical data
•Regulation and secrecy
Data quality
Sales System X
Enter customer name:
Instead of entering text/data by typing …
… use, if possible, selection from a list of valid values
Choose customer name:
Volvo Personvagnar AB
Volvo Lastvagnar AB
Volvo Construction AB
Volvo Bussar AB
Volvo Logistics AB
…
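One way to back a list of valid values in the database itself is a lookup table plus a foreign key. The sketch below uses Python's sqlite3 module; the table and column names (customers, orders, customer_name) are assumptions for illustration and do not come from Sales System X.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: "customers" holds the list of valid values,
# and every order must point at one of them.
conn.executescript("""
    CREATE TABLE customers (name TEXT PRIMARY KEY);
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL REFERENCES customers(name)
    );
    INSERT INTO customers VALUES ('Volvo Personvagnar AB'), ('Volvo Lastvagnar AB');
""")

def order_accepted(customer_name):
    """Return True if the DBMS accepts the order, False if the name is not in the list."""
    try:
        with conn:
            conn.execute("INSERT INTO orders (customer_name) VALUES (?)", (customer_name,))
        return True
    except sqlite3.IntegrityError:
        return False

ok = order_accepted("Volvo Lastvagnar AB")   # a value chosen from the list
typo = order_accepted("Volvo Lastvagnar")    # free-text entry with a missing "AB"
print(ok, typo)
```

With this design a mistyped name is rejected at input time instead of ending up as a separate "customer" in later statistics.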
Data quality
Sales System X
Enter customer age:
Define rules for valid input (values, intervals, etc.)
We don't want:
•Negative values
•40+
•1982
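A rule for valid input can be declared in the database so that bad values are rejected when they are entered. The following is a minimal sketch with sqlite3; the table name and the interval 0 to 120 are assumptions chosen for the example, not rules stated in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed rule: age is either missing (NULL) or a whole number from 0 to 120.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        age INTEGER CHECK (
            age IS NULL
            OR (typeof(age) = 'integer' AND age BETWEEN 0 AND 120)
        )
    )
""")

def try_insert(age):
    """Return True if the DBMS accepts the value, False if the rule rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO customers (age) VALUES (?)", (age,))
        return True
    except sqlite3.IntegrityError:
        return False

# One valid age followed by the three unwanted inputs from the slide.
results = {value: try_insert(value) for value in (34, -5, "40+", 1982)}
print(results)
```

Only 34 is stored; the negative value, the text "40+" and the year of birth 1982 are all refused by the CHECK constraint.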
Data quality
Handling of missing values …
Missing values should be stored as ”null” in the database.
Not as 0 (digit zero)
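Why the difference matters for analysis can be shown with a small sketch in sqlite3. The table and the four ages are invented for illustration: the same data is stored once with the missing age as null and once with the missing age as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age_null INTEGER, age_zero INTEGER)")

# Three known ages and one missing age (invented values).
# Column age_null stores the missing value as NULL, age_zero stores it as 0.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(30, 30), (40, 40), (50, 50), (None, 0)],
)

avg_null, avg_zero, n_known = conn.execute(
    "SELECT AVG(age_null), AVG(age_zero), COUNT(age_null) FROM customers"
).fetchone()

print(avg_null)  # 40.0: NULL is left out of the average
print(avg_zero)  # 30.0: the zero is treated as a real age and biases the mean
print(n_known)   # 3: the number of known ages is still available
```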
Data types
Text
Numeric
Performance
Users (searching, reporting):
•Searching for individual records
•Creating ”prepared” reports by counting or summing
Statistical analysis:
•Large datasets
•Multivariate methods
•Iterative estimation
•Etc.
Maximum information
Sales System X
Enter customer age: 34
We need to report on age groups:
20-29
30-39
40-49
…
Thus we store age as an interval, not as a value!
The fallacy of being too user oriented!
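The alternative to storing the interval is to store the exact age and derive the groups at reporting time. A minimal sketch in Python, where the function name age_group and the sample ages are assumptions for illustration:

```python
from collections import Counter

def age_group(age, width=10):
    """Derive the reporting interval from an exact age, e.g. 34 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

# Exact ages as they would be stored (invented values).
ages = [34, 27, 41, 39]

# Today's report uses ten-year groups ...
by_decade = Counter(age_group(a) for a in ages)
# ... and a later analysis can use five-year groups from the same stored data.
by_five = Counter(age_group(a, width=5) for a in ages)

print(dict(by_decade))
print(dict(by_five))
```

Had only the ten-year interval been stored, the five-year grouping (or a mean age) could not be computed afterwards.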
Historical data
Sales System X
Customer name:
Customer address:
Order value:
Order date:
Each new order for a specific customer …
… will be added to table Orders and stored as a ”new row”
Table: Orders
Columns: Customer ID, Order date, Order value
Historical data
Sales System X
Customer name:
Customer address:
Order date:
Order value:
But a new address …
… will probably UPDATE the existing record (row) for the specific customer
Table: Customers
Columns: Customer ID, Customer name, Customer address
Thus, the old value of ”customer address” will be deleted and replaced with the new value.
But this will do fine for users focusing on searching / reporting!
Historical data
Create a new table to contain historic records
Each time a value is UPDATED for a certain customer …
… the complete (previous) record is transferred to the table Customers_history
Table: Customers
Columns: Customer ID, Customer name, Customer address
Table: Customers_history
Columns: Customer ID, Customer name, Customer address, From, To
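One possible way to transfer the previous record automatically is a database trigger. The sketch below uses sqlite3; the table names follow the slide, while the column names valid_from and valid_to (standing in for From and To, which are SQL keywords), the trigger name and the sample data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# valid_from / valid_to play the role of the From / To columns on the slide.
# Keeping valid_from in Customers as well is a design choice of this sketch.
conn.executescript("""
    CREATE TABLE Customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT, address TEXT, valid_from TEXT
    );
    CREATE TABLE Customers_history (
        customer_id INTEGER,
        name TEXT, address TEXT, valid_from TEXT, valid_to TEXT
    );
    -- Before every update, copy the complete previous record to the history table.
    CREATE TRIGGER keep_history BEFORE UPDATE ON Customers
    BEGIN
        INSERT INTO Customers_history
        VALUES (OLD.customer_id, OLD.name, OLD.address, OLD.valid_from, NEW.valid_from);
    END;
""")

conn.execute(
    "INSERT INTO Customers VALUES (1, 'Volvo Bussar AB', 'Old Street 1', '2005-01-01')"
)
conn.execute(
    "UPDATE Customers SET address = ?, valid_from = ? WHERE customer_id = 1",
    ("New Street 2", "2007-06-01"),
)

current = conn.execute("SELECT address FROM Customers").fetchall()
history = conn.execute(
    "SELECT address, valid_from, valid_to FROM Customers_history"
).fetchall()
print(current)
print(history)
```

After the update, Customers holds only the new address, while Customers_history holds the old address together with the period during which it was valid.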
Historical data
Table: Customers
Columns: Customer ID, Customer name, Customer address
Table: Customers_history
Columns: Customer ID, Customer name, Customer address, From, To
This structure will make analysis of processes possible
But not easy!
Regulation and secrecy
DBMS requirements from a statistical / analytical viewpoint
•Current data versus current + historical data
•Operating on individual records versus operating on many records
Next on this channel…
A database containing historic transactions
[Diagram: Input (transactions) → DBMS → organised data, used by users (searching, reporting) and for statistical analysis.]
PAR / Bisnode database
Tables
FTG: Basic company data
One record per company.
Contains name, address, start date, end date, line of business, etc.
FTG_H: Historic company data
Many records per company.
Contains the accumulated historic records from table FTG
BOKSLUT: Balance sheet data
One record per annual report (thus many records per company).
Turnover, profit, key ratios, etc.
FUNKTION_PERIOD: Board member data
Many records per company and person.
And many more tables!
Historic names etc.
How?
Sampling for time series statistics
Serrano
Balance sheet data from different periods transformed to yearly data records
Serrano
Historic transactions from FTG_H transformed to yearly data records
Serrano Board Data
Board member data from any mix of start date, end date and period length transformed to yearly data records
Summing up register data to annual figures
Example.
Register containing balance sheet data:
•Number of employees
•Turnover
•Profit
•Tangible assets
•Etc.
[Diagram, built up over several slides: annual reports of companies on a time axis (Year 3, 2, 1, Now). Company A is shown first; B is added with a broken (non-calendar) fiscal year, C with changed fiscal years, and D with missing data.]
Proposal (shown for company B):
Break down the flow variables (turnover, profit, etc.) into monthly values …
… and sum the monthly values into a ’notional’ calendar-year value
… and impute for full coverage during the last year
[Final slide: company B on the same time axis, labelled Database.]
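The proposal for broken fiscal years can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual Serrano procedure: each report is assumed to cover whole months, the flow variable is assumed to be spread evenly over those months, and the function names and figures are invented. The imputation step for the last year is not shown.

```python
from collections import defaultdict
from datetime import date

def months_in(start, end):
    """Yield (year, month) for every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1

def calendar_year_totals(reports):
    """Break each report's flow value into equal monthly values and sum them per calendar year."""
    totals = defaultdict(float)
    for start, end, value in reports:
        months = list(months_in(start, end))
        per_month = value / len(months)
        for year, _month in months:
            totals[year] += per_month
    return dict(totals)

# Two annual reports with a fiscal year running July to June (invented turnover figures).
reports = [
    (date(2004, 7, 1), date(2005, 6, 30), 1200.0),
    (date(2005, 7, 1), date(2006, 6, 30), 2400.0),
]
totals = calendar_year_totals(reports)
print(totals)
```

Calendar year 2005 is fully covered by the two reports (six months from each), while 2004 and 2006 are covered for only six months, which is where imputation for full coverage would come in.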
First example
Register based transport statistics for SIKA:
Decreased response burden
Increased understanding of the transporting companies (as a complement to the ”usual” focus on type of goods)
Time series describing economic status and change
Objective:
Describing economic status and change in transporting companies during the last ten years.
Total number of employees and turnover …
[Chart: employees (Anställda) and net turnover in SEK million (Nettoomsättning, MKR) per year, 1997 to 2006.]
… or turnover growth compared to GDP
[Chart: the transport industries (Transportbranscherna) and GDP (BNP) per year, 1997 to 2006; axis from 80 to 200.]
… or profit development for different types of freight companies
[Chart: yearly series, 1997 to 2007, for road transport, sea transport, air transport, public transport, taxi services, rail transport and total; axis from -25% to 10%.]
… or the number of employees in a cohort of new companies.
[Chart: remaining employees (Kvarvarande anställda) and added employees (Tillkommande anställda) per year, 1997 to 2007; axis from 0% to 160%.]
Tables based on balance sheet data from each company

        Active companies          Active limited companies        GDP
Year    Total    Of which         Number of    Net turnover       Current prices
                 limited cos.     employees    (SEK million)      (SEK million)
1997    12912    10599            98259        120284             1927001
1998    12788    10626            100663       127745             2012091
1999    12547    10543            102531       133078             2123971
2000    12562    10704            106811       145496             2249987
2001    12383    10659            112685       163418             2326176
2002    12432    10741            114426       168214             2420761
2003    12616    10935            115135       178294             2515150
2004    12689    11067            118015       188913             2624964
2005    12709    11100            119387       209819             2735218
2006    12514    11012            121683       224225             2899653
What data is needed?
Company data including micro level history.
Exactly which companies were active in transport during each year?
Balance sheet data from all transporting companies for each year
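The question of which companies were active in transport in a given year can be answered from a history table with validity periods. The sketch below uses sqlite3; the table company_history, its columns and all rows are hypothetical and only mimic the kind of history kept in a table such as FTG_H.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical history table: one row per company and period with an unchanged industry.
conn.execute("""
    CREATE TABLE company_history (
        company TEXT, industry TEXT, valid_from INTEGER, valid_to INTEGER
    )
""")
conn.executemany(
    "INSERT INTO company_history VALUES (?, ?, ?, ?)",
    [
        ("A", "transport", 1997, 2006),
        ("B", "retail",    1997, 2000),  # B moved into transport in 2001
        ("B", "transport", 2001, 2006),
        ("C", "transport", 1997, 1999),  # C ceased after 1999
    ],
)

def active_in_transport(year):
    """Return the companies whose transport period covers the given year."""
    rows = conn.execute(
        "SELECT DISTINCT company FROM company_history "
        "WHERE industry = 'transport' AND ? BETWEEN valid_from AND valid_to "
        "ORDER BY company",
        (year,),
    )
    return [row[0] for row in rows]

print(active_in_transport(1998))
print(active_in_transport(2003))
```

A table holding only the current record per company could not answer this: it would show B as a transport company for all years and would have lost C entirely.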
Faster access to ”last year's” data compared to taxation based registers
Sampling companies for time series statistics
[Diagram, built up over several slides: companies A, B, C and D on a time axis (Year 3, 2, 1, Now). The final slide lists the companies per year: Year 3: A, C, D; Year 2: A, B, C, D; Year 1: A, B, C.]