Conversions from national grid data to harmonized European grid data

EFGS Lisbon 12-14 October 2011
Production and challenges
Rina Tammisto, Senior Statistician, Statistics Finland
Marja Tammilehto-Luode, Chief Adviser, Statistics Finland

Harmonization

Data harmonization
- Source data
  - Georeferenced national data
  - Disaggregated European data
- Methods used
  - Aggregated
  - Disaggregated
  - Hybrid method

Spatial harmonization
- A grid net covers the whole of Europe: the ETRS89-LAEA Grid Net
- Downloadable ZIP: http://www.efgs.info/data/GEOSTAT-1km-Grid.zip/view
  - Grid_ETRS89_LAEA_1K.shp
  - Approx. 500 MB

[Figure: ETRS89-LAEA Grid Net and ETRS89-TM35FIN Grid Net. LAEA grid net in relation to national grid net in Finland]

[Figure: Comparison of both systems in LCC (LAEA shown in LCC). LAEA grid net in relation to national grid net in Austria (statistik.at)]

Differences in locations of grid cells in different projections (or co-ordinate systems)
- A grid cell produced using the national ETRS89-TM35FIN co-ordinate system and projection is divided among several ETRS89-LAEA grid cells
- Direct derivation between different co-ordinate systems or projections is not usable: the grids are located differently in relation to each other

An issue to be solved: how to use national grid datasets when direct conversion is not feasible?
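
The splitting of one national cell among several target cells can be illustrated with a small sketch. It is not the real reprojection: `stand_in_convert` is a hypothetical rotation plus shift that only mimics two grids that are not aligned, and the angle and offsets are made-up values. In practice the conversion would be done with a projection library between ETRS89-TM35FIN (EPSG:3067) and ETRS89-LAEA (EPSG:3035).

```python
import math

CELL = 1000.0  # grid cell size in metres (1 km x 1 km)

def stand_in_convert(x, y, angle_deg=10.0, dx=250.0, dy=400.0):
    """Hypothetical stand-in for a reprojection: rotate about the origin, then shift.
    This is NOT the TM35FIN -> LAEA formula; it only makes the two grids misaligned."""
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a) + dx,
            x * math.sin(a) + y * math.cos(a) + dy)

def cell_index(x, y, size=CELL):
    """Column and row of the grid cell that contains the point (x, y)."""
    return (math.floor(x / size), math.floor(y / size))

def target_cells_hit(col, row, convert=stand_in_convert, samples=20):
    """Sample points inside one source cell and collect the target cells they land in."""
    hits = set()
    for i in range(samples):
        for j in range(samples):
            x = (col + (i + 0.5) / samples) * CELL
            y = (row + (j + 0.5) / samples) * CELL
            hits.add(cell_index(*convert(x, y)))
    return hits

cells = target_cells_hit(3, 5)
print(len(cells), sorted(cells))  # more than one target cell for a single source cell
```

With an identity conversion the same function returns exactly one cell, which is the aligned case where direct derivation would work.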

Tested method 1. Aggregation of grid data by using converted building points
1) Georeferenced source data is converted
   - Buildings are converted from ETRS89-TM35FIN to ETRS89-LAEA
2) Converted building points are joined with the ETRS89-LAEA grid net
3) Aggregation of statistical data

[Figure: building points in ETRS89-TM35FIN; building points in ETRS89-LAEA; aggregation of statistical data]
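
A minimal sketch of the three steps of method 1, under stated assumptions: building points are given as (x, y, inhabitants) tuples, the grid net is represented by integer (column, row) indices of 1 km cells, and `shift_only` is a hypothetical placeholder for the real ETRS89-TM35FIN to ETRS89-LAEA conversion. The sample coordinates are invented for illustration.

```python
import math
from collections import defaultdict

def cell_of(x, y, size=1000.0):
    """Index of the grid cell containing (x, y); stands in for the spatial join."""
    return (math.floor(x / size), math.floor(y / size))

def aggregate_points(points, convert, size=1000.0):
    """points: iterable of (x, y, inhabitants) in the national co-ordinate system."""
    totals = defaultdict(int)
    for x, y, inhabitants in points:
        cx, cy = convert(x, y)                        # step 1: convert the point
        totals[cell_of(cx, cy, size)] += inhabitants  # steps 2 and 3: join and aggregate
    return dict(totals)

def shift_only(x, y):
    """Hypothetical placeholder conversion (a plain shift), not a real projection."""
    return x + 300.0, y - 200.0

# Invented sample data: four building points with their inhabitants.
buildings = [(100.0, 900.0, 4), (650.0, 950.0, 2), (800.0, 150.0, 7), (1200.0, 1300.0, 3)]
laea_population = aggregate_points(buildings, shift_only)
print(laea_population)
```

Because each point is converted individually, its location accuracy is carried over unchanged, which is the advantage claimed for this method.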

Method 1

Advantages
- Points are easily convertible: the original quality of location is maintained
- From a geostatistical point of view, data quality is entirely the same as in the national data

Disadvantages
- Double sets of primary data
- Double production processes from the beginning
- Risk of data disclosure due to the use of several co-ordinate systems
  - gaps between datasets

Tested method 2. Conversion of grid data by using ready-made national grid datasets
1) Ready-made national grid dataset in ETRS89-TM35FIN is converted into ETRS89-LAEA
   - Polygon to point: using the centre points of national grid cells
   - Conversion of the centre points of the grid cells
2) Converted points are joined with the ETRS89-LAEA grid net
3) Aggregation of statistical data

[Figure: production of the national grid data; centre points of national grid cells; conversion of the points and spatial join with the ETRS89-LAEA grid net; aggregation of statistical data]
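
The steps of method 2 can be sketched in the same spirit. Assumptions: the national grid dataset is a dictionary from (column, row) cell indices to inhabitants, `centre_point` and `regrid_by_centres` are illustrative names, and `shift_only` is again a hypothetical placeholder for the real conversion. The sample cells are invented.

```python
import math
from collections import defaultdict

def centre_point(col, row, size):
    """Centre of a national grid cell (the polygon-to-point step)."""
    return ((col + 0.5) * size, (row + 0.5) * size)

def regrid_by_centres(national_cells, national_size, convert, target_size=1000.0):
    """Move each national cell's whole count to the target cell containing its centre."""
    totals = defaultdict(int)
    for (col, row), inhabitants in national_cells.items():
        x, y = convert(*centre_point(col, row, national_size))
        key = (math.floor(x / target_size), math.floor(y / target_size))
        totals[key] += inhabitants
    return dict(totals)

def shift_only(x, y):
    """Hypothetical placeholder conversion (a plain shift), not a real projection."""
    return x + 300.0, y - 200.0

# Invented 250 m national grid cells with their inhabitants.
national_250m = {(0, 3): 4, (2, 3): 2, (3, 0): 7, (4, 5): 3}
laea_1km = regrid_by_centres(national_250m, 250.0, shift_only)
print(laea_1km)
```

Each national cell is assigned as a whole to one target cell, so the total is preserved by construction, while the distribution between neighbouring cells can differ from the point-based result; the larger the national cell, the larger that possible displacement.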

Effects of the grid cell size on the quality of the resulting data

Tested grid cell sizes, national grid data:
- 125 m x 125 m: highest resolution data
- 250 m x 250 m: data produced for the Finnish Grid Database
- 1 km x 1 km

Reference data: data produced by using method 1 (conversion made on building points)
Additional test: JRC/GISCO disaggregated data, POP/KM²

Comparison of the test datasets

Statistics: number of grid cells (N), mean (inhabitants per populated grid cell), total number of inhabitants in the dataset (Sum), minimum, maximum

Variable       Source                      N         Mean   Sum         Minimum   Maximum
POP_1KM_LAEA   converted building points   102 050   51.0   5 204 192   1         14 053
POP_1KM_125M   converted grid points       102 249   50.9   5 204 192   1         14 197
POP_1KM_250M   converted grid points       102 759   50.6   5 204 166   1         13 283
POP_1KM_1KM    converted grid points       99 049    52.5   5 204 179   1         19 175
POP_DISAGG     JRC dataset                 159 921   32.4   5 181 806   0.01      5 866
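
The summary statistics above are simple to reproduce for any gridded dataset. The sketch below assumes a dataset is a dictionary from cell index to inhabitants and that only populated cells (value above zero) are counted; `summarise` is an illustrative name and the sample cells are invented. The last line checks one reported figure: Sum divided by N for POP_1KM_LAEA.

```python
def summarise(cells):
    """N, mean, sum, minimum and maximum over the populated cells of one dataset."""
    values = [v for v in cells.values() if v > 0]
    if not values:
        raise ValueError("dataset has no populated cells")
    total = sum(values)
    return {
        "N": len(values),
        "Mean": total / len(values),
        "Sum": total,
        "Minimum": min(values),
        "Maximum": max(values),
    }

# Invented sample dataset; the empty cell (2, 2) is ignored.
stats = summarise({(0, 0): 6, (1, -1): 7, (1, 1): 3, (2, 2): 0})
print(stats)

# Consistency check on the reported POP_1KM_LAEA row: mean = Sum / N.
reported_mean = round(5_204_192 / 102_050, 1)
print(reported_mean)  # 51.0
```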

Pearson Correlation Coefficients (all off-diagonal coefficients: p < .0001)

               POP_1KM_LAEA   POP_1KM_125M   POP_1KM_250M   POP_1KM_1KM   POP_DISAGG
POP_1KM_LAEA   1.00000        0.99900        0.99495        0.90989       0.79804
POP_1KM_125M   0.99900        1.00000        0.99471        0.90990       0.79857
POP_1KM_250M   0.99495        0.99471        1.00000        0.90611       0.79840
POP_1KM_1KM    0.90989        0.90990        0.90611        1.00000       0.74920
POP_DISAGG     0.79804        0.79857        0.79840        0.74920       1.00000

Number of Observations

               POP_1KM_LAEA   POP_1KM_125M   POP_1KM_250M   POP_1KM_1KM   POP_DISAGG
POP_1KM_LAEA   102 050        99 372         97 216         81 647        85 737
POP_1KM_125M   99 372         102 249        97 488         81 808        85 871
POP_1KM_250M   97 216         97 488         102 759        82 185        86 268
POP_1KM_1KM    81 647         81 808         82 185         99 049        82 069
POP_DISAGG     85 737         85 871         86 268         82 069        159 921

POP_1KM_LAEA: dataset from converted building points. POP_1KM_125M, POP_1KM_250M, POP_1KM_1KM: datasets from converted grid points. POP_DISAGG: JRC dataset.
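
A sketch of how such a pairwise coefficient could be computed. The pairwise numbers of observations are smaller than either dataset's own N, which suggests that each coefficient was calculated only over cells populated in both datasets; that reading is an assumption here, as are the function name and the invented sample data.

```python
import math

def pearson_on_shared_cells(a, b):
    """Pearson r over the cells present in both datasets; returns (r, n).
    Assumes at least two shared cells and non-constant values in each dataset."""
    shared = sorted(set(a) & set(b))
    n = len(shared)
    xs = [a[k] for k in shared]
    ys = [b[k] for k in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy), n

# Invented datasets: three cells are shared, one cell is unique to each.
reference = {(0, 0): 6, (1, 0): 7, (1, 1): 3, (2, 2): 5}
derived = {(0, 0): 5, (1, 0): 8, (1, 1): 3, (3, 3): 2}
r, n = pearson_on_shared_cells(reference, derived)
print(round(r, 4), n)
```

Cells that exist in only one of the two datasets do not enter the coefficient at all under this reading, so a high r does not by itself show that the same cells are populated.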

[Figure: values of the converted datasets in relation to the values of the national datasets, with the identity line (the 45 degree line)]

Evaluation of differences by using absolute values of inhabitants/km² grid cell (absolute values of differences)

DIFFERENCES (abs. values) between method 1 data (from LAEA buildings) and derived datasets

Difference class  125M     %      250M     %      1KM      %
GRIDS             99 372          97 216          81 647
Std Dev           12.7            28.9            135.5
DIF 0             65 305   65.7   50 742   52.2   20 194   24.7
DIF 1-5           25 428   25.6   32 008   32.9   31 351   38.4
DIF 6-10          4 429    4.5    7 105    7.3    11 606   14.2
DIF 11-20         1 924    1.9    3 156    3.2    7 839    9.6
DIF 21-50         1 447    1.5    2 170    2.2    4 903    6.0
DIF 51-100        503      0.5    1 033    1.1    1 888    2.3
DIF 101-500       335      0.3    940      1.0    3 000    3.7
DIF 501-1000      1        0.0    56       0.1    574      0.7
DIF over 1000     -        -      6        0.0    292      0.4
DIF 0 to 5, sum            91.3            85.1            63.1
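
DIFFERENCES (abs. values) between method 1 data (from LAEA buildings) and JRC/GISCO disaggregated data

Difference class  DISAGG   %
GRIDS             85 737
Std Dev           184.8
DIF 0             11 395   13.3
DIF 1-5           36 260   42.3
DIF 6-10          14 294   16.7
DIF 11-20         9 244    10.8
DIF 21-50         6 477    7.6
DIF 51-100        2 916    3.4
DIF 101-500       4 113    4.8
DIF 501-1000      632      0.7
DIF over 1000     406      0.5
DIF 0 to 5, sum            55.6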
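
The classification of absolute differences into the classes used above can be sketched as follows. Assumptions: differences are taken over cells present in both datasets, inhabitant counts are integers (how fractional JRC values were assigned to classes is not stated in the source), and the function names and sample data are illustrative.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the classes DIF 0, 1-5, 6-10, 11-20, 21-50, 51-100, 101-500, 501-1000.
UPPER = [0, 5, 10, 20, 50, 100, 500, 1000]
LABELS = ["DIF 0", "DIF 1-5", "DIF 6-10", "DIF 11-20", "DIF 21-50",
          "DIF 51-100", "DIF 101-500", "DIF 501-1000", "DIF over 1000"]

def difference_class(diff):
    """Class label for one difference, based on its absolute value."""
    return LABELS[bisect_left(UPPER, abs(diff))]

def classify_differences(reference, derived):
    """Count and percentage of shared cells per difference class."""
    shared = set(reference) & set(derived)
    counts = Counter(difference_class(reference[k] - derived[k]) for k in shared)
    n = len(shared)
    return [(label, counts[label], round(100 * counts[label] / n, 1)) for label in LABELS]

# Invented sample data with differences of 0, 2, 27 and 1500 inhabitants.
reference = {(0, 0): 6, (1, 0): 7, (1, 1): 3, (2, 2): 500}
derived = {(0, 0): 6, (1, 0): 9, (1, 1): 30, (2, 2): 2000}
for label, count, percent in classify_differences(reference, derived):
    print(label, count, percent)
```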

Method 2

Advantages
- Use of the ready-made grid datasets!
  - Fewer phases
  - Smaller data volume
  - Level of quality is a matter of choice
- Adequate level of quality (?)
  - Dependent on use
  - Min. target: SUM of the whole dataset is correct
- No increase of confidentiality problems with double datasets

Disadvantages
- From a geostatistical point of view, data quality is weaker than in the original national data
- Quality errors: distortion compared with the correct values (measured by number of inhabitants)

Next steps

For the GEOSTAT 1A project, from October to November 2011:
- More tests, any volunteers?
- Quality definitions concerning the adequate level of quality and the grid scale used
- Step-by-step guidelines

LAEA dataset: filling the empty grid net with data!

Thank You!
rina.tammisto@stat.fi
marja.tammilehto-luode@stat.fi