Conversions from national grid data to harmonized European grid data
Production and challenges
EFGS Lisbon, 12-14 October 2011

Rina Tammisto, Senior Statistician, Statistics Finland
Marja Tammilehto-Luode, Chief Adviser, Statistics Finland

Harmonization

Data harmonization
- Source data
  - Georeferenced national data
  - Disaggregated European data
- Methods used
  - Aggregated
  - Disaggregated
  - Hybrid method
Spatial harmonization
- A grid net covers the whole of Europe

ETRS89-LAEA Grid Net

- Downloadable ZIP: http://www.efgs.info/data/GEOSTAT-1km-Grid.zip/view
- Grid_ETRS89_LAEA_1K.shp
- Approx. 500 MB

[Figure: LAEA grid net in relation to national grid net in Finland. Layers: ETRS89-LAEA Grid Net, ETRS89-TM35FIN Grid Net]

[Figure: LAEA grid net in relation to national grid net in Austria. Comparison of both systems in LCC. Layers: LCC in LCC, LAEA in LCC. Source: statistik.at]

Differences in locations of grid cells in different projections (or co-ordinate systems)

- A grid cell produced by using the national ETRS89-TM35FIN co-ordinate system and projection is divided among several ETRS89-LAEA grid cells
- Direct derivation between different co-ordinate systems or projections is not usable: the grids are located differently in relation to each other

An issue to be solved: how can national grid datasets be used when direct conversion is not applicable?

Tested method 1.
Aggregation of grid data by using converted building points

1) Georeferenced source data is converted
   - Buildings are converted from ETRS89-TM35FIN to ETRS89-LAEA
2) Converted building points are joined with the ETRS89-LAEA grid net
3) Aggregation of statistical data

[Figure panels: Building points in ETRS89-TM35FIN; Building points in ETRS89-LAEA; Aggregation of statistical data]

Method 1

Advantages
- Points are easily convertible: original quality of location is maintained
- From a geostatistical point of view, data quality is thoroughly the same as in the national data
Disadvantages
- Double sets of primary data
- Double production processes from the beginning
- Risk of data disclosure due to the use of several co-ordinate systems: gaps between datasets

Tested method 2. Conversion of grid data by using ready-made national grid datasets

1) Ready-made national grid dataset in ETRS89-TM35FIN is converted into ETRS89-LAEA
   - Polygon to point: using the middle points of national grid cells
   - Conversion of the middle points of grids
2) Converted points are joined with the ETRS89-LAEA grid net
3) Aggregation of statistical data

[Flow diagram: PRODUCTION OF THE NATIONAL GRID DATA -> MIDDLE POINTS OF NATIONAL GRIDS -> CONVERSION OF THE POINTS, SPATIAL JOIN WITH ETRS89-LAEA GRID NET -> AGGREGATION OF STATISTICAL DATA]

Effects of the grid cell size on the quality of the derived data

Tested grid cell sizes:
National grid data:
- 125 m x 125 m: highest resolution data
- 250 m x 250 m: data produced for the Finnish Grid Database
- 1 km x 1 km
Reference data: data produced by using method 1 (conversion made on building points)
Additional test: JRC/GISCO disaggregated data, POP/KM²

Comparison of the test datasets

Statistics: number of grids (N), mean (inhabitants per populated grid cell), total number of inhabitants in the dataset (Sum), min, max

Variable        N         Mean   Sum         Minimum   Maximum
Dataset from converted building points
POP_1KM_LAEA    102 050   51.0   5 204 192   1         14 053
Datasets from converted grid points
POP_1KM_125M    102 249   50.9   5 204 192   1         14 197
POP_1KM_250M    102 759   50.6   5 204 166   1         13 283
POP_1KM_1KM      99 049   52.5   5 204 179   1         19 175
JRC dataset
POP_DISAGG      159 921   32.4   5 181 806   0.01       5 866

Pearson Correlation Coefficients (p < .0001 for every off-diagonal pair)

                POP_1KM_LAEA   POP_1KM_125M   POP_1KM_250M   POP_1KM_1KM   POP_DISAGG
POP_1KM_LAEA    1.00000        0.99900        0.99495        0.90989       0.79804
POP_1KM_125M    0.99900        1.00000        0.99471        0.90990       0.79857
POP_1KM_250M    0.99495        0.99471        1.00000        0.90611       0.79840
POP_1KM_1KM     0.90989        0.90990        0.90611        1.00000       0.74920
POP_DISAGG      0.79804        0.79857        0.79840        0.74920       1.00000

Number of Observations

                POP_1KM_LAEA   POP_1KM_125M   POP_1KM_250M   POP_1KM_1KM   POP_DISAGG
POP_1KM_LAEA    102 050         99 372         97 216        81 647         85 737
POP_1KM_125M     99 372        102 249         97 488        81 808         85 871
POP_1KM_250M     97 216         97 488        102 759        82 185         86 268
POP_1KM_1KM      81 647         81 808         82 185        99 049         82 069
POP_DISAGG       85 737         85 871         86 268        82 069        159 921

POP_1KM_LAEA is the dataset from converted building points, POP_1KM_125M, POP_1KM_250M and POP_1KM_1KM are the datasets from converted grid points, and POP_DISAGG is the JRC dataset.

[Figure: Values of converted dataset in relation to values of national datasets, with the identity line (the 45 degree line)]

Evaluation of differences by using absolute values of inhabitants per km² grid cell (absolute values of differences)

DIFFERENCES (abs. values) between method 1 data (from LAEA buildings) and derived datasets

                125M              250M              1KM
                cells     %       cells     %       cells     %
DIF 0           65 305    65.7    50 742    52.2    20 194    24.7
DIF 1-5         25 428    25.6    32 008    32.9    31 351    38.4
DIF 6-10         4 429     4.5     7 105     7.3    11 606    14.2
DIF 11-20        1 924     1.9     3 156     3.2     7 839     9.6
DIF 21-50        1 447     1.5     2 170     2.2     4 903     6.0
DIF 51-100         503     0.5     1 033     1.1     1 888     2.3
DIF 101-500        335     0.3       940     1.0     3 000     3.7
DIF 501-1000         1     0.0        56     0.1       574     0.7
DIF over 1000                          6     0.0       292     0.4
GRIDS           99 372            97 216            81 647
Std Dev           12.7              28.9             135.5
DIF 0-5, %        91.3              85.1              63.1

DIFFERENCES (abs. values) between method 1 data (from LAEA buildings) and JRC/GISCO disaggregated data

                DISAG
                cells     %
DIF 0           11 395    13.3
DIF 1-5         36 260    42.3
DIF 6-10        14 294    16.7
DIF 11-20        9 244    10.8
DIF 21-50        6 477     7.6
DIF 51-100       2 916     3.4
DIF 101-500      4 113     4.8
DIF 501-1000       632     0.7
DIF over 1000      406     0.5
GRIDS           85 737
Std Dev          184.8
DIF 0-5, %        55.6

Method 2

Advantages
- Use of the ready-made grid datasets!
- Fewer phases
- Smaller data mass
- Level of quality is a matter of choice
  - Adequate level of quality (?)
  - Dependent on use
  - Min. target: SUM of the whole dataset is correct
- No increase of confidentiality problems with double datasets
Disadvantages
- From a geostatistical point of view, data quality is weaker than in the original national data
- Quality errors: quality distortion compared to the correct one (measured by number of inhabitants)

Next steps

For the GEOSTAT 1A project, October-November 2011
- More tests, any volunteers?
- Quality definitions concerning adequate level of quality and grid scale used
- Step-by-step guidelines
- LAEA dataset: filling the empty grid net with data!

Thank You!

rina.tammisto@stat.fi
marja.tammilehto-luode@stat.fi
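
Illustrative sketch: steps 2) and 3) of both tested methods (join with the ETRS89-LAEA grid net, aggregation of statistical data) and the difference classes used in the comparison might look roughly like the Python below. It is a minimal illustration, not the production process used at Statistics Finland. It assumes the points have already been converted to ETRS89-LAEA (for example with a projection library, from EPSG:3067 to EPSG:3035), that the 1 km grid net is aligned to multiples of 1000 m, and that only cells populated in both datasets are compared. All function names and coordinates are hypothetical.

```python
import math
from collections import defaultdict

CELL = 1000  # assumed: 1 km x 1 km LAEA cells aligned to multiples of 1000 m

def laea_cell(x, y, cell=CELL):
    """Lower-left corner (metres) of the LAEA grid cell containing (x, y)."""
    return (math.floor(x / cell) * cell, math.floor(y / cell) * cell)

def aggregate(points, cell=CELL):
    """Join points to the grid net and sum population per cell.

    points: iterable of (x, y, population), coordinates already in ETRS89-LAEA.
    """
    totals = defaultdict(int)
    for x, y, pop in points:
        totals[laea_cell(x, y, cell)] += pop
    return dict(totals)

# Upper bounds of the difference classes used in the comparison tables.
CLASSES = [(0, "DIF 0"), (5, "DIF 1-5"), (10, "DIF 6-10"), (20, "DIF 11-20"),
           (50, "DIF 21-50"), (100, "DIF 51-100"), (500, "DIF 101-500"),
           (1000, "DIF 501-1000")]

def diff_class(reference, derived):
    """Class label for the absolute difference between two cell values."""
    d = abs(reference - derived)
    for upper, label in CLASSES:
        if d <= upper:
            return label
    return "DIF over 1000"

def compare(reference, derived):
    """Count cells per difference class (cells populated in both datasets)."""
    counts = defaultdict(int)
    for key in reference.keys() & derived.keys():
        counts[diff_class(reference[key], derived[key])] += 1
    return dict(counts)

# Hypothetical data. Method 1: building points converted to LAEA.
buildings = [(5100100.0, 4200200.0, 30),
             (5100900.0, 4200950.0, 12),
             (5101050.0, 4200500.0, 8)]
# Method 2: middle points of national grid cells converted to LAEA.
# The middle point of the second cell falls into the neighbouring LAEA cell.
midpoints = [(5100125.0, 4200180.0, 30),
             (5101010.0, 4200930.0, 12),
             (5101080.0, 4200520.0, 8)]

method1 = aggregate(buildings)
method2 = aggregate(midpoints)

# Minimum target from the slides: the SUM of the whole dataset stays correct.
print(sum(method1.values()), sum(method2.values()))
print(compare(method1, method2))
```

In this toy case both methods keep the total number of inhabitants, but 12 inhabitants move to the neighbouring cell in method 2, so both cells land in the DIF 11-20 class. This is the kind of quality distortion the comparison tables measure.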