Supplementary product description

Product:
May 2016
Data Product Description – Supplement for GovHack 2016
Disclaimer
PSMA Australia believes this publication to be correct at the time of printing and does not accept
responsibility for any consequences arising from the use of information herein. Readers should rely on
their own skill and judgement to apply information to particular issues.
This work is copyright. Apart from any use as permitted under the Copyright Act 1968, no part may be
reproduced by any process without prior written permission of PSMA Australia Limited.
Notes on preparing the GNAF Flat File for GovHack 2016
Introduction:
To support GovHack 2016, PSMA has created a simplified GNAF file in a range of
common data formats.
In response to the feedback received from the general public on the complexity of the full
GNAF release on data.gov.au, we have prepared these flat files as an introduction to GNAF.
The flat file format makes GNAF easier to use. However, in order to create these files,
we have had to reduce some of the capability that was delivered through the complex
relational data structure.
This document provides an overview of the flat files created and should be read in
conjunction with the associated GNAF Product Description.
Source data:
• May 2016 published GNAF release.
Available File Formats:
• Pipe Separated Values (psv)
• MapInfo MidMif
• ESRI Shapefile*
• ESRI File Geodatabase
*Note: The data in the Shapefile format is separated by state/territory to ensure that no
dbf file exceeds 2 GB.
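As a rough illustration of working with the pipe-separated format, the sketch below parses PSV text with Python's standard csv module and groups the rows by state, mirroring the way the Shapefile output is split. It assumes the file has a header row whose column names match the field names in the SQL statement later in this document (for example state_abbreviation); that layout, the helper name group_by_state, and the sample rows are all made up for illustration and are not taken from the GNAF release.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only; not real GNAF data.
# Assumes a header row with column names matching the SQL field names.
SAMPLE_PSV = (
    "address_detail_pid|street_name|locality_name|state_abbreviation|postcode\n"
    "EXAMPLE0001|SAMPLE|EXAMPLETOWN|NSW|2000\n"
    "EXAMPLE0002|DEMO|TESTVILLE|VIC|3000\n"
    "EXAMPLE0003|PLACEHOLDER|EXAMPLETOWN|NSW|2000\n"
)


def group_by_state(psv_file):
    """Read pipe-separated rows and group them by state_abbreviation."""
    reader = csv.DictReader(psv_file, delimiter="|")
    groups = defaultdict(list)
    for row in reader:
        groups[row["state_abbreviation"]].append(row)
    return dict(groups)


groups = group_by_state(io.StringIO(SAMPLE_PSV))
for state, rows in sorted(groups.items()):
    print(state, len(rows))
```

For a real extract, an open file object (opened with newline="") can be passed in place of the io.StringIO wrapper; the file name and encoding would need to be checked against the actual download.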
Overview of process:
1. May 2016 GNAF release data is loaded into a PostgreSQL database. All
state/territory data is loaded into a consistent national data schema.
2. An FME workbench is executed, which:
   a. Connects to the source database
   b. Generates the single flat file via an SQL statement
   c. Creates point geometries (where required)
   d. Produces the range of formats
Figure 1 - FME workbench to produce GNAF for GovHack 2016
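Step 2c can be pictured with a small sketch. The code below is not the FME workbench itself; it is an assumed, minimal Python equivalent that turns the longitude and latitude fields of one flat-file row into a GeoJSON-style point, and returns None when no coordinates are present (the geocode table is attached with a LEFT JOIN, so they may be missing). The function name point_geometry and the sample values are hypothetical.

```python
def point_geometry(row):
    """Build a GeoJSON-style Point from a flat-file row, or None if ungeocoded."""
    lon, lat = row.get("longitude"), row.get("latitude")
    if lon in (None, "") or lat in (None, ""):
        return None
    # GeoJSON orders coordinates as [longitude, latitude].
    return {"type": "Point", "coordinates": [float(lon), float(lat)]}


# Made-up coordinates for illustration only.
geocoded = point_geometry({"longitude": "149.1", "latitude": "-35.3"})
missing = point_geometry({"longitude": "", "latitude": ""})
print(geocoded)
print(missing)
```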
SQL Statement:
The SQL statement joins the relevant GNAF tables, selects the required fields, and filters the
data to include only principal addresses that have a confidence value greater than -1.
The specific SQL statement used to generate the file is as follows:
SELECT ad.address_detail_pid, ad.street_locality_pid, ad.locality_pid, ad.building_name, ft.description "ft_type_description",
ad.flat_number_prefix, ad.flat_number, ad.flat_number_suffix, ad.level_type_code, ad.level_number_prefix, ad.level_number,
ad.level_number_suffix, ad.number_first_prefix, ad.number_first, ad.number_first_suffix, ad.number_last_prefix,
ad.number_last, ad.number_last_suffix, ad.lot_number, sl.street_name, sl.street_type_code, sl.street_suffix_code,
l.locality_name, st.state_abbreviation, ad.postcode, ad.confidence, ad.date_created, ad.alias_principal, mb.mb_2011_code,
mbaut.name "mb_match_level", adg.longitude, adg.latitude, gt.name "geocode_type"
FROM address_detail ad
LEFT JOIN address_site_geocode asg ON ad.address_site_pid::text = asg.address_site_pid::text
JOIN street_locality sl ON ad.street_locality_pid::text = sl.street_locality_pid::text
LEFT JOIN street_locality_point slp ON sl.street_locality_pid::text = slp.street_locality_pid::text
JOIN locality l ON ad.locality_pid::text = l.locality_pid::text
LEFT JOIN locality_point lp ON l.locality_pid::text = lp.locality_pid::text
LEFT JOIN address_mesh_block_2011 amb ON amb.address_detail_pid::text = ad.address_detail_pid::text
LEFT JOIN mb_2011 mb ON mb.mb_2011_pid::text = amb.mb_2011_pid::text
LEFT JOIN mb_match_code_aut mbaut ON mbaut.code::text = amb.mb_match_code::text
LEFT JOIN flat_type_aut ft ON ad.flat_type_code::text = ft.code::text
LEFT JOIN address_default_geocode adg ON adg.address_detail_pid::text = ad.address_detail_pid::text
LEFT JOIN geocode_type_aut gt ON gt.code = adg.geocode_type_code
JOIN state st ON l.state_pid::text = st.state_pid::text
WHERE ad.confidence > -1 AND ad.alias_principal = 'P'
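To see the effect of the filter in isolation, the sketch below applies the same two conditions to a tiny in-memory table. It uses SQLite through Python's sqlite3 module rather than PostgreSQL, keeps only three columns of address_detail, and the rows are invented for illustration. The 'A' value used for a non-principal (alias) address is an assumption; the statement itself only shows 'P'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address_detail ("
    "address_detail_pid TEXT, confidence INTEGER, alias_principal TEXT)"
)
# Invented rows for illustration only; not real GNAF data.
conn.executemany(
    "INSERT INTO address_detail VALUES (?, ?, ?)",
    [
        ("EXAMPLE0001", 2, "P"),   # kept: principal, confidence > -1
        ("EXAMPLE0002", -1, "P"),  # dropped: confidence is not > -1
        ("EXAMPLE0003", 1, "A"),   # dropped: not a principal address (assumed code)
        ("EXAMPLE0004", 0, "P"),   # kept: confidence 0 passes the filter
    ],
)

# Same two conditions as the final clause of the statement above.
kept = [
    pid
    for (pid,) in conn.execute(
        "SELECT ad.address_detail_pid FROM address_detail ad "
        "WHERE ad.confidence > -1 AND ad.alias_principal = 'P' "
        "ORDER BY ad.address_detail_pid"
    )
]
print(kept)
conn.close()
```

Both conditions must hold for a row to be kept, so a principal address with a confidence of -1 is excluded just as an alias address is.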