Product: May 2016 Data Product Description – Supplement for GovHack 2016

Disclaimer
PSMA Australia believes this publication to be correct at the time of printing and does not accept responsibility for any consequences arising from the use of information herein. Readers should rely on their own skill and judgement to apply information to particular issues.

This work is copyright. Apart from any use as permitted under the Copyright Act 1968, no part may be reproduced by any process without prior written permission of PSMA Australia Limited.

Notes on preparing the GNAF Flat File for GovHack 2016

Introduction:
To support GovHack 2016, PSMA has created a simplified GNAF flat file in a range of common data formats. In response to public feedback about the complexity of the full GNAF release on data.gov.au, we have prepared these flat files as an introduction to GNAF. The flat file format makes GNAF easier to use. However, creating it meant giving up some of the capability delivered by the full relational data structure (for example, alias addresses are excluded and each address carries only its default geocode). This document gives an overview of the flat files and should be read in conjunction with the associated GNAF Product Description.

Source data:
• The May 2016 published GNAF release.

Available File Formats:
• Pipe Separated Values (PSV)
• MapInfo MID/MIF
• ESRI Shapefile*
• ESRI File Geodatabase

*Note: The Shapefile data is split by state/territory so that no .dbf file exceeds the 2 GB size limit.

Overview of process:
1. The May 2016 GNAF release is loaded into a PostgreSQL database. All state/territory data is loaded into a single, consistent national data schema.
2. An FME Workbench workspace is run, which:
   a. connects to the source database;
   b. generates the single flat file via an SQL statement;
   c. creates point geometries (where required); and
   d. produces the range of output formats.

Figure 1 - FME Workbench workspace used to produce GNAF for GovHack 2016

SQL Statement:
The SQL statement joins the relevant GNAF tables, selects the required fields, and filters the data to include only principal addresses (alias_principal = 'P') that also have a confidence value greater than -1. The exact statement used to generate the file is as follows:

    SELECT ad.address_detail_pid,
           ad.street_locality_pid,
           ad.locality_pid,
           ad.building_name,
           ft.description "ft_type_description",
           ad.flat_number_prefix,
           ad.flat_number,
           ad.flat_number_suffix,
           ad.level_type_code,
           ad.level_number_prefix,
           ad.level_number,
           ad.level_number_suffix,
           ad.number_first_prefix,
           ad.number_first,
           ad.number_first_suffix,
           ad.number_last_prefix,
           ad.number_last,
           ad.number_last_suffix,
           ad.lot_number,
           sl.street_name,
           sl.street_type_code,
           sl.street_suffix_code,
           l.locality_name,
           st.state_abbreviation,
           ad.postcode,
           ad.confidence,
           ad.date_created,
           ad.alias_principal,
           mb.mb_2011_code,
           mbaut.name "mb_match_level",
           adg.longitude,
           adg.latitude,
           gt.name "geocode_type"
    FROM address_detail ad
    LEFT JOIN address_site_geocode asg ON ad.address_site_pid::text = asg.address_site_pid::text
    JOIN street_locality sl ON ad.street_locality_pid::text = sl.street_locality_pid::text
    LEFT JOIN street_locality_point slp ON sl.street_locality_pid::text = slp.street_locality_pid::text
    JOIN locality l ON ad.locality_pid::text = l.locality_pid::text
    LEFT JOIN locality_point lp ON l.locality_pid::text = lp.locality_pid::text
    LEFT JOIN address_mesh_block_2011 amb ON amb.address_detail_pid::text = ad.address_detail_pid::text
    LEFT JOIN mb_2011 mb ON mb.mb_2011_pid::text = amb.mb_2011_pid::text
    LEFT JOIN mb_match_code_aut mbaut ON mbaut.code::text = amb.mb_match_code::text
    LEFT JOIN flat_type_aut ft ON ad.flat_type_code::text = ft.code::text
    LEFT JOIN address_default_geocode adg ON adg.address_detail_pid::text = ad.address_detail_pid::text
    LEFT JOIN geocode_type_aut gt ON gt.code = adg.geocode_type_code
    JOIN state st ON l.state_pid::text = st.state_pid::text
    WHERE confidence > -1
      AND ad.alias_principal = 'P'
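Because each row of the flat file already carries the address components selected by the SQL statement, a single-line address can be assembled without any further joins. The following is a minimal Python sketch, not a PSMA tool. It assumes the PSV file starts with a header row whose column names match the SELECT list aliases (matched case-insensitively, since the exact header case is an assumption). The address layout produced by the hypothetical format_address function is an illustrative choice, not a PSMA addressing standard, and the sample values (EXAMPLE1, SAMPLETOWN) are made up.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_gnaf_psv(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per row of a pipe-separated flat file.

    Assumes the first line is a header naming the columns; names are
    lower-cased so the lookups below work with either header case.
    """
    reader = csv.DictReader(stream, delimiter="|")
    for row in reader:
        yield {
            key.strip().lower(): (value or "").strip()
            for key, value in row.items()
            if key is not None  # ignore surplus fields on malformed lines
        }


def _number(row: Dict[str, str], field: str) -> str:
    """Join <field>_prefix, <field> and <field>_suffix, e.g. '12' + 'A' -> '12A'."""
    return "".join(row.get(field + s, "") for s in ("_prefix", "", "_suffix"))


def format_address(row: Dict[str, str]) -> str:
    """Build a one-line address string (hypothetical, illustrative layout)."""
    flat = _number(row, "flat_number")
    flat_part = f"{row.get('ft_type_description', '')} {flat}".strip() if flat else ""
    level = _number(row, "level_number")
    level_part = f"{row.get('level_type_code', '')} {level}".strip() if level else ""
    number = _number(row, "number_first")
    last = _number(row, "number_last")
    if last:
        number = f"{number}-{last}"  # number ranges such as 12-14
    street_part = " ".join(p for p in (number, row.get("street_name", ""),
                                       row.get("street_type_code", ""),
                                       row.get("street_suffix_code", "")) if p)
    locality_part = " ".join(p for p in (row.get("locality_name", ""),
                                         row.get("state_abbreviation", ""),
                                         row.get("postcode", "")) if p)
    # Empty components are skipped so no stray commas appear.
    return ", ".join(p for p in (row.get("building_name", ""), flat_part,
                                 level_part, street_part, locality_part) if p)


if __name__ == "__main__":
    # Hypothetical sample; a real file has every column from the SQL.
    sample = (
        "ADDRESS_DETAIL_PID|FT_TYPE_DESCRIPTION|FLAT_NUMBER|NUMBER_FIRST|"
        "STREET_NAME|STREET_TYPE_CODE|LOCALITY_NAME|STATE_ABBREVIATION|POSTCODE\n"
        "EXAMPLE1|UNIT|3|12|EXAMPLE|STREET|SAMPLETOWN|NSW|2000\n"
    )
    for row in read_gnaf_psv(io.StringIO(sample)):
        print(format_address(row))  # UNIT 3, 12 EXAMPLE STREET, SAMPLETOWN NSW 2000
```

Reading with csv.DictReader streams the file row by row, which matters because the national file is large. Missing columns simply fall back to empty strings rather than raising errors.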
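Step 2c (creating point geometries) and the per-state split of the Shapefile output can also be illustrated outside FME. The following minimal Python sketch builds one GeoJSON FeatureCollection per state_abbreviation from the longitude and latitude columns. GeoJSON is used here only because the Python standard library can write it. It is not one of the delivered formats, and this is not how the FME workspace produced them. The sketch assumes each row is a dict keyed by lower-cased column names from the SQL SELECT list. Because the statement LEFT JOINs address_default_geocode, coordinates may be empty, so such rows get a null geometry. The helper names and sample values are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def to_point(row: Dict[str, str]) -> Optional[dict]:
    """Return a GeoJSON Point for a row, or None if it has no coordinates.

    GeoJSON allows a null geometry, which suits addresses for which the
    LEFT JOIN on address_default_geocode found no geocode.
    """
    lon, lat = row.get("longitude", ""), row.get("latitude", "")
    if not lon or not lat:
        return None
    # GeoJSON (RFC 7946) orders positions as [longitude, latitude].
    return {"type": "Point", "coordinates": [float(lon), float(lat)]}


def collections_by_state(rows: Iterable[Dict[str, str]]) -> Dict[str, dict]:
    """Group rows into one GeoJSON FeatureCollection per state_abbreviation."""
    features: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        features[row.get("state_abbreviation", "")].append({
            "type": "Feature",
            "geometry": to_point(row),
            "properties": {"address_detail_pid": row.get("address_detail_pid", "")},
        })
    return {state: {"type": "FeatureCollection", "features": feats}
            for state, feats in features.items()}


if __name__ == "__main__":
    # Hypothetical rows keyed by lower-cased column names from the SQL.
    rows = [
        {"address_detail_pid": "EXAMPLE1", "state_abbreviation": "NSW",
         "longitude": "151.2", "latitude": "-33.8"},
        {"address_detail_pid": "EXAMPLE2", "state_abbreviation": "VIC",
         "longitude": "", "latitude": ""},
    ]
    for state, collection in sorted(collections_by_state(rows).items()):
        print(state, json.dumps(collection))
```

Grouping by state before writing mirrors the reason given for splitting the Shapefile output: each per-state output stays far smaller than a single national file.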