Exploration of electricity usage data from smart meters to investigate household composition

Topic (v): Integration and management of new data sources
Seminar on Statistical Data Collection
Geneva, Switzerland, 25-27 September 2013

Paula.Carroll@ucd.ie
John.Dunne@cso.ie
Michael.Hanley@ucdconnect.ie
Tadhg.Murphy.1@ucdconnect.ie

Overview
• Setting the scene
• The data
• Problem statement
• The methodology
• Some results
• The resources
• Team review
• CSO review
• Concluding remarks

Setting the scene - the players

The data
• Over 5000 households in pilot
• 3 months baseline data (reading every 30 mins)
• Pre-trial survey using CATI
Purpose: Consumer Behaviour Trials in 2009 and 2010

Problem statement
To determine household composition using smart metering data

Category  Adults  Children
A         3       2
B         3       1
C         3       0
D         2       5
E         2       4
F         2       3
G         2       2
H         2       1
I         2       0
J         1       1
K         1       0
L         4       1
M         4       0
N         5       1
O         5       0
P         6       0

The methodology
• Machine learning algorithms for classifier
  – (learning and testing || generalisation)
  – Neural Networks used
  – Binomial and Multinomial classification
  – Unbalanced data
• Data reduction / dimension reduction
  – Used 21 explanatory variables as input to classifier
  – Variables normalised

Some results – balanced multinomial classifier
"Confusion matrix" (rows: actual household category in the test set; columns: predicted household category)

Actual    B    C    F    G    H    I    K    M     Σ   % Accuracy
B         0    0    6    6    0    6    2    0    20    0.0
C         0    0    4   10    1    3    1    1    20    0.0
F         0    0    8    6    0    4    2    0    20   40.0
G         0    0    5    2    1    8    4    0    20   10.0
H         0    0    4    4    1    7    4    0    20    5.0
I         1    0    1    2    0    8    8    0    20   40.0
K         0    0    0    0    0    5   15    0    20   75.0
M         0    0   10    4    0    3    3    0    20    0.0

The resources
• Project team of two persons for 3 months
  – Significant amount of time spent manipulating data
• Software: R with nnet and neuralnet packages
• Hardware: Required considerable computer resources for manipulating full dataset (Stokes at ICHEC)

Team review
• Problem statement too specific - broaden to household characteristics
• Alternative approach (cluster analysis and then describe clusters)
• Other techniques: PCA or signal processing

CSO review – forward looking
• Assuming go live: 1.5m household meters linked to statistical household register in 2019
• Existing statistical needs
  – Field force management
  – Auxiliary information
  – Sample selection / representativity analysis
• New statistical products?
  – Energy consumption patterns by location, household etc.
  – Quality of life (time to rise, time to bed)

Concluding remarks
• 3 V's + V for Value
  – Is there value in SMD?
• Access v Privacy
  – Legal, moral, proportionality
• Infrastructure for Big data (1.5m data points every 30 mins)
  – Outsourcing, downsampling
• New tools, skills, approaches
• Roadmap: collaboration with suitable partners
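
Note on the "Some results" confusion matrix
The "% Accuracy" column in the confusion matrix can be recomputed from the counts: for each actual category, it is the diagonal count divided by the row total. The sketch below is an illustration only. The project itself used R; Python is used here for convenience, and the function and variable names are assumptions, not taken from the project code.

```python
# Confusion matrix counts copied from the "Some results" slide.
# Rows: actual category; columns: predicted category, in the same order.
labels = ["B", "C", "F", "G", "H", "I", "K", "M"]
counts = [
    [0, 0, 6, 6, 0, 6, 2, 0],    # actual B
    [0, 0, 4, 10, 1, 3, 1, 1],   # actual C
    [0, 0, 8, 6, 0, 4, 2, 0],    # actual F
    [0, 0, 5, 2, 1, 8, 4, 0],    # actual G
    [0, 0, 4, 4, 1, 7, 4, 0],    # actual H
    [1, 0, 1, 2, 0, 8, 8, 0],    # actual I
    [0, 0, 0, 0, 0, 5, 15, 0],   # actual K
    [0, 0, 10, 4, 0, 3, 3, 0],   # actual M
]


def per_class_accuracy(matrix):
    """Percentage of each actual category's test cases predicted correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Percentage of all test cases that fall on the diagonal."""
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


accuracies = per_class_accuracy(counts)
for label, acc in zip(labels, accuracies):
    print(f"{label}: {acc:.1f}%")
print(f"Overall: {overall_accuracy(counts):.2f}%")
```

The per-category values match the slide's "% Accuracy" column. Pooled over the 160 balanced test cases, 34 lie on the diagonal (21.25%). For comparison, uniform random guessing over eight equally sized categories would give 12.5%.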