Exploration of electricity usage data from smart meters to investigate household composition
Topic (v): Integration and management of new data sources
Seminar on Statistical Data Collection
Geneva, Switzerland, 25-27 September 2013
Paula.Carroll@ucd.ie
John.Dunne@cso.ie
Michael.Hanley@ucdconnect.ie
Tadhg.Murphy.1@ucdconnect.ie
Overview
• Setting the scene
• The data
• Problem statement
• The methodology
• Some results
• The resources
• Team review
• CSO review
• Concluding remarks
Setting the scene – the players
The data
• Over 5000 households in pilot
• 3 months of baseline data (one reading every 30 minutes)
• Pre-trial survey using CATI (computer-assisted telephone interviewing)
Purpose: Consumer Behaviour Trials in 2009 and 2010
Problem statement
To determine household composition using smart metering data
Category   Adults   Children
A          3        2
B          3        1
C          3        0
D          2        5
E          2        4
F          2        3
G          2        2
H          2        1
I          2        0
J          1        1
K          1        0
L          4        1
M          4        0
N          5        1
O          5        0
P          6        0
The methodology
• Machine learning algorithms for classifier
– (learning and testing || generalisation)
– Neural Networks used
– Binomial and Multinomial classification
– Unbalanced data
• Data reduction / dimension reduction
– Used 21 explanatory variables as input to classifier
– Variables normalised
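The data reduction and normalisation steps above can be sketched roughly as follows. This is a minimal illustration in Python (the project itself used R), and the six summary variables here are hypothetical stand-ins: the slides do not list the 21 explanatory variables that were actually used. Min-max scaling is likewise an assumption, since the slides say only that the variables were normalised.

```python
from statistics import mean, pstdev

READINGS_PER_DAY = 48  # one reading every 30 minutes


def daily_profile(readings):
    """Average a half-hourly series over its full days into one 48-slot profile."""
    days = len(readings) // READINGS_PER_DAY
    if days == 0:
        raise ValueError("need at least one full day of readings")
    return [
        mean(readings[d * READINGS_PER_DAY + slot] for d in range(days))
        for slot in range(READINGS_PER_DAY)
    ]


def summary_features(readings):
    """Reduce a half-hourly series to a few summary variables (hypothetical choice)."""
    profile = daily_profile(readings)
    return {
        "mean": mean(profile),
        "peak": max(profile),
        "base": min(profile),
        "spread": pstdev(profile),
        "morning": mean(profile[12:18]),  # slots covering 06:00 to 09:00
        "evening": mean(profile[34:44]),  # slots covering 17:00 to 22:00
    }


def min_max_normalise(rows):
    """Rescale each variable to [0, 1] across all households (assumed scheme)."""
    names = list(rows[0])
    lo = {n: min(r[n] for r in rows) for n in names}
    hi = {n: max(r[n] for r in rows) for n in names}
    return [
        {n: (r[n] - lo[n]) / (hi[n] - lo[n]) if hi[n] > lo[n] else 0.0
         for n in names}
        for r in rows
    ]
```

Each household's normalised row would then be one input vector for the classifier. Scaling matters for neural networks because variables on very different scales can dominate the weight updates.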
Some results – balanced multinomial classifier
Household category: actual (rows) v predicted (columns), test set

Actual    B    C    F    G    H    I    K    M     Σ   % accuracy
B         0    0    6    6    0    6    2    0    20    0.0
C         0    0    4   10    1    3    1    1    20    0.0
F         0    0    8    6    0    4    2    0    20   40.0
G         0    0    5    2    1    8    4    0    20   10.0
H         0    0    4    4    1    7    4    0    20    5.0
I         1    0    1    2    0    8    8    0    20   40.0
K         0    0    0    0    0    5   15    0    20   75.0
M         0    0   10    4    0    3    3    0    20    0.0

“Confusion matrix”
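The accuracy column can be reproduced from the counts in the confusion matrix: for each actual category, it is the diagonal entry divided by the row total. The short Python sketch below recomputes it and adds an overall accuracy figure, which the slide does not report. The names `LABELS` and `CONFUSION` are illustrative.

```python
LABELS = ["B", "C", "F", "G", "H", "I", "K", "M"]

# Rows: actual category; columns: predicted category (same order as LABELS).
CONFUSION = [
    [0, 0, 6, 6, 0, 6, 2, 0],    # B
    [0, 0, 4, 10, 1, 3, 1, 1],   # C
    [0, 0, 8, 6, 0, 4, 2, 0],    # F
    [0, 0, 5, 2, 1, 8, 4, 0],    # G
    [0, 0, 4, 4, 1, 7, 4, 0],    # H
    [1, 0, 1, 2, 0, 8, 8, 0],    # I
    [0, 0, 0, 0, 0, 5, 15, 0],   # K
    [0, 0, 10, 4, 0, 3, 3, 0],   # M
]


def per_class_accuracy(matrix):
    """Percentage of each actual category that was predicted correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Percentage of all test households that were predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


for label, acc in zip(LABELS, per_class_accuracy(CONFUSION)):
    print(f"{label}: {acc:.1f}%")
print(f"overall: {overall_accuracy(CONFUSION):.2f}%")
```

On these counts, 34 of the 160 test households are classified correctly (21.25%), against 12.5% for random guessing over eight equally sized categories.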
The resources
• Project team of two persons for 3 months
– Significant amount of time spent manipulating data
• Software: R with nnet and neuralnet packages
• Hardware: considerable computing resources required to manipulate the full dataset (Stokes at ICHEC)
Team review
Problem statement too specific
– broaden to household characteristics
Alternative approach (cluster analysis, then describe the clusters)
Other techniques – PCA or signal processing
CSO review – forward looking
Assuming go-live of 1.5m household meters, linked to the statistical household register, in 2019
Existing statistical needs
– Field force management
– Auxiliary information
– Sample selection / representativity analysis
New statistical products?
– Energy consumption patterns by location, household, etc.
– Quality of life (time to rise, time to bed)
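As a rough illustration of how a "time to rise" indicator might be derived from one day of half-hourly readings, the Python sketch below flags the first half-hour slot in which consumption clearly exceeds the overnight baseline. This is a hypothetical heuristic, not a method from the project: the baseline window (01:00 to 04:00), the threshold factor and the function names are all assumptions.

```python
def slot_to_clock(slot):
    """Convert a half-hour slot index (0 = 00:00) to an HH:MM string."""
    minutes = slot * 30
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def estimate_rise_time(day, night_slots=range(2, 8), factor=2.0, search_from=8):
    """Return the start time of the first slot at or after `search_from`
    whose reading exceeds `factor` times the overnight baseline.

    `day` holds 48 half-hourly readings. The baseline is the mean reading
    over `night_slots` (01:00 to 04:00 by default, an assumed quiet period).
    Returns None if no slot crosses the threshold.
    """
    baseline = sum(day[s] for s in night_slots) / len(night_slots)
    for slot in range(search_from, len(day)):
        if day[slot] > factor * baseline:
            return slot_to_clock(slot)
    return None


# Illustrative day: low overnight usage, then a morning jump at 07:00.
example_day = [0.1] * 14 + [0.6] * 4 + [0.3] * 30
print(estimate_rise_time(example_day))
```

A "time to bed" estimate could be built the same way by scanning backwards from the end of the day. Either estimate would need validating against survey data before being used as a statistical product.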
Concluding remarks
3 V’s (volume, velocity, variety) + V for Value – is there value in smart meter data (SMD)?
Access v Privacy
– Legal, moral, proportionality
Infrastructure for Big data (1.5m data points every 30 mins)
– Outsourcing, downsampling
New tools, skills, approaches
Roadmap – collaboration with suitable partners
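To give a sense of scale for the infrastructure point above, the Python sketch below works out the daily reading volume for 1.5m meters and shows one simple form of downsampling, from half-hourly to hourly values. It assumes each reading is the energy used in its interval, so that adjacent readings can be summed. If the readings were average power, they would be averaged instead. The function names are illustrative.

```python
METERS = 1_500_000       # household meters, as stated on the slide
READINGS_PER_DAY = 48    # one reading every 30 minutes


def readings_per_day(meters=METERS, per_day=READINGS_PER_DAY):
    """Total number of readings collected across all meters in one day."""
    return meters * per_day


def downsample(readings, factor=2):
    """Sum each consecutive group of `factor` readings.

    With factor=2 this turns half-hourly interval energy into hourly
    interval energy (assumes readings are energy per interval).
    """
    if len(readings) % factor:
        raise ValueError("series length must be a multiple of factor")
    return [sum(readings[i:i + factor]) for i in range(0, len(readings), factor)]


print(readings_per_day())             # readings per day across all meters
print(len(downsample([0.25] * 48)))   # hourly values in one day
```

At this rate the full population would generate 72 million readings a day. Hourly downsampling halves that volume, at the cost of some of the time resolution.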