Building Simulation Modelers – Are we big data ready?

Jibonananda Sanyal and Joshua New
Oak Ridge National Laboratory
Learning Objectives
• Appreciate emerging big-data needs in building sciences
• Identify bottlenecks early and devise solutions for unanticipated situations using large amounts of data
ASHRAE is a Registered Provider with The American Institute of Architects
Continuing Education Systems. Credit earned on completion of this
program will be reported to ASHRAE Records for AIA members.
Certificates of Completion for non-AIA members are available on request.
This program is registered with the AIA/ASHRAE for continuing
professional education. As such, it does not include content that may be
deemed or construed to be an approval or endorsement by the AIA of any
material of construction or any method or manner of handling, using,
distributing, or dealing in any material or product. Questions related to
specific materials, methods, and services will be addressed at the
conclusion of this presentation.
Outline
• Background and scope
• Motivation
• Big data in building sciences
• Data from simulations
• Data from sensors
• Conclusion
Sustainability is the defining challenge
• Buildings in the U.S.
– 41% of primary energy/carbon, 73% of electricity, 34% of gas
• Buildings in China
– 60% of the urban building floor space that will exist in 2030 has yet to be built
• Buildings in India
– 67% of all building floor space that will exist in 2030 has yet to be built
Energy Consumption and Production
Commercial Site Energy Consumption by End Use
Whole ‘test buildings’ for system/building integration research
● Evaluating emerging energy efficiency technologies in realistic test beds is an
essential step before market introduction.
● Some technologies (whole-building fault detection and diagnostics, etc.) benefit
from use of test buildings during the development process.
Fleet of Residential ‘Test Buildings’
Two Light Commercial ‘Test Buildings’
Real demonstration facilities
Residential homes
2,800 ft² residence
269 sensors sampled at 15-minute intervals
50–60% energy savings
Heavily instrumented and equipped with occupancy simulation:
• Temperature
• Plugs
• Lights
• Range
• Washer
• Radiated heat
• Dryer
• Refrigerator
• Dishwasher
• Heat pump air flow
• Shower water flow
What is Big-Data?
• Volume – Scale
• Velocity – Streaming
• Variety – Data Types
• Veracity – Quality
But still, what is big-data?
Why do we care?
• Trending technologies
– Simulation playing bigger role
• Parametric analysis determinants
• Understanding uncertainty
– Sensor data
• Unprecedented levels of resolution
• New control algorithms
– Calibration
– Internet of things
– Demand response
– B2G (building-to-grid) integration
• Common/traditional tools and methods of analysis break down
How big is big data in the building sciences?
• Depends on
– Size
– Capability of tools for conventional analysis
• What is the purpose of the data?
A working answer: wherever the management and analysis of the data pose a non-trivial challenge
For simulation output
• Scalability, analysis requirements, and adaptability of the data storage mechanism to changing needs
• Unique situations
– Non-standard EnergyPlus timestamps, e.g. 2012-01-10 24:00:00, which most date parsers reject (see the sketch at the end of this slide)
• Data movement and network performance
– Lag, bandwidth, connection stability
• Logical aspects
– Synchronization, storage schema, logical partitioning
• Analytics on the data
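EnergyPlus marks the end of each simulated day as hour 24, which standard date parsers reject. A minimal Python sketch (an illustration, not the authors' tooling) that rolls such stamps over to midnight of the next day before the data is loaded anywhere else:

```python
from datetime import datetime, timedelta

def normalize_eplus_timestamp(stamp):
    """Convert an EnergyPlus-style 'YYYY-MM-DD 24:00:00' stamp into a
    standard datetime by rolling hour 24 over to the next day."""
    date_part, time_part = stamp.strip().split()
    if time_part.startswith("24:"):
        _, mm, ss = time_part.split(":")
        base = datetime.strptime(date_part, "%Y-%m-%d")
        # Hour 24 of day N is hour 0 of day N+1
        return base + timedelta(days=1, minutes=int(mm), seconds=int(ss))
    return datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S")

print(normalize_eplus_timestamp("2012-01-10 24:00:00"))  # -> 2012-01-11 00:00:00
```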
Additionally, for sensor data
• Fault detection, management, correction
– 3-sigma rule for automated outlier detection (illustrated in the sketch below)
– Slopes and trends
– Physical redundancy
– Chicken-and-egg problem: faulty data skews the very statistics used to find the faults
• Quality control/quality assurance
– Filtering, statistical filling in, machine learning approaches
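A hedged sketch of two of the steps above, 3-sigma outlier flagging followed by interpolation-based filling, applied to synthetic 15-minute data; the data, threshold, and gap limit are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Synthetic 15-minute indoor temperature data for one day, with one injected fault
rng = np.random.default_rng(0)
idx = pd.date_range("2013-07-01", periods=96, freq="15min")
temps = pd.Series(72 + 0.5 * rng.standard_normal(96), index=idx)
temps.iloc[40] = 150.0  # a stuck/spiking sensor reading

# 3-sigma rule: flag points more than three standard deviations from the mean
mean, std = temps.mean(), temps.std()
outliers = (temps - mean).abs() > 3 * std

# QC: mask flagged points, then fill short gaps by linear interpolation in time
cleaned = temps.mask(outliers).interpolate(method="time", limit=4)
print(f"flagged {outliers.sum()} outlier(s)")
```

Note the chicken-and-egg problem listed above: the faulty reading inflates the mean and standard deviation used to judge it, so in practice the rule is applied iteratively or with robust statistics.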
Managing simulation ensembles
• Uncertainty
• Simulation input – design of experiments (see the sampling sketch at the end of this slide)
– Random sampling
– Uniform sampling
– Markov order sampling
– Latin square designs
– Fractional factorial designs
• Simulation output
– Larger in size, post-processing overheads
– Saving raw output vs. summary
– Trade-off of re-computing vs. storage and retrieval
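A minimal sketch of two of the sampling strategies listed above, plain random sampling and a Latin-hypercube-style design; the parameter names and ranges are hypothetical, not taken from the authors' ensembles:

```python
import numpy as np

# Hypothetical EnergyPlus input parameters and their ranges
params = {
    "wall_insulation_r":  (10.0, 40.0),
    "infiltration_ach":   (0.1, 1.0),
    "cooling_setpoint_f": (70.0, 78.0),
}

def random_sample(params, n, rng):
    """Plain random sampling: n independent draws per parameter."""
    return {k: rng.uniform(lo, hi, n) for k, (lo, hi) in params.items()}

def latin_hypercube(params, n, rng):
    """Latin-hypercube-style sampling: one draw per equal-probability bin,
    shuffled independently per parameter so every bin is covered exactly once."""
    out = {}
    for k, (lo, hi) in params.items():
        bins = (np.arange(n) + rng.uniform(0, 1, n)) / n  # one point per bin
        rng.shuffle(bins)
        out[k] = lo + bins * (hi - lo)
    return out

rng = np.random.default_rng(42)
ensemble = latin_hypercube(params, n=8, rng=rng)
for i in range(8):
    print({k: round(v[i], 2) for k, v in ensemble.items()})
```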
Big-data management
• Generic considerations
– Weakly structured to unstructured collections of data units
– Balance of storage to computational needs (e.g., cost of unzipping)
– Access patterns
• Do you access individual files or groups of files?
• Do you calculate summaries often? Do you repeat the same calculations?
• Physical location on disk
• Can you exploit parallelism? (see the sketch at the end of this slide)
– Design for the generic analysis/use case
– Design for fault resilience
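One concrete answer to the parallelism question above: pre-compute and cache per-file summaries in parallel so the same calculations are not repeated on every pass over the raw data. A sketch using Python's standard library; the directory layout, EnergyPlus column name, and 15-minute interval are assumptions:

```python
import glob
from concurrent.futures import ProcessPoolExecutor

import pandas as pd

def summarize(csv_path):
    """Reduce one simulation output CSV to a few reusable numbers so the
    raw file does not have to be re-read for every downstream analysis."""
    df = pd.read_csv(csv_path)
    energy_j = df["Electricity:Facility [J]"]        # assumed column name
    return {
        "file": csv_path,
        "annual_kwh": energy_j.sum() / 3.6e6,        # J -> kWh
        "peak_kw": energy_j.max() / (900 * 1000),    # 15-min interval: J -> kW
    }

if __name__ == "__main__":
    files = glob.glob("ensemble_output/*.csv")       # assumed directory layout
    with ProcessPoolExecutor() as pool:              # one worker per core
        summaries = pd.DataFrame(list(pool.map(summarize, files)))
    summaries.to_csv("ensemble_summaries.csv", index=False)  # cache, don't recompute
```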
Data transfer and storage methods
• Moving big-data is expensive
– 10 days to move 45 TB that took only 68 minutes to generate (roughly 50 MB/s of sustained transfer)
• Logical partitions in data movement
– Overarching logical data schema
– Use of parallel file transfer tools like bbcp, GridFTP, rsync
• Traditionally, building simulation data are CSV files
– Use database technologies instead (see the sketch at the end of this slide)
– Atomicity, Consistency, Isolation, Durability (ACID)
– SQL vs. NoSQL
– Compression in databases, row vs. columnar
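A toy sketch of the move from per-run CSVs to a database, using SQLite only to keep the example self-contained (the next slide's measurements use MySQL); the file layout and table name are assumptions:

```python
import glob
import sqlite3

import pandas as pd

conn = sqlite3.connect("ensemble.db")

for run_id, csv_path in enumerate(sorted(glob.glob("ensemble_output/*.csv"))):
    df = pd.read_csv(csv_path)
    df["run_id"] = run_id                      # tag each row with its simulation
    # Append all runs into one wide table so they are queryable with plain SQL
    df.to_sql("eplus_output", conn, if_exists="append", index=False)

# Example query: rows loaded per simulation run
print(pd.read_sql("SELECT run_id, COUNT(*) AS n FROM eplus_output GROUP BY run_id", conn))
conn.close()
```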
Performance metrics for simulation data
• 15-minute EnergyPlus output (one simulated year)
– 35,040 records of 96 variables, ~35 MB
– 7–8 MB compressed, i.e., roughly 20–22% of the original size
• 200 CSVs inserted into a MySQL database with row compression
– ~7 million records, 10.27 MB average
– Read-only: 6.8 MB
• ALTER TABLE command on a 386 GB table
– 8 hrs 10 min with 12 partitions
– >1 week when unpartitioned
• Hadoop approaches
– Key-value pairs, no ACID compliance
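The compression figures above are easy to check against your own outputs; a small sketch that gzip-compresses one CSV and reports what fraction of the original size remains (the file name is a placeholder):

```python
import gzip
import os
import shutil

src = "eplus_run_0001.csv"            # placeholder: one annual 15-minute output file
dst = src + ".gz"

with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)   # stream-compress without loading into memory

raw, packed = os.path.getsize(src), os.path.getsize(dst)
print(f"{raw / 1e6:.1f} MB -> {packed / 1e6:.1f} MB "
      f"({100 * packed / raw:.0f}% of original size)")
```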
Comparison of database performance
Other considerations
• Data sensitivity and user permissions
• Backups
– Sensor data
– Simulation data
• Is it worth it to back up?
– Analysis to be applied
– Change in simulation code
– Reference for derivative products
• Provenance
• Workflow Tools
Case study
• Parametric ensemble
– 2 Residential buildings
– Commercial buildings: medium office, warehouse, stand-alone retail
• Several supercomputers used
– Titan (299,008 cores)
– Frost (2048 cores)
– Nautilus (1024 cores)
• Several challenges
– EnergyPlus on supercomputers
– File system constraints, data transience, and data movement
– Analysis
Tipping point
Processors   E+ simulations   Wall-clock Time (mm:ss)   Data Size
16           64               18:14                     5 GB
32           128              18:19                     11 GB
64           256              18:34                     22 GB
128          512              18:22                     44 GB
256          1,024            20:30                     88 GB
512          2,048            20:43                     176 GB
1,024        4,096            21:03                     351 GB
2,048        8,192            21:11                     703 GB
4,096        16,384           20:00                     1.4 TB
8,192        32,768           26:14                     2.8 TB
16,384       65,536           26:11                     5.6 TB
32,768       131,072          31:29                     11.5 TB
65,536       262,144          44:52                     23 TB
131,072      524,288          68:08                     45 TB
Conclusion
• What is big data?
• Why should we care?
• Considerations in working with big-data
• Analysis requirements
• Managing big-data
• Anticipating issues of scale
Questions?
Jibonananda Sanyal
sanyalj@ornl.gov