Simulation and Big-Data Challenges in Tuning Building Energy Models

Jibonananda Sanyal, Ph.D. and Joshua New, Ph.D.
Building Technologies Research & Integration Center (BTRIC)
Whole Building and Community Integration Group
Workshop on Modeling and Simulation of Cyber-Physical Energy Systems
May 20, 2013

Presentation Summary
• Autotune – calibration problem
• Running a number of EnergyPlus simulations
  – Inputs
  – Workflow
  – Shared and distributed memory supercomputers
  – Data management
• Autotune software-as-a-service
• Key tricks

Managed by UT-Battelle for the U.S. Department of Energy

The Autotune Idea
Bridging the gap between the real world and the virtual one
[Figure: E+ Input Model]

Autotune
Bridging the gap between the real world and the virtual one
[Figure labels: E+ Input Model; E+ output - internal variable data; Avg; Instrumented Building; Sensor data; Manual mapping initially]

Autotune
Large-scale sensitivity analysis and uncertainty quantification
[Figure labels: E+ Input Model; Ensemble of E+ inputs; E+ output changes; Sensor Data; E+ output effect compared to sensor data]

Generating the Inputs
• Parametric sampling
  – Experts selected 156 of 3000+ input parameters
  – Brute force using 3 levels: 5x10^52 E+ simulations!
  – 14 parameter full combinatorial subset
  – Markov Order 1 and 2 sampling
• Input generation
  – 700 to 950 KB input file
  – Perl program, sequential; Excel; Python
  – E+ supplied parametric preprocessor

Types of buildings simulated
• Residential – 5 million simulations
• Medium Office – 1 million
• Stand-alone retail – 1 million
• Warehouse – 1 million
Torcellini et al. 2008, “DOE Commercial Building Benchmark Models”, NREL/CP-550-43291, National Renewable Energy Laboratory, Golden CO.
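The "14 parameter full combinatorial subset" on the Generating the Inputs slide can be illustrated with a minimal sketch. It assumes every parameter takes exactly 3 levels, which the slide states only for the brute-force case, and the parameter names and level labels are hypothetical placeholders. Under that assumption the subset needs 3^14 = 4,782,969 runs, the same order as the 5 million residential simulations listed above. Whether that is how the residential count was reached is an assumption, not something the slides confirm.

```python
import itertools

# Hypothetical level labels; the slides do not list the actual values
# used for each parameter.
LEVELS = ("min", "default", "max")


def full_factorial(param_names, levels=LEVELS):
    """Lazily yield one {parameter: level} dict per combination."""
    for combo in itertools.product(levels, repeat=len(param_names)):
        yield dict(zip(param_names, combo))


def count_combinations(n_params, n_levels=len(LEVELS)):
    """Number of simulations a full factorial design requires."""
    return n_levels ** n_params


if __name__ == "__main__":
    # Hypothetical names standing in for the 14 expert-selected parameters.
    params = [f"param_{i:02d}" for i in range(14)]
    print(count_combinations(len(params)))  # 4782969

    # The generator is lazy, so only the combinations that are actually
    # consumed are built in memory.
    first = next(full_factorial(params))
    print(len(first), first["param_00"])
```

Generating combinations lazily matters here because each one becomes a 700 to 950 KB input file; materializing the whole design at once is unnecessary.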
ORNL High Performance Computing Resources
• Autotune: Parametric E+ Sims; Data Mining with Machine Learning
• Cost: $97 million; DOE BTO use: 500k hours granted (CY13)
• Jaguar: 224k cores, 360 TB memory, 10 PB of disk, 1.7 petaflops; cost: $104 million; DOE BTO: 500k hours granted (CY12)
• Nautilus: 1024 cores, shared-memory
• Frost: 2048 SGI Altix; 136 nodes
• 200k hours granted (CY13)
• DOE BTO: 30k hours granted (CY11), 200k hours granted (CY12), 250k hours (CY13)
• Lens cluster: 77 nodes – 45x128GB, 32x64GB, with NVIDIA 880 and Tesla dual-GPU
• EVEREST visualization (CY13)
• Gordon: 250k hours (CY13)

Target supercomputers
• Titan (formerly Jaguar)
  – 299,008 cores
  – 18,688 nodes
  – 20 petaflops
  – 710 TB of distributed RAM
  – 32 GB per node
• Nautilus
  – 1024 cores
  – 4 TB shared memory
• Frost
  – 2048 cores
  – 4 TB distributed RAM
  – 32 GB per node

Simulation Workflow
• E+ is designed for desktops, not supercomputers
• Biggest bottleneck is Lustre IO
• Run from Ramdisk… tricky with supercomputers
• Pack inputs – 64 files into each tarball
• Created a simplified, managed script to invoke E+ from RAM
• dplace used on Nautilus to place jobs
  – E.g. for 256 cores, 4 tarballs loaded
  – Elaborate PBS script
  – Takes a long time to place individual jobs for 512 or more cores
• Frost and Titan – MPI program
  – Asynchronous node-level barriers mitigate metadata server requests
  – 1 tarball per node; each node has 16 cores, so 4 iterations
• After a block has been run, compress to disk
• Iterate

Shared memory: Nautilus
[Figure]

Distributed memory: Titan
[Figure]
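The packing step in the Simulation Workflow slide (64 input files per tarball, one tarball per 16-core node, hence 4 iterations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the function names, the `*.idf` glob, and the `inputs_NNNNN.tar.gz` naming scheme are hypothetical, and the real workflow also stages files to a RAM disk and launches E+ through MPI or dplace, which is omitted here.

```python
import math
import tarfile
from pathlib import Path

FILES_PER_TARBALL = 64   # from the slide
CORES_PER_NODE = 16      # from the slide (Frost and Titan)


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def pack_inputs(input_dir, out_dir, per_tarball=FILES_PER_TARBALL):
    """Bundle *.idf files into tarballs so each node reads one archive.

    Reading one archive per node, instead of 64 small files, reduces the
    number of requests sent to the parallel filesystem.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    idf_files = sorted(Path(input_dir).glob("*.idf"))
    tarballs = []
    for n, group in enumerate(chunk(idf_files, per_tarball)):
        # Hypothetical naming scheme for the archives.
        tar_path = out_dir / f"inputs_{n:05d}.tar.gz"
        with tarfile.open(tar_path, "w:gz") as tar:
            for f in group:
                tar.add(f, arcname=f.name)
        tarballs.append(tar_path)
    return tarballs


def iterations_per_tarball(n_files=FILES_PER_TARBALL, cores=CORES_PER_NODE):
    """Waves needed when each core runs one simulation at a time."""
    return math.ceil(n_files / cores)


if __name__ == "__main__":
    print(iterations_per_tarball())  # 4
```

With the slide's numbers, `iterations_per_tarball()` returns 4, matching the "so 4 iterations" remark.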
Data management
• Data generated
  – 45 TB in 68 mins for ½ million E+ runs
  – At least 270 TB raw
• Data storage
  – Compressed, around 70 TB
  – Lustre is scratch space (14 days)
  – Need to move this data before it is scratched
  – Many database technologies explored
• Data transfer
  – Speed of generation is faster than you can pump the data out!
  – Firewalls complicate things
• Data analysis
  – Move computation to data
  – Stitch them together

Autotune software-as-a-service model
[Figure]

Key tricks that helped us
• Determining the RAM-based filesystem on these machines
  – Poor documentation
• Ratio of number of cores, size of simulation, and available RAM
  – Appropriately fit the task in RAM, with enough RAM left for the application heap
• Mitigating Lustre IO
• Asynchronous elements in bulk-synchronous processing
  – All cores do not hit the filesystem at the same time
  – Compile static
• Streamlining the workflow
  – E+ invokes a number of programs and has a script that performs a copious amount of redundant IO
  – Remove unneeded calls from the individual simulation workflow
• Managing shifting bottlenecks
• Think in parallel, even to list files on a drive!

http://autotune.roofcalc.com

Machine Learning on Supercomputers
One year of 15-min data, 144 sensors/house
• Support Vector Machines
• Genetic Algorithms
• FF/Recurrent Neural Networks
• (Non-)Linear Regression
• Self-Organizing Maps
• C/K-Means
• Ensemble Learning
[Figure: Nautilus Supercomputer]
Acknowledgment: UTK computer science Ph.D. candidate Richard Edwards; student of Dr. Lynne Parker

Real demonstration facilities
ZEBRAlliance homes
• 2800 ft^2 residence
• 269 sensors @ 15-minutes
• 50-60% energy savers
• 5M simulations of E+ model!
Heavily instrumented and equipped with occupancy simulation:
• Temperature
• Plugs
• Lights
• Range
• Washer
• Radiated heat
• Dryer
• Refrigerator
• Dishwasher
• Heat pump air flow
• Shower water flow

Large Data
156 inputs (permutes *.idf)

    !-Generator IDFEditor 1.41
    !-Option SortedOrder ViewInIPunits
    !-NOTE: All comments with '!-' are ignored by the IDFEditor and are generated automatically.
    !-      Use '!' comments if they need to be retained when using the IDFEditor.

    !-   ===========  ALL OBJECTS IN CLASS: VERSION ===========

    Version,
        7.0;                     !- Version Identifier

    !-   ===========  ALL OBJECTS IN CLASS: SIMULATIONCONTROL ===========

    SimulationControl,
        No,                      !- Do Zone Sizing Calculation
        No,                      !- Do System Sizing Calculation
        No,                      !- Do Plant Sizing Calculation
        No,                      !- Run Simulation for Sizing Periods
        Yes;                     !- Run Simulation for Weather File Run Periods

    !-   ===========  ALL OBJECTS IN CLASS: BUILDING ===========

    Building,
        ZEBRAlliance House number 1 SIP House,  !- Name
        -37,                     !- North Axis {deg}
        Suburbs,                 !- Terrain
        0.04,                    !- Loads Convergence Tolerance Value
        0.4,                     !- Temperature Convergence Tolerance Value {deltaC}
        FullExteriorWithReflections,  !- Solar Distribution
        25,                      !- Maximum Number of Warmup Days
        6;                       !- Minimum Number of Warmup Days

    !-   ===========  ALL OBJECTS IN CLASS: SITE:LOCATION ===========

    Site:Location,
        Oak Ridge,               !- Name
        35.96,                   !- Latitude {deg}
        -84.29,                  !- Longitude {deg}
        -5,                      !- Time Zone {hr}

82 outputs @ 15m (*.csv)

Large Data
• 8M sims * 7.24 min = 110 compute-years (cloud=$77,226)
  – “Free” supercomputers and desktop utility for multiple runs+upload
• 8M sims * 35 MB = 267 TB database (cloud=$512,237/month)
  – Cost-effective hardware (1 time, ~$28k)
• Database engines: MyISAM load data 0.71s vs. InnoDB 2.3s
  – Others: NoSQL/key-value pair, column-store, compression ratios
• Database partitioned by month, views span tables
• Software stack for analysis

Making ORNL Data Available
[Figure labels: Computing Resources; E+ Simulations; E+ Input Model; Jaguar Supercomputer; Nautilus; Web Server; PowerEdge R510; Data Mining; 96 ~ HP rx2600]
Automated process to run millions of simulations and host publicly online

Genetic Algorithms
• #1 problem with E+ is simulation speed
• Use AI to approximate E+
  – Exact solution if in database (~milliseconds)
  – Approx. solution (seconds)
  – Exact solution (5-10 mins)
• Dual buffer, Genetic Algorithm Island model for evolving tuned model
[Figure labels: E+ Input Model; Slow buffer/island; Fast buffer/island; Multi-objective fitness evaluation]

Data
• Plan to make available in FY13
• Will run on desktop machine (overnight testing, stop on demand)
• I+O = 8M*156 + 8M*35,040*96 = 26.9 trillion data points (eventually)
TOTAL COST = 4.3 * 10^-16 cent
http://autotune.roofcalc.com
Acknowledgement: This research used resources of the AutotuneDB at the Oak Ridge National Laboratory, which was supported by the Office of Science of the U.S. Department of Energy.
Disclaimer: No service-level performance or availability guarantees implied
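The headline figures on the Large Data and Data slides can be reproduced with a few lines of arithmetic. The sketch below is only a consistency check: the function names are hypothetical, and the use of binary units (1 TB = 1024^2 MB) is an assumption, chosen because it reproduces the slide's 267 TB; decimal units would give 280 TB.

```python
SIMS = 8_000_000  # planned number of simulations, from the slides


def compute_years(sims=SIMS, minutes_per_sim=7.24):
    """Total sequential compute time, in 365-day years."""
    return sims * minutes_per_sim / (60 * 24 * 365)


def storage_tb(sims=SIMS, mb_per_sim=35):
    """Database size in TB, assuming binary units (1 TB = 1024**2 MB)."""
    return sims * mb_per_sim / 1024**2


def data_points(sims=SIMS, n_inputs=156, n_outputs=96):
    """Inputs plus outputs, with outputs reported every 15 minutes.

    A 365-day year has 4 * 24 * 365 = 35,040 fifteen-minute steps,
    which is the 35,040 factor on the slide.
    """
    steps_per_year = 4 * 24 * 365
    return sims * n_inputs + sims * steps_per_year * n_outputs


if __name__ == "__main__":
    print(f"{compute_years():.1f} compute-years")          # 110.2
    print(f"{storage_tb():.1f} TB")                         # 267.0
    print(f"{data_points() / 1e12:.1f} trillion points")    # 26.9
```

The output term dominates the data-point count: the 156 inputs per simulation contribute about 1.2 billion values, against roughly 26.9 trillion output values.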
BTRIC 2011 accomplishments
• Support for Weatherization and Intergovernmental Program (WIP) grows
  – Develop plan for new multi-family building audit
  – Make existing single-family and mobile home audits web-based
  – Continue the retrospective national evaluation of the Weatherization Assistance Program (WAP)
  – Initiate national evaluation of the State Energy Program (SEP) and the Energy Efficiency Block Grant Program (EEBGP)
  – Complete the planning for the national evaluation of ARRA Weatherization
  – Aided in the weatherization of 600,000 homes three months ahead of schedule
ORNL staff and subcontractors have been supporting the expenditure of over $10B in ARRA funds in the WIP portfolio

Science to transform today's buildings into smart, responsive, and efficient structures
[Figure labels: Experimental S&T Capabilities; Modeling and Visualization R&D; Better Buildings via Novel Tools and Technologies; Building Science; Materials Science; Computational Science; Neutron Science; Sensors, Controls, Grid; Web-Based Tools; Automated Model Calibration; Industry CRADAs; Innovative Products; Next Generation Commercial Buildings; Next Generation Residential Buildings; Data/Knowledge]

4th Paradigm
• Empirical – guided by experiment/observation
  – In use thousands of years ago, natural phenomena
• Theoretical – based on coherent group of principles and theorems
  – In use hundreds of years ago, generalizations
• Computational – simulating complex phenomena
  – In use for decades
• Data exploration (eScience) – unifies all 3
  – Data capture, curation, storage, analysis, and visualization
4th Paradigm
Johannes Kepler
3 laws of planetary motion:
• Elliptical orbit (based on location of Mars)
• Planets sweep out equal areas in equal times
• The squares of the periodic times are to each other as the cubes of the mean distances

4th Paradigm
• #3 - Computer simulation

4th Paradigm
• #4 - Visualization and Analysis

Visual Analytics (AI)
• Sensor-based Energy Modeling