Sensor Data Management, Validation, Correction, and Provenance for Building Technologies
Charles Castello
Jibonananda Sanyal
Zachary Hensley
Jeffrey Rossiter
Joshua New
Building Technologies Research and Integration Center
Oak Ridge National Laboratory

Learning Objectives
• Highlight the importance of data management, completeness, accuracy, and provenance (i.e., lineage) for buildings.
• Provide an overview of methods being used for sensor data quality assurance, validation, correction, and filling in missing data.

Outline
• Motivation and background
• Provenance Data Management
• Sensor Data Correction

Energy is the Defining Challenge of Our Time
• Buildings in U.S. – 40% of primary energy/carbon, 73% of electricity, 34% of gas
• Buildings in China – 60% of urban building floor space in 2030 has yet to be built
• Buildings in India – 67% of all building floor space in 2030 has yet to be built
Global energy consumption will increase 50% by 2030

Our applied R&D capabilities are focused in three areas
• Envelope
– Develop component technologies that are more resistant to heat flow, more airtight, and more moisture-durable than existing technologies
• Equipment
– Develop component technologies that deliver the same amenities while using significantly less energy than existing technologies
• System/building integration
– Verify that advanced component technologies deliver what they promise and are durable and reliable in real buildings

Envelope research lab facilities
• Heat Flow Through Roof/Attic Assemblies
• Heat Flow Through Wall Assemblies
• Air/Moisture Flow Through Wall Assemblies
• Hygrothermal Properties of Materials

Envelope natural exposure test facilities
• Tacoma, WA (Cool/Humid)
• Oak Ridge, TN (Mixed/Humid)
• Syracuse, NY (Cold/Humid)
• Charleston, SC (Hot/Humid)
Building Energy Efficiency

Equipment research lab facilities
• Environmental Chambers
• Compressor Calorimeters
• Heat Exchanger Test Loops
• Working Fluid Physical Properties Measurement

Whole ‘test buildings’ for system/building integration research
• Evaluating emerging energy efficiency technologies in realistic test beds is an essential step before market introduction.
• Some technologies (whole-building fault detection and diagnostics, etc.) benefit from use of test buildings during the development process.
Fleet of Residential ‘Test Buildings’
Two Light Commercial ‘Test Buildings’

Real demonstration facilities
Residential homes
• 2,800 ft² residence
• 269 sensors sampled at 15-minute intervals
• 50-60% energy savers
Heavily instrumented and equipped with occupancy simulation:
• Temperature
• Plugs
• Lights
• Range
• Washer
• Radiated heat
• Dryer
• Refrigerator
• Dishwasher
• Heat pump air flow
• Shower water flow

Flexible Research Platforms
• Multiple data loggers
• Several hundred sensors per building

What is our motivation?
• A wide range of sensors is being used to monitor, develop, and characterize the performance of buildings at the component, system, and whole-building level.
• Missing and corrupt sensor data can be an issue due to:
– Sensor failure
– Sensor fouling
– Calibration error
– Data logger failure

Data sharing
• Files are shared by email, network drives, USB sticks
• No history or lineage is maintained
• Derivative works often lose their ancestry
• Impediment to productivity
Provenance data management

Overview of sensor data validation
• Develop a software tool for quality assurance of sensor data being generated by buildings.
– Validation – flag data points that are missing or outside a defined range of acceptable values
– Correction – models are constructed based on validated data to replace flagged data points with predicted data points
• Ensures datasets being used are complete and accurate for:
– Monitoring for operations and maintenance of buildings
– Software models
– Performance analyses
– Controls experiments for building automation and energy systems

Correction Techniques
• Correction techniques are used to predict flagged data points by using validated data points to generate models of the data.
• Statistical Techniques
– Least squares (LS)
– Maximum likelihood estimation (MLE)
– Segmentation averaging (SA)
– Threshold based
• Filtering Methods
– Kalman
– Linear predictive coding (LPC)

Correction Techniques (cont.)
• Studies were conducted to determine the accuracy of these methods for different types of data:
– Temperature
– Humidity
– Energy consumption
– Pressure
– Airflow
• Studies were based on data collected from experimental research homes
– Four homes
– Located in Oak Ridge, TN

Comparison of Statistical and Filtering Correction Methods
• The threshold-based statistical method performed best with temperature, humidity, energy, and airflow data.
• The Kalman filtering method performed best with pressure data.

Typical workflow
• Import data (.csv file)
• Validate data
• Correct data
• Output corrected data (.csv file)
• Visualize data (spectrograms)

Front GUI

Provenance

Data loggers

Under the hood
• Provenance Library
– Collaboration with Harvard University
– Great fit for our needs
– Brings the provenance to the data
• Separates the data and its provenance
• MySQL database
• Security
– 2-level LDAP authentication
– Raw data resides on a separate server

System architecture

Visualization

Sparklines dashboard

Provenance – Workflows
• QA tools (may be automated or manual) that trace the lineage
– Charles Castello and Jeffrey Rossiter
• Chunks of the data may be ‘improved’, and users might want the improved version to be the most current dataset
• Generalized workflow hooks

Thank you! Any Questions?
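
Backup: validation and correction sketch

The validate-then-correct workflow described in this seminar can be illustrated with a minimal Python sketch. This is not the tool presented here. It assumes a single sensor column, a hypothetical acceptable range (`low`, `high`), and a straight-line least-squares model over the sample index. All function names (`parse_readings`, `validate`, `least_squares_line`, `correct`), the column name `temp_c`, and the sample data are illustrative assumptions.

```python
import csv
import io

# Hypothetical sample data: one blank (missing) reading and one out-of-range spike.
SAMPLE_CSV = """timestamp,temp_c
00:00,20.0
00:15,20.5
00:30,
00:45,21.5
01:00,999.0
01:15,22.5
"""


def parse_readings(csv_text, column):
    """Read one column from CSV text; blank or non-numeric cells become None."""
    readings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = (row.get(column) or "").strip()
        try:
            readings.append(float(cell))
        except ValueError:
            readings.append(None)
    return readings


def validate(values, low, high):
    """Validation: flag (True) points that are missing or outside [low, high]."""
    return [v is None or not (low <= v <= high) for v in values]


def least_squares_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # a single valid point gives a flat line
    a = mean_y - b * mean_x
    return a, b


def correct(values, flags):
    """Correction: replace flagged points with predictions from a line
    fitted only to the validated (unflagged) points."""
    xs = [i for i, flagged in enumerate(flags) if not flagged]
    if not xs:
        raise ValueError("no validated points to build a model from")
    ys = [values[i] for i in xs]
    a, b = least_squares_line(xs, ys)
    return [a + b * i if flagged else v
            for i, (v, flagged) in enumerate(zip(values, flags))]


if __name__ == "__main__":
    values = parse_readings(SAMPLE_CSV, "temp_c")
    flags = validate(values, low=0.0, high=50.0)  # assumed acceptable range
    print(flags)
    print(correct(values, flags))
```

A single straight line is only reasonable over short windows of slowly varying data. Any of the other techniques named in the slides (threshold based, Kalman, and so on) would take the place of `correct` while the validation step stays the same.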