Report - CAE Users

Estimating Energy Efficiency of Buildings
Matthew Wysocki
12/7/2013
Application of statistical machine learning to accurately predict the heating and cooling loads of a building during operation. This method identifies which parameters have the greatest effect on building energy usage, so that engineers and architects can make more informed decisions when designing buildings.
Introduction
A great amount of effort and research has gone into accurately predicting building efficiency. This work covers a wide range of aspects, from human behavior while the building is in operation to improving the accuracy of simulations in engineering software. It is driven by a steady increase in building energy usage and by concern about environmental impact. A majority of a building's energy usage can be attributed to heating, ventilation, and air conditioning (HVAC); making buildings more efficient in these areas can greatly improve their overall energy efficiency.
Simulation tools are commonly employed to analyze and predict the energy usage of buildings; with the aid of these tools, engineers are able to accurately predict the energy use of a building before construction has started. These tools can also compare two buildings that are identical in every respect except for a single parameter; this direct comparison can yield good insight into how that parameter affects the overall energy usage of a building. In practice, however, running a large number of simulations can be very time-consuming, especially for large projects, and these simulation tools require a high level of proficiency from the user.
Recent work has focused on using machine learning to accurately predict the results of these simulations. With an adequately trained decision tree, the impact of changes to certain parameters can be quantified, allowing engineers and architects to make more informed decisions about energy usage during the design process. In this study, I investigate the effects of eight building parameters on overall building energy consumption. These eight parameters (relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution) have all been shown in previous studies to affect building energy usage. These inputs are used to predict the heating load and cooling load of each building.
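The eight inputs and two outputs named above can be kept in a fixed order so that every feature vector lines up column by column. The following is a minimal Python sketch of such a schema; the short column codes (X1 to X8, Y1, Y2) are an assumption about how the UCI copy of the data labels its columns, and `split_record` is a hypothetical helper, not part of the original study.

```python
# Assumed column codes (X1..X8 inputs, Y1/Y2 outputs); adjust if your copy of the data differs.
FEATURES = {
    "X1": "relative compactness",
    "X2": "surface area",
    "X3": "wall area",
    "X4": "roof area",
    "X5": "overall height",
    "X6": "orientation",
    "X7": "glazing area",
    "X8": "glazing area distribution",
}
TARGETS = {"Y1": "heating load", "Y2": "cooling load"}


def split_record(record):
    """Hypothetical helper: split one record (a dict keyed by column code)
    into an 8-element feature vector and a (heating, cooling) target pair."""
    x = [float(record[code]) for code in FEATURES]  # dicts preserve insertion order
    y = tuple(float(record[code]) for code in TARGETS)
    return x, y
```

Because the feature order is fixed in one place, any model trained on these vectors can refer to a feature simply by its index.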
Data
The dataset used in this study was obtained from the University of California, Irvine (UCI) Machine Learning Repository. Every building form in the set has the same volume, 771.75 m³, and is simulated using the same building materials, all of which are common in the construction industry. Further, each building is simulated as if it were located in Athens, Greece, and internal climate conditions such as temperature, humidity, and lighting are held constant.
Glazing area is given as a percentage and takes one of three values: 10%, 25%, and 40%. Glazing area distribution falls into five categories: a uniform distribution, with 25% of the total glazing on each side of the building, and four non-uniform distributions, each with 55% of the glazing on one side of the building (north, south, east, or west) and 15% on each of the other three sides. The dataset also includes buildings with no glazing at all, and therefore no glazing distribution. Building orientation takes one of four values, one for each cardinal direction. Finally, there are twelve different building forms that vary in surface area, wall area, roof area, relative compactness, and overall height. The combinations of these parameters make up the 768 samples in the dataset. Figure 1 and Figure 2 show the distributions of the inputs and of the outputs, respectively.
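As a check on the sample count, the combinations described above can be enumerated directly: 12 building forms × 4 orientations × 16 glazing configurations (3 glazing areas × 5 distributions, plus the no-glazing case) = 768. The following Python sketch assumes that the full grid was simulated, as the count suggests; the form indices and category labels are illustrative placeholders, not the dataset's actual numeric encoding.

```python
from itertools import product

BUILDING_FORMS = range(12)                      # placeholder indices for the twelve shapes
ORIENTATIONS = ["north", "east", "south", "west"]
GLAZING_AREAS = [0.10, 0.25, 0.40]              # i.e. 10%, 25%, 40%
DISTRIBUTIONS = ["uniform", "north", "east", "south", "west"]


def glazing_shares(distribution):
    """Share of the total glazing placed on each side for one distribution label."""
    sides = ["north", "east", "south", "west"]
    if distribution == "uniform":
        return {side: 0.25 for side in sides}
    # Non-uniform: 55% on the named side, 15% on each of the other three.
    return {side: 0.55 if side == distribution else 0.15 for side in sides}


def glazing_options():
    """All glazing configurations: no glazing, plus every area/distribution pair."""
    return [(0.0, None)] + list(product(GLAZING_AREAS, DISTRIBUTIONS))


def design_grid():
    """Every (form, orientation, glazing area, distribution) combination."""
    return [
        (form, orientation, area, dist)
        for form, orientation, (area, dist) in product(
            BUILDING_FORMS, ORIENTATIONS, glazing_options()
        )
    ]


print(len(glazing_options()), len(design_grid()))  # 16 768
```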
Figure 1
Figure 2
The building simulations specified by the dataset were generated using the Ecotect building simulation software, and both heating load and cooling load were recorded from each simulation. While the simulated results are not guaranteed to be perfectly accurate, they should give a good indication of how each feature directly affects the energy usage of a building.
Method and Results
One of my first steps with the dataset was to plot each input variable against each of the output variables. At a very basic level, this helps to visually determine which variables may be more important than others. Figure 3 shows each of the variables plotted against heating load, and Figure 4 shows each of the variables plotted against cooling load.
Figure 3
Figure 4
Next, I used three different correlation coefficients to measure the relationships between the inputs and the outputs. Each coefficient characterizes the relationship between an input and an output on a scale from -1 to 1: a value of 1 indicates a perfect positive correlation, a value of -1 indicates a perfect inverse (negative) correlation, and a value of 0 indicates no correlation at all. The three coefficients used were the Pearson product-moment coefficient, which measures linear correlation, and Spearman's and Kendall's rank correlation coefficients, which measure monotonic correlation based on the ranks of the values. The three coefficients yielded different but overall fairly consistent results. Table 1 shows the correlations of the inputs with heating load, and Table 2 shows the correlations of the inputs with cooling load.
Input variable               Pearson   Spearman    Kendall
Relative Compactness          0.6223     0.6221     0.3541
Surface Area                 -0.6581    -0.6221    -0.3541
Wall Area                     0.4557     0.4715     0.3424
Roof Area                    -0.8618    -0.8040    -0.6102
Overall Height                0.8894     0.8613     0.7040
Orientation                  -0.0026    -0.0042    -0.0031
Glazing Area                  0.2698     0.3229     0.2632
Glazing Area Distribution     0.0874     0.0683     0.0487
Table 1: Correlation of each input with heating load
Input variable               Pearson   Spearman    Kendall
Relative Compactness          0.6343     0.6510     0.3871
Surface Area                 -0.6730    -0.6510    -0.3871
Wall Area                     0.4271     0.4160     0.3035
Roof Area                    -0.8625    -0.8032    -0.6056
Overall Height                0.8958     0.8649     0.7063
Orientation                   0.0143     0.0176     0.0130
Glazing Area                  0.2075     0.2889     0.2398
Glazing Area Distribution      0.050     0.0465     0.0331
Table 2: Correlation of each input with cooling load
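The report does not state which software computed these coefficients. As a sketch of what each one measures, the following pure-Python functions compute all three for a pair of equal-length sequences. Spearman's coefficient is computed as Pearson's coefficient applied to average ranks, and Kendall's coefficient uses the tau-b variant, which corrects for ties. The choice of tau-b is an assumption, but tie handling matters here because several inputs take only a handful of distinct values.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from every count
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x_only) * (concordant + discordant + ties_y_only)
    )
    return (concordant - discordant) / denom
```

For example, with `x = [1, 2, 3, 4, 5]` and `y = [v ** 3 for v in x]`, both rank coefficients equal 1 because the relationship is perfectly monotonic, while Pearson's coefficient is about 0.94 because the relationship is not linear.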
I chose to approach this problem using a regression tree. Each internal node of the tree represents a conditional decision; from a programming standpoint, it is an if/else statement. Here, each node checks whether the value of a specific input feature is above or below some threshold and, based on the result, follows either the left or the right branch of the tree. Each leaf of the tree holds an estimated output value, which is assigned to any input vector that reaches it. Conceptually, a regression tree is simple to understand and easy to visualize, yet it can produce surprisingly accurate results. Figure 5 and Figure 6 show the regression trees generated for heating load and cooling load, respectively.
Figure 5
Figure 6
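The report does not say which software or settings produced the trees in Figure 5 and Figure 6. The following is a minimal, CART-style sketch of a regression tree in plain Python, written under that caveat: each internal node stores a feature index and a threshold (the if/else test described above), each split is chosen to minimize the summed squared error of the two children, and each leaf predicts the mean of its training targets. The stopping parameters `max_depth` and `min_leaf` are hypothetical and were not reported in the study.

```python
from statistics import fmean


def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = fmean(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, ys, min_leaf):
    """Return (cost, feature, threshold) for the lowest-error split, or None."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [y for row, y in zip(rows, ys) if row[f] < threshold]
            right = [y for row, y in zip(rows, ys) if row[f] >= threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, threshold)
    return best


def build_tree(rows, ys, depth=0, max_depth=6, min_leaf=5):
    """Recursively grow a regression tree as nested dicts."""
    split = best_split(rows, ys, min_leaf) if depth < max_depth else None
    if split is None or split[0] >= sse(ys):
        return {"value": fmean(ys)}  # leaf: predict the mean target
    _, f, threshold = split
    left = [i for i, row in enumerate(rows) if row[f] < threshold]
    right = [i for i, row in enumerate(rows) if row[f] >= threshold]

    def grow(idx):
        return build_tree([rows[i] for i in idx], [ys[i] for i in idx],
                          depth + 1, max_depth, min_leaf)

    return {"feature": f, "threshold": threshold, "left": grow(left), "right": grow(right)}


def predict(node, x):
    """Walk from the root to a leaf, applying one if/else test per node."""
    while "value" not in node:
        if x[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]
```

Training on one output would then be `tree = build_tree(X, heating)` followed by `predict(tree, x)` for a new feature vector `x`, where `X` and `heating` are placeholder names for the eight-feature rows and their heating-load targets. A separate tree would be grown for cooling load.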
Once the learner is trained, the next step is to test the performance of the decision tree by measuring its prediction error on data that was not used to train it. Cross validation is a common statistical resampling technique for this purpose: the dataset is divided into a training subset, which is used to generate the learner, and a testing subset, which is used to check how well the learner generalizes. In this case, I used 10-fold cross validation, in which the data is split into ten roughly equal parts and each part serves once as the testing subset while the other nine are used for training. Finally, the mean absolute error, mean squared error, and mean relative error are recorded. Table 3 shows these three error measures for heating load and cooling load.
Output variable   Mean absolute error   Mean squared error   Mean relative error
Heating Load      0.52 ± 0.16           1.10 ± 0.05          2.18 ± 0.61
Cooling Load      1.46 ± 0.21           6.59 ± 1.57          4.61 ± 0.68
Table 3: Cross-validation error for heating load and cooling load
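The evaluation procedure described above can be sketched as follows. This is a minimal, model-agnostic Python version that accepts any `fit` and `predict` functions. It assumes that the mean relative error is expressed as a percentage of the true value and that each ± figure in Table 3 is the standard deviation across the ten folds; the report states neither explicitly. The shuffling seed is also an assumption.

```python
import random
from statistics import fmean, stdev


def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices once, then deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fold_errors(y_true, y_pred):
    """MAE, MSE, and mean relative error (assumed to be in percent) for one fold."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mae = fmean(abs_err)
    mse = fmean([e * e for e in abs_err])
    mre = 100 * fmean([e / abs(t) for e, t in zip(abs_err, y_true)])
    return mae, mse, mre


def cross_validate(rows, ys, fit, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and repeat for every fold.

    Returns {metric: (mean across folds, standard deviation across folds)}.
    """
    per_fold = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        model = fit([rows[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [predict(model, rows[i]) for i in test_idx]
        per_fold.append(fold_errors([ys[i] for i in test_idx], preds))
    return {
        name: (fmean(values), stdev(values))
        for name, values in zip(("MAE", "MSE", "MRE"), zip(*per_fold))
    }
```

With a regression tree, `fit` would grow the tree from the training rows and `predict` would walk it for a single input vector; the function above is run once for heating load and once for cooling load.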
Conclusion
I built a regression tree that takes eight input variables and uses them to accurately predict heating and cooling load. I also analyzed the input data and correlated each feature with the outputs. These findings are significant because they may allow engineers to understand how changes to each input affect both heating and cooling load, and to make informed decisions while designing buildings without needing to run many lengthy simulations. It is worth noting that these results are limited by the accuracy of the simulations run in Ecotect; investigating the accuracy of those simulations is beyond the scope of this project. Nevertheless, the results could enable engineers and designers to save large amounts of time and to design more efficient buildings more quickly.