Jaqueline Beckhelling, Loughborough University
Presentation for Second TEDDINET Workshop, 4/5th June 2014
Day 2, Theme 1, Digital innovation for energy savings in buildings
Data management for the DEFACTO project
What is the purpose of data management on a project? There are many different
answers to that question and I can’t deal with all of them in 5 minutes, but one
aspect of data management which I have been thinking about a lot recently is: What
will happen to the data after the project is complete?
In the case of the DEFACTO project, the data should be shared, which means I need
to create a dataset which can be accessed by researchers who are not familiar with
the DEFACTO project. One vital thing I can do to facilitate access is to document
the data well; however, the importance of good documentation and the need to
develop metadata standards were acknowledged yesterday, so I will not discuss that
again today.
Instead, I want to ask you a different question to which I have been giving a lot of
thought: with whom am I going to be sharing these data? Or, to be more precise,
how do I need to structure this dataset so that it is as easy as possible for other
users to work with?
I can structure this dataset so that it will be as easy as possible to use with most
statistical analysis packages (I have worked with all of the major packages). Based
on my experience, I think the same structure will also be easy for Matlab users to
access. However, I have little experience of preparing data for EnergyPlus and
similar building modelling programs, so I need to make sure the data will be
useful for those users too. I think there will have to be some compromises made!
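To make this concrete, here is a minimal sketch of the kind of flat, long-format
structure I have in mind, with one row per reading; the column names and values are
purely illustrative, not the actual DEFACTO schema.

    # Illustrative sketch only: these column names and values are my own
    # guesses at a possible layout, not the actual DEFACTO schema.
    import pandas as pd

    # One row per reading ("long" format) is readable by SPSS, Stata, R, SAS
    # and Matlab alike, and avoids any package-specific binary format.
    readings = pd.DataFrame({
        "home_id":   ["H001", "H001", "H002"],
        "room":      ["living_room", "bedroom_1", "living_room"],
        "timestamp": pd.to_datetime(["2014-01-01 00:00",
                                     "2014-01-01 00:00",
                                     "2014-01-01 00:00"]),
        "temp_c":    [19.4, 17.8, 20.1],
    })

    # Plain CSV with ISO 8601 timestamps is the least-common-denominator
    # exchange format for all of these tools.
    readings.to_csv("defacto_pilot_temperatures.csv", index=False)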
There is also the method of creating the dataset to be considered: the data are
coming from a variety of sources and will not fit together neatly, like a jigsaw. I will
need the programmatic equivalent of a crowbar and a large hammer to get some of it
to fit together. That means using a database for the data manipulation, because
databases have very flexible and sophisticated methods of manipulating large
amounts of data. However, the best structure for a database will not be the best
structure for use with programs which do not have the data handling capacities of a
database.
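As a small illustration of the sort of manipulation I mean, the sketch below joins
readings from one source onto household information from another inside a small SQL
database; the tables and columns are invented for the example rather than taken from
the DEFACTO database design.

    # A minimal sketch of the kind of "crowbar" work a database makes easy:
    # joining readings from one source onto household information from
    # another. The table and column names are invented for this example.
    import sqlite3

    con = sqlite3.connect("defacto_pilot.db")
    cur = con.cursor()

    cur.executescript("""
        CREATE TABLE IF NOT EXISTS homes (
            home_id    TEXT PRIMARY KEY,
            house_type TEXT
        );
        CREATE TABLE IF NOT EXISTS readings (
            home_id   TEXT,
            room      TEXT,
            timestamp TEXT,
            temp_c    REAL
        );
    """)

    # The join lines the two sources up on home_id even though they arrive
    # in quite different shapes; this is the flexibility I am relying on.
    rows = cur.execute("""
        SELECT r.home_id, h.house_type, r.room, r.timestamp, r.temp_c
        FROM readings r
        JOIN homes h ON h.home_id = r.home_id
        ORDER BY r.home_id, r.timestamp
    """).fetchall()

    con.close()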
If the database structure is not optimal, it can affect the speed with which the
database operates. This could be a problem for the main DEFACTO project, as it
will include hundreds of homes and we will be monitoring the internal temperatures
of every room for up to 3 years.
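A rough back-of-envelope calculation gives a feel for the scale (the logging interval
and room count here are my assumptions for illustration, not project figures):

    # Back-of-envelope only: the logging interval and rooms-per-home figures
    # are assumptions for illustration, not DEFACTO specifications.
    homes = 300            # "hundreds of homes"
    rooms_per_home = 7     # assumed average
    interval_minutes = 5   # assumed logging interval
    years = 3

    readings_per_room = (60 // interval_minutes) * 24 * 365 * years
    total_readings = homes * rooms_per_home * readings_per_room
    print(f"{total_readings:,} temperature readings")   # roughly 660 million

On assumptions like these the total runs to several hundred million readings, which
is why an inefficient structure would be felt.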
However, at the moment we are carrying out the pilot, which is based on only 12
homes. So I have decided to give the database the structure I think will be needed
for the final dataset. I will be extracting the data which will be used in the
energy modelling programs, so I will see whether the data require a lot of
restructuring for input into those programs and, if necessary, I can adjust the
final dataset structure accordingly. I suspect the final structure of the dataset
will not be ideal for anybody, but I hope I can produce something which will not
present major access problems for anybody either.
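As a sketch of what that restructuring might look like, the example below pivots the
long-format pilot file from my earlier illustration into one temperature column per
room, which is closer to the sort of input some building modelling tools expect;
again, the file and column names are my illustrative ones, not the real schema.

    # A hedged sketch of one possible restructuring step: pivoting the
    # long-format pilot file from the earlier example into one temperature
    # column per room. File and column names follow my illustrative schema.
    import pandas as pd

    long_df = pd.read_csv("defacto_pilot_temperatures.csv",
                          parse_dates=["timestamp"])

    wide_df = (long_df
               .pivot_table(index=["home_id", "timestamp"],
                            columns="room",
                            values="temp_c")
               .reset_index())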
I have made a major assumption in what I am doing currently: I have assumed the final
dataset needs to be usable independently of a database. I have done this because few
of the people I know working in energy monitoring seem to have database skills, so I
assume that a dataset which has to be imported into a database and processed before
it can be used would be of limited use.
My main message for today is that I think I am unilaterally making a lot of key
decisions which will have major implications for the long term use of the DEFACTO
data. Should I be doing this? I do discuss the decisions I make with the other
members of the DEFACTO team, but still a lot of what I do is based on my
experience alone. Is this the ideal situation for a dataset which could potentially be
useful for a range of purposes and by a range of researchers?
I think it would be better if a group of TEDDINET researchers thought about the
structure we want for our shared data. Maybe it will be possible to come up with
common structures for at least some of the data we want to store. Maybe we can
only come up with guidelines. But if we had some degree of conformity in how
we store data, it would really improve the ease of access. As someone who has
spent more time than I like to think about working out how complex datasets are put
together for secondary analysis, I can assure you, it will be time well spent!