Household Energy Consumption classification of Greater London

David Goulvent
Submitted on the 15th of September 2012
Number of words: 9820
MSc Birkbeck University 2012

Abstract

Reducing energy consumption is the next frontier in the combat against climate change. The UK, as a member of the European Union, has a non-binding target of reducing energy use by 20% by 2020, with energy efficiency as the cornerstone of the government’s strategy. But with increased pressure from the European Commission to move to mandatory targets, and current energy policies not delivering to their full potential, improved targeting of energy efficiency measures at household level is key to successful policy. London alone, home to some 8 million people, has by far the highest number of hard-to-treat properties as far as energy efficiency improvements are concerned. The capital has nonetheless lost out under previous carbon schemes due to the higher cost of increasing building efficiency. Its intrinsic urban and cosmopolitan characteristics have been diluted in national energy-related classifications. This study creates a London classification for household energy consumption built on geodemographic principles. This classification can enable energy behaviour profiling at a regional (Greater London) level to serve as a basis for improved targeting. Seven distinct clusters emerged from this analysis, ranging from the energy-intensive wealthy owners to the fuel-poor social renters. The study has been carried out within the open-source philosophy. The analytical process was based on public data, and we have made the first step towards making the outcome of this study publicly downloadable on the web. Our final offline visualisation of the classification is presented on a map following the Charles Booth tradition, infamously tied to the very roots of geodemographic-based classifications.
Acknowledgements

Table of contents

Abstract
Acknowledgements
Table of contents
Tables, Figures and maps
Declaration
Introduction
I. Overview
1.1 Why Classify?
1.2 What are Area Classifications and Geodemographics?
1.3 Beyond definitions…
1.4 Cluster analysis
1.5 Energy consumption and geodemographics
1.6 Energy efficiency and policy
II. Method and data
2.1 Cluster analysis
2.1.1 Step 1: Data selection process
2.1.1.1 Variables selection
2.1.1.2 Scale
2.1.1.3 Demographic variables
2.1.1.4 House composition variables
2.1.1.5 Housing variables
2.1.1.6 Socio-economic and employment variables
2.1.1.7 Energy consumption variables
2.1.2 Step 2: Normalisation
2.1.2.1 Logarithmic transformation
2.1.2.2 Standardisation
2.1.3 Step 3: Clustering
III. Application, results and discussion
3.1 Application
3.1.1 Running the K-means
3.1.2 Selecting the appropriate number of clusters
3.2 Analysing and mapping the new geodemographic clusters
3.2.1 Step 4: The clusters and their descriptions
3.2.1.1 Cluster 1: Electricity-intensive city renters
3.2.1.2 Cluster 2: Fuel poor social renters
3.2.1.3 Cluster 3: Energy-intensive wealthy greys
3.2.1.4 Cluster 4: Average consuming London renters
3.2.1.5 Cluster 5: Wealthy energy intensive owners
3.2.1.6 Cluster 6: Average use suburban working families
3.2.1.7 Cluster 7: Low consuming strained renters
3.3 Visualisation
3.4 Limits and extensions
3.4.1 Ecological Fallacy
3.4.2 The Modifiable Areal Unit Problem
3.4.3 Dating data
3.4.4 Missing data
3.4.5 The Labelling of Clusters
3.4.6 Fuzzy classification?
3.4.7 Hierarchical classification
3.4.8 Classification validation
Conclusions
References
Data sources
Annexes

Tables, Figures and maps

Table 1: List of the 31 variables selected for input to the classification
Table 2: Homogeneity of the cluster membership size
Table 3: Number of LSOAs in each cluster

Figures
Figure 1: Average distance from the cluster centre by number of clusters
Figure 2: Summary of cluster 1
Figure 3: Summary of cluster 2
Figure 4: Summary of cluster 3
Figure 5: Summary of cluster 4
Figure 6: Summary of cluster 5
Figure 7: Summary of cluster 6
Figure 8: Summary of cluster 7
Figure 9: Visualisation of the building layer in Google Earth

Maps
Map 1: Study area localisation
Map 2: Localisation of LSOAs in cluster 1
Map 3: Localisation of LSOAs in cluster 2
Map 4: Localisation of LSOAs in cluster 3
Map 5: Localisation of LSOAs in cluster 4
Map 6: Localisation of LSOAs in cluster 5
Map 7: Localisation of LSOAs in cluster 6
Map 8: Localisation of LSOAs in cluster 7
Map 9: Visualisation of household energy consumption classification in choropleth map
Map 10: Visualisation of household energy consumption classification in Charles Booth’s style

Declaration

I have read and understood the section of the handbook that explains plagiarism, including that related to group work. I testify that, unless otherwise acknowledged, the work submitted herein is entirely my own. This dissertation is my own unaided work and has not been submitted for a further degree at any other Higher Education Institution. It does not exceed the word limit of 10,000 words.

Introduction

According to the Department of Energy and Climate Change (DECC, 2011), energy consumption from the domestic sector in 2011 was 38,842 thousand tonnes of oil equivalent, about 26% of total UK final consumption of energy products. A quarter of the UK’s carbon emissions come from the energy used in homes (The Energy Trust 2012). While most of the UK and the EU’s binding targets around the reduction of carbon emissions apply to businesses and industries, there is increasing pressure to cast the target net wider and implement change at the household level through a cut in residential energy use.
The EU has a non-binding target to cut energy consumption by 20% by 2020, with energy efficiency the main driver of that reduction (European Commission, 2008). Although this target is not yet binding, there is growing consensus within the European Commission that the only way to see Member States meet the target is to make it binding. The UK’s Energy Act 2011 already includes energy efficiency policies, with the forthcoming Green Deal set to soft-launch later this year. The challenge remains their reach and implementation across the different types of households, whether these are distinguished by geodemographics or by building type, for example. Current UK energy efficiency policies are more actionable for new buildings, and some progress has already been made on insulation, with DECC reporting increases of 6% in cavity wall insulation, 9% in loft insulation and 22% in solid wall insulation. Sizeable gains nonetheless remain to be made in existing households (DECC, a, 2011).

With the threat of binding EU targets looming, combined with rising wholesale energy prices, energy efficiency, which we will define as using less energy to provide the same level of performance, comfort and convenience, is firmly back on the agenda. But while there are tangible technological solutions available (smart metering, insulation, voltage optimisation, green grants), the uptake by different types of households remains a challenge, confined in some cases to a certain profile. Visualising relatively homogeneous groups of energy efficiency household types at a London Lower Super Output Area (LSOA) [1] level through a classification method could enable a better understanding of household behaviour and energy characteristics, which in turn could lead to improved policy targeting. “Area classifications provide a unique way of bringing together a real pattern for a range of variables” (Vickers and Rees, 2007; Webber and Craig, 1978).
Geodemographic classification systems are now widely used by geographers, policy makers and market researchers to underline the neighbourhood effect and to classify geographic areas according to the characteristics of the people living there, based on the principle that people living close to each other tend to be more similar than people living further apart. Such a classification would also help energy suppliers, increasingly at the forefront of energy efficiency measures due to government policies, to guide their customer base into reducing their energy use and, subsequently, their carbon emissions. “Geodemographic profiling also presents the opportunity to achieve savings by targeting communication programmes at populations to whom their messages are most appropriate” (Longley, 2005), and is therefore a key method for government to encourage uptake of new energy saving policies.

A large number of studies have been published in recent years on energy consumption, carbon emissions and their implications for climate change. However, there is still a lack of analysis linking these issues with geodemographic factors, as we will explain in our literature review.

[1] The average population of an LSOA in London in 2010 was 1,642 (http://data.london.gov.uk/)

This study will be limited to Greater London. The capital has by far the highest number of needy properties of any region in the UK. According to the Greater London Authority, London has lost out under previous carbon reduction schemes due to the higher costs of increasing building energy efficiency. According to a recent briefing by the Mayor of London on the upcoming Green Deal: “This is because energy companies have previously fulfilled their obligations wherever it [is] most cost-effective to do so, without regard to the potential to reduce comparatively high levels of fuel poverty in London and carbon emissions from some hard-to-reach housing stock” (Mayor of London, July 2012) [2].
There is already a fear that the forthcoming national Green Deal will not deliver the required results in London. The scheme includes the new Energy Company Obligation (ECO), which puts the burden on energy companies: they are expected to provide £1.3 billion a year towards the Green Deal for low-income and hard-to-insulate homes. The danger is that, as under previous schemes, energy companies will focus on treating properties and areas that are cheaper and easier to retrofit (Mayor of London, July 2012). Around 22% of hard-to-treat properties in England are in London. The London Mayor briefing states that Londoners in flats and mid-terraces could be excluded from the Green Deal, as this housing stock will be unable to access a high enough subsidy from the ECO to make the scheme viable. There is a call for an area allocation for London under ECO that takes into account its unique social, architectural and economic traits. A London energy classification could enable this.

[2] The London Mayor briefing is provided in the annex.

Map 1: Study area localisation

Hypothesis, aims and objectives

Our main hypothesis is that energy behaviour profiling at LSOA level is possible for London through the creation of a household energy consumption classification for Greater London. Will the groups created from our new classification be distinct enough to enable improved targeting and uptake of energy efficiency measures? To what extent can an open-source classification of households from a socio-economic and energy consumption perspective accelerate decision making and actionable targets around energy efficiency? The null hypothesis would be that such a classification adds nothing to the current academic debate on energy efficiency and consumption.

The aim of this study is not, however, to match existing energy-reducing policies with different London profiles.
Certain energy efficiency measures will be suggested following the description of the clusters to show how this type of classification can enable better targeting. An analysis of existing measures and their uptake as a result of this classification would be the subject of another study.

As far as energy efficiency is concerned, it is not financially viable to replace older, energy-inefficient dwellings with new, more energy-efficient buildings, so policy makers are also looking at improving older houses. Knowing where the least energy-efficient houses are and what type of people live in them can inform decisions on how to tackle the issue and determine who would be responsible for improvements. Enabling and visualising socio-economic profiling of the London population at LSOA level, based on a defined set of household energy characteristics, can help deliver the right campaigns and methods to see energy efficiency actions take off.

Finally, through the presentation of the final maps of this research project, we will insist on the advantages of exploratory visualisation methods in helping households, policy makers and energy organisations to actually see energy efficiency in London in a more intuitive way.

After presenting the academic foundations on which this study is built and outlining its relevance within the current debate, we will define our method, data selection and analysis. The next step will contain the description of our clusters, followed by insight into the limits of this study. Finally, we will conclude and establish whether the objectives of this study were met.

I. Overview

This section defines the principal concepts and theories on which this study is based, notably classification and geodemographics. It also situates this study within the context of existing research and touches on its relevance.

1.1 Why Classify?
According to Everitt, “a classification scheme may simply represent a convenient method for organizing a large data set so that it can be more easily understood and information retrieved more efficiently” (Everitt et al., 2010). Classification enables a greater understanding of socio-economic characteristics and behaviours, as it allows for the grouping or clustering of elements (people, buildings, areas etc.) with similar traits.

One of the best-known classification undertakings in the UK is the Output Area Classification (OAC), created in collaboration between the Office for National Statistics (ONS) and the University of Leeds. According to the OAC user group: “the OAC distils key results from the Census for the whole of the UK to indicate the character of local areas” (The OAC User Group) [3]. Vickers, who was tasked with the creation of the OAC in 2005 alongside Rees, created a hierarchy of seven, 21 or 52 classes from 41 Census 2001 variables in order to portray the main characteristics of households in the UK. Vickers explains: “[…] Clear distinctions can be made between neighbourhoods, for example on the basis of affluence, rurality or multiculturalism. The classification can answer many questions about the residential patterns of the UK at the start of the 21st century” (Vickers at the University of Leeds, School of Geography) [4].

1.2 What are Area Classifications and Geodemographics?

Area classification is the classifying of areas into groups of similarity, based on the characteristics of selected features within them (Everitt et al., 2001). In our study, area classification will refer to the segmentation of LSOAs in London, based on socio-demographic and energy consumption data. We now have to define geodemographics.

[3] http://areaclassification.org.uk/
[4] http://www.sheffield.ac.uk/geography/staff/vickers_dan/index
In the 1980s, geodemographic analysis through “neighbourhood classification became linked to the marketing roles of leading commercial organizations, particularly retailers” (Longley, 2005), while neighbourhood classification was considered obsolete in the academic and public sectors. Geodemographics is not just a set of off-the-shelf consumer targeting products; it is “the analysis of people by where they live” (Sleight 2004; Vickers and Rees, 2007). To Sleight’s rather concise and stripped-down definition, Birkin and Clarke (1998) add: “demography is the study of population types and their dynamics therefore geodemographics may be labelled as the study of population types and their dynamics as they vary by geographical area” (Birkin and Clarke, 1998; Vickers and Rees, 2006).

Geodemographics is based on the relationship between people and the place where they live. Knowing where somebody lives can reveal a lot of information about that person (Vickers et al., 2005). The concept of geodemographics is based on Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than those far apart” (Tobler 1970). Vickers et al. (2005) introduce another dimension to this law, which will be essential within this study of an urban area such as London. While those living closer to each other tend to be more similar, in urban areas some neighbourhoods on opposite sides of a large city will still have similar characteristics due to their position relative to the urban centre. In London, for example, two suburban neighbourhoods on opposite sides of the centre could still present the same characteristics, as some of our clusters will show.

We will retain the use of the term neighbourhood to define the different areas of our study. The term neighbourhood enables conceptualisation of the area in which you live (Vickers and Rees, 2007).
Singleton adds that the definition of “neighbourhood in a geodemographic sense is determined by the size of the geographic areas used to output the classification” (Singleton, 2007). The neighbourhood in our study will be the LSOA.

1.3 Beyond definitions…

Having given the relevant definitions around the notion of classification, let us step back to pinpoint the origin of the movement and recall some of its first uses. There is a consensus that Charles Booth and his studies of deprivation and poverty in London between 1889 and 1899 are at the origin of the movement. Booth and his researchers went all around London, visiting all households to interview the inhabitants. They then classified them into seven categories, such as “Well-to-do” or, more derogatorily, “Vicious, semi-criminal”.

Since the publication of the 1981 UK Census of Population results, geodemographic classifications have occupied a large position in the private sector, and “by the middle of the decade four main systems were competing for dominance, ACORN, PiN, Mosaic and Super Profiles” (Vickers et al., 2005). Even though these commercial companies always used Census data, they added “other type of data from County court judgements, credit reference agencies, vehicle registrations, and lifestyle surveys” (Harris et al., 2005). The main issue with commercial geodemographic classifications is that they are created as ‘black-box’ systems (Longley and Singleton, 2009), with little transparency around the raw data and methods used to build them.

Geodemographic classifications are still heavily used in the marketing industry (Vickers et al., 2005). They make it possible, for example, to determine the customer archetype and location for new products, or to estimate opinion polls. As Gale et al.
(2012) note, “the use of geodemographic classifications has become popular in different areas with applications in health (Farr and Evans, 2005; Shelton et al., 2006), policing (Ashby and Longley 2005), education (Singleton, 2010) and local government (Longley and Singleton 2009)”. The use of geodemographics in a public health setting can help with service delivery planning or campaign targeting.

1.4 Cluster analysis

We will base our geodemographic classification for this study on cluster analysis. “Cluster analysis is a generic term for a wide range of numerical methods for examining multivariate data with a view to uncovering or discovering groups or clusters of homogeneous observations” (Everitt et al., 2001). It is a human reflex to group similar objects into categories. Objects within the same cluster are expected to share similar characteristics and to be dissimilar to the objects in other clusters. A detailed description of the steps of a cluster analysis is presented later, in the method section of this study.

1.5 Energy consumption and geodemographics

While the creation of the open-source geodemographic system OAC by the ONS has been an important step towards a better understanding of socio-economic traits based on location, the question of whether it was pertinent for a large city like London was raised ahead of the latest Census. A new classification for London was introduced for the 2011 Census to better capture the specificities of the UK’s largest city. As early as 2007, Petersen et al. created a new regional geodemographic classification of London for health applications. At the time, they said “the National OAC did not represent the variation measured as market penetration potential across the 41 Census variables in Greater London very well”.
Gale and Longley assessed the OAC 2001 in order to improve the forthcoming OAC 2011. They underlined the fact that “London's economic, political, cultural and infrastructural characteristics set it apart from the rest of the United Kingdom and to a large extent the rest of the world” (Petersen et al., 2010; Gale and Longley, 2012). They insisted that a national classification such as OAC 2001 did not properly capture the diversity of London.

DECC analysis shows London to be one of the regions with the highest domestic consumption of gas and electricity in the UK. However, and yet again highlighting the benefit of using a regional classification for London, it is important to note that this does not mean households in London consume more than those in the rest of the country. In fact, energy consumption per capita in London is relatively low. To understand and account for the specificity of regional areas within a national territory, Petersen et al. (2010) propose creating a regional classification to co-exist alongside a national classification. Considering the scale of this work and the particularities of London, this study will focus only on Greater London.

A large number of studies have been published in recent years on energy consumption, carbon emissions and their implications for climate change. However, there is still a lack of analysis linking these issues with geodemographic factors. There are studies showing a relationship between energy consumption and income, such as the one undertaken by Druckman and Jackson (2008), which demonstrates a link between energy consumption, carbon footprint and environmental awareness on the one hand and socio-economic elements on the other. These studies are limited in their approach, as they focus mainly on correlations. Druckman’s paper, for example, uses the two extreme opposite groups from the OAC, which we know is not sufficient for a pertinent London analysis. In this study we will go beyond correlations to create London’s own energy classification.
An outcome similar to the one we will attempt to produce here in terms of classification was reached by Experian in 2008, when they created a carbon emissions classification called GreenAware “to identify priority areas and households and to map regional variations in behaviour and attitudes toward carbon reduction” (Experian 2008). However, as is the case with some of their other geodemographic classifications, the full range of data used and the method followed to create this classification are obscure and not entirely presented: the infamous black-box system. A PhD with a similar objective to the one presented here is currently being undertaken by Sarah Goodwin (UCL), but it has yet to be published.

1.6 Energy efficiency and policy

The UK’s residential energy efficiency policy is changing. From October 2012, the government’s efforts to improve energy efficiency will be delivered through the Green Deal and the new Energy Company Obligation (ECO). Soft-launching next month, with the bulk of residential policies starting in January 2013, the Green Deal and ECO will kick-start a new range of measures centred on low upfront costs for household efficiency improvements.

As a member of the EU, the UK has a non-binding target to reduce energy consumption by 20% by 2020. Although it appears to be quietly shying away from Brussels’ wish to make this target mandatory, the UK has nonetheless made progress over recent years in sectors such as transport, industry and buildings.

London alone has a large role to play in meeting targets, housing around 8 million people and home to 22% of known hard-to-treat properties. The Mayor has set a target to cut carbon emissions from the capital by 60% by 2025, with 35% of London’s emissions coming from households (Greater London Authority, 2012).
While national policies have a role to play in London’s aim to meet its carbon and energy efficiency targets, the characteristics of a dense urban area such as London have led to the creation of London-only measures or, more recently, to calls to adjust existing or upcoming measures to ensure that London, with its specific traits, does not miss out on crucial funding.

The uptake of energy efficiency measures is the main challenge the government and local authorities face. The barriers that can prevent uptake are diverse: behaviour and motivation, financial constraints, misaligned incentives and hidden costs. A London classification could help mitigate some of these challenges through a more thorough understanding of target areas and households.

II. Method and Data

2.1 Cluster analysis

Geodemographic classifications are built in several steps. The following outline shows the succession of steps and the elements to keep in mind when undertaking a geodemographic classification.

The first step is to select the data sources and variables. Then comes the normalisation of the input data to enable comparison between different data types. There are various methods used to normalise data, and we will explain later which one we used and the reasons behind our choice. The third step is clustering. As with the normalisation of data, there are various methods used to undertake cluster analysis, each with its advantages and limits. The last step, which will be presented in the results part of our study, consists of interpreting the results by naming the clusters, writing their pen portraits and describing their main characteristics.

2.1.1 Step 1: Data selection process

2.1.1.1 Variables selection

Data selection has been at the core of the debate surrounding geodemographic systems. While most geodemographic classifications are based on Census data, others also combine data from commercial surveys such as Experian or other public data.
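As a purely illustrative sketch of the normalisation and clustering steps outlined in section 2.1, and not the exact procedure used in this study, the following Python code chains a logarithmic transformation, a z-score standardisation and a basic K-means. The use of log(1 + x), the random choice of starting centres, the function names and the toy figures are all assumptions made for the example.

```python
import math
import random

def log_transform(rows):
    # Assumption: log(1 + x) is used so that zero values stay defined.
    return [[math.log1p(v) for v in row] for row in rows]

def standardise(rows):
    # z-score each column: subtract the column mean, divide by its
    # (population) standard deviation; constant columns are left at 0.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    # Squared Euclidean distance between two observations.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(rows, k, seed=0, max_iter=100):
    # Basic K-means (Lloyd's algorithm) with k randomly chosen starting centres.
    rng = random.Random(seed)
    centres = rng.sample(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(row, centres[j]))
                      for row in rows]
        if new_labels == labels:
            break  # assignments are stable: the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centres

# Made-up areas: [% flats, average electricity use in kWh]
toy_areas = [[5, 1000], [6, 1100], [7, 1050], [40, 4000], [42, 4200], [45, 4100]]
labels, centres = k_means(standardise(log_transform(toy_areas)), k=2)
print(labels)
```

On these toy figures the first three areas end up in one cluster and the last three in the other; which cluster receives which label depends on the starting centres.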
“Some of the geodemographic companies may have added non-Census variables (credit ratings, county court judgements) but the impact of such additions upon the classification systems is unclear” (Debenham, 2002). For this study, we will use 2001 Census data and energy consumption figures from DECC (2009).

There are several reasons to use only Census data and other free, publicly available data. The main one is to ensure that the final classification is open-source. Gale et al. (2012) explain the need for open, transparent and flexible geodemographic classifications, and we will keep within this philosophy in the current analysis. The pursuit of open-source output is increasingly facilitated by the current expansion of open data initiatives, which have resulted in an ever-increasing number of data sources becoming available to the public (Gale et al., 2012).

The main advantage of making the classification open-source is the possibility of freely publishing the results and maps on the web so that third parties, whether governmental organisations, private companies or households, can use them. Later, we will determine whether this aim has been reached.

Beyond the wish to conduct a fully open-source and freely available study, the use of publicly available data stems from issues around the accessibility of other types of data. Data from lifestyle databases derived from consumer surveys can be costly to include and legally difficult to publish in a raw state once used. Data collected from the private sector also has the drawback of not being representative, either geographically or demographically. As Vickers et al. (2003) explained, these data can be transferred to other spatial scales, but none of the methods available today has proven robust enough to secure a satisfactory level of accuracy.

With the OAC, we noted the lack of a variable on income.
Vickers and Rees assume that census variables such as car ownership and type of housing provide a good proxy for income (Vickers and Rees, 2006). We did not select the car ownership variable here, but in the description of the selected variables we will explain how we aim to overcome this missing information.

We first had to consider all the data available and find a way to condense them to a level that would give us enough information on our subject without clouding the analysis with unnecessary detail. As Openshaw and Blake underline, the choice of variables and their specification has to reflect the explicit purpose (Blake & Openshaw, 1995); the variables must inform us about general household characteristics as well as explain energy consumption. The selection of data has been influenced by the OAC, taken as a benchmark of geodemographic classification. In addition to personal judgement, in some cases we looked at whether certain variables were highly correlated. However, this does not mean we excluded all correlated variables.

2.1.1.2 Scale

Our choice to use the LSOA as the primary scale for this study has been driven by data availability. While the Census 2001 data were available at Output Area (OA) level, the smallest geographical unit for Census data, energy consumption data were only available at LSOA level.

For this study, we chose the following types of data: demographic structure, household composition, housing, socio-economic and energy consumption.

2.1.1.3 Demographic variables

As in the OAC, gender was rejected, as most areas have a similar gender proportion. With regards to age, we took the same age ranges as the OAC, whose authors grouped Census categories together to avoid variable redundancy. We merged the age variables to have only 4 variables, representing the 5 to 14 year olds, the 25 to 44, the 45 to 64 and the 65 and over.
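The merging of Census age categories into the four age variables can be sketched as follows. This is a hedged illustration only: the detailed band labels, the `MERGED_BANDS` mapping, the function name and the choice to express each group as a percentage of the area's population are assumptions for the example, not a description of the actual Census tables or of the processing used in this study.

```python
# Hypothetical detailed age bands mapped to the four merged age variables.
MERGED_BANDS = {
    "age_5_14": ["5-7", "8-9", "10-14"],
    "age_25_44": ["25-29", "30-44"],
    "age_45_64": ["45-59", "60-64"],
    "age_65_plus": ["65-74", "75-84", "85-89", "90+"],
}

def merge_age_bands(counts, total_population):
    """Return each merged age group as a percentage of the area's population.

    counts: dict of detailed band label -> number of residents in one area.
    Bands not listed in MERGED_BANDS (e.g. 15 to 24) are simply ignored.
    """
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return {
        name: 100.0 * sum(counts.get(band, 0) for band in bands) / total_population
        for name, bands in MERGED_BANDS.items()
    }

# Made-up counts for a single area of 1,000 residents.
example_counts = {
    "5-7": 40, "8-9": 25, "10-14": 60, "15-24": 150,
    "25-29": 100, "30-44": 250, "45-59": 150, "60-64": 50,
    "65-74": 60, "75-84": 30, "85-89": 8, "90+": 2,
}
print(merge_age_bands(example_counts, 1000))
```

Working with shares rather than raw counts keeps areas of different population size comparable before normalisation.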
We did not include the 15 to 24 year olds as this group tended to be prone to changes in residential circumstances. We did however include the student variable, which was correlated with this age group. In terms of demographic data, we also included the number of persons per hectare, a measure of population density, as it seems a good indicator of the type of urban area and puts high-energy consumption areas into perspective.

2.1.1.4 House composition variables

“Household composition is, as expected, another significant factor in domestic energy use and associated carbon emissions,” (Druckman and Jackson, 2008). We have selected the following variables: two adults with children and two adults without children. They both represent the aggregation of cohabiting and married couples. Households with non-dependent children were rejected, as they made up only a small proportion of households. We included one-person households, one-person pensioner households and lone-parent households, as there is an increasing tendency towards the one-person household. The ONS noted that the proportion of lone-parent families, and consequently of smaller family sizes, had been increasing since 1991 (ONS, a, 2011). We chose not to include students as a variable at this stage, under house composition, as we have selected a student-related variable under our socio-economic data.

2.1.1.5 Housing variables

As well as being related to energy consumption, housing characteristics are also a good indicator of household wealth. For this study, we included owned, privately rented and socially rented households. This type of knowledge would enable better targeting of energy efficiency measures. As highlighted by Druckman and Jackson, the issue of uptake of energy efficiency measures in rented accommodation remains current (Druckman and Jackson, 2008).
They underline the high proportion of dwellings with less loft insulation in the private rented sector: it is difficult to convince landlords to make any improvement to their properties, as they have to pay for it without directly benefiting from the energy efficiency and financial gains. As we mention later in this study, a new policy, ECO, under the forthcoming Green Deal, is looking to address how to increase energy efficiency in rented housing stock. Inefficient households will no longer be fit to rent come 2018.

‘House type’ refers to whether dwellings are semi-detached, terraced houses, detached houses or flats. “This is significant in energy terms because heating energy is related to external wall area and window area. Flats tend to have less external wall area compared to their floor area (so have less heat loss in winter), while detached houses typically have more external wall and more windows than equivalent homes of other types,” (DECC, b, 2011). The average heat losses of different types of dwellings range from 365 W/°C for a detached house down to 182 W/°C for a flat (Druckman and Jackson, 2008).

We also include data showing the average number of rooms per household and the average number of people per room. These two variables are strong indications of the level of wealth, as well as having an impact on energy consumption.

2.1.1.6 Socio-economic and employment variables

Selecting variables to best understand the social and economic status of households can prove to be a more subtle exercise. Based on previous geodemographic classifications, we considered variables on occupation group, economic activity and the National Statistics Socio-economic Classification (NS-SEC). The number of variables on the type of occupation was relatively high. We could have aggregated certain occupations as they were highly correlated, but we would still have had at least 5 groups of occupation, which could create confusion in the interpretation.
As for the economic activity variables, they lacked insight into the full-time employee. People described by the full-time employee variable can have radically different statuses and therefore incomes.

“The NS-SEC is the primary social classification in the United Kingdom” (Wikipedia 2012). The full version of NS-SEC has 17 main categories and is collapsible down to three categories. The advantage of using the NS-SEC is that “it has been constructed to measure the employment relations and conditions of occupations” (ONS, b, 2012). It groups jobs of similar social and economic status into classes using the occupation and employment type data, and can therefore be seen as a condensed result of the association of these two sets of data. There are 3 versions of NS-SEC: eight classes, five classes or three classes. For our study we use the three-class version, which “is the only representation […] that might be assumed to involve some kind of hierarchy” (Rose et al 1998). We add an additional class, Never worked and long-term unemployed. We also include the variables full-time student and retired.

At this stage we chose to include the retired variable despite having already selected the one-pensioner household under house composition. As we will see in the cluster composition, the one-pensioner household is more characteristic of financially strained areas, whereas retired appears alongside other, more affluent characteristics. Not including the retired variable at this stage would have omitted all households characterised by retired couples.

The industry sector variables, including agriculture, fishing and mining, were not representative enough of London characteristics, as some sectors are almost non-existent in London.

We also included the percentage of people with no qualifications and the percentage of people educated to degree level or more.
Even though the proportion of people without qualifications is correlated with other variables such as social rented housing, education characteristics offer insight into how best to communicate new policies. Education is also correlated with green awareness and is therefore an important element to consider when deciding what approach to take for different groups.

2.1.1.7 Energy consumption variables

DECC has collated and analysed property level electricity and gas consumption data since 2004. “These administrative datasets provide total and average consumption of domestic ordinary electricity, and gas at LSOA level. The data cover annual consumption for 2009. The data cover all metered domestic gas and electricity consumption,” (DECC, footnote 5). “To produce these electricity consumption estimates, annualised consumption data is provided to DECC at Meter Point Administration Number (MPAN) level by the data aggregators (DAs). DAs are agents of the electricity suppliers who collate/aggregate electricity consumption levels for each electricity meter in Great Britain” (DECC). “A similar process is used to compile gas data. Gas transporters supply DECC with the Annualised Quantity (AQ) for each Meter Point Reference Number (MPRN) or gas meter as well as address point data,” (DECC).

We also decided that it was necessary for the relevance of this study to include the fuel poverty variable. “Low-income households, who spend proportionately more of their incomes on energy, are hit much harder by energy cost rises. Their demand for energy tends to be more elastic than wealthier households, meaning that they tend to use less if prices rise” (DECC, b, 2011). As we will discover in the cluster we define as Fuel Poor Households, consumption of energy is low, and this is due not to green awareness but to financial barriers.
Any measures looking at increasing energy prices as a way to curb energy demand must look hard at the impact (health, well-being) on the fuel-poor.

Footnote 5: From Energy consumption 2009 Metadata.

Variable  Definition

Demographic
v1   Age 5–14: percentage of resident population aged 5–14 years
v2   Age 25–44: percentage of resident population aged 25–44 years
v3   Age 45–64: percentage of resident population aged 45–64 years
v4   Age 65+: percentage of resident population aged 65 or more years
v5   Population density: population density (the number of people per hectare)

Household composition
v6   Single pensioner household: percentage of households which are single-pensioner households
v7   Single person household (not pensioner): percentage of households with one person who is not a pensioner
v8   Two adults with children: percentage of households which are cohabiting or married couple households with children
v9   Two adults no children: percentage of households which are cohabiting or married couple households with no children
v10  Lone parent household: percentage of households which are lone parent households with dependent children

Housing
v11  Owned: percentage of households that are owned
v12  Social rented: percentage of households that are public sector rented accommodation
v13  Private rented: percentage of households that are private or other rented accommodation
v14  Detached housing: percentage of all household spaces which are detached
v15  Semi-detached housing: percentage of all household spaces which are semi-detached
v16  Terraced housing: percentage of all household spaces which are terraced
v17  All flats: percentage of households which are flats
v18  People per room: average number of people per room
v19  Average house size: average house size (rooms per household)
v20  Without central heating; with sole use of bath/shower and toilet

Socio-economic and employment
v21  Managerial and professional occupations
v22  Intermediate occupations
v23  Routine and manual occupations
v24  Never worked and long-term unemployed
v25  Full time student (economically active)
v26  Retired
v27  No qualifications
v28  Level_4_5 (educated to degree level or more)

Energy consumption
v29  Consumption of Ordinary Domestic Electricity 2009
v30  Consumption of Domestic Gas 2009
v31  Fuel Poverty 2009

Table 1: List of the 31 variables selected for input to the classification

2.1.2 Step 2: Normalisation

2.1.2.1 Logarithmic transformation

Before standardising and clustering the data, we transformed them to a log scale. "Using logs is one of several ways in which the effect of outliers can be reduced" (Harris et al. 2005). The second aim of the log transformation is that it “reduces the likelihood of a highly skewed distribution within a variable which would create uneven cluster sizes” (Vickers & Rees, 2005). To carry out the log transformation, we used the software SPSS 17. A log transform can only be applied to numbers above 0, so we added the value 1 to all the data in the dataset.

2.1.2.2 Standardisation

After transforming the data to a log scale, we standardised the input data. This step is essential to bring the data, whether rates or measures, onto the same scale and so enable comparisons. Each variable has been standardised following the Range method. “In this method [Range] the minimum and the maximum of the data values are computed (thus the range – max-min) and each value has the min. subtracted from it and is then divided by the range” (De Smith et al., 2007). Not standardising the data would give too much weight to variables with larger values or wide variations. This method was also used in the ONS 1991 classification of local authorities and for the UK National Statistics 2001 OAC.

2.1.3 Step 3: Clustering

Different methods that can be used to create such a classification have been widely discussed. There is an abundance of different clustering algorithms available: hierarchical clustering, K-means and fuzzy K-means, to cite only a few.
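The two normalisation steps of section 2.1.2 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the SPSS 17 procedure we actually ran: the function names and the sample percentages are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated above.

```python
import math

def log_transform(values):
    """Add 1 to every value, then take the log, so that zeros stay defined.
    The natural log is an assumption; the base used in SPSS is not stated."""
    return [math.log(v + 1) for v in values]

def range_standardise(values):
    """Range method: subtract the minimum, then divide by (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant variable carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Hypothetical values of one variable (for example v12, social rented, in %)
# across five LSOAs. These numbers are invented for illustration.
social_rented = [0.0, 4.5, 12.0, 38.0, 71.5]
scaled = range_standardise(log_transform(social_rented))
```

After both steps every variable lies between 0 and 1. Because logarithms in different bases differ only by a constant factor, and the Range method divides that factor out, the choice of base does not affect the standardised values.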
For this study we will use “the k-means classification as one of the most commonly used methods in the geodemographics industry” (Harris et al., 2005). This method also suits large data sets. For the k-means analysis, once the number of clusters has been chosen, “the algorithm iteratively estimates the cluster means and assigns each case to the cluster for which its distance to the cluster mean is the smallest” (SPSS 17). “The data is randomly split into K clusters and the distances between the cluster centres and the observation values in m-dimensional space are measured using the Pythagorean equation for Euclidean distances” (Debenham et al., 2001). The algorithm then recalculates the centre means from the objects in each cluster. These new centre means are used to classify the objects again. This step is repeated until the values are close or identical to the previous ones, at which point we define the situation as stable.

III. Application, results and discussion

3.1 Application

3.1.1 Running the K-means

There are several statistical packages that enable clustering analysis (R, SPSS, and Microsoft Excel). For this work, we used SPSS 17, as Birkbeck College provides free access for students and the software has a user-friendly interface. R could have been an interesting free alternative for this work.

It is advised to run the algorithm until the results are stable. This part was time consuming, as we ran the cluster analysis to the point where the means of the clusters were identical for two successive runs. We encountered a problem with one LSOA, which formed a group on its own. Once identified, we removed it from the dataset before running the cluster analysis again. The LSOA was added later to the appropriate cluster. To decide which cluster this LSOA should belong to, we compared the average of the differences between its variable values and the mean centre of each cluster. The number of clusters (K) has to be specified before running the analysis.
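The iterative procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SPSS 17 implementation: the initial centres are drawn at random from the data rather than from a random split of the cases, all function names are hypothetical, and the reading of "average of the differences" as a mean absolute difference in `closest_by_mean_abs_diff` is our interpretation for this sketch.

```python
import math
import random

def euclidean(a, b):
    """Euclidean (Pythagorean) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_point(points):
    """Centre of a set of points: the mean of each variable."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(points, k, seed=0, max_iter=100):
    """Assign each case to its nearest centre, recompute the centres,
    and repeat until the assignment no longer changes (stable)."""
    rng = random.Random(seed)
    # Assumption: start from k cases picked at random as initial centres.
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda i: euclidean(p, centres[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # two successive runs agree
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centre if a cluster is empty
                centres[i] = mean_point(members)
    return labels, centres

def closest_by_mean_abs_diff(case, centres):
    """Place a held-out case (such as the removed LSOA) in the cluster whose
    centre has the smallest average absolute difference from it."""
    diffs = [
        sum(abs(x - c) for x, c in zip(case, centre)) / len(case)
        for centre in centres
    ]
    return diffs.index(min(diffs))

# Invented two-variable data with two obvious groups.
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centres = k_means(data, k=2)
```

On real data the result can depend on the starting centres, which is one reason for repeating the analysis until the cluster means stop changing.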
Callingham (2003) suggested that “the most useful number of clusters in the first level would be around 6” (Vickers et al., 2005), so we repeated the clustering process with values of K from 3 to 8 to find the best results.

3.1.2 Selecting the appropriate number of clusters

There is no single way to choose the number of clusters; it mainly depends on the data selected, as well as on a personal interpretation of how many typologies the classification should create (Debenham et al., 2001; Vickers and Rees, 2007). To help our choice we conducted two recommended basic analyses: observing the average distance of each object from the mean centre of its cluster, and looking at the number of cases in the clusters.

Figure 1 shows the average distance of each case from its cluster centre. It is normal to see the values decreasing as the number of clusters increases. It is hard to see an obvious solution; however, the line rises more steeply when moving from 6 to 5, from 5 to 4 and from 4 to 3 clusters. For this reason we consider it reasonable to keep 6, 7 and 8 as possible numbers of clusters.

Figure 1: Average distance from the cluster centre by number of clusters

The second element weighing on the choice of the number of clusters is the number of members in each cluster. It is best to have members distributed as evenly as possible across the groups. Although we cannot expect exactly the same number of members, it is imperative to avoid having the majority of cases in one or two clusters and then a number of sparsely populated groups (Debenham et al., 2001). To assess this for each possible value of K, we calculated “the average difference between the number of members in each cluster from the mean -the mean is the optimal solution as all clusters will have the same number of members” (Vickers and Rees, 2007). Results can be seen in table 2. The values 7 and 8 stood out from this method.
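The homogeneity measure quoted above can be reproduced from the membership counts reported in table 2. The sketch below is illustrative only: the function names are hypothetical, and rounding halves upwards is an assumption made so that the results can be compared with the whole numbers in the table.

```python
import math

# Number of LSOAs in each cluster for K = 3 to 8, as reported in table 2.
CLUSTER_SIZES = {
    3: [1375, 1586, 1803],
    4: [1304, 1102, 855, 1503],
    5: [938, 1438, 731, 723, 934],
    6: [718, 559, 727, 649, 1057, 1054],
    7: [693, 648, 446, 997, 704, 660, 616],
    8: [496, 693, 389, 677, 557, 802, 548, 602],
}

def mean_abs_deviation(sizes):
    """Average absolute difference between each cluster size and the mean
    size (the mean being the perfectly even solution)."""
    mean = sum(sizes) / len(sizes)
    return sum(abs(s - mean) for s in sizes) / len(sizes)

def round_half_up(x):
    """Round to the nearest whole number, with halves rounded upwards."""
    return math.floor(x + 0.5)

scores = {k: round_half_up(mean_abs_deviation(s))
          for k, s in CLUSTER_SIZES.items()}
# scores -> {3: 143, 4: 213, 5: 194, 6: 174, 7: 101, 8: 98}
```

The lower the score, the more evenly the LSOAs are spread across the clusters, which is why K = 7 and K = 8 stand out.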
Clusters    Number of members in each cluster           Average distance from the mean
3 clusters  1375 1586 1803                              143
4 clusters  1304 1102  855 1503                         213
5 clusters   938 1438  731  723  934                    194
6 clusters   718  559  727  649 1057 1054               174
7 clusters   693  648  446  997  704  660  616          101
8 clusters   496  693  389  677  557  802  548  602      98

Table 2: Homogeneity of the cluster membership size

Considering the results from the two methods, it appears the optimal number of clusters should be 7 or 8. After looking at the cluster characteristics for 7 and 8 clusters, we decided to take 7 clusters, as in the 8-cluster solution there was a redundancy in the types of areas.

Cluster      1      2      3      4      5      6      7      Range
LSOAs        693    648    446    997    704    660    616    551
% of LSOAs   14.55  13.6   9.36   20.93  14.78  13.85  12.93  -

Table 3: Number of LSOAs in each cluster

3.2 Analysing and mapping the new geodemographic clusters

3.2.1 Step 4: The clusters and their descriptions

Pen portraits are “small descriptive analyses of the clusters that draw upon their main identifiable characteristics” (Debenham et al., 2001). The aim when defining clusters is to find a name that can quickly convey the main type of group without offending or introducing negative connotations. For example, Vickers and Rees reject the word “elderly” as it could portray old age in a negative sense (Vickers & Rees, 2007). Names can often be too specific and give a stereotype of a cluster that, although it may be an accurate representation of the mean values or the cluster centre, does not represent any of the diversity within the cluster (Vickers, 2006). Vickers has criticised commercial geodemographic labelling, as some cluster names emphasise certain groups or have negative undertones (Vickers, 2006). They also insist on not repeating any group name that may already have been used by other classifications.
After characterising the main elements of each cluster, we will briefly put forward policies from the set-to-launch Green Deal as well as other existing measures in place to curb energy use. However, as stated in the introduction, the aim of this study is not to match up existing and forthcoming legislation and policy around energy efficiency with the different clusters, but to create a classification that would enable a more precise targeting of energy efficiency policies.

For the order in which the clusters are presented, we had the choice between using the energy consumption variables to create an ordinal order or simply using the arbitrary order generated by the cluster analysis. We preferred to describe the clusters in the same order as they appeared after the cluster analysis, to avoid any subjective idea of superiority or inferiority between them.

For each cluster portrait, we add a radial plot showing the values for each variable. The values are the difference from the mean for that variable. The mean, which therefore sits at 0, is also represented in each radial plot. We have also added a separate basic map for each cluster representing the distribution of its LSOAs, which makes it possible to locate each group geographically.
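As a rough illustration of how the values shown in the radial plots can be derived, the sketch below subtracts the overall mean of each variable from the mean of that variable within each cluster. The function names and the small data set are hypothetical; we assume the inputs are the standardised variables and that "the mean" refers to the mean over all LSOAs.

```python
def mean_by_variable(rows):
    """Mean of each variable (column) over a list of cases (rows)."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def cluster_profiles(rows, labels):
    """For each cluster, the difference between its centre and the overall
    mean, variable by variable. A variable at the overall mean gives 0."""
    overall = mean_by_variable(rows)
    profiles = {}
    for lab in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == lab]
        centre = mean_by_variable(members)
        profiles[lab] = [c - m for c, m in zip(centre, overall)]
    return profiles

# Invented standardised values for two variables in four LSOAs.
rows = [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1], [0.7, 0.3]]
labels = [1, 1, 2, 2]
profiles = cluster_profiles(rows, labels)
```

Positive values correspond to the "High" variables listed for a cluster and negative values to the "Low" ones.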
3.2.1.1 Cluster 1: Electricity-intensive city renters

Distinctive variables:

High: Age 25-44; Single person household; Two adults with children; Two adults no children; Private rented; All flats; Managerial and professional occupations; Educated

Low: Age 5-14; Semi-detached; People per room; Average house size; Routine and manual occupations; Retired; No qualifications; Fuel poverty

Figure 2: Summary of cluster 1 (radial plot of variables V1 to V31 for cluster 1 against the mean)

Map 2: Localisation of LSOAs in cluster 1

This group, predominantly situated in or close to the city centre, is characterised by a high proportion of managerial and professional occupations within the 25-44 years age bracket. They are mainly concentrated in the inner city and live in flats rather than houses. They are educated people in the prime of their active life. The house composition is essentially represented by single people, couples and couples with children younger than 5 years old. The low proportion of people without qualifications, of fuel poverty and of routine and manual occupations within this cluster underlines the fact that this group represents young wealthy professionals who are still renting and have yet to get on to the property ladder. Well-equipped and usually living in flats, their electricity consumption is the highest of all the groups, while their gas consumption is close to the average. This group has the advantage of being aware of climate change issues in relation to their energy consumption but the disadvantage of being trapped in the landlord-renter dilemma, as they can’t improve the energy efficiency of their home without landlord buy-in.
3.2.1.2 Cluster 2: Fuel poor social renters

Distinctive variables:

High: Population density; Lone parent household; Social rented; All flats; Never worked and long-term unemployed; No qualifications; Fuel poverty

Low: Owned; Detached; Semi-detached; Terrace; People per room; Average house size; Managerial and professional occupations

Figure 3: Summary of cluster 2 (radial plot of variables V1 to V31 for cluster 2 against the mean)

Map 3: Localisation of LSOAs in cluster 2

The proportion of social renters and of people who have never worked or are in long-term unemployment is the highest of all clusters. These areas are also characterised by a large number of people without qualifications living in small flats. This group probably represents the most financially strained type of household, with the highest rates of social renting and of household fuel poverty. A household is defined as being in fuel poverty when it spends more than 10% of its income on heating. John Hills, in his recent report, suggests another definition, which he takes from the Warm Homes and Energy Conservation Act 2000 and which introduces the notion, if not directly mentioned, of inefficient households: those living on a lower income in a home that cannot be kept warm at a reasonable cost (Hills, J, 2012). This group has the highest rate of lone parent households. Detached, semi-detached or terraced houses are rare. The consumption of gas and electricity is the lowest of all clusters. For this group, energy consumption is dictated by budget rather than by any aspiration to environmental benefits. There is a clear overlap between low income and the inefficiency of the homes people live in, and successful policies for this group would benefit from addressing not only the environmental issues associated with energy inefficient homes but also the reduction of low income households' spending on energy.
This group is roughly situated in what we will call the inner city areas, often in difficult-to-treat housing stock.

3.2.1.3 Cluster 3: Energy-intensive wealthy greys

Distinctive variables:

High: Age 45-64; Age 65+; Owned; Detached; Semi-detached; Average house size; Retired

Low: Population density; One person household; Two adults with children; Two adults no children; Lone parent household; Social rented; Private rented; Terrace; All flats; Without central heating; Never worked and long-term unemployed

Figure 4: Summary of cluster 3 (radial plot of variables V1 to V31 for cluster 3 against the mean)

Map 4: Localisation of LSOAs in cluster 3

This cluster represents the older generation, with a high proportion of people aged from 45 to over 65, generally towards the end of their professionally active life or already retired. The low proportion of couples with children confirms the population’s later age. They mainly work or worked in intermediate or managerial and professional occupations. They are the second highest gas consumers and their electricity consumption is also substantial. They own large detached or semi-detached houses in well-established residential suburbs. With 446 LSOAs, this cluster contains the smallest number of cases compared to the other clusters. This group is likely to be able to afford to maintain its current energy consumption. However, going forward and with changes to pension policies, energy efficiency is likely to become a more attractive option for the members of this group. Further educating this group about the benefits of energy efficiency could increase uptake of policies.
A recent survey from non-profit energy supplier Ebico (footnote 6) on how the forthcoming Green Deal could benefit the newly retired or those soon to retire still highlighted suspicion and misunderstanding over the potential benefits for this group, with only around a third of respondents saying they would take advantage of the new measures.

Footnote 6: https://www.ebico.org.uk/blog/2012/04/30/the-green-deal-an-opportunity-for-the-newly-retired/

3.2.1.4 Cluster 4: Average consuming London renters

Distinctive variables:

High: Lone parent households; Social rented; Private rented; Terrace; Never worked and long-term unemployed

Low: Age 65+; Retired; Consumption of Electricity

Figure 5: Summary of cluster 4 (radial plot of variables V1 to V31 for cluster 4 against the mean)

Map 5: Localisation of LSOAs in cluster 4

As for cluster 2, this group includes a strong proportion of social renters and of people who have never worked or are in long-term unemployment. However, comparing the mean centres for these 2 variables with those of cluster 2 makes it apparent that this group is not as financially deprived as the fuel-poor social renters. This group also includes a large proportion of privately rented accommodation, and the primary type of dwelling is terraced housing rather than flats, as is the case for cluster 2. There is also a non-negligible share of the population working in routine or manual jobs, implying that although not in the best-paid lines of work, they are still professionally active and are not solely dependent on government aid. The mean centres of the other variables, including the two energy consumption variables, are close to the average. This cluster also has the highest number of LSOAs. This group could represent in some ways the average household in London.
As this is the largest group, with some inevitable nuances, policies would have to include measures for landlords as well as awareness campaigns around renters’ rights with regard to energy efficiency.

3.2.1.5 Cluster 5: Wealthy energy intensive owners

Distinctive variables:

High: Owned; Private rented; Detached; Semi-detached; Managerial and professional occupations; Educated

Low: Lone parent households; Social rented; Routine and manual occupations; Never worked and long-term unemployed; No qualifications; Fuel poverty

Figure 6: Summary of cluster 5 (radial plot of variables V1 to V31 for cluster 5 against the mean)

Map 6: Localisation of LSOAs in cluster 5

This group comprises people with high levels of education who occupy high-responsibility positions. They are mainly owners of large detached or semi-detached houses, but there is still a fair proportion of privately rented accommodation. This cluster is the largest consumer of gas, as well as being the second biggest consumer of electricity. In this cluster there is a strong correlation between high energy consumption and wealth. In terms of energy efficiency measures, the fact that a high proportion of this group is highly educated can help awareness and buy-in. However, money is not really an issue for this group, so policies would potentially have to strike a chord with regard to climate change.
3.2.1.6 Cluster 6: Average use suburban working families

Distinctive variables:

High: Owned; Semi-detached; Terrace; People per room; Average house size; Intermediate occupations; Routine and manual occupations; Retired; No qualifications

Low: Single person household (not pensioner); Two adults with children; Two adults no children; Social rented; Private rented; All flats

Figure 7: Summary of cluster 6 (radial plot of variables V1 to V31 for cluster 6 against the mean)

Map 7: Localisation of LSOAs in cluster 6

A majority of the housing stock in this group is semi-detached and terraced. The predominant ages are 45 to 64 years as well as 5 to 14 years. Adding the fact that the number of people per room is high, we can conclude that this cluster represents older families, possibly with teenagers or grown-up children. They mainly work in intermediate, routine and manual occupations and the unemployment rate is low. Even though they own their house, the low level of qualification, the type of job and the choice of an area where houses are more affordable than in central London suggest that “money is still a concern”. As opposed to the previous group, the suburban working family is less concerned about climate change issues per se and more about saving money through energy efficiency.
3.2.1.7 Cluster 7: Low consuming strained renters

Distinctive variables:

High: Age 5-14; Age 65+; One person pensioner; Lone parent households; Social rented; Semi-detached; Routine and manual occupations; No qualifications; Fuel poverty

Low: Age 25-44; Private rented; Managerial and professional occupations; Educated

Figure 8: Summary of cluster 7 (radial plot of variables V1 to V31 for cluster 7 against the mean)

Map 8: Localisation of LSOAs in cluster 7

This cluster is characterised by a much higher proportion of the population without qualifications and working in routine and manual occupations, compared both to the London average and to the other clusters. The proportions of people in the 5-14 age group, of lone parent households with dependent children and of retired people suggest a more mixed type of household, with the common characteristic of being deprived to some degree. This idea is reinforced by the large proportion of social renters. This group also has the second highest rate of fuel poverty. Both their electricity and gas consumption are below average, and only cluster 2 has lower rates on both than cluster 7. This group lives predominantly on the outskirts of London.

3.3 Visualisation

Our first map (map 9) representing the spatial distribution of the clusters follows a conventional approach, using a choropleth map. However, we found the visual impact to be greater when assigning the cluster to the buildings included in the LSOA instead of the full area. This alternative view of the geodemographic classification can be seen in map 10. It follows the ideas of Mark O’Brien, himself inspired by Charles Booth’s poverty map. Cluster colours are assigned to the buildings within each LSOA, so that only the buildings, roads, parks and waterways can be seen.
This type of map not only helps the user to situate the groups but also adds visual realism.

Map 9: Visualisation of household energy consumption classification in choropleth map

Map 10: Visualisation of household energy consumption classification in Charles Booth’s style

We added a link (footnote 7) to download the choropleth map in KML (Keyhole Markup Language) as well as the building layer from map 10. KML is a “XML-based file format used to display geographic data in an Earth browser such as Google Earth, Google Map or ESRI ArcGIS Explorer” (Google Developers, ESRI). Visualising our maps in this way allows the user to pan or zoom in, out or around the map. However, there is a specific limitation on the size and complexity of loaded KML files (Google Developers), which means we could not export the entire map 10 but only the layer with the buildings.

Hosting the map on this type of platform can also be considered a way of sharing data. Anyone with the link to download the file will be able to freely visualise our maps in Google Earth. We could argue that there are easier ways to share data. In this case, a third party has to download the file, whereas a map hosted on a web-based application would be as easy to access as any other website. It is also worth noting that even though visualising our maps in Google Earth enables user interactivity, “it is an incomplete exploratory spatial data analysis (ESDA) tool because it lacks the functionality of brushing” (Gibin et al., 2008).

Figure 9: Visualisation of the building layer in Google Earth

Footnote 7: https://sites.google.com/site/nrgclassification/kml-links

3.4 Limits and extensions

Geodemographic classification, although still in use and still evolving (a new UK OAC will come from the Census 2011), has its limits. Some limits can be described as inherent to most classifications, while others apply specifically to commercial classifications (e.g. lack of detail on methods).
Two main issues have to be raised when working on area-based data: the ecological fallacy and the Modifiable Areal Unit Problem (MAUP).

3.4.1 Ecological Fallacy

An ecological fallacy occurs when it is inferred that results based on aggregate zonal (or grouped) data can be applied to the individuals who form the zones or groups being studied (Openshaw, 1984). For example, it is right to say that the average consumption of ordinary domestic electricity for an area was 3000 Megawatt Hours in 2009, but quite wrong to assume that every household living in that area consumed 3000 Megawatt Hours in 2009. The ecological fallacy occurs when we assume that an association observed at area level reflects the same association at the individual level. Readers of the results need to be aware of this to avoid wrong assumptions.

3.4.2 The Modifiable Areal Unit Problem

Openshaw describes two related elements of MAUP. The ‘scale problem’ is “the variation in results that can be obtained when data for one set of areal units are progressively aggregated into fewer and larger units for analysis” (Openshaw, 1984). The ‘aggregation problem’ arises when individual data are regrouped into areas: the statistic of interest will take different values depending on the areas used. In our study, we focussed on the LSOA level. Had we used Output Area or ward level, for example, it is likely that the characteristics and the spatial distribution of the clusters would have been different. Openshaw advises minimising this problem by using the smallest spatial unit. This study would no doubt have gained in relevance by using Output Areas, the finest Census geography available. But as Singleton notes, “the scale at which geodemographic classifications may plausibly be created depends crucially on the resolution of the input data” (Singleton, 2007). We used the LSOA level to match the essential energy consumption data available.
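As a minimal sketch of the scale and aggregation effects described in section 3.4.2, the Python example below uses eight hypothetical small areas. The values, the variable names and the two zonings are invented for illustration and are not taken from our data. It computes the correlation between two variables at the small-area level, then again after the same areas have been merged into four larger zones in two different ways.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def aggregate(values, zones):
    """Average the values of the small areas grouped into each zone."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]


# Hypothetical values for eight small areas (assumption, not study data).
variable_x = [1, 2, 3, 4, 5, 6, 7, 8]
variable_y = [2, 1, 4, 3, 6, 5, 8, 7]

# Two hypothetical ways of merging the same eight areas into four zones.
zoning_a = [(0, 1), (2, 3), (4, 5), (6, 7)]
zoning_b = [(0, 2), (1, 3), (4, 6), (5, 7)]

r_small = pearson(variable_x, variable_y)
r_zoning_a = pearson(aggregate(variable_x, zoning_a), aggregate(variable_y, zoning_a))
r_zoning_b = pearson(aggregate(variable_x, zoning_b), aggregate(variable_y, zoning_b))

print(f"small areas: {r_small:.3f}")
print(f"zoning A:    {r_zoning_a:.3f}")
print(f"zoning B:    {r_zoning_b:.3f}")
```

The three coefficients differ although the underlying areas are identical. Moving from eight units to four changes the result (scale problem), and so does the choice of which areas are merged (aggregation problem).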
3.4.3 Dating data

A constant criticism of any geodemographic classification concerns the currency of the data used. We have used the 2001 Census here and can therefore expect changes between 2001 and today. In London, population size increased by 9.1 per cent from 2001 to 2010 (ONS, c, 2011). At the time of writing, a new Census (2011) has already been conducted but the outcomes are yet to be made available. “Some (Sleight 2004) have suggested this change does not matter as certain areas will always be dominated by certain types of people and as people move out similar people move in” (Longley et al. 2011; Gale and Longley, 2012). This study would benefit from being updated with the 2011 Census data and more recent energy consumption data in order to reflect a more up-to-date profile of London household energy characteristics. It would also allow us to see the evolution in attitudes towards energy consumption as well as the effects of home improvements, new energy-efficient building developments, the uptake of energy efficiency policies, etc.

3.4.4 Missing data

This section questions some of the limitations of this study with regard to data selection. An interesting data set to include, had it been publicly available, is the age of the house. “The highest consuming properties are those built between 1919 and 1945, 12 per cent higher than the average and 7 per cent higher than the oldest property group pre-1919” (DECC, c, 2011). Another variable that could have been associated with our research is household data on building efficiency. Energy performance certificates could have made it possible to distinguish high energy consumption due to house structure or lack of insulation from high consumption due to behaviour. This certificate is now compulsory on the sale of a house, but it is not yet available for all households and is not publicly stored or gathered.
We are studying the average energy consumption over a year, but this does not directly show individual habits around energy use. We can only infer consumption habits from the type of house or other socio-economic characteristics. For the electricity-intensive city renters, for example, we assume that one of the reasons behind their high power use is their heavy use of electrical appliances.

3.4.5 The Labelling of Clusters

Finding adequate names for each cluster to give the reader a clear, first idea of the group’s characteristics was challenging. Even though the name will not change the content of the cluster, Vickers insists that many users will not look past the names to form an impression of what an area is like (Vickers, 2006). For this study we wanted to incorporate energy consumption characteristics as well as one or two of the main distinctive characteristics drawn from the chosen variables. When labelling clusters there is always a risk of misinterpretation, since naming a group inevitably generalises it.

3.4.6 Fuzzy classification?

Another typical criticism of classification is its crisp nature (Vickers, 2006), which implies that one area can only be in one cluster. Assigning an area to a specific group when that same area may be close to another group can therefore be seen as a limit of this sort of classification. A solution to this problem could be the use of a fuzzy classification. The idea of fuzzy geodemographics is that “areas are not seen as a member of one type but as partial members of all types dependent on values” (Vickers, 2006). However, Vickers claims that “classic” geodemographics are appreciated for their simplicity.

3.4.7 Hierarchical classification

While other geodemographic classifications such as OAC or MOSAIC have several levels of hierarchy, with super groups which are themselves separated into smaller groups, our classification comprises only one level.
Based on Callingham’s work (2003), Vickers and Rees (2007) explained that each level of the hierarchy, three in this case, makes a different contribution. While the super groups allow good visualisation and cluster labelling, the second level of aggregation mainly contributes to conceptual customer profiling (Vickers and Rees, 2007). The next level of aggregation can be used for market propensity measures from the larger commercial surveys such as TGI [target group index] and the readership surveys (Vickers and Rees, 2007). There are various reasons why we only created one level of aggregation here. We were working on Greater London only, as opposed to the whole of the UK for the OAC, so we created a regional geodemographic classification instead of a national one. The number of possible levels of aggregation was therefore reduced. This choice was also driven by the fact that we were using data at the LSOA level and consequently had fewer objects, or areas, available. We did not want to create clusters with too few areas, which could have resulted in groups representing only outliers that could have been misinterpreted later on. A natural extension of our study would therefore be to create this classification at OA level and include a hierarchy.

3.4.8 Classification validation

We have intentionally omitted a validation of our classification because of the limited scale of this dissertation. Classification validation could be considered a study in itself. Vickers and Rees revisited the validation of the 2001 OAC, “considering internal validation of data inputs, peer reviews of methods and external validation against other variables” (Vickers and Rees, 2011). They insisted on the specificity of ground-truthing, its difficulties and benefits.
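The fuzzy alternative discussed in section 3.4.6 can be illustrated with a short Python sketch. This is an assumption-laden illustration and not part of our method: it borrows the membership formula of the fuzzy c-means algorithm, which is not necessarily the approach Vickers has in mind, and the function name `fuzzy_memberships`, the fuzzifier `m`, the cluster centres and the example area are all hypothetical.

```python
from math import dist


def fuzzy_memberships(area, centres, m=2.0):
    """Partial membership of one area in every cluster (fuzzy c-means formula).

    `area` and each centre are points in the standardised variable space;
    `m` > 1 is the fuzzifier (larger values give softer memberships).
    """
    distances = [dist(area, centre) for centre in centres]
    # An area lying exactly on a centre belongs fully to that cluster.
    if 0.0 in distances:
        hits = distances.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    weights = [d ** -exponent for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical cluster centres and area in a two-variable space.
centres = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0)]
area = (1.0, 0.0)

memberships = fuzzy_memberships(area, centres)
# Crisp assignment keeps only the nearest centre (first one wins a tie).
crisp = min(range(len(centres)), key=lambda j: dist(area, centres[j]))

print("crisp cluster:", crisp)
print("fuzzy memberships:", [round(u, 3) for u in memberships])
```

In this example the area is equally close to the first two centres. A crisp classification has to pick one of them, whereas the fuzzy memberships record that the area belongs about equally to both and only marginally to the third.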
Conclusions

The aim of this study was to create a household energy consumption classification for Greater London in line with academic values: a classification that would enable energy behaviour profiling at a small enough level to serve as a basis for improved policy targeting. The result produced through the methods presented in the early stages of this study, despite the limitations around the data, is seven clusters, distinctive enough in their main characteristics to be used as a basis for public or private energy efficiency initiatives.

Area classifications and geodemographics have continued to be developed and used since Charles Booth’s study of London (Vickers, 2006). While for a period of time they were essentially used by commercial companies, they are now widespread in the academic world, the best example being the UK OAC. The academic world has insisted on creating open-source classifications as opposed to commercial geodemographics developed behind closed doors. The classification carried out in this study belongs to this applied academic tradition. The end result is an open-source classification, which could be used as a tool by local authorities but also by energy suppliers and households to find adequate solutions to reduce energy consumption. While the aim of this study was not to proceed with a detailed policy match with the different profile groups once defined, we did judge it necessary to make some suggestions to support the overall objective of this report.

An interesting element in this study was the scale of our target area. Because of the special status of Greater London in its geodemographic structure and energy consumption determinants, it needs specific area classifications to be developed around it to avoid its intrinsic characteristics being diluted in a UK-wide classification. Our study follows this trend of regional geodemographics to support a more representative view of energy consumption in London.
An interesting development of our study would be to consider whether the method used here could be applicable at a national and/or regional scale. Gale et al. (2012) insist that the known limitations of the 2001 OAC for London will feed into the elaboration of the 2011 OAC, one of the main aims being to develop a clear release strategy as well as “a flexible open reproducible methodology for the 2011 UK OAC in order to allow the creation of regional classifications” (Gale et al., 2012).

Another extension of the result of this analysis would be the exploration of visualisation methods to help households, policy makers and energy organisations to actually see energy efficiency in London in a more intuitive way. The creation of a mash-up with Google Maps could for example allow the results of this study to reach a wider audience. Similar applications include “Londonprofiler” created by Maurizio Gibin and the “Geodemographics of Housing in Great Britain: A new visualisation in the style of Charles Booth's map” by Mark O’Brien. Due to the nature and scale of this particular study, we instead took a first step towards making the outcome of this study publicly available and interactive through a downloadable KML map.

References

ArcGIS Resource Center, About KML, Copyright © 1995-2012 Esri. Accessed 9th September 2012: http://help.arcgis.com/en/arcgisonline/help/index.html#//010q00000066000000
Briefing from the Mayor of London, 2nd of July 2012, The Green Deal Statutory Instruments – Second Delegated Legislation Committee. Accessed 05th September from: http://www.energyforlondon.org/
European Commission (2008) Climate and Energy Package. Accessed 11th September 2012: http://ec.europa.eu/clima/policies/package/index_en.htm
Birkin, M. and Clarke, G. (1998) GIS, Geodemographics, and Spatial Modeling in the U.K. Financial Service Industry. Journal of Housing Research, Vol. 9, Issue 1.
Blake, M. & Openshaw, S.
(1995) Selecting Variables for Small Area Classifications of 1991 UK Census Data. Working Paper 95/2, School of Geography, University of Leeds.
Debenham, J., Clarke, G. and Stillwell, J. (2001) Deriving supply-side variables to extend geodemographic classification.
Debenham, J. (2002) Understanding Geodemographic Classification: Creating The Building Blocks For An Extension. Working Paper. School of Geography, University of Leeds.
DECC (2011) Energy consumption in the United Kingdom: 2012. Accessed 11th of September 2012: http://www.decc.gov.uk/assets/decc/11/stats/publications/energyconsumption/2323-domestic-energy-consumption-factsheet.pdf
DECC, a (2011) Estimates of Home Insulation Levels in Great Britain: January 2012. Accessed 10th September 2012: http://www.decc.gov.uk/assets/decc/11/stats/energy/energy-efficiency/4537statistical-release-estimates-of-home-insulation-.pdf
DECC, b (2011) Great Britain’s housing energy fact file. Accessed 11th of September 2012: http://www.decc.gov.uk/assets/decc/11/stats/climate-change/3224-great-britainshousing-energy-fact-file-2011.pdf
DECC, c (2011) National energy efficiency data-framework. Accessed 11th of September 2012: http://www.decc.gov.uk/assets/decc/11/stats/energy/energy-efficiency/2078need-data-framework-report.pdf
De Smith, M. J., Goodchild, M. F., Longley, P. A. (2007) Geospatial Analysis: a Comprehensive Guide to Principles, Techniques and Software Tools. Leicester, UK: Troubador.
Druckman, A. & Jackson, T. (2008) Household Energy Consumption in the UK: A Highly Geographically and Socio-economically Disaggregated Model. Energy Policy, vol. 36, pp. 3177-3192.
Everitt, B. S., Landau, S. and Leese, M. (2001) Cluster Analysis. 4th edn. London: Arnold.
Experian (2008) GreenAware: A Segmentation of Environmentally-Relevant Behaviours, Attitudes and Carbon Footprint.
Retrieved April 19, 2012, from http://www.experian.co.uk/assets/business strategies/brochures/GreenAware_factsheet[1].pdf
Gale, C.G., Adnan, M., Longley, P.A. (2012) Open Geodemographics: Open Tools and the 2011 OAC. Not published yet, accessed through University College London.
Gale, C.G., Longley, P.A. (2012) Geodemographic Output Area Classifications for London, 2001-2011. Not published yet, accessed through University College London.
Gibin, M., Singleton, A.D., Mateos, P., Longley, P.A. (2008) Exploratory Cartographic Visualisation of London using the Google Maps API. Applied Spatial Analysis and Policy, 1(2), 85-97.
Google Developers (2012) KML Tutorial. Accessed 9th September 2012: https://developers.google.com/kml/documentation/kml_tut
Greater London Authority (2012) Low carbon London. Accessed 12th September 2012: http://www.london.gov.uk/priorities/business-economy/low-carbon-economy
Harris, R., Sleight, P. & Webber, R. (2005) Geodemographics: GIS and Neighbourhood Targeting. Wiley-Blackwell.
Hills, J. (2012) Getting the Measure of Fuel Poverty 2012. Accessed 12th of September 2012: http://sticerd.lse.ac.uk/dps/case/cr/CASEreport72.pdf
Longley, P. A. (2005) Geographical Information Systems: a renaissance of geodemographics for public service delivery. Progress in Human Geography, 29(1), pp. 57–63.
Longley, P.A., Cheshire, J.A. and Mateos, P. (2011) Creating a regional geography of Britain through the spatial analysis of surnames. Geoforum 42, 506-516.
ONS, a (2011) Household and families. Social Trends 41. Accessed 11th of September 2012: http://www.pdfdownload.org/pdf2html/pdf2html.php?url=http%3A%2F%2Fwww.ons.gov.uk%2Fons%2Frel%2Fsocial-trends-rd%2Fsocial-trends%2Fsocial-trends41%2Fsocial-trends-41---household-and-families.pdf&images=yes
ONS, b (2012) The National Statistics Socio-economic Classification (NS-SEC rebased on the SOC2010).
Accessed 28th August 2012: http://www.ons.gov.uk/ons/guide-method/classifications/current-standardclassifications/soc2010/soc2010-volume-3-ns-sec--rebased-on-soc2010--usermanual/index.html
ONS, c (2011) Methodology Note on production of Super Output Area Population Estimates. Accessed 10th September 2012: http://www.ons.gov.uk/ons/guide-method/method-quality/specific/populationand-migration/pop-ests/methodology-note-on-production-of-super-output-areapopulation-estimates.pdf
Openshaw, S. (1984) The modifiable areal unit problem. Concepts and Techniques in Modern Geography 38: 41.
Petersen, J., Gibin, M., Longley, P., Mateos, P., Atkinson, P. and Ashby, D. (2010) Geodemographics as a tool for targeting neighbourhoods in public health campaigns. J Geogr Syst 13(2), 173-192.
Petersen, J., Ashby, D. and Atkinson, P. (2007) Health applications for open geodemographics. In: Winstanley, A.C. (ed.) Proceedings of Geographical Information Science Research UK, pp. 160-166. National University of Ireland: Maynooth, IE.
Räsänen, T., Ruuskanen, J. and Kolehmainen, M. (2008) Reducing energy consumption by using self-organizing maps to create more personalized electricity use information. Applied Energy 85, 830-840.
Rose, D. and O'Reilly, K. (1998) The ESRC Review of Government Social Classifications: Final Report.
Singleton, A.D. (2007) Comparing Classifications: Some Preliminary Speculations on an Appropriate Scale for Neighbourhood Analysis with Reference to Geodemographic Information Systems. UCL, Working Papers Series, Paper 127.
Singleton, A.D., Longley, P.A. (2009) Creating Open Source Geodemographic Classifications for Higher Education Applications. UCL, Working Papers Series, Paper 134.
Sleight, P.
(2004) Targeting Customers: How to Use Geodemographic and Lifestyle Data in Your Business. Henley-on-Thames: World Advertising Research Centre.
SPSS Inc. (2011) K-Means Cluster Analysis, Chapter 26, pp. 184-187 in IBM SPSS Statistics Base 20, User’s Guide. Chicago: SPSS Inc.
The Energy Saving Trust (2012) http://www.energysavingtrust.org.uk/
Vickers, D.W., Rees, P.H. and Birkin, M. (2005) Creating the National Classification of Census Output Areas: Data, Methods and Results. Working Paper 05/2, School of Geography, University of Leeds, Leeds.
Vickers, D.W. and Rees, P.H. (2006) Introducing the National Classification of Census Output Areas. Population Trends, 125.
Vickers, D.W. (2006) PhD Thesis: Multi-Level Integrated Classifications Based on the 2001 Census. University of Leeds.
Vickers, D.W. and Rees, P.H. (2007) Creating the National Statistics 2001 Output Area Classification. Journal of the Royal Statistical Society, Series A.
Vickers, D.W. and Rees, P.H. (2011) Ground-truthing Geodemographics. Applied Spatial Analysis, 4:3-21.
Webber, R. and Craig, J. (1978) Socio-economic Classifications of Local Authority Areas. London: Office of Population Censuses and Surveys.

Data sources

UK Census data 2001. Retrieved from the CASWEB website: http://census.ac.uk/casweb/
DECC (2009) Domestic Energy Consumption. Retrieved from Department of Energy and Climate Change in January 2012: http://www.decc.gov.uk/en/content/cms/statistics/energy_stats/energy_stats.aspx
Boundary data from Ordnance Survey data © Crown copyright and database right 2011. Retrieved from: http://edina.ac.uk/

Annex 1

Accessed 05th September from: http://www.energyforlondon.org/

The Green Deal Statutory Instruments - Second Delegated Legislation Committee
Monday 2 July
Briefing from the Mayor of London

Key Messages

● The Mayor of London is supportive of the Government’s Green Deal agenda, and is clear that its ambitious aims can only be achieved nationally if they are achieved in London.
The capital has by far the highest number of needy properties of any region, but there is a real danger that these properties could be sidelined by Green Deal providers as a result of the current framework being proposed by Government.

● To ensure that the Green Deal succeeds in the UK, and that the most needy properties in London are treated, we have therefore suggested some small but extremely important refinements to DECC’s current plans. These refinements will ensure that London’s unique situation is taken into account and will help prioritise the capital’s fuel-poor as well as helping to significantly decrease carbon emissions.

● Most crucially, there is a pressing need for an area allocation for the Energy Company Obligation (ECO). Without such a target there is a real danger that London will miss out on the attention it needs, as energy companies and Green Deal providers focus on treating areas that are cheaper and easier to retrofit. As 22% of hard-to-treat properties in England are in London, this has the potential to significantly undermine the success of the Green Deal as a whole.

● Furthermore, without an area allocation to ensure that flats and mid-terraces - key markets in London for the Green Deal - are able to access ECO subsidies, there is a serious threat that these homes will miss out on the benefits of the scheme. If these properties are side-lined, Londoners could end up paying an additional £390m on their energy bills to fund the Green Deal nationally, while the capital receives investment of only £156m in return – an unacceptable possibility given London’s specific needs.

Background

● London has lost out under previous, similar schemes such as the Carbon Emissions Reduction Target (CERT), under which the city only received 4.7% of installations despite having 12% of housing. We estimate that London has missed out on £480m of CERT funding since 2005.
● This is because energy companies have previously fulfilled their obligations wherever it is most cost-effective to do so, without regard to the potential to reduce the comparatively high levels of fuel poverty in London and carbon emissions from some hard-to-reach housing stock. The comparatively high cost of treating London homes, compared to those outside the capital, has meant that energy companies have focused their attentions elsewhere.

The need for area allocations

● The Government’s current suggested plans for ECO could exclude Londoners in flats and mid-terraces from the Green Deal, as such housing stock will be unable to access a high enough subsidy from the ECO to make the scheme viable. DECC’s Green Deal impact assessment confirms this possibility, though as these housing types make up nearly two thirds of solid wall properties in the capital it is vital for the success of the Green Deal that they are not excluded in this way.

● If London continues to receive a lower than equitable share of national energy efficiency funding this will mean ECO will fail to adequately tackle fuel poverty in the capital. This, in turn, will affect the success of the Green Deal as a whole.

● Area allocations should therefore be put into effect, based on the relative share of solid wall properties for the carbon target and the relative share of fuel poverty for the affordable warmth target. This will ensure that key properties in London, including flats and mid-terraces, are prioritised and in turn help to ensure the success of the Green Deal programme.

● Such area allocations would not only ensure fuel poverty in the capital is tackled, but also create certainty as to the size of the market and encourage more London-based suppliers to enter the market.
Only 1/3 of the suppliers for London’s unique RE:NEW scheme, which provides energy efficiency measures to needy London homes, have expressed interest in becoming Green Deal providers so far, and it is extremely important for the success of the scheme that more providers are encouraged to get involved in the capital.

● Given that the Green Deal is a new programme, area allocations should be indicative, and then monitored on a quarterly basis so we have a clear understanding of where the ECO is being delivered and can check that delivery is equitable across the country. This will help ensure that the Green Deal is a success, including in areas that are more expensive to treat. Provision for area allocations could be delivered through secondary legislation or within the ECO brokerage document, which will set out how Green Deal providers can access ECO funding.

For further information please contact Greg Taylor, Senior Government Relations Officer, on 020 7983 4498 or at greg.taylor@london.gov.uk

Annex 2: Suggested policies

This table provides a brief overview of possible matching of existing or forthcoming energy efficiency policies with the different groups. For more information on the schemes mentioned please refer to the Energy Saving Trust: http://www.energysavingtrust.org.uk/.

Cluster 1: Electricity-intensive city renters
Policies:
- Energy Saving Trust Recommended Label
- Renewable tariffs
- Smart Metering
- Energy Company Obligation (ECO)
- Climate change campaigns
Comments: This group is unlikely to qualify for means tested grants. Comprising a majority of renters, this group might benefit from ECO through their landlord. Climate change campaigns linked to reduced energy use could strike a chord with this group, which has a high electricity use.

Cluster 2: Fuel-poor, social renters
Policies:
- Carbon savings community obligation
- Smart metering
- Free insulation grants
Comments: This group would qualify for a range of energy efficient schemes at no cost. But having the measures in place is one thing, getting households to apply is another. Efforts will have to be made to raise awareness around the benefits of reducing household inefficiency.

Cluster 3: Energy-intensive, wealthy greys
Policies:
- Micro-generation grants
- Free insulation grants
- Smart Metering
- Renewable heat incentives
Comments: The older part of this group would benefit from existing and forthcoming policies, although there is still wariness among this age group around the benefits versus the effort (applying, works etc.). The recently retired or soon to be retired could buy in to micro-generation (favourable house types) with the right, targeted messages.

Cluster 4: Average-consuming London renter
Policies:
- Carbon Savings Obligation
- ECO
- Affordable Warmth Obligation
- Smart Metering
- Green Deal
Comments: The largest group of all our clusters, the average consuming London renter would benefit from any landlord led initiatives, which will come into play with the Green Deal. The more financially strained could still qualify for some of the means-tested schemes. However, this being the most heterogeneous group, a one-size-fits-all approach will not necessarily work.

Cluster 5: Wealthy energy intensive owners
Policies:
- Green Deal
- Renewable tariffs
- Energy Saving Trust Recommended Label
- Climate change campaigns
- Smart Metering
Comments: With money not really an issue for this group, successful uptake of energy efficiency measures will have to be driven by a convincing story. A majority of owners, this group will have some options under the Green Deal.

Cluster 6: Average use suburban working families
Policies:
- Green Deal
- Insulation grants
- Smart Metering
- ECO
Comments: Family orientated, this group would benefit from targeted campaigns to promote the mainly financial benefits of taking up some of the energy efficiency measures on offer to this type of household and income.

Cluster 7: Low consuming strained renters
Policies:
- Insulation grants
- Smart Metering
- Carbon Savings community obligation
- Affordable Warmth Obligation
Comments: This group has the second largest proportion of fuel poor and will therefore benefit from a range of measures for financially strained households, whether on benefits or in low paid jobs. But as with cluster 2, educating the households in this group around what options are available to them at no financial cost is the biggest hurdle.