6TH MEETING of the National Statistics Methodology Advisory Committee

Re-designing the Consumer/Retail Price Index sampling plan

Karl Ashworth, ONS

Executive Summary

The Consumer Price Index (CPI) recently replaced the Retail Price Index (RPI) as Britain’s official measure of inflation. These two indices are used for a variety of official purposes, including setting tax and benefit levels, wage bargaining, HM Treasury payments and setting interest rates. Consequently, it is important that the measures are as precise as is practicable within time and resource constraints. The Consumer Prices and General Inflation Division (CPGID) is interested in reviewing the current sample design and has asked Methodology Group to undertake the review, with several objectives to be addressed.

The RPI is defined as an average measure of change in the prices of goods and services bought for the purposes of consumption by the vast majority of households in the UK. The CPI is based on similar methodology, except that certain items not in the RPI are included in the CPI and vice versa. In addition, the CPI differs in that it is based on the consumption of all households in the UK. Further, the CPI uses as its measure of price change, for a given item i, a geometric mean of the price relatives across all price quotes for that item. In contrast, the RPI uses two different aggregation formulae: one is the arithmetic average of the price relatives (AR); the other is the ratio of price averages. Both measures depend upon measuring the price change of a fixed basket of goods and services.

The full scope of the review is listed below, in order of assigned priority. We would be interested in the Committee’s opinion of the objectives and their priorities.

The first objective of the review is to optimise the sample design of the CPI, where it is understood that the CPI is based on the geometric mean of price relatives, and variances are to be calculated for this estimator.
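The three aggregation formulae mentioned above can be compared on a small example. The sketch below is illustrative only: the function names and the prices are hypothetical, the quotes are unweighted, and the formulae are written in their textbook forms (often called the Jevons, Carli and Dutot formulae) rather than as the production C/RPI calculation.

```python
import math

def price_relatives(base_prices, current_prices):
    """Price relative for each quote: current price divided by base price."""
    return [cur / base for base, cur in zip(base_prices, current_prices)]

def geometric_mean_of_relatives(base_prices, current_prices):
    """Geometric mean of price relatives (the CPI-style aggregate)."""
    rel = price_relatives(base_prices, current_prices)
    return math.exp(sum(math.log(r) for r in rel) / len(rel))

def average_of_relatives(base_prices, current_prices):
    """Arithmetic average of price relatives (AR)."""
    rel = price_relatives(base_prices, current_prices)
    return sum(rel) / len(rel)

def ratio_of_averages(base_prices, current_prices):
    """Ratio of the average current price to the average base price."""
    mean_current = sum(current_prices) / len(current_prices)
    mean_base = sum(base_prices) / len(base_prices)
    return mean_current / mean_base

# Hypothetical quotes for one item collected in three outlets.
base = [1.00, 2.00, 4.00]
current = [1.10, 2.00, 4.40]

for label, formula in [
    ("geometric mean of relatives", geometric_mean_of_relatives),
    ("average of relatives", average_of_relatives),
    ("ratio of averages", ratio_of_averages),
]:
    print(f"{label}: {formula(base, current):.4f}")
```

For any set of positive price relatives the geometric mean cannot exceed the arithmetic average, so the choice of formula affects the measured price change as well as its variance.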
Optimisation is to be considered in relation to three outcomes: (i) the price level; (ii) monthly price change; and (iii) annual price change. We welcome any general advice on the strategic approach and on technical issues that Committee members think may prove to be influential.

One particular issue with the sample design is the role of centrally collected prices (typically from supermarkets), which may exhibit and propagate (through the weighting strategy) unusual variances. For example, a tin of beans bought at a supermarket could be replicated 20-30 (or more) times through weighting to account for that outlet’s expenditure share.

Commodity groups form the lowest level sampling units. Outlets that do not sell most or all items within a group are excluded from the sampling frame. What is the likely impact of this? More importantly, how can (currently excluded) outlets be included in a cost-effective procedure?

NSMAC (06): RPI

Aims of paper

This paper seeks comments on the overall scope of the C/RPI sample review project in terms of general strategy, and welcomes any specific technical comments on potential procedures. Any design, if adopted, would operate within resource constraints similar to those that currently exist.

The sample review intends to address the following objectives:

Optimisation of design (local items)
Review of outlet sampling
Extension of the use of variable baskets
Item replenishment procedures
Extending centrally collected prices (internet and catalogue sales etc.)
Estimates of the standard error of the C/RPI

One of our principal concerns is how best to operationalise an optimisation procedure based on a geometric mean estimator of price relatives. The Committee should be aware that often there is only one observation per item per location, which complicates between-location estimates of variance. Where there are two item prices in a location, these may actually be one in each shop-type (multiples and independents).
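One possible starting point for a variance of the geometric mean estimator is to work on the log scale. The sketch below is not the procedure proposed in this paper: it assumes one price relative per location, treats the locations as a simple random sample of independent observations, ignores weights and finite population corrections, and uses a first-order (delta method) approximation. The function name is hypothetical.

```python
import math

def geometric_mean_with_se(relatives):
    """Return (geometric mean, approximate standard error) of price relatives.

    Assumes the relatives are independent, unweighted observations, for
    example one per location. At least two observations are needed.
    """
    n = len(relatives)
    if n < 2:
        raise ValueError("at least two price relatives are needed to estimate a variance")
    logs = [math.log(r) for r in relatives]
    mean_log = sum(logs) / n
    # Sample variance of the log relatives (between-observation variance).
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
    se_log = math.sqrt(var_log / n)
    gm = math.exp(mean_log)
    # Delta method: se(exp(m)) is approximately exp(m) * se(m).
    return gm, gm * se_log

# Hypothetical relatives for one item, one quote in each of five locations.
gm, se = geometric_mean_with_se([1.02, 0.99, 1.05, 1.00, 1.03])
print(f"geometric mean = {gm:.4f}, approximate s.e. = {se:.4f}")
```

The function refuses a single observation, which mirrors the difficulty noted above: with one quote per item per location, variance can only be estimated between locations, and any within-location component is not identifiable from the data.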
In addition, we would welcome comments on the possibilities of optimising with respect to sections, where a section represents a group of items, e.g. beef would comprise various cuts of beef. In doing so, we would be allowing the number of items within a section to vary, as well as the number of quotes per item.

A second concern relates to allocating the number of quotes per item according to shop type. For example, some supermarkets give central quotes, i.e. they price a set basket of goods nationally or regionally, and those prices are replicated according to the outlets’ expenditure shares for that item. A potential problem with this is that there is no chance for an item brand to vary across the outlets of a given central store, so if Brand X beans are priced it is Brand X that is weighted x times. In contrast, where price collectors go into the field, they may choose Brand X at one store, Brand Y at another and so forth, giving a better estimate of the price of beans. It may make sense to consider increasing the number of brands of particular items that are collected from central stores at the expense of some of the locally collected items. We would welcome MAC’s advice on this.

Further advice and comment are welcomed on a number of procedures, including potential bias from excluding outlets and locations from their sampling frames, the rationale for probability proportional to size sampling, and the selection of items within outlets. A list of questions is provided in summary at the end of the paper.

The current CPI/RPI sampling procedures

Introduction

The current CPI/RPI sample design involves a cross-classification of two separate dimensions: geography and product. The design is further complicated by multiple stratification and clustering selection procedures.

Products are divided into product groups (e.g. food), nested within which are sections (e.g. bread), which in turn have items nested within them (e.g. a sliced white loaf, 800 g).
Items typically will vary according to brand and/or specification. Prices are collected for over 600 items, and these are considered a ‘basket’ of goods and services. Currently, items are selected purposively. Certain of these basket items are priced centrally whilst others are priced locally. It is the locally priced items that are the subject of this paper. However, it should be noted that some of the local prices are actually collected centrally (‘local-centrals’) and these are also of concern here. This occurs where companies have a central pricing strategy, i.e. their prices do not vary by location. A number of these local-centrals actually have regional pricing policies, so provide a set of prices for each region.

The geographic dimension follows a multi-stage sample design whereby the UK is stratified into 12 Government Office Regions. Within each region a sample of locations is selected, and within each location a sample of outlets is selected.

Selection of Items

Items tend to be relatively well defined insofar as they describe the good or service and a number of associated attributes, such as size, type and number. Examples of items are a large white sliced loaf (800 g) and toilet paper, pack of two. However, only for certain items are product descriptions under the complete control of the central office. Often, price collectors have a degree of leeway in deciding which brands to choose, which might be the ‘most representative’ or the best selling brand. Consequently, there is room for variation in the actual brands chosen, and the priced items may vary in part because of pricing policies associated with brands as well as differential pricing policies of outlets.

For sampling purposes, items are grouped into commodity groups, where each item in the commodity group should be purchased from the same outlet selected for the sample in that location.
For example, a basket collected within a location would collect quotes for all items in the meat commodity group from the outlet selected for that location. This procedure is used to reduce data collection costs. However, where price collectors cannot get all quotes in the same outlet, they use ‘follow-up’ outlets, selected as back-ups for this purpose.

Once the item varieties are selected at the beginning of the year, the same product should be priced at the same place throughout the remainder of the year. This cannot always be achieved. The item (or brand/type) may be temporarily unavailable or become permanently obsolete. In the latter case, a new item is chosen and the price is re-based to give a consistent series throughout the year. Additionally, an outlet may close down, at which point the price collector moves to the back-up outlet chosen.

Items are subject to review at the end of the year in order to keep the basket representative of the best selling items. A variety of procedures, which are the subject of a different review, are used to inform this process.

Issues regarding items

1. Price collectors often have a great deal of leeway over what ‘brand’/make of an item to choose. This can introduce bias, and ideas for ways to control it would be welcome. It is worth noting that ONS has been piloting the use of ‘remote sampling’ for certain electrical goods. This includes an element of probability sampling in the selection of items, using scanner data to define certain combinations of item attributes that price collectors have to match in the shop. However, whilst it will be introduced into the C/RPI data collection procedures in the future, the cost of this approach is likely to rule out using it more widely for other items.

Geographic Sampling Dimension

Prices are collected for around 145 fixed baskets of goods.
To reduce data collection costs, the current scheme collects all the item price quotes for a basket within a particular geographic location. Consequently, a basket and a location are nearly synonymous in relation to item prices. However, around 11 baskets are split across two locations; the procedure and rationale for this are described further below. For the sake of brevity the term location will be used to mean a basket, except where the distinction is explicitly referred to in the text.

Locations: selection

A fixed number of locations, around 145, is allocated across the UK standard government office regions (GORs), where the number of locations is proportional to expenditure in the region. Expenditure is calculated primarily from the Expenditure and Food Survey (EFS), with some adjustments from other sources. It is calculated using expenditure data for the majority of households, although households on the highest and lowest incomes are excluded, in an attempt to ensure that spending represents ‘typical’ households (RPI).

A definition of a shopping location exists that is based upon the location of shopping centres and retail activity (based on Inter-Departmental Business Register (IDBR) data on outlets and employees within postcode sectors). These data are combined with territorial access data to create location boundaries. The total number of locations within each region is known through mapping the locations to GORs.

From the IDBR, a size measure of retail activity, the number of employees (+1) working in retail and service sector outlets, is calculated for each location. The required number of locations in each region is then selected, with each location having a selection probability proportional to its size of retail activity (the number of employees (+1)).

In an attempt to reduce data collection costs, quotes are required for a full basket of goods from each location.
As small locations militate against the collection of a full basket, they are excluded from the sampling frame (where small locations are defined as those having fewer than 250 outlets). An exception to this arises with large out-of-town shopping centres, which account for substantial amounts of retail activity but do not necessarily allow a full basket of goods to be collected. These are paired with small locations (fewer than 10,000 employees) and the paired locations are considered together for their joint inclusion in the sample; these are the split-basket locations referred to above.

Locations: rotation

Around one-quarter of locations are rotated in and out of the sample each year. Selection for rotation occurs approximately on a four-year basis, with all locations that are to be rotated into the sample being selected en bloc prior to the start of the four-year period. This occurs within the regional strata.

The length of time a location has spent in the sample primarily determines the order of rotation out of the sample, i.e. older sample members are first to go. However, judgmental matching is used to determine rotation into the sample (after the en bloc selection). A location from those available for entry is matched to one that is leaving and chosen to replace it, within the same regional stratum. A location may replace itself, if again chosen for entry to the sample.

Issues regarding locations

2. Potential bias arising from exclusion of small locations from sampling frame

ONS are currently exploring the possibility of using a new definition for locations, with which they hope to be able to create locations that are large enough to provide quotes for a full basket but also small enough not to exclude outlets from enumeration.
However, comments are welcome on the likely nature of the bias introduced by the exclusion of small locations from the sampling frame and on the proposed new method to help overcome it (the method is described in more detail below).

3. The rationale and need for pps sampling with respect to retail activity

Locations are currently selected with probability proportional to retail size. What are the advantages of using this approach? What would be lost if locations were selected using srs within region?

Synopsis of proposed new definition of locations

A location may be defined approximately as a shopping area with a centre of retail activity and a corresponding local shopping population. Essentially, locations are geographical areas that are defined on the basis of retail activity in shopping centres and ‘grown’ using an impedance grid and territorial barriers.

A new definition of a location has recently become possible through work carried out by the Office of the Deputy Prime Minister (ODPM). This uses three dimensions of retail activity to measure ‘town centre activity’:

Types of employment (e.g. what is expected in town centres, such as retail, restaurants, office workers and public sector workers, compared to, say, primary industries, manufacturing etc.) (taken from the IDBR).
Density of floor space usage (provided by the Valuation Office Agency).
Diversity of retail activity (taken from the IDBR).

The aim with the new definition is to define the vast majority of locations in such a way that they are diverse and large enough to support the collection of a whole basket¹, but compact enough not to exceed the maximum required for enumeration of the outlets (currently around 1500).

¹ In some cases split baskets may still be used. As currently happens, if a location defines a large shopping complex that deals principally with retail outlets but not food, this location may be paired with another from which to select food items.
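As a reference point for the pps versus srs question raised in issue 3, the sketch below selects locations within one region with probability proportional to the size measure used for locations (employees + 1). The paper does not say which pps algorithm is used in practice; systematic pps selection with a random start is assumed here purely for illustration, and the function names and employee counts are hypothetical.

```python
import random

def pps_systematic_sample(employees, n, rng):
    """Select n locations with probability proportional to (employees + 1).

    employees: dict mapping a location name to its retail employee count.
    Returns a list of n names. A location whose size exceeds the sampling
    interval is selected with certainty and can appear more than once.
    """
    sizes = {name: count + 1 for name, count in employees.items()}
    total = sum(sizes.values())
    step = total / n                        # sampling interval
    start = rng.random() * step             # random start in [0, step)
    points = [start + k * step for k in range(n)]

    selected = []
    cumulative = 0.0
    next_point = 0
    for name, size in sizes.items():
        cumulative += size
        # Take every selection point that falls inside this location's range.
        while next_point < n and points[next_point] < cumulative:
            selected.append(name)
            next_point += 1
    # Guard against floating-point rounding at the very top of the range.
    while next_point < n:
        selected.append(name)
        next_point += 1
    return selected

def inclusion_probability(employees, name, n):
    """Expected number of selections under pps: n * size / total size."""
    total = sum(count + 1 for count in employees.values())
    return n * (employees[name] + 1) / total

# Hypothetical region with three locations; two are to be selected.
region = {"A": 49, "B": 29, "C": 19}
print(pps_systematic_sample(region, 2, random.Random(2006)))
print({name: inclusion_probability(region, name, 2) for name in region})
```

Under srs each of the three locations would have the same inclusion probability of 2/3, whereas under pps the largest location is selected with certainty. The practical difference between the two designs therefore depends on how closely the size measure tracks the quantity being estimated.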
Outlet Selection

All retail outlets (up to a maximum of 1,500) within chosen locations are enumerated by having price collectors walk the streets to identify them. In addition to identifying the outlets, price collectors note what items are available and classify outlets as multiples or independents. For outlets that cater for multiple product groups, they also assign size measures of retail activity by estimating the floor space devoted to each product group.

Sampling is done separately for each commodity group, so that each group requires a separate sampling frame. Items are also grouped into one of four strata: regional, shop-type, region x shop-type, and no stratification. Shop type is a two-level distinction between independents (fewer than 10 outlets) and multiples (10 or more outlets).

For each commodity group, shops are identified as eligible or not for entry to the frame. This does not mean that any shop that sells an item within the commodity group necessarily is eligible. In general terms, at least 50% of the items in the group should be available for purchase in the outlet to meet the eligibility criterion. Consequently, not all shops that sell an item are included in the sampling frame for that item.

Once the sampling frame has been created, the region x shop-type stratification needs to be taken into account. For all items, the number of quotes required is pre-determined, with at least one quote required per item within each location.

Where a regional stratification is used, a number of outlets is chosen within each location in each region. The number of outlets is equal to the number of quotes required in the location. In addition, further (supplementary) outlets are selected to provide back-up quotes in cases where not all items within the group can be purchased from the primary outlet, or in case the primary outlet(s) close during the year.
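The eligibility rule described above (at least 50% of a commodity group's items available in the outlet) can be sketched as follows. The function name, outlet names and item lists are hypothetical, and the threshold is exposed as a parameter only to make the rule explicit.

```python
def eligible_outlets(outlet_stock, group_items, threshold=0.5):
    """Return the outlets that enter the sampling frame for a commodity group.

    outlet_stock: dict mapping an outlet name to the items it sells.
    group_items: the items that make up the commodity group.
    threshold: minimum share of the group's items the outlet must stock.
    """
    group = set(group_items)
    frame = []
    for outlet, stocked in outlet_stock.items():
        share = len(group & set(stocked)) / len(group)
        if share >= threshold:
            frame.append(outlet)
    return frame

# Hypothetical enumeration results for a meat commodity group.
meat_group = ["beef mince", "lamb chops", "pork sausages", "bacon"]
enumerated = {
    "butcher": ["beef mince", "lamb chops", "pork sausages", "bacon"],
    "supermarket": ["beef mince", "pork sausages", "bacon", "bread"],
    "corner shop": ["bacon", "bread", "milk"],
}
print(eligible_outlets(enumerated, meat_group))
```

In this example the corner shop sells bacon but stocks only one of the four items in the group, so it never enters the frame and can never supply a bacon quote. This is the kind of exclusion raised in issue 5.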
Multiple and independent outlets are pooled into a single sampling frame and outlets are selected using srs.

If an item is stratified by shop-type, then at least one quote is required from each shop-type. Consequently, each shop-type will appear at least once in each location for each item included in the commodity group. Within each location, independent shops are selected using simple random sampling, whereas multiples are selected using probability proportional to size (using the floor area devoted to the commodity group as a size measure to proxy retail activity).

From the enumerated data, outlets are classified according to the commodity group for which they are eligible to enter the sampling frame. An independent sample of outlets is drawn for each commodity group.

Issues with Outlet Sampling

4. Exclusion of outlets from the sampling frame in large locations

It is unknown what impact this type of exclusion will have on prices. However, these are outlets that are likely to be on the periphery of a location, given that price collectors are asked to enumerate locations from the centre, moving outwards in a radial manner. It is possible that the new location definition may also address this issue.

5. Exclusion of outlets through non-provision of complete commodity group

Potentially this could be very problematic. However, it is unknown to what extent item level price change is related to an outlet’s tendency to stock a complete commodity group.

Summary of questions for committee members

a) General comments are welcome on the overall work-plan.

b) Comments are welcome on the nature of the bias potentially introduced by the exclusion of locations and outlets from sampling frames under the current scheme and how well the proposed new scheme may address these.
c) We would particularly welcome advice on how to deal with outlets that do not sell all products in a commodity group (the sampling unit) and are therefore excluded from the sampling frame of outlets. What are the likely consequences of such exclusions?

d) Probability proportional to size sampling is used in selecting locations within regions and in selecting certain outlets (multiples). Size is an indicator of retail activity. What are the advantages or disadvantages of using this method rather than simple random sampling?

e) We would welcome any ideas on better methods of selecting items within outlets.

f) Advice and comments on the proposed optimisation procedures are sought, using a geometric mean estimator and comparing three criteria (price levels, monthly price change and annual price change). In particular, we are interested in comments on extending the number of quotes in central stores to cover different brands of an item.
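The concern behind the last part of question f) can be illustrated with Kish's effective sample size, n_eff = (sum of weights)^2 / (sum of squared weights). This measure is not used in the paper; it is brought in here only as a rough indicator of how much independent information a set of weighted quotes carries. The weights below are hypothetical.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# One central quote for a single brand carrying a weight of 25,
# plus ten locally collected quotes with a weight of 1 each.
one_brand = [25] + [1] * 10

# The same central weight spread over five brands (weight 5 each).
five_brands = [5] * 5 + [1] * 10

print(f"one central brand:   n_eff = {kish_effective_sample_size(one_brand):.2f}")
print(f"five central brands: n_eff = {kish_effective_sample_size(five_brands):.2f}")
```

Both sets of quotes carry the same total weight, but spreading the central weight over several brands gives a much larger effective sample size. This is one way to quantify the potential gain from pricing more brands centrally, under the strong assumption that the brand quotes behave like independent observations.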