Weighting issues
Julian Chow
Industrial and Energy Statistics Section
United Nations Statistics Division (UNSD)
Email: chowj@un.org

Overview – 1st Session: Weighting issues
• The role of weights in an index
• Theory: weights in the Laspeyres formula
• Determining IIP weights in practice
• Weight updating
• Fixed weight index vs. chained index

Overview – 2nd Session: Missing weights
• Missing weights for the most recent periods
• Missing weights for the entire time span of one component series
• Discussion

Question
How can a change in the production level of Coca-Cola be reflected in the Index of Industrial Production (IIP)?

Example
• Elementary observation (quantity/value produced by an establishment): Coca-Cola
• Product: Waters, with added sugar, other sweetening matter or flavoured, i.e. soft drinks
• Product group: Other non-alcoholic caloric beverages (CPC Ver.2 Sub-class 24990)
• 4-digit industry: Manufacture of soft drinks; production of mineral waters and other bottled waters (ISIC Rev.4 Class 1104)
• 3-digit industry: Manufacture of beverages (ISIC Rev.4 Group 110)
• 2-digit industry: Manufacture of beverages (ISIC Rev.4 Division 11)
• 1-digit industry: Manufacturing (ISIC Rev.4 Section C)

IIP Structure
• Stage 3: Weights for industry branches (gross value added at basic prices)
  – Levels: Total IIP, 1-digit ISIC, 2-digit ISIC, 3-digit ISIC, 4-digit ISIC
• Stage 2: Product group weights (value of output obtained via census/survey)
  – Product groups assigned to one 4-digit ISIC branch
• Stage 1: Product weights (value of output obtained via census/survey)
  – Individual sampled products assigned to one product group

The role of weights in the index
• Weights are used to aggregate series into higher-level aggregates
  – Can be done at different levels
  – Weights have to be chosen accordingly
• Weights have to reflect the relative importance of the individual components within the aggregate
• Weights determine the impact that a particular volume change will have on the overall index

More about weights
• Over time, establishment production levels shift in response to economic conditions.
• Relative importance may change for:
  – Products within a product group
  – Product groups within an industry
  – Lower-level industries within higher-level aggregates
• For the IIP to reflect these movements as well as possible, the weights have to reflect these changes

Recap: the Laspeyres volume index

Notations
• $p^t$: prices at time t
• $q^t$: quantities at time t
• Base period: t = 0
• i: units (i.e. products, product groups or industries) to be aggregated
• n: number of units (i.e. products, product groups or industries) to be aggregated

Laspeyres volume index
• Fixed prices from the base period
• "How much more would the value of the basket be in the current period if prices in the current period were the same as in the base period?"
$$ I^{0:t}_{Laspeyres} = \frac{\text{Value of the basket in the current period at base-period prices}}{\text{Value of the basket in the base period}} \times 100 = \frac{\sum_{i=1}^{n} p_i^0 q_i^t}{\sum_{i=1}^{n} p_i^0 q_i^0} \times 100 $$

Laspeyres volume index: another form
• The volume index formula may be rewritten so that indices can be constructed using values instead of prices:
$$ I^{0:t}_{Laspeyres} = \frac{\sum_{i=1}^{n} p_i^0 q_i^t}{\sum_{i=1}^{n} p_i^0 q_i^0} \times 100 = \sum_{i=1}^{n} w_i^0 \frac{q_i^t}{q_i^0} \times 100 $$
$$ \text{where } w_i^0 = \frac{p_i^0 q_i^0}{\sum_{i=1}^{n} p_i^0 q_i^0} $$
• $w_i^0$ is the value share (weight) of unit i at period 0 prices and quantities

Weights in the Laspeyres formula
• Price weights applied to quantities:
$$ I^{0:t}_{Laspeyres} = \sum_{i=1}^{n} \left( \frac{p_i^0}{\sum_{j=1}^{n} p_j^0 q_j^0} \right) q_i^t \times 100 $$
• Value weights applied to quantity relatives:
$$ I^{0:t}_{Laspeyres} = \sum_{i=1}^{n} \left( \frac{p_i^0 q_i^0}{\sum_{j=1}^{n} p_j^0 q_j^0} \right) \frac{q_i^t}{q_i^0} \times 100 $$

Determining IIP weighting data in practice

Practical steps for selecting and determining weights
• Determine sampling weights at the establishment and product level
• Determine weights for individual sample products
• Determine weights for the product groups
• Determine weights for the industry groups

Sampling in the IIP: Example
• A random sample of establishments
  – Initially from the business register as part of the product survey for goods (e.g. PRODCOM) (e.g. 25,000 establishments in total)
  – Subset of these sampled for the IIP (e.g.
7,000 establishments sampled)
• A random sample of products from the sampled establishments
  – Again, using information provided in the product survey (e.g. results in 9,000 product-establishment pairs)
• A purposive sample of elementary observations from the sampled product-contributor pairs
  – Undertaken using the judgement of the respondent but scrutinised by a subject expert

Sampling weights at the establishment and product level
• The weights needed at the establishment and product level to obtain the value of output of a particular product depend on the sampling scheme
• If probability sampling techniques are used, the inverses of the sampling fractions are used as the weights
• Sampling weights are not discussed in detail here, since they are a topic of survey sampling
• This leaves three fundamental levels of weights in IIP compilation:
  – Product weights
  – Product group weights
  – Weights for industry branches

Product weights
• Reflect the relative importance of a particular product in the product group
  – E.g. the relative importance of Coca-Cola in the product ‘soft drinks’
• The share of value of output should be used to weight each product in the product group.
• Product weights are generally obtained from product censuses or surveys.
• Product sales, though, are sometimes used in lieu of value of output as a weighting variable at this level of the index structure.
  Product sales (value of output sold) = value of output - work-in-progress - output produced this period entered into inventory + inventory produced in the past sold in this period

Product group weights
• Share of value of output (or proxies thereof) by product group within its ISIC class
• These “values of output” allow product groups to be weighted together (combined) and reflect the relative importance of each product group within an ISIC class.
  – E.g. the relative importance of soft drinks in ‘Other non-alcoholic caloric beverages’ (CPC Ver.2 24990)
• Each product group is assigned to just one ISIC 4-digit industry.
• Sources: product group weights are generally obtained from product censuses or surveys.

Industry weights
• Share of gross value added (GVA) at basic prices by industry of all industries in scope of industrial production.
• GVA at basic prices = value of output - intermediate consumption + subsidies receivable on products - taxes payable on products
• Using value of output as the weight is not suitable:
  – It introduces distortion by giving a higher weight to any industry using intermediate goods and services
  – It double counts intermediate goods and services in the final aggregate

Industry weights: GVA vs NVA
• Net value added (NVA) = gross value added (GVA) - consumption of fixed capital (depreciation)
• Why select GVA, not NVA?
  – Consumption of fixed capital is quite difficult to measure
  – GVA relates more to supply-side considerations of meeting final demand, including gross capital formation, whereas NVA is more meaningful for an income approach to measuring welfare and living standards

Industry weights
• GVA should be used as weights starting from the 4-digit level of ISIC
• Sources
  – Such information is available as a result of annual national accounts compilation.
  – However, some countries need other comprehensive data sources, such as an industry survey or economic census, to obtain weights for lower levels of ISIC.

Summary so far
• Stage 3: Weights for industry branches (gross value added at basic prices)
  – Levels: Total IIP, 1-digit ISIC, 2-digit ISIC, 3-digit ISIC, 4-digit ISIC
• Stage 2: Product group weights (value of output obtained via census/survey)
  – Product groups assigned to one 4-digit ISIC branch
• Stage 1: Product weights (value of output obtained via census/survey)
  – Individual sampled products assigned to one product group

Calculating weights
• Weights formula:
$$ w_i^0 = \frac{V_i^0}{\sum_{i=1}^{n} V_i^0} $$
• Consequently:
$$ \sum_{i=1}^{n} w_i^0 = 1 $$
• V: absolute weight (value)
• w: relative weight
• Base period: t = 0
• i: products, product groups or industries to be aggregated
• n: number of products, product groups or industries to be aggregated
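
The weights formula above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function name `relative_weights` and the three base-period values are hypothetical and chosen only to show that the resulting relative weights sum to one.

```python
def relative_weights(values):
    """Return w_i = V_i / sum(V) for a mapping of unit -> absolute weight V_i."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total value must be positive")
    return {unit: value / total for unit, value in values.items()}


# Hypothetical base-period values (e.g. GVA at basic prices); not from the slides.
base_values = {"industry_a": 50.0, "industry_b": 30.0, "industry_c": 20.0}
weights = relative_weights(base_values)
```

The same function applies at every stage of the IIP structure; only the meaning of V changes (value of output for products and product groups, GVA for industries).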

Example
Suppose the product group “Other non-alcoholic caloric beverages” (CPC Ver.2 24990) contains the following products:
• Soft drinks (output value = 70)
• Non-alcoholic beverages not containing (output value = 20)
• Non-alcoholic beverages containing milk fat (output value = 10)
Product weights within the product group are:
• Soft drinks: 70/(70+20+10) = 0.7
• Non-alcoholic beverages not containing: 20/(70+20+10) = 0.2
• Non-alcoholic beverages containing milk fat: 10/(70+20+10) = 0.1

Weight updating
Why update the weights?
• To reflect the changing structure of the economy
• Over time, production levels shift in response to economic situations
• Examples: smartphones, typewriters

Key issues to consider when updating index weights
• The frequency of weight updates
• The method used to incorporate new weights into the index structure

Update frequency
The update frequency of IIP weights can be linked to:
• The need to accurately reflect the current relative importance of product groups and industries
• Data availability
• The index type used to compile the index
  – A Laspeyres-type index provides some flexibility regarding update frequency, as weights are not derived from the current period

Update frequency: recommendation
• Industry weights
  – Annual
  – The latest weights available are likely to be from t-2 or t-3
  – Frequent updating of weights can alleviate the substitution bias/changing weights problem
• Product group weights
  – At least every 5 years
  – Less frequent than for the industry level due to resource and data constraints
• Product weights
  – The weights of individual products are updated at the same time as those of the product group

How to select reference period?
Concepts of reference period
• Quantity reference period: the period whose volumes appear in the denominators of the volume relatives used to calculate the index
• Weight reference period: the period, usually a year, whose values serve as weights for the index
• Index reference period: the period for which the index is set equal to 100
• The three types of reference period may coincide, but frequently do not.

Weight reference period
• For a Laspeyres-type volume index with weights updated annually, the weight reference period will always be the most recent period (year) for which weights are available
• With less frequent weight updates, the weight reference period should have the following characteristics:
  (a) reasonably normal/stable (i.e. typical of recent and likely future years);
  (b) not too distant from the reference period;
  (c) clearly identified when analysing and comparing the index results.

Summary
• Industry level weights
  – An annual update should be carried out.
  – Should ideally be national accounts value added figures at basic prices; adjustments are necessary to make them available in time.
• Product group weights
  – Should be updated frequently, at least every 5 years
  – Obtained by determining the share of value of output, via product censuses or surveys
• Product weights
  – The weights of individual products are updated at the same time as those of the product group
  – Obtained by determining the share of value of output, via product censuses or surveys

Fixed weights vs chained index: Concepts

Fixed base volume index
• Hold one period as the base period and compare all volumes back to this period
• Calculate the movement back to the base period for each successive time point
• Each index in the time series is a comparison from that period back to the base period

Fixed base volume index, from time 0 to 4
TABLE 2.12 - PRICES AND QUANTITIES FOR SIX COMMODITIES, WITH DIRECT LASPEYRES VOLUME INDICES.
Value of basket (£) fixed in period 0 prices

Commodity                     Value           Period 0       1       2       3       4
A Agricultural commodity      q_A^t p_A^0         1.00    1.20    1.00    0.80    1.00
B Energy                      q_B^t p_B^0         1.00    3.00    1.00    0.50    1.00
C Traditional manufacture     q_C^t p_C^0         2.00    2.60    3.00    3.20    3.20
D High-tech goods             q_D^t p_D^0         1.00    0.70    0.50    0.30    0.10
E Traditional services        q_E^t p_E^0         4.50    6.30    7.65    8.55    9.00
F High-tech services          q_F^t p_F^0         0.50    0.40    0.30    0.20    0.10
Total                                            10.00   14.20   13.45   13.55   14.40
Direct Laspeyres volume index
(change from period 0)                           100.0   142.0   134.5   135.5   144.0

Fixed base volume index
[Figure: Direct (fixed base) index, periods 0 to 4. y-axis: Index (period 0 = 100.0); x-axis: Period]

Chained volume index
Calculate consecutive period volume indices:
• Use a period 0 basket to look at period 0 to 1 changes
• Use a period 1 basket to look at period 1 to 2 changes
• Use a period 2 basket to look at period 2 to 3 changes
• Use a period 3 basket to look at period 3 to 4 changes
Chain these results together to get a measure of volume change from 0 to 4

Chained volume index
[Figure: Consecutive period indices, periods 0 to 4. y-axis: Index (period 0 = 100.0); x-axis: Period]

Chained volume index
CALCULATION OF INDIRECT (CHAINED) LASPEYRES VOLUME INDICES AND COMPARISON WITH DIRECT (FIXED BASE) LASPEYRES VOLUME INDICES

Index                                                          Period 0       1       2       3       4
Direct volume index, period 0 to 1                                100.0   142.0       -       -       -
Direct volume index, period 1 to 2                                    -   100.0    96.1       -       -
Indirect (chained) volume index, periods 0 to 2                   100.0   142.0   136.5       -       -
Direct volume index, period 2 to 3                                    -       -   100.0    97.8       -
Indirect (chained) volume index, periods 0 to 3                   100.0   142.0   136.5   133.5       -
Direct volume index, period 3 to 4                                    -       -       -   100.0    99.7
Indirect (chained) volume index, periods 0 to 4                   100.0   142.0   136.5   133.5   133.1
Direct (fixed base, period 0) volume index, periods 0 to 4        100.0   142.0   134.5   135.5   144.0

17 March 2010, ECLAC, Santiago. Workshop on Manufacturing Statistics for ECLAC member states.
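
The two calculations above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the fixed-base part uses the period-0-price values from Table 2.12, and the chained part simply multiplies the consecutive-period indices printed in the comparison table (142.0, 96.1, 97.8, 99.7), because the slides do not give the period 1 to 3 prices needed to recompute those links from scratch.

```python
# Values at period-0 prices (p_i^0 * q_i^t) for periods 0..4, from Table 2.12.
values_at_base_prices = {
    "A": [1.00, 1.20, 1.00, 0.80, 1.00],
    "B": [1.00, 3.00, 1.00, 0.50, 1.00],
    "C": [2.00, 2.60, 3.00, 3.20, 3.20],
    "D": [1.00, 0.70, 0.50, 0.30, 0.10],
    "E": [4.50, 6.30, 7.65, 8.55, 9.00],
    "F": [0.50, 0.40, 0.30, 0.20, 0.10],
}


def direct_laspeyres(values):
    """Fixed-base index: basket value in each period over basket value in period 0."""
    totals = [sum(period) for period in zip(*values.values())]
    return [100.0 * total / totals[0] for total in totals]


def chain(links):
    """Multiply consecutive-period indices (previous period = 100) into one series."""
    chained = [100.0]
    for link in links:
        chained.append(chained[-1] * link / 100.0)
    return chained


direct = direct_laspeyres(values_at_base_prices)

# Consecutive-period links as printed in the comparison table (already rounded).
links = [142.0, 96.1, 97.8, 99.7]
chained = chain(links)
```

Rounded to one decimal, `direct` gives 100.0, 142.0, 134.5, 135.5, 144.0 and `chained` gives 100.0, 142.0, 136.5, 133.5, 133.1, matching the two series in the table. Because the links are themselves rounded, the chained values are only reliable to that precision.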

Chained volume index: chaining
[Figure: Indirect (chained) index, periods 0 to 4. y-axis: Index (period 0 = 100.0); x-axis: Period]

Comparison
[Figure: Comparison between fixed basket and chained indices, periods 0 to 4. y-axis: Index (period 0 = 100.0); x-axis: Period]
• The two approaches give different results
• The special case of equality is called a transitive index formula
  – Fixed baskets with differential weights are never transitive

Fixed base vs chain index
• The fixed base result is more attractive operationally
  – Only one revaluing step
  – One set of prices (weights at the base period)
• Why would we chain?
  – To update the basket and weights!

Fixed weights vs chained index: Recommendations

Fixed weights vs chained index: on weights update
• Fixed weight indices
  – Weight structure fixed at a particular point in time
  – Compare volume in period t relative to some fixed base period
  – When the base year changes, the entire historical series is revised, as values for all periods are recalculated using the new base weights
• Chain-linked indices
  – Weights are updated and two indices are linked together to produce a time series
  – Unlike the fixed weight approach, the chain approach does not recalculate the entire historical series
  – The index is therefore compiled for a succession of different segments while keeping the original weights for each past segment fixed

Old recommendations
• Use fixed weights for the calculation
• Update weights every 5 years
• Recalculate the entire series
• Problem: new weights may better reflect movements in current periods, but they are not applicable to past data (far from the new weight period)
  – The problem simply shifts to a different period

New recommendations
• Update weights more frequently
  – Recommended: annually
• Do not recalculate the entire series
• Use chain linking to produce time series for the IIP
• Chain-linking annually rebased series allows for better reflection of current economic structure in the weights in
each of the sub-series
  – The current period and the weight base period are not too far apart
  – Alleviates substitution bias
  – Provides an opportunity to incorporate new products

Linking
• How can the individual sub-series be linked to obtain a longer time series?
• A linking factor has to be determined to link the new series to the existing historical series
• This factor is then applied to the new (old) series to convert it to the old (new) base year
• The long-term time series are calculated from a succession of short-term series with updated weights
  – Note: short-term series can span any number of periods

Linking options
• Annual overlap: linking factor based on the annual index for year t and the index of the same year using weights of year t-1
• One-quarter overlap: linking factor based on the index of the first quarter of year t and the index of the same quarter using weights of year t-1
• Over-the-year technique: linking factor based on the same periods for years t and t-1

Recommended method
• Annual overlap technique
  – More practical for Laspeyres-type volume measures
  – Monthly/quarterly data aggregate to annual data
• However, there are no clearly established rules for choosing this approach
• In most cases, the approaches will give similar results

Drawbacks of chain-linking
• Lack of additivity
  – The lower-level volume measures (e.g. ISIC 4-digit class) do not sum to upper levels of the ISIC structure (e.g. 3-digit ISIC level)
• When individual price and quantity changes occurring in earlier periods are reversed in later periods, chaining can lead to a worse result than a fixed base index

Summary
• Industry level weights
  – An annual update should be carried out.
  – Should ideally be national accounts value added figures at basic prices; adjustments are necessary to make them available in time.
• Product group weights
  – Should be updated frequently, at least every 5 years
  – Obtained by determining the share of value of output, via product censuses or surveys
• Product weights
  – The weights of individual products are updated at the same time as those of the product group
  – Obtained by determining the share of value of output, via product censuses or surveys

Summary
• The chained Laspeyres-type volume index is the recommended one for compiling the IIP
• When re-weighting occurs:
  – Do not recalculate the entire series
  – The index is compiled with weights only for those periods to which they relate
• For monthly and quarterly data, the advantages of chaining are smaller, as prices and quantities are subject to greater fluctuation.

Conceptual illustration of IIP annual chaining
[Figure: Conceptual illustration of IIP annual chaining. y-axis: Index number (80 to 120); x-axis: Month (Jan07 to Feb09)]

Missing weights

Missing weights
• Missing weights for the most recent periods
• Missing weights for the entire time span of one component series
• Note that there is no ‘recommended’ approach in this area.

Estimation of the missing weights for the most recent period
• In practice, the calculation of the IIP is likely to use industry weights from period t-2, i.e. the year 2005 index is likely to be compiled using industry weights from 2003.
• This is because the necessary weighting data for the industry level are not normally available until at least 18 months after the reference period.

Missing weights for the most recent period
• In some countries, for the first few months of a new year (i.e. year 2006 in this example) the index may need to be compiled using the ‘old’ weights (i.e. from 2003) because the ‘new’ weights (i.e. from 2004) are not yet available.
• In these situations, the IIP should be recalculated (revised) on the basis of the new weights once they become available, i.e.
the January 2006 IIP should be calculated using the weights from 2003 but be recalculated on the basis of the 2004 weights when they become available.

Estimation of the missing weights for the most recent period: other approaches
• Alternative sources
  – For example, use survey data (e.g. an annual survey of manufacturing) to impute the missing GVA
  – Administrative sources
• Subjective expert judgement
• Estimation
  – Time series methods: ARIMA, state space model, moving average, etc.
  – Regression model
  – Imputation procedure
• Use equal weights
• All of these need a proper quality check!

More on estimation
• Imputation
• Regression
• Exponential smoothing
• ARIMA model
• State space model

Imputation
• Historic value
  – Use a historic value, such as last year's value
• Historic value with trend
  – The trend can be based on growth in another variable within the record, variables in other records, etc.
• A useful method when variables or growth rates are stable over time

Regression model
$$ Y_t = \beta' X_t + \varepsilon_t $$
• A regression model predicts a missing value using a function of some auxiliary variables X.
• Auxiliary variables can come from the current survey or other sources, e.g. historical information (previous period value)
• The regression coefficients ($\beta$) can be determined from historic data

Exponential smoothing
$$ \hat{Y}_{t+1|t} = \hat{Y}_{t|t-1} + \alpha \left( Y_t - \hat{Y}_{t|t-1} \right) $$
• $\hat{Y}_{t+1|t}$: forecast of the t+1 value at time t
• $\alpha$: smoothing parameter
• $Y_t - \hat{Y}_{t|t-1}$: error of the forecast made at time t-1
• The smoothing parameter is determined by
  – Subjective consideration
  – Minimising the sum of squared forecasting errors
• Relatively simple to use

Autoregressive integrated moving average (ARIMA)
• ARIMA(p,d,q) model (assume d = 0 in this case)
• Identification of the model is necessary before proceeding to forecasting:
  – The AR and MA lag orders (i.e. p and q)
  – The AR and MA smoothing parameters (i.e.
φ and θ)
  – The integrated order, d
• Complicated to use, but many statistical software packages, such as SAS and R, have built-in procedures for estimation

State space model
• A state space model is a structural time series model that allows one to:
  – Obtain unobserved components (unseen driving forces) given an observable series (something you can see)
  – Model each time series component (trend, seasonal, and also sampling error) within a structure
  – Update estimates at the current time using the Kalman filter
• Two equations in matrix form:
  – Measurement (observable) equation
  – State (transition) equation
• The magic is that once the system is specified in these two equations, it can be updated through a fixed set of algorithms.
• Complicated to use, but unlike ARIMA it does not require the series to be stationary. In addition, it can be extended to a multivariate approach.

State space model
• Measurement equation:
$$ Y_t = Z \alpha_t + I_t $$
• State equation:
$$ \alpha_t = T \alpha_{t-1} + R \eta_t $$
• Notation
  – Y: an observable series (e.g. weights)
  – α: state vector (e.g. a vector of trend, sampling error)
  – Z, T, R: matrices for computations
  – I, η: random errors
  – Subscripts represent time points

Missing weights for the entire time span of component series
• Use equal weights
• Expert judgement
• Use weights from other sources
• Estimations
• Product replacement

Summary
• The calculation of the IIP is likely to use industry weights from period t-2
• If period t-2 weights are not available, the index may be compiled using the t-3 weights
• Several methods of estimating missing weights for the most recent periods are also proposed in this presentation, though there is no international recommendation in this area.

Discussion
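
As one possible starting point for the discussion, the exponential smoothing method listed among the estimation options can be sketched in Python. This is a minimal sketch under stated assumptions, not a recommended procedure: the function name, the choice of the first observation as the initial forecast, the value alpha = 0.5 and the short weight history are all hypothetical.

```python
def exponential_smoothing_forecast(series, alpha):
    """One-step-ahead forecast: new forecast = old forecast + alpha * (actual - old forecast).

    The first observation is used as the initial forecast (an assumption;
    other initialisations are possible).
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    forecast = series[0]
    for actual in series:
        forecast = forecast + alpha * (actual - forecast)
    return forecast


# Hypothetical history of one industry's weight (share of total GVA) in past years.
weight_history = [0.20, 0.22, 0.21]
estimated_weight = exponential_smoothing_forecast(weight_history, alpha=0.5)
```

If weights are estimated this way for several industries, the estimates would need to be rescaled afterwards so that they again sum to one, and they should be replaced by observed weights once those become available.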