Scanner Data Project
The experience of Portugal

Paulo Saraiva dos Santos
Data Collection Department

Scanner Data Workshop
7-8 June 2012, Stockholm, Sweden

Outline
• Background and context;
• Portuguese Scanner Data project;
• Collaboration with data providers;
• Lessons learned;
• The way forward.

Statistics Portugal
• Central national authority for the production of official statistics;
• It is legally empowered to require information (on a mandatory and free-of-charge basis);
• Survey data collection is a core activity:
  – 40% of the budget and 30% of human resources.

Data Collection
• A Data Collection department handles the collection, processing and analysis of microdata, covering all business and social surveys;
• HR: 200 workers + 350 freelance interviewers;
• Annual figures:
  – 81 surveys and 700 occurrences: 66 business surveys (self-completed) and 15 by interview (CAPI and CATI);
  – 125,000 companies (*) (99% SME); 70,000 dwellings (*); 30,000 farms.

Portuguese CPI
• Prices are mainly collected using paper questionnaires, and the price collector who observes a price also does the data entry at home;
• The prices collected by each price collector are then sent daily to local offices, where they are processed and analyzed;
• Regional indexes are then calculated and sent to the Lisbon office to be aggregated and consolidated.

Portuguese CPI
• Annual data collection figures:
  – 1,355,000 prices;
  – 11,000 outlets;
  – 150 price collectors (freelance interviewers).
• Collection cost = 740 k€:
  – 83% is the cost of the field team.

Production architecture
[Diagram: National Accounts; Economics; Social and Demographics; Methods and Information Systems; Data Collection; CPI]

Scanner Data Project
• Implementing scanner data in the Portuguese CPI is a high priority for the coming years.
• This priority follows from Statistics Portugal's strategy to collect data efficiently, assuring a high level of quality with the lowest response burden;
• In 2011 Statistics Portugal was awarded a Eurostat grant to undertake the initial research on the exploitation of scanner data over the period 2011-13.

Lines of action
1. Knowledge acquisition;
2. Collaboration with data providers;
3. Pilot project;
4. Infrastructure.
• This paper presents the current status of the scanner data project, especially our experience to date in obtaining collaboration from data providers.

Knowledge acquisition
• To access data on product characteristics for products covered by scanner data:
  – the use of the European Article Number (EAN) and its linkage with the COICOP classification and in-store codes;
  – to learn from the experiences of other countries using scanner data.
• Two visits:
  – CBS (The Netherlands), September 2011;
  – FSO (Switzerland), October 2012.

Pilot project
• to establish routines for continuous scanner data flows with selected retail chains;
• to develop the linking of the aggregated and product-level codes found in scanner data to the COICOP statistical classification;
• to develop internal methods for storing and processing scanner data, based on data warehouse and data mining approaches;
• to develop sample designs and weights, including methods to integrate scanner data with the existing price collection processes.

Infrastructure
• to design and build an information system to support the pilot project.

Collaboration with providers
• to explore and negotiate arrangements to access scanner data from retail chains, and
• to select data providers for a pilot experience;
• as a simplification, the scope was limited to purchases of food and non-alcoholic beverages, which represent approximately 15% of our CPI fixed basket of goods and services.
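
As an illustration of the code-linking task mentioned under Knowledge acquisition and Pilot project, the following Python sketch validates an EAN-13 check digit and maps a retailer's in-store category to a COICOP class. The mapping table, category names and function names are hypothetical and do not describe the Statistics Portugal system; only the EAN-13 check-digit rule and the COICOP class labels are standard.

```python
from typing import Optional

# Hypothetical mapping from a retailer's in-store category to a COICOP class.
# In practice such a table would have to be built with each data provider.
STORE_CATEGORY_TO_COICOP = {
    "BREAD": "01.1.1",   # Bread and cereals
    "DAIRY": "01.1.4",   # Milk, cheese and eggs
    "COFFEE": "01.2.1",  # Coffee, tea and cocoa
}


def is_valid_ean13(ean: str) -> bool:
    """Check the EAN-13 check digit (weights 1 and 3 alternate from the left)."""
    if len(ean) != 13 or not (ean.isascii() and ean.isdigit()):
        return False
    digits = [int(c) for c in ean]
    weighted = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - weighted % 10) % 10 == digits[12]


def link_to_coicop(ean: str, store_category: str) -> Optional[str]:
    """Return the COICOP class for an item, or None if it cannot be linked."""
    if not is_valid_ean13(ean):
        return None
    return STORE_CATEGORY_TO_COICOP.get(store_category.upper())
```

Items that return `None` would need manual review, which is one reason the linking step is treated as a development task in its own right.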

Target data providers
• Five chains dominate the sector;
• Estimated market shares (food and non-alcoholic beverages):
  – Modelo Continente (31%); Pingo Doce (30%); Auchan (14%); Lidl (10%); Minipreço (8%); others (7%).
• Pioneers: the top two retail chains:
  – together about 60% of the national market share;
  – 40% of the prices and outlets collected for the HICP in the food and beverage class.

SONAE Group
• First meeting in November 2011;
• The results were extremely positive, and it was possible to create the conditions for a formal protocol with this retailer, which was signed at the beginning of 2012;
• The first two sets of scanner data were received in April and May 2012, covering two consecutive months of 2012:
  – including the list of outlets, the identification and characteristics of the items, and accumulated transactions (quantities and turnover) in the reference period.

Jerónimo Martins Group
• First meeting in December 2011;
• The cooperation was accepted in March 2012, and two bilateral meetings have been arranged;
• First data set expected in June 2012.

Approaching retailers (1)
• Statistics Portugal is legally empowered to require statistical information (on a mandatory and free-of-charge basis) from all companies;
• This is a strong advantage, but it is considered insufficient to motivate data providers to take part in the scanner data project free of charge.

Approaching retailers (2)
• It is difficult to demonstrate that scanner data will reduce the statistical burden;
• Retailers have a completely different perception:
  – having our price collectors visit their outlets is not perceived as a relevant burden;
  – data providers do the same with their competitors in order to compare prices;
• The key challenge is to find strong arguments to convince retailers to join the project.

Approaching retailers (3)
• Another issue: what kind of information would be required from the providers.
• The information is very sensitive;
• While stressing that we are flexible, we chose to show an example of the data structure:
  – List of the outlets;
  – Item descriptions;
  – Transactions.
• This approach was appreciated by the providers.

Data files structure (example)
[Two slides showing example file layouts; figures not reproduced]

First meeting approach
We found that the most convincing message in the first face-to-face meeting was the following:
1. scanner data is the future and will be adopted in our country;
2. the design of this new process has started and will define the way retail chains will be required to provide CPI data;
3. we offer the provider the opportunity to participate from the very beginning of the project, influencing its design so as to be prepared in advance.

Lessons learned
• Full support from top management;
• Working Group with a single leader;
• It is difficult and takes time to reach internal consensus on some CPI methodological changes resulting from scanner data;
• Key issues are not technical:
  – Organizational and change management;
  – Collaboration with data providers;
• Step-by-step approach;
• It is crucial to bring a top retail chain on board first.

The way forward (1)
• Create an internal Working Group on Scanner Data, with members from the prices unit, data collection, information systems and methodology;
• Establish a working plan for 2013 to 2017;
• First access to data from the two data providers in a retrospective approach;
• Evaluate ways to link products to the COICOP classification, especially using EAN and internal store codes.

The way forward (2)
• Link the items collected traditionally with the scanner data sent by the two providers;
• Analyze and understand price variations between scanner data and the traditional approach;
• Specify and develop the application to support the pilot project, which aims to establish continuous scanner data flow routines with selected retail chains;
• Evaluate some changes in the CPI methods.
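
The three files shown to providers (outlets, items, transactions) could be represented roughly as follows. This is a minimal Python sketch: all field names are assumptions, since the example layouts from the slides are not reproduced here. It also shows how a unit value (turnover divided by quantity) can be derived from the accumulated quantities and turnover for a reference period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outlet:
    outlet_id: str      # hypothetical identifier assigned by the retailer
    name: str
    municipality: str


@dataclass(frozen=True)
class Item:
    item_id: str        # in-store code (hypothetical field name)
    ean: str
    description: str


@dataclass(frozen=True)
class Transaction:
    outlet_id: str
    item_id: str
    period: str         # reference period, e.g. "2012-03"
    quantity: float     # units sold in the period
    turnover: float     # sales value in euros in the period


def unit_values(transactions):
    """Aggregate by (item, period) and return turnover / quantity."""
    totals = {}
    for t in transactions:
        key = (t.item_id, t.period)
        qty, value = totals.get(key, (0.0, 0.0))
        totals[key] = (qty + t.quantity, value + t.turnover)
    # Skip keys with no units sold to avoid dividing by zero.
    return {key: value / qty for key, (qty, value) in totals.items() if qty > 0}
```

Aggregating across outlets is only one possible choice here; keeping the outlet in the key would give outlet-level unit values instead.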

Thank you for your attention!

Paulo Saraiva dos Santos
paulo.saraiva@ine.pt