The Experiences of Web Based Data Collection from Enterprises in Finland
Rami Peltola
August 9th 2006, JSM Seattle USA

Introduction - Strategies And Methods
Statistics Finland's strategy for EDR
  To offer an electronic option in all data collections by 2007 (excluding person statistics)
  It is the respondent's choice whether to use it or not
Data collection methods
  About 97% of data are derived from administrative registers
  About 3% come from direct data collection (paper forms, machine-readable data / primary EDI, EDR, and CATI/CAPI interviews, mainly with Blaise)
Business data collections
  About 50 surveys (excluding collections with fewer than 30 respondents)
  45 Web (Internet form) collections in use

Background - Data Collection And Infrastructure
Traditionally high response rates (in both annual and sub-annual business surveys)
  Over 99% at best; persistent staff
Good relations with data providers
  Experienced staff, continuous personal contacts, high level of trust
High level of Internet use
  Almost every enterprise has access to the Internet (98% of enterprises with 10+ employees, 100% of those with 100+ employees)
  Business surveys are directed to the largest enterprises
Positive attitude towards using the Internet in dealings with the government
  Respondents are even enthusiastic about using the Internet: "It's fun to fill in web forms instead of paper ones!"

Background - Three Generations of In-house EDR Solutions
1st generation: Building cost index, 2001
2nd generation: 7 EDR solutions, 2002-2005
  Built using Microsoft Windows DNA (Distributed interNet Applications Architecture) and VB.NET
3rd generation: 23 EDR solutions, 2005-2006
  XCola
11 EDR solutions made by an outside service provider, 1997-2006
Pilot in integrated data collection (tourism statistics)

Technical Information - XCola in a Nutshell
A generic application for Web surveys
Processes XML questionnaires and transforms them into Web applications
Supports both client-side and server-side validation
Executed on the server side; requires no installation on the respondent's side
Works on every modern browser
New questionnaires can be implemented in just hours
Main developer: Mr. Toni Räikkönen, toni.raikkonen@stat.fi

Benefits - Summary of Main Benefits
Cost-efficiency
  Simplifying the data collection process
  Reducing the need for human resources
  Reducing other data collection costs
Accuracy
  Improving the quality of collected data
  Decreasing non-response
Timeliness
  Speeding up data accumulation
Data provider relations
  Reducing response burden
  Enabling direct individual feedback for respondents
  Enabling browsing of previously submitted data
  Assuring a high level of data security

Achieved Cost-Efficiency - 2nd Generation
Four second-generation solutions have been in production for 3 years (3300 respondents per month plus 800 per quarter)
The average share of work saved in the data collection phase is over 40% (2 person-years)
The amount of ground mail has been reduced by 65% (0.5 person-years)
The number of reminders sent has been halved
A "mass e-mailer" for all kinds of collections
The investment pays for itself in about a year

Cost-Efficiency Continues to Improve - 3rd Generation
Common framework (one engine) for similar systems
  An effective build-up
  Simple method for transferring data between collection and production databases
  Only one application to maintain and support
  Support and development knowledge is easier to acquire and spread
Reducing the need for human resources
  As manual handling diminishes, it can be replaced by more rewarding tasks

An Example - Working Hours Used in Data Collection and Validation in Sale Inquiry
[Figure: working hours per year, 2001-2005 (y-axis: hours, 0-2500; x-axis: year)]

Accuracy and Timeliness
The data received are of better quality: "25% fewer errors" (in both annual and sub-annual surveys)
Response rates have remained at a high level
The average response time of monthly surveys has decreased, in the best case by 8-10 days or 30%
The number of reminders sent has decreased substantially, in the best case by 50% (from 1000 to 500 in just 4 months)
The share of respondents using the EDR solution has in most cases reached a high level
  Sub-annual surveys: > 60% (85% in the best case)
  Annual surveys: ~30% (75% in the best case)

An Example - Sale Inquiry Accumulation of Data 01/2002 - 01/2006
[Figure: responses, 01/2002-01/2006 (y-axis: responses, 0-2000; x-axis: month)]

Data Provider Relations
Perceived response burden has gone down
  E-mail informs respondents of the survey and reminds them to answer
  The questionnaire is "always" available and fast to fill in
  Option to fill in the questionnaire in separate sessions
  Good questionnaire design
  Helpful validity checks, so no additional inquiries are needed
  Contextual online help
  Support for several languages
  Individually tailored feedback
  Access to all previously submitted data and pre-filled questionnaires

High Level of Data Security
Data security audit by an outside consultant
All traffic on the Internet is SSL-encrypted
An authentication/authorisation process is always required
New user IDs and passwords every year
  User IDs and passwords are initially sent in a letter
  Only one of them can be sent by e-mail; the other must always be sent in a letter or given over the telephone
Only a limited number of our staff have access to user IDs and passwords (usually two persons per survey)
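The credential rules on the security slide (yearly renewal, initial delivery by letter, and never both items by e-mail) can be expressed as a small policy check. The following is a minimal Python sketch under stated assumptions: the names `issue_credentials`, `is_valid_for` and `is_allowed_delivery`, the user ID layout and the password length are all hypothetical and do not describe Statistics Finland's actual implementation.

```python
import secrets
import string
from dataclasses import dataclass
from enum import Enum

# Characters allowed in generated passwords (an assumption, not a documented policy).
PASSWORD_ALPHABET = string.ascii_letters + string.digits


class Channel(Enum):
    LETTER = "letter"
    EMAIL = "email"
    TELEPHONE = "telephone"


@dataclass(frozen=True)
class Credentials:
    user_id: str
    password: str
    year: int  # credentials are reissued every year


def issue_credentials(survey_code: str, year: int, password_length: int = 12) -> Credentials:
    """Create a fresh user ID / password pair for one respondent and one year.

    The user ID layout (survey code + year + 6 random digits) is hypothetical.
    """
    suffix = "".join(secrets.choice(string.digits) for _ in range(6))
    password = "".join(secrets.choice(PASSWORD_ALPHABET) for _ in range(password_length))
    return Credentials(f"{survey_code}{year}{suffix}", password, year)


def is_valid_for(creds: Credentials, year: int) -> bool:
    """Credentials issued for another year are not accepted."""
    return creds.year == year


def is_allowed_delivery(id_channel: Channel, password_channel: Channel, initial: bool) -> bool:
    """Check a delivery plan against the rules on the slide.

    Initial delivery is by letter; afterwards at most one of the two items
    may travel by e-mail, so the other goes by letter or telephone.
    """
    if initial:
        return id_channel is Channel.LETTER and password_channel is Channel.LETTER
    return not (id_channel is Channel.EMAIL and password_channel is Channel.EMAIL)
```

For example, `is_allowed_delivery(Channel.EMAIL, Channel.TELEPHONE, initial=False)` returns `True`, while sending both items by e-mail returns `False`. The `secrets` module is used rather than `random` because it is the standard-library choice for security-sensitive values.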
An Example - Sale Inquiry Change in Response Media 12/2001 - 12/2005
[Figure: responses by medium (Fax, Mail, EDR), 12/2001-12/2005 (y-axis: responses, 0-1800)]

Costs - Investment and Maintenance
The costs have dropped by 60-70% during the last few years
Average investment cost per new EDR solution
  Outside service provider (previously): EUR 5000
  In-house solution (XCola, today): less than 150 hours of work
Maintenance cost of an EDR solution per year
  Outside service provider (previously): EUR 1000
  In-house solution (XCola, today): less than 50 hours of work
During the first and second phases the total resource input was about 2.5 person-years ("learning by doing")
  Included the development of a secure communication environment
  Included the implementation of 7 solutions

An Example - Work Done in Development and Maintenance of an EDR Solution (Sale Inquiry)
[Figure: hours per year, 2002-2005 (y-axis: hours, 0-900); includes hours used in developing the infrastructure]

Challenges - In-house Development and Maintenance
Survey development can be very fast if the IT personnel have good skills in XML and related techniques
  At the moment the number of highly skilled survey developers is limited
The whole production environment around XCola is not yet finished
  Somewhat dependent on certain named persons
The statistics departments typically have many requirements for the surveys
  Some minor development of XCola is needed all the time

Pilot - Integrated Data Collection (Tourism)
Data are delivered directly from hotel management systems into our database
  No manual work needed (except to initiate the transfer)
  After reception, the data go through the standard validation process
Software vendors implement a module for the hotel's management software using Statistics Finland's definitions for the data and the service interface
  Implemented using a typical B2B integration technique: XML Web Services

Near Future - Productisation and Integration
More integrated data collections?
  Co-operation with management system providers
Project for productisation of XCola (since June 2006)
  Already completed (XCola v. 3.1): developer's manual, finalised administration tools, routines for transfers between collection and production databases
  An XCola version for outside evaluation has been built
  Under development: graphical editor for building questionnaires, and links to metadata
Project for co-ordination of business surveys
  In the future, more co-ordinated surveys instead of many independent surveys targeted at businesses
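The tourism pilot sends data from hotel management systems through an XML Web Service and runs the standard validation process on arrival. The sketch below illustrates both sides with Python's standard library, assuming a hypothetical message layout (`AccommodationReport`, `RoomsAvailable`, `RoomsOccupied`, `Nights/Country`), hypothetical validation rules and a placeholder endpoint URL; Statistics Finland's actual data and service-interface definitions are not given in the slides. XML Web Services of this period typically wrapped messages in SOAP envelopes; plain XML over HTTP POST is used here only for brevity.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MonthlyReport:
    """One hotel's monthly figures (hypothetical fields)."""
    hotel_id: str
    period: str  # "YYYY-MM"
    rooms_available: int
    rooms_occupied: int
    nights_by_country: Dict[str, int] = field(default_factory=dict)


def to_xml(report: MonthlyReport) -> bytes:
    """Vendor side: serialise a report using a hypothetical element layout."""
    root = ET.Element("AccommodationReport", hotelId=report.hotel_id, period=report.period)
    ET.SubElement(root, "RoomsAvailable").text = str(report.rooms_available)
    ET.SubElement(root, "RoomsOccupied").text = str(report.rooms_occupied)
    nights = ET.SubElement(root, "Nights")
    for code, count in sorted(report.nights_by_country.items()):
        ET.SubElement(nights, "Country", code=code).text = str(count)
    return ET.tostring(root, encoding="utf-8")


def build_request(payload: bytes, endpoint: str) -> urllib.request.Request:
    """Vendor side: prepare (but do not send) an HTTP POST carrying the XML."""
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="POST",
    )


def _as_int(text: Optional[str]) -> Optional[int]:
    """Parse an integer element value, returning None if missing or malformed."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def validate(payload: bytes) -> List[str]:
    """Receiving side: simple checks standing in for the standard validation process."""
    root = ET.fromstring(payload)
    errors: List[str] = []
    if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", root.get("period", "")):
        errors.append("period must be YYYY-MM")
    available = _as_int(root.findtext("RoomsAvailable"))
    occupied = _as_int(root.findtext("RoomsOccupied"))
    if available is None or occupied is None:
        errors.append("room counts must be integers")
    elif occupied > available:
        errors.append("occupied rooms exceed available rooms")
    for country in root.iterfind("Nights/Country"):
        nights = _as_int(country.text)
        if nights is None or nights < 0:
            errors.append(f"invalid nights for {country.get('code')}")
    return errors


if __name__ == "__main__":
    report = MonthlyReport("H001", "2006-07", 120, 95, {"FI": 300, "SE": 40})
    payload = to_xml(report)
    print(validate(payload))  # → []
    request = build_request(payload, "https://example.org/tourism-service")
```

Because the request object is only constructed, not sent, the sketch runs offline; passing it to `urllib.request.urlopen` would perform the actual POST.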