„ECER 2014 - The Past, the Present and Future of Educational Research in Europe” Porto, 1-5 September 2014
Szilvia Nyüsti – Zsuzsanna Veroszta
Linking Administrative and Survey Data in Educational Research - Possibilities and
Limitations of Using Probabilistic Record Linkage in Graduate Career Tracking
In recent years, administrative databases have become increasingly important in social research alongside survey-based data collection. From the researcher's perspective, different data sources and data collection methods carry different advantages and limitations, which calls for constant self-reflection. In face-to-face interview surveys, several parallel trends (e.g. increasing mobility, distrust and need for information) have made potential respondents harder and less certain to reach, which can significantly increase the cost of data collection. The rapid expansion of technical and infrastructural facilities has opened the way for online surveys, which provide low-cost, rapid and extensive access to potential respondents. This procedure has its own limitations: because response rates are low, the reliability of the data is seriously compromised by self-selection. Many existing administrative datasets, however, offer themselves to research as a cost-effective alternative data source. On the one hand, these official databases, maintained for administrative purposes, can cover the full population; on the other hand, they are limited in the number and content of the variables available. Linking administrative and survey data at the individual level can therefore be considered a research procedure that compensates for the weaknesses of the two data collection methods. In our study we examine the possibilities and limitations of this procedure from an educational research perspective, within the framework of the Hungarian career tracking research program.
It is also an important task for educational research to weigh the advantages and disadvantages of using and, what is more, integrating different types of educational data sources. In this context we explore whether online survey data on graduate employment can be validated, particularly since these data may be important indicators of the effectiveness of the higher education system. In this respect, the results may play a role in education policy or funding decisions within an evidence-based approach.
Linking administrative and survey data is a commonly used method to improve data quality (Davern, Roemer and Thomas, 2009; Schnell, 2013); it makes it possible to measure and correct the effects of several types of data bias (coverage error, sampling error, nonresponse error, reporting error). The method can be applied only if both the administrative and the survey databases contain data on the same population (Groen, 2012). Our task is to summarize the experience of such data integration on Hungarian datasets; our conclusions, however, are general in scope.
The databases used in the analysis cover the same population: Hungarian graduates of 2010. The available administrative database is itself the result of an earlier data integration. It contains individual-level data on the whole population of these graduates from the Higher Educational Information System, the National Health Insurance System and the National Tax Office, two years after their graduation. The other source, used to assess data reliability, is the database of the regular Hungarian graduate career tracking survey. This online survey database also contains educational and employment data on the 2010 graduates. Although both databases are anonymized and include no unique ID for direct linkage, they contain a large number of overlapping variables (primarily gender, year of birth and several higher education background variables), which makes it possible to use the probabilistic record linkage method (Fellegi-Sunter, 1969). This procedure classifies record pairs into matching, probably matching and non-matching groups.
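The three-way classification of record pairs can be illustrated with a minimal sketch of Fellegi-Sunter style match weights. The field names, the m- and u-probabilities and the two thresholds below are hypothetical values chosen for illustration only; they are not the parameters used in the study, where such values would have to be estimated from the data.

```python
import math

# Hypothetical agreement probabilities for the overlapping variables.
# m = P(field agrees | both records belong to the same graduate)
# u = P(field agrees | records belong to different graduates)
FIELDS = {
    "gender": (0.99, 0.50),
    "birth_year": (0.98, 0.10),
    "institution": (0.95, 0.05),
    "field_of_study": (0.90, 0.08),
}


def match_weight(survey_rec, admin_rec, fields=FIELDS):
    """Sum the log2 likelihood ratios over all compared fields."""
    total = 0.0
    for name, (m, u) in fields.items():
        if survey_rec[name] == admin_rec[name]:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Assign a record pair to one of three groups (thresholds are assumed)."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


survey = {"gender": "F", "birth_year": 1987,
          "institution": "A", "field_of_study": "economics"}
admin_same = dict(survey)
admin_partial = dict(survey, field_of_study="law")
admin_other = {"gender": "M", "birth_year": 1985,
               "institution": "B", "field_of_study": "law"}

for candidate in (admin_same, admin_partial, admin_other):
    print(classify(match_weight(survey, candidate)))
```

In this sketch, a pair that agrees on every field falls into the matching group, a pair that disagrees on one field falls between the thresholds, and a pair that disagrees on every field is rejected. In practice the thresholds are set according to the error rates one is willing to tolerate.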
The most important result of linking survey and administrative databases in educational research is the identification and correction of measurement errors that stem from the low response rate of online surveys. This is a particular problem for self-reported income data (it may also be relevant to the measurement of vertical and horizontal mismatch).
Using administrative linkage we can answer the question of whether nonrespondents can be characterized by specific labor market features (e.g. whether their income differs from that of the population covered by the administrative database). This is an important issue because a low response rate does not necessarily lead to data bias; it does so only when respondents differ in their characteristics from nonrespondents.
Probabilistic record linkage allows this problem to be examined. Conversely, linking administrative and survey data can also be an appropriate tool for handling gaps in the administrative database, for example when analyzing officially “invisible” groups of graduates, such as those affected by employment abroad, undeclared work or unregistered unemployment, which a survey can measure more sensitively. As a result of the analysis, the volume and effect of several types of data bias, such as coverage error, sampling error or nonresponse error, can be identified.
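The nonresponse question can be sketched as a simple comparison on the administrative side once linkage has been carried out. The example below assumes, as a simplification, that each administrative record carries a hypothetical `linked` flag marking whether it was matched to a survey response, and a hypothetical `income` field; the figures are invented toy data, not results from the study.

```python
from statistics import mean


def nonresponse_summary(admin_records):
    """Compare administrative income of linked (respondent) and
    unlinked (nonrespondent) records with the full population."""
    respondents = [r["income"] for r in admin_records if r["linked"]]
    nonrespondents = [r["income"] for r in admin_records if not r["linked"]]
    population = [r["income"] for r in admin_records]
    return {
        "respondent_mean": mean(respondents),
        "nonrespondent_mean": mean(nonrespondents),
        "population_mean": mean(population),
        # Estimated bias of a respondent-only mean relative to the population
        "bias": mean(respondents) - mean(population),
    }


# Invented toy data for illustration only
toy_admin = [
    {"income": 100, "linked": True},
    {"income": 200, "linked": True},
    {"income": 300, "linked": False},
    {"income": 400, "linked": False},
]

summary = nonresponse_summary(toy_admin)
print(summary)
```

A bias close to zero would suggest that the low response rate does not distort the income estimate, while a large difference would indicate that respondents and nonrespondents differ on this characteristic.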
References
Davern, M. – Roemer, M. – Thomas, W. (2009): Investing in a Data Quality Research Program for Administrative Data Linked to Survey Data for Policy Research Purposes is Essential. Paper presented at the November 2009 meeting of the Federal Committee on Statistical Methodology, Washington, DC.
Fellegi, Ivan A. – Sunter, Alan B. (1969): A theory for record linkage. Journal of the American Statistical Association. 64. 328. 1183–1210.
Groen, Jeffrey A. (2012): Sources of Error in Survey and Administrative Data: The Importance of Reporting Procedures. Journal of Official Statistics. 28. 2. 173–198.
Schnell, Rainer (2013): Linking Surveys and Administrative Data. German Record Linkage Center, Working Paper Series.