Luxembourg, May 2010
AC/RDR/AMP

Directorate F: Social Statistics and Information Society
Unit F5: Health and food safety; Crime

Doc. ESTAT/F5/ES/201 (available in EN)
Orig.: EN

Working group "Food Safety Statistics"
Meeting of 24 – 25 June 2010
Luxembourg
Chair: Mrs. A. CLEMENCEAU

ITEM 4.4.b. OF THE AGENDA

UPDATED METHODOLOGICAL INFORMATION
TYPOLOGY OF SAMPLING STRATEGIES

Eurostat/Food safety statistics
Working group "Food safety statistics", 24-25 June 2010
Typology of sampling strategies
ESTAT/F5/ES/201

Summary

The participants in the Task Force "Control and monitoring activities", set up in 2004, have consistently stressed the importance of a common glossary as a basic tool for comparing data not only across countries but also across different control areas within the same country. Such comparison is currently difficult because of the specific terminology used in each area. As a consequence, one of the first tasks carried out by the Task Force was the creation of a common glossary.

Eurostat proposed a pragmatic approach[1], based on five principles, for building the glossary. These include:
- inclusion only of the terms necessary to interpret clearly the data collected on control and monitoring activities (the definitions must be compatible with those in EU legislation);
- exclusion of very specific and technical terms used exclusively in a particular control domain;
- inclusion of the concept "sampling strategy" and of the typology of "sampling strategies".

At the moment, each control/monitoring domain uses its own terminology: active monitoring, passive surveillance, follow-up sampling, reinforcement sampling, routine monitoring, suspect samples and target samples are examples of terms currently used in the different control domains to describe the strategy and to report the results. In other domains this information is not provided at all.
The objective of this document is to describe the harmonised and simplified way of documenting which strategy is used to run the control and monitoring activities in the different domains reported to the EU and included in the controls database, such as monitoring of zoonoses, monitoring of pesticide residues, official food and feed controls, TSE monitoring, etc. The final agreed proposal is part of the final version of the common glossary. It is also integrated in the latest version of the controls database in order to contribute to a better interpretation of the data both for the activity of "inspections" and for that of "sampling for analysis".

[1] See the pragmatic approach proposed by Eurostat in Doc. ESTAT/D6/ES/104: Common glossary of control and monitoring activities.

Eurostat/Food Safety Statistics
Updated methodological information – Typology of sampling strategies
WG Food Safety Statistics, 24-25 June 2010
ESTAT/F5/ES/201

Typology of sampling strategies

Objective
History of the process
Sampling strategy – Concept definition
Sampling strategy's typology
Sampling strategies: Definitions and guidelines
Objective sampling
Selective sampling
Census
Suspect sampling
Convenient sampling
Other sampling strategies
Conclusion

Objective

The identification of the strategy used by the countries to select the sample of the target population that will be subject to controls is a key issue for the correct interpretation of the results obtained from those controls.

At the moment, each control/monitoring domain uses its own terminology: active monitoring, passive surveillance, follow-up sampling, reinforcement sampling, routine monitoring, suspect samples and target samples are examples of terms currently used in the different control domains to describe the strategy and to report the results. In other domains this information is not provided at all.

The objective of this document is to describe the harmonised and simplified way of documenting which strategy is used to run the control and monitoring activities in the different domains reported to the EU and included in a single controls database, such as monitoring of zoonoses, monitoring of pesticide residues, official food and feed controls, TSE monitoring, etc.

History of the process

The identification of the main strategies used in control and monitoring activities has proved to be a difficult task. A first proposal (of November 2005) identified and defined three main types of sampling strategies: objective sampling, selective sampling and suspect sampling[2]. It was included in the Commission "Guidance document on official controls, under Regulation (EC) No 882/2004, concerning microbiological sampling and testing of foodstuffs"[3].
This typology is also proposed by the European Food Safety Authority (EFSA) in the manual for the annual reporting on the monitoring of zoonoses.

The typology was revised by the Task Force "Control and monitoring activities" in 2006, as some countries found it difficult to apply at national level. The second proposal (of May 2006), which was considered better for data evaluation, identified four main types of sampling strategies resulting from the combination of the choice of the target population (risk based / non-risk based) and the sampling method[4]. However, the new proposal was more complex, and the Task Force recommended that Eurostat provide more detailed definitions and guidelines to help the countries identify the type of sampling strategy applied in each case.

As some Member States still had difficulties, Eurostat decided to review the whole approach, aiming to propose a typology that could be applied at national level. At the meeting of the Technical Group "Food and feed control and monitoring activities" held on 1 – 2 October 2007, Eurostat presented a proposal intended as an improvement of the initial "3-types" typology, focusing exclusively on the criteria used to select the units that will be controlled.

[2] See Doc. ESTAT/D6/ES/104 Rev.1.
[3] Available at: http://ec.europa.eu/food/food/biosafety/salmonella/microbio_en.htm
[4] These four strategies were the following: Type 1 (risk based population selection and "random" sampling), Type 2 (risk based population selection and "non-random" sampling), Type 3 (no risk based population selection and "random" sampling) and Type 4 (no risk based population selection and "non-random" sampling).
After the final agreement, Member States were asked, on the basis of a "Questionnaire" prepared by Eurostat containing the proposed correspondence between current terminology and the typology of sampling strategies, to check whether the sampling strategies assigned to each domain were appropriate for their countries and, if necessary, to correct them[5].

At the meeting of the Technical Group "Food and feed control and monitoring activities" held on 30 – 31 March 2009, Eurostat presented the results of the countries' replies to the Questionnaire on the sampling strategies used[6]. At the same time, Eurostat provided further clarification to help the countries assign the sampling strategies correctly in the different control domains, and invited the countries that had not yet replied to send their replies by the end of April 2009[7]. The information provided has been included in the Controls database to support the interpretation of the data.

During 2009 Eurostat collaborated with the EFSA "Working Group on Data Interchange" to promote coordination between SANCO, EFSA and Eurostat, working on data collected by SANCO and EFSA. In this context, the definitions of the types of sampling strategies were further refined and included in the EFSA document "Standard sample description for food and feed"[8]. The present document describes this final version of the sampling strategies.

Sampling strategy – Concept definition

The "sampling strategy" can be defined as the approach used to select the units of the target population[9] subject to controls: businesses, animals, foodstuffs, etc. It is worth noting that the comparability and interpretation of the results depend not only on the sampling strategy but also on other parameters, such as the analysis methods, analysis matrices, preparation of samples and methods of calculation of the results.

The sampling strategy is generally decided at a high level in the national hierarchy and may differ between control domains. The practical implementation of the strategy is carried out locally by inspectors, who may take their own decisions regarding, for example, when to visit an establishment and what to sample within that establishment.

[5] See Doc. ESTAT/F5/ES/104 Rev.3, Part 1 "Typology of sampling strategies", presented at the meeting of the Technical Group "Food and feed control and monitoring activities" on 1 – 2 October 2007, and Doc. ESTAT/F5/ES/104 Rev.4 "Typology of sampling strategies - Version of February 2008". The "Questionnaire" in the "Annex: Proposal for the correspondence table between current terminology and typology of sampling strategies" is filled in differently in the two documents, as Doc. ESTAT/F5/ES/104 Rev.4 is a revision of the previous Doc. ESTAT/F5/ES/104 Rev.3, Part 1, following a further in-depth analysis and the comments received during and after the meeting of the Technical Group "Food and feed control and monitoring activities" of 1 – 2 October 2007.
[6] See Doc. ESTAT/F5/ES/177 Rev.1 "Results of the Questionnaire to indicate the sampling strategies used", presented at the meeting of the Technical Group "Food and feed control and monitoring activities" on 30 – 31 March 2009.
[7] See Doc. ESTAT/F5/ES/197 Rev.1 "Final minutes" of the meeting of the Technical Group "Food and feed control and monitoring activities" on 30 – 31 March 2009.
[8] Available at: http://www.efsa.europa.eu/en/scdocs/scdoc/1457.htm
[9] Target population refers to the population that is the object of study.

Sampling strategy's typology

The object of the control/monitoring activity (businesses, food products, live animals, etc.) is the "target population".
This is the population about which the control/monitoring activities can reach conclusions. The target population may or may not be determined on a risk basis. The potential risk can be identified on a scientific basis (results of scientific studies, knowledge of current production practices, etc.) or on the basis of a suspicion (information exchanged within the Rapid Alert System for Food and Feed, previous non-conformity results for the same product/establishment).

Once the target population is identified, a sampling method is used to select the sample of the "target population" that will actually be controlled/monitored. The sampling method can be either "random"[10], providing results from which conclusions can be drawn about the "target population", or "non-random".

A first key issue is to identify the target population clearly. It can be either the establishments of the food chain (e.g. inspections of hygiene conditions of retailers, HACCP controls, etc.) or specific products (e.g. monitoring of pesticide residues, controls for the presence of undesirable substances in animal nutrition, etc.). In the second case, the products will in many cases be selected by selecting the establishments where they will be controlled/monitored. In addition, the correct identification of a sampling strategy depends on how the data are reported, as shown in the definitions and guidelines that follow.

From the analysis of the information already provided to the European Commission and to EFSA, it appears that five main types of "sampling strategies" are currently used in the control domains: Objective sampling, Selective sampling, Suspect sampling, Convenient sampling and Census.

Sampling strategies: Definitions and guidelines[11]

The sampling strategy describes how the sample[12] was selected from the population being monitored or surveyed. The definitions refer to the reporting of aggregate data.
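The text above distinguishes random from non-random selection, and footnote 10 lists several probabilistic methods. The following is a minimal Python sketch, not a prescribed procedure, of three of them applied to a hypothetical frame of establishment identifiers: simple random sampling, systematic sampling, and stratified sampling with proportional allocation (the allocation criterion that, according to the guidelines on Objective sampling, keeps a stratified sample Objective). All function names, regions and sizes are illustrative assumptions.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every unit has the same inclusion probability n/N."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Take a random start in [0, k) and then every k-th unit, with k = N // n."""
    k = len(frame) // n
    if k == 0:
        raise ValueError("sample size larger than the frame")
    rng = random.Random(seed)
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def proportional_allocation(strata_sizes, n):
    """Split n across strata so that n_i / N_i is (as nearly as possible) constant.

    Exact shares n * N_i / N are floored; the units lost to rounding go to the
    strata with the largest fractional parts (largest-remainder rounding).
    """
    N = sum(strata_sizes.values())
    exact = {s: n * N_i / N for s, N_i in strata_sizes.items()}
    alloc = {s: int(share) for s, share in exact.items()}
    shortfall = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


def stratified_sample(frames_by_stratum, n, seed=None):
    """Proportional allocation, then a simple random sample within each stratum."""
    sizes = {s: len(units) for s, units in frames_by_stratum.items()}
    alloc = proportional_allocation(sizes, n)
    rng = random.Random(seed)
    return {s: rng.sample(units, alloc[s]) for s, units in frames_by_stratum.items()}


# Hypothetical frame: 1 000 establishments in three regions.
frames = {
    "North": [f"N{i:03d}" for i in range(400)],
    "Centre": [f"C{i:03d}" for i in range(350)],
    "South": [f"S{i:03d}" for i in range(250)],
}
print(proportional_allocation({s: len(u) for s, u in frames.items()}, 40))
# {'North': 16, 'Centre': 14, 'South': 10}
```

With this allocation every region is sampled at the same rate (16/400 = 14/350 = 10/250 = 4%), which is the condition under which, according to the section on stratified sampling, the stratum results may be summed up without weighting.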
Objective sampling

Definition: strategy based on the selection of a random sample[13] from a population on which the data are reported.

[10] The method used to select the samples can be any method (pure random, systematic, stratified, cluster sampling) leading to results applicable to the identified target population.
[11] The guidelines are not intended to recommend the use of a certain strategy but only to help the countries identify the strategy/strategies currently used, with the objective of documenting properly the data on control and monitoring activities.
[12] A sample is a subset of a population; see the definition in Doc. ESTAT/F5/ES/202.
[13] A random sample is a sample in which each unit of the population has the same probability of being included. A random sample allows statistical inference, that is, it leads to conclusions on the target population. A more general term for a random sample is "probabilistic sample": each population unit has a positive and known probability of being selected in the sample.

Guidelines: The method used to draw the sample (establishments or commodities) can be any method (simple random, systematic, stratified, cluster sampling, multistage sampling) leading to results representative of the determined target population. In particular, it also includes those cases where the target population is stratified into subpopulations and the sampling is run with a proportional criterion. This strategy provides data enabling statistical inference. It is one of those generally used for planned monitoring programmes intended to provide an overview of the situation regarding a certain food safety aspect.

Examples:
- random selection of bovines older than 24 months in all certified slaughterhouses, for checks of TSE.
(Purpose: to know the overall situation regarding the presence of BSE in bovines over 24 months slaughtered for human consumption.)
- stratified regional selection of food producers for HACCP controls. It would give results that could be extended to the whole population and broken down by region. The results must be weighted according to the number of food producers in each region. (Purpose: to know the overall situation regarding HACCP compliance of food producers. Results are provided for "all food producers". It would also be possible to provide results at regional level.)
- controls for the presence of Listeria monocytogenes in cheeses obtained from raw milk, from a random selection of producers of raw milk cheeses. (Purpose: to know the overall situation regarding the presence of Listeria monocytogenes in cheeses produced from raw milk. The target population is "Raw milk cheese". Results are provided for raw milk cheeses.)
- controls for the presence of Listeria monocytogenes in cheese from a stratified selection of cheese producers, grouped by the relative importance of their production of raw milk cheese compared to pasteurised milk cheese. The results must be weighted to take into account each type of cheese producer and their cheese production. (Purpose: to know the overall situation regarding the presence of Listeria monocytogenes in all cheeses. The target population is "Cheese". Results are provided for cheese. They could also be broken down for each of the two cheese types and for both the cheese type and the level of production.)
- controls for the presence of Listeria monocytogenes in cheeses obtained from pasteurised milk, from a regionally stratified selection of producers of pasteurised milk cheese. (Purpose: to know the overall situation regarding the presence of Listeria monocytogenes in cheeses produced from pasteurised milk. The target population is "Pasteurised milk cheese". Results are provided for pasteurised milk cheeses.
It would also be possible to provide results at regional level.)

Selective sampling

Definition: strategy based on the selection of a random sample from a subpopulation (or, more frequently, from subpopulations) of a population on which the data are reported. The subpopulations may or may not be determined on a risk basis. The sampling from each subpopulation is not proportional: the sample size is proportionally bigger, for instance, in subpopulations considered at high risk. This strategy also includes the case where the data reported refer to censuses of subpopulations.

Guidelines: The sampling is deliberately biased with respect to the overall population on which the data are reported, and is directed at particular products or manufacturers. The criteria used to select the subpopulation from which the sample is drawn are based on scientific grounds (results of scientific studies, knowledge of current production practices, etc.) or on previous information (non-conformity results for the same products/establishments in previous years, information exchanged within the Rapid Alert System).

This strategy provides data that do not enable statistical inference on the population that is the object of the control activity; it only provides information on the subpopulations. If a subpopulation is determined on a risk basis, it is worth noting that even when several countries use the same type of sampling strategy, the results are not necessarily comparable, as the definition of "risk" may differ between countries. In practice, this means that these countries determine different subpopulations. Moreover, the different monitoring systems address risk in different ways.
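The deliberate bias described above can be illustrated numerically. The Python sketch below uses invented figures for two risk-based subpopulations that together make up the whole population: a "high risk" group of 200 units, with 10 non-complying units found in a sample of 50, and a "low risk" group of 800 units, with 2 non-complying units found in a sample of 50. It compares the result obtained by simply summing up the sample results with the stratum-weighted estimate given in the section on stratified sampling, and repeats the comparison for a proportional allocation. Exact fractions avoid rounding noise; all names and figures are hypothetical.

```python
from fractions import Fraction

# Hypothetical data: name -> (N_i population size, n_i sample size, x_i non-complying units)
selective = {"high risk": (200, 50, 10), "low risk": (800, 50, 2)}
proportional = {"high risk": (200, 20, 4), "low risk": (800, 80, 4)}


def sampling_rates(strata):
    """Sampling rate f_i = n_i / N_i for each subpopulation."""
    return {name: Fraction(n_i, N_i) for name, (N_i, n_i, _) in strata.items()}


def is_proportional(strata):
    """True when every subpopulation is sampled at the same rate."""
    return len(set(sampling_rates(strata).values())) == 1


def pooled_rate(strata):
    """Unweighted result: all non-complying units summed up, divided by n."""
    x = sum(x_i for _, _, x_i in strata.values())
    n = sum(n_i for _, n_i, _ in strata.values())
    return Fraction(x, n)


def weighted_rate(strata):
    """Stratified estimate: sum over i of (x_i / n_i) * (N_i / N)."""
    N = sum(N_i for N_i, _, _ in strata.values())
    return sum(Fraction(x_i, n_i) * Fraction(N_i, N) for N_i, n_i, x_i in strata.values())


for label, data in [("selective", selective), ("proportional", proportional)]:
    print(label, is_proportional(data), float(pooled_rate(data)), float(weighted_rate(data)))
# selective False 0.12 0.072
# proportional True 0.08 0.08
```

Under the disproportionate allocation the pooled 12% is pulled towards the high-risk group (20% in its sample), whereas the weighted figure is 7.2%; with the proportional allocation the two coincide at 8%. The weighted figure can only be computed when every subpopulation of the reporting population has been sampled and its size is known; when sampling is confined to high-risk subpopulations, as in many selective programmes, the results describe those subpopulations only.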
This strategy is one of those used for planned monitoring and control programmes intended to reduce non-conformities.

Examples:
- controls for dioxins in feed materials in agricultural holdings, where sample units are taken preferably from establishments close to potentially polluting enterprises. (Purpose: to prevent the presence of dioxins in feed materials. Results are provided for all agricultural holdings with a high risk of presence of dioxins.)
- checks for the presence of unauthorised veterinary drugs in randomly selected bovines entering the slaughterhouses from farms where positive results were found the previous year. (Purpose: to detect the use of unauthorised veterinary drugs in bovines. Results are provided for high risk farms.)
- at the slaughterhouses, selection of animals showing signs of injection for checking the presence of unauthorised veterinary drugs. (Purpose: to detect the use of unauthorised veterinary drugs. Results are provided for suspected animals.)
- controls for BSE in some animals showing signs at the ante-mortem inspections. (Purpose: to prevent BSE animals entering the food production chain. Results are provided for bovines showing signs at the ante-mortem inspection.)

How to consider the results of a stratified sampling: Objective or Selective sampling?

In stratified sampling the target population is split into two or more subpopulations ("strata") according to some criterion. If the criterion is the level of risk, we can imagine, for instance, splitting the population into four subpopulations according to four levels of risk, from the lowest to the highest. A sample is then drawn from each subpopulation, the numbers of non-complying sample units found in each sample are simply summed up, and the total obtained is reported as the number of non-complying sample units in the overall sample. Is this sampling Objective or Selective?
In other words, does the result on the overall sample, calculated as above, provide conclusions that can be applied to the "target population"? It depends on the sample size of each stratum and on the variability within each subpopulation. Let's consider only the sample size.

a) If the sample size is bigger, relative to the subpopulation size, for the subpopulations considered "high risk" than for those considered "low risk", the overall estimate (total or percentage) of non-compliance in the population is biased towards the "high risk" subpopulations; in other words, assuming non-compliance is indeed more frequent there, the overall level of non-compliance is overestimated. In this case the sampling should be classified as "Selective", as the result mainly reflects the situation in the "high risk" subpopulations.

b) If the total sample is allocated among the strata using the "Proportional allocation" criterion (that is, the sampling rates $f_i = \frac{n_i}{N_i}$ are constant across all the subpopulations), the overall estimate of non-compliance in the population should be unbiased, and the sampling can be classified as "Objective".

Let's consider the implications for the estimation of the percentage of non-complying sample units in the overall population.
An unbiased[14] estimate of this percentage is calculated by summing up the weighted percentages of non-compliance in each sample $i$ as follows:

$$\hat{p} = \sum_{i=1}^{m} \hat{p}_i \left(\frac{N_i}{N}\right) = \sum_{i=1}^{m} \frac{x_i}{n_i} \left(\frac{N_i}{N}\right)$$

where:
$\hat{p}$ is the estimate of the percentage of non-compliance in the population,
$m$ is the number of subpopulations ($i = 1, \dots, m$),
$\hat{p}_i$ is the percentage of non-compliance in the sample drawn from subpopulation $i$,
$N_i$ is the size of subpopulation $i$,
$N$ is the size of the population,
$n_i$ is the size of the sample drawn from subpopulation $i$,
$x_i$ is the number of non-complying sample units in the sample drawn from subpopulation $i$,
$n$ is the size of the overall sample,
$x$ is the number of non-complying sample units in the overall sample.

In practice, in control and monitoring activities, the data collected from different subpopulations are very often summed up without being weighted. The percentage of non-compliance is then calculated as follows:

$$\hat{p} = \sum_{j=1}^{n} \frac{x_j}{n} = \frac{x}{n}$$

where $x_j$ takes the value 1 (non-complying sample unit) or 0 (complying sample unit) and $x$ is the total number of non-complying sample units in the sample.

Is this percentage an unbiased estimate of the percentage of non-compliance in the population? Yes, if the total sample is allocated among the strata using "Proportional allocation". In fact, if

$$f_i = \frac{n_i}{N_i} = \frac{n}{N} = f,$$

it follows that

$$\frac{n_i}{n} = \frac{N_i}{N}$$

and

$$\hat{p} = \sum_{i=1}^{m} \hat{p}_i \left(\frac{N_i}{N}\right) = \sum_{i=1}^{m} \frac{x_i}{n_i} \left(\frac{n_i}{n}\right) = \sum_{i=1}^{m} \frac{x_i}{n} = \frac{x}{n}$$

[14] In statistics, the bias of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator with zero bias is called unbiased; otherwise the estimator is said to be biased.

As a general rule, the results obtained from a sample drawn by a method that implies a subdivision of the target population, such as stratification or clustering, should be weighted according to that subdivision in order to provide results representative of the whole target population (Objective sampling). If this is not done, the results should be reported separately for each subpopulation. If the results by stratum or cluster are summed up without weighting, this should be stated clearly, as no conclusion on the whole target population can be drawn from them (Selective sampling).

Census

Definition: control of the totality of a population on which the data are reported.

Guidelines: following the definition, if a census of a subpopulation is included in the data reported for the whole population, the sampling referred to the population is Selective. For example, data on controls of imported grapes, including a census of grapes imported from Italy and sampling of grapes from other countries, should be reported as follows: Grapes: Selective sampling; Grapes from Italy: Census.

Examples:
- Hygiene inspections of all registered slaughterhouses. (Purpose: to know the overall situation of registered slaughterhouses regarding compliance with hygiene legislation.)
- BSE checks of all bovines older than 30 months in all certified slaughterhouses. (Purpose: to know the overall situation regarding the presence of BSE in animals slaughtered for human consumption.)

Suspect sampling

Definition: selection of an individual product or establishment in order to confirm or reject a suspicion of non-conformity. It is a non-random sampling. The data reported refer to suspect units of the population.
Guidelines: The selection is focused on a particular product or manufacturer. The choice is based on judgement and experience regarding that product or manufacturer. This strategy is intended to reduce non-conformities. Results cannot be generalised and refer only to the individual product or establishment.

Examples:
- Controls to verify a problem reported by a consumer in a product obtained from a certain producer. (Purpose: to verify whether the problem persists in order to take the necessary action. Results are provided for the product and producer concerned.)
- Checks for the presence of pesticide residues in oranges available from a specific wholesaler, following previous non-conformity results. (Purpose: to verify whether the non-conformity persists. Results refer exclusively to that wholesaler and fruit.)

Convenient sampling

Definition: strategy based on the selection of a sample whose units are selected only on the basis of feasibility or ease of data collection. It is a non-random sampling. The data reported refer to units selected according to this strategy. This "new" type was added because it is among the sampling strategies used in the data collected by EFSA.

Other sampling strategies

In order to document the Controls database, further types of sampling strategy can be assigned if none of the previous ones has been used to select the sample units:
More than one sampling strategy: the data reported refer to sample units selected according to more than one sampling strategy; for example, some units are selected according to "Objective sampling" and others according to "Suspect sampling".
Other: the data reported refer to sample units selected according to a strategy not included in the previous ones.
Not specified: the data reported refer to sample units selected according to an unspecified strategy.

Conclusion

The objective of this document is to describe the harmonised and simplified way of documenting which strategy is used to run the control and monitoring activities in the different domains reported to the EU. The documentation of the sampling strategies used by countries in the different domains is included in the controls database in order to contribute to a better interpretation of the data both for "inspections" and for "sampling for analysis".