IDB/CARICOM REGIONAL PUBLIC GOOD
COMMON FRAMEWORK FOR A LITERACY SURVEY
PHASE 1: Establishment of Common Framework
FINAL REPORT
May 2013

Preface

This report was reviewed by the CARICOM Advisory Group on Statistics (AGS) and the Secretariat and was revised for greater clarity. The Secretariat is indebted to the Consultant, Scott Murray, for his methodological inputs and guidance.

Table of Contents

Acronyms ......................................................................... 6
Executive Summary ................................................................ 7
INTRODUCTION TO THE PROJECT ..................................................... 15
  Background .................................................................... 15
  Objectives .................................................................... 15
  Scope of work/Expected output/Results to be achieved of the Project ........... 16
CHAPTER 1: BACKGROUND TO LARGE-SCALE LITERACY ASSESSMENTS, REVIEW OPTIONS AND EVALUATION CRITERIA ... 20
  1.1. A Brief History of Large-Scale Literacy Assessment ....................... 20
  1.2. The Assessment Options ................................................... 30
  1.3. Approach to Implementing Household-Based Literacy Skills Assessments .... 35
  1.4. The Evaluation Criteria and Regional Constraints ......................... 40
CHAPTER 2: A REVIEW OF OPTIONS .................................................. 46
  2.1. Program for International Assessment of Adult Competencies (PIAAC) - Common Assessment ... 47
  2.2. Program for International Assessment of Adult Competencies (PIAAC) - Full Assessment ... 49
  2.3. Literacy Assessment and Monitoring Program (LAMP) ........................ 55
  2.4. Saint Lucian Instruments - Common Assessment ............................. 58
  2.5. Saint Lucian Instruments - Full Assessment ............................... 63
  2.6. Bow Valley Web-based Assessment - Common Assessment ..................... 63
  2.7. Bow Valley Web-based Assessment - Full Assessment ........................ 67
  2.8. Summary .................................................................. 68
CHAPTER 3: ANALYSIS OF THE LITERACY SURVEY EXPERIENCE IN THE REGION ............. 70
  3.1. Bermuda's Experience ..................................................... 70
  3.2. Saint Lucia's Experience ................................................. 74
  3.3. Proposed Work in Dominica ................................................ 78
CHAPTER 4: FEEDBACK FROM THE CARICOM ADVISORY GROUP ON STATISTICS (AGS) ......... 79
CHAPTER 5: ANALYSIS OF THE INDIVIDUAL COUNTRY CAPACITY ASSESSMENTS .............. 84
CHAPTER 6: DETAILS ON THE OPTION RECOMMENDED BY THE AGS - FULL BOW VALLEY ASSESSMENT ... 117
  6.1. Detailed Methodological Approach of the Full Bow Valley Web-Based Assessment ... 117
  6.2. Recommendations to Inform the Use of the Bow Valley Full Web-Based Assessment ... 119
  6.3. Adjustments Required ..................................................... 122
CHAPTER 7: RESULTS OF THE TWO REGIONAL TRAINING WORKSHOPS CONDUCTED UNDER PHASE 1 ... 124
  7.1. The First Regional Training Workshop ..................................... 124
  7.2. The Second Regional Training Workshop .................................... 125
CHAPTER 8: COMMON FRAMEWORK WITH THE PLAN OF ACTION ............................. 126
CHAPTER 9: SUMMARY AND CONCLUSION ............................................... 151
  9.1. Activities Completed Under Phase I ....................................... 157
LIST OF REFERENCES .............................................................. 158
ANNEX A: TERMS OF REFERENCE ..................................................... 159
ANNEX B: COUNTRY ASSESSMENT QUESTIONNAIRE ....................................... 167
ANNEX C: COSTING FOR SAINT LUCIA'S LITERACY SURVEY PILOT ........................ 174
ANNEX D: SMALL AREA ESTIMATION .................................................. 177
ANNEX E: REPORTS ON THE CARICOM TECHNICAL WORKSHOPS ON THE COMMON FRAMEWORK FOR A LITERACY SURVEY
  ANNEX EI: First Workshop Report
  ANNEX EII: Second Workshop Report
ANNEX F: INCEPTION REPORT ....................................................... 183
ANNEX G: COMMON FRAMEWORK WITH PLAN OF ACTION ................................... 184

Acronyms

AGS      Advisory Group on Statistics
ALLS     Adult Literacy and Life Skills Survey
BPC      Board of Participating Countries
CAPI     Computer-assisted Personal Interviewing
CARICOM  Caribbean Community
CCL      Canadian Council on Learning
CSME     CARICOM Single Market and Economy
DEELSA   Directorate for Employment, Education, Labour and Social Affairs
DeSeCo   Definition and Selection of Competencies
ETS      Educational Testing Service
GDP      Gross Domestic Product
HRSDC    Human Resources and Skills Development Canada
IALS     International Adult Literacy Survey
IDB      Inter-American Development Bank
IRT      Item Response Theory
ISRS     International Survey of Reading Skills
LAMP     Literacy Assessment and Monitoring Program
LSUDA    Literacy Skills Used in Daily Activities
MOE      Ministry of Education
NALS     National Adult Literacy Survey
NCES     National Center for Educational Statistics
NSO      National Statistics Office
OECD     Organization for Economic Cooperation and Development
PIAAC    Program for International Assessment of Adult Competencies
PISA     Programme for International Student Assessment
PoA      Plan of Action
SALNA    Saint Lucia Adult Literacy and Numeracy Assessment
TAG      Technical Advisory Group
TOR      Terms of Reference
UIS      UNESCO Institute for Statistics
UNESCO   United Nations Educational, Scientific and Cultural Organization
USA      United States of America
YALS     Young Adults Literacy Survey

Executive Summary

The following is a summary of the Final Report of Phase I:

Options Reviewed

1. The review of options examined four distinct assessments. For three of these, both a full-sample and a reduced-sample variant were considered. A total of seven detailed options were therefore identified and evaluated.

2.
It was noted that the methodology underpinning all of the options reviewed was based on the same foundation, namely the International Survey of Reading Skills (ISRS).

3. Each option was evaluated in terms of information yield, cost, operational burden, technical burden and risk.

4. The assessment provided an analysis of the origins of large-scale literacy measurement, including the ISRS, the Adult Literacy and Life Skills Survey (ALLS) and the Young Adults Literacy Survey (YALS).

5. The Consultant's review of the various literacy assessment approaches suggests that all of the options would satisfy the Region's information needs and could provide similarly reliable estimates. Further, as indicated in item 2, all the options share common methodological underpinnings and differ mainly in the data collection method employed. Some countries may opt for paper-and-pencil data collection while others may opt for some form of electronic or web-based data collection.

6. One can therefore conclude that the recommendation of the AGS is primarily a choice of data collection approach, made against the evaluation criteria, and does not rest on theoretical differences among the options.

Regional Experience

7. The evaluation reflects the needs and constraints facing the countries of the Region.

8. With the exception of Saint Lucia and Bermuda, countries have limited or no experience in conducting household-based skills assessments; limited access to financial resources to support an assessment; a national data collection infrastructure on which a household-based skills assessment would impose a considerable operational burden; and limited technical capability to support the adaptation and implementation of such assessments.

9. Generally, the results of the country assessment suggest that most countries in the Region have limited capacity to administer any of the full-scale assessment options, including the full Bow Valley web-based assessment.
Most countries would need to greatly enhance their collection and processing capacity, and most would need assistance and support to complete the technical aspects of implementation, including sample selection.

Key Issues Arising Out of the Evaluation and Feedback from Countries

10. The evaluation suggests that all of the options reviewed would satisfy the Region's information needs.

11. Member States indicated a preference for the use of paper-and-pencil as well as electronic data collection, including web-based procedures.

12. It was recommended that each country be treated as a separate domain and that the sample size be proportionate to the size of the country's population.

13. It was noted that, while a very high response rate is usually difficult to achieve for this type of survey, countries should strive to achieve the required response rate of about 75-80 percent. Measures should be implemented to adjust for non-response bias.

14. It was noted that the use of a web-based assessment might prove challenging for non-computer users. The meeting was advised that in such cases there are two options: (i) a specially designed tutorial could be taken in advance of the assessment to acquaint non-computer users with basic mouse operations and the response types; or (ii) the interviewer inputs the responses into the data collection device at the direction of the respondent.

Recommendations from the Framework

15. Sample size
In general, the size of the sample should depend on country-specific policy requirements, subject to cost and the budget available to conduct the assessment in the respective countries. In order for point estimates to be reliably estimated according to selected characteristics, the Consultant has indicated that a minimum of 600 cases is required per category for each characteristic.
However, this figure of 600 cases is based on a given desired level of precision (margin of error) and level of confidence. If, for example, the tolerable level of confidence is 90 percent and the margin of error is 5 percent, the number of cases required may be less than 600. Countries can use the margin of error and level of confidence that they would normally use in their household surveys to derive reliable estimates at the sub-national level and for sub-groups of the population. This issue will be discussed further in the guidelines for sample design to be prepared under Phase II.

16. Adequate communication infrastructure, including internet access
If the preferred data collection method is web-based, it is recommended that countries identify at an early stage areas where internet coverage or access is inadequate. In the absence of internet access, the Bow Valley assessment tools allow for the use of the 3G network or related networks. The tools also allow for offline data collection using a large cache memory, in which case the data are downloaded after the interviews. In addition, central locations with internet access can be identified where respondents could be interviewed.

17. Sharing of equipment to conduct the survey across countries
It is recommended that equipment be shared across countries to make it feasible for all countries to participate in the survey. This is possible because countries are not likely to execute the survey at the same time, and it would considerably reduce the overall survey cost per country. Countries can therefore contribute to the purchase of the equipment, mainly laptops, tablets and other similar devices. This approach would address concerns raised by some countries about the cost of acquiring equipment.

18.
Respondents with limited or no computer technology knowledge
The recommendation is for interviewers to input responses on the device as directed by the respondents, since the assessment tools allow for this. Additionally, tutorial sessions on the use of the devices should be made available to respondents prior to the test.

19. Adequate human resources
Since the majority of countries currently lack the operational and technical capacity to conduct a literacy survey in general, and the Full Bow Valley Web-Based assessment in particular, it is recommended that countries and the Region consider this limitation when preparing the budget for the assessment. High-level technical experts should be engaged to provide training and to bridge the capacity gap.

20. Method of selection, age and number of respondents per household
It is recommended that one adult aged 15 years or over be selected per household using the Kish selection method.

21. Relevance of assessment instruments
There must be country involvement in the development or refinement of test items, questionnaires and corresponding documents, including manuals for training and interviewing and tutorials for respondents, to ensure suitability to the respective countries.

22. Generation of synthetic estimates
It is recommended that synthetic estimates be generated by applying the national survey estimates (obtained using a sub-sample of 1,000 cases of the determined country sample) to the data of the Population and Housing Census. This is a useful way to obtain estimates for a broader range of characteristics to satisfy policy requirements, using applied statistical methods. It does not imply that countries will be adopting the Consultant's suggestion of treating the Region as a domain and the countries as sub-domains, each with a sample size of 1,000.
The sample size will be selected in accordance with Recommendation 1, and a sub-sample of this sample can still be applied to the Census data to produce synthetic estimates in addition to those that would be obtained from the survey.

23. Pretesting/piloting must be done in each participating country
This is necessary to ensure that all the tools are applicable in the respective countries. However, the pilot sample (100-500 cases) should be selected from the main survey sample so that the data collected during the pilot exercise can be utilized should there be no need for major modification to the tools.

24. Translation of the common framework into Haitian French and Surinamese Dutch
The translation of the framework, including the test items, should be done by linguists who are familiar with the framework. The test items (for example, those included in the filter booklet, location booklet and main booklet) should be translated in such a way that the psychometric performance of the items remains unaltered. This is necessary to ensure that the test items remain identical in psychometric terms, so as to ensure comparability among countries.

25. Duration of training of field staff
The length of training of field staff will depend on the quality of the field staff and will vary by country.

Main comments received from Member States relative to the Framework

26. Countries have indicated that the cost of the exercise will be a major concern, and that the absence of technical expertise for conducting the survey will pose a problem. This includes expertise in survey sampling, data editing, scoring and weighting of the results, variance estimation and statistical quality control of the operations.

27.
It was also indicated that the respective Ministries of Education may not have the necessary managerial capacity to undertake the Literacy Survey, and that the respective statistical agencies may lack the pedagogical skills to work on the instruments. Collaboration between the two agencies would therefore be required and should be possible.

28. It was observed by one country that the paper-and-pencil environment is more familiar than the web-based one, and that a significant culture shift would be required in the latter case. Ensuring the suitability of the web-based approach at the country level should be taken seriously: the actual devices should be tested, and there is also concern about the location and security of the data set.

29. It was also stated that the paper-and-pencil approach carries credibility and ownership, since scoring is done by persons in the country, such as teachers, trained to score the completed test booklets. With the active involvement of these persons in scoring, there will be more buy-in to the process.

30. With respect to the web-based approach, it is very important that mechanisms be set up to ensure that the validity of the process is well understood and thoroughly tested.

31. It must be possible for the score assigned to each case by the web-based system to be seen, validated and verified. The process of doing this in a web-based system is not obvious and will need to be thoroughly tested.

32. Another country indicated that, given its One Laptop Per Family (OLPF) project, it might be at an advantage if a web-based methodology is used. This country viewed favourably the proposed solutions for areas within countries that do not have internet connectivity, as well as for persons who are not computer literate.

33.
One country stated that oversampling of the 15-24 age group (which includes the population just completing secondary school) might be necessary, since the literacy survey may have, as one of its main objectives, an assessment of how well the education system provides an education relevant to the present-day realities of the job market. It was further stated that this group is of special interest since it has the greatest demand for jobs, requires new skills in some cases, is the most adaptable to retraining, and is a significant age cohort within the population.

34. Of the 16 countries that responded to enquiries on the data collection approach they are likely to use for a National Literacy Survey, seven indicated the paper-and-pencil approach while nine indicated the electronic approach. It should be noted, however, that not all the countries indicating electronic collection would necessarily opt for the Bow Valley web-based option. The theoretical underpinning of the literacy testing framework will be the same regardless of the approach used (i.e. paper-based versus electronic/web-based).

Achievements of the Technical Workshops

35. Two technical workshops were conducted under Phase I. Their main achievement was that participants gained a better understanding of what is involved in the planning and execution of a large-scale literacy assessment; the various literacy measurement approaches; the issues pertaining to costing; sample size estimation; the background questionnaire and assessment/test booklet; and the Plan of Action relative to the recommendations and actions that inform the common framework.

INTRODUCTION TO THE PROJECT

Background

Consistent, reliable and comparable statistical information is an important ingredient for the planning, monitoring and evaluation of policy decisions, and the lack thereof has long been considered to hamper the effectiveness of public policy in the Caribbean Region.
Consequently, noting the lack and/or poor quality of literacy statistics in the Region, and the perceived importance of this information for the Region's economic and social development in light of the advent of the CARICOM Single Market and Economy (CSME), Statisticians of the Caribbean Community (CARICOM) decided to pursue the development of a CARICOM programme for the collection and analysis of social and gender statistics, including educational statistics. Aware of these challenges and of the shortcomings of the existing approaches to measuring literacy, the CARICOM Advisory Group on Statistics (AGS) is attempting to develop a common framework for the production, collection and analysis of literacy data. The AGS recruited a consultant to facilitate the development of a common framework for a Literacy Survey for the Region.

Objectives

This project is designed to create a Common Framework comprising a regional approach to literacy assessment methodology, the development of literacy assessment instruments, and the provision of technical assistance for the development of national implementation plans. The project is entitled "Common Framework for a Literacy Survey", Project ATN/OC-11810-RG under the Regional Public Good Facility, and is being funded from resources of the Inter-American Development Bank (IDB). The executing agency is the CARICOM Secretariat under its Regional Statistics Programme.
The program of work set out for the IDB-financed public good project consists of three components/phases, as follows:

Component/Phase I: Establishment of a regional framework for conducting and adapting literacy assessment models, to facilitate a regional assessment in which the Literacy Survey treats each country as a sub-population;

Component/Phase II: Development and adaptation of instruments, such as survey instruments (questionnaires), training manuals and related materials, to inform the survey, together with documentation on the concepts and definitions, scoring of the assessment, sampling approach, data dissemination/tabulation format, etc., as part of the common framework; and

Component/Phase III: Development of a template for the national implementation plans using a common questionnaire, field-test procedures for establishing the psychometric and measurement properties of the survey instrument, and confirmation of key aspects of survey cost and quality.

As an initial step in the preparation of this framework, reviews were undertaken of the methodologies used in the LAMP, the ISRS, the Saint Lucia assessment, the Bow Valley web-based assessment and the PIAAC. The purpose of this report is to document the comprehensive review, the consultations with the AGS and the Secretariat, the responses to the country assessment questionnaire, the detailed methodological approach with recommendations, the adjustments required, the actions to be taken and the actual methodology to be utilized.

Scope of work/Expected output/Results to be achieved of the Project

Component/Phase I Activities:

As the initial step of Phase I and of the Project as a whole, the Consultant was required to engage in a Briefing Meeting with the Regional Statistics Programme, CARICOM Secretariat, to discuss the scope of work of the project.
The meeting was held on 20 May 2011 at the CARICOM Secretariat, Georgetown, Guyana, and was attended by the Regional Statistics Programme as well as officers from other key directorates within the Secretariat (see Annex G for the Report on the Briefing Meeting). The meeting clarified the scope of the project, the assessment options to be reviewed, the risks and constraints to be taken into consideration, the expected outputs, the project timeline, reporting requirements and logistics. The meeting also made it clear that the consultancy was not to focus on any aspect of implementation, but rather to provide CARICOM Member States with options for consideration.

Following the Briefing Meeting, an Inception Report was prepared to document the project/consultancy execution methodology and the detailed draft work plan/implementation schedule for the lifetime of the project/consultancy. This Inception Report, a copy of which is included as Annex F, reflected input (improvements/adjustments) received from the Briefing Meeting held at the Secretariat on 20 May 2011 and from the 8th Advisory Group on Statistics (AGS) meeting held in Kingston, Jamaica, June 27-29, 2011.

The Inception Report stressed the importance of skills to economic growth and competitiveness and the need for reliable comparative data to inform public policy. The report also stressed the high cost and the technical and operational burden of household survey-based skills assessments, and the risk of these burdens overwhelming the limited capacities of Member States' statistical systems. The project description, the project scope, the expected outputs, the reports to be delivered as part of the project, and the risks and assumptions relative to the project were all elaborated in the Inception Report. An initial draft schedule of activities was also included (see Annex F).
As an initial step in the preparation of the common framework, comprehensive reviews will be undertaken to assess the methodologies used in the LAMP, the International Survey of Reading Skills (ISRS), the Saint Lucia Adult Literacy and Numeracy Assessment (SALNA), the Program for International Assessment of Adult Competencies (PIAAC) and the Bow Valley web-based assessment. These comprehensive methodological reviews will also identify any problems in application to CARICOM and the adjustments that would be required in adapting a common approach, considering the constraints facing Member States and Associate Member States. In addition, detailed reviews will be undertaken of Bermuda's experience with implementing the Adult Literacy and Life Skills Survey (ALLS) study and, prospectively, of the assessment planned in Dominica.

It is expected that the Secretariat and the AGS will review this report and will make recommendations on an option to serve as a common framework for literacy assessment in the CARICOM area. A Plan of Action (PoA) for the agreed approach will be prepared and submitted. This PoA will outline the estimated cost and the sequence of steps needed to implement the recommended option. Support would then be given to the Secretariat and the AGS in the preparation of the Final Common Framework for a Literacy Survey.

Component/Phase II Activities:

Based on the Common Framework developed in Phase I, all instruments and related documents will be prepared in this phase. These include common instruments for pilot-testing and screening, the sampling frame, instructions for measuring/scoring the levels of literacy, a training manual (covering the instruments to be used; training frequency and effort required; and quality assurance mechanisms to be used during data collection and subsequent data validation), guidelines for data processing, and the format for data analysis and subsequent publication/dissemination. The Member States will be consulted as per the TOR (Phase II activities, item (a) x).
A draft report and a final report of Phase II activities will be prepared. The former will include adjustments required based on feedback from the AGS, Member States, the workshop and the Secretariat. The latter will incorporate all comments/outputs from meetings/workshops, the Secretariat and other stakeholders.

Component/Phase III Activities:

This component will provide technical assistance during and after a third regional workshop to develop a template for the national implementation plans and its adaptation to national realities. The template for the preparation of the national implementation plan will include a list of all activities to be undertaken as contained in the detailed common literacy framework, and the documentation to be obtained as per the survey instruments, including the pilot test and quality assurance. The draft template will also include cost estimates (budget) for all activities, covering timelines; the numbers of interviewers, supervisors and scorers required; training requirements; resources required for subsequent data analysis; a social marketing effort to inform the general public; and a sustainability component.

During this phase, a cost estimate for the conduct of two Literacy Assessments in each CARICOM country will be prepared, as per the TOR (Phase/Component III, item (c)). Additionally, information will be collected from member countries with regard to resources available at the national level, staffing, budget, collaborating partners (Statistical Office, Ministry of Education and other relevant agencies) and other relevant information on capacity availability/constraints that can inform the template to be prepared. A final draft national implementation plan will be produced to reflect feedback from the Secretariat and the AGS. A draft report and a final report of Phase III activities will be prepared and submitted to the Secretariat.
Both reports will include the national implementation plan template, but the latter will incorporate all comments/outputs from meetings, the Secretariat and other stakeholders.

CHAPTER 1: BACKGROUND TO LARGE-SCALE LITERACY ASSESSMENTS, REVIEW OPTIONS AND EVALUATION CRITERIA

This chapter describes the options that were selected for evaluation and sets out the criteria that were applied in the review. The chapter begins, however, with a brief overview of the history of literacy assessments, which provides background on how the options being evaluated relate to one another.

1.1. A Brief History of Large-Scale Literacy Assessment

1.1.1. Young Adult Literacy Survey (YALS)

The initial large-scale comparative assessments of adult literacy and numeracy can be traced back to the Young Adult Literacy Survey (YALS) conducted by the Educational Testing Service (ETS) on behalf of the United States Departments of Labor and Education. The YALS study was enabled by scientific advances in five key areas, as follows:

(a) The first advance involved the development of theory to explain the relative difficulty of reading and numeracy tasks. Developed by Irwin Kirsch and Peter Mosenthal (Kirsch and Mosenthal, 1994), the models explained a sufficiently high proportion of the observed variance in item difficulty to provide a means to develop tests that systematically sampled the cognitive domains of interest. The initial models explained roughly 83 percent of observed variance in item difficulty, enough to allow the results of the assessment to be interpreted as reliable indications of generalized proficiency. This innovation also provided a means to describe proficiency in an effective way, that is, to identify what respondents at different levels could and could not do.

(b) The second advance involved the development of statistical procedures to provide reliable summaries of both item difficulty and individual and group proficiency.
Referred to as Item Response Theory (IRT), these statistical procedures extracted reliable estimates of both item difficulty and proficiency from complex vectors of test results that themselves included a significant amount of item-level non-response.

(c) The third advance involved the development of statistical methods to support the estimation of reliable estimates for population subgroups, complete with unbiased estimates of standard errors that include the error associated with drawing representative samples in two dimensions, i.e. of people and of the cognitive domains of reading and numeracy.

(d) The fourth advance was the development of procedures to support the administration of the assessments within the context of a household survey. These procedures included methods designed both to maximize response rates and to minimize the impact of partial non-response on the estimates of item difficulty and proficiency.

(e) The fifth and final advance involved the development of procedures to control the error in the scoring of open-ended test item responses. Scoring error of this sort translates directly into bias in the proficiency estimates, so the procedures had to reduce these errors to negligible levels. The approach developed involved the adaptation of the standard statistical quality control processes commonly used in statistical coding operations by national statistics offices.

1.1.2. Survey of Literacy Skills Used in Daily Activities (LSUDA) and National Adult Literacy Survey (NALS)

The conduct of the YALS study in the United States of America precipitated the conduct of two national assessments in Canada. The Southam study of 1989 applied the YALS approaches to measurement and data collection but failed to apply the statistical methods used to summarize item difficulty and proficiency.
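The statistical methods in question are the IRT procedures described under the second advance above. As an illustration only (this report does not specify the exact model used in these surveys), the widely used two-parameter logistic model expresses the probability that a respondent answers a test item correctly as a function of the respondent's latent proficiency and the item's difficulty and discrimination:

```latex
% Two-parameter logistic (2PL) IRT model -- a standard formulation shown
% for illustration; the specific model used is not stated in this report.
%   \theta_i : latent proficiency of respondent i
%   b_j      : difficulty of item j
%   a_j      : discrimination of item j
P(X_{ij} = 1 \mid \theta_i) \;=\; \frac{1}{1 + e^{-a_j(\theta_i - b_j)}}
```

Jointly estimating the item parameters and the respondent proficiencies is what allows reliable summaries of both item difficulty and proficiency to be extracted even when many item-level responses are missing.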
The 1989 Survey of Literacy Skills Used in Daily Activities (LSUDA), conducted by Statistics Canada, was the first study to apply the full YALS methodology in two languages, i.e. Canadian English and Canadian French. The next year, ETS fielded the 1990 National Adult Literacy Survey (NALS) on a large representative sample of US adults.

1.1.3. International Adult Literacy Survey (IALS)

Comparison of results by Kirsch and Murray at a meeting organized by the then UNESCO Institute of Education in Hamburg in November 1989 [1] led to the design and implementation of the International Adult Literacy Survey (IALS), using a combination of NALS, LSUDA and newly developed items. IALS was implemented by a consortium that involved Statistics Canada, the US National Centre for Education Statistics (NCES), ETS and the Organization for Economic Cooperation and Development (OECD). Statistics Canada provided overall project management and household survey expertise related to sampling, data collection, data processing, weighting, variance estimation, coding and data analysis. ETS assumed responsibility for item development, test design, scoring and related psychometric tasks. The OECD provided the means for the rapid dissemination of results to policy makers. NCES provided technical advice and funding to support development and implementation. Three large-scale rounds of IALS data collection ensued between 1994 and 1998, involving some 25 population/language subgroups.
The IALS data revealed several key facts, including that:

(a) Differences in the level and distribution of literacy and numeracy skills were much larger than expected

(b) Literacy and numeracy skills had a marked impact on a range of individual labour market, health, social and educational outcomes

(c) The impact of literacy and numeracy on outcomes varied significantly by country in response to differences in the relative balance of skills supply and the social and economic demand for skills

(d) The observed differences in the level and distribution of literacy skills by country explained over half of the differences in long-term rates of growth in Gross Domestic Product (GDP) and labour productivity

IALS also confirmed the need for more quality assurance related to sampling and the control of data collection. Additional quality assurance methods were added with each successive round of IALS data collection [2].

[1] Organized by Paul Belanger, the then Director of the UNESCO Institute of Education, and co-sponsored by the OECD, the workshop brought together assessment experts and policy makers from several countries to discuss functional literacy. Irwin Kirsch attended as a US expert, having worked on the 1985 U.S. Young Adult Literacy Survey (YALS) and the 1990 U.S. National Adult Literacy Survey (NALS). Scott Murray attended as a Canadian expert responsible for the conduct of the 1989 Canadian Survey of Literacy Skills Used in Daily Activities (LSUDA).

1.1.4. Adult Literacy and Life Skills Survey (ALLS)

Starting in 2000, the US NCES and Statistics Canada initiated a program of work designed to inform the design of a new round of international comparative assessment. The program included fundamental work on the definition and selection of competencies that was jointly funded by Statistics Canada, the NCES and the Swiss government. Known as DeSeCo, this element identified several additional skills domains that might be included.
Subsequently, frameworks and associated measures were developed for numeracy, teamwork, problem solving and information and communication technologies. Testing in multiple countries [3] revealed that only the initial prose literacy and document literacy measures and the new numeracy and problem solving measures met the demanding standards set for inclusion. The background questionnaires were also refined and extended. Two rounds of data collection, involving some 11 countries, were undertaken of what became known as the Adult Literacy and Life Skills Survey (ALLS). Analysis of the data [4] using synthetic cohort methods provided the first clear evidence of skill loss and its effect on the supply of literacy skills. Bermuda participated in the ALLS in 2003.

[2] The Consultant served as the International Study Director for all three rounds of data collection and analysis.
[3] Canada, United States, Spain, Netherlands.
[4] Done by the Consultant and a colleague.

1.1.5. International Survey of Reading Skills (ISRS)

Having established the nature of adult literacy and numeracy problems in OECD economies and their impact, Statistics Canada, the NCES and ETS jointly developed a study to shed light on what might be done to help low-level readers improve their skills. The subsequent study, fielded in 2005 and known as the International Survey of Reading Skills (ISRS), was the first international comparative study to test the component reading skills of adults with a battery of clinical reading tests and an oral fluency test. The testing, undertaken in Canadian English, American English and Canadian French, showed that the component reading measures played a significant role in explaining the emergence of general reading proficiency. Moreover, analysis of the data identified several groups of learners, each of whom shared distinct patterns of strength and weakness on the component measures [5]. The ISRS was designed to correct what was perceived to be a failure in the market for literacy services, i.e.
that the instructional offerings of programs were not well matched to the specific needs of different kinds of learners, with the result that overall efficiency, effectiveness and levels of learner satisfaction were well below what was possible. The ISRS provided a means to classify potential learners into groups based on patterns of test-takers' strengths and weaknesses in the component reading measures. This information allowed programs to create homogeneous groups of learners and to tailor instruction to each group's specific needs. The availability of the ISRS data also allowed for a nuanced analysis of the numbers in each group in the adult population, what best practice would do to address their learning needs and an analysis of the associated costs and benefits; all information needed by policy makers and programs to target their resources more efficiently.

[5] The Consultant has used these patterns to define best-practice instructional responses for each group and as a basis for an associated series of cost/benefit analyses (CCL, 2007; DataAngel, 2009; DataAngel, 2010).

1.1.6. Literacy Assessment and Monitoring Programme (LAMP)

In 2005, the UNESCO Institute for Statistics commenced the adaptation of the IALS/ALLS/ISRS methods to meet the needs of a broader range of countries [6]. After the Institute had spent two years and considerable funds to develop an approach known as the Literacy Assessment and Monitoring Programme (LAMP), a partnership was negotiated with ETS and Statistics Canada to adapt the ALLS and ISRS measures and methods. Development of LAMP was completed in early 2007. The resulting design included a background questionnaire and an assessment that measured prose literacy, document literacy, numeracy and, for readers at Levels 1 and 2, reading component measures based on the ISRS model and measures.

[6] The Consultant was recruited to assist with the adaptation.
Pilot assessments were organized in several countries and languages, including Niger, El Salvador, Palestine, Mongolia and Morocco. Full-scale collection was only undertaken in Palestine. At this point, the Consultant left the Institute and the new director abrogated the agreements with Statistics Canada and ETS to support implementation. Unfortunately, at the time of writing nothing has been published on the psychometric performance of the measures, on difficulties encountered during implementation or on the substantive results.

1.1.7. Programme for the International Assessment of Adult Competencies (PIAAC)

In 2008, the OECD began development of instruments for a new round of international comparative assessment. ETS won the bid to manage development and implementation of the first round of the Programme for the International Assessment of Adult Competencies (PIAAC) collection. This Programme borrows heavily from the IALS/ALLS/ISRS design. PIAAC retains the ALLS prose literacy and document literacy frameworks but combines their items into a single reading measure, uses the ALLS numeracy framework and measures, administers a variant of the ISRS reading component measures to low-level readers and includes a new measure of problem solving in technology-rich environments. Importantly, PIAAC was the first international study to use computer delivery of the assessment. The PIAAC background questionnaires were also re-developed and, apart from including much of what was collected in the ALLS study, include an interesting job requirements questionnaire designed to capture skills demand. Pilot testing in twenty-three (23) countries has established the psychometric integrity of the measures, including the ability to link to the IALS and ALLS scales for the analysis of trends.
Data collection is now underway in the 23 countries [7], with another nine (9) countries scheduled to participate in a second round of collection this year. Main data collection is underway for the first round of countries.

[7] Countries include Australia, Austria, Belgium, Canada, the Czech Republic, Denmark, Estonia, Finland, France, Germany, Hungary, Ireland, Italy, Japan, Korea, the Netherlands, Norway, Poland, the Russian Federation, the Slovak Republic, Spain, Sweden, the United Kingdom and the United States.

1.1.8. Saint Lucia Adult Literacy and Numeracy Assessment (SALNA) - Pilot

In 2008, the Consultant began the development of the Saint Lucia Adult Literacy and Numeracy Assessment (SALNA). Working with staff from Statistics Canada and the National Statistics Office in Saint Lucia, the Consultant adapted the ALLS and ISRS measures and methods for use in Saint Lucia. The Saint Lucia assessment included an adapted background questionnaire; measures of IALS/ALLS prose literacy, document literacy and numeracy; and, for low-level readers, a battery of clinical reading assessments that had been used in the ISRS. A pilot study was carried out and an analysis of these data was published in March 2009. The data, even though from only a pilot study, demonstrated that the prose, document and numeracy measures performed as well in Saint Lucia as they had in other countries and that the reading component measures worked as they had in Canada and the US. The recession caused the government to postpone collection of the main assessment data. The National Statistics Office (NSO) is currently seeking international funding to support implementation.

1.1.9. Bow Valley Web-Based Assessment

Following release of the ISRS data, Human Resources and Skills Development Canada (HRSDC) funded Bow Valley College in Calgary, Alberta to develop a web-based assessment based on the theory and assessment methods deployed in the IALS, ALLS and ISRS.
The goal was to reduce the cost, operational burden and test duration to a level that would allow for use in a wide variety of settings, including instructional programs. The tool includes a number of innovative features, including:

(a) An adaptive algorithm that greatly reduces test duration while reducing standard errors around the proficiency estimates.

(b) The ability to choose any combination of skills domains and to choose among four precision levels that support different uses, i.e. program triage [8], formative or summative assessment, pre- and post-assessment that supports reliable estimates of score gain, and a certification for employment linked to Canada's system of occupational skills standards.

(c) A pair of score reports that provide diagnostic information for the learner and their instructor, and a third score report that identifies the benefits that would be expected to accrue to the learner should the prescribed training be undertaken. Real-time algorithmic scoring improves scoring reliability and allows score reports to be generated in real time.

Bow Valley's suite of web-based assessment and instructional products includes:

Focus - an adaptive assessment of prose literacy, document literacy and numeracy with four levels of precision

Foundation - an adaptive assessment of prose literacy and the reading components assessed in the ISRS

Scaffold - an adaptive instructional system that includes the ISRS reading component measures

Oral fluency - a technology-based assessment of oral fluency in English, French, Spanish or Arabic

The HRSDC investment also included the development of a web-based instructional system, the world's first such system to be based upon the ALLS and ISRS frameworks. To date, CAN$4.8M has been invested in system development and validation. A number of large-scale trials are taking place in colleges, workplaces and literacy programs.
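The adaptive algorithm referred to above can be illustrated with a simplified sketch. This is not Bow Valley's actual algorithm, which is proprietary; it merely shows the general principle of adaptive testing that underlies the reduction in test duration: after each response the ability estimate is updated and the next item administered is the unused item expected to be most informative at the current estimate. All function names and the crude step-size update are illustrative assumptions.

```python
import math

def rasch_p(ability, difficulty):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def adaptive_test(item_bank, respond, n_items=5, ability=0.0, step=0.6):
    """Administer n_items adaptively from item_bank (a list of item
    difficulties on the logit scale).

    respond(difficulty) -> bool records (or simulates) the answer.
    The ability estimate is nudged up after a correct answer and down
    after an incorrect one, with a shrinking step size: a crude
    stand-in for a maximum-likelihood update.
    """
    remaining = list(item_bank)
    for k in range(n_items):
        # Most informative unused item: difficulty closest to the
        # current ability estimate.
        item = min(remaining, key=lambda d: abs(d - ability))
        remaining.remove(item)
        correct = respond(item)
        ability += (step / (k + 1)) * (1 if correct else -1)
    return ability
```

A test taker of known ability could be simulated by answering each item with probability `rasch_p(true_ability, difficulty)`; because items near the current estimate carry the most information, far fewer items are needed than in a fixed-form booklet, which is the source of the duration savings claimed for adaptive designs.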
The assessment and instructional programs will be made available commercially at a fraction of the cost of equivalent paper and pencil assessments. Importantly for current purposes, the tools are also available in French and Spanish [9].

[8] Program triage involves the process of determining learner objectives and learning needs so that an individual learning plan can be formulated. The process of program triage is central to the implementation of efficient and effective programs.

The Bow Valley tools have been administered to 1,600 adults in Canadian English and French in educational contexts and to 300 adults in employment programming. Thirteen thousand (13,000) additional administrations are scheduled for this year. Validation trials are scheduled for the Chilean Spanish version in August 2012. To date, the tool has not yet been used in a national assessment. The implementation of the PIAAC demonstrates, however, that computer-based assessment is viable. These administrations confirm several things, including that:

(a) The test produces reliable, comparable and interpretable proficiency estimates.

(b) As predicted, the adaptive algorithms reduce test durations by roughly 40 percent.

(c) The test taker tutorial eliminates any issues related to unfamiliarity with computer use.

(d) The response types are intuitive.

(e) The score reports are useful for both instructors and test takers.

(f) Test takers like the clean, uncluttered look and feel of the test.

1.1.10. Summary of Historical Development of Literacy Assessment Options

By way of summary, all of the options evaluated under this review share a number of fundamental characteristics. Among other things, these characteristics include:

(a) The IALS, ALLS, LAMP, Saint Lucia and PIAAC assessments of literacy are all based on the same framework developed by Kirsch and Mosenthal and initially applied in YALS and NALS. This framework defines the variables that underlie the relative difficulty of reading tasks.
[9] DataAngel Policy Research distributes the Bow Valley products outside of Canada in Canadian English, American English, Canadian French, Mexican Spanish, Chilean Spanish, France French, Brazilian Portuguese and a number of other languages.

(b) The ALLS, PIAAC, Saint Lucia and LAMP assessments of numeracy are all based on an extension of the quantitative literacy framework developed by Kirsch and Mosenthal and applied in YALS, NALS and IALS. The refined framework, developed by a team of international experts led by Iddo Gal and funded by NCES and Statistics Canada, defines the variables that underlie the relative difficulty of numeracy tasks.

(c) YALS, NALS, IALS, ALLS, LAMP, PIAAC and Saint Lucia all use a common set of methods to summarize proficiency, i.e. Item Response Theory-based models.

(d) YALS, NALS, IALS, ALLS, LAMP, PIAAC and Saint Lucia all report results on a common 500-point scale and, notionally, the same proficiency levels. Each study uses a slightly different approach to scale linking and relies on linking items for which Statistics Canada holds the copyright.

(e) The ISRS, LAMP and PIAAC all incorporate a set of reading components, the mastery of which has been shown to underlie the emergence of the fluid and automatic reading characterized by proficiency at Level 3 on the international scales. The approach was initially developed by John Strucker at Harvard and applied in the ISRS, and was subsequently refined by John Sabatini for application in PIAAC and LAMP. The ISRS was administered in Canada to a sub-sample of ALLS respondents and in the US to a new representative sample plus a sample of program participants.

(f) The ISRS was the first international comparative study to test the component reading skills of adults with a battery of clinical reading tests and an oral fluency test.
(g) PIAAC, LAMP and Saint Lucia all use background questionnaires that are largely based upon the questionnaire developed for the ALLS study. These questionnaires serve to support analysis and to improve the reliability and precision of the associated proficiency estimates.

(h) The Bow Valley suite of assessment and instructional products was developed to reduce the barriers to the use of assessments in research and program settings. The LAMP, PIAAC and Saint Lucia approaches are costly and operationally and technically burdensome. The Bow Valley assessment suite is being used to support several national-level programs sponsored by the Government of Canada.

The general conclusion is that all of the studies rely on the same theory and methods for summarizing and reporting proficiency. Thus, the key differences among the studies have more to do with cost, how the methods are implemented and how much effort is devoted to quality assurance. A proposal on how the ALLS, LAMP, PIAAC and ISRS methods might be adapted to the needs and the financial and operational realities of small island states was prepared earlier (DataAngel, 2008).

1.2. The Assessment Options

The assessment options to be evaluated flow directly from the studies enumerated above. The options to be evaluated, in accordance with the Terms of Reference (Annex A) of the consultancy and in agreement with the members of the Advisory Group on Statistics (AGS) at the eighth AGS meeting, are as follows:

(a) The International Survey of Reading Skills (ISRS)
(b) The OECD's PIAAC program
(c) UIS's LAMP study
(d) Saint Lucia's Literacy and Numeracy Assessment
(e) The Bow Valley Web-Based assessment

As mentioned earlier, it is important to acknowledge that all of these options are based upon the same science, evidence and experience base. The distinguishing features of these options are practical considerations related to cost, technical burden, operational burden and risk. The options can also be distinguished by their information yield.
At the simplest of levels, the Region's countries share a pressing need for objective comparative data on the level and social distribution of economically and socially important skills for policy purposes. In statistical terms, the countries need comparative data on:

(a) Average skills levels
(b) The distribution of skills by proficiency level for key subpopulations, including youth, employed workers and the workforce as a whole
(c) The relationship of skills to outcomes
(d) The relationship of skills to determinants

These data can be obtained through the following:

1. The conduct of a household survey that includes demographic characteristics and a skills assessment on a sample that is large enough to support estimates of population characteristics directly, or

2. The conduct of a household survey that includes demographic characteristics and a skills assessment on a sample that is large enough to provide estimates of the relationship of skills to background characteristics. In the latter case, estimates of population characteristics, including skills distributions, are derived by applying the observed relationships to an existing data source, such as the Census of Population, through a process of imputation. This approach yields estimates that are "good enough" for most policy purposes without imposing large operational burdens or financial investments.

In each case, two distinct implementation options are considered for each choice of assessment program being evaluated. The first option would involve full participation in the regular study at sample sizes sufficient to fulfil country-specific policy requirements. The second option would see CARICOM Member States field a common assessment in which each country is treated as a sub-population. Pursuing this option would greatly reduce the financial and operational burden associated with implementation without sacrificing much of the value of the data for policy purposes.
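The imputation approach described in option 2 can be illustrated with a simplified sketch. The cell-mean method, variable names and data layout below are illustrative assumptions, not the report's prescribed procedure; a production implementation would use model-based imputation with proper variance estimation. The principle, however, is the one described above: relationships estimated from the small assessment sample are applied to Census records.

```python
from collections import defaultdict

def fit_cell_means(survey):
    """Estimate the mean skill score per (age_group, education) cell
    from a skills-assessment sample.

    survey: list of (age_group, education, score) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for age_group, education, score in survey:
        cell = totals[(age_group, education)]
        cell[0] += score
        cell[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}

def impute_census(census, cell_means):
    """Attach an imputed skill score to each census record whose
    cell was observed in the survey sample."""
    return [
        (age_group, education, cell_means[(age_group, education)])
        for age_group, education in census
        if (age_group, education) in cell_means
    ]
```

With the survey supplying only the skills-to-background relationships and the Census supplying the population counts, national skills distributions can be tabulated from the imputed file, which is why the common-assessment sample can be so much smaller than a full survey.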
It is important to understand what these two options imply for the utility of the resultant data. The rationale that underlies the conduct of a common assessment is to reduce the operational and financial burden of fielding a comparative skills assessment without sacrificing too much of the associated information yield [10]. A more detailed summary of this rationale is set out below.

[10] Further information is detailed in the document 'The adaptation of DataAngel's Literacy and Numeracy Assessment to the needs of Small Island States' (DataAngel, 2008).

Official statistics of the sort to be collected by the proposed assessment serve five purposes:

1. Knowledge generation: understanding cause and effect, the impact of multiple determinants on outcomes, relative risk, attributable risk and what they imply for policy

2. Policy and program planning: preparation to act to influence outcomes or their social distribution

3. Monitoring: tracking trends in key outcomes to determine if the world is unfolding as expected and to identify newly emerging trends

4. Evaluation: the formal analysis of whether policies and programs are meeting their objectives and offering value for money

5. Administration: the process of making decisions about specific individuals or institutional units such as programs or regions

Studies generate two types of information: (i) point estimates of the numbers of individuals sharing particular combinations of characteristics and (ii) estimates of the relationships among variables, including estimates of the strength of the relationship between skills and background variables. Point estimates are estimates of the numbers of individuals in the sampled population who share a common attribute or characteristic. Producing point estimates is very demanding in terms of sample size: conventional practice requires 400 cases to be allocated to each cell where reliable estimates are required by design, using random samples.
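The sample-size arithmetic used in this chapter can be sketched as a small calculator. The 400-cases-per-cell rule for point estimates, the 30-cases-per-cell rule for relationship estimates and the design-effect multiplier are the report's conventions; the function and constant names are ours, introduced only for illustration.

```python
import math

# Rules of thumb used in this chapter: roughly 400 cases per cell for
# reliable point estimates and roughly 30 cases per cell for reliable
# estimates of relationships, both inflated by the survey design
# effect (the factor by which the design departs from simple random
# sampling).
CASES_POINT = 400
CASES_RELATIONSHIP = 30

def required_sample(n_cells, purpose="point", design_effect=2.0):
    """Minimum completed cases for n_cells estimation cells."""
    per_cell = CASES_POINT if purpose == "point" else CASES_RELATIONSHIP
    return math.ceil(n_cells * per_cell * design_effect)
```

For example, reliable point estimates for four age-by-sex cells under a design effect of two would call for 4 x 400 x 2 = 3,200 completed cases, while relationship estimates for the same four cells would need only 4 x 30 x 2 = 240; this order-of-magnitude gap is the arithmetic case for the common assessment.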
Estimating relationships among variables is far less demanding, with 30 cases per cell required to support the production of reliable estimates using random sampling. In both cases, these numbers of cases must be multiplied by the design effect to reflect the degree to which the sample design departs from a simple random sample. Experience suggests that well-designed national assessments require roughly 600 cases per cell to support point estimates and 60 respondents per cell to support multivariate analyses.

The fundamental idea underlying a common assessment is that the estimates of relationships between skills and background characteristics are far more important than estimates of the numbers of adults in particular groups. The key innovations in comparative skills assessments are the skills measures themselves. Much of the sample fielded in a skills assessment goes to re-estimating characteristics that are already available from other sources, including the Census of Population. These include estimates of the distributions of adults in specific age, sex and education groups. Work in Canada shows that reliable estimates of the distribution of skills can be derived by applying the relationships between skills and background variables observed in the common assessment to Census records. Thus, the common assessment options involve determining the minimum sample size that can support reliable estimates of the relationships among variables.

As the likely sample design and measures of all of the available options are essentially the same, the key factor that distinguishes them is the sample size, as it will determine the number of point estimates and relationships each supports. The larger sample size of the full PIAAC option means that it will support the most analysis. Table 1 provides a sense of the number of national point estimates that each option will support assuming a design effect of two (2).
Table 1: Notional Information Yield of Assessment Options by Use

Option             | Estimates       | Knowledge generation | Policy and program planning | Monitoring | Evaluation [11] | Administration
PIAAC full         | Point estimates | 6    | 6    | 3    | 1  | -
PIAAC full         | Multivariate    | 167  | 167  | 167  | 1  | -
PIAAC common       | Point estimates | 1.25 | 1.25 | 1.25 | -  | -
PIAAC common       | Multivariate    | 33   | 33   | 33   | 33 | -
LAMP full          | Point estimates | 3.5  | 3.5  | 3.5  | 1  | -
LAMP full          | Multivariate    | 100  | 100  | 100  | 30 | -
Saint Lucia full   | Point estimates | 3.5  | 3.5  | 3.5  | 1  | -
Saint Lucia full   | Multivariate    | 100  | 100  | 100  | 30 | -
Saint Lucia common | Point estimates | 1.25 | 1.25 | 1.25 | -  | -
Saint Lucia common | Multivariate    | 33   | 33   | 33   | 30 | -
Bow Valley full    | Point estimates | 4.5  | 4.5  | 4.5  | 1  | 4
Bow Valley full    | Multivariate    | 140  | 140  | 140  | 30 | 4
Bow Valley common  | Point estimates | 3    | 3    | 3    | 1  | 4
Bow Valley common  | Multivariate    | 120  | 120  | 120  | 30 | 4

[11] Assuming that a sufficiently large sample of literacy program participants was included.

The table above reveals that the full PIAAC option has the highest information yield, i.e. it supports the estimation of the largest number of point estimates and multivariate analyses. The LAMP and full Saint Lucia options have almost the same information yield as PIAAC. The Bow Valley option yields more analytic power than the Saint Lucia and LAMP options in either the full or common assessment configuration because of the adaptive nature of the assessment: the tool yields more reliable proficiency estimates for a given test duration and sample size than equivalent paper and pencil options. Only the Bow Valley options provide individual proficiency estimates that are sufficiently reliable for making administrative decisions with respect to individual learners.

1.3. Approach to Implementing Household-Based Literacy Skills Assessments

All household-based skills assessments, whether paper and pencil-based or technology-based, are implemented in five distinct phases:

1.3.1. An adaptation and preparation phase
1.3.2. A data collection phase
1.3.3. A data processing phase
1.3.4. A data analysis phase
1.3.5.
A data dissemination phase

As noted earlier, the technology-based options greatly reduce the operational and technical burden associated with implementation. Each phase includes a number of distinct activities, as outlined below:

1.3.1. The adaptation and preparation phase

The first step of the adaptation and preparation phase is the production of a national planning report that:

(a) Specifies the objectives to be met through assessment
(b) Identifies an appropriate sample frame
(c) Proposes a sample design and size that responds to the objectives
(d) Identifies the adaptations that need to be made
(e) Identifies the institutional consortium that will implement the assessment
(f) Identifies the expected products, services and dissemination mechanisms
(g) Identifies where technical assistance will be needed

The national planning reports serve several functions. They ensure that:

(a) Funders know exactly what they are buying
(b) Implementing agencies know exactly what they are expected to produce
(c) International study managers know that the implementing agency is capable of implementing the study to specification

Once the national study has been funded, the national teams:

(a) Select the sample of households and design a Kish grid for the final stage of selection of individual respondents
(b) Divide the sample into interviewer assignments
(c) Adapt the background questionnaire
(d) Adjust the procedures and manuals
(e) Adapt the training materials
(f) Purchase the equipment needed for the reading components, i.e.
tape recorders, timers and batteries
(g) Recruit and train interviewers to administer the test and background questionnaire
(h) Validate the test items psychometrically
(i) Print manuals, questionnaires, test booklets, the sheets used to capture scores and re-scores, and codes for open-ended background questions

In a paper and pencil-based implementation there is a need to typeset, print and bundle test booklets, the background questionnaire and the associated manuals, training materials and forms. In a computer-based implementation there is a need to:

(a) Adapt the background questionnaire application and test it
(b) Adapt the item pool and validate the test
(c) Purchase the required hardware
(d) Install the required software, including the Opera browser [12], the background questionnaire application and the test application
(e) Set up the required internet access

[12] The recommended web browser for the Bow Valley tool is Opera, as it provides the most control over what the user can do with the keyboard.

In both cases, the implementing agency must seek formal approval for its implementation and any proposed adaptations. The process of formal approval assures that the scientific integrity of the study will be respected; a necessary condition for assuring that the study will generate reliable and comparable results. Experience suggests that it normally takes 6 to 8 months to complete the preparation phase.

1.3.2.
The data collection phase

In a paper and pencil-based implementation, interviewers:

(a) Visit selected households
(b) Complete/update the household roster
(c) Select one adult member using a Kish grid (unless a list frame of individuals is being used)
(d) Administer the background questionnaire
(e) Administer and score the locator test
(f) Administer the main booklet or the low-level test booklet and reading components
(g) Edit the completed documents for completeness and correctness
(h) Revisit non-responding households to try to convert them to respondents
(i) Bundle and ship completed forms and recordings for processing

The duration of the data collection phase depends on the size of the sample, the number of interviewers and the average number of completions per day. Collection normally takes 4 to 8 weeks to complete.

In a computer-based implementation, interviewers:

(a) Visit selected households
(b) Complete/update the household roster
(c) Allow the system to select one adult member using a Kish grid (unless a list frame of individuals is being used)
(d) Administer the background questionnaire
(e) Administer the locator test
(f) Administer the main booklet or the low-level test booklet and reading components
(g) Revisit non-responding households to solicit cooperation
(h) Bundle and ship completed forms and recordings for processing

1.3.3. The data processing phase

In a paper and pencil-based implementation there is a need to:

(a) Specify, program and test data capture applications for:
    (i) The background questionnaire
    (ii) The score sheets
    (iii) The coding sheets
(b) Specify, program and test an edit program for the background questionnaire
(c) Train scorers
(d) Score test booklets and reading components
(e) Re-score, compute inter-rater reliabilities and re-score items as needed
(f) Code open-ended fields, e.g.
industry, occupation, field of study and 'other, specify' responses
(g) Capture the background questionnaire
(h) Capture scores
(i) Capture codes
(j) Edit the background questionnaire
(k) Merge the background questionnaire, scores and codes
(l) Scale the assessment results, link them to the international proficiency scales and compute error estimates

In a computer-based implementation one needs to:

(a) Adjust the background questionnaire application and associated edits
(b) Specify, program and test data capture applications for the coding sheets

In a paper and pencil-based implementation, the data processing phase can take from 4 to 16 months depending on the availability of staff, the sample size and available funds. The next step is to weight the data file to provide a mechanism for generating unbiased estimates of population characteristics, proficiency scores and proportions at each proficiency level. This step also includes the creation of replicate weights that serve as a mechanism for computing standard errors/error variances that include the additional error associated with the fact that one has sampled the content domain as well as the population. In a computer-based implementation, the other data processing steps listed above are taken care of by the system, i.e. the system captures, scores and edits the data in real time. As a result, computer-assisted collections allow the data processing phase to be greatly reduced: it is generally possible to have a clean, weighted data file within weeks of receipt of the last completed case.

1.3.4. The data analysis phase

The data analysis phase is identical for both the paper and pencil and computer-based options save for timing, i.e. the computer-based options should allow the process to begin 3 to 4 months earlier. The output of the data processing phase is a weighted, documented data file.
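The replicate-weight approach to variance estimation described in the data processing phase can be sketched in simplified form. The jackknife-style formula and the function names below are illustrative assumptions; the actual replication method, scaling factor and the treatment of the assessment's plausible values depend on the survey design.

```python
def weighted_mean(values, weights):
    """Weighted estimate of a population mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def replicate_variance(values, full_weights, replicate_weights, factor=None):
    """Estimate the sampling variance of a weighted mean using
    replicate weights.

    replicate_weights: list of R weight vectors, each produced by
    perturbing the full-sample weights (e.g. jackknife or bootstrap).
    factor: scaling constant for the replication method; defaults to
    the JK1 jackknife factor (R - 1) / R.
    """
    r = len(replicate_weights)
    if factor is None:
        factor = (r - 1) / r
    full = weighted_mean(values, full_weights)
    reps = [weighted_mean(values, w) for w in replicate_weights]
    return factor * sum((est - full) ** 2 for est in reps)
```

The standard error of the estimate is the square root of this variance; in a skills assessment the same machinery is run once per plausible value and the results combined, which is how the error from sampling the content domain is folded in.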
The international study team uses this file to generate an international report that compares average scores and score distributions for the total adult population and for key population sub-groups, presents analysis of the factors that underlie observed differences in skills and of the impact that skills have on individual and macro outcomes, and draws out the associated implications for policy. The national study team will use the same data file to produce a national report that draws out implications for national policy. These reports depend largely on simple descriptive analysis and a small amount of simple multivariate analysis. The production of an international report and associated national reports generally takes 4 to 6 months.

1.3.5. The data dissemination phase

The transformation of the raw data through analysis into information is a necessary but insufficient condition for realizing a maximum return on the investment in the assessment. In order to realize the full potential of the study, countries will have to devise and implement an integrated dissemination and communication strategy that ensures that key findings reach key users and that data are available for secondary analysis.

The only operational differences between the common assessment and full assessment options are the size of the sample and, by extension, the number of interviewers and the duration of collection. The 1,000 case common assessment implies roughly 500 interviewer days of collection; the 3,500 case full assessment implies about 1,750 interviewer days. Assuming a two month/40 day collection window, these options imply a need for 15 and 45 interviewers respectively.

The data dissemination phase continues until data from the next round of assessment become available. Rates of change in skills suggest a frequency of 5 to 10 years between assessments.

Each of the seven options will be evaluated against an additional, standard set of criteria set out below.

1.4.
The Evaluation Criteria and Regional Constraints

Having established the statistical and policy goals of a regional skills assessment, each assessment option is then evaluated against the following five criteria:

1.4.1. Cost
1.4.2. Operational burden
1.4.3. Technical burden
1.4.4. Risk
1.4.5. Information yield

Each of these criteria is described in more detail below.

1.4.1. Cost

The assessment of adult skills within the context of a household survey is relatively expensive when judged on a per case basis. For paper and pencil assessments one must train large numbers of interviewers, print the background questionnaires and test booklets, travel to selected households, conduct interviews that average 90 minutes in length, code open-ended responses, score and re-score the test booklets and the component tests, capture the questionnaires and scores, and edit and weight the survey file.

As an example, in the Saint Lucia assessment the total domestic cost per case was US$124. This cost includes all variable staff and out-of-pocket costs of fielding the assessment, including:

(a) Preparation and printing of the questionnaires and assessment booklets
(b) Training of the interviewers and supervisors
(c) Selection of the sample
(d) Data collection
(e) Scoring and re-scoring
(f) Data capture
(g) Data editing and coding

Fixed international costs for Saint Lucia were US$145,000. This amount covers:

(a) Adaptation of survey instruments
(b) Preparation of interviewer and procedures manuals and scoring guides
(c) Interviewer and supervisor training
(d) Data analysis and scaling

While these amounts are manageable for some CARICOM Member States, they could exceed the financial capacity of many others. As such, the financial capacity of the countries must be considered in assessing the feasibility of conducting the assessment. Costs for computer-supported data collection are lower, but not uniformly so.
One must acquire the hardware, adapt the items and background questionnaires, train interviewers in the use of the technology and pay network usage fees. On the positive side, one saves on printing, editing, scoring, data capture and scaling, as the software performs all of these steps and delivers results in real time. The cost estimates presented include the acquisition costs for a sufficient number of suitable computers. The only cost element that has not been estimated is the cost of internet use, as it varies widely from country to country and with usage.

As noted above, it is difficult at this stage to derive anything more than a first order approximation of the cost of any of the options under consideration. The cost of fielding any assessment depends on several factors, including:

(a) The fixed international costs associated with participation. These tasks include training in key activities, general project management, quality assurance and troubleshooting. These costs will vary somewhat with the expertise of the participating countries, how much support is required, how many activities are centralized and, most importantly, how many countries field the assessment simultaneously. The larger the number of countries, the smaller the average overhead cost per country. It is impossible, in the absence of a decision on how many and which countries will field an assessment on a common schedule, to estimate international overheads. The cost estimates presented below assume a flat charge of US$150,000 per participating country.

(b) The fixed national costs associated with having the project team in place and undertaking key tasks. These costs include item and background questionnaire adaptation, preparation of national planning reports, sample design and selection, and management of data collection, data processing and analysis. The longer the study takes to complete, the higher the fixed national cost.
The cost estimates below assume an 18 month implementation with a two month data collection window.

(c) The variable costs associated with data collection, which themselves depend on the number of interviewers, the number of people to be assessed and their characteristics, the average number of interviews completed per day and the license fees for using component measures. The cost estimates are based on a two month collection window for the full assessment options and a 6 week collection window for the common assessment options.

(d) For paper and pencil collections, the numbers of manuals, background questionnaires and test booklets to be printed, the number of recorders and timers to be purchased, the number of items to be scored and the number of items to be coded.

(e) For computer-based collections, the number of computers and the software required, including replacement units, and internet access.

Costs will vary from country to country depending on the sample size, on differences in the rates for different types of personnel and on the degree to which their work can be covered under existing budgets. Thus, the cost estimates presented below must be considered indicative of the approximate relative magnitude of cost among the options reviewed. The cost estimates used in the Saint Lucia pilot are found in Annex C. Readers are encouraged to review these carefully to gain an appreciation of the cost elements and how they interact.

1.4.2. Operational burden

Household-based paper and pencil assessments are among the most complex forms of social science research. Interviews generally include:

(a) The completion of a household roster
(b) The selection of one adult per household using a Kish procedure
(c) The completion of a background questionnaire that averages 30 minutes
(d) The completion and scoring of a filter test
(e) The completion of a main test booklet for skilled respondents, or the completion of a locator booklet and a reading components test for low skilled adults
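The Kish selection in step (b) is, in practice, a deterministic table lookup that removes interviewer discretion. A minimal sketch follows; the table versions and selection fractions shown here are illustrative assumptions, not Kish's published tables:

```python
import random

# Illustrative Kish-style selection tables. For each pre-assigned table
# version, the entry for a household with n eligible adults names which
# person (1-based, in the prescribed roster order) must be selected.
# The fractions are simplified for illustration.
KISH_TABLES = {
    "A": {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1},
    "B": {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2},
    "C": {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3},
    "D": {1: 1, 2: 2, 3: 2, 4: 3, 5: 4, 6: 4},
    "E": {1: 1, 2: 2, 3: 3, 4: 4, 5: 4, 6: 5},
    "F": {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6},
}

def select_respondent(table_version: str, roster: list) -> str:
    """Return the roster member chosen by the pre-assigned table.

    `roster` must list eligible adults in the prescribed order
    (e.g. males oldest to youngest, then females oldest to youngest).
    """
    n = min(len(roster), 6)          # tables are capped at 6 adults
    row = KISH_TABLES[table_version][n]
    return roster[row - 1]

# Each sampled household is randomly pre-assigned a table version before
# fieldwork, so the interviewer exercises no discretion at the door.
household_roster = ["Joseph (52)", "Marie (49)", "Andre (23)"]
version = random.choice(list(KISH_TABLES))
print(select_respondent(version, household_roster))
```

Because the table version is fixed before the interviewer knows the household composition, every eligible adult retains a known, non-zero selection probability, which is what the weighting step later relies on.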
These latter testing phases average 60 minutes, but there is huge variation around this average depending on the skills and characteristics of sampled respondents. The use of a computer-based assessment can reduce the average duration of each interview by a significant amount because the application includes adaptive algorithms that focus the available testing time around the skill level of the selected respondent.

Computer-based administration eliminates the need for most of the printing – of procedures manuals, training manuals, questionnaires, test booklets, scoring sheets and coding sheets. Computer-based testing can also, in some systems, obviate the need for data capture, editing and scoring – all operationally demanding and ultimately expensive, error-prone manual processes. Obviously, computer-based collection systems require the acquisition of a sufficient number of computers and the required software, as well as payment of internet usage fees. On balance, computer-administered assessments save a minimum of 40 percent of the cost, and 60 percent of the operational burden, of equivalent paper and pencil-based assessments.

Given the average length of interview and the anticipated sample sizes needed to get baseline point estimates of the levels and distributions of skills, household-based skills assessments can impose a significant operational burden on national collection and processing capacity. The experience in both Bermuda and Saint Lucia suggests that this burden can be managed with proper planning, but the opportunity cost may be too high in some smaller Member States.

1.4.3. Technical burden

Many of the tasks associated with implementation of a national assessment demand the mobilization of scarce technical resources.
While it can be expected that most of the National Statistics Offices either have, or have access to, all of the requisite skills, experience in the Region suggests that a national assessment would, in many cases, tax the available technical infrastructure. In particular, the demands of sampling, training of interviewers, preparation and printing of booklets, questionnaires and manuals, data capture, editing, weighting and variance estimation, and data analysis have the potential to overwhelm smaller systems.

1.4.4. Risk

The combination of cost, operational burden and technical burden associated with fielding a large-scale adult skills assessment implies a non-trivial risk that things can and will go wrong. Experience with IALS, ALLS, PIAAC, the Saint Lucia Assessment and the ISRS suggests that these risks can be attenuated if the right measures are taken. Among other things, experience suggests a need for:

(a) A highly skilled and experienced national project manager in each country
(b) Sufficient budget to do a good job
(c) Sufficient flexibility to hire local technical assistance as needed
(d) Implementation of an extensive quality assurance regime, one that includes a mix of active and passive measures. Passive measures include things such as the availability of detailed and unambiguous specifications and standards, extensive training and a management process that ensures that decisions that arise during the course of implementation are dealt with in a way that does not impair the integrity of the measures.
(e) An international consortium that has extensive experience in the design, implementation and analysis of the assessment data

These are the minimum necessary and sufficient conditions for success. A failure to meet any of these conditions implies unacceptably high levels of risk to the individuals, institutions and governments involved.
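The fixed-plus-variable cost structure and the staffing arithmetic set out in sections 1.3.5 and 1.4.1 can be sketched as a small planning model. The inputs are the report's own planning assumptions (a flat US$150,000 international charge, Saint Lucia's US$124 domestic cost per case, two completed interviews per interviewer-day, a 40 working-day window); fixed national costs are omitted, so the output is a first-order floor rather than a budget:

```python
import math

# First-order per-country cost and staffing model. All figures are the
# report's planning assumptions, not quotes from an implementing
# consortium; fixed national costs are deliberately excluded.
FIXED_INTERNATIONAL = 150_000   # flat international charge, US$
UNIT_COST = 124                 # Saint Lucia variable cost per case, US$
COMPLETES_PER_DAY = 2.0         # completed interviews per interviewer-day
FIELD_DAYS = 40                 # two month/40 working-day window

def country_budget(cases: int) -> dict:
    interviewer_days = cases / COMPLETES_PER_DAY
    return {
        "total_cost": FIXED_INTERNATIONAL + UNIT_COST * cases,
        "interviewer_days": interviewer_days,
        "interviewers": math.ceil(interviewer_days / FIELD_DAYS),
    }

print(country_budget(1_000))   # common assessment
print(country_budget(3_500))   # full assessment
```

The model returns the bare minimum head-counts of 13 and 44 interviewers for the 1,000 and 3,500 case options; the report's figures of 15 and 45 build in a small margin for attrition and non-response follow-up.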
CHAPTER 2: A REVIEW OF OPTIONS

This chapter provides details on the review of the options identified in consultation with the CARICOM Secretariat, the Member States and the AGS. The Terms of Reference (TOR) called for a review of the following options:

(a) The International Survey of Reading Skills (ISRS)
(b) The United Nations Educational, Scientific and Cultural Organization (UNESCO) Institute for Statistics' (UIS') Literacy Assessment and Monitoring Program (LAMP) paper and pencil assessments of prose literacy, document literacy, numeracy and reading components on a sample of 3,000 adults per country

Following consultations with the CARICOM Secretariat, its Member States and the AGS members, the list was further modified in terms of sample size and data collection methods (paper and pencil and web-based). The following other options were added:

(a) The Organisation for Economic Co-operation and Development's (OECD's) Program for the International Assessment of Adult Competencies (PIAAC) paper and pencil reading, numeracy and reading components assessments on a sample of 3,500 adults per country - Full Assessment
(b) The OECD's PIAAC paper and pencil reading, numeracy and reading components assessments on a sample of 1,000 adults per country - Common Assessment
(c) The Saint Lucia paper and pencil prose literacy, document literacy, numeracy and reading components assessments on a sample of 1,000 adults per country - Common Assessment
(d) The Saint Lucia paper and pencil prose literacy, document literacy, numeracy and reading components assessments on a sample of 3,500 adults per country - Full Assessment
(e) The Bow Valley web-based prose literacy, document literacy, numeracy and reading components assessments on a sample of 1,000 adults per country - Common Assessment
(f) The Bow Valley web-based prose literacy, document literacy,
numeracy and reading components assessments on a sample of 3,500 adults per country - Full Assessment

2.1. Program for International Assessment of Adult Competencies (PIAAC) - Common Assessment

The possibility of implementing a common Caribbean assessment using the PIAAC instruments was evaluated. This would involve each CARICOM Member State fielding roughly 1,000 cases in an assessment that would use identical instruments. As described above, this design would allow for the production of a limited number of national point estimates, i.e. average score and the distribution of proficiency by level, and a database to support multivariate analysis of the relationship between skills and background characteristics. These covariance data can also be combined with data from other sources, including the Population and Housing Census, to create synthetic estimates of literacy, an approach that, when applied in Canada, has proven useful for policymaking. Conducting a common assessment also improves the comparative dimensions of the study, as it reduces the impact of adaptation and implementation errors on the comparability of the results.

2.1.1. Cost of a PIAAC common assessment

The reduction in sample size by several thousand cases would reduce interviewer training, printing, data collection and scoring costs by a significant margin when compared to the full PIAAC, LAMP or Saint Lucia options. The total cost of a 1,000 case common PIAAC assessment is difficult to estimate because of ETS's insistence that some key tasks, including scoring, be centralized, and because it is unclear whether the OECD would insist on each country paying full international overheads. It is estimated that a common PIAAC assessment would cost US$650,000, or roughly US$650 per case. A 15 country common PIAAC assessment would require US$9,750,000.

2.1.2. Operational burden of a PIAAC common assessment

The reduction in sample size would have a similar positive impact on the operational burden associated with fielding the study.
The collection window could be reduced from three months to one month and the interviewer complement reduced from roughly 45 to 24. These reductions would have a material impact on the national statistics offices and their ability to take on other work. The literacy assessment would not, under these assumptions, crowd out other important work. Conducting a common assessment would also reduce the fixed design overheads by a significant amount: only one set of test instruments, associated training materials and scoring guides would need to be developed and printed. It is worth noting that individual countries could choose to increase their own sample size to match their information needs, level of political interest, data collection capacity and funding envelope.

2.1.3. Technical burden of a PIAAC common assessment

The technical burden would remain the same but could be concentrated in a single team drawn from the participating CARICOM Member States.

2.1.4. Other considerations

Participation in a PIAAC common assessment would be similar to full PIAAC participation. However, the PIAAC consortium has stipulated some conditions that would need to be met in implementing a PIAAC common assessment. First, it would require that one organization (e.g. one of the national statistics offices) assume responsibility for managing all survey operations for all the participating countries. This would serve to reduce deviations in how the tools are administered but would subordinate national statistics offices in a way that many governments would not accept.

Second, it would require that one organization assume responsibility for managing the scoring operation. This would serve to improve the inter-rater reliabilities and thereby reduce scoring error. This is probably manageable but implies the physical shipment of bulky boxes of test booklets.
2.2. Program for International Assessment of Adult Competencies (PIAAC) - Full Assessment

The evaluation starts with PIAAC as it arguably represents the international gold standard with respect to the assessment of adult skills. The PIAAC design includes a lengthy background questionnaire that includes a job requirements module, a combined prose literacy/document literacy reading measure, a numeracy assessment, an assessment of problem solving in technology-rich environments and, for low level readers, an assessment of reading components. The OECD and the prime contractor for PIAAC, ETS, provided a great deal of technical information on PIAAC, including copies of presentations to their Board of Participating Countries (BPC) and Technical Advisory Group (TAG).

2.2.1. Cost of full PIAAC participation

This option would see each interested Member State field PIAAC as a regular participant. The costs associated with full PIAAC participation are high by almost any standard. Much of this cost is directly attributable to what PIAAC is trying to accomplish. Measuring multiple domains, and the associated covariance matrix, with sufficient sample to support the production of point estimates, such as average score and percent at each proficiency level, for key population subgroups makes for an expensive and demanding design. In addition, the first round of PIAAC is technology-based, i.e. it involves the collection of the background questionnaires using Computer-Assisted Personal Interviewing (CAPI) followed by either computer-based or paper and pencil-based assessment. Scoring and scaling are performed by ETS post hoc, i.e. after collection has been completed.

The average duration of a PIAAC interview is approximately 100 minutes, but this average obscures huge variation in interview length associated with differences in individual characteristics and skill levels. Interviewer productivity for PIAAC in Canada averages 1.5 completed cases per day.
The cost estimates assume 2 completed interviews per day for a case load of 3,500 cases. Participating countries need to buy standardized laptops at a cost of roughly US$600 per unit and to adapt and load the data collection applications. Interviewers need to be trained not only in how to administer the measures but also in using the basic functionality of the computer software.

PIAAC is also expensive because it imposes a demanding regime of quality assurance standards. Key among these standards is a requirement to field a sample that yields roughly 4,500 to 5,000 completed cases. The estimated total cost of the PIAAC full option assumes 3,500 cases, as these larger samples are beyond what most Member States could afford or manage. The study also imposes demanding response targets that translate into a need for costly non-response follow-up. The Quality Assurance (QA) regime also includes a number of active measures that require a significant amount of analysis work by national teams.

PIAAC participating countries are also required to cover the costs of attending meetings of the Board of Participating Countries and of national project teams. These costs add roughly US$30,000 to the annual cost of participation. PIAAC is also expensive because participating countries are required to contribute 84,000 Euros towards the international costs of the program. These contributions cover international design costs, project management costs, implementation of the quality assurance regime and an ambitious program of analysis and dissemination.

The total cost of fielding a full PIAAC assessment would be roughly US$1,034,000 per country, or roughly US$295 per case. Thus a budget of roughly US$15,510,000 would be required for a 15 country assessment.

2.2.2. Operational burden of full PIAAC participation

Full PIAAC participation is operationally demanding.
Assuming a three-month collection window and completion of two interviews per day per interviewer, statistical offices would have to recruit and train a minimum of roughly 45 interviewers and an additional 5 senior interviewers to provide support and to do non-response follow-up. Considering the experience in Saint Lucia, finding a venue to train this number of interviewers may be problematic in some countries.

For several National Statistics Offices (NSOs), PIAAC participation would involve their first experience with computer-supported distributed data collection. These NSOs would likely have to invest in the development of the technical infrastructure and personnel needed to install software and to maintain and repair the hardware. The use of CAPI obviates the need for capturing the background questionnaire data, but test item scores need to be captured. With sufficient advance planning, the associated volumes would be manageable in most offices.

2.2.3. Technical burden of full PIAAC participation

Full PIAAC participation is also likely to impose a significant technical burden on participating countries. The PIAAC sample design requirements, and the associated weighting and variance estimation, are very demanding. While most of the NSOs have access to sampling expertise, it might prove less costly to hire external consultants to undertake these tasks. The cost of doing so would have to be borne by the participating country. As noted above, NSOs would likely have to invest in the development of the technical infrastructure and personnel needed to install software and to maintain and repair the hardware.

2.2.4. Risks associated with full PIAAC participation

The risk of things going wrong in PIAAC is minimal because the design is based upon the lessons learnt through the conduct of IALS, ALLS and the ISRS. Similarly, the quality assurance regime is based on containing the key sources of error and bias revealed during the conduct of IALS, ALLS and ISRS.
Finally, the consortium that is implementing PIAAC includes ETS and the OECD, two of the three partner institutions responsible for the design and implementation of IALS and ALLS. Statistics Canada is no longer playing a role in sampling or project management, having been replaced in these roles by Westat13. Westat has considerable skills and experience in managing sampling in international comparative studies, having managed this function for the Program for International Student Assessment (PISA). Westat was also responsible for managing the US data collection for IALS and ALLS, so it is familiar with the methods and measures. The international consortium also includes several key individuals from the IALS and ALLS teams.

Thus, it is expected that the key risk associated with PIAAC participation is its complexity. Given the high cost of full participation, expectations will be high. Even with the full Quality Assurance regime in place, there is a non-negligible risk that the operational and technical demands will overwhelm the team in a subset of the countries.

2.2.5. Other considerations

A large number of countries will participate in the first round of PIAAC data collection; these include Australia, Austria, Belgium, Canada, the Czech Republic, Denmark, Estonia, Finland, France, Germany, Hungary, Ireland, Italy, Japan, Korea, the Netherlands, Norway, Poland, the Russian Federation (a non-member economy), the Slovak Republic, Spain, Sweden, the United Kingdom and the United States of America. A second group of nine countries will field a variant of PIAAC in 2013. This alone constitutes a strong endorsement of the PIAAC design, expected information yield and organization.

A number of other considerations weigh on the evaluation. One consideration has to do with the PIAAC experience14 of participating national study managers and their teams.
The overwhelming majority of PIAAC national study managers were IALS and/or ALLS national study managers. Conversations with multiple national project managers confirm several things, including:

(a) The PIAAC project is well organized. Meetings, which are held frequently, are well structured with clear objectives, agendas and decision processes.
(b) All the important PIAAC documentation is available to national study managers on a password protected sharepoint site.
(c) The governance structure for PIAAC is clearly set out. A Board of Participating Countries (BPC) governs implementation. The international consortium is responsible for holding regular meetings. The BPC reports to a joint committee of the OECD's Directorate for Employment, Education, Labour, and Social Affairs (DEELSA) and the Education Committee.

This being said, the PIAAC process involves so many countries, and is so political, that many countries have some misgivings about the level of decision-making authority that is vested in the OECD and/or the implementing consortium. An example of this latter concern is a debate about the response probability that is to be applied to create proficiency levels. The feeling has been that decisions in this regard are being overly driven by vested institutional interests rather than scientific considerations. Put differently, it is difficult, within the context of such a complex undertaking, to give individual countries, or groups of countries, much power.

PIAAC has also proven to be extraordinarily demanding in terms of time, resources and effort. Several countries, including Portugal and Slovenia, have recently withdrawn from the study because of the financial, operational and technical demands of participation. In small systems, these demands fall on the time of what is invariably scarce technical staff.

13 Westat is a for-profit Washington-based statistics agency with a reputation for quality design and implementation.
14 The Consultant was a member of the PIAAC design team.
It is reasonable to assume, as has been the case with PISA, that the PIAAC skills measures will become the de facto international standard. There is value, therefore, in using PIAAC to benchmark Caribbean skills profiles against the best economies in the world, including Canada, the US and the UK. It is also clear that CARICOM Member States would benefit greatly from the enormous investments made in IALS, ALLS and PIAAC development. They would be, in economic terms, free riders in this sense. The CARICOM Member States would also get significant value out of the PIAAC analysis program. Fortunately, it is believed that other options afford many of these benefits at much lower cost and complexity. On balance, therefore, full PIAAC participation is not recommended.

2.3. Literacy Assessment and Monitoring Program (LAMP)

Difficulty was experienced in evaluating participation in the LAMP program due to the unavailability of data on the technical performance of the test items, as well as on any other aspect of the LAMP pilot studies. To date, no information has been made publicly available on the LAMP measures or results. The review that follows is therefore based upon the experts' knowledge of the instruments and methods, knowledge gained from having developed them. The review also benefits from consultation with several members of the LAMP technical advisory committee.

The design of LAMP is almost identical to the ISRS. The study includes a background questionnaire, an assessment of prose literacy, document literacy, numeracy and, for low-skilled readers, a battery of clinical reading tests. Thus, LAMP is designed to provide point estimates of skills for key sub-populations on the IALS/ALLS scales and to place low-level learners in groups sharing common patterns of strength and weakness on the reading components.
The Consultant was engaged in an activity with UNESCO Kabul, assisting the Government of Afghanistan in the design of a common assessment that would yield reliable estimates of the distribution of literacy skills in a situation where security concerns were paramount. However, the activity was cancelled by the UIS.

2.3.1. Cost of implementing LAMP

Participation in LAMP would be less costly than participation in the full PIAAC, in large measure because the minimum sample size required by the UIS is in the 3,000 case range. The costs are also lower because the UIS has indicated that participants are not required to contribute towards the international overheads associated with implementation. Implementing LAMP would also reduce the travel costs associated with PIAAC participation: PIAAC participating countries are required to attend approximately 12 meetings over the course of a cycle, and these meetings are held in PIAAC participating countries, mostly in Europe, so the associated costs of travel are high. All LAMP meetings could be held in the Caribbean. Assuming the Saint Lucian costs, with sample sizes of 3,000 cases per participating country, LAMP could be implemented for roughly US$133 per case, or about US$400,000 per country, an amount that implies that a 15 country Caribbean assessment would cost roughly US$6,000,000.

2.3.2. Operational burden of implementing LAMP

The operational burden associated with implementing LAMP is slightly less than that of PIAAC because the minimum permissible sample sizes are lower than for PIAAC. The two studies are roughly equivalent in most other operational respects.

2.3.3. Technical burden of implementing LAMP

The technical burden associated with LAMP is lower than for PIAAC because LAMP uses paper and pencil methods rather than computer-assisted interviewing. LAMP also imposes a less stringent quality assurance regime and lighter reporting requirements.

2.3.4.
Risks associated with implementing LAMP

As noted above, the LAMP instruments were developed in 2006 and are based directly upon the measures that were fielded in the ISRS study. Piloting of the LAMP instruments began in 2007 in Palestine, but the UIS has yet to publish any evidence on the psychometric performance of the measures or on what the measures reveal for policy. Also as noted above, the comparative assessment of skills represents the most complex type of survey-based research; the only other studies that approach skills assessments in complexity are the US health interview surveys. It was not possible to obtain technical information on the performance of the LAMP instruments. Based on these considerations, the LAMP program of work is not recommended.

2.3.5. Other considerations

LAMP pilots were conducted in El Salvador, Mongolia, Morocco, Niger and Palestine. The main survey was implemented in Palestine15. The implementation followed the proven open coordination approaches developed in IALS and ALLS, and national study managers all expressed appreciation for the open, professional and flexible manner in which the project was run. Since the Consultant's departure from the UIS, the project has reverted to an approach wherein countries are fielded on a bilateral basis. Contact with the Palestinian national project manager suggests that they were happy with the service that they got from UNESCO but, in the absence of any data, it was not possible to assess the quality of the output.

In terms of governance, the UIS has maintained a Technical Advisory Group (TAG) to provide advice and guidance on technical matters. Consultations with two of the TAG members suggest that the TAG is not fulfilling the same function as it did in the ALLS, PIAAC and PISA programs. More specifically, the TAG has been provided with access neither to data from the LAMP pilots nor to data from the main implementation.
To date, UIS has not published any information on the technical performance of the instruments and implementation, or any results, and very few additional countries have taken the decision to field the LAMP assessment. The LAMP website suggests that Palestine, Mongolia, Jordan and Paraguay have completed assessments. The Ministry of Education in Jamaica was approached by the UIS regarding the conduct of the LAMP, but a discussion with the Chief Statistician of the Statistical Institute of Jamaica confirmed reservations about UIS's technical and operational capacity.

15 The Consultant was responsible for the initiation of this work.

2.4. Saint Lucian Instruments- Common Assessment
The Government of Saint Lucia, with technical support from Statistics Canada16, adapted the ISRS instruments and methods for use in Saint Lucia. The design includes a background questionnaire, a locator test, an assessment of prose literacy, document literacy and numeracy for high-skilled adults, and a test of prose literacy, document literacy, numeracy and reading components for low-skilled adults. The instruments were successfully piloted in 2009. The proposal would be to apply a variant of the Saint Lucian instruments in a common assessment in CARICOM Member States.

2.4.1. Cost of applying the Saint Lucian instruments in a common assessment
Because both LAMP and the Saint Lucia study are based on the instruments and methods developed by Statistics Canada and ETS for the ISRS study, the domestic cost of implementing the two would be roughly the same. The international overheads for the Saint Lucia pilot study amounted to US$145,000; this is a reasonable estimate of what it would cost to implement a common assessment using the Saint Lucian instruments. The costs associated with adapting the Saint Lucian instruments for use in other Caribbean countries would be lower than those of adapting LAMP because the instruments have already been shown to work well in Saint Lucia.
Using the Saint Lucian unit costs as a guide, a 15-country common assessment based on the Saint Lucian instruments and 1,000 cases per country would cost roughly US$133 per case, or US$133,000 per country, for a total cost of US$2,000,000 for 15 countries.

16 The Consultant, as a member of Statistics Canada's team, played an active role in this exercise.

2.4.2. Operational burden of a common assessment using the Saint Lucian instruments
The operational burden of conducting a common assessment based upon the Saint Lucian approach would be similar to fielding LAMP or PIAAC. Interview durations could be expected to run roughly 100 minutes per case, or some 1,700 interview hours per country and approximately 27,000 interviewer hours in total.

2.4.3. Technical burden of a common assessment using the Saint Lucian instruments
The technical burden of conducting a common assessment using the Saint Lucian assessment is slightly lower than for LAMP or the full PIAAC option. This is because most of the technical tasks would be undertaken by one team of international experts supported by staff from Bermuda and Saint Lucia. ETS is concerned that having multiple teams independently undertake operational tasks such as item adaptation, scoring, editing, weighting and variance estimation would increase the level of non-sampling error beyond acceptable limits. Experience is that small statistics offices are far more likely than large statistical offices to adhere to the operational guidelines and associated quality assurance procedures where those guidelines conflict with their standard ways of doing business.

2.4.4. Risks associated with a common assessment using the Saint Lucian instruments
The risk associated with the conduct of a common assessment using the Saint Lucian instruments is low. The measures have been shown to provide useful results for policy in Canada, the US, the Mexican State of Nuevo Leon and Bermuda.
The approach provides most of the same measures to be carried in PIAAC, and these have been shown to function psychometrically in the Caribbean context. All of the associated training material has been developed, and the team that undertook the Saint Lucian pilot has a wealth of experience in the design, adaptation, implementation and analysis of data from skills assessments, so the risk of errors being introduced inadvertently is low. The team includes a project manager, a sampling expert, a psychometrician and a data collection expert, all with IALS, ALLS, ISRS and PIAAC experience.

2.4.5. Other considerations
The design of Saint Lucia's literacy and numeracy assessment is based upon the instruments developed for the International Adult Literacy Survey (IALS) and the Adult Literacy and Life Skills Survey (ALLS). IALS/ALLS has been fielded by a large number of countries, as listed below:

IALS 1994: Canada (with separate studies, the Ontario Immigrant Literacy Study and the Ontario Survey of the Deaf and Hard of Hearing), Germany, Netherlands, Poland, Sweden, Switzerland, United States of America and Vanuatu
IALS 1996: Australia, New Zealand and the United Kingdom
IALS 1998: Belgium, Czech Republic, Denmark, Finland, Hungary, Ireland, Norway and Portugal
ALLS 2003: Bermuda, Canada, Italy, Norway, Nuevo Leon (Mexico), Switzerland and the United States of America
ALLS 2005: Australia, Hungary, Netherlands and New Zealand

The IALS and ALLS studies have spawned five international comparative reports:
a. Literacy, Economy and Society: Results of the First International Adult Literacy Survey, Statistics Canada and OECD, 1995
b. Literacy Skills for the Knowledge Society: Further Results of the International Adult Literacy Survey, OECD and HRSDC, 1997
c. Literacy in the Information Age: Final Report of the International Adult Literacy Survey, Statistics Canada and OECD, 2000
d. Learning a Living: First Results of the Adult Literacy and Life Skills Survey, OECD and Statistics Canada, 2005
e. Literacy for Life: Further Results of the Adult Literacy and Life Skills Survey, OECD and Statistics Canada, 2011

Technical reports have been produced for the IALS and ALLS studies, and the datasets have been used to generate an impressive list of research monographs. The reading components measures were based on the International Survey of Reading Skills (ISRS). The ISRS had been fielded in only two countries prior to being fielded in Saint Lucia; the measures have since been incorporated into both LAMP and PIAAC. Analysis of ISRS data for Canada and the US has revealed interesting findings. See, for example, Learning Literacy in Canada: Evidence from the International Survey of Reading Skills, Statistics Canada, 2008, and Reading the Future: Planning for Canada's Future Literacy Needs, Canadian Council on Learning, 2008.

ISRS 2005: Canada, the United States of America and PIAAC countries

Clearly the IALS and ALLS studies were of high quality and rapidly produced technically defensible, policy-relevant results. Implementation of the Saint Lucian pilot has equipped the National Statistics Office in Saint Lucia with first-hand experience in applying the methods within a tight time frame, and these individuals are available to assist with implementation of the approach in other countries. Piloting of the PIAAC instruments shows that the psychometric performance of the IALS, ALLS and ISRS assessment items is unchanged by the move to computer-based administration.

2.5. Saint Lucian Instruments- Full Assessment
As noted above, the Saint Lucian design, background questionnaires and assessment instruments were based on the ALLS and ISRS studies. The methods and measures were piloted in Saint Lucia and shown to be both valid and reliable.
The Saint Lucian approach could easily be adapted for use in other Caribbean countries at a cost of roughly US$124 per case, US$434,000 for 3,500 cases per country, or US$6,510,000 for the 15 countries. Virtually all of the development costs have been absorbed by Saint Lucia, so only minor modifications need be made to the background questionnaires and training materials. Actual costs would vary depending on the sample size and the number of interviewers trained and equipped.

2.6. Bow Valley Web-based Assessment- Common Assessment
As described briefly above, the Government of Canada has recently funded the development and validation of a web-based assessment and instructional system that embodies the science that enabled IALS, ALLS, ISRS, LAMP and PIAAC. The goal of this investment was to simultaneously improve the reliability of skills estimates and reduce the cost, technical burden and operational burden of the assessment process. The development also sought to create a suite of assessment tools that support the full range of needs, from program triage to certification, in real time. The current tool kit includes assessments of prose literacy, document literacy, numeracy, oral fluency and, for low-level readers, a computer-based variant of the reading components carried in the ISRS, LAMP and PIAAC assessments. The tools deliver reliable estimates of proficiency in real time. It should be noted that nothing would prevent specific countries from expanding their sample to support the direct estimation of more point estimates at the national level and more reliable estimates of relationships.

2.6.1. Cost of a common assessment using the Bow Valley tools
The adaptive algorithms in the Bow Valley assessment allow the average interview duration to be reduced significantly (the average length of an interview is 70 minutes), so unit collection costs would drop from US$124 to US$83.
To this amount, one must add license fees of US$20 per case for use of the Bow Valley tests, yielding a cost per case of US$103. Adding US$10 per case for the cost of the hardware and US$10 per case in overheads yields a total cost per case of roughly US$123, resulting in a cost per country of US$123,000 with a sample size of 1,000 cases. As noted previously, this cost estimate includes allowances for all staff and out-of-pocket costs, including the acquisition of the required hardware and software; the only cost excluded is the cost of internet access. The cost could be reduced significantly if the means can be found to have individuals tested in groups. The estimate includes an allowance for the purchase of an average of 24 tablet computers per country; equipping all countries would require some US$192,000, bringing total costs to roughly US$1,845,000. Tablet prices are falling rapidly, so this is a maximum amount. As suggested by the AGS, hardware bought for one country could be used in other countries, provided the countries do not all execute the survey at the same time.

2.6.2. Operational burden of a common assessment using the Bow Valley tools
The operational burden associated with applying the Bow Valley tools is significantly lower than that imposed by any of the other options. The technology manages all of the burden of collecting the background information, administering the tests, and scoring and scaling the data. This reduces the amount of training needed by interviewers from three days to one day. The adaptive algorithms built into the application significantly reduce the number of items needed to reach the desired precision levels, a reduction that translates into shorter interview durations.

2.6.3. Technical burden of a common assessment using the Bow Valley tools
The technical burden associated with fielding the Bow Valley tools is much lower than that associated with any of the other options.
Participating countries would still need to select a probability sample of the adult population, but the assessment system handles all of the associated technical burden. The methods associated with analyzing the relationship between skills and background characteristics, and with imputing these relationships onto Census records, are well established and straightforward for those with experience in such work.

2.6.4. Risks of applying the Bow Valley tools in a common assessment
The conduct of the ALLS study in Bermuda and of an ISRS-based assessment in Saint Lucia demonstrates that the approach to measurement works and yields results that are interesting for policy. Thus, there is very little risk of the approach not yielding results of acceptable quality. The key risk is whether the available communications infrastructure has the bandwidth and stability to support the delivery of the assessment. The Bow Valley tools are designed to support delivery on tablet computers, on laptops or on standalone computers with internet access. A survey of Member States indicates that internet access is highly variable, with large proportions of the population remaining without coverage; what remains unknown is the proportion of the population without 3G mobile access. In cases where internet access is problematic, the Bow Valley assessment can use the 3G wireless network, or can be completed and uploaded at a location where internet access is available. A second potential risk is the unfamiliarity of the adult population with computer technology. The Bow Valley tools have been designed to place minimal demands on the test taker, and the administration protocol provides for the test administrator to take over the actual system input in cases where the test taker lacks the technical skills to use the technology.

2.6.5. Other considerations
The main other consideration related to this option is that the Bow Valley tools that would be used to assess skills at the population level are part of a suite of web-based assessment and instructional products and services that could be used for other purposes. For example, the suite of tools can be used:
a. For triage at literacy program intake, to identify whether individuals are in need of literacy or numeracy training
b. At literacy program intake, for formative assessment and diagnosis of individual learning needs
c. At literacy program exit, for summative assessment of learning
d. At educational program exit, to certify skill levels for employment
e. At literacy program intake and exit, to estimate learning gain
f. At the point of hiring, for selection
g. In research, to provide a skills measure

The Bow Valley tools have yet to be used in the context of a national assessment but have been used in a number of large national research studies in Canada involving workers, college students and literacy program participants. Users report that the tools are easy to use and that they provide useful results for a variety of purposes. No significant challenges have been encountered in using the tools in Canada. Recent experience in using the tools in China suggests that government firewalls prevent access to the software; this kind of problem is not expected in the Caribbean.

2.7. Bow Valley Web-based Assessment- Full Assessment
The conduct of a full assessment using the Bow Valley tools would have all the same attributes as the common assessment save the cost and the amount of time required to complete the data collection. The total cost of this option would depend on the sample size, fielded at a rate of approximately US$83 per case. Actual costs will vary depending on the number of interviewers trained and equipped.
Again, this cost estimate includes all cost elements save for the cost of internet access.

2.7.1. Cost of a full assessment using the Bow Valley tools
This is the same as for the common Bow Valley assessment, except that the total cost of the study would rise less than proportionately with the larger sample, because the fixed costs of design, implementation, processing and analysis would be amortized over more cases.

2.7.2. Operational burden of a full assessment using the Bow Valley tools
The operational burden associated with applying the Bow Valley tools in a full assessment is the same as with the common Bow Valley assessment.

2.7.3. Technical burden of a full assessment using the Bow Valley tools
The technical burden associated with applying the Bow Valley tools in a full assessment is the same as with the common Bow Valley assessment.

2.7.4. Risks of applying the Bow Valley tools in a full assessment
The key risks of conducting a full assessment using the Bow Valley tools are the same as for the common assessment.

2.7.5. Other considerations
The main other consideration related to this option is the same as for the common assessment.

2.8. Summary
In terms of information yield, each of the seven options reviewed would yield information in support of knowledge generation, policy and planning, and monitoring. Options with larger sample sizes would support an average of five reliable point estimates, but this number would vary depending on the actual distribution of sample and skills in the population. All of the options evaluated would support the generation of synthetic estimates; options with larger sample sizes would generate more reliable synthetic estimates. None of the options would directly support program evaluation, but the fact that the Bow Valley tool has a variant whose scores are reliable enough to support estimates of score gain makes it an ideal tool for program evaluation purposes.
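Synthetic estimation, as referred to above, typically works by estimating mean skill levels for population subgroups from the survey and projecting those means onto Census counts for areas the survey cannot estimate directly. The following minimal Python sketch illustrates the arithmetic only; all figures, group definitions and scores are invented for illustration and do not come from any of the studies discussed here.

```python
# Purely illustrative sketch of synthetic estimation. All data are invented:
# a tiny hypothetical survey of (education level, literacy score) pairs.
survey = [
    ("primary", 210), ("primary", 225), ("secondary", 270),
    ("secondary", 280), ("secondary", 265), ("tertiary", 310),
]

# Step 1: estimate mean skill by subgroup from the survey.
scores_by_group = {}
for level, score in survey:
    scores_by_group.setdefault(level, []).append(score)
group_means = {k: sum(v) / len(v) for k, v in scores_by_group.items()}

# Step 2: take Census counts for a district too small to estimate directly
# (hypothetical counts).
census_counts = {"primary": 4000, "secondary": 5000, "tertiary": 1000}

# Step 3: the synthetic estimate is the census-weighted average of the
# survey subgroup means.
total = sum(census_counts.values())
synthetic_mean = sum(group_means[g] * n for g, n in census_counts.items()) / total
print(round(synthetic_mean, 1))  # 253.8 for these invented figures
```

In practice the model would use several background characteristics and the survey weights; the sketch shows only the core idea of borrowing the skills-to-covariates relationship from the survey and applying it to Census records.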
The following table summarizes the evaluation of the options on the other dimensions:

Option             Cost per case   Cost per country   Cost for 15 countries   Operational   Technical   Risk
                   US$             US$000             US$000                  burden        burden
PIAAC 3500         $295            $1,034             $15,510                 Very high     Very high   Very high
PIAAC 1000         $650            $650               $9,750                  Moderate      Very high   High
LAMP 3000          $133            $400               $6,000                  High          High        High
Saint Lucia 1000   $133            $133               $2,000                  Moderate      High        Moderate
Saint Lucia 3500   $124            $434               $6,500                  High          High        Moderate
Bow Valley 1000    $123            $123               $1,845                  Low           Low         Low
Bow Valley 3500    $115            $402.5             $6,037                  Low           Low         Low

The table reveals significant variation in cost. Per-case costs range from a low of $115 for the full Bow Valley option to a high of $650 for the 1,000-case PIAAC option. Per-country costs range from a low of $123,000 for the 1,000-case Bow Valley option to a high of $1,034,000 for the 3,500-case PIAAC option. The estimated costs of a 15-country study range from a low of $1,845,000 for the 1,000-case Bow Valley option to a high of $15,510,000 for the 3,500-case PIAAC option. The table also documents significant variation in the operational burden associated with each option. PIAAC imposes the highest operational burden, in large measure because of the demanding quality standards it imposes. The Bow Valley options impose the least operational burden because the design of the tool reduces average test durations, eliminates the need for most data capture and data cleaning, and eliminates manual scoring of the items. The table also documents significant differences in the technical burden imposed by the various options. The most demanding options are the PIAAC options (Full and Common), because of the mix of technology used and the quality assurance procedures imposed. The least technically demanding options are the Bow Valley options (Full and Common), a result that can be traced to the fact that the international team does most of the technical activities.
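The per-country and 15-country cost figures above follow directly from the per-case costs and per-country sample sizes; a short Python sketch of that arithmetic (the option names, rates and sample sizes are taken from the report, and the computed totals reproduce the table's figures to within the report's own rounding):

```python
# Per-case cost (US$) and per-country sample size for each option,
# as given in the report's summary table.
options = {
    "PIAAC 3500":       (295, 3500),
    "PIAAC 1000":       (650, 1000),
    "LAMP 3000":        (133, 3000),
    "Saint Lucia 1000": (133, 1000),
    "Saint Lucia 3500": (124, 3500),
    "Bow Valley 1000":  (123, 1000),
    "Bow Valley 3500":  (115, 3500),
}

for name, (per_case, cases) in options.items():
    per_country = per_case * cases   # cost for one country
    regional = per_country * 15      # cost for all 15 countries
    print(f"{name}: US${per_country:,} per country, US${regional:,} for 15 countries")
```

For example, the Bow Valley 1,000-case option works out to 123 x 1,000 = US$123,000 per country and US$1,845,000 for 15 countries, matching the table, while figures such as LAMP's US$400,000 per country reflect the report's rounding of the computed US$399,000.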
Finally, the table suggests that there are significant differences in the amount of risk associated with the various options. The evaluation found that the LAMP and PIAAC options carry the highest risks. LAMP is risky because of the unavailability of published information on the implementations that began in 2007. The PIAAC options (Full and Common) are risky because of the exacting quality standards imposed on participating countries: countries that fail to meet these standards have to spend additional money to meet them or are excluded from the international comparisons. The Consultant recommends the 1,000-case Bow Valley option, and the AGS recommends the 3,500-case Bow Valley option. Both options are viable; the difference can be traced back to differences in what the two groups judge to be prudent and good enough to improve policy making in the Region. Please see Chapter 8, Section 6 for a qualification on the recommendation of the Bow Valley option.

CHAPTER 3: ANALYSIS OF THE LITERACY SURVEY EXPERIENCE IN THE REGION
This chapter reviews Bermuda's experience in fielding the ALLS study, Saint Lucia's experience in fielding a variant of the ISRS survey, and Dominica's proposed implementation.

3.1. Bermuda's Experience
The following overview of Bermuda's experience in the 2003 ALLS survey was taken from its administrative report and from the Consultant's involvement in the international study team that provided consultancy services throughout the life of the ALLS survey project in Bermuda. Bermuda's experience will be evaluated against the following criteria: cost, operational burden, technical burden and risk. In addition, the lessons learnt and the challenges will be highlighted.

3.1.1. Cost
The total cost for the survey amounted to approximately BDA$830,500 (BDA$1=US$1).
This included the development cost (training material, cost of international trainers, printing cost, etc.), data collection cost, data processing cost and data dissemination cost. Interviewers were paid a minimum fee of fifteen hundred dollars ($1,500), less tax; this fee was dependent on their completing 30 surveys. Additional work was made available, after 30 completions, for those who were interested in earning extra monies. Incentive pay was paid during the month of May 2003: interviewers were paid $60 for completing a survey (screener, background questionnaire and main task booklet) plus $100 for travel, and an additional bonus of $300 or $500 was paid to those who completed 30 or 40 households, respectively, by May 30. Supervisors were paid a fee of five thousand dollars ($5,000), less tax; this fee was dependent on the successful completion of the assignment.

3.1.2. Operational burden
Data collection
One hundred interviewers and twenty-seven supervisors were dispatched into the field to complete 4,000 surveys. Each interviewer was given an Assignment Control List with 40 household addresses, with the expectation that they would complete 30 over a ten-week period. The average interview lasted approximately two hours, with each selected respondent answering a background questionnaire and a psychometric assessment booklet. Although the data collection phase officially closed on August 31, 2003, in-office interviews continued until the end of September in an effort to meet the required target. These in-office interviews resulted from two publicity initiatives informing the public that the survey was still in progress. In addition, the list of addresses of households that were yet to be visited was placed in the daily newspapers, and the listed householders were asked to call the Survey Department to arrange suitable times for an interview.
Additionally, letters were sent to persons who had refused outright to participate, reminding them that the survey was mandatory by law.

Response Rate
The Bermuda ALLS study was scheduled to run for three months; however, fieldwork was extended for an additional three months in order to achieve the desired response targets. Initially, Bermuda was expected to complete 4,000 cases for international comparison, since the survey was an international initiative and there was no real experience conducting a survey of this type in small-population countries. This sample size was determined to ensure the validity of the data captured. However, due to the many factors that inhibited the survey's progress, Bermuda, in consultation with Statistics Canada, decided to reduce the number of completed surveys to 3,000 cases, reasoning that 3,000 completed cases would still provide the accuracy needed for international comparative purposes. At the end of the interviewing period (i.e. October 31, 2003), a total sample of 4,049 households had been visited and 2,696 completed cases obtained. Based on the number of homes contacted (3,025) and those visited but for which no contact was made (304), the response rate from the study was 82 percent, the highest of all the participating countries.

Non-Response
The ALLS took several precautions against non-response bias, as specified in the ALLS Administration Guidelines. Interviewers were specifically instructed to return several times to non-response households in order to obtain as many responses as possible. In addition, questionnaires were pre-addressed, and interviewers were given proper maps to assist in identifying households. Other initiatives were introduced by the Department of Statistics to encourage participation in the survey; for example, civil servants who were selected were given time off to complete the survey.
Bermuda was tasked with completing a debriefing questionnaire after the main study in order to demonstrate that the guidelines had been followed, as well as to identify any collection problems encountered.

3.1.3. Technical burden
The complexity of the collection procedures presented somewhat of a challenge even to experienced interviewers, who were used to the traditional way of completing interviews. Bermuda reports that the ALLS study is scientifically rigorous and that implementation included a significant amount of training in various aspects of the study design, the related quality assurance procedures and how the data could be used in policy analysis. However, the study was professionally managed, the approach to governance was open, and issues were addressed in a measured and thoughtful way. Nevertheless, Bermuda required significant levels of technical and operational assistance during the implementation of its study. The National Statistics Office required assistance in selecting the sample, in training interviewers, in supervising data collection, and in weighting and variance calculation. Staff were detached from Statistics Canada to perform these tasks.

3.1.4. Risk
During the first few weeks of the data collection phase, interviewers faced very poor weather conditions, which slowed the progress of the fieldwork. By the end of March 2003, many interviewers had still not commenced field activities; thus, the expectation of four (4) completed surveys per interviewer per week had not been fulfilled. At the end of June 2003, a total of 2,045 households had been visited, and the survey period had to be extended until the end of August. At that time, many interviewers opted to end their employment with the Department: of a total of 75 interviewers, 45 willingly remained in the field to assist the Department in reaching the target of 4,000 surveys.
Unfortunately, for the remainder of the data collection phase the survey experienced two major external shocks, which set back the volume of cases that could have been completed within the extended time frame. In July, a general political election was suddenly announced; all focus was steered towards the upcoming general election and the political candidates, and many households dismissed the visits of the survey interviewers, which paralleled the canvassing of the political hopefuls. In addition, during the latter part of August, Bermuda was struck by Hurricane Fabian. Again, households were distracted as attention was drawn to the extreme damage caused to the Island and, more specifically, to individual homes. Several interviews pre-scheduled for the first week in September were cancelled. As a result, the total number of completed surveys did not measure up to the expected target.

3.1.5. Lessons Learnt
- Need for an aggressive publicity campaign to encourage full participation of households.
- Need to compensate interviewers to prevent fatigue; interviewers must be properly compensated to encourage good quality work.
- Visualize, plan and execute: office and field staff must be adequately trained. The ALLS study is unlike any other survey undertaken; the level of work is comparable with the Census, and the volume of technical material to cover is enormous.
- Supervision: office staff must monitor the work of field staff regularly.

3.1.6. Challenges
- Getting residents to participate in a survey that lasted two hours or more; there was a higher level of resistance compared to other surveys conducted in the past.
- A general election was called during the fieldwork, with both political candidates and interviewers calling on some of the same households.
- Hurricane Fabian struck on September 5, 2003, the worst hurricane in Bermuda's recent history.

3.2. Saint Lucia's Experience
The following is an overview of Saint Lucia's experience in piloting a variant of the ISRS study. The recession precluded Saint Lucia from fielding the main assessment after having completed a large-scale pilot. Analysis of the pilot data demonstrated that the psychometrics were stable and that the background questionnaire functioned as expected. The assessment will be evaluated against the following criteria: cost, operational burden and technical burden. In addition, a summary and recommendations for the main assessment will be outlined. Most of the information used in this overview was taken from the document 'A National Literacy and Numeracy Assessment for Saint Lucia: A National Planning Report' (Saint Lucia CSO, 2008).

3.2.1. Cost
Experience in the ISRS study suggests that interviewers are able to complete between 1.5 and 2 interviews per day, depending on the demographics of the enumeration district in which they are working. The Saint Lucian pilot confirmed that interviewers were able to complete an average of two interviews per day, well within the design tolerances. The cost of overheads amounted to some US$145,000 and the data collection cost almost US$75,000. The sample size for the main survey is planned at 3,000 cases, at an estimated cost of US$434,000.

3.2.2. Operational burden
This study targeted a purposive sample of 400 adults aged 16-65 years. Approximately 50 staff (including interviewers, collection supervisors and CSO staff) were trained for 5 days during the week of January 24, 2009. Six (6) members of the scoring unit were trained for 3 days during the week of February 1, 2009. Pilot data collection was undertaken during February and March 2009. Data collection indicated that interviewers were able to implement the pilot assessment more or less as specified, although the inexperienced interviewers appeared to have more difficulty in managing the interview process.
Interview lengths were much longer than expected, a fact that reduced interviewer productivity and increased unit collection costs.

3.2.3. Technical burden
Saint Lucia required significant levels of technical and operational assistance during the implementation of its study. Assistance was required in the training of interviewers and in weighting and estimation; experienced staff were detached from Statistics Canada to undertake these tasks. Many of the more technically demanding tasks, such as building the response database, were undertaken by the Chief Statistician himself, a fact that placed a great burden on the system.

3.2.4. Risks
The pilot filter threshold proved to be too low, resulting in many relatively low-skilled respondents being asked the more difficult questions (test items). Together these problems led to higher levels of respondent annoyance. Response rates did not suffer greatly, but item non-response levels rose, particularly for items with high reading loads. Respondents with very low skill levels found the locator items too difficult, and this caused higher levels of non-response.

3.2.5. Summary and Recommendations for the Main Assessment in Saint Lucia
The analysis of the Saint Lucia pilot data produced recommendations for changes that must be introduced in the implementation of the main assessment. The recommendations are meant to correct serious deficiencies in the design, ones that, if not corrected, would jeopardize the integrity of the assessment and preclude meeting most of the study objectives. Analysis of the pilot results confirms much of the informal feedback obtained from interviewers during the pilot training and the administration of the pilot. Three fundamental issues were identified:

(a) Interview length: Average interview durations were roughly double the expected length of 90 minutes per household.
Such interview durations reduce response rates and lead to higher levels of item-level non-response. These changes increase the risk of bias in the proficiency estimates. In formulating recommendations for the main assessment, efforts should focus on reducing the number of items in the filter, locator and main assessment booklets by some 40 percent.

(b) The pass/fail threshold in the filter booklet: For the pilot, the filter threshold was set based on the distribution of proficiency observed in other countries. This threshold saw respondents with at least a high school education taking the more difficult main assessment booklet. It appears that many high school graduates in Saint Lucia have literacy skill levels below those needed to complete items of the difficulty levels found in the main assessment booklet. For the main assessment, the filter threshold should be set empirically, at a level that routes only respondents with a high probability of scoring at prose literacy level 3 or above to the more difficult side of the design. Imposition of a more demanding filter threshold will reduce response burden by ensuring that respondents are assigned items appropriate to their proficiency levels. Imposing a more demanding filter threshold forces a reallocation of the main sample towards more educated areas. This reallocation will ensure that the design yields a sufficient number of high-skilled adults.

(c) The administrative burden on the interviewers: Administration of the assessment involves a significant amount of paper handling. In order to keep the interview moving along at the expected pace, interviewers must be well organized and experienced. Many of the pilot interviewers lacked the requisite levels of skill and experience, and the pilot training did not provide sufficient practice time to impart the necessary basic interviewing skills.
Several changes will be introduced to try to reduce the administrative burden on the interviewers, including a reduction in the number of documents involved in the main assessment.

Other recommended changes and additions

(a) Interviewers’ compensation: Interviewers should be paid a flat amount for each case returned rather than differential amounts based on the level of completion. Experience suggests that differential compensation elicits bad behaviour on the part of interviewers. Specifically, they adopt practices that serve to maximize their earnings rather than data quality.

(b) Training for main implementation:

(i) Administration of the literacy assessment is very demanding for interviewers. Many of the interviewers trained for the Saint Lucia assessment did not have any previous interview experience. It is recommended that the data collection window for the main assessment be extended to 9 weeks, a period long enough for the interviewers used in the pilot to complete the main assessment. Using experienced interviewers would allow for a reduction in the duration of interviewers’ training, in which case more focus should be placed on mock interviews.

(ii) Class size should be limited to 12-15 trainees: Class sizes above this level reduce the level of interaction between the instructor and the trainees and allow weak interviewers to hide.

(iii) If a significant number of new interviewers were to be recruited for the main assessment, then survey training should be extended by 3 days to allow for practice of mock interviews and for training in basic interview techniques.

(c) Re-score requirements: Move to a sampling strategy for re-scoring of locator and main assessment booklets, one in which 15 percent of booklets are re-scored. Adjusted procedures should be provided as required.

3.3. Proposed Work in Dominica

Dominica has indicated an interest in fielding a national literacy assessment.
A review of their capacity questionnaire suggests that they have quite limited operational and technical capacity. Thus, any of the conventional paper-and-pencil options would stretch the Dominican statistical system beyond the breaking point. Even if the Dominican system managed to cope, the opportunity costs associated with devoting such a high proportion of available capacity would be high.

CHAPTER 4: FEEDBACK FROM THE CARICOM ADVISORY GROUP ON STATISTICS (AGS)

The AGS plays an integral role in the development of the Common Framework for a Literacy Survey for the Region. The advancement of the Common Framework for a Literacy Survey Project was therefore on the agendas of the following four AGS meetings, at which the Consultant attended and made presentations:

Eighth Meeting of the AGS, Jamaica, 27 June - 1 July 2011
Ninth Meeting of the AGS, Belize, 20-22 October 2011
Tenth Meeting of the AGS, Suriname, 18-22 June 2012
Eleventh Meeting of the AGS, Grenada, 25-28 October 2012.

Discussions and decisions that came out of the above-mentioned meetings focused on several related issues. As documented in the respective AGS Reports under the agenda item ‘Advancement of the Common Framework for a Literacy Survey Project’, the issues discussed include:

1. Proposed sample size of 1,000 households per country, treating the CARICOM Region as one domain

Recommendations/Discussions: The consultant indicated that the data from the survey using the Bow Valley common assessment of 1,000 households per country could be used to provide detailed estimates by applying the national survey estimates to the census or other survey data. This approach was not accepted by the meeting because it was pointed out that the countries will need literacy statistics by small geographic areas as well as by variables such as age, educational attainment and sex, so as to facilitate the use of the data for policy and decision making at local levels.
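The precision trade-off behind this concern can be illustrated with a deliberately simplified margin-of-error calculation. The function below assumes simple random sampling and ignores clustering and design effects; the figures are illustrative, not estimates from the report.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error, in percentage points, for an estimated proportion
    under simple random sampling (no design effect); illustrative only."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# A national sample of 1,000 households gives workable national precision...
print(round(margin_of_error(1000), 1))   # ~3.1 percentage points
# ...but a district contributing only 100 of those cases does not.
print(round(margin_of_error(100), 1))    # ~9.8 percentage points
```

This is why small-area and subgroup reporting needs drive the sample size well above what a single national estimate would require.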
The consultant confirmed that varying sample sizes could be used, so it was agreed that the full assessment will be considered instead of the common assessment. The meeting recommended that each country be considered as a separate domain instead of treating the Region as one domain with each country as a sub-domain.

2. Sample size

Recommendation: The meetings agreed that, in determining the sample size for the survey, countries should take into consideration their respective literacy data policy/disaggregation needs against the funds available and the acceptable level of reliability required. The sample size to be used by countries must be able to provide reliable estimates when applied to the populations of the respective countries. For each country, the sample size should be proportionate to the size of the population of the country.

3. Number of adults to be targeted per household using the Bow Valley web-based assessment

Recommendation: One adult respondent would be selected (using the Kish selection method) for participation in the survey from each household in the sample.

4. Acceptable response rate for literacy assessments

Recommendation: A very high response rate is usually difficult to achieve for this particular type of survey, but countries should strive to achieve a response rate of about 75-80 percent. Measures should be implemented to adjust for response rate bias.

5. Internet access

Recommendations: There may be areas in some countries where there is little or no internet access. The meeting was advised that in such cases there are two options: (i) a number of programmes/software will be preloaded onto the data collection device, which will allow for off-line entries, in which case a very large cache memory and large download capacity will be used; or (ii) countries could establish central internet access points, in which case large cache memory and large download capacity would not be required.
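Item 3 above calls for Kish selection of one adult per household. A real Kish procedure uses a set of pre-printed selection tables assigned to questionnaires; the sketch below compresses the idea into a single lookup keyed by the household serial number, and every name and grid value in it is purely illustrative.

```python
# Simplified, illustrative sketch of Kish-style within-household selection.
# A production Kish grid uses eight pre-assigned selection tables; this toy
# version keys one small grid off the household serial number instead.
KISH_GRID = {
    # (serial % 4) -> which adult to pick, for 1..5 eligible adults
    0: {1: 1, 2: 1, 3: 1, 4: 1, 5: 1},
    1: {1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
    2: {1: 1, 2: 2, 3: 2, 4: 3, 5: 4},
    3: {1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
}

def select_adult(household_serial, adults):
    """adults: eligible members listed in a fixed order (e.g. oldest first)."""
    n = min(len(adults), 5)
    row = KISH_GRID[household_serial % 4]
    return adults[row[n] - 1]

adults = ["Ann (52)", "Ben (47)", "Cora (19)"]   # hypothetical household roster
print(select_adult(7, adults))   # serial 7 -> table row 3 -> third adult
```

The point of the grid is that selection is fixed in advance by the serial number and roster order, so interviewers cannot (consciously or not) steer the choice toward the most available household member.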
There may be areas in some countries where there is no internet access but mobile network service (such as 3G) is available. In such cases, the software could be set up to run on any standard laptop or wireless tablet.

6. Inability of respondents to use the data collection device to respond to the survey

Recommendation: The use of a web-based assessment might prove to be challenging for non-computer users. The meeting was advised that in such cases the application allows the interviewers to input the responses into the data collection device as directed by the respondents.

7. Cost of the survey

Recommendation/Discussions: Cost estimates comparing the web-based and paper-based data collection methodologies should be prepared for each country, including the full cost of infrastructure/equipment. Relative to the cost of data collection devices and the cost of the survey, the meeting was informed that there would be no need for countries to invest in a large number of data collection devices and field staff, since the fieldwork could be done over an extensive period. It was elaborated that literacy levels in a population do not change at a fast rate over time and, therefore, an extended fieldwork period would not affect the results of the survey. In an effort to minimize the cost of the survey in the Region, countries could share/loan data collection devices/hardware (e.g. laptops) with each other, since all the countries in the Region are not likely to execute the survey at the same time. A review of the estimated cost relative to the sharing of hardware among countries should be conducted.

8. License fee for accessing the computer-/web-based literacy assessment test

Recommendation/Discussions: The meeting was advised that the cost of the license for the two required tests is charged on a per-use basis.
However, if a large number of the countries in the Region decide to execute the survey within a specific period, a full volume discount of not more than 70 percent will apply.

9. Concerns about the preferred option versus the other options relative to the science of measuring literacy

Recommendation/Discussions: All five options considered, including the Bow Valley Web-based, utilise the same science of assessment of literacy; the difference is the method of data collection, i.e. electronic/web-based versus paper-based. The web-based assessment methodology is similar to LAMP’s and PIAAC’s but more user-friendly, and its implementation is less costly. The ALLS and ISRS were developed using the IALS. The PIAAC generally uses the IALS methodology, but a reading component measure based on the ISRS was added. The LAMP is a less complicated version of the PIAAC. The PIAAC methodology set the standard for the Saint Lucia approach. The Bow Valley Web-based Assessment will produce the same results as the others but with fewer complexities.

10. Technical capacity building at the national level

Recommendation: Even though most of the data processing will be done automatically and in real time using the web-based option, the survey documents will include details on the concepts, definitions and procedures involved in the survey process.

11. Involvement/Input from other stakeholders at the national level

Recommendation/Discussion: The technical workshops will target representatives from the National Statistics Offices and the Ministries of Education in the respective countries. The countries thereafter should form national Literacy Survey teams, which should include representatives from the Ministry of Labour and the Ministry of Finance.

12. Major Risks

Recommendation/Discussion: The major risks in any literacy assessment include non-response biases, performance of test items and scoring errors.
In order to control these risks, adequate proactive and reactive quality assurance checks must be done. Further, a skilled and experienced team should be employed to manage the survey.

CHAPTER 5: ANALYSIS OF THE INDIVIDUAL COUNTRY CAPACITY ASSESSMENTS

As noted earlier in this Report, household-based skills assessments are among the most costly, technically demanding, operationally taxing and error-prone of all social science surveys. In order to gauge the readiness of Member States and Associate Members to cope, and to identify what types of support they might need, countries were asked to complete a questionnaire designed to assess them in this regard (see Annex B for the questionnaire). The questionnaire sought to identify the following for each country:

1. Needs and priorities with respect to literacy and numeracy data
2. Operational, financial and technical capacity and need for support
3. Costs associated with carrying out a household survey

The questionnaire was also designed to assist countries in assessing their national information needs and their capacity to field a household-based skills assessment, knowledge that will inform the completion of their respective National Planning Reports. This Chapter of the report summarizes the responses to three Sections of the questionnaire, namely A: Identification; B: Data needs and priorities; and C: Operational Capacity, and draws out their implications for the common framework. It should be noted that the numbering used for the Tables and Charts in this Section corresponds with the numbering of the questions in the questionnaire.

Of the twenty (20) countries in the Region, seventeen (17) responded to the questionnaire, yielding a response rate of 85 percent. Barbados, Haiti and Trinidad and Tobago did not respond to the questionnaire.

Major Findings

A: Identifying information

This Section covers particulars on the respondents.
It was found that, generally, senior officers from the MOE or the NSO or both completed the questionnaires. Of the 17 countries that responded, three (Bahamas, Suriname and Bermuda) submitted two independent questionnaires each, one from the NSO and the other from the MOE. In these cases, the responses from the NSOs were considered for all the Sections except Section B, where the MOE’s responses were considered since this Section reflects policy issues. The analysis that follows is therefore based on the 17 responses.

B. Data needs and priorities

Adult skills assessments can be designed to serve a range of purposes including knowledge generation; policy and program planning; monitoring; evaluation; and program administration. Knowledge generation involves generating new scientific insights, including understanding cause and effect. Monitoring implies the collection of repeated measures in order to see if the world is evolving as expected. The design of any assessment must be adapted to support each of these purposes, and each of these uses implies a different set of technical attributes that must be met if the system is to produce data that are judged to be fit for purpose.

B1: Purposes of acquiring adult literacy and numeracy data

Countries were asked to indicate whether any of the following purposes of acquiring adult literacy and numeracy data were applicable to their respective countries. They were also asked to rank the applicable purposes in order of importance to their respective country, on a scale of 1 to 5 with 1 being most important and 5 being least important:

o knowledge generation;
o policy and planning;
o monitoring;
o evaluation; and
o program administration.

It should be noted that all the targeted purposes were reportedly important to all 17 countries that responded except Montserrat, which indicated only one purpose, policy and program planning, giving it a ranking of one.
Of the 16 countries that indicated that all the targeted purposes were important, one (Saint Lucia) did not provide a ranking for any of the targeted purposes; this is reflected in Table B1 and Charts B1.1 and B1.2.

Table B1 and Charts B1.1 and B1.2 reveal considerable variation in reported purpose by country. The variations are not problematic in and of themselves, as the proposed assessment option’s sample size and design support the first three purposes. With the generation of small-area estimates, all the options would serve the program administration use. The Bow Valley tool would support program evaluation as it can yield reliable estimates of score gain.

Table B1: Purpose of literacy data by ranking and percent of countries

Rank        Knowledge     *Policy and        Monitoring    Evaluation    Program
            generation    program planning                               administration
1           31.3          52.9               12.5          6.3           6.3
2           18.8          35.3               6.3           6.3           18.8
3           6.3           0.0                31.3          18.8          43.8
4           6.3           0.0                31.3          37.5          12.5
5           31.3          5.9                12.5          25.0          12.5
NS          6.3           5.9                6.3           6.3           6.3
Total (%)   100.0         100.0              100.0         100.0         100.0
Total (#)   16            17                 16            16            16

Note: * One country reported this as the only purpose that is important. NS: Not Stated

[Chart B1: Purpose of literacy data by ranking and percent of countries (bar chart omitted; see Table B1 for the underlying figures.)]

Table B1 and Chart B1 show that the main purpose of adult literacy and numeracy data is policy and program planning, with all 17 countries reporting it. Almost 53 percent of the countries ranked this purpose as most important to their respective countries. The second most important purpose reported is knowledge generation, with over 31 percent of the countries ranking it as number 1 in importance to their countries.
B2: Policy departments that require adult literacy and numeracy data

Countries were asked to indicate whether any of the following policy departments require adult literacy and numeracy data, and they were also asked to rank the applicable departments in order of importance to their respective countries, on a scale of 1 to 8 with 1 being most important and 8 being least important:

1. Kindergarten to Grade 12 education
2. Adult education
3. Labour
4. Finance/Treasury
5. Language and culture
6. Social
7. Prime Minister’s Office
8. Other

Table B2: Policy departments that require literacy data by ranking and percent of countries

Rank        K-Grade 12   Adult       Labour   Finance/   Language      Social   Prime        Other
            education    education            Treasury   and culture            Minister’s
                                              (F/T)      (L&C)                  Office
1           6.7          33.3        50.0     0.0        0.0           0.0      10.0         14.3
2           26.7         26.7        7.1      10.0       0.0           7.1      20.0         14.3
3           6.7          26.7        21.4     10.0       9.1           21.4     10.0         0.0
4           6.7          0.0         14.3     30.0       27.3          21.4     0.0          0.0
5           6.7          0.0         0.0      10.0       36.4          21.4     10.0         28.6
6           20.0         0.0         0.0      10.0       27.3          7.1      20.0         0.0
7           13.3         6.7         0.0      20.0       0.0           14.3     20.0         0.0
8           6.7          0.0         0.0      10.0       0.0           0.0      10.0         42.9
NS          6.7          6.7         7.1      0.0        0.0           7.1      0.0          0.0
Total (%)   100.0        100.0       100.0    100.0      100.0         100.0    100.0        100.0
Total (#)   15           15          14       10         11            14       10           7

NS: Not Stated

As shown in Table B2, there are significant variations in the importance of literacy data among the departments of government. This implies a need for the comparative analysis to reflect a broad range of policy issues. All 17 countries responded to this question, and it was found that, of all the policy departments examined, ‘Kindergarten to Grade 12 Education’ and ‘Adult Education’ are the departments most likely to require adult literacy and numeracy data, with 15 countries in each case. On the other hand, the Finance/Treasury department and the Prime Minister’s Office are the least likely, compared with the other departments, to require literacy data, with only 10 countries each indicating so (Table B2).
[Chart B2: Rank by policy departments that require literacy data and percent of countries (bar chart omitted. Note: one case excluded where no ranking was given.)]

Chart B2 reveals that the policy departments ranked as most important by the largest proportions of countries are the Labour department and the adult education departments, with over 45 percent and about 35 percent respectively. None of the countries identified the Finance/Treasury, Language and Culture, or Social departments as the most important policy departments requiring adult literacy and numeracy data.

B3: Policy issues that require adult literacy and numeracy data

Countries were asked to indicate whether the following policy issues require adult literacy and numeracy data, and they were also asked to rank the selected issues in order of importance to their respective countries, on a scale of 1 to 15 with 1 being most important and 15 being least important:

(a) Improving the quantity of primary and secondary education
(b) Improving the quality of primary and secondary education (initial education)
(c) Improving the equity of primary and secondary education (initial education)
(d) Improving the efficiency and effectiveness of primary and secondary education (initial education)
(e) Improving the quantity of tertiary education
(f) Improving the quality of tertiary education
(g) Improving the equity of tertiary education
(h) Improving the efficiency and effectiveness of tertiary education
(i) Improving the quantity of adult education
(j) Improving the quality of adult education
(k) Improving the equity of adult education
(l) Improving the efficiency and effectiveness of adult education
(m) Reducing social and economic inequality
(n) Improving labour productivity and competitiveness
(o) Improving health

For analysis purposes, the 15 ranks have been grouped into
five categories as follows:

1. Most important (Rank 1-3)
2. Above average (Rank 4-6)
3. Average (Rank 7-9)
4. Below average (Rank 10-12)
5. Least important (Rank 13-15)

Additionally, the 15 policy issues targeted are analyzed in four groups as follows:

1. Improving primary and secondary education (policy issues (a) to (d))
2. Improving tertiary education (policy issues (e) to (h))
3. Improving adult education (policy issues (i) to (l))
4. Socio-economic concerns (policy issues (m) to (o))

Sixteen of the 17 countries responded to this question.

Table B3.1: Policy issues that require literacy data relative to improving primary and secondary education, by level of importance (rank)

Rank                      Quantity of    Quality of     Equity of      Efficiency and
                          primary and    primary and    primary and    effectiveness of
                          secondary      secondary      secondary      primary and
                          education      education      education      secondary education
Most important (1-3)      10.0           35.7           36.4           30.8
Above average (4-6)       0.0            21.4           18.2           15.4
Average (7-9)             30.0           7.1            9.1            15.4
Below average (10-12)     10.0           21.4           18.2           30.8
Least important (13-15)   50.0           7.1            9.1            0.0
Not Stated                0.0            7.1            9.1            7.7
Total (%)                 100.0          100.0          100.0          100.0
Total (#)                 10             14             11             13

As shown in Table B3.1, the policy issue most often reported as requiring adult literacy and numeracy data is ‘improving the quality of primary and secondary education’, indicated by 14 countries. Of these, over 57 percent reported that this policy issue is most important or above average in importance to their respective countries.
[Chart B3.1: Ranking of policy issues relative to improving primary and secondary education (stacked bar chart omitted; see Table B3.1 for the underlying figures.)]

Chart B3.1 shows that the most important policy issues reported are the equity and the quality of primary and secondary education, with about one-third of the countries indicating each.

Table B3.2: Policy issues that require literacy data relative to improving tertiary education, by level of importance (rank)

Rank                      Quantity of    Quality of    Equity of    Efficiency and
                          tertiary       tertiary      tertiary     effectiveness of
                          education      education     education    tertiary education
Most important (1-3)      0.0            15.4          8.3          16.7
Above average (4-6)       27.3           38.5          16.7         25.0
Average (7-9)             27.3           15.4          25.0         41.7
Below average (10-12)     36.4           15.4          33.3         16.7
Least important (13-15)   9.1            7.7           8.3          0.0
Not Stated                0.0            7.7           8.3          0.0
Total (%)                 100.0          100.0         100.0        100.0
Total (#)                 11             13            12           12

As it relates to improving tertiary education, 13 of the 16 countries that responded to this question reported a need for adult literacy and numeracy data in relation to the quality of tertiary education. Of these, almost 54 percent indicated that this policy issue is most important or above average in importance to their respective countries (Table B3.2).
[Chart B3.2: Ranking of policy issues relative to improving tertiary education (stacked bar chart omitted; see Table B3.2 for the underlying figures.)]

Chart B3.2 demonstrates that the most important policy issues requiring adult literacy and numeracy data relative to tertiary education are improving the ‘quality of tertiary education’ and improving the ‘efficiency and effectiveness of tertiary education’, with around 41 percent and 38 percent respectively.

Table B3.3: Policy issues that require literacy data relative to improving adult education, by level of importance (rank)

Rank                      Quantity of    Quality of    Equity of    Efficiency and
                          adult          adult         adult        effectiveness of
                          education      education     education    adult education
Most important (1-3)      41.7           46.7          41.7         30.8
Above average (4-6)       33.3           26.7          16.7         30.8
Average (7-9)             8.3            20.0          8.3          23.1
Below average (10-12)     0.0            0.0           16.7         15.4
Least important (13-15)   8.3            0.0           8.3          0.0
Not Stated                8.3            6.7           8.3          0.0
Total (%)                 100.0          100.0         100.0        100.0
Total (#)                 12             15            12           13

All four policy issues relating to improving adult education were rated most important or above average in importance by over 58 percent of countries in each case. Fifteen of the 16 countries that responded to this question indicated that adult literacy and numeracy data are required in order to improve the quality of adult education. Of these, almost 47 percent ranked this policy issue as most important to their respective countries (see Table B3.3).
[Chart B3.3: Ranking of policy issues relative to improving adult education (stacked bar chart omitted; see Table B3.3 for the underlying figures.)]

As shown in Chart B3.3, all the policy issues relative to adult education were found to be very important to countries, with relatively small variations in the proportion of countries among issues. The largest proportion of countries reported that improving the quality of adult education is most important, with about 30 percent of countries indicating so.

Table B3.4: Policy issues that require literacy data relative to socio-economic concerns, by level of importance (rank)

Rank                      Reducing social    Improving labour    Improving
                          and economic       productivity and    health
                          inequality         competitiveness
Most important (1-3)      42.9               46.7                50.0
Above average (4-6)       28.6               20.0                25.0
Average (7-9)             7.1                13.3                25.0
Below average (10-12)     0.0                6.7                 0.0
Least important (13-15)   14.3               6.7                 0.0
Not Stated                7.1                6.7                 0.0
Total (%)                 100.0              100.0               100.0
Total (#)                 14                 15                  8

As shown in Table B3.4, the majority of countries indicated that data on adult literacy and numeracy are needed to reduce social and economic inequality and to improve labour productivity and competitiveness, with 14 and 15 countries respectively. The issue of improving labour productivity and competitiveness was most prevalent, with about 47 percent of the countries ranking it most important, followed by the issue of reducing social and economic inequality with about 43 percent.
[Chart B3.4: Ranking of policy issues relative to socio-economic concerns (stacked bar chart omitted; see Table B3.4 for the underlying figures.)]

Chart B3.4 indicates that there is hardly any variation among the socio-economic policy issues relative to their importance among countries: approximately the same proportion of countries ranked each policy issue as most important.

B4: Possible funding source(s) identified

Countries were asked whether possible source(s) of funding had been identified to support the implementation of a national literacy and numeracy assessment.

Table B4: Funding Source Identified

Funding Source Identified    Number    Percent
Yes                          3         17.6
No                           13        76.5
Not sure                     1         5.9
Total                        17        100.0

As shown in Table B4, all 17 countries responded to this question. The majority of the countries have not yet identified a source of funding; only 3 of the 17 countries indicated that funding had been identified, and one country was not sure whether any source of funding had been identified.

C. Operational Capacity

The implementation of adult skills assessments places significant demands on the operational capacity of NSOs. This section attempts to evaluate whether the countries have the relevant capacity to undertake an assessment.

C1: Number of staff with Literacy Survey Experience

All 17 countries responded to the question on the number of staff with literacy survey experience. Chart C1 and Table C1 reveal that the overwhelming majority of countries have no experience in conducting literacy surveys: almost 59 percent (10 countries) reported no literacy survey experience. Only seven countries reported having staff members with such experience, and in most of those the numbers are limited to only one or two staff.
[Chart C1: Number of staff members with literacy survey experience (bar chart omitted; see Table C1 for the underlying figures.)]

Table C1: Number of staff with Literacy Survey Experience

Number of staff    Frequency    Percent
0                  10           58.7
1                  2            11.8
2                  2            11.8
4                  1            5.9
5                  1            5.9
6                  1            5.9
Total              17           100.0

Experience suggests that the lack of experience in the Region is not a serious barrier to implementation, provided that the implementation process includes a significant amount of theoretical and procedural training. Without such training, the risk of inadvertent error is very high.

C2: Specific literacy assessment experience

The seven countries which indicated that they have staff members with literacy survey experience were asked to indicate the areas in which they have experience.

Table C2: Experience in selected areas in Literacy Survey

Survey Experience          Yes (No.)    Yes (%)    Total
Planning                   7            100.0      7
Sampling                   6            85.7       7
Data collection            6            85.7       7
Data entry/data capture    6            85.7       7
Coding                     6            85.7       7
Editing                    6            85.7       7
Data analysis              5            71.4       7
Other                      3            42.9       7

As shown in Table C2, all seven countries that reported experience in literacy surveys have experience in the planning of literacy surveys, but only five have data analysis experience. Six countries have experience in each of sampling, data collection, data capture, data coding and data editing.

C3-C12: Human capacity in survey phases

Countries were asked about their capacities in the various survey phases: data collection, data capture, data coding, data editing and data analysis.

Interviewer capacity

Table C3: Number of trained Interviewers

No. of trained interviewers on staff    No. of countries    Percent
0                                       5                   29.4
2                                       1                   5.9
5                                       3                   17.6
6                                       1                   5.9
8                                       2                   11.8
10                                      1                   5.9
15                                      1                   5.9
50                                      1                   5.9
64                                      1                   5.9
70                                      1                   5.9
Total                                   17                  100.0

Household-based skills assessments have average interview durations of some 90 minutes, plus travel time. Even with relatively small sample sizes, such projects consume very large numbers of interviewer hours.
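The scale of that demand is easy to illustrate. The sketch below uses the 90-minute interview duration cited above and an assumed 30-minute travel and administration allowance per case; both the function and the allowance are illustrative.

```python
def interviewer_hours(sample_size, interview_min=90, travel_min=30):
    """Total field hours a sample consumes, at ~90 minutes per interview
    plus an assumed travel/administration allowance per case."""
    return sample_size * (interview_min + travel_min) / 60

for n in (400, 1000, 3000):
    print(n, interviewer_hours(n))
# A 3,000-case survey needs some 6,000 field hours; at the largest monthly
# capacity reported in Table C4 (792 hours), that is over 7 months of work.
```

Comparing these totals against the monthly interview-hour capacities countries reported makes clear why small interviewer workforces constrain feasible sample sizes and collection windows.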
Table C3 shows the size of the available interviewer workforce. There are significant variations in the size of that workforce, which ranges from 0 to 70 interviewers. Only three countries reported having access to large numbers of interviewers (50-70). The remaining countries have very limited numbers of interviewers, too few to support implementation of an assessment with a large sample size. Only 12 countries reported having trained interviewers on staff: almost 30 percent (5) of the countries that responded do not have any trained interviewers on staff, almost 53 percent have 15 or fewer, and almost 18 percent have 50 to 70 interviewers. However, countries reported that they usually recruit the required number of interviewers on an ad hoc basis, depending on the surveys to be executed.

Table C4: Number of monthly interview hours

Interview hours    No. of countries
15                 2
56                 1
500                2
600                1
792                1
NS                 5
NA                 5
Total              17

NS: Not stated; NA: Not applicable (no interviewers on staff)

When asked about total monthly collection capacity in terms of interview hours, only seven of the 12 countries that reported having trained interviewers on staff provided estimates. These estimates ranged from only 15 interview hours per month to 792 interview hours per month (see Table C4).

Field supervision capacity

An essential element of quality assurance in household-based skills assessments is the supervision of interviewers throughout the field operation. Table C5 reveals that only two countries reported having an appreciable number of field supervisors (20 and 40 respectively) and that more than 35 percent of the countries do not have any field supervisors on staff.

Table C5: Number of Field Supervisors on staff

No. of Field Supervisors    No. of countries    Percent
0                           6                   35.3
1                           1                   5.9
3                           2                   11.8
4                           1                   5.9
5                           3                   17.6
7                           1                   5.9
8                           1                   5.9
20                          1                   5.9
40                          1                   5.9
Total                       17                  100.0

Data Entry and Data Coding Capacities

While over 70 percent of the countries reported having data entry clerks and data coding clerks on staff, the number of these clerks per country ranged from 2 to 16 and from 2 to 9 respectively (see Tables C7 and C9).

Table C7: Number of Data Entry Clerks on staff

No. of Data Entry Clerks    No. of countries    Percent
0                           5                   29.4
2                           1                   5.9
3                           2                   11.8
5                           2                   11.8
6                           4                   23.5
7                           1                   5.9
8                           1                   5.9
16                          1                   5.9
Total                       17                  100.0

Table C9: Number of Coding Clerks on staff

No. of Coding Clerks    No. of countries    Percent
0                       5                   29.4
2                       4                   23.5
3                       3                   17.6
4                       1                   5.9
5                       1                   5.9
8                       1                   5.9
9                       2                   11.8
Total                   17                  100.0

Programming capacity

As shown in Table C10, countries have access to a very limited number of programmers on staff; in fact, eight countries (over 47 percent of those that responded) report having no programmers at all on staff.

Table C10: Number of Programmers on staff

No. of Programmers    No. of countries    Percent
0                     8                   47.1
1                     3                   17.6
2                     5                   29.4
3                     1                   5.9
Total                 17                  100.0

Field editing capability

As shown in Table C11, only nine countries have field editors, with the number of field editors ranging from 2 to 11. Only one country reported having 11 field editors, the largest number reported.

Table C11: Number of Field Editors on staff

No. of Field Editors    No. of countries    Percent
0                       8                   47.0
2                       3                   17.6
5                       1                   5.9
6                       2                   11.8
10                      2                   11.8
11                      1                   5.9
Total                   17                  100.0

Statistical analysis capacity

Extracting full value from the assessment results and associated background information requires the transformation of the raw data into information through a process of statistical analysis. Table C12 reveals a large variation in the available statistical analysis capacity: two countries reported having no analysis capacity, while one country reports having 14 analysts on strength.
Table C12: Number of experienced Statistical Analysts on staff

No. of experienced Statistical Analysts   No. of countries   Percent
0        2    11.8
1        3    17.6
2        1     5.9
3        2    11.8
4        3    17.6
5        2    11.8
7        1     5.9
8        1     5.9
12       1     5.9
14       1     5.9
Total    17   100.0

C13: Analysis tools

Analysis of assessment results is generally undertaken using a small number of common analytic software packages, i.e. SAS, SPSS and Excel. When asked whether staff had working experience in SAS, SPSS and Excel, the majority of the 17 countries reported experience in Excel (94 percent) and SPSS (88 percent). However, only about 12 percent of the countries reported experience in SAS (see Chart C13 and Table C13.1).

Chart C13: Experience in selected analysis tools
[Bar chart: percent of countries with experience in SAS (11.8), SPSS (88.2) and Excel (94.1).]

Table C13.1: Experience in selected analysis tools (percent of countries)

            SAS     SPSS    Excel
Yes         11.8    88.2    94.1
No          88.2    11.8     5.9
Total (%)   100.0   100.0   100.0
Total (#)   17      17      17

As shown in Table C13.2, only Jamaica and Grenada have experience in SAS. However, experience in SPSS and Excel is quite prevalent in the Region: only Montserrat and the Bahamas reported having no experience in SPSS, and only Montserrat reported no experience in Excel.

Table C13.2: Experience in selected analysis tools by country

Country                          SAS   SPSS   Excel
Anguilla                         No    Yes    Yes
Antigua and Barbuda              No    Yes    Yes
Belize                           No    Yes    Yes
Bermuda                          No    Yes    Yes
British Virgin Islands           No    Yes    Yes
Cayman Islands                   No    Yes    Yes
Dominica                         No    Yes    Yes
Jamaica                          Yes   Yes    Yes
Grenada                          Yes   Yes    Yes
Montserrat                       No    No     No
Saint Kitts and Nevis            No    Yes    Yes
Saint Lucia                      No    Yes    Yes
Bahamas                          No    No     Yes
Guyana                           No    Yes    Yes
Suriname                         No    Yes    Yes
Turks and Caicos Islands         No    Yes    Yes
St. Vincent and the Grenadines   No    Yes    Yes

C14: Advanced analysis capacity

Analysis of data from skills assessments depends on a small range of statistical techniques, including tabulations (to profile the social distribution of skills), simple regressions (to reveal the factors that have the largest impact on observed skill levels and the impacts that skills have on outcomes) and multi-level/multi-variate analysis (to reveal relationships that are not confounded). Chart C14 reveals that while the majority of countries (88 percent) have the capability to produce tables, only 59 percent can undertake simple regressions and 53 percent multi-level/multi-variate regressions. As such, suitably qualified and experienced personnel will be needed to provide this service in most of the countries, and/or suitably designed training would be required for nationals in each country. Table C14 shows the countries with and without the targeted capabilities.

Chart C14: Type of analysis
[Bar chart: percent of countries able to produce tables (88.2), simple regressions (58.8) and multi-level/multi-variate regressions (52.9).]

Table C14: Type of analysis by country

Country                          Tables   Simple Regressions   Multi-level/multi-variate regressions
Anguilla                         Yes      Yes                  Yes
Antigua and Barbuda              Yes      No                   No
Belize                           Yes      Yes                  Yes
Bermuda                          Yes      No                   No
British Virgin Islands           Yes      Yes                  No
Cayman Islands                   Yes      Yes                  Yes
Dominica                         Yes      No                   No
Jamaica                          Yes      Yes                  Yes
Grenada                          Yes      Yes                  Yes
Montserrat                       No       No                   No
Saint Kitts and Nevis            Yes      No                   No
Saint Lucia                      Yes      Yes                  Yes
Bahamas                          No       No                   No
Guyana                           Yes      Yes                  Yes
Suriname                         Yes      Yes                  Yes
Turks and Caicos Islands         Yes      No                   No
St. Vincent and the Grenadines   Yes      Yes                  Yes

Technical capacity and infrastructure

C15: Sampling capability

All of the options reviewed require the selection of a multi-stage, stratified probability sample by a skilled and experienced sampling statistician. As shown in Chart C15, only seven of the seventeen responding countries have a sampling statistician on staff.

Chart C15: Countries with sampling statistician on staff
[Bar chart: number of countries with (7) and without (10) a sampling statistician on staff.]

C16-C20: Statistical Capacity

Questions C16 to C20 sought to determine whether or not the countries have the capabilities to:

(a) select a multi-stage, stratified probability sample;
(b) weight survey records;
(c) calculate variance estimates based on complex survey designs;
(d) calculate variance estimates using replicate weights; and
(e) use InDesign, the software used to generate the test booklets.

As shown in Chart C16-C20, the most prevalent capability is the weighting of survey records, with almost 65 percent of the countries having this capability, followed by the selection of multi-stage, stratified probability samples (almost 59 percent). Less than 24 percent of the countries have the capability to calculate variance estimates using replicate weights or to use InDesign.
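For countries without experience in replicate-weight variance estimation, the core computation is mechanical once the replicate weight sets are supplied. The sketch below (illustrative Python, not tied to any particular survey system) shows a delete-one-group jackknife of the kind referred to in question C19: the variance of an estimate is built from its variation across the replicate weight sets.

```python
def weighted_mean(values, weights):
    """A simple weighted estimator; any weighted statistic could be used."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def jackknife_variance(values, full_weights, replicate_weight_sets):
    """JK1-style jackknife: var = (G-1)/G * sum_g (theta_g - theta)^2,
    where theta is the full-sample estimate and theta_g the estimate
    recomputed with the g-th replicate weight set."""
    theta = weighted_mean(values, full_weights)
    g = len(replicate_weight_sets)
    return (g - 1) / g * sum(
        (weighted_mean(values, rw) - theta) ** 2 for rw in replicate_weight_sets
    )

# Example: four respondents' scores; each replicate drops one respondent
# and reweights the rest by 4/3 (a standard delete-one construction).
scores = [200, 250, 300, 350]
full_w = [1, 1, 1, 1]
replicates = [[0 if j == i else 4 / 3 for j in range(4)] for i in range(4)]
variance = jackknife_variance(scores, full_w, replicates)
```

In practice the replicate weights would come from the sample design (one set per dropped primary sampling unit), and `weighted_mean` would be replaced by whatever statistic is being published.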
Chart C16-C20: Country capacity in selected statistical capacities
[Bar chart, percent of countries answering Yes/No for each capability: select a multi-stage, stratified probability sample (58.8/41.2); weight survey records (64.7/35.3); calculate variance estimates based on complex survey designs (29.4/70.6); calculate variance estimates using replicate weights (23.5/76.5); use InDesign, the software used to generate the test booklets (23.5/76.5).]

C21: Access to high-speed internet

The computer-based option reviewed in this Report requires access either to high-speed internet, to a 3G wireless network, or to a system that provides for the occasional upload of results. When asked about the proportion of the country that has high-speed internet access, only 13 countries responded. Of these, only two reported having 100 percent coverage and five reported having less than 50 percent coverage (see Chart C21). This finding suggests a need for multiple implementation protocols, such as the use of 3G service where available; the use of large cache memory to facilitate delayed uploading of data; and the use of central internet access points where respondents from areas with no internet service could participate in the Survey.

Chart C21: Proportion of the country with high-speed internet access
[Bar chart: number of countries by coverage band: less than 50 percent (5 countries), 50-70 percent, 71-99 percent, 100 percent (2 countries).]

C22: Interviewers with the ability to perform simple tasks on a computer

The computer-based assessments require that interviewers be able to undertake simple tasks on a computer. Chart C22 reveals that of the 12 countries that reported having interviewers on staff (see Question C3), only eight responded to this question. All eight reported having interviewers with such skills. However, only two reported having between 16 and 50 such interviewers, while four have fewer than 10.
This finding suggests a need for focused recruitment strategies and some basic training.

Chart C22: Interviewers who can do simple tasks on a computer
[Bar chart: number of countries by number of such interviewers (0-9, 10-15, 16-50).]

C23: Interviewers with experience in computer-assisted personal interviewing (CAPI)

When asked about the number of interviewers with computer-assisted personal interviewing (CAPI) skills, only five of the eight countries that have computer-literate interviewers reported such experience (Chart C23). Only two countries have 10 to 50 such interviewers.

Chart C23: Interviewers with CAPI experience
[Bar chart: number of countries by number of interviewers with CAPI experience (0, 1-10, 10-50).]

CHAPTER 6: DETAILS ON THE OPTION RECOMMENDED BY THE AGS - FULL BOW VALLEY ASSESSMENT

Based on the Consultant’s comprehensive review of the various literacy assessments, the CARICOM Advisory Group on Statistics (AGS) recommended the use of the Full Bow Valley web-based assessment for the Region.

6.1. Detailed Methodological Approach of the Full Bow Valley Web-Based Assessment

Each assessment option was evaluated against four criteria, namely cost, technical burden, operational burden and risk. The evaluation recommended the use of the Saint Lucian (paper and pencil) or the Bow Valley instruments in a common (sample size of 1,000 households per country) regional assessment. Both options, it was stated, would impose a manageable financial, technical and operational burden on National Statistics Offices. In the pilot conducted in Saint Lucia, it was reported that implementation of the paper and pencil instrument placed a heavy burden on interviewers both “literally and figuratively”: the weight of the test booklets and component measures was taxing. The Bow Valley option would be slightly less costly and less operationally burdensome, but slightly more technically demanding because of its reliance on computer technology.
The AGS was of the view that the Bow Valley full (sample size of at least 3,000 cases) instrument would provide the data needed by the countries for policy purposes, and that countries in the Region could shoulder the technical, operational and financial burdens associated with conducting this assessment with acceptable levels of risk. The web-based data collection method is preferred over the paper and pencil method since, among other advantages, it limits the possibility of manual errors during the data collection phase and reduces the time needed for data collection, compilation and analysis. This will ultimately translate into a reduction in the overall cost of the survey.

The full Bow Valley web-based assessment embodies the same science that is contained in the IALS, ALLS, ISRS, LAMP and PIAAC. Compared to the other assessments, it provides improved reliability of skills estimates while reducing the cost, technical burden and operational burden of the assessment process. It utilizes a suite of assessment tools that support the full range of needs from program triage (i.e. the process of determining learner objectives and learning needs) to certification in real time. The assessment measures prose literacy, document literacy, numeracy, oral fluency and, for low-level readers, a computer-based variant of the reading components carried in the ISRS, LAMP and PIAAC assessments. The full Bow Valley web-based assessment will allow countries to use sample sizes that support the direct estimation of more point estimates at the national level and more reliable estimates of relationships.

The evaluation of the Bow Valley web-based assessment against the four criteria is summarized as follows:

6.1.1.
Cost

The adaptive algorithms in the Bow Valley web-based assessment allow the average interview duration to be significantly shorter than under the other options reviewed, resulting in the lowest unit cost for data collection, even with allowances for licensing fees and hardware costs.

6.1.2. Operational burden

The preferred option uses computer technology to manage all of the burden of collecting the background information, administering the tests, and scoring and scaling the data. The adaptive algorithms built into the application significantly reduce the number of items needed to reach the desired precision levels, which translates into shorter interview durations. As such, the operational burden associated with the Bow Valley web-based assessment is significantly lower than that imposed by any of the other options.

6.1.3. Technical burden

Compared to the other options reviewed, the technical burden associated with the Bow Valley web-based assessment is much lower, mainly because the assessment is designed to handle almost all of the associated technical burden automatically. Where countries are interested in synthetic estimates for small areas using Population and Housing Census data, the associated methods are relatively easy to follow for those who are suitably experienced.

6.1.4. Risks

The key risk is the unavailability of suitable communication infrastructure, such as adequate bandwidth, to support the delivery of the assessment. However, the Bow Valley web-based assessment tools are designed to support delivery on tablet computers, laptops or standalone computers, with or without internet access. In cases where internet access is problematic, the assessment can use a 3G (third-generation) wireless network or can be completed off-line and uploaded at a location where internet access is available.
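To illustrate why adaptivity shortens interviews, the sketch below shows the general idea behind adaptive item selection. This is illustrative Python only, not Bow Valley’s actual algorithm: each next item is chosen to match the respondent’s current ability estimate, and testing stops once a target precision is reached, so fewer items are administered than in a fixed-form test. The step size, stopping rule and Rasch-style probability model are all simplifying assumptions.

```python
import math

def adaptive_test(item_difficulties, answer_fn, max_items=10, target_se=0.4):
    """item_difficulties: item pool, in logits. answer_fn(difficulty) -> bool.
    Returns (ability_estimate, difficulties_of_items_used)."""
    theta, used, information = 0.0, [], 0.0
    pool = sorted(item_difficulties)
    while pool and len(used) < max_items:
        # pick the unanswered item closest in difficulty to the current estimate
        b = min(pool, key=lambda d: abs(d - theta))
        pool.remove(b)
        p = 1.0 / (1.0 + math.exp(-(theta - b)))  # Rasch success probability
        theta += 0.5 if answer_fn(b) else -0.5    # crude ability update
        information += p * (1 - p)                # accumulated Fisher information
        used.append(b)
        # stop early once the standard error is small enough
        if information > 0 and 1.0 / math.sqrt(information) < target_se:
            break
    return theta, used
```

A respondent who answers everything correctly is walked up through progressively harder items, while a weaker respondent is routed to easier ones; either way the test converges on informative items instead of administering the whole pool.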
Another potential risk is the unfamiliarity of the respondents (test takers) with computer technology. However, the Bow Valley assessment tools are designed to place minimal demands on the respondent, and the administration protocol provides for the interviewer to input the test responses as directed by the respondent in cases where respondents lack the technical skills to use the technology themselves.

6.2. Recommendations to Inform the Use of the Full Bow Valley Web-Based Assessment

The recommendations for the use of the Full Bow Valley Web-Based assessment are as follows:

Recommendation 1 - Sample Size: In general, the size of the sample should depend on country-specific policy requirements, subject to cost and the budget available to conduct the assessment in the respective countries. In order for point estimates to be reliably estimated according to selected characteristics, a minimum of 600 cases is required per category for each characteristic. Considering that educational attainment explains almost 70 percent of a person’s literacy level, and in an effort to minimize the cost and maximize the reliability of the survey, the recommended minimum characteristics and categories for small countries, or for countries with budget constraints, are as follows:

Characteristic           Categories
Sex                      Male (600 cases); Female (600 cases)
Educational attainment   Primary (600 cases); Secondary (600 cases); Tertiary (600 cases)

Therefore, the total sample size to provide reliable point estimates for sex and education, with five categories in all, is 3,000 cases (5 × 600).
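The 600-case figure can be checked against the standard sample-size formula for a proportion, n = z² · p(1 − p) / e². The sketch below is a minimal illustration; the z-values and the worst-case assumption p = 0.5 are standard statistical conventions, not figures taken from this Report.

```python
import math

def cases_per_category(margin_of_error, z):
    """n = z^2 * p(1-p) / e^2, with worst-case p = 0.5."""
    return math.ceil(z * z * 0.25 / margin_of_error ** 2)

# 95% confidence (z = 1.96) with a 4-point margin needs ~600 cases,
# consistent with the per-category minimum recommended above.
# Relaxing to 90% confidence (z = 1.645) and a 5-point margin needs
# substantially fewer cases.
n_strict = cases_per_category(0.04, 1.96)   # 601, i.e. roughly 600
n_relaxed = cases_per_category(0.05, 1.645) # 271, well under 600
```

This also illustrates the Report’s point that a country tolerating a wider margin of error or lower confidence level can justify fewer than 600 cases per category.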
Similarly, for larger countries or for countries with less severe budget constraints, the recommended minimum characteristics and categories are as follows:

Characteristic           Categories
Sex                      Male (600 cases); Female (600 cases)
Educational attainment   Primary (600 cases); Secondary (600 cases); Tertiary (600 cases)
Age group                15-44 (600 cases); 45+ (600 cases)

In this case, the total sample size to provide reliable point estimates for sex, education and age, with seven categories in all, is 4,200 cases (7 × 600). Therefore, the recommended minimum sample size is 3,000 cases for small countries and 4,200 cases for larger countries. Allowances can be made for non-response, which would bring the recommended sample sizes to 3,600 and 4,800 respectively. Countries are free to add characteristics beyond these recommended minimums.

However, the figure of 600 cases is based on a particular desired level of precision (margin of error) and level of confidence. If the tolerable level of confidence were, say, 90 percent and the margin of error, say, 5 percent, then the required number of cases could be less than 600. Countries can use the margin of error and level of confidence that they would normally use in their household surveys to derive reliable estimates at the sub-national level and for sub-groups of the population. This issue will be discussed further in the guidelines for sample design to be prepared under Phase II.

Recommendation 2 - Adequate Communication Infrastructure: Since the preferred data collection method is web-based, it is recommended that countries identify at an early stage areas where internet coverage or access is inadequate. In the absence of the internet, the Bow Valley assessment tools allow for the use of 3G networks. The tools also allow for off-line data collection with the use of large cache memory, in which case the data upload takes place after the interviews.
In addition, central locations can be identified where internet access is available and where respondents could be interviewed.

Recommendation 3 - Sharing of equipment to conduct the survey across countries: It is recommended that equipment be shared across countries to make it feasible for all countries to participate in the survey. This approach is possible since countries do not have to execute the survey at the same time, and it will result in a considerable reduction in the overall survey cost per country. Countries can therefore contribute to the purchase of the equipment, mainly laptops and other hand-held devices.

Recommendation 4 - Respondents with limited or no computer technology knowledge: The recommendation is for interviewers to input responses on the device as directed by the respondents, since the assessment tools allow for this. Additionally, tutorial sessions on the use of the devices to respond to the questions should be made available to respondents prior to the test.

Recommendation 5 - Adequate human resources: Since the majority of the countries currently lack the operational and technical capacity to conduct a literacy survey in general, and the Full Bow Valley Web-Based assessment in particular, it is recommended that countries and the Region consider this limitation when preparing the budget for their assessment. High-level technical experts should be engaged to provide training and to bridge the gap.

Recommendation 6 - Method of selection, age and number of respondents per household: It is recommended that one adult, 15 years old and over, be selected per household using the Kish selection method.

Recommendation 7 - Relevance of Assessment Instruments: There must be country involvement in the development or refinement of test items, questionnaires and corresponding documents, including manuals for training and interviewing and tutorials for respondents, to ensure suitability to the respective countries.
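The Kish method referred to in Recommendation 6 can be sketched as follows. This is illustrative Python only: the roster ordering follows Kish’s convention of listing eligible adults in a fixed order (e.g. males oldest to youngest, then females oldest to youngest), but the selection table shown is a made-up example, not one of Kish’s published tables.

```python
def kish_select(roster, selection_table):
    """Select one adult (15+) per household, Kish-style.

    roster: eligible household members, pre-sorted in a fixed order
            (e.g. males oldest to youngest, then females oldest to youngest).
    selection_table: pre-assigned to this household before fieldwork; maps
            the number of eligible adults (capped at 6) to the 1-based
            roster line to be interviewed.
    """
    n = min(len(roster), 6)
    return roster[selection_table[n] - 1]

# Hypothetical selection table, for illustration only.
TABLE_B = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4, 6: 5}

# A household with three eligible adults: the table directs the interviewer
# to line 2 of the roster.
chosen = kish_select(["M 52", "M 30", "F 48"], TABLE_B)
```

Because the tables are assigned to households in fixed proportions before fieldwork, every eligible adult gets a known, non-zero selection probability, which is what makes the within-household selection a probability sample rather than an interviewer’s choice.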
Recommendation 8 - Generation of synthetic estimates for a broader range of characteristics: It is recommended that synthetic estimates be generated by applying the national survey estimates obtained (using a sub-sample of 1,000 cases) to the data of the population and housing census. This approach is a useful method to obtain estimates for a range of characteristics to satisfy policy requirements, and utilizes applied statistical methods which are explained in Annex D.

6.3. Adjustments Required

The recommended option requires adaptations to make it country-specific. The assessment comprises the following instruments/documents:

(a) Screening component
(b) Background Questionnaire
(c) Core task test items
(d) Main task test items
(e) Exit component

The adaptations required to these documents are as follows:

6.3.1. Adaptations to the background questionnaire

The questionnaire used in Saint Lucia’s pilot is, to some extent, region-specific and can be used in the web-based assessment. The Saint Lucia questionnaire can therefore be reviewed in the context of Recommendation 7 above.

The questions in the background questionnaire that were used in Saint Lucia need to be reviewed to ensure that the coding structure captures key variables adequately. More specifically, the questions relating to educational qualifications, industry and occupation, and adult education and training need to be reviewed by each country and adjusted as appropriate. The background questionnaire is used in analysis but also serves as a means to adjust for non-response and to improve the reliability of the proficiency estimates, so all changes need to be reviewed and approved by the regional study manager.

6.3.2. Review of the test items

The Saint Lucian test items can work in the Caribbean context with limited adaptations. Countries would be required to review the items to ensure that they have the appropriate level of face validity.
The Bow Valley tools rely on an item pool that includes all of the items in the Saint Lucia assessment booklets plus a much larger number of items that have been shown to work in English and French populations. Countries should review the items in the Bow Valley pool to ensure that they have the appropriate level of face validity. The adaptive design of the Bow Valley assessment affords much greater coverage of the assessed constructs, a feature that yields more reliable proficiency estimates.

6.3.3. The need for off-line completion

Several countries indicated that there are areas of the country where access to high-speed internet or 3G networks is not available. Bow Valley has indicated that a version of the test software is available that provides for standalone completion and subsequent uploading. This adaptation would need to be provided where required, and has been incorporated in Recommendation 2.

CHAPTER 7: RESULTS OF THE TWO REGIONAL TRAINING WORKSHOPS CONDUCTED UNDER PHASE 1

Two (2) regional technical training workshops were conducted under this Phase, targeting statisticians from the National Statistical Offices and officers from the Ministries of Education of the Member States and Associate Members. These workshops served to introduce participants to the various methodologies that exist for reliably measuring literacy, as well as to inform them of the option proposed by the AGS.

7.1 The First Regional Training Workshop

The first workshop was held in Trinidad and Tobago from 30 November to 2 December 2011 (see Annex E for the workshop reports). Its main objectives were to familiarise Member States with the following:

(a) the theory that is used to build and interpret literacy assessments;
(b) the practical issues pertaining to the conduct of a literacy assessment; and
(c) the potential pitfalls and the use of skills assessment data in informing policy.
The main achievement of this workshop was that participants gained a better understanding of what is involved in the planning and execution of a large-scale literacy assessment. Participants were exposed to the science that underpins large-scale literacy assessments, including the following:

(a) the pragmatics of implementing a literacy survey;
(b) costing templates for data collection under paper-based and web-based approaches;
(c) the components of a National Planning Report;
(d) the basic considerations in developing questions to measure literacy;
(e) the methods to identify measurement goals; and
(f) the considerations with regard to the building of literacy assessments.

7.2 The Second Regional Training Workshop

The second regional training workshop was held in Suriname from 21 to 22 June 2012 (see Annex F for the workshop reports). Its main objectives were as follows:

(a) to provide an overview of the various literacy options reviewed, including the AGS’ recommended option;
(b) to familiarise participants with the AGS-recommended approach;
(c) to provide an overview of the proposed Plan of Action;
(d) to inform and obtain feedback on the technical requirements; and
(e) to inform on the activities that are to commence under Phase II of the project.

The main achievements of this workshop were as follows:

(a) Participants were given a better understanding of:
(i) the various literacy measurement approaches, and specifically the AGS-proposed web-based literacy assessment; and
(ii) the issues pertaining to costing, sample size estimation, the background questionnaire and the assessment/test booklet.
(b) Participants were familiarized with the Plan of Action relative to the recommendations and actions that inform the common framework.
Participants were informed about the general findings of the individual country capacity assessment relative to the human and technical requirements needed in the conduct of literacy surveys. 125 CHAPTER 8: COMMON FRAMEWORK WITH THE PLAN OF ACTION A draft Common framework was developed and submitted to countries for comments and feedback. See Annex G for detailed document. 1. Evaluation of the options At the Ninth meeting of the CARICOM Advisory Group on Statistics (AGS) held 20-22 October 2011 in Belize, the Consultant gave a presentation on the findings of the evaluation of the assessment options. The evaluation suggested that all the options evaluated would satisfy the Region’s information needs. However, the AGS recommended the use of the full Bow Valley web-based assessment in the Region. This option allows for each country to be considered a separate domain with the respective sample size being proportionate to the size of the population of the country. This decision of the AGS was in the context of obtaining meaningful estimates at the country level relative to age, sex, educational attainment and geographic area. In other words, countries will need literacy statistics by small geographic areas as well as by variables such as age, educational attainment, and sex to facilitate the use of the data for policy and decision making at local levels. In addition, the Bow Valley option would be slightly less costly and less operationally burdensome but would be slightly more technically demanding because of its reliance on computer technology. The Bow Valley tool has the unique advantages of providing immediate results and supporting a range of other assessment purposes including the evaluation and administration of literacy programs. This assessment tools have proven particularly useful for placing students and for evaluating learning gain and program efficiency. 2. Some issues raised at CARICOM’s Advisory Group on Statistics (AGS) meetings 1. 
Sample size: The meeting was advised by the Consultant that the data from a survey of 1,000 households per country could be used to provide detailed estimates by applying the national survey estimates to census or other survey data. In this approach, the Region was to be viewed as one sampling domain and the countries as sub-domains, with a sample size of 1,000 per country. This approach (the domain/sub-domain design with 1,000 cases per country) was not accepted by the meeting, because it was pointed out that the countries will need literacy statistics by small geographic areas as well as by variables such as age, educational attainment and sex, so as to facilitate the use of the data for policy and decision making at local levels. Therefore, the sample size should depend on the policy issues to be addressed and hence on the data disaggregation needs of the respective countries. The sample size used by each country must be able to provide reliable estimates when applied to that country’s population, and should be proportionate to the size of the country’s population.

2. Response rate: It was noted that while a very high response rate is usually difficult to achieve for this particular type of survey, countries should strive to achieve a response rate of about 75-80 percent. Measures should be implemented to adjust for response rate bias.

3. Internet access: There may be areas in some countries where there is little or no internet access. The meeting was advised that in such cases there are two options: (i) a number of programmes/software could be preloaded onto the data collection device to allow off-line entries, in which case a very large cache memory and large upload capacity would be required; or (ii) countries could establish centralised internet access points, in which case large cache memory and large upload capacity would not be required.
Further, there may be areas in some countries where there is no internet access but mobile network service (such as 3G) is available. In such cases, the software could be set up to run on any standard laptop or wireless tablet.

4. Respondents’ level of knowledge of computers and their operation: It was noted that the use of a web-based assessment might prove challenging for non-computer users. The meeting was advised that in such cases there are two options: (i) a specially designed tutorial could be taken, in advance of the assessment, to acquaint non-computer users with basic mouse operations and the response types; or (ii) the interviewers could input the responses into the laptops, tablets or other similar devices at the direction of the respondent.

5. Comparative cost of data collection: It was agreed that cost estimates comparing the web-based and paper and pencil-based data collection methodologies should be prepared for each country, including the full cost of the infrastructure/equipment.

6. Number of field staff needed versus number of data collection devices required: In relation to the cost of laptops, tablets and other similar devices, and the cost of the survey, the meeting was informed that there would be no need for a large number of field staff, and hence of these devices, since the fieldwork could be spread over an extended period. It was elaborated that literacy levels in a population change at a very slow rate over time and that an extended fieldwork period would therefore not affect the results of the survey.

7. Need for technical workshops: Even though most of the data processing will be done automatically under the preferred option, the technical workshops will include details on concepts, procedures, scoring methodology, costing and psychometric training.

8.
Stakeholders’ involvement: The technical workshops will target representatives from the National Statistics Offices and the Ministries of Education, after which a national team should be formed comprising representatives of other government ministries, such as the Ministry of Labour and the Ministry of Finance.

3. Recommended option of the AGS

The AGS played an integral role in recommending the Full Bow Valley Web-Based assessment for use in the CARICOM Region. This web-based literacy assessment was developed by Bow Valley College in Calgary, Alberta, with funding from the Government of Canada’s Human Resources and Skills Development department. The assessment is based on the theory and assessment methods deployed in the IALS, ALLS, ISRS, PIAAC and LAMP. The goal was to reduce the cost, operational burden and test duration to a level that would allow for use in a wide variety of settings, including instructional programs. The tool includes a number of innovative features, as follows:

(a) An adaptive algorithm that greatly reduces test duration while reducing standard errors around the proficiency estimates.

(b) The ability to choose any combination of skills domains, and the ability to choose among four precision levels that support different uses, i.e. program triage[17], formative or summative assessment, pre- and post-assessment that supports reliable estimates of score gain, and certification for employment linked to Canada’s system of occupational skills standards.

(c) A pair of score reports that provide diagnostic information for the learner and their instructor, and a third score report that identifies the benefits that would be expected to accrue to the learner should the prescribed training be undertaken.

[17] Program triage involves the process of determining learner objectives and learning needs so that an individual learning plan can be formulated. The process of program triage is central to the implementation of efficient and effective programs.
Real-time algorithmic scoring improves scoring reliability and allows score reports to be generated in real time. 4. Main comments received from Member States Approximately ten countries provided feedback on the draft framework, as follows: a) Survey cost, availability of technical expertise and managerial capacity Countries indicated that the cost of the exercise will be a major concern and that the absence of technical expertise for the conduct of the survey will pose a problem. This includes expertise required in the areas of survey sampling, data editing, scoring and weighting of the results, variance estimation and statistical quality control of the operations. It was also indicated that the respective Ministries of Education may not have the necessary managerial capacity to undertake the Literacy Survey and that the respective statistical agencies may lack the pedagogical skills to work on the instruments. Therefore, collaboration between the two agencies would be required and should be possible. b) Suitability of web-based approach It was observed by one country that the paper-and-pencil-based environment is more familiar than the web-based one and that a significant culture shift would be required in the case of the latter. Ensuring the suitability of the web-based approach at the country level should be taken seriously: the actual devices should be tested, and there is also concern about the location and security of the data set. It was also stated that the paper-and-pencil-based approach carries credibility and ownership, since the scoring is done by persons in the country, such as teachers, trained to score the completed test booklets. With the active involvement of these persons in scoring, there will be more buy-in to the process. With respect to the web-based approach, it is very important that mechanisms be set up to ensure that the validity of the process is well understood.
It must be possible for the score assigned to each case by the web-based system to be seen, validated and verified. The process of doing this in a web-based system is not obvious and will need to be thoroughly tested. Another country indicated, relative to its One Laptop Per Family (OLPF) project, that it might be at an advantage if a web-based methodology is used. This country viewed favourably the proposed solutions for areas within countries that do not have internet connectivity, as well as for persons who are not computer literate. c) Oversampling of specific population groups One country stated that oversampling of the 15-24 age group (which includes the population just completing secondary school) might be necessary, since the literacy survey may have, as one of its main objectives, the assessment of the education system in providing an education relevant to the present-day realities of the job market. It was further stated that this group is of special interest since it demands jobs the most, requires new skills in some cases, is the most adaptable to retraining, and is a significant age cohort within the population. 5. Proposed data collection approach indicated by countries Of the 16 countries that responded to enquiries on the data collection approach they are likely to use for the conduct of a National Literacy Survey, seven indicated the paper-and-pencil-based approach while nine indicated the electronic approach. It should be noted, however, that not all the countries indicating electronic would necessarily opt for the Bow Valley web-based option. The theoretical underpinning of the literacy testing framework will nevertheless be the same regardless of the approach used (i.e. paper-based versus electronic/web-based).
The breakdown is presented in the following table:

Table 1: PROPOSED APPROACH TO COLLECTING DATA, BY COUNTRY

Antigua and Barbuda: Electronic
The Bahamas: Electronic
Barbados: Electronic
Belize: Paper and pencil
Dominica: Paper and pencil
Grenada: Electronic
Jamaica: Paper and pencil
Montserrat: Paper and pencil
St. Kitts and Nevis: Paper and pencil
St. Lucia: Paper and pencil
St. Vincent and the Grenadines: Electronic
Suriname: Electronic
Trinidad and Tobago: Electronic
Bermuda: Electronic
British Virgin Islands: Paper and pencil
Cayman Islands: Electronic

6. Consideration of the AGS' recommendation relative to the use of the Bow Valley web-based approach in the Region The review of the various literacy assessment approaches conducted by the Consultant suggests that all of the options would satisfy the Region's information needs and could provide similarly reliable estimates. Further, all the options have common methodological underpinnings and may differ only in the data collection method employed. Some countries may opt for paper-and-pencil data collection while others may opt for some form of electronic/web-based data collection. One can therefore conclude that the recommendation of the AGS is primarily one on the approach to data collection, based on the criteria of the assessment, and does not turn on theoretical differences. 7. Recommendations 1. Sample Size In general, the size of the sample should depend on the country-specific policy requirements, subject to cost and the budget available to conduct the assessment in the respective countries. In order for point estimates to be reliable according to selected characteristics, the Consultant has indicated that a minimum of 600 cases is required per category for each characteristic.
Considering that educational attainment explains almost 70 percent of a person's literacy level, and in an effort to minimize the cost and maximize the reliability of the survey, the recommended minimum characteristics for small countries, or those that have budget constraints, are as follows: Sex: Male (600 cases), Female (600 cases); Educational attainment: Primary (600 cases), Secondary (600 cases), Tertiary (600 cases). This gives a total of two characteristics and five categories for making reliable point estimates, resulting in a minimum sample size of 3,000 cases (5*600). For larger countries, or those with less severe budget constraints, the recommended minimum characteristics are as follows: Sex: Male (600 cases), Female (600 cases); Educational attainment: Primary (600 cases), Secondary (600 cases), Tertiary (600 cases); Age groups: 15-44 (600 cases), 45+ (600 cases). This provides for three characteristics with a total of seven categories for reliable point estimates, resulting in a sample size of 4,200 cases (7*600). Therefore, the recommended minimum sample size is 3,000 cases for small countries and 4,200 cases for larger countries. Allowances can be made for non-response, which will bring the recommended sample sizes to 3,600 and 4,800 respectively. Countries are free to increase the number of characteristics, starting from these recommended proposals. (Note: the recommendations in this section relate to both the web-based and the paper-based data collection approaches, but may include recommendations specific to electronic data collection.) However, the figure of 600 cases is based on a desired level of precision, or margin of error, and level of confidence. If the tolerable level of confidence is, say, 90 percent and the margin of error is, say, 5 percent, then the number of cases required may be less than 600.
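The relationship between the margin of error, the confidence level and the number of cases required can be sketched with the standard sample-size formula for a proportion, n = z²·p(1−p)/e². The snippet below is illustrative only: it assumes simple random sampling with the conservative p = 0.5, and ignores the design effects and finite population corrections that an actual survey design would need to incorporate.

```python
from math import ceil
from statistics import NormalDist

def cases_per_category(margin_of_error: float, confidence: float, p: float = 0.5) -> int:
    """Minimum cases needed to estimate a proportion p to within the given
    margin of error at the given confidence level, under simple random sampling."""
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / margin_of_error**2)

# At 90% confidence with a 5% margin of error, fewer than 600 cases suffice:
print(cases_per_category(0.05, 0.90))  # 271
# A figure of about 600 corresponds roughly to 95% confidence and a 4% margin:
print(cases_per_category(0.04, 0.95))  # 601
```

Under these simplified assumptions, a 5 percent margin of error at 90 percent confidence requires roughly 271 cases per category, consistent with the observation above that fewer than 600 cases may then be needed.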
Countries can use the margin of error and level of confidence that they would normally use in their household surveys to derive reliable estimates at the sub-national level and for sub-groups of the population. This issue will be discussed further in the guidelines for sample design to be prepared under Phase 2. 2. Adequate communication infrastructure If the preferred data collection method is web-based, it is recommended that countries identify, at an early stage, areas where internet coverage or access is inadequate. In the absence of the internet, the Bow Valley assessment tools allow for the use of the 3G network or related networks. The assessment tools also allow for offline data collection using a large cache memory, in which case a delayed data download is employed after the interviews. In addition, central locations where internet access is available, and where respondents could be interviewed, can be identified. 3. Sharing of equipment to conduct the survey across countries It is recommended that equipment be shared across countries to make it feasible for all countries to participate in the survey. This approach is possible since countries do not have to execute the survey at the same time, and it will result in a considerable reduction in the overall survey cost per country. Countries can therefore contribute to the purchase of the equipment, mainly laptops, tablets and other similar devices. This approach would address concerns raised by some countries about the cost of acquiring equipment. 4. Respondents with limited or no computer technology knowledge The recommendation is for interviewers to input responses on the device as directed by the respondents, since the assessment tools allow for this. Additionally, tutorial sessions on the use of the devices to respond to the questions should be made available to respondents prior to the test. 5.
Adequate human resources Since the majority of the countries currently lack the operational and technical capacity to conduct a literacy survey in general, and the Full Bow Valley Web-Based assessment in particular, it is recommended that countries and the Region consider this limitation when preparing the budget for their assessment. High-level technical experts should be engaged to provide the training and to bridge the gap. 6. Method of selection, age and number of respondents per household It is recommended that one adult aged 15 years or over be selected per household using the Kish selection method. 7. Relevance of assessment instruments There must be country involvement in the development or refinement of test items, questionnaires and corresponding documents, including manuals for training and interviewing and tutorials for respondents, to ensure suitability to the respective countries. 8. Generation of synthetic estimates It is recommended that synthetic estimates be generated by applying the national survey estimates obtained (using a sub-sample of 1,000 cases of the determined country sample) to the data of the Population and Housing Census. This approach is a useful method of obtaining estimates for a broader range of characteristics to satisfy policy requirements, and it utilizes applied statistical methods. It does not imply that countries will be adopting the Consultant's recommendation of treating the Region as a domain and the countries as sub-domains with a sample size of 1,000. The sample size will be selected in accordance with Recommendation 1, and a sub-sample of this sample can still be applied to the Census data to produce synthetic estimates in addition to those obtained from the survey. 9. Pretesting/piloting must be done in each participating country This is necessary to ensure that all the tools are applicable to the respective countries.
However, the sample (100-500 cases) used in the pilot should be selected from the main survey sample so that the data collected during the pilot exercise can be utilized should there be no need for any major modification to the tools. 10. Translation of the common framework into Haitian French and Surinamese Dutch The translation of the framework, including the test items, should be done by linguists who are familiar with the framework. The translation of the test items (for example, those included in the filter booklet, location booklet and main booklet) should be done in such a way that the psychometric performance of the items remains unaltered. This is necessary to ensure that the test items remain identical in psychometric terms, so as to ensure comparability among countries. 11. Duration of training of field staff The length of training of field staff will depend on the quality of the field staff and will vary by country.

PLAN OF ACTION: COMMON FRAMEWORK FOR A LITERACY SURVEY IN CARICOM

A: Preparatory Phase for Survey Readiness
1. Establish/continue communication with the MOE, MOF and other relevant stakeholders in preparation for the conduct of a Literacy Survey (at least 24 months before start of Survey).
2. Identify/determine the approach (paper-and-pencil-based; electronic, e.g. web-based; any other approach or combination of approaches) to be used to conduct the Literacy Survey (at least 24 months before start of Survey).
3. Prepare a survey proposal, inclusive of a budget, for the identification of funds to conduct the activity; this will include preliminary estimates of some activities under Section B, such as the sample size (at least 24 months before start of Survey).
Responsibility, associated recommendations and remarks for activities 1-3 above: for each, the responsibility lies with the Survey executing agency and there is no associated recommendation. Remarks: for activity 2, an initial review of questionnaires and manuals re: the output of the regional project, and discussion with the proprietor of Bow Valley; for activity 3, the budget should be based on an estimated sample size, the method of data collection (paper-based versus web-based or any other electronic approach), the infrastructural/human resources required and other relevant information, using the generic costing template prepared under the CARICOM Project, and draft questionnaires and other survey instruments relevant to printing/procurement costs should be part of the initial proposal.
4. Schedule a time frame for the conduct of the Survey (at least 24 months before start of Survey). Responsibility: Survey executing agency. Associated recommendation: availability and sharing of equipment to conduct the survey across countries. Remarks: some information, through the preparation of the National Implementation Plan (National Planning Report, NPR), would already have been available out of the CARICOM Project to help countries; this information should be shared with the Secretariat as early as possible, to facilitate the development of a timetable enabling the possible sharing of survey equipment among countries.
5. Develop publicity and communication materials (at least 6 months before start of Survey). Responsibility: Survey executing agency. Remarks: some work should have been put in place under the CARICOM Project.

B: Commencement of substantive work relative to the Conduct of the Survey
1.
Establish a National Literacy Survey Committee (NLSC) (at least 18 months before start of Survey). Responsibility: Survey executing agency. Remarks: the NLSC should include representatives from the NSO, MOE and Ministries of Labour; the Committee should meet, as necessary, to discuss all survey-related activities.
2. Identify a Focal Point (at least 24 months before start of Survey). Responsibility: Survey executing agency. Remarks: this person should ideally be a professional within the executing agency.
3. Determine the required characteristics and categories (such as age, sex and area) to inform country-specific policies; the number of categories has implications for the sample size, which in turn will determine the degree of reliability of the point estimates (at least 18 months before start of Survey). Responsibility: the NLSC. Associated recommendation: reliable sample size. Remarks: this activity should be one of the initial considerations of the NLSC.
4. Determine the sample size based on the characteristics and categories identified in (3) above (at least 18 months before start of Survey). Responsibility: the NLSC. Associated recommendation: reliable sample size. Remarks: the sample design guidelines prepared under the CARICOM Project should be used; most countries should have an up-to-date sample frame from the 2010 Round of Population and Housing Census; a consultant should be hired to support the process.
5a. Review and adapt survey questionnaires, test booklets and reading components measures as per the guidelines (at least 18 months before start of Survey). Responsibility: the NLSC. Associated recommendation: relevance of assessment instruments. Remarks: work commenced at the project proposal phase; some work should have been put in place under the CARICOM Project and also at the project proposal phase; a consultant should be hired to support the process.
5b. Engage data processing personnel in the review of the questionnaire (at least 18 months before start of Survey).
6a.
Review and finalise all survey documents, including training manuals, instruction manuals, the debriefing questionnaire, guidelines, control forms and scoring sheets (at least 24 months before start of Survey). Responsibility: the NLSC. Associated recommendation: relevance of assessment instruments. Remarks: training materials prepared under the CARICOM Project should be used as a base.
6b. Engage data processing personnel in the review of the questionnaire.
7. Identify training needs relative to the conduct of the Survey (at least 18 months before start of Survey). Responsibility: Survey executing agency. Associated recommendation: adequate human resources.
8. Relative to the proposal already prepared in Section A(3) above, review the availability of equipment and infrastructure for the conduct of the Survey (at least 18 months before start of Survey). Responsibility: Survey executing agency. Associated recommendations: (1) availability and sharing of equipment to conduct the survey across countries; (2) adequate communication infrastructure. Remarks: applicable to both the electronic and the paper-and-pencil approaches.
9. Adapt software to allow for delayed uploading of data, if necessary, in cases where internet and/or other related connections such as 3G are not available (at least 18 months before start of Survey). Responsibility: Survey executing agency. Associated recommendation: adequate communication infrastructure. Remarks: generic software prepared under the CARICOM Project should be used.
10. Adapt tutorials on the use of the devices to respond to the Survey (at least 18 months before start of Survey). Responsibility: Survey executing agency. Associated recommendation: adequate communication infrastructure.
11. Adapt the tabulation plan (at least 18 months before start of Survey). Responsibility: Survey executing agency.
12. Apply an extensive regime of proactive vetting of materials and plans to detect and prevent errors.
Timing for activity 12: at least 12 months before start of Survey. Responsibility: Survey executing agency. Associated recommendation: adequate human resources.
13. Based on all of the above, review and finalise the survey design (at least 12 months before start of Survey). Responsibility: Survey executing agency. Remarks: the finalised National Implementation Plan (National Planning Report, NPR).
Further remarks for activities 9-12 above: for electronic data capture, including the web-based approach; a consultant should be hired to support the process; generic tutorial documents prepared under the CARICOM Project should be used; a tabulation plan prepared under the CARICOM Project should be used.

C: Acquisition of Resources for the Conduct of the Survey
1. Recruit a Project/Survey Coordinator and other project staff (at least 12 months before start of Survey; for the Project/Survey Coordinator, recruitment should be done at least 24 months before the start of Survey due to the complex nature of the Survey). Responsibility: Survey executing agency. Associated recommendation: adequate human resources. Remarks: the Project/Survey Coordinator should ideally be a professional within the executing agency, or a National Consultant able to dedicate her/himself full-time for the duration of the survey.
2. Acquire all survey materials (at least 6 months before start of Survey). Responsibility: Survey executing agency. Remarks: for other electronic data collection approaches, procurement of data collection and data processing devices will be required. For the paper-and-pencil-based approach: (i) print questionnaires, test booklets, manuals, survey forms, etc.; (ii) procure timers, tape recorders, batteries, pencils, erasers, sharpeners, and data capture and data processing equipment, etc. For the web-based approach: (i) enter into an agreement
with the web-based approach proprietor, or another proprietor depending on the approach to be used; (ii) procure data collection devices specific to the web-based approach.
3. Identify and recruit high-level technical support personnel for the conduct of the Survey (at least 2 months before the specific activity is expected to commence). Responsibility: Survey executing agency. Associated recommendation: adequate human resources.

D: Data Collection, Data Processing and Related Activities
(i) Undertake Pilot Survey
1. Launch the Survey publicity programme (at least 6 months before start of Survey). Responsibility: Survey executing agency. Remarks: the media should be involved from this stage to provide continued coverage until the conclusion of the survey fieldwork; publicity and communication should continue throughout the survey; generic publicity materials prepared under the CARICOM Project should be used.
2. Train survey managers and relevant personnel for the pilot survey in the execution of key implementation steps, e.g. sampling; survey planning; data collection; data processing, including scoring and weighting; and data analysis (at least 1 to 6 months before start of survey). Responsibility: Survey executing agency. Associated recommendation: adequate human resources.
3. Commence the process of identifying and recruiting field staff for the conduct of the pilot (at least 9 months before start of Survey). Responsibility: Survey executing agency. Associated recommendation: adequate human resources.
4. Select the sample for the pilot Survey (at least 9 months before start of Survey). Responsibility: Survey executing agency. Associated recommendation: pretesting/piloting must be done.
5. Train pilot Survey staff using relevant training materials (at least 9 months before start of Survey). Responsibility: Survey executing agency. Associated recommendation: pretesting/piloting must be done.
6. Conduct pilot fieldwork (at least 9 months before start of Survey). Responsibility: Survey executing agency. Associated recommendation: pretesting/piloting must be done.
Remarks (continuing the pilot activities above): the specially designed guidelines prepared under the CARICOM Project should guide this process; a consultant should be hired to support the process; contracts will be of a short-term duration.
7. Process pilot data (at least 6 months before start of Survey). Responsibility: Survey executing agency. Associated recommendation: pretesting/piloting must be done. For the paper-and-pencil-based approach: (i) score test booklets and reading components; (ii) code open-ended fields such as industry, occupation and other-specify; (iii) edit household and background questionnaires; (iv) capture household questionnaires, background questionnaires and scores; (v) merge background questionnaire, scores and codes; (vi) scale the assessment results, link them to the international proficiency scales and compute error estimates; (vii) weight the data file and compute replicate weights. For the web-based approach: (i) code open-ended fields, e.g. industry, occupation, field of study and other-specifies, and merge onto the analysis file; (ii) weight the data file and compute replicate weights.
8. Evaluate the pilot and apply an extensive regime of retroactive review of operational results to detect and correct errors (at least 6 months before start of Survey). Responsibility: Survey executing agency. Associated recommendation: pretesting/piloting must be done. Remarks: a consultant should be hired to support the process; this would include the preparation of a tabulation plan and the generation of data to be analysed.
9. Revise all survey materials based on the pilot survey.

(ii): Conduct Main Survey
1. Commence publicity activities for the main survey (at least 2 months before start of Survey). Responsibility: Survey executing agency.
2. (At least 6 months before start of Survey; responsibility: Survey executing agency; associated recommendation: reliable sample size; remarks: a consultant should be hired to support the process.)
Select the sample.
3. Identify and recruit field staff and other Survey staff for the conduct of the main survey (at least 3 months before start of Survey; recruitment should be done in time for the training sessions). Responsibility: Survey executing agency. Associated recommendation: adequate human resources.
4. Train Survey staff using relevant training materials (at least 6 weeks before start of Survey). Responsibility: Survey executing agency. Associated recommendations: adequate human resources; schedule time/duration.
5. Conduct fieldwork. Responsibility: Survey executing agency.
6. Process data (for the paper-and-pencil-based approach, this should commence at least 1 week after the start of the Survey; for the web-based approach, it would commence at the start of the Survey). Responsibility: Survey executing agency. For the paper-and-pencil-based approach: (i) score test booklets and reading components; (ii) code open-ended fields such as industry, occupation and other-specify; (iii) edit household and background questionnaires; (iv) capture household questionnaires, background questionnaires and scores; (v) merge background questionnaire, scores and codes; (vi) scale the assessment results, link them to the international proficiency scales and compute error estimates; (vii) weight the data file and compute replicate weights. For the web-based approach: (i) code open-ended fields, e.g. industry, occupation, field of study and other-specifies, and merge onto the analysis file; (ii) weight the data file and compute replicate weights.
7. Generate synthetic estimates (to be done if required by the country). Associated recommendation: generation of synthetic estimates for a broader range of characteristics. Remarks: an additional cost will be attached to this activity; a consultant should be hired to support the process.

E: Data Analysis and Dissemination
1.
Prepare reports (preliminary and final) and disseminate widely (approximately 6-9 months after the conclusion of the Survey). Responsibility: Survey executing agency. Remarks: a consultant should be hired to support the process; a tabulation plan should form part of the process in the pilot survey, which would allow for some data analysis.

CHAPTER 9: SUMMARY AND CONCLUSION
Literacy and numeracy skills have been shown to be the most important determinant of rates of social and economic progress over the long term and of national competitiveness, and to be one of the principal determinants of social inequality in valued outcomes, including income, employment, health and social engagement. Consistent, reliable and comparable statistical information is an important ingredient for the planning, monitoring and evaluation of policy decisions, and the lack thereof has long been considered to hamper the effectiveness of public policy in the Caribbean Region. Consequently, noting the lack and/or poor quality of literacy statistics in the Region, and the perceived importance of this information for the Region's economic and social development in light of the advent of the Caribbean Single Market and Economy (CSME), Statisticians of the Caribbean Community (CARICOM) decided to pursue the development of a common strategy and methodology for the collection and analysis of social and gender statistics, including educational statistics. Aware of these challenges and of the shortcomings of the existing approaches to measuring literacy, the CARICOM Advisory Group on Statistics (AGS) is attempting to develop a common framework for the production, collection and analysis of literacy data. Following consultation with CARICOM and its Member States, the following seven detailed options were identified and evaluated: 1.
The Organisation for Economic Cooperation and Development's (OECD's) Program for the International Assessment of Adult Competencies (PIAAC) paper-and-pencil reading, numeracy and reading components assessments: Full Assessment; 2. The Organisation for Economic Cooperation and Development's (OECD's) Program for the International Assessment of Adult Competencies (PIAAC) paper-and-pencil reading, numeracy and reading components assessments: Common Assessment; 3. The United Nations Educational, Scientific and Cultural Organization (UNESCO) Institute for Statistics' (UIS') Literacy Assessment and Monitoring Program (LAMP) paper-and-pencil assessments of prose literacy, document literacy, numeracy and reading components on a sample of 3,000 adults per country; 4. Saint Lucia's paper-and-pencil prose literacy, document literacy, numeracy and reading components assessments on a sample of 3,500 adults per country: Full Assessment; 5. Saint Lucia's paper-and-pencil prose literacy, document literacy, numeracy and reading components assessments on a sample of 1,000 adults per country: Common Assessment; 6. Bow Valley's web-based prose literacy, document literacy, numeracy and reading components assessments on a sample of 3,500 adults per country: Full Assessment; 7. Bow Valley's web-based prose literacy, document literacy, numeracy and reading components assessments on a sample of 1,000 adults per country: Common Assessment. Relative to the TOR, an assessment of the International Survey of Reading Skills (ISRS) was required in this review. It should be noted, however, that the ISRS methodology underpins all the other options reviewed; essentially, the Saint Lucia assessment applied the ISRS measures and methods with minor adaptation. The assessment provided an analysis of the origins of large-scale literacy measurement, including the International Survey of Reading Skills (ISRS), the Adult Literacy and Life Skills Survey (ALLS) and the Young Adults Literacy Survey (YALS).
Each option was evaluated in terms of information yield, cost, operational burden, technical burden and risk. Country Needs and Constraints The evaluation reflects the needs and constraints facing the countries of the Region: the countries have a pressing need for objective comparative data on the level and social distribution of economically and socially important skills for policy purposes. In statistical terms, the countries need comparative data on average skills levels, the distribution of skills by proficiency level for key sub-populations (including youth, employed workers and the workforce as a whole), the relationship of skills to outcomes, and the relationship of skills to determinants. With the exception of Saint Lucia and Bermuda, countries have limited or no experience in the conduct of household-based skills assessments. Access to financial resources to support an assessment is limited, as are operational and technical expertise related to household-based skills assessment. Key Issues Arising Out of the Evaluation The evaluation suggests that all of the options reviewed would satisfy the Region's information needs. Input from the countries suggests that they have very limited operational and technical capability, to the point that a paper-and-pencil-based national skills assessment would overwhelm them. The evaluation recommends against participation in either of the two PIAAC options: PIAAC in any guise is too costly and too technically and operationally demanding given the constraints facing countries in the Region. The evaluation also recommends against participation in UIS's LAMP program; while it is less costly and less operationally burdensome, it is almost as technically demanding as PIAAC. The evaluation also recommends against fielding a full-scale study using the Saint Lucian instruments: as with PIAAC and LAMP, the financial, technical and operational burden of this option would overwhelm many of the statistics offices.
The evaluation recommends using either the Saint Lucian or the Bow Valley instruments in a common regional assessment. Both options would impose a manageable financial, technical and operational burden on national statistics offices. Both Canada and the USA have implemented the Saint Lucian design at a national scale equivalent to what has been proposed for CARICOM Member States, and the Saint Lucian pilot survey was of a sufficient scale to suggest that implementation on the proposed scale is manageable. The Saint Lucian implementation team reports that implementation placed a heavy burden on interviewers, both literally and figuratively: the weight of the test booklets and component measures was taxing.

The Bow Valley option would be slightly less costly and less operationally burdensome, but slightly more technically demanding because of its reliance on computer technology. The Bow Valley tools have been validated on a national scale involving some 2,000 test takers. The Bow Valley tool has the unique advantages of providing immediate results and supporting a range of other assessment purposes, including the evaluation and administration of literacy programs. The assessment tools have proven particularly useful for placing students and for evaluating learning gain and program efficiency.

Issues Raised by CARICOM's Advisory Group on Statistics (AGS) at the Eighth, Ninth, Tenth and Eleventh AGS Meetings (the four meetings at which the advancement of the Literacy Project was discussed)

Discussion and decisions at the above-mentioned meetings focused on several related issues, including:

1. The use of a sample size of 1,000 households per country, treating the CARICOM Region as one domain
2. Sample size
3. The number of adults to be targeted per household using the Bow Valley web-based assessment
4. The acceptable response rate for literacy assessments
5. Internet access
6. The inability of respondents to use the data collection device to respond to the survey
7. The cost of the survey
8. The license fee for accessing the computer-/web-based literacy assessment test
9. Concerns about the preferred option versus the other options relative to the science of measuring literacy
10. Technical capacity building at the national level
11. Involvement/input from other stakeholders at the national level
12. Major risks

Conclusion on Findings of the Country Capacity Assessment

Generally, these results reveal considerable heterogeneity among countries in the key uses to be served by the assessment. The fact that countries indicated a need for data to help with program administration provides support for the preferred assessment option. Collectively, the results of the capacity survey suggest that the administration of any of the full-scale assessment options would be beyond the capacity and capability of most of the countries. Most countries would need to greatly enhance their collection and processing capacity, and most would need assistance and support to complete the technical aspects of implementation, including sample selection, editing, scoring, weighting, variance estimation and analysis.

The assessment reveals that the overwhelming majority of countries currently lack the operational and technical capacity to safely field the recommended national assessment option (the Full Bow Valley Web-Based Assessment). Such a finding does not preclude implementation; rather, the weakness in the Region's technical and operational infrastructure implies a need for higher expenditures on:

1. The recruitment and basic training of interviewers
2. The recruitment and training of coders, programmers and data analysts
3. The provision of higher levels of technical support for sample selection, weighting and variance estimation
4. The training of national teams in the execution of key implementation steps
5. The implementation of an extensive regime of proactive vetting of materials and plans to detect and prevent errors
6. The implementation of an extensive regime of retroactive review of operational results to detect and correct errors
7. The specification and execution of a broad range of information products and services that serve to reduce the analysis burden on individual countries

The actual size of the expenditures that will be required, the nature of the technical support required and the implied quality assurance regime can only be established once countries have completed their national planning reports and agreed to a common implementation schedule. As noted earlier in this report, the amounts will depend not only on how many countries decide to participate, but on which countries decide to participate.

The assessment also reveals that high-speed internet coverage is limited in most countries. However, there are recommendations that would allow for data collection where internet access is lacking or limited, e.g. using data collection devices with large cache memory and uploading the responses later, using central points with internet access that respondents could visit to respond to the survey, and using 3G networks where possible.
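The offline collection strategies suggested above (devices that cache responses locally and upload them later) amount to a store-and-forward pattern. The sketch below is a minimal illustration in Python; the `ResponseQueue` class and the `upload` callback are hypothetical names introduced for illustration, not part of any actual Bow Valley software.

```python
import json
import sqlite3


class ResponseQueue:
    """Minimal store-and-forward queue: persist completed assessment
    responses on the collection device, then upload them once
    connectivity returns. Illustrative sketch; names are hypothetical."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending "
            "(id INTEGER PRIMARY KEY, payload TEXT, uploaded INTEGER DEFAULT 0)"
        )

    def record(self, response):
        # Called after each interview, with or without connectivity.
        self.db.execute("INSERT INTO pending (payload) VALUES (?)",
                        (json.dumps(response),))
        self.db.commit()

    def flush(self, upload):
        # `upload` posts one response to the server and returns True on
        # success; a row is marked uploaded only after acknowledgement,
        # so a failed or interrupted upload is simply retried later.
        rows = self.db.execute(
            "SELECT id, payload FROM pending WHERE uploaded = 0").fetchall()
        sent = 0
        for row_id, payload in rows:
            if upload(json.loads(payload)):
                self.db.execute(
                    "UPDATE pending SET uploaded = 1 WHERE id = ?", (row_id,))
                sent += 1
        self.db.commit()
        return sent
```

In practice the same pattern covers both recommendations: an interviewer's device calls `record` after every interview and `flush` opportunistically whenever a 3G or fixed connection is available, including at a central access point.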
9.1 Activities Completed Under Phase I

The table below shows the activities completed under Phase I, by completion date and method of verification:

Activity | Completion Date | Method of Verification
Conduct of Briefing Meeting | May 2011 | Inception Report (1)
Preparation of an Inception Report | Aug 2011 | Inception Report (1)
Conduct of CARICOM First Technical Workshop on a Common Framework for a Literacy Survey | Dec 2011 | First Workshop Report (2)
Conduct of CARICOM Second Technical Workshop on a Common Framework for a Literacy Survey | Jun 2012 | Second Workshop Report (3)
Review of Literacy Assessment Options: ISRS, LAMP, PIAAC, Bow Valley Web-Based option | Oct 2012 | Phase I Draft Report, Chapter 2 (4)
Review of country experiences relative to the ISRS, LAMP, PIAAC and Bow Valley Web-Based options | Oct 2012 | Phase I Draft Report, Chapter 2 (4)
Review of the work undertaken in the conduct of the Adult Literacy and Life Skills Survey in Bermuda, the work undertaken in Saint Lucia and the work proposed in Dominica in the area of Literacy Assessment | Oct 2012 | Phase I Draft Report, Chapter 3 (4)
Assessment of the survey capacity of countries | Oct 2012 | Phase I Draft Report, Chapter 5 (4)
Preparation of Draft Report on Phase I activities, which includes all above-mentioned activities | Oct 2012 | Phase I Draft Report (4)
Preparation of Plan of Action | Oct 2012 | Common Framework for a Literacy Survey (including the Plan of Action) (5)
Preparation of documentation on the Common Framework for a Literacy Survey | Apr 2013 | Common Framework for a Literacy Survey (including the Plan of Action) (5)

(1) See Annex F. (2) See Annex EI. (3) See Annex EII. (4) Same as in this Report (Final Report). (5) See Annex G.

LIST OF REFERENCES

DataAngel Policy Research, The Adaptation of DataAngel's Literacy and Numeracy Assessment to the Needs of Small Island States, 2008
DataAngel Policy Research, A National Literacy and Numeracy Assessment for Saint Lucia: A National Planning Report, 2008
Kirsch, I.S., and Mosenthal, P.B.,
Interpreting the IEA Reading Literacy Scales, in M. Binkley, K. Rust and M. Winglee (eds.), Methodological Issues in Comparative Educational Studies: The Case of the IEA Reading Literacy Study, Washington, DC: National Center for Education Statistics, United States Department of Education, 1994
OECD and HRSDC, Literacy Skills for the Knowledge Society: Further Results of the International Adult Literacy Survey, 1997
OECD and Statistics Canada, Learning a Living: First Results of the Adult Literacy and Life Skills Survey, 2005
OECD and Statistics Canada, Literacy for Life: Further Results of the Adult Literacy and Life Skills Survey, 2011
Statistics Canada and OECD, Literacy, Economy and Society: Results of the First International Adult Literacy Survey, 1995
Statistics Canada and OECD, Literacy in the Information Age: Final Report of the International Adult Literacy Survey, 2000
The UNESCO Institute for Education, A Cross-National Symposium on Functional Literacy, Hamburg, 1990

ANNEX A: TERMS OF REFERENCE

1. BACKGROUND

Consistent, reliable and comparable statistical information is an important ingredient for the planning, monitoring and evaluation of policy decisions, and the lack thereof has long been considered to hamper the effectiveness of public policy in the Caribbean Region. Consequently, noting the lack and/or poor quality of literacy statistics in the Region, and the perceived importance of this information for the Region's economic and social development in light of the advent of the Caribbean Single Market and Economy (CSME), Statisticians of the Caribbean Community (CARICOM) decided to pursue the development of a common strategy and methodology for the collection and analysis of social and gender statistics, including educational statistics.
Aware of these challenges and the shortcomings of the existing approaches to measuring literacy, the CARICOM Advisory Group on Statistics (AGS) is attempting to develop a common framework for the production, collection and analysis of literacy data. The AGS is charged with the mandate to guide the improvement in the range and quality of statistics and statistical infrastructure in the Region; it comprises the Directors of the Statistical Offices of eight CARICOM Member States, which participate on a rotating basis, and two members of the CARICOM Secretariat. The AGS recommended the use of the methodology of the "Literacy Assessment and Monitoring Program (LAMP)" developed by UNESCO, which has been tested in a number of countries. In 2006, during the Thirty-First Meeting of the Standing Committee of Caribbean Statisticians (SCCS), the decision was taken to develop a regional strategy to assist Member States in its implementation.

The LAMP is a recent tool developed by UNESCO, with support from Statistics Canada and the Educational Testing Service (ETS), to measure functional literacy.[19] The objectives of the LAMP are "to develop a methodology for providing data on the distribution of the literacy skills of adults and young people in developing countries; to obtain high quality literacy data in participating countries and to promote its effective use in formulating national policy, in monitoring and in designing appropriate programme interventions to improve literacy levels; and to build national capacities in the measurement of literacy, to develop and use valid and reliable LAMP data and methodologies".[20]

[19] The LAMP approach is based on the International Survey of Reading Skills (ISRS) that was developed and fielded by Statistics Canada and the United States National Centre for Education Statistics.
In sum, employing the LAMP and the International Survey of Reading Skills (ISRS) approaches to measurement would enable the production of reliable and comparable data on literacy levels across all CARICOM Member States and inform decision-makers as to the interventions and requirements needed to improve literacy. This initiative is consistent with other CARICOM efforts to harmonize regional statistics, including the development of a common framework for statistics production in CARICOM, a Common Census Framework and the development of a Strategic Framework for Statistics.

To achieve the project's objectives, the initiative has the following three components, described in this consultancy as phases: (i) establishment of a regional framework for conducting and adapting literacy assessment models for the facilitation of a regional assessment for the execution of the Literacy Survey, treating each country as a sub-population; (ii) development and adaptation of LAMP instruments, such as survey instruments (questionnaires), training manuals and related materials, to inform about the survey, documentation on the concepts and definitions, on the scoring of the assessment and on the sampling approach, and data dissemination/tabulation formats, as part of the common framework; and (iii) development of a template for the national implementation plans using a common questionnaire, field test procedures for establishing the psychometric and measurement properties of the survey instrument, and confirmation of key aspects of survey cost and quality.

2. OBJECTIVES

The general objective of this consultancy is to establish a common framework involving a regional approach for conducting the literacy assessment methodology, the development of literacy assessment instruments and the provision of technical assistance for the development of national implementation plans for the conduct of literacy assessments, based on the agreed common framework, instruments and other documents.
The goal is to support informed policy making to meet the Education for All and the MDG goals, as well as supporting the integration process that is part of the Caribbean Single Market and Economy (CSME). The specific objectives can be divided into three phases as follows:

Phase I: To undertake a review of the ISRS and LAMP approaches and to apply these approaches to a regional context through the production of a relevant draft methodological framework for use in CARICOM.

Phase II: To undertake the preparation of all survey instruments, sample design, training materials etc. required to apply the literacy assessment in a regional/national context in CARICOM.

Phase III: To provide support to CARICOM member countries in the identification of requirements, timelines etc. as part of a strategic action plan to undertake at least two literacy assessment surveys.

[20] Literacy Assessment design: the design of the study would have six phases, including the (i) development and approval of a national implementation plan; (ii) development and certification of the content and design of survey documents in relevant languages; (iii) conduct of a field test to establish the psychometric and measurement properties of the survey instruments and to confirm key aspects of survey cost and quality; (iv) processing and analysis of field test results; (v) administration of the final instruments to a probability sample of the adult population (16 years and up); and (vi) processing, analysis and reporting of the main assessment results.

3. SCOPE OF WORK

The Consultant will be required to undertake the following activities to satisfy the objectives of the assignment:

Phase I Activities:

a. Engage in a Briefing Meeting with the Statistics Sub-Programme, CARICOM Secretariat, to discuss the scope of work of the project.
b. Review the ISRS and LAMP methodologies and identify any problems in application to CARICOM and adjustments that would be required, including the following:
i.
Review the ISRS and LAMP methodologies in detail;
ii. Review the experience of countries in which the ISRS and LAMP have been conducted/pilot-tested;
iii. Review specifically the work undertaken in the conduct of the Adult Literacy and Life Skills Survey in Bermuda, the work undertaken in Saint Lucia and the work proposed in Dominica in the area of Literacy Assessment;
iv. Collect the requisite information to draft recommendations for a common approach to the conduct of a literacy survey in CARICOM (actions required, common content, sampling approach, data collection, data processing, literacy scoring, weighting, coding, hardware/software requirements, publicity, training requirements etc.);
v. Consult with Member States to determine the major considerations to be taken on board in (iv);
c. Prepare, on the basis of the comprehensive review, a detailed draft report comprising the assessment of the reviews undertaken on the ISRS and the LAMP, the individual country assessments, information obtained from Member States and a detailed methodological approach with recommendations, adjustments required, actions to be taken and the actual methodology to be utilised;
d. Prepare a plan of action outlining the sequence of steps required to achieve the recommendations in (c) and the estimate of costs involved;
e. Present the findings and recommendations of the Phase I project activities to the meeting of the Advisory Group on Statistics and to workshops organised in the course of the project to discuss the findings;
f. Prepare and submit a final report on Phase I to the Secretariat incorporating all comments/outputs from the meetings, the Secretariat and other stakeholders;
g. Support the Secretariat and the AGS in the preparation of the final Common Framework for Literacy.

Phase II Activities:

a.
Prepare, on the basis of the Common Framework for Literacy Assessment in CARICOM produced in Phase I, all instruments, related documents and frameworks required, and adjustments that would be required, including the following:
i. Literacy Survey Questionnaires, including a questionnaire for pilot-testing, a screening questionnaire (to select respondent(s) in the household if required), the main questionnaire (for assessment), scoring sheets etc.;
ii. Methodology Guide containing concepts and guidelines on literacy assessment domains, data collection procedures, timelines for data collection etc.;
iii. Training Manuals for Interviewers, Supervisors and Trainers;
iv. Detailed sample design, sampling frame, target population, detailed sampling method for households and respondents etc.;
v. Guidelines for scoring and weighting results and for the treatment of non-response;
vi. Guidelines for data processing, including data capture, data editing and coding, data verification, data dictionary, data entry forms, editing rules etc.;
vii. Tabulations to be produced;
viii. Analytical guidelines and dissemination;
ix. Publicity material;
b. Consult with Member States to determine the major considerations to be taken on board in (i)-(ix);
c. Prepare, on the basis of the documents/materials produced in (a), a draft report of the Phase II activities, including adjustments required based on feedback from the AGS, Member States and the Secretariat;
d. Prepare and submit a final report of Phase II to the Secretariat incorporating all comments/outputs from the meetings, the Secretariat and other stakeholders;
e. Support the Secretariat and the AGS in the finalisation of any of the documentation for use in the creation of the Common Framework for Literacy.

Phase III Activities:

a. Review the agreed Common Framework for Literacy Assessment in CARICOM and the instruments and scope of work required in the undertaking of the literacy assessment;
b.
Design a draft template for the preparation of the National Implementation Plan, to include the following:
i. A list of all activities to be undertaken as contained in the detailed common literacy framework and documentation to be obtained as per the survey instruments, including pilot-testing and quality assurance;
ii. Cost estimates of all activities;
iii. Staffing requirements (number of interviewers, supervisors, scorers);
iv. Training requirements, specifically scientific measurement/data collection;
v. Estimated timeframe (start and end dates) and scheduling of all activities;
vi. Procurement plans for goods and consultancy services;
vii. Responsible parties;
viii. A comprehensive publicity programme;
ix. An approach to the sustainability of the process.
c. Prepare a cost estimate for the conduct of two Literacy Assessments in each CARICOM country, which should include the following considerations:
i. Survey costs: fees/stipends for interviewers, supervisors, trainers, scorers, editors, coders etc.;
ii. Corresponding travel costs;
iii. Printing/production of questionnaires and all relevant manuals and documentation;
iv. Advertisements/publicity;
v. Data processing, tabulation, analysis and dissemination;
vi. Other related costs for conducting the exercise.
d. Collect information from member countries with regard to resources available at the national level, staffing, budget, collaborating partners (Statistical Office, Ministry of Education and other relevant agencies) and other relevant information related to capacity availability/constraints that can inform the template to be prepared;
e. Provide support to countries, through workshops or otherwise, in the preparation of the implementation plans;
f. Prepare, on the basis of (a) to (e), a Draft National Implementation Plan with the major components required for implementation of the Literacy Assessment survey;
g. Prepare a draft report of the Phase III activities, including the national implementation plan template;
h.
Prepare and submit a final report of Phase III to the Secretariat incorporating all comments/outputs from the meetings, the Secretariat and other stakeholders;
i. Prepare an overall summary report of all phases of the Consultancy.

4. EXPECTED OUTPUTS

The expected outputs for each phase are as follows:

Phase 1:
a. A Draft Report of the findings of the Phase 1 activities. The report will include all assessments undertaken, including the individual country assessments, and recommendations on what actions and adjustments are required to the ISRS and LAMP methodologies;
b. A Plan of Action outlining the sequence of steps required to achieve the recommendations of the Phase 1 activities, item (e), and the approximate costs involved;
c. A Final Report of Phase 1 incorporating all comments/outputs from meetings, the Secretariat and other stakeholders and the activities as implemented under Phase I.

Phase 2:
a. All documents and material as contained in the Phase 2 activities, item (a) above;
b. A Draft Report of the Phase 2 activities describing the work put in place and adjustments required;
c. A Final Report of the Phase 2 activities incorporating all comments/outputs from meetings, the Secretariat and other stakeholders.

Phase 3:
a. National Implementation Plans/templates;
b. A Draft Report of the Phase 3 activities describing the work put in place and adjustments required;
c. A Final Report of the Phase 3 activities incorporating all comments/outputs from meetings, the Secretariat and other stakeholders;
d. A summary report of all three phases of the project.

5. TIMELINE

The overall duration of the entire activity is 140 days, comprising 25 days for Phase 1, 60 days for Phase 2 and 55 days for Phase 3. The breakdown for each phase is given below.

Phase 1: The expected time for this phase is approximately 25 person-days over a period of four months. The timetable is as follows:
a. Preparatory period (3 days);
b.
Travel to the CARICOM Secretariat in Georgetown, Guyana, to attend an initial briefing meeting, and to selected CARICOM countries (2 days);
c. Conduct of the review of the ISRS and LAMP (10 days);
d. Preparation of the draft report with recommendations (4 days);
e. Attendance at meetings/workshops as required (3 days);
f. Support to the Secretariat/AGS in the drafting of the final common framework (3 days).

Phase 2: The expected time for this phase is approximately 60 person-days over a period of six months, with at least 23 days being spent in the beneficiary countries. The timetable is as follows:
a. Preparatory period (5 days);
b. Travel to the CARICOM Secretariat in Georgetown, Guyana, to attend an initial briefing meeting, and to selected CARICOM countries (5 days);
c. Production of draft copies of all instruments and materials required to conduct the literacy assessment, including the sampling design (30 days);
d. Preparation of the draft report with recommendations (5 days);
e. Preparation of final versions of the instruments as required, based on feedback from the Secretariat and all stakeholders (10 days);
f. Preparation of a final consultancy report (5 days).

Phase 3: The expected time for this phase is approximately 55 person-days over a period of four months. The timetable is as follows:
a. Preparatory period (5 days);
b. Travel to the CARICOM Secretariat in Georgetown, Guyana, to attend an initial briefing meeting, and to selected CARICOM countries (5 days);
c. Production of draft National Implementation Plans/Templates (30 days);
d. Preparation of the draft report with recommendations as required (5 days);
e. Preparation of final National Implementation Plans/Templates (5 days);
f. Preparation of a final consultancy report (3 days);
g. Preparation of an overall summary report of all phases (2 days).

6.
CONSULTANT QUALIFICATIONS

The Consultant should possess a Master's Degree in the Social Sciences, in areas such as Sociology, Economics, Statistics or any other relevant discipline. The Consultant should have: at least 6 years of experience in statistics (educational statistics); 8+ years of experience in the planning, execution and analysis of surveys; familiarity with the LAMP and ISRS methodologies (preferred); excellent written English communication skills, with a demonstrated ability to assess complex situations in order to concisely and clearly filter critical issues and draw conclusions; and excellent facilitation skills.

ANNEX B: COUNTRY ASSESSMENT QUESTIONNAIRE

CARICOM Regional Public Good Common Literacy Framework Project
Questionnaire for Member States
CONFIDENTIAL When Completed

The following questionnaire has been developed to collect information in support of the IDB-financed CARICOM Regional Public Good Common Literacy Framework Project. The questionnaire seeks to identify Member States':
(i) needs and priorities with respect to literacy and numeracy data; and
(ii) operational, financial and technical capacity and need for support.

The questionnaire will assist Member States in thinking about their national information needs and their capacity to field a household-based skills assessment, knowledge that will inform the completion of a national planning report. The questionnaires will also help identify what type of technical assistance Member States might need.

A. Identification

Name: __________________________
Designation: _________________________
Organizational affiliation: ________________________
Address: _________________________
Telephone: ____________________________
Email: ___________________________

B. Data needs and priorities

Adult skills assessments can be designed to serve a range of purposes, including knowledge generation, policy and planning, monitoring, evaluation and program administration.
Knowledge generation involves generating new scientific insights, including understanding cause and effect. Monitoring implies the collection of repeated measures in order to track change over time. The design of any assessment must be adapted to support each of these purposes.

B1. For which of the following purposes does your country require adult literacy and numeracy data? Indicate all that apply and rank in order of importance, i.e. 1 = most important and 5 = least important.
__ Knowledge generation
__ Policy and program planning
__ Monitoring
__ Evaluation
__ Program administration

Adult skills assessments can be designed to serve a range of policy departments.

B2. Which of the following policy departments require adult literacy and numeracy data? Indicate all that apply and rank in order of importance, i.e. 1 = most important and 8 = least important.
__ Kindergarten to Grade 12 education
__ Adult education
__ Labour
__ Finance/Treasury
__ Language and culture
__ Social
__ Prime Minister's Office
__ Other (specify) ______________________________

Adult skills assessments can be used to address a wide range of policy issues.

B3. Which of the following policy issues require adult literacy and numeracy data? Indicate all that apply and rank in order of importance, i.e. 1 = most important and 15 = least important.
___ Improving the quantity of primary and secondary education
___ Improving the quality of initial education
___ Improving the equity of initial education
___ Improving the efficiency and effectiveness of initial education
___ Improving the quantity of tertiary education
___ Improving the quality of tertiary education
___ Improving the equity of tertiary education
___ Improving the efficiency and effectiveness of tertiary education
___ Improving the quantity of adult education
___ Improving the quality of adult education
___ Improving the equity of adult education
___ Improving the efficiency and effectiveness of adult education
___ Reducing social and economic inequality
___ Improving labour productivity and competitiveness
___ Improving health

B4. Has a source(s) of funding been identified to support the implementation of a national literacy and numeracy assessment?
O Yes  O No

C. Operational Capacity

The implementation of adult skills assessments places significant demands on the operational capacity of NSOs. The following questions will help evaluate whether Member States have the capacity to undertake an assessment.

Experience in executing a literacy survey

C1. How many staff members have experience with a literacy survey?
Number: I__I__I  If zero, skip to C3.

C2. In what areas do they have experience? MARK ALL THAT APPLY
o Planning
o Sampling
o Data collection
o Data entry/data capture
o Coding
o Editing
o Data analysis
o Other (specify) ________________

Collection capacity

C3. How many trained interviewers do you have on staff?
Number: I__I__I

C4. What is the total monthly collection capacity (in total number of interview hours)?
Number of hours: I__I__I

C4a. What proportion of this capacity is being utilized by the collection of the regular statistical program?
Enter percent: I__I__I

C5. How many field supervisors do you have on staff?
Number: I__I__I

C6. What is the average daily training fee paid to interviewers, field supervisors and senior interviewers?
Interviewers $_________
Field supervisors $_________
Senior interviewers $_________

Data capture capacity

C7. How many data entry clerks do you have on staff?
Number: I__I__I

C8. What is the average capacity of your data entry clerks in terms of the number of keystrokes per day?
I__I__I__I__I__I__I__I__I__I keystrokes/day

Coding capacity

C9. How many statistical coding clerks do you currently have on staff?
Number: I__I__I

Editing

C10. How many programmers do you have on staff?
Number: I__I__I

C11. How many field editors do you have on staff?
Number: I__I__I

Analysis

C12. How many staff members with statistical analysis experience do you have on staff?
Number: I__I__I

C13. Which of the following analysis tools does your staff have working experience in? MARK ALL THAT APPLY
O SAS
O SPSS
O Excel
O Other (specify) _________________

C14. What types of analysis can you support?
O Tables
O Simple regressions
O Multi-level/multivariate regressions

Technical Capacity

Sampling

C15. Do you have a sampling statistician on staff?
O Yes  O No

C16. Does your office have the capacity to select a multistage, stratified probability sample?
O Yes  O No

Weighting

C17. Does your office have the capacity to weight survey records?
O Yes  O No

Variance estimation

C18. Does your office have the capacity to calculate variance estimates based on complex survey designs?
O Yes  O No

C19. Does your office have the capacity to calculate variance estimates using replicate weights?
O Yes  O No

Graphics

C20. Does your office have the capacity to use InDesign, the software used to generate the test booklets?
O Yes  O No

Computers in collection

C21. What proportion of the country has access to high-speed internet?
I__I__I__I%

C22. How many of your interviewers are able to do simple tasks on a computer (e.g. create a Word document, buy products on a website, create an Excel spreadsheet, send email)?
Number of interviewers: I__I__I__I

C23. How many of your interviewers have experience with computer-assisted personal interviewing?
Number of interviewers: I__I__I__I

ANNEX C: COSTING FOR SAINT LUCIA'S LITERACY SURVEY PILOT

The following spreadsheet provides an overview of how the Saint Lucia assessment was costed. This template was used to derive the cost estimates for the paper and pencil options presented in this report. Actual costs will vary depending on national wage structure.
The computer-based options include hardware and software acquisition costs and licence fees but exclude internet usage fees.

Saint Lucia Cost Estimates

Core project team                              Resources                                    Cost
  Project manager                              60 days @ $125                               $7,500.00
  Sampling statistician                        20 days @ $100                               $2,000.00
  Programmer                                   20 days @ $200                               $4,000.00
In design
  Revise collection manuals                    5 days @ $250                                $1,250.00
  Revise BQ                                    5 days @ $100                                $500.00
  Prepare for pilot training                   5 days @ $100                                $500.00
  Give pilot training                          5 days @ $100                                $500.00
  Prepare for main training                    5 days @ $150                                $750.00
  Give main training                           3 days @ $100                                $300.00
                                               5 days @ $150                                $750.00
  Attend framework/adaptation training         3 days @ $100 x 3                            $900.00
  Attend task admin/scoring                    6 days @ $125 x 3                            $2,250.00
  Supervise pilot + main data collection       150 days @ $100                              $15,000.00
Total core team                                                                             $36,200.00

Pilot collection
  Interviewer training                         15 @ $200                                    $3,000.00
  Print interviewer manuals                    200 pages x $0.50/page x 50                  $5,000.00
  Batch assignments                            7 hours                                      $100.00
  Data collection                              450 cases @ $100/case                        $45,000.00
  Scoring                                      15 minutes/case                              $900.00
  Print booklets                               500 x $5                                     $2,500.00
  Calculators                                  20 @ $15                                     $300.00
  Recorders                                    20 @ $50                                     $1,000.00
  Timers                                       20 @ $5                                      $100.00
  Print flip books                             100 @ $15                                    $1,500.00
  Print score sheets                           500 @ $0.50                                  $250.00
  Re-score 100%                                15 minutes/case                              $900.00
  Coding ISIC, ISCO, ISCED                                                                  $1,500.00
  Edit                                         20 days @ $125                               $2,500.00
  Batteries                                                                                 $250.00
  Data capture: scores                                                                      $100.00
  Data capture: codes                                                                       $50.00
  Data capture: BQ                                                                          $750.00
  Print BQ                                     500 @ $10                                    $5,000.00
  Clerical support                             20 days @ $75                                $1,500.00
  Admin support                                20 days @ $75                                $1,500.00
  Print task admin guide                       20 @ $10                                     $200.00
Total pilot                                                                                 $73,900.00

Main collection
  Interviewer training                         5 days @ $200/interviewer, 45 interviewers   $9,000.00
  Print interviewer manuals                    200 pages x $0.50/page x 50                  $5,000.00
  Data collection                              4,500 cases @ $100/case                      $450,000.00
  Scoring                                      15 minutes/case, 4,500 cases                 $11,250.00
  Print booklets                               5,000 @ $5                                   $25,000.00
  Calculators                                  35 @ $15                                     $525.00
  Recorders                                    35 @ $50                                     $1,750.00
  Timers                                       35 @ $15                                     $525.00
  Print flip books                             100 @ $15                                    $1,500.00
  Print score sheets                           5,000 @ $0.50                                $2,500.00
  Re-score                                     100%                                         $11,250.00
  Coding ISIC, ISCO, ISCED                     15 minutes/case, 4,500 cases                 $5,625.00
  Batteries                                                                                 $2,500.00
  Print BQ                                     5,000 @ $10                                  $50,000.00
  Print task admin guide                       50 @ $10                                     $500.00
  Data capture: scores                         4,500 score sheets                           $1,000.00
  Data capture: codes                          4,500 coding sheets                          $500.00
  Data capture: BQ                                                                          $15,000.00
  Edit                                         20 days @ $125                               $2,500.00
  Map to international record layout           3 days @ $125                                $375.00
  International rescore                                                                     $5,000.00
  Weighting/variance estimation                5 days @ $125                                $625.00
Total main collection                                                                       $601,925.00

International overheads (in US$)
  Review NPR                                                                                $1,500.00
  Training: frameworks/adaptation                                                           $10,000.00
  Training: task admin/scoring                                                              $12,000.00
  Quality assurance adaptation                                                              $1,500.00
  Psychometrics: pilot                                                                      $10,000.00
  Psychometrics: main                                                                       $20,000.00
  Pilot meeting                                3 experts                                    $10,000.00
  Small area estimates                                                                      $25,000.00
  Sampling                                     vet design, weighting, reps                  $5,000.00
  Analysis + reporting                                                                      $25,000.00
  General management                                                                        $15,000.00
  Main meeting                                                                              $10,000.00
Total international overheads                                                               US$145,000 (EC$393,950.50)

Total project                                                                               EC$1,105,975.50
Including 10% margin of error                                                               EC$1,216,573.05
Available budget                                                                            EC$1,300,000.00

ANNEX D: SMALL AREA ESTIMATION

The methods employed in common assessments

The implementation of a common assessment is predicated on the assumption that the cost and operational burden associated with a full assessment are too high for the majority of Member States to bear. The common assessment is designed to reduce the cost and operational burden of the assessment by reducing the sample size to 1,000 cases without sacrificing the scientific integrity of the skill measures. Data from the 1,000 cases are then used in three ways. First, the proficiency data are used to generate a small number of point estimates (e.g. the average prose literacy, document literacy and numeracy scores, the score distributions, and the score distributions by proficiency level). The number of reliable point estimates will depend on the sample size and the actual distributions of characteristics.
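As an illustration of this first use, the sketch below computes a mean proficiency score with a rough standard error, plus the share of respondents at each of Levels 1 to 5, from a sample of 1,000 cases. It is illustrative only: the scores are synthetic, the simple random sample standard error stands in for the replicate-weight variance estimation a real survey would use, and the cut points (225, 275, 325 and 375 on the 0 to 500 scale) follow the IALS convention.

```python
import random
import statistics

random.seed(42)

# Illustrative only: synthetic proficiency scores standing in for the
# scaled assessment results of a 1,000-case common assessment sample.
n = 1000
scores = [random.gauss(250, 50) for _ in range(n)]

# Point estimate 1: mean proficiency score with a simple-random-sample
# standard error (a production survey would apply design weights and
# replicate weights instead).
mean_score = statistics.fmean(scores)
se_mean = statistics.stdev(scores) / n ** 0.5

# Point estimate 2: distribution across proficiency Levels 1-5 using the
# IALS-style cut points (Level 1: 0-225, 2: 226-275, 3: 276-325,
# 4: 326-375, 5: 376-500).
cuts = [225, 275, 325, 375]

def level(score):
    """Map a 0-500 proficiency score to a Level from 1 to 5."""
    return 1 + sum(score > c for c in cuts)

dist = {lv: sum(level(s) == lv for s in scores) / n for lv in range(1, 6)}

print(f"mean score = {mean_score:.1f} (SE {se_mean:.2f})")
print("share by level:", {lv: round(p, 3) for lv, p in dist.items()})
```

With only 1,000 cases the standard error of a subgroup mean grows quickly as the subgroup shrinks, which is why the number of reliable point estimates depends on the sample size and the distribution of characteristics.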
Second, the data are used in a multivariate analysis that explores the link between assessed skills and background variables, i.e. which variables explain the observed differences in skill level and what impact skill differences have on individual outcomes. Third, a variant of the multivariate analysis is used to provide a set of regression parameters that can be used to impute skill scores onto the most recent Census. Two forms of regression are used to generate the imputed values for each skill domain and for membership in market segments:
o Logistic regression, where the dependent variable is the skill level (1-5).
o Ordinary Least Squares (OLS) regression, where the dependent variable is the skill score.
The regression variables used in this analysis are restricted to those that have been shown to have an impact on skill and that are available on both the assessment background questionnaire and the Census file. For example, the recent Canadian application of this approach used the following variables for the imputation:
o Gender
o Education level: 5 categories
o Age group
o Mother tongue: English, French, Multiple and Other
o Province
o Labour force status: Employed, Unemployed, Not in labour force
o Occupation: 10 categories
o Aboriginal: Yes/No
o Immigrant: Yes/No
To impute actual score values within skill levels, the percentiles of actual scores were mapped onto the percentiles of predicted values:
o Using the survey data, the actual scores are compared to predicted values (based on the OLS regression).
o This is done within each skill level in each domain, so one can compare the percentiles of the actual scores with the percentiles of the predicted values.
The imputation procedures for each individual on the Census microdata file are as follows:
(ii) A skill level (1-5) is imputed based on the logistic regression coefficients. The imputed value is random, using not only the coefficients but also the variance/covariance matrix.
(iii) A preliminary score is imputed based on the OLS regression. This score may not be in the appropriate range for the imputed proficiency level.
(iv) This preliminary score is converted into a final score as follows:
o The preliminary score is converted into a percentile of predicted scores (based on the common assessment analysis) within the imputed level.
o This percentile is used to pick an actual score from the common assessment at the same percentile, within the proficiency level.
o This actual score is the imputed score.
(v) The imputation is repeated 10 times so that a variance for the various literacy scores and levels can be estimated.
Experience suggests that the variables available on both the Census and the common assessment capture roughly 70 percent of the variance in skill scores. Thus, individual scores are quite error prone. Application of these methods in Canada confirms that they faithfully reproduce the true skill distributions and yield useful information. The associated errors fall rapidly as proficiency results are accumulated by population subgroup: once the size of a given target population subgroup exceeds 500, the results meet all standard tests for reliability. Estimates for smaller subgroups are more error prone but may still be useful for policy.

ANNEX E: REPORTS ON THE CARICOM TECHNICAL WORKSHOPS ON THE COMMON FRAMEWORK FOR A LITERACY SURVEY

ANNEX EI: FIRST WORKSHOP REPORT

ANNEX EII: SECOND WORKSHOP REPORT

ANNEX F: INCEPTION REPORT

ANNEX G: COMMON FRAMEWORK WITH PLAN OF ACTION