Oral Reading Fluency Benchmark Procedures and Considerations: Final Report for Save the Children

By: Dr. Mónika Lauren Mattos
Teachers College, Columbia University

TABLE OF CONTENTS

List of Figures
List of Tables
List of Acronyms
Glossary
Introductory Summary
Literature Review
    Sample Benchmarking Procedures
    Tools Used to Measure Fluency
    Tools Used to Assess Fluency and Comprehension in Alphasyllabic Scripts
    Bangla Fluency Data Trends
The Relationship between Fluency and Comprehension
Ethiopia: A Sample Oral Reading Fluency Benchmark Study
Considerations for Future Fluency Benchmark Studies in Bangladesh
    Language Learning Context
    Competencies
Recommendations
    Guidelines in the Benchmark Making Process
    Align the Benchmark Tool to the External Criterion Measure
    Select Regions
    Develop the Benchmark Tool
    Train Assessors and Pilot the Benchmark Tool
    Sampling and Data Collection
    Conduct Workshops to Interpret and Discuss the Findings
    Conduct Workshops to Develop Proposed Benchmarks
    The Way Forward: Advocacy, Mobilization, and Collaboration
Conclusion
References
Appendix A: Sample Decodable Reader in Bangla

List of Figures

Figure 1. A Network of Processing Systems for Reading

List of Tables

Table 1. Six Dimensions of Fluency
Table 2. Sample Timeline for Benchmark Making Process
List of Acronyms

ASER  Annual Status of Education Report
AUC  Area Under the Curve
BRAC  Bangladesh Rural Advancement Committee
CAMPE  Campaign for Popular Education
CIES  Comparative International Education Society
CLS  Correct Letter Sounds
DCS  DIBELS Composite Score
DIBELS  Dynamic Indicators of Basic Early Literacy Skills
DMG  Dynamic Measurement Group
DORF  DIBELS Oral Reading Fluency
DPE  Directorate of Primary Education
EGRA  Early Grade Reading Assessment
FSF  First Sound Fluency
GRADE  Group Reading Assessment and Diagnostic Evaluation
IAT  Instructional Adjustment Tools
IER  Institute for Education Research
LAB  Language Acquisition Battery
LB  Literacy Boost
LNF  Letter Name Fluency
MOE  Ministry of Education
MoPME  Ministry of Primary and Mass Education
NCTB  National Curriculum and Textbook Board
NSA  National Student Assessment
NWF  Nonsense Word Fluency
PAL  Programs for Assisted Living
PROTEEVA  Promoting Talent through Early Education
PSF  Phoneme Segmentation Fluency
RAN  Rapid Automatic Naming
READ  Reading Enhancement for Advancing Development
RTI  Research Triangle Institute International
RtR  Room to Read
RWI  Reading and Writing Instruction
TPF  The Promise Foundation
UOCTL  University of Oregon Center on Teaching and Learning
USAID  United States Agency for International Development
WCPM  Words Correct Per Minute
WPM  Words Per Minute
WWR  Whole Words Read

Glossary

Accuracy: one of the components of reading fluency; reading with accuracy involves high word recognition and strong decoding skills to sound out unfamiliar words
Aksharas: the symbol units of the Bangla writing system
Alphabetic script: a writing system whose symbols represent the basic sounds of the language
Alphasyllabary language: an abugida; a language that uses symbols to represent consonant sounds and shows vowel sounds with diacritics
Alphasyllabic script: a writing system whose symbols may represent syllable sounds; consonants with inherent vowels
Cut-off points: points that indicate skill levels
at which student performance can be predicted
Decodable text: a type of text used to help children decode words using the phonics skills taught when they are learning to read
Decoding syllables: the process of matching a letter or combination of letters to their sounds and recognizing the syllable patterns in words
Decoding skills: the ability to read words quickly and automatically; some of the subskills needed to decode include knowledge of sound-symbol correspondence, segmenting words into individual sounds, blending syllables and sounds, and building a large repertoire of sight words
External criterion measure of reading: an external assessment designed to measure student performance against fixed, predetermined criteria
Fluency benchmarks: points of reference against which oral reading fluency can be compared at the beginning, middle, or end of the academic year
Leveled texts: a range of texts written at different reading ability levels in order to match texts to children’s actual ability levels, monitor their progress, and provide the necessary instructional support at each ability level
Nonwords: in the context of reading assessments, pseudowords that are pronounceable based on phonics rules but do not exist and have no meaning
Orthography: the representation of the sounds of a given language by written symbols
Phoneme segmentation: the ability to segment words into their individual sounds
Phonemic awareness: the ability to hear, identify, and manipulate individual sounds in spoken words
Phonological segmentation: the ability to segment the sounds of a language at the word, syllable, and phoneme level
Phonological skills: the ability to identify and manipulate units of oral language such as initial, middle, and ending sounds in words
Prosody: one of the components of reading fluency that includes pitch, stress, and timing; reading with prosody involves reading with expression, in phrases or chunks, and using intonation or pauses to signal punctuation or
grammatical features of a language
Reading acquisition: the process of acquiring the skills needed in order to learn to read
Screening measures: measures used to identify or predict students who may be at risk for poor reading outcomes
Semantic complexity: in texts, how meaning in a given language is conveyed through words, phrases, and sentences at increasing levels of complexity
Sensitivity criterion: a statistical measure used to evaluate a benchmark goal or cut point for risk
Speed: one of the components of reading fluency that is measured in words per minute; there is usually an appropriate reading rate for a given age or grade level
Syllabic awareness: a component of phonological awareness; involves the understanding that words are divided into syllables
Syntactic complexity: in texts, the logical and grammatical arrangement of words at increasing levels of complexity
Threshold point: the value at which decoding skills optimally support the ability to read
Word recognition: the ability to recognize written words correctly and effortlessly
Word accuracy: in the context of oral reading fluency, the ability to read words without errors

Introductory Summary

The first section of this report provides a review of the literature on how benchmarks are created. The procedural guidelines serve as a reference that informs how benchmark tools and procedures can be developed and adapted to other contexts. Tools used to measure fluency are delineated across technical reports and clinical case studies. The second section discusses the relationship between fluency and comprehension. It describes universal and language-specific features that influence the reading acquisition process in Bangla. The next section provides a sample oral reading fluency benchmark study that included languages that use alphasyllabic and alphabetic scripts. The last section addresses salient themes that emerged from fact-finding meetings with relevant stakeholders.
Issues around the process of developing fluency benchmarks in Bangladesh are thoroughly discussed. The report concludes with recommendations on the steps and approaches that can guide the benchmark making process.

Literature Review

Sample Benchmarking Procedures

In the U.S., research, development, and implementation of early literacy assessments, as well as oral reading fluency benchmark making procedures, have informed similar efforts in other country contexts. The development of EGRA tools was informed by research conducted on the implementation of the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) in the U.S. In turn, the development of DIBELS was informed by research conducted on Curriculum-based Measurement, which consists of standardized procedures used to assess and monitor literacy skills as well as skills in other subject areas. In 1992, Hasbrouck and Tindal (2006) used this alternative measure to assess oral reading fluency and develop benchmarks in grades 2-5 in the U.S. The researchers did this by collecting oral reading fluency data from 8 regions in the U.S. at three points in the academic year. Over the years, ongoing research in this area has helped to establish new oral reading fluency benchmarks in response to changes in testing procedures, standards, and demographics in the American education sector. In 2012, the University of Oregon Center on Teaching and Learning (UOCTL) published a technical paper on the important changes made to the development of benchmark goals on DIBELS. The changes were made because educators and administrators noted that many students who met the benchmark goals did not pass external criterion measures, as evidenced by comprehensive standardized tests (UOCTL, 2012).
This was because composite scores were solely aligned to the internal performance screening measures for the benchmark goals. The benchmark goals were not aligned to an external, standardized comprehensive assessment administered at the end of the academic year. Consequently, the fluency benchmark goals could not serve as predictive measures of student performance on an external comprehensive assessment. In U.S. classrooms, this meant that teachers who relied on the DIBELS Data System were not able to properly identify struggling readers or plan for interventions through literacy instruction. In response, the UOCTL suggested the development of new fluency benchmark goals and cut points for risk that aligned with an external criterion and could therefore serve as predictive measures. The organization delineates a different approach to the development of benchmark goals based on “(a) an external technical review of DIBELS Next materials, (b) an analysis of the procedures used to establish the Dynamic Measurement Group’s (DMG) former goals, (c) consistent feedback from users, and (d) best practices in education research on sample selection and study replication” (UOCTL, 2012, p. 1). The technical paper thus addresses key elements and procedural guidelines in the development of benchmark goals based on lessons learned. According to the authors, two initial key elements to consider are the student population and the external criterion measure. Benchmark goals are based on a large sample size representative of the student population across schools in the country, thus mirroring demographic data.
It is important to identify an already existing, nationally recognized standardized test that serves as a strong “external criterion measure of reading.” Before the start of the benchmark process, all stakeholders involved in the development of benchmark goals need to agree on the standards set by the external criterion measure that will be linked to fluency measures. The development of benchmark goals also involves careful consideration of the statistical procedures used. The authors point out that the analytic lens selected influences the understandings culled from the benchmark making process (UOCTL, 2012). They state that in order to develop the new benchmark goals and cut-off points for DIBELS Next, the main statistical procedures implemented were “(a) the Area Under the Curve (AUC) and (b) sensitivity” (p. 5). The Area Under the Curve is a statistical procedure that evaluates how well a screening measure actually groups students between cut-off points, thereby providing a predictive value of students’ basic early literacy skills across grade levels at the end of the academic year. A sensitivity indicator is a statistical procedure used to determine how well selected points on a scale score actually help single out the students who do not meet a criterion goal. Another key element involves a consideration of “decision rules.” The authors note that, “For each measure at each time point that is recommended, we calculated (a) the benchmark goal, and (b) the cut point for risk” (UOCTL, 2012, p. 6). Students who meet or exceed the benchmark goal are more likely to “score at or above the 40th percentile” on an external criterion measure. In the process of developing benchmark goals, a cut point for risk is set in order to identify the students who do not meet a benchmark goal.
In their aim to align with an external criterion measure, the researchers noted that the cut point for risk should indicate that students who scored below it would likely score below the 20th percentile (UOCTL, 2012). The last key element addressed is the particular analytic approach used in the process of developing benchmarks. This entailed linking each fluency measure administered at three points in the academic year to an external criterion measure administered at the end of the year. In order to determine the accuracy of the benchmark goals, only measures that have an Area Under the Curve greater than .75 are selected (UOCTL, 2012, p. 7). Next, the researchers carefully analyzed each measure selected. For each of the three points in time, an initial analysis was conducted to set the benchmark goal and a second analysis was conducted to set the cut point for risk. The analyses conducted to create the benchmark goal and the cut point for risk at each point in time then underwent the sensitivity and specificity statistical procedures to ensure that struggling readers were readily identified and provided with targeted literacy instruction. In order to provide strong confidence in the ability to predict how well students would do on the external criterion measure administered at the end of the year, the authors selected a sensitivity criterion of 90% for the benchmark goals and a sensitivity criterion of 80% for the cut points for risk on the DIBELS. Thus, after the student population is selected and the external criterion measure is aligned in the process of creating benchmark goals, the screening measures are administered at the beginning, middle, and end of the school year. At the end of the year, students take the standardized test that serves as the external criterion measure.
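The cut-point logic described above can be sketched in miniature. The example below is a hypothetical illustration, not actual DIBELS data: screening scores (in words correct per minute) are paired with pass/fail outcomes on an external criterion measure, and sensitivity and specificity are computed for a candidate cut point.

```python
# Hypothetical pairs: (screening score in WCPM, passed external criterion).
# "Passed" means the student scored at or above the 40th percentile.
students = [
    (12, False), (18, False), (25, False), (31, False), (38, True),
    (44, True), (52, True), (61, True), (70, True), (83, True),
]

def sensitivity(cut_point, data):
    # Proportion of students who failed the criterion measure that the
    # cut point correctly flags as at risk (screening score below it).
    at_risk = [score for score, passed in data if not passed]
    return sum(score < cut_point for score in at_risk) / len(at_risk)

def specificity(cut_point, data):
    # Proportion of students who passed the criterion measure that the
    # cut point correctly leaves unflagged (score at or above it).
    not_at_risk = [score for score, passed in data if passed]
    return sum(score >= cut_point for score in not_at_risk) / len(not_at_risk)

# A candidate cut point of 35 WCPM flags every at-risk student in this
# toy sample without flagging any proficient reader; a cut point of 20
# would miss half the at-risk group.
print(sensitivity(35, students), specificity(35, students))  # 1.0 1.0
print(sensitivity(20, students))  # 0.5
```

In the UOCTL procedure, candidate goals and cut points are evaluated in this spirit until sensitivity reaches the chosen criterion (90% for benchmark goals, 80% for cut points for risk).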
Student performance on the screening measures at the beginning, middle, and end of the year is then compared to student performance on the external criterion measure. Student performance is identified as proficient or below. Benchmarks can then be developed by noting where the proficient and lower scoring students were at the beginning, middle, and end of the academic year, the critical benchmark time periods. The alignment between the screening measures and the external criterion measure minimizes prediction errors in the development of benchmarks. This ensures that the benchmarks that are set can predict how well students will do on the criteria described in the external measure (UOCTL, 2012). Powell-Smith, Good, Latimer, Dewey, Wallin, and Kaminski (2012) further describe the process of developing DIBELS Next benchmark goals and cut points for risk through a study conducted from 2009 to 2010. Their technical paper also provides an evaluation of the DIBELS Next measures and addresses four aims. In the benchmark goals study, the authors aimed to identify the performance levels on the DIBELS Next assessment that would serve as good predictors of students’ performance on end-of-year reading goals. A second aim was to evaluate the reliability of the DIBELS Next assessments and the DIBELS Composite Score. A third aim was to explore the correlations between different elements of the DIBELS Next assessment and the Group Reading Assessment and Diagnostic Evaluation (GRADE), an external criterion measure. A fourth aim was to evaluate teacher and assessor satisfaction with the DIBELS assessments. This literature review discusses relevant details of the first three aims of the study. Powell-Smith and colleagues selected students from kindergarten through sixth grade in three geographical areas in the United States.
The participants attended English-medium general education classrooms and included students for whom English is a second language as well as students with disabilities who were able to take part in the DIBELS assessment. The researchers explain that they selected a subset of the total sample to take part in both the DIBELS assessment and the additional GRADE assessment in order to examine the reliability and validity of the DIBELS measures. In order to check for validity, fifty students representative of the three geographical areas were selected to take the GRADE, an external criterion measure. In order to check for reliability, three of the five school districts spread across the three geographical areas were chosen to take an alternate-form reliability test, a test-retest reliability test, or an inter-rater reliability test. The parents of the students who took both the DIBELS and GRADE assessments completed a demographics survey. The measures for the benchmark goals study were the DIBELS measures, the external criterion measure (GRADE), and a questionnaire filled out by teachers and administrators to gauge the usability of the DIBELS Next assessment. The individual measures of the DIBELS included First Sound Fluency (FSF), which tests kindergarten students’ ability to isolate and identify initial sounds in words in the beginning and middle of the year. Given that many kindergarteners may exhibit partial and emergent fluency in the FSF, the assessors used differential scoring for this measure. Letter Naming Fluency (LNF) tests kindergarten and early first grade students’ letter automaticity, or ability to identify and say the name of lowercase and uppercase letters. In the Phoneme Segmentation Fluency (PSF) measure, students demonstrate their phonemic awareness by listening carefully and sounding out word parts.
The assessors used differential scoring for the students who demonstrated emergent phonemic awareness. The Nonsense Word Fluency (NWF) measure tests students’ ability to identify the correspondence between letters and sounds and blend sounds to form complete nonsense words that follow a vowel-consonant or consonant-vowel-consonant pattern. The assessors start to administer the NWF measure in the middle of kindergarten. Students receive two separate scores for this fluency measure. One score stands for the number of correct letter-sound correspondences marked in the first minute, Correct Letter Sounds (CLS). The second score stands for the number of nonsense words that are correctly read aloud without using phoneme segmentation, Whole Words Read (WWR). The DIBELS Oral Reading Fluency (DORF) and Retell is a two-part measure that tests students’ skills in phonics, making sense of unknown words in context, reading connected text with fluency and accuracy, and reading with understanding. In the DORF section, students read a different one-minute grade-level passage in the beginning, middle, and end of the year. For the benchmark assessment, the assessor finds the DORF score by calculating “the median number of words read correctly and the median number of errors across the three passages” (Powell-Smith et al., 2012, p. 31). The assessor finds the accuracy rate by dividing the median number of words read correctly by the sum of the median words correct and the median errors. In the retell section, the assessor makes a quality response rating that measures comprehension. The DIBELS-Maze measure assesses students’ reasoning and comprehension skills. The assessor uses a formula that calculates an adjusted score that takes into consideration instances where students may have guessed on the test.
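The median-based DORF scoring rule just described can be sketched in a few lines; the three passage counts below are hypothetical.

```python
from statistics import median

# Hypothetical counts from one student's three one-minute benchmark passages.
words_correct = [42, 47, 45]
errors = [5, 3, 4]

median_correct = median(words_correct)  # 45
median_errors = median(errors)          # 4

# Accuracy rate = median words correct / (median correct + median errors).
accuracy = median_correct / (median_correct + median_errors)
print(median_correct, round(accuracy, 3))  # 45 0.918
```

Using the median across three passages rather than a single reading dampens the effect of one unusually easy or hard passage on the benchmark score.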
In preparation for the assessment phase, school coordinators and teachers conducted the testing for DIBELS and GRADE. The school personnel involved reviewed the testing materials, the steps for administering the test, and the rules for scoring students’ responses during one day of training. They also practiced scoring mock responses to ensure fidelity in the procedure. In order to maximize reliability and accuracy in the scoring process, the school coordinators and teachers participated in calibration activities. The principal investigators addressed any discrepancies in the administration of the test and in the scoring procedures during the training (Powell-Smith et al., 2012). In order to ensure inter-rater reliability, five participants from each grade were randomly selected so that an assessor and a shadow-scorer could administer the test. School coordinators and teachers took turns as assessors and shadow-scorers. The shadow-scored protocols omitted the names of the students and used other identifiers such as student identification, grade level, and school district (Powell-Smith et al., 2012). The alternate-form reliability test was performed in the school district where students’ reading skills were most varied. Powell-Smith et al. (2012) add that in grades 2, 3, 4, and 6, “stratified sampling by benchmark status was utilized to obtain a sample comprised of 50% students at benchmark and 50% from combined strategic and intensive instructional recommendation categories” (p. 15). In first and fifth grades, there were low or disproportionate numbers of students who fell in the strategic and intensive categories, so the researchers oversampled to meet the sampling goal. In order to check for test-retest reliability, the students took the DIBELS Next assessment in the middle of the year and were then retested two weeks later.
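The 50/50 stratified draw can be sketched as follows. The roster, group sizes, and status labels are hypothetical; the point is only that each stratum is sampled separately so the resulting group is half benchmark students and half strategic or intensive students.

```python
import random

# Hypothetical roster: (student_id, instructional recommendation status).
roster = ([(i, "benchmark") for i in range(60)]
          + [(i, "strategic_or_intensive") for i in range(60, 100)])

def stratified_sample(roster, n_per_stratum, seed=0):
    # Draw the same number of students from each status group.
    rng = random.Random(seed)
    sample = []
    for status in ("benchmark", "strategic_or_intensive"):
        stratum = [entry for entry in roster if entry[1] == status]
        sample.extend(rng.sample(stratum, n_per_stratum))
    return sample

retest_group = stratified_sample(roster, n_per_stratum=15)
print(len(retest_group))  # 30
```

When one stratum is too small to supply its share, as happened in first and fifth grades, the draw fails, which is why the researchers had to oversample those grades.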
The principal investigators checked for test-retest reliability in the same school district where students’ reading skills were most varied. The researchers used data from the DIBELS assessment administered in the beginning of the year to select a student sample for the retest two weeks after the middle of the year. As in the alternate-form reliability test, they used stratified sampling to gather 50% of students who met the benchmark and 50% of students who fell in either the strategic or the intensive category. In instances where the sampling goal of thirty students from the respective categories could not be met, they sorted grade lists again, this time by the lowest Nonsense Word Fluency scores for first grade and the lowest DIBELS Oral Reading Fluency (DORF) scores for grades two to six, to meet the required percentage of students from the strategic and intensive categories. Once scores from the individual measures are calculated for each grade, “the measures that correlate highly with later outcomes” are first weighted and then “combined into a DIBELS Composite Score (DCS)” (Powell-Smith et al., 2012, p. 35). In first grade, composite scores are determined for the middle and end of the year. In grades two to six, composite scores are determined for the beginning, middle, and end benchmark assessments. In order to find out the strength of the DIBELS Next measures as predictive measures, they are compared against an external criterion measure. In this case, the external criterion measure was the Group Reading Assessment and Diagnostic Evaluation (GRADE), designed for students in preschool to 12th grade. The GRADE consists of five sections, 16 subtests, and 11 grade-specific testing levels. Only the relevant subtests were used in each grade. As in the DIBELS Next measures, subtest scores were combined to determine the composite scores.
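Conceptually, a composite of this kind is a weighted sum of the individual measure scores. The sketch below only illustrates that idea: the measure names, scores, and weights are hypothetical, and the actual DCS weighting scheme is defined by the Dynamic Measurement Group, not reproduced here.

```python
# Hypothetical individual-measure scores for one student.
scores = {"DORF_words_correct": 52, "DORF_accuracy": 0.96, "NWF_WWR": 18}

# Hypothetical weights chosen so each measure contributes on a similar scale.
weights = {"DORF_words_correct": 1.0, "DORF_accuracy": 100.0, "NWF_WWR": 2.0}

composite = sum(weights[m] * scores[m] for m in scores)
print(composite)  # 52 + 96 + 36 = 184.0
```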
The principal investigators reviewed the data for each grade and benchmark assessment in search of invalid scores and made decision rules to remove outliers. The data set was then ready for analysis. The GRADE was administered around the same time as the benchmark assessment at the end of the academic year. In the effort to develop benchmark goals and cut points that could be generalized, the researchers decided that students whose lowest raw scores showed performance at or above the 40th percentile on the GRADE would serve as the external criterion for adequate reading skills, while the lowest raw scores at or above the 20th percentile would serve as the external criterion for the cut points (Powell-Smith et al., 2012). The principal investigators used the end-of-year benchmark and cut point approximations from the GRADE external criterion to guide their calculation of the benchmark goals and cut points for risk based on the DIBELS Composite Score for the end of the year. Once the benchmark goals and cut points for risk were calculated for the end of the year using the DIBELS Composite Score, they referred to these to determine benchmark goals and cut points for risk for the middle of the year using the DIBELS Composite Score for the middle of the year. Once the benchmark goals and cut points for risk were calculated for the middle of the year, they referred to these to determine the benchmark goals and cut points for risk for the beginning of the year using the DIBELS Composite Score for the beginning of the year. Goals and cut points for risk were also developed using the individual DIBELS Next measures. Analyses and correlations were then drawn from the DIBELS Next measures, the DIBELS Next composite scores, and the external criterion measure (GRADE) for each of the benchmark periods.
The findings show that correlations increase with each subsequent grade level, except for the retell scores, which showed a steady decrease with each subsequent grade level. The findings also show that the reliability coefficients were high on the test-retest, alternate-form, and inter-rater reliability tests. Given these findings, students who meet or exceed a current benchmark goal have an 80% to 90% chance of meeting later reading goals on the DIBELS Next measures and therefore have high chances of doing well on the GRADE. Another important finding was that the DIBELS Next Composite Score was generally a better reading proficiency measure than the DIBELS Next individual measures. The authors add that the utility of the composite score as an internal criterion measure was strong and therefore validated the DIBELS Next benchmark goals and cut points for risk. Powell-Smith et al. (2012) conclude that more research is necessary in order to replicate the findings with other external criterion measures. The section above focused on the procedures applied in the creation of benchmark goals and cut points for risk for fluency in English reading. Although there are universal features in reading acquisition (Frost, 2012), there are also language-specific features (Nag, 2007; Perfetti, 2003) that pose several implications for the creation of fluency benchmarks. Thus, the procedures described above may serve as a reference in countries where fluency benchmarks are yet to be established and should be adapted to meet context-specific needs. In light of these considerations, theoretical models that explain how children learn to read in languages not based on an alphabetic script are a relevant part of the discussion. The next section of the literature review addresses the tools used to measure fluency with a lens on reports and clinical studies.
Tools Used to Measure Fluency

Jukes, Vagh, and Kim (2006) state that the tools used to measure fluency require consideration of the varied skills, subskills, challenges, and underlying processes involved in learning to read in the writing system of a particular language. The authors add that there are also cultural and country-specific factors, such as reading standards and school curricula, that influence target measures for fluency. Among these, the assessment of letter knowledge, or, in the case of alphasyllabary languages, syllable knowledge, is an important fluency measure. Children’s ability to correctly and fluently name letters or alphasyllabic symbols and their corresponding sounds serves as a strong indicator of their ability to read (Jukes et al., 2006). An assessment of phonological awareness entails activities that ask children to signal rhyming words, segment initial sounds and word endings, and tap out the number of sounds in each word. An assessment of phonological recoding tests children’s “ability to apply phoneme correspondence rules” through a nonword reading activity (p. 9). The authors further suggest that the syllable complexity of a particular language must be taken into account in the development of items that assess phonological processing skills as fluency measures. Oral reading fluency measures pay attention to children’s ability to read connected text with “accuracy, speed, and prosody” (Jukes et al., 2006, p. 10). The emphasis on connected text creates an assessment window into children’s comprehension, given that if they can read clearly with proper pacing and expression, they are more likely to read with understanding. Working memory is extremely important because children need to remember the sounds that correspond to the written script in order to manipulate them.
They also need to remember the meaning of the string of words read and how these connect to the other parts of the text at the level of sentences and ideas throughout the text. Jukes and his colleagues suggest that oral reading fluency can also be assessed by "asking children to read aloud from curriculum-relevant texts and counting the number of words accurately read within a span of 60 seconds" (p. 11). They add that the syllable can be the "unit of analysis"; thus the number of correctly decoded syllables in one minute can serve as a valid fluency measure. Moreover, the authors point out that normative data from country-specific, linguistically and culturally relevant curricular materials can be used to shed light on grade-level expectations for reading. The authors recommend that fluency outcome measures be aligned to the number of words, syllables, phonemes, sentences, syllables per word, and phonemes per syllable found in grade-level texts. This may also entail the use of readability formulas, which focus on the syntactic and semantic complexity of texts to gauge the cognitive demands placed on the reader. In order to figure out the syntactic complexity of a text, the number of words in each sentence is first counted, and an average score is then calculated. In order to figure out the semantic complexity of a text, researchers usually note the average number of low- and high-frequency words or calculate the average number of syllables in each word. Lexical diversity is a measure of the variety of words found in the text. The authors inform that the lexical diversity of a text can be calculated by finding the ratio of the "total number of unique words in the text ('types') to the total number of words in the text ('tokens')." This is called the "type-token" ratio.
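The three text-complexity measures just described (average words per sentence, average syllables per word, and the type-token ratio) can be sketched in a short script. The tokenization and the vowel-cluster syllable counter below are rough, English-oriented assumptions for illustration only; a real implementation for Bangla would need language-appropriate sentence, word, and syllable segmentation.

```python
import re

def crude_syllables(word):
    # Rough vowel-cluster heuristic (an assumption, not a linguistic rule).
    return max(len(re.findall(r"[aeiou]+", word.lower())), 1)

def readability_metrics(text, syllable_count=crude_syllables):
    """Compute the three text-complexity measures described above."""
    sentences = [s for s in re.split(r"[.!?]", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Syntactic complexity: average number of words per sentence.
    words_per_sentence = len(words) / len(sentences)
    # Semantic complexity: average number of syllables per word.
    syllables_per_word = sum(syllable_count(w) for w in words) / len(words)
    # Lexical diversity: unique words ("types") over total words ("tokens").
    type_token_ratio = len({w.lower() for w in words}) / len(words)
    return words_per_sentence, syllables_per_word, type_token_ratio
```

A passage that repeats many words yields a low type-token ratio, which matches the observation above that children read highly repetitive passages more quickly than lexically diverse ones.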
Jukes et al. (2006) state that "a consideration of lexical diversity is important in the measurement of fluency as children who read passages with many repetitive words are bound to have an easier time and go through them faster, than children who read passages that are more lexically diverse as they will decode a greater number of different words through the passage" (p. 15). While the elements noted above are relevant considerations in an exploration of the tools used to measure fluency, the authors contend there exists a dearth of research on reading fluency norms in certain countries. Jukes et al. (2006) therefore recommend the following: "(a) take into account characteristics of the orthography that children are learning to read, (b) evaluate the results in keeping with the demands of the curricula, and (c) capitalize on any opportunities to collect information on general trends for a given language in a given country" (p. 16). Once fluency benchmarks and basic oral reading fluency tools are developed, a link to reading comprehension can be embedded in the tools. The authors pose that links between oral reading fluency and reading with understanding can be made through question-and-answer comprehension tests. They suggest that careful consideration be given to the topic of the text and the quality of the questions. This involves selecting a text that requires a limited amount of prior knowledge so that children can draw responses directly from the text itself. The questions should be designed in a way that children cannot guess the response just by reading the questions. Although the tool described above is better suited for children who have basic decoding skills, another tool to measure fluency entails written graded reading assessments that can be used inexpensively on a large scale.
The letter reading and word reading portion asks children to tell the difference between letters and non-letters as well as words and non-words. The sentence portion is timed and requires children to read simple sentences and identify them as true or false. Both early literacy skills and comprehension can also be informally assessed through Maze tests. Jukes et al. (2006) prefer the Maze test to the Cloze test because it is more suitable for beginning readers. Beginning readers can read the passage and select, from the multiple-choice options available, the one word that fits the sentence according to the intended contextual meaning. In order to design a Maze test, key words are omitted either by their classification as parts of speech or by omitting every fifth or seventh word. The authors recommend the omission of particular parts of speech since that approach can better serve as a measure of reading comprehension. Other characteristics of the Maze test are that it is timed, that some of the words in the multiple-choice options are selected as distractors, and that readers must read beyond single sentences to read the passage with understanding. Jukes and his colleagues share the findings of a pilot study conducted to explore which measures of reading fluency are relevant and applicable across languages and orthographies. Based on the pilot findings, the authors highlight that an ideal approach to assessing reading skills in development contexts should include a composite score that encompasses both letter/graphic unit reading and passage reading ability that is linked with comprehension.
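A minimal sketch of the simpler of the two Maze construction schemes described above: blank every nth word and attach distractors drawn from a candidate pool. The passage, pool, and every-fifth-word choice are illustrative assumptions; as the authors note, a stronger test would omit words by part of speech and choose distractors deliberately.

```python
import random

def make_maze_test(passage, distractor_pool, n=5, n_options=3, seed=0):
    """Blank every nth word of `passage` and build multiple-choice items."""
    rng = random.Random(seed)
    words = passage.split()
    items = []
    for i in range(n - 1, len(words), n):
        answer = words[i]
        # Draw wrong options from the pool, excluding the answer itself.
        wrong = rng.sample([w for w in distractor_pool if w != answer],
                           n_options - 1)
        items.append({"blank": i, "answer": answer,
                      "options": sorted(wrong + [answer])})
        words[i] = "____"
    return " ".join(words), items
```

Each item mixes the correct word with distractors, so the child must use the surrounding context, beyond the single sentence, to choose correctly.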
Thus, the authors recommend the inclusion of these elements in a reading assessment: "(1) An oral letter reading fluency test—children read as many letters/graphic units as possible in 60 seconds; (2) An oral passage reading fluency test—children read as many words of a connected text as possible in 60 seconds. Two different passages should be used to improve reliability; and (3) Comprehension should be assessed by 5 questions asked at the end of each passage. Students who do not finish reading the passage in 60 seconds should be allowed to finish" (p. 22). For languages similar to Hindi, the authors suggest selecting the syllable units that are taught first through curricular materials and/or selecting syllable units from the reading passages. As previously mentioned, the length and complexity of the passages, as well as their lexical diversity, should be assessed. Jukes and his colleagues (2006) make a distinction between commercially developed assessments and curriculum-based assessments of oral reading fluency. Teachers use curriculum-based assessments on a weekly basis to assess and monitor students' academic growth and then use the data from these assessments to adjust instruction. Commercially developed assessments are designed and scored by an external organization under contract. The instructions for these types of assessments require students to respond to the same questions in the same way. The DIBELS discussed in this document is an example of a commercial assessment. The DIBELS in its original version included a retell fluency measure to gauge reading comprehension. The current version, DIBELS Next, runs an updated version in English and Spanish that provides the retell fluency measure as an option rather than a requirement because the measure did not
demonstrate consistent performance nor render reliable and predictive outcomes on the external measure (OU CTL, 2012). Although the authors mention samples of commercially developed reading assessments that mainly focus on the English language, it is useful to look over the test design and explore what kinds of test items may be applicable to other languages and orthographies. For example, the Reading Fluency Indicator includes four passages at varying levels of difficulty for children between the ages of five and eighteen. The Reading Fluency Progress Monitor is standardized and normed, with up to 30 reading passages for children in grades one to eight. This tool allows the test administrator to find the passage that most closely matches each reader's actual level of mastery and difficulty. The Reading Fluency Benchmark Assessor can be used by the classroom teacher to monitor oral reading fluency once a week, twice a week, or on a monthly basis. It includes eight levels of assessment with 30 reading passages at each level. Once the teacher administers the test, she can assess each child's specific reading level and use these assessments to guide literacy instruction. The Gray Oral Reading Test contains developmentally sequenced passages, each followed by five questions that assess comprehension.

Tools Used to Assess Fluency and Comprehension in Alphasyllabic Scripts

Jukes and his colleagues also note the efforts of Pratham, an Indian nongovernmental organization that conducts large-scale fluency assessments in rural and urban areas. The authors' critique is that while the design of Pratham's assessment tools resembles some of the tools discussed thus far, the fluency measures concentrate mainly on words decoded correctly, less on reading rate, and no attention is given to reading
comprehension. The tool includes a four-sentence text that allows the test administrator to gauge whether readers can move on to a more challenging text or read a wordlist. While Wijayathilake and Parrila (2014) state there is limited research on basic reading skills in alphasyllabic languages, it is important to explore Vagh's (2009, 2010) work on the reliability and validity of the Annual Status of Education Report (ASER) assessment tools, given that these were designed for alphasyllabic languages. The ASER reading tool is available in Hindi, Bengali, Gujarati, Kannada, and Tamil, among other languages. In this study, the ASER reading tool was implemented in the Hindi language and was aligned to standards 1 and 2 textbooks from India. For the purposes of this literature review, the focus will be on Vagh's discussion of the tools for the basic reading assessment, not the portion that discusses the ASER math assessment tools. Vagh (2009) informs that the ASER reading assessment tool "classifies children at the 'nothing', 'letter', 'word', 'paragraph' (grade 1 level text), and 'story' (grade 2 level text) level based on defined performance criteria or cut-off scores that allow examiners to classify children as masters or non-masters of any given level" (p. 2). The tool therefore emphasizes mastery of basic reading skills. In Vagh's (2009) study, he evaluated the concurrent validity of the ASER reading tool through a comparison with the Fluency Battery—a tool adapted from EGRA and DIBELS. The Fluency Battery is made up of the following subtests: Akshar Reading Fluency, Barakhado Reading Fluency, Word Reading Fluency, Nonword Reading Fluency, two first grade level passages each linked to two comprehension questions, and two second grade level passages each linked to two comprehension questions. The author adds that "the content of the Fluency Battery was drawn from prior ASER
reading tests as the material has been extensively evaluated and piloted to ensure their grade and content appropriateness for the population of interest" (p. 19). The findings indicate strong concurrent validity between the ASER reading tool and the Fluency Battery. The children who participated in the study and performed above the standard 1 story level on the Fluency Battery also performed well on the ascending ASER reading levels. However, the author points out there were children who performed at the "nothing" level on the ASER reading test who were able to read four aksharas or more on the Fluency Battery. Due to this minor inconsistency, he suggests that other studies further explore the suitability of the cut-off criteria for fluency rates. In similar research, Vagh (2010) reconsiders the findings on children's performance on the different ASER reading levels as well as their performance on the Fluency Battery. He informs that the strong validity coefficients between these basic reading tests indicate "increasing fluency rates with higher ASER-reading levels" and that the inconsistencies noted earlier may be a result of misclassification, given that the levels are "mutually exclusive categories" (p. 14). While both the Fluency Battery and the ASER reading levels are strongly aligned, Vagh further explains that the reason for testing and the type of data needed should decide which test to administer. He also suggests that both tests could be administered in tandem to evaluate children's reading development "within and across reading levels" in programs such as the Read India Program. More studies are needed in order to develop tools to measure fluency in contexts where alphasyllabic languages are spoken. The Promise Foundation (TPF) in India brings together educators, psychologists, and social workers to meet the needs of children in
underserved communities, with a focus on how children acquire literacy in the mother tongue. Tiwari (2011) informs that among the many tasks of the LAB tool designed by The Promise Foundation, there is "a range of Kannada literacy and cognitive tasks like akshara knowledge, reading fluency, reading comprehension" and several others (p. 9). He adds that the tool was adapted to languages such as Malayalam and Bengali. Given that Nag (2007) posits that the pace of learning to read the complex aksharas is gradual and can extend past the third grade, the LAB tool is a useful example of a local language-based assessment tool that measures reading fluency and comprehension across grades. Nag and Sircar (2008) adapted elements of the LAB tool in their study on learning to read in Bengali in Kolkata primary schools. Since the focus of the study was on the early stages of reading development in Bengali, the researchers carried out an initial screening assessment to identify readers and nonreaders. They then conducted an in-depth assessment that collected data on children's vocabulary knowledge, word recognition, reading comprehension, and phonological skills, among others. According to the authors, the schools the children attended were also studied based on "their daily routine, work culture, and teaching-learning processes" (Nag & Sircar, 2008, p. 4). In terms of word recognition, the authors found that children learn the Bengali aksharas in the order in which they are taught—teachers move from the simple or single akshara to the more complex aksharas. Nag and Sircar (2008) add that the children had difficulty decoding the complex akshara regardless of whether they appeared independently or as part of a word. In the word and nonword reading assessment, the findings showed that seven-year-olds could read the majority of words (96%) and
nonwords (84%) accurately. The most challenging aspect of decoding occurred when children tried to read the akshara located in the middle of the word. The researchers explain that although sound-symbol relationships are mainly consistent in Bangla, the middle akshara in words illustrates the inconsistencies that challenge the decoding process. The researchers' findings reveal that "children found it easier to work with syllables than phonemes and that their understanding that words are made up of phonemes came later. Children who were struggling with phonological processing were also struggling with simple word recognition and spelling" (Nag & Sircar, 2008, p. 7). Moreover, the areas of reading comprehension that were assessed were inferential thinking and understanding details in a nonfiction text. The researchers inform that, in general, the children performed better on understanding explicit details in a nonfiction passage than on making inferences. Stronger readers were more easily able to make inferences from passages that contained more sentences with challenging syntax and fewer familiar words (Nag & Sircar, 2008). They identified struggling readers as those who had difficulty decoding words as well as those who had trouble making inferences. Another aspect of reading comprehension that was assessed was vocabulary. The researchers assessed word knowledge by giving the children the option to use the vocabulary word in a sentence, give a synonym, or provide a definition. Nag and Sircar (2008) report that the children preferred to communicate their understanding of word knowledge through sentence construction. However, they point out that the children in the earlier grades did not express their word knowledge through sentence construction as accurately as the older children. According to the researchers, most of the
children in the earlier grades produced "dead sentences" with a weak connection between the meaning of a word and the way it was used in a sentence. A trend the researchers noticed was that instruction played an important role in children's metalinguistic awareness. The children's decision to convey word knowledge mainly through sentence construction was largely based on teachers' curricular practice. Nag and Sircar (2008) hold that sentence construction alone falls short as a valid assessment of word knowledge. The authors recommend the following alternatives:

• Test concepts from a cluster of words. Give a set of linked words and ask for a name or concept that relates these words. Sample word list: chair, table, desk, bed. Response: furniture, household things, etc.

• Test word-context matches. Give a situation and ask multiple-choice questions to tease out the child's understanding. Sample situation: The teacher was happy that Radha had stopped spending time with her friends. Multiple-choice question: What did the teacher think of her friends? They were (a) irresponsible (b) responsible.

• Test word-personal experience matches. Give a word and ask for a recounting of a personal experience that can capture the meaning of the word. The quality and accuracy of the connections the child makes give an indication of the depth of the child's understanding of the word. Example: Describe a situation when you are worried about something. (p. 15)

Thus, the researchers recommend that tests weave activities that foster meaning making, such as conceptual categorization, personal relevance, and inferential thinking, into activities that assess sound-symbol relationships at the word level.
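The alternative word-knowledge formats recommended above can be represented as simple structured items. The item content below echoes the authors' own examples; the scoring logic and accepted-response sets are illustrative assumptions, since a real rubric would allow a much wider range of child responses.

```python
# Two of the item formats suggested by Nag and Sircar (2008), as data.
items = [
    {"type": "concept_cluster",
     "words": ["chair", "table", "desk", "bed"],
     # Acceptable category labels (illustrative placeholder rubric).
     "accepted": {"furniture", "household things"}},
    {"type": "word_context",
     "situation": ("The teacher was happy that Radha had stopped "
                   "spending time with her friends."),
     "question": "What did the teacher think of her friends?",
     "options": ["irresponsible", "responsible"],
     "answer": "irresponsible"},
]

def score_item(item, response):
    """Return True if the child's response is acceptable for the item."""
    if item["type"] == "concept_cluster":
        return response.strip().lower() in item["accepted"]
    return response.strip().lower() == item["answer"]
```

Structuring items this way makes it straightforward to mix conceptual-categorization and word-context items in the same assessment, as the researchers recommend.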
Sircar and Nag (2014) more recently conducted a study on sound-symbol relationships between the akshara and the phonological syllables in Bangla. The researchers looked closely at challenges that arise during reading: instances where the characteristics of the spoken and written language signal a lack of congruency between the akshara symbols and the phonological syllables, and how this influences the reading process. The fluency measures on the screening battery involved reading word lists and syllable and phoneme processing tasks. The participants in grades 2, 3, and 4 were categorized as either readers on grade level or struggling readers. The researchers also included a phonological processing assessment that comprised the manipulation of "target syllables and phonemes in nonwords, in either initial or final positions" (p. 205). On akshara knowledge and word recognition tasks, Sircar and Nag (2014) noted that learning akshara symbols takes a great deal of time, based on the finding that participants in the fourth grade were still in the midst of the akshara acquisition process. An example of consonant clusters on the word list that posed a challenge was /CCV/ clusters. Consonant clusters were the most difficult to decode, especially for the second graders who were on grade level and for the struggling readers. These participants sounded out each phoneme rather than blending them. The findings also showed that participants accurately decoded the basic aksharas, in line with the pattern of akshara instruction (/Ca/, /CV/, /CCV/) in the primary schools the children attended. Sircar and Nag observe that "while the less familiar symbols appeared to elicit segmental
analysis of the markers within the syllable block (the 'spelling' of the akshara), the familiarity of the common akshara appears to allow the symbols to be reliably processed as undifferentiated tasks" (p. 205). In the nonword reading task, ninety-four percent of participants' errors were phonological, as demonstrated by the reversal of consonant sounds in /CCV/ akshara, while the remaining errors were lexical (Sircar & Nag, 2014). According to the researchers, the data suggest that proficient readers use phonological analogies to decode nonwords while less proficient readers rely heavily on orthographic information, as evidenced by their sounding out of inherent vowels. They also note that a repertoire of effective decoding strategies, such as the one implemented by proficient readers, can be integrated into akshara instruction so that children learn to apply rules and successfully manage the exceptions to these rules when they process text. In the phonological processing task, Sircar and Nag (2014) found that participants who decoded words correctly were better able to process phonemes. In the phoneme tasks, all the participants had greater difficulty manipulating initial phonemes in more complex akshara syllables than in less complex akshara syllables. In a segmentation task, participants were presented with short and long nonwords. The researchers inform that the participants preferred to segment the nonwords as CVC-CV, which indicates a phonological segmentation rather than one mediated by orthography. They posit that this can be explained by the language-specific constraints found in initial or medial consonant clusters. In akshara substitution tasks, the findings were similar in that the majority of participants manipulated akshara phonologically rather than by "akshara-by-akshara manipulation." The authors conclude that readers generally relied more on their
phonological awareness in order to read akshara syllables, while struggling readers had difficulty using phonological and orthographic characteristics to decode akshara. The factors that supported the decoding process were children's ability to distinguish how specific phonemes can and cannot be arranged to form syllables and words, to use analogies to recognize new words, and to accurately recognize words they often see in print texts. The findings from this study highlight some of the cognitive and linguistic demands that arise as children learn to read with fluency in the extensive orthography of the Bangla language. While Nag and Snowling (2012) acknowledge that across languages there are similar features in the process of learning to read, the way children understand and apply the nuances of alphasyllabic knowledge warrants further study in order to provide insights into language-specific factors that influence reading development. The following study sheds some light on how children process texts in an alphasyllabic script. In a clinical study designed to assess reading difficulties in Kannada, an Indian alphasyllabary, Nag and Snowling (2008) referred to previous longitudinal data that focused on children's word and nonword reading skills in the participant selection process. The measures for the current study included "tests of basic reading, spelling, phonological, visual and oral processing skills" (p. 2). A cognitive measure in the assessment battery that is relevant to reading fluency is Rapid Automatic Naming (RAN). In RAN tasks, participants are expected to name akshara-based syllables or words as quickly as possible in order to measure processing speed and automaticity. Rapid Automatic Naming tasks are used in clinical studies to identify children who may be at risk for reading difficulties.
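A RAN trial is typically summarized by how quickly items are named. One common convention is correctly named items per second; the function below is a generic sketch of that convention under stated assumptions, not the specific scoring procedure used by Nag and Snowling (2008).

```python
def ran_items_per_second(items_presented, errors, elapsed_seconds):
    """Summarize a RAN trial as correctly named items per second.

    Higher values indicate faster, more automatic naming of the
    akshara-based syllables or words on the naming card.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    correct = items_presented - errors
    return correct / elapsed_seconds
```

Under this convention, a child who names a 50-item card in 40 seconds with 2 errors scores 1.2 items per second; markedly lower rates would flag a child for follow-up assessment.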
The researchers' preliminary findings show that proficient readers possess an implicit orthographic knowledge of the "rules that govern the ligaturing of the vowel and additional consonants to the base consonant" without receiving formal instruction from their teachers on these rules. They add that readers who struggle have a difficult time discerning these rules and therefore have limited akshara knowledge. An implication that can be drawn from this study is that students need explicit instruction in orthographic knowledge and also require additional opportunities to read and listen to stories in and out of school in order to build their metalinguistic awareness. The articles discussed thus far delineate a variety of tools that can be used to measure oral reading fluency and comprehension. The development of grade-level passages and corresponding questions, decoding and phonological awareness tasks, Maze tests, and vocabulary tasks, among others, are reliable measures of oral reading fluency and comprehension. The following study considers similar issues in the context of Kannada, an alphasyllabic language spoken in South India. Nag and Snowling (2011) conducted a study that explored the differences in comprehension among students in fourth, fifth, and sixth grade. The students' first language is Kannada, an akshara-based script like Bengali and Tamil. Nag and Snowling add that given the large number of akshara in Kannada, the reading acquisition process extends well into fourth and fifth grade. The researchers considered two aspects of reading comprehension. The first was the relationship between decoding, phonological, and comprehension skills. The second considered the link between inflection and vocabulary knowledge and comprehension.
In the study, there were ninety-five participants from twelve schools. In order to explore the relationship between comprehension, decoding, and phonological skills, the students read six passages and answered two questions. The passages consisted of fiction, informational texts, biographies, and riddles from fourteen to thirty-seven words in length (Nag & Snowling, 2011). The reading comprehension test was also used to assess reading accuracy. Other tests of reading accuracy were word and nonsense word lists. Tests to measure phonological processing were also included. In order to explore the link between inflection, vocabulary knowledge, and comprehension, the participants took a test that consisted of defining the meaning of words on word lists developed from grade-level texts. They were also given ten sentences of varied length to repeat. The researchers made sure that the longer sentences contained less complex syntax so that cognitive space could be freed to focus on inflection. They counted the number of omissions and substitutions to determine the participants' inflection knowledge. Nag and Snowling (2011) found that varied levels of reading comprehension were attributable to phonological processing skills and word accuracy. Phonological skills facilitated the ability to decode. Accuracy at the word level facilitated reading comprehension. Low performance on word and nonsense word lists as well as on syllable and phoneme manipulation tasks was indicative of poor reading comprehension. The authors state that while these findings support similar findings in other languages, they do not consider that low performance on decoding skills alone accounts for poor comprehension. The findings from the second part of the study showed that "once reading accuracy and phonological skills had been controlled for," vocabulary and
inflection knowledge were strong predictors of reading comprehension. The participants who had less vocabulary knowledge also had limited comprehension. In the context of a language rich in inflections, the participants who had knowledge of morphological segments were better able to read with understanding. Inflection errors occurred in noun-verb and noun-adjective agreements. The researchers add that inflection knowledge was an independent predictor that explained the varied levels of comprehension found in participants' processing of text. Nag and Snowling conclude that more studies are needed to further investigate the relationship between children's oral language development, knowledge and processing of low- and high-frequency inflections, and reading comprehension in akshara-based languages. In a more recent study conducted by Nakamura (2014) in two language communities of South India, the researcher investigated how children acquire reading skills in Kannada or Telugu and English and how children's knowledge of the reading process transfers across scripts. An important contribution of this study is the exploration of the predictors of the reading acquisition process within and across alphasyllabic and alphabetic languages as well as an analysis of the transfer of literacy knowledge. According to the researcher, an overarching goal of the study was to explore the possibility of an identifiable threshold in multilingual settings at which children's reading outcomes in the alphasyllabic language would make it more likely for children to be able to transfer their literacy knowledge to English. Nakamura (2014) and her colleagues selected 556 students in standards 1 to 5 from 13 low-income schools representative of the urban, rural, government, and private schools in South India. Out of the 322 Kannada speakers, there were 168 boys and 154
Mónika Lauren Mattos Save the Children Bangladesh 28 girls that participated in the study. Out of the 234 Telugu speakers, there were 116 boys and 118 girls that participated in the study. The students participated in three rounds of assessments that comprised reading skill subtasks. In the first round, the students completed 9 tests that comprised “concepts of print, blending, deletion, letter naming, decoding, slasher, oral vocabulary knowledge, listening comprehension, and reading comprehension” (p.14). While the majority of the subtasks are similar to others mentioned in the literature review, the “slasher” test assessed fluent word recognition and required participants to read sentences typed without spaces in between the words. The objective was to mark with a slash all the places where a space belonged in between the words in each sentence. In the second round, the subtasks consisted of “akshara knowledge, spelling, and oral reading fluency” (p.14). In the last round, the tests consisted of “deletion, letter naming, decoding, oral vocabulary knowledge, listening comprehension, and reading comprehension” (p.14). A point the researcher makes is that older participants were also required to complete the easier subtasks because the varied range of reading scores in previous studies indicated that attainment of grade level reading proficiency may not be reached by all students. Thus, in order to minimize the probability of yielding zero scores participants had to score at least or above 30% on gradual test eligibility criteria such as print concepts, letter naming, oral reading fluency, among others before moving on to the next level of subtasks. Out of the entire sample size, 91% were eligible for decoding, 76% were eligible for oral reading fluency in Kannada or Telugu, and 62% were eligible for oral reading fluency in English. 
With regard to the main findings on participants’ literacy knowledge in Kannada or Telugu, Nakamura’s (2014) study further confirms that the ability to decode in an alphasyllabic language is contingent upon both syllabic and phonemic awareness. She adds that children’s oral language development in the mother tongue, as evidenced by a repertoire of vocabulary words and the ability to understand the language of sentences and stories, also plays a crucial role in the ability to identify the relationship between akshara symbols and sounds. In fact, oral language development continues to support reading development across the grades. In addition to the finding that syllabic awareness, phonemic awareness, and oral vocabulary knowledge were strong predictors of reading outcomes in Kannada and Telugu, the author reports that the participants were able to decode with accuracy and speed in these respective languages by fifth grade. While the findings showed that boys and girls scored similarly across the subtasks, mastery of decoding skills in the three languages does not necessarily correlate with comprehension. Nakamura draws this conclusion from the finding that student performance was higher on subtasks that measured accuracy in oral reading fluency than on reading comprehension subtasks. With regard to the main findings on participants’ literacy knowledge in English, Nakamura (2014) states that phonemic and syllabic awareness, as well as oral vocabulary knowledge, showed a similar correlation with participants’ scores on the decoding skills subtasks. However, she adds that “syllable awareness was not a significant predictor of English coding ability” (p. 22). The author notes that knowledge of oral vocabulary had a consistent effect on decoding skills across the grades.
In contrast to her findings on the Kannada and Telugu oral reading fluency subtasks, there was a significant correlation between participants’ ability to read with accuracy and comprehension in English. Nakamura also found that participants’ decoding skills in Kannada or Telugu influenced their ability to decode efficiently in English. This major finding supports the theoretical underpinnings of the study, which hold that “cognitively demanding tasks that underlie reading in multilingual children are sharable, transferrable, and facilitative across languages” (Nakamura, 2014, p. 8). The author contends that as children learn to read in two languages with different orthographies, they first acquire language-specific subskills and shared subskills that maximize reading outcomes in the mother tongue. Once children acquire foundational skills in the first language, they engage in a “cognitive resource sharing” process in which they build on these shared subskills while they learn the specific subskills needed to maximize reading outcomes in the second language. Moreover, Nakamura (2014) identifies 60% as the approximate “threshold point” at which children’s literacy knowledge in Kannada or Telugu optimally supports the ability to decode in English. The participants in the study reached this threshold at the end of Standard 4. These predictors of decoding ability in Kannada, Telugu, and English hold implications for test design, pedagogical approaches, curriculum frameworks, program development in multilingual settings, and language in education policy. For Kannada and Telugu, these implications rest on a nuanced understanding of reading development in an alphasyllabic script. Thus, Nakamura’s (2014) study emphasizes the ways that learning to read in an alphasyllabic script is markedly different from learning to read in an alphabetic script.
For instance, reading development in Kannada or Telugu requires that teachers simultaneously build students’ syllabic and phonemic awareness so that students can read with fluency. Students also need ample time to learn the visually complex script and symbol-sound correspondence at syllabic and phonemic levels before they develop strong decoding skills. This is in line with Nag’s (2007) research, which compared the pace of acquisition of akshara knowledge and phonemic awareness in low and high performing schools and found that students in both types of schools required a prolonged period of time to hone their decoding skills. Nakamura (2014) also points out that students learning to read in a transparent alphasyllabic script need explicit instruction in the rules and patterns that govern symbol-sound correspondence. She argues that phonics-based and sight word instructional approaches are not conducive to learning to read in an alphasyllabic script. Instead, Nakamura recommends, among other strategies, the provision of readily visible akshara charts, formal and informal activities that build oral language skills, and the creation of games that foster opportunities to practice the spatial relationships and phonemic markers present in aksharas. Nakamura (2014) concludes that children’s oral language development is distinct from their reading development within and across languages. She stresses the need to address this distinction in the identification of appropriate pedagogical approaches in context-specific languages of instruction as well as through a thorough consideration of the timing of instruction.
Nakamura highlights the need for more research studies that investigate reading development in alphasyllabic scripts so that impact studies on reading intervention programs conducted by local and international organizations can begin to demonstrate marked improvement in children’s reading outcomes and inform language in education policies. The studies discussed above contribute to research centered on foundational skills, fluency, and comprehension in alphasyllabic orthographies. Along with factors such as age, grade, gender, and socioeconomic status, linguistic and cognitive perspectives also inform fluency data trends from impact studies conducted by international organizations. The next section explores fluency data trends specific to the Bangla language and contributes to the conversation on the relevance of language-specific oral reading fluency benchmarks.

Bangla Fluency Data Trends

According to Education Watch (2000), only 4 out of 53 terminal competencies in Bangladesh’s primary education system pertain to the Bangla language. In a study conducted by the Bangladesh Rural Advancement Committee (BRAC) Research and Evaluation Division, the terminal competencies of Bangla reading, writing, and listening of fifth grade students were assessed. The broader goal of the report was to inform stakeholders about the state of primary education in Bangladesh in two ways. The first was through an assessment of fifth grade students’ ability to meet the terminal competencies. The second was through an evaluation of teacher education for the primary grades. In order to investigate students’ attainment of the competencies, the team randomly selected 2,509 students from 186 schools. The students attended government schools, private schools, or informal schools. The Bangla assessment tool consisted of ten test items.
In order to assess reading competence, students first read aloud a printed paragraph and a handwritten paragraph and then responded to four questions, two designed for each paragraph. Students who correctly responded to a minimum of one question for each paragraph were assessed as having adequate reading skills. The findings indicate that 65% of students met the minimum criteria for reading competency. Nationwide, 62.2% of students responded correctly to both questions on the handwritten paragraph while 33.6% of students responded correctly to both questions on the printed paragraph (Education Watch, 2000). For the listening comprehension assessment, students first listened to a recording of a paragraph and then responded to two questions. If students responded correctly to at least one of the two questions, they were assessed as having met the requirement for listening competency. Collectively, 80.7% of students met the minimum criteria while 43.2% responded correctly to both questions. There was no statistically significant difference between the performance of boys and girls. Students in the urban areas performed better than students in the rural areas, at 87.2% and 78.5% respectively (Education Watch, 2000). In the writing assessment, students were asked to correctly complete three out of four prompts. The research team defined correct sentences as those that made sense and contained at least half of the words spelled correctly. The first prompt was to write about an object they could see. The second prompt was to write about an object they could not see. The third prompt was to fill out a form. The fourth prompt was to write a message. The findings show that 55% of students met the minimum criteria (Education Watch, 2000).
In another level of analysis, the BRAC research team combined students’ scores for each of the competencies: reading, listening, and writing. The findings indicate that on average, only 36.5% of students achieved minimum competence in reading, listening, and writing. By gender, only 33.2% of girls and 39.8% of boys achieved minimum competence. By area, only 34% of rural students and 46.3% of urban students achieved minimum competence. The report further states that by the end of primary school, less than 2% of students meet the standards set by all 53 terminal competencies. The researchers posit that these alarming percentages bear directly on issues of equity and quality in education, given that the small number of students who attained all 53 competencies were those who attended the “best” schools in the capital (Education Watch, 2000). Based on their general low achievement findings, the authors recommend that teacher education, accountability systems, materials for teaching and learning, competencies, and language objectives be reexamined for the purpose of building students’ basic literacy skills. They emphasize the importance of revisiting the competencies in order to scaffold children’s learning experiences in a way that authentically starts where students actually are in their literacy development. An additional contribution is that this was the first time that terminal competencies were wholly addressed in a study (Education Watch, 2000). Findings from other studies on emergent and early reading skills and their link to terminal competencies continue to inform primary education policy and practice in Bangladesh. For example, Dowd and Friedlander (2009) published a report on the emergent and early reading assessment validation study results from Save the Children’s branch in Bangladesh.
A main purpose of the study was to explore whether the assessment tools reflected variations and relationships within and across children’s reading skills in rural and urban areas as well as in different reading programs. While the sample size of readers was too small to yield reliable fluency and comprehension estimates for each grade level, the study did provide preliminary estimates. In first grade, participants who read the first story read an average of 14.64 words correctly per minute. First graders who read the second story read an average of 20.10 words correctly per minute. In second grade, participants who read the first story read an average of 33.53 words correctly per minute. Second graders who read the second story read an average of 41.34 words correctly per minute. In the comprehension portion, first grade participants answered 22% of the questions on the grade level passages correctly. Second grade participants scored significantly higher. Dowd and Friedlander (2009) state that the much higher scores indicate that more difficult questions need to be developed for both second grade passages. Although the authors conclude that the Bangla reading assessment aptly captures grade level differences, they acknowledge that adjustments to the assessment tool are necessary given that scores indicate an “increase in reading fluency and comprehension not across the same passages, but on grade-level texts” (p. 3). They add that fluency scores show differences across rural and urban areas. In order to improve comparability across areas, Dowd and Friedlander recommend two options. The first option is to use one passage to assess all children. The second option is to assess students in one particular grade. Finally, the authors recommend that the study’s preliminary estimates be used solely as benchmarks for future grade level reading assessments.
Nath, Guajardo, and Hossain’s (2013) impact study on the Literacy Boost (LB) intervention reports the changes in Bangla reading skills between the 2011 baseline findings of grade 3 participants and the 2013 endline findings of grade 4 participants. All of the 465 participants were from the Meherpur District in Bangladesh. Out of this number, 255 children participated in the LB intervention and 210 children participated in the comparison group. The participants attended schools that received either the Literacy Boost intervention or Save the Children’s Basic Education Sponsorship intervention, which for the purposes of the study served as the comparison schools. The reading assessment data collected consisted of letter knowledge, single word reading, decoding, fluency, accuracy, listening comprehension, and reading comprehension. Participants who were able to read five words correctly in 30 seconds were referred to as readers. The assessors multiplied the number of words read correctly in 30 seconds by two to calculate the reading rate. Participants who could not read five words correctly in 30 seconds were referred to as nonreaders. The protocol for nonreaders involved listening to the passage and responding to the same set of comprehension questions as the readers. The researchers point out that the baseline determined the 75th percentile as a benchmark for each assessment measure. For fluency, they identified 42 correct words per minute as the benchmark. Nath and his colleagues state that the difference between the fluency baseline and endline scores of the participants in the Literacy Boost intervention group and the comparison group was not statistically significant. At baseline, 83% of the LB intervention group could read five words in 30 seconds. At endline, 88% of the LB intervention group could read five words in 30 seconds.
At baseline, 85% of the comparison group could read five words in 30 seconds. At endline, 83% could read five words in 30 seconds. Both groups started at approximately 29 correct words per minute and nearly doubled their reading rate in the endline assessment. Both groups therefore exceeded the established benchmark of 42 correct words per minute. In the reading accuracy measure, 70% of the LB intervention group could accurately read a grade level passage at baseline. At endline, 81% of the LB intervention group could accurately read a grade level passage. At baseline, 73% of the comparison group could accurately read a grade level passage. At endline, 78% of the comparison group could accurately read a grade level passage. Neither group met the benchmark goal of 92% in the reading accuracy measure (Nath et al., 2013). In reading comprehension, the participants from both groups were able to correctly answer 1.5 out of 5 comprehension questions based on a grade level passage at baseline. At endline, participants in both groups were able to correctly answer approximately three out of five comprehension questions. Both groups exceeded the benchmark goal of responding correctly to two out of the five comprehension questions (Nath et al., 2013). The researchers developed another way to analyze the comprehension data based on a composite measure. They explain that the composite measure entailed reading grade level passages and correctly answering 75% to 80% of the comprehension questions while also scoring a minimum of one standard deviation below the corresponding average fluency or average accuracy. While few of the participants from the LB intervention and comparison groups could read with comprehension at baseline, approximately 37% of the participants from both groups were able to do so in the endline assessment.
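The reader classification and rate calculation described in the Literacy Boost protocol can be sketched as follows. The five-words-in-30-seconds threshold and the doubling rule come directly from Nath et al. (2013); the function name is illustrative.

```python
# Sketch of the Literacy Boost (Nath et al., 2013) reader classification
# and reading-rate calculation: a child who reads at least five words
# correctly in 30 seconds is a "reader", and the 30-second count is
# multiplied by two to estimate correct words per minute (cwpm).

READER_THRESHOLD = 5  # correct words in 30 seconds (from the study)

def classify_and_rate(correct_words_in_30s: int):
    """Return (is_reader, cwpm) for one participant's 30-second count."""
    is_reader = correct_words_in_30s >= READER_THRESHOLD
    cwpm = correct_words_in_30s * 2  # extrapolate 30 seconds to a minute
    return is_reader, cwpm

# A child reading 21 correct words in 30 seconds is a reader at 42 cwpm,
# exactly the fluency benchmark the baseline set at the 75th percentile.
print(classify_and_rate(21))  # (True, 42)
```

Note that doubling a 30-second count assumes a constant pace over a full minute, which tends to overstate the rate of readers who tire or slow on later, harder words.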
Nath and his colleagues conclude that in general the impact of Literacy Boost shows modest gains for fluency and comprehension. While struggling students demonstrated marked improvement from the baseline to the endline assessment, overall reading scores did not improve significantly in the LB schools. In order to contextualize the findings and identify correlations, the authors considered other contributing factors such as school attendance, reading buddy participation, lack of a home literacy environment, reading camp attendance, and students’ workloads. The establishment of correlations to a confluence of factors reveals the complexity of the challenge in promoting children’s Bangla reading fluency. The situation is further compounded by a lack of national level fluency benchmarks for reading in the children’s first language. Basher, Jukes, Cooper, and Rigole (2014) confirm that while small scale studies are conducted by local and international organizations, the reality is that a standard for oral reading fluency does not exist in Bangla. Room to Read’s (RtR) study in the report Bangla Reading Fluency in Early Grades had a twofold purpose. An initial purpose was to collect impact data on the organization’s Reading and Writing Instruction (RWI) program in government primary schools. A second purpose was to collect data on children’s early grade reading in schools around the country in order to compare these to the average fluency rates and comprehension skills of schools where Room to Read implements its RWI program. The researchers randomly selected 30 first grade students and 30 second grade students from 30 RtR supported government primary schools in Sirajganj District. The actual sample size consisted of 809 first graders and 791 second graders, for a total of 1,600 participants from the RtR supported schools.
For the nationally representative sample, the researchers randomly selected 1,630 first graders and 1,464 second graders from 84 schools representative of the twelve primary streams, for a total of 3,094. For both samples, the researchers attempted to select an equal number of boys and girls. The total sample size for the study was therefore 4,694 students from 114 schools. The Early Grade Reading Assessment (EGRA) tool was customized to fit the linguistic features of the Bangla language, and standard EGRA testing procedures were implemented. The study did not include baseline data (Basher et al., 2014). The findings show that first graders in RtR supported schools could correctly read 18.99 words per minute and provide correct responses to 1.38 out of five comprehension questions at the end of the year. First graders in comparison schools from around the country could correctly read 15.51 words per minute and provide correct responses to 0.98 out of five comprehension questions at the end of the year. The findings reveal that second graders in RtR supported schools could correctly read 41.35 words per minute and provide correct responses to 2.16 out of five comprehension questions. Second graders in comparison schools from around the country could correctly read 33.44 words per minute and provide correct responses to 1.79 out of five comprehension questions. While Basher and his colleagues report that the effect sizes for the RWI intervention ranged from 0.34 to 0.53 and were greater for reading comprehension, the above findings clearly indicate that students in Room to Read supported schools also performed better in oral reading fluency than students from comparison schools around the country. However, even with the RWI intervention, 20% of first graders and 6% of second graders could not read at the end of the year.
Further cause for concern is that across the country, approximately 32% of students in first grade and 16% of students in second grade could not read. Comprehension scores for Room to Read supported schools and countrywide schools were also very low. Based on these findings, the authors suggest that students in primary school would benefit from a supplementary Bangla literacy intervention. They state that oral reading fluency should serve as an “indicator of the quality of education in the early grades” (p. 40). The authors conclude that further large-scale, countrywide research should collect data on oral reading fluency rates across streams and divisions in primary schools. This would yield a more robust nationally representative sample and support the process of setting benchmarks for oral reading fluency within and across the primary grades. Sayed, Guajardo, Hossain, and Gertsch (2014) conducted a baseline survey report on primary schools in Bangladesh to identify children’s performance in the READ project’s intervention areas. A second objective of the baseline assessment was to benchmark the basic reading skills of students in grades 1, 2, and 3. These benchmarks would then be compared to the results of another assessment of children’s basic reading skills administered later in the academic year, soon after the implementation of the READ intervention. The researchers also collected background data to highlight the factors that may affect students’ reading acquisition process in the primary grades. The data collection phase lasted from March to June 2014.
In the study, 30 schools were selected for the control group; 39 schools that received support from Promoting Talent through Early Education (PROTEEVA), a pre-primary intervention, were selected for the second group; and 32 schools that received both the PROTEEVA and READ interventions were selected for the third group. The total number of schools from 21 districts that participated in the study was 101. The total number of participants was 3,008, consisting of 1,004 first graders, 1,003 second graders, and 1,001 third graders (Sayed et al., 2014). Specific grade level competencies were assessed, but for the purpose of this paper, the main findings relevant to the oral reading fluency, accuracy, and comprehension subtests will be discussed. For the oral reading fluency and reading comprehension assessments, the first grade participants read a grade level short story comprised of 59 words. First grade participants who were able to read a minimum of five words in the first 30 seconds were referred to as readers. After reading the story, these students responded to ten comprehension questions. The comprehension questions consisted of literal, inferential, summary, and evaluative questions. The assessors read the story to the first grade participants who were unable to read five words in the first 30 seconds. These participants then responded to the same ten comprehension questions. The main findings reveal that only one out of ten participants in the first grade was a reader. On average, first grade readers could accurately read approximately 46 out of the 59 words in the story. First grade readers could therefore read with 79% accuracy. In the oral reading fluency measure, first graders could read on average 16 words per minute. In the comprehension measure, first graders on average correctly responded to 4 out of 10 questions.
Overall, evaluative, inferential, and summary questions posed the most difficulty. In the comparative analysis, READ supported schools and READ plus PROTEEVA supported schools showed similar results in fluency, accuracy, and comprehension. While there was no statistical difference in fluency and reading accuracy across the three groups, first graders in the control schools did not perform as well in reading comprehension as those in the READ supported schools and the READ plus PROTEEVA supported schools (Sayed et al., 2014). The second grade fluency assessment consisted of students reading a grade level passage made up of 83 words. Students who could correctly read a minimum of five words in the first minute were referred to as readers. After reading the story, these students responded to ten comprehension questions. The findings show that second graders across the three types of schools correctly read an average of 23 words per minute. One third of second graders could read five words correctly in the first minute, with an average 83% accuracy. In reading comprehension, second graders scored an average of approximately 25%. Similar to the first graders, the second graders had difficulty responding to inferential, summary, and evaluative questions. In the comparative analysis, the statistical difference for fluency, accuracy, and comprehension between second grade students in READ schools, READ plus PROTEEVA schools, and control schools was not significant (Sayed et al., 2014). The third grade fluency assessment consisted of students reading a grade level passage made up of 122 words. Students who correctly read a minimum of five words in the first 30 seconds were referred to as readers. After reading the story, these students responded to ten comprehension questions.
The findings show that third graders across the three types of schools correctly read an average of 28 words per minute. Two thirds of third graders could read five words correctly in the first 30 seconds, with an average 83% accuracy. The average reading comprehension score for third graders was approximately 16%. Forty-three percent of third graders from the three types of schools could not respond correctly to any of the comprehension questions. As in the findings for first and second grade, third graders found the inferential, evaluative, and summary questions the most difficult. In the comparative analysis, the statistical difference for fluency, accuracy, and comprehension between third grade students in READ schools, READ plus PROTEEVA schools, and control schools was not significant (Sayed et al., 2014). The researchers acknowledge the stark contrast between rising oral reading fluency rates and decreasing comprehension across the grades. The authors recommend adjustments to the intervention such as intensive instructional support targeted to the areas of letter knowledge, phonemic awareness, and comprehension. They also recommend professional development on the use of formative assessments and additional reading materials for children. The authors conclude that adjustments made based on the findings from the baseline survey and data collected from the scheduled endline assessment will further inform the impact of the intervention. An important conclusion that Sayed and his colleagues draw from the findings in the baseline survey report is that while some children in grades 1, 2, and 3 are nearly “on pace” for their respective grade level, a large number of children lag behind and simply have not learned to read.
The in-country fluency trends described in the studies above indicate an urgent need to establish fluency and comprehension as critical indicators of both quality and equity in education. National norms for oral reading fluency performance have proven to serve as powerful indicators of reading competence in the early grades (Hasbrouck & Tindal, 2006). The development of national fluency benchmarks in Bangla for the primary grades is a step toward this goal. Common grade level standards of attainment for oral reading fluency can help stakeholders assess and determine where students in every primary school are in relation to the standard and provide them with the necessary scaffolds to meet the benchmark goal by the end of the academic year (Hasbrouck & Tindal, 2005). While local institutions and international organizations conduct research and provide varied intervention programs that aim to identify and address the gaps in children’s basic literacy skills, the creation of national standards for oral reading fluency in Bangla would further support the efforts of education sector stakeholders toward the goal of teaching children to read with fluency and comprehension. In order to move closer toward this goal, it is important to think deeply about the language-specific and universal features of the reading acquisition process that lend themselves to a better understanding of the relationship between fluency and comprehension in the context of the Bangla language. The next section discusses some of the universal features of the reading acquisition process and explores the intersection between fluency and comprehension.

The Relationship between Fluency and Comprehension

Fountas and Pinnell (2006) describe the universal characteristics of fluency as multidimensional, requiring four levels of processing: the letter or symbol, the word, the phrase and sentence, and the entire text.
As children learn to read, there are key processing mechanisms that occur at each level. At the level of the letter or symbol, children use visual information to note the differences between letters, such as size and shape. Children who attain letter fluency are able to name a letter and its sound(s). They notice that these letter symbols connect to form words. At the word level, children notice that words come in varied lengths and that words hold meaning. They also use picture cues in the text to figure out words. Children also begin to recognize that they have seen some words before. Familiar words and word parts help children figure out how to read new words or word parts in a sentence. As they practice new and familiar words, children learn that a continuous string of words is connected and holds layers of meaning. At the phrase and sentence levels, children notice how punctuation and the implicit grammatical rules of a given language help them to visually segment parts of the sentence in a way that makes sense. They begin to use everything they know about sentence structure and vocabulary from their formal and informal experiences with speaking and listening, as well as from stories read to them, in order to practice the flow of reading written language. In their reading, children begin to practice such elements of fluency as tone and inflection in a way that supports understanding of the text. At the level of the text, children’s growing awareness of the author’s tone and of how written language works in terms of the structures of fiction and nonfiction texts helps them to process the text more fluently. By looking at fluency from the perspective of these four levels of processing, it is evident that children engage in different forms of meaning making at each level.
On the link between fluency and comprehension at the level of the text, Fountas & Pinnell (2006a) aptly point out: "Ultimately, the reader must use comprehension itself to support fluency…comprehension and fluency are intricately and intercausally connected. Each benefits from and influences the other. They are, in fact, parts of the whole act of reading—the complex processing that readers do—and they are extremely hard to separate. Readers use the structure, or organization, of the text, as well as their background knowledge, to support both comprehension and fluency" (p. 67). It is clear that more than fluency is required to read with understanding (Nag, Chiat, Torgerson, & Snowling, 2014). In the classroom, the more children are engaged in explicit, scaffolded instruction and provided with many opportunities to practice decoding at each level of fluency processing, the better the chances that they will read with fluency and increased comprehension. In this process, oral language development plays a critical role in learning to read with fluency and understanding. Children's oral language development is the building block for the development of print concepts, phonological awareness, phonemic awareness, letter or akshara symbol knowledge, symbol-sound relationships, as well as vocabulary and writing (National Institute of Child Health and Human Development, 2000, 2005; Nag & Snowling, 2011). Fountas and Pinnell (2006) explain how oral language development plays a crucial role in oral reading fluency and comprehension. They elaborate that children use what they know about how spoken language sounds and the nuanced meanings of words they acquire through their experiences with talk in school, at home, and in the wider community. Part of oral language development therefore includes the acquisition of vocabulary through talk.
This is in line with Nation and Snowling's (2004) research, which found that the more oral vocabulary words children know, the easier it is to draw on phonological awareness and comprehension to recognize words in print, thereby improving reading fluency. A teacher's challenge is to bridge children's implicit syntactic and semantic knowledge as well as the structural, visual, and meaning cues within the text to learn to read with fluency and understanding. The authors highlight the six dimensions of fluency: pausing, phrasing, stress, intonation, rate, and integration. Table 1 below shows Fountas and Pinnell's (2006) definitions of each dimension of fluency.

Table 1. Six Dimensions of Fluency

Pausing refers to the way the reader's voice is guided by punctuation.

Phrasing refers to the way readers put words together in groups to represent the meaningful units of language. Phrased reading should sound like oral language, although more formal. Phrasing involves pausing at punctuation as well as at places in the text that do not have punctuation.

Stress refers to the emphasis readers place on particular words (a louder tone) to reflect the meaning, as speakers would do in oral language.

Intonation refers to the way the reader varies the voice in tone, pitch, and volume to reflect the meaning of the text—sometimes called "expression."

Rate refers to the pace at which the reader moves through the text. An appropriate rate moves along rapidly, with only a few slowdowns, stops, or long pauses to solve words. If a reader has only a few short pauses for word solving and picks up the pace again, look at the overall rate. The pace is also appropriate to the text and the purpose of the reading: not too fast and not too slow.

Integration involves the way the reader consistently and evenly orchestrates pausing, phrasing, stress, intonation, and rate. The reader moves smoothly from one word to another, from one phrase to another, and from one sentence to another, incorporating pauses that are just long enough to perform their function. There is no space between words except as part of meaningful interpretation. When all the dimensions of fluency (pausing, phrasing, stress, intonation, and rate) are working together, the reader will be using expression in a way that clearly demonstrates understanding of the text, and even thinking beyond the text.

Source: Fountas & Pinnell (2006a)

When children increasingly attain fluency at the levels of the letter or akshara, word, phrase, and sentence, they further benefit from instructional support and ample time to practice reading and integrating the six dimensions of fluency with grade-level texts. It is clear from this perspective that there is more to oral reading fluency and how children process texts than measures of accuracy and speed. At each of the levels of fluent processing, it is important to tap into children's repertoire of resources, such as background knowledge gained from personal experiences (Calkins, 2000), for comprehension and to support their ability to build an inner control of the reading process over time (Clay, 1991). With regard to the process of building this inner control, Moore and Lyon (2005) state that "children who read slowly, word-by-word, with little expression, have difficulty comprehending and remembering what they read. It is this connection to comprehension that makes fluency most critical" (p. 53). When children struggle to read and require additional support in the process of building an inner control of the reading process, an analysis of children's reading behaviors can provide valuable data about the sources of information that they use to decode and make meaning.
In the case of Bangla, syllable processing is a strong predictor of fluency and comprehension (Nag et al., 2014a; Nag et al., 2014b). According to Fountas and Pinnell (2009), sources of information on children's reading behaviors are usually classified into three categories: meaning information, which may include reading and understanding pictures as well as words; structural information, which involves the arrangement of words and sentences in a given language; and visual information, which includes text structures, symbol-sound relationships at the level of phonemes and syllables, as well as spaces and diacritical marks. Teachers may use running records as a tool to monitor, assess, and adjust instruction based on these three categories (Moore & Lyon, 2005). By conducting error analyses within and across these categories, it is easier to understand the individual and collective profiles of readers and describe what is happening in the intersection between fluency and comprehension. An important point to consider in this intersection is that even when children are still in the process of learning to read, they can develop comprehension skills. While fluency and comprehension overlap in some ways and are interdependent, children have the capacity to tell and listen to stories, as well as read pictures, in order to ask questions, make predictions, inferences, and connections, identify important ideas, and synthesize as evidence of critical thinking and comprehension (Moore & Lyon, 2005). Children may therefore comprehend a text even if they cannot process it through the written language. Alternatively, they may also use their decoding skills to read fluently yet not understand a word of what they have read because what they have just read does not stay long enough in working memory for it to actually make sense.
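The within-and-across-category error analysis described above can be sketched in code. The minimal Python sketch below assumes a hypothetical running record in which each error is tagged with the cue-source categories the reader appeared to use (M for meaning, S for structure, V for visual); the function name and data format are illustrative and not part of any published running-record tool.

```python
from collections import Counter

def summarize_errors(errors):
    """Tally how often each cue source (M, S, V) appears across a reader's
    errors. `errors` is a list of strings of category codes, e.g. ["MV", "S"].
    Returns a Counter mapping each category code to its frequency."""
    tally = Counter()
    for error in errors:
        for code in error:
            tally[code] += 1
    return tally

# Hypothetical running record: five errors, each tagged with cue sources.
record = ["MV", "S", "MSV", "V", "V"]
print(summarize_errors(record))  # Counter({'V': 4, 'M': 2, 'S': 2})
```

A tally like this makes it easy to see, for one reader or a whole class, which sources of information dominate the error profile.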
Anyone who listens to children who read in this way and engages them in a discussion of what was read can attest that what was read aloud with fluency was a string of words unattached to meaning. This is how expression, another important characteristic of oral reading fluency, can offer clues to the link between fluency and comprehension. In order to process a written text with understanding, children must attend to the "visible" sources of information described above as well as to "invisible" sources of information (Fountas & Pinnell, 2006). These include various types of knowledge, such as knowledge of how language works, knowledge of concepts and facts, knowledge derived from personal, social, and cultural experiences, and knowledge of the characteristics of written texts such as genres and story elements (Calkins, 2000; Fountas & Pinnell, 2006; Gonzalez, 2005). Fountas and Pinnell (2006) posit that even when children develop the decoding skills needed to read with fluency, they are continually challenged by texts across grade levels to maintain oral reading fluency as part of a network of processing systems that occur at the same time during reading. This network of strategic actions portrays reading as thinking, a network in which readers think within the text, about the text, and beyond the text in order to make meaning. Figure 1 illustrates how the maintenance of fluency and other strategic actions support comprehension.

Figure 1. A Network of Processing Systems for Reading
Source: Fountas & Pinnell (2006a)

Moore and Lyon (2005) further support this expanded description of oral reading fluency. They argue that past perspectives on fluency narrowly conceived of it as consisting of two components, namely an appropriate rate and accuracy. While reading rate can serve as an indicator of comprehension, the authors recommend lessening the emphasis on speed in the early grades and assessing it after first grade for English. Some reasons the authors give are that emergent readers are still practicing directionality and that automaticity usually sets in later because children are slowly acquiring a bank of sight words. This should be considered in the case of Bangla, given that the research studies previously cited in this report note that emergent readers need more time to attain automaticity due to the visually complex features of the alphasyllabic script (Nag, 2007, 2011; Nag & Sircar, 2008; Nakamura, 2014). In terms of accuracy, researchers hold that in order to maximize comprehension children should read 9 out of every 10 words correctly (Moore & Lyon, 2005; Clay, 2000). However, there are instances in which children may read many words correctly per minute and still not understand what was read due to issues with working memory (Abadzi, 2012). Another point to keep in mind regarding the relationship between fluency and comprehension is that texts used to practice learning to read must be "just right," neither too easy nor too hard; otherwise, fluency and comprehension are compromised (Fountas & Pinnell, 2001). The creation of oral reading fluency benchmarks that are aligned with comprehension benchmarks can greatly inform the publication of children's books that scaffold the cognitive demands of texts through a balance of challenge and support along a continuum within and across grades (Fountas & Pinnell, 2006b). Fountas and Pinnell explain that children's books have certain text characteristics that make them accessible, with just the right amount of sentence complexity to support children's learning to read and reading to learn.
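The "9 out of every 10 words" accuracy criterion is a simple calculation, sketched below in Python. The function names and the two classification labels are my own illustrative choices, not terminology from Moore and Lyon or Clay.

```python
def accuracy_rate(words_read, errors):
    """Fraction of words read correctly in a passage."""
    return (words_read - errors) / words_read

def text_difficulty(words_read, errors, threshold=0.90):
    """Classify a reading against the 90% accuracy criterion discussed
    above. The labels are illustrative placeholders."""
    if accuracy_rate(words_read, errors) >= threshold:
        return "meets criterion"
    return "below criterion"

print(text_difficulty(100, 8))   # 92% accuracy -> meets criterion
print(text_difficulty(100, 15))  # 85% accuracy -> below criterion
```

A teacher could apply such a check to decide whether a given text sits in the "just right" band for a particular reader.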
Some of the text characteristics noted by the authors are: genre, text structure, content, themes and ideas, language and literary features, syntactic and semantic word and sentence complexity, word length and frequency, genre-specific illustrations, print layout, sentence length, font size, and number of pages. These characteristics play a critical role in the way children learn to process texts and ultimately make meaning. However, in contexts where only textbooks are used to teach children how to read, it is challenging to meet children where they are in their reading acquisition process, because textbooks are often written at one grade-specific reading level rather than a range of reading levels for a given grade. There is therefore often a mismatch between where individual children are in the reading acquisition process, knowledge and understanding of the cognitive demands of different types of texts, the availability of a range of successive text level gradients to scaffold reading instruction, and the language-specific differentiated teaching approaches required to move children along each stage of the reading continuum. Fortunately, the fact that low-income countries have little to no access to children's books is slowly changing. Local and international organizations are working with ministries of education to develop decodable and leveled texts in many languages. Refer to Appendix A for a sample decodable text in the Bangla language created by Room to Read. The United States Agency for International Development (USAID) delineated a simple framework that is universally applicable in the creation and leveling of books, based on criteria that move beyond readability formulas (Davidson, 2013). In effect, children's books can then be matched to readers at different stages of the reading acquisition process.
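Two of the measurable characteristics listed above, sentence length and word length, can be computed directly from a text. The sketch below is a minimal illustration assuming plain sentence-final punctuation; real leveling frameworks weigh many more criteria than these two.

```python
import re

def text_features(text):
    """Compute average sentence length (in words) and average word length
    (in characters) for a passage. Splits sentences on ., !, and ? only,
    which is a simplification."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [w.strip(".!?,") for w in text.split()]
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return avg_sentence_len, avg_word_len

sample = "The cat sat. The cat sat on the mat."
print(text_features(sample))
```

Features like these could feed into a leveling heuristic, but assigning actual level cutoffs would require the kind of criteria-based framework Davidson (2013) describes.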
Teachers could assess children's reading behaviors in terms of the types of decoding errors made, oral reading fluency rate, and literal and inferential comprehension. Based on a triangulation of these data, teachers could then select texts that reflect the appropriate balance of challenge and support and adjust explicit instruction accordingly. While Rasinski (2010) agrees that the more children are able to read fluently, the more cognitive space is left open to focus on comprehension, he warns that assessment tools like DIBELS must be careful not to "craft a de facto and reductionist definition of fluency—a rate attenuated by accuracy" (p. 8). Rasinski highlights the intercausal relationship between reading rate and comprehension when he argues that although reading rate may provide a window into children's decoding skills, vocabulary and comprehension also impact reading rate. The author uses the term "meta-fluency" to describe the need to create assessments and instructional methods that help children build the inner control "of the elements of fluency—accuracy, rate, and expression—to the end of comprehending what they read to become fluent readers" (p. 8). With regard to oral reading rate, Abadzi (2012) explains that the visual complexity of alphasyllabic scripts, in addition to akshara combinations, takes up working memory, so children take longer to decode longer words, which in turn influences reading rate. This is why it is important to expose children to print in and out of school. This exposure will sharpen their ability to recognize familiar words, decode new words, strengthen automaticity, and read with expression. By expression, Rasinski specifically means children's ability to read smoothly in phrases while also communicating the intended meaning through nuances in the tone of voice.
Moreover, Rasinski (2010) challenges the notion of fixed oral reading fluency norms used to benchmark at each grade level. He holds that children can potentially read a fluid range of words in one minute within and across grades and that this range can be influenced by many factors. For instance, he wonders how text genres, quality children's books, levels of text difficulty, and reading for longer stretches of time impact the number of words read correctly per minute and, ultimately, comprehension. In an alphasyllabic language, inflections are a form of expression that can signify grammatical categories. Nag and Snowling's (2011) research on reading comprehension, decoding skills, and oral language further supports the notion that expression plays a critical role in fluency and comprehension. In the study, the researchers collected data on the predictors of reading comprehension in a group of 95 Kannada-speaking children from 12 schools in India. In addition to a reading comprehension test, the participants were assessed on vocabulary and inflectional knowledge. In the portion that assessed inflection, the participants "were asked to repeat a set of ten sentences differing in length, with longer sentences comprising more substantive words and inflections but simple syntax, to reduce demands on syntactic knowledge. Knowledge of inflection was estimated based on the number of omissions or substitutions of inflections made" (p. 93). In that portion of the study, the findings revealed that knowledge of inflection was an independent factor that influenced reading comprehension. According to the authors, this indicates that the more inflections there are in a language, the more children have to learn about the morphological parts of words so they can make meaning as they read.
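A crude version of the scoring rule quoted above (counting omissions and substitutions of inflections in a sentence-repetition task) could look like the following sketch. The Bangla-like suffixes and the per-slot list format are invented purely for illustration; they do not reproduce Nag and Snowling's actual materials.

```python
def inflection_errors(target_inflections, repeated_inflections):
    """Count inflections that were omitted or substituted when a child
    repeated a sentence. Inputs are aligned per-slot lists of suffixes;
    None marks an omission."""
    errors = 0
    for target, produced in zip(target_inflections, repeated_inflections):
        if produced is None or produced != target:
            errors += 1
    return errors

target = ["-ke", "-ra", "-te"]   # hypothetical inflection slots in a sentence
produced = ["-ke", None, "-e"]   # one omission, one substitution
print(inflection_errors(target, produced))  # 2
```

The error count, summed over a set of sentences, would stand in for the estimate of inflection knowledge the study describes.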
They recommend explicit instruction on low- and high-frequency inflections to support reading comprehension. The section above explored the intercausal relationship between fluency and comprehension. It described how fluency plays a pivotal role when children learn to read as well as when they read to learn. The section also discussed the complexity of the reading process in terms of the importance of oral language development and its connection to phonological awareness, vocabulary acquisition, accuracy, reading rate, expression, and comprehension. The next section discusses oral reading fluency benchmark making procedures in the context of a study conducted in Ethiopia.

Ethiopia: A Sample Oral Reading Fluency Benchmark Study

Among the many languages spoken in Ethiopia, Amharic, Tigrigna, and Hararigna use an alphasyllabic script called a fidel (Piper, 2010; Nakamura, 2014). In 2010, the Ethiopian Ministry of Education (MOE) partnered with Research Triangle Institute International (RTI) to conduct an Early Grade Reading Assessment (EGRA). The portion of the EGRA that measures oral reading fluency was used to develop oral reading fluency benchmarks in several mother tongues. While languages that use an alphabetic script were also included in the study, for the purpose of this report the benchmark making procedures will be discussed through the lens of Amharic, Tigrigna, and Hararigna, since these are written in an alphasyllabic script. The RTI research team reviewed the analysis and findings from the MOE's country-based learning assessment reports to inform the adaptation of the EGRA tool for each regional language. The team also analyzed the Ethiopian MOE's minimum learning competencies for grades 1 to 4 to ensure that the EGRA tasks aligned with the basic skills needed to meet the respective learning goals (Piper, 2010).
In order to adapt the EGRA instrument, RTI researchers worked with local language experts to develop certain subtasks, with consideration given to textbooks in grades 2 and 3. They also held in-country workshops and invited local and international experts from various entities to support the adaptation process. The RTI team trained assessors, piloted the adapted EGRA tool, and analyzed the results from this data to make any needed changes to the subtasks. Then, the RTI researchers and local experts from the MOE used specific sampling methods to ensure "regional representativeness." It took approximately six weeks to collect data. During the analysis stage, RTI researchers compared only zero scores across languages, since the languages are quite different from each other. Piper (2010) points out that while oral reading fluency benchmarks are language-specific, "the U.S. and international benchmarks do shed some illustrative light on where Ethiopia is in the area of reading" (p. 21). RTI therefore used 60 words per minute (WPM), the "absolute lowest benchmark for reading difficulties in the U.S. as well as the number of children who were reading zero words," to gauge the percentage of children not meeting the benchmark in each regional language (p. 21). The next step was to identify the percentage of children that did not meet the benchmark by grade level and region. The researchers compared word reading fluency scores across grades, rural and urban regions, and languages, and identified language-specific gaps. The findings show that less than 10% of the participants met the benchmark of 60 WPM in any of the regional languages. The researchers also identified variations in the zero scores across regional languages from grade 2 to grade 3.
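The core calculations behind this kind of analysis (words correct per minute from a timed reading, and the shares of children below the benchmark or scoring zero) can be sketched as follows. The data and function names are illustrative only; this is not RTI's actual analysis code.

```python
def wcpm(words_correct, seconds):
    """Words correct per minute from a timed passage reading."""
    return words_correct * 60 / seconds

def benchmark_summary(scores, benchmark=60):
    """Share of children below a WPM benchmark, and share with zero
    scores, mirroring the comparison described above.
    `scores` is a list of per-child WPM values."""
    n = len(scores)
    below = sum(1 for s in scores if s < benchmark) / n
    zeros = sum(1 for s in scores if s == 0) / n
    return below, zeros

# Illustrative scores for eight children.
scores = [0, 12, 35, 58, 61, 72, 0, 44]
print(benchmark_summary(scores))  # (0.75, 0.25)
```

Run per language and per grade, summaries like this would yield the percentage-below-benchmark and zero-score comparisons the Ethiopia study reports.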
In the analysis of accuracy, Piper (2010) explains that due to the alphasyllabic script of the Amharic, Tigrigna, and Hararigna languages, "the ability to read words accurately is not likely to differ from the ability to read the fidel accurately" (p. 33). This supports findings from other studies that children learning to read in an alphasyllabic orthography need more time to learn the greater number of symbol-sound relationships (Nag, 2007; Nag et al., 2014). Accuracy was compared at the fidel and word levels. The relationships between word naming fluency, decoding fluency, oral reading fluency, and predictive factors such as student-, school-, and family-level factors were also analyzed. All of the data were carefully analyzed in order to create basic oral reading fluency benchmarks. According to Piper (2010), the following statistical methods were applied to arrive at an initial set of benchmarks: "First, quantile regression methods are used to show potential markers for oral reading fluency scores. Second, analysis of the average reading scores for schools in the lowest 25th percentile of wealth variables is used to show that schools in poor areas can do quite well in oral reading fluency. Third, scatter plots matching oral reading fluency and reading comprehension scores are presented to investigate the fluency levels necessary to ensure high levels of reading comprehension. Fourth, multiple regression results are used to determine the levels of fluency for the expected levels of reading comprehension" (p. 40). As part of the analysis of the statistical findings, each regional language group participated in workshop meetings to mutually decide on the draft fluency and comprehension benchmarks based on the current oral reading fluency and comprehension scores.
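As a toy illustration of the fourth step quoted above (using regression to find the fluency level associated with an expected comprehension level), the sketch below fits a simple least-squares line and inverts it. The data are fabricated for illustration, and the single-predictor model is far simpler than the quantile and multiple regressions Piper describes.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit: ys ~ a + b * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

def fluency_for_comprehension(xs, ys, target=0.80):
    """Invert the fitted line to estimate the fluency (WPM) at which
    expected comprehension reaches `target` (a proportion)."""
    a, b = fit_line(xs, ys)
    return (target - a) / b

# Illustrative, fabricated data: WPM vs. comprehension proportion.
wpm = [10, 20, 30, 40, 50, 60]
comp = [0.1, 0.25, 0.4, 0.55, 0.7, 0.85]
print(round(fluency_for_comprehension(wpm, comp), 1))  # 56.7
```

The estimated crossing point would then be one input, alongside scatter plots and quantile results, to the workshop discussions that set the draft benchmarks.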
Piper adds that while all language groups set a minimum comprehension rate of 80%, the differences in reading comprehension benchmarks in each language depended on the oral reading fluency targets required to read with understanding. The proposed oral reading fluency benchmark was 60 WPM for Tigrigna and Hararigna. The two regions where Amharic is spoken proposed 60 WPM and 90 WPM, respectively. Piper's (2010) study presents the initial steps taken to determine language-specific oral reading fluency benchmarks in the Ethiopian context. It used 60 WPM, the lowest benchmark used in the U.S., as a frame of reference, with the consideration that reading benchmarks from a different language spoken in another country normally do not apply in other contexts. Interestingly, in the regions of Ethiopia where Afan Oromo and Sidaamu Afoo, languages written in an alphabetic script, are spoken, the proposed oral reading fluency benchmarks were 70 WPM and 75 WPM, respectively. The fact that two regions where Amharic is spoken set the benchmark lower (60 WPM) and higher (90 WPM) brings to the fore the importance of ongoing discussion during the decision making process about oral reading fluency and comprehension data trends, what language-specific levels are needed for high comprehension, rationales surrounding the approximations of fluency and comprehension rates, and the pedagogical implications once benchmarks are set. These varied approximations also illustrate how stakeholders took into account context-specific data and engaged in deliberation on the levels of fluency needed for high comprehension. At the time of the study in 2010, extremely low percentages of children meeting the proposed benchmarks, ranging from 0.1% to 4.3%, indicated that children from all regions of Ethiopia were far from doing so.
Therefore, part of the fluency benchmark making process included the creation of different target percentages for each region in Ethiopia to be reached by 2015. While the case of Ethiopia illustrates how the development of oral reading fluency benchmarks relied on the lowest benchmark used in the U.S. as a reference point and was open to opposing views on the appropriate benchmark for Amharic, Abadzi (2012) extends the debate over reading rate comparisons across languages. She explains that due to the differences in scripts, some believe that reading rates are bound to be language-specific. Others hold that oral reading fluency rates are in fact comparable across countries regardless of language or script and that similar and effective intervention approaches could therefore be implemented. Abadzi argues that, "Across several languages and scripts 45-60 words per minute amount to 80% comprehension when vocabulary is known and point to automaticity" (p. 13). While frames of reference are helpful, it is ultimately up to stakeholders to decide how oral reading fluency benchmarks are to be established. In a report on reading fluency measurements in Education for All and Fast Track Initiative partner countries, Abadzi (2011) reports: "At least 50 reading fluency studies had been [conducted] worldwide by February 2010. Many studies involved from 800 to about 3,000 students, but few have collected nationally representative data. Most focused on specific regions or excluded remote areas, and a few involved small convenience samples. Of the studies, many involved EGRA or similarly detailed instruments, while others involved just passage reading and comprehension questions…Some studies focused on just one grade, and different single-grade studies may exist in one country with samples that are not comparable" (p. 11). In the case of Bangladesh, it will need to be decided whether adaptations of EGRA tools and procedures will be used to develop an initial set of benchmarks or whether the Save the Children Monitoring and Evaluation team and other stakeholders will opt for an alternative battery of assessments and benchmark making procedures.

The section above highlighted oral reading fluency benchmark making procedures that can be adapted to the context of Bangladesh. The next section addresses two salient themes that surfaced during in-country meetings. These themes bring together stakeholders' perspectives and immediate concerns about the current language-in-education context and pragmatic considerations for the teaching and learning of reading in Bangla in a competency-based education system. In turn, these themes underscore the critical relevance of Bangla fluency benchmarks and how they can frame further dialogue on how children best learn to read in an alphasyllabic script. The next section contains a synthesis of the notes from in-country meetings.

Considerations for Future Fluency Benchmark Studies in Bangladesh

Language Learning Context

In teacher education programs in Bangladesh, the teaching of reading as a process is not formally addressed. Teachers teach a Bangla language-based curriculum that incorporates the four modalities of literacy, namely reading, writing, listening, and speaking. While children across the country learn in Bangla, English, or Arabic and may speak a Bangla dialect at home, a large portion of instructional time is devoted to preparing students for examinations. As a result, children do not experience all the facets of literacy learning. Further, there may be discrepancies between the National Curriculum and Textbook Board (NCTB) curriculum and the way it is implemented. In READ-supported schools, students learn to read in Bangla via a whole language approach.
The students practice listening, speaking, and picture reading. Teachers expose students to the whole sentence first, then to the words that make up the sentence, and finally to the letters that make up each word. A challenge for children is to write words without vowels even though the vowels are pronounced. The inconsistency in the way that some sounds are pronounced and written, which can mean that the order is inverted, poses difficulty for students. Students therefore need time to learn the aksharas of the Bangla language. Around the eighth month of school, first graders are expected to have learned all the single letters and then begin to learn about the conjunct letters. It is at this critical point that fluency is normally stunted. Students' cognitive space is used to build the visual memory required to identify the shapes of single letters and the sounds they make. The cognitive demand deepens when students use their visual memory to identify the changes in shape when single letters are strung together to form conjunct letters and the syllable sounds these make. In order for students to learn the alphasyllabary principle, ample time and instructional support are needed to practice the implicit rules of the language. An area of contention around the notion of implicit rules for Bangla is the topic of a standard sound system. While a standard Bangla pronunciation may exist, it is not taught in a formal, systematic way. Students in different geographical areas may learn a nonstandard variety of Bangla in school that more closely reflects the dialect of Bangla spoken at home. This situation can easily give the impression that a standard Bangla pronunciation does not exist.
Consequently, students in some regions are not ready to pronounce certain words because their phonological awareness may not have been formally and systematically developed through exposure to the specific sounds required to build knowledge of letter-sound relationships.

Competencies

In Bangladesh's competency-based education system, six out of 52 terminal competencies address Bangla language learning (Chabbott, 2008). This is surprising given that reading in Bangla is required for children to learn in all other subject areas. Moreover, some consider that the competencies are not set at the right level. In a competency-based instructional approach, pre-determined outcomes are used to assess student learning (Education Watch, 2000). However, learning and teaching continua are not connected to beginning-, middle-, and end-of-year benchmarks, which would help teachers use their ongoing assessments to gauge where students are at a given point in the academic year and develop short- and long-term instructional plans to help their students meet the expected end-of-year competencies. This is further compounded by the fact that students receive approximately three hours of instruction each school day in multilevel classrooms where the student-to-teacher ratio poses an additional challenge. While the Ministry of Primary and Mass Education (MoPME) recently provided new textbooks, the annual competency measures are still relatively traditional. It therefore needs to be confirmed whether the competency measures in the new instructional materials are linked to a clear set of measures for reading. Furthermore, the competencies may use the word "fluent," but in the past they have not specified what the measures for fluency actually are.
A sample end-of-year competency is that second graders should be able to read simple stories, but findings from in-country studies indicate the contrary (Basher et al., 2014; Sayed et al., 2014). The development of fluency benchmark goals would help set clearer expectations at each stage of the Bangla reading acquisition process and inform how standards and competencies are set and measured.

Recommendations

Guidelines in the Benchmark Making Process

In any context, the development of benchmark goals for fluency is an iterative process that requires time, money, collaboration, and effort. It is therefore pertinent to think about what the READ project's particular contribution will be and how this contribution will be framed in both policy and practice. While it would seem ideal to develop specific benchmarks for the beginning, middle, and end of the year from the start, it is impractical to begin there; indeed, given the fluency trends previously discussed in this paper, it could even be disastrous. The massive amount of data that would be generated would be another deterrent, along with budgetary constraints. One approach would be to implement the benchmark assessment near the end of the academic year, right before third grade students are administered the National Student Assessment (NSA) to measure end-of-year competencies. This way the READ Team can assess whether the benchmark tool is predictive of the findings in the third grade NSA. In October 2014, the READ project's Monitoring and Evaluation Team conducted a baseline survey. These baseline findings can also inform the decision-making process on the endline benchmark tool. Throughout the process, it is important to keep in mind what levels of reading fluency are required for students to score highly on reading comprehension.
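The predictive-validity check described above can be made concrete: if the endline benchmark tool is predictive of NSA results, students' oral reading fluency scores and their NSA scores should correlate strongly. A minimal sketch follows; the scores are invented for illustration, not drawn from any study cited here.

```python
# Sketch of a predictive-validity check: correlate benchmark fluency
# scores (WCPM) with grade 3 NSA scores. All data below are invented.
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

orf_wcpm  = [12, 18, 25, 33, 41, 50]   # benchmark fluency scores (invented)
nsa_score = [35, 42, 50, 58, 70, 78]   # grade 3 NSA scores (invented)

r = pearson(orf_wcpm, nsa_score)
print(round(r, 2))  # a value near 1 would support the tool's predictive use
```

In practice this analysis would be run on the full matched sample, and a weak correlation would argue for revisiting either the benchmark tool or the choice of external criterion measure.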
The section immediately below presents a sample timeline for the benchmark making procedures. After the table, there is a more detailed discussion of the considerations and recommendations based on in-country meetings and on procedures drawn from the Ethiopia benchmark study (Piper, 2010) cited previously in this report.

Table 2. Sample Timeline for Benchmark Making Process

Procedure: Approximate duration
- READ Team initial internal and external workshop sessions to discuss the way forward; advocacy, mobilization, and collaboration with other stakeholders: 2-4 days
- Align Bangla reading competencies and the NSA tool: 2-3 weeks
- Workshop sessions to select areas within regions, examine grade level textbooks, decide on the sampling framework, and develop a language-sensitive tool: 2 weeks
- Train assessors and conduct interrater reliability tests: one month
- Pilot the benchmark tool; test the reliability and validity of subtasks: 7+ days, depending on regions and other factors
- Workshop sessions to interpret and discuss the findings from the pilot and make adjustments to the benchmark tool: 1-2 days
- Data collection phase in all regions: 1-2 months
- Workshop sessions to discuss oral reading fluency rates and comprehension scores from each region: 1-2 days
- Workshop sessions to develop proposed draft benchmarks: 4 days

Align the Benchmark Tool to the External Criterion Measure

An initial consideration is to align the benchmarking tool to the external criterion measure on which the Bangla reading competencies are based. During in-country meetings, individuals commented that the NSA is linked to learning outcomes and does not measure fluency per se. The READ project's Monitoring and Evaluation Team received a hard copy of the 2013 NSA Bangla assessment.
One member of the team noted that neither the grade-wise terminal competency tests nor the NSA measures fluency or comprehension with EGRA instruments or similar tools, and that this challenges the NSA's ability to serve as an external criterion measure. The member added that for this reason it may be a good idea to review the national reading measurements in the ASPR. Further meetings will therefore need to be held to decide whether the NSA Bangla assessment can be used as an external criterion measure or whether another assessment can better serve the purpose.

Select Regions

It is important to consider the geographic and regional spread to ensure a nationally representative sample that encompasses the linguistic and cultural diversity of the country (Piper, 2010). Once the regions are selected, a plausible approach is to administer a large-scale pilot endline benchmark assessment and conduct smaller mixed-methods studies in each region to further contextualize and interpret the findings. It is also important to include in the sample places that were not covered in previous studies, such as the Hill Tracts areas where dialects are spoken, as well as Qawmi madrasas.

Develop the Benchmark Tool

Workshops to develop the tool and assess pilot findings should be held. The involvement of many stakeholders is critical in the development of the benchmarking tool. Stakeholders from universities, language institutes, the Directorate of Primary Education (DPE), funding agencies, and other international organizations, among others, should participate. Language experts from Dhaka University and the International Mother Language Institute can help ensure that the linguistic complexity of the benchmarking tool is appropriate: that the high-frequency single letters, conjunct letters, and high-frequency words, as well as the reading passages, align with the grade-level textbook and the end-of-year competencies.
Stakeholder discussions during the workshops should decide what range of responses would be considered correct, so that meaning is not compromised in cases where the children's first language is a Bangla dialect. The grade-level expectations of the national curriculum and the NSA Bangla assessment tool can be used to calibrate the degree of difficulty of the test items. To minimize the time needed to administer the test and to ensure it is not tedious for the children, the benchmark tool should be short and simple, with a multidimensional design that efficiently measures a number of elements. The READ Team should also look at how data from the implementation of the Instructional Adjustment Tools (IAT) can inform this effort. Although the IAT does not yet measure oral reading fluency rate, it does measure other critical areas in each of the stages, so it can potentially inform the development of the benchmark tool. Another point to consider is the children's use of Bangla and whether standard Bangla or a Bangla dialect is the medium of instruction; this has implications for the selection of high frequency words. When the subtasks of the benchmarking tool are created, they should be sensitive to the particular region where the assessment is administered so that the dissonance between students' phonological awareness and their formal instruction in standard Bangla does not compromise the results. While children acquire language from informal, social contexts, they also acquire language from formal, academic contexts; classroom discourse is a combination of formal and informal language (Garcia, 2009). In terms of the selection of high frequency words, vocabulary words, and words used in the reading passages, it makes sense to adhere to the variety and range of words that appear in the grade level textbook.
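One way to operationalize "words that appear in the grade level textbook" is to count word frequencies across the textbook text and keep the most common items as candidates for the word-reading subtask. A minimal sketch follows; the sample text, function name, and cutoff are illustrative assumptions, and a real implementation would also strip punctuation and normalize spelling variants.

```python
# Sketch: rank candidate high-frequency words from a textbook corpus.
# Tokenization here is naive whitespace splitting (Bangla is space-
# delimited); punctuation handling is omitted for brevity.
from collections import Counter

def high_frequency_words(textbook_text: str, top_n: int = 50):
    words = textbook_text.split()
    return [w for w, _ in Counter(words).most_common(top_n)]

# Tiny invented sample, not taken from any actual textbook:
sample = "আমি বই পড়ি আমি স্কুলে যাই আমি ভাত খাই"
print(high_frequency_words(sample, top_n=3))  # 'আমি' ranks first (appears 3x)
```

Running this over the full grade-level textbook would give language experts an evidence-based starting list to review against dialect considerations.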
Reading passages may contain the same text and the same number of words across versions, yet remain sensitive to the specific language-learning context.

Train Assessors and Pilot the Benchmark Tool

Once assessors are trained, inter-rater reliability tests should be administered. Participants from several selected schools in each region should be assessed, and the reliability and validity of the subtasks in the benchmark tool can then be tested. Additional workshops should be held to discuss pilot findings and develop ways to improve the benchmark tool; among the stakeholders present, local language experts and others can adjust the subtasks as needed. Stakeholders involved in the development of the benchmark tool should also be involved in its piloting (Piper, 2010). Once the benchmark tool is field tested in urban and rural areas, the level of challenge in the reading passages and comprehension questions can be balanced.

Sampling and Data Collection

During in-country meetings, individuals emphasized the value of considering all the language learning contexts: standard Bangla, the dialects spoken in rural areas, and Arabic- and English-medium schools. Including as many streams as possible from each of the language communities will help ensure that the sample is representative. The issue of enrollment should be kept in mind, since there are instances in which a student is enrolled in separate classes or "ghost" students are listed on the student register. It is also important to keep in mind the dynamics involved in the selection of government schools, nongovernment schools, urban schools, and rural schools, as well as socioeconomic status.
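The inter-rater reliability tests mentioned earlier in this section can start from something as simple as percent agreement: two assessors score the same child's reading word by word, and we measure how often their correct/incorrect marks coincide. The data below are invented; a chance-corrected statistic such as Cohen's kappa would be the usual next step.

```python
# Sketch: percent agreement between two assessors scoring the same
# reading, word by word. 1 = word marked correct, 0 = marked as error.
def percent_agreement(rater_a, rater_b):
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Invented scores for a ten-word passage, one disagreement at word 5:
rater_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
rater_b = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]

print(percent_agreement(rater_a, rater_b))  # 0.9
```

During assessor training, pairs falling below an agreed threshold would be retrained before the pilot proceeds.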
Piper (2010) noted that care needs to be taken not to develop benchmarks solely from findings from wealthy schools, since this can be "problematic." He adds that it is important to include findings showing that children who attend poor schools yet receive good instruction may also achieve fluency and comprehension. Piper (2010) and his colleagues in Ethiopia developed basic oral reading fluency benchmarks for Amharic, an alphasyllabic script; the EGRA tool was adapted and representativeness was ensured. In a description of the procedure used, Piper states the following:

Similar to other national assessments such as NLA, ours did not draw a simple random sample of the population of students in each group of interest, for cost and efficiency reasons. But to enable us to make inferences about the performance of the entire population and not just those sampled, we weighted our results. Our data needed to be weighted because the sample design did not give each individual an equal chance of selection. If we did a random sample of students in Ethiopia, we would have to send the assessment teams to thousands of schools throughout the country. Instead, we grouped students within schools, schools within woredas, and woredas within regions, and corrected for this grouping using weights. (The weights increase the power of the individuals who were sampled, making them represent the estimated population within each group.) (p. 15)

The READ project team and stakeholders need to decide on a sampling framework that best suits the particular purposes and priorities that arise out of the workshop meetings. The timing of the assessment should also be considered. The data collection phase may last approximately one to two months.
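Piper's weighting logic can be sketched in miniature: under multistage cluster sampling (regions, then woredas, then schools, then students), each sampled student's weight is the inverse of their overall selection probability, so that sampled students stand in for the estimated population of their group. The probabilities below are invented; they are not drawn from the Ethiopia study.

```python
# Sketch: inverse-probability weighting for a multistage cluster sample.
# A student's overall selection probability is the product of the
# selection probabilities at each stage; the weight is its inverse.
def sampling_weight(stage_probs):
    """stage_probs: probability of selection at each sampling stage."""
    p = 1.0
    for prob in stage_probs:
        p *= prob
    return 1.0 / p

# e.g. woreda chosen with p=0.2, school with p=0.1, student with p=0.5
# (all invented): overall p = 0.01, so this student represents ~100 peers.
w = sampling_weight([0.2, 0.1, 0.5])
print(round(w, 6))  # 100.0
```

Weighted estimates then generalize from the sampled schools to the population, which is the point Piper makes about not needing to visit thousands of schools.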
Conduct Workshops to Interpret and Discuss the Findings

The READ Team, local language experts, and other stakeholders should meet to discuss the oral reading fluency rates and comprehension scores from each geographical area. An end goal of the workshop sessions should be to develop draft benchmarks based on findings representative of each regional sample. During in-country meetings, the question arose of whether or not to aggregate the results, a decision that will need to be made. Once the endline assessment is completed and third grade students' NSA results are released, it will become evident whether the benchmark tool was predictive of the findings in the third grade NSA.

Conduct Workshops to Develop Proposed Benchmarks

As mentioned earlier, the creation of fluency benchmarks is an iterative process. During one of the in-country meetings, a stakeholder mentioned that 60 WCPM (words correct per minute) would be a reasonable end-of-year oral reading fluency rate for second grade Bangla readers. In their benchmark study, Piper (2010) and his colleagues also used 60 WCPM as a benchmark minimum for second grade Amharic readers, which offers a point of general comparison. Piper (2010) implemented several statistical procedures to develop oral reading fluency benchmarks in Amharic. For example, he used quantile regression approaches to identify critical points for oral reading fluency scores. To demonstrate that students in poor regions can achieve reasonable oral reading scores, he analyzed the "average reading scores for schools in the lowest 25th percentile of wealth" (Piper, 2010, p. 40). Piper also implemented multiple regression analyses to gauge a reasonable match between fluency levels and expected comprehension levels. The author reports that once
the data were collected, quantile regression methods were used to identify critical points for oral reading fluency scores. During the workshop sessions, the READ Team and other stakeholders can reach an agreement on draft fluency and comprehension benchmarks for Bangla. An important part of the discussion should be which definition of reading comprehension rate will be used: for example, the number of questions answered correctly out of the number of questions completed, or the number answered correctly out of the total number of questions.

The Way Forward: Advocacy, Mobilization, and Collaboration

While learning how other entities develop oral reading fluency benchmarks serves as a frame of reference, advocacy work entails consensus on and ownership of the benchmark making process. The READ Team can begin to think of alliances in the form of long-term local and international collaboration that can provide the specialized technical assistance that will be needed. In selecting stakeholders, it is important to keep in mind which entities will not seek payment because they have an interest in the endeavor and which will have to be paid via contracts. Local partners such as the Campaign for Popular Education (CAMPE), the Institute for Education and Research (IER), the Directorate of Primary Education (DPE), the National Curriculum and Textbook Board (NCTB), BRAC University, the University of Dhaka, and Room to Read, among others, can support the advocacy and mobilization process. Language experts from local universities as well as the Bangla Academy, the Language Institute, and the International Mother Language Institute should also collaborate on the development of the benchmark tool. International partners may include USAID and the Global Reading Network, among others. Participation at the upcoming Comparative
International Education Society (CIES) conference can inform the READ Team about what other USAID-funded agencies are doing around the topic of benchmarks. Collaboration and discussion should also center on the implications of designing a benchmark tool: it will likely raise accountability among districts, administrators, teachers, and students, and strategic systemic support will need to be provided to move students along a developmental continuum that leads to reading with understanding.

Conclusion

Setting language-specific oral reading fluency benchmarks is an important step that needs to be taken if children are to read with fluency and comprehension. A consensus on clear guidelines at the levels of the akshara, word, and sentence in the early grades will help teachers identify where students actually are in the reading acquisition process and where they need to be. In this way, teachers will be able to provide targeted and differentiated instruction that moves students successfully through the stages of reading acquisition. The creation of oral reading fluency benchmarks that align with comprehension benchmarks can potentially improve the quality of reading instruction in primary classrooms, in areas such as decoding skills and explicit comprehension strategies, through ongoing teacher professional development workshops. It may spark the creation and use of a wide range of grade-level reading materials that supplement the textbooks. It may inform the reading curriculum in pre-service and in-service teacher education programs, as well as encourage parental and other community literacy initiatives. Local and international organizations will be in a better position to collaborate with the Ministry of Education to develop and expand feasible literacy interventions based on oral reading fluency and comprehension
benchmarks. Thus, collective resources and policy efforts towards the creation of oral reading fluency benchmarks that align with comprehension benchmarks can anchor the long-term research, planning, and implementation required to see marked changes in the quality of reading outcomes in Bangladesh.

References

Abadzi, H. (2011). Reading fluency measurements in EFA FTI partner countries: Outcomes and improvement prospects. Working Paper Series. Washington, D.C.: Global Partnership for Education.

Abadzi, H. (2012). Developing cross-language metrics for reading fluency measurement: Some issues and options. Working Paper Series on Learning No. 6. Washington, D.C.: Global Partnership for Education.

Baker, D. L., Cummings, K. D., Good, R. H., & Smolkowski, K. (2007). Indicadores dinámicos del éxito en la lectura (IDEL): Summary of decision rules for intensive, strategic, and benchmark instructional recommendations in kindergarten through third grade. Technical Report No. 1. Eugene, OR: Dynamic Measurement Group.

Basher, M. S., Jukes, M., Cooper, P., & Rigole, A. (2014). Bangla reading fluency in early grades: A comparative study between Room to Read supported government primary schools and other primary schools of Bangladesh. Bangladesh: Room to Read.

Calkins, L. (2001). The art of teaching reading. New York: Longman.

Center on Teaching and Learning, University of Oregon DIBELS Data System. (2012). 2012-2013 DIBELS Data System update part II: DIBELS Next benchmark goals. Eugene, OR. https://dibels.uoregon.edu/docs/techreports/DDS2012TechnicalBriefPart2.pdf (retrieved December 26, 2014).

Chabbott, C. (2008). Developing a practical assessment of early language learning in Bangladesh. Bangladesh: BRAC University. (retrieved January 13, 2015).
Clay, M. M. (1991). Becoming literate: The construction of inner control. Portsmouth, NH: Heinemann.

Davidson, M. (2013). Books that children can read: Decodable books and book leveling. Washington, D.C.: USAID.

Dewey, E. N., Powell-Smith, K. A., Good, R. H., & Kaminski, R. A. (2014). Technical adequacy supplement for DIBELS Next oral reading fluency. Eugene, OR: Dynamic Measurement Group.

Dowd, A. J., & Friedlander, E. (2009). Bangladesh program: Emergent and early grades reading assessment validation results. Washington, D.C.: Save the Children.

Fountas, I. C., & Pinnell, G. S. (2001). Guiding readers and writers: Grades 3-6. Teaching comprehension, genre, and content literacy. Portsmouth, NH: Heinemann.

Fountas, I. C., & Pinnell, G. S. (2006a). Teaching for comprehending and fluency: Thinking, talking, and writing about reading, K-8. Portsmouth, NH: Heinemann.

Fountas, I. C., & Pinnell, G. S. (2006b). Leveled books: Matching texts to readers for effective teaching, K-8. Portsmouth, NH: Heinemann.

Fountas, I. C., & Pinnell, G. S. (2009). When readers struggle: Teaching that works. Portsmouth, NH: Heinemann.

Frost, R. (2012). Towards a universal model of reading. Behavioral and Brain Sciences, 35(5), 263-279.

Garcia, O. (2009). Bilingual education in the 21st century: A global perspective. UK: Wiley-Blackwell.

González, N. (2009). Beyond culture: The hybridity of funds of knowledge. In N. González, L. C. Moll, & C. Amanti (Eds.), Funds of knowledge: Theorizing practices in households, communities, and classrooms. New York: Routledge.

Guajardo, J., Hossain, M., Nath, B. K. D., & Dowd, A. J. (2013). Literacy Boost Bangladesh endline report. Bangladesh: Save the Children.

Hasbrouck, J., & Tindal, G. (2005). Oral reading fluency: 90 years of measurement. Technical Report No. 33. University of Oregon: Behavioral Research & Teaching.

Hasbrouck, J., & Tindal, G. (2006).
Oral reading fluency norms: A valuable assessment tool for reading teachers. The Reading Teacher, 59(7), 636-644.

Jukes, M., Vagh, S., & Kim, Y. (2006). Development of assessments of reading ability and classroom behavior: A report prepared for the World Bank. Cambridge, MA: Harvard Graduate School of Education.

Moore, P., & Lyon, A. (2005). New essentials for teaching reading in pre-K-2. New York: Scholastic.

Nag, S. (2007). Early reading in Kannada: The pace of acquisition of orthographic knowledge and phonemic awareness. Journal of Research in Reading, 30(1), 7-22.

Nag, S., & Sircar, S. (2008). Learning to read in Bengali: A report of a survey in five Kolkata primary schools. Bangalore, India: The Promise Foundation.

Nag, S. (2011). The akshara languages: What do they tell us about children's literacy learning? In R. Mishra & N. Srinivasan (Eds.), Language-Cognition: State of the art (pp. 272-290). Germany: Lincom Publishers.

Nag, S., & Snowling, M. (2011a). Reading difficulties in Kannada, an Indian alphasyllabary. Bangalore, India: The Promise Foundation. http://www.thepromisefoundation.org/TPFRdK.pdf (retrieved January 7, 2014).

Nag, S., & Snowling, M. (2011b). Reading comprehension, decoding skills, and oral language. The EFL Journal, 2(2), 85-105.

Nag, S., & Snowling, M. (2012). Reading in an alphasyllabary: Implications for a language-universal theory of learning to read. Scientific Studies of Reading, 16(5), 404-423.

Nag, S., Snowling, M., Quinlan, P., & Hulme, C. (2014a). Child and symbol factors in learning to read a visually complex writing system. Scientific Studies of Reading, 18(5), 309-324.

Nag, S., Chiat, S., Torgerson, C., & Snowling, M. (2014b). Literacy, foundation learning and assessment in developing countries: Final report. UK: Department for International Development.

Nakamura, P. (2014).
Facilitating reading acquisition in multilingual environments in India (FRAME-India): Final report. Washington, D.C.: American Institutes for Research.

Nation, K., & Snowling, M. J. (2004). Beyond phonological skills: Broader language skills contribute to the development of reading. Journal of Research in Reading, 27(4), 342-356.

National Institute of Child Health and Human Development. (2000). Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction: Reports of the subgroups. NIH Publication No. 00-4754. Washington, D.C.: U.S. Department of Health and Human Services.

NICHD Early Childcare Research Network. (2005). Pathways to reading: The role of oral language in the transition to reading. Developmental Psychology, 41(2), 428-442.

Perfetti, C. A. (2003). The universal grammar of reading. Scientific Studies of Reading, 7(1), 3-24.

Piper, B. (2010). Ethiopia Early Grade Reading Assessment data analytic report: Language and early learning. Ethiopia: RTI International.

Powell-Smith, K. A., Good, R. H., Latimer, R. J., Dewey, E. N., Wallin, J., & Kaminski, R. A. (2012). DIBELS Next: Findings from the benchmark goals study. Technical Report No. 11. Eugene, OR: Dynamic Measurement Group.

Rasinski, T. V. (2010). The fluent reader: Oral and silent reading strategies for building fluency, word recognition, and comprehension. New York: Scholastic.

Sayed, M. A., Guajardo, J., Hossain, M. A., & Gertsch, L. (2014). READ baseline survey report. Bangladesh: Save the Children.

Sircar, S., & Nag, S. (2014). Akshara-syllable mappings in Bengali: A language-specific skill for reading. In H. Winskel & P. Padakannaya (Eds.), South and Southeast Asian Psycholinguistics (pp. 202-211). Cambridge University Press.

Tiwari, S. (2011).
Literacy development in the alphasyllabaries: Implications for clinical practice. Rapporteur's theme summary, Language Literacy and Cognitive Development (LLCd) Symposium, 16-17 December, Bangalore, India.

Vagh, S. B. (2009). Evaluating the reliability and validity of the ASER testing tools. New Delhi, India: ASER Centre.

Vagh, S. B. (2010). Validating the ASER testing tools: Comparisons with reading fluency measures and the Read India measures. New Delhi, India: ASER Centre.

Appendix A: Sample Decodable Reader in Bangla