1. Assessment involves the use of empirical data on student learning to refine programs and improve student learning. (Assessing Academic Programs in Higher Education, Allen, 2004)

2. Assessment is the process of gathering and discussing information from multiple and diverse sources in order to develop a deep understanding of what students know, understand, and can do with their knowledge as a result of their educational experiences; the process culminates when assessment results are used to improve subsequent learning. (Learner-Centered Assessment on College Campuses: Shifting the Focus from Teaching to Learning, Huba and Freed, 2000)

3. Assessment is the systematic basis for making inferences about the learning and development of students. It is the process of defining, selecting, designing, collecting, analyzing, interpreting, and using information to increase students' learning and development. (Assessing Student Learning and Development: A Guide to the Principles, Goals, and Methods of Determining College Outcomes, Erwin, 1991)

4. Assessment is the systematic collection, review, and use of information about educational programs undertaken for the purpose of improving student learning and development. (Assessment Essentials: Planning, Implementing, and Improving Assessment in Higher Education, Palomba and Banta, 1999)

There are two main types of assessment: summative assessment and formative assessment.

Summative Assessment

Summative assessments are often considered high-stakes. They are used to gauge children's learning against a standard or benchmark. They are typically given at the end of the year and are sometimes used to make important educational decisions about children. A summative assessment is a snapshot of students' understanding, which is useful for summarizing student learning. What helps me remember the difference between the two types is that summative is like a summary: summative is the big picture, the grand summary of a child's learning.

Summative assessments aren't used much in early childhood programs because they're not generally considered developmentally appropriate for very young children. One example that you might see or use in your program is a Kindergarten Readiness Assessment or a developmental skills assessment that determines whether a child can move to the next classroom. I've heard of some programs doing assessments like that, where a child has to earn a certain score on the assessment in order to move up to the next preschool room, or the four-to-five room, or whatever it is for that particular program. There is some debate within our field about whether it is developmentally appropriate to test children to move them up to that next level. In my experience, there have been several times when I felt a child was ready to move up to the next class even though chronologically he wasn't the right age to move. I'm sure we've all had those children where we're in the three-year-old class and the child is mentally five but chronologically three. Then there have been other times when a child was chronologically ready to move up at age five, but developmentally I really felt he should have stayed in the other room a little longer. There are all kinds of implications for using an assessment of that type for that reason. That's not to say it's wrong to do; there's just some debate in our profession about using those types of assessments.
Formative Assessments

That takes us to the second type of assessment: formative assessments. These are considered low-stakes, so summative assessments are high-stakes and formative assessments are low-stakes. They're ongoing, and they tend to be based on teachers' intentional observations of children, typically during specific learning experiences and/or during everyday interactions or classroom involvement. These assessments are most useful for planning learning experiences, activities, and environments. These are the everyday interactions we talked about, where assessment naturally emerges from the work that you're already doing. Those would be considered formative assessment. Again, these assessments are used to determine activities for the lesson plan after asking questions such as: What kinds of things should I change out in my centers? What kinds of items in the science center are the kids just throwing? What kinds of things in the science center are they actually sitting down with and investigating, trying to see what they can figure out, or genuinely curious about?

When I was a preschool teacher, I had many four- to five-year-old children in my classroom because, at the time, the ratio for our state was one to 15. I had 30 children in my classroom, and I had to really be on top of what my children were interested in and what they had figured out or had gotten over the excitement of. When you have that many children in the classroom, you have to keep them engaged, active, and busy. Formative assessments were extremely helpful for me in that way.

Formative assessments are most appropriate for use with young children. Remember, summative assessments are not necessarily appropriate for children age five and under, but formative assessments are definitely appropriate, as they're often more authentic, more real, and more holistic. They show a picture of the whole child, so they can be more useful. Because young children's learning can be so varied and sometimes erratic, using multiple sources of assessment information is ideal. That goes back to what we were just talking about: children develop across a wide range, in a variety of contexts and situations. There's such a wide range of development when it comes to young children that even though you might have a classroom full of three-year-olds, developmentally they're going to be on a spectrum. That's because development and learning are varied and can be erratic. The term erratic may be a little shocking at first, but young children's learning can be erratic. For example, if you work with infants, one day you send them home and they can't sit up or roll over and are just lying there looking at you; then they come back on Monday and they're rolling and moving and grooving and doing all kinds of stuff. If you work with toddlers, one day you send them home and they barely say two or three words; the next week they come back and you can't keep up with all the words they're speaking. In this context, erratic means sometimes very sudden, but sometimes drawn out. It depends on the child.

Formative assessments can be formal, where you're actually making time to sit down and take notes during a specific time or in a specific center, focused on a specific child. They can also be informal, such as when you're out on the playground and a child is sitting under the tree with a book, and you go over, sit down, and say, "Hey, can I read with you?"
You notice, wow, this child knows a lot of words in this book, and you make a note of that. That would be an informal type of assessment.

Formative assessments can be initial or ongoing. The initial formative assessment is usually done to find out as much as we can about the child, usually at the beginning of the year or as a child enters a program. It usually involves observing, studying existing information, and reviewing home background information. In the program that I supervised, when we had a new child join, we had a sheet that the parents would fill out asking all kinds of information: "What's your child's favorite stuffed animal? How does your child go to sleep at night? What's the bedtime routine? What's your child's favorite food? What's your child's favorite movie?" It was all background information about the child so that we could get to know them. That helped us begin those connections that are so important in early childhood. That home background information would be part of that first initial formative assessment.

The other type of formative assessment is an ongoing formative assessment. This typically provides more in-depth information, often because it takes more time. An ongoing formative assessment isn't a quick form that you're through with once. It's an ongoing thing you will look at every week, month, three months, or however it is set up in your program.

Here are some examples of published formative assessment tools often used in early childhood programs:

The Work Sampling System (WSS): www.worksamplingonline.com
Teaching Strategies GOLD: www.teachingstrategies.com
High Scope COR (Child Observation Record): www.onlinecor.net
The Creative Curriculum Developmental Continuum: www.teachingstrategies.com

Sometimes a state or funding source will mandate that certain early childhood programs use a specific assessment tool, and sometimes your program itself mandates it. I've had the experience of working with all of these tools at one time or another in my career. All of them have definite benefits, and many of them are pretty easy to complete. As you know, in early childhood, time is not a luxury we have a lot of. It's always nice to have a tool that's easy to use, so that when you find five minutes to sit down and work on an assessment, it's easy to figure out.

There are two general categories of test items: (1) objective items, which require students to select the correct response from several alternatives or to supply a word or short phrase to answer a question or complete a statement; and (2) subjective or essay items, which permit the student to organize and present an original answer. Objective items include multiple-choice, true-false, matching, and completion, while subjective items include short-answer essay, extended-response essay, problem solving, and performance test items. For some instructional purposes, one or the other item type may prove more efficient and appropriate. To begin our discussion of the relative merits of each type of test item, consider some characteristics of objective and subjective tests.

Characteristics of objective tests:
They are so definite and so clear that a single, definite answer is expected.
They ensure perfect objectivity in scoring.
They can be scored objectively and easily.
They take less time to answer than an essay test.

Characteristics of subjective tests:
Subjective items are generally easier and less time-consuming to construct than most objective test items.
Different readers can rate identical responses differently, and the same reader can rate the same paper differently over time.

How to interpret criterion-referenced tests

Criterion-referenced tests compare a person's knowledge or skills against a predetermined standard, learning goal, performance level, or other criterion. With criterion-referenced tests, each person's performance is compared directly to the standard, without considering how other students perform on the test. Criterion-referenced tests often use "cut scores" to place students into categories such as "basic," "proficient," and "advanced." In criterion-referenced tests, the performance of other students does not affect a student's score.

If you've ever been to a carnival or amusement park, think about the signs that read "You must be this tall to ride this ride!" with an arrow pointing to a specific line on a height chart. The line indicated by the arrow functions as the criterion; the ride operator compares each person's height against it before allowing them to get on the ride. Note that it doesn't matter how many other people are in line or how tall or short they are; whether or not you're allowed to get on the ride is determined solely by your height. Even if you're the tallest person in line, if the top of your head doesn't reach the line on the height chart, you can't ride.

Criterion-referenced assessments work similarly: an individual's score, and how that score is categorized, is not affected by the performance of other students. In the charts below, you can see the student's score and performance category ("below proficient") do not change, regardless of whether they are a top-performing student, in the middle, or a low-performing student. This means knowing a student's score on a criterion-referenced test will only tell you how that specific student compared to the criterion, not whether they performed below average, above average, or average compared to their peers.
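To make the cut-score idea concrete, here is a minimal sketch in Python. The cut scores (60 and 80) and the category labels are invented for illustration, since the text doesn't give a specific rubric; the point is only that each score is compared to the criterion, never to other students' scores.

```python
# A minimal sketch of criterion-referenced categorization.
# The cut scores (60, 80) and category names are hypothetical,
# chosen only to illustrate the idea; real programs set their own.

CUT_SCORES = [(80, "advanced"), (60, "proficient"), (0, "basic")]

def categorize(score: float) -> str:
    """Place a single score into a category using fixed cut scores.

    No other student's score appears anywhere in this function:
    the criterion alone determines the category.
    """
    for cut, label in CUT_SCORES:
        if score >= cut:
            return label
    return "basic"  # scores below every cut score

if __name__ == "__main__":
    # The same score always lands in the same category,
    # no matter how everyone else performs.
    for score in [45, 72, 91]:
        print(score, "->", categorize(score))
```

Because the function never looks at the rest of the class, a student's category cannot change when their peers do better or worse, which is exactly the property the carnival height-chart example illustrates.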
How to interpret norm-referenced tests

Norm-referenced measures compare a person's knowledge or skills to the knowledge or skills of the norm group. The composition of the norm group depends on the assessment. For student assessments, the norm group is often a nationally representative sample of several thousand students in the same grade (and sometimes at the same point in the school year). Norm groups may also be further narrowed by age, English Language Learner (ELL) status, socioeconomic level, race/ethnicity, or many other characteristics.

One norm-referenced measure that many families are familiar with is the baby weight growth chart in the pediatrician's office, which shows which percentile a child's weight falls in. A child in the 50th percentile has an average weight; a child in the 75th percentile weighs more than 75% of the babies in the norm group and the same as or less than the heaviest 25%; and a child in the 25th percentile weighs more than 25% of the babies in the norm group and the same as or less than 75% of them.

It's important to note that these norm-referenced measures do not say whether a baby's birth weight is "healthy" or "unhealthy," only how it compares with the norm group. For example, a baby who weighed 2,600 grams at birth would be in the 7th percentile, weighing the same as or less than 93% of the babies in the norm group. However, despite the very low percentile, 2,600 grams is classified as a normal or healthy weight for babies born in the United States; a birth weight of 2,500 grams is the cut-off, or criterion, for a child to be considered low weight or at risk. (For the curious, 2,600 grams is about 5 pounds and 12 ounces.) Thus, knowing a baby's percentile rank for weight can tell you how they compare with their peers, but not whether the baby's weight is "healthy" or "unhealthy."

Norm-referenced assessments work similarly: an individual student's percentile rank describes their performance in comparison to the performance of students in the norm group, but does not indicate whether or not they met or exceeded a specific standard or criterion. In the charts below, you can see that, while the student's score doesn't change, their percentile rank does change depending on how well the students in the norm group performed. When the individual is a top-performing student, they have a high percentile rank; when they are a low-performing student, they have a low percentile rank. What we can't tell from these charts is whether the student should be categorized as proficient or below proficient. This means knowing a student's percentile rank on a norm-referenced test will tell you how well that specific student performed compared to the performance of the norm group, but will not tell you whether the student met, exceeded, or fell short of proficiency or any other criterion.
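As a companion to the cut-score sketch above, here is a small Python illustration of how the same raw score can land at very different percentile ranks depending on the norm group. The two norm groups are invented for illustration; real norm groups contain thousands of students.

```python
# A sketch of percentile rank against a norm group.
# The norm-group scores below are invented for illustration.

def percentile_rank(score: float, norm_group: list[float]) -> float:
    """Percent of the norm group scoring at or below this score."""
    at_or_below = sum(1 for s in norm_group if s <= score)
    return 100.0 * at_or_below / len(norm_group)

if __name__ == "__main__":
    student_score = 72  # the student's score never changes

    weak_peers = [50, 55, 60, 61, 65, 68, 70, 71, 74, 80]
    strong_peers = [70, 75, 78, 80, 82, 85, 88, 90, 92, 95]

    # Same score, very different percentile ranks:
    print(percentile_rank(student_score, weak_peers))    # 80.0 (high rank)
    print(percentile_rank(student_score, strong_peers))  # 10.0 (low rank)
```

Notice that percentile_rank never consults a cut score: the rank moves with the norm group even though the score stays at 72, which is the mirror image of the criterion-referenced sketch.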
3.d. Item analysis is a process that examines student responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole. Item analysis is especially valuable for improving items that will be used again in later tests, but it can also be used to eliminate ambiguous or misleading items in a single test administration. In addition, item analysis is valuable for increasing instructors' skills in test construction and for identifying specific areas of course content that need greater emphasis or clarity. Separate item analyses can be requested for each raw score.

3.e. Test-retest reliability is a measure of reliability obtained by administering the same test twice, over a period of time, to a group of individuals. Example: a test designed to assess student learning in psychology could be given to a group of students twice, with the second administration perhaps coming a week after the first. The obtained correlation coefficient would indicate the stability of the scores.
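To show what "the obtained correlation coefficient" means in practice, here is a minimal Python sketch that correlates two administrations of the same test. The five pairs of scores are invented for illustration; the Pearson formula itself is the standard one.

```python
# A sketch of test-retest reliability as a Pearson correlation
# between two administrations of the same test.

import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between paired score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

if __name__ == "__main__":
    week_1 = [78, 85, 62, 90, 71]  # first administration
    week_2 = [80, 83, 65, 92, 70]  # one week later

    r = pearson_r(week_1, week_2)
    print(f"test-retest reliability: r = {r:.2f}")  # near 1.0 = stable scores
```

An r near 1.0 means students kept roughly the same standing across the two administrations. Note that a reliable-but-invalid instrument, like the scale that always reads 5 lbs heavy in the example that follows, would still score high here.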
Test validity refers to how well a test measures what it is purported to measure. For a test to be valid, it also needs to be reliable, but a reliable test is not necessarily valid. For example, if your scale is off by 5 lbs, it reads your weight every day with an excess of 5 lbs. The scale is reliable because it consistently reports the same weight every day, but it is not valid because it adds 5 lbs to your true weight. It is not a valid measure of your weight.

A) Item index of difficulty. Suppose the 25 students in the upper group and the 25 students in the lower group answered an item's alternatives as follows:

Alternative   Upper 25   Lower 25
A                    2          5
B                    3          2
C                    5         15
D                   15          3

Item index of difficulty (p) = (number of students who got it right) / (total number of students)

For alternative A: p = (2 + 5) / 25 = 7/25 = 0.28

B) Discrimination Index (D) = (number of students in the lower group) - (number of students in the upper group)

For alternative A: D = 5 - 2 = 3

Applying both formulas to every alternative:

Alternative   Upper 25   Lower 25   Item index of difficulty (p)   Discrimination Index (D)
A                    2          5                           0.28                          3
B                    3          2                           0.20                         -1
C                    5         15                           0.80                         10
D                   15          3                           0.72                        -12
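The arithmetic above is easy to automate. Here is a short Python sketch that reproduces the table using the counts and the formulas exactly as given in this worked example. Note that the lower-minus-upper form of the discrimination index follows this example's definition; many textbooks define D the other way around, as upper minus lower, which would simply flip the signs.

```python
# Reproduces the item-analysis table above, using this example's
# formulas: p = (upper + lower) / 25 and D = lower - upper.
# (Many textbooks define D as upper - lower instead.)

# counts of students choosing each alternative: (upper group, lower group)
counts = {"A": (2, 5), "B": (3, 2), "C": (5, 15), "D": (15, 3)}

TOTAL = 25  # divisor used in the worked example above

print(f"{'Alt':<4}{'Upper':>6}{'Lower':>6}{'p':>7}{'D':>5}")
for alt, (upper, lower) in counts.items():
    p = (upper + lower) / TOTAL   # item index of difficulty
    d = lower - upper             # discrimination index (this example's form)
    print(f"{alt:<4}{upper:>6}{lower:>6}{p:>7.2f}{d:>5}")
```

Running this prints the same p and D values as the table, so any change to the counts (say, after the item is revised and re-administered) can be re-analyzed in seconds.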