What I Learned About Assessment From the AP Program
Dan Kennedy
Baylor School
Houston AP Teachers Meeting
November 11, 2006

True Confession of a Veteran Mathematics Teacher:
For many years I never thought much about assessment. I graded my students in what I thought was an appropriate variety of ways: tests, quizzes, and homework. This model had stood the test of time.

In 1986 I was invited to become a member of the AP Calculus Test Development Committee. From 1990 to 1994 I would serve as chair. My experience with this group changed my views of assessment forever.

I had already learned one important fact about classroom assessment merely by teaching an AP course: it changes the entire classroom dynamic when the teacher honestly does not know what will be on the test. The teacher has no other option but to teach the students how to think for themselves!

Why students don't think on tests:
•Thinking takes time.
•Thinking is only necessary when you cannot do something "without thinking."
•If you can do something without thinking, you can do it very well.
•Students who can do something very well have been well-prepared.
•Therefore, if you prepare them well, your students will proceed through your tests without thinking!

Were AP Calculus exams predictable?
1987 BC Exam:
1. Differential equation
2. Implicit differentiation
3. Area/volume
4. Series
5. Particle problem
6. Theory problem (stretch)

Of course, this was just one exam. There were others like it, though. And if we tried to change anything, teachers would notice. Then, in AP workshops all over the country, teachers would find themselves uttering to AP consultants the words they dreaded most when spoken by their students: "Will this be on the test?"

And why should teachers NOT ask that question? It is how the game is played:
•We show the students how to do math.
•We let them practice at it for a while.
•Then we give them a test to see how well they can mimic what we did.
The game is won and lost for BOTH of us on test day. This was just another example of the educational paradigm that was leading my students not to think on tests!

But how can teachers change the game if we want our students to succeed? Teachers have one secret weapon: we define what it means to succeed. We control the grade!

Something I learned about assessment from the AP program: it is perfectly OK to scale grades!

AP Grade Conversion Chart, Calculus AB

    Composite Score Range*    AP Grade
    75–108                    5
    58–74                     4
    40–57                     3
    25–39                     2
    0–24                      1

*The candidates' scores are weighted according to formulas determined by the Development Committee to yield raw composite scores; the Chief Faculty Consultant is responsible for converting composite scores to the 5-point AP scale.

75% = 5. In fact, a composite of 75 out of a possible 108, only about 69% of the maximum, already earns a 5.

At our school, 75% is not a good grade. In fact, 65% is a minimal pass. Is this reasonable? Think about it:
•The all-time NBA record for field goal percentage in a season is 72.7%.
•The all-time major league baseball record for batting average in a season is .440 (44%).
•A salesperson who makes a sale on 75% of first contacts is a genius.

So how can we expect 75% success from someone who is just learning? If the AP exam were constructed so that the low-to-average student could get 75% of the maximum points, (a) it wouldn't be much of a test, and (b) the distribution would be skewed rather than normal.

An Important Disclaimer: Scaling grades is not about building self-esteem. Scaling grades is about teaching mathematics. Assessment should support your efforts to teach your students mathematics. It should not get in the way.
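Scaling of this kind can be sketched in a few lines of Python. The idea is a straight line through two anchor points. The teacher picks two raw scores and decides what grade each should become. Every other raw score is then mapped along that line and rounded to a whole number.

This is a minimal sketch under stated assumptions. The function name `make_scaler` is hypothetical. The anchor values are also hypothetical illustrations, not a prescribed curve: raw 80 becomes 95, and raw 40 becomes the 65 minimal pass.

```python
import math
from typing import Callable


def make_scaler(raw1: float, curved1: float,
                raw2: float, curved2: float) -> Callable[[float], int]:
    """Return a function that maps a raw score to a curved grade.

    The curve is the straight line through the anchor points
    (raw1, curved1) and (raw2, curved2), rounded to a whole number.
    """
    if raw1 == raw2:
        raise ValueError("the two anchor raw scores must differ")
    slope = (curved1 - curved2) / (raw1 - raw2)
    intercept = curved1 - slope * raw1

    def scale(raw: float) -> int:
        # Round halves up (80.5 -> 81); adequate for nonnegative grades.
        return math.floor(slope * raw + intercept + 0.5)

    return scale


if __name__ == "__main__":
    # Hypothetical anchors: raw 80 becomes 95, raw 40 becomes 65 (a minimal pass).
    scale = make_scaler(80, 95, 40, 65)
    for raw in (100, 80, 60, 40, 20):
        print(raw, "->", scale(raw))
```

With these anchors, a raw 60 becomes 80 and a raw 100 becomes 110. The line can therefore push top scores past 100; whether to cap them is a local policy choice that the sketch leaves open.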
Scaling grades on the TI-84 Plus

    ClrHome
    FnOff
    PlotsOff
    ClrTable
    ExprOff
    6→Xmin:100→Xmax
    0→Ymin:124→Ymax
    0→Xscl:0→Yscl
    Input "RAW SCORE: ",A
    Input "CURVED TO: ",B
    Input "RAW SCORE: ",C
    Input "CURVED TO: ",D
    (B−D)/(A−C)→M
    "round(MX+B−AM,0)"→Y1
    IndpntAsk
    DispGraph
    Text(1,1,"TRACE OR USE TABLE")
    Text(7,1,"TO ENTER RAW")
    Text(13,1,"SCORES.")

The program asks for two raw scores and the curved grade each should become. It stores the slope of the line through those two points in M and defines Y1 as that line, rounded to a whole number. Because the TI-84 Plus screen is 95 pixels wide, the window from 6 to 100 makes TRACE step through the raw scores one point at a time. IndpntAsk lets you type raw scores directly into the table.

Some things that ETS worried about that I didn't:
•r-biserial
•Content validity
•Speededness
•True score
•Grading rubrics

r-biserial (r-bis)
"A correlation coefficient relating performance on a test question and performance on the measure used as a criterion. It is an index of discrimination measuring the extent to which examinees who score high on the measure used as the criterion tend to get the question right and those who score low tend to get it wrong."

1969 Multiple-choice question #26:
∫₀¹ √(x² − 2x + 1) dx is
(A) −1    (B) −1/2    (C) 1/2    (D) 1    (E) none of the above
The answer is (C). On [0, 1], √(x² − 2x + 1) = |x − 1| = 1 − x, so the integral is 1/2. Dropping the absolute value gives the tempting −1/2.

Responses to question #26:

                             (A)    (B)    (C)    (D)    (E)
    AB                        3%    57%     7%     3%    20%
    BC                        1%    70%    11%     2%     9%
    Projected chimpanzees    20%    20%    20%    20%    20%

Correct responses to problem #26: AB 7%, BC 11%, Chimps 20%.

Content Validity
"Validity is the extent to which a test measures what it is supposed to measure. The content validity of an (AP) test is the extent to which the content of the test represents a balanced and adequate sampling of the universe of content in which the test is intended to measure achievement."

The AP Calculator Experiment (1983-84)
In 1983 the AP Calculus Committee decided to allow (but not require) the use of scientific calculators on the AP Calculus examinations. This was not to be a very happy debut for technology on the AP stage. AP readers found that students were losing points on the free-response section because of calculator misuse. The calculators affected the scores. But calculators were not being tested! This compromised the content validity.

The committee had two choices:
1. Forbid calculators and test as usual;
2. Require calculators and alter the test.
They chose to forbid the calculators.

One of my Precalculus tests from 1990:
Note the emphasis on computation. Note that there is nothing here to suggest that any of this stuff is worth knowing!

A recent test on the same functions:

PRECALCULUS TEST 4

1. Which of the following graphs represents:
   1) exponential growth?  2) logistic growth?  3) exponential decay?  4) linear depreciation?  5) logarithmic growth?
   [Graphs a), b), c), …]

2. Find all real numbers x which satisfy the following equations:
   a) x log 5 = 13
   b) 7^x = 126
   c) log x + log(2x − 1) = log 21
   d) 3^(x²) = 27
   e) logₓ 16 = 2
   f) log(log x) = 0

3. Find all three roots of the equation x³ = 2 − x. You can find the first one graphically, but the other two (complex) must be found algebraically.

4. A certain rumor spreads through a small town so that the proportion of the population that has heard the rumor after t days is given by the formula P(t) = e^t/(e^t + 7). Consider the graph of the function in answering the following questions.
   a) What proportion of the population has heard the rumor at time t = 0?
   b) During what day will 90% of the population come to have heard the rumor?
   c) During what day does the rumor seem to be spreading the fastest?

5. We deposit $5000 in an account earning 6.8% annual interest. Find the worth of the account in 10 years if the interest is compounded
   a) annually.
   b) monthly.
   c) daily.

6. A single function F satisfies all of the following properties:
   a) F(1) = 0
   b) F(a) + F(b) = F(ab)
   c) F(a) − F(b) = F(a ÷ b)
   d) F(10) = 1.
   Can you guess what the function is by considering its properties?

7. A bacteria population grows according to the exponential model A(t) = 150e^(0.12t), where A(t) is the population after t days.
   a) What is the initial population at time t = 0?
   b) How many days will it take for the population to triple in size?
   c) Will this model be valid for arbitrarily large values of t? Explain.

8. a) Find the vertical asymptotes of the graph of y = 3x/(x² − 4).
   b) Find the horizontal asymptote of the graph of the function in (a).
   c) Give an example of a rational function that has a vertical asymptote of x = 5 and a horizontal asymptote of y = 1.

9. Let f(x) = log(x − 3) and g(x) = 2^x. Find:
   a) the domain of f.
   b) the domain of g.
   c) the range of f.
   d) the range of g.
   e) the inverse function of g.
   f) the vertical asymptote of f.
   g) the horizontal asymptote of g.
   h) Does f have a horizontal asymptote?

Still not perfect, but a better test.

Speededness
"The appropriateness of a test in terms of the length of time allotted. For most purposes, a good test will make full use of the examination period but not be so speeded that an examinee's rate of work will have an undue influence on the score he receives."

Allowing for speededness
Exam Format for AP Calculus AB

    Section      % of Grade   Part     Number of Questions   Minutes Allotted   Calculator Use
    Section I    50           Part A   28                    55                 no calculator
                              Part B   17                    50                 graphing calculator required
    Section II   50           Part A   3 problems            45                 graphing calculator required
                              Part B   3 problems            45                 no calculator

True Score
"A score entirely free of errors of measurement. True scores are hypothetical values never obtained in actual testing. A true score is sometimes defined as the average score that would result from an infinite series of measurements with the same or exactly equivalent tests, assuming no practice effect or change in the examinee during the testings."

Why teachers don't need to worry about true score: We can assess our students all year long! The more often the better. Sorry, kids.
Yessss!

AP Calculus Grading Rubrics
If the AP readers can give partial credit fairly to 250,000 students, I ought to be able to do it for my own students. In AP Calculus, I can even use the AP rubrics to do it.

AP® CALCULUS AB 2004 SCORING GUIDELINES
Question 1
Traffic flow is defined as the rate at which cars pass through an intersection, measured in cars per minute.
The traffic flow at a particular intersection is modeled by the function F defined by
F(t) = 82 + 4 sin(t/2) for 0 ≤ t ≤ 30,
where F(t) is measured in cars per minute and t is measured in minutes.
(a) To the nearest whole number, how many cars pass through the intersection over the 30-minute period?
(b) Is the traffic flow increasing or decreasing at t = 7? Give a reason for your answer.
(c) What is the average value of the traffic flow over the time interval 10 ≤ t ≤ 15? Indicate units of measure.
(d) What is the average rate of change of the traffic flow over the time interval 10 ≤ t ≤ 15? Indicate units of measure.

(a) ∫₀³⁰ F(t) dt = 2474 cars
    3 points: 1 for limits, 1 for integrand, 1 for answer
(b) F′(7) = −1.872 or −1.873. Since F′(7) < 0, the traffic flow is decreasing at t = 7.
    1 point: answer with reason
(c) (1/5) ∫₁₀¹⁵ F(t) dt = 81.899 cars/min
    3 points: 1 for limits, 1 for integrand, 1 for answer
(d) (F(15) − F(10))/(15 − 10) = 1.517 or 1.518 cars/min²
    1 point: answer
Units of cars/min in (c) and cars/min² in (d): 1 point

Copyright © 2004 by College Entrance Examination Board. All rights reserved.
Visit apcentral.com (for AP professionals) and www.collegeboard.com/apstudents (for AP students and parents).

AP Calculus Exams are:
•Designed to test knowledge
•Designed to test cleverness
•Scaled reasonably
•Not made up by the teacher
•Open assessments
•Comprehensive assessments (valid)
•Honest about technology

Two Fundamental Principles:
1. Assess what you value.
2. Value what you assess.

Some problems with traditional tests:
•They assess only a fraction of what we value.
•They depend too much on luck.
•There is often no feedback (as with final exams).
•They are usually taken alone. (Is this what we value?)
•They are usually timed. (Is this a good model for quality work?)
•They are frequently taken under artificial, stressful conditions.
•They are dependent on teacher stimulus.
•They are often devoid of creativity (if students are "prepared").
•They favor one narrow kind of student performance.
•Success is usually short-term and non-transferable.
•The emphasis in the end is on what the student can NOT do.
•They can inhibit further learning.

Some assessment strategies I like:
•Assess what you value and value what you assess!
•Assess often, with different kinds of assessments.
•Give meaningful and prompt feedback.
•Give partial credit for partially correct work.
•Explain all your expectations to your students from the start.
•Test diligence, knowledge, and cleverness in focused ways.
•Encourage creativity through your assessments.
•Scale grades to control the standard deviation.
•Only fail students who are failures. Keep everyone in the game.
•Encourage collaboration in class and on homework.
•Assess diligence. Find a way to grade homework frequently.
•Try portfolios.
•Remember: This is not about self-esteem. It's about teaching mathematics to all your students!

Students need to hand in a portfolio of items of their own choosing. The main point of this assessment is that they are not responding to a stimulus from me (as in a test or a quiz). My primary directive for student portfolio entries is this: Give me evidence of your learning that I otherwise would not have!

Rebecca Flake's Portfolio Entry
This was my first year to be a peer tutor, and I enjoyed helping the girls in the dorm a lot. Last night, though, I finally saw the importance of my peer tutoring. My roommate came in at 10:00 extremely upset over her Precalculus test that was the next day. I calmed her down and told her that I would help her if I could. Carrie, who had been in the play, had gotten behind in her work, so she didn't understand what they were doing. She showed me the problem. I knew the answer, but I wasn't sure how to explain it to her in a way that was not confusing. I thought about it for a while, and I ended up trying several approaches (with Clara's help) that I had learned in Calculus, until I finally got through to her.
Then I made her work a few problems for me, and she did them perfectly. She understood! I was so happy to be able to help her that I had forgotten I was supposed to be studying for my own Calculus test. She was so happy she understood that she began to cry. She really began to cry. It's great to be able to use the things you have learned to help other people learn too.

A happy footnote: Carrie really did understand. She scored 93 on the Precalculus test the following day – a personal best for her, and a full 9 points above the class average.

E-mail me at: dkennedy@baylor.chattanooga.net
Or visit the Baylor School web site at www.baylorschool.org. Click on me under Faculty and link to my home page.