STLF Report to CWSEI and Mathematics Department

STLF: Sandra Merchant
Period: 05/02/12 – 15/05/12
Submitted: 15/05/12

Specific activities performed by STLF

1) Professional development
• Attended the morning portion of the CWSEI end-of-year event (Apr 20).

2) MATH SEI general meetings/activity
• Met with Director Costanza Piccolo to review the status of my projects and to plan the work for my part-time employment (Jan 6, Mar 2).

3) Course-specific meetings/activities

Tracking Proof Skills Project (MATH 220 – Mathematical Proof and subsequent courses)

1. Started writing a paper about the proof skills diagnostic test. The paper will outline the process by which the test was developed and how it has been used so far to assess learning in MATH 220, and will include an item analysis of each question on the diagnostic test. I performed the item analysis on several subsets of the data: pre-test and post-test, section by section, term by term, and all terms pooled. The results differ slightly depending on the data treatment. In our opinion the most relevant treatment is the pooled pre-test, since it represents the general incoming population for MATH 220, whereas post-test results may depend on instructional differences within the course. The item analysis for the pooled pre-test is summarized in the following table:

Test Item                  1    2    3    4a   4b   4c   4d   5    6    7    8    9    10   11   12   13   14   15
Difficulty Index (P)       .60  .56  .70  .84  .62  .90  .95  .71  .77  .54  .85  .71  .59  .37  .24  .08  .43  .34
Discrimination Index (D)   .36  .52  .50  .27  .40  .25  .10  .40  .33  .47  .31  .66  .52  .55  .41  .29  .61  .51
Item-to-total Correlation  .08  .25  .23  .15  .09  .16  .04  .20  .15  .18  .19  .33  .21  .20  .18  .25  .26  .09

Difficulty index (P): the proportion of responses to the item that are correct.
Discrimination index (D): computed by an extreme-group method. The "high performing" group comprises the top 21% of scores on the full test and the "low performing" group the bottom 21%. D is the proportion correct in the high-performing group minus the proportion correct in the low-performing group.
Item-to-total correlation: the correlation between the score on the item and the total score on the test.

2. Revised the basic proof skills diagnostic test based on the results of the item analysis. Specifically, we removed two questions (4c and 4d) that were deemed too easy and had low discriminatory power. Several questions were reworded and reformatted for clarity, and the questions were grouped into two clearly labelled parts, one requiring a single answer and one allowing multiple answers (on the Term 2 post-test many students circled multiple answers for questions with only one correct answer). Finally, we changed the scoring method for questions 13, 14 and 15: the previous dichotomous scoring gave a score of 0 if any errors were present (each of these questions has 8 parts), whereas the new scoring method allows a response with a single error to receive a score of 1.
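For reference, the indices in the table above (and in the revised-test table below) can be computed from a students-by-items matrix of 0/1 scores roughly as follows. This is a minimal illustrative Python sketch, assuming dichotomous item scores and the 21% extreme-group cut described above; it is not the actual analysis script used for this report, and the handling of ties and of the partially scored questions 13–15 is left unspecified.

import numpy as np

def item_analysis(scores, tail=0.21):
    """Item analysis for a students x items matrix of 0/1 scores.

    Returns, for each item: the difficulty index P (proportion correct),
    the discrimination index D (extreme-group method with the given tail
    fraction), and the item-to-total correlation.
    Illustrative sketch only -- not the analysis script used for this report.
    """
    scores = np.asarray(scores, dtype=float)
    n_students, n_items = scores.shape
    total = scores.sum(axis=1)

    # Difficulty index: proportion of correct responses to each item.
    P = scores.mean(axis=0)

    # Discrimination index: proportion correct in the high group minus the
    # low group, where the groups are the top and bottom `tail` fraction
    # of students ranked by total score.
    k = max(1, int(round(tail * n_students)))
    order = np.argsort(total)
    low, high = order[:k], order[-k:]
    D = scores[high].mean(axis=0) - scores[low].mean(axis=0)

    # Item-to-total correlation (here against the total that includes the
    # item itself; a corrected version would subtract the item first).
    r = np.array([np.corrcoef(scores[:, j], total)[0, 1] for j in range(n_items)])
    return P, D, r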
The item analysis for the diagnostic test with these changes is shown in the table below (note: unlike the previous table, this table includes Summer 2012 pre-test data):

Test Item                  1    2    3    4a   4b   5    6    7    8    9    10   11   12   13   14   15
Difficulty Index (P)       .62  .59  .69  .87  .61  .70  .78  .56  .87  .71  .62  .37  .26  .29  .50  .46
Discrimination Index (D)   .37  .60  .50  .23  .45  .42  .38  .52  .30  .62  .60  .55  .45  .45  .40  .33
Item-to-total Correlation  .12  .26  .23  .13  .16  .20  .18  .19  .20  .32  .26  .19  .24  .26  .18  .05

We are now satisfied with the test items: they span a broad range of difficulty levels and nearly all items meet or exceed the desired discrimination index of 0.30. In addition, the test as a whole produces a broad distribution of scores (Ferguson's delta is 0.95) and has a reasonable overall difficulty (the mean pre-test score is 59%). We also think the test has high test-retest reliability, since the correlation between difficulty indices for the same question in Term 1 vs. Term 2 is r = 0.92 (r² = 0.842).

3. Examined the learning gains on the basic proof skills diagnostic to see whether we could detect an effect on learning of the new workshops introduced in MATH 220 in Term 2. The results are summarized in the following table:

2011 WT1 (control), N = 62
  Pre-test mean (SE):   Full 0.59 (.02)   Log 0.65 (.02)   Alg 0.32 (.02)   Rdg 0.38 (.02)
  Post-test mean (SE):  Full 0.78 (.02)   Log 0.85 (.02)   Alg 0.77 (.03)   Rdg 0.69 (.03)
  Mean NLG (SE):        Full 0.41 (.07)   Log 0.49 (.09)   Alg 0.45 (.08)   Rdg 0.20 (.11)

2011 WT2 (workshops), N = 59
  Pre-test mean (SE):   Full 0.59 (.02)   Log 0.66 (.03)   Alg 0.52 (.04)   Rdg 0.57 (.03)
  Post-test mean (SE):  Full 0.72 (.02)   Log 0.83 (.02)   Alg 0.56 (.04)   Rdg 0.71 (.03)
  Mean NLG (SE):        Full 0.32 (.05)   Log 0.43 (.07)   Alg -0.05 (.10)  Rdg 0.35 (.07)

Unfortunately, the diagnostic test did not detect any effect of the workshop treatment on student learning. Possible reasons for this include:
1. The workshops did not improve learning.
2. The diagnostic test is not sensitive to the type of improvement produced by the workshops.
3. There is too much "noise" in the data to detect the effect; that is, differences in class composition, material covered, homework, etc. are too large for the effect of the workshops to be detected.

Most likely, a combination of the last two reasons is responsible. In particular, the 2011 WT1 group was composed of two sections that, despite similar pre-test scores, differed substantially in post-test scores. In fact, there are often non-trivial differences in post-test scores between sections. The table below summarizes the pre-test scores and normalized learning gains for all five sections in the 2011 academic year.

2011 SUM – 921, N = 32
  Pre-test mean (SE):   Full 0.62 (0.03)   Log 0.67 (0.04)   Alg 0.55 (0.05)   Rdg 0.61 (0.04)
  Mean NLG (SE):        Full 0.11 (0.11)   Log 0.13 (0.20)   Alg 0.14 (0.13)   Rdg 0.01 (0.13)

2011 WT1 – 101, N = 32
  Pre-test mean (SE):   Full 0.58 (0.02)   Log 0.66 (0.03)   Alg 0.30 (0.02)   Rdg 0.37 (0.03)
  Mean NLG (SE):        Full 0.32 (0.08)   Log 0.45 (0.12)   Alg 0.29 (0.12)   Rdg 0.17 (0.16)

2011 WT1 – 102, N = 30
  Pre-test mean (SE):   Full 0.60 (0.03)   Log 0.65 (0.03)   Alg 0.35 (0.03)   Rdg 0.38 (0.03)
  Mean NLG (SE):        Full 0.51 (0.11)   Log 0.52 (0.14)   Alg 0.65 (0.13)   Rdg 0.34 (0.14)

2011 WT2 – 201, N = 38
  Pre-test mean (SE):   Full 0.62 (0.03)   Log 0.67 (0.04)   Alg 0.51 (0.05)   Rdg 0.62 (0.04)
  Mean NLG (SE):        Full 0.37 (0.06)   Log 0.42 (0.09)   Alg 0.16 (0.13)   Rdg 0.43 (0.11)

2011 WT2 – 202, N = 21
  Pre-test mean (SE):   Full 0.55 (0.03)   Log 0.63 (0.04)   Alg 0.54 (0.06)   Rdg 0.47 (0.04)
  Mean NLG (SE):        Full 0.23 (0.06)   Log 0.46 (0.10)   Alg -0.35 (0.17)  Rdg 0.22 (0.08)
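For reference, the summary statistics used above can be computed roughly as follows. This is an illustrative Python sketch only: it assumes the normalized learning gain (NLG) is the per-student gain (post − pre)/(1 − pre) averaged over matched students, and that Ferguson's delta is computed from integer total scores in the usual way; the report's actual formulas and matching procedure may differ.

import numpy as np

def mean_nlg(pre, post):
    """Mean normalized learning gain over matched students.

    pre, post: arrays of per-student scores as fractions in [0, 1].
    Each student's gain is (post - pre) / (1 - pre); students with a
    perfect pre-test score are excluded. Returns the mean and its
    standard error. (Assumed formula -- see the note above.)
    """
    pre, post = np.asarray(pre, dtype=float), np.asarray(post, dtype=float)
    ok = pre < 1.0
    g = (post[ok] - pre[ok]) / (1.0 - pre[ok])
    return g.mean(), g.std(ddof=1) / np.sqrt(g.size)

def fergusons_delta(totals, n_items):
    """Ferguson's delta for integer total scores on an n_items-item test:
    delta = (N^2 - sum_i f_i^2) / (N^2 - N^2 / (n_items + 1)),
    where f_i is the frequency of each possible total score 0..n_items.
    """
    totals = np.asarray(totals, dtype=int)
    N = totals.size
    freqs = np.bincount(totals, minlength=n_items + 1)
    return (N ** 2 - np.sum(freqs ** 2)) / (N ** 2 - N ** 2 / (n_items + 1))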
Current Project Status (material was prepared by either the STLF or other members of the MATH SEI group)

MATH 220:
Learning Goals: Learning goals have previously been created for this course and are in use.
Assessments: The basic proof skills diagnostic pre-test was administered in the current session (Summer 2012) and the post-test will be given at the end of the term.
New Methods/Materials: The basic proof skills diagnostic was revised.

Plan for immediate future work

MATH 220:
1. Compare the workshop vs. control treatment on a question-by-question basis (see the sketch at the end of this report for one possible approach).
2. Assist with the workshops for the summer term and observe what learning is occurring in them.
3. Investigate differences in instruction and course materials among the sections of the 2011 academic year, focusing in particular on the 2011 WT1 section 102 class, which had high learning gains.
4. Compare the workshop vs. control treatment on isomorphic final exam questions (if such a question set exists!).
5. Perform student validation on the portions of the basic proof skills diagnostic that have not yet been validated.
6. Establish a timeline for the remainder of the MATH 220 project.

Higher-Level Proof Courses (likely MATH 312 and MATH 342)
1. Create a detailed timeline for the development of the higher-level proof skills test.
2. Determine the key skills we would like to assess with a higher-level proof diagnostic. Examine and possibly code some past MATH 312 and MATH 342 final exams as a start on this.
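Finally, as a possible starting point for item 1 of the MATH 220 plan (the question-by-question comparison of workshop and control treatments), here is a minimal illustrative sketch. The approach shown, comparing each item's post-test difficulty index between the two groups with a two-proportion z-test, is an assumption about one reasonable method, not a decided analysis plan; the function and variable names are hypothetical.

import numpy as np
from scipy.stats import norm

def per_item_comparison(control, workshop):
    """Compare post-test difficulty indices item by item.

    control, workshop: students x items matrices of 0/1 post-test scores.
    Returns, per item: P_control, P_workshop, and the p-value of a
    two-proportion z-test for the difference. Hypothetical sketch only.
    """
    control = np.asarray(control, dtype=float)
    workshop = np.asarray(workshop, dtype=float)
    n1, n2 = control.shape[0], workshop.shape[0]
    p1, p2 = control.mean(axis=0), workshop.mean(axis=0)

    # Pooled proportion and standard error for the two-proportion z-test.
    # (Items that everyone or no one answered correctly give a zero SE and
    # would need to be handled separately.)
    pooled = (control.sum(axis=0) + workshop.sum(axis=0)) / (n1 + n2)
    se = np.sqrt(pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    p_values = 2 * norm.sf(np.abs(z))
    return p1, p2, p_values

With this many items, some allowance for multiple comparisons (or at least cautious interpretation of individual p-values) would be needed.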