The Long and Winding Road: Researching the Validity of the SAT
Wayne Camara, Jennifer Kobrin, Krista Mattern, Brian Patterson, & Emily Shaw
Ninth Annual Maryland Assessment Conference: The Concept of Validity: Revisions, New Directions & Applications
October 9 and 10, 2008

Outline of Presentation
• Planning the Journey. Mapping out the research agenda; targeting the sample of institutions.
• Making Connections with Institutions. Validity evidence is only as good as the data we collect. Issues and lessons learned in initiating and maintaining contact with institutions to get good-quality data.
• Detours and Statistical Fun. Cleaning the data. All institutions are not the same: how to aggregate and compare SAT validity coefficients across diverse institutions. "To correct or not to correct?" Restriction of range.
• Deciding How to Get from Point A to Point B. There are numerous ways to look at the relationship between the SAT, HSGPA, other variables, and college grades, and each may give a different picture.
• A Bumpy Road. The fairness issue: differential validity and differential prediction.

Planning the Journey
• Mapping out the research agenda. Targeting the sample of institutions.

Sampling Plan
• The population: 726 colleges receiving 200 or more SAT score reports in 2005.
• The target sample: a stratified target sample of 150 institutions (a toy proportional-allocation sketch follows the sampling plan guide below), stratified on four characteristics:
  • Size: Small (750 to 1,999 undergrads); Medium to Large (2,000 to 7,499); Large (7,500 to 14,999); Very Large (15,000 or more)
  • Selectivity: under 50% of applicants admitted; 50 to 75%; over 75%
  • Control: Public; Private
  • Region of the Country: Mid-Atlantic; Midwest; New England; South; Southwest; West

Example of our Sampling Plan Guide

                                       The Target Schools                 Difference Between Sample and Target
Region         Control   Selectivity   Small  Med-Lg  Large  Very Lg     Small  Med-Lg  Large  Very Lg
Middle States  Private   50 to 75%     2      2       1      1           0      -1      -1     -1
                         over 75%      1      2       1      0           -1     -2      -1     *
                         under 50%     0      3       1      0           *      -1      -1     *
               Public    50 to 75%     2      5       1      1           -2     -4      0      1
                         over 75%      0      2       1      0           *      -1      -1     *
                         under 50%     0      1       1      0           *      -1      -1     1
Midwestern     Private   50 to 75%     1      1       1      0           3      0       0      *
                         over 75%      1      4       0      0           -1     0       *      *
                         under 50%     1      1       1      0           0      0       0      *
               Public    50 to 75%     0      0       1      3           *      *       -1     -3
                         over 75%      1      1       2      5           -1     -1      0      -3
                         under 50%     0      0       1      1           *      *       -1     -1
Note. In the difference section, negative numbers indicate the number of schools still needed to fulfill the target; positive numbers indicate the number of schools over-sampled; the symbol * indicates zero schools in the target and no school actually sampled; "0" indicates the number of schools sampled matched the target.
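The proportional-allocation idea behind a stratified target like this can be sketched in a few lines. This is a toy illustration, not the study's actual procedure; the DataFrame and its rows are hypothetical stand-ins for the real frame of 726 colleges.

```python
# A toy sketch: turning population counts into a stratified target of 150
# institutions via proportional allocation. All rows below are hypothetical;
# the study's actual targets appear in the sampling plan guide above.
import pandas as pd

population = pd.DataFrame({
    "region":      ["Middle States", "Middle States", "Midwestern", "Midwestern"],
    "control":     ["Private", "Public", "Private", "Public"],
    "selectivity": ["50 to 75%", "over 75%", "under 50%", "50 to 75%"],
    "size":        ["Small", "Medium to Large", "Large", "Very Large"],
})

TARGET_N = 150  # total target sample size from the sampling plan

stratum_counts = population.groupby(
    ["region", "control", "selectivity", "size"]).size()
# Each stratum's share of the population, scaled to the target and rounded.
targets = (stratum_counts / stratum_counts.sum() * TARGET_N).round().astype(int)
print(targets)
```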
Making Connections with Institutions
• Validity evidence is only as good as the data we collect. Issues and lessons learned in initiating and maintaining contact with institutions to get good-quality data.

Institutions were Recruited Via:
• Email invitations from College Board staff with existing relationships
• Conference exhibit booths:
  • Association for Institutional Research (AIR)
  • National Association for College Admission Counseling (NACAC)
  • CB National Forum; 7 CB Regional Forums
  • American Educational Research Association (AERA)
• Print announcements in CB and AIR publications

Recruitment
• Recruitment took place between 2005 and 2007.
• To participate, institutions had to have at least 250 first-year, first-time students who entered in the Fall of 2006.
• At least 75 students with SAT scores were necessary to conduct an Admitted Class Evaluation Service (ACES) study. ACES served as the data portal between the institution and the College Board.
• Institutions designated a key contact who received a stipend of $2,000 to $2,500 for loading data into ACES (direct costs = $800,000).

ACES
• The Admitted Class Evaluation Service (ACES) is a free online service that predicts how admitted students will perform at a college or university generally, and how successful students will be in specific classes. http://www.collegeboard.com/highered/apr/aces/aces.html

Required Data for Each Student
For matching:
• SSN
• Last name
• First name
• Date of birth
• Gender
• Optional, but recommended: college/university-assigned unique ID
Necessary for the validity research:
• Course names for each semester
• The number of credits each course is worth
• Course semester/trimester indication
• Course grades for each semester
• First-year GPA
• Whether the student returned to the institution for the Fall of 2007 (submitted before 10/15/07)

Institutional Characteristics
Variable     Category                                     Sample   Population
Region       MRO (Midwest)                                15%      16%
             MSRO (Middle States)                         24%      18%
             NERO (New England)                           22%      13%
             SRO (South)                                  11%      25%
             SWRO (Southwest)                             11%      10%
             WRO (West)                                   17%      18%
Selectivity  under 50%                                    24%      20%
             50 to 75%                                    54%      44%
             over 75%                                     23%      36%
Size         Small: 750 to 1,999 undergrads               20%      18%
             Medium to Large: 2,000 to 7,499 undergrads   39%      43%
             Large: 7,500 to 14,999 undergrads            21%      20%
             Very Large: 15,000 or more undergrads        20%      19%
Control      Public                                       43%      57%
             Private                                      57%      43%

Detours and Statistical Fun
• Cleaning the data
• A Volkswagen is not a Hummer (or, all institutions are not the same)! It is necessary to logically aggregate and compare SAT validity coefficients across diverse institutions.
• "To correct or not to correct?"

Cleaning the Data after ACES Processing
Student-level checks to remain in the study (a sketch of these filters appears below):
• Student earned enough credit to constitute completion of a full academic year
• Student took the SAT after March 2005 (so an SAT-W score is available)
• Student indicated their HSGPA on the SAT Questionnaire (when registering for the SAT)
• Student had a valid FYGPA
Institution-level checks to remain in the study:
• Check for institutions with a high proportion of zero FYGPAs (should some be missing or null?)
• Grading system makes sense (e.g., an institution submitted a file with no failing grades)
• Recoding variables for consistency (e.g., fall semester, fall trimester, or fall quarter = term 1 for placement analyses)
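A minimal sketch of the student-level filters listed above, assuming a pandas DataFrame with hypothetical column names (`credits_earned`, `sat_date`, `hsgpa`, `fygpa`); the actual ACES field names and the credit threshold that counts as a full academic year are not given in the presentation.

```python
# Hypothetical implementation of the four student-level checks.
import pandas as pd

def apply_student_checks(students: pd.DataFrame,
                         full_year_credits: float = 24.0) -> pd.DataFrame:
    """Keep only students who pass the four student-level checks."""
    # Enough credits to constitute a full academic year (threshold assumed).
    completed_year = students["credits_earned"] >= full_year_credits
    # SAT taken after March 2005, when the writing section was introduced.
    has_sat_w = pd.to_datetime(students["sat_date"]) > pd.Timestamp("2005-03-31")
    # HSGPA self-reported on the SAT Questionnaire.
    reported_hsgpa = students["hsgpa"].notna()
    # A valid first-year GPA on a 0.0-4.0 scale.
    valid_fygpa = students["fygpa"].notna() & students["fygpa"].between(0.0, 4.0)
    return students[completed_year & has_sat_w & reported_hsgpa & valid_fygpa]
```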
Issues: matching students (institution records to College Board records on name, date of birth, and SSN); the loss of students who did not complete the semester (or year) makes persistence difficult to track.

SAT Validity Study
• In several instances, individual institutions were contacted to attempt to remedy data issues.
• After cleaning the data and removing cases with missing data, the final sample included:
  • 110 colleges (of the original 114 participating institutions)
  • 151,316 students (of the original 196,356)

Aggregating and Comparing SAT Validity Coefficients across Diverse Institutions

[Figure: boxplots of standardized regression coefficients for institutions in the SAT Validity Study sample]
To account for the variability across institutions, the following procedures were followed:
(1) Compute separate correlations for each institution;
(2) Apply a multivariate correction for restriction of range to each set of correlations separately; and
(3) Compute a set of average correlations, weighted by the size of the institution-specific sample.

So why do we adjust a correlation?
If a college admitted all students irrespective of SAT scores, you would find a normal distribution of scores and FYGPA, and a higher correlation than you observe after selection. The more selective the college, the less likely it is to admit many students with low SAT scores, and it may have far fewer students with low FYGPAs than the population does.

Restriction of Range
[Figure: scatterplot illustrating restriction of range; r = .70 shown]
The result is that the entering class is restricted (to higher-scoring students), which makes the observed correlation lower than it is in a representative population. We adjust a raw correlation to account for this restriction and to get an estimate of the true validity of any measure. The same thing occurs any time we restrict one variable in selection.

More on Restriction of Range
• Most believe that correcting for restriction of range is an appropriate technique; however, some people (mistakenly) think you are manipulating the data.
• Others believe that if the assumptions of the correction cannot be directly verified, corrections should not be applied.
• Best practice, if you do correct correlations, is to report both. Standard 1.18 in the Standards for Educational and Psychological Testing (p. 21) states, "When statistical adjustments, such as those for restriction of range or attenuation, are made, both adjusted and unadjusted coefficients, as well as the specific procedure used, and all statistics used in the adjustment, should be reported."
• Ultimately, the decision to correct should be based on the purpose of the study and the types of interpretations that will be made (compare predictors, explain total variance accounted for in a model, etc.). Reporting both adjusted and unadjusted correlations is normally appropriate in selection. (A univariate illustration of the correction follows.)
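Before turning to the multivariate procedure the study actually used (next slide), the basic mechanics can be illustrated with the univariate case. This is a sketch of Thorndike's Case 2 correction for direct range restriction; it will not reproduce the study's reported values, which come from the Pearson-Lawley multivariate correction pooled within institutions.

```python
# Thorndike's Case 2 (univariate) correction for direct range restriction.
# u = SD of the predictor in the unrestricted population divided by its SD
# in the restricted (selected) sample.
import math

def correct_for_restriction(r: float, sd_pop: float, sd_sample: float) -> float:
    u = sd_pop / sd_sample
    return (r * u) / math.sqrt(1.0 - r * r + (r * u) ** 2)

# Example with the SAT-W figures from the descriptive statistics that follow
# (sample SD = 94.3, population SD = 107.2): the univariate correction moves
# an observed r of 0.33 up to about 0.37 -- in the right direction, but well
# short of the 0.51 obtained with the multivariate, pooled-within-institution
# correction.
print(round(correct_for_restriction(0.33, 107.2, 94.3), 2))
```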
In the current study:
• We employed the Pearson-Lawley multivariate correction.
• The population was defined as the 2006 College-Bound Seniors cohort: any student who graduated from high school in 2006 and took the SAT.
• We computed the variance-covariance matrix of SAT-M, SAT-CR, SAT-W, and HSGPA scores using students with complete records.

Descriptive Statistics of the Restricted Sample as Compared to the Population
             Sample             Population
Predictor    Mean     SD        Mean     SD
HSGPA        3.60     0.50      3.33     0.63
SAT-CR       560      95.7      507      110.0
SAT-M        579      96.7      520      113.5
SAT-W        554      94.3      500      107.2
FYGPA        2.97     0.71      --       --

Correlations of Predictors with FYGPA
Predictors       Unadjusted R    R*
HSGPA            0.36            0.54
SAT-W            0.33            0.51
SAT-CR           0.29            0.48
SAT-M            0.26            0.47
SAT CR+M         0.32            0.51
SAT CR+M+W       0.35            0.53
HSGPA + SAT      0.46            0.62
Note. N = 151,316. * Correlations corrected for restriction of range, pooled within-institution correlations.

Correlations Aggregated by Institutional Characteristics
                          N         SAT     HSGPA   SAT+HSGPA
CONTROL      Private      45,786    0.57    0.55    0.65
             Public       105,530   0.52    0.53    0.61
SELECTIVITY  Under 50%    27,272    0.58    0.55    0.65
             50-75%       84,433    0.53    0.54    0.62
             >75%         39,611    0.51    0.54    0.60
Note. Correlations corrected for restriction of range, pooled within-institution correlations.

Other Possible Corrections that were not Applied in the Current Study
• Criterion unreliability (attenuation): college grades are not perfectly reliable.
  • In order to compare with past results, we did not correct for attenuation.
  • Results would have shown even larger correlations.
• Predictor unreliability
  • SAT scores are not perfectly reliable, but they are close (reliability in the .90s for CR and M, and the high .80s for W).
  • Since admission decisions are made with imperfect measures, we did not correct for predictor unreliability.
• Course difficulty
  • Students don't all take the same courses, and courses are not all of the same difficulty (see Sackett and Berry, 2008).
  • A placement study will examine whether or not to control for course difficulty.

Deciding How to Get from Point A to Point B
• There are numerous ways to look at the relationship between the SAT, HSGPA, other variables, and college grades, and each may give a different picture.

Many Ways to Examine and Visually Present the Predictive Validity of the SAT
• In addition to bivariate correlations and multiple correlations, which indicate the predictive power of an individual measure or multiple measures used in concert, there are other ways to analyze and present the data:
  • Regression analyses: examination of beta weights (as opposed to raw regression coefficients)
  • Including additional predictors
  • Incremental validity: order matters (see the sketch after this slide)
  • Mean-level differences by performance bands
  • Alternative outcomes
  • Individual course grades rather than FYGPA
• Though some of these may be more accessible to laypersons, if used improperly, they may be misleading.

The slope of the regression line shows the expected increase in FYGPA associated with increasing SAT scores.
• More readily understood than a correlation coefficient.
• When looking at multiple variables, beta weights answer the question: which of the independent variables has a greater effect on the dependent variable in a multiple regression analysis?
• We can look at the effect of additional variables after first taking other variables into account.
[Figure: scatterplot of FYGPA against SAT-Math (200-800) with fitted regression line, r = .37]
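A minimal sketch of beta weights and order-of-entry incremental validity on simulated data. The correlation matrix below is hypothetical, chosen only to be loosely SAT/HSGPA-like; the study's own estimates come from range-corrected, pooled within-institution correlations, not from this simulation.

```python
# Beta weights and incremental validity (order matters) on simulated data.
import numpy as np

rng = np.random.default_rng(0)
# Variable order: SAT-CR, SAT-M, SAT-W, HSGPA, FYGPA (hypothetical matrix).
corr = np.array([
    [1.00, 0.70, 0.70, 0.50, 0.50],
    [0.70, 1.00, 0.70, 0.50, 0.50],
    [0.70, 0.70, 1.00, 0.50, 0.50],
    [0.50, 0.50, 0.50, 1.00, 0.54],
    [0.50, 0.50, 0.50, 0.54, 1.00],
])
data = rng.multivariate_normal(np.zeros(5), corr, size=20_000)
X, y = data[:, :4], data[:, 4]

def multiple_R(Xcols: np.ndarray, y: np.ndarray) -> float:
    """Multiple correlation of y with the given predictor columns."""
    X1 = np.column_stack([np.ones(len(Xcols)), Xcols])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return float(np.corrcoef(X1 @ beta, y)[0, 1])

# Because the simulated variables have unit variance, the lstsq slopes are
# already standardized regression (beta) weights.
betas = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)[0][1:]
print("beta weights (CR, M, W, HSGPA):", betas.round(2))

# Incremental validity, both orders of entry:
r_hsgpa = multiple_R(X[:, [3]], y)   # HSGPA entered first
r_sat = multiple_R(X[:, :3], y)      # SAT sections entered first
r_all = multiple_R(X, y)             # everything
print(f"HSGPA alone R={r_hsgpa:.2f}; add SAT: delta={r_all - r_hsgpa:.2f}")
print(f"SAT alone   R={r_sat:.2f}; add HSGPA: delta={r_all - r_sat:.2f}")
```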
However, the Results may need to be Interpreted with Caution!
"It should be clear now that high multicollinearity may lead not only to serious distortions in the estimations of regression coefficients but also to reversals in their signs. Therefore, the presence of high collinearity poses a serious threat to the interpretation of the regression coefficients as indices of effects" (Pedhazur, 1982, p. 246).

The SAT is Cursed: University of California Study (2001)
• Examining UC data, Geiser and Studley (2001) found that SAT II scores and HSGPA together account for 22.2% of the variance in FYGPA in the pooled, 4-year data.
• Adding SAT I into the equation improves the prediction by an increment of only 0.1% in the pooled, 4-year data. This was taken to support using SAT II scores and HSGPA, not SAT I scores.
• However, they fail to mention that similar findings emerge when the order of entry is reversed:
  • SAT I scores and HSGPA together account for 20.8% of the variance.
  • Adding SAT II improves the prediction by an increment of only 1.5%.
THE REASON: SAT I and SAT II scores are highly correlated (redundant) - an issue of multicollinearity!

Reverse the Curse: New UC Study (2007)
• Agranow & Studley (2007) reached different conclusions.
• They examined the predictive validity of the new SAT for 33,356 students who:
  • completed the new SAT, and
  • enrolled in a UC campus in the Fall of 2006.
• Results were compared to the previous UC study using the old SAT in 2004.
• Comparisons were based on how well each measure predicted freshman GPA at UC (a model with all three SAT sections and HSGPA entered simultaneously to predict FYGPA):
  • SAT Critical Reading and Math were slightly more predictive in 2006 than in 2004.
  • SAT Writing was slightly more predictive than the other SAT sections.
  • SAT Writing (in 2006) was slightly more predictive than the Writing Subject Test had been (in 2004).
  • In the 2004 study, high school GPA was slightly more predictive than SAT V+M.
  • In the 2006 study, SAT CR+M+W was slightly more predictive than high school GPA.

The SAT is a Wealth Test: University of California Study (2001)
• Another conclusion from the Geiser and Studley (2001) study was that after controlling not only for HSGPA and SAT II scores, but also for parental education and family income, SAT I scores did not improve the prediction.
• They claimed that the predictive power of the SAT I essentially drops to zero when SES is controlled for in a regression analysis.
• Conclusion: the SAT is a wealth test - even though its incremental validity was already essentially zero before the SES variables were added!
THE REASON, again: SAT I and SAT II scores are highly correlated (redundant) - an issue of multicollinearity!
...However, the media had a different take.

Sampling of SAT-Related SES Articles in the Popular Press
• "SAT scores tied to income level locally, nationally" (Washington Examiner, August 31, 2006)
• "Parents' education best SAT predictor" (United Press International, May 4, 2006)
• "SAT measures money, not minds" (Yale Herald, November 15, 2002)

Disproving the Myths about Testing (often perpetuated by the media)
Sackett et al., 2007
• Computed the correlation of college grades and SAT scores partialling out SES, to determine the degree to which controlling for SES reduced the correlation.
• Contrary to the assertion of many critics, statistically controlling for SES only slightly reduced the estimated test-grade correlation (from 0.47 to 0.44). (A sketch of the partial-correlation computation follows.)
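The Sackett et al. (2007) result is a first-order partial correlation. Below is a sketch of that computation; the two SES correlations in the example are hypothetical placeholders, while 0.47 is the unadjusted test-grade correlation the study starts from.

```python
# First-order partial correlation: the SAT-FYGPA correlation with SES
# partialled out of both variables.
import math

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

r_sat_gpa = 0.47   # unadjusted test-grade correlation (Sackett et al., 2007)
r_sat_ses = 0.25   # hypothetical SAT-SES correlation
r_gpa_ses = 0.15   # hypothetical grade-SES correlation
# With SES correlations of this modest size, the adjusted value stays close
# to the original -- the same pattern Sackett et al. report (0.47 to 0.44).
print(round(partial_corr(r_sat_gpa, r_sat_ses, r_gpa_ses), 2))
```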
Zwick & Greif Green, 2007
• The correlation of SAT scores and SES factors is smaller when computed within high schools rather than across high schools.
• The correlation of HSGPA and SES factors is slightly larger within high schools than across high schools.
Mattern, Shaw & Williams, 2008
• Across high schools, correlations of SAT and SES were about 2.2 times larger than the correlations of high school performance and SES.
• Within high schools and aggregated, the SAT-SES correlations were only 1.4 times larger than the high school performance-SES correlations.

Whoever Sits in the Front Seat Determines the Result - Incremental Validity Example
Predictors                              R1      R2      ΔR
HSGPA (add SAT-CR + SAT-M)              0.54    0.61    0.07
HSGPA (add SAT-CR + SAT-M + SAT-W)      0.54    0.62    0.08
SAT-CR + M (add SAT-W)                  0.51    0.53    0.02
HSGPA + SAT-CR + SAT-M (add SAT-W)      0.61    0.62    0.01
Note. Data from the 2008 SAT Validity Study. Correlations corrected for restriction of range, pooled within-institution correlations.
Here is what the media might say:
• "The new SAT adds ONLY 0.08 over HSGPA - it is worthless!"
• "The new writing section adds ONLY 0.02 over SAT-CR & M - it's not worth the extra time and cost!"

Switching who Sits in the Front Seat - Incremental Validity Example
Predictors                              R1      R2      ΔR
SAT-CR + SAT-M (add HSGPA)              0.51    0.61    0.10
SAT-CR + SAT-M + SAT-W (add HSGPA)      0.53    0.62    0.09
SAT-W (add SAT-CR + M)                  0.51    0.53    0.02
SAT-W (add HSGPA + SAT-CR + SAT-M)      0.51    0.62    0.11
Note. Data from the 2008 SAT Validity Study. Correlations corrected for restriction of range, pooled within-institution correlations.
Here is what the media might say:
• "HSGPA adds ONLY 0.09 over the new SAT - it is worthless!"
• "SAT-CR & M add ONLY 0.02 over the new writing section - why didn't we always have a writing section!?"

Straightforward Approach: Increment of SAT Controlling for HSGPA and Academic Intensity
[Bar chart (Bridgeman, Pollack, & Burton, 2004): percentage of students with CGPA >= 3.5 by SAT score band - 800-1000: 13.9%; 1010-1200: 29.8%; 1210-1400: 50.7%; above 1400: 76.6%]

Another Way to Think of a Correlation of 0.53
[Figure: mean FYGPA by SAT score band]

Using Course Grades as the Criterion rather than FYGPA
• FYGPA is not always a reliable measure, and it is difficult to compare across different college courses and instructors.
• Sackett and Berry (2008) examined SAT validity at the individual course level.
• The correlation of the SAT with a composite of course grades was 0.58, compared to 0.51 for FYGPA.
• SAT validity is reduced by 19% due to "noise" added as a result of differences in course choice.
• HSGPA is not a stronger predictor than the SAT when a composite of individual course grades is used as the criterion measure.

A Bumpy Road
• The fairness issue: standardized differences, differential validity, and differential prediction.

Correlation of SAT Scores & HSGPA with FYGPA by Race/Ethnicity
Subgroup       American Indian   Asian     African-American   Hispanic   White
k (inst.)      16                82        83                 86         109
N              384               14,109    10,096             10,486     104,017
SAT-CR         0.41              0.41      0.40               0.43       0.48
SAT-M          0.41              0.43      0.40               0.41       0.46
SAT-W          0.42              0.44      0.43               0.46       0.51
SAT            0.54              0.48      0.47               0.50       0.53
HSGPA          0.49              0.47      0.44               0.46       0.56
SAT, HSGPA     0.63              0.56      0.54               0.57       0.63
Previous research has shown that tests and grades are slightly less effective in predicting the performance of African American students.
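The over- and underprediction statistics on the next slide are average residuals: actual FYGPA minus the FYGPA predicted from an equation fit to the total group, so negative values mean the equation overpredicts for that subgroup. Below is a pooled sketch with hypothetical column names; the study computed these within institutions and then aggregated.

```python
# Differential prediction as mean residuals: fit one regression to the total
# group, then average (actual - predicted) FYGPA within each subgroup.
# The DataFrame `df` and its column names are hypothetical.
import numpy as np
import pandas as pd

def mean_over_under_prediction(df: pd.DataFrame, predictors: list[str],
                               group: str = "race") -> pd.Series:
    X = np.column_stack([np.ones(len(df))] +
                        [df[p].to_numpy() for p in predictors])
    beta, *_ = np.linalg.lstsq(X, df["fygpa"].to_numpy(), rcond=None)
    residual = df["fygpa"].to_numpy() - X @ beta   # actual minus predicted
    return pd.Series(residual, index=df.index).groupby(df[group]).mean()

# e.g. mean_over_under_prediction(df, ["sat_cr", "sat_m", "sat_w", "hsgpa"])
```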
Average Overprediction (-) and Underprediction (+) of FYGPA for SAT Scores and HSGPA by Race/Ethnicity
Subgroup           American Indian   Asian     African-American   Hispanic   White
k (institutions)   103               109       108                110        110
n                  798               14,296    10,304             10,659     104,024
SAT-CR             -0.26             0.05      -0.30              -0.17      0.04
SAT-M              -0.25             -0.07     -0.26              -0.16      0.05
SAT-W              -0.22             0.04      -0.26              -0.16      0.04
SAT                -0.22             0.01      -0.20              -0.11      0.03
HSGPA              -0.25             0.02      -0.32              -0.27      0.06
SAT, HSGPA         -0.20             0.02      -0.17              -0.12      0.03
This is also consistent with past research: the actual FYGPAs of underrepresented minorities average about .1 to .2 below the GPAs predicted from the SAT. HS grades consistently overpredict grades at a higher rate than tests do. Over- and underprediction are consistently reduced when both are used.

Validity Research, in Conclusion:
• You get out what you put in: quality of data, data matching, institutional collaboration, the criterion problem.
• It is always easier to argue against something than to propose an alternative (tests vs. grades, tests vs. nothing).
• Selection: using a predictor in selection (SAT, GRE, HS grades) will result in lower validity in proportion to the selectivity applied. If you then compare that validity to a 'new predictor' not employed in selection, it is not surprising to see higher correlations that will NOT stand up as operational validities.
• For more information on CB research: http://collegeboard.com/research

Appendix: Additional Materials Not Presented at Conference

Related Roadblocks
• Addressing and disproving criticisms: as much effort is spent collecting evidence about what the SAT does not do as is spent collecting evidence about what it does do.
• Besides the criticisms described earlier (i.e., the SAT is a "wealth test" and provides no information over HSGPA), other criticisms, as well as evidence to the contrary, are presented below.

"The SAT is to criticism as a halfback is to a football -- always on the receiving end."
Gose & Selingo (2001). The SAT's greatest test: Social, legal, and demographic forces threaten to dethrone the most widely used college-entrance exam. Chronicle of Higher Education website.

SAT, at 3 Hours 45 Minutes, Draws Criticism Over Its Length (New York Times, December 16, 2005)
• College Board study: Investigating the Effect of New SAT Test Length on the Performance of Regular SAT Examinees (Wang, 2006)
• Examined the average percentage of items answered correctly and the average number of items omitted for different sections of the test.
• The average percentage of items correct was consistent throughout the entire test, and the results were similar across gender, racial/ethnic, and language groups, and across levels of ability as measured by total SAT score.
• On average, students did not omit a larger number of items on later sections of the test.
• Conclusion: any fatigue that students may have felt did not impair their performance.

SAT Essay Test Rewards Length and Ignores Errors (New York Times, May 4, 2005)
• College Board study: It is What You Say and (Sometimes) How You Say It: The Association Between Prompt Characteristics, Response Features, and SAT Essay Scores (Kobrin, Deng, & Shaw, submitted for publication)
• A sample of SAT essay responses was coded on a variety of features related to length and content, and essay prompts were coded on their linguistic complexity and other characteristics.
• The correlation of number of words with essay score was 0.62, which is smaller than that reported in the media.
SAT Coaching Raises Scores, Report Says (New York Times, December 18, 1991)
• College Board-sponsored study: Effects of Short-Term Coaching on Standardized Writing Tests (Hardison & Sackett, 2006)
• Does coaching increase scores on the SAT essay? If so, does that coaching increase scores only on the specific essay, or does it also increase the test-taker's actual writing ability that the test is intended to measure?
• These results suggest that SAT essays may be susceptible to coaching, but score inflation may reflect at least some improvement in overall writing ability.

A Bumpy Road Continued: Fairness Issues

Previous findings:
• Standardized differences
  • Males outperform females on Math and Critical Reading.
  • African-American and Hispanic students scored significantly lower than the total group on all academic measures.
• Differential validity
  • SAT and HSGPA are more predictive of FYGPA for females and white students (larger correlations).
• Differential prediction
  • SAT and HSGPA tend to underpredict FYGPA for females; however, the magnitude is larger for the SAT.
  • SAT and HSGPA tend to overpredict FYGPA for minority students; however, the magnitude is larger for HSGPA.

Mean Academic Performance by Subgroups
Subgroup                       n         SAT-CR   SAT-M   SAT-W   HSGPA   FYGPA
Gender   Male                  69,765    564      602     550     3.55    2.88
         Female                81,551    557      559     557     3.65    3.05
Race     American Indian       798       544      555     529     3.52    2.77
         Asian                 14,296    562      624     562     3.66    3.05
         African-American      10,304    506      503     498     3.39    2.63
         Hispanic              10,659    524      537     520     3.59    2.73
         No Response           6,738     587      590     576     3.63    3.05
         Other                 4,497     558      572     553     3.57    2.95
         White                 104,024   567      584     560     3.62    3.02
Total                          151,316   560      579     554     3.60    2.97

Standardized Differences for the 2006 Validity Study
Variable                       SAT-CR   SAT-M   SAT-W   HSGPA   FYGPA
Gender   Female                -0.08    -0.44   0.07    0.20    0.24
Race     American Indian       -0.17    -0.24   -0.26   -0.16   -0.28
         Asian, Asian-American 0.02     0.47    0.08    0.12    0.11
         African-American      -0.56    -0.78   -0.59   -0.42   -0.48
         Hispanic              -0.38    -0.43   -0.36   -0.02   -0.34
         No Response           0.28     0.12    0.23    0.06    0.11
         Other                 -0.02    -0.07   0.00    -0.06   -0.03
         White                 0.08     0.05    0.07    0.04    0.07
Note. For gender, the standardized difference was calculated as (Female Mean - Male Mean)/Total Standard Deviation. For race, it was calculated as (Subgroup Mean - Total Mean)/Total Standard Deviation. Negative values indicate lower performance than the referent group (i.e., males or the total group); positive values indicate higher performance than the referent group. A code sketch of this computation follows.
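The standardized-difference computation described in the note above, in code form; the DataFrame `df` and its column names are hypothetical. The spot check at the end uses means and SDs taken from the tables in this deck.

```python
# Standardized differences: subgroup mean differences in total-group SD units.
import pandas as pd

def race_standardized_differences(df: pd.DataFrame, score: str) -> pd.Series:
    # Race/ethnicity: (subgroup mean - total mean) / total SD.
    return (df.groupby("race")[score].mean() - df[score].mean()) / df[score].std()

def gender_standardized_difference(df: pd.DataFrame, score: str) -> float:
    # Gender: (female mean - male mean) / total SD.
    means = df.groupby("gender")[score].mean()
    return (means["Female"] - means["Male"]) / df[score].std()

# Spot check against the tables above: for SAT-M the female-male gap is
# (559 - 602) / 96.7, about -0.44, matching the reported value.
print(round((559 - 602) / 96.7, 2))
```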
Correlation of SAT Scores & HSGPA with FYGPA by Gender
Subgroup           Male     Female
k (institutions)   107      110
N                  69,765   81,551
SAT-CR             0.44     0.52
SAT-M              0.45     0.53
SAT-W              0.47     0.54
SAT                0.50     0.58
HSGPA              0.52     0.54
SAT, HSGPA         0.59     0.65
Note. HSGPA and the SAT are stronger predictors for females. Research on many tests consistently demonstrates that grades and tests are slightly better at predicting female performance than male performance in college.

Discrepancy between HSGPA and FYGPA
                               HSGPA             FYGPA             Mean HSGPA -
Subgroup                       Mean    Median    Mean    Median    Mean FYGPA
Gender   Male                  3.55    3.67      2.88    3.00      0.67
         Female                3.65    3.67      3.05    3.17      0.60
Race     American Indian       3.52    3.67      2.77    2.88      0.75
         Asian                 3.66    3.67      3.05    3.15      0.61
         African-American      3.39    3.33      2.63    2.71      0.76
         Hispanic              3.59    3.67      2.73    2.85      0.86
         No Response           3.63    3.67      3.05    3.19      0.58
         Other                 3.57    3.67      2.95    3.08      0.62
         White                 3.62    3.67      3.02    3.13      0.60
Total                          3.60    3.67      2.97    3.09      0.63

Average Overprediction (-) & Underprediction (+) of FYGPA for SAT Scores & HSGPA by Gender
Subgroup     Male     Female
k            107      110
n            69,765   81,551
SAT-CR       -0.14    0.12
SAT-M        -0.20    0.17
SAT-W        -0.11    0.10
SAT          -0.15    0.13
HSGPA        -0.08    0.07
SAT, HSGPA   -0.10    0.09
When SAT and HSGPA are used together, the predicted FYGPA for males is .10 higher than their actual FYGPA, while the predicted FYGPA for females is .09 below their actual FYGPA. This is consistent with past studies.

Other Avenues/Alternative Routes
• Although our large-scale study is mostly concerned with predictive validity, we have collected, and will continue to collect, other types of validity evidence that meet the recommendations of the Standards.

Evidence Based on the Consequences of Testing: Writing Changes in the Nation's K-12 Education System (Noeth & Kobrin, 2007)
A College Board study to:
• learn about changes in writing instruction across the nation's K-12 education system over the past 3 years.
• describe the near-term impact of the SAT writing section on K-12 education.
Method:
• Surveys were developed with items focused on changes in attitudes and expectations, teaching, learning, and resources related to writing.
• Surveys were administered via email to senior high school English/Language Arts teachers and school district administrators.
• The survey sample was carefully selected to represent the entire nation, with substantial representation of SAT states.
• Nearly 5,000 teachers and 800 district administrators completed the writing surveys (9% and 7% response rates, respectively).

Selected Survey Results
There has been a much greater or slightly greater increase in:   Teachers   Administrators
  teacher attitudes about the importance of writing              77%        83%
  teacher expectations for writing performance                   85%        91%
  class time spent on writing in ELA courses                     80%        81%
  the focus of writing in the curriculum                         76%        81%
  remedial writing programs                                      39%        44%
  the allocation of resources for writing                        34%        55%
  the time devoted to grading writing assignments                7%         13%

Percentage of teachers and administrators indicating writing as one of the most prominent parts or a very important part of the curriculum
                   Teachers   Administrators
Three Years Ago    37%        33%
Today              61%        71%

Selected Survey Results, cont.
The SAT writing section has been a major or minor factor in changing:   Teachers   Administrators
  writing priorities, attitudes & expectations                          68%        57%
  the teaching of writing                                               62%        58%
  learning related to writing                                           53%        53%
  resources dedicated to writing                                        33%        40%
  overall importance placed on writing                                  61%        55%